LOGIC DESIGN for ARRAY-based Circuits
by Donnamaie E. White
for elektroda people
Logic Design for Array-Based Circuits by Donnamaie E. White Copyright © 1996, 2001, 2002 Donnamaie E. White
The original form of this book was published by Academic Press, 1250 Sixth Avenue, San Diego, California 92101-4311 in 1992. ISBN 0-12-746660-6. The figures were reproduced with the permission of Applied Micro Circuits Corporation. The Q20000 Series and other bipolar and BiCMOS series referenced belong to AMCC.

Note that this book is dated as to the AMCC ASIC business. It represents design flow from the late 1980's to early 1990's. The design flow bears a remarkable similarity to the current design flow used by Cadence and Synopsys, with the switch from schematic capture to HDL code and synthesis and with more of the validation steps now performed by various software programs. A basic treatment of the methodology underlying what we do today with deep submicron technologies is still a good read. Everything that bipolar design had to handle in the early 1980's is what we now must handle for 0.25 micron and below CMOS technologies.

This book was based on classes taught at UCSD and at AMCC's customer Design Center. Customer training courses prepared by high-technology vendors are a required extension to that training available in the engineering classes at the college level. The quality of that training can vary with the experience of the instructor. The experience of the instructor with the nuances of the products is one facet. The teaching expertise is another. The purpose behind this book was to document what a proven instructor was adding to the course material and manuals. By providing this supplement, it would be possible for other, less experienced instructors to take over the actual presentation of the seminars while ensuring no loss of insight into the methodology or product taught.
Table of Contents
● Preface
● Overview
  ❍ Introduction
  ❍ Integration Levels
  ❍ Demand and Supply
  ❍ eLearning - the Next Best Thing
● Chapter 1 Introduction
  ❍ Introduction
  ❍ Selection
  ❍ Design Support Issues
    ■ Schematic Rules Checking
    ■ Reformatters
    ■ Design Upgrades
  ❍ Exercises
  ❍ Update 2000
● Chapter 2 Structured Design Methodology
  ❍ Structured Design Methodology
  ❍ Review of the Available Arrays
  ❍ Initial Sizing of the Circuit
  ❍ Create the Preliminary Macro Schematic (when schematics are used)
  ❍ Compute the Path Propagation Delay
  ❍ Compute the Estimated Power
  ❍ Pre-Simulation Steps
  ❍ Simulation
  ❍ Fault Grading
  ❍ Design Submission Through Prototype
    ■ Design Validation Review
    ■ Placement
● Chapter 3 Sizing the Design
  ❍ Functional Specification - A Closer Look
  ❍ Review the Available Arrays
  ❍ Architectural Specification or Hardware Specification
  ❍ Array Sizing
  ❍ Cell Capabilities
  ❍ Array Architecture
  ❍ Netlist
  ❍ Example - AMCC Arrays - Power Supply Options
  ❍ Examples
  ❍ Refining Interface Requirements
  ❍ Dual-Function I/O Macros
  ❍ Thermal Diodes
  ❍ Final Interface Cell Utilization
  ❍ Drivers
  ❍ Exercises
● Chapter 3 Appendix - Case Study in Sizing a Design
● Chapter 4 Design Optimization
  ❍ Introduction
  ❍ Optimization Approaches
  ❍ Design to Improve Speed
  ❍ Example of Silicon Efficiency
  ❍ Internal Net Delays
  ❍ Design to Reduce Internal Cell Utilization
  ❍ Design to Reduce I/O Utilization
  ❍ Design to Fit the Package
  ❍ Design to Reduce Power
  ❍ Design to Reduce Cost
  ❍ Basic Design for Circuit Testability
  ❍ Basic Design for Circuit Reliability
  ❍ Design to Reduce Cost
  ❍ Exercises
● Chapter 5 Timing Analysis for Arrays
  ❍ Introduction
  ❍ Path Propagation Delay Overview
  ❍ Intrinsic Set-Up and Hold Time
  ❍ Interconnect Delays
  ❍ Annotation
  ❍ Manual Computation - One Method
  ❍ Example Equations for Extrinsic Loading - Internal Nets
  ❍ k-Factors
  ❍ Computing Lfo
  ❍ Computing Lwo
  ❍ Computing Lnet
  ❍ Example Equations for Extrinsic Loading - Output Nets
  ❍ Worst-Case Delay Multiplication Factors
  ❍ Front Annotation
  ❍ Intermediate Annotation
  ❍ Back Annotation
  ❍ Exercises
● Chapter 6 External Set-up and Hold Times
  ❍ Introduction
  ❍ Case 1: When the Timing Specifications Are Nominal
  ❍ Case 2: When The Timing Specifications Are Worst-Case
  ❍ Example - AMCC Q1400 BiCMOS Series
  ❍ Example - AMCC Q20000 Bipolar Series
  ❍ Case Study: Preventing Hold Violations Due To Clock Skew
  ❍ Computing Hold Time in the Register Example
● Chapter 7 Power Considerations
  ❍ Introduction
  ❍ DC Power
  ❍ AC Power
  ❍ Worst-Case Power
  ❍ Power Reduction Techniques
  ❍ Design Rules for Power Reduction (AMCC)
  ❍ Computing DC Power Dissipation
  ❍ AC Macro Power Dissipation
● Case Study: DC Power Computation
  ❍ Step 10: Determine ECL Static Power PEO
● Case Study: AC Power Computation
  ❍ Total Power Dissipation
● Chapter 8 Simulation
  ❍ Introduction
  ❍ Simulators - The Tools
  ❍ Wafer Sort/Packaged Part Sort
  ❍ Functional Simulation
  ❍ At-Speed Simulation For Timing Verification
  ❍ AC Tests
  ❍ Parametric Vectors
  ❍ Hazards
● Case Study: Simulation
  ❍ Functional Simulation
  ❍ Parametric Simulation
  ❍ At-Speed Simulation
  ❍ AC Test
● Chapter 9 Faults and Fault Detection
  ❍ Introduction
  ❍ Fault Types
  ❍ The Problem
  ❍ Selecting a Chain
  ❍ Case Study - 16:1 MUX
  ❍ 2:1 MUX Example
  ❍ 3:1 MUX
  ❍ Actual Test Vector Sequence for a 16:1 MUX with Clocked Output
● Chapter 10 Design Submission
  ❍ Case Study: AMCC Design Submission
  ❍ Continued
  ❍ Functional Simulation Submission (Required)
  ❍ At-Speed Simulation Submission (Required)
  ❍ AC Tests Path Delay Vector Submission (Optional)
  ❍ Parametric Vectors (Required or Optional)
● ASIC Glossary
  ❍ Glossary A-D
  ❍ Glossary E-K
  ❍ Glossary L-R
  ❍ Glossary S-Z
Preface

This book was based on classes taught at UCSD and at AMCC's customer Design Center from 1984 to 1994. It is based on schematic capture rather than Verilog or VHDL input and on manual support rather than the newer tools available for static timing verification, test generation, synthesis, etc. The Design Flow is, however, still the same. The steps must be done, with or without tools. This book details the theory behind the new EDA tools.

Customer training courses prepared by high-technology vendors are a required extension to that training available in the engineering classes at the college level. The quality of that training can vary with the experience of the instructor. The experience of the instructor with the nuances of the products is one facet. The teaching expertise is another. The purpose behind this book was to document what a proven instructor was adding to the course material and manuals. By providing this supplement, it would be possible for other, less experienced instructors to take over the actual presentation of the seminars while ensuring no loss of insight into the methodology or product taught.

The course on which the book was based was rewritten for each new array series and technology change made by Applied Micro Circuits Corporation. The array series covered began with the bipolar Q700 (1000 gates on a chip running at 200 MHz) and ended with the Q20000 (20000 gates on a chip running at 1.2 GHz), with CMOS and BiCMOS added along the way.
Structured Design

In the process of these rewrites, it became obvious that a certain core of the seminar remained inviolate - the structured, orderly, logical approach to circuit design. This approach was taken from discrete board design, from SSI-MSI logic design, from bit-slice design, from structured software and firmware programming, and from systems concepts. This core material is represented in this text. The emphasis is on the total design picture - the myriad details that interleave. What was also obvious is that examples that are wrong, as well as those that are right, are essential to rapid assimilation of the material. The last array series turned to for examples was the 1994-1995 AMCC Q20000 Bipolar array series.

The goal has been to create a book that can be used with any vendor's array series. It applies to those designing circuits for an ASIC (Application Specific Integrated Circuit) vendor to produce, and to those vendors who are designing ASSPs (Application Specific Standard Products), standard products that are designed to be built on an array base wafer. (ASSPs are the latest addition to the designer's toolbox.)
Acknowledgments

The author would like to thank the AMCC staff for their time and energy expended in the compilation of this material, with specific thanks to Richard W. Spehn for his expertise and encouragement.
Overview Last Edit July 22, 2001
Introduction

Each of the last six decades has seen a new technology come forward as the leading edge for that era. Table 1 provides a summary of this evolution by decade and integration level.

Table 1 - Integrated Circuit Evolution

Approx. Date | Size | Description
1950s | gate level | A few transistors and other components combined to form AND, OR or NOR gates
mid 1960s | SSI | 4 or more gates; NAND, NOR, OR, AND, EXOR, NOT or INVERT
early 1970s | MSI | up to 200 gates; registers, decoders, multiplexors, etc.
late 1970s | LSI | several hundred gates; ALUs with scratch-pad registers, interrupt controllers, microprogram sequencers, ROMs, PROMs
1980s | VLSI | 700 gates and up; CPUs, complex functions
1980s | ASIC | up to 30,000 gates; multiple functions
early 1990s | ASIC | up to 100,000 gates and increasing, with speeds at 1.4 GHz and higher
1980-1990s | EPAC | The development of analog circuit arrays
1990s | DSM, SoC, IP | Deep SubMicron (< 0.18µ) designs; 1 Million gate arrays; System on a Chip; Intellectual Property - soft and hard IP building blocks
2000 and up | Design Reuse (IP); High-speed | Deep SubMicron (0.13µ) designs; 410 Million gate arrays; more gates, faster designs; improved test methodology; faster synthesis; 1 GHz and faster CPUs
Each technology change has led to a period where those designers who are state-of-the-art oriented, those who readily delve into new developments, accept and begin to use the newest devices in designs. For successful technologies, this is followed by an intense application and development phase where the demand for engineers who can design with the devices typically exceeds the supply of those engineers. This is the driving force behind the evolution of IP (Intellectual Property) blocks, predesigned mega-function blocks that can be re-used in more than one chip. These mega-functions become part of the design library. They may be hard-IP, where all levels of the base die are involved, or soft-IP, where only the metallization layers (currently about 6-8 layers of the die) are involved.
Integration Levels

From the mid-1960s, there are small-scale integration (SSI) gates: NAND, NOR, EXOR, and NOT or INVERT. SSI can be defined to be about 2-10 gates on a single chip. Anything can be built from SSI, but the design time, power, and size make this approach obsolete for designs that must be built quickly and in quantity. Custom design at the transistor and resistor level is reserved for special projects.

From the early 1970s there are larger blocks, medium-scale integration (MSI): registers, decoders, multiplexors, counters, adders, comparators, etc. MSI is loosely defined as approximately 20-100 gates. MSI allows more modular designs, speeding the design process when the blocks could be applied.

In the late 1970s arithmetic logic units (ALUs) with on-board registers, microprogrammable sequencers and interrupt controllers in a bit-slice format became available. Memory chips (ROM, PROM, RAM) in increasing sizes became readily available. Large-scale integration (LSI) culminated in the one-chip microprocessors. LSI is loosely defined as approximately 200-1000+ gates.

Very large scale integration (VLSI) has reached 20,000 gates and higher. LSI and VLSI further increase the modular block size, reducing design time, space, and power considerations and increasing reliability as connections are moved inside the components. Many LSI and VLSI blocks are designed by their manufacturers and referred to as fixed-instruction-set modules.
Bit-Slice Design

For any given design, if the architecture of the fixed LSI and VLSI blocks suits the application, then the design time is considerably shortened. When a one-chip microprocessor is not quite suitable, microprogrammable architectures can often provide sufficient customization. Microprogrammable architectures, such as bit-slice, allow closer control over the architecture but not total control. The basic building blocks are still designed by the chip manufacturer for generic applications. Bit-slice architectures include interruptible sequencers and 32-bit ALUs. The customization of the bit-slice modules to an application is done through the customer-designed module interconnection, the implemented commands, and their sequences. The set of commands, or instruction set, is called the microprogram for the design.
ASIC

The 1980s saw the acceptance of ASICs (application specific integrated circuits), VLSI devices large enough to allow designers to implement architectures that were suited to solving the design problem rather than forcing one architecture to solve everything. It was the natural extension to the bit-slice architectures, where some control of architecture was possible through microprogramming but where the basic building blocks were fixed designs. The application-specific customization of the design solution allows the designer to have the creative power of a gate-level breadboard design while keeping the production advantages of VLSI.

Not far behind the ASIC developments, multimedia and design integration saw a need to incorporate analog functions into digital systems. For years the trend had been away from analog design as a chosen career, and now there was a shortage of design engineers. First came massive retraining of internal staff as companies struggled to cope. Then came the creation of the Electrically Programmable Analog Circuit (EPAC) and related devices.

Now designers are coping with 8-12 inch wafers, 1 million gate chips, and deep submicron technologies with a shrinking design time window. For example, the next-generation Pentium chips are mandated to be first-time silicon success. The first took four tapeouts to achieve success.

Table 2 - Integration Sizing Terminology

Acronym | Definition
SSI | small scale integration, where a few gates were lumped together as a means of improving the design and the design process
MSI | medium scale integration, when more gates were packed together in a single chip for the same reasons
LSI | large scale integration, when functional blocks could be contained on a chip
VLSI | very large scale integration and its various offshoots (VHLSI, etc.), where larger functional blocks and their related circuitry could be brought together in lower power, faster chips
ASIC | application-specific integrated circuit
ASSP | application specific standard product
EPAC™ | Electrically Programmable Analog Circuit
ALU | arithmetic-logic unit
CPU | central processor unit
DSM | Deep SubMicron
VDSM | Very Deep SubMicron
SoC | System-on-a-chip
IP | Intellectual Property - precoded functional block for design reuse (Hard-IP, Soft-IP)
Demand And Supply

The number of designers who can successfully complete the design of an array-based circuit through design submission and prototype acceptance is limited. Some estimates as of 1998 are as low as 50,000 engineers in the USA. The demand for array-based circuit designers is already predicted by the periodicals to exceed the supply of trained engineers. The demand for designers capable of fast, efficient and successful design with ASICs is exceeding the supply, and the predictions for the future show a projected shortage. In addition to adding engineers to meet the demand, the productivity of each designer will need to be drastically increased.

Designers must choose from a complex array of new products, new technologies, changing standards, a wide range of support, changes in packaging, varied design tools, and changing design rules, while evaluating cost-effectiveness of the final product. Workstations are evolving, changing platforms, expanding features, and moving from device to board to system level capabilities.

Note: While this book was being written, Daisy went from one of the leading vendors to nothing; Valid transferred to the SUN platform, obsoleting the SCALD system; hardware emulators were beginning to be interesting; virtual memory was recognized as probably useful for the big designs; the average array speed went from 280 MHz to over 1.2 GHz; the ASIC array size went from 1000 gates to over 100,000 gates (30,000 useable); and design rules for the newer arrays were rated as four times more complicated than before.

In the time since, we have reached successful 750,000 gate designs and higher, have reduced technology from 0.35 to 0.18 micron and switched from schematic capture to Verilog or VHDL input. Design tools have advanced to pick up the intermediate steps between the larger packages and tools to remove manual operations and make on-screen design a reality. Array vendors start as many FPGAs and ASICs and are outsourcing their libraries. EDA houses are supplying libraries along with a full design-flow tool set, usually with the intention of being the sole vendor for all of the array designer's needs.

As size grew, simulations became longer and 4K vectors were no longer a reasonable limit for test vectors, packaging was pushed to its limits and beyond, simulators were faced with the need for hardware-assist, timing verifiers became non-unique in the design cycle, frameworks began to be spoken of if not heavily used, and behavioral languages (HDL, VHDL) were accepted in marketing vocabulary and then supported - and are now the accepted design start.

These changes are only some of the ongoing evolution made over the past five years. Pick up any magazine or newspaper devoted to ASIC and at least one article will decry the monumental task facing the design engineer in the 90's and forward. There is a constant need to acquire new skills, understand and master new tools and accept new array design restrictions and features. And not only is the designer faced with the choice of which vendor and what product, but also with the management of the design once started. The design tools that do exist may not work together, making design management a complex and error-prone process.

As with any new technology, the engineer can choose to study the product and its support from the design manuals, datasheets and the literature. ASIC array vendors provide design manuals to assist the designer in completing a successful design submission, that point of transfer between the design and the vendor. Vendors maintain applications support engineers to answer questions and to guide the customer-designer through the submission process. This "earn while you learn" is acceptable in some cases, where design schedules will allow the weeks or months it takes for the engineer to "get up to speed" and to redo those design phases that failed due to misunderstanding of the technology and its limits.
eLearning - Next Best Thing

When I first composed this text, back in the early 1990s, little did I realize how much the industry would leap forward. None of us were prepared for the advance of the Internet, although e-mail had been with the engineering community since the 1960s and FTP had been in use since at least that time. HTML burst upon the scene and several of us clicked on the concept of "living" classrooms on the web almost instantly.

In 2001, Harvard put its entire curriculum on-line (for free). Cadence has put all of its technical training classes on-line (for a fee). Synopsys has begun to put its technical training on the web. The industry has spawned expensive-to-produce CD ROM training, which has not been widely accepted; page-turners (PowerPoint presentations with/without audio and with/without video assist); "live" webcasts or update training; and fully-integrated, true computer-based instruction. The goal is to have "living" technical material that can be updated faster than the two-year cycle for a technical book or the six-month cycle of a technical journal. The web is immediate.

This author has just completed the conversion of the Synopsys Advanced Chip Synthesis 3-day lecture-lab Workshop into the Advanced Chip Synthesis eLearning Workshop, hosted at Vitalect. This is the first of several planned course conversions. The workshops will still be available in ILT form (Instructor Led Training) as well. There is a free on-line Advanced Chip Synthesis Demo featuring one of the workshop Units. You can view the demo at Vitalect, but your browser must be configured with RealAudio and Flash for proper display of animations and to hear the audio scripts. Vitalect features a "Set-Up" page to help you.
Training Classes - Historical Review

ASIC, library and EDA vendors offer training classes where the array product and its peripheral requirements for design submission are presented in intense two to five day seminars and workshops. Because of the structure of a class, the array vendor can attempt to ensure that important issues are discussed or at least brought to the attention of the designers. This reduces the problems that could occur during the acceptance review of the design submission, which shortens the first-time design cycle.

AMCC - Applied Micro Circuits Corporation - offered a three-day array design class and a two-day workstation lab workshop to its customers. This class was taught for seven years, using the same methodology for a range of evolving products: Bipolar Q700 Series, Q1500, QH1500, Q3500 Series, Q5000 Series, Q20000 Series; CMOS Q6000 Series, Q6000A Series, Q9000 Series; BiCMOS Q14000 Series; and Q24000 Series.

The workstations covered included the Daisy Logician (now obsolete), Dazix SUN, VALID SCALD (now obsolete), Valid SUN and Mentor on Apollo. Simulators included Tegas 5 (discontinued at AMCC), Lasar 6 and Verilog. The seminar was also taught at UCSD - University of California at San Diego - as an extension graduate engineering class with credit. The range of series and the variability of the platforms and tools listed for just one vendor demonstrate some of the problems associated with maintaining currency.
Vendor-Independent Training - the Design Flow

With the range of technologies and array families within any technology and the number of workstations, platforms and simulators that support them, a basic design methodology was developed at AMCC to ensure a successful design the first time. This design flow is reflected in the wide variety of design tools and customer education classes offered at Synopsys and Cadence, the two biggest EDA firms. At this moment Synopsys is the industry leader in synthesis tools (Design Compiler, Module Compiler, Behavioral Compiler, DesignWare Foundation, RTL Analyzer, etc.) and Cadence is the leader in Place and Route tools (Gate Ensemble, Silicon Ensemble). The same flow is represented in the design-reuse concept supported by Synopsys and other companies. The flow is used with little variation for the design of a full chip (core and I/O) and for the design of an IP module (core only).
Structured design works. The methods have been developed and tested with hundreds of designs. Any problems seen on submissions and prototypes can usually be traced back to some violation of the stated design methodology. In addition to AMCC and its arrays and the Synopsys CBA Design System, design manuals from other vendors were obtained and reviewed to verify that the basic approach is generic, i.e., technology and vendor independent.

Once a structured design methodology was developed, it was imperative that the presentation be consistent across several instructors. Class notes, usually in the form of slides or overheads, are merely topical outlines and supplements. Few instructors last long if everything is written on the overhead and instruction is a "reading of the screen". The usual procedure is to keep key words and phrases on the screen while the instructor speaks "off the cuff". This approach is acceptable for most subjects. ASIC design is so complex and encompassing an issue that the class content can be driven by active students, so that it emphasizes those areas questioned and de-emphasizes the rest. Classes will therefore vary in the depth of topics covered depending on the students in attendance.
About This Text

An improvement to the process is a text book that can survive the evolving technologies and changing equipment. This text is an attempt to capture the verbal lecture used by the AMCC/UCSD instructor for this course to provide consistency for the classroom and for those who choose to be self-taught. It does not try to duplicate what can be found in the design manuals per se beyond using design examples. The student cum reader is always referred to the most current design manual and datasheet for the array, array series, or vendor support software of interest.

This text will present the basic structured design flow, show how the various steps interconnect and how they may be performed, and provide checklists of items that should be known prior to design start. It is designed to support any vendor and any array. For those who were taking the class for credit, chapter exercises were provided to allow the students to perform exercises using the equipment and materials of interest to them.

At the time, schematic capture was the methodology; today we have VHDL and Verilog. Daisy, Mentor and Valid workstations centered on schematic capture are no longer found in most engineering environments. The SUN workstation, the NT platform, HP and LINUX are the modes of operation today, using advanced software tools to complete the tasks formerly done by hand.
Introduction
Introduction to Chapter 1

Application-Specific Integrated Circuits (ASIC) [1996]

Application-specific integrated circuits (ASICs) fit between the detailed full-custom circuit designs and the off-the-shelf predesigned components. They offer the designer a faster method of tailoring the circuit to the task while retaining most of the fast design turn-around time offered by predesigned parts.
The Array

An ASIC array is a single die from a production wafer. In the 1990s, it was generally two or three layers of metalization placed on top of a base array. Figure 1-1 provides an overview of the steps involved in building a semi-custom array. By 2001, the average had climbed to six layers of metalization. At least two layers are usually reserved for power-ground planes. The layers in the base array varied with the process, with 26-28 layers in the base die being a reasonable assumption.

Figure 1-1 Semicustom Array Processing

The base array is predesigned by the array vendor. It consists of the layers required to define the cells and the components within them. These components vary depending on the type of cell and the array family. They are resistors, diodes, and transistors (bipolar or CMOS), with capacitance and impedances implied in the layering. The threshold voltage generators and other overhead circuitry will also be included in the base design.

[Figure: WAFER (multiple die) to DIE (individual array)]
The array designer will have already determined where the fixed power and ground pads are located, how many types of cell there are and how many of each type per array, and what design rules are required in the use of the array. The base array is premanufactured, reducing the turn-around time of the design between design acceptance and prototype or production. CBA Design System designers had the privilege of designing their own base die, including punch-outs for hard IP blocks, and power-ground routing for RAMs and soft IPs.

The wafer is put through wafer-sort to determine good and bad die. The die is a pre-packaged part which can be and is tested. When packaging is completed, the packaged part is retested. Wafer verification software (Dracula comes to mind) must verify all layers of the wafer, metalization and the base die, and verify that all IP blocks and memory blocks are properly connected. Hard IP blocks interconnect or "stitch" into all levels of the base die.
Customization

The customization of the array comes from the interconnect of the base array components. The interconnect is both the intraconnect between components within a cell to form a function, called a macro, and the interconnect between the macros to form the circuit module. One or more modules may be placed on an array. The interconnect between macros is considered the routing or nets. Routability is a measure of the ability to transform the design to physical metal etch patterns or the metalization of the array.

The macros are formed by a predefined layout pattern that is not considered part of the routing problem. Macros may exist with several "footprints", which allow them to be positioned with different layout aspects. They also exist in different drive versions, which may also cause differences in the layout pattern. Switching a macro from one drive configuration to another may require its relocation in the circuit layout.

With the high-speed arrays already available, the time delay or propagation delay through an interconnect net under heavy loading conditions may exceed the propagation delay through a macro. Priority pre-placement, design optimization for speed and other design approaches must be used to control the interconnect delays.
For DSM technologies, any technology below 0.18 micron, it is given that the interconnect delays will represent approximately 70% or more of the timing path delay. These technologies require pro-active design methodologies to be successful. Design partitioning, placement, and careful constraints are all required for a successful DSM design.
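The 70% figure can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name and the delay values are hypothetical, chosen so that the nets account for 70% of the path, and it assumes the path delay is simply the sum of macro delays and net delays.

```python
def interconnect_share(macro_delays_ns, net_delays_ns):
    """Return the fraction of total path delay contributed by interconnect nets."""
    macro_total = sum(macro_delays_ns)
    net_total = sum(net_delays_ns)
    total = macro_total + net_total
    if total <= 0:
        raise ValueError("path delay must be positive")
    return net_total / total


# Hypothetical path: three macros, each followed by one net (values in ns).
macro_delays_ns = [0.10, 0.15, 0.05]  # delay through each macro
net_delays_ns = [0.25, 0.30, 0.15]    # delay through each interconnect net

share = interconnect_share(macro_delays_ns, net_delays_ns)
print(f"interconnect share of path delay: {share:.0%}")
```

When the share is this high, speeding up the macros does little for the path, which is why partitioning and placement carry so much weight in DSM design.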
Design Tools - [2001]

In the 1990s the industry began to shift to EDA tools to handle the increased complexity of the ASIC designs. Any reasonable engineer can handle a design of up to 30,000 gates. When 6 million gates are involved, it would take multiple engineers years to complete one design. By the 1990s designs shifted from schematic capture, with the engineer selecting the appropriate macros from a library, to HDL code. VHDL is currently used in Europe and Verilog is currently used in the United States.
Design Tools - [1990s] To perform a logical circuit design for an array-based circuit, the designer may choose between schematic capture, direct netlist creation, and the use of behavioral languages such as HDL and VHDL. Netlist generation as was done using Tegas is too tedious an approach for ASIC-based circuits past a minimal size. Netlist generation via a behavioral language or from schematic capture is the more usual approach. Translation programs exist to move a netlist in one format to a netlist in another format. The industry is still trying to expand the idea of EDIF, a common netlist that would allow input to any simulator and any placement system. For example, Verilog-to-Mentor translation is now possible using a Verilog netlist to create Mentor schematics. (Back-generation of schematics will remain a necessary step in spite of the push for behavioral descriptions as the preferred design tool.) Once an acceptable netlist has been generated by whatever means, the designer needs to check or verify that the design rules have not been violated. When the circuit is certified as acceptable and buildable, the circuit must be simulated according to the design submission requirements of the chosen vendor. The simulation must be checked. The design must be documented. Simulations involve control programs, stimulus generation, annotation delay files and descriptions. AC test analysis requires additional documentation. Which simulator can be used, and whether any timing verifier or other tools are available, is limited to what the array vendor supports. The simulation output files must be formatted according to vendor rules to allow the generation of test vectors. These will be transferred to the placement software and to test-generation software. A submission may include dozens of files that must be tracked, controlled for revision level and managed to verify that the design submitted to the vendor is the one intended to be submitted. And yes, errors do occur.
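The file-tracking problem described above can be partly mechanized. The following is a minimal sketch, not a vendor tool: the function names and the file names are hypothetical, and the file contents are inlined only to keep the example self-contained. It records a content fingerprint for each submission file at the moment the design is frozen, then reports any file that differs at submission time.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying one file's content."""
    return hashlib.sha256(data).hexdigest()

def build_manifest(files: dict) -> dict:
    """Map each file name to the fingerprint of its content."""
    return {name: fingerprint(data) for name, data in files.items()}

def changed_files(frozen: dict, current: dict) -> list:
    """List files that were added, removed, or altered since the freeze."""
    names = set(frozen) | set(current)
    return sorted(n for n in names if frozen.get(n) != current.get(n))

# Hypothetical submission contents (in practice these would be read from disk).
frozen = build_manifest({"design.net": b"netlist rev A", "func.vec": b"0101"})
current = build_manifest({"design.net": b"netlist rev B", "func.vec": b"0101"})
print(changed_files(frozen, current))  # ['design.net']
```

An empty list means the files being submitted are byte-for-byte the ones that were simulated and checked.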
Framework Systems Framework systems are under development as the means of alleviating the design management problem but they are in their infancy and industry sages are predicting at least five years before they meet any goals. Further, those developing framework systems disagree about those goals. There are four basic functions of a framework agreed upon:
● integration of design tools
● provide a common user interface
● manage the design data
● manage the design process
The integration of design tools includes tools from non-framework vendors. Allowing access to different design tools requires that the interface to those tools be reasonably similar and easy to use. (The Macintosh computers have proven the merit of similar and easy interface to tools and common databases.)
Array Selection as the First Task Whatever the framework systems end up providing, the basic design flow that exists today will remain intact. The first and most difficult task of array selection will not change, nor will the basic goals of the current design methodology. It is the ease of satisfying those goals that will change. The process of selecting an implementation for a circuit involves two basic decision processes.
● First, a decision must be made on the technology that will satisfy the design criteria for power and speed.
● Second, a selection must be made from the components (arrays, macro, IP, I/O, etc.) available within those technologies.
Even with all the changes made in software tools, these two key items remain unchanged. Choose the process, which defines the technology, and then choose the components, for even with high-level synthesis, the astute designer can "guide" the software to a better solution. The software (Synopsys, Cadence, Avant! are the big three) is chosen by the designing group with input from the selected foundry as to the product design flow.
Design Options The choices listed in Table 1-1 are available to the designer for whom off-the-shelf and bit-slice microprogrammable architectures are not good enough: full-custom arrays, semi-custom arrays, and simple-custom (gate) arrays.
Full Custom Arrays If the bit-slice or off the shelf microprocessor solution is not adequate, the next option may be a customized design. Full customization for an application-specific design is not practical in individual components at the SSI/MSI level. Instead, one or more custom semiconductors can be designed that are specifically for and only for the application. The customized VLSI chip may be totally designed by the customer from the design of the components present in the individual cells (resistors, diodes, transistors, etc.) to the interconnect between these components in one cell and other cells.
Table 1-1 Design Approach Comparisons
FULL CUSTOM              | SEMI-CUSTOM                  | PREFAB
multiple layers (18-20+) | 2-3 layers                   | 0 layers
fastest (maybe)          | faster (maybe)               | fast
smallest (maybe)         | smaller (maybe)              |
longest design cycle     | moderate design cycle        | fastest design cycle
most control over design | moderate control over design | no control (fixed architecture)
All mask layers required to implement the full custom design must be generated specific to the application. Prototype and debug must encompass all layers. This approach will provide the smallest silicon and the most nearly optimal solution if the designer is experienced. It can have the longest prototype time. The key is the required expertise of the designer. The number of designers who can successfully design a fully customized array is significantly smaller than the number who can successfully design an MSI/LSI PC board. Depending on the manufacturer, a macro or standard cell library may exist that can speed the design time if the cells and macros are suitable for the application. The internal macro interconnects would still run through all mask layers. Design time may be reduced at the cost of some flexibility, but prototype time would remain lengthy. The advantage of the macro library is to help the designer by providing common functions while lessening the experience level required for a successful design.
Semi-Custom Arrays A compromise between off-the-shelf modules and full custom semiconductors is semi-custom design. Semi-custom combines a manufacturer-designed base wafer with all components in place (resistors, diodes, transistors, etc.) and a customer-generated interconnect pattern to implement the desired circuit. (Refer to Figure 1-2.)
A SEMI-CUSTOM ARRAY CONSISTS OF:
● Base Wafer
● Macro Intra-connects
● Placement
● Interconnects
The interconnect pattern, also called a netlist, is generated from the customer-designed schematic and restricted to the topmost mask layers. Most arrays require two metal layers and a via (through hole) mask layer. Some arrays require three metal and two via layers. Three-layer arrays may use two layers for global interconnect and the third for macro intraconnect, but there are no hard rules. The more layers, the more prototype debug time required. This may be offset by the significant gain in power management possible with the third layer. The schematic for a semi-custom array-based circuit is built up from a library of released macros that represent SSI, MSI, and sometimes LSI functions. If a different macro is needed from those in a released library, the manufacturer, for a fee, can usually generate a special custom macro (cell and component dependent). Most manufacturers prefer that the released macros be used.
Figure 1-2 Circuit Composition
Semi-custom arrays allow a designer to create at the SSI-MSI level, with familiar functions, without a detailed knowledge of the underlying technology. Semi-custom arrays may themselves contain elements of bit-slice components, allowing both the hardware and the software to be tailored to the application. For example, at least one CMOS array uses the AMD Am2909 sequencer as a macro. If the designer is experienced and familiar with the macro library, the resulting silicon usage may approach that required by the best full custom design. CBA (cell-based arrays) and the now more-popular standard-cell libraries [popular as of 1999] are macro collections. The differences between them involve how they are built in the sub-strata of the base die. CBA designs led the size war for some time; standard cells now typically produce smaller die sizes. Approximate estimates were for 10,000 array-starts in 2000, split 50-50 between these two technologies, the first time standard cells had come on so strong. Metalization layers are the customizable layers in a semi-custom array. Metalization, which sits on top of the base die, currently runs to 6 layers, 4 for interconnect and 2 for power-ground, although this may vary. The number of layers of metalization is expected to increase. Keep in mind that between each pair of routing metalization layers is a layer of vias, the vertical interconnects.
Simple Semi-custom Devices At the simplest end of the semi-custom spectrum are gate arrays, providing one level of interconnect to the user for specification with all other connections defined. Programmable devices such as PLAs (programmable logic array), PALs (programmable array logic), field-programmable muxes, sequencers, gate arrays, and other modules are available for limited quantity applications. (PLAs allow both AND and OR gates to be programmed; PALs allow only the AND gates to be programmed.) Programmable devices are restrictive in the functionality provided. They are suitable and competitive when there is a match between a module and the current application. Field-programmable devices are to VLSI what the ROM/PROM is to the microprocessor, i.e., they support and enhance the design project. These devices provide board-clean-up functions, incorporating the simple functions that do not fit into a full semi-custom array or that were found necessary to augment a bit-slice or fixed-instruction-set design. They are still with us.
Selection The choice between full-custom, semi-custom, fixed or simple gate-level custom is based on several factors. These include: architectural requirements, interface technology requirements, size restrictions, speed (maximum worst-case operating frequency), power limitations, power supply options, manufacturing cycle time, cost, packaging options, and design time. Figure 1-3 characterizes the problem.
Basis for Discussion The discussion in this text will refer primarily to Applied Micro Circuits Corporation arrays for examples of current technology. These include: Bipolar arrays: the Q5000 and the Q20000 Series; and BiCMOS arrays: the Q14000 and Q24000 Series. However, the design methodology can be applied to any arrays from any vendor for any array technology and to any future arrays developed by AMCC and the other array vendors.
The design methodology is generic. It is vendor and technology-independent.
WHERE DO YOU START? Figure 1-3 The Selection Problem
Note: Later chapters in this text refer to engineering workstations (EWS) and the methodology for their use in the design process. Workstations that are specifically referred to are: the Mentor Graphics System on Apollo and the Valid on SUN. Simulators referenced include Verilog on SUN4 and Lasar 6 on the VAX under VMS. The basic tools required for a design remain the same regardless of the workstation, platform, framework or mainframe used.
Circuit Architecture A fixed-instruction set microprocessor or sequencer has a predefined architecture and instruction set. A bit-slice solution places some constraints on the designer in terms of architecture but leaves most of the definition to the user by way of the selected interconnections between bit-slice modules and the microprogram control. An SSI/MSI implementation allows the designer to specify in complete, exact detail the architecture desired. The SSI/MSI design can be implemented in full custom or semi-custom VLSI. Bit-slice modules can be emulated on arrays. The ASIC arrays are big enough to support a complex ALU module but not yet large enough for one array to replace a full microprocessor.
Which Array Technology? The broad categories of technologies are CMOS, BiPOLAR, BiCMOS, and GaAs. Figure 1-4 provides a family tree of the most common technologies, at least at this moment. Array technology is a subject in itself and the reader is referred elsewhere for detailed discussions on any specific process. Figure 1-4a The Dominant Technologies
Bipolar as used in conjunction with arrays in this text refers to ECL internal with TTL, ECL 10K, ECL 100K I/O modes, or mixed ECL/TTL interface capability. Not all arrays offer the ability to mix TTL and ECL or to mix ECL 10K and ECL 100K on one chip. Some arrays may limit the types of macros that can be placed on the I/O cells. Design limits imposed by these restrictions are generally based on the array technology. The AMCC BiCMOS has the same interface capability as the bipolar arrays while providing a CMOS internal core. BiCMOS interfaces include CMOS, TTL, ECL 10K and ECL 100K and combinations of all. Not all BiCMOS arrays offer the ability to mix TTL and ECL or ECL 10K and ECL 100K on one chip. Figure 1-4b Relations among Silicon Technologies
Ref: Design of VLSI Gate Array ICs by Ernest E. Hollis
Technology differences for VLSI are primarily speed and power. CMOS is lower speed, lower power. Bipolar at 600MHz or 1.2GHz and up is faster with a high power dissipation (5-7 watts, and up to 16 watts for the fastest arrays, is not unusual). BiCMOS is intended to be a combination of these two, providing a reasonable speed (about 130MHz and up) at greatly reduced power dissipation. The actual maximum frequency of operation and the power dissipation will vary from series to series even within the technologies. Data sheets for the array series of interest should be reviewed and compared as a first method of estimation for applicability.
Obtain Data Sheets from several vendors Note: One array series may be lower power at one frequency and higher power at another. Comparisons must be made using equivalent conditions. When the conditions are not specified, ask! All vendors maintain Field-Application Engineers that can explain how measurements were taken or what assumptions were used. Figure 1-4c Relations among Technologies
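A first-pass data-sheet comparison of the kind described above can be sketched as a simple filter. This is an illustration only: the series names and every number in the candidate list are hypothetical placeholders (loosely echoing the rough ranges quoted above), not data-sheet values, and the power figures are assumed to have been estimated under equivalent conditions.

```python
from dataclasses import dataclass

@dataclass
class ArraySeries:
    name: str
    technology: str
    max_freq_mhz: float   # worst-case maximum operating frequency
    power_w: float        # dissipation estimated under the same conditions

def shortlist(candidates, required_mhz, power_budget_w):
    """Keep series that meet the speed target within the power budget,
    lowest power first."""
    ok = [c for c in candidates
          if c.max_freq_mhz >= required_mhz and c.power_w <= power_budget_w]
    return sorted(ok, key=lambda c: c.power_w)

# Hypothetical entries; real values must come from the vendors' data sheets.
candidates = [
    ArraySeries("series_a", "bipolar", 1200.0, 7.0),
    ArraySeries("series_b", "BiCMOS", 130.0, 2.0),
    ArraySeries("series_c", "CMOS", 60.0, 0.5),
]
print([c.name for c in shortlist(candidates, 100.0, 5.0)])  # ['series_b']
```

Because one series may be lower power at one frequency and higher at another, the power figure has to be re-estimated for each target frequency before the filter is meaningful.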
Size The physical size limitations imposed on a design can dictate the design approach.
Base Arrays Base arrays come in a variety of sizes, usually specified in terms of equivalent gates. The arrays discussed herein range from 250 to 28000 gates, depending on the computational approach used. Equivalent gates allow a relative sizing between arrays of the same technology. The gate used as an equivalent gate for bipolar arrays is the NOR gate; that used for BiCMOS arrays is the NAND gate. Equivalent gate sizing can be misleading. For a CMOS array, one gate is typically one cell. For the Q24000 Series BiCMOS arrays, one internal cell is approximately 4 equivalent gates. Today's arrays are custom-designed to the project. The determination of the die size and the number of I/O is computed from initial evaluations based on the specification.
Cells The actual cells available on a bipolar array are larger and more complex, and can support a large variety of macros. A Q5000 Series logic cell (internal) can support: a 4:1 MUX, a 1:4 decoder, a scan-set D F/F, an 8-input OR/NOR, three latches or 2 D flip/flops. A 4-bit universal register (4 4:1 MUXs and 4 D flip/flops) requires 4.5 logic cells. A 4-bit carry-look-ahead adder with carry-out requires 5 logic cells. The 4-bit carry-look-ahead adder in the Q14000 Series BiCMOS arrays macro library requires 14 basic cells or 56 gates. The Q20000 Series L-cell is sized based on one Turbo output per cell and is smaller than a Q5000 cell. A flip/flop that uses 1 cell in the Q5000 Series may use 3 cells in the Q20000 Series. Estimating cell counts requires access to the macro library. Basic sizing information such as cell counts and die sizes can be obtained from the data sheets. Many circuit modules can be equated to cell counts by the specific array vendor. These estimates can be used for initial circuit sizing.
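The initial sizing described above amounts to a weighted sum over the macros in the design. The sketch below uses only the Q5000 cell costs quoted in this section; the macro names are hypothetical labels rather than AMCC library names, and the example datapath is invented for illustration.

```python
# Logic-cell cost per macro for the Q5000 Series, from the figures quoted above.
# The macro names are hypothetical labels, not AMCC library names.
Q5000_CELLS = {
    "mux_4to1": 1.0,
    "dff": 0.5,            # two D flip/flops share one logic cell
    "univ_reg_4bit": 4.5,
    "cla_adder_4bit": 5.0,
}

def estimate_cells(macro_counts, cell_table):
    """Sum the cell cost of every macro instance in the design."""
    return sum(cell_table[name] * count for name, count in macro_counts.items())

def cells_to_equivalent_gates(cells, gates_per_cell):
    """Convert a cell count to equivalent gates for relative sizing."""
    return cells * gates_per_cell

# Invented 16-bit datapath: four adder slices, four register slices, six flip/flops.
datapath = {"cla_adder_4bit": 4, "univ_reg_4bit": 4, "dff": 6}
print(estimate_cells(datapath, Q5000_CELLS))   # 41.0
print(cells_to_equivalent_gates(14, 4))        # 56
```

The second call reproduces the Q14000 adder figure (14 basic cells at 4 gates per cell is 56 gates). Note that summing primitive costs overestimates packed macros: four 4:1 MUXs plus four D flip/flops would total 6 cells by this table, while the released 4-bit universal register needs only 4.5, which is why the estimate should use the macro library's own figures wherever they exist.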
Array Size - Die Size A full custom design may or may not be smaller in die size than a semi-custom design. For a heavily populated array, the differences may be insignificant. The comparison must be based on the specific application and the skill of the designer.
Packaging For arrays, the die size, the number of I/O pads, and the number of power and ground pads used affect available packaging. A number of standard packages are usually available for each array and the data sheet for an array series will provide the designer with an initial table of available packages. If less than the maximum number of I/O cells is used, some smaller packages may be usable. The package selection affects package pin capacitance (which affects loading delay for output pins), junction temperature computations and cooling considerations, and final cell placement (which also depends on the pin capacitance). The selection should therefore be made well before final design completion.
Word Length The word length necessary for the system, whether a computer, controller, signal processor, etc., is known in advance. This is seen as the width of registers, partitioning of counters, width of adders, and number of simultaneously switching outputs (SSOs). It affects the partitioning and modularity of the design. The adders available with a macro library are typically 4-bit adders, cascadable with the carry-look-ahead macro to build a range of standard adder sizes. With a macro design, the available MSI macros and SSI logic can be used to provide a range of nonstandard word lengths. Counters are typically 4-bits wide, expandable to 12 or 16 bits in width. Comparators are modulo 6. Registers come as 4-bit widths and latches as 8-bits (octal latch). Larger macros are also under development or custom structures may be possible.
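Because the macros come in fixed widths, the word length translates directly into a slice count. As a small sketch (the function name is hypothetical), the number of 4-bit macros needed for a given word, and the bits left over in the last slice, can be computed as follows:

```python
def slices_for_word(word_length, slice_width=4):
    """Return (slice count, unused bits) for a word built from fixed-width macros."""
    slices = -(-word_length // slice_width)   # ceiling division
    return slices, slices * slice_width - word_length

print(slices_for_word(16))   # (4, 0)
print(slices_for_word(10))   # (3, 2)
```

A nonstandard length such as 10 bits leaves part of the last slice unused; that is the case where SSI logic or a custom structure may be considered instead of another full MSI macro.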
Instruction Set The instruction set that the system is to support is another major influence on the design implementation selection. By building a custom or semi-custom array, the hardware can be configured to support any instruction set yet have the advantages of still being a VLSI solution.
Speed The maximum frequency of operation specified for the circuit must be compared to that available for the array series or the off-the-shelf components. The nature of the design may make it necessary to look at the toggle frequency of the internal functions. The maximum frequency of operation, of interface as well as internal macros, is very important but it is not the only consideration when evaluating the performance that can be achieved. Due to loading delays, the final performance will depend heavily on the implementation possible with the given macros or possible custom macros, their drive factors and load limits.
Achievable speed is a function of both the experience of the designer in general and the macro library in specific. As an example, three implementations of a test circuit were made with the Q3500 Series and they varied from 145MHz to 233MHz (worst case maximum speed limits). The variance was found to be solely a function of the macros selected. This type of performance variance can be repeated for almost any circuit of any reasonable size. Speed, cell utilization (silicon density) and power can be traded off among the different possible implementations. This diversity is an advantage as well as a design challenge.
Macros - Libraries - Etc. The existence of an extensive macro library, or even one that supports the circuit function for the application at hand, can sway a decision as to which product to select. For the arrays of interest, the designer needs to review the existence of a macro library. If the array has a macro library, review the macros available for application to the intended design.
Macro Library In an array macro library, macros already released are available without delay. They represent pre-modeled, pre-simulated, pre-verified logic blocks. Their interconnect patterns are already defined for the various mask levels.
Custom Macros If custom macros are needed for a semi-custom array library, they involve 2-3 mask layers. If a custom macro has to be built for addition to a full custom array library, it is a multi-mask level design task.
Silicon Compilers Silicon compilers provide a translation from a design description to pre-defined macros. They provide support for designers who wish to stay at a higher level in the design process. A silicon compiler can be compared to a software compiler: it will speed the design process for the engineer at the cost of some flexibility. As with framework systems, the industry has no set standard to measure or define exactly what a silicon compiler can do. They remain in isolated use, faced with the same resistance that software compilers met on their first introduction.
Other Support Regardless of the design implementation, a certain amount of software design support is required. Error checking, annotation, simulation, testability analysis, fault-grading, and vector rules checking are some of the support areas pre-layout. After placement, there are placement rules checking, bus current checking for those arrays which require it, finalization of overhead current computations (for those arrays with programmable overhead), and finalization of power dissipation computations.
Design-Support Issues The basic questions involving design support which must be asked when selecting any array include:
1.) Which workstations are a prospective library or parts catalog available on? What mainframe? Is the library accessible from a customer site or must dial-up be used?
2.) What error checking at the schematic level is available? Are there engineering rules checks (ERCs) to check on valid names, fan-out loading, population counts, current sums, power dissipation, technology mix-ups, array pad count, and interconnection restriction violations? These need to be caught before simulation.
3.) What about Front-, Intermediate- and Back-Annotation? These are needed for metal length and load evaluation and the impact of these on the timing. The ability of the annotation software to handle rise and fall load factor differences and metal layer differences needs to be clearly identified. Is there provision for output capacitive load (system and package pin capacitance)?
4.) Are there support tools for simulation? Simulation control files, reformatters, and vector checking are required. Timing verifiers are important when path matching is required. Other software that is useful for bit-slice, all arrays and any microprogrammable architecture device is a meta-assembler. This software allows a program or vector set to be described in a user-defined language (a pseudo-assembler) and compiled to ones and zeros. It provides the designer with the ability to code the vectors in pseudo-English for readability. An example is MICRO2 from Digital Equipment Corporation. Also for simulation, what about automatic test generation (ATG)? Are design-for-test (DFT) macros and support software available to allow the use of this tool?
5.) How does placement enter into the design sequence? This would be board placement for components or cell placement for a semi- or full-custom design. Does the software offer some assistance to the user in drafting a placement file? What checking software is provided either on the workstation or is accessible by dial-up?
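The meta-assembler idea in item 4 can be illustrated in a few lines. This is a sketch of the general technique only and does not reproduce the syntax of MICRO2 or any other product: the field names, widths, and mnemonics are hypothetical, chosen by the user as the text describes.

```python
def assemble(line, fields, symbols):
    """Translate one 'FIELD=MNEMONIC ...' line into a string of ones and zeros.

    fields  : ordered list of (field name, width in bits)
    symbols : field name -> {mnemonic: integer value}
    """
    tokens = dict(tok.split("=") for tok in line.split())
    bits = ""
    for name, width in fields:
        value = symbols[name][tokens[name]]
        bits += format(value, f"0{width}b")   # fixed-width binary field
    return bits

# Hypothetical user-defined microword layout: 2-bit sequencer, 3-bit ALU field.
fields = [("SEQ", 2), ("ALU", 3)]
symbols = {
    "SEQ": {"CONT": 0, "JUMP": 1},
    "ALU": {"ADD": 0, "SUB": 1, "AND": 4},
}
print(assemble("SEQ=JUMP ALU=AND", fields, symbols))  # 01100
```

The same table-driven approach serves for test vectors: each pin group becomes a field and each readable mnemonic expands to its bit pattern.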
Workstations, Mainframes, Dial-up When evaluating an array library on a workstation, there must be a match between the operating system, the graphics editor and simulators and the macro library. Each installation document for a line of workstations specifies the versions of the vendor software with which the library is compatible. Check with the vendor summaries published by several technical magazines for an initial review or check with the array vendor for a more updated list of equipment and software compatibility. Most array vendors offer support for several workstations. The workstations are not restricted to semi-custom or single array design support. They offer component libraries for board design through simulation. Multiple-array simulations are possible if the array is correctly modeled and there is enough memory.
Schematic Rules Checking Each workstation has a modest schematic checking pass that it makes on the way to generating the workstation-specific netlist. The error reports from these checking routines should be checked and all pertinent errors removed. If a partial circuit is being compiled, there may be interconnect errors that need to be ignored. The checks are not exhaustive, but later software will assume that these checking routines were successfully passed. Workstation checks include one-ended nets, undriven page inputs, page outputs with no destination, naming confusion, missing blocks, and an attempt at duplicate name detection. AMCC provides engineering rules checking (AMCCERC) for commonly made schematic interconnect and design errors including too many cells for the array checked by cell type and macro type, too many fan-out loads, improper connections for 3-state and bidirectional enables, improper characters in names or too long names, improperly connected wire-ORs, dangling pins, grounded outputs, and terminated inputs. It is one of the most complete packages in the industry today. As a part of the AMCCERC package, internal current, worst-case power dissipation for bipolar arrays, fan-out loading tables, simultaneously switching outputs reporting and power-ground checking, an I/O list, a package data list and a detailed population report are generated. Once placement is completed, these reports have a final form that becomes part of the device specification.
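Two of the checks named above, one-ended or undriven nets and excessive fan-out, reduce to simple passes over the netlist. The sketch below is not AMCCERC and makes simplifying assumptions: the netlist representation, the pin names, and the fan-out limit are hypothetical, and every load is counted as one unit, whereas a real checker would sum the load factors from the macro library.

```python
def check_nets(nets, max_fanout):
    """Return (net name, problem) pairs for a netlist.

    nets maps a net name to (list of driving pins, list of load pins).
    """
    errors = []
    for name, (drivers, loads) in sorted(nets.items()):
        if not drivers:
            errors.append((name, "undriven"))
        if not loads:
            errors.append((name, "one-ended"))
        if len(loads) > max_fanout:
            errors.append((name, "fan-out exceeded"))
    return errors

# Hypothetical netlist fragment.
nets = {
    "CLK": (["U1.Y"], ["U2.CK", "U3.CK", "U4.CK"]),
    "N1": (["U2.Q"], []),
    "N2": ([], ["U3.D"]),
}
for net, problem in check_nets(nets, max_fanout=2):
    print(net, problem)
```

When only part of a circuit is compiled, the one-ended and undriven reports at the partition boundary are the interconnect errors that the text says may need to be ignored.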
Annotation Front-Annotation is the estimation of interconnect (pin to pin) delays in an array due to electrical fan-out loading, electrical wire-OR loading and estimated metal loading. The metal load delay estimate is a statistical estimate based on the net size. It is available pre-placement. Intermediate-Annotation uses a refined estimation of the metal load delay based on the relative placement of the individual macros in an array. The electrical fan-out loads and electrical wire-OR loads remain the same. Intermediate-Annotation is generated post-placement but pre-routing. Back-Annotation uses the final, actual metal load delay computed from the known metal lengths for the metal layers involved in the interconnect. It is available post-routing.
The availability of the annotation software, its ease of use, and the ease of integration into the simulation database are important concerns. Output capacitive load delays for system capacitive load and package pin capacitance affect the overall path delay. The ability to specify these loads and to have their delays included in the simulation database is another item of concern. If this feature is not available, the computation must be manually performed.
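The three annotation stages differ only in how the metal term is obtained; the electrical fan-out and wire-OR terms stay the same. The sketch below illustrates that structure for the front and back stages. Every coefficient is a hypothetical placeholder rather than AMCC data, a simple linear model is assumed, and rise/fall and per-layer differences are ignored.

```python
# All coefficients are hypothetical placeholders, not vendor data.
K_FANOUT_PS = 20.0     # delay per electrical fan-out load
K_WIRE_OR_PS = 30.0    # delay per electrical wire-OR load

def metal_delay_front(net_pins, ps_per_pin=15.0):
    """Pre-placement: statistical estimate based on net size only."""
    return net_pins * ps_per_pin

def metal_delay_back(metal_length_um, ps_per_um=0.0625):
    """Post-routing: computed from the actual routed metal length."""
    return metal_length_um * ps_per_um

def net_delay(fanout_loads, wire_or_loads, metal_ps):
    """Interconnect delay: electrical loading terms plus the metal term."""
    return fanout_loads * K_FANOUT_PS + wire_or_loads * K_WIRE_OR_PS + metal_ps

front = net_delay(3, 0, metal_delay_front(4))      # estimate before placement
back = net_delay(3, 0, metal_delay_back(800.0))    # final value after routing
print(front, back)  # 120.0 110.0
```

An intermediate-annotation function would slot in the same way, taking an estimated length from the relative placement of the macros instead of the routed length.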
Simulation Support Every simulator has its own unique format requirements for simulation input files. These cover the stimulus, its switching waveform, the operating condition (military, commercial, nominal or minimum) library, sampling rates or print-on-change recording, the output file format, and the input file format if a binary file can be read. The workstation may offer several methods of simulation and timing verification. The vendor may only accept certain files or file formats. List and waveform displays are available on the three previously listed workstations. Data can be displayed in binary, octal, decimal and hex format.
Reformatters If a standard simulation vector format is required by the array vendor or by software to which the simulation results must be submitted as data, some means of reformatting must be available. For arrays, the functional, parametric, and AC test simulation results are generally used as input to test vector generation software, and the allowed input formats may be restricted.
Example AMCC accepts only binary results for specific signals (input, output, bidirectional, 3-state and bidirectional enable internal signals). Sample size is restricted. No print-on-change results are used for functional simulations, only sampled results. No waveforms are requested. Since there are different simulation output formats, AMCC customers use a reformatter to translate Dazix, MENTOR, Verilog, Lasar and VALID simulation output files into a generic format. If any other workstation is used, the output of that simulator must also be reformatted. AMCC uses their AMCCSIMFMT software to transpose output files into an AMCC generic interface format that their test software programs can read.
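The core job of such a reformatter is converting a simulator's print-on-change record into evenly sampled binary vectors. The following is a sketch of that conversion only; it is not AMCCSIMFMT, and the input tuple layout and the output row format are hypothetical.

```python
def sample_changes(changes, signals, step, end_time):
    """Convert print-on-change events into evenly sampled vector rows.

    changes : list of (time, signal, value) sorted by time, value '0' or '1'
    signals : column order for the output rows
    """
    state = {s: "X" for s in signals}   # unknown until first assignment
    rows = []
    i = 0
    t = 0
    while t <= end_time:
        # Apply every change that has occurred up to this sample point.
        while i < len(changes) and changes[i][0] <= t:
            _, sig, val = changes[i]
            state[sig] = val
            i += 1
        rows.append("".join(state[s] for s in signals))
        t += step
    return rows

# Hypothetical print-on-change record for two signals.
changes = [(0, "A", "0"), (0, "B", "1"), (12, "A", "1"), (25, "B", "0")]
print(sample_changes(changes, ["A", "B"], step=10, end_time=30))
# ['01', '01', '11', '10']
```

Any "X" left in the output marks a signal that was never initialized, which would have to be resolved before the results could be accepted as binary.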
Rules Checking Regardless of the implementation selected, the design must be simulated and the parts tested. There may be a number of functional, parametric and AC test simulation vector rules that must be followed to ensure correctness in the test program. The rules are based on tester limitations, test procedures and test objectives. The rules required by the array vendor must be clearly stated and it is increasingly desirable to have some form of rules check software available to help the designer. AMCC supplies a vector checker, AMCCVRC, to catch the more blatant vector rule violations such as missing required signals, too many signals switching in one vector causing noise, race conditions, undesired internal signals in the output and uneven sampling steps. Some basic toggle tests are also included.
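Two of the violations listed above, too many signals switching in one vector and uneven sampling steps, can be detected with a short pass over the sampled vectors. This sketch is not AMCCVRC; the switching limit and the message wording are hypothetical, and each vector is assumed to be a string of equal length.

```python
def switching_counts(vectors):
    """Count how many signal positions change between consecutive vectors."""
    return [sum(a != b for a, b in zip(prev, cur))
            for prev, cur in zip(vectors, vectors[1:])]

def check_vectors(times, vectors, max_switching):
    """Report uneven sampling steps and vectors with too many signals switching."""
    violations = []
    steps = {t2 - t1 for t1, t2 in zip(times, times[1:])}
    if len(steps) > 1:
        violations.append("uneven sampling steps")
    for i, n in enumerate(switching_counts(vectors), start=1):
        if n > max_switching:
            violations.append(f"vector {i}: {n} signals switch")
    return violations

# Hypothetical sampled results: the last sample is 15 time units after the previous one.
times = [0, 10, 20, 35]
vectors = ["0000", "0001", "1110", "1110"]
print(check_vectors(times, vectors, max_switching=3))
# ['uneven sampling steps', 'vector 2: 4 signals switch']
```

The same switching count, restricted to the output columns, gives the simultaneously switching outputs figure used for noise checking.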
Submission Assistance The design submission process for custom and semi-custom arrays requires that a number of specific forms, files and validation procedures be followed, and the process is increasingly complex. Automation of that procedure is one desirable goal. Automation support is feasible for the I/O signal list, package pad-pin-post, capacitive load and I/O toggle frequency descriptions, design validation checklists and design submission checklists, including simulation submission. If no automated support is available, the necessary forms must be reviewed and filled in manually. Errors and incomplete information can lead to schedule delays. (Refer to the framework systems.)
Placement As a part of the submission process for custom and semi-custom arrays, the designer may wish to submit a desired placement or partial placement. The vendor must supply placement rules and restrictions for the particular array in the selected package as well as a placement worksheet. The user may be able to choose between a full graphic interface to the placement system or be content to supply the vendor with an ASCII list for placing some or all the macros, and let the vendor complete the placement process. The options and the control over placement become an issue when performance is driven to the limits of the array technology. I/O placement is an issue when an array will emulate an older technology and the PC board array pin out pattern must remain unchanged.
Design-Support Issues Design Upgrades A semi-custom array, full array or bit-slice design can be upgraded more easily than an LSI/MSI/SSI component or a fixed-instruction set microprocessor design. For bit-slice, if the design enhancements are known at the time of the original design, allowances can be made through interconnections and functional capabilities that are not accessed until a microprogram accessing these features is incorporated. Many changes can be made with microprogram changes alone. For semi-custom arrays, if the design enhancements are known in advance, the arrays can be partitioned to leave room for future macro additions or the macro functions could even be incorporated. As with bit-slice, the added capability is simply not accessed until required. If the design enhancements (evolution) are not known, but are anticipated to occur, the allowances for expansion may be anticipated. The designer may provide room for the design changes to be incorporated onto the older schematics, with additional vectors to be added to the existing simulations. The design is thus easily revised.
Tradeoffs The designer must evaluate all the items discussed in this chapter to select the best method of implementation for a specific circuit design. From there, the designer must further evaluate to find the best components available within the chosen category of implementation.
Exercises 1. To select a design approach, the following are questions that may need to be answered:
● What architecture does the design require
● What flexibility can be allowed in the implementation
● What package types are desired versus what package types are available
● What operating environment (Commercial, Industrial or Military)
● What cooling considerations have been made (heat sinks, air flow)
● What is the required interface to the outside world
● What is the required I/O mode (ECL, TTL, CMOS, MIXED ECL/TTL)
● What power supplies are available (+5, -5.2, -4.5, +5 with 4.5 or +5 with -5.2v)
● How many of the required I/O signals are inputs
● How many of the required I/O signals are outputs
● How many of the required I/O signals are bidirectional
● What type(s) of TTL: Totem pole, open collector or 3-stated
● What type(s) of ECL: ECL 10K, ECL 100K, on-chip series termination, off-chip series termination, differential, open collector, Darlington, etc.
● What about CML
● What about CMOS
● What are the physical size limitations imposed on the design
● What word length is required for an adder, ALU, counter, sequencer
● What instruction set or commands are to be supported
● How big is the design (equivalent gates)
● What is the intended maximum frequency of operation including I/O toggle frequency for the circuit, i.e., what are the performance requirements
● How much design time has been allowed
● What design support is available
● How much debug time has been allowed
● What debug support is available
● What simulation support is available
● What simulators
● What timing verifiers
● What about testability support
● Are upgrades to the design planned and if so, how easily can a design be revised
● What upgrades to the component series are planned
● What are the possible time schedules for
  ❍ design review
  ❍ design to prototype
  ❍ prototype to production
● What are the overall cost limitations
Review these questions. Catalog them as to design-specific, array-specific, component vendor-specific and workstation-specific. What other questions might need to be asked before a design implementation approach (semi-custom, full-custom, fixed components, bit-slice, or gate array) can be selected?
2. Review the latest issues of ASIC News and at least one other ASIC-related magazine.
a. Locate two articles on framework systems.
b. Locate two articles on HDL and VHDL.
c. Locate at least one survey on expected growth of demand for ASIC arrays: bipolar, BiCMOS, CMOS and GaAs.
Update 2000 When I first wrote this book, I had spent a considerable amount of time in the Bit-Slice and ASIC industry. The book reflected the design procedures at that moment. It was a reflection of the class I taught at AMCC and at UCSD for 11.5 years. From there I developed and taught the technical training classes for CBA (cell-based array) designs. Now, even CBA is beginning to fade as standard cells take over as the dominant array-based technology. As of January 2001, there have been some changes, although not as many as people would like to think. The basic design flow? - It is still with us.
● The size of the designs has changed from what could be handled by a human (up to 50,000 gates) to what must be handled by a computer (12 million gates). We are seeing designs that run 4-6 million gates per ASIC and will be seeing 10-12 million gate designs shortly.
● Wafers have gone from 3" to 6-8" and are headed for 12".
● From primarily bipolar, I now work almost exclusively with CMOS.
● The process technology has gone from 5 microns to 0.18 micron deep sub-micron, and everything bipolar designers worried about is now the headache that concerns CMOS designers, namely, that gate delays are practically negligible compared to interconnect delays. DSM (deep sub-micron) refers to this phenomenon.
● By 1998, CBA ASICs still ruled, but by 2000 standard cells had become dominant, producing smaller and faster designs.
● Libraries still exist but now macro selection is by a synthesis software package and 85% of designs are done with the Synopsys Design Compiler package. Cadence now has a competitive synthesis package.
● Schematic capture is pretty much an anachronism and designs are specified in Verilog or VHDL (RTL).
● Software tools take the design from RTL (register-transfer logic) through wafer-verification, with software (like Dracula and Vampire) performing ATT (antenna checking), ERCs (electrical rules checking) and DRCs (design rule checking). Manual operations are no longer feasible.
● RAM and ROM onboard an ASIC is no longer unusual.
● IP (intellectual property) blocks are in common use. DesignReuse is a buzzword. IP may be multi-layered and fixed (hard IP), or soft IP where a netlist is incorporated and the block may be altered by synthesis steps.
● Vectors are generated by automatic test vector generation software.
● Designs are made DFT (design for test) as a routine step in the compile process. DFT is no longer an option.
● Software can tell you if your design is testable and if it is routable.
● Place and Route are no longer a last step in the process.
● Floorplanners are required for any sizable design.
● Cadence's Gate Ensemble and Silicon Ensemble were the leaders in Place and Route.
● Synopsys has the Chip Architect floorplanner. Cadence has the Logical and Physical design planners. Avant! has Planit! for floorplanning tasks.
● Everybody's floorplanner has to talk to everybody's synthesis tool and they both have to talk to everybody's place&route software. Everybody has to talk to PrimeTime.
● What AMCC called "intermediate annotation" is what is produced during floorplanning and its use is not optional.
● EDIF became the standard for netlists. There is no more concern about whose netlist went where. EDIF is used as input to floorplanners, and EDIF is produced by the synthesis tools and by the place&route tool.
● DB has become a standard format.
● DEF (design exchange format - Cadence) became the standard for input to place&route.
● PDEF 2.0 is the standard output of the floorplanners and is now standard input to place&route software. PDEF 3.0 is on the horizon.
● SDF 2.0 is used for delay files and can be created by PrimeTime from delay information from most floorplanners and place&route tools.
● SPICE files are still with us.
● VCS, VSS and related tools perform simulation.
● GDSII is for building the basedie layers and GDSII is produced by the place&route software.
● PrimeTime is the standard for static timing analysis (85% of designs are verified with that software).
● Designers no longer have to "fit" their designs into a fixed-size chip with a fixed I/O count (cells in the I/O ring). Dies are designed to fit the design. Holes are punched through the layers to accommodate IP blocks (Hard IP blocks) and software exists to "stitch" the IP blocks into the basedie.
● You may not "diddle" with parametric specifications. If you have a different set of operating conditions, you must go back to the library vendor for new library specifications. The slightest variation can have dire consequences in the results.
The axiom that the engineer who knew the library could do better designs than a more experienced engineer who did not, still holds. You may "direct" the use of macros by the synthesis tools. Tcl has become the standard interface scripting language.
In fact, Synopsys alone has approximately 42 different software tools available to help create an ASIC design. Design flow from RTL to wafer fab is the focus of most vendors today.
Structured Design Methodology
Introduction To The Overview The Structured Design Methodology, as developed here for the design of Bipolar, CMOS or BiCMOS logic arrays, applies to any array design effort regardless of technology or vendor. The designer who follows this methodology will ensure a smooth design flow between milestones that will help ensure a successful design the first time. The design flow is presented in this chapter at the introductory level. Following chapters will detail specific areas such as timing analysis, simulation and power computation.
Design Sequence - Pre-Capture The Structured Design Methodology stresses a certain design flow sequence of events, developed for use by the beginning array designer, the beginning user of an Engineering Workstation (EWS) or the designer experienced in both. Each step will be discussed in more detail after the design flow is fully outlined.
Circuit functional specification The circuit functional specification is the target specification; it describes what it is that is to be implemented on one or more arrays. This includes: a block diagram of the system or circuit, overall performance requirements, I/O interface, testability, environmental and packaging requirements. (See Table 2-1.) Once the functional specification identifies the need for more than one array, partitioning of the overall circuit modules to ensure proper boundary conditions must be made and then the functional specifications of the individual array circuits must be created. The specifications must be defined to be independent of each other to allow parallel circuit development. Note that there is no constraint at this point as to the product to be used beyond operating specifications. The technology of the array is defined by the performance requirements. As a basic guideline, high speed requires ECL bipolar, slower speeds and low power require CMOS, and moderate speeds and bipolar drive capability without the price of bipolar power dissipation require BiCMOS. Where the boundaries are is subjective and subject to continual evolution and change. Table 2-1 Components Of The Target Specification
Target Specification
Block Diagram Showing Modules and Their Interface to Each Other and to the Rest of the System
Functional Description of Modules
Maximum Frequency of Operation
Performance Requirements
I/O Interface
Environmental Requirements
Physical Restrictions
Power Restrictions
Packaging Restrictions
Circuit hardware specification The circuit hardware specification is the planned hardware approach to satisfying the target functional specification. For multiple array designs, this may involve another level of specification, one specification for each circuit intended for a different array. This implies that project partitioning has been completed, and defines all required I/O and throughput performance. (See Table 2-2.)
Table 2-2 Components Of The Hardware Specification
Hardware Specification
Selected Technology
Potential Array Series
Modules Detailed into Functional Sub-Modules
Functional Description of Sub-Modules
Functional Block Sizing - Cell Counts (Rough)
I/O Interface Details - Cell Counts (Rough)
Toggle Frequency for I/O
Initial Packages
Critical Path Throughput Estimates
Power Estimates
A hardware architecture specification equates to PDL (program description language) for software. It identifies modules and closely defines how the modules will work together. HDL (hardware description language) and VHDL have been developed to formalize this specification. From this level of specification it is possible to estimate I/O signal requirements and internal cell utilization. At this point, the estimates are very rough and will only serve to allow a first cut at reducing the number of arrays that need to be considered. Some compromises or engineering tradeoffs may have been made, refining the functional specification.
Review of the available arrays The arrays available at the time of a design evaluation need to be reviewed using the outline in Table 2-3 as an initial basis of comparison.
Table 2-3 Array Checklist - Initial Review
Initial Review Checklist
technology
I/O resources - number of available I/O pads and pins
internal density or equivalent gate limits
I/O mode configurations including power supplies supported
Placement support, options
power dissipation limits
available packaging
maximum operating frequencies
● internal toggle frequencies
● interface toggle frequencies
design support:
● EWS libraries - the macros available
● Netlister - the macros available
● annotation support
● design-correctness software
● user-friendly interface with test
turnaround time from design submission to wafer prototype cost
Figure 2-1 indicates the interdependencies between functional specification, hardware specification and the arrays. Figure 2-1 The Array Selection Process
This review must compare what is available with the circuit specifications and produce a list of the available arrays that could be used to support those specifications. As the number of potential arrays is reduced, preliminary implementations of some of the critical paths for the circuit, constructed from the macro libraries under consideration, should be evaluated.
Initial sizing of the circuit Before an array or array series has been chosen, estimate the size of the circuit or circuits to be placed on the array. Estimate the number of I/O connections, the types of I/O connections and the I/O cell count. The I/O cell count and the pad count may both be required. Estimate the internal cell count. (See Table 2-4.)
Table 2-4 Sizing Review
Initial Sizing Review
Types of I/O Interface
Number of Each Type
Equivalent Gate Count or Internal Cell Utilization - Estimated by Cell Type
For standard functions, equivalent gate counts may exist that can be used in place of internal cell count to estimate the size of the internal array area that will be required. Internal cell counts are more useful than equivalent gate counts where the cells are more complex than one or two gates. Compare these estimates to the review of the arrays still under consideration and their I/O resources, their internal density and their maximum frequency of operation. Note that, at this stage in the design, the sizing estimates for the circuit may be off by a considerable margin. Historically, device cell utilization at the estimate stage of a design is 20-30% below the final value. Q2000 Series Approximate Equivalent Gate Size (Historical)
Internal cell utilization The first population checks can be made before the circuit is designed. Internal cell utilization is one of these checks. Internal cell utilization is the number of cells required by a circuit divided by the number of cells available.
$$\text{Internal Cell Utilization} = \frac{\text{Number of Internal Cells Used}}{\text{Number of Internal Cells Available}}$$
Macros that are suitable can be listed and a rough estimate of internal cell utilization computed. This step includes a review of the available macros in the various libraries with emphasis on the requirements of the specific circuit application. Reviewing the macros available allows a match to be made between functional macros that exist and what is required to implement the design in the least silicon for the highest performance. All other things being equal, the convenience of the macro library can be a decisive factor in the final array selection. Do the macros available support the circuit modules? Large macros may include adders, carry-look-ahead, comparators, up and down counters, universal registers, large multiplexors and decoders. Internal cell utilization should be 60-70% at the initial stages of sizing estimates to allow for expansion due to buffers, fan-out load distribution, path balancing or specification changes. The internal cell utilization limit for a completed design is array-specific. (See Table 2-5.) AMCC arrays have an upper limit of 95-100%.
Table 2-5 Internal Cell Utilization Limit
Preliminary Circuit: 60-70%
Final Circuit: 80-100%
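The utilization ratio above can be sketched in a few lines of Python. This is only an illustration: the macro names, cell counts and array size are hypothetical and are not taken from any AMCC library. Only the 70% preliminary ceiling comes from the guideline in the text.

```python
def internal_cell_utilization(cells_used, cells_available):
    """Return internal cell utilization as a fraction of the available cells."""
    if cells_available <= 0:
        raise ValueError("cells_available must be positive")
    return cells_used / cells_available

# Hypothetical macro occurrence list: macro name -> (count, cells per macro).
macro_list = {
    "ADDER4": (8, 12),
    "DFF": (200, 2),
    "MUX4": (40, 3),
}

cells_used = sum(count * cells for count, cells in macro_list.values())
utilization = internal_cell_utilization(cells_used, cells_available=1000)

# Preliminary-stage guideline from the text: stay at or below about 70%.
within_preliminary_target = utilization <= 0.70
print(f"{cells_used} cells used, {utilization:.1%}, ok={within_preliminary_target}")
```

The same macro occurrence list can be reused later for the power estimate, which is why it is kept as a separate structure here.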
Interface cell utilization The I/O requirements to the outside world are the second size determination. The array for a circuit must provide sufficient I/O capability to handle all signals, all other interface-placed circuit support such as three-state enable drivers, test enable controls and added power and ground pads to support simultaneously switching outputs (SSO) and high-speed inputs. As with internal cell utilization, only an estimate of final interface cell utilization can be made. The array should not use 100% of the I/O or the design will become I/O bound. Pad utilization, for cases where the I/O cells and pads are not one for one, must also be kept under 100%. A check on array symmetry should be made. The Q20000 Series arrays do not provide the same number of I/O cells in each array quadrant. This may affect placement and added power and ground usage. The Q24008 is not square and has variable power and ground bonding. Check for these and other variations that might affect allowable utilization of the I/O pads and cells.
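A rough interface-utilization estimate can be tallied the same way. The sketch below assumes that signals, interface-placed support circuits and added power and ground pads each consume one I/O cell. That one-for-one accounting and all of the counts are assumptions for illustration, so check the rules for the specific array.

```python
def interface_utilization(signals, support_cells, added_power_ground, io_cells_available):
    """Fraction of I/O cells consumed by signals, support circuits and added power/ground."""
    used = signals + support_cells + added_power_ground
    return used / io_cells_available

# Hypothetical estimate: 120 signals, 6 enable/test-control cells, 14 added power/ground pads.
io_util = interface_utilization(
    signals=120, support_cells=6, added_power_ground=14, io_cells_available=160
)

# The text warns against reaching 100%, at which point the design is I/O bound.
io_bound = io_util >= 1.0
print(f"I/O utilization {io_util:.1%}, I/O bound: {io_bound}")
```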
Selection of the array series Integrate the hardware specification, the available arrays and the initial sizing estimates to select the target array series. The final choice is usually based on the performance - cost - availability - support matrix. In cases of equivalence between two or more array series, the final choice may be subjective. Package availability should be considered in the early decisions since customized packages, especially for large arrays, take months to develop. The specified performance and requirements for on-chip memory will assist in reducing the number of options. Only a limited number of arrays support on-chip memory, such as the QM1600T. CMOS and BiCMOS do not yet support designs operating at 300MHz (although individual macros can toggle at these speeds). High-speed bipolar arrays support paths operating over 1.4GHz.
Combine all of the information gathered to date and select one or more series for final evaluation.
Compute the path propagation delay Compute the path propagation delay for the most critical (time sensitive) paths in the circuit. Make adjustments to the schematic in terms of macro options for speed where needed. Does the estimated performance satisfy the specification?
$$\text{Path Delay} = \text{Sum of Macro Intrinsic Delays} + \text{Sum of Macro Extrinsic Loading Delays}$$
For the arrays that use typical specifications, be certain to use the correct multiplication factor (WCM) for this worst-case analysis. Review the assumptions made in establishing the multiplication factors and adjust them if these assumptions are not expected to be met (i.e., derate the performance by a higher factor). Some vendors call these multiplication factors "adjustment factors". Be clear as to what is being adjusted and why. There may be different multipliers for the different product grades, Commercial and Military, and for different power supplies within the product grade. The multiplier may depend on the macro type. Many arrays are specified without worst-case timing multipliers. They are specified with min/max ranges for each macro propagation delay. Maximum path delay is found using the MAX data although the conditions for a maximum propagation delay for an individual macro will vary. Minimum delays are found using the MIN data. Be certain that the proper fan-out loading and performance specifications are selected when doing this computation. Because of the high degree of variation in the way a library is documented between vendors and between array series from the same vendor, be certain that the rules regarding the methods of specifying timing delays for the macros for the array series selected are clearly understood. Internal extrinsic loading delays are composed of metal load (Lnet), electrical fan-out load, the sum of all loads driven (Lfo), wire-OR electrical loading if the array allows wire-ORs and if one was used in the net (Lwo) and the k-factors for each. The k-factors, expressed in ns/LU, convert the load units into time units. Table 2-7 shows the extrinsic load equations for internal nets as they are used by AMCC and other vendors. K-factors may be specified as tables, graphs, or broken down into parts for temperature, voltage and processing. Check with the specific vendor.
Will the array support the maximum frequency of operation and the critical path performance requirements?
Table 2-7 Components Of Path Delay - Internal Loading
General Equation for Internal Extrinsic Delay:
No wire-OR allowed: tex = knet * Lnet + kfo * Lfo
Wire-OR allowed: tex = knet * Lnet + kfo * Lfo + kwo * Lwo
Worst-case Internal Extrinsic Delay:
For Arrays with a Worst-Case Multiplier: texwc = WCM * tex
For Arrays with no Worst-Case Multiplier: tex is already worst-case
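The internal loading equations of Table 2-7 translate directly into a small function. In this sketch the k-factors, load units and worst-case multiplier are made-up numbers chosen only to show the arithmetic. Real values come from the vendor's design manual.

```python
import math

def internal_extrinsic_delay(k_net, l_net, k_fo, l_fo, k_wo=0.0, l_wo=0.0, wcm=1.0):
    """Internal extrinsic delay in ns: tex = knet*Lnet + kfo*Lfo + kwo*Lwo, scaled by WCM.

    Leave k_wo/l_wo at zero when no wire-OR is used, and wcm at 1.0 when the
    library is already specified worst-case.
    """
    t_ex = k_net * l_net + k_fo * l_fo + k_wo * l_wo
    return wcm * t_ex

# Hypothetical net: 5 LU of metal load and 4 LU of fan-out load, no wire-OR.
t_ex_typical = internal_extrinsic_delay(k_net=0.02, l_net=5, k_fo=0.03, l_fo=4)
t_ex_worst = internal_extrinsic_delay(k_net=0.02, l_net=5, k_fo=0.03, l_fo=4, wcm=1.5)

print(f"typical {t_ex_typical:.2f} ns, worst-case {t_ex_worst:.2f} ns")
```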
External extrinsic loading delays are composed of the system load capacitance and the package pin capacitance (Lcap) and the k-factor. The k-factor, expressed in ns/pF, converts the load capacitance into time units. The equations used by AMCC for this delay are listed in Table 2-8.
Table 2-8 Components Of Path Delay - External Loading
General Equation for External Extrinsic Delay: tex = kcap * Lcap
Worst-case External Extrinsic Delay:
For Arrays with a Worst-Case Multiplier: texwc = WCM * tex
For Arrays with no Worst-Case Multiplier: tex is already worst-case
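As a worked illustration of how the external loading term feeds the overall path delay, the following sketch adds a capacitive-load delay to a list of macro delays and compares the total with a clock period. The k-factor, capacitances, macro delays and 200 MHz target are all hypothetical, and the delays passed in are assumed to be worst-case values already.

```python
import math

def external_extrinsic_delay(k_cap, l_cap, wcm=1.0):
    """External extrinsic delay in ns: tex = kcap * Lcap, scaled by WCM if one applies."""
    return wcm * k_cap * l_cap

def path_delay(intrinsic_delays, extrinsic_delays):
    """Path delay = sum of macro intrinsic delays + sum of extrinsic loading delays."""
    return sum(intrinsic_delays) + sum(extrinsic_delays)

# Hypothetical output load: 15 pF system load plus 5 pF package pin capacitance.
t_out = external_extrinsic_delay(k_cap=0.05, l_cap=15 + 5)

# Hypothetical three-macro path ending in the output macro above (all values in ns).
total = path_delay(intrinsic_delays=[0.5, 0.8, 1.2], extrinsic_delays=[0.2, 0.3, t_out])

clock_period_ns = 1000.0 / 200.0  # 200 MHz target gives a 5 ns period
meets_timing = total <= clock_period_ns
print(f"path delay {total:.2f} ns, meets {clock_period_ns:.1f} ns period: {meets_timing}")
```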
Compute the estimated power Use the macro occurrence list compiled for cell utilization to compute power. Determine the worst-case current multipliers used by the array and what voltage variations will be used by the circuit for DC power computations. Review the AC power equation if AC power must be computed. ECL output macros use a termination current and that power element must be included with the DC power computation. Different technologies use different methods to compute power as seen by the examples in Table 2-9. Table 2-9 Example Technology Approaches To Power Computation - AMCC Arrays
● Bipolar (pre-Q20000) uses a current dissipation for each macro regardless of operating frequency (DC power only).
● CMOS uses internal and output macros and their operating frequency to find AC power dissipation.
● BiCMOS uses a combination of these techniques, DC power for bipolar interface macros and AC power for internal macros.
● Q20000 Series uses DC power for all macros and AC power computation for ECL inputs, Darlington outputs and all internal macros.
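For the DC-only bipolar case, a first-pass estimate can be sketched as supply voltage times the summed macro currents times a worst-case current multiplier. That formula is a simplifying assumption made here for illustration, not an AMCC equation. It omits ECL termination power and any AC term, and the currents, multiplier and supply are invented.

```python
import math

def dc_power_watts(macro_currents, supply_volts, current_multiplier=1.0):
    """Estimate DC power from (count, mA per macro) pairs.

    Assumes every macro draws its listed current from one supply, scaled by a
    worst-case current multiplier. Termination and AC power are not included.
    """
    total_ma = sum(count * ma for count, ma in macro_currents) * current_multiplier
    return abs(supply_volts) * total_ma / 1000.0

# Hypothetical occurrence list: 100 macros at 0.5 mA and 20 macros at 2.0 mA.
occurrences = [(100, 0.5), (20, 2.0)]
power = dc_power_watts(occurrences, supply_volts=-5.2, current_multiplier=1.25)
print(f"estimated DC power: {power:.3f} W")
```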
Some bipolar arrays have power-down capabilities that can reduce the current dissipated when macro output pins are not used (conditional geometry). Other arrays may have programmable overhead current. Before actual placement, an estimate of the overhead current will need to be used.
Are the estimated power and estimated maximum current acceptable for this design on this array? Actual DC power computations and maximum current checks are available through the MacroMatrix AMCCERC once the circuit has been captured on an AMCC-supported EWS or netlister. A worksheet is provided for AC power computation.
Compute maximum internal current A maximum internal current may be specified for bipolar arrays. It is possible for the total core current to be computed and compared to array limits. It does not guarantee that the design will later pass layout row current limits. If the circuit internal core current is high and the cell utilization is also high, and other placement constraints are required, then the placement process will be difficult and may be unsuccessful.
Before placement, a global check is used, verifying that the core as a whole can handle the current required by the macros. A more detailed bus-check, or row, half-row, and quadrant current check, can be made after placement for those arrays which require this type of checking. BiCMOS and CMOS arrays typically have no internal current limit. The development of three-layer metal arrays reduced the concern for this check for bipolar arrays as well, leaving the final control of the power used in the design to be a function of the ability to keep the junction temperature of the packaged part within limits.
Make the final package selection Make the final package selection based on the array chosen and the estimated power. Refer to the Packaging Brochure from the chosen vendor. For packages with internal power and ground planes, the package selected will control the placement of added power and grounds if the use of package signal pins is to be avoided. A package must accommodate all signal pins required for the circuit plus any signal pins required by added power and grounds not placed to connect to the internal power/ground planes of the package. When a package has no internal bonding planes, the selected package signal pins must be sufficient to include all circuit signals and all added power and grounds. Review the array for any other pads that need package signal pins before making the package selection. The Q20000 Series arrays have four fixed pads, two for the thermal diode anode and cathode and two for the AC speed monitor. These array pads must reach external package signal pins, decreasing what is available for the circuit proper.
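The pin accounting described above is easy to tabulate before committing to a package. In this sketch the signal count, the number of added power and ground pads that must use signal pins, and the package size are hypothetical. The four fixed pads follow the Q20000 Series description in the text.

```python
def signal_pins_needed(circuit_signals, power_ground_on_signal_pins, fixed_pads=0):
    """Package signal pins required: circuit signals, added power/grounds that
    cannot reach internal planes, and any fixed array pads."""
    return circuit_signals + power_ground_on_signal_pins + fixed_pads

# Hypothetical design: 150 signals, 6 added power/grounds on signal pins,
# plus 4 fixed pads (thermal diode anode/cathode and AC speed monitor).
needed = signal_pins_needed(150, 6, fixed_pads=4)

package_signal_pins = 164  # hypothetical candidate package
fits = needed <= package_signal_pins
spare = package_signal_pins - needed
print(f"need {needed} pins, package has {package_signal_pins}, fits={fits}, spare={spare}")
```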
Compute the junction temperature Compute the estimated junction temperature based on the power dissipation, the packages available that meet specifications and the operating environment, including any heat sinking and air flow as specified in the functional specifications. If possible, several options should be evaluated. The allowed packages for an array should also have their thermal coefficients for junction-case (Qjc) and junction-ambient (Qja) specified. Tables or some other means of computing the coefficient for case-ambient (Qca) as a function of the heatsink, the array, the package and airflow should also be provided. For most Military applications, Tc can be maintained at 125°C. For most Commercial applications, Ta can be maintained at 70°C.
Read "Theta" for Q:
Military: Tj = Pd * Qjc + Tc
Commercial: Tj = Pd * Qja + Ta, with Qja = Qjc + Qca
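The two junction temperature equations can be evaluated side by side. The sketch below takes the junction-ambient coefficient as the sum of the junction-case and case-ambient coefficients. The power dissipation and thermal coefficients are hypothetical, and only the 125°C case and 70°C ambient defaults come from the text.

```python
def tj_military(pd_watts, theta_jc, t_case=125.0):
    """Tj = Pd * theta_jc + Tc, with the case temperature held at Tc (deg C)."""
    return pd_watts * theta_jc + t_case

def tj_commercial(pd_watts, theta_jc, theta_ca, t_ambient=70.0):
    """Tj = Pd * theta_ja + Ta, with theta_ja = theta_jc + theta_ca (deg C/W)."""
    theta_ja = theta_jc + theta_ca
    return pd_watts * theta_ja + t_ambient

# Hypothetical part: 2 W dissipation, 5 C/W junction-case, 15 C/W case-ambient.
tj_mil = tj_military(2.0, theta_jc=5.0)
tj_com = tj_commercial(2.0, theta_jc=5.0, theta_ca=15.0)
print(f"Military Tj = {tj_mil:.1f} C, Commercial Tj = {tj_com:.1f} C")
```

Running several heat sink and airflow options through the commercial form only requires changing `theta_ca`, which makes it convenient for comparing packages.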
With the completion of both timing and power analysis, changes in macro options, or optional functions within the circuit can be evaluated and the speed-power curve managed before full schematic capture and simulation have been performed.
Optional - Bonding diagram (custom bonding), Pinout request As an option, a bonding diagram (pin out) request can be submitted to the vendor for approval. Both pin out requests and placement requests can be initiated by the designer and both must be approved by the vendor after layout and Back-Annotation evaluation.
Review the design submission requirements Review the requirements for the array series design submission as specified by the vendor.
● Are schematics required?
● What schematic format is required by the vendor?
● What simulation must be run and submitted?
● What other procedures are requested by the vendor?
Clarify what is to be done to actually perform a design submission to your vendor.
Pre-Simulation Steps Once an array or array series has been selected, the design can be captured and all checking performed with packaged or vendor software. For non-schematic designs, the steps leading to the netlist are performed per the system requirements. Once a netlist exists, the design steps are the same.
Perform schematic capture through netlist generation Perform the schematic capture using the Dazix, Mentor, Valid or other EWS (Engineering Workstation) system, or LASAR 6, Verilog or other netlister equipped with schematic-generation software. Perform the schematic capture following vendor schematic rules and conventions. Perform the vendor-software steps through netlist generation. Each workstation has a different netlist format and a different procedure to generate it. Each workstation has its own simulator that uses the workstation-specific netlist as an input file. LASAR 6 (Vax/VMS) and Verilog each have a specific netlist format. Communication of a design from the design workstation to a vendor must be done using a netlist the array vendor can recognize. (See Figure 2-2.) In the 90s, an array vendor was limited to accepting only those designs created on a workstation that matched the equipment that the vendor had in-house. Most design input today is done without schematic capture. Cadence Composer can handle schematics. Design Compiler from Synopsys will display a schematic after synthesis (best used at the module level). Today's engineers use a Verilog or VHDL netlist to input a circuit description. Design Compiler produces a Verilog netlist, an EDIF netlist and a Synopsys .db formatted file for design transfer. Figure 2-2 Netlister Confusion
Another solution is the use of a dial-up design system based on a mainframe. The array vendor provides the account access for a fee and provides all required support and the designer provides an acceptable terminal. The problem is the access to a compatible terminal when a graphics terminal is required and the costs of the design in connect time. To combat the problem of multiple formats without moving to a dial-up solution, netlist reformatters or translation programs have been written. AMCC has a netlist formatter, AGIF, which is customized to each supported workstation and netlister. The AMCC Generic Interface Format file produced is called circuit.sdi and it is the means of communication between the customer and all AMCC software, including the MacroMatrix components: AMCCERC, AMCCANN, AMCCVRC, AMCCSIMFMT, AMCCSUBMIT and AMCCAD for placement.
Perform design rules checking For systems and vendors without software support or with support that is less than complete, the design checks must be performed manually. EWS-based checking provided by the EWS vendor is minimal and should only be used as a first step in the validation process. Intelligent checkers are evolving. These may be interactive with a schematic capture or work on the standardized netlist. The checks must be successfully completed before proceeding. Remove all errors if possible, and document those that remain. The vendor may require a waiver before submission if errors are not removed from the circuit. AMCC customers must run AMCCERC and remove errors. The program output, AMCCERC.LST, provides reports on population, I/O types and mixes, utilization, package signal pin requirements, DC power, internal pin count, and SSO power-ground evaluation while listing naming violations, unconnected pins, pin connect violations, fan-out loading violations with derated loads, and technology (array, power-supply, and macro mismatch) errors. AMCCERC.LST must be included with the design submission package.
Generate extrinsic load time delays (Annotation) The need for annotation software came from the change in the ratio between the delays caused by the interconnect between macros and the macro internal (intrinsic) delays. Once it was common for an interconnect net delay to exceed one half of the intrinsic delay, or even to exceed the intrinsic delay, it became necessary to produce a reasonable estimate of the interconnect delay. Figure 2-3 Schematic And Netlist Paths Into AMCC
By 2000, the netlist standard had become EDIF. Verilog, VHDL, EDIF, db, and PDEF are transfer standards now. Front-Annotation is the term used for pre-placement, pre-route interconnect delay estimation. The estimate is based on the net size, number of fan-out loads, both physical and electrical, or the capacitive load on an output macro. The Front-Annotation programs such as AMCCANN compute the fan-out loading delay, the wire-OR loading delay, and provide an estimate of the metal etch delay due to the size of the nets. The estimate is based on a statistical evaluation of previously built circuits and the average etch length used to connect same-sized nets. It is too large a number some of the time and too small a number at other times. Front-Annotation is not a specification. Where Intermediate-Annotation is available (a Manhattan-Distance algorithm based on a placement file), it should be used. It is more accurate in more cases but it is still an estimate. Only after place and route can the actual metal etch delays be known. Annotation after place and route is called Back-Annotation.
Perform testability analysis on the circuit. All testability measures have one common goal: to enhance controllability and observability of the circuit. It is a grade on the logic design itself. Controllability is a measure of the ease in setting a particular node to a logic level of zero or one, while observability determines the ease of propagating the node's state to one or more primary outputs. After a netlist has been created and logic simulation has verified correct functional performance, testability can be verified by running testability analysis programs. This optional step is highly recommended if there is software available to perform the analysis. For a modular design, a manual review should be performed if there is no software support. The purpose is to identify those parts of the circuit that are difficult or impossible to reach by way of primary inputs (controllability), and those parts of the circuit that may change state but that are difficult or impossible to observe at a primary output (observability). Steps should be taken to make hard to reach nodes controllable by adding test control signals and degating logic. Make hard to observe nodes observable by adding test points. Make any adjustments or changes to the schematic as necessary to improve testability to acceptable limits. Changing the schematic will mean repeating the error-checking and annotation software steps.
Testability analysis should be done before simulation since the result will be to simplify the functional simulation vector set development.
Simulation Once the circuit has been checked for design rule violations, sizing, power, package fit, optimization, functionality and other non-simulation dependent checking, the simulations required by the array vendor may be performed. There are several types of simulations: functional (all etch is connected without SA0, SA1 faults); at-speed (the array-implemented design runs at the specified maximum operating frequency of the circuit); AC test (path propagation delay) and parametric (VIH, VIL). The array vendor may specify the simulations, formats required, and vector rules to be followed.
Modular simulations - Debug only During logical debug of the original design it is better to simulate modular segments of the circuit, verifying basic logical operation and debugging the immediately obvious design errors. Once the circuit is considered to be a successful logical design, then perform the functional simulation that will form the basis of the test vectors submitted with the design. Multiple fragmented functional simulations cannot be submitted.
Functional simulation The object of functional testing is to detect a single SA1 (stuck-at-1) or SA0 (stuck-at-0) fault in the circuit if one exists. This ideally requires sufficient vectors to "cover" all possible SA1 and SA0 fault locations. The percentage of coverage is the fault grade of the vector set. For a high fault-grade score (95% and up), other types of circuit failures are assumed to be "covered". Functional test vectors are initially created from the functional simulation sampled results file. Functional simulations are run using Front-Annotation or Intermediate-Annotation with timing checks enabled. They are re-executed when Back-Annotation is available. It is the Back-Annotated simulation result file that goes to test. The functional vector set for a circuit should detect any single fault occurring on a single path. In theory, triple faults, odd faults of 5, etc., per path are covered by the vectors detecting single faults provided the faults do not mask each other. Even-numbered sets of faults on a path (double faults, quad faults, etc.) are assumed to mask each other and not to be detectable. The probability of multiple faults on a path is significantly less than the probability of a single fault. (Multiple faults that signal a catastrophic failure are detected within the basic wafer screening.) Figure 2-4a Simulations - Types And Forms
              SAMPLED        PRINT ON CHANGE
              MAX    MIN     MAX    MIN
FUNCTIONAL     X      X
AT-SPEED       X      X       X      X
AC TEST        X      X       X      X
PARAMETRIC     X      X
Figure 2-4b Circuit Simulation Requirements
Redundant circuit logic will cause some faults to be masked (prevent their detection) and should be avoided. Where redundancy is desired for other reasons, the designer should add test points to make masked faults visible. One extreme approach used to develop functional vectors is to cycle all inputs and outputs through all combinations of 1-0 and 0-1 transitions as a first check after initialization. (Theoretically, this should cycle all internal nodes in a combinatorial circuit as well.) This 2^n (where n = number of inputs) brute force approach is not necessary. Minimum vector test sets and minimum vector test sequences will cover 100% of all observable faults. A fault cannot be detected by any test methodology if it is a masked fault. A masked fault cannot be seen at a primary output due to redundancy in the logic. Logic minimization is therefore a requirement if high fault grade scores are desired.
The functional simulation vectors may have been developed for an earlier technology version of the array circuit or may be developed from scratch. They need to be constructed in pages (AMCC uses a 4K or a 128K page depending on the tester), begin with initialization of the array, and initialize periodically within the page between test modules. Begin by initializing every I/O pin (preferred initialization is within 25-100 simulation steps, depending on array size). Proceed to "home" the circuit. For testability, a master reset or master set is desirable since it will allow a circuit to be placed in a known state quickly. For circuits that combine reset or set with non-resettable logic, the flip-flops and latches that are not cleared by the set or reset should be initialized after the set or reset has executed and the components settled. A circuit will need to be placed in a known state between groups of tests, at tester page boundaries and before any long or complex test.
Functional simulation execution Functional simulations must be done for the maximum and minimum worst-case timing and are sampled with a step long enough to ensure that all changes caused by the controlling data or clock signal have stabilized. (AMCC uses a step of 100ns.) The rule of thumb is to measure the longest path in the circuit, compute its worst-case maximum time delay, add 50ns and round to the nearest 100. For BiCMOS and bipolar arrays, 100ns is more than adequate. The 100ns step size equates to 50MHz, the limit for the SENTRY tester. Different vendors may specify different step size approaches but the necessity of all signals being stable will remain. Functional simulation results for the maximum and minimum libraries should be compared as a check on hazards and races. The results for the minimum library should match those obtained with the maximum library. If they do not match, stop and evaluate why they do not.
Table 2-10 Functional Simulations
minimum worst-case   sampled
maximum worst-case   sampled
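The step-size rule of thumb for functional simulation (longest-path worst-case delay, plus 50ns, rounded to the nearest 100) can be sketched as a small helper. This is a minimal illustration only: the function name and its parameters are hypothetical and not part of any AMCC tool, and a half-way value is assumed to round up.

```python
def sample_step_ns(longest_path_delay_ns: float,
                   margin_ns: float = 50.0,
                   grid_ns: int = 100) -> int:
    """Sampling step in ns: (worst-case delay + margin) rounded to the nearest grid value."""
    padded = longest_path_delay_ns + margin_ns
    # Round half up explicitly; Python's round() rounds halves to the even neighbor.
    multiples = int((padded + grid_ns / 2) // grid_ns)
    # Assumption: never return a zero-length step, even for a tiny delay.
    return max(multiples, 1) * grid_ns


if __name__ == "__main__":
    for delay in (12.5, 49.0, 120.0):
        print(f"{delay} ns longest path -> {sample_step_ns(delay)} ns step")
```

With these inputs the first two paths give a 100ns step and the 120ns path gives 200ns. Because the margin equals half the grid, the rounded step is never shorter than the path delay itself.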
Simulation outputs Each simulator produces a data file or a list file that represents the signals the designer specified and the time step at which they were recorded. Most provide a waveform of the results as well. The formats of these output files are not standardized. To submit them to the array vendor, some reformatting must take place.
Reformatting simulation outputs To allow vector format checking and to simplify test transfer, AMCC developed the AMCCSIMFMT (AMCC simulation format) program. It reformats the output of the logical simulator into a form acceptable to AMCC test and to programs that need to read the files. (This standard format allows the simulation sampled output file to be used as a data input file to other software.) Each supported EWS and netlister has a unique AMCCSIMFMT program.
Vector Rules Checking The AMCC Vector Rules Checker (AMCCVRC) can be run against any AMCCSIMFMT (AMCC Simulation Format) sampled simulation output file from any simulator. AMCCVRC will issue a count of the number of test vectors and simulation vectors for the particular file being scanned. AMCCVRC will check for missing primary I/O signals and missing 3-state or bidirectional enable internal signals. It will identify differential signals, verify that related clock and data signals do not change in the same vector (race conditions for the tester), check the number of simultaneously switching outputs per vector against some established limit, look for internal signals that should not be present, and print a summary of warnings and errors.
Figure 2-5 Using A Formatter For Simulation Output
AMCCVRC will also identify primary signals that did not change in both directions during the vector set (toggle test). It produces a report and error listing called AMCCVRC.LST that is a required part of the design submission package. Every maximum worst-case functional simulation file must be processed through AMCCSIMFMT and AMCCVRC.
simulator output ----> AMCCSIMFMT formatter ----> AMCCVRC ----> AMCCVRC.LST report
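Three of the vector rules described above (the toggle test, the simultaneously switching output count, and the clock/data race check) can be illustrated in a few lines. The sketch below is not AMCCVRC and does not read the AMCCSIMFMT file format; the in-memory vector layout, the function names, the signal names and the SSO limit are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Set, Tuple

Vector = Dict[str, int]  # one sampled step: signal name -> 0 or 1


def untoggled_signals(vectors: Sequence[Vector]) -> Set[str]:
    """Toggle test: signals that did not change in both directions."""
    if not vectors:
        return set()
    rose: Set[str] = set()
    fell: Set[str] = set()
    for prev, cur in zip(vectors, vectors[1:]):
        for name, value in cur.items():
            if prev[name] == 0 and value == 1:
                rose.add(name)
            elif prev[name] == 1 and value == 0:
                fell.add(name)
    return set(vectors[0]) - (rose & fell)


def sso_violations(vectors: Sequence[Vector], outputs: Sequence[str],
                   limit: int) -> List[int]:
    """Indices of vectors in which more than `limit` outputs switch together."""
    bad: List[int] = []
    for index, (prev, cur) in enumerate(zip(vectors, vectors[1:]), start=1):
        switching = sum(1 for name in outputs if prev[name] != cur[name])
        if switching > limit:
            bad.append(index)
    return bad


def clock_data_races(vectors: Sequence[Vector],
                     pairs: Sequence[Tuple[str, str]]) -> List[Tuple[int, str, str]]:
    """Vectors in which a clock and its related data signal both change."""
    races: List[Tuple[int, str, str]] = []
    for index, (prev, cur) in enumerate(zip(vectors, vectors[1:]), start=1):
        for clock, data in pairs:
            if prev[clock] != cur[clock] and prev[data] != cur[data]:
                races.append((index, clock, data))
    return races


if __name__ == "__main__":
    names = ("CLK", "D", "Q0", "Q1")  # hypothetical signals
    rows = [(0, 0, 0, 0), (0, 1, 0, 0), (1, 1, 1, 1), (0, 1, 1, 1), (1, 0, 1, 1)]
    vectors = [dict(zip(names, row)) for row in rows]
    print(sorted(untoggled_signals(vectors)))          # ['Q0', 'Q1']
    print(sso_violations(vectors, ["Q0", "Q1"], 1))    # [2]
    print(clock_data_races(vectors, [("CLK", "D")]))   # [(4, 'CLK', 'D')]
```

In this small vector set the outputs Q0 and Q1 rise but never fall, so they fail the toggle test; both switch in vector 2, exceeding the assumed limit of one; and CLK and D change together in vector 4.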
Fault grading There are fault grading programs that score the vectors by the percentage of faults covered. There are a number of fault-grading packages appearing on the workstations and on mainframes. Fault-grading is used to verify that the simulation bit vectors sufficiently exercise nodes within the circuit to assure that the outgoing product matches the customer specification. If an array vendor does not support a particular package, it is likely that the software will give misleading fault grade scores. Fault grade scores depend on the modeling approach used as well as the vectors themselves. Most fault-graders need a file or support program to reduce errors due to global ground not switching, VCC, VSS or VDD not switching, or a terminated output not switching and other, similar exceptions. Insufficient fault coverage as determined in a fault grading analysis may require the addition of vectors to the graded set. Functional simulation vector fault-grading can be performed at AMCC using the LASAR 6 simulator. AMCC looks for scores based on the interconnect nets and not on the internal macro component interconnect links. MSI macro modeling (and whether the macro is hard or soft) will affect fault grade scores. AMCC recommends the creation of enough vectors to achieve a fault coverage of 90% or higher.
simulation stimuli and netlist ----> fault-grader ----> report grade
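The idea of a fault grade can be made concrete with a toy single stuck-at fault grader. The sketch below is not LASAR or any vendor package: it assumes a tiny hypothetical netlist of AND/OR/NOT gates listed in evaluation order, places one SA0 and one SA1 fault on every net (primary inputs and gate outputs, with no separate fan-out branch faults), and counts a fault as detected when any vector produces a primary output that differs from the fault-free result.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Gate = Tuple[str, str, Tuple[str, ...]]  # (output net, operation, input nets)
Fault = Tuple[str, int]                  # (net name, stuck-at value)

OPS = {
    "AND": lambda values: int(all(values)),
    "OR": lambda values: int(any(values)),
    "NOT": lambda values: 1 - values[0],
}


def simulate(gates: Sequence[Gate], stimulus: Dict[str, int],
             outputs: Sequence[str], fault: Optional[Fault] = None) -> Tuple[int, ...]:
    """Evaluate the gates in listed order, forcing one net if a fault is given."""
    nets = dict(stimulus)
    if fault is not None and fault[0] in nets:
        nets[fault[0]] = fault[1]            # stuck primary input
    for out, op, ins in gates:
        nets[out] = OPS[op]([nets[name] for name in ins])
        if fault is not None and fault[0] == out:
            nets[out] = fault[1]             # stuck gate output net
    return tuple(nets[name] for name in outputs)


def fault_grade(gates: Sequence[Gate], primary_inputs: Sequence[str],
                outputs: Sequence[str],
                vectors: Sequence[Tuple[int, ...]]) -> Tuple[float, List[Fault]]:
    """Return (fraction of single stuck-at faults detected, undetected faults)."""
    nets = list(primary_inputs) + [gate[0] for gate in gates]
    faults = [(net, stuck) for net in nets for stuck in (0, 1)]
    detected = set()
    for vector in vectors:
        stimulus = dict(zip(primary_inputs, vector))
        good = simulate(gates, stimulus, outputs)
        for fault in faults:
            if fault not in detected and simulate(gates, stimulus, outputs, fault) != good:
                detected.add(fault)
    undetected = [fault for fault in faults if fault not in detected]
    return len(detected) / len(faults), undetected


if __name__ == "__main__":
    # Hypothetical circuit: Y = (A AND B) OR C
    gates = [("N1", "AND", ("A", "B")), ("Y", "OR", ("N1", "C"))]
    coverage, missed = fault_grade(gates, ("A", "B", "C"), ("Y",), [(1, 1, 0), (0, 1, 0)])
    print(f"{coverage:.0%}", missed)   # 80% [('B', 1), ('C', 0)]
    coverage, missed = fault_grade(gates, ("A", "B", "C"), ("Y",),
                                   [(1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)])
    print(f"{coverage:.0%}", missed)   # 100% []
```

Two vectors leave B stuck-at-1 and C stuck-at-0 undetected; adding two more vectors closes the gap, which is the "addition of vectors to the graded set" described above. A production fault grader uses the vendor's macro models and exception files, so its scores will differ from a net-level toy model like this one.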
At-speed simulation In addition to functional simulation, the designer must perform some at-speed verification of circuit operation. One method is to perform a simulation that is executed at the specified maximum frequency of operation of the circuit with timing checks enabled. At the minimum, these vectors should cover the critical performance paths of the circuit and may cover the entire circuit. The at-speed simulations are run using Front-Annotation. The Front-Annotation results are not to be considered a specification of the final results. The at-speed simulation is re-executed when Back-Annotation files are available. For conventionally specified array series, at-speed timing analysis is done with the worst-case military or commercial (maximum) library and with the minimum library. At-speed simulations are run with the print on change option for the simulator (print_on_change, -c, list -change, etc.), monitoring the same signals monitored by the functional simulation. Because these are complex to evaluate, they are also performed in the sampled mode. They are run using the maximum library and the minimum library.
Table 2-11 At-Speed Simulations
minimum worst-case   sampled   print_on_change
maximum worst-case   sampled   print_on_change
Timing Verifiers - An at-speed option If they are supported by the array vendor, timing verifiers can be substituted for at-speed simulation. Not all timing verifiers are supported by the array vendors even if the corresponding simulators are supported. (The Valid timing verifier is the only one currently supported by AMCC and then only with certain libraries.) Check with the array vendor. Verifiers can run min-max analysis against either the maximum or minimum delay library. The min-max spread is the process, temperature, and voltage variation for the library and is about 10-40%, as specified by the vendor. This type of analysis can highlight spikes, ambiguity on clock paths, and marginal timing performance.
Supported and non-supported EWS features Timing verifiers emphasize the need to communicate clearly with the array vendors. When evaluating an EWS or netlist purchase, consult the intended array vendors for a list of systems and system features that the target libraries support before committing to a design approach. The EWS system may have software for which the vendor has not created models, rendering that software useless without extensive further development. There is a growing pool of independent workstation tool suppliers. For these packages, the array vendor must also be consulted before assuming that they can be used. Some of them alter the netlist that the vendor may be using as input to the layout system, destroying the circuit interface. Always refer to allowed equipment and EWS configuration supplied by the target array vendors. Consult with them before starting a purchase or a design.
Create the AC test simulation vectors - Optional AC tests are optional and may be written to check either propagation path delay in a non-memory path or external set-up and hold time for memory elements. Both rising and falling edges should be checked. AC test simulations may be concatenated into one simulation file provided clear documentation of start and stop time addresses is provided. Each test (one pair of input-output pads, one edge direction) must initialize the circuit so that the test can be performed, provide the stimuli and run until the effect of the stimuli is seen at the circuit output. AC test simulations are run using the maximum and then the minimum library. In each case, run once for sampled results and once for print on change. AMCC performs only path propagation delay AC tests. For older AMCC arrays, there is a limit of 20 tests over 10 paths, with bus lines handled as multiple paths. AMCCVRC is used by AMCC customers to screen AC Test simulation vectors.
Table 2-12 AC-Test Simulations
minimum worst-case   sampled   print_on_change
maximum worst-case   sampled   print_on_change
AC Speed Monitor The AC speed monitor that AMCC built into the Q20000 Series base arrays removes the requirement for customer-generated AC test simulation vectors. This on-chip device will be added to all future arrays. The basis of the AC monitor is a 9-stage ring oscillator followed by a 2-stage divide by 4 counter. Each stage uses 100 mils of second and third layer metal to evaluate metal loading. The accuracy of the counter is 0.005% up to 100MHz. The AC speed monitor uses two pads, a power supply pin and the output pad, that are bonded out to external package pins. (See Figure 2-6.) Figure 2-6 AC Speed Monitor - Q20000 Series Arrays
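As a rough illustration of how such a monitor output could be read, the sketch below applies the textbook ring-oscillator relationship f = 1 / (2 x N x t_stage) to a 9-stage ring followed by a divide-by-4 counter. This is an assumption made for the example: the text does not state how AMCC interprets the monitor output, and the function name and the sample frequency are hypothetical.

```python
def stage_delay_ps(output_freq_mhz: float, stages: int = 9, divide_by: int = 4) -> float:
    """Per-stage delay (ps) implied by the divided-down monitor output frequency."""
    ring_freq_hz = output_freq_mhz * 1e6 * divide_by
    period_s = 1.0 / ring_freq_hz
    # One oscillation period takes two trips around the ring (one per edge polarity).
    return period_s / (2 * stages) * 1e12


if __name__ == "__main__":
    # Hypothetical reading: 10 MHz measured at the monitor output pad.
    print(f"{stage_delay_ps(10.0):.1f} ps per stage")
```

For a 10MHz output the ring itself runs at 40MHz (25ns period), so each of the 18 stage traversals per period takes roughly 1.39ns, a figure that includes the 100 mils of metal loading on each stage.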
Parametric testing - Optional Parametric testing for VIH, VIL is optional. There are several different methods of setting up a parametric simulation. One approach is the use of a parametric gate tree, where all circuit inputs (clocks and set and reset included) are treed by NOR, AND or OR gates (SSI logic) to a single output. The cost is the number of internal cells needed to implement the gate tree, one output and an added load on the primary input signals. The vectors are the minimal test sequence (100% fault coverage) for that gate tree. A minimal sequence changes one input per vector and the output toggles every vector. Every input is switched from 1-0 and from 0-1, one by one. The parametric vector set is combined with the functional simulation vector set for fault-grading. Parametric simulation is run once, using the maximum worst-case library and a sampled output. The vendor may require that the minimum simulation also be run. AMCCVRC can be run to check parametric testing simulations.
Table 2-13 Parametric Simulations
minimum worst-case   sampled
maximum worst-case   sampled
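The minimal sequence for a parametric gate tree can be sketched for the simplest case, a tree of OR gates. The example below is an illustration under that assumption (an AND tree would use the complementary pattern, starting from all ones); the function names are hypothetical. It builds the sequence and checks the two properties stated above: one input changes per vector, and the tree output toggles on every vector.

```python
from typing import List, Sequence, Tuple


def walking_one_sequence(n_inputs: int) -> List[Tuple[int, ...]]:
    """All-zero vector, then each input raised and lowered in turn (2n + 1 vectors)."""
    zero = tuple([0] * n_inputs)
    sequence = [zero]
    for i in range(n_inputs):
        one_hot = tuple(1 if j == i else 0 for j in range(n_inputs))
        sequence.append(one_hot)  # input i switches 0 -> 1
        sequence.append(zero)     # input i switches 1 -> 0
    return sequence


def or_tree(vector: Sequence[int]) -> int:
    """Output of a tree of OR gates over all inputs."""
    return int(any(vector))


def is_minimal_sequence(sequence: Sequence[Tuple[int, ...]]) -> bool:
    """True if every step changes exactly one input and toggles the tree output."""
    for prev, cur in zip(sequence, sequence[1:]):
        changed = sum(a != b for a, b in zip(prev, cur))
        if changed != 1 or or_tree(prev) == or_tree(cur):
            return False
    return True


if __name__ == "__main__":
    seq = walking_one_sequence(4)
    print(len(seq), "vectors instead of", 2 ** 4)   # 9 vectors instead of 16
    print(is_minimal_sequence(seq))                 # True
```

The sequence grows linearly with the number of inputs (2n + 1 vectors), in contrast to the 2^n vectors of the brute force approach.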
The Design Submission Through Prototype Complete the design validation review Once the simulations are completed, the entire circuit package should be reviewed for completeness. This is a preliminary design acceptance review. AMCC currently provides a Design Validation form that identifies areas that are characteristically problems in a design submission, a list of items that previous design submissions had in error. After AMCCERC errors, timing check errors and AMCCVRC errors have been resolved, the designer should work through the checks in the Design Validation section of the design manual. These are checks that have not yet been or can never be automated. The questionnaire is incorporated into AMCCSUBMIT, a design submission program that queries the designer for file names and conditions, and produces reports for use in design submission. For array vendors without such a list or automated support, review the submission procedures and the array design rules.
Complete the design submission checklist The array vendor will have a submission procedure, a list of the files and documents required for submission before the vendor can accept the design and proceed to layout. Check off the required items as they are assembled into the package. A design cannot be submitted without the completion of the required items. Optional items must be complete if the option is chosen. Last of all, make sure some media index exists in both media and hardcopy form that identifies what is in the submission package. AMCC has created a generic design submission form as a first step in creating a user-interface automation of the design submission - design validation process. It and its accompanying document detail what is required to be submitted in hardcopy and what is to be submitted on media (disk or tape). (The first version of the form is now part of AMCCSUBMIT.) AMCCSUBMIT generates a report that alerts the reviewer to problem areas in the design or the design submission package.
Submit the circuit - acceptance design review When the design submission package is complete, submit it. The vendor will review the package for completeness and correctness. The submission will include all pre-approved design waivers negotiated before submission.
● If everything has been done according to the vendor's rules, the design will be accepted and move into its proposed schedule.
● If only minor things are missing or incorrect, the array vendor may make the changes with the designer's approval.
● If serious errors or omissions exist, the design may be rejected, i.e., returned to the designer with instructions on what is missing or incorrect.
Implementation Engineering - the Array Vendor On design acceptance, the Implementation Engineer assigned to the design will rerun all simulations against the in-house library. The object is to identify any macro design changes implemented after the library release date that would affect the design under review. A second objective is to verify that the circuit used on the schematics, netlist and simulations all match since it is easy to violate file consistency.
The Design Submission Through Prototype Placement After processing by Implementation Engineering, the circuit will be submitted for layout. Preplacement requests that were approved by the array vendor are input to the layout system in this phase. For customers who wish a particular package pin-out, a specific pad placement may be required. Vendors attempt to honor these requests if they do not violate other placement criteria. Placement restrictions may be I/O mode and package specific. They may be driven by the type of macro, such as the dual-cell differentials. They may be driven by MSI (multiple-cell) placement requirements, whether these are hard or soft macros. Timing specifications and clock distribution requirements are another factor as are the particular restrictions induced by simultaneously switching outputs (SSO). Packages that use internal power and ground planes may restrict where added power and ground macros are placed, and this may conflict with the SSO requirements. All of these factors must be reviewed before approving a placement. A placement is not usually considered final until after routing and then only after the at-speed Back-Annotated simulation is approved by the customer. On the average, a circuit requires a first-pass placement (90-95% auto-placement is the goal) and some adjustments in a second pass.
- Intermediate Annotation Some vendors may have an Intermediate-Annotation software package capable of providing Manhattan-Distance algorithm-based Intermediate Annotation delay files. They allow simulations to be performed with time delay data that is much closer to reality than the generic, "every-same-sized-net-is-the-same-length" Front-Annotation software. For a circuit where the technology is being pushed to the limit, and Back-Annotation will take more than a week to obtain, it might be a good idea to run Intermediate-Annotation simulations. They are still not accurate enough to be treated as a specification, but they could identify gross errors in placement that could be corrected before routing.
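A Manhattan-distance estimate of net length from a placement can be sketched as follows. This is only an illustration of the idea, not any vendor's Intermediate-Annotation algorithm: the star model (summing driver-to-load distances), the placement coordinates in cell units, the macro names and the delay-per-cell figure are all assumptions.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[int, int]  # (column, row) of a placed macro, in cell units


def manhattan(a: Point, b: Point) -> int:
    """Rectilinear distance between two placed macros."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def net_length_estimate(placement: Dict[str, Point], driver: str,
                        loads: Sequence[str]) -> int:
    """Sum of driver-to-load Manhattan distances (a simple star model)."""
    return sum(manhattan(placement[driver], placement[load]) for load in loads)


def etch_delay_ns(length_cells: int, ns_per_cell: float) -> float:
    """Metal etch delay estimate for a net; ns_per_cell is a hypothetical constant."""
    return length_cells * ns_per_cell


if __name__ == "__main__":
    placement = {"U1": (0, 0), "U2": (3, 4), "U3": (10, 1)}  # hypothetical macros
    length = net_length_estimate(placement, "U1", ["U2", "U3"])
    print(length, "cells")                                    # 18 cells
    print(f"{etch_delay_ns(length, 0.02):.2f} ns estimated etch delay")
```

Unlike Front-Annotation, two nets with the same fan-out get different estimates here when their macros are placed differently, which is why gross placement errors can show up before routing. The result is still an estimate, since the router is free to take a longer path.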
Routing Routing is the longer process. For circuits meeting the internal pin count and cell utilization limits for the array, 95% of the nets can usually be routed automatically. The last few are closed by a human operator at a graphics interface terminal. Some array vendors will not accept an array that cannot be 95% autorouted. As a guideline, AMCCERC will report warnings for those circuits exceeding recommended internal cell utilization limits and recommended internal pin count limits. It will report an error for those circuits that exceed the limits so far as to be considered impossible to route. It cannot cross-check package pad-pin requirements or make any assumptions about the physical location of the macros.
- Back-Annotation After layout, the Back-Annotation delay files are available to the designer to rerun the logical and at-speed simulations, plus any of the optional simulations originally submitted. These files provide the actual metal lengths in the circuit nets as opposed to the estimated metal length, and the actual (as far as is known) package pin capacitance for the output nets. Critical paths must be checked with this data. At this point, a failure to meet specification timing requirements by a small amount may be correctable with a layout adjustment or a routing change. Serious failures may signal the need to re-design. AMCC and most vendors guarantee the maximum worst-case Back-Annotation at-speed simulation, i.e., guarantee that the silicon will not be slower than the results. The careful evaluation of the critical paths early in the design phase, the proper derating of the fan-out loading, careful selection of the macros and the options for those macros, preplacement for critical and sensitive paths (balanced against the placement restrictions and rules for the array), and the careful simulation and timing validation before layout, will all help ensure a successful design experience. Re-simulation and timing validation after layout (place and route) help ensure a successful wafer.
Prototyping After the Back-Annotated simulations are approved by the customer, the vendor can proceed to produce prototypes.
Array Design Acceptance After prototyping and testing per the testing specification supplied at design submission, including the functional vectors, the customer would perform the final array acceptance as desired. At this point, full fabrication of the final product can begin.
Functional Specification - A Closer Look The functional or target specification is the first level of description of the project that may encompass one or more arrays when the design is partitioned. There may be a specification tree with the total project at the top node and individual circuit blocks or modules detailed underneath. Topics included in a functional specification are listed in Table 3-1. At this early stage, a functional description of what is to be accomplished is created along with some of the top-level circuit requirements.
Array Interfacing For the partitioned project (multiple arrays), the individual array specifications would include a description of array interfacing. Interconnection between arrays is faster when done with ECL. When choosing single or dual (differential) rail ECL use the following guidelines:
● If the arrays will be placed on the same board and will be adjacent to each other, single rail (non-differential) ECL may be acceptable.
● If the arrays will communicate across a backplane or be remote on the board, differential ECL may be required.
● Differential ECL is required if the operating speeds exceed the maximum frequency specifications for single rail ECL.
The potential need for differential ECL should be indicated at the functional specification level.
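The ECL guidelines above can be captured as a small decision helper for use while drafting the functional specification. This is a sketch only: the field names, the returned wording and the example frequency limit are hypothetical, and the actual single rail limit must come from the array vendor's data sheet.

```python
from dataclasses import dataclass


@dataclass
class ArrayLink:
    """One array-to-array connection (all fields are illustrative assumptions)."""
    same_board: bool
    adjacent: bool
    crosses_backplane: bool
    toggle_mhz: float


def ecl_rail_choice(link: ArrayLink, single_rail_max_mhz: float) -> str:
    """Apply the three guidelines, with the hard speed requirement checked first."""
    if link.toggle_mhz > single_rail_max_mhz:
        return "differential ECL required"
    if link.crosses_backplane or not (link.same_board and link.adjacent):
        return "differential ECL may be required"
    return "single rail ECL may be acceptable"


if __name__ == "__main__":
    limit_mhz = 300.0  # hypothetical single rail limit, not a vendor figure
    near = ArrayLink(same_board=True, adjacent=True, crosses_backplane=False, toggle_mhz=150.0)
    far = ArrayLink(same_board=False, adjacent=False, crosses_backplane=True, toggle_mhz=150.0)
    fast = ArrayLink(same_board=True, adjacent=True, crosses_backplane=False, toggle_mhz=500.0)
    for link in (near, far, fast):
        print(ecl_rail_choice(link, limit_mhz))
```

The speed check comes first because it is the only guideline stated as a requirement; the other two remain judgment calls, which the "may be" wording preserves.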
Partitioned circuits should attempt to balance the distribution of I/O and internal cell usage between the different arrays while maintaining critical paths within one array if possible. This is still the rule to follow, no matter how big the arrays get. It is also a good guideline for how to break up a 6-8 million gate array into top-level blocks: keep the critical paths inside the block if possible. Interblock connections today are what interarray connections were yesterday.
Table 3-1 Components of The Functional Specification
Functional Specification
Block diagram to the module level, including any partitioning into more than one array
Description of the boundaries between the modules and the rest of the system
Initial sizing of the I/O interface by type - ECL, TTL, etc.
Functional description of the modules
Description of the interface between the circuit modules - busses, control, critical interconnects
The overall performance requirements
   --- the maximum frequency of operation
   --- target clock speed (per clock)
   --- path propagation delay requirements set by modules external to this design
I/O toggle rates
Synchronous/asynchronous signals
Allowed or available power supplies
Power restrictions
Physical size restrictions
Environmental requirements - Commercial, Military, Industrial, Other
Packaging requirements
Derating for junction temperature
Prioritized design objectives
Hard Specifications Design criteria that are considered as hard (inflexible) specifications should be clearly documented as such. Specifications that might be alterable should also be clearly identified. If a tradeoff or judgment call needs to be made during the remainder of the design project, such information can save time and possibly the project.
Design Objectives Overall design objectives should be clearly identified and documented. These include optimization for speed, power or die size, which translates to minimized internal cell utilization and minimized I/O utilization. Since these objectives are in conflict, they should be prioritized. As a last step, there should be a careful design review of the circuit and system functional specifications, and the partitioning.
Review the Available Arrays With a clear understanding of the design description and overall objectives, review the arrays currently available that could be used. For a listing of currently-available array series, check the latest ASIC vendor surveys run by several of the engineering magazines. These buyer's guides provide a cursory look at what is available and allow a first-pass sort of available arrays into feasible and non-feasible, a starting point from which the designer can proceed. They have limited space to review technology, die size, cell counts, metal layers, number of macros, interface levels, second sources and the EWS workstations the array vendor supports. They may not have the latest updates on an array series. They can provide addresses and phone numbers for array vendors. Once one or more vendors have been selected, the designer should obtain data sheets and design guides from the prospective vendors for the most promising array series and begin a more in-depth review.
Example - The AMCC Arrays - as of 1991 The industry shows an evolutionary trend as designers drive vendors to develop larger, faster and cooler arrays. There have been five bipolar array families from AMCC since 1984 (see Table 3-2), increasing in cell size and speed while reducing die size and power. The most recent is the AMCC Q20000 series, officially released in September 1989.
Table 3-2 AMCC Bipolar Array Series
AMCC Array Series   Year
Q20000              1989
Q5000               1987
Q3500               1986
Q1500               1985
Q700                1981
The Q20000 Series speed estimates list its internal toggle rate, at least twice as fast as that of the previous Q5000 Series, at 1.25GHz, with an enhanced drive and much lower power. Individual macros have been found to run at 1.4GHz and higher.
There are two AMCC BiCMOS array families, the Q14000 Series and the Q24000 Series, a partial shrink of the Q14000, as shown in Table 3-3.
Table 3-3 AMCC BiCMOS Array Series
AMCC Array Series   Year
Q24000 Series       1990
Q14000 Series       1988
The current BiCMOS families were preceded by three CMOS array series, each faster than its predecessor. The BiCMOS arrays combine the drive and interface ability of bipolar with the cooler operation of CMOS. The newer BiCMOS Series must be larger, faster and cooler.
Comparing the arrays The items that define the differences between array series include those shown in Table 3-4.
Table 3-4 Features for Array Series Comparison
Array Series Comparison Topics
The process technology
Metal layers routed (2, 3)
Series gating techniques
Sea-of-cells versus routing track architectures, also called channelless vs. channelled
Overall maximum speed of operation, specified as I/O and internal toggle rates
Frequency Ft (frequency at which beta for the transistor becomes unity)
Noise immunity
Edge rates - programmable or not
Symmetry in rise and fall times
Power-supply options allowed
Power-supply variation stability
Maximum number of I/O cells available
I/O modes allowed (TTL, ECL, MIXED, etc.)
ECL terminations
On-chip translators
Maximum number of internal cells or gates available
Macro options - Standard (S); Power (P); Low-power (L); High-speed (H); Fast (V); Drivers (D); superdrivers (D) --- or lack of options; i.e., speed-power programmability
Variety in the macro library available
Wire-ORs (dot-wire) allowed or not
Design constraints
Power dissipation per gate
Packaging available
Autoplace, Autoroute
Engineering Workstation support
Simulators supported
Second source
Military compatible
Commercial compatible
Military qualified testing
Other topics as dictated by the arrays, their technology and the design issues
The arrays within a series refine these differences with specific information on size, number of cells by type, and details about interfacing, as shown in Table 3-5. Data sheets, product profiles and macro library design guides or design manuals supply the specific information for an array series. The design manual, supplied with the array library media, is the controlling document.
Architectural Specification or Hardware Specification Once a clear definition exists of the circuit or circuits that will be placed on one array, then the planned design can be developed. This is on a smaller module scale than the block-level functional specification, e.g., at the level of counters, adders, latches, registers, sequencers, etc. The performance requirements defined in the functional specification can be used to select the technology.

Table 3-5 Array-Specific Specifications

Hardware or Architectural Specification
● Number of internal cells
● Number of I/O cells
● Number of outputs
● Number of bidirectional macros
● Number of fixed power and grounds
● Rules for adding power and ground
● Packaging options
● Maximum internal current limits
● On-chip memory
● Macro-type design use restrictions such as number of Darlington; CML outputs
● Placement rules that affect design
● Variable bonding

The review of available arrays is conducted in parallel with the creation of the hardware specification. With the descriptions developed for the modules, equivalent gate estimates can be made for the circuit, or estimated cell usages can be computed for the circuit on a specific array. The array vendor Applications Engineer can help with the sizing estimate. The hardware design specification details what the designer intends to do to meet the target functional specification. This level of specification can be equated to a PDL (program definition language) description of software and is the basis for the evolution of HDL, hardware description language, and its derivatives. If a particular testing methodology is being enforced, the sizing estimates must take this additional logic into account. If additional testing logic, such as a parametric gate tree or parity logic, is to be used, it must be included in the sizing estimates. The specification may include proposed vendors and arrays.

Table 3-6 Components of The Hardware Specification

Hardware Specification Components
● The selected technology or technologies
● Potential array series (1-3 at the most)
● Block level diagram to the sub-module level
● The functional description of the different circuit sub-modules such as adders, counters, registers, etc.
● Sub-module sizing
  --- equivalent gates or estimated internal cell utilization
  --- estimated I/O cell utilization
  --- estimated pad utilization
  --- estimated internal pin counts
● Refined details on the array interface
  --- number of CMOS I/O
  --- number of TTL I/O
  --- number of ECL 10K I/O
  --- number of ECL 100K I/O
  --- all four types partitioned into inputs and outputs and bidirectionals
  --- number of outputs switching simultaneously (by type) (SSOs)
  --- maximum toggle frequencies for each I/O
  --- external set-up and hold window unless this circuit will establish the window specification for the driving circuit
● Critical path throughput performance
● Estimated power - DC and AC as required
● Package to be used
● Heatsinks required and/or air cooling required
● Estimated junction temperature

There should be a design review of the architectural or hardware specification before final selection of an array series. On final selection, the specification should be revised to show that series and all computations performed for that series. Note that a workstation can provide some assistance. The critical path may be captured in more than one version and comparisons made based on an annotated simulation. Power and sizing details can be run against a macro list rather than a full interconnect netlist. (This tool is vendor-dependent.) Check if such a precapture tool is available to help size the circuit.
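The sub-module sizing roll-up described for the hardware specification can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SubModule type, the module names and every cell count below are hypothetical placeholders, not values from any AMCC manual. The point is simply that test logic is entered as its own line item so that it cannot be left out of the total.

```python
from dataclasses import dataclass


@dataclass
class SubModule:
    """One sub-module line from a hardware specification (hypothetical type)."""
    name: str
    internal_cells: float  # estimated internal cell usage; half cells are allowed
    io_cells: int = 0      # estimated I/O cell usage


def total_sizing(modules):
    """Return (total internal cells, total I/O cells) for a list of sub-modules."""
    cells = sum(m.internal_cells for m in modules)
    io = sum(m.io_cells for m in modules)
    return cells, io


# Hypothetical estimates; the test logic is listed as its own entry
design = [
    SubModule("16-bit counter", 48, io_cells=18),
    SubModule("address register", 32, io_cells=16),
    SubModule("parity logic (test)", 10, io_cells=1),
]

cells, io = total_sizing(design)
print(f"internal cells: {cells}, I/O cells: {io}")
```

The totals can then be compared with the cell resources of each candidate array.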
Array Sizing Cell Structure Each cell in an array consists of a number of uncommitted transistors, resistors and other discrete components and is designed around the performance criteria for the intended macro library. The cells will vary between array series, regardless of the vendor.
Equivalent gates The number of equivalent gates has been a design measure dating from the days of discrete designs first converting into SSI-level ICs. Integrated circuits were classed as SSI, MSI and LSI based on their equivalent gate counts. Circuits were "sized" based on the number of equivalent gates it would take to create them. CMOS arrays carried on with the equivalent gate count and it was reasonable because the internal cell in a CMOS array can be sized as 1, 2 or 3 gates. Bipolar arrays carry equivalent gate counts on their data sheets as a sizing measure but it serves only to show relative sizing between arrays in the same series. Bipolar array cell complexities render equivalent gates a rough measure at best. BiCMOS cells are more complex than CMOS and equivalent gate estimates are not recommended for them either. To complicate the problem, vendors use many different methods for computing equivalent gates. The designer would need the algorithms before a rational comparison based on equivalent gates can be made between any two array series, even from the same vendor.
Example - Method 1 One approach to array sizing is to count the number of transistors in the internal core cells, assume that 2.5 transistors is equivalent to a gate (Digital Equipment's definition), and compute the number of equivalent gates per cell. The product of the number of cells times the number of gates per cell provides the equivalent gates per array.

equivalent gates = ( number of transistors in core / 2.5 )

Example - Method 2 Another method is to use the D flip/flop. Sizing the D flip/flop as 11 gates, the Q20000 Series D flip/flop uses 2 internal cells.

equivalent gates = ( number of internal cells / 2 ) * 11

Example - Method 3 The usual AMCC method is to size a 3:1 MUX-D flip/flop macro as 11 gates. The Q20000 Series 3:1 MUX-D flip/flop uses 3 internal cells.

equivalent gates = ( number of internal cells / 3 ) * 11

Example - Method 4 The last method discussed here is to size a full adder at 16 gates. For the Q20000 Series, a 1-bit full adder takes 3 internal cells.

equivalent gates = ( number of internal cells / 3 ) * 16

or, in general:

equivalent gates = ( number of internal cells / number of cells required for measuring function ) * number of gates in function

AMCC ASIC Product Selection Guide with Equivalent Gates Listed (1996)
AMCC ASIC PRODUCT SELECTION GUIDE (1990's)

Part Number   Technology         Equivalent Gates       Number    Structured
                                 (Full Adder Method)    of I/O    Array Blocks
Q20004        1 Micron Bipolar   671                    28        None
Q20010        1 Micron Bipolar   1469                   66        None
Q20025        1 Micron Bipolar   4032                   100       None
Q20045        1 Micron Bipolar   6782                   128       None
Q20080        1 Micron Bipolar   11242                  162       None
Q20120        1 Micron Bipolar   18777                  198       None
Q20P010       1 Micron Bipolar   928                    34        1 GHz PLL
Q20P025       1 Micron Bipolar   3120                   51        1 GHz PLL
Q20M100       1 Micron Bipolar   13475                  195       RAM
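The four sizing methods share one general formula, and a short Python sketch makes it easier to see how far the results diverge for the same array. The function and dictionary names are illustrative only; the cell and gate figures are the Q20000 Series values quoted in the method descriptions. The sketch does not attempt to reproduce the published selection-guide counts, which may rest on additional vendor assumptions.

```python
def gates_from_transistors(transistor_count, transistors_per_gate=2.5):
    """Method 1: core transistors divided by transistors per equivalent gate."""
    return transistor_count / transistors_per_gate


def equivalent_gates(internal_cells, cells_per_function, gates_per_function):
    """General formula used by Methods 2-4:
    (internal cells / cells needed by the measuring function) * gates in function."""
    return internal_cells / cells_per_function * gates_per_function


# Q20000 Series measuring functions from the text: (cells used, gates assigned)
Q20000_METHODS = {
    "d_flip_flop": (2, 11),      # Method 2
    "mux_d_flip_flop": (3, 11),  # Method 3 (usual AMCC method)
    "full_adder": (3, 16),       # Method 4
}


def compare_methods(internal_cells):
    """Equivalent-gate estimate, rounded, for each measuring function."""
    return {
        name: round(equivalent_gates(internal_cells, cells, gates))
        for name, (cells, gates) in Q20000_METHODS.items()
    }


# A hypothetical 300-cell array gives three different "sizes"
print(compare_methods(300))
```

For the same 300 cells the estimates range from 1100 to 1650 gates, which is why the algorithm must be known before equivalent-gate counts are compared.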
I/O cell contributions None of these methods for estimating equivalent gates take the logic capability of the interface cells into account. Some vendors do count them in their published equivalent gate counts and others do not.
Example - AMCC cell design AMCC cell design is optimized for MUX, latch and flip/flop implementations. Each cell is designed to support high-speed requirements so that there are no placement restrictions on the high-speed option macros due to cell limitations. No power is used by a cell in its base configuration. For the AMCC BiCMOS arrays, a cell is roughly 3 gates. For the bipolar arrays, a logic cell is a more complex structure and varies with the series.
Cell capabilities Cells for each array have different capabilities. The cells for different array series, same technology (bipolar, BiCMOS or CMOS), from the same vendor may also differ widely in the approach used in their design and in their functional complexity.
Example - AMCC cell capabilities An internal cell for the Q5000 Series can support a complex D flip/flop, a 3:1 MUX and D flip/flop, a triple latch, two simple (no RESET, single output) D flip/flops, or triple 2:1 MUXs with common select. The Q20000 Series internal cell alone cannot support a D flip/flop. S- and L-option D flip/flops use two cells while H-option D flip/flops require three. The Q20000 Series internal cell is roughly comparable to a half-cell for the Q5000 Series if size of function alone is considered as the basis for comparison. The logic cell for the Q20000 Series is defined as the smallest partition possible and each internal cell supports one Turbo macro output. Turbo is a Q20000 feature that provides high drive (18 loads) with less power and less skew.
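Because cell cost per function differs by series and by macro option, an internal-cell estimate is best made from a macro list rather than from gate counts. The following Python sketch assumes a hypothetical macro naming scheme (DFF_S, DFF_H and so on are invented labels, not AMCC library names); the cell counts are the Q20000 Series figures quoted in this chapter.

```python
# Internal cells per macro for the Q20000 Series, as quoted in the text.
# The macro names are hypothetical labels chosen for this sketch.
Q20000_CELLS = {
    "DFF_S": 2,       # S-option D flip/flop
    "DFF_L": 2,       # L-option D flip/flop
    "DFF_H": 3,       # H-option D flip/flop
    "MUX3_DFF": 3,    # 3:1 MUX-D flip/flop
    "FULL_ADDER": 3,  # 1-bit full adder
}


def cells_for_macro_list(macro_counts, cell_table=Q20000_CELLS):
    """Sum internal cells for a {macro name: instance count} mapping."""
    unknown = sorted(set(macro_counts) - set(cell_table))
    if unknown:
        raise KeyError(f"no cell count known for: {', '.join(unknown)}")
    return sum(cell_table[name] * count for name, count in macro_counts.items())


# Hypothetical macro list: 16 S-option and 4 H-option flip/flops, 8 full adders
usage = {"DFF_S": 16, "DFF_H": 4, "FULL_ADDER": 8}
print(cells_for_macro_list(usage))
```

Raising an error for an unknown macro, rather than silently counting it as zero, keeps the estimate from understating the design.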
Cell types and resources The vendor data sheet and design guide or design manual should clearly identify cell types and the number of each on each array in the series. Any restrictions in the use of the cells, either utilization limits or cell count limits, should also be readily available. Included in these descriptions should be a measure of cell functionality, either in a table summarizing the array cell capability or through the macro library documentation. As a part of the cell resources identification, the vendor should supply a clear description of the fixed power and ground pads and the procedures to add additional power and ground pads. These added power and ground macros usually reside on an I/O cell and pad and can affect the number of cells left for circuit signals.
Example - AMCC Cell types The basic AMCC logic array is composed of two classes of cells: the internal cells, which are composed of logic (L) and memory (M) cells for bipolar arrays or basic (B) cells for BiCMOS arrays; and the perimeter cells, composed of input, output or bidirectional (I/O) cells. Older AMCC arrays had buffer cells internally and specialized input or output-only interface cells. An array may or may not have specialized I/O cells. AMCC cell types are shown in Table 3-7. The QM1600S (now the QM1600T) was the first of the AMCC arrays to incorporate memory on a logic array.

Table 3-7 Cell Types

INTERNAL:      Logic, Basic, Buffer, Memory
PERIPHERAL:    Input, Output, I/O, Special-I/O
Refer to the cell resources table for an approximate idea of the array cell capacity for three series and note the differences. Cell resources for the Q24000 Series are shown in Table 3-8, for the Q5000 Series in Table 3-9 and for the Q20000 Series in Table 3-10. Note that no two series are alike!

Table 3-8 AMCC Q24000 Series Arrays - Cell Resources

Array Name   Internal B Cells   I/O Cells   Pads
Q24280       6880               300         256
Q24140       3360               226         226
Q24091       2268               160         160
Q24060       1440               132         132
Q24021       540                80          80
Q24008       190                66          44

Usage restrictions: Refer to the Q24000 Design Manual for details.

Table 3-9 AMCC Q5000 Series Arrays - Cell Resources

Array Name   Internal L Cells   I/O Cells   Output Limit   Memory Cells
Q5000T       352                160         120            -
Q3500T       242                120         -              -
Q1300T       84                 76          -              -
QM1600T      114                106         -              2 (1240 bits)
Table 3-10 AMCC Q20000 Series Arrays - Cell Resources

Array Name   Internal   I/O Cells       I/O Cells    Signals -     Signals -
             Cells      (For Signals)   (Fixed) *    Loop Filter   PLL Related
                                                     **
Q20120       3414       198             4            -             -
Q20080       2044       162             4            -             -
Q20045       1227       128             4            -             -
Q20P025      595        76              4            13            8
Q20025       733        100             4            -             -
Q20P010      177        54              4            13            8
Q20010       267        66              4            -             -

* Two pads are used by the AC Speed Monitor and two by the thermal diode.
** Only for the largest arrays, 100_LDCC for the Q20P010 and 132_LDCC for the Q20P025.
Add last four columns to find total I/O cells and pads.

Array Name   ECL Outputs   TTL Outputs   PLL            Power/Ground
             Limit         Limit         Power/Ground   (1)
Q20120       172           100           -              78
Q20080       130           80            -              52
Q20045       100           64            -              52
Q20P025      45 (2)        45 (2)        8              26
Q20025       80            48            -              36
Q20P010      23 (3)        23 (3)        8              20
Q20010       50            24            -              32

(1) Add last two columns to find total number of fixed power and grounds.
(2) 51 for external loop
(3) 34 for external loop
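A first-pass array selection against these cell resources can be sketched in Python as shown here. The figures are the non-PLL Q20000 entries as read from Table 3-10; the function name and the max_utilization parameter are assumptions made for illustration, since actual utilization limits come from the Design Manual for the series.

```python
from typing import NamedTuple, Optional


class Array(NamedTuple):
    name: str
    internal_cells: int
    io_cells: int        # I/O cells available for signals
    ecl_out_limit: int
    ttl_out_limit: int


# Non-PLL Q20000 arrays as read from Table 3-10, ordered smallest to largest
Q20000 = [
    Array("Q20010", 267, 66, 50, 24),
    Array("Q20025", 733, 100, 80, 48),
    Array("Q20045", 1227, 128, 100, 64),
    Array("Q20080", 2044, 162, 130, 80),
    Array("Q20120", 3414, 198, 172, 100),
]


def smallest_fit(cells, io, ecl_out=0, ttl_out=0,
                 max_utilization=1.0) -> Optional[str]:
    """Return the smallest array meeting every requirement, or None.

    max_utilization is a hypothetical derating factor on internal cells.
    """
    for a in Q20000:
        if (cells <= a.internal_cells * max_utilization
                and io <= a.io_cells
                and ecl_out <= a.ecl_out_limit
                and ttl_out <= a.ttl_out_limit):
            return a.name
    return None


print(smallest_fit(600, 90, ttl_out=40))                       # Q20025
print(smallest_fit(600, 90, ttl_out=40, max_utilization=0.8))  # Q20045
```

Applying an 80% utilization ceiling moves the same design up one array size, which shows how sensitive the choice is to the limit assumed.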
Array architecture The base arrays for the various series are similar in their design concept in that the core of most arrays is composed of an array or matrix of logic or basic cells organized in a row-column configuration. Arrays that contain memory place the RAM blocks in the core area, with the rest of the core designated for internal logic cells. Phase-locked loop arrays, the PLL arrays, have PLL locations that straddle both core and interface areas. Interface (I/O) cells are placed around the perimeter of the array interspersed with power and ground. There are different base arrays for different power supply configurations. The base array for a single +5V supply will be different from that for a mixed-mode +5V/-5.2V dual supply. A generic die plot for the Q20080 array is shown in Figure 3-1 and one for the BiCMOS Q24091 is shown in Figure 3-2, with the interconnect pattern in Figure 3-3.

Figure 3-1 Q20080 Die Plot
Figure 3-2 Q24140 Die Plot
Figure 3-3 BiCMOS Macro Interconnect Pattern
Macro configurations Macros are individually configured by interconnecting the components within a cell with one layer of metal to form the selected macro function. Macros can occupy a cell, a partial cell (usually 0.5 cell), or require several cells. The internal interconnect for a simple macro is generally confined to one layer of metal. The particular layer will depend on the array series.
Cell Interconnect The process of interconnecting macros is called routing. For channelled architectures, routing is performed following specific routing tracks. The interconnect is on the first and second layers of metal in a two-layer metalization array. Horizontal and vertical tracks are assigned to specific metal layers. For an array with three layers of metal, the second and third layers will be used for inter-macro routing and the first layer for intra-macro routing. In practice, the hard definition of which layer of metalization is restricted to which operation can be blurred.
Channelless architecture Channelless architectures have been developed to avoid some of the limitations imposed by the restricted number of routing tracks. The Q24000 sea-of-gates and Q20000 sea-of-cells (channelless) architectures use three layers of metal. Macros are interconnected on one level and interconnect between macros occurs on the other two, the specific layers being array and series dependent. For the Q20000 Series arrays, the internal macro connects (intraconnects) are on second and third metal with macro and I/O interconnects on the first layer. Routing on all three layers is possible and four layers of metal is a future possibility.
Netlist The combination of the macro layout patterns (component interconnect) and the macro interconnect forms the metalization pattern required to implement the circuit on a given array. This pattern is described in a netlist. Each workstation produces a netlist in its own format, carrying along whatever information the workstation vendor has decided was necessary. There is no standard workstation or simulator netlist format, although efforts are directed toward that goal (see EDIF) and some success has recently been attained. Parametric information that is included in the netlist is array- and array-vendor-dependent. A library such as the Q20000 is shipped to customers with a Macro Parameter File, which supplies the parameters for each macro in the library. These parameters are included in the netlist for each occurrence of each macro used in the design.
Example - The AMCC netlist To accommodate transfer of designs from any workstation or from any of the supported netlisters (Laser 6 and Verilog) to the mainframe-based place and route system, netlist conversion is performed, where the workstation netlist is translated into a standard interface format. AMCC refers to this as AGIF - AMCC generic interface format. A different conversion program is required for each workstation or simulator that AMCC supports. The standardized netlist is named circuit.sdi. This netlist is used as input to the AMCC MacroMatrix software as listed in Table 3-11.

Table 3-11 AMCC MacroMatrix and Design Support Software - using circuit.sdi

MacroMatrix AMCCERC rules check
MacroMatrix AMCCPACKAGE (Package Check and Data)
MacroMatrix AMCCANN annotation
MacroMatrix AMCCSIMFMT simulation file formatter
MacroMatrix AMCCVRC vector check
MacroMatrix AMCCSUBMIT submission check
AMCCAD place and route system
Test vector transfer software
Verilog simulator
Interface options - I/O modes Interface combinations required for the design should be compared to those offered by the arrays under evaluation. The power supply and the interface combination define the I/O mode of the array. Not all arrays support all possible I/O modes with all possible power-supply combinations.
Interface types Once it is seen that the interface mix can be supported on an array series, the type of TTL and ECL outputs that will be required is used to help size the I/O requirements of the array.
Example: AMCC interface options For all AMCC arrays, TTL and ECL translators are included in the I, O, or I/O cells for external interfacing to both ECL and TTL. Each I, O, or I/O cell can be configured to be either TTL, ECL 10KH, ECL 10K, ECL 100K or as a power or ground pad. I/O cells can usually be used for input macros, output macros or bidirectional macros. Table 3-12 shows the possible I/O combinations allowed on AMCC arrays while Table 3-13 details the TTL output options and Table 3-14 the ECL output options.

Table 3-12 AMCC Interface Combinations

IF INPUT IS OF TYPE:      TTL, ECL 10K, ECL 100K
OUTPUT CAN BE ANY OF:     TTL, ECL 10K, ECL 100K
Table 3-13 TTL Output Options

standard TTL
open-collector
three-state or 3-state (also called tri-state)
standard TTL output bidirectional
open-collector output bidirectional

The 3-state outputs and TTL bidirectional macros have an enable pin that is either restricted to being driven by a specific macro type (a 3-state enable driver) or unrestricted and driveable by any internal-level signal. The restriction depends on the array and on the mode (100% TTL or Mixed ECL/TTL) of the circuit.

Table 3-14 ECL Output Options

ECL 10K, 25 ohm termination
ECL 100K, 25 ohm termination
ECL 10K, 50 ohm termination bidirectional
ECL 100K, 50 ohm termination bidirectional
CML outputs (> 600MHz), ECL's version of an open-collector
On-chip 50 ohm series termination ECL 10K
On-chip 50 ohm series termination ECL 100K
On-chip 100 ohm series termination ECL 10K
On-chip 100 ohm series termination ECL 100K
Darlington ECL 10K, 50 ohm termination
Darlington ECL 100K, 50 ohm termination
Darlington ECL 10K, 25 ohm termination
Darlington ECL 100K, 25 ohm termination
Darlington ECL Hi-Z 10K
Darlington ECL Hi-Z 100K
Darlington On-chip 50 ohm series termination ECL 10K
Darlington On-chip 50 ohm series termination ECL 100K
Darlington On-chip 100 ohm series termination ECL 10K
Darlington On-chip 100 ohm series termination ECL 100K
From CML forward in the above list are types identified as possible for the Q20000 Series. Standard ECL 10K, 100K, CML and Darlington outputs were in the first release of the macro library for the series.
Power supply options In addition to the types of interface required, the power supply or power supplies to be used should be compared to the supplies allowed for the array. The supplies, the number of fixed power and ground pads and their locations should be reviewed for their applicability to the design in question. There is often a need to have an array interface with several types of I/O while keeping power supply requirements in line with what is already provided on the target PCB (printed circuit board). This can lead to operation of a technology with non-standard voltages.
Effects on Parametrics When non-standard voltages are used, such as -4.5V with ECL 10K and -5.2V with ECL 100K, the DC parametrics for the array will be affected. The data sheet for the array series will call out the parametrics for standard supplies.
The vendor must be consulted for computational procedures to be used when non-standard power supplies are used.
Example - AMCC Arrays - Power Supply Options The power-supply and interface type matrix for the AMCC arrays shows a very flexible approach to solving interface problems. Many of the AMCC arrays can be used with a single power supply (+5V) or dual supplies (+5V/-5.2V or +5V/-4.5V) as shown in Table 3-15. The Q5000 and Q20000 Series arrays are bipolar arrays. They use an internal ECL core (0.5V ECL) and can externally interface to either Schottky TTL, ECL 10K or to ECL 100K. AMCC arrays allow for the mixed mode operation of ECL/TTL on the same array, either ECL 10K/TTL or ECL 100K/TTL or all three. Only one type of ECL may be used for input on a single array. Both ECL types may be used for output on the same array.

Table 3-15 AMCC Power Supply Options

I/O MODE:               100% TTL, 100% ECL 10K, 100% ECL 100K, ECL10K/TTL, ECL100K/TTL
SINGLE POWER SUPPLY:    +5V, -5.2V, -4.5V
DUAL POWER SUPPLY:      +5V/-5.2V, +5V/-4.5V
100% ECL run with dual power supplies is called "DECL". Table 3-15, with the exception of "DECL", also applies to the Q24000 Series BiCMOS arrays. They have a CMOS core and bipolar I/O and they can interface to CMOS. The concept of a mixed ECL-TTL interface on a single array originated as a result of customer demand. The idea of operating ECL 10K at ECL 100K power supplies and vice versa was also the result of customer requests.
Example - communicating to the software AMCC uses dummy macros called chip macros that allow a user to specify precisely which array is to be used in which interface mode with which power supplies. (See Figure 3-4.) The chip macro communicates parameters to the AMCC MacroMatrix software modules that are performing design validation, including population and cell type limit checks. The array-specific checks use chip macro parameters to set limits for TTL outputs, Darlington outputs, simultaneously switching outputs, bidirectional macro counts, and other checks. The AMCCERC software can spot mismatched interface macros and exceeded macro type limits and issue appropriate error messages. It can also adjust the DC power module to use the correct power supply in the power computation.

Figure 3-4 A Chip Macro (1994)
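The kind of limit and mismatch checking described here can be illustrated with a small Python sketch. It is not the AMCCERC implementation or its message format: the ChipMacro fields, the macro "kind" labels and the error wording are all hypothetical, chosen only to show how parameters carried by a chip macro could drive array-specific checks.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ChipMacro:
    """Hypothetical stand-in for the parameters a chip macro carries."""
    array: str
    io_mode: str           # e.g. "100% TTL" or "ECL100K/TTL"
    ttl_output_limit: int
    ecl_output_limit: int


def check_interface(chip, macros):
    """macros is a list of (instance name, kind) pairs, where kind is one of
    'ttl_in', 'ttl_out', 'ecl_in', 'ecl_out'. Returns a list of error strings."""
    errors = []
    counts = Counter(kind for _name, kind in macros)
    if counts["ttl_out"] > chip.ttl_output_limit:
        errors.append(f"{chip.array}: {counts['ttl_out']} TTL outputs "
                      f"exceed limit of {chip.ttl_output_limit}")
    if counts["ecl_out"] > chip.ecl_output_limit:
        errors.append(f"{chip.array}: {counts['ecl_out']} ECL outputs "
                      f"exceed limit of {chip.ecl_output_limit}")
    # Flag interface macros whose family is absent from the declared I/O mode
    for family in ("TTL", "ECL"):
        key = family.lower()
        used = counts[f"{key}_in"] + counts[f"{key}_out"]
        if used and family not in chip.io_mode:
            errors.append(f"{chip.array}: {used} {family} interface macro(s) "
                          f"used in {chip.io_mode} mode")
    return errors


# Hypothetical design: 25 TTL outputs and one ECL input on a 100% TTL Q20010
chip = ChipMacro("Q20010", "100% TTL", ttl_output_limit=24, ecl_output_limit=50)
macros = [(f"OUT{i}", "ttl_out") for i in range(25)] + [("IN0", "ecl_in")]
for message in check_interface(chip, macros):
    print(message)
```

Here the design trips both checks: one TTL output too many, and an ECL input macro in a 100% TTL mode.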
Interface cell functionality Interface cells are designed to support TTL-translators, ECL-translators and most of the required buffers for external interfacing to both ECL and TTL. The amount of buffering, the capability of the cell to support high fan-out drivers, single-cell bidirectional macros, ECL output terminations to 25 or 50 ohms, and elementary logic possible in an interface cell varies by array series. For many of the arrays, the input macros also provide simple AND/NAND or OR/NOR logic or high fan-out driver operations. The output macros for TTL contain OR or NOR operations and those for ECL may contain these operations plus others as complex as a latch or a 2:1 MUX. This is in addition to level translation and buffering functions. The amount of logic contained within an interface macro is series-dependent; it is a function of the I/O cell complexity and the components available within the cell.
Variability in I/O design The various array series and even arrays within a series differ in their approach to interface. The following gives an idea of the choices that have existed on the arrays from one vendor. Similar variability and evolution can be traced for other vendors.

● The Q700 Series used unbuffered I (input-only) and I/O (input, output or bidirectional) cells that require a buffer for each input and each output macro. The buffer macros were placed on internal cells (L or B), reducing the L-cells available for internal logic functions. There was a D-cell on one array in the series to provide a pin-restrictive three-state enable driver that could drive more than eight loads. A bidirectional macro was composed of one interface and two buffer cells.
● The Q1500A array used I (input-only) and O (output-only) cells, with buffering either in the input or output macro or in a separate macro. The BExx macros were for ECL output buffering, for example, and were placed on a B cell. TTL input buffers were part of the input macro that was placed on an I cell. Bidirectionals were constructed from two macros on two adjacent cells using the same methods now used on the Q14000 Series arrays.
● The QH1500A array used I and I/O cells, with buffering included in the input, output, and bidirectional macros; this was the first time all buffers were removed from the internal cell area. The I/O cell could support single-cell bidirectional macros.
● The Q3500 and Q5000 Series use I/O cells only, with buffering included in the input, output and bidirectional macros.
● The Q1500, Q3500 and Q5000 Series also provide unbuffered ECL input and the buffered logic macros to support it. The BIxx series macros are made up of representative logical functions from the rest of the macro library (gates, EXOR networks, latches, flip/flops, MUXs and decoders) and also include the ECL input buffering function on selected input pins. The BIxx macros are placed on internal cells (L or B). The selected pins are pin-restricted to be driven by any unbuffered ECL input macro.
● The unbuffered ECL input macro does not suffer any degradation in speed due to loading delay, the only macro to behave in this manner. It can drive eight loads. Load capacitance presented to the source driving the unbuffered ECL input increases by 1 pF per fan-out.
● The Q14000 Series uses I/O cells, with buffering and logic as is used in the Q5000 Series. Single-cell bidirectional macros can only be used on the Q9100B or Q2100B and then only in specific "special-I/O" cell locations. Additional bidirectional macros must be built from one input and one output macro.
● The Q20000 Series uses I/O cells, with buffering but no logic functions. TTL outputs (output macros and bidirectional macros) are limited to a number that varies per array. ECL outputs are also limited. The bidirectional macros use two cells and provide an added ground pad by using the left-over pad.
● Most 25 ohm termination macros require two I/O cells. The Q20000 Series provides a single-cell 25 ohm termination macro but limits its use to arrays using two power supplies. Darlington macros are limited to arrays with two power supplies.
● The Q20000 Series uses four fixed I/O signals per array. These signals are used by the on-chip thermal diode (one anode and one cathode) and the on-chip AC speed monitor (one is power and the other is an output signal). These four pads and cells are not available for use with any other function.
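For the Q20000 Series, the two-cell macros noted above mean that I/O cell demand is not simply the signal count. A minimal Python sketch of the budget follows. It assumes that each plain input or output macro occupies one I/O cell and that each added power or ground macro also takes one I/O cell; both are simplifying assumptions for illustration, and the function and parameter names are hypothetical.

```python
def io_cells_needed(inputs, outputs, bidirectionals=0,
                    two_cell_25ohm_outputs=0, added_power_ground=0):
    """Estimate I/O cells for a Q20000-style array.

    Assumptions (illustrative only): one cell per plain input or output macro,
    two cells per bidirectional macro, two cells per two-cell 25 ohm
    termination macro, one cell per added power or ground macro.
    """
    return (inputs
            + outputs
            + 2 * bidirectionals
            + 2 * two_cell_25ohm_outputs
            + added_power_ground)


# Hypothetical interface: 40 inputs, 30 outputs, a 16-bit bidirectional bus,
# four 25 ohm outputs and six added power/ground pads
needed = io_cells_needed(40, 30, bidirectionals=16,
                         two_cell_25ohm_outputs=4, added_power_ground=6)
print(needed)
```

The 90 signals in this example consume 116 I/O cells, so the comparison against an array's signal I/O cell count should use the cell total rather than the pin count.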
Bidirectional macros Bidirectional macros can be two-pin, one-pin, one-cell or two-cell macros. If an array series has no bidirectional macros, they may need to be constructed. Watch out for incompatibility with the workstations - a work-around may be required for proper simulation of bidirectional macros. If more bidirectional macros are needed, they are constructed from two macros, one input and one output, and placed on two adjacent I/O cells. The two macros can be tied together into one package pin, but this requires two test vector sets, one for wafer sort and one for packaged part testing. They are usually tied together outside of the package to keep testing simplified, but this requires two package pins. A third approach not liked by the array vendors is to stitch two macros together in the interconnect so that only one pad and one pin are used. Anytime that hand-edits or customization of the interconnect or base is involved, both time and money are required, and debugging time may need to be increased.
Examples The Q20000 Series arrays support a bidirectional macro that sits on two I/O cells, unlike the single-cell approach of the Q5000 Series. In this case, the internal macro routing eliminates the need for two sets of test vectors or an extra bonded-out pad. Each bidirectional macro also contains either an IEVCC pad (ECL VCC) or an ITGND (TTL GROUND) pad. (Refer to "Added power and grounds" for a discussion of pad-plane interconnections for added power and ground pads.)
Internal Cell Functionality The logic (bipolar) and basic (BiCMOS) cells are organized to provide logic functions such as basic logic gates and buffers, high-fan-out drivers, EXOR and EXNOR networks, gate networks, multiplexors, decoders, latches, and flip/flops. These cells can support a 3:1 MUX-D flip/flop combination, triple latch-common clock, triple 2:1 MUX-common select and dual D F/Fs. As stated before (see "Cell structure"), the number of cells required for any of these functions will vary by array series. The number of cells required to implement a function depends on the component mix present in the cell and that required by the function. Arrays are designed for a specific set of applications or targets and base array design is optimized for those applications. An array cell size may be divisible so that half-cell macros are possible, which also allows sizes such as 4.5 cells. A cell may be designated as the smallest divisible or addressable unit (SAU), in which case a one-cell macro is the smallest macro allowed.
Multi-Cell Macros Groups of internal and/or interface cells can also be combined into large multi-cell macros for higher functionality. The larger multi-cell macros, named MSI macros by AMCC, interconnect components spread across several cells more efficiently than the schematic interconnection of the equivalent function formed from basic macros. The result is a denser functionality with the resultant speed improvement. Design density, measured by the cell utilization per functionality, can be increased by 20-40% while reducing design partitioning and macro conversion efforts. The large MSI macros include MSI and LSI functions.
Example Examples of MSI macros are 6-bit comparators, 4-bit carry-look-ahead adders and their companion carry-look-ahead generator, 4-bit up and down counters, 4-bit registers and 8-bit latches. Different array series offer different MSI macros. The simple and MSI macros available with a specific array series are documented, along with any use or placement restrictions, in the appropriate Design Guide or Design Manual. Always refer to the latest version of these manuals when performing an evaluation.
Hard and soft macros There are two types of MSI or multi-cell macros. One type is hard, where the cell interconnect is treated as one large macro and no variations in layout are permitted. The other type is soft, where the cells composing the macro have a preferred, specified-to layout pattern but the interconnect must be routed as if it were any other interconnect net. The MSI macros in the Q5000 Series were originally designed to allow placement in several different configurations to facilitate the auto-place algorithm (best-fit approach), while closely maintaining the specified performance for the macro. This is a soft macro. A preferred placement is documented. The problems of improper placement, which invalidates the timing specifications and therefore the simulation model, and of net weighting and prioritizing the internal nets to the router so that the interconnect delay can be kept minimal, make the soft MSI macro approach unattractive. Both the BiCMOS Q14000 and Q20000 Series MSI macros use hard macros, where an MSI macro is laid out as a single multi-cell unit and handled by the placement software as an inflexible black box. Hard macros facilitate automated placement. Future AMCC arrays will use the hard macro approach. Figure 3-5 shows an MSI-based 16-bit adder.

Figure 3-5 16-Bit MSI Adder (1994)
Refining Interface Requirements When the interface types and their power supply requirements are documented and one or more arrays chosen as candidates for the final selection, the interface requirements must be refined. There are several conditions under which additional power or ground pads will need to be added to an array beyond the fixed power and ground pads provided. These include:

● simultaneously switching outputs,
● package restrictions,
● high-speed signal isolation and
● ECL - TTL isolation.
Simultaneously switching TTL or ECL outputs are a potential source of system noise, which can be reduced by the addition of TTL VCC - TTL Ground pairs and/or ECL VCC. Some arrays require that drivers be placed next to ground. Others require that a ground exist between simultaneously switching TTL outputs and ECL inputs, or between any TTL output and an ECL input. Isolation of CMOS inputs from the faster switching TTL and ECL signals may also be required. When a fixed ground is not available, then one must be added. The design rules for any array series are called out in the Design Manual for the array.
Variable Requirements for Power and Ground
Bipolar arrays require that all fixed power and ground be used or bonded out to the package. Additional power and grounds are based on simultaneously switching outputs or isolation requirements. CMOS arrays have some or all of their fixed power and ground pads under user-placement control. The vendor provided a list of how many would need to be used depending on the signals used by the design. This type of flexibility is detrimental to standard packaging; it is time consuming and expensive. In spite of the drawbacks, recent BiCMOS designs have returned to this approach, providing the minimal number of power and grounds and allowing other fixed-position power and grounds to go unbonded (unconnected). The criteria for requiring that these fixed positions be used or that additional power and grounds be added are based on the number and types of interface macros used. When the power busses supporting the internal core are isolated from the busses supporting the peripheral I, O or I/O cells, noise feedback due to output switching is minimized. The threshold and reference voltage generators for the logic array internal cells and I, O and I/O cells should also be independent to ensure steady operation.
Adding Extra Power and Ground pads
Adding a power pad or a ground pad to an array can be accomplished by placing a power or ground macro on the desired pad (an array-specific procedure). AMCC arrays use the ITPWR (+5V), ITGND (0V) and IEVCC (ECL VCC) macros to add power or ground. (See Figure 3-6.) For standard reference ECL, IEVCC represents a ground pad. For +5V REF ECL, IEVCC represents a power pad.
Figure 3-6 Added power and Ground macros (AMCC)
Dual-Function I/O Macros
Each added power and ground macro uses a pad and disables the cell that is associated with that pad, reducing the number of these cells and pads available for I/O operations. To offset this waste, many macro libraries include dual-function macros that use the I/O cell for one function and the pad for added ground. Silicon efficiency can be achieved with the dual-function macros. The macros available are array series-specific and vary widely. If any of these functions applies to the design, they can reduce silicon requirements while maintaining functionality. (See Figure 3-7.) Example macros include:
● input function with 3-state enable driver
● 3-state enable driver with added ground
● bidirectional input with added ground
Figure 3-7 Example Dual-Function I/O Macro
Example - Simultaneously Switching Outputs
All AMCC arrays, with the exception of the Q20000 Bipolar Series and the BiCMOS Q24008 array, use the following rules for adding power and ground due to simultaneously switching outputs (SSO), called an output group. Allow 8 TTL SSO outputs per quadrant, then add one TTLPWR and one TTLGND macro for each group of 1-8 after the first eight. Each pair requires two cells, two pads and, depending on the package, two package pins. Add another pair for the next group of 1-8 and another for the next group of 1-8 and so on. All TTL output counts are converted to "equivalent" 8 mA outputs. (See Table 3-16.) For packages with internal power and ground planes, place the TTLPWR and TTLGND macros so that they are interspersed with the simultaneously switching outputs and can be bonded to the power or ground package plane.

Table 3-16 Sample Rules for Adding TTL Power and Ground

| PER TTL SSO | ADD TTLPWR, TTLGND PAIRS |
|---|---|
| 0-8 | do nothing |
| 9-16 | add 1 pair |
| 17-24 | add 2 pairs |
| Etc. | |

Allow 8 ECL SSO outputs per quadrant, then add one ECLVCC macro for each group of 1-8 after the first eight. Each added macro requires one cell, one pad and, depending on the package, one package pin. Add another ECLVCC macro for the next group of 1-8 and another for the next group and so on. For packages with internal power and ground planes, place the ECLVCC macro so that it is interspersed with the simultaneously switching outputs and can be bonded to the power or ground package plane as required. Note that ECLVCC is a power pad in a +5V reference ECL circuit (5V REF ECL) and a ground pad in a standard reference ECL circuit. (See Table 3-17.)

Table 3-17 Sample Rules for Adding ECL Power OR Ground

| PER ECL SSO | ADD ECLVCC | Q20000 Rules |
|---|---|---|
| 0-4 | do nothing | do nothing |
| 5-8 | do nothing | add 1 |
| 9-12 | add 1 | add 2 |
| 13-16 | add 1 | add 3 |
| 17-20 | add 2 | add 4 |
| 21-24 | add 2 | add 5 |
| Etc. | Etc. | Etc. |

The Q20000 Series requires one ECLVCC per additional 1-4 ECL SSO after the first group of four. All output counts are converted to "equivalent" 50 ohm outputs. The extremely high speeds of these arrays require design procedures to ensure minimal noise.
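The sample rules in Tables 3-16 and 3-17 follow one pattern: the first group of outputs is free, and every further full or partial group costs one added TTLPWR/TTLGND pair or one ECLVCC macro. The Python sketch below illustrates that arithmetic only. The function names are hypothetical and not part of any AMCC tool, and the counts are assumed to be already converted to "equivalent" 8 mA TTL or 50 ohm ECL outputs. The Design Manual for the array remains the authority.

```python
def added_groups(sso_count, group_size):
    """Added supply groups: the first group of outputs is free; each further
    full or partial group of `group_size` outputs costs one addition."""
    if sso_count < 0 or group_size < 1:
        raise ValueError("sso_count must be >= 0 and group_size >= 1")
    extra = max(0, sso_count - group_size)
    return -(-extra // group_size)  # integer ceiling division


def ttl_added_pairs(ttl_sso):
    """TTLPWR/TTLGND pairs per the sample rule of Table 3-16 (groups of 8)."""
    return added_groups(ttl_sso, 8)


def ecl_added_eclvcc(ecl_sso, q20000=False):
    """ECLVCC macros per the sample rule of Table 3-17: groups of 8,
    or groups of 4 under the Q20000 Series rules."""
    return added_groups(ecl_sso, 4 if q20000 else 8)


if __name__ == "__main__":
    for n in (8, 9, 16, 24):
        print(n, ttl_added_pairs(n), ecl_added_eclvcc(n),
              ecl_added_eclvcc(n, q20000=True))
```

For 24 equivalent outputs the sketch gives 2 TTL pairs, 2 ECLVCC macros under the general rule and 5 under the Q20000 rule, matching the last rows of the two tables.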
Thermal Diodes
As the arrays have become larger and dissipate more power, thermal characterization becomes an increasingly important issue. Some means of evaluating array junction temperature must be developed for each array series. For some of these series, macros have been developed that allow the designer to add one or more thermal diodes to the design. The macros are treated as any other macro and are placed on interface cells. Newer arrays, such as the Q20000 Series, have thermal diodes built into the base array. The Q20000 Series arrays have a thermal diode structure embedded in the base and brought out to dedicated or fixed pads. These pads must be brought out to package pins. These pads are not accessible to any other macro function.
Example - AMCC thermal diodes (1994)
Thermal diode macros exist for the Q14000 and Q5000 Series libraries and the designer is required to add one thermal diode macro pair per circuit. Using more than one was found to be unnecessary as the thermal gradient across the chips was found to be insignificant. Where there might be doubt, additional thermal diode pairs can be added. Each pair uses two I/O cells. (See Figure 3-8.) One earlier version of the implementation also used one internal cell. No differences were found to exist between these two versions. Thermal diode macros also exist for the Q20000 Series for those cases where a second thermal diode measurement is felt to be necessary.
Figure 3-8 Thermal Diode Pair
The AMCC AC Speed Monitor
AC testing is a problem for both the designer and the vendor. To reduce the problems associated with it, each Q20000 Series array has a built-in AC speed monitor with two fixed pads assigned to it. These pads must be brought out to package pins.
Threshold generators - routable generators
The designer is not usually concerned with the threshold generators. In cases where they are required, they may only need identification and routing connections rather than actual cell placement.
VBB Reference voltages
There are some instances where VBB reference voltages are desired, where I/O utilization is high and the designer is using single-rail ECL where differential ECL is required. These reference voltages are supplied with a macro and are placed on an interface cell. They will connect to external package pins.
Speed and testing interface cell utilization
Maximum speed of operation and testing requirements will have an effect on the final interface cell count. For very high speeds, differential ECL may be required by the array vendor, doubling the cell and pad counts of those signals. Testing may require that parts of the circuit are degated while other parts are being tested. This will occur when a simultaneously switching group is very large, including the simultaneous enable-disable of three-state or bidirectional macros. Test enables may be required to partition the circuit for testing, and test enables will use cells and pads.
Population or cell type limits and utilization
Where population restrictions exist, circumvention of the limits may include the addition of interface macros. For example, a single-cell bidirectional macro limit would result in two-cell bidirectional macros being used for additional bidirectional signals. The single-cell 25 ohm ECL termination, if dual power supplies are not available, would be replaced by two-cell 25 ohm terminations.
Placement restrictions
High-frequency signals in particular will often require placement in specific cell locations and require that these macros be isolated with added grounds. Added grounds use pads and disable the accompanying cell. Where placement restrictions require the addition of macros or a change in the macros selected, the effects on cell utilization must be anticipated in the initial estimate.
Final Interface Cell Utilization
The final interface cell count for the circuit in its estimated stage should look at all the factors that could increase interface cell requirements. The interface cell utilization for a non-captured circuit should be less than 100% if possible to allow for adjustments and expansion. If this is not possible, then the rest of the design must be completed using I/O cell utilization minimization as a priority design objective. In the ideal situation, an array chosen for a design should be somewhere in the middle of an array series. This is to provide a smaller option if I/O minimization can reduce the requirements and to provide a larger option should the interface requirements grow out of the original selection. If not, then the interface utilization should be no more than 90% during development, with no more than 100% interface utilization for the final design.
Interface Cell Utilization (general)
To find interface cell utilization, add the items in the list in Table 3-18.

Table 3-18 Interface Cell Utilization
● cells for input signals
● cells for output signals
● cells for bidirectional signals
● cells for thermal diodes (I/O)
● cells for AC speed monitor (I/O)
● cells for reference generators
● cells blocked by added power pads
● cells blocked by added ground pads
● cells dedicated to fixed I/O signals

Divide this sum by the number of interface cells available on the array of choice.
Interface cell utilization = (number of interface cells used by the circuit) / (number of interface cells available on the array)
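The bookkeeping of Table 3-18 can be sketched in a few lines of Python. This is an illustration only: the function name, the category labels and every number in the example are hypothetical and do not describe any particular array.

```python
def interface_cell_utilization(cells_used_by_item, cells_available):
    """Sum the per-category interface cell counts (the items of Table 3-18)
    and divide by the interface cells available on the array."""
    if cells_available <= 0:
        raise ValueError("cells_available must be positive")
    return sum(cells_used_by_item.values()) / cells_available


# Hypothetical circuit on a hypothetical array with 100 interface cells.
example_counts = {
    "input signals": 40,
    "output signals": 30,
    "bidirectional signals": 8,
    "thermal diodes": 2,
    "AC speed monitor": 0,
    "reference generators": 1,
    "added power pads": 2,
    "added ground pads": 4,
    "fixed I/O signals": 3,
}

if __name__ == "__main__":
    ratio = interface_cell_utilization(example_counts, 100)
    print(f"interface cell utilization = {ratio:.0%}")
```

With these made-up counts the circuit uses 90 of 100 interface cells, which sits at the 90% ceiling suggested above for a design still in development.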
Example - BiCMOS Cell/PAD Utilization
When an array does not have a one-to-one ratio of I/O cells and pads, then PAD utilization may also be required. The Q24008 and Q24280 arrays have 2-cell-1-pad structures. Certain macros placed on these structures are very efficient, others are not. Depending on the macros used, single-cell or multi-cell, either pads or cells may be rendered inaccessible. These arrays have a complex algorithm available to allow sizing. The algorithm requires a check on both cells and pads.
PAD utilization = (number of PADS used by the circuit) / (number of PADS available on the array)
Fan-out load limits
Internal cell usage will depend on the macros required to implement the desired functions. Refinements to that estimate come when the fan-out load limits, hook-up and pin restrictions for those macros are evaluated. If an interface macro is driving too many loads, internal macro buffers will be needed to divide that load or additional interface macros will be needed. If internal macros are driving too many loads, the same approach is used. These buffer trees use cells and current. Macros will be specified with both fan-in and fan-out load limits. The fan-in numbers represent the load that the macro presents to the macro driving it. The fan-out limit is the number of loads that the macro can safely drive before signal degradation becomes a predominant factor. A load unit can be considered to be equivalent to one picofarad. Check with the array vendor for their definition.
Derated fan-out load limits
Clock paths, distortion-sensitive and high-speed paths should be designed with a derated fan-out load limit, i.e., with macros operating well below their specified limits. The array may be specified with a guideline as to the frequency-derating schedule. Each AMCC array series is different in the value of the breakpoint frequency but each has the same basic rule. For sensitive and clock paths, derate the fan-out load limit by 20% up to the breakpoint and 40% at or above the breakpoint frequency.
Example - fan-out derating
For the Q20000 Series, all internal macros have the Turbo speed enhancement, allowing a fan-out load limit of 18 loads. The TTL input and bidirectional input macros are the only interface macros that do not have this Turbo enhancement and their fan-out load limit is 9 loads. Assume that the breakpoint frequency is currently set at 400MHz. For an ECL input toggling at 500MHz, the derated fan-out load limit would be: (1.0 - 0.4) * 18 = 10.8, truncated to 10 loads.
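The derating rule can be written as a short Python sketch. The function name is hypothetical. The 20% and 40% figures and the truncation follow the rule and the worked example above, and integer arithmetic is used so that the truncation does not depend on floating-point rounding.

```python
def derated_fanout_limit(rated_limit, freq_mhz, breakpoint_mhz):
    """Derate a fan-out load limit for a clock or sensitive path:
    20% below the breakpoint frequency, 40% at or above it.
    The result is truncated to a whole number of loads."""
    derate_percent = 40 if freq_mhz >= breakpoint_mhz else 20
    return rated_limit * (100 - derate_percent) // 100


if __name__ == "__main__":
    # Q20000 example from the text: 18-load macro, 500MHz, 400MHz breakpoint.
    print(derated_fanout_limit(18, 500, 400))  # 10
```

The same macro toggling below the breakpoint would be derated by 20% only, giving 14 loads.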
Drivers
Special driver macros may be provided in a library. These "super-drivers" are not derated. They are designed to provide a clean edge even when loaded to their rated limit. These drivers will use more current and more cells than the non-driver macros, but fewer of them are required to drive the same load. The result may be the same cell utilization and the same power. Another feature of drivers should be considered. When timing analysis is performed, the super-drivers and drivers will be seen to have a lower k-factor (drive factor) than the non-driver macros, resulting in lower inter-macro delays for the same load than a non-driver macro could provide. Drivers may be interface macros or internal macros.
Hook-up or interconnect restrictions
Hook-up is used here to define the rules on grounding an input pin to a macro. CMOS and BiCMOS technologies require that all unused macro input pins (non-primary array inputs) be clipped to VDD or VSS, no exceptions. Bipolar technologies allow the unused input pins to be tied to global ground. The ground symbol on the schematic is for human comprehension and to allow checking software to understand that the designer meant to leave the pin unattached. For some arrays, a macro input pin connected to global ground on a schematic will mean that the pin "floats", or is unattached to anything when silicon is built. For others, these pins are physically attached to a confirmed logical low by connecting to a rail (CMOS) or by strapping the base to the emitter (bipolar) through conditional geometry. For the Q20000 Series these pins are base inputs to transistors and when unused are tied to the emitter to ensure a logical low. For the Q5000 Series, the pins were allowed to float. Whether or not the pins are allowed to float, there will be cases where specific macro pins are restricted, i.e., these pins cannot be attached to global ground but must be driven low by another macro. This is a hook-up restriction. When hook-up restrictions exist, some macro must be added to the schematic to drive these pins low (or high). The number added will depend on the number of loads that must be driven low or high.
Pin restrictions - interconnect restrictions
Some macros are pin-restricted in that they may not be freely connected to any other macro but must be driven by or drive a specific class of macro. As an example, TTL three-state outputs and TTL bidirectional macros in some macro libraries must have their enable pins driven by a macro known as a three-state enable driver. No other macro may drive that enable pin. The three-state enable drivers can only be connected to drive these specific pins; they may not be used to drive other macros. In the Q5000 library, three-state enable drivers may only be placed on interface I/O cells, even when they are driven by internal signals, leaving the pad unused in this case. When pin restrictions cause the use of specific macros and these macros have restricted placements, the impact on cell utilization must be considered.
Internal cell utilization
When the paths have all been checked for fan-out, pin restrictions, hook-up restrictions, placement rules, etc., the internal cell utilization can be estimated. As stated in Chapter 2, this is the sum of all the internal cells used divided by the number of internal cells available.
Internal cell utilization = (number of internal cells used by the circuit) / (number of internal cells available on the array)
Further changes
Other factors that can change the estimated cell utilization include adjustments made for power reduction, for speed enhancement, or for cell utilization reduction for either interface or internal cells.
Exercises
1. Select a semi-custom array series (any). List:
● the processing technology available
● power supply configurations
● types of TTL input and outputs allowed
● types of ECL input and output allowed
● how bidirectional macros are handled
2. For the selected series, what cell usage restrictions exist?
● a. Any limits on inputs
● b. Any limits on outputs
❍ TTL
❍ ECL
● c. Any limits on bidirectionals
● d. Any rules for simultaneously switching outputs
● Are the rules easy to find?
3. For the selected series, how many fixed power and ground pads are on each array in the series? How are additional power and ground pads added?
4. For the selected series, what types of cells are available on each array and how many of each type?
5. How many internal cells would be required by the selected array series macros to implement an 8-bit barrel shift register (eight 2:1 MUXs with eight 4:1 MUXs, eight D flip/flops)?
6. Given a 16-bit fast adder design using carry-look-ahead, 16 DATAA and 16 DATAB inputs, necessary controls (clock, reset, carry-in), a registered output, 17 outputs (sum plus carry out), size the design for the macro library for the selected array series. Assume a COMMERCIAL environment, single -5.2V power supply, ECL is ECL 10K or ECL 10KH. Fast adder: four 4-bit fast adders with carry-propagate outputs; one 4-bit carry-look-ahead unit; 17 D flip/flops; 35 ECL inputs; 17 outputs; buffers and gates as required; added power/ground as required.
7. Given a 32-bit register, 35 ECL inputs (32 data, clock, reset, 3-state enable), dual ECL-TTL outputs (32 TTL 3-state and 32 ECL, same signals), size the design for the selected array series. Assume a MILITARY environment, dual power supplies of +5V and -5.2V, ECL is ECL 10K or ECL 10KH. Register: 32 D flip/flops, 35 ECL inputs; 64 ECL outputs; buffers and gates as required; added power and ground as required.
Case Study: Sizing A Design
TARGET ARRAY: AMCC's Q20080 {Based on 1994 data}
The following exercise is not intended as a practical circuit for actual construction on an array; however, it will examine nearly every design rule and restriction for the example array series. It will be solved here using a Q20080 array as the intended target solution but could be solved with any macro library provided one of the supported arrays in that series can accommodate approximately 160 I/O signals and toggle at 500MHz. See Figure A-1.
THE DESIGN
Using the following list of requirements, design a circuit using AMCC macros for the Q20000 Series and size the design to fit the Q20080 array in that series:
● A pipelined structure two flip/flops deep is to be 32 bits wide.
● Each data input to the first flip/flop stage is to be driven by a 2:1 MUX, the inputs of which are driven by ECL 10KH inputs.
● All flip/flops are required to be reset by way of a master reset signal.
● The common clock is to be a differential signal, if possible.
● All 32 multiplexors are to have a common select.
● The target maximum speed of operation is 500MHz. (Design Objective.)
● All dataA inputs (32 of them) are to be fed in groups of four into two 16:1 multiplexors. There are four common select lines for the two 16:1 multiplexors and two outputs, controlled by enables (one per signal).
● All input signals, data and controls, are to be fed into a parity tree, a gate tree that will produce a single output. This structure is to be used for parametric testing.
● A six-bit pass-through bus (input to output without logic) is included which uses ECL inputs and outputs.
● The flip/flop output stage is connected to non-Darlington ECL 10KH outputs. Both true and complementary outputs are to be brought out to external pins.
● This is a military, standard reference ECL -5.2V single-supply circuit.
Note: Keep your data. This problem or a similar one will be referred to in other chapters.
Exercise
Review the selected design manual, select macros and compute cell utilization. Pick an array from the series that would fit the design. Perform all required population checking for that series.
SOLUTION - Q20000
Check for I/O mode and power supply. This is a 100% ECL circuit and uses no Darlingtons so that a single -5.2V supply is allowed. The AMCC chip macro is Q20080ECL10K, which sets the I/O mode at 100% ECL with ECL 10K/KH inputs. The power supply parameter is set at STD5 for standard reference -5.2V supply. The product grade parameter is set at MIL for military. Between them, these parameters define this circuit as a MIL5 circuit, using the MIL5 library and annotation data. The chip macro is shown in Figure A-2.
Figure A-2 AMCC Icon for the Chip Macro
Selecting a flip/flop - first pass
The need for a master reset will reduce the set of available flip/flop macros that could be used to those with a synchronous or asynchronous reset (or set). The use of a 2:1 MUX - flip/flop combination will further reduce the choices for the first stage of the circuit. For the chosen Q20000 macro library, FF46S is a D flip/flop with a 2:1 MUX on the data input and an asynchronous reset. It is more silicon-efficient to use a combination MUX-F/F macro than to implement the design with individual multiplexor and flip/flop macros. The second stage flip/flop needs a reset and at this stage in the design process needs both Q and QN outputs. FF10S was chosen as the appropriate macro. See Figure A-3.
Figure A-3 MUX and two F/Fs in Two Macros
Selecting the ECL input
All inputs (reset, selects, output enables and data) except the clock will use the IE93S, a simple buffered input that produces both Y and YN outputs shown in Figure A-4. The YN output will be used to feed the gate tree to keep loading off the Y path. To reduce power, the IE94 version with only the Y output could have been chosen. This option would use three loads on the Y path, two to the main circuit (register input and 16:1 MUX input) and one to the parametric tree.
Figure A-4 Output Macro with Complementary Outputs
For this circuit, the saving of one load is not significant in that the loads are not in the critical path. In another instance, the reduction of one load could be the difference between meeting or failing specification. There are 64 data inputs (32 dataA and 32 dataB), one reset, one select for the input 2:1 MUX and four selects for the 16:1 MUXs, two output enables and six pass-through inputs, for a total of 78 IE93S macros. Each macro uses one I/O cell and one pad. (See Table A-1.)

Table A-1 Required IE93S Inputs

| Count | Input group |
|---|---|
| 32 | data A |
| 32 | data B |
| 1 | reset control |
| 5 | data MUX control select |
| 2 | output enables (MUX outputs) |
| 6 | pass-through inputs |
| 78 | IE93S inputs (total) |

Clock input
The clock input will use IE34H, a differential high-speed input with a Y and YN output. For CML-compatible input, use IE31H. The clock will have two loads. It uses two I/O cells and two pads. The clock is in the critical path. Other options that could be considered include the use of the driver version of the differential input, IE32D. The driver handles 32 loads and has k-factors with less skew than those of the H-option IE34H. If the IE34H proves to be too slow or the inter-macro delays too long, the IE32D would be the choice for a speed upgrade. The driver is shown in Figure A-5.
Figure A-5 Differential Input Macro
ECL outputs - first pass
All outputs in the initial version of the circuit were the OE42S, a cut-off (ECL output with an enable) macro used with the enable tied low (always on) except for the two controlled outputs. (This macro was the only 50 ohm non-Darlington termination in the initial release of the library.) The (111) version of the library added OE11S, a NOR-input 50 ohm termination, rated for 350MHz. The other option is to have a custom 50 ohm macro created, not worth the effort for the case study but something that should be reviewed in a real circuit where power and cell space are at a premium. The OE42S enable is tied low by way of the GT87D static driver, a macro that supplies steady HIGH and LOW signals when unused macro pins cannot be "clipped" low or allowed to float. There will be 64 data outputs for the pipeline, six outputs for the pass-through signals, two MUX outputs and an output for the parametric gate tree for a total of 73 outputs. Each OE42S uses one I/O cell and one pad. The fan-out load limit for the GT87D is 50 loads so two will be required to supply the OE42S enable pins in this first version of the design. The basic module is shown in Figure A-6. Note that the OE11S is easier to use and uses less power - reasons to consider challenging the initial solution.
16:1 MUX
The 16:1 MUX is constructed from five MX21S macros, each a 4:1 MUX with two selects. This is the largest multiplexor in the first release. Four of these will feed into the fifth to form the 16:1 MUX structure. Since there are two 16:1 MUXs, there will be 10 MX21S macros required. An 8:1 or 16:1 MUX MSI macro would simplify the design. The basic design is shown in Figure A-7.
Figure A-7 Schematic Page for the 16:1 MUX
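The macro count for a multiplexor built as a tree of smaller multiplexors can be checked with a short Python sketch. The function name is hypothetical, and the sketch only counts macros level by level. It does not model how the select lines are shared between levels.

```python
def mux_macros_needed(num_inputs, mux_width=4):
    """Count the `mux_width`:1 macros needed to build a `num_inputs`:1
    multiplexor as a tree, one level at a time."""
    if num_inputs < 1 or mux_width < 2:
        raise ValueError("num_inputs must be >= 1 and mux_width >= 2")
    macros = 0
    signals = num_inputs
    while signals > 1:
        stage = -(-signals // mux_width)  # macros in this level (ceiling)
        macros += stage
        signals = stage                   # each macro produces one output
    return macros


if __name__ == "__main__":
    per_mux = mux_macros_needed(16, mux_width=4)
    print(per_mux, 2 * per_mux)  # 5 10
```

For a 16:1 MUX built from 4:1 macros this gives four first-level macros plus one second-level macro, and ten macros for the two MUXs in the case study.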
Parity tree
A parity tree of all inputs (required for parametric VIL, VIH testing) can be formed from NOR gates using the GT60L or GT60S, an 8-input NOR macro. The L-option is slower and uses less power. The speed of the gate tree is not important since testing is functional at 100ns intervals. The first estimate for the tree is to use eleven GT60S macros in a three-level structure to accommodate the 79 input signals. (The 78 data signals plus the clock are required.) The parity tree is shown in Figure A-8.
Figure A-8 Parity Tree
REVIEW STATUS SO FAR
The first sizing estimate provides the cell counts shown in Table A-2.

Table A-2 First Sizing Estimates

| # MACROS | Macro | # I/O Cells Required |
|---|---|---|
| 78 | IE93S | 78 |
| 73 | OE42S | 73 |
| 1 | IE31H | 2 |
| | TOTAL I/O CELLS: | 153 |

| # MACROS | Macro | # L Cells Required |
|---|---|---|
| 10 | MX21S | 20 |
| 11 | GT60L | 33 |
| 32 | FF10S | 96 |
| 32 | FF46S | 96 |
| | TOTAL L CELLS: | 245 |

The number of macros is not the same as the number of cells, even for the I/O macros.
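The totals in Table A-2 are simple sums of macro count times cells per macro, so they are easy to recheck whenever a macro is swapped. The Python sketch below does this for the first-pass estimate. The cells-per-macro values are inferred by dividing each cell total in the table by its macro count, and the names used are illustrative only.

```python
# (macro name, number of macros, cells per macro); cells per macro is
# inferred from Table A-2 as (cells required) / (number of macros).
IO_MACROS = [("IE93S", 78, 1), ("OE42S", 73, 1), ("IE31H", 1, 2)]
L_MACROS = [("MX21S", 10, 2), ("GT60L", 11, 3),
            ("FF10S", 32, 3), ("FF46S", 32, 3)]


def total_cells(rows):
    """Sum of macro count times cells per macro over a macro list."""
    return sum(count * cells_each for _name, count, cells_each in rows)


if __name__ == "__main__":
    print("I/O cells:", total_cells(IO_MACROS))  # 153
    print("L cells:  ", total_cells(L_MACROS))   # 245
```

Keeping the estimate in this form makes the later passes (added grounds, buffers, drivers) a matter of appending rows and rerunning the sums.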
Exercise
Check the cell counts against the current design manual for the Q20000 Series. Check for new MSI macros or new I/O macros that might be used in place of those selected (such as OE11S). Consider size, speed and power in making changes. (Changes should be made!) If you are designing with a different array series, create the same table for the chosen library.
SIMULTANEOUSLY SWITCHING OUTPUTS
Since 64 outputs are switching simultaneously in the worst case (master reset is one example), additional IEVCC macros (added ground) will be required according to the Q20000 Series design rules. A total of 16 IEVCC macros is required for these outputs and each blocks off one I/O cell and uses one pad. This is the minimum number of added power and grounds recommended for worst-case conditions.
Adding two more outputs for the 16:1 MUX Y outputs, six for the pass-through and one for the gate tree, requires two more IEVCC macros.
● If the outputs switch within one macro delay (or within 2 ns, whichever is larger) of the other switching group, additional IEVCC is required.
● If they switch well separated in time from the other group, then the added IEVCC for this group will not be required.
By tagging the switching groups and the added power and ground macros that belong to the groups with a SWGROUP parameter or property, the AMCC MacroMatrix can check for sufficient added power and grounds. For this design, assume that the groups are not simultaneously switching more than 32 signals, allowing a reduction in added ground. Allowing 8 IEVCC for the pipeline outputs (switching group AAA) and one for the rest of the circuit (switching group BBB), nine IEVCC macros are required. Adding these 9 IEVCC macros to the previous counts (153 + 9), the number of I/O cells used is 162. This is exactly the number of I/O cells available for circuit use on the Q20080 array. (This does not count the four fixed I/O signals for the AC Speed Monitor and the thermal diode that have pre-assigned PADs.) The added ground macro is shown in Figure A-9.
Figure A-9 Added IEVCC Macro
Note: Using less than the recommended number of added grounds is not a good idea. It will require engineering approval before design submission and could cause other problems later. Think about another solution!
FAN-OUT LOADS
The final step toward an estimate of circuit size requires that fan-out loads be examined. Most macros in the Q20000 library will have a fan-in of one except for H-option macros that will have a higher fan-in (and larger cell size). This is not always the case but should be considered when examining macro options.
Select lines for 16:1 MUX
Select lines to each 16:1 MUX have at most four loads. No buffering is required for the IE93S macros that can drive 18 loads each.
Select lines to 2:1 MUX structure
The select to the 2:1 MUX structure has 32 loads and will need buffering. One macro can drive 18 loads; adding a gate buffer tree such as two GT09S macros allows one primary input to drive 32 loads. (See Figure A-10.)
Figure A-10 Buffer Tree for the 2:1 MUX (32 Loads)
The other option is to switch the IE93S for an IE23D driver that can drive 32 loads directly. The IE23D driver uses twice as much current as an IE93S macro but would save the internal cells that the GT09S macros would have used.
Reset loading
RESET requires the same decision process. In this case, the signal goes to 64 flip/flops. The AR pin for the FF46S is two loads and the AR pin for the FF10S is one load, for a total of 96 loads. Either six GT09S macros or three GT55D macros can provide the drive. The GT55D driver uses twice as much current as a GT09S macro and is twice as large. Since half as many are required, on comparing cell usage and power these two solutions are equivalent. On the schematic, eight GT09S macros were used to simplify the schematic design (eight pages are replicated). (See Figure A-11.)
Figure A-11 Reset Signal Buffer Tree
RESET STRUCTURE - ONE OPTION
Reset structures are often treated as clock structures without the need for speed. This structure is only one level in depth. Current synthesis systems will create the necessary buffer trees to support the load being driven.
Clock
The clock is handled differently since all clock nets must be derated. There are 64 loads from the flip/flops, plus 1 load due to the parametric gate tree, for a total of 65 loads. The IE31H can drive 10 loads with a 40% derating. The GT55D driver, derated, drives 19 loads and presents a fan-in load of two to the driving macro. Four GT55D macros would provide the drive capability with full 40% derating down the path as shown in Figure A-12.
Figure A-12 Clock Tree
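The clock tree arithmetic can be checked with a small Python sketch. It is an illustration under the numbers quoted above: 65 clock loads, a derated limit of 10 loads for the input macro, a derated limit of 19 loads and a fan-in of two for each GT55D. The function name is hypothetical.

```python
def size_clock_tree(clock_loads, input_limit, driver_limit, driver_fan_in):
    """One level of drivers between the clock input and its loads.
    All limits passed in are assumed to be already derated.
    Returns (number of drivers, load seen by the input macro, fits)."""
    drivers = -(-clock_loads // driver_limit)  # integer ceiling division
    input_load = drivers * driver_fan_in
    return drivers, input_load, input_load <= input_limit


if __name__ == "__main__":
    clock_loads = 64 + 1  # flip/flop clock pins plus the parametric gate tree
    print(size_clock_tree(clock_loads, input_limit=10,
                          driver_limit=19, driver_fan_in=2))  # (4, 8, True)
```

Four drivers present eight loads to the clock input, which stays inside its derated limit of ten, so a single level of GT55D drivers is sufficient for this net.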
CLOCK STRUCTURE - ONE OPTION
Derating guidelines are part of the array design rules. Macro load limits are listed in the macro documentation. Place & Route software today creates the clock tree structure based on the commands in a control script. The commands involve the suggested buffer or macro to be used and the clock tree depth. In the near future, Floorplanners will incorporate this function. Clock trees have priority during layout, depending on the design constraints supplied to the Place & Route tool. When the clock tree is to be constructed by the Place & Route software, all timing analysis prior to the routing is done using a modeled clock, approximating what the final clock tree behavior might be.
Static Driver
The static driver required to drive the always-on output enable inputs can handle 50 loads but 64 are required in this version of the design. Two GT87D macros can be used. One is shown in Figure A-13.
Figure A-13 Static Driver
Static driver is not a term that shows up in macro lists today. Rather, high-drive options on various macros are used. If no one macro can handle the load to be driven, then a buffer tree is constructed by the synthesis tool.
Parity tree A parity tree of all inputs (required for parametric VIL, VIH testing) can be formed from NOR gates using the GT60L or GT60S, an 8-input NOR macro. The L-option is slower and uses less power. The speed of the gate tree is not important since testing is functional at 100ns intervals. The first estimate for the tree is to use eleven GT60S macros in a three-level structure to accommodate the 79 input signals. (The 78 data signals plus the clock are required.) The parity tree is shown in Figure A-8. Figure A-8 Parity Tree
REVIEW OF SIZE - SECOND PASS The revised estimate (one version of the solution) shows the circuit requirements as they are now understood.
Table A-3 Second Sizing Estimates - Number of Cells Required
#macros   MACRO   CELLS   TOTAL
79        IE93S   1       79
73        OE42S   1       73
1         IE31H   1       2
9         IEVCC   1       9
TOTAL I/O CELLS REQUIRED  162
10        MX21S   2       20
11        GT60S   3       33
10        GT09S   1       10
4         GT55D   2       8
2         GT87D   2       4
32        FF10S   3       96
32        FF46S   3       96
TOTAL L CELLS REQUIRED    267
Change OE42S to OE11S and delete the 2 GT87Ds. This fits into the Q20080 array that has 162 I/O cells and 2044 L cells. This is a severely I/O-bound design (of course!). A design is either core-limited or I/O-limited. Note: When vectors are written for this array, they should be designed so that no more than 16-32 of the outputs switch at any one time. These are AMCC-specific vector design rules. Table A-4 AMCCERC Population ERC
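The internal cell total in Table A-3 and the resulting utilization of the Q20080 can be reproduced with a few lines. This is a sketch only: the macro counts and cell sizes are copied from the table, the I/O total is taken as quoted rather than recomputed, and the variable names are hypothetical.

```python
# (count, macro, cells per macro) for the internal (L cell) rows of Table A-3
INTERNAL_MACROS = [
    (10, "MX21S", 2),
    (11, "GT60S", 3),
    (10, "GT09S", 1),
    (4, "GT55D", 2),
    (2, "GT87D", 2),
    (32, "FF10S", 3),
    (32, "FF46S", 3),
]

Q20080_L_CELLS = 2044     # internal cells available on the Q20080
Q20080_IO_CELLS = 162     # I/O cells available on the Q20080
IO_CELLS_REQUIRED = 162   # total quoted in Table A-3

l_cells_required = sum(count * cells for count, _, cells in INTERNAL_MACROS)
l_utilization = l_cells_required / Q20080_L_CELLS
io_utilization = IO_CELLS_REQUIRED / Q20080_IO_CELLS

print(l_cells_required)          # 267
print(f"{l_utilization:.1%}")    # 13.1%
print(f"{io_utilization:.1%}")   # 100.0%
```

The contrast between the two percentages is what makes this an I/O-bound design: every I/O cell is used while most of the core is empty.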
PACKAGE SIZE The minimum number of signal pins that should be available on a package for this circuit is 157 (162 signals plus the 4 fixed signals minus the 9 added grounds). The worst-case number of signal pins that could be required on a package for this circuit is 166 (162 signals plus the 4 fixed signals). The truth is in the middle and is placement-dependent.
PROBLEMS
● The OE42S is limited to a toggle frequency of 350MHz. If the clock is running at 500MHz, the outputs could be toggling slower. If not, then the OE42S is not a correct choice if speed is to be maintained. Neither is the OE11S!
● Insufficient added grounds is not a minor problem.
● The circuit uses nearly 8 Watts - much too high.
ALTERNATIVE SOLUTION The differential output OE14S could be used in place of two OE42S macros and the GT87D driver (at least one) could be deleted. This reduces the OE42S macros from 73 to 9, and the 7 always-on enables could be driven by a GT08L NOR gate instead of a static driver macro. The use of OE14S provides a cleaner solution (less skew) plus it frees internal cells. The maximum frequency of the OE14S is 1.2GHz. One output pad can be used as the true signal and the other as the complement. Another advantage is the reduced requirement for added grounds. The 32 differential outputs count as 32 outputs and not as 64, reducing the requirement for this group to 8 added IEVCC, which is what was provided. The ninth IEVCC applies to the miscellaneous other outputs. There will be a warning issued by AMCCERC that there might not be sufficient added grounds for these miscellaneous outputs - the algorithm defined by AMCC requires that two IEVCC macros be added.
Table A-5 OE42S Solution
MACRO   COUNT
IE93S   78
OE42S   73
IE31H   1
IEVCC   9
MX21S   10
GT87D   2
GT60S   11
GT09S   8
GT55D   4
FF10S   32
FF46S   32
Table A-6 OE14S Solution
MACRO   COUNT
IE93S   78
OE42S   9
OE14S   32
IE31H   1
IEVCC   9
MX21S   10
GT87D   1
GT09S   8
GT55D   4
FF10S   32
FF46S   32
POWER The maximum worst-case MILITARY DC power dissipation for the OE42S version of the circuit was estimated to be over 8 Watts. The DC power computation for the OE14S version, under the same conditions, is estimated to be 5.88 Watts. (This number is based on the circuit as shown in the schematics and the February 1991 library specifications.) Reducing the GT08S macros to GT08L macros can further reduce power.
FURTHER THOUGHT For cell usage, timing, power, and added ground requirements, the basic OE14S solution is the best proposed so far. Table A-7 OE14S Solution
Table A-8 OE14S Solution
This version used GT87D instead of a GT08L. It uses GT60S macros in the gate tree instead of GT60L macros. Do the MUX and reset buffer trees need S-macros or could L-option macros be used? (Watch it - the options have different maximum frequency of operation numbers! This is often overlooked in choosing options.)
The DC power computed by the AMCCERC program is summarized below. Remember - AC power dissipation must be added to this. The AC power computations required depend on the array series. Table A-8b Macro Occurrence Report Continued
Exercise Add a design objective to reduce power to 5 Watts or as close to it as possible and modify this circuit using the latest library information. The frequency of operation requirement remains. This same exercise was used in the AMCC training classes through several library releases. This problem, or one close to it, was actually used for over eleven years with several technology libraries, bipolar, Bisquared MOS and CMOS. It demonstrates nearly 85% of the array design rules. Today's designers would create this circuit in Verilog or VHDL and a control script for the synthesis tool. Constraints can drive area reduction, speed improvements or power reduction. The script can also set the priority for the different design constraints.
THE SCHEMATICS Page 1 - Chip Macro and added Ground (IEVCC for ECL VCC); AAA is switch group tag; GT87D a static driver
Page 2 - Clock tree; RESET tree; 2:1 MUX select tree. Buffer trees go to various pages. Note the inputs to the parametric gate tree. "40"s are FOD values. (Figure A-10, Figure A-11, Figure A-12, Figure A-13)
Page 3 - 2:1 MUX selects and enable controls; 6-bit input-output path. OE42S macros should be replaced and VLO signal deleted.
Page 4 - Using MX21S 4:1 MUX macros to build a 16:1 MUX. OE42 should be changed.
Page 5 - pipelined register: 2:1 MUX-D F/F FF46S feeds FF10S which drives OE14S. OE14S connection could be improved to remove need for VLO signal.
Page 6 - Same as page 5 except for names. Note output to parametric gate tree. AAA is the switch group tag (matches IEVCC on page 1).
Page 7 - Next four bits.
Page 8 - Next four bits.
Page 9 - Next four bits. Note how the page number has been incorporated into the macro instance names - FF0905, FF0906, etc. - to prevent duplicate names.
Page 10 - Next four bits.
Page 11 - Next four bits.
Page 12 - Last four bits for 32-bit registers.
Page 13 - The parametric gate tree - all inputs fed into a combinatorial gate tree and tied to one output. PGATE is the GTO parameter value. OE42S should be changed. Note how page references make it possible to trace the connections. (Figure A-8)
Page 14 - The second 16:1 MUX - this page should have been grouped with the other MUX page for better schematic set readability. Group functions together. (Figure A-7)
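The instance-naming convention noted for Page 9 (FF0905, FF0906) can be generated mechanically when pages are replicated. The sketch below assumes the name is a macro prefix, a two-digit page number and a two-digit position on the page; that reading of the names is an interpretation, and the helper is hypothetical.

```python
def instance_name(prefix, page, index):
    """Build a name such as FF0905: prefix, 2-digit page, 2-digit index."""
    return f"{prefix}{page:02d}{index:02d}"


# The same two flip/flop positions replicated on pages 9 and 10.
names = [instance_name("FF", page, idx) for page in (9, 10) for idx in (5, 6)]
print(names)  # ['FF0905', 'FF0906', 'FF1005', 'FF1006']

# Embedding the page number keeps replicated pages from colliding.
assert len(set(names)) == len(names)
```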
Design Optimization Last Edit July 22, 2001
The initial version of any design is almost guaranteed not to be the best solution. It is always possible to improve on an existing design, hardware or software, just as it is always possible to edit a manuscript. The trick is in knowing when to start and stop the process, also known as the endless loop.
Introduction Design optimization should be performed once an initial version of the design has been drafted at the block-diagram level. The design should be reviewed for optimization under the constraints of the established design objectives. It should also be reviewed for optimization using the particular characteristics of the technology and array series selected. A second design optimization review should be performed once the macro conversion has been accomplished. The first step in this process is another review of the chosen macro library. Familiarity with the macros available will be invaluable in contributing to an optimized final design. The process is shown in Figure 4-1.
Overview of Design Process - Objectives and Optimization
It has been shown that familiarity with the macro library is even more important than previous design experience, something many designers argue with until faced with a case in point. After reviewing the steps required to solve the simple case study example, it should be obvious that the selection of macros to solve a circuit implementation is much more complex than simply selecting the macros that appear to solve the equation. Timing, cell utilization and power dissipation are integrated elements that must be considered in parallel during the design process. Design automation tools are moving in the direction of design synthesis and design review-for-criteria. An example system is the NCR ViSys Design Advisor, available on the Mentor Graphics and VCR-supported CAE workstations. A future expansion to that system is the NCR design synthesis tool. Design synthesis tools will become more prevalent over the next few years. They should be considered as a tool to assist the designer, not to replace the designer. [This was written in 1991-4 so there are presumably more systems available. Check the current literature for other references.]
Optimization Approaches There are several approaches to optimizing a design as shown in Table 4-1.
Table 4-1 Design Optimization Objectives
General Design Optimization Objectives
● improve speed and minimize distortion
● balance speed (tracking)
● reduce internal cell utilization
● reduce I/O pin count
● reduce power (lower the junction temperature)
● increase circuit testability
● increase circuit reliability
● reduce cost
These "design objectives" are often incompatible. Each design will have its own priority order for these items, establishing the basis for decisions where choices must be made. For example, design requirements may include a power dissipation limit or a junction temperature restriction. Solving the power equation may violate the speed requirements. The maximum specified operating frequency and critical path performance are usually clearly defined. Balanced path design is essential in communications circuits and has its own restrictions and tracking requirements. Macros selected to allow speed to be achieved may increase power while macros chosen to allow balanced delays may increase cell utilization. Cell utilization can determine which array in a series is acceptable, and the larger arrays do cost more. Cell reduction techniques may affect final speed, power and cost.
The reliability of the circuit is a question of the "trickiness" of the timing and the logic design. The so-called "hot dog" designers are as welcome in a company as their software counterparts - their designs are difficult to build, test or maintain. Modular designs are an important reliability and testability issue. Modularity may require additional macros for degating while testability may require additional macros for test point monitoring or circuits such as scan-path. A circuit must be testable to at least the 90% confidence level, preferably higher, and testability issues should have been in place before the design start. Refinements to the circuit testability are what should be required at this point in the design cycle.
Circuits with design for test (DFT) modules will average 10-20% more cells than circuits that do not use DFT. DFT circuits are easier to develop test vectors for than non-DFT circuits and they require significantly fewer vectors. (See Chapter 8.) The size of the test vector set also has an effect on cost. A large set of vectors is more costly to develop, more time consuming to fault-grade and takes more tester time to test the die.
DESIGN FOR SPEED There are several basic design approaches that can help the designer achieve the desired speed from the circuit. A list is provided in Table 4-2.
Table 4-2 Designing For A High-Speed Circuit
Design Procedures for a High-Speed Circuit
● Minimize the circuit logic.
● Evaluate several implementations of speed-critical paths.
● Reduce interconnections, gate count and chip area.
● Use dense, silicon-efficient macros where possible.
● Use the correct macro-option or version.
● Place signals efficiently in multilevel gating structures.
● Place late-arriving signals as last-level inputs.
● Place critical signals on the fastest paths when the macro has more than one propagation delay.
● Where there are complementary outputs on a macro, alternate the signal to balance loading delays, watching loads, drive-factors.
● Where there are complementary outputs, use the fastest path to propagate a critical signal.
● Duplicate logic to reduce the fan-out and metal load delay in the critical path (balance this against the added load delay to duplicate the signal and added silicon).
● Use the correct-drive macro for the fan-out at each net (do not overload). Do not use high fan-out drivers for low loads.
● Use fan-out derating on all macros except identified super-drivers.
● Avoid wire-ORs when available as they add to metal loading.
● Use binary counters instead of shift counters (parallel versus serial).
● Use parallel load counters and parallel data transfer.
● Use carry look-ahead, carry generate-propagate and other fast-adder techniques.
● Choose the design for high-input logic gates carefully. Investigate design options and their speeds.
● Perform a pulse-distortion review and minimize distortion in critical paths.
● Minimize output pin capacitive loading, both system load and package pin capacitance.
● Last resort: specify placement restrictions for difficult paths (less than 20% of the paths in the array).
● Last resort: ask the vendor for a custom macro to enhance speed.
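As an illustration of the carry generate-propagate technique named in the list, the sketch below computes every carry of a 4-bit adder directly from the generate and propagate terms, so no carry waits on a ripple from the previous stage. It is a behavioral model in Python, not a macro-level implementation, and the function names are hypothetical.

```python
def lookahead_carries(a, b, c0=0, width=4):
    """Return [c0, c1, ..., c_width] computed from generate/propagate terms."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate: a_i AND b_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate: a_i XOR b_i
    carries = [c0]
    for i in range(width):
        # Flattened two-level form:
        # c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]p[i-1]...p[0]c0
        term, prefix = g[i], p[i]
        for j in range(i - 1, -1, -1):
            term |= prefix & g[j]
            prefix &= p[j]
        carries.append(term | (prefix & c0))
    return carries


def lookahead_add(a, b, width=4):
    """Add two width-bit numbers using the look-ahead carries."""
    c = lookahead_carries(a, b, 0, width)
    total = c[width] << width                       # carry out is the top bit
    for i in range(width):
        total |= (((a >> i) ^ (b >> i) ^ c[i]) & 1) << i
    return total


# Exhaustive check against ordinary integer addition.
assert all(lookahead_add(a, b) == a + b for a in range(16) for b in range(16))
print(lookahead_carries(0b1011, 0b0110))  # [0, 0, 1, 1, 1]
```

The trade-off matches the one described in this chapter: the flattened carry terms need wider gates and more cells than a ripple-carry adder, in exchange for a shorter critical path.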
Design to Improve Speed If a macro library provides speed-power options for the macros available, an initial design is done selecting the standard macro options (the middle choice) and is also done from the perspective of logic minimization. Once the array technology has been selected, circuit speed is affected by the major factors shown in Table 4-3.
Table 4-3 Designing To Improve Speed
Optimize for Speed
● Review the macro chosen
● Review the macro option
● Review macro functionality
● Perform logic minimization
● Reduce the fan-out loading
● Reduce wire-OR loading (if allowed)
In the case study, the change of the output macro to allow the circuit to operate at specification did not affect cell count and reduced the power by three Watts. The solution of timing problems is not often so simple or beneficial. When faced with a circuit that does not meet timing specifications, changing macros can increase, decrease or leave unchanged the cell count, and can likewise raise, lower or leave unchanged the power dissipation.
Macro options When an array provides macro options, those options should be reviewed for applicability to the design problem. If a macro comes in low-power, high-speed and standard options, they will each have a different toggle frequency or maximum frequency of operation. Each option may have a different fan-out load or drive capability. Each option will have a different power dissipation. In some cases, the different options may have different cell counts. The selection of a high-speed macro option may carry power dissipation penalties that may in turn lead to other macros needing to be downgraded to low-power options. This may be necessitated by an internal current limit for the array or by the early estimates of the junction temperature. As long as the toggle frequency of the low-power macros is not violated and the fan-out load limits are not exceeded, then the use of low-power options is acceptable. When a library has driver macros with balanced load delay drive factors (minimal skew) and faster intrinsic (internal) delays, the use of the driver should be justified. Drivers typically carry a power dissipation penalty. They should be used in clock distribution lines and for heavily loaded paths that have tight timing specifications.
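The acceptance rule for low-power options (toggle frequency not violated, fan-out limit not exceeded) can be sketched as a small selection routine. All option names and numbers below are hypothetical placeholders chosen for illustration; they are not taken from any AMCC library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacroOption:
    name: str
    max_toggle_mhz: float   # maximum toggle frequency of this option
    max_fanout: int         # fan-out load limit of this option
    power_mw: float         # relative power dissipation


# HYPOTHETICAL values, for illustration only.
OPTIONS = [
    MacroOption("L (low-power)", 200.0, 6, 1.0),
    MacroOption("S (standard)", 400.0, 10, 2.0),
    MacroOption("H (high-speed)", 800.0, 10, 4.0),
]


def lowest_power_option(toggle_mhz, fanout, options=OPTIONS):
    """Pick the lowest-power option that violates neither limit, or None."""
    legal = [o for o in options
             if o.max_toggle_mhz >= toggle_mhz and o.max_fanout >= fanout]
    return min(legal, key=lambda o: o.power_mw) if legal else None


print(lowest_power_option(150, 4).name)   # L (low-power)
print(lowest_power_option(300, 4).name)   # S (standard)
print(lowest_power_option(300, 12))       # None
```

A result of `None` corresponds to the case where no single option can carry the load and a driver macro or a split net has to be considered instead.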
Macro functionality Another reason to review the macro library before finalizing a design is that speed is usually a function of density. In general, a high-functionality multiple-cell macro will perform better than a circuit module formed from equivalent macros. Intra-macro nets (connecting components) are shorter than inter-macro interconnects, so their delays are smaller. A hard macro, where the routing is always in the same pattern, can guarantee its worst-case speed. A soft macro has placement requirements and priority routing that must be used if it is to meet its specifications. High-functionality macros also include those that combine functions, such as the 3:1 MUX-D flip/flop macros (supporting testing), dual flip/flops, triple latches or triple multiplexors, internal-I/O dual function macros that ensure the maximum utilization of the complex I/O cells, or combined two-cell bidirectional-added ground macros that keep the second pad from being wasted. High-functionality macros are those that prevent or minimize wasted (unused) silicon, pads or cells. Other design techniques to increase the circuit density include replacing gate structures with multiplexors where the speed and gate count would be reduced. (See Digital Design with Standard MSI and LSI, 2nd ed., by T.R. Blakeslee, 1979, Wiley, New York.)
Example of silicon efficiency As an example of high-functionality silicon-efficient macros, the AMCC REG00 universal register is the equivalent of four 4:1 MUXs and four D F/F, or 8 macros, and is available in several libraries. In the Q5000 library, it uses 4.5 internal cells whereas the individual 4:1 MUX and D Flip/Flop macros would use 8 internal cells. The number of inter-macro interconnects is 15 and their delays are kept constant for each occurrence of the MSI macro due to the required placement pattern. REG00 is a soft macro in the Q5000 library and a hard macro in another.
Table 4-4 Basic Macro-Selection Guidelines *
Path Type       Use these macro types               Penalty
Clock paths     High fan-out drivers                more power, more cells
                High-speed, minimum skew macros     more power
Fast paths      High-speed, minimum skew macros     more power
Loaded paths    High fan-out drivers                more power, more cells
                Parallel structures                 more power, more cells
Slow paths      Low-power, slower macros            (less power) (less cells)
Average paths   Standard macros                     --------
* For libraries with high-speed, low-power, standard and driver macros. Comparable tables can be generated for libraries with other combinations of variations, versions and options.
Alternative implementations One way to ensure that the best version of a critical path has been created with any given library is to create more than one solution. Alternative implementations should be reviewed within the confines of a specific set of prioritized design objectives. They should always be evaluated for any critical path. Paths (partial circuits) can be captured and simulated for detailed comparison of timing, including timing check analysis using the current workstations. Some jury-rig of connectors or dummy loading may be required depending on the peculiarities of the CAD/CAE workstation chosen and the part of the circuit being captured. This is a minor inconvenience in exchange for which the designer can easily perform timing analysis, such as checking on pulse width distortion, set-up and hold violations and path delays.
Examples As an example of the value of checking various implementations, a test circuit was given to students, applications and macro design engineers to implement using the Q5000 library. The design objective was stated as speed at all costs. The students (unfamiliar with the macro library) produced circuits running at 125-145MHz. Applications engineers produced a version running at 183MHz. The array-macro designers produced a version running at 235MHz. The variable was macro library familiarity; no custom macros were allowed. The case study in the previous chapter was another example. It showed two different implementations of that circuit, one with an output toggle rate of 350MHz and one with 600MHz - the only difference was the output macro.
Internal Net Delays The delay in a heavily loaded net tends to be longer than the macro intrinsic delay, the delay through the macro that drives the net. The net delay is a result of the electrical effects of fan-out load, wire-OR load and the capacitance of the metal interconnect length. Wire-ORs, if allowed in the library, will add to the delay in the net with both an electrical load and with additional metal. Front-Annotation delays (pre-place and route) due to metal length for a large array are on the average larger than for a small array. This is reasonable since the side to side distances are larger for the larger array. The break-up of heavily loaded paths into identical parallel paths can result in significant propagation delay improvement, regardless of array size.
Table 4-5 Components Of Internal Net Delays
Internal Net Delays
● electrical fan-out load
● electrical wire-OR load
● capacitive load of the metal etch
Fan-out loading is the same regardless of the array size. A macro driving 6 loads on a large array would see the same load if the circuit were placed on a smaller array.
Wire-ORs when allowed Some array libraries allowed dot-connects such as wire-ORs or wire-ANDs. These may save gate delays but add a wire or metal length penalty. A wire-OR driven by four macros and outputting to six other macros has the metal equivalent of a ten output net - using lumped Front-Annotation computations. This is considered to be a heavily loaded net. The added net delay will probably exceed the "saved" gate delay. The use of dot-connects should be carefully evaluated. Verify that they are allowed on the schematics before evaluating their usefulness. They may be allowed on some arrays and not on others from the same vendor. For example, the AMCC Q5000 has wire-ORs but the Q20000 bipolar and Q24000 BiCMOS arrays do not allow their use. Figure 4-2 Optimization - Speed
Optimization - Speed Considerations
Design To Reduce Internal Cell Utilization Reduction of the internal cell utilization or equivalent gate count is also called logical circuit minimization. Factoring of common terms from the logical equation and the removal of redundant logic help reduce cell counts by reducing the logic that must be implemented. Minimization is critical when a high fault-grade score is desired since redundant logic will lower the potential test score (fault masking). Use of the higher functionality MSI macros and the design approach discussed earlier, of selecting higher functionality macros first and working back towards the SSI macros in the library, will contribute to a cell-efficient design. The design approaches for internal cell minimization are shown in Table 4-6.
Table 4-6 Minimizing Internal Cell Utilization
Reducing Internal Cell Utilization
● Logic minimization.
● High-functionality internal macros.
● Use shift counters instead of parallel counters.
● Use ripple counters if the propagation delay meets specification delays.
● Use ripple-carry adders (between MSI blocks).
● Use single polarity between macros.
● Use serial data transfer.
● Use a scan-test F/F or latch to replace a MUX-F/F or MUX-LATCH combination.
● Avoid extraneous invertors (those added just to invert signals). Many macros are available in complementary form or use DeMorgan's theorem.
● When converting from TTL or ECL, do not implement unused functions. Keep the macro design application-specific. Avoid:
❍ unused preset, clear
❍ multiple enables
❍ excess carry logic
❍ excess load logic
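The DeMorgan substitution mentioned in Table 4-6 is what lets an inverter be absorbed into a complementary gate form. The sketch below checks both identities exhaustively for two inputs; it illustrates the theorem only and does not model any particular macro.

```python
from itertools import product


def nor(a, b):
    return not (a or b)


def nand(a, b):
    return not (a and b)


# DeMorgan: NOR(a, b) == AND(not a, not b) and NAND(a, b) == OR(not a, not b).
# A gate with inverted inputs can therefore replace a gate followed by an
# inverter, removing the extra inverter macro.
for a, b in product([False, True], repeat=2):
    assert nor(a, b) == ((not a) and (not b))
    assert nand(a, b) == ((not a) or (not b))

print("DeMorgan identities hold for all input combinations")
```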
Internal cell minimization is not fully compatible with the approaches used to improve speed. While logic minimization does help speed, as does the use of high-functionality macros, serial operations are slower than parallel operations. The designer must be guided by the priority assigned to the conflicting design objectives.
Example An experienced ECL designer chose the Q3500 array and converted a standard-part design into macros from the chosen library. He was careful to duplicate the parts exactly. When he was finished, he had 124% cell utilization. At the time, there was no larger array. The solution came when the logic was minimized and the unused functionality of the individual standard parts was deleted from the design. By changing the design from a direct conversion to an application-specific implementation, the cell utilization was reduced to 98% and the circuit was built. {True story.} Figure 4-3 Optimization - Cell Utilization (Sizing)
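The size of the reduction in the example above can be put in numbers. This is a minimal sketch using only the two utilization figures quoted (124% and 98%); the function name is hypothetical.

```python
def required_reduction(utilization, limit=1.0):
    """Fraction of cells that must be removed to get down to the limit."""
    return max(0.0, 1.0 - limit / utilization)


needed = required_reduction(1.24)    # to reach 100% of the array
achieved = 1.0 - 0.98 / 1.24         # the 98% result reported in the example

print(f"{needed:.1%} {achieved:.1%}")  # 19.4% 21.0%
```

Roughly a fifth of the directly converted logic was unused standard-part functionality or redundant logic, which is the margin that made the difference between an unbuildable design and one that fit.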
Optimization Issues - Cell Utilization
Design To Reduce I/O Utilization There are several techniques used to reduce the I/O cell utilization, shown in Table 4-7.
Table 4-7 Reducing I/O Cell And Pad Utilization
Reducing I/O Utilization
● Use the high-functionality interface macros.
● Use bidirectional macros if possible.
● Partition a system by bits (bit-slice) rather than function. This also allows the development of circuit sub-modules which reduce schematic capture and test efforts.
● When speed requirements will permit it, use serial data transfer rather than parallel data transfer.
● Multiplex test points to reduce test pinouts.
● Multiplex non-critical outputs.
● Transfer only one polarity of a signal on and off chip (single rail transfer) rather than differential if other factors permit it - use a VBB source and single-rail ECL.
● Decode input signals on-chip.
● Use local (to the array) counters; duplicate the counter on several arrays and synchronize.
● Use bus architecture, where one or more I/O signals serve several signal sources.
● Use external serial-input registers.
The difficulty with array-based circuit design is that, should a circuit require just one more I/O connection, there is no way to obtain it save by the selection of a larger array. There are no jumpers, piggy-backed components or other quick-and-dirty board design tricks that can apply. The I/O signal count must fit the target array, and be equal to or less than the available array signal pads.
Design To Fit The Package Also at issue at this point in the design stage is the desired package. When the package is selected, the package methodology for handling added power and grounds can be determined. Packages can be one-on-one, where each added power and ground pad reaches an external package pin, or they may have internal power and ground planes. Not all array pads will reach an internal power or ground plane; therefore there are placement restrictions on the locations of the added power and ground macros. If these restrictions cannot be met due to other placement requirements or if the package does not have enough pads that can bond to internal planes, then the added power or ground macros will require external package pins. (For example, some packages offer ground planes and no power planes.)
Case 1 - Count of all array pads less than or equal to the number of total package pins There are two approaches to checking the package against the design. The first is when there are no internal power or ground planes. In this case, count the number of inputs, outputs, bidirectionals, added power, added ground, fixed power, fixed ground, and any fixed I/O signals (such as on-chip thermal diodes and AC speed monitors). This number should be less than or equal to the total number of package pins.
Case 2 - Count of all signals less than or equal to the number of package signal pins The second case involves making an estimate which can be refined after placement is completed and approved. In this case, count the number of array pads used by inputs, outputs, bidirectionals, added power, added ground and any fixed I/O signals such as on-chip thermal diodes or AC speed monitors. This number should be less than or equal to the total number of package signal pins. The package power and ground pins connect to the internal power and ground planes. After placement, the number of signals will be reduced by the number of added power and ground macros that were placed to connect to internal package power and ground planes. Those macros will not use the external package signal pins.
Example A designer submitted a design with a desired package (a 149 PGA with internal power and ground planes). The package has 120 signal pins. He used 132 array pads for I/O signals and added power and grounds. There were no fixed thermal diode or AC speed monitor signals on the array. There were eight added power and grounds. The design, after careful placement of the added power and grounds, had four signals more than there were package signal pins.
This problem was not discovered until placement, i.e., until after all simulations and design validations were performed. One solution is to look for expendable I/O cell usage. If there are extra grounds beyond the minimum, or more VBB or other voltage sources than is really required, they can be reduced. Another solution is to add an 8:1 MUX, place eight non-critical outputs as inputs to the MUX, add three input signals to control the select lines and one output for the MUX. This reduced the total number of signal pins required to 120, which would fit the package. What if neither of these solutions is acceptable? Then some other design change is in order if the package cannot be changed. A design change requires that all simulations and all checking be repeated. Could this situation have been prevented? By checking the package limits during the optimization phase and using the package limits as a guide, the design changes or package changes could have been identified earlier, saving the iteration of the simulation loop. Remember that simulation is estimated to use approximately 50% of the CPU time used in a design process. Figure 4-4 Optimization - Packaging Issues
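The pin arithmetic in this example, and the effect of the 8:1 MUX fix, can be sketched as follows. The sketch assumes all eight added power and ground macros were placed on pads that bond to the internal planes, which is consistent with the four-pin excess reported; the function names are hypothetical.

```python
import math

PACKAGE_SIGNAL_PINS = 120   # signal pins on the 149 PGA in the example


def signals_needing_pins(pads_used, added_on_planes):
    """Case 2 count: pads used, less added power/grounds bonded to planes."""
    return pads_used - added_on_planes


def mux_pin_saving(outputs_muxed):
    """Net pins saved by routing N non-critical outputs through one N:1 MUX.

    The N output pins are replaced by one MUX output plus the select inputs.
    """
    select_lines = math.ceil(math.log2(outputs_muxed))
    return outputs_muxed - (select_lines + 1)


needed = signals_needing_pins(pads_used=132, added_on_planes=8)
excess = needed - PACKAGE_SIGNAL_PINS
after_mux = needed - mux_pin_saving(8)

print(needed, excess, after_mux)  # 124 4 120
```

Running the same check with the package limits during the optimization phase, before simulation, is the early warning the text recommends.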
Optimization Issues - Packaging
Design To Reduce Power
Table 4-8 Power And Macro Choices
Macro Options That Affect Power Consumption
● Macro Option, if available
● Macro Drive capability
● Functionality
● Outputs used or terminated
● I/O mode
When the array library has options, the high-speed macros will dissipate more current than the standard option macros. The low-power macros are slower with less current than the standard option macros. Driver macros use more current than non-drivers, while super-drivers may use 2-3 times the current used by a driver that can handle fewer loads. High-functionality macros are using more of the components in a macro cell so the cells these macros occupy use more current. MSI macros are usually pre-placed to avoid hot-spots and to maintain their timing specifications. Some arrays (such as the AMCC Q5000 Series) offer a power-down feature if a macro output is unused. The newer [1994] Q20000 Series does not have this feature. For DC power dissipation, overhead current is also a factor. Overhead current is that current that is used when an unpopulated array is plugged into the power supplies. It supplies the internal voltage regulators and reference generators. It is a function of the I/O mode and the power supply configuration. AC power dissipation computations depend heavily on the switching frequency of the various macros. Depending on the vendor, the number of outputs on the macro is a factor in the AC power equation. If one output is required and the other terminated, the terminated output contributes to the AC power equation. When several variations of a macro exist, and there is a choice, use the macro with a single output if that is what is needed. The DC power computation can be performed by software supplied by the array vendor. AC power computations are still primarily a manual estimate. (Hardware emulators are in the early stages of AC computation support. Refer to the chapter on power computation.) Figure 4-5 Optimization - Power Considerations
Power dissipation is controlled by the number and type of macros chosen. Some of the variations are summarized in Table 4-8.
Optimization Issues - Power
Design To Reduce Cost
The cost of a design is a function of all factors involved in that design, from the initial design decisions on who will do the macro conversion to the special testing requirements. Anything that is not within the standard, routine design flow will usually cost more. Design iterations cost more. A redesign averages the same amount of CPU time spent in simulation. Items that increase costs are listed in Table 4-9. Some guidelines to keep costs down are shown in Table 4-10.
Table 4-9 Items That Increase Cost *
Items That Increase Circuit Costs
● Training classes - usually credited against NRE
● Amount of design support from vendor
  ❍ macro conversion
  ❍ performing validation
  ❍ performing simulations
● Array series chosen
● Size of the array
● Custom macros
● Use of the vendor's design center longer than a nominal time (4 weeks)
● Functional vectors that exceed 4K
  ❍ charged on a per page (4K) basis (based on the SENTRY tester) or whatever limit is specified
● Functional vectors that contain races and hazards (require rework)
● Fault grading - more than 2 passes
● Net matching required
● Pre-defined pin-out (causes iterations)
● Design iteration
  ❍ after first place and route
  ❍ when it was not the vendor's error
● Non-standard bonding
● Non-standard package
● Optional heatsink
● Custom DUT board
● Custom test software
● Bench tests
  ❍ charged per path
● MIL screening
  ❍ tri-temp testing
  ❍ burn-in
  ❍ qual
● Optional Commercial circuit burn-in
● Optional Commercial circuit - qual
● Optional PIND testing (post-package)
● Expedite on schedule
* AMCC measures used for example
Table 4-10 Keeping The Costs Down
How to Keep Circuit Costs Down
● Allow adequate time in the design schedule and keep to the schedule
● Allow time for several design reviews at various steps in the design flow
● Allow time for the vendor's steps and to review those steps
● Follow the previously outlined steps for the design flow
● Choose an array that is adequate for the design - within population and cell utilization bounds
● Keep the junction temperature within bounds
  ❍ so that a heatsink is not required
  ❍ to minimize placement problems
● Use added power and grounds as required for SSOs and high-frequency signals - use additional grounds
● Keep the design modular to keep functional vector set size down
● Plan for standard packaging, standard bonding
● Do not commit the PCB layout until the array place and route is approved
● Plan the design (through a design review) before showing up at the design center
● Design for testability
● Work with the vendor to avoid custom macros
● Plan for standard testing, standard DUT board
● Plan to avoid costly redesigns
Design Reviews
A design review should be held at the initial optimization, at the block diagram-functional description stage of the design. Another design review should be held on the completion of the design optimization at the macro level, before any lengthy simulations are performed. This procedure will help to reduce iterations of the simulation design step.
Summary
Design optimization is not something that is just applied to a design after it has been created. Optimization should be considered as a design is being planned, as the block diagram is being sketched, and as the macro conversion is being performed. The design criteria, and the priorities of those criteria, should be established at the start of the design cycle and referred to at every step in that cycle. The design criteria are not compatible items. In satisfying one objective, another may be compromised. The engineering task is to satisfy as many as feasible for the given situation. It requires tradeoffs and compromise.
Exercises
1. Select an array series and macro library to evaluate.
2. Choose a design project of your own or try a 16-bit adder with latched input and registered output (the adder portion of the Am2901 without memory).
3. Establish a set of prioritized design objectives including the target array and the target speed (such as add-with-carry). Block out a solution for the design.
4. Change the priority of the design objectives. For example, instead of speed being the most important, make cell reduction and power reduction the most important objectives. Block out a second solution for the design.
5. What effect did this change of perspective have on your approach to the design?
Basic Design For Circuit Testability
There are several formal methods for design-for-test. These include scan path and level sensitive scan design. In addition to these, and a part of the design-for-test requirements, are the following suggestions for improved circuit testability.
● Become familiar with the macro library BEFORE beginning the macro conversion or design.
● Use synchronous rather than asynchronous circuits whenever possible - functional tests are synchronous.
● Partition the design (use structured design techniques) into smaller, testable sections, usually along a functional boundary. In partitioning:
  ❍ Use degating logic to isolate modules for test.
  ❍ Use modular architecture, bus structures.
  ❍ Break up long counters (>8).
● Don't bury states.
● Use transparent latches instead of flip/flops where possible.
● Use macros, especially flip/flops and latches, with RESET or SET controls where possible to simplify initialization.
● Avoid feedback loops. If unavoidable, provide a means to break up feedback loops during test (degating, enables).
● Avoid redundant logic - minimize! - or add test points to unmask masked faults.
● Avoid derived clocks - they complicate testing.
● Design in test points, especially in sequential logic.
● Add test points to improve controllability and observability.
● Perform testability analysis.
● If I/O pins are limited, use demultiplexors to control and multiplexors to observe internal nodes with otherwise poor observability (buried states).
● Any 3-state enable control signal that is internally generated must be externally observable, and should be externally controllable during test.
● Add parity trees for error detection.
● Use Scan Path Design to simplify test sequence generation.
● Use Level Sensitive Scan Design to simplify test sequence generation.
● Use some variation of the Scan Path or LSSD DFT procedures.
● Keep test generation in mind while designing the circuit.
Figure 4-6 Optimization - Circuit Testability
Optimization Issues - Testability
Basic Design For Circuit Reliability
Some specific design suggestions for improved circuit reliability are:
● Become familiar with the macro library BEFORE beginning the macro conversion or design.
● Be aware of "glitch" circuits. Do not use potential glitch circuits to drive clock inputs.
● Avoid one-shot pulse generators.
● Avoid gated and derived clocks.
● Avoid race and hazard conditions. (Print-on-change files can help identify these.) These are generated by having a signal follow two or more paths to a common circuit element (a.k.a. reconvergent fan-out).
● Avoid feedback loops. If unavoidable, provide a means to break up feedback loops during test (using degating, enables).
● Avoid feedback paths between registers. If present, compute the worst-case set-up and hold times and verify operation. (Feedback from the ECL output macros must be handled with care if used to input to internal latches and flip/flops.)
● Add sufficient GROUND for the number of simultaneously switching outputs and distribute among these outputs (similar to distributed ground in a ribbon cable). Add extra ground if there are extra I/O pins available.
● Add extra VCC as needed for the number of simultaneously switching outputs.
● Properly derate fan-out on all distortion-sensitive paths and all clock paths.
● Keep clock path loading balanced.
● Avoid floating nodes on internal 3-state busses or external bidirectional busses.
● Use Johnson (a.k.a. Mobius, Ring or Twisted-tail) counters or separate flip/flops to decode terminal counts. The loading on the Q outputs is identical, eliminating the loading skew (not the metal skew), and the outputs are a Gray code - only one output changes state per clock cycle. (Binary counter decoding can cause glitches.)
● Compensate for rising and falling edge loading skews and the reversed TTL input translator rising and falling edge skews by inversion as needed to reduce pulse stretch and pulse shrink phenomena.
● Reduce heavy loading on high-speed paths by creating parallel paths with identical macros.
● Use ECL differential inputs and outputs when the frequency exceeds 200-300MHz. The actual frequency boundary will be series-specific.
Figure 4-7 Optimization - Circuit Reliability
Optimization Issues - Reliability
Timing Analysis for Arrays
Introduction
There are two types of timing analysis required for array verification:
● path propagation delay and
● set-up and hold time analysis.
Path propagation delay is covered in this chapter. External set-up and hold time is covered in Chapter 6. One circuit design objective is speed and, before schematic capture, a preliminary timing analysis of the critical paths of a circuit is performed. It may be done by simulation or even by hand, if necessary, to assure that the circuit as implemented on the chosen array will be successful. This initial analysis may dictate the design optimization techniques required to ensure that the circuit will meet the specified performance requirements. To reduce manual effort, as soon as a macro library is available for evaluation, the critical performance paths of the circuit should be captured and a detailed, annotated simulation performed. (Manual computations may still be required for external set-up and hold time analysis.)
A detailed timing analysis of the complete circuit and its critical paths is required before a circuit can be submitted for place and route. Timing analysis of a circuit includes:
● worst-case path propagation delays for both rising and falling edge inputs for all critical or suspected critical paths;
● external and internal set-up, hold and recovery times;
● pulse skew and tracking due to placement (process variation);
● pulse distortion due to wire-OR, fan-out and metal loading (pulse stretch, pulse shrink) for all input and internal macros;
● pulse distortion due to output macro loading (package pin capacitance and system capacitive loading).
The macros selected, the options of those macros, the loading on the macros, and the final layout of the circuit are all factors in the propagation delay of any path. The loading may be the interconnect capacitance or the external load capacitance due to system loading and package pin capacitance.
Path Propagation Delay Overview
Computation of the propagation delay for a circuit path includes an evaluation of the following:
● input, logic and output macro intrinsic propagation delays
● extrinsic loading delays
● an adjustment for environmental effects and processing variations (worst-case timing multiplier) for those arrays which specify typical intrinsic delays
Intrinsic Delays
Array vendors use a variety of design manual documentation formats to specify intrinsic delays. Intrinsic delay (Tpd) is the time required for a signal to propagate from a macro input pin to a macro output pin. The delay may be different for each input to output path through the macro. The delay may be specified as dependent on the input and output edges, such as rising to rising (++) or rising to falling (+-, inversion). The delay may be a function of other input states or simultaneous input switching. This information is usually detailed in the documentation for the macro library. The four input-output edge combinations may be identified as:
Tpd ++   rising edge input, rising edge output
Tpd +-   rising edge input, falling edge output
Tpd --   falling edge input, falling edge output
Tpd -+   falling edge input, rising edge output
Intrinsic delay may be specified as typical, with adjustment factors or worst-case delay multipliers supplied to allow maximum and minimum delay computations for specific operating conditions. The delays may be specified as worst-case maximum for one set of operating conditions with adjustment factors to convert to other conditions. Another option is to specify the delays with a worst-case min-max range for one or more sets of operating conditions. (See Table 5-1.)
Table 5-1 Tpd Specifications And Adjustment Factors (Historical)
Tpd Specified:     For Specific Operating Conditions Use:
Typical            Adjustment Factors
Worst-Case         Factors for other Conditions
MIN/MAX Range      (Specific to Conditions)
Macro intrinsic delay values may assume no loading, one load, or several loads on the macro output pin. When annotation software is available, macros are specified as unloaded. For some macros, delays are dependent on how many other pins on the macro are also switching. The actual macro path delay may be a function of:
1. the state of the input data (low data may have different set-up and hold times than high data);
2. the edge direction (low-to-high (rising edge) propagation may be different from high-to-low (falling edge) propagation);
3. multiple inputs changing state (when several OR/NOR inputs change simultaneously, the delay increases).
Three-state macros have specifications for high-Z, representing switching delays for TPHZ, TPZH, TPZL and TPLZ. The propagation delay supplied in a design manual is for the delay from input to output measured at the 50% level. For TTL I/O macros, the measurement is at the 1.5V level. Rise and fall time is measured between 10% and 90%. The data sheet for the array series should indicate measurement points and levels. If this information is important to the design, check to see if it is available, under what conditions the measurements were taken and under what load. Adjust according to the intended operating environment and load conditions.
Intrinsic Set-Up and Hold Time The intrinsic set-up and hold times represent the required behavior of the signals coming into the macro, observed at its input and output nodes. External set-up and hold times are concerned with the signals at the external pins of the array. Set-up time (Tsu) is the length of time that a data signal must be stable before the 50% point of the next active clock edge. A negative set-up time indicates that the data does not have to become stable until after the active clock edge. Hold time (Th) is the length of time that a data signal must be held stable after the active clock edge. A negative hold time indicates that the data can be removed before the active clock edge. Whether a set-up or hold time is negative is a function of the disparity in the delays in the clock and data paths between the input and the actual functional use of the signal. For example, a complex macro may have a multiplexor in the data path and nothing in the clock path for an internal flip/flop. Today's libraries tend toward rising-edge active clocks, zero hold times, and positive set-up times.
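The set-up and hold definitions above can be read as a keep-out window around the active clock edge. The following is a minimal sketch of that reading, not a vendor tool: the function name, the argument layout and all numeric values are assumptions chosen for illustration. Negative Tsu or Th values simply move the corresponding window edge to the other side of the clock edge.

```python
def violates_setup_hold(t_clk, data_edges, tsu, th):
    """Return True if any data transition falls inside the keep-out window.

    t_clk      -- time of the active clock edge (50% point), in ns
    data_edges -- times at which the data input changes state, in ns
    tsu, th    -- macro set-up and hold times in ns (either may be negative)
    """
    window_start = t_clk - tsu   # data must be stable from here ...
    window_end = t_clk + th      # ... until here
    return any(window_start < t < window_end for t in data_edges)


# Hypothetical numbers: clock edge at 10.0 ns, Tsu = 0.5 ns, Th = 0.0 ns.
print(violates_setup_hold(10.0, [9.2], tsu=0.5, th=0.0))    # data settles early enough
print(violates_setup_hold(10.0, [9.8], tsu=0.5, th=0.0))    # data changes inside the set-up window
# Negative set-up time: data may settle up to 0.2 ns after the clock edge.
print(violates_setup_hold(10.0, [10.1], tsu=-0.2, th=0.4))
```

The first and third calls report no violation; the second reports one, because the data edge at 9.8 ns lies within 0.5 ns of the clock edge.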
Start and End Points for Set-Up and Hold Time Computations
Set-up time is considered to be the minimum time required for a signal to travel from the circuit data pin input to the data port of the flip/flop or register. It can also be the minimum time for a signal to leave the Clk->Q port of one register and arrive at the data port of a sequential register (register-to-register delay time). When the start point is the external data pin (and the clock from the external clock pin), this is considered external set-up and hold time. This is discussed in Chapter 6.
Intrinsic Recovery Time Recovery time (Trec) is specified for any latch or flip/flop macro that has a set or reset pin, or any complex macro that contains the equivalent function. It is the length of time that a reset or set signal has to have been inactive before an active clock edge. Clocking within the recovery time will result in unpredictable behavior.
Set-up, Hold and Recovery Specifications Set-up time (Tsu), hold time (Th), and recovery time (Trec) may be specified as typical, in which case adjustment factors or worst-case multipliers must be used to adjust them to the specific operating conditions of the circuit. They may be specified as worst-case for common operating conditions, i.e., Commercial and Military. In cases where differences between Commercial and Military are minimal, only one value may be specified.
Maximum Operating Frequency; Toggle Frequency The maximum operating frequency (fmax) is specified as the maximum switching rate for the macros and is technology dependent. Macros must not be driven beyond their specified operating limits. The actual frequency at which a circuit may operate must be computed from the worst-case critical path propagation delay timing analysis and set-up and hold analysis. The limit for a macro is its minimum pulse width: the minimum pulse that can be successfully propagated through the macro. The pulse reaching a macro input pin is a function of the frequency of the signal on that pin and the pulse stretch-shrink distortion of the signal.
Intrinsic Pulse Width
Minimum pulse width (PW) is derived from the toggle frequency specification for the macro. It defines how close any two edges of a pulse passing through the macro may be. Latches and flip/flops and the complex macros that include these devices will normally be specified with a minimum pulse width and a maximum frequency of operation, possibly differentiated for military and commercial operation. (Assuming a 50% duty cycle, the frequency is the reciprocal of twice the pulse width.) For most macros, the generic maximum frequency of operation for a given class of macros defines the limits. For special cases, such as latches, flip/flops, and complex macros, the macro may have its own specific limits. Complex libraries with a range of performance within the macro set may specify a maximum frequency and pulse width for each macro.
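The relation between pulse width and toggle frequency can be sketched as follows, assuming a 50% duty cycle so that one period contains two pulses of equal width. The function names and the 250 MHz figure are illustrative assumptions, not values from any design manual.

```python
def min_pulse_width_ns(fmax_mhz):
    """Minimum pulse width in ns for a toggle frequency in MHz (50% duty cycle)."""
    period_ns = 1000.0 / fmax_mhz
    return period_ns / 2.0


def max_frequency_mhz(pw_ns):
    """Toggle frequency in MHz implied by a minimum pulse width in ns (50% duty cycle)."""
    return 1000.0 / (2.0 * pw_ns)


print(min_pulse_width_ns(250.0))  # 2.0
print(max_frequency_mhz(2.0))     # 250.0
```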
Examples
As an example of specification approaches, the Q20000 [as of 1994] specification approach is detailed below.
For Q20000 Bipolar Array Series:
● The pulse width specifications in the AMCC Q20000 design manual are worst-case times, computed from the maximum frequency of operation (assuming a 50% duty cycle).
● PW is specified for Military and Commercial conditions (originally specified for Hot and Cold conditions).
● Tpd delays are specified as unloaded delays.
● Set-up and Hold times are specified as worst-case.
● Although Tsu, Th, and pulse width are specified as single values for the given conditions, the Tpd delays are specified as a min/max range.
For today's Arrays
● Pulse width and set-up and hold times are computed by today's synthesis programs using data specified in the chosen design library.
● Libraries are specified for specific operating conditions - NO ADJUSTMENTS
● If a variation on the library operating conditions is desired, get the vendor to supply a new library. [We will repeat this warning.]
● Each library has 1-2 interconnect models (worst-case being one of them)
● MIN-MAX analysis of hold time is commonly computed pre-layout to identify serious hold-time issues.
● Individual macros contain set-up and hold time information for the macro.
● Pulse-swallowing is used to refer to the condition where the signal is too fast for the macro to "see" - i.e., the pulse is narrower than the minimum pulse width of the macro
Interconnect Delays
Path delays are composed of the intrinsic (ti, internal to the macro) delays specified for the macros and extrinsic delays (Tex). Extrinsic delays are composed of the path propagation delays for the macro interconnect (macro to macro routing) and the output macro capacitive load. For any array, once the macro intrinsic propagation delays for macros in a given path for a given edge direction are listed, the next step in performing timing analysis on a circuit is to evaluate the loading for each macro in the path. Each macro output pin belongs to a separate timing path.
Interconnect delays are the delays incurred by a signal when propagating from a driving macro to the inputs of its load that are caused by the RC time constants of the metal etch interconnect. The delay will be different for rising and falling edges. For small arrays, the delay was assumed to be dominated by the fan-out loading, allowing linear estimates based on that load to be used to approximate the interconnect delays. The interconnect delays in the small arrays were not as large as the intrinsic macro delays themselves; therefore, the simplification could be justified.
As the arrays became larger and faster, the interconnect delay in a heavily loaded net became larger than the intrinsic delay for the macro. It became important to obtain a closer estimate of circuit performance pre-layout. This led to the development of Front-Annotation software, capable of estimating the interconnect delays based on statistical tables of physical fan-out loads versus etch length.
Types of Extrinsic Loading
There are two types of extrinsic loading. The first type exists for those macros that drive internal nets (texint). Loading for an internal net involves:
● the electrical load due to the fan-out for the net;
● the electrical load due to the presence of wire-ORed inputs to the net;
● the physical load due to the first and second layer metal used to interconnect the net.
The net that connects the output macro to the outside world will be subject to a capacitive load. This second type of extrinsic load (texout) is due to system capacitive loading and package pin capacitance.
Annotation
There are three approaches that can be used to compute propagation delay due to the interconnect nets:
● Front-Annotation, where a statistical estimate of metal delays based on the physical net size is used;
● Intermediate-Annotation, where a refined statistical estimate based on both the physical size of the net and the relative positions of the macros is used;
● Back-Annotation, based on the actual lengths of metal 1 and metal 2.
In all three cases, the interconnect delays due to the electrical fan-out load and the electrical loading due to wire-ORs are accurate. The part of the delay due to the metal lengths will vary. When package pin capacitance is included in the output macro extrinsic load, Front-Annotation uses an estimate since actual placement is unknown. Back-Annotation uses the actual pin capacitance for the assigned package pin. For critical circuits, since routing is the longest process, Intermediate-Annotation can reduce the routing passes or edits required. This tool, however, is not always available.
Drive Factors Macros are specified with drive capability, drive factors or adjustment factors that apply to extrinsic loading with the same variability that is found in the intrinsic delay specifications. Alternatively, they may have delays directly specified for several loads allowing an extrapolation to be made, often by reading a graph.
Manual Computation - One Method
The path propagation delays can be estimated using:
● the statistical wire delay tables (Lnet),
● fan-out loading (Lfo),
● wire-OR loading (Lwo), if any, and
● the appropriate k-factors or drive factors (k) for the macros chosen.
One equation for the typical extrinsic (load) delay for a single internal net is shown below and discussed in detail on the following pages.
texint = kfo * Lfo + knet * Lnet + kwo * Lwo
One equation for the typical extrinsic (load) delay for a single output net is shown below and discussed in detail on the following pages.
texout = kcap * (Csystem + Cpackage)
The form and notations used for these equations vary widely among the array vendors. The typical intrinsic macro delays in the path (tin; specified as Tpd in the macro documentation) and all typical extrinsic loading delays
texi = sum of all texintj (j = 1...all internal nets in the path) and texout
are multiplied by the proper worst-case timing multiplication factor and the results summed. Worst-case macro delays are used directly.
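The summation just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the single shared multiplier and every numeric value below are hypothetical, and a real library may supply different multipliers for different operating conditions.

```python
def worst_case_path_delay(intrinsic_ns, internal_net_ns, output_net_ns, wc_multiplier):
    """Scale each typical delay by the worst-case multiplier and sum the results.

    intrinsic_ns    -- typical intrinsic delays (Tpd) of the macros in the path
    internal_net_ns -- typical texint for each internal net in the path
    output_net_ns   -- typical texout for the output net (0.0 if none)
    wc_multiplier   -- worst-case timing multiplication factor (assumed common to all terms)
    """
    typical_terms = list(intrinsic_ns) + list(internal_net_ns) + [output_net_ns]
    return sum(wc_multiplier * t for t in typical_terms)


# Hypothetical three-macro path with two internal nets and one output net.
delay = worst_case_path_delay(
    intrinsic_ns=[0.30, 0.45, 0.60],
    internal_net_ns=[0.20, 0.35],
    output_net_ns=0.50,
    wc_multiplier=1.5,
)
print(f"{delay:.2f} ns")  # 3.60 ns
```

If the library already specifies worst-case delays, the multiplier would be 1.0 and the delays are simply summed.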
Example Equations for Extrinsic Loading - Internal Nets
Loading delays may be computed for Front-Annotation analysis by the general equation:
texint = kfo * Lfo + knet * Lnet + kwo * Lwo
where:
k = the k-factor for the series and the macro option
    kfo is for fan-out load;
    knet is for metal load;
    kwo is for wire-OR load
Lfo = the sum of the electrical fan-out loads in a net (pins with a fan-in of 2 count as 2 electrical loads);
Lnet = the estimated metal load from Front-Annotation tables or equations;
Lwo = the electrical load due to wire-OR
The k-Factors are the conversion factors for changing load units into time units. These k-Factors are expressed in ns/LU. The load units are computed for electrical fan-out, net metalization and electrical wire-OR loads. The k-Factors may be assumed to be identical for Front-Annotation estimation. If the array does not allow wire-OR structures, the equation reduces to:
tex = k * [ Lfo + Lnet ]     (reduced equation)
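As a minimal sketch, the general and reduced equations can be written as two small functions. The function names are assumptions, the load values are hypothetical, and 0.04 ns/LU is used only as a representative k-factor.

```python
def extrinsic_internal_delay(l_fo, l_net, l_wo, k_fo, k_net, k_wo):
    """texint = kfo * Lfo + knet * Lnet + kwo * Lwo (loads in LU, k in ns/LU)."""
    return k_fo * l_fo + k_net * l_net + k_wo * l_wo


def extrinsic_internal_delay_reduced(l_fo, l_net, k):
    """tex = k * (Lfo + Lnet), for arrays without wire-OR and a single k-factor."""
    return k * (l_fo + l_net)


k = 0.04  # ns/LU, representative value used for all three k-factors
# Hypothetical net: 3 LU of fan-out, 5 LU of metal, 0.4 LU of wire-OR load.
print(f"{extrinsic_internal_delay(3, 5.0, 0.4, k, k, k):.3f} ns")   # 0.336 ns
# Same net without the wire-OR term.
print(f"{extrinsic_internal_delay_reduced(3, 5.0, k):.3f} ns")      # 0.320 ns
```

Keeping the three k-factors as separate arguments leaves room for libraries that specify them individually, while the reduced form matches the common Front-Annotation assumption that they are identical.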
Lnet - Front-Annotation
Lnet is the statistical wire load taken from the Front-Annotation Statistical Wire Load table supplied by the vendor or from an equation that is supplied by the vendor, using the number of physical pins in the net minus 1 as an index. Lnet is expressed in load units. This load may also be specified through graphs.
For Front-Annotation, Lnet tables are derived from empirical measurement of hundreds of nets of equal size in actual circuits on the same array, and the resulting 50% point in the normal distribution (the median) is used as the table entry. This means that 50% of the net delays computed with this number will be equal to or smaller than the actual delay and 50% will be equal to or larger.
When an array is preliminary, not many circuits will have been tested or measured. This means that the Front-Annotation values are estimates of expected delays. The errors could be in either direction. For any array, Front-Annotation accuracy decreases with net size. Critical paths that are pre-placed or given a higher priority in place and route operations than the rest of the circuit can be kept within the Front-Annotation limits. Other paths that are not critical will have longer metal lengths.
A reasonable number of paths in the circuit can be prioritized, with the allowed number a function of the other placement restrictions for the circuit, cell utilization and internal pin count. A limit of 20% on the number of pre-placed and priority-routed paths is satisfactory for most circuits. Pre-placement should not be considered as a solution to a timing problem. It is available as an aid, depending on other placement considerations.
Fan-In The loading that a typical macro presents to its driving source is typically one load for bipolar arrays and higher for BiCMOS arrays. Some vendors specify fan-in in tabular form with other specification data, or indicate a general rule. Some macros look as if they present two or more loads to their driving sources when they do not. The graphic representation is a logical picture of the function, not a physical representation of how the function is constructed.
Fan-Out - Lfo Fan-in affects the electrical loading presented to the driving macro. The electrical fan-out load count may be higher than the physical fan-out load count. For example, a pin with a fan-in of 2 counts as two electrical loads in Lfo and one physical pin when looking up Lnet. Fan-out violations or fan-out in excess of derated levels should have been checked during the design review of the circuit. Derated fan-out limits are used by the AMCCERC software when performing fan-out load limit violation checking. If a fan-out load is found to be excessive, the circuit must be corrected before proceeding with the timing analysis. Lfo is the sum of all fan-out loads. This is the sum of all electrical fan-out loads - the sum of the fan-in for each pin connected to the net. Lfo is expressed in load units. The load a macro presents to a driving macro is part of the macro specifications.
Wire-OR - Lwo For arrays that allow wire-ORs, Lwo is W * (n-1) where W is the wire-OR load factor for the array and n is the wire-OR size. Lwo is expressed in load units. This term only exists for those arrays that allow a wire-OR. Not all arrays in all technologies allow the use of the wire-OR. If it is legal for the array, the presence of a wire-OR in a net will affect both the electrical and the physical loading in the net. Wire-ORing two outputs will not increase the fan-out load limit in the bipolar arrays as it would have in a CMOS array. AMCC Q5000 arrays, which allow wire-ORs, power-down the additional current sources by way of conditional geometry software. Load units for the Q5000 wire-OR macros are shown in Table 5-2.
Example
Table 5-2 Q5000 Wire-Or Loading (Lwo)
WIRE-OR SIZE    LU
WIREOR2         0.40
WIREOR3         0.80
WIREOR4         1.20
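The entries in Table 5-2 follow directly from Lwo = W * (n-1). The sketch below reproduces them; the value W = 0.40 LU is inferred here from the WIREOR2 row rather than quoted from a design manual, and the function name is an assumption.

```python
Q5000_WIRE_OR_FACTOR = 0.40  # W in LU; inferred from the WIREOR2 entry of Table 5-2


def wire_or_load(n_sources, w=Q5000_WIRE_OR_FACTOR):
    """Lwo = W * (n - 1) in load units; zero when the net has a single source."""
    if n_sources < 2:
        return 0.0
    return w * (n_sources - 1)


for n in (2, 3, 4):
    print(f"WIREOR{n}: {wire_or_load(n):.2f} LU")
# WIREOR2: 0.40 LU
# WIREOR3: 0.80 LU
# WIREOR4: 1.20 LU
```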
k-Factors
The k-factor for an internal macro output pin is the drive factor expressed in ns/LU (Load Units) that is used to convert the total net load units into time. As detailed above, load units are attributed to electrical fan-out loading, physical metal length loading and electrical wire-OR loading. Within an array series, the k-Factors vary by macro option and by edge direction. Examples are shown in Table 5-3 for the Q5000 Series (bipolar) and the Q20000 Series (high-speed bipolar) arrays.
Table 5-3a Example K-Factors - Option Specific *
Macro: k-factors          rising edge    falling edge
Low Power Option          0.04 ns/LU     0.08 ns/LU
Standard Option           0.04 ns/LU     0.04 ns/LU
High-Speed Option         0.02 ns/LU     0.04 ns/LU
15-Load Driver            0.02 ns/LU     0.02 ns/LU
25-Load Driver            0.01 ns/LU     0.01 ns/LU
* Q5000 Library, internal macros, typical values
Table 5-3b Example Com5max Library K-Factors For The AMCC Q20000 Series
kfo = knet; kcap **
Macro Type                                         Description     min/max       units
ECL input, Bidi and internal macros - L option     rising edge     2.3/5.1       ps/LU
                                                   falling edge    3.9/8.5       ps/LU
ECL input, Bidi and internal macros - S option     rising edge     1.8/3.9       ps/LU
                                                   falling edge    3.0/6.5       ps/LU
ECL input, Bidi and internal macros - H option     rising edge     1.4/3.1       ps/LU
                                                   falling edge    2.4/5.2       ps/LU
ECL input, Bidi and internal macros - drivers      rising edge     1.1/2.3       ps/LU
                                                   falling edge    1.5/3.3       ps/LU
ECL output - 50 ohm standard                       rising edge     16.0/36.0     ps/pF
                                                   falling edge    20.0/44.0     ps/pF
ECL output - 25 ohm Darlington                     rising edge     13.0/30.0     ps/pF
                                                   falling edge    15.0/34.0     ps/pF
ECL output - 50 ohm Darlington                     rising edge     13.0/30.0     ps/pF
                                                   falling edge    21.0/46.0     ps/pF
TTL input and Bidi (Non-Turbo) S, L, H options     rising edge     3.0/6.5       ps/LU
                                                   falling edge    6.0/13.0      ps/LU
TTL input and Bidi (Turbo) S, L, H options         rising edge     1.4/3.1       ps/LU
                                                   falling edge    2.4/5.2       ps/LU
TTL output - 20 mA                                 rising edge     33.0/72.0     ps/pF
                                                   falling edge    33.0/72.0     ps/pF
TTL output - 8 mA                                  rising edge     33.0/72.0     ps/pF
                                                   falling edge    54.0/117.0    ps/pF
** Individual macro k-factors are specified in the macro library documentation in the AMCC Q20000 Design Manual, Volume I, Section 6. Units are min/max spread for the Commercial 5V library only.
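The ps/pF entries in Table 5-3b are the kcap values used in the output-net equation texout = kcap * (Csystem + Cpackage) given earlier. As a hedged sketch of how a min/max pair turns into a delay range, the code below copies three output rows from the table; the dictionary layout, the function name and the 15 pF / 5 pF load values are assumptions for illustration only.

```python
# kcap (min, max) in ps/pF, copied from three output rows of Table 5-3b.
KCAP_PS_PER_PF = {
    "ECL output - 50 ohm standard": {"rising": (16.0, 36.0), "falling": (20.0, 44.0)},
    "TTL output - 20 mA": {"rising": (33.0, 72.0), "falling": (33.0, 72.0)},
    "TTL output - 8 mA": {"rising": (33.0, 72.0), "falling": (54.0, 117.0)},
}


def output_load_delay_ps(macro, edge, c_system_pf, c_package_pf):
    """Return (min, max) texout in ps: kcap * (Csystem + Cpackage)."""
    k_min, k_max = KCAP_PS_PER_PF[macro][edge]
    c_total = c_system_pf + c_package_pf
    return k_min * c_total, k_max * c_total


# Hypothetical load: 15 pF system load plus 5 pF package pin capacitance.
print(output_load_delay_ps("TTL output - 8 mA", "falling", 15.0, 5.0))
# (1080.0, 2340.0)
```

With the same 20 pF load, the rising edge of the 8 mA TTL output gives a smaller range, which is the kind of edge asymmetry that leads to pulse stretch or shrink at the output.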
Computing Lfo
Compute Lfo by summing the electrical loads of all pins driven. If a destination pin has a fan-in of 2, it counts as two electrical loads and as one physical pin. A destination may appear to have two physical loads internal to the macro. In these cases, the macro documentation will clearly identify the fan-in load represented by that pin. BiCMOS libraries have a higher average fan-in than do bipolar libraries.
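The distinction between electrical loads and physical pins can be made concrete with a short sketch. The function name and the example net are assumptions; the only rule used is the one stated above, that each destination pin contributes its fan-in to Lfo but counts as a single physical pin.

```python
def fan_out_load(fan_ins):
    """Return (Lfo in load units, physical pin count) for the pins driven by a net.

    fan_ins -- fan-in of each destination pin on the net
    """
    electrical_loads = sum(fan_ins)   # Lfo
    physical_pins = len(fan_ins)      # used when looking up Lnet
    return electrical_loads, physical_pins


# Hypothetical net driving three pins, one of which has a fan-in of 2.
print(fan_out_load([1, 1, 2]))  # (4, 3)
```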
Computing Lwo
Compute Lwo by multiplying the wire-OR load factor by the size of the wire-OR minus one, that is, W * (n-1). For libraries that do not allow a wire-OR, this term becomes zero. Example: For the AMCC Q5000, WIREOR4 = 1.2 loads.
Computing Lnet For a non-RC tree, non-distributed estimate of metal delays, use the vendor-supplied equation to find the metal loading. The cell sizes on the larger arrays in the same family are the same as for the smaller arrays. Since the distance from edge to edge of the array is larger, the average distance for an interconnect is larger. Therefore, the same macro path would be estimated as longer on the larger array than on the smaller one. Note that this is strictly the estimate - the actual delay will depend upon the macro positions and the actual routing paths.
Example AMCC uses: Lnet = a * (net size - 1)** b
where (net size - 1) is the physical pin count of the loads driven plus the number of sources on the net (assuming a wire-OR) minus one. When there is no wire-OR, (net size - 1) reduces to pins driven. Q2000 Series a and b factors (Historical)
For the Q5000, b = 0.67 and a varies by array. The Q5000T uses a = 3.84 and the Q1300T uses a = 1.96. For a macro driving a net sized 8 (net size - 1 = 7), this converts to 14.14 load units for the Q5000T and 6.63 load units for the Q1300T.
Exercises
1. For a Q5000T array, using the above information, what is the typical estimated delay in a net that is driven by a standard macro, if it is sourced by a four-input wire-OR (all sources are standard macros), and drives four other macros each of which has a fan-in of 1?
Answer:
k-factor = 0.04 ns/LU rising or falling
Lwo = 1.2 LU
Lfo = (4 macros * (fan-in = 1)) = 4 LU
Lnet = 3.84 * (8-1)** 0.67 = 14.14 LU
texint = 0.04 * ( 4 + 14.14 + 1.2 ) ns = 0.77 ns
2. Repeating for the Q1300T array:
Answer:
k-factor = 0.04 ns/LU rising or falling
Lwo = 1.2 LU
Lfo = (4 macros * (fan-in = 1)) = 4 LU
Lnet = 1.96 * (8-1)** 0.67 = 6.63 LU <== Only difference
texint = 0.04 * ( 4 + 6.63 + 1.2 ) ns = 0.47 ns
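The load-unit arithmetic in these exercises can be checked with a short script. This is only a sketch of the equations as stated in the text (Lnet = a * (net size - 1)** b and tex = k * (Lfo + Lnet + Lwo)); the function and parameter names are illustrative and are not taken from any AMCC tool.

```python
def net_load_units(a: float, b: float, pins_driven: int, sources: int = 1) -> float:
    """Lnet = a * (net size - 1) ** b, with net size - 1 = pins driven + sources - 1."""
    return a * (pins_driven + sources - 1) ** b


def extrinsic_delay_ns(k: float, l_fo: float, l_net: float, l_wo: float = 0.0) -> float:
    """tex = k * (Lfo + Lnet + Lwo), with k in ns per load unit."""
    return k * (l_fo + l_net + l_wo)


# Exercise 1: Q5000T (a = 3.84, b = 0.67), four-input wire-OR, four fan-in-1 loads.
l_net = net_load_units(a=3.84, b=0.67, pins_driven=4, sources=4)
t_ex = extrinsic_delay_ns(k=0.04, l_fo=4, l_net=l_net, l_wo=1.2)
print(f"Lnet = {l_net:.2f} LU, tex = {t_ex:.2f} ns")
```

Evaluating the same expression with the Q1300T factor a = 1.96 gives roughly 7.2 LU rather than the 6.63 LU printed in the text, so one of the printed Q1300T figures may be a misprint; check the vendor tables before relying on either value.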
Example Equations for Extrinsic Loading - Output Nets ECL and TTL output macros are often specified with no capacitive loading on the macros. One method for manual computation of the effect of output pin capacitive load on the propagation path is: texout = kcap * (Csystem + Cpackage )
where
kcap = the k-factor for the output macro
Csystem = the system capacitive load as seen by the pin
Cpackage = the package pin capacitance for that pin
The AMCC MacroMatrix AMCCANN user interface on the workstation allows the specification of the package, the package pin capacitance and the system load. Both package pin and system load capacitance may be specified as default values (all pins identical) or on an individual pin or group of pins basis. The output loading delay is automatically computed and added to the annotation files. All simulations are performed with annotation delay files.
Output net k-factors are shown in Table 5-4 for the Q5000 Series. These are typical values and must be adjusted to worst-case. Refer to Table 5-3b for the output values for the Q20000 Series for comparison.
Table 5-4 Example Output K-Factors (kcap)
Output type    Rising edge    Falling edge
TTL OUTPUT     0.072 ns/pF    0.072 ns/pF
ECL OUTPUT     0.045 ns/pF    0.037 ns/pF
Package pin capacitance varies from 1pF to 18pF on larger packages (specific pins). The variance on a single package can be as wide. System loads are by default TTL = 15pF and ECL = 5pF. Loads that exceed 30pF can be seen to begin to affect the timing in a significant way.
A 50pF load represents 3.6ns delay when driven by a TTL output. For the library from which these were taken, macro delays are in the 1-2ns range. The output load could swamp the rest of the path, offsetting any design optimization that was performed without considering the load to be driven.
These example k-factors are typical, which means that the delay must be multiplied by a worst-case multiplier to find the worst-case maximum delay. For the library chosen (AMCC's Q20000), this is 1.45 for Military, so a 50pF load on a TTL output adds 5.22ns worst-case maximum (MIL).
Also, note the ECL skew - rising edges are slower. This may offset macros whose intrinsic delays are the reverse (rising edge propagates faster). Path skew is discussed later.
Note: When computing tester limits for path delay evaluation during AC test, the system load is replaced by the tester load for the pin. Tester loading is related to the output macro type. The package pin capacitance would remain the same. The use of tester inaccuracy adjustments such as an added load of 0.15 pF is array and vendor specific.
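The output-load arithmetic above follows directly from texout = kcap * (Csystem + Cpackage). The sketch below uses the typical k-factors of Table 5-4 and the 1.45 Military multiplier quoted in the text; the function name and the optional multiplier argument are illustrative choices, not part of any vendor software.

```python
# Typical output k-factors in ns/pF, as listed in Table 5-4.
K_CAP_NS_PER_PF = {
    ("TTL", "rising"): 0.072,
    ("TTL", "falling"): 0.072,
    ("ECL", "rising"): 0.045,
    ("ECL", "falling"): 0.037,
}


def output_load_delay_ns(family: str, edge: str, c_system_pf: float,
                         c_package_pf: float = 0.0, wcm: float = 1.0) -> float:
    """texout = kcap * (Csystem + Cpackage), optionally scaled by a worst-case multiplier."""
    k_cap = K_CAP_NS_PER_PF[(family, edge)]
    return wcm * k_cap * (c_system_pf + c_package_pf)


typical = output_load_delay_ns("TTL", "rising", c_system_pf=50.0)
military = output_load_delay_ns("TTL", "rising", c_system_pf=50.0, wcm=1.45)
print(f"typical {typical:.2f} ns, MIL worst-case {military:.2f} ns")
```

Passing a non-zero `c_package_pf` shows how quickly a high-capacitance package pin adds to the same path.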
Q5000 Example (Front-Annotation) If a Q5000T internal S-option macro has an unloaded propagation delay (Tpd) of 0.65 ns, is wire-ORed with two other S-option sources (WIREOR3), and fans out to six loads, each with a fan-in of 1, compute the worst-case maximum delay through the loaded macro. Assume a Military-grade circuit.
The Tpd delay is given as 0.65 ns (typical).
The added delay due to loading is: tex = k * [ Lfo + Lnet + Lwo ]
Lfo = 6 load units.
net size = 6 + 3 - 1 = 8
Lnet for 8 and Q5000T = 15.47 LU from the Front-Annotation table.
Lwo = 0.8 from the wire-OR table.
krise = 0.04; kfall = 0.04 (ns/LU) from the macro specification sheet.
then: tex = 0.04 * [ 6 + 15.47 + 0.8 ] = 0.04 * [ 22.27 ] = 0.8908 ns for the rising edge or falling edge delay due to loading
The total delay through the loaded macro is: 0.65 + 0.89 = 1.54 ns (typical)
The MILitary delay value is: 1.54 * 1.45 = 2.23 ns MIL
where 1.45 is the MILitary MAX worst-case multiplier specified for the Q5000T.
Worst-Case Delay Multiplication Factors Once the sum of all the typical intrinsic and extrinsic propagation delays in a circuit or path segment is computed and adjusted for the output capacitive load, the result must be multiplied to obtain the worst-case delay as follows: Tpdworst-case = Tpdtypical * adjustment
To perform worst-case analysis, the worst-case multipliers or adjustment factors for the appropriate operating range must be used. Example bipolar array multipliers are shown below. Refer to the appropriate design manual for worst-case multipliers for a particular array series.
Many vendors specify a typical propagation delay, with or without assumed loading. Delays due to fan-out loads are also given for specific conditions. When the power supply or temperature varies from the typical specification, some derating must be applied to the typical delays. In addition, some allowances for process variations should be made. Some vendors specify the derating required for each of these items as a separate number or curve while others provide a combined worst-case multiplier, designed to assume that everything is in the worst possible state. There may be different derating or worst-case multipliers for macros and for the interconnect nets, and there may be different multipliers for different macros. For a given array, the designer needs to determine whether the worst-case delay multipliers apply to set-up, hold, recovery time or pulse width.
Array Series (Historical) Timing Adjustment Factors - Different Arrays
Examples The Raytheon design manual uses charts and shows a temperature derating that ranges from 0.88 at -25°C to 1.15 at 150°C (1.10 at 130°C). The voltage derating factor varies from 0.98 at -5.72V to 1.05 at -4.68V for ECL 10K. The same range applies to ECL 100K: 0.98 at -4.95V to 1.05 at -4.05V. The process derating varies from 0.73 to 1.2. At a junction temperature of Tj = 150°C, these combine to form a worst-case derating of 1.449 max and 0.63 min.
AMCC specifies a worst-case military maximum derating factor of 1.45 for its bipolar arrays and specifies a process variation of 1.19-1.45 (20% variation) in combination with the temperature and voltage variations. The AMCC multipliers also account for temperature, voltage and process. The minimum operating conditions use a worst-case range of 0.70-0.89. They apply to both the intrinsic macro delays (tin = Tpd) and to the extrinsic loading delays (tex), both loading delays on internal nets and loading delays on output macros due to capacitive load.
For interconnect delays, Raytheon provides tables to allow an estimate of fan-out delays based on fan-out load. They appear to be linear. (See Table 5-5.) Additional tables provide derating factors to allow adjustment for metal interconnect and metal temperature.
Table 5-5 Raytheon Adjustment Factors (Historical)
Factor              Variation
Temperature         0.88/-25°C    1.05/130°C    1.15/150°C
Voltage, ECL10K     0.98/-5.2V    1.05/-4.68V
Voltage, ECL100K    0.98/-4.95V   1.05/-4.05V
Process             0.73          1.2
TOTAL               0.63 MIN      1.449 MAX
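The Raytheon totals quoted above appear to be the plain product of the individual temperature, voltage and process factors. A minimal check, assuming that multiplicative combination and using the extreme values cited in the text (the dictionary names are illustrative):

```python
from math import prod

# Extreme derating factors as quoted in the text for the Raytheon example.
MAX_FACTORS = {"temperature": 1.15, "voltage": 1.05, "process": 1.2}   # Tj = 150 C
MIN_FACTORS = {"temperature": 0.88, "voltage": 0.98, "process": 0.73}  # -25 C


def combined_derating(factors: dict) -> float:
    """Multiply the individual derating factors into one worst-case multiplier."""
    return prod(factors.values())


wcm_max = combined_derating(MAX_FACTORS)
wcm_min = combined_derating(MIN_FACTORS)
print(f"max multiplier {wcm_max:.3f}, min multiplier {wcm_min:.2f}")
```

A typical path delay multiplied by `wcm_max` and by `wcm_min` then brackets the delay over the full operating range.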
AMCC uses a non-linear equation to compute net loading and uses the macro worst-case multipliers on the interconnect delays. For older arrays, AMCC also specified a linear relationship based only on fan-out loading (0.5ns/load for rising edge; 1.0ns/load for falling edge), also combined with the macro worst-case multipliers. For the new BiCMOS series, the different technologies make different multipliers necessary. There is one for the bipolar interface macros and their extrinsic delays and one for the internal CMOS core macros and their extrinsic delays. The internal core macro worst-case multipliers apply to the set-up, hold, recovery time and pulse width values. An array series may have different multipliers for different arrays within the series.
Always verify the multipliers required, if any, and verify when they are used. Always go back to the latest information from the array vendor.
Front-Annotation Before schematic capture, computation of the propagation delays due to loading in a circuit can be performed using a table of statistically derived metal loads (Lnet) for a given net size. After schematic capture, Front-Annotation software is available to provide the designer with a file of rising and falling edge delays per net expressed as NOM, MIN and MAX, or as a min/max range to represent the uncertainty spread within a given operating condition. Output loading will be based on customer-described system loads and estimated package-pin capacitance. For EWS systems that can handle ambiguity testing, this allows an evaluation of the uncertainty window propagated down a path. There are six delays for each net. By incorporating these delays into the simulation database, the designer can obtain a statistical estimate of the circuit performance. Front-Annotation software is resident on individual workstations. It is linked into the workstation's model library by way of an EWS-specific command procedure and is used directly in the simulation database. The resulting simulation output file is the result of both intrinsic and extrinsic delays. Front-Annotation results cannot be guaranteed. They cannot be used as specifications for the final circuit. Indeed, if they are within 10% of the circuit target specification, the path should be evaluated for further optimization to reduce its delay. Any path this close to the specification at Front-Annotation time should be called out in the design submission package for special handling and possible preplacement. As stated before, preplacement should never be assumed to be able to solve a timing problem.
Remedial Steps - When a path is too slow When a path is already too slow by Front-Annotation simulation results, the designer must stop and address the problem before committing more time to simulation and submission requirements. A path that appears to be on the borderline of meeting specification requirements must also be reviewed for possible changes.
● For slow paths, design changes such as selecting different macros or macro options, trying an alternative design, and unloading paths may be sufficient.
● In some cases, a custom macro may need to be designed. This is usually a last-resort approach since it defeats the advantages of semi-custom design, i.e., it is an additional charge and possible delay. Determine the need for a custom macro as soon as feasible in the design cycle to allow time for either the development of that macro or the development of an alternative to it.
● The last solution is to allow placement to effect a reduction in the interconnect delays. Placement cannot be relied upon to reduce all paths and the achievement of the reduction through placement is dependent on the other conflicting placement requirements that exist for the circuit. Preplacement requests are normally reserved for those 20%-30% paths termed critical paths. Critical paths would be clock nets and other performance-sensitive nets.
Table 5-6 Methods For Speed Improvement
Methods to use for Speed Improvement
● use an alternate configuration
● use different macros - different functions
● use different macro options (high-speed, differential, drivers)
● use parallel paths to reduce loading
● use large, complex macros rather than many small ones (MSI macro vs. individual)
● use preplacement to control etch length
● design to fit on a smaller array - this possibility depends on modularity of the design
● develop custom macros to provide complex, design-specific functions
● reduce system loading
● use low-capacitance package pins for high performance signals (controls placement)
Intermediate-Annotation When a circuit is borderline in its performance, Intermediate-Annotation may be available to refine the design. It can identify paths that are already in trouble or that are borderline that were not identified in the pre-placement Front-Annotation analysis. The use of the Intermediate-Annotation delay file in a re-simulation is cost-effective since it allows placement changes to be made before entering the costly routing process. (AMCC used this for problem circuits on an in-house only basis for some time.)
Intermediate-Annotation uses the same electrical loading delays as Front-Annotation but it has a refinement on the delays due to metal length. One program uses the placement file and the Manhattan Distance Algorithm and closely approximates the results that will be seen in Back-Annotation. The accuracy of the algorithm varies with the net size. Values for the loading delays on the output macros will reflect the same customer-defined system load but will now include actual package-pin capacitances, providing a further refinement in the timing analysis.
To allow more control at the customer site, some ASIC companies are making packaging databases available to the annotation procedure. Combined with a basic placement capability, this allows the intermediate annotation to be available at the customer's site before any simulations have been performed. It improves the trial-and-error or alternate solution analysis by providing more accurate data. Note that Intermediate-Annotation results still cannot be guaranteed. They cannot be used as specifications for the final circuit.
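The text does not describe how the vendor program applies the Manhattan distance, so the following is only an illustration of the idea: with macro positions known from placement, the length of a net can be estimated from horizontal plus vertical distances before any routing exists. The placement dictionary, the cell-unit coordinates and the source-to-each-load ("star") model are all assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (column, row) of a placed macro, in hypothetical cell units


def manhattan(p: Point, q: Point) -> int:
    """Manhattan distance: horizontal plus vertical separation."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def star_length(source: Point, loads: List[Point]) -> int:
    """Estimate net length as the sum of source-to-load Manhattan distances."""
    return sum(manhattan(source, load) for load in loads)


# Hypothetical placement of one driving macro (U1) and two loads (U2, U3).
placement: Dict[str, Point] = {"U1": (0, 0), "U2": (3, 4), "U3": (5, 1)}
length = star_length(placement["U1"], [placement["U2"], placement["U3"]])
print(f"estimated net length: {length} cell units")
```

Converting such a length into metal load units is array and vendor specific and is not attempted here.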
Remedial Steps - Intermediate-Annotation When a path is too slow by Intermediate-Annotation simulation results, the designer must again stop and address the problem before committing more time to simulation and submission requirements. A path that appears to be on the borderline of meeting specification requirements must also be reviewed for possible changes. The procedures followed are the same as those listed for Front-Annotation timing problems.
Back-Annotation The most accurate method of computing a circuit propagation delay requires that the circuit be completed through layout (routing). Back-Annotation software adds the actual metal delay to the electrical fan-out and wire-OR delays in the path. As for Front- and Intermediate-Annotation, Back-Annotation files provide the net rising and falling edge delays. These files can be used to compute accurate predictive path propagation delays and external set-up and hold times. Back-Annotation delay files include metal load delays which account for the differences in rise and fall times due to differences in metal load units for the rising and falling edges of metal 1 and metal 2. This software is undergoing further evolution as RC distributive networks are being examined by several vendors as a possible refinement. There is also research being undertaken to examine the topological effects on the path delays (nearness of a signal on one level to another active signal).
The Back-Annotation software uses a mainframe-based database and is not available on the individual workstations. The program provides files that can be used on the workstations and which include the actual metal delays of each net. The Back-Annotation delay files are used in the simulations in the same manner as the Front-Annotation files. Back-Annotation must be run and accepted as final before the generation of the actual arrays. Vendors will guarantee that the array will match (will not be slower than) the results of the Back-Annotation.
Back Annotation Load Units Example - specific to the metal layer
Data used to be supplied such that computations had to be made, hence the conversion tables. These computations are now performed by software. Synopsys PrimeTime(TM) and DesignTime(TM) STA (Static Timing Analysis) import back-annotation data (PDEF, EDIF, SDF and RC_parametric file) and perform the back-annotated timing analysis.
Remedial Steps - Back-Annotation If Intermediate-Annotation analysis has been performed, the probability of a failure during Back-Annotation analysis is significantly reduced. When a path is having timing problems after routing, the solutions are still those listed before, to reevaluate the problem and to determine an acceptable solution. Minor problems can often be solved by a post-route edit. More serious problems require a more serious solution, including a return to the design phase. All timing problems must be resolved before committing the design to production.
Exercises
1. For a bipolar array series of interest, how are the intrinsic macro delays specified?
2. For the same series, if worst-case multipliers or adjustment factors are specified, what are they and what operating conditions do they encompass?
3. Repeat 1. and 2. for a BiCMOS array series. Does the technology make a difference to the specification approach? If it does, why does it? If it does not, why not?
4. Are the macros in the bipolar library chosen in step 1. specified as unloaded or loaded?
External Set-Up and Hold Times
Introduction When the input to the data (D) or the clock (C) or both pins on a flip/flop or a latch is supplied from an external signal, the external set-up and hold times must be computed. The computations must be for the worst-case and account for processing skew. The worst-case may be the worst-case maximum conditions or the worst-case minimum conditions.
Hold time violations are a concern wherever two storage elements interface with each other and are clocked by different drivers. An example structure is parallel-clocked register flip/flops driven by multiple clock macros. Any multiple clock organization or clock distribution tree is subject to this design hazard. The error occurs when the Q output of one flip/flop (or latch) directly feeds the D input of another. If the clock to the second flip/flop is delayed due to tracking or skew, the D input may change during the set-up/hold window. This can be avoided by using the guidelines and design checks described in this chapter. (See the Case Study: Preventing Hold Violations Due to Clock Skew.)
To meet design submission requirements, both the maximum worst-case and the minimum worst-case equations need to be computed to determine the worst-case window for external set-up and hold times for the specified operating conditions. Both rising edge and falling edge input path propagation must be evaluated. For deeply-nested paths, consult the array vendor for other effects that must be considered.
Figure 6-1 illustrates the delay paths. The data delay path is TD; the clock delay path is TC. Depending on the methods used to specify macro timing, the data and clock paths may need to be divided into interface and internal macro components. Results computed or derived from simulations using Front-Annotation data cannot be considered as the circuit specification. Those derived from Back-Annotation are considered to be the specification.
Figure 6-1 External Set-Up And Hold - Clock And Data Paths
There may be no internal macros in the data or the clock path or both, leaving the interface macro and the extrinsic loading as the only components in the paths. There may be multiple internal macros in a clock buffer tree while the data path is unloaded, or the data path may be heavily loaded while the clock path remains relatively simple. The relative loading between these two paths will determine whether the set-up or hold time is negative or positive and how large the set-up and hold window will be. The intrinsic set-up and hold time of the latch or flip/flop is another factor in the equation. Figure 6-2 External Set-Up And Hold Times - Interpretation
Figure 6-2 diagrams the definitions of positive set-up and hold times. Data must be held stable relative to the associated clock during these times. The time span indicated by the set-up and hold times is referred to as a timing "window". The term window is used since the external set-up and hold computations will produce a wider timing range than will be exhibited by any single die. For one set of operating conditions, the set-up time is computed assuming that the data path is worst-case maximum and the clock is worst-case minimum. The hold time reverses this and is computed assuming that the clock path is worst-case maximum and the data path is worst-case minimum. The resulting range is designed to encompass all extremes of voltage, temperature and process. (See Figure 6-3.) Figure 6-3 Worst-Case Range
Figure 6-3 diagrams the worst-case window produced from the maximum external set-up time and the maximum external hold time. The conditions under which these two values are computed are inconsistent and contradictory with each other, thereby ensuring that the set-up and hold timing "window" for the same two signals for any single array is contained within the computed range.
Case 1: When The Timing Specifications Are Nominal For the case where the vendor has specified timing delays as nominal, with adjustment factors or worst-case multipliers, the following procedures can be applied. The worst-case multiplier (WCM) may be the product of adjustment factors for temperature, voltage and process.
Set-Up Time - Generic Equation When computing the set-up time, it is desirable to assume that the data propagation path delay is the worst-case maximum and that the clock path propagation delay is a worst-case minimum for the operating conditions. The generic equation is: tsuexternal = WCMmax * (tD + Tsumacro) - WCMmin * tC
Hold Time - Generic Equation When computing the hold time, it is desirable to assume that the data propagation path delay is the worst-case minimum and that the clock path propagation delay is a worst-case maximum for the operating conditions. The generic equation is: thexternal = WCMmax * (tC + Thmacro) - WCMmin * tD
Terminology Definitions
tD: Nominal data path propagation delay from the circuit input and up to the flip/flop or latch data input pin; including the interface macro intrinsic delay and the extrinsic net delay for the net driven by the interface macro, computed using Front-Annotation methodology before layout, Back-Annotation after layout.
tC: Nominal clock path propagation delay from the circuit input and up to the flip/flop or latch clock input pin; including the interface macro intrinsic delay and the extrinsic net delay for the net driven by the interface macro, computed using Front-Annotation methodology before layout, Back-Annotation after layout.
Tsumacro: Macro intrinsic set-up time as specified in the macro library documentation. (For this case, it is assumed to be specified as typical.)
Thmacro: Macro intrinsic hold time as specified in the macro library documentation. (For this case, it is assumed to be specified as typical.)
Variations in Specifications Even when the propagation delays through a macro are specified as typical or nominal, the set-up and hold times for flip/flops and latches may be specified as worst-case maximum. In that case, the macro set-up (Tsumacro) and hold (Thmacro) times will not be multiplied by a worst-case multiplier or adjustment factor.
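The Case 1 equations translate directly into two small functions. This is a sketch under stated assumptions: the function names are invented for illustration, the delays in the example are made-up nominal values in ns, and the multipliers 1.45 and 0.70 are used only as sample numbers. The `macro_is_worst_case` flag models the variation just described, where the macro set-up or hold time is already specified as worst-case and is therefore not multiplied.

```python
def external_setup(t_d: float, t_c: float, tsu_macro: float,
                   wcm_max: float, wcm_min: float,
                   macro_is_worst_case: bool = False) -> float:
    """tsuexternal = WCMmax * (tD + Tsumacro) - WCMmin * tC."""
    macro_term = tsu_macro if macro_is_worst_case else wcm_max * tsu_macro
    return wcm_max * t_d + macro_term - wcm_min * t_c


def external_hold(t_d: float, t_c: float, th_macro: float,
                  wcm_max: float, wcm_min: float,
                  macro_is_worst_case: bool = False) -> float:
    """thexternal = WCMmax * (tC + Thmacro) - WCMmin * tD."""
    macro_term = th_macro if macro_is_worst_case else wcm_max * th_macro
    return wcm_max * t_c + macro_term - wcm_min * t_d


# Hypothetical nominal path delays and macro times, in ns.
tsu = external_setup(t_d=2.0, t_c=1.0, tsu_macro=0.5, wcm_max=1.45, wcm_min=0.70)
th = external_hold(t_d=2.0, t_c=1.0, th_macro=0.2, wcm_max=1.45, wcm_min=0.70)
print(f"external set-up {tsu:.3f} ns, external hold {th:.3f} ns")
```

Because the data path is scaled by the maximum multiplier in one equation and by the minimum in the other, the two results describe a window wider than any single die will show.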
Case 2: When The Timing Specifications Are Worst-Case For the case where the vendor has specified timing delays as worst-case, the following procedures can be applied. Note that if the macro intrinsic delays are specified as worst-case maximum, some means of computing the worst-case minimum for the same operating conditions must exist.
Set-Up Time - Generic Equation When computing the set-up time, it is desirable to assume that the data propagation path delay is the worst-case maximum and that the clock path propagation delay is a worst-case minimum for the operating conditions. The generic equation is: tsuexternal = tD(max) - tC(min) + Tsumacro(max)
Hold Time - Generic Equation When computing the hold time, it is desirable to assume that the data propagation path delay is the worst-case minimum and that the clock path propagation delay is a worst-case maximum for the operating conditions. The generic equation is: thexternal = tC(max) - tD(min) + Thmacro(max)
Example - The AMCC BiCMOS Q14000 Series The Q14000 Series has a complex arrangement of worst-case multipliers, separating interface and internal macros and differentiating between the channelled and channelless arrays. For the Q14000 BiCMOS arrays, the data and clock paths are broken down into the interface macro and its net (tDinput or tCinput), and the internal macros and the nets they drive (tD or tC). Both set-up and hold times are specified as nominal for this array. For the Q14000 Series and its dual multipliers, the external set-up and hold time equations become:
Table 6-1 General Equations When Worst-Case Multipliers Are Used
tsuexternal for Q14000 BiCMOS = WCMmax(interface) * tDinput + WCMmax(core) * tD - WCMmin(interface) * tCinput - WCMmin(core) * tC + WCMmax(core) * Tsu(macro)
thexternal for Q14000 BiCMOS = WCMmax(interface) * tCinput + WCMmax(core) * tC - WCMmin(interface) * tDinput - WCMmin(core) * tD + WCMmax(core) * Th(macro)
Table 6-2 provides the external set-up and hold equations for the Q14000 BiCMOS Series Arrays for four defined operating conditions, MIL, COM5, COM4 and MIN, using the worst-case delay multipliers specified for the array series for those conditions. Note that, since these arrays specify nominal macro intrinsic set-up and hold times, the worst-case time delay multipliers are applied to the set-up and hold times as well as to the Tpd delays. This may not always be the case. (Refer to the AMCC Q5000 and Q14000 Design Manuals.)
Table 6-2 Set-Up And Hold Equations, BiCMOS Q14000 Series Arrays (Q28000B, Q14000B, Q6000B And Q800B Arrays)
MILITARY OPERATING RANGE MIL4 or MIL5
tsuexternal = 1.70 * tDinput + 1.95 * tD - 1.40 * tCinput - 1.59 * tC + 1.95 * Tsu(macro)
thexternal = 1.70 * tCinput + 1.95 * tC - 1.40 * tDinput - 1.59 * tD + 1.95 * Th(macro)
COMMERCIAL OPERATING RANGE COM5
tsuexternal = 1.55 * tDinput + 1.55 * tD - 1.27 * tCinput - 1.27 * tC + 1.55 * Tsu(macro)
thexternal = 1.55 * tCinput + 1.55 * tC - 1.27 * tDinput - 1.27 * tD + 1.55 * Th(macro)
COMMERCIAL OPERATING RANGE COM4
tsuexternal = 1.55 * tDinput + 1.75 * tD - 1.27 * tCinput - 1.43 * tC + 1.75 * Tsu(macro)
thexternal = 1.55 * tCinput + 1.75 * tC - 1.27 * tDinput - 1.43 * tD + 1.75 * Th(macro)
MINIMUM
tsuexternal = 0.86 * (tD + tDinput) - 0.70 * (tC + tCinput) + 0.86 * Tsu(macro)
thexternal = 0.86 * (tC + tCinput) - 0.70 * (tD + tDinput) + 0.86 * Th(macro)
These equations are for the channelless arrays, those designed with a sea-of-gates or sea-of-cells architecture. (Refer to the AMCC Q14000 Design Manual for the equations for the other arrays in the series that are channelled.)
Which Equations Should be Used?
For external set-up time, use the following as a guideline:
● Use the MILITARY or COMMERCIAL equations for external set-up time when (tD + tDinput) - 0.82 * (tC + tCinput) > 0.
● Use the MINIMUM equations for external set-up time when (tD + tDinput) - 0.82 * (tC + tCinput) < 0.
For external hold time, use the following as a guideline:
● Use the MILITARY or COMMERCIAL equations for external hold time when (tC + tCinput) - 0.82 * (tD + tDinput) > 0.
● Use the MINIMUM equations for external hold time when (tC + tCinput) - 0.82 * (tD + tDinput) < 0.
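The Table 6-2 equations and the selection guideline can be captured in a small lookup-driven sketch. The multiplier values are those printed in Table 6-2 for the channelless arrays and the equations follow the general form of Table 6-1; the function names, the tuple layout and the example delays (nominal, in ns) are assumptions for illustration only. The guideline does not say what to do when the test expression is exactly zero, so this sketch arbitrarily falls back to MINIMUM in that case.

```python
# (interface max, core max, interface min, core min) multipliers from Table 6-2.
MULTIPLIERS = {
    "MIL":  (1.70, 1.95, 1.40, 1.59),
    "COM5": (1.55, 1.55, 1.27, 1.27),
    "COM4": (1.55, 1.75, 1.27, 1.43),
    "MIN":  (0.86, 0.86, 0.70, 0.70),
}


def tsu_external(cond, td_input, td, tc_input, tc, tsu_macro):
    """External set-up time for one operating condition."""
    i_max, c_max, i_min, c_min = MULTIPLIERS[cond]
    return i_max * td_input + c_max * td - i_min * tc_input - c_min * tc + c_max * tsu_macro


def th_external(cond, td_input, td, tc_input, tc, th_macro):
    """External hold time for one operating condition."""
    i_max, c_max, i_min, c_min = MULTIPLIERS[cond]
    return i_max * tc_input + c_max * tc - i_min * td_input - c_min * td + c_max * th_macro


def pick_condition(worst, lead_path, lag_path):
    """Guideline: use the MIL/COM equations when lead - 0.82 * lag > 0, else MINIMUM."""
    return worst if lead_path - 0.82 * lag_path > 0 else "MIN"


# Hypothetical nominal delays in ns.
td_input, td, tc_input, tc = 0.8, 1.2, 0.8, 0.5
setup_cond = pick_condition("MIL", td + td_input, tc + tc_input)  # set-up: data leads
hold_cond = pick_condition("MIL", tc + tc_input, td + td_input)   # hold: clock leads
tsu = tsu_external(setup_cond, td_input, td, tc_input, tc, tsu_macro=0.4)
th = th_external(hold_cond, td_input, td, tc_input, tc, th_macro=0.2)
print(setup_cond, round(tsu, 3), hold_cond, round(th, 3))
```

With these sample delays the data path dominates, so the set-up time comes from the MILITARY equation while the hold time comes from the MINIMUM equation.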
Example - AMCC Q20000 Bipolar Series At the other end of the complexity spectrum is the AMCC Q20000 bipolar array series. The Q20000 Series specifies its timing using a min/max range and specifies intrinsic set-up and hold times as worst-case.
External Set-up Time (MIL5, COM5) Given the MAX and MIN libraries, the external set-up time equations for the AMCC Q20000 Series become those shown in Table 6-3. MIL5MAX, MIL5MIN, COM5MAX and COM5MIN equations are shown. Note that for the MIL5MAX equation, the data path uses the MIL5MAX max data and the clock path uses the MIL5MAX min data.
Table 6-3 Set-Up Time Equations *
MILITARY:
tsu(MIL5MAX) = tD(MIL5MAX,max) - tC(MIL5MAX,min) + Tsu(MIL5MAX)
tsu(MIL5MIN) = tD(MIL5MIN,max) - tC(MIL5MIN,min) + Tsu(MIL5MAX)
COMMERCIAL:
tsu(COM5MAX) = tD(COM5MAX,max) - tC(COM5MAX,min) + Tsu(COM5MAX)
tsu(COM5MIN) = tD(COM5MIN,max) - tC(COM5MIN,min) + Tsu(COM5MAX)
The equation does not use MIL5MAX max data for the data path together with MIL5MIN min data for the clock path. Those two data sets are from two different operating condition extremes. The combination of two extremes of temperature, voltage and process variation cannot occur on a single array.
External Hold Time (MIL5, COM5) Given the MAX and MIN libraries as described before, the external hold time equations become those shown in Table 6-4. MIL5MAX, MIL5MIN, COM5MAX and COM5MIN equations are shown. The same rules apply to these equations in terms of what timing data is used to compute the individual delay paths.
Table 6-4 Hold Time Equations
MILITARY:
th(MIL5MAX) = tC(MIL5MAX,max) - tD(MIL5MAX,min) + Th(MIL5MAX)
th(MIL5MIN) = tC(MIL5MIN,max) - tD(MIL5MIN,min) + Th(MIL5MAX)
COMMERCIAL:
th(COM5MAX) = tC(COM5MAX,max) - tD(COM5MAX,min) + Th(COM5MAX)
th(COM5MIN) = tC(COM5MIN,max) - tD(COM5MIN,min) + Th(COM5MAX)
* Both MIL5MAX and MIL5MIN or both COM5MAX and COM5MIN set-up and hold times must be computed and the largest external set-up time and the largest external hold time noted on design submission.
The definitions of the terms used in the equations in Table 6-3 and Table 6-4 are given in Table 6-5.
Table 6-5 Terminology Definitions
Defining a "memory macro" as a latch, a flip/flop or an MSI containing one or the other, the terms used in the equations for the MIL5MAX and COM5MAX libraries are defined below. Terms for the other libraries (MIL5MIN and COM5MIN) would be similarly defined.
tD(MIL5MAX,max): data path propagation delay from the circuit input and up to the memory macro data input pin; computed using Front-Annotation methodology before layout, Back-Annotation after layout; computed with the maximum values of the Tpd delays and the k-Factors from the MIL5MAX library.
tD(COM5MAX,max): data path propagation delay from the circuit input and up to the memory macro data input pin; computed using Front-Annotation methodology before layout, Back-Annotation after layout; computed with the maximum values of the Tpd delays and the k-Factors from the COM5MAX library.
tD(MIL5MAX,min): data path propagation delay from the circuit input and up to the memory macro data input pin; computed using Front-Annotation methodology before layout, Back-Annotation after layout; computed with the minimum values of the Tpd delays and the k-Factors from the MIL5MAX library.
tD(COM5MAX,min): data path propagation delay from the circuit input and up to the memory macro data input pin; computed using Front-Annotation methodology before layout, Back-Annotation after layout; computed with the minimum values of the Tpd delays and the k-Factors from the COM5MAX library.
tC(MIL5MAX,max): clock path propagation delay from the circuit input and up to the memory macro clock input pin; computed using Front-Annotation methodology before layout, Back-Annotation after layout; computed with the maximum values of the Tpd delays and the k-Factors from the MIL5MAX library.
tC(COM5MAX,max): clock path propagation delay from the circuit input and up to the memory macro clock input pin; computed using Front-Annotation methodology before layout, Back-Annotation after layout; computed with the maximum values of the Tpd delays and the k-Factors from the COM5MAX library.
tC(MIL5MAX,min): clock path propagation delay from the circuit input and up to the memory macro clock input pin; computed using Front-Annotation methodology before layout, Back-Annotation after layout; computed with the minimum values of the Tpd delays and the k-Factors from the MIL5MAX library.
tC(COM5MAX,min): clock path propagation delay from the circuit input and up to the memory macro clock input pin; computed using Front-Annotation methodology before layout, Back-Annotation after layout; computed with the minimum values of the Tpd delays and the k-Factors from the COM5MAX library.
Tsumacro = Tsu as specified in Section 6
Thmacro = Th as specified in Section 6
Converting COM5 to COM4, MIL5 to MIL4

The use of adjustment factors to convert the COM5 and MIL5 data to COM4 and MIL4 was described in Chapter 5. These factors would also be applied to the computations for external set-up and hold times.
Case Study: Preventing Hold Violations Due To Clock Skew

Hold Time Considerations

Assume a typical 32-Bit shift register being driven by two clocks. No extraordinary design considerations are needed to maintain hold time when the Q output and D input are on flip/flops that are driven by the same clock net. (See Figure 6-4.) The AMCC Q5000 Bipolar macro library is used in the example.

Figure 6-4 32-Bit Register: Single Clock Path
If the clock nets are different (Figure 6-5), the two clock paths need to be analyzed to determine if the required hold time has been satisfied. Differences in tracking, fan-out and metal lengths between the paths can be significant and cause enough delay in the second path (PATH2) to create a hold time violation.

Figure 6-5 32-Bit Register: Two Clock Paths, Balanced Tree
CLK through D1 and through clock->Q on FF15 is the data path for the hold time computation for FF16. CLK through D2 and coming into the C input of FF16 is the corresponding clock path. The macro hold time, shown as Th for FF16, cannot be violated.
Overcoming Hold Time Error

Two options are available to overcome the problem of hold time error.

1. Add one or more gates in PATH1 to offset the delay that has been found in PATH2. Generally, a single gate will suffice (B1, a buffer). (See Figure 6-6.)
2. Minimize the differences to an acceptable level through macro selection, placement, load balancing and reduced metal lengths.
   ❍ Use identical clock driver macros to improve tracking. In buffer trees, use identical macros within the trees.
   ❍ Balance the clock fan-out loading (Lfo) to within 10%. In a buffer tree, divide the loads to be as evenly distributed as possible, within 1 or 2 loads of each other.
   ❍ Balance the clock load delays (fan-out + metal) (k fo * L fo + k net * L net) to within 30%. This requires placement considerations and possible preferential placement.
   ❍ Add a buffer if computation still shows a hold time violation (data changing too fast for clock). This is B1 in Figure 6-6 (between FF15 and FF16 because that is where the clocks switch from being sourced by PATH1 to being sourced by PATH2).
Figure 6-6 32-Bit Register: Two Clock Paths, Balanced Tree, Buffer Added
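The balancing guidelines for the two clock paths can be checked mechanically once the fan-out and net loads are known. The Python sketch below is illustrative only: the function names, the k-factor values and the load figures are hypothetical, and "within 10%" or "within 30%" is interpreted here as the difference measured against the larger of the two values, which is an assumption and not a vendor definition.

```python
def relative_difference(a, b):
    """Difference between two positive values as a fraction of the larger."""
    return abs(a - b) / max(a, b)


def load_delay(k_fo, l_fo, k_net, l_net):
    """Extrinsic clock load delay: k_fo * L_fo + k_net * L_net."""
    return k_fo * l_fo + k_net * l_net


def clock_paths_balanced(path1, path2, fo_limit=0.10, delay_limit=0.30):
    """Apply the 10% fan-out and 30% load delay balancing guidelines.

    Each path is a dict with the keys k_fo, l_fo, k_net and l_net.
    """
    fo_ok = relative_difference(path1["l_fo"], path2["l_fo"]) <= fo_limit
    delay_ok = relative_difference(load_delay(**path1),
                                   load_delay(**path2)) <= delay_limit
    return fo_ok and delay_ok


# Hypothetical values, not taken from any vendor library.
path1 = {"k_fo": 0.05, "l_fo": 16, "k_net": 0.02, "l_net": 20}
path2 = {"k_fo": 0.05, "l_fo": 16, "k_net": 0.02, "l_net": 30}
print(clock_paths_balanced(path1, path2))
```

A check of this kind only flags an imbalance; the hold time computation itself still has to be made with the vendor's tracking figures.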
Design Check

To compute the hold time of the design, the propagation delay of both paths needs to be determined while factoring in the effects of tracking. For the Q5000 Series, the effects are defined below.

For like edges
● For like edges and like structures placed on adjacent cells on the same row or column, a 5% tracking is specified for AMCC Q5000 and Q14000 arrays. The amount will vary with the array and with the manner used by the vendor to specify tracking.
● If the two drivers are of the same macro type and they are placed within the same quadrant, a tracking of 10% is specified for the AMCC Q5000 and Q14000 arrays. Check with the vendor for the array to be used.
● If they are not the same macro but are placed within the same quadrant, 20% tracking is specified for the AMCC Q5000 and Q14000 arrays. Check with the vendor for the array to be used.

The buffer B1 in Figure 6-6 adds delay to the data path to prevent the hold time violation for Th of macro FF16.

For unlike edges
● For unlike edges and unlike structures placed within the same quadrant, 20% tracking is specified for the AMCC Q5000 and Q14000 arrays. Check with the vendor for the array to be used. The variations will usually be between 20 and 50%. Other placement options are not recommended.
Note: It is unlikely that two paths whose tracking in relation to each other is of concern, would be placed in different quadrants. If they are, consult with the array vendor.
Computing Hold Time in the Register Example

To compute the hold time of the design, the propagation delay of both paths needs to be determined while factoring in the effects of tracking.
The worst-case tracking occurs when tracking reduces the path propagation delay of both the clock path PATH1 and the clock-to-output (Tpd) delay of the last flip/flop in the chain that is connected to PATH1 (FF15). This includes both the intrinsic and extrinsic delays.
The general equation for the example is as follows (using 10% tracking for the first macro in PATH1, and 20% tracking for the second macro):

WCM * [ 0.9 * Tpd D1 + 0.8 * Tpd FF15 - Tpd D2 ] > Th FF16

(TRK1 = 0.9, the 10% tracking applied to Tpd D1; TRK2 = 0.8, the 20% tracking applied to Tpd FF15)
where Tpd is nominal. For the example, the driver macros (D1) are identical and placed within the same chip quadrant. For the Q5000 Series, this has been specified to mean a 10% tracking (0.9). For identical macros on the same row in the same quadrant, this reduces to 5% tracking or 0.95. TRK1 will be taken at 10%. The second tracking factor (TRK2) applies to FF15. This accounts for unlike edges and unlike structures between D2 and FF15. It does assume that both are placed within the same quadrant. For macros that are not identical, placed in the same quadrant on the array, tracking is defined as 20% or 0.8. TRK2 will be taken as 20%. These numbers only apply to the AMCC Q5000 Series. The timing diagram for Figure 6-5 is shown in Figure 6-7.
Minimum Library (Q5000 Series)

For the example, for a military circuit, using bipolar worst-case multipliers (AMCC Q5000 Series), the equation for the Minimum Library minimum values becomes:

0.86 * [0.9 * Tpd D1 + 0.8 * Tpd FF15 - Tpd D2] > Th FF16

The Minimum Library is specified with worst-case multipliers of 0.86 (MAX), 0.78 (TYP) and 0.70 (MIN).
Figure 6-7 Timing Diagrams
In Figure 6-7, FF15 Tpd C->Q (clock to output delay) may be 20% faster than nominal. The total effect of these two variations may be to cause the D input to FF16 to change within the hold time window.
Military Library (Q5000 Series)

The equation for the Military Library maximum value becomes:

1.45 * [0.9 * Tpd D1 + 0.8 * Tpd FF15 - Tpd D2] > Th FF16

The Military Library is specified with worst-case multipliers of 1.45 (MAX), 1.32 (TYP) and 1.19 (MIN).

The computation should be performed using the MAXIMUM (Military or Commercial, as appropriate) worst-case library and then using the MINIMUM library to determine the worst-case external hold time. The ratio of the data and clock path delays will determine which library will produce the larger (most positive number) hold time.

While success can be estimated using Front-Annotation, final calculations must be made after Back-Annotation using the actual metal delays. The layout and/or design must be changed if the appropriate test fails.

Note that tracking may be part of the adjustment factor specifications, in which case the equations would need to be altered to reflect this different specification. Check with the array vendor.
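The hold time inequality can be evaluated for both libraries with a few lines of code. The following Python sketch is a minimal illustration: the function name and the nominal delay values are hypothetical and are not Q5000 data, while the multipliers 0.86 and 1.45 and the tracking factors 0.9 and 0.8 are the ones quoted in the text.

```python
def hold_margin(wcm, tpd_d1, tpd_ff15, tpd_d2, th_ff16, trk1=0.9, trk2=0.8):
    """Return WCM * [trk1 * Tpd D1 + trk2 * Tpd FF15 - Tpd D2] - Th FF16.

    Tpd values are nominal; all arguments use the same time unit.
    A positive result means the hold time of FF16 is satisfied.
    """
    skewed = trk1 * tpd_d1 + trk2 * tpd_ff15 - tpd_d2
    return wcm * skewed - th_ff16


# Hypothetical nominal delays in ns (assumed values for illustration).
delays = dict(tpd_d1=1.0, tpd_ff15=1.5, tpd_d2=1.2, th_ff16=0.5)

for name, wcm in (("minimum library", 0.86), ("military maximum", 1.45)):
    margin = hold_margin(wcm, **delays)
    status = "OK" if margin > 0 else "HOLD VIOLATION"
    print(f"{name}: margin = {margin:.3f} ns ({status})")
```

Running both multipliers side by side mirrors the recommendation to test the maximum and the minimum library, since the ratio of the data and clock path delays decides which one is the limiting case.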
Exercises

1. Review the design manual for the array you have chosen.
   a. How is tracking specified?
   b. How are external set-up and hold-time specified?
   c. What guidelines are given to minimize the hold-time problems in circuits such as shown in Figure 6-3?
2. Compare the array chosen in exercise one with a second array (different series, different vendor).
3. Write the generic equations for external set-up and hold time for the case where the propagation delays are specified as nominal and the macro set-up and hold times are specified as worst-case.
4. Change the register case study to the macros in a library of interest. Write the hold time tracking analysis equations for this circuit. You need to know:
   ❍ how macros are specified - Tpd;
   ❍ how macros are specified - Tsu, Th;
   ❍ if worst-case multipliers or adjustment factors are to be used, what they are;
   ❍ for like and unlike edges, how the vendor specifies tracking;
   ❍ and how macro extrinsic loads (net and fan-out loads) are computed.
Power Considerations
Introduction

Power dissipation is a measure of the amount of heat that must be handled by the array, its package and the board on which they are mounted, and removed by whatever cooling method is used. Handling this heat, called thermal management, is a fundamental design requirement. Estimating the heat to be dissipated requires a worst-case maximum power dissipation computation.

The computation of the worst-case maximum power dissipation for a circuit has two major components, DC (quiescent) power and AC (dynamic) power dissipation. Each in turn is composed of several elements. While DC power dissipation can be reasonably bounded by worst-case equations based on P = I*V, AC power dissipation without hardware emulation-assist and complex software is at best a rough guess.
Cell Types Define the Power Computation

Array and cell types can be broken down into four basic categories: CMOS, bipolar, BiCMOS and super-speed bipolar (see Table 7-1):

● CMOS arrays have a significantly higher AC power dissipation figure than DC power dissipation for most cells.
● Bipolar arrays have the DC power dissipation as the principal source of heat.
● BiCMOS arrays have a combination of these two approaches, following bipolar computational procedures for I/O macros and CMOS computational procedures for core macros.
● BiCMOS arrays (in proposal stages) would use bipolar methods for any bipolar cells, interface and internal, and CMOS computations for their CMOS cell based macros.
● High-speed bipolar (over 600MHz) requires the use of both approaches, with neither DC nor AC power being considered an insignificant factor. In this case, AC power may be no more than 20% of the DC power (using the Q20000 Series guideline).
Table 7-1 Cell Type Vs. Power Component

Macro On Cell Type:       Principal Power Dissipation:
CMOS                      AC
Bipolar                   DC
BiCMOS                    AC internal, DC interface
High-speed bipolar        both AC and DC
DC Power in a Bipolar Array

DC power dissipation computation for a bipolar array is composed of interface macro DC power, internal macro DC power, overhead current DC power and ECL static output power dissipation. DC power is a function of the current dissipated, adjusted for state dependencies, junction temperature, voltage levels, ECL termination values, and any conditional geometry power-down of IOEF. Typical DC power can be computed to a given level of detail from the netlist of the circuit using:

P = I * V (one macro)

where I is the adjusted typical macro current from all current sources (IOEF and internal current) and V is the power supply. A worst-case multiplier can be used to convert typical DC power into worst-case power for a specific set of conditions. Some components of DC power are shown in Table 7-2.
Table 7-2 DC Power Computation - Bipolar

Computing DC Power
● Interface macro DC power
● Internal macro DC power
● Overhead current DC power
● ECL static DC power
● Reduction for state dependencies
● Reduction for power-down of IOEF
● Reduction for junction temperature
● Reduction for voltage levels
The array vendor may specify a typical macro current (summing internal and IOEF values) and overhead current, or a typical macro PDC term, where a power supply has been assumed and conversion factors exist to adjust for differences. In this case, additional information as to ECL I/O bias power usage will be supplied, as well as some means of handling ECL DC power due to termination current. Regardless of the variations in "specsmanship", the methods are algebraically the same, as they must be.
DC Power in a BiCMOS Array

The DC power component will be the significant factor in computing power for the interface macros in a BiCMOS array. The power computation is the same as was applied to the bipolar array, except that the computation is made for the interface macros and not all macros in the circuit. Both the power due to the current sources in the interface macros and the power due to overhead current must be computed.
Overhead Current

Overhead current (ICC and IEE) is that current dissipated by the voltage threshold generators and bias circuitry. It will be dissipated even if no macros are placed on any of the cells, i.e., it exists as soon as the chip is powered on.

For some arrays, overhead current and therefore overhead power may be specified as variable. It may depend on whether a specific type of I/O macro was used anywhere on the circuit, or on the number of such macros and their placement. Placement-dependent computations cannot occur until after place and route. Pre-placement computations must assume the worst possible conditions.

Overhead power is the overhead current times the power supply, computed as for the macro power dissipation.
DC Power in a CMOS Array

DC power in a CMOS array is due to a static DC current (transistors are ON) and a leakage DC current (transistors are OFF). There may also be an overlap current, a result of both paired transistors being ON as they change state. The total value for leakage current will be low and needs to be measured when the circuit can be configured with all devices off. Static DC current should be approximately zero, or low enough not to consider. Power dissipation due to overlap current is no more than 10% of the total power dissipation.

These figures are guidelines only. The designer should review the design manual for the specific array to be used to determine the power characteristics for that array.
TTL Outputs - CMOS Arrays

Power dissipation due to TTL outputs on a CMOS array needs to be included in the total power computation. It is a function of the duty cycle of the outputs and the sink current.

P = n * Isink * VOL

where n is the number of TTL loads, VOL is the output low voltage and Isink is the sink current for the macro. The specification may show a Pdc value instead of current for the individual macro.
ECL Static Output Power - All Arrays

ECL output macros on bipolar or BiCMOS arrays dissipate a static power based on the termination resistor value (off-chip termination). The common estimate for this power is:

P = I * V (per output)

where I is the termination current value (14 mA for a 50 ohm termination) and V = 1.3V, the voltage swing for -5.2V termination.
AC Power

The dominant source of AC power (~90%) in a CMOS device is due to the charging and discharging, i.e., switching, of the circuit capacitance. AC power due to switching is composed of interface macro AC power and internal macro AC power dissipation. These groups may in turn be broken down by macro type and switching frequency, depending on the particular array specifications.

The equation for AC switching power for a single device depends on the frequency (f) at which the logic is switching. It is based on the charging of a capacitor (C) to a voltage (V) through a P-channel device to build up a charge (CV). The energy drawn from the supply for each charge is CV^2; half is dissipated in the P-channel device during charging and the stored half is in turn discharged through the paired N-channel device.

P = f * C * V^2

A variation of the equation is possible when the vendor specifies a constant in terms of microwatts/gate-MHz. The equation can reduce to:

P = 0.20 * (a * f * G) (per class of macro)

where 0.20 is 20% devices switching, a is the power constant in mW/gate-MHz, f is the switching frequency and G is the number of gates.

All devices in a circuit will not be switching at the same time. Estimates from 18-36% devices switching can be obtained from the different vendors, based on the ratio of register elements in the circuit. Loosely, 20% or 30% devices switching is the number used by most vendors. The equation may be changed to be number of macros with an adjustment in the constant, or it may be specified using 0.30 as the percentage of devices switching. Whatever is used in the vendor documentation, the equation represents an estimate of the worst-case AC power.

All devices will not be switching at the maximum frequency. If the frequencies are clearly defined, the problem may be handled as a series of equations, one for each frequency group. In that case, the largest number of elements switching at the same time in that group would be used rather than a 20% estimate. The sum of the power computations for all groups is a worst-case estimate.
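The AC power estimates described above are simple enough to script. The Python sketch below is only an illustration: the function names and all numeric inputs are hypothetical, and the estimate is returned in whatever power unit the vendor constant a uses.

```python
def switching_power(f_hz, c_farads, v_volts):
    """P = f * C * V^2 for one switching node, in watts."""
    return f_hz * c_farads * v_volts ** 2


def estimated_ac_power(a, f_mhz, gates, fraction=0.20):
    """P = fraction * (a * f * G) for one class of macro.

    a is the vendor constant per gate-MHz; the result carries its power unit.
    """
    return fraction * a * f_mhz * gates


def grouped_ac_power(a, groups):
    """Sum a * f * G over frequency groups.

    groups is a list of (f_mhz, gates_switching) pairs, where gates_switching
    is the largest number of elements switching together in that group.
    """
    return sum(a * f_mhz * gates for f_mhz, gates in groups)


# Hypothetical inputs: 50 MHz, 1 pF node, 5 V swing.
print(switching_power(50e6, 1e-12, 5.0))
# Hypothetical constant of 0.01 per gate-MHz, 10,000 gates, 20% switching.
print(estimated_ac_power(0.01, 50, 10_000))
# Two hypothetical frequency groups instead of a flat 20% estimate.
print(grouped_ac_power(0.01, [(50, 1_500), (25, 3_000)]))
```

The grouped form replaces the flat percentage with explicit counts, which is only meaningful when the switching frequencies of the groups are clearly defined.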
The factors in the AC power computation are shown in Table 7-3.

Table 7-3 Components Of AC Power Computation

Components of AC Power
● Percent elements switching
● Maximum switching frequency
● Capacitance
● Constant in mW/gate-MHz
CMOS circuits use AC power computation for internal and interface macros. BiCMOS circuits use AC power for their internal macros. The high-speed bipolar circuits will require that an AC power component be computed for their internal and some of their interface macros. Interface macro AC power computations may require input macro and output macro contributions to be computed separately, depending on the methods used by the vendor to specify them. Macros contributing to AC power are listed in Table 7-4.
Table 7-4 AC Power Computation

AC Power Computation
● interface macro AC power
● input macros
● output macros
● internal macro AC power
Hardware Assist

While it is relatively easy to compute the switching frequency of the interface macros, allowing a realistic value to be computed for their AC power component, computation of internal macro AC power dissipation is more difficult. Some hardware emulation systems such as IKOS and ZYCAD are providing support for internal macro switching frequency computation and the quality and quantity of the support should expand over the next few years.
Benchmark Required

Hardware-assisted AC power computations depend on the accuracy of the at-speed vector set. The at-speed vectors must correctly benchmark the expected lifetime behavior of the circuit for these systems to provide accurate AC power computation support. If the vector set is not accurate, i.e., if the switching patterns are not a correct representation of the expected behavior of the circuit in use, the results can be considered to be no more accurate than the results produced from the estimation equations.
Worst-case Power

The worst-case maximum power dissipation is required to allow the computation of the worst-case maximum junction temperature. The resulting computation of the junction temperature may allow a reduction in the worst-case IEE current, reducing the DC power. This may in turn affect the junction temperature. The computation of the junction temperature is an iterative process.

The worst-case power as related to junction temperature is used to examine the packaging and the heat-sink requirements of the final product. The worst-case maximum junction temperature is used to evaluate package selection and to make cooling decisions such as whether to use a heatsink and what rate of airflow is required.

IEE <==> DC POWER <==> Junction Temperature <==> IEE

Again, this is an iterative process. The package selected will determine the thermal coefficients which affect the choices of heatsinks. Airflow alters the effective thermal coefficients. All of these items affect the final junction temperature.

Figure 7-1 Iterative Power Interactions

package, heatsink, airflow <-----------> power dissipation <-----------> junction temperature
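The iterative relationship between IEE, DC power and junction temperature can be sketched as a fixed-point loop. The Python example below rests on loudly stated assumptions: it uses the standard thermal relation Tj = Ta + theta_ja * P, and the temperature dependence of IEE (the function hypothetical_iee) is invented purely for illustration; a real design would take that dependence, the thermal coefficient and the currents from the vendor's design manual.

```python
def solve_junction_temperature(t_ambient, theta_ja, v_supply, iee_at,
                               other_power=0.0, tol=0.01, max_iter=100):
    """Iterate IEE -> DC power -> junction temperature until it settles.

    t_ambient in deg C, theta_ja in deg C/W, v_supply in volts,
    iee_at(tj) returns the supply current in amps at junction temperature tj.
    Returns (junction temperature, power in watts).
    """
    tj = t_ambient
    for _ in range(max_iter):
        power = abs(v_supply) * iee_at(tj) + other_power
        new_tj = t_ambient + theta_ja * power
        if abs(new_tj - tj) < tol:
            return new_tj, power
        tj = new_tj
    raise RuntimeError("junction temperature iteration did not converge")


def hypothetical_iee(tj):
    """Assumed model: 0.5 A at 25 deg C, changing 0.1% per deg C."""
    return 0.5 * (1 + 0.001 * (tj - 25.0))


# Hypothetical package: 20 deg C/W in a 70 deg C ambient, -5.2 V supply.
tj, power = solve_junction_temperature(70.0, 20.0, -5.2, hypothetical_iee)
print(f"Tj = {tj:.1f} C, P = {power:.2f} W")
```

Changing the package, heatsink or airflow corresponds to changing theta_ja and rerunning the loop, which is the interaction summarized in Figure 7-1.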
Adjustment Multiplier for DC Power

If the vendor specified power for the individual macros, then the typical power listed with each macro assumes a certain set of conditions including supply voltage. When a different supply is used, an adjustment to the typical power must be made. The default supply voltage is -5.2V in an AMCC ECL array. In this case the power supply adjustment factors would be as shown in Table 7-5.

Table 7-5 DC Power Voltage Adjustment Factor

Factor    Type        Supply
1.00      ECL 10K     -5.2V
0.96      TTL         +5.0V
0.88      ECL 100K    -4.5V
Some vendors specify an adjustment factor to be used to compute worst-case DC power dissipation from the typical DC power dissipation. For ECL and BiCMOS arrays, the range is 0.6 to 0.7 times typical to find an estimate of minimum power and 1.3 to 1.54 times typical to find maximum worst-case power. This adjustment factor is also called a worst-case multiplier. It may vary between commercial and military grade circuits.

The worst-case multiplier is used with macros that are specified with typical current and with macros that are specified with typical power. When typical current has been specified, this adjustment factor may be called a worst-case current multiplier.

Table 7-6 DC Power Worst-Case Adjustment Factors - Worst Case Multiplier

DC Power Worst-Case Adjustment Factors
Worst Case Multiplier
0.6-0.70    MINIMUM
1.4-1.54    MAXIMUM
These multipliers may be expected to decrease as newer and cooler arrays are developed.
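The two adjustments can be applied in sequence to a typical macro power figure. The Python sketch below uses the factors from Table 7-5; the function name, the dictionary layout and the 2.0 W typical power are hypothetical, and chaining the voltage factor with a worst-case multiplier as a simple product is an assumption about how a given vendor intends the factors to be combined.

```python
# Supply voltage adjustment factors from Table 7-5 (default supply -5.2V).
VOLTAGE_FACTOR = {"ECL 10K": 1.00, "TTL": 0.96, "ECL 100K": 0.88}


def adjusted_dc_power(typical_power, supply_type, worst_case_multiplier):
    """Scale typical DC power for supply type, then apply the multiplier."""
    return typical_power * VOLTAGE_FACTOR[supply_type] * worst_case_multiplier


# Hypothetical 2.0 W typical power, ECL 100K supply, multiplier of 1.4.
print(adjusted_dc_power(2.0, "ECL 100K", 1.4))
# The same circuit with a minimum-power multiplier of 0.7.
print(adjusted_dc_power(2.0, "ECL 100K", 0.7))
```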
Checking with the Vendor

When a designer is evaluating power while selecting an array, the design manual for that array must be reviewed for the items listed in Table 7-7.
Example 1

AMCC specifies macros with a typical current. To compute power for a Q5000 Bipolar Series based mixed-mode ECL 10K/TTL military circuit, perform the following steps.

● The sum of all IEE and ICC currents, including overhead, is computed.
● The currents are multiplied by 1.4, the worst-case current multiplier for commercial and military circuits. [AMCC value]
● The voltage sources of -5.2V and +5V are allowed to vary ±5% for a commercial circuit and ±10% for a military circuit. For the military circuit, the -5.2V supply would be -5.72V worst-case maximum and the +5V TTL supply would be 5.5V.
● A separate computation is made for terminated ECL outputs. If all outputs are standard 50 ohm terminations, PECL = 1.3 * 14 * n (in mW), where n is the number of such outputs.
● Add PEE, PCC and PECL to find the worst-case maximum DC power dissipated by the circuit.
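The steps of Example 1 can be sketched as a single function. In the Python example below the multiplier 1.4, the ±10% military supply tolerance and the 1.3 V * 14 mA ECL term come from the text; the function name and the current totals passed in are hypothetical. Currents are in mA and voltages in V, so the result is in mW.

```python
def example1_worst_case_dc_power_mw(iee_ma, icc_ma, n_ecl_outputs,
                                    current_multiplier=1.4,
                                    supply_tolerance=0.10):
    """Worst-case maximum DC power for a mixed-mode ECL 10K/TTL circuit.

    iee_ma and icc_ma are the summed typical currents, including overhead.
    """
    vee = 5.2 * (1 + supply_tolerance)   # magnitude of the -5.2V supply
    vcc = 5.0 * (1 + supply_tolerance)   # +5V TTL supply
    p_ee = iee_ma * current_multiplier * vee
    p_cc = icc_ma * current_multiplier * vcc
    p_ecl = 1.3 * 14 * n_ecl_outputs     # 50 ohm terminated ECL outputs
    return p_ee + p_cc + p_ecl


# Hypothetical totals: 400 mA IEE, 100 mA ICC, 20 terminated ECL outputs.
print(example1_worst_case_dc_power_mw(400, 100, 20))
```

Setting supply_tolerance to 0.05 gives the corresponding commercial figure.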
Example 2

Raytheon specifies power components for its macros. After adjusting for the correct voltage, the typical DC power is the sum of the DC power components of the internal macros, I/O macros and bias cells (overhead). A worst-case multiplier of 1.4 is used to find the worst-case maximum power dissipated by the circuit.
Table 7-7 Topics For Review

Review Topics
● What is specified for a macro?
   ❍ current
      ■ typical
      ■ worst-case
   ❍ power
      ■ AC
      ■ DC
   ❍ adjustments
      ■ for voltage variation
      ■ for temperature variation
      ■ for process variation
      ■ for different operating environments
● How are bias circuits (overhead circuits) to be handled?
● How is power to be worst-cased?
   ❍ current multiplier?
   ❍ voltage multiplier?
   ❍ power multiplier?
● What about TTL outputs on a CMOS array?
● Refinements to this include:
   ❍ Power-down of terminated outputs
   ❍ Three-state output current variations
   ❍ Two-state output current variations
   ❍ Junction temperature effects on IEE
   ❍ Placement-sensitive overhead or bias current
Example 3

AMCC specifies typical current for the Q20000 high-speed bipolar series. In this case, there are different worst-case current multipliers, one for internal macro current, one for interface macro current and one for the bias or overhead current.
Example 4

Several BiCMOS vendors specify an overhead current that is placement dependent or at least usage dependent. The overhead current will have some basic value that is always present, as when TTL can be on the array with ECL, and there will be a variable component. The amount of the variable component depends on how many ECL inputs and ECL outputs are used by the circuit, and may depend on where on the array perimeter these macros are placed.

Before placement, the designer should estimate this variable as the worst it could be and make thermal decisions using this value. Refinements in power reduction would include a post-layout review of the overhead current component.
Power Reduction Techniques

Regardless of the array technology, items whose use will increase the power dissipated by the array should be carefully chosen. A tradeoff or balance of different design objectives should reflect judicious selections that maintain speed while keeping power dissipation to a minimum and circuit size within the array constraints. Power considerations are no less serious for the large CMOS and BiCMOS arrays.

Table 7-8 summarizes the design choices that contribute to higher power; which choices are possible depends on the array series.

Table 7-8 Power Dissipation Contributors

Contributors to Power Dissipation
● High-speed macro options: propagation delay - faster
● High fan-out macro options: low distortion; propagation delay - may be faster
● Driver macros (clock drivers): chip-efficient; low distortion; propagation delay - may be faster
● Multi-cell macros: chip-efficient; lower cell count; propagation delay - may be faster
● Unused outputs: -no contribution-
● Powered-down unused outputs: -no contribution-
● Unused inputs: -no contribution-
Table 7-9 summarizes the choices that can be made to reduce power and the design tradeoffs that these may require.

Table 7-9 Low Power Options

Low Power Dissipation                                  Tradeoff
Low-power macro options                                higher propagation delay; distortionless drive (maybe); higher distortion drive (maybe)
Placement of I/O to reduce bias or overhead current    less flexibility in placement; less performance (maybe)
The Macros and Their Options

For the bipolar arrays and for the interface portion of the BiCMOS arrays, macro selection plays a major role in the final power dissipated by the circuit. Macros that are fast, have high fan-out drive capability or are dense (read chip-efficient) are high-current macros.

The macro library should be reviewed for the existence of options or versions. A macro library may have low-power versions or options of its macros. (Note: AMCC CMOS macros have no options.) Once the design is blocked and the macros selected, the timing requirements should be reviewed and the macros adjusted by option when options are available and can be applied.

As discussed during the chapter on timing, low-power macros are slower than the standard or high-speed options. Low-power macros may have a lower fan-out load limit. Low-power macros may have a lower maximum frequency of operation and narrower minimum pulse width. None of these variations are absolute, i.e., they vary with the array.
Macro Option Examples

The AMCC Q5000 Bipolar Series macro library has options for most of its internal macros and many of its interface macros.

● The low-power L-option macros have a lower drive (4 versus 9 loads) and are slower than the standard S-option macros they can replace.
● High-speed H-option macros have the same drive as the S-option macros but are faster and use more current.
● Driver macros (macros with a high fan-out limit) have no options.
● The super-driver macros (with a 25 load limit) average approximately 5 mA of worst-case current versus the 2.5 mA average current of the 15-load driver macros.
The AMCC Q20000 Bipolar Series macro library also contains L-, S-, and H-options. In this series, all options have the same fan-out load limit, a function of the Turbo output feature. What is different is that some H-options use more cells than the L- and S-options. All other considerations of differences between the options remain the same.
Design Rules for Macro Options

● Always review a library to determine:
   ❍ if it has options or versions of its macros,
   ❍ which macros do and don't have options or versions,
   ❍ what macros are similar with minor variants to other macros, and
   ❍ the different limitations on the applicability of the options, versions and variants.
● During design and when available, H-option macros, high fan-out drivers and other high-current macros should be used judiciously to avoid unnecessary high current - high power dissipation.
● The use of L-option macros when available can help balance the use of high-current macros, providing speed - power programmability if and only if the minimum pulse widths are not violated.
● Select macro options and versions carefully! Double-check time and power impacts of your choices.
Power-Down and Conditional Geometry - Bipolar

If a bipolar macro library is implemented with conditional geometry, macro outputs that are not used (are terminated) have their IOEF current sources shut down. Only one version of the macro needs to be supplied in these libraries. Note that not all macros in the library allow this feature.
Terminated Outputs - Bipolar

When power-down is not allowed, the macros in the bipolar library may be available in different versions. For example, three macros may perform the same function with one supplying a non-inverted Y output, one supplying an inverted YN output and one supplying both Y and YN. The macro version selected should reflect output usage.

One or two macros that are not a perfect match may not be a problem in the design, but dozens of terminated yet powered-up outputs can be expensive because of the power supply and packaging options required by the higher-than-necessary power.
Terminated Outputs - BiCMOS, CMOS

CMOS and internal BiCMOS macros may have their power computation based on the number of outputs switching (vendor-dependent equation). Both used and terminated outputs are counted in the computation. Therefore, for BiCMOS and CMOS arrays, the objective is to minimize the number of terminated outputs. This objective is no different from the objective for bipolar arrays.
Inverted Outputs for Distortion Management

The need for signal inversion in high-speed paths for skew and pulse distortion control may require YN (inverted output). However, there may be a different speed associated with each of the output polarities. One design objective is to maintain the timing considerations of both speed and distortion management while managing power. Power considerations require that macros exist in a library in different versions to allow the flexibility of inversion with no additional cost in power.
Examples

● The AMCC Q5000 Series macros feature power down for over 90% of the macros.
● The AMCC Q14000 and Q24000 BiCMOS Series interface macros were not implemented with conditional geometry.
● The AMCC Q20000 Series macros have no conditional geometry due to use of a completely different process and technology.
● The BiCMOS and high-speed bipolar libraries supply different versions of various macros to allow for optimum selection.
Design Rules for Power Reduction (AMCC)

● Check the library for either power-down of unused outputs or the existence of different output pin versions of the macros. Examine both power and speed differences of these versions.
● Minimize the number of terminated outputs.
● When a version of a macro is absent from the library, the array vendor may be able to supply a custom macro or an alternate design solution. Do not hesitate to consult the array vendor.
Unused Inputs - Bipolar, BiCMOS

Macros with unused inputs may be dissipating more current than is required for the function being performed. When too many macro inputs are grounded (global ground) or clipped to VDD or VSS, a check should be made to see if another macro could be used that more closely reflects the circuit requirements.
MSI And High-Functionality Macros

Dense bipolar macros, as represented by the AMCC MSI macros and any complex chip-efficient macro in any other libraries, represent high power. They concentrate operations in a small chip area. Their placement in a bipolar array may need to follow row and quadrant current limits. In their favor, their use reduces the number of macros needed to implement a function, but the overall current usage may or may not be reduced.

BiCMOS MSI macros are generally spread out over a larger number of cells than their bipolar counterparts and do not require the same power considerations.

Driver macros on bipolar arrays supply high-drive capability in a small chip area and have the same potential restrictions as for the bipolar MSI macros. The bipolar array may have internal current limits that need to be honored. Drivers reduce the number of other macros that would be required but their use may increase the overall power required by the circuit. High-fan-out drivers were specifically developed for use in clock distribution schemes.
State Dependent Current - Bipolar For some bipolar interface macros, as defined in the macro library documentation, the current is state-dependent. The ICC and IEE values for many of the interface macros are specified for HIGH and LOW input. Bidirectionals are specified for enabled and disabled states To compute actual power for any given state of the circuit, the state and operating duty cycle of each I/O would have to be known, requiring a detailed vector analysis
Estimating State-Dependent Current
Without specific and unique operating duty cycles, the following procedure is recommended:
● For 2-state outputs (HIGH, LOW), calculate IEE and ICC as the average of the two values (50% HIGH, 50% LOW).
● For 3-state outputs (HIGH, LOW, Z or INPUT), calculate as 50% disabled high-impedance Z state or input mode, 25% enabled HIGH and 25% enabled LOW.
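The two weighting rules above can be written out as a short sketch. The function names and the sample currents are hypothetical and only illustrate the 50/50 and 50/25/25 weighting; real values come from the macro library documentation.

```python
def two_state_current(i_high_ma, i_low_ma):
    """Estimated current for a 2-state output: 50% HIGH, 50% LOW."""
    return 0.5 * i_high_ma + 0.5 * i_low_ma


def three_state_current(i_high_ma, i_low_ma, i_disabled_ma):
    """Estimated current for a 3-state output:
    50% disabled (Z or input mode), 25% enabled HIGH, 25% enabled LOW."""
    return 0.5 * i_disabled_ma + 0.25 * i_high_ma + 0.25 * i_low_ma


# Hypothetical macro currents in mA (not taken from any data sheet)
print(two_state_current(8.0, 4.0))         # 6.0
print(three_state_current(8.0, 4.0, 2.0))  # 4.0
```

The same helper would be applied separately to the IEE and ICC values of a macro, since the two currents are kept separate throughout the power computation.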
Usage-Dependent Overhead (Bias) Power
Several arrays specify overhead power dissipation as a function of the number of ECL interface macros used, without reference to placement. A placement restriction is implied in such calculations.

Placement-Dependent Overhead Current
For those arrays with placement-dependent overhead current, the knowledge that a designer might influence power by careful placement is dangerous. Placement is driven by speed performance requirements and array placement restrictions. Only after these criteria are satisfied can any approach toward reducing overhead current be attempted, and then it will be heavily restricted.
As a rule, the differences between the maximum worst-case overhead current and the minimum do not warrant the violation of the performance criteria.
Example
The AMCC Q24000B BiCMOS array has placement-dependent overhead current. In the 100% ECL mode of operation, with a supply of -5.2V, a military circuit would have 1.14W due to overhead. Of this, 40% is due to a fixed component and 60% is due to the variable component of overhead. If the array has no ECL interface macros, the minimum value can be applied. The breakdown for this one array is shown in Table 7-10.

Table 7-10 Components Of Overhead Current
Description            Value     Generator Name
fixed component        51.0 mA   -
ECL inputs             9.6 mA    VTA10K or VTA100K
ECL inputs + outputs   72.0 mA   VRB

The numbers are the same for an ECL 100K array, whether or not both ECL 10K and ECL 100K outputs are used. Either ECL 10KH or ECL 10K compatibility is assumed.
Computing DC Power Dissipation To compute the worst-case DC power for a non-CMOS circuit, perform the following steps. There will be some variation in the complexity of the steps between vendors, depending on the method used to specify current or power dissipation for individual macros.
Steps To Compute Maximum Worst-Case DC Power
The following example methodology assumes that typical current is specified. It applies to any power-supply configuration.
● Sum all individual interface macro currents and multiply by the interface macro worst-case current multiplier. Keep IEE and ICC separate.
● Sum all individual internal macro currents and multiply by the internal macro worst-case current multiplier. Keep IEE and ICC separate. Skip this step for the BiCMOS arrays.
● Find the IEE and ICC overhead currents and multiply them by the worst-case overhead current multiplier. Keep IEE and ICC separate.
● Add all IEE currents together.
● Add all ICC currents together.
● Find the worst-case VCC and VEE voltages.
● Multiply IEE * VEE.
● Multiply ICC * VCC.
● Compute ECL static power dissipation: 1.3 * termination current * number of outputs. Adjust the equation when the standard assumptions are not met.
● Add all items together. This is the worst-case maximum DC power dissipated by the circuit if it is bipolar. It is the interface macro worst-case maximum DC power if the circuit is for the BiCMOS arrays.
Reduction for single-supply circuits
There is an obvious reduction in the complexity of the steps if the circuit uses only one power supply. Under this condition, IEE becomes ICC when a +5V reference circuit is being analyzed. There is no ICC in a -5.2V or -4.5V single-supply circuit.

Table 7-11 Currents Present By I/O Mode
Power Supply        Technology   Compute IEE   Compute ICC
SINGLE, +5V         100% TTL     -             X
SINGLE, +5V         100% ECL     -             X
SINGLE, +5V         ECL/TTL      -             X
SINGLE, -5.2V       100% ECL     X             -
SINGLE, -4.5V       100% ECL     X             -
DUAL, +5V -5.2V     100% ECL     X             -
DUAL, +5V -5.2V     ECL/TTL      X             X
DUAL, +5V -4.5V     100% ECL     X             -
DUAL, +5V -4.5V     ECL/TTL      X             X

The unusual dual-supply 100% ECL circuits (DECL) shown in Table 7-11 are required for minimum cell 25 ohm terminations (see the AMCC Q20000 Series Darlingtons).
CMOS and BiCMOS Arrays
BiCMOS arrays simplify the computation, since DC power is computed only for the bipolar interface macros. Note that BiCMOS arrays still have an overhead current.
CMOS arrays emphasize AC power components and do not usually have DC components listed.

When Power is Specified Rather than Current
Another variation in the computational method occurs when the macros are individually specified with a power dissipation, i.e., use this macro, dissipate this much power. The worst-case multiplier and the worst-case voltage may or may not be accounted for in the computation. Only addition is required to compute the macro power dissipation.
An overhead component for the bias circuitry may need to be computed and added to the macro sum. If typical power is specified, a worst-case multiplier or adjustment factor may be provided as previously discussed.
Design Rules when Power is to be Estimated
● Before trying to estimate power, verify that the specifications made for the macros are clearly understood.
● Review the overhead (bias) current and how it is handled by the array.
● Review the procedures specified by the particular array vendor for that array series, since procedures may vary from series to series with the same vendor.
● Review other vendor-identified power dissipaters - TTL IOEF, ECL static output.
AC Macro Power Dissipation
To compute the AC power for any array, perform the computation shown earlier and repeated here:
PAC = 0.20 * ( a * f * G )
where
● a is the power constant in microwatts/gate-MHz,
● f is the switching frequency,
● G is the number of gates.
There will be some variation in the form of the equation depending on the use of gates or macros or outputs as the sizing measure. There will be variation depending on the use of a register ratio and there will be variation in the types of macros for which AC power will be computed.
Case Study: DC Power Computation
The Q20000 Series DC power methodology is detailed below to allow sample computations and to provide some reference for analysis of other arrays. The method can be modified to apply to all AMCC arrays, independent of their technology, and to apply to other vendor arrays.

Steps Required To Compute DC Power - Example Array Series

Step 1: Build the Macro Occurrence Table
A macro occurrence table can be constructed before design capture to assist in the manual computation of interface current and power dissipation. A macro occurrence table should provide the data necessary to compute the worst-case DC power dissipated by the circuit. (This table approach can be used with macros that specify current or macros that specify power.) The table columns are:

Macro Name   Number of Occurrences   Macro ICC   Macro IEE   Total ICC   Total IEE

AMCC MacroMatrix ERC power computations are done with the worst-state current value, always assuming the worst "state" for those macros specified with HIGH and LOW or ENABLED and DISABLED current. The power dissipation value computed will be conservative. A manual computation can be made to adjust the current used for these state-dependent macros.
Identify those macros where some assumption must be made on the value of the current to be used in the table. They will need to be identified during design submission.
Step 2: Find the Total Interface Macro Current
The macro current for individual interface macros is multiplied by the number of occurrences to determine the total typical current resulting from the use of that macro in the circuit. Change the labels if power has been specified.
Compute the total current used by all occurrences of each interface macro and sum these totals to find the total IEE and ICC interface current:
IinterfaceCC = SUM INTERFACE MACROS ICC CURRENT
IinterfaceEE = SUM INTERFACE MACROS IEE CURRENT
Unless the circuit uses a single power supply, these two sums must be kept separate.

Step 3: Compute the Worst-Case Interface Current
Multiply both sums by the worst-case current multiplier (WCCM1) for the interface macros to obtain the worst-case current due to interface macros:
IinterfaceCCwc = WCCM1 * IinterfaceCC
IinterfaceEEwc = WCCM1 * IinterfaceEE
The Q20000 Series uses three worst-case current multipliers, one for core macros, one for interface macros and one for overhead current. Other arrays may be specified with one multiplier or may vary the multiplier or adjustment factor based on the product grade.

Table 7-12 Example Worst-Case Current Multipliers
Interface   Internal   Overhead
WCCM1       WCCM2      WCCM3
1.30        1.25       1.25
Step 4: Compute the Total Typical Internal Macro Current
The macro current for individual internal macros is multiplied by the number of occurrences to determine the total for the use of that macro in the circuit. Change the labels if power has been specified.
Compute the total current used by all occurrences of each internal macro and sum these totals to find:
IinternalEE = SUM INTERNAL MACROS IEE CURRENT
IinternalCC = SUM INTERNAL MACROS ICC CURRENT
Unless the circuit uses a single power supply, these two sums must be kept separate.

Step 5: Compute the Worst-Case Internal Current
Multiply both sums by the worst-case current multiplier for the internal macros (WCCM2) to obtain the worst-case current due to internal macros:
IinternalCCwc = WCCM2 * IinternalCC
IinternalEEwc = WCCM2 * IinternalEE
Step 6: Compute the Total Typical Overhead Current
Compute the total overhead current used by the circuit:
IoverheadEE = Table entry for array and I/O mode
IoverheadCC = Table entry for array and I/O mode
Unless the circuit uses a single power supply, these two terms must be kept separate. The vendor may specify a computation based on number of inputs or outputs of a given type. AMCC specifies a typical overhead current by array, by I/O mode (ECL-TTL or mixed) and power supply.
Note: The DECL and the mixed ECL/TTL dual supply arrays dissipate the same overhead current. In the case of the DECL circuit, some of the macros that require the use of two power supplies will dissipate an ICC current component.

Table 7-13 Q20000 Series Example Typical Overhead Currents
ARRAY    ECL Mode   ECL Mode "DECL" Dual Supply   ECL/TTL Mixed Mode
         IEE, mA    IEE/ICC, mA                   IEE/ICC, mA
Q20160   400        400/40                        400/40
Q20080   180        180/26                        180/26
Q20045   126        126/26                        126/26
Q20025   126        126/26                        126/26
Q20010   67         67/13                         67/13
Step 7: Compute the Worst-Case Overhead Current
Multiply both the IEE and ICC overhead currents by the worst-case current multiplier for the overhead current (WCCM3) to obtain the worst-case overhead currents:
IoverheadCCwc = WCCM3 * IoverheadCC
IoverheadEEwc = WCCM3 * IoverheadEE

Step 8: Sum the ICC and IEE Currents
Sum the internal macro, interface macro, and overhead worst-case currents, keeping ICC and IEE separate, to find the total IEE and ICC worst-case currents:
ICCwc = IinternalCCwc + IinterfaceCCwc + IoverheadCCwc
IEEwc = IinternalEEwc + IinterfaceEEwc + IoverheadEEwc
Step 9: Multiply by the Worst-Case Voltage
The worst-case voltage is dependent on whether the circuit is commercial or military and on the specified allowed power supply variations. The array data sheet carries this information. The typical variation is shown below.
For commercial circuits with a -5.2V or a +5V supply, the voltage variation is usually ±5%. For commercial circuits using VEE = -4.5V, the variation is ±7%. For military circuits, the voltage variation is usually ±10%.
Note: The worst-case voltage for the -4.5V supply is as listed on the Q20000 data sheet, where the -4.5V supply varies ±7%. The worst-case voltage is taken as -4.8V for military or commercial circuits.

Table 7-14 Example Worst-Case Voltages
Nominal   Commercial   Military
+5.0V     +5.25V       +5.5V
-5.2V     -5.46V       -5.72V
-4.5V     -4.8V        -4.8V
Multiply the worst-case DC current by the appropriate worst-case voltage: PEEDC = IEEwc * VEEwc PCCDC = ICCwc * VCCwc
This product is the worst-case DC power due to the macros on the array.
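As a rough illustration of Step 9, the +5V and -5.2V rows of Table 7-14 can be reproduced from the nominal supply and its tolerance. The helper names below are hypothetical, and the sketch assumes the voltage magnitude is what enters the power product. The -4.5V row is not derived this way; the text takes -4.8V directly from the data sheet.

```python
def worst_case_voltage(nominal_v, tolerance):
    """Supply magnitude (volts) at the high end of its tolerance band."""
    return round(abs(nominal_v) * (1.0 + tolerance), 2)


def rail_power_watts(i_wc_ma, v_wc):
    """Worst-case DC power for one rail: current in mA times |V|, in watts."""
    return i_wc_ma * abs(v_wc) / 1000.0


print(worst_case_voltage(5.0, 0.05))   # commercial +5V supply -> 5.25
print(worst_case_voltage(-5.2, 0.10))  # military -5.2V supply -> 5.72

# Hypothetical worst-case IEE of 1000 mA on a military -5.2V supply
print(f"{rail_power_watts(1000.0, -5.72):.2f} W")  # 5.72 W
```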
Step 10: Determine ECL Static Power PEO
The equation used by the AMCC MacroMatrix ERC software to compute ECL static power dissipation for ECL outputs is:
PEO = XXmA * 1.3V * NUMBER_OF_ECL_OUTPUTS
where XX is the current based on the termination.
The 1.3V term represents the average between VOH and VOL. This is considered to be the statistical worst case for this function.
If there is more than one termination, the power for each type of termination is computed and summed to find the total PEO.

ECL Output Termination Current
Table 7-15 provides the ECL output termination currents used by the AMCC MacroMatrix ERC software. The currents shown are the average current (average of IOH and IOL).
AMCC automated ECL static power is computed with the 50 or 25 ohm termination under the assumption of 50% terminations active. This is because AMCC provides macros that drive 50 ohm loads and macros that drive 25 ohm loads. These assumptions may vary with the vendor.

Table 7-15 ECL Termination Current
ohm    mA *
25     28.0
50     14.0
100    7.0
200    3.5
* the average current (average of IOH and IOL) for termination to -2V
* When 50 ohm or 25 ohm Terminations are Not Used
Resistive loads that differ from those used by the automation software will require manual computation. These loads should be identified when performing design submission to any vendor. If other ECL output load resistances are used, the actual current value must be computed for use in the ECL static power equation. For a -2V termination, to find the average current in mA, use the following equation:
I in mA = 0.7 / ( R * 10^-3 ) for any R

* When VT ≠ -2V
For other termination voltages, an adjustment to the power dissipation computation must be made by the designer. For a termination voltage VT, to find the average current in mA, use the following equation:
I in mA = ( -1.3V - VT ) / ( R * 10^-3 ) for any R

* Darlington ECL Outputs
Darlington ECL outputs are treated as standard ECL outputs for static power computations.
* On-Chip Series Termination There is no IOEF output current for on-chip series termination ECL output macros. All current (power) dissipated is specified in the macro documentation.
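The termination-current equation and the PEO equation of Step 10 can be checked against Table 7-15 with a short sketch. The function names are hypothetical. The sketch assumes the 1.3V term in PEO is unchanged when VT differs from -2V, which the text does not state explicitly.

```python
def termination_current_ma(r_ohms, vt_volts=-2.0):
    """Average ECL output current in mA: (-1.3 V - VT) / (R * 10^-3)."""
    return (-1.3 - vt_volts) / (r_ohms * 1e-3)


def ecl_static_power_watts(r_ohms, n_outputs, vt_volts=-2.0):
    """PEO = I (mA) * 1.3 V * number of ECL outputs, converted from mW to W."""
    i_ma = termination_current_ma(r_ohms, vt_volts)
    return i_ma * 1.3 * n_outputs / 1000.0


# With VT = -2V these reproduce the Table 7-15 entries
for r in (25, 50, 100, 200):
    print(f"{r} ohm: {termination_current_ma(r):.1f} mA")

# Ten outputs (a hypothetical count) into 50 ohm loads
print(f"{ecl_static_power_watts(50, 10):.3f} W")  # 0.182 W
```

When a circuit mixes terminations, the function would be called once per termination type and the results summed, as Step 10 describes.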
Step 11: Sum the Result - Total DC Power
Sum the results of the macro and overhead current power computations with any ECL output macro static power dissipation to obtain the total worst-case DC power dissipation for the circuit:
PdDC = PCCDC + PEEDC + PEO
The result is the total worst-case DC power dissipated by the circuit on the target array.
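Steps 1 through 11 can be sketched end to end for the single-supply case, where only one current rail exists. This is a minimal illustration and not the MacroMatrix ERC implementation: the function names and the sample macro counts and currents are hypothetical, while the multipliers come from Table 7-12, the overhead current from Table 7-13 (Q20010, ECL mode), the voltage from Table 7-14 and the 50 ohm termination current from Table 7-15.

```python
WCCM_INTERFACE = 1.30  # WCCM1, Table 7-12
WCCM_INTERNAL = 1.25   # WCCM2, Table 7-12
WCCM_OVERHEAD = 1.25   # WCCM3, Table 7-12


def total_typical_ma(occurrences):
    """Sum count * typical current over (count, mA per macro) pairs."""
    return sum(count * ma for count, ma in occurrences)


def dc_power_watts(interface, internal, overhead_ma, v_wc,
                   ecl_outputs=0, termination_ma=14.0):
    """Worst-case DC power for a single-supply circuit (one current rail)."""
    # Steps 2-8: apply each multiplier to its own sum, then add the currents
    i_wc_ma = (WCCM_INTERFACE * total_typical_ma(interface)
               + WCCM_INTERNAL * total_typical_ma(internal)
               + WCCM_OVERHEAD * overhead_ma)
    # Step 9: multiply by the worst-case voltage magnitude (mA * V = mW)
    p_macros_mw = i_wc_ma * abs(v_wc)
    # Step 10: ECL static power, PEO = I * 1.3 V * number of outputs
    p_eo_mw = termination_ma * 1.3 * ecl_outputs
    # Step 11: sum and convert mW to W
    return (p_macros_mw + p_eo_mw) / 1000.0


# Hypothetical circuit: 5 interface macros at 2.0 mA, 10 internal macros
# at 1.0 mA, Q20010 overhead of 67 mA, military -5.2V supply (-5.72V),
# 4 ECL outputs terminated with 50 ohm to -2V.
p = dc_power_watts([(5, 2.0)], [(10, 1.0)], overhead_ma=67.0,
                   v_wc=-5.72, ecl_outputs=4)
print(f"{p:.3f} W")  # 0.698 W
```

For a dual-supply circuit the IEE and ICC currents would have to be carried separately and each multiplied by its own worst-case voltage before the final sum.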
Exercises
1. Determine how current or power is specified for an array series of interest. Is it worst-case? If not, find out how the data is converted to worst-case for Commercial and for Military operating conditions.
2. Determine how overhead current is accounted for by this array series. Is it variable? If so, determine what affects it, e.g., number of I/O by type or their placement.
3. What other adjustments are required before a final DC power value can be obtained? If the manual is not specific, consult the vendor.
4. AMCC Q20000 Exercise. A 100% ECL circuit has been designed using the Q20080 (8K) array. It uses the following example values:

Macro Name   Number Used   Current: IEE mA
FF10S        32            0.98
FF46S        32            0.93
GT09S        0             0.39
GT55D        4             0.91
GT60L        6             0.64
GT60S        6             0.78
GT87D        2             0.49
IE31H        1             2.20
IE93S        82            1.03
IEVCC        9             0.00
MX21S        20            0.66
OE42S        69            8.72

Typical current is specified for one occurrence of each macro. The power supply is nominal -5.2V. The circuit is military. Compute its DC power dissipation given the above portion of the macro occurrence table and the other tables provided in the text. [Ans.: 8.32 Watts (1994 data)]
5. Substitute 32 OE14S macros that use 6.69 mA each for 64 of the OE42S macros and delete one of the GT87D macros. Now how much power is dissipated?
Case Study: AC Power Computation The Q20000 Series AC power methodology [as of 1994] is detailed below to allow sample computations and to provide some reference for analysis of other arrays.
Step 1: Build the Macro Occurrence Table
A macro occurrence worksheet for interface and internal logic macros can be constructed before design capture to assist in the computation of total circuit power dissipation. A macro occurrence worksheet for the interface and internal logic should provide:
● A list of the different macros differentiated by option;
● The number of times each macro appears on the schematics;
● The number of outputs the macro has.
The macros are differentiated by option if there is a difference in the conversion constant. The number of outputs is required if that is the sizing measure used by the vendor. If the number of gates rather than the number of macros is required, a conversion factor in gates per macro is supplied. The AMCC MacroMatrix ERC produces a BiCMOS/Bipolar Power Computation Worksheet that lists macros and their occurrences and which can be used for manual AC power analysis. Because of the use of several different conversion constants for the Q20000 macros, the macros on the worksheet must be grouped within certain types. The types are ECL Darlington outputs, ECL inputs and internal macros. Other ECL outputs are not involved in the computation. TTL I/O macros are not involved in the computation.
Step 2: Find the number of Darlington Outputs Using the macro occurrence worksheet, find the number of Darlington outputs. For AMCC customers, the number of Darlington outputs can be determined from the AMCC I/O statistics ERC report and from the AMCC AC Bipolar/BiCMOS Macro Occurrence worksheet. Count the number of Darlington output macros switching at the fastest frequency. Count the number of Darlington output macros switching at the next fastest frequency. Repeat until all Darlington output macros are accounted for.
Step 3: Compute AC Power For all ECL Darlington Output Macros
Using the counts from step 2, compute the AC power dissipated by the ECL Darlington outputs used in the circuit. For AMCC arrays, use:
PacDARoutputs = f * n * 15 microwatts
where
f = maximum frequency of the Darlington outputs in MHz
n = number of ECL Darlington output macros toggling at frequency f
conversion factor = 15 microwatt/macro-MHz for Q20000 Series
● Compute the AC power due to those macros switching at the fastest frequency.
● Compute the AC power due to those macros switching at the next fastest frequency.
● Repeat until all Darlington output macros are accounted for and sum the results.
Step 4: Find the Number of ECL Inputs
Using the macro occurrence worksheet, find the number of ECL input macros. For AMCC customers, the number of ECL input macros can be determined from the AMCC I/O statistics ERC report and from the AMCC AC Bipolar/BiCMOS Macro Occurrence worksheet.
● Count the number of ECL input macros switching at the fastest frequency.
● Count the number of ECL input macros switching at the next fastest frequency.
● Repeat until all ECL input macros are accounted for and sum the results.
Step 5: Compute AC Power For all ECL Input Macros
Using the counts from step 4, compute the AC power dissipated by the ECL inputs used in the circuit. For AMCC arrays, use:
PacI = f * n * 3.25 microwatts
where
f = frequency of operation in MHz
n = number of ECL input macros
conversion factor = 3.25 microwatt/macro-MHz for Q20000 Series
● Compute the AC power due to those input macros switching at the fastest frequency.
● Compute the AC power due to those macros switching at the next fastest frequency.
● Repeat until all ECL input macros are accounted for and sum the results.
Step 6: Find the Number of Internal Macros Using the macro occurrence worksheet, find the number of internal macros. For AMCC customers, the number of internal macros can be determined from the AMCC Population ERC report.
Step 7: Compute AC Power For all Internal Macros
Using the count from step 6, estimate the AC power dissipated by the internal macros used in the circuit:
Pacinternal = 0.20 * f * n * 3.25 microwatts
where
f = maximum frequency of operation in MHz
n = number of internal macros
0.20 = 20% switching
conversion factor = 3.25 microwatt/macro-MHz for Q20000 Series
Step 8: Sum All Components of AC Power
Sum the results together and convert to watts (or to the same units as were used for DC power):
PdAC = PacI + Pacinternal + PacDARoutputs
This equation provides an estimate of the worst-case AC power dissipation for the Q20000 array. Use the same equation for military and commercial computations.
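Steps 2 through 8 can be sketched as follows. The conversion factors and the 20% switching assumption are the Q20000 Series values quoted in the steps; the function names and the sample frequencies and macro counts are hypothetical.

```python
DARLINGTON_UW_PER_MHZ = 15.0  # microwatt/macro-MHz, ECL Darlington outputs
ECL_INPUT_UW_PER_MHZ = 3.25   # microwatt/macro-MHz, ECL inputs
INTERNAL_UW_PER_MHZ = 3.25    # microwatt/macro-MHz, internal macros
INTERNAL_SWITCHING = 0.20     # 20% of internal macros assumed switching


def group_power_uw(groups, factor):
    """Sum f * n * factor over (frequency in MHz, macro count) groups."""
    return sum(f_mhz * n * factor for f_mhz, n in groups)


def ac_power_watts(darlington_groups, input_groups, n_internal, f_max_mhz):
    """Estimated AC power in watts: Darlington outputs + inputs + internal."""
    p_dar = group_power_uw(darlington_groups, DARLINGTON_UW_PER_MHZ)
    p_in = group_power_uw(input_groups, ECL_INPUT_UW_PER_MHZ)
    p_int = INTERNAL_SWITCHING * f_max_mhz * n_internal * INTERNAL_UW_PER_MHZ
    return (p_dar + p_in + p_int) / 1e6  # microwatts to watts


# Hypothetical circuit: 4 Darlington outputs at 100 MHz and 2 at 50 MHz,
# 10 ECL inputs at 100 MHz, 20 internal macros, 100 MHz maximum frequency.
p_ac = ac_power_watts([(100, 4), (50, 2)], [(100, 10)], 20, 100)
print(f"{p_ac:.5f} W")  # 0.01205 W
```

Grouping the interface macros by frequency mirrors the "fastest, next fastest, repeat" counting described in Steps 2 through 5.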
Total Power Dissipation Add the DC power computation result to the AC power computation result. Pd = PdDC + PdAC The result is the worst-case power dissipation to be used in computing junction temperature. Depending on the array series, one of the terms may be zero or close enough to zero to discount.
Exercises
1. Is AC power a factor in the array series of interest? (Use the same series used in the DC power exercises.)
2. What equation is used by the array vendor you have chosen?
3. AMCC Q20000 Series: Continue the problem started in the DC power exercises and add AC power to the DC power already computed for the circuit. The number of input macros is 84. The number of internal macros is 112. There are no Darlington output macros. Assume all macros switch at 500 MHz. (Note: Is there a pulse width violation for any macros at this speed?)
4. Compare both versions of the above problem as described earlier. Which one is best?
Simulation
Introduction
Simulation is a design synthesis - design validation - design verification tool. It can involve the functional module level, the entire array or an entire PC board (arrays and other devices). The array vendor is responsible for the detailed Spice-level modeling of the macros, i.e., modeling at the basic device level of transistors, resistors, diodes, with parametrics generated by characterization of the array process. As a rule, the designer does not need to evaluate the circuit to that level of detail. Exceptions are those circuits with strict path matching requirements. In that case, a partial circuit involving the paths in question is evaluated at the discrete level. This is usually done with the support of the array vendor.
Before design start, a simulator can be used to:
● evaluate the array series
● evaluate different implementations of critical or complex structures
Partial and checkout simulations are usually not submitted to the array vendor at design submission although this use of the simulation tools is an important step in automation of the design synthesis process. These simulations are usually in the "quick and dirty" class in that they are run to appease the designer and often not documented. (This is not intended as approval for the lack of procedure; it is merely a statement of what occurs.) Any simulation done to "check out" a design implementation, partial or not, should become part of the design notebook and be correctly documented.
During design synthesis, simulation can and should be used to:
● debug the logic design itself
● debug the implementation of that design - functional performance
● evaluate the timing performance of the design implementation
● generate test vectors for prototype and production testing
Simulations used to debug the complete circuit, to check functional performance and some of the timing analysis vectors can often be used in the creation of the complete test vector set for wafer-sort and packaged part testing. Test vectors are generated from simulator output files. Before simulations are generated for an ASIC array, the designer needs a basic understanding of the testing problem. This encompasses:
● why the vector set is needed
● what is being tested by the vendor
● what is not being tested by the vendor
● what could be added to the testing done by the vendor if added to the contract
In addition, the designer must be aware of the limitations imposed by the array vendor that may be tied to a specific tester. They may be driven by the specific simulators supported by the vendor. A vendor may require that all submitted simulations have been run on a specific "golden simulator". Simulation output files may need to be reformatted before submission. Such reformatted files are then processed as data by the array vendor using a test generation program which may add tester-specific control vectors to handle bidirectional and three-state enable signals. The designer may have access to an automatic test generator (ATG) and may have done the proper design for test (DFT) that supports the use of the ATG. The vectors produced by this method may still need to be processed to assure that the vectors are in the proper format for the array vendor and meet the required rules. Check with the chosen array vendor to see if ATG is supported.
Simulators - The Tools
Workstation simulators are capable of executing partial circuits, tracing waveforms or lists of circuit activity, allowing the specification of what signals are to be traced or monitored, allowing both internal and external signals to be displayed, and performing timing checks within the accuracy of the array vendor's models. Listings and waveforms are available in a variety of formats. All simulations should use some form of annotation; annotation delay files or alternative modeling procedures can be used with all simulators. Annotation software to generate the interconnect delay files may be supplied by the workstation or array vendor. If the array vendor does supply annotation files or the software to generate them for the circuit, then the array vendor's software must preempt that of the workstation vendors. While the simulators may provide many features and output options that the designer will find useful during the circuit development, the array vendor may restrict the type of simulation and the output option that may be submitted. Designers should review the chosen vendor's rules before beginning the simulation process.
Non-Native Simulators Several workstation graphic capture systems are no longer tied to their native simulators or may have more than one "native" simulator. Netlist conversion software allows selected "front-ends" to communicate to the LASAR 6, Verilog, or other simulators that can be located on a remote mainframe or resident on the workstation platform. For example, as of 1994, Dazix, Valid and Mentor systems will input to their own simulators and to the Verilog simulator. Conversion programs allow the VAX/VMS LASAR 6 simulator to be used with the Dazix, Valid or Mentor netlist. (AMCC made use of Lasar 6 as its RaceCheck package in this manner.) Other combinations of front-ends and simulators are available. The simulators that can be used will be determined by the array vendor. Each simulator requires timing modeling for each specific library. A vendor may support one resident simulator and not others for a given workstation.
Teradyne's LASAR 6
LASAR 6 is a software simulator originally intended as a tool to aid Test Engineers. LASAR 6 is a true min/max simulator that correctly handles reconvergent fan-out. Reconvergent fan-out occurs when two signals that can be traced back to a common point must have their timing evaluated relative to each other. The evaluation of Tsu and Th at the input to the second stage of a shift register would be one such case. The min/max ambiguity that exists at the driving point of the clock net is the same at the input to both the first stage and the second stage. It must be removed before the evaluation of the data and clock relationship at the input to the second stage. Min/max simulation is generally required because of the possible propagation delay time differences that might occur on any two structures on the same die. This is generally referred to as "on chip tracking". LASAR 6 also allows the skewing of the primary inputs to account for the tester characteristics. For example, if the tester specification says that an input transition will occur within plus or minus 3 ns of the time it is told to change, then the inputs to the simulation can be skewed by plus or minus 3 ns. This makes sure that the patterns will work when run on the tester. These advanced capabilities found in LASAR 6 are not available in the most commonly used EWS logic simulators.
AMCC's RaceCheck RaceCheck is a tool developed by AMCC for verifying that no problems exist in the test vectors to be used for automated test. RaceCheck incorporates Teradyne's LASAR 6 simulator into an easily executable form. A series of translators for getting data from the AMCC generic format into the LASAR format, and for getting back to the AMCC generic format, have been provided. The entire process, including running LASAR and checking the results, has been put into a shell that prompts the user for needed information and then submits the process to a job queue for actual execution. RaceCheck must be run using all functional, parametric, and AC test patterns intended for use on AMCC's automated test systems.
Timing Verifiers Timing verifiers are an alternative timing analysis tool that is available on most workstations. They are used to predict circuit performance under real-time conditions and are an alternative to the at-speed simulation. Example verifiers are the Valid timing verifier, DTV from Dazix, and QuickPath from Mentor. One problem with the verifiers is that they do require different models, increasing the effort required to add an array library to the system. A different model library and different annotation files may be required for the timing verifier than for the simulator on the same system. A timing verifier on one workstation supported by an array vendor does not guarantee that another workstation supported by the vendor has a supported verifier. An array library may exist for the simulator and not for the timing verifier on any given workstation. The timing verifiers that can be used, if any, will be determined by the array vendor.
Hardware Emulators Hardware emulators are beginning to be adapted into the workstation arena, driven in part by the increasing size of the arrays and their simulations. Heavily populated large arrays (over 5000 gates) can saturate a workstation. When combined with a large vector set, the simulations can run for days on some workstations or cause back-up on a mainframe. Hardware emulators can reduce the simulation time to hours or minutes. Emulators may be part of the add-on hardware of a native system or they may be independently sourced systems. Example systems are the MegaLogician-Gatemaster MDLS for Dazix and IKOS, capable of being driven from a Dazix, Mentor or Valid front-end, among others. This cast of players is in a high state of flux. Always check into what is currently available when evaluating emulators.
The more user-friendly the simulator input process, the more likely the engineer will make use of the tool. At this time, the set-up required for some emulators takes longer than the final simulation execution time! That is expected to change as more users are forced to acquire simulation assistance. A hardware emulator can be used if the array vendor supports the emulator. Another driving force behind the evolution of the hardware emulators is the need to have improved AC power computations. Using an emulator and a benchmark vector set designed to reflect the circuit usage, a better evaluation of macro or gate switching frequency can be obtained resulting in a closer estimate of expected AC power dissipation. The problem complexity is beyond the available simulators. Note that the accuracy obtained even with an emulator will depend on the accuracy of the vectors analyzed.
Golden Simulators No single simulator can handle all problems and circuit complexities with equal ease. Circuit structures such as reconvergent fan-out and tight rise-fall timing ambiguity analysis are handled at varying levels of accuracy. The idea of a "golden simulator" is an attempt by array vendors to guarantee that a known level of accuracy and simulation capability is used on all in-coming designs. It is also one way of standardizing the submission process. Dial-up simulation is offered by several array vendors and being considered by others. Using dial-up, the different array designers could access the selected array-vendor-resident simulator. It allows small companies to take advantage of a mainframe simulator without a significant hardware investment. The requirements would be that the user have a VT100-type terminal for batch operation or a full graphics station for interactive place and route.
Questions To Ask A designer selecting an array must have a clear understanding of the design submission process required by that specific vendor. The number of simulations and their format will vary with the vendor and with corporate policies of the designer's company. Questions that should be asked before design start are shown in Table 8-1.
Design Validation
During design validation, thorough simulations are required. The types of simulations that are specifically required for design submission will vary from vendor to vendor. There are four basic groupings currently required or allowed by vendors:
● wafer-sort, packaged-part test vectors
● timing verification at-speed
● AC test
● parametric
These four will be discussed since they represent a sufficient set for design submission. Additional simulations or slight variations may be required by various array vendors or quality assurance managers. Understanding these four basic groups will provide a basis for understanding any other simulations that may be required for a design.
Wafer Sort/Packaged Part Sort Functional Simulation
The vectors used to perform wafer sort and packaged part testing are generated from simulation output vectors. These simulation output vectors consist of all the input and the expected output signals generated by the input file. These vectors provide a time-independent, sequential state description of the circuit after any input change has propagated through the logic and all the outputs have settled to their stable state. A reasonable sample step for this simulation is 100 ns, with the sample taken one simulator time step before the next input vector. The simulation provided to the array vendor should contain sufficient vectors to provide verification of the logic and should supply all simulation vectors required for the recommended 90% or better fault coverage of the final circuit. The level of fault coverage will vary from vendor to vendor or may be dictated by the designer's company. Indications are that reliability requirements are beginning to force a high (98-99%) fault coverage requirement.
Fault-Grading
Fault grading is a measure of the fault coverage - a "grade" on the quality of the fault detection provided by the submitted simulation vectors. A fault grading score of 100% means that if an SA1 or SA0 fault exists at any single observable node within the circuit it will be detected during the tester functional testing phase. Single fault detection, the detection of a stuck-at fault (SA1 = stuck-at-1; SA0 = stuck-at-0) at any single node in the circuit, requires that the node be "covered" by at least one simulation vector. A node is covered by the vector set when the state of at least one circuit primary output for at least one vector is different when the failure is present than when the failure is absent. A failure at a circuit node that is not covered by the functional simulation vector set will not be detected. A stuck-at fault is generally thought of as a physical open or short circuit, i.e., a "hard" failure. Intermittent failure is not necessarily detectable by the stuck-at model. Redundancy in the circuit produces fault-masking and will reduce the obtainable fault-coverage since it reduces the observable nodes. The addition of test points when the redundancy is deliberate, and the minimization of the circuit when it is not, are recommended approaches to improve testability. There are no requirements for fault location, i.e., the identification of the exact point of failure. Multiple-fault detection, a less-probable occurrence, is also not required although most single-fault minimal test sets and minimal test sequences will provide 100% fault coverage of all observable faults and will also detect the presence of many multiple faults, depending on the circuit implementation.
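The coverage definition above can be illustrated with a small fault-grading sketch. The circuit (a 2:1 multiplexer built from AND, OR and NOT gates), the node names and the helper functions are all hypothetical, chosen only to keep the example short; a production fault grader works on the vendor's macro models and is far more elaborate.

```python
from itertools import product

# Hypothetical example circuit: a 2:1 multiplexer,
# y = (a AND NOT s) OR (b AND s). Node names are illustrative only.
NODES = ["a", "b", "s", "ns", "t0", "t1", "y"]


def simulate(vector, fault=None):
    """Return the primary output y for one input vector.

    vector: dict with keys "a", "b", "s" (values 0 or 1).
    fault:  None, or (node, stuck_value) for a single stuck-at fault.
    """
    values = {}

    def drive(node, value):
        # A stuck-at fault overrides whatever the logic would drive.
        if fault is not None and fault[0] == node:
            value = fault[1]
        values[node] = value

    drive("a", vector["a"])
    drive("b", vector["b"])
    drive("s", vector["s"])
    drive("ns", 1 - values["s"])
    drive("t0", values["a"] & values["ns"])
    drive("t1", values["b"] & values["s"])
    drive("y", values["t0"] | values["t1"])
    return values["y"]


def fault_grade(vectors):
    """Return (coverage_percent, undetected_faults) for a vector set."""
    faults = [(node, stuck) for node in NODES for stuck in (0, 1)]
    undetected = []
    for fault in faults:
        # A fault is covered if at least one vector produces a primary
        # output that differs from the fault-free output.
        if not any(simulate(v, fault) != simulate(v) for v in vectors):
            undetected.append(fault)
    coverage = 100.0 * (len(faults) - len(undetected)) / len(faults)
    return coverage, undetected


def vec(a, b, s):
    return {"a": a, "b": b, "s": s}


exhaustive = [vec(a, b, s) for a, b, s in product((0, 1), repeat=3)]
weak = [vec(0, 0, 0), vec(1, 0, 0)]

print(fault_grade(exhaustive))
print(fault_grade(weak))
```

The second vector set never exercises the b input or the select line at "1", so several stuck-at faults on those nodes go unreported, which is the situation the coverage score is meant to expose.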
The Minimal Test Sequence as applied to combinatorial and sequential circuit elements is discussed in Chapter 9.

Table 8-1 Questions To Be Asked
Questions to Ask About Simulation Submission Requirements
● What simulations are required for design submission and on what media?
  ❍ wafer-sort
  ❍ timing verification at speed
  ❍ AC tests
  ❍ parametric tests
  ❍ other
  ❍ Electronic FTP, floppy disk or tape?
● What simulations may be submitted?
● What simulators are supported?
● What support for hazard and race detection?
● What timing verifiers are supported?
● What level of fault-grading is required?
  ❍ by the array vendor
  ❍ by the designing company
● What rules must be followed for each type of simulation?
● Determine if ATG outputs are acceptable or if they need post-processing
● What is submitted?
  ❍ simulator control or command files
  ❍ input files
  ❍ output files
  ❍ vector checking reports
  ❍ netlist
  ❍ annotation delay files
  ❍ other control or data files?
● Is annotation software available?
● What testers are available?
● What expansion can be expected from the test program?
● What vector-checking software is available?
Testability Analysis
There were several available software packages such as the Dazix DTA (Dazix Testability Analyzer) and Tegas COPTR that analyzed the testability of a design. They attempted to measure the controllability and the observability of the nodes within the design. There are various products on individual workstations that have been designed to perform this task. Note that array vendors on the whole do not enforce their use and may not support them if different models are required. Controllability is the measure of how difficult it is to set a node to a given value. A node is controllable if it takes one or not more than a selected number of vectors to set it to a given value, i.e., propagate a primary input signal to that node. The ideal case is that it requires one vector to set a node. Observability is the measure of how difficult it is to see the value to which a node is set. A node is observable if it takes one or not more than a selected number of vectors to propagate the value of the node to an observable output. The ideal case is that it requires one vector to observe a node. Design optimization using ad hoc or formal procedures (DFT) to improve testability scores was discussed earlier. If such software is available, it is recommended that the designer evaluate the circuit for testability before finalizing the design and proceeding with the simulations.
Simulation Rules Each vendor will have specific rules for the simulations that are to be used to generate test vectors including: initialization, file sizes, number of output signals that can change per vector (tester probe noise limitation), fault coverage, procedures for three-state and bidirectional signals, procedures for differential signals, race restrictions for data and clock signals, and any other rules peculiar to that vendor. A partial sample simulation output file is shown below. It is a formatted output, sampled in 100 ns steps taken one simulator time step before the next input vector. All inputs are uniform in their arrival and all vendor rules were followed. This file is part of one that represents 100% fault coverage for a simple 16:1 MUX circuit. The vectors were created following Minimal Test Sequence rules (one input changes state per data change vector). Figure 8-1 Sample Simulation Output File, Formatted To Be Used To Generate Test Vectors
    TIME
    9999  0100 0010 0101 1001 1010 0100
   19999  1100 0010 0101 1001 1010 0100
   29999  0000 0010 0101 1001 1010 0100
   39999  1000 0010 0101 1001 1010 0110
   99999  1001 0010 0101 1001 1010 0100
  109999  0001 0010 0111 1001 1010 0100
  119999  1001 0010 0111 1001 1010 0110
  129999  0001 0010 0101 1001 1010 0110
       o
       o
  939999  1000 1010 0101 1001 1010 0100
  949999  0000 1010 1101 1001 1010 0100
  959999  1000 1010 1101 1001 1010 0110
  969999  0000 1010 0101 1001 1010 0110
  979999  1000 1010 0101 1001 1010 0100
  989999  0000 0010 0101 1001 1010 0100
  999999  1000 0010 0101 1001 1010 0110
Full File Listing - Functional, Sampled Simulation 16-Bit Register with Mux Output Sample Circuit
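One of the vendor rules listed above, the limit on the number of output signals that may change per vector, lends itself to a simple automated check. The sketch below is only an illustration of that idea: the function name, the vector layout (one state string per sample) and the sample data are assumptions, not any vendor's vector-checking software. An unknown state (X) resolving to a known value is counted as a change.

```python
def switching_violations(vectors, output_columns, limit):
    """Return (vector_index, changed_count) pairs that exceed the limit.

    vectors:        list of equal-length state strings, one per sample.
    output_columns: indices of the columns that are circuit outputs.
    limit:          maximum outputs allowed to change between two vectors.
    """
    violations = []
    for i in range(1, len(vectors)):
        prev, cur = vectors[i - 1], vectors[i]
        # Count output columns whose state differs from the previous vector.
        changed = sum(1 for c in output_columns if prev[c] != cur[c])
        if changed > limit:
            violations.append((i, changed))
    return violations


# Hypothetical data: two inputs followed by four outputs.
vectors = ["00XXXX", "100000", "010001", "111110"]
outputs = [2, 3, 4, 5]
print(switching_violations(vectors, outputs, limit=2))
```

In this made-up data the initialization vector and the last vector each change four outputs at once, so both are reported against a limit of two.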
At-Speed Simulation For Timing Verification
An at-speed simulation is one method that may be used to verify the actual timing performance of the circuit as implemented on an array against the target specification for the circuit. It is run at the maximum worst-case conditions and at the minimum worst-case conditions. Since it is run at the specified maximum operating frequency of the circuit, this simulation is not used to generate test vectors. The at-speed simulation run before layout using worst-case multipliers and Front-Annotation can spot potential problem areas in the circuit and assist in defining the criteria for the layout. The Front-Annotation file makes a statistically-based estimate of the metal delay and adds the actual delay due to fan-out load and any wire-ORs present. Output nets have estimated package pin capacitance and system capacitance delays. The at-speed simulation run after layout using the Back-Annotation file can verify the actual timing performance for the array. The Back-Annotation file provides the actual metal delays for the layout combined with the actual fan-out and wire-OR delays for internal nets and actual package pin and system capacitive load delays. Note that the at-speed simulation outputs will be time-dependent (some results are not necessarily available within one sample step). The apparent "phase delay" of some outputs relative to others makes the evaluation of at-speed simulation results a non-trivial exercise.
Timing Verifier
Engineering workstations often have a timing verifier as well as a simulator. If the array vendor supports the verifier, timing analysis using the verifier can be substituted for at-speed simulations with the array vendor's approval.
AC Tests
Array vendors are beginning to incorporate thermal diodes and AC speed monitors within the base array or as macros that can be added to allow thermal and speed measurements. AC tests may also be allowed, regardless of the presence of a speed monitor. An AC test is a measurement of one path, input to output, using a single input, a rising or falling edge on that input, one output and the rising or falling edge on that output. The vendor may allow set-up and hold measurements or the designer may be restricted to propagation path tests only.
Parametric Vectors
Quality Assurance departments will generally require that the DC parametric tests for VIH and VIL be performed on the array. Should this option be selected, there are several approaches that can be used to ensure proper vectors and conditions for these tests. The easiest method has been to allow the tests to be made using the test vectors written to perform wafer sort. This approach is acceptable if the inputs to be tested are toggled within a single page of the vectors, if the input to output paths are combinatorial, and if the number of outputs which toggle during those vectors is within the tester limits. Different vendors may suggest or require alternative approaches. When it is not possible to meet the restrictions that would allow parametric testing with the wafer sort vectors, a successful approach is to add combinatorial gates (NAND, AND, NOR or OR) and one output macro. Gate all inputs or all inputs to be tested through this combinatorial gate tree. There are several types of inputs that cannot be tested in any of these approaches. They are:
● thermal diodes
● AC monitors
● VBxx macros
● added power
● added ground
● added ECL VCC
● unbuffered ECL inputs
● unbuffered TTL inputs
● three-state enable-drivers
Differential inputs always operate as a pair and each pair should be considered as a single entity when reading the following test methodology descriptions.
Gate Tree - Any Circuit
This approach for parametric testing is the best for any circuit, any I/O mode. It requires internal logic cells, internal routing, and one additional I/O cell.
● If SET or RESET is one signal, start the vector set with the set or reset in the inactive or disabled state (outputs unknown). The second vector will SET or RESET the circuit. This is the only vector that will encounter multiple outputs switching.
● Enable the TEST mode in the first vector if the design requires it. Set all bidirectional macros to the input mode, set up gating, etc. Complete initialization before beginning the parametric test.
● Gate all inputs (except those already identified as exclusions) together (use an AND tree, a NOR tree, etc., as required).
● Bring the result of the GATE tree out to a primary output which must be listed in the simulation vector format. (Note: The parametric tree output will be listed in all simulations performed on this circuit.)
● Identify the gate tree output signal.
● The tree output may be passed through a multiplexor to allow use of an existing primary output only if the circuit is I/O limited.
● One input may switch per vector in the following manner.
  ❍ Start with all inputs at logical "1"
  ❍ Switch one input to "0"
  ❍ Switch that one input back to "1"
  ❍ Switch the next input in sequence to "0"
  ❍ Continue until all inputs have been toggled
  ❍ The gate tree output signal toggles each vector
  or the reverse (start at "0" and switch to "1" and back).
The toggle pattern of the parametric test is the Minimal Test Sequence for the gate tree. [The Minimal Test Sequence is discussed in Chapter 9.] The Minimal Test Sequence will cover all possible faults in the gate tree. Wafer sort vectors and parametric vectors taken together determine the fault grade score of the vector set for the entire circuit.
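The toggle pattern for an AND gate tree can be sketched as a short generator. This is an illustration only: the function name is hypothetical, the tree is modeled as a single ideal AND of all inputs, and initialization vectors (SET/RESET, TEST mode) are left out.

```python
def gate_tree_vectors(n_inputs):
    """Return (inputs, tree_output) pairs for an AND gate tree.

    Starts with all inputs at 1, then drops each input to 0 and restores
    it, so exactly one input changes per vector.
    """
    state = [1] * n_inputs
    vectors = [(tuple(state), int(all(state)))]
    for i in range(n_inputs):
        state[i] = 0  # switch one input to "0"
        vectors.append((tuple(state), int(all(state))))
        state[i] = 1  # switch that input back to "1"
        vectors.append((tuple(state), int(all(state))))
    return vectors


for inputs, out in gate_tree_vectors(3):
    print("".join(map(str, inputs)), out)
```

For n inputs the sequence has 2n + 1 vectors, and the modeled tree output changes on every vector, which is the property the parametric test relies on.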
Hazards Timing checks in the simulators are designed to help the designer screen hazards and race conditions from the circuit and from the test vector set. LASAR 6 is used as the role model for race and hazard evaluation. Timing hazards are divided into two basic categories: structural and functional.
Structural Hazard Types * Converging Ambiguities A Convergence hazard is one where at least two edges come together at a primitive and the ambiguity associated with the edges overlap so that the primitive may or may not pulse. (A primitive is an element provided by the simulator software. It is a basic building block used to construct a logic model.)
* Cumulative Ambiguity A primitive in the model has a pulse at its output, but the ambiguities at the rising and falling edges are so large that they overlap. Since the whole pulse is "gray", it may or may not occur and is therefore flagged as a hazard.
* Composite Hazard A primitive's inputs change in such a way that the primitive's output will change at a future time. If, before that time occurs, the inputs change again in such a manner that the output change would be reversed, then the simulator doesn't know if the primitive output will respond at all. (This can be described as the inputs changing faster than the propagation delay of the primitive.) This is flagged as a hazard by LASAR 6.
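The first two structural hazard types both come down to asking whether two ambiguity windows overlap. The sketch below models each edge as an earliest/latest arrival window and tests for overlap. It is an interpretation of the definitions above, not LASAR 6's algorithm, and the class and function names are hypothetical.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    earliest: float  # earliest possible arrival time, ns
    latest: float    # latest possible arrival time, ns


def ambiguities_overlap(e1, e2):
    """True if the two ambiguity windows share any instant."""
    return e1.earliest <= e2.latest and e2.earliest <= e1.latest


def converging_hazard(edge_a, edge_b):
    """Two edges arriving at one primitive with overlapping ambiguity."""
    return ambiguities_overlap(edge_a, edge_b)


def cumulative_hazard(rising, falling):
    """A pulse whose leading and trailing edge windows overlap,
    so the whole pulse may or may not occur."""
    return ambiguities_overlap(rising, falling)


print(converging_hazard(Edge(10.0, 14.0), Edge(13.0, 18.0)))
print(cumulative_hazard(Edge(10.0, 12.0), Edge(13.0, 18.0)))
```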
Functional Hazards * Setup violations Set-up time violations are situations where a node has not remained in a stable state long enough before another node changes state. This test is common on data inputs to flip/flops with respect to the clock of the flip/flop.
* Hold time violations Hold time violations are situations where a node does not remain in a stable state long enough after another node changes state. This is common on data lines with respect to clock inputs where the data must be maintained in a stable state for a specified time after the clock change.
* Minimum pulse width violation The time between two consecutive edges of a signal must be greater than a specified value.
* Minimum period violation The time between two consecutive rising or falling edges of a signal must be greater than a specified value. For a node that has multiple hazards associated with it within a single pattern, only the last hazard is reported by LASAR 6. The resulting hazard report file will contain only those hazards that meet the definition of "persist". A persistent hazard is one that causes a node to go unknown and stay unknown until the end of the current pattern. This is a valid way of screening out combinatorial glitches so long as the sample time is long relative to the longest delay time.
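The functional hazard checks reduce to comparisons between event times and a specified limit. The following sketch shows the idea for set-up, hold and minimum pulse width; the function names and the sample times are hypothetical, and a real simulator applies these checks per pin using the limits in the macro library.

```python
def setup_violation(data_change, clock_edge, t_setup):
    """Data must be stable for at least t_setup before the clock edge."""
    return data_change <= clock_edge and clock_edge - data_change < t_setup


def hold_violation(data_change, clock_edge, t_hold):
    """Data must stay stable for at least t_hold after the clock edge."""
    return data_change >= clock_edge and data_change - clock_edge < t_hold


def min_pulse_width_violations(edge_times, t_min):
    """Return consecutive edge pairs whose spacing is not greater than t_min."""
    return [(a, b) for a, b in zip(edge_times, edge_times[1:]) if b - a <= t_min]


# Hypothetical times in ns.
print(setup_violation(98.5, 100.0, 2.0))    # data changed 1.5 ns before clock
print(hold_violation(100.5, 100.0, 1.0))    # data changed 0.5 ns after clock
print(min_pulse_width_violations([0.0, 5.0, 6.0, 20.0], 2.0))
```

A minimum period check has the same shape as the pulse width check, applied to consecutive edges of the same polarity rather than to all consecutive edges.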
What is required to fix race problems
* Identify source of problem
The above errors can be caused by vector races or by internal timing conditions. The LASAR 6 hazard report file will show a trace back to the primary input or inputs that caused the error. If it is traced to two separate inputs then it can generally be fixed by changing the input pattern as long as it does not violate the intended function. If it is traced to only one input then the problem is due strictly to internal timing and will require modification of the circuit to change its timing.
* Implementing The Fixes
In the case of a vector race, it will be necessary to return to the EWS system where the design was done. The simulation input file must be modified to fix the timing problem and the simulations rerun. In the case of the internal timing problems, it will still be necessary to return to the EWS system, this time to modify the schematic as required. After this is done, rerun all steps, including all simulations, before returning to RaceCheck.
* Looping caused by iterative fixes
Iterative fixes may result from three primary causes.
● If a fix for a previous error introduces a new error then the process will have to be repeated to fix the new problem without going back to the old one.
● The second cause is due to a hazard causing an entire path to become unknown (X) for an extended time. This leaves the possibility that a node downstream has a hazard that was blocked by the X state existing from the previous upstream hazard.
● The third, but least common cause, is the fact that the simulator, such as LASAR 6, reports only one hazard at a time for a given node within a single pattern.
Exercises
1. Read at least four survey articles in current literature that discuss ASIC simulation, workstations, simulators, timing verifiers and hardware emulators. What trends do these articles show?
2. Select three to ten array vendors. Discover what workstations, simulators, and other support tools are available from these vendors. Do they offer design centers (provide equipment to designers at specific sites)?
3. Read at least two articles on the concept of a "golden simulator". Discuss the impact of this approach on designers used to their own workstations.
4. Select either an 8-bit adder, 8-bit counter or 16-bit multiplexor. Develop 100% fault coverage wafer-sort vectors. The circuits are of your own design as to interface, etc.
5. Write AC propagation path only test vectors for any six paths in the circuit chosen in Exercise 4.
6. Develop a gate-tree and its vector set for the circuit chosen in Exercise 4.
Case Study: Simulation Introduction The following case study (the by-now-familiar 32-bit register) provides a circuit, complete with thermal diodes and a parametric gate tree. The simulation output files are formatted according to one vendor's rules and requirements. The files include the wafer sort set, at-speed, 16 AC tests (propagation path tests only) and a parametric vector set. The files pass all tests and would pass design acceptance. They are only for the maximum (military) worst-case conditions. A second set of files would be required that were produced using the minimum timing library and annotation files. The sample step is 100 ns for all files except the at-speed file. That file must be run at the specified maximum operating frequency. Both sampled and print on change files are included per the vendor requirements. The workstation used was a Mentor EWS on Apollo using the 1991 Q20000 Bipolar library. Identical files are produced when the design is done using a Valid or Dazix SUN system since the files are always reformatted. The formatter accepts the normal simulator output file as input and converts it into the format shown. Fault coverage for the vector set (parametric and wafer sort taken together) is 100%. The vectors were developed using the Minimal Test Sequence methodology.
The Schematic The circuit used to demonstrate simulation files is a 32-bit register created from the AMCC Q20000 Series macro library and was shown in the case study appendix to Chapter 3. The schematic was created using AMCC schematic rules. The overriding rule is human-readability.
Page One Page one includes a chip macro, a pseudo-macro designed to allow the designer to communicate array-specific parameters to the design software. In this instance, the chip macro informs the software that this is a Q20080 array, using ECL interface macros in a standard 5.2V single power supply configuration. User-added parameters identify the circuit and specify that it is a military circuit. This will determine which annotation files are generated. The library revision shows (010) - October 1990, the most current at the time of the schematic capture. The library was run under the (111) - November 1991 release. Unseen information carried by the chip macro includes number of I/O cells, number of internal cells, overhead current, switching group size and I/O macro types. Dozens of parameters are carried by the chip macro. This approach to array and circuit identification is vendor-specific. Also shown as part of page one are a number of added power and ground macros. For TTL or TTL/ECL mixed circuits, ITPWR is a macro that will be used at placement to identify a pad that is to be tied to the +5V bus. ITGND is a macro that will be used at placement to identify a pad that is to be tied to the 0V bus. For the ECL circuits, IEVCC is used to supply either added ground (standard reference circuits) or added power (+5V reference circuits). These macros are named since they will be placed. User-defined or instance names are used to create a placement file. They also carry a tag "AAA", which is a switching-group assignment. The switching-group parameter is vendor-specific. It is used by AMCC software to tie simultaneously switching outputs to the added power and grounds that were added to handle them. This allows checking software to flag possible violations based on the rules issued for the specific array. Also on page one, note the static driver.
Output Enable
Where a pin must be driven by another macro or where tying a pin to ground would be unacceptable, the use of another macro that can be tied to ground is required. This circuit uses a static driver that could drive 50 loads. It drives 39 loads in this case. There are design restrictions and tester limitations that affected the design of an output enable circuit. First, not all outputs may change state during testing (the limit is 16). This has nothing to do with normal circuit operation. To accommodate this vector restriction, test-enable inputs may need to be designed into the enable structure. This circuit did not require any test enables.
Page Two
The control circuitry is grouped on page two. There are three modules: clock, reset and MUX enable. All macros and all external signals are named, as are signals that go from one page of the schematic to another. Naming other internal nets is arbitrary, and usually depends on the need to trace them during critical path analysis. Vendor-specific rules require that any signal that goes to one or more other pages have those pages noted. Any signal coming from another page must have that page noted. This is part of the human-readability.
Clock The clock uses library-specific clock drivers. GT55D macros are low skew-high load drivers. They are driven by a differential-macro IE31H. The array-specific rule in use here, since the speed will not exceed 600 MHz, is to derate the fan-out load capability of all macros in the clock net by 40%. On the schematic, this appears as a 40 near the internal signal "INTCLK" and near "ICA", "ICB", "ICC" and "ICD". The use of a fan-out derating parameter is vendor-specific. AMCC uses FOD attached to a net to derate the pin of the macro driving the net. Fan-out loading for the clock depends on the macros driven. Regardless of loads driven, it is often desirable to keep the clock lines balanced, in this case placing fewer loads on each driver. Fan-out load derating for clock lines is a typical vendor requirement and you should verify what a chosen vendor will require.
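The effect of a fan-out derating rule can be worked through with a little arithmetic. The sketch below reads "derate by 40%" as reducing the usable load capability to 60% of the rated value; that reading, the function names and the rated fan-out of 20 loads are assumptions for illustration and are not GT55D specifications.

```python
def derated_fanout(rated_loads, derate_percent):
    """Usable fan-out after derating, rounded down to whole loads."""
    return rated_loads * (100 - derate_percent) // 100


def drivers_needed(total_loads, loads_per_driver):
    """Number of parallel drivers required (ceiling division)."""
    return -(-total_loads // loads_per_driver)


usable = derated_fanout(20, 40)      # hypothetical rated fan-out of 20 loads
print(usable)
print(drivers_needed(32, usable))    # e.g. 32 clocked flip/flops
```

With these made-up numbers a driver rated for 20 loads may drive only 12, so three drivers are needed for 32 loads. The actual rated values and derating rule must come from the chosen vendor.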
Reset The reset structure is simple, using eight GT09S standard macros. Fan-out loading for the reset line does not require derating. Each macro output pin will drive 12 loads.
MUX Enable The 2:1 MUX enable must drive 32 loads so a small buffer tree consisting of two gates was added. To reduce internal cell usage, a high-power driver could have been used.
Page Three Page three contains the four enables for the 16:1 multiplexors, and the six pass-through inputs and outputs. The outputs use the static driver.
Page Four, Fourteen
Pages four and fourteen show the 16:1 MUXs. The output macro is OE42S, which uses an output-enable control (from sheet 3).
Page Five through Twelve Pages five through twelve are identical in macro content except that the names of the macros and signals will be different. Each page is a 4-bit "slice" of the register. This circuit cannot be drawn using hierarchy due to vendor-specific rules that require all I/O macros to be at the top of the drawing hierarchy. The only non-I/O macros on these pages are the flip/flop macros. The flip/flops used in the drawing are FF46S and FF10S. Although the flip/flops are in the critical path, the use of 32 H-option flip/flops would make the design very hot. The 3-input OR/NOR has the C pin tied to ground but the B pin tied to the static driver. The macro requires that B or C be driven by a macro. By sending the flip/flop output into the B or C pin, the remaining pin and the A pin could be tied to ground and the schematic could be simplified. It would also reduce routing nets - an important issue even for channelless arrays. Note the vendor-specific switching-group parameter "AAA" used on each output macro.
Page Thirteen The thirteenth page contains a parametric gate tree. The parametric gate tree is a vendor-specific requirement to allow parametric VIH, VIL testing. All circuit inputs are ORed and brought out to one output macro, in this case OE42S. The actual connection is to the output pin of the input macro. When an I/O macro has two output pins, using the otherwise unused pin prevents the actual circuit from "seeing" the time delay imposed by the net driving the gate tree. When no unused pin exists, an added fan-out load will be introduced. Gate trees are the recommended approach to any sequential circuit with more than 8 simultaneously switching outputs (mixed-mode or TTL/ECL mixed circuit) or 16 (100% TTL or 100% ECL circuit) in any vector.
Design Checks This circuit passed all design checks - with a warning that perhaps added power macros might be required at placement.
Annotation
The annotation files were generated using a 251_PGA_CD package (251-pin grid array, cavity down). This establishes an estimate for the package pin capacitance for the output macro loading. Annotation was run without incident.
The Simulations
Control files vary from workstation to workstation, from simulator to simulator, from product grade to product grade, from array to array and from simulation to simulation. They are too specific to be shown here. The outputs of the simulators are for the most part non-compact and even difficult to read. Only formatted outputs will be shown here. First, they are compact, a requirement since there are 107 I/O signals and 4 internal enable signals that the vendor requires be listed. Second, they look the same regardless of the workstation used, Dazix, Mentor or Valid. The simulations were run using Mentor. All simulations shown are the result of military worst-case maximum timing. Minimum simulations are often also required.
Functional Simulation The wafer-sort and packaged-part sort test vectors are derived from the functional simulation. (The name is vendor-specific.) This simulation is done following vendor-specific rules. These include high-coverage, no more than 16 outputs changing in any one vector, the high limit allowed because parametric vectors and a parametric gate tree are used. All signal transitions must be included - 0-1, 1-0 for standard macros, 0-1, 1-0, 0-Z, Z-0, 1-Z, Z-1 for 3-state macros. To prevent the vector checker from complaining, the PARAM signal from the gate tree is toggled at the end of the vector set. There will be one error message - the initial reset will cause 64 signals to change state. This cannot be avoided. The circuit must be brought up exactly as shown, with the reset "disabled", and then the reset activated. A vector set for a 16:1 MUX is shown in Figure 8-1. Checking of this vector set shows 100% coverage of the internal nets and primary I/O, excluding a gate tree. It passes the vector check software with the allowed exception of the reset error message. Figure 8-1 Functional Simulation - 16:1 Mux
MINIMAL TEST SEQUENCE FOR 16:1 MUX TEST CASE ON A Q20010E
________________________________________________________________________
1***CIRCUIT IDENTIFICATION =
          EESSSSDDDDDDDDDDDDDDDDYP
          XXEEEEAAAAAAAAAAAAAAAAOA
          TTLLLLTTTTTTTTTTTTTTTTUR
          CRCCCC0123456789111111TA
          LSTTTT          012345PM
          KT3210                T
    TIME
  99.990  010000100101100110100101
 199.990  110000100101100110100101
 299.990  000000100101100110100101
 399.990  100000100101100110100111
 499.990  000000000101100110100111
 599.990  100000000101100110100101
 699.990  000000100101100110100101
 799.990  100000100101100110100111
 899.990  000100100101100110100111
 999.990  100100100101100110100101
1099.990  000100100111100110100101
o o o o o o o o
Full File Listing - Functional, Sampled Simulation 16-Bit Register with Mux Output Sample Circuit
A partial vector set for a 32-bit register similar to that in the schematics is shown in Figure 8-2. The sample step is 100 ns and the first sample is taken at 99.99 ns. The simulator output is an integer - place the decimal two places from the right. The signals are listed in vendor-specified order: all inputs, all outputs, and all 3-state enables listed last. Only sampled functional simulations are submitted.
Figure 8-2 Functional Simulation - 32-Bit Register (partial)
Figure 8-2 Functional Simulation - 32-Bit Register - Full Listing
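The sampling convention described above (a 100 ns step, a sample one simulator time step before the next input vector, and integer times with two implied decimal places) can be sketched as follows. The function names and the constant are hypothetical; the only facts taken from the text are the 100 ns step, the first sample at 99.99 ns and the decimal placement.

```python
TICKS_PER_NS = 100  # two implied decimal places: one tick = 0.01 ns


def sample_ticks(step_ns, count):
    """Integer sample times, each one tick before the next input vector."""
    step = step_ns * TICKS_PER_NS
    return [n * step - 1 for n in range(1, count + 1)]


def ticks_to_ns(ticks):
    """Format an integer simulator time with the decimal point placed
    two places from the right."""
    return f"{ticks // TICKS_PER_NS}.{ticks % TICKS_PER_NS:02d}"


ticks = sample_ticks(100, 3)
print(ticks)
print([ticks_to_ns(t) for t in ticks])
```

The integer values produced this way match the style of the TIME column in the unformatted listing (9999, 19999, ...), while the converted values correspond to the 99.99 ns, 199.99 ns, ... sample points.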
Exercise Create a complete functional vector set for the schematics shown in the Appendix of Chapter 3.
Parametric Simulation
Since there is a gate tree, a parametric vector set can be easily constructed using the Minimal Test Sequence. The sequence requires that only one input change per vector, that each input toggle in both directions and that the output (PARAM) toggles with each vector. Figure 8-3 shows a parametric vector file for a 32-bit version of the register. Sampling and format are vendor-specific, and the 100 ns step was used. Note that the reset is executed at the beginning as it was for the functional vectors. This will produce an error message during parametric vector checking.
Figure 8-3 Parametric Simulation - 32-Bit Register (partial)
Figure 8-3b Parametric Simulation - 32-Bit Register - Full Listing
By combining the functional and parametric vectors, 100% fault coverage is obtained for a circuit. Only sampled simulations are submitted.
Exercise
Create a complete parametric vector set for the schematics shown in the Appendix of Chapter 3.
Parametric Simulation - Sampled - 32-Bit Register Last Edit October 8, 1996 1***CIRCUIT IDENTIFICATION = DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDEEEEEEDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTOOP IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIXXXNNNODODODODODODODODODODODODODODODODODODODODODODODODODODODODODODODODEEA 00000000001111111111222222222233TTTABC0O0O0O0O0O0O0O0O0O0O1O1O1O1O1O1O1O1O1O1O2O2O2O2O2O2O2O2O2O2O3O3ONNR 01234567890123456789012345678901RCC 001020304050607080900111213141516171819102122232425262728292031312A SKA 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 M TAN TIME 99.990 11111111111111111111111111111111010111XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX001 199.990 111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000 299.990 111111111111111111111111111111110101110000000000000000000000000000000000000000000000000000000000000000001 399.990 111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000 499.990 011111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000001 599.990 111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000 699.990 101111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000001 799.990 111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000 899.990 110111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000001 999.990 111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000 1099.990 111011111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000001 1199.990 111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000 1299.990 
111101111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000001 1399.990 111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000 1499.990 111110111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000001 1599.990 111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000 1699.990 111111011111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000001 1799.990 111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000 1899.990 111111101111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000001 1999.990 111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000 2099.990 111111110111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000001 2199.990 111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000 2299.990 111111111011111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000001 2399.990 111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000 2499.990 111111111101111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000001 2599.990 111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000 2699.990 111111111110111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000001 2799.990 111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000 2899.990 111111111111011111111111111111111101110000000000000000000000000000000000000000000000000000000000000000001 2999.990 
111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000 3099.990 111111111111101111111111111111111101110000000000000000000000000000000000000000000000000000000000000000001 3199.990 111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000 3299.990 111111111111110111111111111111111101110000000000000000000000000000000000000000000000000000000000000000001 3399.990 111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000 3499.990 111111111111111011111111111111111101110000000000000000000000000000000000000000000000000000000000000000001 3599.990 111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000 3699.990 111111111111111101111111111111111101110000000000000000000000000000000000000000000000000000000000000000001 3799.990

At-Speed Simulation
There are no coverage rules for an at-speed analysis. It can be done using a timing verifier or a simulator. The output format is vendor-specific and in this case matches all other simulations performed for the circuit. The at-speed simulation is to be run at the specified maximum operating frequency of the circuit. The actual maximum frequency of this circuit is left as a class exercise.
Two simulations are shown: Figure 8-4 shows the maximum worst-case sampled simulation and Figure 8-5 shows the maximum worst-case print-on-change simulation. Sampling depends on the maximum frequency. The print-on-change file has an entry for each time that a monitored signal changes state. (This version of the circuit had 3-state outputs.)
Figure 8-4a At-Speed Sampled Simulation - 32-Bit Register (partial)
Figure 8-4b At-Speed Sampled Simulation - 32-Bit Register - Full listing
Figure 8-5a At-Speed Print-On-Change Simulation - 32-Bit Register (partial)
Figure 8-5b At-Speed Print-On-Change Simulation - 32-Bit Register - Full listing
Note that there are no switching restrictions for outputs in a vector for this simulation, which considerably reduces the size of the sampled vector output file in comparison to that for the functional simulation. The at-speed simulation is never run on a tester. The reset is handled as before.
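The relationship between the two file types can be sketched in a few lines: a sampled file is what one would see by reading the print-on-change record at fixed instants. The sketch below is illustrative only. The function name, the two-character state strings and the event list are hypothetical, and the 0.010 ns lead before each step boundary is read off the sample times in the listings (14.990, 29.990, ...) rather than taken from a stated vendor rule.

```python
from bisect import bisect_right


def sample_from_print_on_change(events, step_ns, count, lead_ns=0.010):
    """Derive sampled rows from print-on-change events.

    events: list of (time_ns, state) sorted by time; each entry records the
    signal state from that time until the next entry.
    Samples are taken at k*step_ns - lead_ns for k = 1..count (assumed rule).
    """
    times = [t for t, _ in events]
    rows = []
    for k in range(1, count + 1):
        t = round(k * step_ns - lead_ns, 3)
        i = bisect_right(times, t) - 1          # last event at or before t
        state = events[i][1] if i >= 0 else None
        rows.append((t, state))
    return rows


# Hypothetical two-signal record: several changes occur inside each step,
# but only the state present at the sample instant reaches the sampled file.
events = [(1.219, "0X"), (7.926, "01"), (15.000, "11"), (23.560, "10")]
print(sample_from_print_on_change(events, step_ns=15.0, count=2))
```

Four print-on-change entries collapse to two sampled rows here, which is the size difference between the two kinds of file in miniature.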
Exercise
Create a complete at-speed vector set for the schematics shown in the Appendix of Chapter 3 (no 3-state outputs).
At-Speed Sampled (Partial) 1***CIRCUIT IDENTIFICATION = DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDEEEEEEDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTOOP IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIXXXNNNODODODODODODODODODODODODODODODODODODODODODODODODODODODODODODODODEEA 00000000001111111111222222222233TTTABC0O0O0O0O0O0O0O0O0O0O1O1O1O1O1O1O1O1O1O1O2O2O2O2O2O2O2O2O2O2O3O3ONNR 01234567890123456789012345678901RCC 001020304050607080900111213141516171819102122232425262728292031312A SKA 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 M TAN TIME 14.990 00000000000000000000000000000000001111XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX001 29.990 000000000000000000000000000000000101110000000000000000000000000000000000000000000000000000000000000000001 44.990 000000000000000000000000000000001011110000000000000000000000000000000000000000000000000000000000000000001 59.990 000000000000000000000000000000001101110000000000000000000000000000000000000000000000000000000000000000001 74.990 111111111111111111111111111111110011110000000000000000000000000000000000000000000000000000000000000000001 89.990 111111111111111111111111111111110101111111111111111111111111111111111111111111111111111111111111111111001 104.990 000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111111111111111001 119.990 000000000000000000000000000000000101110000000000000000000000000000000000000000000000000000000000000000001 o o o
At-Speed Simulation - Print On Change - 32-Bit Register Last Edit October 8, 1996
File_name = ../bitregatspdpoc.vec File_type = PRINT_ON_CHANGE Test_type = AT_SPEED_SIMULATION Simulation_type = MAXIMUM Annotation_type = FRONT_ANNOTATION Vector_set_number = 1 Vector_set_revision = A Submission_time = Fri Aug 7 15:35:43 1992 ________________________________________________________________________ 1***CIRCUIT IDENTIFICATION = DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDEEEEEETTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTOOP IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIXXXNNNEEODODODODODODODODODODODODODODODODODODODODODODODODODODODODODODODODEEA 00000000001111111111222222222233TTTABCNN0O0O0O0O0O0O0O0O0O0O1O1O1O1O1O1O1O1O1O1O2O2O2O2O2O2O2O2O2O2O3O3ONNR 01234567890123456789012345678901RCC 12001020304050607080900111213141516171819102122232425262728292031312A SKA 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 M TAN TIME 1.219 0000000000000000000000000000000000111111XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX00X 7.926 0000000000000000000000000000000000111111XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX001 11.574 0000000000000000000000000000000000111111XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX001 14.990 0000000000000000000000000000000000111111XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX001 15.000 0000000000000000000000000000000001011111XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX001 23.560 00000000000000000000000000000000010111110X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X001 26.655 00000000000000000000000000000000010111110000000000000000000000000000000000000000000000000000000000000000001 29.990 00000000000000000000000000000000010111110000000000000000000000000000000000000000000000000000000000000000001 30.000 00000000000000000000000000000000101111110000000000000000000000000000000000000000000000000000000000000000001 44.990 
00000000000000000000000000000000101111110000000000000000000000000000000000000000000000000000000000000000001 45.000 00000000000000000000000000000000110111110000000000000000000000000000000000000000000000000000000000000000001 59.990 00000000000000000000000000000000110111110000000000000000000000000000000000000000000000000000000000000000001 60.000 11111111111111111111111111111111001111110000000000000000000000000000000000000000000000000000000000000000001 74.990 11111111111111111111111111111111001111110000000000000000000000000000000000000000000000000000000000000000001 75.000 11111111111111111111111111111111010111110000000000000000000000000000000000000000000000000000000000000000001 83.810 11111111111111111111111111111111010111111010101010101010101010101010101010101010101010101010101010101010001 84.110 11111111111111111111111111111111010111111111111111111111111111111111111111111111111111111111111111111111001 89.990 11111111111111111111111111111111010111111111111111111111111111111111111111111111111111111111111111111111001 90.000 00000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111111111001 104.990 00000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111111111001 105.000 00000000000000000000000000000000010111111111111111111111111111111111111111111111111111111111111111111111001 113.560 00000000000000000000000000000000010111110101010101010101010101010101010101010101010101010101010101010101001 116.655 00000000000000000000000000000000010111110000000000000000000000000000000000000000000000000000000000000000001 119.990 00000000000000000000000000000000010111110000000000000000000000000000000000000000000000000000000000000000001 120.000 00000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000001 120.816 00000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000111 130.107 
00000000000000000000000000000000001000000Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z111 134.990 00000000000000000000000000000000001000000Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z111 135.000 00000000000000000000000000000000010000000Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z111 149.990 00000000000000000000000000000000010000000Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z111 150.000 11111111111111111111111111111111001111110Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z111 151.219 11111111111111111111111111111111001111110Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z0Z001 160.016 11111111111111111111111111111111001111110000000000000000000000000000000000000000000000000000000000000000001


AC Test
Simulation for AC test measurements is optional. If measurements are to be done, however, the simulations are required. As with at-speed, both sampled and print-on-change files are required. Sampling is done at 100 ns intervals. These files show 16 AC tests, propagation path measurements only. A test is one input to one output and one edge (rising or falling). The reset is handled as before at the beginning of the file.
AC test submission is complicated, and extensive information is required for each test. AMCC offers AMCCSUBMIT, test submission software, to help with the process. Such support is vendor-specific, as is the vector check software.
Figure 8-6 shows a sampled AC test file (16 tests) and Figure 8-7 shows the same tests, print-on-change. (This is not the Q20000 circuit.) These files pass the AMCC vector check software.
Figure 8-6 AC Test Sampled Simulation - 32-Bit Register (partial)
Figure 8-6 AC Test Sampled Simulation - 32-Bit Register - Full listing
Figure 8-7a AC Test Print-On-Change Simulation - 32-Bit Register (partial)
Figure 8-7b AC Test Print-On-Change Simulation - 32-Bit Register - Full listing
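Since a test is defined as one input, one output and one edge, a propagation delay can be read from a print-on-change record as the time between the input edge and the next change on the output. The sketch below assumes a simplified record with only two monitored signals (input first, output second). The function name and data layout are hypothetical, and the event times are loosely modeled on the print-on-change listing. It is not a substitute for AMCCSUBMIT or the vendor's checks.

```python
def propagation_delays(events, in_col=0, out_col=1):
    """Pair each input edge with the next output change.

    events: list of (time_ns, state) in time order, state a string with one
    character per monitored signal ('0', '1', 'X' or 'Z').
    Returns a list of (input_edge, delay_ns) tuples.
    """
    delays = []
    pending = None                      # (time, edge) of the latest input edge
    prev = events[0][1]
    for t, state in events[1:]:
        if state[in_col] != prev[in_col]:
            edge = "rise" if state[in_col] == "1" else "fall"
            pending = (t, edge)
        out_changed = state[out_col] != prev[out_col]
        # Only count clean 0/1 transitions on the output, not X or Z states.
        clean = state[out_col] in "01" and prev[out_col] in "01"
        if pending and out_changed and clean:
            delays.append((pending[1], round(t - pending[0], 3)))
            pending = None
        prev = state
    return delays


# Hypothetical record: input rises at 500 ns, falls at 700 ns.
events = [(400.000, "00"), (500.000, "10"), (508.810, "11"),
          (700.000, "01"), (708.560, "00")]
print(propagation_delays(events))
```

For this record the sketch reports a rising-edge delay of 8.81 ns and a falling-edge delay of 8.56 ns.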
Exercise
Create a complete AC test vector set for the schematics shown in the Appendix of Chapter 3.
Exercises
1. For your version of the 32-bit register (assigned earlier), create functional simulation vector files following the rules for the array vendor selected.
2. What other simulations are required?
3. Review the vendor design submission package.
4. What vector checking software tools are available?
5. For the circuit shown, obtain a Q20000 Design Guide and compute the actual maximum frequency of operation for this circuit. (Refer to Chapters 5 and 6.)
6. For your version of the 32-bit register, compute the maximum frequency of operation and the resulting at-speed sample step required for the simulation file.
AC Test Simulation - Sampled - 32-Bit Register Last Edit October 8, 1996 1***CIRCUIT IDENTIFICATION = DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDEEEEEEDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTOOP IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIXXXNNNODODODODODODODODODODODODODODODODODODODODODODODODODODODODODODODODEEA 00000000001111111111222222222233TTTABC0O0O0O0O0O0O0O0O0O0O1O1O1O1O1O1O1O1O1O1O2O2O2O2O2O2O2O2O2O2O3O3ONNR 01234567890123456789012345678901RCC 001020304050607080900111213141516171819102122232425262728292031312A SKA 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 M TAN TIME 99.990 00000000000000000000000000000000001111XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX001 199.990 000000000000000000000000000000000101110000000000000000000000000000000000000000000000000000000000000000001 299.990 000000000000000000000000000000001011110000000000000000000000000000000000000000000000000000000000000000001 399.990 000000000000000000000000000000001101110000000000000000000000000000000000000000000000000000000000000000001 499.990 000000000000000000000000000000000011110000000000000000000000000000000000000000000000000000000000000000001 599.990 010000000000000000000000000000000101110011000000000000000000000000000000000000000000000000000000000000001 699.990 010000000000000000000000000000000011110011000000000000000000000000000000000000000000000000000000000000001 799.990 000000000000000000000000000000000101110000000000000000000000000000000000000000000000000000000000000000001 899.990 000000000000000000000000000000000011110000000000000000000000000000000000000000000000000000000000000000001 999.990 000000001000000000000000000000000101110000000000000000110000000000000000000000000000000000000000000000001 1099.990 000000001000000000000000000000000011110000000000000000110000000000000000000000000000000000000000000000001 1199.990 000000000000000000000000000000000101110000000000000000000000000000000000000000000000000000000000000000001 1299.990 
000000000000000000000000000000000011110000000000000000000000000000000000000000000000000000000000000000001 1399.990 000000000000000001000000000000000101110000000000000000000000000000000000110000000000000000000000000000001 1499.990 000000000000000001000000000000000011110000000000000000000000000000000000110000000000000000000000000000001 1599.990 000000000000000000000000000000000101110000000000000000000000000000000000000000000000000000000000000000001 1699.990 000000000000000000000000000000000011110000000000000000000000000000000000000000000000000000000000000000001 1799.990 000000000000000000000000010000000101110000000000000000000000000000000000000000000000000011000000000000001 1899.990 000000000000000000000000010000000011110000000000000000000000000000000000000000000000000011000000000000001 1999.990 000000000000000000000000000000000101110000000000000000000000000000000000000000000000000000000000000000001 2099.990 000000000000000000000000000000000011110000000000000000000000000000000000000000000000000000000000000000001
AC Test Simulation - Print on change - 32-Bit Register Last Edit October 8, 1996 1***CIRCUIT IDENTIFICATION = DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDEEEEEEDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTDTOOP IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIXXXNNNODODODODODODODODODODODODODODODODODODODODODODODODODODODODODODODODEEA 00000000001111111111222222222233TTTABC0O0O0O0O0O0O0O0O0O0O1O1O1O1O1O1O1O1O1O1O2O2O2O2O2O2O2O2O2O2O3O3ONNR 01234567890123456789012345678901RCC 001020304050607080900111213141516171819102122232425262728292031312A SKA 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 M TAN TIME 1.257 00000000000000000000000000000000001111XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX00X 7.926 00000000000000000000000000000000001111XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX001 11.612 00000000000000000000000000000000001111XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX001 100.000 00000000000000000000000000000000010111XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX001 108.560 000000000000000000000000000000000101110X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X0X001 111.655 000000000000000000000000000000000101110000000000000000000000000000000000000000000000000000000000000000001 200.000 000000000000000000000000000000001011110000000000000000000000000000000000000000000000000000000000000000001 300.000 000000000000000000000000000000001101110000000000000000000000000000000000000000000000000000000000000000001 400.000 000000000000000000000000000000000011110000000000000000000000000000000000000000000000000000000000000000001 500.000 010000000000000000000000000000000101110000000000000000000000000000000000000000000000000000000000000000001 508.810 010000000000000000000000000000000101110010000000000000000000000000000000000000000000000000000000000000001 509.110 010000000000000000000000000000000101110011000000000000000000000000000000000000000000000000000000000000001 600.000 
010000000000000000000000000000000011110011000000000000000000000000000000000000000000000000000000000000001 700.000 000000000000000000000000000000000101110011000000000000000000000000000000000000000000000000000000000000001 708.560 000000000000000000000000000000000101110001000000000000000000000000000000000000000000000000000000000000001 711.655 000000000000000000000000000000000101110000000000000000000000000000000000000000000000000000000000000000001 800.000 000000000000000000000000000000000011110000000000000000000000000000000000000000000000000000000000000000001 900.000 000000001000000000000000000000000101110000000000000000000000000000000000000000000000000000000000000000001 908.810 000000001000000000000000000000000101110000000000000000100000000000000000000000000000000000000000000000001 909.110 000000001000000000000000000000000101110000000000000000110000000000000000000000000000000000000000000000001 1000.000 000000001000000000000000000000000011110000000000000000110000000000000000000000000000000000000000000000001 1100.000 000000000000000000000000000000000101110000000000000000110000000000000000000000000000000000000000000000001 1108.560 000000000000000000000000000000000101110000000000000000010000000000000000000000000000000000000000000000001 1111.655 000000000000000000000000000000000101110000000000000000000000000000000000000000000000000000000000000000001 1200.000 000000000000000000000000000000000011110000000000000000000000000000000000000000000000000000000000000000001 1300.000 000000000000000001000000000000000101110000000000000000000000000000000000000000000000000000000000000000001 1308.810 000000000000000001000000000000000101110000000000000000000000000000000000100000000000000000000000000000001 1309.110 000000000000000001000000000000000101110000000000000000000000000000000000110000000000000000000000000000001 1400.000 000000000000000001000000000000000011110000000000000000000000000000000000110000000000000000000000000000001 1500.000 
000000000000000000000000000000000101110000000000000000000000000000000000110000000000000000000000000000001 1508.560 000000000000000000000000000000000101110000000000000000000000000000000000010000000000000000000000000000001 1511.655 000000000000000000000000000000000101110000000000000000000000000000000000000000000000000000000000000000001 1600.000 000000000000000000000000000000000011110000000000000000000000000000000000000000000000000000000000000000001 1700.000 000000000000000000000000010000000101110000000000000000000000000000000000000000000000000000000000000000001 1708.810 000000000000000000000000010000000101110000000000000000000000000000000000000000000000000010000000000000001 1709.110 000000000000000000000000010000000101110000000000000000000000000000000000000000000000000011000000000000001 1800.000 000000000000000000000000010000000011110000000000000000000000000000000000000000000000000011000000000000001 1900.000 000000000000000000000000000000000101110000000000000000000000000000000000000000000000000011000000000000001 1908.560 000000000000000000000000000000000101110000000000000000000000000000000000000000000000000001000000000000001 1911.655 000000000000000000000000000000000101110000000000000000000000000000000000000000000000000000000000000000001 2000.000 000000000000000000000000000000000011110000000000000000000000000000000000000000000000000000000000000000001
Faults and Fault Detection
Introduction
In any circuit composed of logic gates, a fault can occur. A fault is defined to have occurred when any circuit output variable assumes a value (1, 0 or X) that differs from the value expected. When this occurs, the circuit violates the original circuit equations. Fault detection requires that a vector test set provide a test to detect when any fault has occurred in a circuit path. Fault location requires that sufficient tests be included such that the specific failing node can be located. In general, fault location requires a larger vector set.
Controllability
A circuit is judged on how easily its input variables can set internal and output variables to specific values. This is defined as circuit controllability. The ideal case is when internal variables can be set with one input vector (input variable configuration). The worst cases require that multiple vectors be gated through the circuit until a target internal node is set or forced to a desired value.
Observability
A circuit is judged on how easily the values on input and internal variables can be propagated to a primary, and therefore observable, output. This is defined as circuit observability. The best case is when an internal variable propagates directly to an observable output within the vector time step. The worst cases require that multiple vectors be gated through the circuit until a target internal node has propagated to an observable output. In extreme cases, a fault may be undetectable.
Masking a Fault
The presence of an internal or input fault may not be observable at any circuit output. In this case the fault is considered to be masked. A single fault may be masked as the result of the causes shown in Table 9-1. Masked faults are undetectable by definition since the observed circuit behavior is correct. The occurrence of a second fault may uncover a previously undetectable fault. To be complete, a vector test set must include tests for this case.
Table 9-1 Single Fault Masking
● reconvergent fan-out where unequal parity changes have occurred
● circuit redundancy, deliberate or otherwise
● previous occurrence of an undetectable fault
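Masking by redundancy can be seen in a very small case. The sketch below uses a hypothetical redundant circuit, Y = A + A·B (which reduces to Y = A), chosen only for illustration. It compares the fault-free output with the output when the AND gate output is stuck, over all input vectors.

```python
from itertools import product


def y(a, b, and_fault=None):
    """Y = A + A*B, with an optional stuck-at value forced on the AND output."""
    and_out = (a & b) if and_fault is None else and_fault
    return a | and_out


def detectable(and_fault):
    """True if some input vector makes the faulty output differ from the good one."""
    return any(y(a, b) != y(a, b, and_fault)
               for a, b in product((0, 1), repeat=2))


print(detectable(0))  # AND output stuck at 0
print(detectable(1))  # AND output stuck at 1
```

With the AND output stuck at 0 the circuit still computes Y = A, so no vector exposes the fault and it is masked by the redundancy. Stuck at 1 forces Y = 1, which differs from the good circuit whenever A = 0.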
Fault Types
Faults may be indeterminate in value (suspended between logical 1 and logical 0) or determinate in value (exhibiting a 0 or a 1). They may be transient (intermittent, time varying), in which case they are elusive and difficult to detect. Faults may be permanent, i.e., considered hard or solid, in which case they may be detected if they are not masked, i.e., are observable.
Multiple faults occur when more than one fault occurs at one time. The probability of multiple faults occurring in a circuit is lower than the probability of a single fault, but it is increasing with the increase in circuit density. Single faults remain the most likely event. Multiple faults can occur in such a manner that they reduce to an equivalent single fault. In this case, the input vectors that test for the occurrence of the single fault also test for the occurrence of the multiple fault condition.
Fault Equivalencies
Several fault equivalencies are useful in fault detection but make fault location potentially more difficult. Some of these equivalencies are shown in Table 9-2 and two cases are diagrammed in Figure 9-1.
Table 9-2 Fault Equivalencies
● One or more inputs to an OR gate stuck at 1 is equivalent to the output of the OR gate stuck at 1.
● One or more inputs to an AND gate stuck at 0 is equivalent to the output of the AND gate stuck at 0.
● All inputs to an OR gate stuck at 0 is equivalent to the output of the OR gate stuck at 0.
● All inputs to an AND gate stuck at 1 is equivalent to the output of the AND gate stuck at 1.
● Failure of at least one input and the output (multiple faulting) will result in the output fault being propagated, masking the input faults.
● Any gate output fault has, as equivalent, one or more single gate input faults (the inputs not necessarily inputs to that gate).
● Any gate input fault does not necessarily have an equivalent gate output fault.
Figure 9.1 Fault Equivalencies
Single Stuck-At Faults
The most common fault for current technology is the single, permanent, stuck-at fault where one of the actions shown in Table 9-3 has occurred.
Table 9-3 Stuck-At Faults
● A gate output is stuck at logical 0
● A gate output is stuck at logical 1
● Any single gate input is stuck at 0
● Any single gate input is stuck at 1
The single fault assumption is not valid during design debug and prototype circuit checkout. Design errors are interpreted as faults during the design debug process. Process errors and mask errors produce faults during the prototype analysis phase. The single fault stuck-at test vectors can be used to assist the debug process, although it may be overkill to debug with vectors that are formatted under rigid tester requirement rules.
Component failures that alter or affect voltage levels, current levels, pulse widths or circuit timing, but which do not alter or affect the logical function realized by the circuit, will not be detected by stuck-at testing. Other testing is required to characterize the array.
Table 9-4 Miscellaneous Testing
● Voltage threshold levels, VIH and VIL, are tested during parametric testing.
● Device timing characterization is detected by either an AC speed monitor or AC tests.
● Thermal measurements of the array are made using thermal diodes.
Minimal Test Sets
Dozens of procedures have been developed to allow the creation of minimal test sets, those sets of input vectors that allow 100% fault coverage of all detectable fault locations in a circuit. Masked faults are not detectable. To be complete, a test set must be able to detect any single detectable fault, any multiple faults that are not single fault equivalent, and those faults that can be uncovered when another undetectable fault occurs (a special type of multiple fault).
Example test set The simple 2-state logic AND-OR circuit implementing Y = X0X1 + X2X3 has seven nets, including the four inputs and one output. There are fourteen unique single fault locations. Faults at the beginning and the end of a net are considered equivalent faults. There are 84 possible double fault combinations. All 84 double faults are covered by the fourteen single faults. All faults will be detected by the vector set which tests for the fourteen single stuck-at faults. (See Figure 9-2.) Other testing is required to characterize the array.
Figure 9.2 Example Circuit
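The fault counts quoted for the example circuit can be reproduced with a few lines of arithmetic. This sketch assumes that a double fault is an unordered pair of single stuck-at faults on two different nets (a net cannot be stuck at 0 and at 1 at once). The net labels X0 through X6 are used only as placeholders for the seven nets.

```python
from itertools import combinations

# Seven nets: four inputs, two internal nets and one output (labels assumed).
NETS = ["X0", "X1", "X2", "X3", "X4", "X5", "X6"]

# Each net can be stuck at 0 or stuck at 1.
SINGLE_FAULTS = [(net, value) for net in NETS for value in (0, 1)]

# Pair up single faults, discarding pairs that name the same net twice.
DOUBLE_FAULTS = [
    (a, b) for a, b in combinations(SINGLE_FAULTS, 2)
    if a[0] != b[0]
]

print(len(SINGLE_FAULTS), len(DOUBLE_FAULTS))  # prints 14 84
```

Under that counting assumption there are 91 unordered pairs of the 14 single faults, and the 7 pairs that put both stuck values on the same net are excluded, which leaves 84.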
As a further reduction, not all fourteen single faults are distinct. As an example, a test for X0 SA0 (stuck-at-0) also tests X4 SA0 and internal net X6 SA0. This means that the final minimal test set is something less than 2**4 = 16 tests.
The Problem The problem is to construct a complete and minimal test set such that any single fault condition is detected, provided that masking has not covered the effects.
Combinatorial Circuit
During research into applications of Svoboda's Boolean Analyzer, a method of deriving a Minimal Test Sequence was discovered. At that time it was applied to combinatorial circuits, and the results indicated that it could probably be applied to sequential circuits [Svoboda, White, 1974]. Since that time, the method for deriving test sequences for sequential circuits has been developed.
The first step in developing a Minimal Test Sequence is the derivation of the circuit equations. These have been found to be implementation-independent. They may be derived on a gate-by-gate basis from a particular implementation, but the resulting sequence will always be the same as that derived for a minimized implementation of the circuit. Internal nets (intermediate variables) are not required to be in the final equations. If intermediate variables are to be carried through, they are treated as unknowns. Primary output variables are treated as unknowns. Primary input variables are the knowns. A primary input is an input that is connected to an external source. A primary output is an output going to an external sink or connection.
After all equations for all gates are listed for a circuit, form the Existence Function by solving the equation:
( F = G ) <==> ( y = 1 )
where y is the output. Equations of the form F = G can be rewritten as:
( F G ' + F ' G = 0 ) <==> ( y = 0 )
where F G ' + F ' G = 0 expresses the validity requirements for the complement functions. The Existence Function contains all behavior properties of the circuit. Fault testing problems are solved by adding equations to the system equations or by processing the Existence Function after it is derived.
Formation Rules for the Existence Function The inputs and outputs for a vector represent a point on the Existence function map that has a value of one. The set of all points of value equal to one is the Existence Function. A Test Sequence is a sequence of tests or input vectors represented by a selected sequencing through the points of the Existence Function. A Minimal Test Sequence may not necessarily use all Existence Function points (minterms, vectors). The existence function for the sample circuit of Figure 9-2 is shown in Figure 9-3. Figure 9.3 Sample Circuit (2-Stage NAND)
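As an illustration of the formation rule, the points of the Existence Function for Y = X3X2 + X1X0 can be enumerated directly. This is a sketch under one assumption: the point number places Y as the most significant bit, followed by X3 X2 X1 X0, which is the numbering that the minterm column of Figure 9.5 appears to use (for example, minterm 29 is Y=1 with inputs 1101). The names y_of and EXISTENCE_POINTS are hypothetical.

```python
def y_of(inputs):
    """Fault-free output Y = X3*X2 + X1*X0 for a 4-bit input word X3 X2 X1 X0."""
    x3 = (inputs >> 3) & 1
    x2 = (inputs >> 2) & 1
    x1 = (inputs >> 1) & 1
    x0 = inputs & 1
    return (x3 & x2) | (x1 & x0)

# One point per input combination: the output value the circuit actually
# produces, packed above the inputs (assumed numbering: Y X3 X2 X1 X0).
EXISTENCE_POINTS = sorted((y_of(i) << 4) | i for i in range(16))

print(EXISTENCE_POINTS)
```

Only 16 of the 32 cells of a 5-variable map carry a one, because each input combination has exactly one valid output value.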
Forming Links
The points on the Existence Function, treated as minterms, are linked following the linkage rule. Two points connect if and only if:
● The input variables are logical distance one
● A primary output variable toggles 0-1 or 1-0
Logical distance one for the input variables means that only one input variable may change state when traversing the link from one minterm to another. The primary output must be an observable output (as shown in Figure 9-4). The requirement that only one input change per vector is designed to reduce the incidence of hazards and race conditions in the test vector set. Hazards are introduced during test when multiple inputs change state due to differences in the tester lead connections. The parametric vectors for the gate tree in the last chapter are written using the Minimal Test Sequence for the tree.
Figure 9-4 Adding The Logical-Distance-One Edges
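The linkage rule translates into a short search over pairs of input words. The sketch below applies it to Y = X3X2 + X1X0. It is an illustration only; the function names are hypothetical, and points are identified by their 4-bit input word X3 X2 X1 X0, since the output is fixed by the inputs.

```python
def y_of(inputs):
    """Fault-free output Y = X3*X2 + X1*X0 for a 4-bit input word X3 X2 X1 X0."""
    x3 = (inputs >> 3) & 1
    x2 = (inputs >> 2) & 1
    x1 = (inputs >> 1) & 1
    x0 = inputs & 1
    return (x3 & x2) | (x1 & x0)

def find_links():
    """Pairs of input words at logical distance one whose output toggles."""
    found = []
    for a in range(16):
        for bit in range(4):
            b = a ^ (1 << bit)          # flip exactly one input
            if a < b and y_of(a) != y_of(b):
                found.append((a, b))    # each link is recorded once
    return found

LINKS = find_links()
print(len(LINKS), LINKS)
```

For this function the rule yields 12 links. The all-ones input word has no link at all, because every single-input change from 1111 leaves the output at 1.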
Selecting a Chain
When all the possible links have been formed, there will be one or more observable chains or sets of links. The longest chain defines the desired Test Sequence.
Figure 9.5 Minimal Test Sequence for the Function Y = X3X2 + X1X0
Minterm #   Y   X3  X2  X1  X0
5           0   0   1   0   1
29          1   1   1   0   1
9           0   1   0   0   1
27          1   1   0   1   1
10          0   1   0   1   0
30          1   1   1   1   0
6           0   0   1   1   0
23          1   0   1   1   1
5           0   0   1   0   1
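For a function this small, the chain selection can be done by exhaustive search. The sketch below looks for the longest closed chain (one that returns to its starting point) in the link graph of Y = X3X2 + X1X0. The search strategy and the names are assumptions made for illustration, not the derivation used in the original work, and an exhaustive walk does not scale to large circuits.

```python
def y_of(inputs):
    """Fault-free output Y = X3*X2 + X1*X0 for a 4-bit input word X3 X2 X1 X0."""
    x3 = (inputs >> 3) & 1
    x2 = (inputs >> 2) & 1
    x1 = (inputs >> 1) & 1
    x0 = inputs & 1
    return (x3 & x2) | (x1 & x0)

def neighbours(a):
    """Input words linked to a: one input bit differs and Y toggles."""
    return [a ^ (1 << bit) for bit in range(4) if y_of(a ^ (1 << bit)) != y_of(a)]

def longest_closed_chain():
    """Exhaustive depth-first search for the longest simple closed chain."""
    best = []

    def walk(path):
        nonlocal best
        for nxt in neighbours(path[-1]):
            if nxt == path[0] and len(path) > 2 and len(path) > len(best):
                best = list(path)       # chain closes back on its start
            elif nxt not in path:
                walk(path + [nxt])

    for start in range(16):
        walk([start])
    return best

CHAIN = longest_closed_chain()
# Convert to the minterm numbering with Y as the most significant bit.
CHAIN_MINTERMS = [(y_of(i) << 4) | i for i in CHAIN]
print(CHAIN, CHAIN_MINTERMS)
```

The search finds a closed chain of eight points containing the same minterms as Figure 9.5 (5, 29, 9, 27, 10, 30, 6, 23). The direction of traversal may differ from the figure, which does not matter because the chain is closed.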
Note that, for each vector, one input changes state and the observable output changes state. Each input switches from 0-1 and from 1-0 during the test sequence. If the internal nets X4 and X5 were added to the table, they would also be observed to switch. This sequence provides 100% fault coverage for the function. There are cases when there is no longest chain. When a circuit is redundant there may be two chains or sequences of equal length and only one needs to be used for fault detection. Some faults will remain masked regardless of the sequence selected. Disjoint functions with terms that share no variable states will have a disjoint sequence, existing as two or more chains. A stepped sequence must be generated to connect the disjoint sequences, honoring the rule that only one input may change per vector.
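The coverage claim can be checked with a small single stuck-at fault simulation of the sequence in Figure 9.5. This is a minimal sketch assuming an AND-OR implementation in which the internal nets are the two AND gate outputs. The net names N4 and N5 and the helper names are placeholders chosen for this example.

```python
# Input words X3 X2 X1 X0 for minterms 5, 29, 9, 27, 10, 30, 6, 23.
SEQUENCE = [0b0101, 0b1101, 0b1001, 0b1011, 0b1010, 0b1110, 0b0110, 0b0111]
NETS = ["X0", "X1", "X2", "X3", "N4", "N5", "Y"]  # N4, N5: assumed AND outputs

def simulate(vector, fault=None):
    """Return Y for an input word, optionally with one net stuck at a value."""
    stuck_net, stuck_value = fault if fault else (None, None)
    value = {}

    def drive(net, v):
        # A stuck net ignores whatever is driven onto it.
        value[net] = stuck_value if net == stuck_net else v

    for i in range(4):
        drive(f"X{i}", (vector >> i) & 1)
    drive("N4", value["X3"] & value["X2"])
    drive("N5", value["X1"] & value["X0"])
    drive("Y", value["N4"] | value["N5"])
    return value["Y"]

FAULTS = [(net, v) for net in NETS for v in (0, 1)]
# A fault is detected if at least one vector shows a wrong output.
UNDETECTED = [f for f in FAULTS
              if all(simulate(x, f) == simulate(x) for x in SEQUENCE)]

# The closed sequence: each step, including the wrap-around, changes one input.
STEPS = list(zip(SEQUENCE, SEQUENCE[1:] + SEQUENCE[:1]))
ONE_INPUT_PER_STEP = all(bin(a ^ b).count("1") == 1 for a, b in STEPS)
OUTPUT_TOGGLES = all(simulate(a) != simulate(b) for a, b in STEPS)

print(len(FAULTS), UNDETECTED, ONE_INPUT_PER_STEP, OUTPUT_TOGGLES)
```

With these assumptions, no single stuck-at fault on the seven nets escapes the eight-vector sequence, and both sequencing rules hold on every step.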
Advantages of the test sequence
The advantages of the Minimal Test Sequence are listed in Table 9-5.
Table 9-5 Advantages Of The Minimal Sequence
● Reduction of test hazards
● One observable output changes per vector, allowing for straightforward identification of the existence of an error
● All variables are toggled from 0-1 or 1-0 and back
● Complete coverage of all detectable single faults
● Covers multiple faults
● Closed sequence which allows easy repetitive application
● Possible coverage of bridging faults (not yet researched)
Simple Gates - Sequences
The Existence Functions for the SSI gates NAND and NOR are shown in Figure 9-6. The Existence Functions for the SSI gates AND and OR are shown in Figure 9-7. In both figures, the gate, the logic equation it represents and a Marquand Map of the Existence Function are shown. Marquand maps date from the 1880s.
Figure 9.6 Simple SSI Gates: NAND, NOR
Figure 9.7 Simple SSI Gates: AND, OR
The logical distance 1 edges are also shown. The test sequence can start at any node, and traverses each link in both directions. The sequences for the AND and OR gates are shown in Figure 9-8. An analysis of the fault coverage for the AND gate sequence is shown in Figure 9-9.
Figure 9-8 AND, OR Gate Minimal Sequences
Y1  C B A      Y2  C B A
1   1 1 1      0   0 0 0
0   0 1 1      1   0 0 1
1   1 1 1      0   0 0 0
0   1 0 1      1   0 1 0
1   1 1 1      0   0 0 0
0   1 1 0      1   1 0 0
1   1 1 1      0   0 0 0
Figure 9.9 Sequence Analysis - AND Gate
Y1  C B A   FAULTS COVERED
1   1 1 1   Y1 SA0; A or B or C SA0
0   0 1 1   C SA1; Y1 SA1
1   1 1 1   Y1 SA0; A or B or C SA0
0   1 0 1   B SA1; Y1 SA1
1   1 1 1   Y1 SA0; A or B or C SA0
0   1 1 0   A SA1; Y1 SA1
1   1 1 1   Y1 SA0; A or B or C SA0
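The fault analysis for the AND gate can be regenerated by simulating each single stuck-at fault against each vector. The sketch below assumes a 3-input AND gate with inputs listed in C B A order, and it assumes that the third zero-output vector is C B A = 110, which is the vector needed to detect A SA1. The function and variable names are hypothetical.

```python
# Vectors in (C, B, A) order; the third zero-output vector is assumed to be 110.
SEQUENCE = [(1, 1, 1), (0, 1, 1), (1, 1, 1), (1, 0, 1),
            (1, 1, 1), (1, 1, 0), (1, 1, 1)]
FAULTS = [(net, v) for net in ("A", "B", "C", "Y1") for v in (0, 1)]

def and3(c, b, a, fault=None):
    """3-input AND gate with an optional single stuck-at fault (net, value)."""
    pins = {"C": c, "B": b, "A": a}
    if fault and fault[0] in pins:
        pins[fault[0]] = fault[1]       # stuck input pin
    if fault and fault[0] == "Y1":
        return fault[1]                 # stuck output overrides the gate
    return pins["C"] & pins["B"] & pins["A"]

def covered(vec):
    """Single stuck-at faults that change the output for this vector."""
    good = and3(*vec)
    return [f"{net} SA{v}" for net, v in FAULTS
            if and3(*vec, fault=(net, v)) != good]

TABLE = [(and3(*vec), vec, covered(vec)) for vec in SEQUENCE]
ALL_COVERED = {name for _, _, names in TABLE for name in names}

for y1, vec, names in TABLE:
    print(y1, vec, "; ".join(names))
```

Each all-ones vector covers the four stuck-at-0 faults, and each vector with a single zero covers that input stuck at 1 together with the output stuck at 1, so the seven-vector sequence covers all eight single faults of the gate.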
Extension to Three-State and Bidirectional Structures
Three-state and bidirectional structures, such as TTL three-state outputs and bidirectional I/O macros, may be tested using the Minimal Test Sequence. The states for which the Existence Function map shows a Z (for three-state) or a PAD (for a bidirectional macro in input mode) are mapped along with the conventional points with a value of one. The EN enable is treated as any other input. The same link formation rules apply. The differences are:
● A link may exist with 1 or 0 at one end and Z or PAD at the other.
● There are no PAD-PAD or Z-Z links.
● A 1-Z, Z-1, 0-Z or Z-0 link does not violate the rule requiring that an output change state on each successive vector.
Using the same basic rules as before with these additions, a Minimal Test Sequence was successfully developed for both a TTL three-state output and a bidirectional I/O macro.
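The extended link rule can be illustrated on the simplest three-state structure. The sketch below assumes a three-state buffer with an active-high enable EN and a data input D. The actual macro polarity is not stated in the text, so this is an assumption, as are the names used. The output is modeled as one of the symbols 0, 1 or Z.

```python
def tri_state(en, d):
    """Assumed active-high enable: drive D when EN = 1, otherwise float (Z)."""
    return str(d) if en else "Z"

# Existence Function points keyed by the input pair (EN, D).
POINTS = {(en, d): tri_state(en, d) for en in (0, 1) for d in (0, 1)}

def linked(p, q):
    """Link rule with the three-state extension."""
    one_input_changes = sum(a != b for a, b in zip(p, q)) == 1
    # 0-1, 0-Z and 1-Z all count as an output change; Z-Z does not,
    # because the two output symbols are equal.
    return one_input_changes and POINTS[p] != POINTS[q]

LINKS = sorted((p, q) for p in POINTS for q in POINTS if p < q and linked(p, q))
print(LINKS)
```

Under these assumptions three links exist: Z-0 and Z-1 through the enable, and 0-1 through the data input with the buffer enabled. The D change with the buffer disabled is a Z-Z pair and is rejected.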
Extension to Sequential Circuits
Sequential circuits are testable with the Minimal Test Sequence. The sequence is applied after the circuit is initialized. The Existence Function is developed by mapping the Qn points representing "hold previous value". These are handled in the same manner as the Z or PAD values of the three-state and bidirectional devices. Violations of the rule requiring an observable output to change state occur when a 1-Qn or 0-Qn link is used. In fact, a sequential circuit must test HOLD 0 and HOLD 1. The sequence must include 1-Qn-1 and 0-Qn-0 connections. A sequential device with a set or reset must be tested as shown in Table 9-6.
Table 9-6 Sequential Device
● Start with the circuit showing an unknown X state on the output(s)
● Execute a SET or RESET to initialize the device
● Begin the sequence, starting from the SET or RESET state
● For a device with SET and no output inversion, the toggle X-1 is tested but the toggle X-0 is not.
● For a device with RESET and no output inversion, the toggle X-0 is tested but the toggle X-1 is not.
Reference
For reference, test sequences for SSI logic are listed on page 259 of Advanced Logic Circuit Design Techniques (Svoboda, White; Garland STPM Press, 1974). Simple inverters and non-inverting drivers are not listed. Several combinational circuits are examined in detail. They were chosen based on their characteristics and cover a range of difficult test situations. A three-state output, a bidirectional I/O, a latch and a D flip/flop macro are examined here.
Case Study - 16:1 MUX D Flip/Flop Circuit
100% FAULT-GRADE VECTOR SET
The following circuit was developed as a teaching circuit and as such has parameters and labels beyond what would appear in an actual circuit schematic. These parameters have nothing to do with the required Functional, AC Test or Parametric Vector sets. A parametric gate-tree, used for VIH and VIL measurement, is included and its output signal is listed. A simulation format requires that all I/O signals and internal enable nets be listed. The test sequence for a 16:1 MUX was altered to allow clocking to occur between vector steps. The rule of one input per vector changing state is honored in that data and clock do not change in the same vector. The sequence begins after the circuit RESET is executed. Both the schematic set and a formatted (compacted) output vector set are shown here. The output vectors include input, output and enable signals.
The Marquand Map
The Marquand Map for logical analysis was proposed in a mathematical paper in the late 1800s. It is a convenient mapping method for large functions. The Karnaugh Map was developed in the 1950s specifically for 4-variable circuits (for coding) and is messier to use in these cases. Figure 9-10 through Figure 9-13 show different sizes of Marquand Maps with minterms labeled.
Figure 9-10 2-Input 1-Output Marquand Map
Figure 9-11 3-Input 1-Output Marquand Map
Figure 9-12 4-Input 1-Output Marquand Map
Figure 9-13 5-Input 1-Output Marquand Map (Split)
2:1 MUX Example Figure 9-14 shows a 2:1 MUX, its equation and its existence function. The minimal test sequence is shown in the darkened edges. The table shows the test vector (output listed first), what is tested, and what changes in each vector. Note that X3, the output, must always change, while only one input is allowed to change. Figure 9-14 2:1 Mux Test Sequence Analysis
Figure 9-15 shows the same circuit, but this time lists both of the possible test sequences. In this case, the sequences are equal in length and coverage (100%). Figure 9-15 Choosing A Sequence When Two Are Available
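A case with two chains of equal length can be reproduced in a few lines. The MUX equation appears only in the figure, so this sketch assumes X3 = X2'X0 + X2X1, with X2 as the select and X0, X1 as the data inputs. The function names are hypothetical. Input words are grouped into sets connected by links that obey the linkage rule.

```python
def mux(inputs):
    """Assumed 2:1 MUX: X3 = X2'*X0 + X2*X1, input word ordered X2 X1 X0."""
    x0 = inputs & 1
    x1 = (inputs >> 1) & 1
    x2 = (inputs >> 2) & 1
    return x1 if x2 else x0

def chains():
    """Group input words connected by distance-one, output-toggling links."""
    seen, groups = set(), []
    for start in range(8):
        if start in seen:
            continue
        group, todo = [], [start]
        while todo:
            a = todo.pop()
            if a in seen:
                continue
            seen.add(a)
            group.append(a)
            # Follow every link that leaves this point.
            todo += [a ^ (1 << b) for b in range(3)
                     if mux(a ^ (1 << b)) != mux(a)]
        groups.append(sorted(group))
    return groups

CHAINS = chains()
print(CHAINS)
```

With the assumed equation the eight points fall into two separate chains of four points each, so either one can be chosen as the test sequence.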
3:1 MUX Example
Figure 9-16 shows the Existence Function for the 3:1 MUX. To keep the map on one page, the Marquand map was broken into two halves. X5 is the output and X0-X4 are the inputs (3 data inputs and 2 select inputs).
Figure 9-16 3:1 Mux Existence Function
Figure 9-17 adds the logical-distance-1 edges. In each case, connect the two points iff (if and only if) one input changes state and one output changes state. Figure 9-17 Adding The Links
Figure 9-18 shows one sequence in dark edges. This is for a non-clocked circuit. The actual circuit simulated used a clock output, doubling the size of the vector set (change data in one step, change the clock in the next). Figure 9-18 Identifying The Sequence
16:1 MUX Actual Test Vectors Figure 9-19 Actual Test Vector Sequence For A 16:1 Mux With Clocked Output
Design Submission Last Edit September 2, 2001
On the completion of the design verification and simulation for the circuit, the design submission package is assembled. This package must pass through a design acceptance review before the array vendor will accept it. The requirement of design submission is specific: provide the array vendor with sufficient information to be able to evaluate the design, reproduce the simulation on their own internal systems, place and route the circuit, and produce Back-Annotated simulations that meet the customer's requirements. Design submission requirements vary widely in their specific forms and supplemental material. As an added support feature, AMCC created AMCCSUBMIT, a design submission documentation program that prompts for design validation questions and for information required to document submitted simulation files. To better discuss this process, the following is the current AMCC design submission document, referencing AMCCSUBMIT and the validation questions. Regardless of the forms it may take or be required to be in, the same basic information is required for any design submission. Variations occur when the array vendor is contracted to perform some or all of the design process to the customer's specifications.
Case Study: AMCC Design Submission
Submitting A Circuit Design To AMCC (1990)
The following document has been designed to assure the AMCC customer of a successful transition from concept to finished part. It is a summary of the items required for the submission of a BiCMOS or bipolar array-based circuit design to AMCC when:
● the schematic capture and simulation vector generation have been performed by the customer;
● the design entry is via a description language or netlist and simulation vector generation has been performed by the customer.
The items described herein must be submitted to AMCC for use in the acceptance design review prior to committing the design to layout. These are the critical information transfers which, if not completed, may delay the acceptance of the circuit.
Schematics are to be prepared following AMCC schematic conventions. Refer to the design manual for further information. Sections provide a detailed description of the AMCC rules for both EWS-generated and non-EWS-generated schematics. A design cannot be processed without the required test vectors. Functional simulation and at-speed simulation or timing verification are required prior to releasing a design to layout. Parametric simulation may also be required. AC Tests are optional. For information on the test vector requirements and simulation procedures, refer to Volume II, Section 4, "Vector Submission Rules and Guidelines".
● AMCC Array Design Submission With AMCCSUBMIT
AMCCSUBMIT must be run and the questions answered. It will produce a summary of the documentation required for a successful design submission. Where information is requested that is not available due to the type of design submission and the contract, indicate why the information is not being supplied.
● AMCC Assigned Circuit Name (required identification)
● AMCC PRODUCT_NAME
When AMCC has received a purchase order for a circuit, AMCC assigns a code name (a.k.a. circuit name, product name) to protect the proprietary nature of the circuit. This code name should appear on the submission document and on all hardcopy documents as a single point of reference. It should be attached to the schematic chip macro as the PRODUCT_NAME parameter. If no such name assignment exists, use the first 4-6 letters of the company name as the code name ID.
● AMCC Assigned DEVICE_NUMBER [may use default]
Assigned with the PRODUCT_NAME, the DEVICE_NUMBER serves to identify the individual circuit. It may be defaulted if no number has been assigned.
● Array Series and Specific Array
Clearly identify which array family and which array within that family is the one used for this design. The selection should match the chip macro identification.
● Macro Library Version (required identification)
The macro library release version number is on the label of the release media and on the upper right hand corner of the chip macros. The version used for the drawings and all execution files must be supplied on the submission checklist. Designers should always verify that they have the latest version of a library immediately prior to beginning schematic capture.
● MacroMatrix Release Version
The MacroMatrix release number may vary from the library release. Supply this number on the checklist.
● EWS System
● Operating System and Software Version
AMCC-system release compatibility is defined in the Operation and Installation section of the AMCC Design Manual and is the minimum release level that is compatible with the AMCC MacroMatrix package. Where a CAD/CAE software package is available on several platforms, that information is also clearly defined.
● Index file for Floppy Disks; Tapes (required)
Each floppy disk (tape) submitted must contain its own text index file (read.me) listing all files contained on that disk (tape), with a description of each file. Each disk (tape) should be clearly labeled as to its name and contents. All file names should be meaningful. One disk (tape) should contain a master index file listing all disk (tape) files and their contents. A hardcopy of this master index file should also be submitted. All files should be on AMCC-readable magnetic media.
● Schematic Drawing Pages (may be required)
The final versions of all submitted schematic pages are to be on both magnetic media and in hardcopy format. Two sets of the printed schematics (minimum) are required; three sets for non-EWS schematics. One set must be clearly marked (highlighted) to show AC Test paths and critical paths. Label AC Test paths with the documentation AC Test number. Label the critical paths with the timing correlation report path number.
● AGIF Netlist - circuit.sdi (required)
The final version of the AGIF (AMCC Generic Interface Format) netlist that was used as input to the MacroMatrix AMCCERC, AMCCANN, AMCCSIMFMT, AMCCVRC and AMCCSUBMIT programs must be included on the media (disk or tape). These programs require the use of the AGIF netlist. The netlist file is called circuit.sdi and is located in the .../ERC subdirectory. There should only be one of these files for any given set of simulations running on the same library (MILMAX, etc.).
● AMCCERC (Rules Check) Report (required)
The printed AMCCERC report, AMCCERC.LST, must be submitted with the hardcopy and media documentation. Any error messages must be accompanied with an AMCC waiver (approved Pre-Approval Request or PAR) and a description as to their intended resolution. [Page 6-A-1; 6-A-2]
● Design Validation Review (required)
A complete and comprehensive design validation review is required for all design submissions and is prompted for by AMCCSUBMIT. In cases where there is no software support for design rules, the designer must perform the validation checks manually and submit the results.
● I/O List (required)
The list of all signal I/O, added power and ground pads and fixed power and ground pads, including signal I/O type (TTL standard, TTL open-collector, TTL 3-state, ECL 10K, ECL 100K, etc.) and clear identification of simultaneously switching outputs, is generated by AMCCERC and called AMCCIO.LST. It must be included in the hardcopy and media documentation.
● Annotation Level
Space is provided on the submission checklist for annotation level, either Front-Annotation or Back-Annotation. Back-Annotation submissions are currently handled as special cases.
● Maximum Operating Frequency
The specified maximum operating frequency is the frequency at which the At-Speed simulation must be run; specify it in MHz or GHz.
● Power
The total power in Watts is to be computed or estimated. This includes DC power and any AC power, following the directions in Volume I, Section 5 of the design manual specific to the AMCC array series. Specify any differences between the total power entered here and the power computed by AMCCERC. The difference may be the AC power computation, different termination loading or duty cycles.
● Annotation and Output Loading (required) AMCCPKG.LST AMCCANN output.dly
AMCCANN allows the package type, package pin capacitance and system loading to be specified for the circuit. Commentary information on frequency must be added for high-speed I/O (see Section 4). The list of all signals with pad placement (if known), package pin capacitance (Front-Annotation or Back-Annotation), and system load is called AMCCPKG.LST. It must be included in the hardcopy and media documentation.
The output.dly data file generated by AMCCANN must be included on media. The output capacitive load delays are included in the annotation files and are reflected in the simulation results. The delay files used in the simulations must be submitted on media. For Front-Annotation these are: FNTxxx[x].yyy, where xxx[x] is the simulation type (mil, com, min, nom, c5mx, c5mn, etc.) and yyy is the extension denoting the system used (dsy, val, men, ver, lsr, etc.). Back-Annotation uses BCKxxx[x].yyy. Refer to Volume I, Section 3 for a detailed list of the files used with a particular array series.
● Cross Reference (optional)
The AMCC cross-reference file, AMCCXREF.LST, can be submitted on media. It is currently not required.
● Function Description of the Circuit (optional)
AMCC prefers that a high-level functional description of the circuit on the array be included in the hardcopy documentation. This description should include the required performance of the circuit, any timing constraints, interface requirements, testing specifications and the operating environment.
● Block Diagram of the Circuit (optional)
Unless a hierarchical schematic capture has been performed, AMCC prefers that a top-level block diagram of the circuit as configured for the array be included in the hardcopy documentation. If a hierarchical schematic capture has been performed, the top-level schematic may be sufficient.
● Preplacement Requests (optional)
Preplacement of macros for critical timing requirements can be submitted to AMCC if the designer feels it to be necessary. All preplacement requests will be evaluated by AMCC, and the designer will be notified if they can or cannot be met.
● Pin-Out Requests (optional)
I/O placement may be of concern in a design, and the designer may wish to specify a pin-out request to AMCC. The final pin-out is driven by layout considerations and restrictions. The pin-out requests will be evaluated by AMCC, and the designer will be notified if they can or cannot be met.
● Critical Path and Timing Requirements (required)
The maximum required frequency of operation for the circuit and the expected performance for the critical paths should be clearly stated in the hardcopy documentation following the procedure discussed in "Vector Submission Rules and Guidelines" in the AMCC design manual. Use the Timing Correlation report form for any path not covered by an AC Test.
● Waivers - Approved PARS
Attach any preapproval requests that have been approved to the design submission package and list them by number on page 6-A-2. These include custom macros, design rule variations, approved AMCCERC errors, approved timing check errors, and approved AMCCVRC errors.
Functional Simulation Submission (Required) The functional simulation output vectors, consisting of all of the input and the expected output signals generated by the input stimulus file, are the actual vectors used by the AMCC test department to verify the correct functional operation of the part. The customer must supply all functional simulation vectors. AMCC uses functional simulation output vectors for fault grading and recommends 90% or better fault coverage of the final circuit. AMCC requires that functional simulations be performed once using the minimum library and once using the maximum worst-case library. Refer to Volume I, Section 3 for the specific series to determine which library is maximum and which minimum for the array. The minimum simulation should exactly match the maximum simulation. If not, a potential timing problem could exist. Resolve the problem or consult with AMCC and send in both the minimum and maximum functional simulations. The functional simulation should be performed following the procedures described in the design manual.
Functional Simulation Documentation
The documentation required for a functional simulation submission is prompted for by the AMCCSUBMIT program interface. Before running AMCCSUBMIT, simulation files for the paths in sampled format should exist. All AMCCSUBMIT errors should be resolved prior to submission.
Documentation included on disk or tape for each functional simulation file includes:
● the sampled AMCCSIMFMT file name (file is UNEDITED);
● whether this was a MIN or a MAX simulation;
● PRODUCT_GRADE is prompted in the form header (identifies the annotation file used);
● the simulation control file name;
● the simulation input file name, if any;
● the simulation command file name which may be submit procedures, a transcript of system operation during simulation, including VIEW, RUN, START, LIST, WRITE, etc. statements as required for the specific system;
● the AMCCVRC.LST report file name (rename since several are included per submission);
● the signal analysis file used with AMCCVRC;
● the name of the text file version of the data or the control file which clearly describes the testing performed;
● the timing check report for the file (all simulations done for submission use timing checks active);
● the AMCCVRC.LST report, commented for errors and AMCC waivers.
At-Speed Simulation Submission (Required) The submitted at-speed simulation is rerun after layout to verify the actual timing performance for the array. The Back-Annotation delay file, which provides the actual metal delays for the layout, is used in place of the Front-Annotation delay file. The circuit.pkg file provides package pin capacitance based on final placement to replace the estimates for package pin capacitance in the output.dly file. At-speed simulation results provide an evaluation of the timing performance and help identify timing violations and potential timing problems. The customer must supply all at-speed simulation vectors. AMCC requires that at-speed simulations be performed at the specified maximum operating frequency, and be performed using the minimum library and then again using the maximum worst-case library. Refer to Volume I, Section 3 for the specific series to determine which library is maximum and which minimum for the array. The at-speed simulation, if required, should be performed following the procedures described in the design manual.
At-Speed Simulation Documentation
The documentation required for at-speed simulation submission is prompted for by the AMCCSUBMIT program interface. Before running AMCCSUBMIT, simulation files for the paths, both in sampled and in print-on-change format, should exist. All AMCCSUBMIT errors should be resolved prior to submission.
Documentation included on disk or tape for the sampled at-speed simulation file includes:
● the file number (1..n); no paging of at-speed files is required or necessary;
● the sampled AMCCSIMFMT file;
● whether this was a MIN or MAX simulation;
● PRODUCT_GRADE is prompted on the form header (identifies the annotation file);
● the simulation control file name;
● the simulation input file name, if any;
● the simulation command file name which may be submit procedures, a transcript of system operation during simulation, including VIEW, RUN, START, LIST, WRITE, etc. statements as required for the specific system;
● the name of the text file version of the data or the control file which clearly describes the testing performed;
● the timing check report for the file (all simulations done for submission use timing checks active).
Documentation included on disk or tape for the print-on-change at-speed simulation file includes:
● everything included to document the sampled at-speed simulation, except that the sampled AMCCSIMFMT file is replaced by the print-on-change AMCCSIMFMT file.
AC Tests Path Delay Vector Submission (Optional)
AC Testing is an automated testing methodology used to examine in close detail circuit performance measurements of propagation path delay. A separate set of vectors is required for each measurement, and each set must initialize and bias the path and provide the means by which the measurement can be made. If this testing option is desired, then these vectors must be supplied. They are in addition to functional simulation vectors. A maximum of 10 paths and 20 tests can be tested per array. Refer to the "Vector Submission Rules and Guidelines" concerning a concatenated set of vectors or the use of one set of vectors for multiple, individual tests.
AMCC requires that AC Test simulations be performed using the minimum library and then again using the maximum worst-case library. Refer to Volume I, Section 3 for the specific series to determine which library is maximum and which minimum for the array. The minimum simulation should exactly match the maximum simulation. If not, a potential timing problem could exist. Resolve the problem or consult with AMCC and send in both the minimum and maximum AC Test simulations. The AC Test simulations should be performed following the procedures described in the design manual.
Documentation of AC Test Simulation [AMCCSUBMIT]
The documentation required for an AC Test simulation submission is prompted for by the AMCCSUBMIT program interface. Before running AMCCSUBMIT, simulation files for the paths, both in sampled and in print-on-change format, should exist. The program is iterative, allowing tests to be documented one at a time or a test to be partially documented in one run and completed in another. All AMCCSUBMIT errors should be resolved prior to submission.
Parametric Vectors (Required Or Optional)
Parametric testing is optional, provided the number of simultaneously switching outputs (SSO) in the output vectors does not exceed the limit of 16 (mixed mode) or 32 (100% TTL or 100% ECL) switching outputs per vector. If parametric vectors are required, the customer must supply all parametric simulation vectors.
AMCC requires that Parametric simulations be performed using the minimum library and then again using the maximum worst-case library. The minimum simulation should exactly match the maximum simulation. If it does not, a potential timing problem could exist. Resolve the problem or consult with AMCC and send in both the minimum and maximum parametric simulations. The parametric simulation should be performed following the procedures described in the design manual.
Parametric Simulation Documentation
The documentation required for the parametric simulation submission is prompted for by the AMCCSUBMIT program interface. Before running AMCCSUBMIT, simulation files for the paths in sampled format should exist. All AMCCSUBMIT errors should be resolved prior to submission.
Documentation included on disk or tape for each parametric simulation file includes:
● the file number (1..n) when 4K pages are submitted or when the functional simulation is a set of simulation files (Volume II, Section 4);
● the sampled AMCCSIMFMT file name (file is UNEDITED);
● whether this was a MIN or a MAX simulation; PRODUCT_GRADE is prompted in the form header (identifies the annotation file used);
● the simulation control file name;
● the simulation input file name, if any;
● the simulation command file name, which may be the submit procedures or a transcript of system operation during simulation, including VIEW, RUN, START, LIST, WRITE, etc. statements as required for the specific system;
● the AMCCVRC.LST report file name (rename since several are included per submission);
● the name of the text file version of the data or the control file which clearly describes the testing performed.
Hardcopy documentation consists of:
● the AMCCVRC.LST report commented for errors and AMCC waivers.
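The switching-output limit that governs parametric testing (16 per vector in mixed mode, 32 for 100% TTL or 100% ECL) lends itself to a simple count between consecutive output vectors. The sketch below assumes each output vector is a string with one character per output pin; that representation and the function names are hypothetical, and this is not how AMCCVRC itself is implemented.

```python
# Limits quoted in the text; everything else here is an illustrative assumption.
SSO_LIMIT_MIXED_MODE = 16    # mixed TTL/ECL designs
SSO_LIMIT_SINGLE_MODE = 32   # 100% TTL or 100% ECL designs


def switching_counts(output_vectors):
    """Count how many outputs change state between each pair of adjacent vectors."""
    counts = []
    for previous, current in zip(output_vectors, output_vectors[1:]):
        counts.append(sum(1 for a, b in zip(previous, current) if a != b))
    return counts


def exceeds_sso_limit(output_vectors, mixed_mode):
    """Return True if any vector switches more outputs than the stated limit."""
    limit = SSO_LIMIT_MIXED_MODE if mixed_mode else SSO_LIMIT_SINGLE_MODE
    return any(count > limit for count in switching_counts(output_vectors))
```

For example, 20 outputs toggling in one vector exceeds the mixed-mode limit but not the single-technology limit.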
AMCC Simulation Forms Vocabulary
The simulation submission forms require at least one functional simulation run at worst-case maximum. The items requested are:
File # Item number, 1, 2, 3...
AMCCSIMFMT OUTPUT FILENAME The user-defined name assigned to the formatted simulation output file (the vectors being submitted). The input to AMCCSIMFMT is from a VALID tabular trace output, MENTOR LIST, DAZIX VLAIF output file, etc.
MIN Check if this is a MINIMUM simulation. This assumes a MIN timing library and FNTMIN.ews or the equivalent annotation file.
MAX Check if this is a MAXIMUM simulation. This assumes a maximum for the product_grade timing library and FNTMIL.ews, FNTCOM.ews or the equivalent Q20000 or Q24000 annotation delay file (COM5MAXmax - fntc5mx.ews, MIL5MAXmax - fntm5mx.ews).
SIMULATION CONTROL FILENAME The file required by the EWS to input stimulus, etc. that was used to perform the submitted simulation. It is called the SOM_MCF.SING file on DAZIX, the transcript file on MENTOR, the directives file on VALID, etc.
SIMULATION INPUT FILENAME Any other required input file. Called the remote data VLAIF file on DAZIX, the stimulus input file on VALID. Mentor may have a data file called from the force file. The existence of this file depends on the method used to create the stimuli.
SIMULATION COMMAND FILENAME The transcript or log of operations or submit process, i.e., what it would take for AMCC to duplicate the simulation.
AMCCVRC REPORT FILENAME Always rename AMCCVRC.LST to a user-defined name since more than one is to be submitted. List the one that goes with the netlist and AMCCSIMFMT output file. AMCCVRC is not required for minimum simulations, for sampled at-speed simulations, or for print-on-change simulations.
AMCCVRC SIGNAL ANALYSIS FILENAME The clock race-condition test requires a signal analysis file to specify clock-data relationships. A user-defined name.
SIMULATION DESCRIPTION FILE The text file, which may be a commented copy of the AMCCSIMFMT output file, that describes the testing being performed.
Glossary
Introduction
This glossary contains industry-standard as well as AMCC-specific definitions. For AMCC software and files, refer to the AMCCERC User's Guide for further information on the items, if you can find one.
3-level gating Circuit design technique used internally in the AMCC Bipolar Logic Arrays for improved logic density.
A
AC Speed Monitor AMCC incorporated a 9-stage ring oscillator followed by a 2-state divide-by-4 counter as the basis of the monitor in the base array for the Q20000 Series arrays. Its use relieves the designer of developing AC test vectors to check part performance. It requires two pads.
AC test Testing performed with a tester and the packaged part and designed to sample the timing characteristics of the actual die to verify the timing of the actual circuit as produced. Only a few tests are needed since all paths on any single die will be similarly affected by the processing and environment and all timing delays are constrained to be within the temperature-voltage-process variation range.
Active components Active components of a die are transistors and diodes.
Added ground Interface cells may be designated as providing ground pads for either IEVCC or ITGND (0V). IEVCC is ground for standard reference ECL. Placement is user-controlled.
Added power Interface cells may be designated as providing power pads for either IEVCC or ITPWR (+5V). IEVCC is power for +5V REF ECL. Placement is user-controlled.
Adjustment factor A multiplier specified to allow conversion of a data value specified under one set of conditions to a value suitable for a second set of conditions. Adjustment factors may be used for typical to worst-case maximum or minimum conversion. Factors may be used to adjust for power supply, temperature or process variation. See worst-case multiplier.
AGIF AMCC generic interface format; used as the means of communicating between a workstation and the AMCC proprietary software tools.
ALU The arithmetic-logic unit, where data is processed according to the instruction under execution.
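To make the AC Speed Monitor entry concrete, the sketch below estimates the frequency seen at the monitor output for a given per-stage delay. It relies on the textbook ring-oscillator relation f = 1 / (2 x N x t_stage), which assumes identical inverting stages; the source gives no formula or stage delay, so both the relation's applicability to the AMCC monitor and the function name are assumptions.

```python
def monitor_output_frequency_hz(stage_delay_s, stages=9, divide_by=4):
    """Estimate the divided ring-oscillator frequency for a given stage delay.

    Assumes the textbook relation f = 1 / (2 * N * t_stage) for a ring of N
    identical inverting stages, followed by a divide-by-4 counter as
    described in the glossary entry.
    """
    if stage_delay_s <= 0:
        raise ValueError("stage delay must be positive")
    ring_frequency_hz = 1.0 / (2 * stages * stage_delay_s)
    return ring_frequency_hz / divide_by
```

Because the output frequency falls as the stage delay grows, a single frequency measurement gives a quick indication of where a part sits in the process, temperature and voltage range.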
AMCCANN AMCC annotation software. See also Front-Annotation and Back-Annotation.
AMCCAD AMCC place and route system.
AMCCERC AMCC engineering rules check software. See ERC.
AMCCERC.LST Report produced by AMCCERC.
AMCCIO.LST List of all pads on the array with descriptions of levels and types produced by AMCCERC.
AMCCPACKAGER The packaging database shipped with MacroMatrix to allow placement to be completed in the field. It allows a package to be selected and more refined estimates in the Front-Annotation delay files.
AMCCPKG.LST The documentation of the package, package pin capacitance and system load capacitance produced by AMCCANN using the user-interface, interactive input and the output.dly cumulative edits file.
AMCCSIMFMT AMCC simulation format preparation software program. Operates (currently) on LIST file input. Unique for each workstation simulator; versions exist for Dazix, Verilog, Mentor, Lasar, and Valid.
AMCCSUBMIT AMCC submission automation software. Prompts user for data and provides some screening of the specifications vs. the annotation results. Produces reports for use in design submission.
AMCCSUBMIT.ERR, AMCCSUBMIT.DAT, AMCCSUBMIT.LST Reports produced by AMCCSUBMIT.
AMCCVRC AMCC vectors rules checking software. Provides some screening of vectors for functional, AC test, and parametric simulations.
AMCCVRC.LST Vector rules check reports produced by AMCCVRC.
AMCCXREF.LST A cross-reference listing produced by running the AGIF netlister.
AR Asynchronous reset. See R.
AS Asynchronous set. See S.
Astable A circuit with two quasi-stable states; an oscillator.
At-speed simulation A simulation designed to examine the timing integrity of a circuit design by functioning the circuit at its specified maximum operating frequency.
Autoplace or auto place. Macro placement on internal array cells is performed automatically by software.
Autoroute or auto route. Interconnection of the macros is performed automatically by software.
B
B cell Old Usage: Buffer cell; can be used for some logic macros as well. Located around the periphery of the internal cell block in AMCC Q700 Series arrays and in the Q1500A Array. Capable of a higher row current limit (three times greater) than the logic cells. New Usage: Basic cell, the internal cells in the BiCMOS arrays.
Back-Annotation The method of simulation of a circuit after layout, where the loading on a particular net is the sum of the actual fan-out load (electrical load), the actual wire-OR load (electrical load), and the actual metal used to interconnect the pins. The load is a function of the array (die size) and is accurate within measurement device parameters.
Bandwidth Frequency range of performance for a device (as in the bandwidth of an amplifier).
Basic cell In the BiCMOS Gate arrays, a basic cell is an internal cell, similar in application to the bipolar array L cell.
BCKxxx[x].ews A back-annotation delay file where xxx[x] identifies the product_grade and power_supply, and ews defines the simulator or workstation for which the file has been formatted.
Bidirectional Capable of moving in either of two directions at any given time.
Bidirectional I/O cells Capable of processing input or output signals. See I/O cells.
Binary The base two number system where each digit is a power of two; the allowed digits are 1 and 0. In circuit representations of a binary equation the digits 0 and 1 are also called LOW and HIGH or TRUE and FALSE, the equivalent values of which depend on the polarity of the system.
Bipolar One silicon technology used in ICs; radiation-resistant, faster than CMOS at the expense of higher power dissipation.
Bit One Boolean digit: 0 or 1. X or # are the symbols used for "Don't care" and U is used for undefined or unknown.
Byte Originally meant enough bits to represent a character (code dependent in size); by default a byte is now usually taken as 8 bits. In 16-bit architectures, it is half a word.
Bundle A group of single wires; a bus.
C
CAD Computer-Aided Design.
CAE Computer-Aided Engineering.
CAI Computer-Aided Instruction.
CAM Computer-Aided Manufacturing.
Cell The smallest uniform repeatable unit on a logic array; there may be more than one type on a given array. A cell may or may not be the smallest addressable unit on the array. A cell denotes a group of active and passive elements on an array.
Cell utilization A measure (percentage) of the number of internal cells actually used in a design (accessible by a designer). The suggested limit is usually specified on the array series data sheet. Internal cell utilization is a more descriptive phrase.
Chip Slang for integrated circuit; it is used for a packaged die.
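The cell utilization entry amounts to a simple percentage compared against a data-sheet limit. A minimal sketch follows; the function names and the example limit are hypothetical, since the actual suggested limit comes from the array series data sheet.

```python
def cell_utilization_percent(cells_used, cells_available):
    """Internal cell utilization as a percentage of the available internal cells."""
    if cells_available <= 0:
        raise ValueError("cells_available must be positive")
    return 100.0 * cells_used / cells_available


def within_utilization_limit(cells_used, cells_available, limit_percent):
    """Compare utilization with a suggested limit (value supplied by the caller)."""
    return cell_utilization_percent(cells_used, cells_available) <= limit_percent
```

For instance, 750 cells used out of 1000 gives 75%, which would pass a hypothetical 80% limit and fail a 70% one.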
Chip macro A pseudo-macro that is used with an AMCC library as a method of communicating circuit and array-specific information to the software. E.g., array, cell count limits, cell types, overhead current, power supply, product grade, worst-case multipliers for current and time, if needed, power limit, cell usage limits, cell utilization, pad counts, and other data.
Circuit A logical function or functions constructed from electrical devices.
Circuit.pkg A data file produced by AMCCAD that contains the actual package pin capacitance by signal name, produced after placement.
Circuit.sdi The AMCC universal netlist produced by translating the workstation or simulator-unique netlist into the AMCC generic interface format (AGIF).
Circuit building block The basic unit available to simplify circuit design. SSI, MSI, LSI and VLSI represent the evolution of hardware building blocks.
Circuit density A measure of the number of equivalent 2-input NOR gates a design would require, usually rounded to the nearest 100 or 1000 (K) as in "800 gates" or "28K gates".
Clock, CLK, Clock signal In many AMCC macros the rising edge is taken as the active edge. Either edge may be the clocking or active edge in a circuit or a macro; complex systems use both edges to operate latches and registers.
CML Current mode logic is a variant of ECL. The AMCC Q20000 Series has CML macros. These are high frequency (600MHz up to >1.2GHz) macros. Terminated with 50 ohms to GROUND, they maintain a typical 500 mV swing.
CMOS or C MOS; complementary metal-oxide semiconductor. (In contrast to NMOS - N-channel metal-oxide semiconductor, PMOS - P-channel metal-oxide semiconductor.) Constructed using complementary (N- and P-) MOS field-effect transistors. CMOS has lower power consumption per gate and the highest gate densities per die. See MOS.
Common clock Reference to using the same clock signal on all devices on a board or all functions within a semi-custom array. A single common clock is the preferred and the simplest method of design. It is also easier to test.
Configurable cells Able to be altered to fit a particular application.
Conventional ECL Standard REF ECL, ECL 10K or ECL 100K; voltage supplies are -5.2V (ECL 10K) or -4.5V (ECL 100K).
CORxxx[x].ews Internal-net only delay files produced by AMCCAD for use by AMCCANN where xxx[x] identifies the product_grade and power_supply, and ews defines the simulator or workstation for which the files are formatted. Combined with output.dly and circuit.pkg, these files allow the creation of the BCKxxx[x].ews files.
Critical path The longest path (largest propagation and loading delay) through a circuit.
Custom macro One that is designed specifically to meet the customer's requirement and is not currently in the released macro library.
D
Delay offset Lag, time before start.
Design element Basic part of a design.
Demultiplexor Demux; a decoder; a device that allows one or more input lines to select between n output lines where the number of inputs is less than the number of outputs.
Design automation (DA) The tools used to automate the design and design verification from circuit creation to mask specification.
Design rule verification Software that performs the function of circuit-level design rule verification, or verification that electrical restrictions have not been violated. Wafer fabrication design rules (DRC) are a set of electrical and minimum physical parameters that can be guaranteed for the process.
DEVICE_NUMBER Another identification for a booked circuit. It appears as a parameter on the chip macro.
Die The silicon chip itself; plural is dice. A die is a non-packaged chip.
Differential ECL inputs Pairing of true and complement signals to allow dual-rail communication for increased noise immunity. Required for remote signals, off-board communication with +5V REF ECL and when operating at 80 MHz and higher.
Driver An internal macro that provides extra drive capability, extra current and can handle extra loads. A driver macro typically has lower valued k-factors and therefore can drive more loads at less time delay penalty than a standard macro. The k-factors may be balanced.
Dual-in-line (DIP) Type of package with leads perpendicular to the cavity and spaced .100 inch from each other on the same side and .300, .400 or .600 inch apart from the leads on the opposite side. Many low external pin count (< 64) circuits are packaged in DIPs. Power limitations are about 1 Watt.
E
+5V REF ECL The voltage supplies are +5V and GND; for use with TTL I/O in a mixed I/O mode design.
ECL Emitter-coupled logic. A differential switch-based logic family using parallel transistors and series switches. ECL is extremely high-speed. The normal power supply is -5.2V for ECL 10K and -4.5V for ECL 100K. Logical one is -0.8V; logical zero is -1.80V for -5.2V ECL 10K (an average voltage magnitude of 1.3V). The speed comes from the fact that the transistors within the gates are never driven into saturation, eliminating the time required for the transistors to come out of saturation.
ECL 10K See ECL.
ECL 100K ECL normally using a power supply of -4.5V; it is temperature-compensated.
ECL, pseudo This is +5V REF ECL, either ECL 10K or ECL 100K operated with a TTL power supply of +5V, to allow ECL functions on a TTL board.
EDIF Electronic Design Interchange Format, which AMCC will use to replace AGIF. This is a proposed standard (one of several). EDIF can already be used to transfer a Verilog HDL design to Mentor schematics.
Emitter-follower A circuit used to provide drive after the logic portion of an ECL circuit; the voltage gain of an emitter-follower is close to unity.
Equivalent gates A sizing methodology for gate arrays. There is no standard method of determining equivalent gate counts. This should not be used to "size" a circuit design.
ERC Engineering Rules Checks or engineering reports and checks. Support software on the various EWS designed to flag miss-connects (missed connections), naming errors, improper wire-ORs, excessive fan-out, GND checks, etc., and to generate reports on macro usage, current dissipation and loading.
EWS Engineering workstation; a computer-graphics system specifically oriented to support circuit design development from schematic capture through simulation, timing analysis, testability analysis, and eventually test pattern generation. Some can handle layout and PG tape pattern generation. At the minimum, it will produce a netlist from the captured schematic. Adding a framework allows easier use of 3rd party software and provides better support for the entire design process.
External hold time The time that data on an external package pin must be held stable after the arrival of the active edge of the clock at a second external package pin.
External package pin A package pin, usually carrying a signal. See package signal pin.
External pin A pin on the outside of the die used to interface the circuitry to the outside world.
External set-up time The time that data on an external package pin must be stable before the active edge of the clock arriving at a second external package pin.
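The external set-up and hold definitions together describe a window around the active clock edge in which the data pin must not change. A minimal sketch of that check follows; the function name and the choice of nanoseconds are illustrative assumptions, not anything defined by the AMCC tools.

```python
def meets_setup_and_hold(data_change_times, clock_edge, setup_time, hold_time):
    """Return True if no data transition falls inside the set-up/hold window.

    All times share one unit (for example ns). The window runs from
    clock_edge - setup_time to clock_edge + hold_time; a transition strictly
    inside it violates either the set-up or the hold requirement.
    """
    window_start = clock_edge - setup_time
    window_end = clock_edge + hold_time
    return all(not (window_start < t < window_end) for t in data_change_times)
```

With an active edge at 10 ns, a 3 ns set-up time and a 1 ns hold time, data transitions at 2 ns and 12 ns are acceptable, while a transition at 8 ns is a set-up violation.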
F
Falling-edge active A flip/flop or latch which can change state during the falling edge of the active clock and remains static during the rising edge of the active clock.
Fan-in The number of electrical loads presented by an input pin to the driving device; applies to macros within an array or to discrete devices.
Fan-out The number of components to which a signal is connected.
Fault, as in logical fault An error due to hard or soft failure such that the logical function implemented is not what is desired. (E.g., SA1, SA0, SAX)
Fault coverage The inclusion in the test set of a test to cover each possible, observable fault at least once. A measure of this coverage expressed in percent. Recommended coverage is 90% or higher.
Fault grading The process of estimating the percentage of faults tested by the test vectors. Fault grade software is available on the workstations and some simulator systems, but modeling differences make comparisons of fault grade scores difficult.
Fixed ground An array pad designated for use as a ground pad and not capable of being used for any other purpose. Usually, an array fixed ground pad must be bonded to the package to an internal ground plane or to a package pin.
Fixed power An array pad designated for use as a power pad and not capable of being used for any other purpose. Usually, an array fixed power pad must be bonded to the package to an internal power plane or to a package pin.
Flat-pack A minimum volume package with leads or connectors distributed radially on all four sides and parallel with the die cavity. Flat packs are commonly used where a high pin-count, very light weight packaging system is needed. Used in military and space systems.
FOD Fan-Out Derating. A net parameter (AMCC software) that specifies the percentage derating to be applied to the driving macro pin fan-out load limit.
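The fault coverage entry describes a percentage with a recommended floor of 90%. As a small worked sketch (the function names are hypothetical, and real fault graders differ in how they model and count faults):

```python
RECOMMENDED_COVERAGE_PERCENT = 90.0  # recommendation quoted in the glossary


def fault_coverage_percent(faults_detected, faults_total):
    """Fault coverage as the percentage of modeled faults detected by the test set."""
    if faults_total <= 0:
        raise ValueError("faults_total must be positive")
    return 100.0 * faults_detected / faults_total


def meets_recommended_coverage(faults_detected, faults_total):
    """True when coverage is at or above the recommended 90%."""
    return fault_coverage_percent(faults_detected, faults_total) >= RECOMMENDED_COVERAGE_PERCENT
```

Detecting 450 of 500 modeled faults gives exactly 90%, which just meets the recommendation; 449 of 500 does not.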
FNTxxx[x].ews A Front-Annotation data file where xxx[x] defines the product_grade and power_supply, and ews defines the simulator or workstation for which the file has been formatted.
FP Abbreviation for flat pack. Package type.
FPGA Field programmable gate array.
FPLA Field programmable logic array; "AND" array user-programmable, some pre-selected subset of the 2**n product terms is available from the n inputs (array-dependent); "OR" array user-programmable, any product term (of those available) can be ORed to the y outputs.
FPLS Field programmable logic sequencer.
Front-Annotation The method used to predict the loading on the internal net of a circuit by using the actual fan-out (electrical) load, the actual wire-OR (electrical) load, an averaged package pin capacitance, and the statistically estimated metal (physical) load. The estimate is the mean of the collected data of observations made on previous circuits. The estimate is a function of the fan-out and wire-OR loads, in terms of the physical pin connect and the array (die size). Output nets use actual system load capacitance and estimated package pin capacitance as the basis for time delays.
Full custom ICs All devices and components are designed specifically for the application. This approach takes the most design time, the most debug time, and is the most expensive. Savings in die size are not achievable unless the designer is very experienced. It is the most difficult approach. See Silicon compiler.
Function cell Internal logic cell capable of supporting one or more logic operations, depending on the cell size.
Function macro Internal macro; on bipolar arrays, operates at internal ECL levels (one-half volt ECL for AMCC bipolar arrays), placed on internal logic or buffer cell (optional placement). Operates at CMOS levels inside BiCMOS arrays.
Functional elements See function macro.
Functional simulation Simulation designed to examine the functional integrity of a circuit design.
Functionality Degree of density of a design; also the logical integrity of a Boolean circuit, independent of detailed speed or parametric behavior.
Fuse-programmable devices Example devices are PROMs, PLAs, and PALs.
G
Gate 1. Physical input to a CMOS transistor pair. 2. A logic gate such as a NOR or NAND. 3. A number of repetitive active element groups. 4. A transistor.
Gate array An array formed from elementary gates, usually 2-input NOR or 2-input NAND (CMOS) gates. A gate array is configured into a variety of logical circuits through customized interconnect. A gate array is often defined as a semicustom digital integrated circuit.
Gate delay The time it takes to propagate a signal through a gate.
Gate equivalent circuit Unit of measure. Equivalent gates refers to the number of gates of a given complexity (2-input NOR, 2-input NAND) that would be required to perform the same function.
GTO A net parameter (AMCC software) that is attached to the output signal for the parametric gate tree.
H
HDL Hardware Description Language from the IEEE Standards Subcommittee (IEEE SC20 ATPG).
Hierarchical structuring Generally a tree-like structure for top-down design from block diagram to detailed circuit. Depending on the EWS, this can involve nesting, blocks, cells (on-page-nest) and multiple directory levels.
Hertz; Hz Cycles/second.
High performance High-speed, high density; it may also mean high-power.
High speed macro option This version of the macro is designed for high speed applications with associated greater power dissipation.
I
I cell Accommodates macros for input type functions only. CMOS refers to these as dedicated input cells.
IC Integrated circuit.
I2L Integrated injection logic.
I/O cell Stands for Input/Output Cell. An interface cell that accommodates macros for input, output and bidirectional functions as well as 3-state enable drivers; where interface functions are performed.
Interconnect verification Validation of the interconnections between macros used to form a circuit.
Interface Taken to be the transition between external and internal levels of a device.
Interface cell utilization limit The percentage of interface cells that may be used. This limit applies to specific arrays that have a non-one-to-one ratio of interface cells to pads. In most cases 100% interface cell utilization is an assumed limit.
Interface macro A macro whose function is to perform the translation between external and internal device levels.
Internal cell utilization See cell utilization. A guideline for the percentage of internal cells that may be used before routing becomes difficult or impossible.
Internal logic cell Cells designed for logical functions and not for I/O; can accommodate certain buffers.
Internal logic macro library A set of macros that may be placed on the internal logic cells; they may or may not be restricted to those cells.
Internal macro Internal logic macro; a functional macro.
Internal pin A pin on a macro that is used to connect it to the other macros on the array (in the circuit). Internal pin count is a measure of routability.
J
Junction-isolated Early low-density technology.
Junction temperature See TJ. The temperature internal to the transistors on the array.
L
L cell See logic cell.
LLCC Abbreviation for leadless chip carrier. Package type.
LDCC Abbreviation for leaded chip carrier. Package type.
Leaded chip carrier See LDCC.
Leadless chip carrier See LLCC.
LED Light-emitting diode.
Load delays The amount of time delay caused by loading on the macro due to electrical effects of fan-out loading, wire-OR loading, and the physical metal etch. For output macros, the time delay caused by the system and package pin capacitive loads.
Logic array An array of predefined base-wafer transistor, diode and resistor components that can be configured into a variety of logical circuits by way of n-levels of metal interconnect. Levels number two or three for semi-custom arrays contain the logic.
Logic cell Internal cell, function cell. Cells designed for logical functions and not for I/O; can accommodate certain buffers.
Logic functions See functionality; includes AND, OR, NOT, NAND, NOR, INVERT and combinations of these.
Low-power macro A macro option designed to use less power than the standard option at the cost of slower propagation delays.
Low-power option Same as low-power macro; see macro options.
LSI Large Scale Integration, from 200 to 1000+ equivalent gates on a chip.
LSTTL Low power Schottky TTL.
M
Macro Pre-designed logical function with a name; may be multiple cells.
Macro functions Logic functions that are performed by the macros in a library.
Macro library The collected set of macros that are valid for use in a given array or array series.
Macro library element A macro, either simple (0.5 cell to n cells) or complex.
MacroMatrix An integrated set of software tools for logic array design.
Macro options Variations on the basic macro specification such as low-power, high-power, high-speed, termination or pull-down, pull-up resistors, etc. The propagation time and current specifications of the options will vary from the standard while the function performed does not. Drivers do not usually have options. Different ECL types and power supply configurations are handled by macro versions.
Macro oriented An array arranged as repeated groups of configurable components in contrast to a "sea of primitive gates" structure.
Macro version A macro meant for another ECL type (ECL 10K versus ECL 100K) or power supply (+5V REF ECL versus STD REF ECL). The parameters are identical.
Mask A level of interconnect or via (through-holes) defining one layer of a die.
Mask-programmable devices Example devices are gate arrays and ROMs.
Monostable A circuit with one stable state and one quasi-stable state. Also called a one-shot, a delay circuit, a single-cycle circuit, a gating circuit.
MOS Metal-oxide semiconductor. Developed for denser, lower performance circuits. Simpler processing and fewer mask layers allow larger circuits to be produced.
MSI Medium Scale Integration. 20-100+ equivalent gates on a chip.
MSI functions Functions of size/complexity appropriate to MSI devices; medium scale functions; MUX, decoder, register. See above.
MSI library A collection of large, multiple cell macros.
MSI logic macros Large, multiple cell macros.
MUX Multiplexor, also spelled multiplexer. A select-one-of-n device where n input lines are reduced to one output; the active line is selected by one or more selection control inputs.
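The select-one-of-n behavior of a MUX can be sketched in a few lines. This is a behavioral illustration only, with a hypothetical function name and an assumed most-significant-bit-first ordering of the selection control inputs; it does not model any particular AMCC macro.

```python
def mux(inputs, select_bits):
    """Return the input line chosen by the selection control inputs.

    select_bits is read most-significant bit first (an assumption made for
    this sketch), so [1, 0] selects input line 2.
    """
    index = 0
    for bit in select_bits:
        index = (index << 1) | (1 if bit else 0)
    if index >= len(inputs):
        raise IndexError("select value addresses a line that does not exist")
    return inputs[index]
```

A 4-to-1 MUX therefore needs two selection control inputs, and in general n input lines need enough select bits to address every line.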
N
Net The etch required to connect one output pin to all of its destination pins plus any etch required to connect any other sources wire-ORed to that pin. Also described as "photolithographically determined interconnect metalization". A wire-net refers to the representation of a net on a circuit schematic. A net segment is a piece of a net from one end to a node.
Netlist A listing of all the interconnects within a circuit.
Node The interconnect point of two or more nets.
ns Nanosecond, 10^-9 sec.
O
Output cell A cell that accommodates output-only functions.
Output macro Macro that performs the translation from internal levels to external levels for a device. Depending on the cell complexity, it may require the use of a buffer or may provide its own.
Output.dly A data file created and used by AMCCANN.
Overhead circuitry Overhead circuitry consisting of bias generators and voltage references is pre-defined in the base array.
Overhead current The current dissipated by the overhead circuitry. It may be a function of the numbers and types of interface macros used.
Oxide-isolated Another bipolar technology.
P
Package The enclosure used to protect the die and interconnect it to the board.
Package power and ground planes Newer packages provide internal planes to which fixed power and ground pads may be connected. They allow the 1 to 1 ratio of fixed power and ground pads to package power and ground pins to be altered. They allow added power and grounds to connect to internal planes only if placed on pads accessible to the internal planes.
Package signal pin An external package pin that carries a variable signal level as opposed to one that carries power or ground.
PAL® A registered trademark of Monolithic Memories, Inc. Programmable Array Logic. An optimized variant of a PLA (more inputs, more outputs, or more functionality by reducing the width of the OR portion of the array). "AND" array user-programmable, "OR" array pre-programmed; groups of product terms are ORed to the y outputs according to the pre-arranged, pre-programmed pattern.
Passive components Example passive components of a die are resistors, capacitors and conductors.
PC or PCB A printed circuit board has interconnections between points printed in metal on the board. "PC board" could be confused with a personal computer board.
PGA Pin grid array; a package type suitable for high pin-count requirements, it has pins .100 inch apart on 10x10 through 17x17 matrices brazed perpendicular to the die cavity.
PLA Programmable Logic Array. Either AND-OR or NOR-NOR structure.
Power/density
    Ratio of the density (number of gates) to the power (current and therefore heat) possible in a given technology.
Power option
    Macro option that provides extra current to drive additional fan-out loads without altering the function or the propagation delay. Used on older families of arrays.
POWER_SUPPLY
    This chip macro parameter is available only to ECL or standard-reference ECL/TTL mixed circuits. For 100% ECL circuits it specifies the standard -5.2V (STD5) or -4.5V (STD4) supply or the +5V reference supply (5VREF). For mixed circuits it specifies either -5.2V or -4.5V. It is used with PRODUCT_GRADE to determine AMCCANN delay file values.
PRODUCT_GRADE
    This chip macro parameter is used to identify a circuit as MILitary or COMmercial. Along with POWER_SUPPLY, it is used by AMCCANN to determine the contents of the time delay files (depending on the library).
PRODUCT_NAME
    An array or circuit code name used to identify a design. It appears as a parameter on the chip macro.
PROM
    Programmable read-only memory. The "AND" array is pre-programmed (at the factory), with n inputs and 2**n product terms; the "OR" array is user-programmable, and any of the product terms can be ORed to the y outputs.
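The AND-array/OR-array split described for the PROM can be illustrated with a minimal Python sketch. The function names and the example OR-plane programming below are hypothetical and chosen only for illustration; they do not come from the AMCC manuals. The sketch assumes the fixed AND array is a full decoder (one product term per input combination, 2**n in total) and that the user programs only which product terms feed each output.

```python
from itertools import product

def prom_and_array(inputs):
    """Fixed AND array: one product term per input combination (2**n terms)."""
    n = len(inputs)
    terms = []
    for pattern in product((0, 1), repeat=n):
        # A product term is 1 only when every input matches its pattern.
        terms.append(int(all(bit == want for bit, want in zip(inputs, pattern))))
    return terms

def or_array(terms, or_plane):
    """Programmable OR array: each row selects the product terms ORed to one output."""
    return [int(any(t and sel for t, sel in zip(terms, row))) for row in or_plane]

def prom_read(inputs, or_plane):
    """Evaluate the y outputs for one input combination."""
    return or_array(prom_and_array(inputs), or_plane)

# Hypothetical programming for a 2-input, 2-output device.
# Product-term order is 00, 01, 10, 11.
OR_PLANE = [
    [0, 1, 1, 0],  # y0 = XOR of the two inputs
    [0, 0, 0, 1],  # y1 = AND of the two inputs
]

print(prom_read((0, 1), OR_PLANE))
print(prom_read((1, 1), OR_PLANE))
```

Under the same assumptions, a PAL would swap the roles: the rows of the OR plane would be fixed and the product-term patterns would be the user-programmable part.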
Propagation delay
    The time it takes a signal to pass through a macro from input to output, specified in typical time in the macro listing in the AMCC Design Manuals.
ps
    Picosecond, 10**-12 sec.
Pseudo-ECL
    ECL operating in the TTL voltage range, using +5V and GND. More properly called +5V REF ECL.
Q
Q
    True output of a flip/flop or latch.
QN
    Complementary output of a flip/flop or latch.
R
R
    Reset Q output of a latch or flip/flop to FALSE, usually = 0.
Radiation-hard, RAD-hard
    Degree of resistance to radiation effects.
RAM
    Random Access Memory; used to refer to readable-writable memory.
Rising-edge active
    A flip/flop or device that can change state on the rising edge of the active clock and is static during the falling edge of the active clock.
ROM
    Read Only Memory.
Routing channels
    Paths that can be used during layout to complete circuit interconnections.
S
Schematic
    Logic symbol design. Macro symbol representation of the design. Also refers to blueprints created by other means.
Schematic capture
    The process of entering macros and their interconnections into an EWS before error analysis and simulation.
Semicustom array
    An array that is pre-designed in all base levels, leaving the top n levels for user-defined connections.
Series
    As in logic array series. Devices that are similar, differing in size; a related set of products.
Series gating
    Transistor chaining.
S
    Set Q output of a latch or flip/flop to TRUE, usually = 1.
Signal pin
    A pin carrying a variable signal rather than power or ground. This is also used as an abbreviation for package signal pin.
Silicon compiler
    Software that translates a circuit description at the behavior or gate level to a pattern generator tape.
Simulation vector
    A bit-pattern used to evaluate functionality, parametrics, or timing performance of a circuit or array.
Single-ended input
    Single rail. One polarity (as opposed to differential input).
Skew
    The amount of variation in propagation delay between two logical gates. A function of placement, switching, whether the two gate functions are on identical macros, on the same power bus, have the same loading, etc.
Speed/power ratio
    Term used to indicate the tradeoff of higher speed for higher current and therefore power dissipation.
SSI
    Small Scale Integration; 2-20 equivalent gates on a chip; AND, OR, NOT level gates.
Stabilization time
    Time required for a circuit to reach a stable, known state.
Standard cell
    An approach to customization in which all mask levels (as many as 14) are stored with a macro. The base wafer is not made ahead of time. Tooling costs are higher than with semi-custom gate arrays. Debug takes longer than for semi-custom. Depending on the designer, it may save die area and therefore overall cost for large production runs. Differences between it and gate arrays are in reality slight and die costs are not the leading expense.
Standard macro
    The basic parametric configuration for a macro. All macros have an S option, although drivers have only one option.
SWGROUP
    A macro parameter that allows the user to "tag" members of a switching group (simultaneously switching outputs).
T
Testability Analysis
    Evaluation of the controllability and observability of a circuit. Controllability is the measure of how easy it is to toggle a node. Observability is the measure of whether the toggling of a node can be seen at an output easily, with difficulty, or not at all.
Thermal diode
    Constructed from input and output macros with a choice of sizes (2X transistor normally used), this device allows thermal measurements to be made for a device. A thermal diode is incorporated into the base for the Q20000 Series arrays.
Three-level gating
    Three-stage transistor staging; proprietary technique used in the earlier AMCC Q3500 and Q5000 Series Bipolar Logic Arrays.
Trench-oxide isolated
    Technology used in the bipolar high-speed AMCC Q20000 Series (1.2GHz).
TTL
    Transistor-Transistor logic, AKA T2L.
TTL output drive
    See IOL.
Turbo driver
    An AMCC-patented dynamic discharge circuit that replaces the static current source of the emitter follower. This allows lower power, higher drives, and reduced skews.
U
Uncommitted logic
    Logic whose use or application is not predetermined by the manufacturer.
V
VHDL
    VHSIC hardware description language, a high-level design language.
VHSLSI
    Very high-speed large scale integration. See VHSIC.
VHSIC
    Very high-speed integrated circuits, from the American military program (Department of Defense).
VLSI
    Very large scale integration; over 1000 equivalent gates (1000-10000 gates on a chip).
W
Wafer
    The silicon slice (various diameter circles) upon which multiple layers of doped materials have been placed to form a number of usable component chips known as dice. Yield is usually stated as the number of usable die per wafer. Wafers consist of layers of conducting, partially conducting and non-conducting materials. The layers form active and passive components of the die.
Worst-case COMMERCIAL
    Refers to the multiplication factors used to compute maximum worst-case power and maximum worst-case path propagation delay when the circuit is for COMMERCIAL application. Or, refers to the worst-case maximum specifications under Commercial operating conditions.
Worst-case MILITARY
    Refers to the multiplication factors used to compute maximum worst-case power or maximum worst-case path propagation delay when the circuit is for MILITARY application. Or, refers to the worst-case maximum specifications under Military operating conditions.
Worst-case multiplier
    The adjustment factor specified for use in computing maximum worst-case current from typical current specifications, or for computing maximum or minimum worst-case time delays from typical time delays. Adjustment factors may be required to adjust for power supply. Refer to the Design Manual for the specific array to determine what multipliers are required.
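The use of a worst-case multiplier can be sketched as a simple scaling of typical values. The multiplier values, grade keys, and function names below are placeholders invented for this example; the real factors must be taken from the Design Manual for the specific array.

```python
# Hypothetical multipliers, for illustration only. Real values come from
# the Design Manual for the array and depend on PRODUCT_GRADE and supply.
MULTIPLIERS = {
    "COM": {"max_delay": 1.5, "min_delay": 0.5, "max_current": 1.25},
    "MIL": {"max_delay": 1.75, "min_delay": 0.5, "max_current": 1.5},
}

def path_delay_bounds(typical_delays_ns, grade):
    """Return (min, max) worst-case path delay from typical macro delays."""
    typical_total = sum(typical_delays_ns)
    m = MULTIPLIERS[grade]
    return typical_total * m["min_delay"], typical_total * m["max_delay"]

def worst_case_current(typical_current_ma, grade):
    """Return maximum worst-case current from a typical current figure."""
    return typical_current_ma * MULTIPLIERS[grade]["max_current"]

# A path through three macros with typical delays of 0.5, 0.25 and 0.25 ns.
print(path_delay_bounds([0.5, 0.25, 0.25], "COM"))
print(worst_case_current(100.0, "MIL"))
```

The sketch applies a single factor to the summed typical delay; a design flow that also adjusts for power supply would apply those additional factors in the same way.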
Symbols
fmax
    Maximum clock frequency; highest rate at which a clock input can be driven and still maintain stable transitions.
ICC
    The current drawn by the macro, into the VCC supply pin of the circuit. For individual macros, the current drawn by that macro (TYPICAL). When computing the supply current for an array, it will be a function of the macros used in the array plus the overhead current for the I/O mode. It may also be a function of the number of TTL input macros.
ICC HIGH
    The current drawn by the macro when the output is logical HIGH.
ICC HIGH-Z
    The current drawn by the macro when the output is high-impedance (OFF).
ICC LOW
    The current drawn by the macro when the output is logical LOW.
IEE
    The current drawn by the macro, into the VEE supply of the circuit. For individual macros, the current drawn by that macro (TYPICAL). When computing the supply current for an array, it will be a function of the macros used in the array plus the overhead current for the I/O mode.
II
    Input HIGH current at maximum Vin.
IIL
    TTL input current when the input logic level is LOW.
IIH
    TTL input current when the input logic level is HIGH.
IOH
    TTL output current when the output logic level is HIGH.
IOL
    TTL output current when the output logic level is LOW. The current drawn by the macro is the sink capability of an output. For AMCC bipolar Logic Arrays this is 20 mA.
IOS
    Output short circuit current.
PW
    Pulse width. The minimum time required between edges of the driving signal. AMCC specifies PW as the worst-case (the minimum for which operation is guaranteed).
Th
    Hold time; the minimum time between the application of an active edge of the clock signal and the removal of the data signal being clocked. A negative hold time implies that the data may be removed before the arrival of the active edge of the clocking signal.
TJ
    Junction temperature of a device; specified as the maximum for which operation can be guaranteed. For MILITARY circuits, TJ = 150 °C. For COMMERCIAL circuits, TJ = 130 °C.
TPHL
    Propagation delay time, HIGH-to-LOW-level output.
TPHZ
    Output disable time, HIGH-to-high-impedance (off) output.
TPLH
    Propagation delay time, LOW-to-HIGH-level output.
TPLZ
    Output disable time, LOW-to-high-impedance (off) output.
TPZL
    Output enable time, high-impedance (off) to LOW output.
TPZH
    Output enable time, high-impedance (off) to HIGH output.
Trec
    Recovery time; the minimum time required between the removal of a set or reset and the next active edge of the clock for the correct operation of the device to be guaranteed.
Tsu
    Set-up time; the minimum time between the application of a data signal and the active edge of the clock. A negative set-up time implies that the data must remain "set-up" after the active edge of the clock. AMCC specifies Tsu as worst-case (the minimum for which operation is guaranteed).
VIH
    High-level input voltage. The minimum voltage that should be applied to the input of a device for a logical 1 voltage level. A maximum may be specified; the input current will become very large if this maximum is exceeded.
VIK
    Input clamp diode voltage, limits input swing below ground (TTL).
VIL
    Low-level input voltage. The maximum voltage that should be applied to the input of the device for a logical 0 voltage level.
VOH
    High-level output voltage.
VOL
    Low-level output voltage.
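The Tsu and Th definitions above can be read as two margins measured around the active clock edge. The following Python sketch is one way to check them; the function name, argument names, and all timing values are hypothetical and are not AMCC specifications.

```python
def setup_hold_margins(data_valid_from_ns, data_valid_until_ns,
                       clock_edge_ns, tsu_ns, th_ns):
    """Return setup and hold margins in ns; a negative margin is a violation.

    Setup margin: time the data was stable before the edge, minus Tsu.
    Hold margin:  time the data stayed stable after the edge, minus Th.
    """
    setup_margin = (clock_edge_ns - data_valid_from_ns) - tsu_ns
    hold_margin = (data_valid_until_ns - clock_edge_ns) - th_ns
    return {
        "setup_margin_ns": setup_margin,
        "hold_margin_ns": hold_margin,
        "ok": setup_margin >= 0 and hold_margin >= 0,
    }

# Hypothetical numbers: data stable from 8.0 ns to 10.5 ns, clock edge at 10.0 ns.
print(setup_hold_margins(8.0, 10.5, 10.0, tsu_ns=1.5, th_ns=0.25))

# With a negative hold time, the data may be removed before the edge arrives.
print(setup_hold_margins(8.0, 9.75, 10.0, tsu_ns=1.5, th_ns=-0.5))
```

Because Tsu and Th are specified as worst-case minimums, a check like this would normally be run against worst-case path delays rather than typical ones.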