Robustness and Usability in Modern Design Flows
by
Görschwin Fey University of Bremen Germany and
Rolf Drechsler University of Bremen Germany
A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN 978-1-4020-6535-4 (HB) ISBN 978-1-4020-6536-1 (e-book)
Published by Springer, P.O. Box 17, 3300 AA Dordrecht, The Netherlands. www.springer.com
Printed on acid-free paper
All Rights Reserved © 2008 Springer Science + Business Media B.V.
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
To Liva Jolanthe and Luna Sophie
CONTENTS
Dedication
List of Figures
List of Tables
Preface
1. INTRODUCTION
2. PRELIMINARIES
  2.1 Boolean Reasoning
    2.1.1 Boolean Functions
    2.1.2 Binary Decision Diagrams
    2.1.3 Boolean Satisfiability
  2.2 Circuits
    2.2.1 Circuits and Traces
    2.2.2 BDD Circuits
    2.2.3 Transformation into CNF
  2.3 Formal Verification
    2.3.1 Equivalence Checking
    2.3.2 Bounded Model Checking
  2.4 Automatic Test Pattern Generation
    2.4.1 Fault Models
    2.4.2 Combinational ATPG
    2.4.3 Classical ATPG Algorithms
3. ALGORITHMS AND DATA STRUCTURES
  3.1 Combining SAT and BDD Provers
    3.1.1 Proof Techniques
    3.1.2 Hybrid Approach
    3.1.3 Experimental Results
  3.2 Summary and Future Work
4. SYNTHESIS
  4.1 Synthesis of SystemC
    4.1.1 SystemC
    4.1.2 SystemC Parser
    4.1.3 Characteristics
    4.1.4 Experimental Results
  4.2 Synthesis for Testability
    4.2.1 BDD Transformation
    4.2.2 Testability
    4.2.3 Experimental Results
  4.3 Summary and Future Work
5. PROPERTY GENERATION
  5.1 Detecting Gaps in Testbenches
    5.1.1 Generating Properties
    5.1.2 Selection of Properties
    5.1.3 Experimental Results
  5.2 Design Understanding
    5.2.1 Methodology
    5.2.2 Comparison to Other Techniques
    5.2.3 Work Flow
    5.2.4 Experimental Results
  5.3 Summary and Future Work
6. DIAGNOSIS
  6.1 Comparing SAT-based and Simulation-based Approaches
    6.1.1 Diagnosis Approaches
    6.1.2 Relation Between the Approaches
    6.1.3 Qualitative Comparison
    6.1.4 Experimental Results
  6.2 Generating Counterexamples for Diagnosis
    6.2.1 Choosing Good Counterexamples
    6.2.2 Heuristics to Choose Counterexamples
    6.2.3 Experimental Results
  6.3 Debugging Properties
    6.3.1 Other Diagnosis Approaches
    6.3.2 Diagnosis for Properties
    6.3.3 Source Level Diagnosis
    6.3.4 Experimental Results
  6.4 Summary and Future Work
7. SUMMARY AND CONCLUSIONS
References
Index of Symbols
Index
LIST OF FIGURES
1.1 Traditional design flow
1.2 Enhanced design flow
2.1 Example for a BDD
2.2 BDD $G^{\pi}_{b}$
2.3 BDD $G^{\varphi}_{b}$
2.4 DPLL procedure
2.5 Decision stack
2.6 Basic gates
2.7 Simulation trace for the shift-register
2.8 1-bit-shift-register
2.9 Multiplexor cell MUX
2.10 Example for a BDD circuit
2.11 Example for the conversion into CNF
2.12 Miter circuit for equivalence checking
2.13 SAT instance for BMC
2.14 1-bit-shift-register
2.15 Example for the SAFM
2.16 Boolean difference of the faulty circuit and the fault free circuit
2.17 Justification and propagation
3.1 Different approaches
3.2 Overview over different node types
3.3 Depth first traversal
3.4 Modified node structure
3.5 Solution to the 5-Queens problem
4.1 Synthesis part of the design flow
4.2 Data types in the process counter proc
4.3 Process counter proc of the robot controller from [GLMS02]
4.4 AST for Example 16
4.5 Overall synthesis procedure
4.6 Intermediate representation
4.7 Arbiter: Block-level diagram
4.8 Arbiter: Top-level module
4.9 Scalable FIR-filter: Block-level diagram
4.10 Generation of circuits from BDDs
4.11 Redundancy due to simplification
5.1 Verification part of the design flow
5.2 Integration into the verification flow
5.3 Sketch of the property generation
5.4 Simulation trace for the shift-register
5.5 1-bit-shift-register
5.6 Runs resulting in a valid property for misex3
5.7 Time needed for property generation for misex3
5.8 Current verification methodology
5.9 Proposed methodology
5.10 Application of property deduction
5.11 The arbiter
5.12 Code of the arbiter
6.1 Fault diagnosis in the design flow
6.2 Basic simulation-based diagnosis
6.3 Example of a sensitized path
6.4 SAT-based diagnosis
6.5 Basic SAT-based diagnosis
6.6 Diagnosis based on set cover
6.7 Example: COV may not provide a correction
6.8 Example: Solution for k = 2 by BSAT but not by COV
6.9 BSAT vs. COV: Average distance
6.10 BSAT vs. COV: Number of solutions
6.11 Circuit corresponding to the instance I1 of MI
6.12 Algorithm to build the subset circuit
6.13 Greedy algorithm to choose counterexamples
6.14 Number of candidates
6.15 Time for diagnosis
6.16 Faulty arbiter circuit
6.17 Circuit with gate g2 as diagnosis (Ω = req + ack + X ack, Ψ = ack + X ack)
6.18 State elements considered for Ackermann constraints
6.19 Pseudocode of the static decision strategy
6.20 Source code link
6.21 State machine for branch prediction
6.22 Source code for bpb
6.23 am2910: Runtime vs. number of diagnosed components
6.24 gcd: Runtime vs. number of diagnosed components
LIST OF TABLES
2.1 Transformation of an AND-gate into a CNF formula
3.1 Index of node types (32-bit)
3.2 Heuristics to limit the size of the hybrid structure
3.3 Selection of expansion nodes
3.4 ESOP minimization
4.1 Arbiter: Synthesis results
4.2 FIR-filter: Synthesis results
4.3 ISCAS 89: Synthesis results
4.4 Benchmarks before and after optimization by SIS
4.5 Path-delay fault coverage of BDD circuits
4.6 Path-delay fault coverage of BDD circuits optimized by sifting
5.1 Sequential benchmarks, tcyc = 1,000,000
5.2 Sequential benchmarks, tcyc = 100,000
6.1 Comparison of the approaches
6.2 Run time of the basic approaches
6.3 Quality of the basic approaches
6.4 Circuit data
6.5 Results using two counterexamples
6.6 Results using three counterexamples
6.7 Results using four counterexamples
6.8 Diagnosis results for multiple counterexamples and Ackermann constraints
6.9 Run times for the different approaches (using four counterexamples)
PREFACE
The size of technically producible integrated circuits increases continuously. But the ability to design and verify these circuits does not keep up with this development. Therefore, today’s design flow has to be improved to achieve a higher productivity. In this book the current design methodology and verification methodology are analyzed, a number of deficiencies are identified, and solutions are suggested. Improvements in the methodology as well as in the underlying algorithms are proposed. An in-depth presentation of preliminary concepts makes the book self-contained. Based on this foundation major design problems are targeted. In particular, a complete tool flow for Synthesis for Testability of SystemC descriptions is presented. The resulting circuits are completely testable and test pattern generation in polynomial time is possible. Verification issues are covered in even more detail. A whole new paradigm for formal design verification is suggested. This is based upon design understanding, the automatic generation of properties, and powerful tool support for debugging failures. All these new techniques are empirically evaluated and experimental results are provided. As a result, an enhanced design flow is created that provides more automation (i.e. better usability) and reduces the probability of introducing conceptual errors (i.e. higher robustness).
Acknowledgments

We would like to thank all members of the research group for computer architecture in Bremen for the helpful discussions and the great atmosphere during work and research. Furthermore, we would like to thank all our coauthors of the papers that make up an important part of this book: Roderick Bloem, Tim Cassens, Christian Genz, Daniel Große, Sebastian Kinder, Sean Safarpour, Stefan Staber, Andreas Veneris, and Tim Warode. Rüdiger Ebendt helped us in proofreading while unifying the notations. We would like to thank Lisa Teuber for designing the cover page. Antje Luchs patiently helped to improve the presentation for nonexperts.

Görschwin Fey and Rolf Drechsler
Bremen, September 2007
Chapter 1 INTRODUCTION
Almost every appliance used in daily life has an integrated circuit as a control unit. This applies not only to a modern television or a washing machine but also to cars or airplanes where security critical tasks are controlled by circuits. Such an integrated circuit, also called a “chip”, contains up to several hundred million gates. Moreover, the number of elements that are composed into a single chip doubles every 18 months according to Moore’s Law. This causes an exponentially increasing size of the problem instances that have to be handled during circuit design.

Techniques and tools for computer-aided design (CAD) are available to create such complex systems. But often the tool development does not keep up with the progress in fabrication techniques. The result is the “design gap”, i.e. the size of the circuits that can be produced increases faster than the productivity of the design process.

One major issue is the robustness of the design tools. While a tool may produce an output of high quality within an acceptable run time for one design, this may not be the case for another design. Also, the performance of the tool cannot be predicted from the design itself. This behavior is not desirable while designing a circuit. But it is inherent to the problems solved by these tools. Many of these problems are computationally complex – often NP-complete – and, additionally, the size of the problem instances grows exponentially. For this reason, the underlying algorithms have to be continuously improved. This means reducing the run time of these algorithms while keeping or even improving the quality of the output.

A second reason for the design gap is the low usability of circuit design tools. Often a high level of expertise and long experience are needed, e.g. to adjust the large number of parameters or to optimally interpret the output. By automating more tasks to help the designer and providing tools that are easy to use, these steps become easier and, as a result, the design productivity increases.
This book addresses both of these aspects: robustness and usability. For this purpose the current – in the following also called “traditional” – design flow is considered as a whole. A number of hot spots are identified where an improvement of either robustness or usability of the tools can significantly improve the overall productivity. Solutions to these methodological weaknesses are proposed. This leads to a new enhanced design flow based on the intensive use of formal methods.

First, the traditional design flow is briefly reviewed and deficiencies are identified. Then, solutions for these deficiencies and the enhanced design flow are presented. This presentation is kept brief because the whole design flow is covered. A more detailed explanation of the problems and a motivation for the proposed solutions follow at the beginning of each chapter that addresses a particular problem.

The major steps of the traditional design flow are shown in Figure 1.1. The design process itself is sketched in the left part of the figure while the right part shows the verification procedures. Rounded boxes denote tasks and angular boxes denote input data or output data of these tasks. Initially, a specification of the circuit is written, usually as a textbook in natural language. This textual specification is then manually coded in two formal languages. An executable system description in terms of an ordinary software programming language (often C/C++) serves as an early system model to allow for the development of software and for simulation-based verification. Additionally, a synthesizable description in terms of a Hardware Description Language (HDL) is necessary. Both descriptions are usually coded independently. This redundancy in the design flow significantly extends the design time and, even worse, may lead to inconsistencies between the different design descriptions.
Based on the HDL description, synthesis is carried out to retrieve the circuit description for production, i.e. a gate level or transistor level representation. Simulation is applied to check the compliance of the system level description with the textual specification and with the synthesizable description of the system. A testbench is created manually to describe crucial scenarios that have to be considered during simulation. But the state space grows exponentially with the number of state elements. A design with only 100 state elements, for example, already has $2^{100}$ states. Today’s circuits often have more than 100 k state elements. Therefore these dynamic verification approaches are inherently incomplete in the sense that neither all input scenarios nor all design states can be considered due to time limits. Formal property checking overcomes this weakness. The industrial application of property checking is at its beginning. The formal verification with respect to the textual specification of a 2 million gate design for UMTS data transfer was described in [WTSF04]. Formal equivalence checking is already state of the art to guarantee the correctness of subsequent synthesis steps if the synthesizable description of the design is
available. Equivalence checking has already replaced simulation-based methods in many industrial design flows. But all these methods only help to detect the existence of design errors. The localization of design errors currently remains a time-consuming manual task.

Figure 1.1. Traditional design flow (data: textual specification, testbench, system level description, synthesizable description, gate level description, testset, counterexamples; tasks: manual setup, manual coding, simulation, equivalence check, synthesis, ATPG, manual fault diagnosis)

As a last step, Automatic Test Pattern Generation (ATPG) is applied to calculate input stimuli for the postproduction
test. But during synthesis testability issues are usually not considered and, therefore, ATPG is difficult; the underlying problem is NP-complete.

In this book, several approaches are proposed to remove the deficiencies that exist in the traditional design flow. By combining these techniques, a new enhanced design flow emerges. The enhanced flow boosts the productivity of circuit design and thereby reduces the design gap. Formal techniques are used extensively for this purpose since it has been shown that they already improve the productivity of individual steps in the traditional flow. One reason is the high computational power of these techniques compared to nonsymbolic techniques such as simulation-based approaches.

As a starting point, the underlying algorithms for Boolean function manipulation are considered with respect to particular needs. Binary Decision Diagrams (BDDs) and Boolean Satisfiability (SAT) are the dominant engines in this area. Currently, efficiency, i.e. calculating a solution as fast as possible, is a major focus in the development of such algorithms. Increasing the robustness of the formal techniques is an important issue as well. This is achieved by combining concepts from BDDs and solvers for the SAT problem. The resulting integrated data structure makes it possible to trade BDD-like behavior for SAT-like behavior and thereby to exploit the strengths of both domains. Additionally, the data structure can be used to investigate “more interesting” parts of the search space more thoroughly than others. Efficient Boolean function manipulation is the core of several techniques to improve the overall design flow.

The enhanced design flow itself is shown in Figure 1.2. Bold lines around boxes indicate sections modified in comparison to the traditional design flow. As a first major improvement, the enhanced flow tightly couples the system level description and the synthesizable description.
The two languages that are typically used – the software programming language and the HDL – are replaced by SystemC [LTG97, GLMS02] (see also http://www.systemc.org). SystemC is a description language that includes constructs to specify software and hardware at different levels of abstraction. As a result, the system level description can directly be refined into a synthesizable description within a single language. By this, the robustness of the design task is improved because the transformation of the system level model can be done more efficiently. The improved refinement step is complemented by synthesis for testability. The proposed technique produces circuits that are fully testable under several fault models. Here, a representation of the function of the circuit as a BDD is used as a starting point. This functional representation is directly converted into a fully testable circuit. While ATPG is NP-complete in general, all faults can be classified in polynomial time on these circuits – a robust ATPG step is the result. The weak simulation-based techniques for design verification are replaced by state-of-the-art formal techniques, namely property checking. The slow manual creation of properties is aided by automatically generating properties
from simulation traces. This makes it possible to apply a new design methodology at this step: properties are created interactively. The approach has a number of advantages. Automatically generated properties help to understand the simulation traces and thereby the design itself. If the proof of these properties fails on the design, this also helps to identify gaps in the simulation traces.

Figure 1.2. Enhanced design flow (data: textual specification, simulation traces, properties, system level description, synthesizable description, gate level description, testset, counterexamples; tasks: interactive creation, manual coding, simulation, property check, manual refinement, fault diagnosis, equivalence check, synthesis (for test.), ATPG)

When
testbenches are used for simulation, this bridges the traditional flow and the enhanced flow. That is of great importance for the practical application. As a side effect, the formal properties verifying the synthesizable description of the system are created much faster when an interactive approach is used. By this, verification with respect to all input sequences and all states of a design can be done more easily – the usability of the verification tools is raised.

An inconsistency between different design descriptions is usually indicated by counterexamples, no matter which technique – simulation, property checking, or equivalence checking – or design step – design description or synthesis – is considered. Debugging this inconsistency, i.e. identifying the real error site in the description, is a time-consuming manual task. Here, using techniques for automatic fault diagnosis drastically boosts the productivity. Efficient state-of-the-art techniques for fault diagnosis are compared and a technique to improve the generation of counterexamples for diagnosis is presented. The extension of diagnosis methods for debugging errors that are detected by formal properties is also considered. In contrast to previous methods, no correct output response per counterexample has to be given in advance and the diagnosis results are presented at the source code level. Automatically helping the designer to find design errors reduces the difficulty of interpreting results from formal verification tools and, by this, the usability increases.

Altogether, the proposed techniques and verification methodology establish the enhanced design flow. Only equivalence checking is not further considered in this book. Robust and easy-to-use tools for this task are already state of the art in the industrial application.
Finally, the main improvements of the enhanced design flow over the traditional design flow can be summarized as follows:
- Integration of SAT and BDDs for robust Boolean function manipulation
- Tight coupling of system-level description and synthesizable description
- Fully testable circuits
- Automatic generation of properties from simulation traces
- Detection of gaps in simulation traces
- Automatic debugging support
- Presentation of diagnosis results at the source code level

All techniques that are proposed have been implemented and empirically evaluated. They have been developed to the extent of a robust application on benchmark cases. Experimental results, a discussion of related work, and possible future extensions for the proposed techniques are presented in the respective
sections. Each chapter addresses a particular problem area. Due to the comprehensive coverage of the whole design flow, a more detailed explanation of the problem and a motivation of the proposed solution are given at the beginning of each chapter. There, the embedding into the overall flow is also shown. A summary, possible future extensions and further related papers are given at the end of each chapter.

This book is structured as follows: In the second chapter, the basic notations and definitions are given for the different concepts to keep the presentation self-contained. In Chapter 3, improvements for underlying algorithms for Boolean reasoning are explained. Namely, the integration of BDDs and SAT provers is investigated.

Then, the synthesis step of the design flow is considered in Chapter 4. The technique to create fully testable circuits from SystemC descriptions is introduced. This is done in two steps. A tool to parse and synthesize a SystemC description is presented. The gate level circuit is then transformed into a fully testable circuit.

Chapter 5 presents the techniques and methodology to improve the verification flow. First, the automatic generation of properties from traces is explained from a technical point of view and the practical application to detect gaps in testbenches is proposed. Then, the transition towards a whole new verification methodology based on design understanding and interactive creation of formal properties is discussed.

Techniques for automatic diagnosis are reviewed in Chapter 6. Simulation-based diagnosis and SAT-based diagnosis are compared in detail. Then, the problem of producing counterexamples for an increased diagnosis quality is examined from a theoretical and practical point of view. Next, a technique to aid debugging for property checking is presented. Based on counterexamples, the error candidates are automatically calculated at the source code level.
In the last chapter, the contributions of this book are summarized and conclusions are presented.
Chapter 2 PRELIMINARIES
This chapter provides the necessary definitions and notations to keep the book self-contained. The complete design flow and, therefore, a wide area is covered ranging from Boolean reasoning and underlying techniques to applications like formal verification and ATPG. Therefore, the presentation is kept brief. A large number of books is available for an in-depth discussion of each topic. References to some of these books are given at the beginning of the respective sections.
2.1 Boolean Reasoning
In the following, the notations used for Boolean functions, Boolean expressions, binary decision diagrams, and Boolean satisfiability are briefly reviewed. A more detailed presentation can be found, e.g. in [HS96].
2.1.1 Boolean Functions
Notation 1. The set of Boolean values is given by $\mathbb{B} = \{0, 1\}$. A Boolean function $f$ is a mapping $f : \mathbb{B}^n \to \mathbb{B}$. Usually $f$ is defined over $n$ variables $X = \{x_1, \dots, x_n\}$ in the following. This is denoted by $f(x_1, \dots, x_n)$. A multi-output Boolean function is a mapping $f : \mathbb{B}^n \to \mathbb{B}^m$.

A Boolean function can be described in terms of a Boolean expression. A Boolean expression over a set $X = \{x_1, \dots, x_n\}$ is formed over
- The variables
- The unary operator $\overline{\,\cdot\,}$ (NOT)
- The binary operators $\cdot$ (AND), $+$ (OR), $\oplus$ (XOR), $\to$ (implication), $\leftrightarrow$ (equivalence)
- Parentheses
Given a Boolean function $f(x_1, \dots, x_n)$, the positive cofactor $f_{x_i}$ and the negative cofactor $f_{\overline{x}_i}$ with respect to $x_i$ are defined as follows:

$$f_{x_i}(\dots, x_{i-1}, x_{i+1}, \dots) = f(\dots, x_{i-1}, 1, x_{i+1}, \dots)$$
$$f_{\overline{x}_i}(\dots, x_{i-1}, x_{i+1}, \dots) = f(\dots, x_{i-1}, 0, x_{i+1}, \dots)$$

The iterative cofactor $f_{l_{i_1} \dots l_{i_j}}$, where $l_{i_k} \in \{x_{i_k}, \overline{x}_{i_k}\}$, is retrieved by iteratively calculating the cofactors $f_{l_{i_1}}$, $(f_{l_{i_1}})_{l_{i_2}}$ up to $(\dots((f_{l_{i_1}})_{l_{i_2}})\dots)_{l_{i_j}}$.
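The cofactor definition can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: Boolean functions are modeled as Python functions over tuples of 0/1 values, variables are indexed from 0, and the names `cofactor` and the example function `f` are hypothetical, not taken from the book. The sketch also checks the identity $f = x_1 f_{x_1} + \overline{x}_1 f_{\overline{x}_1}$ on all inputs.

```python
from itertools import product


def cofactor(f, i, value):
    """Return the cofactor of f with respect to variable index i (0-based).

    f takes a tuple of n Boolean values; the cofactor takes a tuple of
    n-1 values (all variables except x_i) and fixes x_i to `value`.
    """
    def g(rest):
        args = rest[:i] + (value,) + rest[i:]
        return f(args)
    return g


def f(x):
    # Hypothetical example function: f(x1, x2, x3) = x1*x2 + NOT(x3)
    x1, x2, x3 = x
    return (x1 & x2) | (1 - x3)


f_pos = cofactor(f, 0, 1)   # positive cofactor w.r.t. x1: x2 + NOT(x3)
f_neg = cofactor(f, 0, 0)   # negative cofactor w.r.t. x1: NOT(x3)

# Iterative cofactor: fix x1 = 1 first, then x2 = 1 (index 0 of the rest).
f_pos_pos = cofactor(f_pos, 0, 1)

# Check f = x1 * f_pos + NOT(x1) * f_neg on every input combination.
shannon_holds = all(
    f((x1, x2, x3)) == ((x1 & f_pos((x2, x3))) | ((1 - x1) & f_neg((x2, x3))))
    for x1, x2, x3 in product((0, 1), repeat=3)
)
```

Representing the cofactor as a closure keeps the sketch close to the definition: one argument position is removed and the fixed value is re-inserted on every call.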
2.1.2 Binary Decision Diagrams
As is well known, a Boolean function $f : \mathbb{B}^n \to \mathbb{B}$ can be represented by a Binary Decision Diagram (BDD), which is a directed acyclic graph $G = (V, E)$ [Bry86]. The Shannon decomposition $f = x f_x + \overline{x} f_{\overline{x}}$ is carried out in each of the internal nodes with respect to a given variable $x$. The function represented by an internal node is determined recursively by the two children. Terminal nodes represent the constant functions. Output nodes represent functions that are considered externally, i.e. user functions. A BDD is called ordered if each variable is encountered at most once on each path from the root to a terminal node and if the variables are encountered in the same order on all such paths. A BDD is called reduced if it contains neither isomorphic subgraphs nor redundant nodes. Reduced and ordered BDDs are a canonical representation since for each Boolean function the BDD is unique up to graph isomorphism [Bry86]. In the following, we refer to reduced and ordered BDDs as BDDs for brevity. By using Complemented Edges (CEs), the size of a BDD can be further reduced [BRB90]. In this book, both types of BDDs – with and without CEs – are considered.

Formally, the order of the $n$ variables of a Boolean function can be given by mapping the variable index to a level in the graph $G$: $\pi : \{1, \dots, n\} \to \{1, \dots, n\}$. The index $n+1$ is assigned to terminal nodes. A BDD with CEs has exactly one terminal node, denoted by 1. The function isTerminal($v$) returns true if, and only if, $v$ is a terminal node. Each internal node has two successors, denoted by Then($v$) and Else($v$), and $v \in V$ is labeled with an index Index($v$) $\in \{1, \dots, n\}$. Alternatively, the function Label($v$) returns the variable of a node, i.e. Label($v$) $= x_{\mathrm{Index}(v)}$. Due to the order $\pi$, the inequality

$$\pi(\mathrm{Index}(v)) < \min\{\pi(\mathrm{Index}(\mathrm{Then}(v))),\ \pi(\mathrm{Index}(\mathrm{Else}(v)))\}$$

always holds, i.e. a node is always above its children.
Output nodes $v \in V$ are labeled with Index($v$) $= 0$; Label($v$) is undefined. They always reside on the topmost level 0. These nodes have exactly one successor Else($v$). An edge $e = (v, \mathrm{Then}(v))$ is never a CE. For edges $e$ with $e = (v, \mathrm{Else}(v))$ the attribute CE($e$) is true if and only if $e$ is a CE. By this, an output node $v$ represents the Boolean function $f$ or $\overline{f}$, depending on whether the edge to Else($v$) is a CE, where $f$ is the Boolean function represented by Else($v$). Output nodes are denoted by a function symbol in all figures.

In the following, $G^{\pi}_{f}$ denotes a BDD representing the Boolean function $f$ with respect to variable order $\pi$. If clear from the context, $\pi$ and $f$ are omitted. The size of a BDD refers to the number of nodes excluding terminal nodes and output nodes.

Example 1. Figure 2.1 shows the BDD for f = x1 x2 x3 + x1 x2 x4 + x1 x2 x3 x4 + x1 x2 x3 x4 . Edges from a node $w$ to Else($w$) are dashed; edges to Then($w$) are solid. A dot denotes a CE. The output node is denoted by the function symbol $f$. The BDD has a size of five.

Figure 2.1. Example for a BDD (output node f, internal nodes over the variables x4, x3, x2 and x1, terminal node 1)

The implementations handle BDDs with CEs. BDDs without CEs are considered in some examples to keep the presentation simple. They have two terminals 0 and 1 but no edge attributes. As a result, two different nodes are needed to represent a function and its complement. In the worst case a BDD
without CEs has twice the number of nodes compared to a BDD with CEs [BRB90].

The size of a BDD depends on the variable order.

Example 2. Bryant [Bry86] gave the function

$$b = x_1 x_{n+1} + x_2 x_{n+2} + \dots + x_n x_{2n}$$

and the two variable orders $\pi = (x_1, x_{n+1}, \dots, x_n, x_{2n})$ and $\varphi = (x_1, x_2, \dots, x_{2n-1}, x_{2n})$ as an example. Figures 2.2 and 2.3 show the BDDs for variable orders $\pi$ and $\varphi$, respectively. The BDD of $b$ has a size of $O(n)$ when $\pi$ is used, but the size is $O(2^n)$ when $\varphi$ is used.

Figure 2.2. BDD $G^{\pi}_{b}$
Figure 2.3. BDD $G^{\varphi}_{b}$

The problem of deciding whether a given variable ordering can be improved is NP-complete [BW96]. Efficient heuristics have been proposed to find a good variable order for BDDs. For example, Rudell’s sifting algorithm [Rud93] is
quite fast while techniques based on evolutionary algorithms usually yield better results at the cost of a higher run time [DBG96].
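The influence of the variable order in Example 2 can be reproduced with a brute-force Python sketch. It does not build a BDD; it assumes the standard characterization that the nodes on a level of a reduced ordered BDD correspond to the distinct iterated cofactors (with respect to the variables above) that depend on that level's variable. The names `bdd_size` and `b` and the choice $n = 3$ are assumptions of this sketch, and the enumeration is exponential, so it is meant for tiny instances only.

```python
from itertools import product


def bdd_size(f, order):
    """Count internal nodes of the reduced ordered BDD of f under `order`.

    f maps a dict {variable name: 0/1} to 0/1; `order` lists the variable
    names from the top level to the bottom level. The nodes on a level
    correspond to the distinct subfunctions (iterated cofactors of the
    variables above) that depend on that level's variable.
    """
    size = 0
    for k, var in enumerate(order):
        above, below = order[:k], order[k:]
        subfunctions = set()
        for top in product((0, 1), repeat=len(above)):
            fixed = dict(zip(above, top))
            # Truth table of the iterated cofactor; `var` is the most
            # significant position, so the first half has var = 0.
            table = tuple(
                f({**fixed, **dict(zip(below, rest))})
                for rest in product((0, 1), repeat=len(below))
            )
            half = len(table) // 2
            if table[:half] != table[half:]:  # subfunction depends on var
                subfunctions.add(table)
        size += len(subfunctions)
    return size


def b(a, n=3):
    # b = x1*x(n+1) + x2*x(n+2) + ... + xn*x(2n), as in Example 2
    return int(any(a[f"x{i}"] and a[f"x{n + i}"] for i in range(1, n + 1)))


n = 3
pi = [f"x{j}" for i in range(1, n + 1) for j in (i, n + i)]  # x1, x4, x2, x5, x3, x6
phi = [f"x{i}" for i in range(1, 2 * n + 1)]                 # x1, x2, ..., x6

size_pi = bdd_size(b, pi)
size_phi = bdd_size(b, phi)
```

For $n = 3$ the sketch yields 6 internal nodes for $\pi$ and 14 for $\varphi$, in line with the linear versus exponential growth stated in Example 2.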
2.1.2.1 Efficient Implementation
Using BDDs in practice is relatively easy since efficient BDD packages are available, e.g. CUDD [Som01a]. A BDD node $v$ is stored as a triple (Index($v$), Then($v$), Else($v$)), where Then($v$) and Else($v$) are pointers to the memory locations of the children of $v$. The least significant bits of these pointers are always zero in modern computers that address the memory word-wise. To save memory, the attribute CE($v$, Else($v$)) is stored in the least significant bit of the pointer to Else($v$). A hash is used to uniquely store the tuples representing all nodes. This hash is called the unique table.

An advantage of BDDs is the efficiency of Boolean operations [Bry86]. Consider a Boolean operation $\circ \in \{\cdot, +, \oplus, \leftrightarrow\}$. Given two functions $f$ and $g$, the result of $f \circ g$ is calculated as follows:

$$f \circ g = (x f_x + \overline{x} f_{\overline{x}}) \circ (x g_x + \overline{x} g_{\overline{x}}) = x (f_x \circ g_x) + \overline{x} (f_{\overline{x}} \circ g_{\overline{x}}) \tag{2.1}$$
Given the BDD nodes representing the functions f and g, this corresponds to the construction of a node to represent f ◦ g. This node is determined by recursively calculating the result of the operation on the children. By using the unique table, an existing node is reused if the function was already represented within the BDD package. Otherwise, a new node is created. This guarantees that only reduced BDDs are created and no additional reduction step is necessary. A second hash, the computed table, is used to efficiently carry out the recursive descent. The computed table is accessed via the operands and the operation as key. The value stores the result of the operation. Each time a result is calculated this is stored in the computed table. Therefore, each pair of nodes is only considered once with respect to a particular binary Boolean operation.
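The interplay of unique table and computed table can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the implementation of CUDD or of the book: nodes are integers instead of pointers, there are no complemented edges (so two terminals 0 and 1 exist), the variable order is the index order, and the class and method names (`BDD`, `mk`, `apply`) are hypothetical.

```python
import operator


class BDD:
    """Tiny reduced ordered BDD without complemented edges (illustrative)."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        # Nodes are integers; 0 and 1 are the terminals, placed at level
        # num_vars + 1 so that every internal node is above them.
        self.nodes = {0: (num_vars + 1, None, None),
                      1: (num_vars + 1, None, None)}
        self.unique = {}     # (index, then, else) -> node id
        self.computed = {}   # (op, f, g) -> result node id

    def mk(self, index, then, else_):
        if then == else_:                 # redundant node: skip it
            return then
        key = (index, then, else_)
        if key not in self.unique:        # otherwise reuse the existing node
            node = len(self.nodes)
            self.nodes[node] = key
            self.unique[key] = node
        return self.unique[key]

    def var(self, index):
        return self.mk(index, 1, 0)

    def cofactors(self, f, index):
        i, then, else_ = self.nodes[f]
        return (then, else_) if i == index else (f, f)

    def apply(self, op, f, g):
        if f in (0, 1) and g in (0, 1):   # both terminal: evaluate op
            return op(f, g)
        key = (op, f, g)
        if key not in self.computed:      # each operand pair is expanded once
            x = min(self.nodes[f][0], self.nodes[g][0])  # top variable
            f1, f0 = self.cofactors(f, x)
            g1, g0 = self.cofactors(g, x)
            self.computed[key] = self.mk(
                x, self.apply(op, f1, g1), self.apply(op, f0, g0))
        return self.computed[key]


bdd = BDD(3)
x1, x2, x3 = bdd.var(1), bdd.var(2), bdd.var(3)
# The same function built in two different ways ends up as the same node.
f = bdd.apply(operator.or_, bdd.apply(operator.and_, x1, x2), x3)
g = bdd.apply(operator.or_, x3, bdd.apply(operator.and_, x2, x1))
```

Because `mk` consults the unique table before creating a node, equal functions are represented by the same node id, so the equivalence of `f` and `g` reduces to the comparison `f == g`.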
2.1.3 Boolean Satisfiability
Besides BDDs, solvers for Boolean Satisfiability (SAT) provide a powerful reasoning engine for Boolean problems. In CAD of circuits and systems, problems are frequently transformed into an instance of SAT. Then, a SAT solver can be used to calculate a solution and the SAT solution is transformed back into the original problem domain. In particular, SAT solvers are often used as the underlying engine for formal verification. In this work, SAT solvers are applied for diagnosis. Moreover, the concepts will be important when considering the underlying algorithms for Boolean function manipulation. The SAT problem and efficient algorithms to solve a given SAT instance are reviewed in this section.
Given a Boolean function f(x1, . . . , xn) in conjunctive normal form, the SAT problem is to find an assignment a = (a1, . . . , an) for x1, . . . , xn such that f(a1, . . . , an) = 1 or to prove that no such assignment exists. For the corresponding decision problem, the question whether such an assignment a exists has to be answered. This was the first problem that was proven to be NP-complete [Coo71]. Despite this proven difficulty of the problem, algorithms for SAT solving have been proposed recently that efficiently solve many practical SAT instances.
2.1.3.1 SAT Solver
SAT solvers usually work on a database that represents the Boolean formula in Conjunctive Normal Form (CNF), also called product of sums. A CNF formula is a conjunction (product) of clauses where each clause is a disjunction (sum) of literals. Finally, a literal is a variable or its complement. The objective during SAT solving is to find a satisfying assignment for the given Boolean formula or to prove that no such assignment exists. A CNF formula is satisfied if all clauses are satisfied. A clause is satisfied if at least one literal in the clause is satisfied. The literal $x$ is satisfied if the value 1 is assigned to variable x. The literal $\overline{x}$ is satisfied if the value 0 is assigned to variable x. If there exists a satisfying assignment for a formula, the formula is said to be satisfiable, otherwise the formula is unsatisfiable.
Example 3. The following Boolean formula is given in CNF:
$$f(x_1, x_2, x_3, x_4) = \underbrace{(x_1 + \overline{x}_3 + x_4)}_{w_1} \cdot \underbrace{(x_1 + \overline{x}_3 + \overline{x}_4)}_{w_2} \cdot \underbrace{(x_1 + x_3 + x_4)}_{w_3} \cdot \underbrace{(x_1 + x_3 + \overline{x}_4)}_{w_4} \cdot \underbrace{(\overline{x}_1 + \overline{x}_2 + x_3)}_{w_5}$$
This CNF formula has five clauses w1, . . . , w5. A satisfying assignment for the formula is given by x1 = 1, x2 = 1, x3 = 1 and x4 = 0. Therefore this formula is satisfiable.
Modern SAT solvers are based on the DLL procedure that was first introduced in [DLL62] as an improvement upon [DP60]. Often the DLL procedure is also referred to as DPLL. In principle, this algorithm explores the search space of all assignments by a backtrack search as shown in Figure 2.4. Iteratively, a decision is made by choosing a variable and a value for this variable according to a decision heuristic (Step 1). Then, implications due to this assignment are carried out (Step 2). When all clauses are satisfied, the problem is solved (Step 3). Otherwise, the current assignment may only be partial and therefore no conclusion is possible yet. In this case, further assignments are necessary (Step 4). If at least one clause cannot be satisfied under the current (partial) assignment, conflict analysis is carried out as will be explained below.
1. Decision: Choose an unassigned variable and assign a new value to the variable.
2. Boolean Constraint Propagation: Carry out implications resulting from the previous assignment.
3. Solution: If all clauses are satisfied, output the current variable assignment and return “satisfiable.”
4. If there is no unsatisfied clause due to the current assignment, proceed with Step 1.
5. Conflict analysis: If the current assignment leads to at least one unsatisfied clause without unassigned literals, carry out conflict analysis and add conflict clauses.
6. (Non-chronological) Backtracking: Undo the most recent decision where switching the variable could lead to a solution, undo all implications due to this assignment and switch the variable value. Go to Step 2.
7. Unsatisfiable: Return “unsatisfiable.”
Figure 2.4. DPLL procedure
Then, a new branch in the search tree is explored by switching the variable value (Step 6). When there is no decision to undo, the search space has been completely explored and the instance is unsatisfiable (Step 7).
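The backtrack search can be illustrated by a short recursive sketch. It is a deliberately simplified variant and not the procedure of any particular solver: backtracking is chronological, no conflict clauses are learned, and the decision heuristic simply picks the first literal of the first remaining clause. Literals are encoded as signed integers (3 for x3, -3 for its complement), which is a common convention assumed here. The clause polarities of the example formula are those reconstructed from Examples 4 and 5 and should be read as an assumption.

```python
from typing import Dict, List, Optional

Clause = List[int]


def simplify(clauses: List[Clause], lit: int) -> List[Clause]:
    """Make 'lit' true: drop satisfied clauses, remove the falsified literal elsewhere."""
    return [[l for l in c if l != -lit] for c in clauses if lit not in c]


def dpll(clauses: List[Clause],
         assignment: Optional[Dict[int, bool]] = None) -> Optional[Dict[int, bool]]:
    """Return a (possibly partial) satisfying assignment, or None if unsatisfiable."""
    assignment = dict(assignment or {})
    # Boolean Constraint Propagation: assign literals of unit clauses.
    while True:
        if any(len(c) == 0 for c in clauses):
            return None                      # an empty clause is a conflict
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0
        clauses = simplify(clauses, unit)
    if not clauses:
        return assignment                    # all clauses are satisfied
    # Decision: choose a variable and try both values.
    var = abs(clauses[0][0])
    for value in (False, True):
        lit = var if value else -var
        result = dpll(simplify(clauses, lit), {**assignment, var: value})
        if result is not None:
            return result
    return None                              # both branches failed


# w1..w5 in the signed-integer encoding (polarities are an assumption)
formula = [[1, -3, 4], [1, -3, -4], [1, 3, 4], [1, 3, -4], [-1, -2, 3]]
model = dpll(formula)
```

Variables that do not occur in the returned dictionary are free, i.e. any value for them completes the assignment to a satisfying one.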
2.1.3.2 Advances in SAT
Only after some substantial improvements over the basic DPLL procedure in the recent past did SAT solvers become a powerful engine for solving real-world problems. In particular, these improvements were: efficient Boolean Constraint Propagation (BCP), conflict analysis together with non-chronological backtracking, and sophisticated decision heuristics.
BCP carries out implications due to previous decisions. In order to satisfy a CNF formula, all clauses must be satisfied. Now, assume that under the current partial assignment all but one literal in a clause evaluate to 0 and the variable of the last literal is unassigned. Then, the value of this last variable can be implied in order to evaluate the clause to 1.
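The implication rule can be sketched as a naive propagation loop. Unlike the architectures used in modern solvers, this sketch touches every clause in every pass; it only serves to show when a value is implied and when a conflict is detected. The signed-integer encoding of literals and the clause polarities of the example formula are assumptions made for this illustration.

```python
from typing import Dict, List, Optional, Tuple

Clause = List[int]


def bcp(clauses: List[Clause],
        assignment: Dict[int, bool]) -> Tuple[Dict[int, bool], Optional[Clause]]:
    """Extend 'assignment' by all implications; also return a conflicting clause, if any."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue                      # clause is already satisfied
            unassigned = [l for l in clause if abs(l) not in assignment]
            if not unassigned:
                return assignment, clause     # all literals evaluate to 0: conflict
            if len(unassigned) == 1:          # all but one literal evaluate to 0
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment, None


# Five clauses w1..w5 over x1..x4 (polarities are an assumption)
formula = [[1, -3, 4], [1, -3, -4], [1, 3, 4], [1, 3, -4], [-1, -2, 3]]

implied, conflict = bcp(formula, {1: True, 2: True})          # x1 = 1, x2 = 1
implied2, conflict2 = bcp(formula, {1: False, 3: True})       # x1 = 0, x3 = 1
```

In the first call the last clause implies x3 = 1 and no conflict occurs. In the second call the first clause implies x4 = 1, after which the second clause has no satisfied or unassigned literal left and is returned as the conflicting clause.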
Example 4. Again, consider the CNF formula from Example 3. Assume the partial assignment x1 = 1 and x2 = 1. Then, due to clause $w_5 = (\overline{x}_1 + \overline{x}_2 + x_3)$, the assignment x3 = 1 can be implied.
After each decision BCP has to be carried out and, therefore, the efficiency of this procedure is crucial for the overall performance. In [MMZ+01] an efficient architecture for BCP was presented for the SAT solver Chaff (the source code of the implementation Zchaff can be downloaded from [Boo04]). The basic idea is to use the two-literal watching scheme to efficiently detect where an implication may be possible. Two literals of each clause are watched. Only if one of these literals evaluates to 0 upon a previous decision and the other literal is unassigned, an implication may occur for the clause. If no implication occurs because there is a second unassigned literal, this second literal is watched. For each literal a watching list is stored to efficiently access those clauses where the particular literal is watched. Therefore, instead of always touching all clauses in the database, only those clauses that may cause an implication are considered.
Conflict analysis was first proposed in [MS96, MS99] for the SAT solver GRASP. In the traditional DPLL procedure only the most recent decision was undone when a conflict, i.e. a clause that is unsatisfied under the current assignment, was detected. In contrast, a modern SAT solver analyzes such a conflict. During BCP, a conflict occurs if opposite values are implied for a single variable due to different clauses. Then, the decisions that were responsible for this conflict are detected. These decisions are the reason for the conflict. From this reason a conflict clause is created to prevent the solver from reentering the same search space. As soon as all but one literal of the conflict clause are assigned, BCP takes over and implies the value of the remaining literal. As a result the previously conflicting assignment is not considered again.
Finally, the SAT solver backtracks to the decision before the last decision that participated in the conflict. Switching the value of the last decision that led to the conflict is done by BCP due to the inserted conflict clause. So this value assignment becomes an implication instead of a decision, which is also called a conflict driven assertion.
Figure 2.5. Decision stack: (a) Configuration 1, (b) Configuration 2, (c) Configuration 3
Example 5. Again, consider the CNF formula from Example 3:
$$f(x_1, x_2, x_3, x_4) = (x_1 + \overline{x}_3 + x_4) \cdot (x_1 + \overline{x}_3 + \overline{x}_4) \cdot (x_1 + x_3 + x_4) \cdot (x_1 + x_3 + \overline{x}_4) \cdot (\overline{x}_1 + \overline{x}_2 + x_3)$$
Each time the SAT solver makes a decision, this decision is pushed onto the decision stack. Now, assume that the first decision at decision level L0 is the assignment x1 = 0. No implications follow from this decision. Then, the solver decides x2 = 0 at L1. Again, no implications follow. The solver decides x3 = 1 at L2. Now, according to clause w1 the assignment x4 = 1 is implied,
but also, due to w2 , the assignment x4 = 0 is implied. Therefore, a conflict with respect to variable x4 occurs. This situation is shown in Figure 2.5(a). The decision stack is shown on the left hand side. The solver tracks reasons for assignments using an implication graph (shown on the right hand side). Each node represents an assignment. Decisions are represented by nodes without predecessors. Each implied assignment has the reason that caused the assignment as its predecessors. The edges are labeled by the clauses that cause an assignment. In the example, the decisions x1 = 0 and x3 = 1 caused the assignment x4 = 1 due to clause w1 . Additionally, this caused the assignment x4 = 0 due to w2 and a conflict results. By traversing the graph backwards, the reason for the conflict, i.e. x1 = 0 and x3 = 1, can be determined. Now, it is known that this assignment must be avoided in order to satisfy the CNF formula. This information is stored by adding the conflict clause
$w_6 = (x_1 + \overline{x}_3)$ to the CNF formula. Thus, the nonsolution space is recognized earlier while searching; this is also called conflict based learning. The decision x3 = 1 is undone. Due to x1 = 0 and the conflict clause w6, the assignment x3 = 0 is implied, which is called a conflict driven assertion. The implication x3 = 0 triggers a next conflict with respect to x4 as shown in Figure 2.5(b). The single reason for this conflict is the decision x1 = 0. So the conflict clause $w_7 = (x_1)$ is added. Now, the solver backtracks above decision level L0. This happens because the decision x2 = 0 was not a reason for the conflict. Instead, nonchronological backtracking occurs: the solver undoes any decision up to the most recent decision that was involved in the conflict. Therefore, in the example, the decisions x2 = 0 and x1 = 0 are undone. Due to the conflict clause w7, the assignment x1 = 1 is implied independent of any decision as shown in Figure 2.5(c). Then, the decision x2 = 1 is made at L0. For efficiency reasons the SAT solver does not check whether all clauses are satisfied under this partial assignment but only detects conflicts. Finally, a satisfying assignment is found by deciding x4 = 0 at L1.
In summary, this example shows on an informal basis how a modern SAT solver carries out conflict analysis and uses conflict clauses “to remember” nonsolution spaces. A large number of added conflict clauses may result in memory problems. This is resolved by removing conflict clauses from time to time, which does not change the initial problem instance. A formal and more detailed presentation of the technique can be found in [MS99]. The algorithms to derive conflict clauses have been further improved, e.g. in [ZMMM01, ES04]. A result of this learning is a drastic speed-up of the solving process, in particular also for unsatisfiable formulas.
The last major improvement of SAT solvers results from sophisticated decision heuristics.
Basically, the SAT solver dynamically collects statistics about the occurrence of literals in clauses. A dynamic procedure is used to keep track of conflict clauses added during the search. An important observation is that locality is achieved by exploiting recently learned information. This helps to speed up the search. An example is the Variable State Independent Decaying Sum (VSIDS) strategy employed in [MMZ+ 01]. A counter exists for each literal to count the number of occurrences in clauses. Each time a conflict clause is added, the counters are incremented accordingly. The value of these counters is regularly divided by two. This helps to emphasize the influence of more recently learned clauses. But a large number of other heuristics has also been investigated, e.g. in [Mar99, GN02, JS05]. Another ingredient to modern SAT solvers is a powerful preprocessing step as proposed in [Dre04, EB05, JS05, EMS07]. The original CNF formula is usually a direct mapping of the problem onto a CNF representation. No optimizations are carried out, e.g. clauses with only one literal are frequently
contained in this original CNF formula, but these can be eliminated without changing the solution space. When preprocessing the CNF formula, optimizations are applied to make the representation more compact and to improve the performance of BCP. Due to these advances, SAT solvers have become the state of the art for solving a large range of problems in CAD, e.g. formal verification [BCCZ99, KPKG02], debugging or diagnosis [SVV04, ASV+ 05, FSVD06], and test pattern generation [SBSV96, SFD+ 05b].
2.2 Circuits
Circuits are considered throughout the design flow. Often formal definitions for circuits aim at a special purpose like synthesis [HS96] or ATPG [KS97]. In this book a more general definition is used to also cope with different tasks, like verification, simulation, and diagnosis. After defining circuits for the sequential and the combinational case, the mapping of a BDD to a circuit is introduced. Finally, the transformation of a circuit into a CNF formula which is necessary when applying a SAT solver is explained.
2.2.1 Circuits and Traces
A circuit is usually composed of the elements of a set of basic gates. This set of gates is called the library. One example of such a library is the set of gates shown in Figure 2.6. These are the well-known gates that correspond to Boolean operators: AND, OR, XOR, and NOT. If necessary, it is straightforward to extend this library to other Boolean gates. In the following, the library usually consists of all Boolean functions with a single output. Where necessary, the library may be restricted to consider only a subset of all Boolean functions.
Figure 2.6. Basic gates: AND, OR, XOR, NOT
The connections between gates are defined by an underlying graph structure. Additionally, for gates that represent nonsymmetric functions (e.g. multiplexors) a unique order for the inputs is given by ordering the predecessors of a gate.
Definition 1. A sequential circuit is defined by C = (V, E, X, Y, S, N, F, P) where
- An acyclic directed graph G = (V, E) defines the connections
- X = {x1, . . . , xn} ⊆ V is the set of primary inputs
- Y = {y1, . . . , ym} ⊆ V is the set of primary outputs
- S = {s1, . . . , sl} ⊆ V is the set of present state nodes
- N = {n1, . . . , nl} ⊆ V is the set of next state nodes
- F : V → (B∗ → B) associates a Boolean function fv = F(v) to a node v (projection functions of variables are assigned to input nodes and present state nodes)
- P : (V \ (X ∪ S)) → (V \ (Y ∪ N))∗ is an ordered tuple of predecessors of v: P(v) = (w1, . . . , wp)
Thus, P(v) describes the input variables of fv.
A gate of a circuit C = (V, E, X, Y, S, N, F, P) is a node g ∈ V. This is often denoted by g ∈ C. The size of a circuit C is denoted by |C| and is equal to the number of gates |V|. For convenience, the output signal of a gate g is often referred to as signal g. If a propositional variable is needed for gate g, this variable is also denoted by g. For any gate g the Boolean function of g in terms of primary inputs and present state values is denoted by Fg. This function is retrieved by recursively substituting the variables in fg with the functions of predecessors of g.
Definition 2. A controlling value at the input of a gate determines the value of the output independently of the values at other inputs.
Example 6. The value 1 (0) is the controlling value for OR (AND), and the value 0 (1) is the non-controlling value for OR (AND). An XOR-gate does not have a controlling input value.
These notations can be extended to handle gates with multiple outputs and hierarchical circuits. The extension is straightforward and therefore omitted. All the practical implementations of the techniques presented in this book handle these cases when necessary.
Definition 3. A combinational circuit C = (V, E, X, Y, S, N, F, P) is a circuit without state elements, i.e. S = ∅ and N = ∅.
A circuit with state elements may also be referred to as sequential circuit. For brevity a combinational circuit C = (V, E, X, Y, ∅, ∅, F, P) may be denoted by C = (V, E, X, Y, F, P).
The value of gate g at time step t is denoted by νg [t]. If the value is unknown or not important, this may be denoted by the values ‘U ’ or ‘−’, respectively.
This may be particularly useful to describe a counterexample where the values of some signals are not important to excite a malfunction. In this case νg[t] ∈ {0, 1, −, U} but often νg[t] ∈ B is sufficient.
Definition 4. A simulation trace T of length tcyc for a circuit C is given by a tuple $(U, (u_0, \ldots, u_{t_{cyc}-1}))$, where
- U = (g1, . . . , gr) is a vector of r gates gj ∈ C, 1 ≤ j ≤ r, and
- ut = (νg1[t], . . . , νgr[t]) gives the values of these gates at time step t.
Example 7. Consider the waveforms in Figure 2.7(a) produced by the sequential circuit in Figure 2.8. For synchronously clocked circuits as studied in this book, the waveform can directly be mapped into the vector notation that is shown in Figure 2.7(b). Together with the vector U = (x2, x1, s1, s2, s3) this forms the simulation trace T = (U, (u0, . . . , u5)). Thus, a simulation trace directly corresponds to a waveform, e.g. given in the widely used Value Change Dump (VCD) format that is specified in IEEE Std 1364-1995.
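A simulation trace in the sense of Definition 4 can be produced by a few lines of code. The sketch below simulates the 1-bit-shift-register; it assumes the behavior described later for this circuit, namely that x2 = 0 shifts the value of x1 into the register and x2 = 1 keeps the current state. The function name and the representation of the trace as nested tuples are choices made for this illustration.

```python
from typing import Sequence, Tuple

Trace = Tuple[Tuple[str, ...], Tuple[Tuple[int, ...], ...]]


def simulate_shift_register(x1_values: Sequence[int],
                            x2_values: Sequence[int],
                            initial_state: Tuple[int, int, int] = (0, 0, 0)) -> Trace:
    """Return the simulation trace (U, (u_0, ..., u_{tcyc-1})) of the shift-register."""
    s1, s2, s3 = initial_state
    U = ("x2", "x1", "s1", "s2", "s3")
    vectors = []
    for x1, x2 in zip(x1_values, x2_values):
        vectors.append((x2, x1, s1, s2, s3))   # values of all observed gates at time t
        if x2 == 0:                            # shifting mode (assumed semantics)
            s1, s2, s3 = x1, s1, s2
        # x2 == 1: keep the current value
    return U, tuple(vectors)


# Input values over six time steps, with all registers initially 0
U, vectors = simulate_shift_register(x1_values=[1, 1, 0, 1, 0, 0],
                                     x2_values=[0, 0, 0, 0, 0, 0])
```

Each element of vectors corresponds to one column ut of the vector representation, ordered as given by U.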
Figure 2.7. Simulation trace for the shift-register: (a) Waveform, (b) Vector representation of the waveforms

      u0  u1  u2  u3  u4  u5
x2    0   0   0   0   0   0
x1    1   1   0   1   0   0
s1    0   1   1   0   1   0
s2    0   0   1   1   0   1
s3    0   0   0   1   1   0
t     0   1   2   3   4   5
Figure 2.8. 1-bit-shift-register
Figure 2.9. Multiplexor cell MUX
2.2.2 BDD Circuits
BDDs directly correspond to Boolean circuits composed of multiplexors as explained in [Bec92]. Such circuits are called BDD circuits in this work. More exactly: BDD circuits are combinational logic circuits defined over a fixed library. The typical multiplexor cell is denoted as MUX, and it is defined as shown in Figure 2.9 by its standard AND-, OR-, NOT-based realization. The left input is called control input, the upper inputs are called data inputs (left data input = 0-input, right data input = 1-input). Results reported for BDD circuits in this book also transfer to different realizations, e.g. the realization of a multiplexor in Pass Transistor Logic (PTL).
The BDD circuit of a BDD is now obtained by the following construction: Traverse the BDD in topological order and replace each internal node v in the BDD by a MUX cell; connect the control input to the primary input Label(v), corresponding to the label of the BDD node. Then, connect the 1-input to the output of the multiplexor for Then(v), connect the 0-input to the multiplexor for Else(v) and insert an inverter if CE(v, Else(v)). Finally, substitute the output nodes by primary outputs and connect these outputs to the multiplexors of their successors; insert an inverter if the edge to the successor is complemented.
Example 8. Figure 2.10 shows an example for the transformation. The original BDD is shown in Figure 2.10(a). Note that the root node in this case is shown on the bottom and the terminal nodes on the top. The corresponding BDD circuit can be seen in Figure 2.10(b).
Remark 1. As has been suggested in previous work [Bec92, ADK93], the MUX cells connected to constant values can be simplified. But this reduction is not applied to the BDD circuits considered in this book unless stated otherwise. The reason is a degradation of the testability due to the optimization as will be shown in Section 4.2.
Figure 2.10. Example for a BDD circuit: (a) BDD, (b) BDD circuit
More details on BDD circuits and their applications in the design flow can be found, e.g. in [DG02].
2.2.3 Transformation into CNF
A SAT solver can be applied as a powerful black-box engine to solve a problem. In this case, transforming the problem instance into a SAT instance and the SAT solution into a solution for the original problem is crucial. In particular, the transformation of the circuit into a CNF formula is one step for multiple applications that can be implemented using a SAT prover as a core engine, e.g. ATPG, property checking, or debugging. Commonly, the Tseitin transformation [Tse68] is used that is defined for Boolean expressions. For each subformula a new propositional variable is introduced and constrained to be equivalent to the subformula. For example, in [Lar92] the application to circuits has been presented.
The transformation of a single AND-gate into a set of clauses is shown in Table 2.1. The goal is to create a CNF formula that models an AND-gate, i.e. a CNF formula that is only satisfied for assignments that may occur for an AND-gate. For an AND-gate with two inputs x1 and x2, the output y must always be equal to x1 · x2. The truth-table for this CNF formula is shown in Table 2.1(a). From the truth-table a CNF formula is generated by extracting one clause for each assignment where the formula evaluates to 0. These clauses are shown in Table 2.1(b). This CNF representation is not minimal and can therefore be reduced by two-level logic minimization, e.g. using the tool ESPRESSO that is included in SIS [SSL+92]. The clauses in Table 2.1(c) are the final result.
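The extraction of clauses from the truth-table can be reproduced with a small sketch. It generates one clause for each assignment where the constraint y ↔ x1 · x2 evaluates to 0 and then compares the result with a minimized clause set by exhaustive enumeration. Literals are encoded as signed integers (1, 2, 3 for x1, x2, y), and the helper names are assumptions made for this illustration; no logic minimizer is called.

```python
from itertools import product
from typing import Callable, List, Sequence

Clause = List[int]


def clauses_from_truth_table(constraint: Callable[..., bool], num_vars: int) -> List[Clause]:
    """One clause per assignment where 'constraint' is 0; it excludes exactly that assignment."""
    clauses = []
    for bits in product((0, 1), repeat=num_vars):
        if not constraint(*bits):
            # A variable assigned 1 appears complemented, a variable assigned 0 uncomplemented.
            clauses.append([-(i + 1) if b else (i + 1) for i, b in enumerate(bits)])
    return clauses


def satisfies(clauses: Sequence[Clause], bits: Sequence[int]) -> bool:
    """True if every clause contains at least one literal satisfied by 'bits'."""
    return all(any((bits[abs(l) - 1] == 1) == (l > 0) for l in c) for c in clauses)


def and_gate(x1: int, x2: int, y: int) -> bool:
    return y == (x1 & x2)


full = clauses_from_truth_table(and_gate, 3)          # four clauses
minimized = [[1, -3], [2, -3], [-1, -2, 3]]           # three clauses after minimization

equivalent = all(satisfies(full, bits) == satisfies(minimized, bits) == and_gate(*bits)
                 for bits in product((0, 1), repeat=3))
```

Both clause sets are satisfied by exactly those assignments that are consistent with the AND-gate, which is what the enumeration checks.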
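Table 2.1. Transformation of an AND-gate into a CNF formula

(a) Truth-table
x1  x2  y   y ↔ x1 · x2
0   0   0   1
0   0   1   0
0   1   0   1
0   1   1   0
1   0   0   1
1   0   1   0
1   1   0   0
1   1   1   1

(b) Clauses
$(x_1 + x_2 + \overline{y}) \cdot (x_1 + \overline{x}_2 + \overline{y}) \cdot (\overline{x}_1 + x_2 + \overline{y}) \cdot (\overline{x}_1 + \overline{x}_2 + y)$

(c) Minimized
$(x_1 + \overline{y}) \cdot (x_2 + \overline{y}) \cdot (\overline{x}_1 + \overline{x}_2 + y)$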
For a gate g a propositional variable g is also used in the CNF formula when a circuit is considered in the following. This simplifies understanding and notation of CNF formulas that correspond to circuits. The Boolean expression ψg describes the constraints needed to model g. Now, the generation of the CNF formula for a complete circuit is straightforward. The Boolean expressions describing the gates are conjoined into one CNF formula. Clauses are generated for each gate according to the type. Given the circuit C = (V, E, X, Y, S, N, F, P), the Boolean expression to model the whole circuit is given by $\psi_C = \prod_{g \in V} \psi_g$. If all subexpressions ψg are given in CNF representation, the overall expression is in CNF. The output variables of a gate and input variables of the successors are identical and therefore reflect the connections between gates within the CNF formula. Note that this only models the circuit for one time step. Modeling the sequential behavior will be considered later.
Example 9. Consider the circuit shown in Figure 2.11. The OR-gate is described by the formula ψy = (y ↔ a + b). The primary input x1 is described by ψx1 = x1. As a result, the circuit is translated into the following CNF formula:
$$\underbrace{(x_1 + \overline{a}) \cdot (x_2 + \overline{a}) \cdot (\overline{x}_1 + \overline{x}_2 + a)}_{a \leftrightarrow x_1 \cdot x_2} \cdot \underbrace{(x_3 + b) \cdot (\overline{x}_3 + \overline{b})}_{b \leftrightarrow \overline{x}_3} \cdot \underbrace{(\overline{a} + y) \cdot (\overline{b} + y) \cdot (a + b + \overline{y})}_{y \leftrightarrow a + b}$$
Figure 2.11. Example for the conversion into CNF
An advantage of this transformation is the linear size complexity. Given a circuit where |C| is the sum of the numbers of inputs, outputs, and gates, the number of variables in the SAT instance is also |C| and the number of clauses is in O(|C|). A disadvantage is the loss of structural information. Only a set of clauses is given to the SAT solver. Information about predecessors and successors of a node is lost and is not used during the SAT search. But this information can be partially recovered for certain applications by introducing additional constraints into the SAT instance as proposed in [Sht01] for bounded model checking and in [SBSV96, SFD+ 05b] for test pattern generation.
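The gate-wise generation of clauses can be sketched as follows. The circuit used here is a small example with an AND-gate a, an inverter b and an OR-gate y; reading b as an inverter of x3 is an assumption, as are the variable numbering and the function names. The clause count grows by a constant amount per gate input, which reflects the linear size of the transformation.

```python
from itertools import product
from typing import List, Sequence, Tuple

Clause = List[int]
Gate = Tuple[str, int, Tuple[int, ...]]   # (type, output variable, input variables)


def gate_clauses(kind: str, out: int, ins: Sequence[int]) -> List[Clause]:
    """Clauses that constrain 'out' to be equivalent to the gate function of 'ins'."""
    if kind == "AND":
        return [[i, -out] for i in ins] + [[-i for i in ins] + [out]]
    if kind == "OR":
        return [[-i, out] for i in ins] + [list(ins) + [-out]]
    if kind == "NOT":
        (i,) = ins
        return [[i, out], [-i, -out]]
    raise ValueError(f"unsupported gate type: {kind}")


def circuit_to_cnf(gates: Sequence[Gate]) -> List[Clause]:
    """Conjoin the constraints of all gates into one clause list."""
    cnf: List[Clause] = []
    for kind, out, ins in gates:
        cnf.extend(gate_clauses(kind, out, ins))
    return cnf


# Variable numbering (an assumption): x1=1, x2=2, x3=3, a=4, b=5, y=6
circuit = [("AND", 4, (1, 2)), ("NOT", 5, (3,)), ("OR", 6, (4, 5))]
cnf = circuit_to_cnf(circuit)


def satisfies(clauses: Sequence[Clause], bits: Sequence[int]) -> bool:
    return all(any((bits[abs(l) - 1] == 1) == (l > 0) for l in c) for c in clauses)


# The CNF is satisfied exactly by the consistent evaluations of the circuit.
consistent = all(
    satisfies(cnf, (x1, x2, x3, a, b, y)) == (a == (x1 & x2) and b == 1 - x3 and y == (a | b))
    for x1, x2, x3, a, b, y in product((0, 1), repeat=6)
)
```

Since the primary inputs are left unconstrained, the formula has one satisfying assignment per input assignment.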
2.3 Formal Verification
Formal verification covers mainly two aspects of the design flow. The verification of the initial HDL description of the design is addressed by model checking. The correctness of subsequent synthesis steps is verified by equivalence checking. These two techniques are introduced in this section. The simpler presentation for equivalence checking is given first. The text book [Kro99] gives a more comprehensive introduction and overview of techniques for formal verification.
2.3.1 Equivalence Checking
Formal equivalence checking determines whether the functions realized by two given circuits are identical. In the following, the equivalence checking problem for two combinational circuits is considered. Matching the primary inputs (outputs) of one circuit with those of the other circuit is a difficult problem itself [MMM02]. But this is not the focus of this work and, therefore, the mapping is assumed to be given. A common approach to carry out the equivalence check is to create a miter circuit [Bra83].
Figure 2.12. Miter circuit for equivalence checking
Example 10. Given a circuit CE = (VE, EE, X, Y, FE) and its specification CS = (VS, ES, X, Y, FS) with X = (x1, x2, x3) and Y = (y1, y2), the miter circuit is built as shown in Figure 2.12. The output of the miter assumes the value 1 if, and only if, the current input assignment causes at least one pair
of outputs to assume different values. Such an input assignment is called a counterexample (see Definition 5 below). The two circuits are equivalent if no such input assignment exists.
One possibility to solve the equivalence checking problem is to transform the miter circuit into a SAT instance and constrain the output to the value 1. The resulting SAT instance is unsatisfiable if the two circuits are equivalent. The SAT instance can be satisfied if implementation and specification differ in at least one output value under the same input assignment. In this case, SAT solving returns a single counterexample. An all solutions SAT solver [GSY04, LHS04] could be used if more than one counterexample is needed. Alternatively, BDDs could be used to calculate the counterexamples for each output symbolically. For output yi all counterexamples are represented by:
$$F_{E,y_i} \oplus F_{S,y_i} \tag{2.2}$$
But this approach is limited due to the potentially large size of BDDs. Therefore, in practice, structural information is usually exploited to simplify the problem by merging identical subcircuits and multiple engines are applied as proposed in [KPKG02].
For diagnosis and debugging, often a description of the implementation and one or more counterexamples are used. Formally, counterexamples are described as follows:
Definition 5. Let the circuit C be a faulty implementation of a specification. A counterexample T is a triple (T, g, ν), where
- T is a simulation trace of C
- T causes an erroneous value at gate g
- ν is the correct value for gate g
A test-set T is a set of counterexamples.
For combinational equivalence checking the trace has a length of one time frame and the trace is defined solely over primary inputs. The fault is always observed at a primary output. If the counterexample is calculated symbolically, don't-care values may be contained in the trace.
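The miter construction and the shape of a counterexample can be illustrated by a small sketch. Exhaustive enumeration of the input assignments is used here as a stand-in for a SAT solver, which is only feasible for very few inputs. The two circuits, the injected fault and all function names are hypothetical and serve only as an example.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

Circuit = Callable[..., Tuple[int, ...]]
Counterexample = Tuple[Tuple[int, ...], str, int]   # (input trace, output, correct value)


def miter(impl: Circuit, spec: Circuit, inputs: Sequence[int]) -> int:
    """1 if and only if at least one pair of outputs differs under 'inputs'."""
    return int(any(e != s for e, s in zip(impl(*inputs), spec(*inputs))))


def find_counterexample(impl: Circuit, spec: Circuit, num_inputs: int,
                        output_names: Sequence[str]) -> Optional[Counterexample]:
    """Search an input assignment that sets the miter output to 1."""
    for inputs in product((0, 1), repeat=num_inputs):
        if miter(impl, spec, inputs):
            for name, e, s in zip(output_names, impl(*inputs), spec(*inputs)):
                if e != s:
                    return inputs, name, s
    return None   # the miter output is constantly 0: the circuits are equivalent


def spec(x1: int, x2: int, x3: int) -> Tuple[int, int]:
    return x1 & x2, x2 | x3


def impl(x1: int, x2: int, x3: int) -> Tuple[int, int]:
    return x1 & x2, x2 ^ x3      # hypothetical fault: XOR instead of OR at y2


cex = find_counterexample(impl, spec, 3, ("y1", "y2"))
no_cex = find_counterexample(spec, spec, 3, ("y1", "y2"))
```

The returned triple follows the structure of Definition 5: the trace consists of a single input vector, the erroneous value is observed at a primary output, and the third component is the correct value taken from the specification.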
2.3.2 Bounded Model Checking
Model checking (or property checking) [CGP99] is a technique to formally prove the validity of a given property on a model. The property is usually given in some temporal logic and the model is often described in terms of a labeled transition system or a finite state machine. Here, the model is described by a circuit that directly corresponds to a finite state machine: The values of the flip-flops describe a state, the values of the primary inputs describe the input symbol, and the combinational logic describes the transition function. In this context, the atomic propositions for a particular state in a labeled transition system are given by the bits with value 1 of the state vector. Essentially, each formalism can be transformed into the others.
In this book, Bounded Model Checking (BMC) is considered [BCCZ99]. The property is always checked over a finite number of time frames. The advantage of this formulation is the direct correspondence to a SAT instance. The property language may describe properties over infinite intervals like Linear Time Logic (LTL) [Pnu77]. Longer and longer time frames are considered incrementally until either a counterexample is found or the state space diameter is reached. On the other hand, the property language may restrict the length of the time interval. Solving a single SAT instance is then sufficient to prove or disprove the property. By this, the effectiveness is drastically increased. A finite window restricts the expressiveness of the property language but usually circuits also respond within a bounded time interval to stimuli. Therefore, this type of property checking is quite efficient and successfully applied in practice [WTSF04].
Figure 2.13. SAT instance for BMC
The SAT instance for checking a temporal property over a finite interval is shown in Figure 2.13. The circuit is “unrolled” for a finite number of time
frames and a propositional formula corresponding to the property is attached to this unrolling. The property is constrained to evaluate to 0. Therefore, the SAT instance is satisfiable if, and only if, a counterexample exists that shows the invalidity of the property on the circuit. Otherwise, the property is valid.
In more detail, the SAT instance is created as follows: For each time frame one copy of the circuit is created. State elements are converted to inputs and outputs. The next state outputs of time frame t are connected to the present state inputs of time frame t + 1. New variables are used for every copy of the circuit in the SAT instance.
Notation 2. In time frame t, the variable g[t] is used for gate g. The Boolean expression ψg[t] denotes the Boolean constraints for gate g at time t.
Remark 2. Normally, an indexed notation is used to denote variables or different Boolean expressions, while using an array notation to identify different time frames is not usual. But this notation has the advantage of separating the time reference from other indices (e.g. the number i of a particular input xi) and, by this, of improving the readability. Moreover, in Chapter 5 Boolean expressions are derived from simulation traces. The chosen notation helps to understand the equations more easily.
Given the constraint ψg, the constraint ψg[t] is retrieved by substituting all variables with the variable at time frame t, e.g. g is substituted by g[t]. Then, the CNF formula to describe the unrolling of circuit C = (V, E, X, Y, S, N, F, P) for tcyc time frames is given by:
$$\psi_C^{t_{cyc}} = \prod_{t=0}^{t_{cyc}-1} \prod_{g \in V} \psi_g[t] \cdot \prod_{t=0}^{t_{cyc}-2} \prod_{i=1}^{l} \underbrace{(\overline{n}_i[t] + s_i[t+1]) (n_i[t] + \overline{s}_i[t+1])}_{n_i[t] \leftrightarrow s_i[t+1]} \tag{2.3}$$
As a result, the behavior of the circuit over time is modeled. For a bounded finite interval the temporal property directly corresponds to a propositional formula where the variables correspond to variables of gates at particular time frames. By attaching the property to the unrolled circuit, the relationship between signals is evaluated over time. In this book, properties may either be given
1. As an LTL safety property
2. As a propositional formula that refers to signals of the circuit at particular time frames
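The unrolling of Equation (2.3) can be sketched on the level of clause lists. The sketch assumes that the CNF formula of one time frame is given over variables 1, . . . , n and that variable g at time frame t is renamed to g + t · n; this numbering scheme, the function names and the small toggle circuit used as an example are assumptions made for illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple

Clause = List[int]


def unroll(frame_clauses: Sequence[Clause], num_vars: int,
           state_pairs: Sequence[Tuple[int, int]], tcyc: int) -> List[Clause]:
    """Copy the clauses for each time frame and connect n_i[t] with s_i[t+1]."""

    def at(lit: int, t: int) -> int:
        var = abs(lit) + t * num_vars          # fresh variable for every time frame
        return var if lit > 0 else -var

    cnf: List[Clause] = []
    for t in range(tcyc):                      # one copy of the circuit per time frame
        cnf.extend([at(l, t) for l in clause] for clause in frame_clauses)
    for t in range(tcyc - 1):                  # n_i[t] <-> s_i[t+1]
        for n, s in state_pairs:
            cnf.append([-at(n, t), at(s, t + 1)])
            cnf.append([at(n, t), -at(s, t + 1)])
    return cnf


def count_models(cnf: Sequence[Clause], num_vars: int) -> int:
    """Brute-force model count, feasible only for very small instances."""
    return sum(
        all(any((bits[abs(l) - 1] == 1) == (l > 0) for l in c) for c in cnf)
        for bits in product((0, 1), repeat=num_vars)
    )


# Hypothetical toggle circuit: present state s = variable 1, next state n = variable 2,
# with n <-> NOT s in every time frame.
toggle = [[1, 2], [-1, -2]]
unrolled = unroll(toggle, num_vars=2, state_pairs=[(2, 1)], tcyc=3)
```

For the toggle circuit the initial state is the only free choice, so the unrolled formula has exactly two satisfying assignments, one for each value of the state in time frame 0.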
At first, suppose a partial specification of the system is given as an LTL formula. Besides well-known propositional operators, also temporal connectives are available in LTL. The meaning of the temporal operators is informally introduced in the following:
- X p means “p holds in the next time frame”
- G p means “p holds in all time frames”
- F p means “p eventually holds in some time frame”
- p U q means “p holds in all time frames until q holds”
A safety property does not contain the operator F (and no other construct to express this operator, e.g. $\overline{G\,\overline{p}}$).
Now, the LTL formula Ψ has to be checked for tcyc time steps. For this purpose a propositional formula $\psi_\Psi^{t_{cyc}}$ representing the specification is constructed. For each subformula Ω of Ψ and for every time frame t a new propositional variable zΩ[t] is introduced. These variables are related to each other and to the variables used in the unrolling of the circuit as follows. For the temporal connectives, the well-known expansion rules [MP91] are applied which relate the truth value of a formula to the truth values of its subformulas in the same and the next time frame. For instance, G Ψ = Ψ · X G Ψ and F Ψ = Ψ + X F Ψ. The Boolean connectives used in LTL are trivially translated to the corresponding constructs relating the propositional variables. Finally, the truth value of the atomic proposition g at time frame t is equal to the value of the corresponding variable g[t] in the unrolling of the circuit. The final requirement is that the formula is not contradicted by the behavior of the circuit. That is, zΨ[0], the variable corresponding to the specification in time frame 0, is true. As a result, property checking can be done by solving the SAT problem for the following propositional formula:
$$\psi_{BMC} = \overline{z}_\Psi[0] \cdot \psi_\Psi^{t_{cyc}} \cdot \psi_C^{t_{cyc}} \tag{2.4}$$
The formula $\psi_{BMC}$ is unsatisfiable if, and only if, no trace for the circuit C exists such that the specification Ψ does not hold. Put more simply, $\psi_{BMC}$ is unsatisfiable if and only if Ψ is valid on C, i.e. C is a model for Ψ.

Alternatively, a property may be given directly as a propositional formula. In this case, a fixed number of time frames is considered by this property. The length of the window for a propositional property is given by the largest time step referenced by any variable plus one (the first time step is considered to be zero). The property is shifted to an arbitrary time step t by adding t to each time reference.

Figure 2.14. 1-bit-shift-register

Example 11. Again, consider the circuit in Figure 2.14 that was introduced earlier. This is a 1-bit-shift-register with three state registers labeled by the names of the present state nodes $s_1$, $s_2$ and $s_3$. The shift-register has two modes of operation: keeping the current value ($x_2 = 1$) and shifting ($x_2 = 0$). In the shifting mode, the value of input $x_1$ is shifted into the register. After three clock cycles the value is stored in register $s_3$. This behavior is described by the property “If $x_2$ is zero on three consecutive time steps, the value of $x_1$ in the first time step equals $y_1$ in the fourth time step”, which can be written as a formula:

$$\overline{x_2[t]} \cdot \overline{x_2[t+1]} \cdot \overline{x_2[t+2]} \rightarrow (x_1[t] = s_3[t+3]) \qquad (2.5)$$
The length of the window for this property is 4. Similar notions of properties are also used by industrial model checking tools, e.g. [BS01]. Having a window for the property is not a restriction in practice. Very often the length of the window corresponds to a particular number of cycles needed for an operation in the design. In case of the shift-register, this is the number of cycles needed to bring an input value to the output. For a more sophisticated design like a processor this can be the depth of the pipeline, i.e. the number of cycles to process an instruction. Finally, counterexamples are also considered to carry out diagnosis for BMC. Similar to the case of equivalence checking the counterexample is represented by a triple (T, y, ν) as described by Definition 5. This counterexample may either be given with respect to the circuit or with respect to the SAT instance representing the BMC problem. With respect to the circuit, the counterexample is a simulation trace over time, but in general no single erroneous output of the circuit is responsible for the failure of a property. If the counterexample is given with respect to the SAT instance, the failure corresponds to zΨ [0] becoming 0 instead of 1.
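The window property (2.5) can be checked by plain simulation for a circuit of this size. The following C sketch is only an illustration under stated assumptions: the type `State` and the functions `step`, `window_ok` and `count_violations` are hypothetical names, and the next-state behavior is derived from the textual description of the shift-register (keep for x2 = 1, shift for x2 = 0), not from a netlist. It enumerates all start states and all input windows of three time frames instead of calling a SAT solver.

```c
#include <assert.h>
#include <stdbool.h>

/* State of the three registers s1, s2, s3 (hypothetical encoding). */
typedef struct { bool s1, s2, s3; } State;

/* One clock cycle: x2 = 1 keeps the state, x2 = 0 shifts x1 in. */
State step(State s, bool x1, bool x2) {
    if (x2) return s;
    State n = { x1, s.s1, s.s2 };
    return n;
}

/* Check property (2.5) on one window starting in state s0.
 * Bits 0, 1, 2 of x1 and x2 hold the inputs at t, t+1, t+2. */
bool window_ok(State s0, unsigned x1, unsigned x2) {
    assert(x1 < 8 && x2 < 8);
    State s = s0;
    for (int k = 0; k < 3; k++)
        s = step(s, (x1 >> k) & 1u, (x2 >> k) & 1u);
    bool antecedent = (x2 == 0);                   /* x2 is 0 in all three frames */
    bool consequent = (((x1 & 1u) != 0) == s.s3);  /* x1[t] equals s3[t+3] */
    return !antecedent || consequent;
}

/* Try every start state and every input window; return the number of
 * windows that violate the property (0 means the property holds). */
int count_violations(void) {
    int bad = 0;
    for (unsigned st = 0; st < 8; st++) {
        State s0 = { st & 1u, (st >> 1) & 1u, (st >> 2) & 1u };
        for (unsigned x1 = 0; x1 < 8; x1++)
            for (unsigned x2 = 0; x2 < 8; x2++)
                if (!window_ok(s0, x1, x2)) bad++;
    }
    return bad;
}
```

Because the window has length 4, three simulation steps per start state suffice; a BMC tool would instead encode the same three time frames symbolically.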
2.4
Automatic Test Pattern Generation
This section provides the necessary notions to introduce Automatic Test Pattern Generation (ATPG). First, circuits and fault models are presented. Then, the reduction of a sequential ATPG problem to a combinational problem is explained. Finally, classical ATPG algorithms working on the circuit structure are briefly reviewed. The presentation is kept brief; for further reading we refer to, e.g. [JG03].
2.4.1
Fault Models
After producing a chip, the functional correctness of this chip has to be checked. Without this check an erroneous chip may be delivered to customers which, in turn, may cause a malfunction of the final product. This, of course, is not acceptable. A large range of malfunctions is possible due to defects in the material, process variations during production, etc. But directly checking for all possible physical defects is not feasible. Therefore, an abstraction in terms of a fault model is typically introduced.

The Stuck-At Fault Model (SAFM) [BF76] is well-known and widely used in practice. In this fault model, a single line is assumed to be stuck at a fixed value instead of depending on the input values. When a line is stuck at the value 0, this is called a stuck-at-0 (SA0) fault. Analogously, if the line is stuck at the value 1, this is a stuck-at-1 (SA1) fault.

Example 12. Figure 2.15(a) repeats the circuit from Example 9. When an SA0 fault is introduced on line a, the faulty circuit in Figure 2.15(b) is created. The output of the AND-gate is disconnected and the upper input of the OR-gate constantly assumes the value 0.

Figure 2.15. Example for the SAFM: (a) Correct circuit, (b) Faulty circuit

Besides the SAFM a number of other fault models have been proposed, e.g. the cellular fault model [Fri73] where the function of a single gate is changed, or the bridging fault model [KP80] where two lines are assumed to settle to a single value. These fault models mainly cover static physical defects like opens or shorts.

Dynamic effects are covered by delay fault models, for example, the Path-Delay Fault Model (PDFM) [Smi85]. In the PDFM, it is checked whether the propagation delays of all paths in a given circuit are less than the system clock interval. For the detection of a path delay fault a pair of patterns ($I_1$, $I_2$) is required rather than a single pattern as in the SAFM: The initialization vector $I_1$ is applied and all signals of the circuit are allowed to stabilize. Then, the propagation vector $I_2$ is applied and after the system clock interval the outputs of circuit C are checked.

Definition 6. A two-pattern test is called a robust test for a path delay fault (RPDF test) on a path if it detects that fault independently of all other delays in the circuit and all other delay faults not located on this path.

An even stronger property can also be defined for PDF tests: For each path delay fault there exists a robust test ($I_1$, $I_2$) which sets all off-path inputs to noncontrolling values on application of $I_1$ and remains stable during application of $I_2$, i.e. the values on the off-path inputs are not invalidated by hazards or races. Robust tests with this property are called strong RPDF tests. In the following, we only use such tests, but for simplicity we call them RPDF tests, too. For a detailed classification of PDFs see [PR90].
2.4.2
Combinational ATPG
Automatic Test Pattern Generation (ATPG) is the task of calculating a set of test patterns for a given circuit with respect to a fault model. A test pattern for a particular fault is an assignment to the primary inputs of the circuit that leads to different output values depending on the presence of the fault. Calculating the Boolean difference of the faulty circuit and the fault-free circuit yields all test patterns for a particular fault. This construction is similar to a miter circuit [Bra83] as it can be used for combinational equivalence checking (see Section 2.3.1). In this sense, formal verification and ATPG are similar problems [AFK88].

Example 13. Again, consider the SA0 fault in the circuit in Figure 2.15. The input assignment $x_1 = 1$, $x_2 = 1$, $x_3 = 1$ leads to the output value y = 1 for the correct circuit and to the output value y = 0 if the fault is present. Therefore this input assignment is a test pattern for the fault a SA0. The construction to calculate the Boolean difference of the fault free circuit and the faulty circuit is shown in Figure 2.16.

Figure 2.16. Boolean difference of the faulty circuit and the fault free circuit

A similar approach can be used to calculate tests for the dynamic PDFM. In this case the circuit is either unrolled for two time frames or a multi-valued logic is applied to model the value of a gate in two subsequent time frames. Additional constraints apply to gates along the path to be tested to force different values in the two time frames. As a result, two test patterns are calculated to test for a PDF. For a strong RPDF test the side inputs to the path have to be set to noncontrolling values. The absence of hazards has to be ensured by extra constraints.

Definition 7. A fault is testable when a test pattern exists for that fault. A fault is untestable when no test pattern exists for that fault.

Deciding whether a fault is testable is an NP-complete problem [IS75]. The aim is to classify all faults and to create a set of test patterns that contains at least one test pattern for each testable fault.

Generating test patterns for circuits that contain state elements like flip-flops is computationally more difficult because the state elements cannot directly be set to a particular value. Instead, the behavior of the circuit over time has to be considered during ATPG. For example, the circuit can be unrolled similarly to BMC. In ATPG, this is frequently called the iterative logic array. Moreover, a number of tools have been proposed that directly address the sequential problem, e.g. HITEC [NP91] or the sequential SAT solver SATORI [IPC03]. But in practice, the resulting model is often too complex to be handled by ATPG tools. To overcome this problem, the full scan mode is usually considered by connecting all state elements by a scan chain [WA73, EW77]. In test mode, the scan chain combines all state elements into a shift-register; in normal operation mode the state elements are driven by the ordinary logic in the circuit. As a result, the state elements can be considered as primary inputs and outputs for testing purposes and a combinational problem results.
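The Boolean difference construction of Example 13 can be illustrated by exhaustive simulation. The following C sketch rests on an assumption: the circuit is taken to be y = (x1 · x2) + ¬x3 with a = x1 · x2, which is merely consistent with the values stated in Example 13 and is not read off Figure 2.15. The function names `circuit`, `boolean_difference` and `test_patterns` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed circuit: a = x1 AND x2, b = NOT x3, y = a OR b.
 * If sa0_on_a is true, line a is stuck at 0. */
bool circuit(bool x1, bool x2, bool x3, bool sa0_on_a) {
    bool a = sa0_on_a ? false : (x1 && x2);
    bool b = !x3;
    return a || b;
}

/* Boolean difference: true iff the fault-free and the faulty output differ. */
bool boolean_difference(bool x1, bool x2, bool x3) {
    return circuit(x1, x2, x3, false) != circuit(x1, x2, x3, true);
}

/* Enumerate all input assignments. Each test pattern is stored as the
 * 3-bit code (x1 x2 x3) in patterns[]; the number of patterns is returned. */
int test_patterns(unsigned patterns[8]) {
    assert(patterns != 0);
    int n = 0;
    for (unsigned v = 0; v < 8; v++) {
        bool x1 = (v >> 2) & 1u, x2 = (v >> 1) & 1u, x3 = v & 1u;
        if (boolean_difference(x1, x2, x3)) patterns[n++] = v;
    }
    return n;
}
```

Under the assumed circuit the fault a SA0 is testable and the enumeration finds the pattern of Example 13. Enumeration is exponential in the number of inputs, which is why ATPG tools use structural search or SAT instead.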
2.4.3
Classical ATPG Algorithms
Classical algorithms for ATPG usually work directly on the circuit structure to solve the ATPG problem for a particular fault. Some of these algorithms are briefly reviewed in the following. For an in-depth discussion the reader is referred to text books on ATPG, e.g. [JG03].
One of the first complete algorithms dedicated to ATPG for the SAFM was the D-algorithm proposed by Roth [Rot66]. The basic ideas of the algorithm can be summarized as follows: A fault is observed due to differing values at a line in the faulty circuit and the fault-free circuit. Such a divergence is denoted by values D or $\overline{D}$ to mark differences 1/0 or 0/1, respectively. Instead of Boolean values, the set {0, 1, D, $\overline{D}$} is used to evaluate gates and carry out implications. A gate that is not on a path between the fault and any output never has a D-value. A necessary condition for testability is the existence of a path from the fault location to an output where all intermediate gates either have a D-value or are not assigned yet. Such a path is called a potential D-chain. A gate is on a D-chain if it is on a path from the fault location to an output and all intermediate gates have a D-value.

On this basis an ATPG algorithm can focus on justifying a D-value at the fault site and propagating this D-value to an output as shown in Figure 2.17. The algorithm starts with injecting the D-value at the fault site. Then, this value has to be propagated towards the outputs. For example, to propagate the value D at one input along a 2-input AND-gate, the other input must have the noncontrolling value 1. After reaching an output, the search proceeds towards the inputs in the same manner to justify the D-value at the fault site. At some stages in the search decisions are possible. For example, to produce a 0 at the output of an AND-gate, either one or both inputs can have the value 0. Such a decision may be wrong and may lead to a conflict later on. Due to a reconvergence as shown in Figure 2.17, conditions resulting from propagation may prevent justification. In this case, a backtrack search has to be applied.

Figure 2.17. Justification and propagation

In summary, the D-algorithm is confronted with a search space of $O(2^{|C|})$ for a circuit with |C| signals including inputs, outputs and internal signals.

A number of improvements have been proposed for this basic procedure. PODEM [Goe81] branches only on the values for primary inputs. This reduces the search space for test pattern generation to $O(2^n)$ for a circuit with n primary inputs. But as a disadvantage time is wasted if all internal values are implied from a given input pattern that finally does not detect the fault. Fan [FS83] improves upon this problem by branching on stems of fanout points as well. As a result internal structures that cause a conflict when trying to generate a test pattern are detected earlier. The branching order and value assignments are determined by heuristics that rely on observability measures to predict a “good” variable assignment for justification or propagation, respectively. Moreover, the algorithm keeps track of a justification frontier moving towards the inputs and a propagation frontier moving towards the outputs. Therefore, Fan can make the “most important decision” first, based on a heuristic, while the D-algorithm applies a more static order by propagating the fault at first and justifying the assignments for preceding gates afterward.

Socrates [STS87] includes the use of global static implications by considering the circuit structure. Based on particular structures in the circuit, indirect implications are possible, i.e. implications that are not directly obvious due to assignments at a single gate, but implications that result from functional arguments taking several gates into account. These indirect implications are applied during the search process to imply values earlier from partial assignments and, by this, prevent wrong decisions. Hannibal [Kun93] further improves this idea. While Socrates only uses a predefined set of indirect implications, Hannibal learns from the circuit structure in a preprocessing step.
For this task recursive learning [KP94] is applied. In principle, recursive learning is complete itself but too time consuming to be used as a stand-alone procedure. Therefore, learning is done in a preprocessing step. During this step, the effect of value assignments is calculated and the resulting implications are learned. These implications are stored for the following run of the search procedure. In Hannibal the Fan algorithm was used to realize this search step. Even though several improvements have been proposed to increase the efficiency of ATPG algorithms, the worst case complexity is still exponential. Synthesis for testability means to consider the ATPG problem during synthesis already and, by this, create circuits with good testability.
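The four-valued logic {0, 1, D, $\overline{D}$} used by the D-algorithm above can be sketched compactly in C. The encoding below is an assumption made for illustration and is not taken from [Rot66]: each value is stored as a pair of bits, the value in the fault-free circuit and the value in the faulty circuit, so that D = 1/0 and $\overline{D}$ = 0/1. With this hypothetical encoding, gates are evaluated bitwise on both circuits at once.

```c
#include <assert.h>

/* Bit 1: value in the fault-free circuit, bit 0: value in the faulty circuit.
 * V0 = 0/0, VDBAR = 0/1, VD = 1/0, V1 = 1/1. */
typedef enum { V0 = 0, VDBAR = 1, VD = 2, V1 = 3 } DVal;

/* AND-gate evaluated in both circuits simultaneously. */
DVal d_and(DVal a, DVal b) {
    assert((unsigned)a <= 3u && (unsigned)b <= 3u);
    return (DVal)(a & b);
}

/* OR-gate evaluated in both circuits simultaneously. */
DVal d_or(DVal a, DVal b) {
    assert((unsigned)a <= 3u && (unsigned)b <= 3u);
    return (DVal)(a | b);
}

/* Inverter: flips the value in both circuits, so D becomes DBAR. */
DVal d_not(DVal a) {
    assert((unsigned)a <= 3u);
    return (DVal)(a ^ 3);
}

/* A D-value marks a line where both circuits differ. */
int is_d_value(DVal v) {
    return v == VD || v == VDBAR;
}
```

For instance, `d_and(VD, V1)` yields `VD`, which reflects that the noncontrolling value 1 at the side input of an AND-gate lets the difference pass, while `d_and(VD, V0)` yields `V0` and blocks it.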
Chapter 3 ALGORITHMS AND DATA STRUCTURES
The technique proposed in this chapter cannot exclusively be attributed to a single step in the design flow. Instead, the underlying techniques for Boolean function manipulation are adjusted to particular subsequent needs. Binary Decision Diagrams (BDDs) [Bry86] and solvers for the Boolean Satisfiability (SAT) problem [DP60, Coo71] are state of the art for Boolean function manipulation. Both approaches have individual advantages. In the past, many researchers have proposed techniques to improve the efficiency of these algorithms, e.g. in [MS96, BRB90, MMZ+01, Som01b, ES04].

A new data structure is presented that combines paradigms of SAT solvers and BDDs. Heuristics allow trading off BDD-like symbolic manipulation of Boolean functions against SAT-like search in the Boolean space. This can improve robustness or can be exploited to retrieve more detailed information about particular parts of the solution space. The approach was first presented in [DFK06]. The link of this technique to other steps within the design flow is outlined at the end of this chapter.
3.1
Combining SAT and BDD Provers
Besides BDDs, SAT provers are an efficient, and often more robust, technique to handle Boolean problems. Experimental studies have shown that both techniques are orthogonal, i.e. there exist problems where BDDs work well while SAT solvers fail and vice versa. This trade-off can even be formally proven [GZ03]. BDDs and SAT provers are very different in nature. BDDs compute all solutions in parallel but require a large amount of memory. In contrast, SAT is very efficient regarding memory consumption but only gives a single solution. There are many applications where multiple solutions are needed (see, e.g. [HTFM03] or Section 6.2).

Motivated by these observations, many authors tried to combine the best of the two approaches by applying SAT solvers and BDDs alternately or iteratively. Even though remarkable results have been obtained, so far none of the approaches considered an integration of the two methods within a single data structure. In this section, the first hybrid approach that tightly combines BDDs and SAT is presented.

Even though the overall principle of the two techniques is very different, there are also some similarities. In both concepts, starting from a Boolean description the problem is decomposed by assigning a Boolean value to a variable. This has already been observed in [RDO02]. For this, the concept of expansion nodes is introduced. The given Boolean problem is initially represented by a single expansion node that is recursively expanded. If this is done in a strict Depth First Search (DFS) manner, the resulting algorithm is close to a SAT procedure. But if all operations are carried out symbolically, the algorithm computes a BDD. The relation between the two approaches is discussed in more detail later. Experimental results demonstrate the efficiency of the approach.

The section is structured as follows: Other approaches to extend Boolean proof techniques and the relation between SAT and BDDs are discussed in Section 3.1.1. The new hybrid approach is presented in Section 3.1.2. In Section 3.1.3, experimental results are given.
3.1.1
Proof Techniques
In the following, earlier work related to the hybrid approach is discussed. Different extensions have been suggested for both concepts, SAT provers and BDDs. Then, the relations between both concepts are briefly reviewed.
3.1.1.1 Extensions

Streaming BDDs have been proposed to reduce the memory requirements [Min02]. The idea is to represent a BDD as a bracketed sequence. The sequence can be processed sequentially using limited memory. But this can only be done by giving up canonicity.

In the context of extensions of the classical BDD concept introduced by Bryant (see Section 2.1.2), some approaches have been presented that make use of different types of functional nodes. The approach in [RBKM91] keeps control of the memory needed for the BDD construction by projecting some parts of the graph to a new terminal node U (= unknown). Instead of completely calculating each subgraph, the calculation may be stopped at a given depth and the complete tree is replaced by the terminal node U. As a result, exactness cannot be recovered afterward.
Nodes to represent the exclusive-or of the children have been introduced in [MS98]. The purpose of these nodes is to reduce the size of the BDD. Then, probabilistic methods are applied to find a satisfying assignment.

Extended BDDs as proposed in [JPHS91] apply existential quantification and universal quantification as edge attributes. By introducing a “structural variable” s, the equality $\exists s\, f = f_s + f_{\overline{s}}$ can be exploited to represent the Boolean operation f + g in terms of a node v. This can be seen as follows: Let v be a node and f and g be the Boolean functions represented by its children. Then, v represents the function $s f + \overline{s} g$. Now, assume an incoming edge has the attribute for existential quantification. The function represented by this edge is retrieved as follows:

$$\exists s\,(s f + \overline{s} g) = (s f + \overline{s} g)_s + (s f + \overline{s} g)_{\overline{s}} \quad \text{(as introduced above)}$$
$$= f + g$$

Similarly, universal quantification is used to represent f · g. These structural variables allow controlling the size of the extended BDD. Again, the problem is to find a satisfying assignment of the resulting extended BDDs.

The same principle was exploited in [HDB96]. By introducing extra nodes at the top level of two BDDs, a Boolean operation is represented. Then, these nodes are moved towards the terminals by exchanging adjacent variables [Rud93]. At the terminals these nodes can be eliminated. In both cases the use of new variables implies that a new level is introduced in the shared BDD structure.

The approach was further extended in [AH97] for Boolean Expression Diagrams (BEDs). Functional nodes that directly represent Boolean operations were introduced. Again, these nodes can be eliminated by swapping adjacent levels in the BED. If a BED is built from a description of a circuit, the size of a BED is similar to the circuit size.

All of these approaches are presented as extensions of BDDs. The advantage of using SAT-like algorithms on such a structure has not been considered.

Another recent direction of research is efficient all-solution SAT solvers that do not stop after reaching the first satisfying assignment but calculate all possible satisfying solutions, e.g. [LHS04]. A drawback of these approaches is the potentially large representation of all solutions, usually as cubes or as BDDs. In contrast, the hybrid approach targets applications where not all but a set of good solutions is needed.

Recently, several techniques have been proposed to combine BDDs and SAT solvers (see, e.g. [GYAG00, KPKG02, CNQ03, SFVD05]), but no real integration is done. Instead, the proof engines are started one after the other, or alternating. By this, good experimental results have often been obtained, demonstrating the potential of an integrated approach.
3.1.1.2 Relations

BDDs and SAT solvers are most frequently used as complete proof techniques and for the symbolic manipulation of Boolean functions. Both techniques have advantages and disadvantages. BDDs represent all solutions in parallel at the cost of large memory requirements. SAT solvers only provide a single solution while the memory consumption is relatively low.

In [RDO02] the relation between BDDs and SAT has been studied from a theoretical point of view. It has been proven that the BDD corresponds to a complete representation of the SAT backtrack tree if a fixed variable order is assumed.

As a motivation for the next section, where the hybrid approach is described in more detail, an example is given to show the difference between SAT and BDDs. We will later come back to this example.

Example 14. Consider the Boolean function f over four variables given by

f = (x1 + x2 + x3)(x1 + x2 + x4)(x1 + x2 + x4)(x1 + x2 + x3)(x1 + x2 + x3 + x4)
A sketch of the search tree if the function is processed by a SAT solver is shown in Figure 3.1(a). The corresponding BDD is given in Figure 3.1(b) for the variable order π = (x1 , x2 , x3 , x4 ). As can be seen, the SAT solver by construction only gives a single solution while the BDD represents all satisfying assignments in parallel at the cost of a larger number of nodes.
3.1.2
Hybrid Approach
In this section, the hybrid approach for BDD and SAT integration is presented. First, the overall idea is given. Then, the concept of expansion nodes is introduced followed by a discussion of expansion heuristics. Finally, comments on some issues related to an efficient implementation are provided.
3.1.2.1 Basic Idea

In the hybrid approach, processing starts by symbolic operations analogously to BDDs. For the operations the basic operators for XOR and AND (see Section 2.1.2) have been modified. During the starting phase, the constructed graphs are simply BDDs. But when composing BDDs, a heuristic is used to decide which parts of the solution space are explored. To guarantee the exactness of the algorithm, i.e. no solution is missed, a node is introduced where the computation can be resumed. These nodes are called expansion nodes in the following. As a result, the hybrid approach stores all necessary information resulting in a complete proof method.
Figure 3.1. Different approaches: (a) SAT search tree, (b) BDD, (c) Sketch of the hybrid approach, (d) Hybrid representation
A sketch of a configuration during the run is shown in Figure 3.1(c). In this case the upper part is “SAT-like” while the lower part is a complete symbolic representation as it occurs in BDDs. The expansion nodes are denoted by E. The decomposition nodes are labeled by variables; these variables occur in the same order on all paths. In the following, such graphs that allow a smooth transition between SAT and BDDs are called hybrid structures.
Figure 3.2. Overview over different node types: (a) Terminal, (b) Decomposition node, (c) Expansion node
Remark 3. Several expansion nodes in a hybrid structure may represent the same function. This cannot be detected before completely expanding the node. Thus, a hybrid structure is not a canonical representation of Boolean functions.
3.1.2.2 Expansion Nodes

The hybrid approach makes use of three types of nodes (see Figure 3.2):

(a) Terminal nodes
(b) Decomposition nodes
(c) Expansion nodes

The first two can also be found in BDDs. Terminal nodes represent the constant functions 0 and 1. In decomposition nodes the Shannon decomposition is carried out. Expansion nodes are labeled by a Boolean operation op and have two successors f and g that represent Boolean functions (which are also denoted by f and g for simplicity). The expansion node represents the function f op g.

Example 15. Consider again the function from Example 14 and Figures 3.1(a) and 3.1(b). A possible hybrid structure is shown in Figure 3.1(d). This one results if the top variable is only decomposed in one direction, while an expansion node is placed on the other branch. As can be seen, the structure is more memory efficient. Compared to the BDD five instead of seven nodes are needed. At the same time three solutions are represented in contrast to the SAT approach that only returns a single solution.

This simple example demonstrates that the approach combines the two proof techniques SAT and BDD. A crucial point to address is where to place the expansion nodes. A heuristic for this purpose is presented in the next section.
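The semantics of the three node types can be made concrete by a small evaluator. The following C sketch is hypothetical: the type `Node`, its fields and the function `eval` are illustrative names, and the layout differs from the CUDD-based implementation described in Section 3.1.2.4. It computes the value of the represented function under a given variable assignment, applying the Shannon decomposition at decomposition nodes and carrying out the stored operation f op g at expansion nodes.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TERMINAL, DECOMP, EXPANSION } Kind;
typedef enum { OP_AND, OP_XOR } Op;

typedef struct Node {
    Kind kind;
    int value;                   /* TERMINAL: constant 0 or 1             */
    int var;                     /* DECOMP: index of the decision variable */
    Op op;                       /* EXPANSION: stored Boolean operation    */
    const struct Node *lo, *hi;  /* DECOMP: low/high; EXPANSION: f/g       */
} Node;

/* Evaluate the function represented by v under the assignment x[]. */
int eval(const Node *v, const int *x) {
    assert(v != NULL && x != NULL);
    switch (v->kind) {
    case TERMINAL:
        return v->value;
    case DECOMP:      /* Shannon decomposition on variable var */
        return x[v->var] ? eval(v->hi, x) : eval(v->lo, x);
    case EXPANSION: { /* f op g, evaluated on demand */
        int f = eval(v->lo, x), g = eval(v->hi, x);
        return v->op == OP_AND ? (f & g) : (f ^ g);
    }
    }
    return 0; /* not reached for well-formed structures */
}
```

Evaluating a single assignment never requires expanding a node symbolically, which is the SAT-like side of the hybrid structure; expanding a node replaces it by decomposition nodes and moves the structure towards a BDD.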
3.1.2.3 Expansion Heuristics

Inserting expansion nodes at suitable locations is crucial for the approach to work. If too many expansion nodes are inserted, no solutions can be found. Only structures without a path to a terminal will be constructed and the expansion of partial trees will take most of the run time until computing a solution.
Not inserting enough expansion nodes will lead to a memory blow-up as known from BDDs.

In a BDD-based approach the final solutions are computed by composing intermediate BDDs. This is similar for the new approach. The following steps are necessary to retrieve solutions:

1. Build BDDs for basic functions without any expansion nodes.
2. Compose the basic functions and insert expansion nodes according to a predetermined heuristic.
3. Select expansion nodes to calculate the final solutions.

Which functions are considered as basic functions in Step 1 depends on the problem and the input format, e.g. projection functions and cubes were chosen in the experiments. Building BDDs for these basic functions is not necessary for the approach to work, but having the basic functions completely represented improves the performance drastically by reducing the number of necessary expansions.

The following two heuristics to limit the size of the resulting hybrid structure in Step 2 have been evaluated:

(S1) A fast procedure is to directly limit the memory consumption. This limit can be detected efficiently. Once the limit is reached no further decomposition nodes are created. Instead, expansion nodes are generated. Therefore, prior to performing an expansion the memory limit is increased by a user defined value.

(S2) The second procedure is to limit the number of nodes in a subgraph to a certain threshold. Tracking this limit is computationally more expensive. But allowing more than n nodes in a subgraph guarantees that there is at least one path to a terminal node, i.e. for at least one assignment the function can directly be evaluated.

The selection of nodes to expand in Step 3 has been evaluated using two other heuristics:

(E1) Randomly

(E2) Heuristically (using the algorithm in Figure 3.3): The hybrid structure is traversed in a depth first manner until an expansion node is reached. This node is selected and then expanded by carrying out the stored operation. The same scheme is applied recursively if further selections are necessary.
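node *DFS(node *v) {
    node *tmp;
    if (isTerminal(v)) return NULL;  /* no expansion node below a terminal */
    tmp = DFS(Then(v));              /* search the then-branch first */
    if (tmp) return tmp;
    if (isExpNode(v)) return v;      /* v itself is selected */
    tmp = DFS(Else(v));              /* otherwise search the else-branch */
    return tmp;
}
Figure 3.3. Depth first traversal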
Here, (E2) also heuristically ensures a moderate growth of the memory needs. Experimental studies showed that the combination of a hard limit on memory consumption (S1) with deterministic DFS (E2) gives the best results, i.e. small run times and a large number of solutions. From a more general point of view this combination of heuristics leads to a SAT-like search tree in the upper part of the hybrid structure which is enriched by a BDD-like lower part.

Remark 4. When using heuristics (S1) and (E2) in combination, the search space is traversed similarly to “BDDs at SAT leaves” as introduced in [GYAG00, GYA+01]. But the proposed hybrid structure is more general in the sense that switching between SAT-like and BDD-like behavior is subject to heuristics.

Remark 5. During expansion canonicity is also an issue. When expanding a node, a function that is already represented by another node may be the result. The hybrid structure can be reduced at a computational cost linear in the number of nodes using an algorithm similar to [SW93]. In the implementation no reduction was carried out to save run time.
3.1.2.4 Implementation

The technique described above has been integrated into the CUDD package [Som01a], from which the core data structures are taken. To store the expansion nodes, the structure for nodes has been extended (see Line 8 in Figure 3.4). The structure for the new type is given in Lines 12–15. In case of an expansion node, the operation has to be stored as well. For reasons of efficiency only two types of operations are stored: AND and XOR. Negation is realized by complemented edges (see Section 2.1.2). All other Boolean operators are mapped accordingly. The information is stored in the index of each node. The complete encoding is given in Table 3.1, i.e. three indices have a special meaning while all the remaining ones are used for decomposition variables.
 1  struct node {
 2    halfWord index;
 3    halfWord ref;
 4    node *next;
 5    union {
 6      terminal value;
 7      children kids;
 8      expNode func;
 9    }
10  }
11
12  struct expNode {
13    node *F;
14    node *G;
15  }
Figure 3.4. Modified node structure
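The index encoding of Table 3.1 amounts to a simple classification of a node by its index field. The following C sketch is hypothetical and not part of CUDD: it assumes that `halfWord` is a 16-bit unsigned integer, which matches the largest index 65535, and the names `classify`, `is_expansion` and `var_of` are introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t halfWord;  /* assumption: 16-bit index field */

/* Reserved indices according to Table 3.1. */
enum { IDX_XOR = 65533, IDX_AND = 65534, IDX_TERMINAL = 65535 };

typedef enum { K_DECOMP, K_XOR, K_AND, K_TERMINAL } NodeKind;

/* Map the index of a node to its node type. */
NodeKind classify(halfWord index) {
    switch (index) {
    case IDX_TERMINAL: return K_TERMINAL;
    case IDX_AND:      return K_AND;
    case IDX_XOR:      return K_XOR;
    default:           return K_DECOMP;  /* indices 0 to 65532 */
    }
}

/* Expansion nodes are exactly the AND-nodes and XOR-nodes. */
int is_expansion(halfWord index) {
    NodeKind k = classify(index);
    return k == K_AND || k == K_XOR;
}

/* Decision variable of a decomposition node. */
unsigned var_of(halfWord index) {
    assert(classify(index) == K_DECOMP);
    return index;
}
```

Storing the node type in the index keeps the node size unchanged, at the price of three fewer indices being available for decomposition variables.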
Table 3.1. Index of node types (32-bit)

Node type              Index
Decomposition nodes    0 - 65532
XOR-node               65533
AND-node               65534
Terminal node          65535

3.1.3
Experimental Results
In the following, the results of two types of experiments are analyzed. First, the well-known n-Queens problem is considered as an example of a combinational problem where BDDs perform poorly on large instances while a large number of solutions is available. Second, the synthesis problem of minimizing ESOP representations is studied as an optimization problem that is known to be hard. All experiments have been carried out on an Intel Pentium 4 processor with 3 GHz and 1 GB of main memory running Linux.
3.1.3.1 n-Queens

The n-Queens problem is a well-known combinational problem. The objective is to place n queens on an n × n board such that no queen can be captured by another one. An example for a solution of the 5-Queens problem is shown in Figure 3.5. This game problem is encoded using $n^2$ binary input variables, each one deciding whether a queen is placed on the corresponding field of the chess board or not. Obviously, the constraints are to place one queen per row and column and at most one queen per diagonal.

Figure 3.5. Solution to the 5-Queens problem

In a first experiment, the heuristics to limit the size were considered. For all experiments the limits were loose enough to retrieve all solutions. By this, the overhead of the heuristics to limit the size can directly be measured in comparison to BDDs. Results are reported in Table 3.2. Given are the number of solutions for increasing values of n and run times in CPU seconds for BDDs and the two heuristics introduced in Section 3.1.2.3. The resource requirements for BDDs increase rapidly and no further solutions beyond n = 13 could be retrieved. Also the computational overhead of limiting the size of subgraphs using heuristic (S2) is too large. But directly limiting the memory consumption according to heuristic (S1) introduces almost no overhead. The direct limit has been used in all remaining experiments to restrict the size.

Table 3.2. Heuristics to limit the size of the hybrid structure

                  BDD       Limit for the size
                            Memory (S1)           Subgraph (S2)
 n    #Sol.      Time      Time   Overhead       Time    Overhead
 6        4      0.00      0.00          –       0.01           –
 7       40      0.01      0.01     0.00 %       0.03    200.00 %
 8       92      0.05      0.06    20.00 %       0.18    260.00 %
 9      352      0.37      0.37     0.00 %       1.30    251.35 %
10      724      1.56      1.59     1.92 %       8.20    425.64 %
11     2680      7.81      7.82     0.13 %      62.39    698.84 %
12    14200     48.12     48.54     0.87 %     490.33    918.97 %
13    73712    352.11    353.21     0.31 %    4566.75   1196.97 %

The performance of heuristics to select nodes for expansion has been investigated in the next experiment. Expansion was carried out until a total memory
limit of 750 MB was reached. Due to the expansion of subfunctions, more than one solution can be contained in the final representation. The results are shown in Table 3.3. Up to n = 13 all solutions were obtained with both heuristics. Then, the random selection performs very poorly. When expanding the last node in a cascade of expansion nodes, new decomposition nodes are created. But the next expansion will often occur at an expansion node in a different subgraph. Thus, the previously created decomposition nodes cannot be utilized for the next step. In contrast, the deterministic DFS starts the next expansion where new decomposition nodes have been constructed previously. As a result, the new approach yields solutions up to n = 21 in a moderate amount of time.

Table 3.3. Selection of expansion nodes

               Randomly (E1)         DFS (E2)
 n    #Var     #Sol.    Time         #Sol.    Time
 3       9         0    0.00             0    0.00
 4      16         2    0.00             2    0.00
 5      25        10    0.00            10    0.00
 6      36         4    0.00             4    0.00
 7      49        40    0.02            40    0.01
 8      64        92    0.06            92    0.06
 9      81       352    0.37           352    0.37
10     100       724    2.10           724    1.83
11     121      2680    16.54         2680    10.30
12     144     14200    158.86       14200    73.34
13     169     73712    2062.39      73712    578.54
14     196         0    384.45       56672    1836.93
15     225         0    289.01       33382    1669.50
16     256         0    652.64       20338    2555.35
17     289         0    1366.25       5061    2055.97
18     324         0    693.13         204    2238.79
19     361         0    529.37        1428    3357.97
20     400         0    1923.07         38    1592.94
21     441         0    1957.39        111    1972.60
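The solution counts reported for small n can be cross-checked independently of any decision diagram. The following C++ sketch uses plain backtracking with bitmasks; it is not the BDD-based or hybrid method of this section, and the function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Count placements of n queens, one per row. Bitmasks record which columns
// and diagonals are attacked in the current row.
static std::uint64_t place(int n, int row, std::uint32_t cols,
                           std::uint32_t diag1, std::uint32_t diag2) {
    if (row == n) return 1;                        // all rows are filled
    const std::uint32_t full = (1u << n) - 1u;     // n low bits set
    std::uint32_t avail = full & ~(cols | diag1 | diag2);
    std::uint64_t count = 0;
    while (avail != 0) {
        const std::uint32_t bit = avail & (~avail + 1u);  // lowest free column
        avail &= avail - 1u;                               // remove it
        count += place(n, row + 1, cols | bit,
                       ((diag1 | bit) << 1) & full,        // diagonal moving up
                       (diag2 | bit) >> 1);                // diagonal moving down
    }
    return count;
}

// Number of solutions of the n-Queens problem for small boards.
std::uint64_t countQueens(int n) {
    assert(n >= 1 && n <= 16);
    return place(n, 0, 0u, 0u, 0u);
}
```

For example, `countQueens(6)` yields 4 and `countQueens(8)` yields 92, in agreement with the #Sol. columns of Tables 3.2 and 3.3.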
3.1.3.2 ESOP Minimization
Compared to an SOP representation of a function, the ESOP representation can be exponentially smaller (consider $\bigoplus_{i=1}^{n} x_i$ as an example). But most algorithms for ESOP minimization only apply local transformations to improve the representation starting from an initial solution, e.g. [BS93, MP01]. In [PC90] the problem of computing an ESOP for a given Boolean function f over n variables has been formulated using the Helliwell equation. The Helliwell equation $H_f$ for function f has $3^n$ input variables; each input variable corresponds to a cube and is 1 if, and only if, this cube is chosen for the ESOP of f. A satisfying assignment to $H_f$ determines an ESOP for f and vice versa. The hybrid structure was built for the Helliwell equation. By additional constraints, the number of cubes was limited to be at most k. The experimental results for applying this method to $f = \bigoplus_{i=1}^{4} x_i$ are shown in Table 3.4. Given are results for using BDDs, the hybrid structure, and the SAT solver Zchaff [MMZ+01]. We modified the SAT solver Zchaff to calculate more than one solution: for each solution a blocking clause is added and the solve process is continued. For the hybrid structure, results are reported when different numbers of solutions are calculated: more than 1, more than $10^3$, and more than $10^6$ solutions, respectively. For different values of k the CPU time in seconds and the number of nodes in the BDD or the hybrid structure are reported, respectively. For Zchaff, the CPU time is given. The number of available solutions is not reported but grows rapidly. While there are only 38 valid solutions for k = 4, there are more than 5000 for k = 6 and more than $4 \times 10^6$ for k = 9.

Table 3.4. ESOP minimization

      BDD                   Hybrid structure                                             Zchaff
      all solutions         ≥ 1 solution     ≥ 10^3 solutions    ≥ 10^6 solutions       1 sol.   10^3 sol.   10^6 sol.
 k    Time       #Nodes     Time   #Nodes    Time    #Nodes      Time     #Nodes        Time     Time        Time
 4    0.55       628        0.50   568       0.53    1108        0.53     1108          <0.01    0.07        0.07
 5    0.58       4075       0.53   638       0.60    4729        0.61     4729          <0.01    0.09        0.09
10    1.75       420655     0.47   145       0.70    11597       51.28    155018        <0.01    0.14        –
15    4.96       1428139    0.48   352       0.61    11634       10.17    172422        <0.01    0.11        –
20    53.96      2444782    0.47   112       0.54    7459        1.13     177708        <0.01    0.32        –
25    1945.01    3449866    0.48   490       0.52    5465        0.98     133396        <0.01    0.37        –
30    9985.37    4441463    0.49   495       0.49    2618        0.66     48107         <0.01    0.12        –
35    13900.22   5361182    0.52   544       0.51    878         0.75     21608         <0.01    0.16        –
39    13913.44   5906441    0.44   217       0.45    1241        0.53     5910          <0.01    0.09        –

The results show the superiority of the hybrid approach compared to BDDs. For a tightly restricted solution space (k < 25) BDDs are feasible. But after that the memory and especially the run time requirements grow prohibitively fast. In contrast, the hybrid approach exhibits a rather stable performance as CPU time and memory requirements remain in the same order for all runs. The increased run time for k = 10, 15 when calculating more than $10^6$ solutions is due to the small number of possible solutions. In this case a large part of the BDD has to be recreated using the expansion technique without retrieving more solutions. As a result, BDDs are faster. But usually even calculating a large number of solutions does not degrade the performance of the new approach in the experiments. When calculating a single solution, the SAT solver is faster. But even for calculating $10^3$ solutions the computation time increases significantly.
Finally, when calculating a large number of solutions the added blocking clauses lead to a memory blow-up even for the SAT solver. Using a more sophisticated approach, the blocking clauses could be compacted but only at the expense of CPU time for logic optimization. By this, the new approach provides a good compromise between a SAT-based approach and a BDD-based approach.
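The Helliwell formulation and the blocking-clause enumeration can be tried out on a toy instance. The sketch below is a simplification under several assumptions: it uses the two-variable parity function instead of the four-variable one, replaces the SAT solver by an exhaustive scan over the $2^9$ assignments, and omits the limit k on the number of cubes. All names are hypothetical; nothing here reflects the interface of Zchaff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kVars  = 2;   // n: inputs of the function (kept tiny on purpose)
constexpr int kCubes = 9;   // 3^n cubes, one selection variable per cube

// A cube is coded in base 3, one digit per input:
// 0 = negated literal, 1 = positive literal, 2 = input absent from the cube.
static bool covers(int cube, unsigned minterm) {
    for (int i = 0; i < kVars; ++i) {
        const int digit = cube % 3;
        cube /= 3;
        if (digit != 2 && digit != static_cast<int>((minterm >> i) & 1u)) return false;
    }
    return true;
}

// Example function: x1 xor x2.
static bool f(unsigned minterm) { return ((minterm ^ (minterm >> 1)) & 1u) != 0; }

// Helliwell condition: for every minterm, the XOR over the selected cubes
// covering it equals the function value.
static bool helliwell(std::uint32_t selection) {
    for (unsigned m = 0; m < (1u << kVars); ++m) {
        bool parity = false;
        for (int c = 0; c < kCubes; ++c)
            if (((selection >> c) & 1u) != 0 && covers(c, m)) parity = !parity;
        if (parity != f(m)) return false;
    }
    return true;
}

using Clause = std::vector<int>;   // DIMACS-style literals +v / -v with v >= 1

static bool satisfies(std::uint32_t a, const Clause& clause) {
    for (int lit : clause) {
        const int var = (lit > 0 ? lit : -lit) - 1;
        if ((((a >> var) & 1u) != 0) == (lit > 0)) return true;
    }
    return false;
}

// Stand-in for one SAT call: first assignment that meets the Helliwell
// condition and all blocking clauses, or -1 if no such assignment exists.
static long solve(const std::vector<Clause>& blocking) {
    for (std::uint32_t a = 0; a < (1u << kCubes); ++a) {
        bool ok = helliwell(a);
        for (std::size_t i = 0; ok && i < blocking.size(); ++i) ok = satisfies(a, blocking[i]);
        if (ok) return static_cast<long>(a);
    }
    return -1;
}

// Enumerate all ESOPs: solve, block the solution just found, solve again.
std::vector<std::uint32_t> enumerateEsops() {
    std::vector<std::uint32_t> solutions;
    std::vector<Clause> blocking;
    for (long s = solve(blocking); s >= 0; s = solve(blocking)) {
        solutions.push_back(static_cast<std::uint32_t>(s));
        Clause clause;   // false exactly for the assignment s
        for (int v = 0; v < kCubes; ++v)
            clause.push_back(((s >> v) & 1) != 0 ? -(v + 1) : (v + 1));
        blocking.push_back(clause);
    }
    return solutions;
}

// Number of cubes used by an ESOP given as a selection bit vector.
int cubeCount(std::uint32_t selection) {
    int count = 0;
    for (; selection != 0; selection &= selection - 1u) ++count;
    return count;
}
```

For this toy instance the loop stops after 32 solutions, the smallest of which uses two cubes. Each solution adds one clause over all selection variables, which illustrates why the clause set grows with the number of solutions.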
3.2 Summary and Future Work
Algorithms and data structures for Boolean function manipulation have been considered. A new approach to handle satisfiability problems was presented. This approach can be seen as an integrated technique using BDDs and SAT solvers and incorporates benefits of both: the memory consumption can be limited while calculating a large number of solutions in a single run. First heuristics have been proposed and evaluated to increase the performance of the new technique. Experiments show the efficiency of the hybrid technique in contrast to classical approaches.

As the experiments show, this technique can improve the robustness of Boolean function manipulation. But the technique can also be applied to improve the usability of design tools as it provides multiple solutions. The hybrid approach can be used to explore particular parts of the search space in depth while the exploration of less interesting parts may be deferred. This is beneficial in at least two cases. First, where multiple solutions provide a better basis for analysis, as is the case for diagnosis (see also Chapter 6). Second, when there is a tradeoff between different solutions, e.g. for logic synthesis if power and size are considered.

Integrating learning strategies as known from SAT provers is one focus of future work. The development of better heuristics and heuristics for particular problems is another direction.
Chapter 4 SYNTHESIS
The techniques presented in this chapter provide a flow for Synthesis for Testability of SystemC descriptions. Figure 4.1 shows the part of the design flow that is covered.

The synthesizable description of a system is only an intermediate step in the design process. Coming from a high-level description, a synthesizable description is created. This is done mainly manually. While the synthesizable description is usually coded in a Hardware Description Language (HDL), a second system-level description is commonly described in a software programming language like C or C++. This system-level description serves as a “golden model” for the design. From the functional point of view the synthesizable description is a refinement of the system-level description in the sense that the same output should be produced by both models upon the same input. Traditionally this is only exploited during simulation-based verification. But the time-consuming task of coding the two models is carried out twice in two independent processes. As a result, checking the consistency of both descriptions becomes very difficult as the simulation-based approach is not feasible for a complete check. Replacing the two languages (HDL and software programming language) by a single language can alleviate this weakness in the design flow. By using SystemC, the system-level description can be refined into a synthesizable one.

Parsing and synthesizing SystemC are considered in the first part of this chapter. A subset of SystemC is used to describe a synthesizable model. Techniques for transforming this model into a gate-level description are explained. These techniques are implemented in the tool ParSyC. Experimental results show the efficiency. The technique for synthesis of SystemC was first presented in [FGC+04] and is part of the integrated design environment SyCE [DFGG05].
Figure 4.1. Synthesis part of the design flow (System-level description → Manual refinement → Synthesizable description → Synthesis (for test.) → Gate-level description)
After transforming the HDL description into a gate-level description, logic synthesis is carried out to optimize the circuit structures, e.g. with respect to size or speed. During this step testability issues are usually not considered. Therefore the calculation of test vectors for the postproduction test becomes a difficult task. The decision problem “Does there exist a test vector for a particular fault?” is NP-complete [IS75]. Additionally, the optimized circuit may contain redundancies, i.e. untestable faults. Here, Synthesis for Testability can help to ease Automatic Test Pattern Generation (ATPG).

The technique MuTaTe addresses this problem. MuTaTe is presented in the second part of this chapter. The circuits that are created using this technique are 100% testable under the Stuck-At Fault Model (SAFM) and the Path-Delay Fault Model (PDFM). Moreover, the run time of test pattern generation is polynomial in the size of the resulting circuit. MuTaTe has been proposed in [DSF04].
4.1 Synthesis of SystemC
New design methodologies and design flows are needed to cope with the increasing complexity of today’s circuits and systems. As already explained, this applies to all stages of circuit design from system-level modeling and verification down to layout. One focus of research in this area is the use of new hardware description languages. Traditionally, the system level description is done in a programming language like C or C++ while dedicated hardware description languages like VHDL and Verilog are used at the RTL. This leads to a decoupling of the
behavioral description and the synthesizable description. Recently developed languages allow for higher degrees of abstraction and, additionally, the refinement for synthesis is possible within the language. One of these new languages is SystemC [LTG97, GLMS02]. SystemC is based on C++. Therefore all features of C++ are available. The additional SystemC library provides all concepts that are needed to model hardware, e.g. timing or concurrency.

Research on the conceptual side and on algorithms that are applied to SystemC designs is difficult. For areas like high-level synthesis, verification, or power estimation a formal understanding of the given design is necessary before any subsequent processing can be carried out. Most recent publications either focused on special features of SystemC or C/C++, like synthesis of fixed point numeric operations [BFGR03], polymorphism [SKWS+04], or pointers [KCY03], or considered the design methodology [MRR03]. Few works (e.g. [MRH+01]) have been published that rely on the formal model of an arbitrary SystemC design. One reason for this is the high effort to syntactically and semantically understand the SystemC description. For this purpose SystemC has to be parsed.

Recently, different parsers for SystemC have been proposed. The approaches in [BPM+05, EAH05] only extract a coarse structure from a SystemC description. Behavioral information is lost. The analysis tool PINAPA [MMMC05] can handle high-level constructs. But PINAPA relies on the C++ compiler as a front-end for parsing the source code. Therefore, patches are needed to change to a different compiler or to a different version of SystemC. This makes the use of PINAPA difficult. Finally, the approach proposed in [GD06] is a direct enhancement of the one suggested here.

In this section, a parser and synthesizer for SystemC is presented that is implemented as the tool ParSyC. This tool is also part of the design environment SyCE [DFGG05] for SystemC. The parser covers SystemC and to a certain extent C++. The Purdue Compiler Construction Tool Set (PCCTS) [Par97] was used to build ParSyC. This parser produces an easy-to-process representation of a SystemC design in terms of an intermediate representation. The description is generic, i.e. any further processing can start from this representation, regardless of the application to visualization [GDLA03, GD06], formal verification [GD03], or other purposes. As an example, the application to synthesis of RTL descriptions is explained and the efficiency is underlined by experiments. Some advantages of this approach are easy extendability, adaptivity, and efficiency of the SystemC front-end.

This section is structured as follows: The basic concepts of SystemC are discussed in Section 4.1.1. The methodology to create ParSyC and the exemplary application to synthesis are explained in Section 4.1.2. Advantages of the approach are discussed in Section 4.1.3. The experimental results are given in Section 4.1.4.
4.1.1 SystemC
The main concepts of SystemC are briefly reviewed in the following. SystemC is a system description language that enables modeling at different levels of abstraction. Constructs known from traditional hardware description languages are also provided. Thus, any task between design exploration at a high level and synthesis at a low level can be carried out within the same environment.

Features to aid modeling at different levels of abstraction are included in SystemC. For example, the concept of channels makes it possible to abstract from details of the communication between modules. Therefore, modeling at the transactional level becomes possible [CJG+03]. This, in turn, enables fast design space exploration and partitioning of the design before working on the details of protocols or modules.

In practice, SystemC comes as a library that provides classes to model hardware in C++. For example, a hardware module is described using the class sc_module provided by SystemC. All features of C++ are also available in SystemC. This includes dynamic memory allocation, multiple inheritance, as well as any type of complex operations on data of an arbitrary type. Any SystemC design can be simulated by compiling it with an ordinary C++ compiler into an executable specification.

But to focus on other aspects of circuit design a formal model of the design is needed. Deriving a formal model from a SystemC description is difficult: a parser that handles SystemC, and therefore C++, is necessary. But developing a parser for a complex language requires a high effort. Moreover, the parser should be generic in order to serve not only a single purpose but to be applicable to different areas as well, e.g. synthesis, formal verification, and visualization.
4.1.1.1 Synthesis
In order to allow for concise modeling of hardware, several constructs are excluded from the synthesizable subset of SystemC [Syn02]. For example, SystemC provides classes to easily model buffers for arbitrary data using the class sc_fifo. An instance of type sc_fifo can have an arbitrary size and can work without explicit timing. Therefore, there is no general way for synthesis. In principle, this could be solved by providing a standard realization of the class. But in order to retrieve a good (e.g. small and/or fast) solution after synthesis, several decisions are necessary. For this reason, it is left to the hardware designer to replace this class by a synthesizable description. The concept of dynamic memory allocation is also hardly synthesizable in an efficient way and, thus, excluded from the synthesizable subset.

For a better understanding, synthesis of RTL descriptions is used to demonstrate the features of ParSyC. Due to this application the SystemC input is restricted, but as a generic front-end ParSyC can handle other types of SystemC descriptions as well.
4.1.2 SystemC Parser
In this section, the methodology to build a parser and the special features used for parsing SystemC are explained. The synthesis of RTL descriptions is carried out using ParSyC as a front-end.

The methodology for parsing and compiling has been studied intensively (see, e.g. [ASU85]). Often the Unix tools lex and yacc are used to create parsers. But more recent tools provide simpler and more powerful interfaces for this purpose. Here, the tool PCCTS was used to create ParSyC. For details on the advantages of PCCTS see [PQ95, Par97].

The SystemC parser was built as follows: A preprocessor is used to account for preprocessor directives and to filter out header files that are not part of the design, like system header files. A lexical analyzer splits the input into a sequence of tokens. These are given as regular expressions that define keywords, identifiers, etc. of SystemC descriptions. Besides C++ keywords, essential keywords of SystemC are also added, e.g. sc_module or sc_int. A syntactical analyzer checks whether the sequence of tokens can be generated by the grammar that describes the syntax of SystemC. Terminals in this grammar are the tokens. PCCTS creates the lexical and syntactical analyzer from tokens and grammar, respectively. Together they are referred to as the parser.

The result of parsing a SystemC description is an Abstract Syntax Tree (AST). At this stage no semantic checks, e.g. for type conflicts, have been performed. The AST is constructed using a single node type that can have a pointer to the list of children and a pointer to one sibling. Additional tags at each node are used to store the type of a statement, the character string for an identifier, and other necessary information. This structure is explained by the following example.

Example 16. Consider the code fragment in Figure 4.3. Figure 4.2 shows the data types of the variables. Shown is one process of the robot controller introduced in [GLMS02]. Figure 4.4 shows a part of the AST for this process. Missing parts of the AST are indicated by triangles. In the AST produced by PCCTS each node points to the next sibling and to the list of children. The node in the upper left represents the if-statement from Line 3 of the code. The condition is stored as a child of this node. The then-part and the else-part of the statement are siblings of the child.
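The lexical analysis step can be illustrated with a hand-written tokenizer. This is only a sketch based on `std::regex`: in ParSyC the lexical analyzer is generated by PCCTS from token definitions, and the token classes, the keyword set, and all names below are assumptions made for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>
#include <vector>

enum class TokenKind { Keyword, Identifier, Number, Symbol };

struct Token {
    TokenKind kind;
    std::string text;
};

// Split the input into tokens. Each alternative of the regular expression
// describes one token class; whitespace between tokens is skipped.
std::vector<Token> tokenize(const std::string& input) {
    // Illustrative keyword set: a few C++ keywords plus two SystemC names.
    static const std::set<std::string> keywords = {
        "if", "else", "for", "void", "int", "sc_module", "sc_int"};
    static const std::regex pattern(
        "([A-Za-z_][A-Za-z0-9_]*)"               // group 1: identifier or keyword
        "|([0-9]+)"                              // group 2: decimal number
        "|(==|::|->|[^ \t\r\nA-Za-z0-9_])");     // group 3: operator or punctuation
    std::vector<Token> tokens;
    for (std::sregex_iterator it(input.begin(), input.end(), pattern), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        assert(m.length(0) > 0);   // every alternative consumes at least one character
        if (m[1].matched) {
            const std::string word = m[1].str();
            const TokenKind kind = keywords.count(word) != 0 ? TokenKind::Keyword
                                                             : TokenKind::Identifier;
            tokens.push_back({kind, word});
        } else if (m[2].matched) {
            tokens.push_back({TokenKind::Number, m[2].str()});
        } else {
            tokens.push_back({TokenKind::Symbol, m[3].str()});
        }
    }
    return tokens;
}
```

Treating SystemC names such as `sc_module` as keywords lets the grammar refer to them as terminals, which is the role the added SystemC keywords play in the parser described above.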
sc_in<sc_bv<8> >  uSEQ_BUS;
sc_out            LSB_CNTR;
sc_uint<8>        counter;
sc_signal         DONE, LDDIST, COUNT;
Figure 4.2. Data types in the process counter_proc

 1 void robot_controller::counter_proc()
 2 {
 3   if (LDDIST.read()) {
 4     counter = uSEQ_BUS.read();
 5   } else if (COUNT.read()) {
 6     counter = counter - 1;
 7   }
 8   DONE.write(counter == 0);
 9   LSB_CNTR.write(counter[0]);
10 }
Figure 4.3. Process counter_proc of the robot controller from [GLMS02]

Figure 4.4. AST for Example 16
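The single node type with one pointer to the children and one to the next sibling can be sketched as follows. The field names, the owning helper class `Ast`, and the textual tags are assumptions for this illustration; the actual node type produced by PCCTS may differ.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Single node type: a tag for the kind of construct, an optional text
// (e.g. an identifier), the first child, and the next sibling.
struct AstNode {
    std::string tag;
    std::string text;
    AstNode* firstChild = nullptr;
    AstNode* nextSibling = nullptr;
};

// Owns all nodes so that the raw links stay valid (helper for this sketch).
class Ast {
public:
    AstNode* make(const std::string& tag, const std::string& text = "") {
        nodes_.emplace_back(new AstNode());
        nodes_.back()->tag = tag;
        nodes_.back()->text = text;
        return nodes_.back().get();
    }
    // Append child at the end of the sibling chain below parent.
    static void addChild(AstNode* parent, AstNode* child) {
        assert(parent != nullptr && child != nullptr);
        if (parent->firstChild == nullptr) { parent->firstChild = child; return; }
        AstNode* last = parent->firstChild;
        while (last->nextSibling != nullptr) last = last->nextSibling;
        last->nextSibling = child;
    }
private:
    std::vector<std::unique_ptr<AstNode>> nodes_;
};

// Pre-order walk over node, its subtree, and then its following siblings.
void preorder(const AstNode* node, std::vector<std::string>& out) {
    for (; node != nullptr; node = node->nextSibling) {
        out.push_back(node->text.empty() ? node->tag : node->tag + " " + node->text);
        preorder(node->firstChild, out);
    }
}

// Fragment of Example 16: if (LDDIST.read()) { counter = uSEQ_BUS.read(); }
AstNode* buildIfExample(Ast& ast) {
    AstNode* ifStmt = ast.make("statement", "if");
    AstNode* assign = ast.make("expression", "=");
    Ast::addChild(assign, ast.make("ID", "counter"));
    Ast::addChild(assign, ast.make("ID", "uSEQ_BUS"));
    AstNode* thenBlock = ast.make("block");
    Ast::addChild(thenBlock, assign);
    Ast::addChild(ifStmt, ast.make("expression", "LDDIST"));  // condition: first child
    Ast::addChild(ifStmt, thenBlock);                         // then-part: its sibling
    return ifStmt;
}
```

Because every node has the same type, a consumer has to inspect the tag at each step of the walk; this is the case distinction that the intermediate representation later makes explicit.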
The overall procedure when applying the parser for synthesis is shown in Figure 4.5. The dashed box indicates steps that are application-independent, i.e. the corresponding tasks have to be executed for other applications such as visualization or formal verification as well. The whole process can be divided into several steps.

Figure 4.5. Overall synthesis procedure (SystemC description → Preprocessor → Preprocessed SystemC description → Parser → Abstract syntax tree → Analyzer → Intermediate representation → Synthesizer → Netlist)
After preprocessing, the parser is used to build the AST from the SystemC description of a design. The AST is traversed to build an intermediate representation of the design. All nodes in an AST have the same type; any additional information is contained in tags attached to these nodes. Therefore, different cases have to be handled at each node while traversing the AST. By transforming the AST into the intermediate representation, the information is made explicit in the new representation for further processing. The intermediate representation is built using classes to represent building blocks of the design, e.g. modules, statements, or blocks of statements. During this traversal semantic consistency checks are carried out. This includes checking for correct typing of operands, consistency of declarations, definitions, etc. Up to this stage the parser is not restricted to synthesis and all processing is application-independent. The intermediate representation serves as the starting point for the intended application. At this point, handling the design is much easier because it is represented as a formal model within the class structure of the intermediate representation. The classes for keeping the intermediate representation
correspond to constructs in the SystemC code. Each component knows about its own semantics in the original description. Further processing of the design is done by adding application-specific features to the classes used for storing the intermediate representation. In case of synthesis, a recursive traversal is necessary. Each class is extended by functions for the synthesis of substructures to generate a gate-level description of the design.

Example 17. Again, consider the AST shown in Figure 4.4. The AST is transformed into the intermediate representation shown in Figure 4.6. The structure looks similar to that of the AST, but in the AST only one type of node was used. Now, dedicated classes hold different types of constructs. The differentiation of these classes relies on inheritance in C++. Therefore, synthesis can recursively descend through the intermediate representation.

Figure 4.6. Intermediate representation

As usual in synthesis, RTL descriptions in SystemC are restricted to a subset of possible C++ constructs and SystemC constructs [Syn02]. C++ features like dynamic memory allocation, pointers, recursion, or loops with nonconstant bounds are not allowed, to prevent difficulties already known from high-level synthesis. In the same way some SystemC constructs are excluded from synthesis as they have no direct correspondence at the RTL, e.g. as shown for sc_fifo in Section 4.1.1. Thus, for simplicity SystemC channels were excluded from synthesis. For channels that obey certain restrictions, synthesis can be extended by providing a library of RTL realizations. All other constructs that are known from traditional hardware description languages are supported. This comprises different operators for SystemC data types, hierarchical modeling, and concurrent processes in a module. Additionally, the new-operator is allowed for instantiation of submodules to allow for a compact description of scalable designs.

The outcome of the synthesis process is a gate-level description in Berkeley logic interchange format (blif format) as used by SIS [SSL+92]. Switching the output format to VHDL or Verilog on the RTL is easily possible. Here, the focus is on parsing SystemC and retrieving a formal model from the description; no optimizations are applied. Logic synthesis will be considered later in Section 4.2.
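The recursive descent through dedicated classes described in Example 17 can be sketched with a small class hierarchy. The class names `CIfStatement` and `CBlock` appear in Figure 4.6; the base class `CStatement`, the spelled-out name `CAssignStatement`, the member names, and the textual trace that stands in for gate generation are assumptions of this sketch, not the ParSyC implementation.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Base class: each construct knows how to process itself. Processing here
// only appends a line to a trace instead of generating gates.
class CStatement {
public:
    virtual ~CStatement() {}
    virtual void synthesize(std::vector<std::string>& out) const = 0;
};

class CAssignStatement : public CStatement {
public:
    CAssignStatement(const std::string& dst, const std::string& src)
        : destination_(dst), source_(src) {}
    void synthesize(std::vector<std::string>& out) const override {
        out.push_back("assign " + destination_ + " <- " + source_);
    }
private:
    std::string destination_, source_;
};

class CBlock : public CStatement {
public:
    void add(std::unique_ptr<CStatement> s) {
        assert(s != nullptr);
        statements_.push_back(std::move(s));
    }
    void synthesize(std::vector<std::string>& out) const override {
        for (const auto& s : statements_) s->synthesize(out);   // recursive descent
    }
private:
    std::vector<std::unique_ptr<CStatement>> statements_;
};

class CIfStatement : public CStatement {
public:
    CIfStatement(const std::string& cond, std::unique_ptr<CStatement> thenPart,
                 std::unique_ptr<CStatement> elsePart)
        : condition_(cond), then_(std::move(thenPart)), else_(std::move(elsePart)) {}
    void synthesize(std::vector<std::string>& out) const override {
        out.push_back("mux on " + condition_);
        if (then_) then_->synthesize(out);
        if (else_) else_->synthesize(out);
    }
private:
    std::string condition_;
    std::unique_ptr<CStatement> then_, else_;
};

// Lines 3-7 of Figure 4.3 as a tree of these classes.
std::unique_ptr<CStatement> buildCounterUpdate() {
    std::unique_ptr<CBlock> thenBlock(new CBlock());
    thenBlock->add(std::unique_ptr<CStatement>(new CAssignStatement("counter", "uSEQ_BUS")));
    std::unique_ptr<CStatement> inner(new CIfStatement(
        "COUNT",
        std::unique_ptr<CStatement>(new CAssignStatement("counter", "counter - 1")),
        nullptr));
    return std::unique_ptr<CStatement>(
        new CIfStatement("LDDIST", std::move(thenBlock), std::move(inner)));
}
```

In contrast to the AST, no tag has to be inspected during the traversal: the virtual call selects the handling for each construct, so a new application only adds one member function per class.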
4.1.3 Characteristics
The presented approach to create a formal model from a SystemC description has several advantages:

Extendability. SystemC is still evolving. The parser can easily be enhanced to cope with future developments by extending the underlying grammar and the classes for the intermediate representation. The necessary changes are straightforward in most cases.

Adaptivity. Here, ParSyC is applied to synthesis only as an example, but several other applications are also of interest. When starting with a new application that should work on SystemC designs, the intermediate representation directly serves as a first model of the design.

Decoupling. The complex process of parsing should be hidden from the application. ParSyC serves as the front-end to “understand” a given SystemC description. The application-specific algorithms can be crafted without touching the SystemC code of the design.

Efficiency. A fast front-end is necessary to cope with large designs. The efficiency of the front-end is guaranteed by the compiler generator PCCTS. The subsequent application can directly start processing the intermediate representation that is given as a C++ class structure. Experiments are presented in the next section to underline the efficiency of ParSyC.

Compactness. The parser should be compact to allow for an easy understanding during later use and extension. The parser itself has only ≈1000 lines of code (loc), which includes the grammar and necessary modifications beyond PCCTS to create the AST. The code for analyzing the SystemC code and for the classes that represent the intermediate representation consists of ≈4000 loc. For synthesis ≈2500 loc are needed. The complete tool for synthesis including error handling, messaging, etc. has ≈9000 loc. Comments and blank lines in the source are not included in these numbers.
4.1.4 Experimental Results
All experiments have been carried out on a Pentium IV with Hyperthreading at 3 GHz and 1 GB RAM running Linux. ParSyC has been implemented using C++. A control dominated design and a data dominated design are considered in the first two experiments, respectively. Large SystemC descriptions are created from ISCAS89 circuits to demonstrate the efficiency of ParSyC in the third experiment.
4.1.4.1 Control Dominated Design
The scalable arbiter introduced in [McM93] has been frequently used in works related to property checking. Therefore, a SystemC description at the RTL was created and synthesized. The top-level view of the arbiter is shown in Figure 4.7. This design handles the access of NUMC devices to a shared resource. Device i can signal a request on line req_in[i] and may access the resource if the arbiter sets ack_out[i]. The arbiter uses priority scheduling but also guarantees that no device waits forever (for details we refer to [McM93]). Figure 4.8 shows the SystemC description of the top-level module scalable. For each of the NUMC devices a corresponding arbiter cell is instantiated and the cells are interconnected using a for-loop.

Figure 4.7. Arbiter: Block-level diagram (cells 0 to n–1, each with the ports req_in, ack_out, token_in, token_out, grant_in, grant_out, override_in, and override_out)

#include "RTLCell.h"
#include "Inverter.cc"
#define NUMC 2
SC_MODULE(scalable) {
  // Declaration of inputs, outputs
  // and internal signals
  ...
  Inverter *inv;
  RTLCell *cells[NUMC];
  SC_CTOR(scalable) {
    for (int i = 0; i < NUMC; ++i) {
      // Create cell i
      cells[i] = new RTLCell("cells");
      if (i == 0) {
        // Connect cell 0
        cells[i]->TICK(clk);
        ...
        cells[i]->ove_out(override_out);
      } else {
        if (i == (NUMC-1)) {
          // Connect cell NUMC-1
          ...
        } else {
          // Connect cell i
          ...
        }
      }
    }
    inv = new Inverter("Inverter");
    inv->in(override_out);
    inv->out(grant_in);
  }
};
Figure 4.8. Arbiter: Top-level module scalable

Results for the synthesis with different numbers of arbiter cells are shown in Table 4.1. Given are the size of the netlist output and the CPU times needed. Note that the arbiter cells are described at the RTL and synthesis is carried out without applying optimizations. The netlist is built using two-input gates. The hierarchical description generated by the synthesis tool always contains 190 gates: 188 gates per arbiter cell plus one additional buffer and one inverter. The number of gates contained in the flattened netlist is larger. This number is shown in the table. The same holds for the number of latches. The flattened netlist contains two latches per arbiter cell while the hierarchical netlist only contains two latches in total.

The times needed for parsing (tp), analyzing (ta), synthesis (ts), and the total time (tt) are shown in the respective columns. As can be seen, scaling the arbiter does not influence the time for parsing because only the constant NUMC in the source code is changed. The time for analysis increases moderately since type checks for the different cells have to be carried out. During synthesis the for-loop has to be unrolled and, therefore, scaling influences the synthesis time. The total time is dominated by the time needed for synthesis and includes overhead like reading the template for the output format, parsing the command line, etc. Even synthesizing a design that corresponds to a flattened netlist with 188 k gates takes less than one CPU second.

Table 4.1. Arbiter: Synthesis results

NUMC    In     Out    Latches   Gates     tp      ta      ts      tt
   5       6      5       10       942    <0.01   <0.01   <0.01   0.04
  10      11     10       20      1882    <0.01   0.01    <0.01   0.01
  50      51     50      100      9402    <0.01   <0.01   0.02    0.04
 100     101    100      200     18802    <0.01   0.01    0.02    0.04
 500     501    500     1000     94002    <0.01   0.03    0.08    0.11
1000    1001   1000     2000    188002    <0.01   0.06    0.16    0.24
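The Gates and Latches columns of Table 4.1 follow directly from the per-cell figures stated above (188 gates and two latches per cell, plus one buffer and one inverter). A minimal C++ sketch of this arithmetic, with hypothetical names:

```cpp
#include <cassert>

// Size of the flattened arbiter netlist for a given number of cells.
struct ArbiterSize {
    long gates;
    long latches;
};

ArbiterSize flattenedArbiterSize(long numCells) {
    assert(numCells >= 1);
    ArbiterSize size;
    size.gates   = 188 * numCells + 2;   // 188 per cell, plus buffer and inverter
    size.latches = 2 * numCells;         // two latches per cell
    return size;
}
```

For NUMC = 1000 this gives 188002 gates and 2000 latches, i.e. the size of the flattened netlist grows linearly with the number of cells.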
4.1.4.2 Data Dominated Design The second design is a Finite Impulse Response (FIR)-filter of scalable width. Scalable are the number of coefficients and the bit-width of data. A blocklevel diagram of the FIR-filter is shown in Figure 4.9. Incoming data is stored in a shift-register (d[0],...,d[n-1]). A read only memory stores the filter coefficients (c[0],...,c[n-1]) as constants. The result is provided at the output dout. The SystemC description contains one process to create the shift-register and another process that carries out the calculations. The coefficients are provided by an array of constants. Synthesis results for different bit-widths and numbers of coefficients are given in Table 4.2. In case of the arbiter, additional checks for submodules were necessary during analysis of the for-loop. This is not the case for the FIR-filter where no submodules are created, therefore, scaling does not influence the time needed for analysis. But for the FIR-filter the time for synthesis increases faster compared to the arbiter
Figure 4.9. FIR-filter: Block-level diagram (inputs reset and din; shift-register stages d[1], ..., d[n]; coefficients c[1], ..., c[n]; multipliers and adders; output dout)
Table 4.2. FIR-filter: Synthesis results (times in CPU seconds)

Width  Coeff   In   Out  Latches    Gates     tp     ta     ts     tt
    2      2    3     4        8      159  <0.01  <0.01  <0.01  <0.01
    2      4    3     4       12      301  <0.01  <0.01  <0.01  <0.01
    2      8    3     4       20      585  <0.01  <0.01   0.02   0.03
    4      2    5     8       16      611  <0.01  <0.01   0.01   0.01
    4      4    5     8       24     1189  <0.01  <0.01   0.02   0.04
    4      8    5     8       40     2345  <0.01  <0.01   0.03   0.03
    8      2    9    16       32     2283  <0.01  <0.01   0.03   0.04
    8      4    9    16       48     4501  <0.01  <0.01   0.05   0.05
    8      8    9    16       80     8937  <0.01  <0.01   0.10   0.12
   16      2   17    32       64     8699   0.01  <0.01   0.07   0.08
   16      4   17    32       96    17269  <0.01  <0.01   0.14   0.15
   16      8   17    32      160    34409  <0.01  <0.01   0.30   0.31
   32      2   33    64      128    33819  <0.01  <0.01   0.29   0.29
   32      4   33    64      192    67381  <0.01  <0.01   0.63   0.64
   32      8   33    64      320   134505  <0.01   0.01   1.22   1.23
   64      2   65   128      256   133211  <0.01  <0.01   1.23   1.23
   64      4   65   128      384   265909  <0.01  <0.01   2.34   2.35
   64      8   65   128      640   531305  <0.01  <0.01   5.34   5.34
when the design is expanded. This is due to the description of the multiplication as a simple equation in SystemC:

    for (int i = 0; i < n; i++) {
      tmp = c[i] * d[i].read();
      out = out + tmp;
    }

Instead of instantiating modules, the operations are directly described in the netlist. Therefore, the bit-width and the number of coefficients have a direct influence on the synthesis time and the size of the output.
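The multiply-accumulate loop can be mirrored in a few lines of Python to make the data flow explicit. This is only a behavioral sketch under assumptions: the function name `fir_step` is hypothetical, samples are plain unbounded integers (no bit-widths), and the output is computed from the register contents after the new sample has been shifted in, which is a timing choice the text does not specify.

```python
def fir_step(shift_reg, coeffs, din):
    """Shift one sample in and return (new_shift_reg, dout).

    Assumption: the newest sample enters at d[0] and the oldest one drops out.
    """
    new_reg = [din] + shift_reg[:-1]
    out = 0
    # Same structure as the SystemC loop: out = sum of c[i] * d[i].
    for c, d in zip(coeffs, new_reg):
        tmp = c * d
        out = out + tmp
    return new_reg, out


# Usage: a filter with three coefficients fed with the samples 1, 2, 3.
reg = [0, 0, 0]
outputs = []
for sample in (1, 2, 3):
    reg, dout = fir_step(reg, [1, 2, 3], sample)
    outputs.append(dout)
```

Each additional coefficient adds one multiplier and one adder to the unrolled loop, which is in line with the observation that both parameters directly affect the size of the netlist.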
Even the largest design of about 500 k gates was parsed and analyzed very quickly. Synthesis, which includes writing the output to the hard disk, took only about 5 CPU seconds.
4.1.4.3 Large SystemC Descriptions
The first two experiments showed the influence of scaling different types of designs. In the following, the influence of a large SystemC description is investigated. Circuits from the ISCAS89 benchmark set are considered. Starting from the netlist, BDDs (see Section 2.1.2) were built for each circuit. While building the BDDs, no reordering techniques were applied to reduce their size. For each output and next-state function, the BDD was dumped into an if-then-else structure that was embedded in a SystemC module. This module was synthesized. The results are shown in Table 4.3. Given are the name of the circuit, the lines of code (loc), and the number of characters (char) in the SystemC code. The circuits are ordered by increasing loc. As can be seen, the time for parsing increases with the size of the source code, but is small even for large designs of several 100,000 loc. The time needed for analysis increases faster due to the semantic checks and the translation into the intermediate representation that are carried out at this stage. The largest amount of time is due to synthesis, where the intermediate structure is traversed and the netlist is written. As a reference, the time needed to compile the SystemC code using g++ (version 3.3.2, optimizations turned off, no linking) is given in column tg++. Compiling the SystemC description using g++ creates an executable description of the design for simulation, while synthesis creates the hardware description of the design. The total run time needed for synthesis is comparable to the time needed by g++, even for the largest files.

Table 4.3. ISCAS 89: Synthesis results

Circuit      Loc     Char     tp     ta     ts     tt   tg++
s27          184     3129  <0.01  <0.01   0.01   0.02   2.26
s298        1269    20798   0.01   0.02   0.07   0.12   2.26
s382        2704    47343   0.02   0.05   0.16   0.26   2.44
s400        2704    47343   0.03   0.04   0.16   0.26   2.41
s386        4260    69331   0.04   0.07   0.24   0.39   2.57
s526        3332    52999   0.03   0.05   0.19   0.31   2.56
s344        5103    86055   0.04   0.06   0.29   0.45   2.70
s349        5103    86055   0.05   0.09   0.29   0.49   2.73
s444        6264    97100   0.06   0.11   0.39   0.63   2.78
s641       54849   847546   0.48   1.27   4.16   6.47   8.27
s713       54849   847546   0.50   1.29   4.24   6.58   8.52
s1488      60605   981692   0.57   1.15   3.61   5.95   8.81
s1494      60605   981692   0.55   1.17   3.61   5.96   8.84
s1196     247884  3817191   2.27   5.57  16.53  26.82  30.88
s1238     247884  3817191   2.33   5.62  16.58  27.01  31.28
s820      402546  6130213   3.80  10.53  25.36  43.77  42.68
s832      402546  6130213   3.75  10.57  25.69  43.99  43.12

The experiments have shown that ParSyC is an efficient front-end for SystemC. For this purpose, designs have been considered that are large in terms of the number of gates and in terms of the size of the SystemC description. The performance of ParSyC is comparable to that of the efficient and widely used compiler g++.
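To picture how dumping a BDD into an if-then-else structure produces large source files, the following Python sketch emits nested if/else text for a small BDD. It is an illustration only: the tuple encoding `(variable, low, high)`, the function name `dump_ite`, and the emitted C-like syntax are assumptions, not the format used for the experiments.

```python
def dump_ite(node, target="f", indent=0):
    """Return the lines of a nested if-then-else structure for one BDD.

    A node is either a terminal (0 or 1) or a tuple (variable, low, high).
    """
    pad = "  " * indent
    if node in (0, 1):
        return [f"{pad}{target} = {node};"]
    var, low, high = node
    lines = [f"{pad}if ({var}) {{"]
    lines += dump_ite(high, target, indent + 1)
    lines.append(f"{pad}}} else {{")
    lines += dump_ite(low, target, indent + 1)
    lines.append(f"{pad}}}")
    return lines


# Usage: f = x1*x2 + x3 with variable order x1, x2, x3.
X3 = ("x3", 0, 1)
F = ("x1", X3, ("x2", X3, 1))
print("\n".join(dump_ite(F)))
```

In this naive tree-style dump, a node shared by several parents (here the x3 node) is written out once per path that reaches it, so the text can grow much faster than the BDD itself. Whether the original descriptions duplicated shared nodes in this way is not stated in the text.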
4.2 Synthesis for Testability
The previous section provides the methodology to transform a SystemC description into a functional description at the gate level. This gate-level description essentially describes the circuit as a Boolean function. In this section, a gate-level description is transformed into a fully testable circuit. The underlying algorithm relies on the use of BDDs for logic synthesis. BDDs have been applied in several areas. For example, their importance for formal verification has already been discussed. But BDDs have also been studied in logic synthesis since they make it possible to combine aspects of circuit synthesis and technology mapping [GD00, DG02]. Recently, there has been renewed interest in multiplexor-based design styles, since multiplexor nodes can often be realized at very low cost (e.g. in Pass Transistor Logic (PTL)). In addition, these techniques make it possible to consider layout aspects during the synthesis step and thereby guarantee high design quality (see, e.g. [MSML99, MBM01]). In this context, circuits derived from BDDs often result in smaller chips. Besides size, the testability of chips is another important issue, i.e. which faults in the resulting chip can be tested and which ones cannot. BDD circuits as introduced in Section 2.2.2 have been studied intensively under various fault models [ADK91b, ADK91a, Bec92, ADK93]; for an overview see [Bec98]. But none of these approaches can guarantee 100% testability in a “systematic way”. For example, in [Bec92] an algorithm is given that can compute all redundancies of the circuit in polynomial time. But the removal of these redundancies can generate new ones (so-called 2nd-generation redundancies). For their removal only classical ATPG can be applied. On the other hand, many approaches have been presented to improve the testability of an already synthesized circuit based on circuit transformations (see, e.g. [CPK95]).
But also here, the techniques to ensure full testability can be time consuming and it is desirable to have “testability by construction”. In this section, a simple transformation is presented that guarantees full testability of a circuit derived from a BDD description under the stuck-at fault model and the robust path-delay fault model. The size of the circuit is directly proportional to the given BDD size (see Section 2.2.2). All optimizations of the BDDs based on variable ordering directly transfer to the resulting circuit
sizes. Only one extra input and one inverter are needed. The resulting circuits are free of redundancies. The algorithm has been implemented as a tool for Multiplexor Transformation for Testability (MuTaTe). Experimental results are given that show the advantages of the approach compared to traditional synthesis approaches and to “classical” mapping of BDDs. The presentation is structured as follows: The creation of fully testable BDD circuits is explained in the next section. Then, in Section 4.2.2, the main results on testability are presented. Experimental results that show the efficiency of the synthesis step and the improvements regarding testability are reported in Section 4.2.3.
4.2.1 BDD Transformation
In the following, we first describe how a circuit is derived from a given BDD description. Then, some properties of the resulting circuits are discussed. In the next sections, testability properties regarding the SAFM and the PDFM are studied. Analogously to the “standard approach” from [Bec92] as explained in Section 2.2.2, the circuit is generated by traversing the BDD and substituting each node with a MUX cell. The methods differ, however, at nodes that have one or two pointers to terminal nodes. In the standard approach, the MUX cell is usually simplified in this case. For example, if the 0-input is connected to constant 0, the MUX cell can be substituted by an AND-gate. Here, all nodes, including the ones pointing to terminals, are substituted by complete multiplexor cells. The terminal node 0 is then substituted by a new primary input t (= test). Furthermore, t is connected to the 1-terminal of the BDD by an inverter.
Example 18. The generation of a circuit from a BDD was already explained in Example 8 in Section 2.2.2. Figures 4.10(a) and 4.10(b) repeat this example. The BDD for function f = x1 x2 + x3 is drawn upside down to underline the similarities to the resulting circuit. If the approach from [Bec92] is applied, the BDD circuit in Figure 4.10(b) results (shown without simplification). The transformation described above generates the circuit in Figure 4.10(c).
Remark 6. It is important to notice that for multiplexor-based design styles, e.g. PTL, the “simplification” of the MUX cell does not really imply savings in area or delay since the complete multiplexor cell is often easier to realize.
If t is set to constant 0, the circuit computes the original function. If t is set to 1, the complement is computed. It is important to observe that by changing
Figure 4.10. Generation of circuits from BDDs: (a) BDD, (b) Approach of [Bec92], (c) MuTaTe
the value of t all “internal” signals, i.e. signals corresponding to edges in the BDD, change their value. This can be seen as follows. Inverting the terminals means inverting the function represented by a node. According to the Shannon decomposition, this inversion propagates to all intermediate nodes as well:
\[
\overline{f} = \overline{x_i g + \overline{x_i} h} = \overline{(x_i g)} \cdot \overline{(\overline{x_i} h)} = (\overline{x_i} + \overline{g})(x_i + \overline{h}) = x_i \overline{g} + \overline{x_i}\,\overline{h}
\]
Applying this recursively to the BDD shows that all signals in the BDD circuit change their value. This guarantees that the values needed at the fault location can be applied, as explained below. Furthermore, the propagation of the faulty behavior to an output has to be ensured. This is one of the reasons why multiplexor cells are suited to compose circuits that are well testable. Due to the control input, a propagating path can easily be generated. Thus, the propagation of a value from a fault location is no problem and only the values to excite the faulty behavior have to be applied. In previous approaches (see, e.g. [Bec92, ADK93]), modifications of the circuit were described, but these change the multiplexor structure and thereby also destroy the propagation properties of multiplexors.
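The effect of the test input can be checked on the function of Example 18 with a small Python model. The sketch below is an assumption-laden illustration: the dictionary encoding of the BDD and the names `BDD`, `ROOT`, `mux_signals`, and `check` are hypothetical, and each MUX cell is modeled only by its function, not by its gate-level structure.

```python
from itertools import product

# BDD for f = x1*x2 + x3 with variable order x1, x2, x3.
# Each node maps to (variable, low child, high child); "0" and "1" are terminals.
BDD = {
    "n1": ("x1", "n3", "n2"),
    "n2": ("x2", "n3", "1"),
    "n3": ("x3", "0", "1"),
}
ROOT = "n1"


def mux_signals(inputs, t):
    """Value of every MUX output when terminal 0 is driven by t and terminal 1 by NOT t."""
    values = {"0": t, "1": 1 - t}

    def value(node):
        if node not in values:
            var, low, high = BDD[node]
            values[node] = value(high) if inputs[var] else value(low)
        return values[node]

    for node in BDD:
        value(node)
    return values


def check():
    """Exhaustively compare the circuit for t = 0 and t = 1."""
    for bits in product((0, 1), repeat=3):
        inputs = dict(zip(("x1", "x2", "x3"), bits))
        f = (inputs["x1"] & inputs["x2"]) | inputs["x3"]
        normal = mux_signals(inputs, t=0)
        inverted = mux_signals(inputs, t=1)
        assert normal[ROOT] == f             # t = 0: original function
        assert inverted[ROOT] == 1 - f       # t = 1: complement
        # every signal, terminals included, flips with t
        assert all(normal[n] == 1 - inverted[n] for n in normal)
    return True
```

Calling `check()` confirms, for this one example, that toggling t complements the output and every internal signal. It is a brute-force check on three variables, not a proof of the general statement.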
4.2.2 Testability
The main results with respect to the testability of circuits generated by MuTaTe are presented in the following. Proof sketches for these results are also given.
4.2.2.1 Stuck-At Fault Model
As has been observed in [Bec92], stuck-at redundancies in a mapped BDD circuit can only occur at a MUX cell if one of the values 01 or 10 is not applicable to the data inputs of this cell. On the other hand, at least one of the two values is applicable (otherwise both functions would be equivalent and, consequently, the BDD would not be reduced). Due to the property of the new input t that all internal signals change their value, the missing value can be applied by changing the value at t. We obtain:
Theorem 1. By one additional input and one inverter a circuit can be generated from a BDD that is 100% testable for single stuck-at faults.
For the generation of a test, the efficient polynomial synthesis operations on BDDs can be used (see Section 2.1.2). For each multiplexor cell the set of applicable values can easily be computed by carrying out AND-operations on the corresponding BDD nodes. The propagating path can be determined by a linear-time graph traversal.
Lemma 1. In the resulting circuits, test pattern generation for stuck-at faults can be carried out in polynomial time.
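The role of t in making both 01 and 10 applicable can be illustrated by enumerating the data-input values of each MUX cell for the BDD of f = x1 x2 + x3. The sketch is a brute-force stand-in under assumptions: it enumerates all input assignments instead of using the BDD AND-operations mentioned above, and the encoding and the names `signal` and `applicable` are hypothetical.

```python
from itertools import product

# BDD for f = x1*x2 + x3: node -> (variable, low child, high child).
BDD = {
    "n1": ("x1", "n3", "n2"),
    "n2": ("x2", "n3", "1"),
    "n3": ("x3", "0", "1"),
}
VARS = ("x1", "x2", "x3")


def signal(node, inputs, t):
    """Value of a signal when terminal 0 is driven by t and terminal 1 by NOT t."""
    if node == "0":
        return t
    if node == "1":
        return 1 - t
    var, low, high = BDD[node]
    return signal(high, inputs, t) if inputs[var] else signal(low, inputs, t)


def applicable(node, t_values):
    """Set of (0-input, 1-input) value pairs that can occur at one MUX cell."""
    _, low, high = BDD[node]
    pairs = set()
    for t in t_values:
        for bits in product((0, 1), repeat=len(VARS)):
            inputs = dict(zip(VARS, bits))
            pairs.add((signal(low, inputs, t), signal(high, inputs, t)))
    return pairs


# With t fixed to 0, the cell n2 never sees the combination 10 ...
without_t = applicable("n2", t_values=(0,))
# ... but with t as a free input, both 01 and 10 occur.
with_t = applicable("n2", t_values=(0, 1))
```

For this example every cell obtains both 01 and 10 once t may be toggled, which matches the argument leading to Theorem 1. The enumeration is exponential in the number of inputs and is meant only to make the argument concrete.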
4.2.2.2 Path-Delay Fault Model
The same arguments as given above also ensure that all paths are testable under the PDFM. At each cell the values 10 or 01 can be applied (dependent on t). Thus, the paths starting at an input corresponding to the variable xi can be propagated along any of the two AND-gates in the MUX cell (see Figure 2.9 on Page 22). Furthermore, due to the propagation along the multiplexors, it is easy to see that the paths starting at t can be tested. We obtain:
Theorem 2. By one additional input and one inverter a combinational circuit can be generated from a BDD that is 100% testable for robust path-delay faults.
After the applicable values have been determined based on BDD operations, two patterns for a robust test can be determined by a traversal of the circuit in linear time. The two patterns only differ in the value of the primary input where the path starts.
Lemma 2. In the resulting circuits, test pattern generation for path-delay faults can be carried out in polynomial time.
Figure 4.11. Redundancy due to simplification
4.2.2.3 Partial Simplification
As has been observed in Remark 1 on Page 22, the simplification of the MUX cells can destroy the testability. But not all types of simplification have this property: if both data inputs have constant values, the MUX cell can be substituted by a simple wire or an inverter, and 100% testability is preserved under this substitution. Depending on the design style, this substitution should be preferred. If exactly one of the data inputs is constant, there are four cases, i.e. the left or the right data input is constantly 0 or 1, respectively. For one of these cases we provide an example showing that the simplification of the MUX cell results in redundancies. A similar example can be created for the remaining three cases.
Example 19. Assume that the right data input is constantly 1 (see Figure 4.11). The simplification results in an OR-gate, but then the combination 01 cannot be applied to the data inputs of the next MUX since a value 1 at the right data input directly implies a 1 at the left data input. According to the classification given in [Bec92], this would result in untestable faults in the MUX.
In summary, the MUX cell can be simplified without creating redundancies only if both data inputs have constant values; this is useful in design styles where the simplification is beneficial. For the four cases with exactly one constant data input, full testability cannot be guaranteed. In the following, we only perform the simplifications that ensure full testability.
4.2.3 Experimental Results
The technique described above has been implemented as the tool MuTaTe. The program is implemented in C and all experiments have been carried out on a SUN Sparc 20 with 64 MB of main memory. For the experiments we used some of the benchmarks from LGSynth91 [Yan91]. As the underlying BDD package, CUDD has been used [Som01a]. In the following, we restrict ourselves to a study of the PDF coverage (PDFC) of the circuits. The PDFC was determined using an improved version of the tool BiTeS [Dre94]. For each circuit we report the number of literals (measured using SIS [SSL+ 92]), the number of paths that have to be tested, and the PDFC in percent. Of course, the number of paths can become a crucial factor since a large number results
Table 4.4. Benchmarks before and after optimization by SIS

                         Original               Optimized
Circuit   In  Out   Lits    NoP   PDFC    Lits    NoP   PDFC
5xp1       7   10    391    296   98.4     158   1071   19.9
C17        5    2     23     11  100.0      10     11  100.0
alu2      10    6    909  69521    1.0     415  61362    0.9
b9        41   21    408    301   93.3     171    410   89.0
clip       9    5   1026    888   90.4     273    571   74.3
con1       7    2     48     23  100.0      21     22  100.0
count     35   16    338    368  100.0     159    384  100.0
i1        25   13    118     97   97.4      50     98   77.5
i5       133   66    689    883  100.0     198    672  100.0
t481      16    1   2673   4752  100.0     532   2405   89.5
tcon      17   16     90     56   85.7      32     40  100.0
9sym       9    1    655    522   96.5     333    518   91.1
f51m       8    8    409    319   98.2     155  30950    0.7
z4ml       7    4    325    252  100.0      46    368   25.2
x2        10    7     97     90   74.4      51     79   81.0
in high test costs. For BDD circuits, reducing the number of paths in the BDD [FD06] is one method to address this problem [FSD04]. In Table 4.4, the name of the benchmark is given in the first column, followed by the number of inputs and outputs in columns two and three, respectively. The number of literals, the number of paths, and the PDFC are given in columns Lits, NoP, and PDFC, respectively. The column group Original gives the numbers for the benchmark as it is given in the description. The column group Optimized gives the numbers for the circuits that have been optimized by SIS using script rugged. As can be seen, the PDFC varies significantly. While some circuits have a testability of 100%, for others only 1% of the paths (or even less) are robustly testable. It is also important to notice that the optimization techniques used can result in a large number of untestable paths although the original circuit was very well testable. Consider circuit f51m as the most obvious example. Even though the original circuit had a PDFC of 98.2%, the coverage of the optimized netlist is less than 1%. In a second series of experiments, we study the PDF testability of BDD circuits. The results for BDDs without optimization are shown in Table 4.5. Table 4.6 presents the results when optimization was used. In column MUX-map, the results are given for a direct mapping of BDDs with simplification of the constant values as described in [Bec92]. As has been observed in Remark 1, the “full simplification” can result in untestable paths. But already in this case the resulting BDD circuits have a significantly better testability than the ones generated by SIS (see above), i.e. always more than 60%.
Table 4.5. Path-delay fault coverage of BDD circuits

                  MUX-map                MuTaTe               MuTaTe-S
Circuit   Lits    NoP   PDFC    Lits    NoP   PDFC    Lits    NoP   PDFC
5xp1       273    273   89.0     320    574  100.0     308    364  100.0
C17         26     22   68.1      41     44  100.0      29     26  100.0
alu2       652    873   86.9     718   1713  100.0     698    984  100.0
b9         609   1773   64.6     768   3429  100.0     700   2370  100.0
clip       603    954   79.4     667   1862  100.0     655   1130  100.0
con1        53     47   74.4      77     95  100.0      61     59  100.0
count      832   2248   66.1     928   4072  100.0     864   2704  100.0
i1         157    137   74.4     230    295  100.0     186    184  100.0
i5        2659  44198   61.3    3032  81381  100.0    2768  51345  100.0
t481       301   4518   86.1     328   8671  100.0     312   5473  100.0
tcon        32     40  100.0      96    112  100.0      32     40  100.0
9sym        84    328   72.5      97    658  100.0      93    490  100.0
f51m       239    326   99.3     262    668  100.0     242    332  100.0
z4ml        75    175   77.1      92    358  100.0      84    232  100.0
x2         147    188   72.3     183    379  100.0     171    244  100.0
Table 4.6. Path-delay fault coverage of BDD circuits optimized by sifting

                  MUX-map                MuTaTe               MuTaTe-S
Circuit   Lits    NoP   PDFC    Lits    NoP   PDFC    Lits    NoP   PDFC
5xp1       131    218   83.0     168    463  100.0     160    337  100.0
C17         21     18   66.6      29     35  100.0      25     23  100.0
alu2       673    749   84.5     746   1446  100.0     730    867  100.0
b9         374    870   66.7     520   1707  100.0     468   1227  100.0
clip       353    764   76.4     391   1466  100.0     379    938  100.0
con1        38     28   85.7      61     59  100.0      45     35  100.0
count      221    624   58.9     320   1024  100.0     252    880  100.0
i1         141    111   71.1     194    226  100.0     170    142  100.0
i5         341   1102   67.6     552   2301  100.0     540   1998  100.0
t481        72   4065   74.6      80   7201  100.0      76   4996  100.0
tcon        32     40  100.0      96    112  100.0      32     40  100.0
9sym        84    328   72.5      97    658  100.0      93    490  100.0
f51m       133    236   83.8     158    494  100.0     146    314  100.0
z4ml        51    169   72.7      64    346  100.0      60    238  100.0
x2          87     80   73.7     123    160  100.0     111    112  100.0
The results for the new approach are given in the next two blocks. Column MuTaTe gives the results for a direct mapping, i.e. the new test input is connected to each constant input to a MUX cell, while MuTaTe-S performs the simplifications described in Section 4.2.2.3 that preserve testability. In some rare cases (e.g. tcon), this reduction also removed the additional input. In these cases 100% PDFC is ensured while no additional input is needed.
As can be seen, in both cases 100% PDFC is ensured. As is well-known, the size of a BDD (and by this of the resulting BDD circuit) largely depends on the chosen variable ordering. Comparing the literal count of the final circuits in Table 4.6, i.e. the size optimized BDDs, with those of SIS in Table 4.4, shows that the synthesis methods are somehow “orthogonal”. For several circuits the sizes are comparable. In some cases SIS is significantly better (see, e.g. b9) while for others BDDs are better suited. For example, for t481 the BDD circuit generated by MuTaTe-S is seven times smaller than the corresponding circuit produced by SIS. Furthermore, the synthesis scenario considered in our experiments is to be seen as “worst case” for BDD circuits (cf. Remark 6) since all cells are mapped to basic gates. For MUX oriented design styles the reduction in size can be expected to be even larger.
4.3 Summary and Future Work
A flow to produce fully testable circuits from SystemC descriptions has been outlined. This is done in two steps. The parser and synthesis tool ParSyC is used as a front-end to construct a formal model from a SystemC description. The formal model is given by an intermediate representation that can serve as a starting point for other applications in the design flow as well, e.g. verification and visualization. This hides the complexity of parsing a SystemC description from the intended application. As an example, the synthesis of RTL descriptions has been shown. Several experiments underline the efficiency of the tool. Bridging the refinement steps within SystemC using formal techniques remains an important area for future work. The second step is logic synthesis. Here, synthesis for testability is applied to the gate-level model using a BDD transformation. The resulting circuits are fully testable under the stuck-at fault model and under the (robust) path-delay fault model in the combinational case. The transformation only needs one extra input and one inverter. The algorithm has been implemented as the program MuTaTe. Experimental studies have demonstrated the advantages of the approach. The optimization of the resulting circuits has been studied in [FSD04]. There, for example, the technique proposed in [FD06] was applied to reduce the number of paths in the BDD and, thereby, the time needed for testing. Furthermore, results on the testability with respect to the bridging fault model have been reported in [SFD05a]. Unfortunately, the technique is only applicable to relatively small circuits because it relies on monolithic BDDs. This also means that the composition of fully testable circuits created using MuTaTe is an important direction for future work. Meanwhile, traditional ATPG techniques have to be applied to those circuits that cannot be handled by MuTaTe. But formal techniques can also be applied to improve the robustness of traditional ATPG algorithms, e.g. by integrating modern SAT provers into traditional ATPG environments [Lar92, SBSV96, SFD+ 05b, DF06].
Altogether the two techniques proposed in this chapter increase the robustness of the design flow. The application of SystemC as the only language for system level and register transfer level helps to avoid inconsistencies between the different abstraction levels. Having a technique to create fully testable circuits during logic synthesis improves the robustness of ATPG, especially due to the ability to classify all faults in polynomial time.
Chapter 5 PROPERTY GENERATION
In the previous chapter, the synthesis path of the design flow has been considered. This chapter focuses on design verification. The parts of the flow covered in this chapter are shown at a coarse level in Figure 5.1. A more detailed presentation of the proposed methodologies is given in the respective sections. Verification is a major issue in the design of integrated circuits and systems. According to Moore’s law, the number of elements in a manufacturable circuit doubles every 18 months. But the design productivity increases at a lower rate. The result is a design and verification gap. On the other hand, verification of circuits is becoming even more important as circuits are applied in a variety of systems concerned with security issues. Design verification means checking the compliance of the system with the textual specification. Traditionally this is done by simulation. A testbench is defined that contains typical operation scenarios of the system and also other scenarios of interest. These are simulated on the system-level description and the HDL description. The correctness of the output responses of the design is checked. The main disadvantage of this approach is the low coverage of input sequences and design states. For example, a design with only 100 flip-flops already has 2^100 states. Therefore, only a fraction of the states, not to mention the interaction between different states, can be checked using simulation. Often corner cases are not considered. The “Pentium bug” is a well-known example that escaped verification and eventually caused a huge financial loss and, even worse, damaged the image of the company. In contrast, property checking is complete in the sense that a proven property is valid in all states of the design and under any input sequence. Moreover, due to the maturity of verification tools, property checking becomes feasible for larger designs, and even complete industrial designs can be verified at the block level by property
Figure 5.1. Verification part of the design flow (elements: Textual specification, Interactive creation, Manual coding, Simulation traces, System level description, Simulation, Properties, Property check, Counterexamples, Manual refinement, Synthesizable description)
checking [WTSF04]. But the creation of properties is a manual task that is not supported by any software tools. The techniques proposed in this chapter provide an innovative approach to address both deficiencies explained above: the detection of gaps in testbenches and the creation of properties. Moreover, a completely new verification methodology is introduced that helps to speed up “design understanding”. In contrast to standard verification approaches, the new one is interactive and relies on the automatic generation of properties. The first part of the chapter introduces a technique to automatically derive properties from given simulation traces. The technique is based on pattern matching and is very efficient. The derived property is always valid with respect to the simulation trace and can be verified on the design afterward. If the property is not valid on the design, the counterexample explicitly shows a gap in the testbench, i.e. a sequence of input values or states that was not covered in the testbench. In this sense, the automatic generation of properties helps to set up a testbench in a traditional design environment. By this, the approach bridges the transition between the traditional design flow and the enhanced design flow proposed here. Additionally, a computer aided creation of properties becomes possible. This speeds up the time-consuming manual creation of formal properties also in the enhanced design flow. This technique has been published in [FD04]. The second part of the chapter builds on these techniques for an interactive generation of properties and – even more important – yields a new verification methodology for design verification. The tool for property generation is
used interactively to question the behavior of the design. For this purpose a set of signals and possibly a description of the scenario of interest is selected by the user. This is used to generate a property which is then compared to the expectation. This way the behavior of the design can be explored interactively. As a result, the user learns information about the design which is termed “design understanding” in the following. This methodology was introduced in [DF04].
5.1 Detecting Gaps in Testbenches
Formal verification methods guarantee completeness under any input sequence and in any state of the design. The compliance of a design with the specification is formally verified by model checking. Nonetheless, due to familiarity of designers with simulation and the fact that whole systems cannot be handled by property checking due to the complexity, simulation is still widely used to check the correctness of a system. For this, large testbenches for a system are created. Techniques to gather information about the reliability of the verification mostly rely on coverage metrics. Simulation-based approaches use monitors during simulation to determine the amount of coverage, e.g. statement coverage or line coverage. But even achieving 100% coverage with respect to a certain metric by simulation still cannot guarantee correctness. Here, a method is proposed to formally analyze a testbench and to check which parts of the functional behavior of a design are not tested. Therefore, this technique bridges the gap between the traditional simulation-based verification flow (see Figure 1.1) and the enhanced verification flow based on property checking (see Figures 1.2 and 5.1). As a result, the traditional techniques and the new techniques can be combined as shown in Figure 5.2. PropGen is the tool to generate properties from the testbench. While usually only the simulator is available to check the design by means of the testbench, PropGen is employed to analyze the testbench. An invalid property leads to a counterexample produced by the property checker. This counterexample exhibits the behavior that is not examined by the testbench, i.e. a gap in the set of stimuli provided by the testbench. This knowledge can be used, e.g. to extend the testbench. The integration of PropGen and the property checker – as indicated by the dashed shape in the figure – results in an easy-to-use push-button tool for analyzing a testbench. 
The user does not have to know about the underlying formal techniques. In summary, a crosscheck of testbench and design is established by this method. Additionally, a mechanism to generate more focused properties is provided. The generation of properties which show all dependencies between certain signals can lead to properties that are too general. By applying restrictions, PropGen can be guided to find properties for certain situations, e.g. a particular
Figure 5.2. Integration into the verification flow (elements: Textual specification, PropGen, Manual coding, Simulation traces, System level description, Simulation, Properties, Property check, Counterexamples, Manual refinement, Testbench, Synthesizable description, Extend testbench)
operating mode of the design. Experimental results show the efficiency of the approach: A property is generated from traces of more than 1 million clock cycles in at most 6 min but usually in less than 10 s. This section is structured as follows: The basic procedure for PropGen together with techniques for pruning the search space and finding useful properties is explained next. A heuristic to select “useful” properties from generated properties and a method to guide the search for properties are presented in Section 5.1.2. The application and experiments showing the efficiency of PropGen for large traces are given in Section 5.1.3.
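The crosscheck of testbench and design can be pictured with a toy model. The following Python sketch is illustrative only: the design function `next_s1` (a register stage with x2 acting as a synchronous clear) is a hypothetical assumption, `find_gaps` replaces the property checker by exhaustive enumeration of the inputs, and none of the names correspond to the actual PropGen interface.

```python
from itertools import product


def next_s1(x1, x2):
    # Hypothetical design (assumption): s1 loads x1 unless x2 clears it.
    return x1 & (1 - x2)


def simulate(x1_seq, x2_seq):
    """Produce a simulation trace of the toy design, starting with s1 = 0."""
    s1 = [0]
    for x1, x2 in zip(x1_seq, x2_seq):
        s1.append(next_s1(x1, x2))
    return {"x1": list(x1_seq), "x2": list(x2_seq), "s1": s1[:len(x1_seq)]}


def property_from_trace(trace):
    """All observed patterns (x2[t], x1[t], s1[t+1]); their sum is the property."""
    return {(trace["x2"][t], trace["x1"][t], trace["s1"][t + 1])
            for t in range(len(trace["x1"]) - 1)}


def find_gaps(patterns):
    """Stand-in for the property check: input values whose design behavior
    violates the generated property, i.e. stimuli missing in the testbench."""
    return [{"x2": x2, "x1": x1}
            for x2, x1 in product((0, 1), repeat=2)
            if (x2, x1, next_s1(x1, x2)) not in patterns]


# The testbench never activates x2, so the property does not hold on the design.
trace = simulate([1, 1, 0, 1, 0, 0], [0, 0, 0, 0, 0, 0])
gaps = find_gaps(property_from_trace(trace))

# Extending the testbench with the reported stimuli closes the gap.
extended = simulate([1, 1, 0, 1, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 0, 1, 1, 0])
remaining = find_gaps(property_from_trace(extended))
```

Here the counterexamples are exactly the input combinations with x2 = 1, which the first testbench never applies. A real property checker would reason over all reachable states rather than enumerate inputs of a single register stage.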
5.1.1 Generating Properties
In the following, properties expressed in terms of propositional logic are considered (see Section 2.3.2). The generation of properties is based upon pattern search in a simulation trace. A particular pattern in the trace shows a relationship between signals and, by this, indicates an underlying property. The property is generated by taking all patterns that occur in the trace into account. The basic procedure for deducing a property is given in Figure 5.3: Given is a tuple of signals I and a maximal window $t_{max}$ for the properties to be generated as well as a simulation trace $T = (U, (u_1, \ldots, u_{t_{cyc}-1}))$. In the property, a particular time step in the window is assigned to each signal; this time step is not given in advance. An assignment of time steps to signals is called time relation in the following. The iteration of all possible time relations R is the outer loop (Line 2). At the beginning nothing is known about the property, it is
79
Property Generation 1 2 3 4 5 6
PropGen(I, tmax , T ) foreach time relation R p(R) = 0 foreach time step 1 ≤ t < tcyc pat= getPattern(I,R,T ,t); addPattern(p(R), pat);
Figure 5.3. Sketch of the property generation
x1
x2
0
0
0
x2
x1
1
1
0
1
0
0
s1
s1
0
1
1
0
1
0
s2
s2
0
0
1
1
0
1
s3
0 u0
0 0 1 1 0 u1 u2 u3 u4 u5
s3 0
1
2
3
4
(a) Waveform
Figure 5.4.
5 t
0
0
0
(b) Vector representation of the waveforms
Simulation trace for the shift-register
initialized to the constant function 0. Then, at each time step of the trace the behavior of the signals is determined in terms of a pattern (Line 5) and included in the property (Line 6). The property for a particular time relation R is valid within the trace by construction because all occurring patterns are considered. A pattern is the vector that gives the values of signals at the time steps as determined by the time relation. The time relation R assigns to a signal sig ∈ I the time offset within the property R(sig). For a window starting at time t the value inserted for signal sig is sig[t + R(sig)]; thus, the pattern is determined by the trace. This assignment of values for a pattern is done by getPattern. Then, the behavior reflected by the pattern is included in the property by addPattern. This is achieved by rewriting the pattern as a conjunction of literals of the variables in I at the time steps determined by R. For a value of 0 in the pattern the negative literal is used, for the value 1 the positive literal is used. This cube determines one valid assignment to the signals, the sum of all these cubes leads to the property p(R). Example 20. Again, the simulation trace that was introduced in Example 11 on Page 30 is considered. For convenience the waveforms and the trace are repeated in Figure 5.4 which shows the shift-register that produces this trace.
Let the tuple of signals $I$ be $(x_2, x_1, s_1)$, and let $R(x_2) = R(x_1) = 0$ and $R(s_1) = 1$. Now, for each time step $t$ the pattern is given by:

$(\nu_{x_2}[t], \nu_{x_1}[t], \nu_{s_1}[t+1])$

At time steps 0, 1, and 2 a pattern is found, each of which leads to a cube:

0) $(0, 1, 1) \rightarrow \overline{x_2}[t] \cdot x_1[t] \cdot s_1[t+1]$
1) $(0, 1, 1) \rightarrow \overline{x_2}[t] \cdot x_1[t] \cdot s_1[t+1]$
2) $(0, 0, 0) \rightarrow \overline{x_2}[t] \cdot \overline{x_1}[t] \cdot \overline{s_1}[t+1]$

No other patterns are found at later time steps. The resulting property is the sum of the cubes, i.e.

$p(R) = \overline{x_2}[t] \cdot x_1[t] \cdot s_1[t+1] + \overline{x_2}[t] \cdot \overline{x_1}[t] \cdot \overline{s_1}[t+1]$.

The number of time relations is large since each of the signals can be assigned to any time step from 0 to $t_{max} - 1$. This leads to $(t_{max})^{|I|}$ time relations. But the search space can be pruned by using the following rules:

1. At least one time reference must be zero; otherwise the same time relation is considered more than once with a constant offset.
2. No signal is considered twice at the same time step. If a signal occurs more than once in $I$, different time steps are assigned to the different instantiations of the signal.
3. An input is never considered at the last time step $t_{max} - 1$ in the time relation. The input value has no influence on an observed state bit or output value if it occurs at the last time step of the window.

Another observation helps to further reduce the search space. Given $|I|$, at most $2^{|I|}$ patterns can occur with respect to a particular time relation. If all possible patterns occur, the sum of the cubes returns the constant function 1 as a property, i.e. a property that is always valid or a “trivial property”. Thus, this time relation does not lead to a useful property and further scanning is skipped.

Currently, the algorithm considers only one time relation for property generation. As a result, no property that includes several time relations can be generated. Such a property arises for existential quantification: in the propositional property, this breaks down to a disjunction of several time relations.

The resulting property itself is represented by a BDD (see Section 2.1.2). This introduces some abstraction from the cube representation, e.g. don’t cares are easily determined. Because $|I|$, the number of signals considered, is relatively small, BDDs are suitable to represent the property.
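The procedure of Figure 5.3 and the three pruning rules can be illustrated by a short Python sketch. It is not the implementation of PropGen: the names `time_relations` and `prop_gen`, the dictionary-based trace format, and the scan starting at time step 0 are assumptions made for this illustration, and a plain set of observed patterns stands in for the BDD that represents the sum of cubes. The trace is the one of Figure 5.4.

```python
from itertools import product

def time_relations(signals, inputs, t_max):
    """Yield one offset per signal instance, skipping relations excluded by the pruning rules."""
    for offsets in product(range(t_max), repeat=len(signals)):
        if min(offsets) != 0:
            continue  # rule 1: at least one time reference must be zero
        instances = list(zip(signals, offsets))
        if len(set(instances)) != len(instances):
            continue  # rule 2: no signal twice at the same time step
        if any(sig in inputs and off == t_max - 1 for sig, off in instances):
            continue  # rule 3: no input at the last time step of the window
        yield offsets

def prop_gen(signals, offsets, trace):
    """Return the set of patterns (one cube each) observed for one time relation."""
    t_cyc = len(trace[signals[0]])
    patterns = set()
    for t in range(t_cyc - max(offsets)):  # keep the whole window inside the trace
        pat = tuple(trace[sig][t + off] for sig, off in zip(signals, offsets))  # getPattern
        patterns.add(pat)                                                      # addPattern
        if len(patterns) == 2 ** len(signals):
            break  # all patterns occurred: trivial property, stop scanning
    return patterns

# Trace of Figure 5.4, restricted to the signals used in Example 20.
trace = {
    "x2": [0, 0, 0, 0, 0, 0],
    "x1": [1, 1, 0, 1, 0, 0],
    "s1": [0, 1, 1, 0, 1, 0],
}
signals = ("x2", "x1", "s1")
relations = list(time_relations(signals, inputs={"x2", "x1"}, t_max=2))
example_20 = prop_gen(signals, (0, 0, 1), trace)
print(relations)
print(sorted(example_20))
```

With $t_{max} = 2$ only the relations (0, 0, 0) and (0, 0, 1) survive the pruning, and for the latter the sketch collects the two patterns (0, 0, 0) and (0, 1, 1) of Example 20.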
5.1.2 Selection of Properties
This section explains how “useful” properties are chosen, how the generation of properties can be guided and how property completion with respect to the design may be applied.
5.1.2.1 Choosing a Useful Property

For each time relation that is not pruned by the rules shown above, a valid property is generated. Then it has to be decided which of this large number of properties are “useful”. Obviously, this cannot be done fully automatically, but some help can be provided. As stated at the end of the last section, a property that is trivially true, i.e. equal to constant 1, is of no use. Also, if the relation between some signals in time is determined by the underlying circuit, the number of occurring patterns is usually small compared to $2^{|I|}$. In the other case, if the relation of the signals is not determined by the circuit, the values in the patterns appear to be randomly distributed and thus the number of occurring patterns is close to $2^{|I|}$.

Example 21. Once more consider the shift-register given in Figure 5.5, $I = (x_2, x_1, s_1)$ and a trace reflecting any state and any input sequence for the shift-register. For the time relation given in Example 20 the following holds:

“If $x_2[t] = 0$, then $s_1[t+1]$ is equal to $x_1[t]$.” This breaks down to two cubes representing $\overline{x_2}[t] \cdot (x_1[t] \leftrightarrow s_1[t+1])$.

“If $x_2[t] = 1$, then $s_1[t+1]$ and $x_1[t]$ are independent,” leading to the cube $x_2[t]$.

This corresponds to six patterns in total. Now, consider a time relation where the value of $s_1$ is taken at a time step greater than 1. In this case the value cannot be predicted from the other two values. Therefore, all patterns occur and the property becomes the constant function 1, i.e. trivially true.

Figure 5.5. 1-bit-shift-register (inputs $x_2$, $x_1$; state bits $s_1$, $s_2$, $s_3$; output $y_1$)

This observation can be used to order the properties generated from the trace. The resulting properties are ordered by increasing number of different patterns that were observed. The property with the least number of patterns has the highest ranking. This ranking is used to decide about the “usefulness” of properties. The ranking also helps to prune evaluations of other time relations. Only a limited number of properties with the least number of patterns is retained. As a consequence, the order in which different time relations are evaluated influences the time needed to generate properties. Further scanning of the trace can be skipped as soon as the number of patterns observed for a time relation exceeds the previously determined limit. Thus, finding the time relation that corresponds to the smallest number of patterns early makes the evaluation more efficient.
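The ranking and the early abort of the trace scan can be sketched as follows. This is a hedged illustration rather than the tool's implementation: `simulate`, `scan` and `rank` are hypothetical helper names, the random trace of the first stage of the shift-register is an assumption made for the example, and only the relations with the smallest pattern count are retained.

```python
import random

def simulate(n_cycles, seed=0):
    """Random trace of the first shift-register stage: x2 = 0 shifts x1 into s1, x2 = 1 keeps s1."""
    rng = random.Random(seed)
    trace = {"x2": [], "x1": [], "s1": []}
    s1 = 0
    for _ in range(n_cycles):
        x2, x1 = rng.randint(0, 1), rng.randint(0, 1)
        trace["x2"].append(x2)
        trace["x1"].append(x1)
        trace["s1"].append(s1)
        s1 = x1 if x2 == 0 else s1
    return trace

def scan(signals, offsets, trace, limit):
    """Count distinct patterns; return None as soon as more than `limit` were observed."""
    seen = set()
    for t in range(len(trace[signals[0]]) - max(offsets)):
        seen.add(tuple(trace[sig][t + off] for sig, off in zip(signals, offsets)))
        if len(seen) > limit:
            return None  # cannot beat the best relation found so far
    return len(seen)

def rank(signals, relations, trace):
    """Return (pattern count, relation) pairs with the smallest pattern count."""
    limit = 2 ** len(signals) - 1  # a relation showing all patterns is trivial
    best = []
    for offsets in relations:
        count = scan(signals, offsets, trace, limit)
        if count is None:
            continue
        if count < limit:
            best = []      # strictly better: drop what was retained before
            limit = count
        best.append((count, offsets))
    return best

trace = simulate(2000)
ranking = rank(("x2", "x1", "s1"), [(0, 0, 0), (0, 0, 1), (0, 0, 2)], trace)
print(ranking)
```

On this trace the relation (0, 0, 1) of Example 21 is retained with six patterns, while the scans for the other two relations are aborted once they exceed the current limit.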
5.1.2.2 Guided Property Generation

When being confronted with a large design, more focused properties can be useful. This can be formulated as an assumption to restrict the property generation. In case of the shift-register, there are two different modes of operation: either $x_2 = 0$, i.e. the register shifts at each clock cycle, or $x_2 = 1$, i.e. the register keeps the present state. So far the method only allows generating properties for any relation between the signals. Often a property focused on a certain mode of operation is more desirable. This focusing can be done by extending the property with an assumption. The assumption restricts some signals in $I$ to a certain value, to the value of another signal in $I$, or to a particular time step within the window. Only a pattern that does not violate the assumption is included in the property. The assumption can also be rewritten as a propositional formula $a$. Thus, the resulting property becomes $P(R) = (a \rightarrow p(R))$, where $p(R)$ is generated as above but only from patterns fulfilling the assumption.

Example 22. Assume that in case of the shift-register the operating mode for shifting is particularly interesting. Therefore, the assumption $a = (x_2[t] \leftrightarrow 0)$ is used. In this case only those cubes are collected where $x_1[t] \leftrightarrow s_1[t+1]$ holds. As a result, the property $p$ is generated:

$p = (x_2[t] \leftrightarrow 0) \rightarrow (x_1[t] \leftrightarrow s_1[t+1])$

Currently, simple assumptions are allowed, e.g. the restriction of a signal to a certain value or to the value of another signal. Also, a signal can be restricted to be considered only at a particular time step within the window of the property. More complex constructs can easily be allowed by extending the input language used for assumptions. Checking whether an assumption holds breaks down to fast pattern matching on the given trace.
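Example 22 can be replayed with a small Python sketch. The helper names `guided_patterns` and `guided_property`, the representation of the assumption as a predicate on patterns, and the hand-made trace covering both operating modes are assumptions of this illustration, not part of PropGen.

```python
from itertools import product

def guided_patterns(signals, offsets, trace, assumption):
    """Collect only those patterns that do not violate the assumption."""
    seen = set()
    for t in range(len(trace[signals[0]]) - max(offsets)):
        pat = tuple(trace[sig][t + off] for sig, off in zip(signals, offsets))
        if assumption(pat):
            seen.add(pat)
    return seen

def guided_property(patterns, assumption):
    """P(R) = (a -> p(R)), expressed as a predicate over patterns."""
    return lambda pat: (not assumption(pat)) or pat in patterns

# Hand-made trace of the first shift-register stage that covers both modes.
trace = {
    "x2": [0, 1, 1, 0, 0, 1, 0, 0],
    "x1": [1, 0, 1, 0, 1, 1, 0, 0],
    "s1": [0, 1, 1, 1, 0, 1, 1, 0],
}
signals, offsets = ("x2", "x1", "s1"), (0, 0, 1)

def shifting(pat):
    """Assumption a = (x2[t] <-> 0)."""
    return pat[0] == 0

p = guided_patterns(signals, offsets, trace, shifting)
P = guided_property(p, shifting)

# Compare P with (x2[t] <-> 0) -> (x1[t] <-> s1[t+1]) on all eight assignments.
agrees = all(P((x2, x1, s1n)) == (x2 == 1 or x1 == s1n)
             for x2, x1, s1n in product((0, 1), repeat=3))
print(sorted(p), agrees)
```

Only the two cubes with $x_1[t] \leftrightarrow s_1[t+1]$ are collected, and the implication built from them agrees with the property of Example 22 on every assignment.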
5.1.2.3 Property Completion

In cases where a large number of signals or states is considered, the given simulation trace cannot cover the complete behavior of the design. In this case the property that is extracted from the trace is not valid within the design. But formal techniques can be applied to complete this invalid property and retrieve a valid one. For this purpose, the engine that is used to prove properties on the design must be able to find all counterexamples if the property is invalid. This is true in case of BDDs or when an all-solutions SAT solver is used. Each counterexample is a pattern that was not contained in the trace. The counterexample is included in the property so that the property becomes valid. When the added counterexamples are shown to the designer, this provides feedback about behavior that was not covered by the simulation trace. The aim of understanding the design benefits from this feedback.
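The completion step can be illustrated on the property of Example 20. In the following sketch a brute-force enumeration of the first shift-register stage stands in for the BDD-based engine or the all-solutions SAT solver; the function names are hypothetical, and it is assumed that every value of $s_1$ is reachable.

```python
from itertools import product

def design_patterns():
    """All patterns (x2[t], x1[t], s1[t+1]) that the first shift-register stage can produce."""
    patterns = set()
    for x2, x1, s1 in product((0, 1), repeat=3):
        s1_next = x1 if x2 == 0 else s1  # shift for x2 = 0, keep for x2 = 1
        patterns.add((x2, x1, s1_next))
    return patterns

def complete(trace_patterns, reachable):
    """Return the counterexamples and the completed (valid) property."""
    counterexamples = reachable - trace_patterns
    return counterexamples, trace_patterns | counterexamples

trace_patterns = {(0, 1, 1), (0, 0, 0)}  # property deduced in Example 20
counterexamples, completed = complete(trace_patterns, design_patterns())
print(sorted(counterexamples))
```

The four added patterns all have $x_2 = 1$, which tells the designer that the trace of Figure 5.4 never exercised the mode in which the register keeps its state.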
5.1.3 Experimental Results
The method is suitable for a tight integration of simulation-based verification methods with formal proof techniques. Given a trace and a set of signals, a number of properties is generated. Each of these properties is valid on the trace. If the whole testbench is used as the trace, but the resulting property is not valid for the design, a portion of the design has not been tested yet. In the following, this is shown by way of example for the small shift-register from Section 2.2. Then, results for increasing trace lengths are shown in detail for one larger benchmark. Finally, sequential benchmarks from LGSynth93 are evaluated.

All experiments were carried out on an Athlon XP 2200+ system with 512 MB of memory running Linux. Initially, the properties were represented by cubes; these were then converted into BDDs. A simple BDD-based bounded model checker based on CUDD [Som01a] was used to check the validity of the resulting properties. The initial representation of properties by cubes is suitable because the number of patterns cannot exceed $2^{|I|}$, where the number of signals $|I|$ is small. Only the property with the fewest patterns was retained, as explained in Section 5.1.2 above.
5.1.3.1 Case Study: Shift-Register

Again, the shift-register shown in Figure 5.5 is considered. The signal tuple $I = (x_2, x_1, s_1, s_1)$ was used. The expected property has to reflect the operating modes “shift $x_1$ into $s_1$” and “keep the value of $s_1$”, i.e. in terms of a propositional formula the property should be:

$p = \overline{x_2}[t] \cdot (x_1[t] \leftrightarrow s_1[t+1]) + x_2[t] \cdot (s_1[t] \leftrightarrow s_1[t+1])$

Property generation was carried out for a randomly generated trace of length 30. At first only 5 time steps, then 10, 20 and finally the whole trace were considered. For these cases the generated property became increasingly better as the trace covered more and more of the functionality. For all trace lengths the correct time relation was found, where $x_1$ and $x_2$ were picked at time step 0 and the two instantiations of $s_1$ at time steps 0 and 1.

$t_{cyc} = 5$: Only the shifting mode ($x_2 = 0$) was covered, for the case $x_1 = 0$ and $s_1 = 0$:

$p_5 = \overline{x_2}[t] \cdot \overline{x_1}[t] \cdot \overline{s_1}[t] \cdot \overline{s_1}[t+1]$

$t_{cyc} = 10$: Both operation modes were covered but only for certain values of $x_1$ and $s_1$:

$p_{10} = \overline{x_2}[t] \cdot (x_1[t] \cdot s_1[t+1] + x_1[t] \cdot s_1[t+1] \cdot s_1[t]) + x_2[t] \cdot x_1[t] \cdot s_1[t] \cdot s_1[t+1]$

$t_{cyc} = 20$: The shifting mode was covered completely, the nonshifting mode only for certain values of $x_1$ and $s_1$:

$p_{20} = \overline{x_2}[t] \cdot (x_1[t] \leftrightarrow s_1[t+1]) + x_2[t] \cdot (x_1[t] \cdot s_1[t] \cdot s_1[t+1] + s_1[t] \cdot s_1[t+1])$

$t_{cyc} = 30$: Both modes were covered completely, yielding the property as given above.

This experiment shows how the amount of coverage achieved by the testbench is reflected by the property.
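The effect of the trace length can be reproduced qualitatively with a short sketch. The random trace below is an assumption of this illustration and is not the trace used in the experiment, so the pattern counts per prefix will differ from those behind $p_5$, $p_{10}$ and $p_{20}$; the helper names are hypothetical.

```python
import random

def simulate(n_cycles, seed=1):
    """Random trace of the first shift-register stage (x2 = 0 shifts, x2 = 1 keeps)."""
    rng = random.Random(seed)
    trace = {"x2": [], "x1": [], "s1": []}
    s1 = 0
    for _ in range(n_cycles):
        x2, x1 = rng.randint(0, 1), rng.randint(0, 1)
        trace["x2"].append(x2)
        trace["x1"].append(x1)
        trace["s1"].append(s1)
        s1 = x1 if x2 == 0 else s1
    return trace

def patterns(trace, n_cycles):
    """Patterns (x2[t], x1[t], s1[t], s1[t+1]) found in the first n_cycles time steps."""
    return {(trace["x2"][t], trace["x1"][t], trace["s1"][t], trace["s1"][t + 1])
            for t in range(n_cycles - 1)}

def expected(pat):
    """Expected property: shift x1 into s1 for x2 = 0, keep s1 for x2 = 1."""
    x2, x1, s1, s1_next = pat
    return (x2 == 0 and x1 == s1_next) or (x2 == 1 and s1 == s1_next)

trace = simulate(30)
prefixes = (5, 10, 20, 30)
covered = [patterns(trace, n) for n in prefixes]
for n, pats in zip(prefixes, covered):
    print(n, len(pats), "of 8 patterns of the expected property")
```

Each longer prefix can only add patterns, and every pattern found satisfies the expected property, which has eight satisfying assignments in total.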
5.1.3.2 Benchmarks

Figure 5.6 shows results for misex3. This circuit has 14 inputs and 14 outputs. Circuit misex3 is combinational, but property generation still has to figure out the correct time relation. The diagram gives the number of runs resulting in a valid property for traces of different lengths. For each trace length 50 runs were carried out. As expected, the number of runs resulting in a valid property increases with the trace length because a better functional coverage is achieved.

The time for property generation is very moderate, as Figure 5.7 shows. The figure shows results for a decreasing order of time relations and for an increasing order of time relations. This order influences the time needed for the overall algorithm, as explained in Section 5.1.2. For the combinational circuit an increasing order of time relations is more efficient. Because the relation between signals has a length of only two time steps, the corresponding time relations are found early when the increasing order is applied. Up to 40,000 cycles, invalid properties were often generated. In this case longer traces lead to better pruning of time relations (the ones that have more patterns than previous ones). From 40,000 cycles onward, mostly a valid property was generated. From this point, additional cycles in the trace do not lead to more pruning but to a linear increase of the time needed for scanning the trace.
Figure 5.6. Runs resulting in a valid property for misex3 (x-axis: $t_{cyc}$ from 0 to 200,000; y-axis: #runs from 0 to 50)

Figure 5.7. Time needed for property generation for misex3 (x-axis: $t_{cyc}$ from 0 to 200,000; y-axis: propGen time in seconds from 0 to 1.6; curves: increasing order, decreasing order)
Table 5.1 shows results for sequential benchmarks. For each circuit five runs were carried out. The parameter $t_{max}$ was statically set to 4. For each run a random trace of 1 million clock cycles was generated and 7 signals were randomly chosen for $I$. The whole process of producing the random trace, generating the property and model checking was limited to 15 min. The time for property generation is shown for each run. The letter preceding the run time denotes the result returned by the BDD-based model checker, i.e. whether the property was valid (v), invalid (i), trivial (1) because all patterns occurred, or left undecided (u) because the proof engine exceeded the time limit or memory limit.

Table 5.1. Sequential benchmarks, $t_{cyc}$ = 1,000,000

Circuit    Run 1      Run 2      Run 3      Run 4      Run 5
daio       v    1.03  v    0.66  v    0.95  v    4.10  v    0.96
gcd        u  164.39  u  303.57  u  331.01  1   22.45  u  167.84
mm4a       v    3.12  v    1.64  1    6.15  v    3.24  v    2.83
mm9a       u   86.25  1    1.30  u   72.92  1   26.35  1    7.31
mm9b       u   31.91  u  131.40  u    2.44  1   48.90  1   21.56
mult16a    1    0.32  1    0.08  v    1.43  1    0.13  1    0.09
mult16b    1    0.31  1    2.32  1    0.69  1    0.39  1    1.09
phase d.   u   22.65  u    2.44  u  103.00  u    1.83  u  133.24
s1196      u    7.32  u   85.81  u   71.55  u   40.11  u   11.91
s1238      u   69.62  u    2.44  u   39.85  u   55.42  u   81.04
s1423      u  161.10  u   41.95  u  216.33  u  153.03  u    29.7
s344       1    8.31  v    2.39  1    1.53  1    2.82  v    4.26
s349       v    3.02  v    2.26  1    2.69  1    1.32  1    1.55
s382       1    0.78  1    1.94  1    0.59  1    0.49  1    1.28
s400       1    0.87  1    1.09  1    1.98  1   10.81  1    0.47
s420.1     1    1.87  1   34.88  v   44.71  v   24.96  1    7.23
s444       1   31.02  1    7.04  1    1.85  1   47.36  1   76.04
s526       1    1.62  1    1.60  1    3.18  1    1.07  1    0.93
s526n      1    6.24  1  135.65  1    1.52  1    3.55  1   53.39
s641       u   35.29  u   25.85  u   88.86  u   53.93  u    7.46
s713       u   78.50  u    1.51  1    2.80  u    0.88  u  118.19
s838.1     1    8.48  1   83.71  1    2.46  1  290.97  1    1.15
s838       1  416.28  v    1.32  1  291.80  v    1.25  v    1.21
s953       v    8.01  v    8.22  v   11.78  v    8.64  v    4.43
traffic    v    1.62  v    1.85  1    1.32  v    1.65  v    4.18

Even for the large number of 1 million clock cycles to be scanned for properties, at most 420 s (s838) are needed. Very often the run time is below 10 s. In particular, runs resulting in valid properties are very fast, e.g. runs 2, 4, and 5 for s838 are much shorter than runs 1 and 3, which result in trivial properties. This allows the tool to be used on testbenches for large designs. So far mainly quantitative studies were carried out with respect to the run time of the algorithm and the number of properties that were produced. The quality of the resulting properties is further examined in Section 5.2.4.2 in the context of design understanding.
5.2 Design Understanding
The technique to automatically generate properties from traces, and its capability to bridge the gap between the traditional verification flow and the enhanced verification flow based on formal methods, have been introduced in the previous section. Now, we exploit these techniques as a basis for a new verification methodology. As already explained, the classical approach to verification is based on simulation, but creating large testbenches and manually coding monitors is very time consuming and error prone. Setting up properties for formal verification is a time-consuming manual process as well. Moreover, all these techniques can only be applied if a formal description of the circuit exists, either on the behavioral level or on the RTL. But with increasing design complexity it becomes more and more important to get an understanding of the design, i.e. to check whether the implemented formal model corresponds to the intention and ideas of the person who wrote the initial specification (usually in form of a textbook). This specification is commonly given in natural language and may therefore contain inconsistencies, imprecise descriptions or even contradictory requirements.

In this section, a new approach is presented that is based on formal techniques. In contrast to other formal approaches, the goal is not to prove the correctness of given formulas or properties but to automatically generate properties. These properties are shown to the designer, who thus gets feedback about the functional behavior of the system and can “discuss” with the tool. This method focuses on design understanding. It can be applied as soon as a specification that produces cycle accurate traces is available. Thereby, the method targets design verification instead of implementation verification, in contrast to most other tools. This difference will be explained in more detail below.
The section is structured as follows: In Section 5.2.1, we describe the underlying ideas and the methodology in more detail. First, the classical design methodology is briefly reviewed and the resulting verification approaches are discussed. Then, the new technique is presented and advantages and disadvantages are discussed. The work flow to apply the proposed methodology using PropGen is explained in Section 5.2.3. In Section 5.2.4, experimental results are reported.
5.2.1 Methodology
In this section, we first briefly review the classical design flow and resulting implications for verification techniques. Then, the new methodology is introduced. The integration into the design flow is described and resulting benefits are discussed.
5.2.1.1 Classical Verification

Still, verification is carried out at the end of the design process, as shown in Figure 5.8, which covers the verification part of the standard design flow from Figure 1.1. To illustrate the deficiencies, the process is briefly reviewed:

1. The initial idea is written down in a textual specification. Even though the specification might contain some formal parts, it is usually given in natural language. This specification is then handed to the design team.
2. The textual specification is formalized and used to build a system model. This can be done on a behavioral level or on the RTL. Usually a common programming language like C or C++ is used to implement the system model.
3. The model is then coded in HDL. This HDL model is built according to the textual specification.
4. The HDL model is checked against the system model by means of testbenches (as indicated by bold arrows) or against the specification by means of property checking (in a more advanced flow).
5. The HDL model is synthesized.

Figure 5.8. Current verification methodology (blocks: textual specification, manual setup, manual coding, testbench, simulation, system level description, counterexamples, detection of a bug or inconsistency, manual fault diagnosis, synthesizable description, synthesis)

Following the usual notation, the verification of the specification is addressed as design verification, while implementation verification covers the steps from the first formal description down to the final layout (including various stages of equivalence checking, etc.). Design verification is only addressed in Step 4.

Remark 7. Since the main focus of this section is on the verification of the design entry, all verification issues related to design implementation, e.g. verifying the correctness of the synthesis process by equivalence checking, are not further considered.

Moreover, in a large design project frequent changes of the specification may occur. These changes are incorporated into the design by repeating the steps shown above. This process leads to a late detection of failures within the design process. When a failure is detected, even modifying the specification can be necessary, as shown in Figure 5.8. This causes long delays during the design process.
5.2.1.2 New Approach

Here, the incorporation of techniques for formal verification into earlier stages of the process is proposed. As soon as first blocks can be simulated at a cycle accurate level, properties should be automatically deduced. This helps to gain insight and offers a different view of the design. Thereby, conceptual errors as well as coding errors can be detected earlier. Also, the deduced properties can be used as a starting point for formal verification, which reduces the time needed to set up the verification environment.

The idea is to provide the designer with a tool that allows gathering more insight into his or her own design. For an overview see Figure 5.9. This figure shows the procedure for design verification in the enhanced design flow. Usually, the system model already contains the cycle accurate I/O-behavior of most blocks. Therefore, as soon as first portions of the design can be simulated at the accuracy of clock cycles, formal properties are derived from the given description, i.e. from the system model or the HDL description. These properties exhibit some behavior of the design. The designer or a verification engineer then has to decide whether the property is correct or not. This means that the compliance of the property with the textual specification has to be checked. If the property is found to be valid, it can be used as a starting point for formal verification. If the property is incorrect, either the given simulation trace does not show all behaviors of the given block or the block is erroneous.

Figure 5.9. Proposed methodology (blocks: textual specification, interactive creation, manual coding, simulation traces, system level description, simulation, properties, property check, counterexamples, manual refinement, synthesizable description)

As a result, a direct feedback between textual specification, system model, HDL description, and verification is established. This feedback between the different design stages helps to improve design quality. Instead of only assuming a property, the designer can explicitly search for a property and check the compliance with the specification. Moreover, pulling verification methods into the earlier stages of the design process enables an early detection of design errors. A mismatch between specification and HDL model is usually only detected during verification. But the proposed method unveils this mismatch already while coding a block or even earlier while interactively creating properties. The iteration becomes superfluous.

Still, the verification step cannot be discarded. But now, similar to the setup of testbenches, properties are already developed during coding or even when only the system model is given. As soon as a trace is guaranteed to exhibit all important behavior, the corresponding stimuli become part of the testbench. The same holds for deduced properties. These are added to the property suite for later formal verification. A starting point for formal verification is created during coding already and time is saved. The enhanced design flow proposed in this book makes use of this new interactive verification methodology.
5.2.2 Comparison to Other Techniques
The presented methodology sheds light on a different aspect of the design than other techniques do. At a similarly early stage of the circuit design phase, usually only lint checking or assertion checking is applied. But in case of linting only general properties are checked that guarantee, e.g., correct handling of arrays. Using assertions, semantic checks can be carried out by means of powerful properties [FKL03]. But in this case the designer has to write the assertions himself/herself, i.e. an assumption about the design is formulated as a corresponding property. The proposed method allows retrieving insight from a simulation trace. No further knowledge about the design is needed. In the software domain, a similar methodology has been presented in [NE01]. In that case, Java programs were considered and execution traces were searched for a set of predefined invariants. These invariants were statically checked afterward. Here, the property detection is more general and aims at the verification of hardware.
5.2.3 Work Flow
The work flow to apply property deduction is shown in Figure 5.10. This is a refinement of the interactive creation of properties in the proposed enhanced design flow as shown in Figures 1.2 and 5.9. As a starting point, the design and simulation traces are available. Then a tuple of signals is selected that are to be related to each other. This tuple is handed to the automatic property deduction and a property is retrieved that is valid on the trace. The property is passed to a proof engine that returns validity or invalidity of the property. All information is provided to the designer, who decides whether the property is to be accepted or rejected. In case of acceptance, the property is added to the property suite. If necessary, corrections to the design are made. For this purpose the debugging techniques proposed in Section 6.3 can be applied. If the property is rejected, another deduced property with lower ranking (see Section 5.1.2) can be considered or a different tuple of signals can be chosen.

Figure 5.10. Application of property deduction (blocks: design description, signals, simulation trace, PropGen, property, property checker with result valid/invalid, designer decision accept/reject)

The techniques that were introduced in Section 5.1.2 aid the designer. The completion of a property exhibits behavior that is not covered by the simulation traces and may reveal corner cases. Using assumptions to guide the property generation helps to concentrate on particular modes of operation.

Remark 8. In some cases, it can be more instructive to review the property before the result of the proof engine is known. This allows considering the property itself without being influenced by the verification result.

As opposed to looking at the simulation trace, considering the property allows focusing on the functional relation between the signals more easily. Altogether this establishes an interactive process that offers different views of the design. Such a new perspective is an opportunity to increase the insight and to understand the design.
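The interactive loop of Figure 5.10 can be summarized in a few lines. The sketch below only mirrors the control flow described above; the function `review` and the callbacks `check` and `decide` are hypothetical stand-ins for the proof engine and for the designer's decision.

```python
def review(ranked_properties, check, decide):
    """Present deduced properties in ranking order until the designer accepts one."""
    for prop in ranked_properties:
        verdict = check(prop)         # proof engine: "valid" or "invalid"
        if decide(prop, verdict):     # designer: accept (True) or reject (False)
            return prop, verdict      # accepted: goes into the property suite
    return None                       # all rejected: choose a different tuple of signals

# Stand-ins for the proof engine and the designer (assumptions of this sketch).
valid_props = {"p_b"}

def check(prop):
    return "valid" if prop in valid_props else "invalid"

def decide(prop, verdict):
    return verdict == "valid"

property_suite = []
accepted = review(["p_a", "p_b", "p_c"], check, decide)
if accepted is not None:
    property_suite.append(accepted[0])
print(property_suite)
```

In this toy run the designer simply accepts the first valid property; in practice an invalid property may also be accepted when it matches the specification and the design has to be corrected instead.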
5.2.4 Experimental Results
Two types of experiments are carried out in the following. At first the property generation is evaluated by using the LGSynth93 benchmark set. Then, a case study shows the benefits of property generation while implementing an arbiter in more detail.
5.2.4.1 Benchmarks

The focus of this study is to give some information about the relation between deduced properties and the design. For this purpose, the same sequential circuits from the LGSynth93 benchmark set as in Section 5.1.3 were considered. Previously, only the validity, run time and number of properties were considered. Here, more data is presented to give an idea of the quality of the properties. For each of the circuits the results of three runs are shown in Table 5.2. In all cases, traces of 100,000 clock cycles were considered. The tuple of seven signals was chosen randomly and $t_{max}$ was set to 4. Again, a simple BDD-based proof engine was used to decide the validity of a property. This proof engine was tightly integrated with the property generation algorithm.
Table 5.2. Sequential benchmarks, $t_{cyc}$ = 100,000

           Run 1                          Run 2                          Run 3
Circuit    #Rel    Time Res. #Pat #Diff   #Rel    Time Res. #Pat #Diff   #Rel    Time Res. #Pat #Diff
daio          1    0.42  v     18     –      1    0.21  v     24     –      3    0.36  v     22     –
gcd           1    8.11  u     47     –      1   14.08  u     64     –     12   31.89  u     71     –
mm4a          2    1.93  i     91     9      1    1.32  v     56     –      2    0.53  v     50     –
mm9a          1    4.86  u    106     –      3   14.49  u    117     –      1    1.11  u     78     –
mm9b          2   11.26  u    105     –     18    1.95  u     56     –      6    1.31  u     56     –
mult16a       0    0.08  1            –      0    0.27  1            –      0     0.5  1            –
mult16b      81    5.96  v     96     –      0    2.25  1            –      0    2.03  1            –
phase d.      2    5.81  u     10     –    109    8.05  u      6     –    115    8.97  u      6     –
s1196         1   62.07  u     70     –      5   50.42  u     57     –      1    1.79  v     49     –
s1238         3    2.53  u     54     –      2    1.96  u     66     –      1    4.05  i     94    18
s1423         4   27.36  u     79     –      1   10.93  u    106     –     10   19.42  u     34     –
s344         18    5.89  i    127     1    186   10.64  v     48     –     17    1.45  i     60    60
s349          0    0.12  1            –      1    4.22  i    108    20     27    5.38  i     65    15
s382         72   19.15  i     17    63     40   16.55  i      7    49     48    3.37  i      7    24
s400         17     0.9  i      6    54     18    1.45  i     18    48     18    1.53  i      5    32
s420.1        0    0.23  1            –      0    0.26  1            –      0    4.53  1            –
s444         26   11.24  i     16   112     10    1.43  i      4    40    234    35.6  i      5    25
s526         16  117.81  i      5    91      2   35.32  i     22   106     38    4.23  i     15    51
s526n         9    1.78  i      5    91     71    6.88  i      7   121     22    1.95  i     14    58
s641        653    49.5  u     32     –      0    1.78  1            –      2   24.65  u     87     –
s713        453   35.76  u     32     –      5    7.96  u     94     –      3   25.38  u     78     –
s838.1      551   41.34  i     32    96     96    7.58  i     16   112    308   38.97  i     64    64
s838        695   46.95  v     16     –    116    7.59  v      4     –     83    7.88  v      4     –
s953         74   15.82  v     33     –      5    2.34  v     14     –      7    4.16  v     36     –
traffic       3    0.96  v      7     –     17    2.73  v     40     –      2    0.66  i     29     7
The following data is reported for each run: Column #Rel gives the number of time relations with a minimum number of patterns, i.e. the number of properties with the highest ranking according to Section 5.1.2. The time in CPU seconds needed to deduce properties is shown in column Time (AMD Athlon XP 2200+, 512 MB). The next three columns give information about the first property of those with highest ranking. Column Res. states the result returned by the property checker, denoted by a letter (v = valid, i = invalid, u = undecided, 1 = all patterns occurred). Column #Pat shows the number of patterns included in this property. Finally, column #Diff gives the number of patterns added in order to turn an invalid property into a valid one. A trivial property was not further considered.

As can be seen from the table, the number of time relations with a high ranking property is rather small in most cases. This strongly underlines the feasibility of the interactive application of the tool by going through the highest ranked properties. The large number of properties that were left undecided is due to the proof engine that was based on BDDs only. Instructive is the number of patterns that were added to invalid properties. This shows that in some cases the simulation only covered a small fraction of the total behavior of the signals. When a large number of properties with high ranking occurs, the number of added patterns is an additional hint for selecting useful properties.

Note that the experiments in this section are a worst case scenario for property deduction: The tuple of signals is chosen randomly. The window length is fixed. The simulation trace was generated by random stimuli. In the usual application scenario, a designer would choose a set of appropriate signals and a window length. Often stimuli provided by a testbench could be used for property deduction. A more realistic scenario is shown in the following case study.
5.2.4.2 Case Study: Arbiter

The benchmark results above show the efficiency of property generation. But these examples do not show the quality of the generated properties or the feasibility of the proposed verification methodology. As a case study, a simple arbiter was coded and checked by means of property generation. In contrast to the more sophisticated arbiter that was considered in Section 4.1.4, the present arbiter manages the access of only two clients to a bus. Conflicts are resolved by priority scheduling. There exists a request input (req) and a done input (done) as well as an acknowledge output (ack) for each client. Figure 5.11(a) shows a block diagram of the arbiter with two clients. An example for a request from client 0 is shown in Figure 5.11(b). By setting req, a client signals the need to access the bus. Then, the arbiter sets ack if the bus is not in use and no request of higher priority occurs, and ack is kept. Finally, the client sets done to release the bus again.

Figure 5.11. The arbiter: (a) block diagram with inputs req[0], done[0], req[1], done[1] and outputs ack[0], ack[1]; (b) trace over time steps 0 to 5
The arbiter was coded in Verilog. VIS [VIS96] was used to generate a blif-file from the Verilog description. Then, automatic property generation was used in the manner explained in Section 5.2.3. Due to the new methodology, two errors were detected. The Verilog code of the arbiter is shown in Figure 5.12. Originally, instead of Lines 16 and 17 only Line 14 was in place. In a first attempt to fix the bug, this was replaced by Line 15 and, finally, by Lines 16 and 17. The detection of the errors and the reasons for the replacements are described in the following. For all calls of property deduction tmax was set to 2. The tuple I of signals is shown at the beginning of each paragraph.

I = (req[0], ack[0]): At first the relation between req and ack for the client with highest priority was of interest. The set of signals passed to property generation consisted only of req[0] and ack[0]. The first assumption was that there is a direct dependency between this pair of signals. But in fact any pattern may occur, so a trivial property was the result. Therefore, the state was included in the set of signals.

I = (req[0], state, ack[0]): The first version of the arbiter contained Line 14 instead of Lines 16 and 17. This led to an error: ack could be influenced by the behavior of req while the bus was BUSY. This error was resolved by replacing Line 14 with Line 15. Now, the influence of done on the other signals was of additional interest. Also, a value of ack at another time step was taken into account.

I = (req[0], done[0], state, ack[0], ack[0]): The resulting property showed that ack for the client was reset even if the client did not release the bus by setting done. A possible solution is the replacement of Line 15 by Lines 16 and 17. In the resulting property, the state was taken at time step 1 instead of 0 as originally wanted. Therefore, this signal was restricted to time step 0 and as a result the relation for the client with highest priority was returned.
The case study showed a scenario for the application of property deduction and how errors can be revealed using this method. In the first query, only a small tuple of signals was considered. This tuple was then successively enlarged to understand more relations. Checking the deduced properties on the design and reviewing them gave feedback that led to the detection of design errors. During this process no direct interaction with the formal verification engine was necessary.
 1  module theArbiter (clock,ack,done,req);
 2    parameter IDLE=0, BUSY=1;
 3    input clock, reset;
 4    output [1:0] ack;
 5    input [1:0] req, done;
 6
 7    reg [1:0] ack;
 8    reg state;
 9
10    wire [1:0] resolve, acquire;
11
12    assign resolve[0]= req[0];
13    assign resolve[1]= !req[0] & req[1];
14    // assign acquire = (ack[0] & req[0]) | (ack[1] & req[1]);
15    // assign acquire = (ack[0] & done[0]) | (ack[1] & done[1]);
16    assign acquire[0]= ack[0] & !done[0];
17    assign acquire[1]= ack[1] & !done[1];
18
19    always @(posedge clock)
20      case (state)
21        IDLE: if (req!=0)
22          begin
23            ack= resolve;
24            state = BUSY;
25          end
26        BUSY: if (done!=0)
27          begin
28            ack= acquire;
29            state= IDLE;
30          end
31      endcase
32  endmodule

Figure 5.12. Code of the arbiter
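The behavior of the final version of the arbiter (Lines 16 and 17 active) can be replayed with a small software model. The following Python sketch is not part of the original case study; the function names, the stimuli, and the start condition (state IDLE, ack cleared) are assumptions made for illustration, since the Verilog code itself does not define initial register values.

```python
IDLE, BUSY = 0, 1


def arbiter_step(state, ack, req, done):
    """Return (state, ack) after one rising clock edge.

    Mirrors the case statement of Figure 5.12 with Lines 16 and 17 active.
    Index 0 of each pair is the client with the highest priority.
    """
    resolve = (int(req[0]), int(not req[0] and req[1]))
    acquire = (int(ack[0] and not done[0]), int(ack[1] and not done[1]))
    if state == IDLE and any(req):
        return BUSY, resolve
    if state == BUSY and any(done):
        return IDLE, acquire
    return state, ack


def run(stimuli, state=IDLE, ack=(0, 0)):
    """Apply a list of (req, done) pairs; assumed start: IDLE, ack cleared."""
    trace = []
    for req, done in stimuli:
        state, ack = arbiter_step(state, ack, req, done)
        trace.append((state, ack))
    return trace


# Hypothetical stimuli, chosen only to exercise both clients.
stimuli = [
    ((1, 0), (0, 0)),  # client 0 requests the bus
    ((1, 1), (0, 0)),  # client 1 requests while the bus is busy
    ((0, 1), (1, 0)),  # client 0 releases the bus
    ((0, 1), (0, 0)),  # the pending request of client 1 is served
]
for step, (state, ack) in enumerate(run(stimuli)):
    print(step, "BUSY" if state == BUSY else "IDLE", ack)
```

In this run ack[0] stays set while the bus is BUSY regardless of req and is cleared only when done[0] is set, which is the relation the last query of the case study was looking for.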
5.3 Summary and Future Work
The verification flow for circuit design has been considered in this chapter and a shift to a new verification methodology has been suggested. A technique to automatically generate properties from traces is the basis. The technique relies on pattern matching and is therefore very efficient even on large traces, as the experiments demonstrated. A first application of this technique is the detection of gaps in testbenches. In this way, both techniques, traditional simulation-based verification and the enhanced formal methods, can be applied for verification. Invalidity of a generated property gives hints to the gaps in the testbench. This knowledge can then be exploited to increase the quality of simulation-based functional verification. PropGen can be used as a push-button tool that automatically detects gaps for given signals.

Based on this foundation, a new verification methodology was presented. Properties are created interactively, which can speed up the purely manual creation of properties. Accepted properties serve as a starting point for formal verification. Even more important is the different view of the design provided by the generated properties. This can unveil behavior that otherwise remains hidden. As a result, the understanding of the design is improved. In turn, the efficiency of the design process increases due to early detection of bugs or inconsistencies. The main advantages of this methodology are the new perspective on the design and an increased productivity when creating a property suite.

Important directions for future improvements are more heuristics to select useful properties and the front-end for the presentation of deduced properties and for user interaction. Additionally, the properties could be extracted directly from a formal model of the design. In summary, the proposed techniques contribute to an improved usability of the verification flow. New push-button tools were proposed to automate tasks that were carried out manually so far.
As a result, the productivity of the verification flow is increased. This mainly concerns the creation of properties. The automation of another task – that of debugging failures – is considered in the next chapter.
Chapter 6 DIAGNOSIS
Similar to the previous chapter, this chapter also considers the verification path of the design flow. At this point the diagnosis problem is studied in more detail. The steps of the design flow covered in this chapter and the integration into the overall flow are shown in Figure 6.1. Efficient simulation and formal verification (in the form of property checking and equivalence checking) are applied to guarantee the functional correctness of a design. The postproduction test of a chip checks the correct behavior of the final product. Although effective, these techniques only detect the existence of errors or faults. Further effort is required to locate the source of the errors. Unfortunately, manually finding or diagnosing the error locations in a design is a time-consuming and, therefore, costly task. Automatic approaches for diagnosis have been proposed to speed up this debugging process. These approaches automatically calculate a set of candidate error sites. Then, the user can restrict debugging to these candidates instead of going through the complete design. In the literature, this area is frequently named Design Error Detection and Correction (DEDC).

In the first section of the chapter, existing diagnosis techniques are reviewed and compared as published in [FSVD06]. In particular, simulation-based and SAT-based diagnosis are considered in detail. These two approaches are independent of the circuit structure and they are applicable to sequential diagnosis problems. Both approaches rely on a given set of counterexamples and the design as the starting point. The comparison shows that the simulation-based approach is very efficient in terms of run time and resources. The SAT-based approach is less efficient but returns results of higher quality. The problem of generating counterexamples that are explicitly chosen to retrieve a good diagnosis resolution is considered in Section 6.2.
Figure 6.1. Fault diagnosis in the design flow (blocks: synthesizable description, property check, synthesis (for test.), gate level description, equivalence check; counterexamples from both checks feed fault diagnosis)

In particular, the decision problem whether a given set of counterexamples leads to the
smallest number of candidates is shown to be NP-complete even if all counterexamples are given in advance. Additionally, heuristics to select a “good” set of counterexamples from all counterexamples are presented and evaluated. These results were presented in part in [FD03]. Up to this point only the diagnosis of combinational gate-level designs is considered. In the last part of the chapter, diagnosis is extended to HDL level designs and sequential designs. As a result, diagnosis can be applied to debug failing properties. While the other approaches need a correct output response per counterexample, the property provides the correctness check for the diagnosis algorithm in this case. This diagnosis algorithm works on the gate-level structure of the design. A dedicated synthesis flow is used to identify source level components and to link the diagnosis results back to the HDL. In this work, the SAT-based procedure is applied to diagnose properties as proposed in [SFBD06]. The simulation-based approach for diagnosing properties as presented in [FD05] is applied as a preprocessing step to increase efficiency. Similar to fault models in ATPG an error model can be considered for diagnosis to restrict the type of errors. Throughout this chapter a very general error model is used. Here, a single error is the replacement of the function of a gate or a component by another function. Due to this general formulation, this model subsumes most simpler fault models.
6.1 Comparing SAT-based and Simulation-based Approaches
A number of different concepts have been used for diagnosis. Some of these techniques were originally applied in the context of the postproduction test, but they can be used for equivalence checking in the same manner. The structural approaches [BDKN94, LCC+95, VSA03] and the BDD-based approaches [CCC+92, PR95, HK00] have certain drawbacks. Structural approaches rely on similarities between the erroneous circuit (the implementation) and the specification. But such similarities may not be present, e.g. due to optimizations during synthesis. For large designs BDD-based approaches suffer from space complexity issues. In both cases, a complete specification of the design is usually a precondition for diagnosis.

Here, diagnosis methods that use a set of counterexamples are considered. The focus is on approaches for simulation-based diagnosis as proposed in [KCSL94, HC99, VH99, LV05] and diagnosis based on Boolean Satisfiability (SAT) [AVS+04, ASV+05]. Both approaches have been applied to combinational and sequential diagnosis problems. Due to the underlying engines, both techniques are robust with respect to the size of the design. Simulation-based approaches can use efficient parallel simulation techniques with linear run times while SAT-based approaches benefit from recent advances in SAT solving (see Section 2.1.3 for details). An in-depth analysis of these diagnosis methods can show directions for further improvements.

In this section, simulation-based and SAT-based diagnosis are compared from a theoretical and empirical point of view for the first time. Both approaches use a set of counterexamples for diagnosis that may be provided after test-bench simulations, formal verification, or after failing a postproduction test. The basic procedures of the two approaches are outlined. Then, the relationship between these procedures is explained by introducing a third approach of simulation-based diagnosis for multiple errors.
Similarities and differences are analyzed using this third approach. The theoretical results are backed by experimental data based on the ISCAS89 benchmark suite. Overall, this analysis suggests directions for future research on improving each individual diagnosis technique as well as on creating hybrid approaches that exploit the advantages of both. The technique presented in Section 6.3 is a first step in this direction because the simulation-based technique is applied as a preprocessing step for SAT-based diagnosis.

This section is structured as follows: In Section 6.1.1, the basic approaches for simulation-based and SAT-based diagnosis are introduced. In Section 6.1.2 the relation between the approaches is considered from a theoretical point of view. Further issues regarding performance and quality are discussed in Section 6.1.3. The basic procedures are experimentally compared in Section 6.1.4.
6.1.1 Diagnosis Approaches
In this section, the diagnosis problem is introduced and the basic diagnosis procedures for simulation-based and SAT-based diagnosis are presented. References for the advanced approaches which make use of the basic procedures are given in the corresponding sections.
6.1.1.1 Diagnosis Problem

Based on the notion of combinational circuits (see Section 2.2, Definition 3) and test-sets (see Section 2.3, Definition 5) the diagnosis problem is formulated as follows:

Definition 8. Let the combinational circuit C = (V, E, X, Y, F, P) be an implementation of a specification and let T be a test-set of r counterexamples. The diagnosis problem is to determine a set of candidate gates C = {g_1, ..., g_c} where a correction can be applied to rectify the circuit such that the circuit yields the correct output value for all counterexamples in T. The size of a correction C is denoted by |C|.

Definition 9. A set of candidate gates C is called a valid correction for a test-set T if changing the functionality of the gates in C is sufficient to rectify the circuit such that the circuit yields the correct output value for all counterexamples in T.

This definition of a valid correction does not require the replacement to be a deterministic combinational function in terms of primary inputs and present state. But this is not relevant in the combinational case because the same configuration of input values and present state values does not occur twice for different counterexamples. In contrast, in the sequential case the same input values and present state values may reoccur. Therefore, this definition is refined in Section 6.3 when the diagnosis of properties is considered.

Definition 10. A valid correction C contains only essential candidates if and only if for any g ∈ C: C \ {g} is not a valid correction.

The faulty circuit contains p actual error sites e_1, ..., e_p. An error is considered to be the replacement of the function of a gate by another arbitrary Boolean function. Therefore, the size of the search space for possible corrections is in the order of O(|C|^p), where |C| here denotes the size of the circuit [VH99]. In the following, the term effect analysis means "determining whether changing the functionality of one or more internal circuit lines corrects the value of the erroneous output".
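Definitions 9 and 10 can be made concrete with a short check. The following Python sketch assumes that some validity test for corrections is available as a function; the oracle `toy_is_valid` and the gate names are hypothetical and only serve to illustrate the definitions.

```python
def only_essential(correction, is_valid):
    """Definition 10: the correction is valid and no candidate can be dropped."""
    correction = frozenset(correction)
    return is_valid(correction) and all(
        not is_valid(correction - {g}) for g in correction
    )


def toy_is_valid(gates):
    # Hypothetical oracle: the counterexamples are rectified by changing
    # gate "a", or by changing gates "b" and "c" together.
    return "a" in gates or {"b", "c"} <= gates


print(only_essential({"a"}, toy_is_valid))       # True
print(only_essential({"a", "b"}, toy_is_valid))  # False, "b" is not essential
print(only_essential({"b", "c"}, toy_is_valid))  # True
print(only_essential({"b"}, toy_is_valid))       # False, not a valid correction
```

The sketch relies on validity being monotone: adding gates to a valid correction keeps it valid, because the function of an added gate may simply be left unchanged.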
PathTrace(C, i, T, y)
  Simulate T to establish values of internal signals.
  C_i := ∅; mark y.
  While there remain marked gates that have not been visited:
    For each marked gate g that was not visited:
      C_i := C_i ∪ {g}
      If there are inputs with controlling value,
        mark one of these inputs,
      else  // no input has a controlling value
        mark all inputs.
  Return C_i.

BasicSimDiagnose(C, T)
  For i = 1 ... r:
    C_i := PathTrace(C, i, T_i, y_i)

Figure 6.2. Basic simulation-based diagnosis
6.1.1.2 Simulation-based Diagnosis

The basic procedure for simulation-based diagnosis approaches considered in this work is Path Tracing (PT), which is derived from critical PT [AMM83]. The overall flow for a naïve simulation-based diagnosis is shown in Figure 6.2. The procedure BasicSimDiagnose uses PT to calculate a set of candidates C_i for each triple (T_i, y_i, ν_i) in the test-set T. PT marks "candidate gates" on the sensitive paths leading to the erroneous output y_i. All gates on the sensitized path are candidate error sites. More than one input of a gate may have a controlling value (e.g. both inputs of an AND-gate have value 0), but it is only necessary to choose one of these inputs to include at least one error site in the sensitized path [HC99, VH99]. So there can be several sensitized paths. One of these paths is deterministically chosen. The basic algorithm does not check whether the inversion of a candidate's logic value for a particular counterexample really causes a value change at the erroneous output(s), i.e. no effect analysis is performed. In the following, we refer to this basic simulation-based approach as BSIM.

Example 23. Consider Figure 6.3 and assume the gate marked by "X" is an OR-gate instead of an AND-gate, i.e. the actual error site. When the shown values are applied, an erroneous output results. The fault free/faulty values of lines are annotated. Starting PT at the erroneous output y0, the bold lines and gates are marked as sensitized. All gates except the inverter are returned as candidate error sites by PT.
Figure 6.3. Example of a sensitized path (inputs x0 = 1, x1 = 1, x2 = 0; output y0 with fault free/faulty value 0/1)
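The procedure of Figure 6.2 can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used for the experiments: the netlist format (a dictionary from gate name to gate type and input names), the function names, and the three-gate circuit are assumptions, and the circuit is not the one of Figure 6.3.

```python
CONTROLLING = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}  # other types: none
EVAL = {
    "AND": lambda v: int(all(v)),
    "NAND": lambda v: int(not all(v)),
    "OR": lambda v: int(any(v)),
    "NOR": lambda v: int(not any(v)),
    "NOT": lambda v: 1 - v[0],
}


def simulate(circuit, inputs):
    """Evaluate every gate; circuit maps gate name -> (type, input names)."""
    values = dict(inputs)

    def value(signal):
        if signal not in values:
            kind, fanin = circuit[signal]
            values[signal] = EVAL[kind]([value(s) for s in fanin])
        return values[signal]

    for gate in circuit:
        value(gate)
    return values


def path_trace(circuit, inputs, y):
    """Return the candidate set C_i for one counterexample."""
    values = simulate(circuit, inputs)
    candidates, visited, marked = set(), set(), [y]
    while marked:
        gate = marked.pop()
        if gate in visited or gate not in circuit:  # skip primary inputs
            continue
        visited.add(gate)
        candidates.add(gate)
        kind, fanin = circuit[gate]
        controlling = [s for s in fanin if values[s] == CONTROLLING.get(kind)]
        # One controlling input suffices; otherwise follow all inputs.
        marked.extend(controlling[:1] if controlling else fanin)
    return candidates


def basic_sim_diagnose(circuit, tests):
    """One candidate set per counterexample (inputs, erroneous output, correct value)."""
    return [path_trace(circuit, inputs, y) for inputs, y, _correct in tests]


# Hypothetical circuit: g3 = OR(AND(x0, x1), NOT(x2)).
circuit = {
    "g1": ("AND", ["x0", "x1"]),
    "g2": ("NOT", ["x2"]),
    "g3": ("OR", ["g1", "g2"]),
}
tests = [({"x0": 1, "x1": 1, "x2": 0}, "g3", 0)]
print(basic_sim_diagnose(circuit, tests))
```

For this counterexample both inputs of g3 carry the controlling value 1, so only the first one is followed and the candidate set consists of g1 and g3; the inverter g2 is not marked. The correct output value is not used, which reflects that BSIM performs no effect analysis.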
The interpretation of the diagnosis result depends on the number of errors that are assumed to be contained in the circuit. If there is only a single error present in the circuit, the actual error site is contained in the intersection of all candidate sets, i.e. in ⋂_{i=1}^{r} C_i. If there are multiple errors, a more conservative approach has to be used: Each marked gate has to be considered as a candidate. The number M(g) = |{i : g ∈ C_i}| of counterexamples that marked a particular gate g can be used to prioritize the candidates. But there is no guarantee that any real error site has been marked by the largest number of PT marks. Because the candidate set of each counterexample contains at least one actual error site, at least one actual error site is marked by at least r/p counterexamples [KCSL94], i.e. ∃e ∈ {e_1, ..., e_p} : M(e) ≥ r/p. Thus, for the correction of k errors, subsets up to size k of all marked gates have to be considered. This is done by the advanced simulation-based approaches relying on PT [HC99, VH99, LV05].

Multiple errors are handled by considering the corrections of size k and applying pruning techniques. For example, in [LV05] the number of remaining errors is reduced by one each time in a greedy-like manner. After choosing a single correction, the candidate sets C_i are recalculated by calling BasicSimDiagnose. This effect analysis is necessary, because correcting one error may change the sensitized paths in the circuit. Then, the next single correction is chosen. But earlier decisions may have been wrong. Thus, the ability to backtrack, similar to solvers for NP-complete problems, is required. As a result, the time complexity of the advanced simulation-based techniques drastically increases compared to BSIM.

A simulation-based approach that does not use PT has been introduced in [BMJ+99]. Instead of backtracing sensitive paths, an approach based on forward implications by injecting X-values was chosen for diagnosis.
Therefore, the core idea is similar to the approaches based on PT: The effect of changing a value at a certain position is considered.
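The two interpretations of the BSIM result described above, the intersection for a single error and the marking count M(g) for multiple errors, can be sketched as follows. The candidate sets are hypothetical and the function names are chosen for illustration only.

```python
from collections import Counter


def single_error_candidates(candidate_sets):
    """Intersection of all C_i: contains the error site if only one error exists."""
    sets = [set(cs) for cs in candidate_sets]
    return set.intersection(*sets) if sets else set()


def rank_by_marks(candidate_sets):
    """Gates sorted by M(g), the number of counterexamples that marked g."""
    marks = Counter(g for cs in candidate_sets for g in cs)
    return sorted(marks.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical candidate sets returned by path tracing for r = 3 counterexamples.
candidate_sets = [{"a", "b", "f"}, {"b", "c", "f"}, {"b", "d"}]
print(single_error_candidates(candidate_sets))  # {'b'}
print(rank_by_marks(candidate_sets))
# [('b', 3), ('f', 2), ('a', 1), ('c', 1), ('d', 1)]
```

The ranking is only a heuristic: as stated above, a real error site is not guaranteed to carry the largest number of marks.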
6.1.1.3 SAT-based Diagnosis

Figure 6.4. SAT-based diagnosis: (a) multiplexor at gate g with select input ab_g; (b) SAT instance with one copy of the circuit per counterexample, each constrained to the correct output value

1. BasicSATDiagnose(C, T, k)
2.   For each triple (T, y, ν) ∈ T
       Create an instance I of C in F.
       Constrain y to adopt the correct value ν.
       Constrain inputs to the values of t.
       Insert multiplexors at gates that are considered for correction.
3.   For i = 1 ... k
       Constrain the number of abnormal predicates set to 1 to at most i.
       Enumerate all solutions and add a blocking clause per solution.

Figure 6.5. Basic SAT-based diagnosis

For SAT-based diagnosis a SAT instance is generated that can only be satisfied if changing a limited number of gates in the erroneous circuit produces the correct output values for all counterexamples. This approach was first presented in [SVV04]. The SAT instance F is built as shown in Figure 6.4. Multiplexors
are inserted at each gate g to allow for corrections (see Figure 6.4(a)). The output value of g is propagated when the select input ab_g has the value 0. A correction is applied when the select input ab_g is set to 1: the value of g is overwritten by a new unrestricted value c_g^i. The variable name ab_g of the select input refers to the functionality of asserting the gate g "abnormal" to inject a correction.

Notation 3. A variable ab_g is also called abnormal predicate.

Such corrections are necessary to retrieve a solution for the SAT instance F shown in Figure 6.4(b). According to the pseudocode in Figure 6.5, a copy of C is created for each counterexample (T, y, ν) ∈ T. Each copy is constrained to the primary input values of trace t and to produce the correct output value ν for the erroneous output y. The select-line ab_g for multiplexors corresponding to gate g is the same in all copies of C. Therefore, the gate may be changed
for all counterexamples or for none. The injected value c_g^i may be different for different counterexamples. Thus, gate g can be replaced by an arbitrary Boolean function. The number of gates that may be changed is bounded by constraining the number of abnormal predicates that may take the value 1 to be less than or equal to k.

A SAT solver is used to solve the SAT instance F. Free variables in F are those corresponding to the abnormal predicates ab_g and to the new primary inputs for the correct values at gates c_g^i. All other variable values are determined by the constraints for the circuit's gates (see Section 2.2.3), the traces and the correct output values. Each solution of F is a solution to the diagnosis problem. The abnormal predicates that are set to 1 in a satisfying assignment for F determine the set of candidate gates C that have to be changed. In the following, we refer to C as a solution of F.

In Figure 6.5, the limit is iteratively incremented in the for-loop in Line 3. This guarantees that all solutions generated by the approach only contain essential candidates, because solutions with a smaller number of candidates are blocked before increasing the limit. For this purpose an incremental SAT solver can be used [WKS01]. In the following, we refer to this basic SAT-based approach as BSAT.

The advanced SAT-based diagnosis approach [SVV04] applies several heuristics that improve the performance of BSAT. To reduce the search space, additional clauses are added that force the free variables c_g^i to 0 when ab_g is set to 0. This prevents up to |C| decisions of the SAT solver. The same effect is achieved by using the following construction:

¬ab_g → (c_g^i ↔ g)

That is, if gate g is not changed (ab_g = 0) then c_g^i yields the value calculated at g for counterexample i, otherwise the value of c_g^i can be set to an arbitrary value. Also, instead of inserting a multiplexor at each gate, only dominators are selected in a first run to reduce the search space.
In a second run, a finer level of granularity for diagnosis can be retrieved by introducing more multiplexors in the dominated regions that may contain an error. Additionally, for a large number of counterexamples the test-set is split into partitions to reduce the size of the SAT instance. Finally, an all-solutions SAT solver is used. Such a solver automatically minimizes the number of assignments in a solution. Thus, incrementally solving instances with larger limits as in the basic procedure is not necessary. All these techniques do not change the solution space but dramatically decrease the run time. In fact, speed-up factors of more than 100 times have been observed [Smi04]. The approach has also been applied to diagnose sequential errors efficiently [AVS+04] and to carry out diagnosis for hierarchical structures [ASV+05].
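The semantics of the basic procedure in Figure 6.5 can be illustrated without a SAT solver. The Python sketch below is a brute-force stand-in, not the SAT encoding itself: choosing a set of gates corresponds to setting their abnormal predicates to 1, the injected values are enumerated per counterexample, and skipping supersets of earlier solutions plays the role of the blocking clauses. The netlist format, the function names, and the four-gate circuit are assumptions; the enumeration is exponential and only suitable for tiny examples.

```python
from itertools import combinations, product

EVAL = {
    "AND": lambda v: int(all(v)),
    "OR": lambda v: int(any(v)),
    "NOT": lambda v: 1 - v[0],
}


def evaluate(circuit, inputs, injected):
    """Simulate the circuit; gates in `injected` behave like multiplexors with ab_g = 1."""
    values = dict(inputs)
    values.update(injected)

    def value(signal):
        if signal not in values:
            kind, fanin = circuit[signal]
            values[signal] = EVAL[kind]([value(s) for s in fanin])
        return values[signal]

    for gate in circuit:
        value(gate)
    return values


def rectifies(circuit, tests, gates):
    """True if, for every counterexample, some injected values yield the correct output."""
    return all(
        any(
            evaluate(circuit, inputs, dict(zip(gates, vals)))[y] == correct
            for vals in product((0, 1), repeat=len(gates))
        )
        for inputs, y, correct in tests
    )


def basic_sat_diagnose(circuit, tests, k):
    solutions = []
    for limit in range(1, k + 1):
        for gates in combinations(sorted(circuit), limit):
            if any(s <= set(gates) for s in solutions):
                continue  # plays the role of a blocking clause
            if rectifies(circuit, tests, gates):
                solutions.append(set(gates))
    return solutions


# Hypothetical circuit: gate a feeds b and c, and d = AND(b, c) is the output.
circuit = {
    "a": ("AND", ["x0", "x1"]),
    "b": ("AND", ["a", "x2"]),
    "c": ("AND", ["a", "x3"]),
    "d": ("AND", ["b", "c"]),
}
tests = [({"x0": 1, "x1": 0, "x2": 1, "x3": 1}, "d", 1)]
print(basic_sat_diagnose(circuit, tests, 2))
```

For this counterexample the sketch reports the corrections {a}, {d} and {b, c}. Changing b or c alone is rejected because the other input of d keeps its controlling value, which is exactly the kind of effect analysis a SAT solver performs implicitly.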
6.1.2 Relation Between the Approaches
In this section, the two basic diagnosis approaches BSIM and BSAT are compared from a theoretical point of view. A third approach is introduced that formally describes the application of BSIM for the diagnosis of multiple errors. Using this approach, the differences between the two basic techniques are explained. The discussion in Section 6.1.3 includes the advanced approaches.

The third approach is given in Figure 6.6. First, BasicSimDiagnose is called to calculate the candidate set C_i for each counterexample (T_i, y_i, ν_i) ∈ T. These sets form an instance S of the set cover problem: A solution C* of S contains at least one element of each set C_i. Thus, for each counterexample in T at least one gate on a sensitized path is contained in C*. We refer to the approach implemented by SCDiagnose as COV.

1. SCDiagnose(C, T, k)
2.   Call BasicSimDiagnose(C, T) to calculate C_i, 1 ≤ i ≤ r.
3.   Calculate all solutions of the set cover problem S:
     Find C* such that
       (a) for each i: at least one element of C_i is contained in C*,
       (b) for any g ∈ C*: C* \ {g} does not fulfill condition (a),
       (c) |C*| ≤ k.

Figure 6.6. Diagnosis based on set cover

Example 24. Assume that SCDiagnose is called for k = 2 and a test-set with three counterexamples. Further assume that BasicSimDiagnose returns the following candidate sets:

C_1 = {a, b, f, g}
C_2 = {c, d, e, f, g}
C_3 = {b, c, e, h}

Then, {b, d} would be one possible solution returned by SCDiagnose. Another solution would be {a, d, h}.

This simple approach does not use heuristics to prefer one solution over another. The minimum cover problem, i.e. to decide whether no solution with fewer elements exists, is NP-complete [GJ79] (see also Definition 14). The relation between the set cover problem and diagnosis of multiple errors has been studied earlier, e.g. in [VH99].

The BSAT approach solves a very similar problem. By choosing the values of the abnormal predicates, locations for corrections are determined. One difference is the simulation engine which is replaced by BCP of the SAT solver
(see Section 2.1.3). Additionally, BSAT carries out an effect analysis while solving the SAT instance: When switching an abnormal predicate at a multiplexor, BCP propagates value changes dynamically. In contrast, COV does not carry out effect analysis at all. Based on these observations, the following lemmas can be derived.

Lemma 3. Let C be a circuit, T be a test-set and k ∈ N. Each solution C of the SAT instance F is a valid correction for T.

Proof. The construction of the SAT instance directly implies this lemma.

Lemma 4. Let C be a circuit, T be a test-set and k ∈ N. There exist solutions for the set cover problem S in SCDiagnose(C, T, k) that are not a valid correction for T.

Proof. Consider the circuit in Figure 6.7. The values described by the simulation trace are assigned to the inputs. This produces the output value 0 instead of 1. PT either marks the gates {a, b, d} or {a, c, d} because both inputs of d have a controlling value. A possible solution to cover this single set of candidates is {b} (or {c}, respectively). But the counterexample cannot be rectified by changing only the output value of b (or c).

Figure 6.7. Example: COV may not provide a correction

Lemmas 3 and 4 directly lead to Theorem 3.

Theorem 3. Let C be a circuit, T be a test-set and k ∈ N. There exist solutions calculated by SCDiagnose(C, T, k) that are not calculated by BasicSATDiagnose(C, T, k).

Next, the capability to calculate all valid corrections is analyzed.

Lemma 5. Let C be a circuit, T be a test-set and k ∈ N. BasicSATDiagnose(C, T, k) returns all valid corrections containing only essential candidates up to size k.

Proof. Again, the construction of the SAT instance directly implies this lemma. Incrementally calculating corrections of sizes 1 to k and "blocking" smaller solutions guarantees that only essential candidates are contained in each correction.

Figure 6.8. Example: Solution for k = 2 by BSAT but not by COV
Lemma 6. Let C be a circuit, T be a test-set and k ∈ N. There are valid corrections with k or fewer candidate gates that are not calculated by SCDiagnose(C, T, k).

Proof. Consider the circuit in Figure 6.8. Assume that only the applied trace shows an erroneous output value and that k = 2. By changing the output values of a and b, the correct output value 1 can be produced. But the single candidate set {a, c, d, e} generated by PT does not contain b. Therefore, {a, b} is not a solution of S.

Lemmas 5 and 6 imply the following theorem.

Theorem 4. Let C be a circuit, T be a test-set and k ∈ N. There exist solutions calculated by BasicSATDiagnose(C, T, k) that are not calculated by SCDiagnose(C, T, k).

This analysis states that neither BSIM nor COV always provides valid corrections. Furthermore, these methods do not calculate all valid corrections, whereas BSAT does both: it calculates all valid corrections and provides only valid corrections. This difference is important when discussing the advanced approaches in the next section.
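The set cover step of SCDiagnose (Figure 6.6) can be sketched as a plain enumeration over the candidate sets. The following Python code is an illustrative brute-force version with a hypothetical function name; it assumes the candidate sets have already been computed by path tracing and uses the sets of Example 24.

```python
from itertools import combinations


def sc_diagnose(candidate_sets, k):
    """Enumerate all irredundant covers C* with |C*| <= k (conditions (a)-(c))."""
    universe = sorted(set().union(*candidate_sets))
    covers = []
    for size in range(1, k + 1):
        for subset in combinations(universe, size):
            chosen = set(subset)
            # Condition (b): a smaller cover inside `chosen` makes it redundant.
            if any(cover <= chosen for cover in covers):
                continue
            # Condition (a): every candidate set C_i must be hit.
            if all(chosen & cs for cs in candidate_sets):
                covers.append(chosen)
    return covers


# Candidate sets of Example 24.
candidate_sets = [
    {"a", "b", "f", "g"},
    {"c", "d", "e", "f", "g"},
    {"b", "c", "e", "h"},
]
solutions = sc_diagnose(candidate_sets, 2)
print({"b", "d"} in solutions)  # True
print(len(solutions))           # 13
```

Because covers are enumerated by increasing size, every redundant subset contains an irredundant cover that was found earlier, so the subset test is sufficient for condition (b). With the bound raised to 3, the cover {a, d, h} is enumerated as well. Note that the sketch never consults the circuit, which is why a cover need not be a valid correction (Lemma 4).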
6.1.3 Qualitative Comparison
While only the basic procedures were compared in the previous section, the following discussion also includes the advanced simulation-based [HC99, VH99, LV05] and SAT-based [SVV04] approaches. Formal aspects like the complexity of the approaches and their ability to calculate valid corrections are considered. Further issues are discussed on an informal basis. Table 6.1 summarizes the topics and the respective results.

The number of candidate error sites differs between the approaches. A large number of candidates is returned by BSIM; only the number of counterexamples that marked a particular gate may differ. In contrast, the other approaches only return k candidates. The number k is small and either has to be specified by the user or is determined by automatically calculating a minimal solution. During the search, subsets of the gates in C up to size k are considered.
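Table 6.1. Comparison of the approaches (BSIM, COV, adv. sim.-based, BSAT, adv. SAT-based)

Number of candidate error sites
  BSIM: O(|C|)
  COV, adv. sim.-based, BSAT, adv. SAT-based: k, user defined (or incrementally determined)

Valid correction
  BSIM, COV: Not guaranteed, guides the designer
  Adv. sim.-based, BSAT, adv. SAT-based: Guaranteed, correct values per counterexample are supplied

Effect analysis
  BSIM, COV: None
  Adv. sim.-based: Simulation-based
  BSAT, adv. SAT-based: Inherent

Structural information
  BSIM: Available
  COV: None for correction
  Adv. sim.-based: Available
  BSAT: None
  Adv. SAT-based: Exploited during CNF generation

Simulation engine
  BSIM, COV, adv. sim.-based: Efficient, circuit-based
  BSAT, adv. SAT-based: BCP

Time complexity
  BSIM: O(|C| · r)
  COV: O(|C|^k)
  Adv. sim.-based: O(|C|^(k+1) · r)
  BSAT: O(k · 2^(l|C|)), l = 2r + k + 1
  Adv. SAT-based: O(k · (r/j) · 2^(l|C|)), l = 2j + k + 1

Space complexity
  BSIM: O(|C| + r)
  COV: O(|C| · r)
  Adv. sim.-based: O(k · |C| · r)
  BSAT: Θ(|C|(r + k))
  Adv. SAT-based: Θ(|C|(j + k))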
When debugging a design, it matters whether an approach is guaranteed to return a valid correction. BSIM and COV give no such guarantee; the solutions calculated by these basic approaches can only guide the designer during error location. In contrast, the advanced simulation-based approaches, the BSAT approach, and the advanced SAT-based technique return only valid corrections. Additionally, a new value for each gate in the correction is provided with respect to each counterexample. This can be exploited to determine the “correct” function of the gate.

Effect analysis guarantees that only valid corrections are calculated. The advanced simulation-based approaches rely on resimulation, while a SAT solver inherently carries out effect analysis.

The simulation-based approaches may use structural information for these purposes since they are applied directly to the circuit. For example, successor/predecessor relations, knowledge of dominators, etc., can be exploited directly in the algorithms. For a SAT-based approach such information has to be encoded while generating the SAT instance. BSAT does not do this, but the advanced SAT-based approach in [SVV04] uses, for example, information about structural dominators to prune the search space.

A crucial issue when considering a large number of counterexamples is the simulation engine. Naturally, the simulation-based approaches can use fast engines that directly evaluate the circuit. Such an engine can also be used for what-if analysis when carrying out effect analysis. The SAT approach inherently uses BCP for these purposes. This may induce some overhead when the SAT instance is large, but due to sophisticated implementation techniques BCP is very efficient in practice (see Section 2.1.3). Moreover, the SAT instances contain a large number of unit literals, which are not considered further after the preprocessing step.

Only BSIM has a linear time complexity of $O(|C| \cdot r)$, where simulation and PT are carried out for each counterexample (r is the number of counterexamples considered). COV has to determine a solution to the set cover problem. A backtrack search is applied to determine subsets of at most k gates that cover all candidate sets $C_i$, $1 \le i \le r$, which takes $O(|C|^k)$. The advanced simulation-based approaches also calculate these subsets. Additionally, for each subset, simulation and PT are carried out per counterexample. In total, this takes time $O(|C|^k \cdot |C| \cdot r)$.

For BSAT the SAT solver searches for a satisfying solution on each of the k SAT instances. Per gate, each SAT instance contains one select input for a multiplexor and, for each counterexample, a variable and an additional input, leading to $|C| + r \cdot 2|C|$ variables. Additionally, at most $k|C|$ variables are needed to encode the constraint that restricts the number of abnormal predicates with value 1. (It is assumed that a BDD circuit is encoded in CNF to retrieve this constraint. A different realization may lead to a different result for the complexity.) These are $|C|(2r + k + 1)$ variables in total. Thus, the search on one of the k SAT instances containing r counterexamples is carried out in $O(2^{|C|(2r+k+1)})$. In the advanced SAT-based approach, only a fixed number j of counterexamples is considered in a single SAT instance, i.e. $|C|(2j + k + 1)$ variables. For a given limit k at most r/j instances have to be solved. This causes an asymptotic complexity of $O(k \cdot \frac{r}{j} \cdot 2^{|C|(2j+k+1)})$.
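The variable counts derived above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function names and the sample numbers (1000 gates, 32 counterexamples, k = 2, j = 4) are illustrative assumptions and are not taken from the experiments.

```python
def bsat_variables(num_gates, r, k):
    """Variables in one BSAT instance following the count in the text."""
    select_inputs = num_gates              # one multiplexor select input per gate
    per_counterexample = 2 * num_gates * r # a variable and an additional input per gate and counterexample
    cardinality = k * num_gates            # at most k|C| variables for the limit on abnormal predicates
    return select_inputs + per_counterexample + cardinality


def advanced_sat_variables(num_gates, j, k):
    """Variables in one instance when only j counterexamples are encoded."""
    return num_gates * (2 * j + k + 1)


# Hypothetical sizes, chosen only for illustration.
gates, r, k, j = 1000, 32, 2, 4
full = bsat_variables(gates, r, k)
partial = advanced_sat_variables(gates, j, k)
print(full, partial)
```

With these assumed numbers a single BSAT instance has |C|(2r + k + 1) variables, while an instance restricted to j counterexamples is considerably smaller. This is the reason why the exponent in the bound shrinks when fewer counterexamples are encoded per instance.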
Modern SAT solvers drastically improve upon the theoretical upper bounds. For example, after choosing the values for the abnormal predicates at the multiplexors, the values of all other variables in the SAT instance can be implied, which already reduces the search space to size $2^{|C|}$. Pruning due to learning techniques further improves the search.

The space complexity of BSIM is also the smallest. Each counterexample can be handled independently of the others, leading to a complexity of $O(|C| + r)$. COV stores the circuit, the current test, and the set of candidates marked by PT for each counterexample. In the worst case, PT marks all gates of the circuit. The complexity is $O(|C| \cdot r)$ in this case. The advanced simulation-based approaches store the same information, but additionally resimulation has to be done during the backtrack search. At each search level up to depth k this information has to be stored, yielding $O(k \cdot |C| \cdot r)$ as the space requirement. The CNF formula generated for BSAT always contains a copy of the circuit for each counterexample. The size of the constraint to limit the number of abnormal predicates set to 1 is in $\Theta(|C| \cdot k)$. Therefore, the total
space complexity is Θ(|C| · (r + k)). In the advanced SAT-based approach, only j counterexamples are considered in a single SAT instance which yields Θ(|C| · (j + k)). In summary, the basic simulation-based approaches BSIM and COV are very fast but do not yield diagnosis results of highest quality. The other approaches have higher run times but ensure valid corrections. The experiments presented in the following section strengthen these theoretical observations.
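The covering step of COV discussed in the complexity analysis can be sketched as follows. This is only an illustration under assumptions: it enumerates all subsets of at most k gates exhaustively with `itertools.combinations` instead of the backtrack search or the SAT formulation mentioned in the text, and the gate names in the example are hypothetical.

```python
from itertools import combinations


def covering_subsets(candidate_sets, k):
    """Return all gate subsets of size <= k that contain at least one gate
    of every candidate set C_i (supersets of smaller covers are included)."""
    gates = sorted(set().union(*candidate_sets))
    result = []
    for size in range(1, k + 1):
        for subset in combinations(gates, size):
            chosen = set(subset)
            # A subset covers C_i if it shares at least one gate with it.
            if all(chosen & c for c in candidate_sets):
                result.append(chosen)
    return result


# Hypothetical candidate sets marked by PT for three counterexamples.
candidates = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
covers = covering_subsets(candidates, 2)
print(covers)
```

The nested loops visit on the order of |C|^k subsets, which matches the bound stated for COV. No effect analysis is performed, so a returned subset is a candidate for error location and not necessarily a valid correction.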
6.1.4 Experimental Results
In the experiments, the three basic approaches BSIM, COV, and BSAT were considered. Between one and four gate change errors were injected into circuits from the ISCAS89 benchmark set. The limit k was always set to the number of errors injected. Diagnosis was then carried out for 4, 8, 16, and 32 counterexamples to show the finer resolution obtained from additional counterexamples. In all cases, a part of the same test set was used for an erroneous circuit. All experiments were carried out on an AMD Athlon 3500+ (1GB, 2.2 GHz, Linux). The resources were restricted to 512 MB and 30 min CPU time. The SAT solver Zchaff [MMZ+01] was used. Zchaff supports incremental SAT to reuse learned clauses. The set cover problem in COV was also solved using Zchaff.

The three basic approaches are compared with respect to run time and quality. Table 6.2 shows the run times of the three approaches. Given are the name of the circuit C, the number of errors p, and the number of counterexamples r used. For COV and BSAT, the run times to create the SAT instance (for COV this includes the time for BSIM), to calculate one solution, and to calculate all solutions are reported in columns CNF, One, and All, respectively.

Table 6.2. Run time of the basic approaches

                      BSIM    COV                      BSAT
Circuit    p    r             CNF     One     All      CNF     One       All
s1423      4    4     0.00    0.01    0.01    1.36     0.02    0.21      34.21
s1423      4    8     0.01    0.01    0.01    19.98    0.02    0.21      12.93
s1423      4    16    0.02    0.01    0.02    4.12     0.04    0.29      13.14
s1423      4    32    0.03    0.03    0.03    0.68     0.06    0.60      22.72
s6669      3    4     0.01    0.01    0.03    0.09     0.05    3.24      56.49
s6669      3    8     0.02    0.02    0.04    0.12     0.05    5.06      47.87
s6669      3    16    0.03    0.04    0.05    0.7      0.08    10.48     12.06
s6669      3    32    0.10    0.10    0.12    0.65     0.13    10.80     14.30
s38417     2    4     0.18    0.18    0.18    0.20     0.40    37.4      1093.76
s38417     2    8     0.25    0.25    0.25    0.27     0.42    33.64     522.62
s38417     2    16    0.45    0.45    0.45    0.47     0.49    300.86    637.18
s38417     2    32    0.90    0.90    0.90    0.92     0.60    394.47    953.98
Remark 9. Note that only BSAT is guaranteed to return a valid correction since the other approaches do not carry out any effect analysis. Thus, BSAT solves a harder problem.

Remark 10. The run times of the basic approaches cannot be compared to those of the advanced approaches. For the SAT-based approaches, heuristics have been proposed that yield a speed-up of up to 100 times. The advanced simulation-based techniques apply a backtrack search and carry out effect analysis for each solution, resulting in a drastic increase in run times (see Sections 6.1.1.2 and 6.1.2).

As expected, BSIM is the fastest approach and takes less than 1 s CPU time even for a large circuit such as s38417. COV also computes corrections quite fast, even when all corrections are retrieved. Due to the effect analysis, BSAT needs much longer run times, especially when all solutions are calculated. But this ensures that only valid corrections are returned.

Table 6.3 compares the quality of the approaches:

For BSIM
– The total number of gates that have been marked by PT is given ($|\bigcup C_i|$).
– For each of these gates the distance to the nearest error was determined, i.e. the number of gates on a shortest path to any error. The average value of these distances is reported (avgA).
– The number of gates that have been marked by the maximal number of counterexamples is also given, i.e. $G_{max} = |\{g : \forall h \in C : M(g) \ge M(h)\}|$.
– Again, the distance to the nearest error was determined for each of these gates. The minimal, maximal, and average (avgG) values of these distances are reported. If the minimal value is greater than zero, no actual error site was marked by the maximal number of counterexamples.

For COV and BSAT
– The number of solutions is given.
– For each gate in a solution the distance to the nearest error was determined. Per solution the average a of these distances was calculated. The minimal, maximal, and average value of a over all solutions is reported.

Table 6.3. Quality of the basic approaches

BSIM
Circuit    p    r     |∪C_i|   AvgA    Gmax    Min    Max    AvgG
s1423      4    4     100      3.68    4       1      4      2.75
s1423      4    8     115      3.78    2       3      4      3.50
s1423      4    16    126      3.90    1       1      1      1.00
s1423      4    32    139      3.85    3       1      4      2.67
s6669      3    4     90       6.89    83      0      12     7.17
s6669      3    8     106      6.87    86      0      12     6.95
s6669      3    16    117      6.85    69      0      12     6.94
s6669      3    32    117      6.85    64      0      12     7.39
s38417     2    4     52       4.75    18      0      11     4.61
s38417     2    8     67       5.69    18      0      11     4.61
s38417     2    16    67       5.69    15      0      11     4.73
s38417     2    32    95       7.56    14      0      11     4.93

                      COV                              BSAT
Circuit    p    r     #Sol     Min     Max     Avg     #Sol    Min    Max      Avg
s1423      4    4     5931     0       5.33    2.90    4239    0      4.00     2.18
s1423      4    8     28281    0       5.50    3.42    1281    0      3.50     1.78
s1423      4    16    7960     0       4.50    2.85    809     0      3.25     1.66
s1423      4    32    1716     0.33    4       2.37    767     0      3.25     1.61
s6669      3    4     415      0       7       4.18    1935    0      5.67     3.66
s6669      3    8     565      0       7       3.94    1029    0      5.67     3.72
s6669      3    16    2275     0       7.33    4.55    12      0      1        0.64
s6669      3    32    1790     0       7.33    4.48    12      0      1        0.64
s38417     2    4     156      0       11      4.67    5959    0      22.00    9.64
s38417     2    8     113      0       11      4.61    31      0      5.50     3.45
s38417     2    16    150      0       11      4.53    29      0      5.50     3.33
s38417     2    32    133      0       11      4.40    33      0      4.50     2.88
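The quality measures reported in Table 6.3 can be sketched in a few lines. The code below is a minimal illustration, assuming that the candidate sets are available as Python sets and that the distance of every gate to the nearest error is given as a dictionary; the gate names and numbers are hypothetical.

```python
from collections import Counter


def g_max(candidate_sets):
    """Gates marked by the maximal number of counterexamples, i.e. the set
    whose size is reported as Gmax. M(g) counts the sets containing g."""
    marks = Counter()
    for candidates in candidate_sets:
        marks.update(candidates)
    top = max(marks.values())
    return {gate for gate, count in marks.items() if count == top}


def solution_stats(solutions, distance):
    """Min, max and average over all solutions of the per-solution
    average distance a to the nearest error."""
    averages = [sum(distance[g] for g in s) / len(s) for s in solutions]
    return min(averages), max(averages), sum(averages) / len(averages)


# Hypothetical data: three counterexamples, two solutions.
marked = [{"g1", "g2"}, {"g2", "g3"}, {"g2", "g3"}]
dist = {"g1": 2, "g2": 0, "g3": 1}
top_gates = g_max(marked)
stats = solution_stats([{"g2"}, {"g1", "g3"}], dist)
print(top_gates, stats)
```

In this toy instance the only gate with the highest count is an actual error site (distance 0), which corresponds to a minimal value of zero in the BSIM part of the table.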
These distance measures give an intuition of the depth to which the designer has to analyze the circuit when starting from a solution returned by one of the approaches. A small value of this distance is desirable.

The table shows that BSIM alone usually does not yield a good diagnosis result. The number of gates that have the highest count from PT ($G_{max}$) can be quite large; see, e.g. s6669. While an actual error site is often among these gates, this cannot be guaranteed. Based on these results, the designer may have to analyze a large part of the circuit before finding an error.

COV considers subsets up to size k of all marked gates. Thus, the solution space is large. Using more counterexamples may even increase the solution space because more gates are marked by PT. Similarly, for BSAT the solution space is large. More counterexamples may also increase this space when additional outputs are introduced into the diagnosis problem. If no additional outputs are introduced, the number of solutions is reduced. Besides the fact that all solutions calculated by BSAT are valid corrections, their quality is also better in all cases, except for s38417 when only four counterexamples were considered. When more counterexamples were used, BSAT returned the best results.

The plot in Figure 6.9 shows an overview of the results of BSAT vs. COV for all benchmarks. For each benchmark the value of avg is denoted for the two approaches. Marks below the bisecting line indicate that the result returned by BSAT was better than that of COV for a particular benchmark. In Figure 6.10 the number of solutions is compared for the two approaches. Now, a mark below the bisecting line indicates that BSAT returned a smaller number of solutions (note the logarithmic scale of the axes). The figures show that BSAT usually returns a smaller number of solutions of a better quality. This directly implies time savings during design debugging.

Figure 6.9. BSAT vs. COV: Average distance (horizontal axis: COV, vertical axis: BSAT, both from 0 to 10)

Figure 6.10. BSAT vs. COV: Number of solutions (horizontal axis: COV, vertical axis: BSAT, both logarithmic from 1 to $10^5$)

In summary, the approaches behave as expected from the theoretical analysis. BSAT is slower than the other approaches but returns the best results. Nonetheless, even the simple approaches often calculate solutions of good quality. This is exploited in Section 6.3 to combine both approaches. A method to determine multiple counterexamples that yield good diagnosis results is considered in the following section.
6.2 Generating Counterexamples for Diagnosis
Tools for formal verification guarantee equivalence of designs or validity of a property under any input sequence (see, e.g. [BCMD90, PK00]). But the opposite, i.e. proving inequivalence of two designs or invalidity of a property, is often done by providing only one counterexample. This is formally correct, but the designer has to locate the error based upon this single counterexample, often by hand. The two techniques presented in the previous section and a number of other techniques [KCSL94, VF97, HC99, Uba03] improve on this by applying multiple counterexamples and calculating error candidates automatically. For all of these approaches the counterexamples are given in advance, e.g. by a formal verification tool or by a failing simulation trace. The generation of counterexamples dedicated to diagnosis has not yet been addressed adequately.
Only in [TYSH94] is a condition to create counterexamples for error location defined. The approaches in [TSH94] and [IINY03] make use of this technique. But the given condition reduces the number of applicable counterexamples.

Based upon the PT procedure (see Section 6.1.1.2), counterexamples can be used to calculate candidate error sites, even if no complete specification is given. In this context, the problem of choosing from the set of all counterexamples a set that leads to the smallest possible number of candidates is formalized in the following. The decision whether a given set of counterexamples is optimal in this sense is proven to be NP-complete.

Two heuristics are introduced to choose counterexamples. The first heuristic can be considered a general guideline on how to choose counterexamples. This can be exploited when simulation-based verification is used. The second heuristic efficiently chooses a set of counterexamples from BDDs. BDDs are used to represent the set of all counterexamples. This is not necessary for the approach but allows the quality of the proposed heuristics to be evaluated because the whole search space can be traversed.

In Section 6.2.1, the problem of deciding whether a set of counterexamples leads to a minimal number of marked gates is proven to be NP-complete. The two heuristics to choose counterexamples are explained in Section 6.2.2. Experimental results in Section 6.2.3 underline the quality of the heuristics.
6.2.1 Choosing Good Counterexamples
Given are an erroneous circuit C and the set of all counterexamples for a single gate change error. The PT procedure leads to a set of candidate error sites for each of the counterexamples. The intersection of all these sets gives the minimal set of candidates that can be determined. Usually, it is too expensive to use all counterexamples since the number of counterexamples can be exponential in the number of inputs of the circuit. Therefore, a subset of counterexamples has to be chosen that leads to a small number of candidate error sites. This motivates the definition of the following problem; BSIM as introduced in Section 6.1.1 is used as the underlying diagnosis procedure.

Definition 11. An instance $I_{CCE}$ of the problem Choosing Counterexamples (CCE) is defined by
1. $C = (V, E, X, Y, F)$, an erroneous circuit that can be diagnosed by num counterexamples,
2. $\Gamma := \{C_0, \ldots, C_{num-1}\}$, the sets of candidates calculated by BSIM, i.e. $C_i$, $0 \le i < num$, is the set of gates marked by PT when counterexample i is considered,
3. A fixed but arbitrary integer $num \ge r > 1$,
4. A positive integer k.
A solution to $I_{CCE}$ is a subset $\Gamma^* \subset \Gamma$ with $|\Gamma^*| \le r$ and
$$\left| \bigcap_{C^* \in \Gamma^*} C^* \right| \le k.$$
The decision problem CCE is defined by the question: Does there exist a solution to $I_{CCE}$?

Informally, a solution to an instance of CCE is a subset of all counterexamples that has size r or less and leads to an intersection of at most k gates. The decision problem CCE is NP-complete as will be proven in the remainder of this section. The proof is carried out by establishing a hierarchy of problems ($>_{pol}$ means “in deterministic polynomial time reducible to”)
$$MC >_{pol} EI >_{pol} MI >_{pol} CCE.$$
First, the subsequent problems and the questions leading to the corresponding decision problems are defined, then the hierarchy is established.

Definition 12. An instance of the problem Minimal Intersection (MI) is given by a tuple $I_{MI} = (C, \Gamma = \{C_0, \ldots, C_{num-1}\}, r, k)$:
1. C is a finite set of elements,
2. $C_i \subseteq C$, $0 \le i < num$,
3. $1 < r < num$ is a fixed but arbitrary positive integer,
4. k is a positive integer.
A solution to MI is given by $\Gamma^* \subset \Gamma$ of size r or less, with
$$\left| \bigcap_{C^* \in \Gamma^*} C^* \right| \le k. \qquad (6.1)$$
The decision problem MI is defined by the question: Does there exist a solution to $I_{MI}$?

Definition 13. An instance $I_{EI}$ of the problem Empty Intersection (EI) is an instance of MI with k = 0. The decision problem EI is defined by the question: Given r, does there exist a solution to $I_{EI}$?
Definition 14. An instance of the problem Minimum Cover (MC) is defined by the tuple $I_{MC} = (C, \Gamma = \{C_0, \ldots, C_{num-1}\}, r)$:
1. C is a finite set of elements,
2. $C_i \subseteq C$, $0 \le i < num$,
3. $1 < r < num$ is a fixed but arbitrary positive integer.
A solution to MC is given by $\Gamma^* \subset \Gamma$ of size r or less, with
$$\bigcup_{C^* \in \Gamma^*} C^* = C.$$
The decision problem MC is defined by the question: Does there exist a solution to $I_{MC}$?

NP-completeness of the decision problem MC is known [GJ79]. Based on this, the other problems will be proven to be NP-complete.

Lemma 7. The decision problem EI is NP-complete.

Proof. EI is in NP: a non-deterministic Turing machine decides for each subset $C_i$ if it is chosen to be in $\Gamma^*$ or not. Then, the emptiness of the intersection is checked.

An instance of MC can be reduced to EI in polynomial time: Given an instance $I_{MC} = (C, \Gamma_{MC} = \{C_0, \ldots, C_{num-1}\}, r)$ of MC, this is transformed into an instance of EI, $I_{EI} = (C, \Gamma_{EI} = \{\overline{C_0}, \ldots, \overline{C_{num-1}}\}, r)$, i.e. the subsets $\overline{C_i}$ are the complement sets of those in $I_{MC}$. Let $\Gamma^*_{EI}$ be a solution to $I_{EI}$, i.e.
$$\bigcap_{C^* \in \Gamma^*_{EI}} C^* = \emptyset.$$
Now,
$$\Gamma^*_{MC} = \{\overline{C^*} \mid C^* \in \Gamma^*_{EI}\}$$
is a solution to $I_{MC}$:
$$\bigcup_{\overline{C^*} \in \Gamma^*_{MC}} \overline{C^*} = \overline{\bigcap_{\overline{C^*} \in \Gamma^*_{MC}} C^*} = \overline{\bigcap_{C^* \in \Gamma^*_{EI}} C^*} = \overline{\emptyset} = C.$$
An analogous construction yields a solution for $I_{EI}$, given a solution $\Gamma^*_{MC}$ of $I_{MC}$. Thus, $I_{EI}$ has a solution if and only if $I_{MC}$ has a solution.

Lemma 8. The decision problem MI is NP-complete.

Proof. Non-deterministically finding the solution is done as in the case of EI. Then, Equation (6.1) is validated.

EI is reduced to MI. Given an instance $I_{EI} = (C, \Gamma_{EI} = \{C_0, \ldots, C_{num-1}\}, r)$ of EI, an instance of MI is created: $I_{MI} = (C \cup K, \Gamma_{MI} = \{C_0 \cup K, \ldots, C_{num-1} \cup K\}, r, k)$. K is an auxiliary set of k new elements. From a solution $\Gamma^*_{MI}$ to $I_{MI}$ a solution to $I_{EI}$ can be retrieved:
$$\Gamma^*_{EI} = \{C^* \mid (C^* \cup K) \in \Gamma^*_{MI}\}$$
And vice versa:
$$\Gamma^*_{MI} = \{(C^* \cup K) \mid C^* \in \Gamma^*_{EI}\}$$
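The two reductions used in Lemma 7 and Lemma 8 can be replayed on a small instance. The sketch below is an illustration only: the decision procedures enumerate all choices of at most r sets by brute force (exponential in general), and the function names, the tagged tuples used for the auxiliary set K, and the toy instance are assumptions made for this example.

```python
from itertools import combinations


def has_cover(universe, sets, r):
    """MC: do at most r of the sets have the whole universe as their union?"""
    return any(set().union(*choice) == universe
               for size in range(1, r + 1)
               for choice in combinations(sets, size))


def has_small_intersection(sets, r, k):
    """MI: do at most r of the sets intersect in at most k elements?
    EI is the special case k = 0."""
    return any(len(set.intersection(*choice)) <= k
               for size in range(1, r + 1)
               for choice in combinations(sets, size))


def mc_to_ei(universe, sets):
    """Lemma 7: replace every set by its complement."""
    return [universe - c for c in sets]


def ei_to_mi(sets, k):
    """Lemma 8: add an auxiliary set K of k new elements to every set."""
    padding = {("aux", i) for i in range(k)}
    return [c | padding for c in sets]


# Hypothetical toy instance.
universe = {1, 2, 3, 4}
sets = [{1, 2}, {3}, {3, 4}, {1}]
complements = mc_to_ei(universe, sets)
padded = ei_to_mi(complements, 2)
for r in (1, 2):
    print(r, has_cover(universe, sets, r),
          has_small_intersection(complements, r, 0),
          has_small_intersection(padded, r, 2))
```

For each r the three answers agree on this instance, as the lemmas require: a cover of the universe corresponds to an empty intersection of the complements, and padding with K shifts the bound on the intersection from 0 to k.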
By using the NP-completeness of MI, the selection of counterexamples can be proven to be NP-complete as well, as the following theorem states.

Theorem 5. The decision problem CCE is NP-complete.

Before proving the theorem, the idea of the proof will be illustrated by an example. An instance of MI is reduced to an instance of CCE. This is done by creating a circuit. An intersection of gates sensitized by counterexamples corresponds to an intersection of sets in MI.
Example 25. Let $I_1 = (C = \{g_1, \ldots, g_{15}\}, \Gamma = \{C_0, \ldots, C_4\}, 2, 2)$ be an instance of MI, where:
$C_0 = \{g_1, g_2, g_4, g_5, g_6, g_7, g_8, g_9, g_{10}, g_{11}\}$
$C_1 = \{g_1, g_2, g_4, g_5, g_6, g_7, g_{12}, g_{13}\}$
$C_2 = \{g_1, g_2, g_4, g_6, g_{10}\}$
$C_3 = \{g_1, g_2, g_3, g_{11}, g_{12}, g_{13}, g_{14}, g_{15}\}$
$C_4 = \{g_1, g_2, g_3, g_{12}, g_{13}\}$
For this instance the circuit in Figure 6.11 is constructed. The circuit differs from the specification due to the extra inverter at the top. A binary vector $i = (x_2, x_1, x_0)$ is an input assignment to this circuit. Let i also denote the decimal value of this vector. Then, an input assignment i with a value i, 0 ≤ i ≤ 4, corresponds to counterexample i. This counterexample is observed at output $y_i$ and sensitizes the gates in set $C_i$.

The circuit is composed of three modules: the subset circuit, the propagation array, and a decoder. The purpose of the subset circuit is to model all non-empty intersections of sets $C_i$. The decoder guarantees that an erroneous output value is only generated if an input value from {0, . . . , num − 1} is applied. The AND-gates in the propagation array propagate the erroneous value at $c_i$ to output $y_i$ if output i of the decoder is one. Applying any input value greater than or equal to num sets all outputs to 0 and, therefore, is not a counterexample.

For example, consider counterexample number one corresponding to the input assignment (0, 0, 1) to primary inputs $(x_2, x_1, x_0)$. This counterexample is observed only at output $y_1$, which evaluates to 0 instead of 1. The PT procedure marks all gates along the bold lines as candidate error sites. Besides the gates in $C_1$ there are only AND-gates, an OR-gate, and the extra inverter on these lines. Therefore, by applying counterexample number one, $C_1$ can be retrieved. In general, counterexample i leads to marking gates in $C_i$ plus (num + 3) gates.
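The instance of Example 25 is small enough to be checked exhaustively. The following sketch enumerates all choices of exactly r = 2 sets and reports the smallest intersection; gates are represented by their indices, and the function name is an assumption made for this illustration.

```python
from itertools import combinations

# Candidate sets of Example 25, gates given by their index (g_1 -> 1, ...).
SETS = {
    0: {1, 2, 4, 5, 6, 7, 8, 9, 10, 11},
    1: {1, 2, 4, 5, 6, 7, 12, 13},
    2: {1, 2, 4, 6, 10},
    3: {1, 2, 3, 11, 12, 13, 14, 15},
    4: {1, 2, 3, 12, 13},
}


def minimal_intersections(sets, r):
    """Return the smallest intersection size over all choices of r sets
    and the list of choices (tuples of set indices) that reach it."""
    scored = [(len(set.intersection(*(sets[i] for i in choice))), choice)
              for choice in combinations(sorted(sets), r)]
    best = min(size for size, _ in scored)
    return best, [choice for size, choice in scored if size == best]


best_size, best_choices = minimal_intersections(SETS, 2)
print(best_size, best_choices)
print(SETS[2] & SETS[3])
```

The smallest intersection of two sets contains two gates, so the instance with k = 2 has a solution, while no pair of sets reaches an intersection of fewer than two gates because $g_1$ and $g_2$ belong to every set.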
For example, counterexamples 2 and 3 lead to the minimal intersection of sensitized gates $\{g_1, g_2\}$, and $\{C_2, C_3\}$ is also a solution to $I_1$.

The proof of Theorem 5 is sketched in the following.

Sketch of proof. Answering the question by nondeterministically choosing a set $\Gamma^*$ and checking if this is a solution to an instance $I_{CCE}$ of CCE is done as before.
Figure 6.11. Circuit corresponding to the instance $I_1$ of MI (modules: extra inverter, subset circuit, propagation array, and decoder with inputs $x_0$, $x_1$, $x_2$ and outputs 0 to 7; buffers annotated with index sets: $g_1$, $g_2$: {0,1,2,3,4}; $g_3$: {3,4}; $g_4$, $g_6$: {0,1,2}; $g_5$, $g_7$: {0,1}; $g_8$, $g_9$: {0}; $g_{10}$: {0,2}; $g_{11}$: {0,3}; $g_{12}$, $g_{13}$: {1,3,4}; $g_{14}$, $g_{15}$: {3}; lines $c_0$ to $c_4$; outputs $y_0$ to $y_4$, where $y_1$ shows 0/1 and all other outputs show 0)
The polynomial time reduction of an instance $I_{MI} = (C, \Gamma = \{C_0, \ldots, C_{num-1}\}, r, k)$ to an instance $I_{CCE}$ of CCE with r and k + 1 is shown. First the circuit is created; then the one-to-one correspondence of solutions is shown.
1.  SubsetCircuit(Γ)
2.    For each $v \in \bigcup_{i=0}^{num-1} C_i$:
3.      Ind(v) = {i : v ∈ $C_i$ ∧ $C_i$ ∈ Γ}
4.      If no list L(Ind(v)) exists, create an empty list L(Ind(v)).
5.      Push v into L(Ind(v)).
6.    For each previously generated list L(I), I ⊂ ℕ:
7.      Create a line of buffers labeled by elements in L(I).
8.      Connect the input of the first buffer to the output of the extra inverter.
9.      Connect the output of the last buffer to all OR-gates with output $c_i$ and i ∈ I.
10.   Return the circuit C.

Figure 6.12. Algorithm to build the subset circuit
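The first step of the algorithm in Figure 6.12 (Lines 2–5) groups the elements by the index set Ind(v). A minimal sketch of this grouping is given below, applied to the sets of Example 25 with gates represented by their indices. The function name and the dictionary representation of the lists L(I) are assumptions; the construction of buffers and connections (Lines 6–9) is not modeled.

```python
from collections import defaultdict


def index_lists(candidate_sets):
    """Group every element v by Ind(v), the set of indices i with v in C_i.
    Each resulting list corresponds to one line of buffers."""
    lists = defaultdict(list)
    for v in sorted(set().union(*candidate_sets)):
        ind = frozenset(i for i, c in enumerate(candidate_sets) if v in c)
        lists[ind].append(v)
    return dict(lists)


# Sets C_0 ... C_4 of Example 25, gates given by their index.
example = [
    {1, 2, 4, 5, 6, 7, 8, 9, 10, 11},
    {1, 2, 4, 5, 6, 7, 12, 13},
    {1, 2, 4, 6, 10},
    {1, 2, 3, 11, 12, 13, 14, 15},
    {1, 2, 3, 12, 13},
]
lines_of_buffers = index_lists(example)
for ind, gates in lines_of_buffers.items():
    print(sorted(ind), gates)
```

Since every element ends up in exactly one list, the number of buffers equals the number of elements, which is consistent with the polynomial size of the subset circuit.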
The generalization of the circuit in Example 25 is straightforward. The circuit can be built in polynomial time. The decoder of a log(num)-bit binary input value to a unary output has size O(num). The propagation array has size $O(num^2)$ and has a regular structure. What remains is the subset circuit, which has to represent all non-empty intersections of subsets of Γ. This circuit is built by the algorithm in Figure 6.12. During the first step lists are created (Lines 2–5). Each list is named by a set of indices Ind(v) and contains all elements v ∈ C that occur in the intersection of all $C_i$ indicated by Ind(v). In the second step, these lists are used to build the necessary structures to represent non-empty intersections of sets (Lines 6–9).

In the resulting circuit, an input value i ≥ num is not a counterexample: due to the OR of all outputs of the decoder greater than num − 1, all primary outputs assume the correct value zero. If a value i < num is applied at the inputs, exactly those buffers in the subset circuit labeled by v, where v ∈ $C_i$, are sensitized as follows: Due to the structure of the propagation array, an erroneous value is only observable at output $y_i$. This erroneous value is 0 instead of 1. The controlling value of the previous AND-gate is 0. Thus, only AND-gates on the path from output $y_i$ to line $c_i$ but no other gates are sensitized. All inputs of the OR-gate with output $c_i$ have a value of 0, i.e. are noncontrolling. Thus, all input lines of this OR-gate are sensitized. If an output of a buffer is sensitized, the input is sensitized as well. Thus, all buffers labeled with v (v ∈ $C_i$) are sensitized due to the construction of the subset circuit.

Besides the buffers in the subset circuit, only the extra inverter, the OR-gate, and AND-gates in the propagation array are sensitized by any counterexample. The AND-gates and OR-gates sensitized by different counterexamples are disjoint.
This leads to a one-to-one correspondence of solutions for $I_{MI}$ and $I_{CCE}$. An intersection of elements in a subset $\Gamma^* \subseteq \Gamma$ results in the set of gates sensitized by all the counterexamples $\{i : C_i \in \Gamma^*\}$. Because $|\Gamma^*| = r > 1$, no additional AND-gates and OR-gates occur in the intersection. Only the extra inverter is additionally sensitized.

Thus, it has been proven that choosing the optimal set of counterexamples is difficult. Even if all counterexamples are given, the choice of the best set cannot be done efficiently (provided that P ≠ NP). If the number of counterexamples is restricted, e.g. r ≤ 4, and all counterexamples are given, all possible subsets up to size r can be enumerated to find the subset leading to the smallest number of candidates. In practice, even the number of counterexamples num may be exponential in the number n of primary inputs of the circuit. So solving a practical instance of CCE is difficult. On the other hand, heuristics are often used to solve NP-complete problems. These heuristics frequently find good, although nonoptimal, solutions for a given problem. Moreover, randomly generating a solution is often possible, but the result is usually far from optimal. This motivates the investigation of heuristics to solve instances of CCE.
6.2.2 Heuristics to Choose Counterexamples
Choosing the best set of counterexamples, i.e. the set that leads to the smallest number of candidate error sites, is difficult. For this reason, two heuristics to choose counterexamples are proposed. The heuristics deal with different situations. The first heuristic is based on a distance metric. This makes it possible to determine what the next counterexample “should look like”. Therefore, the metric also shows how to generate counterexamples to achieve a good diagnosis result. For the second heuristic all counterexamples are given and an efficient choice among them has to be made. Both heuristics are based on the observations given in the next subsection.
6.2.2.1 Observations

The heuristics have to guide the selection of counterexamples such that the sensitized paths have a small intersection. The following observations show which conditions lead to a small number of gates on a sensitized path and to different sensitized paths for two counterexamples.

Observation 1. A nonspecified input is good because it is mostly not sensitized: A counterexample is observed at an output with a defined value; thus, nonspecified input values at gates are not marked by PT.

Observation 2. If the same input is assigned the opposite polarity in two counterexamples, this often leads to a controlling value in one case and to a non-controlling value in the other case. Different paths are sensitized.
Observation 3. Counterexamples that are observed at different outputs lead to different sensitized paths that reconverge.

The heuristics are designed in such a way that these observations are taken into account.
6.2.2.2 Maximum Distance Heuristic

This heuristic uses a distance metric to choose counterexamples, and, in addition, the definition of the metric can be considered a guideline for the generation of counterexamples. A distance defined between two counterexamples makes it possible to determine what the next counterexample “should look like”. To evaluate the heuristic, a greedy algorithm chooses from all counterexamples such that the sum of pairwise distances is maximized.

The following distance between counterexamples is derived from the observations in Section 6.2.2.1. Note that this is not a distance in the mathematical sense since $d(\mathcal{T}, \mathcal{T}) > 0$ for certain $\mathcal{T}$.

Definition 15. Let $d(x, y)$, where $x, y \in \{0, 1, -\}$, be defined by
$$d(x, y) := \begin{cases} 3, & \text{if } x = - \text{ and } y = - \\ 2, & \text{if either } x = - \text{ or } y = - \\ 1, & \text{if } x \ne y \text{ and } x \ne - \text{ and } y \ne - \\ 0, & \text{if } x = y \text{ and } x \ne - \text{ and } y \ne - \end{cases}$$
The distance $d(\mathcal{T}, \mathcal{T}')$ between two counterexamples $\mathcal{T}$ and $\mathcal{T}'$,
$$\mathcal{T} = (T, y, \nu), \text{ where } T = (U, u_0),\ U = (x_1, \ldots, x_n),\ u_0 = (\nu_{x_1}[0], \ldots, \nu_{x_n}[0]),$$
$$\mathcal{T}' = (T', y', \nu'), \text{ where } T' = (U, u_0'),\ U = (x_1, \ldots, x_n),\ u_0' = (\nu'_{x_1}[0], \ldots, \nu'_{x_n}[0]),$$
is defined by
$$d(\mathcal{T}, \mathcal{T}') := (y \ne y') \cdot 3n + \sum_{i=1}^{n} d(\nu_{x_i}[0], \nu'_{x_i}[0]),$$
where the comparison $(y \ne y')$ evaluates to 1 if it holds and to 0 otherwise.

The values assigned to $d(\nu_{x_i}[0], \nu'_{x_i}[0])$ capture the first two observations. The term $(y \ne y') \cdot 3n$ increases the distance value of counterexamples that are observed at different outputs.

The greedy algorithm in Figure 6.13 uses the distance. Given the set of all counterexamples $\mathbf{T}$, the algorithm selects r > 2 counterexamples and inserts them into the new set $\mathbf{T}'$. The number of calls to calculate the distance is bounded by $O(num^2 + num \cdot r^2)$. Each calculation of the distance takes O(n), where n is the number of inputs. Thus, the algorithm runs in time $O((num^2 + num \cdot r^2) \cdot n)$.

1.  GreedySelect(r, $\mathbf{T} = \{\mathcal{T}^1, \ldots, \mathcal{T}^{num}\}$)
2.    count := 2.
3.    Choose counterexamples $\mathcal{T}$, $\mathcal{T}'$ from $\mathbf{T}$ such that $d(\mathcal{T}, \mathcal{T}')$ is maximal.
4.    Move $\mathcal{T}$ and $\mathcal{T}'$ from $\mathbf{T}$ into $\mathbf{T}'$.
5.    While count < r:
6.      Choose a counterexample $\mathcal{T}$ from $\mathbf{T}$ to maximize $\sum_{\mathcal{T}'' \in \mathbf{T}'} d(\mathcal{T}, \mathcal{T}'')$.
7.      Move $\mathcal{T}$ from $\mathbf{T}$ into $\mathbf{T}'$.
8.      count++.
9.    Return $\mathbf{T}'$.

Figure 6.13. Greedy algorithm to choose counterexamples
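The distance of Definition 15 and the greedy selection of Figure 6.13 can be sketched as follows. The representation is an assumption made for this illustration: a counterexample is a pair of an input string over '0', '1', '-' and the name of the output where it is observed; the sample counterexamples are hypothetical.

```python
from itertools import combinations


def d_value(x, y):
    """Distance of two input values from {'0', '1', '-'} (Definition 15)."""
    if x == "-" and y == "-":
        return 3
    if x == "-" or y == "-":
        return 2
    return 1 if x != y else 0


def distance(a, b):
    """Distance of two counterexamples given as (inputs, output) pairs."""
    (inputs_a, out_a), (inputs_b, out_b) = a, b
    n = len(inputs_a)
    output_term = 3 * n if out_a != out_b else 0
    return output_term + sum(d_value(x, y) for x, y in zip(inputs_a, inputs_b))


def greedy_select(r, counterexamples):
    """Start with the pair of maximal distance, then repeatedly add the
    counterexample with the largest summed distance to the chosen ones."""
    pool = list(counterexamples)
    first, second = max(combinations(pool, 2), key=lambda p: distance(*p))
    chosen = [first, second]
    pool.remove(first)
    pool.remove(second)
    while len(chosen) < r and pool:
        best = max(pool, key=lambda t: sum(distance(t, c) for c in chosen))
        pool.remove(best)
        chosen.append(best)
    return chosen


# Hypothetical counterexamples over three inputs and two outputs.
cexs = [("000", "y0"), ("001", "y0"), ("11-", "y0"), ("0-0", "y1")]
selected = greedy_select(3, cexs)
print(selected)
```

In this toy run the first pair combines different outputs, opposite polarities, and unspecified inputs, in line with the three observations. Note that `distance(t, t)` is positive for a counterexample with unspecified inputs, which is why the measure is not a distance in the mathematical sense.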
6.2.2.3 Efficient Heuristic on BDDs

Having all counterexamples given by BDDs is a different situation compared to the generation of counterexamples. It is desirable to have a more efficient algorithm to heuristically choose counterexamples. The algorithm explained in this section runs in time O(r · n), where n is the number of inputs. Again, the observations from Section 6.2.2.1 are used to create the heuristic.

Observation 3 suggests choosing counterexamples for different outputs. Let o be the number of outputs where at least one counterexample can be observed, i.e. the BDDs that represent the counterexamples for these outputs resulting from Equation (2.2) are different from constant 0. From each BDD $r' = \lfloor r/o \rfloor$ counterexamples are chosen; the remaining $r - o \lfloor r/o \rfloor$ counterexamples are chosen from different BDDs.

Each time a value a ∈ {0, 1} is assigned to an input $x_j$ with respect to a counterexample, a corresponding counter $count[x_j][a]$ is incremented. Thus, the counter keeps track of the number of counterexamples that assign a certain value to a primary input.

Given a node v, the successor on the path to one has to be chosen from the two children of the node. If the successor is chosen to be Then(v), value 1 is assigned to primary input Label(v), otherwise 0 is assigned. Choosing the successor is done using the following rules, which are ordered by decreasing priority; π denotes the variable order:
1. Do not choose the terminal 0.
2. Choose the child that was not yet visited.
3. If π(Index(Then(v))) > π(Index(Else(v))), choose Then(v).
4. If π(Index(Else(v))) > π(Index(Then(v))), choose Else(v).
5. If (count[Label(v)][0] < count[Label(v)][1]), choose Else(v); otherwise, choose Then(v).

Rule 1 ensures that the chosen assignment is part of a counterexample. Rule 2 ensures that different branches of the BDD are visited. Rules 3 and 4 maximize the number of do not cares in the counterexample (see Observation 1) by jumping over as many levels as possible. Rule 5 leads to counterexamples that have different values for a variable (see Observation 2).
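The five rules can be illustrated on a hand-built BDD. The sketch below does not use a BDD package; the `Node` class, the terminal objects, the treatment of terminals as lying below all variables in the order, and the decision to record the terminal 1 as visited are assumptions made for this illustration. The BDD is assumed to be reduced and different from constant 0, so that every inner node has a path to the terminal 1.

```python
class Node:
    """Inner BDD node: variable label, Then child and Else child."""
    def __init__(self, var, then, else_):
        self.var, self.then, self.else_ = var, then, else_


ZERO, ONE = object(), object()  # terminal nodes


def level(node, order):
    """Position in the variable order; terminals lie below all variables."""
    return len(order) if node is ZERO or node is ONE else order.index(node.var)


def choose_counterexample(root, order, count, visited):
    """Walk from the root to terminal 1 following rules 1-5.
    Returns the partial assignment; unassigned inputs are do not cares."""
    assignment = {}
    node = root
    while node is not ONE:
        t, e = node.then, node.else_
        if t is ZERO:                                   # rule 1
            take_then = False
        elif e is ZERO:                                 # rule 1
            take_then = True
        elif (t in visited) != (e in visited):          # rule 2
            take_then = t not in visited
        elif level(t, order) != level(e, order):        # rules 3 and 4
            take_then = level(t, order) > level(e, order)
        else:                                           # rule 5
            take_then = not (count[node.var][0] < count[node.var][1])
        value = 1 if take_then else 0
        assignment[node.var] = value
        count[node.var][value] += 1
        node = t if take_then else e
        visited.add(node)
    return assignment


# Hypothetical BDD for f = x0 ? (x1 or x2) : x2 with order x0, x1, x2.
order = ["x0", "x1", "x2"]
n2 = Node("x2", ONE, ZERO)
n1 = Node("x1", ONE, n2)
root = Node("x0", n1, n2)
count = {v: [0, 0] for v in order}
visited = set()
first = choose_counterexample(root, order, count, visited)
second = choose_counterexample(root, order, count, visited)
print(first, second)
```

Each call follows a single path from the root to the terminal 1, so it takes at most n steps, and r calls stay within the O(r · n) bound stated above. In the toy run, the first path skips x1 (a do not care), and the second call is steered to the branch that was not yet visited.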
6.2.3 Experimental Results
Experiments were carried out on an AMD Athlon 2.2+ GHz with 512 MB running under Linux. The benchmarks were taken from the LGSynth93 set. Into each of the circuits a single error that changes the function of exactly one gate was randomly injected. Only those erroneous circuits were considered in the experiments where the number of counterexamples was between 50 (to really have a search problem) and 200 (to be able to determine the optimal choice of counterexamples). For each of the circuits 100 erroneous instances were generated and then diagnosed by five techniques. At first, all counterexamples were taken into account; then the results for the optimal choice of r counterexamples were calculated by simply iterating over all possible choices. Then, r counterexamples were chosen randomly, by the maximum distance heuristic, and by the efficient heuristic for BDDs.

Table 6.4 gives information about the circuits considered. The name, number of inputs and outputs, and the number of gates (as given by the original description) are listed. Column #cand. gives the average number of gates marked as candidate error sites when all counterexamples were applied, i.e. #cand. $= |\bigcap C_i|$ due to the single fault assumption. The number of gates that were marked was in the range from only a few up to 100% depending on the circuit structure and the error location. Also, the number of counterexamples was spread within the full range allowed.

Tables 6.5, 6.6, and 6.7 give experimental results for using two, three, and four counterexamples, respectively. All results are given relative to the overall optimum of using all counterexamples. In column av. the arithmetic mean factor of gates marked by the heuristic over gates marked by all counterexamples is given, i.e.
$$\frac{1}{100} \sum_{i=1}^{100} \frac{\text{gates marked by the heuristic}}{\text{gates marked using all counterexamples}}.$$
Table 6.4. Circuit data

Circuit    In   Out   Gates   #Cand.
9sym        9     1     269     46.7
alu2       10     6     261      7.2
alu4       14     8    2416     14.7
apex2      39     3    3227     56.6
apex5     117    88    2734      7.9
cordic     23     2    2103    873.7
dalu       75    16    2699     33.1
des       256   245    2763     65.3
e64        65    65     717     17.7
ex4p      128    28    1593     17.0
ex5p        8    63    1477      8.9
seq        41    35    2991     23.1
t481       16     1    1091    147.1
x3        135    99    1638      9.8
Table 6.5. Results using two counterexamples

             Opt                  Rand                    Dist                    BDD
Circuit     Av.    Dev.     Av.    Dev.    Time     Av.    Dev.    Time     Av.    Dev.    Time
9sym      1.413   0.624  15.657  27.240  0.01 s   2.736   4.334  0.01 s   2.574   2.518  0.01 s
alu2      1.818   1.098  16.680  24.731  0.01 s   7.152   9.979  0.01 s   6.149   8.165  0.01 s
alu4      1.110   0.487   3.454  10.028  0.02 s   1.289   1.090  0.03 s   1.381   1.443  0.02 s
apex2     1.094   0.205   2.392   4.516  0.03 s   1.660   1.342  0.06 s   1.801   3.755  0.03 s
apex5     1.039   0.129   4.040   9.990  0.02 s   1.636   1.170  0.08 s   2.829   7.671  0.02 s
cordic    1.005   0.015   1.649   6.331  0.02 s   1.011   0.054  0.04 s   1.006   0.020  0.02 s
dalu      1.193   0.458   3.961   8.199  0.02 s   2.652   3.157  0.04 s   2.409   2.632  0.02 s
des       1.021   0.055   1.141   0.182  0.02 s   1.086   0.416  0.15 s   1.033   0.123  0.02 s
e64       1.068   0.245  10.157  15.532  0.01 s   3.090   5.471  0.02 s   8.143  16.626  0.01 s
ex4p      1.072   0.176   2.554   6.203  0.01 s   1.279   0.559  0.05 s   1.303   0.679  0.01 s
ex5p      1.351   0.792  18.299  53.541  0.01 s   7.392  17.146  0.02 s   5.929  15.475  0.01 s
seq       1.039   0.105   2.053   3.790  0.03 s   1.440   1.352  0.04 s   1.280   0.635  0.03 s
t481      1.140   0.310   7.464  22.014  0.01 s   4.088  16.320  0.02 s   3.567  10.462  0.01 s
x3        1.210   0.425   4.132   6.181  0.01 s   2.382   2.386  0.06 s   2.078   1.821  0.01 s
The best average factor among the three heuristics is denoted in bold letters for each benchmark. Column dev. gives the standard deviation. Column time gives the average run time for choosing the counterexamples and diagnosing the error. As can be seen from Table 6.5, randomly choosing two counterexamples does not lead to a good reduction of candidate error sites, and the result differs widely from one case to the next (see the large values of the standard deviation). The heuristics are more reliable, but there are also cases where one heuristic performs rather poorly while the other one comes much closer to the overall optimum (e.g. e64).
Table 6.6. Results using three counterexamples

             Opt                  Rand                    Dist                    BDD
Circuit     Av.    Dev.     Av.    Dev.    Time     Av.    Dev.    Time     Av.    Dev.    Time
9sym      1.053   0.116   1.948   1.433  0.01 s   2.232   2.000  0.01 s   2.448   2.423  0.01 s
alu2      1.162   0.285   9.025  11.897  0.01 s   4.712   7.167  0.01 s   4.132   5.507  0.01 s
alu4      1.038   0.201   1.349   1.463  0.03 s   1.231   0.845  0.04 s   1.340   1.396  0.03 s
apex2     1.003   0.025   1.847   3.717  0.05 s   1.074   0.179  0.07 s   1.674   3.673  0.05 s
apex5     1.000   0.000   2.518   4.399  0.02 s   1.375   0.976  0.09 s   1.531   1.034  0.02 s
cordic    1.003   0.010   1.006   0.020  0.03 s   1.006   0.019  0.05 s   1.006   0.020  0.03 s
dalu      1.048   0.193   2.586   3.267  0.02 s   1.529   1.177  0.05 s   2.188   2.387  0.02 s
des       1.003   0.017   1.053   0.075  0.03 s   1.037   0.191  0.16 s   1.021   0.104  0.03 s
e64       1.000   0.000   6.278   8.920  0.01 s   2.674   4.899  0.02 s   8.001  16.451  0.01 s
ex4p      1.002   0.008   2.123   5.599  0.01 s   1.149   0.324  0.05 s   1.204   0.559  0.01 s
ex5p      1.144   0.404  18.156  49.285  0.02 s   5.031  13.726  0.02 s   4.967  13.150  0.02 s
seq       1.007   0.039   1.479   1.224  0.03 s   1.184   0.502  0.05 s   1.212   0.514  0.03 s
t481      1.036   0.111   3.495  10.478  0.02 s   1.197   0.526  0.02 s   3.502  10.442  0.02 s
x3        1.027   0.178   2.534   2.721  0.01 s   1.433   0.939  0.06 s   1.640   1.187  0.01 s
Table 6.7. Results using four counterexamples

             Opt                  Rand                    Dist                    BDD
Circuit     Av.    Dev.     Av.    Dev.    Time     Av.    Dev.    Time     Av.    Dev.    Time
9sym      1.026   0.065  14.133  24.820  0.01 s   2.207   1.989  0.01 s   1.730   1.186  0.01 s
alu2      1.043   0.128   7.102   8.917  0.01 s   3.684   6.228  0.02 s   3.351   3.666  0.01 s
alu4      1.015   0.080   1.281   0.921  0.04 s   1.226   0.820  0.05 s   1.226   0.861  0.04 s
apex2     1.000   0.000   2.541   4.922  0.06 s   1.066   0.160  0.09 s   1.216   0.639  0.06 s
apex5     1.000   0.000   1.975   2.742  0.03 s   1.352   0.958  0.09 s   1.484   0.995  0.03 s
cordic    1.002   0.007   1.004   0.017  0.04 s   1.005   0.017  0.06 s   1.006   0.019  0.04 s
dalu      1.010   0.062   1.789   1.556  0.03 s   1.464   1.115  0.06 s   2.059   2.232  0.03 s
des       1.000   0.000   1.079   0.231  0.04 s   1.034   0.173  0.18 s   1.018   0.104  0.04 s
e64       1.000   0.000   4.688   6.806  0.01 s   2.674   4.899  0.03 s   7.071  13.859  0.01 s
ex4p      1.001   0.003   1.943   5.309  0.01 s   1.111   0.243  0.06 s   1.166   0.482  0.02 s
ex5p      1.049   0.192   7.392  18.086  0.03 s   4.550  13.074  0.03 s   4.526  12.021  0.02 s
seq       1.001   0.008   1.201   0.435  0.04 s   1.153   0.387  0.06 s   1.159   0.407  0.04 s
t481      1.000   0.001   1.159   0.300  0.02 s   1.167   0.407  0.03 s   3.452  10.427  0.02 s
x3        1.010   0.071   1.978   1.685  0.01 s   1.337   0.764  0.07 s   1.427   0.848  0.01 s
Both heuristics lead to roughly the same reduction for most benchmarks when three counterexamples are used (Table 6.6). In some cases, the maximum distance heuristic achieves better results. This is because it uses a global decision scheme, whereas the two directives of the BDD heuristic, “do not choose zero” and “jump over a large number of levels”, are local decisions. In all but a few cases, the heuristics were superior to the random choice.
Figure 6.14. Number of candidates (x-axis: # counterexamples, 0 to 100; y-axis: # candidates, 0 to 70; curves: apex2 random, apex2 BDD, alu4 random, alu4 BDD)
Using four counterexamples further improves the result of PT (Table 6.7). For the investigated examples, even the random choice often performed well. But there are cases where more counterexamples improve the diagnosis quality of the random choice only slightly (e.g. alu2 or ex5p). This effect does not always disappear when more counterexamples are used, as Figure 6.14 shows. Given are results for benchmarks alu4 and apex2 with respect to the random heuristic and the BDD heuristic for a particular error. In this series of experiments, the maximum number of counterexamples was not restricted since no optimal choice was calculated. For the error injected into alu4, 985 counterexamples were available. Even the random choice performs well in this case because randomly choosing similar counterexamples from a large set is unlikely. For apex2, the number of counterexamples was 1417. Here, the disadvantage of the random choice can be seen: the number of candidates does not decrease monotonically. Even when 100 counterexamples are used, the random choice leads to twice as many candidates as the BDD heuristic. Compared to the total number of counterexamples, a small fraction is sufficient to achieve good diagnosis results in both cases. Figure 6.15 shows the total time needed for diagnosis, i.e. the selection and PT. The time increases linearly with the number of counterexamples and is still moderate for 100.

In summary, applying more than one counterexample can significantly reduce the number of candidate errors, but using a large number is not necessary. Diagnosis is improved even if the counterexamples are chosen randomly from the set of all counterexamples. But using a heuristic to select counterexamples that are as different as possible is more reliable. An efficient algorithm was presented that can be applied when all counterexamples are given by BDDs.
Figure 6.15. Time for diagnosis (x-axis: # counterexamples, 0 to 100; y-axis: CPU sec, 0 to 1.8; curves: apex2 random, apex2 BDD, alu4 random, alu4 BDD)

6.3 Debugging Properties
In the previous sections, diagnosis and debugging have only been considered in the context of equivalence checking. The underlying problem was combinational and results were presented at the gate level. Moreover, for a counterexample there was always a correct output response available. Now, property checking is considered. For this purpose, diagnosis has to be extended to the sequential case, results have to be presented at the source code level and, usually, no unique correct output response is available.

Currently, there is not much tool support for debugging the failure of formal properties. Different methods have been proposed to understand the essence of a failure by improving the understanding of a counterexample. For example, partitioning the counterexample into parts which force the failure and parts which try to avoid it is proposed in [JRS04]. In [RS04], counterexamples are reduced by removing irrelevant parts. Other approaches try to provide a fault explanation by investigating related traces [BNR03, GV03, RR03, Gro04]. The differences between failure traces and successful traces give an indication of the parts of a software program that are likely involved in the failure. In [CIW+03], all counterexamples are considered and particular value assignments are classified as necessary, irrelevant, or possible to show the failure. Tools for an enhanced presentation of counterexamples are also available. In [HTCT03], a simulator with reasoning capabilities is proposed to interactively analyze the cause of a value assignment or the outcome of a forced change of a signal value. All these approaches help to understand a failure. So far, however, methods that fully automate the localization of faults for temporal properties are missing.
In this section, an approach for automatic localization of fault candidates at the gate level or source code level for safety properties is proposed. The diagnosis uses a set of counterexamples that are obtained from either a formal verification tool or a run of a simulator with functional checkers. The proposed approach builds on model-based diagnosis [Rei87]. A failure is seen as a discrepancy between the required and the actual behavior of the system. Then diagnosis means to determine those components that, when assumed to be incorrect, explain the discrepancy. In [HD84], it is shown that for certain degenerate cases of sequential circuits model-based diagnosis marks all components as possible faults. Perhaps for this reason, there is little work on model-based diagnosis for sequential circuits, with the exception of [PW03] which does not take properties into account and applies a different fault model. The experimental results show that such degenerate cases rarely happen and that model-based diagnosis can be used successfully in the sequential case. Previous work in both the sequential and combinational case requires that a failure trace is given and the correct output for the trace is provided by the user. Here, instead of requiring a fixed error trace, it is assumed that a specification is given in Linear Time Logic (LTL) [Pnu77]. Counterexamples to a specification can be extracted automatically and the user does not need to provide the correct output: The necessary constraints on the outputs are contained in the specification. The diagnosis problem is stated as a SAT problem similar to the SAT-based diagnosis approach explained in Section 6.1. The construction is closely related to that used in BMC [BCCZ99]. For diagnosis, a counterexample of length tcyc is given. As in BMC, the circuit is unrolled to length tcyc and a propositional formula is built to decide whether the LTL property holds. 
If the inputs in the unrolled circuit are fixed to the values given in the counterexample and the property is constrained to hold, a contradiction is produced. The problem of diagnosis is the problem of resolving this contradiction. To resolve the contradiction, the model of the circuit is extended. A set of predicates is introduced, each of which asserts that a component functions incorrectly. If an abnormal predicate is asserted, the functional constraints between inputs and outputs of the component are suspended. The diagnosis problem is to find which abnormal predicates need to be asserted in order to resolve the contradiction. The set of satisfying assignments can be further restricted by requiring that the output of a gate must depend functionally on the inputs and the state of the circuit. Thus, the existence of a combinational correction is required. This makes it possible to extract a suggestion for the proper behavior of the suspect component from the satisfying assignments.
To improve the performance of the algorithm, a dedicated decision heuristic for the SAT solver is suggested. In the setting considered here, a small set of decision variables suffices to imply the values of all other variables. Restricting the decision variables to this set leads to a considerable speed-up and allows us to handle large and complex designs. The search space can be further pruned by applying a simulation-based preprocessing step. By calculating sensitized paths, the set of candidate error sites is pruned first. Only those components identified as candidates during the preprocessing step have to be considered during SAT-based diagnosis. The section is structured as follows. In Section 6.3.1, other diagnosis approaches are discussed. Section 6.3.2 gives the foundation of the diagnosis approach and presents how fault localization is performed. The applicability of the approach on the source level is shown in Section 6.3.3. Then, Section 6.3.4 gives experimental evidence of the efficiency of the approach.
6.3.1 Other Diagnosis Approaches
There is a large amount of literature on diagnosis and repair. Most of it is restricted to combinational circuits. Also, much of it is limited to simple faults such as a forgotten inverter, or an AND-gate that should be an OR. Such faults are likely to occur, for example, when a synthesis tool makes a mistake when optimizing the circuit. The work in [VH99] and [CWH93] on diagnosis at the gate level, for example, has both limitations. Sequential circuits are treated in [WB95] on the gate level but the approach is limited to simple faults. The fault model of [HC99] is more general, and it addresses sequential circuits but assumes that the correct output values are given. Its technical approach is also quite different from the one introduced here. The error model in [AVS+ 04] is similar but there correct output responses are always available and no functional consistency constraints are provided. The approach introduced in [ASV+ 05] has the same limitations. But there hierarchical relations are exploited during debugging. This is similar to hierarchical information that is available from source code annotations as will be explained later. Therefore, the same technique could be used to further increase the efficiency of the approach presented here. Both [Gro04] and [ZH02] work on the source code level (for hardware and software, respectively). Both are based on the idea of comparing which parts of the code are exercised by correct traces and incorrect traces that are similar. Only a few approaches have been proposed that are dedicated to fault location or correction for property checking. In [JGB05, SJB05], a game-based approach is proposed which locates a fault and provides a new function as a correction for a faulty component. Because it computes a repair, this approach is far less efficient than the one suggested here. In [FD05], a simulation-based approach using BSIM (see Section 6.1.1.2) is presented that is similar to the
current one but less accurate. Here, the simulation-based technique is used as a preprocessing step to prune the number of components considered during diagnosis.
6.3.2 Diagnosis for Properties
In this section, the new approach is described. The basic algorithm is introduced and extensions for run time improvements and accuracy improvements are explained. The section concludes with a discussion.
6.3.2.1 Computing Fault Candidates

To simplify the explanation, it is assumed that the components of the circuit are gates, that is, a fault candidate is always a single gate. The proper definition of components is considered in Section 6.3.3. Furthermore, the specification is given as a (single) LTL formula. The overall approach is a combination of BMC as explained in Section 2.3.2 and SAT-based diagnosis as introduced in Section 6.1.1.3. The basic procedure has four steps:

1. Create counterexamples
2. Build the unrolling of the circuit, taking into account that some components may be incorrect
3. Build a propositional representation of the property
4. Use a SAT solver to compute the fault candidates

The counterexamples to the property can be obtained using model checking [CGP99] or using dynamic verification [ABG+00]. It is advantageous to have many counterexamples available as this increases the discriminative power of the diagnosis algorithm. Techniques for obtaining multiple counterexamples have been studied in Section 6.2 and in [GKL04]. For simplicity, the case of using one counterexample (of length $t_{cyc}$) is considered first. Furthermore, finite counterexamples are assumed, that is, the liveness part of the specification is ignored.

The purpose of Step 2 and Step 3 is to construct a propositional formula $\psi_{Diag}$ such that the fault candidates can easily be extracted from the satisfying assignments for $\psi_{Diag}$. As stated before, the procedure is closely related to BMC, so the differences are addressed specifically. The unrolling of the circuit C and the creation of a SAT instance to check an LTL property have been explained in Section 2.3.2. In particular, the creation of the formula to describe the unrolled circuit as given in Equation (2.4) has been discussed in detail. A similar formula is used for diagnosis.
In order to perform diagnosis, a new propositional variable $ab_g$ is introduced for each gate $g$. Analogously to the combinational case of SAT-based diagnosis (see Section 6.1.1.3), the description $\psi_g[t]$ of gate $g$ at time frame $t$ is replaced by the formula

$$\hat{\psi}_g[t] = (\overline{ab_g} \rightarrow \psi_g[t]).$$

As explained before, if the abnormal predicate $ab_g$ is asserted, gate $g$ is selected for correction, i.e. no assumptions on its behavior at any time frame are made. If $ab_g$ is not asserted, the gate works as required. Now, given a single counterexample, the formula $\xi$ forces the inputs of the unrolled circuit to the values prescribed by the counterexample. Then, the description of the unrolling is given by

$$\hat{\psi}_C^{t_{cyc}} = \xi \cdot \prod_{t=0}^{t_{cyc}-1} \prod_{g \in V} \hat{\psi}_g[t].$$

The propositional formula $\psi_\Psi^{t_{cyc}}$ for the LTL formula $\Psi$ is created as explained in Section 2.3.2. Note that combining the description of the counterexample, the circuit, and the specification in a single SAT instance and forcing all abnormal predicates to false yields a contradiction. Let

$$\zeta_0 = \prod_{g \in V} \overline{ab_g}.$$

Then the following expression is contradictory:

$$z_\Psi[0] \cdot \psi_\Psi^{t_{cyc}} \cdot \hat{\psi}_C^{t_{cyc}} \cdot \zeta_0$$

A diagnosis is obtained by calculating which abnormal predicates can resolve the contradiction. For instance, for single fault candidates, let $\zeta_1$ state that at most one abnormal predicate is true. Then the diagnosis problem can be formulated as follows:

$$\zeta_1 = \sum_{g \in V} \prod_{h \neq g,\, h \in V} \overline{ab_h}$$

$$\psi_{Diag} = z_\Psi[0] \cdot \psi_\Psi^{t_{cyc}} \cdot \hat{\psi}_C^{t_{cyc}} \cdot \zeta_1$$
If $a$ is a satisfying assignment for $\psi_{Diag}$ and $a$ asserts $ab_g$, then $g$ is a fault candidate. As shown in Section 6.2, multiple counterexamples can be used to reduce the number of diagnosed components: Only an explanation that resolves the conflict for all counterexamples is a fault candidate. The propositional formula corresponding to this problem consists of one unrolling of the circuit for each counterexample. All sets of variables are disjoint, except for the abnormal predicates, which are shared.
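The effect of an abnormal predicate on the clauses of a single gate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: literals are signed integers in the DIMACS style, the variable numbering is hypothetical, and the pairwise "at most one" encoding is a standard CNF encoding chosen here for brevity rather than the form of ζ1 given above.

```python
from itertools import combinations, product

def guarded_and(ab, z, a, b):
    """CNF clauses for (NOT ab) -> (z <-> a AND b).

    Adding the literal ab to every clause of the ordinary AND-gate
    encoding suspends the gate constraints once ab is true.
    """
    return [[ab, -z, a], [ab, -z, b], [ab, z, -a, -b]]

def at_most_one(ab_vars):
    """Pairwise CNF encoding of 'at most one abnormal predicate is true'."""
    return [[-p, -q] for p, q in combinations(ab_vars, 2)]

def holds(clauses, assignment):
    """Evaluate a clause list under a complete assignment {var: bool}."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

# Hypothetical numbering: a=1, b=2, z=3, ab=4.
clauses = guarded_and(ab=4, z=3, a=1, b=2)
for a, b, z in product([False, True], repeat=3):
    normal = holds(clauses, {1: a, 2: b, 3: z, 4: False})
    suspended = holds(clauses, {1: a, 2: b, 3: z, 4: True})
    assert normal == (z == (a and b))  # gate behaves as an AND gate
    assert suspended                   # no constraint once ab is asserted
```

In a complete instance, one such guarded clause set would be generated per gate and time frame, with the same abnormal variable shared across all time frames and all counterexamples.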
Example 26. In the following, the process is illustrated using another simple arbiter with input req and output ack. The arbiter is supposed to acknowledge each request either instantaneously or in the next time frame, but it may not emit two consecutive acknowledgments. Let $s_1$ and $s_2$ be present state bits. State bit $s_1$ stores whether there is a pending request, and $s_2$ stores whether an acknowledge has occurred in the last step. The arbiter is defined by the following equations:

$$ack = (s_1 + req) \cdot s_2$$
$$next(s_1) = req \cdot ack$$
$$next(s_2) = ack$$

Furthermore, the initial values of $s_1$ and $s_2$ are 0. Note that the circuit as shown in Figure 6.16 contains a fault: ack should be $g_1 \cdot \overline{s_2}$. In LTL, the specification reads $G((\overline{req} + ack + X\,ack) \cdot (\overline{ack} + \overline{X\,ack}))$. The shortest counterexamples to the property have length two. For example, if requests occur in the first two time frames, ack is 0 in both frames, which violates the specification. Figure 6.17 shows the unrolled circuit combined with the unrolled LTL specification. The abnormal predicates can remove the relation between the input and the output of a gate. For instance, the clauses for gate $g_2$ are equivalent to $\overline{ab_{g_2}} \rightarrow (g_2 \leftrightarrow (g_1 \cdot s_2))$. Nothing is ascertained about $g_2$ when $ab_{g_2}$ is true.
Figure 6.16. Faulty arbiter circuit
Figure 6.17. Circuit with gate g2 as diagnosis ($\Omega = \overline{req} + ack + X\,ack$, $\Psi = \overline{ack} + \overline{X\,ack}$).
The gates below the horizontal dashed line correspond to the unrolled formula. The signal corresponding to the truth of the specification is labeled “valid” in the figure. For every time frame, the outputs of the gates in the unrolled formula correspond to a subformula of the specification. In the figure, the labels of the dashed horizontal lines indicate which subformula is represented by a gate output. It can easily be seen that valid is zero when two requests occur and all abnormal signals are set to zero. (Ignore for now the numbers in boxes.) Note that signals corresponding to the valuation of ack and G(Ψ · Ω) in time frame 2 appear in the figure (bottom right). The fact that the specification is false can be derived regardless of the values of these signals, since the counterexample is finite.

The question for the SAT solver is whether there is a consistent assignment to the signals that makes the specification true and sets only one of the abnormal predicates to true. One solution to this question is shown by the numbers in boxes in the figure. Gate g2 is assumed to be incorrect (as expected). For the circuit to be correct, it could return 1 in time frame 0 and 0 in time frame 1. The corresponding correction suggested by this satisfying assignment is that g2 should be 1 when g1 is 1 and s2 is 0, and 0 when both inputs to the gate are 1.
The contradiction cannot be explained by setting abg1 or abg3 to true which means that g2 is the only fault candidate.
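The diagnosis of Example 26 can be reproduced by a small brute-force sketch in Python that replaces the SAT solver by explicit enumeration. Several points are assumptions made for illustration: the gate mapping (g1 = s1 + req, g2 = g1 · s2 driving ack, g3 = req · ack driving next(s1)) is read off from the equations of the example as printed, the specification is taken to mean "every request is acknowledged now or in the next frame, and no two consecutive acknowledgments occur", and the value of ack in the frame after the counterexample is treated as unconstrained.

```python
from itertools import product

GATES = ("g1", "g2", "g3")  # gate names as assumed from Example 26

def simulate(req, abnormal, forced):
    """Run the faulty arbiter; the gate named `abnormal` (or None) takes
    its output in frame t from forced[t] instead of its function."""
    s1 = s2 = 0
    acks = []
    for t, r in enumerate(req):
        g1 = forced[t] if abnormal == "g1" else (s1 | r)
        g2 = forced[t] if abnormal == "g2" else (g1 & s2)
        ack = g2
        g3 = forced[t] if abnormal == "g3" else (r & ack)
        acks.append(ack)
        s1, s2 = g3, ack
    return acks

def spec_holds(req, ack):
    """Check the safety property on frames 0..len(req)-1.
    ack carries one extra entry for the frame after the counterexample."""
    for t in range(len(req)):
        served = (not req[t]) or ack[t] or ack[t + 1]
        no_double = (not ack[t]) or (not ack[t + 1])
        if not (served and no_double):
            return False
    return True

def fault_candidates(req):
    """Map each single-gate fault candidate to the output sequences
    that resolve the contradiction for the given counterexample."""
    result = {}
    for gate in GATES:
        for forced in product((0, 1), repeat=len(req)):
            acks = simulate(req, gate, forced)
            if any(spec_holds(req, acks + [free]) for free in (0, 1)):
                result.setdefault(gate, []).append(forced)
    return result

print(fault_candidates([1, 1]))
```

Under these assumptions, only g2 is reported, together with the two output sequences that the text discusses for it. A SAT solver explores the same space symbolically instead of enumerating it.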
6.3.2.2 Functionality Constraints

There is another satisfying assignment to the example just discussed: let g2 be 0 in the first step and 1 in the second. Note that there is no combinational correction to the circuit that implements this repair, as the inputs and states in both steps would be the same, but the output of g2 is required to be different. In fact, the approach may find diagnoses for which there is no combinational repair. It may even find diagnoses when the specification is not realizable. A similar observation is made in [Wot02] for multiple test cases. Now it is shown that by adding Ackermann constraints to the propositional formula ψDiag it can be guaranteed that for any diagnosis there is a fix that makes the circuit correct for at least the given set of counterexamples. The following example shows that the approach considered so far does not make any guarantees.

Example 27. Consider the unrealizable specification out ↔ X in, where out is an output and in is an input. If the circuit consists of one component c, connecting in and out, it is not hard to see that c is a diagnosis independent of the counterexample. Therefore, {c} is a valid correction as specified in Definition 9 in Section 6.1.

In the previous sections, only combinational circuits were considered with respect to different counterexamples. Thus, for each counterexample a combinational function can deterministically produce a different value. This is not true in the sequential case where the same values of primary inputs and present state elements may occur for different time steps and different counterexamples. The definition of a valid correction can be refined as follows to alleviate this problem.

Definition 16. A gate g is repairable if there is a Boolean function f(x1, ..., xn, s1, ..., sn) in terms of the inputs and the state such that the circuit adheres to the specification when g is replaced by f(x1, ..., xn, s1, ..., sn).

That is, a gate is repairable if the circuit can be fixed by replacing the gate by some new cone of combinational logic.

Definition 17. Gate g is repairable with respect to T, where T is a set of counterexamples, if there is a Boolean function f(x1, ..., xn, s1, ..., sn) such that none of the counterexamples in T is a counterexample to the property when the function of g is replaced by f.
The generalization of this definition to a set of candidates or components is straightforward. But in this section a single error assumption is applied. Remark 11. In the sequential case, each repairable gate is also a valid correction because changing the value of this gate is sufficient. But in Example 27 the component c is a valid correction while not being repairable. Given a set of counterexamples T, the Ackermann constraint for a gate g says that for any (not necessarily distinct) pair of counterexamples T1 , T2 and any pair of time steps i, j, if the state and the inputs of the circuit in time step i of counterexample T1 equal the state and the inputs in time step j of counterexample T2 , then the output of g is the same in both steps. Ackermann constraints can easily be added to the propositional formula by adding a number of clauses that is quadratic in the cumulative length of the counterexamples and linear in the number of gates. This leads to the following result: Theorem 6. In the presence of Ackermann constraints, given a set of counterexamples T, any gate that is a diagnosis is repairable for T. The choice of what constitutes a repairable gate may seem somewhat arbitrary. Alternative definitions, however, are handled just as easily. For instance, one could require that a fix is a replacement by a single gate with the same inputs. The Ackermann constraints would change correspondingly. On the other extreme, one could allow any realizable function, in which case the Ackermann constraints would require that the output is equal if all the inputs in the past have been equal. In this case – assuming that all counterexamples are pairwise different – the notion of a valid correction as defined earlier is applicable. For the notion of Ackermann constraints used here, including all state elements and inputs may yield a very large problem instance in practice. 
Instead, only those signals are included that are considered by the property and their transitive fanin. This is visualized in Figure 6.18: Signals that are considered by the property are indicated by “×” in the figure and the transitive fanin is shown as a grey area. As a result, the subset of state elements and primary inputs may be different at different time steps. Given two time steps i and j, those state elements and inputs are compared that are contained in both copies of the circuit. For example, consider time frame 1 in Figure 6.18. When comparing the state to time frame 0, all elements indicated by the bold bar labeled with 0 are considered. When comparing time frames 1 and 2, those elements marked by the bold bar with label 2 are considered. This makes the constraints more restrictive since only a subset of the state bits is considered in each time frame. Now consider a single component g (indicated by •). This component influences the property in time frames 1 and 2 and may be considered for correction. By construction, all state elements influencing g in these time frames are contained in the unrolled circuit as well. Therefore, all state elements relevant for repairing g are compared by Ackermann constraints. Thus, the proposed more restrictive heuristic approach is reasonable and more efficient than including all state elements in the constraints.

Figure 6.18. State elements considered for Ackermann constraints (time frames 0, 1, and 2)
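The condition expressed by Ackermann constraints can be sketched as a check on a complete candidate assignment. The Python function below is a hypothetical helper for illustration, not the clause generation used in the approach: it records, for one gate, the observed output for each combination of inputs and state over all time steps of all counterexamples and reports whether the outputs are functionally consistent. The sample observations are derived from the equations of Example 26 as printed, which is an assumption.

```python
def satisfies_ackermann(observations):
    """observations: iterable of (inputs, state, output) tuples for one gate,
    collected over all time steps of all counterexamples.

    Returns True if equal (inputs, state) pairs always have equal outputs,
    i.e. some Boolean function f(inputs, state) can produce these outputs.
    """
    seen = {}
    for inputs, state, output in observations:
        key = (tuple(inputs), tuple(state))
        if seen.setdefault(key, output) != output:
            return False
    return True

# Gate g2 of Example 26 with req = 1 in both frames; state is (s1, s2).
# Assignment "g2 = 0, then 1": the state stays (0, 0), but the output changes.
print(satisfies_ackermann([((1,), (0, 0), 0), ((1,), (0, 0), 1)]))
# Assignment "g2 = 1, then 0": the state changes to (1, 1) in frame 1.
print(satisfies_ackermann([((1,), (0, 0), 1), ((1,), (1, 1), 0)]))
```

In the SAT instance, the same condition is expressed by clauses over pairs of time steps, which is where the quadratic number of clauses comes from; the dictionary lookup above only works because the assignment is already complete.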
6.3.2.3 SAT Techniques

In practice, all fault candidates are of interest, not just one. This can be achieved efficiently by adding blocking clauses [McM02] to the SAT instance stating that the abnormal predicates found so far must be false. Note that the blocking clause does not contain the full satisfying assignment but only the fact that some abnormal predicates must be false, which excludes all other valuations of this assignment.

The efficiency of the SAT solver can be drastically improved using a dedicated decision strategy similar to [Str04]. By default, the solver performs a backtrack search on all variables in the SAT instance. Here, all variable values can be implied when the abnormal predicates and the output values of gates asserted as abnormal are given. Therefore, a static decision strategy is applied that decides abnormal predicates first and then proceeds on those gates that are asserted abnormal, starting at time frame 0 up to time frame tcyc − 1.

Figure 6.19 shows the pseudocode for this decision strategy. The vector A contains all abnormal predicates. This vector is searched until a predicate ab with an undecided value is found. If no value was assigned, the predicate is set to 1 (Lines 4–6). Due to the construction of the SAT instance, this assignment implies the value 0 for all other abnormal predicates. If the first assigned predicate has value 1, the output variable of the gate influenced by ab is considered (Lines 7–11). The hash H maps abnormal predicates to output variables of gates. H(ab) returns a vector of k propositional variables. Variable H(ab)[t] represents the output of the gate that is asserted abnormal by ab at time frame t. Thus, the first gate with unknown output value that is asserted abnormal is set to the value 0. Gates in earlier time frames are considered first. If no unassigned variable is found, a satisfying assignment was found (Line 12).

 1  function staticDecision
 2    for i := 1 to A.size
 3      let ab be the variable A[i];
 4      if ab == UNDECIDED then
 5        ab := 1;
 6        return DECISION_DONE;
 7      else if ab == 1 then
 8        for t := 0 to tcyc - 1
 9          if H(ab)[t] == UNDECIDED
10            H(ab)[t] := 0;
11            return DECISION_DONE;
12    return SATISFIED;

Figure 6.19. Pseudocode of the static decision strategy

Note that only one value of each variable has to be assigned in the decision strategy because the other value is implied by failure driven assertions (see Section 2.1.3). Note also that H(ab)[t] is a list in the general case because multiple counterexamples and components instead of gates are considered, i.e. each abnormal predicate may correspond to multiple gates as explained in Section 6.3.3. In the implementation, this list is searched for the first gate that is undecided.

The experiments show a significant speed-up when this strategy is applied. Constraint replication is not yet applied, but it can obviously be used in this setting, especially when multiple counterexamples are present.
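The static decision strategy can be written as a small Python function for illustration. The data layout is an assumption and not the interface of any particular SAT solver: `value` is a dictionary from variable names to 0, 1, or None (undecided), `A` is the list of abnormal predicates, and `H` maps each abnormal predicate to the list of its gate output variables, one per time frame.

```python
UNDECIDED = None

def static_decision(A, H, value, t_cyc):
    """Pick the next decision as in the static strategy: abnormal
    predicates first, then outputs of the gate asserted abnormal,
    earliest time frame first. Mutates `value` in place."""
    for ab in A:
        if value[ab] is UNDECIDED:
            value[ab] = 1
            return "DECISION_DONE"
        elif value[ab] == 1:
            for t in range(t_cyc):
                out = H[ab][t]
                if value[out] is UNDECIDED:
                    value[out] = 0
                    return "DECISION_DONE"
    return "SATISFIED"

# Hypothetical instance with two gates and two time frames.
A = ["ab_g1", "ab_g2"]
H = {"ab_g1": ["g1@0", "g1@1"], "ab_g2": ["g2@0", "g2@1"]}
value = {v: UNDECIDED for v in A + H["ab_g1"] + H["ab_g2"]}

print(static_decision(A, H, value, 2))  # decides ab_g1 := 1
value["ab_g2"] = 0                       # stands in for the solver's implication
print(static_decision(A, H, value, 2))  # decides g1@0 := 0
print(static_decision(A, H, value, 2))  # decides g1@1 := 0
print(static_decision(A, H, value, 2))  # nothing left to decide
```

The manual assignment of `ab_g2` stands in for the implication that a solver would derive from the "at most one" constraint; the opposite values of decided variables would be reached through conflict analysis and are not modeled here.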
6.3.2.4 Simulation-based Preprocessing

When all gates or components of a circuit are considered as potential diagnoses, the search space is very large. A first obvious method to reduce this search space is a cone-of-influence analysis. As a result, only those components that drive signals considered in the property are contained in the SAT instance. Furthermore, a simulation-based preprocessing step can be applied to further reduce the number of components that have to be considered during diagnosis. As observed in Section 6.1, simulation-based diagnosis has a linear time complexity with respect to the size of the circuit. Furthermore, a single error assumption is applied for diagnosing properties. Therefore, when using multiple counterexamples, only components marked by each counterexample are considered as candidates. Abnormal predicates are only assigned to these candidates during the SAT-based diagnosis step. This procedure does not change the solution space for diagnosis, because changing a component that is not on a sensitized path cannot change the output value of the property.
The experimental results show that the overhead of this linear-time preprocessing step is low. The step can prune the search space and thereby reduce the overall run time.
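The candidate selection described above can be sketched in a few lines of Python. The function name `candidate_components` and the component names are hypothetical, and the sets of marked components are taken as given; how path tracing produces them is not shown.

```python
def candidate_components(cone_of_influence, marked_per_counterexample):
    """Keep components in the cone of influence that every counterexample marks.

    Under the single error assumption, a component that is not marked by
    some counterexample cannot explain that counterexample on its own.
    """
    candidates = set(cone_of_influence)
    for marked in marked_per_counterexample:
        candidates &= set(marked)
    return candidates


# Hypothetical data: five components in the cone, three counterexamples.
cone = {"c1", "c2", "c3", "c4", "c5"}
marked = [{"c1", "c2", "c3"}, {"c2", "c3", "c5"}, {"c2", "c3", "c4"}]
candidates = candidate_components(cone, marked)
```

Abnormal predicates would then be introduced only for the components in `candidates`.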
6.3.2.5 Discussion
Just like multiple counterexamples, stronger specifications reduce the number of diagnoses. When more properties are considered, the constraints on the behavior are tightened, leading to fewer diagnoses. In practical applications, a hint on how to repair the faulty behavior at a particular component is useful. The satisfying assignments not only provide diagnoses but also the values that the faulty components should produce.
The extension to liveness properties does not seem to be simple. In model checking, the counterexample to a liveness property is “lasso-shaped”: after some initial steps, it enters an execution that repeats infinitely often. It is very easy to remove such a counterexample by changing any gate that breaks the loop without violating the safety part of the property. The recent observation that liveness properties can be encoded as safety [BAS02] does not seem to affect this observation as it merely encodes the loop in a different way. Note, however, that on an implementation level bounds on the response time are often available and liveness can thus be eliminated from the specification, at least for the purpose of debugging.
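The last remark can be illustrated with a small Python sketch. Once a bound k on the response time is known, "every request is eventually acknowledged" can be replaced by "every request is acknowledged within k cycles", and a violation is witnessed by a finite trace. The signal names `req` and `ack`, the trace layout, and the function name are hypothetical.

```python
def first_bounded_response_violation(trace, k):
    """Return the first time step t at which req holds but no ack occurs
    in steps t..t+k, or None if no such step exists.

    Requests whose window extends beyond the end of the trace are not
    judged, because the trace is too short to decide them.
    """
    for t, step in enumerate(trace):
        if step["req"] and t + k < len(trace):
            window = trace[t:t + k + 1]
            if not any(s["ack"] for s in window):
                return t
    return None


# Hypothetical trace: a request at step 0 is acknowledged at step 2.
trace = [
    {"req": 1, "ack": 0},
    {"req": 0, "ack": 0},
    {"req": 0, "ack": 1},
]
```

With a bound of two cycles the trace satisfies the property, while a bound of one cycle yields a finite counterexample starting at step 0.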
6.3.3 Source Level Diagnosis
The previous section describes the diagnosis approach by means of sequential circuits on the gate level. In this section, the applicability of the approach on the source level is shown. An expression on the source level may correspond to multiple gates. Therefore, a single fault on the source level may correspond to multiple faults on the gate level. To avoid multiple fault diagnosis, this information has to be included in the SAT formula that is solved for diagnosis. This is achieved by grouping several gates into one component. The hierarchy induced by the syntactical structure of the source code is included in the gate-level representation of the design and the property. This makes it possible to link the gate level to the source code.
The link between source code and gate-level model is established during synthesis. This procedure is shown in Figure 6.20. First, an Abstract Syntax Tree (AST) is created from the source code. Then, the AST is traversed and directly mapped to gate-level constructs. During this mapping, the gates that correspond to certain portions of the source code can be identified. Thus, the AST induces regions at the gate level. These regions are grouped hierarchically. Components are identified based on this representation. Each region corresponds to a component. For example, the expression (a==1) && (b==0) corresponds to three components: (a==1), (b==0), and the complete expression. For each region a single abnormal predicate is introduced. All gates that do not belong to a lower region in the hierarchy are associated with this abnormal predicate. In the example, the predicates ab1, ab2, and ab3 are introduced.
Figure 6.20. Source code link (panels: source code "if (a==0 && b==1) c=1; else c=0;", intermediate representation, and hierarchical netlist with abnormal predicates ab1, ab2, and ab3)
Although this approach requires a modified synthesis tool, the diagnosis engine can take advantage of the hierarchical information as suggested in [ASV+ 05]. For instance, a correction of a single expression may not be possible, but changing an entire module may rectify all counterexamples. When the hierarchy information is encoded in the diagnosis problem, a single fault assumption still returns a valid diagnosis. The granularity of the diagnosis result can also be influenced. For example, choosing only source-level modules as components yields a coarse diagnosis, whereas considering all subexpressions and statements as components produces a fine-grained diagnosis result.
Finally, hierarchical information can be used to improve the performance of the diagnosis engine. First, a coarse granularity can be used to efficiently identify possibly erroneous parts of the design. Then, diagnosis can be carried out at a finer granularity with higher computational cost to calculate more accurate diagnoses for the previously identified components [ASV+ 05].
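The grouping of gates into hierarchical components, and the choice of granularity, can be sketched in Python. The `Region` class, the `components` function, the depth-based granularity parameter, and the gate names are assumptions made for this example; the mapping of ab1, ab2 and ab3 to the two comparisons and the complete expression is likewise assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str                  # name of the abnormal predicate of the region
    gates: list                # gates that belong to no lower region
    children: list = field(default_factory=list)


def all_gates(region):
    """Collect the gates of a region together with all of its sub-regions."""
    gates = list(region.gates)
    for child in region.children:
        gates.extend(all_gates(child))
    return gates


def components(region, max_depth, depth=0):
    """Map each abnormal predicate to its gates, using the hierarchy only
    down to max_depth; deeper regions are merged into their ancestor."""
    if depth == max_depth or not region.children:
        return {region.name: all_gates(region)}
    result = {region.name: list(region.gates)}
    for child in region.children:
        result.update(components(child, max_depth, depth + 1))
    return result


# Regions for the expression (a==1) && (b==0) with hypothetical gate names.
expr = Region("ab3", ["and"], [Region("ab1", ["eq_a"]), Region("ab2", ["eq_b"])])
coarse = components(expr, 0)
fine = components(expr, 1)
```

Here `coarse` treats the whole expression as one component, while `fine` yields one component per region, which mirrors the two-stage use of granularity described above.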
6.3.4 Experimental Results
For the experimental data, the benchmarks provided with VIS [VIS96] were used. A bug was manually introduced in each of the designs by changing an operator or a constant. In the following, the specificity of the diagnosis is analyzed and the benefit of the modified decision heuristics is shown. All experiments were carried out on an AMD Athlon 3500+ (2.2 GHz, 1 GB, Linux).
A modified version of the synthesis tool vl2mv from VIS is used to produce the annotated gate-level representation. The design and the property are described in Verilog. As a result, either one can be considered during diagnosis. This environment can use multiple counterexamples for diagnosis. The incremental property checker is based upon a version of Zchaff [MMZ+ 01] that supports incremental SAT [WKS01]. During diagnosis, one SAT instance is created that includes a copy of the design for each counterexample. Constraint replication between the copies is not yet used. The incremental interface of Zchaff is used to calculate all diagnoses. Zchaff was modified to use the static decision heuristic discussed in Section 6.3.2.
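One common way to calculate all solutions with an incremental solver is to add a blocking clause after each solution and solve again. The text does not state that the implementation works this way, so the following Python sketch is an assumption; it also replaces the SAT solver by a brute-force search, and the names `solve` and `all_diagnoses` as well as the toy clause set are hypothetical.

```python
from itertools import product


def solve(clauses, num_vars):
    """Brute-force stand-in for a SAT solver: return a model or None.

    Clauses are lists of non-zero integers; a negative literal -v means
    that variable v must be false.
    """
    for bits in product([False, True], repeat=num_vars):
        model = {v + 1: bits[v] for v in range(num_vars)}
        if all(any(model[abs(l)] == (l > 0) for l in c) for c in clauses):
            return model
    return None


def all_diagnoses(clauses, num_vars, abnormal_vars):
    """Enumerate all assignments to the abnormal predicates that can be
    extended to a satisfying assignment."""
    clauses = list(clauses)
    diagnoses = []
    while True:
        model = solve(clauses, num_vars)
        if model is None:
            return diagnoses
        diagnoses.append(frozenset(v for v in abnormal_vars if model[v]))
        # Blocking clause: exclude this assignment of the abnormal predicates.
        clauses.append([-v if model[v] else v for v in abnormal_vars])


# Toy instance: variables 1..3 are abnormal predicates, at most one may be
# set (single fault), and only component 1 or 3 can explain the failure.
toy = [[1, 3], [-1, -2], [-1, -3], [-2, -3]]
found = all_diagnoses(toy, 3, [1, 2, 3])
```

With a real incremental interface the solver would keep its learned clauses between calls, which is the main benefit over restarting from scratch.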
6.3.4.1 Example
The branch prediction buffer bpbs provided as an example for VIS is considered in the following. For each branching point, four state machines as shown in Figure 6.21 are used for prediction. Each state machine provides one prediction bit. When a branching point is reached, one state machine is updated. The address translation and the selection of a single state machine are done externally. A valid property for this buffer is pStrongPrediction, which says: if the operation is not stalled and all state machines agree on branch taken/not taken, the prediction bits also agree in the next clock cycle. The code of the buffer that relates to this property is shown in Figure 6.22. Line 11 contains a fault: the wrong polarity of signal stall is considered. The comment in Line 10 shows the correct code. The proposed localization approach was applied to the erroneous design. When two or more counterexamples were used, only one diagnosed expression was returned. This expression was exactly the stall signal in the erroneous if-condition. Therefore, in this case the single diagnosis points exactly to the problem that caused the property to fail.
Figure 6.21. State machine for branch prediction (states 0: strong not taken, 1: weak not taken, 2: weak taken, 3: strong taken; transitions labeled jump and no jump)
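A state machine with the four states of Figure 6.21 can be sketched in Python. The transition structure below is that of a common two-bit saturating counter, which is an assumption because the figure itself is not reproduced here; the prediction bit follows the comparison `state > 1` from Figure 6.22. The function names are hypothetical.

```python
STATE_NAMES = {
    0: "strong not taken",
    1: "weak not taken",
    2: "weak taken",
    3: "strong taken",
}


def next_state(state, jump):
    """Move one step towards 'strong taken' on a jump and one step towards
    'strong not taken' otherwise, saturating at 3 and 0 (assumed)."""
    return min(state + 1, 3) if jump else max(state - 1, 0)


def prediction_bit(state):
    """Predict 'taken' for the two upper states, as in state > 1."""
    return 1 if state > 1 else 0


# Example: starting from 'weak not taken', two jumps reach 'strong taken'.
state = 1
for jump in (True, True):
    state = next_state(state, jump)
```

In the buffer, four such machines are kept per branching point and each contributes one prediction bit.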
 1  module branchPredictionBuffer(clock,stall,inst_addr,...
 2                               prediction);
 3    parameter SIZE = 4;
 4    input clock;
 5    input [1:0] inst_addr;
 6    input stall;
 7    output [3:0] prediction;
 8    reg [1:0] state_bank0 [SIZE-1:0];
      ...
 9    always @(posedge clock) begin
10      //Correct: if (stall==0) begin
11      if (stall==1) begin
12        if (state_bank3[inst_addr] > 1)
13          prediction[3] = 1;
14        else
15          prediction[3] = 0;
16        if (state_bank2[inst_addr] > 1)
          ...
        end // if (!stall)
        ...
      end // always @ (posedge clock)
    endmodule // branchPredictionBuffer
Figure 6.22. Source code for bpb
6.3.4.2 Diagnosis Quality
Results regarding the quality of the diagnosis are presented in Table 6.8. The table also shows the influence of multiple counterexamples and Ackermann constraints on the diagnosis results. The first column gives the name of the benchmark circuit and the name of the property considered. The number of gates and the number of registers contained in the circuit follow. For BMC, the length of the counterexamples and the number of components identified in the complete design are given in columns Len and #Cmp, respectively. In addition, results are reported for a static cone of influence analysis and for diagnosis using a single counterexample, four counterexamples, and four counterexamples together with Ackermann constraints. For each approach, the number of components returned as diagnoses and the percentage of diagnosed components compared to all components in the design are shown in columns #Cmp and %, respectively.
Table 6.8. Diagnosis results for multiple counterexamples and Ackermann constraints
                                         BMC   Diagnosis
                                                    Cone      Single        Four   Ackermann
Circuit, property     Gates  Reg.  Len  #Cmp   #Cmp    %   #Cmp    %   #Cmp    %   #Cmp    %
am2910 p1 e1, pE5      2257   102    5   227    205   90     66   29     36   15     36   15
am2910 p2 e1, pSP      2290   102    5   230     87   37     37   16     26   11     26   11
bpbs p1 e1, pValidT    1640    39    2   127    102   80     15   11     13   10     13   10
bpbs p1 e2, pValidT    1640    39    2   127    102   80     15   11      4    3      4    3
counter e1, pCount       25     7    3    11     10   90      4   36      4   36      1    9
FPMult e1, pLegalOp     973    69    4   119    105   88      3    2      3    2      3    2
FPMult e2, pLegalOp     973    69    4   119    105   88     54   45     47   39     47   39
gcd e1, pReady          634    51   22    87     68   78     45   51     35   40     35   40
gcd e2, pReady          634    51   22    87     68   78     34   39     32   36     32   36
gcd e1, pBoth           634    51   23    87     71   81     46   52     36   41     36   41
gcd e2, pBoth           634    51   23    87     71   81     33   37     33   37     33   37
gcd e1, pThree          634    51   23    87     71   81     33   37     23   26     23   26
gcd e2, pThree          634    51   23    87     71   81     39   44     22   25     22   25
The traditional cone of influence analysis often leads to a large number of components that have to be considered. In contrast, the number of diagnoses is often already small when the new diagnosis approach is applied with a single counterexample. Problems may occur when very long counterexamples are considered. Then, the fix can be placed at many different locations. But this is inherent to the problem and not a limitation of the presented approach. Moreover, using multiple counterexamples for diagnosis often raises the accuracy. As an extreme, consider bpbs, where the number of fault candidates is reduced from 15 to only 4. In contrast, Ackermann constraints do not yield the same improvement. Only for counter e1 was the number of diagnoses reduced; there the algorithm returned exactly the real error site. The overhead in run time is quite high for Ackermann constraints. An increase by up to a factor of 60 is observed, especially on large instances. Thus, Ackermann constraints should only be applied in a second stage of the diagnosis process due to their low influence on the accuracy.
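The % columns of Table 6.8 relate the number of diagnosed components to the number of components in the complete design. The reported values are consistent with integer truncation rather than rounding; this is an observation from the data, not something stated in the text. A minimal Python sketch, with a hypothetical function name:

```python
def diagnosed_percentage(diagnosed, total):
    """Percentage of diagnosed components, truncated to an integer
    (truncation is an assumption inferred from the table entries)."""
    return 100 * diagnosed // total


# am2910 p1 e1: 227 components in total; 66 diagnosed with a single
# counterexample and 36 with four counterexamples.
single = diagnosed_percentage(66, 227)
four = diagnosed_percentage(36, 227)
```

For bpbs p1 e2, the same computation gives 11 percent for 15 of 127 components and 3 percent for 4 of 127 components.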
6.3.4.3 Run time
In Section 6.3.2, we suggested two techniques to improve the run time of the overall algorithm: a static decision strategy for the SAT solver and the use of a simulation-based preprocessing step. Both techniques were implemented within the hierarchical framework. The results are reported in Table 6.9 for the practical case of using four counterexamples without Ackermann constraints. The table shows run times for the different techniques. Additionally, the number of components considered during SAT-based diagnosis is given. Note that this is not the number of components returned as diagnoses that was considered previously. The number of decisions made by the SAT solver is also reported.
Table 6.9. Run times for the different approaches (using four counterexamples)
                                 Diagnosis
                          BMC           Zchaff default                   Static      Simulation + static
Circuit, property        Time    Time  #Cmp       #Dec    Time  #Cmp       #Dec    Time  #Cmp       #Dec
am2910 p1 e1, pE.         0.5    11.9   205    165,247     2.6   205      8,047     1.6    69      7,855
am2910 p2 e1, pS.        <0.1     0.4    87      3,848     0.3    87        989     0.3    52        916
bpbs p1 e1, pV.           0.1     0.2   102      2,819     0.2   102        302     0.1    19        266
bpbs p1 e2, pV.          <0.1     0.2   102      1,805     0.1   102        110     0.1     5         87
counter e1, pC.          <0.1    <0.1    10        259    <0.1    10        131    <0.1     9        130
FPMult e1, pL.           <0.1     0.4   105        397     0.2   105         60     0.2     5         60
FPMult e2, pL.           <0.1     2.3   105     17,540     1.1   105      8,440     1.0    76      7,320
gcd e1, pR.              18.7  1057.2    68  3,271,957    54.0    68    479,526    54.4    67    479,525
gcd e2, pR.              22.1   351.2    68  1,022,573    19.7    68    115,519    18.6    63    112,833
gcd e1, pBoth            32.2  2213.4    71  3,468,162    91.7    71    425,438    90.1    67    425,436
gcd e2, pBoth            24.2   453.8    71  1,058,165    55.2    71    237,104    50.2    59    232,334
gcd e1, pThree           42.7  1626.1    71  2,617,354   201.8    71    723,180   198.4    65    730,191
gcd e2, pThree           35.5   499.0    71  1,278,064  1306.9    71  3,586,181  1307.8    71  3,586,181
The run time decreases drastically when the static decision heuristic is applied. This is due to the reduction of the number of decisions that have to be made by the SAT solver. The only exception is the last benchmark, but when using only one counterexample, the run time was only 9.91 s at the cost of a lower accuracy (see above). Usually, the run time does not exceed the time for BMC by much, even when four counterexamples are applied for diagnosis. Here, incrementally applying more and more counterexamples as suggested in [SVV04] can yield an even shorter run time. The use of the simulation-based preprocessing step also saves some run time in those cases where the number of components considered during SAT-based diagnosis can be reduced significantly. On the other hand, the overhead is quite low when no components can be pruned.
The creation of counterexamples dedicated to diagnosis as proposed for the combinational case in Section 6.2 may further improve the diagnosis result. This hypothesis is strengthened by the following experimental results. In total, 1000 diagnosis runs were carried out with four randomly chosen counterexamples on am2910 e1 for property pEntry5 and on gcd e2 for property pReadyIn22Cyc. Figures 6.23 and 6.24 show the results. The number of diagnoses varied from 28 to 90 and the run time varied between 1.75 s and 3.75 s for am2910, as can be seen in Figure 6.23. In the case of gcd, the number of diagnoses was between 22 and 38 while the run time varied between 6.95 s and 23.65 s, as Figure 6.24 shows. Usually, a better diagnosis accuracy also coincides with a shorter run time.
In summary, the proposed techniques drastically reduce the run time and make the effort of diagnosis comparable to that of BMC.
Figure 6.23. am2910: Runtime vs. number of diagnosed components (x-axis: #comp, y-axis: runtime (s))
Figure 6.24. gcd: Runtime vs. number of diagnosed components (x-axis: #comp, y-axis: runtime (s))
6.4 Summary and Future Work
Automatic diagnosis and debugging were considered in detail in this chapter. First, the relations between simulation-based and SAT-based diagnosis were investigated. It has been shown theoretically and empirically that the basic simulation-based approaches BSIM and COV are fast but cannot guarantee to return a valid correction. Moreover, COV may not retrieve all valid corrections. Manually removing invalid corrections is very time-consuming. BSAT needs more computation time but returns good diagnosis results that are guaranteed to be valid corrections for a given test-set. The same is true for the advanced approaches that use different search paradigms. The results show a direction for future work. While BSIM does not guarantee that an actual error site has been marked by the largest number of
counterexamples, this happened in almost all experiments. In the same way, the results returned by COV were not too far from the real errors in most cases. These results suggest a hybrid approach. A simulation-based preprocessing step for diagnosing properties was already applied in Section 6.3. In future work, the fast engines of BSIM and COV can be used to direct the SAT search by tuning the decision heuristics of the solver. A second possibility is to choose an initial correction (that may not be valid) and use SAT-based diagnosis to turn it into a valid correction.
Next, the problem of selecting multiple counterexamples for diagnosis was targeted. The problem was formally defined and shown to be difficult. The corresponding decision problem was proven to be NP-complete. Heuristics were given to enable the generation of useful counterexamples and to efficiently choose them. Here, tuning the SAT solver to produce a “good set” of counterexamples is also an important next step. For example, techniques that are also applied for all-solutions SAT [GSY04, LHS04] could be exploited.
Finally, aiding debugging of properties was considered. The presented approach automatically locates design errors at the gate level or the source code level. The approach handles safety properties written in LTL. A propositional logic formula is built such that diagnoses can be derived from satisfying assignments. We have shown how to extend the formula to make sure that a diagnosed component is actually repairable for the given input sequences. The link to the source code enables the diagnosis engine to exploit hierarchical information. More importantly, the source code information makes it possible to apply a single error assumption even when errors are introduced at the HDL level. Experimental results show that the efficiency is drastically improved by using a dedicated search strategy for the SAT solver.
All these diagnosis techniques improve the usability of formal verification tools in the design flow. Instead of being done manually, debugging of the design description is partially automated. Only a small fraction of the design, i.e. the candidate error locations, has to be considered by the designer.
Chapter 7 SUMMARY AND CONCLUSIONS
Today, circuit design is a complex task that is composed of several steps. The overall flow and the different steps have been studied in detail in this work. Currently, robustness and usability are still the major problems. The analysis identified a number of deficiencies in the individual design steps. Techniques and methods to alleviate specific problems have been proposed. All of these approaches were empirically evaluated in case studies or benchmarking experiments. When all of these techniques are integrated, a new enhanced design flow emerges. In this new flow, the underlying algorithms do not only aim at robustness but are also adjusted to the needs of subsequent tasks like the generation of meaningful counterexamples. The use of SystemC tightly couples the system-level description and the synthesizable description of the design. As a result, inconsistencies between the two descriptions can be detected more easily and often even avoided because only simple transformations are done. A technique for the creation of fully testable circuits from the SystemC description was presented. On these circuits, ATPG can be carried out efficiently. In the verification realm, the transition towards formal methods has been suggested. As long as simulation-based verification methods are still in use, these can be coupled with formal techniques by applying automatic generation of properties from testbenches. These generated properties help to detect gaps in testbenches. Even more importantly, the automatic generation of properties provides a whole new verification methodology that is based on design understanding and the interactive creation of properties. Using this methodology, the creation of properties becomes more efficient, and the usability of tools to check the consistency between design description and textual specification improves as well. Thus, the verification productivity increases. Here, automatic support for debugging also plays an important role.
Counterexamples will still remain
the instrument to unveil discrepancies between a certain design and its specification. Therefore, techniques to automate error location and design debugging were investigated. Different approaches were compared and a method to produce particularly useful counterexamples for automatic diagnosis was proposed. An approach to apply diagnosis techniques even at the source code level to debug formal properties was presented. Several ideas to further improve the different techniques have already been discussed in each chapter. Overall, the proposed techniques establish an enhanced design flow. In comparison to the traditional design flow, the new approaches boost the productivity of the time-consuming design process. The improvements are achieved by more robust algorithms and tools that are easier to use.
REFERENCES
[ABG+ 00]
Y. Abarbanel, I. Beer, L. Gluhovsky, S. Keidar, and Y. Wolfsthal. FoCs – automatic generation of simulation checkers from formal specifications. In Computer Aided Verification, volume 1855 of LNCS, pages 538–542, 2000.
[ADK91a]
P. Ashar, S. Devadas, and K. Keutzer. Gate-delay-fault testability properties of multiplexor-based networks. In Int’l Test Conf., pages 887–896, 1991.
[ADK91b]
P. Ashar, S. Devadas, and K. Keutzer. Testability properties of multilevel logic networks derived from binary decision diagrams. Advanced Research in VLSI: UC Santa Cruz, pages 33–54, 1991.
[ADK93]
P. Ashar, S. Devadas, and K. Keutzer. Path-delay-fault testability properties of multiplexor-based networks. INTEGRATION, the VLSI Jour., 15(1):1–23, 1993.
[AFK88]
M.S. Abadir, J. Ferguson, and T.E. Kirkland. Logic verification via test generation. IEEE Trans. on CAD, 7:172–177, 1988.
[AH97]
H. Andersen and H. Hulgaard. Boolean expression diagrams. In Logic in Computer Science, pages 88–98, 1997.
[AMM83]
M. Abramovici, P.R. Menon, and D.T. Miller. Critical path tracing – an alternative to fault simulation. In Design Automation Conf., pages 214–220, 1983.
[ASU85]
A.V. Aho, R. Sethi, and J.D. Ullman. Compilers – Principles, Techniques and Tools. Pearson Higher Education, 1985.
[ASV+ 05]
M. Ali, S. Safarpour, A. Veneris, M. Abadir, and R. Drechsler. Post-verification debugging of hierarchical designs. In Int’l Conf. on CAD, pages 871–876, 2005.
[AVS+ 04]
M.F. Ali, A. Veneris, S. Safarpour, R. Drechsler, A. Smith, and M.S. Abadir. Debugging sequential circuits using Boolean satisfiability. In Int’l Conf. on CAD, pages 204–209, 2004.
[BAS02]
A. Biere, C. Artho, and V. Schuppan. Liveness checking as safety checking. In FMICS workshop, volume 66(2) of Electronic Notes in Theoretical Computer Science, 2002.
[BCCZ99]
A. Biere, A. Cimatti, E. Clarke, and Y. Zhu. Symbolic model checking without BDDs. In Tools and Algorithms for the Construction and Analysis of Systems, volume 1579 of LNCS, pages 193–207. Springer Verlag, 1999.
[BCMD90]
J.R. Burch, E.M. Clarke, K.L. McMillan, and D.L. Dill. Sequential circuit verification using symbolic model checking. In Design Automation Conf., pages 46–51, 1990.
[BDKN94]
D. Brand, A. Drumm, S. Kundu, and P. Narrain. Incremental synthesis. In Int’l Conf. on CAD, pages 14–18, 1994.
[Bec92]
B. Becker. Synthesis for testability: Binary decision diagrams. In Symp. on Theoretical Aspects of Comp. Science, volume 577 of LNCS, pages 501–512. Springer, 1992.
[Bec98]
B. Becker. Testing with decision diagrams. INTEGRATION, the VLSI Jour., 26:5–20, 1998.
[BF76]
M.A. Breuer and A.D. Friedman. Diagnosis & reliable design of digital systems. Computer Science Press, 1976.
[BFGR03]
A.G. Braun, J.B. Freuer, J. Gerlach, and W. Rosenstiel. Automated conversion of SystemC fixed-point data types for hardware synthesis. In VLSI of System-on-Chip, pages 55–60, 2003.
[BMJ+ 99]
V. Boppana, R. Mukherjee, J. Jain, M. Fujita, and P. Bollineni. Multiple error diagnosis based on Xlists. In Design Automation Conf., pages 660–665, 1999.
[BNR03]
T. Ball, M. Naik, and S. K. Rajamani. From symptom to cause: Localizing errors in counterexample traces. In Symposium on Principles of Programming Languages, pages 97–105, January 2003.
[Boo04]
Boolean Satisfiability Research Group at Princeton University. ZCHAFF, 2004. http://www.princeton.edu/~chaff/zchaff.html.
[BPM+ 05]
D. Berner, H. Patel, D. Mathaikutty, J.-P. Talpin, and S. Shukla. SystemCXML: An extensible SystemC front end using XML. Technical report, INRIA, France and Virginia Polytechnic and State University, USA, 2005.
[Bra83]
D. Brand. Redundancy and don’t cares in logic synthesis. IEEE Trans. on Comp., 32(10):947–952, 1983.
[BRB90]
K.S. Brace, R.L. Rudell, and R.E. Bryant. Efficient implementation of a BDD package. In Design Automation Conf., pages 40–45, 1990.
[Bry86]
R.E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Trans. on Comp., 35(8):677–691, 1986.
[BS93]
D. Brand and T. Sasao. Minimization of AND-EXOR expressions using rewrite rules. IEEE Trans. on Comp., 42:568–576, 1993.
[BS01]
J. Bormann and C. Spalinger. Formale Verifikation für Nicht-Formalisten (Formal verification for non-formalists). Informationstechnik und Technische Informatik, 43:22–28, 2001.
[BW96]
B. Bollig and I. Wegener. Improving the variable ordering of OBDDs is NP-complete. IEEE Trans. on Comp., 45(9):993–1002, 1996.
[CCC+ 92]
G. Cabodi, P. Camurati, F. Corno, P. Prinetto, and M.S. Reorda. A new model for improving symbolic product machine traversal. In Design Automation Conf., pages 614–619, 1992.
[CGP99]
E.M. Clarke, O. Grumberg, and D.A. Peled. Model Checking. MIT Press, Cambridge, MA, 1999.
[CIW+ 03]
F. Copty, A. Irron, O. Weissberg, N. Kropp, and G. Kamhi. Efficient debugging in a formal verification environment. Software Tools for Technology Transfer, 4:335–348, 2003.
[CJG+ 03]
A. Clouard, K. Jain, F. Ghenassia, L. Maillet-Contoz, and J.-P. Strassen. Using Transactional Level Models in a SoC Design Flow, chapter 2, pages 29–64. Kluwer Academic Publishers, 2003.
[CNQ03]
G. Cabodi, S. Nocco, and S. Quer. SAT-based bounded model checking by means of BDD-based approximate traversals. In Design, Automation and Test in Europe, pages 898–903, 2003.
[Coo71]
S.A. Cook. The complexity of theorem proving procedures. In 3. ACM Symposium on Theory of Computing, pages 151–158, 1971.
[CPK95]
M. Chatterjee, D. K. Pradhan, and W. Kunz. LOT: logic optimization with testability - new transformations using recursive learning. In Int’l Conf. on CAD, pages 318–325, 1995.
[CWH93]
P.-Y. Chung, Y.-M. Wang, and I. N. Hajj. Diagnosis and correction of logic design errors in digital circuits. In Design Automation Conf., pages 503–508, 1993.
[DBG96]
R. Drechsler, B. Becker, and N. Göckel. A genetic algorithm for variable ordering of OBDDs. IEE Proceedings, 143(6):364–368, 1996.
[DF04]
R. Drechsler and G. Fey. Design understanding by automatic property generation. In Workshop on Synthesis And System Integration of Mixed Information technologies, pages 274–281, 2004.
[DF06]
R. Drechsler and G. Fey. Automatic test pattern generation. In Formal Methods for Hardware Verification, LNCS, pages 30–55, 2006.
[DFGG05]
R. Drechsler, G. Fey, C. Genz, and D. Große. SyCE: An integrated environment for system design in SystemC. In IEEE Int’l Workshop on Rapid System Prototyping, pages 258–260, 2005.
[DFK06]
R. Drechsler, G. Fey, and S. Kinder. An integrated approach for combining BDD and SAT provers. In VLSI Design Conf., pages 237–242, 2006.
[DG02]
R. Drechsler and W. Günther. Towards One-Path Synthesis. Kluwer Academic Publishers, 2002.
[DLL62]
M. Davis, G. Logemann, and D. Loveland. A machine program for theorem proving. Comm. of the ACM, 5:394–397, 1962.
[DP60]
M. Davis and H. Putnam. A computing procedure for quantification theory. Journal of the ACM, 7:506–521, 1960.
[Dre94]
R. Drechsler. BiTeS: A BDD based test pattern generator for strong robust path delay faults. In European Design Automation Conf., pages 322–327, 1994.
[Dre04]
R. Drechsler. Using synthesis techniques in SAT solvers. In ITG/GI/GMM-Workshop Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen, pages 165–173, 2004.
[DSF04]
R. Drechsler, J. Shi, and G. Fey. Synthesis of fully testable circuits from BDDs. IEEE Trans. on CAD, 23(3):440–443, 2004.
[EAH05]
C. Eibl, C. Albrecht, and R. Hagenau. gSysC: A graphical front end for SystemC. In European Conference on Modelling and Simulation (ECMS), pages 257–262, 2005.
[EB05]
N. Eén and A. Biere. Effective preprocessing in SAT through variable and clause elimination. In International Conference on Theory and Applications of Satisfiability Testing, volume 3569 of LNCS, pages 61–75, 2005.
[EMS07]
N. Eén, A. Mishchenko, and N. Sörensson. Applying logic synthesis for speeding up SAT. In Int’l Conference on Theory and Applications of Satisfiability Testing, LNCS, 2007.
[ES04]
N. Eén and N. Sörensson. An extensible SAT solver. In SAT 2003, volume 2919 of LNCS, pages 502–518. Springer, 2004.
[EW77]
E.B. Eichelberger and T.W. Williams. A logic design structure for LSI testability. In Design Automation Conf., pages 462–468, 1977.
[FD03]
G. Fey and R. Drechsler. Finding good counter-examples to aid design verification. In MEMOCODE, pages 51–52, 2003.
[FD04]
G. Fey and R. Drechsler. Improving simulation-based verification by means of formal methods. In ASP Design Automation Conf., pages 640–643, 2004.
[FD05]
G. Fey and R. Drechsler. Efficient hierarchical system debugging for property checking. In IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems, pages 41–46, 2005.
[FD06]
G. Fey and R. Drechsler. Minimizing the number of paths in BDDs – theory and algorithm. IEEE Trans. on CAD, 25(1):4–11, 2006.
[FGC+ 04]
G. Fey, D. Große, T. Cassens, C. Genz, T. Warode, and R. Drechsler. ParSyC: An efficient SystemC parser. In Workshop on Synthesis And System Integration of Mixed Information technologies, pages 148–154, 2004.
[FKL03]
H. Foster, A. Krolnik, and D. Lacey. Assertion-Based Design. Kluwer Academic Publishers, 2003.
[Fri73]
A.D. Friedman. Easily testable iterative systems. IEEE Trans. on Comp., 22:1061–1064, 1973.
[FS83]
H. Fujiwara and T. Shimono. On the acceleration of test generation algorithms. IEEE Trans. on Comp., 32:1137–1144, 1983.
[FSD04]
G. Fey, J. Shi, and R. Drechsler. BDD circuit optimization for path delay fault testability. In EUROMICRO Symp. on Digital System Design, pages 162–172, 2004.
[FSVD06]
G. Fey, S. Safarpour, A. Veneris, and R. Drechsler. On the relation between simulation-based and SAT-based diagnosis. In Design, Automation and Test in Europe, pages 1139–1144, 2006.
[GD00]
W. Günther and R. Drechsler. ACTion: Combining logic synthesis and technology mapping for MUX based FPGAs. Journal of Systems Architecture, 46(14):1321–1334, 2000.
[GD03]
D. Große and R. Drechsler. Formal verification of LTL formulas for SystemC designs. In IEEE International Symposium on Circuits and Systems, pages V:245–V:248, 2003.
[GD06]
C. Genz and R. Drechsler. System exploration of SystemC designs. In IEEE Annual Symposium on VLSI, pages 335–340, 2006.
[GDLA03]
D. Große, R. Drechsler, L. Linhard, and G. Angst. Efficient automatic visualization of SystemC designs. In Forum on Specification and Design Languages, pages 646–657, 2003.
[GJ79]
M.R. Garey and D.S. Johnson. Computers and Intractability - A Guide to NP-Completeness. Freeman, San Francisco, 1979.
[GKL04]
A. Groce, D. Kroening, and F. Lerda. Understanding counterexamples with explain. In R. Alur and D. A. Peled, editors, Computer Aided Verification, number 3114 in LNCS, pages 453–456, July 2004.
[GLMS02]
T. Grötker, S. Liao, G. Martin, and S. Swan. System Design with SystemC. Kluwer Academic Publishers, 2002.
[GN02]
E. Goldberg and Y. Novikov. BerkMin: a fast and robust SAT-solver. In Design, Automation and Test in Europe, pages 142–149, 2002.
[Goe81]
P. Goel. An implicit enumeration algorithm to generate test for combinational logic. IEEE Trans. on Comp., 30:215–222, 1981.
[Gro04]
A. Groce. Error explanation with distance metrics. In Tools and Algorithms for the Construction and Analysis of Systems, volume 2988 of LNCS, pages 108–122, Barcelona, Spain, March–April 2004.
[GSY04]
O. Grumberg, A. Schuster, and A. Yadgar. Memory efficient all-solutions SAT solver and its application to reachability. In Int’l Conf. on Formal Methods in CAD, volume 3312 of LNCS, pages 275–289, 2004.
[GV03]
A. Groce and W. Visser. What went wrong: Explaining counterexamples. In Model Checking of Software: International SPIN Workshop, volume 2648 of LNCS, pages 121–135. Springer, May 2003.
[GYA+01]
A. Gupta, Z. Yang, P. Ashar, L. Zhang, and S. Malik. Partition-based decision heuristics for image computation using SAT and BDDs. In Int’l Conf. on CAD, pages 286–292, 2001.
[GYAG00]
A. Gupta, Z. Yang, P. Ashar, and A. Gupta. SAT-based image computation with application in reachability analysis. In Int’l Conf. on Formal Methods in CAD, volume 1954 of LNCS, pages 354–371, 2000.
[GZ03]
J. F. Groote and H. Zantema. Resolution and binary decision diagrams cannot simulate each other polynomially. Discrete Applied Mathematics, 130(2):157–171, 2003.
[HC99]
S.-Y. Huang and K.-T. Cheng. Errortracer: Design error diagnosis based on fault simulation techniques. IEEE Trans. on CAD, 18(9):1341–1352, 1999.
[HD84]
W. Hamscher and R. Davis. Diagnosing circuits with state: An inherently underconstrained problem. In Proceedings of the Fourth National Conference on Artificial Intelligence (AAAI’84), pages 142–147, 1984.
[HDB96]
A. Hett, R. Drechsler, and B. Becker. MORE: Alternative implementation of BDD packages by multi-operand synthesis. In European Design Automation Conf., pages 164–169, 1996.
[HK00]
D.W. Hoffmann and T. Kropf. Efficient design error correction of digital circuits. In Int’l Conf. on Comp. Design, pages 465–472, 2000.
[HS96]
G. Hachtel and F. Somenzi. Logic Synthesis and Verification Algorithms. Kluwer Academic Publishers, 1996.
[HTCT03]
Y.-C. Hsu, B. Tabbara, Y.-A. Chen, and F. Tsai. Advanced techniques for RTL debugging. In Design Automation Conf., pages 362–367, 2003.
[HTFM03]
C. Haubelt, J. Teich, R. Feldmann, and B. Monien. SAT-based techniques in system synthesis. In Design, Automation and Test in Europe, volume 1, pages 11168–11169, 2003.
[IINY03]
H. Inoue, T. Iwasaki, M. Numa, and K. Yamamoto. An improved multiple error diagnosis technique using symbolic simulation with truth variables and its application to incremental synthesis for standard-cell design. In Workshop on Synthesis And System Integration of Mixed Information technologies, pages 61–68, 2003.
[IPC03]
M.K. Iyer, G. Parthasarathy, and K.-T. Cheng. SATORI – a fast sequential SAT engine for circuits. In Int’l Conf. on CAD, pages 320–325, 2003.
[IS75]
O.H. Ibarra and S.K. Sahni. Polynomially complete fault detection problems. IEEE Trans. on Comp., 24:242–249, 1975.
[JG03]
N. Jha and S. Gupta. Testing of Digital Systems. Cambridge University Press, 2003.
[JGB05]
B. Jobstmann, A. Griesmayer, and R. Bloem. Program repair as a game. In Computer Aided Verification, volume 3576 of LNCS, pages 226–238, 2005.
[JPHS91]
S.-W. Jeong, B. Plessier, G. Hachtel, and F. Somenzi. Extended BDD’s: Trading off canonicity for structure in verification algorithms. In Int’l Conf. on CAD, pages 464–467, 1991.
[JRS04]
H.S. Jin, K. Ravi, and F. Somenzi. Fate and free will in error traces. Software Tools for Technology Transfer, 6(2):102–116, 2004.
[JS05]
H. Jin and F. Somenzi. CirCUs: A hybrid satisfiability solver. In SAT 2004, volume 3542 of LNCS, pages 211–223. Springer, 2005.
[KCSL94]
A. Kuehlmann, D.I. Cheng, A. Srinivasan, and D.P. LaPotin. Error diagnosis for transistor-level verification. In Design Automation Conf., pages 218–224, 1994.
[KCY03]
D. Kroening, E. Clarke, and K. Yorav. Behavioral consistency of C and Verilog programs using bounded model checking. In Design Automation Conf., pages 368–371, 2003.
[KP80]
K.L. Kodandapani and D.K. Pradhan. Undetectability of bridging faults and validity of stuck-at fault test sets. IEEE Trans. on Comp., C-29(1):55–59, 1980.
[KP94]
W. Kunz and D.K. Pradhan. Recursive learning: A new implication technique for efficient solutions of CAD problems: Test, verification and optimization. IEEE Trans. on CAD, 13(9):1143–1158, 1994.
[KPKG02]
A. Kuehlmann, V. Paruthi, F. Krohm, and M.K. Ganai. Robust Boolean reasoning for equivalence checking and functional property verification. IEEE Trans. on CAD, 21(12):1377–1394, 2002.
[Kro99]
Th. Kropf. Introduction to Formal Hardware Verification. Springer, 1999.
[KS97]
W. Kunz and D. Stoffel. Reasoning in Boolean Networks. Kluwer Academic Publishers, 1997.
[Kun93]
W. Kunz. HANNIBAL: An efficient tool for logic verification based on recursive learning. In Int’l Conf. on CAD, pages 538–543, 1993.
[Lar92]
T. Larrabee. Test pattern generation using Boolean satisfiability. IEEE Trans. on CAD, 11:4–15, 1992.
[LCC+95]
C.-C. Lin, K.-C. Chen, S.-C. Chang, M. Marek-Sadowska, and K.-T. Cheng. Logic synthesis for engineering change. In Design Automation Conf., pages 647–651, 1995.
[LHS04]
B. Li, M.S. Hsiao, and S. Sheng. A novel SAT all-solutions solver for efficient preimage computation. In Design, Automation and Test in Europe, pages 10272–10278, 2004.
[LTG97]
S. Liao, S. Tjiang, and R. Gupta. An efficient implementation of reactivity for modeling hardware in the scenic design environment. In Design Automation Conf., pages 70–75, 1997.
[LV05]
J.B. Liu and A. Veneris. Incremental fault diagnosis. IEEE Trans. on CAD, 24(4):1514–1545, 2005.
[Mar99]
J.P. Marques-Silva. The impact of branching heuristics in propositional satisfiability algorithms. In 9th Portuguese Conference on Artificial Intelligence (EPIA), 1999.
[MBM01]
L. Macchiarulo, L. Benini, and E. Macii. On-the-fly layout generation for PTL macrocells. In Design, Automation and Test in Europe, pages 546–551, 2001.
[McM93]
K.L. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993.
[McM02]
K. L. McMillan. Applying SAT methods in unbounded symbolic model checking. In Computer Aided Verification, volume 2404 of LNCS, pages 250–264, July 2002.
[Min02]
S. Minato. Streaming BDD manipulation. IEEE Trans. on Comp., 51(5):474–485, 2002.
[MMM02]
J. Mohnke, P. Molitor, and S. Malik. Limits of using signatures for permutation independent Boolean comparison. Formal Methods in System Design: An International Journal, 2(21):167–191, 2002.
[MMMC05]
M. Moy, F. Maraninchi, and L. Maillet-Contoz. PINAPA: An extraction tool for SystemC descriptions of systems-on-a-chip. In ACM International Conference on Embedded Software (EMSOFT), pages 317–324, 2005.
[MMZ+01]
M.W. Moskewicz, C.F. Madigan, Y. Zhao, L. Zhang, and S. Malik. Chaff: Engineering an efficient SAT solver. In Design Automation Conf., pages 530–535, 2001.
[MP91]
Z. Manna and A. Pnueli. The Temporal Logic of Reactive and Concurrent Systems. Springer-Verlag, 1991.
[MP01]
A. Mishchenko and M. Perkowski. Fast heuristic minimization of exclusive sums-of-products. In Int’l Workshop on Applications of the Reed-Muller Expansion in Circuit Design, pages 242–250, 2001.
[MRH+01]
W. Müller, J. Ruf, D. Hoffmann, J. Gerlach, T. Kropf, and W. Rosenstiel. The simulation semantics of SystemC. In Design, Automation and Test in Europe, pages 64–70, 2001.
[MRR03]
W. Müller, W. Rosenstiel, and J. Ruf, editors. SystemC Methodologies and Applications. Kluwer Academic Publishers, 2003.
[MS96]
J.P. Marques-Silva and K.A. Sakallah. GRASP – a new search algorithm for satisfiability. In Int’l Conf. on CAD, pages 220–227, 1996.
[MS98]
C. Meinel and H. Sack. ⊕-OBDDs – a BDD structure for probabilistic verification. In Workshop on Probabilistic methods in Verification, pages 141–151, 1998.
[MS99]
J.P. Marques-Silva and K.A. Sakallah. GRASP: A search algorithm for propositional satisfiability. IEEE Trans. on Comp., 48(5):506–521, 1999.
[MSML99]
A. Mukherjee, R. Sudhakar, M. Marek-Sadowska, and S. Long. Wave steering in YADDs: A novel non-iterative synthesis and layout technique. In Design Automation Conf., pages 466–471, 1999.
[NE01]
J.W. Nimmer and M.D. Ernst. Static verification of dynamically detected program invariants: Integrating Daikon and ESC/Java. In Workshop on Runtime Verification, volume 55 of Electronic Notes in Theoretical Computer Science. Elsevier, 2001.
[NP91]
T.M. Niermann and J.H. Patel. HITEC: A test generation package for sequential circuits. In European Conf. on Design Automation, pages 214–218, 1991.
[Par97]
T. Parr. Language Translation using PCCTS and C++: A Reference Guide. Automata Publishing, 1997.
[PC90]
M.A. Perkowski and M. Chrzanowska-Jeske. An exact algorithm to minimize mixed-radix exclusive sums of products for incompletely specified Boolean functions. In Int’l Symp. Circ. and Systems, pages 1652–1655, 1990.
[PK00]
V. Paruthi and A. Kuehlmann. Equivalence checking combining a structural SAT-solver, BDDs, and simulation. In Int’l Conf. on Comp. Design, pages 459–464, 2000.
[Pnu77]
A. Pnueli. The temporal logic of programs. In IEEE Symposium on Foundations of Computer Science, pages 46–57, Providence, RI, 1977.
[PQ95]
T.J. Parr and R.W. Quong. ANTLR: A predicated-LL(k) parser generator. Software – Practice and Experience, 25(7):789–810, 1995.
[PR90]
A.K. Pramanick and S.M. Reddy. On the design of path delay fault testable combinational circuits. In Int’l Symp. on Fault-Tolerant Comp., pages 374–381, 1990.
[PR95]
I. Pomeranz and S.M. Reddy. On correction of multiple design errors. IEEE Trans. on CAD, 14(2):255–264, 1995.
[PW03]
B. Peischl and F. Wotawa. Modeling state in software debugging of VHDL-RTL designs – a model based diagnosis approach. In Automated and Algorithmic Debugging (AADEBUG 2003), pages 197–210, 2003.
[RBKM91]
D.E. Ross, K.M. Butler, R. Kapur, and M.R. Mercer. Fast functional evaluation of candidate OBDD variable ordering. In European Conf. on Design Automation, pages 4–9, 1991.
[RDO02]
S. Reda, R. Drechsler, and A. Orailoglu. On the relation between SAT and BDDs for equivalence checking. In Int’l Symp. on Quality Electronic Design, pages 394–399, 2002.
[Rei87]
R. Reiter. A theory of diagnosis from first principles. Artificial Intelligence, 32:57–95, 1987.
[Rot66]
J.P. Roth. Diagnosis of automata failures: A calculus and a method. IBM J. Res. Dev., 10:278–281, 1966.
[RR03]
M. Renieris and S. P. Reiss. Fault localization with nearest neighbor queries. In International Conference on Automated Software Engineering, pages 30–39, Montreal, Canada, October 2003.
[RS04]
K. Ravi and F. Somenzi. Minimal assignments for bounded model checking. In Tools and Algorithms for the Construction and Analysis of Systems, volume 2988 of LNCS, pages 31–45, 2004.
[Rud93]
R. Rudell. Dynamic variable ordering for ordered binary decision diagrams. In Int’l Conf. on CAD, pages 42–47, 1993.
[SBSV96]
P. Stephan, R.K. Brayton, and A.L. Sangiovanni-Vincentelli. Combinational test generation using satisfiability. IEEE Trans. on CAD, 15:1167–1176, 1996.
[SFBD06]
S. Staber, G. Fey, R. Bloem, and R. Drechsler. Automatic fault localization for property checking. In Haifa Verification Conference, volume 4383 of LNCS, pages 50–64, 2006.
[SFD05a]
J. Shi, G. Fey, and R. Drechsler. Bridging fault testability of BDD circuits. In ASP Design Automation Conf., pages 188–191, 2005.
[SFD+05b]
J. Shi, G. Fey, R. Drechsler, A. Glowatz, F. Hapke, and J. Schlöffel. PASSAT: Efficient SAT-based test pattern generation for industrial circuits. In IEEE Annual Symposium on VLSI, pages 212–217, 2005.
[SFVD05]
S. Safarpour, G. Fey, A. Veneris, and R. Drechsler. Utilizing don’t care states in SAT-based bounded sequential problems. In Great Lakes Symp. VLSI, pages 264–269, 2005.
[Sht01]
O. Shtrichman. Pruning techniques for the SAT-based bounded model checking problem. In Conference on Correct Hardware Design and Verification, volume 2144 of LNCS, pages 58–70, 2001.
[SJB05]
S. Staber, B. Jobstmann, and R. Bloem. Finding and fixing faults. In Conference on Correct Hardware Design and Verification, LNCS, pages 35–49, 2005.
[SKWS+04]
C. Schulz-Key, M. Winterholer, T. Schweizer, T. Kuhn, and W. Rosenstiel. Object-oriented modeling and synthesis of SystemC specifications. In ASP Design Automation Conf., pages 238–243, 2004.
[Smi85]
G.L. Smith. Model for delay faults based upon paths. In Int’l Test Conf., pages 342–349, 1985.
[Smi04]
A. Smith. Diagnosis of combinational logic circuits using Boolean satisfiability. Master’s thesis, University of Toronto, Canada, 2004.
[Som01a]
F. Somenzi. CUDD: CU Decision Diagram Package Release 2.3.1. University of Colorado at Boulder, 2001.
[Som01b]
F. Somenzi. Efficient manipulation of decision diagrams. Software Tools for Technology Transfer, 3(2):171–181, 2001.
[SSL+92]
E. Sentovich, K. Singh, L. Lavagno, Ch. Moon, R. Murgai, A. Saldanha, H. Savoj, P. Stephan, R. Brayton, and A. Sangiovanni-Vincentelli. SIS: A system for sequential circuit synthesis. Technical report, University of California, Berkeley, 1992.
[Str04]
O. Strichman. Accelerating bounded model checking of safety properties. Formal Methods in System Design, 24(1):5–24, January 2004.
[STS87]
M. Schulz, E. Trischler, and T. Sarfert. SOCRATES: A highly efficient automatic test pattern generation system. In Int’l Test Conf., pages 1016–1026, 1987.
[SVV04]
A. Smith, A. Veneris, and A. Viglas. Design diagnosis using Boolean satisfiability. In ASP Design Automation Conf., pages 218–223, 2004.
[SW93]
D. Sieling and I. Wegener. Reduction of BDDs in linear time. Information Processing Letters, 48(3):139–144, November 1993.
[Syn02]
Synopsys. Describing Synthesizable RTL in SystemC™, Vers. 1.1. Synopsys Inc., 2002. Available at http://www.synopsys.com.
[Tse68]
G. Tseitin. On the complexity of derivation in propositional calculus. In Studies in Constructive Mathematics and Mathematical Logic, Part 2, pages 115–125, 1968. (Reprinted in: J. Siekmann, G. Wrightson (Ed.), Automation of Reasoning, Vol. 2, Springer, Berlin, pages 466–483, 1983.)
[TSH94]
M. Tomita, N. Suganuma, and K. Hirano. Pattern generation for locating logic design errors. IEICE Trans. Fundamentals, E77-A(5):881–893, 1994.
[TYSH94]
M. Tomita, T. Yamamoto, F. Sumikawa, and K. Hirano. Rectification of multiple logic design errors in multiple output circuits. In Design Automation Conf., pages 212–217, 1994.
[Uba03]
R. Ubar. Design error diagnosis with re-synthesis in combinational circuits. Jour. of Electronic Testing: Theory and Applications, 19:73–82, 2003.
[VF97]
S. Venkataraman and W. K. Fuchs. A deductive technique for diagnosis of bridging faults. In Int’l Conf. on CAD, pages 562–567, 1997.
[VH99]
A. Veneris and I. N. Hajj. Design error diagnosis and correction via test vector simulation. IEEE Trans. on CAD, 18(12):1803–1816, 1999.
[VIS96]
The VIS Group. VIS: A system for verification and synthesis. In Computer Aided Verification, volume 1102 of LNCS, pages 428–432. Springer Verlag, 1996.
[VSA03]
A. Veneris, A. Smith, and M. S. Abadir. Logic verification based on diagnosis techniques. In ASP Design Automation Conf., 2003.
[WA73]
M.J.Y. Williams and J.B. Angell. Enhancing testability of large-scale integrated circuits via test points and additional logic. IEEE Trans. on Comp., C-22(1):46–60, 1973.
[WB95]
A. Wahba and D. Borrione. Design error diagnosis in sequential circuits. In Conference on Correct Hardware Design and Verification, volume 987 of LNCS, pages 171–188. Springer, 1995.
[WKS01]
J. Whittemore, J. Kim, and K. Sakallah. SATIRE: A new incremental satisfiability engine. In Design Automation Conf., pages 542–545, 2001.
[Wot02]
F. Wotawa. Debugging hardware designs using a value-based model. Applied Intelligence, 16:71–92, 2002.
[WTSF04]
K. Winkelmann, H.-J. Trylus, D. Stoffel, and G. Fey. Cost-efficient block verification for a UMTS up-link chip-rate coprocessor. In Design, Automation and Test in Europe, volume 1, pages 162–167, 2004.
[Yan91]
S. Yang. Logic synthesis and optimization benchmarks user guide. Technical Report 1/95, Microelectronic Center of North Carolina, 1991.
[ZH02]
A. Zeller and R. Hildebrandt. Simplifying and isolating failure-inducing input. IEEE Transactions on Software Engineering, 28(2):183–200, February 2002.
[ZMMM01]
L. Zhang, C.F. Madigan, M.H. Moskewicz, and S. Malik. Efficient conflict driven learning in a Boolean satisfiability solver. In Int’l Conf. on CAD, pages 279–285, 2001.
INDEX OF SYMBOLS
·  Boolean AND
+  Boolean OR
$\overline{\cdot}$  Negation of ·
$\nu_g[t]$  Value of gate g at time step t
ϕ  Variable order
π  Variable order
ψ  Boolean expression
Ψ  LTL formula
ω  Boolean expression
Ω  LTL formula (usually a subformula)
B  Set of Boolean values {0, 1}
C  Circuit
C  Set of candidate gates
f  Boolean function
$f_x$  Positive cofactor of f wrt. x
$f_{\overline{x}}$  Negative cofactor of f wrt. x
g  Gate in a circuit C (i.e. a node in the graph)
g[t]  Propositional variable representing gate g at time step t (e.g. in a CNF formula)
$G_f^\pi$  BDD of f wrt. π, f and/or π may be omitted when clear from the context
i  General indexing variable
j  General indexing variable
k  Upper (lower) limit for a minimization (maximization) problem
l  Number of latches in a circuit C
$M_0(w)$  Set of predecessors of a node w in a BDD that have a CE to w
$M_1(w)$  Set of predecessors of a node w in a BDD that have a non-CE to w
N  Set of next state elements in a circuit C
$n_i$  Next state element i in a circuit C
n  Number of variables in a Boolean function f, number of primary inputs of a circuit C
num  Number of all counterexamples
m  Number of outputs of Boolean function f, number of primary outputs of a circuit C
P  Set of predecessors of a node in a circuit C
$P_1(G_f^\pi)$  Number of one-paths in the BDD $G_f^\pi$
$P_0(G_f^\pi)$  Number of zero-paths in the BDD $G_f^\pi$
R  Time relation
S  Set of present state elements of a circuit C
$s_i$  Present state element i in a circuit C
t  Time reference
$t_{cyc}$  Length of a simulation trace or a property over a fixed time interval
T  Test-vector or counterexample
T  Simulation trace
U  Vector of signals in T
$u_t$  Vector of values at time t in T
v  Node in a graph, often in a BDD $G_f^\pi$ or a circuit C
w  Node in a graph, often in a BDD $G_f^\pi$ or a circuit C
X  Set of primary inputs of a C, set of variables of a BDD $G_f^\pi$
$x_i$  Primary input i of a circuit C or variable i in a BDD $G_f^\pi$
Y  Set of primary outputs of a circuit C
$y_i$  Output i of a Boolean function f, primary output i of a circuit C
$z_\Omega[t]$  Boolean variable corresponding to an LTL formula Ω at time t
INDEX
Ω, see LTL formula
Ψ, see LTL formula
ω, see Boolean expression
ϕ, see variable order
π, see variable order
ψ, see Boolean expression
$\psi_g[t]$, 28
$\nu_g[t]$, see value of a gate
abnormal predicate, 105, 135
abstract syntax tree, 141
Ackermann constraint, 138
AST, 141
ATPG, 31
automatic test pattern generation, see ATPG
B, 9
BasicSATDiagnose, 106
BasicSimDiagnose, 103
BCP, 15
BDD, 10
BDD circuit, 22
  simplification, 69
  testability, 68
binary decision diagram, see BDD
blif, 59
BMC, 27
Boolean
  expression, 9
  function, 9
Boolean constraint propagation, see BCP
Boolean Satisfiability, see SAT
bounded model checking, see BMC
BSAT, 106
BSIM, 103
C, see circuit
C, 102
CCE, 116
  heuristics, 123
  NP-completeness, 119
CE, 10
choosing counterexamples, see CCE
circuit, 19
clause, 14
CNF, 14
cofactor, 10
combinational circuit, 20
complemented edge, see CE
computed table, 13
conflict analysis, 14, 16
conflict-based learning, 18
conflict clause, 16
conflict-driven assertion, 16
conjunctive normal form, see CNF
controlling value, 20
counterexample, 26
COV, 107
D-algorithm, 34
debugging, 99, 130, see diagnosis
decision heuristic, 14
  for diagnosis, 140
diagnosis, 99
  complexity, 110
  problem, 102
  SAT-based, 104, 139
  simulation-based, 103, 140
DLL procedure, 14
DPLL procedure, 14
effect analysis, 102
Else, 10
empty intersection, 117
equivalence checking, 25
ESOP, 47
essential candidates, 102
expansion heuristic, 42
expansion node, 42
F, 20
f, see Boolean function
$f_x$, see cofactor
$f_{\overline{x}}$, see cofactor
FAN, 35
fault model, 31
$G_f^\pi$, 11
g, see gate
g[t], 28
gate, 20
HANNIBAL, 35
implication graph, 17
l, 20
Linear Time Logic, see LTL
literal, 14
LTL, 27
  formula, 29
m, 9, 20
minimal intersection, 117
minimum cover, 117
minimum cover problem, 107
miter circuit, 25
MuTaTe, 66
N, 20
n, 9, 19
non-chronological backtracking, 18
P, 20
ParSyC, 53
path delay fault model, see PDFM
path tracing, 103
PDFM, 31
PODEM, 35
property
  diagnosis, 133
  generation, 78
  LTL, 29
  propositional, 29
propGen, 79
repairable gate, 137
robust test, 32
Rudell’s sifting, 12
S, 20
SAFM, 31
SAT, 13
  solver, 14
satisfiable, 14
SCDiagnose, 107
sequential circuit, 19
set cover problem, 107
Shannon decomposition, 10
simulation trace, 21
SOCRATES, 35
strong robust test, 32
stuck-at fault model, 31
SyCE, 53
synthesis for testability, 65
SystemC, 54
T, see simulation trace
T, see counterexample
T, 26
$t_{cyc}$, 21
test pattern, 32
testable fault, 33
Then, 10
time relation, 78
two literal watching scheme, 16
U, 21
unique table, 13
unsatisfiable, 14
untestable fault, 33
valid correction, 102, 138
value
  Boolean, 9
  of a gate, 20
variable order, 10
verification
  methodology, 87
  new methodology, 77
VSIDS strategy, 18
X, 9, 19
Y, 20
$z_\Omega[t]$, 29