Robustness and Usability in Modern Design Flows
by
Görschwin Fey, University of Bremen, Germany
and
Rolf Drechsler, University of Bremen, Germany
A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN 978-1-4020-6535-4 (HB) ISBN 978-1-4020-6536-1 (e-book)
Published by Springer, P.O. Box 17, 3300 AA Dordrecht, The Netherlands. www.springer.com
Printed on acid-free paper
All Rights Reserved © 2008 Springer Science + Business Media B.V.
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
To Liva Jolanthe and Luna Sophie
CONTENTS
Dedication
List of Figures
List of Tables
Preface
1. INTRODUCTION
2. PRELIMINARIES
   2.1 Boolean Reasoning
       2.1.1 Boolean Functions
       2.1.2 Binary Decision Diagrams
       2.1.3 Boolean Satisfiability
   2.2 Circuits
       2.2.1 Circuits and Traces
       2.2.2 BDD Circuits
       2.2.3 Transformation into CNF
   2.3 Formal Verification
       2.3.1 Equivalence Checking
       2.3.2 Bounded Model Checking
   2.4 Automatic Test Pattern Generation
       2.4.1 Fault Models
       2.4.2 Combinational ATPG
       2.4.3 Classical ATPG Algorithms
3. ALGORITHMS AND DATA STRUCTURES
   3.1 Combining SAT and BDD Provers
       3.1.1 Proof Techniques
       3.1.2 Hybrid Approach
       3.1.3 Experimental Results
   3.2 Summary and Future Work
4. SYNTHESIS
   4.1 Synthesis of SystemC
       4.1.1 SystemC
       4.1.2 SystemC Parser
       4.1.3 Characteristics
       4.1.4 Experimental Results
   4.2 Synthesis for Testability
       4.2.1 BDD Transformation
       4.2.2 Testability
       4.2.3 Experimental Results
   4.3 Summary and Future Work
5. PROPERTY GENERATION
   5.1 Detecting Gaps in Testbenches
       5.1.1 Generating Properties
       5.1.2 Selection of Properties
       5.1.3 Experimental Results
   5.2 Design Understanding
       5.2.1 Methodology
       5.2.2 Comparison to Other Techniques
       5.2.3 Work Flow
       5.2.4 Experimental Results
   5.3 Summary and Future Work
6. DIAGNOSIS
   6.1 Comparing SAT-based and Simulation-based Approaches
       6.1.1 Diagnosis Approaches
       6.1.2 Relation Between the Approaches
       6.1.3 Qualitative Comparison
       6.1.4 Experimental Results
   6.2 Generating Counterexamples for Diagnosis
       6.2.1 Choosing Good Counterexamples
       6.2.2 Heuristics to Choose Counterexamples
       6.2.3 Experimental Results
   6.3 Debugging Properties
       6.3.1 Other Diagnosis Approaches
       6.3.2 Diagnosis for Properties
       6.3.3 Source Level Diagnosis
       6.3.4 Experimental Results
   6.4 Summary and Future Work
7. SUMMARY AND CONCLUSIONS
References
Index of Symbols
Index
LIST OF FIGURES
1.1 Traditional design flow
1.2 Enhanced design flow
2.1 Example for a BDD
2.2 BDD Gπb
2.3 BDD Gϕb
2.4 DPLL procedure
2.5 Decision stack
2.6 Basic gates
2.7 Simulation trace for the shift-register
2.8 1-bit-shift-register
2.9 Multiplexor cell MUX
2.10 Example for a BDD circuit
2.11 Example for the conversion into CNF
2.12 Miter circuit for equivalence checking
2.13 SAT instance for BMC
2.14 1-bit-shift-register
2.15 Example for the SAFM
2.16 Boolean difference of the faulty circuit and the fault free circuit
2.17 Justification and propagation
3.1 Different approaches
3.2 Overview over different node types
3.3 Depth first traversal
3.4 Modified node structure
3.5 Solution to the 5-Queens problem
4.1 Synthesis part of the design flow
4.2 Data types in the process counter proc
4.3 Process counter proc of the robot controller from [GLMS02]
4.4 AST for Example 16
4.5 Overall synthesis procedure
4.6 Intermediate representation
4.7 Arbiter: Block-level diagram
4.8 Arbiter: Top-level module
4.9 Scalable FIR-filter: Block-level diagram
4.10 Generation of circuits from BDDs
4.11 Redundancy due to simplification
5.1 Verification part of the design flow
5.2 Integration into the verification flow
5.3 Sketch of the property generation
5.4 Simulation trace for the shift-register
5.5 1-bit-shift-register
5.6 Runs resulting in a valid property for misex3
5.7 Time needed for property generation for misex3
5.8 Current verification methodology
5.9 Proposed methodology
5.10 Application of property deduction
5.11 The arbiter
5.12 Code of the arbiter
6.1 Fault diagnosis in the design flow
6.2 Basic simulation-based diagnosis
6.3 Example of a sensitized path
6.4 SAT-based diagnosis
6.5 Basic SAT-based diagnosis
6.6 Diagnosis based on set cover
6.7 Example: COV may not provide a correction
6.8 Example: Solution for k = 2 by BSAT but not by COV
6.9 BSAT vs. COV: Average distance
6.10 BSAT vs. COV: Number of solutions
6.11 Circuit corresponding to the instance I1 of MI
6.12 Algorithm to build the subset circuit
6.13 Greedy algorithm to choose counterexamples
6.14 Number of candidates
6.15 Time for diagnosis
6.16 Faulty arbiter circuit
6.17 Circuit with gate g2 as diagnosis (Ω = req + ack + X ack, Ψ = ack + X ack)
6.18 State elements considered for Ackermann constraints
6.19 Pseudocode of the static decision strategy
6.20 Source code link
6.21 State machine for branch prediction
6.22 Source code for bpb
6.23 am2910: Runtime vs. number of diagnosed components
6.24 gcd: Runtime vs. number of diagnosed components
LIST OF TABLES
2.1 Transformation of an AND-gate into a CNF formula
3.1 Index of node types (32-bit)
3.2 Heuristics to limit the size of the hybrid structure
3.3 Selection of expansion nodes
3.4 ESOP minimization
4.1 Arbiter: Synthesis results
4.2 FIR-filter: Synthesis results
4.3 ISCAS 89: Synthesis results
4.4 Benchmarks before and after optimization by SIS
4.5 Path-delay fault coverage of BDD circuits
4.6 Path-delay fault coverage of BDD circuits optimized by sifting
5.1 Sequential benchmarks, tcyc = 1,000,000
5.2 Sequential benchmarks, tcyc = 100,000
6.1 Comparison of the approaches
6.2 Run time of the basic approaches
6.3 Quality of the basic approaches
6.4 Circuit data
6.5 Results using two counterexamples
6.6 Results using three counterexamples
6.7 Results using four counterexamples
6.8 Diagnosis results for multiple counterexamples and Ackermann constraints
6.9 Run times for the different approaches (using four counterexamples)
PREFACE
The size of technically producible integrated circuits increases continuously. But the ability to design and verify these circuits does not keep up with this development. Therefore, today’s design flow has to be improved to achieve a higher productivity. In this book the current design methodology and verification methodology are analyzed, a number of deficiencies are identified, and solutions are suggested. Improvements in the methodology as well as in the underlying algorithms are proposed. An in-depth presentation of preliminary concepts makes the book self-contained. Based on this foundation major design problems are targeted. In particular, a complete tool flow for Synthesis for Testability of SystemC descriptions is presented. The resulting circuits are completely testable and test pattern generation in polynomial time is possible. Verification issues are covered in even more detail. A whole new paradigm for formal design verification is suggested. This is based upon design understanding, the automatic generation of properties, and powerful tool support for debugging failures. All these new techniques are empirically evaluated and experimental results are provided. As a result, an enhanced design flow is created that provides more automation (i.e. better usability) and reduces the probability of introducing conceptual errors (i.e. higher robustness).
Acknowledgments
We would like to thank all members of the research group for computer architecture in Bremen for the helpful discussions and the great atmosphere during work and research. Furthermore, we would like to thank all our coauthors of the papers that make up an important part of this book: Roderick Bloem, Tim Cassens, Christian Genz, Daniel Große, Sebastian Kinder, Sean Safarpour, Stefan Staber, Andreas Veneris, and Tim Warode. Rüdiger Ebendt helped us in proofreading while unifying the notations. We would like to thank Lisa Teuber for designing the cover page. Antje Luchs patiently helped to improve the presentation for nonexperts.
Görschwin Fey and Rolf Drechsler
Bremen, September 2007
Chapter 1 INTRODUCTION
Almost every appliance used in daily life has an integrated circuit as a control unit. This applies not only to a modern television or a washing machine but also to cars or airplanes where security-critical tasks are controlled by circuits. Up to several hundred million gates are contained in such an integrated circuit – also called a “chip”. Moreover, the number of elements that are composed into a single chip doubles every 18 months according to Moore’s Law. This causes an exponentially increasing size of the problem instances that have to be handled during circuit design. Techniques and tools for computer-aided design (CAD) are available to create such complex systems. But often the tool development does not keep up with the progress in fabrication techniques. The result is the “design gap”: the size of the circuits that can be produced increases faster than the productivity of the design process.
One major issue is the robustness of the design tools. While a tool may produce an output of high quality within an acceptable run time for one design, this may not be the case for another design. Also, the performance of the tool cannot be predicted from the design itself. This behavior is not desirable while designing a circuit. But it is inherent to the problems solved by these tools. Many of these problems are computationally complex – often NP-complete – and, additionally, the size of the problem instances grows exponentially. For this reason, the underlying algorithms have to be continuously improved. This means reducing the run time of these algorithms while keeping or even improving the quality of the output.
A second reason for the design gap is the low usability of circuit design tools. Often high expertise and long experience are needed, e.g. to adjust the large number of parameters or to interpret the output optimally. By automating more tasks to help the designer and providing tools that are easy to use, these steps become easier and, as a result, the design productivity increases.
This book addresses both of these aspects: robustness and usability. For this purpose the current – in the following also called “traditional” – design flow is considered as a whole. A number of hot spots are identified where an improvement of either robustness or usability of the tools can significantly improve the overall productivity. Solutions to these methodological weaknesses are proposed. This leads to a new enhanced design flow based on the intensive use of formal methods. First, the traditional design flow is briefly reviewed and deficiencies are identified. Then, solutions for these deficiencies and the enhanced design flow are presented. This presentation is kept brief because the whole design flow is covered. A more detailed explanation of the problems and a motivation for the proposed solutions follow at the beginning of each chapter that addresses a particular problem.
The major steps of the traditional design flow are shown in Figure 1.1. The design process itself is sketched in the left part of the figure while the right part shows the verification procedures. Rounded boxes denote tasks and angular boxes denote input data or output data of these tasks. Initially, a specification of the circuit is written, usually as a textual document in natural language. This textual specification is then manually coded in two formal languages. An executable system description in terms of an ordinary software programming language (often C/C++) serves as an early system model to allow for the development of software and for simulation-based verification. Additionally, a synthesizable description in terms of a Hardware Description Language (HDL) is necessary. Both descriptions are usually coded independently. This redundancy in the design flow significantly extends the design time and, even worse, may lead to inconsistencies between the different design descriptions. Based on the HDL description, synthesis is carried out to retrieve the circuit description for production, i.e. a gate level or transistor level representation.
Simulation is applied to check the compliance of the system level description with the textual specification and with the synthesizable description of the system. A testbench is created manually to describe crucial scenarios that have to be considered during simulation. But the state space grows exponentially with the number of state elements. A design with only 100 state elements, for example, already has 2^100 states. Today's circuits often have more than 100,000 state elements. Therefore these dynamic verification approaches are inherently incomplete in the sense that neither all input scenarios nor all design states can be considered due to time limits. Formal property checking overcomes this weakness. The industrial application of property checking is at its beginning. The formal verification with respect to the textual specification of a 2 million gate design for UMTS data transfer was described in [WTSF04]. Formal equivalence checking is already state of the art to guarantee the correctness of subsequent synthesis steps if the synthesizable description of the design is
Figure 1.1. Traditional design flow
available. Equivalence checking has already replaced simulation-based methods in many industrial design flows. But all these methods only help to detect the existence of design errors. The localization of design errors currently remains a time-consuming manual task. As a last step, Automatic Test Pattern Generation (ATPG) is applied to calculate input stimuli for the postproduction
test. But during synthesis testability issues are usually not considered and, therefore, ATPG is difficult; the underlying problem is NP-complete.
In this book, several approaches are proposed to remove the deficiencies that exist in the traditional design flow. Combining these techniques yields a new enhanced design flow. The enhanced flow boosts the productivity of circuit design and thereby reduces the design gap. Formal techniques are used extensively for this purpose since it has been shown that they already improve the productivity of individual steps in the traditional flow. One reason is the high computational power of these techniques compared to nonsymbolic techniques such as simulation-based approaches.
As a starting point, the underlying algorithms for Boolean function manipulation are considered with respect to particular needs. Binary Decision Diagrams (BDDs) and Boolean Satisfiability (SAT) are the dominant engines in this area. Currently, efficiency, i.e. calculating a solution as fast as possible, is a major focus in the development of such algorithms. Increasing the robustness of the formal techniques is an important issue. This is achieved by combining concepts from BDDs and solvers for the SAT problem. The resulting integrated data structure makes it possible to trade BDD-like behavior for SAT-like behavior and, by this, to exploit the strengths of both domains. Additionally, the data structure can be used to investigate “more interesting” parts of the search space more thoroughly than others. Efficient Boolean function manipulation is the core of several techniques to improve the overall design flow.
The enhanced design flow itself is shown in Figure 1.2. Bold lines around boxes indicate the parts that are modified in comparison to the traditional design flow. As a first major improvement, the enhanced flow tightly couples the system level description and the synthesizable description. The two languages that are typically used – the software programming language and the HDL – are replaced by SystemC [LTG97, GLMS02] (see also http://www.systemc.org). SystemC is a description language that includes constructs to specify software and hardware at different levels of abstraction. As a result, the system level description can directly be refined into a synthesizable description within a single language. By this, the robustness of the design task is improved because the transformation of the system level model can be done more efficiently.
The improved refinement step is complemented by synthesis for testability. The proposed technique produces circuits that are fully testable under several fault models. Here, a representation of the function of the circuit as a BDD is used as a starting point. This functional representation is directly converted into a fully testable circuit. While ATPG is NP-complete in general, all faults can be classified in polynomial time on these circuits – a robust ATPG step is the result.
The weak simulation-based techniques for design verification are replaced by state-of-the-art formal techniques, namely property checking. The slow manual creation of properties is aided by automatically generating properties
Figure 1.2. Enhanced design flow
from simulation traces. This allows a new design methodology to be applied at this step. Properties are created interactively. The approach has a number of advantages. Automatically generated properties help to understand the simulation traces and, by this, the design itself. If the proof of these properties fails on the design, this also helps to identify gaps in the simulation traces. When
testbenches are used for simulation, this bridges the traditional flow and the enhanced flow. That is of great importance for the practical application. As a side effect, the formal properties verifying the synthesizable description of the system are created much faster when an interactive approach is used. By this, verification with respect to all input sequences and all states of a design can be done more easily – the usability of the verification tools is raised.
An inconsistency between different design descriptions is usually indicated by counterexamples, no matter which technique – simulation, property checking, or equivalence checking – or design step – design description or synthesis – is considered. Debugging this inconsistency, i.e. identifying the real error site in the description, is a time-consuming manual task. Here, using techniques for automatic fault diagnosis drastically boosts the productivity. Efficient state-of-the-art techniques for fault diagnosis are compared and a technique to improve the generation of counterexamples for diagnosis is presented. The extension of diagnosis methods for debugging errors that are detected by formal properties is also considered. In contrast to previous methods, no correct output response per counterexample has to be given in advance and the diagnosis results are presented at the source code level. Automatically helping the designer to find design errors reduces the difficulty of interpreting results from formal verification tools and, by this, increases the usability.
Altogether, the proposed techniques and verification methodology establish the enhanced design flow. Only equivalence checking is not further considered in this book. Robust and easy-to-use tools for this task are already state of the art in the industrial application. Finally, the main improvements of the enhanced design flow over the traditional design flow can be summarized as follows:
- Integration of SAT and BDDs for robust Boolean function manipulation
- Tight coupling of system-level description and synthesizable description
- Fully testable circuits
- Automatic generation of properties from simulation traces
- Detection of gaps in simulation traces
- Automatic debugging support
- Presentation of diagnosis results at the source code level
All techniques that are proposed have been implemented and empirically evaluated. They have been developed to the extent of a robust application on benchmark cases. Experimental results, a discussion of related work, and possible future extensions for the proposed techniques are presented in the respective
sections. Each chapter addresses a particular problem area. Due to the comprehensive coverage of the whole design flow, a more detailed explanation of the problem and a motivation of the proposed solution are given at the beginning of each chapter. There, the embedding into the overall flow is also shown. A summary, possible future extensions, and further related papers are given at the end of each chapter.
This book is structured as follows: In the second chapter, the basic notations and definitions are given for the different concepts to keep the presentation self-contained. In Chapter 3, improvements for underlying algorithms for Boolean reasoning are explained; namely, the integration of BDDs and SAT provers is investigated. Then, the synthesis step of the design flow is considered in Chapter 4. The technique to create fully testable circuits from SystemC descriptions is introduced. This is done in two steps. A tool to parse and synthesize a SystemC description is presented. The gate level circuit is then transformed into a fully testable circuit. Chapter 5 presents the techniques and methodology to improve the verification flow. First, the automatic generation of properties from traces is explained from a technical point of view and the practical application to detect gaps in testbenches is proposed. Then, the transition towards a whole new verification methodology based on design understanding and interactive creation of formal properties is discussed. Techniques for automatic diagnosis are reviewed in Chapter 6. Simulation-based diagnosis and SAT-based diagnosis are compared in detail. Then, the problem of producing counterexamples for increased diagnosis quality is examined from a theoretical and practical point of view. Next, a technique to aid debugging for property checking is presented. Based on counterexamples, the error candidates are automatically calculated at the source code level. In the last chapter, the contributions of this book are summarized and conclusions are presented.
Chapter 2 PRELIMINARIES
This chapter provides the necessary definitions and notations to keep the book self-contained. The complete design flow and, therefore, a wide area is covered ranging from Boolean reasoning and underlying techniques to applications like formal verification and ATPG. Therefore, the presentation is kept brief. A large number of books is available for an in-depth discussion of each topic. References to some of these books are given at the beginning of the respective sections.
2.1 Boolean Reasoning
In the following, the notations used for Boolean functions, Boolean expressions, binary decision diagrams, and Boolean satisfiability are briefly reviewed. A more detailed presentation can be found, e.g. in [HS96].
2.1.1 Boolean Functions
Notation 1. The set of Boolean values is given by B = {0, 1}. A Boolean function f is a mapping f : B^n → B. In the following, f is usually defined over the n variables X = {x1, . . . , xn}; this is denoted by f(x1, . . . , xn). A multi-output Boolean function is a mapping f : B^n → B^m.
A Boolean function can be described in terms of a Boolean expression. A Boolean expression over a set X = {x1, . . . , xn} is formed over
- the variables,
- the unary NOT operator, written as an overbar (e.g. x̄),
- the binary operators · (AND), + (OR), ⊕ (XOR), → (implication), ↔ (equivalence),
- parentheses.
Given a Boolean function f(x1, . . . , xn), the positive cofactor fxi and the negative cofactor fx̄i with respect to xi are defined as follows:
fxi(. . . , xi−1, xi+1, . . .) = f(. . . , xi−1, 1, xi+1, . . .)
fx̄i(. . . , xi−1, xi+1, . . .) = f(. . . , xi−1, 0, xi+1, . . .)
The iterative cofactor fli1 ... lij, where lik ∈ {xik, x̄ik}, is retrieved by iteratively calculating the cofactors fli1, (fli1)li2, up to (. . . ((fli1)li2) . . .)lij.
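For illustration, the following short Python sketch (not part of any tool described in this book; the names are chosen freely) builds both cofactors of a small example function and checks pointwise that the decomposition f = x1 · fx1 + x̄1 · fx̄1 holds.

from itertools import product

def cofactor(f, i, value):
    """Cofactor of a Boolean function f with variable x_i (1-based)
    fixed to 'value'; the result takes the remaining arguments."""
    def g(*args):
        full = args[:i - 1] + (value,) + args[i - 1:]
        return f(*full)
    return g

# Example function f(x1, x2, x3) = x1*x2 + (NOT x1)*x3
f = lambda x1, x2, x3: (x1 and x2) or ((not x1) and x3)
f_pos = cofactor(f, 1, 1)   # positive cofactor with respect to x1
f_neg = cofactor(f, 1, 0)   # negative cofactor with respect to x1

# Sanity check: the decomposition holds for every assignment
for x1, x2, x3 in product([0, 1], repeat=3):
    assert bool(f(x1, x2, x3)) == bool((x1 and f_pos(x2, x3)) or
                                       ((not x1) and f_neg(x2, x3)))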
2.1.2 Binary Decision Diagrams
As is well known, a Boolean function f : B^n → B can be represented by a Binary Decision Diagram (BDD), a directed acyclic graph G = (V, E) [Bry86]. The Shannon decomposition f = x̄ fx̄ + x fx is carried out in each of the internal nodes with respect to a given variable x. The function represented by an internal node is determined recursively by the two children. Terminal nodes represent the constant functions. Output nodes represent functions that are considered externally, i.e. user functions. A BDD is called ordered if each variable is encountered at most once on each path from the root to a terminal node and if the variables are encountered in the same order on all such paths. A BDD is called reduced if it contains neither isomorphic subgraphs nor redundant nodes. Reduced and ordered BDDs are a canonical representation since for each Boolean function the BDD is unique up to graph isomorphism [Bry86]. In the following, we refer to reduced and ordered BDDs as BDDs for brevity. By using Complemented Edges (CEs), the size of a BDD can be further reduced [BRB90]. In this book, both types of BDDs – with and without CEs – are considered.
Formally, the order of the n variables of a Boolean function can be given by mapping the variable index to a level in the graph G: π : {1, . . . , n} → {1, . . . , n}. The index n+1 is assigned to terminal nodes. A BDD with CEs has exactly one terminal node, denoted by 1. The function isTerminal(v) returns true if, and only if, v is a terminal node. Each internal node has two successors, denoted by Then(v) and Else(v), and v ∈ V is labeled with an index Index(v) ∈ {1, . . . , n}. Alternatively, the function Label(v) returns the variable of a node, i.e. Label(v) = xIndex(v). Due to the order π, the inequality
π(Index(v)) < min{π(Index(Then(v))), π(Index(Else(v)))}
always holds, i.e. a node is always above its children. Output nodes v ∈ V are labeled with Index(v) = 0; Label(v) is undefined. They always reside on
the topmost level 0. These nodes have exactly one successor Else(v). An edge e = (v, Then(v)) is never a CE. For edges e with e = (v, Else(v)) the attribute CE(e) is true if and only if e is a CE. By this, an output node v represents the Boolean function f or its complement f̄, respectively, where f is the Boolean function represented by Else(v). Output nodes are denoted by a function symbol in all figures. In the following, Gπf denotes a BDD representing the Boolean function f with respect to variable order π. If clear from the context, π and f are omitted. The size of a BDD refers to the number of nodes excluding terminal nodes and output nodes.
Example 1. Figure 2.1 shows the BDD for f = x1 x2 x3 + x1 x2 x4 + x1 x2 x3 x4 + x1 x2 x3 x4. Edges from a node w to Else(w) are dashed; edges to Then(w) are solid. A dot denotes a CE. The output node is denoted by the function symbol f. The BDD has a size of five.
The implementations handle BDDs with CEs. BDDs without CEs are considered in some examples to keep the presentation simple. They have two terminals 0 and 1 but no edge attributes. As a result, two different nodes are needed to represent a function and its complement. In the worst case a BDD
Figure 2.1. Example for a BDD
without CEs has twice the number of nodes compared to a BDD with CEs [BRB90].
The size of a BDD depends on the variable order.
Example 2. Bryant [Bry86] gave the function b = x1 xn+1 + x2 xn+2 + · · · + xn x2n and the two variable orders
π = (x1, xn+1, x2, xn+2, . . . , xn, x2n),
ϕ = (x1, x2, . . . , x2n−1, x2n)
as an example. Figures 2.2 and 2.3 show the BDDs for variable orders π and ϕ, respectively. The BDD of b has a size of O(n) when π is used, but the size is O(2^n) when ϕ is used.
The problem of deciding whether a given variable ordering can be improved is NP-complete [BW96]. Efficient heuristics have been proposed to find a good variable order for BDDs. For example, Rudell’s sifting algorithm [Rud93] is
Figure 2.2. BDD Gπb
Figure 2.3. BDD Gϕb
quite fast while techniques based on evolutionary algorithms usually yield better results at the cost of a higher run time [DBG96].
2.1.2.1 Efficient Implementation
Using BDDs in practice is relatively easy since efficient BDD packages are available, e.g. CUDD [Som01a]. A BDD node v is stored as a triple (Index(v), Then(v), Else(v)), where Then(v) and Else(v) are pointers to the memory locations of the children of v. The least significant bits of these pointers are always zero in modern computers that address the memory word-wise. To save memory, the attribute CE(v, Else(v)) is stored in the least significant bit of the pointer to Else(v). A hash table is used to uniquely store the triples representing all nodes. This hash table is called the unique table.
An advantage of BDDs is the efficiency of Boolean operations [Bry86]. Consider a Boolean operation ◦ ∈ {·, +, ⊕, ↔}. Given two functions f and g, the result of f ◦ g is calculated as follows:
f ◦ g = (x̄ fx̄ + x fx) ◦ (x̄ gx̄ + x gx) = x̄ (fx̄ ◦ gx̄) + x (fx ◦ gx)    (2.1)
Given the BDD nodes representing the functions f and g, this corresponds to the construction of a node to represent f ◦ g. This node is determined by recursively calculating the result of the operation on the children. By using the unique table, an existing node is reused if the function was already represented within the BDD package. Otherwise, a new node is created. This guarantees that only reduced BDDs are created and no additional reduction step is necessary. A second hash, the computed table, is used to efficiently carry out the recursive descent. The computed table is accessed via the operands and the operation as key. The value stores the result of the operation. Each time a result is calculated this is stored in the computed table. Therefore, each pair of nodes is only considered once with respect to a particular binary Boolean operation.
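As an illustration of Equation (2.1) and of the two hash tables, the following Python sketch implements the recursive apply operation on a deliberately simplified BDD representation (two terminal nodes, no complemented edges). It only models the mechanism and is not the implementation of a package such as CUDD; all names are chosen for this example.

unique_table = {}     # (index, then, else) -> node   ("unique table")
computed_table = {}   # (op, f, g) -> result           ("computed table")

def mk(index, t, e):
    """Create, or reuse, a reduced BDD node; terminals are the integers 0 and 1."""
    if t == e:                          # redundant node, skip it
        return t
    return unique_table.setdefault((index, t, e), (index, t, e))

def apply_op(op, f, g):
    """Compute f op g recursively, following Equation (2.1)."""
    if isinstance(f, int) and isinstance(g, int):        # both terminal
        return op(f, g)
    key = (op, f, g)
    if key not in computed_table:
        # expand around the topmost variable occurring in f or g
        top = min(x[0] for x in (f, g) if not isinstance(x, int))
        f1, f0 = (f[1], f[2]) if not isinstance(f, int) and f[0] == top else (f, f)
        g1, g0 = (g[1], g[2]) if not isinstance(g, int) and g[0] == top else (g, g)
        computed_table[key] = mk(top, apply_op(op, f1, g1), apply_op(op, f0, g0))
    return computed_table[key]

# Usage: x1 AND x2, with x1 ordered above x2
x1 = mk(1, 1, 0)
x2 = mk(2, 1, 0)
print(apply_op(lambda a, b: a & b, x1, x2))   # (1, (2, 1, 0), 0)

The unique table guarantees that structurally identical nodes are created only once, so the result is reduced by construction; the computed table ensures that each pair of operands is processed at most once per operation.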
2.1.3 Boolean Satisfiability
Besides BDDs, solvers for Boolean Satisfiability (SAT) provide a powerful reasoning engine for Boolean problems. In CAD of circuits and systems, problems are frequently transformed into an instance of SAT. Then, a SAT solver can be used to calculate a solution and the SAT solution is transformed back into the original problem domain. In particular, SAT solvers are often used as the underlying engine for formal verification. In this work, SAT solvers are applied for diagnosis. Moreover, the concepts will be important when considering the underlying algorithms for Boolean function manipulation. The SAT problem and efficient algorithms to solve a given SAT instance are reviewed in this section.
Given a Boolean function f(x1, . . . , xn) in conjunctive normal form, the SAT problem is to find an assignment a = a1, . . . , an for x1, . . . , xn such that f(a1, . . . , an) = 1 or to prove that no such assignment exists. For the corresponding decision problem, only the question whether such an assignment a exists has to be answered. This was the first problem that was proven to be NP-complete [Coo71]. Despite this proven difficulty of the problem, algorithms for SAT solving have been proposed recently that efficiently solve many practical SAT instances.
2.1.3.1 SAT Solver
SAT solvers usually work on a database that represents the Boolean formula in Conjunctive Normal Form (CNF), also called product of sums. A CNF formula is a conjunction (product) of clauses where each clause is a disjunction (sum) of literals. Finally, a literal is a variable or its complement. The objective during SAT solving is to find a satisfying assignment for the given Boolean formula or to prove that no such assignment exists. A CNF formula is satisfied if all clauses are satisfied. A clause is satisfied if at least one literal in the clause is satisfied. The literal x is satisfied if the value 1 is assigned to variable x. The literal x̄ is satisfied if the value 0 is assigned to variable x. If there exists a satisfying assignment for a formula, the formula is said to be satisfiable, otherwise the formula is unsatisfiable.
Example 3. The following Boolean formula is given in CNF:
f(x1, x2, x3, x4) = w1 · w2 · w3 · w4 · w5
with w1 = (x1 + x̄3 + x4), w2 = (x1 + x̄3 + x̄4), w3 = (x1 + x3 + x4), w4 = (x1 + x3 + x̄4), and w5 = (x̄1 + x̄2 + x3).
This CNF formula has five clauses w1 , . . . , w5 . A satisfying assignment for the formula is given by x1 = 1, x2 = 1, x3 = 1 and x4 = 0. Therefore this formula is satisfiable. Modern SAT solvers are based on the DLL procedure that was first introduced in [DLL62] as an improvement upon [DP60]. Often the DLL procedure is also referred to as DPLL. In principle, this algorithm explores the search space of all assignments by a backtrack search as shown in Figure 2.4. Iteratively, a decision is done by choosing a variable and a value for this variable according to a decision heuristic (Step 1). Then, implications due to this assignment are carried out (Step 2). When all clauses are satisfied, the problem is solved (Step 3). Otherwise, the current assignment may only be partial and therefore no conclusion is possible, yet. In this case, further assignments are necessary (Step 4). If at least one clause cannot be satisfied under the current (partial) assignment, conflict analysis is carried out as will be explained below.
1. Decision: Choose an unassigned variable and assign a new value to the variable.
2. Boolean Constraint Propagation: Carry out implications resulting from the previous assignment.
3. Solution: If all clauses are satisfied, output the current variable assignment and return “satisfiable.”
4. If there is no unsatisfied clause due to the current assignment, proceed with Step 1.
5. Conflict analysis: If the current assignment leads to at least one unsatisfied clause without unassigned literals, carry out conflict analysis and add conflict clauses.
6. (Non-chronological) Backtracking: Undo the most recent decision where switching the variable could lead to a solution, undo all implications due to this assignment and switch the variable value. Go to Step 2.
7. Unsatisfiable: Return “unsatisfiable.”
Figure 2.4. DPLL procedure
Then, a new branch in the search tree is explored by switching the variable value (Step 6). When there is no decision to undo, the search space has been completely explored and the instance is unsatisfiable (Step 7).
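The following Python sketch is a minimal recursive variant of the procedure of Figure 2.4; it performs decisions and Boolean constraint propagation on unit clauses, but none of the advanced techniques discussed in the next subsection. Literals are encoded as signed integers in the common DIMACS style; all identifiers are illustrative and the sketch is not the solver used in this book.

def dpll(clauses, assignment=None):
    """Minimal recursive DPLL. Clauses are frozensets of non-zero integers:
    literal i stands for x_i, literal -i for its complement. Returns a
    satisfying assignment (dict: variable -> bool) or None."""
    if assignment is None:
        assignment = {}

    def simplify(cls, lit):
        out = []
        for c in cls:
            if lit in c:               # clause satisfied, drop it
                continue
            if -lit in c:              # remove the falsified literal
                c = c - {-lit}
                if not c:              # empty clause: conflict
                    return None
            out.append(c)
        return out

    # Step 2: Boolean constraint propagation on unit clauses
    while True:
        units = [next(iter(c)) for c in clauses if len(c) == 1]
        if not units:
            break
        for lit in units:
            assignment[abs(lit)] = lit > 0
            clauses = simplify(clauses, lit)
            if clauses is None:
                return None            # conflict during propagation
    if not clauses:                    # Step 3: every clause satisfied
        return assignment

    # Step 1: decision, then recurse; Step 6: try the switched value
    lit = next(iter(clauses[0]))
    for choice in (lit, -lit):
        reduced = simplify(clauses, choice)
        if reduced is not None:
            branch = dict(assignment)
            branch[abs(choice)] = choice > 0
            result = dpll(reduced, branch)
            if result is not None:
                return result
    return None                        # Step 7: unsatisfiable

# The five clauses w1 .. w5 of Example 3
cnf = [frozenset(c) for c in ([1, -3, 4], [1, -3, -4],
                              [1, 3, 4], [1, 3, -4], [-1, -2, 3])]
print(dpll(cnf))                       # prints a satisfying assignment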
2.1.3.2 Advances in SAT
Only after some substantial improvements over the basic DPLL procedure in the recent past did SAT solvers become a powerful engine for solving real-world problems. In particular, these improvements were: efficient Boolean Constraint Propagation (BCP), conflict analysis together with non-chronological backtracking, and sophisticated decision heuristics.
BCP carries out implications due to previous decisions. In order to satisfy a CNF formula, all clauses must be satisfied. Now, assume that under the current partial assignment all but one literal in a clause evaluate to 0 and the variable of the last literal is unassigned. Then, the value of this last variable can be implied in order to evaluate the clause to 1.
Example 4. Again, consider the CNF formula from Example 3. Assume the partial assignment x1 = 1 and x2 = 1. Then, due to clause w5 = (x̄1 + x̄2 + x3) the assignment x3 = 1 can be implied.
After each decision BCP has to be carried out and, therefore, the efficiency of this procedure is crucial for the overall performance. In [MMZ+ 01] an efficient architecture for BCP was presented for the SAT solver Chaff (the source code of the implementation Zchaff can be downloaded from [Boo04]). The basic idea is to use the two-literal watching scheme to efficiently detect where an implication may be possible. Two literals of each clause are watched. Only if one of these literals evaluates to 0 upon a previous decision and the other literal is unassigned may an implication occur for the clause. If no implication occurs because there is a second unassigned literal, this second literal is watched. For each literal a watching list is stored to efficiently access those clauses where the particular literal is watched. Therefore, instead of always touching all clauses in the database, only those clauses that may cause an implication are considered.
Conflict analysis was first proposed in [MS96, MS99] for the SAT solver GRASP. In the traditional DPLL procedure only the most recent decision was undone when a conflict, i.e. a clause that is unsatisfied under the current assignment, was detected. In contrast, a modern SAT solver analyzes such a conflict. During BCP, a conflict occurs if opposite values are implied for a single variable due to different clauses. Then, the decisions that were responsible for this conflict are detected. These decisions are the reason for the conflict. From this reason a conflict clause is created to prevent the solver from reentering the same search space. As soon as all but one literal of the conflict clause are assigned, BCP takes over and implies the value of the remaining literal. As a result, the previously conflicting assignment is not considered again. Finally, the SAT solver backtracks to the decision before the last decision that participated in the conflict. Switching the value of the last decision that led to the conflict is done by BCP due to the inserted conflict clause. So this value assignment becomes an implication instead of a decision – also called conflict driven assertion.
Example 5. Again, consider the CNF formula from Example 3:
f(x1, x2, x3, x4) = (x1 + x̄3 + x4) · (x1 + x̄3 + x̄4) · (x1 + x3 + x4) · (x1 + x3 + x̄4) · (x̄1 + x̄2 + x3)
Each time the SAT solver makes a decision, this decision is pushed onto the decision stack. Now, assume that the first decision at decision level L0 is the assignment x1 = 0. No implications follow from this decision. Then, the solver decides x2 = 0 at L1. Again, no implications follow. The solver decides x3 = 1 at L2. Now, according to clause w1 the assignment x4 = 1 is implied,
Figure 2.5. Decision stack
but also, due to w2 , the assignment x4 = 0 is implied. Therefore, a conflict with respect to variable x4 occurs. This situation is shown in Figure 2.5(a). The decision stack is shown on the left hand side. The solver tracks reasons for assignments using an implication graph (shown on the right hand side). Each node represents an assignment. Decisions are represented by nodes without predecessors. Each implied assignment has the reason that caused the assignment as its predecessors. The edges are labeled by the clauses that cause an assignment. In the example, the decisions x1 = 0 and x3 = 1 caused the assignment x4 = 1 due to clause w1 . Additionally, this caused the assignment x4 = 0 due to w2 and a conflict results. By traversing the graph backwards, the reason for the conflict, i.e. x1 = 0 and x3 = 1, can be determined. Now, it is known that this assignment must be avoided in order to satisfy the CNF formula. This information is stored by adding the conflict clause
w6 = (x1 + x3 ) to the CNF formula. Thus, the nonsolution space is recognized earlier while searching – this is also called conflict based learning. The decision x3 = 1 is undone. Due to x1 = 0 and the conflict clause w6 , the assignment x3 = 0 is implied which is called a conflict driven assertion. The implication x3 = 0 triggers a next conflict with respect to x4 as shown in Figure 2.5(b). The single reason for this conflict is the decision x1 = 0. So the conflict clause w7 = (x1 ) is added. Now, the solver backtracks above decision level L0. This happens because the decision x2 = 0 was not a reason for the conflict. Instead, nonchronological backtracking occurs – the solver undoes any decision up to the most recent decision that was involved in the conflict. Therefore, in the example, the decisions x2 = 0 and x1 = 0 are undone. Due to the conflict clause w7 , the assignment x1 = 1 is implied independent of any decision as shown in Figure 2.5(c). Then, the decision x2 = 1 is done at L0. For efficiency reasons the SAT solver does not check whether all clauses are satisfied under this partial assignment but only detects conflicts. Finally, a satisfying assignment is found by deciding x4 = 0 at L1. In summary, this example shows on an informal basis how a modern SAT solver carries out conflict analysis and uses conflict clauses “to remember” nonsolution spaces. A large number of added conflict clauses may result in memory problems. This is resolved by removing conflict clauses from time to time which does not change the initial problem instance. A formal and more detailed presentation of the technique can be found in [MS99]. The algorithms to derive conflict clauses have been further improved, e.g. in [ZMMM01, ES04]. A result of this learning is a drastic speed-up of the solving process – in particular, also for unsatisfiable formulas. The last major improvement of SAT solvers results from sophisticated decision heuristics. Basically, the SAT solver dynamically collects statistics about the occurrence of literals in clauses. A dynamic procedure is used to keep track of conflict clauses added during the search. An important observation is that locality is achieved by exploiting recently learned information. This helps to speed up the search. An example is the Variable State Independent Decaying Sum (VSIDS) strategy employed in [MMZ+ 01]. A counter exists for each literal to count the number of occurrences in clauses. Each time a conflict clause is added, the counters are incremented accordingly. The value of these counters is regularly divided by two. This helps to emphasize the influence of more recently learned clauses. But a large number of other heuristics has also been investigated, e.g. in [Mar99, GN02, JS05]. Another ingredient to modern SAT solvers is a powerful preprocessing step as proposed in [Dre04, EB05, JS05, EMS07]. The original CNF formula is usually a direct mapping of the problem onto a CNF representation. No optimizations are carried out, e.g. clauses with only one literal are frequently
contained in this original CNF formula, but these can be eliminated without changing the solution space. When preprocessing the CNF formula, optimizations are applied to make the representation more compact and to improve the performance of BCP. Due to these advances, SAT solvers have become the state of the art for solving a large range of problems in CAD, e.g. formal verification [BCCZ99, KPKG02], debugging or diagnosis [SVV04, ASV+ 05, FSVD06], and test pattern generation [SBSV96, SFD+ 05b].
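To make the interface to such solvers concrete, the CNF formula of Example 3 (with the clause polarities as given there) can be written in the DIMACS file format accepted by solvers such as GRASP and Zchaff mentioned above. The small Python sketch below only produces this file; the file name is chosen freely for the illustration.

# DIMACS format: header "p cnf <#variables> <#clauses>", then one
# 0-terminated clause per line, negative numbers denoting complemented literals.
dimacs = """\
c clauses w1 .. w5 from Example 3
p cnf 4 5
1 -3 4 0
1 -3 -4 0
1 3 4 0
1 3 -4 0
-1 -2 3 0
"""
with open("example3.cnf", "w") as cnf_file:   # file name chosen freely
    cnf_file.write(dimacs)
# A solver reports SATISFIABLE together with a model such as
# x1 = 1, x2 = 1, x3 = 1, x4 = 0 (the assignment given in Example 3).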
2.2 Circuits
Circuits are considered throughout the design flow. Often formal definitions for circuits aim at a special purpose like synthesis [HS96] or ATPG [KS97]. In this book a more general definition is used to also cope with different tasks, like verification, simulation, and diagnosis. After defining circuits for the sequential and the combinational case, the mapping of a BDD to a circuit is introduced. Finally, the transformation of a circuit into a CNF formula which is necessary when applying a SAT solver is explained.
2.2.1 Circuits and Traces
A circuit is usually composed of the elements of a set of basic gates. This set of gates is called a library. One example of such a library is the set of gates shown in Figure 2.6. These are the well-known gates that correspond to Boolean operators: AND, OR, XOR, and NOT. If necessary, it is straightforward to extend this library to other Boolean gates. In the following, the library usually consists of all Boolean functions with a single output. Where necessary, the library may be restricted to consider only a subset of all Boolean functions. The connections between gates are defined by an underlying graph structure. Additionally, for gates that represent nonsymmetric functions (e.g. multiplexors) a unique order for the inputs is given by ordering the predecessors of a gate.
Definition 1. A sequential circuit is defined by C = (V, E, X, Y, S, N, F, P) where
- An acyclic directed graph G = (V, E) defines the connections
- X = {x1, . . . , xn} ⊆ V is the set of primary inputs
Figure 2.6. Basic gates (AND, OR, XOR, NOT)
- Y = {y1, . . . , ym} ⊆ V is the set of primary outputs
- S = {s1, . . . , sl} ⊆ V is the set of present state nodes
- N = {n1, . . . , nl} ⊆ V is the set of next state nodes
- F : V → (B* → B) associates a Boolean function fv = F(v) to a node v (projection functions of variables are assigned to input nodes and present state nodes)
- P : (V \ (X ∪ S)) → (V \ (Y ∪ N))* is an ordered tuple of predecessors of v: P(v) = (w1, . . . , wp)
Thus, P(v) describes the input variables of fv. A gate of a circuit C = (V, E, X, Y, S, N, F, P) is a node g ∈ V. This is often denoted by g ∈ C. The size of a circuit C is denoted by |C| and is equal to the number of gates |V|. For convenience, the output signal of a gate g is often referred to as signal g. If a propositional variable is needed for gate g, this variable is also denoted by g. For any gate g the Boolean function of g in terms of primary inputs and present state values is denoted by Fg. This function is retrieved by recursively substituting the variables in fg with the functions of predecessors of g.
Definition 2. A controlling value at the input of a gate determines the value of the output independently of the values at other inputs.
Example 6. The value 1 (0) is the controlling value for OR (AND), and the value 0 (1) is the non-controlling value for OR (AND). An XOR-gate does not have a controlling input value.
These notations can be extended to handle gates with multiple outputs and hierarchical circuits. The extension is straightforward and therefore omitted. All the practical implementations of the techniques presented in this book handle these cases when necessary.
Definition 3. A combinational circuit C = (V, E, X, Y, S, N, F, P) is a circuit without state elements, i.e. S = ∅ and N = ∅. A circuit with state elements may also be referred to as a sequential circuit. For brevity a combinational circuit C = (V, E, X, Y, ∅, ∅, F, P) may be denoted by C = (V, E, X, Y, F, P).
The value of gate g at time step t is denoted by νg[t]. If the value is unknown or not important, this may be denoted by the values ‘U’ or ‘−’, respectively.
This may be particularly useful to describe a counterexample where the values of some signals are not important to excite a malfunction. In this case νg[t] ∈ {0, 1, −, U}, but often νg[t] ∈ B is sufficient.
Definition 4. A simulation trace T of length tcyc for a circuit C is given by a tuple (U, (u0, . . . , utcyc−1)), where
- U = (g1, . . . , gr) is a vector of r gates gj ∈ C, 1 ≤ j ≤ r, and
- ut = (νg1[t], . . . , νgr[t]) gives the values of these gates at time step t.
Example 7. Consider the waveforms in Figure 2.7(a) produced by the sequential circuit in Figure 2.8. For synchronously clocked circuits as studied in this book, the waveform can directly be mapped into the vector notation that is shown in Figure 2.7(b). Together with the vector U = (x2, x1, s1, s2, s3) this forms the simulation trace T = (U, (u0, . . . , u5)). Thus, a simulation trace directly corresponds to a waveform, e.g. given in the widely used Value Change Dump (VCD) format that is specified in IEEE Std 1364-1995.
Figure 2.7. Simulation trace for the shift-register: (a) waveform, (b) vector representation of the waveforms
t    0  1  2  3  4  5
x2   0  0  0  0  0  0
x1   1  1  0  1  0  0
s1   0  1  1  0  1  0
s2   0  0  1  1  0  1
s3   0  0  0  1  1  0

Figure 2.8. 1-bit-shift-register
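Definitions 1-4 can be mirrored directly in a small data structure. The following Python sketch is one possible encoding, not the representation used by the tools in this book. It models the 1-bit-shift-register of Figure 2.8 under the assumption that the select input x2 switches between shifting (x2 = 0) and holding (x2 = 1); only the shifting case is exercised, since x2 = 0 throughout the trace, and the sketch reproduces the vector representation of Figure 2.7(b).

from dataclasses import dataclass, field

@dataclass
class Circuit:
    """Sequential circuit in the spirit of Definition 1: every node has a
    Boolean function over its ordered predecessors P(v); state nodes s_i
    are loaded from the next-state nodes n_i at each clock tick."""
    functions: dict = field(default_factory=dict)     # node -> fv
    predecessors: dict = field(default_factory=dict)  # node -> ordered inputs
    state: dict = field(default_factory=dict)         # present-state values

    def evaluate(self, inputs):
        values = dict(inputs, **self.state)
        def val(node):
            if node not in values:
                args = [val(w) for w in self.predecessors[node]]
                values[node] = self.functions[node](*args)
            return values[node]
        for node in self.functions:
            val(node)
        return values

    def tick(self, inputs, next_of):
        """One clock cycle: evaluate, then move next-state into state."""
        values = self.evaluate(inputs)
        self.state = {s: values[n] for s, n in next_of.items()}
        return values

# 1-bit-shift-register of Figure 2.8 (x1 is shifted through s1, s2, s3)
shift = Circuit(
    functions={"n1": lambda x2, x1, s1: s1 if x2 else x1,
               "n2": lambda x2, s1, s2: s2 if x2 else s1,
               "n3": lambda x2, s2, s3: s3 if x2 else s2,
               "y1": lambda s3: s3},
    predecessors={"n1": ("x2", "x1", "s1"),
                  "n2": ("x2", "s1", "s2"),
                  "n3": ("x2", "s2", "s3"),
                  "y1": ("s3",)},
    state={"s1": 0, "s2": 0, "s3": 0},
)

trace = []
for x1 in (1, 1, 0, 1, 0, 0):                     # inputs for u0 .. u5
    values = shift.tick({"x1": x1, "x2": 0},
                        {"s1": "n1", "s2": "n2", "s3": "n3"})
    trace.append((values["x2"], values["x1"],
                  values["s1"], values["s2"], values["s3"]))
print(trace)   # matches the vector representation in Figure 2.7(b)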
Figure 2.9. Multiplexor cell MUX
2.2.2 BDD Circuits
BDDs directly correspond to Boolean circuits composed of multiplexors as explained in [Bec92]. Such circuits are called BDD circuits in this work. More exactly: BDD circuits are combinational logic circuits defined over a fixed library. The typical multiplexor cell is denoted as MUX, and it is defined as shown in Figure 2.9 by its standard AND-, OR-, NOT-based realization. The left input is called control input, the upper inputs are called data inputs (left data input = 0-input, right data input = 1-input). Results reported for BDD circuits in this book also transfer to different realizations, e.g. the realization of a multiplexor in Pass Transistor Logic (PTL). The BDD circuit of a BDD is now obtained by the following construction: Traverse the BDD in topological order and replace each internal node v in the BDD by a MUX cell, connect the control input to the primary input Label(v), corresponding to the label of the BDD node. Then, connect the 1-input to the output of the multiplexor for Then(v), connect the 0-input to the multiplexor for Else(v) and insert an inverter if CE((v, Else(v))). Finally, substitute the output nodes by primary outputs and connect these outputs to the multiplexors of their successors; insert an inverter if the edge to the successor is complemented. Example 8. Figure 2.10 shows an example for the transformation. The original BDD is shown in Figure 2.10(a). Note that the root node in this case is shown on the bottom and the terminal nodes on the top. The corresponding BDD circuit can be seen in Figure 2.10(b). Remark 1. As has been suggested in previous work [Bec92, ADK93], the MUX cells connected to constant values can be simplified. But this reduction is not applied to the BDD circuits considered in this book unless stated otherwise. The reason is a degradation of the testability due to the optimization as will be shown in Section 4.2.
Figure 2.10. Example for a BDD circuit: (a) BDD, (b) BDD circuit
More details on BDD circuits and their applications in the design flow can be found, e.g. in [DG02].
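The construction described above can be written down in a few lines. The following Python sketch, with freely invented names such as mux0 and const1, emits a MUX netlist from a BDD given in the simple two-terminal form without complemented edges; the simplification of Remark 1 is deliberately not applied.

def bdd_to_circuit(root, label):
    """Turn a BDD into a netlist of MUX cells following the construction
    in the text. A node is a tuple (index, then_child, else_child); the
    terminals are the integers 1 and 0. Returns a list of cell tuples."""
    netlist, names = [], {0: "const0", 1: "const1"}

    def build(node):
        if node in names:                 # terminal or already built
            return names[node]
        index, then_child, else_child = node
        in0 = build(else_child)           # 0-input  <- Else(v)
        in1 = build(then_child)           # 1-input  <- Then(v)
        name = f"mux{len(netlist)}"
        names[node] = name
        netlist.append((name, "MUX", label(index), in0, in1))
        return name

    output = build(root)
    netlist.append(("y", "BUF", output))  # primary output driven by the root
    return netlist

# BDD for f = x1*x2, built bottom-up: node for x2, then node for x1 above it
v_x2 = (2, 1, 0)
v_x1 = (1, v_x2, 0)
for cell in bdd_to_circuit(v_x1, label=lambda i: f"x{i}"):
    print(cell)

Because every BDD node is visited once and yields exactly one MUX cell, the resulting circuit has the same size as the BDD, which is the property exploited later for testability.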
2.2.3 Transformation into CNF
A SAT solver can be applied as a powerful black-box engine to solve a problem. In this case, transforming the problem instance into a SAT instance and the SAT solution into a solution for the original problem is crucial. In particular, the transformation of the circuit into a CNF formula is one step for multiple applications that can be implemented using a SAT prover as a core engine, e.g. ATPG, property checking, or debugging. Commonly, the Tseitin transformation [Tse68], which is defined for Boolean expressions, is used. For each subformula a new propositional variable is introduced and constrained to be equivalent to the subformula. For example, in [Lar92] the application to circuits has been presented.
The transformation of a single AND-gate into a set of clauses is shown in Table 2.1. The goal is to create a CNF formula that models an AND-gate, i.e. a CNF formula that is only satisfied for assignments that may occur for an AND-gate. For an AND-gate with two inputs x1 and x2, the output y must always be equal to x1 · x2. The truth-table for this CNF formula is shown in Table 2.1(a). From the truth-table a CNF formula is generated by extracting one clause for each assignment where the formula evaluates to 0. These clauses are shown in Table 2.1(b). This CNF representation is not minimal and can therefore be reduced by two-level logic minimization, e.g. using the tool ESPRESSO that is included in SIS [SSL+ 92]. The clauses in Table 2.1(c) are the final result.
Table 2.1. Transformation of an AND-gate into a CNF formula

(a) Truth-table
x1 x2 y   y ↔ x1 · x2
0  0  0   1
0  0  1   0
0  1  0   1
0  1  1   0
1  0  0   1
1  0  1   0
1  1  0   0
1  1  1   1

(b) Clauses
(x1 + x2 + ȳ) · (x1 + x̄2 + ȳ) · (x̄1 + x2 + ȳ) · (x̄1 + x̄2 + y)

(c) Minimized
(x1 + ȳ) · (x2 + ȳ) · (x̄1 + x̄2 + y)
For a gate g a propositional variable g is also used in the CNF formula when a circuit is considered in the following. This simplifies understanding and notation of CNF formulas that correspond to circuits. The Boolean expression ψg describes the constraints needed to model g. Now, the generation of the CNF formula for a complete circuit is straightforward. The Boolean expressions describing the gates are conjoined into one CNF formula. Clauses are generated for each gate according to the type. Given the circuit C = (V, E, X, Y, S, N, F, P), the Boolean expression to model the whole circuit is given by ψC = ⋀g∈V ψg. If all subexpressions ψg are given in CNF representation, the overall expression is in CNF. The output variables of a gate and input variables of the successors are identical and therefore reflect the connections between gates within the CNF formula. Note that this only models the circuit for one time step. Modeling the sequential behavior will be considered later.
Example 9. Consider the circuit shown in Figure 2.11. The OR-gate is described by the formula ψy = (y ↔ a + b). The primary input x1 is described by ψx1 = x1. As a result, the circuit is translated into the following CNF formula:
(x1 + ā) · (x2 + ā) · (x̄1 + x̄2 + a)    [a ↔ x1 · x2]
· (x3 + b) · (x̄3 + b̄)                  [b ↔ x̄3]
· (ā + y) · (b̄ + y) · (a + b + ȳ)       [y ↔ a + b]
Figure 2.11. Example for the conversion into CNF
An advantage of this transformation is the linear size complexity. Given a circuit where |C| is the sum of the numbers of inputs, outputs, and gates, the number of variables in the SAT instance is also |C| and the number of clauses is in O(|C|). A disadvantage is the loss of structural information. Only a set of clauses is given to the SAT solver. Information about predecessors and successors of a node is lost and is not used during the SAT search. But this information can be partially recovered for certain applications by introducing additional constraints into the SAT instance as proposed in [Sht01] for bounded model checking and in [SBSV96, SFD+ 05b] for test pattern generation.
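A minimal clause generator in the style of Table 2.1(c) and Example 9 can be sketched as follows in Python; the variable numbering and function names are chosen freely, and the sketch covers only AND, OR, and NOT gates.

def gate_clauses(gate_type, output, inputs):
    """Clauses (lists of literals, negative = complemented) constraining
    the output variable of one gate."""
    if gate_type == "NOT":
        a, = inputs
        return [[-a, -output], [a, output]]               # output <-> NOT a
    if gate_type == "AND":
        clauses = [[i, -output] for i in inputs]          # output -> every input
        clauses.append([-i for i in inputs] + [output])   # all inputs -> output
        return clauses
    if gate_type == "OR":
        clauses = [[-i, output] for i in inputs]          # any input -> output
        clauses.append(list(inputs) + [-output])          # output -> some input
        return clauses
    raise ValueError(gate_type)

# The circuit of Figure 2.11: a = AND(x1, x2), b = NOT(x3), y = OR(a, b)
x1, x2, x3, a, b, y = 1, 2, 3, 4, 5, 6
cnf = (gate_clauses("AND", a, [x1, x2])
       + gate_clauses("NOT", b, [x3])
       + gate_clauses("OR",  y, [a, b]))
for clause in cnf:
    print(clause)

The generated clause set is exactly the conjunction shown in Example 9, and its size grows linearly with the number of gates, as stated above.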
2.3 Formal Verification
Formal verification covers mainly two aspects of the design flow. The verification of the initial HDL description of the design is addressed by model checking. The correctness of subsequent synthesis steps is verified by equivalence checking. These two techniques are introduced in this section. The simpler presentation for equivalence checking is given first. The text book [Kro99] gives a more comprehensive introduction and overview of techniques for formal verification.
2.3.1 Equivalence Checking
Formal equivalence checking determines whether the functions realized by two given circuits are identical. In the following, the equivalence checking problem for two combinational circuits is considered. Matching the primary inputs (outputs) of one circuit with those of the other circuit is a difficult problem itself [MMM02]. But this is not the focus of this work and, therefore, the mapping is assumed to be given. A common approach to carry out the equivalence check is to create a miter circuit [Bra83]. Example 10. Given a circuit CE = (VE , EE , X, Y, FE ) and its specification CS = (VS , ES , X, Y, FS ) with X = (x1 , x2 , x3 ) and Y = (y1 , y2 ), the miter circuit is built as shown in Figure 2.12. The output of the miter assumes the value 1 if, and only if, the current input assignment causes at least one pair
of outputs to assume different values. Such an input assignment is called a counterexample (see Definition 5 below). The two circuits are equivalent if no such input assignment exists.

Figure 2.12. Miter circuit for equivalence checking

One possibility to solve the equivalence checking problem is to transform the miter circuit into a SAT instance and constrain the output to the value 1. The resulting SAT instance is unsatisfiable if the two circuits are equivalent. The SAT instance can be satisfied if implementation and specification differ in at least one output value under the same input assignment. In this case, SAT solving returns a single counterexample. An all solutions SAT solver [GSY04, LHS04] could be used if more than one counterexample is needed. Alternatively, BDDs could be used to calculate the counterexamples for each output symbolically. For output yi all counterexamples are represented by:

FE,yi ⊕ FS,yi    (2.2)
But this approach is limited due to the potentially large size of BDDs. Therefore, in practice, structural information is usually exploited to simplify the problem by merging identical subcircuits and multiple engines are applied as proposed in [KPKG02]. For diagnosis and debugging, often a description of the implementation and one or more counterexamples are used. Formally, counterexamples are described as follows:

Definition 5. Let the circuit C be a faulty implementation of a specification. A counterexample T is a triple (T, g, ν), where
- T is a simulation trace of C,
- T causes an erroneous value at gate g,
- ν is the correct value for gate g.
A test-set T is a set of counterexamples.
For combinational equivalence checking the trace has a length of one time frame and the trace is defined solely over primary inputs. The fault is always observed at a primary output. If the counterexample is calculated symbolically, do not care values may be contained in the trace.
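As a rough illustration of the miter-based check (not the multi-engine flow cited above), the following sketch assumes a generic SAT-solver interface; the SatSolver type and its methods are placeholders for any concrete solver, and the clause-level circuit translation is the one discussed in Section 2.2.3.

#include <vector>

typedef std::vector<int> Clause;
typedef std::vector<Clause> Cnf;

// Assumed minimal solver interface (stands in for any concrete SAT solver).
struct SatSolver {
  virtual void addClause(const Clause& c) = 0;
  virtual bool solve() = 0;                    // true iff satisfiable
  virtual ~SatSolver() {}
};

// Equivalence check of two single-output circuits given as CNF over shared primary
// input variables.  yE and yS are the output variables of implementation and
// specification, d is a fresh variable for the miter output.
bool equivalent(SatSolver& solver, const Cnf& cnfE, const Cnf& cnfS,
                int yE, int yS, int d) {
  for (const Clause& c : cnfE) solver.addClause(c);
  for (const Clause& c : cnfS) solver.addClause(c);
  solver.addClause({ -yE,  yS,  d });          // d <-> (yE xor yS)
  solver.addClause({  yE, -yS,  d });
  solver.addClause({ -yE, -yS, -d });
  solver.addClause({  yE,  yS, -d });
  solver.addClause({ d });                     // ask for a distinguishing assignment
  return !solver.solve();                      // unsatisfiable: no counterexample exists
}

If the call to solve returns true, the satisfying assignment restricted to the primary inputs is a counterexample in the sense of Definition 5.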
2.3.2
Bounded Model Checking
Model checking (or property checking) [CGP99] is a technique to formally prove the validity of a given property on a model. The property is usually given in some temporal logic and the model is often described in terms of a labeled transition system or a finite state machine. Here, the model is described by a circuit that directly corresponds to a finite state machine: The values of the flip-flops describe a state, the values of the primary inputs describe the input symbol, and the combinational logic describes the transition function. In this context, the atomic propositions for a particular state in a labeled transition system are given by the bits with value 1 of the state vector. Essentially, each formalism can be transformed into the others.

In this book, Bounded Model Checking (BMC) is considered [BCCZ99]. The property is always checked over a finite number of time frames. The advantage of this formulation is the direct correspondence to a SAT instance. The property language may describe properties over infinite intervals like Linear Time Logic (LTL) [Pnu77]. Longer and longer time frames are considered incrementally until either a counterexample is found or the state space diameter is reached. On the other hand, the property language may restrict the length of the time interval. Solving a single SAT instance is then sufficient to prove or disprove the property. By this, the effectiveness is drastically increased. A finite window restricts the expressiveness of the property language but usually circuits also respond within a bounded time interval to stimuli. Therefore, this type of property checking is quite efficient and successfully applied in practice [WTSF04].

The SAT instance for checking a temporal property over a finite interval is shown in Figure 2.13. The circuit is “unrolled” for a finite number of time frames and a propositional formula corresponding to the property is attached to this unrolling.

Figure 2.13. SAT instance for BMC
The property is constrained to evaluate to 0. Therefore, the SAT instance is satisfiable if, and only if, a counterexample exists that shows the invalidity of the property on the circuit. Otherwise, the property is valid. In more detail, the SAT instance is created as follows: For each time frame one copy of the circuit is created. State elements are converted to inputs and outputs. The next state outputs of time frame t are connected to the present state inputs of time frame t + 1. New variables are used for every copy of the circuit in the SAT instance.

Notation 2. In time frame t, the variable g[t] is used for gate g. The Boolean expression ψg[t] denotes the Boolean constraints for gate g at time t.

Remark 2. Normally, an indexed notation is used to denote variables or different Boolean expressions, while using an array notation to identify different time frames is not usual. But this notation has the advantage to separate the time reference from other indices (e.g. the number i of a particular input xi) and, by this, to improve the readability. Moreover, in Chapter 5 Boolean expressions are derived from simulation traces. The chosen notation helps to understand the equations more easily.

Given the constraint ψg, the constraint ψg[t] is retrieved by substituting all variables with the variable at time frame t, e.g. g is substituted by g[t]. Then, the CNF formula to describe the unrolling of circuit C = (V, E, X, Y, S, N, F, P) for tcyc time frames is given by:

ψC^tcyc = ∏_{t=0}^{tcyc−1} ∏_{g∈V} ψg[t] · ∏_{t=0}^{tcyc−2} ∏_{i=1}^{l} ((n̄i[t] + si[t + 1]) · (ni[t] + s̄i[t + 1]))    (2.3)

where the two clauses per state element express ni[t] ↔ si[t + 1]. As a result, the behavior of the circuit over time is modeled. For a bounded finite interval the temporal property directly corresponds to a propositional formula where the variables correspond to variables of gates at particular time frames. By attaching the property to the unrolled circuit, the relationship between signals is evaluated over time. In this book, properties may either be given

1. As an LTL safety property
2. As a propositional formula that refers to signals of the circuit at particular time frames
At first, suppose a partial specification of the system is given as an LTL formula. Besides well-known propositional operators, also temporal connectives are available in LTL. The meaning of the temporal operators is informally introduced in the following:
- X p means “p holds in the next time frame”
- G p means “p holds in all time frames”
- F p means “p eventually holds in some time frame”
- p U q means “p holds in all time frames until q holds”
A safety property does not contain the operator F (and no other construct to express this operator, e.g. ¬G ¬p).

Now, the LTL formula Ψ has to be checked for tcyc time steps. For this purpose a propositional formula ψΨ^tcyc representing the specification is constructed. For each subformula Ω of Ψ and for every time frame t a new propositional variable zΩ[t] is introduced. These variables are related to each other and to the variables used in the unrolling of the circuit as follows. For the temporal connectives, the well-known expansion rules [MP91] are applied which relate the truth value of a formula to the truth values of its subformulas in the same and the next time frame. For instance, G Ψ = Ψ · X G Ψ and F Ψ = Ψ + X F Ψ. The Boolean connectives used in LTL are trivially translated to the corresponding constructs relating the propositional variables. Finally, the truth value of the atomic proposition g at time frame t is equal to the value of the corresponding variable g[t] in the unrolling of the circuit. The final requirement is that the specification is contradicted by the behavior of the circuit, i.e. zΨ[0], the variable corresponding to the specification in time frame 0, is constrained to the value 0. As a result, property checking can be done by solving the SAT problem for the following propositional formula:

ψBMC = z̄Ψ[0] · ψΨ^tcyc · ψC^tcyc    (2.4)

The formula ψBMC is unsatisfiable if, and only if, no trace for the circuit C exists such that the specification Ψ does not hold – or, more simply, ψBMC is unsatisfiable if and only if Ψ is valid on C, i.e. C is a model for Ψ. Alternatively, a property may be given directly as a propositional formula. In this case, a fixed number of time frames is considered by this property. The length of the window for a propositional property is given by the largest time
step referenced by any variable plus one (the first time step is considered to be zero). The property is shifted to an arbitrary time step t by adding t to each time reference.

Figure 2.14. 1-bit-shift-register

Example 11. Again, consider the circuit in Figure 2.14 that was introduced earlier. This is a 1-bit-shift-register with three state registers labeled by the name of the present state nodes s1, s2 and s3. The shift-register has two modes of operation: keep the current value (x2 = 1) and shifting (x2 = 0). In the shifting mode, the value of input x1 is shifted into the register. After three clock cycles the value is stored in register s3. This behavior is described by the property “If x2 is zero on three consecutive time steps, the value of x1 in the first time step equals y1 in the fourth time step” which can be written as a formula:

x̄2[t] · x̄2[t + 1] · x̄2[t + 2] → (x1[t] = s3[t + 3])    (2.5)
The length of the window for this property is 4. Similar notions of properties are also used by industrial model checking tools, e.g. [BS01]. Having a window for the property is not a restriction in practice. Very often the length of the window corresponds to a particular number of cycles needed for an operation in the design. In case of the shift-register, this is the number of cycles needed to bring an input value to the output. For a more sophisticated design like a processor this can be the depth of the pipeline, i.e. the number of cycles to process an instruction. Finally, counterexamples are also considered to carry out diagnosis for BMC. Similar to the case of equivalence checking the counterexample is represented by a triple (T, y, ν) as described by Definition 5. This counterexample may either be given with respect to the circuit or with respect to the SAT instance representing the BMC problem. With respect to the circuit, the counterexample is a simulation trace over time, but in general no single erroneous output of the circuit is responsible for the failure of a property. If the counterexample is given with respect to the SAT instance, the failure corresponds to zΨ [0] becoming 0 instead of 1.
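To illustrate how Equation (2.3) could be realized, the sketch below unrolls a single-frame CNF for tcyc time frames; the variable numbering scheme (variable v of frame t becomes v + t·numVars) and all identifiers are illustrative assumptions, not an encoding prescribed in this book. A propositional property such as (2.5) would then be added as further clauses over the frame-indexed variables, with the property output constrained to 0.

#include <utility>
#include <vector>

typedef std::vector<int> Clause;
typedef std::vector<Clause> Cnf;

// Literal of variable |lit| in time frame t under the chosen numbering convention.
static int at(int lit, int t, int numVars) {
  return lit > 0 ? lit + t * numVars : lit - t * numVars;
}

// Unrolling according to Equation (2.3): one copy of the combinational clauses per
// frame, plus clauses identifying n_i[t] with s_i[t+1].
Cnf unroll(const Cnf& frameCnf, int numVars,
           const std::vector<std::pair<int, int> >& nextToPresent,  // (n_i, s_i) pairs
           int tcyc) {
  Cnf result;
  for (int t = 0; t < tcyc; ++t)
    for (const Clause& c : frameCnf) {
      Clause shifted;
      for (int lit : c) shifted.push_back(at(lit, t, numVars));
      result.push_back(shifted);
    }
  for (int t = 0; t + 1 < tcyc; ++t)
    for (const std::pair<int, int>& p : nextToPresent) {
      int n = at(p.first, t, numVars), s = at(p.second, t + 1, numVars);
      result.push_back({ -n,  s });     // n_i[t] -> s_i[t+1]
      result.push_back({  n, -s });     // s_i[t+1] -> n_i[t]
    }
  return result;
}

For the shift register of Figure 2.14, checking property (2.5) at time step t = 0 amounts to unrolling for four frames and adding the clauses of the negated property.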
2.4
Automatic Test Pattern Generation
This section provides the necessary notions to introduce Automatic Test Pattern Generation (ATPG). First, circuits and fault models are presented. Then, the reduction of a sequential ATPG problem to a combinational problem is explained. Finally, classical ATPG algorithms working on the circuit structure are briefly reviewed. The presentation is kept brief; for further reading we refer to, e.g. [JG03].
2.4.1
Fault Models
After producing a chip, the functional correctness of this chip has to be checked. Without this check an erroneous chip may be delivered to customers which, in turn, may cause a malfunction of the final product. This, of course, is not acceptable. A large range of malfunctions is possible due to defects in the material, process variations during production, etc. But directly checking for all possible physical defects is not feasible. An abstraction in terms of a fault model is typically introduced. The Stuck-At Fault Model (SAFM) [BF76] is well-known and widely used in practice. In this fault model, a single line is assumed to be stuck at a fixed value instead of depending on the input values. When a line is stuck at the value 0, this is called a stuck-at-0 (SA0) fault. Analogously, if the line is stuck at the value 1, this is a stuck-at-1 (SA1) fault.

Example 12. Figure 2.15(a) repeats the circuit from Example 9. When a SA0 fault is introduced on line a, the faulty circuit in Figure 2.15(b) is created. The output of the AND-gate is disconnected and the upper input of the OR-gate constantly assumes the value 0.

Figure 2.15. Example for the SAFM: (a) correct circuit, (b) faulty circuit

Besides the SAFM a number of other fault models have been proposed, e.g. the cellular fault model [Fri73] where the function of a single gate is changed, or the bridging fault model [KP80] where two lines are assumed to settle to a single value. These fault models mainly cover static physical defects like opens or shorts. Dynamic effects are covered by delay fault models, for example, the Path-Delay Fault Model (PDFM) [Smi85]. In the PDFM, it is
checked whether the propagation delays of all paths in a given circuit are less than the system clock interval. For the detection of a path delay fault a pair of patterns (I1 , I2 ) is required rather than a single pattern as in the SAFM: The initialization vector I1 is applied and all signals of the circuit are allowed to stabilize. Then, the propagation vector I2 is applied and after the system clock interval the outputs of circuit C are checked. Definition 6. A two-pattern test is called a robust test for a path delay fault (RPDF test) on a path if it detects that fault independently of all other delays in the circuit and all other delay faults not located on this path. An even stronger property can also be defined for PDF tests: For each path delay fault there exists a robust test (I1 , I2 ) which sets all off-path inputs to noncontrolling values on application of I1 and remains stable during application of I2 , i.e. the values on the off-path inputs are not invalidated by hazards or races. Robust tests with this property are called strong RPDF tests. In the following, we only use such tests, but for simplicity we call them RPDF tests, too. For a detailed classification of PDFs see [PR90].
2.4.2
Combinational ATPG
Automatic Test Pattern Generation (ATPG) is the task of calculating a set of test patterns for a given circuit with respect to a fault model. A test pattern for a particular fault is an assignment to the primary inputs of the circuit that leads to different output values depending on the presence of the fault. Calculating the Boolean difference of the faulty circuit and the fault-free circuit yields all test patterns for a particular fault. This construction is similar to a miter circuit [Bra83] as it can be used for combinational equivalence checking (see Section 2.3.1). In this sense, formal verification and ATPG are similar problems [AFK88].

Example 13. Again, consider the SA0 fault in the circuit in Figure 2.15. The input assignment x1 = 1, x2 = 1, x3 = 1 leads to the output value y = 1 for the correct circuit and to the output value y = 0 if the fault is present. Therefore this input assignment is a test pattern for the fault a SA0. The construction to calculate the Boolean difference of the fault free circuit and the faulty circuit is shown in Figure 2.16.

Figure 2.16. Boolean difference of the faulty circuit and the fault free circuit

A similar approach can be used to calculate tests for the dynamic PDFM. In this case the circuit is either unrolled for two time frames or a multi-valued logic is applied to model the value of a gate in two subsequent time frames. Additional constraints apply to gates along the path to be tested to force different values in the two time frames. As a result, two test patterns are calculated
to test for a PDF. For a strong RPDF test the side inputs to the path have to be set to noncontrolling values. The absence of hazards has to be ensured by extra constraints.

Definition 7. A fault is testable when a test pattern exists for that fault. A fault is untestable when no test pattern exists for that fault.

Deciding whether a fault is testable is an NP-complete problem [IS75]. The aim is to classify all faults and to create a set of test patterns that contains at least one test pattern for each testable fault. Generating test patterns for circuits that contain state elements like flip-flops is computationally more difficult because the state elements cannot directly be set to a particular value. Instead, the behavior of the circuit over time has to be considered during ATPG. For example, the circuit can be unrolled similarly to BMC. In ATPG, this is frequently called the iterative logic array. Moreover, a number of tools have been proposed that directly address the sequential problem, e.g. HITEC [NP91] or the sequential SAT solver SATORI [IPC03]. But in practice, the resulting model is often too complex to be handled by ATPG tools. To overcome this problem, the full scan mode is usually considered by connecting all state elements by a scan chain [WA73, EW77]. In test mode, the scan chain combines all state elements into a shift-register; in normal operation mode the state elements are driven by the ordinary logic in the circuit. As a result, the state elements can be considered as primary inputs and outputs for testing purposes and a combinational problem results.
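To illustrate how a single stuck-at fault can be cast as a SAT problem in the spirit of Figure 2.16, the sketch below duplicates the gate constraints, drops the driver of the faulted line in the faulty copy, and constrains the Boolean difference of the outputs to 1. It only covers faults on gate outputs, and every naming convention in it is an assumption made for illustration.

#include <vector>

typedef std::vector<int> Clause;
typedef std::vector<Clause> Cnf;

struct GateClauses { int out; Cnf clauses; };   // constraint psi_g of one gate

// SAT instance for one stuck-at fault: a satisfying assignment restricted to the
// primary inputs is a test pattern.  Variables of the faulty copy are shifted by
// 'shift'; primary inputs are shared between both copies.
Cnf atpgInstance(const std::vector<GateClauses>& gates,
                 const std::vector<int>& inputs, int shift,
                 int faultSite, bool stuckAt1, int out, int diffVar) {
  // Shift a literal into the faulty copy unless it is a shared primary input.
  auto shiftLit = [&](int lit) -> int {
    int v = lit > 0 ? lit : -lit;
    for (int in : inputs) if (in == v) return lit;
    return lit > 0 ? lit + shift : lit - shift;
  };

  Cnf inst;
  for (const GateClauses& g : gates) {
    for (const Clause& c : g.clauses) inst.push_back(c);   // fault-free copy
    if (g.out == faultSite) continue;                      // cut the faulted line's driver
    for (const Clause& c : g.clauses) {                    // faulty copy
      Clause copy;
      for (int lit : c) copy.push_back(shiftLit(lit));
      inst.push_back(copy);
    }
  }
  int faultVar = faultSite + shift;
  inst.push_back({ stuckAt1 ? faultVar : -faultVar });     // fix the line to the stuck value

  int outF = out + shift;                                  // diffVar <-> out xor outF
  inst.push_back({ -out,  outF,  diffVar });
  inst.push_back({  out, -outF,  diffVar });
  inst.push_back({ -out, -outF, -diffVar });
  inst.push_back({  out,  outF, -diffVar });
  inst.push_back({ diffVar });                             // require an observable difference
  return inst;
}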
2.4.3
Classical ATPG Algorithms
Classical algorithms for ATPG usually work directly on the circuit structure to solve the ATPG problem for a particular fault. Some of these algorithms are briefly reviewed in the following. For an in-depth discussion the reader is referred to text books on ATPG, e.g. [JG03].
One of the first complete algorithms dedicated to ATPG for the SAFM was the D-algorithm proposed by Roth [Rot66]. The basic ideas of the algorithm can be summarized as follows:
- A fault is observed due to differing values at a line in the faulty circuit and the fault-free circuit. Such a divergence is denoted by the values D or D̄ to mark differences 1/0 or 0/1, respectively. Instead of Boolean values, the set {0, 1, D, D̄} is used to evaluate gates and carry out implications.
- A gate that is not on a path between the fault and any output never has a D-value.
- A necessary condition for testability is the existence of a path from the fault location to an output where all intermediate gates either have a D-value or are not assigned yet. Such a path is called a potential D-chain.
- A gate is on a D-chain if it is on a path from the fault location to an output and all intermediate gates have a D-value.
On this basis an ATPG algorithm can focus on justifying a D-value at the fault site and propagating this D-value to an output as shown in Figure 2.17.

Figure 2.17. Justification and propagation

The algorithm starts with injecting the D-value at the fault site. Then, this value has to be propagated towards the outputs. For example, to propagate the value D at one input along a 2-input AND-gate, the other input must have the noncontrolling value 1. After reaching an output, the search proceeds towards the inputs in the same manner to justify the D-value at the fault site. At some stages in the search decisions are possible. For example, to produce a 0 at the output of an AND-gate, either one or both inputs can have the value 0. Such a decision may be wrong and may lead to a conflict later on. Due to a reconvergence as shown in Figure 2.17, conditions resulting from propagation may prevent justification. In this case, a backtrack search has to be applied. In
summary, the D-algorithm is confronted with a search space of O(2^|C|) for a circuit with |C| signals including inputs, outputs and internal signals. A number of improvements have been proposed for this basic procedure. PODEM [Goe81] branches only on the values for primary inputs. This reduces the search space for test pattern generation to O(2^n) for a circuit with n primary inputs. But as a disadvantage time is wasted if all internal values are implied from a given input pattern that finally does not detect the fault. Fan [FS83] improves upon this problem by branching on stems of fanout points as well. As a result internal structures that cause a conflict when trying to detect the test pattern are detected earlier. The branching order and value assignments are determined by heuristics that rely on observability measures to predict a “good” variable assignment for justification or propagation, respectively. Moreover, the algorithm keeps track of a justification frontier moving towards the inputs and a propagation frontier moving towards the outputs. Therefore, Fan can make the “most important decision” first – based on a heuristic – while the D-algorithm applied a more static order by propagating the fault at first and justifying the assignments for preceding gates afterward.

Socrates [STS87] includes the use of global static implications by considering the circuit structure. Based on particular structures in the circuit, indirect implications are possible, i.e. implications that are not directly obvious due to assignments at a single gate, but implications that result from functional arguments taking several gates into account. These indirect implications are applied during the search process to imply values earlier from partial assignments and, by this, prevent wrong decisions. Hannibal [Kun93] further improves this idea. While Socrates only uses a predefined set of indirect implications, Hannibal learns from the circuit structure in a preprocessing step. For this task recursive learning [KP94] is applied. In principle, recursive learning is complete itself but too time consuming to be used as a stand-alone procedure. Therefore, learning is done in a preprocessing step. During this step, the effect of value assignments is calculated and the resulting implications are learned. These implications are stored for the following run of the search procedure. In Hannibal the Fan algorithm was used to realize this search step.

Even though several improvements have been proposed to increase the efficiency of ATPG algorithms, the worst case complexity is still exponential. Synthesis for testability means to consider the ATPG problem during synthesis already and, by this, create circuits with good testability.
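To make the five-valued D-calculus concrete, a small evaluation routine for a 2-input AND-gate could look as follows; the enumeration and function names are illustrative only, and other gate types are handled analogously.

// Five-valued logic of the D-algorithm: 0, 1, D (1/0), D' (0/1), X (unassigned).
enum DValue { V0, V1, VD, VDbar, VX };

DValue dAnd(DValue a, DValue b) {
  if (a == V0 || b == V0) return V0;   // a controlling value dominates
  if (a == V1) return b;               // the noncontrolling value 1 is transparent
  if (b == V1) return a;
  if (a == VX || b == VX) return VX;   // unknown without a controlling value
  if (a == b) return a;                // D*D = D and D'*D' = D'
  return V0;                           // D*D' = (1*0)/(0*1) = 0 in both circuits
}

The rule that propagating a D across an AND-gate requires the value 1 on the other input can be read off directly from the second and third lines.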
Chapter 3 ALGORITHMS AND DATA STRUCTURES
The technique proposed in this chapter cannot exclusively be attributed to a single step in the design flow. Instead, the underlying techniques for Boolean function manipulation are adjusted to particular subsequent needs. Binary Decision Diagrams (BDDs) [Bry86] and solvers for the Boolean Satisfiability (SAT) problem [DP60, Coo71] are state of the art for Boolean function manipulation. Both approaches have individual advantages. In the past, many researchers have proposed techniques to improve the efficiency of these algorithms, e.g. in [MS96, BRB90, MMZ+ 01, Som01b, ES04]. A new data structure is presented that combines paradigms of SAT solvers and BDDs. Heuristics allow to trade-off BDD-like symbolic manipulation of Boolean functions versus SAT-like search in the Boolean space. This can influence the robustness advantageously or can be exploited to retrieve more detailed information about particular parts of the solution space. The approach was first presented in [DFK06]. The link of this technique to other steps within the design flow is outlined at the end of this chapter.
3.1
Combining SAT and BDD Provers
Besides BDDs SAT provers are an efficient – and often more robust – technique to handle Boolean problems. Experimental studies have shown that both techniques are orthogonal, i.e. there exist problems where BDDs work well while SAT solvers fail and vice versa. This trade-off can even be formally proven [GZ03]. BDDs and SAT provers are very different in nature. While BDDs compute all solutions in parallel, they require a large amount of memory. In contrast, SAT is very efficient regarding memory consumption but only gives a single solution. There are many applications where multiple solutions are needed
(see, e.g. [HTFM03] or Section 6.2). Motivated by these observations, many authors tried to combine the best of the two approaches, by applying SAT solvers and BDDs alternatively or iteratively. Even though remarkable results have been obtained, so far none of the approaches considered an integration of the two methods within a single data structure. In this section, the first hybrid approach that allows to tightly combine BDDs and SAT is presented. Even though the overall principle of the two techniques is very different, there are also some similarities. In both concepts, starting from a Boolean description the problem is decomposed by assigning a Boolean value to a variable. This has already been observed in [RDO02]. For this, the concept of expansion nodes is introduced. The given Boolean problem is initially represented by a single expansion node that is recursively expanded. If this is done in a strict Depth First Search (DFS) manner, the resulting algorithm is close to a SAT procedure. But if all operations are carried out symbolically, the algorithm computes a BDD. The relation between the two approaches is discussed in more detail later. Experimental results demonstrate the efficiency of the approach. The section is structured as follows: Other approaches to extend Boolean proof techniques and the relation between SAT and BDDs are discussed in Section 3.1.1. Then, the relation between the two is considered. The new hybrid approach is presented in Section 3.1.2. In Section 3.1.3, experimental results are given.
3.1.1
Proof Techniques
In the following, earlier work related to the hybrid approach is discussed. Different extensions have been suggested for both concepts, SAT provers and BDDs. Then, the relations between both concepts are briefly reviewed.
3.1.1.1 Extensions Streaming BDDs have been proposed to reduce the memory requirements [Min02]. The idea is to represent a BDD as a bracketed sequence. The sequence can be processed sequentially using limited memory. But this can only be done by giving up canonicity. In the context of extensions of the classical BDD concept introduced by Bryant (see Section 2.1.2), some approaches have been presented that make use of different types of functional nodes. The approach in [RBKM91] keeps control of the memory needed for the BDD construction by projecting some parts of the graph to a new terminal node U (= unknown). Instead of completely calculating each subgraph, the calculation may be stopped at a given depth and the complete tree is replaced by the terminal node U . As a result, exactness cannot be recovered afterward.
Nodes to represent the exclusive-or of the children have been introduced in [MS98]. The purpose of these nodes is to reduce the size of the BDD. Then, probabilistic methods are applied to find a satisfying assignment. Extended BDDs as proposed in [JPHS91] apply existential quantification and universal quantification as edge attributes. By introducing a “structural variable” s, the equality ∃s f = fs + fs̄ can be exploited to represent the Boolean operation f + g in terms of a node v. This can be seen as follows: Let v be a node and f and g be the Boolean functions represented by its children. Then, v represents the function s·f + s̄·g. Now, assume an incoming edge has the attribute for existential quantification. The function represented by this edge is retrieved as follows:

∃s (s·f + s̄·g) = (s·f + s̄·g)s + (s·f + s̄·g)s̄
               = f + g    (as introduced above)
Similarly, universal quantification is used to represent f · g. These structural variables allow to control the size of the extended BDD. Again, the problem is to find a satisfying assignment of the resulting extended BDDs. The same principle was exploited in [HDB96]. By introducing extra nodes at the top level of two BDDs, a Boolean operation is represented. Then, these nodes are moved towards the terminals by exchanging adjacent variables [Rud93]. At the terminals these nodes can be eliminated. In both cases the use of new variables implies that a new level is introduced in the shared BDD structure. The approach was further extended in [AH97] for Boolean Expression Diagrams (BEDs). Functional nodes that directly represent Boolean operations were introduced. Again, these nodes can be eliminated by swapping adjacent levels in the BED. If a BED is built from a description of a circuit, the size of a BED is similar to the circuit size. All of these approaches are presented as extensions of BDDs. The advantage of using SAT-like algorithms on such a structure has not been considered. Another recent direction of research are efficient all-solution SAT solvers that do not stop after reaching the first satisfying assignment but calculate all possible satisfying solutions, e.g. [LHS04]. A drawback of these approaches is the potentially large representation of all solutions usually as cubes or as BDDs. In contrast, the hybrid approach targets applications where not all but a set of good solutions is needed. Recently, several techniques have been proposed to combine BDDs and SAT solvers (see, e.g. [GYAG00, KPKG02, CNQ03, SFVD05]), but no real integration is done. Instead, the proof engines are started one after the other, or alternating. By this, good experimental results have often been obtained, demonstrating the potential of an integrated approach.
3.1.1.2 Relations BDDs and SAT solvers are most frequently used as complete proof techniques and for the symbolic manipulation of Boolean functions. Both techniques have advantages and disadvantages. BDDs represent all solutions in parallel at the cost of large memory requirements. SAT solvers only provide a single solution while the memory consumption is relatively low. In [RDO02] the relation between BDDs and SAT has been studied from a theoretical point of view. It has been proven that the BDD corresponds to a complete representation of the SAT backtrack tree if a fixed variable order is assumed. As a motivation for the next section, where the hybrid approach is described in more detail, an example is given to show the difference between SAT and BDDs. We will later come back to this example. Example 14. Consider the Boolean function f over four variables given by f
= (x1 + x2 + x3 )(x1 + x2 + x4 )(x1 + x2 + x4 ) (x1 + x2 + x3 )(x1 + x2 + x3 + x4 )
A sketch of the search tree if the function is processed by a SAT solver is shown in Figure 3.1(a). The corresponding BDD is given in Figure 3.1(b) for the variable order π = (x1 , x2 , x3 , x4 ). As can be seen, the SAT solver by construction only gives a single solution while the BDD represents all satisfying assignments in parallel at the cost of a larger number of nodes.
3.1.2
Hybrid Approach
In this section, the hybrid approach for BDD and SAT integration is presented. First, the overall idea is given. Then, the concept of expansion nodes is introduced followed by a discussion of expansion heuristics. Finally, comments on some issues related to an efficient implementation are provided.
3.1.2.1 Basic Idea In the hybrid approach, processing starts by symbolic operations analogously to BDDs. For the operations the basic operators for XOR and AND (see Section 2.1.2) have been modified. During the starting phase, the constructed graphs are simply BDDs. But when composing BDDs, a heuristic is used to decide which parts of the solution space are explored. To guarantee the exactness of the algorithm, i.e. no solution is missed, a node is introduced where the computation can be resumed. These nodes are called expansion nodes in the following. As a result, the hybrid approach stores all necessary information resulting in a complete proof method.
Figure 3.1. Different approaches: (a) SAT search tree, (b) BDD, (c) sketch of the hybrid approach, (d) hybrid representation
A sketch of a configuration during the run is shown in Figure 3.1(c). In this case the upper part is “SAT-like” while the lower part is a complete symbolic representation as it occurs in BDDs. The expansion nodes are denoted by E. The decomposition nodes are labeled by variables; these variables occur in the same order on all paths. In the following, such graphs that allow a smooth transition between SAT and BDDs are called a hybrid structure.
Figure 3.2. Overview over different node types: (a) terminal, (b) decomposition node, (c) expansion node
Remark 3. Several expansion nodes in a hybrid structure may represent the same function. This cannot be detected before completely expanding the node. Thus, a hybrid structure is not a canonical representation of Boolean functions.
3.1.2.2 Expansion Nodes
The hybrid approach makes use of three types of nodes (see Figure 3.2):
(a) Terminal nodes
(b) Decomposition nodes
(c) Expansion nodes
The first two can also be found in BDDs. Terminal nodes represent the constant functions 0 and 1. In decomposition nodes the Shannon decomposition is carried out. Expansion nodes are labeled by a Boolean operation op and have two successors f and g that represent Boolean functions (which are also denoted by f and g for simplicity). The expansion node represents the function f op g.

Example 15. Consider again the function from Example 14 and Figures 3.1(a) and 3.1(b). A possible hybrid structure is shown in Figure 3.1(d). This one results if the top variable is only decomposed in one direction, while an expansion node is placed on the other branch. As can be seen, the structure is more memory efficient. Compared to the BDD five instead of seven nodes are needed. At the same time three solutions are represented in contrast to the SAT approach that only returns a single solution. This simple example demonstrates that the approach combines the two proof techniques SAT and BDD. A crucial point to address is where to place the expansion nodes. A heuristic for this purpose is presented in the next section.
3.1.2.3 Expansion Heuristics Inserting expansion nodes at suitable locations is crucial for the approach to work. If too many expansion nodes are inserted, no solutions can be found. Only structures without a path to a terminal will be constructed and the expansion of partial trees will take most of the run time until computing a solution.
Not inserting enough expansion nodes will lead to a memory blow-up as known from BDDs. In a BDD-based approach the final solutions are computed by composing intermediate BDDs. This is similar for the new approach. The following steps are necessary to retrieve solutions:
1. Build BDDs for basic functions without any expansion nodes.
2. Compose the basic functions and insert expansion nodes according to a predetermined heuristic.
3. Select expansion nodes to calculate the final solutions.
Which functions are considered as basic functions in Step 1 depends on the problem and the input format, e.g. projection functions and cubes were chosen in the experiments. Building BDDs for these basic functions is not necessary for the approach to work, but having the basic functions completely represented improves the performance drastically by reducing the number of necessary expansions.

The following two heuristics to limit the size of the resulting hybrid structure in Step 2 have been evaluated:
(S1) A fast procedure is to directly limit the memory consumption. This limit can be detected efficiently. Once the limit is reached no further decomposition nodes are created. Instead, expansion nodes are generated. Therefore, prior to performing an expansion the memory limit is increased by a user defined value.
(S2) The second procedure is to limit the number of nodes in a subgraph to a certain threshold. Tracking this limit is computationally more expensive. But allowing more than n nodes in a subgraph guarantees that there is at least one path to a terminal node, i.e. for at least one assignment the function can directly be evaluated.

The selection of nodes to expand in Step 3 has been evaluated using two other heuristics:
(E1) Randomly
(E2) Heuristically (using the algorithm in Figure 3.3): The hybrid structure is traversed in a depth first manner until an expansion node is reached. This node is selected and then expanded by carrying out the stored operation. The same scheme is applied recursively if further selections are necessary.
node* DFS(node* v) {
  if (isTerminal(v)) return NULL;
  node* tmp = DFS(Then(v));
  if (tmp) return tmp;
  if (isExpNode(v)) return v;
  tmp = DFS(Else(v));
  return tmp;
}

Figure 3.3. Depth first traversal
Here, (E2) also heuristically ensures a moderate growth of the memory needs. Experimental studies showed that the combination of a hard limit on memory consumption (S1) with deterministic DFS (E2) gives the best results, i.e. small run times and a large number of solutions. From a more general point of view this combination of heuristics leads to a SAT-like search tree in the upper part of the hybrid structure which is enriched by a BDD-like lower part. Remark 4. When using heuristics (S1) and (E2) in combination, the search space is traversed similar as with “BDDs at SAT leaves” as it has been introduced in [GYAG00, GYA+ 01]. But the proposed hybrid structure is more general in the sense that switching between SAT-like and BDD-like behavior is subject to heuristics. Remark 5. During expansion canonicity is also an issue. When expanding a node, a function that is already represented by another node may be the result. The hybrid structure can be reduced at a computational cost linear in the number of nodes using an algorithm similar to [SW93]. In the implementation no reduction was carried out to save run time.
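A highly simplified sketch of how Step 3 could be organized with heuristics (S1) and (E2) is shown below. The node type, applyOp, nextExpansionNode, and solutionCount are assumed placeholders; the actual implementation operates on the CUDD data structures described in the next section.

enum Kind { TERMINAL, DECOMPOSITION, EXPANSION };

struct HNode {
  Kind kind;
  int varOrOp;          // decomposition variable, or the stored operation (AND/XOR)
  HNode *left, *right;  // cofactors of a decomposition node, or the operands f and g
};

HNode* applyOp(int op, HNode* f, HNode* g);  // BDD-style synthesis; creates expansion
                                             // nodes instead of growing past the limit
HNode* nextExpansionNode(HNode* root);       // depth first search as in Figure 3.3
long   solutionCount(HNode* root);           // solutions already represented explicitly

// Step 3: expand nodes until enough solutions are represented or none are left.
void retrieveSolutions(HNode* root, long wanted) {
  while (solutionCount(root) < wanted) {
    HNode* e = nextExpansionNode(root);
    if (e == 0) break;                                  // fully expanded
    *e = *applyOp(e->varOrOp, e->left, e->right);       // carry out the stored operation
  }
}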
3.1.2.4 Implementation The technique described above has been integrated into the CUDD package [Som01a] where the core data structures are taken from. To store the expansion nodes, the structure for nodes has been extended (see Line 8 in Figure 3.4). The structure for the new type is given in Lines 12–15. In case of an expansion node, also the operation has to be stored. For reasons of efficiency only two types of operations are stored: AND and XOR. Negation is realized by complemented edges (see Section 2.1.2). All other Boolean operators are mapped accordingly. The information is stored in the index of each node. The complete encoding is given in Table 3.1, i.e. three indices have a special meaning while all the remaining ones are used for decomposition variables.
 1  struct node {
 2    halfWord index;
 3    halfWord ref;
 4    node *next;
 5    union {
 6      terminal value;
 7      children kids;
 8      expNode func;
 9    };
10  };
11
12  struct expNode {
13    node *F;
14    node *G;
15  };

Figure 3.4. Modified node structure
Table 3.1. Index of node types (32-bit)

Node type             Index
Decomposition nodes   0 - 65532
XOR-node              65533
AND-node              65534
Terminal node         65535

3.1.3
Experimental Results
In the following, the results of two types of experiments are analyzed. First, the well-known n-Queens problem is considered as an example of a combinational problem where BDDs perform poorly on large instances while a large number of solutions is available. Second, the synthesis problem of minimizing ESOP representations is studied as an optimization problem that is known to be hard. All experiments have been carried out on an Intel Pentium 4 processor with 3 GHz and 1 GB of main memory running Linux.
3.1.3.1 n-Queens
The n-Queens problem is a well-known combinational problem. The objective is to place n queens on an n × n board such that no queen can be captured by another one. An example for a solution of the 5-Queens problem is shown in Figure 3.5. This game problem is encoded using n² binary input variables, each one deciding whether a queen is placed on the corresponding field of the chess board or not. Obviously, the constraints are to place one queen per row and column and at most one queen per diagonal.

Figure 3.5. Solution to the 5-Queens problem

In a first experiment, the heuristics to limit the size were considered. For all experiments the limits were loose enough to retrieve all solutions. By this, the overhead of the heuristics to limit the size can directly be measured in comparison to BDDs. Results are reported in Table 3.2. Given are the number of solutions for increasing values of n and run times in CPU seconds for BDDs and the two heuristics introduced in Section 3.1.2.3.

Table 3.2. Heuristics to limit the size of the hybrid structure

                         Limit for the size
              BDD        Memory (S1)           Subgraph (S2)
 n   #Sol.    Time       Time     Overhead     Time      Overhead
 6   4        0.00       0.00     –            0.01      –
 7   40       0.01       0.01     0.00 %       0.03      200.00 %
 8   92       0.05       0.06     20.00 %      0.18      260.00 %
 9   352      0.37       0.37     0.00 %       1.30      251.35 %
10   724      1.56       1.59     1.92 %       8.20      425.64 %
11   2680     7.81       7.82     0.13 %       62.39     698.84 %
12   14200    48.12      48.54    0.87 %       490.33    918.97 %
13   73712    352.11     353.21   0.31 %       4566.75   1196.97 %

The resource requirements for BDDs increase rapidly and no further solutions beyond n = 13 could be retrieved. Also the computational overhead of limiting the size of subgraphs using heuristic (S2) is too large. But directly limiting the memory consumption according to heuristic (S1) introduces almost no overhead. The direct limit has been used in all remaining experiments to restrict the size. The performance of heuristics to select nodes for expansion has been investigated in the next experiment.
Expansion was carried out until a total memory limit of 750 MB was reached. Due to the expansion of subfunctions, more than one solution can be contained in the final representation. The results are shown in Table 3.3.

Table 3.3. Selection of expansion nodes

             Randomly (E1)          DFS (E2)
 n   #Var    #Sol.     Time         #Sol.     Time
 3   9       0         0.00         0         0.00
 4   16      2         0.00         2         0.00
 5   25      10        0.00         10        0.00
 6   36      4         0.00         4         0.00
 7   49      40        0.02         40        0.01
 8   64      92        0.06         92        0.06
 9   81      352       0.37         352       0.37
10   100     724       2.10         724       1.83
11   121     2680      16.54        2680      10.30
12   144     14200     158.86       14200     73.34
13   169     73712     2062.39      73712     578.54
14   196     0         384.45       56672     1836.93
15   225     0         289.01       33382     1669.50
16   256     0         652.64       20338     2555.35
17   289     0         1366.25      5061      2055.97
18   324     0         693.13       204       2238.79
19   361     0         529.37       1428      3357.97
20   400     0         1923.07      38        1592.94
21   441     0         1957.39      111       1972.60

Up to n = 13 all solutions were obtained with both heuristics. Then, the random selection performs very poorly. When expanding the last node in a cascade of expansion nodes, new decomposition nodes are created. But the next expansion will often occur at an expansion node in a different subgraph. Thus, the previously created decomposition nodes cannot be utilized for the next step. In contrast, the deterministic DFS starts the next expansion where new decomposition nodes have been constructed previously. As a result, the new approach yields solutions up to n = 21 in a moderate amount of time.
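The encoding described at the beginning of this subsection can be sketched as follows; the pairwise at-most-one clauses and the variable numbering q(r, c) are illustrative conventions rather than the exact encoding used in the experiments.

#include <vector>

typedef std::vector<int> Clause;
typedef std::vector<Clause> Cnf;

// Variable q(r,c) is true iff a queen is placed on row r, column c (0-based).
static int q(int r, int c, int n) { return r * n + c + 1; }

Cnf queensCnf(int n) {
  Cnf cnf;
  for (int r = 0; r < n; ++r) {                 // at least one queen per row
    Clause cl;
    for (int c = 0; c < n; ++c) cl.push_back(q(r, c, n));
    cnf.push_back(cl);
  }
  for (int c = 0; c < n; ++c) {                 // at least one queen per column
    Clause cl;
    for (int r = 0; r < n; ++r) cl.push_back(q(r, c, n));
    cnf.push_back(cl);
  }
  for (int r1 = 0; r1 < n; ++r1)                // pairwise "at most one" exclusions
    for (int c1 = 0; c1 < n; ++c1)
      for (int r2 = r1; r2 < n; ++r2)
        for (int c2 = 0; c2 < n; ++c2) {
          if (r2 == r1 && c2 <= c1) continue;
          bool row  = (r1 == r2), col = (c1 == c2);
          bool diag = (r2 - r1 == c2 - c1) || (r2 - r1 == c1 - c2);
          if (row || col || diag)
            cnf.push_back({ -q(r1, c1, n), -q(r2, c2, n) });
        }
  return cnf;
}

Together, the at-least-one and the pairwise at-most-one clauses enforce exactly one queen per row and column, while the diagonal clauses only forbid a second queen.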
3.1.3.2 ESOP Minimization
Compared to a SOP-representation of a function the ESOP-representation can be exponentially smaller (consider x1 ⊕ · · · ⊕ xn as an example). But most algorithms for ESOP minimization only apply local transformations to improve the representation starting from an initial solution, e.g. [BS93, MP01]. In [PC90] the problem to compute an ESOP for a given Boolean function f over n variables has been formulated using the Helliwell equation. The Helliwell equation Hf for function f has 3^n input variables, each input variable corresponds to a cube and is 1 if, and only if, this cube is chosen for the ESOP of f. A satisfying assignment to Hf determines an ESOP for f and vice versa. The hybrid structure was built for the Helliwell equation. By additional constraints, the number of cubes was limited to be at most k. The experimental results for applying this method to f = x1 ⊕ x2 ⊕ x3 ⊕ x4 are shown in Table 3.4.

Table 3.4. ESOP minimization

      BDD                   hybrid structure                                            Zchaff
      all solutions         ≥ 1 solution    ≥ 10³ solutions   ≥ 10⁶ solutions           1 sol.  10³ sol.  10⁶ sol.
 k    Time      #Nodes      Time  #Nodes    Time   #Nodes     Time    #Nodes            Time    Time      Time
 4    0.55      628         0.50  568       0.53   1108       0.53    1108              <0.01   0.07      0.07
 5    0.58      4075        0.53  638       0.60   4729       0.61    4729              <0.01   0.09      0.09
10    1.75      420655      0.47  145       0.70   11597      51.28   155018            <0.01   0.14      –
15    4.96      1428139     0.48  352       0.61   11634      10.17   172422            <0.01   0.11      –
20    53.96     2444782     0.47  112       0.54   7459       1.13    177708            <0.01   0.32      –
25    1945.01   3449866     0.48  490       0.52   5465       0.98    133396            <0.01   0.37      –
30    9985.37   4441463     0.49  495       0.49   2618       0.66    48107             <0.01   0.12      –
35    13900.22  5361182     0.52  544       0.51   878        0.75    21608             <0.01   0.16      –
39    13913.44  5906441     0.44  217       0.45   1241       0.53    5910              <0.01   0.09      –

Given are results for using BDDs, the hybrid structure, and the SAT solver Zchaff [MMZ+ 01]. We modified the SAT solver Zchaff to calculate more than one solution: For each solution a blocking clause is added and the solve process is continued. For the hybrid structure results are reported when different numbers of solutions are calculated: more than 1, more than 10³, and more than 10⁶ solutions, respectively. For different values of k the CPU time in seconds and the number of nodes in the BDD or the hybrid structure are reported, respectively. For Zchaff, the CPU time is given. The number of available solutions is not reported but grows rapidly. While there are only 38 valid solutions for k = 4, there are more than 5000 for k = 6 and more than 4 × 10⁶ for k = 9.

The results show the superiority of the hybrid approach compared to BDDs. For a tightly restricted solution space (k < 25) BDDs are feasible. But after that the memory and especially the run time requirements grow prohibitively fast. In contrast, the hybrid approach exhibits a rather stable performance as CPU time and memory requirements remain in the same order for all runs. The increased run time for k = 10, 15 when calculating more than 10⁶ solutions is due to the small number of possible solutions. In this case a large part of the BDD has to be recreated using the expansion technique without retrieving more solutions. As a result, BDDs are faster. But usually even calculating a large number of solutions does not degrade the performance of the new approach in the experiments. When calculating a single solution, the SAT solver is faster. But even for calculating 10³ solutions the computation time increases significantly. Finally, when calculating a large number of solutions the added blocking clauses lead to a memory blow-up even for the SAT solver. Using a more sophisticated
approach, the blocking clauses could be compacted but only at the expense of CPU time for logic optimization. By this, the new approach provides a good compromise between a SAT-based approach and a BDD-based approach.
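The Zchaff modification mentioned above, adding one blocking clause per solution found, can be sketched over an assumed incremental solver interface; the interface below is a placeholder and not Zchaff's actual API.

#include <vector>

typedef std::vector<int> Clause;

struct SatSolver {
  virtual void addClause(const Clause& c) = 0;
  virtual bool solve() = 0;
  virtual std::vector<int> model() = 0;   // satisfying assignment as one literal per variable
  virtual ~SatSolver() {}
};

// Enumerate up to maxSolutions satisfying assignments by blocking each one found.
long enumerateSolutions(SatSolver& solver, long maxSolutions) {
  long count = 0;
  while (count < maxSolutions && solver.solve()) {
    ++count;
    Clause blocking;
    for (int lit : solver.model()) blocking.push_back(-lit);  // forbid this exact assignment
    solver.addClause(blocking);
  }
  return count;
}

Each blocking clause excludes one complete assignment, which explains the memory blow-up observed for large solution counts.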
3.2
Summary and Future Work
Algorithms and data structures for Boolean function manipulation have been considered. A new approach to handle satisfiability problems was presented. This approach can be seen as an integrated technique using BDDs and SAT solvers and incorporates benefits of both: the memory consumption can be limited while calculating a large number of solutions in a single run. First heuristics have been proposed and evaluated to increase the performance of the new technique. Experiments show the efficiency of the hybrid technique in contrast to classical approaches. This technique can improve the robustness of Boolean function manipulation as the experiments show. But the technique can also be applied to improve the usability of design tools as it provides multiple solutions. The hybrid approach can be used to explore particular parts of the search space in depth while the exploration of less interesting parts may be deferred. This is beneficial at least in two cases. First, where multiple solutions provide a better basis for analysis, as it is the case for diagnosis (see also Chapter 6). Second, when there is a tradeoff between different solutions, e.g. for logic synthesis if power and size are considered. Integrating learning strategies as known from SAT provers is one focus of future work. The development of better heuristics and heuristics for particular problems is another direction.
Chapter 4 SYNTHESIS
The techniques presented in this chapter provide a flow for Synthesis for Testability of SystemC descriptions. Figure 4.1 shows the part of the design flow that is covered. The synthesizable description of a system is only an intermediate step in the design process. Coming from a high-level description, a synthesizable description is created. This is done mainly manually. While the synthesizable description is usually coded in a Hardware Description Language (HDL), a second system-level description is commonly described in a software programming language like C or C++ . This system-level description serves as a “golden model” for the design. From the functional point of view the synthesizable description is a refinement of the system-level description in the sense that the same output should be produced by both models upon the same input. Traditionally this is only exploited during simulation-based verification. But the time-consuming task of coding the two models is carried out twice in two independent processes. As a result, checking the consistency of both descriptions becomes very difficult as the simulation-based approach is not feasible for a complete check. Replacing the two languages – HDL and software programming language – by a single language can alleviate this weakness in the design flow. By using SystemC, the system-level description can be refined into a synthesizable one. Parsing and synthesizing SystemC are considered in the first part of this chapter. A subset of SystemC is used to describe a synthesizable model. Techniques for transforming this model into a gate-level description are explained. These techniques are implemented in the tool ParSyC. Experimental results show the efficiency. The technique for synthesis of SystemC was first presented in [FGC+ 04] and is part of the integrated design environment SyCE [DFGG05].
Figure 4.1. Synthesis part of the design flow
After transforming the HDL description into a gate-level description, logic synthesis is carried out to optimize the circuit structures, e.g. with respect to size or speed. During this step testability issues are usually not considered. Therefore the calculation of test-vectors for the postproduction test becomes a difficult task. The decision problem “Does there exist a test-vector for a particular fault?” is NP complete [IS75]. Additionally, the optimized circuit may contain redundancies, i.e. untestable faults. Here, Synthesis for Testability can help to ease Automatic Test Pattern Generation (ATPG). The technique MuTaTe addresses this problem. MuTaTe is presented in the second part of this chapter. The circuits that are created using this technique are 100% testable under the Stuck-At Fault Model (SAFM) and the Path-Delay Fault Model (PDFM). Moreover, the run time of test pattern generation is polynomial in the size of the resulting circuit. MuTaTe has been proposed in [DSF04].
4.1
Synthesis of SystemC
New design methodologies and design flows are needed to cope with the increasing complexity of today’s circuits and systems. As already explained, this applies to all stages of circuit design from system-level modeling and verification down to layout. One focus of research in this area is the use of new hardware description languages. Traditionally, the system level description is done in a programming language like C or C++ while dedicated hardware description languages like VHDL and Verilog are used at the RTL. This leads to a decoupling of the
behavioral description and the synthesizable description. Recently, developed languages allow for higher degrees of abstraction, and, additionally, the refinement for synthesis is possible within the language. One of these new languages is SystemC [LTG97, GLMS02]. Basis of SystemC is C++ . Therefore all features of C++ are available. The additional SystemC library provides all concepts that are needed to model hardware as, e.g. timing or concurrency. Research on the conceptual side and on algorithms that are applied to SystemC designs is difficult. For areas like high-level synthesis, verification, or power estimation a formal understanding of the given design is necessary before any subsequent processing can be carried out. Most recent publications either focused on special features of SystemC or C/C++ , like synthesis of fixed point numeric operations [BFGR03], polymorphism [SKWS+ 04], or pointers [KCY03], or considered the design methodology [MRR03]. Few works [e.g. MRH+ 01] have been published that rely on the formal model of an arbitrary SystemC design. One reason for this is the high effort to syntactically and semantically understand the SystemC description. For this purpose SystemC has to be parsed. Recently, different parsers for SystemC have been proposed. The approaches in [BPM+ 05, EAH05] only extract a coarse structure from a SystemC description. Behavioral information is lost. The analysis tool PINAPA [MMMC05] can handle high-level constructs. But PINAPA relies on the C++ compiler as a front-end for parsing the source code. Therefore, patches are needed to change to a different compiler or to a different version of SystemC. This makes the use of PINAPA difficult. Finally, the approach proposed in [GD06] is a direct enhancement of the one suggested here. In this section, a parser and synthesizer for SystemC is presented that is implemented as the tool ParSyC. This tool is also part of the design environment SyCE [DFGG05] for SystemC. The parser covers SystemC and to a certain extent C++ . The Purdue Compiler Construction Tool Set (PCCTS) [Par97] was used to build ParSyC. This parser produces an easy-to-process representation of a SystemC design in terms of an intermediate representation. The description is generic, i.e. any further processing can start from this representation, regardless of the application to visualization [GDLA03, GD06], formal verification [GD03], or other purposes. As an example, the application to synthesis of RTL descriptions is explained and the efficiency is underlined by experiments. Some advantages of this approach are easy extendability, adaptivity, and efficiency of the SystemC front-end. This section is structured as follows: The basic concepts of SystemC are discussed in Section 4.1.1. The methodology to create ParSyC and the exemplary application to synthesis are explained in Section 4.1.2. Advantages of the approach are discussed in Section 4.1.3. The experimental results are given in Section 4.1.4.
4.1.1
SystemC
The main concepts of SystemC are briefly reviewed in the following. SystemC is a system description language that enables modeling at different levels of abstraction. Constructs known from traditional hardware description languages are also provided. By this, any task between design exploration at a high level and synthesis at a low level can be carried out within the same environment. Features to aid modeling at different levels of abstraction are included in SystemC. For example, the concept of channels allows to abstract from details of the communication between modules. Therefore, modeling at the transactional level becomes possible [CJG+ 03]. This, in turn, enables fast design space exploration and partitioning of the design before working on the details of protocols or modules. In practice, SystemC comes as a library that provides classes to model hardware in C++ . For example, a hardware module is described using the class sc module provided by SystemC. All features of C++ are also available in SystemC. This includes dynamic memory allocation, multiple inheritance, as well as any type of complex operations on data of an arbitrary type. Any SystemC design can be simulated by compiling it with an ordinary C++ -compiler into an executable specification. But to focus on other aspects of circuit design a formal model of the design is needed. Deriving a formal model from a SystemC description is difficult: A parser that handles SystemC – and for this C++ – is necessary. But developing a parser for a complex language comes at a high effort. Moreover, the parser should be generic in order to aid not only a single purpose but to be applicable for different areas as well, e.g. synthesis, formal verification, and visualization.
4.1.1.1 Synthesis

In order to allow for concise modeling of hardware, several constructs are excluded from the synthesizable subset of SystemC [Syn02]. For example, SystemC provides classes to easily model buffers for arbitrary data using the class sc_fifo. An instance of type sc_fifo can have an arbitrary size and can work without explicit timing. Therefore, there is no general way to synthesize it. In principle, this could be solved by providing a standard realization of the class. But in order to obtain a good – e.g. small and/or fast – solution after synthesis, several decisions are necessary. For this reason, it is left to the hardware designer to replace this class by a synthesizable description. Dynamic memory allocation is also hardly synthesizable in an efficient way and is thus excluded from the synthesizable subset.

For a better understanding, synthesis of RTL descriptions is used to demonstrate the features of ParSyC. Due to this application the SystemC input is
restricted, but as a generic front-end ParSyC can handle other types of SystemC descriptions as well.
4.1.2 SystemC Parser
In this section, the methodology to build a parser and the special features used for parsing SystemC are explained. The synthesis of RTL descriptions is carried out using ParSyC as a front-end.

The methodology for parsing and compiling has been studied intensively (see, e.g. [ASU85]). Often the Unix tools lex and yacc are used to create parsers, but more recent tools provide simpler and more powerful interfaces for this purpose. Here, the tool PCCTS was used to create ParSyC. For details on the advantages of PCCTS see [PQ95, Par97].

The SystemC parser was built as follows: A preprocessor is used to account for preprocessor directives and to filter out header files that are not part of the design, like system header files. A lexical analyzer splits the input into a sequence of tokens. These are given as regular expressions that define keywords, identifiers, etc. of SystemC descriptions. Besides the C++ keywords, essential SystemC keywords are added, e.g. sc_module or sc_int. A syntactical analyzer checks whether the sequence of tokens can be generated by the grammar that describes the syntax of SystemC; the terminals of this grammar are the tokens. PCCTS creates the lexical and syntactical analyzer from the tokens and the grammar, respectively. Together they are referred to as the parser.

The result of parsing a SystemC description is an Abstract Syntax Tree (AST). At this stage no semantic checks, e.g. for type conflicts, have been performed yet. The AST is constructed using a single node type that has a pointer to its list of children and a pointer to one sibling. Additional tags at each node are used to store the type of a statement, the character string of an identifier, and other necessary information. This structure is explained by the following example.

Example 16. Consider the code fragment in Figure 4.3. Figure 4.2 shows the data types of the variables. Shown is one process of the robot controller introduced in [GLMS02]. Figure 4.4 shows a part of the AST for this process. Missing parts of the AST are indicated by triangles. In the AST produced by PCCTS each node points to the next sibling and to the list of children. The node in the upper left represents the if-statement from Line 3 of the code. The condition is stored as a child of this node. The then-part and the else-part of the statement are siblings of the child.
sc_in<sc_bv<8> >  uSEQ_BUS;
sc_out            LSB_CNTR;
sc_uint<8>        counter;
sc_signal         DONE, LDDIST, COUNT;

Figure 4.2. Data types in the process counter_proc
 1  void robot_controller::counter_proc()
 2  {
 3    if (LDDIST.read()) {
 4      counter = uSEQ_BUS.read();
 5    } else if (COUNT.read()) {
 6      counter = counter - 1;
 7    }
 8    DONE.write(counter == 0);
 9    LSB_CNTR.write(counter[0]);
10  }

Figure 4.3. Process counter_proc of the robot controller from [GLMS02]
Figure 4.4. AST for Example 16
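The single node type can be pictured as a small C++ structure like the following sketch. It only mirrors the properties stated above (one pointer to the list of children, one pointer to a sibling, and tags for the node type and identifier text); the names are illustrative and do not stem from the ParSyC sources:

#include <string>

// Hypothetical AST node, for illustration only.
struct AstNode {
  int         type;      // tag: kind of statement or expression this node represents
  std::string text;      // tag: e.g. the character string of an identifier
  AstNode*    children;  // pointer to the list of children (first child)
  AstNode*    sibling;   // pointer to one sibling
};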
The overall procedure when applying the parser for synthesis is shown in Figure 4.5. The dashed box indicates steps that are application-independent, i.e. the corresponding tasks have to be executed for other applications, such as visualization or formal verification, as well. The whole process can be divided into several steps.
Figure 4.5. Overall synthesis procedure: SystemC description → Preprocessor → Preprocessed SystemC description → Parser → Abstract syntax tree → Analyzer → Intermediate representation → Synthesizer → Netlist
After preprocessing, the parser is used to build the AST from the SystemC description of a design. The AST is traversed to build an intermediate representation of the design. All nodes in an AST have the same type; any additional information is contained in tags attached to these nodes. Therefore, different cases have to be handled at each node while traversing the AST. By transforming the AST into the intermediate representation, this information is made explicit in the new representation for further processing. The intermediate representation is built using classes that represent the building blocks of the design, such as modules, statements, or blocks of statements. During this traversal, semantic consistency checks are carried out. This includes checking for correct typing of operands, consistency of declarations, definitions, etc. Up to this stage the parser is not restricted to synthesis and all processing is application-independent.

The intermediate representation serves as the starting point for the intended application. At this point, handling the design is much easier because it is represented as a formal model within the class structure of the intermediate representation. The classes for keeping the intermediate representation
correspond to constructs in the SystemC code. Each component knows about its own semantics in the original description. Further processing of the design is done by adding application-specific features to the classes used for storing the intermediate representation. In case of synthesis, a recursive traversal is necessary. Each class is extended by functions for the synthesis of substructures to generate a gate-level description of the design.

Example 17. Again, consider the AST shown in Figure 4.4. The AST is transformed into the intermediate representation shown in Figure 4.6. The structure looks similar to that of the AST, but in the AST only one type of node was used. Now, dedicated classes hold the different types of constructs. The differentiation of these classes relies on inheritance in C++. Therefore, synthesis can recursively descend through the intermediate representation.

As usual in synthesis, RTL descriptions in SystemC are restricted to a subset of the possible C++ and SystemC constructs [Syn02]. C++ features like dynamic memory allocation, pointers, recursion, or loops with nonconstant bounds are not allowed, to prevent difficulties already known from high-level synthesis. In the same way, some SystemC constructs are excluded from synthesis as they have no direct correspondence at the RTL, e.g. as shown for sc_fifo in Section 4.1.1. Thus, for simplicity SystemC channels were
Figure 4.6. Intermediate representation
excluded from synthesis. For channels that obey certain restrictions, synthesis can be extended by providing a library of RTL realizations. All other constructs known from traditional hardware description languages are supported. This comprises the different operators for SystemC data types, hierarchical modeling, and concurrent processes in a module. Additionally, the new-operator is allowed for the instantiation of submodules to allow for a compact description of scalable designs.

The outcome of the synthesis process is a gate-level description in the Berkeley logic interchange format (blif) as used by SIS [SSL+92]. Switching the output format to VHDL or Verilog on the RTL is easily possible. Here, the focus is on parsing SystemC and retrieving a formal model from the description; no optimizations are applied. Logic synthesis will be considered later in Section 4.2.
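To make the recursive descent of Example 17 more concrete, the following is a minimal sketch of how intermediate-representation classes such as CBlock and CIfStatement could expose synthesis through inheritance. The common base class, the synthesize interface, and the Netlist type are assumptions for illustration only, not the actual ParSyC class interface:

#include <vector>

struct Netlist;  // hypothetical container for the generated gate-level description

// Assumed common base class: every construct of the intermediate representation
// knows how to synthesize itself.
struct CIrNode {
  virtual ~CIrNode() {}
  virtual void synthesize(Netlist& net) = 0;  // emit gates for this construct
};

// A block synthesizes its statements in order.
struct CBlock : CIrNode {
  std::vector<CIrNode*> statements;
  void synthesize(Netlist& net) override {
    for (CIrNode* s : statements) s->synthesize(net);
  }
};

// An if-statement recursively synthesizes its condition, then- and else-part;
// merging the branches (e.g. by multiplexing assigned signals) is omitted here.
struct CIfStatement : CIrNode {
  CIrNode* condition;
  CIrNode* thenPart;
  CIrNode* elsePart;
  void synthesize(Netlist& net) override {
    condition->synthesize(net);
    thenPart->synthesize(net);
    if (elsePart) elsePart->synthesize(net);
  }
};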
4.1.3 Characteristics
The presented approach to create a formal model from a SystemC description has several advantages:

Extendability. SystemC is still evolving. The parser can easily be enhanced to cope with future developments by extending the underlying grammar and the classes for the intermediate representation. The necessary changes are straightforward in most cases.

Adaptivity. Here, ParSyC is only exemplarily applied to synthesis, but several other applications are also of interest. When starting with a new application that should work on SystemC designs, the intermediate representation directly serves as a first model of the design.

Decoupling. The complex process of parsing should be hidden from the application. ParSyC serves as the front-end to “understand” a given SystemC description. The application-specific algorithms can be crafted without touching the SystemC code of the design.

Efficiency. A fast front-end is necessary to cope with large designs. The efficiency of the front-end is guaranteed by the compiler generator PCCTS. The subsequent application can directly start processing the intermediate representation that is given as a C++ class structure. Experiments are presented in the next section to underline the efficiency of ParSyC.

Compactness. The parser should be compact to allow for an easy understanding during later use and extension. The parser itself has only ≈1000 lines of code (loc), which includes the grammar and the necessary modifications beyond PCCTS to create the AST. The code for analyzing the SystemC
code and for the classes that represent the intermediate representation consists of ≈4000 loc. For synthesis ≈2500 loc are needed. The complete tool for synthesis including error handling, messaging, etc. has ≈9000 loc. Comments and blank lines in the source are not included in these numbers.
4.1.4 Experimental Results
All experiments have been carried out on a Pentium IV with Hyperthreading at 3 GHz and 1 GB RAM running Linux. ParSyC has been implemented using C++. A control-dominated design and a data-dominated design are considered in the first two experiments, respectively. Large SystemC descriptions are created from ISCAS89 circuits to demonstrate the efficiency of ParSyC in the third experiment.

4.1.4.1 Control Dominated Design

The scalable arbiter introduced in [McM93] has been frequently used in works related to property checking. Therefore, a SystemC description at the RTL was created and synthesized. The top-level view of the arbiter is shown in Figure 4.7. This design handles the access of NUMC devices to a shared resource. Device i can signal a request on line req_in[i] and may access the resource if the arbiter sets ack_out[i]. The arbiter uses priority scheduling but also guarantees that no device waits forever (for details we refer
4.1.4.1 Control Dominated Design The scalable arbiter introduced in [McM93] has been frequently used in works related to property checking. Therefore, a SystemC description at the RTL was created and synthesized. The top-level view of the arbiter is shown in Figure 4.7. This design handles the access of NUMC devices to a shared resource. Device i can signal a request on line req in[i] and may access the resource if the arbiter sets ack out[i]. The arbiter uses priority scheduling but also guarantees that no device waits forever (for details we refer 0 token_out req_in
override_in
Cell n–1
token_in
override_out
token_out
override_in
req_in
Cell 1
token_in
override_out
token_out
override_in
req_in token_in
Figure 4.7.
Cell 0 override_out
grant_out ack_out grant_in
grant_out ack_out grant_in
grant_out ack_out grant_in
Arbiter: Block-level diagram
61
Synthesis
#include "RTLCell.h"
#include "Inverter.cc"

#define NUMC 2

SC_MODULE(scalable) {
  // Declaration of inputs, outputs
  // and internal signals
  ...
  Inverter *inv;
  RTLCell *cells[NUMC];

  SC_CTOR(scalable) {
    for (int i= 0; i < NUMC; ++i) {
      // Create cell i
      cells[i]= new RTLCell("cells");
      if (i==0) {
        // Connect cell 0
        cells[i]->TICK(clk);
        ...
        cells[i]->ove_out(override_out);
      } else {
        if (i==(NUMC-1)) {
          // Connect cell NUMC-1
          ...
        } else {
          // Connect cell i
          ...
        }
      }
    }
    inv= new Inverter("Inverter");
    inv->in( override_out );
    inv->out( grant_in );
  }
};

Figure 4.8. Arbiter: Top-level module scalable
to [McM93]). Figure 4.8 shows the SystemC description of the top-level module scalable. For each of the NUMC devices a corresponding arbiter cell is instantiated and the cells are interconnected using a for-loop. Results for the synthesis with different numbers of arbiter cells are shown in Table 4.1. Given are the size of the netlist output and the CPU times needed. Note that
Table 4.1. Arbiter: Synthesis results

NUMC     In    Out   Latches    Gates      tp      ta      ts      tt
   5      6      5        10      942   <0.01   <0.01   <0.01    0.04
  10     11     10        20     1882   <0.01    0.01   <0.01    0.01
  50     51     50       100     9402   <0.01   <0.01    0.02    0.04
 100    101    100       200    18802   <0.01    0.01    0.02    0.04
 500    501    500      1000    94002   <0.01    0.03    0.08    0.11
1000   1001   1000      2000   188002   <0.01    0.06    0.16    0.24
the arbiter cells are described at the RTL and synthesis is carried out without applying optimizations. The netlist is built using two-input gates. The hierarchical description generated by the synthesis tool always contains 190 gates: 188 gates per arbiter cell plus one additional buffer and one inverter. The number of gates contained in the flattened netlist is larger; this number is shown in the table. The same holds for the number of latches: the flattened netlist contains two latches per arbiter cell, while the hierarchical netlist only contains two latches in total.

The times needed for parsing tp, analyzing ta, synthesis ts, and the total time tt are shown in the respective columns. As can be seen, scaling the arbiter does not influence the time for parsing because only the constant NUMC in the source code is changed. The time for analysis increases moderately since type checks for the different cells have to be carried out. During synthesis the for-loop has to be unrolled and, therefore, scaling influences the synthesis time. The total time is dominated by the time needed for synthesis and includes overhead like reading the template for the output format, parsing the command line, etc. Even synthesizing a design that corresponds to a flattened netlist with 188 k gates takes less than one CPU second.
4.1.4.2 Data Dominated Design

The second design is a Finite Impulse Response (FIR) filter of scalable width. Scalable are the number of coefficients and the bit-width of the data. A block-level diagram of the FIR-filter is shown in Figure 4.9. Incoming data is stored in a shift-register (d[0],...,d[n-1]). A read-only memory stores the filter coefficients (c[0],...,c[n-1]) as constants. The result is provided at the output dout. The SystemC description contains one process to create the shift-register and another process that carries out the calculations. The coefficients are provided by an array of constants.

Synthesis results for different bit-widths and numbers of coefficients are given in Table 4.2. In case of the arbiter, additional checks for submodules were necessary during the analysis of the for-loop. This is not the case for the FIR-filter, where no submodules are created; therefore, scaling does not influence the time needed for analysis. But for the FIR-filter the time for synthesis increases faster compared to the arbiter
Figure 4.9. FIR-filter: Block-level diagram
Table 4.2. FIR-filter: Synthesis results

Width   Coeff    In    Out   Latches    Gates      tp      ta      ts      tt
    2       2     3      4         8      159   <0.01   <0.01   <0.01   <0.01
    2       4     3      4        12      301   <0.01   <0.01   <0.01   <0.01
    2       8     3      4        20      585   <0.01   <0.01    0.02    0.03
    4       2     5      8        16      611   <0.01   <0.01    0.01    0.01
    4       4     5      8        24     1189   <0.01   <0.01    0.02    0.04
    4       8     5      8        40     2345   <0.01   <0.01    0.03    0.03
    8       2     9     16        32     2283   <0.01   <0.01    0.03    0.04
    8       4     9     16        48     4501   <0.01   <0.01    0.05    0.05
    8       8     9     16        80     8937   <0.01   <0.01    0.10    0.12
   16       2    17     32        64     8699    0.01   <0.01    0.07    0.08
   16       4    17     32        96    17269   <0.01   <0.01    0.14    0.15
   16       8    17     32       160    34409   <0.01   <0.01    0.30    0.31
   32       2    33     64       128    33819   <0.01   <0.01    0.29    0.29
   32       4    33     64       192    67381   <0.01   <0.01    0.63    0.64
   32       8    33     64       320   134505   <0.01    0.01    1.22    1.23
   64       2    65    128       256   133211   <0.01   <0.01    1.23    1.23
   64       4    65    128       384   265909   <0.01   <0.01    2.34    2.35
   64       8    65    128       640   531305   <0.01   <0.01    5.34    5.34
when the design is expanded. This is due to the description of the multiplication as a simple equation in SystemC:

  for (int i=0; i < n; i++) {
    tmp = c[i] * d[i].read();
    out = out+tmp;
  }

Instead of instantiating modules, the operations are directly described in the netlist. Therefore, the bit-width and the number of coefficients have a direct influence on the synthesis time and the size of the output.
Even the large design of 500 k gates has been parsed and analyzed very fast. The synthesis which includes writing the output to the hard disk only took about 5 CPU seconds.
4.1.4.3 Large SystemC Descriptions

The first two experiments showed the influence of scaling different types of designs. In the following, the influence of a large SystemC description is investigated. Circuits from the ISCAS89 benchmark set are considered. Starting from the netlist, BDDs (see Section 2.1.2) were built for each circuit. While building the BDDs, no reordering techniques were applied for size reduction. For each output and next-state function the BDD was dumped into an if-then-else structure which was embedded in a SystemC module. This module was synthesized.

The results are shown in Table 4.3. Given are the name of the circuit, the lines of code (loc), and the number of characters (char) of the SystemC code. The circuits are ordered by increasing loc. As can be seen, the time for parsing increases with the size of the source code, but it remains small even for large designs of several 100,000 loc. The time needed for analysis increases faster due to the semantic checks and the translation into the intermediate representation that are carried out at this stage. The largest amount of time is due to synthesis, where the intermediate structure is traversed and the netlist is written.

Table 4.3. ISCAS 89: Synthesis results

Circuit      Loc      Char      tp      ta      ts      tt    tg++
s27          184      3129   <0.01   <0.01    0.01    0.02    2.26
s298        1269     20798    0.01    0.02    0.07    0.12    2.26
s382        2704     47343    0.02    0.05    0.16    0.26    2.44
s400        2704     47343    0.03    0.04    0.16    0.26    2.41
s386        4260     69331    0.04    0.07    0.24    0.39    2.57
s526        3332     52999    0.03    0.05    0.19    0.31    2.56
s344        5103     86055    0.04    0.06    0.29    0.45    2.70
s349        5103     86055    0.05    0.09    0.29    0.49    2.73
s444        6264     97100    0.06    0.11    0.39    0.63    2.78
s641       54849    847546    0.48    1.27    4.16    6.47    8.27
s713       54849    847546    0.50    1.29    4.24    6.58    8.52
s1488      60605    981692    0.57    1.15    3.61    5.95    8.81
s1494      60605    981692    0.55    1.17    3.61    5.96    8.84
s1196     247884   3817191    2.27    5.57   16.53   26.82   30.88
s1238     247884   3817191    2.33    5.62   16.58   27.01   31.28
s820      402546   6130213    3.80   10.53   25.36   43.77   42.68
s832      402546   6130213    3.75   10.57   25.69   43.99   43.12

As a reference, the time needed to compile the SystemC code using g++ (version 3.3.2, optimizations turned off, no linking) is given in column tg++. Compiling the SystemC description using g++ means to create an
executable description of the design for simulation, while synthesis creates the hardware description of the design. The total run time needed for synthesis is comparable to the time needed by g++, even for the largest files.

The experiments have shown that ParSyC is an efficient front-end for SystemC. For this purpose, designs have been considered that are large in terms of the number of gates and in terms of the size of the SystemC description. The performance of ParSyC is comparable to the efficient and widely used compiler g++.
4.2 Synthesis for Testability
The previous section provided the methodology to transform a SystemC description into a functional description at the gate level. This gate-level description essentially describes the circuit as a Boolean function. In this section, a gate-level description is transformed into a fully testable circuit. The underlying algorithm relies on the use of BDDs for logic synthesis.

BDDs have been applied in several applications. For example, their importance for formal verification has already been discussed. But BDDs have also been studied in logic synthesis, since they allow aspects of circuit synthesis and technology mapping to be combined [GD00, DG02]. Recently, there has been renewed interest in multiplexor-based design styles, since multiplexor nodes can often be realized at very low cost (e.g. in Pass Transistor Logic (PTL)). In addition, these techniques allow layout aspects to be considered during the synthesis step and by this guarantee high design quality (see, e.g. [MSML99, MBM01]). In this context, circuits derived from BDDs often result in smaller chips.

Besides size, the testability of chips is another important issue, i.e. which faults in the resulting chip can be tested and which ones cannot. BDD circuits as introduced in Section 2.2.2 have been studied intensively under various fault models [ADK91b, ADK91a, Bec92, ADK93]; for an overview see [Bec98]. But none of these approaches can guarantee 100% testability in a “systematic way”. For example, in [Bec92] an algorithm is given that can compute all redundancies of the circuit in polynomial time. But the removal of these redundancies can generate new ones (so-called 2nd-generation redundancies), and for their removal only classical ATPG can be applied. On the other hand, many approaches have been presented to improve the testability of an already synthesized circuit based on circuit transformations (see, e.g. [CPK95]). But also here, the techniques to ensure full testability can be time-consuming, and it is desirable to have “testability by construction”.

In this section, a simple transformation is presented that guarantees full testability of a circuit derived from a BDD description under the stuck-at fault model and the robust path-delay fault model. The size of the circuit is directly proportional to the given BDD size (see Section 2.2.2). All optimizations of the BDDs based on variable ordering directly transfer to the resulting circuit
sizes. Only one extra input and one inverter are needed. The resulting circuits are free of redundancies. The algorithm has been implemented as a tool for Multiplexor Transformation for Testability (MuTaTe). Experimental results are given that show the advantages of the approach compared to traditional synthesis approaches and to “classical” mapping of BDDs. The presentation is structured as follows: The creation of fully testable BDD circuits is explained in the next section. Then, in Section 4.2.2, the main results on testability are presented. Experimental results that show the efficiency of the synthesis step and the improvements regarding testability are reported in Section 4.2.3.
4.2.1 BDD Transformation
In the following, we first describe the transformation that derives a circuit from a given BDD description. Then, some properties of the resulting circuits are discussed. In the next sections, the testability properties regarding the SAFM and the PDFM are studied.

Analogously to the “standard approach” from [Bec92] as explained in Section 2.2.2, the circuit is generated by traversing the BDD and substituting each node with a MUX cell. But the methods differ when reaching nodes that have one or two pointers to terminal nodes. In this case, usually the MUX cell is simplified. For example, if the 0-input is connected to constant 0, the MUX cell can be simplified and substituted by an AND-gate. Here, all nodes – also the ones pointing to terminals – are substituted by complete multiplexor cells. The terminal node 0 is then substituted by a new primary input t (= test). Furthermore, t is connected to the 1-terminal of the BDD by an inverter.

Example 18. The generation of a circuit from a BDD was already explained in Example 8 in Section 2.2.2. Figures 4.10(a) and 4.10(b) repeat this example. The BDD for the function f = x1 x2 + x3 is drawn upside down to underline the similarities to the resulting circuit. If the approach from [Bec92] is applied, the BDD circuit in Figure 4.10(b) results (shown without simplification). The transformation described above generates the circuit in Figure 4.10(c).

Remark 6. It is important to notice that for multiplexor-based design styles, like, e.g. PTL, the “simplification” of the MUX cell does not really imply savings in area or delay, since the complete multiplexor cell is often easier to realize.

If t is set to constant 0, the circuit computes the original function. If t is set to 1, the complement is computed. It is important to observe that by changing
Figure 4.10. Generation of circuits from BDDs: (a) BDD, (b) approach of [Bec92], (c) MuTaTe
the value of t all “internal” signals, i.e. signals corresponding to edges in the BDD, change their value. This can be seen as follows: Inverting the terminals means inverting the function represented by a node. According to the Shannon decomposition f = xi·g + ¬xi·h, this inversion propagates to all intermediate nodes as well:

  ¬f = ¬(xi·g + ¬xi·h) = ¬(xi·g) · ¬(¬xi·h) = (¬xi + ¬g)·(xi + ¬h) = xi·¬g + ¬xi·¬h
Applying this recursively to the BDD shows that all signals in the BDD circuit change their value. This guarantees the applicability of the values needed at the fault location, as explained below.

Furthermore, the propagation of the faulty behavior to an output has to be ensured. This is one of the reasons why multiplexor cells are well suited to compose testable circuits. Due to the control input, a propagating path can easily be generated. Thus, the propagation of a value from a fault location is no problem, and only the values to excite the faulty behavior have to be applied. In previous approaches (see, e.g. [Bec92, ADK93]), modifications of the circuit were described; but these change the multiplexor structure and, by this, also destroy the propagation properties of the multiplexors.
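The transformation itself can be summarized by a short recursive routine. The following sketch uses assumed node and netlist types (none of the names stem from the MuTaTe implementation); it maps every BDD node to a complete MUX cell, routes the 0-terminal to the new test input t and the 1-terminal to its inversion, and shares cells for shared BDD nodes:

#include <unordered_map>
#include <vector>

// Hypothetical BDD node: variable index plus 0- and 1-successor; terminals are marked explicitly.
struct BddNode {
  int      var;
  BddNode* low;
  BddNode* high;
  bool     isTerminal0;
  bool     isTerminal1;
};

// Hypothetical netlist that records gates and returns signal ids.
struct Netlist {
  struct Gate { char kind; int a, b, c; };
  std::vector<Gate> gates;
  int addPrimaryInput()               { gates.push_back({'I', -1, -1, -1}); return (int)gates.size() - 1; }
  int addInverter(int in)             { gates.push_back({'N', in, -1, -1}); return (int)gates.size() - 1; }
  int addMux(int sel, int i0, int i1) { gates.push_back({'M', sel, i0, i1}); return (int)gates.size() - 1; }
};

// Substitute every BDD node by a complete MUX cell; no cell is simplified.
int buildMuTaTe(BddNode* node, Netlist& net,
                const std::vector<int>& varInput,   // signal id per BDD variable
                int t, int tInv,                    // test input and its inversion
                std::unordered_map<BddNode*, int>& done) {
  if (node->isTerminal0) return t;          // 0-terminal -> new primary input t
  if (node->isTerminal1) return tInv;       // 1-terminal -> inverted test input
  auto it = done.find(node);
  if (it != done.end()) return it->second;  // reuse cells for shared nodes
  int lo  = buildMuTaTe(node->low,  net, varInput, t, tInv, done);
  int hi  = buildMuTaTe(node->high, net, varInput, t, tInv, done);
  int out = net.addMux(varInput[node->var], lo, hi);
  done[node] = out;
  return out;
}

Setting t = 0 at the new input then yields the original function at the circuit output, and t = 1 yields its complement, as stated above.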
4.2.2 Testability
The main results with respect to the testability of circuits generated by MuTaTe are presented in the following. Sketches of how to formally prove these results are also given.
4.2.2.1 Stuck-At Fault Model

As has been observed in [Bec92], stuck-at redundancies in a mapped BDD circuit can only occur at a cell if one of the values 01 or 10 is not applicable to the data inputs of this cell. On the other hand, at least one of the values is applicable (otherwise both functions would be equivalent and, consequently, the BDD would not be reduced). Due to the properties of the new input t, i.e. all internal signals change their value, the missing value can be applied by changing the value at t. We obtain:

Theorem 1. By one additional input and one inverter a circuit can be generated from a BDD that is 100% testable for single stuck-at faults.

For the generation of a test, the efficient polynomial synthesis operations on BDDs can be used (see Section 2.1.2). For each multiplexor cell the set of applicable values can easily be computed by carrying out AND-operations on the corresponding BDD nodes. The propagating path can be determined by a linear-time graph traversal.

Lemma 1. In the resulting circuits, test pattern generation for stuck-at faults can be carried out in polynomial time.
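To illustrate how such an applicability check could be carried out with BDD operations, the following sketch uses the CUDD package that is also employed for the experiments in this book; the helper itself, its name, and its parameters are assumptions for illustration and are not taken from the MuTaTe sources:

#include "cudd.h"

// Check whether the value combination (v0, v1) is applicable to the data inputs
// of a MUX cell whose data inputs realize the functions g and h: the combination
// is applicable iff some primary input assignment produces exactly these values.
int combinationApplicable(DdManager* dd, DdNode* g, DdNode* h, int v0, int v1) {
  DdNode* litG = v0 ? g : Cudd_Not(g);
  DdNode* litH = v1 ? h : Cudd_Not(h);
  DdNode* both = Cudd_bddAnd(dd, litG, litH);  // characteristic function of the combination
  Cudd_Ref(both);
  int applicable = (both != Cudd_ReadLogicZero(dd));
  Cudd_RecursiveDeref(dd, both);
  return applicable;
}

If, say, the combination 01 turns out to be missing at a cell, Theorem 1 states that toggling the test input t makes it applicable.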
4.2.2.2 Path-Delay Fault Model

The same arguments as given above also ensure that all paths are testable under the PDFM. At each cell the values 10 or 01 can be applied (dependent on t). Thus, the paths starting at an input corresponding to the variable xi can be propagated along any of the two AND-gates in the MUX cell (see Figure 2.9 on Page 22). Furthermore, due to the propagation along the multiplexors, it is easy to see that the paths starting at t can be tested. We obtain:

Theorem 2. By one additional input and one inverter a combinational circuit can be generated from a BDD that is 100% testable for robust path-delay faults.

After the applicable values have been determined based on BDD operations, the two patterns for a robust test can be determined by a traversal of the circuit in linear time. The two patterns only differ in the value of the primary input where the path starts.

Lemma 2. In the resulting circuits, test pattern generation for path-delay faults can be carried out in polynomial time.
Figure 4.11. Redundancy due to simplification
4.2.2.3 Partial Simplification

As has been observed in Remark 1 on Page 22, the simplification of the MUX cells can destroy the testability. But not all types of simplifications have this property: if both data inputs have constant values, the MUX cell can be substituted by a simple wire or an inverter. Due to this substitution, 100% testability is preserved. Dependent on the design style, this should be preferred.

If exactly one of the data inputs is constant, there exist four cases, i.e. the left or the right data input is constantly 0 or 1, respectively. For one of these cases we provide an example showing that the simplification of the MUX cell results in redundancies. A similar example can be created for the remaining three cases.

Example 19. Assume the right data input is constantly 1 (see Figure 4.11): The simplification results in an OR-gate, but then the combination 01 cannot be applied to the data inputs of the next MUX, since a value 1 at the right data input directly implies a 1 at the left data input. According to the classification given in [Bec92], this would result in untestable faults in the MUX.

In summary, only if both data inputs have constant values can the MUX cell be simplified without creating redundancies in design styles where this is beneficial. For the other four cases, full testability cannot be guaranteed. In the following, we only perform the simplifications that ensure full testability.
4.2.3 Experimental Results
The technique described above has been implemented as the tool MuTaTe. The program is implemented in C and all experiments have been carried out on a SUN Sparc 20 with 64 MB of main memory. For the experiments we used some of the benchmarks from LGSynth91 [Yan91]. As the underlying BDD package, CUDD has been used [Som01a]. In the following, we restrict ourselves to a study of the PDF coverage (PDFC) of the circuits. The PDFC was determined using an improved version of the tool BiTeS [Dre94]. For each circuit we report the number of literals (measured using SIS [SSL+ 92]), the number of paths that have to be tested, and the PDFC in percent. Of course, the number of paths can become a crucial factor since a large number results
Table 4.4. Benchmarks before and after optimization by SIS

                           Original                     Optimized
Circuit    In   Out    Lits     NoP    PDFC       Lits     NoP    PDFC
5xp1        7    10     391     296    98.4        158    1071    19.9
C17         5     2      23      11   100.0         10      11   100.0
alu2       10     6     909   69521     1.0        415   61362     0.9
b9         41    21     408     301    93.3        171     410    89.0
clip        9     5    1026     888    90.4        273     571    74.3
con1        7     2      48      23   100.0         21      22   100.0
count      35    16     338     368   100.0        159     384   100.0
i1         25    13     118      97    97.4         50      98    77.5
i5        133    66     689     883   100.0        198     672   100.0
t481       16     1    2673    4752   100.0        532    2405    89.5
tcon       17    16      90      56    85.7         32      40   100.0
9sym        9     1     655     522    96.5        333     518    91.1
f51m        8     8     409     319    98.2        155   30950     0.7
z4ml        7     4     325     252   100.0         46     368    25.2
x2         10     7      97      90    74.4         51      79    81.0
in high test costs. For BDD circuits, reducing the number of paths in the BDD [FD06] is one method to address this problem [FSD04].

In Table 4.4, the name of the benchmark is given in the first column, followed by the number of inputs and outputs in columns two and three, respectively. The number of literals, the number of paths, and the PDFC are given in columns Lits, NoP, and PDFC, respectively. Column Original gives the numbers for the benchmark as it is given in the description. Column Optimized gives the numbers for the circuits that have been optimized by SIS using script rugged. As can be seen, the PDFC varies significantly. While some circuits have a testability of 100%, for others only 1% of the paths (or even less) are robustly testable. It is also important to notice that the optimization techniques used can result in a large number of untestable paths although the original circuit was very well testable. Consider circuit f51m as the most obvious example: even though the original circuit had a PDFC of 98.2%, the coverage of the optimized netlist is less than 1%.

In a second series of experiments, we study the PDF testability of BDD circuits. The results for BDDs without optimization are shown in Table 4.5; Table 4.6 presents the results when the BDDs were optimized by sifting. In column MUX-map, the results are given for a direct mapping of BDDs with simplification of the constant values as described in [Bec92]. As has been observed in Remark 1, the “full simplification” can result in untestable paths. But already in this case the resulting BDD circuits have a significantly better testability than the ones generated by SIS (see above), i.e. always more than 60%.
Table 4.5. Path-delay fault coverage of BDD circuits

                 MUX-map                   MuTaTe                   MuTaTe-S
Circuit    Lits     NoP   PDFC      Lits     NoP   PDFC      Lits     NoP   PDFC
5xp1        273     273   89.0       320     574  100.0       308     364  100.0
C17          26      22   68.1        41      44  100.0        29      26  100.0
alu2        652     873   86.9       718    1713  100.0       698     984  100.0
b9          609    1773   64.6       768    3429  100.0       700    2370  100.0
clip        603     954   79.4       667    1862  100.0       655    1130  100.0
con1         53      47   74.4        77      95  100.0        61      59  100.0
count       832    2248   66.1       928    4072  100.0       864    2704  100.0
i1          157     137   74.4       230     295  100.0       186     184  100.0
i5         2659   44198   61.3      3032   81381  100.0      2768   51345  100.0
t481        301    4518   86.1       328    8671  100.0       312    5473  100.0
tcon         32      40  100.0        96     112  100.0        32      40  100.0
9sym         84     328   72.5        97     658  100.0        93     490  100.0
f51m        239     326   99.3       262     668  100.0       242     332  100.0
z4ml         75     175   77.1        92     358  100.0        84     232  100.0
x2          147     188   72.3       183     379  100.0       171     244  100.0
Table 4.6. Path-delay fault coverage of BDD circuits optimized by sifting

                 MUX-map                   MuTaTe                   MuTaTe-S
Circuit    Lits     NoP   PDFC      Lits     NoP   PDFC      Lits     NoP   PDFC
5xp1        131     218   83.0       168     463  100.0       160     337  100.0
C17          21      18   66.6        29      35  100.0        25      23  100.0
alu2        673     749   84.5       746    1446  100.0       730     867  100.0
b9          374     870   66.7       520    1707  100.0       468    1227  100.0
clip        353     764   76.4       391    1466  100.0       379     938  100.0
con1         38      28   85.7        61      59  100.0        45      35  100.0
count       221     624   58.9       320    1024  100.0       252     880  100.0
i1          141     111   71.1       194     226  100.0       170     142  100.0
i5          341    1102   67.6       552    2301  100.0       540    1998  100.0
t481         72    4065   74.6        80    7201  100.0        76    4996  100.0
tcon         32      40  100.0        96     112  100.0        32      40  100.0
9sym         84     328   72.5        97     658  100.0        93     490  100.0
f51m        133     236   83.8       158     494  100.0       146     314  100.0
z4ml         51     169   72.7        64     346  100.0        60     238  100.0
x2           87      80   73.7       123     160  100.0       111     112  100.0
The results for the new approach are given in the next two blocks. Column MuTaTe gives the results for a direct mapping, i.e. the new test input is connected to each constant input to a MUX cell, while MuTaTe-S performs the simplifications described in Section 4.2.2.3 that preserve testability. In some rare cases (e.g. tcon), this reduction also removed the additional input. In these cases 100% PDFC is ensured while no additional input is needed.
As can be seen, in both cases 100% PDFC is ensured. As is well known, the size of a BDD (and by this of the resulting BDD circuit) largely depends on the chosen variable ordering. Comparing the literal count of the final circuits in Table 4.6, i.e. the size-optimized BDDs, with those of SIS in Table 4.4 shows that the synthesis methods are somewhat “orthogonal”. For several circuits the sizes are comparable. In some cases SIS is significantly better (see, e.g. b9), while for others BDDs are better suited. For example, for t481 the BDD circuit generated by MuTaTe-S is seven times smaller than the corresponding circuit produced by SIS. Furthermore, the synthesis scenario considered in our experiments is to be seen as a “worst case” for BDD circuits (cf. Remark 6), since all cells are mapped to basic gates. For MUX-oriented design styles the reduction in size can be expected to be even larger.
4.3 Summary and Future Work
A flow to produce fully testable circuits from SystemC descriptions has been outlined. This is done in two steps. The parser and synthesis tool ParSyC is used as a front-end to construct a formal model from a SystemC description. The formal model is given by an intermediate representation that can serve as a starting point for other applications in the design flow as well, e.g. verification and visualization. This hides the complexity of parsing a SystemC description from the intended application. As an example, the synthesis of RTL descriptions has been shown. Several experiments underline the efficiency of the tool. Bridging the refinement steps within SystemC using formal techniques remains an important area for future work.

The second step is logic synthesis. Here, synthesis for testability is applied to the gate-level model using a BDD transformation. The resulting circuits are fully testable under the stuck-at fault model and under the (robust) path-delay fault model in the combinational case. The transformation only needs one extra input and one inverter. The algorithm has been implemented as the program MuTaTe. Experimental studies have demonstrated the advantages of the approach. The optimization of the resulting circuits has been studied in [FSD04]. There, for example, the technique proposed in [FD06] was applied to reduce the number of paths in the BDD and, by this, the time needed for testing. Furthermore, results on the testability with respect to the bridging fault model have been reported in [SFD05a].

Unfortunately, the technique is only applicable to relatively small circuits because it relies on monolithic BDDs. This also means that the composition of fully testable circuits created using MuTaTe is an important direction for future work. Meanwhile, traditional ATPG techniques have to be applied to those circuits that cannot be handled by MuTaTe. But formal techniques can also be applied to improve the robustness of traditional ATPG algorithms, e.g. by integrating modern SAT provers into traditional ATPG environments [Lar92, SBSV96, SFD+05b, DF06].
Altogether the two techniques proposed in this chapter increase the robustness of the design flow. The application of SystemC as the only language for system level and register transfer level helps to avoid inconsistencies between the different abstraction levels. Having a technique to create fully testable circuits during logic synthesis improves the robustness of ATPG, especially due to the ability to classify all faults in polynomial time.
Chapter 5 PROPERTY GENERATION
In the previous chapter, the synthesis path of the design flow has been considered. This chapter focuses on design verification. The coarse parts of the flow covered in this chapter are shown in Figure 5.1. A more detailed presentation of the proposed methodologies is given in the respective sections.

Verification is a major issue in the design of integrated circuits and systems. According to Moore's law, the number of elements in a manufacturable circuit doubles every 18 months. But the design productivity increases at a lower rate. The result is a design and verification gap. On the other hand, verification of circuits is becoming even more important as circuits are applied in a variety of systems concerned with security issues.

Design verification means checking the compliance of the system with the textual specification. Traditionally this is done by simulation. A testbench is defined that contains typical operation scenarios of the system and also other scenarios of interest. These are simulated on the system-level description and the HDL description, and the correctness of the output responses of the design is checked. The main disadvantage of this approach is the low coverage of input sequences and design states. For example, a design with only 100 flip-flops already has 2^100 states. Therefore, only a fraction of the states, not to mention the interaction between different states, can be checked using simulation. Often corner cases are not considered. The “Pentium bug” is a well-known example that escaped verification and eventually caused a huge financial loss and – even worse – damaged the image of the company.

In contrast, property checking is complete in the sense that a proven property is valid in all states of the design and under any input sequence. Moreover, due to the maturity of verification tools, property checking becomes feasible for larger designs, and even complete industrial designs can be verified at the block level by property
Figure 5.1. Verification part of the design flow
checking [WTSF04]. But the creation of properties is a manual task that is not supported by any software tools.

The techniques proposed in this chapter provide an innovative approach to address both deficiencies explained above: the detection of gaps in testbenches and the creation of properties. Moreover, a completely new verification methodology is introduced that helps to speed up “design understanding”. In contrast to standard verification approaches, the new one is interactive and relies on the automatic generation of properties.

The first part of the chapter introduces a technique to automatically derive properties from given simulation traces. The technique is based on pattern matching and is very efficient. The derived property is always valid with respect to the simulation trace and can be verified on the design afterward. If the property is not valid on the design, the counterexample explicitly shows a gap in the testbench, i.e. a sequence of input values or states that was not covered by the testbench. In this sense, the automatic generation of properties helps to set up a testbench in a traditional design environment. By this, the approach bridges the transition between the traditional design flow and the enhanced design flow proposed here. Additionally, a computer-aided creation of properties becomes possible. This speeds up the time-consuming manual creation of formal properties also in the enhanced design flow. This technique has been published in [FD04].

The second part of the chapter builds on these techniques for an interactive generation of properties and – even more importantly – yields a new methodology for design verification. The tool for property generation is
used interactively to question the behavior of the design. For this purpose a set of signals and possibly a description of the scenario of interest is selected by the user. This is used to generate a property which is then compared to the expectation. This way the behavior of the design can be explored interactively. As a result, the user learns information about the design which is termed “design understanding” in the following. This methodology was introduced in [DF04].
5.1 Detecting Gaps in Testbenches
Formal verification methods guarantee completeness under any input sequence and in any state of the design. The compliance of a design with the specification is formally verified by model checking. Nonetheless, due to the familiarity of designers with simulation and the fact that whole systems cannot be handled by property checking due to their complexity, simulation is still widely used to check the correctness of a system. For this, large testbenches for a system are created. Techniques to gather information about the reliability of the verification mostly rely on coverage metrics. Simulation-based approaches use monitors during simulation to determine the amount of coverage, e.g. statement coverage or line coverage. But even achieving 100% coverage with respect to a certain metric by simulation still cannot guarantee correctness.

Here, a method is proposed to formally analyze a testbench and to check which parts of the functional behavior of a design are not tested. Therefore, this technique bridges the gap between the traditional simulation-based verification flow (see Figure 1.1) and the enhanced verification flow based on property checking (see Figures 1.2 and 5.1). As a result, the traditional techniques and the new techniques can be combined as shown in Figure 5.2. PropGen is the tool to generate properties from the testbench. While usually only the simulator is available to check the design by means of the testbench, PropGen is employed to analyze the testbench. An invalid property leads to a counterexample produced by the property checker. This counterexample exhibits behavior that is not examined by the testbench, i.e. a gap in the set of stimuli provided by the testbench. This knowledge can be used, e.g. to extend the testbench. The integration of PropGen and the property checker – as indicated by the dashed shape in the figure – results in an easy-to-use push-button tool for analyzing a testbench. The user does not have to know about the underlying formal techniques. In summary, a crosscheck of testbench and design is established by this method.

Additionally, a mechanism to generate more focused properties is provided. The generation of properties which show all dependencies between certain signals can lead to properties that are too general. By applying restrictions, PropGen can be guided to find properties for certain situations, e.g. a particular
Figure 5.2. Integration into the verification flow
operating mode of the design. Experimental results show the efficiency of the approach: A property is generated from traces of more than 1 million clock cycles in at most 6 min but usually in less than 10 s. This section is structured as follows: The basic procedure for PropGen together with techniques for pruning the search space and finding useful properties is explained next. A heuristic to select “useful” properties from generated properties and a method to guide the search for properties are presented in Section 5.1.2. The application and experiments showing the efficiency of PropGen for large traces are given in Section 5.1.3.
5.1.1 Generating Properties
In the following, properties expressed in terms of propositional logic are considered (see Section 2.3.2). The generation of properties is based upon pattern search in a simulation trace. A particular pattern in the trace shows a relationship between signals and, by this, indicates an underlying property. The property is generated by taking all patterns that occur in the trace into account.

The basic procedure for deducing a property is given in Figure 5.3. Given are a tuple of signals I and a maximal window tmax for the properties to be generated, as well as a simulation trace T = (U, (u1, ..., utcyc−1)). In the property, a particular time step in the window is assigned to each signal; this time step is not given in advance. An assignment of time steps to signals is called a time relation in the following. The iteration over all possible time relations R is the outer loop (Line 2). At the beginning nothing is known about the property; it is
1  PropGen(I, tmax, T)
2    foreach time relation R
3      p(R) = 0
4      foreach time step 1 ≤ t < tcyc
5        pat = getPattern(I, R, T, t);
6        addPattern(p(R), pat);

Figure 5.3. Sketch of the property generation
Figure 5.4. Simulation trace for the shift-register: (a) waveform; (b) vector representation of the waveforms
initialized to the constant function 0. Then, at each time step of the trace, the behavior of the signals is determined in terms of a pattern (Line 5) and included in the property (Line 6). The property for a particular time relation R is valid within the trace by construction because all occurring patterns are considered.

A pattern is the vector that gives the values of the signals at the time steps determined by the time relation. The time relation R assigns to a signal sig ∈ I the time offset R(sig) within the property. For a window starting at time t, the value inserted for signal sig is sig[t + R(sig)]; thus, the pattern is determined by the trace. This assignment of values for a pattern is done by getPattern. Then, the behavior reflected by the pattern is included in the property by addPattern. This is achieved by rewriting the pattern as a conjunction of literals of the variables in I at the time steps determined by R. For a value of 0 in the pattern the negative literal is used, for the value 1 the positive literal. This cube determines one valid assignment to the signals; the sum of all these cubes leads to the property p(R).

Example 20. Again, the simulation trace that was introduced in Example 11 on Page 30 is considered. For convenience, the waveforms and the trace are repeated in Figure 5.4; the shift-register that produces this trace is shown in Figure 5.5.
Let the tuple of signals I be (x2, x1, s1), and let R(x2) = R(x1) = 0 and R(s1) = 1. Now, for each time step t the pattern is given by (νx2[t], νx1[t], νs1[t + 1]). At time steps 0, 1, and 2 a pattern is found, each of which leads to a cube:

0) (0, 1, 1) → ¬x2[t] · x1[t] · s1[t + 1]
1) (0, 1, 1) → ¬x2[t] · x1[t] · s1[t + 1]
2) (0, 0, 0) → ¬x2[t] · ¬x1[t] · ¬s1[t + 1]

No other patterns are found at later time steps. The resulting property is the sum of the cubes, i.e. p(R) = ¬x2[t] · x1[t] · s1[t + 1] + ¬x2[t] · ¬x1[t] · ¬s1[t + 1].

The number of time relations is large since each of the signals can be assigned to any time step from 0 to tmax − 1. This leads to (tmax)^|I| time relations. But the search space can be pruned by using the following rules:

1. At least one time reference must be zero, otherwise the same time relation is considered more than once with a constant offset.

2. No signal is considered twice at the same time step. If a signal occurs more than once in I, different time steps are assigned to the different instantiations of the signal.

3. An input is never considered at the last time step tmax − 1 in the time relation. The input value has no influence on an observed state bit or output value if it occurs at the last time step of the window.

Another observation helps to further reduce the search space. Given |I|, at most 2^|I| possible patterns can occur with respect to a particular time relation. If all possible patterns occur, the sum of the cubes returns the constant function 1 as a property, i.e. a property that is always valid or a “trivial property”. Thus, such a time relation does not lead to a useful property and further scanning is skipped.

Currently, the algorithm considers only one time relation for property generation. As a result, no property that includes several time relations can be generated. This would be needed for existential quantification: in the propositional property, it breaks down to a disjunction over several time relations.

The resulting property itself is represented by a BDD (see Section 2.1.2). This introduces some abstraction from the cube representation, e.g. don't cares are easily determined. Because |I| – the number of signals considered – is relatively small, BDDs are suitable to represent the property.
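A minimal sketch of this pattern collection is given below. In the actual tool the property is stored as a BDD, whereas here a set of patterns simply stands in for the sum of cubes; all type and function names are illustrative assumptions:

#include <cstddef>
#include <set>
#include <vector>

// Illustrative types: a trace is a vector of time frames, each frame holding one
// Boolean value per signal; a time relation maps each monitored signal to its offset R(sig).
using Frame = std::vector<bool>;
using Trace = std::vector<Frame>;
struct Monitored { std::size_t signal; std::size_t offset; };

// Collect all patterns occurring in the trace for one time relation.
// Returns false if the property is trivial, i.e. all 2^|I| patterns occurred.
bool collectProperty(const Trace& trace, const std::vector<Monitored>& rel,
                     std::size_t tmax, std::set<std::vector<bool>>& patterns) {
  const std::size_t all = std::size_t(1) << rel.size();     // 2^|I|
  for (std::size_t t = 0; t + tmax <= trace.size(); ++t) {  // windows fully inside the trace
    std::vector<bool> pat;
    for (const Monitored& m : rel)
      pat.push_back(trace[t + m.offset][m.signal]);         // value sig[t + R(sig)]
    patterns.insert(pat);
    if (patterns.size() == all) return false;               // trivial property: stop scanning
  }
  return true;  // the sum of the collected cubes is the property p(R)
}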
5.1.2 Selection of Properties
This section explains how “useful” properties are chosen, how the generation of properties can be guided and how property completion with respect to the design may be applied.
5.1.2.1 Choosing a Useful Property

For each time relation that is not pruned by the rules shown above, a valid property is generated. Then it has to be decided which of this large number of properties are “useful”. Obviously, this cannot be done fully automatically, but some help can be provided.

As stated at the end of the last section, a property that is trivially true, i.e. equal to constant 1, is of no use. Also, if the relation between some signals in time is determined by the underlying circuit, the number of occurring patterns is usually small compared to 2^|I|. In the other case, if the relation of the signals is not determined by the circuit, the values in the patterns seem to be randomly distributed and thus the number of occurring patterns is close to 2^|I|.

Example 21. Once more consider the shift-register given in Figure 5.5, I = (x2, x1, s1), and a trace reflecting any state and any input sequence for the shift-register. For the time relation given in Example 20 the following holds: “If x2[t] = 0, then s1[t + 1] is equal to x1[t].” This breaks down to two cubes representing ¬x2[t] · (x1[t] ↔ s1[t + 1]). “If x2[t] = 1, then s1[t + 1] and x1[t] are independent,” leading to the cube x2[t]. This corresponds to six patterns in total. Now, consider a time relation where the value of s1 is taken at a time step greater than 1. In this case the value cannot be predicted from the other two values. Therefore, all patterns occur and the property becomes the constant function 1, i.e. trivially true.

Figure 5.5. 1-bit-shift-register

This observation can be used to order the properties generated from the trace. Resulting properties are ordered by increasing numbers of different patterns that were observed. The property with the least number of patterns has the
highest ranking. This ranking is used to decide about the “usefulness” of properties. The ranking also helps to prune the evaluation of other time relations: only a limited number of properties with the least number of patterns is retained. As a consequence, the order in which different time relations are evaluated influences the time needed to generate properties. Further scanning of the trace can be skipped as soon as the number of patterns observed for a time relation exceeds the previously determined limit. Thus, finding the time relation corresponding to the smallest number of patterns early makes the evaluation more efficient.
5.1.2.2 Guided Property Generation

When confronted with a large design, more focused properties can be useful. This can be formulated as an assumption that restricts the property generation. In case of the shift-register, there are two different modes of operation: either x2 = 0, i.e. the register shifts at each clock cycle, or x2 = 1, i.e. the register keeps the present state. So far the method only allows properties to be generated for any relation between the signals. Often a property focused on a certain mode of operation is more desirable. This focusing can be done by extending the property with an assumption. The assumption restricts some signals in I to a certain value, to the value of another signal in I, or to a particular time step within the window. Only a pattern that does not violate the assumption is included in the property. The assumption can also be rewritten as a propositional formula a. Thus, the resulting property becomes P(R) = (a → p(R)), where p(R) is generated as above but only from patterns fulfilling the assumption.

Example 22. Assume that, in case of the shift-register, the operating mode for shifting is particularly interesting. Therefore, the assumption a = (x2[t] ↔ 0) is used. In this case only those cubes are collected where x1[t] ↔ s1[t + 1] holds. As a result, the property p is generated:

p = (x2[t] ↔ 0) → (x1[t] ↔ s1[t + 1])

Currently, simple assumptions are allowed, e.g. the restriction of a signal to a certain value or to the value of another signal. Also, a signal can be restricted to be considered only at a particular time step within the window of the property. More complex constructs can easily be allowed by extending the input language used for assumptions. Checking whether an assumption holds breaks down to fast pattern matching on the given trace.
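Continuing the illustrative sketch given after Example 20, assumption handling amounts to one extra check per window; the predicate assumptionHolds below is a hypothetical placeholder for the fast pattern matching mentioned above and is passed in as a callback:

#include <cstddef>
#include <functional>
#include <set>
#include <vector>

// Reuses the Trace and Monitored types of the earlier sketch.
void collectWithAssumption(const Trace& trace, const std::vector<Monitored>& rel,
                           std::size_t tmax,
                           const std::function<bool(const Trace&, std::size_t)>& assumptionHolds,
                           std::set<std::vector<bool>>& patterns) {
  for (std::size_t t = 0; t + tmax <= trace.size(); ++t) {
    if (!assumptionHolds(trace, t)) continue;   // window violates the assumption a
    std::vector<bool> pat;
    for (const Monitored& m : rel)
      pat.push_back(trace[t + m.offset][m.signal]);
    patterns.insert(pat);                       // cube of a pattern fulfilling a
  }
  // The generated property then reads P(R) = (a -> p(R)), with p(R) the sum of these cubes.
}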
5.1.2.3 Property Completion

In cases where a large number of signals or states is considered, the given simulation trace cannot cover the complete behavior of the design. In this case
the property that is extracted from the trace is not valid within the design. But formal techniques can be applied to complete this invalid property and retrieve a valid one. For this purpose, the engine that is used to prove properties on the design must have the capability to find all counterexamples if the property is invalid. This is true in case of BDDs or when an all-solutions SAT solver is used. Each counterexample is a pattern that was not contained in the trace. The counterexample is included in the property in order to make it valid. When the added counterexamples are shown to the designer, this provides feedback about behavior that was not covered by the simulation trace. The aim of understanding the design benefits from this feedback.
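Under the same illustrative assumptions as before, the completion step can be summarized by a small loop; the checker is passed in as a callback standing for the BDD- or all-solutions-SAT engine, so the sketch makes no claim about the actual tool interface:

#include <functional>
#include <optional>
#include <set>
#include <vector>

using Pattern = std::vector<bool>;

// The checker returns a counterexample pattern if the property (given as its set of
// allowed patterns) fails on the design, or an empty optional if the property holds.
using Checker = std::function<std::optional<Pattern>(const std::set<Pattern>&)>;

// Complete an invalid property: every counterexample is a pattern missing from the
// trace; adding its cube makes the property valid and, at the same time, documents
// behavior that the testbench did not cover.
std::set<Pattern> completeProperty(std::set<Pattern> allowed, const Checker& check) {
  while (std::optional<Pattern> cex = check(allowed)) {
    allowed.insert(*cex);
  }
  return allowed;
}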
5.1.3 Experimental Results
The method is suitable for a tight integration of simulation-based verification methods with formal proof techniques. Given a trace and a set of signals, a number of properties is generated. Each of these properties is valid on the trace. If the whole testbench is used as the trace, but the resulting property is not valid for the design, a portion of the design was not tested yet. In the following, this is first shown for the small shift-register from Section 2.2. Then, results for increasing lengths of the traces are shown in detail for one larger benchmark. Finally, sequential benchmarks from LGSynth93 are evaluated. All experiments were carried out on an Athlon XP 2200+ system with 512 MB of memory running Linux. Initially the properties were represented by cubes; these were then converted into BDDs. A simple BDD-based bounded model checker based on CUDD [Som01a] was used to check the validity of the resulting properties. The initial representation of properties by cubes is suitable because the number of patterns cannot exceed 2^|I|, where the number of signals |I| is small. Only the property with the fewest patterns was retained, as explained in Section 5.1.2 above.
5.1.3.1 Case Study: Shift-Register
Again, the shift-register shown in Figure 5.5 is considered. The signal tuple I = (x2, x1, s1, s1) was used. The expected property has to reflect the operating modes "shift x1 into s1" and "keep the value of s1", i.e. in terms of a propositional formula the property should be:
   p = ¬x2[t] · (x1[t] ↔ s1[t+1]) + x2[t] · (s1[t] ↔ s1[t+1])
Property generation was carried out for a trace of length 30 that was randomly generated. At first only 5 time steps, then 10, 20 and finally the whole trace were considered. For these cases the generated property became increasingly
better as the trace covered more and more of the functionality. For all trace lengths the correct time relation was found, where x1 and x2 were picked at time step 0 and the two instantiations of s1 at time 0 and 1.

tcyc = 5 – Only the shifting mode (x2 = 0) was covered for the case x1 = 0 and s1 = 0:
   p5 = ¬x2[t] · ¬x1[t] · ¬s1[t] · ¬s1[t+1]

tcyc = 10 – Both operation modes were covered but only for certain values of x1 and s1:
   p10 = ¬x2[t] · (x1[t] · s1[t+1] + ¬x1[t] · ¬s1[t+1] · ¬s1[t]) + x2[t] · ¬x1[t] · ¬s1[t] · ¬s1[t+1]

tcyc = 20 – The shifting mode was covered completely, the nonshifting mode only for certain values of x1 and s1:
   p20 = ¬x2[t] · (x1[t] ↔ s1[t+1]) + x2[t] · (x1[t] · s1[t] · s1[t+1] + ¬s1[t] · ¬s1[t+1])

tcyc = 30 – Both modes were covered completely, yielding the property as given above.

This experiment shows how the amount of coverage achieved by the testbench is reflected by the property.
5.1.3.2 Benchmarks
Figure 5.6 shows results for misex3. This circuit has 14 inputs and 14 outputs. Circuit misex3 is combinational, but property generation still has to figure out the correct time relation. The diagram gives the number of runs resulting in a valid property for traces of different lengths. For each trace length 50 runs were carried out. As expected, the number of runs resulting in a valid property increases with the trace length because a better functional coverage is achieved. The time for property generation is very moderate, as Figure 5.7 shows. The figure shows results for a decreasing order of time relations and for an increasing order of time relations. This influences the time needed for the overall algorithm as explained in Section 5.1.2. For the combinational circuit an increasing order of time relations is more efficient: because the relation between signals has a length of only two time steps, the corresponding time relations are found early when the increasing order is applied. Up to 40,000 cycles, often invalid properties were generated. In this case longer traces lead to better pruning of time relations (the ones that have more patterns than previous ones). From 40,000 cycles onward mostly a valid property was generated. From this point additional cycles in the trace do not lead to more pruning but to a linear increase of the time needed for scanning the trace.
Figure 5.6. Runs resulting in a valid property for misex3 (#runs over tcyc)

Figure 5.7. Time needed for property generation for misex3 (propGen time in seconds over tcyc, for increasing and decreasing order of time relations)
Table 5.1 shows results for sequential benchmarks. For each circuit five runs were carried out. The parameter tmax was statically set to 4. For each run a random trace of 1 million clock cycles was generated and 7 signals were randomly chosen for I. The whole process of producing the random trace,
Table 5.1. Sequential benchmarks, tcyc = 1,000,000 Circuit daio gcd mm4a mm9a mm9b mult16a mult16b phase d. s1196 s1238 s1423 s344 s349 s382 s400 s420.1 s444 s526 s526n s641 s713 s838.1 s838 s953 traffic
v u v u u 1 1 u u u u 1 v 1 1 1 1 1 1 u u 1 1 v v
Run 1 1.03 164.39 3.12 86.25 31.91 0.32 0.31 22.65 7.32 69.62 161.10 8.31 3.02 0.78 0.87 1.87 31.02 1.62 6.24 35.29 78.50 8.48 416.28 8.01 1.62
v u v 1 u 1 1 u u u u v v 1 1 1 1 1 1 u u 1 v v v
Run 2 0.66 303.57 1.64 1.30 131.40 0.08 2.32 2.44 85.81 2.44 41.95 2.39 2.26 1.94 1.09 34.88 7.04 1.60 135.65 25.85 1.51 83.71 1.32 8.22 1.85
v u 1 u u v 1 u u u u 1 1 1 1 v 1 1 1 u 1 1 1 v 1
Run 3 0.95 331.01 6.15 72.92 2.44 1.43 0.69 103.00 71.55 39.85 216.33 1.53 2.69 0.59 1.98 44.71 1.85 3.18 1.52 88.86 2.80 2.46 291.80 11.78 1.32
v 1 v 1 1 1 1 u u u u 1 1 1 1 v 1 1 1 u u 1 v v v
Run 4 4.10 22.45 3.24 26.35 48.90 0.13 0.39 1.83 40.11 55.42 153.03 2.82 1.32 0.49 10.81 24.96 47.36 1.07 3.55 53.93 0.88 290.97 1.25 8.64 1.65
v u v 1 1 1 1 u u u u v 1 1 1 1 1 1 1 u u 1 v v v
Run 5 0.96 167.84 2.83 7.31 21.56 0.09 1.09 133.24 11.91 81.04 29.7 4.26 1.55 1.28 0.47 7.23 76.04 0.93 53.39 7.46 118.19 1.15 1.21 4.43 4.18
generating the property and model checking was limited to 15 min. The time for property generation is shown for each run. The letter preceding the run time denotes the result returned by the BDD-based model checker, i.e. whether the property was valid (v), invalid (i), trivial (1) because all patterns occurred, or the proof engine exceeded the time limit or memory limit and the property was left undecided (u). Even for the large number of 1 million clock cycles to be scanned for properties at most 420 s (s838) are needed. Very often the run time is below 10 s. Especially, runs resulting in valid properties are very fast, e.g. runs 2, 4, and 5 for s838 are much shorter than runs 1 and 3 that result in trivial properties. This allows to use the tool on testbenches for large designs. So far mainly quantitative studies were carried out with respect to the run time of the algorithm and the number of properties that was produced. The quality of the resulting properties is further examined in Section 5.2.4.2 in the context of design understanding.
5.2 Design Understanding
The technique to automatically generate properties from traces and its capability to bridge the gap between the traditional verification flow and the enhanced verification flow based on formal methods have been introduced in the previous section. Now, we exploit these techniques as a basis for a new verification methodology. As already explained, the classical approach to verification is based on simulation, but creating large testbenches and manually coding monitors is very time consuming and error prone. Setting up properties for formal verification is a time-consuming manual process as well. Moreover, all these techniques can only be applied if a formal description of the circuit exists – either on the behavioral level or on the RTL. But with increasing design complexity it becomes more and more important to get an understanding of the design, i.e. to check whether the implemented formal model corresponds to the intention and ideas of the person who wrote the initial specification (usually in form of a textbook). This specification is commonly given in natural language and by this may contain inconsistencies, nonprecise descriptions or even contradicting requirements. In this section, a new approach is presented that is based on formal techniques. In contrast to other formal approaches, the goal is not to prove the correctness of given formulas or properties but to automatically generate properties. These properties are shown to the designer. So he/she gets feedback about the functional behavior of the system and he/she can “discuss” with the tool. This method focuses on design understanding. It can be applied as soon as a specification that produces cycle accurate traces is available. By this, the method targets design verification instead of implementation verification – in contrast to most other tools. This difference will be explained in more detail below. The section is structured as follows: In Section 5.2.1, we describe the underlying ideas and the methodology in more detail. First, the classical design methodology is briefly reviewed and the resulting verification approaches are discussed. Then, the new technique is presented and advantages and disadvantages are discussed. The work flow to apply the proposed methodology using PropGen is explained in Section 5.2.3. In Section 5.2.4, experimental results are reported.
5.2.1 Methodology
In this section, we first briefly review the classical design flow and resulting implications for verification techniques. Then, the new methodology is introduced. The integration into the design flow is described and resulting benefits are discussed.
5.2.1.1 Classical Verification
Still, verification is carried out at the end of the design process as shown in Figure 5.8, which covers the verification part of the standard design flow from Figure 1.1. To illustrate the deficiencies, the process is briefly reviewed:
1. The initial idea is written down in a textual specification. Even though the specification might contain some formal parts, it is usually given in natural language. This specification is then handed to the design team.
2. The textual specification is formalized and used to build a system model. This can be done on a behavioral level or on the RTL. Usually a common programming language like C or C++ is used to implement the system model.
3. The model is then coded in HDL. This HDL model is built according to the textual specification.
Figure 5.8. Current verification methodology (textual specification → system level description → synthesizable description → synthesis, with testbench simulation, detection of bugs or inconsistencies, and manual fault diagnosis)
4. The HDL model is checked
   – against the system model by means of testbenches (as indicated by bold arrows) or
   – against the specification by means of property checking (in a more advanced flow).
5. The HDL model is synthesized.
Following the usual notation, the verification of the specification is addressed as design verification, while implementation verification covers the steps from the first formal description down to the final layout (including various stages of equivalence checking, etc.). Design verification is only addressed in Step 4.
Remark 7. Since the main focus of this section is on the verification of the design entry, all verification issues related to design implementation, like, e.g. verifying the correctness of the synthesis process by equivalence checking, are not further considered.
Moreover, in a large design project frequent changes of the specification may occur. These changes are incorporated into the design by repeating the steps shown above. This process leads to a late detection of failures within the design process. When a failure is detected, even modifying the specification can be necessary, as shown in Figure 5.8. This causes long delays during the design process.
5.2.1.2 New Approach
Here, the incorporation of techniques for formal verification into earlier stages of the process is proposed. As soon as first blocks can be simulated at a cycle accurate level, properties should be automatically deduced. This helps to gain insight and offers a different view of the design. By this, conceptual errors as well as coding errors can be detected earlier. Also, the deduced properties can be used as a starting point for formal verification, which reduces the time needed to set up the verification environment. The idea is to provide a tool to the designer that allows him/her to gather more insight into his/her own design. For an overview see Figure 5.9. This figure shows the procedure for design verification in the enhanced design flow. Usually, the system model already contains the cycle accurate I/O-behavior of most blocks. Therefore, as soon as first portions of the design can be simulated at
Figure 5.9. Proposed methodology (simulation traces and interactively created properties are checked against the design; counterexamples lead to manual refinement of the synthesizable description)
the accuracy of clock cycles, formal properties are derived from the given description, i.e. from the system model or the HDL description. These properties exhibit some behavior of the design. The designer or a verification engineer has then to decide whether the property is correct or not. This means that the compliance of the property with the textual specification has to be checked. If the property is found to be valid, it can be used as a starting point for formal verification. If the property is incorrect, either the given simulation trace does not show all behaviors of the given block or the block is erroneous. As a result, a direct feedback between textual specification, system model, HDL description, and verification is established. This feedback between the different design stages helps to improve design quality. Instead of only assuming a property, the designer can explicitly search for a property and check the compliance with the specification. Moreover, pulling verification methods into the earlier stages of the design process enables an early detection of design errors. A mismatch between specification and HDL model is usually only detected during verification. But the proposed method unveils this mismatch already while coding a block or even earlier while interactively creating properties. The iteration becomes superfluous. Still, the verification step cannot be discarded. But now – similar to the set up of testbenches – properties are already developed during coding or even when only the system model is given. As soon as a trace is guaranteed to exhibit all important behavior, the corresponding stimuli become part of the testbench. The same holds for deduced properties. These are added to the
property suite for later formal verification. A starting point for formal verification is created during coding already and time is saved. The enhanced design flow proposed in this book makes use of this new interactive verification methodology.
5.2.2 Comparison to Other Techniques
The presented methodology leads to a different aspect of the design than other techniques do. At a similarly early stage of the circuit design phase usually only lint checking or assertion checking are applied. But in case of linting only general properties that guarantee, e.g. correct handling of arrays are checked. Using assertions, semantic checks can be carried out by means of powerful properties [FKL03]. But in this case the designer has to write the assertions himself/herself, i.e. an assumption about the design is formulated as a corresponding property. The proposed method allows to retrieve insight from a simulation trace. No further knowledge about the design is needed. In the software domain, a similar methodology has been presented in [NE01]. In that case, Java programs have been considered and execution traces have been searched for a set of predefined invariants. These invariants were statically checked afterward. Here, the property detection is more general and aims at the verification of hardware.
5.2.3 Work Flow
The work flow to apply property deduction is shown in Figure 5.10. This is a refinement of the interactive creation of properties in the proposed enhanced
Figure 5.10. Application of property deduction
design flow as shown in Figures 1.2 and 5.9. As a starting point, the design and simulation traces are available. Then a tuple of signals is selected that has to be related to each other. This tuple is handed to the automatic property deduction and a property is retrieved that is valid on the trace. The property is passed to a proof engine that returns validity or invalidity of the property. All information is provided to the designer who decides whether the property is to be accepted or rejected. In case of acceptance, the property is added to the property suite. If necessary, corrections to the design are made. For this purpose the debugging techniques proposed in Section 6.3 can be applied. If the property is rejected, another deduced property with lower ranking (see Section 5.1.2) can be considered or a different tuple of signals can be chosen. The techniques that were introduced in Section 5.1.2 aid the designer. The completion of a property exhibits behavior that is not covered by the simulation traces and may reveal corner cases. Using assumptions to guide the property generation, helps to concentrate on particular modes of operation. Remark 8. In some cases, it can be more instructive to review the property before the result of the proof engine is known. This allows to consider the property itself without being influenced by the verification result. Opposed to looking at the simulation trace, considering the property allows to focus on the functional relation between the signals more easily. Altogether this establishes an interactive process that offers different views at the design. Such a new perspective is an opportunity to increase the insight and to understand the design.
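The interactive loop of Figure 5.10 can be summarized as follows. This is only an illustrative sketch; prop_gen, prove and review are hypothetical stand-ins for PropGen, the property checker, and the designer's decisions.

# Sketch of the interactive work flow (all interfaces are hypothetical).
def interactive_session(design, traces, prop_gen, prove, review):
    property_suite = []
    while True:
        signals = review.select_signals(design)      # designer picks a tuple
        if signals is None:
            return property_suite                    # session finished
        for prop in prop_gen(traces, signals):       # ranked candidate properties
            verdict = prove(design, prop)            # valid / invalid plus counterexamples
            if review.accept(prop, verdict):
                property_suite.append(prop)
                break                                # continue with the next tuple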
5.2.4 Experimental Results
Two types of experiments are carried out in the following. At first the property generation is evaluated by using the LGSynth93 benchmark set. Then, a case study shows the benefits of property generation while implementing an arbiter in more detail.
5.2.4.1 Benchmarks
The focus of this study is to give some information about the relation between deduced properties and the design. For this purpose, the same sequential circuits from the LGSynth93 benchmark set were considered as in Section 5.1.3. Previously only the validity, run time and number of properties were considered. Here, more data is presented to give an idea of the quality of the properties. For each of the circuits the results of three runs are shown in Table 5.2. In all cases, traces of 100,000 clock cycles were considered. The tuple of seven signals was chosen randomly and tmax was set to 4. Again, a simple BDD-based proof engine was used to decide the validity of a property. This proof engine was tightly integrated with the property generation algorithm.
Property Generation Table 5.2. Sequential benchmarks, tcyc = 100, 000 Circuit daio gcd mm4a mm9a mm9b mult16a mult16b phase d. s1196 s1238 s1423 s344 s349 s382 s400 s420.1 s444 s526 s526n s641 s713 s838.1 s838 s953 traffic
Run 1 #Rel Time Res. #Pat #Diff 1 0.42 v 18 – 1 8.11 u 47 – 2 1.93 i 91 9 1 4.86 u 106 – 2 11.26 u 105 – 0 0.08 1 – 81 5.96 v 96 – 2 5.81 u 10 – 1 62.07 u 70 – 3 2.53 u 54 – 4 27.36 u 79 – 18 5.89 i 127 1 0 0.12 1 – 72 19.15 i 17 63 17 0.9 i 6 54 0 0.23 1 – 26 11.24 i 16 112 16 117.81 i 5 91 9 1.78 i 5 91 653 49.5 u 32 – 453 35.76 u 32 – 551 41.34 i 32 96 695 46.95 v 16 – 74 15.82 v 33 – 3 0.96 v 7 –
Run 2 #Rel Time Res. #Pat #Diff 1 0.21 v 24 – 1 14.08 u 64 – 1 1.32 v 56 – 3 14.49 u 117 – 18 1.95 u 56 – 0 0.27 1 – 0 2.25 1 – 109 8.05 u 6 – 5 50.42 u 57 – 2 1.96 u 66 – 1 10.93 u 106 – 186 10.64 v 48 – 1 4.22 i 108 20 40 16.55 i 7 49 18 1.45 i 18 48 0 0.26 1 – 10 1.43 i 4 40 2 35.32 i 22 106 71 6.88 i 7 121 0 1.78 1 – 5 7.96 u 94 – 96 7.58 i 16 112 116 7.59 v 4 – 5 2.34 v 14 – 17 2.73 v 40 –
Run 3 #Rel Time Res. #Pat #Diff 3 0.36 v 22 – 12 31.89 u 71 – 2 0.53 v 50 – 1 1.11 u 78 – 6 1.31 u 56 – 0 0.5 1 – 0 2.03 1 – 115 8.97 u 6 – 1 1.79 v 49 – 1 4.05 i 94 18 10 19.42 u 34 – 17 1.45 i 60 60 27 5.38 i 65 15 48 3.37 i 7 24 18 1.53 i 5 32 0 4.53 1 – 234 35.6 i 5 25 38 4.23 i 15 51 22 1.95 i 14 58 2 24.65 u 87 – 3 25.38 u 78 – 308 38.97 i 64 64 83 7.88 v 4 – 7 4.16 v 36 – 2 0.66 i 29 7
The following data is reported for each run:
– Column #rel gives the number of time relations with a minimum number of patterns, i.e. the number of properties with the highest ranking according to Section 5.1.2.
– The time in CPU seconds needed to deduce properties is shown in column time (AMD Athlon XP 2200+, 512 MB).
– The next three columns give information about the first property of those with highest ranking. Column res states the result returned by the property checker denoted by a letter (v = valid, i = invalid, u = undecided, 1 = all patterns occurred). In column #pat, the number of patterns included in this property is shown. Finally, column #diff gives the number of added patterns in order to turn an invalid property into a valid one. A trivial property was not further considered.
As can be seen from the table, the number of time relations with a high ranking property is rather small in most cases. This strongly underlines the feasibility of the interactive application of the tool by going through the highest ranked properties. The large number of properties that were left undecided is due to the proof engine that was based on BDDs only. Instructive is the number
of patterns that were added to valid properties. This shows that in some cases the simulation only covered a small fraction of the total behavior of the signals. When a large number of properties with high ranking occurs, the number of added patterns is an additional hint for selecting useful properties.
Note that the experiments in this section are a worst case scenario for property deduction:
– The tuple of signals is chosen randomly.
– The window length is fixed.
– The simulation trace was generated by random stimuli.
In the usual application scenario, a designer would choose a set of appropriate signals and a window length. Often stimuli provided by a testbench could be used for property deduction. A more realistic scenario is shown in the following case study.
5.2.4.2 Case Study: Arbiter
The benchmark results above show the efficiency of property generation. But these examples do not show the quality of the generated properties or the feasibility of the proposed verification methodology. As a case study, a simple arbiter was coded and checked by means of property generation. In contrast to the more sophisticated arbiter that was considered in Section 4.1.4, the present arbiter manages the access of only two clients to a bus. Conflicts are resolved by a priority scheduling. There exists a request input (req) and a done input (done) as well as an acknowledge output (ack) for each client. Figure 5.11(a) shows a block diagram of the arbiter with two clients. An example of a request from client 0 is shown in Figure 5.11(b). By setting req, a client signals the need to access the bus. Then, the arbiter sets ack if the bus is not in use and no request of higher priority occurs, and ack is kept. Finally, the client sets done to release the bus again.
Figure 5.11. The arbiter: (a) Block diagram, (b) Trace
The arbiter was coded using Verilog. VIS [VIS96] has been used to generate a blif-file from the Verilog description. Then, automatic property generation was used in the manner explained in Section 5.2.3. Due to the new methodology, two errors were detected. The Verilog code of the arbiter is shown in Figure 5.12. Originally, instead of Lines 16 and 17 only Line 14 was in place. In a first attempt to fix the bug, this was replaced by Line 15 and, finally, by Lines 16 and 17. The detection of errors and the reasons for the replacement are described in the following. For all calls of property deduction tmax was set to 2. The tuple I of signals is shown at the beginning of each paragraph.

I = (req[0], ack[0]): At first the relation between req and ack for the client with highest priority was of interest. The set of signals passed to property generation consisted only of req[0] and ack[0]. The first assumption was that there is a direct dependency between this pair of signals. But indeed any pattern may occur. A trivial property was the result. Therefore, the state was included in the set of signals.

I = (req[0], state, ack[0]): The first version of the arbiter contained Line 14 instead of Lines 16 and 17. This led to an error: ack could be influenced by the behavior of req while the bus was BUSY. This error was resolved by replacing Line 14 with Line 15. Now, additionally the influence of done on the other signals was of interest. Also, a value of ack at another time step was taken into account.

I = (req[0], done[0], state, ack[0], ack[0]): The resulting property showed that ack for the client was reset even if the client did not release the bus by setting done. A possible solution is the replacement of Line 15 by Lines 16 and 17. In the resulting property, the state was taken at time step 1 instead of 0 as originally wanted. Therefore, this signal was restricted to time step 0 and as a result the relation for the client with highest priority was returned.

The case study showed a scenario for the application of property deduction and how errors can be revealed using this method. In the first query, only a small tuple of signals was considered. This tuple was then successively enlarged to understand more relations. Checking deduced properties on the design and reviewing these gave feedback that led to the detection of design errors. During this process no direct interaction with the formal verification engine was necessary.
 1  module theArbiter (clock,ack,done,req);
 2    parameter IDLE=0, BUSY=1;
 3    input clock, reset;
 4    output [1:0] ack;
 5    input [1:0] req, done;
 6
 7    reg [1:0] ack;
 8    reg state;
 9
10    wire [1:0] resolve, acquire;
11
12    assign resolve[0]= req[0];
13    assign resolve[1]= !req[0] & req[1];
14    // assign acquire = (ack[0] & req[0]) | (ack[1] & req[1]);
15    // assign acquire = (ack[0] & done[0]) | (ack[1] & done[1]);
16    assign acquire[0]= ack[0] & !done[0];
17    assign acquire[1]= ack[1] & !done[1];
18
19    always @(posedge clock)
20      case (state)
21        IDLE: if (req!=0)
22          begin
23            ack= resolve;
24            state = BUSY;
25          end
26        BUSY: if (done!=0)
27          begin
28            ack= acquire;
29            state= IDLE;
30          end
31      endcase
32  endmodule

Figure 5.12. Code of the arbiter
5.3 Summary and Future Work
The verification flow for circuit design has been considered in this chapter and shifting to a new verification methodology has been suggested. A technique to automatically generate properties from traces is the basis. The technique relies on pattern matching and therefore is very efficient even on large traces as experiments demonstrated. A first application of this technique is the detection of gaps in testbenches. By this, both techniques – traditional simulation-based verification and the enhanced formal methods – can be applied for verification. Invalidity of a generated property gives hints to the gaps in the testbench. This knowledge can then be exploited to increase the quality of simulation-based functional verification. PropGen can be used as a push-button tool that automatically detects gaps for given signals. Based on this foundation, a new verification methodology was presented. Properties are created interactively which can speed up the pure manual creation of properties. Accepted properties serve as a starting point for formal verification. Even more important is the different view at the design provided by the generated properties. This can unveil the behavior which remains hidden otherwise. As a result, the understanding of the design is improved. In turn, the efficiency of the design process increases due to early detection of bugs or inconsistencies. The main advantages of this methodology are the new perspective on the design and an increased productivity when creating a property suite. Important directions for future improvements are more heuristics to select useful properties or the front-end for the presentation of deduced properties and for user interaction. Additionally, the properties could be extracted directly from a formal model of the design. In summary, the proposed techniques contribute an improved usability in the verification flow. New push-button tools were proposed to automate tasks that were carried out manually so far. As a result, the productivity of the verification flow is increased. This mainly concerns the creation of properties. The automation of another task – that of debugging failures – is considered in the next chapter.
Chapter 6 DIAGNOSIS
Similar to the previous chapter, this chapter also considers the verification path of the design flow. At this point the diagnosis problem is studied in more detail. The steps of the design flow covered in this chapter and the integration into the overall flow are shown in Figure 6.1. Efficient simulation and formal verification – in the form of property checking and equivalence checking – are applied to guarantee the functional correctness of a design. The postproduction test of a chip checks the correct behavior of the final product. Although effective, these techniques only detect the existence of errors or faults. Further effort is required to locate the source of the errors. Unfortunately, manually finding or diagnosing the error locations in a design is a time-consuming and, therefore, costly task. Automatic approaches for diagnosis have been proposed to speed up this debugging process. These approaches automatically calculate a set of candidate error sites. Then, the user can restrict himself to these candidates for debugging instead of going through the complete design. In literature, this area is frequently named Design Error Detection and Correction (DEDC). In the first section of the chapter, existing diagnosis techniques are reviewed and compared as published in [FSVD06]. In particular, simulation-based and SAT-based diagnosis are considered in detail. These two approaches are independent of the circuit structure and they are applicable for sequential diagnosis problems. Both approaches rely on a given set of counterexamples and the design as the starting point. The comparison shows that the simulation-based approach is very efficient in terms of run time and resources. The SAT-based approach is less efficient but returns results of higher quality. The problem of generating counterexamples that are explicitly chosen to retrieve a good diagnosis resolution is considered in Section 6.2. In particular, the decision problem whether a given set of counterexamples leads to the
Figure 6.1. Fault diagnosis in the design flow
smallest number of candidates is shown to be NP-complete even if all counterexamples are given in advance. Additionally, heuristics to select a “good” set of counterexamples from all counterexamples are presented and evaluated. These results were presented in part in [FD03]. Up to this point only the diagnosis of combinational gate-level designs is considered. In the last part of the chapter, diagnosis is extended to HDL level designs and sequential designs. As a result, diagnosis can be applied to debug failing properties. While the other approaches need a correct output response per counterexample, the property provides the correctness check for the diagnosis algorithm in this case. This diagnosis algorithm works on the gate-level structure of the design. A dedicated synthesis flow is used to identify source level components and to link the diagnosis results back to the HDL. In this work, the SAT-based procedure is applied to diagnose properties as proposed in [SFBD06]. The simulation-based approach for diagnosing properties as presented in [FD05] is applied as a preprocessing step to increase efficiency. Similar to fault models in ATPG an error model can be considered for diagnosis to restrict the type of errors. Throughout this chapter a very general error model is used. Here, a single error is the replacement of the function of a gate or a component by another function. Due to this general formulation, this model subsumes most simpler fault models.
6.1 Comparing SAT-based and Simulation-based Approaches
A number of different concepts have been used for diagnosis. Some of these techniques were originally applied in the context of the postproduction test, but they can be used for equivalence checking in the same manner. The structural approaches [BDKN94, LCC+ 95, VSA03] and the BDD-based approaches [CCC+ 92, PR95, HK00] have certain drawbacks. Structural approaches rely on similarities between the erroneous circuit – the implementation – and the specification. But such similarities may not be present, e.g. due to optimizations during synthesis. For large designs BDD-based approaches suffer from space complexity issues. In both cases, a complete specification of the design is usually a precondition for diagnosis. Here, diagnosis methods that use a set of counterexamples are considered. The focus is on approaches for simulation-based diagnosis as proposed in [KCSL94, HC99, VH99, LV05] and diagnosis based on Boolean Satisfiability (SAT) [AVS+ 04, ASV+ 05]. Both approaches have been applied to combinational and sequential diagnosis problems. Due to the underlying engines, both techniques are robust with respect to the size of the design. Simulationbased approaches can use efficient parallel simulation techniques with linear run times while SAT-based approaches benefit from recent advances in SAT solving (see Section 2.1.3 for details). An in-depth analysis of these diagnosis methods can show directions for further improvements. In this section, simulation-based and SAT-based diagnosis are compared from a theoretical and empirical point of view for the first time. Both approaches use a set of counterexamples for diagnosis that may be provided after test-bench simulations, formal verification, or after failing a postproduction test. The basic procedures of the two approaches are outlined. Then, the relationship between these procedures is explained by introducing a third approach of simulation-based diagnosis for multiple errors. Similarities and differences are analyzed using this third approach. The theoretical results are backed by experimental data based on the ISCAS89 benchmark suite. Overall, this analysis provides future research initiatives for improving each individual diagnosis technique as well as creating hybrid approaches that exploit the advantages of both. The technique presented in Section 6.3 is a first step into this direction because the simulation-based technique is applied as a preprocessing step for SAT-based diagnosis. This section is structured as follows: In Section 6.1.1, the basic approaches for simulation-based and SAT-based diagnosis are introduced. In Section 6.1.2 the relation between the approaches is considered from a theoretical point of view. Further issues regarding performance and quality are discussed in Section 6.1.3. The basic procedures are experimentally compared in Section 6.1.4.
6.1.1 Diagnosis Approaches
In this section, the diagnosis problem is introduced and the basic diagnosis procedures for simulation-based and SAT-based diagnosis are presented. References for the advanced approaches which make use of the basic procedures are given in the corresponding sections.
6.1.1.1 Diagnosis Problem Based on the notion of combinational circuits (see Section 2.2, Definition 3) and test-sets (see Section 2.3, Definition 5) the diagnosis problem is formulated as follows: Definition 8. Let the combinational circuit C = (V, E, X, Y, F, P ) be an implementation of a specification and let T be a test-set of r counterexamples. The diagnosis problem is to determine a set of candidate gates C = {g1 , . . . , gc } where a correction can be applied to rectify C such that C yields the correct output value for all counterexamples in T. The size of a correction C is denoted by |C|. Definition 9. A set of candidate gates C is called a valid correction for a test-set T if changing the functionality of the gates in C is sufficient to rectify the circuit C such that C yields the correct output value for all counterexamples in T. This definition of a valid correction does not require the replacement to be a deterministic combinational function in terms of primary inputs and present state. But this is not relevant in the combinational case because the same configuration of input values and present state values does not occur twice for different counterexamples. In contrast, in the sequential case the same input values and present state values may reoccur. Therefore, this definition is refined in Section 6.3 when the diagnosis of properties is considered. Definition 10. A valid correction C contains only essential candidates if and only if for any g ∈ C: C \ {g} is not a valid correction. The faulty circuit C contains p actual error sites e1 , . . . , ep . An error is considered to be the replacement of the function of a gate by another arbitrary Boolean function. Therefore, the size of the search space for possible corrections is in the order of O(|C|p ) [VH99]. In the following, the term effect analysis means “determining whether changing the functionality of one or more internal circuit lines corrects the value of the erroneous output”.
1. PathTrace(C, i, T, y)
2.    Simulate T to establish values of internal signals. Ci := ∅; mark y.
3.    For each marked gate g that was not visited:
         Ci := Ci ∪ {g}
         If there are inputs with controlling value, mark one of these inputs,
         else // no input has a controlling value
            mark all inputs.
4.    If there remain marked gates that have not been visited, goto (3).
5.    Return Ci.

6. BasicSimDiagnose(C, T)
7.    For i = 1 . . . r
8.       Ci := PathTrace(C, i, Ti, yi)

Figure 6.2. Basic simulation-based diagnosis
6.1.1.2 Simulation-based Diagnosis The basic procedure for simulation-based diagnosis approaches considered in this work is Path Tracing (PT) that is derived from critical PT [AMM83]. The overall flow for a naïve simulation-based diagnosis is shown in Figure 6.2. The procedure BasicSimDiagnose uses PT to calculate a set of candidates Ci for each triple Ti = (Ti , yi , νi ) in the test-set T. PT marks “candidate gates” on the sensitive paths leading to the erroneous output yi . All gates on the sensitized path are candidate error sites. More than one input of a gate may have a controlling value (e.g. both inputs of an AND-gate have value 0), but it is only necessary to choose one of these inputs to include at least one error site in the sensitized path [HC99, VH99]. So there can be several sensitized paths. One of these paths is deterministically chosen. The basic algorithm does not check whether the inversion of a candidate’s logic value for a particular counterexample really causes a value change at the erroneous output(s), i.e. no effect analysis is performed. In the following, we refer to this basic simulationbased approach as BSIM. Example 23. Consider Figure 6.3 and assume the gate marked by “X” is an OR-gate instead of an AND-gate, i.e. the actual error site. When the shown values are applied, an erroneous output is resulting. The fault free/faulty values of lines are annotated. Starting PT at the erroneous output y0 , the bold lines and gates are marked as sensitized. All gates except the inverter are returned as candidate error sites by PT.
Figure 6.3. Example of a sensitized path
The interpretation of the diagnosis result depends on the number of errors that are assumed to be contained in C. If there is only a single error present in the circuit, the actual error site is contained in the intersection of all candidate sets, i.e. in ri=1 Ci . If there are multiple errors, a more conservative approach has to be used: Each marked gate has to be considered as a candidate. The number M (g) = |{i : g ∈ Ci }| of counterexamples that marked a particular gate g can be used to prioritize the candidates. But there is no guarantee that any real error site has been marked by the largest number of PT marks. Because the candidate set of each counterexample contains at least one actual error site, at least one actual error site is marked by more than r/p counterexamples [KCSL94], i.e. ∃e ∈ {e1 , . . . , ep } : M (e) > r/p. Thus, for the correction of k errors, subsets up to size k of all marked gates have to be considered. This is done by the advanced simulation-based approaches relying on PT [HC99, VH99, LV05]. Multiple errors are handled by considering the corrections of size k and applying pruning techniques. For examples, in [LV05] the number of remaining errors is reduced by one each time in a greedy-like manner. After choosing a single correction, the candidate sets Ci are recalculated by calling BasicSimDiagnose. This effect analysis is necessary, because correcting one error may change the sensitized paths in the circuit. Then, the next single correction is chosen. But earlier decisions may have been wrong. Thus, the ability to perform a backtrack similar to the solvers for NP-complete problems is required. As a result, the time complexity for the advanced simulationbased techniques drastically increases compared to BSIM. A simulation-based approach that does not use PT has been introduced in [BMJ+ 99]. Instead of backtracing sensitive paths an approach based on forward implications by injecting X-values was chosen for diagnosis. Therefore, the core idea is similar to the approaches based on PT: The effect of changing a value at a certain position is considered.
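For illustration, the PathTrace procedure of Figure 6.2 can be sketched in a few lines. The fragment below assumes a simplified gate model (attributes inputs, value, controlling_value and a method is_primary_input()) and is not the implementation evaluated in this chapter.

# Sketch of path tracing on a simulated circuit (hypothetical gate model).
def path_trace(output):
    candidates, marked, visited = set(), [output], set()
    while marked:
        gate = marked.pop()
        if gate in visited or gate.is_primary_input():
            continue
        visited.add(gate)
        candidates.add(gate)                     # every traced gate is a candidate
        controlling = [g for g in gate.inputs
                       if g.value == gate.controlling_value]
        if controlling:
            marked.append(controlling[0])        # one controlling input suffices
        else:
            marked.extend(gate.inputs)           # no controlling value: mark all
    return candidates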
6.1.1.3 SAT-based Diagnosis For SAT-based diagnosis a SAT instance is generated that can only be satisfied if changing a limited number of gates in the erroneous circuit produces the correct output values for all counterexamples. This approach was first presented in [SVV04]. The SAT instance F is built as shown in Figure 6.4. Multiplexors
Figure 6.4. SAT-based diagnosis: (a) Multiplexor at g, (b) SAT instance
1. BasicSATDiagnose(C, T, k)
2.    For each triple (T, y, ν) ∈ T
         Create an instance I of C in F.
         Constrain y to adopt the correct value ν.
         Constrain inputs to the values of t.
         Insert multiplexors at gates that are considered for correction.
3.    For i = 1 . . . k
         Constrain the number of abnormal predicates set to 1 to at most i.
         Enumerate all solutions and add a blocking clause per solution.

Figure 6.5. Basic SAT-based diagnosis
are inserted at each gate g to allow for corrections (see Figure 6.4(a)). The output value of g is propagated when the select input abg has the value 0. A correction is applied when the select input abg is set to 1: the value of g is overwritten by a new unrestricted value cig . The variable name abg of the select input refers to the functionality of asserting the gate g “abnormal” to inject a correction. Notation 3. A variable abg is also called abnormal predicate. Such corrections are necessary to retrieve a solution for the SAT instance F shown in Figure 6.4(b). According to the pseudocode in Figure 6.5, a copy of C is created for each counterexample (T, y, ν) ∈ T. Each copy is constrained to the primary input values of trace t and to produce the correct output value ν for the erroneous output y. The select-line abg for multiplexors corresponding to gate g is the same in all copies of C. Therefore, the gate may be changed
for all counterexamples or for none. The injected value cig may be different for different counterexamples. Thus, gate g can be replaced by an arbitrary Boolean function. The number of gates that may be changed is bounded by constraining the number of abnormal predicates that may take the value 1 to be less than or equal to k. A SAT solver is used to solve the SAT instance F. Free variables in F are those corresponding to the abnormal predicates abg and to the new primary inputs for the correct values at gates cig . All other variable values are determined by the constraints for the circuit’s gates (see Section 2.2.3), the traces and the correct output values. Each solution of F is a solution to the diagnosis problem. The abnormal predicates that are set to 1 in a satisfying assignment for F determine the set of candidate gates C that have to be changed. In the following, we refer to C as a solution of F. In Figure 6.5, the limit is iteratively incremented in the for-loop in Line 3. This guarantees that all solutions generated by the approach only contain essential candidates, because solutions with a smaller number of candidates are blocked before increasing the limit. For this purpose an incremental SAT solver can be used [WKS01]. In the following, we refer to this basic SAT-based approach as BSAT. The advanced SAT-based diagnosis approach [SVV04] applies several heuristics that improve the performance of BSAT. To reduce the search space, additional clauses are added that force the free variables cig to 0 when abg is set to 0. This prevents up to |C| decisions of the SAT solver. The same effect is achieved by using the following construction: abg → (cig ↔ g) That is, if gate g is not changed (abg = 0) then cig yields the value calculated at g for counterexample i, otherwise the value of cig can be set to an arbitrary value. Also, instead of inserting a multiplexor at each gate, only dominators are selected in a first run to reduce the search space. In a second run, a finer level of granularity for diagnosis can be retrieved by introducing more multiplexors in the dominated regions that may contain an error. Additionally, for a large number of counterexamples the test-set is split into partitions to reduce the size of the SAT instance. Finally, an all-solutions SAT solver is used. Such a solver automatically minimizes the number of assignments in a solution. Thus, incrementally solving instances with larger limits as in the basic procedure is not necessary. All these techniques do not change the solution space but dramatically decrease the run time. In fact, speed-up factors of more than 100 times have been observed [Smi04]. The approach has also been applied to diagnose sequential errors efficiently [AVS+ 04] and to carry out diagnosis for hierarchical structures [ASV+ 05].
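The construction of the SAT instance can be illustrated by a small sketch. The following fragment is a simplified illustration of the multiplexor-based encoding described above, not the implementation used in the experiments; the restricted gate library (AND, OR, NOT), the variable bookkeeping, and the naive at-most-k encoding are assumptions made only for this example.

# Sketch of the CNF construction for BSAT (illustrative only).
from itertools import combinations

def diagnosis_cnf(gates, inputs, counterexamples, k):
    """gates: list of (name, op, fanins); counterexamples: list of
    (input_values, output_name, correct_value).  Returns CNF clauses."""
    clauses, var, next_var = [], {}, [0]

    def new_var(key):
        if key not in var:
            next_var[0] += 1
            var[key] = next_var[0]
        return var[key]

    ab = {name: new_var(('ab', name)) for name, _, _ in gates}  # shared ab_g

    for i, (in_val, out, correct) in enumerate(counterexamples):
        lit = lambda s: new_var((i, s))                  # one circuit copy per counterexample
        for x in inputs:                                 # constrain the trace values
            clauses.append([lit(x) if in_val[x] else -lit(x)])
        for name, op, fanins in gates:
            f, g, c = new_var((i, 'fn', name)), lit(name), new_var((i, 'c', name))
            ins = [lit(s) for s in fanins]
            if op == 'and':                              # f <-> AND(ins)
                clauses += [[-f, x] for x in ins] + [[f] + [-x for x in ins]]
            elif op == 'or':                             # f <-> OR(ins)
                clauses += [[f, -x] for x in ins] + [[-f] + ins]
            elif op == 'not':                            # f <-> NOT(ins[0]); other gate types omitted here
                clauses += [[-f, -ins[0]], [f, ins[0]]]
            # multiplexor: ab=0 -> g=f ; ab=1 -> g=c (free correction value)
            clauses += [[ab[name], -g, f], [ab[name], g, -f],
                        [-ab[name], -g, c], [-ab[name], g, -c]]
        clauses.append([lit(out) if correct else -lit(out)])  # fix the erroneous output

    # naive at-most-k over the abnormal predicates (fine for small examples)
    for combo in combinations(ab.values(), k + 1):
        clauses.append([-v for v in combo])
    return clauses, ab

Every satisfying assignment sets at most k abnormal predicates to 1; the gates belonging to these predicates form a candidate correction in the sense discussed above.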
6.1.2 Relation Between the Approaches
In this section, the two basic diagnosis approaches BSIM and BSAT are compared from a theoretical point of view. A third approach is introduced that formally describes the application of BSIM for the diagnosis of multiple errors. Using this approach, the differences between the two basic techniques are explained. The discussion in Section 6.1.3 includes the advanced approaches. The third approach is given in Figure 6.6. First, BasicSimDiagnose is called to calculate the candidate set Ci for counterexample (Ti , yi , νi ) ∈ T. These sets form an instance S of the set cover problem: A solution C∗ of S contains at least one element of each set Ci . Thus, for each counterexample in T at least one gate on a sensitized path is contained in C∗ . We refer to the approach implemented by SCDiagnose as COV. Example 24. Assume, that SCDiagnose is called for k = 2 and a test-set with three counterexamples. Further assume that BasicSimDiagnose returns the following candidate sets: C1 = {a, b, f, g} C2 = {c, d, e, f, g} C3 = {b, c, e, h} Then, {b, d} would be one possible solution returned by SCDiagnose. Another solution would be {a, d, h}. This simple approach does not use heuristics to bias preference to one solution over another. The minimum cover problem, i.e. to decide whether no solution with fewer elements exists, is NP-complete [GJ79] (see also Definition 14 on Page 117). The relation between the set cover problem and diagnosis of multiple errors has been studied earlier, e.g. in [VH99]. The BSAT approach solves a very similar problem. By choosing the values of the abnormal predicates, locations for corrections are determined. One difference is the simulation engine which is replaced by BCP of the SAT solver 1. 2. 3.
1. SCDiagnose(C, T, k)
2.    Call BasicSimDiagnose(C, T, k) to calculate Ci, 1 ≤ i ≤ r.
3.    Calculate all solutions of the set cover problem S: Find C∗ such that
         (a) for each i: at least one element of Ci is contained in C∗,
         (b) for any g ∈ C∗: C∗ \ {g} does not fulfill condition (a),
         (c) |C∗| ≤ k.

Figure 6.6. Diagnosis based on set cover
Diagnosis based on set cover
108
ROBUSTNESS AND USABILITY
(see Section 2.1.3). Additionally, BSAT carries out an effect analysis while solving the SAT instance: When switching an abnormal predicate at a multiplexor BCP propagates value changes dynamically. In contrast, COV does not carry out effect analysis at all. Based on these observations, the following lemmas can be derived. Lemma 3. Let C be a circuit, T be a test-set and k ∈ N. Each solution C of the SAT instance F is a valid correction for T. Proof. The construction of the SAT instance directly implies this lemma. Lemma 4. Let C be a circuit, T be a test-set and k ∈ N. There exist solutions for the set cover problem S in SCDiagnose(C, T, k) that are not a valid correction for T. Proof. Consider the circuit in Figure 6.7. The values described by the simulation trace are assigned to the inputs. This produces the output value 0 instead of 1. PT either marks the gates {a, b, d} or {a, c, d} because both inputs of d have a controlling value. A possible solution to cover this single set of candidates is {b} (or {c}, respectively). But the counterexample cannot be rectified by changing only the output value of b (or c). Lemmas 3 and 4 directly lead to Theorem 3. Theorem 3. Let C be a circuit, T be a test-set and k ∈ N. There exist solutions calculated by SCDiagnose(C, T, k) that are not calculated by BasicSATDiagnose(C, T, k). Next, the capability to calculate all valid corrections is analyzed. Lemma 5. Let C be a circuit, T be a test-set and k ∈ N. BasicSATDiagnose(C, T, k) returns all valid corrections containing only essential candidates up to size k. Proof. Again, the construction of the SAT instance directly implies this lemma. Incrementally calculating corrections of sizes 1 to k and “blocking” smaller solutions guarantees that only essential candidates are contained in each correction. 1 b 1 0 1 Figure 6.7.
a
0
0
d c
0/1
0
Example: COV may not provide a correction
109
Diagnosis 0 1 0 1
Figure 6.8.
a
0 d
b
c 1
e
0/1
1
0
Example: Solution for k = 2 by BSAT but not by COV
Lemma 6. Let C be a circuit, T be a test-set and k ∈ N. There are valid corrections with k or less candidate gates that are not calculated by SCDiagnose(C, T, k). Proof. Consider the circuit in Figure 6.8. Assume, that only the applied trace shows an erroneous output value and that k = 2. By changing the output values of a and b, the correct output value 1 can be produced. But the single candidate set {a, c, d, e} generated by PT does not contain b. Therefore, {a, b} is not a solution of S. Lemma 5 and 6 imply the following theorem. Theorem 4. Let C be a circuit, T be a test-set and k ∈ N. There exist solutions calculated by BasicSATDiagnose(C, T, k) that are not calculated by SCDiagnose(C, T, k). This analysis states that neither BSIM nor COV always provide valid corrections. Furthermore, these methods do not calculate all valid corrections, whereas BSAT does both, calculates all valid corrections and provides only valid corrections. This difference is important when discussing the advanced approaches in the next section.
6.1.3
Qualitative Comparison
While only the basic procedures were compared in the previous section, the following discussion also includes advanced simulation-based [HC99, VH99, LV05] and SAT-based [SVV04] approaches. Formal aspects like the complexity of the approaches and their ability to calculate valid corrections are considered. Further issues are discussed on an informal basis. Table 6.1 summarizes the topics and the respective results. The number of candidate error sites differs between the approaches. A large number of candidates is returned by BSIM, only the number of counterexamples that marked a particular gate may differ. In contrast, the other approaches only return k candidates. The number k is small and has either to be specified by the user or is determined by automatically calculating a minimal solution. During the search subsets of the gates in C up to size k are considered.
110
ROBUSTNESS AND USABILITY
Table 6.1. Comparison of the approaches BSIM
COV
Adv. sim.- BSAT based k, user defined (or incrementally determined)
Adv. SATbased
Number of candidate error sites
O(|C|)
Valid correction Effect analysis
Not guaranteed, guides the designer Guaranteed, correct values per counterexample are supplied None Simulation- Inherent based
Structural Available information
none for correction
Available
None
Exploited during CNF generation
Simulation engine
Efficient, circuit-based
BCP
Time complexity
O(|C| · r)
O(|C|k )
O(|C|k+1 · r) O(k2l|C| ), O( kr 2l|C| ), j l = 2r + k + 1 l = 2r+j+1
Space complexity
O(|C| + r)
O(|C| · r)
O(k · |C| · r) Θ(|C|(r + k))
Θ(|C|(j+k))
When debugging the design, it is important whether an approach guarantees to return a valid correction. This is not done by BSIM and COV. The solutions calculated by these basic approaches can only be used to guide the designer during error location. In contrast, the advanced simulation-based approaches, the BSAT approach and the advanced SAT-based technique only return valid corrections. Additionally, with respect to each counterexample a new value for each gate in the correction is provided. This can be exploited to determine the “correct” function of the gate. Effect analysis guarantees that only valid corrections are calculated. The advanced simulation-based approaches rely on resimulation while a SAT solver inherently carries out effect analysis. The simulation-based approaches may use structural information for these purposes since they are directly applied to the circuit. For example, successor/predecessor relations, knowledge of dominators, etc., can directly be exploited in the algorithms. For a SAT-based approach such information has to be encoded while generating the SAT instance. This is not done by BSAT. But the advanced SAT-based approach in [SVV04] uses, for example, information about structural dominators to prune the search space. A crucial issue when considering a large number of counterexamples is the simulation engine. Naturally, the simulation-based approaches can use fast engines that directly evaluate the circuit. Such an engine can also be used for
Diagnosis
space complexity is Θ(|C| · (r + k)). In the advanced SAT-based approach, only j counterexamples are considered in a single SAT instance, which yields Θ(|C| · (j + k)). In summary, the basic simulation-based approaches BSIM and COV are very fast but do not yield diagnosis results of the highest quality. The other approaches have higher run times but ensure valid corrections. The experiments presented in the following section support these theoretical observations.
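As a rough illustration of the variable counts derived above, the following sketch restates them as two small helper functions; the function names and the example numbers are purely illustrative and are not part of any of the discussed tools.

```python
# Back-of-the-envelope variable counts for the SAT-based diagnosis instances,
# restating the expressions derived above (|C| gates, r counterexamples,
# correction limit k, j counterexamples per instance in the advanced approach).

def bsat_variables(num_gates: int, r: int, k: int) -> int:
    # one select input per gate plus, per counterexample, a gate variable and
    # an additional input, plus at most k|C| variables for the cardinality
    # constraint: |C| * (2r + k + 1)
    return num_gates * (2 * r + k + 1)

def advanced_variables(num_gates: int, j: int, k: int) -> int:
    # only j counterexamples are encoded in a single instance
    return num_gates * (2 * j + k + 1)

if __name__ == "__main__":
    # e.g. a circuit with 1,000 gates, 32 counterexamples, k = 2, j = 4
    print(bsat_variables(1000, 32, 2))      # 67000
    print(advanced_variables(1000, 4, 2))   # 11000
```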
6.1.4 Experimental Results
In the experiments, the three basic approaches BSIM, COV, and BSAT were considered. A number of 1–4 gate change errors were injected into circuits from the ISCAS89 benchmark set. The limit k was always set to the number of errors injected previously. Then, diagnosis was done for 4, 8, 16, 32 counterexamples to show the finer resolution obtained from additional counterexamples. In all cases, a part of the same test-set has been used for an erroneous circuit. All experiments were carried out on an AMD Athlon 3500+ (1GB, 2.2 GHz, Linux). The resources were restricted to 512 MB and 30 min CPU time. The SAT solver Zchaff [MMZ+ 01] was used. Zchaff supports incremental SAT to reuse learned clauses. The set cover problem in COV was also solved using Zchaff. The three basic approaches are compared with respect to run time and quality. Table 6.2 shows the run times of the three approaches. Given are the name of the circuit C, the number of errors p, and the number of counterexamples r used. For COV and BSAT run times to create the SAT instance (for COV this includes the time for BSIM), to calculate one solution, and to calculate all solutions are reported in columns CNF, One, and All, respectively.

Table 6.2. Run time of the basic approaches (CPU seconds)
                  BSIM     COV                        BSAT
Circuit   p   r            CNF     One     All        CNF     One       All
s1423     4   4   0.00     0.01    0.01    1.36       0.02    0.21     34.21
s1423     4   8   0.01     0.01    0.01   19.98       0.02    0.21     12.93
s1423     4  16   0.02     0.01    0.02    4.12       0.04    0.29     13.14
s1423     4  32   0.03     0.03    0.03    0.68       0.06    0.60     22.72
s6669     3   4   0.01     0.01    0.03    0.09       0.05    3.24     56.49
s6669     3   8   0.02     0.02    0.04    0.12       0.05    5.06     47.87
s6669     3  16   0.03     0.04    0.05    0.70       0.08   10.48     12.06
s6669     3  32   0.10     0.10    0.12    0.65       0.13   10.80     14.30
s38417    2   4   0.18     0.18    0.18    0.20       0.40   37.40   1093.76
s38417    2   8   0.25     0.25    0.25    0.27       0.42   33.64    522.62
s38417    2  16   0.45     0.45    0.45    0.47       0.49  300.86    637.18
s38417    2  32   0.90     0.90    0.90    0.92       0.60  394.47    953.98
Remark 9. Note that only BSAT is guaranteed to return a valid correction since the other approaches do not carry out any effect analysis. Thus, BSAT solves a harder problem.

Remark 10. The run times of the basic approaches cannot be compared to those of the advanced approaches. For the SAT-based approaches, heuristics have been proposed that yield a speed-up of up to 100 times. The advanced simulation-based techniques apply a backtrack search and carry out effect analysis for each solution, resulting in a drastic increase in run times (see Sections 6.1.1.2 and 6.1.2).

As expected, BSIM is the fastest approach and takes less than 1 s of CPU time even for a large circuit such as s38417. COV also computes corrections quite fast, even when all corrections are retrieved. Due to the effect analysis, BSAT needs much longer run times, especially when all solutions are calculated, but this ensures that only valid corrections are returned.

Table 6.3 compares the quality of the approaches. For BSIM:
– The total number of gates that have been marked by PT is given (|∪Ci|).
– For each of these gates the distance to the nearest error was determined, i.e. the number of gates on a shortest path to any error. The average value of these distances is reported (avgA).
– The number of gates that have been marked by the maximal number of counterexamples is also given, i.e. Gmax = |{g : ∀h ∈ C : M(g) ≥ M(h)}|.
– Again, the distance to the nearest error was determined for each of these gates. The minimal, maximal, and average (avgG) values of these distances are reported. If the minimal value is greater than zero, no actual error site was marked by the maximal number of counterexamples.

For COV and BSAT:
– The number of solutions is given.
– For each gate in a solution the distance to the nearest error was determined. Per solution the average a of these distances was calculated. The minimal, maximal, and average value of a over all solutions is reported.

Table 6.3. Quality of the basic approaches
                       BSIM                                     COV                            BSAT
Circuit  p   r  |∪Ci|  avgA  Gmax  Min  Max  avgG    #Sol  Min   Max   Avg     #Sol  Min   Max    Avg
s1423    4   4   100   3.68     4    1    4  2.75    5931  0     5.33  2.90    4239  0     4.00   2.18
s1423    4   8   115   3.78     2    3    4  3.50   28281  0     5.50  3.42    1281  0     3.50   1.78
s1423    4  16   126   3.90     1    1    1  1.00    7960  0     4.50  2.85     809  0     3.25   1.66
s1423    4  32   139   3.85     3    1    4  2.67    1716  0.33  4     2.37     767  0     3.25   1.61
s6669    3   4    90   6.89    83    0   12  7.17     415  0     7     4.18    1935  0     5.67   3.66
s6669    3   8   106   6.87    86    0   12  6.95     565  0     7     3.94    1029  0     5.67   3.72
s6669    3  16   117   6.85    69    0   12  6.94    2275  0     7.33  4.55      12  0     1      0.64
s6669    3  32   117   6.85    64    0   12  7.39    1790  0     7.33  4.48      12  0     1      0.64
s38417   2   4    52   4.75    18    0   11  4.61     156  0    11     4.67    5959  0    22.00   9.64
s38417   2   8    67   5.69    18    0   11  4.61     113  0    11     4.61      31  0     5.50   3.45
s38417   2  16    67   5.69    15    0   11  4.73     150  0    11     4.53      29  0     5.50   3.33
s38417   2  32    95   7.56    14    0   11  4.93     133  0    11     4.40      33  0     4.50   2.88
These distance measures give an intuition up to which depth the designer has to analyze the circuit when starting from a solution returned by one of the approaches. A small value of this distance is desirable. The table shows that BSIM alone usually does not yield a good diagnosis result. The number of gates that have the highest count from PT (Gmax) can be quite large (see, e.g., s6669). While often an actual error site is among these gates, this cannot be guaranteed. Based on these results, the designer may have to analyze a large part of the circuit before finding an error. COV considers subsets up to size k of all marked gates. Thus, the solution space is large. Using more counterexamples may even increase the solution space, because more gates are marked by PT. Similarly, for BSAT the solution space is large. More counterexamples may also increase this space when additional outputs are introduced into the diagnosis problem. If no additional outputs are introduced, the number of solutions is reduced. Besides the fact that all solutions calculated by BSAT are valid corrections, their quality is also better in all cases, except for s38417 when only four counterexamples were considered. When more counterexamples were used, BSAT returned the best results. The plot in Figure 6.9 shows an overview of the results of BSAT vs. COV for all 10 benchmarks.
Figure 6.9. BSAT vs. COV: Average distance (avg per benchmark; COV on the x-axis, BSAT on the y-axis)

Figure 6.10. BSAT vs. COV: Number of solutions (logarithmic scale on both axes)
For each benchmark the value of avg is plotted for the two approaches. Marks below the bisecting line indicate that the result returned by BSAT was better than that of COV for a particular benchmark. In Figure 6.10 the number of solutions is compared for the two approaches. Again, a mark below the bisecting line indicates that BSAT returned a smaller number of solutions – note the logarithmic scale of the axes. The figures show that BSAT usually returns a smaller number of solutions of better quality. This directly implies time savings during design debugging. In summary, the approaches behave as expected from the theoretical analysis. BSAT is slower than the other approaches but returns the best results. Nonetheless, even the simple approaches often calculate solutions of good quality. This is exploited in Section 6.3 to combine both approaches. A method to determine multiple counterexamples that yield good diagnosis results is considered in the following section.
6.2 Generating Counterexamples for Diagnosis
Tools for formal verification guarantee equivalence of designs or validity of a property under any input sequence (see, e.g. [BCMD90, PK00]). But the opposite, i.e. proving inequivalence of two designs or invalidity of a property, is often only done by providing one counterexample. This is formally correct, but the designer has to locate the error based upon this single counterexample, often by hand. The two techniques presented in the previous section and a number of other techniques [KCSL94, VF97, HC99, Uba03] improve this by applying multiple counterexamples and calculating error candidates automatically. For all of these approaches the counterexamples are given in advance, e.g. by a formal verification tool or by a failing simulation trace. The generation of counterexamples dedicated to diagnosis has not been adequately addressed yet.
Only in [TYSH94] a condition to create counterexamples for error location is defined. The approaches in [TSH94] and [IINY03] make use of this technique. But the given condition reduces the number of applicable counterexamples. Based upon the PT procedure (see Section 6.1.1.2), counterexamples can be used to calculate candidate error sites, even if no complete specification is given. In this context, the problem of choosing from the set of all counterexamples a set that leads to the smallest possible number of candidates is formalized in the following. The decision whether a given set of counterexamples is optimal in this sense is proven to be NP-complete. Two heuristics are introduced to choose counterexamples. The first heuristic can be considered as a general guideline on how to choose counterexamples. This can be exploited when simulation-based verification is used. The second heuristic efficiently chooses a set of counterexamples from BDDs. BDDs are used to represent the set of all counterexamples. This is not necessary for the approach but makes it possible to evaluate the quality of the proposed heuristics because the whole search space can be traversed. In Section 6.2.1, the problem of deciding whether a set of counterexamples leads to a minimal number of marked gates is proven to be NP-complete. The two heuristics to choose counterexamples are explained in Section 6.2.2. Experimental results in Section 6.2.3 underline the quality of the heuristics.
6.2.1 Choosing Good Counterexamples
Given are an erroneous circuit C and the set of all counterexamples for a single gate change error. The PT procedure leads to a set of candidate error sites for each of the counterexamples. The intersection of all these sets gives the minimal set of candidates that can be determined. Usually, it is too expensive to use all counterexamples since the number of counterexamples can be exponential in the number of inputs of the circuit. Therefore, a subset of counterexamples has to be chosen that leads to a small number of candidate error sites. This motivates the definition of the following problem; BSIM as introduced in Section 6.1.1 is used as the underlying diagnosis procedure.

Definition 11. An instance ICCE of the problem Choosing Counterexamples (CCE) is defined by
1. C = (V, E, X, Y, F), an erroneous circuit that can be diagnosed by num counterexamples,
2. Γ := {C0, . . . , Cnum−1}, the sets of candidates calculated by BSIM, i.e. Ci, 0 ≤ i < num, is the set of gates marked by PT when counterexample i is considered,
3. a fixed but arbitrary integer num ≥ r > 1,
4. a positive integer k.
A solution to ICCE is a subset Γ* ⊂ Γ with |Γ*| ≤ r and

    | ⋂_{C*∈Γ*} C* | ≤ k.
The decision problem CCE is defined by the question: Does there exist a solution to ICCE?

Informally, a solution to an instance of CCE is a subset of all counterexamples that has size r or less and leads to an intersection of at most k gates. The decision problem CCE is NP-complete, as will be proven in the remainder of this section. The proof is carried out by establishing a hierarchy of problems (>pol means "reducible to in deterministic polynomial time"):

    MC >pol EI >pol MI >pol CCE.

First, the subsequent problems and the questions leading to the corresponding decision problems are defined, then the hierarchy is established.

Definition 12. An instance of the problem Minimal Intersection (MI) is given by a tuple IMI = (C, Γ = {C0, . . . , Cnum−1}, r, k):
1. C is a finite set of elements,
2. Ci ⊆ C, 0 ≤ i < num,
3. 1 < r < num is a fixed but arbitrary positive integer,
4. k is a positive integer.
A solution to MI is given by Γ* ⊂ Γ of size r or less, with

    | ⋂_{C*∈Γ*} C* | ≤ k.        (6.1)
The decision problem MI is defined by the question: Does there exist a solution to IMI?

Definition 13. An instance IEI of the problem Empty Intersection (EI) is an instance of MI with k = 0. The decision problem EI is defined by the question: Given r, does there exist a solution to IEI?
Definition 14. An instance of the problem Minimum Cover (MC) is defined by the tuple IMC = (C, Γ = {C0, . . . , Cnum−1}, r):
1. C is a finite set of elements,
2. Ci ⊆ C, 0 ≤ i < num,
3. 1 < r < num is a fixed but arbitrary positive integer.
A solution to MC is given by Γ* ⊂ Γ of size r or less, with

    ⋃_{C*∈Γ*} C* = C.
The decision problem MC is defined by the question: Does there exist a solution to IMC?

NP-completeness of the decision problem MC is known [GJ79]. Based on this, the other problems will be proven to be NP-complete.

Lemma 7. The decision problem EI is NP-complete.

Proof. EI is in NP: a non-deterministic Turing machine decides for each subset Ci whether it is chosen to be in Γ* or not. Then, the emptiness of the intersection is checked. An instance of MC can be reduced to EI in polynomial time: Given an instance IMC = (C, ΓMC = {C0, . . . , Cnum−1}, r) of MC, this is transformed into the instance

    IEI = (C, ΓEI = {C \ C0, . . . , C \ Cnum−1}, r)

of EI, i.e. the subsets in ΓEI are the complement sets of those in IMC. Let Γ*EI be a solution to IEI, i.e.

    ⋂_{C*∈Γ*EI} C* = ∅.

Now,

    Γ*MC = {C \ C* | C* ∈ Γ*EI}
is a solution to IMC:

    ⋃_{C*∈Γ*MC} C* = ⋃_{C*∈Γ*EI} (C \ C*) = C \ ⋂_{C*∈Γ*EI} C* = C \ ∅ = C.

An analogous construction yields a solution for IEI, given a solution Γ*MC of IMC. Thus, IEI has a solution if and only if IMC has a solution.

Lemma 8. The decision problem MI is NP-complete.

Proof. Non-deterministically finding the solution is done as in the case of EI. Then, Equation (6.1) is validated. EI is reduced to MI. Given an instance IEI = (C, ΓEI = {C0, . . . , Cnum−1}, r) of EI, an instance of MI is created:

    IMI = (C ∪ K, ΓMI = {C0 ∪ K, . . . , Cnum−1 ∪ K}, r, k).

K is an auxiliary set of k new elements. From a solution Γ*MI to IMI a solution to IEI can be retrieved:

    Γ*EI = {C* | (C* ∪ K) ∈ Γ*MI}.

And vice versa:

    Γ*MI = {(C* ∪ K) | C* ∈ Γ*EI}.
By using the NP-completeness of MI, the selection of counterexamples can be proven to be NP-complete as well, as the following theorem states.

Theorem 5. The decision problem CCE is NP-complete.

Before proving the theorem, the idea of the proof is illustrated by an example. An instance of MI is reduced to an instance of CCE. This is done by creating a circuit. An intersection of gates sensitized by counterexamples corresponds to an intersection of sets in MI.
Example 25. Let I1 = (C = {g1, . . . , g15}, Γ = {C0, . . . , C4}, 2, 2) be an instance of MI, where:

    C0 = {g1, g2, g4, g5, g6, g7, g8, g9, g10, g11}
    C1 = {g1, g2, g4, g5, g6, g7, g12, g13}
    C2 = {g1, g2, g4, g6, g10}
    C3 = {g1, g2, g3, g11, g12, g13, g14, g15}
    C4 = {g1, g2, g3, g12, g13}

For this instance the circuit in Figure 6.11 is constructed. The circuit differs from the specification due to the extra inverter at the top. A binary vector i = (x2, x1, x0) is an input assignment to this circuit. Let i also denote the decimal value of this vector. Then, an input assignment i with a value i, 0 ≤ i ≤ 4, corresponds to counterexample i. This counterexample is observed at output yi and sensitizes the gates in set Ci.

The circuit is composed of three modules: the subset circuit, the propagation array, and a decoder. The purpose of the subset circuit is to model all non-empty intersections of the sets Ci. The decoder guarantees that an erroneous output value is only generated if an input value from {0, . . . , num − 1} is applied. The AND-gates in the propagation array propagate the erroneous value at ci to output yi if output i of the decoder is one. Applying any input value greater than or equal to num sets all outputs to 0 and, therefore, is not a counterexample.

For example, consider counterexample number one corresponding to the input assignment (0, 0, 1) to primary inputs (x2, x1, x0). This counterexample is observed only at output y1, which evaluates to 0 instead of 1. The PT procedure marks all gates along the bold lines as candidate error sites. Besides the gates in C1 there are only AND-gates, an OR-gate, and the extra inverter on these lines. Therefore, by applying counterexample number one, C1 can be retrieved. In general, counterexample i leads to marking the gates in Ci plus (num + 3) gates. For example, counterexamples 1 and 3 lead to the minimal intersection of sensitized gates {g1, g3}, and {C2, C3} is also a solution to I1.

The proof of Theorem 5 is sketched in the following.

Sketch of proof. Answering the question by nondeterministically choosing a set Γ* and checking whether it is a solution to an instance ICCE of CCE is done as before.
Figure 6.11. Circuit corresponding to the instance I1 of MI
The polynomial time reduction of an instance IMI = (C, Γ = {C0, . . . , Cnum−1}, r, k) of MI to an instance ICCE of CCE with parameters r and k + 1 is shown. At first the circuit is created, then the one-to-one correspondence of solutions is shown.
1.  SubsetCircuit(Γ)
2.    For each v ∈ ⋃_{i=0}^{num−1} Ci:
3.      Ind(v) = {i : v ∈ Ci ∧ Ci ∈ Γ}
4.      If no list L(Ind(v)) exists, create an empty list L(Ind(v)).
5.      Push v into L(Ind(v)).
6.    For each previously generated list L(I), I ⊂ N:
7.      Create a line of buffers labeled by the elements in L(I).
8.      Connect the input of the first buffer to the output of the extra inverter.
9.      Connect the output of the last buffer to all OR-gates with output ci and i ∈ I.
10.   Return the circuit C.

Figure 6.12. Algorithm to build the subset circuit
The generalization of the circuit in Example 25 is straightforward. The circuit can be built in polynomial time. The decoder of a log(num)-bit binary input value to a unary output has size O(num). The propagation array has size O(num^2) and has a regular structure. Remaining is the subset circuit that has to represent all non-empty intersections of subsets of Γ. This circuit is built by the algorithm in Figure 6.12. During the first step, lists are created (Lines 2–5). Each list is named by a set of indices Ind(v) and contains all elements v ∈ C that occur in the intersection of all Ci indicated by Ind(v). In the second step, these lists are used to build the structures necessary to represent non-empty intersections of sets (Lines 6–9); a small sketch of the grouping step follows below.

In the resulting circuit, an input value i ≥ num is not a counterexample: due to the OR of all decoder outputs greater than num − 1, all primary outputs assume the correct value zero. If a value i < num is applied at the inputs, exactly those buffers in the subset circuit labeled by v, where v ∈ Ci, are sensitized, as follows: due to the structure of the propagation array, an erroneous value is only observable at output yi. This erroneous value is 0 instead of 1. The controlling value of the previous AND-gate is 0. Thus, only AND-gates on the path from output yi to line ci, but no other gates, are sensitized. All inputs of the OR-gate with output ci have a value of 0, i.e. are noncontrolling. Thus, all input lines of this OR-gate are sensitized. If an output of a buffer is sensitized, the input is sensitized as well. Thus, all buffers labeled with v (v ∈ Ci) are sensitized due to the construction of the subset circuit. Besides the buffers in the subset circuit, only the extra inverter, the OR-gate, and AND-gates in the propagation array are sensitized by any counterexample. The AND-gates and OR-gates sensitized by different counterexamples are disjoint.
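To make the grouping step of Figure 6.12 (Lines 2–5) concrete, the following sketch collects the elements of ⋃ Ci into lists keyed by their index sets Ind(v), using the sets from Example 25; plain Python containers stand in for the circuit data structures, so this only illustrates the construction and is not an implementation of it.

```python
# Sketch of the grouping step of SubsetCircuit (Lines 2-5 in Figure 6.12):
# every element v is put into a list named by the set Ind(v) of indices of
# the subsets Ci that contain it; each list later becomes one buffer line.

from collections import defaultdict

def group_by_index_sets(gamma):
    """gamma: list of sets C0, ..., Cnum-1. Returns {Ind(v): [v, ...]}."""
    lists = defaultdict(list)
    universe = set().union(*gamma)
    for v in sorted(universe):
        ind_v = frozenset(i for i, ci in enumerate(gamma) if v in ci)
        lists[ind_v].append(v)
    return dict(lists)

# The sets of Example 25 (gate indices g1..g15); each resulting list feeds
# the OR-gates ci with i in Ind(v).
gamma = [
    {1, 2, 4, 5, 6, 7, 8, 9, 10, 11},   # C0
    {1, 2, 4, 5, 6, 7, 12, 13},         # C1
    {1, 2, 4, 6, 10},                   # C2
    {1, 2, 3, 11, 12, 13, 14, 15},      # C3
    {1, 2, 3, 12, 13},                  # C4
]
for ind, elems in group_by_index_sets(gamma).items():
    print(sorted(ind), elems)
```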
This leads to a one-to-one correspondence of solutions for IMI and ICCE. An intersection of elements in a subset Γ* ⊆ Γ results in the set of gates sensitized by all the counterexamples {i : Ci ∈ Γ*}. Because |Γ*| = r > 1, no additional AND-gates and OR-gates occur in the intersection. Only the extra inverter is additionally sensitized.

Thus, it has been proven that choosing the optimal set of counterexamples is difficult. Even if all counterexamples are given, the choice of the best set cannot be done efficiently (provided that P ≠ NP). If the number of counterexamples is restricted, e.g. r ≤ 4, and all counterexamples are given, all possible subsets up to size r can be enumerated to find the subset leading to the smallest number of candidates; a brute-force sketch of this enumeration is given below. In practice, even the number of counterexamples num may be exponential in the number n of primary inputs of the circuit, so solving a practical instance of CCE is difficult. On the other hand, heuristics are often used to solve NP-complete problems. These heuristics frequently find good – however nonoptimal – solutions for a given problem. Moreover, a solution generated at random is usually far from optimal. This motivates the investigation of heuristics to solve instances of CCE.
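The exhaustive search mentioned above can be written down directly when r is small; the sketch assumes the candidate sets Ci computed by PT are available as Python sets and is only meant to illustrate the enumeration, not to be part of the diagnosis flow.

```python
# Brute-force search over all subsets of size 2..r of the candidate sets,
# returning the choice of counterexamples with the smallest intersection.
# Exponential in r, hence only usable for small r when all counterexamples
# (and their PT candidate sets) are given.

from itertools import combinations

def best_counterexample_subset(gamma, r):
    """gamma: list of candidate sets Ci, one per counterexample."""
    best_choice, best_size = None, None
    for size in range(2, r + 1):
        for choice in combinations(range(len(gamma)), size):
            inter = set.intersection(*(gamma[i] for i in choice))
            if best_size is None or len(inter) < best_size:
                best_choice, best_size = choice, len(inter)
    return best_choice, best_size

# Example 25 again: several pairs already reach an intersection of size 2
gamma = [
    {1, 2, 4, 5, 6, 7, 8, 9, 10, 11},
    {1, 2, 4, 5, 6, 7, 12, 13},
    {1, 2, 4, 6, 10},
    {1, 2, 3, 11, 12, 13, 14, 15},
    {1, 2, 3, 12, 13},
]
print(best_counterexample_subset(gamma, r=2))   # ((0, 4), 2), intersection {g1, g2}
```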
6.2.2 Heuristics to Choose Counterexamples
Choosing the best set of counterexamples, i.e. the set that leads to the smallest number of candidate error sites, is difficult. For this reason, two heuristics to choose counterexamples are proposed. The heuristics deal with different situations. The first heuristic is based on a distance metric. This makes it possible to determine what the next counterexample "should look like". Therefore, the metric also shows how to generate counterexamples to achieve a good diagnosis result. For the second heuristic, all counterexamples are given and an efficient choice among them has to be made. Both heuristics are based on the observations given in the next subsection.
6.2.2.1 Observations
The heuristics have to guide the selection of counterexamples such that the sensitized paths have a small intersection. The following observations show which conditions lead to a small number of gates on a sensitized path and to different sensitized paths for two counterexamples.

Observation 1. A nonspecified input is good because it is mostly not sensitized: a counterexample is observed at an output with a defined value; thus, nonspecified input values at gates are not marked by PT.

Observation 2. If the same input is assigned opposite polarities in two counterexamples, this often leads to a controlling value in one case and to a non-controlling value in the other case. Different paths are sensitized.
Observation 3. Counterexamples that are observed at different outputs lead to different sensitized paths that reconverge.

The heuristics are designed in such a way that these observations are taken into account.
6.2.2.2 Maximum Distance Heuristic
This heuristic uses a distance metric to choose counterexamples, and, in addition, the definition of the metric can be considered as a guideline for the generation of counterexamples. A distance defined between two counterexamples allows one to determine what the next counterexample "should look like". To evaluate the heuristic, a greedy algorithm chooses from all counterexamples such that the sum of pairwise distances is maximized. The following distance between counterexamples is derived from the observations in Section 6.2.2.1. Note that this is not a distance in the mathematical sense since d(T′, T′) > 0 for certain T′.

Definition 15. Let d(x, y), where x, y ∈ {0, 1, −}, be defined by

    d(x, y) := 3, if x = − and y = −
               2, if either x = − or y = −
               1, if x ≠ y and x ≠ − and y ≠ −
               0, if x = y and x ≠ − and y ≠ −

The distance d(T′, T″) between two counterexamples

    T′ = (T1, y, ν),   where T1 = (U, u0),  U = (x1, . . . , xn), u0 = (νx1[0], . . . , νxn[0]),
    T″ = (T2, y′, ν′), where T2 = (U, u0′), U = (x1, . . . , xn), u0′ = (ν′x1[0], . . . , ν′xn[0]),

is defined by

    d(T′, T″) := (y ≠ y′) · 3n + Σ_{i=1}^{n} d(νxi[0], ν′xi[0]).
The values assigned to d(νxi[0], ν′xi[0]) capture the first two observations. The term (y ≠ y′) · 3n increases the distance value of counterexamples that are observed at different outputs. The greedy algorithm in Figure 6.13 uses this distance. Given the set of all counterexamples T, the algorithm selects r counterexamples and inserts them into the new set T′.
1.  GreedySelect(r, T = {T1, . . . , Tnum})
2.    count := 2.
3.    Choose counterexamples Ti, Tj from T such that d(Ti, Tj) is maximal.
4.    Move Ti and Tj from T into T′.
5.    While count < r:
6.      Choose Ti from T that maximizes Σ_{Tj ∈ T′} d(Ti, Tj).
7.      Move Ti from T into T′.
8.      count++.
9.    Return T′.

Figure 6.13. Greedy algorithm to choose counterexamples
The number of calls to calculate the distance is bounded by O(num^2 + num · r^2). Each calculation of the distance takes O(n), where n is the number of inputs. Thus, the algorithm runs in time O((num^2 + num · r^2) · n).
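The following sketch restates the distance of Definition 15 and the greedy selection of Figure 6.13 in Python; counterexamples are modelled as (failing output, input assignment) pairs with '-' for unspecified inputs, which is a simplification of the notation used above.

```python
# Sketch of the maximum distance heuristic: the per-position distance d from
# Definition 15 and the greedy selection of Figure 6.13. A counterexample is
# modelled as (failing_output, assignment), the assignment being a string
# over {'0', '1', '-'} ('-' = unspecified input).

from itertools import combinations

def d_val(x, y):
    if x == '-' and y == '-':
        return 3
    if x == '-' or y == '-':
        return 2
    return 1 if x != y else 0

def distance(t1, t2):
    (y1, a1), (y2, a2) = t1, t2
    n = len(a1)
    return (3 * n if y1 != y2 else 0) + sum(d_val(x, y) for x, y in zip(a1, a2))

def greedy_select(r, cexs):
    # start with the pair of maximal distance, then greedily add the
    # counterexample that maximizes the summed distance to the selected set
    first = max(combinations(cexs, 2), key=lambda p: distance(*p))
    selected = list(first)
    remaining = [c for c in cexs if c not in first]
    while len(selected) < r and remaining:
        best = max(remaining, key=lambda c: sum(distance(c, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

cexs = [('y0', '01-1'), ('y1', '10--'), ('y0', '0111'), ('y1', '00-1')]
print(greedy_select(3, cexs))
```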
6.2.2.3 Efficient Heuristic on BDDs
Having all counterexamples given by BDDs is a different situation compared to the generation of counterexamples. It is desirable to have a more efficient algorithm to heuristically choose counterexamples. The algorithm explained in this section runs in time O(r · n), where n is the number of inputs. Again, the observations from Section 6.2.2.1 are used to create the heuristic. Observation 3 suggests choosing counterexamples for different outputs. Let o be the number of outputs where at least one counterexample can be observed, i.e. the BDDs that represent the counterexamples for these outputs resulting from Equation (2.2) are different from the constant 0. From each such BDD ⌊r/o⌋ counterexamples are chosen; the remaining r − o·⌊r/o⌋ counterexamples are chosen from different BDDs. Each time a value a ∈ {0, 1} is assigned to an input xj with respect to a counterexample, a corresponding counter count[xj][a] is incremented. Thus, the counter keeps track of the number of counterexamples that assign a certain value to a primary input. Given a node v, the successor on the path to one has to be chosen from the two children of the node. If the successor is chosen to be Then(v), value 1 is assigned to primary input Label(v), otherwise 0 is assigned. Choosing the successor is done using the following rules, ordered by decreasing priority (π denotes the variable order):
1. Do not choose the terminal 0.
2. Choose the child that was not yet visited.
3. If π(Index(Then(v))) > π(Index(Else(v))), choose Then(v).
4. If π(Index(Else(v))) > π(Index(Then(v))), choose Else(v).
5. If count[Label(v)][0] < count[Label(v)][1], choose Else(v); otherwise, choose Then(v).
Rule 1 ensures that the chosen assignment is part of a counterexample. Rule 2 ensures that different branches of the BDD are visited. Rules 3 and 4 maximize the number of don't cares in the counterexample (see Observation 1) by jumping over as many levels as possible. Rule 5 leads to counterexamples that have different values for a variable (see Observation 2).
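A possible realization of these rules is sketched below; the BddNode class, the variable order pi (a mapping from input labels to positions), and the visited set are stand-ins for the interface of whatever BDD package holds the counterexamples, so the code only illustrates the decision rules themselves.

```python
# Sketch of the child-selection rules 1-5: given a node, return the chosen
# child and the value assigned to the input Label(v) (1 for Then, 0 for Else).

class BddNode:
    def __init__(self, label=None, then=None, other=None, terminal=None):
        self.label, self.then, self.other, self.terminal = label, then, other, terminal

def level(node, pi):
    # terminals are treated as lying below all variables
    return len(pi) if node.terminal is not None else pi[node.label]

def choose_child(node, pi, count, visited):
    then, els = node.then, node.other
    # rule 1: never move into the 0-terminal
    if then.terminal == 0:
        return els, 0
    if els.terminal == 0:
        return then, 1
    # rule 2: prefer the child that was not visited yet
    if then in visited and els not in visited:
        return els, 0
    if els in visited and then not in visited:
        return then, 1
    # rules 3/4: jump over as many levels as possible (more don't cares)
    if level(then, pi) > level(els, pi):
        return then, 1
    if level(els, pi) > level(then, pi):
        return els, 0
    # rule 5: balance the values assigned to this input so far
    if count[node.label][0] < count[node.label][1]:
        return els, 0
    return then, 1
```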
6.2.3 Experimental Results
Experiments were carried out on an AMD Athlon (2.2 GHz, 512 MB) running under Linux. The benchmarks were taken from the LGSynth93 set. Into each of the circuits a single error that changes the function of exactly one gate was randomly injected. Only those erroneous circuits were considered in the experiments where the number of counterexamples was between 50 (to really have a search problem) and 200 (to be able to determine the optimal choice of counterexamples). For each of the circuits 100 erroneous instances were generated and then diagnosed by five techniques. At first, all counterexamples were taken into account; then the results for the optimal choice of r counterexamples were calculated by simply iterating over all possible choices. Then, r counterexamples were chosen randomly, by the maximum distance heuristic, and by the efficient heuristic for BDDs. Table 6.4 gives information about the circuits considered. The name, the number of inputs and outputs, and the number of gates (as given by the original description) are listed. Column #cand. gives the average number of gates marked as candidate error sites when all counterexamples were applied, i.e. #cand. = |⋂ Ci| due to the single fault assumption. The number of gates that were marked was in the range from only a few up to 100% depending on the circuit structure and the error location. Also, the number of counterexamples was spread within the full range allowed. Tables 6.5, 6.6, and 6.7 give experimental results for using two, three, and four counterexamples, respectively. All results are given relative to the overall optimum of using all counterexamples. In column av. the arithmetic mean, over the 100 erroneous instances, of the factor of gates marked by the heuristic over gates marked by all counterexamples is given, i.e.

    av. = (1/100) · Σ_{i=1}^{100} (gates marked by the heuristic in instance i) / (gates marked using all counterexamples in instance i).
Table 6.4. Circuit data
Circuit    In   Out   Gates   #Cand.
9sym        9     1     269     46.7
alu2       10     6     261      7.2
alu4       14     8    2416     14.7
apex2      39     3    3227     56.6
apex5     117    88    2734      7.9
cordic     23     2    2103    873.7
dalu       75    16    2699     33.1
des       256   245    2763     65.3
e64        65    65     717     17.7
ex4p      128    28    1593     17.0
ex5p        8    63    1477      8.9
seq        41    35    2991     23.1
t481       16     1    1091    147.1
x3        135    99    1638      9.8
Table 6.5. Results using two counterexamples
               Opt              Rand                      Dist                      BDD
Circuit    Av.    Dev.     Av.     Dev.    Time      Av.     Dev.    Time      Av.     Dev.    Time
9sym     1.413   0.624   15.657  27.240  0.01 s    2.736   4.334   0.01 s    2.574   2.518   0.01 s
alu2     1.818   1.098   16.680  24.731  0.01 s    7.152   9.979   0.01 s    6.149   8.165   0.01 s
alu4     1.110   0.487    3.454  10.028  0.02 s    1.289   1.090   0.03 s    1.381   1.443   0.02 s
apex2    1.094   0.205    2.392   4.516  0.03 s    1.660   1.342   0.06 s    1.801   3.755   0.03 s
apex5    1.039   0.129    4.040   9.990  0.02 s    1.636   1.170   0.08 s    2.829   7.671   0.02 s
cordic   1.005   0.015    1.649   6.331  0.02 s    1.011   0.054   0.04 s    1.006   0.020   0.02 s
dalu     1.193   0.458    3.961   8.199  0.02 s    2.652   3.157   0.04 s    2.409   2.632   0.02 s
des      1.021   0.055    1.141   0.182  0.02 s    1.086   0.416   0.15 s    1.033   0.123   0.02 s
e64      1.068   0.245   10.157  15.532  0.01 s    3.090   5.471   0.02 s    8.143  16.626   0.01 s
ex4p     1.072   0.176    2.554   6.203  0.01 s    1.279   0.559   0.05 s    1.303   0.679   0.01 s
ex5p     1.351   0.792   18.299  53.541  0.01 s    7.392  17.146   0.02 s    5.929  15.475   0.01 s
seq      1.039   0.105    2.053   3.790  0.03 s    1.440   1.352   0.04 s    1.280   0.635   0.03 s
t481     1.140   0.310    7.464  22.014  0.01 s    4.088  16.320   0.02 s    3.567  10.462   0.01 s
x3       1.210   0.425    4.132   6.181  0.01 s    2.382   2.386   0.06 s    2.078   1.821   0.01 s
The best average factor among the three heuristics is denoted in bold for each benchmark. Column dev. gives the standard deviation. Column time gives the average run time for choosing the counterexamples and diagnosing the error. As can be seen from Table 6.5, randomly choosing two counterexamples does not lead to a good reduction of candidate error sites and largely differs from one case to the next (see the large values of the standard deviation). The heuristics are more reliable, but there also exist cases where one heuristic performs rather poorly while the other one almost reaches the overall optimum (e.g. e64).
Table 6.6. Results using three counterexamples
               Opt              Rand                      Dist                      BDD
Circuit    Av.    Dev.     Av.     Dev.    Time      Av.     Dev.    Time      Av.     Dev.    Time
9sym     1.053   0.116    1.948   1.433  0.01 s    2.232   2.000   0.01 s    2.448   2.423   0.01 s
alu2     1.162   0.285    9.025  11.897  0.01 s    4.712   7.167   0.01 s    4.132   5.507   0.01 s
alu4     1.038   0.201    1.349   1.463  0.03 s    1.231   0.845   0.04 s    1.340   1.396   0.03 s
apex2    1.003   0.025    1.847   3.717  0.05 s    1.074   0.179   0.07 s    1.674   3.673   0.05 s
apex5    1.000   0.000    2.518   4.399  0.02 s    1.375   0.976   0.09 s    1.531   1.034   0.02 s
cordic   1.003   0.010    1.006   0.020  0.03 s    1.006   0.019   0.05 s    1.006   0.020   0.03 s
dalu     1.048   0.193    2.586   3.267  0.02 s    1.529   1.177   0.05 s    2.188   2.387   0.02 s
des      1.003   0.017    1.053   0.075  0.03 s    1.037   0.191   0.16 s    1.021   0.104   0.03 s
e64      1.000   0.000    6.278   8.920  0.01 s    2.674   4.899   0.02 s    8.001  16.451   0.01 s
ex4p     1.002   0.008    2.123   5.599  0.01 s    1.149   0.324   0.05 s    1.204   0.559   0.01 s
ex5p     1.144   0.404   18.156  49.285  0.02 s    5.031  13.726   0.02 s    4.967  13.150   0.02 s
seq      1.007   0.039    1.479   1.224  0.03 s    1.184   0.502   0.05 s    1.212   0.514   0.03 s
t481     1.036   0.111    3.495  10.478  0.02 s    1.197   0.526   0.02 s    3.502  10.442   0.02 s
x3       1.027   0.178    2.534   2.721  0.01 s    1.433   0.939   0.06 s    1.640   1.187   0.01 s
Table 6.7. Results using four counterexamples
               Opt              Rand                      Dist                      BDD
Circuit    Av.    Dev.     Av.     Dev.    Time      Av.     Dev.    Time      Av.     Dev.    Time
9sym     1.026   0.065   14.133  24.820  0.01 s    2.207   1.989   0.01 s    1.730   1.186   0.01 s
alu2     1.043   0.128    7.102   8.917  0.01 s    3.684   6.228   0.02 s    3.351   3.666   0.01 s
alu4     1.015   0.080    1.281   0.921  0.04 s    1.226   0.820   0.05 s    1.226   0.861   0.04 s
apex2    1.000   0.000    2.541   4.922  0.06 s    1.066   0.160   0.09 s    1.216   0.639   0.06 s
apex5    1.000   0.000    1.975   2.742  0.03 s    1.352   0.958   0.09 s    1.484   0.995   0.03 s
cordic   1.002   0.007    1.004   0.017  0.04 s    1.005   0.017   0.06 s    1.006   0.019   0.04 s
dalu     1.010   0.062    1.789   1.556  0.03 s    1.464   1.115   0.06 s    2.059   2.232   0.03 s
des      1.000   0.000    1.079   0.231  0.04 s    1.034   0.173   0.18 s    1.018   0.104   0.04 s
e64      1.000   0.000    4.688   6.806  0.01 s    2.674   4.899   0.03 s    7.071  13.859   0.01 s
ex4p     1.001   0.003    1.943   5.309  0.01 s    1.111   0.243   0.06 s    1.166   0.482   0.02 s
ex5p     1.049   0.192    7.392  18.086  0.03 s    4.550  13.074   0.03 s    4.526  12.021   0.02 s
seq      1.001   0.008    1.201   0.435  0.04 s    1.153   0.387   0.06 s    1.159   0.407   0.04 s
t481     1.000   0.001    1.159   0.300  0.02 s    1.167   0.407   0.03 s    3.452  10.427   0.02 s
x3       1.010   0.071    1.978   1.685  0.01 s    1.337   0.764   0.07 s    1.427   0.848   0.01 s
Both heuristics lead to roughly the same reduction for most benchmarks when three counterexamples are used (Table 6.6). In some cases, the maximum distance heuristic achieves better results. This is due to the global decision scheme of this heuristic as opposed to that of the BDD heuristic. The two directives in the BDD heuristic – "do not choose zero" and "jump over a large number of levels" – are local decisions. In all but a few cases, the heuristics were superior to the random choice.
Figure 6.14. Number of candidates (number of candidates over the number of counterexamples for alu4 and apex2, random choice vs. BDD heuristic)
Using four counterexamples further improves the result of PT (Table 6.7). For the investigated examples even the random choice often performed well. But there do exist cases where more counterexamples improve the diagnosis quality of the random choice only slightly (e.g. alu2 or ex5p). This effect does not always disappear when more counterexamples are used, as Figure 6.14 shows. Given are results for the benchmarks alu4 and apex2 with respect to the random heuristic and the BDD heuristic for a particular error. In this series of experiments, the maximum number of counterexamples was not restricted since no optimal choice was calculated. For the error injected on alu4, 985 counterexamples were available. Even the random choice performs well in this case because randomly choosing similar counterexamples from a large set is unlikely. For apex2 the number of counterexamples was 1417. Here, the disadvantage of the random choice can be seen: the number of candidates does not decrease monotonically. Even when 100 counterexamples are used, the random choice leads to twice as many candidates as the BDD heuristic. Compared to the total number of counterexamples, a small fraction is sufficient to achieve good diagnosis results in both cases. Figure 6.15 shows the total time needed for diagnosis, i.e. the selection and PT. The time increases linearly with the number of counterexamples and is still moderate for 100 counterexamples. In summary, applying more than one counterexample can significantly reduce the number of candidate errors, but using a large number is not necessary. Diagnosis is improved even if the counterexamples are chosen randomly from the set of all counterexamples. But using a heuristic to select counterexamples as different as possible is more reliable. An efficient algorithm was presented that can be applied when all counterexamples are given by BDDs.
Figure 6.15. Time for diagnosis (CPU seconds over the number of counterexamples for alu4 and apex2, random choice vs. BDD heuristic)

6.3 Debugging Properties
In the previous sections, diagnosis and debugging have only been considered in the context of equivalence checking. The underlying problem was combinational and results were presented at the gate level. Moreover, for a counterexample there was always a correct output response available. Now, property checking is considered. For this purpose diagnosis has to be extended to the sequential case, results have to be presented at the source code level and, usually, there is not a unique correct output response available. Currently, there is not much tool support for debugging the failure of formal properties. Different methods have been proposed to understand the essence of a failure by improving the understanding of a counterexample. For example, partitioning the counterexample into parts which force the failure and into parts which try to avoid it is proposed in [JRS04]. In [RS04] counterexamples are reduced by removing irrelevant parts. Other approaches try to provide a fault explanation by investigating related traces [BNR03, GV03, RR03, Gro04]. The differences between failure traces and successful traces give an indication of the parts of a software program that are likely involved in the failure. In [CIW+ 03], all counterexamples are considered and it is classified whether particular value assignments are necessary, irrelevant, or possible to show the failure. Tools for an enhanced presentation of counterexamples are also available. In [HTCT03], a simulator with reasoning capabilities is proposed to interactively analyze the cause of a value assignment or the outcome of a forced change of a signal value. All these approaches help to understand a failure. But so far methods that fully automate the localization of faults for temporal properties are missing.
In this section, an approach for automatic localization of fault candidates at the gate level or source code level for safety properties is proposed. The diagnosis uses a set of counterexamples that are obtained from either a formal verification tool or a run of a simulator with functional checkers. The proposed approach builds on model-based diagnosis [Rei87]. A failure is seen as a discrepancy between the required and the actual behavior of the system. Then diagnosis means to determine those components that, when assumed to be incorrect, explain the discrepancy. In [HD84], it is shown that for certain degenerate cases of sequential circuits model-based diagnosis marks all components as possible faults. Perhaps for this reason, there is little work on model-based diagnosis for sequential circuits, with the exception of [PW03], which does not take properties into account and applies a different fault model. The experimental results show that such degenerate cases rarely happen and that model-based diagnosis can be used successfully in the sequential case.

Previous work in both the sequential and the combinational case requires that a failure trace is given and that the correct output for the trace is provided by the user. Here, instead of requiring a fixed error trace, it is assumed that a specification is given in Linear Time Logic (LTL) [Pnu77]. Counterexamples to a specification can be extracted automatically and the user does not need to provide the correct output: the necessary constraints on the outputs are contained in the specification. The diagnosis problem is stated as a SAT problem similar to the SAT-based diagnosis approach explained in Section 6.1. The construction is closely related to that used in BMC [BCCZ99].

For diagnosis, a counterexample of length tcyc is given. As in BMC, the circuit is unrolled to length tcyc and a propositional formula is built to decide whether the LTL property holds. If the inputs in the unrolled circuit are fixed to the values given in the counterexample and the property is constrained to hold, a contradiction is produced. The problem of diagnosis is the problem of resolving this contradiction. To resolve the contradiction, the model of the circuit is extended. A set of predicates is introduced that assert that a component functions incorrectly. If an abnormal predicate is asserted, the functional constraints between inputs and outputs of the component are suspended. The diagnosis problem is to find which abnormal predicates need to be asserted in order to resolve the contradiction. The set of satisfying assignments can be further restricted by requiring that the output of a gate must depend functionally on the inputs and the state of the circuit. Thus, the existence of a combinational correction is required. This makes it possible to extract a suggestion of the proper behavior of the suspect component from the satisfying assignments.
To improve the performance of the algorithm, a dedicated decision heuristic for the SAT solver is suggested. In the setting considered here, a small set of decision variables suffices to imply the values of all other variables. Restricting the decision variables to this set leads to a considerable speed-up and allows us to handle large and complex designs. The search space can be further pruned by applying a simulation-based preprocessing step. By calculating sensitized paths, the set of candidate error sites is pruned first. Only those components identified as candidates during the preprocessing step have to be considered during SAT-based diagnosis. The section is structured as follows. In Section 6.3.1, other diagnosis approaches are discussed. Section 6.3.2 gives the foundation of the diagnosis approach and presents how fault localization is performed. The applicability of the approach on the source level is shown in Section 6.3.3. Then, Section 6.3.4 gives experimental evidence of the efficiency of the approach.
6.3.1 Other Diagnosis Approaches
There is a large amount of literature on diagnosis and repair. Most of it is restricted to combinational circuits. Also, much of it is limited to simple faults such as a forgotten inverter, or an AND-gate that should be an OR. Such faults are likely to occur, for example, when a synthesis tool makes a mistake when optimizing the circuit. The work in [VH99] and [CWH93] on diagnosis at the gate level, for example, has both limitations. Sequential circuits are treated in [WB95] on the gate level but the approach is limited to simple faults. The fault model of [HC99] is more general, and it addresses sequential circuits but assumes that the correct output values are given. Its technical approach is also quite different from the one introduced here. The error model in [AVS+ 04] is similar but there correct output responses are always available and no functional consistency constraints are provided. The approach introduced in [ASV+ 05] has the same limitations. But there hierarchical relations are exploited during debugging. This is similar to hierarchical information that is available from source code annotations as will be explained later. Therefore, the same technique could be used to further increase the efficiency of the approach presented here. Both [Gro04] and [ZH02] work on the source code level (for hardware and software, respectively). Both are based on the idea of comparing which parts of the code are exercised by correct traces and incorrect traces that are similar. Only a few approaches have been proposed that are dedicated to fault location or correction for property checking. In [JGB05, SJB05], a game-based approach is proposed which locates a fault and provides a new function as a correction for a faulty component. Because it computes a repair, this approach is far less efficient than the one suggested here. In [FD05], a simulation-based approach using BSIM (see Section 6.1.1.2) is presented that is similar to the
current one but less accurate. Here, the simulation-based technique is used as a preprocessing step to prune the number of components considered during diagnosis.
6.3.2 Diagnosis for Properties
In this section, the new approach is described. The basic algorithm is introduced and extensions for run time improvements and accuracy improvements are explained. The section concludes with a discussion.
6.3.2.1 Computing Fault Candidates
To simplify the explanation, it is assumed that the components of the circuit are gates, that is, a fault candidate is always a single gate. The proper definition of components is considered in Section 6.3.3. Furthermore, the specification is given as a (single) LTL formula. The overall approach is a combination of BMC as explained in Section 2.3.2 and SAT-based diagnosis as introduced in Section 6.1.1.3. The basic procedure has four steps:
1. Create counterexamples.
2. Build the unrolling of the circuit, taking into account that some components may be incorrect.
3. Build a propositional representation of the property.
4. Use a SAT solver to compute the fault candidates.
The counterexamples to the property can be obtained using model checking [CGP99] or using dynamic verification [ABG+ 00]. It is advantageous to have many counterexamples available as this increases the discriminative power of the diagnosis algorithm. Techniques for obtaining multiple counterexamples have been studied in Section 6.2 and in [GKL04]. For simplicity, the case of using one counterexample (of length tcyc) is considered first. Furthermore, finite counterexamples are assumed, that is, the liveness part of the specification is ignored.

The purpose of Step 2 and Step 3 is to construct a propositional formula ψDiag such that the fault candidates can easily be extracted from the satisfying assignments for ψDiag. As stated before, the procedure is closely related to BMC, and specifically the differences will be addressed. The unrolling of the circuit C and the creation of a SAT instance to check an LTL property have been explained in Section 2.3.2. In particular, the creation of the formula to describe the unrolled circuit as given in Equation (2.4) has been discussed in detail. A similar formula is used for diagnosis.
In order to perform diagnosis, a new propositional variable abg is introduced for each gate g. Analogously to the combinational case of SAT-based diagnosis (see Section 6.1.1.3), the description ψg[t] of gate g at time frame t is replaced by the formula ψ̂g[t] = (¬abg → ψg[t]). As explained before, if the abnormal predicate abg is asserted, gate g is selected for correction, i.e. no assumption on its behavior at any time frame is made. If abg is not asserted, the gate works as required. Now, given a single counterexample, the formula ξ forces the inputs of the unrolled circuit to the values prescribed by the counterexample. Then, the description of the unrolling is given by

    ψ̂_C^{tcyc} = ξ · ∏_{t=0}^{tcyc−1} ∏_{g∈V} ψ̂g[t].

The propositional formula ψ_Ψ^{tcyc} for the LTL formula Ψ is created as explained in Section 2.3.2. Note that combining the description of the counterexample, the circuit, and the specification in a single SAT instance and forcing all abnormal predicates to false yields a contradiction. Let

    ζ0 = ∏_{g∈V} ¬abg.

Then the following expression is contradictory:

    zΨ[0] · ψ_Ψ^{tcyc} · ψ̂_C^{tcyc} · ζ0.

A diagnosis is obtained by calculating which abnormal predicates can resolve the contradiction. For instance, for single fault candidates, let ζ1 state that at most one abnormal predicate is true; then the diagnosis problem can be formulated as follows:

    ζ1 = ∏_{g∈V} ∏_{h∈V, h≠g} (¬abg + ¬abh),
    ψDiag = zΨ[0] · ψ_Ψ^{tcyc} · ψ̂_C^{tcyc} · ζ1.
If a is a satisfying assignment for ψDiag and a asserts abg, then g is a fault candidate. As shown in Section 6.2, multiple counterexamples can be used to reduce the number of diagnosed components: only an explanation that resolves the conflict for all counterexamples is a fault candidate. The propositional formula corresponding to this problem consists of one unrolling of the circuit for each counterexample. All sets of variables are disjoint, with the exception of the abnormal predicates, which are shared.
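To illustrate the shape of ψDiag, the following sketch builds DIMACS-style clauses (lists of signed integers) for a guarded AND-gate and for a pairwise encoding of ζ1; the variable numbering and the tiny two-frame example are made up for illustration, and the clauses for the counterexample and for the property itself are omitted.

```python
# Sketch of the diagnosis encoding: per gate and time frame the clauses of
# psi_g[t] are guarded by the abnormal predicate ab_g (not ab_g -> psi_g[t]),
# and zeta_1 restricts the assignment to at most one asserted predicate.
# Clauses are plain DIMACS-style lists of signed integers; any SAT solver
# can consume them.

def guarded_and_gate(ab, out, in1, in2):
    # CNF of (not ab) -> (out <-> in1 & in2): every clause gets the literal ab,
    # so the constraint is suspended when ab is asserted
    base = [[-out, in1], [-out, in2], [out, -in1, -in2]]
    return [[ab] + clause for clause in base]

def at_most_one(ab_vars):
    # pairwise encoding of zeta_1: no two abnormal predicates are true at once
    return [[-a, -b] for i, a in enumerate(ab_vars) for b in ab_vars[i + 1:]]

# two unrolled copies of a single AND-gate sharing one abnormal predicate
ab_g, out0, a0, b0, out1, a1, b1 = 1, 2, 3, 4, 5, 6, 7
cnf = []
cnf += guarded_and_gate(ab_g, out0, a0, b0)   # time frame 0
cnf += guarded_and_gate(ab_g, out1, a1, b1)   # time frame 1
cnf += at_most_one([ab_g])                    # trivial here, shown for shape
# ... plus the clauses fixing the counterexample inputs and encoding the LTL property
print(cnf)
```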
Example 26. In the following, the process is illustrated using another simple arbiter with input req and output ack. The arbiter is supposed to acknowledge each request either instantaneously or in the next time frame, but it may not emit two consecutive acknowledgments. Let s1 and s2 be present state bits. State bit s1 stores whether there is a pending request, and s2 stores whether an acknowledge has occurred in the last step. The arbiter is defined by the following equations:

    ack = (s1 + req) · s2
    next(s1) = req · ¬ack
    next(s2) = ack

Furthermore, the initial values of s1 and s2 are 0. Note that the circuit as shown in Figure 6.16 contains a fault: ack should be g1 · ¬s2. In LTL, the specification reads G((¬req + ack + X ack) · (¬ack + ¬X ack)). The shortest counterexamples to the property have length two. For example, if requests occur in the first two time frames, ack is 0 in both frames, which violates the specification. Figure 6.17 shows the unrolled circuit combined with the unrolled LTL specification. The abnormal predicates can remove the relation between the input and the output of a gate. For instance, the clauses for gate g2 are equivalent to ¬abg2 → (g2 ↔ (g1 · s2)). Nothing is ascertained about g2 when abg2 is true.
Figure 6.16. Faulty arbiter circuit
136
ROBUSTNESS AND USABILITY req 1
abg 1
abg
abg 3
0
1
0
2
req 1
0
0 s1
g1
s1
1
g1
0
s1
1 g3
g3 0
g2
s2
1
1
g2
s2
1
0 s2 ack
ack
1 ack X ack 1
1
g7 1
1g 6 1 1 valid
1
g5
g4
1
Time frame 0
g7
g6
Ω Ψ
g5
Ω·Ψ 1 G(Ψ·Ω)
g4
Time frame 1
X G(Ψ·Ω)
Figure 6.17. Circuit with gate g2 as diagnosis (Ω = req + ack + X ack, Ψ = ack + X ack).
The gates below the horizontal dashed line correspond to the unrolled formula. The signal corresponding to the truth of the specification is labeled with "valid" in the figure. For every time frame, the outputs of the gates in the unrolled formula correspond to a subformula of the specification. In the figure, the labels of the dashed horizontal lines indicate which subformula is represented by a gate output. It can easily be seen that valid is zero when two requests occur and all abnormal signals are set to zero. (Ignore for now the numbers in boxes.) Note that signals corresponding to the valuation of ack and G(Ψ · Ω) in time frame 2 appear in the figure (bottom right). The fact that the specification is false can be derived regardless of the values of these signals, since the counterexample is finite. The question for the SAT solver is whether there is a consistent assignment to the signals that makes the specification true and sets only one of the abnormal predicates to true. One solution to this question is shown by the numbers in boxes in the figure. Gate g2 is assumed to be incorrect (as expected). For the circuit to be correct, it could return 1 in time frame 0 and 0 in time frame 1. The corresponding correction suggested by this satisfying assignment is that g2 should be 1 when g1 is 1 and s2 is 0, and 0 when both inputs to the gate are 1.
The contradiction cannot be explained by setting abg1 or abg3 to true which means that g2 is the only fault candidate.
6.3.2.2 Functionality Constraints
There is another satisfying assignment to the example just discussed: let g2 be 0 in the first step and 1 in the second. Note that there is no combinational correction to the circuit that implements this repair, as the inputs and states in both steps would be the same, but the output of g2 is required to be different. In fact, the approach may find diagnoses for which there is no combinational repair. It may even find diagnoses when the specification is not realizable. A similar observation is made in [Wot02] for multiple test cases. Now it is shown that by adding Ackermann constraints to the propositional formula ψDiag it can be guaranteed that for any diagnosis there is a fix that makes the circuit correct for at least the given set of counterexamples. The following example shows that the approach considered so far does not make any guarantees.

Example 27. Consider the unrealizable specification out ↔ X in, where out is an output and in is an input. If the circuit consists of one component c, connecting in and out, it is not hard to see that c is a diagnosis independent of the counterexample. Therefore, {c} is a valid correction as specified in Definition 9 in Section 6.1.

In the previous sections, only combinational circuits were considered with respect to different counterexamples. Thus, for each counterexample a combinational function can deterministically produce a different value. This is not true in the sequential case, where the same values of primary inputs and present state elements may occur for different time steps and different counterexamples. The definition of a valid correction can be refined as follows to alleviate this problem.

Definition 16. A gate g is repairable if there is a Boolean function f(x1, . . . , xn, s1, . . . , sn) in terms of the inputs and the state such that the circuit adheres to the specification when g is replaced by f(x1, . . . , xn, s1, . . . , sn).

That is, a gate is repairable if the circuit can be fixed by replacing the gate by some new cone of combinational logic.

Definition 17. Gate g is repairable with respect to T, where T is a set of counterexamples, if there is a Boolean function f(x1, . . . , xn, s1, . . . , sn) such that none of the counterexamples in T is a counterexample to the property when the function of g is replaced by f.
The generalization of this definition to a set of candidates or components is straightforward. But in this section a single error assumption is applied.

Remark 11. In the sequential case, each repairable gate is also a valid correction because changing the value of this gate is sufficient. But in Example 27 the component c is a valid correction while not being repairable.

Given a set of counterexamples T, the Ackermann constraint for a gate g says that for any (not necessarily distinct) pair of counterexamples T1, T2 and any pair of time steps i, j, if the state and the inputs of the circuit in time step i of counterexample T1 equal the state and the inputs in time step j of counterexample T2, then the output of g is the same in both steps. Ackermann constraints can easily be added to the propositional formula by adding a number of clauses that is quadratic in the cumulative length of the counterexamples and linear in the number of gates. This leads to the following result:

Theorem 6. In the presence of Ackermann constraints, given a set of counterexamples T, any gate that is a diagnosis is repairable for T.

The choice of what constitutes a repairable gate may seem somewhat arbitrary. Alternative definitions, however, are handled just as easily. For instance, one could require that a fix is a replacement by a single gate with the same inputs. The Ackermann constraints would change correspondingly. On the other extreme, one could allow any realizable function, in which case the Ackermann constraints would require that the output is equal if all the inputs in the past have been equal. In this case – assuming that all counterexamples are pairwise different – the notion of a valid correction as defined earlier is applicable. For the notion of Ackermann constraints used here, including all state elements and inputs may yield a very large problem instance in practice. Instead, only those signals are included that are considered by the property, together with their transitive fanin. This is visualized in Figure 6.18: signals that are considered by the property are indicated by "×" in the figure and the transitive fanin is
Figure 6.18. State elements considered for Ackermann constraints
shown as a grey area. As a result, the subset of state elements and primary inputs may be different at different time steps. Given two time steps i and j, the state elements and inputs are compared that are contained in both copies of the circuit. For example, consider time frame 1 in Figure 6.18. When comparing the state to time frame 0, all elements indicated by the bold bar labeled with 0 are considered. When comparing time frames 1 and 2, those elements marked by the bold bar with label 2 are considered. This makes the constraints more restrictive since only a subset of the state bits is considered in each time frame. Now consider a single component g (indicated by • in the figure). This component influences the property in time frames 1 and 2 and may be considered for correction. By construction, all state elements influencing g in these time frames are contained in the unrolled circuit as well. Therefore, all state elements relevant for repairing g are compared by Ackermann constraints. Thus, the proposed more restrictive heuristic approach is reasonable and more efficient than including all state elements in the constraints.
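The following sketch shows, under the same DIMACS-style encoding as before, how an Ackermann constraint for one gate and one pair of time steps can be generated; the auxiliary difference variables and the concrete variable numbers are illustrative assumptions.

```python
# Sketch of an Ackermann constraint for one gate g and one pair of time steps
# (or counterexample copies) i and j: if all compared state/input variables
# agree, the two copies of the gate output must agree as well. diff_m are
# fresh auxiliary variables with diff_m <-> (x_m_i XOR x_m_j).

def xor_link(diff, a, b):
    # CNF of diff <-> (a XOR b)
    return [[-diff, a, b], [-diff, -a, -b], [diff, -a, b], [diff, a, -b]]

def ackermann_pair(shared_i, shared_j, g_i, g_j, fresh):
    """shared_i/shared_j: compared inputs and state bits of the two copies,
    g_i/g_j: the two copies of the gate output, fresh: next free variable."""
    clauses, diffs = [], []
    for a, b in zip(shared_i, shared_j):
        diff = fresh
        fresh += 1
        diffs.append(diff)
        clauses += xor_link(diff, a, b)
    # equal inputs and state (all diffs false) force equal gate outputs
    clauses.append(diffs + [-g_i, g_j])
    clauses.append(diffs + [g_i, -g_j])
    return clauses, fresh

clauses, _ = ackermann_pair([10, 11, 12], [20, 21, 22], g_i=13, g_j=23, fresh=100)
print(len(clauses))   # 3 * 4 XOR clauses + 2 implication clauses = 14
```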
6.3.2.3 SAT Techniques

In practice, all fault candidates are of interest, not just one. This can be achieved efficiently by adding blocking clauses [McM02] to the SAT instance, each stating that the abnormal predicates found so far must be false. Note that the full satisfying assignment is not added as a blocking clause; only the abnormal predicates of a diagnosis are blocked, which also excludes all other assignments that differ only in the remaining variables.

The efficiency of the SAT solver can be drastically improved using a dedicated decision strategy similar to [Str04]. By default, the solver performs a backtrack search on all variables in the SAT instance. Here, all variable values can be implied once the abnormal predicates and the output values of the gates asserted as abnormal are given. Therefore, a static decision strategy is applied that decides abnormal predicates first and then proceeds on those gates that are asserted abnormal, starting at time frame 0 up to time frame tcyc − 1.

Figure 6.19 shows the pseudo code for this decision strategy. The vector A contains all abnormal predicates. This vector is searched until a predicate ab with an undecided value is found. If no value was assigned, the predicate is set to 1 (Lines 4–6). Due to the construction of the SAT instance, this assignment implies the value 0 for all other abnormal predicates. If the first assigned predicate has value 1, the output variable of the gate influenced by ab is considered (Lines 7–11). The hash H maps abnormal predicates to output variables of gates. H(ab) returns a vector of k propositional variables. Variable H(ab)[t] represents the output of the gate that is asserted abnormal by ab at time frame t. Thus, the first gate with unknown output value that is asserted abnormal is set to the value 0. Gates in earlier time frames are considered first. If no unassigned variable is found, a satisfying assignment was
     1  function staticDecision
     2    for i := 1 to A.size
     3      let ab be the variable A[i];
     4      if ab == UNDECIDED then
     5        ab := 1;
     6        return DECISION_DONE;
     7      else if ab == 1 then
     8        for t := 0 to tcyc − 1
     9          if H(ab)[t] == UNDECIDED
    10            H(ab)[t] := 0;
    11            return DECISION_DONE;
    12    return SATISFIED;

Figure 6.19. Pseudocode of the static decision strategy
found (Line 12). Note that only one value of each variable has to be assigned in the decision strategy because the other value is implied by failure-driven assertions (see Section 2.1.3). Note also that H(ab)[t] is a list in the general case because multiple counterexamples and components instead of gates are considered, i.e. each abnormal predicate may correspond to multiple gates, as explained in Section 6.3.3. In the implementation, this list is searched for the first gate that is undecided. The experiments show a significant speed-up when this strategy is applied. Constraint replication is not applied yet, but it can obviously be used in this setting, especially when multiple counterexamples are present.
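The enumeration of all diagnoses by blocking clauses can be sketched as follows. The incremental solver interface (solve, get_model, add_clause) is an assumed, generic one used for illustration only and does not reflect the actual Zchaff integration.

    def enumerate_diagnoses(solver, abnormal_vars):
        # abnormal_vars: CNF variables of the abnormal predicates, one per component
        diagnoses = []
        while solver.solve():
            model = solver.get_model()                       # current satisfying assignment
            active = [v for v in abnormal_vars if model[v]]  # predicates assigned 1
            diagnoses.append(active)
            # block only the found abnormal predicates, not the full assignment,
            # so that all other valuations with the same diagnosis are excluded too
            solver.add_clause([-v for v in active])
        return diagnoses

Under the single fault assumption each blocking clause is a unit clause, i.e. it permanently forces the abnormal predicate of the component just reported to 0.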
6.3.2.4 Simulation-based Preprocessing

When all gates or components of a circuit are considered as potential diagnoses, the search space is very large. A first obvious method to reduce this search space is a cone-of-influence analysis. As a result, only those components that drive signals considered in the property are contained in the SAT instance.

Furthermore, a simulation-based preprocessing step can be applied to reduce the number of components that have to be considered during diagnosis even further. As observed in Section 6.1, simulation-based diagnosis has a linear time complexity with respect to the size of the circuit. Furthermore, a single error assumption is applied for diagnosing properties. Therefore, when multiple counterexamples are used, only components marked by each counterexample are considered as candidates. Abnormal predicates are only assigned to these candidates during the SAT-based diagnosis step. This procedure does not change the solution space for diagnosis, because changing a component that is not on a sensitized path cannot change the output value of the property.
The experimental results show that the overhead of this linear-time preprocessing step is low. This step can prune the search space and thereby reduces the overall run time.
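A minimal sketch of this candidate pruning, assuming that simulation and path tracing have already produced the set of components marked by each counterexample (all names are illustrative):

    def prune_candidates(all_components, marked_sets):
        # marked_sets: one set of component ids per counterexample,
        #              as returned by simulation-based path tracing
        candidates = set(all_components)
        for marked in marked_sets:
            # single fault assumption: the error must be marked by every counterexample
            candidates &= marked
        return candidates   # only these components receive abnormal predicates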
6.3.2.5 Discussion

Just like multiple counterexamples, stronger specifications reduce the number of diagnoses. When more properties are considered, the constraints on the behavior are tightened, leading to fewer diagnoses. In practical applications, a hint on how to repair the faulty behavior at a particular component is useful. The satisfying assignments not only provide diagnoses but also the values that the faulty components should produce.

The extension to liveness properties does not seem to be simple. In model checking, the counterexample to a liveness property is "lasso-shaped": after some initial steps, it enters an execution that repeats infinitely often. It is very easy to remove such a counterexample by changing any gate such that the loop is broken without violating the safety part of the property. The recent observation that liveness properties can be encoded as safety properties [BAS02] does not seem to affect this observation, as it merely encodes the loop in a different way. Note, however, that on an implementation level bounds on the response time are often available, and liveness can thus be eliminated from the specification, at least for the purpose of debugging.
6.3.3 Source Level Diagnosis
The previous section describes the diagnosis approach by means of sequential circuits on the gate level. In this section, the applicability of the approach on the source level is shown. An expression on the source level may correspond to multiple gates. Therefore, a single fault on the source level may correspond to multiple faults on the gate level. To avoid multiple fault diagnosis, this information has to be included in the SAT formula that is solved for diagnosis. This is achieved by grouping several gates into one component.

The hierarchy induced by the syntactical structure of the source code is included in the gate-level representation of the design and the property. This allows the gate level to be linked to the source code. The link between source code and gate-level model is established during synthesis. This procedure is shown in Figure 6.20. First, an Abstract Syntax Tree (AST) is created from the source code. Then, the AST is traversed and directly mapped to gate-level constructs. During this mapping, the gates that correspond to certain portions of the source code can be identified. Thus, the AST induces regions at the gate level. These regions are grouped hierarchically. Components are identified based on this representation. Each region corresponds to a component. For example, the expression (a==1) && (b==0)
Figure 6.20. Source code link (source code, intermediate representation, and hierarchical netlist)
corresponds to three components: (a==1), (b==0), and the complete expression. For each region a single abnormal predicate is introduced. All gates that do not belong to a lower region in the hierarchy are associated with this abnormal predicate. In the example, the predicates ab1, ab2, and ab3 are introduced.

Although this approach requires a modified synthesis tool, the diagnosis engine can take advantage of the hierarchical information, as suggested in [ASV+ 05]. For instance, a correction of a single expression may not be possible, but changing an entire module may rectify all counterexamples. When the hierarchy information is encoded in the diagnosis problem, a single fault assumption still returns a valid diagnosis. The granularity of the diagnosis result can also be influenced. For example, choosing only source-level modules as components yields a coarse diagnosis; in contrast, considering all subexpressions and statements as components produces a fine-grained diagnosis result.

Finally, hierarchical information can be used to improve the performance of the diagnosis engine. First, a coarse granularity can be used to efficiently identify possibly erroneous parts of the design. Then, diagnosis can be carried out at a finer granularity with higher computational cost to calculate more accurate diagnoses for the previously identified components [ASV+ 05].
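As a small illustration of this grouping, assume the modified synthesis front end records for every gate the innermost AST region it was generated from; the data layout and names below are assumptions, not the actual interface.

    from collections import defaultdict

    def build_components(gate_to_region):
        # gate_to_region: dict mapping a gate id to the innermost AST region
        #                 (subexpression, statement, or module) it belongs to
        components = defaultdict(list)
        for gate, region in gate_to_region.items():
            components[region].append(gate)
        # each region becomes one component; all of its gates are controlled
        # by a single shared abnormal predicate during diagnosis
        return dict(components)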
6.3.4 Experimental Results
For the experimental data, the benchmarks provided with VIS [VIS96] were used. A bug was manually introduced into each of the designs by changing an operator or a constant. In the following, the specificity of the diagnosis is analyzed and the benefit of the modified decision heuristic is shown. All experiments were carried out on an AMD Athlon 3500+ (2.2 GHz, 1 GB, Linux).
A modified version of the synthesis tool vl2mv from VIS is used to produce the annotated gate-level representation. The design and the property are described in Verilog. As a result, either one can be considered during diagnosis. This environment can use multiple counterexamples for diagnosis. The incremental property checker is based upon a version of Zchaff [MMZ+ 01] that supports incremental SAT [WKS01]. During diagnosis, one SAT instance is created that includes a copy of the design for each counterexample. Constraint replication between the copies is not used yet. The incremental interface of Zchaff is used to calculate all diagnoses. Zchaff was modified to use the static decision heuristic discussed in Section 6.3.2.3.
6.3.4.1 Example

The branch prediction buffer bpbs provided as an example for VIS is considered in the following. For each branching point, four state machines as shown in Figure 6.21 are used for prediction. Each state machine provides one prediction bit. When a branching point is reached, one state machine is updated. The address translation and the selection of a single state machine are done externally.

A valid property for this buffer is pStrongPrediction, which says: if the operation is not stalled and all state machines agree on branch taken/not taken, the prediction bits also agree on the next clock cycle. The code of the buffer that relates to this property is shown in Figure 6.22. Line 11 contains a fault: the polarity of the signal stall is wrong. The comment in Line 10 shows the correct code.

The proposed localization approach was applied to the erroneous design. When two or more counterexamples were used, only one diagnosed expression was returned. This expression was exactly the stall signal in the erroneous if-condition. Therefore, in this case the single diagnosis points exactly to the problem that caused the property to fail.
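As a rough sketch of this property (the signal names ti for "state machine i currently predicts taken" and pi for the individual prediction bits are illustrative assumptions, not the identifiers used in the VIS sources), pStrongPrediction has the shape of an LTL safety property:

    G( (¬stall ∧ ((t0 ∧ t1 ∧ t2 ∧ t3) ∨ (¬t0 ∧ ¬t1 ∧ ¬t2 ∧ ¬t3)))
         → X ((p0 ∧ p1 ∧ p2 ∧ p3) ∨ (¬p0 ∧ ¬p1 ∧ ¬p2 ∧ ¬p3)) )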
Figure 6.21. State machine for branch prediction (states 0: strong not taken, 1: weak not taken, 2: weak taken, 3: strong taken; transitions on jump / no jump)
     1  module branchPredictionBuffer(clock, stall, inst_addr, ...,
     2                                prediction);
     3    parameter SIZE = 4; input clock; input [1:0] inst_addr;
     4    input stall; output [3:0] prediction;
     5    reg [1:0] state_bank0 [SIZE-1:0];
        ...
     9    always @(posedge clock) begin
    10      //Correct: if (stall==0) begin
    11      if (stall==1) begin
    12        if (state_bank3[inst_addr] > 1) prediction[3] = 1;
    13        else prediction[3] = 0;
    14        if (state_bank2[inst_addr] > 1)
              ...
    15      end // if (!stall)
            ...
    16    end // always @ (posedge clock)
        endmodule // branchPredictionBuffer

Figure 6.22. Source code for bpb
6.3.4.2 Diagnosis Quality

Results regarding the quality of the diagnosis are presented in Table 6.8. The table also shows the influence of multiple counterexamples and Ackermann constraints on the diagnosis results. The first column gives the name of the benchmark circuit and the name of the property considered. The number of gates and the number of registers contained in the circuit follow. For BMC, the length of the counterexamples and the number of components identified in the complete design are given in columns Len and #Cmp, respectively. Besides the results of a static cone-of-influence analysis, diagnosis results are reported for a single counterexample, for four counterexamples, and for four counterexamples together with Ackermann constraints. For each approach, the number of components returned as diagnoses and the percentage of diagnosed components compared to all components in the design are shown in columns #Cmp and %, respectively.

The traditional cone-of-influence analysis often leads to a large number of components that have to be considered. In contrast, the number of diagnoses is often small already when the new diagnosis approach is applied with a single
Table 6.8. Diagnosis results for multiple counterexamples and Ackermann constraints

                                       BMC              Diagnosis
                                                   Cone        Single      Four      Ackermann
Circuit, property      Gates  Reg.   Len  #Cmp   #Cmp   %    #Cmp   %    #Cmp   %    #Cmp   %
am2910 p1 e1, pE5       2257   102     5   227    205  90      66  29      36  15      36  15
am2910 p2 e1, pSP       2290   102     5   230     87  37      37  16      26  11      26  11
bpbs p1 e1, pValidT     1640    39     2   127    102  80      15  11      13  10      13  10
bpbs p1 e2, pValidT     1640    39     2   127    102  80      15  11       4   3       4   3
counter e1, pCount        25     7     3    11     10  90       4  36       4  36       1   9
FPMult e1, pLegalOp      973    69     4   119    105  88       3   2       3   2       3   2
FPMult e2, pLegalOp      973    69     4   119    105  88      54  45      47  39      47  39
gcd e1, pReady           634    51    22    87     68  78      45  51      35  40      35  40
gcd e2, pReady           634    51    22    87     68  78      34  39      32  36      32  36
gcd e1, pBoth            634    51    23    87     71  81      46  52      36  41      36  41
gcd e2, pBoth            634    51    23    87     71  81      33  37      33  37      33  37
gcd e1, pThree           634    51    23    87     71  81      33  37      23  26      23  26
gcd e2, pThree           634    51    23    87     71  81      39  44      22  25      22  25
counterexample. Problems may occur when very long counterexamples are considered: then the fix can be placed at many different locations. But this is inherent to the problem and not a limitation of the presented approach. Moreover, using multiple counterexamples for diagnosis often improves the accuracy. As an extreme, consider bpbs, where the number of fault candidates is reduced from 15 to only 4.

In contrast, Ackermann constraints do not yield the same improvement. Only for counter e1 was the number of diagnoses reduced, and the algorithm returned exactly the real error site. The overhead in run time is quite high for Ackermann constraints: an increase of up to a factor of 60 is observed, especially on large instances. Thus, Ackermann constraints should only be applied in a second stage of the diagnosis process due to their low influence on the accuracy.
6.3.4.3 Run Time

In Section 6.3.2, two techniques to improve the run time of the overall algorithm were suggested: a static decision strategy for the SAT solver and a simulation-based preprocessing step. Both techniques were implemented within the hierarchical framework. The results are reported in Table 6.9 for the practical case of using four counterexamples without Ackermann constraints. The table shows run times for the different techniques. Additionally, the number of components considered during SAT-based diagnosis is given. Note that this is not the number of components returned as diagnoses that was considered previously. The number of decisions made by the SAT solver is also reported.
Table 6.9. Run times for the different approaches (using four counterexamples)

                        BMC                     Diagnosis
                               Zchaff default            Static                Simulation + static
Circuit, property      Time    Time #Cmp      #Dec     Time #Cmp      #Dec     Time #Cmp      #Dec
am2910 p1 e1, pE.       0.5    11.9  205   165,247      2.6  205     8,047      1.6   69     7,855
am2910 p2 e1, pS.      <0.1     0.4   87     3,848      0.3   87       989      0.3   52       916
bpbs p1 e1, pV.         0.1     0.2  102     2,819      0.2  102       302      0.1   19       266
bpbs p1 e2, pV.        <0.1     0.2  102     1,805      0.1  102       110      0.1    5        87
counter e1, pC.        <0.1    <0.1   10       259     <0.1   10       131     <0.1    9       130
FPMult e1, pL.         <0.1     0.4  105       397      0.2  105        60      0.2    5        60
FPMult e2, pL.         <0.1     2.3  105    17,540      1.1  105     8,440      1.0   76     7,320
gcd e1, pR.            18.7  1057.2   68 3,271,957     54.0   68   479,526     54.4   67   479,525
gcd e2, pR.            22.1   351.2   68 1,022,573     19.7   68   115,519     18.6   63   112,833
gcd e1, pBoth          32.2  2213.4   71 3,468,162     91.7   71   425,438     90.1   67   425,436
gcd e2, pBoth          24.2   453.8   71 1,058,165     55.2   71   237,104     50.2   59   232,334
gcd e1, pThree         42.7  1626.1   71 2,617,354    201.8   71   723,180    198.4   65   730,191
gcd e2, pThree         35.5   499.0   71 1,278,064   1306.9   71 3,586,181   1307.8   71 3,586,181
The run time decreases drastically when the static decision heuristic is applied. This is due to the reduction of the number of decisions that have to be made by the SAT solver. The only exception is the last benchmark, but when using only one counterexample, the run time was only 9.91 s at the cost of a lower accuracy (see above). Usually, the run time does not exceed the time for BMC by much – even when four counterexamples are applied for diagnosis. Here, incrementally applying more and more counterexamples as suggested in [SVV04] can yield an even shorter run time. The use of the simulation-based preprocessing step also saves some run time in those cases where the number of components considered during SAT-based diagnosis can be reduced significantly. On the other hand, the overhead is quite low when no components can be pruned.

The creation of counterexamples dedicated to diagnosis, as proposed for the combinational case in Section 6.2, may further improve the diagnosis result. This hypothesis is strengthened by the following experimental results. In total, 1000 diagnosis runs were carried out with four randomly chosen counterexamples on am2910 e1 for property pEntry5 and on gcd e2 for property pReadyIn22Cyc. Figures 6.23 and 6.24 show the results. The number of diagnoses varied from 28 to 90 and the run time varied between 1.75 s and 3.75 s for am2910, as can be seen in Figure 6.23. In the case of gcd, the number of diagnoses was between 22 and 38 while the run time varied between 6.95 s and 23.65 s, as Figure 6.24 shows. Usually, a better diagnosis accuracy also leads to a shorter run time.

In summary, the proposed techniques drastically reduce the run time and make the effort of diagnosis comparable to that of BMC.
Figure 6.23. am2910: Runtime vs. number of diagnosed components (runtime in seconds plotted over #comp)
Figure 6.24. gcd: Runtime vs. number of diagnosed components (runtime in seconds plotted over #comp)

6.4 Summary and Future Work
Automatic diagnosis and debugging were considered in detail in this chapter. First, the relations between simulation-based and SAT-based diagnosis have been investigated. It has been shown both theoretically and empirically that the basic simulation-based approaches BSIM and COV are fast, but they cannot guarantee to return a valid correction. Moreover, COV may not retrieve all valid corrections. Manually removing invalid corrections is very time consuming. BSAT needs more computation time but returns good diagnosis results that are guaranteed to be a valid correction for a given test set. The same is true for the advanced approaches that use different search paradigms.

The results show a direction for future work. While BSIM does not guarantee that an actual error site has been marked by the largest number of
counterexamples, this happened in almost all experiments. In the same way, the results returned by COV were not too far from the real errors in most cases. These results suggest a hybrid approach. A simulation-based preprocessing step for diagnosing properties was already applied in Section 6.3. In future work, the fast engines of BSIM and COV can be used to direct the SAT search by tuning the decision heuristics of the solver. A second possibility is to choose an initial correction (that may not be valid) and use SAT-based diagnosis to turn it into a valid correction.

Next, the problem of selecting multiple counterexamples for diagnosis was targeted. The problem was formally defined and shown to be difficult: the corresponding decision problem was proven to be NP-complete. Heuristics were given to enable the generation of useful counterexamples and to choose them efficiently. Here, tuning the SAT solver to produce a "good" set of counterexamples is also an important next step. For example, techniques that are also applied for all-solutions SAT [GSY04, LHS04] could be exploited.

Finally, aiding the debugging of properties was considered. The presented approach automatically locates design errors at the gate level or the source code level. The approach handles safety properties written in LTL. A propositional logic formula is built such that diagnoses can be derived from satisfying assignments. We have shown how to extend the formula to make sure that a diagnosed component is actually repairable for the given input sequences. The link to the source code enables the diagnosis engine to exploit hierarchical information. More importantly, the source code information makes it possible to apply a single error assumption even when errors are introduced at the HDL level. Experimental results show that the efficiency is drastically improved by using a dedicated search strategy for the SAT solver.

All these diagnosis techniques improve the usability of formal verification tools in the design flow. Instead of manually debugging the design description, this process is partially automated. Only a small fraction of the design, i.e. the candidate error locations, has to be considered by the designer.
Chapter 7 SUMMARY AND CONCLUSIONS
Today, circuit design is a complex task that is composed of several steps. The overall flow and the different steps have been studied in detail in this work. Currently, robustness and usability are still the major problems. The analysis identified a number of deficiencies in the individual design steps. Techniques and methods to alleviate specific problems have been proposed. All of these approaches were empirically evaluated in case studies or benchmarking experiments.

When all of these techniques are integrated, a new enhanced design flow emerges. In this new flow, the underlying algorithms do not only aim at robustness but are also adjusted to the needs of subsequent tasks, such as the generation of meaningful counterexamples. The use of SystemC tightly couples the system-level description and the synthesizable description of the design. By this, inconsistencies between the two descriptions can be detected more easily and are often even avoided, because only simple transformations are done. A technique for the creation of fully testable circuits from the SystemC description was presented. On these circuits, ATPG can be carried out efficiently.

In the verification realm, the transition towards formal methods has been suggested. As long as simulation-based verification methods are still in use, these can be coupled with formal techniques by applying the automatic generation of properties from testbenches. These generated properties help to detect gaps in testbenches. But – even more important – the automatic generation of properties provides a whole new verification methodology, one that is based on design understanding and the interactive creation of properties. Using this methodology, the creation of properties becomes more efficient, and the usability of tools to check the consistency between the design description and the textual specification improves as well. Thus, the verification productivity increases.

Here, automatic support for debugging also plays an important role. Counterexamples will still remain
the instrument to unveil discrepancies between a certain design and its specification. Therefore, techniques to automate error location and design debugging were investigated. Different approaches were compared, and a method to produce particularly useful counterexamples for automatic diagnosis was proposed. An approach to apply diagnosis techniques even at the source code level to debug formal properties was presented.

Several ideas to further improve the different techniques have already been discussed in each chapter. Overall, the proposed techniques establish an enhanced design flow. In comparison to the traditional design flow, the new approaches boost the productivity of the time-consuming design process. The improvements are achieved by more robust algorithms and by tools that are easier to use.
REFERENCES
[ABG+ 00]
Y. Abarbanel, I. Beer, L. Gluhovsky, S. Keidar, and Y. Wolfsthal. FoCs – automatic generation of simulation checkers from formal specifications. In Computer Aided Verification, volume 1855 of LNCS, pages 538–542, 2000.
[ADK91a]
P. Ashar, S. Devadas, and K. Keutzer. Gate-delay-fault testability properties of multiplexor-based networks. In Int’l Test Conf., pages 887–896, 1991.
[ADK91b]
P. Ashar, S. Devadas, and K. Keutzer. Testability properties of multilevel logic networks derived from binary decision diagrams. Advanced Research in VLSI: UC Santa Cruz, pages 33–54, 1991.
[ADK93]
P. Ashar, S. Devadas, and K. Keutzer. Path-delay-fault testability properties of multiplexor-based networks. INTEGRATION, the VLSI Jour., 15(1):1–23, 1993.
[AFK88]
M.S. Abadir, J. Ferguson, and T.E. Kirkland. Logic verification via test generation. IEEE Trans. on CAD, 7:172–177, 1988.
[AH97]
H. Andersen and H. Hulgaard. Boolean expression diagrams. In Logic in Computer Science, pages 88–98, 1997.
[AMM83]
M. Abramovici, P.R. Menon, and D.T. Miller. Critical path tracing – an alternative to fault simulation. In Design Automation Conf., pages 214–220, 1983.
[ASU85]
A.V. Aho, R. Sethi, and J.D. Ullman. Compilers – Principles, Techniques and Tools. Pearson Higher Education, 1985.
[ASV+ 05]
M. Ali, S. Safarpour, A. Veneris, M. Abadir, and R. Drechsler. Post-verification debugging of hierarchical designs. In Int’l Conf. on CAD, pages 871–876, 2005.
[AVS+ 04]
M.F. Ali, A. Veneris, S. Safarpour, R. Drechsler, A. Smith, and M.S. Abadir. Debugging sequential circuits using Boolean satisfiability. In Int'l Conf. on CAD, pages 204–209, 2004.
[BAS02]
A. Biere, C. Artho, and V. Schuppan. Liveness checking as safety checking. In FMICS workshop, volume 66(2) of Electronic Notes in Theoretical Computer Science, 2002.
[BCCZ99]
A. Biere, A. Cimatti, E. Clarke, and Y. Zhu. Symbolic model checking without BDDs. In Tools and Algorithms for the Construction and Analysis of Systems, volume 1579 of LNCS, pages 193–207. Springer Verlag, 1999.
[BCMD90]
J.R. Burch, E.M. Clarke, K.L. McMillan, and D.L. Dill. Sequential circuit verification using symbolic model checking. In Design Automation Conf., pages 46–51, 1990.
[BDKN94]
D. Brand, A. Drumm, S. Kundu, and P. Narrain. Incremental synthesis. In Int’l Conf. on CAD, pages 14–18, 1994.
[Bec92]
B. Becker. Synthesis for testability: Binary decision diagrams. In Symp. on Theoretical Aspects of Comp. Science, volume 577 of LNCS, pages 501–512. Springer, 1992.
[Bec98]
B. Becker. Testing with decision diagrams. INTEGRATION, the VLSI Jour., 26:5–20, 1998.
[BF76]
M.A. Breuer and A.D. Friedman. Diagnosis & reliable design of digital systems. Computer Science Press, 1976.
[BFGR03]
A.G. Braun, J.B. Freuer, J. Gerlach, and W. Rosenstiel. Automated conversion of SystemC fixed-point data types for hardware synthesis. In VLSI of Systemon-Chip, pages 55–60, 2003.
[BMJ+ 99]
V. Boppana, R. Mukherjee, J. Jain, M. Fujita, and P. Bollineni. Multiple error diagnosis based on Xlists. In Design Automation Conf., pages 660–665, 1999.
[BNR03]
T. Ball, M. Naik, and S. K. Rajamani. From symptom to cause: Localizing errors in counterexample traces. In Symposium on Principles of Programming Languages, pages 97–105, January 2003.
[Boo04]
Boolean Satisfiability Research Group at Princeton University. ZCHAFF, 2004. http://www.princeton.edu/~chaff/zchaff.html.
[BPM+ 05]
D. Berner, H. Patel, D. Mathaikutty, J.-P. Talpin, and S. Shukla. SystemCXML: An extensible SystemC front end using XML. Technical report, INRIA, France and Virginia Polytechnic and State University, USA, 2005.
[Bra83]
D. Brand. Redundancy and don’t cares in logic synthesis. IEEE Trans. on Comp., 32(10):947–952, 1983.
[BRB90]
K.S. Brace, R.L. Rudell, and R.E. Bryant. Efficient implementation of a BDD package. In Design Automation Conf., pages 40–45, 1990.
[Bry86]
R.E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Trans. on Comp., 35(8):677–691, 1986.
[BS93]
D. Brand and T. Sasao. Minimization of AND-EXOR expressions using rewrite rules. IEEE Trans. on Comp., 42:568–576, 1993.
[BS01]
J. Bormann and C. Spalinger. Formale Verifikation für Nicht-Formalisten (Formal verification for non-formalists). Informationstechnik und Technische Informatik, 43:22–28, 2001.
[BW96]
B. Bollig and I. Wegener. Improving the variable ordering of OBDDs is NPcomplete. IEEE Trans. on Comp., 45(9):993–1002, 1996.
[CCC+ 92]
G. Cabodi, P. Camurati, F. Corno, P. Prinetto, and M.S. Reorda. A new model for improving symbolic product machine traversal. In Design Automation Conf., pages 614–619, 1992.
[CGP99]
E.M. Clarke, O. Grumberg, and D.A. Peled. Model Checking. MIT Press, Cambridge, MA, 1999.
[CIW+ 03]
F. Copty, A. Irron, O. Weissberg, N. Kropp, and G. Kamhi. Efficient debugging in a formal verification environment. Software Tools for Technology Transfer, 4:335–348, 2003.
[CJG+ 03]
A. Clouard, K. Jain, F. Ghenassia, L. Maillet-Contoz, and J.-P. Strassen. Using Transactional Level Models in a SoC Design Flow, chapter 2, pages 29–64. Kluwer Academic Publishers, 2003.
[CNQ03]
G. Cabodi, S. Nocco, and S. Quer. SAT-based bounded model checking by means of BDD-based approximate traversals. In Design, Automation and Test in Europe, pages 898–903, 2003.
[Coo71]
S.A. Cook. The complexity of theorem proving procedures. In 3. ACM Symposium on Theory of Computing, pages 151–158, 1971.
[CPK95]
M. Chatterjee, D. K. Pradhan, and W. Kunz. LOT: logic optimization with testability - new transformations using recursive learning. In Int’l Conf. on CAD, pages 318–325, 1995.
[CWH93]
P.-Y. Chung, Y.-M. Wang, and I. N. Hajj. Diagnosis and correction of logic design errors in digital circuits. In Design Automation Conf., pages 503–508, 1993.
[DBG96]
R. Drechsler, B. Becker, and N. Göckel. A genetic algorithm for variable ordering of OBDDs. IEE Proceedings, 143(6):364–368, 1996.
[DF04]
R. Drechsler and G. Fey. Design understanding by automatic property generation. In Workshop on Synthesis And System Integration of Mixed Information technologies, pages 274–281, 2004.
[DF06]
R. Drechsler and G. Fey. Automatic test pattern generation. In Formal Methods for Hardware Verification, LNCS, pages 30–55, 2006.
[DFGG05]
R. Drechsler, G. Fey, C. Genz, and D. Große. SyCE: An integrated environment for system design in SystemC. In IEEE Int’l Workshop on Rapid System Prototyping, pages 258–260, 2005.
[DFK06]
R. Drechsler, G. Fey, and S. Kinder. An integrated approach for combining BDD and SAT provers. In VLSI Design Conf., pages 237–242, 2006.
[DG02]
R. Drechsler and W. Günther. Towards One-Path Synthesis. Kluwer Academic Publishers, 2002.
[DLL62]
M. Davis, G. Logeman, and D. Loveland. A machine program for theorem proving. Comm. of the ACM, 5:394–397, 1962.
[DP60]
M. Davis and H. Putnam. A computing procedure for quantification theory. Journal of the ACM, 7:506–521, 1960.
[Dre94]
R. Drechsler. BiTeS: A BDD based test pattern generator for strong robust path delay faults. In European Design Automation Conf., pages 322–327, 1994.
[Dre04]
R. Drechsler. Using synthesis techniques in SAT solvers. In ITG/GI/GMMWorkshop Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen, pages 165–173, 2004.
[DSF04]
R. Drechsler, J. Shi, and G. Fey. Synthesis of fully testable circuits from BDDs. IEEE Trans. on CAD, 23(3):440–443, 2004.
[EAH05]
C. Eibl, C. Albrecht, and R. Hagenau. gSysC: A graphical front end for SystemC. In European Conference on Modelling and Simulation (ECMS), pages 257–262, 2005.
[EB05]
N. Eén and A. Biere. Effective preprocessing in SAT through variable and clause elimination. In International Conference on Theory and Applications of Satisfiability Testing, volume 3569 of LNCS, pages 61–75, 2005.
[EMS07]
N. Een, A. Mishchenko, and N. Sörensson. Applying logic synthesis for speeding up SAT. In Int’l Conference on Theory and Applications of Satisfiability Testing, LNCS, 2007.
[ES04]
N. Eén and N. Sörensson. An extensible SAT solver. In SAT 2003, volume 2919 of LNCS, pages 502–518. Springer, 2004.
[EW77]
E.B. Eichelberger and T.W. Williams. A logic design structure for LSI testability. In Design Automation Conf., pages 462–468, 1977.
[FD03]
G. Fey and R. Drechsler. Finding good counter-examples to aid design verification. In MEMOCODE, pages 51–52, 2003.
[FD04]
G. Fey and R. Drechsler. Improving simulation-based verification by means of formal methods. In ASP Design Automation Conf., pages 640–643, 2004.
[FD05]
G. Fey and R. Drechsler. Efficient hierarchical system debugging for property checking. In IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems, pages 41–46, 2005.
[FD06]
G. Fey and R. Drechsler. Minimizing the number of paths in BDDs – theory and algorithm. IEEE Trans. on CAD, 25(1):4–11, 2006.
[FGC+ 04]
G. Fey, D. Große, T. Cassens, C. Genz, T. Warode, and R. Drechsler. ParSyC: An efficient SystemC parser. In Workshop on Synthesis And System Integration of Mixed Information technologies, pages 148–154, 2004.
[FKL03]
H. Foster, A. Krolnik, and D. Lacey. Assertion-Based Design. Kluwer Academic Publishers, 2003.
[Fri73]
A.D. Friedman. Easily testable iterative systems. IEEE Trans. on Comp., 22:1061–1064, 1973.
[FS83]
H. Fujiwara and T. Shimono. On the acceleration of test generation algorithms. IEEE Trans. on Comp., 32:1137–1144, 1983.
[FSD04]
G. Fey, J. Shi, and R. Drechsler. BDD circuit optimization for path delay fault testability. In EUROMICRO Symp. on Digital System Design, pages 162–172, 2004.
[FSVD06]
G. Fey, S. Safarpour, A. Veneris, and R. Drechsler. On the relation between simulation-based and SAT-based diagnosis. In Design, Automation and Test in Europe, pages 1139–1144, 2006.
[GD00]
W. Günther and R. Drechsler. ACTion: Combining logic synthesis and technology mapping for MUX based FPGAs. Journal of Systems Architecture, 46(14):1321–1334, 2000.
[GD03]
D. Große and R. Drechsler. Formal verification of LTL formulas for SystemC designs. In IEEE International Symposium on Circuits and Systems, pages V:245–V:248, 2003.
[GD06]
C. Genz and R. Drechsler. System exploration of SystemC designs. In IEEE Annual Symposium on VLSI, pages 335–340, 2006.
[GDLA03]
D. Große, R. Drechsler, L. Linhard, and G. Angst. Efficient automatic visualization of SystemC designs. In Forum on Specification and Design Languages, pages 646–657, 2003.
[GJ79]
M.R. Garey and D.S. Johnson. Computers and Intractability - A Guide to NP-Completeness. Freeman, San Francisco, 1979.
[GKL04]
A. Groce, D. Kroening, and F. Lerda. Understanding counterexamples with explain. In R. Alur and D. A. Peled, editors, Computer Aided Verification, number 3114 in LNCS, pages 453–456, July 2004.
[GLMS02]
T. Grötker, S. Liao, G. Martin, and S. Swan. System Design with SystemC. Kluwer Academic Publishers, 2002.
[GN02]
E. Goldberg and Y. Novikov. BerkMin: a fast and robust SAT-solver. In Design, Automation and Test in Europe, pages 142–149, 2002.
[Goe81]
P. Goel. An implicit enumeration algorithm to generate test for combinational logic. IEEE Trans. on Comp., 30:215–222, 1981.
[Gro04]
A. Groce. Error explanation with distance metrics. In Tools and Algorithms for the Construction and Analysis of Systems, volume 2988 of LNCS, pages 108–122, Barcelona, Spain, March–April 2004.
[GSY04]
O. Grumberg, A. Schuster, and A. Yadgar. Memory efficient all-solutions SAT solver and its application to reachability. In Int’l Conf. on Formal Methods in CAD, volume 3312 of LNCS, pages 275–289, 2004.
[GV03]
A. Groce and W. Visser. What went wrong: Explaining counterexamples. In Model Checking of Software: International SPIN Workshop, volume 2648 of LNCS, pages 121–135. Springer, May 2003.
[GYA+ 01]
A. Gupta, Z. Yang, P. Ashar, L. Zhang, and S. Malik. Partition-based decision heuristics for image computation using SAT and BDDs. In Int’l Conf. on CAD, pages 286–292, 2001.
[GYAG00]
A. Gupta, Z. Yang, P. Ashar, and A. Gupta. SAT-based image computation with application in reachability analysis. In Int’l Conf. on Formal Methods in CAD, volume 1954 of LNCS, pages 354–371, 2000.
[GZ03]
J. F. Groote and H. Zantema. Resolution and binary decision diagrams cannot simulate each other polynomially. Discrete Applied Mathematics, 130(2):157–171, 2003.
[HC99]
S.-Y. Huang and K.-T. Cheng. Errortracer: Design error diagnosis based on fault simulation techniques. IEEE Trans. on CAD, 18(9):1341–1352, 1999.
[HD84]
W. Hamscher and R. Davis. Diagnosing circuits with state: An inherently underconstrained problem. In Proceedings of the Fourth National Conference on Artificial Intelligence (AAAI’84), pages 142–147, 1984.
[HDB96]
A. Hett, R. Drechsler, and B. Becker. MORE: Alternative implementation of BDD packages by multi-operand synthesis. In European Design Automation Conf., pages 164–169, 1996.
[HK00]
D.W. Hoffmann and T. Kropf. Efficient design error correction of digital circuits. In Int’l Conf. on Comp. Design, pages 465–472, 2000.
[HS96]
G. Hachtel and F. Somenzi. Logic Synthesis and Verification Algorithms. Kluwer Academic Publishers, 1996.
[HTCT03]
Y.-C. Hsu, B. Tabbara, Y.-A. Chen, and F. Tsai. Advanced techniques for RTL debugging. In Design Automation Conf., pages 362–367, 2003.
[HTFM03]
C. Haubelt, J. Teich, R. Feldmann, and B. Monien. SAT-based techniques in system synthesis. In Design, Automation and Test in Europe, volume 1, pages 11168–11169, 2003.
[IINY03]
H. Inoue, T. Iwasaki, M. Numa, and K. Yamamoto. An improved multiple error diagnosis technique using symbolic simulation with truth variables and its application to incremental synthesis for standard-cell design. In Workshop on Synthesis And System Integration of Mixed Information technologies, pages 61–68, 2003.
[IPC03]
M.K. Iyer, G. Parthasarathy, and K.-T. Cheng. SATORI – a fast sequential SAT engine for circuits. In Int’l Conf. on CAD, pages 320–325, 2003.
[IS75]
O.H. Ibarra and S.K. Sahni. Polynomially complete fault detection problems. IEEE Trans. on Comp., 24:242–249, 1975.
[JG03]
N. Jha and S. Gupta. Testing of Digital Systems. Cambridge University Press, 2003.
[JGB05]
B. Jobstmann, A. Griesmayer, and R. Bloem. Program repair as a game. In Computer Aided Verification, volume 3576 of LNCS, pages 226–238, 2005.
[JPHS91]
S.-W. Jeong, B. Plessier, G. Hachtel, and F. Somenzi. Extended BDD’s: Trading of canonicity for structure in verification algorithms. In Int’l Conf. on CAD, pages 464–467, 1991.
[JRS04]
H.S. Jin, K. Ravi, and F. Somenzi. Fate and free will in error traces. Software Tools for Technology Transfer, 6(2):102–116, 2004.
[JS05]
H. Jin and F. Somenzi. CirCUs: A hybrid satisfiability solver. In SAT 2004, volume 3542 of LNCS, pages 211–223. Springer, 2005.
[KCSL94]
A. Kuehlmann, D.I. Cheng, A. Srinivasan, and D.P. LaPotin. Error diagnosis for transistor-level verification. In Design Automation Conf., pages 218–224, 1994.
[KCY03]
D. Kroening, E. Clarke, and K. Yorav. Behavioral consistency of C and Verilog programs using bounded model checking. In Design Automation Conf., pages 368–371, 2003.
[KP80]
K.L. Kodandapani and D.K. Pradhan. Undetectability of bridging faults and validity of stuck-at fault test sets. IEEE Trans. on Comp., C-29(1):55–59, 1980.
[KP94]
W. Kunz and D.K. Pradhan. Recursive learning: A new implication technique for efficient solutions of CAD problems: Test, verification and optimization. IEEE Trans. on CAD, 13(9):1143–1158, 1994.
[KPKG02]
A. Kuehlmann, V. Paruthi, F. Krohm, and M.K. Ganai. Robust Boolean reasoning for equivalence checking and functional property verification. IEEE Trans. on CAD, 21(12):1377–1394, 2002.
[Kro99]
Th. Kropf. Introduction to Formal Hardware Verification. Springer, 1999.
[KS97]
W. Kunz and D. Stoffel. Reasoning in Boolean Networks. Kluwer Academic Publishers, 1997.
[Kun93]
W. Kunz. HANNIBAL: An efficient tool for logic verification based on recursive learning. In Int’l Conf. on CAD, pages 538–543, 1993.
[Lar92]
T. Larrabee. Test pattern generation using Boolean satisfiability. IEEE Trans. on CAD, 11:4–15, 1992.
[LCC+ 95]
C.-C. Lin, K.-C. Chen, S.-C. Chang, M. Marek-Sadowska, and K.-T. Cheng. Logic synthesis for engineering change. In Design Automation Conf., pages 647–651, 1995.
[LHS04]
B. Li, M.S. Hsiao, and S. Sheng. A novel SAT all-solutions solver for efficient preimage computation. In Design, Automation and Test in Europe, pages 10272–10278, 2004.
[LTG97]
S. Liao, S. Tjiang, and R. Gupta. An efficient implementation of reactivity for modeling hardware in the scenic design environment. In Design Automation Conf., pages 70–75, 1997.
[LV05]
J.B. Liu and A. Veneris. Incremental fault diagnosis. IEEE Trans. on CAD, 24(4):1514–1545, 2005.
[Mar99]
J.P. Marques-Silva. The impact of branching heuristics in propositional satisfiability algorithms. In 9th Portuguese Conference on Artificial Intelligence (EPIA), 1999.
[MBM01]
L. Macchiarulo, L. Benini, and E. Macii. On-the-fly layout generation for PTL macrocells. In Design, Automation and Test in Europe, pages 546–551, 2001.
[McM93]
K.L. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993.
[McM02]
K. L. McMillan. Applying SAT methods in unbounded symbolic model checking. In Computer Aided Verification, volume 2404 of LNCS, pages 250–264, July 2002.
[Min02]
S. Minato. Streaming BDD manipulation. IEEE Trans. on Comp., 51(5):474– 485, 2002.
[MMM02]
J. Mohnke, P. Molitor, and S. Malik. Limits of using signatures for permutation independent Boolean comparison. Formal Methods in System Design: An International Journal, 2(21):167–191, 2002.
[MMMC05]
M. Moy, F. Maraninchi, and L. Maillet-Contoz. PINAPA: An extraction tool for SystemC descriptions of systems-on-a-chip. In ACM International Conference on Embedded Software (EMSOFT), pages 317–324, 2005.
[MMZ+ 01]
M.W. Moskewicz, C.F. Madigan, Y. Zhao, L. Zhang, and S. Malik. Chaff: Engineering an efficient SAT solver. In Design Automation Conf., pages 530– 535, 2001.
[MP91]
Z. Manna and A. Pnueli. The Temporal Logic of Reactive and Concurrent Systems. Springer-Verlag, 1991.
[MP01]
A. Mishchenko and M. Perkowski. Fast heuristic minimization of exclusivesums-of-products. In Int’l Workshop on Applications of the Reed-Muller Expansion in Circuit Design, pages 242–250, 2001.
[MRH+ 01]
W. Müller, J. Ruf, D. Hoffmann, J. Gerlach, T. Kropf, and W. Rosenstiehl. The simulation semantics of SystemC. In Design, Automation and Test in Europe, pages 64–70, 2001.
[MRR03]
W. Müller, W. Rosenstiel, and J. Ruf, editors. SystemC Methodologies and Applications. Kluwer Academic Publishers, 2003.
[MS96]
J.P. Marques-Silva and K.A. Sakallah. GRASP – a new search algorithm for satisfiability. In Int’l Conf. on CAD, pages 220–227, 1996.
[MS98]
C. Meinel and H. Sack. ⊕-OBDDs – a BDD structure for probabilistic verification. In Workshop on Probabilistic methods in Verification, pages 141–151, 1998.
[MS99]
J.P. Marques-Silva and K.A. Sakallah. GRASP: A search algorithm for propositional satisfiability. IEEE Trans. on Comp., 48(5):506–521, 1999.
[MSML99]
A. Mukherjee, R. Sudhakar, M. Marek-Sadowska, and S. Long. Wave steering in YADDs: A novel non-iterative synthesis and layout technique. In Design Automation Conf., pages 466–471, 1999.
[NE01]
J.W. Nimmer and M.D. Ernst. Static verification of dynamically detected program invariants: Integrating Daikon and ESC/Java. In Workshop on Runtime Verification, volume 55 of Electronic Notes in Theoretical Computer Science. Elsevier, 2001.
[NP91]
T.M. Niermann and J.H. Patel. HITEC: A test generation package for sequential circuits. In European Conf. on Design Automation, pages 214–218, 1991.
[Par97]
T. Parr. Language Translation using PCCTS and C++: A Reference Guide. Automata Publishing, 1997.
[PC90]
M.A. Perkowski and M. Chrzanowska-Jeske. An exact algorithm to minimize mixed-radix exclusive sums of products for incompletely specified Boolean functions. In Int’l Symp. Circ. and Systems, pages 1652–1655, 1990.
[PK00]
V. Paruthi and A. Kuehlmann. Equivalence checking combining a structural SAT-solver, BDDs, and simulation. In Int’l Conf. on Comp. Design, pages 459–464, 2000.
[Pnu77]
A. Pnueli. The temporal logic of programs. In IEEE Symposium on Foundations of Computer Science, pages 46–57, Providence, RI, 1977.
[PQ95]
T.J. Parr and R.W. Quong. ANTLR: A predicated-LL(k) parser generator. Software – Practice and Experience, 25(7):789–810, 1995.
[PR90]
A.K. Pramanick and S.M. Reddy. On the design of path delay fault testable combinational circuits. In Int’l Symp. on Fault-Tolerant Comp., pages 374– 381, 1990.
[PR95]
I. Pomeranz and S.M. Reddy. On correction of multiple design errors. IEEE Trans. on CAD, 14(2):255–264, 1995.
[PW03]
B. Peischl and F. Wotawa. Modeling state in software debugging of VHDLRTL designs – a model based diagnosis approach. In Automated and Algorithmic Debugging (AADEBUG 2003), pages 197–210, 2003.
[RBKM91]
D.E. Ross, K.M. Butler, R. Kapur, and M.R. Mercer. Fast functional evaluation of candidate OBDD variable ordering. In European Conf. on Design Automation, pages 4–9, 1991.
[RDO02]
S. Reda, R. Drechsler, and A. Orailoglu. On the relation between SAT and BDDs for equivalence checking. In Int’l Symp. on Quality Electronic Design, pages 394–399, 2002.
[Rei87]
R. Reiter. A theory of diagnosis from first principles. Artificial Intelligence, 32:57–95, 1987.
[Rot66]
J.P. Roth. Diagnosis of automata failures: A calculus and a method. IBM J. Res. Dev., 10:278–281, 1966.
[RR03]
M. Renieris and S. P. Reiss. Fault localization with nearest neighbor queries. In International Conference on Automated Software Engineering, pages 30–39, Montreal, Canada, October 2003.
[RS04]
K. Ravi and F. Somenzi. Minimal assignments for bounded model checking. In Tools and Algorithms for the Construction and Analysis of Systems, volume 2988 of LNCS, pages 31–45, 2004.
[Rud93]
R. Rudell. Dynamic variable ordering for ordered binary decision diagrams. In Int’l Conf. on CAD, pages 42–47, 1993.
[SBSV96]
P. Stephan, R.K. Brayton, and A.L. Sangiovanni-Vincentelli. Combinational test generation using satisfiability. IEEE Trans. on CAD, 15:1167–1176, 1996.
[SFBD06]
S. Staber, G. Fey, R. Bloem, and R. Drechsler. Automatic fault localization for property checking. In Haifa Verification Conference, volume 4383 of LNCS, pages 50–64, 2006.
[SFD05a]
J. Shi, G. Fey, and R. Drechsler. Bridging fault testability of BDD circuits. In ASP Design Automation Conf., pages 188–191, 2005.
[SFD+ 05b]
J. Shi, G. Fey, R. Drechsler, A. Glowatz, F. Hapke, and J. Schlöffel. PASSAT: Efficient SAT-based test pattern generation for industrial circuits. In IEEE Annual Symposium on VLSI, pages 212–217, 2005.
[SFVD05]
S. Safarpour, G. Fey, A. Veneris, and R. Drechsler. Utilizing don’t care states in SAT-based bounded sequential problems. In Great Lakes Symp. VLSI, pages 264–269, 2005.
[Sht01]
O. Shtrichman. Pruning techniques for the SAT-based bounded model checking problem. In Conference on Correct Hardware Design and Verification, volume 2144 of LNCS, pages 58–70, 2001.
[SJB05]
S. Staber, B. Jobstmann, and R. Bloem. Finding and fixing faults. In Conference on Correct Hardware Design and Verification, LNCS, pages 35–49, 2005.
[SKWS+ 04]
C. Schulz-Key, M. Winterholer, T. Schweizer, T. Kuhn, and W. Rosenstiel. Object-oriented modeling and synthesis of SystemC specifications. In ASP Design Automation Conf., pages 238–243, 2004.
[Smi85]
G.L. Smith. Model for delay faults based upon paths. In Int’l Test Conf., pages 342–349, 1985.
[Smi04]
A. Smith. Diagnosis of combinational logic circuits using Boolean satisfiability. Master’s thesis, University of Toronto, Canada, 2004.
[Som01a]
F. Somenzi. CUDD: CU Decision Diagram Package Release 2.3.1. University of Colorado at Boulder, 2001.
[Som01b]
F. Somenzi. Efficient manipulation of decision diagrams. Software Tools for Technology Transfer, 3(2):171–181, 2001.
[SSL+ 92]
E. Sentovich, K. Singh, L. Lavagno, Ch. Moon, R. Murgai, A. Saldanha, H. Savoj, P. Stephan, R. Brayton, and A. Sangiovanni-Vincentelli. SIS: A system for sequential circuit synthesis. Technical report, University of Berkeley, 1992.
[Str04]
O. Strichman. Accelerating bounded model checking of safety properties. Formal Methods in System Design, 24(1):5–24, January 2004.
[STS87]
M. Schulz, E. Trischler, and T. Sarfert. SOCRATES: A highly efficient automatic test pattern generation system. In Int’l Test Conf., pages 1016–1026, 1987.
[SVV04]
A. Smith, A. Veneris, and A. Viglas. Design diagnosis using Boolean satisfiability. In ASP Design Automation Conf., pages 218–223, 2004.
[SW93]
D. Sieling and I. Wegener. Reduction of BDDs in linear time. Information Processing Letters, 48(3):139–144, 11 1993.
[Syn02]
Synopsys. Describing Synthesizable RTL in SystemC™, Vers. 1.1. Synopsys Inc., 2002. Available at http://www.synopsys.com.
[Tse68]
G. Tseitin. On the complexity of derivation in propositional calculus. In Studies in Constructive Mathematics and Mathematical Logic, Part 2, pages 115–125, 1968. (Reprinted in: J. Siekmann, G. Wrightson (Ed.), Automation of Reasoning, Vol. 2, Springer, Berlin, pages 466–483, 1983.)
[TSH94]
M. Tomita, N. Suganuma, and K. Hirano. Pattern generation for locating logic design errors. IEICE Trans. Fundamentals, E77-A(5):881–893, 1994.
[TYSH94]
M. Tomita, T. Yamamoto, F. Sumikawa, and K. Hirano. Rectification of multiple logic design errors in multiple output circuits. In Design Automation Conf., pages 212–217, 1994.
[Uba03]
R. Ubar. Design error diagnosis with re-synthesis in combinational circuits. Jour. of Electronic Testing: Theory and Applications, 19:73–82, 2003.
[VF97]
S. Venkataraman and W. K. Fuchs. A deductive technique for diagnosis of bridging faults. In Int’l Conf. on CAD, pages 562–567, 1997.
[VH99]
A. Veneris and I. N. Hajj. Design error diagnosis and correction via test vector simulation. IEEE Trans. on CAD, 18(12):1803–1816, 1999.
[VIS96]
The VIS Group. VIS: A system for verification and synthesis. In Computer Aided Verification, volume 1102 of LNCS, pages 428–432. Springer Verlag, 1996.
[VSA03]
A. Veneris, A. Smith, and M. S. Abadir. Logic verification based on diagnosis techniques. In ASP Design Automation Conf., 2003.
[WA73]
M.J.Y. Williams and J.B. Angell. Enhancing testability of large-scale integrated circuits via test points and additional logic. IEEE Trans. on Comp., C-22(1):46–60, 1973.
[WB95]
A. Wahba and D. Borrione. Design error diagnosis in sequential circuits. In Conference on Correct Hardware Design and Verification, volume 987 of LNCS, pages 171–188. Springer, 1995.
[WKS01]
J. Whittemore, J. Kim, and K. Sakallah. SATIRE: A new incremental satisfiability engine. In Design Automation Conf., pages 542–545, 2001.
[Wot02]
F. Wotawa. Debugging hardware designs using a value-based model. Applied Intelligence, 16:71–92, 2002.
[WTSF04]
K. Winkelmann, H.-J. Trylus, D. Stoffel, and G. Fey. Cost-efficient block verification for a UMTS up-link chip-rate coprocessor. In Design, Automation and Test in Europe, volume 1, pages 162–167, 2004.
[Yan91]
S. Yang. Logic synthesis and optimization benchmarks user guide. Technical Report 1/95, Microelectronic Center of North Carolina, 1991.
[ZH02]
A. Zeller and R. Hildebrandt. Simplifying and isolating failure-inducing input. IEEE Transactions on Software Engineering, 28(2):183–200, February 2002.
[ZMMM01]
L. Zhang, C.F. Madigan, M.H. Moskewicz, and S. Malik. Efficient conflict driven learning in a Boolean satisfiability solver. In Int’l Conf. on CAD, pages 279–285, 2001.
INDEX OF SYMBOLS
·         Boolean AND
+         Boolean OR
·̄         Negation of ·
νg[t]     Value of gate g at time step t
ϕ         Variable order
π         Variable order
ψ         Boolean expression
Ψ         LTL formula
ω         Boolean expression
Ω         LTL formula (usually a subformula)
B         Set of Boolean values {0, 1}
C         Circuit
C         Set of candidate gates
f         Boolean function
fx        Positive cofactor of f wrt. x
fx̄        Negative cofactor of f wrt. x
g         Gate in a circuit C (i.e. a node in the graph)
g[t]      Propositional variable representing gate g at time step t (e.g. in a CNF formula)
Gπf       BDD of f wrt. π; f and/or π may be omitted when clear from the context
i         General indexing variable
j         General indexing variable
k         Upper (lower) limit for a minimization (maximization) problem
l         Number of latches in a circuit C
M0(w)     Set of predecessors of a node w in a BDD that have a CE to w
M1(w)     Set of predecessors of a node w in a BDD that have a non-CE to w
N         Set of next state elements in a circuit C
ni        Next state element i in a circuit C
n         Number of variables in a Boolean function f, number of primary inputs of a circuit C
num       Number of all counterexamples
m         Number of outputs of a Boolean function f, number of primary outputs of a circuit C
P         Set of predecessors of a node in a circuit C
P1(Gπf)   Number of one-paths in the BDD Gπf
P0(Gπf)   Number of zero-paths in the BDD Gπf
R         Time relation
S         Set of present state elements of a circuit C
si        Present state element i in a circuit C
t         Time reference
tcyc      Length of a simulation trace or a property over a fixed time interval
T         Test-vector or counterexample
T         Simulation trace
U         Vector of signals in T
ut        Vector of values at time t in T
v         Node in a graph, often in a BDD Gπf or a circuit C
w         Node in a graph, often in a BDD Gπf or a circuit C
X         Set of primary inputs of a circuit C, set of variables of a BDD Gπf
xi        Primary input i of a circuit C or variable i in a BDD Gπf
Y         Set of primary outputs of a circuit C
yi        Output i of a Boolean function f, primary output i of a circuit C
zΩ[t]     Boolean variable corresponding to an LTL formula Ω at time t
INDEX
Ω, see LTL formula Ψ, see LTL formula ω, see Boolean expression ϕ, see variable order π, see variable order ψ, see Boolean expression ψg [t], 28 νg [t], see value of a gate abnormal predicate, 105, 135 abstract syntax tree, 141 Ackermann constraint, 138 AST, 141 ATPG, 31 automatic test pattern generation, see ATPG B, 9 BasicSATDiagnose, 106 BasicSimDiagnose, 103 BCP, 15 BDD, 10 BDD circuit, 22 simplification, 69 testability, 68 binary decision diagram, see BDD blif, 59 BMC, 27 Boolean expression, 9 function, 9 Boolean constraint propagation, see BCP Boolean Satisfiability, see SAT bounded model checking, see BMC BSAT, 106 BSIM, 103 C, see circuit C, 102
CCE, 116 heuristics, 123 NP-completeness, 119 CE, 10 choosing counterexamples, see CCE circuit, 19 clause, 14 CNF, 14 cofactor, 10 combinational circuit, 20 complemented edge, see CE computed table, 13 conflict analysis, 14, 16 conflict-based learning, 18 conflict clause, 16 conflict-driven assertion, 16 conjunctive normal form, see CNF controlling value, 20 counterexample, 26 COV, 107 D-algorithm, 34 debugging, 99, 130, see diagnosis decision heuristic, 14 for diagnosis, 140 diagnosis, 99 complexity, 110 problem, 102 SAT-based, 104, 139 simulation-based, 103, 140 DLL procedure, 14 DPLL procedure, 14 effect analysis, 102 Else, 10 empty intersection, 117 equivalence checking, 25
ROBUSTNESS AND USABILITY repairable gate, 137 robust test, 32 Rudell’s sifting, 12 S, 20 SAFM, 31 SAT, 13 solver, 14 satisfiable, 14 SCDiagnose, 107 sequential circuit, 19 set cover problem, 107 Shannon decomposition, 10 simulation trace, 21 SOCRATES, 35 strong robust test, 32 stuck-at fault model, 31 SyCE, 53 synthesis for testability, 65 SystemC, 54 T, see simulation trace T , see counterexample T, 26 tcyc , 21 test pattern, 32 testable fault, 33 Then, 10 time relation, 78 two literal watching scheme, 16 U , 21 unique table, 13 unsatisfiable, 14 untestable fault, 33 valid correction, 102, 138 value Boolean, 9 of a gate, 20 variable order, 10 verification methodology, 87 new methodology, 77 VSIDS strategy, 18 X, 9, 19 Y , 20 zΩ [t], 29