Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis, and J. van Leeuwen
2517
Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Tokyo
Mark D. Aagaard John W. O’Leary (Eds.)
Formal Methods in Computer-Aided Design 4th International Conference, FMCAD 2002 Portland, OR, USA, November 6-8, 2002 Proceedings
Series Editors Gerhard Goos, Karlsruhe University, Germany Juris Hartmanis, Cornell University, NY, USA Jan van Leeuwen, Utrecht University, The Netherlands Volume Editors Mark D. Aagaard Department of Electrical and Computer Engineering, University of Waterloo 200 University Avenue West, Waterloo, ON N2L 3G1, Canada E-mail:
[email protected] John W. O’Leary Strategic CAD Labs, Intel Corporation 5200 NE Elam Young Parkway, Hillsboro OR, 97124-6497, USA E-mail:
[email protected] Cataloging-in-Publication Data applied for Bibliographic information published by Die Deutsche Bibliothek Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet at
CR Subject Classification (1998): B.1.2, B.1.4, B.2.2-3, B.6.2-3, B.7.2-3, F.3.1, F.4.1, I.2.3, D.2.4, J.6 ISSN 0302-9743 ISBN 3-540-00116-6 Springer-Verlag Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law. Springer-Verlag Berlin Heidelberg New York a member of BertelsmannSpringer Science+Business Media GmbH http://www.springer.de © Springer-Verlag Berlin Heidelberg 2002 Printed in Germany Typesetting: Camera-ready by author, data conversion by PTP-Berlin, Stefan Sossna e. K. Printed on acid-free paper SPIN: 10870994 06/3142 543210
Preface
This volume contains the proceedings of the Fourth Biennial Conference on Formal Methods in Computer-Aided Design (FMCAD). The conference is devoted to the use of mathematical methods for the analysis of digital hardware circuits and systems. The work reported in this book describes the use of formal mathematics and associated tools to design and verify digital hardware systems. Functional verification has become one of the principal costs in a modern computer design effort. FMCAD provides a venue for academic and industrial researchers and practitioners to share their ideas and experiences of using discrete mathematical modeling and verification. Over the past 20 years, this area has grown from just a few academic researchers to a vibrant worldwide community of people from both academia and industry. This volume includes 23 papers selected from the 47 submitted papers, each of which was reviewed by at least three program committee members. The history of FMCAD dates back to 1984, when the earliest meetings on this topic occurred as part of IFIP WG10.2.

IFIP WG10.2 Workshops
1984  Darmstadt   Eveking
1985  Edinburgh   Milne and Subrahmanyam
1986  Grenoble    Borrione
1988  Glasgow     Milne
1989  Leuven      Claessen
1990  Miami       Subrahmanyam
1991  Torino      Prinetto and Camurati
At the IFIP WG10.2 meeting in 1991 a presentation by the ESPRIT group “CHARME” led to the creation of the conference on Correct Hardware Design and Verification Methods (CHARME). For several years, CHARME alternated with the conference on Theorem Provers in Circuit Design (TPCD), which evolved into FMCAD. Traditionally, FMCAD and CHARME are held in alternate years on different continents.

Correct Hardware Design and Verification Methods (CHARME)
1993  Arles          Milne and Pierre (LNCS 683)
1995  Frankfurt      Eveking and Camurati (LNCS 987)
1997  Montreal       Li and Probst
1999  Bad Herrenalb  Kropf and Pierre (LNCS 1703)
2001  Livingston     Margaria and Melham (LNCS 2144)
Theorem Provers in Circuit Design (TPCD)
1992  Nijmegen       Boute, Melham, and Stavridou
1994  Bad Herrenalb  Kropf and Kumar (LNCS 901)

Formal Methods in Computer-Aided Design (FMCAD)
1996  San Jose  Camilleri and Srivas (LNCS 1166)
1998  San Jose  Gopalakrishnan and Windley (LNCS 1522)
2000  Austin    Hunt and Johnson (LNCS 1954)
The organizers are grateful to Intel, Motorola, Xilinx, and Synopsys for their financial sponsorship, which considerably eased the organization of the conference. Sandy Ellison and Kelli Dawson of Intel Meeting Services are to be thanked for their tireless effort; they kept us on an organized and orderly path.

Waterloo, Ontario
Portland, Oregon
November 2002

Mark D. Aagaard
John W. O’Leary
Conference Organization

John O’Leary (General Chair)
Mark Aagaard (Program Chair)
Program Committee Mark Aagaard (Canada) Dominique Borrione (France) Randal E. Bryant (USA) Jerry Burch (USA) Eduard Cerny (USA) Shiu-Kai Chin (USA) Ed Clarke (USA) David Dill (USA) Hans Eveking (Germany) Masahiro Fujita (Japan) Steven German (USA) Ganesh Gopalakrishnan (USA) Mike Gordon (UK) Susanne Graf (France) Kiyoharu Hamaguchi (Japan) Ravi Hosabettu (USA) Alan Hu (Canada) Warren Hunt (USA) Steve Johnson (USA)
Robert Jones (USA) Thomas Kropf (Germany) Andreas Kuehlmann (USA) John Launchbury (USA) Tim Leonard (USA) Andy Martin (USA) Ken McMillan (USA) Tom Melham (UK) Paul Miner (USA) John O’Leary (USA) Laurence Pierre (France) Carl Pixley (USA) David Russinoff (USA) Mary Sheeran (Sweden) Eli Singerman (Israel) Anna Slobodova (USA) Ranga Vemuri (USA) Matthew Wilding (USA) Jin Yang (USA)
Additional Reviewers Roy Armoni Ritwik Bhattacharya Jesse Bingham Annette Bunker Pankaj Chauhan Limor Fix Amit Goel
John Harrison Gila Kamhi James Kukula Shuvendu Lahiri Madhubanti Mukherjee Rajesh Radhakrishnan Sanjit Seshia
Ali Sezgin Robert de Simone Subramanyan Siva Ofer Strichman Rob Sumners Vijay Sundaresan
Table of Contents
Abstraction

Abstraction by Symbolic Indexing Transformations ........................... 1
  Thomas F. Melham, Robert B. Jones

Counter-Example Based Predicate Discovery in Predicate Abstraction ........ 19
  Satyaki Das, David L. Dill

Automated Abstraction Refinement for Model Checking Large State Spaces
Using SAT Based Conflict Analysis ......................................... 33
  Pankaj Chauhan, Edmund Clarke, James Kukula, Samir Sapra, Helmut Veith,
  Dong Wang

Symbolic Simulation

Simplifying Circuits for Formal Verification Using Parametric
Representation ............................................................ 52
  In-Ho Moon, Hee Hwan Kwak, James Kukula, Thomas Shiple, Carl Pixley

Generalized Symbolic Trajectory Evaluation — Abstraction in Action ........ 70
  Jin Yang, Carl-Johan H. Seger

Model Checking: Strongly-Connected Components

Analysis of Symbolic SCC Hull Algorithms .................................. 88
  Fabio Somenzi, Kavita Ravi, Roderick Bloem

Sharp Disjunctive Decomposition for Language Emptiness Checking .......... 106
  Chao Wang, Gary D. Hachtel

Microprocessor Specification and Verification

Relating Multi-step and Single-Step Microprocessor Correctness
Statements ............................................................... 123
  Mark D. Aagaard, Nancy A. Day, Meng Lou

Modeling and Verification of Out-of-Order Microprocessors in UCLID ....... 142
  Shuvendu K. Lahiri, Sanjit A. Seshia, Randal E. Bryant
Decision Procedures

On Solving Presburger and Linear Arithmetic with SAT ..................... 160
  Ofer Strichman

Deciding Presburger Arithmetic by Model Checking and Comparisons with
Other Methods ............................................................ 171
  Vijay Ganesh, Sergey Berezin, David L. Dill

Qubos: Deciding Quantified Boolean Logic Using Propositional
Satisfiability Solvers ................................................... 187
  Abdelwaheb Ayari, David Basin

Model Checking: Reachability Analysis

Exploiting Transition Locality in the Disk Based Murϕ Verifier ........... 202
  Giuseppe Della Penna, Benedetto Intrigila, Enrico Tronci,
  Marisa Venturini Zilli

Traversal Techniques for Concurrent Systems .............................. 220
  Marc Solé, Enric Pastor

Model Checking: Fixed Points

A Fixpoint Based Encoding for Bounded Model Checking ..................... 238
  Alan Frisch, Daniel Sheridan, Toby Walsh

Using Edge-Valued Decision Diagrams for Symbolic Generation of
Shortest Paths ........................................................... 256
  Gianfranco Ciardo, Radu Siminiceanu

Verification Techniques and Methodology

Mechanical Verification of a Square Root Algorithm Using Taylor’s
Theorem .................................................................. 274
  Jun Sawada, Ruben Gamboa

A Specification and Verification Framework for Developing Weak Shared
Memory Consistency Protocols ............................................. 292
  Prosenjit Chatterjee, Ganesh Gopalakrishnan

Model Checking the Design of an Unrestricted, Stuck-at Fault Tolerant,
Asynchronous Sequential Circuit Using SMV ................................ 310
  Meine van der Meulen
Hardware Description Languages

Functional Design Using Behavioural and Structural Components ............ 324
  Richard Sharp

Compiling Hardware Descriptions with Relative Placement Information for
Parametrised Libraries ................................................... 342
  Steve McKeever, Wayne Luk, Arran Derbyshire

Prototyping and Synthesis

Input/Output Compatibility of Reactive Systems ........................... 360
  Josep Carmona, Jordi Cortadella

Smart Play-out of Behavioral Requirements ................................ 378
  David Harel, Hillel Kugler, Rami Marelly, Amir Pnueli

Author Index ............................................................. 399
Abstraction by Symbolic Indexing Transformations

Thomas F. Melham (1) and Robert B. Jones (2)

(1) Department of Computing Science, University of Glasgow, Glasgow, Scotland, G12 8QQ.
(2) Strategic CAD Labs, Intel Corporation, JF4-211, 2511 NE 25th Avenue, Hillsboro, OR 97124, USA.
Abstract. Symbolic indexing is a data abstraction technique that exploits the partially-ordered state space of symbolic trajectory evaluation (STE). Use of this technique has been somewhat limited in practice because of its complexity. We present logical machinery and efficient algorithms that provide a much simpler interface to symbolic indexing for the STE user. Our logical machinery also allows correctness assertions proved by symbolic indexing to be composed into larger properties, something previously not possible.
1 Introduction
Symbolic trajectory evaluation (STE) is an efficient model checking algorithm especially suited to verifying properties of large datapath designs [1]. STE is based on symbolic ternary simulation [2], in which the Boolean data domain {0, 1} is extended to a partially-ordered state space by the addition of an unknown value ‘X’. This gives circuit models in STE a built-in and flexible data abstraction hierarchy. Symbolic indexing is a technique for formulating STE logic formulas in a way that exploits this partially-ordered state space and reduces the number of BDD variables needed to verify a property. The method can make a dramatic difference in the time and space needed to check a formula, and can be used to verify circuit properties that are infeasible to verify directly [3]. Although symbolic indexing has been known for a long time [4], our experience is that it is not exploited nearly as often as it is applicable. In part, this is because only limited user-level support has been available in libraries provided to verification engineers. But, more importantly, correctness assertions proved by symbolic indexing are not formulated in a way that makes them composable at higher levels. Two formulas written using symbolic indexing might express two circuit properties that imply some desired result but encode these properties using incompatible indexing schemes. Moreover, there is no explicit characterization of the conditions under which more composable formulas can be derived from the indexed ones.
This paper describes some logical machinery aimed at bridging these gaps. We present an algorithm to transform ordinary verification problems into symbolically indexed form, together with an account of the side-conditions that must hold for this transformation to be sound. We also describe how the algorithm can be applied in the presence of environmental constraints, an important consideration in practice. Finally, we provide some experimental results on a CAM (content-addressable memory) circuit. The work presented in this paper does not completely automate the use of symbolic indexing in the verification flow. Our algorithms require the user to supply an indexing relation that expresses the desired abstraction scheme; we do not provide a method whereby an effective indexing relation can be discovered in the first place. Our results do, however, guarantee the soundness, subject to certain well-characterized side-conditions, of using an indexing relation to transform a verification property. This key result paves the way for future work on automatic abstraction techniques for STE, in which an attempt might be made to discover suitable indexing relations automatically.
2 STE Model Checking

Symbolic trajectory evaluation [1] is an efficient model checking algorithm especially suited to verifying properties of large datapath designs. The most basic form of STE works on a very simple linear-time temporal logic, limited to implications between formulas built from only conjunction and the next-time operator. STE is based on ternary simulation [2], in which the Boolean data domain {0, 1} is extended with a third value ‘X’ that stands for an indeterminate value (‘0’ or ‘1’). This provides STE with powerful state-space abstraction capabilities, as will be illustrated subsequently. While the basic STE logic is weak, its expressive power is greatly extended by implementing a symbolic ternary simulation algorithm. Symbolic ternary simulation [4] uses BDDs [5] to represent classes of data values on circuit nodes. With this representation, STE can combine many (ternary) simulation runs—one for each assignment of values to the BDD variables—into a single symbolic simulation run covering them all. In this section, we provide a brief overview of STE model checking theory. A full account of the theory can be found in [1] and an alternative perspective in [6].

2.1 Circuit Models
Symbolic trajectory evaluation employs a ternary data model with values drawn from the set D = {0, 1, X}. A partial order relation ≤ is introduced, with X ≤ 0 and X ≤ 1:

    0   1
     \ /
      X
This orders values by information content: X stands for an unknown value and so is ordered below 0 and 1. We suppose there is a set of nodes, N, naming observable points in circuits. A state is an instantaneous snapshot of circuit behavior given by assigning a value in D to every circuit node in N. The ordering ≤ on D is extended pointwise to get an ordering on states. We wish this to form a complete lattice, and so introduce a special ‘top’ state, ⊤, and define the set of states S to be (N→D) ∪ {⊤}. The required ordering is then defined for states s1, s2 ∈ S by

    s1 ⊑ s2  =  s2 = ⊤, or s1, s2 ∈ N→D and s1(n) ≤ s2(n) for all n ∈ N

The intuition is that if s1 ⊑ s2, then s1 may have ‘less information’ about node values than s2, i.e. it may have Xs in place of some 0s and 1s. If one considers the three-valued ‘states’ s1 and s2 as constraints or predicates on the actual, i.e. Boolean, state of the hardware, then s1 ⊑ s2 means that every Boolean state that satisfies s1 also satisfies s2. We say that s1 is ‘weaker than’ s2. (Strictly speaking, ⊑ is reflexive and we really mean ‘no stronger than’, but it is common to be somewhat inexact and just say ‘weaker than’.) The top value ⊤ represents the unsatisfiable constraint. The join operator on pairs of states in the lattice is denoted by ‘⊔’. To model dynamic behavior, a sequence of the values that occur on circuit nodes over time will be represented by a function σ ∈ N→S from time (the natural numbers N) to states. Such a function, called a sequence, assigns a value in D to each node at each point in time. For example, σ 3 reset is the value present on the reset node at time 3. We lift the ordering on states pointwise to sequences in the obvious way:

    σ1 ⊑ σ2  =  σ1(t) ⊑ σ2(t) for all t ∈ N
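The ordering and join just introduced can be prototyped directly. The following Python sketch is ours, not from the paper: it models D as the strings '0', '1', 'X', states as dictionaries over a fixed node set, and the top state ⊤ as None.

```python
# Ternary values: 'X' (unknown), '0', '1'; a state maps node names to values.
TOP = None  # the unsatisfiable 'top' state

def leq_val(a, b):
    """a <= b in the information ordering: X lies below both 0 and 1."""
    return a == b or a == 'X'

def leq_state(s1, s2):
    """Pointwise ordering on states (same node set assumed), TOP above all."""
    if s2 is TOP:
        return True
    if s1 is TOP:
        return False
    return all(leq_val(s1[n], s2[n]) for n in s1)

def join_val(a, b):
    """Least upper bound of two ternary values; 0 join 1 is inconsistent."""
    if a == 'X':
        return b
    if b == 'X' or a == b:
        return a
    return 'top'  # marker: the join of the whole state collapses to TOP

def join_state(s1, s2):
    """Join of two states; TOP if any node receives conflicting values."""
    if s1 is TOP or s2 is TOP:
        return TOP
    out = {n: join_val(s1[n], s2[n]) for n in s1}
    return TOP if 'top' in out.values() else out
```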
One convenient operation, used later in stating the semantics of STE, is taking the ith suffix of a sequence. The ith suffix of a sequence σ is written σ i and defined by σ i t = σ (t+i) for all t ∈ N.
The suffix operation σ i simply shifts the sequence σ forward i points in time, ignoring the states at the first i time units. In symbolic trajectory evaluation, the formal model of a circuit c is given by a next-state function Yc ∈ S → S that maps states to states. Intuitively, the next-state function expresses a constraint on the real, Boolean states into which the circuit may go, given a constraint on the current Boolean state it is in. The next-state function must be monotonic and a requirement for implementations of STE is that they extract a next-state function that has this property from the circuit under analysis.¹

¹ In practice, the circuit model Yc is constructed on-the-fly by ternary symbolic simulation of a netlist description of the circuit c.
A sequence σ is said to be a trajectory of a circuit if it represents a set of behaviors that the circuit could actually exhibit. That is, the set of behaviors that σ represents (i.e. possibly using unknowns) is a subset of the Boolean behaviors that the real circuit can exhibit (where there are no unknowns). For a circuit c, we define the set of all its trajectories, T (c), as follows:
T(c) = {σ | Yc(σ t) ⊑ σ(t+1) for all t ∈ N}

For a sequence σ to be a trajectory, the result of applying Yc to any state must be no more specified (with respect to the ordering) than the state at the next moment of time. This ensures that σ is consistent with the circuit model Yc.
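For a finite prefix of a sequence, this condition can be checked directly; the sketch below is ours, reuses leq_state from the earlier sketch, and assumes Y is the circuit's monotonic ternary next-state function.

```python
def is_trajectory(sigma, Y, depth):
    """Check the trajectory condition Y(sigma[t]) <= sigma[t+1] on a prefix.
    sigma is a list of states (dicts); Y maps a state to a state."""
    return all(leq_state(Y(sigma[t]), sigma[t + 1]) for t in range(depth - 1))
```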
2.2 Trajectory Evaluation Logic
One of the keys to the efficiency of STE and its success with datapath circuits is its restricted temporal logic. A trajectory formula is a simple linear-time temporal logic formula with the following syntax:

    f, g :=  n is 0    –  node n has value 0
          |  n is 1    –  node n has value 1
          |  f and g   –  conjunction of formulas
          |  P → f     –  f is asserted only when P is true
          |  N f       –  f holds in the next time step
where f and g range over formulas, n ∈ N ranges over the nodes of the circuit, and P is a propositional formula (‘Boolean function’) called a guard. The basic trajectory formulas ‘n is 0’ and ‘n is 1’ say that the node n has value 0 or value 1, respectively. The operator and forms the conjunction of trajectory formulas. The trajectory formula P → f weakens the subformula f by requiring it to be satisfied only when the guard P is true. Finally, Nf says that the trajectory formula f holds in the next point of time. Guards are the only place that variables may occur in the primitive definition of trajectory formulas. At first sight, this seems to rule out assertions such as ‘node n has value b’, where b is a variable. But the following syntactic sugar allows variables—indeed any propositional formula—to be associated with a node:
n is P  =  P → (n is 1) and ¬P → (n is 0)

where n ∈ N ranges over nodes and P ranges over propositional formulas. The definition of when a sequence σ satisfies a trajectory formula f is now given. Satisfaction is defined with respect to an assignment φ of Boolean truth-values to the variables that appear in the guards of the formula:

    φ, σ |= n is 0   =  σ(0) = ⊤, or σ(0) ∈ N→D and σ 0 n = 0
    φ, σ |= n is 1   =  σ(0) = ⊤, or σ(0) ∈ N→D and σ 0 n = 1
    φ, σ |= f and g  =  φ, σ |= f and φ, σ |= g
    φ, σ |= P → f    =  φ |= P implies φ, σ |= f
    φ, σ |= N f      =  φ, σ¹ |= f
where φ |= P means that the propositional formula P is satisfied by the assignment φ of truth-values to the Boolean variables in P. The key feature of this logic is that for any trajectory formula f and assignment φ, there exists a unique weakest sequence that satisfies f. This sequence is called the defining sequence for f and is written [f]φ. It is defined recursively as follows:

    [m is 0]φ t   =  λn. 0 if m=n and t=0, otherwise X
    [m is 1]φ t   =  λn. 1 if m=n and t=0, otherwise X
    [f and g]φ t  =  ([f]φ t) ⊔ ([g]φ t)
    [P → f]φ t    =  [f]φ t if φ |= P, otherwise λn. X
    [Nf]φ t       =  [f]φ (t−1) if t≠0, otherwise λn. X
The crucial property enjoyed by this definition is that [f]φ is the unique weakest sequence that satisfies f for the given φ. That is, for any φ and σ, φ, σ |= f if and only if [f]φ ⊑ σ. The algorithm for STE is also concerned with the weakest trajectory that satisfies a particular formula. This is the defining trajectory for a formula, written [[f]]φ. It is defined by the following recursive calculation:
    [[f]]φ 0      =  [f]φ 0
    [[f]]φ (t+1)  =  [f]φ (t+1) ⊔ Yc([[f]]φ t)
The defining trajectory of a formula f is its defining sequence with the added constraints on state transitions imposed by the circuit, as modeled by the next-state function Yc. It can be shown that [[f]]φ is the unique weakest trajectory that satisfies f.
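The two recursive definitions above translate almost literally into code. The sketch below is ours and reuses join_val and join_state from the earlier sketch in Section 2.1; trajectory formulas are encoded as nested tuples and guards as Python predicates over the assignment φ.

```python
# Illustrative formula encoding (not the paper's):
#   ('is0', n) | ('is1', n) | ('and', f, g) | ('when', P, f) | ('next', f)
# where P is a function from an assignment phi to a bool.

def defseq(f, phi, t, n):
    """Value of node n at time t in the defining sequence [f]_phi."""
    tag = f[0]
    if tag == 'is0':
        return '0' if f[1] == n and t == 0 else 'X'
    if tag == 'is1':
        return '1' if f[1] == n and t == 0 else 'X'
    if tag == 'and':
        return join_val(defseq(f[1], phi, t, n), defseq(f[2], phi, t, n))
    if tag == 'when':
        return defseq(f[2], phi, t, n) if f[1](phi) else 'X'
    if tag == 'next':
        return defseq(f[1], phi, t - 1, n) if t != 0 else 'X'
    raise ValueError(f'unknown formula tag {tag}')

def deftraj(f, phi, Y, nodes, depth):
    """First `depth` states of the defining trajectory [[f]]_phi under the
    next-state function Y (the corner case where a state collapses to the
    top value is ignored here for brevity)."""
    states = []
    for t in range(depth):
        st = {n: defseq(f, phi, t, n) for n in nodes}
        if t > 0:
            st = join_state(Y(states[-1]), st)
        states.append(st)
    return states
```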
2.3 Symbolic Trajectory Evaluation
Circuit correctness in symbolic trajectory evaluation is stated with trajectory assertions of the form A ⇒ C, where A and C are trajectory formulas. The intuition is that the antecedent A provides stimuli to circuit nodes and the consequent C specifies the values expected on circuit nodes as a response. A trajectory assertion is true for a given assignment φ of Boolean values to the variables in its guards exactly when every trajectory of the circuit that satisfies the antecedent also satisfies the consequent. For a given circuit c, we define φ |= A ⇒ C to mean that for all σ ∈ T(c), if φ, σ |= A then φ, σ |= C. The notation |= A ⇒ C means that φ |= A ⇒ C holds for all φ. The fundamental theorem of trajectory evaluation [1] follows immediately from the previously-stated properties of [f]φ and [[f]]φ. It states that for any φ, the trajectory assertion φ |= A ⇒ C holds exactly when [C]φ ⊑ [[A]]φ. The intuition is that the sequence characterizing the consequent must be ‘included in’ the weakest sequence satisfying the antecedent that is also consistent with the circuit.
This theorem gives a model-checking algorithm for trajectory assertions: to see if φ |= A ⇒ C holds for a given φ, just compute [C]φ and [[A]]φ and compare them point-wise for every circuit node and point in time. This works because both A and C will have only a finite number of nested next-time operators N, and so only finite initial segments of the defining trajectory and defining sequence need to be calculated and compared. Much of the practical utility of STE comes from the key observation that it is possible to compute [C]φ ⊑ [[A]]φ not just for a specific φ, but as a symbolic constraint on an arbitrary φ. This constraint takes the form of a propositional formula (e.g. a BDD) which is true exactly for variable assignments φ for which [C]φ ⊑ [[A]]φ holds. Such a constraint is called a residual, and represents precisely the conditions under which the property A ⇒ C is true of the circuit.
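For a single, fixed assignment φ this check can be phrased directly in terms of the earlier sketches (defseq, deftraj and leq_val); the real algorithm instead computes the residual symbolically over all φ with BDDs. A rough illustration, ours only:

```python
def ste_holds(A, C, phi, Y, nodes, depth):
    """Check phi |= A ==> C on a finite prefix: the defining sequence of C
    must lie below the defining trajectory of A at every node and time."""
    traj = deftraj(A, phi, Y, nodes, depth)
    return all(leq_val(defseq(C, phi, t, n), traj[t][n])
               for t in range(depth) for n in nodes)
```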
3 Symbolic Indexing in STE
Two important properties follow from the STE theory just presented. Consider an STE assertion A ⇒ C. Suppose we replace the antecedent A with a new antecedent B that has a defining sequence no stronger than that of A (i.e. [B]φ ⊑ [A]φ for all φ). Then by monotonicity of the underlying circuit model we will also have that [[B]]φ ⊑ [[A]]φ for all φ. Hence if we can prove |= B ⇒ C, then the original STE assertion |= A ⇒ C also holds. This is called antecedent weakening. Likewise, if we replace the consequent C with a new consequent D that has a defining sequence no weaker than that of C (i.e. [C]φ ⊑ [D]φ for all φ) and we can prove |= A ⇒ D, then the original STE assertion |= A ⇒ C also holds. This is called consequent strengthening. Symbolic indexing is the systematic use of antecedent weakening to perform data abstraction for certain circuit structures. It exploits the partially-ordered state space of STE to reduce the complexity of the BDDs needed to verify a circuit property. Intuitively, symbolic indexing is a way to use BDD variables only ‘when needed’. The idea can be illustrated using the following trivial example. Consider a three-input AND gate with inputs i1, i2, i3 and output o. With direct use of STE, an assertion that could be used to verify this device is

    |= (i1 is a) and (i2 is b) and (i3 is c) ⇒ (o is a ∧ b ∧ c)    (1)
In primitive form, this would be expressed as follows:

    |=  ¬a → (i1 is 0) and a → (i1 is 1) and
        ¬b → (i2 is 0) and b → (i2 is 1) and
        ¬c → (i3 is 0) and c → (i3 is 1)
    ⇒  ¬a ∨ ¬b ∨ ¬c → (o is 0) and a ∧ b ∧ c → (o is 1)    (2)
The strategy here is to place unique and unconstrained Boolean variables on each input node in the device, and symbolically simulate the circuit to check that the desired function of these variables will appear on the output node. STE’s unknown value X allows us to reduce the number of variables needed to verify the desired property. Because of the functionality of the AND gate, only the four cases enumerated in the table below need to be verified:

    case  i1  i2  i3  o
     0    0   X   X   0
     1    X   0   X   0
     2    X   X   0   0
     3    1   1   1   1

If at least one of the AND inputs is 0, the output will be 0 regardless of the values on the other two inputs. In these cases, X may be used to represent the unknown value on the other two input nodes. If all three inputs are 1, then the output is 1 as well. Antecedent weakening, and the fact that the four cases enumerated above cover all input patterns of 0s and 1s, means this is sufficient for a complete verification. Symbolic indexing is the technique of using Boolean variables to enumerate or ‘index’ groups of cases in this efficient way. For the AND gate, there are just four cases to check, so these can be indexed with two Boolean variables, say p and q. These cases can then be verified simultaneously with STE by checking the following trajectory assertion:

    |=  ¬p ∧ ¬q → (i1 is 0) and p ∧ q → (i1 is 1) and
        p ∧ ¬q → (i2 is 0) and p ∧ q → (i2 is 1) and
        ¬p ∧ q → (i3 is 0) and p ∧ q → (i3 is 1)
    ⇒  ¬p ∨ ¬q → (o is 0) and p ∧ q → (o is 1)    (3)
If this formula is true, then we have definitive—but somewhat indirectly stated— formal evidence that the AND gate does what is required. Antecedent weakening says that whenever (3) allows an input circuit node to be X, that node could have been set to either 0 or 1 and the input/output relation verified would still hold. It can also be established by inspection of the cases enumerated in the antecedent that the given combinations of explicit constant 0s and 1s and implicit Xs covers the whole input space. This (informal) reasoning tells us that the indexed formula (3) amounts to a complete verification of the expected behavior. The advantage of symbolic indexing is that it reduces the number of Boolean variables needed to verify a property. In the AND gate the reduction is trivial— two variables instead of three. But much greater reductions are possible in real applications, and there are certainly circuits that can be verified in STE by indexing but cannot be verified directly. Memory structures are one notable example that arise frequently.
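The informal coverage argument can also be checked mechanically for this small example. The sketch below is ours and purely illustrative: it records each indexed case's antecedent, with 'X' marking an unconstrained input, and confirms that every Boolean input pattern falls under at least one case.

```python
from itertools import product

# The four indexed cases for the AND gate, keyed by the index values (p, q).
cases = {
    (False, False): {'i1': '0', 'i2': 'X', 'i3': 'X'},
    (True,  False): {'i1': 'X', 'i2': '0', 'i3': 'X'},
    (False, True):  {'i1': 'X', 'i2': 'X', 'i3': '0'},
    (True,  True):  {'i1': '1', 'i2': '1', 'i3': '1'},
}

def covered(a, b, c):
    """Does some indexed case apply to the concrete input pattern (a, b, c)?"""
    val = {'i1': a, 'i2': b, 'i3': c}
    return any(all(v == 'X' or v == val[n] for n, v in ant.items())
               for ant in cases.values())

# Every Boolean input pattern is covered by at least one indexed case.
assert all(covered(a, b, c) for a, b, c in product('01', repeat=3))
```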
4 Indexing Transformations
The technical contribution of this paper addresses two problems with using symbolic indexing in practice. First, how can we gain the efficiency of symbolic indexing and yet still obtain properties that make direct, non-indexed statements about circuit correctness? Second, what side conditions must hold to ensure the soundness of such a process? We show how to construct indexed STE assertions from direct ones, given a user-supplied specification of the indexing scheme to be employed. For example, applying the method to the AND gate formula (1) above produces the indexed formula (3). This provides an accessible interface to the indexing technique. The user no longer needs to generate indexed antecedents and consequents explicitly, but can describe the indexing scheme abstractly and let a computer program construct the correct indexed formulas. Moreover, if the resulting indexed assertions are proven true, then the original assertion is also true by construction (subject to a certain side condition). This means that the original assertion can subsequently be used in higher-level reasoning. For example, it might be composed via theorem proving with other assertions verified using a different indexing scheme.

4.1 Indexing Relations
The user’s interface to our indexing method is an indexing relation that specifies the indexing scheme to be applied to the problem at hand. The relation is a propositional logic formula of the form R(xs, ts). It relates the Boolean variables ts appearing in the original problem and the Boolean variables xs that will index the cases being grouped together in the abstraction. The original problem variables ts are called the index target variables and the variables to be introduced xs are called the index variables. For the AND gate, the index targets are a, b, c and the index variables are p and q. The indexing relation R is:

    R(p, q, a, b, c)  ≡  (¬p ∧ ¬q ⊃ ¬a) ∧ (p ∧ ¬q ⊃ ¬b) ∧ (¬p ∧ q ⊃ ¬c) ∧ (p ∧ q ⊃ a ∧ b ∧ c)

As can be seen, this relation represents in logical form an enumeration of the four cases in the table of Section 3. Note that the indexing relation is not one-to-one (though other indexing relations may be). This reflects the Xs that appear in the table in Section 3, and indeed is essential to making the indexing a data abstraction at all.
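As a small illustration (ours, not part of the paper, and assuming the four cases of the table in Section 3), the AND-gate relation can be transcribed directly into a Boolean function, one conjunct per row of the table:

```python
def R(p, q, a, b, c):
    """AND-gate indexing relation: (p, q) are index variables, (a, b, c) targets."""
    implies = lambda x, y: (not x) or y
    return (implies(not p and not q, not a) and
            implies(p and not q, not b) and
            implies(not p and q, not c) and
            implies(p and q, a and b and c))
```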
4.2 Preimage and Strong Preimage
It is convenient to specify two operations on predicates using an indexing relation. The first is the ordinary preimage operation. Given a relation R and a predicate P on the target variables, the preimage PR is defined by
PR = ∃ts. R(xs, ts) ∧ P (ts)
The second is the strong preimage of a predicate. Given a relation R and a predicate P on the target variables, the strong preimage P R is defined by P R = PR ∧ ¬ [∃ts. R(xs, ts) ∧ ¬P (ts)]
That is, the strong preimage P R (xs) holds of some index xs precisely when xs is in the preimage of P and not in the preimage of the negation of P.
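Both operations have a direct, if naive, reading as set computations over explicit truth assignments; a BDD-based implementation does the same thing symbolically. The sketch below is ours, with the relation and predicates taken to be Python functions over Boolean tuples.

```python
from itertools import product

BOOLS = [False, True]

def preimage(R, P, nxs, nts):
    """P_R = {xs : exists ts. R(xs, ts) and P(ts)}, by brute-force enumeration."""
    return {xs for xs in product(BOOLS, repeat=nxs)
            if any(R(*xs, *ts) and P(*ts) for ts in product(BOOLS, repeat=nts))}

def strong_preimage(R, P, nxs, nts):
    """The preimage of P minus the preimage of the negation of P."""
    return preimage(R, P, nxs, nts) - preimage(R, lambda *ts: not P(*ts), nxs, nts)

# With the AND-gate relation above, strong_preimage(R, lambda a, b, c: a and b and c, 2, 3)
# is {(True, True)}, matching the guard of 'o is 1' in assertion (3).
```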
xs
1111 0000 0000 1111 0000 1111 0000 1111 PR
←R→
ts P
¬PR
¬P Fig. 1. Index Relation Preimages
PR of P and the dotted circle the preimage (¬P )R of the negation of P . The strong preimage P R is the shaded region—i.e. that part of PR that does not also lie within (¬P )R . 4.3
Transforming STE Formulas with Indexing Relations
Our indexing transformation for an STE assertion A ⇒ C applies the strong preimage operation to the guards of the antecedent A and the preimage operation to the guards of the consequent C. For given trajectory formula f and indexing relation R, we write fR for the preimage of f under R and f R for the strong preimage of f under R. The definitions of these operations are given by recursion over the syntax of trajectory formulas in the obvious way: (n is 0)R (n is 1)R (f and g)R (P → f )R (N f )R
= n is 0 = n is 1 = f R and g R = PR → fR = N fR
(n is 0)R (n is 1)R (f and g)R (P → f )R (N f )R
= n is 0 = n is 1 = fR and gR = PR → fR = N fR
Two theorems about the preimage and strong preimage operations on trajectory formulas are used in the sequel. The first is that applying the strong
10
T.F. Melham and R.B. Jones
preimage of an indexing relation to the guards of an STE formula is a weakening operation: Theorem 1 For all R, f and φ, if φ |= R, then [f R ]φ [f ]φ . This is really the core of our abstraction transformation. Taking the strong preimage under an indexing relation can strictly weaken the guards of the formula by ‘subtracting out’ the indexes of cases in which the guard can be false. This achieves an abstraction by introducing Xs into the defining sequence of the formula. The second theorem is that applying the preimage of an indexing relation to the guards of an STE formula is a strengthening operation: Theorem 2 For all R, f and φ, if φ |= R, then [f ]φ [fR ]φ . Each of these theorems follows by a straightforward induction on the structure of the trajectory formula f . 4.4
Transforming STE Assertions with Indexing Relations
The theorems just cited, combined with the STE antecedent weakening and consequent strengthening properties of Section 2, allow an arbitrary property A ⇒ C to be indexed by an indexing relation R. Intuitively, we can use an indexing scheme to weaken the antecedent by grouping some of its separate Boolean input configurations using Xs (thereby assuming less about circuit behavior). If we use the same indexing to strengthen the consequent, and the resulting STE assertion holds, then we can also conclude the original STE assertion. To guarantee soundness, a technical side condition must be satisfied—namely that the indexing scheme R completely ‘covers’ the target variables: ∀ts. ∃xs. R(xs, ts)
(4)
This says that for any values of the target variables ts (the variables that appear in A and C), there is an assignment to the index variables xs that indexes it. This condition ensures that every verification case included in the original problem is also covered in the indexed verification—which is clearly necessary, for otherwise the indexed verification would be incomplete. Before considering the soundness of our transformation, we introduce a notation for the truth of a trajectory formula under a propositional assumption about its Boolean variables. If P is a propositional Boolean formula (for example an indexing relation) and A ⇒ C a trajectory assertion, we write P |= A ⇒ C to mean that for any valuation φ for which φ |= P , we have that φ |= A ⇒ C. Informally, we are saying that A ⇒ C is true whenever the condition P holds. More detail on how such an assertion can be checked in practice is given in Section 5.1. Soundness of our abstraction transformation is given by the following theorem.
Abstraction by Symbolic Indexing Transformations
11
Theorem 3 If we can show that R(xs, ts) |= AR ⇒ CR and the indexing relation coverage condition ∀ts. ∃xs. R(xs, ts) holds, then we may conclude |= A ⇒ C. Proof. By the following derivation. 1. 2. 3. 4. 5. 6.
R(xs, ts) |= AR ⇒ CR R(xs, ts) |= A ⇒ CR R(xs, ts) |= A ⇒ C ∃xs. R(xs, ts) |= A ⇒ C ∀ts. ∃xs. R(xs, ts) |= A ⇒ C
[assumption] [1 and Theorem (1)] [2 and Theorem (2)] [3, because xs do not appear in A or C] [side condition] [4 and 5]
Note that although the variables ts do not appear in the trajectory assertion AR ⇒ CR of line 1, the variables xs do. The condition given by R(xs, ts) is therefore significant to verification of this assertion. Indeed in this context it is equivalent to ∃ts.R(xs, ts), which restricts the verification to values of xs that actually do index something. If the STE algorithm produces a residual when checking the formula shown in line 1, then this will of course be given in terms of the index variables rather than the target variables from the original problem. The user must therefore analyze the residual by taking its image under the indexing relation, mapping it back into the original target variables for inspection there.
5
Indexing under Environmental Constraints
Few verifications take place in isolation from complex environmental and other operating assumptions. In this section, we extend our indexing algorithm to incorporate such conditions. We present two methods for indexing under an environmental constraints. The first is the simpler option, and requires little or no user intervention. The second is an alternative that can be applied to certain problems for which the direct approach is infeasible. Both methods use the technique of parametric representation of environmental constraints, which we now briefly introduce. 5.1
Parametric Representation
The parametric representation of Boolean predicates is useful for restricting verification to a care set and for reducing complexity by input-space decomposition [7,8,9]. The technique is independent of the symbolic simulation algorithm in STE, does not require modifications to the circuit, and can be used to constrain both input and internal signals. Consider a Boolean predicate P that constrains input and state variables vs. Suppose we express the required behavior of the circuit as a trajectory assertion A ⇒ C over the same variables, but expect this assertion to hold only under the constraint P . That is, we wish to establish that P |= A ⇒ C. One way of
12
T.F. Melham and R.B. Jones
doing this is to use STE to obtain a residual from φ |= A ⇒ C and then check that P implies this. But this is usually not practical; the complexity of directly computing φ |= A ⇒ C with a symbolic simulator is too great. A better way is to evaluate φ |= A ⇒ C only for those variable assignments φ that actually do satisfy P . The parametric representation does exactly this, by encoding the care predicate implicitly by means of parametric functions. Given a satisfiable P , we compute a vector of Boolean functions Qs = param(P, vs) that are substituted for the variables vs in the original trajectory assertion.2 These functions are constructed so that P |= A ⇒ C holds exactly when |= A[Qs/vs] ⇒ C[Qs/vs] holds. An algorithm for param and its correctness proof are found in [9]. Suppose M is an arbitrary expression—either a propositional logic formula or a trajectory formula—and P is a predicate over the variables vs appearing in M . We write ‘M [P ]’ for M [param(P, vs)/vs]. A complicating factor is that the parametric functions will, in general, contain fresh variables vs distinct from the original variables vs. When necessary, we will write M [P ](vs ) to emphasize the appearance of these in the resulting expression. 5.2
Method 1: Direct Parametric Encoding
We wish to apply an indexing relation R to a verification problem P |= A ⇒ C that includes a constraint P . With our first method, a fully automatic procedure uses the parametric representation to ‘fold’ the constraint P into both the trajectory assertion being checked and the relation R. Indexed verification then proceeds as before. Suppose we wish to check an STE assertion P |= A ⇒ C under an environmental constraint P and using an indexing relation R(xs, ts). First, we compute a parametrically-encoded STE assertion |= A[P ] ⇒ C[P ] and indexing relation R[P ]. We then just supply these to the symbolic indexing algorithm of Section 4. The soundness of the optimization provided by our transformation is justified as follows. Note that we also write the encoded indexing relation R[P ] as R[P ](xs, ts ), where ts are the fresh variables introduced by the parametric encoding process. Theorem 4 If R[P ](xs, ts ) |= A[P ]R[P ] ⇒ C[P ]R[P ] and the indexing relation coverage condition ∀ts . ∃xs. R[P ](xs, ts ) holds, then |= A ⇒ C. Proof. By the following derivation. 1. 2. 3. 4. 5. 2
[assumption] R[P ](xs, ts ) |= A[P ]R[P ] ⇒ C[P ]R[P ] [1 and Theorem (1)] R[P ](xs, ts ) |= A[P ] ⇒ C[P ]R[P ] R[P ](xs, ts ) |= A[P ] ⇒ C[P ] [2 and Theorem (2)] ∃xs. R[P ](xs, ts ) |= A[P ] ⇒ C[P ] [3, because xs do not appear in A or C] [side condition] ∀ts . ∃xs. R[P ](xs, ts )
As usual, we write f [Qs/vs] to denote the result of substituting Qs for all occurrences of vs (respectively) in a formula f .
Abstraction by Symbolic Indexing Transformations
6. |= A[P ] ⇒ C[P ] 7. P |= A ⇒ C
13
[4 and 5] [parametric theorem (see [8])]
As before, if the STE run that checks line 1 produces a non-trivial residual this must first be mapped back through the relation R[P ] to derive a residual in terms of the target variables of |= A[P ] ⇒ C[P ]. But these will, of course, be the fresh variables introduced by the parametric encoding, so we must also undo this encoding in turn to get back to the user’s variables of the original assertion A ⇒ C. 5.3
Method 2: Analyzing Indexed Residuals
While the method presented above is straightforward, it is often infeasible in practice to construct the parameterized indexing relation R[P ]. Our second method avoids this, while still allowing us to use a constraint predicate P . We initially run the STE model-checking algorithm on AR ⇒ CR . This will then produce a residual that describes the indexed situations under which the property holds. The predicate P is then itself indexed with R, to produce an indexed predicate PR . This is then checked to ensure it implies the indexed residual obtained from STE. This process is sound only for certain indexing relations R, and the main technical innovation here consists in identifying the required side conditions on R. The first side condition is similar to the coverage side condition (4) in Section 4.4. It requires the indexing relation to cover all values of the target variables that satisfy the constraint P : ∀ts. P (ts) ⊃ ∃xs. R(xs, ts)
(5)
The second side condition is new. It is that the preimage PR and the preimage (¬P )R must be disjoint, making PR = P R . The intuition for this condition is provided by considering Figure 1, where PR and (¬P )R overlap. We wish to index the condition P in order to check that it implies the residual—and we must do this by either taking the preimage PR or the strong preimage P R . If the preimage PR is selected, and there is an overlap, then false negatives may occur. Every point in the overlap will be included in the verification, but also maps via R to elements of ¬P , and the property may simply not hold for some of these ‘don’t care’ elements. On the other hand, false positives could occur if the strong preimage P R is selected. In this case, there may be points in P that are indexed only from points in the overlap area, but for which the verification property fails. The solution is to ban the overlap. One way to ensure PR = P R is to make the preimage (¬P )R empty. The following condition does this by restricting R from indexing anything in ¬P : ∀ts. (∃xs. R(xs, ts)) ⊃ P (ts)
(6)
If we choose an indexing relation R that exactly partitions P , ∀ts. P (ts) ≡ ∃xs. R(xs, ts) both side conditions are satisfied.
(7)
14
T.F. Melham and R.B. Jones
The soundness of the optimization provided by our transformation is justified as follows. Note again that we write R(xs, ts) as just ‘R’ when we do not need to emphasize the particular variables involved. Theorem 5 Let Q be the residual condition under which the model-checking assertion R(xs, ts) |= AR ⇒ CR holds. Suppose that ∀ts. P (ts) ≡ ∃xs. R(xs, ts) and that PR ⊃ Q. Then P |= A ⇒ C. Proof. By the following derivation. 1. 2. 3. 4. 5. 6. 7. 8. 9.
6
Q ∧ R(xs, ts) |= AR ⇒ CR PR ⊃ Q (∃ts. R(xs, ts) ∧ P (ts)) ⊃ Q P (ts) ∧ R(xs, ts) |= AR ⇒ CR P (ts) ∧ R(xs, ts) |= A ⇒ CR P (ts) ∧ R(xs, ts) |= A ⇒ C P (ts) ∧ ∃xs. R(xs, ts) |= A ⇒ C ∀ts. P (ts) ≡ ∃xs. R(xs, ts) P (ts) |= A ⇒ C
[assumption] [assumption] [2 and definition of PR ] [1 and 3, by logic] [4 and Theorem (1)] [5 and Theorem (2)] [6, because xs do not appear in A or C] [side conditions] [7 and 8, by logic]
Experimental Results
We have implemented the above algorithm as an experimental extension to Forte, a formal verification environment developed in Intel’s Strategic CAD Labs. Forte combines STE model checking with lightweight theorem proving in higher-order logic and has successfully been used in large-scale industrial trials on datapath-dominated hardware [10,11,12]. The implementation of our algorithm is highly optimized, to ensure that the cost of computing an indexed STE property does not exceed the benefit gained by the abstraction. As usual with symbolic treatment of relations in model-checking algorithms, the main computational overhead arises from the existential quantifier of the preimage. We use the common strategy of partitioning the indexing relation to allow early quantification. The implementation is also carefully engineered to eliminate redundant computations. One circuit structure we studied is the simple CAM shown in Figure 2. This compares a 64-bit query against the contents of an n-entry memory, producing a bit that indicates whether the query value is in the memory or not. CAM devices have previously been verified using symbolic indexing by Pandey et al. [3], who devised an indexing scheme with a logarithmic reduction in the number of variables needed—bringing an otherwise infeasible verification within reach of STE. Our experiments on CAMs showed that we could add our indexing transformation to get a verification of directly-stated CAM properties with acceptable computational overhead. As an example, we present results for the following simple property: if the query value is equal to the contents of one of the CAM memory entries, then the ‘hit’ output will be true.
[Fig. 2. Simple Content-Addressable Memory (CAM): an n-entry memory of 64-bit words, each compared against the query; the comparator outputs are combined into a single hit signal.]
The formalization of this property in STE involves the use of an environmental constraint to express the condition that the query is equal to one of the CAM entries. The verification therefore employs the methods of Section 5. Of course, this is not a complete characterization of correct behavior for the CAM device. However, it is typical of the kind of property for memory arrays that cannot be verified directly but that yields to the symbolic indexing technique. Figure 3 shows the CPU time required to verify this property for different numbers of entries in the CAM memory, from 4 up to 64. All runs were performed on a 400 MHz Intel Pentium® II processor running RedHat® Linux, and user time was determined with the system time command. The verification of this property by symbolic indexing, including our indexing transformation algorithm, is much faster than the best-known alternative, namely using the parametric representation to case-split on the location of the hit while simultaneously weakening other circuit nodes. The numbers reported are for the model-checking portions of the verification. Both approaches require similar amounts of deductive reasoning, namely coverage analysis for case splitting and the coverage side condition for symbolic indexing. As shown in Figure 4, our automatic indexing transformation did not add significant computational overhead to the indexed verification, a requirement for our technique to be feasible in practice. The computational overhead for our indexing algorithm is roughly constant at 50-60% of the total verification time.
7 Conclusions
We have presented algorithms that facilitate easier application of symbolic indexing in STE model checking. Our approach provides a simpler interface for the STE user, making it easier to include the technique in the verification flow. Our theoretical results also provide the logical foundation for composing multiple indexed results into larger properties.
[Fig. 3. Symbolic Indexing vs. Case Splitting: CPU time in seconds (log scale, roughly 0.1 to 100) against the number of CAM entries (4 to 64) for the case-splitting and symbolic-indexing verifications.]
The method allows us to transform an STE formula into the more efficiently-checkable indexed form, but still conclude the truth of the original formula. A top-level verification can, therefore, be decomposed into separate sub-properties that are verified under different, and possibly incompatible, indexing schemes. We have demonstrated the efficiency of an implementation of our algorithms by verifying a simple property of a CAM, a hardware structure commonly encountered in microprocessor designs. The indexing scheme applied in this example comes from past work by Pandey et al. [3]. Of course, the single property chosen as an illustration in Section 6 doesn’t provide a complete characterization of the desired behavior of a CAM.
[Fig. 4. Overhead of Automatic Indexing Algorithm: total verification time and the time spent in the indexing transformation alone (in seconds) against the number of CAM entries (4 to 64).]
Our contribution has been to show that we can both obtain the computational advantages of this indexing scheme and justifiably conclude a direct statement of the desired property—with negligible additional cost. Our algorithm requires a user-supplied abstraction scheme, presented formally as a Boolean relation. Of course the indexing scheme could also be provided as a set of (possibly overlapping) predicates over the target variables in the original formula. For example, the indexing scheme in Section 3 for the AND gate can also be given by the following set of predicates: {¬a, ¬b, ¬c, a ∧ b ∧ c}. These cover the whole input space and precisely characterize the four cases to be verified in terms of the ‘target’ variables in the original property. A formal indexing relation can just be an arbitrary enumeration of these predicates in terms of a suitable number of index variables and can easily be generated automatically. But this still leaves the problem of discovering the indexing scheme in the first place. Part of our current research is directed at finding techniques to automatically discover abstractions that can leverage the indexing algorithms presented here. Finally, we observe that our transformation is a pre-processing step for STE model checking. In this paper, we have assumed a BDD-based STE algorithm. But of course the data abstraction capability of STE’s partially-ordered state spaces is orthogonal to the propositional logic technology employed. It is therefore reasonable to suppose that our method would also work with STE algorithms based on SAT [13], provided the formula representation supports our preimage and strong preimage operations. It would also be very interesting to see how our algorithms could be applied to generalized STE [14], a promising new model checking method that combines the efficiency of STE’s partially-ordered state spaces with a much more expressive and flexible framework for stating properties.

Acknowledgments. We thank the anonymous referees for their careful reading of the paper and very helpful comments. John Harrison and Ashish Darbari also provided useful remarks on notation.
References

1. Seger, C.J.H., Bryant, R.E.: Formal verification by symbolic evaluation of partially-ordered trajectories. Formal Methods in System Design 6 (1995) 147–189
2. Bryant, R.E.: A methodology for hardware verification based on logic simulation. Journal of the ACM 38 (1991) 299–328
3. Pandey, M., Raimi, R., Bryant, R.E., Abadir, M.S.: Formal verification of content addressable memories using symbolic trajectory evaluation. In: ACM/IEEE Design Automation Conference, ACM Press (1997) 167–172
4. Bryant, R.E., Beatty, D.L., Seger, C.J.H.: Formal hardware verification by symbolic ternary trajectory evaluation. In: ACM/IEEE Design Automation Conference, ACM Press (1991) 397–402
5. Bryant, R.E.: Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers C-35 (1986) 677–691
6. Chou, C.T.: The mathematical foundation of symbolic trajectory evaluation. In Halbwachs, N., Peled, D., eds.: Computer Aided Verification (CAV). Volume 1633 of Lecture Notes in Computer Science, Springer-Verlag (1999) 196–207
7. Jain, P., Gopalakrishnan, G.: Efficient symbolic simulation-based verification using the parametric form of Boolean expressions. IEEE Transactions on Computer-Aided Design of Integrated Circuits 13 (1994) 1005–1015
8. Aagaard, M.D., Jones, R.B., Seger, C.J.H.: Formal verification using parametric representations of Boolean constraints. In: ACM/IEEE Design Automation Conference, ACM Press (1999) 402–407
9. Jones, R.B.: Applications of Symbolic Simulation to the Formal Verification of Microprocessors. PhD thesis, Department of Electrical Engineering, Stanford University (1999)
10. O’Leary, J.W., Zhao, X., Gerth, R., Seger, C.J.H.: Formally verifying IEEE compliance of floating-point hardware. Intel Technical Journal (First quarter, 1999) Available at developer.intel.com/technology/itj/.
11. Kaivola, R., Aagaard, M.D.: Divider circuit verification with model checking and theorem proving. In Aagaard, M., Harrison, J., eds.: Theorem Proving in Higher Order Logics. Volume 1869 of Lecture Notes in Computer Science, Springer-Verlag (2000) 338–355
12. Aagaard, M.D., Jones, R.B., Seger, C.J.H.: Combining theorem proving and trajectory evaluation in an industrial environment. In: ACM/IEEE Design Automation Conference, ACM Press (1998) 538–541
13. Bjesse, P., Leonard, T., Mokkedem, A.: Finding bugs in an Alpha microprocessor using satisfiability solvers. In Berry, G., Comon, H., Finkel, A., eds.: Computer Aided Verification (CAV). Volume 2102 of Lecture Notes in Computer Science, Springer-Verlag (2001) 454–464
14. Yang, J., Seger, C.J.H.: Introduction to generalized symbolic trajectory evaluation. In: Proceedings of 2001 IEEE International Conference on Computer Design. (2001) 360–365
Counter-Example Based Predicate Discovery in Predicate Abstraction

Satyaki Das and David L. Dill
Computer Systems Laboratory, Stanford University
[email protected], [email protected]
Abstract. The application of predicate abstraction to parameterized systems requires the use of quantified predicates. These predicates cannot be found automatically by existing techniques and are tedious for the user to provide. In this work we demonstrate a method of discovering most of these predicates automatically by analyzing spurious abstract counter-example traces. Since predicate discovery for unbounded state systems is an undecidable problem, it can fail on some problems. The method has been applied to a simplified version of the Ad hoc On-Demand Distance Vector Routing protocol where it successfully discovers all required predicates.
1 Introduction
Unbounded state systems have to be reasoned about to prove the correctness of a variety of real life systems including microprocessors, network protocols, software device drivers and security protocols. Predicate Abstraction is an efficient way of reducing these infinite state systems into more tractable finite state systems. A finite set of abstraction predicates defined on the concrete system is used to define the finite-state model of the system. The states of the abstract system consist of truth assignments to the set of abstraction predicates, that is each predicate is assigned a value of true or false. The abstraction is conservative, meaning that for any property proved on the abstract system, a concrete counterpart holds on the actual system. There are many hard problems that need to be solved to make predicate abstraction useful. The first is that the problem of proving arbitrary safety properties of a transition system is (obviously) undecidable. Given a pre-selected set of predicates and certain other assumptions, it is possible to prove in some cases that the system satisfies a safety property, but a failed proof may indicate that the property is violated, or simply that the abstraction is not sufficiently precise to complete the proof. Automating such
⋆ This work was supported by the National Science Foundation under grant number 0121403 and DARPA contract 00-C-8015. The content of this paper does not necessarily reflect the position or the policy of the Government and no official endorsement should be inferred.
proofs is quite difficult in practice, since it involves automatically solving logic problems that have high complexity and searching potentially large state spaces. In spite of the difficulty of this problem, there has been substantial progress towards solving it in the last few years.

Another problem is how to discover the appropriate set of predicates. In much of the work on predicate abstraction, the predicates were assumed to be given by the user, or they were extracted syntactically from the system description (for example, predicates that appear in conditionals are often useful). It is obviously difficult for the user to find the right set of predicates (indeed, it is a trial-and-error process involving inspecting failed proofs), and the predicates appearing in the system description are rarely sufficient. There has been less work, and less progress, on solving the problem of finding the right set of predicates. In addition to the challenge of finding a sufficient set of predicates, there is the challenge of avoiding irrelevant predicates, since the cost of checking the abstract system usually increases exponentially with the number of predicates.

In our system quantified predicates are used to deal with parameterized systems. In a parameterized system, it is often interesting (and necessary) to find properties that hold for all values of the parameter. For instance if a message queue is modeled as an array and rules parameterized by the array index are used to deliver messages then the absence of certain kinds of messages is expressed by a universally quantified formula. So predicates with quantifiers in them are used.

This paper describes new ways of automatically discovering useful predicates by diagnosing failed proofs. The method is designed to find hard predicates that do not appear syntactically in the system description, including quantified predicates, which are necessary for proving most interesting properties. As importantly, it tries to avoid discovering useless predicates that do not help to avoid a known erroneous result. Furthermore, the diagnosis process can tell when a proof fails because of a genuine violation of the property by the actual system.

Implementation

The system was implemented using Binary Decision Diagrams (BDDs) to represent the abstract system. A decision procedure for quantifier-free first-order logic, CVC [1], was used to do the satisfiability checks. The system is built around the predicate abstraction tool described in Das and Dill [9]. The state variable declarations describe the state of the concrete system. The transition relation is described using a list of parameterized guarded commands. Each guarded command consists of a guard and an action. The guard is a logic formula over the state variables and possibly the parameters that evaluates to either true or false. Each of the actions is a procedure that modifies the current concrete state into a new value. At each point the action corresponding to one of the enabled rules (rules whose guards evaluate to true) is non-deterministically executed and the concrete state changes.
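A guarded-command system of this kind is easy to prototype. The sketch below is ours, with made-up rule and state names rather than the authors' input language: each rule is a guard predicate plus an action on a state dictionary, both parameterized by an index, and one step nondeterministically executes some enabled rule.

```python
import random

# A rule is (guard, action): guard(state, i) -> bool, action(state, i) -> new state.
# Rules may be parameterized, e.g. by an index into a message queue.
rules = [
    (lambda s, i: s['queue'][i] is not None,              # deliver message i
     lambda s, i: {**s,
                   'queue': s['queue'][:i] + [None] + s['queue'][i + 1:],
                   'delivered': s['delivered'] + 1}),
    (lambda s, i: s['queue'][i] is None,                  # enqueue a message at slot i
     lambda s, i: {**s,
                   'queue': s['queue'][:i] + ['msg'] + s['queue'][i + 1:]}),
]

def step(state, nparams):
    """Execute one enabled rule instance, chosen nondeterministically."""
    enabled = [(act, i) for guard, act in rules for i in range(nparams)
               if guard(state, i)]
    if not enabled:
        return state
    act, i = random.choice(enabled)
    return act(state, i)

state = {'queue': [None, None, 'msg'], 'delivered': 0}
state = step(state, nparams=3)
```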
[Figure 1 shows the two blocks of the algorithm: an upper block takes the concrete system, the verification condition, and the initial predicates, performs abstraction and model checking, and either reports the property verified or emits an abstract counter-example; a lower block performs counter-example checking and predicate discovery, and either reports a counter-example found or feeds the discovered predicates back to the upper block.]
Fig. 1. Predicate Abstraction Algorithm
a verification condition, and the concrete system description, it first computes an approximate abstract model. This abstract model is model checked, and the abstract system is refined appropriately if it was too inexact. Notice that this refinement does not change the set of abstraction predicates; it concentrates on using the existing predicates more effectively. Finally, this process terminates either with the verification condition verified (in which case nothing else needs to be done) or with an abstract counter-example trace.
The current work, represented by the lower block in the diagram, checks whether a concrete counter-example trace corresponding to the abstract trace exists. If so, the verification condition is violated and an error is reported; otherwise, new predicates are discovered which avoid this counter-example. The new predicates are added to the abstraction predicates already present and the process starts anew. Since all the old predicates are retained, much of the work from previous iterations is reused.
Related Work
Recently a lot of work has been done on predicate abstraction. The use of automatic predicate abstraction for model checking infinite-state systems was first presented by Graf and Saïdi in 1997 [11]. The method used monomials (conjunctions of abstract state variables or their negations) to represent abstract states. Parameterized systems are handled by using a counting abstraction [13]. Similar work has also been proposed in [17] and [14]. In 1998, Colón and Uribe [8] described a method of constructing a finite abstract system and then model checking it. The abstractions produced by both methods are coarse and could fail to prove the verification condition even if all necessary predicates were present. By constructing the abstraction in a demand-driven fashion, the method of Das and Dill [9] is able to compute abstractions efficiently that are as precise
as possible given a fixed finite set of predicates. This ensures that if the desired properties can be proved with the abstraction predicates, then the method will be able to do so. The predicate abstraction methods described so far have relied on user-provided predicates to produce the abstract system.
Counter-example guided refinement is a generally useful technique. It has been used by Kurshan et al. [2] for checking timed automata, by Balarin et al. [3] for language containment, and by Clarke et al. [7] in the context of verification using abstraction for different variables in a version of the SMV model checker. Counter-example guided refinement has even been used with predicate abstraction by Lakhnech et al. [12]. Invariant generation techniques have also used similar ideas [19,5]. Invariant generation techniques generally produce too many invariants, many of which are not relevant to the property being proved. This can cause problems with large systems. The counter-example guided refinement techniques do not produce the quantified predicates that our method needs.
Predicate abstraction is also being used for software verification. Device drivers are being verified by the SLAM project [4]. The SLAM project has used concrete simulation of the abstract counter-example trace to generate new predicates. The BLAST project [18] also uses spurious counter-examples to generate new predicates. Predicate abstraction has also been used in software verification as a way of finding loop invariants [10]. These systems do not deal with parameterized systems, hence they do not need quantified predicates.
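To make the flow of Figure 1 concrete, the following Python sketch outlines the overall loop described above. The function names (abstract_and_model_check, check_counterexample, discover_predicates) are placeholders standing in for the tool of [9] and for the analysis described in this paper; they are not part of the actual implementation.

def verify(abstract_and_model_check, check_counterexample,
           discover_predicates, initial_predicates):
    """Hypothetical driver mirroring the loop of Fig. 1."""
    predicates = list(initial_predicates)
    while True:
        # Upper block: abstract, model check, and refine the abstract
        # transition relation without changing the predicate set [9].
        result = abstract_and_model_check(predicates)
        if result.verified:
            return "property verified"
        # Lower block: check the abstract counter-example on the
        # concrete system; if it is spurious, discover new predicates.
        concrete = check_counterexample(result.abstract_trace)
        if concrete is not None:
            return ("counter-example found", concrete)
        predicates += discover_predicates(result.abstract_trace)

Because the old predicates are retained, each iteration only grows the predicate set, which is why work from earlier iterations can be reused.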
2 Abstraction Basics
As in previous work [9], sets of abstract and concrete states will be represented by logical formulas. For instance the concrete predicate, X represents the set of concrete states which satisfy, X. The main idea of predicate abstraction is to construct a conservative abstraction of the concrete system. This ensures that if some property is proved for the abstract system, then the corresponding property also holds for the concrete system. Formally the concrete transition system is described by a set of initial states represented by the predicate IC and a transition relation represented by the predicate RC . IC (x) is true iff x is an initial state. Similarly, RC (x, y) is true iff y is a successor of x. The safety property, P is the verification condition that needs to be proved in the concrete system. An execution of the concrete system is defined to be a sequence of states, x0 , x1 , . . . xM such that IC (x0 ) holds and for every i ∈ [0, M ), RC (xi , xi+1 ) holds. A partial trace is an execution that does not necessarily start from an initial state. A counter-example trace is defined to be an execution, x0 , x1 , . . . xM such that ¬P (xM ) holds (i.e., the counter-example trace ends in a state which violates P ). The abstraction is determined by a set of N predicates, φ1 , φ2 , . . . φN . The abstract state space is just the set of all bit-vectors of length N . An abstraction function, α maps sets of concrete states to sets of abstract states while the concretization function, γ does the reverse. In the following definitions the predicates QC and QA represent sets of concrete states and abstract states respectively. Then α(QC ) is a predicate over abstract states such that α(QC )(s)
holds exactly when s is an abstraction of some concrete state x in QC. Similarly γ(QA)(x) holds exactly when there exists an abstract state s in QA and s is the abstraction of x.
Definition 1. Given predicates QC and QA over concrete and abstract states respectively, the abstraction and concretization functions are defined as:
α(QC)(s) = ∃x. QC(x) ∧ ⋀i∈[1,N] (φi(x) ≡ s(i))
γ(QA)(x) = ∃s. QA(s) ∧ ⋀i∈[1,N] (φi(x) ≡ s(i))
Using the above definitions, the abstract system is defined by the set of abstract initial states IA = α(IC) and the abstract transition relation RA(s, t) = ∃x, y. γ(s)(x) ∧ γ(t)(y) ∧ RC(x, y). An abstract execution is a sequence of abstract states s0, s1, . . ., sM such that IA(s0) holds and for each i ∈ [0, M), RA(si, si+1) holds. An abstract counter-example trace is an abstract execution s0, s1, . . ., sM for which α(¬P)(sM) holds. The atomic predicates in the verification condition P are used as the initial set of predicates. The abstract system is constructed and the abstract property ¬α(¬P) is checked for all reachable states. If this is successful, then the verification condition holds. Otherwise the generated abstract counter-example is analyzed to see if a concrete execution corresponding to the abstract trace exists. If one does, a concrete counter-example has been constructed. Otherwise the abstract counter-example is used to discover new predicates. Then the process is repeated with the discovered predicates added to the predicates already present. An abstract trace is called a real trace if there exists a concrete trace corresponding to it. Conversely, if there are no concrete traces corresponding to an abstract trace then it is called a spurious trace.
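The abstraction and concretization functions of Definition 1 can be illustrated on an explicit, finite state space. The following Python sketch is only an illustration (the tool itself works symbolically with BDDs and CVC), and the example states and predicates are invented for this purpose.

from itertools import product

def alpha(concrete_states, predicates):
    """Abstract a set of concrete states: each abstract state is the
    bit-vector of predicate truth values (Definition 1)."""
    return {tuple(p(x) for p in predicates) for x in concrete_states}

def gamma(abstract_states, concrete_universe, predicates):
    """Concretize: all concrete states whose predicate bit-vector
    belongs to the given set of abstract states."""
    return {x for x in concrete_universe
            if tuple(p(x) for p in predicates) in abstract_states}

# Example: concrete states are (pc, error) pairs; two predicates.
universe = set(product(range(4), [False, True]))
preds = [lambda x: x[1],           # phi1 = error
         lambda x: x[0] == 0]      # phi2 = (pc = 0)
abstract = alpha({(0, False), (3, True)}, preds)
# abstract == {(False, True), (True, False)}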
3
Predicate Discovery
As described in the previous section, the system generates a counter-example trace to the verification condition that was to be proved. Now the system must analyze the abstract counter-example trace to either confirm that the trace is real (that is, a concrete trace corresponding to it exists) or come up with additional predicates that would eliminate the spurious counter-example. First the trace is minimized to get a minimal spurious trace. A minimal spurious trace is defined to be an abstract trace which is
1. spurious (no corresponding concrete trace exists), and
2. minimal (removing even a single state from either the beginning or the end of the trace makes the remainder real).
Checking the Abstract Counter-Example Trace
There is a concrete counter-example trace x1, x2, . . ., xL corresponding to the abstract counter-example trace s1, s2, . . ., sL if these conditions are satisfied:
1. For each i ∈ [1, L], γ(si)(xi) holds. This means that each concrete state xi corresponds to the abstract state si in the trace.
2. IC(x1) ∧ ¬P(xL) holds. The concrete counter-example trace starts from an initial state and ends in a state which violates P.
3. For each i ∈ [1, L), RC(xi, xi+1) holds. For every i, xi+1 is a successor of xi.
Conditions (1) and (3) require that a concrete trace corresponding to the abstract trace exists, and condition (2) requires that the trace starts from the set of concrete initial states and ends in a state that violates the verification condition. To keep the formulas concise, the logic for the initial state has been disregarded here. In the implementation, an initial totally unconstrained state is added to the trace and it is assumed that the initial rule produces the initial state of the system. Since all the atomic predicates of P are present among the abstraction predicates, the condition ¬P(xL) is implied by γ(sL)(xL). Hence, if the formula
⋀i=1..L γ(si)(xi) ∧ ⋀i=0..L−1 RC(xi, xi+1)
is satisfiable then the abstract counter-example trace is real. Otherwise there is no satisfying assignment and the abstract counter-example trace is spurious. To simplify the presentation it shall be assumed that the same transition relation, RC can be used for each of the concrete steps including the first where RI is actually used. In our implementation the first step is handled specially and RI is used instead of RC . The test for spuriousness is completely a property of the transition relation and the trace itself and does not depend either on the initial states or the verification condition. So we will generalize the definition of spuriousness to partial traces. A partial trace is spurious if the above formula is unsatisfiable. Predicate Discovery To understand predicate discovery we must first understand when predicate abstraction produces a spurious counter-example. Assume that in Figure 2 the whole abstract trace s1 , s2 , . . . sL is spurious but the partial trace s2 , s3 , . . . sL is real. So there are two kinds of concrete states in γ(s2 ): 1. Successor states of states in γ(s1 ). 2. States (like x2 ) that are part of some concrete trace corresponding to s2 , . . . sL .
It must be the case that the above two types of states are disjoint. Otherwise it would be possible to find a concrete trace corresponding to the whole trace, thereby making it real. If predicates to distinguish the two kinds of states were added, then the spurious counter-example would be avoided. In the method described here, the discovered predicates will be able to characterize states of the second type above. Once it has been determined that the abstract counter-example is spurious, states are removed from the beginning of the trace while still keeping the remainder spurious. When states can no longer be removed from the beginning, the same process is carried out by removing states from the end of the trace. This eventually produces a minimal spurious trace.
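As a rough illustration of the two steps just described, the sketch below shrinks a spurious abstract trace to a minimal one. The is_spurious callback stands in for the CVC-based satisfiability test described above; the explicit brute-force variant is only for tiny, fully enumerated state spaces and is not how the tool works.

def minimal_spurious_trace(trace, is_spurious):
    """Remove states from the beginning, then from the end, as long as
    the remainder stays spurious (assumes `is_spurious` implements the
    satisfiability test on partial traces)."""
    assert is_spurious(trace)
    lo, hi = 0, len(trace)
    while lo + 1 < hi and is_spurious(trace[lo + 1:hi]):
        lo += 1
    while lo + 1 < hi and is_spurious(trace[lo:hi - 1]):
        hi -= 1
    return trace[lo:hi]

def is_spurious_explicit(abstract_trace, gamma, R):
    """Brute-force spuriousness test for small explicit state spaces:
    does some x1 ... xL exist with xi in gamma(si) and R(xi, xi+1)?
    Initial-state and property constraints are ignored, matching the
    generalized notion of spuriousness for partial traces."""
    frontier = set(gamma(abstract_trace[0]))
    for s in abstract_trace[1:]:
        frontier = {y for x in frontier for y in R.get(x, ()) if y in gamma(s)}
        if not frontier:
            return True   # no concrete counterpart exists
    return False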
[Figure 2 depicts the circles γ(s1), γ(s2), γ(s3), . . ., γ(sL) as sets of concrete states, with individual concrete states such as x1 and x2 drawn as dots; the predicate to be discovered refines s2, separating the successors of γ(s1) from states like x2 that begin a concrete counterpart of s2, . . ., sL.]
Fig. 2. Abstraction Refinement
Now consider the minimal spurious trace s1, s2, s3, . . ., sL shown in Figure 2. Here the circles representing γ(s1), γ(s2), etc. are sets of concrete states, while the black dots inside the sets represent individual concrete states. Since the trace s2, s3, . . ., sL is real,
Q0 = ⋀i=2..L γ(si)(xi) ∧ ⋀i=2..L−1 RC(xi, xi+1)
is satisfiable for some concrete states x2, x3, . . ., xL. Now CVC is queried about the satisfiability of Q0. This returns a finite conjunction of formulas, ψ1(x2) ∧ ψ2(x2) ∧ . . . ∧ ψK(x2) ∧ θ(x3, . . ., xL), which implies
Q0. So the ψi's are conditions that any x2 must satisfy for it to be the first state of the concrete trace corresponding to s2, s3, . . ., sL. Now it must be the case that
γ(s1)(x1) ∧ RC(x1, x2) ∧ ⋀i=1..K ψi(x2) ∧ θ(x3, . . ., xL)
is unsatisfiable. Otherwise it would be possible to find a concrete trace corresponding to s1, s2, . . ., sL! More specifically, if the predicates ψ1, ψ2, . . ., ψK are added to the set of abstraction predicates and the verifier is rerun, this particular spurious abstract counter-example will not be generated. So we have an automatic way of discovering new abstraction predicates.
However, it is possible to reduce the number of additional abstraction predicates. In fact it is quite likely that not all of the predicates ψ1, . . ., ψK are needed to avoid the spurious counter-example. The satisfiability of the above formula is checked after leaving out the ψ1(x2) conjunct. If the formula is still unsatisfiable, then ψ1 is dropped altogether. The same procedure is repeated with the other ψi's until an essential set of predicates remains (dropping any one of them makes the formula satisfiable). Notice that there may be multiple essential sets of predicates that make the above formula unsatisfiable. This method finds one such set.
Now consider the effect that the abstraction refinement has on the abstract system. The original abstract state s2 will be split into two parts: in one part all the added predicates hold, while in the other part at least one of the assertions does not hold. Also, in the abstract transition relation, the transition from the state s1 to the first partition of s2 is removed. It is still possible that there is a path from s1 to s3 through the other partition of s2. However, the refined abstraction will never generate a spurious counter-example in which a concrete state corresponding to s1 has a successor which satisfies all the assertions ψ1, ψ2, . . ., ψK.
Parameterized Rules and Quantified Predicates
When proving properties of parameterized systems, quantified predicates are needed. These quantified predicates cannot be found either from the system description or by existing predicate discovery techniques. Invariant generation methods do find quantified invariants which may be useful in some cases. But the problem there is that many invariants are generated and there is no good way of deciding which ones are useful.
In the presence of parameterized rules, the predicate discovery works exactly as described above. But the parameters (which are explicitly not part of the concrete state) in the rules may appear in the predicates finally generated. Recall that the predicates discovered characterize the set of states like x2 (in Figure 2) that are part of a real abstract trace. The appearance of a rule parameter in these expressions implies that the parameter must satisfy some conditions in the concrete counterpart of the abstract trace. Any other value of the parameter which satisfies the same conditions could produce another concrete trace. Naturally,
state
  N : positive integer
  status : array [N] of enum {GOOD, BAD}
  error : boolean
initialize
  status := all values are initialized to GOOD
  error := false            /* No error initially */
rule (p : subrange [1..N])
  (status[p] = BAD) ⇒ error := true
property
  ¬error
Fig. 3. Quantified predicate example
an existential quantifier wrapped around these expressions would find a predicate that is consistent with all possible behaviors of the (possibly unbounded) parameter. Quantifier scope minimization is carried out so that smaller predicates may be found. In some cases the existential quantifiers can be eliminated altogether. Often predicates of the form ∃x. Q(x) ∧ (x = a), where a is independent of x, are discovered. Heuristics were added so that such a predicate is simplified to Q(a).
To illustrate the way quantified predicates are discovered automatically, a very simple example is presented in Figure 3. In the example system we want to prove that error is always false. So the initial abstraction predicates chosen will be just the atomic formulas of the verification condition, in this case the single predicate B1 ≡ error. With this abstraction the property cannot be proved, and an abstract counter-example trace ¬B1, B1 is returned. Since the initialization rule is handled like any other rule (only with implicit guard true), the abstract counter-example that shall be analyzed is true, ¬B1, B1. Using the test for spuriousness described earlier, the counter-example is shown to be a minimal spurious trace. Also, the partial trace ¬B1, B1 is real (that is, a concrete counterpart exists) when status[p0] = BAD holds (p0 is the specific value of the parameter chosen). However, the initialization rule specifically sets all the elements of the status array to GOOD. Hence the predicate discovered will be status[p0] = BAD. But notice that the parameter appears in the predicate. Hence the new predicate will be B2 ≡ ∃q. status[q] = BAD. Now the abstraction will be refined with the extra predicate. The additional bit will be initialized to false. Also, the transition rule will now be enabled only when the new bit is true. Since that never happens, the rule is never enabled and the desired property holds.
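The greedy reduction of the discovered predicates ψ1, . . ., ψK to an essential set, described earlier in this section, can be sketched as follows. Here still_unsat is a placeholder for the decision-procedure call on γ(s1)(x1) ∧ RC(x1, x2) ∧ (remaining ψ's)(x2) ∧ θ(x3, . . ., xL); it is not part of the actual implementation.

def essential_subset(psis, still_unsat):
    """Drop each candidate predicate in turn; keep it out if the
    formula stays unsatisfiable without it.  The result is one
    essential set (other essential sets may exist)."""
    kept = list(psis)
    for p in list(psis):
        trial = [q for q in kept if q is not p]
        if still_unsat(trial):
            kept = trial
    return kept

The order in which candidates are tried determines which essential set is found, which matches the remark above that the method finds one such set rather than a canonical one.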
4 Application to AODV
As an application of this method we consider a simplified version of the Ad Hoc On-demand Distance Vector (AODV) routing protocol [15,16]. The simplification was to remove timeouts from the protocol, since we could not find a way of reasoning about them in our system. The protocol is used for routing in a dynamic environment where networked nodes are entering and leaving the system. The main correctness condition of the protocol is to avoid the formation of routing loops. This is hard to accomplish and bugs have been found [6]. Finite instances of the protocol have been analyzed with model checkers, and a version of the protocol has been proved correct using manual theorem proving techniques.
Briefly, the protocol works as follows. When a node needs to find a route to another, it broadcasts a route request (RREQ) message to its neighbors. If any of them has a route to the destination, it replies with a route reply (RREP) message. Otherwise it sends out an RREQ to its neighbors. This continues till the destination node is reached or some node has a route to the final destination. Then the RREP message is propagated back to the node requesting the route. When a node receives an RREQ message it adds a route to the original sender of the message, so that it can propagate the RREP back. Also, nodes will replace longer paths by shorter ones to optimize communication.
The routing tables are modeled by the three two-dimensional arrays route_p, route and hops. Given nodes i and j, route_p[i][j] is true iff i has a route to j, route[i][j] is the node to which i forwards packets whose final destination is j, and hops[i][j] is the number of hops that i believes are needed for a packet to reach j. The message queue is modeled as an unbounded array of records. Each record has type, src, dst, from, to and hops fields. The src and dst fields are the original source and final destination of the current request (or reply). The from and to fields are the message source and destination of the current hop. The field hops is an integer which keeps track of the number of hops the message has traversed.
As explained before, for every route that a node has, it keeps track of the number of hops necessary to get to the destination. Consider three arbitrary but distinct nodes a, b and c. The node a has a route to c and its next hop is b. In this situation the protocol maintains the invariant that b has a route to c and a's hop count to c is strictly greater than b's hop count to c. This makes sure that along a route to the destination the hop count always decreases. Thus there cannot be a cycle in the routing table. This is the property that was verified automatically. In the actual protocol, where links between nodes can go down, the age of the routes is tracked with a sequence number field. The ordering relation is more complex in that case. To simplify the system for the sake of discussion here, the sequence numbers have been dropped. The simplified version is described in Figures 4 and 5.
The atomic predicates in the verification condition are used as the initial set of predicates. The initial predicates are B1 ≡ route_p[a][c], B2 ≡ route[a][c] = b, B3 ≡ route_p[b][c] and B4 ≡ hops[a][c] > hops[b][c]. The abstract
type
  cell_index_type : subrange(1..N)
  msg_index_type : subrange(1..infinity)
  msg_sort : enum of [INVALID, RREQ, RREP]
  msg_type : record of [type : msg_sort; from, to, src, dst : cell_index_type; hops : integer]
state
  route_p : array [N][N] of boolean
  route : array [N][N] of cell
  queue : array [infinity] of msg_type
  a, b, c : cell_index_type
initialize
  queue := all messages have type INVALID
  route_p := all array elements are false
/* Generate RREQ */
rule (msg : msg_index_type; src, dst : cell_index_type)
  queue[msg].type = INVALID ∧ ¬route_p[src][dst] ⇒
    queue[msg] := [# type = RREQ; src = src; dst = dst; from = src; hops = 0 #]
/* Receive RREP */
rule (in, out : msg_index_type)
  queue[in].type = RREP ∧ queue[out].type = INVALID ⇒
    /* Add route to immediate neighbor */
    route_p[queue[in].to][queue[in].from] := true
    route[queue[in].to][queue[in].from] := queue[in].from
    hops[queue[in].to][queue[in].from] := 1
    /* Add route to RREP source if this is a better route */
    if hops[queue[in].to][queue[in].src] > queue[in].hops ∨ ¬route_p[queue[in].to][queue[in].src] then
      route_p[queue[in].to][queue[in].src] := true
      route[queue[in].to][queue[in].src] := queue[in].from
      hops[queue[in].to][queue[in].src] := queue[in].hops + 1
    end
    /* Forward RREP */
    if queue[in].to = queue[in].dst ∧ route_p[queue[in].to][queue[in].dst] then
      queue[out] := [# type = RREP; src = queue[in].src; dst = queue[in].dst;
                      from = queue[in].to; to = route[queue[in].to][queue[in].dst];
                      hops = hops[queue[in].to][queue[in].src] #]
    end
Fig. 4. AODV protocol
/* Receive RREQ */
rule (in, out : msg_index_type)
  queue[in].type = RREQ ∧ queue[out].type = INVALID ⇒
    /* Add route to immediate neighbor */
    route_p[queue[in].to][queue[in].from] := true
    route[queue[in].to][queue[in].from] := queue[in].from
    hops[queue[in].to][queue[in].from] := 1
    /* Add route to RREQ source if this is a better route */
    if hops[queue[in].to][queue[in].src] > queue[in].hops ∨ ¬route_p[queue[in].to][queue[in].src] then
      route_p[queue[in].to][queue[in].src] := true
      route[queue[in].to][queue[in].src] := queue[in].from
      hops[queue[in].to][queue[in].src] := queue[in].hops + 1
    end
    /* RREQ has reached final destination */
    if queue[in].dst = queue[in].to then
      queue[out] := [# type = RREP; src = queue[in].dst; dst = queue[in].src;
                      from = queue[in].to; to = queue[in].from; hops = 0 #]
    /* The RREQ receiver has a route to the final destination */
    elsif route_p[queue[in].to][queue[in].dst] then
      queue[out] := [# type = RREP; src = queue[in].dst; dst = queue[in].src;
                      from = queue[in].to; to = queue[in].from;
                      hops = hops[queue[in].to][queue[in].dst] #]
    /* Forward RREQ */
    else
      queue[out] := [# type = RREQ; src = queue[in].src; dst = queue[in].dst;
                      from = queue[in].from; hops = queue[in].hops + 1 #]
    end
property
  (route_p[a][c] ∧ route[a][c] = b) → (route_p[b][c] ∧ hops[a][c] > hops[b][c])
Fig. 5. AODV protocol (contd.)
system generates a counter-example of length one where a receives an RREQ and adds a route to c through b while b does not have a route to c. The predicate discovery algorithm deduces that this cannot happen, since in the initial state there are no RREQs present. So the predicate ∃x. queue[x].type = RREQ is added and the new abstraction is model checked again. Now a two-step counterexample is generated. In the first step an arbitrary cell generates an RREQ. In the next step a receives an RREQ from b originally requested by c and sets its routing table entry for node c to b. Since b does not have a routing table entry
to c, this violates the desired invariant. Again the predicate discovery algorithm deduces that such a message cannot exist. So the predicate ∃x. (queue[x].type = RREQ ∧ queue[x].from = b ∧ queue[x].src = c ∧ queue[x].to = a) is discovered. Continuing in this manner, in the next iteration the predicate ∃x. (queue[x].type = RREQ ∧ queue[x].from = b ∧ queue[x].src = c ∧ queue[x].to = a ∧ hops[b][c] > queue[x].hops) is discovered. This is exactly the predicate that is required to prove the desired invariant. While verifying the actual protocol, similar predicates are discovered for the RREP branch of the protocol as well. The predicates needed to prove the actual protocol are different from the predicates listed here but are of the same flavor. The program requires thirteen predicate discovery cycles to find all the necessary predicates.
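For reference, the verified property instantiated on an explicit routing-table snapshot looks as follows. This is only a sanity check written for this presentation (the actual verification is over the unbounded, parameterized model), and the example arrays are invented.

def invariant_holds(route_p, route, hops, a, b, c):
    """Loop-freedom invariant for distinct nodes a, b, c: if a routes
    to c via b, then b has a route to c and a's hop count to c is
    strictly greater than b's."""
    if route_p[a][c] and route[a][c] == b:
        return route_p[b][c] and hops[a][c] > hops[b][c]
    return True

# Tiny sanity check on a 3-node snapshot (nodes 0, 1, 2).
route_p = [[False, False, True], [False, False, True], [False] * 3]
route   = [[None, None, 1], [None, None, 2], [None] * 3]
hops    = [[0, 0, 2], [0, 0, 1], [0] * 3]
assert invariant_holds(route_p, route, hops, a=0, b=1, c=2)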
References
1. Aaron Stump, Clark W. Barrett, and David L. Dill. CVC: a cooperating validity checker. In Conference on Computer Aided Verification, Lecture Notes in Computer Science. Springer-Verlag, 2002.
2. R. Alur, A. Itai, R. P. Kurshan, and M. Yannakakis. Timing verification by successive approximation. Information and Computation, 118(1), pages 142–157, 1995.
3. F. Balarin and A. L. Sangiovanni-Vincentelli. An iterative approach to language containment. In 5th International Conference on Computer-Aided Verification, pages 29–40. Springer-Verlag, 1993.
4. Thomas Ball and Sriram K. Rajamani. The SLAM project: debugging system software via static analysis. In Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 1–3. ACM Press, 2002.
5. Saddek Bensalem, Yassine Lakhnech, and Sam Owre. InVeSt: A tool for the verification of invariants. In 10th International Conference on Computer-Aided Verification, pages 505–510. Springer-Verlag, 1998.
6. Karthikeyan Bhargavan, Davor Obradovic, and Carl A. Gunter. Formal verification of standards for distance vector routing protocols, August 1999. Presented in the Recent Research Session at Sigcomm 1999.
7. Edmund M. Clarke, Orna Grumberg, Somesh Jha, Yuan Lu, and Helmut Veith. Counterexample-guided abstraction refinement. In Computer Aided Verification, pages 154–169. Springer-Verlag, 2000.
8. Michael A. Colón and Tomás E. Uribe. Generating finite-state abstractions of reactive systems using decision procedures. In Conference on Computer-Aided Verification, volume 1427 of Lecture Notes in Computer Science, pages 293–304. Springer-Verlag, 1998.
9. Satyaki Das and David L. Dill. Successive approximation of abstract transition relations. In Proceedings of the Sixteenth Annual IEEE Symposium on Logic in Computer Science, pages 51–60. IEEE Computer Society, June 2001, Boston, USA.
10. C. Flanagan and S. Qadeer. Predicate abstraction for software verification. In Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM Press, 2002.
11. Susanne Graf and Hassen Saïdi. Construction of abstract state graphs with PVS. In Orna Grumberg, editor, Conference on Computer Aided Verification, volume 1254 of Lecture Notes in Computer Science, pages 72–83. Springer-Verlag, June 1997, Haifa, Israel.
12. Yassine Lakhnech, Saddek Bensalem, Sergey Berezin, and Sam Owre. Incremental verification by abstraction. In T. Margaria and W. Yi, editors, Tools and Algorithms for the Construction and Analysis of Systems: 7th International Conference, TACAS 2001, pages 98–112, Genova, Italy, 2001. Springer-Verlag.
13. D. Lessens and Hassen Saïdi. Automatic verification of parameterized networks of processes by abstraction. Electronic Notes in Theoretical Computer Science (ENTCS), 1997.
14. Z. Manna and A. Pnueli. Temporal Verification of Reactive Systems: Safety. Springer-Verlag, 1995.
15. Charles E. Perkins and Elizabeth M. Royer. Ad Hoc On-Demand Distance Vector (AODV) Routing. In Workshop on Mobile Computing Systems and Applications, pages 90–100. ACM Press, February 1999.
16. Charles E. Perkins, Elizabeth M. Royer, and Samir Das. Ad Hoc On-Demand Distance Vector (AODV) Routing. Available at http://www.ietf.org/internet-drafts/draft-ietf-manet-aodv-05.txt, 2000.
17. A. P. Sistla and S. M. German. Reasoning with many processes. In Symposium on Logic in Computer Science, Ithaca, pages 138–152. IEEE Computer Society, June 1987.
18. Thomas A. Henzinger, Ranjit Jhala, Rupak Majumdar, and Grégoire Sutre. Lazy abstraction. In Proceedings of the 29th ACM SIGPLAN-SIGACT Conference on Principles of Programming Languages. ACM Press, 2002.
19. A. Tiwari, H. Rueß, H. Saïdi, and N. Shankar. A technique for invariant generation. In Tiziana Margaria and Wang Yi, editors, TACAS 2001 - Tools and Algorithms for the Construction and Analysis of Systems, volume 2031 of Lecture Notes in Computer Science, pages 113–127, Genova, Italy, April 2001. Springer-Verlag.
Automated Abstraction Refinement for Model Checking Large State Spaces Using SAT Based Conflict Analysis
Pankaj Chauhan(1), Edmund Clarke(1), James Kukula(3), Samir Sapra(1), Helmut Veith(2), and Dong Wang(1)
(1) Carnegie Mellon University, (2) TU Vienna, Austria, (3) Synopsys Inc., Beaverton, OR
Abstract. We introduce a SAT based automatic abstraction refinement framework for model checking systems with several thousand state variables in the cone of influence of the specification. The abstract model is constructed by designating a large number of state variables as invisible. In contrast to previous work, where invisible variables were treated as free inputs, we describe a computationally more advantageous approach in which the abstract transition relation is approximated by pre-quantifying invisible variables during image computation. The abstract counterexamples obtained from model checking the abstract model are symbolically simulated on the concrete system using a state-of-the-art SAT checker. If no concrete counterexample is found, a subset of the invisible variables is reintroduced into the system and the process is repeated. The main contributions of this paper are two new algorithms for identifying the relevant variables to be reintroduced. These algorithms monitor the SAT checking phase in order to analyze the impact of individual variables. Our method is complete for safety properties (AG p) in the sense that, performance permitting, a property is either verified or disproved by a concrete counterexample. Experimental results are given to demonstrate the power of our method on real-world designs.
1 Introduction
Symbolic model checking has been successful at automatically verifying temporal specifications on small to medium sized designs. However, the inability of BDD based model checking to handle large state spaces of “real world” designs hinders the wide scale acceptance of these techniques. There have been advances
This research is sponsored by the Semiconductor Research Corporation (SRC) under contract no. 99-TJ-684, the Gigascale Silicon Research Center (GSRC), the National Science Foundation (NSF) under Grant No. CCR-9803774, and the Max Kade Foundation. One of the authors is also supported by Austrian Science Fund Project N Z29-INF. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of SRC, GSRC, NSF, or the United States Government.
on various fronts to push the limits of automatic verification. On the one hand, improving BDD based algorithms improves the ability to handle large state machines, while on the other hand, various abstraction algorithms reduce the size of the design by focusing only on relevant portions of the design. It is important to make improvements on both fronts for successful verification. A conservative abstraction is one which preserves all behaviors of a concrete system. Conservative abstractions benefit from a preservation theorem which states that the correctness of any universal (e.g. ACTL∗ ) formulae on an abstract system automatically implies the correctness of the formula on the concrete system. However, a counterexample on an abstract system may not correspond to any real path, in which case it is called a spurious counterexample. To get rid of a spurious counterexample, the abstraction needs to be made more precise via refinement. It is obviously desirable to automate this procedure. This paper focuses on automating the abstraction process for handling large designs containing up to a few thousand latches. This means that using any computation on concrete systems based on BDDs will be too expensive. Abstraction refinement [1,6,8,11,13,17] is a general strategy for automatic abstraction. Abstraction refinement usually involves the following process. 1. Generation of Initial Abstraction. It is desirable to derive the initial abstraction automatically. 2. Model checking of abstract system. If this results in a conclusive answer for the abstract system, then the process is terminated. For example, in case of existential abstraction, a “yes” answer for an ACTL∗ property in this step means that the concrete system also satisfies the property, and we can stop. However, if the property is false on the abstract system, an abstract counterexample is generated. 3. Checking whether the counterexample holds on the concrete system. If the counterexample is valid, then we have actually found a bug. Otherwise, the counterexample is spurious and the abstraction needs to be refined. Usually, refinement of abstraction is based on the analysis of counterexample(s) generated. Our abstraction function is based on hiding irrelevant parts of the circuit by make a set of variables invisible. This simple abstraction function yields an efficient way to generate minimal abstractions, a source of difficulty in previous approaches. We describe two techniques to produce abstract systems by removing invisible variables. The first is simply to make the invisible variables into input variables. This is shown to be a minimal abstraction. However, this leaves a large number of input variables in the abstract system and, consequently, BDD based model checking even on this abstract system becomes very difficult [19]. We propose an efficient method to pre-quantify these variables on the fly during image computation. The resulting abstract systems are usually small enough to be handled by standard BDD based model checkers. We use an enhanced version [3,4] of NuSMV [5] for this. If a counterexample is produced for the abstract system, we try to simulate it on the concrete system symbolically using a fast SAT checker (Chaff [16,21] in our case).
The refinement is done by identifying a small set of invisible variables to be made visible. We call these variables the refinement variables. Identification of refinement variables is the main focus of this paper. Our techniques for identifying important variables are based on analysis of effective boolean constraint propagation (BCP) and conflicts [16] during the SAT checking run of the counterexample simulation. Recently, propositional SAT checkers have demonstrated tremendous success on various classes of SAT formulas. The key to the effectiveness of SAT checkers like Chaff [16], GRASP [18] and SATO [20] is nonchronological backtracking, efficient conflict driven learning of conflict clauses, and improved decision heuristics. SAT checkers have been successfully used for Bounded Model Checking (BMC) [2], where the design under consideration is unrolled and the property is symbolically verified using SAT procedures. BMC is effective for showing the presence of errors. However, BMC is not at all effective for showing that a specification is true unless the diameter of the state space is known. Moreover, BMC performance degrades when searching for deep counterexamples. Our technique can be used to show that a specification is true and is able to search for deeper concrete counterexamples because of the guidance derived from abstract counterexamples. The efficiency of SAT procedures has made it possible to handle circuits with a few thousand of variables, much larger than any BDD based model checker is able to do at present. Our approach is similar to BMC, except that the propositional formula for simulation is constrained by assignments to visible variables. This formula is unsatisfiable for a spurious counterexample. We propose heuristic scores based on backtracking and conflict clause information, similar to VSIDS heuristics in Chaff, and conflict dependency analysis algorithm to extract the reason for unsatisfiability. Our techniques are able to identify those variables that are critical for unsatisfiability of the formula and are, therefore, prime candidates for refinement. The main strength of our approach is that we use the SAT procedure itself for refinement. We do not need to invoke multiple SAT instances or solve separation problems as in [8]. Thus the main contributions of our work are, (a) use of SAT for counterexample validation, (b) refinement procedures based on SAT conflict analysis, and, (c) a method to remove invisible variables from the abstract system for computational efficiency. Outline of the Paper The rest of the paper is organized as follows. Section 2 briefly reviews how abstraction is used in model checking and introduces notation that is used in the following sections. In Section 3, we describe in detail, our abstraction technique and how we check an abstract counterexample on the concrete model. The most important part of the paper is Section 4, where we discuss our refinement algorithms based on scoring heuristics for variables and conflict dependency analysis. In section 5, we present experimental evidence to show the ability of our approach to handle large state systems. In Section 6, we describe related work in detail. Finally, we conclude in Section 7 with directions for future research.
2 Abstraction in Model Checking
We give a brief summary of the use of abstraction in model checking and introduce notation that we will use in the remainder of the paper (refer to [7] for a full treatment). A transition system is modeled by a tuple M = (S, I, R, L, L), where S is the set of states, I ⊆ S is the set of initial states, R is the set of transitions, and L is the set of atomic propositions that label each state in S via the labeling function L : S → 2^L. The set I is also used as a predicate I(s), meaning the state s is in I. Similarly, the transition relation R is also used as a predicate R(s1, s2), meaning there exists a transition between states s1 and s2. Each program variable vi ranges over its non-empty domain Dvi. The state space of a program with a set of variables V = {v1, v2, . . ., vn} is defined by the Cartesian product Dv1 × Dv2 × . . . × Dvn. In existential abstraction [7] a surjection h : S → Ŝ maps a concrete state si ∈ S to an abstract state ŝi = h(si) ∈ Ŝ. We denote the set of concrete states that map to an abstract state ŝi by h−1(ŝi).
Definition 1. The minimal existential abstraction M̂ = (Ŝ, Î, R̂, L̂, L̂) corresponding to a transition system M = (S, I, R, L, L) and an abstraction function h is defined by:
1. Ŝ = {ŝ | ∃s. s ∈ S ∧ h(s) = ŝ}.
2. Î = {ŝ | ∃s. I(s) ∧ h(s) = ŝ}.
3. R̂ = {(ŝ1, ŝ2) | ∃s1. ∃s2. R(s1, s2) ∧ h(s1) = ŝ1 ∧ h(s2) = ŝ2}.
4. L̂ = L.
5. L̂(ŝ) = ⋃h(s)=ŝ L(s).
Condition 3 can be stated equivalently as
∃s1, s2. (R(s1, s2) ∧ h(s1) = ŝ1 ∧ h(s2) = ŝ2) ⇔ R̂(ŝ1, ŝ2)    (1)
An atomic formula f respects h if for all s ∈ S, h(s) |= f ⇒ s |= f. The labeling L̂(ŝ) is consistent if for all s ∈ h−1(ŝ) it holds that s |= ⋀f∈L̂(ŝ) f. The following theorem from [6,15] is stated without proof.
Theorem 1. Let h be an abstraction function and φ an ACTL∗ specification where the atomic sub-formulae respect h. Then the following holds: (i) for all ŝ ∈ Ŝ, L̂(ŝ) is consistent, and (ii) M̂ |= φ ⇒ M |= φ.
This theorem is the core of all abstraction refinement frameworks. However, the converse may not hold, i.e., even if M̂ ⊭ φ, the concrete model M may still satisfy φ. In this case, the counterexample on M̂ is said to be spurious, and we need to refine the abstraction function. Note that the theorem holds even if only the right implication holds in Equation 1. In other words, even if we add more transitions to the minimal transition relation R̂, the validity of an ACTL∗ formula on M̂ implies its validity on M.
Definition 2. An abstraction function h′ is a refinement for the abstraction function h and the transition system M = (S, I, R, L, L) if for all s1, s2 ∈ S, h′(s1) = h′(s2) implies h(s1) = h(s2). Moreover, h′ is a proper refinement of h if there exist s1, s2 ∈ S such that h(s1) = h(s2) and h′(s1) ≠ h′(s2).
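On an explicit (finite) transition system, Definition 1 amounts to taking images under h. The following sketch is purely illustrative, since the paper computes abstractions symbolically; the example system is made up.

def minimal_existential_abstraction(S, I, R, h):
    """Explicit-state version of Definition 1: abstract states, initial
    states and transitions are the images of the concrete ones under h."""
    S_hat = {h(s) for s in S}
    I_hat = {h(s) for s in I}
    R_hat = {(h(s1), h(s2)) for (s1, s2) in R}
    return S_hat, I_hat, R_hat

# Example: 3-bit states, h keeps only the first (visible) bit.
S = {(v, i1, i2) for v in (0, 1) for i1 in (0, 1) for i2 in (0, 1)}
I = {(0, 0, 0)}
R = {((v, i1, i2), (i1 ^ i2, i1, i2)) for (v, i1, i2) in S}
print(minimal_existential_abstraction(S, I, R, h=lambda s: s[0]))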
In general, ACTL∗ formulae can have tree-like counterexamples [9]. In this paper, we focus only on safety properties, which have finite path counterexamples. It is possible to generalize our approach to full ACTL∗ as done in [9]. The following iterative abstraction refinement procedure for a system M and a safety formula φ follows immediately.
1. Generate an initial abstraction function h.
2. Model check M̂. If M̂ |= φ, return TRUE.
3. If M̂ ⊭ φ, check the generated counterexample T on M. If the counterexample is real, return FALSE.
4. Refine h, and go to step 2.
Since each refinement step partitions at least one abstract state, the above procedure is complete for finite state systems for ACTL∗ formulae that have path counterexamples. Thus the number of iterations is bounded by the number of concrete states. However, as we will show in the next two sections, the number of refinement steps can be at most equal to the number of program variables. We would like to emphasize that we model check the abstract system in step 2 using BDD based symbolic model checking, while steps 3 and 4 are carried out with the help of SAT checkers.
3
Generating Abstract State Machine
We consider a special type of abstraction for our methodology, wherein we hide a set of variables that we call invisible variables, denoted by I. The set of variables that we retain in our abstract machine is called the visible variables, denoted by V. The visible variables are considered to be important for the property and hence are retained in the abstraction, while the invisible variables are considered irrelevant for the property. The initial abstraction and the refinement in steps 1 and 4 respectively correspond to different partitions of V. Typically, we would want |V| ≪ |I|. Formally, the value of a variable v ∈ V in state s ∈ S is denoted by s(v). Given a set of variables U = {u1, u2, . . ., up}, U ⊆ V, let sU denote the portion of s that corresponds to the variables in U, i.e., sU = (s(u1) s(u2) . . . s(up)). Let V = {v1, v2, . . ., vk}. This partitioning of variables defines our abstraction function h : S → Ŝ. The set of abstract states is Ŝ = Dv1 × Dv2 × . . . × Dvk and h(s) = sV.
In our approach, the initial abstraction is to take the set of variables mentioned in the property as visible variables. Another option is to make the variables in the cone of influence (COI) of the property visible. However, the COI of a property may be too large and we may end up with a large number of visible variables. The idea is to begin with a small set of visible variables and then let the refinement procedure come up with a small set of invisible variables to make visible.
We also assume that the transition relation is described not as a single predicate, but as a conjunction of bit relations Rj, one for each individual variable vj. More formally, we consider a sequential circuit with registers V = {v1, v2, . . ., vm} and inputs I = {i1, i2, . . ., iq}. Let s = (v1, v2, . . ., vm), i = (i1, i2, . . ., iq)
and s′ = (v′1, v′2, . . ., v′m). The primed variables denote the next-state versions of unprimed variables as usual. Thus the bit relation for vj becomes Rj(s, i, v′j) = (v′j ↔ fvj(s, i)), and
R(s, s′) = ∃i. ⋀j=1..m Rj(s, i, v′j)    (2)
3.1 Abstraction by Making Invisible Variables as Input Variables
As shown in [8], the minimal transition relation R̂ corresponding to R and h described above is obtained by removing the logic defining invisible variables and treating them as free input variables of the circuit. Hence, R̂ looks like:
R̂(ŝ, ŝ′) = ∃sI ∃i. ⋀vj∈V Rj(sV, sI, i, v′j)    (3)
Abstraction by Pre-quantifying Invisible Variables
Input abstraction leaves a large number of variables to quantify during the image computation process. We can however, quantify these variables a priori, leaving ˆ The transition relation that we get by quantifying only visible variables in R. ˆ ˜ We can even quantify invisible variables from R in the beginning is denoted by R. some of the input variables a priori in this fashion to control the total number ˜ Let Q ⊆ I ∪ I denote the set of variables to be preof variables appearing in R. quantified and let W = (I ∪I)\Q, the set of variable that are not pre-quantified. Quantification of a large number of invisible variables in Equation 3 is computationally expensive [15]. To alleviate this difficulty, it is customary to
Automated Abstraction Refinement
39
approximate this abstraction by pushing the quantification inside conjunctions as follows. ˜ s, sˆ ) = ∃sW R(ˆ ∃sQ Rj (sV , sI , i, vj ) (4) vj ∈V
Since the BDDs for state sets do not contain input variables in the support, this is a safe step to do. This does not violate the soundness of the approximation, i.e., for each concrete transition in R, there will be a corresponding transition ˆ as stated below. in R, ˜ s1 , sˆ2 ). Theorem 2. ∃s1 , s2 (R(s1 , s2 ) ∧ h(s1 ) = sˆ1 ∧ h(s2 ) = sˆ2 ) ⇒ R(ˆ The other direction of this implication does not hold because of the approximations introduced. Preserving Correlations. We can see in Equation 4 that by existentially quantifying each invisible variable separately for each conjunct of the transition relation, we lose the correlation between different occurrences of a variable. For example, consider the trivial bit relations x1 = x3 , x2 = ¬x3 and x3 = x1 ⊕ x2 . Suppose x3 is made an invisible variable. Then quantifying x3 from the bit relations of x1 and x2 will result in the transition relation being always evaluated 1, meaning the state graph is a clique. However, we can see that in any reachable state, x1 and x2 are always opposite of each other. To solve this problem partially without having to resort to equation 4, we propose to cluster those bit relations that share many common variables. Since this problem is very similar to the quantification scheduling problem (which occurs during image computations), we propose to use a modification of VarScore algorithms [3] for evaluating this quantification. This algorithm can be viewed as producing clusters of bit relations. We use it to produce clusters with controlled approximations. The idea is to delay variable quantifications as much as possible, without letting the conjoined BDDs grow too large. When a BDD grows larger than some threshold, we quantify away a variable. We can of course quantify a variable that no longer appears in the support of other BDDs. Effective quantification scheduling algorithms put closely related occurrences of a variable in the same cluster. Figure 1 shows the VarScore algorithm for approximating existential abstraction. A static circuit minimum cut based structural method to reduce the number of invisible variables was proposed in [12] and used in [19]. Our method introduces approximations as needed based on actual image computation, while there method removes the variables statically. Our algorithms achieves a balance between performance and accuracy. This means that the approximations introduced by our algorithm are more accurate as the parts of the circuits statically removed in [12] could be important. 3.3
3.3 Checking the Validity of an Abstract Counterexample
Given an abstract model M̂ and a safety formula φ, we run the usual BDD based symbolic model checking algorithm to determine if M̂ |= φ. Suppose that the
Given a set of conjuncts RV and variables sQ to pre-quantify
Repeat until all sQ variables are quantified:
1. Quantify away sQ variables appearing in only one BDD.
2. Score the variables by summing up the sizes of the BDDs in which a variable occurs.
3. Pick the two smallest BDDs for the variable with the smallest score.
4. If any BDD is larger than the size threshold, quantify the variable from the BDD(s) and go back to step 2.
5. If the BDDs are smaller than the threshold, do BDDAnd or BDDAndExists depending upon the case.
Fig. 1. VarScore algorithm for approximating existential abstraction
model checker produces an abstract path counterexample s̄m = ŝ0, ŝ1, . . ., ŝm. To check whether this counterexample holds on the concrete model M or not, we symbolically simulate M beginning with the initial state I(s0) using a fast SAT checker. At each stage of the symbolic simulation, we constrain the values of visible variables only according to the counterexample produced. The equation for symbolic simulation is:
(I(s0) ∧ (h(s0) = ŝ0)) ∧ (R(s0, s1) ∧ (h(s1) = ŝ1)) ∧ . . . ∧ (R(sm−1, sm) ∧ (h(sm) = ŝm))    (5)
Each h(si) is just a projection of the state si onto the visible variables. If this propositional formula is satisfiable, then we can successfully simulate the counterexample on the concrete machine to conclude that M ⊭ φ. The satisfying assignments to invisible variables, along with the assignments to visible variables produced by model checking, give a valid counterexample on the concrete machine. If this formula is not satisfiable, the counterexample is spurious and the abstraction needs refinement. Assume that the counterexample can be simulated up to the abstract state ŝf, but not up to ŝf+1 ([6,8]). Thus formula (6) is satisfiable while formula (7) is not satisfiable, as shown in Figure 2.
(I(s0) ∧ (h(s0) = ŝ0)) ∧ (R(s0, s1) ∧ (h(s1) = ŝ1)) ∧ . . . ∧ (R(sf−1, sf) ∧ (h(sf) = ŝf))    (6)
(I(s0) ∧ (h(s0) = ŝ0)) ∧ (R(s0, s1) ∧ (h(s1) = ŝ1)) ∧ . . . ∧ (R(sf, sf+1) ∧ (h(sf+1) = ŝf+1))    (7)
Using the terminology introduced in [6], we call the abstract state sˆf a failure state. The abstract state sˆf contains many concrete states given by all possible combinations of invisible variables, keeping the same values for visible variables as given by sˆf . The concrete states in sˆf reachable from the initial states following the spurious counterexample are called the dead-end states. The concrete states in sˆf that have a reachable set in sˆf +1 are called bad states. Because the
[Figure 2 shows the abstract trace ŝ0, ŝ1, ŝ2, . . ., ŝf, ŝf+1 above the corresponding concrete state sets h−1(ŝ0), h−1(ŝ1), . . ., h−1(ŝf+1); the failure state ŝf contains both the dead-end states reached by the concrete trace and the bad states, which can reach ŝf+1.]
Fig. 2. A spurious counterexample showing failure state [8]. No concrete path can be extended beyond the failure state.
dead-end states and the bad states are part of the same abstract state, we get the spurious counterexample. The refinement step then is to separate dead-end states and bad states by making a small subset of invisible variables visible. It is easy to see that the set of dead-end states are given by the values of state variables in the f th step for all satisfying solutions to Equation 6. Note that in symbolic simulation formulas, we have a copy of each state variable for each time frame. We do this symbolic simulation using the SAT checker Chaff [16]. We assume that there are concrete transitions which correspond to each abstract transition from sˆi to sˆi+1 , where 0 < i ≤ f . It is fairly straightforward to extend our algorithm to handle spurious abstract transitions. In this case, the set of bad states is not empty. Since s¯f is the shortest prefix that is unsatisfiable, there must be information passed through the invisible registers at time frame f in order for the SAT solver to prove the counterexample is spurious. Specifically, the SAT solver implicitly generates constraints on the invisible registers at time frame f based on either the last abstract transition or the prefix s¯f . Obviously the intersection of these two constraints on those invisible registers is empty. Thus the set of invisible registers that are constrained in time frame f during the SAT process is sufficient to separate deadend states and bad states after refinement. Therefore, our algorithm limits the refinement candidates to the registers that are constrained in time frame f . Equation 5 is exactly like symbolic simulation with Bounded Model Checking. The only difference is that the values of visible state variables at each step are constrained to the counterexample values. Since the original input variables to the system are unconstrained, we also constrain their values according to the abstract counterexample. This puts many constraints on the SAT formula. Hence, the SAT checker is able to prune the search space significantly. We rely on the ability of Chaff to identify important variables in this SAT check to separate dead-end and bad states, as described in the next section.
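A sketch of locating the failure state ŝf as the longest concretizable prefix is given below. Here prefix_is_concretizable is a placeholder for building formula (6) for a given prefix length and handing it to the SAT checker; it is not an actual API of Chaff.

def failure_index(abstract_trace, prefix_is_concretizable):
    """Return the largest f such that the prefix s_0 ... s_f can be
    simulated on the concrete machine (formula (6) satisfiable).  If the
    whole counterexample is concretizable, the last index is returned
    and the counterexample is real rather than spurious."""
    f = 0
    while f + 1 < len(abstract_trace) and prefix_is_concretizable(f + 1):
        f += 1
    return f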
while(1) {
    if (decide_next_branch()) {           // Branching
        while (deduce() == conflict) {    // Propagate implications
            blevel = analyse_conflict();  // Learning
            if (blevel == 0)
                return UNSAT;
            else
                backtrack(blevel);        // Non-chronological backtrack
        }
    } else                                // no branch means all vars
        return SAT;                       // have been assigned
}
Fig. 3. Basic DPLL backtracking search (used from [16] for illustration purposes)
4 SAT Based Refinement Heuristics
The basic framework for these SAT procedures is Davis-Putnam-Logemann-Loveland backtracking search, shown in Figure 3. The function decide_next_branch() chooses the branching variable at the current decision level. The function deduce() does Boolean constraint propagation to deduce further assignments. While doing so, it might infer that the present set of assignments to variables does not lead to any satisfying solution, leading to a conflict. In case of a conflict, new clauses are learned by analyse_conflict() that hopefully prevent the same unsuccessful search in the future. The conflict analysis also returns a variable for which another value should be tried. This variable may not be the most recently decided variable, leading to a non-chronological backtrack. If all variables have been decided, then we have found a satisfying assignment and the procedure returns.
The strength of various SAT checkers lies in their implementation of constraint propagation, decision heuristics, and learning. Modern SAT checkers work by introducing conflict clauses in the learning phase and by non-chronological backtracking. Implication graphs are used for Boolean constraint propagation. The vertices of this graph are literals, and each edge is labeled with the clause that forces the assignment. When a clause becomes unsatisfiable as a result of the current set of assignments (decision assignments or implied assignments), a conflict clause is introduced to record the cause of the conflict, so that the same futile search is never repeated. The conflict clause is learned from the structure of the implication graph. When the search backtracks, it backtracks to the most recent variable in the conflict clause just added, not to the variable that was assigned last.
For our purposes, note that Equation 7 is unsatisfiable, and hence there will be much backtracking. Hence, many conflict clauses will be introduced before the SAT checker concludes that the formula is unsatisfiable. A conflict clause records a reason for the formula being unsatisfiable.
The variables in a conflict clause are thus important for distinguishing between dead-end and bad states. The decision variable to which the search backtracks is responsible for the current conflict and hence is an important variable. We call the implication graph associated with each conflict a conflict graph. The source nodes of this graph are the variable decisions; the sink node is the conflicting assignment to one of the variables. At least one conflict clause is generated from a conflict graph. We propose the following two algorithms to identify important variables from conflict analysis and backtracking.
4.1 Refinement Based on Scoring Invisible Variables
We score invisible variables based on two factors: first, the number of times a variable gets backtracked to, and second, the number of times a variable appears in a conflict clause. Note that we adjust the first score by an exponential factor based on the decision level of the variable, because a variable at the root node can get at most two backtracks, while a variable at decision level dl can get 2^dl backtracks globally. Every time the SAT procedure backtracks to an invisible variable at decision level dl, we add the following quantity to its backtrack score.
    2^(|I|−dl) / c
We use c as a normalizing constant. For computing the second score, we keep a global conflict score counter for each variable and increment the counter each time the variable appears in a conflict clause. The method used for identifying conflict clauses from conflict graphs greatly affects SAT performance. As shown in [21], the most effective known method is the first unique implication point (1UIP) scheme, which we use for identifying conflict clauses. We then use a weighted average of these two scores to derive the final score as follows.

    w1 · backtrack score + w2 · conflict score     (8)
Note that the second factor is very similar to the VSIDS decision heuristic used in Chaff. The difference is that Chaff uses these per-variable global scores to make local decisions (about the next branching variable), while we use them to derive global information about important variables. Therefore, we do not periodically divide the variable scores as Chaff does. We also have to be careful to guide Chaff not to decide on the intermediate variables introduced while converting various formulae to CNF, which is the required input format for SAT checkers. This is done automatically in our method.
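A minimal sketch of the bookkeeping this heuristic needs, written in Python; this is our own illustration and not the instrumentation actually added to Chaff, and the hook names and the constants C_NORM, W1, W2, num_invisible are made up for the example.

    # Illustrative bookkeeping for the scoring heuristic of Section 4.1.
    from collections import defaultdict

    backtrack_score = defaultdict(float)
    conflict_score = defaultdict(int)

    C_NORM = 64.0        # the normalizing constant c (value assumed)
    W1, W2 = 1.0, 1.0    # the weights w1, w2 of Equation 8 (values assumed)

    def on_backtrack(var, dl, num_invisible):
        # Called whenever the solver backtracks to invisible variable `var`
        # at decision level dl; deeper levels are discounted exponentially.
        backtrack_score[var] += (2 ** (num_invisible - dl)) / C_NORM

    def on_conflict_clause(clause):
        # Called for every learned conflict clause (a list of signed literals).
        for lit in clause:
            conflict_score[abs(lit)] += 1

    def final_score(var):
        # Weighted combination of Equation 8.
        return W1 * backtrack_score[var] + W2 * conflict_score[var]

    def refinement_candidates(invisible_vars, k):
        # The k highest-scoring invisible registers become visible.
        return sorted(invisible_vars, key=final_score, reverse=True)[:k]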
4.2 Refinement Based on Conflict Dependency Graph
The choice of which invisible registers to make visible is the key to the success of the refinement algorithm. Ideally, we want this set of registers to be small and still
be able to prevent the spurious trace. Obviously, the set of registers appearing in the conflict graphs during the checking of the counterexample could prevent the spurious trace. However, this set can be very large. We will show here that it is unnecessary to consider all conflict graphs.

Dependencies between Conflict Graphs. We call the implication graph associated with a conflict a conflict graph. At least one conflict clause is generated from a conflict graph.

Definition 3. Given two conflict graphs A and B, if at least one of the conflict clauses generated from A labels one of the edges in B, then we say that conflict B directly depends on conflict A.

For example, consider the conflicts depicted in the conflict graphs of Figure 4. Suppose that at a certain stage of the SAT checking, conflict graph A is generated. This produces the conflict clause ω9 = (¬x9 + x11 + ¬x15). We are using the first UIP (1UIP) learning strategy [21] to identify the conflict clause here. This conflict clause can be rewritten as x9 ∧ ¬x11 → ¬x15. In the other conflict graph B, clause ω9 labels one of the edges, and forces variable x15 to be 0. Hence, we say that conflict graph B directly depends on conflict graph A.
Fig. 4. Two dependent conflict graphs. Conflict B depends on conflict A, as the conflict clause ω9 derived from the conflict graph A produces conflict B.
Given the set of conflict graphs generated during satisfiability checking, we construct the unpruned conflict dependency graph as follows:
– Vertices of the unpruned dependency graph are all conflict graphs created by the SAT algorithm.
– Edges of the unpruned dependency graph are direct dependencies.
Figure 5 shows an unpruned conflict dependency graph with five conflict graphs. A conflict graph B depends on another conflict graph A, if vertex A is reachable from vertex B in the unpruned dependency graph. In Figure 5, conflict graph E depends on conflict graph A. When the SAT algorithm detects unsatisfiability, it terminates with the last conflict graph corresponding to the last conflict. The subgraph of the unpruned conflict dependency graph on which the last conflict graph depends is called the conflict dependency graph. Formally, Definition 4. The conflict dependency graph is a subgraph of the unpruned dependency graph. It includes the last conflict graph and all the conflict graphs on which the last one depends.
Fig. 5. The unpruned dependency graph and the dependency graph (within dotted lines)
In Figure 5, conflict graph E is the last conflict graph, hence the conflict dependency graph includes conflict graphs A, C, D, E. Thus, the conflict dependency graph can be constructed from the unpruned dependency graph by any directed graph traversal algorithm for reachability. Typically, many conflict graphs can be pruned away in this traversal, so that the dependency graph becomes much smaller than the unpruned dependency graph. Intuitively, all SAT decision strategies are based on heuristics. For a given SAT problem, the initial set of decisions/conflicts a SAT solver comes up with may not be related to the final unsatisfiability result. Our dependency analysis helps to remove that irrelevant reasoning. Generating Conflict Dependency Graph Based on Zchaff. We have implemented the conflict dependency analysis algorithm on top of zchaff [21], which has a powerful learning strategy called first UIP (1UIP). Experimental results from [21] show that 1UIP is the best known learning strategy. In 1UIP, only one conflict clause is generated from each conflict graph, and it only includes those implications that are closer to the conflict. Refer to [21] for the details. We have built our algorithms on top of 1UIP, and we restrict the following discussions to the case that only one conflict clause is generated from a conflict graph. Note here that the algorithms can be easily adapted to other learning strategies.
After SAT terminates with unsatisfiability, our pruning algorithm starts from the last conflict graph. Based on the clauses contained in this conflict graph, the algorithm traverses the other conflict graphs that this one depends on. The result of this traversal is the pruned dependency graph.

Identifying Important Variables. The dependency graph records the reasons for unsatisfiability. Therefore, only the variables appearing in the dependency graph are important. Instead of collecting all the variables appearing in any conflict graph, those in the dependency graph are sufficient to disable the spurious counterexample. Suppose s¯f+1 = sˆ0, sˆ1, . . . , sˆf+1 is the shortest prefix of a spurious counterexample that cannot be simulated on the concrete machine. Recall that sˆf is the failure state. During the satisfiability checking of s¯f+1, we generate an unpruned conflict dependency graph. When Chaff terminates with unsatisfiability, we collect the clauses from the pruned conflict dependency graph. Some of the literals in these clauses correspond to invisible registers at time frame f. Only those portions of the circuit that correspond to the clauses contained in the pruned conflict dependency graph are necessary for the unsatisfiability. Therefore, the candidates for refinement are the invisible registers that appear at time frame f in the conflict dependency graph.

Refinement Minimization. The set of refinement candidates identified from conflict analysis is usually not minimal, i.e., not all registers in this set are required to invalidate the current spurious abstract counterexample. To remove those that are unnecessary, we have adapted the greedy refinement minimization algorithm of [19]. The algorithm in [19] has two phases. The first phase is the addition phase, where a set of invisible registers that suffices to disable the spurious abstract counterexample is identified. In the second phase, a minimal subset of registers that is necessary to disable the counterexample is identified. Their algorithm tries to see whether removing a newly added register from the abstract model still disables the abstract counterexample. If that is the case, this register is unnecessary and is no longer considered for refinement. In our case, we only need the second phase of the algorithm. The set of refinement candidates provided by our conflict dependency analysis algorithm already suffices to disable the current spurious abstract counterexample. Since the first phase of their algorithm takes at least as long as the second phase, this should speed up our minimization algorithm considerably.
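A rough sketch of the pruning and the greedy second-phase minimization follows; it is our own illustration, not the zchaff instrumentation. A conflict graph is assumed to carry the clauses labelling its edges in .edge_clauses and the 1UIP clause it produced in .learned_clause, frame_of maps a variable to its time frame, and disables_counterexample() is an assumed oracle that model checks a candidate abstraction.

    # Sketch only: dependency pruning and greedy minimization (assumed names).
    def prune_dependency_graph(conflict_graphs):
        produced_by = {g.learned_clause: g for g in conflict_graphs}
        keep, stack = set(), [conflict_graphs[-1]]     # start from last conflict
        while stack:
            g = stack.pop()
            if id(g) in keep:
                continue
            keep.add(id(g))
            for clause in g.edge_clauses:              # direct dependencies
                if clause in produced_by:
                    stack.append(produced_by[clause])
        return [g for g in conflict_graphs if id(g) in keep]

    def refinement_candidates(pruned, invisible_regs, frame_of, f):
        # Invisible registers occurring at time frame f in the pruned graph.
        cands = set()
        for g in pruned:
            for clause in list(g.edge_clauses) + [g.learned_clause]:
                for var in clause.variables:
                    if var in invisible_regs and frame_of[var] == f:
                        cands.add(var)
        return cands

    def minimize(candidates, disables_counterexample):
        # Greedy second phase adapted from [19]: drop any register whose
        # removal still disables the spurious abstract counterexample.
        kept = set(candidates)
        for r in sorted(candidates):
            if disables_counterexample(kept - {r}):
                kept.discard(r)
        return kept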
5 Experimental Results
We have implemented our abstraction refinement framework on top of the NuSMV model checker [5]. We modified the SAT checker Chaff to compute heuristic scores, to produce conflict dependency graphs, and to do incremental SAT. The IU-p1 benchmark was verified by conflict analysis based refinement on a SunFire 280R machine with two 750 MHz UltraSparc III CPUs and 8 GB of RAM running Solaris. All other experiments were performed on a dual 1.5 GHz Athlon machine
with 3 GB of RAM running Linux. The experiments were performed on two sets of benchmarks. The first set of benchmarks, in Table 1, are industrial benchmarks obtained from various sources. The benchmarks IU-p1 and IU-p2 refer to the same circuit, IU, but different properties are checked in each case. This circuit is an integer unit of a picoJava microprocessor from Sun. The D series benchmarks are from a processor design. The properties verified were simple AG properties. The property for IU-p2 has 7 registers, while IU-p1 and the D series circuits have only one register in the property. The circuits in Table 2 are various abstractions of the IU circuit. The property being verified has 17 registers. They are smaller circuits that are easily handled by our methods, but they have been shown to be difficult to handle by Cadence SMV [8]. We include these results here to compare our methods with the results reported in [8] for property 2. We do not report the results for property 1 in [8] because it is too trivial (all counterexamples can be found in 1 iteration). It is interesting to note that all benchmarks but IU-p1 and IU-p2 have a valid counterexample.

Table 1. Comparison between Cadence SMV (CSMV), heuristic score based refinement, and dependency analysis based refinement for larger circuits. The experiment marked with a ∗ was performed on the SunFire machine with more memory because of a length 72 abstract counterexample encountered.

circuit  # regs  ctrex length   CSMV time   Heuristic Score          Dependency
                                            time    iters  # regs    time    iters  # regs
D2         105       15             152        105    10      51         79    11      39
D5         350       32           1,192         29     3      16       38.2     8      10
D6         177       20          45,596        784    24     121        833    48      90
D18        745       28          >4 hrs     12,086    69     346      9,995   142     253
D20        562       14          >7 hrs      1,493    56     281      1,947    74     265
D24        270       10           7,850         14     1       6          8     1       4
IU-p1     4855      true              -      9,138    22     107     3,350*    13      19
IU-p2     4855      true              -      2,820     7      36        712     6      13
In Table 1, we compare our methods against the BDD based model checker Cadence SMV (CSMV). We enabled cone of influence reduction and dynamic variable reordering in Cadence SMV. The performance of “vanilla” NuSMV was worse than Cadence SMV, hence we do not report those numbers. We report total running time, number of iterations and the number of registers in the final abstraction. The columns labeled with “Heuristic Score” report the results with our heuristic variable scoring method. We introduce 5 latches at a time in this method. The columns labeled with “Dependency” report the results of our dependency analysis based refinement. This method employs pruning of candidate refinement sets. A “-” in a cell indicates that the model checker ran out of memory. Table 2 compares our methods against those reported in [8] on IU series benchmarks for verifying property 2.
Table 2. Comparison between [8], heuristic score based refinement, and dependency analysis based refinement for smaller circuits.

circuit  # regs  ctrex length   [8] time   Heuristic Score         Dependency
                                           time   iters  # regs    time   iters  # regs
IU30       30        11             6.5      2.3     2     27        1.9     4     20
IU35       35        20            11        8.9     2     27       10.4     5     21
IU40       40        20            16.1     28.4     3     32       13.3     6     22
IU45       45        20            22.1     32.9     3     32       25       6     22
IU50       50        20            85.1     36       3     32       32.8     6     22
IU55       55        11           130.5     43       2     27       61.9     4     20
IU60       60        11           153.4     52.8     2     27       65.5     4     20
IU65       65        11           167.7     50.3     2     27       67.5     4     20
IU70       70        11           167.1     55.6     2     27       71.4     4     20
IU75       75        11               -     38.5     4     37       15.7     5     21
IU80       80        11               -     47.1     4     37       21.1     5     21
IU85       85        11               -     44.7     4     37       24.6     5     21
IU90       90        11               -     49.9     4     37       24.3     5     21
We can see that our conflict dependency analysis based method outperforms a standard BDD based model checker, the method reported in [8] and the heuristic score based method. We also conclude that the computational overhead of our dependency analysis based method is well justified by the smaller abstractions that it produces. The variable scoring based method does not enjoy the benefits of reduced candidate refinement sets obtained through dependency analysis. Therefore, it results in a coarser abstraction in general. The heuristic based refinement method adds 5 registers at a time, resulting in some uniformity in the final number of registers, especially evident in Table 2. Due to the smaller number of refinement steps it performs, the total time it has to spend in model checking abstract machines may be smaller (as for D5, D6, D20, IU60, IU65, IU70).
6 Related Work
Our work compares most closely to that presented in [6] and, more recently, [8]. There are three major differences between our work and [6]. First, their initial abstraction is based on predicate abstraction, where a new set of program variables is generated representing various predicates. They symbolically generate and manipulate these abstractions with BDDs. Our abstraction is based on hiding certain parts of the circuit. This yields an easier way to generate abstractions. Secondly, the biggest bottleneck in their method is the use of BDD based image computations on concrete systems for validating counterexamples. We use symbolic simulation based on SAT to accomplish this task, as in [8]. Finally, their refinement is based on splitting the variable domains. The problem of finding the coarsest refinement is shown to be NP-hard in [6]. Because our abstraction functions are simpler, we can identify refinement variables during the SAT
checking phase. We do not need to solve any other problem for refinement. We differ from [8] in three aspects. First, we propose to remove invisible variables from abstract systems on the fly by quantification. This reduces the complexity of BDD based model checking of abstract systems. Leaving a large number of input variables in the system makes it very difficult to model check even an abstract system [19]. Secondly, computation overhead for our separation heuristics is minimal. In their approach, refinement is done by separating dead-end and bad states (sets of concrete states contained in the failure state) with ILP solvers or machine learning. This requires enumerating all dead-end and bad states or producing samples of these states and separating them. We avoid this step altogether and cheaply identify refinement variables from the analysis of a single SAT check that is already done. We do not claim any optimality on the number of variables, however, this is a small price to pay for efficiency. We have been able to handle a circuit with about 5000 variables in cone of influence of the specification. Finally, we believe our method can identify a better set of invisible registers for refinement. Although [8] uses optimization algorithms to minimize the number of registers to refine, their algorithm relies on sampling to provide the candidate separation sets. When the size of the problem becomes large, there could be many possible separation sets. Our method is based on SAT conflict analysis. The Boolean constraint propagation (BCP) algorithm in a SAT solver naturally limits the number of candidates that we will need to consider. We use conflict dependency analysis to reduce further the number of candidates for refinement. The work of [10] focuses on algorithms to refine an approximate abstract transition relation. Given a spurious abstract transition, they combine a theorem prover with a greedy strategy to enumerate the part of the abstract transition that does not have corresponding concrete transitions. The identified bad transition is removed from the current abstract model for refinement. Their enumeration technique is potentially expensive. More importantly, they do not address the problem of how to refine abstract predicates. Previous work on abstraction by making variables invisible includes the localization reduction of Kurshan [13] and other techniques (e.g. [1,14]). Localization reduction begins with the set of variables in the property as visible variables. The set of variables adjacent to the present set of visible variables in the variable dependency graph are chosen as the candidates for refinement. Counterexamples are analyzed in order to choose variables among these candidates. The work presented in [19] combines three different engines (BDD, ATPG and simulation) to handle large circuits using abstraction and refinement. The main difference between our method and that in [19] is the strategy for refinement. In [19], candidates for refinement are based on those invisible registers that get assigned in the abstract counterexample. In our approach, we intentionally throw away invisible registers in the abstract counterexample, and rely on our SAT conflict analysis to select the candidates. We believe there are two advantages to disallowing invisible registers in the abstract counterexample. First of all, generating an abstract counterexample is computationally expensive, when the number of invisible registers is large. 
In fact, for efficiency reasons, a BDD/ATPG hybrid engine is used in [19] to model check the abstract model. By quantifying
the invisible variables early, we avoid this bottleneck. More importantly, in [19], invisible registers are free inputs in the abstract model, so their values are totally unconstrained. When checking such an abstract counterexample on the concrete machine, it is more likely to be spurious. In our case, the abstract counterexample only includes assignments to the visible registers, and hence a real counterexample can be found more cheaply.
7 Conclusions
We have presented an effective and practical automatic abstraction refinement framework based on our novel SAT based conflict analysis. We have described a simple variable scoring heuristic as well as an elaborate conflict dependency analysis for identifying important variables. Our schemes are able to handle large industrial scale designs. Our work highlights the importance of using SAT based methods for handling large circuits. We believe these techniques complement bounded model checking in that they enable us to handle true specifications efficiently. An obvious extension of our framework is to handle all ACTL* formulae. We believe this can be done as in [9]. Further experimental evaluation will help us fine tune our procedures. We can also use circuit structure information to accelerate the SAT based simulation of counterexamples, for example, by identifying replicated clauses. We are investigating the use of the techniques described in this paper for software verification. We already have a tool for extracting a Boolean program from an ANSI C program by using predicate abstraction.

Acknowledgements. We would like to thank Ofer Strichman for providing us with some of the larger benchmark circuits. We would also like to acknowledge the anonymous reviewers for carefully reading the paper and making useful suggestions.
References [1] Felice Balarin and Alberto L. Sangiovanni-Vincentelli. An iterative approach to language containment. In Proceedings of CAV’93, pages 29–40, 1993. [2] Armin Biere, Alexandro Cimatti, Edmund M. Clarke, and Yunshan Zhu. Symbolic model checking without BDDs. In Proceedings of Tools and Algorithms for the Analysis and Construction of Systems (TACAS’99), number 1579 in LNCS, 1999. [3] Pankaj Chauhan, Edmund M. Clarke, Somesh Jha, Jim Kukula, Tom Shiple, Helmut Veith, and Dong Wang. Non-linear quantification scheduling in image computation. In Proceedings of ICCAD’01, pages 293–298, November 2001. [4] Pankaj Chauhan, Edmund M. Clarke, Somesh Jha, Jim Kukula, Helmut Veith, and Dong Wang. Using combinatorial optimization methods for quantification scheduling. In Tiziana Margaria and Tom Melham, editors, Proceedings of CHARME’01, volume 2144 of LNCS, pages 293–309, September 2001. [5] A. Cimatti, E. M. Clarke, F. Giunchiglia, and M. Roveri. NuSMV: A new Symbolic Model Verifier. In N. Halbwachs and D. Peled, editors, Proceedings of the International Conference on Computer-Aided Verification (CAV’99), number 1633 in Lecture Notes in Computer Science, pages 495–499. Springer, July 1999.
[6] E. M. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith. Counterexample-guided abstraction refinement. In E. A. Emerson and A. P. Sistla, editors, Proceedings of CAV, volume 1855 of LNCS, pages 154–169, July 2000. [7] E. M. Clarke, O. Grumberg, and D. Peled. Model Checking. MIT Press, 2000. [8] Edmund Clarke, Anubhav Gupta, James Kukula, and Ofer Strichman. SAT based abstraction-refinement using ILP and machine learning techniques. In Proceedings of CAV’02, 2002. To appear. [9] Edmund Clarke, Somesh Jha, Yuan Lu, and Helmut Veith. Tree-like counterexamples in model checking. In Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science (LICS’02), 2002. To appear. [10] Satyaki Das and David Dill. Successive approximation of abstract transition relations. In Proceedings of the 16th Annual IEEE Symposium on Logic in Computer Science (LICS’01), 2001. [11] Shankar G. Govindaraju and David L. Dill. Counterexample-guided choice of projections in approximate symbolic model checking. In Proceedings of ICCAD’00, San Jose, CA, November 2000. [12] P.-H. Ho, T. Shiple, K. Harer, J. Kukula, R. Damiano, V. Bertacco, J. Taylor, and J. Long. Smart simulation using collaborative formal and simulation engines. In Proceedings of ICCAD’00, November 2000. [13] R. Kurshan. Computer-Aided Verification of Co-ordinating Processes: The Automata-Theoretic Approach. Princeton University Press, 1994. [14] J. Lind-Nielsen and H. Andersen. Stepwise CTL model checking of state/event systems. In N. Halbwachs and D. Peled, editors, Proceedings of the International Conference on Computer Aided Verification (CAV’99), 1999. [15] David E. Long. Model checking, abstraction and compositional verification. PhD thesis, Carnegie Mellon University, 1993. CMU-CS-93-178. [16] Matthew W. Moskewicz, Conor F. Madigan, Ying Zhao, Lintao Zhang, and Sharad Malik. Chaff: Engineering an efficient SAT solver. In Proceedings of the Design Automation Conference (DAC’01), pages 530–535, 2001. [17] Abelardo Pardo and Gary D. Hachtel. Incremental CTL model checking using BDD subsetting. In Proceedings of the Design Automation Conference (DAC’98), pages 457–462, June 1998. [18] J. P. Marques Silva and K. A. Sakallah. GRASP: A new search algorithm for satisfiability. Technical Report CSE-TR-292-96, Computer Science and Engineering Division, Department of EECS, Univ. of Michigan, April 1996. [19] Dong Wang, Pei-Hsin Ho, Jiang Long, James Kukula, Yunshan Zhu, Tony Ma, and Robert Damiano. Formal property verification by abstraction refinement with formal, simulation and hybrid engines. In Proceedings of the DAC, pages 35–40, 2001. [20] Hantao Zhang. SATO: An efficient propositional prover. In Proceedings of the Conference on Automated Deduction (CADE’97), pages 272–275, 1997. [21] Lintao Zhang, Conor F. Madigan, Matthew W. Moskewicz, and Sharad Malik. Efficient conflict driven learning in a Boolean satisfiability solver. In Proceedings of ICCAD’01, November 2001.
Simplifying Circuits for Formal Verification Using Parametric Representation

In-Ho Moon1, Hee Hwan Kwak1, James Kukula1, Thomas Shiple2, and Carl Pixley1

1 Synopsys Inc., Hillsboro, OR
2 Synopsys Inc., Grenoble, France
{mooni,hkwak,kukula,shiple,cpixley}@synopsys.com
Abstract. We describe a new method to simplify combinational circuits while preserving the set of all possible values (that is, the range) on the outputs. The method is performed iteratively and on the fly while building BDDs of the circuits. It is composed of three steps: 1) identifying a cut in the circuit, 2) identifying a group of nets within the cut, and 3) replacing the logic driving the group of nets in such a way that the range of values for the entire cut is unchanged and, hence, the range of values on the circuit outputs is unchanged. Hence, we parameterize the circuit in such a way that the range is preserved and the representation is much more efficient than the original circuit. Actually, these replacements are not done in terms of logic gates but in terms of BDDs directly. This is made possible by a new generalized parametric representation algorithm that deals with both input and output variables at the same time. We applied this method to combinational equivalence checking, and the experimental results show that this technique outperforms an existing related method which replaces one logic net at a time. We also prove that the previous method is a special case of ours. This technique can be applied to various other problem domains such as symbolic simulation and image computation in model checking.
1 Introduction
Given a complex Boolean expression that defines a function from an input bit vector to an output bit vector, one can compute by a variety of methods the range of output values that the function can generate. This range computation has a variety of applications such as equivalence checking and model checking. BDDs (Binary Decision Diagrams [4]) and SAT (Satisfiability [13,19]) are two major techniques that can be used to perform the computation. In this paper we present a new BDD-based method, and describe its use in equivalence checking. However, this new method can also be applied to other areas.

The Boolean equivalence checking problem is to determine whether two circuits are equivalent. Typically, the circuits are at different levels of abstraction: one is a reference design and the other is its implementation. Equivalence checking is used intensively in industrial design and is a mature problem. However, there are still many real designs that current state-of-the-art equivalence checking tools cannot verify. BDD-based equivalence checking is trivial if the BDD size does not grow too large, but that is not the case for most real designs. Therefore the cut-based method [2,14,10] has been used to avoid building huge monolithic BDDs. The cut-based method
introduces free variables for the nets in a cut, causing the false negative problem [2] since we lose the correlations on the free variables. When the verification result is false, this method has to resolve the false negatives by composing the free variables with their original functions. Even though this method has been used successfully, it still suffers from false negative resolutions that are very expensive and infeasible in many cases in real designs. To overcome the false negative problem, Moondanos et al. proposed the normalized function method [18]. Instead of simply introducing a free variable for a net on a cut, the function driving the net is replaced with a simplified function which preserves the range of values on the cut. This simplified function is called a normalized function. However we have observed that the normalized function is not optimal and we have generalized the normalized function not to have redundant variables, as explained in Section 4. A similar approach to the normalized function has been presented by Cerny and Mauras [6], which uses cross-controllability and cross-observability to compute the range of a cut from primary inputs, and the reverse range from primary outputs. Then equivalence checking can be done by checking whether the reverse range covers the range. In this method, once a set of gates is composed to compute the range, the variables feeding only the gates are quantified, just as the fanout-free variables are quantified in the normalized function. However this method suffers from BDD blowup since the range computation is expensive and the range of a cut represented by BDDs is very large in general. In this paper we present a new method to simplify circuits while preserving the range of all outputs. The method makes the work of Cerny and Mauras practical and also extends normalized functions to apply to a set of nets in a cut, instead of a single net. The new method is performed iteratively and on the fly while building BDDs of the circuits and is composed of three steps; 1) identifying a cut in the circuit, 2) identifying a group of nets within the cut, 3) replacing the logic driving the group of nets in such a way that the range of values for the entire cut is unchanged and, hence, the range of values on circuit outputs is unchanged. We apply the range computation selectively by first identifying the group to be replaced in step 2) and then estimating the feasibility and the gain from the computation in step 3). Furthermore once the range is computed, we do not keep the computed range as Cerny and Mauras do. Instead we try to get a simplified circuit from the range by using a parametric representation [1,11]. We also prove that the normalized function method is a special case of our method. Parametric representation has been used to model the verification environment based on design constraints [1,11]. Various parametric representations of Boolean expressions have been discussed in [5,7,8,9,1,11]. Parametric representation using BDDs was introduced by Coudert et al.[7,8] and improved by Aagaard et al.[1]. The authors in [1] proposed a method to generate the parameterized outputs as BDDs from the constraints represented by a single BDD [1]. However this method can deal with only the output variables of the environment, in other words the variables do not depend on the states of the design. Kukula and Shiple presented a method to deal with the output variables as well as the input variables that depend on the states of the design [11]. 
However this method takes the environment represented by a relation BDD and generates the parameterized outputs as circuits instead of BDDs.
In this paper we also present a generalized approach of the parametric representations to deal with the input and output variables as well as to generate the parameterized outputs as BDDs. We also identify that the method in [1] is a special case of the one in [11] in the framework of our generalized approach. Combining the range computation and the generalized parametric representation makes more efficient and compact representations of the circuits under verification so that the circuits can be easily verified. This approach can be applied to not only equivalence checking but also symbolic simulation as well as image computation. The rest of the paper is organized as follows. Section 2 reviews background material and Section 3 discusses prior work. We present our algorithm to find sets of variables for early quantification in Section 4. Section 5 shows the overall algorithm for equivalence checking and compares ours to the prior work. Section 6 describes a special type of range computation and Section 7 presents our methods for parametric representation. Section 8 shows the relationship between normalization and parameterization. Experimental results are shown in Section 9 and we conclude with Section 10.
2 Preliminaries
Image computation is finding all successor states from a given set of states in one step and is a key step in model checking to deal with sequential circuits [15,17,16]. Let x and y be the sets of present and next state variables and w be the set of primary input variables. Suppose we have a transition relation T(x, w, y) that represents all transitions, being true of just those triples of a, b, and c such that there is a transition from state a to state c, labeled by input b. The image I(y) for a given set of states C(x) is formally defined as

    I(y) = Image(T, C) = ∃x,w. T(x, w, y) ∧ C(x) .

Range computation is a special type of image computation where C(x) is the universe; in other words, it finds all possible successor states in a transition system. The range R(y) is defined as

    R(y) = Range(T) = Image(T, 1) = ∃x,w. T(x, w, y) .     (1)
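As a toy illustration of these two definitions (our own sketch, using an explicitly enumerated relation of (x, w, y) triples instead of BDDs; the names image and range_of are not from any BDD package):

    # Illustrative only: image and range computation on an explicit relation.
    def image(T, C):
        # I(y) = exists x, w . T(x, w, y) and C(x)
        return {y for (x, w, y) in T if x in C}

    def range_of(T):
        # R(y) = Image(T, 1) = exists x, w . T(x, w, y)
        return {y for (_, _, y) in T}

    # Tiny example: two states, one input bit.
    T = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)}
    print(image(T, {0}))   # {0, 1}
    print(range_of(T))     # {0, 1}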
3 Related Work

3.1 Normalization
To overcome the false negative problem in BDD-based equivalence checking, Moondanos et al. proposed a normalization method [18]. The authors split the set of input variables of the current cut into N and R. N is the set of fanout-free variables, in other words, the variables feeding only one net in the cut. R is the set of fanout variables that fan out to more than one net in the cut. Then, the function F of a net can be simplified without causing false negatives by using its normalized function, which preserves the range of the cut. To make the normalized function of F, the possible term Fp and the forced term Ff of F are defined as below.

    Fp(R) = ∃N. F(R, N)
    Ff(R) = ∀N. F(R, N)
Then the normalized function Fnorm is defined by

    Fnorm = (v ∧ Fp) ∨ Ff = (v ∧ ∃N. F(R, N)) ∨ ∀N. F(R, N) ,     (2)
where v is an eigenvariable that is newly introduced.
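As a small illustrative example of our own (not taken from [18]): if F(R, N) = a ∧ b with R = {a} and N = {b}, then Fp = ∃b. (a ∧ b) = a and Ff = ∀b. (a ∧ b) = 0, so the normalized function is Fnorm = (v ∧ a) ∨ 0 = v ∧ a.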
3.2 Parameterization with Output Variables
Parametric representation using BDDs was introduced by Coudert et al. [7,8] and improved by Aagaard et al. [1]. The authors in [1] used the parametric representation to make the verification environment from the input constraints of the design under verification. Thus only output variables of the environment are considered, since there is no constraint relating the states of the design. The basic idea is that each variable is parameterized with three cases for each path from the root to the leaf of the constraint BDD. However, this operation is performed implicitly by calling Param recursively and by using a cache. The three cases are 1) the positive cofactor of a node is empty, 2) the negative cofactor is empty, and 3) both cofactors are non-zero; BDD_ZERO, BDD_ONE, or a parametric variable is assigned for each case, respectively. Then, the sub-results from the two children of a branch are merged by bdd_ite operations from bottom to top.
3.3 Parameterization with Input and Output Variables
Kukula and Shiple proposed a method of parametric representation to deal with the input and output variables of the verification environment [11]. The input variables depend on the states of the design under verification. This method generates circuits from the BDD relation representing the environment. The conceptual procedure consists of three phases as follows.

– Phase 1 (DFS): Finds all paths to constant 1 for each child of each node, through bottom-up traversal from the leaf node to the root node.
– Phase 2 (DFS): Propagates signals from the root node to the leaf node to activate a single path from root to leaf.
– Phase 3: Computes the circuit output for each output variable.

This method uses two template modules, one for input variables and the other for output variables. Using the template modules, the parameterized output circuit is generated by the following procedure.

1. Replace all BDD nodes for input and output variables with the pre-defined input and output template modules, respectively.
2. Connect the pins of the modules through Phase 1 and 2.
3. Produce outputs using a mux for each output variable.
3.4 Cross-Controllability and Cross-Observability Relations
Cerny and Mauras have used cross-controllability and cross-observability for equivalence checking [6]. Suppose we make a cut in a circuit containing two outputs of the implementation and the specification, namely yI and yS, respectively. Let x be the set of input variables and u and v be the sets of cut variables in the implementation and the specification, respectively. We then compute I1, the relation between u and x, and S1, the relation between v and x. Then cross-controllability is defined as

    Cross-controllability(u, v) = ∃x. (I1(u, x) ∧ S1(v, x)) .

We can see that the cross-controllability is the range of the cut. Similarly, we compute I2, the relation between yI and u, and S2, the relation between yS and v. Then cross-observability is defined as

    Cross-observability(u, v) = ∃y. (I2(u, y) ∧ S2(v, y)) .

We can also see that the cross-observability is the reverse range of the two outputs in terms of u and v. Then equivalence checking can be done by

    Cross-controllability(u, v) ≤ Cross-observability(u, v) .     (3)
The authors proposed three different checking strategies, one of which is the forward sweep. In this strategy, the cut is placed at the primary outputs, and the cross-controllability of the cut is computed by composing gates iteratively from the inputs to the outputs in such a way as to eliminate any local variables that feed only some of the gates to be composed. When all gates are composed, Equation 3 is applied with trivial cross-observability.
4 Definition of K and Q Sets
In this section, we start with an example to show that the method in Section 3.1 can introduce redundant variables. Then we define the set of variables we can quantify early so as not to have those redundant variables when simplifying the functions in a given cut. Furthermore, we extend the definition to handle a group in the cut.

Consider two functions f and g in terms of the variables a, b, and c.

    f = (a ∧ ¬b) ∨ (¬a ∧ b)
    g = a ∧ c

Then, from the normalization method, R becomes {a} and N becomes {b, c}. The normalized functions for f and g are as below.

    fnorm = v1
    gnorm = a ∧ v2

In this example, it is easy to see that the variable a is redundant in gnorm, since the variable a occurs only in gnorm. Actually the range of {f, g} is tautologous. So gnorm could be just v2, which is optimum. This is because, even though the variable a fans out to both f and g, the effect of the signal a on f is blocked by the signal b, which is non-reconvergent. Therefore, we can move the signal a into N in this case, so that we can quantify even a.

Now we formally define K and Q for a cut. K is the set of variables to keep in the simplified functions and Q is the set of variables to quantify out. Let F = {fi, 0 ≤ i < n} be the set of functions in the cut, and let V be the set of variables that the functions in F depend on. For each f ∈ F we define

    f∃ = ∃N. f(V)     (4)
    f∀ = ∀N. f(V)     (5)
where N is the set of fanout-free variables in V as shown in Section 3.1. Let us also define a function non_blocked(F, v) that returns the number of functions in F whose f∃ or f∀ contains the variable v. We then define K and Q as below.

    K = {v | v ∈ R ∧ non_blocked(F, v) > 1}     (6)
    Q = V \ K ,     (7)
where R is the set of fanout variables in V. Then K and Q can be used instead of R and N in exactly the same way as in the method in Section 3.1. We can further optimize the size of K and Q by a fixpoint computation, as shown in Figure 1, so that we can quantify more variables as early as possible. In Figure 1, Line 1 computes the initial K and Q by assigning R and N, respectively. Line 2 computes the quantified functions using Equations 4 and 5 for each function. For each variable in K, Equation 6 is tested in Line 4. If the condition is not satisfied, the variable is moved from K to Q and is quantified from each f∃ and f∀. The do-while loop in Line 3 is continued until K and Q reach the fixed point. Using this K and Q, Equation 2 can be improved by

    F̃norm = (v ∧ ∃Q. F) ∨ ∀Q. F .     (8)
By applying the new normalized function to the example, we get the optimally normalized functions f̃norm = v1 and g̃norm = v2.

Now we extend K and Q to simplify many functions in a selected group at once in the cut. Since some variables in K can feed only the nets in the group, these variables can be quantified when we build the relation of the group. This technique is already applied in [6]. The extended K and Q for the group are defined as

    KG = K \ L
    QG = Q ∪ L ,

where L is the set of variables in K that feed only the nets in the group and the variables in L are local, meaning that the variables do not affect the other functions outside the group.
FindFixpointKQ(F, V) {
1   Find the initial K and Q
2   for each (f ∈ F)
        Compute f∃ and f∀ by quantifying Q
3   do {
        Gain = 0
4       for each (k ∈ K) {
            if (non_blocked(F, k) <= 1) {
                Gain++
                K = K \ k
                Q = Q ∪ k
                Update each f∃ and f∀ by quantifying k
            }
        }
    } while (Gain > 0)
    return K and Q
}

Fig. 1. Procedure to find the fixpoint K.
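The following Python sketch (our own illustration, not the actual implementation; functions are represented as callables over variable assignments and everything is computed by brute-force enumeration) runs the fixpoint of Figure 1 on the f, g example of this section.

    # Sketch of Figure 1 on the example; names exists/forall/depends_on are ours.
    from itertools import product

    def exists(f, qvars):
        # (exists qvars . f) as a new Boolean function
        return lambda env: any(f({**env, **dict(zip(qvars, vals))})
                               for vals in product([0, 1], repeat=len(qvars)))

    def forall(f, qvars):
        # (forall qvars . f) as a new Boolean function
        return lambda env: all(f({**env, **dict(zip(qvars, vals))})
                               for vals in product([0, 1], repeat=len(qvars)))

    def depends_on(f, v, allvars):
        # True if f is not constant in v (brute-force check)
        others = [u for u in allvars if u != v]
        for vals in product([0, 1], repeat=len(others)):
            env = dict(zip(others, vals))
            if f({**env, v: 0}) != f({**env, v: 1}):
                return True
        return False

    def find_fixpoint_kq(F, K, Q, allvars):
        while True:
            fe = [exists(f, Q) for f in F]    # f-exists of each function
            fa = [forall(f, Q) for f in F]    # f-forall of each function
            moved = [v for v in K
                     if sum(depends_on(fe[i], v, allvars) or
                            depends_on(fa[i], v, allvars)
                            for i in range(len(F))) <= 1]
            if not moved:
                return K, Q
            K = [v for v in K if v not in moved]
            Q = Q + moved

    # f = (a AND NOT b) OR (NOT a AND b),  g = a AND c;  R = {a}, N = {b, c}
    f = lambda e: (e['a'] and not e['b']) or (not e['a'] and e['b'])
    g = lambda e: e['a'] and e['c']
    print(find_fixpoint_kq([f, g], ['a'], ['b', 'c'], ['a', 'b', 'c']))
    # -> ([], ['b', 'c', 'a']): even a can be quantified early, as argued above.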
5 Overall Algorithm for Equivalence Checking
Figure 2 illustrates our overall algorithm for equivalence checking. CheckEquivalence takes a miter [3]. The miter contains a specification circuit and its implementation circuit, and the two circuits are XNORed so that the miter output is true when the circuits are equivalent. First we set a cut in the circuit, and this cut is moved from the primary inputs toward the miter output. For a cut in the do-while loop in Line 1, we build BDDs in Line 2. If we fail to build the BDDs, we return inconclusive in Line 3. If the cut has reached the miter output, we decide the equivalence in Line 4. Line 5 computes K and Q. Then a group containing a subset of F is selected iteratively in Line 6 by estimating the feasibility of range computation and the number of variables that can be quantified out. Line 7 refines K and Q with respect to the selected group, and this refinement will be explained in Section 6. Line 8 computes the range of the group, and Line 9 parameterizes the range if the range computation was successful; the range is deleted in Line 10. The range computation and parameterization will be explained in Section 6 and Section 7. There can be many heuristics for setting a cut and finding a group, and these problems are also crucial in terms of performance in practice. We refer the reader to [12] for detailed heuristics for identifying cuts and groups. The circuit topology is used for setting a cut and for estimating the number of variables to quantify and to introduce when finding a group.
5.1 Comparison to Prior Work
The overall algorithm in Figure 2 is conceptually the combination of the work in [6,11, 18]. However there are significant differences from their work.
CheckEquivalence(Miter) {
1   do {
        Set a cut and let F be the set of functions in the cut
2       Build BDDs for each function in F
3       if (failed to build BDDs)
            return INCONCLUSIVE
4       if (reached the compare point)
            if (BDD_IS_ONE(the compare point))
                return EQUIVALENT
            else
                return INEQUIVALENT
5       Compute K and Q
6       while (G = FindGroup(F)) {
7           Compute KG and QG
8           if (RG = ComputeRange(G, QG)) {
9               Parameterize(RG, KG)
10              Delete RG
            }
        }
    } while (TRUE)
}

Fig. 2. Overall procedure for equivalence checking.
Our algorithm is conceptually similar to the forward sweep in Section 3.4. However the major differences are as follows. Our method

1. does not require success of range computation. In general range computation is very expensive and requires huge memory space, and thus BDDs blow up quite often during the range computation. In [6] their method aborts once a range computation fails, whereas in our algorithm even if the range computation fails, we can take another group.
2. does not keep the range. In many cases the size of the range BDD of a group is relatively much larger than the shared size of the BDDs representing the circuits of the group. This is possible due to the parameterization of the range and the simpler BDDs representing the parameterized functions.
3. does not need the range of all nets in the cut. The method in [6] requires the global range of the cut since all computed ranges are kept even though the local range of a group is computed at every composition, whereas our method throws away a computed range once it is parameterized.
4. may still have redundant variables theoretically. The method in [6] does not have any redundant variables since the method keeps the ranges, whereas ours may still have some redundant variables in theory. However ours is practically optimal by computing K in Section 4 since ours looks ahead only intermediate results of the range.

Our parametric representation produces exactly the same functions as in [11]. However there are the following differences. Our method
1. performs only a single phase, instead of 3 phases.
2. does not visit all BDD nodes.
3. produces outputs as BDDs, instead of circuits.
4. handles BDDs with complement arcs.
Our approach is also quite similar to the normalized function method in [18]. However, the normalized function method simplifies one function at a time, whereas our method can simplify multiple functions at a time by using range computation and parametric representation. Thus our method can simplify further and has more opportunities to simplify. Moreover, the method in [18] can produce redundant variables, as shown in Section 4, whereas ours rarely does. We also show that normalization is a special case of parameterization.
6 K-Set Preserving Range Computation
The key idea in Figure 2 is that we try to simplify the functions in the selected group by first computing the range of the group, and then parameterizing the computed range to get simpler functions. As shown in Figure 2, once we find a group G containing a subset of all nets in the current cut, we compute a special type of range of the group so that we preserve the range of all the nets in the cut. This is called K-set preserving range computation and is formally defined as

    RK(yG, KG) = ∃QG. TG(yG, KG, QG) ,

where TG is the conjunction of the relations of all nets in the group G, yG is the set of range variables for the nets in G, and KG and QG are the extended K and Q for the group G.

Theorem 1. Once we preserve the relations between the K-set variables and the range variables yG in RK, the total range R of all the nets in the cut is preserved.

Proof. Suppose we have a cut and we make a group G containing a subset of all nets in the cut. Let TG be the relation of all nets in G, TR be the relation of all remaining nets outside G, Qg be the set of variables belonging to QG and feeding any net in G, and Qr be the set of variables belonging to QG and feeding any net outside G. Let yG be the set of range variables for the nets in G and yR that for the nets outside G. The total range of the cut without parameterization is

    R(yG, yR) = ∃KG,Qg,Qr. (TG(yG, KG, Qg) ∧ TR(yR, KG, Qr))
              = ∃KG. (∃Qg. TG(yG, KG, Qg) ∧ ∃Qr. TR(yR, KG, Qr)) .     (9)
Now suppose that we parameterize the group by quantifying the variables in Qg and introducing new parametric variables Qp so that we preserve the relations between the variables in K and the range variables yG, such that

    RK(yG, KG) = ∃Qp. TG(yG, KG, Qp) = ∃Qg. TG(yG, KG, Qg) .     (10)
Parameterization preserving Equation 10 is the work in [11]. By applying Equation 10, we can compute the total range with the parameterization, R̃, as below.

    R̃(yG, yR) = ∃KG,Qp,Qr. (TG(yG, KG, Qp) ∧ TR(yR, KG, Qr))
              = ∃KG. (∃Qp. TG(yG, KG, Qp) ∧ ∃Qr. TR(yR, KG, Qr))
              = ∃KG. (RK(yG, KG) ∧ ∃Qr. TR(yR, KG, Qr))
              = ∃KG. (∃Qg. TG(yG, KG, Qg) ∧ ∃Qr. TR(yR, KG, Qr))     (11)
Equation 11 is equal to Equation 9, therefore we can conclude that the total range is preserved.
7 K-Set Preserving Parametric Representation
Once the K-set preserving range of the group is computed as in Section 6, we try to get simpler functions for the group by parameterizing the range, so that we preserve the range of the group as well as the total range of the cut with the parameterized functions. This is called K-set preserving parametric representation. In K-set preserving parametric representation, the input variables are the variables in K and the output variables are the range variables. Thus we parameterize only the range variables, preserving the relations with the K variables. Section 7.1 extends the method in [11] to achieve the differences mentioned in Section 5.1, and Section 7.2 modifies the BFS (breadth-first-search) approach of Section 7.1 into a DFS (depth-first-search) and shows that the two methods produce the same result. Section 7.3 describes the concept of the generalized parametric representation.
7.1 BFS Parameterization
Kukula and Shiple presented a method of parametric representation to generate circuits from the relation BDD of the design environment [11]. This method deals with the output variables of the environment as well as the input variables that depend on the states of the design under verification. For this purpose, input and output template modules are pre-defined, and the parametric representations from the relation are obtained as a circuit by replacing each BDD node of the relation with the template modules. The equation form of the input template module in [11] is as below.

    findOut = (findIn1 ∧ varVal) ∨ (findIn0 ∧ ¬varVal)     (12)
    chooseOut1 = (chooseIn ∧ varVal) ∨ chooseOutChain1
    chooseOut0 = (chooseIn ∧ ¬varVal) ∨ chooseOutChain0

All BDD nodes of input variables in the relation are replaced with this module. Also, the equation form of the output template module in [11] is as follows.

    findOut = findIn1 ∨ findIn0     (13)
    func = (findIn1 ∧ valIn) ∨ ¬findIn0     (14)
    chooseOut1 = (chooseIn ∧ func) ∨ chooseOutChain1
    chooseOut0 = (chooseIn ∧ ¬func) ∨ chooseOutChain0
    drive = chooseIn ∨ driveChain
    valOut = func ∨ valOutChain

All BDD nodes of output variables in the relation are replaced with this module.

We have extended this method to generate BDDs directly, so that we do not need to build BDDs for the parameterized circuits, and to handle complement arcs on BDDs. More differences have already been mentioned in Section 5.1. The extension is based on the following observations. First, from Equations 12 and 13 of the input and output template modules, findIn1 and findIn0 are the findOuts of the then and else child nodes, respectively. Thus findIn1, findIn0, and findOut of a node whose function is F and whose variable is v can be rewritten as

    findIn1 = ∃Y. Fv
    findIn0 = ∃Y. F¬v     (15)
    findOut = ∃Y. F ,
where Y is the set of all output variables below v in the BDD. This observation allows the computations to be performed without looking at the children nodes when we generate BDDs directly. Secondly, chooseOutChain1 and chooseOutChain0 are used to take the union of all chooseOut1s and chooseOut0s that point to the same child node and to form the chooseIn of the child node. Thirdly, driveOutChain of an output variable is used to sum the chooseIns of all nodes of the variable, and similarly valOutChain is used to sum the funcs (Equation 14) of all nodes of the variable. Fourthly, chooseIn of a node is the care set that constrains the parameterization of the node. Fifthly, adding a mux for each output variable takes care of the don't-care space of the variable.

Using these observations, we propose the algorithm shown in Figure 3, which produces BDDs directly with a single event-driven BFS pass. The algorithm takes a relation R and a set of output variables Y. Line 1 finds all supports in R and Y. Line 2 initializes and finds the variables in V that do not occur in R and assigns a parametric variable to each of them. Also, maxLevel is set to the level of the lowest output variable in R so that we do not traverse the BDD nodes below that variable. Line 3 computes findOut of the top node, and PushQueue adds an event on the node with the findOut as the chooseIn of the node. Each output variable has an event queue. The event queue for the current variable is selected in Line 4, and each event is popped in Line 6 until the event queue becomes empty in Line 5. Line 7 handles the case of input variables: if a child node is not below maxLevel, chooseOut is computed and PushQueue is called. If an event is already in the queue, the active condition of the event is updated by adding the new active condition; otherwise a new event is created and added. Line 8 handles the case of output variables, and findIn1, findIn0, func, drivei, and valOuti are computed according to the observations mentioned above. Adding events is the same as in the case of input variables. Line 9 computes the parameterized output when the event queue becomes empty for the variable. Notice that this algorithm produces exactly the same function as in [11].
BFS_PR(R, Y) {
1   V = Supp(R) ∪ Y
2   for each (vi ∈ V) {
        Qi = drivei = valOuti = BDD_ZERO
        if (vi ∈ Y)
            if (vi ∉ Supp(R)) resi = pi
            else maxLevel = i
        else resi = vi
    }
3   findOut = ∃Y. R
    level = bdd_get_top_level(R)
    PushQueue(Q[level], R, findOut)
4   for (i = level, . . . , maxLevel) {
5       while (| Q[level] | > 0) {
6           event = PopQueue(Q[level])
            posCof = (event → node)vi
            negCof = (event → node)¬vi
            chooseIn = (event → active)
7           if (vi ∉ Y) {
                if ((childLevel = bdd_get_top_level(posCof)) ≤ maxLevel) {
                    chooseOut1 = chooseIn ∧ vi
                    PushQueue(Q[childLevel], posCof, chooseOut1)
                }
                if ((childLevel = bdd_get_top_level(negCof)) ≤ maxLevel) {
                    chooseOut0 = chooseIn ∧ ¬vi
                    PushQueue(Q[childLevel], negCof, chooseOut0)
                }
8           } else {
                findIn1 = ∃Y. posCof
                findIn0 = ∃Y. negCof
                func = (findIn1 ∧ pi) ∨ ¬findIn0
                chooseOut1 = func ∧ findIn1
                drivei = drivei ∨ chooseIn
                valOuti = valOuti ∨ func
                if ((childLevel = bdd_get_top_level(posCof)) ≤ maxLevel)
                    PushQueue(Q[childLevel], posCof, chooseOut1)
                if ((childLevel = bdd_get_top_level(negCof)) ≤ maxLevel) {
                    chooseOut0 = chooseIn ∧ ¬func
                    PushQueue(Q[childLevel], negCof, chooseOut0)
                }
            }
9           if (| Q[level] | == 0)
                resi = (¬drivei ∧ pi) ∨ (drivei ∧ valOuti)
        }
    }
    return res
}

Fig. 3. BFS parameterization procedure.
7.2 DFS Parameterization
The algorithm in Figure 3 can be implemented in a DFS manner by using findIn1, findIn0, and func as in Equation 15. The algorithm is shown in Figure 4 and is quite similar to that in [1], except for one difference: the algorithm can handle both input and output variables. The algorithm takes R (a relation), Y (the set of output variables), V (the set of variables in R and Y), P (the set of parametric variables corresponding to V), and n (the number of variables in V). Line 1 looks up the cache and returns the computed result if it exists. Line 2 computes the positive and negative cofactors of R with respect to the first variable in V. Line 3 is the case of input variables and assigns func to the input variable itself. Line 4 is the case of output variables and computes the parameterized output using Equation 15. Line 5 recurs on the positive and negative cofactors. Line 6 merges the two sub-results from Line 5 by using bdd_ite. Line 7 puts the computed func in res and inserts res into the cache.

DFS_PR(R, Y, V, P, n) {
1   if (res = LookupCache(R, V, n)) return res
2   posCof = RV[0]
    negCof = R¬V[0]
3   if (V[0] ∉ Y)
        func = V[0]
4   else {
        findIn1 = ∃Y. posCof
        findIn0 = ∃Y. negCof
        func = (findIn1 ∧ P[0]) ∨ ¬findIn0
    }
5   hres = DFS_PR(posCof, Y, &V[1], &P[1], n-1)
    lres = DFS_PR(negCof, Y, &V[1], &P[1], n-1)
6   for (i = 0, . . . , n-2)
        if (V[i] ∈ Y) res[i] = bdd_ite(func, hres[i], lres[i])
        else res[i] = V[i]
7   res[n-1] = func
    InsertCache(R, V, n, res)
    return res
}

Fig. 4. DFS parameterization procedure.
Now we show that the algorithms in Figure 3 and Figure 4 produce the same result when the relation is complete. If findOut of the top node of the relation is tautologous, the relation is complete; otherwise it is incomplete [11]. Theorem 2. Both the BFS and DFS parameterization algorithms produce the same result if the relation is complete. Proof. (Sketch) Let us assume that we perform the BFS and DFS parameterizations on a non-reduced BDD, i.e., we do not reduce nodes whose two children point to the same node.
Let P_n be the parameterized output of the top node of the BDD, P_t that of its then child node, and P_e that of its else child node. Notice that both child nodes are at the same level of the BDD. Let P_c be the parameterized output of the variable of the child nodes; we compute P_c with the BFS and the DFS separately. First we compute it with the DFS:

P_c = bdd_ite(P_n, P_t, P_e) = (P_n ∧ P_t) ∨ (¬P_n ∧ P_e)    (16)

Now we compute it with the BFS. Let chooseIn_t and chooseIn_e be the care sets for the two child nodes:

chooseIn_t = P_n
chooseIn_e = ¬P_n
P_c = (chooseIn_t ∧ P_t) ∨ (chooseIn_e ∧ P_e) = (P_n ∧ P_t) ∨ (¬P_n ∧ P_e)    (17)
We can see that Equation 17 becomes the same as Equation 16. This is the base case of the proof by induction. The remaining step is trivial and omitted.
In the case of an incomplete relation, we can still obtain the same result from both the BFS and DFS parameterization methods with the following modifications. In the BFS method, we modify Line 9 in Figure 3 as follows:

res_i = (¬drive_i ∧ p_i ∧ CareSet) ∨ (drive_i ∧ valOut_i),

where CareSet is findOut of the top node of the relation. In the DFS method, we conjoin CareSet with each parameterized output. Furthermore, with these modifications we can regenerate exactly the same relation from the parameterized outputs even when the relation is incomplete.

7.3 Generalized Parametric Representation
We consider the case in which there are no input variables in Figure 4. Since all variables belong to Y and are existentially quantified from posCof and negCof, findIn1 and findIn0 can each be either 1 or 0. Therefore func in Figure 4 can take one of the following three values if the relation R is not empty.
1. findIn1 = 1 and findIn0 = 0: func = (1 ∧ P[0]) ∨ ¬0 = 1.
2. findIn1 = 0 and findIn0 = 1: func = (0 ∧ P[0]) ∨ ¬1 = 0.
3. findIn1 = 1 and findIn0 = 1: func = (1 ∧ P[0]) ∨ ¬1 = P[0].
We can see that Cases 1, 2, and 3 produce the same results as the three cases in Section 3.2. Thus we can conclude that the algorithm in Figure 4 produces the same results as that in Section 3.2 when there are no input variables in R. Now we generalize the computation of func in Figure 4. Suppose that we parameterize an output node. Let R be the relation of the node whose output variable is y. The relation on the node can be written as

R = (y ∧ y_on) ∨ (¬y ∧ y_off).
Then posCof and negCof of R can be considered as the on-set and off-set with respect to the output variable y, respectively, and we can define a new on-set ỹ_on and a new off-set ỹ_off as follows:

ỹ_on = ∃Y. R_y
ỹ_off = ∃Y. R_¬y

The generalized parametric representation y_p for the node is then

y_p = (ỹ_on ∧ v) ∨ ¬ỹ_off = (∃Y. R_y ∧ v) ∨ ¬(∃Y. R_¬y),

where v is a parametric variable. This parameterizes the output node regardless of the presence of input variables.
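To make the construction concrete, the following is a small, purely illustrative Python sketch of this generalized parametric representation. Boolean functions are represented explicitly as sets of satisfying assignments over one fixed variable universe; the helper names and the representation are our own choices for the sketch, not the BDD-based implementation used here.

from itertools import product

def assignments(universe):
    # All total assignments over the tuple of variable names `universe`.
    return {frozenset(zip(universe, bits))
            for bits in product((0, 1), repeat=len(universe))}

def substitute(a, var, val):
    return frozenset((x, val if x == var else b) for (x, b) in a)

def cofactor(f, var, val, universe):
    # f with `var` fixed to `val`; the result no longer depends on `var`.
    return {a for a in assignments(universe) if substitute(a, var, val) in f}

def exists(f, Y, universe):
    # Existential quantification of the variables in the set Y.
    drop = lambda a: frozenset(p for p in a if p[0] not in Y)
    shadows = {drop(a) for a in f}
    return {a for a in assignments(universe) if drop(a) in shadows}

def neg(f, universe):
    return assignments(universe) - f

def parameterize(R, y, Y, v, universe):
    # y_p = (EY. R_y & v) | ~(EY. R_~y), with parametric variable v.
    y_on  = exists(cofactor(R, y, 1, universe), Y, universe)
    y_off = exists(cofactor(R, y, 0, universe), Y, universe)
    v_true = {a for a in assignments(universe) if (v, 1) in a}
    return (y_on & v_true) | neg(y_off, universe)

Here parameterize(R, y, Y, v, universe) returns the set-of-assignments form of y_p; the universe is assumed to contain the input variables, the output variables Y, and the parametric variable v, with v not occurring in R.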
8 Relationship between Parameterization and Normalization
Suppose we have a group containing only one net f(K, Q), and let y be the range variable of the net. Then the K-set preserving range is

R_K(y, K) = ∃Q. ((y ∧ f(K, Q)) ∨ (¬y ∧ ¬f(K, Q)))
          = (y ∧ ∃Q. f(K, Q)) ∨ (¬y ∧ ∃Q. ¬f(K, Q))
          = (y ∧ ∃Q. f(K, Q)) ∨ (¬y ∧ ¬∀Q. f(K, Q)).    (18)
Now we compute the parametric representation of R_K to get a simplified function of f. Since y is the only range variable, we can find its on-set ỹ_on and off-set ỹ_off:

ỹ_on = (R_K)_y = ∃Q. f(K, Q)
ỹ_off = (R_K)_¬y = ¬∀Q. f(K, Q)

Therefore the parametric representation f̃ of f(K, Q) is

f̃ = (v ∧ ỹ_on) ∨ ¬ỹ_off = (v ∧ ∃Q. f(K, Q)) ∨ ∀Q. f(K, Q),    (19)

where v is a parametric variable. We can see that Equation 19 has exactly the same form as Equation 8. Furthermore, Equation 19 produces even more simplified circuits, since Q is a superset of N in Equation 8; that is, more variables can be quantified with Q than with N. Therefore normalization is a special case of parameterization in which the number of output variables is just one and Q is replaced with N. In the actual implementation of this case, we do not need to compute R_K in Equation 18, since we can compute f̃ directly from f.
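For illustration only, Equation 19 can be rendered with the same set-based helpers sketched in Section 7.3 above (assignments, exists, neg); as before, the names and the representation are ours, not the implementation described in this paper.

def normalize(f, Q, v, universe):
    # f_tilde = (v & EQ. f) | AQ. f,  where AQ. f = ~(EQ. ~f)
    v_true   = {a for a in assignments(universe) if (v, 1) in a}
    exist_q  = exists(f, Q, universe)
    forall_q = neg(exists(neg(f, universe), Q, universe), universe)
    return (v_true & exist_q) | forall_q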
9 Experimental Results
We have implemented the algorithm in Figure 2 and compared it to the normalized function method and also compared the BFS parameterization with the DFS. We have run all experiments on a 750MHz SUN UltraSPARC-III with 8GB memory.
Table 1 shows the effectiveness of the method we present. The designs in the first column of the table are single compare points that are known to be very hard and that existing state-of-the-art equivalence checkers fail to verify with various techniques such as BDDs, SAT, ATPG, functional learning, and so on. The designs were further modified to make them even harder by merging all internal equivalent points that are found by the various techniques. The second column shows the number of gates, and the third and fourth columns give the number of primary input variables and the logic depth from an output to its inputs. The remaining columns compare the verification results. NONE stands for the method that builds monolithic BDDs, and NORM is the normalized function method. BFS PR stands for our method with the BFS parameterization and DFS PR for our method with the DFS parameterization. The results with NONE illustrate where the monolithic BDDs blow up; the numbers in parentheses show at which level of depth the run aborts. NORM verified 3 out of 10 cases, and in the test2 case it improved on NONE by building BDDs to 12 more levels of depth. BFS PR and DFS PR verified 8 out of 10 cases, which shows that our method outperforms the normalized function method. However, there are some cases in which ours also fails, and the failure cases can be categorized as follows: 1) The BDDs are built to greater depths, but even with this improvement, building the BDDs for the remaining part of the circuit is still too hard, as in test9. 2) There is no improvement at all, meaning that even though the circuits are simplified by our method, this does not help much in building the BDDs for the remaining part of the circuit, as in test10.

Table 1. Verification results for different methods.

Design   Gates   PIs  Depth  NONE       NORM       BFS PR     DFS PR
test1     1150   128   71    Abort(35)  Abort(35)  Verified   Verified
test2     1203   131   78    Abort(43)  Abort(55)  Verified   Verified
test3     1263   134   81    Abort(49)  Abort(49)  Verified   Verified
test4     1296    60   61    Abort(32)  Verified   Verified   Verified
test5      849    57   47    Abort(29)  Verified   Verified   Verified
test6    30973   303   89    Abort(16)  Abort(16)  Verified   Verified
test7    29541   306   82    Abort(24)  Verified   Verified   Verified
test8     5324  1284   92    Abort(16)  Abort(14)  Verified   Verified
test9     1486   148   37    Abort(25)  Abort(25)  Abort(27)  Abort(27)
test10   22654   202   43    Abort(17)  Abort(17)  Abort(17)  Abort(17)
Table 2 shows the performance in terms of time taken and peak memory consumed. In the table, the numbers marked with (*) show the time and peak memory up to the point where the cases were aborted at the depths shown in Table 1. For the designs that both NORM and our method can verify (test4, test5, test7), our method was faster in test5 and test7 even though it uses the expensive range computations; with test4, however, it was slower. This implies that identifying the groups to which the range computation is applied is crucial for performance: in this case there were 52 range computations with more than one net, and we were able to obtain the range in only 8 of the 52 cases. For the other designs, NORM failed to verify, whereas ours completed the verification by further simplifying the functions of the groups selected in Figure 2.
Lastly, we compare BFS PR to DFS PR. Since both BFS PR and DFS PR produce the same results, the performance was about the same except in test2, test3, and test8. DFS PR was better than BFS PR in both time and space in test3, whereas in test8 the opposite held. In test2, DFS PR was faster but took more memory.

Table 2. Performance comparison for different methods.

                       Time (seconds)                              Memory (Mbytes)
Design   NONE        NORM        BFS PR     DFS PR      NONE       NORM       BFS PR     DFS PR
test1     7721.5(*)   9982.3(*)   8910.6     8605.6     129.7(*)    98.7(*)    156.8      164.4
test2     6850.9(*)  21232.7(*)   7176.9     4270.9     298.3(*)   943.0(*)    165.5      272.4
test3    19626.0(*)  10905.0(*)   5128.2     7317.9     932.6(*)   156.6(*)    140.9      505.7
test4    52178.3(*)   8642.1     20312.0    20262.5     849.2(*)   159.1       405.1      405.1
test5    21603.6(*)   2509.1       610.8      633.5     942.6(*)    52.4        78.4       78.4
test6    10608.2(*)   9340.4(*)  13619.2    13772.1     256.8(*)   252.8(*)    330.0      330.0
test7     4138.2(*)  14568.4     11000.2    10781.7     200.6(*)   416.4       291.5      291.5
test8     8498.0(*)   7844.2(*)    764.5      616.2     480.9(*)   940.8(*)     90.9       58.9
test9     4796.6(*)   4744.7(*)   7654.6(*)  7707.9(*)  792.6(*)   825.8(*)    238.8(*)   238.6(*)
test10    7147.4(*)   7235.5(*)   7163.7(*)  7121.0(*)  996.1(*)   996.1(*)   1033.2(*)  1033.2(*)

10 Conclusions
We have presented a new method to simplify circuits on the fly while building BDDs for the circuits. This method uses range computations and parametric representations while preserving the range of all combinational outputs. We have also proved that the normalized function method is a special case of our method. Furthermore, we have presented an algorithm that finds as many variables to quantify as possible, as well as a generalized parametric representation algorithm that deals with both input and output variables of an environment. The generalized algorithm provides a uniform framework that covers the existing methods. We have applied the method and algorithms we presented to equivalence checking. Experimental results show that our method outperforms the normalized function method. Furthermore, the methods we presented can also be applied to other formal methods. Given that this new method simplifies the functions in a given group of a given cut, we are investigating how to identify better cuts and groups, which are key factors in performance. Further research is required to provide efficient counterexample generation for this method.
References

[1] M. Aagaard, R. B. Jones, and C.-J. H. Seger. Formal verification using parametric representations of boolean constraints. In Proceedings of the Design Automation Conference, pages 402–407, June 1999.
[2] C. L. Berman and L. H. Trevillyan. Functional comparison of logic designs for VLSI circuits. In Proceedings of the International Conference on Computer-Aided Design, pages 456–459, Santa Clara, CA, November 1989.
[3] D. Brand. Verification of large synthesized designs. In Proceedings of the International Conference on Computer-Aided Design, pages 534–537, Santa Clara, CA, November 1993.
[4] R. E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers, C-35(8):677–691, August 1986.
[5] E. Cerny and M. A. Marin. A computer algorithm for the synthesis of memoryless logic circuits. IEEE Transactions on Computers, C-23(5):455–465, May 1974.
[6] E. Cerny and C. Mauras. Tautology checking using cross-controllability and cross-observability relations. In Proceedings of the International Conference on Computer-Aided Design, pages 34–37, Santa Clara, CA, November 1990.
[7] O. Coudert, C. Berthet, and J. C. Madre. Verification of sequential machines using Boolean functional vectors. In L. Claesen, editor, Proceedings IFIP International Workshop on Applied Formal Methods for Correct VLSI Design, pages 111–128, Leuven, Belgium, November 1989.
[8] O. Coudert and J. C. Madre. A unified framework for the formal verification of sequential circuits. In Proceedings of the International Conference on Computer-Aided Design, pages 126–129, November 1990.
[9] P. Jain and G. Gopalakrishnan. Efficient symbolic simulation-based verification using the parametric form of boolean expressions. IEEE Transactions on CAD, 13(8):1005–1015, August 1994.
[10] A. Kuehlmann and F. Krohm. Equivalence checking using cuts and heaps. In Proceedings of the Design Automation Conference, pages 263–268, Anaheim, CA, June 1997.
[11] J. H. Kukula and T. R. Shiple. Building circuits from relations. In E. A. Emerson and A. P. Sistla, editors, 12th Conference on Computer Aided Verification (CAV'00), pages 131–143. Springer-Verlag, Chicago, July 2000. LNCS 1855.
[12] H. H. Kwak, I.-H. Moon, J. Kukula, and T. Shiple. Combinational equivalence checking through function transformation. In Proceedings of the International Conference on Computer-Aided Design (To appear), San Jose, CA, November 2002.
[13] J. P. Marques-Silva and K. A. Sakallah. GRASP: A search algorithm for propositional satisfiability. IEEE Transactions on Computers, 48(5):506–521, May 1999.
[14] Y. Matsunaga. An efficient equivalence checker for combinational circuits. In Proceedings of the Design Automation Conference, pages 629–634, June 1996.
[15] K. L. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, Boston, MA, 1994.
[16] I.-H. Moon, G. D. Hachtel, and F. Somenzi. Border-block triangular form and conjunction schedule in image computation. In W. A. Hunt, Jr. and S. D. Johnson, editors, Formal Methods in Computer Aided Design, pages 73–90. Springer-Verlag, November 2000. LNCS 1954.
[17] I.-H. Moon, J. H. Kukula, K. Ravi, and F. Somenzi. To split or to conjoin: The question in image computation. In Proceedings of the Design Automation Conference, pages 23–28, Los Angeles, CA, June 2000.
[18] J. Moondanos, C.-J. H. Seger, Z. Hanna, and D. Kaiss. Clever: Divide and conquer combinational logic equivalence verification with false negative elimination. In B. Berry, H. Comon, and A. Finkel, editors, 13th Conference on Computer Aided Verification (CAV'01), pages 131–143. Springer-Verlag, Paris, July 2001. LNCS 2101.
[19] M. W. Moskewicz, C. F. Madigan, Y. Zhao, L. Zhang, and S. Malik. Chaff: Engineering an efficient SAT solver. In Proceedings of the Design Automation Conference, pages 530–535, June 2001.
Generalized Symbolic Trajectory Evaluation — Abstraction in Action

Jin Yang and Carl-Johan H. Seger
Strategic CAD Labs, Intel Corp.
{jin.yang, carl.seger}@intel.com
Abstract. Generalized STE (Symbolic Trajectory Evaluation) [21,22,23] is a very significant extension of STE [11,20] that truly combines the efficiency, capacity and ease of use of STE with the ability of classic symbolic model checking for verifying a much richer set of properties [9]. GSTE provides a unified model checking framework that gives one the power to choose and seamlessly adjust the level of abstraction in a model as well as in a specification during a verification effort. This paper describes some of the techniques that made this possible using a simple FIFO example for illustration. Finally, a set of real life verification results is provided to strongly demonstrate the viability of GSTE as a new generation model checking solution for complex designs.
1 Introduction
STE is a model checking technique based on a form of quaternary symbolic simulation [11,20]. It is a proven technology for verifying fairly large industrial hardware designs with a high degree of automation at both the gate level and the transistor level. It comes with a very comprehensive verification methodology [1]. STE has been in active use at Intel, Compaq, IBM and Motorola [2,16,17,18,19]. Despite its efficiency, STE is limited in the kind of properties it can handle. Any property spreading over an indefinitely long time interval cannot be expressed in STE, let alone verified. Classic symbolic model checking (SMC) [9,10,15], on the other hand, can handle a much larger set of properties. In particular, classic SMC for linear time logic is capable of verifying all ω-regular properties. However, it cannot handle many realistic hardware systems due to the state explosion problem. Efforts to improve the efficiency of SMC like the ones in [4,7,13,24] have only met with limited success.

In recent years, several efforts have been made to extend the expressiveness of STE using some of the fundamental techniques well studied in SMC, while preserving the benefits of STE. The first such work was proposed by Seger and Bryant in [20], where they introduced non-nested loops into STE assertions. In [5], Beatty proposed a more general extension of STE assertions by introducing a form of labeled transition graphs where each vertex is associated with an antecedent and a consequent. In [14], Jain introduced a formal semantics for the extension that existentially quantifies over paths in a graph. He proposed a generalized STE algorithm to model check the extension, which was then shown to be incomplete. In [8], Chou introduced a more natural semantics for the extension that universally quantifies over paths, and proposed an alternative to
Jain's algorithm that can be proved both sound and complete. He then showed that the quaternary simulation in STE is simply an abstract interpretation of the boolean simulation. Our earlier work [21,22,23] significantly extended these results, and bridged the gap between STE and classic SMC. In fact, we developed a GSTE algorithm that is able to verify all ω-regular properties, thus making GSTE as powerful as traditional SMC for linear time logic. At the same time, many of the automatic abstraction techniques that are inherent in STE remain for this GSTE algorithm, promising a quantum leap in model checking capacity.

Although the abstraction implied by the use of a quaternary state representation in (G)STE is the key to its high capacity, it is also its bane. A common early result when trying to verify some property with (G)STE is a false negative; the result is X instead of 0 or 1, which does not contain any information. Due to the fixpoint computation in GSTE, there are additional possibilities for unknowns to creep into the verification. In traditional STE, the remedy to this over-abstraction problem is almost always to add more boolean variables for a set of nodes in a model to the specification, effectively implicitly enumerating a large collection of simpler specifications. In GSTE we can also use a similar approach. However, GSTE also provides several additional avenues. First, the specification assertion graph can be transformed through a sequence of semantics-preserving transformations so that the uncertainty introduced by the fixpoint computation is minimized or removed. Secondly, in GSTE one can introduce boolean variables to represent parts of the state space more accurately. We call these nodes "precise". Thirdly, sometimes one can apply one of the more sophisticated GSTE model checking algorithms that propagates information backwards, thus avoiding loss of information. In fact, in practical large GSTE verification efforts it is common to apply a combination of all these techniques.

In this paper, we show how one can make various trade-offs between precision and efficiency in both specification and model checking in GSTE, and discuss the techniques that made these possible. We use a 3-entry 10-bit-wide FIFO example for illustration, since it is well understood and simple enough to explain yet has everything we need for illustrating the various techniques. The methodology and techniques described here have already been used in several verification efforts at Intel.

We organize the rest of the paper as follows. In Section 2, we give a brief introduction to GSTE and assertion graphs. This includes a brief overview of the quaternary model used in GSTE model checking. In Section 3, we demonstrate a model checking approach based on specification refinement. In Section 4, we show how to extend the quaternary model to handle sets of quaternary assignments in order to support seamless model refinement in GSTE model checking. In Section 5, we discuss an approach that combines model refinement and specification refinement to achieve model checking efficiency, with the additional help of the backward simulation capability and the symbolic case analysis capability in GSTE. Finally, in Section 6 we present some experimental results to show the impact of different trade-offs on the GSTE model checking.
2 Background
In this section we give a brief introduction to the general framework of GSTE by starting from STE and gradually generalizing the underlying concepts. For a more complete treatment, including formal definitions and proofs of correctness of the various algorithms, we refer the reader to [21,22,23].

For the purpose of this paper, an STE assertion will be viewed as a labeled linear graph representing a finite time line. Each edge in the graph represents a time unit and is labeled with two sets of circuit states (or equivalently state predicates), one of which is called an antecedent label and the other a consequent label. In general, we define a GSTE assertion graph as a quintuple G = (V, v0, E, ant, cons), where V is a set of vertices, v0 is the initial vertex, E is a set of directed edges, and ant and cons are functions that map each edge to an antecedent label and a consequent label, respectively. Note that E can contain more than one edge between two vertices, thus allowing different antecedent and/or consequent labels.

In order to provide a formal semantics to an assertion graph, we introduced a fairly traditional, and simple, circuit model. A circuit consists of a set of boolean nodes V. A state is an assignment to all the nodes in V. The node set V can be partitioned into two disjoint sets: state nodes VS and input nodes VI. There is a next state function Nv(V) for each state node v ∈ VS. The set of next state functions defines how the circuit transitions between states. The transition can also be defined by the equivalent transition relation R(V, V′) = ∧_{v ∈ VS} (v′ = Nv(V)), where V′ is a copy of V that holds the values of V after the transition. Note that (G)STE does not assume any initial state.

In [22] three types of semantics for assertion graphs were introduced: strong, normal, and fair satisfaction. The strong satisfiability definition is the one used in the original STE theory [6,12] and the generalization in [8,14], and is based on the observation that every finite path in a GSTE assertion graph from the initial vertex is an STE assertion. Therefore, we define that a circuit strongly satisfies a GSTE assertion graph if and only if it satisfies every STE assertion in the graph. Since strong satisfiability covers every finite initial path in the graph, it enforces a consequent to hold solely based on the past and present antecedents. In other words, it expresses a class of effect-of-cause properties, but cannot express cause-of-effect properties where some consequents may depend on future antecedents. To overcome this limitation, normal satisfiability was introduced. A circuit normally satisfies an assertion graph with respect to a set of terminal vertices if, for every finite path in the graph that ends at a terminal vertex and every finite state trace in the circuit of the same length as the path, the trace satisfying the antecedent sequence on the path implies that it also satisfies the consequent sequence on the path. Going further, by introducing a set of "fair edges", a fair satisfiability definition was given (by checking infinite traces against infinite fair paths that visit fair edges infinitely often). With fairness constraints, assertion graphs can express all ω-regular properties. This can be proven by showing that for every deterministic Streett automaton, there is an equivalent assertion graph with a fairness constraint. The proof is given in [22].

An assertion graph for a fairly complete specification of the behavior of a 3-entry, 10-bit-wide FIFO circuit is shown in Figure 1.
The top part of the graph specifies how the number of filled entries should be updated, and makes sure that the full and empty flags
are set correctly. The bottom part describes the correct movement of the distinct data in the FIFO and makes sure it is not corrupted. Note that this specification is completely implementation-independent.
Fig. 1. The Functional Specification for 3-entry 10-bit-wide FIFO
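As a purely illustrative aside (not the authors' code), the quintuple G = (V, v0, E, ant, cons) defined above maps naturally onto a small data structure; the field and type names below are invented for the sketch, and the labels are left abstract.

from dataclasses import dataclass, field
from typing import Any, Dict, Hashable, List, Tuple

# An edge carries an index so that E may contain several parallel edges
# between the same pair of vertices, as the definition above allows.
Edge = Tuple[Hashable, Hashable, int]

@dataclass
class AssertionGraph:
    vertices: List[Hashable]
    v0: Hashable                                         # initial vertex
    edges: List[Edge]
    ant: Dict[Edge, Any] = field(default_factory=dict)   # antecedent labels
    cons: Dict[Edge, Any] = field(default_factory=dict)  # consequent labels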
From a practical point of view, the most important results of [21,22,23] were three model-checking algorithms for the above satisfaction definitions. Although the fair and normal satisfiability model checking algorithms are non-trivial, they are all extensions of simpler algorithms, starting with the traditional STE algorithm. We start by describing the algorithms when no abstraction is done, i.e., by taking away the quaternary abstraction used in (G)STE. The exact symbolic simulation algorithm in STE is quite straightforward. It starts from the first edge in the STE assertion graph and computes the set of states simulated by the edge. In this case, it is the set of states satisfying the antecedent label on the edge. The algorithm then computes the post-image of this set of simulation states and intersects it with the set of states satisfying the antecedent label on the second edge. The result is the set of states simulated by the second edge. The post-image of a state set S(V), denoted by post(S(V)), is given by ∃V. S(V) ∧ R(V, V′). The algorithm repeats this step until the set of simulation states for the last edge is computed. Once the simulation is complete, the consequent label on each edge is checked against the set of simulation states for this edge to see if it is satisfied.

The GSTE model checking algorithm, for verifying the strong satisfiability of an assertion graph against a circuit, also computes a simulation relation sim(v) for each vertex v in the graph. It proceeds as follows. Initially, sim(v) is the set of all states if v = v0, and the empty set otherwise. Repeatedly, an edge e = (v, v′) is picked and the set of states simulated by the edge, denoted by esim(e), is computed as sim(v) ∩ ant(e). The consequent cons(e) is then checked against every state in esim(e), and sim(v′) is updated as sim(v′) ∪ post(esim(e)). This iteration terminates when either a consequent is violated or the fixpoint is reached.
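A minimal explicit-state sketch of this fixpoint loop is given below, reusing the AssertionGraph structure sketched earlier; it is illustrative only (plain sets of concrete states, a user-supplied post function, no quaternary abstraction and no BDDs), not the algorithm as implemented by the authors.

def check_strong(graph, all_states, post):
    """graph: an AssertionGraph whose ant/cons labels are plain sets of states.
       all_states: the set of all circuit states.
       post: function mapping a set of states to the set of their successors."""
    sim = {v: set() for v in graph.vertices}
    sim[graph.v0] = set(all_states)
    changed = True
    while changed:
        changed = False
        for e in graph.edges:
            v, v_next = e[0], e[1]
            esim = sim[v] & graph.ant[e]        # states simulated by edge e
            if not esim <= graph.cons[e]:       # every state must satisfy cons(e)
                return False                    # consequent violated
            new = sim[v_next] | post(esim)
            if new != sim[v_next]:              # sim(v') grows monotonically
                sim[v_next] = new
                changed = True
    return True                                 # fixpoint reached, no violation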
The algorithm for checking normal satisfiability of an assertion graph against a circuit can be viewed as a two-stage process. First, a pre-image based fixpoint algorithm is used to propagate antecedent information back to earlier antecedent labels in the assertion graph. When this is completed, the strong satisfiability algorithm is applied to the intermediate assertion graph thus formed. The backward simulation capability used to compute the pre-image is very similar to the (forward) simulation capability. Instead of computing a simulation relation for each vertex starting from the initial vertex, the backward simulation computes a backward simulation relation starting from a set of given terminal vertices in the symmetric way. In other words, it picks an edge and updates the simulation relation for the source vertex from that of the sink vertex using the pre-image function. Finally, the algorithm for model checking using fair satisfaction is even more complex and requires a nested fixpoint computation. Due to space considerations, we will not describe this algorithm, but refer the interested reader to [22].

Since model checking for normal satisfaction involves two fixpoints and pre-image computations, it is considerably more expensive than model checking for strong satisfiability. Similarly, fair satisfaction requires nested fixpoints and is even more computationally expensive than normal satisfaction. However, an important aspect of the satisfiability definitions for assertion graphs is that they are monotonic in the sense that strong satisfaction implies normal satisfaction and normal satisfaction implies fair satisfaction. As a result, one typically tries to prove a property using the least expensive model checking algorithm. In Section 5 we illustrate how we sometimes can trade off model checking efficiency for more circuit abstraction.

When the circuit becomes large, the likelihood for a symbolic model checking algorithm to encounter the state explosion problem increases drastically. To overcome the state explosion problem in model checking, (G)STE operates on the quaternary model of the circuit introduced to STE in [6,11], where each node in the circuit has one of four values {0, 1, X, T}. X denotes an unknown value and T an over-constraint. Besides the quaternary generalization of the boolean operations, two new operations are defined: the greatest lower bound and the least upper bound of any two quaternary values. Figure 2 lists the truth tables for the basic quaternary operations. It should be pointed out that we are using a slightly different partial order than was used in the original STE theory [20]. We place X at the top and T at the bottom. The reason for this is that we can then interpret X as the set {0, 1} and T as ∅. As a result, join (least upper bound) corresponds to set union and meet (greatest lower bound) corresponds to set intersection.
Fig. 2. Quaternary Operations (truth tables for the quaternary !, &, and | operations and for meet and join)
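To make the set interpretation above concrete, here is a tiny illustrative sketch (ours, not the paper's): each quaternary value is the subset of {0, 1} it stands for, join and meet are set union and intersection, and the logic operations are the pointwise images of the boolean ones. Figure 2 remains the authoritative definition of the operations.

X    = frozenset({0, 1})   # unknown (top of the order used here)
T    = frozenset()         # over-constraint (bottom)
ZERO = frozenset({0})
ONE  = frozenset({1})

def join(a, b): return a | b   # least upper bound  (set union)
def meet(a, b): return a & b   # greatest lower bound (set intersection)

def q_not(a):    return frozenset(1 - x for x in a)
def q_and(a, b): return frozenset(x & y for x in a for y in b)
def q_or(a, b):  return frozenset(x | y for x in a for y in b)

# For example: q_and(X, ZERO) == ZERO, q_and(X, ONE) == X,
# join(ZERO, ONE) == X, and meet(ZERO, ONE) == T.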
The post-image computation needed for strong-satisfiability is performed by symbolic simulation of the quaternary extension of the circuit model using the operations
listed in Figure 2. To make the backward simulation efficient, the pre-image function is abstracted on the quaternary model using an efficient inverse of the post-image function built from the inverse functions for the set of basic boolean functions. As an example, Figure 3 lists the inverse functions for a two-input AND, a two-input OR and an inverter.
Fig. 3. Efficient Quaternary Inverse Operations
A quaternary assignment to the state nodes represents a set of states in the circuit either precisely or approximately. A node has a boolean value in the quaternary assignment if it has the same boolean value in every state of the set. Otherwise, it has value X. The empty set is represented by assigning T to one or more nodes, depending on where the conflict occurs in the circuit. For instance, consider a circuit with three nodes p, q and r. The quaternary assignment for the singleton state set {[p = 1, q = 0, r = 1]} is [p = 1, q = 0, r = 1]. The assignment for the state set {[p = 1, q = 0, r = 0], [p = 1, q = 1, r = 0]} is [p = 1, q = X, r = 0]. In the following, we shall omit nodes with the X value from a quaternary assignment. With this abstraction, the state space becomes much smaller. In general, for a circuit with n nodes, there are 2^(2^n) different sets of states but only 4^n different quaternary assignments. In the quaternary world, the set of abstracted next state functions now maps an abstract set of states (i.e., a quaternary assignment) to another. The intersection ∩ (union ∪) of two state sets is abstracted as the bit-wise meet (join) of the two corresponding quaternary assignments. The quaternary model can be made symbolic by allowing a quaternary assignment to take boolean functions over symbolic constants [6,11] (i.e., rigid boolean variables that hold the same values forever) as its values. All the quaternary operations can be easily extended to handle such symbolic quaternary assignments. To be model checked in this quaternary model, an assertion graph must express its antecedents and consequents using symbolic quaternary assignments. For instance, the antecedent on edge (0-filled, 0-ahead) in Figure 1 is expressed as [enq = 1, deq = 0, din[9] = D[9], . . . , din[0] = D[0]]. In general, any boolean constraint can be expressed as a symbolic quaternary assignment using a technique called parametric representation [3]. For instance, the boolean constraint (deq iff enq) can be expressed as [enq = Z, deq = Z], where Z is a symbolic constant. One way to look at this assignment is that it symbolically indexes the set of all scalar instances satisfying the boolean constraint:
{[enq = 1, deq = 1], [enq = 0, deq = 0]}, in the form (Z → [enq = 1, deq = 1]) ∧ (!Z → [enq = 0, deq = 0]). However, care must be taken when such a parametric representation is applied to antecedents of an assertion graph. For instance, if the antecedent on the self-loop at vertex 1-filled in Figure 1 is expressed as the above symbolic quaternary assignment, then the symbolic constant Z will force enq to have the same value every time the self-loop is visited. The parametric representation can only be applied to the antecedent on an edge if the edge is visited at most once on every path in the graph. The number of symbolic constants in the antecedents of an assertion graph determines the precision of the quaternary model and the efficiency of the model checking. Increasing the number of symbolic constants in antecedents increases the precision of the symbolic quaternary model but decreases the performance and capacity of the model checking. For STE, the model can ultimately be made into the symbolic boolean model. Unfortunately, without some further mechanisms, this is not true for GSTE, due to the potentially infinitely many paths converging at an edge in an assertion graph. In the next two sections, we shall address this limitation of the symbolic quaternary model and discuss two approaches to strike a good balance between precision and efficiency in GSTE model checking.
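As a small illustration of this indexing view (ours, with invented names), a symbolic quaternary assignment can be expanded into the scalar assignments it stands for by enumerating the symbolic constants:

from itertools import product

def instances(symbolic_assignment, symbolic_constants):
    """symbolic_assignment: node -> function from an environment of symbolic
       constants to a scalar value; returns every scalar instance it indexes."""
    result = []
    for bits in product((0, 1), repeat=len(symbolic_constants)):
        env = dict(zip(symbolic_constants, bits))
        result.append({node: f(env) for node, f in symbolic_assignment.items()})
    return result

# The assignment [enq = Z, deq = Z] for the constraint (deq iff enq):
sym = {"enq": lambda e: e["Z"], "deq": lambda e: e["Z"]}
print(instances(sym, ["Z"]))   # [{'enq': 0, 'deq': 0}, {'enq': 1, 'deq': 1}]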
3 Model Checking by Specification Refinement
One strategy for efficient model checking with adequate precision is to refine the assertion graph to be verified. The basic observation behind this approach is that the assertion graph contains both semantic information (what to prove) as well as guidance to the model checking algorithm (how to prove it). Since for any assertion graph with at least one loop there are infinitely many assertion graphs representing the same property, one can use semantics-preserving transformations of the assertion graph to better guide the model checking algorithm during a verification of a particular model. Although it is an open research problem whether there is a complete set of such transformations, in practice a few sound rules appear to suffice. In particular, case splitting of an edge and unrolling of a loop are almost always sufficient. Based on the discussion in the previous section, the first refinement step is to split every edge with an antecedent that is not expressible by symbolic quaternary assignments. For the assertion graph in Figure 1, the self-loop on vertex 1-filled must be split into two, one with the antecedent !enq&!deq and the other enq&deq. The same applies to the self-loop at vertex 2-filled. To see how to refine the graph further, let us consider the simple FIFO implementation in Figure 4 (a). During a dequeue operation, the filled entry at the head of the FIFO is read out and all other entries shift one position towards the head. During an enqueue operation, the entry selected by the tail pointer is filled and the pointer is incremented by 1. Figure 4 (b) shows the quaternary simulation relation obtained from the GSTE model checking algorithm. As one can see, most of the time the tail pointer maintains a scalar
Fig. 4. Model Checking a FIFO Implementation: (a) a FIFO implementation, (b) quaternary simulation relation
value after reset. However, it starts to lose precision at vertex 1-ahead. Since the tail pointer may have either 10 or 11 depending on whether there is an enqueue operation on the self-loop at 1-ahead, the value for its lower bit becomes X in the quaternary model after the abstract set union operation. This becomes worse on the self-loop at vertex 0-ahead, as both bits of the pointer become X. Consequently, the content in entry[0] can be erased by an enqueue operation, and thus the verification would fail with a false negative. To overcome this over-approximation problem, we further refine the assertion graph by unfolding the graph at vertices where the precision is lost. In particular, we unfold vertices 1-ahead and 0-ahead so that every instance of the two vertices corresponds to a different tail pointer value. Figure 5 shows the final refined graph. It is not difficult to show that the tail pointer has a scalar value everywhere in the refined graph after reset, and thus the vector of symbolic constants D[9:0] will come out correctly. Intuitively, the more an assertion graph resembles the computation flow in a circuit, the more efficient the model checking will be and the less chance the quaternary model will lose precision.
Fig. 5. The Implementation Dependent Specification for 3-entry FIFO
4 Extended Quaternary Model and Model Refinement
Model checking by refining an assertion graph is not always a good solution. It may require very detailed knowledge about the circuit, and may drastically increase the size of the graph and become too dependent on the implementation. To illustrate this, let us consider a stationary implementation of the FIFO as shown in Figure 6. The stationary implementation is a circular structure with head and tail pointers. On an enqueue operation, the data is put into the entry indexed by the tail pointer, which is then incremented by 1 (modulo 3). On a dequeue operation, the data in the entry pointed to by the head pointer is read out and the head pointer is incremented by 1. Initially, both pointers have value 00. The wrap bit is set when the tail pointer meets the head pointer from behind, indicating that the FIFO is full. If we use the specification refinement approach to model check against this implementation, we need to split each vertex in the original assertion graph into 3 different vertices, each of which corresponds to a different setting of the head and tail pointers and the wrap bit with respect to the number of filled entries at the vertex. In general, the refined graph would grow quadratically with the depth of the FIFO. To avoid this problem, we look at a complementary approach to obtain enough precision in GSTE model checking through model refinement. In order to support the model
Fig. 6. Stationary FIFO Implementation
refinement at all levels, two problems need to be addressed for the symbolic quaternary model: (1) how to express an arbitrary boolean constraint in a parametric representation in the context of an assertion graph, and (2) how to avoid the precision loss caused by the abstract set union operation. To solve the first problem, we extend the symbolic quaternary assignments to include symbolic variables. Unlike a symbolic constant, a symbolic variable is a boolean variable that can change its values, and is existentially quantified out when contributing to any simulation relation. Using a symbolic variable z, the antecedent enq iff deq on the self-loops at vertices 1-filled and 2-filled in Figure 1 can be expressed parametrically as the extended symbolic quaternary assignment [enq = z, deq = z], and thus no split is needed for these self-loops. We can also add the assignment enq = z to the antecedent on every edge in the bottom half of the graph to increase model checking precision. Figure 7 shows the assertion graph where the antecedents are expressed as extended quaternary assignments. To solve the second problem, we allow one to specify a set of circuit nodes as precise nodes. For these nodes, their values and the relationship among them are always represented exactly by using boolean expressions as their values. For this to work, however, any extended symbolic quaternary assignment representing a simulation relation must be made canonical so that it can be checked for reaching a fixpoint. In [3], it was shown that every state predicate except false has a parametric representation, and an algorithm was proposed to build a parametric representation for a given state predicate. It is not difficult to argue that if the order of the variables to be made parametric is fixed, and the corresponding list of parametric variables is fixed, then the algorithm always produces a unique parametric representation for any state predicate. Therefore, we can build the canonical form of any extended symbolic quaternary assignment by first building the state predicate represented by this assignment and then applying the parameterization algorithm to the predicate in a consistent way. Unfortunately, this would be very inefficient, and below we describe a more efficient approach. Before we describe the algorithm, let us first walk through a concrete example from the FIFO verification to see how precise nodes and canonization are used in simulation relation computation.
Fig. 7. Parametric Assertion Graph
As we pointed out earlier, the head and tail pointers can have different values at each vertex in the assertion graph for the stationary implementation, and will eventually become X in simulation. For this reason, we make both pointers as well as the wrap bit precise in the model refinement. Let these nodes be ordered as {tail[1], tail[0], head[1], head[0], wrap}. Let us assume that the simulation relation for vertex 1-filled has been partially computed as [tail[1] = 0, tail[0] = 1, head[1] = 0, head[0] = 0, wrap = 0]. To update the simulation relation for 1-filled through the self-loop, we first compute the new set of states simulated by the vertex:

newsim(1-filled) = post(sim(1-filled) ∩ ant(1-filled, 1-filled))
                 = post([tail[1] = 0, tail[0] = 1, head[1] = 0, head[0] = 0, wrap = 0] ∩ [enq = z, deq = z])
                 = [tail[1] = z, tail[0] = !z, head[1] = 0, head[0] = z, wrap = 0].
In order to preserve precision when updating the simulation relation, a unique special symbolic variable $c is temporarily introduced to represent the updated simulation relation:

sim′(1-filled) = sim(1-filled) ∪ newsim(1-filled)
              = (!$c → [tail[1] = 0, tail[0] = 1, head[1] = 0, head[0] = 0, wrap = 0]) ∧
                ($c → [tail[1] = z, tail[0] = !z, head[1] = 0, head[0] = z, wrap = 0])
              = [tail[1] = $c&z, tail[0] = !$c+!z, head[1] = 0, head[0] = $c&z, wrap = 0].

Finally, the parametric representation of the constraint among the precise nodes is canonized using the unique set of symbolic variables determined by the precise nodes, e.g., {$t[1], $t[0], $h[1], $h[0], $w}, i.e.,

sim′(1-filled) = [tail[1] = $t[1], tail[0] = !$t[1], head[1] = 0, head[0] = $t[1], wrap = 0].
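The union step above has a very small mechanical core; the following sketch (with invented names, and with ite standing for a symbolic if-then-else over whatever expression type is used for node values) only illustrates how the fresh choice variable $c selects, node by node, between the old and the new assignment before canonization.

def union_with_choice(old, new, c, ite):
    """old, new: dicts mapping node -> symbolic value; c: the fresh variable $c."""
    return {node: ite(c, new[node], old[node]) for node in old}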
Figure 8 shows the refined quaternary simulation result for the original FIFO specification in Figure 1 on the stationary implementation, where the relations between the tail pointer and the head pointer are represented as constraints rather than their actual canonized parametric forms for clearer exposition. This improved result is sufficient for the GSTE model checking to conclude that the consequent on the edge (0-ahead, done) is satisfied.
Fig. 8. Refined Quaternary Simulation for the FIFO
Now, let us turn to the canonization algorithm. The algorithm canonizes an extended quaternary assignment in two phases, assuming a total ordering among precise nodes. Given a quaternary assignment, the first phase of the algorithm extracts the list of precise node and symbolic value pairs ordered by the precise nodes, and generates the unique parametric representation for the list without building the intermediate state predicate. It also returns a boolean relation between the unique set of parametric variables generated from the precise nodes and the symbolic variables in the original list. The entire algorithm is listed in Figure 9, where input old is a vector of precise node and symbolic value pairs such that all symbolic variables in the list have been properly renamed to avoid overlapping with the unique set of parametric variables and depend(f ) is a function that returns the set of symbolic variables in f . It should be pointed out that if only new needed to be computed, the algorithm could be significantly improved by existentially quantifying out variables from old as soon as they no longer were needed. However, in order not to lose the information implied by the precise nodes on the non-precise nodes, it is critical to compute this relation. The second phase of the canonization algorithm normalizes the symbolic quaternary value for every other circuit node in the quaternary assignment. The value of a nonprecise node in the original quaternary assignment may depend on the constraint among precise nodes captured by the original symbolic variables. Such a dependency relation is maintained in the canonized assignment by replacing the original constraint with the equivalent constraint over the set of unique parametric variables.
Algorithm: UniqueParam(old[1:n])
 1.  Rel := true;
 2.  Z := depend(old[1:n]);
 3.  for i := 1 to n
 4.    (Node, Value) := old[i];
 5.    H := ∃Z. Rel ∧ Value;
 6.    L := ∃Z. Rel ∧ ¬Value;
 7.    Value′ := (UniqVar(Node) → H) ∧ (¬UniqVar(Node) → ¬L);
 8.    new[i] := (Node, Value′);
 9.    Rel := Rel ∧ (Value′ = Value);
     endfor
 10. return (new[1:n], Rel);
end.

Fig. 9. Unique Parameterization
Although the model refinement approach might be computationally more expensive than the specification refinement approach, it has many advantages. First, deep knowledge about the circuit behavior is not required. What is required is to identify the set of precise nodes in the circuit. This is a considerably simpler task and can be aided by the debugging capability in GSTE. Also, heuristics exist for identifying precise nodes; e.g., all control nodes may be good candidates for precise nodes. Second, the graph does not need to be unfolded to match the implementation, a process which can sometimes be very tedious. Last, the regression becomes easier to maintain, since an internal change in the circuit may require very little to be changed in the specification. Nevertheless, the key to successful verification is to make the best trade-off between specification refinement and model refinement, and between model checking efficiency and accuracy.
5 Model Checking with Backward Simulation
In this section, we show that in GSTE we can also use a more sophisticated model checking algorithm as yet another weapon against over-abstraction. More specifically, we will illustrate that by using backward simulation, i.e., model checking using normal satisfiability, a problem can get significantly simplified. In order to illustrate the approach, we will use the very simple 8-bit counter circuit shown in Figure 10 (a). The property we want to verify is that, after the circuit has been properly reset, the output outb is the complement of the output out. In Figure 10 (b) we give a typical GSTE assertion stating this property. Intuitively, after a reset and an arbitrary number of clock cycles, if the output out has some value O, then outb should be !O. O is a vector of symbolic constants used to encode the 256 different values that out can take on. The difficulty with this verification is that we need to maintain the relation between outb and out. We could split the assertion graph and effectively have a vertex for every value of the counter. However, for large registers, this is clearly not practical. On the other hand, we could simply make the nodes in out and outb precise. However, this would lead to a large number of variables if the registers are large. Although both approaches are feasible for the 8-bit wide register shown in Figure 10 (a), there is in
Fig. 10. Dual-rail counter: (a) circuit, (b) assertion graph
fact a more elegant solution that does not require any change of assertion graph and no extra variables. The idea is to use the normal satisfaction model checking algorithm to effectively move the output assumption into the internal signals of the circuit. As mentioned in Section 2, the model checking algorithm for normal satisfaction is a two-phase process. In the first phase, a pre-image fixpoint process is used to strengthen earlier antecedents in the assertion graph by propagating later antecedent constraints backwards. By using this algorithm, we will “pull” the value on out back to the inverter input mid, which will then be propagated forward during the second phase of the model checking algorithm to the node outb. In Figure 11 we show the strengthened assertion graph that is obtained after the first phase of the model checking algorithm. Note that we did not assume here that the backward propagation was able to pull the value O back through the adder. For this to be possible, the state-holding register out would almost certainly have to have been made precise. However, for this verification, it is sufficient to pull the value O back through the flip-flop.
Fig. 11. Assertion graph after backward strengthening phase.
In our final example, we turn our focus to the verification of a high-level property of our FIFO circuit. Although much less complete than our earlier FIFO specifications, the property illustrates the need for using a combination of the techniques we have introduced in this paper. The property we would like to establish states simply: if, after a reset, a value D is never enqueued, then the value D cannot be dequeued. A natural assertion graph for the property is shown in Figure 12 (a). Let us consider verifying this property against the stationary FIFO implementation in Figure 6 as an example. On the self-loop (loop, loop), the antecedent can be converted into a parametric form:

[enq = z, din = (if z then param(d[9:0] != D[9:0], d[9:0]) else X[9:0])]
(a) the Original Assertion Graph; (b) Symbolic Case Analysis
Fig. 12. Graph Refinement by Symbolic Case Analysis
where param takes a state predicate and a list of variables and returns the parametric constraint form for the variables. Since there are many values different from D[9:0], the parametric variables must be symbolic variables d[9:0] rather than symbolic constants. If we only mark both pointers and the wrap bit as precise nodes, the simulation relation will still be too coarse to be useful. The reason is that symbolic variables do not persist after they are written into the FIFO, and therefore the entries in the FIFO would all be X; consequently the verification would fail. One solution to this problem is to further refine the model. However, this requires all the entries in the FIFO to be made precise, as any X would cause the verification to fail. Unfortunately, doing so means that GSTE model checking would fall back completely to classic SMC and thus hit the same capacity barrier. Now, let us look at the verification of the property using the normal satisfaction model checking, i.e., applying backward simulation. The idea is to pull the symbolic constants E[9:0] at the output back to the input through the internals of the FIFO, so we can argue that E[9:0] is not D[9:0]. If we apply the backward simulation directly on the assertion graph, the value E[9:0] would get lost in the FIFO and could not be compared with D[9:0], as its precise location would soon become unknown and could be anywhere between the head and the tail pointers. Since the value is in the entry pointed to by the head pointer when it is dequeued on the last edge (loop, done), we could refine the assertion graph by doing a case analysis on all possible values of the head pointer on the edge. By doing so, the location of the data in the FIFO becomes fixed in each case. Verifying each individual graph separately, however, is tedious and time consuming when the depth of the FIFO increases. This problem can be solved by doing a symbolic case analysis using the symbolic indexing technique in GSTE. More precisely, we assign symbolic constants H[1:0] to the head pointer rather than scalar values, and constrain these constants to take valid values, as shown in Figure 12 (b). With these techniques, the GSTE model checker completes successfully. This example further demonstrates the power of GSTE, which gives one the great flexibility to use a combined strategy of specification refinement, model refinement and algorithm selection in model checking.
6 Experimental Results
The GSTE model checker has been implemented using a functional language fl inside the Intel Forte environment ([1]). We first conduct an experiment on the two implementations
of the 10-bit-wide FIFO, and analyze how each of the approaches we proposed in the paper scales with the depth of the FIFO and how they compare with each other. We then show results collected from real verification efforts completed using GSTE. All the experiments were done on a computer with a 1.5 GHz Intel® Pentium® 4 processor and 1 GB of memory. We would like to emphasize that all the verifications were done on the original circuits with no prior abstraction. Table 1 lists the verification results for the FIFO functional specification. The number of edges in the assertion graph for each depth is given for each refinement approach. Note that the number for the graph refinement approach is much larger than what one would expect from the discussion in Section 3. It turns out that in the marching implementation, the case in which a FIFO entry gets fresh data and the case in which it preserves its data correspond to two different circuit states. Therefore the vertices in the graph have to be split further based on different combinations of the enqueue and dequeue operations, on top of the already quadratic complexity. The situation is even worse for the stationary implementation, as every vertex has to correspond to a different combination of the head and tail pointer values. For this reason, we did not include the experiment for that case. For the model refinement approach, the number of precise nodes is given for each depth and each implementation, which is essentially the number of non-data-path elements in the circuit.

Table 1. GSTE Results for the FIFO Functional Specification

                                 Marching                                          Stationary
                    Graph Refinement           Model Refinement                    Model Refinement
Depth  #Latch  #Symb.   #Edges   Time    Mem.   #Edges  #Prec.  Time    Mem.       #Prec.  Time     Mem.
       Nodes   Const.            (sec.)  (MB)           Nodes   (sec.)  (MB)       Nodes   (sec.)   (MB)
  3       53     16        75      2.5    16      28      2       0.2    17.3         5       1.2    17.4
  7      199     16       377     23.9    20      64      3       1.0    17.5         7       7.7    19.8
 15      249     16      1701    278.9    35     136      4       3.6    18.8         9      60.5    28.0
 31      523     16      7229   3993.1   134     280      5      12.9    21.8        11     508.4    35.7
 63     1481     16       N/A                    568      6      52.2    27.7        13    4227.3    63.7
Figure 13 (a) lists the verification result of the no-data-creation property from Section 5 (Figure 12) on the stationary implementation using the backward GSTE. We are not reporting the result for the forward symbolic simulation, since it did not even finish for the 3-entry FIFO after an hour and 500 MB with variable reordering turned on. For the backward GSTE, further refinement is needed for the same reason as in the functional specification verification. It is very interesting to note that, contrary to the previous experiment, the graph refinement this time is much more favorable, as it only requires the splitting of the vertex loop into two vertices, one corresponding to the state after an enqueue operation and the other to the state after no enqueue operation. The model refinement approach, on the other hand, requires marking as precise the enable bit of every bit of every data entry in the FIFO, which quickly approaches the fully precise circuit model. The data shown in the table is for the refined graph. As one can see from the table, the memory usage remains small. The verification time, however, increases much more quickly. This has much more to do with the backward simulation
being entirely implemented in fl than with the quadratic time complexity for enumerating all possible combinations of the head and tail pointer values.

Depth   Time (sec.)   Memory (MB)
  3          49            39
  7         167            40
 15         478            41
 31        2088            49
 63       3 hrs.           83
(a) No-Data-Creation
Circuit   #Latch   #Gates   #Symb.      #Symb.      #Prec.   Time     Memory
          Nodes              Constants   Variables   Nodes    (sec.)   (MB)
ckt1        718     17367       66           2          4       122      36
ckt2       7506     62735       55          40         41      5220     260
ckt3      22433    187928        8         393          0       117     509
ckt4      22433    187928        4          99          0       500     240
ckt5      34899    406630       16           8          0       451     361
ckt6      46682    241854      122         160         12       132     295
(b) Results from Real Life Verification

Fig. 13. GSTE Performance Comparison
Finally, Figure 13 (b) shows the data we collected from some real-life verification efforts completed using GSTE in Intel. In each case, one non-trivial functionality of a circuit was verified using GSTE without any prior model abstraction/pruning. A majority of these properties cover the entire circuits from inputs to outputs. As one can see, the verification time and the memory usage are fairly small in each case. Another interesting point is that the number of BDD variables (i.e., symbolic constants, symbolic variables, and precise nodes) is very small, independent of the circuit complexity. It should be pointed out that none of these verifications were possible using a state-of-the-art classic symbolic model checker. This set of data again strongly demonstrates the viability of GSTE as a new generation model checking solution for complex circuit designs.
7
Conclusion
In this paper, we described several GSTE techniques that give one the flexibility not only to choose and seamlessly adjust the level of abstraction in a model as well as in a specification, but also to select among different model checking algorithms, in order to complete the verification with the efficiency it needs. GSTE provides a unified model checking framework that combines the best from both the STE world and the classic symbolic model checking world.
References
1. M. Aagaard, R. Jones, T. Melham, J. O’Leary, and C.-J. Seger. A methodology for large-scale hardware verification. In FMCAD’2000, November 2000.
2. M. Aagaard, R. Jones, and C.-J. Seger. Combining theorem proving and trajectory evaluation in an industrial environment. In Proceedings of the 35th DAC, pages 538–541, June 1998.
3. M. Aagaard, R. Jones, and C.-J. Seger. Formal verification using parametric representations of Boolean constraints. In Proceedings of the 36th DAC, June 1999.
4. F. Balarin and A. Sangiovanni-Vincentelli. An iterative approach to language containment. In Proc. of the 5th Workshop on CAV, pages 193–195, 1999.
5. D. Beatty and R. Bryant. Formally verifying a microprocessor using symbolic simulation methodology. In Proceedings of the 31st DAC, June 1994.
6. R. Bryant and C.-J. Seger. Formal verification of digital circuits using symbolic ternary system models. In DIMACS Workshop on Computer-Aided Verification, June 1990.
7. J. Burch, E. Clarke, and D. Long. Symbolic model checking with partitioned transition relations. In Proc. of International Conference on VLSI, 1991.
8. C.-T. Chou. The mathematical foundation of symbolic trajectory evaluation. In CAV’1999, July 1999.
9. E. Clarke, O. Grumberg, and D. Peled. Model Checking. The MIT Press, 1999.
10. O. Coudert, J. Madre, and C. Berthet. Verifying temporal properties of sequential machines without building their state diagrams. In Proc. of CAV’90, pages 23–32, 1990.
11. S. Hazelhurst and C.-J. Seger. A simple theorem prover based on symbolic trajectory evaluation and OBDDs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 14(4):413–422, April 1995.
12. S. Hazelhurst and C.-J. Seger. Model checking lattices: Using and reasoning about information orders for abstraction. Logic Journal of the IGPL, Volume 7:375–411, May 1999.
13. R. Hojati, S. Krishnan, and R. Brayton. Early quantification and partitioned transition relations. In Proc. of the 1996 ICCAD, pages 12–19, 1996.
14. A. Jain. Formal Hardware Verification by Symbolic Trajectory Evaluation. PhD thesis, ECE, Carnegie-Mellon University, August 1997.
15. K. McMillan. Symbolic Model Checking: An Approach to the State Explosion Problem. Kluwer Academic, 1993.
16. J. O’Leary, X. Zhao, R. Gerth, and C.-J. Seger. Formally verifying IEEE compliance of floating-point hardware. Intel Technology Journal, Q1:147–190, 1999.
17. M. Pandey and R. Bryant. Exploiting symmetry when verifying transistor-level circuits by symbolic trajectory evaluation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18(7):918–935, July 1999.
18. M. Pandey, R. Raimi, D. Beatty, and R. Bryant. Formal verification of PowerPC(TM) arrays using symbolic trajectory evaluation. In Proceedings of the 33rd DAC, pages 649–654, June 1996.
19. M. Pandey, R. Raimi, D. Beatty, R. Bryant, and M. Abadir. Formal verification of content addressable memories using symbolic trajectory evaluation. In Proceedings of the 34th DAC, June 1997.
20. C.-J. Seger and R. Bryant. Formal verification by symbolic evaluation of partially-ordered trajectories. Formal Methods in System Design, 6(2):147–190, March 1995.
21. J. Yang. Generalized symbolic trajectory evaluation. STE Symposium, CAV’2000, 2000.
22. J. Yang and C.-J. Seger. Generalized symbolic trajectory evaluation. Intel SCL Technical Report (submitted for journal publication), 2000.
23. J. Yang and C.-J. Seger. Introduction to generalized symbolic trajectory evaluation. In Proc. of ICCD, September 2001.
24. J. Yang and A. Tiemeyer. Lazy symbolic model checking. In DAC’00, 2000.
Analysis of Symbolic SCC Hull Algorithms

Fabio Somenzi1, Kavita Ravi2, and Roderick Bloem3

1 University of Colorado at Boulder, [email protected]
2 Cadence Design Systems, [email protected]
3 Graz University of Technology, [email protected]
Abstract. The Generalized SCC Hull (GSH) algorithm of [11] can be instantiated to obtain many symbolic algorithms for the detection of fair cycles in a graph. We present a modified GSH with improved convergence properties, and we use it to study—both in theory and experimentally—the performance of various algorithms employed in symbolic model checkers. In particular, we show that the algorithm of Emerson and Lei [4] has optimal complexity among those that can be derived from GSH. We also propose an early termination check that allows the Lockstep algorithm [1] to detect the existence of a fair cycle before an entire SCC has been examined. Our experimental evaluation confirms that no one method dominates the others, and identifies some of the factors that impact run times besides those accounted for by the theoretical analysis.
1
Introduction
Cycle detection algorithms are at the heart of symbolic model checkers. For common specification mechanisms like fair CTL, LTL, and ω-automata, deciding the satisfaction of a property entails determining whether an infinite path can be found along which certain fairness constraints are satisfied infinitely often [3,9]. The existence of a fair cycle—one that goes through a set of states intersecting all fairness constraints—provides an affirmative answer. Given its central role, fair cycle detection has received considerable attention over the years. In recent times, both new algorithms [15,1] based on SCC enumeration, and variants of the classical algorithm of Emerson and Lei [4] that computes a hull of all fair SCCs [7,6,8,5] have been proposed for symbolic model checking [10]. A first contribution to the classification and comparison of these different algorithms was offered in [11]. There, it was shown that most algorithms that compute hulls of the fair SCCs can be regarded as instantiations of a Generic SCC Hull (GSH) algorithm. In this paper, we improve the GSH algorithm and we use it to derive several bounds on the performance of classes of instantiations of GSH. These classes subsume all popular SCC-hull algorithms. Among our results, we prove that the algorithm of Emerson and Lei
This work was supported in part by SRC contract 2001-TJ-920 and NSF grant CCR-99-71195. This work was done while this author was with the University of Colorado at Boulder.
is optimal in terms of worst-case number of image and preimage computations among the SCC-hull algorithms. Besides providing an excellent framework for the analysis of SCC-hull algorithms, GSH is flexible and efficient in practice. Our implementation of the Emerson and Lei algorithm as an instantiation of GSH is actually slightly more efficient than the traditional one it replaced. We used the new implementation of GSH to compare several different SCC hull algorithms, and to compare them to a modified Lockstep [1] algorithm, enhanced by an early termination check that allows it to detect the existence of a fair cycle even before an entire SCC has been examined. Our experiments confirm in essence the observations of [11,5] that there is no clear winner among the competing algorithms. They also shed some light on what factors affect the relative performance of different techniques, and point to directions for possible improvement. The paper by Fisler et al. [5] also compares different SCC hull algorithms with respect to their complexity. Its bounds are in terms of the number of nodes of the graph. For the large graphs encountered in symbolic model checking, we argue that our analysis, which is in terms of quantities like the diameter of the graph and the height of the SCC quotient graph, provides more useful bounds. On the other hand, we do not discuss maximum gaps between algorithms.
2
Preliminaries
Symbolic graph algorithms operate on the characteristic functions of sets. Besides the usual Boolean connectives, they employ as basic operations the computation of the predecessors (EX) and successors (EY) of a set of states1. From these operators, others can be derived; in particular, we shall use:

E q U p = µZ . p ∨ (q ∧ EX Z)        E q S p = µZ . p ∨ (q ∧ EY Z)
EG p = νZ . p ∧ EX Z                 EH p = νZ . p ∧ EY Z
EF p = E true U p                    EP p = E true S p ,
where µZ . τ denotes the least fixpoint of τ and νZ . τ denotes its greatest fixpoint. Given a graph G(V, E) and a set C = {c0, . . . , cm−1} ⊆ 2V of (Büchi) fairness constraints, a fair cycle is a cycle that intersects each ci. Several algorithms have been proposed that check for the existence of a fair cycle by computing an SCC hull [11]; that is, a set of states that is empty if no fair cycle exists in G, and otherwise contains all fair SCCs. A fair SCC is a maximal nontrivial strongly connected component of the graph that intersects all fairness constraints, and hence contains a fair cycle. (A trivial SCC has one node and no arcs.) A set of vertices is SCC-closed if every SCC is either contained in it or disjoint from it. The SCC quotient graph of G is obtained by collapsing each SCC to a single node. This graph is acyclic, and defines a partial order of the SCCs. The transitive (irreflexive)
Throughout this paper, the interpretation of a temporal logic formula is the characteristic function of a set of states. We do not distinguish a set from its characteristic function.
closure of a graph is obtained by adding arcs from a node to all nodes to which it has paths of length greater than 1. Let L be a finite lattice and l, l′ ∈ L generic elements of L. A function τ : L → L is monotonic if l ≤ l′ implies τ(l) ≤ τ(l′); τ is downward if τ(l) ≤ l. A monotonic function over a finite lattice has both a least and a greatest fixpoint [13]. Given a family T = {τ1, . . . , τk} of monotonic functions over L, we are interested in their greatest common fixpoint. We summarize here the results from [12] that we need. For σ a finite sequence over T, let τσ be the function L → L obtained by composing all the functions in σ in the order specified by the sequence. We say that σ is 1-closed if for i = 1, . . . , k we have τi(τσ(1)) = τσ(1), in other words, if τσ(1) is a common fixpoint of τ1, . . . , τk. Let σ be a 1-closed sequence and Γ = ∧τ∈T τ be the pointwise greatest lower bound of all the functions in T. Then

τσ(1) = νZ . Γ(Z) .     (1)

From this result it is easily seen that τσ(1) is the greatest common fixpoint of τ1, . . . , τk. An infinite sequence over T is fair if each τ ∈ T appears in the sequence infinitely often. If the functions in T are downward, then all fair sequences have a finite prefix that is 1-closed. Also, for a 1-closed sequence σ, if τσ(1) ≤ l, then τσ(l) = τσ(1). The results presented above imply that we can compute this common fixpoint by iteratively applying all functions, as long as the sequence is fair, i.e., as long as we keep revisiting every function. A set of states X is forward-closed [11] if X = EP X; X is backward-closed if X = EF X. If X is forward-closed, then X ≥ EY X and EY X is forward-closed. If X is backward-closed, then X ≥ EX X and EX X is backward-closed.
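As an illustration only, the following is a minimal explicit-state sketch of these operators in Python: every characteristic function is represented as a plain set of states and the graph as a set of (u, v) edge pairs, instead of BDDs, and all function names are invented for the sketch rather than taken from any implementation.

def ex(edges, z):
    """EX Z: the predecessors of Z, i.e., states with a successor in Z."""
    return {u for (u, v) in edges if v in z}

def ey(edges, z):
    """EY Z: the successors of Z, i.e., states with a predecessor in Z."""
    return {v for (u, v) in edges if u in z}

def lfp(f):
    """Least fixpoint of a monotonic set transformer, from the empty set."""
    z = set()
    while True:
        nz = f(z)
        if nz == z:
            return z
        z = nz

def eu(edges, q, p):
    """E q U p = muZ . p | (q & EX Z)."""
    return lfp(lambda z: set(p) | (set(q) & ex(edges, z)))

def es(edges, q, p):
    """E q S p = muZ . p | (q & EY Z)."""
    return lfp(lambda z: set(p) | (set(q) & ey(edges, z)))

def eg(edges, p):
    """EG p = nuZ . p & EX Z, computed by downward iteration from p."""
    z = set(p)
    while True:
        nz = set(p) & ex(edges, z)
        if nz == z:
            return z
        z = nz

def ef(edges, states, p):
    """EF p = E true U p; 'states' plays the role of true."""
    return eu(edges, states, p)

def ep(edges, states, p):
    """EP p = E true S p."""
    return es(edges, states, p)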
3 The Generic SCC Hull Algorithm In this section, we shall describe the Generic SCC Hull algorithm. We shall first describe how an SCC hull can be computed. Then, we shall introduce the algorithm and constraints on the schedule so that only potentially useful operations can be chosen. Finally, we shall look at existing algorithms as GSH schedules. 3.1
Computing an SCC Hull
Let G = (V, E) be a graph and C = {c0, . . . , cm−1} ⊆ 2S a set of Büchi fairness constraints. Let TP = {ES0, . . . , ESm−1, λZ. Z ∧ EY Z} be a set of monotonic downward past-tense (or forward) operators over S, where ESi stands for λZ. E Z S (Z ∧ ci). Likewise, TF = {EU0, . . . , EUm−1, λZ. Z ∧ EX Z} is a set of monotonic downward future-tense (or backward) operators with EUi = λZ. E Z U (Z ∧ ci). Let TB = TP ∪ TF. (P stands for “Past,” F stands for “Future,” and B stands for “Both.”)
Theorem 1. Let φ, π, and β be finite 1-closed sequences over TF, TP, and TB, respectively. Let F (P) be the set of nodes of G with a path to (from) a cycle fair with respect to C. Let B = F ∩ P. Then

τβ(1) = B ,    τφ(1) = F ,    and    τπ(1) = P .
Proof. We prove that τφ(1) = F. The proofs for τπ(1) = P and τβ(1) = B are similar. It follows from the results in Section 2 that τφ(1) is the greatest common fixpoint of the functions in TF. We shall now prove that F has the same property. We shall first prove that F is a common fixpoint. It is clear that F ⊇ E F U(F ∧ ci). For the other direction, suppose v ∈ F. Then v has a path to a fair cycle, and this path and the cycle are by definition contained within F. Hence, there is a path from v to a node in ci that is wholly contained within F. It is also clear that F ⊇ F ∧ EX F. For the other direction, note that if a node has a path to a fair cycle, then it has a successor with a path to a fair cycle. We shall now prove that F is the greatest common fixpoint of TF. Suppose F′ is a common fixpoint of TF, and let v ∈ F′. From F′ = F′ ∧ EX F′ it follows that v has a successor v′ ∈ F′. This successor has a path to some v0 ∈ c0 ∧ F′ because F′ = E F′ U(F′ ∧ c0). The path from v to v0 is a non-trivial path connecting v to a state in c0. With a similar process, we can extend this path contained in F′ so that it goes through states v1, . . . , vi, . . . with vi ∈ c_(i mod m). Since F′ is finite, it is possible to extract from this path a cycle that touches all fairness constraints. Hence, v can reach a fair cycle, and
thus F′ ⊆ F. As a result of this theorem, and the fact that all functions in TF are downward, any fair sequence of functions from TF has a finite prefix φ that computes the set of states that can reach a fair cycle. We can compute this set by applying all functions until we have reached a common fixpoint. In a similar manner, we can compute P or B. In our applications, we need to detect the existence of a fair cycle, i.e., we can stop as soon as we know that F, P or B are empty; for counterexample generation we need to detect the cycle, by either isolating the fair SCC or knowing whether it is minimal or maximal. The following corollary allows us to halt the computation of B as soon as we have reached either a common fixpoint of TF, or a common fixpoint of TP. Corollary 1. If σ is a sequence over TB that is 1-closed with respect to TF (that is, τ(τσ(1)) = τσ(1) for all τ ∈ TF), then B ≤ τσ(1) ≤ F. If σ is 1-closed with respect to TP, then B ≤ τσ(1) ≤ P. Proof. We consider only the case in which σ is 1-closed with respect to TF. The other case is proved similarly. Since closedness with respect to TB implies closedness with respect to TF, if β is a 1-closed sequence over TB, we have B = τβ(1) ≤ τσ(1). Let φ be a 1-closed sequence over TF. Then τσ(1) = τ(σ,φ)(1) ≤ τφ(1) = F, where the first equality comes from the closedness of σ with respect to TF.
In the proof of Theorem 1, we have assumed that all functions were downward. In practice, for schedules that apply past-tense operators only, we can use λZ. EY Z instead of λZ. Z ∧ EY Z because EP S0 is forward-closed. For future-tense schedules, we can use λZ. EX Z instead of λZ. Z ∧ EX Z, because Z restricted to the reachable states is backward-closed. Therefore, the simplified operators are contextually downward. In the following, we assume that contextual downwardness is exploited when possible, and we denote by EX and EY the simplest forms of the operators allowed by the schedule. 3.2
Limiting the Choice of the Operator
Corollary 1 leads to the Generalized SCC Hull (GSH) algorithm of Fig. 1 (which improves the one in [11]). The algorithm functions by keeping a fixpoint iterate Z (and its previous value ζ). It iteratively picks and applies an operator τ. We say the operator makes progress if it changes Z. The algorithm keeps a list of disabled functions γ. Function pick is responsible for the scheduling. Note that a schedule need not be fixed in advance. The function converged ensures convergence, by disabling operators that cannot make progress. Its decisions depend on whether Z changes, the last operator, and the operators that will not yield progress; they are discussed below.

Theorem 2. If a past-tense (future-tense) operator is an ES (EU) or makes no progress, then it cannot make progress when applied again unless in the meanwhile a different past-tense (future-tense) operator has made progress.

Proof. We shall prove the theorem for past-tense operators. The case for future-tense operators is dual. Suppose EY causes no progress. Then all the minimal SCCs are nontrivial. Application of EX may not remove any of them. Application of EUi may remove some of them. Let s be a minimal nontrivial SCC that is removed by EUi. Then s is not fair, and it cannot reach the i-th fair set. Hence, EUi removes all the successors of s together with s. Therefore, no trivial SCC becomes minimal as a result of the application of EUi. Thus, EY will not cause progress if applied again. Suppose ESi is applied. Then all the minimal SCCs of the resulting graph intersect the i-th fair set. As above, if EUj (j ≠ i) removes s, then it removes also all its successors. Hence, all minimal SCCs continue to intersect the i-th fair set. Likewise, (repeated) application of EX may eliminate s, but since EX eliminates terminal SCCs, when s is eliminated it has no successors and no new minimal SCCs are thus created. Hence, applying ESi again will not cause progress. We have shown that if a past-tense operator τ is an ES or fails to make progress, no intervening future-tense operator can change the graph in such a way that application of τ will make progress.
Theorem 3. For a given sequence of operators that GSH may pick, there is a graph G such that all operators that are not in γ at the j-th iteration of the sequence make progress in G restricted to the iterate Z at iteration j. Proof (sketch). The main observation is that the operators may only remove minimal or maximal SCCs. Given a sequence of pairs—each pair consisting of an operator and a Boolean denoting whether the operator made progress—one can construct a graph G that satisfies the following two properties, and their counterparts for future-tense operators.
global TF, TP

gsh(G, S0) {                        // graph, initial states
    Z := EP S0;                     // fixpoint iterate
    γ := ∅;                         // disabled operator set
    do {
        ζ := Z;
        τ := pick((TF ∪ TP) \ γ);
        Z := τ(Z);
        (done, γ) := converged((Z ≠ ζ), τ, γ);
    } until (done);
    return Z;
}

converged(progress, τ, γ) {
    if (progress) {
        // enable operators of the same tense as τ . . .
        if (τ ∈ TF) γ := γ \ TF; else γ := γ \ TP;
        // . . . except τ itself if it is an EU or an ES
        if (τ ∉ {EX, EY}) γ := γ ∪ {τ};
        return (false, γ);
    } else {
        // no progress, disable τ
        γ := γ ∪ {τ};
        return ((TF ⊆ γ ∨ TP ⊆ γ), γ);
    }
}

Fig. 1. Improved GSH algorithm
1. If the j-th operator made progress, then the subgraph of G defined by Zj has a minimal trivial SCC, so that EY will make progress if applied. 2. If there has been a successful application of a past-tense operator since the last ESi , then the subgraph of G defined by Zj has a minimal SCC that has no intersection with ci , so that ESi will make progress if applied. The graph can be constructed inductively as a linear graph with some self loops, by starting with a graph consisting of a single node, and working back through the sequence of operations, making the graph consistent with the observations.
Theorem 2 leads to the improved convergence test of Fig. 1, while Theorem 3 proves that it is optimal in the sense that if decisions are only based on the picked operator and whether it made progress, the set of disabled operations could never be any larger. Theorem 2 also allows us to improve algorithms that use both tenses [7,6] as follows. If during a pass no progress is obtained by application of the past-tense (future-tense) operators, these operators are no longer applied. This improvement is for the case in which we insist on having both initial and terminal SCCs fair. Otherwise, the criterion of Fig. 1 suffices. Theorems 2 and 3 are not exploited in the original GSH [11].
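To make the control flow of Fig. 1 concrete, the following is a hypothetical explicit-state rendering of GSH in Python, built on the same toy set representation as the sketch at the end of Section 2 (sets of states, a graph as a set of edges); all identifiers are ours, and a real implementation operates on BDDs rather than Python sets.

def gsh(edges, init, fairness, pick=None):
    """Return an SCC hull: empty iff no fair cycle is reachable from init."""
    def ex(z):                              # states with a successor in z
        return {u for (u, v) in edges if v in z}

    def ey(z):                              # states with a predecessor in z
        return {v for (u, v) in edges if u in z}

    def lfp(f):                             # least fixpoint from the empty set
        z = set()
        while True:
            nz = f(z)
            if nz == z:
                return z
            z = nz

    def make_eu(c):                         # EU_i : lambda Z . E Z U (Z & c_i)
        return lambda z: lfp(lambda y: (z & set(c)) | (z & ex(y)))

    def make_es(c):                         # ES_i : lambda Z . E Z S (Z & c_i)
        return lambda z: lfp(lambda y: (z & set(c)) | (z & ey(y)))

    ex_op = lambda z: z & ex(z)             # lambda Z . Z & EX Z
    ey_op = lambda z: z & ey(z)             # lambda Z . Z & EY Z
    tf = [make_eu(c) for c in fairness] + [ex_op]   # future-tense operators
    tp = [make_es(c) for c in fairness] + [ey_op]   # past-tense operators

    if pick is None:                        # default: first enabled operator
        pick = lambda enabled: enabled[0]

    z = lfp(lambda y: set(init) | ey(y))    # Z := EP S0 (reachable states)
    disabled = set()                        # gamma, the disabled operator set
    while True:
        enabled = [op for op in tf + tp if op not in disabled]
        op = pick(enabled)
        nz = op(z)
        if nz != z:                         # progress: re-enable the operators
            z = nz                          # of the same tense as op ...
            disabled -= set(tf if op in tf else tp)
            if op is not ex_op and op is not ey_op:
                disabled.add(op)            # ... except an EU/ES (Theorem 2)
        else:                               # no progress: disable the operator
            disabled.add(op)
            if all(o in disabled for o in tf) or all(o in disabled for o in tp):
                return z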
Theorem 4. The gsh algorithm returns the empty set if no fair cycle exists. If a fair cycle exists, it returns a set containing all states on a fair cycle. Proof. The algorithm terminates because of the result in Section 2 that any fair sequence has a finite 1-closed prefix, and because converged does not disable too many operations, as proven in Theorem 2. Correctness follows from Theorem 1.
It should be noted that either the minimal SCCs or the maximal SCCs of a non-empty hull produced by GSH are fair [11]. 3.3
Existing Algorithms as GSH Schedules
Efficient versions of popular SCC-hull algorithms can be described as specializations of GSH by giving their schedules. In particular:

EL [4] : EU0, EX, . . . , EUm−1, EX, EU0, EX, . . .
EL2 [7,5] : EU0, EU1, . . . , EUm−1, EG, EU0, EU1, . . .
HH [7,6] : EU0, ES0, . . . , EUm−1, ESm−1, EX, EY, EX, EY, . . . , EU0, ES0, . . .

Here, EG represents applying EX to convergence, which implies that the EL2 schedule is not fixed in advance, but rather depends on the termination of EG. Note that the termination condition of GSH is more refined than that of the algorithms above. In particular, even with the EL schedule, GSH may detect convergence earlier than the standard EL algorithm, as shown in the following example.

Example 1. Suppose we are using algorithm EL to compute the set of states reaching a cycle intersecting a single Büchi fairness constraint F. Let Z be the fixpoint iterate. We start with Z0 equal to the set of all reachable states and compute

Z1 = E Z0 U (Z0 ∧ F)
Z2 = Z1 ∧ EX Z1 .

Suppose Z1 ≠ Z0 and Z2 = Z1. EL will perform another iteration because Z2 ≠ Z0, computing

Z3 = E Z2 U (Z2 ∧ F)
Z4 = Z3 ∧ EX Z3 .

If GSH picks operators so as to mimic EL, it also computes Z1 and Z2, but then it stops, because all future-tense operators are disabled. Since Z1 is a fixpoint, Z1 = E Z1 U (Z1 ∧ F), and from Z2 = Z1 it follows that Z2 is the set of states that can reach a fair cycle.
Accelerated convergence is obviously desirable, and in practice implementing EL as a GSH schedule improves performance slightly. The main virtue of the GSH approach, however, lies in providing a common framework for the implementation and study of SCC-hull algorithms, as shown in the next section.
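As a purely illustrative usage of the toy gsh() sketch given above (not the authors' implementation), consider a three-state graph in which state 2 carries a fair self-loop reachable from the initial state 0, plus a dead end 1; with the default pick the run stays future-tense, so the returned hull is a common fixpoint of TF within the reachable states (cf. Corollary 1).

edges = {(0, 1), (0, 2), (2, 2)}          # 1 is a dead end; 2 has a self-loop
fairness = [{2}]                          # a single Buchi fairness constraint
hull = gsh(edges, init={0}, fairness=fairness)
print(hull)                               # {0, 2}: the states that can reach
                                          # the fair cycle; nonempty, so the
                                          # language of this toy graph is not
                                          # empty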
Table 1. Labels for nodes

label   meaning
0       has self loop, belongs to no fairness constraints
n > 0   has self loop, belongs to all fairness constraints except the nth
x       trivial SCC, belongs to no fairness constraints
o       trivial SCC, belongs to all fairness constraints
F       has self loop, belongs to all fairness constraints

4
Complexity of GSH
Following [1], we measure the cost of GSH in steps, the number of EX and EY operations applied to a nonempty set. We express the cost in terms of the number of fairness constraints |C|, the diameter of the graph d, the height of (i.e., the length of the longest paths in) the SCC quotient graph h, and the number of SCCs (N) and nontrivial SCCs (N′). Since d, h, and N are often much smaller than n, this analysis provides more practical bounds than using the number of states. In [1, Theorem 1] it was shown that EL takes Θ(|C|dh) steps. In this section we extend this result to account for the flexibility in the scheduling of operators that characterizes the GSH algorithm. Throughout the analysis, we shall analyze worst-case behavior: we pick graphs that are hard for the algorithms. In Sections 4.1 and 4.2, we shall look at how many steps are needed if the scheduler chooses operations badly. In Section 4.3, we shall study how many steps are needed if the scheduler makes optimal decisions. The conventions used to label the nodes in Figs. 2–5 are shown in Table 1. To avoid clutter, the arcs controlling the diameter are not shown, but rather described in the captions.
Bounds for Unrestricted Schedules
Theorem 5. GSH takes O(|C|dN) steps. Proof. Let t = |TB| = 2|C| + 2 be the number of operators applied by GSH. Clearly, O(t) = O(|C|). We must have progress at least once every t iterations, because otherwise all operators have been applied without progress and the algorithm terminates. Each operator's application cost is O(d). Hence, we do O(|C|d) work in between two advances toward the fixpoint. The number of times we make progress is O(N). Hence, the desired bound. To show that the number of times we make progress is O(N) we argue as follows. The initial Z is SCC-closed, because it is either V or the reachable subset of V. When any of the operators in TB is applied, SCC-closedness is preserved. In particular, this holds for EU and ES because Z is (inductively) SCC-closed. Indeed, if v ∈ Z has a path to Z′ ⊆ Z all in Z, then the SCC of v is contained in Z (since Z is SCC-closed); hence, each v′ in the same SCC also has a path to Z′ all in Z. The result is therefore SCC-closed, and the set of dropped states, which is the difference of two SCC-closed sets, is also SCC-closed. In summary, when there is progress, Z loses an integral number of unfair SCCs that is greater than or equal to 1. Thus, progress cannot occur more than N times.
Fig. 2. Graph showing that GSH is Ω(|C|dN ). Not shown are the arcs from any node with a label different from 0 to all the nodes to its right
The bound of Theorem 5 is tight in the following sense. Theorem 6. There exists a family of graphs and a corresponding family of schedules such that GSH takes Ω(|C|dN) steps. Proof. Consider the family of graphs parameterized by r that is exemplified by Fig. 2. A graph in the family has r rows, each of which consists of 4r nontrivial SCCs. Hence, there are N = 4r² SCCs, and |C| = r + 1 acceptance conditions; the height of the SCC graph is 4r, and the diameter is d = 2r + 1. Let U = {EUi | 1 ≤ i ≤ |C|}. We consider the following schedule.
– All elements of U \ {EU3} in decreasing index order r times, followed by EU3.
– All elements of U \ {EU4} in decreasing index order r − 1 times, followed by EU4.
– . . .
– All elements of U \ {EU|C|} in decreasing index order twice, followed by EU|C|.
– All elements of U \ {EU|C|} in decreasing index order once.
We now count the steps. The first series of subsequences takes (h/4) · O(|C|d) steps. The second series takes (h/4 − 1) · O(|C|d) steps, and so on. The total number of steps is therefore Ω(|C|dh²), which is also Ω(|C|dN).
4.2
Bounds for Restricted Schedules
If we strengthen the assumption about pick, we can prove an O(|C|dh + N − N′) bound. (N − N′ is the number of trivial SCCs.) The additional assumption is that the computation is performed in passes. We shall show that this bound is tight for EL2, but not for EL.

Definition 1. A pass is a sequence over TB that satisfies the constraints imposed by GSH, and such that:
1. No EUi or ESi appears more than once.
2. Either all operators in TF or all operators in TP appear.

Having thus divided the computation in passes, we can use reasoning similar to the one of [1, Theorem 1].
Table 2. Schedules and tenses. The algorithms are classified according to the mix of operators (EX or EY and EU or ES). Within these categories, they differ by tense.

                EL     EL2
future-tense    [4]    [7,5]
past-tense             [7,8]
both tenses            [7,6]
Theorem 7. If the operator schedule can be divided in passes, GSH takes O(|C|dh + N − N′) steps. Proof. A pass in which all EU operators and at least one EX have been applied once removes all the terminal unfair SCCs present at the beginning of the pass. Likewise, a pass in which all ES operators and at least one EY have been applied once removes all the minimal unfair SCCs present at the beginning of the pass. Then, by induction on h, we can prove that we cannot have more than h passes of either type, for a total of 2h passes. Each pass may contain more than one EX or EY. We charge their cost separately, and we argue that the total cost of the successful applications is O(N − N′), because each extra EX or EY removes a trivial SCC. The cost of the unsuccessful applications is dominated by the cost of the EUs and ESs, which is O(|C|d).
The algorithms of Table 2 all satisfy the restricted scheduling policy2, and are therefore O(|C|dh + N − N′). N − N′ is the linear penalty discussed in [5]. Though this penalty does not alter the complexity bounds in terms of the number of states n, it cannot be ignored when the bounds are given in terms of |C|, d, and h. Consider the following family Gr,s,f of graphs. Here, r is the number of rows, 0 < s < 2r determines the diameter, and f is the number of fairness conditions. (Shown in Fig. 3 is G3,2,2.) For this family of graphs, d = s + 2, |C| = f, and h = 4r − 1. We consider the EL2 schedule. The future-tense version of EL2 applies EU1 through EUf followed by EG until convergence. The first application of EU1 through EUf removes the f rightmost nontrivial SCCs of each row. The successive EG removes what is left of the first row, and the rightmost trivial SCC of all the other rows. The second round of EUs removes again the f rightmost nontrivial SCCs of each surviving row. EG then removes the second row entirely and the rightmost trivial SCC of each other row. We need a total of r passes to converge. Each pass costs |C|(s + 3) + 2r + 2. The |C|(s + 3) term is for the EUs and the 2r + 2 term is for the EG. Hence, the total cost of EL2 is (|C|(s + 3) + 2r + 2)r, which is not O(|C|dh) = O(|C|sr). So, even though EL2 may beat EL on specific examples, EL's O(|C|dh) bound is better. For EL2 we have the following lower bound, which is a special case of our previous observation about schedules that can be divided into passes.
GSH can implement a simplification of Kesten’s algorithm that disregards the issues of Streett emptiness.
Fig. 3. Graph G3,2,2 showing that EL2 is Θ(|C|dh + N − N′). Not shown are the arcs from each node of type o or n to every node to its right on the same row, and from every node of type x, but the first s, to each x to its right and to the first o node on the row
Theorem 8. Algorithm GSH with schedule EL2 runs in Θ(|C|dh + N − N′) steps. Proof. EL2 is O(|C|dh + N − N′) thanks to Theorem 7. To show that EL2 is Ω(|C|dh + N − N′) we resort to the family of graphs Gr,s,f we have used to show that EL2 is not O(|C|dh). We counted (|C|(s + 3) + 2r)r steps for EL2. Since N − N′ = 5r²/2 + r/2, |C|dh + N − N′ = |C|(s + 2)(4r − 1) + 5r²/2 + r/2, so (|C|(s + 3) + 2r)r is Ω(|C|dh + N − N′).
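For concreteness, the asymptotics used in this proof can be expanded as follows (a worked restatement, using only the quantities already quoted for Gr,s,f):

\[
  (|C|(s+3) + 2r)\,r = \Theta(fsr + r^2),
  \qquad
  |C|\,d\,h = f(s+2)(4r-1) = \Theta(fsr),
  \qquad
  N - N' = \tfrac{5r^2}{2} + \tfrac{r}{2} = \Theta(r^2).
\]

Holding f and s fixed, the EL2 cost therefore grows like r² while |C|dh grows only like r; the difference is exactly the N − N′ penalty.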
A similar analysis can be carried out for the variant of EL2 that uses both tenses (HH). In particular, for the upper bound Theorem 7 applies. For the lower bound, one can take the family of graphs we have used for EL2, add a fair SCC at the beginning of each row, and then mirror each row to the left. On the other hand, every schedule divided in passes in which the cost of applying EX and EY in each pass is dominated by the cost of applying EU and ES operators shares the (optimal) bounds of EL. 4.3
Bounds for Optimal Schedules
Theorem 6 is concerned with how badly things can go if the schedule is not well matched to the graph at hand. It is also interesting to consider what an optimal schedule can do. To this purpose, we provide GSH with an oracle that computes an optimal schedule, and we call the resulting method OSH. Theorem 9. OSH takes Θ(|C|dh) steps. Proof. For the upper bound we rely on [1, Theorem 1], which shows that EL is O(|C|dh). For the lower bound, we use the example of Fig. 4. In the graph shown, |C| = 3. The diameter is determined by the number of x nodes. Assume there are at least as many o nodes as there are x nodes.3 3
This assumption guarantees that the number of o nodes, which determines the number of “rounds” of EUs or ESs, is Ω(h).
Fig. 4. Graph showing that OSH is Ω(|C|dh). The o and n nodes have arcs forward to the other o and n nodes on the same side of F
OSH takes Ω(|C|dh) steps on this family of graphs. The cost of an EU or ES does not change until some x nodes are removed. At that point, the optimal schedule simply removes the remaining exposed x nodes to reach convergence. Hence, in this case, a unidirectional schedule is optimal. Suppose we use a future-tense schedule to fix ideas. Initially, we can only make progress by applying one EU. After that, we need to apply all remaining EUs before the rightmost o is exposed. At that point we can only make progress by applying EX. Therefore, we need to apply all EUs and one EX Ω(h) times. (Here is where we use the assumption of the number of o nodes.) The number of EUs is thus Ω(|C|h) and their cost is Ω(|C|dh).
A consequence of Theorem 9 is that EL is an optimal schedule. This optimality has its limits: it depends on our choice of measures, and there are graphs on which other schedules need fewer steps.

Corollary 2. If cost is measured in terms of steps, and expressed in terms of |C|, d, h, N, and N′, there is no schedule of GSH that has better worst-case asymptotic complexity than EL. 4.4
Bidirectional vs. Unidirectional Schedules
We conclude our analysis of GSH with a discussion of the advantages and disadvantages of schedules that use all four types of operators relative to those schedules that use only past-tense operators, or only future-tense operators. The proof of Theorem 7 suggests that trimming from both sides may take more steps than trimming from one side only, because if we work only from one side, we need at most h passes instead of 2h. Occasionally, though, working from both sides will speed up things, especially when there are no fair SCCs. (As noted in [6].) One reason is that a search in one direction can reduce the diameter of the graph, which helps the search in the other direction. The following example illustrates this point. Example 2. Consider the family of graphs exemplified in Fig. 5. The arcs out of the x nodes are all to the direct neighbor to the right, while the remaining nodes form a complete acyclic subgraph. In the example graph, d = 12, |C| = 3. OSH first applies an EH at a cost of d/2; it then applies d/2 EUs. Each EU costs 2 steps; so, the total cost is Θ(d) steps for EUs and O(d) steps total. Any purely future-tense or purely past-tense algorithm needs to apply d/2 EUs, too, but this time every one costs Ω(d) steps, giving a quadratic behavior. Note that an EG does not help.
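Making the step counts of Example 2 explicit (nothing here beyond the numbers already stated in the example):

\[
  \text{bidirectional (OSH):}\quad
  \underbrace{\tfrac{d}{2}}_{\text{one EH}}
  + \underbrace{\tfrac{d}{2}\cdot 2}_{d/2\ \text{EUs, 2 steps each}}
  = \Theta(d),
  \qquad
  \text{unidirectional:}\quad
  \tfrac{d}{2}\cdot \Omega(d) = \Omega(d^2).
\]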
Fig. 5. Graph illustrating the possible advantages of bidirectional schedules
The preceding example shows that some bidirectional schedules may outperform all unidirectional ones. Obviously, there are even more cases in which bidirectional schedules outperform schedules in one direction, but not in the other.
5
Early Termination in Lockstep
Lockstep is a symbolic cycle detection algorithm based on SCC enumeration [1]. Given a seed v, the SCC containing v is computed as the intersection of the set β(v) of states that can reach v nontrivially and the set φ(v) of states that are nontrivially reachable from v. If there are short cycles involving the seed state v, the intersection of β(v) and φ(v) may be non-empty well before the two sets have been computed in their entirety. This suggests an early termination criterion for the algorithm. Theorem 10. Let F (v) and B(v) be the subsets of φ(v) and β(v) computed by Lockstep at some iteration. Let I(v) = F (v) ∩ B(v) and U (v) = F (v) ∪ B(v). If I(v) has a non-null intersection with all fair sets, then U (v) contains a fair cycle. Proof. Every state in B(v) has a non-trivial path to v entirely contained in B(v). Every state in F (v) has a non-trivial path from v entirely contained in F (v). Hence, every state in I(v) has non-trivial paths to and from v entirely contained in U (v). Therefore, every state in I(v) is connected to every other state of I(v) by a non-trivial path entirely in U (v). Since I(v) contains representatives from all fair sets, one can trace a fair cycle in U (v).
Once a fair set intersects I(v), it will continue to intersect it in all successive iterations of lockstep. Furthermore, at each iteration, one can stop testing intersections as soon as one fair set is found that does not intersect I(v). Hence, if there are |C| fair sets and convergence requires s steps, the number of intersection checks is O(|C| + s). The overhead for this early termination check is O(s) intersections (for I(v)) and O(s) intersection checks, because the original Lockstep performs O(|C|) intersection checks on the maximal SCC. Early termination imposes a simple change to counterexample generation: The fair sets are intersected with I(v). This intersection guarantees that the path connecting the fair sets can always be closed.
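The check itself is easy to state operationally. The following is a minimal explicit-state sketch of just this early termination test, not of the full Lockstep algorithm of [1] (which balances the two searches more carefully and recurs on the remaining SCC-closed sets); sets are plain Python sets and all names are ours.

def lockstep_early_termination(edges, seed, fairness):
    """Grow F(v) and B(v) from the seed one step at a time; stop as soon as
    I(v) = F(v) & B(v) meets every fair set (Theorem 10)."""
    succ = lambda z: {y for (x, y) in edges if x in z}
    pred = lambda z: {x for (x, y) in edges if y in z}

    f_v, b_v = set(), set()                # nontrivially reachable from / to v
    f_front, b_front = {seed}, {seed}
    while f_front or b_front:
        f_front = succ(f_front) - f_v      # one forward step
        b_front = pred(b_front) - b_v      # one backward step
        f_v |= f_front
        b_v |= b_front
        i_v = f_v & b_v
        if i_v and all(i_v & set(c) for c in fairness):
            return f_v | b_v               # U(v) contains a fair cycle
    return None                            # the seed's SCC is not fair; the
                                           # full algorithm recurs on the rest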
6
Experiments
In this section we present preliminary results obtained with implementations of GSH and Lockstep in VIS 1.4 [2]. The CPU times were measured on an IBM IntelliStation running Linux with a 1.7 GHz Pentium 4 CPU and 1 GB of RAM. The experiments involved three types of schedules for GSH: EL, EL2, and a random schedule, which applies about the same fraction of EXs as EL. We experimented with different levels in the use of don't care conditions in the computation of the fixpoints. Our experiments involved both language emptiness checks, and CTL model checking problems. For the language emptiness experiments, we considered both future-tense and past-tense schedules, and we also ran the enhanced version of Lockstep described in Section 5. All experiments included reachability analysis and used fixed BDD variable orders to minimize noise.

We present three summary tables, for the three classes of experiments. The parameters we vary are the algorithm/schedule, the tense (future or past) for GSH schedules, and the degree of exploitation of don't care conditions in the EX and EY computations. For all GSH schedules, a “low DC level” means that the reachable states are used to simplify the transition relation, that the fixpoint computation for the fair states is started from Z = true, and that no frontiers are used in the EU and ES computations. A “medium DC level” means that in addition to simplifying the transition relation, frontiers are used. A “high DC level” would simplify the argument to each EX computation with respect to the reachable states. (This simplification is not possible for EY computations.) One such scheme that we tried did not produce significant improvements over “medium DC level.” We have not yet implemented the technique described in [14].

The columns of the tables have the following meaning. Total is the total time (in seconds) for all experiments. For timed-out experiments, the limit of 1800s is taken. Gmt is the geometric mean of the running time of all experiments. A win is a case in which a method is at least 2% and at least 0.1s faster than any other method. A tie is a case in which there was no win, and the method was either fastest or less than 2% slower than the fastest method. T/O is the number of experiments that timed out. Steps is the total number of steps (EXs and EYs) performed, and gms is the geometric mean of the number of steps. All experiments for which at least one method took 0s are excluded from the computation of geometric mean time, wins, and ties.

Table 3 compares different GSH schedules for CTL model checking experiments; Table 4 compares GSH schedules and Lockstep on language emptiness problems; and Table 5 shows the results of LTL and CTL model checking for families of parameterized models. The properties used for these experiments required cycle detection for both CTL and LTL model checking. The data in the tables supports the following remarks.
– No type of schedule dominates the others, even though on individual models there are sometimes large differences. On average, EL-type schedules are the fastest for the parameterized models, while EL2 is the best for the non-parameterized ones.
– The complexity bounds of Section 4 are in terms of number of steps. While within a homogeneous group of experiments (same tense and DC level) the schedule performing fewer steps is often the fastest, it is obvious that the cost of a step is not
Table 3. Summary of CTL model checking experiments for 39 models

schedule  DC level  total   gmt   wins  ties  T/O   steps   gms
EL        low       10307   7.7     1    20    4    78734   139
EL2       low        8002   6.3     0    23    3    31153   126
random    low       10028   8.5     2    20    4    64420   148
EL        medium     9606   7.5     0    14    5    78622   137
EL2       medium     9128   6.7     1    17    4    31013   123
random    medium    11284   9.0     1    16    6    64059   147
Table 4. Summary of language emptiness experiments for 59 models

schedule  tense   DC level  total   gmt   wins  ties  T/O   steps    gms
EL        future  low       15389  23.1     0    13    7    37252     64
EL2       future  low       10273  16.4     0    13    4    15182     57
random    future  low       15167  22.7     0    13    6    31500     63
EL        future  medium    12367  19.6     1    13    4    38412     68
EL2       future  medium     8342  14.8     1    13    1    16033     59
random    future  medium    11875  18.9     1    14    2    32840     67
EL        past    low       11440  18.3     0    13    4   118711    101
EL2       past    low        5999   9.3     0    17    2    19894     71
random    past    low       10666  17.4     0    13    4   105842     98
EL        past    medium     6759  14.2     1    14    0   153188    105
EL2       past    medium     3362   8.4     3    16    0    20692     72
random    past    medium     5771  13.4     0    15    0   116904    101
lockstep  both    medium    10613  22.2     7    12    3   140673     86
constant. Figure 6, for instance, shows the relation between number of steps and CPU time for the EL schedule with low DC level applied to the parameterized families of models. It is readily seen that for most families, the computation time grows much faster than the number of steps. Also, the past-tense schedules of Table 5 perform many more steps than the corresponding future-tense schedules. However, the majority of them are due to just one model, and are very cheap.
– In our experiments, the tense did not affect in a significant way the comparison between different types of schedules (e.g., EL vs. EL2).
– Past-tense schedules usually did better than future-tense schedules.4 However, the advantage of past-tense schedules may depend on several factors. These include different quantification schedules for EX and EY, different diameters for a graph and its reverse, the positions of the fair SCCs in the graphs, as well as various BDD-related factors like the fact that some fixed variable orders are saved at the end of reachability analysis runs with dynamic reordering, and the hard-to-predict effects of the BDD computed table. In addition, our current implementation applies the same don't care techniques for past and future schedules. All these reasons may
In Table 5, the best results are for CTL model checking, but it is not possible to compare those future-tense schedules to the others because the LTL models have more state variables.
Table 5. Summary of model checking experiments for 57 models from 11 parameterized families logic LTL LTL LTL LTL LTL LTL LTL LTL LTL LTL LTL LTL LTL CTL CTL CTL CTL CTL CTL
schedule EL EL2 random EL EL2 random EL EL2 random EL EL2 random lockstep EL EL2 random EL EL2 random
tense future future future future future future past past past past past past both future future future future future future
DC level low low low medium medium medium low low low medium medium medium medium low low low medium medium medium
total 7746 13419 12614 7503 14009 13896 8422 8573 7236 8587 8588 7908 26597 2699 10076 11525 2969 10928 11725
gmt wins ties T/O steps gms 6.8 0 10 3 606502 587 10.7 0 10 5 626093 1061 10.5 0 8 5 685794 1053 6.7 0 9 3 606400 587 10.5 0 8 7 595326 1007 10.3 0 8 7 656409 996 6.0 0 17 3 2873990 1220 5.9 0 18 4 2749041 1164 5.9 0 14 2 2865320 1237 6.2 0 17 4 2816871 1188 6.0 0 18 4 2714631 1139 6.0 0 15 2 2857436 1222 30.2 0 8 12 3362850 2603 2.8 7 26 1 396251 429 4.9 4 23 5 370797 720 4.9 0 23 6 483961 676 3.2 0 29 1 396218 428 5.5 0 27 5 342588 703 5.5 0 27 6 463425 667
[Scatter plot omitted: 53 out of 57 experiments, one series per model family (arbiter, bakery, drop, elev-1-f, elev-c-3, hrglass, lock, minmax, philo, tree-arb, vending); x-axis: EL steps, y-axis: EL time (s), both on logarithmic scales.]
Fig. 6. CPU time as a function of the number of steps
explain the differences between our results and those of [11] with regard to tenses. It should also be mentioned that future tense schedules may be applied also without preliminary reachability analysis. For past-tense schedules, one has then to prove reachability of the fair SCCs.
– For all the experiments that complete within 1800 s, the number of steps does not depend on the DC level. However, in case of timeout, the number of steps until the timeout is counted. This explains small differences in the numbers of steps between methods that differ only in the use of don't cares.
– Early termination for Lockstep is effective. The results with early termination are uniformly better than or equal to those without.5 Compared to GSH, Lockstep loses in most cases, but has the largest number of wins for both non-parameterized and parameterized language emptiness experiments. (That is, not counting the CTL experiments in Table 5.)
7
Conclusions
We have presented an improved Generic SCC Hull algorithm, and we have proved several bounds on the performance of classes of algorithms that can be cast as particular operator schedules for GSH. We have proved, in particular, that when complexity is measured in steps (EX and EY computations) and it is given as a function of the number of fairness constraints |C|, the diameter of the graph d, the height of the SCC quotient graph h, and the number of total (nontrivial) SCCs N (N′), then algorithm EL is optimal (Θ(|C|dh)) among those that can be simulated by GSH. Variants like EL2, on the other hand, are not optimal in that sense. (They are Θ(|C|dh + N − N′).) Of course, on a particular graph, EL2 may outperform EL for at least two reasons: On the one hand, the theoretical bounds are for worst-case performance. On the other hand, the cost of individual steps can vary widely. This implies that the theoretical analysis should be accompanied by an experimental evaluation. We have performed such an assessment, conducting experiments with several competing algorithms on a large set of designs. We have found that no GSH schedule dominates the others. Also, Lockstep is slower on average than GSH, but it produces the best results in quite a few cases. On individual experiments the ranges of CPU times for the various schedules may cover three orders of magnitude, which suggests that having more than one method at one's disposal may allow more model checking problems to be solved.
References
[1] R. Bloem, H. N. Gabow, and F. Somenzi. An algorithm for strongly connected component analysis in n log n symbolic steps. In W. A. Hunt, Jr. and S. D. Johnson, editors, Formal Methods in Computer Aided Design, pages 37–54. Springer-Verlag, November 2000. LNCS 1954.
[2] R. K. Brayton et al. VIS: A system for verification and synthesis. In T. Henzinger and R. Alur, editors, Eighth Conference on Computer Aided Verification (CAV'96), pages 428–432. Springer-Verlag, Rutgers University, 1996. LNCS 1102.
[3] E. M. Clarke, O. Grumberg, and D. A. Peled. Model Checking. MIT Press, Cambridge, MA, 1999.
We also found that the best performance is achieved by not trimming [11] the initial set of states (the reachable states). The results shown for Lockstep are for the algorithm that does not trim the initial set.
[4] E. A. Emerson and C.-L. Lei. Efficient model checking in fragments of the propositional mu-calculus. In Proceedings of the First Annual Symposium of Logic in Computer Science, pages 267–278, June 1986.
[5] K. Fisler, R. Fraer, G. Kamhi, M. Vardi, and Z. Yang. Is there a best symbolic cycle-detection algorithm? In T. Margaria and W. Yi, editors, Tools and Algorithms for the Construction and Analysis of Systems, pages 420–434. Springer-Verlag, April 2001. LNCS 2031.
[6] R. H. Hardin, R. P. Kurshan, S. K. Shukla, and M. Y. Vardi. A new heuristic for bad cycle detection using BDDs. In O. Grumberg, editor, Ninth Conference on Computer Aided Verification (CAV'97), pages 268–278. Springer-Verlag, Berlin, 1997. LNCS 1254.
[7] R. Hojati, H. Touati, R. P. Kurshan, and R. K. Brayton. Efficient ω-regular language containment. In Computer Aided Verification, pages 371–382, Montréal, Canada, June 1992.
[8] Y. Kesten, A. Pnueli, and L.-o. Raviv. Algorithmic verification of linear temporal logic specifications. In International Colloquium on Automata, Languages, and Programming (ICALP-98), pages 1–16, Berlin, 1998. Springer. LNCS 1443.
[9] R. P. Kurshan. Computer-Aided Verification of Coordinating Processes. Princeton University Press, Princeton, NJ, 1994.
[10] K. L. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, Boston, MA, 1994.
[11] K. Ravi, R. Bloem, and F. Somenzi. A comparative study of symbolic algorithms for the computation of fair cycles. In W. A. Hunt, Jr. and S. D. Johnson, editors, Formal Methods in Computer Aided Design, pages 143–160. Springer-Verlag, November 2000. LNCS 1954.
[12] F. Somenzi. Symbolic state exploration. Electronic Notes in Theoretical Computer Science, 23, 1999. http://www.elsevier.nl/locate/entcs/volume23.html.
[13] A. Tarski. A lattice-theoretic fixpoint theorem and its applications. Pacific Journal of Mathematics, 5:285–309, 1955.
[14] C. Wang, R. Bloem, G. D. Hachtel, K. Ravi, and F. Somenzi. Divide and compose: SCC refinement for language emptiness. In International Conference on Concurrency Theory (CONCUR01), pages 456–471, Berlin, August 2001. Springer-Verlag. LNCS 2154.
[15] A. Xie and P. A. Beerel. Implicit enumeration of strongly connected components. In Proceedings of the International Conference on Computer-Aided Design, pages 37–40, San Jose, CA, November 1999.
Sharp Disjunctive Decomposition for Language Emptiness Checking

Chao Wang and Gary D. Hachtel

Department of Electrical and Computer Engineering, University of Colorado at Boulder, CO, 80309-0425
{wangc,hachtel}@Colorado.EDU
Abstract. We propose a “Sharp” disjunctive decomposition approach for language emptiness checking which is specifically targeted at “Large” or “Difficult” problems. Based on the SCC (Strongly-Connected Component) quotient graph of the property automaton, our method partitions the entire state space so that each state subspace accepts a subset of the language, the union of which is exactly the language accepted by the original system. The decomposition is “sharp” in that this allows BDD operations on the concrete model to be restricted to small subspaces, and also in the sense that unfair and unreachable parts of the submodules and automaton can be pruned away. We also propose “sharp” guided search algorithms for the traversal of the state subspaces, with its guidance the approximate distance to the fair SCCs. We give experimental data which show that our algorithm outperforms previously published algorithms, especially for harder problems.
1
Introduction
Language emptiness checking on the fair Kripke structure is an essential problem in LTL [1,2] and fair-CTL [3] model checking, and in language-containment based verification [4]. Symbolic fair cycle detection algorithms – both the SCC-hull algorithms [5,6,7,8] and the SCC enumeration algorithms [9,10] – can be used to solve this problem. However, checking language emptiness in general is harder than checking invariants, since the latter is equivalent to reachability analysis and has a linear complexity. Due to the well-known state space explosion, checking language emptiness can be prohibitively more expensive and is still considered to be impractical on industrial-scale circuits. Symbolic fair cycle detection requires in the general case a more than linear complexity: O(n²) for SCC-hull algorithms and O(n log n) for SCC enumeration algorithms, where n is the number of states. For those cases where the automata are weak or terminal [11,12], special model checking algorithms usually outperform the general ones. This idea was further extended by [13], which combines compositional SCC analysis with specific decision procedures tailored to the cases of strong, weak, or terminal automata. It thus takes advantage of those strong automata with weak or terminal SCCs, and of those strong SCCs that turn into weak or terminal SCCs after the automata are composed with the model.
This work was supported in part by SRC contract 2001-TJ-920 and NSF grant CCR-99-71195.
In [13] SCC analysis is also used during the localization reduction to limit BDD attention to one fair SCC of the partially composed abstract model at a time. This permitted BDD restriction to a small state subspace during expensive operations that needed to be performed on the entire concrete model. Sometimes, by partitioning the sequential system into subsystems and inspecting each of these small pieces separately, the chance of solving the problem increases. In the context of reachability analysis, [14] proposed the machine decomposition algorithm: It partitions the sequential system using its latch connectivity graph, so each subsystem contains a subset of latches of the original system.

For language emptiness checking, we propose in this paper a new algorithm for state space decomposition, which is based on the notion of sharpness. Our algorithm partitions the original state space S into a collection of state subspaces Si, according to the SCC quotient graph structure of the amassed property automaton. A nice feature of these state subspaces is that each of them can be viewed as a separate fair Kripke structure. Further, if we use L(S) to represent the original language, and L(Si) to represent the language accepted within each state subspace, we have L(Si) ⊆ L(S) and ∪i L(Si) = L(S). This allows us to check language emptiness on each state subspace separately. Thus our decomposition is “sharp” in that the BDD operations on the concrete model are focused on very small state subspaces, and also in the sense that unfair and unreachable parts of the submodules and automaton can be pruned away.

We further propose a “sharp” forward (and backward) guided search algorithm for the traversal of the state subspaces, which uses the approximate distance to the fair SCCs to guide the search. At each breadth-first search step, we only compute a subset of the normal image with a smaller BDD size (sharp) and a closer distance to the potential fair SCC (guided). Whenever the reachable subset intersects a promising state – a state that is in the fair SCC-closed set (defined later) and satisfies some fairness constraints – we use that state as a seed for the fair SCC search. If a fair SCC can be found, we know that the language isn't empty; otherwise, we continue the forward search. If we cannot find any fair SCC when the forward search reaches a fixpoint, or the entire fair SCC-closed set has been explored, we know the language is empty. Note our new algorithm does not use the weak/terminal automata strength reduction techniques of [12].

On practical circuits, reachability analysis or even a single image computation can be prohibitively expensive. In fact, our research is directed specifically toward such larger problems. Thus it is to be expected that algorithms with less heuristic overhead might outperform our “sharp” algorithm for easily soluble problems. The experimental results show this, but they also show that when the LTL model checking problems become harder, our “sharp” algorithm outperforms both Emerson-Lei (the standard language emptiness checking algorithm in VIS [15]) and D'n'C [13].

The flow of the paper is as follows. In Section 2 we present the basic definitions. In Section 3 we present the state space decomposition theory. In Section 4 we describe the algorithm and analyze its complexity. The experimental results are given in Section 5, and we conclude and discuss potentially fruitful future work in Section 6.
2 Preliminaries
We combine the model M and the property automaton A¬ψ and represent the entire system as a labelled, generalized Büchi automaton¹ A = M ∗ A¬ψ.

Definition 1. A (labelled, generalized) Büchi automaton is a six-tuple A = ⟨S, S0, T, F, A, L⟩, where S is the finite set of states, S0 ⊆ S is the set of initial states, T ⊆ S × S is the transition relation, F ⊆ 2^S is the set of fairness conditions, A is the finite set of atomic propositions, and L : S → 2^A is the labelling function.

A run of A is an infinite sequence ρ = ρ0, ρ1, ... over S such that ρ0 ∈ S0 and, for all i ≥ 0, (ρi, ρi+1) ∈ T. A run ρ is accepting if, for each Fi ∈ F, there exists sj ∈ Fi that appears infinitely often in ρ. The automaton accepts an infinite word σ = σ0, σ1, ... in A^ω if there exists an accepting run ρ such that, for all i ≥ 0, σi ∈ L(ρi). The language of A, denoted by L(A), is the subset of A^ω accepted by A. The language of A is nonempty iff A contains a fair cycle: a cycle that is reachable from an initial state and intersects all the fair sets.

A Strongly Connected Component (SCC) C of an automaton A is a maximal set of nodes such that there is a directed path from any node in C to any other. A reachable SCC that intersects all fair sets is called a fair SCC. An SCC that intersects some initial states is called an initial SCC. Given an automaton A, the SCC (quotient) graph Q(A) is the result of contracting each SCC of A into one node, merging the parallel edges, and removing the self-loops.

Definition 2. The SCC (quotient) graph of the automaton A is a four-tuple Q(A) = ⟨S^C, S0^C, T^C, S_F^C⟩, where S^C is the finite set of SCCs, S0^C ⊆ S^C is the set of initial SCCs, T^C = {(C1, C2) | s1 ∈ C1, s2 ∈ C2, (s1, s2) ∈ T, and C1 ≠ C2} is the transition relation, and S_F^C ⊆ S^C is the set of fair SCCs.

The SCC graph forms a Directed Acyclic Graph (DAG), which induces a partial order: the minimal (maximal) SCCs have no incoming (outgoing) edges. In symbolic model checking, we assume that all automata are defined over the same state space and agree on the state labels, and communication proceeds through the common state space. The composition A1 ∗ A2 = ⟨S, S0, T, F, A, L⟩ of two Büchi automata A1 = ⟨S, S0^1, T1, F1, A, L⟩ and A2 = ⟨S, S0^2, T2, F2, A, L⟩ is defined by S0 = S0^1 ∩ S0^2, T = T1 ∩ T2, and F = F1 ∪ F2. Hence, composing two automata restricts the transition relation and results in the intersection of the two languages. We also want to define a quotient restriction operation.

Definition 3. The restriction of A = ⟨S, S0, T, F, A, L⟩ by a subset SCC graph Q⁻ = ⟨S^C, S0^C, T^C, S_F^C⟩ is defined as A ⇓ Q⁻ = ⟨S⁻, S0⁻, T⁻, F, A, L⟩, with S⁻ = {s | s ∈ C and C ∈ S^C}, S0⁻ = {s0 | s0 ∈ C0 and C0 ∈ S0^C}, and T⁻ = {(s1, s2) | s1, s2 ∈ C and C ∈ S^C and (s1, s2) ∈ T}.

¹ Note that when the context is clear we will just use ∗ to denote the composition operation between two FSMs. Similarly, consistent with BDD usage, we will sometimes use ∗ in place of × to refer to the Cartesian product of two sets, or the product/composition of two automata.
Obviously we have A ⇓ Q(A) = A. Note that unlike BDD restriction operations, the right argument is a segment of a quotient graph. Inside the definition, however, the automaton is actually operated upon by the sets of states implied by the quotient graph. An SCC-closed set of A is a subset V ⊆ S such that, for every SCC C in A, either C ⊆ V or C ∩ V = ∅. Note that if C is an SCC in A1 (or A2 ) , it is an SCC-closed set of the composition A1 ∗ A2 .
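To make Definitions 2 and 3 concrete, here is a minimal explicit-state sketch in Python of computing the SCC quotient graph and the restriction A ⇓ Q⁻. It is only an illustration: the paper's implementation is symbolic and BDD-based (using, for example, the O(n log n) symbolic SCC algorithm of [10]), and the data representation and helper names below are assumptions of this sketch.

```python
# Explicit-state sketch of Q(A) (Definition 2) and A ⇓ Q- (Definition 3).
# A symbolic implementation would operate on BDDs rather than enumerate states.

def sccs(states, edges):
    """Kosaraju's algorithm: return the SCCs of (states, edges) as frozensets."""
    succ = {s: [] for s in states}
    pred = {s: [] for s in states}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    order, seen = [], set()
    for root in states:                      # first pass: record finish order
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            node, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                order.append(node)
                stack.pop()
            elif nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, iter(succ[nxt])))

    comps, assigned = [], set()
    for s in reversed(order):                # second pass: DFS on the transpose
        if s in assigned:
            continue
        comp, stack = set(), [s]
        assigned.add(s)
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in pred[u]:
                if v not in assigned:
                    assigned.add(v)
                    stack.append(v)
        comps.append(frozenset(comp))
    return comps

def quotient_graph(edges, comps):
    """Contract each SCC to one node, dropping self-loops and parallel edges."""
    of = {s: c for c in comps for s in c}
    return {(of[u], of[v]) for u, v in edges if of[u] != of[v]}

def restrict(edges, kept_sccs):
    """A ⇓ Q-: keep the states of the kept SCCs and, following Definition 3,
    only the transitions whose endpoints lie inside one and the same kept SCC."""
    kept_states = set().union(*kept_sccs) if kept_sccs else set()
    kept_edges = [(u, v) for u, v in edges
                  if any(u in c and v in c for c in kept_sccs)]
    return kept_states, kept_edges
```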
3 State Space Decomposition – Theory
The automaton A contains an accepting cycle iff its SCC graph Q(A) contains a fair SCC.

Definition 4. An SCC graph Q(A) is "pruned" if all the minimal nodes are initial, all the maximal nodes are fair, and all the other nodes are on paths from initial nodes to fair nodes.

Pruning (defined as removing nodes that are not in the pruned SCC graph Q(A)) does not change the language of the corresponding automaton A. In the following, we assume that all SCC graphs are pruned.²

The entire state space of A can be decomposed into state subspaces according to the structure of Q(A). For brevity, we do not give proofs of the following theorems, since they are obvious.

Definition 5. For each fair SCC Ci in Q(A), we construct an SCC subgraph Q^F_i by marking all the other SCCs "non-fair" and then pruning Q(A).

Theorem 1. The language accepted by each state subspace A ⇓ Q^F_i and the language accepted by A satisfy the following relations:
    L(A ⇓ Q^F_i) ⊆ L(A)   and   ∪_i L(A ⇓ Q^F_i) = L(A).

Note that in each SCC subgraph Q^F_i, the (only) maximal node is fair.

Definition 6. In the SCC subgraph Q^F_i, each "initial-fair" path constitutes an SCC subgraph Q^L_ij.

Theorem 2. The language accepted by each state subspace A ⇓ Q^L_ij satisfies the following relations:
    L(A ⇓ Q^L_ij) ⊆ L(A ⇓ Q^F_i)   and   ∪_j L(A ⇓ Q^L_ij) = L(A ⇓ Q^F_i).
Note that in the pruned SCC graph, all the maximal nodes are fair. However, the fair SCCs are not always maximal – they might be on the path from initial to other maximal fair SCCs.
Thus, checking language emptiness of the original automaton A can be done on each individual subgraph A ⇓ Q^L_ij separately.

Theorem 3. L(A) = ∅ iff L(A ⇓ Q^L_ij) = ∅ for every SCC subgraph Q^L_ij.

In order to clarify the distinction between Cartesian product and composition operations in the sequel (see Methods (b) and (c) in Section 4.3), we also include the following proposition.

Proposition 1. Let {C^1_i} be the SCCs of A1 and {C^2_j} be the SCCs of A2. Then the SCCs {C_ij} of the composition A1 ∗ A2 satisfy: there exist k, l such that (1) C_ij ⊆ C^1_k × C^2_l, and (2) C_ij ∗ (C^1_{k′} × C^2_{l′}) = ∅ for all (k′, l′) ≠ (k, l), with equality in (1) holding only when the edges inside C^1_i and C^2_j either:
1. have no labels; or
2. have labels whose supports are disjoint from each other; or
3. have mutually consistent labels (meaning nonempty conjunction).

Note that although the first two conditions for equality are subsumed by the third, they provide cheap tests which might be used to avoid the expensive composition operation in some cases.
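Before turning to the algorithm, note that the decomposition of Theorems 1–3 is easy to picture on an explicit quotient graph: for each fair SCC one keeps only the nodes on initial-to-fair paths (Q^F_i), and each such path is one Q^L_ij. The sketch below is purely illustrative (the paper works symbolically), and the graph encoding and function names are assumptions of the sketch.

```python
# Sketch: enumerate Q^F_i and the Q^L_ij "initial-fair" paths of a pruned
# SCC quotient graph, given as a set of edges over hashable SCC identifiers.

def qf_subgraph(qedges, initial_sccs, fair_scc):
    """Nodes/edges of Q^F_i: everything on a path from an initial SCC to fair_scc."""
    succ, pred = {}, {}
    for u, v in qedges:
        succ.setdefault(u, set()).add(v)
        pred.setdefault(v, set()).add(u)

    def reachable(starts, nbrs):
        seen, todo = set(starts), list(starts)
        while todo:
            u = todo.pop()
            for v in nbrs.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    todo.append(v)
        return seen

    nodes = reachable(initial_sccs, succ) & reachable({fair_scc}, pred)
    return nodes, {(u, v) for u, v in qedges if u in nodes and v in nodes}

def hyperlines(qedges, initial_sccs, fair_scc):
    """All Q^L_ij for this fair SCC: the initial-to-fair paths in Q^F_i.
    The quotient graph is a DAG, so plain DFS path enumeration terminates."""
    nodes, edges = qf_subgraph(qedges, initial_sccs, fair_scc)
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    paths = []

    def dfs(u, path):
        if u == fair_scc:
            paths.append(tuple(path))
            return
        for v in succ.get(u, ()):
            dfs(v, path + [v])

    for s in set(initial_sccs) & nodes:
        dfs(s, [s])
    return paths
```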
4 The Algorithm

4.1 The Overall Algorithm

In this algorithm, we combine the idea of "sharp" guided search (explained in Section 4.4) with the "disjunctive" decomposition (explained in Section 3). The pseudo code of the overall algorithm is given in Figure 1. check-language-emptiness is the main procedure; it accepts three parameters: the concrete system A, the property automaton A¬ψ, and the list of (circuit) model submodules M = {M1, M2, ..., Mm}. The algorithm goes through the following phases:
1. The amassing phase: The property automaton A¬ψ is composed with submodules from {Mi}, one at a time, and its SCC graph QA+ is built at each step. This phase continues until either QA+ becomes an empty graph or the amassing threshold is reached. We explain the amassing phase in detail in Section 4.2.
2. The decompose and pre-jump phase: Each fair SCC in A+ is pre-processed by intersecting it with the remaining submodules in the list {Mi}. The details of this pre-jump process are explained in Section 4.3. By building the QF/QL subgraphs, QA+ is decomposed into a collection of SCC subgraphs QF/QL.
3. The jump phase: Now we "jump" to the concrete system A, with a collection of SCC subgraphs QL. Language emptiness is checked on each individual state subspace A ⇓ QL. The "sharp" guided search idea is implemented in sharpsearch-and-lockstep, together with the LockStep search, with focus on the goal of "early termination". This is described in detail in Section 4.4.
// entire system, property, submodules
check-language-emptiness(A, A¬ψ, {Mi}) {
    Reach := compute-initial-states(A)
    A+ := A¬ψ
    // amassing phase
    while (amassing threshold not reached) do
        Mi := pick-next-submodule(A+, {Mi})
        A+ := A+ ∗ Mi
        QA+ := build-sccgraph(A+)
        if QA+ is an empty graph then return true fi
    od
    for each fair SCC C ∈ QA+ do                 // decompose and pre-jump
        QF := build-qf-subgraph(QA+, C)
        Queue := {C}
        for each remaining submodule Mi do
            Queue := refine-sccs(A¬ψ ∗ Mi, Queue)
        od
        for each dfs path pj in QF do
            QL := build-ql-subgraph(QF, pj)
            if (sharpsearch-and-lockstep(A, QL, C, Reach, Queue) = false) then
                return false
            fi
        od
    od
    return true
}

// model, hyper-line, fair scc, reachable, and scc queue
sharpsearch-and-lockstep(A, QL, C, Reach, Queue) {
    Front := Reach
    absRings := compute-reachable-onionrings(A+ ⇓ QL)      // (see Definition 3)
FS: while (Front ≠ ∅) and (Front ∩ Queue = ∅) do
        Front := img#(A ⇓ QL, Front, absRings) \ Reach
        if (Front = ∅) then Front := img(A ⇓ QL, Reach) \ Reach fi
        Reach := Reach ∪ Front
    od
    if (Front = ∅) then return true
    else if (lockstep-with-earlytermination(A ⇓ C, Queue, absRings)) then return false
    else goto FS fi
}

Fig. 1. The Overall Algorithm for Checking Language Emptiness
4.2 Amassing and Decomposition

Amassing the Property Automaton. The property automaton A¬ψ is usually small and its SCC graph shows limited structure/sparsity. In order to get a finer decomposition, we need to augment A¬ψ with a small portion of the submodules of M = ∪_i Mi. At the very beginning, A+ = A¬ψ. As we pick up the Mi and gradually add them to A+, we are able to see the structural interaction between the property automaton and the model. As a consequence, the SCCs in A+ gradually get fractured and the SCC graph becomes larger and shows more structure/sparsity. We call this augmentation process "amassing the automaton".

The order in which the remaining submodules Mi are brought in is critical, as is the way in which the original model was partitioned to form the submodules Mi in the first place. Since our "sharpness" goal is to fracture the SCC graph Q(A+) and make it show more structure/sparsity, we use the following criteria:
1. Cone-Of-Influence (localization) reduction: Only state variables that are in the transitive fan-ins of A¬ψ are considered. These state variables are grouped into clusters {Mi} so that the interaction between clusters is minimized [14]. For each Mi, we compute the SCC graph Q(Ai), with Ai = A¬ψ ∗ Mi.
2. When we augment A+, we give priority to clusters that are both in the immediate fan-ins of A+ and have the most complex SCC graph Q(Ai) relative to the other candidates.
3. We repeat the previous step until either all the Mi are added, or the amassing phase reaches a certain threshold.

At each amassing step, the current A+ is a refinement of the previous A+ (the SCC graph Q(A+) is also a refinement of its previous counterpart). This means that we can build the SCC graph incrementally, as opposed to building it from scratch each time. We use lockstep to refine each SCC in the previous Q(A+), and then update the edges. Also, the SCCs that are in the previous SCC graph but have now become redundant (not in the pruned graph) are removed. If at any time Q(A+) becomes empty, we can stop, knowing that the language is empty.

In order to avoid an excessive partitioning cost, with a consequent exponential number of subgraphs, we have heuristic control on the activation of SCC refinement (a schematic of this control loop is sketched after this list):
1. If the size of an SCC in the previous Q(A+) is below a certain threshold, and it is not fair, skip refining it.
2. If the total number of edges in Q(A+), e, exceeds a certain threshold, stop the amassing.
3. If the total number of fair SCCs in Q(A+), f, exceeds a certain threshold, stop the amassing.

After the amassing phase, the SCC graph Q(A+) is available. SCC subgraphs Q^F_i and Q^L_ij will be built as discussed in Section 3. Since each Q^L_ij corresponds to a depth-first search path in the SCC graph Q(A+), we also call them hyperlines in the sequel. In fact, each hyperline is an envelope of Abstract Counter-Examples (ACE). The total number of SCC subgraphs is bounded by the size of Q(A+).

Theorem 4. For an SCC graph with f fair SCCs and e edges, the total number of Q^F_i SCC subgraphs is f; the total number of Q^L_ij SCC subgraphs is O(fe).

Let us denote the total number of states in A+ by ηk. Without the control by the amassing threshold, in the worst case e = O(ηk²) and f = O(ηk). However, in our method, the amassing threshold bounds both f and e to constant values.
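The control structure of the amassing phase, with its threshold-based stopping criteria, can be summarized by the following sketch. It is only a schematic: the helper functions (compose, build_scc_graph, score) and the threshold values are placeholders for the heuristics described above, not the actual interface of the implementation.

```python
MAX_EDGES, MAX_FAIR = 64, 8      # illustrative thresholds on e and f

def amass(a_prop, submodules, compose, build_scc_graph, score):
    """Sketch of the amassing loop.

    compose(a, m)      -> the composed automaton a * m
    build_scc_graph(a) -> (fair_sccs, edges) of the pruned SCC quotient graph
    score(a_plus, m)   -> heuristic priority of submodule m (immediate fan-in
                          overlap and complexity of Q(A_prop * m))
    """
    a_plus = a_prop
    remaining = list(submodules)
    fair, edges = build_scc_graph(a_plus)
    while remaining and fair and len(edges) <= MAX_EDGES and len(fair) <= MAX_FAIR:
        m = max(remaining, key=lambda mi: score(a_plus, mi))
        remaining.remove(m)
        a_plus = compose(a_plus, m)
        # The paper refines the previous quotient graph incrementally with
        # LockStep; rebuilding from scratch keeps this sketch short.
        fair, edges = build_scc_graph(a_plus)
    # An empty `fair` here means the pruned graph is empty: the language is empty.
    return a_plus, fair, edges, remaining
```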
4.3 The Jump Phase

In the jump phase we determine if any of the abstract counterexamples in the current hyperlines contains a concrete counterexample. We are currently using an intermediate iterative state-subspace restriction process that can be inserted before the "jump", which is related to the work of [14] and [16]. Assume that the submodules of the model (M = ∪_i Mi) have each been intersected with the property automaton, creating a series of Ai, where i = 1, 2, ..., m.³ After the amassing phase, we have: (1) the amassed automaton A+ = A1 ∗ A2 ∗ ... ∗ Ak−1, (2) the remaining automata Ak, Ak+1, ..., Am, and (3) the list of SCC-closed sets in A+, which we shall call L+. At this point, LockStep can be used to partition and refine each SCC-closed set of L+ into one or more SCCs according to the Transition Relations (TR) of the Ai.

We briefly discuss four different approaches to the "jump" phase. The first is the one used for a similar purpose in D'n'C, while the last three are part of our new algorithm. In the last three approaches, the last step, called the jump step, is the same: it is computed on the entire concrete system, subject to a computed state subspace restriction. Only the state subspace restriction varies from method to method.

First, in the D'n'C approach⁴ [13], which we shall call Method (a),

    L = EL(∗_i Ai, ∪_{C∈L+} C)

Here EL stands for the Emerson-Lei algorithm, and ∪_{C∈L+} C is the union of all the fair SCC-closed sets of A+. ∗_i Ai is the concrete system. Its main feature is that fair cycle detection is restricted to each state subspace C ∈ L+. In [13], the advantageous experimental results were attributed mainly to this restriction and the automata strength reduction [12].

The second approach, which is the one currently in our implementation, can be called the "Cartesian product" approach, and will be referred to as Method (b). It is based directly on Proposition 1, and can be characterized as follows.⁵

    Compute Jump State Space Restriction:
        Lk   = LockStep(Ak, L+)
        Lk+1 = LockStep(Ak+1, Lk)
        Lk+2 = LockStep(Ak+2, Lk+1)
        ...
        Lm   = LockStep(Am, Lm−1)
    Jump in Restricted State Space:
        L = LockStep(∗_i Ai, Lm)
³ Ai = A¬ψ ∗ Mi, and A = ∗_i Ai.
⁴ We acknowledge here that these methods fall partially under the purview of the "Policy" discussed in [13], in terms of the lattice of over-approximations which derive from composing arbitrary subsets of all the submodules of the concrete system A+. However, the Cartesian Product Approach (Method (b) below) has an element (see Proposition 1) distinct from the topic of which approximations to use, and that element appears in (c) and (d) below as well.
⁵ In the pseudo code, this is described by the function refine-sccs.
A direct analogy can be observed between Method (b) and the MBM (Machine by Machine) approach of [14]. For each SCC-closed set C in list Lk, LockStep will further partition it into a collection of SCCs according to the TR of Ak+1. The submachines still remaining to be composed in testing the ACE are treated "one machine at a time". Note that the quotient graph of machine Ak = A¬ψ ∗ Mk has been computed a priori, and the searches inside LockStep are restricted to the state subspace C × C′, where C′ is a specific SCC of Ak, and × represents the Cartesian product. The product C × C′ is a smaller set than C, because the product operation further refines the partition block C. Further, this process can fracture the closed sets, since C and C′ are sometimes disjoint sets. Thus the closed sets in Lk can be smaller than those in L+. Similarly, those in Lk+1 are smaller still, and so on. Thus, as the machine that LockStep operates on becomes progressively more concrete, the size of the considered state space becomes progressively smaller. The state restriction subspaces of Method (b) are thus generally much smaller than those in Method (a).

To illustrate this effect, consider a simple example, in which the original property automaton A¬ψ has two fair SCCs (scc1, scc2), and the pre-jump amassed automaton A+ also has two fair SCCs (SC1, SC2). We assume scc1 ⊇ SC1 and scc1 ∩ SC2 = ∅. Suppose there are two submodules Ma, Mb yet to be composed in the jump phase, and that Ma has a single fair SCC Ca while Mb also has a single fair SCC Cb. Summarizing, we have

    Module             Fair SCCs
    A¬ψ                scc1, scc2
    A+                 SC1, SC2
    Aa = A¬ψ ∗ Ma      C1, C2
    Ab = A¬ψ ∗ Mb      C3, C4
    Ma                 Ca
    Mb                 Cb
After the composition A¬ψ ∗ Ma, Ca is decomposed into two SCCs (C1, C2). In this case, it is obvious that C1 ⊆ (scc1 × Ca) and C2 ⊆ (scc2 × Ca). The same thing happens to the composition of Mb: its only fair SCC Cb is decomposed into (C3, C4), and the following holds: C3 ⊆ (scc1 × Cb), C4 ⊆ (scc2 × Cb). We take the two Cartesian products to yield

    LockStep(Ma, {SC1, SC2})            = {SC1 × Ca, SC2 × Ca}
    LockStep(Mb, {SC1 × Ca, SC2 × Ca})  = {SC1 × Ca × Cb, SC2 × Ca × Cb}

Notice that

    SC1 × Ca × Cb ⊇ SC1 × (C1 + C2) × (C3 + C4)
    SC2 × Ca × Cb ⊇ SC2 × (C1 + C2) × (C3 + C4)

To summarize Method (b):

    L+ = {SC1, SC2}                                          (amassing)
    La = LockStep(Aa, L+) = {SC1 × C1, SC2 × C2}             (refining Aa)
    Lb = LockStep(Ab, La) = {SC1 × C1 × C3, SC2 × C2 × C4}   (refining Ab)
    L  = LockStep(∗_i Ai, Lb)                                (Jump in Restricted State Space)
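The set arithmetic of this example can be replayed with ordinary Cartesian products. The following snippet is purely illustrative (all concrete state names are made up, and real state sets are BDDs rather than Python sets):

```python
from itertools import product

# Stand-ins for the SCCs of the example; the element names are invented.
SC1 = {"p0"}            # fair SCC of A+ contained in scc1
SC2 = {"p2"}            # fair SCC of A+ contained in scc2
Ca  = {"a0", "a1"}      # the single fair SCC of submodule Ma
Cb  = {"b0"}            # the single fair SCC of submodule Mb

def cart(*sets):
    """Cartesian product of state sets, as a set of tuples."""
    return set(product(*sets))

# Mirrors LockStep(Ma, {SC1, SC2}) and LockStep(Mb, ...) from the text above.
La = [cart(SC1, Ca), cart(SC2, Ca)]
Lb = [cart(SC1, Ca, Cb), cart(SC2, Ca, Cb)]

# Each block of Lb is a restriction subspace handed to the jump step.
print([len(block) for block in Lb])      # [2, 2]
```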
Note that since SC1 ⊆ scc1 and SC2 ⊆ scc2, Method (b) gives a smaller restriction subspace than Method (a) as used in D'n'C.

The third approach, Method (c), can be called the "one-step composition" approach, and can be characterized briefly as follows.

    Lk   = LockStep(A+ ∗ Ak, L+)
    Lk+1 = LockStep(A+ ∗ Ak+1, Lk)
    ...
    Lm   = LockStep(A+ ∗ Am, Lm−1)
    L    = LockStep(∗_i Ai, Lm)
Whereas Method (b) did no composition prior to making the full jump, Method (c) invests more heavily in sharpness by composing A+ with each of the remaining submodules. At each step we use the refined SCC-closed sets computed in the previous step. This is certainly more work than Method (b), but it produces still "sharper" (that is, smaller) restriction subspaces, due to the SCC-fracturing process inherent in composition. In comparing Methods (b) and (c), the reader should pay attention to Proposition 1. Working with unlabelled graphs might give the impression that Methods (b) and (c) give identical results. Note that for an edge to exist in the STG of the composition, it must exist in both of the machines being composed. Thus, whereas Method (b) never "fractured" any individual SCCs, Method (c) does, ultimately leading to much smaller restriction subspaces in the jump (that is, the last) step.

The fourth approach, Method (d), can be called the "full iterative composition" approach, and can be characterized briefly as follows.

    Lk   = LockStep(A+ ∗ Ak, L+)
    Lk+1 = LockStep((A+ ∗ Ak) ∗ Ak+1, Lk)
    Lk+2 = LockStep(((A+ ∗ Ak) ∗ Ak+1) ∗ Ak+2, Lk+1)
    ...
    L    = LockStep(∗_i Ai, Lm−1)
Note that in the calls to LockStep, the next of the remaining uncomposed submachines is composed with the result of the previous composition. At each step, computation is restricted to an SCC-closed set computed in the previous step. This composition process maximally fractures the SCC-closed sets. Each step is thus done on a maximally reduced restriction subspace, due to the restriction to the state subspace of an SCC computed in the previous step. Further, the SCCs of Lk+1 are generally smaller than those in Lk. Thus, as the machine LockStep operates on becomes progressively more concrete, the size of the considered state space becomes progressively smaller. Method (d) is offered to complete the spectrum of available sharpness options. It has not yet been implemented.

The principle at work in Methods (a)-(d) is to use the maximum affordable sharpness with each composition step. Method (a) represents the least investment in sharpness, and therefore suffers the least amount of overhead. However, it performs the most expensive
step (the jump step) on the largest restriction subspace. Similarly, Method (d) is sharpest at the jump step, but incurs the greatest overhead. Roughly speaking, we expect that

    CPUTIME(a) ≪ CPUTIME(b) ≪ CPUTIME(c) ≪ CPUTIME(d)

However, in the experimental results section we show that the largest computations are only possible with maximum affordable sharpness. The larger investment is clearly justified when the cheaper approach fails anyway.

4.4 Sharp Search and Fair Cycle Detection
Now that we have "jumped" to the concrete system, language emptiness needs to be checked on each individual state subspace (A ⇓ Q^L_ij). Fortunately, the subspaces are smaller than the entire state space; thus, both forward traversal and fair cycle detection are easier. Since fair cycle detection is generally harder than forward traversal, and it does not make sense to search unreachable areas for a fair cycle, we want to do the forward search first, and only start fair cycle detection when the forward search hits a promising state. A promising state is defined as a state that is both in the SCC-closed set of A+ and intersects some fair sets. These promising states are also prioritized: those that intersect more fair sets get higher priority.

Sharp Search. We notice that not all the hyperlines are as "sharp" as expected. This is because the SCC size varies, and sometimes a big SCC stays in the hyperline. In this case, we need to sharpen it further. The "sharp" guided search algorithm is proposed to address this issue. Instead of using the normal image computation in the forward search, at each step we use its "sharp" counterpart, img#. The pseudo code of img# is given in Figure 2. First, a subset of the "from" set is computed heuristically (it could be a minterm, a cube, or an arbitrary subset with a small BDD size), and the states in the subset are selected in such a way that those with a shorter approximate distance to the fair SCC are favored. img# is fast even on the concrete system, and it heuristically targets the fair SCC. In other words, it is able to hit a fair SCC by visiting only a portion of the states in the stem (the states between the initial states and the fair SCCs).

img#(A, From, absRings) {        // Model, from set, and abstract onionRings
    i := length(absRings)
    while (From ∩ absRings[i] = ∅) do
        i−−
    od
    From# := bdd-subsetting(From ∩ absRings[i])
    return img(A, From#)
}

Fig. 2. The "sharp" image computation algorithm
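For intuition, the following is an explicit-state analogue of img# (the actual algorithm operates on BDDs and uses bdd-subsetting). Here abs_rings[i] is the set of states at depth i in the reachability onion rings of the abstract model A+ ⇓ QL, so picking the deepest nonempty ring favours states closer to the fair SCC; truncating to a few states stands in for BDD subsetting. State labels are assumed hashable and orderable.

```python
def img(succ, states):
    """Ordinary image: all successors of `states` under the relation `succ`."""
    return {v for u in states for v in succ.get(u, ())}

def img_sharp(succ, from_set, abs_rings, max_states=4):
    """Explicit-state analogue of img# from Figure 2."""
    # pick the deepest onion ring that intersects the frontier ...
    i = len(abs_rings) - 1
    while i > 0 and not (from_set & abs_rings[i]):
        i -= 1
    # ... and keep only a small, cheap-to-represent subset of it
    from_sharp = set(sorted(from_set & abs_rings[i])[:max_states])
    return img(succ, from_sharp)
```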
Since img# computes only a subset of the image each time, a dead-end might be reached before the forward search reaches the fixpoint. Whenever this happens, we need to backtrack and use the normal img to recover (the algorithm is described in Figure 1). If there exist fair cycles, the sharp guided search algorithm might find one by exploring only part of the reachable states and going directly to its target, the fair SCC. However, if there is no fair cycle, all the reachable states or the entire fair SCC-closed set (whichever finishes first) must be explored. In the worst case, the sharp search will have to be executed on every hyperline. It is possible that some areas (states) are shared by more than one hyperline. The variable Reach (Figure 1) is used to avoid computing them more than once. Given ηR as the total number of reachable states on each state subspace A ⇓ Q^L_ij, and fe as the total number of hyperlines (or Q^L SCC subgraphs), the cost of sharp search on all the subgraphs is O(ηR + fe).

Prioritized Lockstep with Early Termination. LockStep with early termination is used together with sharp guided search to find a fair cycle on each A ⇓ Q^L_ij. All the SCC-closed sets are put into a priority queue, and they are prioritized according to their approximate distances to the initial states. (These distances are computed on the abstract model A+.) The recursion in LockStep is also implemented using the priority queue [17]. LockStep is started as soon as the sharp forward search hits some promising states. At this time, one promising state (with higher priority) is selected as a seed. This guarantees that every fair SCC found is in the reachable area. The early termination is implemented such that, as soon as the cycle found so far intersects all the fair sets, we stop (as opposed to finding the entire fair SCC). Assuming that η is the total number of states of the concrete system A, the cost of fair cycle detection is clearly bounded by O(η log η).

4.5 Complexity
Here A+_1, A+_2, ..., A+_k are used to represent the series of A+ in the amassing phase. Assume A contains r state variables; the total number of states is then η = O(2^r). Since all the abstract models and the concrete system are defined over the same state space and agree on the state labels, each A+_i has η states. However, on each A+_i the entire state space can be partitioned into ηi parts, and the states inside each part are "indistinguishable". We define ηi as the effective number of states of the abstract model. If A+_i contains ti ≤ r state variables, we have ηi = O(2^{ti}). It is obvious that LockStep takes O(ηi log ηi) steps on each Ai.
Amassing and Decomposition. Building the SCC quotient graph for each A+_i takes O(ηi log ηi) symbolic steps. For any two consecutive abstract models A+_i and A+_{i+1}, A+_{i+1} has at least one more state variable. This gives us the following relation over their effective numbers of states:

    ηi+1 / ηi ≥ 2
Thus, since ηi ≤ ηk / 2^{k−i}, the total cost of the amassing phase is bounded by

    ηk log ηk (1 + 1/2 + 1/4 + ...) ≤ 2 ηk log ηk,

which is O(ηk log ηk). The same bound holds for the pre-jump process. During the decomposition phase, the total number of hyperlines is O(fe), given that the SCC quotient graph of A+_k has a total of f fair SCCs and e edges.

Sharp Search and Lockstep. In the worst case, sharp search traverses all the reachable states, plus at least one image computation on each state subspace; thus its cost is bounded by O(ηR + fe). Fair cycle detection on the concrete system is bounded by O(η log η) symbolic steps. Putting all of these together, we have the overall complexity

    O(ηk log ηk + fe + ηR + η log η + fe) = O(η log η + fe)

In our implementation, fe is bounded by a constant value (the amassing threshold), though leaving it uncontrolled would result in O(fe) = O(ηk³) in the worst case.
5 Experiments
We implemented our algorithm in VIS-1.4 (we call it LEC#), and compared its performance with both Emerson-Lei (the standard language emptiness checking command) and D'n'C on the circuits in [13] and the texas97 benchmark circuits. All experiments use a static variable ordering (obtained by the dynamic variable reordering command in VIS). The experiments of Tables 1 and 2 were run on a 400 MHz Pentium II with 1 GB of RAM, while those of Table 3 were run on a 1.7 GHz Pentium 4 with 1 GB of RAM. All machines run Linux, with the data size limit set to 750 MB.

Table 1 shows that with VIS dcLevel=2 (using the prior reachability analysis result as don't cares where possible), D'n'C consistently outperforms our new algorithm. To summarize the comparison of the new algorithm and D'n'C, we can denote by "CL" (Constant factor Lose) the case in which both algorithms complete, but D'n'C is faster. Similarly, we can denote by "CW" (Constant factor Win) the case in which both algorithms complete, but LEC# is faster. We also denote by "AL/AW" (Arbitrary factor Loss/Win) the case where D'n'C (respectively, the new algorithm) completes but the other does not. With this notation, a tally of Table 1 gives

    Cases   LEC# vs. D'n'C   LEC# vs. EL
    CL            15               6
    CW             3               6
    AL             0               0
    AW             0               6
We see that compared to D'n'C, LEC# has only 3 constant factor wins vs. 15 for D'n'C. However, when one looks at D'n'C's 15 CWs, only 4 were for problems needing more than 100 seconds to complete; the rest were easy problems. In contrast, on LEC#'s 3 CWs, D'n'C took 1337, 1683, and 233 seconds. Neither algorithm had an AW case.
We conclude that on harder problems LEC# is at least competitive even when advance reachability analysis is feasible. Making a similar tally for LEC# vs. EL, we see that LEC# ties in the constant factor competition with 6 CWs each, and has a convincing advantage in AWs: New 9, EL 0.

Tables 2 and 3 show that with VIS dcLevel=3 (using the approximate reachability analysis result as don't cares), LEC# consistently outperforms both D'n'C and EL on the circuits of [13] and the Texas-97 benchmark circuits. The difference here is that both D'n'C and EL depend strongly on full reachability⁶ to restrict the search spaces. The sharp searches of the new algorithm minimize this dependency. For the circuits of [13] the tallies are:

    Cases   LEC# vs. D'n'C   LEC# vs. EL
    CL             6               3
    CW             4               5
    AL             0               0
    AW             7               9
We see that compared to D'n'C, LEC# has only 4 constant factor wins vs. 6 for D'n'C. However, when one looks at D'n'C's 6 CWs, all were for problems needing less than 100 seconds to complete; that is, the easy problems. In contrast, on LEC#'s CWs, D'n'C took 7565, 5, 2165, and 1139 seconds. Except for one case, LEC# "wins the big ones". The bottom line that we are seeking is completion on large problems. In that respect, note that the AWs (Arbitrary Factor Wins) are LEC# 7, D'n'C 0. Making similar tallies for LEC# vs. EL, we see that LEC# wins the constant factor competition, and has an even more convincing advantage in AWs: New 9, EL 0.

Finally, we look at the same comparisons for the Texas-97 benchmark circuits. Similarly tallying Table 3, we obtain

    Cases   LEC# vs. D'n'C   LEC# vs. EL
    CL             2               1
    CW             3               3
    AL             0               0
    AW             2               3
For these mostly larger circuits, for some of which reachability is prohibitively expensive, we see a decisive advantage of LEC# vs. both D’n’C and EL.
6 Conclusion
In this paper we proposed a new algorithm for language emptiness, based on a series of "sharpness" heuristics, which enable us to perform the most expensive parts of language emptiness checking with restriction to minimal state subspaces. We presented theoretical and experimental results which support our hypothesis that, for large or otherwise difficult problems, heavy investment in sharpness-based heuristic state subspace restriction and guidance is justified.

⁶ Full reachability analysis is usually impossible on practical circuits.
Table 1. On the circuits of [13]: * means dcLevel=0 (no don't cares), otherwise dcLevel=2 (reachable don't cares). T/O means time-out after 4 hours. Columns: circuit and LTL property (bakery1–5, eisen1–2, elevator1, nmodem1, peterson1, philo1–3, shamp1–3, twoq1–2*), pass (P) or fail (F), number of latches, and CPU time (s), memory (MB), and BDD nodes (M) for EL, D'n'C, and LEC#. [The per-circuit numeric entries were garbled in extraction and are not reproduced here.]
The experimental results show that while D’n’C mostly outperforms our new algorithm on problems where prior reachability is possible, the new algorithm outperforms both the Emerson-Lei algorithm and the D’n’C algorithm on difficult circuits. Although our new algorithm does not win in every case, it tends to win on the harder problems. Out of the 25 LTL model checking cases we described in Tables 2 and 3, Emerson-Lei timed out in 13 cases, more than half. This attests to the fact that the circuits studied, while not huge, are definitely non-trivial. The D’n’C algorithm timed out on 10 of the 25 cases. Since our algorithm never timed out (and usually had much smaller memory requirements when time was an issue) we can only say that the speedup achieved in these cases was arbitrarily large. We note that our new algorithm does not yet employ the strength reduction techniques of D’n’C. This suggests that sharpness itself is very powerful. However, when combined with the strength reduction techniques, our advantage with respect to both D’n’C and Emerson-Lei, might improve further on some problems. A priority in future work would be to diagnose the qualities of a given design which make language emptiness checking compute-intensive. This might afford guidance on how to set the various parameters of the algorithm such as how many latches to compose before jumping, and how to choose, for example between sharp forward search and sharp backward search at the end of the jump phase (currently, we start both and abandon the one that seems to be stalling).
Table 2. On the circuits of [13]: with dcLevel=3 (approximate reachable don't cares). T/O means time-out after 4 hours. Columns: circuit and LTL property, pass (P) or fail (F), number of latches, and CPU time, memory, and BDD nodes for EL, D'n'C, and LEC#. [The per-circuit numeric entries were garbled in extraction and are not reproduced here.]
Table 3. On Texas-97 benchmark circuits, with dcLevel=3 (approximate reachable don't cares). T/O means time-out after 8 hours. Columns: circuit and LTL property (Blackjack1, MSI cache1–2, PI bus1–2, PPC60X1–2), pass (P) or fail (F), number of latches, and CPU time (s), memory (MB), and BDD nodes (M) for EL, D'n'C, and LEC#. [The per-circuit numeric entries were garbled in extraction and are not reproduced here.]
Further research should be focused on both the clustering algorithms to create the submodules, and the corresponding refinement scheduling (guidance on the order of processing the submodules in the amassing and jump phases). Acknowledgements. We acknowledge the contributions of “deep in the shed” research sessions with Roderick Bloem, Kavita Ravi, and Fabio Somenzi.
References

[1] O. Lichtenstein and A. Pnueli. Checking that finite state concurrent programs satisfy their linear specification. In Proceedings of the Twelfth Annual ACM Symposium on Principles of Programming Languages, pages 97–107, New Orleans, January 1985.
[2] M. Y. Vardi and P. Wolper. An automata-theoretic approach to automatic program verification. In Proceedings of the First Symposium on Logic in Computer Science, pages 322–331, Cambridge, UK, June 1986.
[3] K. L. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, Boston, MA, 1994.
[4] R. P. Kurshan. Computer-Aided Verification of Coordinating Processes. Princeton University Press, Princeton, NJ, 1994.
[5] E. A. Emerson and C.-L. Lei. Efficient model checking in fragments of the propositional mu-calculus. In Proceedings of the First Annual Symposium on Logic in Computer Science, pages 267–278, June 1986.
[6] R. Hojati, H. Touati, R. P. Kurshan, and R. K. Brayton. Efficient ω-regular language containment. In Computer Aided Verification, pages 371–382, Montréal, Canada, June 1992.
[7] H. J. Touati, R. K. Brayton, and R. P. Kurshan. Testing language containment for ω-automata using BDD's. Information and Computation, 118(1):101–109, April 1995.
[8] Y. Kesten, A. Pnueli, and L.-o. Raviv. Algorithmic verification of linear temporal logic specifications. In International Colloquium on Automata, Languages, and Programming (ICALP-98), pages 1–16, Berlin, 1998. Springer. LNCS 1443.
[9] A. Xie and P. A. Beerel. Implicit enumeration of strongly connected components and an application to formal verification. IEEE Transactions on Computer-Aided Design, 19(10):1225–1230, October 2000.
[10] R. Bloem, H. N. Gabow, and F. Somenzi. An algorithm for strongly connected component analysis in n log n symbolic steps. In W. A. Hunt, Jr. and S. D. Johnson, editors, Formal Methods in Computer Aided Design, pages 37–54. Springer-Verlag, November 2000. LNCS 1954.
[11] O. Kupferman and M. Y. Vardi. Freedom, weakness, and determinism: From linear-time to branching-time. In Proc. 13th IEEE Symposium on Logic in Computer Science, June 1998.
[12] R. Bloem, K. Ravi, and F. Somenzi. Efficient decision procedures for model checking of linear time logic properties. In N. Halbwachs and D. Peled, editors, Eleventh Conference on Computer Aided Verification (CAV'99), pages 222–235. Springer-Verlag, Berlin, 1999. LNCS 1633.
[13] C. Wang, R. Bloem, G. D. Hachtel, K. Ravi, and F. Somenzi. Divide and compose: SCC refinement for language emptiness. In International Conference on Concurrency Theory (CONCUR01), pages 456–471, Berlin, August 2001. Springer-Verlag. LNCS 2154.
[14] H. Cho, G. D. Hachtel, E. Macii, M. Poncino, and F. Somenzi. A state space decomposition algorithm for approximate FSM traversal. In Proceedings of the European Conference on Design Automation, pages 137–141, Paris, France, February 1994.
[15] R. K. Brayton et al. VIS: A system for verification and synthesis. In T. Henzinger and R. Alur, editors, Eighth Conference on Computer Aided Verification (CAV'96), pages 428–432. Springer-Verlag, Rutgers University, 1996. LNCS 1102.
[16] D. L. Dill. What's between simulation and formal verification? In Proceedings of the Design Automation Conference, pages 328–329, San Francisco, CA, June 1998.
[17] K. Ravi, R. Bloem, and F. Somenzi. A comparative study of symbolic algorithms for the computation of fair cycles. In W. A. Hunt, Jr. and S. D. Johnson, editors, Formal Methods in Computer Aided Design, pages 143–160. Springer-Verlag, November 2000. LNCS 1954.
Relating Multi-step and Single-Step Microprocessor Correctness Statements

Mark D. Aagaard¹, Nancy A. Day², and Meng Lou²

¹ Electrical and Computer Engr., University of Waterloo
[email protected]
² Computer Science, University of Waterloo, Waterloo, ON, Canada
[email protected], [email protected]
Abstract. A diverse collection of correctness statements have been proposed and used in microprocessor verification efforts. Correctness statements have evolved from criteria that match a single step of the implementation against the specification to seemingly looser, multi-step, criteria. In this paper, we formally verify conditions under which two categories of multi-step correctness statements logically imply single-step correctness statements. The first category of correctness statements compare flushed states of the implementation and the second category compare states that are able to retire instructions. Our results are applicable to superscalar implementations, which fetch or retire multiple instructions in a single step.
1 Introduction
Microprocessor verification efforts usually compare a state-machine description of a microarchitectural-level implementation against an Instruction Set Architecture (ISA). The correctness statement describes the intended relationship between the implementation and the specification ISA. In early verification efforts, correctness statements were based on Milner's pointwise notion of simulation: a commuting diagram that says for any step the implementation takes, the specification must take a corresponding step [15]. Pipelining and other optimizations increased the gap between the behaviour of the implementation and the specification, making it more difficult to show that an individual implementation step corresponds to a specification step. In a seminal paper, Burch and Dill proposed constructing abstraction functions automatically by flushing pipelines [5]. Their correctness criterion compares each step of the implementation against the specification by flushing the implementation. As verification efforts have tackled complexities such as out-of-order execution and interrupts, the correctness statements have evolved from single-step criteria to seemingly looser, multi-step criteria. Sawada and Hunt [16], Hosabettu et al. [10], Jones et al. [14], and Arons and Pnueli [3] check that the implementation corresponds with the specification only at flushed implementation states, i.e. states with no in-flight instructions. Fox and Harman [7] compare the implementation and specification only at states where an instruction is about to retire. Berezin et al. [4] compare multi-step implementation traces that fetch a single instruction against a single step of the specification.
The change from single-step to multi-step correctness statements raises the questions "are they proving the same relationship?", "are there correct machines that satisfy multi-step correctness but not single-step?", and finally, "are there bugs that are undetectable with multi-step correctness statements?" To explore the relationship between multi-step and single-step correctness statements, we build on the Microbox framework [1,2] for microprocessor correctness statements. Using Microbox, Aagaard et al. [2] described and compared thirty-seven correctness statements from twenty-nine papers. Day et al. [6] mechanized Microbox in the HOL theorem prover [8] and verified a partial order relationship between correctness statements. Day et al. proved that tighter criteria, such as single-step correctness statements, logically imply looser criteria, such as testing only flushed states of the implementation. In this paper we examine whether some reverse implications hold, i.e., if a multi-step correctness statement is verified, is there a single-step statement that also holds?

Section 2 provides background material on Microbox. Section 3 characterizes the microprocessor-specific functions used in the correctness statements. Section 4 describes the relationship between multi-step correctness that compares flushed states and single-step correctness using Burch-Dill style flushing. The main result of the section is Theorem 3, which says that comparing flushed states of the implementation against the specification is equivalent to using flushing to compare each step of the implementation, for deterministic specifications with no internal state. We also provide an example of a non-deterministic specification and implementation that satisfy the multi-step correctness statement, but not the single-step statement with flushing. Section 5 describes the relationship between multi-step correctness at retirement and single-step correctness. Theorem 6 says that comparing the implementation to the specification when instructions are about to retire is equivalent to checking each step of the implementation. Our results are applicable to superscalar implementations, which can fetch and retire multiple instructions in a single step. Sections 6 and 7 consider the relevance of our results to existing verification efforts and summarize the paper.
2 The Microbox Framework

The Microbox framework uses four parameters to characterize a correctness statement: alignment, match, implementation execution, and specification execution. Alignment is the method used to align the traces of the implementation and specification (Section 2.1). Match is the relation established between aligned states in the implementation and specification traces (Section 2.2). Implementation execution and specification execution describe the type of state machines used – either deterministic or non-deterministic. The Microbox framework provides a list of options for each of these parameters based on verification efforts discussed in the literature (Table 1). By choosing options for the parameters, Microbox can produce a wide variety of correctness statements.

Each correctness statement contains a base case and an induction step. The base cases deal with initial states and are generally quite straightforward, so we concentrate on the induction steps. The alignment parameter determines the overall form of the induction clause. For each alignment option, Microbox defines a correctness statement for an other match (O), non-deterministic implementation (N), and non-deterministic specification (N).
Table 1. Options for correctness statement parameters

    alignment                 match                impl. execution         spec. execution
    (F) Flushpoint            (O) Other            (N) Non-deterministic   (N) Non-deterministic
    (W) Will-retire           (A) Abstraction      (D) Deterministic       (D) Deterministic
    (M) Must-issue            (U) Flushing
    (S) Stuttering            (E) Equality
    (I) Informed-pointwise    (R) Refinement Map
    (P) Pointwise
Example: IUND = informed-pointwise alignment (I), flushing match (U), non-deterministic implementation (N) and deterministic specification (D).
Correctness statements for different match and execution options are generated by substitutions into the *ONN definitions. In Microbox, both the specification and implementation machines have program memories as part of their state, and so do not take instructions as inputs. Invariants, which limit the state space of a machine to reachable states or an over-approximation of reachable states, are encoded in the set of states for a machine. Table 2 summarizes the notation.

Table 2. State-machine notation

    N             Next-state relation
    N^k(q, q′)    q′ is reachable from q in k steps of N
    n             Next-state function
    π             External state projection function
    qi =π qs      Externally visible equivalence: πi(qi) = πs(qs)
Identifiers are subscripted with “s” for specification and “i” for implementation.
In Sections 2.1 and 2.2, we describe the alignment and match options that are relevant to this paper. In Section 2.3, we characterize the correctness statements in terms of the type of synchronization used, i.e. at fetch or at retire. In Section 2.4, we describe the partial order relationships between these correctness statements.

2.1 Alignment

Alignment describes which states in the execution trace are tested for matching. Pointwise alignment (P, Definition 1) is the classic commuting diagram. Informed-pointwise (I, Definition 2) is a variation of pointwise alignment suitable for superscalar implementations, which allows the implementation to inform the correctness statement of the number of specification steps to take. In practice, numInstr is instantiated with either the number of instructions that were fetched (numFetch) or the number of instructions that were retired (numRetire), depending on the synchronization method (Section 2.3).
Definition 1 (Pointwise induction clause: PONN).

    PONN(R, Ni, Ns) ≡
        ∀ qi, qi′. ∀ qs. ∃ qs′.
            ( Ni(qi, qi′) ∧ R(qi, qs) ) =⇒ ( Ns(qs, qs′) ∧ R(qi′, qs′) )
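For finite machines given explicitly as relations, the quantifier structure of PONN can be checked by brute force. The sketch below is only meant to make the definition concrete; actual verification efforts discharge the obligation symbolically or deductively, and the tiny example machines are invented.

```python
# Brute-force check of the PONN induction clause for explicit finite machines.
# Ni and Ns are sets of (state, next_state) pairs; R is a set of (qi, qs) pairs.

def ponn(R, Ni, Ns, spec_states):
    for qi, qi2 in Ni:                           # every implementation step
        for qs in spec_states:
            if (qi, qs) not in R:
                continue
            # some specification step must re-establish the match
            if not any((qs, qs2) in Ns and (qi2, qs2) in R for qs2 in spec_states):
                return False
    return True

# A two-state implementation that mirrors a two-state specification.
Ni = {("i0", "i1"), ("i1", "i0")}
Ns = {("s0", "s1"), ("s1", "s0")}
R  = {("i0", "s0"), ("i1", "s1")}
assert ponn(R, Ni, Ns, {"s0", "s1"})
```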
Definition 2 (Informed-pointwise induction clause: IONN).

    IONN(numInstr, R, Ni, Ns) ≡
        ∀ qi, qi′. ∀ qs. ∃ qs′. let j = numInstr(qi, qi′) in
            ( Ni(qi, qi′) ∧ R(qi, qs) ) =⇒ ( Ns^j(qs, qs′) ∧ R(qi′, qs′) )
Will-retire alignment (W, Definition 3) compares the implementation and specification whenever the implementation is ready to retire instructions. The implementation retires one or more instructions in the first step of the trace and continues until it is ready to retire again.

Definition 3 (Will-retire induction clause: WONN).

    WONN(numRetire, willRetire, R, Ni, Ns) ≡
        ∀ qi^0, qi^1, ..., qi^k. ∀ qs. ∃ qs′. let r = numRetire(qi^0, qi^1) in
            (   Ni(qi^0, qi^1) ∧ willRetire(qi^0, qi^1)
              ∧ ( ∀ j ∈ 1 .. k−1. Ni(qi^j, qi^{j+1}) ∧ ¬ willRetire(qi^j, qi^{j+1}) )
              ∧ ( ∃ qi′. Ni(qi^k, qi′) ∧ willRetire(qi^k, qi′) )
              ∧ R(qi^0, qs)
            ) =⇒ ( Ns^r(qs, qs′) ∧ R(qi^k, qs′) )
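The traces quantified over in Definition 3 start with a retiring step and contain no further retiring steps until the last state. For intuition, the sketch below cuts an explicit implementation trace into exactly those segments, given a willRetire predicate on steps; it is an illustration only, with no claim about how the HOL formalization represents traces.

```python
# Cut an explicit trace [q0, q1, ...] into will-retire segments: each segment
# begins with a retiring step and ends just before the next retiring step, so
# each one has the shape quantified over in WONN (Definition 3).

def will_retire_segments(trace, will_retire):
    segments, current = [], None
    for q, q2 in zip(trace, trace[1:]):
        if will_retire(q, q2):
            if current is not None:
                segments.append(current)
            current = [q]              # a new segment starts with a retiring step
        if current is not None:
            current.append(q2)
    if current is not None:
        segments.append(current)
    return segments

# Example: the steps out of states 'a' and 'c' retire; the rest do not.
retires = {("a", "b"), ("c", "d")}
assert will_retire_segments(list("abcde"), lambda q, q2: (q, q2) in retires) \
       == [["a", "b", "c"], ["c", "d", "e"]]
```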
Flushpoint alignment (F, Definition 4) compares flushed states of the implementation against the specification. It says that if there is a trace between flushed implementation states, then there must exist a trace in the specification between a pair of states that match the flushed implementation states.
Definition 4 (Flushpoint induction clause: FONN).

    FONN(isFlushed, R, Ni, Ns) ≡
        ∀ qi, qi′, qs. ∃ qs′.
            ( isFlushed(qi) ∧ (∃ k. Ni^k(qi, qi′)) ∧ isFlushed(qi′) ∧ R(qi, qs) )
                =⇒ ( (∃ j. Ns^j(qs, qs′)) ∧ R(qi′, qs′) )

2.2 Match
Instantiations for the match parameter are relations between an implementation state qi and specification state qs that mean “qi is a correct representation of qs ”. Figure 1 shows the match options that are relevant to this paper and the partial order on the options.
Fig. 1. Options and partial order for the match parameter: (O) a general relation R(qi, qs); (U) flushing, flush(qi) =π qs; (E) equality, qi =π qs
An other match (O) is any relation between implementation and specification states. The flushing match (U) uses a flushing function to compute an implementation state that should be externally equivalent to a specification state. The equality match (E) requires that the implementation and specification states be externally equivalent. 2.3
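The three match options can be written directly as predicates on an implementation state and a specification state. In the sketch below, proj_i, proj_s, and flush stand for the machine-specific projection and flushing functions (assumptions of the sketch, supplied by the verifier):

```python
# The match options of Figure 1 as predicate builders; proj_i, proj_s, flush
# and R are machine-specific and provided by the verifier.

def equality_match(proj_i, proj_s):
    return lambda qi, qs: proj_i(qi) == proj_s(qs)            # (E)  qi =π qs

def flushing_match(proj_i, proj_s, flush):
    return lambda qi, qs: proj_i(flush(qi)) == proj_s(qs)     # (U)  flush(qi) =π qs

def other_match(R):
    return lambda qi, qs: R(qi, qs)                           # (O)  any relation R
```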
Synchronization
In the implementation projection function (πi ), there are two common representations of the program counter: the address of the next instruction to fetch, and the address of the next instruction to retire. We refer to the first option as synchronization at fetch and the second option as synchronization at retirement. For a projection function to be sensible, the program counter, register file, and other state components must all reflect the same point in the execution of a program. Synchronization at fetch is only appropriate when applied to a flushed implementation state.
128
M.D. Aagaard, N.A. Day, and M. Lou
Hence, synchronization at fetch can only be used with the flushing match, which flushes the implementation before applying the projection function, and with flushpoint alignment. With synchronization at retirement, the register file and program counter always correspond to the same point of execution. The function numInstr is instantiated with numFetch for synchronization at fetch and numRetire for synchronization at retirement. Instructions in the shadow of a mispredicted branch or an exception should not be executed by the specification, and so do not count toward the number of instructions fetched. The function numRetire counts the number of instructions that retire. Every instruction that retires should be executed by the specification. 2.4
Correctness Space
Figure 2 shows the partial order of logical implication for the first two parameters of correctness statements (alignment and match). For the third and fourth parameters, the execution of the implementation and specification machines, it is easy to consider deterministic as an instance of non-deterministic, thereby providing the ordering amongst these options. The alignment parameter iF (Definition 5, informed-flushpoint — a common instance of F) will be introduced in Section 4.1. The non-shaded lines show the natural ordering amongst correctness criteria, which was verified in Day et al. [6]. In this paper, we verify the arrows in the shaded boxes, which proves equivalences between the correctness statements. In Section 4.2, we verify informed-flushpoint with the equality match for deterministic specifications with no internal state is equivalent to informed-pointwise with flushing (iFE ⇐⇒ IU). The dashed line between iFE and IU indicates that this implication holds only for deterministic specifications. In Section 5, we prove will-retire equality is equivalent to informed-pointwise equality (WE ⇐⇒ IE). In related work, we verified that the multi-step correctness statement of must-issue with the flushing match, in which the implementation takes some number of stalled steps followed by one step where it fetches an instruction, is equivalent to the singlestep informed-pointwise flushing (IUNN) [6].
3
Characterization of Microprocessor-Specific Functions
The relationships between correctness statements are based on microprocessor-specific functions and relations (Table 3) behaving appropriately. In this section, we describe the required conditions on these functions. These conditions often appear as lemmas in verification efforts. To apply our results to a particular specification and implementation, these conditions would have to be verified. Conditions 1–5 are for synchronization at fetch. Conditions 6–8 are for synchronization at retirement. 3.1
Fetching and Flushing Conditions
Condition 1 states that numFetch is zero in a step if-and-only-if doesFetch is false.
Relating Multi-step and Single-Step Microprocessor Correctness Statements
Flushpoint
FU
FO
iFU
iFO
129
F FE
Alignment Options
Informed Flushpoint
iF iFE
WO
Will-Retire W WE
Informed Pointwise
I
Pointwise
P
IU
IO
PU
PO
IE
PE
E
U
O
Equality
Flushing
Other
Match Options
Fig. 2. Partial order for correctness statements Table 3. Microprocessor-specific functions doesFetch(qi , qi ) numFetch(qi , qi ) willRetire(qi , qi ) numRetire(qi , qi ) flush(qi ) isFlushed(qi )
true if an instruction is fetched in a step. returns the number of instructions fetched in a step. true if an instruction is retired in a step. returns the number of instructions retired in a step. flushes qi , i.e., completes the execution of any in-flight instructions. true if a state is flushed.
Condition 1 (numFetch and doesFetch) numFetch doesFetch(numFetch, doesFetch) ≡ ∀ qi , qi . (numFetch(qi , qi ) = 0) ⇐⇒ ¬ doesFetch(qi , qi ) We characterize the required behaviour of a flushing function with Conditions 2 and 3. Condition 2 relates the function flush to the predicate isFlushed and says that if a state qi is flushed, then flushing qi returns qi , i.e. flush is the identity function for a flushed state.
130
M.D. Aagaard, N.A. Day, and M. Lou
Condition 2 (isFlushed and flush) isFlushed flush(isFlushed, flush) ≡ ∀ qi . isFlushed(qi ) =⇒ (flush(qi ) = qi ) Condition 3 says that if an instruction is not fetched in a step where the implementation transitions from qi to qi , then flushing qi returns the same state as flushing qi . Equivalently, flushing a stalled state results in the same state as allowing the machine to take one (unproductive) step and then flushing. Condition 3 (doesFetch and flush) doesFetch flush(doesFetch, flush, Ni ) ≡ ∀ qi , qi . ¬ doesFetch(qi , qi ) ∧ Ni (qi , qi ) =⇒ (flush(qi ) = flush(qi )) Conditions 2 and 3 are the only restrictions on flushing functions. The construction of the flushing function is up to the verifier. The most common method for constructing a flushing function was originated by Burch and Dill [5]. They iterate a deterministic implementation’s next-state function without fetching new instructions. Another method for constructing flushing functions was developed by Hosabettu et al. [10], who define completion functions for each stage in the pipeline and then compose the completion functions to create a flushing function. We also need a reachability condition and a liveness condition. Condition 4 says that for any implementation state, qi , there exists a trace from a flushed implementation state to qi . Condition 4 (Past Flush) past flush(isFlushed, Ni ) ≡ ∀ qi . ∃ k, qi0 . isFlushed(qi0 ) ∧ Nik (qi0 , qi ) Condition 5 says that from any state, the implementation can reach a flushed state by passing through a series of states where it does not fetch an instruction. If the implementation does not already have the ability to prevent instructions from being fetched, then flushing circuitry must be added. Condition 5 (Eventually Flushed) eventually flushed(isFlushed, doesFetch, Ni ) ≡ ∀ qi . ∃ k, qi0 , . . . , qik . qi = qi0 ∧ (∀ j < k. Ni (qij , qij+1 ) ∧ ¬ doesFetch(qij , qij+1 )) ∧ isFlushed(qik ) 3.2
3.2 Retiring and Projection Conditions
Condition 6 states that numRetire is zero for an implementation step if-and-only-if willRetire is false. It is the dual of Condition 1 for synchronization at retirement.

Condition 6 (numRetire and willRetire)
numRetire_willRetire(numRetire, willRetire) ≡ ∀ qi, qi′. (numRetire(qi, qi′) = 0) ⇐⇒ ¬ willRetire(qi, qi′)
Condition 7, relating the predicate willRetire to the implementation projection function πi appropriate for synchronization at retirement, is the dual of Condition 3. Condition 7 says that if an instruction is not retired in a step where the implementation transitions from qi to qi′, then the projections of qi and qi′ are equivalent.

Condition 7 (willRetire and πi)
willRetire_pi(willRetire, πi, Ni) ≡ ∀ qi, qi′. ¬ willRetire(qi, qi′) ∧ Ni(qi, qi′) =⇒ (πi(qi) = πi(qi′))

Condition 8 is a liveness condition. The condition says that from any implementation state, it is possible to reach a state that can retire an instruction.

Condition 8 (Eventually Retires)
eventually_retires(willRetire, Ni) ≡ ∀ qi. ∃ k, qi′, qi″. Ni^k(qi, qi′) ∧ Ni(qi′, qi″) ∧ willRetire(qi′, qi″)
4 Flushpoint Equality and Informed-Pointwise Flushing
In this section, we discuss the relationship between the two correctness statements, flushpoint equality (FE) and informed-pointwise flushing (IU), which use synchronization at fetch. IU is Burch-Dill style flushing. In Section 4.1, we introduce a commonly used version of flushpoint alignment, which we call informed-flushpoint (iF). In Section 4.2, we prove that informed-flushpoint equality and informed-pointwise flushing are equivalent for a deterministic specification with no internal state (iFEND ⇐⇒ IUND, Theorem 3). A similar relationship does not exist between flushpoint equality (FE) and informed-pointwise flushing (IU), because flushpoint alignment does not constrain the number of steps in the specification trace. In Section 4.3, we describe an implementation and a non-deterministic specification that satisfy informed-flushpoint equality but not informed-pointwise flushing, thereby providing a counterexample to iFENN =⇒ IUNN.
4.1 Informed-Flushpoint
Flushpoint alignment (Definition 4) does not impose any constraints on the number of specification steps taken. However, in most verification efforts that use flushpoint alignment (e.g., [16,10,14]), the number of steps in the specification trace is the number of instructions executed in the implementation trace. We introduce informed-flushpoint alignment (iF) to capture this common practice. Informed-flushpoint is most commonly used with the equality match, as shown in Definition 5. We overload numFetch to return the total number of instructions fetched in either a sequence of implementation steps or in a single implementation step.
Definition 5 (Informed-Flushpoint Equality induction clause: iFENN).
iFENN(isFlushed, numFetch, πi, πs, Ni, Ns) ≡
∀ qi^0, qi^1, ..., qi^k. ∀ qs. ∃ qs′.
  let f = numFetch(qi^0, ..., qi^k) in
    ( isFlushed(qi^0) ∧ (∀ j < k. Ni(qi^j, qi^j+1)) ∧ isFlushed(qi^k) ∧ qi^0 =π qs )
    =⇒ ( Ns^f(qs, qs′) ∧ qi^k =π qs′ )

(Here qi =π qs abbreviates the equality match πi(qi) = πs(qs).)
4.2 Informed-Flushpoint and Informed-Pointwise: Deterministic Specification
In this section, we prove Theorem 3, which says that, for a deterministic specification without internal state (i.e. Ns is ns and πs is identity), informed-flushpoint with the equality match (iFEND, an instantiation of Definition 5) is equivalent to informed-pointwise with the flushing match (IUND, an instantiation of Definition 2). Showing that the single-step informed-pointwise correctness statement logically implies multi-step informed-flushpoint (IUND =⇒ iFEND) is straightforward by induction. Here we describe the more difficult reverse direction (iFEND =⇒ IUND). First, we introduce an intermediate point, which we call iFflush (Definition 6), and prove iFEND =⇒ iFflush (Theorem 1). Second, we show iFflush =⇒ IUND (Theorem 2).

Definition 6 (iFflush).
iFflush(isFlushed, numFetch, flush, πi, πs, Ni, Ns) ≡
∀ qi^0, qi^1, ..., qi^k. ∀ qs. ∃ qs′.
  let f = numFetch(qi^0, ..., qi^k) in
    ( isFlushed(qi^0) ∧ (∀ j < k. Ni(qi^j, qi^j+1)) ∧ qi^0 =π qs )
    =⇒ ( Ns^f(qs, qs′) ∧ πi(flush(qi^k)) = πs(qs′) )
Definition 6 is the same as informed-flushpoint (Definition 5), except that the final states must satisfy the flushing match, rather than be externally equivalent.

Theorem 1 (iFENN =⇒ iFflush).
∀ isFlushed, numFetch, doesFetch, flush, πi, πs, Ni, Ns.
  ( eventually_flushed(isFlushed, doesFetch, Ni)    — Condition 5
  ∧ doesFetch_flush(doesFetch, flush, Ni)           — Condition 3
  ∧ isFlushed_flush(isFlushed, flush)               — Condition 2
  ∧ numFetch_doesFetch(numFetch, doesFetch) )       — Condition 1
  =⇒ ( iFENN(isFlushed, numFetch, πi, πs, Ni, Ns)
       =⇒ iFflush(isFlushed, numFetch, flush, πi, πs, Ni, Ns) )
Figure 3 outlines the proof of iFENN =⇒ iFflush (Theorem 1). This theorem depends on conditions described in Section 3. We begin in Step 0 assuming the left and lower sides of the commuting diagram for iFflush. In Step 1, we extend the path from qik to a flushed state, qi◦ , using the condition that the implementation can always reach a flushed state by taking steps that do not fetch instructions (Condition 5, eventually flushed). In Step 2, we use the condition that flushing a state after taking a series of steps that do not fetch an instruction is the same as flushing the state at the beginning of the series (Condition 3, doesFetch flush). In Step 3, we conclude that flushing qik results in qi◦ because flushing a flushed state has no effect (Condition 2, isFlushed flush). In Step 4, we use the fact that iFENN holds for traces between flushed states to complete the commuting diagram. Condition 1, which relates numFetch and doesFetch, is needed to relate the number of steps in the specification traces.
Fig. 3. Steps in proof of iFENN =⇒ iFflush (Theorem 1). (Commuting diagrams for Step 0; Step 1: using eventually_flushed; Step 2: using doesFetch_flush; Step 3: using isFlushed_flush; Step 4: using iFENN.)
In the second half of the proof of iFEND =⇒ IUND, we use iFflush to arrive at IUND (Theorem 2). The steps of the proof are outlined in Figure 4.
Theorem 2 (iFflush =⇒ IUND).
∀ isFlushed, numFetch, flush, πi, πs, Ni, ns.
  ( past_flush(isFlushed, Ni)    — Condition 4
  ∧ πs = (λx.x) )
  =⇒ ( iFflush(isFlushed, numFetch, flush, πi, πs, Ni, ns)
       =⇒ IUND(numFetch, flush, πi, πs, Ni, ns) )
Fig. 4. Steps in proof of iFflush =⇒ IUND (Theorem 2). (Commuting diagrams for Step 0; Step 1: using past_flush; Step 2: using iFflush twice; Step 3: IUND.)
In Step 0 of Figure 4, we start with the left and lower edges of the IUND commuting diagram, leaving out πs because it is the identity function in this case. In Step 1, we extend the path from qi back to a flushed state, qi◦, using the condition that for any state, there is always a previous flushed state (Condition 4, past_flush). In Step 2, we use iFflush to deduce the two commuting diagrams both beginning at qi◦. Because the matching relationship is a function, and because the specification is deterministic, from these two commuting diagrams we can conclude IUND in Step 3.

We combine Theorem 1, specialized for a deterministic specification with no internal state; Theorem 2; and the result that IUND logically implies iFEND to conclude that iFEND is equivalent to IUND under the conditions listed in Section 3 (Theorem 3).

Theorem 3 (iFEND ⇐⇒ IUND).
∀ isFlushed, numFetch, doesFetch, flush, πi, πs, Ni, ns.
  ( eventually_flushed(isFlushed, doesFetch, Ni)   — Condition 5
  ∧ doesFetch_flush(doesFetch, flush, Ni)          — Condition 3
  ∧ isFlushed_flush(isFlushed, flush)              — Condition 2
  ∧ numFetch_doesFetch(numFetch, doesFetch)        — Condition 1
  ∧ past_flush(isFlushed, Ni)                      — Condition 4
  ∧ πs = (λx.x) )
  =⇒ ( iFEND(isFlushed, numFetch, πi, πs, Ni, ns)
       ⇐⇒ IUND(numFetch, flush, πi, πs, Ni, ns) )
4.3 Informed-Flushpoint and Informed-Pointwise: Non-Deterministic Specification Counterexample
In Section 4.2, we proved iFEND ⇐⇒ IUND. In this section, we illustrate that a non-deterministic specification paired with an implementation can satisfy iFENN without satisfying IUNN. Figure 5 is an example of a reasonable non-deterministic specification and a slightly strange, but arguably correct, implementation that satisfies iFENN but not IUNN. In the specification states (S1–S9), the letters in the top of the box represent instructions to execute. The lower part of the box lists completed instructions. In the implementation states (I1–I7), the middle shaded area is in-flight instructions. States with no in-flight instructions are flushed. The larger, shaded arrows show the projection of the implementation states. In the step marked "X", the implementation kills its currently executing instruction "B" and fetches the instructions "C" and "D"; however, it reports fetching only one instruction. Figure 6 shows how the iFENN commuting diagram is satisfied for all possible paths between flushed implementation states. In all three cases, the length of the specification trace is the reported number of instructions fetched. Because there is a bug in the fetch mechanism, this is not actually the number of instructions fetched in Path 3. Figure 7 illustrates that IUNN does not hold for the implementation step "X".
Fig. 5. Specification and implementation of counterexample (specification states S1–S9 over instructions A, B, C, D; implementation states I1–I7 with their projections πi; the step marked "X" kills B and fetches C and D)

Fig. 6. iFENN paths of counterexample (Path 1, Path 2, and Path 3, each between flushed implementation states)

Fig. 7. IUNN path of counterexample
5 Will-Retire and Informed-Pointwise

The will-retire correctness statement (WONN, Definition 3) uses synchronization at retirement to compare an implementation trace that retires instructions only in the first step against one specification step. The implementation trace continues until it is ready to retire another instruction. The main result of this section is Theorem 6, which says that will-retire equality (WENN) is equivalent to informed-pointwise with equality (IENN, Definition 2 with the equality match).

The first insight in the proof that WENN is equivalent to IENN is the introduction of an alternative way of expressing WONN, which we call single-step will-retire (ssWONN, Definition 7). ssWONN decomposes WONN into two simpler, single-step properties based on whether the implementation will retire any instructions. As a single-step correctness statement, ssWONN is similar to informed-pointwise (IONN) in examining only a single step of the implementation. IONN and ssWONN are equivalent under Condition 6, numRetire_willRetire, which states that the function numRetire returns zero if-and-only-if willRetire is false (Theorem 4).

Definition 7 (Single-step will-retire induction clause: ssWONN).
ssWONN(numRetire, willRetire, R, Ni, Ns) ≡
∀ qi, qi′. ∀ qs.
  let r = numRetire(qi, qi′) in
    ( Ni(qi, qi′) ∧ R(qi, qs) )
    =⇒ ( ( willRetire(qi, qi′) =⇒ ∃ qs′. Ns^r(qs, qs′) ∧ R(qi′, qs′) )
       ∧ ( ¬ willRetire(qi, qi′) =⇒ R(qi′, qs) ) )

Theorem 4 (ssWONN ⇐⇒ IONN).
∀ numRetire, willRetire, R, Ni, Ns.
  numRetire_willRetire(numRetire, willRetire)    — Condition 6
  =⇒ ( ssWONN(numRetire, willRetire, R, Ni, Ns) ⇐⇒ IONN(numRetire, R, Ni, Ns) )

The next and more challenging step in the proof is to show that will-retire with the equality match is equivalent to the seemingly tighter single-step will-retire correctness statement (WENN ⇐⇒ ssWENN). Showing ssWENN =⇒ WENN is straightforward by induction. The other direction (WENN =⇒ ssWENN, Theorem 5) holds under Conditions 7 and 8.

Theorem 5 (WENN ⇐⇒ ssWENN).
∀ willRetire, πi, πs, Ni, Ns.
  ( willRetire_pi(willRetire, πi, Ni)       — Condition 7
  ∧ eventually_retires(willRetire, Ni) )    — Condition 8
  =⇒ ( WENN(numRetire, willRetire, πi, πs, Ni, Ns) ⇐⇒ ssWENN(numRetire, willRetire, πi, πs, Ni, Ns) )
Figure 8 is an illustration of the proof of Theorem 5. In Step 0, we start with the left and lower side of the commuting diagram for ssWENN. In Step 1, we use the eventually_retires condition (Condition 8) to reach the first future state that retires an instruction. In Step 2, we use the willRetire_pi condition (Condition 7) to conclude that the projections of the intermediate states are equal. In Step 3, we use WENN to complete the commuting diagram. Step 4 shows ssWENN, where the left case follows from Step 3 and the right case follows directly from Condition 7.
Fig. 8. Steps in proof of WENN =⇒ ssWENN (Theorem 5). (Commuting diagrams for Step 0; Step 1: using eventually_retires; Step 2: using willRetire_pi; Step 3: using WENN; Step 4: left case from Step 3, right case from willRetire_pi.)
Theorem 6 (WENN ⇐⇒ IENN).
∀ willRetire, πi, πs, Ni, Ns.
  ( willRetire_pi(willRetire, πi, Ni)              — Condition 7
  ∧ eventually_retires(willRetire, Ni)             — Condition 8
  ∧ numRetire_willRetire(numRetire, willRetire) )  — Condition 6
  =⇒ ( WENN(numRetire, willRetire, πi, πs, Ni, Ns) ⇐⇒ IENN(numRetire, πi, πs, Ni, Ns) )

By specializing R in Theorem 4 to the equality match, we are able to conclude IENN is equivalent to ssWENN under Condition 6. Combining this specialization of Theorem 4 with Theorem 5, we conclude WENN ⇐⇒ IENN under Conditions 7, 8, and 6 (Theorem 6).
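The single-step form of the obligation is what makes this decomposition attractive in practice. The following Python sketch (a hypothetical explicit-state model, not part of the paper's HOL development) checks the ssWENN obligation for every single implementation step of a toy machine that alternately buffers and then retires one instruction.

    # Implementation state: (retired, pending). Specification state: a count of
    # retired instructions. The match is equality on the retired-instruction count.
    IMPL_STATES = [(r, p) for r in range(3) for p in (0, 1)]

    def n_i(q):                       # deterministic implementation step
        retired, pending = q
        return (retired, 1) if pending == 0 else (retired + 1, 0)

    def ns(qs):                       # specification step: retire one instruction
        return qs + 1

    def will_retire(q, q2):  return q2[0] > q[0]
    def num_retire(q, q2):   return q2[0] - q[0]
    def match(q, qs):        return q[0] == qs

    def ss_wenn_step(q, qs):
        """The single-step will-retire obligation for one implementation step."""
        q2 = n_i(q)
        if not match(q, qs):
            return True                              # antecedent fails; nothing to show
        if will_retire(q, q2):
            qs2 = qs
            for _ in range(num_retire(q, q2)):       # Ns^r
                qs2 = ns(qs2)
            return match(q2, qs2)
        return match(q2, qs)                         # no retirement: the spec stays put

    assert all(ss_wenn_step(q, qs) for q in IMPL_STATES for qs in range(4))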
6 Relating Theory to Practice
We now consider the relevance of our results to existing microprocessor verification efforts that use multi-step correctness statements based on flushpoint and will-retire alignment. Using our theorems is contingent upon showing that the implementation satisfies the conditions in Section 3.

Sawada and Hunt [16] verified that a non-deterministic implementation with out-of-order retirement satisfies informed-flushpoint equality with a deterministic specification with no internal state (iFEND). Their verification strategy is to build an intermediate model with history variables, called the MAETT. From our result, they can now conclude that informed-pointwise flushing (IUND) also holds. In later work [17,18], they enhanced their implementation to support external interrupts, which led them to add non-determinism to their specification because of the problem of predicting how many instructions the implementation will have completed when an interrupt occurs. Because of the non-deterministic specification, we cannot conclude that pointwise flushing holds in this case.

Skakkebæk et al. [14,13] verify that a deterministic implementation with in-order retirement satisfies informed-flushpoint equality with a deterministic specification with no internal state (iFEDD). They build a non-deterministic intermediate model that computes the result of each instruction when it enters the machine and queues the result for later retirement. Because of our result they are able to conclude that informed-pointwise flushing (IUDD) holds.

Hosabettu, Srivas, and Gopalakrishnan [10,11,12,9] prove that a deterministic out-of-order implementation satisfies informed-flushpoint equality with a deterministic specification with no internal state. They first prove informed-pointwise flushing (IUDD), then apply induction to prove informed-flushpoint equality (iFEDD). Because they use IUDD as a step toward iFEDD, there is no need for our result in this work.

Arons and Pnueli [3] use flushpoint alignment, not informed-flushpoint. Thus, our result is not applicable to their verification effort.

Fox and Harman [7] use will-retire alignment for a deterministic implementation and specification where the match is projection of the implementation (WEDD). Based on the results of this paper, they can also conclude informed-pointwise equality (IEDD).
7 Conclusions
This paper contains three results. First, we prove that for deterministic specifications with no internal state, from multi-step informed-flushpoint equality, one can conclude single-step informed-pointwise with the flushing match. Second, we provide a counterexample showing that for non-deterministic specifications flushpoint equality does not always imply informed-pointwise with the flushing match. Third, we prove that a multi-step correctness statement based on synchronization at retirement with the equality match is equivalent to informed-pointwise with the equality match. Our results are applicable to superscalar implementations, which fetch or retire multiple instructions in a single step.

Our long-term goal in studying correctness statements abstractly is to determine decomposition strategies that will ease the verification effort. The proofs described in this paper have been mechanized in the HOL theorem prover. We have created a reusable theory of microprocessor correctness that allows the comparison and extension of existing verification efforts.

Acknowledgments. We thank Robert Jones of Intel and the reviewers for detailed comments on this paper. The authors are supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC). Aagaard is supported in part by Intel Corporation.
References

1. M. D. Aagaard, B. Cook, N. A. Day, and R. B. Jones. A framework for microprocessor correctness statements. In CHARME, volume 2144 of LNCS, pages 433–448. Springer, 2001.
2. M. D. Aagaard, B. Cook, N. A. Day, and R. B. Jones. A framework for superscalar microprocessor correctness statements, 2002. To appear in Software Tools for Technology Transfer.
3. T. Arons and A. Pnueli. Verifying Tomasulo's algorithm by refinement. In Int'l Conf. on VLSI Design, pages 92–99. IEEE Comp. Soc. Press, 1999.
4. S. Berezin, E. Clarke, A. Biere, and Y. Zhu. Verification of out-of-order processor designs using model checking and a light-weight completion function. Formal Methods in System Design, 20(2):159–186, March 2002.
5. J. Burch and D. Dill. Automatic verification of pipelined microprocessor control. In CAV, volume 818 of LNCS, pages 68–80. Springer, 1994.
6. N. A. Day, M. D. Aagaard, and M. Lou. A mechanized theory for microprocessor correctness statements. Technical Report 2002-11, U. of Waterloo, Dept. of Comp. Sci., 2002.
7. A. Fox and N. Harman. Algebraic models of correctness for microprocessors. Formal Aspects in Computing, 12(4):298–312, 2000.
8. M. Gordon and T. Melham. Introduction to HOL: A Theorem Proving Environment for Higher Order Logic. Cambridge University Press, 1993.
9. R. Hosabettu, G. Gopalakrishnan, and M. Srivas. Verifying advanced microarchitectures that support speculation and exceptions. In CAV, volume 1855 of LNCS, pages 521–537. Springer, 2000.
10. R. Hosabettu, M. Srivas, and G. Gopalakrishnan. Decomposing the proof of correctness of pipelined microprocessors. In CAV, volume 1427 of LNCS, pages 122–134. Springer, 1998.
11. R. Hosabettu, M. Srivas, and G. Gopalakrishnan. Proof of correctness of a processor with reorder buffer using the completion functions approach. In CAV, volume 1633 of LNCS, pages 47–59. Springer, 1999.
12. R. Hosabettu, M. Srivas, and G. Gopalakrishnan. Proof of correctness of a processor without reorder buffer using the completion functions approach. In CHARME, volume 1703 of LNCS, pages 8–22. Springer, 1999.
13. R. Jones, J. Skakkebæk, and D. Dill. Reducing manual abstraction in formal verification of out-of-order execution. In FMCAD, volume 1522 of LNCS, pages 2–17. Springer, 1998.
14. R. B. Jones, J. U. Skakkebæk, and D. L. Dill. Formal verification of out-of-order execution using incremental flushing. Formal Methods in System Design, 20(2):39–58, March 2002.
15. R. Milner. An algebraic definition of simulation between programs. In Joint Conference on Artificial Intelligence, pages 481–489. British Computer Society, 1971.
16. J. Sawada and W. Hunt. Trace table based approach for pipelined microprocessor verification. In CAV, volume 1254 of LNCS, pages 364–375. Springer, 1997.
17. J. Sawada and W. Hunt. Processor verification with precise exceptions and speculative execution. In CAV, volume 1427 of LNCS, pages 135–146. Springer, 1998.
18. J. Sawada and W. Hunt. Results of the verification of a complex pipelined machine model. In CHARME, volume 1703 of LNCS, pages 313–316. Springer, 1999.
Modeling and Verification of Out-of-Order Microprocessors in UCLID

Shuvendu K. Lahiri², Sanjit A. Seshia¹, and Randal E. Bryant¹,²

¹ School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
{Randy.Bryant, Sanjit.Seshia}@cs.cmu.edu
² Electrical and Computer Engineering Department, Carnegie Mellon University, Pittsburgh, PA
[email protected]
Abstract. In this paper, we describe the modeling and verification of out-of-order microprocessors with unbounded resources using an expressive, yet efficiently decidable, quantifier-free fragment of first order logic. This logic includes uninterpreted functions, equality, ordering, constrained lambda expressions, and counter arithmetic. UCLID is a tool for specifying and verifying systems expressed in this logic. The paper makes two main contributions. First, we show that the logic is expressive enough to model components found in most modern microprocessors, independent of their actual sizes. Second, we demonstrate UCLID’s verification capabilities, ranging from full automation for bounded property checking to a high degree of automation in proving restricted classes of invariants. These techniques, coupled with a counterexample generation facility, are useful in establishing correctness of processor designs. We demonstrate UCLID’s methods using a case study of a synthetic model of an out-of-order processor where all the invariants were proved automatically.
1 Introduction
Present-day microprocessors are complex systems, incorporating features such as pipelining, speculative, out-of-order execution, register-renaming, exceptions, and multi-level caching. Several formal verification techniques, including symbolic model checking [4,12], theorem proving [17,2,11], and approaches based on decision procedures for the logic of equality with uninterpreted functions [8,6,20] have been used to verify such microarchitectures. In previous work, Bryant et al. [5,6] presented PEUF, a logic of positive equality with uninterpreted functions. PEUF has been shown to be expressive enough to model pipelined processors and also has a very efficient decision procedure based on Boolean techniques. Lahiri et al. [13] demonstrate the use of this technique for the verification of the superscalar, deeply pipelined MCORE¹ processor, by finding bugs in the real design.
¹ MCORE is a registered trademark of Motorola Inc.
However, this approach cannot handle models with unbounded queues and reorder buffers, which limits its applicability to processors with bounded resources. To overcome this problem, we have generalized PEUF to yield a more expressive logic called CLU [7], which is a logic of Counter Arithmetic with Lambda Expressions and Uninterpreted Functions. UCLID is a system for modeling and verifying systems modeled in CLU. It can be used to model a large class of infinite-state systems, including those with unbounded resources, while retaining the advantage of having an efficient decision procedure.

In this paper, we explore the application of UCLID to out-of-order processor designs. First, we illustrate the fact that CLU is expressive enough to model different processor components with unbounded resources. This includes components with infinite resources (e.g. infinite memory) or resources with finite but arbitrary size (e.g. a circular queue of arbitrary length). Next, we show that UCLID has useful verification capabilities that build upon the efficient decision procedure and a counterexample generator. We demonstrate the successful use of bounded property checking, i.e., checking an invariant on all the states of the system which are reachable within a fixed (bounded) number of steps from the reset state. The efficiency of UCLID's decision procedure enables a completely automatic exploration of a much larger state space than is possible with other techniques which can model infinite-state systems. UCLID can also be used for inductive invariant checking, for a restricted class of invariants of the form ∀x1 . . . ∀xk. Ψ(x1, . . . , xk), where Ψ(x1, . . . , xk) is a CLU formula. In our experience, this class of invariant is expressive enough to specify most invariants about out-of-order processors with unbounded size; these are also the invariants that occur most frequently in our work with UCLID.

As a case study, we present the modeling and verification of a synthetic out-of-order processor, OOO, with ALU instructions, infinite memory, arbitrarily large data words, and an unbounded-size reorder buffer (first with an infinite-size queue, and then with a finite but arbitrary-size circular buffer). Bounded property checking was used initially to debug the design. The processor model was then formally verified by inductive invariant checking, by showing that it refines an instruction set architecture (ISA) model. The highlight of the verification was that all the invariants were proved fully automatically. Moreover, very little manual effort was needed in coming up with auxiliary invariants, which were inferred fairly easily from counterexample traces.

Related Work. Jhala and McMillan [12] use compositional model checking to verify a microarchitecture with speculative, out-of-order execution, load-store buffers and branch prediction. Apart from requiring the user to write down the refinement maps and case-splits to prove lemmas, the rest of the verification is automatically performed using Cadence SMV. The out-of-order processor we verify is similar in complexity to the model of Tomasulo's algorithm McMillan verified using compositional reasoning [14]. The author acknowledges that the proof is not automatic and substantial human effort is required to decompose the proof into lemmas about small components of states.
The main advantage of using model checking is in automatically computing the strongest invariants for the most general state of the system; in our case, once the invariants have
been figured out by the user, the rest of the proof is fully automatic and no manual decomposition is required.

Berezin et al. [4] use special data structures called reference files, along with other symmetry reduction techniques, to manually decompose a generic out-of-order execution model to a finite model, which is verified using a model checker. The manual guidance involved in decomposing the model limits the applicability of this approach to small, simple designs.

Sawada and Hunt [17] use a theorem proving methodology to verify the correctness of microarchitectures with out-of-order execution, load-store instructions and speculation. They use a trace-table based intermediate representation called MAETT to record both committed and in-flight instructions. This method requires extensive user guidance during the verification process, first in discovering invariants, and then in proving them using the ACL2 theorem prover. The authors claim that automating the proof of the lemmas would make the verification easier. Automating proof is central to our work and we illustrate it with the verification of an out-of-order unit.

Hosabettu et al. [10,11] use a completion function approach to verify advanced microarchitectures which include reorder buffers, using the PVS [16] theorem prover. The method requires user ingenuity to construct a completion function for the different instruction types and then compose the different completion functions to obtain the abstraction function. The approach further requires extensive user guidance in discharging the proofs. Although the out-of-order unit we verify is of similar complexity to that in their original work [10], we shall show that the invariants required in our verification are few and simple, and they are discharged in a completely automatic manner.

Arons et al. [1,2] also verify out-of-order processors using refinement within the PVS theorem prover. Our verification scheme is very similar to their approach as it also uses prediction to establish the correspondence with a sequential ISA. The model verified in [1] is similar in complexity to ours but once again substantial manual assistance is required to prove the invariants using PVS.

Skakkebæk et al. [19] manually transform an out-of-order model of a processor to an intermediate in-order model, and use incremental flushing to show the correspondence of the intermediate model with the ISA model. The manual component in the entire process is significant in both constructing the intermediate model and proving correctness.

Velev [20] has verified an out-of-order execution unit exploiting positive equality and rewrite rules. The model does not have register-renaming and still considers bounded (although very large) resources.

The rest of the paper is organized as follows. We begin by describing the UCLID system in Section 2. This section outlines the underlying logic CLU in Section 2.1 and the verification techniques supported in the UCLID framework in Section 2.2. Modeling primitives for various processor components are described in Section 3. Section 4 describes the case study of the verification of an out-of-order processor unit (OOO) in detail. The section contains a description of the processor, all the invariants required, and the use of bounded property checking and inductive invariant checking for the verification of the OOO unit. We conclude in Section 5.
2 The UCLID System

2.1 The CLU Logic
The logic of Counter Arithmetic with Lambda Expressions and Uninterpreted Functions (CLU) is a generalization of the Logic of Equality with Uninterpreted Functions (EUF) [8] with constrained lambda expressions, ordering, and interpreted functions for the successor (succ) and predecessor (pred) operations, which we refer to as counter arithmetic.

bool-expr      ::= true | false | ¬bool-expr | (bool-expr ∧ bool-expr)
                 | (int-expr = int-expr) | (int-expr < int-expr)
                 | predicate-expr(int-expr, . . . , int-expr)
int-expr       ::= int-var | ITE(bool-expr, int-expr, int-expr)
                 | succ(int-expr) | pred(int-expr)
                 | function-expr(int-expr, . . . , int-expr)
predicate-expr ::= predicate-symbol | λ int-var, . . . , int-var . bool-expr
function-expr  ::= function-symbol | λ int-var, . . . , int-var . int-expr
Fig. 1. CLU Syntax.
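To give a feel for the syntax, the following Python sketch represents a few of the CLU constructs of Figure 1 as an abstract syntax tree and evaluates them under an interpretation of the uninterpreted symbols. The class and function names are hypothetical; UCLID's actual data structures are not described here, and the remaining Boolean connectives and the ordering operator would be handled analogously.

    from dataclasses import dataclass
    from typing import Callable, Dict

    @dataclass(frozen=True)
    class Var:  name: str                                  # lambda argument variable
    @dataclass(frozen=True)
    class Succ: arg: object                                # interpreted successor
    @dataclass(frozen=True)
    class Pred: arg: object                                # interpreted predecessor
    @dataclass(frozen=True)
    class ITE:  cond: object; then: object; els: object    # if-then-else
    @dataclass(frozen=True)
    class Eq:   lhs: object; rhs: object                   # equality of terms
    @dataclass(frozen=True)
    class App:  fn: object; args: tuple                    # function/predicate application
    @dataclass(frozen=True)
    class Lam:  params: tuple; body: object                # constrained lambda expression

    def evaluate(e, interp: Dict[str, Callable], env: Dict[str, int]):
        """Evaluate an expression under an interpretation of the uninterpreted symbols."""
        if isinstance(e, Var):  return env[e.name]
        if isinstance(e, Succ): return evaluate(e.arg, interp, env) + 1
        if isinstance(e, Pred): return evaluate(e.arg, interp, env) - 1
        if isinstance(e, Eq):   return evaluate(e.lhs, interp, env) == evaluate(e.rhs, interp, env)
        if isinstance(e, ITE):
            taken = e.then if evaluate(e.cond, interp, env) else e.els
            return evaluate(taken, interp, env)
        if isinstance(e, App):
            vals = [evaluate(a, interp, env) for a in e.args]
            if isinstance(e.fn, Lam):                      # apply a lambda expression
                return evaluate(e.fn.body, interp, {**env, **dict(zip(e.fn.params, vals))})
            return interp[e.fn](*vals)                     # uninterpreted symbol
        raise TypeError(e)

    # The memory-update idiom used later in Section 3.2:
    # M' = lambda addr . ITE(addr = A, D, M(addr)), evaluated at address A.
    interp = {"M": lambda a: 100 + a, "A": lambda: 3, "D": lambda: 7}
    m_new = Lam(("addr",),
                ITE(Eq(Var("addr"), App("A", ())), App("D", ()), App("M", (Var("addr"),))))
    assert evaluate(App(m_new, (App("A", ()),)), interp, {}) == 7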
Expressions in CLU describe means of computing four different types of values. Boolean expressions, also termed formulas, yield true or false. Integer expressions, also referred to as terms, yield integer values. Predicate expressions denote functions from integers to Boolean values. Function expressions denote functions from integers to integers. Figure 1 summarizes the expression syntax.

The simplest Boolean expressions are true and false. Boolean expressions can also be formed by comparing two integer expressions for equality or for ordering, or by applying a predicate expression to a list of integer expressions, or by combining Boolean expressions using Boolean connectives. Integer expressions can be integer variables², or can be formed by applying a function expression (including the interpreted functions succ and pred) to a list of integer expressions, or by applying the ITE (for "if-then-else") operator. The ITE operator chooses between two values based on a Boolean control value, i.e., ITE(true, x1, x2) yields x1 while ITE(false, x1, x2) yields x2. Function (predicate) expressions can be either function (predicate) symbols, representing uninterpreted functions (predicates), or lambda expressions, defining the value of the function (predicate) as an integer (Boolean) expression containing references to a set of argument variables. We will omit parentheses for function and predicate symbols with zero arguments, writing a instead of a(). An integer variable x is said to be bound in expression E when it occurs inside a lambda expression for which x is one of the argument variables. We say that an expression is well-formed when it contains no unbound variables.

The value of a well-formed expression in CLU is defined relative to an interpretation I of the function and predicate symbols. Let Z denote the set of integers. Interpretation I assigns to each function symbol of arity k a function from Z^k to Z, and to each predicate symbol of arity k a function from Z^k to {true, false}. The value of a well-formed expression E in CLU relative to an interpretation I, [E]_I, is defined inductively over the expression structure. We shall omit the details in this paper. A well-formed formula F is true under interpretation I if [F]_I is true. It is valid when it is true under all possible interpretations. It can be easily shown that CLU has a small-model property, i.e. a CLU formula F_clu is valid iff F_clu is valid over all interpretations whose domain size equals the number of distinct terms in F_clu.

The decision procedure for CLU checks the validity of a well-formed formula F by translating it to an equivalent propositional formula. The structure of the formula is exploited for positive equality [5] to dramatically reduce the number of interpretations to consider, yielding a very efficient decision procedure for CLU [7]. For brevity, we will not discuss the decision procedure in this paper.

² Integer variables are used only as the formal arguments of lambda expressions.
2.2 Verification with UCLID
The UCLID specification language can be used to specify a state machine, where the state variables either have primitive types — Boolean, enumerated, or (unbounded) integer — or are functions of integer arguments that evaluate to these primitive types. The concept of using functions or predicates as state variables has previously been used in Cadence SMV, and in theorem provers as well. A system is specified in UCLID by describing initial-state and next-state expressions for each state variable. The UCLID verification engine comprises a symbolic simulator that can be "configured" for different kinds of verification tasks, and a decision procedure for CLU. We shall illustrate the use of two particular techniques for the verification of out-of-order processors. The reader is referred to [7] for more details.

1. Bounded property checking: The system is symbolically simulated for a fixed number of steps starting from the reset state. At each step, the decision procedure is invoked to check the validity of some safety property. If the property fails, then we can generate a counterexample trace from the reset state.

2. Inductive invariant checking: The system is started from the most general state which satisfies the invariants and then simulated for one step. The invariants are checked at the next step to ensure that the state transition preserves the invariant. If the invariants hold for the reset state, and the invariants are preserved by the transition function, then the invariants hold for any reachable state of the model. As we shall see in the next section, we can express an interesting class of invariants with universal quantifiers and can automatically decide that the transition function preserves the invariants.
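The following Python sketch is an explicit-state analogue of these two modes, written only to make the two proof obligations concrete; UCLID itself works symbolically over CLU formulas, and the function names here are hypothetical.

    def bounded_check(reset_states, next_states, prop, k):
        """Check `prop` on every state reachable within k steps of a reset state."""
        frontier, seen = set(reset_states), set(reset_states)
        for depth in range(k + 1):
            for q in frontier:
                if not prop(q):
                    return False, q, depth        # counterexample state and depth
            frontier = {q2 for q in frontier for q2 in next_states(q)} - seen
            seen |= frontier
        return True, None, k

    def inductive_check(all_states, next_states, inv):
        """Check that `inv` is preserved by every single transition (the induction step)."""
        return all(inv(q2)
                   for q in all_states if inv(q)
                   for q2 in next_states(q))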
Counterexample Generation. One of the useful features of UCLID is its ability to generate counterexample traces, much like a model checker. A counterexample to a CLU formula F_clu is a partial interpretation I of the various function and predicate symbols in the formula. If the system has been symbolically simulated for k steps, then the interpretation I generated above can be applied to the expressions at each step, thereby resulting in a complete counterexample trace for k steps. The counterexample generation is useful both in bounded property checking, to discover bugs in the design, and in inductive invariant checking, for adding more auxiliary invariants.

Invariant Checking and Quantifiers. The logic of CLU has been restricted to be quantifier-free. Hence a well-formed formula in this logic can be decided for validity using the small-model property of CLU. Although this restriction is not severe in the modeling of the out-of-order processors we consider, the need for quantifiers becomes apparent when UCLID is used for invariant checking. The invariants we encounter are frequently of the form ∀x1 ∀x2 . . . ∀xk Φ(x1, . . . , xk), where x1, . . . , xk are integer variables free in the CLU formula Φ(x1, . . . , xk). To prove that such an invariant is actually preserved by the state transition function, we need to decide the validity of formulas of the form

∀x1 . . . ∀xm Ψ(x1, . . . , xm) =⇒ ∀y1 . . . ∀yk Φ(y1, . . . , yk)
(1)
where Ψ(x1, . . . , xm) and Φ(y1, . . . , yk) are CLU formulas, and x1 . . . xm and y1 . . . yk are free in Ψ(x1, . . . , xm) and Φ(y1, . . . , yk) respectively. In general, the problem of checking validity of first-order formulas of the form (1) with uninterpreted functions is undecidable [9]. Note that this class of formulas cannot be expressed in CLU, since CLU is a quantifier-free logic. However, UCLID has a preprocessor for formulas of the form (1), which are translated to a CLU formula that is a more conservative form of the original formula, i.e. if the CLU formula is valid then the original formula is valid. As we shall demonstrate, this has proved very effective for automatically checking the class of invariants encountered in our verification of out-of-order processors.

We employ a very simple heuristic to convert formulas of the form (1) to a CLU formula. First, the universal quantifiers to the right of the implication in (1) are removed by skolemization to yield the following formula, which is equivalent to the formula in (1):

∀x1 . . . ∀xm Ψ(x1, . . . , xm) =⇒ Φ(ŷ1, . . . , ŷk)
(2)
where ŷ1, . . . , ŷk are fresh function symbols of arity 0. Second, as in deductive verification, we instantiate x1 . . . xm with concrete terms and the universal quantifiers to the left of the implication are replaced by a finite conjunction over these concrete terms. The resulting formula is a CLU formula, whose validity implies the validity of (1). The set of terms over which to instantiate the antecedent is chosen as follows. Let T(F_clu) be the set of all terms (integer expressions) which occur in a CLU expression F_clu. For each bound variable xi in ∀x1 . . . ∀xm Ψ(x1, . . . , xm), we denote F_xi^j = { f | f is a function or predicate symbol and xi occurs as the j-th
argument to f in Ψ(x1, . . . , xm) }. Further, for each function or predicate symbol f which occurs in Ψ(x1, . . . , xm), denote G_f^k = { T | T ∈ T(Φ) and T appears as the k-th argument to f in Φ(ŷ1, . . . , ŷk) }. The set of arguments that each bound variable xi takes is given by A_xi = ∪_j { T | T ∈ G_f^j for some f ∈ F_xi^j }. Finally, Ψ(x1, . . . , xm) is instantiated over all the terms in the Cartesian product A_x1 × A_x2 × . . . × A_xm.

For example, consider the following quantified formula:

∀x1 ∀x2 . f(x1, x2) = g(x2, x1) =⇒ ∀y . f(h2(y), h1(y)) = g(h1(y), h2(y))

where Ψ ≡ f(x1, x2) = g(x2, x1) and Φ ≡ f(h2(ŷ), h1(ŷ)) = g(h1(ŷ), h2(ŷ)). In this case, F_x1^1 = {f}, F_x1^2 = {g} and F_x2^1 = {g}, F_x2^2 = {f}. Similarly, G_f^1 = {h2(ŷ)}, G_f^2 = {h1(ŷ)} and G_g^1 = {h1(ŷ)}, G_g^2 = {h2(ŷ)}. Finally, A_x1 = {h2(ŷ)} and A_x2 = {h1(ŷ)}. Hence the bound variables x1, x2 are instantiated over {h2(ŷ)} and {h1(ŷ)} respectively, and the CLU formula becomes

f(h2(ŷ), h1(ŷ)) = g(h1(ŷ), h2(ŷ)) =⇒ f(h2(ŷ), h1(ŷ)) = g(h1(ŷ), h2(ŷ))

which is valid. It is easy to see that this method can cause a blowup which is exponential in the number of bound variables in ∀x1 . . . ∀xm Ψ(x1, . . . , xm). However, our experience shows that the invariants we normally consider have very few bound variables, which the decision procedure for UCLID can handle. More importantly, we will demonstrate in Section 4.2 that this simple translation to a CLU formula helps us decide many formulas of the form (1).
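The instantiation-set computation is mechanical enough to sketch directly. The following Python fragment (hypothetical; UCLID's preprocessor is not shown in the paper) computes F_x^j, G_f^j, and A_x over a tiny tuple-based term representation and reproduces the example above, with ŷ written as the skolem constant "y*".

    from itertools import product

    # A term is a variable/constant name (str) or a tuple (f, arg1, ..., argn).
    def subterms(t):
        yield t
        if isinstance(t, tuple):
            for a in t[1:]:
                yield from subterms(a)

    def argument_positions(term, var):
        """F_x^j: pairs (f, j) such that `var` occurs as the j-th argument of f."""
        pos = set()
        for t in subterms(term):
            if isinstance(t, tuple):
                pos |= {(t[0], j) for j, a in enumerate(t[1:], 1) if a == var}
        return pos

    def candidate_terms(phi, f, j):
        """G_f^j: terms of phi that appear as the j-th argument of f in phi."""
        return {t[j] for t in subterms(phi)
                if isinstance(t, tuple) and t[0] == f and len(t) > j}

    def instantiations(psi, phi, bound_vars):
        """A_x for each bound variable x, then the Cartesian product A_x1 x ... x A_xm."""
        a = {x: {t for (f, j) in argument_positions(psi, x)
                     for t in candidate_terms(phi, f, j)}
             for x in bound_vars}
        return list(product(*(sorted(a[x], key=str) for x in bound_vars)))

    # The example from the text: Psi is f(x1, x2) = g(x2, x1) and
    # Phi is f(h2(y*), h1(y*)) = g(h1(y*), h2(y*)).
    psi = ("=", ("f", "x1", "x2"), ("g", "x2", "x1"))
    phi = ("=", ("f", ("h2", "y*"), ("h1", "y*")), ("g", ("h1", "y*"), ("h2", "y*")))
    print(instantiations(psi, phi, ["x1", "x2"]))
    # -> [(('h2', 'y*'), ('h1', 'y*'))], i.e. x1 := h2(y*) and x2 := h1(y*)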
3 Modeling Components of Microprocessors
This section presents techniques to model commonly found structures of modern superscalar processor designs. Primitive constructs have been drawn from a wide spectrum of industrial processor designs, including those of the MIPS R10000, PowerPC 620, and Pentium Pro [18].
3.1 Terms, Uninterpreted Functions, and Data Abstraction
Microprocessors are described using the standard term-level modeling primitives [17,12,21], where data words and bit-vectors are abstracted with terms, and functional units are abstracted with uninterpreted functions.
3.2 Memories
In this section, we look at a few different formulations of memories found in processors and show how the lambda notations offer a very natural modeling capability for memories. Indexed Memories. Data memory and register file are examples of indexed memories. The set of operations supported by this form of memory are read,
Modeling and Verification of Out-of-Order Microprocessors in UCLID
149
write. At any point in system operation, an indexed memory is represented by a function expression M denoting a mapping from addresses to data values. The initial state of the memory is given by an uninterpreted function symbol m0 which denotes an arbitrary memory state. The effect of a write operation with integer expressions A and D denoting the address and data values yields a function expression M′:

M′ = λ addr . ITE(addr = A, D, M(addr))

where M(addr) denotes a read from the memory at an address addr.

Content Addressable Memories. Register Rename units and Translation Lookaside Buffers (TLBs) are examples of Content Addressable Memory (CAM), that store associations between key and data. We represent a CAM as a pair C = ⟨C.data, C.present⟩, where C.present is a predicate expression such that C.present(k) is true for any key k that is stored in the CAM, and C.data is a function expression such that C.data(k) yields the data associated with key k, assuming the key is present. The next state components of a CAM for different operations are shown in Figure 2.

Operation        C′.present                              C′.data
Insert(C, K, D)  λ key . (key = K) ∨ C.present(key)      λ key . ITE(key = K, D, C.data(key))
Delete(C, K)     λ key . ¬(key = K) ∧ C.present(key)     C.data
Fig. 2. CAM operations
Simultaneous-update arrays. Many structures in processors, such as reorder buffers and reservation stations, snoop on the result bus to update an arbitrary number of entries in the array at a single instant. At any point in time, the entry at index i in M can be updated with a data value D(i) if the predicate P(i) is satisfied. The next state of the array is denoted as:

M′ = λ i . ITE(P(i), D(i), M(i))

Note that an arbitrary subset of entries in the array can get updated at any time.
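These idioms translate almost literally into any language with first-class functions. The following Python sketch (illustrative only; the names are hypothetical and the UCLID syntax differs) models each state component as a function and each update as a new function wrapping the old one.

    def mem_write(M, A, D):
        """Indexed memory: M' = lambda addr . ITE(addr = A, D, M(addr))."""
        return lambda addr: D if addr == A else M(addr)

    def cam_insert(present, data, K, D):
        """CAM insert: mark key K present and associate it with data D."""
        return (lambda key: key == K or present(key),
                lambda key: D if key == K else data(key))

    def cam_delete(present, data, K):
        """CAM delete: drop key K; the data component is left unchanged."""
        return (lambda key: key != K and present(key), data)

    def simultaneous_update(M, P, D):
        """Update every index i with D(i) wherever P(i) holds (e.g. a result-bus snoop)."""
        return lambda i: D(i) if P(i) else M(i)

    # Example: start from an arbitrary initial memory m0 (an uninterpreted function
    # in UCLID, a plain Python function here), write 42 to address 7, and read back.
    m0 = lambda addr: 0
    m1 = mem_write(m0, 7, 42)
    assert m1(7) == 42 and m1(3) == 0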
3.3 Queues and FIFO Buffers
Processors which employ out-of-order execution mechanisms or prefetching use a variety of queues in the microarchitecture. Instruction buffers, reorder buffers, queues for deferring store instructions to memory, load queues to hold the load instructions which suffer a cache miss are found in most modern processors. Queues. A finite circular queue of arbitrary length can be modeled by augmenting a CAM with two pointers to point to the head and the tail of the queue.
150
S.K. Lahiri, S.A. Seshia, and R.E. Bryant
Insertion (push) of data takes place only at the tail of the queue, and deletion (pop) takes place only at the head. Thus a circular queue can be modeled as a record Q = ⟨Q.data, Q.present, Q.head, Q.tail⟩. Q.data and Q.present are defined exactly as in Section 3.2. Q.head is the index of the head of the queue, and Q.tail is the index of the tail (next insertion point) of the queue. Let the symbolic constants s and e represent the start and end points of the array over which the circular queue is implemented. The queue is empty when Q.head = Q.tail and Q.present(Q.head) = false. The queue is full when Q.head = Q.tail and Q.present(Q.head) = true. To model the effect of succ and pred modulo a certain integer, we define the modulo increment and decrement functions succ[s,e] and pred[s,e] as follows:

succ[s,e] := λ i . ITE(i = e, s, succ(i))
pred[s,e] := λ i . ITE(i = s, e, pred(i))

Popping a data item from Q returns a new queue Q′ whose components have the values:

Q′.head    = succ[s,e](Q.head)
Q′.present = λ i . ¬(i = Q.head) ∧ Q.present(i)
Q′.tail    = Q.tail
Q′.data    = Q.data
Pushing a data item X into Q returns a new queue Q′ where:

Q′.head    = Q.head
Q′.present = λ i . (i = Q.tail) ∨ Q.present(i)
Q′.tail    = succ[s,e](Q.tail)
Q′.data    = λ i . ITE(i = Q.tail, X, Q.data(i))
This formulation of a queue is used when the index into the queue is used as a key in the system. The reorder buffers in processors follow this formulation, because the index in the reorder buffer uniquely identifies the instruction at that index. It is easy to see that for the case when succ[s,e] = succ and pred[s,e] = pred, we obtain an unbounded infinite queue; Q.present would be redundant in that situation.

FIFO Buffers. An alternate formulation of queues where the index in the queue is not used as a key (normally referred to as FIFO buffers) is also found in processors. Instruction buffers and load buffers are some examples of this form of queue. Every time an entry is dequeued, the entire content of the queue is shifted by one place towards the head of the queue. If the symbolic constant max denotes the maximum length of the queue, then the queue is full when (Q.tail = max) and is empty when (Q.tail = Q.head). The other operations of the queue are given below.

Operation   Q′.head   Q′.tail        Q′.data
Push(Q, X)  Q.head    succ(Q.tail)   λ i . ITE(i = Q.tail, X, Q.data(i))
Pop(Q)      Q.head    pred(Q.tail)   λ i . Q.data(succ(i))
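The circular-queue record can be sketched in the same functional style; the snippet below (hypothetical Python, with concrete bounds s and e that would be symbolic constants in UCLID) implements push, pop, and the empty/full tests described above.

    def make_queue(s, e):
        return {"head": s, "tail": s,
                "present": lambda i: False, "data": lambda i: None, "s": s, "e": e}

    def succ_mod(i, s, e):                 # succ_[s,e]
        return s if i == e else i + 1

    def push(q, x):
        tail, present, data = q["tail"], q["present"], q["data"]
        return {**q,
                "tail": succ_mod(tail, q["s"], q["e"]),
                "present": lambda i: i == tail or present(i),
                "data": lambda i: x if i == tail else data(i)}

    def pop(q):
        head, present = q["head"], q["present"]
        return {**q,
                "head": succ_mod(head, q["s"], q["e"]),
                "present": lambda i: i != head and present(i)}

    def is_empty(q): return q["head"] == q["tail"] and not q["present"](q["head"])
    def is_full(q):  return q["head"] == q["tail"] and q["present"](q["head"])

    q = push(push(make_queue(0, 3), "ld r1"), "add r2")
    assert q["data"](q["head"]) == "ld r1" and not is_full(q)
    q = pop(q)
    assert q["data"](q["head"]) == "add r2"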
4 OOO: A Synthetic Out-of-Order Processor
OOO is a simple, unspeculative, out-of-order execution unit with unbounded resources, depicted in Figure 3. The only instructions permitted are arithmetic and logical (ALU) instructions with two source operands and one destination operand. As shown in Figure 3, an instruction is read from program memory,
Fig. 3. OOO: An out-of-order execution unit. (Block diagram: program memory and a decode stage dispatch into a reorder buffer with HEAD and TAIL pointers; each entry holds valid?, opcode, PC, src1/src2 value, tag and valid bits, the destination register id, and the result; an ALU and result bus execute instructions; completed instructions retire to the register file.)
decoded, and dispatched to the end of the reorder buffer, which is modeled as an infinite queue. Instructions with ready operands can execute out-of-order. Finally, an instruction is retired (the program state updated) once it is at the head of the reorder buffer. On each step, the system nondeterministically chooses to either dispatch a new instruction, execute an instruction, or retire an instruction. The register file is modeled as an infinite memory indexed by register ID. Each entry of the register file has a bit reg.valid, a value reg.val, and a tag reg.tag. If the reg.valid bit is true, then reg.val contains a valid value; otherwise, reg.tag holds the tag of the most recent instruction that will write to this register. The reorder buffer has two pointers: rob.head, which points to the oldest instruction in the reorder buffer, and rob.tail, where a newly dispatched instruction would be added. The index of an entry in the reorder buffer serves as its tag. Each entry in the reorder buffer has a valid bit rob.valid indicating if the instruction has finished execution. It has fields for the two operands, rob.src1val and rob.src2val. The bit rob.src1valid indicates if the first operand is ready. If the first operand does not have valid data, rob.src1tag holds the tag for the instruction which would produce the operand data. There is a similar bit for the second operand. Each entry also contains the destination register identifier rob.dest and the result of the instruction rob.value to be written back. Further, each entry stores the program counter in rob.PC.
When an instruction is dispatched, if a source register is marked valid in the register file, the contents of that register are filled into the corresponding operand field for the instruction in the reorder buffer and it is marked valid. If the instruction which would write to the source register has finished execution, then the corresponding operand field copies the result of that instruction and the operand is marked valid. Otherwise, the operand copies the tag present with the source register into its tag field and the operand is marked invalid. When an instruction executes, it updates its result, and broadcasts the result on the result bus so that all other instructions in the reorder buffer that are waiting on it can update their operand fields. Finally, when a completed instruction reaches the head of the reorder buffer, it is retired. If the tag of the retiring instruction matches the reg.tag for the destination register, the result of the instruction is written back into the destination register, and that register is marked valid. Otherwise, the register file remains unchanged.
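The following Python sketch is a heavily simplified, explicit-state rendering of the dispatch, execute, and retire actions just described. It is hypothetical and finite (a Python list stands in for the unbounded rob, and a concrete tuple-building alu stands in for the uninterpreted Alu); it is meant only to make the dataflow of the three actions concrete, not to reproduce the UCLID model.

    def alu(op, a, b):
        return (op, a, b)                # concrete stand-in for the uninterpreted Alu

    def make_state(nregs):
        return {"reg": [{"valid": True, "val": 0, "tag": None} for _ in range(nregs)],
                "rob": [],               # a list stands in for the unbounded queue
                "head": 0}               # tag of the entry at the head of the rob

    def operand(st, r):
        """Dispatch-time operand lookup: register file first, then a finished producer."""
        reg = st["reg"][r]
        if reg["valid"]:
            return {"valid": True, "val": reg["val"], "tag": None}
        producer = st["rob"][reg["tag"] - st["head"]]
        if producer["valid"]:
            return {"valid": True, "val": producer["value"], "tag": None}
        return {"valid": False, "val": None, "tag": reg["tag"]}

    def dispatch(st, op, r1, r2, d):
        tag = st["head"] + len(st["rob"])
        st["rob"].append({"opcode": op, "src1": operand(st, r1), "src2": operand(st, r2),
                          "dest": d, "valid": False, "value": None})
        st["reg"][d] = {"valid": False, "val": None, "tag": tag}

    def execute(st, tag):
        e = st["rob"][tag - st["head"]]
        assert e["src1"]["valid"] and e["src2"]["valid"] and not e["valid"]
        e["value"] = alu(e["opcode"], e["src1"]["val"], e["src2"]["val"])
        e["valid"] = True
        for other in st["rob"]:          # broadcast the result on the result bus
            for s in ("src1", "src2"):
                if not other[s]["valid"] and other[s]["tag"] == tag:
                    other[s] = {"valid": True, "val": e["value"], "tag": None}

    def retire(st):
        assert st["rob"] and st["rob"][0]["valid"]
        e, tag = st["rob"].pop(0), st["head"]
        st["head"] += 1
        if st["reg"][e["dest"]]["tag"] == tag:      # no younger writer of this register
            st["reg"][e["dest"]] = {"valid": True, "val": e["value"], "tag": None}

    st = make_state(4)
    dispatch(st, "add", 1, 2, 3)         # r3 := r1 op r2
    execute(st, 0)
    retire(st)
    assert st["reg"][3] == {"valid": True, "val": ("add", 0, 0), "tag": None}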
4.1 Bounded Property Checking of OOO
The verification of the OOO model was carried out in two phases. In the first phase, we applied bounded property checking to eliminate most of the bugs present in the original model of OOO. For instance, in the original model, a dispatched instruction only looked at the register file for its source operands. If the source was invalid, it was enqueued into the reorder buffer with its operand invalid. The counterexample trace demonstrated that an instruction in the rob can hold the tag of an already retired instruction. Bounded property checking serves not only to discover bugs, but also as a very useful semi-formal verification tool. We can argue that for a model with a circular rob of size k, all the states of the OOO where (i) the length of the rob is anywhere between 0, . . . , k, (ii) the value of the control bits rob.src1valid, rob.src2valid, rob.valid are arbitrary for each entry in the rob, and (iii) the control bit of each register reg.valid is arbitrary, can be reached within 2k steps from the reset state. 2k steps are needed to reach the state when the rob is full and all the instructions in the rob have finished execution. Thus a property verified up to 2k steps gives a reasonable guarantee that it would always hold for an implementation of OOO where the number of rob entries is bounded by k. This also means that if there is a bug for a particular implementation of OOO where the size of the rob is bounded by k, then there is a high likelihood of the bug being detected within 2k steps of bounded property checking. In Fig. 4, we demonstrate that the efficiency of the decision procedure enables UCLID to perform bounded property checking for a reasonable number of steps (up to 20), thus providing a guarantee for OOO models with up to 10 rob entries. Figure 4 shows the result for checking the following two properties:

1. tag-consistency:
∀r1 ∀r2 [((r1 ≠ r2) ∧ ¬reg.valid(r1) ∧ ¬reg.valid(r2)) =⇒ (reg.tag(r1) ≠ reg.tag(r2))]

2. rf-rob:

∀r [¬reg.valid(r) =⇒ rob.dest(reg.tag(r)) = r]
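Stated over the explicit-state representation used in the earlier sketch (a hypothetical stand-in for the UCLID model), the two properties become simple executable predicates:

    def tag_consistency(st):
        """Distinct invalid registers never carry the same tag."""
        tags = [r["tag"] for r in st["reg"] if not r["valid"]]
        return len(tags) == len(set(tags))

    def rf_rob(st):
        """An invalid register's tag names a rob entry whose destination is that register."""
        return all(st["rob"][r["tag"] - st["head"]]["dest"] == i
                   for i, r in enumerate(st["reg"]) if not r["valid"])

    st = {"head": 0,
          "reg": [{"valid": True, "val": 0, "tag": None},
                  {"valid": False, "val": None, "tag": 0}],
          "rob": [{"dest": 1, "valid": False}]}
    assert tag_consistency(st) and rf_rob(st)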
The experiments were performed on a 1400MHz Pentium with 256MB memory running Linux. zChaff [15] was used as the SAT solver within UCLID.
To compare the performance of UCLID's decision procedure, we also used SVC [3] to decide the CLU formulas. Although SVC's logic is more expressive than CLU (includes bit-vectors and linear arithmetic in addition to CLU constructs), the decision procedure for CLU outperforms SVC for checking the properties of interest in bounded property checking. The key point to note is that UCLID (coupled with powerful SAT solvers like zChaff) enables automatic exploration of much larger state spaces than was previously possible with other techniques.

Property          #steps   Fclu size   Fbool size   UCLID time   SVC time
tag-consistency        6         346         1203         0.87       0.22
                      10        2566        15290        10.80     233.18
                      14        7480        62504        76.55    > 5 hrs
                      18       15098       173612       542.30    > 1 day
                      20       19921       263413      1679.12    > 1 day
rf-rob                10        2308        14666        10.31     160.84
                      14        7392        61196        71.29    > 8 hrs
                      18       14982       171364       485.09    > 1 day
                      20       19791       260599       777.12    > 1 day
Fig. 4. Experimental results for Bounded Property Checking with OOO. Here “steps” indicates the number of steps of symbolic simulation, “Fclu ” denotes the CLU formula obtained after the symbolic simulation, “Fbool ” denotes boolean formula obtained after translating a CLU formula to a propositional formula by the decision procedure, the “size” of a formula denotes the number of distinct nodes in the Directed Acyclic Graph (DAG) representing the formula. “UCLID time” is the time taken by UCLID decision procedure and “SVC time” is the time taken by SVC 1.1 to decide the CLU formula. “tag-consistency” and “rf-rob” denote the properties to be verified.
4.2 Verification of the OOO Unit by Invariant Checking
We verify the OOO processor by proving a refinement map between OOO and a sequential Instruction Set Architecture (ISA) model. The ISA contains a program counter Isa.PC and a register file Isa.rf. The program counter Isa.PC is synchronized with the program counter for OOO. Isa.rf maintains the state of the register file when all the instructions in the reorder buffer (rob) have retired and the rob is empty. Every time an instruction I = (r1,r2,d,op) is decoded and put into the rob, the result of the instruction is computed and written to the destination register d in the ISA register file as follows:

Isa.rf[d] ← Alu(op, Isa.rf[r1], Isa.rf[r2])

where Alu is an uninterpreted function to abstract the actual computation of the execution unit. To state the invariants for the OOO processor, we maintain some auxiliary state elements in addition to the state variables of the OOO unit. These structures are very similar to the auxiliary structures used by McMillan [14] and Arons [1] for
verifying the correctness of out-of-order processors. We maintain the following structures to reason about the correctness.

1. A shadow reorder buffer, Shadow.rob, where each entry contains the correct values of the operands and the result. This structure is used to reason about the correctness of values in the rob entries. Shadow.rob is a triple (Shadow.value, Shadow.src1val, Shadow.src2val). Shadow.value(t) contains the correct value of rob.value(t) in the rob. Similarly, the other fields in the Shadow.rob contain the correct values for the two data operands. When an instruction I = (r1,r2,d,op) is decoded, the Shadow.rob structure at rob.tail is updated as follows:

Shadow.value(rob.tail)   ← Alu(op, Isa.rf(r1), Isa.rf(r2))
Shadow.src1val(rob.tail) ← Isa.rf(r1)
Shadow.src2val(rob.tail) ← Isa.rf(r2)

2. A shadow program counter Shadow.PC, which points to the next instruction to be retired. It is incremented every time an instruction retires in OOO. The Shadow.PC is used to prove that OOO retires instructions in a sequential order.

Correctness criteria. The correctness is established by proving the following refinement map between the register file of the OOO unit and the ISA register file.

∀r.[reg.valid(r) =⇒ (Isa.rf(r) = reg.val(r))]
(ΨHa )
The lemma states that if a register is not the destination of any of the instructions in the rob, then the values in the OOO model and the ISA model are the same.
In-order Retirement. We also prove that the OOO retires instructions in sequential order with the following lemma.
Shadow.PC = ITE(rob.head = rob.tail, rob.PC(rob.head), PC)
(ΨP C )
Note that this lemma is not required for establishing the correctness of OOO.
4.3
Invariants for the OOO Unit
We needed to come up with 12 additional invariants to establish the correctness of the OOO model, and we describe all of them in this section. The invariants broadly fall under three categories. The first four invariants, ΨA , ΨB1 , ΨC , ΨD are concerned with maintaining a consistent state within the OOO model. These invariants are required mainly due to the redundancy present in the OOO model. The invariants ΨE , ΨGa establish the correctness of data in the register file and rob. Lastly, invariants ΨGb , ΨHc , ΨK1 are the auxiliary invariants, which were required to prove some of the invariants above. The invariant names have no special bearing, except ΨB1 , ΨE1 and ΨK1 denote that there are similar invariants
for the second operand. For the sake of readability, we define ∀t.Φ(t) to be an abbreviation for ∀t.((rob.head ≤ t < rob.tail) =⇒ Φ(t)).
Consistency Invariants. Invariant ΨA asserts that an instruction in the rob can execute only when both the operands are ready.
∀t.[rob.valid(t) =⇒ (rob.src1valid(t) ∧ rob.src2valid(t))]
(ΨA )
For any rob entry t, if any operand is not valid, then the operand should hold the tag of an older entry which produces the data but has not yet completed execution. There is a similar invariant for the second operand.
∀t.[¬rob.src1valid(t) =⇒ (¬rob.valid(rob.src1tag(t)) ∧ (rob.head ≤ rob.src1tag(t) < t))]
(ΨB1 )
Invariant ΨC claims that if the instruction at index t writes to a register r = rob.dest(t), then r cannot have valid data and the tag carried by r must be either t or a newer entry.
∀t.[(t ≤ reg.tag(rob.dest(t)) < rob.tail) ∧ ¬reg.valid(rob.dest(t))]
(ΨC )
Invariant ΨD asserts that a register r can only be modified by an active instruction in the rob which has r as the destination register. ∀r.[¬reg.valid(r) =⇒ ((rob.dest(reg.tag(r)) = r) ∧ (rob.head ≤ reg.tag(r) < rob.tail))]
(ΨD )
All the above invariants restrict the state of the OOO model to be a reachable state. Note that there is no reference to any shadow structure, because the shadow structures only provide correctness of values in the OOO model. Correctness Invariants. Invariant ΨE1 establishes the constraint between the Shadow.src1val and rob.src1val. It states that if any rob entry has a valid operand, then it should be correct (equals the value in the Shadow structure for that entry). There is a similar invariant for the second operand. ∀t.[rob.src1valid(t) =⇒ (Shadow.src1val(t) = rob.src1val(t))]
(ΨE1 )
The following invariant asserts that if an rob entry has completed execution, then the result matches with the value in the shadow rob. ∀t.[rob.valid(t) =⇒ (Shadow.value(t) = rob.value(t))]
(ΨGa )
Auxiliary Invariants. We needed the following auxiliary invariants for the Shadow.src1val, Shadow.value and Isa.rf respectively to prove the previous invariants inductive. ∀t.[¬rob.src1valid(t) =⇒ Shadow.src1val(t) = Shadow.value(rob.src1tag(t))] (ΨK1 )
The above invariant asserts that the correct value of a data operand which is not ready is the result of the instruction which would produce the data. ∀t.[(Shadow.value(t) = Alu(rob.opcode(t), Shadow.src1val(t), Shadow.src2val(t)))]
(ΨGb )
The above invariant relates the result of execution to the correct value for any entry. ∀r.[¬reg.valid(r) =⇒ Isa.rf (r) = Shadow.value(reg.tag(r))]
(ΨHc )
The invariant ΨHc relates the value of a register r in the shadow register file with the result of the instruction which would write back to the register. Finally, we conjoin all the invariants to make the monolithic invariant Ψall. Since ∀ distributes over ∧, we pull the quantifiers out in the formula given here:
Ψall = ∀r.∀t.[ΨA(t) ∧ ΨB1(t) ∧ ΨB2(t) ∧ ΨC(t) ∧ ΨD(r) ∧ ΨE1(t) ∧ ΨE2(t) ∧ ΨK1(t) ∧ ΨK2(t) ∧ ΨGa(t) ∧ ΨGb(t) ∧ ΨHa(r) ∧ ΨHc(r)]
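The invariants above are stated over the UCLID term-level model. Purely as an illustration (this is not the UCLID model: it uses concrete integers in place of uninterpreted functions and terms, and the class and function names are ours), the following Python sketch shows the decode-time bookkeeping and the kind of per-entry checks that ΨA, ΨE1, ΨGa and ΨHa describe.

# Illustrative sketch only: a concrete toy version of the OOO state and the
# shadow structures, used to show what the decode update and a few of the
# invariants check on a single state.
def alu(op, a, b):                          # stand-in for the uninterpreted Alu
    return a + b if op == "add" else a - b

class ToyOOO:
    def __init__(self, nregs):
        self.isa_rf = [0] * nregs           # Isa.rf, the ISA register file
        self.reg_val = [0] * nregs          # reg.val
        self.reg_valid = [True] * nregs     # reg.valid
        self.reg_tag = [None] * nregs       # reg.tag
        self.rob = []                       # one dict per dispatched instruction
        self.shadow = []                    # Shadow.rob: correct operand/result values

    def decode(self, r1, r2, d, op):
        """Dispatch I = (r1,r2,d,op): update rob, Shadow.rob and Isa.rf."""
        t = len(self.rob)                   # rob.tail of the unbounded rob
        self.shadow.append({"value": alu(op, self.isa_rf[r1], self.isa_rf[r2]),
                            "src1val": self.isa_rf[r1],
                            "src2val": self.isa_rf[r2]})
        self.rob.append({"dest": d, "op": op, "valid": False, "value": None,
                         "src1valid": self.reg_valid[r1],
                         "src2valid": self.reg_valid[r2],
                         "src1val": self.reg_val[r1],
                         "src2val": self.reg_val[r2]})
        self.reg_valid[d] = False           # d now waits on rob entry t
        self.reg_tag[d] = t
        self.isa_rf[d] = alu(op, self.isa_rf[r1], self.isa_rf[r2])   # Isa.rf[d] update

    def check_invariants(self):
        # Psi_A: an executed entry had both operands ready.
        for e in self.rob:
            if e["valid"]:
                assert e["src1valid"] and e["src2valid"]
        # Psi_E1 / Psi_Ga: valid operands and results agree with the shadow rob.
        for e, s in zip(self.rob, self.shadow):
            if e["src1valid"]:
                assert e["src1val"] == s["src1val"]
            if e["valid"]:
                assert e["value"] == s["value"]
        # Psi_Ha: registers with no pending writer agree with the ISA register file.
        for r in range(len(self.isa_rf)):
            if self.reg_valid[r]:
                assert self.isa_rf[r] == self.reg_val[r]

In the actual proof these checks are of course not executed on concrete states; they are discharged symbolically, for all states satisfying Ψall, as described next.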
Proof of the invariants. Some of the invariants were manually deduced from a failure trace from the counterexample generator. The most complicated among them were the invariants for the shadow register file and shadow rob entries. We spent two man-days to come up with all the invariants. The invariants were proved in a completely automatic way by automatically translating the invariants to a formula in CLU by the method described in Section 2.2, and using the decision procedure for CLU to decide the formula. As we claimed earlier, the translation of quantified formulas to a CLU formula does not blow up the formula in a huge way, since most of the formulas have at most two bound variables. For instance, consider the proof for the invariant ΨHa as given in the UCLID framework: decide(Inv_all => Inv_Ha_next(r1));
Here the invariant ΨHa (written above as Inv_Ha) is checked in the next state if Ψall (written as Inv_all) holds in the current state for all registers r and all tags t. There are only two bound variables r, t in the antecedent. Since all our invariants are of the form ∀r.Φ(r) or ∀t.Ψ(t), we had to consider at most two bound variables in the antecedent. The final proof script had 13 such formulas (one for each invariant) to be decided, and they were discharged automatically by UCLID in 76.44 sec. on a 1400 MHz Pentium IV Linux machine with 256 MB of memory. The memory requirement was less than 20 MB for the entire run. There is still a lot of scope for improvement in the decision procedure. The proof script consisted of the shadow structures, the definitions of the invariants mentioned in Section 4.3, and 13 lines of proof to prove all the invariants in the next state. To prove the lemma ΨPC for the in-order retirement, we required two more auxiliary lemmas. nPC is an uninterpreted function to obtain the next sequential value of a program counter.
∀t.[(t > rob.head) =⇒ rob.PC(t) = nPC(rob.PC(t − 1))]
(ΨP C1 )
[(rob.head = rob.tail) =⇒ PC = nPC(rob.PC(rob.tail − 1))]
(ΨP C2 )
4.4
Using a Circular Reorder Buffer
The model verified in this section is somewhat unrealistic because of the infinite reorder buffer, since it never wraps around. Most reorder buffer implementations use a finite circular queue to model the reorder buffer; thus, tags are reused, unlike in the model above. Hence we re-did the verification using a model with a circular buffer of arbitrary size. We needed very little change to our original proof. First, the reorder buffer was modeled as a circular buffer with modulo successor and predecessor functions as defined in Section 3.3. Second, each rob entry had an additional field rob.present to indicate whether the entry holds a valid instruction, and to disambiguate between checking the rob for full or empty. Third, the “<” operation was modified to take into account the wrap-around for circular queues. Finally, we had to establish an invariant between rob.present, rob.head and rob.tail to ensure that an entry is present if and only if it lies between rob.head and rob.tail. None of the invariants had to be modified except as mentioned above. Hence the proof of the processor with a circular reorder buffer went through without any major changes to the model or invariants.
4.5
Liveness Proof
We give a high level proof sketch of liveness for the OOO processor similar to Hosabettu’s proof [10] in PVS. Although this proof is not mechanical, it uses a high level induction which utilizes various invariants that have been proved in the previous section using UCLID. Proposition 1. Every dispatched instruction eventually gets executed and retired, assuming fair scheduling of instructions. – We show that every instruction eventually reaches the head of the rob, and that the instruction at the head is eventually retired. Our proof proceeds by induction on the size of the rob. – The base case, when the rob is empty is trivial. – Let us assume that the proposition holds when the rob has less than k entries in it. Now, consider the case when there are k entries in the reorder buffer. Observe that rob.head (rob.tail) is incremented only when an instruction is retired (dispatched). • A fair scheduler will attempt to retire the instruction at the head of rob infinitely often, therefore the instruction at the rob.head is eventually retired if it gets executed. This is because the instruction at the head is retired if the rob.valid bit is set for the entry at the head. • Observe that a fair scheduler attempts to execute any instruction in the rob infinitely often, thus an instruction at index t, executes if it has both the operands ready (i.e. rob.src1valid(t) and rob.src2valid(t) are true). But, invariant ΨB1 and ΨB2 assert that both rob.src1valid and rob.src2valid for rob.head are true. Thus the instruction at the head of the rob eventually gets executed and thus, retired.
Thus for any finite sequence of instructions, the rob eventually becomes empty. With an empty rob, invariant ΨD ensures that all the registers have reg.valid bit as true. Hence, by invariant ΨHa , we know that the state of the register files in both OOO and the ISA model would eventually match.
5
Conclusions and Future Work
We have demonstrated the use of UCLID in modeling and verifying out-of-order processor designs. We showed the use of two different verification techniques that provide varying correctness guarantees and degrees of automation, ranging from bounded property checking, which provides full automation and debugging facilities, to invariant checking, which allows for full correctness checking at the cost of a manual assistance in deriving the invariants. Our hope is that the automation provided by bounded property checking and the proof of invariants would be of great help in analyzing large designs. We are currently trying to extend the verification to an out-of-order unit with exceptions, load-store instructions and branch-prediction. We have also started the verification of the MIPS R10000 processor as an industrial case study. Acknowledgments. This research was supported in part by the Semiconductor Research Corporation, Contract RID 684, and by the Gigascale Research Center, Contract 98DT-660. The second author was supported in part by a National Defense Science and Engineering Graduate Fellowship.
References 1. T. Arons and A. Pnueli. Verifying Tomasulo’s algorithm by Refinement. In Proc. VLSI Design Conference (VLSI ’99), 1999. 2. T. Arons and A. Pnueli. A comparison of two verification methods for speculative instruction execution. In Proc. Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2000), March 2000. 3. C. Barrett, D. Dill, and J. Levitt. Validity checking for combinations of theories with equality. In M. Srivas and A. Camilleri, editors, Formal Methods in ComputerAided Design (FMCAD ’96), LNCS 1166, pages 187–201. Springer-Verlag, November 1996. 4. S. Berezin, A. Biere, E. M. Clarke, and Y. Zhu. Combining symbolic model checking with uninterpreted functions for out of order microprocessor verification. In Formal Methods in Computer-Aided Design(FMCAD ’98), LNCS 1522. Springer-Verlag, November 1998. 5. R. E. Bryant, S. German, and M. N. Velev. Exploiting positive equality in a logic of equality with uninterpreted functions. In N. Halbwachs and D. Peled, editors, Computer-Aided Verification (CAV ’99), LNCS 1633, pages 470–482. SpringerVerlag, July 1999. 6. R. E. Bryant, S. German, and M. N. Velev. Processor verification using efficient reductions of the logic of uninterpreted functions to propositional logic. ACM Transactions on Computational Logic, 2(1):1–41, January 2001.
7. R. E. Bryant, S. K. Lahiri, and S. A. Seshia. Modeling and verifying systems using a logic of counter arithmetic with lambda expressions and uninterpreted functions. In Proc. Computer-Aided Verification (CAV’02) (to appear), July 2002. 8. J. R. Burch and D. L. Dill. Automated verification of pipelined microprocessor control. In D. L. Dill, editor, Computer-Aided Verification (CAV ’94), LNCS 818, pages 68–80. Springer-Verlag, June 1994. 9. Y. Gurevich. The decision problem for standard classes. The Journal of Symbolic Logic, 41(2):460–464, June 1976. 10. R. Hosabettu, G. Gopalakrishnan, and M. Srivas. Proof of correctness of a processor with reorder buffer using the completion function approach. In N. Halbwachs and D. Peled, editors, Computer-Aided Verification (CAV 1999), volume 1633 of Lecture Notes in Computer Science. Springer-Verlag, July 1999. 11. R. Hosabettu, G. Gopalakrishnan, and M. Srivas. Verifying advanced microarchitectures that support speculation and exceptions. In A. Emerson and P. Sistla, editors, Computer-Aided Verification (CAV 2000), LNCS 1855. Springer-Verlag, July 2000. 12. R. Jhala and K. McMillan. Microarchitecture verification by compositional model checking. In G. Berry, H. Comon, and A. Finkel, editors, Computer-Aided Verification, volume 2102 of Lecture Notes in Computer Science, pages 396–410. SpringerVerlag, July 2001. 13. S. Lahiri, C. Pixley, and K. Albin. Experience with term level modeling and verification of the MCORE microprocessor core. In Proc. IEEE High Level Design Validation and Test (HLDVT 2001), November 2001. 14. K. McMillan. Verification of an implementation of Tomasulo’s algorithm by compositional model checking. In A. J. Hu and M. Y. Vardi, editors, ComputerAided Verification (CAV 1998), volume 1427 of Lecture Notes in Computer Science. Springer-Verlag, June 1998. 15. M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik. Chaff: Engineering an efficient SAT solver. In 38th Design Automation Conference (DAC ’01), June 2001. 16. S. Owre, J. M. Rushby, and N. Shankar. PVS: A prototype verification system. In Deepak Kapur, editor, 11th International Conference on Automated Deduction (CADE), volume 607 of Lecture Notes in Artificial Intelligence, pages 748–752. Springer-Verlag, June 1992. 17. J. Sawada and W. Hunt. Processor verification with precise exceptions and speculative execution. In A. J. Hu and M. Y. Vardi, editors, Computer-Aided Verification (CAV ’98), LNCS 1427. Springer-Verlag, June 1998. 18. J. P. Shen and M. Lipasti. Fundamentals of Superscalar Processor Design. In Press, 2001. 19. J. U. Skakkaebaek, R. B. Jones, and D. L. Dill. Formal verification of out-oforder execution using incremental flushing. In A. J. Hu and M. Y. Vardi, editors, Computer-Aided Verification (CAV ’98), LNCS 1427. Springer-Verlag, June 1998. 20. M. N. Velev. Using rewriting rules and positive equality to formally verify wideissue out-of-order microprocessors with a reorder buffer. In Design, Automation and Test in Europe (DATE ’02), pages 28–35, March 2002. 21. M. N. Velev and R. E. Bryant. Formal Verification of Superscalar Microprocessors with Multicycle Functional Units, Exceptions and Branch Predication. In 37th Design Automation Conference (DAC ’00), June 2000.
On Solving Presburger and Linear Arithmetic with SAT Ofer Strichman Computer Science, Carnegie Mellon University, Pittsburgh, PA [email protected]
Abstract. We show a reduction to propositional logic from quantifier-free Presburger arithmetic and disjunctive linear arithmetic, based on Fourier-Motzkin elimination. While the complexity of this procedure is not better than competing techniques, it has practical advantages in solving verification problems. It also promotes the option of deciding a combination of theories by reducing them to this logic.
1
Introduction
Quantifier-Free Presburger (QFP) formulas and Disjunctive Linear Arithmetic (DLA) are two major decidable theories that are frequently used in formal verification of both hardware designs and software. The syntax of both theories is the same: a Boolean combination of predicates of the form Σ_{j=1}^{n} aj · xj ≤ b, where the coefficients aj and the bound b are rational constants¹. The formula belongs to DLA if ∀j, xj ∈ R, and to QFP if ∀j, xj ∈ Z, where R and Z are the sets of reals and integers, respectively. Decision procedures for both of these theories handle disjunctions by ‘case-splitting’, i.e., transforming the formula to Disjunctive Normal Form (DNF) and then solving each clause separately. Naive case-splitting procedures explicitly transform the formula to DNF, and are therefore very restricted in the size of the formula that they can handle (the number of clauses in the resulting formula can be exponential in the size of the original formula). More sophisticated implementations split the formula only ’as needed’, which in many cases increases the capacity of these procedures. The lower-bound complexity of solving each clause, i.e., a conjunction of constraints, is exponential for QFP and polynomial for Linear Arithmetic [11]. The latter is rarely better in practice compared to some exponential methods like Simplex [6], and this is especially true for small or medium size problems, such as the ones typically encountered in formal verification. For this reason, as far as we know, no automated theorem prover uses a polynomial procedure for linear arithmetic. The most commonly used method by theorem
This research was supported in part by the Office of Naval Research (ONR) and the Naval Research Laboratory (NRL) under contract no. N00014-01-1-0796.
1 Linear arithmetic doesn’t have the requirement that the coefficients are rational. We make this assumption nevertheless, for simplicity.
M.D. Aagaard and J.W. O’Leary (Eds.): FMCAD 2002, LNCS 2517, pp. 160–170, 2002. c Springer-Verlag Berlin Heidelberg 2002
provers is the Fourier-Motzkin (FM) variable elimination method [2], which is used in popular tools like PVS [14], ICS [8], SVC [1], IMPS [7] and others. A variation of FM can also be used for solving QFP, as is done in the omega library [18]. We will describe in detail how this method works in section 2. Although FM has a worst-case super-exponential complexity, it is popular because it is frequently faster than competing methods for the size of instances encountered in practice. Hence, the current practice in solving both QFP and DLA is to solve, in the worst case, an exponential number of FM instances. Theoretically this is not the best possible, as explained above, but experience has shown that for the type of formulas encountered in verification, it is adequate. The procedure described in this paper solves one FM instance in order to generate a SAT instance, and then solves this instance with a standard SAT solver. It has a similar complexity to what we just described as the common practice, but we expect it to be better in practice because it avoids explicit case-splitting². In other words, it is expected to be an efficient procedure for deciding formulas whose equivalent DNF has many clauses, each of which is relatively small. An efficient reduction of QFP and DLA to propositional logic will not only make it possible to (potentially) solve them faster, but also to integrate them with other theories at the propositional logic level. Many other decidable theories that are frequently encountered in verification (e.g. bit-vector arithmetic [10]) already have such reductions to propositional logic. Solving mixed theories by reducing them to a common logic will facilitate the application of various learning techniques between sub-expressions that originate from different theories. Furthermore, current popular techniques for integrating theories, like ICS [8] and Nelson et al. [13], invoke different procedures for deciding each theory. The overhead of the mutual updating of these procedures can become significant. This overhead is avoided if only one procedure (SAT in this case) is used. In the next section we briefly describe the FM method. In section 3 we present a propositional version of the same procedure and explain how it can be used to reduce the satisfiability problem of a linear system to SAT. In section 4 we present a method called ‘conjunctions matrices’, which is useful for reducing the complexity of the procedure described in section 3. In section 5 we list experimental results.
2
Fourier-Motzkin Elimination
A linear inequality predicate over n variables has the form Σ_{j=1}^{n} aj · xj ≤ b. A conjunction of m such constraints is conveniently described by C : AI ≤ b
2 In previous research [20], we detailed the reasons why SAT solvers are more efficient than case splitting in handling propositional combinations of formulas, although both have the same theoretical complexity. Propositional SAT checkers apply techniques like learning, pruning and guidance (‘guidance’ refers to heuristics for prioritizing the internal steps of the decision procedure) that cannot be easily imitated by case-splitting.
where A is an m × n real-valued coefficient matrix, I = x1 ... xn is a vector of n variables, and b is a vector of real-valued bounds. Given a variable order x1 ... xn, the FM method eliminates (existentially quantifies) them in decreasing order. Each variable is eliminated by projecting its constraints on the rest of the system. The procedure works as follows: at each elimination step, the list of constraints is partitioned into three segments, according to the sign of the coefficient of xn in each constraint. Let ai,n denote the coefficient of xn in constraint i, for i ∈ [1..m]. The three segments are:
1. For all i s.t. ai,n > 0:   ai,n · xn ≤ bi − Σ_{j=1}^{n−1} ai,j · xj
2. For all i s.t. ai,n < 0:   Σ_{j=1}^{n−1} ai,j · xj − bi ≤ −ai,n · xn
3. For all i s.t. ai,n = 0:   Σ_{j=1}^{n−1} ai,j · xj ≤ bi
The first and second segments correspond to upper and lower bounds on xn, respectively. To eliminate xn, FM replaces each pair of lower and upper bound constraints L ≤ cl · xn and cu · xn ≤ U, where cl, cu > 0, with the new constraint cu · L ≤ cl · U. If, in the process of elimination, the procedure derives the constraint c ≤ 0 where c is a constant greater than 0, it terminates and indicates that the system is unsatisfiable. Note that it is possible that variables are not bounded from both ends. In this case it is possible to simplify the system by removing these variables together with all the constraints to which they belong. This can make other variables unbounded. Thus, this simplification stage iterates until no such variables are left. The FM method can result in the worst case in m^(2^n) constraints, which is the reason that it is only suitable for a relatively small set of constraints with a small number of variables. There are various heuristics for choosing the elimination order. A standard greedy criterion gives priority to variables whose elimination produces the fewest new constraints.
Example 1. Consider the following formula:
ϕ = x1 − x2 ≤ 0  ∧  x1 − x3 ≤ 0  ∧  −x1 + 2x3 + x2 ≤ 0  ∧  −x3 ≤ −1
The following table demonstrates the elimination steps following the variable order x1, x2, x3:
Eliminated var   Lower bound           Upper bound     New constraint
x1               −x1 + 2x3 + x2 ≤ 0    x1 − x2 ≤ 0     2x3 ≤ 0
x1               −x1 + 2x3 + x2 ≤ 0    x1 − x3 ≤ 0     x2 + x3 ≤ 0
x2               no lower bound        —               —
x3               −x3 ≤ −1              2x3 ≤ 0         2 ≤ 0
The last line results in a contradiction, which implies that this system is unsatisfiable. The extension of FM to handle a combination of strict (<) and weak (≤) inequalities is simple. If either the lower or the upper bound is a strict inequality, then so is the resulting constraint.
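To make the elimination loop concrete, here is a small illustrative Python sketch of FM over the rationals (our own sketch, not the implementation used later in the paper). A constraint Σ aj · xj ≤ b is kept as a pair (coeffs, b), and the last variable is eliminated repeatedly:

from fractions import Fraction

def fm_eliminate_last(constraints, n):
    # One FM step: eliminate x_n (the last of n variables).
    # constraints: list of (coeffs, b) meaning sum(coeffs[j]*x[j]) <= b, len(coeffs) == n.
    lower, upper, rest = [], [], []
    for coeffs, b in constraints:
        a = coeffs[n - 1]
        (upper if a > 0 else lower if a < 0 else rest).append((coeffs, b))
    new = [(c[:n - 1], b) for c, b in rest]
    for lc, lb in lower:                      # lower bound: ... - cl*x_n <= lb
        for uc, ub in upper:                  # upper bound: ... + cu*x_n <= ub
            cl, cu = -lc[n - 1], uc[n - 1]
            coeffs = [cu * lc[j] + cl * uc[j] for j in range(n - 1)]
            new.append((coeffs, cu * lb + cl * ub))
    return new

def fm_satisfiable(constraints, n):
    # Decide satisfiability of a conjunction of linear inequalities over the rationals.
    cs = [([Fraction(a) for a in c], Fraction(b)) for c, b in constraints]
    for k in range(n, 0, -1):
        cs = fm_eliminate_last(cs, k)
    return all(b >= 0 for _, b in cs)         # only constraints of the form 0 <= b remain

# Example 1: x1-x2<=0, x1-x3<=0, -x1+2x3+x2<=0, -x3<=-1 over (x1,x2,x3)
phi = [([1, -1, 0], 0), ([1, 0, -1], 0), ([-1, 1, 2], 0), ([0, 0, -1], -1)]
print(fm_satisfiable(phi, 3))                 # False: the system is unsatisfiable

The elimination order here is fixed (last variable first); the greedy heuristic mentioned above would instead pick, at each step, the variable whose elimination creates the fewest new pairs.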
Fourier-Motzkin for integer variables. An extension of Fourier-Motzkin for eliminating integer variables was described in [17,18] as part of the ‘omega test’. The omega test is quite complex and its full description is beyond the scope of this article. Below we summarize the core principles of this method. The important point is that, like the original FM, eliminating an integer variable is done by deriving a new set of constraints that are satisfiable if and only if the original set of constraints is satisfiable. This is the only property that we will need for the reduction in the next section. FM is a projection method, and therefore it is convenient to describe its operation in geometrical terms. A set of inequalities C defines, geometrically, a convex polyhedron in n dimensions, where n is the number of variables. Elimination of one variable from C corresponds to defining a new polyhedron in n − 1 dimensions. For example, if C is a set of constraints on x1, x2, x3 then ∃x3.C is a set of constraints on x1 and x2 that define a two-dimensional polyhedron, or, illustratively, a shadow of the preceding polyhedron. FM repeats this process until a single variable is left, at which point checking for a contradiction is trivial. The problem of applying this process to integer variables is that elimination of such variables may create non-convex constraints. The following example presents such a case.
Example 2. Consider the following formula:
∃x2. (0 ≤ 3x2 − x1 ≤ 7  ∧  1 ≤ x1 − 2x2 ≤ 5)
With standard FM we derive 3 ≤ x1 ≤ 29. If x2 is of type integer, this is not a correct projection. For example, if x1 = 4 no integer value of x2 satisfies the above formula. The correct solution is the non-convex region x1 = 3, x1 = 29, and 5 ≤ x1 ≤ 27. The shadow that is computed by FM is referred to in [17] as the real shadow, since it refers to a projection resulting from elimination of real variables. If the real shadow is unsatisfiable (by a recursive computation), the omega test concludes that there is no solution to C. Otherwise, it computes the dark shadow of C, which defines a sub-region of the real shadow. In geometrical terms, the dark shadow is a projection of the sub-region of C that is at least one unit thick (that is, the range of the eliminated variable above the dark shadow is greater than or equal to 1). This sub-region is guaranteed to contain an integral point, and therefore if the dark shadow is satisfiable, then so is C. The dark shadow is expressed by the single constraint cl · U − cu · L > (cu − 1) · (cl − 1) for given lower and upper bounds on xn: L/cl ≤ xn ≤ U/cu, where cu and cl are integer constants. We refer the reader to [18] for a proof of this derivation. The dark shadow test is sound, but not complete. It is possible that the dark shadow is unsatisfiable, but there is still a solution to C. If the dark shadow is unsatisfiable, the omega test generates a set of constraints in DNF, called splinters, which define integral solutions outside the dark shadow (DNF is required because the solution area is not necessarily continuous). The algorithm in figure 1, adapted from [18], gives a rough idea of how this algorithm works. Unlike
the description above, this is a non-recursive version of the algorithm, which is therefore more suitable for reduction to SAT. Given a set of inequality constraints C and an integer variable xn that should be quantified out, it generates a logically equivalent formula that is a disjunction of two sub-formulas: the first does not contain xn, and the second contains xn as part of an equality constraint (which means it can be eliminated by simple substitution).
% Input: ∃xn.C where xn is an integer variable and C is a conjunction of inequalities.
R = false
C′ = all constraints from C that do not involve xn
for each lower bound on xn: L ≤ cl · xn
    for each upper bound on xn: cu · xn ≤ U
        C′ = C′ ∧ (cu · L + (cu − 1)(cl − 1) ≤ cl · U)
    let cmax = max coefficient of xn in an upper bound on xn
    for (i = 0 to ((cmax − 1)(cl − 1) − 1)/cmax) do
        R = R ∨ (C ∧ (L + i = cl · xn))
% C′ is the dark shadow.
% R contains the splinters.
% Output: C′ ∨ (∃ integer xn s.t. R)
Fig. 1. Existential quantification of an integer xn from a set of constraints C
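As a worked illustration (our own calculation, following the dark-shadow rule quoted above, and not taken from [17,18]), consider Example 2 again. Eliminating the integer x2, the lower bounds are x1 ≤ 3x2 (cl = 3, L = x1) and x1 − 5 ≤ 2x2 (cl = 2, L = x1 − 5), and the upper bounds are 3x2 ≤ x1 + 7 (cu = 3, U = x1 + 7) and 2x2 ≤ x1 − 1 (cu = 2, U = x1 − 1). Applying cl · U − cu · L > (cu − 1)(cl − 1) to the two non-trivial pairs gives
3(x1 − 1) − 2x1 > 2   which simplifies to   x1 > 5
2(x1 + 7) − 3(x1 − 5) > 2   which simplifies to   x1 < 27
while the remaining two pairs hold trivially. The dark shadow is therefore 5 < x1 < 27; the integral solutions it misses (x1 = 3, 5, 27 and 29 in the exact projection quoted above) are exactly the ones the splinters must recover.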
In the next section we present a propositional version of the FM method and the omega test.
3
A Propositional Version of Fourier-Motzkin
Given a DLA formula ϕ, we now show how to derive a propositional formula ϕ′ s.t. ϕ′ is satisfiable iff ϕ is satisfiable. The procedure for generating ϕ′ emulates the FM method.
1. Normalize ϕ:
   - Rewrite equalities as conjunctions of inequalities.
   - Transform ϕ to Negation Normal Form (negations are allowed only over atomic constraints).
   - Eliminate negations by reversing inequality signs.
2. Encode each inequality i with a Boolean variable ei. Let ϕ′ denote the encoded formula.
3. - Perform FM elimination on the set of all constraints in ϕ, while assigning new Boolean variables to the newly generated constraints.
   - At each elimination step, for every pair of constraints ei, ej that results in the new constraint ek, add the constraint ei ∧ ej → ek to ϕ′.
   - If ek represents a contradiction (e.g., 1 ≤ 0), replace ek by false.
We refer to this procedure from here on as Boolean Fourier Motzkin (BFM).
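Continuing the illustrative Python sketch from Section 2 (again ours, not the paper’s implementation), step 3 can be phrased as follows: every original inequality gets a Boolean variable, every resolvent of a pair gets a fresh variable, and each resolution step contributes one implication.

from fractions import Fraction

def bfm_encode(atoms, n):
    # Sketch of BFM step 3 over real variables (no conjunctions matrix yet).
    # atoms: list of (coeffs, b), one per predicate e_1..e_m, each of length n.
    # Returns (num_vars, implications, false_vars): each (i, j, k) in implications
    # stands for the added constraint e_i ∧ e_j → e_k; false_vars lists the e_k
    # whose constraint is contradictory (e.g. 1 <= 0) and must be replaced by false.
    cons = {i + 1: ([Fraction(a) for a in c], Fraction(b))
            for i, (c, b) in enumerate(atoms)}
    implications, false_vars = [], []
    next_id = len(atoms) + 1
    active = dict(cons)                        # constraints still to be projected
    for var in range(n - 1, -1, -1):           # eliminate the last variable first
        lower = {i: cb for i, cb in active.items() if cb[0][var] < 0}
        upper = {i: cb for i, cb in active.items() if cb[0][var] > 0}
        active = {i: cb for i, cb in active.items() if cb[0][var] == 0}
        for li, (lc, lb) in lower.items():
            for ui, (uc, ub) in upper.items():
                cl, cu = -lc[var], uc[var]
                coeffs = [cu * lc[j] + cl * uc[j] for j in range(n)]   # entry var cancels
                bound = cu * lb + cl * ub
                k = next_id; next_id += 1
                active[k] = (coeffs, bound)
                implications.append((li, ui, k))       # add e_li ∧ e_ui → e_k
                if all(a == 0 for a in coeffs) and bound < 0:
                    false_vars.append(k)               # contradictory constraint
    return next_id - 1, implications, false_vars

Run on the formula of Example 3 below with the elimination order x3, x2, x1, this produces two implications and one contradictory predicate; with the order x1, x2, x3 used in the example one obtains essentially the table shown there. Either way, the resulting propositional formula is equisatisfiable with ϕ.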
Example 3. Consider the following formula:
ϕ = 2x1 − x2 ≤ 0  ∧  (2x2 − 4x3 ≤ 0  ∨  x3 − x1 ≤ −1)
By assigning an increasing index to the predicates from left to right we initially get ϕ′ = e1 ∧ (e2 ∨ e3). Let x1, x2, x3 be the elimination order. The following table illustrates the process of updating ϕ′:
Eliminated var   Lower bound       Upper bound      New constraint   Encoding   Add to ϕ′
x1               x3 − x1 ≤ −1      2x1 − x2 ≤ 0     2x3 − x2 ≤ −2    e4         e3 ∧ e1 → e4
x2               2x3 − x2 ≤ −2     2x2 − 4x3 ≤ 0    4 ≤ 0            false      e4 ∧ e2 → false
Thus, the resulting satisfiable formula is:
ϕ′ = (e1 ∧ (e2 ∨ e3)) ∧ (e1 ∧ e3 → e4) ∧ (e4 ∧ e2 → false)
A propositional version of the omega test, which is needed for solving QFP arithmetic, works in a similar way. The main difference is that in step 3, ei and ej can result in a Boolean combination of predicates rather than a single predicate ek. Example 3 demonstrates the main drawback of this method. Since in step 2 we consider all inequalities, regardless of the Boolean connectives between them, the number of constraints that the FM procedure adds is potentially larger than those that we would add if we considered each case separately (where a ’case’ corresponds to a conjoined list of inequalities). In the above example, case splitting would result in two cases, none of which results in added constraints. Since the complexity of FM is the bottleneck of this procedure, this drawback may significantly worsen the overall run time and risk its usability. As a remedy, we will suggest in section 4 a polynomial method that bounds the number of constraints to the same number that would otherwise be added by solving the various cases separately.
Complexity of deciding ϕ′. The encoded formula ϕ′ has a unique structure that makes it easier to solve compared to a general propositional formula of similar size. Let m be the set of encoded predicates of ϕ and n be the number of variables.
Proposition 1. ϕ′ can be decided in time bounded by O(2^|m| · |m|^(2^n)).
Proof. SAT is worst case exponential in the number of decided variables and linear in the number of clauses. The Boolean values assigned to the predicates in m imply the values of all the generated predicates³. Thus, we can restrict the
3 Note that the constraints added in step 3 are Horn clauses. This means that for a given assignment to the predicates in m, these constraints are solvable in linear time.
SAT solver to split only on m. Hence, in the worst case the SAT procedure will be exponential in m and linear in the number of clauses, which in the worst case is |m|^(2^n).
4
Conjunctions Matrices
Case splitting can be thought of as a two-step procedure, where in the first step the formula is transformed to DNF, and in the second each clause, which now includes a conjunction of constraints, is solved separately. In this section we show how to predict, in polynomial time, whether a given pair of predicates would share a clause if the formula was transformed to DNF. It is clear that there is no need to generate a new constraint from two predicates that do not share a clause.
4.1
Joining Operands
We assume that ϕ is normalized, as explained in step 1. Let ϕf denote the encoded formula after step 2 and ϕc denote the added constraints of step 3 (thus, after step 3 ϕ = ϕf ∧ϕc ). All the internal nodes of the parse tree of ϕf correspond to either disjunctions or conjunctions. Consider the lowest common parent of two leafs ei , ej in the parsing tree. We call the Boolean operand represented by this node the joining operand of these two leafs and denote it by J(ei , ej ). Example 4. In the formula ϕf = e1 ∧ (e2 ∨ e3 ), J(e1 , e2 ) = ‘∧’ and J(e2 , e3 ) = ‘∨’. For simplicity, we first assume that no predicates appear in ϕ more than once. In section 4.2 we solve the more general case. Denote by ϕD the DNF representation of ϕ. The following proposition is the basis for the prediction technique: Proposition 2. Two predicates ei , ej share a clause in ϕD iff J(ei , ej ) = ‘∧’. Proof. Recall that ϕf does not contain negations and no predicate appears more than once. (⇒) Let node denote the node joining ei and ej , and assume it represent a disjunction (J(ei , ej ) =‘∨’). Transform the right and left branches descending from node to DNF. A disjunction of two DNF formulas is a DNF, and therefore the formula under node is now a DNF expression. If node is the root or if there are only disjunctions on the path from node to the root, we are done. Otherwise, the distribution of conjunction only adds elements to each of the clauses under node but does not join them into a single clause. Thus, ei and ej do not share a clause if their joining operand is a disjunction. (⇐) Again let node denote the node joining ei and ej , and assume it represents a conjunction (J(ei , ej ) =‘∧’). Transform the right and left branches descending from node to DNF. Transforming a conjunction of two DNF sub formulas back to DNF is done by forming a clause for each sequence of literals from the different clauses. Thus, at least one clause contains ei ∧ ej . Since there are no negation in the formula, the literals in this clause will remain together in ϕD regardless of the Boolean operands above node.
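The joining operand J, and the initial conjunctions matrix introduced next, can be computed in a single walk over the parse tree; the following Python fragment is an illustrative sketch only (the tuple-based tree representation is ours, not the paper’s):

# Parse trees as nested tuples: ("and", t1, t2), ("or", t1, t2), or a leaf index i
# standing for predicate e_i.  Assumes, as in Section 4.1, no repeated predicates.
def joining_operands(tree):
    """Return a dict {(i, j): "and" | "or"} with J(e_i, e_j) for i < j."""
    J = {}
    def leaves(t):
        if isinstance(t, int):
            return [t]
        op, left, right = t
        lo, hi = leaves(left), leaves(right)
        for i in lo:
            for j in hi:
                J[(min(i, j), max(i, j))] = op   # this node is their lowest common parent
        return lo + hi
    leaves(tree)
    return J

def initial_matrix(tree, m):
    """M[i][j] = 1 iff J(e_i, e_j) = "and"  (predicates numbered 1..m)."""
    J = joining_operands(tree)
    M = [[0] * (m + 1) for _ in range(m + 1)]
    for (i, j), op in J.items():
        M[i][j] = M[j][i] = 1 if op == "and" else 0
    return M

# Example 4: phi_f = e1 ∧ (e2 ∨ e3); rows/columns 1..3 of the result are
# [[0,1,1],[1,0,0],[1,0,0]], matching the matrix shown below.
print(initial_matrix(("and", 1, ("or", 2, 3)), 3))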
For a given pair of predicates, it is a linear operation (in the height of the parse tree h) to check whether their joining operand is a conjunction or disjunction. If there are m predicates in ϕ, constructing the initial m × m conjunctions matrix Mϕ of ϕ has the complexity of O(m2 h). Mϕ is a binary, symmetric matrix, where Mϕ [ei , ej ] = 1 if and only if J(ei , ej ) =‘∧’. For example, Mϕ corresponding to ϕf of example 4 is given by
Mϕ =        e1  e2  e3
       e1    0   1   1
       e2    1   0   0
       e3    1   0   0
Given proposition 2, this means that these predicates share at least one clause in ϕD. New entries are added to Mϕ when new constraints are generated, and other entries, corresponding to constraints with non-zero coefficients over eliminated variables, are removed. The entry for a new predicate ek that was formed from the predicates ei, ej is updated as follows:
∀l ∈ [1..k − 1]. Mϕ[ek, el] = Mϕ[ei, el] ∧ Mϕ[ej, el]
This reflects the fact that the new predicate is relevant only to predicates that share a clause with both ei and ej.
4.2
Handling Repeating Predicates
Practically most formulas contain predicates that appear more than once, in different parts of the formula. We will denote by eki , k ≥ 1 the k instance of the predicate ei in ϕ . It is possible that the same pair of predicates has different joining operands, e.g. J(e1i , e1j ) =‘∧’ but J(e1i , e2j ) =‘∨’. There are two possible solutions to this problem: 1. Represent each predicate instance as a separate predicate. 2. Assign Mϕ [ei , ej ] = 1 if there exists an instance of ei and of ej s.t. J(ei , ej ) = ‘∧’. The second option has a more concise representation, but may result in redundant constraints, as the example below demonstrates. Example 5. Let ϕf = e1 ∧ (e2 ∨ e3 ) ∨ (e2 ∧ e3 ). According to option 2, ϕ contains only three predicates e1 . . . e3 and therefore Mϕ is a 3 × 3 matrix with an entry ’1’ in all its cells. Thus, Mϕ does not contain the information that the three predicates never appear together in the same clause, which will potentially result in redundant constraints. Conjunctions matrices can be used to speed up many of the other decision procedures that were published in the last few years for subset of linear arithmetic [9,5, 3,4,15,20]. We refer the reader to a technical report [19] for a detailed description of how this can be done.
4.3
A Revised Decision Procedure
Given the initial conjunctions matrix Mϕ , we now change step 3: 3.
- Perform FM elimination on the set of all constraints in ϕ, while assigning new Boolean variables to the newly generated constraints. - At each elimination step consider the pair of constraints ei , ej only if Mϕ [ei , ej ] = 1. In this case let ek be the new predicate. · Add the constraint ei ∧ ej → ek to ϕ . · If ek represents a contradiction (e.g., 1 ≤ 0), replace ek by false. · Otherwise update Mϕ as follows: ∀l ∈ [1..k − 1]. Mϕ [ek , el ] = Mϕ [ei , el ] ∧ Mϕ [ej , el ].
The revised procedure guarantees that the total number of constraints generated is less or equal to the total number of constraints that are generated by solving each set of conjoint constraints separately. In fact, it is expected to generate a much smaller number, because constraints that are repeated in many separate cases resolve in a single new constraint in BFM. For example, naive case splitting over the formula ϕ = e1 ∧ e2 ∧ (e3 ∨ e4 ) will generate the resolvent of e1 and e2 twice, while BFM will only generate it once4 .
5
Experiments
An implementation of BFM turned out to be harder than expected, because of the lack of efficient and sound implementations of FM and the omega test in the public domain. We implemented BFM for real variables on top of PORTA (A Polyhedron Representation and Transformation Algorithm) [16]. We randomly generated formulas in 2-CNF style (that is, a 2-CNF where the literals are linear inequalities) with different number of clauses and variables. The (integer) coefficients were chosen randomly in the range −10..10. The time it took to generate the SAT instance with BFM5 is summarized in Fig. 2. The time it took Chaff [12] to solve each of the instances that we were able to generate was relatively negligible. Normally it was less than a second, with the exception of 3 instances that took 10-20 seconds each to solve. We also ran these instances with ICS, which solves these type of formulas with FM combined with case-splitting. ICS could solve only one of these instances (the 10 x 10 instance) in the specified time bound (it took it about 10 minutes). It either ran out of memory or out of time in all other cases. This is not very surprising, because in the worst case it has to solve 2c separate cases, where c is the number of clauses. CNF style formulas are also harder for BFM because they make conjunctions matrices ineffective. Each predicate in ϕ appears with 4
5
Smarter implementation of case splitting will possibly identify, in this simple example, that the resolvent has to be generated once. But in the general case redundant constraints will be generated. All experiments were run on a 1.5 GHz AMD Athlon machine with 1.5 G memory running Linux.
                                # clauses
# vars     10     30     50     70      90     110     130     150     170
   10     0.1    0.2    0.2    1.1      56     103     208     254       *
   30     0.1    0.1    0.2    2.5    61.1      68     618       *       *
   50     0.1    0.1    0.2    0.3     4.9       8     173     893    2772
   70     0.1    0.2    0.2    0.4    13.4     108       *       *       *
   90     0.2    0.2    0.3    0.3     0.5       1      14     181     347
  110     0.3    0.3    0.5    8.2     396     594       *       *       *
  130     0.3    0.3    0.4    0.7     2.9     195    2658       *       *
  150     0.2    0.3    0.5    0.8    18.4     334    1227       *       *
  170     0.2    0.3    0.5   58.2     999       *       *       *       *
Fig. 2. Time, in seconds, required for generating a SAT instance for random 2-CNF style linear inequalities with a varying number of clauses and variables. ‘*’ indicates running time exceeding 2 hours.
all other predicates in some clause of ϕD , except those predicates it shares a clause with in ϕ. Thus, almost all the entries of Mϕ are equal to ‘1’. We performed two other sets of tests. In the first set, we ran BFM and ICS on seven formulas resulting from symbolic simulation of hardware designs. The only type of inequalities found in these formulas are separation predicates, i.e. predicates of the form x < y + c, where c is a constant. While BFM solved all seven formulas in a few seconds, ICS timed-out on two formulas, and solved in a few seconds the other five. In the second set, we ran some of the standard ICS benchmarks (e.g., ‘linsys-035’, ’linsys-100’). ICS performed much better than BFM with these instances. In some cases it terminated in a few seconds, while BFM timed-out. The reason for this seemingly inconsistency is that all the ICS benchmark formulas are a conjunction of linear equalities, and therefore no case splitting is required. The better performance of ICS can be attributed to the higher quality of implementation of FM comparing to that of PORTA. PORTA itself is, unfortunately, not an optimized implementation of FM. For example, it does not have heuristics for choosing dynamically the variable elimination order; rather it requires the user to supply a static order. It also doesn’t have a mechanism for identifying subsumed or even equivalent inequalities. These inefficiencies apparently have a very strong effect on the results, which indicates that if BFM will be implemented on top of a better implementation of FM (for example, on top of ICS itself), the results will, hopefully, further improve.
References 1. C. Barrett, D. Dill, and J. Levitt. Validity checking for combinations of theories with equality. In M. Srivas and A. Camilleri, editors, Proc. FMCAD 1996, volume 1166 of LNCS. Springer-Verlag, 1996. 2. A.J.C. Bik and H.A.G. Wijshoff. Implementation of Fourier-Motzkin elimination. Technical Report 94-42, Dept. of Computer Science, Leiden University, 1994.
3. R.E. Bryant, S. German, and M. Velev. Exploiting positive equality in a logic of equality with uninterpreted functions. In Proc. 11th Intl. Conference on Computer Aided Verification (CAV’99), 1999. 4. R.E. Bryant, S. German, and M. Velev. Processor verification using efficient reductions of the logic of uninterpreted functions to propositional logic. ACM Transactions on Computational Logic, 2(1):1–41, 2001. 5. R.E. Bryant and M. Velev. Boolean satisfiability with transitivity constraints. In E.A. Emerson and A.P. Sistla, editors, Proc. 12th Intl. Conference on Computer Aided Verification (CAV’00), volume 1855 of Lect. Notes in Comp. Sci. SpringerVerlag, 2000. 6. G. Dantzig. Linear Programming and Extensions. Princeton University Press, Princeton, New Jersey., 1963. 7. W. M. Farmer, J. D. Guttman, , and F. J. Thayer. IMPS: System description. In D. Kapur, editor, Automated Deduction–CADE-11, volume 607 of Lect. Notes in Comp. Sci., pages 701–705. Springer-Verlag, 1992. 8. J.C. Filliatre, S. Owre, H. Rueb, and N. Shankar. ICS: Integrated canonizer and solver. In G. Berry, H. Comon, and A. Finkel, editors, Proc. 13th Intl. Conference on Computer Aided Verification (CAV’01), LNCS. Springer-Verlag, 2001. 9. A. Goel, K. Sajid, H. Zhou, A. Aziz, and V. Singhal. BDD based procedures for a theory of equality with uninterpreted functions. In A.J. Hu and M.Y. Vardi, editors, CAV98, volume 1427 of LNCS. Springer-Verlag, 1998. 10. P. Johannsen. Reducing bitvector satisfiability problems to scale down design sizes for rtl property checking. In IEEE Proc. HLDVT’01, 2001. 11. L. G. Khachiyan. A polynomial algorithm in linear programming. Soviet Mathematics Doklady, 1979. 12. M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik. Chaff: Engineering an efficient SAT solver. In Proc. Design Automation Conference 2001 (DAC’01), 2001. 13. G. Nelson and D. C. Oppen. Simplification by cooperating decision procedures. ACM Transactions on Programming Languages and Systems, 1979. 14. S. Owre, N. Shankar, and J.M. Rushby. User guide for the PVS specification and verification system. Technical report, SRI International, 1993. 15. A. Pnueli, Y. Rodeh, O. Shtrichman, and M. Siegel. Deciding equality formulas by small-domains instantiations. In Proc. 11th Intl. Conference on Computer Aided Verification (CAV’99), Lect. Notes in Comp. Sci. Springer-Verlag, 1999. 16. PORTA. http://elib.zib.de/pub/packages/mathprog/polyth/porta/. 17. W. Pugh. The omega test: a fast and practical integer programming algorithm for dependence analysis. Communications of the ACM, pages 102–114, 1992. 18. W. Pugh and D. Wonnacott. Experiences with constraint-based array dependence analysis. In Principles and Practice of Constraint Programming, pages 312–325, 1994. 19. O. Strichman. Optimizations in decision procedures for propositional linear inequalities. Technical Report CMU-CS-02-133, Carnegie Mellon University, 2002. 20. O. Strichman, S.A. Seshia, and R.E. Bryant. Deciding separation formulas with SAT. In Proc. 14th Intl. Conference on Computer Aided Verification (CAV’02), LNCS, Copenhagen, Denmark, July 2002. Springer-Verlag.
Deciding Presburger Arithmetic by Model Checking and Comparisons with Other Methods Vijay Ganesh, Sergey Berezin, and David L. Dill Stanford University {vganesh,berezin, dill}@stanford.edu
Abstract. We present a new way of using Binary Decision Diagrams in automata based algorithms for solving the satisfiability problem of quantifier-free Presburger arithmetic. Unlike in previous approaches [5,2,19], we translate the satisfiability problem into a model checking problem and use the existing BDD-based model checker SMV [13] as our primary engine. We also compare the performance of various Presburger tools, based on both automata and ILP approaches, on a large suite of parameterized randomly generated test cases. The strengths and weaknesses of each approach as a function of these parameters are reported, and the reasons for the same are discussed. The results show that no single tool performs better than the others for all the parameters. On the theoretical side, we provide tighter bounds on the number of states of the automata.
1
Introduction
Efficient decision procedures for logical theories can greatly help in the verification of programs or hardware designs. For instance, quantifier-free Presburger arithmetic [15] has been used in RTL-datapath verification [3], and symbolic timing verification [1].1 However, the satisfiability problem for the quantifier-free fragment is known to be NP-complete [14]. Consequently, the search for practically efficient algorithms becomes very important. Presburger arithmetic is defined to be the first-order theory of the structure ⟨Z, 0, ≤, +⟩, where Z is the set of integers. The satisfiability of Presburger arithmetic was shown to be decidable by Presburger in 1927 [15,12]. This theory is usually defined over the natural numbers N, but can easily be extended to the integers (which is important for practical applications) by representing any integer variable x by two natural variables: x = x+ − x−. This reduction obviously has no effect on known decidability or complexity results.
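For instance (our own illustration of this standard reduction), an atomic formula such as 2x − y ≤ 3 over the integers becomes 2x+ − 2x− − y+ + y− ≤ 3 over the naturals, and the integer solution x = −1, y = 4 corresponds to, e.g., x+ = 0, x− = 1, y+ = 4, y− = 0, since 2·0 − 2·1 − 4 + 0 = −6 ≤ 3.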
This research was supported by GSRC contract SA2206-23106PG-2 and in part by National Science Foundation CCR-9806889-002. The content of this paper does not necessarily reflect the position or the policy of GSRC, NSF, or the Government, and no official endorsement should be inferred. 1 In [1] Presburger formulas have quantifiers, but without alternation, and therefore, are easy to convert into quantifier-free formulas.
M.D. Aagaard and J.W. O’Leary (Eds.): FMCAD 2002, LNCS 2517, pp. 171–186, 2002. c Springer-Verlag Berlin Heidelberg 2002
The remainder of the paper focuses on quantifier-free Presburger arithmetic because many verification problems do not require quantification, and because the performance of decision procedures on quantifier-free formulas may be qualitatively different from the quantified case. This paper has two primary goals: presentation of a new decision procedure based on model checking and comparison of the various approaches to deciding quantifier-free Presburger arithmetic and their implementations. There are three distinct ways of solving the satisfiability problem of quantifier-free Presburger, namely the Cooper’s method [8], the integer linear programming (ILP) based approaches, and the automata-based methods. Cooper’s method is based on Presburger’s original method for solving quantified formulas, only more efficient. Using Cooper’s method on a quantifier-free formula still requires introducing existential quantifiers and then eliminating them. This process results in an explosion of new atomic formulas, so the method is probably too inefficient to be competitive with other approaches. Since atomic formulas are linear integer equalities and inequalities, it is natural to think of the integer linear programming (ILP) algorithms as a means to determine the satisfiability of quantifier-free formulas in Presburger arithmetic. ILP algorithms maximize an objective function, subject to constraints in the form of a conjunction of linear equalities and inequalities. Along the way, the system is checked for satisfiability (usually called feasibility), which is the problem of interest in this paper. There are many efficient implementations of ILP solvers available. We have experimented with the commercial tool CPLEX and open source implementations LP SOLVE and OMEGA [16]. The OMEGA tool is specifically tuned to solve integer problems, and is an extension of the Fourier-Motzkin linear programming algorithm [9] to integers [18]. In order to solve an arbitrary quantifier-free formula, it must first be converted to disjunctive normal form (DNF), then ILP must be applied to each disjunct until a satisfiable one is found. If any of the disjuncts is satisfiable, then the entire formula is satisfiable. This conversion to DNF may lead to an exponential explosion of the formula size. In addition, unlike automata methods, the existing implementations lack the support for arbitrarily large integers and use native machine arithmetic. This has two consequences. Firstly, it obstructs making a fair comparison of the ILP tools with automata methods, since the two are not feature equivalent. The use of native machine arithmetic by ILP tools gives them an unfair performance advantage. Secondly, the support for large integers may be crucial in certain hardware verification problems, where the solution set may have integers larger than the int types supported natively by the hardware. For instance, many current RTL-datapath verification approaches use ILP [11,3], but these approaches cannot be scaled with the bit-vector size in the designs. A third approach uses finite automata theory. The idea that an atomic Presburger formula can be represented by a finite-state automaton goes back at least to B¨uchi [5]. Boudet and Comon [2] proposed a more efficient encoding than B¨uchi’s. Later, Wolper and Boigelot [19] further improved the method of Boudet and Comon and implemented the technique in the system called LASH. 
Another automata-based approach is to translate the atomic formulas into WS1S (weak monadic second order logic with one successor) and then use the MONA tool [10]. MONA is a decision procedure for WS1S and uses Binary Decision Diagrams (BDDs, [4]) internally to represent automata.
In this paper, a new automata-based approach using symbolic model checking [7] is proposed and evaluated. The key idea is to convert the quantifier-free Presburger formula into a sequential circuit which is then model checked using SMV [13]. Experiments indicate that the SMV approach is quite efficient and more scalable on formulas with large coefficients than all the other automata-based techniques. The reason for this is the use of BDDs to represent both the states and the transitions of the resulting automaton. Another factor which contributes to the efficiency is that SMV uses a highly optimized BDD package. In addition, the use of an existing tool saves a lot of implementation effort. The experiments required only a relatively small Perl script to convert Presburger formulas into the SMV language. The other tools do not use BDDs for the states because they perform quantifier elimination by manipulating the automata directly. Namely, each quantifier alternation requires projection and determinization of the automaton. The use of BDDs for the states can make the implementation of the determinization step particularly hard. We also compare various automata and ILP-based approaches on a suite of 400 randomly generated Presburger formulas. The random generation was controlled by several parameters, such as the number of atomic formulas, the number of variables, and maximum coefficient size. For every approach we identify classes of Presburger formulas for which it either performs very poorly or very efficiently. Only one similar comparison has been done previously in [17]. However, their examples consist of a rather small set of quantified Presburger formulas obtained from real hardware verification problems. The goal of our comparison is to study the performance trends of various approaches and tools depending on different parameters of quantifier-free Presburger formulas. The paper is organized as follows. Section 2 explains the automata construction algorithms which are the same as in [19,2], except for the tighter bounds on the number of states of the automata. Section 3 then describes the implementation issues, the conversion of the satisfiability problem into a model checking problem, and construction of a circuit corresponding to the automaton. Section 4 provides our experimental results and comparisons with other tools. Finally, Section 5 concludes the paper with the discussion of experimental results and the future work.
2
Presburger Arithmetic
Definition 1. We define Presburger arithmetic to be the first-order theory over atomic formulas of the form
Σ_{i=1}^{n} ai · xi ∼ c,     (1)
where ai and c are integer constants, xi’s are variables ranging over integers, and ∼ is an operator from {=, ≠, <, ≤, >, ≥}. The semantics of these operators are the usual ones. In the rest of the paper we restrict ourselves to only the quantifier-free fragment of Presburger arithmetic.
A formula f is either an atomic formula (1), or is constructed from formulas f1 and f2 recursively as follows: f ::= ¬f1 | f1 ∧ f2 | f1 ∨ f2. Throughout the paper we use the following typographic conventions.
Notation 1. We reserve boldface letters, e.g. b, to represent column vectors and bT to represent row vectors. The term vector shall always refer to a column vector unless specified otherwise. In this notation, x represents the vector of variables of the atomic formula: x = (x1, . . . , xn)T, and b represents n-bit Boolean column vectors. A row vector of coefficients in an atomic formula is denoted by aT: aT = (a1, a2, . . . , an). In particular, an atomic formula in the vector notation is written as follows: f ≡ aT · x ∼ c, where aT · x is the scalar product of the two vectors aT and x.
We give the formal semantics of the quantifier-free Presburger arithmetic in terms of the sets of solutions. A variable assignment for a formula φ (not necessarily atomic) with n free variables is an n-vector of integers w. An atomic formula f under a particular assignment w can be easily determined to be true or false by evaluating the expression aT · w ∼ c. A solution is a variable assignment w which makes the formula φ true. We denote the set of all solutions of φ by Sol(φ), which is defined recursively as follows:
– if φ is atomic, then Sol(φ) = {w ∈ Z^n | aT · w ∼ c};
– if φ ≡ ¬φ1, then Sol(φ) = Z^n − Sol(φ1);
– if φ ≡ φ1 ∧ φ2, then Sol(φ) = Sol(φ1) ∩ Sol(φ2);
– if φ ≡ φ1 ∨ φ2, then Sol(φ) = Sol(φ1) ∪ Sol(φ2).
To simplify the definitions, we assume that all atomic formulas of φ always contain the same set of variables. If this is not true and some variables are missing in one of the atomic formulas, then these variables can be added with zero coefficients.
2.1
Idea behind the Automaton
The idea behind the automata-based approach is to construct a deterministic finite-state automaton (DFA) Aφ for a quantifier-free Presburger formula φ such that the language of this automaton L(Aφ ) corresponds to the set of all solutions of φ. When such an
automaton is constructed, the satisfiability problem for φ is effectively reduced to the emptiness problem of the automaton, that is, checking whether L(Aφ) = ∅. If a formula is not atomic, then the corresponding DFA can be constructed from the DFAs for the subformulas using the complement, intersection, and union operations on the automata. Therefore, to complete our construction of Aφ for an arbitrary quantifier-free Presburger formula φ it is sufficient to construct DFAs for each of the atomic formulas of φ.

Throughout this section we fix a particular atomic Presburger formula f: f ≡ aT · x ∼ c. Recall that a variable assignment is an n-vector of integers w. Each integer can be represented in binary in 2's complement, so a solution vector can be represented by a vector of binary strings. We can now look at this representation of a variable assignment w as a binary matrix where each row, or track, represents an integer for the corresponding variable, and each i-th column represents the vector of the i-th bits of all the components of w. Alternatively, this matrix can be seen as a string of its columns, that is, a string over the alphabet Σ = Bn, where B = {0, 1}. The set of all strings that together represent all the solutions of a formula f forms a language Lf over the alphabet Σ. Our problem is now reduced to building a DFA for the atomic formula f that accepts exactly the language Lf.

Intuitively, the automaton Af must read a string π, extract the corresponding variable assignment w from it, instantiate it into the formula f, and check that the value of the left hand side (LHS) is indeed related to the right hand side (RHS) constant as the relation ∼ prescribes. If it is, the string is accepted; otherwise it is rejected. Since the RHS constant and the relation ∼ are fixed in f, the value of the LHS of f solely determines whether the input string π should be accepted or not. Assume that the automaton Af reads a string from left to right. If the value of the LHS of f is l after reading the string π, then after appending one more "letter" b ∈ Bn to π on the right, the LHS value changes to l′ = 2l + aT · b. Notice that only the original value of the LHS l and the new "letter" b are needed to compute the new value l′ of the LHS for the resulting string. This property directly corresponds to the property of the transition relation of an automaton, namely, that the next state is solely determined by the current state and the next input letter.

Following the above intuition, we can define an automaton Af as follows. The states of Af are integers representing the values of the LHS of f; the input alphabet is Σ = Bn; and on an input b ∈ Σ the automaton transitions from a state l to l′ = 2l + aT · b. The set of accepting states are those states l that satisfy l ∼ c. Special care has to be taken of the initial state sinitial ∉ Z. First, we interpret the empty string as a vector of 0's. Thus, the value of the left hand side in the initial state must be equal to 0. The first "letter" read by Af is the vector of sign bits, and, according to the 2's complement interpretation, the value of the LHS in the next state after sinitial must be l′ = −aT · b. Notice that this automaton is not finite, since we have explicitly defined the set of states to be integers. Later we examine the structure of this infinite automaton and show how to trim the state space to a finite subset and obtain an equivalent DFA, similar to the one in Figure 1.
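To make this intuition concrete, the following small Python sketch (ours, not part of the paper's implementation; the function name `accepts_atomic` is purely illustrative) reads a string of bit-columns, most significant column first, and tracks the LHS value with exactly the recurrences described above.

```python
# Illustrative sketch (not the authors' code): track the LHS value of an atomic
# formula a^T x ~ c while reading bit-columns of a 2's-complement encoding,
# most significant column first (the first column holds the sign bits).
import operator

RELS = {'=': operator.eq, '!=': operator.ne, '<': operator.lt,
        '<=': operator.le, '>': operator.gt, '>=': operator.ge}

def accepts_atomic(a, rel, c, columns):
    """a: list of coefficients; columns: list of letters, each a list of n bits."""
    if not columns:                      # the empty string encodes the all-zero vector
        lhs = 0
    else:
        sign = columns[0]
        lhs = -sum(ai * bi for ai, bi in zip(a, sign))            # l' = -a^T . b
        for b in columns[1:]:
            lhs = 2 * lhs + sum(ai * bi for ai, bi in zip(a, b))  # l' = 2l + a^T . b
    return RELS[rel](lhs, c)

# Example for x - y <= -2 (the formula of Figure 1): the two tracks encode
# x = 101 (= -3) and y = 001 (= 1) in 3-bit 2's complement, sign bits first.
print(accepts_atomic([1, -1], '<=', -2, [[1, 0], [0, 0], [1, 1]]))  # True
```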
Fig. 1. Example of an automaton for an atomic Presburger formula x − y ≤ −2 (states −∞, −2, −1, 0, 1, 2, +∞ and the initial state, with transitions labeled by bit-column letters).
2.2 Formal Description of the Automaton
An (infinite-state) automaton corresponding to an atomic Presburger formula f is defined as follows: Af = (S, Bn, δ, sinitial, Sacc), where
– S = Z ∪ {sinitial} is the set of states, where Z is the set of integers and sinitial ∉ Z;
– sinitial is the start state;
– Bn is the alphabet, which is the set of n-bit vectors, B = {0, 1};
– the transition function δ : S × Bn → S is defined as follows:
    δ(sinitial, b) = −aT · b
    δ(l, b) = 2l + aT · b
  where l ∈ Z is a non-initial state;
– the set of accepting states is
    Sacc = {l ∈ Z | l ∼ c} ∪ ({sinitial} if aT · 0 ∼ c, and ∅ otherwise).
In the rest of this section we show how this infinite automaton can be converted into an equivalent finite-state automaton. Intuitively, there is a certain finite range of values of the LHS of f such that if Af transitions outside of this range, it starts diverging, or “moving away” from this range, and is guaranteed to stay outside of this range and on the same side of it (i.e. diverging to +∞ or −∞). We show that all of the states outside of the range can be collapsed into only two states (representing +∞ and −∞ respectively), and that those states can be meaningfully labeled as accepting or rejecting without affecting the language of the original automaton Af .
Definition 2. For a vector of LHS coefficients aT = (a1, . . . , an) define

  ||aT||− = Σ_{i | ai < 0} |ai|    and    ||aT||+ = Σ_{i | ai > 0} |ai|.

Notice that both ||aT||− and ||aT||+ are non-negative. Let b denote an n-bit binary vector, that is, b ∈ Bn. Observe that −aT · b ≤ ||aT||− for any value of b, since the expression −aT · b can be rewritten as

  −aT · b = Σ_{j | aj < 0} |aj|bj − Σ_{i | ai > 0} |ai|bi.

Therefore, the largest positive value of −aT · b can be obtained by setting bi to 0 whenever ai > 0, and setting bj to 1 when aj < 0, in which case −aT · b = ||aT||−. It is clear that any other assignment to b can only make −aT · b smaller. Similarly, aT · b ≤ ||aT||+.

Lemma 3. Given an atomic Presburger formula aT · x ∼ c, a corresponding automaton Af as defined in Section 2.2, and a current state of the automaton l ∈ Z, the following two claims hold:
1. If l > ||aT||−, then any next state l′ will satisfy l′ > l.
2. If l < −||aT||+, then any next state l′ will satisfy l′ < l.

Proof. The upper bound (claim 1). Assume that l > ||aT||− for some state l ∈ Z. Then the next state l′ satisfies the following: l′ = 2l + aT · b ≥ 2l − ||aT||− > 2l − l = l. The lower bound (claim 2) is similar to the proof of claim 1.

We now discuss bounds on the states of the automata based on Lemma 3. From this lemma it is easy to see that once the automaton reaches a state outside of the range
[min(−||aT||+, c), max(||aT||−, c)], it is guaranteed to stay outside of this range and on the same side of it. That is, if it reaches a state l < min(−||aT||+, c), then l′ < min(−||aT||+, c) for any subsequent state l′ that it can reach from l. If the relation ∼ in f is an equality, then l = c is guaranteed to be false from the moment Af transitions to l onward. Similarly, it will be false forever when ∼ is ≥ or >; however it will always be true for < and ≤ relations. In any case, either all of the states l of the automaton Af below min(−||aT||+, c) are accepting, or
all of them are rejecting. Since the automaton will never leave this set of states, it will either always accept any further inputs or always reject. Therefore, replacing all states below min(−||aT ||+ , c) with one single state s−∞ with a self-loop transition for all inputs and marking this state appropriately as accepting or rejecting will result in an automaton equivalent to the original Af . Exactly the same line of reasoning applies to the states l > max(||aT ||− , c), and they all can be replaced by just one state s+∞ with a self-loop for all inputs. Formally, the new finite automaton has the set of states
S = [min(−||aT||+, c), max(||aT||−, c)] ∪ {sinitial, s−∞, s+∞}.

Transitions within the range coincide with the transitions of the original (infinite) automaton Af. If in the original automaton l′ = δ(l, b) for some state l and input b, and l′ > max(||aT||−, c), then in the new automaton the corresponding next state is δ′(l, b) = s+∞, and subsequently, δ′(s+∞, b) = s+∞ for any input b. Similarly, if the next state l′ < min(−||aT||+, c), then the new next state is s−∞, and the automaton remains in s−∞ forever:

  δ′(sinitial, b) = −aT · b
  δ′(s+∞, b) = s+∞
  δ′(s−∞, b) = s−∞
  δ′(l, b) = s+∞,          if 2l + aT · b > max(||aT||−, c)
             s−∞,          if 2l + aT · b < min(−||aT||+, c)
             2l + aT · b,  otherwise.

The accepting states within the range are those that satisfy the ∼ relation. The new "divergence" states are labeled accepting if the ∼ relation holds for some representative state. For instance, for a formula aT · x < c the state s−∞ is accepting, and s+∞ is rejecting. Finally, the initial state sinitial is accepting if and only if it is accepting in the original infinite automaton.

We can use the bounds from Lemma 3 to repeat the analysis from [19] for the number of states of the automaton and obtain new bounds tighter by a factor of 2. Since we have to know the bounds in advance when constructing an SMV model, this saves one bit of state for every atomic formula. Asymptotically, of course, our new bounds stay the same as in [19].
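As an illustration of the construction just described, the following Python sketch (our own, not the paper's tool) builds the finite automaton for a single atomic formula explicitly and decides its satisfiability by reachability of an accepting state. It is only practical for tiny coefficients, which is exactly the limitation that the symbolic encoding of the next section removes.

```python
# Illustrative sketch: explicit finite automaton for a^T x ~ c with the
# divergence states s+inf / s-inf, and an emptiness check by BFS.
from itertools import product
from collections import deque
import operator

RELS = {'=': operator.eq, '!=': operator.ne, '<': operator.lt,
        '<=': operator.le, '>': operator.gt, '>=': operator.ge}

def atomic_satisfiable(a, rel, c):
    n = len(a)
    norm_minus = sum(-ai for ai in a if ai < 0)   # ||a||-
    norm_plus = sum(ai for ai in a if ai > 0)     # ||a||+
    lo, hi = min(-norm_plus, c), max(norm_minus, c)
    holds = RELS[rel]

    def step(state, b):
        dot = sum(ai * bi for ai, bi in zip(a, b))
        if state in ('+inf', '-inf'):
            return state
        nxt = -dot if state == 'init' else 2 * state + dot
        if nxt > hi:
            return '+inf'
        if nxt < lo:
            return '-inf'
        return nxt

    def accepting(state):
        if state == 'init':        # the empty string encodes the all-zero assignment
            return holds(0, c)
        if state == '+inf':        # all states above the range agree on ~, use hi+1
            return holds(hi + 1, c)
        if state == '-inf':        # all states below the range agree on ~, use lo-1
            return holds(lo - 1, c)
        return holds(state, c)

    seen, queue = {'init'}, deque(['init'])
    while queue:
        s = queue.popleft()
        if accepting(s):
            return True            # an accepting state is reachable: L(Af) is non-empty
        for b in product((0, 1), repeat=n):
            t = step(s, b)
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return False

print(atomic_satisfiable([1, -1], '<=', -2))   # x - y <= -2 is satisfiable: True
```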
3 Implementation
In the previous section we have shown a mathematical construction of a deterministic finite-state automaton corresponding to a quantifier-free Presburger formula f. In practice, building such an automaton explicitly is very inefficient, since the number of states is proportional to the value of the coefficients in aT and the right hand side constant c and, most importantly, the number of transitions from each state is exponential (2ⁿ) in the number of variables in f.
Instead, we use an existing symbolic model checker, SMV [13], as a means to build the symbolic representation of the automaton and check its language for emptiness. Symbolic model checking expresses a design as a finite-state automaton, and then properties of this design are checked by traversing the states of the automaton. In the past decade, there has been a lot of research in boosting the performance of model checkers. The most notable breakthrough came in the early 90s when binary decision diagrams [4] (BDDs) were successfully used in model checking [13], pushing the tractable size of an automaton to as many as 10²⁰ states and beyond [6]. Therefore, it is only natural to try to utilize such powerful and well-developed techniques for handling finite-state automata in checking the satisfiability of Presburger formulas. The obvious advantages of this approach are that state-of-the-art verification engines such as SMV are readily available, and the only remaining task is to transform the emptiness problem for an automaton into a model checking problem efficiently. In addition, with SMV we exploit the efficient BDD representation for both states and transitions of the automata, whereas in the other automata-based approaches like MONA or LASH the states are represented explicitly. We have performed all of our experiments with the CMU version of the SMV model checker.

Although the SMV language allows us to express the automaton and its transitions directly in terms of arithmetic expressions, the cost of evaluating these expressions in SMV is prohibitively high. Internally, SMV represents all the state variables as vectors of boolean variables. Similarly, the representation of the transition relation is a function (strictly speaking, SMV constructs a transition relation which does not have to be a function, but here it is indeed a function, so this distinction is not important) that takes boolean vectors of the current state variables and the inputs and returns new boolean vectors for the state variables in the next state.

Fig. 2. Circuit implementing a finite-state automaton.
Effectively, SMV builds an equivalent of a sequential digital circuit operating on boolean signals, as shown in Figure 2. The current state of the automaton is stored in the register R. The next state is computed by a combinational circuit from the value of the current state and the new inputs, and the result is latched back into the register R at the next clock cycle. A special tester circuit checks whether the current state is accepting,
and if it is, the sequence of inputs read so far (or the string in our original terminology) is accepted by the automaton (and represents a solution to f). The property that we check is that the output of the circuit never becomes 1 for any sequence of inputs. In the logical specification language of SMV, this is written as AG(output ≠ 1). If this property is true, then the language of the automaton is empty, and the original formula f is unsatisfiable. If this property is violated, SMV generates a counterexample trace, which is a sequence of transitions leading to an accepting state. This trace represents a satisfying assignment to the formula f.

The translation of the arithmetic expressions to such a boolean circuit is the primary bottleneck in SMV. Hence, providing the circuit explicitly greatly speeds up the process of building the transition relation. A relatively simple Perl script generates such a circuit and the property very efficiently and transforms it into an SMV description. The structure of the resulting SMV code follows very closely the mathematical definition of the automaton, but all the state variables are explicitly represented by several boolean variables, and all the arithmetic operations are converted into combinational circuits (or, equivalently, boolean expressions). In particular, ripple-carry adders are used for addition, "shift-and-add" circuits implement multiplication by a constant, and comparators implement equality and inequality relations in the tester circuit.
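The following Python sketch illustrates the kind of bit-level next-state expressions such a generator might produce; it is our own simplification, not the authors' Perl script (which emits SMV syntax rather than plain strings), and the helper names are hypothetical. It builds expression strings for 2l + aT · b with a symbolic ripple-carry adder, a shift for the multiplication by 2, and a shift-and-add style encoding of each coefficient.

```python
# Illustrative sketch: symbolic next-state expressions for the register holding
# the LHS value.  Expressions are plain strings; a real generator would map
# them to the SMV input language instead.

def full_adder(a, b, cin):
    """One full-adder cell over symbolic bits (expression strings)."""
    s = f"({a} ^ {b} ^ {cin})"
    cout = f"(({a} & {b}) | ({cin} & ({a} | {b})))"
    return s, cout

def ripple_add(xs, ys, cin="0"):
    """Bitwise sum of two equally long bit-vectors, least significant bit first."""
    out = []
    for a, b in zip(xs, ys):
        s, cin = full_adder(a, b, cin)
        out.append(s)
    return out

def shift_left(bits):
    """Multiply by two: a constant 0 enters at the LSB; the old top bit is dropped,
    which is safe in this sketch only because the range analysis of Section 2
    redirects out-of-range values to the divergence states."""
    return ["0"] + bits[:-1]

def scaled_input(bit, coeff, width):
    """Two's-complement bit-vector for coeff * bit, where bit is one symbolic
    input bit: each constant bit of coeff is ANDed with the input bit."""
    value = coeff & ((1 << width) - 1)
    return [f"({bit} & {(value >> i) & 1})" for i in range(width)]

def next_state(state_bits, input_bits, coeffs):
    """Expressions for 2*l + a^T . b, built shift-and-add style."""
    acc = shift_left(state_bits)
    for bit, a in zip(input_bits, coeffs):
        acc = ripple_add(acc, scaled_input(bit, a, len(state_bits)))
    return acc

# Next-state equations for x - y <= -2 with a 4-bit register l0..l3 (LSB first).
for i, e in enumerate(next_state(["l0", "l1", "l2", "l3"], ["x", "y"], [1, -1])):
    print(f"l{i}_next = {e}")
```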
4 Experimental Results
Since the satisfiability problem for quantifier-free Presburger arithmetic is NP-complete, the hope that it has an efficient general purpose decision procedure is quite thin. Therefore, for practical purposes, it is more important to collect several different methods and evaluate their performance on different classes of formulas. When strengths and weaknesses of each of the approaches and tools are identified, it is easier to pick the best one for solving concrete problems that arise in practice.

The primary purpose of our experiments is to study the performance of automata-based and ILP-based methods and their variations depending on different parameters of Presburger formulas. The tools and approaches that we picked are the following:

– Automata-based tools:
  • Our approach using the SMV model checker (we refer to it as "SMV");
  • LASH [19], a direct implementation of the automata-based approach dedicated to Presburger arithmetic;
  • MONA [10], an automata-based solver for WS1S and a general-purpose automata library.
– Approaches based on Integer Linear Programming (ILP):
  • LP SOLVE, simplex-based open source tool with branch-and-bound for integer constraints;
  • CPLEX, one of the best commercial simplex-based LP solvers;
  • OMEGA [16], a tool based on the Fourier-Motzkin algorithm [18].
The benchmarks consist of many randomly generated, relatively small quantifier-free Presburger formulas. The examples have three main parameters: the number of variables, the number of atomic formulas (the resulting formula is a conjunction of atomic formulas), and the maximum value of the coefficients. For each set of parameters we generate 5 random formulas and run this same set of examples through each of the tools. The results of the comparisons appear in Figures 3, 4, and 5 as plots showing how the execution time of each automata-based tool depends on some particular parameter with the other parameters fixed, and the success rate of all the tools for the same parameters. Each point in the run-time graphs represents a successful run of an experiment in a particular tool. That is, if a certain tool has fewer points in a certain range, then it means it failed more often in this range (ran out of memory or time, hit a fatal error, etc.). The ILP tools either complete an example within a small fraction of a second, or fail. Therefore the run-time is not as informative for ILP tools as the number of completed examples, and hence, only the success rates for those are shown.

In the case of MONA, the only readily available input language is WS1S, and we have found that translating Presburger formulas into WS1S is extremely inefficient. Even rather simple examples which SMV and LASH solve in no time take significant time in MONA. Due to this inefficient translation, the comparison of MONA with other approaches is not quite fair. Therefore, it is omitted from the graphs and will not be considered in our discussion further.

LASH and SMV both have obvious strengths and weaknesses that can be easily characterized. SMV suffers the most from the number of atomic formulas, as can be seen from Figure 3 where the run-time is plotted as a function of the number of atomic formulas. The largest number of formulas it could handle in this batch is 11, whereas the other tools including LASH finished most of the experiments with up to 20 atomic formulas. This suggests that the implementation of the parallel composition of automata for atomic formulas in SMV is suboptimal. LASH apparently has a better way of composing automata. Varying the number of variables (Figure 4) makes SMV and LASH look very much alike. Both tools can complete all of the experiments, and the run-time grows approximately exponentially with the number of variables and at the same rate in both tools. This suggests that the BDD-like structure for the transitions in LASH indeed behaves very similarly to BDDs in SMV. However, since the number of states in the automata is proportional to the values of the coefficients, LASH cannot complete any of the experiments with coefficients larger than 4096 and fails on many experiments even with smaller values. SMV, on the other hand, can handle coefficients as large as 2³⁰ with only a moderate increase of the runtime and the failure rate. We attribute this behavior to the fact that in SMV both the states and the transitions of the automata are represented with BDDs, while in LASH (and all the other available automata-based tools) the states are always represented explicitly.

Finally, we have to say a few words about the ILP-based methods. First of all, these methods are greatly superior to the automata-based ones in general, and they do not exhibit any noticeable increase in run-time when the number of variables or the number of formulas increases.
The only limiting factor for the ILP tools is the value of the coefficients, which causes
Fig. 3. Run-time and the number of completed experiments depending on the number of atomic formulas in each test case (number of variables = 4, max. coefficient size = 32; run-time in seconds for SMV and LASH, and completed-experiment counts for SMV/LASH and Omega/LP SOLVE/CPLEX).
many failures and overflows starting at about 10⁷, especially in LP SOLVE. Although all of the successful runs of the ILP-based tools are well under a fraction of a second, there are also many failures due to a non-terminating branch-and-bound search, overflow exceptions, and program errors. OMEGA is especially notorious for segmentation faults, and its failure rate greatly increases when the values of the coefficients approach the limit of the machine-native integer or float representation. Despite the overall superiority of the ILP-based methods over the automata-based ones, there are a few cases where the ILP methods fail while the automata-based methods work rather efficiently. The most interesting class of such examples can be characterized as follows. The formula must have a solution in real numbers, but the integer solutions either do not exist or they are rather sparse in the feasibility set (the set of real solutions) of the formula. Additionally, the direct implementation of the branch-and-bound method is incomplete when the feasibility set is unbounded, since there are infinitely many integer points that have to be checked. This claim still holds to some extent even in the
Fig. 4. Run-time and the number of completed experiments depending on the number of variables in a single atomic formula (number of formulas = 1, max. coefficient size = 32; run-time in seconds for SMV and LASH, completed-experiment counts for OMEGA/LP SOLVE/CPLEX). SMV and LASH finish all of the experiments, hence there is no bar chart for those.
heuristic-rich top quality commercial tools such as CPLEX, and we have observed their divergence on a few examples that are trivial even for the automata-based techniques. The OMEGA approach stands out from the rest of ILP tools since it is based on the Fourier-Motzkin method which is complete for integer linear constraints. Unfortunately, the only readily available implementation of this method is very unstable. Another common weakness of all of the ILP-based approaches is the limit of the coefficient and solution values due to the rounding errors of native computer arithmetic. It is quite easy to construct an example with large integer coefficients for which CPLEX
Fig. 5. Run-time and the number of completed examples depending on the (maximum) values of the coefficients in a single atomic formula (number of variables = 4, number of formulas = 1; run-time in seconds for SMV and LASH, and completed-experiment counts for SMV/LASH and OMEGA/LP SOLVE/CPLEX, plotted against log₂ of the maximum coefficient).
returns a plainly wrong answer. Large coefficients can be extremely useful in hardware verification when operations on long bit-vectors are translated into Presburger arithmetic. We conjecture that the efficiency of the ILP methods highly depends on the use of computer arithmetic, and the only fair comparison with automata-based methods can be done if the ILP tools use arbitrary precision arithmetic.
5 Conclusion
Efficient decision procedures for Presburger arithmetic are key to solving many formal verification problems. We have developed a decision procedure based on the idea of converting the satisfiability problem into a model checking problem. Experimental comparisons show that our method can be more efficient than other automata-based methods like LASH and MONA, particularly for formulas with large coefficients. In our approach
we use BDDs both for the states and the transitions of the automata, while LASH and MONA use BDDs or similar structures only for the transitions. As an additional theoretical result, we provide tighter bounds for the number of states of the automata. This makes our automaton construction in SMV even more efficient. Another advantage of our approach is that converting the satisfiability problem into a model checking problem requires very little implementation effort. We exploit the existing SMV model checker as a back-end, which employs a very efficient BDD package. Therefore, the only effort required from us is the translation of a Presburger formula into the SMV input language.

In addition, we compare various automata- and ILP-based approaches on a suite of parameterized, randomly generated Presburger formulas. For every approach we identify classes of Presburger formulas for which it either performs very poorly or very efficiently. For instance, we found that the ILP-based tools are more likely to fail on examples with unbounded but sparse solution sets and cannot handle large coefficients due to the use of native machine arithmetic. The automata-based tools are not as sensitive to these parameters. On the other hand, ILP-based approaches scale much better in the number of variables and atomic formulas. We also believe that the ILP tools have an unfair advantage over the automata methods due to the use of native arithmetic. However, until further experiments are done with an ILP tool with support for arbitrarily large integers we cannot tell how much difference it makes. Within the automata-based approaches SMV scales better with the coefficients' size, but displays poorer performance for a large number of atomic formulas when compared to LASH. Both perform equally well as the number of variables is varied.

The reason the other tools do not use BDDs for the states is that they perform quantifier elimination by manipulating the automata directly. Namely, each quantifier alternation requires projection and determinization of the automaton. The use of BDDs for the states can make the implementation of the determinization step particularly hard. This difference is one of the reasons for the relative efficiency of our approach. The extension of our approach to full Presburger arithmetic can be done by combining it with the traditional quantifier elimination method [12]. This method introduces a new type of atomic formulas with the divisibility operator: aT · x | c, and our automaton construction can be easily extended to handle it. We also believe that our approach may prove useful for other theories and logics which use automata-based decision procedures.
References

1. Tod Amon, Gaetano Borriello, Taokuan Hu, and Jiwen Liu. Symbolic timing verification of timing diagrams using Presburger formulas. In Design Automation Conference, pages 226–231, 1997.
2. Alexandre Boudet and Hubert Comon. Diophantine equations, Presburger arithmetic and finite automata. In H. Kirchner, editor, Colloquium on Trees in Algebra and Programming (CAAP'96), volume 1059 of Lecture Notes in Computer Science, pages 30–43. Springer Verlag, 1996.
3. R. Brinkmann and R. Drechsler. RTL-datapath verification using integer linear programming. In IEEE VLSI Design'01 & Asia and South Pacific Design Automation Conference, Bangalore, pages 741–746, 2002.
4. R. E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, 35(8):677–691, 1986.
5. J. R. Büchi. Weak second-order arithmetic and finite automata. Zeitschrift für mathematische Logik und Grundlagen der Mathematik, 6:66–92, 1960.
6. J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and L. J. Hwang. Symbolic model checking: 10²⁰ states and beyond. Information and Computation, 98:142–170, 1992.
7. E. M. Clarke, E. A. Emerson, and A. P. Sistla. Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Transactions on Programming Languages and Systems, 8(2):244–263, 1986.
8. D. C. Cooper. Theorem proving in arithmetic without multiplication. In Machine Intelligence, volume 7, pages 91–99, New York, 1972. American Elsevier.
9. George B. Dantzig and B. Curtis Eaves. Fourier-Motzkin elimination and its dual. Journal of Combinatorial Theory (A), 14:288–297, 1973.
10. Jacob Elgaard, Nils Klarlund, and Anders Møller. Mona 1.x: new techniques for WS1S and WS2S. In Computer Aided Verification, CAV '98, Proceedings, volume 1427 of LNCS. Springer Verlag, 1998.
11. P. Johannsen and R. Drechsler. Formal verification on the RT level computing one-to-one design abstractions by signal width reduction. In IFIP International Conference on Very Large Scale Integration (VLSI'01), Montpellier, 2001, pages 127–132, 2001.
12. G. Kreisel and J. Krivine. Elements of mathematical logic, 1967.
13. K. L. McMillan. Symbolic Model Checking: An Approach to the State Explosion Problem. Kluwer Academic Publishers, 1993.
14. Derek C. Oppen. A 2^(2^(2^(pn))) upper bound on the complexity of Presburger arithmetic. Journal of Computer and System Sciences, 16(3):323–332, June 1978.
15. M. Presburger. Über die Vollständigkeit eines gewissen Systems der Arithmetik ganzer Zahlen, in welchem die Addition als einzige Operation hervortritt. In Comptes Rendus du Premier Congrès des Mathématiciens des Pays Slaves, pages 92–101, 395, Warsaw, 1927.
16. William Pugh. The omega test: a fast and practical integer programming algorithm for dependence analysis. In Supercomputing, pages 4–13, 1991.
17. T. R. Shiple, J. H. Kukula, and R. K. Ranjan. A comparison of Presburger engines for EFSM reachability. In A. J. Hu and M. Y. Vardi, editors, Proceedings of the 10th International Conference on Computer Aided Verification, volume 1427, pages 280–292. Springer-Verlag, 1998.
18. H. P. Williams. Fourier-Motzkin elimination extension to integer programming problems. Journal of Combinatorial Theory (A), 21:118–123, 1976.
19. Pierre Wolper and Bernard Boigelot. On the construction of automata from linear arithmetic constraints. In Proc. 6th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, volume 1785 of Lecture Notes in Computer Science, pages 1–19, Berlin, March 2000. Springer-Verlag.
Qubos: Deciding Quantified Boolean Logic Using Propositional Satisfiability Solvers

Abdelwaheb Ayari and David Basin

Institut für Informatik, Albert-Ludwigs-Universität Freiburg, Germany.
www.informatik.uni-freiburg.de/~{ayari,basin}
Abstract. We describe Qubos (QUantified BOolean Solver), a decision procedure for quantified Boolean logic. The procedure is based on nonclausal simplification techniques that reduce formulae to a propositional clausal form after which off-the-shelf satisfiability solvers can be employed. We show that there are domains exhibiting structure for which this procedure is very effective and we report on experimental results.
1 Introduction
In recent years there has been considerable work on developing and applying satisfiability (SAT) solvers for quantified Boolean logic (QBL). Applications include program verification using bounded model checking [3] and bounded model construction [1], hardware applications including testing and equivalence checking [17], and artificial intelligence tasks like planning [14]. Solvers for (unquantified) Boolean logic have reached a state of maturity; there are many success stories where SAT-solvers such as [11,19,22] have been successfully applied to industrial scale problems. However, the picture for QBL is rather different. Despite the growing body of research on this topic, the current generation of Q(uantified)SAT-solvers [8,10,15] are still in their infancy. These tools work by translating QBL formulae to formulae in a quantified clausal normal form and applying extensions of the Davis-Putnam method to the result. The extensions concern generalizing Davis-Putnam heuristics such as unit propagation and backjumping. These tools have not yet achieved the successes that SAT tools have, and our understanding of which classes of formulae these procedures work well on, and why, is also poor.

In this article, we present a different approach to the QSAT problem. It arose from our work in bounded model construction for monadic second-order logics [1], where we reduce the problem of finding small models for monadic formulae to QBL satisfiability. Our experience with available QBL solvers was disappointing. Their application to formulae involving more than a couple quantifier iterations would often fail, even for fairly simple formulae. In particular, our model construction procedure generates formulae where the scope of quantification is generally small in proportion to the overall formula size and in many cases quantifiers can be eliminated, without blowing up the formulae, by combining quantifier elimination with simplification. This motivated our work on a procedure based on combining miniscoping (pushing quantifiers in, in contrast
to out, which is used in clause based procedures), quantifier expansion, and eager simplification using a generalization of Boolean constraint propagation. The transformation process is carried out until the result has only one kind of quantifier remaining, at which point the result can be converted to clausal form and given to an off-the-shelf (Boolean) SAT-solver.

Our thesis in this paper is that our decision procedure works well (it is superior to other state-of-the-art approaches) when certain kinds of structure are present in the problems to be solved. Our contribution is to identify a notion of structure based on relative quantifier scope, to show that certain classes of problems will naturally have this structure (i.e., that the ideas presented in this paper have general applicability), and to validate our thesis experimentally. Our experimental comparison is on two sets of problems, those arising in bounded model construction, which always exhibit significant structure, and those arising in conditional planning, which have varying degrees of structure.

Related Work. The idea of tuning a solver to exploit structure also arises in bounded model checking, where SAT-solvers are tuned to exploit the problem-specific structure arising there. In [18], such heuristics were embedded within a generic SAT algorithm that generalizes the Davis-Putnam procedure. Similar techniques to miniscoping and quantifier expansion are also used in Williams et al. [20] to optimize different computation tasks like the calculation of fixed points. Most QBL algorithms generalize the Davis-Putnam procedure to operate on formulae transformed into quantified clausal normal form. Cadoli et al. [6] and Rintanen [16,15] present different heuristic extensions of the Davis-Putnam method. Cadoli et al.'s techniques were tuned for randomly generated problems and Rintanen's strategies were specially designed for planning problems whose quantifiers have a fixed ∃∀∃-structure. Other work includes that of Letz [10] and Giunchiglia et al. [7] who have generalized the backjumping heuristic (also called dependency-directed backtracking) to QBL. Our approach differs from all of these in that it is not based on Davis-Putnam, it can operate freely on subformulae of the input formula (this avoids a major source of inefficiency of Davis-Putnam based procedures, namely that the selection of branching variables is strongly restricted by the ordering induced by the prefix of the input formula), and for structured problems (in our sense) it yields significantly better results.

The most closely related work is that of Plaisted et al. [13] who present a decision procedure for QBL that also operates directly on quantified Boolean formulae by iteratively applying equivalence preserving transformations. However, rather than expanding quantifiers, in their approach a subformula with a set of free variables X is replaced by a large conjunction of all negated evaluations of X that make the subformulae unsatisfiable. Plaisted et al. [13] suggest that their procedure should work well for hardware systems that have structure in the sense of being "long and thin"; as indicated by their examples (ripple-carry adders), these systems form a subclass of well-structured problems in our sense. As no implementation is currently available, we were unable to compare our approaches experimentally.
Organization. The rest of the paper is organized as follows. In Section 2, we provide background on QBL and introduce notation. In Section 3, we explain what kind of structure we will exploit and why certain classes of problems are naturally structured. In Section 4, we introduce our procedure and in Section 5, we present experimental results. Finally, in Section 6, we draw conclusions.
2 Background
The formulae of Boolean logic (BL) are built from the constants ⊤ and ⊥, the variables x ∈ V, and are closed under the standard connectives ¬ (negation), ∧ (conjunction), ∨ (disjunction), → (implication), and ↔ (logical equivalence). The formulae φ are interpreted in B = {0, 1}. A substitution σ : V → B is a mapping from variables to truth values that is extended homomorphically to formulae. We say σ satisfies φ if σ(φ) = 1.

Quantified Boolean logic (QBL) extends Boolean logic by allowing quantification over Boolean variables, i.e., ∀x. φ and ∃x. φ. A substitution σ satisfies ∀x. φ if σ satisfies φ[⊤/x] ∧ φ[⊥/x], and dually a substitution σ satisfies ∃x. φ if σ satisfies φ[⊤/x] ∨ φ[⊥/x]. As notational shorthand, we allow quantification over sets of variables and we write Qx1, . . . , xn. φ for the formula Qx1. · · · Qxn. φ, where Q ∈ {∀, ∃}. We denote by free(φ) the set of free variables in φ. Unless indicated otherwise, by "formulae" we mean quantified Boolean formulae instead of (unquantified) Boolean formulae.

A formula x or ¬x, where x is a variable, is called a literal. A formula φ is in negation normal form (nnf) if, besides the quantifiers, it contains only the connectives ∨, ∧ and ¬, and ¬ appears only before variables. A formula φ is in prenex normal form (pnf) if it has the form Q1X1 · · · QkXk. ψ where Qi ∈ {∃, ∀}, each Xi is a finite set of variables, and ψ is a Boolean formula called the matrix of φ. A formula φ is in quantified clausal normal form (qcnf) if it is in pnf and its matrix is a conjunction of disjunctions of literals.

We define the prefix-type of a formula in pnf inductively as follows. A Boolean formula has the prefix-type Σ0 = Π0. A formula ∀x. φ has the prefix-type Πn+1 (respectively Πn) if φ has the prefix-type Σn (respectively Πn). A formula ∃x. φ has the prefix-type Σn+1 (respectively Σn) if φ has the prefix-type Πn (respectively Σn). Finally, the size of a formula φ, denoted by |φ|, is the number of variable occurrences, connectives and (maximal) quantifier blocks in φ, i.e., the size of the abstract syntax tree for φ, where like quantifiers are grouped in blocks and only counted once.
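The semantics via substitution of ⊤ and ⊥ can be made concrete with a small evaluator. The following Python sketch is our own illustration (not part of Qubos) over a simple tuple representation of formulae.

```python
# Illustrative sketch (not from the paper): QBL semantics by literal quantifier
# expansion.  Formulae are tuples: ('var', x), ('not', f), ('and', f, g),
# ('or', f, g), ('forall', x, f), ('exists', x, f), plus the constants True/False.

def evaluate(phi, sigma):
    """sigma maps the free variables of phi to booleans."""
    if isinstance(phi, bool):
        return phi
    tag = phi[0]
    if tag == 'var':
        return sigma[phi[1]]
    if tag == 'not':
        return not evaluate(phi[1], sigma)
    if tag == 'and':
        return evaluate(phi[1], sigma) and evaluate(phi[2], sigma)
    if tag == 'or':
        return evaluate(phi[1], sigma) or evaluate(phi[2], sigma)
    if tag in ('forall', 'exists'):
        x, body = phi[1], phi[2]
        vals = [evaluate(body, {**sigma, x: v}) for v in (True, False)]
        return all(vals) if tag == 'forall' else any(vals)
    raise ValueError(f"unknown connective {tag!r}")

# forall x. exists y. (x <-> y), written with and/or/not only:
phi = ('forall', 'x', ('exists', 'y',
       ('or', ('and', ('var', 'x'), ('var', 'y')),
              ('and', ('not', ('var', 'x')), ('not', ('var', 'y'))))))
print(evaluate(phi, {}))   # True
```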
3 Structured Problems
Our thesis is that our decision procedure works well (in particular, it is superior to other state-of-the-art approaches) when certain kinds of structure are present in the problems to be solved. In this section we explain what structure is, how one measures it, and why certain classes of problems will naturally have this structure.
The structure we exploit is based on a notion of quantifier scope, in particular the size of quantified subterms relative to the size of the entire term. When the average quantifier scope is small, our transformations can often successfully eliminate quantifiers in manageable time and space. In our experiments, it is important to be able to measure structure to assess its effects on the decision procedure's performance. Our measure is based on the average quantifier weight, defined as follows:

Definition 1. Let φ be a quantified Boolean formula, Q ∈ {∀, ∃}, MQ be the multiset of all Q-quantified subformulae of φ, and ψ ∈ MQ. The relative Q-weight of ψ with respect to φ is rwφQ(ψ) = |ψ| / |φ|. The average Q-weight of φ is

  awQ(φ) = (1 / |MQ|) · Σψ∈MQ rwφQ(ψ).

Now, well-structured formulae are those with either a small average ∀-weight or a small average ∃-weight (typically under 5%, as we will see for the first problem domain we consider), i.e., those in which, for at least one of the quantifiers, quantified variables have small scopes on average. In contrast, poorly structured formulae with large average weight have many quantifiers with large scopes.

The two domains we consider are system verification using bounded model construction [1], and conditional planning [14]. For the first domain, we show that problems are always well-structured. In the second domain, the degree of structure varies considerably. The corresponding effectiveness of our decision procedure also varies in relationship to this structure.
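As a minimal sketch of how this measure might be computed (our own code, over the same tuple representation as the evaluator sketch in Section 2; note that a quantifier block is a single node there, so blocks are counted once as in the size measure):

```python
# Illustrative sketch: the size measure |.| and the average Q-weight of Definition 1.

def size(phi):
    if isinstance(phi, bool) or phi[0] == 'var':
        return 1
    if phi[0] == 'not':
        return 1 + size(phi[1])
    if phi[0] in ('and', 'or'):
        return 1 + size(phi[1]) + size(phi[2])
    return 1 + size(phi[2])              # 'forall' / 'exists' block

def q_subformulae(phi, tag):
    """Multiset (list) of all tag-quantified subformulae of phi."""
    if isinstance(phi, bool) or phi[0] == 'var':
        return []
    if phi[0] == 'not':
        return q_subformulae(phi[1], tag)
    if phi[0] in ('and', 'or'):
        return q_subformulae(phi[1], tag) + q_subformulae(phi[2], tag)
    rest = q_subformulae(phi[2], tag)
    return ([phi] if phi[0] == tag else []) + rest

def average_weight(phi, tag):
    subs = q_subformulae(phi, tag)
    if not subs:
        return 0.0
    return sum(size(psi) / size(phi) for psi in subs) / len(subs)

phi = ('forall', 'x', ('exists', 'y',
       ('or', ('and', ('var', 'x'), ('var', 'y')),
              ('and', ('not', ('var', 'x')), ('not', ('var', 'y'))))))
print(average_weight(phi, 'forall'), average_weight(phi, 'exists'))
```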
3.1 Bounded Model Construction
Bounded model construction (BMC) [1] is a method for generating models for a monadic formula by reducing its satisfiability problem to a QBL satisfiability problem. This method has been applied to problems in hardware verification, protocol verification, and reasoning about Java Bytecode correctness. We will present a small example to show how structured problems arise in BMC and then explain why this is generally the case. The example is reasoning about a parameterized family of ripple-carry adders: verifying the equivalence of adders in the family with their specification, for all parameter instances. The monadic formulae describing part of the implementation and specification of the adder are as follows:

  adder(n, A, B, S, ci, co) ≡ ∃C. (C(0) ↔ ci) ∧ (C(n) ↔ co) ∧ ∀p. p < n → full adder(A(p), B(p), S(p), C(p), C(p + 1))

  spec(n, A, B, S, ci, co) ≡ ∃C. (C(0) ↔ ci) ∧ (C(n) ↔ co) ∧ ∀p. p < n → at least two(A(p), B(p), C(p), C(p + 1)) ∧ mod two(A(p), B(p), S(p), C(p))

The monadic second-order variables (written in capitals) A and B represent n-bit input vectors, S represents the n-bit output, C the (n + 1)-bit vector of carries, and the Booleans ci and co are the carry-in and carry-out respectively. The
specification of the n-bit adder, for example, states that an n-bit adder is built by chaining together (ripple-carry fashion) n copies of a full one-bit adder, where carries are propagated along an internal line of carries C. The specifications of the auxiliary formulae full adder, at least two and mod two are straightforward Boolean formulae and can be found in [2]. The equivalence between the specification and the implementation of the adder is stated by the formula

  Φ ≡ ∀n. ∀A, B, S. ∀ci, co. adder(n, A, B, S, ci, co) ↔ spec(n, A, B, S, ci, co).    (1)

In this example, BMC takes as input the negation of (1) and a natural number k. It produces a quantified Boolean formula as follows. First, first-order quantified subformulae are unfolded k times; that is, formulae having the form ∀x. φ (respectively, ∃x. φ), where x ranges over the natural numbers, are unfolded into the formula ⋀_{i∈{1,...,k}} φ[i/x] (respectively, ⋁_{i∈{1,...,k}} φ[i/x]). In our example, the quantification over n in (1) and over p in the predicates adder and spec are unfolded k times. Afterwards, second-order quantification is eliminated: each second-order variable is replaced with k Boolean variables. For example, ∀A is replaced with the quantifier block ∀a1, . . . , ak and every occurrence of the predicate A(i) is replaced with the Boolean variable ai. This kind of transformation produces a quantified Boolean formula whose size is O(k² |φ|) in the bound k and original formula φ. In general, applications to practical verification problems give rise to large quantified Boolean formulae, often on the order of 20 megabytes for the larger examples that we have tackled. Central to our approach here is the fact that the transformation always produces formulae with a large amount of structure, as we explain below.

In the above transformation, large formulae (due to the k² factor in the expansion) result from expanding first-order quantification. In this example, we quantify outermost over n in stating our correctness theorem, and this is always the case when verifying theorems about parameterized systems. Similarly, when reasoning about time-dependent systems, like sequential circuits or protocols, one also always quantifies outermost over n, which represents time or the number of steps. The unfolding of this outermost quantifier alone explains the main reason why BMC results in a quantified Boolean formula of small average quantifier weight since, after the unfolding, the remaining quantified subformulae have a relative weight at most 1/k of the original formula. The unfolding of additional first-order quantifiers only serves to further reduce the average weight. Hence we have:

Lemma 1. Let Φ ≡ Q n. φ be a first-order quantified monadic formula where Q ∈ {∀, ∃} and let Φ′ (respectively φ′) be the result of the BMC expansion of Φ (respectively φ) with bound k ∈ N. It holds that awQ(Φ′) = (1/k) · awQ(φ′), for Q ∈ {∀, ∃}.

Of course, BMC also eliminates second-order quantification, where a second-order quantifier is replaced with a block of Boolean quantifiers. In general, this has a negligible effect on the amount of structure since, after the outermost unfolding, these quantifiers have small relative scope. It follows then that BMC
produces well-structured problems. Moreover, there is a positive correlation between problem size (resulting from large values of k) and structure, which helps to explain the good performance of our decision procedure on problems in this class.
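To make the k-fold unfolding described above concrete, here is a minimal sketch under our own representation (node tags such as 'forallN' and 'pred' are ours, not the BMC tool's); for brevity it also omits the block ∀a1, . . . , ak of Boolean quantifiers that BMC introduces for each second-order quantifier.

```python
# Illustrative sketch: BMC-style unfolding of first-order quantifiers over the
# naturals, and replacement of second-order applications A(i) by Booleans A_i.
# Node forms: ('forallN', 'n', body), ('existsN', 'n', body), ('pred', 'A', term),
# plus the Boolean connectives from the earlier sketches.

def subst_term(phi, x, i):
    """Replace the first-order variable x by the constant i."""
    if not isinstance(phi, tuple):
        return phi
    if phi[0] == 'pred':
        return ('pred', phi[1], i if phi[2] == x else phi[2])
    return tuple(subst_term(p, x, i) for p in phi)

def unfold(phi, k):
    if not isinstance(phi, tuple):
        return phi
    if phi[0] in ('forallN', 'existsN'):
        op = 'and' if phi[0] == 'forallN' else 'or'
        parts = [unfold(subst_term(phi[2], phi[1], i), k) for i in range(1, k + 1)]
        out = parts[0]
        for p in parts[1:]:
            out = (op, out, p)
        return out
    return tuple(unfold(p, k) for p in phi)

def second_order_to_bool(phi):
    """A(i) becomes the Boolean variable A_i."""
    if not isinstance(phi, tuple):
        return phi
    if phi[0] == 'pred':
        return ('var', f"{phi[1]}_{phi[2]}")
    return tuple(second_order_to_bool(p) for p in phi)

# forallN n. A(n), unfolded with k = 3, becomes A_1 & A_2 & A_3:
print(second_order_to_bool(unfold(('forallN', 'n', ('pred', 'A', 'n')), 3)))
```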
3.2 Conditional Planning in Artificial Intelligence
The second problem domain that we use for experiments is conditional planning in QBL. A conditional planning problem is the task of finding a finite sequence of actions (which comprise a plan) whose successive application, starting from an initial state, leads to a goal state. Applications of conditional planning include robotics, scheduling, and building controllers. The main difference between conditional and classical planning is that the initial states as well as the moves from one state to another state depend on different circumstances that can be tested. This leads to interesting QBL problems. As shown in [14], finding a solution for a conditional planning problem can be expressed as a satisfiability problem for a quantified Boolean formula of the form:

  P ≡ ∃P1, . . . , Pm. ∀C1, . . . , Cn. ∃O1, . . . , Op. Φ .

The validity of the formula P means that there is a plan (represented by the variables P1, . . . , Pm) such that for any contingencies (represented by the variables C1, . . . , Cn) that could arise, there is a finite sequence of operations (O1, . . . , Op) whose applications allow one to reach the goal state starting from an initial state. The body Φ is a conjunct of formulae stating the initial states, goal states, and the next-state relation. If n = 0 then P encodes a classical (non-conditional) planning problem. In this case, the validity of P can be checked using a SAT-solver.

In the n ≠ 0 case, in general miniscoping can only partially succeed in pushing the quantifier ∃O1, . . . , Op down in Φ; this in turn limits the miniscoping of the other quantifiers, e.g., ∀C1, . . . , Cn. As a result, even after miniscoping, the average ∀-weight is

  (n + p + |Φ|) / (n + m + p + |Φ|) = 1 − m / (m + n + p + |Φ|),

which is high, up to 90%, for large n, m, p, and |Φ|. The average ∃-weight tends to be better since by pushing down, even partially, the ∃O1, . . . , Op, we increase the amount of (∃-)structure in P and we obtain better average weight, typically between 50% and 70%. Furthermore, the average ∃-weight generally becomes larger (respectively smaller) when we decrease (respectively increase) one of the factors p and |Φ|. Hence conditional planning gives us a potentially large spectrum of problems with differing amounts of structure. Moreover, there are standard databases of such planning problems that exhibit such variations, which we can use for testing.
proc Qubos(φ, SAT) ≡
  let Q ∈ {∀, ∃} be the quantifier kind with smallest awQ;
  while (φ contains Q's) do
    miniscope the quantifiers in φ;
    eliminate the innermost Q block;
    simplify φ;
  od;
  compute input α for SAT from φ;
  invoke SAT with the input α;
end

Fig. 1. The Qubos Main Loop
4 Qubos
We present in this section the decision procedure implemented by our system Qubos. The main idea is to iterate normalization using miniscoping with selective quantifier expansion and simplification. For well-structured problems, the combination often does not require significant additional space; we will provide experimental evidence for this thesis in Section 5.

The structure of the main routine of our decision procedure is given in Figure 1. It takes as arguments a quantified Boolean formula φ and a SAT-solver SAT. The initial step determines whether the average quantifier weight is smaller for ∀ or ∃. Afterwards Qubos iterates three transformations to reduce φ to a Boolean formula. As each iteration results in fewer Q-quantifiers, the procedure always terminates (given sufficient memory). At the end of this step, the formula φ contains only one kind of quantifier. Afterwards, Qubos computes the input formula of the SAT-solver SAT depending on the quantifier kind Q and whether SAT operates on Boolean formulae or on formulae in clausal form. If Q is the quantifier ∃ then Qubos deletes all the occurrences of Q and generates the input of SAT. If Q is the quantifier ∀ then Qubos also deletes all the occurrences of Q, negates the resulting formula, generates the input of SAT, and finally it complements the result returned by the SAT solver. Below, we describe the transformations used in the main loop in more detail.

Miniscoping. Miniscoping is the process of pushing quantifiers down inside a formula to their minimal possible scope. By reducing the scope of quantifiers, miniscoping reduces the size of the formula resulting from subsequent quantifier expansion. The following rules for miniscoping are standard (a small code sketch follows the rules):

  ∀x. φ ∧ ψ ⇒ (∀x. φ) ∧ ∀x. ψ
  ∀x. φ ∨ ψ ⇒ (∀x. φ) ∨ ψ,   if x ∉ free(ψ)
  ∀x. φ     ⇒ φ,             if x ∉ free(φ)
  ∃x. φ ∨ ψ ⇒ (∃x. φ) ∨ ∃x. ψ
  ∃x. φ ∧ ψ ⇒ (∃x. φ) ∧ ψ,   if x ∉ free(ψ)
  ∃x. φ     ⇒ φ,             if x ∉ free(φ)
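The following Python sketch (ours, over the tuple representation of the earlier sketches; not Qubos's C++ implementation) applies one of these rules at the top of a formula.

```python
# Illustrative sketch: one top-level application of the miniscoping rules above.

def free_vars(phi):
    if isinstance(phi, bool):
        return set()
    if phi[0] == 'var':
        return {phi[1]}
    if phi[0] == 'not':
        return free_vars(phi[1])
    if phi[0] in ('and', 'or'):
        return free_vars(phi[1]) | free_vars(phi[2])
    return free_vars(phi[2]) - {phi[1]}          # quantifier node

def miniscope_step(phi):
    """Push one quantifier one level down, if a rule applies."""
    if not isinstance(phi, tuple) or phi[0] not in ('forall', 'exists'):
        return phi
    q, x, body = phi
    if x not in free_vars(body):                      # vacuous quantifier
        return body
    dual = {'forall': 'and', 'exists': 'or'}[q]
    other = {'and': 'or', 'or': 'and'}[dual]
    if isinstance(body, tuple) and body[0] == dual:   # quantifier distributes
        return (dual, (q, x, body[1]), (q, x, body[2]))
    if isinstance(body, tuple) and body[0] == other:  # quantifier moves to one side
        f, g = body[1], body[2]
        if x not in free_vars(g):
            return (other, (q, x, f), g)
        if x not in free_vars(f):
            return (other, f, (q, x, g))
    return phi

phi = ('exists', 'y', ('and', ('var', 'y'), ('var', 'z')))
print(miniscope_step(phi))   # ('and', ('exists', 'y', ('var', 'y')), ('var', 'z'))
```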
Note that similar kinds of simplification are performed in first-order theorem proving, where quantifiers are pushed down to reduce dependencies and generate Skolem functions with minimal arities (see [12]). Although simple and intuitively desirable, other QSAT solvers work by maxiscoping, i.e., moving quantifiers outwards when transforming formulae to quantified clausal normal form.

Elimination of Quantified Variables. We explain only the elimination of universally quantified variables, as the elimination of existentially quantified variables is similar. In an expansion phase, we eliminate blocks of universally quantified variables by replacing subformulae of the form ∀x. φ with the conjunction φ[⊤/x] ∧ φ[⊥/x]. In special cases (when eliminating universally quantified variables), we can avoid duplication altogether, e.g., when φ does not contain existential quantifiers (cf. [5]). In this case, we proceed as follows: we transform φ into the clausal normal form ψ, remove all tautologies from ψ, and then replace each literal from {y | y is universally quantified in φ} ∪ {¬y | y is universally quantified in φ} with ⊥ in ψ.

Simplification. The application of simplification after each expansion step is important in keeping the size of formulae manageable. We distinguish between four kinds of simplification rules. The first kind consists of the standard simplification rules for Boolean logic that are used to remove tautologies, or perform direct simplification using the idempotence of the connectives ∨ and ∧ and the fact that ⊥ and ⊤ are their (respective) identities. The second kind of simplification rule is based on a generalization of the unit clause rule (also called Boolean constraint propagation [21]). These rules are as follows (where l is a literal):

  l ∨ φ ⇒ l ∨ φ[⊥/l]
  l ∧ φ ⇒ l ∧ φ[⊤/l]

These rules are especially useful in combination with miniscoping as they often lead to new opportunities for miniscoping to be applied. For example, using the above rules, the formula

  ∀x. ∃y, z. x ∨ (y ∧ ¬z) ∨ (¬y ∧ z ∧ ¬x)

can be simplified to

  ∀x. ∃y, z. x ∨ (y ∧ ¬z) ∨ (¬y ∧ z),

which can be further transformed using the miniscoping rules to

  (∀x. x) ∨ ((∃y. y) ∧ (∃z. ¬z) ∨ (∃y. ¬y) ∧ (∃z. z)).    (2)
This example also motivates why miniscoping is in the Qubos main loop, as opposed to being applied only once initially.
The third kind of simplification rule consists of the following quantifier-specific rules:

  ∃x. φ ⇒ φ,   if x ∉ free(φ)
  ∃x. l ⇒ ⊤,   for l ∈ {x, ¬x}
  ∃x. x ∧ φ ⇒ φ[⊤/x]
  ∃x. (¬x) ∧ φ ⇒ φ[⊥/x]
  ∀x. φ ⇒ φ,   if x ∉ free(φ)
  ∀x. l ⇒ ⊥,   for l ∈ {x, ¬x}
  ∀x. x ∨ φ ⇒ φ[⊥/x]
  ∀x. (¬x) ∨ φ ⇒ φ[⊤/x]
These rules are often effective in eliminating both kinds of quantifiers and therefore avoiding expansion steps. The application of these rules to the formula (2) above simplifies it to ⊤.

The fourth kind of simplification rule is based on a technique commonly used by solvers based on clausal normal form and consists of dropping variables that occur only positively or only negatively in the clause set. This technique can also be applied to quantified Boolean formulae that are in nnf. Let φ be a quantified Boolean formula in nnf and x a variable occurring in φ; we say that x is monotone in φ if it occurs only positively or only negatively in φ. It is easy to show that formulae with monotone variables have the following property.

Proposition 1. Let φ be a quantified Boolean formula in nnf and let Qx.ψ (for Q ∈ {∀, ∃}) be a subformula of φ where x is monotone in φ. Then the formulae φ and φ′ are equivalent, where:
(i) If Q is the quantifier ∃ then φ′ is obtained from φ by replacing Qx.ψ with ψ[⊤/x] (respectively ψ[⊥/x]), if x occurs positively (respectively negatively).
(ii) If Q is the quantifier ∀ then φ′ is obtained from φ by replacing Qx.ψ with ψ[⊥/x] (respectively ψ[⊤/x]), if x occurs positively (respectively negatively).
This proposition provides a way of eliminating both universally and existentially quantified variables without applying the expansion step, provided the variables are monotone.

Clausal Normal Form. Before handing off the normalized formula to a SAT solver we must transform it into clausal normal form. We do this using the renaming technique of [4], where subformulae are replaced with new Boolean variables and definitions of these new Booleans are added to the formula. This technique allows the generation of the clauses in time linear in the size of the input formula.
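The sketch below is our own illustration of a plain definitional (renaming-based) translation for formulae in nnf; it emits only the implication direction of each definition, which suffices for satisfiability, whereas the technique of [4] additionally chooses which subformulae to rename so as to minimize the number of clauses. Names such as `to_clauses` and the `_d` prefix are hypothetical.

```python
# Illustrative sketch: renaming-based clause generation, linear in formula size.
import itertools

def to_clauses(phi, fresh=itertools.count()):
    """Return (literal, clauses): the literal stands for phi under the added
    definitions.  Literals are variable names, optionally prefixed with '-'."""
    if phi[0] == 'var':
        return phi[1], []
    if phi[0] == 'not':                      # nnf: negation only on variables
        return '-' + phi[1][1], []
    d = f"_d{next(fresh)}"                   # fresh definition variable
    a, ca = to_clauses(phi[1], fresh)
    b, cb = to_clauses(phi[2], fresh)
    neg = lambda lit: lit[1:] if lit.startswith('-') else '-' + lit
    if phi[0] == 'and':                      # d -> a and d -> b
        defs = [[neg(d), a], [neg(d), b]]
    else:                                    # 'or': d -> a | b
        defs = [[neg(d), a, b]]
    return d, ca + cb + defs

root, clauses = to_clauses(('or', ('and', ('var', 'x'), ('not', ('var', 'y'))),
                                  ('var', 'z')))
print([[root]] + clauses)    # the root unit clause plus the definition clauses
```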
5 Experimental Results
We have built a system, Qubos (QUantified BOolean Solver), based on the ideas presented in Section 4. The system is written in C++ and supports the use of
Table 1. Examples from the BMC library
different SAT-solvers including Prover [19], Heerhugo [9], Sato [22] and Zchaff [11]. The times reported below are based on Zchaff. In these timings, typically 60% of the time is consumed by our system and 40% by Zchaff. We carried out comparisons with the Qbf [16] and Semprop [10] systems, which are both state-of-the-art systems based on extensions of Davis-Putnam. The runtimes (on a 750 MHz Sun Ultra Sparc workstation) depicted in the tables below are user time (in seconds) reported by the operating system for all computation required. Times greater than one hour are indicated by the symbol abort.

We used two sets of benchmarks for our comparison. The first is obtained by applying bounded model construction to a library of monadic formulae modeling several verification tasks. These problems include:
1. Formulae encoding the equivalence of the specification and implementation of a ripple-carry adder for different bit-widths.
2. Formulae stating safety properties of a lift-controller.
3. Formulae encoding the equivalence of von Neumann adders and ripple-carry adders with varying bit-width.
4. Formulae stating the stability of a timed flip-flop model.
5. Formulae stating the mutual exclusion property for two protocols.
The second set contains encodings of conditional planning problems generated by Rintanen [16] as well as their negations.

Table 1 shows the results of the comparison. Each table gives information on quantificational structure, the size k of the model investigated, running times, Qubos space requirements in megabytes, the average quantifier weight, and the prefix type of the problems. The input formulae are of size 10⁵, on average, with respect to |.| defined in Section 2. Qubos has dramatically better performance on all of these examples. The reason is that these problems all have very high structure and, as explained previously, the amount of structure improves (the average quantifier weight decreases) as k and the formulae become larger. These examples also demonstrate that, for well-structured formulae, memory requirements are typically modest; for example, the adder problems use 2 megabytes on the average. On the other hand, Qbf and Semprop translate the problems into quantified clausal form, which drastically increases the quantifier scope and the time and space required to find a solution.

The second set of examples contains encodings of block-world planning problems where there is significantly less structure, although varied. Table 2 shows the time required to solve different block planning problems and their negations. The instances are called x.iii.y, where x denotes the number of blocks, y denotes the length of the plan and iii stands for the encoding strategy used to generate the problem (cf. [15]). The instances are ordered by the number of blocks and their size. The left part of Table 2, titled "Positive (∃∀∃) Q ≡ ∃", contains the results of the (positive) block planning problems and the right part, titled "Negative (∀∃∀∃) Q ≡ ∀", contains the results of the negated block planning problems. A (positive) block planning problem has the general form ∃∀∃φ, where φ is a Boolean formula, and its negation has the form ∀∃∀¬φ. Since the negative problems are just the negation of the positive problems, the average ∃-weight in
the positive case and the average ∀-weight in the negative case are identical, and their values are displayed in the second column of Table 2.
In the positive case, the system Semprop generally either diverges or is very fast. The system Qbf always succeeds with respectable runtime. For Qubos there is a close relationship between its success and the average quantifier weight: the performance of Qubos decreases as the average quantifier weight rises. Qubos succeeds for the small problems, up to size 10^3 (with respect to | . |), even when the average quantifier weight is high, but it requires significantly more time than Qbf. When the problems become larger, up to size 10^5, and the average quantifier weight is high, then Qubos exhausts memory. The superior performance of Qbf in this domain is not too surprising: it was developed and tuned precisely to solve this class of planning problems.
In the negative case, the results show that Qubos is robust with respect to the quantificational structure and its success depends decisively on the average
quantifier weight. Notice that although the problems in the positive case as well as in the negative case have the same average quantifier weight, Qubos requires in general less CPU time for the negative problems. This can be explained by the fact that the negation makes these problems easier. When applying Qbf and Semprop to the negative problems, the negated formula ¬φ is first transformed into clausal form and thereby a new block of existentially quantified variables (due to the renaming technique described in Section 4) is introduced, and so these problems have a ∀∃∀∃-structure. As a result these problems no longer have the shape of ∃∀∃ planning problems, which accounts for the divergence of Qbf.
Notice that the Mona system can also be used for these examples. A detailed comparison of Mona with the BMC approach can be found in [1]. On the examples given here Mona yields comparable results for the ripple-carry adder, flip-flop, and mutex examples. It yields poorer results for the von Neumann adders, lift-controller, and planning problems. For example, for the von Neumann adders with bit-width less than 11 it is up to a factor of 3 slower than Qubos, and it diverges on the rest of the von Neumann adders, the lift-controller, and all of the planning problems.
6 Conclusion and Future Work
We presented an approach to deciding quantified Boolean logic that works directly on fully-quantified Boolean formulae. We gave a characterization of structure, defined an interesting, natural class of well-structured problems, and showed experimentally that our approach works well for problems in this class. One issue that is not addressed in our implementation of Qubos is the impact of the order in which quantified subformulae are expanded. Currently Qubos selects the innermost quantified subformula. As future work, we intend to investigate the effect of different selection strategies, such as ordering the quantified formulae with respect to their relative structure.
Acknowledgments. The authors would like to thank Jussi Rintanen for providing us with the planning examples used in Section 5.
References
1. Abdelwaheb Ayari and David Basin. Bounded model construction for monadic second-order logics. In 12th International Conference on Computer-Aided Verification (CAV'00), number 1855 in LNCS. Springer-Verlag, 2000.
2. Abdelwaheb Ayari, David Basin, and Stefan Friedrich. Structural and behavioral modeling with monadic logics. In Rolf Drechsler and Bernd Becker, editors, The Twenty-Ninth IEEE International Symposium on Multiple-Valued Logic. IEEE Computer Society, Los Alamitos, Freiburg, Germany, May 1999.
3. Armin Biere, Alessandro Cimatti, Edmund Clarke, and Yunshan Zhu. Symbolic model checking without BDDs. In TACAS'99, volume 1579 of LNCS. Springer, 1999.
4. Thierry Boy de la Tour. An optimality result for clause form translation. Journal of Symbolic Computation, 14(4), October 1992.
5. Hans Kleine Büning and Theodor Lettmann. Aussagenlogik: Deduktion und Algorithmen. B. G. Teubner, Stuttgart, 1994.
6. Marco Cadoli, Andrea Giovanardi, and Marco Schaerf. An algorithm to evaluate quantified Boolean formulae. In Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98) and of the 10th Conference on Innovative Applications of Artificial Intelligence (IAAI-98), July 26–30 1998.
7. Enrico Giunchiglia, Massimo Narizzano, and Armando Tacchella. Backjumping for quantified boolean logic satisfiability. In Proceedings of the 17th International Conference on Artificial Intelligence (IJCAI-01), August 4–10 2001.
8. Enrico Giunchiglia, Massimo Narizzano, and Armando Tacchella. QuBE: A system for deciding Quantified Boolean Formulas Satisfiability. In Proceedings of the International Joint Conference on Automated Reasoning (IJCAR'01), June 2001.
9. Jan Friso Groote and Joost P. Warners. The propositional formula checker HeerHugo. In Ian Gent, Hans van Maaren, and Toby Walsh, editors, SAT2000: Highlights of Satisfiability Research in the Year 2000, Frontiers in Artificial Intelligence and Applications. Kluwer Academic, 2000.
10. Reinhold Letz. Advances in decision procedures for quantified boolean formulas. In Uwe Egly, Rainer Feldmann, and Hans Tompits, editors, Proceedings of the QBF2001 workshop at IJCAR'01, June 2001.
11. Matthew Moskewicz, Conor Madigan, Ying Zhao, Lintao Zhang, and Sharad Malik. Chaff: Engineering an Efficient SAT Solver. In Proceedings of the 38th Design Automation Conference (DAC'01), June 2001.
12. Andreas Nonnengart and Christoph Weidenbach. Computing small clause normal forms. In Alan Robinson and Andrei Voronkov, editors, Handbook of Automated Reasoning, volume I, chapter 6. Elsevier Science B.V., 2001.
13. David Plaisted, Armin Biere, and Yunshan Zhu. A satisfiability procedure for quantified boolean formulae. Unpublished, 2001.
14. Jussi Rintanen. Constructing conditional plans by a theorem-prover. Journal of Artificial Intelligence Research, 10, 1999.
15. Jussi Rintanen. Improvements to the evaluation of quantified boolean formulae. In Dean Thomas, editor, Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99-Vol2). Morgan Kaufmann Publishers, S.F., July 31–August 6 1999.
16. Jussi Rintanen. Partial implicit unfolding in the Davis-Putnam procedure for quantified Boolean formulae. In R. Nieuwenhuis and A. Voronkov, editors, Proceedings of the 8th International Conference on Logic for Programming, Artificial Intelligence and Reasoning, volume 2250 of LNCS. Springer-Verlag, Berlin, 2001.
17. Christoph Scholl and Bernd Becker. Checking equivalence for partial implementations. In Design Automation Conference, 2001.
18. Ofer Shtrichman. Tuning SAT checkers for bounded model checking. In 12th International Conference on Computer-Aided Verification (CAV'00), number 1855 in LNCS. Springer-Verlag, 2000.
19. Gunnar Stålmarck. A system for determining propositional logic theorems by applying values and rules to triplets that are generated from a formula. Technical report, European Patent Nr. 0403 454 (1995), US Patent Nr. 5 276 897, Swedish Patent Nr. 467 076 (1989), 1989.
20. Poul Williams, Armin Biere, Edmund Clarke, and Anubhav Gupta. Combining Decision Diagrams and SAT Procedures for Efficient Symbolic Model Checking. In Proceedings of CAV'00, number 1855 in LNCS. Springer-Verlag, 2000.
21. Ramin Zabih and David McAllester. A rearrangement search strategy for determining propositional satisfiability. In Reid G. Smith and Tom M. Mitchell, editors, Proceedings of the 7th National Conference on Artificial Intelligence, St. Paul, MN, August 1988. Morgan Kaufmann.
22. Hantao Zhang. SATO: An efficient propositional prover. In CADE'97, volume 1249 of LNAI. Springer, 1997.
Exploiting Transition Locality in the Disk Based Murϕ Verifier
Giuseppe Della Penna1, Benedetto Intrigila1, Enrico Tronci2, and Marisa Venturini Zilli2
1 Dip. di Informatica, Università di L'Aquila, Coppito 67100, L'Aquila, Italy {gdellape,intrigil}@univaq.it
2 Dip. di Informatica, Università di Roma "La Sapienza", Via Salaria 113, 00198 Roma, Italy {tronci,zilli}@dsi.uniroma1.it
Abstract. The main obstruction to automatic verification of Finite State Systems is the huge amount of memory required to complete the verification task (state explosion). This motivates research on distributed as well as disk based verification algorithms. In this paper we present a disk based Breadth First Explicit State Space Exploration algorithm as well as an implementation of it within the Murϕ verifier. Our algorithm exploits transition locality (i.e. the statistical fact that most transitions lead to unvisited states or to recently visited states) to decrease disk read accesses, thus reducing the time overhead due to disk usage. A disk based verification algorithm for Murϕ has already been proposed in the literature. To measure the time speedup due to locality exploitation we compared our algorithm with that previously proposed algorithm. Our experimental results show that our disk based verification algorithm is typically more than 10 times faster than the previously proposed disk based verification algorithm. To measure the time overhead due to disk usage we compared our algorithm with RAM based verification using the (standard) Murϕ verifier with enough memory to complete the verification task. Our experimental results show that even when using 1/10 of the RAM needed to complete verification, our disk based algorithm is only between 1.4 and 5.3 times (3 times on average) slower than (RAM) Murϕ with enough RAM memory to complete the verification task at hand. Using our disk based Murϕ we were able to complete verification of a protocol with about 10^9 reachable states. This would require more than 5 gigabytes of RAM using RAM based Murϕ.
1 Introduction
State Space Exploration (Reachability Analysis) is at the very heart of all algorithms for automatic verification of concurrent systems. As is well known, the
This research has been partially supported by MURST projects MEFISTO and SAHARA. Corresponding author: Enrico Tronci. Tel: +39 06 4991 8361, Fax: +39 06 8541 842.
main obstruction to automatic verification of Finite State Systems (FSS) is the huge amount of memory required to complete state space exploration (state explosion). For protocol-like systems, Explicit State Space Exploration often outperforms Symbolic (i.e. OBDD based, [1,2]) State Space Exploration [8]. Since here we are mainly interested in protocol verification, we focus on explicit state space exploration. Tools based on explicit state space exploration are, e.g., SPIN [6,14] and Murϕ [4,11].
In our context, roughly speaking, two kinds of approaches have been studied to counteract (i.e. delay) state explosion: memory saving and auxiliary storage. In a memory saving approach one essentially tries to reduce the amount of memory needed to represent the set of visited states. Examples of the memory saving approach are, e.g., in [23,9,10,17,18,7]. In an auxiliary storage approach one tries to exploit disk storage as well as distributed processors (network storage) to enlarge the available memory (and CPU). Examples of this approach are, e.g., in [15,16,12,20,13,5].
Exploiting statistical properties of protocol transition graphs it is possible to trade space for time [21,22], thus enlarging the class of systems for which automatic verification is feasible. In particular, in [21] it has been shown that protocols exhibit locality. That is, w.r.t. levels of a Breadth First Search (BFS), state transitions tend to be between states belonging to close levels of the transition graph. In [21] an algorithm exploiting locality in order to save RAM was also presented, together with an implementation of it within the Murϕ verifier. It is thus natural and worthwhile to look for a way to exploit locality also when using a disk based state exploration algorithm.
In this paper we present a Disk based Breadth First Search (DBFS) algorithm that exploits transition locality. Our algorithm is obtained by modifying the DBFS algorithm presented in [16]. Our main results can be summarized as follows.
– We present a DBFS algorithm that is able to exploit transition locality. Essentially, our algorithm is obtained from the one in [16] by using only a suitable subset of the states stored on disk to clean up the unchecked states BFS queue of [16]. By reducing disk read accesses we also reduce our time overhead w.r.t. a RAM based BFS state space exploration.
– We implemented our algorithm within the Murϕ verifier. As the algorithm in [16], our algorithm is compatible with all state reduction techniques implemented in the Murϕ verifier.
– We ran our DBFS algorithm on some of the protocols included in the standard Murϕ distribution [11]. Our experimental results can be summarized as follows.
• Even when using 1/10 of the RAM needed to complete verification, our disk based Murϕ is only between 1.4 and 5.3 times slower (3 times on average) than (RAM based) standard Murϕ [11] with enough RAM to complete the verification task at hand.
• Our disk based algorithm is typically more than 10 times faster than the disk based algorithm presented in [16].
– Using our disk based Murϕ we were able to complete verification of a protocol with almost 10^9 reachable states. Using standard Murϕ this protocol would require more than 5 gigabytes of RAM.
2 Transition Locality for Finite State Systems
In this section we define (from [21]) our notion of locality for transitions. For our purposes, a protocol is represented as a Finite State System.
A Finite State System (FSS) S is a 4-tuple (S, I, A, R) where: S is a finite set (of states), I ⊆ S is the set of initial states, A is a finite set (of transition labels) and R is a relation on S × A × S. R is usually called the transition relation of S. Given states s, s′ ∈ S and a ∈ A we say that there is a transition from s to s′ labeled with a iff R(s, a, s′) holds. We say that there is a transition from s to s′ (notation R(s, s′)) iff there exists a ∈ A s.t. R(s, a, s′) holds. The set of successors of state s (notation next(s)) is the set of states s′ s.t. R(s, s′). The set of reachable states of S (notation Reach) is the set of states of S reachable in 0 or more steps from I. Formally, Reach is the smallest set s.t.: 1. I ⊆ Reach; 2. for all s ∈ Reach, next(s) ⊆ Reach.
The transition relation R of a given system defines a graph (transition graph). Computing Reach (reachability analysis) means visiting (exploring) the transition graph starting from the initial states in I. This can be done, e.g., using a Depth First Search (DFS) or a Breadth First Search (BFS). In the following we will focus on BFS. As is well known, a BFS defines levels on the transition graph. Initial states (i.e. states in I) are at level 0. The states in (next(I) − I) (states reachable in one step from I and not in I) are at level 1, etc. Formally, we define the set of states at level k (notation L(k)) as follows: L(0) = I, L(k + 1) = {s′ | ∃s s.t. s ∈ L(k) and R(s, s′) and s′ ∉ L(0) ∪ ... ∪ L(k)}. Given a state s ∈ Reach we define level(s) = k iff s ∈ L(k). That is, level(s) is the level of state s in a BFS of S. The set Visited(k) of states visited (by a BFS) by level k is defined as follows: Visited(k) = L(0) ∪ ... ∪ L(k).
Informally, transition locality means that for most transitions source and target states will be in levels not too far apart. Let S = (S, I, A, R) be an FSS. A transition in S from state s to state s′ is said to be k-local iff |level(s′) − level(s)| ≤ k. In [21] the following fact is shown experimentally: for most protocols, we have that for most states more than 75% of the transitions are 1-local.
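As a small illustration of these definitions (a sketch of ours, not part of the paper or of the Murϕ sources; the explicit adjacency-list graph and all names are made up), BFS levels and the fraction of 1-local transitions can be computed as follows:

// Sketch: compute level(s) by BFS and count how many transitions are 1-local,
// i.e. |level(s') - level(s)| <= 1, on a tiny explicit transition graph.
#include <cstdio>
#include <cstdlib>
#include <queue>
#include <vector>

int main() {
    std::vector<std::vector<int>> next = {{1, 2}, {3}, {3}, {0}};  // successors of each state
    std::vector<int> init = {0};                                    // initial states

    std::vector<int> level(next.size(), -1);                        // -1 means "not yet reached"
    std::queue<int> frontier;
    for (int s : init) { level[s] = 0; frontier.push(s); }

    while (!frontier.empty()) {                                     // standard BFS defines level(s)
        int s = frontier.front(); frontier.pop();
        for (int t : next[s])
            if (level[t] == -1) { level[t] = level[s] + 1; frontier.push(t); }
    }

    int total = 0, local1 = 0;                                      // count 1-local transitions
    for (std::size_t s = 0; s < next.size(); ++s)
        for (int t : next[s]) {
            ++total;
            if (std::abs(level[t] - level[s]) <= 1) ++local1;
        }
    std::printf("%d of %d transitions are 1-local\n", local1, total);
    return 0;
}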
/* Global Variables */
hash table M;          /* main memory table */
file D;                /* disk table */
FIFO queue Q_ck;       /* checked state queue */
FIFO queue Q_unck;     /* unchecked state queue */
int disk_cloud_size;   /* number of blocks to be read from file D */

Fig. 1. Data Structures
3 A Disk Based State Space Exploration Algorithm Exploiting Transition Locality
Magnetic disk read/write times are much larger than RAM read/write times. Thus, not surprisingly, the main drawback of DBFS (Disk based Breadth First Search) w.r.t. RAM-BFS (RAM based Breadth First Search) is the time overhead due to disk usage. On the other hand, because of state explosion, memory is one of the main obstructions to automatic verification. Thus using magnetic disks to increase the amount of memory available during verification is very appealing.
In [16] a DBFS algorithm has been proposed for the Murϕ verifier. Here we show that by exploiting transition locality (Section 2) the algorithm in [16] can be improved. In particular, disk accesses for reading can be reduced. This decreases the time overhead (w.r.t. a RAM-BFS) due to disk usage. As in [16], we actually have two DBFS algorithms: one for the case in which hash compaction [17,18] (Murϕ option -c) is enabled and one for the case in which hash compaction is not enabled. As with the algorithm in [16], our algorithm handles both cases. In the following we only present the version which is compatible with the hash compaction option. When hash compaction is not enabled the algorithm is actually simpler and can be easily obtained from the algorithm compatible with the hash compaction option.
In the following we call LDBFS our Locality based DBFS algorithm. Figs. 1, 2, 3, 4, 5, and 7 define our LDBFS using a C-like programming language.

Search() {
  /* initialization */
  M = empty; D = empty; Q_ck = empty; Q_unck = empty;
  for each startstate s { Insert(s); }   /* startstate generation */
  do /* search loop */ {
    while (Q_ck is not empty) {
      s = dequeue(Q_ck);
      for all s' in successors(s) { Insert(s'); }
    } /* while */
    Checktable();
  } while (Q_ck is not empty); /* do */
} /* Search() */

Fig. 2. Function Search()
3.1 Data Structures
The data structures used by LDBFS are in Fig. 1 and are essentially the same as the ones used in [16]. We have: a table M to store signatures of recently visited states; a file D to store signatures of all visited states (old states); a checked queue Q_ck to store the states in the BFS level currently explored by the algorithm (BFS front); an unchecked queue Q_unck to store pairs (s, h) where s is a state candidate to be on the next BFS level and h is the signature of state s.
As in [16], state signatures in M do not necessarily represent all visited states. In M we just have recently visited states. Using the information in M we build the unchecked queue Q_unck, i.e. the set of states candidate to be on the next BFS level. Note that the states in Q_unck may be old (i.e. previously visited), since using M we can only avoid re-inserting in Q_unck recently visited states. As in [16], we use disk file D to remove old state signatures from table M as well as to check Q_unck to get rid of old states. The result of this checking process is the checked queue Q_ck.
The main difference between our algorithm and the one in [16] is that in the checking process we only use a subset of the state signatures in D. In fact we divide D into blocks and then use only some of such blocks to clean up M and Q_unck. The global variable disk_cloud_size holds the number of blocks of D we use to remove old state signatures from table M. Our algorithm dynamically adjusts the value of disk_cloud_size during the search. Using only a subset of the states in D decreases disk usage and thus speeds up verification. Note however that in [16] the checked queue Q_ck only contains new (i.e. not previously visited) states, whereas in our case Q_ck may also contain some old (i.e. already visited) states. As a result our algorithm may mark as new (unvisited) a state that indeed is old (visited). This means that some state may be visited more than once and thus appended to file D more than once. However, thanks to transition locality (Section 2), this does not happen too often. It is exactly this statistical property of transition graphs that makes our approach effective.
Table M is in main memory (RAM) whereas file D is on disk. We use disk memory also for the BFS queues Q_ck and Q_unck, which instead are kept in main memory in the algorithm proposed in [16]. Our low level algorithm to handle disk queues Q_ck and Q_unck is exactly the same one we used in Cached Murϕ [21,3] for the same purpose, thus we do not show it here. Note that all the data structures that grow with the state space size (namely: D, Q_ck, Q_unck) are on disk in LDBFS. In [16] D is on disk, however state queues are in RAM. Since states in the BFS queue are not compressed [11], we have that for large verification problems the BFS queue can be a limiting factor for [16]. For this reason in LDBFS we implemented state queues on disk.

3.2 Function Search()
Function Search() (Fig. 2) is the same as the one used in the DBFS algorithm in [16].
Insert(state s) {
  h = hash(s);   /* compute signature of state s */
  if (h is not in M) {
    insert h in M;
    enqueue((s, h), Q_unck);
    if (M is full) Checktable();
  } /* if */
} /* Insert() */

Fig. 3. Function Insert()
Function Search() is a Breadth First Search using the checked queue Q_ck as the current level state queue. Function Search() first loads the BFS queue (Q_ck) with the initial states. Then Search() begins dequeuing states from Q_ck. For each successor s' of each state dequeued from Q_ck, Search() calls Insert(s') to store potentially new states in M as well as in Q_unck. When queue Q_ck becomes empty it means that all transitions from all states in the current BFS level have been explored. Thus we want to move to the next BFS level. Function Search() does this by calling function Checktable(), which refills the checked queue Q_ck with fresh (non-visited) states, if any, from the unchecked queue Q_unck. If, after calling Checktable(), Q_ck is still empty it means that all reachable states have been visited and the BFS ends.

3.3 Function Insert()
Function Insert() (Fig. 3) is the same as the one used in the DBFS algorithm in [16]. Consider the pair (s, h), where s is a state whose signature is h. If signature h is not in table M then Insert(s) inserts pair (s, h) in the unchecked queue Q_unck and signature h in table M. When M is full, function Insert() calls function Checktable() to clean up M as well as the queues. Function Checktable() is also called at the end of each BFS level (when Q_ck is empty).

3.4 Exploiting Locality in State Filtering
Function Checktable() in the DBFS algorithm in [16] uses all state signatures in disk file D to remove old states from Q_unck. Exploiting locality (Section 2), here we are able to use only a fraction of the state signatures on disk D to clean up table M and queue Q_unck. Disk usage is what slows down DBFS w.r.t. a RAM-BFS. Thus, by reading fewer states from disk, we save w.r.t. [16] some of the time overhead due to disk (read) accesses.
The rationale of our approach stems from the following observations. First we should note that state signatures are appended to D in the same order in which new states are discovered by the BFS. Thus, as we move towards the tail of file D we find (signatures of) states whose BFS level is closer and closer to the current BFS level, i.e. the BFS level reached by the search. From [21] we
Checktable() /* old/new check for main memory table */
{
  /* Disk cloud defined in Section 3.4 */
  /* number of states deleted from M that are in disk cloud */
  deleted_in_cloud = 0;
  /* number of states deleted from M that are on disk but not in disk cloud */
  deleted_not_in_cloud = 0;
  /* Randomly choose indexes of disk blocks to read (disk cloud) */
  DiskCloud = GetDiskCloud();
  /* something_not_in_cloud is true iff there exists a state on disk
     that is not in the disk cloud */
  if (there exists a disk block not selected in DiskCloud)
       something_not_in_cloud = true;
  else something_not_in_cloud = false;
  Calibration_Required = QueryCalibration();
  for each Block in D {
    if (Block is in DiskCloud or Calibration_Required) {
      for all state signatures h in Block {
        if (h is in M) {
          remove h from M;
          if (Block is in DiskCloud) { deleted_in_cloud++; }
          else /* Block is not in DiskCloud */ { deleted_not_in_cloud++; }
  }}}}
  /* remove old states from state queue and add new states to disk */
  while (Q_unck is not empty) {
    (s, h) = dequeue(Q_unck);
    if (h is in M) { append h to D; remove h from M; enqueue(Q_ck, s); }
  }
  /* clean up the hash table */
  remove all entries from M;
  /* adjust disk cloud size, if requested */
  if (Calibration_Required) {
    if (something_not_in_cloud and (deleted_in_cloud + deleted_not_in_cloud > 0))
      { Calibrate(deleted_in_cloud, deleted_not_in_cloud); }
    if (disk access rate has been too long above a given critical limit)
      { reset disk cloud size to its initial value with given probability P; }
  } /* if Calibration_Required */
} /* Checktable() */

Fig. 4. Function Checktable() (state filtering)
GetDiskCloud() {
  Randomly select disk_cloud_size blocks from disk according to
  the probability distribution shown in Fig. 6.
  Return the indexes of the selected blocks.
}

Fig. 5. Function GetDiskCloud()
know that most transitions are local, i.e. they lead to states that are on BFS levels close to the current one. This means that most of the old states in M can be detected and removed by only looking at the tail of file D.
We can take advantage of the above remarks by using the following approach. We divide the disk file D into blocks. Rather than using the whole file D in Checktable() (as done in [16]) we only use a subset of the set of disk blocks. We call such a subset the disk cloud. The disk cloud is created by selecting several disk blocks at random. The selection probability of disk blocks is not uniform. Instead, to exploit locality, the disk block selection probability increases as we approach the tail of D (see Fig. 6).
In [21] it is shown that locality allows us to save about 40% of the memory required to complete verification. This suggests using, say, only 60% of the disk blocks, i.e. the size (number of blocks) of the disk cloud should be 60% of the number of disk blocks. This works fine. However we can do more. Our experimental results show that, most of the time, we need much less than 60% of the disk blocks to carry out the clean up implemented by function Checktable(). Thus we dynamically adjust the fraction of disk blocks used by function Checktable().

3.5 Function Checktable()
Function Checktable() (Fig. 4), using disk file D, removes signatures of old (i.e. visited) states from table M. Then, using such cleaned M, Checktable() removes old states from the unchecked queue Q_unck. Finally, Checktable() moves the states that are in the (now cleaned) unchecked queue Q_unck to the checked queue Q_ck.

3.6 Disk Cloud Creation
Function GetDiskCloud() (Fig. 5) is called by function Checktable() to create our disk cloud. Function GetDiskCloud() selects disk_cloud_size disk blocks according to the probability curve shown in Fig. 6. We number disk blocks starting from 0 (oldest block). Thus the lower the disk block index, the older (closer to the head of file D) the disk block. On the x axis of Fig. 6 we have the relative disk block index ρ, i.e. the block index divided by the index of the last (newest) block. E.g. ρ = 0 is the (relative index of the) first (oldest) disk block inserted in disk D, whereas ρ = 1 is the last (newest) disk block inserted. On the y axis of Fig. 6 we have the probability of selecting a disk block with a given ρ.
(Figure: x axis: Disk Blocks, relative index from a0 to a3; y axis: Selection Probability, values b0 to b3.)
Fig. 6. Probability curve for disk cloud block selection (used by GetDiskCloud())
The selection probability curve in Fig. 6 ensures that the most recently created blocks (ρ close to 1) are selected with a higher probability than old blocks, thus exploiting transition locality [21]. Note that, defensively, the selection probability of old blocks (ρ close to 0) is b0 > 0. This is because we want to have some old blocks to remove occasional far-back states (i.e. states belonging to an old BFS level far from the current one) reached by occasional non-local transitions. Function GetDiskCloud() returns to Checktable() the indexes of the selected blocks.
Since our min and max values for the relative disk block indexes are, respectively, 0 and 1, in Fig. 6 we have a0 = 0 and a3 = 1. The value of b3 is always 1/K, where K is a normalization constant chosen so that the sum over all disk blocks of the selection probabilities is 1. The pairs (a1, b1), (a2, b2) define our selection strategy. The values we used in our experiments are: a1 = 0.4, b1 = 0.4/K, a2 = 0.7, b2 = 0.6/K.
Two strategies are possible to partition disk D into state signature blocks. We can have either a variable number of fixed size blocks or a fixed number of variable size blocks. Reading a block from disk D can be done with a sequential transfer, whereas moving disk heads from one block to another requires a disk seek operation. Since seeks take longer than sequential transfers we decided to limit the number of seeks. This led us to use a fixed number of variable size blocks. Let N be the number of disk blocks we want to use and let S be the number of state signatures in file D. Then each block (possibly with the exception of the last one, which will be smaller) has S/N state signatures. As a matter of fact, to avoid having too small blocks, we also impose a minimum value B for the number of state signatures in a block. Thus we may have less than N blocks if S is too small.
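The following is a small illustrative sketch (ours, not the DMurϕ implementation) of this block-selection scheme. The breakpoints a1 = 0.4, b1 = 0.4/K and a2 = 0.7, b2 = 0.6/K come from the text above; the value assumed for b0 (0.1/K), the piecewise-linear interpolation between the (a_i, b_i) points, and all function names are our assumptions.

// Sketch: pick disk_cloud_size distinct blocks, favouring blocks near the tail of D.
#include <algorithm>
#include <random>
#include <vector>

double selection_weight(double rho) {            // rho = relative block index in [0,1]
    const double a[4] = {0.0, 0.4, 0.7, 1.0};    // a0..a3
    const double b[4] = {0.1, 0.4, 0.6, 1.0};    // b0..b3, up to the constant factor 1/K
    for (int i = 0; i < 3; ++i)
        if (rho <= a[i + 1]) {                   // linear interpolation on [a_i, a_{i+1}]
            double t = (rho - a[i]) / (a[i + 1] - a[i]);
            return b[i] + t * (b[i + 1] - b[i]);
        }
    return b[3];
}

std::vector<int> get_disk_cloud(int num_blocks, int disk_cloud_size, std::mt19937& rng) {
    std::vector<double> w(num_blocks);
    for (int i = 0; i < num_blocks; ++i)
        w[i] = selection_weight(num_blocks > 1 ? double(i) / (num_blocks - 1) : 1.0);

    std::vector<int> cloud;
    std::vector<bool> taken(num_blocks, false);
    while ((int)cloud.size() < std::min(disk_cloud_size, num_blocks)) {
        std::discrete_distribution<int> pick(w.begin(), w.end());
        int i = pick(rng);                       // already-taken blocks have weight 0
        if (!taken[i]) { taken[i] = true; cloud.push_back(i); w[i] = 0.0; }
    }
    return cloud;
}

Normalization by K is unnecessary here because std::discrete_distribution normalizes the weights itself; only the relative shape of the curve matters.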
Calibrate(deleted_in_cloud, deleted_not_in_cloud) {
  deleted_states = deleted_in_cloud + deleted_not_in_cloud;
  beta = deleted_not_in_cloud / deleted_states;
  if (beta is close to 1)
    /* low disk cloud effectiveness: increase disk access rate */
    { /* increase disk_cloud_size by a given percentage */
      disk_cloud_size = (1 + speedup)*disk_cloud_size; }
  else if (beta is close to 0)
    /* high disk cloud effectiveness: decrease disk access rate */
    { /* decrease disk_cloud_size by a given percentage */
      disk_cloud_size = (1 - slowdown)*disk_cloud_size; }
}

Fig. 7. Function Calibrate()
In our experiments here we used N = 100 and B = 10^4. Thus, e.g., to have 100 disk blocks we need at least 10^6 reachable states.

3.7 Disk Cloud Size Calibration
Function Calibrate() (Fig. 7) is called by function Checktable() every time a calibration is needed for the disk cloud size. Two parameters are passed to function Calibrate(), namely: the number of disk states deleted from M by Checktable() by only using disk blocks that are in the disk cloud (deleted_in_cloud in Fig. 7), and the number of disk states deleted from M by only using disk blocks that are not in the disk cloud (deleted_not_in_cloud in Fig. 7). Function Calibrate() reads the whole file D and computes the ratio (beta in Fig. 7) between the number of deleted states not in the disk cloud and the number of total deleted states (deleted_states in Fig. 7).
A value of beta close to 1 (low disk cloud effectiveness) means that the disk cloud has not been very effective in removing old states from table M. In this case, the variable disk_cloud_size (holding the disk cloud size) is increased by (speedup*disk_cloud_size). A value of beta close to 0 (high disk cloud effectiveness) means that the disk cloud has been very effective in removing old states from table M. In this case, we decrease the value of disk_cloud_size by (slowdown*disk_cloud_size) in order to lower the disk access rate. In our experiments here we used speedup = 0.15 and slowdown = 0.15.

3.8 Calibration Frequency
Function QueryCalibration() called by function Checktable() (Fig. 4) tells us whether a calibration has to be performed or not. The rationale behind function QueryCalibration() is the following. Calling function Calibrate() too often nullifies our efforts for reducing disk usage. In fact a calibration of the disk cloud size requires reading the whole file
D. However, calling function Calibrate() too sporadically may have the same effect. In fact, waiting too long for a calibration may lead to using an oversized disk cloud or an undersized one. An oversized disk cloud increases disk usage beyond needs. Also an undersized disk cloud increases disk usage, since many old states will not be removed from M and we will be revisiting many already visited states. In our current implementation function QueryCalibration() enables a calibration for every 10 calls of function Checktable() (Fig. 4). Our experimental results suggest that this is a reasonable calibration frequency.
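QueryCalibration() itself is not listed among the paper's figures; a minimal sketch consistent with the description above (enable a calibration on every 10th call of Checktable()) could look as follows. The counter-based scheme and the names are our assumptions.

// Sketch: request a calibration once every CALIBRATION_PERIOD calls of Checktable().
static const int CALIBRATION_PERIOD = 10;   // calls of Checktable() between calibrations

bool QueryCalibration() {
    static int calls = 0;                    // number of Checktable() calls seen so far
    calls++;
    return (calls % CALIBRATION_PERIOD) == 0;
}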
4 Experimental Results
We implemented the LDBFS algorithm of Sect. 3 within the Murϕ verifier. In the following we call DMurϕ the version of the Murϕ verifier we obtained. In this section we report the experimental results we obtained by using DMurϕ. Our experiments have two goals. First, we want to know if by using locality there is indeed some gain w.r.t. the algorithm proposed in [16]. Second, we want to measure the time overhead of DMurϕ w.r.t. standard Murϕ performing a RAM-BFS.
To meet our goals we proceed as follows. First, for each protocol in our benchmark we determine the minimum amount of memory needed to complete verification using the Murϕ verifier (namely Murϕ version 3.1 from [11]). Then we compare Murϕ performance with that of DMurϕ and with that of the disk based algorithm proposed in [16]. Our benchmark consists of some of the protocols in the Murϕ distribution [11] and the kerb protocol from [19].

4.1 Results with Murϕ
The Murϕ verifier takes as input the amount of memory M to be used during the verification run as well as the fraction g (in [0, 1]) of M used for the queue (i.e. g is gPercentActive using a Murϕ parlance). We say that the pair (M , g) is suitable for protocol p iff the verification (with Murϕ) of p can be completed with memory M and queue gM . For each protocol p we determine the least M s.t. for some g, (M , g) is suitable for p. In the sequel we denote by M (p) such an M . Of course M (p) depends on the compression options used. Murϕ offers bit compression (-b) and hash compaction (-c). Our approach (as the one in [16]) is compatible with all Murϕ compression options. However, a disk based approach is really interesting only when, even using all compression options, one runs out of RAM. For this reason we only present results about experiments in which all compression options (i.e. -b -c) are enabled. Fig. 8 gives some useful information about the protocols we considered in our experiments. The meaning of the columns in Fig. 8 is explained in Fig. 9.
Exploiting Transition Locality in the Disk Based Murϕ Verifier Bytes Diam 96 1,1,3,2,10 12 n peterson 20 9 241 newlist6 32 7 91 ldash 144 1,4,1,false 72 sci 60 3,1,1,2,1 94 mcslock1 16 6 111 sci 64 3,1,1,5,1 95 sci 68 3,1,1,7,1 143 kerb 148 NumIntruders=2 15 newlist6 40 8 110 Protocol and Parameters ns
213
mu -b -c Reach
Rules
Max Q
M
g
T
2,455,257
8,477,970 1,388,415 145,564,125 0.57 1,211.02
2,871,372
25,842,348
3,619,556
21,612,905 140,382 22,590,004 0.04 1,641.67
46,657
15,290,000 0.02 764.27
8,939,558 112,808,653 509,751 118,101,934 0.06 12,352.93 9,299,127
30,037,227 347,299 67,333,575 0.04 2,852.03
12,783,541 76,701,246 392,757 70,201,817 0.03 3,279.45 75,081,011 254,261,319 2,927,550 562,768,255 0.04 35,904.86 126,784,943 447,583,731 4,720,612 954,926,331 0.04 99,904.47 7,614,392
9,859,187 4,730,277 738,152,956 0.62 2,830.83
81,271,421 563,937,480 2,875,471 521,375,945 0.03 31114.87
Fig. 8. Results on an Intel Pentium III 866 MHz with 512M RAM. Murϕ options used: -b (bit compression), -c (40 bit hash compaction), -ndl (no deadlock detection).

Protocol: Name of the protocol.
Parameters: Values of the parameters we used for the protocol. We show our parameter values in the same order in which such parameters appear in the Const section of the protocol file included in the Murϕ distribution [11]. When such a list is too long, as for the kerb protocol, we just list the assignments we modified in the Const section w.r.t. the distribution.
Bytes: Number of bytes needed to represent a state in the queue when bit compression is used. For protocol p we denote such number by StateBytes(p). Note that since we are using bit compression as well as hash compaction (-b -c), 5 bytes are used to represent (the signature of) a state in the hash table.
Reach: Number of reachable states for the protocol. For protocol p, we denote such number by |Reach(p)|.
Rules: Number of rules fired during state space exploration. For protocol p, we denote such number by RulesFired(p).
Max Q: Maximum queue size (i.e. number of states) attained during state space exploration. For protocol p we denote such number by MaxQ(p).
Diam: Diameter of the transition graph.
M: Minimum amount of memory (in kilobytes) needed to complete state space exploration, that is M(p). Let bh be the number of bytes taken by a state in the hash table (for us bh = 5 since we are using hash compaction). From the Murϕ source code [11] we can compute M(p). We have: M(p) = |Reach(p)| (bh + (MaxQ(p)/|Reach(p)|) StateBytes(p)).
g: Fraction of memory M used for the queue. From the Murϕ source code [11] we can compute g. We have: g = MaxQ(p)/|Reach(p)|.
T: CPU time (in seconds) to complete state space exploration when using memory M and queue gM. For protocol p, we denote such number by T(p).

Fig. 9. Meaning of the columns in Fig. 8.
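As a small worked example of the two formulas in Fig. 9 (a sketch of ours; the input numbers below are made up for illustration and are not taken from Fig. 8, and bh = 5 holds only because 40-bit hash compaction is enabled):

// Sketch: evaluating g = MaxQ(p)/|Reach(p)| and M(p) = |Reach(p)|*(bh + g*StateBytes(p)).
#include <cstdio>

int main() {
    double reach       = 1.0e6;   // |Reach(p)|      -- illustrative value
    double max_q       = 5.0e4;   // MaxQ(p)         -- illustrative value
    double state_bytes = 32;      // StateBytes(p)   -- illustrative value
    double bh          = 5;       // bytes per state signature in the hash table

    double g = max_q / reach;                        // fraction of memory used for the queue
    double m = reach * (bh + g * state_bytes);       // memory needed, in the units of bh/StateBytes
    std::printf("g = %.3f, M(p) = %.0f\n", g, m);
    return 0;
}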
From column M of Fig. 8 we see that there are protocols requiring more than 512M bytes of RAM to complete. Thus we could not use standard Murϕ on our 512M PC. However, we were able to complete verification of such protocols using Cached Murϕ (CMurϕ) [3]. Giving CMurϕ enough RAM we get a very low collision rate, and from [21] we know that in this case the CPU time taken by CMurϕ is essentially the same as that taken by standard Murϕ with enough RAM to complete the verification task. For this reason, in the following we will regard the results in Fig. 8 as if they were all obtained by using standard Murϕ with enough (i.e. M(p)) RAM to complete the verification task.
4.2 Results with DMurϕ
Our next step is to run each protocol p in Fig. 8 with less and less (RAM) memory using our DMurϕ. Namely, we ran protocol p with memory limits M(p), 0.5M(p) and 0.1M(p). This approach allows us to easily compare the experimental results obtained from different protocols. The results we obtained are in Fig. 10. We give the meaning of rows and columns in Fig. 10.
Columns Protocol and Parameters have the meaning given in Fig. 9. Column α (with α = 1, 0.5, 0.1) gives information about the run of protocol p with memory αM(p).
Row States gives the ratio between the visited states (by DMurϕ) when using memory αM(p) and |Reach(p)| (in Fig. 8). This is the state overhead due to revisiting already visited states. This may happen since in function Checktable() (Fig. 4) we do not use the whole disk file D to remove old states from table M.
Row Rules gives the ratio between the rules fired (by DMurϕ) when using memory αM(p) and RulesFired(p) (in Fig. 8). This is the rule overhead due to revisiting already visited states.
Row Time gives the ratio between the time T_DMurϕ,α(p) (in seconds) to complete state space exploration (with DMurϕ) when using memory αM(p) and T(p) in Fig. 8. This is our time overhead w.r.t. RAM-BFS. Note that T_DMurϕ,α(p) is the time elapsed between the start and the end of the state space exploration process. That is, T_DMurϕ,α(p) is not just the CPU time; it also includes the time spent on disk accesses.
Note that for the big protocols in Fig. 8 (i.e. those requiring more than 512M of RAM) we could not run the experiments with α = 1 on our machine with 512M of RAM. However, of course, the most interesting column for us is the one with α = 0.1. The experimental results in Fig. 10 show that even when α = 0.1 our disk based approach is only between 1.4 and 5.3 (3 on average) times slower than a RAM-BFS with enough RAM to complete the verification task.
Exploiting Transition Locality in the Disk Based Murϕ Verifier Protocol n peterson
Parameters 9
ns
1,1,3,2,10
newlist6
7
ldash
1,4,1,false
sci
3,1,1,2,1
mcslock1
6
sci
3,1,1,5,1
Mem States Rules Time States Rules Time States Rules Time States Rules Time States Rules Time States Rules Time
States Rules Time sci 3,1,1,7,1 States Rules Time kerb NumIntruders=2 States Rules Time newlist6 8 States Rules Time Min Avg Max
1 1.178 1.178 2.148 1.348 1.487 1.734 1.366 1.365 1.703 1.566 1.528 2.037 1.260 1.279 1.811 1.346 1.346 1.915
0.5 1.124 1.124 2.056 1.405 2.011 2.144 1.335 1.334 1.765 1.668 1.626 2.226 1.189 1.206 1.798 1.550 1.550 2.477
0.1 1.199 1.199 2.783 1.373 1.645 1.953 1.384 1.382 2.791 1.702 1.658 3.770 1.183 1.200 2.888 1.703 1.703 5.259
— — — — — —
1.169 1.195 1.828 1.130 1.152 1.421 1.282 1.060 1.234 1.416 1.412 2.612
1.143 1.167 2.553 1.097 1.115 1.743 1.279 1.080 1.438 1.406 1.405 4.436
— — — — —
215
Time 1.703 1.234 1.438 Time 1.891 1.954 2.961 Time 2.148 2.612 5.259
Fig. 10. Comparing DMurϕ with RAM Murϕ [11] (compression options: -b -c)
4.3 Results with Disk Based Murϕ
To measure the time speedup we obtain by exploiting locality, we are also interested in comparing our locality based disk algorithm DMurϕ with the disk based Murϕ presented in [16]. The algorithm in [16] is not available in the standard Murϕ distribution [11]. However, if we omit the calibration (Fig. 7) step in function Checktable() (Fig. 4) and always use all disk blocks to clean up the unchecked queue Q_unck and
G.D. Penna et al. Protocol Parameters Mem n peterson 9 States Rules Time ns 1,1,3,2,10 States Rules Time newlist6 7 States Rules Time ldash 1,4,1,false States Rules Time sci 3,1,1,2,1 States Rules Time mcslock1 6 States Rules Time
1 0.5 0.1 1.000 1.000 0.527 1.000 1.000 0.507 2.623 2.430 > 90.704 1.000 1.000 0.747 1.000 1.000 0.309 1.259 242.131 >77.895 1.000 1.000 0.253 1.000 1.000 0.203 1.331 1.357 >42.817 0.355 — — 0.245 — — >50.660 — — 1.000 0.361 — 1.000 0.647 — 1.616 > 11.863 — 1.000 1.000 0.137 1.000 1.000 0.115 1.821 1.691 >11.605
Fig. 11. Comparing Disk Murϕ in [16] with RAM Murϕ [11] (compression options: -b -c)
table M (Fig. 1), we obtain exactly the algorithm in [16] (quite obviously, since [16] was our starting point). Thus in the sequel, for the algorithm in [16], we use the implementation obtained as described above.
For the algorithm in [16] (implemented as above) we wanted to repeat the same set of experiments we ran for DMurϕ. However, the big protocols of Fig. 8 took too long, thus we did not include them in our set of experiments. Our results are in Fig. 11. Rows and columns in Fig. 11 have the same meaning as those in Fig. 10, but those of Fig. 11 refer to the algorithm in [16] (while those of Fig. 10 refer to DMurϕ). Computations taking much longer than the times in Fig. 8 were aborted. In such cases we get a lower bound on the time overhead w.r.t. standard Murϕ. This is indicated with a > sign before the lower bound. For aborted computations the rows States and Rules are, of course, less than 1 and give us an idea of the fraction of the state space explored before the computation was terminated.
Fig. 12 compares the performance of our DMurϕ with that of the disk based Murϕ in [16]. The meaning of rows and columns of Fig. 12 is as follows. Columns Protocol, Parameters and column α (with α = 1, 0.5, 0.1) have the meaning given in Fig. 9. Row Time gives the ratio (or a lower bound on the ratio) between the verification time when using disk based Murϕ in [16] and the verification time when using DMurϕ. Of course the interesting cases for us are those for which α = 0.1 (i.e. there is not enough RAM to complete verification using a RAM-BFS). For such cases,
Exploiting Transition Locality in the Disk Based Murϕ Verifier Protocol n peterson ns newlist6 ldash sci mcslock1
Parameters 9 1,1,3,2,10 7 1,4,1,false 3,1,1,2,1 6
Min Avg Max
Mem Time Time Time Time Time Time
1 1.221 0.726 0.781 > 24 0.892 0.950
0.5 1.182 112.934 0.768 > 24 >6 0.683
217
0.1 > 32 > 39 > 15 > 24 >6 >2
Time 0.726 0.683 >2 Time >4.762 > 24.261 > 19.667 Time > 24 112.934 > 39
Fig. 12. Comparing DMurϕ with disk based Murϕ in [16].
from the results in Fig. 12 we see that our algorithm is typically more than 10 times faster than the one presented in [16]. Note however that the results in Fig. 12 should be regarded as qualitative rather than quantitative. In fact, as described above, we obtained the algorithm in [16] by eliminating the calibration step from our algorithm. It is quite conceivable that when calibration is not to be performed one can devise optimizations that are not possible when calibration has to be performed. Still, the message of Figs. 10, 11, 12 is quite clear: because of transition locality, most of the time we do not need to read the whole disk D. This saves disk accesses and thus verification time.

Protocol: mcslock2   Parameters: N=4   Bytes: 16   Reach: 945,950,806   Rules: 3,783,803,224   MaxQ: 30,091,568   Diam: 153   T: 406,275
Mem: 300   HMem: 4,729,754   QMem: 481,465   TotMem: 5,211,219
Fig. 13. Results for DMurϕ on a 1GHz Pentium IV PC with 512M of RAM. Murϕ options used: -ndl (no deadlock detection), -b (bit compression), -c (40 bit hash compaction).
4.4 A Large Protocol
We also wanted to test our disk based approach on a protocol out of reach for both standard Murϕ [4,11] and Cached Murϕ [21,3] on our 512M machine. We found that the protocol mcslock2 (with N = 4) in the Murϕ distribution suits our needs. Our results are in Fig. 13. The meaning of the columns of Fig. 13 is as follows. Columns Protocol, Parameters, Bytes, Reach, Rules, MaxQ, Diam, T have the same meaning as in Fig. 8, but they refer to DMurϕ (while those of Fig. 8 refer to standard Murϕ). Column Mem gives the total RAM memory (in Megabytes) given to DMurϕ to carry out the given verification task.
Column HMem gives the hash table size (in kilobytes) that would be needed if we were to store all reachable states in a RAM hash table. Column QMem gives the RAM size (in kilobytes) needed for the BFS queue if we were to keep the whole BFS queue in RAM. Column TotMem gives the RAM size (in kilobytes) needed to complete the verification task using a RAM-BFS with standard Murϕ. TotMem is equal to (HMem + QMem).
5 Conclusions
We presented a disk based Breadth First Explicit State Space Exploration algorithm as well as an implementation of it within the Murϕ verifier. Our algorithm has been obtained from the one in [16] by exploiting transition locality [21] to decrease disk usage (namely, disk read accesses). Our experimental results show the following. Our algorithm is typically more than 10 times faster than the disk based algorithm proposed in [16]. Moreover, even when using 1/10 of the RAM needed to complete verification, our algorithm is only between 1.4 and 5.3 times (3 times on average) slower than RAM-BFS (namely, standard Murϕ) with enough RAM memory to complete the verification task at hand.
Statistical properties of transition graphs (such as transition locality) have proven quite effective in improving state space exploration algorithms ([21,22]) on a single processor machine. Looking for new statistical properties and for ways to exploit such statistical properties when performing verification on distributed processors are natural further developments of our research work.
Acknowledgements. We are grateful to Igor Melatti and to the FMCAD referees for helpful comments and suggestions on a preliminary version of this paper.
References
[1] R. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Trans. on Computers, C-35(8), Aug 1986.
[2] J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and L. J. Hwang. Symbolic model checking: 10^20 states and beyond. Information and Computation, (98), 1992.
[3] url: http://univaq.it/~tronci/cached.murphi.html.
[4] D. L. Dill, A. J. Drexler, A. J. Hu, and C. H. Yang. Protocol verification as a hardware design aid. In IEEE International Conference on Computer Design: VLSI in Computers and Processors, pages 522–5, 1992.
[5] F. Lerda and R. Sisto. Distributed-memory model checking with SPIN. In Proc. of 5th International SPIN Workshop, volume 1680. LNCS, Springer, 2000.
[6] G. J. Holzmann. The SPIN model checker. IEEE Trans. on Software Engineering, 23(5):279–295, May 1997.
[7] G. J. Holzmann. An analysis of bitstate hashing. Formal Methods in Systems Design, 1998.
[8] A. J. Hu, G. York, and D. L. Dill. New techniques for efficient verification with implicitly conjoined BDDs. In 31st IEEE Design Automation Conference, pages 276–282, 1994.
[9] C. N. Ip and D. L. Dill. Better verification through symmetry. In 11th International Conference on: Computer Hardware Description Languages and their Applications, pages 97–111, 1993.
[10] C. N. Ip and D. L. Dill. Efficient verification of symmetric concurrent systems. In IEEE International Conference on Computer Design: VLSI in Computers and Processors, pages 230–234, 1993.
[11] url: http://sprout.stanford.edu/dill/murphi.html.
[12] R. K. Ranjan, J. V. Sanghavi, R. K. Brayton, and A. Sangiovanni-Vincentelli. Binary decision diagrams on network of workstations. In IEEE International Conference on Computer Design, pages 358–364, 1996.
[13] J. V. Sanghavi, R. K. Ranjan, R. K. Brayton, and A. Sangiovanni-Vincentelli. High performance BDD package by exploiting memory hierarchy. In 33rd IEEE Design Automation Conference, 1996.
[14] url: http://netlib.bell-labs.com/netlib/spin/whatispin.html.
[15] U. Stern and D. Dill. Parallelizing the Murϕ verifier. In Proc. 9th Int. Conference on: Computer Aided Verification, volume 1254, pages 256–267, Haifa, Israel, 1997. LNCS, Springer.
[16] U. Stern and D. Dill. Using magnetic disk instead of main memory in the Murϕ verifier. In Proc. 10th Int. Conference on: Computer Aided Verification, volume 1427, pages 172–183, Vancouver, BC, Canada, 1998. LNCS, Springer.
[17] U. Stern and D. L. Dill. Improved probabilistic verification by hash compaction. In IFIP WG 10.5 Advanced Research Working Conference on: Correct Hardware Design and Verification Methods (CHARME), pages 206–224, 1995.
[18] U. Stern and D. L. Dill. A new scheme for memory-efficient probabilistic verification. In IFIP TC6/WG6.1 Joint International Conference on: Formal Description Techniques for Distributed Systems and Communication Protocols, and Protocol Specification, Testing, and Verification, 1996.
[19] url: http://verify.stanford.edu/uli/research.html.
[20] T. Stornetta and F. Brewer. Implementation of an efficient parallel BDD package. In 33rd IEEE Design Automation Conference, pages 641–644, 1996.
[21] E. Tronci, G. Della Penna, B. Intrigila, and M. Venturini Zilli. Exploiting transition locality in automatic verification. In IFIP WG 10.5 Advanced Research Working Conference on: Correct Hardware Design and Verification Methods (CHARME). LNCS, Springer, Sept 2001.
[22] E. Tronci, G. Della Penna, B. Intrigila, and M. Venturini Zilli. A probabilistic approach to space-time trading in automatic verification of concurrent systems. In Proc. of 8th IEEE Asia-Pacific Software Engineering Conference (APSEC), Macau SAR, China, Dec 2001. IEEE Computer Society Press.
[23] Pierre Wolper and Dennis Leroy. Reliable hashing without collision detection. In Proc. 5th Int. Conference on: Computer Aided Verification, pages 59–70, Elounda, Greece, 1993.
Traversal Techniques for Concurrent Systems
Marc Solé and Enric Pastor
Department of Computer Architecture, Technical University of Catalonia, 08860 Castelldefels (Barcelona), Spain {msole, enric}@ac.upc.es
Abstract. Symbolic model checking based on Binary Decision Diagrams (BDDs) is a verification tool that has received increasing attention from the research community. The conventional breadth-first approach to state generation is often responsible for inefficiencies due to the growth of the BDD sizes. This is especially true for concurrent systems, for which existing research (mostly oriented to synchronous designs) is ineffective. In this paper we show that it is possible to improve BFS symbolic traverse for concurrent systems by scheduling the application of the transition relation. The scheduling scheme is devised by analyzing the causality relations between the events that occur in the system. We apply the scheduled symbolic traverse to invariant checking. We present a number of schedule schemes and analyze their implementation and effectiveness in a prototype verification tool.
1 Introduction
A lot of effort has been made by the verification community to develop efficient traversal methods [1,2]. Unfortunately, most of them are designed to improve the traversal process of synchronous systems and are not suitable or relevant for concurrent systems (concurrent systems may include asynchronous circuits [3,4], distributed systems [5,6], etc.). In synchronous systems, transition relations (TRs) are usually partitioned and the sequence of application of each part must be decided in order to reduce the BDD sizes for intermediate results. The application order in this case is important because the way the variables are quantified depends on it, affecting the size of the intermediate representation. This is usually referred to as the quantification schedule problem.
Algorithms developed to solve the quantification schedule problem have no practical application for concurrent systems. In this latter case we usually have a disjunctive collection of small TRs, each one describing the behavior of some component. Each individual TR is applied assuming interleaved semantics and the result is immediately added to the reachable set of states, so the order in which these TRs are fired has a strong influence on the overall performance.
This work has been partially funded by the Ministry of Science and Technology of Spain under contract TIC 2001-2476-C03-02 and grant AP2001-2819.
Some authors have studied the influence of ordering the application of the TR to avoid the BDD explosion problem. Their goal is to schedule the exploration of the state space by taking only selected portions of the TR, or by delaying the exploration of certain states. In [7] Ravi and Somenzi proposed a “high density” traverse, which does not use the set of newly reached states as the from set for the next iteration. Instead it uses a subset of the newly reached states that has a more compact representation. This is a partial traverse, so it must afterwards be completed. In [8] Cabodi et al. use “activity profiles” for each BDD node in the TRs and prune the BDDs to perform a partial traversal, which is again completed in the end. The “activity profiles” are obtained in a preliminary reachability learning phase. In [9] Hett et al. propose a sequence of partial traverses that combine subsets of the newly reached states and dynamic TR pruning. Both manipulations are applied using the Hamming distance as the main part of the heuristic function. In [10] Ravi and Cabodi allow the user to provide hints to guide symbolic search. User-defined hints are used to simplify the TR, but require the user to understand the design and also predict the BDD behavior.
Our objective is to minimize the CPU time of the traversal process. Usually the problems appear in its intermediate steps, as big BDDs start to be generated. In these cases, the faster you can discover the remaining states the better the performance is, due to BDD recombination. The speed of new state generation is highly related to the number of TR applications needed to end up the process. Hence an algorithm for determining a good TR application order is crucial.
This paper proposes a method that intends to complete symbolic traversal with the minimum number of TR applications. The number of intermediate steps is reduced, thus reducing the probability of generating an intermediate BDD that is much too big to cope with. We present four symbolic traverse algorithms that schedule the application of the TRs. The TR application schemas are named: token traverse (TOK), weighted token traverse (WTOK), dynamic event-clustered traverse (DEC), and TR cluster-closure traverse (TRCC).
TOK and WTOK require a static analysis of the system to build the TR application schema. The analysis is basically an a priori causality analysis between TRs (see Section 3). Once we have derived a TR application schema we use it to decide the order in which the TRs will be applied. The schema does not imply a static TR application order because it uses feedback from the traversal to adapt the order dynamically. TOK and WTOK differ in the kind of feedback that they receive from the traversal analysis. DEC tries to be more accurate in its TR application schema, so it is completely adaptable and has no initial precomputation phase. DEC keeps constantly updated information on how many states each TR may be applied to for the first time. Hence we can decide each time which TR has the highest probability of generating new states at the fastest rate.
Finally, TRCC is an adaptation of partial iterative squaring to the scope of concurrent systems. We combine some TRs to (1) reduce the number of TRs while keeping their size small, thus reducing the number of intermediate results,
and (2), thanks to squaring, reduce the number of iterations needed by the schema to complete the analysis. The paper is organized as follows. Section 2 is devoted to basic models for the formal verification of concurrent systems. Section 3 reviews some of the known peculiarities of symbolic traversal for concurrent systems and their impact on performance. Sections 4, 5, 6 and 7 are the core of this paper, as they explain the four traversal proposals: TOK, WTOK, DEC, and TRCC, respectively. Section 8 presents some preliminary results on the performance of the different methods on some benchmarks. Finally, Section 9 concludes the paper.
2 Background
A finite transition system (TS) [11] is a 4-tuple A = ⟨S, Σ, T, Si⟩, where S is a finite set of states, Σ is a non-empty alphabet of events, T is a transition relation such that T ⊆ S × Σ × S, and Si ⊆ S are the initial states. Transitions are denoted by s −e→ s′. An event e is enabled at state s if ∃ s −e→ s′ ∈ T. Given an event e, its firing region Fr(e) is defined as Fr : Σ → 2^S such that Fr(e) = {s ∈ S | ∃ s −e→ s′ ∈ T}. Event e is said to be firable at state s if s ∈ Fr(e). The concurrent execution of events is described by means of interleaving; that is, weaving the execution of events into sequences. Given the significance of individual events, the transition relation of a TS can be naturally partitioned into a disjoint set of relations, one for each event e ∈ Σ: T_e = {s −e→ s′ ∈ T}. To represent events symbolically we use a set of Boolean variables that encode the states of the TS and a Boolean relation to encode each TR. The application of a TR T_e on some set of states R results in a set of states R′ that contains all the states reachable from R through a transition of event e. Although a TS is a powerful formalism, it is not usually used directly to specify concurrent systems. Instead, other high-level formalisms like Petri nets [12] or circuit structural descriptions are used, which are later translated to transition systems for analysis. A Petri net (PN) is a 4-tuple N = ⟨P, T, W, M0⟩, where P = {p1, p2, . . . , pn} and T = {t1, t2, . . . , tm} are finite sets of places and transitions satisfying P ∩ T = ∅ and P ∪ T ≠ ∅; W : (P × T) ∪ (T × P) → Z defines the weighted flow relation, and M0 is the initial marking. The function M : P → N is called a marking; that is, an assignment of a nonnegative integer to each place. If k is assigned to place p, we say that p is marked with k tokens. If W(u, v) > 0 then there is an arc from u to v with weight (or multiplicity) W(u, v). PNs are graphically represented by drawing places as circles, transitions as boxes or bars, the flow relation as directed arcs, and tokens as dots inscribed in the places (see the example in Fig. 5).
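To make these definitions concrete, the small sketch below represents a system explicitly in Python: each event owns a transition relation T_e given as a set of state pairs, Fr(e) is read off it, and applying T_e to a set of states yields the image. This is only an explicit-state analogue with names and a toy system invented here for illustration; the paper itself manipulates the same objects symbolically with BDDs.

```python
# One transition relation per event, represented explicitly as a set of
# (state, next_state) pairs; states are arbitrary hashable Python values.
def firing_region(tr):
    """Fr(e): the states from which the event can fire."""
    return {s for (s, _) in tr}

def image(tr, states):
    """Apply the event's TR to a set of states (one application of T_e)."""
    return {t for (s, t) in tr if s in states}

# A toy system with three events over 3-bit states (p1, p2, p3).
trs = {
    "A": {((0, 0, 0), (1, 0, 0)), ((0, 1, 0), (1, 1, 0))},
    "B": {((0, 0, 0), (0, 1, 0)), ((1, 0, 0), (1, 1, 0))},
    "C": {((1, 1, 0), (0, 0, 1))},
}
print(firing_region(trs["C"]))       # {(1, 1, 0)}
print(image(trs["A"], {(0, 0, 0)}))  # {(1, 0, 0)}
```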
3 Causality and Chaining Traversal
Fig. 1. Example of the exploration process of a 13-state concurrent system using (a) BFS, (b) BFS with chaining in lexicographical order of TR application, and (c) BFS with chaining in inverse lexicographical order.
To speed up the generation of new states we combine two kinds of techniques: causality analysis and chaining. In traditional breadth-first search (BFS) the TR
is applied to the same from set to generate a new to set. Using chaining, after each TR application the from set is updated with the states recently generated. Thus a domino effect is produced and more states are discovered in only one TR application. Figures 2 and 3 show the difference between BFS traversal and BFS traversal with chaining. When chaining is used, the order in which TRs are applied plays a crucial role. As an example, Fig. 1 shows a TS in which the behavior of symbolic traversal depends to a great extent on the selected TR application order. Each one of the subfigures in Fig. 1 shows the performance of a different approach on the same system. Subfigure (a) corresponds to a traditional BFS traversal. The progress of the reachability set is indicated by means of labeled arcs of type "iteration n", indicating that all the states over that arc were discovered in BFS step n. Subfigures (b) and (c) also show a BFS approach, but using chaining. The difference between these two subfigures is the order in which the chaining of events is applied. In (b) we used lexicographical order (so in each step we applied the events as follows: {a, b, c, d, e, f, g}), and in (c) we used inverse lexicographical order ({g, f, e, d, c, b, a}). In this case the length of the traversal process was: (b) 1 iteration, (c) 3 iterations. In (b) we show the detailed behavior of chaining and we draw the reachability set after each event is fired. As we can see, the whole system is traversed in only one step, while in (c) three steps are needed, although chaining is also used. The state generation rate of this technique may be limited if a TR application order is established that does not pay attention to causality between TRs. The causality between pairs of TRs can be approximated by the following heuristic that tries to numerically indicate the a priori causal relationship between events. Let T_{ei} and T_{ej} be the TRs of two events ei and ej. We define XTo_{ei→ej}(V) as

XTo_{ei→ej}(V) = ∃_{v∈V} [T_{ei}(V, V′) ∩ Fr(ei)(V) ∩ ¬Fr(ej)(V)]_{V′←V}
Fig. 2. State generation using BFS traversal.
Fig. 3. State generation using chained traversal.
(see Fig. 4). From now on we will avoid the overhead of explicitly stating the present-state set of Boolean variables V and the next-state set V′ in the formulas. Therefore the previous formula will be rewritten as

XTo_{ei→ej} = ∃_{v∈V} [T_{ei} ∩ Fr(ei) ∩ ¬Fr(ej)]_{V′←V}

The XTo_{ei→ej} operator simply gives us the set of states reached after the firing of event ei from the states in which event ej was not fireable. The heuristic causality(ei → ej) is defined as

causality(ei → ej) = |XTo_{ei→ej} ∩ Fr(ej)| / |Fr(ej)|
and indicates the proportion between the set found with the XTo operator and Fr(ej). Graphically (see Fig. 4(c)) it is the proportion of the dashed area with respect to the whole Fr(ej) set.
Fig. 4. The XTo_{A→B} operator: (a) shows the To operator, (b) depicts XTo_{A→B}, and (c) shows their relationship.
Intuitively, large values of causality(ei → ej) show that the activation of TR T_{ei} will tend to produce states in which the application of TR T_{ej} is possible. It must be noted that it is possible to define the symmetric heuristic of causality(ei → ej), denoted negative_causality(ei → ej), by defining the operator CTo_{ei→ej} as

CTo_{ei→ej} = ∃_{v∈V} [T_{ei} ∩ Fr(ei) ∩ Fr(ej)]_{V′←V}

This function returns the set of states reached after firing event ei from the states in which event ej was fireable. negative_causality(ei → ej) is defined as:

negative_causality(ei → ej) = |CTo_{ei→ej} ∩ ¬Fr(ej)| / |CTo_{ei→ej}|
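The heuristic can be rehearsed on the explicit-state representation of the earlier sketch; the code below computes causality(ei → ej) and negative_causality(ei → ej) from two event TRs given as sets of state pairs. The helpers, the handling of empty denominators, and the tiny example are our own assumptions; a BDD implementation performs the same set operations symbolically.

```python
def firing_region(tr):
    """tr: set of (state, next_state) pairs for one event."""
    return {s for (s, _) in tr}

def image(tr, states):
    return {t for (s, t) in tr if s in states}

def causality(tr_i, tr_j):
    """Fraction of Fr(ej) covered by states reached by firing ei outside Fr(ej)."""
    fr_j = firing_region(tr_j)
    x_to = image(tr_i, firing_region(tr_i) - fr_j)
    return len(x_to & fr_j) / len(fr_j) if fr_j else 0.0

def negative_causality(tr_i, tr_j):
    """Fraction of states reached by firing ei from Fr(ej) that leave Fr(ej)."""
    fr_j = firing_region(tr_j)
    c_to = image(tr_i, firing_region(tr_i) & fr_j)
    return len(c_to - fr_j) / len(c_to) if c_to else 0.0

# Event a: 0 -> 1; event b: 1 -> 2. Firing a enables b, so causality(a, b) = 1.
tr_a, tr_b = {(0, 1)}, {(1, 2)}
print(causality(tr_a, tr_b), negative_causality(tr_a, tr_b))   # 1.0 0.0
```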
Definition 1. Two TRs A and B are said to be independent iff each one of the transitions of A falls into one of the following categories:
1. it goes from a state where TR B is fireable to a state where TR B is still fireable, or
2. it goes from a state where TR B is not fireable to a state where TR B is still not fireable,
and the same must hold if TR B is applied with respect to the fireability of TR A.
Theorem 1. Two TRs A and B are independent ⇔ causality(A → B) = 0 and negative_causality(A → B) = 0 and causality(B → A) = 0 and negative_causality(B → A) = 0.
Proof. If A and B are independent, the application of one of them to some set S of states cannot change the enableness/disableness of the other. Suppose in set S we
have states from which B can be fired (set S_B) and states from which it cannot (set S̄_B). If we apply A to S_B or S̄_B there may be states that do not change. These immediately satisfy the property mentioned above. The states that have changed must satisfy the following: if they were states of S_B, the application of A must produce only states in which B is fireable (negative_causality(A → B) is then 0). If they were part of S̄_B, then the states generated cannot be in Fr(B) (so causality(A → B) must be 0). The same holds if we exchange A and B.
It must be noted that this concept of independence may be viewed as a strong independence or structural independence, as it can happen that two dependent TRs behave, in fact, as independent given some particular initial states.
Definition 2. The set of variables which constitute a formula ϕ is called the support of ϕ, written Sup(ϕ).
To specify the formula for a TR we use two sets of variables, one to represent the present state and another to represent the next state.
Definition 3. Let V be the set of variables used to represent the present state, and V′ the set of variables used to represent the next state. We define the function related(v′) as a bijective function between V′ and V. Given a variable v′ in V′, related(v′) returns the corresponding variable v in V. We extend the function related to sets of variables; i.e. related(Va) returns the set of variables related to Va. Formally, related(Va) = {related(v) | v ∈ Va}. For instance, assuming V = {p1, p2, p3} and V′ = {q1, q2, q3}, then related(q3) = p3 and related({q1, q3}) = {p1, p3}.
Definition 4. An event ei is said to have independent causal support from event ej iff related(Sup(T_{ej})) ∩ Sup(T_{ei}) = ∅.
Theorem 2. Events that have mutual independent causal support from each other are independent.
Proof. If related(Sup(T_{ej})) ∩ Sup(T_{ei}) = ∅ is true, then event ej is not able to write on the variables on which the enableness of ei depends. Then, any state obtained from the activation of ej will preserve the enableness of ei. Thus, ei is independent from ej. The same can be stated by interchanging ei and ej, so ei and ej are independent events.
Theorem 2 can be used to simplify the computation of the causal matrix (see Section 4), as this independence check only involves variable set manipulation, which is usually very fast. Only for those events that do not satisfy this check do we need to compute the causality heuristic to determine their final causality value.
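Theorem 2 suggests a cheap syntactic pre-filter: before computing the causality heuristic for a pair of events, check whether the present-state support of one TR is disjoint from the (renamed) next-state support of the other. A minimal sketch of that check over plain variable sets is shown below; the explicit support sets and the related() mapping are supplied by hand here, whereas a BDD implementation would read them off the TRs.

```python
def related(next_vars, mapping):
    """Map a set of next-state variables to their present-state counterparts."""
    return {mapping[v] for v in next_vars}

def independent_causal_support(sup_ti, sup_tj_next, mapping):
    """ei has independent causal support from ej iff the mapped supports are disjoint."""
    return not (related(sup_tj_next, mapping) & sup_ti)

# Example: V = {p1, p2, p3}, V' = {q1, q2, q3}.
mapping = {"q1": "p1", "q2": "p2", "q3": "p3"}
# T_ej writes only q3, while T_ei depends on p1 -- so ei is independent of ej.
print(independent_causal_support({"p1", "q1"}, {"q3"}, mapping))   # True
```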
4 Token Traversal
Fig. 5. Event PN inferred from the causality matrix: (a) the causality matrix, (b) the 0/1 matrix, (c) the corresponding event PN.
(a)     A    B    C
   A    0   0.3  0.2
   B    0    0   0.4
   C   0.1   0    0
(b)     A    B    C
   A    0    1    1
   B    0    0    1
   C    1    0    0
Given a concurrent system, it is possible to compute causality(ei → ej) for each pair of events (TRs), resulting in a causality matrix. This matrix can be analyzed in such a way that we produce a PN model of the event firing. This
transformation is done as follows: suppose that each event is a place; then, for every position of the matrix different from 0, we establish a relation between the places of the two events related to that matrix position. For instance, imagine the causality matrix for a three-event system shown in Fig. 5(a). All matrix positions that have a value greater than 0 are changed to 1, otherwise their value remains 0 (see Fig. 5(b)). The corresponding PN is depicted in Fig. 5(c). Although we use the same graphical representation as a PN, it does not behave as a normal PN as defined in Section 2. Instead, the traversal scheme always fires the event whose place holds the most tokens. Obviously we must define some initial tokens. In order to do this we put a token in all places corresponding to events that are initially fireable; more precisely, a token for each event e ∈ Σ such that New ∩ Fr(e) ≠ ∅. Initially, the set of new states is equal to the initial set of states (New = From). A brief outline of the algorithm is given below:
1. select the place with the highest number of tokens
2. fire the event associated with this place
3. if the event has generated new states
4. then put one token on all successors
5. else absorb tokens
Fig. 6. Token firing scheme (TOK).
Figure 6 shows an example of the algorithm execution over the system represented in Fig. 5. We assume two initial tokens, on events A and B. When there is more than one place with the maximum number of tokens, one of them is chosen randomly. In our case event A was selected, although event B was also a possible choice. Let us assume that event A generates new states (states not already visited); then one token is placed on each of its successors, that is, places B and C. Next, event B is selected (the only possible choice this time) and is fired, successfully generating new states. As a result two tokens are placed on event C (the initial one plus the token from B), which is our next choice, corresponding to the last state shown in the figure. Now consider what happens if event C is not successfully fired. All tokens on the net are absorbed, so no possible event can be selected afterwards. In this case, the algorithm starts up again, first by recalculating the
New set as the set of new states generated since the last setup and then placing tokens on the events fireable given this present new set. Proceeding in this way, the number of steps can be considered as the number of setups, and inside one step all firings use chaining to take advantage of the causality relation. The algorithm for TOK is shown next. The external loop is repeated until traversal is finished (no new states generated in the last step). The inner loop represents one step: we select events until all tokens are absorbed.
repeat
  oldFrom = from
  initial_tokens( net, new )
  stop = FALSE
  while (¬stop)
    event = select_event_max_token( net )
    to = fire_event( event, from )
    if (to ⊆ reached)
      absorb_token( net, event )
    else
      propagate_token( net, event )
    from = from ∪ to
    reached = reached ∪ to
    if ( no_more_tokens( net ) )
      stop = TRUE
  new = reached \ oldFrom
  from = new
until (new = ∅)
We provide a brief definition of all functions called in this pseudo-code:
– initial_tokens scans all events and adds a token to the corresponding place if the event is fireable in some state contained in New, i.e. New ∩ Fr(e) ≠ ∅.
– select_event_max_token selects the event that has the most tokens in its corresponding place.
– absorb_token removes the tokens from the place assigned to the event passed as argument.
– propagate_token removes the tokens from the place assigned to the event passed as argument and adds one token on each of the successors of that event.
– no_more_tokens returns true if there is no token left in the net.
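For illustration, the sketch below is a direct executable rendering of the algorithm over the explicit-state representation used in the earlier sketches; the successor lists stand for the rows of the 0/1 causality matrix, ties for the maximum token count are broken arbitrarily, and all names are ours.

```python
def firing_region(tr):
    return {s for (s, _) in tr}

def image(tr, states):
    return {t for (s, t) in tr if s in states}

def tok_traverse(trs, initial, successors):
    """trs: event -> set of (s, s') pairs; successors: event -> events with a 1
    in that row of the 0/1 causality matrix."""
    reached, frm, new = set(initial), set(initial), set(initial)
    while new:
        old_from = set(frm)
        # Setup: one token on every event fireable in some new state.
        tokens = {e: int(bool(new & firing_region(tr))) for e, tr in trs.items()}
        while any(tokens.values()):
            event = max(tokens, key=tokens.get)   # place with the most tokens
            to = image(trs[event], frm)
            if to <= reached:
                tokens[event] = 0                 # no new states: absorb
            else:
                tokens[event] = 0                 # new states: propagate
                for succ in successors.get(event, []):
                    tokens[succ] += 1
            frm |= to                             # chaining: extend the from set
            reached |= to
        new = reached - old_from
        frm = set(new)
    return reached
```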
5 Weighted Token Traversal
It is possible to expand the preceding idea and consider that when an event is successfully fired we do not add only one token to its successors. Instead we can add a number of tokens related to the number of new states generated in which this particular successor is fireable. This would solve one of the problems of our previous proposal, the ineffective activation. The problem arises when a token is placed on one event because its predecessor generated new states, but there are no real new states in which this particular event can be fired. As a result its activation is superfluous. We will illustrate this problem with an example. Suppose we have a TS with three variables V = {p1, p2, p3} and three events A, B and C. To specify the TRs we also use an extra set of variables V′ = {q1, q2, q3} on the next state. The TRs for the events are given below:
– TR A: ¬p1 · q1
– TR B: ¬p2 · q2
– TR C: p1 · p2 · ¬q1 · ¬q2 · q3
The initial state s0 is p1 = 0, p2 = 0, p3 = 0, which we write as 000. This system has the reachable set of states S that we depict in Fig. 7. The causality matrix of this system (once all values greater than 0 are converted to 1) is
   A  B  C
A  0  0  1
B  0  0  1
C  1  1  0
which translates into the net of Fig. 7(b). Applying the TOK scheme (see Section 4), Fig. 8 depicts the execution of the traversal on the example. We start at state 000, where events A and B can be activated. This is shown in the first net of Fig. 8 by the two tokens placed on places A and B. The algorithm may select A to fire. A token is placed on C, as in the causality matrix A is related to C. However, the activation of A has only produced state 100, from which C cannot fire, although a token has been placed on its place. Now the algorithm may select C to fire (third net in the figure), which is a superfluous activation because no new state can be produced.
Fig. 7. (a) State space for the ineffective activation example; (b) firing net for the ineffective activation example.
Fig. 8. Ineffective activation example for the system of Figure 7(b) using token traversal.
In order to tighten the relationship between the number of tokens and the real number of new states in which an event can be fired, we redefine the number of tokens as a lower bound on the number of states from which the event may be fired. Later on we will justify why it is only a lower bound. The setup is done as in TOK, except that the number of tokens placed on every event is given by |From ∩ Fr(e)|. When an event is selected and fired, we compute the new-states set (inNew = To \ Reached), and we add to each successor of the event the quantity |inNew ∩ Fr(e)|. We stated that the number of tokens in a place is a lower bound on the number of fireable states for the event related to that place. We illustrate this with an example. In Fig. 7 the causality matrix has zeros for the relationships between A and B, as they are independent events. However, it can be seen in Fig. 7 that if our starting point is state 000 and we fire event A, we obtain state 100; that is, a state in which B is also fireable. No new token has been placed on
B because there is a zero in the causality matrix for those two events, although now B can be fired in two different states. These "untracked" states are always states in which an event ei was already fireable and then the activation of ej added new fireable states for ei (the events were independent). Although they are not accounted for by the number of tokens, the algorithm indirectly keeps track of them because initial tokens are placed on all possibly fireable events. In this example, although no additional token is added to the place of event B, there is already a token there and eventually B will be fired. Next we present the main WTOK traversal schema, which resembles the TOK algorithm:
repeat
  oldFrom = from
  initial_tokens( net, new )
  stop = FALSE
  while (¬stop)
    event = select_event_max_token( net )
    to = fire_event( event, from )
    inNew = to \ reached
    distribute_tokens( net, event, inNew )
    from = from ∪ to
    reached = reached ∪ to
    if ( no_more_tokens( net ) )
      stop = TRUE
  new = reached \ oldFrom
  from = new
until (new = ∅)
Using this schema, the sequence of firings for the TS in Fig. 7 is shown in Fig. 9. Note that with respect to Fig. 8 the ineffective activation problem has been eliminated. Compared with TOK, WTOK allows a greater level of accuracy, but it is computationally slightly more expensive, because for every possible successor a BDD AND operation is performed.
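The only difference with respect to TOK is the token bookkeeping; the sketch below shows the weighted setup and distribution, with set cardinalities standing in for the BDD counts |New ∩ Fr(e)| used in the paper. The example data is invented for illustration.

```python
def initial_weighted_tokens(fr, new):
    """fr: event -> Fr(e) as a set of states. Every event starts with |New ∩ Fr(e)| tokens."""
    return {e: len(new & fr[e]) for e in fr}

def distribute_tokens(fr, tokens, event, in_new, successors):
    """After firing `event`, each successor gains |inNew ∩ Fr(succ)| tokens."""
    tokens[event] = 0
    for succ in successors.get(event, []):
        tokens[succ] += len(in_new & fr[succ])

# Tiny usage example with precomputed firing regions.
fr = {"A": {0}, "B": {1}, "C": {2}}
tokens = initial_weighted_tokens(fr, new={0, 1})
distribute_tokens(fr, tokens, "A", in_new={1, 2}, successors={"A": ["B", "C"]})
print(tokens)   # {'A': 0, 'B': 2, 'C': 1}
```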
Fig. 9. Solution to the ineffective activation provided by the weighted token traversal.
6 Dynamic Event-Clustered Traversal
We have seen that WTOK does not guarantee an exact equivalence between the number of tokens and the new states to fire from. The main problem is the untracked states produced by independent event firings. This is a side-effect of using only causality to determine the successor events of an event, as we have already stated in the previous section. Causality was used in order to produce sequences of firings favorable to chaining. However, if we fire not only causally related events but also independent events, then the use of chaining is unadvisable. A generalized use of chaining usually implies larger execution times, as all events are fired in each iteration. To avoid the ineffective application of the TRs we propose to keep track of all states in which each particular event is enabled (DEC). Hence, we store a From set for each event in the system (denoted From(e)). This set should hold all states, up to the current state of the reachability analysis, from which the event has not been fired yet; that is, all new states for the event. When an event is fired from the set of states assigned to it, it implicitly uses chaining. The firing scheme is as follows. Given a set of new states, they are distributed over the events in the TS. Those states in which a certain event is enabled are associated to it and accumulated with other states that have been previously assigned. The set is updated as From(e) = From(e) ∪ (New ∩ Fr(e)). The number of tokens "assigned" to each event is computed as the cardinality of the set From(e). The event with the greatest number of fireable states is selected, the event is fired, its From(e) set is emptied, and the new states generated are distributed again. The scheme ends when all events have an empty from set. The main algorithm is given below:
stop = FALSE
while (¬stop)
  event = select_event_max_from( event_list )
  if (event = NULL)
    stop = TRUE
  else
    to = fire_event( event, event→from )
    event→from = ∅
    new = to \ reached
    reached = reached ∪ to
    distribute_tokens( new, event_list )
The price to pay for the exact knowledge this scheme provides is an increased computational complexity. For every event activation, the state distribution process implies n BDD operations, where n is the number of events, compared with WTOK, in which only k BDD operations are performed, where k is the number of successors of that particular event. Another drawback is the BDD blowup problem, since the from sets tend to grow due to poor BDD recombination. To mitigate this problem the from sets are minimized using the reachability set.
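In explicit-state terms the scheme can be sketched as follows: a From(e) set is kept per event, the event with the largest pending set is fired, and the newly generated states are redistributed over all events. This is our own illustrative rendering; in the paper the sets are BDDs, and the minimisation against the reachability set mentioned above is not shown here.

```python
def firing_region(tr):
    return {s for (s, _) in tr}

def image(tr, states):
    return {t for (s, t) in tr if s in states}

def dec_traverse(trs, initial):
    """trs: event -> set of (s, s') pairs; initial: set of initial states."""
    reached = set(initial)
    pending = {e: set(initial) & firing_region(tr) for e, tr in trs.items()}  # From(e)
    while True:
        # Pick the event with the largest set of states it has not fired from yet.
        event = max(pending, key=lambda e: len(pending[e]))
        if not pending[event]:
            return reached
        to = image(trs[event], pending[event])
        pending[event] = set()
        new = to - reached
        reached |= to
        for e, tr in trs.items():                 # redistribute the new states
            pending[e] |= new & firing_region(tr)
```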
Fig. 10. Closure of an ORed event.
7 Transition Relation Cluster-Closure Traversal
One of the main bottlenecks of symbolic verification is the size of the TR as a result of its monolithic structure. After partitioned TRs were introduced, the bottleneck moved to the representation of the intermediate sets of reachable states. In concurrent systems partitioned TRs are even more natural due to their inherent structure. However, the additional number of intermediate sets and BDD operations increases the probability of a BDD blowup. We propose a firing scheme that reduces the number of TR applications by clustering subsets of events (TRCC). A monolithic, transitively closed TR is created for each cluster. Events are added to clusters incrementally. Without loss of generality, two events are clustered together by ORing their TRs. ORing produces a single TR whose activation has the same effect as activating both TRs independently. Note that the TR size increases as the set of support variables in each TR grows. Hence, clustering stops when a certain BDD size is reached. As a result, we perform fewer TR applications, though normally more expensive ones. In concurrent systems it is common to have concurrency diamonds due to events that are independent. In order to generate such a diamond in only one firing we also concatenate TRs. This process is a particular case of iterative squaring. Iterative squaring is a powerful technique because, when used with a monolithic TR, it may exponentially reduce the number of steps required to complete the reachability analysis. Unfortunately, it is often the case that this is computationally too expensive. However, when transitive closure is used with smaller TRs it may be effective and computationally suitable. If we take a two-event TR and compute its closure, we obtain a TR that can compute at least the full concurrency diamond in one step. In fact more states can be discovered, depending on whether the events can be iteratively activated or not (see Fig. 10). In practice we add events to the event clusters iteratively. First we OR in the TR of the new event and then compute the transitive closure of this new TR (usually we obtain smaller BDD sizes). Our approach does not assume any hierarchical structure in the system. To avoid uncontrolled BDD growth we cluster the events that share as many variables as possible.
In the results presented in Section 8, each event was clustered with some other event that had the most variables in common. In this way the number of events can be reduced to at most half of the original number.
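The two ingredients of TRCC can be illustrated on the explicit representation: ORing two event TRs (set union of their transition pairs) and closing the resulting cluster transitively, so that a whole concurrency diamond is produced by a single application. The fixpoint loop below mirrors iterative squaring only in spirit; a BDD implementation would compose the relations symbolically, and the example system is our own.

```python
def or_trs(tr_a, tr_b):
    """Cluster two events: the union behaves like firing either of them."""
    return tr_a | tr_b

def transitive_closure(tr):
    """Smallest relation containing tr and closed under composition."""
    closure = set(tr)
    while True:
        step = {(s, u) for (s, t) in closure for (t2, u) in tr if t == t2}
        if step <= closure:
            return closure
        closure |= step

# Two independent events on a pair of bits: a sets the first, b the second.
tr_a = {((0, 0), (1, 0)), ((0, 1), (1, 1))}
tr_b = {((0, 0), (0, 1)), ((1, 0), (1, 1))}
cluster = transitive_closure(or_trs(tr_a, tr_b))
# One application of the closed cluster reaches the whole diamond from (0, 0).
print({t for (s, t) in cluster if s == (0, 0)})   # {(1, 0), (0, 1), (1, 1)}
```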
8 Experimental Results
All the results are from executions on a Pentium III at 833 MHz. In the following tables several concurrent systems are analyzed using the schemes described in this article. Due to space constraints we use abbreviations in the tables. The correspondence between the abbreviations and the methods is:
– Seq: BFS traversal.
– GChain: greedy chaining traversal.
– TOK: token traversal (see Section 4).
– WTOK: weighted token traversal (see Section 5).
– DEC: dynamic event-clustered traversal (see Section 6).
– TRCC: transition relation cluster-closure traversal (see Section 7).
For TRCC, some examples have an additional entry. The default method is TRCC, but an entry written as TRCC* indicates that the execution used manual clustering. As it is not always easy to define good partitioning schemes, we only report results when this was possible. The greedy chaining traversal is equal to the BFS algorithm (i.e., same firing order) with the only exception that it makes use of the chaining technique (see Section 3). Column "Events" shows the total number of event firings used to traverse the system. Note that some results are not directly comparable; for instance, TRCC reduces the number of events in the system. Column "Peak" shows the peak number of BDD nodes reached during traversal (in thousands). Finally, the last column specifies in seconds the wall-clock time needed to finish the analysis, or timeout if the algorithm failed to finish within an hour (3600 s). When the total time was eventually obtained, it is given in brackets. When the time at which the algorithm was stopped was greater than an hour, this is indicated with a ">" sign. We analyzed different types of systems. Their characteristics are described in Table 1. Basic information on the size is given: number of Boolean variables, number of events, and number of reachable states. The second column shows the original formalism of the system (before generating the equivalent TS): C for circuits and P for Petri nets. We give a brief description of the most relevant systems:
– RGD-arbiter: the asP* RGD arbiter presented in [13], at transistor level.
– STARI (16): a self-timed pipeline.
– Slotted ring (n): slotted ring protocol for LANs (n is the number of nodes).
– dme/DME (n): various DME implementations/specifications.
– Muller (n): Muller's C-element pipeline of n elements.
In all examples the TR application count is largely reduced (the original goal of this work). We can also see that the way in which TRs are applied also provides
benefits in terms of CPU execution times and BDD sizes. The RGD-arbiter, the slotted ring, the DME specification and the Muller pipeline are examples in which almost any traversal scheme will provide improvements. On the contrary, the STARI pipeline does not respond to any of the schemes, except TRCC when a set of clusters was manually provided. The reason for this behavior is the structure of the pipeline itself: it is a deep structure with lots of concurrency at each level. Clustering the events at each step reduced the depth of the traversal and produced a BDD reduction due to the generation of complete diamonds in one step. More experiments are necessary in order to correlate the efficiency of each scheme with the topology of the system under analysis.

Table 1. Concurrent systems under test.
Name               Type  Variables  Events  Size
RGD-arbiter         C       63        47    5.49046e+13
STARI (16)          C      100       100    4.21776e+22
Slotted ring (10)   P      100       100    8.49079e+12
Slotted ring (15)   P      150       150    4.79344e+19
Slotted ring (20)   P      200       200    2.86471e+26
dme (3)             C      295       492    6579
dme (5)             C      491       820    859996
DME (8)             P      134       128    311296
DME (9)             P      152       144    3.2768e+06
parallelizer (16)   P      130       100    2.82111e+12
Muller (30)         P      120        60    6.009e+07
Muller (40)         P      160        80    4.64139e+10
Muller (50)         P      200       100    3.61071e+13
Muller (60)         P      240       120    8.38369e+15
buf (100)           P      200       101    1.26765e+30
sdl arq deadlock    P      154        92    3954
RGD-arbiter
Method  Steps  Events  Peak   Time (s)
Seq     >38    >1786   >1755  [>14400]
GChain  24     1175    1755   1476
TOK     8      1430    20     30
WTOK    10     1280    20     26
DEC     N/A    1334    50     82
TRCC*   10     55      63     45
TRCC    17     468     1335   1501

STARI (16)
Method  Steps  Events  Peak     Time (s)
Seq     >329   >33000  –        [>8800]
GChain  127    12800   440      2435
TOK     >34    N/A     [>1590]  [>10800]
WTOK    67     10555   698      [8890]
DEC     N/A    8135    572      [7997]
TRCC*   48     833     106      138
TRCC    110    5550    852      [4318]

slotted ring (10)
Method  Steps  Events  Peak  Time (s)
Seq     189    19000   445   4195
GChain  17     1800    68    65
TOK     1      1486    17    16
WTOK    1      1296    20    20
DEC     N/A    2500    307   802
TRCC    12     780     32    15

slotted ring (15)
Method  Steps  Events  Peak  Time (s)
Seq     –      –       –     timeout
GChain  24     3750    391   781
TOK     1      3206    71    248
WTOK    1      2690    220   414
DEC     N/A    5621    893   [7196]
TRCC    18     1710    77    87

slotted ring (20)
Method  Steps  Events  Peak  Time (s)
Seq     –      –       –     timeout
GChain  32     6600    1562  [5296]
TOK     1      5463    191   966
WTOK    1      4474    118   545
DEC     –      –       –     timeout
TRCC    22     2760    311   531

parallelizer (16)
Method  Steps  Events  Peak  Time (s)
Seq     99     10000   70    189
GChain  5      600     20    26
TOK     1      314     20    18
WTOK    1      342     20    30
DEC     N/A    194     20    39
TRCC    3      272     20    18

dme (3)
Method  Steps  Events  Peak  Time (s)
Seq     114    56580   150   289
GChain  46     23124   70    105
TOK     1      2938    87    305
WTOK    1      3235    78    156
DEC     N/A    544     45    103
TRCC    46     11562   77    145

dme (5)
Method  Steps  Events  Peak   Time (s)
Seq     >83    >68060  >1297  [>9900]
GChain  86     71340   977    [4166]
TOK     1      9453    1055   [9865]
WTOK    1      10328   857    [11989]
DEC     N/A    2708    373    2321
TRCC    86     35670   756    3089

DME (8)
Method  Steps  Events  Peak  Time (s)
Seq     40     5248    36    26
GChain  12     1664    20    9
TOK     1      545     26    20
WTOK    1      528     26    19
DEC     N/A    250     20    7
TRCC    12     936     20    11

DME (9)
Method  Steps  Events  Peak  Time (s)
Seq     51     7488    51    82
GChain  15     2304    20    18
TOK     1      690     20    22
WTOK    1      697     25    34
DEC     N/A    392     20    10
TRCC    15     1216    20    30

Muller (30)
Method  Steps  Events  Peak  Time (s)
Seq     140    8460    258   1386
GChain  23     1440    43    32
TOK     1      901     20    16
WTOK    1      774     20    16
DEC     N/A    666     113   98
TRCC    23     720     41    17

Muller (40)
Method  Steps  Events  Peak  Time (s)
Seq     248    19920   1026  [15361]
GChain  29     2400    103   151
TOK     1      1536    30    52
WTOK    1      1305    47    59
DEC     N/A    >211    –     timeout
TRCC    29     1200    103   79

Muller (50)
Method  Steps  Events  Peak  Time (s)
Seq     –      –       –     timeout
GChain  35     3600    219   456
TOK     1      2336    57    155
WTOK    1      1965    57    111
DEC     –      –       –     timeout
TRCC    35     1800    213   246

Muller (60)
Method  Steps  Events  Peak  Time (s)
Seq     –      –       –     timeout
GChain  43     5280    431   907
TOK     1      3185    80    244
WTOK    1      2763    155   320
DEC     –      –       –     timeout
TRCC    43     2640    429   582

buf100
Method  Steps  Events  Peak  Time (s)
Seq     >352   >35552  –     [>8100]
GChain  100    10201   13    7
TOK     1      10202   51    690
WTOK    1      6200    21    155
DEC     N/A    7864    –     [13407]
TRCC    100    5151    334   595

sdl arq deadlock
Method  Steps  Events  Peak  Time (s)
Seq     120    11132   42    35
GChain  40     3772    20    7
TOK     1      1354    15    3
WTOK    1      1242    15    3
DEC     N/A    448     20    8
TRCC    35     1800    22    22
9 Conclusions
This paper proposes four different schemes to speed up reachability analysis on concurrent systems. Their main contribution is to establish different heuristic orderings for the application of the TRs that can substantially reduce the time required to generate the full state space. Although firing order has been studied in state reduction techniques (e.g., partial order [14]), to our knowledge this is the first time this issue is addressed in order to generate all the reachable states of concurrent systems. Experimental evidence has been given that the proposed methods are most of the time faster than a classical BFS approach or even a BFS with chaining. For all benchmarks, the use of the simple greedy chaining (BFS) scheme has proved to be very useful. However, it is important to note that at least one of the proposed schemes always performed better than the latter. It remains an open problem to decide a priori which method is more suitable for a given system. If this cannot be decided in a reasonable amount of time, there is always the possibility of trying all the schemes sequentially or in parallel.
References
1. R. E. Bryant, "Graph-based algorithms for Boolean function manipulation," IEEE Trans. Computers, vol. C-35, pp. 677–691, Aug. 1986.
2. J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and L. J. Hwang, "Symbolic model checking: 10^20 states and beyond," Information and Computation, vol. 98, no. 2, pp. 142–170, 1992.
3. O. Roig, J. Cortadella, and E. Pastor, "Verification of asynchronous circuits by BDD-based model checking of Petri nets," in 16th International Conference on Application and Theory of Petri Nets, pp. 374–391, June 1995.
4. J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and A. Yakovlev, "Petrify: a tool for manipulating concurrent specifications and synthesis of asynchronous controllers," IEICE Transactions on Information and Systems, vol. E80-D, no. 3, pp. 315–325, March 1997.
5. A. S. Miner and G. Ciardo, "Efficient reachability set generation and storage using decision diagrams," in ICATPN, pp. 6–25, 1999.
6. E. Pastor, J. Cortadella, and O. Roig, "Symbolic analysis of bounded Petri nets," IEEE Transactions on Computers, vol. 50, no. 5, pp. 432–448, May 2001.
7. K. Ravi and F. Somenzi, "High-density reachability analysis," in Proc. of the IEEE/ACM International Conference on Computer Aided Design, pp. 154–158, 1995.
8. G. Cabodi, P. Camurati, and S. Quer, "Improving symbolic traversals by means of activity profiles," in Design Automation Conference, pp. 306–311, 1999.
9. A. Hett, C. Scholl, and B. Becker, "State traversal guided by Hamming distance profiles."
10. K. Ravi and F. Somenzi, "Hints to accelerate symbolic traversal," in Conference on Correct Hardware Design and Verification Methods, pp. 250–264, 1999.
11. A. Arnold, Finite Transition Systems. Prentice Hall, 1994.
12. C. Petri, Kommunikation mit Automaten. PhD thesis, Schriften des Institutes für Instrumentelle Mathematik, Bonn, 1962.
13. M. R. Greenstreet and T. Ono-Tesfaye, "A fast, ASP*, RGD arbiter," in Proceedings of the Fifth International Symposium on Advanced Research in Asynchronous Circuits and Systems, (Barcelona, Spain), pp. 173–185, IEEE, Apr. 1999.
14. P. Godefroid, Partial-Order Methods for the Verification of Concurrent Systems: An Approach to the State-Explosion Problem, vol. 1032. New York, NY, USA: Springer-Verlag, 1996.
A Fixpoint Based Encoding for Bounded Model Checking
Alan Frisch¹, Daniel Sheridan¹, and Toby Walsh²
¹ University of York, York, UK, {frisch,djs}@cs.york.ac.uk
² Cork Constraint Computation Centre, University College Cork, Cork, Ireland, [email protected]
Abstract. The Bounded Model Checking approach to the LTL model checking problem, based on an encoding to Boolean satisfiability, has seen a growth in popularity due to recent improvements in SAT technology. The currently available encodings have certain shortcomings, particularly in the size of the clause forms that they generate. We address this by making use of the established correspondence between temporal logic expressions and the fixpoints of predicate transformers as used in symbolic model checking. We demonstrate how an encoding based on fixpoints can result in improved performance in the SAT checker.
1 Introduction
Bounded Model Checking (BMC) [2] is an encoding to Boolean Satisfiability (SAT) of the LTL model checking problem. The encoding is achieved by placing a bound on the number of time steps of the model that are to be checked against the specification. The resulting Boolean formula contains variables representing the state variables of the model at each step along a path, together with constraints requiring the path to be contained within the model and to violate the specification. The result of the SAT checker is thus a path in the model which is a counterexample to the specification, or failure, which means that no such path exists within the bound. The encoding of the LTL specification in BMC is defined recursively on the structure of the formula. While for simple specifications this is sufficient, more complex specifications such as bounded existence and response patterns [7] lead to an exponential blowup in the size of the resulting Boolean formula. Recent improvements to the encoding in NuSMV [4] have not removed this restriction. The fixpoint characterisations of temporal operators [8] have been exploited in other model checking systems such as SMV [14]; we discuss an approach to their use in an encoding of LTL for BMC which produces more compact encodings which can be solved more quickly in the SAT solver.
2 Bounded Model Checking
2.1 Background
A model checking problem is a pair ⟨M, f⟩ of a model and a temporal logic specification. A model M is defined as a Kripke structure ⟨S, R, L, I⟩ where S is a set of states; R ⊆ S × S is the transition relation; L : S → P(AP) is the labelling function, marking each state with the set of atomic propositions (AP) that hold in that state; and I is the set of initial states, which may be equal to S. A path π ∈ M is a sequence of states s0, s1, . . . such that ∀i. (si, si+1) ∈ R. We write π(i) to refer to the ith state along the path. The model checking problem for LTL is to verify that, for an LTL formula f, for all paths πi ∈ M such that πi(0) ∈ I, (M, πi) |= f.
2.2 Path Loops
We say that a path π is a k-loop if, for all i ≥ 0, the (k + i)th state in π is identical to the (l + i)th state for some l, 0 ≤ l < k. If a path is known to be a loop, it is possible to verify the correctness of infinite-time specifications such as always (G) by checking just the first k states in the path.
2.3 Boolean Satisfiability
Boolean satisfiability (SAT) is the problem of assigning Boolean values to variables in a propositional formula in such a way as to make the formula evaluate to true (to satisfy the formula). For example, the formula (a ∨ ¬b) ∧ (b ∨ ¬c) ∧ (¬c ∨ ¬a) can be satisfied by, e.g., the assignment a = 1, b = 1, c = 0. SAT solvers derived from the Davis–Putnam algorithm [5] require input in clause form (CNF): a conjunction of clauses, each of which is a disjunction of literals. A number of high performance SAT solvers are available, making SAT a convenient 'black box' back end for a number of different problems.
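As a toy illustration of the problem itself (not of the solvers used later in the paper), the brute-force check below enumerates all assignments of the example formula and confirms that the assignment quoted above satisfies it.

```python
from itertools import product

def formula(a, b, c):
    # (a or not b) and (b or not c) and (not c or not a)
    return (a or not b) and (b or not c) and (not c or not a)

# Enumerate all eight assignments and keep the satisfying ones.
models = [dict(zip("abc", bits))
          for bits in product([False, True], repeat=3)
          if formula(*bits)]
print(models)   # includes {'a': True, 'b': True, 'c': False}
```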
2.4 The Bounded Model Checking Encoding
The bounded model checking encoding represents k states along a bounded path π_bmc together with a conjunction of constraints requiring π_bmc to be a valid path in M and to be a counterexample of f. The 'valid path' constraint is a propositional encoding of the transition relation. We can see from the bounded semantics of LTL (Fig. 1) that there are two ways of violating each operator in the specification, depending on whether π_bmc is a k-loop; the 'counterexample' constraint is therefore a disjunction of the ways in which the specification may be violated. We write the bounded model checking encoding of a problem with bound k, model M and specification f as [[M, ¬f]]_k.
(M, π) |=^i_k a ⇔ a ∈ L(π(i)), for atomic a
(M, π) |=^i_k ¬f1 ⇔ not (M, π) |=^i_k f1
(M, π) |=^i_k f1 ∧ f2 ⇔ (M, π) |=^i_k f1 and (M, π) |=^i_k f2
(M, π) |=^i_k f1 ∨ f2 ⇔ (M, π) |=^i_k f1 or (M, π) |=^i_k f2
(M, π) |=^i_k X f1 ⇔ (M, π) |=^{i+1}_k f1 if π is a k-loop; (M, π) |=^{i+1}_k f1 ∧ i < k otherwise
(M, π) |=^i_k F f1 ⇔ ∃j, i ≤ j . (M, π) |=^j_k f1 if π is a k-loop; ∃j, i ≤ j ≤ k . (M, π) |=^j_k f1 otherwise
(M, π) |=^i_k G f1 ⇔ ∀j, i ≤ j . (M, π) |=^j_k f1 if π is a k-loop; ⊥ otherwise
(M, π) |=^i_k [f1 U f2] ⇔ ∃j, i ≤ j . (M, π) |=^j_k f2 ∧ ∀n, i ≤ n < j . (M, π) |=^n_k f1 if π is a k-loop; ∃j, i ≤ j ≤ k . (M, π) |=^j_k f2 ∧ ∀n, i ≤ n < j . (M, π) |=^n_k f1 otherwise
(M, π) |=^i_k [f1 R f2] ⇔ ∃j, i ≤ j . (M, π) |=^j_k f1 ∧ ∀n, i ≤ n ≤ j . (M, π) |=^n_k f2 if π is a k-loop; ∃j, i ≤ j ≤ k . (M, π) |=^j_k f1 ∧ ∀n, i ≤ n ≤ j . (M, π) |=^n_k f2 otherwise
Fig. 1. The Bounded Semantics of LTL
Given the functions _l L_k(π), which holds when π is a k-loop with π(k) = π(l), and L_k(π) = ⋁_{l=0}^{k} _l L_k(π), which holds when π is any k-loop, the general translation is defined as¹:

[[M, f]]_k := [[M]]_k ∧ ( (¬L_k(π) ∧ [[f]]^0_k) ∨ ⋁_{l=0}^{k} (_l L_k(π) ∧ _l[[f]]^0_k) )    (1)
where [[M]]_k denotes the encoding of the transition relation of M as a constraint on π with bound k; [[f]]^i_k and _l[[f]]^i_k denote the encoding of the LTL formula f evaluated along path π at time i, where π is a non-looping path and a k-loop to l respectively. These encodings are given in Table 1. Biere et al. show the correctness of some of these encodings in [2]; we will not repeat their proofs here. Theorem 1 in Biere et al. [2] states that bounded model checking of this form is complete provided that the bound k is sufficiently large.
¹ This comes from Definition 15 in [2].
Table 1. The BMC encoding for LTL. For each f, the non-loop encoding [[f]]^i_k and the loop encoding _l[[f]]^i_k are:
G f1:    non-loop: ⊥ ;   loop: ⋀_{j=min(i,l)}^{k} _l[[f1]]^j_k
F f1:    non-loop: ⋁_{j=i}^{k} [[f1]]^j_k ;   loop: ⋁_{j=min(i,l)}^{k} _l[[f1]]^j_k
X f1:    non-loop: i < k ∧ [[f1]]^{i+1}_k ;   loop: (i < k ∧ _l[[f1]]^{i+1}_k) ∨ (i = k ∧ _l[[f1]]^l_k)
f1 U f2: non-loop: ⋁_{j=i}^{k} ([[f2]]^j_k ∧ ⋀_{n=i}^{j−1} [[f1]]^n_k) ;
         loop: ⋁_{j=i}^{k} (_l[[f2]]^j_k ∧ ⋀_{n=i}^{j−1} _l[[f1]]^n_k) ∨ ⋁_{j=l}^{i−1} (_l[[f2]]^j_k ∧ ⋀_{n=i}^{k} _l[[f1]]^n_k ∧ ⋀_{n=l}^{j−1} _l[[f1]]^n_k)
f1 R f2: non-loop: ⋁_{j=i}^{k} ([[f1]]^j_k ∧ ⋀_{n=i}^{j} [[f2]]^n_k) ;
         loop: ⋀_{j=min(i,l)}^{k} _l[[f2]]^j_k ∨ ⋁_{j=i}^{k} (_l[[f1]]^j_k ∧ ⋀_{n=i}^{j} _l[[f2]]^n_k) ∨ ⋁_{j=l}^{i−1} (_l[[f1]]^j_k ∧ ⋀_{n=i}^{k} _l[[f2]]^n_k ∧ ⋀_{n=l}^{j} _l[[f2]]^n_k)
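For intuition, a naive rendering of the non-loop column of Table 1 for atoms and the F and G operators is sketched below; it builds the Boolean structure as nested tuples and shares nothing between subformulas, which is exactly the duplication that the rest of the paper sets out to avoid. The representation and names are our own.

```python
def encode(f, i, k):
    """Non-loop BMC encoding [[f]]^i_k for a tiny LTL fragment (atoms, F, G)."""
    op = f[0]
    if op == "atom":
        return ("var", f[1], i)                      # proposition f[1] at time i
    if op == "F":
        return ("or", [encode(f[1], j, k) for j in range(i, k + 1)])
    if op == "G":
        return ("const", False)                      # G has no witness on a loop-free prefix
    raise ValueError(f"unsupported operator: {op}")

print(encode(("F", ("atom", "p")), 0, 2))
```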
3 Exploiting Fixpoints in BMC
The approach that we have taken to making a fixpoint-based encoding for BMC is based on a clause-style normal form for temporal logic. After converting the specification to this form, we can redefine the encoding to specifically take advantage of the properties of the normal form.
3.1 The Separated Normal Form
Gabbay's Separation Theorem [11] states that arbitrary temporal formulæ may be written in the form G(⋀_i (P_i ⇒ F_i)) where the P_i are (strict) past time formulæ and the F_i are (non-strict) future time formulæ. Fisher [9] defines a normal form for temporal logic based on the Separation Theorem and gives a series of transformations for reaching it. The general form of SNF is the same as the separation theorem; the implications P_i ⇒ F_i are referred to as rules. Since neither LTL nor CTL have explicit past-time operators, Bolotov and Fisher [3] define the start operator, which holds only at the beginning of time:

(M, π) |=^i_k start ⇔ π(i) ∈ I

The possible rules are thus

start ⇒ ⋁_j l_j          (an initial rule)
⋀_i l_i ⇒ X ⋁_j l_j      (a global X-rule)
⋀_i l_i ⇒ F ⋁_j l_j      (a global F-rule)
where the l_i and l_j are literals. The transformation functions T(Ψ) recursively convert a set of rules which do not conform to the normal form into a set of rules which do. To convert any temporal logic formula f to SNF, it is sufficient to apply the transformation rules to the singleton set {start ⇒ f}. For brevity, we do not list the full set of transformations here; in general they are trivially adapted from those in [3], or from standard propositional logic.
T_G({P ⇒ G f} ∪ Ψ) = { P ⇒ f ∧ x,  x ⇒ X(f ∧ x) } ∪ T_G(Ψ)

T_U({P ⇒ f U g} ∪ Ψ) = { P ⇒ g ∨ (f ∧ x),  x ⇒ X(g ∨ (f ∧ x)),  P ⇒ F g } ∪ T_U(Ψ)

T_ren1({P ⇒ G f(F g)} ∪ Ψ) = { P ⇒ G f(x),  x ⇒ F g } ∪ T_ren1(Ψ)
In each of the above transformations a new variable x is introduced: the conversion to SNF introduces one variable for each removed operator (in the first two transformations above) in addition to the renaming variables used to flatten the structure of the formula (in the last transformation above). The transformations to rules are based on the fixpoint characterisations of the LTL operators. All LTL operators can be represented as the fixpoint of a recursive function [8]; the transformations encode the corresponding function as a rule which is required to hold in all states. Only those operators characterised by greatest fixpoints are converted (always (G) and weak until (W); until (U) is first converted to weak until and sometime for its transformation), which means that the sometime operator remains unchanged. By Tarski's fixpoint theorem [18] we know that a finite number of iterations of a rule is sufficient to find its fixpoint. Thus the instance of the introduced variable at time i holds iff the original operator held at time i. For a formal proof of the correctness of the transformations, see [10].
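A small sketch of how this renaming can be carried out mechanically on a formula tree is given below; it implements only the T_G and T_U cases shown above, represents formulas as nested tuples, and draws fresh renaming variables from a counter. It is an illustrative rendering under those assumptions, not the transformation code used by the authors.

```python
import itertools

_fresh = (f"x{n}" for n in itertools.count(1))

def to_rules(p, f, rules):
    """Turn the proto-rule p => f into SNF-style rules, renaming G and U."""
    op = f[0] if isinstance(f, tuple) else None
    if op == "G":                       # greatest fixpoint: G g
        x, g = next(_fresh), f[1]
        rules.append((p, ("and", g, x)))
        rules.append((x, ("X", ("and", g, x))))
    elif op == "U":                     # f1 U f2, via its fixpoint plus F f2
        x, f1, f2 = next(_fresh), f[1], f[2]
        rules.append((p, ("or", f2, ("and", f1, x))))
        rules.append((x, ("X", ("or", f2, ("and", f1, x)))))
        rules.append((p, ("F", f2)))
    else:                               # anything else is kept as a rule unchanged
        rules.append((p, f))
    return rules

print(to_rules("start", ("G", ("F", "f")), []))
```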
3.2 Bounded SNF
Although the fixpoint characterisations are given for unbounded temporal logic, they are preserved for most of bounded LTL since we have bounded semantics for next-state (X). We note that the characterisation of always is valid if and only if the path is a k-loop; we encapsulate this constraint in the new operator next-loop-state (Xl) with semantics

(M, π) |=^i_k Xl f1 ⇔ (M, π(i+1)) |=_k f1 if π is a k-loop; ⊥ otherwise

and modify the transformation accordingly. The bounded semantics of always also fails to capture the concept of rules holding in all reachable states. We give the semantics for a modified operator bounded always (Gk) for bounded LTL without the restriction to paths with loops:

(M, π) |=^i_k Gk f1 ⇔ ∀j, i ≤ j . (M, π(j)) |=_k f1 if π is a k-loop; ∀j, i ≤ j < k . (M, π(j)) |=_k f1 otherwise
The correctness of the transformations relies on a sufficient number of instances of the rules occurring. In BMC, this means that the transformations based on fixpoints are correct only when the bound is sufficiently large. It is easy to see, by appealing to the semantics, that the failure mode with an insufficiently large bound is the same as that for the original encoding: no counterexample is found. Introducing this operator allows us to restate the general form as

Gk(⋀_i (P_i ⇒ F_i))
The rules P_i ⇒ F_i are now of the following form:

start ⇒ ⋁_j l_j           (an initial rule)
⋀_i l_i ⇒ X ⋁_j l_j       (a global X-rule)
⋀_i l_i ⇒ Xl ⋁_j l_j      (a global Xl-rule)
⋀_i l_i ⇒ F ⋁_j l_j       (a global F-rule)
with the transformation for the always operator being amended to

T_G({P ⇒ G f} ∪ Ψ) = { P ⇒ f ∧ x,  x ⇒ Xl(f ∧ x) } ∪ T_G(Ψ)
The correctness of bounded SNF is covered in [16].
3.3 Encoding Bounded SNF
The distributivity of bounded always follows directly from its semantics; because of the unusual semantics of start, this means that any LTL formula may be represented as a conjunction of instances of the following 'universal' rules:

start ⇒ ⋁_j l_j
Gk(⋀_i l_i ⇒ X ⋁_j l_j)
Gk(⋀_i l_i ⇒ Xl ⋁_j l_j)
Gk(⋀_i l_i ⇒ F ⋁_j l_j)
Although it is simple to encode these rules using the standard BMC encodings in Table 1, we can take advantage of the limited nesting depth characteristic of these normal forms to define a more efficient encoding, in the same way as for the depth 1 case in [4] and [17]. We give the more efficient encodings in Table 2. Note that although we make use of the BMC encodings, they are only used for purely propositional formulæ. No further proof of these encodings is required: they are trivial simplifications of those proved in [2].
Table 2. The BMC encoding for SNF-LTL. For each rule f, the non-loop encoding [[f]]^0_k and the loop encoding _l[[f]]^0_k are:
start ⇒ f1:      non-loop: [[f1]]^0_k ;   loop: _l[[f1]]^0_k
Gk(f1 ⇒ Xl f2):  non-loop: ⊥ ;            loop: ⋀_{n=0}^{k} (_l[[f1]]^n_k ⇒ _l[[f2]]^{n+1}_k)
Gk(f1 ⇒ X f2):   non-loop: ⋀_{n=0}^{k−1} ([[f1]]^n_k ⇒ [[f2]]^{n+1}_k) ;   loop: ⋀_{n=0}^{k} (_l[[f1]]^n_k ⇒ _l[[f2]]^{n+1}_k)
Gk(f1 ⇒ F f2):   non-loop: ⋀_{n=0}^{k} ([[f1]]^n_k ⇒ ⋁_{m=n}^{k} [[f2]]^m_k) ;   loop: ⋀_{n=0}^{k} (_l[[f1]]^n_k ⇒ ⋁_{m=min(n,l)}^{k} _l[[f2]]^m_k)
For propositional f, [[f]]^i_k ≡ _l[[f]]^i_k, so we can deduce from Table 2 that this relationship also holds for many cases where f is a rule. Under these circumstances, we can factorise the encodings for f out of the disjunctions in Equation 1, either explicitly during the encoding or by processing the resulting propositional formula. Often the checks for the looping nature of π will cancel each other out entirely, further simplifying the encoding. While this type of optimisation can be made with the standard BMC encoding, it only occurs where operators are not nested; the renaming effect of SNF simplifies this optimisation and makes it more widely applicable.
3.4 The Fixpoint Normal Form
We noted in Section 3.1 that SNF converts only the greatest fixpoint operators, leaving rules containing the sometime operator; we see from Table 2 that these rules are the pathological case for this encoding. Converting the sometime operator in the same way requires care. A transformation based directly on the fixpoint characterisation would be
T_F({P ⇒ F f} ∪ Ψ) = { P ⇒ f ∨ x,  x ⇒ X(f ∨ x) } ∪ T_F(Ψ)

The problem stems from the disjunction in the second rule. Since we are trying to show satisfiability, it is simple to satisfy each occurrence of the rule by setting the right hand disjunct to true for all time: the rule can always be satisfied. Since we are interested only in the bounded semantics of the operator, it is possible to break this chain at the bound by introducing an extra operator:

(M, π) |=^i_k bound ⇔ i ≥ k

The transformation is now

T_F({P ⇒ F f} ∪ Ψ) = { P ⇒ f ∨ x,  x ⇒ X(f ∨ x),  bound ⇒ f ∨ ¬x } ∪ T_F(Ψ)
3.5 Correctness of the Fixpoint Normal Form Transformation
We take the outline of the proof from [10]. For a transformation T to preserve the semantics of an arbitrary formula f , we require that
for all models M and for all LTL formulæ f, (M, π) |=_k f iff there exists an M′ such that M ∼_x M′ and (M′, π) |=_k T(f),

where x is a new propositional variable introduced, and M ∼_x M′ if and only if M′ differs from M in at most the valuation given to x. We express this in temporal logic with quantification over propositions (QPTL)² as |=_QPTL f ⇔ ∃x.T(f). The proof is given for the case that the rule set is a singleton set, since for all transformations T is independent of Ψ. The proofs may easily be extended to non-empty Ψ.

Lemma 1. For sufficiently large k, (M, π) |=_k F f1 if and only if (M′, π) |=_k (x ∨ f1) and (M′, π) |=_k Gk(x ⇔ X(x ∨ f1)), where M ∼_x M′.

Proof. Consider the fixpoint expression τ(Z) = f1 ∨ X Z. We introduce the variable x such that for all n, (M′, π) |=^n_k x ⇔ (M′, π) |=^n_k X τ^{k−n}(true). By substituting the definition of x and by one substitution of the definition of τ, we have (M′, π) |=^n_k x ⇔ (M′, π) |=^n_k X(f1 ∨ x), and by reference to the semantics, (M′, π) |=_k Gk(x ⇔ X(x ∨ f1)). From the least fixpoint characterisation [8], (M′, π) |=_k x ⇔ F f1, and by unrolling τ by one step and substituting the definition of x, we get (M′, π) |=_k f1 ∨ x.

Theorem 1. For any rule A, |=_QPTL A ⇔ ∃x.T_F(A).

Proof. Proving each direction independently:
– |=_QPTL A ⇒ ∃x.T_F(A). Substituting Lemma 1,
Gk(P ⇒ F B) ⇒ ∃x. Gk(x ⇔ X(x ∨ B)) ∧ Gk(bound ⇒ ¬x) ∧ Gk(P ⇒ (x ∨ B))
⇒ ∃x. Gk((x ⇔ X(x ∨ B)) ∧ (bound ⇒ ¬x) ∧ (P ⇒ (x ∨ B)))
which implies the set of rules {x ⇒ X(x ∨ B), bound ⇒ ¬x, P ⇒ x ∨ B}.
– |=_QPTL ∃x.T_F(f) ⇒ f. Starting with the transformed set of rules {x ⇒ X(x ∨ B), bound ⇒ ¬x, P ⇒ x ∨ B}, and exploiting the corollary of Lemma 1 that (M′, s_i) |=_k (x ∨ f1) ⇔ (M′, s_i) |=_k F f1 iff (M′, s_i) |= Gk(bound ⇒ ¬x),
Gk((x ⇔ X(x ∨ B)) ∧ (bound ⇒ ¬x) ∧ (P ⇒ (x ∨ B)))
⇔ Gk(x ⇔ X(x ∨ B)) ∧ Gk(bound ⇒ ¬x) ∧ Gk(P ⇒ (x ∨ B))
⇔ Gk(x ⇔ X F B) ∧ Gk(bound ⇒ ¬x) ∧ Gk(P ⇒ (x ∨ B))
⇒ Gk((x ⇒ X F B) ∧ (P ⇒ (x ∨ B)))
⇒ Gk(P ⇒ ((X F B) ∨ B))
⇒ Gk(P ⇒ F B)
That is, the singleton rule set {P ⇒ F B}.
² See [19] for full details; briefly, (M, i) |= ∃p.A iff there exists an M′ such that (M′, i) |= A, and M and M′ differ at most in the valuation given to p.
4 Comparisons
We compare the encodings on an example specification G F f. This is a reachability specification, with many applications. Before encoding, the specification is negated to

F G ¬f    (2)
We consider only the loop encoding, as the non-loop encoding is ⊥ for all methods due to the semantics of always. The original, recursive encoding decomposes in two steps. In the loop case,

_l[[F G ¬f, π]]^0_k = ⋁_{i=0}^{k} _l[[G ¬f, π]]^i_k = ⋁_{i=0}^{k} ⋀_{j=min(i,l)}^{k} ¬f(j)

This is a disjunction of conjunctions: the pathological case for conversion to clause form. It is possible to define a more efficient encoding using renamed subformulæ [4], but this approach is difficult to generalise. The size of the formula is O(k²), hence the cost to build it before CNF conversion is quadratic.
The conversion to SNF yields the following rules³:

start ⇒ F x1
x1 ⇒ ¬f ∧ x2
x2 ⇒ Xl(¬f ∧ x2)

which encode to the three conjuncts

⋁_{i=0}^{k} x1(i)   ∧   ⋀_{i=0}^{k} (x1(i) ⇒ ¬f(i) ∧ x2(i))   ∧   ⋀_{i=0}^{k} (x2(i) ⇒ ¬f(i+1) ∧ x2(i+1))

We have two introduced variables: the first establishes a renaming of the G ¬f subformula, and the second renames each successive step of this subformula. This means that steps are shared between references from the F operator, leading to a simplification of the problem which is easier to solve as well as being smaller. The added complexity of the introduced variables is balanced by the ability to reuse subformulæ many times. The encoding corresponds to an ideal renaming of the formula above, but the conversion is performed in linear time, and results in a formula of size O(k). Furthermore, we can show in advance that the encoding of each rule used here is invariant with respect to l, which means that the subformulæ can be factorised out of the disjunction of loops seen in Equation 1.
³ Further reduction of the second and third rules is necessary for correct SNF; we disregard this as it makes no difference to the final encoding.
Finally, we examine the fixpoint normal form conversion. The set of rules corresponding to the specification is

start ⇒ x0 ∨ x1
x0 ⇒ X(x0 ∨ x1)
bound ⇒ x1 ∨ ¬x0
x1 ⇒ ¬f ∧ x2
x2 ⇒ Xl(¬f ∧ x2)

which encode to the conjuncts

(x0(0) ∨ x1(0))   ∧   ⋀_{i=0}^{k} (x0(i) ⇒ x0(i+1) ∨ x1(i+1))   ∧   (x1(k) ∨ ¬x0(k))   ∧   ⋀_{i=0}^{k} (x1(i) ⇒ ¬f(i) ∧ x2(i))   ∧   ⋀_{i=0}^{k} (x2(i) ⇒ ¬f(i+1) ∧ x2(i+1))
The main difference between the SNF encoding and the fixpoint normal form encoding is the omission of the long disjunction in the first conjunct which would be encoded as a single long clause. This is replaced by an array of conjunctions which rename each step in much the same way as for the G operator. Although in this case the advantage is dependent on the SAT checker, it is clear that where the F operator is nested, similar advantages would be seen as for SNF with the G operator.
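To see the shape of the SNF encoding of this example, the sketch below emits the three conjuncts for a given bound k as strings of step-indexed propositional constraints. It is our own illustration of the comparison above, not NuSMV's implementation, and the wrap-around of Xl at the loop point is deliberately omitted for brevity.

```python
def snf_encoding_gf(k):
    """Conjuncts for the SNF rules of F G(not f):
    start => F x1,  x1 => not f & x2,  x2 => Xl(not f & x2)."""
    conjuncts = []
    # start => F x1: x1 must hold somewhere along the bounded path.
    conjuncts.append(" | ".join(f"x1({i})" for i in range(k + 1)))
    # x1 => not f & x2, instantiated at every step.
    for i in range(k + 1):
        conjuncts.append(f"x1({i}) -> (!f({i}) & x2({i}))")
    # x2 => Xl(not f & x2); the final step, which would wrap to the loop point, is left out.
    for i in range(k):
        conjuncts.append(f"x2({i}) -> (!f({i + 1}) & x2({i + 1}))")
    return conjuncts

for c in snf_encoding_gf(3):
    print(c)
```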
5 Results
We compare the SNF and Fixpoint encodings with the encoding used in NuSMV version 2.0.2; this version of NuSMV includes several of the optimisations discussed in [4]. For consistency, we have implemented the SNF and Fixpoint encodings as options in NuSMV. All of the experiments have been done using the SAT solver zChaff [15] on a 700MHz AMD Athlon with 256Mb main memory, running Linux.
Fig. 2. Number of clauses generated by a shift register model
5.1 Scalability
We observe the difference in the behaviour of the encodings with increasing problem size by choosing a simple problem that is easy to scale. The benchmark circuits have been kept deliberately simple as it is the encoding of the specification, not the model, that differentiates the encodings. A shift register is a storage device of length n which, at each time step, moves the values stored in each of its elements to the next element, reading in a new value to fill the now empty first element. That is, storage elements x0 . . . xn−1 and input in are transformed such that ∀i, 0 < i < n · (xi ← xi−1) and x0 ← in. The specification that the shift register must fulfil will depend on its application; we explore a number of response patterns taken from [6]. The specifications depend on the number of elements in the shift register, referring to points at the end and middle of the register. For example, in the case of a three element register:
– Global response (depth 2) — x2 goes high in response to in: G(in ⇒ F x2)
– After response (depth 3) — x2 goes high in response to in, after x1 has gone high: G(x1 ⇒ G(in ⇒ F x2))
– Before response (depth 3) — x1 goes high in response to in, before x2 has gone high (this property is only true if all the registers are zero, so we test for empty ≡ ¬x0 ∧ ¬x1 ∧ ¬x2 too): [((in ∧ empty) ⇒ [¬x2 U (x1 ∧ ¬x2)]) U (x2 ∨ G x2)]
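For intuition, one synchronous step of this scalable benchmark can be written as follows (our own sketch of the model described above, not the benchmark's SMV source):

# Illustrative sketch only: one step of a length-n shift register.
def shift_step(x, new_in):
    # every element takes the previous value of its left neighbour;
    # the first element reads the new input
    return [new_in] + x[:-1]

state = [0, 0, 0]            # a three-element register, initially empty
for bit in (1, 0, 1):
    state = shift_step(state, bit)
print(state)                 # [1, 0, 1]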
Number of Clauses. We see in Figure 2 that the number of clauses produced by both SNF and Fixpoint grows, in general, less quickly than the number produced by NuSMV, as the length of the register increases. The differing gradients follow the behaviour predicted by the differing depths of the specifications: the slopes become shallower with increasing depth indicating an exponential improvement in the number of clauses.
[Figure 3 consists of three log–log scatter plots — SNF vs. NuSMV, Fixpoint vs. NuSMV, and Fixpoint vs. SNF — of the zChaff runtimes (roughly 10⁻² to 10¹ seconds, against 10⁻² to 10⁴ seconds for NuSMV) for the Global, After, and Before response specifications.]
Fig. 3. Time taken by zChaff for a shift register model
The advantage of the Fixpoint encoding over SNF is dependent upon the number of occurrences of the always operator in the specification, since this is the only difference between the encodings. We see the greatest advantage for Fixpoint in the after response and before response specifications, with two occurrences of the always operator; the first operator in the after response specification has a smaller encoding than the second as one of the corresponding rules is an initial rule. We can conclude that, as far as the number of clauses is concerned, the Fixpoint encoding outperforms SNF and NuSMV in the way that is expected: size and rate of size increase decreasing with the nesting depth and the occurrence of least fixpoint operators.
zChaff timings. Counting the number of clauses is far from being an effective method of determining the efficiency of an encoding. We also look at one of the current state-of-the-art SAT solvers, zChaff [15]. The behaviour is far less clear than for the number of clauses; zChaff is a complex system. Broadly, the SNF and Fixpoint encodings always result in a shorter runtime than the NuSMV encoding; the Fixpoint encoding outperforms the SNF encoding only for the after response specification (for the global response specification, the trend is towards an improvement for larger problems). We see a clear exponential improvement for certain specifications: the timings for Before with SNF and Fixpoint grow exponentially slower than NuSMV; the global response specification shows the same trend less dramatically. We only see an exponential improvement for the after response specification with the Fixpoint encoding: with the SNF encoding, the trend appears to be towards NuSMV being faster.
5.2 Distributed Mutual Exclusion
The distributed mutual exclusion circuit from [13] forms a good basis for comparing the performance of different encodings as it meaningfully implements several specifications. We look at three here, applied to a DME of four elements:
– Accessibility: if an element wishes to enter the critical region, it eventually will. We check the accessibility of the first two elements. This specification is correct, so as in [2], we check at a chosen bound to illustrate the timing differences. G(request(0) → F enter(0)) ∧ G(request(1) → F enter(1))
– Precedence given token possession: the mutual exclusion property is enforced by a token passing mechanism; if an element of the DME holds the token, then its requests to enter the critical region are given precedence. We check the converse: if the first element holds the token, the second does not have precedence and vice versa. Since the token begins at the first element, this is the quicker to prove, with a bound of 14. For the second element, a bound of 54 is required to find the counterexample. G((request(0) ∧ request(1) ∧ token(0)) → [¬enter(0) U enter(1)])
– Bounded overtaking given token possession: if two elements wish to enter the critical region, then the higher priority may enter a given number of times before the other. We check bounded overtaking of one and two entrances. Both specifications are correct so, as above, we check at a bound of 40. These specifications are the most complex, including up to four nested until operators. For one entrance: G((request(0) ∧ request(1) ∧ token(0)) → [(¬enter(0) ∧ ¬enter(1)) U (enter(0) ∧ X(enter(0) U [(¬enter(0) ∧ ¬enter(1)) U enter(1)]))])
The results are summarised in Table 3 together with the timings for CMU SMV on CTL representations of the same problems⁴. For the bounded overtaking problems, we note that NuSMV took nearly 10 minutes to generate the formula in the first case, and after 25 minutes had not completed in the second case. In contrast, the time taken to perform the SNF and Fixpoint encodings was insignificant. While both the SNF and Fixpoint encodings outperform the NuSMV encoding and SMV, we do not see a consistent advantage to either. The results for accessibility suggest that Fixpoint scales better with increasing bound, while the results for bounded overtaking suggest that SNF scales better with increasing specification depth.
5.3 Texas-97 Benchmarks
We examine a number of model checking benchmarks from the Texas-97 benchmark suite [1]. These benchmarks have been converted from the Blif-mv representation to SMV format by a locally modified version of the VIS model checker [12].
⁴ We note that for SMV to terminate in a reasonable time on these problems, it must be started with the -inc switch. No similar knowledge of model checker behaviour is needed for BMC.
Table 3. Timing results in zChaff for the distributed mutual exclusion circuit

Specification       Bound  NuSMV encoding  SNF encoding  Fixpoint encoding    SMV
Accessibility          30            2.65          0.33               0.36  13.13
Accessibility          40           20.93          4.84               4.33  13.13
Priority for 0         14            0.13          0.02               0.02  12.97
Priority for 1         54           14.93          0.44               0.76  15.00
Overtaking depth 1     40           85.73          2.15               1.11  13.96
Overtaking depth 2     40               *          4.92               5.15  14.14
Table 4. Timing results in zChaff for the MSI cache coherence protocol

Processors  Specification  Bound  NuSMV    SNF  Fixpoint
         2  Request A         10   4.40   1.73      1.53
         2  Request A         20  19.40   5.82      9.97
         2  Request B         10   2.65   3.63      2.69
         2  Request B         20  49.78   8.63     16.42
         3  Request A         10  13.00   3.03      2.50
         3  Request A         20  39.22   8.2       5.79
         3  Request B         10   4.60   6.66      5.93
         3  Request B         20  54.94  62.11     40.25
         3  Request C         10   4.58   6.64      5.91
         3  Request C         20  44.8   50.27     37.65
We run these benchmarks at fixed bounds and report the time spent by zChaff.
MSI Cache Coherence Protocol. This is an implementation of a Modified Shared Invalid cache coherence protocol between two or three processors. We examine two of the specifications of behaviour from the benchmark. The results are summarised in Table 4.
– Whenever processor A requests the bus, it gets the bus in the next clock cycle. Listed as “Request A” in the results. G(bus reqA → X bus ackA)
– Whenever processor B (or C) requests the bus, it gets the bus only when Processor A did not request the bus. Listed as “Request B” or “Request C” in the results. G(bus reqB → F bus ackB)
Instruction Fetch Control Module. This is a model of the instruction fetch control module of the experimental torch microprocessor developed at Stanford University. Three models are examined; from the text accompanying the benchmark set:
– IFetchControl1: The original instruction module with several assumptions on the environmental signal.
– IFetchControl2: As IFetchControl1 except that the memory stall signal is always low.
Table 5. Timing results in zChaff for the Instruction Fetch Control Module

Model           Specification  Bound  NuSMV   SNF  Fixpoint
IFetchControl1  Delay             10   0.94  0.45      0.44
IFetchControl2  Delay             10   0.99  0.40      0.40
IFetchControl3  Delay             10   1.29  0.39      0.50
IFetchControl1  Refetch           10   3.69  0.91      0.82
IFetchControl2  Refetch           10   3.30  0.89      0.81
IFetchControl3  Refetch           10   3.74  1.49      1.88
IFetchControl1  WriteCache        10   3.58  1.68      2.47
IFetchControl2  WriteCache        10   2.67  1.65      1.78
IFetchControl3  WriteCache        10   2.78  2.24      1.40
– IFetchControl3: As IFetchControl1 except that the instruction cache line is assumed to be always valid.
We examine three specifications from the benchmark. The results are summarised in Table 5.
– The delayed version of a signal should, in the next state, have the signal’s previous value. Listed as “Delay” in the results. G(IStall s1 → X IStall s2)
– As above, for the Refetch state. Listed as “Refetch” in the results. G((PresState s1 = REFETCH) → X(PrevState s2 = REFETCH))
– WriteCache s2 becomes one in some paths before WriteTag s2 becomes one. Listed as “WriteCache” in the results. ¬[¬WriteTag s2 U (WriteCache s2 ∧ ¬WriteTag s2)]
Pentium Pro Split-Transaction Bus. This is a model of the Modified Exclusive Shared Invalid cache coherence protocol used by the Intel Pentium Pro processor for SMP. We examine a number of different combinations of opcodes running on the processors, with the memory address of the transaction being nondeterministically 0 or 1. We examine three specifications from the benchmark. The results are summarised in Table 6.
– Correctness of the bus transaction IOQ. Listed as “IOQ” in the results. G(¬((processor0.fifo = REQUEST) ∧ (processor1.fifo = REQUEST)))
– Liveness of processor 0 (part 1). Listed as “Live 1” in the results. G((processor0.stage = FETCH) → F(processor0.stage = EXECUTE))
– Liveness of processor 0 (part 2). Listed as “Live 2” in the results. G((processor0.stage = EXECUTE) → F(processor0.stage = FETCH))
Summary. While we can see that the SNF and Fixpoint encodings outperform the NuSMV encoding in many cases, the gains are typically less dramatic than were seen for the mutual exclusion circuit. The models are encoded in the same
Table 6. Timing results in zChaff for the Pentium Pro Split-Transaction Bus

Opcode 0    Opcode 1    Specification    NuSMV     SNF  Fixpoint
Load2Store  Load2Store  IOQ             949.53  202.83    202.90
Load2Store  Store       IOQ             753.26  156.43    156.24
Load2Store  Load2Store  Live 1          923.12  176.64    169.97
Load2Store  Load        Live 1          745.94  131.61    145.88
Load2Store  Store       Live 1         1111.63  175.58    199.19
Load2Store  Load2Store  Live 2          919.61  167.23    160.38
Load2Store  Load        Live 2          883.51  134.52    155.23
Load2Store  Store       Live 2          738.74  128.96    143.04
way regardless of the encoding used for the specification; these benchmarks are very large circuits with several thousand variables, so it is reasonable to suppose that the performance gains due to the new encodings are mitigated by the time taken to process the model. The specifications used in these benchmarks are much simpler than those used to test the DME: typically of the form G(a → X b) or G(a → F b). This suggests again that the advantage of the SNF and Fixpoint encodings is dependent on the nesting depth of the specification.
6 Conclusions
We have described two new encoding schemes for bounded model checking which build on the existing encodings and use the fixpoint characterisations of LTL. The first is a novel application of the Separated Normal Form, while the second extends SNF by the introduction of a transformation for the eventually operator. We have shown that these new encodings are correct, provided that the original bounded model checking encoding is correct. We have demonstrated a reduction in the number of clauses generated by the problem which is exponential in the size of the problem instance, for both encodings, and also that the improvement in performance in the SAT checker can be exponential in the size of the problem instance, depending on the specification. We have demonstrated a clear performance advantage to these encodings over the NuSMV bounded model checking implementation in several real-world examples, and we have demonstrated the advantage that these encodings give BMC over conventional symbolic model checkers.
References
1. Adnan Aziz et al. Examples of HW verification using VIS, 1997. http://vlsi.colorado.edu/~vis/texas-97/
2. Armin Biere, Alessandro Cimatti, Edmund Clarke, and Yunshan Zhu. Symbolic model checking without BDDs. In W.R. Cleaveland, editor, Tools and Algorithms for the Construction and Analysis of Systems, 5th International Conference, TACAS'99, volume 1579 of Lecture Notes in Computer Science, pages 193–207. Springer-Verlag Inc., July 1999.
3. Alexander Bolotov and Michael Fisher. A resolution method for CTL branching-time temporal logic. In Proceedings of the Fourth International Workshop on Temporal Representation and Reasoning (TIME). IEEE Press, 1997.
4. Alessandro Cimatti, Marco Pistore, Marco Roveri, and Roberto Sebastiani. Improving the encoding of LTL model checking into SAT. In Agostino Cortesi, editor, Third International Workshop on Verification, Model Checking and Abstract Interpretation, volume 2294 of Lecture Notes in Computer Science. Springer-Verlag Inc., January 2002.
5. Martin Davis and Hilary Putnam. A computing procedure for quantification theory. Journal of the ACM, 7:201–215, 1960.
6. M.B. Dwyer, G.S. Avrunin, and J.C. Corbett. Property Specification Patterns for Finite-State Verification. In M. Ardis, editor, 2nd Workshop on Formal Methods in Software Practice, pages 7–15, March 1998.
7. M.B. Dwyer, G.S. Avrunin, and J.C. Corbett. Patterns in property specifications for finite-state verification. In 21st International Conference on Software Engineering, Los Angeles, California, May 1999.
8. E. Allen Emerson and Edmund M. Clarke. Characterizing correctness properties of parallel programs using fixpoints. In J. W. de Bakker and Jan van Leeuwen, editors, Automata, Languages and Programming, 7th Colloquium, volume 85 of Lecture Notes in Computer Science, pages 169–181. Springer-Verlag Inc., 1980.
9. Michael Fisher. A resolution method for temporal logic. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence (IJCAI). Morgan Kaufmann, August 1991.
10. Michael Fisher and Philippe Noël. Transformation and synthesis in MetateM — Part I: Propositional MetateM. Technical Report UMCS-92-2-1, Department of Computer Science, University of Manchester, Manchester M13 9PL, England, February 1992.
11. Dov Gabbay. The declarative past and imperative future. In H. Barringer, editor, Proceedings of the Colloquium on Temporal Logic and Specifications, volume 398 of Lecture Notes in Computer Science, pages 409–448. Springer-Verlag, 1989.
12. The VIS Group. VIS: A system for verification and synthesis. In R. Alur and T. Henzinger, editors, Proceedings of the 8th International Conference on Computer Aided Verification, volume 1102 of Lecture Notes in Computer Science, pages 428–432, New Brunswick, NJ, July 1996. Springer.
13. A. J. Martin. The design of a self-timed circuit for distributed mutual exclusion. In Henry Fuchs, editor, Proceedings of the 1985 Chapel Hill Conference on VLSI, pages 245–260. Computer Science Press, 1985.
14. K. L. McMillan. Symbolic Model Checking: An Approach to the State Explosion Problem. PhD thesis, Carnegie Mellon University, 1992.
15. M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik. Chaff: Engineering an efficient SAT solver. In 39th Design Automation Conference, Las Vegas, June 2001.
16. Daniel Sheridan. Using fixpoint characterisations of LTL for bounded model checking. Technical Report APES-41-2002, APES Research Group, January 2002. Available from http://www.dcs.st-and.ac.uk/~apes/apesreports.html
17. Daniel Sheridan and Toby Walsh. Clause forms generated by bounded model checking. In Andrei Voronkov, editor, Eighth Workshop on Automated Reasoning, 2001.
18. A. Tarski. A lattice-theoretical fixpoint theorem and its applications. Pacific Journal of Mathematics, 5:285–309, 1955.
19. Pierre Wolper. Specification and synthesis of communicating processes using an extended temporal logic. In Proceedings of the 9th Symposium on Principles of Programming Languages, pages 20–33, Albuquerque, January 1982.
Using Edge-Valued Decision Diagrams for Symbolic Generation of Shortest Paths

Gianfranco Ciardo and Radu Siminiceanu

College of William and Mary, Williamsburg, Virginia 23187
{ciardo,radu}@cs.wm.edu
Abstract. We present a new method for the symbolic construction of shortest paths in reachability graphs. Our algorithm relies on a variant of edge–valued decision diagrams that supports efficient fixed–point iterations for the joint computation of both the reachable states and their distance from the initial states. Once the distance function is known, a shortest path from an initial state to a state satisfying a given condition can be easily obtained. Using a few representative examples, we show how our algorithm is vastly superior, in terms of both memory and time, to alternative approaches that compute the same information, such as ordinary or algebraic decision diagrams.
1 Introduction
Model checking [13] is an exhaustive, fully automated approach to formal verification. Its ability to provide counterexamples or witnesses for the properties that are checked makes it increasingly popular. In many cases, however, this feature is the most time– and space–consuming stage of the entire verification process. For example, [15] shows how to construct traces for queries expressed in the temporal logic CTL [11] under fairness constraints. Another direction is taken in SAT–based model checking, where satisfiability checkers are used to find shortest–length counterexamples (as is the case of the bounded model checking technique [4]), conduct the entire reachability analysis [1], or combine the state–space exploration method with SAT solvers [24]. Since a trace is usually meant to be examined by a human, it is particularly desirable for a model–checking tool to compute a minimal–length trace. Unfortunately, finding such a trace is an NP-complete problem [17], thus a sub–optimal trace is sought in most cases. For some operators, finding minimal–length witnesses is instead easy in principle. An example is the EF operator, which is closely related to the (backward) reachability relation: a state satisfies EF p if there is an execution path from it to a state where property p holds. Even using symbolic encodings [7], though, the generation and storage of the sets of states required to generate an EF witness can be a major limitation in practice.
Work supported in part by the National Aeronautics and Space Administration under NASA Grants NAG-1-2168 and NAG-1-02095.
Our goal is then to adapt a very fast and memory–efficient state–space generation algorithm we recently developed [10] and endow the symbolic data structure with information that captures the minimum distance of each state from any of the initial states. Knowledge of this distance significantly simplifies the generation of shortest–length EF witnesses. To encode this information, we employ a variant of the edge–valued decision diagrams [21], appropriately generalized so that it is applicable to our fast state–space generation strategy. We show that the new variant we define is still canonical, and emphasize the importance of using edge–values, which give us increased flexibility when performing guided fixed–point iterations. The paper is organized as follows. Section 2 defines basic concepts in discrete– state systems, ordinary and edge–valued decision diagrams, state–space generation, and traces, and formulates the one–to–many shortest path problem. Section 3 introduces our extensions to edge–valued decision diagrams, including a different type of canonical form, EV+MDDs. Section 4 discusses the efficient manipulation of EV+MDDs and our algorithm for constructing the distance function. Section 5 evaluates the performance of the new data structure and algorithm by comparing them with existing technologies: regular and algebraic decision diagrams. Section 6 concludes with final remarks and future research directions.
2 State Spaces, Decision Diagrams, and Distances
A discrete–state model is a triple (Ŝ, S init, N), where the discrete set Ŝ is the potential state space of the model; the set S init ⊆ Ŝ contains the initial states; and N : Ŝ → 2^Ŝ is the transition function specifying which states can be reached from a given state in one step, which we extend to sets: N(X) = ⋃_{i∈X} N(i). We consider structured systems modeled as a collection of K submodels. A (global) system state i is then a K-tuple (iK, . . . , i1), where ik is the local state for submodel k, for K ≥ k ≥ 1, and Ŝ is given by SK × · · · × S1, the cross–product of K local state spaces Sk, which we identify with {0, . . . , nk − 1} since we assume that Ŝ is finite. The (reachable) state space S ⊆ Ŝ is the smallest set containing S init and closed with respect to N, i.e.: S = S init ∪ N(S init) ∪ N(N(S init)) ∪ · · · = N*(S init). Thus, S is the fixed point of the iteration S ← S ∪ N(S) when S is initialized to S init.
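For intuition, this fixed point can be computed explicitly (one state at a time); the following is a purely illustrative sketch and not the symbolic algorithm developed in this paper:

# Illustrative sketch only: explicit fixed-point computation of the reachable
# state space S = N*(S_init).  'transition' maps a state to its set of successors.
def reachable(s_init, transition):
    S = set(s_init)
    frontier = set(s_init)
    while frontier:
        new_states = {j for i in frontier for j in transition(i)} - S
        S |= new_states
        frontier = new_states
    return S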
2.1 Decision Diagrams
It is well known that the state spaces of realistic models are enormous, and that decision diagrams are an effective way to cope with this state–space explosion problem. Their boolean incarnation, binary decision diagrams (BDDs) [5], can compactly encode boolean functions of K variables, hence subsets of {0, 1}K , which can then be manipulated very efficiently. BDDs have been successfully employed to verify digital circuits and other types of synchronous and
asynchronous systems. In the last decade, their application has expanded to areas of computer science beyond computer–aided verification. A comprehensive overview of decision diagrams is presented in [14]. We consider exclusively ordered decision diagrams (the variables labelling nodes along any path from the root must follow the order iK, . . . , i1) that are either reduced (no duplicate nodes and no node with all edges pointing to the same node, but edges possibly spanning multiple levels) or quasi–reduced (no duplicate nodes, and all edges spanning exactly one level), either form being canonical. We adopt the extension of BDDs to integer variables, i.e., multi–valued decision diagrams (MDDs) [19], an example of which is in Figure 1. MDDs are often more naturally suited than BDDs to represent the state space of arbitrary discrete systems, since no binary encoding must be used to represent the local states for level k when nk > 2. An even more important reason to use MDDs in our work, as it will be apparent, is that they better allow us to exploit the event locality present in systems exhibiting a globally–asynchronous locally–synchronous behavior. When combined with the Kronecker representation of the transition relation inspired by [2] and applied in [9,22], MDDs accommodate different fixed–point iteration strategies that result in remarkable efficiency improvements [10]. To discuss locality in a structured model, we require a disjunctively–partitioned transition function [18], i.e., N must be a union of (asynchronous) transition functions: N(iK, . . . , i1) = ⋃_{e∈E} Ne(iK, . . . , i1), where E is a finite set of events and Ne is the transition function associated with event e. Furthermore, we must be able to express each transition function Ne as the cross–product of K local transition functions: Ne(iK, . . . , i1) = Ne,K(iK) × · · · × Ne,1(i1). This is a surprisingly natural requirement: for example, it is satisfied by any Petri net [23], regardless of how it is decomposed into K subnets (by partitioning its places into K sets). Moreover, if a given model does not exhibit this behavior, we can always coarsen K or refine E so that it does. If we identify Ne,k with a boolean matrix of size nk × nk, where entry (ik, jk) is 1 iff jk ∈ Ne,k(ik), the overall transition relation is encoded by the boolean Kronecker expression ⋁_{e∈E} ⊗_{K≥k≥1} Ne,k. We say that event e affects level k if Ne,k is not the identity; we denote the top and bottom levels affected by e with Top(e) and Bot(e), respectively, and we let Ek = {e ∈ E : Top(e) = k}.
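As a toy illustration of this Kronecker-structured view (our own sketch with a hypothetical data layout; the actual tools store the Ne,k as sparse boolean matrices), an event can be fired on a global state by composing its per-level local transition functions:

# Illustrative sketch only: fire an event given one local transition relation
# per level.  Each N_ek maps a local state to its set of local successor states;
# the event is disabled in the global state unless every level can move.
from itertools import product

def fire(event_locals, state):
    # event_locals: [N_eK, ..., N_e1]; state: (i_K, ..., i_1)
    successor_sets = [N_ek.get(i_k, set()) for N_ek, i_k in zip(event_locals, state)]
    if any(not s for s in successor_sets):
        return set()
    return set(product(*successor_sets))

# A level not affected by the event uses the identity relation,
# e.g. {0: {0}, 1: {1}} for a two-valued local state space.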
2.2 Symbolic State–Space Generation: Breadth–First vs. Saturation
The traditional approach to generating the reachable states of a system is based on a breadth–first traversal, as derived from classical fixed–point theory, and applies a monolithic N (even when encoded as ⋃_{e∈E} Ne): after d iterations, the currently–known state space contains exactly all states whose distance from any state in S init is at most d. However, recent advances have shown that non–BFS, guided, or chaotic [16] exploration can result in a better iteration strategy. An example is the saturation algorithm introduced in [10], which exhaustively fires (explores) all events of Ek in an MDD node at level k, thereby bringing it to its final “saturated” form. We only briefly summarize the main characteristics of
[Figure 1 depicts a four-level MDD over the local state spaces S4 = {0, 1, 2, 3}, S3 = {0, 1, 2}, S2 = {0, 1}, and S1 = {0, 1, 2}; the encoded set is S = {0210, 1000, 1010, 1100, 1110, 1210, 2000, 2010, 2100, 2110, 2210, 3010, 3110, 3200, 3201, 3202, 3210, 3211, 3212}.]
Fig. 1. A 4-level MDD on {0,1,2,3}×{0,1,2}×{0,1}×{0,1,2} and the encoded set S.
saturation in this section, since the algorithm we present in Section 4.1 follows the same idea, except it is applied to a richer data structure. Saturation considers the nodes in a bottom–up fashion, i.e., when a node is processed, all its descendants are already known to be saturated. There are major advantages in working with saturated nodes. A saturated node at level k encodes a fixed point with respect to events in Ek ∪ . . . ∪ E1, thus it need not be visited again when considering such events. By contrast, traditional symbolic algorithms manipulate and store a large number of non–saturated nodes; these nodes cannot be present in the encoding of the final state space, thus will necessarily be deleted before reaching the fixed–point and replaced by (saturated) nodes encoding a larger subspace. Similar advantages apply to the manipulation of the auxiliary data structures used in any symbolic state–space generation algorithm, the unique table and the operation cache: only saturated nodes are inserted in them, resulting in substantial memory savings. Exploring a node exhaustively once, instead of once per iteration, also facilitates the idea of in–place–updates: while traditional algorithms frequently create updated versions of a node, to avoid using stale unique table and cache entries, saturation only checks–in a node when all possible updates on it have been performed. Experimental studies [10] show that our saturation strategy performs orders of magnitude faster than previous algorithms. Even more important, its peak memory requirements are often very close to the final requirements, unlike traditional approaches where the memory consumption grows rapidly until midway through the exploration, only to drop sharply in the last phases. Our next challenge for saturation is then applying it to other types of symbolic computation, such as the one discussed in this paper: the generation of shortest–length traces, where the use of chaotic iteration strategies would not seem applicable at first.
2.3 The Distance Function
The distance of a reachable state i ∈ S from the set of initial states S init is defined as δ(i) = min{d : i ∈ N^d(S init)}. We can naturally extend δ : S → N to all states in Ŝ by letting δ(i) = ∞ for any non–reachable state i ∈ Ŝ \ S. Alternatively, given such a function δ : Ŝ → N ∪ {∞}, we can identify S as the subset of the domain where the function is finite: S = {i ∈ Ŝ : δ(i) < ∞}.
The formulation of our problem is then: Given a description of a structured discrete–state system (Ŝ, S init, N), determine the distance to all reachable states, i.e., compute and store δ : Ŝ → N ∪ {∞} (note that the reachable state space S is not an input, rather, it is implicitly an output). This can be viewed as a least fixed–point computation for the functional Φ : D → D, where D is the set of functions mapping Ŝ onto N ∪ {∞}. In other words, Φ refines an approximation of the distance function from the initial δ[0] ∈ D, defined as δ[0](i) = 0 if i ∈ S init, δ[0](i) = ∞ otherwise, via the iteration
δ[m+1](i) = Φ(δ[m])(i) = min( δ[m](i), min{ 1 + δ[m](i′) : i ∈ N(i′) } ).

Note that the state–space construction is itself a fixed–point computation, so we seek now to efficiently combine the two fixed–point operations into one. Before showing our algorithm to accomplish this, in Section 3, we first describe a few approaches to compute distance information based on existing decision diagrams technology.
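An explicit (non-symbolic) version of this joint computation is straightforward and is sketched below for intuition only; the contribution of the paper is to perform it symbolically:

# Illustrative sketch only: explicit breadth-first computation of the distance
# function delta over the reachable states.  'transition' maps a state to its
# set of successors; states never reached implicitly have distance infinity.
def distances(s_init, transition):
    delta = {i: 0 for i in s_init}
    frontier, d = set(s_init), 0
    while frontier:
        d += 1
        frontier = {j for i in frontier for j in transition(i) if j not in delta}
        for j in frontier:
            delta[j] = d
    return delta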
2.4 Explicit Encoding of State Distances
Algebraic decision diagrams (ADDs) [3] are an extension of BDDs where multiple terminals are allowed (thus, they are also called MTBDDs [12]). ADDs can encode arithmetic functions from Ŝ to R ∪ {∞}. The value of the function on a specific input (representing a state in our case) is the value of the terminal node reached by following the path encoding the input. While ADDs are traditionally associated to boolean argument variables, extending the arguments to finite integer sets is straightforward. The compactness of the ADD representation is related to the merging of nodes, exploited to a certain degree in all decision diagrams. In this case, there is a unique root, but having many terminal values can greatly reduce the degree of node merging, especially at the lower levels, with respect to the support decision diagram, i.e., the MDD that encodes S ⊆ Ŝ. In other words, the number of terminal nodes for the ADD that encodes δ : S → N ∪ {∞} equals the number of distinct values for δ (hence the “explicit” in the title of this section); if we merged all finite–valued terminals into one, thus encoding just S but not the state distances, many ADD nodes may be merged into one MDD node. An alternative explicit encoding of state distances can be achieved by simply using a forest of MDDs. This approach is derived from the traditional ROBDD method, by extending it to multi–valued variables. Each of the distance sets N^d(S init) = {i ∈ S | δ(i) = d} (or {i ∈ S | δ(i) ≤ d}, which may require fewer nodes in some cases) can be encoded using a separate MDD. Informally, this reverses the region where most sharing of nodes occurs compared to ADDs: the roots are distinct, but they may be likely to share nodes downstream. The cardinality of the range of the function is critical to the compactness of either representation: the wider the range, the less likely it is that nodes are merged. Figure 2 (a) and (b) show an example of the same distance function represented as an ADD or as a forest of MDDs, respectively.
Using Edge-Valued Decision Diagrams (a)
261
(b) dist=0 dist=1 dist=2 dist=3 dist=4
i3 0 0 0 0 1 1 1 1 i2 0 0 1 1 0 0 1 1 i1 0 1 0 1 0 1 0 1 f 02322410
0
1
2
3
4
T
Fig. 2. Storing the distance function: an ADD (a) vs. a forest of MDDs (b).
2.5 Symbolic Encoding of State Distances
The idea of associating numerical values to the edges of regular BDDs was proposed in [20,21], resulting in a new type of decision diagrams, edge–valued BDDs (EVBDDs)¹. In the following definition of EVBDDs, instead of using the original terminology and notation, we use the terminology and notation needed to introduce the new data structure presented in the next section, so that differences and similarities will be more apparent.
Definition 1. An EVBDD is a directed acyclic graph that encodes a total function f : {0, 1}^K → Z as follows:
1. There is a single terminal node, at level 0, with label 0, denoted by 0.0.
2. A non–terminal node at level k, K ≥ k ≥ 1, is denoted by k.p, where p is a unique identifier within level k, and has two children, k.p[0].child and k.p[1].child (corresponding to the two possible values of ik), which are nodes at some (not necessarily the same) level l, k > l ≥ 0.
3. The 1-edge is labelled with an integer value k.p[1].val ∈ Z, while the label of k.p[0].val is always (implicitly) 0.
4. There is a single root node kr.r, for some K ≥ kr ≥ 0, with no incoming edges, except for a “dangling” edge labelled with an integer value ρ ∈ Z.
5. Canonicity restrictions analogous to those of reduced ordered BDDs apply:
uniqueness: if k.p[0].child = k.q[0].child, k.p[1].child = k.q[1].child, and k.p[1].val = k.q[1].val, then p = q;
reducedness: there is no redundant node k.p satisfying k.p[0].child = k.p[1].child and k.p[1].val = 0.
The function encoded by an EVBDD node k.p is recursively defined by

f_{k.p}(ik, . . . , i1) = f_{k.p[0].child}(il, . . . , i1),                 if ik = 0
f_{k.p}(ik, . . . , i1) = f_{k.p[1].child}(ir, . . . , i1) + k.p[1].val,   if ik = 1
¹ We observe that also binary moment diagrams (BMDs), independently introduced in [6], associate values to edges. For BMDs however, evaluating the function on a particular argument requires the traversal of multiple paths, as opposed to a unique path for EVBDDs. Thus, while very effective for verifying circuits such as a multiplier, BMDs are not as suited for our approach.
[Figure 3 shows three EVBDDs encoding the function f of Figure 2 (values 0, 2, 3, 2, 2, 4, 1, 0 over (i3, i2, i1)); only the EVBDD in (a) is normalized.]
Fig. 3. Canonical (a) and non–canonical (b),(c) EVBDDs for the same function f .
where l and r are the levels of k.p[0].child and k.p[1].child, respectively, and f_{0.0} = 0. The function encoded by an EVBDD edge, that is, a (value, node) pair, is then simply obtained by adding the constant value to the function encoded by the node. In particular, the function encoded by the EVBDD is f = ρ + f_{kr.r}. Note that the nodes are normalized to enforce canonicity: the value of the 0-edge is always 0. If this requirement were relaxed, there would be an infinite number of EVBDDs representing the same function, obtained by rearranging the edge values. An example of multiple ways to encode the function of Figure 2 with non–canonical EVBDDs is shown in Figure 3, where, for better readability, we show the edge value in the box from where the edge departs, except for the top dangling arc. Only the EVBDD in Figure 3(a) is normalized. This node normalization implies that ρ = f(0, . . . , 0) and may require the use of both negative and positive edge values even when the encoded function is non–negative, as is the case for Figure 3(a). More importantly, if we want to represent functions such as our distance δ : S → N ∪ {∞}, we can allow edge values to be ∞; however, if δ(0, . . . , 0) = ∞, i.e., state (0, . . . , 0) is not reachable, we cannot enforce the required normalization, since this implies that ρ is ∞, and f is identically ∞ as well. This prompted us to introduce a more general normalization rule, which we present next.
3 A New Approach
We use quasi–reduced, ordered, non–negative edge–valued, multi–valued decision diagrams. To the best of our knowledge, this is the first attempt to use edge–valued decision diagrams of any type in fixed–point computations or in the generation of traces. 3.1
Definition of EV+MDDs
We extend EVBDDs in several ways. The first extension is straightforward: from binary to multi–valued variables. Then, we change the normalization of nodes to a slightly more general one needed for our task. Finally, we allow the value of
[Figure 4 shows two EV+MDDs: (a) encodes the total function f of Figures 2 and 3 (values 0, 2, 3, 2, 2, 4, 1, 0 over (i3, i2, i1)), and (b) encodes the partial function with values 0, 2, 3, ∞, ∞, 4, 1, 0.]
Fig. 4. Storing total (a) and partial (b) arithmetic functions with EV+MDDs.
an edge to be ∞, since this is required to describe our distance functions. Note that the choice to use quasi–reduced instead of reduced decision diagrams is not dictated by limitations in the descriptive power of EVBDDs, but by efficiency considerations in the saturation–based algorithm we present in Section 4.
Definition 2. Given a function f : Ŝ → Z ∪ {∞}, an EV+MDD for f ≢ ∞ is a directed acyclic graph with labelled edges that satisfies the following properties:
1. There is a single terminal node, at level 0, with label 0, denoted by 0.0.
2. A non–terminal node at level k, K ≥ k ≥ 1, is denoted by k.p, where p is a unique identifier within the level, and has nk ≥ 2 edges to children, k.p[ik].child, labelled with values k.p[ik].val ∈ N ∪ {∞}, for 0 ≤ ik < nk.
3. If k.p[ik].val = ∞, the value of k.p[ik].child is irrelevant, so we simply require it to be 0 for canonicity; otherwise, k.p[ik].child is the index of a node at level k − 1.
4. There is a single root node, K.r, with no incoming edges, except for a “dangling” incoming edge labelled with an integer value ρ ∈ Z.
5. Each non–terminal node has at least one outgoing edge labelled with 0.
6. All nodes are unique, i.e., if ∀ik, 0 ≤ ik < nk, k.p[ik].child = k.q[ik].child and k.p[ik].val = k.q[ik].val, then p = q.
Figure 4 shows two EV+MDDs storing a total and a partial² function, respectively (the total function encoded is that of Figures 2 and 3). Note that, unlike the normalization for EVBDDs, our normalization requires that the labels on (non–dangling) edges be non–negative, and at least one per node be zero, but not in a pre–determined location; compare the EVBDD of Figure 3(a) with the equivalent EV+MDD of Figure 4(a). The function encoded by the EV+MDD node k.p is

f_{k.p}(ik, . . . , i1) = k.p[ik].val + f_{(k−1).k.p[ik].child}(ik−1, . . . , i1)
² By “partial”, we mean that some of its values can be ∞; whenever this is the case, we omit the corresponding value and edge from the graphical representation.
and we let f_{0.0} = 0. As for EVBDDs, the function encoded by the EV+MDD (ρ, K.r) is f = ρ + f_{K.r}. However, now, ρ = min{f(i) : i ∈ SK × · · · × S1}. In our application, we will encode distances, which are non–negative, thus ρ = 0. If we wanted to cope with the degenerate case S init = ∅, so that f is identically ∞, we could allow a special EV+MDD with ρ = ∞ and root 0.0.
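For intuition, evaluating the function encoded by an EV+MDD edge amounts to accumulating edge values along a single path. The following minimal sketch is purely illustrative and assumes a toy data layout (a node is a list of (value, child) pairs, the terminal node is None), not the authors' implementation:

# Illustrative sketch only: evaluate the function encoded by an EV+MDD edge
# (rho, root) on a state (i_K, ..., i_1), accumulating edge values top-down.
INF = float('inf')

def evaluate(edge, state):
    value, node = edge
    for i_k in state:
        v, child = node[i_k]
        if v == INF:
            return INF          # the state is not in the encoded support
        value, node = value + v, child
    return value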
3.2 Canonicity of EV+MDDs
Lemma 1. From every non–terminal EV+MDD node, there is an outgoing path with all edges labelled 0 reaching 0.0.
Corollary 1. The function f_{k.p} encoded by a node k.p is non–negative and min(f_{k.p}) = 0.
Definition 3. The graphs rooted at two EV+MDD nodes k.p and k.q are isomorphic if there is a bijection b from the nodes of the first graph to the nodes of the second graph such that, for each node l.s of the first graph and each il ∈ Sl (with k ≥ l ≥ 1): b(l.s)[il].child = b(l.s[il].child) and b(l.s)[il].val = l.s[il].val.
Theorem 1. (Canonicity) If two EV+MDDs (ρ1, K.r1) and (ρ2, K.r2) encode the same function f : Ŝ → N ∪ {∞}, then ρ1 = ρ2 and the two labelled graphs rooted at K.r1 and K.r2 are isomorphic.
Proof. It is easy to see that, since the value on the dangling edges of the two EV+MDDs equals the minimum value ρ the encoded function f can assume, we must have ρ1 = ρ2 = ρ, and the two nodes K.r1 and K.r2 must encode the same function f − ρ. We then need to prove by induction that, if two generic EV+MDD nodes k.p and k.q encode the same function, the labelled graphs rooted at them are isomorphic.
Basis (k = 1): if 1.p and 1.q encode the same function f : S1 → N ∪ {∞}, then 1.p[i1].child = 1.q[i1].child = 0 and 1.p[i1].val = 1.q[i1].val = f(i1) for all i1 ∈ S1, thus the two labelled graphs rooted at 1.p and 1.q are isomorphic.
Inductive step (assume the claim true for k − 1): if k.p and k.q encode the same function f : Sk × · · · × S1 → N ∪ {∞}, consider the function obtained when we fix ik to a particular value t, i.e., f|_{ik=t}. Let g and h be the functions encoded by k.p[t].child and k.q[t].child, respectively; also, let k.p[t].val = α and k.q[t].val = β, and observe that the functions α+g and β+h must coincide with f|_{ik=t}. However, because of Corollary 1, we know that both g and h evaluate to 0, their minimum possible value, for at least one choice of the arguments (ik−1, . . . , i1). Thus, the minimum values α+g and β+h can assume are α and β, respectively; since α+g and β+h are the same function, they must have the same minimum, hence α = β. This implies that g = h and, by inductive hypothesis, that k.p[t].child and k.q[t].child are isomorphic. Since this argument applies to a generic child t, the two nodes k.p and k.q are then themselves isomorphic, completing the proof. ✷
UnionMin(k : level, (α, p) : edge, (β, q) : edge) : edge
 1. if α = ∞ then return (β, q);
 2. if β = ∞ then return (α, p);
 3. if k = 0 then return (min(α, β), 0);            • the only node at level k = 0 has index 0
 4. if UCacheFind(k, p, q, α−β, (γ, u)) then        • match (k, p, q, α−β), return (γ, u)
 5.   return (γ + min(α, β), u);
 6. u ← NewNode(k);                                 • create new node at level k with edges set to (∞, 0)
 7. µ ← min(α, β);
 8. for ik = 0 to nk − 1 do
 9.   p′ ← k.p.child[ik]; α′ ← α − µ + k.p.val[ik];
10.   q′ ← k.q.child[ik]; β′ ← β − µ + k.q.val[ik];
11.   k.u[ik] ← UnionMin(k−1, (α′, p′), (β′, q′));  • continue downstream
12. CheckInUniqueTable(k, u);
13. UCacheInsert(k, p, q, α − β, (µ, u));
14. return (µ, u);

Fig. 5. The UnionMin algorithm for EV+MDDs.
4 Operations with EV+MDDs
We are now ready to discuss manipulation algorithms for EV+MDDs. We do so in the context of our state–space and distance generation problem, although, of course, the UnionMin function we introduce in Figure 5 has general applicability. The types and variables used in the pseudo–code of Figures 5 and 7 are event (model event, e), level (EV+MDD level, k), index (node index within a level, p, q, p′, q′, s, u, f), value (edge value, α, β, α′, β′, µ, γ, φ), local (local state index ik, jk), and localset (set of local states for one level, L). In addition, we let edge denote the pair (value, index), i.e., the type of k.p[i]; note that only index is needed to identify a child, since the level itself is known: k−1. The UnionMin algorithm computes the minimum of two partial functions. This acts like a dual operator by performing the union on the support sets of states of the two operands (which must be defined over the same potential state space Ŝ) and by finding the minimum value for the common elements. The algorithm starts at the roots of the two operand EV+MDDs, and recursively descends along matching edges. If at some point one of the edges has value ∞, the recursion stops and returns the other edge (since ∞ is the neutral value with respect to the minimum); if the other edge has value ∞ as well, the returned value is (∞, 0), i.e., no states are added to the union; otherwise, if the other edge has finite value, we have just found states reachable in one set but not in the other. If the recursion reaches instead all the way to the terminal node 0.0, the returned value is the minimum of the two input values α and β. If both α and β are finite and p and q are non–terminal, UnionMin “keeps” the minimum value on the incoming arcs to the operands, µ, and “pushes down” any residual value α − µ, if µ = β < α, or β − µ, if µ = α < β, on the children of
[Figure 6 illustrates UnionMin on two EV+MDDs f and g over S3 × S2 × S1 = {0, 1, 2} × {0, 1} × {0, 1}; the operand functions and the result h = min(f, g) are listed below.]

i3 i2 i1 |  f  g  h
 0  0  0 |  0  0  0
 0  0  1 |  ∞  2  2
 0  1  0 |  2  ∞  2
 0  1  1 |  ∞  ∞  ∞
 1  0  0 |  2  2  2
 1  0  1 |  ∞  4  4
 1  1  0 |  ∞  ∞  ∞
 1  1  1 |  1  ∞  1
 2  0  0 |  3  1  1
 2  0  1 |  ∞  3  3
 2  1  0 |  ∞  ∞  ∞
 2  1  1 |  2  3  2
Fig. 6. An example of the UnionMin operator for EV+MDDs.
p or q, respectively, in its recursive downstream calls. In this case, the returned edge (µ, u) is such that µ + f_{k.u} = min(α + f_{k.p}, β + f_{k.q}). An example of the application of the UnionMin algorithm is illustrated in Figure 6. The potential state space is S3 × S2 × S1 = {0, 1, 2} × {0, 1} × {0, 1}. The functions encoded by the operands, f and g, are listed in the accompanying table, along with the result function h = min(f, g).
Lemma 2. The call UnionMin(k, (α, p), (β, q)) returns an edge (µ, u) such that µ = min(α, β) and k.u and its descendants satisfy property 5 of Definition 2, if k.p and k.q do.
Proof. It is immediate to see that µ = min(α, β). To prove that k.u satisfies property 5, we use induction: if k = 0, there is nothing to prove, since property 5 applies to non–terminal nodes only. Assume now that the lemma is true for all calls at level k−1 and consider an arbitrary call UnionMin(k, (α, p), (β, q)), where the input nodes k.p and k.q satisfy property 5. If α or β is ∞, the returned node is one of the input nodes, so it satisfies property 5. Otherwise, since µ = min(α, β), at least one of α−µ and β−µ is 0; say α−µ = 0. The values labelling the edges of k.u are computed in line 11 of UnionMin. Since k.p satisfies property 5, there exists ik ∈ {0, . . . , nk − 1} such that k.p.val[ik] = 0. Then, for the corresponding iteration of the for–loop, α′ is 0 and the edge returned by UnionMin(k−1, (α′, p′), (β′, q′)) is (min(α′, β′), u′) = (0, u′), where k−1.u′ satisfies property 5 by induction; thus, k.u[ik].val is set to 0. ✷
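The core of UnionMin can also be sketched in executable form; the following is purely illustrative (no unique table, operation cache, or node sharing, and an assumed list-of-(value, child)-pairs node layout), not the tool's implementation:

# Illustrative sketch only: UnionMin on a toy EV+MDD representation.
# A node is a tuple of (value, child) pairs, one per local state; the terminal
# node is represented by None, and infinite edge values by float('inf').
INF = float('inf')

def union_min(edge_a, edge_b):
    (a, p), (b, q) = edge_a, edge_b
    if a == INF:
        return edge_b
    if b == INF:
        return edge_a
    if p is None:                     # both edges reach the terminal node
        return (min(a, b), None)
    mu = min(a, b)                    # keep the minimum on the incoming edge
    children = tuple(                 # push the residuals down to the children
        union_min((a - mu + va, pa), (b - mu + vb, qb))
        for (va, pa), (vb, qb) in zip(p, q))
    return (mu, children)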
4.1 State-Space and Distance Generation Using EV+MDDs
Our fixed–point algorithm to build and store the distance function δ, and implicitly the state space S, is described by the pseudo–code for BuildDistance, Saturate, and RecursiveFire, shown in Figure 7. Given a model (Ŝ, S init, N) we follow these steps:
1. Encode S init into an initial EV+MDD node K.r. This can be done by building the MDD for S init, then setting to 0 all edge values for edges going to true (called 1 in the MDD terminology of [10]), setting the remaining edge values to ∞, eliminating the terminal node false, and renaming the terminal node true as 0 (in EV+MDD terminology). See [10] on how to build an MDD when S init contains a single state. In general, the MDD encoding of S init will be derived from some other symbolic computation, e.g., it will be already available as the result of a temporal logic query.
2. Call BuildDistance(K, r).
Functions CheckInUniqueTable, LocalsToExplore, UCacheFind, FCacheFind, UCacheInsert, FCacheInsert, PickAndRemoveElementFromSet, and CreateNode have the intuitive semantic associated to their name (see also the comments in the pseudo–code). Normalize(k, s) puts node k.s in canonical form by computing µ = min{k.s[ik].val : ik ∈ Sk} and subtracting µ from each k.s[ik].val (so that at least one of them becomes 0), then returns µ; in particular, if all edge values in k.s are ∞, it returns ∞ (this is the case in Statement 17 of RecursiveFire if the while–loop did not manage to fire e from any of the local states in L). The hash–key for the firing cache does not use the value α on the incoming edge, because the node k.s corresponding to the result (γ, s) of RecursiveFire is independent of this quantity. The edge value returned by RecursiveFire depends instead on α: it is simply obtained by adding the result of Normalize(k, s) to α. RecursiveFire may push excess values upwards when normalizing a node in line 17, that is, residual values are moved in the direction opposite to that in UnionMin. However, the normalization procedure is called only once per node (when the node has been saturated), therefore excess values are not bounced back and forth repeatedly along edges.
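A minimal sketch of the normalization step just described (illustrative only, using the same toy list-of-edges layout as the earlier sketches):

# Illustrative sketch only: normalize a node's edges by subtracting their
# minimum value, which is returned so it can be pushed up to the incoming edge.
INF = float('inf')

def normalize(edges):
    # edges: mutable list of [value, child] pairs for one node
    mu = min(v for v, _ in edges)
    if mu == INF:
        return INF                    # no state below this node is reachable
    for e in edges:
        if e[0] != INF:
            e[0] -= mu
    return mu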
4.2 Trace Generation Using EV+MDDs
Once the EV+MDD (ρ, K.r) encoding δ and S is built, a shortest–length trace from any of the states in S init to one of the states in a set X (given in input as an MDD) can be obtained by backtracking. For simplicity, the following algorithm does not output the identity of the events along the trace, but this option could be easily added, if desired:
1. Transform the MDD for X into an EV+MDD (ρx, K.x) encoding X and δx using the approach previously described for S init, where δx(i) = 0 if i ∈ X and δx(i) = ∞ if i ∈ Ŝ \ X.
BuildDistance(k : level, p : index)
1. if k > 0 then
2.   for ik = 0 to nk − 1 do
3.     if k.p[ik].val < ∞ then BuildDistance(k − 1, k.p[ik].child);
4.   Saturate(k, p);

Saturate(k : level, p : index)
 1. repeat
 2.   pChanged ← false;
 3.   foreach e ∈ Ek do
 4.     L ← LocalsToExplore(e, k, p);               • {ik : Ne,k(ik) ≠ ∅ ∧ k.p[ik].val ≠ ∞}
 5.     while L ≠ ∅ do
 6.       ik ← PickAndRemoveElementFromSet(L);
 7.       (α, f) ← RecursiveFire(e, k−1, k.p[ik]);
 8.       if α ≠ ∞ then
 9.         foreach jk ∈ Ne,k(ik) do
10.           (β, u) ← UnionMin(k−1, (α + 1, f), k.p[jk]);
11.           if (β, u) ≠ k.p[jk] then
12.             k.p[jk] ← (β, u);
13.             pChanged ← true;
14.             if Ne,k(jk) ≠ ∅ then L ← L ∪ {jk};  • remember to explore jk later
15. until pChanged = false;
16. CheckInUniqueTable(k, p);

RecursiveFire(e : event, k : level, (α, q) : edge) : edge
 1. if k < Bot(e) then return (α, q);               • level k is not affected by event e
 2. if FCacheFind(k, q, e, (γ, s)) then             • match (k, q, e), return (γ, s)
 3.   return (γ + α, s);
 4. s ← NewNode(k);                                 • create new node at level k with edges set to (∞, 0)
 5. sChanged ← false;
 6. L ← LocalsToExplore(e, k, q);                   • {ik : Ne,k(ik) ≠ ∅ ∧ k.q[ik].val ≠ ∞}
 7. while L ≠ ∅ do
 8.   ik ← PickAndRemoveElementFromSet(L);
 9.   (φ, f) ← RecursiveFire(e, k−1, k.q[ik]);
10.   if φ ≠ ∞ then
11.     foreach jk ∈ Ne,k(ik) do
12.       (β, u) ← UnionMin(k−1, (φ, f), k.s[jk]);
13.       if (β, u) ≠ k.s[jk] then
14.         k.s[jk] ← (β, u);
15.         sChanged ← true;
16. if sChanged then Saturate(k, s);
17. γ ← Normalize(k, s);
18. s ← CheckInUniqueTable(k, s);
19. FCacheInsert(k, q, e, (γ, s));
20. return (γ + α, s);

Fig. 7. BuildDistance, our saturation–based algorithm using EV+MDDs.
2. Compute IntersectionMax(K, (ρ, r), (ρx, x)), which is the dual of UnionMin, and whose pseudo–code is exactly analogous; let (µ, K.m) be the resulting EV+MDD, which encodes X ∩ S and the restriction of δ to this set (µ is then the length of one of the shortest paths we are seeking).
3. Extract from (µ, K.m) a state j[µ] = (jK[µ], . . . , j1[µ]) encoded by a path from K.m to 0.0 labelled with 0 values (j[µ] is a state in X at the desired minimum distance µ from S init). The algorithm proceeds now with an explicit flavor.
4. Initialize ν to µ and iterate:
a) Find all states i ∈ S such that j[ν] ∈ N(i). With our boolean Kronecker encoding of N, this “one step backward” is easily performed: we simply have to use the transpose of the matrices Ne,k.
b) For each such state i, compute δ(i) using (ρ, K.r) and stop on the first i such that δ(i) = ν − 1 (there exists at least one such state i∗).
c) Decrement ν.
d) Let j[ν] be i∗.
5. Output j[0], . . . , j[µ].
The cost of obtaining j[µ] as the result of the IntersectionMax operation is O(#K.r · #K.x), where # indicates the number of EV+MDD nodes. The complexity of the rest of the algorithm is then simply O(µ · M · K), where M is the maximum number of incoming arcs to any state in the reachability graph of the model, i.e., M = max{|N−1(j)| : j ∈ S}, and K comes from traversing one path in the EV+MDD. In practice M is small but, if this were not the case, the set N−1(j[ν]) could be computed symbolically at each iteration instead.
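The backtracking of steps 4–5 is easy to state explicitly; the sketch below is illustrative only, with predecessors(j) standing in for the "one step backward" computed from the transposed Kronecker matrices:

# Illustrative sketch only: extract a shortest trace to a target state j at
# distance mu, walking backwards through the distance function delta (a dict).
def shortest_trace(j, mu, delta, predecessors):
    trace = [j]
    for nu in range(mu, 0, -1):
        j = next(i for i in predecessors(j) if delta.get(i) == nu - 1)
        trace.append(j)
    return list(reversed(trace))   # states at distance 0, 1, ..., mu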
Table 1. Comparison of the five approaches (“—” means “out of memory”).

N  |S|  |  Time in seconds: Es Eb Mb As Ab  |  Number of nodes (final, then peak): Es Eb Mb As Ab

Dining philosophers: D = 2N, K = N/2, |Sk| = 34 for all k except |S1| = 8 when N is odd
5 1.3·10³ 0.00 0.01 0.01 0.01 0.03 11 83 38 11 155 172 48 434
10 1.9·10⁶ 0.01 0.06 0.05 0.12 0.46 21 255 170 21 605 644 238 4022
20 3.5·10¹² 0.01 0.34 0.28 1.64 9.00 46 1100 740 46 2990 3079 1163 38942
25 4.7·10¹⁵ 0.01 0.59 0.47 4.09 26.08 61 1893 1178 61 5215 5334 1958 79674
30 6.4·10¹⁸ 0.02 0.86 0.70 7.39 56.80 71 2545 1710 71 7225 7364 2788 140262
1000 9.2·10⁶²⁶ 0.48 — — — — 2496 — — 2496 — — — —

Kanban system: D = 14N, K = 4, |Sk| = (N+3)(N+2)(N+1)/6 for all k
3 5.8·10⁴ 0.01 0.02 0.02 0.04 0.17 7 180 68 29 454 464 284 3133
5 2.5·10⁶ 0.02 0.14 0.12 0.24 1.55 9 444 133 57 1132 1156 776 13241
7 4.2·10⁷ 0.04 0.51 0.42 0.94 7.79 11 848 218 93 2112 2166 1600 35741
10 1.0·10⁹ 0.16 2.10 1.68 4.68 48.86 14 1673 383 162 4041 4160 3616 98843
12 5.5·10⁹ 0.34 4.34 3.45 11.08 129.46 16 2368 518 218 5633 5805 5585 165938
50 1.0·10¹⁶ 179.48 — — — — 58 — — 2802 — — — —

Flex. manuf. syst.: D = 14N, K = 19, |Sk| = N+1 for all k except |S17| = 4, |S12| = 3, |S2| = 2
3 4.9·10⁴ 0.00 0.12 0.09 0.26 1.58 88 1925 1191 116 5002 5187 2075 37657
5 2.9·10⁶ 0.01 0.42 0.34 0.88 11.78 149 5640 2989 211 15205 15693 4903 179577
7 6.6·10⁷ 0.02 1.05 0.85 2.08 65.32 222 12070 5739 326 32805 33761 9027 523223
10 2.5·10⁹ 0.04 2.96 2.40 5.79 608.92 354 28225 11894 536 76676 78649 17885 1681625
140 2.0·10²³ 20.03 — — — — 32012 — — 52864 — — — —

Round–robin mutex protocol: D = 8N−6, K = N+1, |Sk| = 10 for all k except |S1| = N+1
10 2.3·10⁴ 0.01 0.06 0.05 0.22 0.50 92 1038 1123 107 1898 1948 1210 9245
15 1.1·10⁶ 0.01 0.15 0.14 1.00 2.93 177 2578 3136 212 4774 4885 3308 34897
20 4.7·10⁷ 0.02 0.32 0.31 3.10 12.62 287 4968 6619 322 9270 9467 6901 92140
25 1.8·10⁹ 0.03 0.59 0.54 7.89 52.29 422 8333 11947 477 15636 15944 12364 198839
30 7.2·10¹⁰ 0.05 0.95 0.89 16.04 224.83 582 12798 19495 637 24122 24566 20072 376609
200 7.2·10⁶² 1.63 — — — — 20897 — — 21292 — — — —
built much more efficiently and require much less memory than with breadth– first approaches. The following section confirms this, focusing on the first and expensive phase of trace generation, the computation of the distance information, since the backtracking phase has negligible cost in comparison and is in any case essentially required by any approach.
5
Results
To stress the importance of using a saturation–based approach, we compare the three types of encodings for the distance function we have discussed, EV+MDDs, forests of MDDs, and ADDs, in conjunction with two iteration strategies, based on breadth–first and saturation, respectively (see Table 1). Since only breadth– first is applicable in the case of forests of MDDs, this leads to five cases: EV+MDD with saturation (Es ), EV+MDD with breadth–first (Eb ), forest of MDDs with breadth–first (Mb ), ADD with saturation (As ), and ADD with breadth–first (Ab ). Note that only Mb and Ab have been used in the literature before, while Es and Eb use our new data structure and As (which we cannot
Using Edge-Valued Decision Diagrams
271
discuss in detail for lack of space) applies the idea of saturation to ADDs, thus it is also a new approach. We implemented the five algorithms (their MDD, not BDD, version) in our tool SMART [8] and used them to generate the distance function for the entire state space. The suite of examples is chosen from the same benchmark we used in [10]; each model is scalable by a parameter N . All experiments were ran on a 800 MHz Pentium III workstation with 1GB of memory. For each model, we list the maximum distance D, the number K of levels in the decision diagram, and the sizes of the local state spaces. For each experiment we list the maximum distance to a reachable state, which is also the number of iterations in the breadth–first approaches, the runtime, and the number of nodes (both final and peak). In terms of runtime, there is a clear order: Es < Eb < Mb < As < Ab , with Es easily managing much larger systems; Es , Eb < Mb < As , Ab clearly attests to the effectiveness of the data structures, while Es < Eb and As < Ab attest to the improvements obtainable with saturation–based approaches. With EV+MDDs, in particular with Es , we can scale up the models to huge parameters. The other two data structures do not scale up nearly as well and run out of memory. In terms of memory consumption: Es < As < Eb ≈ Mb < Ab for the peak number of nodes, while Es = Eb < As = Ab ≈ Mb for the final number of nodes. The key observation is that Es substantially outperforms all other methods. Compared to Ab , it is over 1,000 times faster and uses fewer peak nodes, also by a factor of 1,000.
6
Conclusion
We introduced EV+MDDs, a new canonical variation of EVBBDs, which can be used to store the state space of a model and the distance of every state form the initial set of states within a single decision diagram. A key contribution is that we extend the saturation approach we previously introduced for state–space generation alone, and apply it to this data–structure, resulting in a very fast and memory–efficient algorithm for joint state–space and distance generation. One conclusion of our research is a clear confirmation of the effectiveness of saturation as opposed to a traditional breadth–first iteration, not just when used in conjunction with our EV+MDDs, but even with ADDs. A second orthogonal conclusion is that edge–valued decision diagrams in general are much more suited than ADDs to the task at hand, because they implicitly encode the possible distance values, while ADDs have an explicit terminal node for each possible value, greatly reducing the degree of node merging in the diagram. Future work along these research lines includes exploring smarter cache management policies that exploit properties of the involved operators (e.g., additivity), extending the idea to EU and EG operators (probably a major challenge), comparing the performance of our method with that of non BDD–based techniques (such as using SAT solvers [4]), and investigate other fields of application for EV+MDDs.
272
G. Ciardo and R. Siminiceanu
References 1. P. A. Abdulla, P. Bjesse, and N. E´en. Symbolic reachability analysis based on SAT-solvers. In S. Graf and M. Schwartzbach, editors, Proc. Tools and Algorithms for the Construction and Analysis of Systems TACAS, Berlin, Germany, volume 1785 of LNCS, pages 411–425. Springer-Verlag, 2000. 2. V. Amoia, G. De Micheli, and M. Santomauro. Computer-oriented formulation of transition-rate matrices via Kronecker algebra. IEEE Trans. Rel., 30:123–132, June 1981. 3. R. I. Bahar, E. A. Frohm, C. M. Gaona, G. D. Hachtel, E. Maciii, A. Pardo, and F. Somenzi. Algebraic decision diagrams and their applications. Formal Methods in System Design, 10(2/3):171–206, Apr. 1997. 4. A. Biere, A. Cimatti, E. Clarke, and Y. Zhu. Symbolic model checking without BDDs. LNCS, 1579:193–207, 1999. 5. R. E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Trans. Comp., 35(8):677–691, Aug. 1986. 6. R. E. Bryant and Y.-A. Chen. Verification of arithmetic circuits with binary moment diagrams. In Proc. of Design Automation Conf. (DAC), pages 535–541, 1995. 7. J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and L. J. Hwang. Symbolic model checking: 1020 states and beyond. In Proc. 5th Annual IEEE Symp. on Logic in Computer Science, pages 428–439, Philadelphia, PA, 4–7 June 1990. IEEE Comp. Soc. Press. 8. G. Ciardo, R. L. Jones, A. S. Miner, and R. Siminiceanu. SMART: Stochastic Model Analyzer for Reliability and Timing. In P. Kemper, editor, Tools of Aachen 2001 Int. Multiconference on Measurement, Modelling and Evaluation of Computer-Communication Systems, pages 29–34, Aachen, Germany, Sept. 2001. 9. G. Ciardo, G. Luettgen, and R. Siminiceanu. Efficient symbolic state-space construction for asynchronous systems. In M. Nielsen and D. Simpson, editors, Application and Theory of Petri Nets 2000 (Proc. 21th Int. Conf. on Applications and Theory of Petri Nets, Aarhus, Denmark), LNCS 1825, pages 103–122. SpringerVerlag, June 2000. 10. G. Ciardo, G. Luettgen, and R. Siminiceanu. Saturation: An efficient iteration strategy for symbolic state space generation. In T. Margaria and W. Yi, editors, Proc. Tools and Algorithms for the Construction and Analysis of Systems (TACAS), LNCS 2031, pages 328–342, Genova, Italy, Apr. 2001. Springer-Verlag. 11. E. Clarke, E. Emerson, and A. Sistla. Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Trans. Progr. Lang. and Syst., 8(2):244–263, Apr. 1986. 12. E. Clarke and X. Zhao. Word level symbolic model checking: A new approach for verifying arithmetic circuits. Technical Report CS-95-161, Carnegie Mellon University, School of Computer Science, May 1995. 13. E. M. Clarke, O. Grumberg, and D. A. Peled. Model Checking. MIT Press, 1999. 14. R. Drechsler and B. Becker. Overview of decision diagrams. IEE Proc.-Comput. Digit. Tech., 144(3):187–193, May 1997. 15. E.M. Clarke, O. Grumberg, K.L. McMillan, and X. Zhao. Efficient generation of counterexamples and witnesses in symbolic model checking. In 32nd Design Automation Conference (DAC 95), pages 427–432, San Francisco, CA, USA, 1995. 16. A. Geser, J. Knoop, G. L¨ uttgen, B. Steffen, and O. R¨ uthing. Chaotic fixed point iterations. Technical Report MIP-9403, Univ. of Passau, 1994.
Using Edge-Valued Decision Diagrams
273
17. R. Hojati, R. K. Brayton, and R. P. Kurshan. BDD-based debugging of designs using language containment and fair CTL. In C. Courcoubetis, editor, Computer Aided Verification (CAV’93), volume 697 of LNCS, pages 41–58, Elounda, Greece, June/July 1993. Springer-Verlag. 18. J.R. Burch, E.M. Clarke, and D.E. Long. Symbolic model checking with partitioned transition relations. In A. Halaas and P.B. Denyer, editors, Int. Conference on Very Large Scale Integration, pages 49–58, Edinburgh, Scotland, Aug. 1991. IFIP Transactions, North-Holland. 19. T. Kam, T. Villa, R. Brayton, and A. Sangiovanni-Vincentelli. Multi-valued decision diagrams: theory and applications. Multiple-Valued Logic, 4(1–2):9–62, 1998. 20. Y.-T. Lai, M. Pedram, and B. K. Vrudhula. Formal verification using edge-valued binary decision diagrams. IEEE Trans. Comp., 45:247–255, 1996. 21. Y.-T. Lai and S. Sastry. Edge-valued binary decision diagrams for multi-level hierarchical verification. In Proceedings of the 29th Conference on Design Automation, pages 608–613, Los Alamitos, CA, USA, June 1992. IEEE Computer Society Press. 22. A. S. Miner and G. Ciardo. Efficient reachability set generation and storage using decision diagrams. In H. Kleijn and S. Donatelli, editors, Application and Theory of Petri Nets 1999 (Proc. 20th Int. Conf. on Applications and Theory of Petri Nets, Williamsburg, VA, USA), LNCS 1639, pages 6–25. Springer-Verlag, June 1999. 23. T. Murata. Petri Nets: properties, analysis and applications. Proc. of the IEEE, 77(4):541–579, Apr. 1989. 24. P. F. Williams, A. Biere, E. M. Clarke, and A. Gupta. Combining Decision Diagrams and SAT Procedures for Efficient Symbolic Model Checking. In Proceedings of CAV’00, pages 124–138, 2000.
Mechanical Verification of a Square Root Algorithm Using Taylor’s Theorem Jun Sawada1 and Ruben Gamboa2 1
2
IBM Austin Research Laboratory Austin, TX 78759 [email protected] Department of Computer Science University of Wyoming Laramie, WY 82071 [email protected]
Abstract. The IBM Power4TM processor uses Chebyshev polynomials to calculate square root. We formally verified the correctness of this algorithm using the ACL2(r) theorem prover. The proof requires the analysis on the approximation error of Chebyshev polynomials. This is done by proving Taylor’s theorem, and then analyzing the Chebyshev polynomial using Taylor polynomials. Taylor’s theorem is proven by way of non-standard analysis, as implemented in ACL2(r). Since a Taylor polynomial has less accuracy than the Chebyshev polynomial of the same degree, we used hundreds of Taylor polynomial generated by ACL2(r) to evaluate the error of a Chebyshev polynomial.
1
Introduction
We discuss the formal verification of a floating-point square root algorithm used in the IBM Power4TM processor. The same algorithm was first presented and proven, not formally, by Agarwal et al in [2]. Obviously, the drawback of a handproof is that it does not provide an absolute assurance of correctness. Formal verification gives a higher-level of confidence by mechanically checking every detail of the algorithm. The formal verification of square root algorithms used in industrial processors has been studied in the past. Russinoff used the ACL2 theorem prover [12] to verify the microcode of K5 Microprocessor [18]. Later he also verified the square root algorithm in the K7 microprocessor [19]. Aagaard et al. [1] verified the square root algorithm used in an Intel processor with the Forte system [15] that combines symbolic trajectory evaluation and theorem proving. The square root algorithms mentioned above use the Newton-Raphson algorithm or one of its variants. This algorithm starts with an initial estimate and interactively calculates a better estimate from the previous one. The formula to obtain the new estimate is relatively simple. It takes a few iterations to obtain an estimate that is accurate enough. This estimate is rounded to the final answer according to a specified rounding mode. In Newton-Raphson’s algorithm, many M.D. Aagaard and J.W. O’Leary (Eds.): FMCAD 2002, LNCS 2517, pp. 274–291, 2002. c Springer-Verlag Berlin Heidelberg 2002
Mechanical Verification of a Square Root Algorithm
275
instructions are dependent on earlier instructions. The algorithm may require more execution cycles on a processor with many pipeline stages and high latency. The IBM Power4 processor and its predecessor Power3TM processor use a different iteration algorithm. From the initial approximation, it obtains a better approximation using a Chebyshev polynomial. Polynomial calculation needs more instructions than a single iteration of the Newton-Raphson algorithm. However, only a single iteration is sufficient to obtain the necessary precision. Since instructions in the polynomial calculation are less dependent on earlier instructions than those in the Newton-Raphson algorithm, more instructions can be executed in parallel with a pipelined floating-point unit. We verify that this algorithm returns a final estimate accurate enough to guarantee that it is rounded to the correct answer. The verification was carried out with the ACL2(r) theorem prover [4]. ACL2(r) is an extension of the ACL2 theorem prover that performs reasoning on real numbers using non-standard analysis [17]. The verification of the square root algorithm took place in three steps: S1 Prove Taylor’s theorem. S2 Bound the error of a Chebyshev polynomial using the result from S1. S3 Prove the algorithm using the result from S2. One challenge for the formal verification of this algorithm is the error size analysis on the Chebyshev polynomial approximating the square root function. Our approach uses Taylor polynomials in the measurement of the error size of a Chebyshev polynomial. However, a Chebyshev polynomial gives a better approximation than a Taylor polynomial of the same degree, thus it cannot be done in a straightforward fashion. Certainly, we can use a high-degree Taylor polynomial to obtain a better precision, as was done by Harrison [8,9] in his analysis of exponential and trigonometric functions. In order to measure the error of a polynomial p(x) approximating function f (x), he used a high-degree Taylor polynomial t(x) which approximates f (x) far better than p(x). The upper bound of |t(x) − p(x)| can be obtained by calculating its value at the points where the derivatives of the polynomials satisfy t (x) − p (x) = 0. However, calculating all the roots of this equation is a major bottleneck in automating the proof. Our approach, instead, generates hundreds of Taylor polynomials that have no higher degree than p(x), and measure the error size of p(x) in divided segments. This approach does not require solving equations, and can be automated easily. This paper is organized as follows. In Section 2, we introduce the nonstandard analysis features of ACL2(r) that form a basis for our proof. In Section 3, we describe the proof of Taylor’s theorem in ACL2(r), which corresponds to the step S1. In Section 4, we describe the square root algorithm used in the Power4 processor and its verification, which corresponds to the step S3. This section assumes that certain proof obligations are met. These proof obligations are proven in Section 5, using Taylor’s theorem. This corresponds to the step S2. Finally, we conclude in Section 6.
276
2
J. Sawada and R. Gamboa
ACL2(r): Real Analysis Using Non-standard Analysis
Non-standard analysis, introduced by Robinson in the 1960s using model theoretic techniques and later given an axiomatization by Nelson [17,14], provides a rigorous foundation for the informal reasoning about infinitesimal quantities used by Leibniz when he co-invented the calculus and still used today by engineers and scientists when applying calculus. There are several good introductions to non-standard analysis, for example [16,13]. In this section, we give the reader enough of the background to follow subsequent discussions. Non-standard analysis changes our intuitive understanding of the real number line in a number of ways. Some real numbers, including all numbers that are uniquely determined by a first-order formula, such as 0, 1, e, and π, are called standard. There are real numbers that are larger in magnitude than all the standard reals; these numbers are called i-large. Numbers that are not i-large are called i-limited. Moreover, there are reals smaller in magnitude than any positive standard real; these numbers are called i-small. It follows that 0 is the only number that is both standard and i-small. Notice that if N is an i-large number, 1/N must be i-small. Two numbers are called i-close if their difference is i-small. It turns out that every i-limited number is i-close to a standard number. That is, if x is i-limited, it can be written as x = x∗ + , where x∗ is standard and is i-small. The number x∗ is called the standard-part of x. The terms i-large, i-small, and i-close give mathematical precision to the informal ideas “infinitely large,” “infinitely small,” and “infinitely close.” These informal notions are ubiquitous in analysis, where they are often replaced by formal statements about series or by − δ arguments. A feature of non-standard analysis is that it restores the intuitive aspects of analytical proofs. For example, the sequence {an } is said to converge to the limit A if and only if aN is i-close to A for all i-large N . This agrees with the intuitive notion of convergence: “an gets close to A when n is large enough.” Similarly, consider the notion of derivatives: the function f has derivative f (x) at a standard point x if and only if (f (x) − f (y))/(x − y) is i-close to f (x) whenever x is i-close to y. Again, the formal definition follows closely the intuitive idea of derivative as the slope of the chord with endpoints “close enough.” The non-standard definition principle allows the definition of functions by specifying√ their behavior only at standard points. For example, consider the function x. One way to define it is to provide an approximation scheme fn (x) so that {f √ square root of x. For standard points x, the √n (x)} converges to the function x can be defined by x = (fN (x))∗ , where N is an i-large integer. Using the non-standard definitional principle, this function defined over standard √ numbers is extended to the function x defined over the entire real number line. The transfer principle allows us to prove a first-order statement P (x) about the reals by proving it only when x is standard. This principle can be applied only when the statement P (x) is a statement without using the new functions of non-standard analysis, such as standard,√ i-large, i-small, i-close, or standard-part. Consider the example given above for x. The function fN (x) is an approximation to the square root of x, so it is reasonable that fN (x) · fN (x) is i-close
Mechanical Verification of a Square Root Algorithm
277
to x when x is i-limited and N is i-large. In fact, such a theorem can proved in using induction on N . What this means is that for standard x, √ √ ACL2(r) ∗ ∗ ∗ x · x = (fN (x)) √ ·√(fN (x)) = (fN (x) · fN (x)) = x. The transfer principle then establishes x · x = x for all x. Using the non-standard definition and transfer principles in tandem is a powerful and ubiquitous technique in ACL2(r). To illustrate it, we present a proof of the maximum theorem in ACL2(r). The theorem states that if f is a continuous function on the closed interval [a, b], there is a point x ∈ [a, b] so that f (x) ≥ f (y) for all y ∈ [a, b]. This theorem is used in the proof of Rolle’s Lemma, which in turn is the key to proving Taylor’s Theorem. We begin by introducing an arbitrary continuous function f in a domain. This can be done in ACL2 using the encapsulate event: (encapsulate ((f (x) t) (domain-p (x) t)) (local (defun f (x) x)) (local (defun domain-p (x) (realp x))) (defthm domain-real (implies (domain-p x) (realp x))) (defthm domain-is-interval (implies (and (domain-p l) (domain-p h) (realp x) (<= l x) (<= x h)) (domain-p x))) (defthm f-standard (implies (and (domain-p x) (standard-numberp x)) (standard-numberp (f x)))) (defthm f-real (implies (domain-p x) (realp (f x)))) (defthm f-continuous (implies (and (domain-p x) (standard-numberp x) (domain-p y) (i-close x y)) (i-close (f x) (f y)))) ) ACL2’s encapsulate mechanism allows the introduction of constrained functions. This event introduces the functions f and domain-p. The first argument of the encapsulate establishes that they are unary functions. The definitions
278
J. Sawada and R. Gamboa
of f and domain-p are marked local, and they are not available outside of the encapsulate. Their only purpose is to demonstrate that there are some functions which satisfy the given constraints. The constraints are specified by the defthm events inside of the encapsulate. These constraints serve to make domain-p an arbitrary function that accepts intervals of real numbers, and f an arbitrary standard, real, and continuous function. To show that f achieves its maximum on a closed interval, we split the interval [a, b] into n subintervals of size = b−a n . It is easy to define a function that finds the point a + k · where f achieves the maximum of the points in the -grid of points a + i · . That much of the reasoning uses only the traditional concepts in ACL2, notably recursion and induction. Non-standard analysis takes center stage when we consider what happens when n is i-large, hence when is i-small. Consider the point xmax = (a + k · )∗ . This is a standard point, since it is the standard-part of an i-limited point. Let y be a standard point in [a, b]. Since y ∈ [a, b], there must be an i so that y ∈ [a + (i − 1) · , a + i · ]. Since is i-small, it follows that y is i-close to a + i · , so f (y) = (f (a + i · ))∗ from the continuity of f . But from the definition of xmax it follows that f (y) = (f (a + i · ))∗ ≤ (f (a + k · ))∗ = f (xmax ). This suffices to show that f achieves its maximum over standard points y ∈ [a, b] at xmax . Using the transfer principle, we have that f achieves its maximum over [a, b] at xmax . In ACL2(r), we prove the result by defining the function find-max-f-x-n which finds the point a + k · where f achieves its maximum over the -grid as: (defun find-max-f-x-n (a max-x i n eps) (if (and (integerp i) (integerp n) (<= i n) (realp a) (realp eps) (< 0 eps)) (if (> (f (+ a (* i eps))) (f max-x)) (find-max-f-x-n a (+ a (* i eps)) (1+ i) n eps) (find-max-f-x-n a max-x (1+ i) n eps)) max-x)) It is a simple matter to prove that this function really does find the maximum in the grid. Moreover, under natural conditions, the point it finds is in the range [a, b]. The next step is to use the non-standard definitional principle to introduce the function that picks the actual maximum over [a, b]. In ACL2(r) this is done by using the event defun-std in place of defun to define the function: (defun-std find-max-f-x (a b) (if (and (realp a) (realp b) (< a b)) (standard-part (find-max-f-x-n a a 0 (i-large-integer) (/ (- b a) (i-large-integer)))) 0)) The function i-large-integer in ACL2(r) is used to denote a positive i-large integer. Since find-max-f-x-n is in the range [a, b], we can use the transfer principle to show that find-max-f-x is also in [a, b].
Mechanical Verification of a Square Root Algorithm
279
A simple inductive argument can establish that find-max-f-x-n finds the point in the -grid where f achieves its maximum. Taking the standard-part of both sides of this inequality shows that the point selected by find-max-f-x is also a maximum over the points in the grid: (defthm find-max-f-is-maximum-of-grid (implies (and (realp a) (standard-numberp a) (realp b) (standard-numberp b) (< a b) (domain-p a) (domain-p b) (integerp i) (<= 0 i) (<= i (i-large-integer))) (<= (standard-part (f (+ a (* i (/ (- b a) (i-large-integer)))))) (f (find-max-f-x a b))))) Once we have this result, we can see that find-max-f-x finds the point where f achieves its maximum on the standard points of [a, b]. It is simply necessary to observe that any standard point x ∈ [a, b] must be i-close to some point a + i · in the -grid of [a, b]. (defthm find-max-f-is-maximum-of-standard (implies (and (realp a) (standard-numberp a) (realp b) (standard-numberp b) (realp x) (standard-numberp x) (domain-p a) (domain-p b) (<= a x) (<= x b) (< a b)) (<= (f x) (f (find-max-f-x a b))))) To complete the proof, it is only necessary to invoke the transfer principle on the theorem above. In ACL2(r) this is done by using the event defthm-std when the theorem is proved: (defthm-std find-max-f-is-maximum (implies (and (realp a) (domain-p a) (realp b) (domain-p b) (realp x) (<= a x) (<= x b) (< a b)) (<= (f x) (f (find-max-f-x a b))))) The techniques described above, combining defun-std and defthm-std, have been used extensively in ACL2(r), resulting in proofs as varied as the correctness of the Fast Fourier Transform and the fundamental theorem of calculus [5,11]. A more complete account of ACL2(r) can be found in [4,6].
3
Proof of Taylor’s Theorem
Given a function f with n continuous derivatives on an interval [a, b] Taylor’s formula with remainder provides a means for estimating f (x) for an arbitrary x ∈ [a, b] from the values of f and its derivatives at a. Formally, Taylor’s Theorem can be stated as follows:
280
J. Sawada and R. Gamboa
Taylor’s Theorem If f (n) (x) is continuous in [x0 , x0 + δ] and f (n+1) (x) exists in [x0 , x0 + δ], then there exists ξ ∈ (x0 , x0 + δ) such that f (x0 + δ) = f (x0 ) + f (x0 )δ + =
n−1 i=0
The term f (n) (ξ) n n! δ
n−1 i=0
f (x0 ) 2 f (n−1) (x0 ) (n−1) f (n) (ξ) n + δ + ··· + δ δ 2! (n − 1)! n!
f (i) (x0 ) i f (n) (ξ) n δ + δ i! n!
f (i) (x0 ) i δ i!
is called the Taylor polynomial of f of degree n−1, and
is called the corresponding Taylor remainder. The Taylor polynomial is often used to approximate f (x); the approximation error can be estimated using the Taylor remainder. The proof of this theorem, presented in [3] among others, is similar to the proof of the mean value theorem. First, a special function F is constructed, and then Rolle’s Lemma is applied to F to find a ξ for which F (ξ) = 0. Taylor’s formula follows from solving F (ξ) = 0 for f (x0 + δ). From the theorem prover perspective, the main challenge is to find F , which can be done by repeated application of the theorems concerning the derivatives of composition of functions, and then to solve the equation accordingly. To formalize Taylor’s Theorem in ACL2(r), we use encapsulate to introduce the constrained function f. The derivatives of f are constrained in the function f-deriv, which takes arguments corresponding to i and x in f (i) (x). The definition of f-deriv is rather tricky, because we need the definition of f (i) (x) in order to define f (i+1) (x), and ACL2(r) does not allow the use of non-standard features in a recursive definition. Instead, we define it as follows: (encapsulate ((f (x) t) (f-deriv (i x) t) (domain-p (x) t) (tay-n () t)) (local (defun tay-fn (i n a x)
0))
(local (defun tay-fn-deriv (i n a x)
0))
... (defthm f-deriv-0 (implies (domain-p x) (equal (f-deriv 0 x) (f x)))) (defthm f-deriv-chain
Mechanical Verification of a Square Root Algorithm
)
281
(implies (and (standard-numberp x) (domain-p x) (domain-p y) (i-close x y) (not (= x y)) (integerp i) (<= 0 i) (<= i (tay-n))) (i-close (/ (- (f-deriv i x) (f-deriv i y)) (- x y)) (f-deriv (1+ i) x))))
In the interest of brevity, we present only the local definition and constraints of f and f-deriv in this encapsulate. We locally defined functions f and f-deriv as a constant zero function. Remember the purpose of local definition of an encapsulate form is just to show the existence of concrete functions that satisfy the constraints and check the logical consistency. And the constant zero function serves for this purpose. The constraints on f-deriv follow the non-standard definition of derivatives as discussed in section 2. Also constrained is the bound tay-n on the order of the Taylor approximation. Note in particular that for i greater than tay-n, we do not assume that f-deriv continues to find derivatives, since these higher-order derivatives are not even assumed to exist as part of Taylor’s theorem. Once the i’th derivative is defined, the rest of the proof is rather straightforward. Details of this proof can be found in [7].
4 4.1
Verification of a Square Root Algorithm Description of the Algorithm
Following is the description of the square root algorithm used in the Power4 processor. First we introduce a few functions. We define expo(x) as the function that returns the exponent of x. Function ulp(x, n) returns the unit of least position; that is defined as: ulp(x, n) = 2expo(x)−n+1 . This corresponds to the magnitude of the least significant bit of an n-bit precision floating point number x. Function near(x, n) rounds a rational number x to a floating-point number with n-bit precision using IEEE nearest rounding mode. Function rnd(x, m, n) rounds x to an n-bit precision floating-point number using rounding mode m, where m can be near, trunc, inf or minf corresponding to the four IEEE rounding modes [10]. The predicate exactp(x, n) is true if x can be exactly represented as an n-bit IEEE floating-point number. Thus exactp(near(x, n), n) and exactp(rnd(x, m, n), n) are true. These functions, except ulp(x, n), are defined in the library distributed with the ACL2 system. We assume that b is an IEEE double precision floating-point number satisfying 1/2 ≤ b < 2 and discuss the algorithm to calculate the square root of b. This restricted algorithm can be easily extended to the full range of IEEE
282
J. Sawada and R. Gamboa
Table 1. Double-precision floating-point square root algorithms used in Power4. √ Algorithm to calculate b Look up y0s Look up q0 e := near(1 − y0s × b, 53) q0s := near(*root2* × q0 , 53) if 1 ≤ b < 2 := q0 if 1/2 ≤ b < 1 t3 := near(c4 + c5 × e, 53) t4 := near(c2 + c3 × e, 53) esq := near(e × e, 53) t5 := near(c0 + c1 × e, 53) e1 := near(q0s × q0s − b, 53) t1 := near(y0s × q0s , 53) t6 := near(t4 + esq × t3 , 53) q0e := near(t1 × e1 , 53) t7 := near(t5 + esq × t6 , 53) q1 := q0s + q0e × t7 sqrt-round(q1 ,b,mode)
floating-point numbers, because the exponent of the square root is calculated by dividing the exponent of b by 2. In this algorithm, an on-chip table provides the initial 12-bit estimate of the square root of b, which is named q0 . Another on-chip table gives the 53-bit rounded value of 1/q02 , which is denoted y0 . In order to reduce the size of the on-chip tables, the tables entries exist only for 1/2 ≤ b < 1. When 1 ≤ b < 2, the algorithm looks up the table entries for b/2 and adjusts them by dividing y0 by 2 and √ multiplying q0 by *root2*, which is a precomputed 53-bit representation of 2. The adjusted values are called y0s and q0s , respectively. √ Let e = 1 − y0s b. The difference between the square of q0s and b can be calculated as: 2 2 2 q0s − b q0s (1 − by0s ) = q0s e. √ By solving this equation with respect to b, we get √ √ b q0s 1 − e q0s (1 + c0 e + c1 e2 + c2 e3 + c3 e4 + c4 e5 + c5 e6 ) where 1 + c0 e + · · · + c5 e6 is a Chebyshev polynomial approximating Further manipulation of the right-hand side leads to the following: √ b q0s + q0s e(c0 + · · · c5 e5 ) = q0s + q0s (1 − y0s b)(c0 + · · · c5 e5 ) 2 y0s − y0s b)(c0 + · · · c5 e5 )
q0s + q0s (q0s 2 = q0s + q0s y0s (q0s − b)(c0 + · · · c5 e5 ).
√
1 − e.
Mechanical Verification of a Square Root Algorithm
283
The algorithm in Table 1 uses this equation to calculate a better approxi√ mation q1 of b. The procedure to obtain y0s from y0 is not explicit in Table 1 as it is simply an exponent adjustment. Chebyshev coefficients c0 through c5 are 53-bit precision floating-point numbers obtained from an on-chip table. In fact, we use two sets of Chebyshev coefficients, one of which is intended for the segment 0 ≤ e ≤ 2−6 and the other for the segment −2−6 ≤ e ≤ 0. Let c0p , c1p , c2p , c3p , c4p and c5p be the set of coefficients intended for the positive segment, and c0n , c1n , c2n , c3n , c4n and c5n for the negative segment. In our algorithm, the 6th fraction bit of b, instead of the polarity of e, determines which set of coefficients will be used. This can be justified by the fact that e tends to be positive when the 6th fraction bit of b is 0, and negative otherwise. However, this relation between the 6th fraction bit of b and the polarity of e is not always true, and we must verify that this heuristic in selecting Chebyshev coefficients does not cause too much error. We will come back to this in Section 5. The function sqrt-round(q1 , b, m) at the end of the algorithm represents the hardwired rounding mechanism for the√square root algorithm. It rounds the final estimate q1 to the correct answer rnd( b, m, 53), if the error of the final estimate q1 is less than a quarter of the ulp. Thus our verification objective is to prove the following theorem: Theorem 1. For any 1/2 ≤ b < 2, |Ef inal | < 2−55 ≤ ulp(b, 53)/4. Although we analyzed a paper description of the rounding mechanism, the rounding implemented in Power4 is not part of our verification.
4.2
Verification of the Algorithm
The proof of the square root algorithm has been mechanically checked by the ACL2(r) prover. The proof outline is basically the same as that provided by Agarwal et al[2]. We describe the proof from the perspective of mechanization, and explain what must be proven with Taylor’s theorem. First we define the intermediate values q0s , y0s , e, t3 , t4 , esq, t5 , e1 , t1 , t6 , q0e , t7 and q1 that appear in Table 1 as ACL2(r) functions. These are, in fact, functions of b, but we omit the argument b for simplicity in this paper. The same is true for the Chebyshev coefficients c0, c1, c2, c3, c4 and c5, which are selected from two sets of coefficients depending on b. For each of the intermediate values, we ˜ t˜5 , e˜1 , t˜1 , t˜6 , q˜0e and t˜7 as the infinitely precise value before define e˜, t˜3 , t˜4 , esq, rounding. We define re , rt3 , rt4 , resq , rt5 , re1 , rt1 , rt6 , rq0e and rt7 as the values added to the infinitely precise values by rounding. Formally speaking:
284
J. Sawada and R. Gamboa
e˜ t˜3 t˜4 esq ˜ t˜5 e˜1 t˜1 t˜6 q˜0e t˜7
= 1 − y0s × b = c4 + c5 × e = c2 + c3 × e =e×e = c0 + c1 × e = q0s × q0s − b = y0s × q0s = t4 + esq × t3 = t1 × e1 = t5 + esq × t6
e t3 t4 esq t5 e1 t1 t6 q0e t7
= near(˜ e, 53) = near(t˜3 , 53) = near(t˜4 , 53) = near(esq, ˜ 53) = near(t˜5 , 53) = near(e˜1 , 53) = near(t˜1 , 53) = near(t˜6 , 53) = near(q˜0e , 53) = near(t˜7 , 53)
re rt3 rt4 resq rt5 r e1 rt1 rt6 rq0e rt7
= e − e˜ = t3 − t˜3 = t4 − t˜4 = esq − esq ˜ = t5 − t˜5 = e1 − e˜1 = t1 − t˜1 = t6 − t˜6 = q0e − q˜0e = t7 − t˜7
2 We also define µ as µ = y0s q0s − 1. From an automatic case-analysis of the look-up table, ACL2(r) can show that |˜ e| < 2−6 , |µ| ≤ 397/128 × 2−53 , 50/71 ≤ q0s < 71/50 and 1/2 ≤ y0s < 2. The amount rounded off by the nearest-mode rounding is at most half of the ulp as stated in the following lemma.
Lemma 1. For rational number x and a positive integer n, |near(x, n) − x| ≤ ulp(x, n)/2 By applying this lemma, we can show that |re | ≤ 2−60 . Furthermore, from the definition given above and Lemma 1, ACL2(r) proves that other intermediate values satisfy the following conditions. |t˜3 | ≤ 2−5 + 2−11 |t˜4 | ≤ 2−4 + 2−10 |esq| ˜ ≤ 2−12 ˜ |t5 | ≤ 2−1 + 2−7 |e˜1 | ≤ 2−5 + 2−50 |t˜1 | < 182 128 |t˜6 | ≤ 2−4 + 2−9 −5 |q˜0e | ≤ 182 128 × 2 −1 |t˜7 | ≤ 2 + 2−6
|t3 | ≤ 2−5 + 2−11 |t4 | ≤ 2−4 + 2−10 |esq| ≤ 2−12 |t5 | ≤ 2−1 + 2−7 |e1 | ≤ 2−5 + 2−50 |t1 | < 182 128 |t6 | ≤ 2−4 + 2−9 −5 |q0e | ≤ 182 128 × 2 −1 |t7 | ≤ 2 + 2−6
|rt3 | ≤ 2−58 |rt4 | ≤ 2−57 |resq | ≤ 2−66 |rt5 | ≤ 2−54 |re1 | ≤ 2−58 |rt1 | ≤ 2−53 |rt6 | ≤ 2−57 |rq0e | ≤ 2−58 |rt7 | ≤ 2−54
Next we represent each intermediate value as the sum of the formula the intermediate value is intended to represent and an error term. For example, esq ere + re2 + resq . is the sum of e˜ × e˜ and the error term Eesq = 2˜ esq = (˜ e + re ) × (˜ e + re ) + resq = e˜ × e˜ + 2˜ ere + re2 + resq = e˜ × e˜ + Eesq , From the magnitude of the intermediate values, the size of the error term Eesq can be calculated as: e||re | + |re |2 + |resq | < 2−64 |Eesq | < 2|˜
Mechanical Verification of a Square Root Algorithm
285
Similarly, with appropriate error terms Eq0e , Et3 , Et4 , Et5 , Et6 and Et7 , we can represent qe0 , t3 , t4 , t5 , t6 and t7 in the following way. |Eq0e | ≤ 2−56 |Et3 | ≤ 2−58 + 2−65 |Et4 | ≤ 2−57 + 2−64 |Et5 | ≤ 2−54 + 2−61 |Et6 | ≤ 2−56 + 2−63 |Et7 | ≤ 2−53 + 2−60
q0e = q0s (˜ e + µ) + Eq0e t3 = c4 + c5 e˜ + Et3 t4 = c2 + c3 e˜ + Et4 t5 = c0 + c1 e˜ + Et5 t6 = c2 + c3 e˜ + c4 e˜2 + c5 e˜3 + Et6 t7 = c0 + c1 e˜ + c2 e˜2 + c3 e˜3 + c4 e˜4 + c5 e˜5 + Et7
Let P (x) denote the polynomial1 c0 + c1 x + c2 x2 + c3 x3 + c4 x4 + c5 x5 . Then we can represent the final estimate q1 as: |Eq1 | ≤ 2−56
q1 = q0s + q0s (˜ e + µ)P (˜ e) + Eq1
with an appropriate error term Eq1 . We are going to rewrite the last equation using a number of series approximation. Let us define Esu = 1 + µ − (1 + µ/2) √ Ese = 1 − e˜ − (1 − e˜/2) √ e)) Echeb = 1 − e˜ − (1 + e˜P (˜ Further we define the following error terms: e) − (−1/2 − e˜/8) Epet2 = P (˜ √ Esb = q0s × Ese − b × (Esu + µ/2) Ef inal = −3/8 × q0s µ˜ e − q0s Echeb + µEsb /2 + Then we can prove that q1 =
√
√
bEsu + Eq1 + q0s µEpet2
b + Ef inal .
The hand proof of this equation is long and tedious, but ACL2(r) can prove it using only five defthm proof commands. The details of the proof are provided in [21]. Let us assume that the following inequalities hold: |Esu | ≤ 2−105 |Ese | ≤ 2−15 + 2−19 −58
|Echeb | ≤ 3/2 × 2
(1) (2) (3)
Then, we can prove from the definition of Ef inal that |Ef inal | < 2−55 ≤ ulp(b, 53)/4. This will prove Theorem 1, given that proof obligations (1)-(3) are resolved in the next section. 1
Since coefficients c0 through c5 depend on b, P (x) depends on b as well. In ACL2(r), we define it as a function that takes b and x as its arguments.
286
5 5.1
J. Sawada and R. Gamboa
Use of Taylor’s Theorem in Error Calculation Simple Application
In this section, we prove the inequalities (1) and (2) from the previous section that give the upper bounds of |Esu | and |Ese |. Since the square root function is infinitely differentiable and its derivatives are continuous on the positive domain, we can apply Taylor’s theorem on its entire positive domain. Let us define a(n, x) as: n−1 1 1 1 a(n, x) = ( − i)x 2 −n . n! i=0 2
√ Function a(n, x0 ) gives the n’th Taylor coefficient for x at x0 . Thus Taylor’s equation presented in Section 3 can be written for square root as: n−1 x0 + δ = a(i, x0 )δ i + a(n, ξ)δ n . i=0
Given that (nth-tseries-sqrt i x0 δ) represents a(i, x0 )δ i , we define the n’th degree Taylor polynomial in the ACL2(r) logic as follows: (defun tseries-sqrt (n x d) (if (zp n) 0 (+ (nth-tseries-sqrt (- n 1) x d) (tseries-sqrt (- n 1) x d)))) We also define the term ξ in Taylor’s theorem as the function (taylor-sqrt-xi n x δ) using the same principle as find-max-f-x in Section 2. Then Taylor’s theorem proven earlier can be instantiated to the following theorem: (defthm taylor-theorem-on-sqrt (implies (and (integerp n) (< 1 n) (<= n (tay-degree-ubound)) (realp x) (realp d) (< 0 x) (< 0 (+ x d))) (equal (acl2-sqrt (+ x d)) (+ (tseries-sqrt n x d) (nth-tseries-sqrt n (taylor-sqrt-xi n x d) d)))) We can also prove that (taylor-sqrt-xi n x δ) returns a real number that is in the open segment (x, x + δ). The condition (<= n (tay-degree-ubound)) guarantees that n is i-limited in the sense discussed in Section 2. Using this theorem, we will prove the upper bound of error terms. An upper bound of |Esu | can be directly calculated by applying Taylor’s theorem. √ Since Esu is equal to the second degree Taylor remainder for the function 1 + µ at µ = 0: 3 1 Esu = 1 + µ − (1 + µ/2) = − (1 + ξ)− 2 µ2 . 8
Mechanical Verification of a Square Root Algorithm
Since |ξ| < |µ| ≤
397 128
|Esu | <
287
× 2−53 , an upper bound of |Esu | is given as 3 397 1 397 (1 − × 2−53 )− 2 × ( × 2−53 )2 < 2−105 . 8 128 128
Similarly, the upper bound for |Ese | can be calculated as |Ese | <
3 1 × (1 − 2−6 )− 2 × (2−6 )2 < 2−15 + 2−19 . 8
5.2
Use of Taylor Approximation on Divided Segments √ e))| and prove Now we calculate an upper bound of |Echeb | = | 1 − e˜ − (1 + e˜P (˜ the inequality (3). Since the Chebyshev polynomial 1 + e ˜ P (˜ e ) is a better approx√ imation of 1 − e˜ than the Taylor polynomial of the same degree, the error size analysis using a Taylor polynomial is not straightforward. Our approach is to divide the range of e˜ into small segments, generate a Taylor polynomial for each segment, and separately calculate the error of the Chebyshev polynomial in individual segments. A divided segment should be small enough so that the generated Taylor polynomial is far more accurate than the Chebyshev polynomial. The range of e˜ is [−2−6 , 2−6 ]. We divided it into 128 segments of size 2−12 to achieve this goal. In order to carry out the proof on many segments efficiently, it is critical to automate the error analysis at each segment. One of the major obstacles to automatic analysis is that ACL2(r) cannot directly compute the square root, because acl2-sqrt is defined using non-standard analysis, and it might return irrational numbers which cannot be computed by ACL2(r). Because of this reason, we used a function approximating the square root function with an arbitrary precision. x ) √ √ The ACL2(r) function (iter-sqrt this paper, we write x to denote returns a rational number close to √x. In √ √ √ this function. This function satisfies x × x ≤ x and x − x ×√ x √ < for a positive rational number . From this, we can easily prove that x − x ≤ max( , /x). Using this function, we define an ACL2(r) function a (x, i, η) that calculates the approximation of a(i, x0 ). More precisely, a (x, i, η) =
n−1 √ 1 (1/2 − i) xη x−n . n! i=0
Then we can show |a(x, n) − a (x, n, η)| ≤
n−1 1 (1/2 − i) × max(η, η/x)x−n . n! i=0
As discussed in Section 4, our algorithm selects Chebyshev coefficients from two sets of constants depending on the 6th fraction bit of b. Let Chebp (e) = 1 + c0p e + c1p e2 + c2p e3 + c3p e4 + c4p e5 + c5p e6 Chebn (e) = 1 + c0n e + c1n e2 + c2n e3 + c3n e4 + c4n e5 + c5n e6
288
J. Sawada and R. Gamboa
√ Then Echeb = 1 − e˜ − Chebp (˜ e) when the 6th fraction bit of b is 0, and Echeb = √ 1 − e˜ − Chebn (˜ e) when it is 1. Let us calculate the size of Echeb for the case where the 6th fraction bit of b is 1. Even though the heuristics discussed in Section 4 suggests that e˜ tends to be negative in this case, a simple analysis shows that −2−6 ≤ e˜ ≤ 3/2 × 2−12 ; e could take some positive numbers. We analyze the entire √ domain of e˜ by dividing e), where it into 66 small segments. We substitute e0 −eδ for e˜ in 1 − e˜−Chebn (˜ e0 is one of the 66 constants −63 × 2−12 , −62 × 2−12 , . . ., 2 × 2−12 , while eδ is a new variable that satisfies 0 ≤ eδ ≤ 2−12 . The upper bound for the entire domain of e˜ is simply the maximum value of all the upper bounds for the 66 segments. The upper bound for |Echeb | can be represented as the summation of three terms. 5 √ √ | 1 − e0 + eδ − Chebn (e0 − eδ )| ≤ 1 − e0 + eδ − a(1 − e0 , i)eiδ + i=0 5 5 i i a(1 − e0 , i)eδ − a (1 − e0 , i, η)eδ + i=0 i=0 5 a (1 − e0 , i, η)eiδ − Chebn (e0 − eδ ) . i=0 An upper bound for the first term can be given by applying Taylor’s theorem. 5 5 √ 11 1 1 | 1 − e0 + eδ − a(1 − e0 , i)eiδ | ≤ |a(ξ, 6)e6δ | ≤ | − i||ξ − 2 ||e6δ | 6! 2 i=0
<
i=0
5 1 1
6!
(
i=0
2
− i) × max((1 − e0 )−6 , (1 − e0 )−5 ) × 2−78 .
Here ξ is the constant satisfying Taylor’s theorem such that 1 − e0 < ξ < 1 − e0 + eδ . Note that this upper bound can be calculated by ACL2(r) as it does not contain square root nor variables. The upper bound for the second term can be calculated as follows: n−1 n−1 n−1 a(1 − e0 , i)eiδ − a (1 − e0 , i, η)eiδ ≤ |a(1 − e0 , i) − a (1 − e0 , i, η)| eiδ i=0 i=0 i=0 n−1 i−1 −13i 2 −i ≤
i=0
i!
(1/2 − j) × max(η, η/(1 − e0 )) × (1 − e0 )
.
j=0
We chose η to be 2−60 to make this term small enough. Again the upper bound has no variables involved and can be calculated by ACL2(r). The third term is the difference between the Chebyshev series approximation and the Taylor series approximation. Since e0 and η are constant in the third 5 term, we can simplify the term i=0 a (1−e0 , i, η)eiδ −Chebn (e0 −eδ ) into a polynomial of eδ of degree 6. Here having the computational function a (1 − e0 , i, η)
Mechanical Verification of a Square Root Algorithm
289
rather than the real Taylor coefficient allows ACL2(r) to automatically simplify 6 the formula. We denote the resulting polynomial as i=0 bi eiδ , where coefficient bi is a constant automatically calculated by ACL2(r) during the simplification. Then the upper bound can be given as 5 6 6 i a (1 − e0 , i, η)eδ − Chebn (e0 − eδ ) = | bi eiδ | ≤ |bi | × 2−13i . i=0
i=0
i=0
By adding the three upper bounds, we can prove that √ | 1 − e0 + eδ − Chebn (e0 − eδ )| < 3/2 × 2−58 . for all 66 values for e0 . This is the upper bound √ of |Echeb | when the 6th fraction bit of b is 1. Similarly, we can prove that | 1 − e0 + eδ − Chebp (e0 − eδ )| < 3/2×2−58 for the case where the 6th fraction bit is 0. In this case, −6/5×2−12 ≤ e˜ ≤ 2−6 . Since the ranges of e˜ are overlapping for the two cases, we repeat the upper bound analysis on some segments. Summarizing the two cases, |Echeb | has the upper bound 3/2×2−58 . This will complete the proof of the previous section. Each step of this proof has been mechanically checked by ACL2(r). By making the upper bounds of the error terms computational by ACL2(r), this unbound calculation for the hundreds of segments was performed automatically.
6
Discussion
We mechanically proved Taylor’s theorem using non-standard analysis implemented in ACL2(r), and then we used it to formally verify that the Power4 square root algorithm satisfies an error size requirement. One major challenge for its verification was evaluating the approximation error for the Chebyshev polynomial. We have performed error size calculation of the Chebyshev polynomial in hundreds of small segments. For each segment, a Taylor polynomial is generated to evaluate the approximation error of the Chebyshev polynomial. This type of proof can be carried out only with a mechanical theorem prover or other type of computer program, because the simplification of hundreds of formulae is too tedious for humans to carry out correctly. One might wonder why we did not prove theorems about Chebyshev series and use them in the error size analysis. One answer is that the mathematics behind Chebyshev series is much more complex than Taylor series. We believe our approach is a good mix of relatively simple mathematics and the power of mechanical theorem proving. The upper bound proof of the Chebyshev polynomial was carried out automatically after providing the following: 1. ACL2(r) macros that automates the proof for small segments. 2. Computed hints that guides the proof by case analysis. 3. A set of rewrite rules that simplify a polynomial of rational coefficients.
290
J. Sawada and R. Gamboa
An ACL2(r) macro works like a CommonLisp macro; a macro is syntactically translated before it is evaluated by ACL2(r). ACL2(r) computed hint [20] is a programmable mechanism to control ACL2(r) proof, like tactics in the HOL theorem prover. We wrote a macro that is translated into the collection of defthm commands, each of which proves the error size on each segment. Evaluation of this macro generates hundreds of lemmas on the divided segments. The computed hint is used to combine the results together; it first case-splits the proof on an entire range into the analysis of divided segments, and then pick and apply to each case the appropriate lemma proven by the macro. Since this proof with macros and computed hints is parametrized and automatic, we could change the parameter setting to try different proof configura√ tions. For example, we changed the segment size and η used to calculate xη . In fact, Chebyshev polynomial approximation error was obtained by trial-and-error. At first, we set a relatively large number to an ACL2(r) constant *apx error* and ran the prover to verify |Echeb | <*apx error*. If it is successful, we lowered the value of *apx error*, iterated the process until the proof failed. The approximation error analysis using Taylor polynomial requires less computational power than brute-force point-wise analysis. When |˜ e| ≤ 2−6 , the value √ −7 1 − e˜ 1 − e˜/2 ranges approximately from 1 − 2 to 1 + 2−7 . In order to prove that the error of its Chebyshev approximation is less than 1.5 × 2−58 , our estimate suggests that we need to check over 250 points, assuming the monotonicity of the square root function. On the other hand, the entire verification of the square root algorithm using Taylor’s polynomials took 673 seconds on a Pentium III 400MHz system. It is not sheer luck that we could finish the error calculation by analyzing only hundreds of segments. Because the n’th degree Taylor remainder for the square root function is O(dn ) for the segment size d, the approximation error by a Taylor polynomial quickly converges to 0 by making the segment smaller when n is, say, 6. We believe that we can apply our technique to other algorithms involving approximation polynomials. Acknowledgment. We thank Brittany Middleton for proving basic theorems on continuous functions and their derivatives. We also acknowledge the usefulness of the floating-point library distributed with ACL2, which was developed by David Russinoff.
References 1. M. D. Aagaard, R. B. Jones, R. Kaivola, K. R. Kohatsu, and C.-J. H. Seger. Formal verification of iterative algorithms in microprocessors. Proceedings Design Automation Conference (DAC 2000), pages 201 – 206, 2000. 2. R. C. Agarwal, F. G. Gustavson, and M. S. Schmookler. Series approximation methods for divide and square root in the Power3 processor. In Proceedings of the 14th IEEE Symposium on Computer Arithmetic, pages 116–123, 1999. 3. W. Fulks. Advanced Calculus: an introduction to analysis. John Wiley & Sons, third edition, 1978.
Mechanical Verification of a Square Root Algorithm
291
4. R. Gamboa. Mechanically Verifying Real-Valued Algorithms in ACL2. PhD thesis, University of Texas at Austin, 1999. 5. R. Gamboa. The correctness of the Fast Fourier Trasnform: A structured proof in ACL2. Formal Methods in System Design, 20:91–106, January 2002. 6. R. Gamboa and M. Kaufmann. Nonstandard analysis in ACL2. Journal of Automated Reasoning, 27(4):323–351, November 2001. 7. R. Gamboa and B. Middleton. Taylor’s formula with remainder. In Proceedings of the Third International Workshop of the ACL2 Theorem Prover and its Applications (ACL2-2002), 2002. 8. J. Harrison. Verifying the accuracy of polynomial approximations in HOL. In E. L. Gunter and A. Felty, editors, Theorem Proving in Higher Order Logics: 10th International Conference, TPHOLs’97, volume 1275 of LNCS, pages 137–152. SpringerVerlag, 1997. 9. J. Harrison. Formal verification of floating point trigonometric functions. In W. A. Hunt and S. D. Johnson, editors, Formal Methods in Computer-Aided Design: Third International Conference FMCAD 2000, volume 1954 of LNCS, pages 217– 233. Springer-Verlag, 2000. 10. Institute of Electrical and Electronic Engineers. IEEE Standard for Binary Floating-Point Arithmetic. ANSI/IEEE Std 754-1985. 11. M. Kaufmann. Modular proof: The fundamental theorem of calculus. In M. Kaufmann, P. Manolios, and J. S. Moore, editors, Computer-Aided Reasoning: ACL2 Case Studies, chapter 6. Kluwer Academic Press, 2000. 12. M. Kaufmann and J. S. Moore. ACL2: An industrial strength version of nqthm. In Eleventh Annual Conference on Computer Assurance (COMPASS-96), pages 23–34. IEEE Computer Society Press, June 1996. 13. E. Nelson. On-line books: Internal set theory. Available on the world-wide web at http://www.math.princeton.edu/˜nelson/books.html. 14. E. Nelson. Internal set theory. Bulletin of the American Mathematical Society, 83:1165–1198, 1977. 15. J. O’Leary, X. Zhao, R. Gerth, and C.-J. H. Seger. Formally verifying IEEE compliance of floating-point hardware. Intel Technology Journal, Q1, Feb. 1999. 16. A. Robert. Non-Standard Analysis. John Wiley, 1988. 17. A. Robinson. Model theory and non-standard arithmetic, infinitistic methods. In Symposium on Foundations of Mathematics, 1959. 18. D. Russinoff. A Mechanically Checked Proof of Correctness of the AMDK5 Floating-Point Square Root Microcode. Formal Methods in System Design, 14(1), 1999. 19. D. M. Russinoff. A Mechanically Checked Proof of IEEE Compliance of the Floating Point Multiplication, Division, and Square Root Algorithm of the AMDK7 Processor. J. Comput. Math. (UK), 1, 1998. 20. J. Sawada. ACL2 computed hints: Extension and practice. In ACL2 Workshop 2000 Proceedings, Part A. The University of Texas at Austin, Department of Computer Sciences, Technical Report TR-00-29, Nov. 2000. 21. J. Sawada. Formal verification of divide and square algorithms using series calculation. Technical Report RC22444, IBM, May 2002.
A Specification and Verification Framework for Developing Weak Shared Memory Consistency Protocols Prosenjit Chatterjee and Ganesh Gopalakrishnan School of Computing, University of Utah http://www.cs.utah.edu/formal verification/fmcad02.html {prosen,ganesh}@cs.utah.edu
Abstract. A specification and verification methodology for Distributed Shared Memory consistency protocols implementing weak shared memory consistency models is proposed. Our approach uniformly describes a wide range of weak memory models in terms of a single concept—the visibility order of loads, stores, and synchronization operations, as perceived by all the processors. A given implementation is correct with respect to a weak memory model if it produces executions satisfying the visibility order for that memory model. Given an implementation, the designer annotates it with events from the visibility order, and runs reachability analysis to verify it against a specification that is also similarly annotated. A specification is obtained in two stages: first, the designer reverse engineers an intermediate abstraction from the implementation by replacing the coherence network with a logically equivalent concurrent data structure. The replacement is selected in a standard way, depending almost exclusively on the memory model. Verification of the intermediate abstraction against a visibility order specification can be accomplished using theorem-proving. The methodology was applied to four snoopybus protocols implementing aspects of the Alpha and Itanium memory models, with encouraging results.
1
Introduction
The distributed shared memory (DSM) approach plays a central role in multiprocessing. DSM computers include multiprocessor desktop machines, high-end machines [1], and chip-level multiprocessors [2,3,4]. With the growing mismatch between CPU speeds and memory system speeds, the variety of techniques used to hide memory latencies, such as out-of-order completion of shared memory consistency protocol actions, are on the increase. These make DSM hardware implementations highly error-prone and hard to verify. One school of thought says that despite such low-level optimizations, a simple shared memory consistency model, namely sequential consistency, must be presented to programmers [5]. However, this suggestion is not widely followed, as evidenced by most modern microprocessors supporting very weak memory models [6,7,8]. The reason
This work was supported by NSF Grants CCR-9987516 and CCR-0081406
M.D. Aagaard and J.W. O’Leary (Eds.): FMCAD 2002, LNCS 2517, pp. 292–309, 2002. c Springer-Verlag Berlin Heidelberg 2002
A Specification and Verification Framework
293
for this trend is that it enables even more aggressive hardware- and compilerlevel optimizations. The net result is that the verification teams are left with a ‘double-whammy’: they have to verify highly complex consistency protocols against highly complex memory model specifications. In other words, significant advances beyond cache coherence1 verification or sequential consistency verification are needed in order to design reliable multiprocessor systems. We use the terms DSM system, shared memory system, and coherence protocol implementation interchangeably, since our only concern in this paper is the verification of the weak memory model aspects of these systems. Where are the bugs? Memory model bugs arise for a variety of reasons. A shared memory system may employ a processor chip that handles the speculative execution of memory operations incorrectly; it may employ a memory controller that incorrectly implements the snoopy bus protocol; or the designer may make an incorrect assumption about the message ordering properties of the network, in a directory-based protocol. Given the wide range of implementations and the correspondingly wide source of bugs, we must be clear on what we propose to verify. What we verify in this paper is an abstract finite-state model created by the designer (called Imp) to mimic his actual implementation, focusing on the shared memory consistency protocol. Typically, we will model only two processors, addresses, as well as data values, as even such scaled-down configurations can generate trillions of states. We take the view that such ‘small instances’ must be adequately verified before the luxury of parameterized verification can be entertained. How is a weak memory model specified? Ever since Lamport [10] defined sequential consistency, researchers have craved for a similarly lucid specification for weak memory models in terms of a single serial order of operations. In our opinion, the degree of their success varies, with our strongest inspirations coming from the works of Lamport [11], Neiger [12], Condon et.al [13], and Ahamad et.al. [9]. In [12], it is shown how a single visibility order may be defined to encompass certain memory models that include strongly ordered as well as weakly ordered operations. However, Neiger seems to propose that (the more difficult option of) multiple visibility orders be still pursued for many ‘less fortunate’ memory models. We propose a way to avoid this complication, and sketch how, for every memory model in a reasonably large taxonomy of memory models (Figure 2), a single visibility order can still be employed. Briefly, this single visibility order is set up as follows. Given a concurrent program consisting of loads, stores, and synchronization operations, the stores are split into local and global stores, where the number of such local/global stores depends on the memory model at hand. Then, a single order of these operations (local/global stores, loads, and synchronization operations—together called visibility events) is defined, by specifying the orderings that are required, leaving the remaining operations unordered (and therefore having the latitude to slide with respect to other operations). Finally, 1
¹ Most DSM researchers view ‘cache coherency’ as an informal descriptive term to signify ‘protocols concerned with managing shared memory.’ We employ this term more formally, to denote per-location sequential consistency, as used by [9].
Finally, a load value rule is defined in a fixed and standard way: that every load return the value written into the load location by the most recent store that writes that location. How is verification accomplished? Sticking with the single visibility order approach immediately gives us a simple and intuitive verification approach: (i) obtain a reliable specification whose external traces are all possible visibility orders for the memory model; (ii) annotate Imp with the events of the visibility order, so that every run of Imp may be seen with respect to the traces of these visibility events; (iii) advance the state of Imp via one of the visibility events, and ensure that the specification can match this visibility event. If the visibility event happens to be a load, a match occurs if both models return the same value. This is the only invariant checked during reachability analysis. In effect, we are checking the existence of a refinement map [14] between these models. Also, strictly speaking, we are only debugging the implementation with respect to the specification, because we employ a finite state specification which cannot encompass all possible executions of the memory model [15]. The main observation we make is that the single visibility order written in terms of the split stores enables us to perform a backtrack-free traversal while establishing the refinement mapping. To drive this point home, consider a protocol that supports local bypassing, which allows a load to pick its value straight out of the store buffer, as opposed to allowing the store to post globally, and then loading the global store. By looking at only the external traces of this protocol in terms of loads and unsplit stores, it is impossible to determine whether the local bypassing option has been exercised or not. Let there be a memory system M1 that implements this protocol, and suppose someone makes an identical copy of M1, and calls it M2. Certainly we expect M2 to refine M1. However, M2 and M1 can have two different executions that have a common prefix in terms of unsplit stores and loads. For instance, if P1 runs program store(a,1); load(a), and P2 runs program load(a), one execution of M1 (obtained by annotating loads with the returned values) is Exec1 = P1: store(a,1); P1: load(a,1); P2: load(a,1), while an execution of M2 is Exec2 = P1: store(a,1); P1: load(a,1); P2: load(a,⊥), where ⊥ is the initial value of a memory location. In these executions, Exec2 exercises the bypass option while Exec1 didn’t. Note that after a common prefix, the load values disagree. This makes it impossible to establish refinement by comparing point-by-point along the executions. In other words, the ‘internal non-determinism’ has not been fully resolved, as we keep the bypassing step invisible. If we now enrich the external alphabets of M1 and M2 to Σ = {store_p1(a,1), store_g(a,1), load(a,1), load(a,⊥)}, where store_p1 refers to the store being visible to P1 and store_g refers to the store being visible globally, Exec1 and Exec2 detailed in terms of this alphabet as Exec1’ and Exec2’ will exhibit no non-determinism, permitting reachability analysis to proceed based on the load value invariant alone: Exec1’ = P1: store_p1(a,1); P1: load(a,1); store_g(a,1); P2: load(a,1) and Exec2’ = P1: store_p1(a,1); P1: load(a,1); P2: load(a,⊥); store_g(a,1).
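To make the point concrete, the following small sketch (in Python, for illustration only; the paper’s actual models are written for the Murϕ model-checker) replays the refined traces Exec1’ and Exec2’ and checks the load value invariant point by point. The event encoding, processor names, and the BOT marker standing for the initial value ⊥ are our own illustrative choices, not part of the paper.

BOT = None

def check_load_values(trace):
    """trace: list of (proc, op, addr, val); op is 'load', 'store_g', or 'store_p<i>'."""
    for i, (proc, op, addr, val) in enumerate(trace):
        if op != "load":
            continue
        expected = BOT
        for (_, op2, a2, v2) in trace[:i]:
            # a store is visible to proc once it has become global, or once its
            # processor-local event for proc has occurred
            if a2 == addr and op2 in ("store_g", "store_p" + proc):
                expected = v2
        if val != expected:
            return False
    return True

exec1 = [("1", "store_p1", "a", 1), ("1", "load", "a", 1),
         ("1", "store_g", "a", 1), ("2", "load", "a", 1)]
exec2 = [("1", "store_p1", "a", 1), ("1", "load", "a", 1),
         ("2", "load", "a", BOT), ("1", "store_g", "a", 1)]

assert check_load_values(exec1) and check_load_values(exec2)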
Note that each load now returns a value determined by the most recent store to that location. In our refinement checking procedure, the above two traces will not be the ones compared, as explained shortly. Obtaining a specification in two steps: Our main emphasis is to move towards verification against weak memory models by exploiting common patterns found in contemporary designs. In this connection, we observe that many microarchitectures (hereafter called implementation) can be sub-divided into two parts: (i) a collection of structures, such as load (synonymous with ‘read’) and store (synonymous with ‘write’) buffers, into which the processing engine issues committed memory operations, and (ii) a collection of structures including caches, snoopy buses (for bus-based protocols) or directories (for directory-based protocols), and message buffers, that make up the rest of the DSM system. We call these the internal and external partitions, respectively. The internal partition buffers up instructions ready for execution. Sometimes, instructions can be completed solely within internal partitions—for example, when a load instruction ‘hits’ within the store buffer, and loads the value there, as in many TSO [6] implementations. In all other cases, instructions wait for the external partition to attain specific states (e.g., make a line exclusive before a store), and then atomically complete. In our methodology, an abstraction step is applied to the implementation (Imp) in which the external partition is replaced by a simpler structure to yield the intermediate abstraction (Impabs). The internal partition is unaffected by the abstraction, and hence is present in its original form within Impabs. This approach to abstraction is motivated by the fact that in many protocol implementations, a significant amount of complexity resides in the external partition. For most implementations, the choice of the replacement directly depends on the memory model. (We discuss extensions in Section 5.) As our methodology stands, the external partition is a serial memory, or a collection of serial memory units (one per processor). The intermediate abstraction thus created is used as the reference specification for verifying the implementation. This constitutes the Imp verification step. Impabs itself needs to be verified against a formal specification of the memory model (Spec). This constitutes the Impabs verification step. To carry out reachability analysis, we employ a parallel model-checker developed [16] by porting the parallel Murϕ model-checker [17] to the MPI parallel programming library. This model checker runs on Unix/Linux clusters. Efficiency measures: The memory needs of enumerative model checkers are governed by the size of the state-vector as well as the size of the reachable state-space. Our technique avoids keeping two separate copies of the internal partition for Imp and Impabs by actually sharing one copy of this state. This optimization is possible because the state of Impabs would advance in the same manner even if two copies are kept. The external partition does not affect the internal partition other than by helping complete (and drain) instructions buffered in the internal partition. (In other words, it does not ‘reach in’ and modify data buffered in the internal partition.) Also, because of the ‘lock-step’ execution of the models,
the number of reachable states generated during reachability stays exactly the same as the number of reachable states for Imp. Hence, a multiplication of Imp states with ‘property automaton’ states, as in some other approaches (e.g., [18]), is avoided.
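As a rough illustration of this lock-step scheme (our own Python sketch, not the parallel Murϕ procedure used in the paper), the check below advances Imp and Impabs together on each annotated visibility event and reports a failure exactly when a load completes with different values in the two models; states are assumed hashable, and the function names are ours.

def lockstep_check(step_imp, step_abs, init_imp, init_abs, enabled):
    """step_*: (state, event) -> (next_state, load_value_or_None);
    enabled: implementation state -> iterable of visibility events."""
    frontier = [(init_imp, init_abs)]
    seen = {(init_imp, init_abs)}
    while frontier:
        (si, sa) = frontier.pop()
        for e in enabled(si):
            ni, vi = step_imp(si, e)
            na, va = step_abs(sa, e)
            if vi != va:                 # load value invariant violated
                return False, (si, sa, e)
            if (ni, na) not in seen:
                seen.add((ni, na))
                frontier.append((ni, na))
    return True, None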
Fig. 1. Specifying, Designing and Verifying an Implementation
Fig. 2. The four classes of memory models (Strong, Weak, Weakest, and Hybrid)
Verifying Impabs : We anticipate the step of verifying Imp to be a stand-alone debugging aid. However, our methodology will be incomplete without verifying Impabs . One way to approach this problem is by using a theorem-prover, as demonstrated in [19]. Another promising approach is to verify Impabs against a ‘golden’ operational model for the memory model (we call it the executable Spec to distinguish it from the visibility order Spec which is non-executable). We have created a prototype tool to automatically generate finite-state executable Specs for memory models, from a description of the weak memory model [20]. This tool is driven by a single Promela [21] program that is parameterizable over a
range of memory models. It is possible to verify the executable Spec generation tool once and for all, and then verify Impabs against the Specs generated by it. These directions are expected to be pursued in our future work. Overview of methodology: Once a visibility order for the memory model of interest is developed, the required orderings of the visibility order can be directly entered into a Spec generation tool. The designer may then run the tool on assembly program fragments and gain deeper understanding before he/she embarks on protocol design. We therefore propose the following methodology (see Figure 1): (i) write the (visibility order) Spec, (ii) generate and analyze an executable Spec, (iii) design Imp, and annotate it with visibility events, (iv) depending on the memory model, create Impabs, (v) run our reachability analysis procedure which checks for a refinement mapping, and finally (vi) verify Impabs against Spec (future work). Additional related work and observations: In [13], it is shown that by timestamping events within implementations using a modest extension of Lamport’s work on logical clocks [22], one can show whether the total ordering of the timestamped events satisfies the conditions of the Wisconsin TSO ordering (their version of visibility order tailored for the TSO memory model). The main drawback of their work is the lack of automation. Also, the timestamping arguments can get very tedious. Instead, our approach requires designers to annotate Imp with the events of the visibility order. In practice, this task is not onerous, as each event of the visibility order carries information on when loads and stores appear to complete for various processors. Impabs must also be annotated with these events. This task is considerably simpler than for Imp, as Impabs is chosen in a relatively constrained manner. Many formal approaches (e.g., [23]) verify cache coherency. Other recent work [24,25,26] addresses verification against sequential consistency. [27] checks event sequences generated by a protocol implementation against those generated by a much simpler (and hence more trustworthy) protocol processor. In a sense, our work can be viewed as a way to obtain the Lamport-clock ordering automatically, using reachability analysis, inspired by the informal ideas in [27]. In [28], memory models are characterized in terms of architectural rules that pertain to specific aspects of the memory model, such as store orderings and store atomicity, and used as a basis to create concurrent programs that debug actual multiprocessor machines. [18] discusses how to create finite state ‘test automata’ that capture these concurrent programs as a specification for use during implementation verification. In [29,30], this work has been extended to encompass several weak memory models. In many advanced protocols, the logical and temporal orders of memory actions differ. The logical order is the order used in an “explanation” of the ordering, while the temporal order is a time-sequence of actions (whenever such real-time orderings are obtainable). For example, in the protocol due to Scheurich [31], a cache can pretend to invalidate a cache line, send an invalidation acknowledgement, and ‘secretly continue to enjoy’ the cache line in shared mode. In [32],
handling such optimizations through timestamping is explained. In Section 4.4, we briefly explain how we model these situations in our framework. Roadmap: Section 2 sketches our approach to specify memory models, with the details present in [33]. Section 3 discusses an Alpha implementation, and the creation of the intermediate abstraction. Section 4 discusses the transitions of Impabs , the synchronization scheme, our experimental results, and handling implementations with different logical and temporal orders. Section 5 concludes the paper.
2 Specification of Memory Models
2.1 Executable Spec
A (concurrent shared memory) program is a set of sequences of load, store, and synchronization instructions, with one sequence per processor. Each sequence captures the program order at that processor. The execution of a program on a shared memory multiprocessor completes the instructions, returning values for the loads. An execution of a concurrent program is the program with each load labeled with the returned value. Every instruction in an execution can be decomposed into one or two³ events. If two events are used, they are the local and global events. In the former case, we shall use ‘instruction’ and ‘event’ synonymously. Each event t is a tuple (p, l, o, a, d), where:
• p(t): the processor in whose program t originates
• l(t): the label of instruction t in p’s program
• o(t): the event type
• a(t): the memory address
• d(t): the data value
All instructions except stores map onto exactly one event. The stores map onto multiple events depending upon the memory model under consideration. An execution obeys a memory model if the instructions in the execution can be arranged into one logical total order which obeys the Per Processor Order and the Load Value Rule. The Per Processor Order depends on the program order as well as the data dependence order (how values flow through the memory system). The Load Value Rule specifies the data value to be returned by the loads. In general, an executable Spec of a memory model is created by choosing a simple and reliable implementation whose runs satisfy the executions allowed by the memory model, and vice versa⁴. For each category of memory models, we generate a standard executable Spec. We categorize memory models into four classes:
³ In general, an instruction could be decomposed into more than two events. In order to keep the main ideas clear, we discuss this detail only at selected places in this paper.
⁴ However, since executable Specs are finite-state, the full generality of store/load re-orderings is not supported by them.
1. Strong: requires Write atomicity and does not allow local bypassing (e.g. Sequential Consistency [34], IBM 370).
2. Weak: requires Write atomicity and allows local bypassing (e.g. UltraSparc TSO, PSO and RMO [6], Alpha [35]).
3. Weakest: does not require Write atomicity and allows local bypassing (e.g. PC [36], PowerPC [8], PRAM [37], Slow Memory [38]).
4. Hybrid: supports weak load and store instructions that come under the Weakest memory model class, and also supports strong load and store instructions that come under the Strong or Weak memory model class (e.g. Itanium [39], DASH-RC [8], RCpc [8]).
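The store-splitting decisions summarized in Table 1 below can be stated compactly. The following hedged sketch (ours, in Python, with invented event names) follows the prose description given after the table, in which a ‘Weakest’ or ‘Hybrid’ store contributes one local event plus one global event per processor.

def store_events(model_class, p):
    """Visibility events contributed by one store, for p processors."""
    if model_class == "Strong":
        return ["st"]                            # store left unsplit
    if model_class == "Weak":
        return ["st_local", "st_global"]
    if model_class in ("Weakest", "Hybrid"):     # no write atomicity
        return ["st_local"] + ["st_global_%d" % i for i in range(1, p + 1)]
    raise ValueError(model_class)

assert store_events("Weak", 2) == ["st_local", "st_global"]
assert len(store_events("Weakest", 4)) == 5      # local event + one global per processor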
Table 1. Splitting of store and data structures for each memory model class

Memory Model | Splitting of store instructions | Data Structures (Internal) | Data Structures (External)
Strong | store unsplit | load buffer LB, store buffer SB | single port memory M
Weak | store split to local and global | same as above | single port memory M
Weakest | store split to local and (p + 1)⁵ globals | same as above | memory M and re-order buffer WIB per processor
Hybrid | store split to local and (p + 1) globals | same as above | memory M and re-order buffer WIB per processor
Depending upon the category that a memory model falls under (Figure 2), we split a store instruction into one or more events. In case of Sequential Consistency (strong), we do not split the stores. For the UltraSparc TSO (weak), we split the store instruction into local and global. For the weakest category that lacks write atomicity, each store is split into p + 1 events, where p is the number of processors, thus ending up with a local store event and p global store events. Table 1 summarizes these splitting decisions for various memory models, and Table 1 and Figure 3 show the data structures chosen for various memory models. We now briefly sketch the internal and external partitions (details are in [33, 39]): Internal partition: For a memory model belonging to any of the four classes, the internal partition data structures are a load buffer LB and a store buffer SB, where the issued load and store instructions, respectively, are buffered. The completion order of the loads buffered in LB depends on whether the concerned
⁵ Each memory model class also includes internal load and store buffers. Also, p is the number of processors.
Fig. 3. (a) Generalized Weak Executable Specification (b) Generalized Weakest Executable Specification
memory model allows a load to bypass a previously issued load. Similarly, the ordering of stores depends on whether stores can bypass each other. The relative ordering of loads and stores in LB and SB depends on the load/store bypassing rules. If local bypassing is allowed, a load picks up its value from a matching store instruction residing in SB. External partition: In case of Strong and Weak memory models, the external partition’s data structure is just a single port memory M. The intuition behind having M is that both these classes of memory models require Write Atomicity, and hence a store instruction once completed (flushed from SB to M) should be visible to all processors instantaneously. Weakest and Hybrid memory models require more involved data structures where each processor i has its own memory Mi and also a re-ordering buffer WIBi that takes in incoming store instructions posted by different processors, including itself, from their internal store buffers (SBs). Store instructions residing in this buffer eventually get flushed to memory. The combination of Mi and WIBi simulates a processor seeing store instructions at different times and in different relative orders.
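As a simplified illustration of the internal/external partition split just described (a sketch of ours, not the generated executable Spec; it fixes a FIFO flush order and omits the per-model ordering rules), the following Python fragment models a per-processor store buffer SB with local bypassing in front of a single-port memory M.

class WeakSpecSketch:
    def __init__(self, nprocs, init=None):
        self.M = {}                              # single-port memory (external partition)
        self.SB = [[] for _ in range(nprocs)]    # per-processor store buffers (internal)
        self.init = init                         # value of an untouched location

    def store(self, p, a, d):                    # local store event: enter SB
        self.SB[p].append((a, d))

    def flush_one(self, p):                      # global store event: oldest SB entry reaches M
        if self.SB[p]:
            a, d = self.SB[p].pop(0)
            self.M[a] = d

    def load(self, p, a):                        # local bypass, otherwise read memory
        for (addr, d) in reversed(self.SB[p]):
            if addr == a:
                return d
        return self.M.get(a, self.init)

spec = WeakSpecSketch(nprocs=2)
spec.store(0, "a", 1)                            # P1: store(a,1) enters SB
assert spec.load(0, "a") == 1                    # P1: load(a) bypasses locally
assert spec.load(1, "a") is None                 # P2 still sees the initial value
spec.flush_one(0)                                # the store becomes globally visible
assert spec.load(1, "a") == 1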
2.2 A Generic Visibility Order Spec for Category ‘Weak’
We now show how we can define a generic visibility order relation covering the ‘Weak’ category. We then show how this specification can be instantiated specific to Alpha. In [33], it is argued that all memory models in the taxonomy of Figure 2 can be similarly specified. For the Weak category, every store instruction is split into two events tlocal = (p, l, stlocal, a, d) and tglobal = (p, l, stglobal, a, d). First, → always includes tlocal → tglobal for split store events originating from the same store instruction. Additional entries in → possible under the Weak category are as follows.
Per Processor Order: Let t1 and t2 be two events such that p(t1) = p(t2), and l(t1) < l(t2). Then we can have the following t1 → t2 entries for a specific memory model under the Weak category, under various conditions. For the most relaxed memory model of category Weak, under none of the following cases will a t1 → t2 entry be present, while for the strongest memory model of category Weak, a t1 → t2 entry will be present under all the following conditions:
1. When o(t1) = o(t2) = ld (informally referred to as “ld → ld is true”),
2. When o(t1) = ld and o(t2) = stglobal (informally “ld → st”),
3. When o(t1) = stglobal, o(t2) = ld (informally “st → ld”),
4. When o(t1) = stglobal, o(t2) = stglobal (informally “st → st”),
5. There exists an event tf with o(tf) = fence, and l(t1) < l(tf) < l(t2) (informally, “fence”),
6. The conditions for Memory Data Dependence. This sub-rule pertains to instructions involving the same memory location. Hence, when a(t1) = a(t2), a t1 → t2 entry will be present under none, some, or all of these conditions (depending on the memory model):
   a) o(t1) = ld, o(t2) = ld (“ld → ld”),
   b) o(t1) = ld, o(t2) = stlocal (“ld → st”),
   c) o(t1) = stlocal, o(t2) = ld (“st → ld”),
   d) o(t1) = stlocal, o(t2) = stlocal (“st → st”).
For example, if a weak memory model does not allow a load (ld) instruction to bypass a previously issued store (st) instruction (i.e. a ld issued after a st cannot complete before that st) then sub-rule st → ld holds true for that memory model. We now define the Load Value rule covering all memory models of category Weak. Load Value: Let t1 be a load instruction. Then the data value of t1 is the data value of the “most recent store event t2 ,” i.e., if
if a(t1) = a(t2), t2_local → t1 → t2_global, and there does not exist a store event t3 s.t. p(t1) = p(t3), a(t1) = a(t3), and t2_local → t3_local → t1, then d(t1) = d(t2);
else if a(t1) = a(t2), t2_global → t1, and there does not exist a store instruction t3 s.t. a(t1) = a(t3) and t2_global → t3_global → t1, then also d(t1) = d(t2);
else, t1 receives the initial value ⊥.
(Here t2_local and t2_global denote the local and global events of the store t2, and similarly for t3.) Write Atomicity: Notice that we employ the stlocal events purely to model the effect of local bypassing. All other loads in any execution obtain their values from stglobal events. Since there is only one stglobal event for every store, and we have a single visibility order, all the store events t, where o(t) = stglobal, are part of the visibility order. All these facts together mean that Write Atomicity is obeyed. Instantiation for Alpha: In the Alpha memory model, Memory Data Dependence is obeyed, and the fence rule is also obeyed. No other orderings (e.g. ld → ld, ld → st, etc.) are selected. Our tool to generate executable Specs can accept these selections from its user interface, and produce a Promela model for the Alpha memory model.
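The rule selections above can be viewed as a small table of flags. The sketch below (ours, in Python, with invented flag names) derives the per-processor-order edges between two same-processor events from such a selection, and shows an Alpha-like instantiation in which only the Memory Data Dependence sub-rules are enabled; the fence rule is omitted here because it involves a third event.

def ordered(t1, t2, rules):
    """t1, t2: events (p, l, o, a) of the same processor with l(t1) < l(t2)."""
    p1, l1, o1, a1 = t1
    p2, l2, o2, a2 = t2
    assert p1 == p2 and l1 < l2
    if o1 == "ld" and o2 == "ld" and rules.get("ld->ld"): return True
    if o1 == "ld" and o2 == "st_global" and rules.get("ld->st"): return True
    if o1 == "st_global" and o2 == "ld" and rules.get("st->ld"): return True
    if o1 == "st_global" and o2 == "st_global" and rules.get("st->st"): return True
    if a1 == a2:  # Memory Data Dependence sub-rules (same location)
        if o1 == "ld" and o2 == "ld" and rules.get("dep ld->ld"): return True
        if o1 == "ld" and o2 == "st_local" and rules.get("dep ld->st"): return True
        if o1 == "st_local" and o2 == "ld" and rules.get("dep st->ld"): return True
        if o1 == "st_local" and o2 == "st_local" and rules.get("dep st->st"): return True
    return False

alpha_like_rules = {"dep ld->ld": True, "dep ld->st": True,
                    "dep st->ld": True, "dep st->st": True}
assert not ordered(("P1", 1, "ld", "a"), ("P1", 2, "ld", "b"), alpha_like_rules)
assert ordered(("P1", 1, "st_local", "a"), ("P1", 2, "ld", "a"), alpha_like_rules)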
3 Implementation and Intermediate Abstraction
3.1 An Alpha Implementation Model
In the Alpha implementation (modeled after a multiprocessor using the Compaq (DEC) Alpha 21264 microprocessor), each processor is separated from its cache by a re-ordering coalescing store buffer SB, and a re-ordering load buffer LB. Caches are kept coherent with a write-invalidate coherence protocol [32]. The data structure of caches is a two dimensional array C where, for event t, C[p(t)][a(t)].a refers to data value of address a(t) at processor p(t) and C[p(t)][a(t)].st refers to its address state (A-state). We begin with a brief explanation of the cache coherence protocol we use. This protocol is the same as the one used in [40] to describe a Gigaplane-like split-transaction bus. Memory blocks may be cached Invalid(I), Shared(S), or Exclusive(E). The A-state (address state) records how the block is cached and is used for responding to subsequent bus transactions. The protocol seeks to maintain the expected invariants (e.g, a block is Exclusive in at most one cache) and provides the usual coherent transactions: Get-Shared (GETS), Get-Exclusive (GETX), Upgrade (UPG, for upgrading the block from Shared to Exclusive) and Writeback (WB). As with the Gigaplane, coherence transactions change the A-state regardless of when the data arrives. If a processor issues a GETX transaction and then sees a GETS transaction for the same block by another processor, the processor’s A-state for the block will go from Invalid to Exclusive to Shared, regardless of when it obtains the data. The processor issues all instructions in program order. Below, we specify exactly what happens when the processor issues one of these instructions.
Fig. 4. (a) The Alpha Imp model, and (b) The Alpha Impabs model
1. st: A st instruction first gets issued to coalescing re-order buffer SB, completing the stlocal event. Entries in SB are the size of cache lines. Stores to the same cache line are coalesced. Entries are eventually deleted (flushed)
from SB to the cache, although not necessarily in the order in which they were issued to the write buffer. Before deleting, the processor first makes sure there is no earlier issued ld instruction to the same address pending in LB (if any, those LB instructions must be completed before deleting that entry from SB). It then checks if the corresponding block’s A-state is Exclusive(E). If not, the coherence protocol is invoked to change the A-state to E. Once in E state, the entry is deleted from SB and written into the cache atomically, thus completing the stglobal event.
2. ld: To issue a ld instruction, the processor first checks in its SB for a st instruction to the same word. If there is one, the ld obtains its value from it. If there is no such word, the processor buffers the ld in LB. In the future, when an address is in E or S state, all ld entries to that same address in LB get their data from the cache and are then deleted from the buffer. ld entries to different words in LB can be deleted in any relative order. Receipt of the data value marks the completion of the corresponding ld event.
3. MB: Upon issuing a MB instruction, all entries in SB are flushed to the cache and all entries in LB are deleted after returning their values from the cache, hence completing the corresponding MB event⁶. While flushing an entry from SB, the processor checks that there is no earlier issued ld instruction to the same address residing in LB. We call this entire process flush_imp.
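For intuition, here is a rough sketch (ours, not part of the paper’s Murϕ model, and deliberately simplified: data movement, write-backs by other processors, and split-transaction details are ignored) of how one cache’s A-state reacts to its own and to other processors’ bus transactions in the protocol described at the start of this subsection.

def next_astate(astate, trans, own):
    """astate in {'I','S','E'}; trans in {'GETS','GETX','UPG','WB'}; own: issued by this cache?"""
    if own:
        if trans == "GETS":
            return "S"
        if trans in ("GETX", "UPG"):
            return "E"
        if trans == "WB":
            return "I"
    else:                                   # another processor's transaction for this block
        if trans == "GETS":
            return "S" if astate == "E" else astate
        if trans in ("GETX", "UPG"):
            return "I"
    return astate

# The example from the text: issue GETX, then observe another processor's GETS;
# the A-state goes Invalid -> Exclusive -> Shared regardless of when data arrives.
a = next_astate("I", "GETX", own=True)
a = next_astate(a, "GETS", own=False)
assert a == "S"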
3.2 The Intermediate Abstraction
The Alpha Impabs retains the internal data partition of Imp without any changes. However, the cache, the cache coherence protocol, the bus, and the main memory in the implementation, which belong to the external partition, are all replaced by a single port main memory M in Impabs. This replacement follows Table 1. We now take a look at how each of the instructions gets realized. As with Imp, the processor issues all instructions in program order.
1. st: A st instruction first gets issued to SB just as in the implementation, completing the stlocal event. At any time, an entry anywhere in SB can be deleted from the buffer and written to the single port memory M atomically, provided there is no earlier issued ld instruction to that address pending in LB. This completes the stglobal event.
2. ld: Similarly, as in the implementation, a ld instruction tries to hit SB and on a miss, it gets buffered in LB. However, any entry in LB can be deleted once it receives its data from M, both steps being performed in one atomic step. Entries to the same address get their data values from M at the same time.
3. MB: Upon issuing a MB instruction, all entries in SB are flushed to M and all entries in LB are deleted after returning their values from M. While flushing from SB the processor checks that there is no earlier issued load event to the same address residing in LB. We call this entire process flush_abstract.
⁶ Appropriate cache entries need to be in the E state before flushing.
4 Verification
4.1 Intermediate Abstraction
Table 2. Completion steps of all events of the implementation and the abstract model

Event | Imp | Impabs
ld(t) (SBp(t) hit) | load from SBp(t) | load from SBp(t)
ld(t) (SBp(t) miss) | Issue to LBp(t); C[p(t)][a(t)].st = S or E, d(t) ← C[p(t)][a(t)].a | Issue to LBp(t); d(t) ← M[a(t)]
stlocal(t) | Issue(SBp(t), t) | Issue(SBp(t), t)
stglobal(t) | C[p(t)][a(t)].st = E, C[p(t)][a(t)].a ← d(t) | M[a(t)] ← d(t)
MB(t) | flush_imp | flush_abstract

The events stlocal, stglobal, ld and MB have been defined for both the implementation and the abstract model. Every event of Imp is composed of multiple
steps. However, in Impabs, each event except ld is composed of a single atomic step. For example, for a stglobal event to complete, if the concerned address’s A-state is Invalid, the processor will need to send a request on the bus to get an Exclusive copy. During this process many intermediate steps take place, which include other processors and main memory reacting to the request. However, in Impabs, on a miss while handling the stglobal event, the entry in SB can be deleted and written to the single port memory atomically.
4.2 Synchronization Scheme
The discovery of the synchronization sequences between the implementation and the specification is the crux of our verification method. Table 2 provides an overview of the overall synchronization scheme. This table compares the completion steps of both the implementation and the abstract model, and highlights all synchronization points. Let us briefly elaborate the actions taken for a ld entry in LB to complete. In the implementation, coherence actions are first invoked to promote the cache line into an Exclusive or Shared state. Thereafter, the implementation receives data from the bus and at this point completes the ld event. At this point, the model-checker will immediately make the same event complete in the abstract model by simply returning the data from M[a(t)] through the multiplexor switch.
(Here d(t) ← C[p(t)][a(t)].a refers to the load instruction t receiving its data from the updated cache entry.)
Table 3. Experimental Results
Cache Coherent Protocol | Alpha Implementation: States (×10⁶), Transitions (×10⁶), Time (hrs) | Itanium Implementation: States (×10⁶), Transitions (×10⁶), Time (hrs)
Split Trans. Bus | 64.16, 470.52, 0.95 | 111.59, 985.43, 1.75
Split Trans. Bus with Scheurich's Opt. | 251.92, 1794.96, 3.42 | 325.66, 2769.77, 4.80
Multiple Interleaved Buses | 255.93, 1820.38, 3.65 | 773.27, 2686.89, 10.97
Multiple Interleaved Buses with Scheurich's Opt. | 278.02, 1946.67, 3.90 | 927.31, 3402.41, 12.07
Synchronization happens if the same datum is returned. In general, the last step that completes any event in the implementation and the single step that completes the same event in the abstract model are performed atomically. The synchronization scheme for instructions that may get buffered and completed later is slightly more elaborate. Basically, synchronization must be performed both when the instruction is entered into the buffer and later when it completes. For example, since a ld instruction may miss the SB and hence may not complete immediately, we will have to synchronize both the models when the ld gets buffered, and finally synchronize again when the ld event completes. The synchronization of MB is accomplished indirectly, by the already existing synchronizations between the models at ld or stglobal. This is because an MB completes when the instructions occurring before it complete.
4.3 A Discussion of the Results
Our preliminary experimental results are summarized in Table 3. In all these experiments, our parallel Murϕ model-checker was used. The invariant checked was that the load values agree. During the development of our models, our method could detect our coding errors. Our verification methodology does not pre-select a certain number of concurrent programs with respect to which to verify the implementation. Rather, it lets the model-checker automatically pick all concurrent programs that saturate the state-space of the implementation. During verification, Imp runs with its inputs unconstrained, thus considering all possible sequences of external events. Thus, when the implementation Imp goes through its entire reachable state-space, we would automatically consider all concurrent programs that affect the state-space of Imp. In some other works (e.g., [41]), a catalog of concurrent programs is pre-selected.
If we contrast model-checking-based refinement against theorem-proving-based refinement, the latter gives rise to more general proofs, but the process is very labor intensive, almost always requiring the construction of non-trivial invariants. Our approach avoids this manual work, and also returns error-traces that help debug the implementation. A combination of theorem-proving and model checking in this realm seems entirely appropriate.
4.4 Handling Different Logical and Temporal Orders
As explained earlier, with an optimization such as Scheurich’s, some processor may delay the invalidation of its cached copy. Thus, in Imp, a stglobal cannot be regarded as having completed when the cache write happens, because completion means that the store becomes visible to all processors. This can result in a false negative verification with respect to Impabs.
Fig. 5. Modified Alpha abstract Model
This optimization is taken into account by Imp, by generating stglobal not when the cache is written, but rather when the cache entry is in the Exclusive A-state and all the other caches have invalidated their shared copies. We can modify Impabs to take into account this optimization by having a ‘shadow’ memory module M′ prior to the single-ported memory, as in Figure 5. With these changes, every cache write at Imp is matched with a shadow memory write, and the final invalidation of all shared copies is matched with a transfer of contents from M′ to M. As part of future work, we plan to investigate how to systematize such abstraction methods.
5 Concluding Remarks
In this paper, we presented a specification and verification methodology for shared memory consistency protocols, and reported encouraging results on
verifying four snoopy-bus protocols implementing aspects of the Alpha and Itanium memory models (available from our webpage). Our approach fits today’s design flow where aggressive, performance oriented protocols are first designed by expert designers, and handed over to verification engineers. The verification engineer, in turn, follows a systematic method for deriving an abstract reference model, and then uses a parallel model-checker to conduct verification. Effort-wise, our proposed method of obtaining the abstract model compares favorably with the effort of writing verification properties for a model checker. Our approach does not require special training to use, and can benefit from the use of multiple high-performance PCs to conduct parallel/distributed model checking, thereby covering large state spaces. Our future work will focus more on aggressive directory-based protocols. To be able to handle large state-spaces, we are working on an aggressive list of optimizations to our parallel model-checker. We are also exploring random-walk methods to achieve very high state-generation rates, thus using our methodology also for bug-hunting. We also plan to develop rigorous methods to verify the intermediate abstraction of the implementation against the memory model specification.
References 1. The ASCI White Computer http://www.llnl.gov/asci/. 2. The Sun MAJC Microarchitecture http://www.sun.com/microelectronics/MAJC/. 3. The IBM Power4 Microarchitecture http://www-1.ibm.com/servers/eserver/ pseries/hardware/whitepapers/power4.html. 4. L.A. Barroso, K. Gharachoroloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese, “Piranha: A scalable architecture based on single-chip multiprocessing,” in 27th International Symposium on Computer Architecture (ISCA), June 2000. 5. Mark D. Hill, “Multiprocessors should support simple memory-consistency models,” IEEE Computer, vol. 31, no. 8, pp. 28–34, 1998. 6. David L. Weaver and Tom Germond, The SPARC Architecture Manual – Version 9, P T R Prentice-Hall, Englewood Cliffs, NJ 07632, USA, 1994. 7. Intel, The IA-64 Architecture Software Developer’s Manual Vol. 2 rev. 1.1: Itanium (TM); System Architecture, Intel, 2000, Volume 2, Chapter 13, “Coherence and MP Ordering.” http://developer.intel.com/design/ia-64/downloads/24531802.htm. 8. Sarita V. Adve and Kourosh Gharachorloo, “Shared memory consistency models: A tutorial,” Computer, vol. 29, no. 12, pp. 66–76, Dec. 1996. 9. M. Ahamad, R. A. Bazzi, R. John, P. Kohli, and G. Neiger, “The power of processor consistency (extended abstract),” in Proc. of the 5th ACM Annual Symp. on Parallel Algorithms and Architectures (SPAA’93), June 1993, pp. 251–260. 10. Leslie Lamport, “How to make a multiprocessor computer that correctly executes multiprocess programs,” IEEE Transactions on Computers, vol. 9, no. 29, pp. 690–691, 1979. 11. Leslie Lamport, “The wildfire challenge problem,” http://research.microsoft.com/users/lamport/tla/wildfire-challenge.html.
12. Gil Neiger, ,” 2001, http://www.cs.utah.edu/mpv/papers/neiger/fmcad2001.pdf. 13. Anne Condon, Mark Hill, Manoj Plakal, and David Sorin, “Using lamport clocks to reason about relaxed memory models,” in Proceedings of the Fifth International Symposium On High Performance Computer Architecture (HPCA-5), Jan. 1999. 14. Mart´ın Abadi and Leslie Lamport, “The existence of refinement mappings,” Theoretical Computer Science, vol. 82, no. 2, pp. 253–284, 1991. 15. Rajeev Alur, Ken McMillan, and Doron Peled, “Model-checking of correctness conditions for concurrent objects,” in 11th Annual IEEE Symposium on Logic in Computer Science, New Brunswick, New Jersey, July 1996, pp. 219–228. 16. Hemanthkumar Sivaraj, “Parallel and distributed model checking,” M.S. thesis, School of Computing, University of Utah, 2002, In progress. 17. Ulrich Stern and David Dill, “Parallelizing the Murϕ verifier,” Formal Methods in System Design, vol. 18, no. 2, pp. 117–129, 2001, (Journal version of their CAV 1997 paper). 18. Ratan Nalumasu, Rajnish Ghughal, Abdel Mokkedem, and Ganesh Gopalakrishnan, “The ‘test model-checking’ approach to the verification of formal memory models of multiprocessors,” in Computer Aided Verification98, Alan J. Hu and Moshe Y. Vardi, Eds., Vancouver, BC, Canada, June/July 1998, vol. 1427 of Lecture Notes in Computer Science, pp. 464–476, Springer-Verlag. 19. Seungjoon Park, Computer Assisted Analysis of Multiprocessor Memory Systems, Ph.D. thesis, Stanford University, jun 1996, Department of Computer Science. 20. Prosenjit Chatterjee, ,” Tool available at http://www.cs.utah.edu/formal verification/ESGtool. 21. G. J. Holzmann, “The model checker spin,” IEEE Transactions on Software Engineering, vol. 23, no. 5, pp. 279–295, May 1997, Special issue on Formal Methods in Software Practice. 22. Leslie Lamport, “Time, clocks, and the ordering of events in a distributed program,” Communications of the ACM, vol. 21, no. 7, pp. 558–565, 1978. 23. Homayoon Akhiani, Damien Doligez, Paul Harter, Leslie Lamport, Joshua Scheid, Mark Tuttle, and Yuan Yu, “Cache coherence verification with tla+,” in World Congress on Formal Methods, 1999, vol. LNCS 1709, pp. 1871–1872. 24. Thomas Henzinger, Shaz Qadeer, and Sriram Rajamani, “Verifying sequential consistency on shared-memory multiprocessor systems,” in Computer Aided Verification99, Nicolas Halbwachs and Doron Peled, Eds., Trento, Italy, July 1999, vol. 1633 of Lecture Notes in Computer Science, pp. 301–315, Springer-Verlag. 25. Shaz Qadeer, “Verifying sequential consistency on shared-memory multiprocessors by model checking,” Tech. Rep., SRC, Dec. 2001, Research Report 176. 26. Michael Merritt, “Guest editorial: Special issue on shared memory systems,” Distributed Computing, vol. 12, no. 12, pp. 55–56, 1999. 27. Jason F. Cantin, Mikko H. Lipasti, and James E. Smith, “Dynamic verification of cache coherence protocol,” in Workshop on Memory Performance Issues, in conjunction with ISCA, June 2001. 28. W. W. Collier, Reasoning About Parallel Architectures, Prentice-Hall, Englewood Cliffs, NJ, 1992. 29. Rajnish Ghughal and Ganesh Gopalakrishnan, “Verification methods for weaker shared memory consistency models,” in Proc. of the workshop FMPPTA (Formal Methods for Parallel Programming: Theory and Applications), Cancun, Mexico. LNCS # 1800, Jos´e Rolim et al., Ed., May 2000, pp. 985–992. 30. Rajnish Ghughal, “Test model-checking approach to verification of formal memory models,” M.S. 
thesis, Department of Computer Science, University of Utah, 1999, Also available from http://www.cs.utah.edu/formal_verification.
31. Christoph Scheurich, Access Ordering and Coherence in Shared Memory Multiprocessors, Ph.D. thesis, University of Southern California, May 1989. 32. D. Sorin, M. Plakal, A. E. Condon, M. D. Hill, M. M. Martin, and D. A. Wood, “Specifying and verifying a broadcast and a multicast snooping cache coherence protocol,” Tech. Rep. #1412, Computer Sciences Department, U. Wisconsin, Madison, Mar. 2000. 33. Prosenjit Chatterjee, “Formal specification and verification of memory consistency models of shared memory multiprocessors,” M.S. thesis, Department of Computer Science, University of Utah, 2002, Also available from http://www.cs.utah.edu/formal_verification. 34. Leslie Lamport, “How to make a correct multiprocess program execute correctly on a multiprocessor,” Tech. Rep., Digital Equipment Corporation, Systems Research Center, Feb. 1993. 35. Richard L. Sites, Alpha Architecture Reference Manual, Digital Press, 1992. 36. K. Gharachorloo, D. E. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. L. Hennessy, “Memory consistency and event ordering in scalable shared-memory multiprocessors,” in Proc. of the 17th Annual Int’l Symp. on Computer Architecture (ISCA’90), May 1990, pp. 15–26. 37. R. J. Lipton and J. S. Sandberg, “Pram: A scalable shared memory,” Tech. Rep. CS-TR-180-88, Dept. of Computer Science, Princeton University, Sept. 1988. 38. P. W. Hutto and M. Ahamad, “Slow memory: Weakening consistency to enhance concurrency in distributed shared memories,” in Proc. of the 10th Int’l Conf. on Distributed Computing Systems (ICDCS-10), May 1990, pp. 302–311. 39. Prosenjit Chatterjee and Ganesh Gopalakrishnan, “Towards a formal model of shared memory consistency for intel itanium,” in International Conference on Computer Aided Design, Austin, USA, 2001. 40. A.Singhal, D.Broniarczyk, F.Cerauskis, J.Price, L.Yuan, C.Cheng, D.Doblar, S.Fosth, N.Agarwal, K.Harvey, E.Hangersten, and B.Liencres, “Gigaplane: A high performance bus for large smps,” in Proc. of the 4th Annual Symposium on High Performance Interconnects at Stanford University, 1996, pp. 41–52. 41. Ratan Nalumasu, Rajnish Ghughal, Abdel Mokkedem, and Ganesh Gopalakrishnan, “The ‘test model-checking’ approach to the verification of formal memory models of multiprocessors,” in Computer Aided Verification, Alan J. Hu and Moshe Y. Vardi, Eds., Vancouver, BC, Canada, June 1998, vol. 1427 of Lecture Notes in Computer Science, pp. 464–476, Springer-Verlag. 42. http://www.cs.utah.edu/˜prosen/fmcad02.html.
Model Checking the Design of an Unrestricted, Stuck-at Fault Tolerant, Asynchronous Sequential Circuit Using SMV

Meine van der Meulen

SIMTECH Engineering, Rotterdam, The Netherlands
[email protected]
http://www.simtech.nl
Abstract. The design of unrestricted, stuck-at fault tolerant, asynchronous sequential circuits involves the use of complex software. Since software errors might lead to an incorrect design, it is important to verify the correctness of the results. A possible method to do this is to prove that the design possesses the required properties ’unrestricted’ and ’stuck-at fault tolerant’. This paper presents this approach using the model checker SMV. The approach used is general, and can be applied to all Mealy-type asynchronous sequential circuits. The paper shows the approach using an example. It appears possible to prove that the circuit is unrestricted, does not reach undefined states, is stable, and shows correct behavior. These properties are also proved under the assumption of the presence of one stuck-at fault. An important intermediate result is the design of the delay in the feedback loop of the asynchronous sequential circuit. Since the duration of the time steps in the model checker is random, it is not possible to use a deterministic model. The model developed is an abstract model for the behavior of delay elements comparable to RC-filters. It includes a notion of synchronization with the other delays in the circuit.
1 Introduction
The design of fault-tolerant asynchronous sequential circuits has been the subject of many papers in the seventies. Many methods have been proposed, most of them using Single Transition Time (STT) state transitions or related schemes ([1], [2], [3], [4], [5], [6], [7]). A disadvantage of these methods is that they require the environment to be fundamental-mode, i.e. the input signals may only change when the circuit is stable. To circumvent this disadvantage we propose a method using Multiple Transition Time (MTT) state transitions for designing unrestricted, stuck-at fault tolerant circuits (to be published). The mathematics behind designing such circuits is complicated, and the design of the circuits involves software. Therefore, a legitimate question is whether the circuit produced by the software is correct: whether it really is unrestricted
and stuck-at fault tolerant. This paper investigates ways to check the correctness of the output of the software using a model checker: SMV. Since the output of a sequential circuit is simply combinational logic on the inputs and the state, this paper does not consider the correctness of this logic. This can be done using available methods.
2 Designing MTT Circuits
The design of MTT circuits involves the following steps:
1. Describe the desired behavior of the circuit in a finite state machine. See Table 1(a) for the example used in this paper. (The example is taken from Maki & Sawin [7].)
2. Determine the number of stuck-at faults to be tolerated. The number of faults tolerated in this example is one.
3. Design the circuit. This can almost only be done by computer. How it is done is outside the scope of this paper; a possible result for the finite state machine in Table 1(a) is shown in Table 2. Tables 1(b) and (c) show the assignment of state vectors to states and input vectors to inputs. (The resulting circuit is close to that of Maki & Sawin [7]. The assignment of state vectors to states and input vectors to inputs in Tables 1(b) and (c) is the same.)
4. Optimize the circuit. The design proposed can be logically optimized before building, using available tools like SIS [10]. In this paper we perform the SMV proofs on a circuit that is not yet optimized. As can be easily seen, we could just as well have performed the proofs on the optimized circuit. By doing the latter, we would have also been independent of the possible presence of faults in the optimizer. An advantage of performing the proofs on the circuit before optimization is that it is possible to model the don’t cares. The proofs then show that the properties hold for all possible optimizations.
Table 1. (a) Flow table of the example finite state machine. State vector (b) and input vector (c) assignment for an asynchronous sequential circuit tolerating one stuck-at fault.

(a) State transitions:
State | I1 | I2
S1 | S1 | S4
S2 | S1 | S2
S3 | S3 | S2
S4 | S3 | S4

(b) State vector assignment:
State | y1 y2 y3 y4 y5 y6
S1 | 1 0 1 1 0 1
S2 | 1 0 1 0 1 0
S3 | 0 1 0 0 1 0
S4 | 0 1 0 1 0 1

(c) Input vector assignment:
Input | x1
I1 | 0
I2 | 1
Table 2. Assignment of state vectors to the state vector transition function. (Italics indicate state vectors only reachable by input changes during state transitions.) x=0 y1 y2 =00 y3 y4 y5 y6 00 00 01 11 10 010010 x=0 y1 y2 =10 y3 y4 y5 y6 00 00 01 101101 11 101001 10 101011 x=1 y1 y2 =00 y3 y4 y5 y6 00 00 01 11 10 101010 x=1 y1 y2 =10 y3 y4 y5 y6 00 00 01 110101 11 101001 10 101010
3 Concepts
01 11 10 010110 101010 010100 101101 101101 101001 010010 101010 101011 01 11 10 110101 101101 101101 101101 101101 101101 110101 101101 101101 101011 101001 01 11 10 010110 101010 010101 100101 101101 101010 010010 101010 101010 01 11 10 110101 100101 101010 010101 110101 100101 110101 100101 101010 101010 101010
x=0 y1 y2 =01 y3 y4 y5 y6 00 00 010010 01 010100 11 010010 10 010010 x=0 y1 y2 =11 y3 y4 y5 y6 00 00 01 010101 11 10 010010 x=1 y1 y2 =01 y3 y4 y5 y6 00 00 011010 01 010101 11 011010 10 001010 x=1 y1 y2 =11 y3 y4 y5 y6 00 00 01 010101 11 10 011010
01 11 10 010010 010110 001010 010110 010100 010010 001010 010010 010010 010010 01 11 10 010110 010100 101101 101101 010101 101001 010010 101011 01 11 10 010101 010110 001010 010101 010101 010101 001010 011010 001010 101010 01 11 10 010101 010101 010101 101101 010101 101001 010010 101010
Concepts Unrestricted
Brzozowski proposes the following definition for the concept ’finiteness’ [8]: only a finite, but possibly unbounded, number of signal changes can occur in any finite interval. He then derives the concept ’unrestricted’ as follows: an environment satisfying only the finiteness condition is called unrestricted. The definition is clear, but it does not state what the reaction of the circuit is to an unrestricted environment. When the finite state machine describing the behavior of the circuit is deterministic, it reacts to all signal changes. The realized circuit however will never be able to do that under all circumstances. When input signals change too quickly (e.g. spikes), a circuit might just ignore them. In that case the circuit is not a legitimate implementation of the finite state machine, since their behaviors under the same input are different.
Therefore, in an unrestricted environment, we have to assume that the finite state machine is not deterministic in the sense that some inputs may be ignored. A second observation is the following. Assume that two subsequent transitions occur, from state A to B to C. When this is done quickly, the circuit might for example ’step over’ state B (this is a consequence of design using multiple state transitions, but can be observed in other logic as well). The outside world only observes a transition from A to C. Therefore, we have to assume that intermediate states of subsequent state transitions might not be observable. The properties of the states of the circuit can then only be described under the assumption that input signals are eventually stable.
3.2 Stuck-at Fault
We use the following definition for stuck-at fault: A state variable remains the same value from a certain moment in time.
4 Modelling the Finite State Machine in SMV
The model of the finite state machine in SMV is very simple. In the array fsm it captures the states possibly entered, given the input sequence:

init(fsm):= [1,0,0,0]; \* S1 is the initial state *\
next(fsm[1]):= fsm[1] | (fsm[2] & ~x1);
next(fsm[2]):= fsm[2] | (fsm[3] & x1);
next(fsm[3]):= fsm[3] | (fsm[4] & ~x1);
next(fsm[4]):= fsm[4] | (fsm[1] & x1);
This model is based on the description of the finite state machine in Table 1(a), adding the non-determinism discussed above. The array fsm contains the states in which the circuit can be, given the input sequence.
5 Modelling the Circuit in SMV

5.1 Underlying Model
The general model of the circuit is the Mealy model. Figure 1 depicts the general model of a Mealy-type asynchronous sequential circuit. The circuit contains combinational logic, delay elements and wiring. We assume that the combinational logic as well as the wiring do not have delays. The delay in the combinational logic and the wiring is modelled in the delay elements. The input to the circuit is the vector X, containing p bits. For the example in this paper: p = 1. The state vector of the circuit is the vector Y , containing q bits. For the example in this paper: q = 6.
Fig. 1. Model of a Mealy-type asynchronous sequential circuit.
The combinational logic computes the state vector transition function as a function of the input and the current state vector, resulting in the vector Y′ = δY(Y, X). Of course Y′ also contains q bits.
5.2 On Modelling Time
Modelling time in asynchronous logic for a model checker is difficult. To capture all possible behaviors, it is necessary to describe all possible sequences of possible input and state vectors. This is only possible using incremental steps. Each step describes a possible next state of the circuit. The time interval between these steps can be very small or very large. To model this difference between real time and the steps the actual circuit takes, we use the variable time_tick. Its value is false when the duration of a sample period is negligible, and true when it isn’t. The value of time_tick is used to synchronize the delay element modules. The value of time_tick is random.
5.3 Combinational Logic
Implementation of the combinational logic in SMV is straightforward. The table is converted into a case statement. The empty spaces in the table translate to don’t cares. Table 3 shows the beginning of the module for the first bit of the
Table 3. Module for the combinational logic of bit 1 of the state vector transition function.

module dy1(x1, y, dy1) {
  INPUT x1: boolean;
  INPUT y: array 1..6 of boolean;
  OUTPUT dy1: boolean;

  case{
    x1=0 & y=[0,0,0,0,0,0]: dy1:={0,1};
    x1=0 & y=[0,0,0,0,0,1]: dy1:={0,1};
    x1=0 & y=[0,0,0,0,1,0]: dy1:=0;
    x1=0 & y=[0,0,0,0,1,1]: dy1:={0,1};
    x1=0 & y=[0,0,0,1,0,0]: dy1:=0;
    x1=0 & y=[0,0,0,1,0,1]: dy1:=0;
    x1=0 & y=[0,0,0,1,1,0]: dy1:=0;
    x1=0 & y=[0,0,0,1,1,1]: dy1:={0,1};
    . . .
  }
}
state vector transition function. The other bits of the state vector have similarly been modelled in modules dy2 . . . dy6.
5.4 Delay Logic, Feedback
A physical delay element normally has a basis in the time domain. When a rising edge arrives at the input, the output will eventually show a rising edge. The time between these events might not always be exactly the same, but in most cases it is possible to determine a lower and an upper bound. It is not possible to exactly model the time behavior, but a fairly abstract model seems to be sufficient for our proofs. The model proposed uses the time ticks provided by the main module. The module in Table 4 realizes the delay element. The model can be compared to an RC-filter. The capacitor is charged when there is a time tick. The feed to the capacitor is random, but not larger than max_feed. The feed to the capacitor is random to model differences in the delay elements. One feedback loop might be faster than another. The randomness also captures the possible differences in the duration of the steps in the SMV model. The delay element contains another random element. When a capacitor is charged exactly to its maximum, the delay element is allowed to randomly choose the moment to change its value to 1: when the variable random becomes true, the output of the delay element changes. When a new time tick arrives, the capacitor is charged again and the output of the delay element changes to 1. (A similar
Table 4. Delay module.
module delay(_in, time_tick, stuck_at, _out) {
  INPUT _in, time_tick, stuck_at: boolean;
  OUTPUT _out: boolean;
  charge: 0..max_charge;
  feed: -max_feed..max_feed;
  random: boolean;

  case{
    time_tick = 0: feed:= 0;
    time_tick = 1 & _in=1: feed:= 1..max_feed;
    time_tick = 1 & _in=0: feed:= -(1..max_feed);
  }

  case{
    (charge + feed) > max_charge: next(charge):= max_charge;
    (charge + feed) < min_charge: next(charge):= min_charge;
    default: next(charge):= charge + feed;
  }

  case{
    stuck_at: next(_out):= _out;
    (charge + feed) > max_charge: next(_out):= 1;
    (charge + feed) < min_charge: next(_out):= 0;
    (charge + feed) = max_charge & random: next(_out):= 1;
    (charge + feed) = min_charge & random: next(_out):= 0;
    default: next(_out):= _out;
  }
}
behavior applies for input 0.) This is to account for the fact that when more than one delay element is to fire, the order of events might be unpredictable. The model of the delay element also incorporates stuck-at faults. When a stuck-at fault occurs, the output of the delay element will retain the same value. The following values of the variables were used for the proofs: max_feed=2, min_charge=0, and max_charge=2. We also tried other values and found no combination of values for which the proofs don’t hold. For values above 3 for max_feed and max_charge, SMV becomes very slow for properties like zone preserve behavior or does not complete the proof at all. Simple properties like no undefined state vectors can even be proved with max_feed=30 and max_charge=30. We assume that the values of these variables do not influence the correctness of the proofs. We think our model of the delay elements is realistic and that it models reasonable behavior. The model in fact only prevents a delay element from
members' past events for too long, by synchronizing the delay elements through time ticks. An example might clarify this. Suppose charge=max_charge, _out=0 and _in is and remains 0. Without synchronization, _out can become 1 at a random moment in time. With synchronization, _out can still randomly become 1, but only until the arrival of a sample with a longer-than-negligible duration (time_tick=1). When this sample arrives, the delay element will discharge with a feed of at least 1, and _out can no longer become 1 without the arrival of a new 1 at the input.

5.5 The SMV-Model
The circuit itself can now be modelled using the modules described above; see Table 5. The SMV model first describes the behavior of the finite state machine. The initial state of the finite state machine is state S1. It then instantiates the logic and the delay elements, connecting them according to Figure 1. The initial state vector of the circuit is [1,0,1,1,0,1], the state vector assignment of state S1 (see Table 1(b)). The initialization is done by selecting the initial state of the delay elements. We assume that the bits of the state vector of the circuit are quiescent at 1 or 0 at start-up, and choose the corresponding states of the delay elements.
6 Proving Properties of the Circuit
For readability we use the following abbreviations for the state vectors of the states:

#define sa_s1 [1,0,1,1,0,1]
#define sa_s2 [1,0,1,0,1,0]
#define sa_s3 [0,1,0,0,1,0]
#define sa_s4 [0,1,0,1,0,1]
The prefix 'sa_' stands for 'state assignment'.

6.1 Assumptions
In some proofs we need assumptions. The first assumption concerns time. It is assumed that a time tick will always eventually occur. This is captured in the statement:

fair_time: assert G F time_tick;
This assumption can be seen as a fairness constraint. A fairness constraint is a CTL formula which is assumed to be true infinitely often in all fair execution paths [11].
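Concretely, read over a single execution path $\pi = s_0 s_1 s_2 \ldots$, the formula G F time_tick holds exactly when time_tick is true infinitely often along that path (this is the standard temporal-logic reading of G, "always", and F, "eventually"):

$$\pi \models \mathbf{G}\,\mathbf{F}\;\mathit{time\_tick} \;\iff\; \forall i \ge 0.\ \exists j \ge i.\ s_j \models \mathit{time\_tick}$$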
Table 5. Main module describing the finite state machine and the circuit.

module main(x1, dy, y) {
  INPUT x1: boolean;
  INPUT dy: array 1..6 of boolean;
  OUTPUT y: array 1..6 of boolean;
  fsm: array 1..4 of boolean;
  stuck_at: array 1..6 of boolean;

  /* The fsm array contains the states in which the circuit
     could be given the input sequence */
  init(fsm) := [1,0,0,0];                 /* S1 is the initial state */
  next(fsm[1]) := fsm[1] | (fsm[2] & ~x1);
  next(fsm[2]) := fsm[2] | (fsm[3] & x1);
  next(fsm[3]) := fsm[3] | (fsm[4] & ~x1);
  next(fsm[4]) := fsm[4] | (fsm[1] & x1);

  /* The initial state of the circuit is set to S1=[1,0,1,1,0,1]
     with all delay elements in their stable state */
  init(delay1.charge) := max_charge;  init(delay1._out) := 1;
  init(delay2.charge) := min_charge;  init(delay2._out) := 0;
  init(delay3.charge) := max_charge;  init(delay3._out) := 1;
  init(delay4.charge) := max_charge;  init(delay4._out) := 1;
  init(delay5.charge) := min_charge;  init(delay5._out) := 0;
  init(delay6.charge) := max_charge;  init(delay6._out) := 1;

  /* This is the description of the mealy-type circuit,
     first the logic, then the delay elements with the feedback */
  logic1: dy1(x1, y, dy[1]);
  . . .
  logic6: dy6(x1, y, dy[6]);

  delay1: delay(dy[1], time_tick, stuck_at[1], y[1]);
  . . .
  delay6: delay(dy[6], time_tick, stuck_at[6], y[6]);
}
Another assumption (not a fairness constraint) needed in some proofs is that the input eventually becomes stable. We need this, for example, in proofs that the circuit itself is stable. This is captured in the assumption assume_input_stable:

assume_input_stable: assert (F G x1) | (F G ~x1);
This simply states that eventually the input will always be high or always be low. The last assumptions are about the presence of stuck-at faults. When no stuck-at faults are present, we use:

assume_no_stuck_at: assert G stuck_at=[0,0,0,0,0,0];
When, on the other hand, we assume a stuck-at fault, for example in the first bit of the state vector, we use:

assume_stuck_at1: assert ( F G stuck_at[1]
                         & G ~(stuck_at[1] & X ~stuck_at[1])
                         & G ~stuck_at[2] & G ~stuck_at[3] & G ~stuck_at[4]
                         & G ~stuck_at[5] & G ~stuck_at[6]);
This assumption states that the first bit will eventually show a stuck-at fault, but functions normally before that. All other bits never contain a stuck-at fault. Equivalent assumptions can be formulated for the other bits of the state vector.

6.2 The Circuit Does Not Reach Undefined State Vectors
The property no_undefined_state_vectors states that the state vector will never become undefined. (The undefined state vectors do not have the state vector transition function defined in Table 2.)

no_undefined_state_vectors: assert G ~( y=[0,0,0,0,0,0] | y=[0,0,0,0,0,1] | y=[0,0,0,0,1,1] |
                                        y=[0,0,0,1,1,1] | y=[0,0,1,1,0,0] | y=[0,0,1,1,1,1] |
                                        y=[0,1,1,1,1,1] | y=[0,1,1,0,0,1] | y=[1,0,0,0,0,0] |
                                        y=[1,0,0,1,1,0] | y=[1,1,0,0,0,0] | y=[1,1,0,0,1,1] |
                                        y=[1,1,1,1,0,0] | y=[1,1,1,1,1,1] | y=[1,1,1,1,1,0] |
                                        y=[1,1,1,0,0,0]);
The proof of this property holds without assumptions.

6.3 The Circuit Is Stable
The property stable_state_vector states that the state vector will be stable under the assumption assume_input_stable that the input vector eventually stabilizes.

stable_state_vector: assert F G ( (y[1] = X y[1]) & (y[2] = X y[2]) & (y[3] = X y[3]) &
                                  (y[4] = X y[4]) & (y[5] = X y[5]) & (y[6] = X y[6]) );
using assume_input_stable prove stable_state_vector;
The CTL formula states that eventually every state vector will be equal to the next state vector; this is equivalent to stating that, from some point on, it always remains the same. A second property related to the stability of the circuit is end_state_vector:

end_state_vector: assert F G (y=sa_s1 | y=sa_s2 | y=sa_s3 | y=sa_s4);
using assume_input_stable, fair_time, assume_no_stuck_at prove end_state_vector;
It states that when the circuit eventually stabilizes, it does so at the state vector of one of the states. This property only holds under the constraints that the input becomes stable and that time proceeds.

6.4 The Circuit Realizes the Finite State Machine
The property preserve_behavior states that the state vectors eventually reached are always possible outcomes given the input sequence.

preserve_behavior: assert F G ( y=sa_s1 & fsm[1] | y=sa_s2 & fsm[2]
                              | y=sa_s3 & fsm[3] | y=sa_s4 & fsm[4]);
using assume_input_stable, fair_time, assume_no_stuck_at prove preserve_behavior;
The array fsm captures the possible states of the finite state machine. When the input vector stabilizes (assume_input_stable), the circuit has to stabilize in one of these states. We also need fair_time and assume_no_stuck_at here, because the circuit might otherwise wait in an intermediate state forever.
7 The Circuit Is Stuck-at Fault Tolerant
Now we will prove the same properties under the assumption that the circuit contains a stuck-at fault in one of the bits of the state vector. The main difference from the proofs above is that, due to a stuck-at fault, the state vector might not become one of the state vectors assigned to a state (see Table 1). The state vector is also allowed to become one of the state vectors with a Hamming distance of one to that state vector. We call the set of vectors with Hamming distance zero or one to a state vector the state vector zone. Every state vector of a state has its state vector zone. The following zone is defined for state S1:
#define zone_s1 ([1,0,1,1,0,1] union [0,0,1,1,0,1] union [1,1,1,1,0,1] union
                 [1,0,0,1,0,1] union [1,0,1,0,0,1] union [1,0,1,1,1,1] union
                 [1,0,1,1,0,0])
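In general, the zone of a state s is the Hamming ball of radius one around its state assignment sa(s); written out, with $d_H$ denoting Hamming distance,

$$\mathrm{zone}(s) \;=\; \{\, v \in \{0,1\}^{6} \;:\; d_H(v, \mathrm{sa}(s)) \le 1 \,\}$$

so each zone contains exactly seven of the 64 possible state vectors.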
Equivalent zones have been defined for states S2, S3 and S4.

7.1 The Circuit Does Not Reach Undefined State Vectors
The proof of the property no_undefined_state_vectors does not contain any assumptions on the presence of stuck-at faults. Therefore the property holds independently of timing and independently of the number of stuck-at faults in the state vector.

7.2 The Circuit Is Stable
The property stable_state_vector is proved for the general circuit without assuming anything about stuck-at faults. Therefore, the circuit is stable under the assumption of the presence of an arbitrary number of stuck-at faults. The associated property that the circuit stabilizes in a state vector assigned to a state has to be relaxed: the circuit has to stabilize in the state vector zone of a state.

zone_end_state_vector: assert F G ( y in zone_s1 | y in zone_s2
                                  | y in zone_s3 | y in zone_s4);
using assume_stuck_at1, assume_input_stable, fair_time prove zone_end_state_vector;
This is the proof for a stuck-at fault in bit 1 of the state vector. Equivalent proofs hold for all state vector bits. It can also be proved that the property does not hold when two bits are allowed to have a stuck-at fault.

7.3 The Circuit Realizes the Finite State Machine
The proof of the fact that the circuit realizes the finite state machine is essentially the same as for the case without stuck-at faults. The only difference is that the state vector is required to end in the state vector zone of a state, rather than in the state vector assigned to the state itself.

zone_preserve_behavior: assert F G ( y in zone_s1 & fsm[1] | y in zone_s2 & fsm[2]
                                   | y in zone_s3 & fsm[3] | y in zone_s4 & fsm[4]);
using assume_stuck_at1, fair_time, assume_input_stable prove zone_preserve_behavior;
Equivalent proofs hold for stuck-at faults in the other bits of the state vector.
8 Delay Logic Revisited
The model of the delay element contains one assumption that needs further investigation: when time_tick=1, feed is never equal to zero. This assumption is necessary, for example, for the proof of the property no_undefined_state_vectors. When we change the case statement that determines feed to:

case{
  time_tick = 0:           feed := 0;
  time_tick = 1 & _in=1:   feed := 0..max_feed;
  time_tick = 1 & _in=0:   feed := -(0..max_feed);
}
SMV generates the following counterexample:

#  Y              X  Y'             time_tick  Delay 2 (in/out/charge)  Delay 3 (in/out/charge)
1  [1,0,1,1,0,1]  1  [1,1,0,1,0,1]  1          1 / 0 / 0                0 / 1 / 2
2  [1,0,0,1,0,1]  0  [1,0,1,1,0,1]  1          0 / 0 / 2                1 / 0 / 0
3  [1,1,0,1,0,1]  0  [0,1,0,1,0,0]  1          1 / 1 / 2                0 / 0 / 2
4  [1,1,1,1,0,0]  -  [-,-,-,-,-,-]  1          - / 1 / 2                - / 1 / 2
State vector [1,1,1,1,0,0] in sample 4 is an undefined state vector. The problem is the interaction between the second and the third bit. In sample 2, the input to delay element 2 is 0, but it is not discharged at all. In the same sample, the input to delay element 3 is 1 and it is completely charged. It is plausible to forbid this from happening: when the duration of a sample is so long that one delay element can charge completely, it is reasonable to assume that the other delay elements also charge or discharge a little (at least 1) in the same sample.
9 Conclusions
The correctness of designs of unrestricted, stuck-at fault tolerant, asynchronous sequential circuits can be checked using the model checker SMV. The approach is general, and is explained using an example. The properties can be proved under reasonable constraints and assumptions in the model. The design of the SMV specification of the delay elements is important, since it may contain many hidden assumptions about the functioning of the circuit. A very abstract model turned out to be sufficient; the only necessary assumption is some synchronization between the delay elements in the circuit.
10 Future Research
We proved properties of the circuit using max_feed=2 and max_charge=2. We tried some other values, and we believe the proofs hold for all values. It might be possible to verify this claim using a theorem prover.
More demanding future research would involve a proof of the unrestricted and fault tolerant properties based on the design rules for MTT circuits. At the moment we have a strong intuition that our design rules always lead to the desired properties. The fact that we can prove that circuits designed using these rules show the desired properties supports this intuition, but we cannot yet make a general claim.
References

1. J.F. Meyer: Fault Tolerant Sequential Machines. In: IEEE Transactions on Computers, Vol. C-20, No. 10, October 1971.
2. Y. Tohma, Y. Ohyama & R. Sakai: Realization of Fail-Safe Sequential Machines by Using a k-out-of-n Code. In: IEEE Transactions on Computers, Vol. C-20, No. 11, November 1971.
3. W.W. Patterson & G.A. Metze: A Fault-Tolerant Asynchronous Sequential Machine. In: Int. Symp. on Fault Tolerant Computing, pp. 176-181, 1972.
4. D.K. Pradhan & S.M. Reddy: Fault-Tolerant Asynchronous Networks. In: IEEE Transactions on Computers, Vol. C-22, No. 7, July 1973.
5. W.W. Patterson & G. Metze: A Fail-Safe Asynchronous Sequential Machine. In: IEEE Transactions on Computers, Vol. C-23, No. 4, April 1974.
6. D.H. Sawin & G.K. Maki: Asynchronous Sequential Machines Designed for Fault Detection. In: IEEE Transactions on Computers, Vol. C-23, No. 3, March 1974.
7. G.K. Maki & D.H. Sawin: Fault-Tolerant Asynchronous Sequential Machines. In: IEEE Transactions on Computers, Vol. C-23, No. 7, July 1974.
8. J.A. Brzozowski & C.-J. H. Seger: Asynchronous Circuits. Monographs in Computer Science, Springer-Verlag, New York, 1995.
9. P.K. Lala: Fault Tolerant & Fault Testable Hardware Design. Prentice/Hall International, 1985.
10. E.M. Sentovich et al.: SIS: A System for Sequential Circuit Synthesis. University of California, Berkeley, Electronics Research Laboratory, Memorandum No. UCB/ERL M92/41, 4 May 1992.
11. K.L. McMillan: The SMV Language. Cadence Berkeley Labs, 1999.
12. P.K. Lala: Self-Checking and Fault-Tolerant Digital Design. Morgan Kaufmann Publishers, 2001.
Functional Design Using Behavioural and Structural Components

Richard Sharp

University of Cambridge Computer Laboratory
William Gates Building, JJ Thomson Avenue
Cambridge CB3 0FD, UK
[email protected]
Abstract. In previous work we have demonstrated how the functional language SAFL can be used as a behavioural hardware description language. Other work (such as µFP and Lava) has demonstrated that functional languages are apposite for structural hardware description. One of the strengths of systems such as VHDL and Verilog is their ability to mix structural- and behavioural-level primitives in a single specification. Motivated by this observation, we describe a unified framework in which a stratified functional language is used to specify hardware across different levels of abstraction: Lava-style structural expansion is used to generate acyclic combinatorial circuits; these combinatorial fragments are composed at the SAFL-level. We demonstrate the utility of this programming paradigm by means of a realistic case-study. Our tools have been used to specify, simulate and synthesise a DES encryption/decryption circuit. Area-time performance figures are presented. Finally, we show how similar integration techniques can be used to embed languages such as Magma/Lava into industrial HDLs such as Verilog and VHDL. Our methodology offers significant advantages over the “Perlscript” technique so commonly employed in practice.
1 Introduction
Hardware description languages (HDLs) are often categorised according to the level of abstraction they provide. Behavioural HDLs focus on algorithmic specification and attempt to abstract away as many low-level implementation issues as possible. Most behavioural HDLs support constructs commonly found in high-level programming languages (e.g. assignment, sequencing, conditionals and iteration). In contrast, structural HDLs allow a hardware engineer to describe a circuit by specifying its hardware-level components and their interconnections. The process of automatically translating a behavioural HDL into a structural HDL is often referred to as high-level synthesis. Commercially, the two most important HDLs are Verilog and VHDL [7,8]. A contributing factor to the success of these systems is their support for both behavioural- and structural-level design. The ability to combine behavioural
and structural primitives in a single specification offers engineers a powerful framework: when the precise low-level details of a component are not critical, behavioural constructs can be used; for components where finer-grained control is required, structural constructs can be used (note the analogy with embedding assembly code in a higher-level software language). However, the flip-side is that, by supporting multiple levels of abstraction, both Verilog and VHDL are very large languages which are difficult to analyse, transform and reason about. In previous work we have designed SAFL [12], a behavioural HDL which supports a functional programming style. An optimising high-level synthesis system has been implemented which compiles SAFL specifications into structural Verilog [7]. (We map the generated Verilog to silicon using commercially available RTL compilers.) Other researchers have demonstrated that functional languages are powerful tools for structural hardware specification [20,15,2]. In this paper we present a system which integrates both structural- and behavioural-level hardware design in a pure functional framework. Our technique involves embedding a functional language designed for structural hardware description into SAFL. The remainder of this paper is structured as follows: after surveying related work (Section 2) we give a brief overview of the SAFL language (Section 3). Our mechanism for embedding Lava-style structural expansion in SAFL is then presented (Section 4); this methodology is demonstrated by means of a realistic case-study in which a fully functional DES encrypter/decrypter is specified (Section 5). We go on to describe how similar integration techniques can be used to embed languages such as Lava into industrial HDLs such as Verilog and VHDL (Section 6) and argue that such integration leads to a system more powerful than Verilog/VHDL alone. Finally, Section 7 concludes and outlines directions for future work.
2 Related Work
There is a large body of work on using functional languages to describe hardware at the structural level. Notable systems in this area include µFP [20], HDRE/Hydra [15], Hawk [10] and Lava [2]. The central idea behind each of these systems is to use the powerful features found in existing functional languages (e.g. higher-order functions, polymorphism and lazy evaluation) to build up netlists from simple primitives. These primitives can be given different semantic interpretations allowing, for example, the same specification to be either simulated or translated into a netlist. However, whilst this technique is obviously appealing, there are problems involved in generating netlists for circuits that contain feedback loops. The difficulty is that, in a pure functional language, a cyclic circuit (expressed as a series of mutually recursive equations) naturally evaluates to an infinite tree, preventing the netlist translation phase from terminating. A number of solutions to this problem have been proposed: O'Donnell advocates the explicit tagging of components at the source-level [16]. In this system the programmer is responsible for labelling distinct components of a circuit with
unique values. Whilst this allows a pure functional graph traversal algorithm to detect cycles trivially (by maintaining a list of tags which have already been seen) it imposes an extra burden on the programmer and significantly increases potential for manual error (since it is the programmer’s job to ensure that distinct components have unique tags). Lava [2] also uses tagging to identify cycles, but employs a state monad [21] to generate fresh tags automatically. Although this neatly abstracts the low-level tagging details from the designer, Claessen and Sands [4] argue that the resulting style of programming is “unnatural” and “inconvenient”. In the same paper, Claessen and Sands propose another solution which involves augmenting Haskell (the functional language in which Lava is embedded) with immutable references which support a test for equality. This extension makes graph sharing observable at the source-level but, although it is shown that many useful laws still hold, full equational reasoning is no longer possible—for example, β-reduction no longer preserves equality. In this paper we present an alternative approach. By only allowing the description of acyclic circuits through Lava-style structural static expansion and then combining these circuit fragments at the SAFL-level we facilitate the pure functional specification of complex circuits which can contain feedback loops. We have not solved the observable sharing problem; instead we have eliminated it: since cycles are not permitted at the structural level we do not have to worry about infinite loops being statically expanded. Conversely, since feedback loops are represented as tail-recursive calls at the SAFL-level there is no need to introduce impure language features. Although most of the work on using functional languages for hardware description focuses on the structural level some researchers have considered using functional languages for behavioural hardware description. Johnson’s Digital Design Derivation (DDD) system [3] uses a scheme-like language to describe circuit behaviour. A series of semantics-preserving transformations are presented which can be used to refine a behavioural specification into a circuit structure; the transformations are applied manually by an engineer. This is a different approach from hardware design using SAFL [12]. Although we advocate the use of source-level transformations to explore architectural tradeoffs (including allocation, binding and scheduling [6]), SAFL specifications are translated to hardware automatically using our optimising silicon compiler.
3 Overview of the SAFL Language
SAFL has syntactic categories e (term) and p (program). First suppose that v ranges over a set of constants. Let x range over variables (occurring in let declarations or as formal parameters), a over primitive functions (such as addition) and f over user-defined functions. For typographical convenience we abbreviate formal parameter lists (x1 , . . . , xk ) and actual parameter lists (e1 , . . . , ek ) to x and e respectively; the same abbreviations are used in let definitions. Then the abstract syntax of the core SAFL language can be given in terms of recursion equations on programs, p, and expressions, e:
e ::= v | x | if e1 then e2 else e3 | let x = e in e0
    | a(e1, ..., e_arity(a)) | f(e1, ..., e_arity(f))

p ::= fun f1(x) = e1  ...  fun fn(x) = en
It is sometimes convenient to extend this syntax slightly. In later examples we use a case-expression instead of iterated tests; we also write e[n:m] to select a bit-field [n..m] from the result of expression e (where n and m are integer constants). There is a syntactic restriction that whenever a call to function fj from function fi is part of a cycle in the call graph of p then we require the call to be a tail call.2 (Note that calls to a function not forming part of a cycle can occur in an arbitrary expression context.) This ensures that storage for the variables and temporaries of p can be allocated statically: in software terms the storage is associated with the code of the compiled function; in hardware terms it is associated with the logic to evaluate the function body. The other main feature of SAFL, apart from static allocatability, is that its evaluation is limited only by data flow (and control flow at user-defined function calls and conditionals). Thus, in the form let x = (e1, ..., ek) in e0 or in a call f(e1, ..., ek) or a(e1, ..., ek), all the ei (1 ≤ i ≤ k) are evaluated concurrently. In the conditional if e1 then e2 else e3 we first evaluate (only) e1; one of e2 or e3 is evaluated after its result is known. SAFL has call-by-value semantics since eager evaluation offers a greater opportunity for parallelism (i.e. we can execute a function call's arguments in parallel without worrying about strictness). Although up to this point we have referred to SAFL as a behavioural language, it is also capable of capturing some structural aspects of a design. We say that SAFL is resource-aware to indicate that a single user-defined function definition at the source-level corresponds to a single hardware resource at the circuit-level. In this context multiple calls to the same function correspond to resource sharing.3 We use SAFL-level transformations to express architectural tradeoffs such as resource duplication/sharing and hardware/software co-design [13]. In essence these transformations preserve a specification's extensional semantics (the result returned) whilst changing the intensional semantics (how the circuit is structured). A more in-depth description of the SAFL language and its associated silicon compiler can be found in our recent survey paper [14]. For the purposes of this document we provide a short example which illustrates the main points:

fun mult(x:16, y:16, acc:32):32 =
  if (x=0 | y=0) then acc
  else mult(x<<1, y>>1, if y[0:0] then acc+x else acc)

fun f(x:16):32 = mult(x, x, 0) + mult(13, x, 0)

2 Tail calls consist of calls forming the whole of a function body, or nested solely within the bodies of let-in expressions, or that are the consequents of if-then-else expressions.
3 Our optimising compiler automatically deals with sharing issues by statically scheduling access to resources where it can, and generating arbiters to perform scheduling dynamically otherwise [19].
From this specification, two hardware resources are generated: a circuit, Hmult , corresponding to mult and a circuit, Hf , corresponding to f. The two calls to mult are not inlined: at the hardware level there is only one shared resource, Hmult , which is invoked twice by Hf . The tail-recursive call in the definition of mult is synthesised into a feedback loop at the circuit level. Since function arguments are evaluated concurrently, the two shift operations occurring in the recursive call to mult are evaluated in parallel along with the conditional test and possibly, depending on the conditional branch taken, the addition operation. Each SAFL variable is annotated with a bit-width at its point of introduction. We use the form x:w to indicate that variable x has width w. Note that the widths of function result types are also specified explicitly (using the form fun f(...):w). Widths of constants can either be specified explicitly or, more usually, inferred from their local context. As part of a simple type-checking phase our SAFL compiler ensures that for each function call, f (x ), the widths of arguments, x match those specified in the signature of f .
4 Embedding Structural Expansion in SAFL
Resource awareness allows SAFL to describe the system-level structure of a design by mapping fun declarations to circuit-level functional units. In contrast, systems such as µFP and Lava offer much finer-grained control over circuit structure, taking logic-gates (rather than function definitions) as their structural primitives. We are not arguing that either approach is better: in practice both are appropriate depending on the type of hardware that is being designed. Motivated by this observation, we present a framework which integrates Lava-style structural expansion with SAFL. Section 4.1 outlines our system for fine-grained structural hardware description which, for the purposes of this paper, we will refer to as Magma (so named as it is a restricted form of Lava). In Section 4.2 we show how Magma is integrated with SAFL.

4.1 Building Combinatorial Hardware in Magma
An argument in favour of Lava, Hydra and other similar systems is that, since they are embedded in existing functional languages, they are able to leverage existing tools and compilers. Furthermore, use of non-standard interpretation of basis functions means that the same compiler can be used to perform both hardware simulation and synthesis. These compelling benefits lead us to adopt a similar approach. However, in contrast to Lava, which is embedded in Haskell [1], we choose to embed Magma in ML [11]. The choice of ML is fitting for two main reasons: firstly, since we only wish to describe acyclic circuits, ML's strict evaluation is appropriate for both simulation and synthesis interpretations; secondly, since SAFL also borrows much of its syntax and semantics from ML, both Magma and SAFL share similar conventions (an important consideration
when we are dealing with specifications containing a mixture of both Magma and SAFL). In the remainder of this section we assume that the reader is familiar with the ML module system (signatures, structures and functors). A good overview of the ML module system can be found in Paulson's textbook [17].

signature BASIS =
sig
  type bit
  val b0   : bit
  val b1   : bit
  val orb  : bit * bit -> bit
  val andb : bit * bit -> bit
  val notb : bit -> bit
  val xorb : bit * bit -> bit
end
Fig. 1. The definition of the BASIS signature (from the Magma library)
The Magma system essentially consists of a library of ML code. A signature called BASIS is provided which declares the types of supported basis functions (see Figure 1). Values b0 and b1 correspond to logic-0 (false) and logic-1 (true) respectively. Functions orb, andb, notb and xorb correspond to logic functions or, and, not and xor. Two structures which implement BASIS are provided:
– SimulateBasis provides a simulation interpretation. We implement bits as boolean values; functions orb, andb etc. have their usual boolean interpretations.
– SynthesisBasis provides a synthesis interpretation. We implement bits as strings representing names of wires in a net-list. Functions orb, andb etc. take input wires as arguments and return a (fresh) output wire. Calling one of the basis functions results in its netlist declaration being written to the selected output stream as a side-effect. For example, if the result of calling andb with string arguments "in_wire1" and "in_wire2" is the string "out_wire" then the following is output to StdOut:

and(out_wire,in_wire1,in_wire2);

Figure 2 shows a Magma specification of a ripple-adder. As with all Magma programs, the main body of code is contained within an ML functor. This provides a convenient abstraction, allowing us to parameterise a design over its basis functions. By passing in the structure SimulateBasis (see above) we are able to instantiate a copy of the design for simulation purposes; similarly, by passing in SynthesisBasis we instantiate a version of the design which, when executed, outputs its netlist. The signature RP_ADD is used to specify the type of the ripple_add function. Using this signature to constrain the RippleAdder functor also means that only the ripple_add function is externally visible; the
signature RP_ADD =
sig
  type bit
  val ripple_add : (bit list * bit list) -> bit list
end

functor RippleAdder (B:BASIS):RP_ADD =
struct
  type bit = B.bit

  fun adder (x,y,c_in) =
      (B.xorb(c_in, B.xorb(x,y)),
       B.orb( B.orb( B.andb(x,y), B.andb(x,c_in)), B.andb(y,c_in)))

  fun carry_chain f _ ([],[]) = []
    | carry_chain f c_in (x::xs,y::ys) =
        let val (res_bit, c_out) = f (x,y,c_in)
        in res_bit::(carry_chain f c_out (xs,ys)) end

  val ripple_add = carry_chain adder B.b0
end
Fig. 2. A simple ripple-adder described in Magma
functions carry_chain and adder can only be accessed from within the functor. Note that the use of signatures to specify interfaces in this way is not compulsory but, for the usual software-engineering reasons, it is recommended. Let us imagine that a designer has just written the ripple-adder specification shown in Figure 2 and now wants to test it. This can be done by instantiating a simulation version of the design in an interactive ML session:

- structure SimulateAdder = RippleAdder (SimulateBasis);

The adder can now be tested by passing in arguments (a tuple of bit lists) and examining the result. For example:

- SimulateAdder.ripple_add ([b1,b0,b0,b1,b1,b1], [b0,b1,b1,b0,b1,b1]);
val it = [b1,b1,b1,b1,b0,b1] : SimulateAdder.bit list

Let us now imagine that the net-list corresponding to the ripple-adder is required. We start by instantiating a synthesis version of the design:

- structure SynthesiseAdder = RippleAdder (SynthesisBasis);

If we pass in lists of input wires as arguments, the ripple_add function prints its netlist to the screen and returns a list of output wires:
- SynthesiseAdder.ripple_add (Magma.new_bus 5, Magma.new_bus 5);
val it = ["w_149","w_150","w_151","w_152","w_153"]

with output:

and(w_1,w_45,w_46);
and(w_2,w_1,w_44);
...
and(w_149,w_55,w_103);

The function new_bus, part of the Magma library, is used to generate a bus of given width (represented as a list of wires).
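The paper does not show the bodies of the two basis structures themselves. The following is a minimal sketch of how they might be written, purely for illustration: the use of booleans for simulation and of wire-name strings plus a fresh-name counter for synthesis follows the description above, but the constant wire names ("gnd", "vcc"), the gate helper and the exact printing format are assumptions, not the actual Magma library code.

structure SimulateBasis : BASIS =
struct
  type bit = bool
  val b0 = false
  val b1 = true
  fun orb  (x, y) = x orelse y
  fun andb (x, y) = x andalso y
  fun notb x      = not x
  fun xorb (x, y) = x <> y
end

structure SynthesisBasis : BASIS =
struct
  type bit = string                  (* a bit is the name of a wire *)
  val b0 = "gnd"                     (* assumed names for the constant wires *)
  val b1 = "vcc"

  val counter = ref 0
  fun fresh () =                     (* generate a fresh wire name, e.g. "w_42" *)
    (counter := !counter + 1; "w_" ^ Int.toString (!counter))

  (* emit one netlist primitive as a side effect and return its output wire *)
  fun gate name ins =
    let val out = fresh ()
    in print (name ^ "(" ^ String.concatWith "," (out :: ins) ^ ");\n"); out end

  fun orb  (x, y) = gate "or"  [x, y]
  fun andb (x, y) = gate "and" [x, y]
  fun xorb (x, y) = gate "xor" [x, y]
  fun notb x      = gate "not" [x]
end

On this reading, Magma.new_bus would simply be built on the same fresh-name generator, returning a list of n fresh wire names (e.g. List.tabulate (n, fn _ => fresh ())).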
4.2 Integrating SAFL and Magma
Our approach to integrating Magma and SAFL involves using delimiters to embed Magma code fragments inside SAFL programs. At compile time the embedded Magma is synthesised and the resulting netlist is incorporated into the generated circuit (see Figure 3). We use delimiters “<%” and “%>” to mark the start and end points of Magma code fragments. Our compiler performs type checking across the SAFL-Magma boundary, ensuring the validity of the final design.
[Figure 3 (not reproduced here) depicts two communicating processes: an ML session, which executes Magma under its synthesis interpretation, and the SAFL compiler, which sends each Magma code fragment it encounters to the ML session and receives the generated Verilog back as compilation proceeds over time.]

Fig. 3. A diagrammatic view of the steps involved in compiling a SAFL/Magma specification
The SAFL parser is extended to allow a special type of Magma code fragment which, if present, must appear at the beginning of a specification. This Magma fragment, which is referred to as the library block, contains an ML functor called Magma_Code. Functions within Magma_Code can be called from other Magma fragments in the remainder of the specification. Figure 4 illustrates these points with a simple example in which the Magma ripple adder (initially defined in Figure 2) is invoked from a SAFL specification. The precise details of the SAFL-Magma integration are discussed later in this section; for now it suffices to observe that
(* Magma library block containing Magma_Code functor ------ *)
<%
  signature RP_ADD = ...                      (* as in Figure 2 *)

  functor Magma_Code (B:BASIS):RP_ADD = ...   (* as RippleAdder functor in Figure 2 *)
%>
(* --------- End of Magma Library Block ------------------- *)

(* SAFL function declaration: *)
fun mult(x, y, acc) =
  if (x=0 | y=0) then acc
  else mult(x<<1, y>>1, if y[0] then <% ripple_add %>(acc,x) else acc)

Fig. 4. A simple example of integrating Magma and SAFL into a single specification
Magma fragments are treated as functions at the SAFL-level and applied to SAFL expressions. The treatment of Magma fragments is similar to that of primitive functions (such as +, -, * etc.). In particular, Magma code fragments are expanded in place. For example, if a specification contains two Magma fragments of the form <% ripple_add %>, then the generated hardware contains two separate ripple adders. Note that if we require a shared ripple adder then we can encapsulate the Magma fragment in a SAFL function definition and rely on SAFL's resource-awareness properties. For example, the specification:

fun add(x, y) = <% ripple_add %> (x,y)
fun mult_3(x) = add(x, add(x,x))

contains a single ripple adder shared between the two invocations within the definition of the mult_3(x) function. Since embedded Magma code fragments represent pure functions (i.e. do not cause side effects) they do not inhibit SAFL-level program transformation. Thus our existing SAFL-level transformations corresponding to resource duplication/sharing [12], hardware/software co-design [13] etc. remain valid.

Implementation and Technical Details: Consider the general case of a Magma fragment, m, embedded in SAFL:

<% m %>(e1, ..., ek)

where e1, ..., ek are SAFL expressions. On encountering the embedded Magma code fragment, <% m %>, our compiler performs the following operations:
1. An ML program, M (represented as a string), is constructed by concatenating the library block together with commands to instantiate the Magma_Code functor in its synthesis interpretation (see above).
2. The bit-widths of the SAFL expressions e1, ..., ek are determined (bit-widths of variables are known to the SAFL compiler) and ML code is added to M to construct corresponding buses, B1, ..., Bk, of the appropriate widths (using the Magma.new_bus library call).
3. M is further augmented with code to: (a) execute the ML expression m(B1, ..., Bk), which, since the library block has been instantiated in its synthesis interpretation, results in the generation of a netlist; and (b) wrap up the resulting netlist in a Verilog module declaration (adding Verilog wire declarations as appropriate).
4. A new ML session is spawned as a separate process and program M is executed within it.
5. The output of M, a Verilog module declaration representing the compiled Magma code fragment, is returned to the SAFL compiler where it is added to the object code. Our SAFL compiler also generates code to instantiate the module, connecting it to the wires corresponding to the output ports of the SAFL expressions e1, ..., ek.

In order that the ML-expression m(B1, ..., Bk) type checks, m must evaluate to a function, F, with a type of the form:

(bit list * bit list * ... * bit list) -> bit list
with the arity of F’s argument tuple equal to k. If m does not have the right type then a type-error is generated in the ML-session spawned to execute M. Our SAFL compiler traps this ML type-error and generates a meaningful error of its own, indicating the offending line-number of the SAFL/Magma specification. In this way we ensure that the bit-widths and number of arguments applied to <% m %> at the SAFL-level match those expected at the Magma-level. Another property we wish to ensure at compile time is that the output port of a Magma-generated circuit is of the right width. We achieve this by incorporating width information corresponding to the output port of Magma-generated hardware into our SAFL compiler’s type-checking phase. Determining the width of a Magma specification’s output port is trivial—it is simply the length of the bit list returned when m(B1 , . . . , Bk ) is executed.
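The paper does not show a generated program M; the sketch below indicates roughly what M might contain for the Figure 4 fragment <% ripple_add %> applied to two SAFL expressions. The bus widths, the module name magma_frag_1 and the way the Verilog wrapper is printed are illustrative assumptions only; the real compiler captures the emitted netlist text and adds proper port and wire declarations.

(* Illustrative sketch of a generated program M (assumptions noted above). *)
structure M = Magma_Code (SynthesisBasis)

(* buses whose widths match the SAFL argument expressions *)
val b1 = Magma.new_bus 32
val b2 = Magma.new_bus 16

(* simplified Verilog wrapper: header first ... *)
val _ = print "module magma_frag_1(out, in1, in2);\n"

(* ... then evaluate the spliced-in fragment under the synthesis
   interpretation; as a side effect the gate-level netlist is printed *)
val out = M.ripple_add (b1, b2)

(* ... and finally close the module *)
val _ = print "endmodule\n"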
5 Case Study: DES Encrypter/Decrypter
Appendix A presents code fragments from the SAFL specification of a Data Encryption Standard (DES) encryption/decryption circuit. Here we describe the code for the DES example, focusing on the interaction between SAFL and Magma; the details of the DES algorithm are not discussed. We refer readers who are interested in knowing more about DES to Schneier's cryptography textbook [18].
The library block at the beginning of the DES specification defines three functions used later in the specification:
– perm is a curried function which takes a permutation pattern, p (represented as a list of integers), and a list of bits, l. It returns l permuted according to pattern p.
– ror is a curried function which takes an integer, x, and a list of bits, l. It returns l rotated right by x.
– rol is as ror but rotates bits left (instead of right).
A set of permutation patterns required by the DES algorithm are also declared. (For space reasons the bodies of some of these declarations are omitted.) The code in Appendix A uses two of SAFL's features which have not been described in this paper:
– The primitive function join takes an arbitrary number of arguments and returns the bit-level concatenation of these arguments. As one would expect, the bit-width of the result of a call to join is the sum of the bit-widths of its input arguments.
– SAFL's type declaration allows us to construct records with named fields. Curly braces, { ... }, are used as record constructors and dot notation (r.f) is used to select a field, f, from record r. After type-checking our SAFL compiler translates record notation directly into bit-level joins and selects. (Recall that bit-level selects are represented using the e[n:m] notation; see Section 3.)
Primitive functions corresponding to arithmetic and boolean operators use their standard symbols (e.g. +, <, =). The binary infix operator (^) is used for bit-wise exclusive-or. The DES algorithm requires 8 S-boxes, each of which is a substitution function which takes a 6-bit input and returns a 4-bit output. The S-boxes' definitions make use of one of SAFL's syntactic sugarings:

lookup e with {v0, ..., vk}

Semantically the lookup construct is equivalent to a case expression: case e of 0 => v0 | ... | (k-1) => v(k-1) default vk. To ensure that each input value to the lookup expression has a corresponding output value we enforce the constraint that k = 2^w - 1, where w is the width of expression e. Our compiler is often able to map lookup statements directly into ROM blocks, leading to a significantly more efficient implementation than a series of iterated tests. Before applying its substitution each S-box permutes its input. We use our Magma permutation function to represent this permutation:

<% perm p_inSbox %>(x)
Other examples of SAFL-Magma integration can be seen throughout the specification. The keyshift function makes use of the Magma ror and rol functions to generate a key schedule. Other invocations of the Magma perm function can be seen in the bodies of the SAFL-level functions round and main. In general we find the use of higher-order Magma functions (such as perm, ror and rol) to be a powerful idiom. We used our SAFL compiler to map the DES specification to synthesisable RTL-Verilog. A commercial RTL-synthesis tool (Leonardo from Exemplar) was used to synthesise the RTL-Verilog for an Altera Apex E20K200E FPGA (200K gate equivalent). The resulting circuit utilised 8% of the FPGA's resources and could be clocked up to 48MHz. The design was mapped onto an Altera Excalibur Development Board and, using the board's 33MHz reference clock, a throughput of 15.8Mb/s (132 Mbits/s) was achieved. The performance figures of our DES implementation compare favourably to a hand-coded DES implementation written in VHDL by Kaps and Paar [9]. In practice our implementation runs 30% faster; however this is probably, at least in part, due to the fact that we are using different FPGA technology. A more meaningful comparison is to observe that both implementations take the same number of cycles to process a DES block.
6 Embedding Magma in VHDL/Verilog
A common practice in the hardware design industry is to generate repetitive combinatorial logic by writing scripts (in a language such as Perl) which, when executed, generate the necessary VHDL or Verilog code. The output of the script is then cut and pasted into the VHDL/Verilog design and the glue-code required to integrate the two written manually. Clearly there are a number of ways in which this design methodology can be improved. In particular it would be beneficial if (i ) type checking could be performed across the Verilog/VHDL–scripting language boundary and (ii ) the necessary glue-code generated automatically at compile time. The question that naturally arises is whether it is possible to use the SAFL-Magma integration techniques we have already described to integrate, say, Verilog and Magma. Although the complex syntax of the Verilog language makes integration with Magma more difficult the basic principles outlined earlier in the paper are still applicable. Since the widths of Verilog variables are statically known to the Verilog compiler we can use the same width-checking techniques across the VerilogMagma boundary that we employed across the SAFL-Magma divide in Section 4.2. We have devised three different forms of integration mechanism which we believe would be of use to Verilog programmers. These are mentioned briefly below: Expressions: In the context of a Verilog expression (e.g. the right-hand-side of an assign statement), integration can be performed using the function-call mechanism already described in the context of SAFL. For example, a Verilog design may contain code such as:
assign after_perm = <% perm p_initial %>(before_perm);

Here, the Magma expression is statically expanded and treated in a similar way to one of Verilog's primitive operators.

Explicit Module Definitions: In some cases an engineer may wish to treat a Magma function as a named Verilog module which can subsequently be instantiated in the usual Verilog fashion. To handle this type of integration we introduce the following form:

module ModName(out, in_1, in_2) --> <% carry_chain adder B.b0 %>

We use the symbol --> to indicate that the module's body is specified by the given Magma expression. Note that an explicit output port, out, is required to read the result of the function. This form of integration can be seen as syntactic sugar. In general, it can be translated into the expression-integration form as follows:

module ModName(out, in_1, ..., in_N);
  output out;
  input in_1, ..., in_N;
  assign out = <% ... Magma expression ... %>(in_1, ..., in_N);
endmodule

Implicit Module Definitions: It is often convenient to avoid the explicit definition of a named module where possible. For this reason we propose a third form of integration as follows:

<% perm p_initial %> my_perm(out_w, in_w);

In this case the augmented Verilog compiler automatically generates a fresh module definition (with a name of its choosing), instantiates it (with instance name my_perm) and connects it to existing wires out_w and in_w. Again, notice that in the Verilog domain it is necessary to explicitly mention the output of the function. In contrast, in the Magma domain, function composition can still be used to connect hardware blocks together without the overhead of explicitly declaring connecting wires. For this reason, designers may wish to move as much of the combinatorial logic specification as possible into the Magma portion of the design.
7 Conclusions and Further Work
In this paper we have motivated and described a technique for combining both behavioural and structural-level hardware specification in a stratified pure functional language. Our methodology has been applied to a realistic example. We believe that the major advantages of our approach are as follows:
– As in Verilog and VHDL, we are able to describe large systems consisting of both behavioural and structural components.
– SAFL-level program transformation remains a powerful technique for architectural exploration. The functional nature of the Magma-integration means that our existing library of SAFL transformations are still applicable.
– By only dealing with combinatorial circuits at the structural level we eliminate the problems associated with graph-sharing in a pure functional language (see Section 2). We do not sacrifice expressivity: cyclic (sequential) circuits can still be formed by composing combinatorial fragments at the SAFL-level.
We also showed how similar techniques can be used to embed languages such as Magma into industrial HDLs such as Verilog or VHDL. We believe that this approach offers a great deal over the ad-hoc "Perl-script" technique so commonly employed in practice. In particular: (i) type-checking across the Verilog-scripting language boundary catches a class of common errors; (ii) time-consuming glue-code required for the integration is generated automatically; and (iii) as is often argued, the features of a functional language such as polymorphism, static type-checking and higher-order functions encourage code-reuse and aid correctness. Another compelling benefit for integrating a functional language (such as Lava or Magma) into Verilog/VHDL is that the techniques of Claessen et al. for concisely encapsulating place-and-route information [5] can be employed to generate efficient layouts for repetitive combinatorial logic. Whilst we accept that the majority of working hardware engineers are not familiar with functional programming (and hence not likely to embrace the technique) we also observe that there are an increasing number of Computer Science graduates (as opposed to Electronic Engineering graduates) seeking employment in the hardware design sector.5 With this in mind, it is conceivable that an easily implementable integration mechanism between languages such as Magma/Lava and industrial HDLs such as Verilog/VHDL (see Section 6) may help to make the tried-and-tested technique of structural hardware specification using functional languages more attractive to the hardware design industry.

Acknowledgements. This work was supported by (UK) EPSRC grant, reference GR/N64256: "A Resource-Aware Functional Language for Hardware Synthesis"; AT&T Research Laboratories Cambridge provided additional support (including sponsoring the author). The author would like to thank Alan Mycroft for his valuable comments and suggestions.
5 We do not wish to imply that EE graduates are inferior to their CS counterparts! We are simply commenting that they often have different areas of expertise.
Appendix A: SAFL Specification of a DES Encrypter/Decrypter

(* Start of Magma Library Block --------------------------------- *)
<%
signature DES =
sig
  val perm : int list -> 'a list -> 'a list
  val ror  : int -> 'a list -> 'a list
  val rol  : int -> 'a list -> 'a list
  val p_compress : int list
  val p_key      : int list
  ... etc ...
  val p_inSbox   : int list
end

functor Magma_code (B:BASIS):DES =
struct
  (* DES permutation patterns. To save space some patterns are
     omitted -- written as '...' *)
  val p_initial   = [58,50,42,34,26,18,10,2,60,52,44,36,28,20,12,4,
                     62,54,46,38,30,22,14,6,64,56,48,40,32,24,16,8,
                     57,49,41,33,25,17,9,1,59,51,43,35,27,19,11,3,
                     61,53,45,37,29,21,13,5,63,55,47,39,31,23,15,7]
  val p_key       = [ ... ]
  val p_compress  = [ ... ]
  val p_final     = [ ... ]
  val p_pbox      = [ ... ]
  val p_expansion = [ ... ]
  val p_inSbox    = [1,5,2,3,4]
  (* Permutation function -- given a permutation pattern (list of ints)
     and a list of bits it returns a permuted list of bits: *)
  fun perm positions input =
    let val inlength = length input
        fun do_perm [] _ = []
          | do_perm (p::ps) input = (List.nth (input,inlength-p))::(do_perm ps input)
    in do_perm positions input end

  (* Rotate bits right by specified amount: *)
  fun ror n l =
    let val last_n = rev (List.take (rev l, n))
        val rest   = List.take (l, (length l)-n)
    in last_n @ rest end

  (* Rotate bits left by specified amount: *)
  fun rol n l =
    let val first_n = List.take (l, n)
        val rest    = List.drop (l, n)
    in rest @ first_n end
end
%>
(* End of Magma Library Block ----------------------------------- *)

(* Definitions of S-Boxes (implemented as simple lookup tables).
   Note: the 'inline' pragma tells the compiler to inline each call to a
   function rather than treating it as a shared resource. We use inline
   here because the resources are so small they are not worth sharing. *)
inline fun sbox1(x:6):4 =
  lookup <% perm p_inSbox %> (x) with
    {14,4,13,1,2,15,11,8,3,10,6,12,5,9,0,7,
     0,15,7,4,14,2,13,1,10,6,12,11,9,5,3,8,
     4,1,14,8,13,6,2,11,15,12,9,7,3,10,5,0,
     15,12,8,2,4,9,1,7,5,11,3,14,10,0,6,13}

inline fun sbox2(x:6):4 = lookup ...
... similarly define sbox3, sbox4, sbox5, sbox6, sbox7 and sbox8 -- omitted to save space.

(* Do s_box substitution on data-block: *)
inline fun s_sub(x:48):32 =
  join( sbox1( x[47:42] ), sbox2( x[41:36] ),
        sbox3( x[35:30] ), sbox4( x[29:24] ),
        sbox5( x[23:18] ), sbox6( x[17:12] ),
        sbox7( x[11:6]  ), sbox8( x[5:0]   ))
(* Define a record which contains the left and right halves of a 64-bit
   DES block and the 56-bit key. *)
type round_data = record {left:32, right:32, key:56}

(* Successive keys are calculated by circular shifts. The degree of the
   shift depends on the round (rd). We shift either left/right depending
   on whether we are decrypting/encrypting. *)
inline fun keyshift(key_half:28, rd:4, encrypt:1):28 =
  define val shift_one = (rd=0 or rd=1 or rd=8 or rd=15)
  in if encrypt then
       if shift_one then <% rol 1 %> (key_half) else <% rol 2 %> (key_half)
     else
       if rd=0 then key_half
       else if shift_one then <% ror 1 %> (key_half) else <% ror 2 %> (key_half)
  end
(* A single DES round: *)
inline fun round(bl:round_data, rd:4, encrypt:1):round_data =
  let val lkey    = keyshift(slice(bl.key,55,28), rd, encrypt)
      val rkey    = keyshift(slice(bl.key,27,0), rd, encrypt)
      val keybits = <% perm p_compress %> ( join(lkey,rkey) )
      val new_right =
        let val after_p = <% perm p_expansion %>(bl.right)
        in s_sub (after_p ^ keybits ^ bl.left) end
  in {left=bl.right, right=new_right, key=join(lkey,rkey)} end

(* Do 16 DES rounds: *)
fun des(c:4, rd:round_data, encrypt:1):round_data =
  let val new_data = round(rd, c, encrypt)
  in if c=15 then new_data else des(c+1, new_data, encrypt) end

(* Do input/output permutations and 16 rounds of DES: *)
fun main(block:64, key:64, encrypt:1):64 =
  let val block_p = <% perm p_initial %> (block)
      val realkey = <% perm p_key %> (key)
      val output  = des(0:4, {left=slice(block_p,63,32),
                              right=slice(block_p,31,0),
                              key=realkey}, encrypt)
  in <% perm p_final %> (join(output.right, output.left)) end
References

1. Haskell98 report. Available from http://www.haskell.org/.
2. Bjesse, P., Claessen, K., Sheeran, M., and Singh, S. Lava: Hardware description in Haskell. In Proceedings of the 3rd International Conference on Functional Programming (1998), SIGPLAN, ACM.
3. Bose, B. DDD: A transformation system for digital design derivation. Tech. Rep. 331, Computer Science Department, Indiana University, 1991.
4. Claessen, K., and Sands, D. Observable sharing for functional circuit description. In Advances in Computing Science ASIAN'99; 5th Asian Computing Science Conference (1999), vol. 1742 of LNCS, Springer-Verlag, pp. 62–73.
5. Claessen, K., Sheeran, M., and Singh, S. The design and verification of a sorter core. In Proceedings of the 11th Advanced Working Conference on Correct Hardware Design and Verification Methods (2001), vol. 2144 of LNCS, Springer-Verlag, pp. 355–369.
6. De Micheli, G. Synthesis and Optimization of Digital Circuits. McGraw-Hill Inc., 1994.
7. IEEE. Verilog HDL language reference manual. IEEE Standard 1364-2001.
8. IEEE. Standard VHDL Reference Manual, 1993. IEEE Standard 1076-1993.
9. Kaps, J.-P., and Paar, C. Fast DES implementation for FPGAs and its application to a universal key-search machine. In Selected Areas in Cryptography (1998), vol. 1556 of Lecture Notes in Computer Science, Springer-Verlag, pp. 234–247.
10. Matthews, J., Cook, B., and Launchbury, J. Microprocessor specification in Hawk. In Proceedings of the IEEE International Conference on Computer Languages (1998).
11. Milner, R., Tofte, M., Harper, R., and MacQueen, D. The Definition of Standard ML (Revised). MIT Press, 1997.
12. Mycroft, A., and Sharp, R. A statically allocated parallel functional language. In Proceedings of the International Conference on Automata, Languages and Programming (2000), vol. 1853 of LNCS, Springer-Verlag.
13. Mycroft, A., and Sharp, R. Hardware/software co-design using functional languages. In Proceedings of TACAS (2001), vol. 2031 of LNCS, Springer-Verlag.
14. Mycroft, A., and Sharp, R. Higher-level techniques for hardware description and synthesis. To appear, International Journal on Software Tools for Technology Transfer (STTT) (2002).
15. O'Donnell, J. Hardware description with recursion equations. In Proceedings of the IFIP 8th International Symposium on Computer Hardware Description Languages and their Applications (April 1987), North-Holland, pp. 363–382.
16. O'Donnell, J. Generating netlists from executable circuit specifications in a pure functional language. In Functional Programming, Workshops in Computing, Proceedings (1992), Springer-Verlag, pp. 178–194.
17. Paulson, L. ML for the Working Programmer. Cambridge University Press, 1996.
18. Schneier, B. Applied Cryptography: Protocols, Algorithms, and Source Code in C. John Wiley and Sons, New York, 1994.
19. Sharp, R., and Mycroft, A. Soft scheduling for hardware. In Proceedings of the 8th International Static Analysis Symposium (2001), vol. 2126 of LNCS, Springer-Verlag.
20. Sheeran, M. muFP, a language for VLSI design. In Proceedings of the ACM Symposium on LISP and Functional Programming (1984).
21. Wadler, P. Monads for functional programming. In Advanced Functional Programming (1995), vol. 925 of LNCS, Springer-Verlag.
Compiling Hardware Descriptions with Relative Placement Information for Parametrised Libraries Steve McKeever, Wayne Luk, and Arran Derbyshire Department of Computing, Imperial College 180 Queen’s Gate, London, UK {swm2, wl, arad}@doc.ic.ac.uk
Abstract. Placement information is useful in producing efficient circuit layout, especially for hardware libraries or for run-time reconfigurable designs. Relative placement information enables control of circuit layout at a higher level of abstraction than placement information in the form of explicit coordinates. We present a functional specification of a procedure for compiling programs with relative placement information in Pebble, a simple language based on Structural VHDL, into programs with explicit placement coordinate information. This procedure includes source-level transformation for compiling into descriptions that support conditional compilation based on symbolic placement constraints, a feature essential for parametrised library elements. Partial evaluation is used to optimise a description using relative placement to improve its size and speed. We illustrate our approach using a DES encryption design, which results in a 60% reduction in area and a 6% improvement in speed.
1 Introduction
Placement information is useful for guiding design tools to produce an efficient design. Such information is particularly effective for regular circuits, where conventional placement algorithms may not be able to fully exploit the circuit structure to achieve an optimised implementation. Precise control of layout is especially rewarding in two situations. First, optimal resource usage is paramount for hardware libraries, since inefficiency will affect all the designs that use them. It has been shown that, despite advances in automatic placement methods, user-supplied placement information can often significantly improve FPGA performance and resource utilisation for common applications [18]. Second, controlling placement is desirable for reconfigurable circuits to minimise reconfiguration time, since components at identical locations common to two successive configurations do not need to be reconfigured. Such optimisation has been included in recent design tools for reconfigurable applications [16]. While hardware library developers often have good reasons to control circuit placement, it is, however, tedious to provide explicit coordinate information for every component in a large circuit. The use of relative placement information,
such as placing components beside or below one another, has been proposed for producing designs. Languages and systems that support this technique include µFP [8], Ruby [5],[17], T-Ruby [15], Lava [1], and Rebecca [3]. All these systems produce, from declarative descriptions, circuit layouts in the form of VHDL or EDIF descriptions with explicit coordinates which can be mapped efficiently into hardware. However, the compiled circuit descriptions are no longer parametrised. Our aim is to support instantiation of parameters at the compiled VHDL level, in addition to instantiation at the declarative description level.

This paper describes an approach capable of producing parametric descriptions in VHDL with symbolic placement information, which can be instantiated and further processed by industry-standard VHDL tools. Our approach is supported by Pebble [9],[13], a simple hardware description language based on Structural VHDL which has been used in a framework for verifying the correctness of design tools [12]. The novel aspects of our work include:

– functional specification of a compilation procedure mapping designs with relative placement to the corresponding descriptions with explicit placement coordinates;
– source-level transformations for compiling composite designs containing conditional statements into parametric descriptions;
– illustration of circuit compaction based on partial evaluation for optimising resource usage and performance;
– evaluation of the proposed approach using an FPGA implementation of the DES encryption algorithm.

Our work unites two recent themes which seem to have growing importance. The first theme concerns the combination of architectural and physical design, since physical constraints are becoming relevant earlier in the design process. The second theme concerns the use of standard programming language techniques, such as partial evaluation, for analysis and transformation of hardware descriptions. While partial evaluation has been used for dynamic specialisation of reconfigurable circuits [11] and automated design of field-programmable compute accelerators [20], our use of partial evaluation for parametric hardware compaction appears to be novel.

The rest of the paper is organised as follows. Section 2 provides an overview of Pebble, a variant of VHDL that we use. Section 3 introduces the DES encryption example, showing how it can be captured in Pebble. Section 4 presents the functional specification of a compiler mapping descriptions with relative placement to the corresponding descriptions with explicit placement coordinates. Section 5 explains how this compiler can be extended to support conditional compilation, which is critical for supporting parametric descriptions in hardware libraries. Section 6 describes automatic compaction based on partial evaluation, and illustrates the application of the proposed approach to the DES example. Section 7 contains concluding remarks.
Fig. 1. An array of multiplexors described by the Pebble program in Figure 2.
BLOCK muxarray (n)
  [c:WIRE, x,y:VECTOR (n-1..0) OF WIRE]
  [z:VECTOR (n-1..0) OF WIRE]
VAR i;
BEGIN
  GENERATE FOR i = 0..(n-1)
  BEGIN
    mux [c,x(i),y(i)] [z(i)] AT (i,0)
  END
END

Fig. 2. A description of an array of multiplexors (Figure 1) in Pebble with explicit placement coordinates. The external input c is used to provide a common control input for each multiplexor.
2 Pebble
Pebble can be regarded as a simple variant of Structural VHDL. It provides a means of representing block diagrams hierarchically and parametrically [9]. Pebble has a simple, block-structured syntax. As an example, Figure 2 describes the multiplexor array in Figure 1, provided that the size parameter n is 4. The syntax of Pebble is shown in Figure 3. A Pebble program is a block, defined by its name, parameters, interfaces, local definitions, and its body. The block interfaces are given by two lists, usually interpreted as the inputs and outputs. An input or an output can be of type WIRE, or it can be a multidimensional vector of wires. A wire can carry integer, boolean or other primitive data values. Wires w1, w2, . . . that are connected together are denoted by the expression connect [w1,w2, . . . ]. A primitive block has an empty body; a composite block has a body containing the instantiation of composite or primitive blocks in any order. Blocks
connected to each other share the same wire in the interface instantiation. For hardware designs, the primitive blocks can be bit-level logic gates and registers, or they can, like an adder, process word-level data such as integers or fixed-point numbers; the set of primitives depends on the availability of the corresponding components in the domain targeted by the Pebble compiler.

The GENERATE IF statement enables conditional compilation and recursive definition, while the GENERATE FOR statement allows the concise description of regular circuits. To support generic description of designs, the parameters in a Pebble program can include the number of pipeline stages or the pitch between neighbouring interface connections [9]. Different network structures, such as tree- or butterfly-shaped circuits, can be described parametrically by indexing the components and wires.

The semantics of Pebble depends on the behaviour of the primitive blocks and their composition in the target technology. Currently a synchronous circuit model is used in our tools, and special control components for modelling run-time reconfiguration are also supported [9]. However, other models can be used if desired. Indeed Pebble can be used in modelling any block-structured system, not just electronic circuits.

Pebble adopts the convention "AT (x,y)" to denote the placement of a block at a location with coordinates (x,y) as shown in Figure 3. While such placement information helps to optimise the layout, it is usually tedious and error-prone to specify. We have therefore developed high-level descriptions for placement constraints, abstracting away the low-level details. These descriptions are compile-time directives for the Pebble compiler to project coordinates onto designs, generating a tree representing placement possibilities. The two main descriptions, shown in Figure 4, are BESIDE, which places two or more blocks beside each other, and BELOW, which places blocks vertically. These descriptions allow blocks to be placed relative to each other, without the user providing the coordinates of their locations. As a simple example, an alternative description to Figure 2 using relative placement can be obtained by replacing the keyword GENERATE by BESIDE; the placement specification "AT (i,0)" is no longer necessary. A more complex example involving DES encryption will be given next in Section 3.
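The effect of BESIDE and BELOW can be illustrated with a small layout-combinator model. The following Haskell sketch is ours, not part of Pebble or its compiler; it assumes a simplified setting in which every primitive block has a fixed width and height, and it only shows how relative placement determines coordinates and bounding boxes.

-- Minimal model of relative placement (illustrative only; not the Pebble compiler).
-- A layout is a function from the position of its bottom-left corner to the
-- placed blocks plus the bounding box it occupies.
type Pos  = (Int, Int)
type Size = (Int, Int)

data Block  = Block { name :: String, pos :: Pos, size :: Size } deriving Show
type Layout = Pos -> ([Block], Size)

-- A primitive block of a given size, e.g. a multiplexor occupying one cell.
prim :: String -> Size -> Layout
prim n sz p = ([Block n p sz], sz)

-- BESIDE: accumulate widths; height is the maximum over the parts.
beside :: [Layout] -> Layout
beside ls (x, y) = go ls x ([], (0, 0))
  where
    go []         _  acc            = acc
    go (l : rest) x' (bs, (w, h))   =
      let (bs', (w', h')) = l (x', y)
      in go rest (x' + w') (bs ++ bs', (w + w', max h h'))

-- BELOW: accumulate heights; width is the maximum over the parts.
below :: [Layout] -> Layout
below ls (x, y) = go ls y ([], (0, 0))
  where
    go []         _  acc            = acc
    go (l : rest) y' (bs, (w, h))   =
      let (bs', (w', h')) = l (x, y')
      in go rest (y' + h') (bs ++ bs', (max w w', h + h'))

-- The multiplexor array of Figure 1: four mux cells placed beside each other.
muxArray :: Layout
muxArray = beside [ prim ("mux" ++ show i) (1, 1) | i <- [0 .. 3 :: Int] ]

main :: IO ()
main = print (muxArray (0, 0))   -- muxes at x = 0,1,2,3; bounding box (4,1)

Running main places the four mux cells at x = 0, 1, 2 and 3 with a bounding box of (4, 1), which is what the explicit AT (i,0) coordinates of Figure 2 specify by hand.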
3 Case Study: DES Cryptographic Algorithm
The Data Encryption Standard (DES) is a cryptographic algorithm that is ideally suited to implementation in hardware. It features a regular datapath consisting of 16 identical iterations. It is provided as a standard component in many hardware libraries [7]. To improve performance and area efficiency, it can be placed as a hierarchy of adjacent tiles. The BESIDE and BELOW descriptions provide a simple way of capturing this placement. The algorithm takes as inputs a 56-bit key, a mode indicator (encrypt or decrypt), and a 64-bit block of data (either plain text or cipher text). The design can be specialised to particular values of the key and mode [14]. In this
blk   ::= BLOCK id (id1, ..., idj)
            [id_in1 : t_in1, ..., id_inn : t_inn]
            [id_out1 : t_out1, ..., id_outm : t_outm]
          VAR id1, ..., idq ;
          VAR id1 : t1, ..., idp : tp ;
          BEGIN stmts END

stmts ::= stmt | stmt ; stmts

stmt  ::= connect [le1, ..., lep]
        | pid [le1, ..., len] [le1, ..., lem] AT (e1,e2)
        | id (e1, ..., ej) [le1, ..., len] [le1, ..., lem]
        | GENERATE FOR id = e1..e2 BEGIN stmts END

t     ::= WIRE | VECTOR (e1..e2) OF t
pid   ::= AND | OR | ...
le    ::= id | id (e)
e     ::= id | n | e1 + e2 | ...

Fig. 3. Syntax of core Pebble language with explicit placement information for primitive blocks to be placed at Cartesian coordinates given by expressions e1 and e2. Identifiers pid are the names for Pebble primitive blocks.
besblk ::= BLOCK id (id1, ..., idj) [id1, ..., idn] [id1, ..., idm]
           VAR id1, ..., idq ;
           VAR id1:t1, ..., idp:tp ;
           BEGIN bes END

bes    ::= connect [le1, ..., lep]
         | pid [le1, ..., len] [le1, ..., lem]
         | id (e1, ..., ej) [le1, ..., len] [le1, ..., lem]
         | BESIDE (bes1 ; ... ; besn)
         | BELOW (bes1 ; ... ; besn)
         | BESIDE FOR id = e1..e2 BEGIN bes END
         | BELOW FOR id = e1..e2 BEGIN bes END

Fig. 4. Syntax of Pebble with relative placement.
situation, performance and resource usage can be improved by applying boolean optimisation to remove unused logic. The layout of a specialised design can be compacted to eliminate the gaps created by this removal of logic.
We present a description for the DES case study that can be parametrised to implement either a full design or a specialised design. These two design alternatives are selected by a design parameter and have two different layouts; the specialised design has a compacted layout. In order to describe the two alternative layouts using coordinates, the compaction would have to be described using symbolic arithmetic expressions given in terms of the design parameters. Using the BESIDE and BELOW operators, this compaction is provided for free, hence removing the need to provide an otherwise tedious and error-prone layout description. Each iteration of the DES algorithm contains a number of permutations, substitutions and exclusive-OR operations. The structure of the iteration is shown in Figure 5. The design shown is fully-pipelined, the pipeline registers are represented by triangles. The c, e and p operators are permutations and the and operators are shifts, all of which can be implemented in hardware simply as wires. The s operator (the s-box) performs a series of substitutions and is implemented by a lookup-table. The key generator combines its result with the main datapath through the XOR block labelled xors.
Fig. 5. A single iteration of the DES algorithm. Pipeline registers are represented by triangles. The c, e and p operators are permutations. The s operator performs a series of substitutions. The and operators are shifts.
When the design is specialised by its key and mode, it can be optimised by constant propagation which removes the need for the key generator, and which replaces the xors operator by a series of wires or inverters. The inverters can be removed by including the appropriate entries of the lookup-table. The Pebble description of this design is shown in Figure 6. Note that conditional compilation is supported by the GENERATE IF statement: depending on the value of specialise, the description produces either a composite circuit involving keygen and xors, or just the wiring circuit connect [xortext(i),
Fig. 6. Pebble description of the top level of the DES design with placement given by BESIDE operator. A specialised implementation is generated when the parameter specialise=1, otherwise a full implementation is generated. In the full implementation, the keygen and xors blocks are sandwiched between the round blocks; the description with explicit coordinates is shown in Figure 14. When specialised, the use of the BESIDE inside the FOR loop ensures that the design is compacted (Figure 15).
exptext(i)]. The syntax of Pebble supporting the GENERATE IF statement will be given in Figure 10. Figure 7 shows block diagrams of the Pebble description in Figure 6 when a) the full design is implemented (specialise=0) and b) when the specialised design is implemented (specialise=1). The block labelled keygen implements the key generator and the block labelled round implements the main datapath.
Fig. 7. Block diagram of the DES implementations: a) the full DES design with specialise=0 and b) the specialised DES design with specialise=1.
4 Compiling Pebble: Functional Specification
In order to project a coordinate scheme onto a Beside-Below Pebble statement, we use an environment µ mapping block names to their syntactical definitions, an environment φ mapping block names to their sizes, and a placement function P (Figure 8). Block sizes are functions that take the symbolic arguments of a block and return its symbolic width and height. The placement function P is used to position blocks within their immediate context; it maps an abstract coordinate scheme onto a statement. It returns a tuple of three components: a sequence of statements unfolded by the rules of BESIDE and BELOW, the dimensions of the statement, and an updated block size environment φ. A default identity function is used for placing single blocks, while one that derives repeated positions is used for loops. The placement of blocks is achieved locally. The symbolic addresses are calculated using the given (x, y) expressions and the function ‘f ’ or ‘g’. They provide all that is required to derive suitable symbolic locations. For BESIDE and BELOW loops, we create the new local placement function ‘g’ that does not depend on the nesting level of the statement, but only on the given start position of the loop. Our model does not include space for wiring: it is assumed that wiring resources are orthogonal to the network of logic blocks and have no effects on them, or that the effects of routing between logic blocks are captured within the blocks themselves. A coordinate scheme is projected onto a Beside-Below statement in the following manner. A primitive block of width wdpid and height htpid is positioned according to its placement function and dimension. The size expression of composite blocks is calculated by applying the generic expressions to the block’s size stored in φ. If the size expression is unknown, then it is derived using PB.
P :: SizeEnv → BlockEnv → BesBelStmt → (Exp × Exp) → FuncPos
     → ([Stmt] × (Exp × Exp) × BlockEnv)

P φ µ [[ connect [le1, ..., lep] ]] (x, y) f =
  ([connect [le1, ..., lep]], (0, 0), φ)

P φ µ [[ pid [le1, ..., len] [le1, ..., lem] ]] (x, y) f =
  let (xpos, ypos) = f (x, y)
  in ([pid [le1, ..., len] [le1, ..., lem] AT (xpos, ypos)], (wd_pid, ht_pid), φ)

P φ µ [[ id (e1, ..., ej) [le1, ..., len] [le1, ..., lem] ]] (x, y) f =
  if (id ∈ dom φ) then
    let (acc, up)    = (φ id) (e1, ..., ej)
        (xpos, ypos) = f (x, y)
    in ([id (xpos, ypos, e1, ..., ej) [le1, ..., len] [le1, ..., lem]], (acc, up), φ)
  else
    let φ'           = PB φ µ (µ id)
        (acc, up)    = (φ' id) (e1, ..., ej)
        (xpos, ypos) = f (x, y)
    in ([id (xpos, ypos, e1, ..., ej) [le1, ..., len] [le1, ..., lem]], (acc, up), φ')

P φ µ [[ BESIDE(bes1 ; ... ; besn) ]] (x, y) f =
  let (stmts1, (acc1, up1), φ1) = P φ    µ [[ bes1 ]] (x, y) f
      (stmts2, (acc2, up2), φ2) = P φ1   µ [[ bes2 ]] (x + acc1, y) f
      ...
      (stmtsn, (accn, upn), φn) = P φn-1 µ [[ besn ]] (x + acc1 + ... + accn-1, y) f
  in (stmts1 ++ ... ++ stmtsn, (acc1 + ... + accn, max (up1, ..., upn)), φn)

P φ µ [[ BELOW(bes1 ; ... ; besn) ]] (x, y) f =
  let (stmts1, (acc1, up1), φ1) = P φ    µ [[ bes1 ]] (x, y) f
      (stmts2, (acc2, up2), φ2) = P φ1   µ [[ bes2 ]] (x, y + up1) f
      ...
      (stmtsn, (accn, upn), φn) = P φn-1 µ [[ besn ]] (x, y + up1 + ... + upn-1) f
  in (stmts1 ++ ... ++ stmtsn, (max (acc1, ..., accn), up1 + ... + upn), φn)

P φ µ [[ BESIDE FOR id = e1..e2 BEGIN bes END ]] (x, y) f =
  let xoffset               = NV ()
      g (x, y)              = (x + (id - e1) × xoffset, y)
      (stmts, (acc, up), φ') = P φ µ [[ bes ]] (x, y) g
      stmts'                = (λ xoffset · stmts) acc
  in ([FOR id = e1..e2 BEGIN stmts' END], (acc × (e2 - e1 + 1), up), φ')

P φ µ [[ BELOW FOR id = e1..e2 BEGIN bes END ]] (x, y) f =
  let yoffset               = NV ()
      g (x, y)              = (x, y + (id - e1) × yoffset)
      (stmts, (acc, up), φ') = P φ µ [[ bes ]] (x, y) g
      stmts'                = (λ yoffset · stmts) up
  in ([FOR id = e1..e2 BEGIN stmts' END], (acc, up × (e2 - e1 + 1)), φ')

Fig. 8. Mapping descriptions with relative placement to descriptions with explicit placement coordinates constructed symbolically.
PB :: SizeEnv → BlockEnv → BesBelBlock → SizeEnv

PB φ µ [[ BLOCK id (gid1, ..., gidj)
            [id1:t1, ..., idn:tn] [id1:t1, ..., idm:tm]
          VAR lid1, ..., lidq ;
          VAR id1:t1, ..., idp:tp ;
          BEGIN bes END ]] =
  let f (x, y)               = (x, y)
      (stmts, (acc, up), φ') = P φ µ [[ bes ]] (x, y) f
  in φ' ⊕ { id → λ(gid1, ..., gidj) · (acc, up) }

Fig. 9. An algorithm for calculating the size of a block. The identifiers lidi and wires idj are local to this block.
Coordinates are projected onto a row of beside terms by adding previous widths together. The final size of the BESIDE statement is the sum of each width and the maximum height of all subterms; similarly for the BELOW statement. For loops, the position of each loop body depends on the iteration index and the size of the body. Initially, we do not know the size of the loop body, so we create a new identifier using the function NV, and replace it with the value once it is known. The concealed function NV creates a distinct new identifier each time it is called. This method works because the placeholder variables will not be required until after the size of the block is known. The position of each repeated subterm is calculated using a new placement function.

The size of a Beside-Below block is calculated from the size of its statement body using P and the default identity placement function f. The resulting dimensions (acc, up) are parametrised by the block's generic variables (gid1, ..., gidj), as shown in the lambda expression of Figure 9. This expression denotes the size of the block when applied to a list of values; it is bound to the block's name and added to the updated size environment φ'.

We can use the above definitions to prove the correctness of various source-to-source transformations. As an example, consider the composition of two BESIDE statements:

  P φ µ [[BESIDE(a;BESIDE(b;c))]] (x, y) f = P φ µ [[BESIDE(a;b;c)]] (x, y) f

A proof can be obtained by unfolding the LHS twice using P, rearranging the resulting expression, and then folding on P to arrive at the RHS.
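As an illustration of this proof style, the following unfolding is ours (it does not appear in the paper). Let a, b and c be statements with symbolic sizes $(w_a, h_a)$, $(w_b, h_b)$ and $(w_c, h_c)$. Applying the BESIDE rule of Figure 8 to both sides gives the same placed statements and the same dimensions:

\[
\begin{aligned}
P\,\varphi\,\mu\,[\![\,\mathrm{BESIDE}(a;\mathrm{BESIDE}(b;c))\,]\!]\,(x,y)\,f &:\;
  \text{blocks at } x,\ x+w_a,\ x+w_a+w_b;\quad
  \text{size } \bigl(w_a+(w_b+w_c),\ \max(h_a,\max(h_b,h_c))\bigr)\\
P\,\varphi\,\mu\,[\![\,\mathrm{BESIDE}(a;b;c)\,]\!]\,(x,y)\,f &:\;
  \text{blocks at } x,\ x+w_a,\ x+w_a+w_b;\quad
  \text{size } \bigl(w_a+w_b+w_c,\ \max(h_a,h_b,h_c)\bigr)
\end{aligned}
\]

The two right-hand sides coincide because addition and max are associative.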
5 Dealing with Conditionals
The syntax of our conditional command is essentially the same as that in VHDL, namely a guarded command, as shown in Figure 10. From a placement perspective this creates a problem, as we have to consider what happens both when the guard succeeds and when it fails. We need to deal with this issue in order to support the generation of VHDL descriptions with symbolic placement constraints.
An observation is that primitive block calls which occur after a conditional call will be placed differently depending on whether the boolean condition is true or not. Consider the following example:

  BESIDE ( a;
           GENERATE IF x=2 THEN b;
           c )

This description covers two situations. If x is 2 then we can rewrite the above as BESIDE (a;b;c), otherwise it becomes BESIDE (a;c). Applying P to each case will result in differing layouts. A simple solution to this problem is to assume that the guard will always succeed for the placement of subsequent gate calls, but this leads to many cells being left unused at run time.

Our solution is to develop an intermediate syntax in which all conditionals occur at the end of a BESIDE or BELOW list, as shown in Figure 11. We preprocess conditional descriptions so that all calls that occur after a GENERATE IF statement are removed. These calls are nested within either a conditional that succeeds or one that fails for the particular guard. Considering our example above, we would arrive at the following description:

  BESIDE ( a;
           GENERATE IF x=2 THEN BESIDE (b;c);
           GENERATE IF NOT (x=2) THEN c )

In effect we create a tree of possible placement paths so that each conditional branch will contain all possible subsequent gate calls. The recursive descent algorithm that undertakes this conversion is presented in Figure 12.

We include two new cases for the P function, as shown in Figure 13. For a BESIDE call, the length of the statement list will be the length of all the primitive calls plus the maximum of the length of the conditionals. In other words, we assume that the length of the BESIDE call will be that of the largest possible configuration. As before, the height will be the maximum of all possible primitive calls. This scheme integrates smoothly with the placement function for loops.

Let us apply TS (Figure 12) to the DES example shown in Figure 6 to produce a description with explicit coordinates (Figure 14). The application results in lifting the two calls xors and round into both conditional branches. We then apply P with the following block size environment:

  φ = { keygen → (2, 15), xors → (1, 12), round → (2, 24) }

to create a version with explicit coordinates. The length of each loop iteration is calculated as the maximum size of both conditionals. Therefore the width and height of the DES block is given by ((2 + 1 + 2) × 16, 24) = (5 × 16, 24) = (80, 24).
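The size rule just described can be phrased as a small calculation. The following Haskell fragment is our own sketch, not the definition in Figure 13: the width contributed by the trailing conditionals of a BESIDE list is the maximum width over the branches, added to the widths of the unconditional statements, and the height is the maximum over all of them. The DES numbers below use the block sizes of the environment φ given above.

-- Sketch of the size rule for BESIDE with trailing conditionals (our notation).
type Size = (Int, Int)   -- (width, height)

-- Widths of the unconditional statements are summed; the conditional branches
-- contribute the width of the largest possible configuration.
besideCondSize :: [Size] -> [Size] -> Size
besideCondSize uncond branches =
  ( sum (map fst uncond) + maximum (0 : map fst branches)
  , maximum (0 : map snd (uncond ++ branches)) )

-- The DES loop body of Figure 6 with the block sizes of Section 5:
-- keygen = (2,15), xors = (1,12), round = (2,24).  The full branch is
-- 2+1+2 = 5 wide and 24 high; the specialised branch is 2 wide and 24 high.
desIteration :: Size
desIteration = besideCondSize [] [ (2 + 1 + 2, 24), (0 + 2, 24) ]

main :: IO ()
main = print (let (w, h) = desIteration in (w * 16, h))   -- prints (80,24)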
besblk ::= BLOCK id (id1, ..., idj) [id1, ..., idn] [id1, ..., idm]
           VAR id1, ..., idq ;
           VAR id1:t1, ..., idp:tp ;
           BEGIN bes END

bes    ::= connect [le1, ..., lep]
         | pid [le1, ..., len] [le1, ..., lem]
         | id (e1, ..., ej) [le1, ..., len] [le1, ..., lem]
         | BESIDE (cstmt1 ; ... ; cstmtn)
         | BELOW (cstmt1 ; ... ; cstmtn)
         | BESIDE FOR id = e1..e2 BEGIN bes END
         | BELOW FOR id = e1..e2 BEGIN bes END

cstmt  ::= GENERATE IF e THEN bes
         | bes

Fig. 10. Syntax of Beside and Below Pebble with conditionals.
tbesblk ::= BLOCK id (id1, ..., idj) [id1, ..., idn] [id1, ..., idm]
            VAR id1, ..., idq ;
            VAR id1:t1, ..., idp:tp ;
            BEGIN tbes END

tbes    ::= connect [le1, ..., lep]
          | pid [le1, ..., len] [le1, ..., lem]
          | id (e1, ..., ej) [le1, ..., len] [le1, ..., lem]
          | BESIDE (tbes1 ; ... ; tbesn ; tcstmt)
          | BELOW (tbes1 ; ... ; tbesn ; tcstmt)
          | BESIDE FOR id = e1..e2 BEGIN tbes END
          | BELOW FOR id = e1..e2 BEGIN tbes END

tcstmt  ::= GENERATE IF e1 THEN tbes1 ; GENERATE IF e2 THEN tbes2
          | tbes

Fig. 11. Syntax of Beside and Below Pebble, with all conditionals appearing at the end of a BESIDE or BELOW list.
6 Compaction by Partial Evaluation
A partial evaluator is an algorithm which, when given a program and some of its input data, produces a residual or specialized program. Running the residual
TB :: CondBesBelBlk → TransBesBelBlk
TB [[ BLOCK id (gid1, ..., gidj) [id1:t1, ..., idn:tn] [id1:t1, ..., idm:tm]
      VAR lid1, ..., lidq ; VAR id1:t1, ..., idp:tp ;
      BEGIN bes END ]] =
  BLOCK id (gid1, ..., gidj) [id1:t1, ..., idn:tn] [id1:t1, ..., idm:tm]
  VAR lid1, ..., lidq ; VAR id1:t1, ..., idp:tp ;
  BEGIN TS [[ bes ]] END

TS :: CondBesBelStmt → TransBesBelStmt
TS [[ connect [le1, ..., lep] ]] = connect [le1, ..., lep]
TS [[ pid [le1, ..., len] [le1, ..., lem] ]] = pid [le1, ..., len] [le1, ..., lem]
TS [[ id (e1, ..., ej) [le1, ..., len] [le1, ..., lem] ]] =
  id (e1, ..., ej) [le1, ..., len] [le1, ..., lem]
TS [[ BESIDE (bes1 ; ... ; besn) ]] = BESIDE (TS [[ bes1 ]] ; ... ; TS [[ besn ]])
TS [[ BESIDE ( bes1 ; ... ; besj ;
               GENERATE IF e THEN cstmt_tt ; cstmt_k ; ... ; cstmt_m ) ]] =
  let tcase = TS [[ BESIDE (cstmt_tt ; cstmt_k ; ... ; cstmt_m) ]]
      fcase = TS [[ BESIDE (cstmt_k ; ... ; cstmt_m) ]]
  in BESIDE ( TS [[ bes1 ]] ; ... ; TS [[ besj ]] ;
              GENERATE IF e THEN tcase ;
              GENERATE IF NOT e THEN fcase )
TS [[ BELOW (bes1 ; ... ; besn) ]] = BELOW (TS [[ bes1 ]] ; ... ; TS [[ besn ]])
TS [[ BELOW ( bes1 ; ... ; besj ;
              GENERATE IF e THEN cstmt_tt ; cstmt_k ; ... ; cstmt_m ) ]] =
  let tcase = TS [[ BELOW (cstmt_tt ; cstmt_k ; ... ; cstmt_m) ]]
      fcase = TS [[ BELOW (cstmt_k ; ... ; cstmt_m) ]]
  in BELOW ( TS [[ bes1 ]] ; ... ; TS [[ besj ]] ;
             GENERATE IF e THEN tcase ;
             GENERATE IF NOT e THEN fcase )
TS [[ BESIDE FOR id = e1..e2 BEGIN bes END ]] =
  BESIDE FOR id = e1..e2 BEGIN TS [[ bes ]] END
TS [[ BELOW FOR id = e1..e2 BEGIN bes END ]] =
  BELOW FOR id = e1..e2 BEGIN TS [[ bes ]] END

Fig. 12. A recursive descent algorithm for creating a tree of possible placement paths so that each conditional branch will contain all possible subsequent gate calls.
program on the remaining data will yield the same result as running the original program on all of its input data [4]. Our use of the Pebble language is to enable a parametrised style of hardware design [6]. Partial evaluation, even with no static data at all, can often optimize such descriptions. This is because it can propagate constants from blocks where they are defined to those where they are used, precomputing wherever possible. However, in the case of our placement descriptions, we seek to exploit the inefficiency introduced when assigning locations to primitive blocks within conditionals. As discussed in Section 5, we assume that the size of a conditional statement is the maximum of both the true and false cases. If we know in advance which branch of the conditional will be chosen, then we can not only eliminate the dead code from our circuit description, but also re-apply the P function to create a more precise layout. We demonstrate this process by partially evaluating our DES example when the value of specialise is 1. As we can see in Figure 15, the size of the loop body is smaller, reducing the width and height of the DES block to (2 × 16, 24) = (32, 24).

Fig. 13. Extending the placement function to deal with conditional compilation.
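To make the pruning part of this compaction concrete, the following Haskell sketch (ours, over a toy abstract syntax rather than the Pebble tools) eliminates GENERATE IF statements whose guards become decidable once a parameter such as specialise is bound; the placement function can then be re-applied to the smaller description, which is what produces the compacted layout of Figure 15.

import qualified Data.Map as M

-- A toy fragment of the relative-placement syntax (our own simplification).
data Stmt
  = Prim String                       -- a primitive or composite block call
  | Beside [Stmt]
  | Below  [Stmt]
  | GenIf  Guard Stmt                 -- GENERATE IF guard THEN stmt
  deriving Show

data Guard = Equals String Int deriving Show   -- parameter = literal

type Env = M.Map String Int                    -- statically known parameters

-- Partial evaluation: eliminate conditionals whose guards are static.
-- Guards over unknown parameters are kept unchanged (residualised).
peval :: Env -> Stmt -> [Stmt]
peval env stmt = case stmt of
  Prim b    -> [Prim b]
  Beside ss -> [Beside (concatMap (peval env) ss)]
  Below  ss -> [Below  (concatMap (peval env) ss)]
  GenIf g s -> case evalGuard env g of
                 Just True  -> peval env s                 -- keep branch, drop guard
                 Just False -> []                          -- dead code disappears
                 Nothing    -> [GenIf g (Beside (peval env s))]  -- residualised
  where
    evalGuard e (Equals v n) = fmap (== n) (M.lookup v e)

-- The DES loop body: the full branch and the specialised branch of Figure 6.
loopBody :: Stmt
loopBody = Beside
  [ GenIf (Equals "specialise" 0) (Beside [Prim "keygen", Prim "xors", Prim "round"])
  , GenIf (Equals "specialise" 1) (Beside [Prim "connect", Prim "round"])
  ]

main :: IO ()
main = mapM_ print (peval (M.fromList [("specialise", 1)]) loopBody)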
BLOCK des (x,y,specialise)
  [textin:VECTOR (63..0) OF WIRE; clk:WIRE;
   keyin:VECTOR (55..0) OF WIRE; modein:WIRE]
  [textout:VECTOR (63..0) OF WIRE]
VAR i;
VAR text   : VECTOR (16..0) OF VECTOR (63..0) OF WIRE;
VAR xortext: VECTOR (16..0) OF VECTOR (47..0) OF WIRE;
VAR exptext: VECTOR (16..0) OF VECTOR (47..0) OF WIRE;
VAR key    : VECTOR (16..0) OF VECTOR (55..0) OF WIRE;
VAR mode   : VECTOR (16..0) OF WIRE;
VAR rkey   : VECTOR (16..0) OF VECTOR (47..0) OF WIRE;
BEGIN
  connect [text(0), textin];
  connect [key(0), keyin];
  connect [mode(0), modein];
  GENERATE FOR i=0..15 BEGIN
    GENERATE IF specialise=0 THEN
      keygen (x+(4*i),y) [key(i), mode(i), clk] [rkey(i), key(i+1), mode(i+1)];
      xors (x+1+(4*i),y) [exptext(i), rkey(i)] [xortext(i)];
      round (x+2+(4*i),y) [text(i), xortext(i), clk] [exptext(i), text(i+1)]
    END;
    GENERATE IF specialise=1 THEN
      connect [xortext(i), exptext(i)];
      round (x+1+(4*i),y) [text(i), xortext(i), clk] [exptext(i), text(i+1)]
    END
  END;
  connect [textout, text(16)]
END

Fig. 14. Pebble description of the DES design (Figure 6) with placement given by coordinates. Since our method involves putting a conditional statement at the end of a BESIDE or BELOW list, the round block is replicated to appear in both GENERATE IF statements, each with different coordinates.
When implemented on a Xilinx Virtex FPGA, the bounding box of the floorplan of the specialised design is 40% of that of the non-specialised design – in other words, the compaction reduces its size by 60%. A similar specialised design with floorplanning [14] runs at 10.7 Gbits per second, which is 600 Mbits per second faster than a comparable non-specialised implementation without floorplanning [19].
BLOCK des (x,y)
  [textin:VECTOR (63..0) OF WIRE; clk:WIRE]
  [textout:VECTOR (63..0) OF WIRE]
VAR i;
VAR text   : VECTOR (16..0) OF VECTOR (63..0) OF WIRE;
VAR xortext: VECTOR (16..0) OF VECTOR (63..0) OF WIRE;
VAR exptext: VECTOR (16..0) OF VECTOR (47..0) OF WIRE;
BEGIN
  connect [text(0), textin];
  GENERATE FOR i=0..15 BEGIN
    connect [xortext(i), exptext(i)];
    round (x+1+(4*i),y) [text(i), xortext(i), clk] [exptext(i), text(i+1)]
  END;
  connect [textout, text(16)]
END

Fig. 15. Pebble description of the DES compacted design when specialise=1.
7 Summary
We have provided a functional specification for a procedure that compiles a description with relative placement information into a version where symbolic information is specified using coordinates. We have also shown how a description using relative placement can be optimised using partial evaluation, so that compaction is achieved for free. Such compaction can benefit designs in which parameters are used for block selection in the floorplan. Our approach applies to these designs and is supported by Pebble, a simple language based on Structural VHDL. Prototype tools have also been developed to support experiments with placement constraints expressed as polynomial expressions [2]. Such placement constraint expressions can be solved automatically by a hierarchical resolution engine. This approach allows for greater placement accuracy. The target applications for our methodology include hardware libraries and run-time reconfigurable designs. Hardware libraries can be optimised for different parameters and instantiated before or after compaction without increasing complexity or inefficiency. Run-time reconfigurable designs enable the synthesis of smaller circuits which can operate at higher speeds and consume less power than non-reconfigurable designs [10]. The RECONFIGURE IF statement [9] enables circuit descriptions where two components can occupy the same location at different instants. Our methodology extends naturally to include this paradigm. Current work involves verifying the correctness of our transformations, developing an efficient partial evaluator which exploits source to source optimisations, and extending our approach to cover descriptions with optional placement constraints [9] and polymorphic and higher-order features [13].
Acknowledgements. Many thanks to the anonymous reviewers for their comments and suggestions. The support of Xilinx, Inc., Celoxica Limited and UK Engineering and Physical Sciences Research Council (Grant number GR/N 66599) is gratefully acknowledged. This work was carried out as part of Technology Group 10 of UK MOD’s Corporate Research Programme.
References 1. P. Bjesse, K. Claessen, M. Sheeran and S. Singh, “Lava: Hardware design in Haskell”, Proc. ACM Int. Conf. Functional Programming (ICFP’98), ACM Press, 1998. 2. F. Dupont-De-Dinechin, W. Luk and S.W. McKeever, “Towards portable hierarchical placement for FPGAs”, INRIA Report 3776, 1999. 3. S. Guo and W. Luk, “An Integrated system for developing regular array design”, Journal of Systems Architecture, Vol. 47, 2001. 4. N. Jones, C. Gomard and P. Sestoft, Partial Evaluation and Automatic Program Generation, Prentice Hall International Series in Computer Science, 1993. 5. W. Luk, “A declarative approach to incremental custom computing”, in Proc. Symp. on FPGAs for Custom Computing Machines, IEEE Computer Society Press, 1995. 6. W. Luk, S. Guo, N. Shirazi and N. Zhuang, “A framework for developing parametrised FPGA libraries”, in Field-Programmable Logic and Applications, LNCS 1142, Springer, 1996. 7. W. Luk, T. Kean, A. Derbyshire, J. Gause, S.W. McKeever, O. Mencer and A. Yeow, “Parameterised Hardware Libraries for Programmable System-on-Chip Technology”, in Canadian Journal of Electrical and Computer Engineering, Vol. 26, No. 3/4, 2001. 8. W. Luk and I. Page, “Parametrising designs for FPGAs”, in FPGAs, Abingdon EE&CS Books, 1991. 9. W. Luk and S.W. McKeever, “Pebble: a language for parametrised and reconfigurable hardware design”, in Field-Programmable Logic and Applications, LNCS 1482, Springer, 1998. 10. J. MacBeth and P. Lysaght, “Dynamically reconfigurable cores”, in FieldProgrammable Logic and Applications, LNCS 2147, Springer, 2001. 11. N. McKay, T. Melham, K.W. Susanto and S. Singh, “Dynamic specialisation of XC6200 FPGAs by partial evaluation”, in Proc. Symp. on FPGAs for Custom Computing Machines, IEEE Computer Society Press, 1998. 12. S.W. McKeever and W. Luk, “Towards provably-correct hardware compilation tools based on pass separation techniques”, in Correct Hardware Design and Verification Methods, LNCS 2144, Springer, 2001. 13. S.W. McKeever and W. Luk, “A declarative framework for developing parametrised hardware libraries”, in Proc. 8th Int. Conf. on Electronics, Circuits and Systems, IEEE, 2001. 14. C. Patterson, “High Performance DES Encryption in Virtex FPGAs using JBits”, Proc. Symp. on Field-Programmable Custom Computing Machines, IEEE Computer Society Press, 2000. 15. R. Sharp and O. Rasmussen, “The T-Ruby design system”, Formal Methods in System Design, Vol. 11, No. 3, October, 1997.
16. N. Shirazi, W. Luk and P.Y.K. Cheung, “Framework and tools for run-time reconfigurable designs”, IEE Proc. Comput. Digit. Tech., Vol. 147, No. 3, May 2000. 17. S. Singh, “Architectural descriptions for FPGA circuits”, in Proc. Symp. on FPGAs for Custom Computing Machines, IEEE Computer Society Press, 1995. 18. S. Singh, “Death of the RLOC?”, in Proc. Symp. on Field-Programmable Custom Computing Machines, IEEE Computer Society Press, 2000. 19. S. Trimberger, R. Pang and A. Singh, “A 12 Gbps DES encryptor/decryptor core in an FPGA”, in Proc. Cryptographic Hardware and Embedded Systems, LNCS 1965, Springer, 2000. 20. Q. Wang and D.M. Lewis, “Automated field-programmable compute accelerator design using partial evaluation”, in Proc. Symp. on FPGAs for Custom Computing Machines, IEEE Computer Society Press, 1997.
Input/Output Compatibility of Reactive Systems Josep Carmona1 and Jordi Cortadella2 1
Universitat Politècnica de Catalunya Computer Architecture Department Avda. Canal Olímpic, s/n. 08860 Castelldefels, Spain [email protected] 2
Universitat Politècnica de Catalunya Software Department Jordi Girona 1-3 08034 Barcelona, Spain [email protected]
Abstract. The notion of I/O compatibility of reactive systems is defined. It models the fact that two systems can be connected and establish a correct dialogue through their input and output events. I/O compatibility covers safeness and liveness properties that can be checked with a polynomial-time decision procedure. The relationship between observational equivalence, I/O compatibility and input properness is also studied with the aim of supporting the proposal of transformations for the synthesis of reactive systems. Finally, a set of Petri net transformations that preserve I/O compatibility is shown as an example of application of the theory presented in this paper. Keywords: Reactive systems, Input/Output compatibility, Observational equivalence, Synchronous product, Trace theory, Conformation, Petri nets.
1 Introduction
This section is devoted to presenting the motivation of this work and a summary of the main contributions.
1.1 Reactive Systems
A system is said to be reactive when it has an explicit interaction with an environment. A reactive system can receive input stimuli from the environment, execute internal operations and produce results observable by the environment.
This work has been partially funded by the Ministry of Science and Technology of Spain under contract TIC 2001-2476, ACiD-WG (IST-1999-29119) and a grant by Intel Corporation.
Formally, a reactive system can be modeled as a transition system with an explicit distinction among input, internal and output events. The system can only control its own events (internal and output), but cannot prevent the environment from producing input events if it decides to do so. Two different reactive systems can interact by connecting their inputs and outputs. We assume that the composition of reactive systems is done by synchronizing common events. An example of composition is the connection of two digital circuits, in which the transitions of any output signal are simultaneously observed by the circuit receiving them as inputs. Thus, the concept of environment is relative: each system considers the other to be its environment.
1.2 Motivation
The motivation comes from the need to formalize the fact that two systems can be connected and establish a consistent dialogue through their input and output events. The theory presented in this paper is inspired by the work of Dill [7]. The formal model for specifying a system considered here is more restricted than the one presented by Dill for complete trace structures. However, the properties covered by the model, including some notion of liveness, can be checked in polynomial time. For the type of systems that we want to deal with, the model is powerful enough. The definition of correct interaction is done by relating the states of the two systems. This state-based definition eases the proof of properties on their interaction. When the enabledness of input events is considered, sufficient conditions can be obtained that relate the theory with the well-known concepts of observational equivalence [14] and input properness [3]. Finally, we show that the theory presented can be used for the synthesis of reactive systems. A kit of Petri net transformations that are proved to preserve the notion of I/O compatibility is presented. A practical application of this work is found in the area of synthesis of concurrent systems, e.g. asynchronous circuits [3] or codesign of embedded systems [9].
1.3 I/O Compatibility
The notion we want to model is Input/Output compatibility. We now illustrate this notion with some examples and show why other equivalences for concurrent systems are not appropriate. Figure 1(a) depicts two reactive systems, X and Y , synchronized by a pair of events, a and b. Event a is an output for X and an input for Y , whereas b is an input for X and an output for Y . Moreover, X has an internal event τ . When enabled, internal and output events may take an unbounded, but finite, delay to fire. At each state, a system has only a (possibly empty) subset of input events enabled. If a non-enabled input is produced by the other partner, a communication failure is produced. The transition systems in Fig. 1(a) are observational equivalent. However, they are not I/O compatible, according to the notion presented in this paper.
Fig. 1. Connection between different reactive systems (the suffixes ? and ! are used to denote input and output events, respectively).
In the initial state, only event a (produced by X) is enabled. After firing a synchronously in both systems, a new state is reached. In this state, Y is ready to produce b. However, X is not ready to accept b before τ is produced and, thus, a communication failure occurs when Y fires b and X has not fired τ yet. Therefore, observational equivalence does not imply I/O compatibility. Figure 1(b) shows that I/O compatibility does not imply observational equivalence. The synchronization of X and Y through the input and output events produces the following language: (abcd)*. In the initial state, X is ready to accept a and b in any order, i.e. they can fire concurrently. However, Y produces a and b sequentially. This situation is reversed for events c and d, accepted concurrently by Y but produced sequentially by X. In either case, the synchronization between X and Y is correct and both systems can interact without any failure. However, it is easy to see that X and Y are not observationally equivalent. Figure 1(c) depicts another undesired situation. After having produced event a, both systems block waiting for each other to fire some event. Thus, a deadlock is produced. This interaction would be considered "fair" in I/O automata theory [12]. Finally, there is another situation not acceptable for I/O compatible systems: livelock. This situation occurs when one of the systems can manifest an infinite internal behavior without any interaction with the other partner.
1.4 Application to the Synthesis of Reactive Systems
The main objective of this work is to provide a formal framework for characterizing valid transformations of reactive systems during synthesis. Synthesis is the process of transforming a system from a specification to an implementation that uses primitive actions available in some library. For example, a circuit is usually specified in terms of Boolean equations. However, only logic gates with limited fanin are available in a library. For this reason, Boolean equations must be decomposed and matched with logic gates. When synthesizing asynchronous circuits [3], each logic gate introduces a new internal signal with its associated internal events.
Another example is software synthesis. A compiler is a tool that transforms a high-level specification into assembly code. In this process, many low-level internal actions are introduced (e.g. moving data across internal registers). In case of software synthesis for reactive systems, these internal actions are not observable by the environment.
Fig. 2. Transformations for the synthesis of a reactive system
Figure 2 depicts an example of valid and invalid transformations according to the I/O compatibility criterion. The system X is I/O compatible with X̄, the mirror of X. Let us assume that, for implementability reasons, two internal actions must be introduced in X̄, say τ1 and τ2. The transformation that leads from X̄ to X' produces the internal events concurrently between a and b. On the other hand, the system X'' produces τ1 after a, and then τ2 and b concurrently. Even though the transformations from X̄ to X' and X'' preserve observational equivalence, only X' is I/O compatible with X. If we analyze the interaction between X and X'', we observe that the trace aτ1b leads to a state in which X can produce the event a but X'' cannot accept it. In this work we will show that input-properness is an important property of reactive systems that guarantees that the receptiveness of input events does not depend on the internal activity of the system.
1.5 Contributions
The contributions of this work are next summarized:

– A formal definition of I/O compatibility, as a relation between the states of two reactive systems, is given.
– Safety and liveness properties of I/O compatible systems are proved.
– A polynomial-time decision procedure for I/O compatibility of finite transition systems is presented.
– The relationship between observational equivalence, input-properness and I/O compatibility is studied as a support to propose I/O-compatible transformations during synthesis.
– A kit of Petri net transformations preserving I/O compatibility is presented as an example to support the synthesis of asynchronous circuits.

For simplicity, only I/O compatibility between two systems is considered. The extension to multiple systems would make the nomenclature more tedious and the paper less readable, and would not contribute to a deeper treatment of the main concepts of this work. The extension to more than two systems is quite straightforward and left for the reader.
2 Reactive Transition Systems
An event in a reactive system can be input, output or internal. An input event represents an action produced by the environment, whereas an output event represents an action produced by the system. Finally, an internal event represents internal actions not observable by the environment. Typical examples of reactive systems are a computer, a television set and a vending machine. The events executed in a reactive system are assumed to take arbitrary but finite time. Formally, a Reactive Transition System is a Transition System [1] where transitions are labeled with events that can occur in a reactive system.

Definition 1 (Reactive Transition System). A Reactive Transition System (RTS) is a 4-tuple A = (S, Σ, T, s_in) where
– S is the set of states;
– Σ is the alphabet of events, partitioned into three pairwise disjoint subsets of input (Σ_I), output (Σ_O) and internal (Σ_INT) events. Σ_OBS = Σ_I ∪ Σ_O is called the set of observable events;
– T ⊆ S × Σ × S is the set of transitions;
– s_in ∈ S is the initial state.

We will call it simply transition system (TS) when the distinction among input, output and internal events is irrelevant.

Definition 2 (Enabling). An event e is enabled in the state s, denoted by En(s, e), if (s, e, s') ∈ T for some s'.

Reachability in an RTS. The transitions are denoted by (s, e, s') or s -e-> s'. The reachability relation between states is the transitive closure of the transition relation T. The predicate s -σ-> s' denotes a trace of events σ that leads from s to s' by firing transitions in T. A state s is terminal if no event is enabled in s. An RTS is finite if S and T are finite sets. An RTS is deterministic if for each state s and each event e there can be at most one state s' such that s -e-> s'.

Language of an RTS. An RTS can be viewed as an automaton with alphabet Σ, where every state is an accepting state. For an RTS A, let L(A) be the corresponding language, i.e. its set of traces starting from the initial state.
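Definitions 1 and 2 translate directly into an executable representation. The Haskell sketch below is our own encoding, not taken from the paper: an RTS over integer states and string events, the predicate En(s, e), and the set of enabled events used later in Definition 8 and Theorem 4. The example system encodes X of Fig. 1(a).

import qualified Data.Set as S

-- Our encoding of a Reactive Transition System (Definition 1).
data EventKind = Input | Output | Internal deriving (Eq, Show)

data RTS = RTS
  { states      :: S.Set Int                  -- S
  , kindOf      :: String -> EventKind        -- partition of the alphabet
  , transitions :: [(Int, String, Int)]       -- T, a subset of S x Sigma x S
  , initial     :: Int                        -- s_in
  }

-- En(s, e): event e is enabled in state s (Definition 2).
en :: RTS -> Int -> String -> Bool
en a s e = any (\(s1, e1, _) -> s1 == s && e1 == e) (transitions a)

-- The set of events enabled in a state, used in Definition 8(4) and Theorem 4.
enabled :: RTS -> Int -> [String]
enabled a s = [ e | (s1, e, _) <- transitions a, s1 == s ]

-- A state is terminal if no event is enabled in it.
terminal :: RTS -> Int -> Bool
terminal a s = null (enabled a s)

-- Example: system X of Fig. 1(a) -- output a, then internal tau, then input b.
sysX :: RTS
sysX = RTS (S.fromList [0, 1, 2])
           (\e -> case e of { "a" -> Output; "b" -> Input; _ -> Internal })
           [(0, "a", 1), (1, "tau", 2), (2, "b", 0)]
           0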
2.1 Properties of Reactive Transition Systems
Depending on the interpretation of the events in an RTS, different properties can be defined.

Definition 3 (Livelock). A livelock is an infinite trace of only internal events. An RTS is livelock-free if it has no livelocks.

Livelocks can be detected in polynomial time in finite RTSs. The problem is reduced to the detection of cycles in a graph in which only the edges labeled with internal events are taken into account.

Definition 4 (Input-properness). An RTS is input-proper when for every internal transition s -e-> s', with e ∈ Σ_INT, and for every input event i ∈ Σ_I, En(s', i) ⇒ En(s, i).

In other words, input-properness is a property that indicates that the enabledness of an input event in a given state depends only on the observable trace leading to that state. Input-properness was introduced in [3] and is a crucial concept to preserve I/O compatibility, as shown later in Sect. 5. It avoids the situations in which the system is doing some "pending" internal work when the environment is producing an input event. The underlying idea of input-properness was previously presented by Dill [7] when, as a result of hiding an output signal, the same trace could be considered both as success and failure.

Definition 5 (Mirror). The mirror of A, denoted by Ā, is another RTS identical to A, but in which the input and output alphabets of A have been interchanged.
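The remark above that livelock detection reduces to cycle detection can be made concrete. The sketch below is ours, with hypothetical names: it keeps only the transitions labelled with internal events and checks whether the resulting graph contains a cycle by repeatedly deleting states without outgoing internal transitions.

import qualified Data.Set as S

-- A livelock exists iff the subgraph of internal transitions contains a cycle
-- (Definition 3).  'internal' classifies events; states and events stay abstract.
hasLivelock :: Ord s => [(s, e, s)] -> (e -> Bool) -> Bool
hasLivelock trans internal = not (S.null (prune verts))
  where
    edges = [ (s, t) | (s, e, t) <- trans, internal e ]
    verts = S.fromList (concat [ [s, t] | (s, t) <- edges ])
    -- Repeatedly remove vertices with no outgoing edge inside the remaining set;
    -- a non-empty fixed point means the internal subgraph contains a cycle.
    prune vs
      | vs == vs' = vs
      | otherwise = prune vs'
      where
        vs' = S.filter hasOut vs
        hasOut v = any (\(s, t) -> s == v && t `S.member` vs) edges

-- e.g. hasLivelock [(0,"t1",1),(1,"t2",0)] (`elem` ["t1","t2"]) evaluates to True.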
2.2 Synchronous Product
The synchronous product of two transition systems is a new transition system which models the interaction between both systems that synchronize with common events [1]. We define the synchronous product for the class of transition systems, where no partition exists among the set of events. The extension to reactive transition systems is straightforward.

Definition 6 (Synchronous Product). Let A = (S^A, Σ^A, T^A, s_in^A) and B = (S^B, Σ^B, T^B, s_in^B) be two TSs. The synchronous product of A and B, denoted by A × B, is another TS (S, Σ, T, s_in) defined by:
– s_in = ⟨s_in^A, s_in^B⟩;
– Σ = Σ^A ∪ Σ^B;
– S ⊆ S^A × S^B is the set of states reachable from s_in according to the following definition of T;
– Let ⟨s1, s1'⟩ ∈ S.
  • If e ∈ Σ^A ∩ Σ^B, s1 -e-> s2 ∈ T^A and s1' -e-> s2' ∈ T^B, then ⟨s1, s1'⟩ -e-> ⟨s2, s2'⟩ ∈ T.
  • If e ∈ Σ^A \ Σ^B and s1 -e-> s2 ∈ T^A, then ⟨s1, s1'⟩ -e-> ⟨s2, s1'⟩ ∈ T.
  • If e ∈ Σ^B \ Σ^A and s1' -e-> s2' ∈ T^B, then ⟨s1, s1'⟩ -e-> ⟨s1, s2'⟩ ∈ T.
  • No other transitions belong to T.
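Definition 6 is constructive, and a direct implementation over explicit transition lists is short. The following Haskell sketch is our own rendering, using lists and sets rather than the paper's notation; it builds only the part of A × B reachable from the initial pair of states.

import qualified Data.Set as S

-- Synchronous product of two transition systems given by their transition
-- lists, alphabets and initial states (Definition 6).
syncProduct
  :: (Ord s1, Ord s2, Ord e)
  => ([(s1, e, s1)], S.Set e, s1)        -- A: transitions, alphabet, initial state
  -> ([(s2, e, s2)], S.Set e, s2)        -- B: transitions, alphabet, initial state
  -> (S.Set (s1, s2), [((s1, s2), e, (s1, s2))])
syncProduct (ta, sigA, ia) (tb, sigB, ib) =
  explore (S.singleton (ia, ib)) [(ia, ib)] []
  where
    step (p, q) =
      -- shared events must fire in both systems simultaneously
      [ ((p, q), e, (p', q')) | (p1, e, p') <- ta, p1 == p, e `S.member` sigB
                              , (q1, e2, q') <- tb, q1 == q, e2 == e ] ++
      -- events private to A move A and leave B unchanged, and vice versa
      [ ((p, q), e, (p', q))  | (p1, e, p') <- ta, p1 == p, not (e `S.member` sigB) ] ++
      [ ((p, q), e, (p, q'))  | (q1, e, q') <- tb, q1 == q, not (e `S.member` sigA) ]

    -- Breadth-first exploration of the reachable product states.
    explore seen []           acc = (seen, acc)
    explore seen (st : front) acc =
      let ts  = step st
          new = S.toList (S.fromList [ t | (_, _, t) <- ts ] `S.difference` seen)
      in explore (foldr S.insert seen new) (front ++ new) (acc ++ ts)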
3 I/O Compatibility
A formal description of the conditions needed for having a correct dialogue between two RTSs is given in this section. We call this set of conditions I/O compatibility. The properties of I/O compatibility can be stated in natural language:
(a) Safeness: if system A can produce an output event, then B must be prepared to accept the event.
(b) Liveness: if system A is blocked waiting for a synchronization with B, then B must produce an output event in a finite period of time.
Theorems 1, 2 and 3 presented below formally define these properties.

Two RTSs are structurally I/O compatible if they share the observational set of events, in a way that they can be connected.

Definition 7 (Structural I/O Compatibility). Let A = (S^A, Σ^A, T^A, s_in^A) and B = (S^B, Σ^B, T^B, s_in^B) be two RTSs. A and B are structurally I/O compatible if Σ_I^A = Σ_O^B, Σ_O^A = Σ_I^B, Σ^A ∩ Σ_INT^B = ∅ and Σ^B ∩ Σ_INT^A = ∅.

The following definition gives a concise formalization of the conditions needed for characterizing the correct interaction of two RTSs:

Definition 8 (I/O Compatibility). Let A = (S^A, Σ^A, T^A, s_in^A) and B = (S^B, Σ^B, T^B, s_in^B) be two structurally I/O compatible RTSs. A and B are I/O compatible, denoted by A B, if A and B are livelock-free and there exists a relation R ⊆ S^A × S^B such that:
1. s_in^A R s_in^B.
2. Receptiveness (output events of one party are expected by the other party):
   (a) If s1 R s1', e ∈ Σ_O^A and s1 -e-> s2, then En(s1', e) and ∀ s1' -e-> s2' : s2 R s2'.
   (b) If s1 R s1', e ∈ Σ_O^B and s1' -e-> s2', then En(s1, e) and ∀ s1 -e-> s2 : s2 R s2'.
3. Internal Progress (internal process preserves the interaction):
   (a) If s1 R s1', e ∈ Σ_INT^A and s1 -e-> s2, then s2 R s1'.
   (b) If s1 R s1', e ∈ Σ_INT^B and s1' -e-> s2', then s1 R s2'.
4. Deadlock-freeness (both parties cannot be blocked at the same time):
   (a) If s1 R s1' and {e | En(s1, e)} ⊆ Σ_I^A, then {e | En(s1', e)} ⊈ Σ_I^B.
   (b) If s1 R s1' and {e | En(s1', e)} ⊆ Σ_I^B, then {e | En(s1, e)} ⊈ Σ_I^A.
Let us consider the examples of Fig. 1. In Fig. 1(a), the receptiveness condition fails and therefore X and Y are not I/O compatible. However, the RTSs of Fig. 1(b) are I/O compatible. Finally, Fig. 1(c) presents an example of violation of the deadlock-freeness condition. Condition 4 has a strong impact on the behavior of the system. It guarantees that the communication between A and B has no deadlocks (see Theorem 3).

Lemma 1. Let A and B be two RTSs such that A B, let R be an I/O compatibility relation between A and B, and let A × B = (S, Σ, T, s_in) be the synchronous product of A and B. Then ⟨s, s'⟩ ∈ S ⇒ s R s'.
Proof. If s, s ∈ S, then there is a trace σ that leads from sin to s, s . We prove the lemma by induction on the length of σ. – Case |σ| = 0. The initial states are related in Condition 1 of Definition 8. – Case |σ| > 0. Let σ = σ e, with |σ | = n, and assume that it holds for any trace up to length n. Let s1 , s1 be the state where the event e is enabled. The induction hypothesis ensures that s1 is I/O compatible to s1 . Two situations can happen in s1 depending on the last event e of σ: either 1) e ∈ ΣO ∪ ΣIN T is enabled in s1 , or 2) only input events are enabled in s1 . In situation 1), Conditions 2-3 of Definition 8 guarantee that s is I/O compatible to s . In situation 2), applying Condition 4 of Definition 8 ensure that some non-input event is enabled in state s1 of B. Definition 6 and Conditions 2-3 on s1 and the enabled non-input event e guarantees s to be I/O compatible to s . ✷ Theorem 1 (Safeness). Let A and B be two RTSs such that A B, and a σ trace σ ∈ L(A × B) of their synchronous product such that sin → s, s . If A can fire an output event in s, then the same event is enabled in state s of B. Proof. It immediately follows from Lemma 1 and the condition of receptiveness in the definition of I/O compatibility. ✷ Theorem 2 (Absence of Livelocks). Let A and B be two RTSs such that A B, and let A × B be the synchronous product of A and B. Then, A × B is livelock-free. Proof. The definition of synchronous product implies that only livelocks appear in A × B if either A or B has a livelock. But A and B are livelock-free because A B. ✷ The following theorem is the one that proves the absence of deadlocks produced by the interaction between two I/O compatible RTSs. Theorem 3 (Liveness). Let A, B be two RTSs such that A B, and a trace σ σ ∈ L(A × B) of their synchronous product such that sin → s, s . If only input σ
events of A are enabled in s, then there exists some trace s, s → s, s such that some of the input events of A enabled in s are also enabled in s as output events of B. Proof. By Lemma 1 we have that sRs . We also have that {e | En(s, e)} ⊆ ΣIA . By Condition 4 of Definition 8 we know that {e | En(s1 , e)} ΣIB . Theorem 2 guarantees the livelock-freeness of A × B, and therefore from s, s there exists a trace of internal events reaching a state s, s where no internal event is enabled. We know by Lemma 1 that sRs . Condition 4 of Definition 8, together with the fact that no internal event is enabled in s implies that there exists an output event enabled in s , which is enabled as input in s. ✷
4 A Polynomial-Time Decision Procedure for I/O Compatibility
A procedure for deciding if two finite RTSs are I/O compatible is presented in this section. It is based on the synchronous product of transition systems.

Theorem 4. Let A = (S^A, Σ^A, T^A, s_in^A) and B = (S^B, Σ^B, T^B, s_in^B) be two livelock-free RTSs. A and B are I/O compatible if and only if A × B = (S, Σ, T, s_in) fulfills the following properties:
1. (a) For each state s ∈ S^A and each event e ∈ Σ_O^A: if En(s, e) holds and ⟨s, s'⟩ ∈ S, then En(⟨s, s'⟩, e) holds.
   (b) For each state s' ∈ S^B and each event e ∈ Σ_O^B: if En(s', e) holds and ⟨s, s'⟩ ∈ S, then En(⟨s, s'⟩, e) holds.
2. For every ⟨s, s'⟩ ∈ S: if ⟨s, s'⟩ is a terminal state, then s and s' are terminal states in A and B, respectively.
Proof. The proof is divided into two parts:
Sufficiency. Let R be an I/O compatibility relation between A and B and ⟨s, s'⟩ ∈ S. Lemma 1 guarantees that s R s'.
1. Let e be an output event of A enabled in s. Since s R s', En(s', e) holds in B. By the definition of the synchronous product, En(⟨s, s'⟩, e) holds. (Similarly for 1(b).)
2. Every non-input event e enabled in s or s' induces e to be enabled in ⟨s, s'⟩. If only input events are enabled in one of the states, Condition 4 of Definition 8 guarantees the enabling in the other state of a non-input event, and the definition of the synchronous product ensures the existence of a transition leaving from ⟨s, s'⟩.
Necessity. We will prove that S is an I/O compatibility relation between A and B. State ⟨s^A_in, s^B_in⟩ belongs to S by the definition of the synchronous product. Let ⟨s, s'⟩ ∈ S. Property 1, together with the definition of the synchronous product, implies the receptiveness condition of Definition 8. Condition 3 (internal progress) of Definition 8 holds by the definition of the synchronous product: every internal event e enabled in s (s') is also enabled in ⟨s, s'⟩, and the state(s) of S reached by the firing of e in ⟨s, s'⟩ are exactly the pairs of I/O compatible states induced by Condition 3 with s and s'. Condition 4 (deadlock-freeness) of Definition 8 also holds: if the events enabled in s are input events, then given that ⟨s, s'⟩ is not terminal (due to Property 2), the only possibility for having an event enabled in ⟨s, s'⟩ in Definition 6 is when a non-input event is enabled in s'. ✷

Theorem 4 enables the use of the synchronous product for deciding the I/O compatibility of two finite RTSs in polynomial time1. It consists in computing the synchronous product in the first step, and then checking Conditions 1 and 2 of the theorem.

1 Figure 3 shows why it is necessary to consider only livelock-free RTSs in Theorem 4. Systems 1 and 2 are I/O compatible, but System 1 could have a livelock in the state reached after the sequence bτ1a.
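Theorem 4 suggests a direct polynomial-time check: build the reachable synchronous product and test the two properties on it. The Python sketch below is our own illustration of that procedure, not code from the paper; the RTS encoding (a dict with "init", "trans" mapping each state to its outgoing (event, successor) pairs, and "outputs"), the "common" synchronization alphabet, and all function names are assumptions made for the example.

from collections import deque

def sync_product(A, B, common):
    # BFS construction of the reachable synchronous product; A and B synchronize
    # on the events in `common` and interleave on all other events.
    init = (A["init"], B["init"])
    product, seen, queue = {}, {init}, deque([init])
    while queue:
        s, t = queue.popleft()
        moves = []
        for e, s2 in A["trans"][s]:
            if e in common:
                moves += [(e, (s2, t2)) for f, t2 in B["trans"][t] if f == e]
            else:
                moves.append((e, (s2, t)))
        for e, t2 in B["trans"][t]:
            if e not in common:
                moves.append((e, (s, t2)))
        product[(s, t)] = moves
        for _, nxt in moves:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return product

def io_compatible(A, B, common):
    # Check Conditions 1 and 2 of Theorem 4 on every reachable product state.
    product = sync_product(A, B, common)
    for (s, t), moves in product.items():
        enabled = {e for e, _ in moves}
        # Condition 1: an output enabled in a component must stay enabled in the product.
        if any(e in A["outputs"] and e not in enabled for e, _ in A["trans"][s]):
            return False
        if any(e in B["outputs"] and e not in enabled for e, _ in B["trans"][t]):
            return False
        # Condition 2: a terminal product state requires both components to be terminal.
        if not moves and (A["trans"][s] or B["trans"][t]):
            return False
    return True

The cost is proportional to the size of the reachable product, which has at most |S^A| · |S^B| states, in line with the polynomial-time claim.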
5 I/O Compatibility and Observational Equivalence
In the first part of this section, the observational equivalence relation [13] is defined. Section 5.2 presents the relationship between I/O compatibility and observational equivalence. The proofs for the theorems in this section are not difficult, but tedious. For this reason, they are presented in the appendix.

5.1 Observational Equivalence
The observational equivalence relation between two reactive systems was first introduced by Milner in [13]. The relation identifies those systems whose observable behavior is indistinguishable.

Definition 9. Let A = (S^A, Σ^A, T^A, s^A_in) and B = (S^B, Σ^B, T^B, s^B_in) be two RTSs. A and B are observationally equivalent (A ≈ B) iff Σ^A_OBS = Σ^B_OBS and there exists a relation R ⊆ S^A × S^B satisfying
1. s^A_in R s^B_in.
2. (a) ∀s ∈ S^A, ∃s' ∈ S^B s.t. s R s'.
   (b) ∀s' ∈ S^B, ∃s ∈ S^A s.t. s R s'.
3. (a) ∀s1 ∈ S^A, s1' ∈ S^B: if s1 R s1', e ∈ Σ^A_OBS and s1 −e→ s2, then ∃σ1, σ2 ∈ (Σ_INT)* such that s1' −(σ1 e σ2)→ s2', and s2 R s2'.
   (b) ∀s1 ∈ S^A, s1' ∈ S^B: if s1 R s1', e ∈ Σ^B_OBS and s1' −e→ s2', then ∃σ1, σ2 ∈ (Σ_INT)* such that s1 −(σ1 e σ2)→ s2, and s2 R s2'.
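To make the definition concrete, the following sketch (our own illustration, not code from the paper) computes the greatest relation satisfying Condition 3 by iteratively pruning violating pairs, and then tests Conditions 1 and 2 on it; since every relation satisfying Condition 3 is contained in the greatest one, this check suffices. The RTS encoding ("states", "init", "trans" mapping every state to its outgoing (event, successor) pairs, and "internal") and the helper names are assumptions.

from itertools import product as pairs

def weak_succ(rts, s, e):
    # States reachable from s by (internal)* e (internal)* -- the sigma1 e sigma2
    # matching moves of Definition 9.
    def tau_closure(states):
        closure, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for ev, q2 in rts["trans"][q]:
                if ev in rts["internal"] and q2 not in closure:
                    closure.add(q2)
                    stack.append(q2)
        return closure
    after_e = {q2 for q in tau_closure({s}) for ev, q2 in rts["trans"][q] if ev == e}
    return tau_closure(after_e)

def observationally_equivalent(A, B):
    obs_a = {e for ms in A["trans"].values() for e, _ in ms} - A["internal"]
    obs_b = {e for ms in B["trans"].values() for e, _ in ms} - B["internal"]
    if obs_a != obs_b:
        return False
    # Greatest relation satisfying Condition 3, computed by pruning violating pairs.
    R = set(pairs(A["states"], B["states"]))
    changed = True
    while changed:
        changed = False
        for s, t in list(R):
            ok = all(any((s2, t2) in R for t2 in weak_succ(B, t, e))
                     for e, s2 in A["trans"][s] if e not in A["internal"])
            ok = ok and all(any((s2, t2) in R for s2 in weak_succ(A, s, e))
                            for e, t2 in B["trans"][t] if e not in B["internal"])
            if not ok:
                R.discard((s, t))
                changed = True
    # Conditions 1 and 2: initial states related, every state of each system covered.
    return ((A["init"], B["init"]) in R
            and {s for s, _ in R} == set(A["states"])
            and {t for _, t in R} == set(B["states"]))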
The two RTSs of Fig. 1(a) are observationally equivalent, because every observable sequence of one of them can be executed in the other. Figures 1(b)-(c) depict examples of non-observationally equivalent systems.

5.2 A Sufficient Condition for I/O Compatibility
A sufficient condition for having I/O compatibility between two reactive systems can be obtained when combining the notions of observational equivalence and input-properness:

Theorem 5. Let A = (S^A, Σ^A, T^A, s^A_in) and B = (S^B, Σ^B, T^B, s^B_in) be two livelock-free RTSs with Σ^A_I = Σ^B_O and Σ^A_O = Σ^B_I. If A and B are input-proper and A ≈ B, then A and B are I/O compatible.
Proof. See appendix.

When considering a system A and some I/O compatible system B, any transformation of B preserving both input-properness and observational equivalence will lead to another I/O compatible system:

Theorem 6. Let A = (S^A, Σ^A, T^A, s^A_in), B = (S^B, Σ^B, T^B, s^B_in) and C = (S^C, Σ^C, T^C, s^C_in) be three RTSs. If A and B are I/O compatible, B ≈ C, and C is input-proper, then A and C are I/O compatible.
Fig. 3. Two I/O compatible systems that are not input-proper.
Proof. See appendix.

Figure 2 shows an example of application of Theorem 6. The transformation of X which leads to X' preserves both observational equivalence and input-properness, and therefore X and X' can safely interact.
Finally, it must be noted that I/O compatibility does not require input-properness, as shown in Fig. 3. This occurs when the non-input-proper situations are not reachable by the interaction of the two systems.
6 Application to the Synthesis of Asynchronous Circuits
Synthesis is the process of transforming a model in such a way that the observable behavior is preserved and the final model satisfies a set of implementability properties. This section presents a simple synthesis example in the area of asynchronous circuits modeled with Petri nets. I/O compatibility is the property we want to preserve across all transformations from the specification. A good survey on Petri net theory can be found in [15]. A kit of synthesis rules is presented that is valid for deterministic free-choice live and safe Petri nets (FCLSPN) [6]. Under certain conditions, the rules in the kit preserve I/O compatibility. Formal definitions and proofs can be found in [4]. Section 6.2 presents a simple example that shows the usefulness of the transformations.

6.1 I/O Compatible Petri Net Transformations
Three rules are presented for modifying the structure of a Petri net. The rule φr is used for serializing two concurrent transitions. It was first defined in [2]. Here a simplified version is presented. Rule φi does the opposite: it increases the concurrency between two ordered transitions. φi can be obtained as a combination of the ones appearing in [15]. Finally, rule φe hides a transition. It was first presented in [11]. All three rules preserve the liveness, safeness and free-choiceness of the Petri net. In each rule, the conditions for preserving I/O compatibility are also described.
Fig. 4. Kit of Petri net transformations: (φr) concurrency reduction, (φi) increase of concurrency, (φe) transition elimination.
Rule φr. The purpose of the rule φr is to eliminate the concurrency between two transitions of the Petri net. This is done by inserting a place that connects the two transitions, ordering their firing. Figure 4 (top left) presents an example of concurrency reduction between transitions ti and tj. Rule φr preserves I/O compatibility when neither ti nor tj is a transition labeled with an input event.
Rule φi. Inversely to rule φr, rule φi removes the causality relation between two ordered transitions, making them concurrent. Figure 4 (top right) presents an example of increase of concurrency between transitions ti and tj. Rule φi preserves I/O compatibility when: 1) either ti or tj represents a transition of an internal event, and 2) no input-properness violations are introduced by the transformation.
Rule φe. The rule φe eliminates a transition from the Petri net. Figure 4 (bottom) presents an example of elimination of transition ε. Rule φe preserves I/O compatibility when ε represents an internal event.
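As a concrete illustration of rule φr only, the following sketch (our own, not the authors' code; it omits the structural preconditions under which the rule preserves liveness, safeness and free-choiceness) adds one fresh place between the two transitions, guarded by the input-event side condition stated above. The net encoding and all names are assumptions.

def reduce_concurrency(net, ti, tj, input_events):
    # net = {"places": set, "transitions": set, "flow": set of (src, dst) arcs,
    #        "label": dict mapping each transition to its event}.
    # Serialize ti before tj by adding a fresh place from ti to tj, provided that
    # neither transition is labeled with an input event (the condition stated above).
    if net["label"][ti] in input_events or net["label"][tj] in input_events:
        raise ValueError("phi_r is only applied when neither transition is input-labeled")
    p_new = "p_" + str(ti) + "_" + str(tj)   # fresh place ordering the two firings
    return {
        "places": net["places"] | {p_new},
        "transitions": set(net["transitions"]),
        "flow": net["flow"] | {(ti, p_new), (p_new, tj)},
        "label": dict(net["label"]),
    }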
6.2 Synthesis of a Simple Circuit
Fig. 5. (a) Mirrored implementation of an asynchronous circuit, (b) valid implementation with concurrency reduction.

Figures 5(a-b) depict the example. The models used to describe behavior are marked graphs, a subclass of Petri nets with no choice places, in which events
represent rising (+) or falling (-) transitions of digital signals. The goal is to synthesize a circuit that can have a correct dialogue with the environment. We will assume that the components of the circuit have arbitrary delays. Likewise, the environment may take any arbitrary delay to produce any enabled output event. Let us first have a look at Fig. 5(a). The marked graph in the environment can be considered as a specification of a circuit. The underlined transitions denote input events. Thus, an input event of the environment must have a correspondence with an output event of the system, and vice versa. The behavior denoted by this specification can be informally described as follows: In the initial state, the environment will produce the event x+. After that, the environment will be able to accept the events y+ and z+ concurrently from the system. After the arrival of z+, the environment will produce x−, that can occur concurrently with y+. Next, it will wait for the system to sequentially produce z− and y−, thus leading the environment back to the initial state. The circuit shown in Fig. 5(a) behaves as specified by the adjacent marked graph. In this case, the behavior of the system is merely a mirror of the behavior of the environment. For this reason, the dialogue between both is correct. Let us analyze now the system in Fig. 5(b). The marked graph in the system part has been obtained by reducing concurrency between events y+ and z+, from the marked graph of Fig. 5(a). Still, the system can maintain a correct dialogue, since the environment is able to accept more behaviors than the ones produced by the system, i.e. the transformation performed preserves I/O compatibility. We can observe that, even though the behavior is less concurrent, the implementation is simpler.
7 Related Work

7.1 Conformation
The notion of conformation was defined in [7], where the model used for specifying circuits is a trace structure. Conformation models the fact that a specification is correctly realized by a given implementation. A complete trace structure is a four-tuple containing the set of input signals (I), the set of output signals (O), the set of traces leading to a success (S) and the set of traces leading to a failure (F), with S, F ⊆ (I ∪ O)∞. A complete trace structure models complete executions of a circuit, which makes it possible to express liveness properties. Given two complete trace structures T and T', T' conforms to T (T' ≤ T) if the composition of T' and the mirror of T is failure-free (i.e., the set of failures of the resulting trace structure is empty).
I/O compatibility can be reformulated to define a concept similar to conformation: for a specification A, the system A represents a model of the environment with which a possible implementation B must correctly interact [7]. We call this relation I/O preserving realization:

Definition 10 (I/O Preserving Realization). Let A and B be two RTSs, with A representing the specification of a reactive system. B realizes A (A |= B) if A and B are I/O compatible.

I/O preserving realization inherits the liveness property from I/O compatibility: if no deadlocks exist in the interaction between the specification and its environment, then the same holds for any implementation that realizes it.

7.2 Other Relations
I/O automata [12] are a model similar to RTSs. In fact, any RTS can be expressed as an I/O automaton by including a failure state that is the sink of transitions labeled with the input events not enabled at each state. In [12], a notion of automata satisfaction is presented, expressing when an I/O automata specification is correctly implemented by another I/O automaton. The main difference between their satisfaction notion and our realization notion is that we guarantee the absence of deadlock situations in the dialogue between the system and its environment. Moreover, the fact that systems are assumed to be livelock-free allows a local definition of I/O compatibility, in contrast to the trace-based definition in I/O automata. I/O compatibility is also related to other equivalences like testing equivalence [5], built into CIRCAL [8]. In the area of asynchronous systems, several authors have defined different relations to model the concepts of refinement and realization [3,18,16,17,10]. Among them, we emphasize the one proposed by Brzozowski and Seger [3]. They introduced the concept of input-properness and defined a realization notion stronger than I/O compatibility, which requires language equivalence. In particular, the following theorem can easily be proved.

Theorem 7. Let A, B be two livelock-free RTSs such that A realizes B under the conditions defined in [3]. Then, A |= B.
Finally, Verhoeff proposed the XDI refinement for delay-insensitive systems. This type of refinement assumes that the dialogue between two systems is produced by introducing an arbitrary delay in the communication, i.e., an event is received some time later than it is produced. Analogously to [7], the expressive power of the XDI model makes it possible to include progress concerns in the model. Unlike the RTS model, the XDI model cannot express internal progress (only input/output events are allowed in the model).
8 Conclusions
The theory presented in this paper is only the starting point to support synthesis frameworks that require a kit of transformations that preserve a correct interaction with the environment. Transformations such as insertion of internal events, reduction/increase of concurrency and so on, are crucial for the synthesis of asynchronous circuits or embedded software, in which concurrent models, e.g. Petri nets, are used to specify the behavior of the system. Further research is needed to extend the results of this work and derive necessary and sufficient conditions for the preservation of I/O compatibility.
References 1. A. Arnold. Finite Transition Systems. Prentice Hall, 1994. 2. G. Berthelot. Checking Properties of Nets Using Transformations. In G. Rozenberg, editor, Advances in Petri Nets 1985, volume 222 of Lecture Notes in Computer Science, pages 19–40. Springer-Verlag, 1986. 3. Janusz A. Brzozowski and Carl-Johan H. Seger. Asynchronous Circuits. SpringerVerlag, 1995. 4. J. Carmona, J. Cortadella, and E. Pastor. Synthesis of reactive systems: application to asynchronous circuit design. In J. Cortadella, A. Yakovlev, and G. Rozenberg, editors, Advances in Concurrency and Hardware Design (ACHD). Springer-Verlag, 2002. (To appear). Available at http://www.lsi.upc.es/˜jcarmona/achd02.ps.gz. 5. R. de Nicola and M. C. B. Hennessy. Testing Equivalences for Processes. Theoretical Computer Science, 34(1-2):83–133, November 1984. 6. J. Desel and J. Esparza. Free Choice Petri Nets. Cambridge University Press, Cambridge, Great Britain, 1995. 7. David L. Dill. Trace Theory for Automatic Hierarchical Verification of SpeedIndependent Circuits. ACM Distinguished Dissertations. MIT Press, 1989. 8. G.J. Milne. CIRCAL: A calculus for circuit descriptions. Integration, the VLSI Journal, 1(2–3):121–160, October 1983. 9. A. Jerraya. Hardware-software codesign. IEEE Design & Test of Computers, 17:92–99, March 2000. 10. Mark B. Josephs. A state-based approach to communicating processes. Distributed Computing, 3:9–18, 1988. 11. A. V. Kovalyov. On complete reducibility of some classes of Petri nets. In Proceedings of the 11th International Conference on Applications and Theory of Petri Nets, pages 352–366, Paris, June 1990.
12. Nancy A. Lynch and Mark R. Tuttle. An introduction to input/output automata. In CWI-Quarterly, volume 2, pages 219–246, Centrum voor Wiskunde en Informatica, Amsterdam, The Netherlands, September 1989. 13. R. Milner. A Calculus for Communicating Processes, volume 92 of Lecture Notes in Computer Science. Springer Verlag, 1980. 14. Robin Milner. Communication and Concurrency. Prentice-Hall, 1989. 15. Tadao Murata. Petri nets: Properties, analysis and applications. Proceedings of the IEEE, 77(4):541–574, April 1989. 16. Radu Negulescu. Process Spaces and Formal Verification of Asynchronous Circuits. PhD thesis, Department of Computer Science, University of Waterloo, Waterloo, Ontario, Canada, August 1998. 17. Tom Verhoeff. Analyzing specifications for delay-insensitive circuits. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 172–183, 1998. 18. M. Yoeli and A. Ginzburg. Lotos/cadp-based verification of asynchronous circuits. Report CS-2001-09-2001, Technion - Computer Science Department, September 2001.
A Proofs of Section 5
Proof of Theorem 5.
Proof. Let R be the relation induced by the observational equivalence between A and B. We will prove that R is also an I/O compatibility relation between A and B. R must fulfill the conditions of the I/O compatibility relation:
– Condition 1: s^A_in R s^B_in by Definition 9.
– Condition 2(a): let s1 R s1', and assume s1 −e→ s2, with e ∈ Σ^A_O. Figure 6(a) depicts the situation. The observational equivalence of s1 and s1' implies that a trace σ of internal events exists in s1' enabling e. The event e is an input event in B, and therefore the input-properness of B ensures that in every state s' of σ, En(s', e) holds. In particular, it also holds in the first state and, thus, En(s1', e). The definition of R ensures that every s2' such that s1' −e→ s2' is related with s2 by R.
– Condition 3(a): let s1 R s1' and assume s1 −e→ s2, with e ∈ Σ^A_INT. The definition of R implies that s2 R s1'.
– Condition 4(a): let s1 R s1', and suppose {e | En(s1, e)} ⊆ Σ^A_I. Figure 6(b) depicts the situation. Let e be one of the input events enabled in s1. The observational equivalence between s1 and s1' requires that a sequence σ of internal events exists enabling e starting in s1', and given that e is not an input in B, this implies {e | En(s1', e)} ⊄ Σ^B_I.
An identical reasoning can be applied in the symmetric cases (conditions 2(b), 3(b) and 4(b)). ✷
Fig. 6. Conditions 2(a) and 4(a) from the proof of Theorem 5.
Fig. 7. Conditions 2(a) and 2(b) from the proof of Theorem 6.
Proof of Theorem 6.
Proof. Let R' be the relation between A and B, and ≈ the observational equivalence relation between states of B and C. Define the relation R as:

∀s ∈ S^A, s' ∈ S^B, s'' ∈ S^C : s R' s' ∧ s' ≈ s'' ⇔ (s, s'') ∈ R
The conditions that R must satisfy are the ones of Definition 8. Remember B A = ΣIA and ΣIB = ΣO . Moreover, relation B ≈ C that A B implies that ΣO B C implies that ΣOBS = ΣOBS . – Condition 1: the initial states are related in R by definition. e A . Figure 7(a) – Condition 2(a): let s1 Rs1 , and suppose s1 → s2 with e ∈ ΣO depicts the situation. Given that s1 R s1 , e is enabled in s1 and for each e s2 such that s1 → s2 , s2 R s2 . The observational equivalence of s1 and s1 , together with the fact that C is input-proper implies that e is also enabled in s1 (identical reasoning of condition 2(a) in Theorem 5), and the definition e of ≈ implies that each s2 such that s1 → s2 must be related in ≈ with s2 . e Then each s2 such that s1 → s2 is related by R with s2 . e B – Condition 2(b): let s1 Rs1 , and suppose s1 → s2 with e ∈ ΣO . Figure 7(b) depicts the situation. The observational equivalence of s1 and s1 implies that there is a sequence σ of internal events starting in s1 and enabling e, and
every state of σ is observational equivalent to s1 . Moreover, every state of σ is also related to s1 by the condition 3(b) of R . In particular, s1 is related e by R with the state s of σ s.t. s → s2 ; applying Condition 2(b) of R , e En(s1 , e) holds and for each e s.t. s1 → s2 , s2 R s2 . The definition of R and ≈ induces that each such s2 is related with s2 by R. e A Condition 3(a): let s1 Rs1 , and suppose s1 → s2 with e ∈ ΣIN T . Then Condition 3(a) of R ensures s2 R s1 and then applying the definition of R implies s2 Rs1 . e C Condition 3(b): let s1 Rs1 , and suppose s1 → s2 with e ∈ ΣIN T . Then s1 ≈ s2 , and then s1 Rs2 . Condition 4(a): let s1 Rs1 , and suppose {e|En(s1 , e)} ⊆ ΣIA . Condition 4(a) a of R ensures that {e|En(s1 , e)} ΣIB : let a be an event such that s1 → s2 , B with a ∈ / ΣIB . If a ∈ ΣO , the related pair s1 ≈ s1 ensures that in s1 there is a feasible sequence of internal events (which can be empty) enabling a, B and therefore {e|En(s1 , e)} ΣIC . If a ∈ ΣIN T , applying Condition 3(b) of R and the definition of ≈, s1 R s2 and s1 ≈ s1 is obtained, respectively. The same reasoning applied to s1 , s1 and s1 can now be applied to s1 , s2 and s1 . Given that B is livelock-free, the sequence of internal events starting in s1 and passing through s2 must end in a state s where a observable event a is enabled. State s is also related by R with s1 , and by ≈ with s1 (applying inductively the same reasoning applied to s2 ). Event a belongs to B ΣO because otherwise a violation of Condition 2(b) in R arise. The previous B , enabled in s1 ) can be applied to s . case (a ∈ ΣO Condition 4(b): let s1 Rs1 , and suppose {e|En(s1 , e)} ⊆ ΣIC . Let a such a B , then a contradiction arise because s1 ≈ s1 and that s1 → s2 . If a ∈ ΣO C {e|En(s1 , e)} ⊆ ΣI . If a ∈ ΣIB , then identical conditions make En(s1 , a) to B hold. If a ∈ ΣIN T , then Conditions 3(a) of R and ≈ ensure that s1 R s2 and s2 ≈ s1 , and the same reasoning of s1 , s1 and s1 can be applied to s1 , s1 and s2 (but not infinite times, because B is livelock-free). Therefore a feasible sequence of internal events (which can be empty) exist from s1 reaching a state s such that {e|En(s , e)} ⊆ ΣIC , with s1 R s and s ≈ s1 . Condition 4(b) of R ensures that {e|En(s1 , e)} ΣIA . ✷
Smart Play-out of Behavioral Requirements David Harel, Hillel Kugler, Rami Marelly, and Amir Pnueli The Weizmann Institute of Science, Rehovot, Israel {harel,kugler,rami,amir}@wisdom.weizmann.ac.il
Abstract. We describe a methodology for executing scenario-based requirements of reactive systems, focusing on “playing-out” the behavior using formal verification techniques for driving the execution. The methodology is implemented in full in our play-engine tool1 . The approach appears to be useful in many stages in the development of reactive systems, and might also pave the way to systems that are constructed directly from their requirements, without the need for intra-object or intra-component modeling or coding.
1 Introduction
In the last few years, formal specification and verification techniques are beginning to be applied to the development of complex reactive systems. Major obstacles that still prevent even wider usage of such methods include the fact that errors are found relatively late in the development process and that high expertise is required to correctly capture the properties to be verified. Recently there has been a growing interest in the verification of software based reactive systems, especially given the success in applying verification techniques to hardware. Due to the size and complexity of such systems, it is desirable to understand all the system requirements, and to make sure they are consistent, before moving to the implementation phase. In classic verification, a model is first constructed and then verified against well defined requirements, whereas one of the main points of this paper is that verification techniques can be beneficially applied to the requirements too. In this paper we suggest a methodology that addresses these obstacles. As our requirements language we use the live sequence charts (LSCs) of [7], a visual formalism based on specifying the various kinds of scenarios of the system — including those that are mandatory, those that are allowed but not mandatory, and those that are forbidden. LSCs thus extend classical message sequence charts, which do not make such distinctions. The Unified Modeling Language (UML) [33], which is the leading standard for specifying object oriented software systems, uses a variant of classical message sequence charts (MSCs) [21], called sequence diagrams, which can be viewed as a simple existential variant of LSCs. A new approach for capturing behavioral requirements (proposed briefly in [12]) has been developed recently, and is described in detail in [14]. In it the user plays
This research was supported in part by the John von Neumann Minerva Center for the Verification of Reactive Systems. 1 Short animations demonstrating some capabilities of the play-engine tool are available on the web: http://www.wisdom.weizmann.ac.il/∼rami/PlayEngine
in the behavior using a graphical interface (GUI) of the target system or an abstract version thereof. The formal requirements in the language of LSCs are then automatically generated from the play-in by a tool called the play-engine, without a need to explicitly prepare the LSCs or to write complex formulas in, e.g., temporal logic. Complementary to the play-in process is play-out [14]. In the play-out phase the user plays the GUI application as he/she would have done when executing a system model (or, for that matter, the final system) but limiting him/herself to “end-user” and external environment actions only. While doing so, the play-engine keeps track of the actions taken, and causes other actions and events to occur as dictated by the universal charts in the specification (these are charts describing mandatory behavior), thus giving the effect of working with a fully operational system or an executable model. It is noteworthy that no code needs to be written in order to play-out the requirements, nor does one have to prepare a conventional intra-object system model, as is required in most system development methodologies (e.g., using statecharts or some other language for describing the full behavior of each object, as in the UML, for example). We should also emphasize that the behavior played out is up to the user, and need not reflect the behavior as it was played in; the user is not merely tracing scenarios, but is executing the requirements freely, as he/she sees fit. This idea appears to have potential in many stages of system development [14]. In particular, the ability to execute such inter-object requirements without building a system model or writing code could lead to a totally new way of building many kinds of reactive systems. The play-engine would become a sort of “universal reactive machine", which would run requirements that were played in via a GUI, or written directly as LSCs, timing diagrams or formulas in an appropriate temporal logic. You provide the global, declarative, inter-object ways you want your system to behave (or to not behave), and the engine runs the system directly from them. It works a little like a perfect citizen, who does absolutely nothing unless it is called for by the grand “book of rules", and unless it doesn’t contradict anything else written in the book. Thus, the engine does only those things it is required to do, while avoiding those it is forbidden to do. This is a minimalistic, but completely safe way for a system to behave exactly according to the requirements, and to make sure that the system doesn’t just sit around doing nothing, it is up to the requirement engineers to make sure that any liveness properties they want the system to satisfy should be incorporated into the requirements. Play-out is actually an iterative process, where after each step taken by the user, the play-engine computes a superstep, which is a sequence of events carried out by the system as response to the event input by the user. However, the original play-out process of [14] is rather naive, for several reasons. For example, there can be many sequences of events possible as a response to a user event, and some of these may not constitute a “correct" superstep. We consider a superstep to be correct if when it is executed no active universal chart is violated. 
By acting blindly by the “book" of requirements, reacting to a user-generated event with the first action it encounters as a possible reaction to that event, the naive play-out process could very well follow a sequence of events that eventually causes violation, although another sequence could have been chosen that would have completed successfully. The multiplicity of possible sequences of reactions to a user event is due to the fact that a declarative, inter-object behavior language, such as LSCs,
enables formulating high level requirements in pieces (e.g., scenario fragments), leaving open details that may depend on the implementation. The partial order semantics among events in each chart and the ability to separate scenarios in different charts without having to say explicitly how they should be composed are very useful in early requirement stages, but can cause under-specification and nondeterminism when one attempts to execute them. The work we describe here, which we term smart play-out, focuses on executing the behavioral requirements with the aid of formal analysis methods, mainly modelchecking. Our smart play-out process uses model-checking to find a “correct" superstep if one exists, or proves that such a superstep does not exist. Model-checking is applied anew at the occurrence of each user event to examine the different potential supersteps and to find a correct sequence of system reactions if there is one. Model-checking thus drives the execution. Another way of putting it is that the “smartness" in smart play-out works as an aid, helping the objects in the system cooperate in fulfilling the requirements. Experimental results we have obtained using a prototype implementation of smart playout are very promising. Smart play-out illustrates the power of putting formal verification methods to use in early stages of the development process, with the potential of impacting the development of reactive systems. We believe that additional verification tools and technologies can be used to improve the ability of the play-out framework to handle large systems efficiently. And, as mentioned above, we also believe that for certain kinds of systems the playout methodology, enhanced by formal verification techniques, could serve as the final implementation too, with the play-out being all that is needed for running the system itself. The paper is organized as follows. Section 2 gives a brief overview of the LSC language using a cellular phone system which serves as a running example throughout the paper. Section 3 discusses the Play-in/Play-out approach focusing on play-out and explaining the need for "Smart Play-Out". Section 4 shows examples from the cellular phone system illustrating where smart play-out is helpful. Section 5 gives a high level description of the smart play-out approach and how model-checking is used to achieve it, while section 6 provides a formal description of the translation that produces the input to the model-checker. Section 7 describes experimental results obtained on the cellular phone system using our prototype tool implementation of smart play-out. We conclude with a discussion of related work in Section 8.
2 LSCs
Live sequence charts (LSCs) [7] have two types of charts: universal (annotated by a solid borderline) and existential (annotated by a dashed borderline). Universal charts are used to specify restrictions over all possible system runs. A universal chart typically contains a prechart, that specifies the scenario which, if successfully executed, forces the system to satisfy the scenario given in the actual chart body. Existential charts specify sample interactions between the system and its environment, and must be satisfied by at least one system run. They thus do not force the application to behave in a certain way in all cases, but rather state that there is at least one set of circumstances under
which a certain behavior occurs. Existential charts can be used to specify system tests, or simply to illustrate longer (non-restricting) scenarios that provide a broader picture of the behavioral possibilities to which the system gives rise. We will use the cellular phone system to illustrate the main concepts and constructs of the language. In the LSC of Fig. 1, the prechart (top dashed hexagon) contains three
Fig. 1. LSC Sample - Quick Dialing
messages denoting the events of the user clicking the ‘*’ key, then clicking some digit (denoted by X2), and then clicking the SEND button. Following this, in the chart body, the chip sends a message to the memory asking it to retrieve the number stored in cell #X2. After this message comes an assignment in which the variable Num is assigned the value of the Number property of the memory. Assignments are internal to a chart and were proposed in [14] as an extension to LSCs. Using an assignment, the user may save values of the properties of objects, or of functions applied to variables holding such values. The assigned-to variable stores the value for later use in the LSC. It is important to note that the assignment’s variable is local to the containing chart and can be used for the specification of that chart only, as opposed to the system’s state variables, which may be used in several charts. After the assignment comes a loop construct. This is a bounded loop, denoted by a constant number (3 in this case), which means that it is performed at most that number of times. It can be exited when a cold condition inside it is violated, as described shortly2 . Inside the loop of Fig. 1, the chip tries (at most three times) to call the number Num. After sending the message to the environment, the chip waits for a signal to come back from it. 2
[14] defines also unbounded loops and dynamic loops, which we will not describe here
The loop ends with a cold condition that requires Signal to be Busy. If a cold condition is true, the chart progresses to the location that immediately follows the condition, whereas if it is false, the surrounding (sub)chart is exited. A hot condition, on the other hand, must always be true, otherwise the requirements are violated and the system aborts. In Fig. 1, the chip will continue sending messages to the environment as long as the received signal is Busy, but no more than three times. Note how the use of variables and assignments in the chart makes this scenario a generic one, standing for many different specific scenarios. Hot conditions can be used for many other things too. For example, a forbidden scenario can be specified by putting it into a prechart with the main chart being a hot false condition. In general, we consider open reactive systems, and thus distinguish between the system and its external environment. As can be seen in Fig. 1 the system’s environment is also composed of a user operating the system (denoted by the like of a person) and an abstract entity representing all other elements interacting with the system. The user interacts with the system directly by operating its user interface, while the environment interacts with the system in other ways (e.g., communicating over channels, controlling environmental settings etc.). The advantage in using LSC’s is that it is an extension of sequence chart formalisms that are widely accepted and used by engineers, but is far more expressive than MSCs or UML sequence diagram. LSC’s can be viewed as a visual front-end to a somewhat restricted version of temporal logic, with mechanisms enabling convenient usage of the language. The semantics of a restricted subset of LSC’s in terms of temporal logic are given in [13], and a more complete treatment is in preparation. For a discussion on the advantages of LSCs as a requirements specification language see, e.g., [7,14].
3 The Play-in/Play-out Approach

The play-in/play-out approach is described in detail in [14]. Recognizing that [14] has not yet been published, we give a brief overview here, sufficient for the purposes of the present paper. As its name states, the approach consists of two complementary aspects. Play-in is a method for capturing behavioral requirements (e.g., following the preparation of use cases) in an intuitive way, using a graphical user interface of the target system or an abstract version thereof. The output of this process is a formal specification in the language of LSCs [7]. Play-out is the process of testing the requirements by executing them directly. The input to the play-out process is a formal LSC specification. Although it is much more effective to play out requirements that were played in, this is not obligatory, and the LSC specification can be produced in any desired way. It is worth noting that the behavior described in Fig. 1 was played in using a GUI of a cellular phone and did not require any drawing or editing of elements in the generated chart.
Play-out is the process of testing the behavior of the system by providing user and environment actions in any order and checking the system's ongoing responses. The play-out process calls for the play-engine to monitor the applicable precharts of all universal charts, and if successfully completed to then execute their bodies. By executing the events
in these charts and causing the GUI application to reflect the effect of these events on the system objects, the user is provided with a simulation of an executable application. Note that in order to play out scenarios, the user does not need to know anything about LSCs or even about the use cases and requirements entered so far. All he/she has to do is to operate the GUI application as if it were a final system and check whether it reacts according to his/her expectations. Thus, by playing out scenarios the user actually tests the behavior of the specified system directly from the requirements — scenarios and forbidden scenarios as well as other constraints — without the need to prepare statecharts, to write or generate code, or to provide any other detailed intra-object behavioral specification. This process is simple enough for many kinds of end-users and domain experts, and can greatly increase the chance of finding errors early on. Note that a single universal chart may become activated (i.e., its prechart is successfully completed) several times during a system run. Some of these activations might overlap, resulting in a situation where there are several copies of the same chart active simultaneously. In order to correctly identify the activation of universal charts, there is also a need to have several copies of the prechart (each representing a different tracking status) monitored at the same time. A number of things happen during play-out. Charts are opened whenever they are activated and are closed when they are violated or when they terminate. Each displayed chart shows a “cut" (a kind of rectilinear “slice"), denoting the current location of each instance. The currently executed event is highlighted in the relevant LSCs. The playengine interacts with the GUI application, causing it to reflect the change in the GUI, as prescribed by the executed event. The user may examine values of assignments and conditions by moving the mouse over them in the chart. Whenever relevant, the effects show up in the GUI. Play-out sessions can also be recorded and re-played later on. So much for the universal charts, which drive the behavior and are activated when needed. In contrast, existential charts can be used as system tests or as examples of required interactions. Rather than serving to drive the play-out, existential charts are monitored, meaning that the play-engine simply tracks the events in the chart as they occur. When (and if) the chart reaches its end, it is highlighted and the user is informed that it was successfully traced to completion. These runs can be recorded as well, to provide testimonies (that can be re-played) for fulfilling the promises made by existential LSCs. We thus run the system in such a way as to seek satisfaction of existential promises while making sure we satisfy all universal promises. The premise of our present work is that the play-out algorithms described in [14] are somewhat naive. For example, if there are several ways to linearize the partial order of events in an LSC, the engine might choose one that leads to a contradiction with another LSC. This, depending on the hot or cold nature of active elements, could lead to abortion of the entire run. While such an occurrence is indeed a result of what the user played in, and is a legal execution, we might want the engine to help avoid it. If in this example there is some “correct" order (or several) that manages to run to completion successfully, we would like to find it and guide the play-out accordingly.
4 Being Smart Helps: Examples
Consider the two charts LSC1 and LSC2 appearing in Fig. 2 and the following system reaction performed in response to the user clicking on the 'PWR' button:
ChangeBackground(Green), ChangeBackground(Red), Open
This superstep satisfies LSC1 but LSC2 remains active with the condition DisplayBackground = Green false, because when it was activated by the Open event the background was already red. Notice that "locally" each event seems to be good, since it does not cause violation and causes the execution to progress. However, "globally" these system moves do not satisfy the second LSC. In contrast, the following system reaction satisfies both LSCs:
ChangeBackground(Green), Open, ChangeBackground(Red)
After changing the color to Green the system opens the antenna, thus causing the activation of LSC2. The Display color is Green, so the condition holds and LSC2 is satisfied. Then the color is changed to Red and LSC1 is satisfied. Smart play-out is designed to find a correct superstep in such cases.

Fig. 2. Smart play-out helps

Similarly, consider the two charts State First and Background First in Fig. 3. When the user opens the cover both charts are activated. However, there is no way to satisfy them both since they require the events ChangeBackground(Green) and SetState(Time) to occur in contradicting order. While this is a very simple example, such contradictions can be a lot more subtle, arising as a result of the interaction between several charts. In large specifications this can be very hard to analyze manually. The smart play-out framework would prove that in such a case no correct superstep exists, which by the semantics of LSCs means that the requirements are inconsistent; see [13].

Fig. 3. Inconsistent LSCs

As discussed earlier, existential LSCs may be used to specify system tests. Smart play-out can then be used to find a trace that satisfies the chart without violating universal
charts on the way. Fig. 4 shows a test in which user and external environment actions are performed and expected system responses are described using conditions. In this chart, the user opens the cover and enters the number 911. In response, the display is expected to show the dialed number. Next, the user clicks the ‘SEND’ button and the phone’s speaker is expected to ring. Finally, when a signal from the environment indicating the accepting of the call (denoted by the “ACK" reserved word) is received by the phone’s chip, the speaker turns silent.
Fig. 4. Using existential charts to specify system tests
5 Smart Play-out: The General Approach
The approach we use is to formulate the play-out task as a verification problem, and to use a counterexample provided by model-checking as the desired superstep. The system on which we perform model-checking is constructed according to the universal charts in the specification. The transition relation is defined so that it allows progress of active universal charts but prevents any violations. The system is initialized to reflect the status of the application just after the last external event occurred, including the current values
of object properties, information on the universal charts that were activated as a result of the most recent external events, and the progress in all precharts. The model-checker is then given a property claiming that always at least one of the universal charts is active. In order to falsify the property, the model-checker searches for a run in which eventually none of the universal charts is active; i.e., all active universal charts completed successfully, and by the definition of the transition relation no violations occurred. Such a counter-example is exactly the desired superstep. If the model-checker verifies the property then no correct superstep exists. The next section provides details of how to construct the input to the model checker. It is important to note that smart play-out (at least as it stands today) does not backtrack over supersteps. Thus, we may get to a situation where no correct super-step exists due to moves the system made in previous super-steps, which could perhaps have been done differently. This demonstrates the difference between smart play-out, which looks one super-step ahead, and full synthesis, which performs a complete analysis. Another important thing that we have incorporated into the smart play-out is to find a way to satisfy an entire existential chart (e.g. Fig. 4). Here we cannot limit ourselves to a single superstep, since the chart under scrutiny can contain external events, each of which triggers a superstep of the system. Nevertheless, the above formulation as a modelchecking problem can be used with slight modifications for this task too. Also, when trying to satisfy an existential LSC, we take the approach that assumes the cooperation of the environment. We should add that the method for satisfying existential LSCs can also be used to verify safety properties that take the form of an assertion on the system state. This is done by putting the property’s negation in an existential chart and verifying that it cannot be satisfied.
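Before turning to the translation itself, the following toy sketch (our own illustration; the actual implementation hands the translated model and the property to a model checker) makes the formulation concrete: starting from the configuration reached after the last external event, search along non-violating system moves for a state in which no universal chart is active. The names "system_moves" and "charts_active", and the explicit-state search itself, are assumptions made for the example; the model checker performs the same search symbolically, and its counterexample to "always some universal chart is active" is read off as the superstep.

from collections import deque

def find_superstep(initial_config, system_moves, charts_active, max_depth=50):
    # Breadth-first search for a sequence of system events after which no universal
    # chart is active; configurations are assumed hashable.
    queue, visited = deque([(initial_config, [])]), {initial_config}
    while queue:
        config, events = queue.popleft()
        if not charts_active(config):      # all activated universal charts completed
            return events                  # this event sequence is the desired superstep
        if len(events) >= max_depth:
            continue
        for event, nxt in system_moves(config):   # only non-violating moves are offered
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, events + [event]))
    return None                            # no correct superstep found within the bound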
6 The Translation

In the original paper defining LSCs [7] and in later work that uses LSCs for testing reactive systems [22], the semantics of LSCs is defined for a single chart. In the first one, a programmatic style is used and in the second, an automaton having legal cuts3 as states is constructed. In our work, the main focus is to find a correct behavior of the system according to several charts acting together. To do that, we construct a transition system which has one process for each actual object. A state in this system indicates the currently active charts and the location of each object in these charts. The transition relation restricts the transitions of each process only to moves that are allowed by all currently active charts. Note that our translation does not explicitly construct the cuts for each chart (a construction which by itself causes an exponential growth in the size of the initial representation).
We now provide some of the details on how to translate a play-out problem into a model-checking problem. An LSC specification LS consists of a set of charts M, where each chart m ∈ M is existential or universal. We denote by pch(m) the prechart of chart m. Assume the set

3 A cut is a configuration indicating the location of each object along its instance line.
of universal charts in M is M^U = {m1, m2, ..., mt}, and the objects participating in the specification are O = {O1, ..., On}. We define a system with the following variables:
– act_{mi} determines if universal chart mi is active. It gets value 1 when mi is active and 0 otherwise.
– msg^s_{Oj→Ok} denotes the sending of message msg from object Oj to object Ok. The value is set to 1 at the occurrence of the send and is changed to 0 at the next state.
– msg^r_{Oj→Ok} denotes the receipt by object Ok of message msg sent by object Oj. Similarly, the value is 1 at the occurrence of the receive and 0 otherwise.
– l_{mi,Oj} denotes the location of object Oj in chart mi, ranging over 0 · · · l^max, where l^max is the last location of Oj in mi.
– l_{pch(mi),Oj} denotes the location of object Oj in the prechart of mi, ranging over 0 · · · l^max, where l^max is the last location of Oj in pch(mi).
Throughout this paper, we use the asynchronous mode, in which a send and a receive are separate events, but we support the synchronous mode too. We denote by f(l) the event associated with location l, and use the convention that primed variables denote the value of a variable in the next state while unprimed variables relate to the current state. We will now show the definition of the transition relation as it is affected by the different features of the LSC language.
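As a concrete (and entirely illustrative) rendering of this state vector, one might bundle the variables as follows; the type names are ours, not the paper's.

from dataclasses import dataclass
from typing import Dict, Tuple

ChartId, ObjectId, MsgId = str, str, str

@dataclass
class PlayoutState:
    act: Dict[ChartId, int]                        # act_mi: is universal chart mi active?
    msg_s: Dict[MsgId, int]                        # msg^s variables: a send occurring now
    msg_r: Dict[MsgId, int]                        # msg^r variables: a receive occurring now
    loc: Dict[Tuple[ChartId, ObjectId], int]       # l_{mi,Oj}: location in the main chart
    loc_pch: Dict[Tuple[ChartId, ObjectId], int]   # l_{pch(mi),Oj}: location in the prechart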
6.1 Messages
We first define the transition relation for the location variable when the location corresponds to the sending of a message:

l'_{mi,Oj} = l       if l_{mi,Oj} = l − 1 ∧ msg'^s_{Oj→Ok} = 1
l'_{mi,Oj} = l − 1   if l_{mi,Oj} = l − 1 ∧ msg'^s_{Oj→Ok} = 0
Intuitively, if object Oj is at location l − 1 in chart mi, and the next location of Oj corresponds to the sending of message msg from Oj to Ok, then if in the next state the message is sent, the location is advanced; otherwise it remains still. It is important to notice that the event msg^s_{Oj→Ok} may not be allowed to occur at the next state due to some other chart. This is one of the places where the interaction between the different charts becomes important. As for the receipt of events, given that n is the location at which message msg is sent from object Oj to object Ok, we define the transition relation as:

l'_{mi,Ok} = l       if l_{mi,Ok} = l − 1 ∧ l_{mi,Oj} ≥ n ∧ msg'^r_{Oj→Ok} = 1
l'_{mi,Ok} = l − 1   if l_{mi,Ok} = l − 1 ∧ (l_{mi,Oj} < n ∨ msg'^r_{Oj→Ok} = 0)
If object Ok is at location l − 1 in chart mi , and the next location of Ok corresponds to the receipt of the message msg sent by object Oj , and this message has already been sent , then if in the next state the message is received, the location is advanced; otherwise it remains as is.
We now define the transition relation for the variable determining the occurrence of a send event (the receive case is similar):

msg'^s_{Oj→Ok} = 1 if φ1 ∧ φ2, and 0 otherwise, where

φ1 = ⋁_{mi ∈ M^U, msg^s_{Oj→Ok} ∈ Messages(mi)} (act_{mi} = 1)

φ2 = ⋀_{mi ∈ M^U, msg^s_{Oj→Ok} ∈ Messages(mi)} (act_{mi} = 0 ∨ ψ(mi))

ψ(mi) = ⋁_{lt s.t. f(lt) = msg^s_{Oj→Ok}} (l_{mi,Oj} = lt − 1 ∧ l'_{mi,Oj} = lt)
In order for the event of sending msg from Oj to Ok to occur, we require two conditions to hold, which are expressed by formulas φ1 and φ2, respectively. The first, φ1, states that at least one of the main charts in which this message appears is active. The assumption is that message communication is caused by universal charts that are active and does not occur spontaneously. The second requirement, φ2, states that all active charts must "agree" on the message. For an active chart mi in which msg^s_{Oj→Ok} appears, we require that object Oj progress to a location lt corresponding to this message, as expressed in formula ψ(mi). Formula φ2 states that for all charts mi in which msg^s_{Oj→Ok} appears (that is, msg^s_{Oj→Ok} ∈ Messages(mi)), either the chart is not active or the message can occur (that is, ψ(mi) holds). According to the semantics of LSCs, if a message does not appear in a chart explicitly it is allowed to occur in-between the messages that do appear, without violating the chart. This is reflected in φ2 by the fact that the conjunction is only over the charts in which msg^s_{Oj→Ok} appears.
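A naive executable reading of this rule is sketched below (our illustration; "spec" and "state" and their fields are assumed interfaces, and unlike the paper's ψ we only test that the sender can advance rather than also constraining its next-state location).

def send_may_occur(msg, spec, state):
    # True iff msg^s may be set to 1 in the next state, i.e. phi1 and phi2 both hold.
    relevant = [m for m in spec.universal_charts if msg in spec.messages(m)]
    phi1 = any(state.act[m] == 1 for m in relevant)

    def psi(m):
        sender = spec.sender(msg)
        # Some location labeled with this send is the sender's next location in m.
        return any(state.loc[(m, sender)] == lt - 1
                   for lt in spec.locations_of(m, sender, msg))

    phi2 = all(state.act[m] == 0 or psi(m) for m in relevant)
    return phi1 and phi2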
6.2 Precharts
A prechart of a universal chart describes a scenario which, if completed successfully, forces the scenario described in the main chart to occur. (Fig. 1 has a prechart — the portion enclosed in the dashed hexagon.) The main chart becomes active if all locations of the prechart have reached maximal positions. In play-out it is often the case that a sequence of events in a superstep causes the activation of some additional universal chart, and this chart must now also be completed successfully as part of the super-step. For this purpose precharts are monitored, and locations along instance lines are advanced while messages are being sent and received. The transition relation for a location variable in a prechart is similar to the one defined for locations in the main chart, with one major difference; precharts may be violated. If a message is sent or received while it is not enabled in the prechart, the prechart is “reset" by moving all its instances back to their initial location. This reset action allows for the prechart to start “looking" for another option to be satisfied. In fact, in many cases when the model-checker searches for a “correct" super-step it tries to violate precharts in order not to get into the “obligations" of having to satisfy the corresponding main
charts. When all locations in the prechart reach their maximal positions, they too are reset.4 Formally, if location l_{pch(mi),Oj} = l − 1 and the next location corresponds to a message sending, then its transition relation is given by:

l'_{pch(mi),Oj} = l       if msg'^s_{Oj→Ok} = 1
l'_{pch(mi),Oj} = 0       if msg'^s_{Oj→Ok} = 0 ∧ Φ(mi)
l'_{pch(mi),Oj} = l − 1   otherwise

Φ(mi) = ⋁_{msg^s_{Ox→Oy} ∈ Messages(mi)} Ψ^s(msg^s_{Ox→Oy}) ∨ ⋁_{msg^r_{Ox→Oy} ∈ Messages(mi)} Ψ^r(msg^r_{Ox→Oy}) ∨ ⋀_{Oj ∈ Obj(mi)} (l_{pch(mi),Oj} = l^max_{pch(mi),Oj})

Ψ^s(msg^s_{Ox→Oy}) = 1 if msg'^s_{Ox→Oy} = 1 and Ox is not at a location lx − 1 with f(lx) = msg^s_{Ox→Oy} (i.e., the send occurs while it is not enabled by its sender instance), and 0 otherwise; Ψ^r(msg^r_{Ox→Oy}) is defined symmetrically for the receiving instance Oy.
Ψ^s/Ψ^r checks whether a send/receive event occurred while not enabled by its sender/receiver instance in the chart. φ(mi) checks whether all locations reached their maximal position.

6.3 Activation of Charts

For a universal chart mi, we define the transition relation for act_{mi} as follows:

act'_{mi} = 1          if φ(pch(mi))
act'_{mi} = 0          if φ(mi)
act'_{mi} = act_{mi}   otherwise

φ(mi) = ⋀_{Oj ∈ Obj(mi)} (l_{mi,Oj} = l^max_{mi,Oj})
The main chart mi becomes active when all locations of the prechart reach maximal positions, and it stops being active when all locations of the main chart reach maximal positions.5 In order to identify the activation of a universal chart it is sometimes necessary to maintain several copies of the same prechart, each one being in a different stage of the prechart scenario. A universal chart may also be reactivated before it has completed, causing several copies of the main chart to be active simultaneously. It can be shown that in the absence of unbounded loops, the maximal number of simultaneously active charts and precharts is bounded and can be computed. Actually, we predict that in most practical cases these bounds will be small.6

4 Our current treatment of precharts is still rather preliminary, and there are several issues we plan to consider more fully in the future. They include figuring out whether or not (or when) to use model checking to "help" precharts be successfully completed, and how to deal with loops and conditions in precharts in light of the main goals of smart play-out.
5 When the chart body contains interactions with the user/environment, we cannot guarantee that all maximal positions are reached, because the play-out cannot initiate moves by the environment. We therefore modify the transition relation to set a chart to be inactive when only user/environment events are enabled.
6 This is because in order for the bound to be large there must be a very strong correlation between the messages in the prechart and the main chart, and this is usually not the case.

6.4 Object Properties and Conditions
Although the basic strength of scenario-based languages like LSCs is in showing message communication, the LSC language has the ability to reason about the properties of objects too. Object properties can be referenced in condition constructs, which can be hot or cold. According to the semantics of LSCs, if a cold condition is true the chart progresses to the location that immediately follows the condition, whereas if it is false the surrounding (sub)chart is exited. A hot condition, on the other hand, must always be met, otherwise the requirements are violated and the system aborts. To support this kind of reasoning, we have to update the value of each property as the system runs. More formally, let P^t_{Ok} denote the t-th property of object Ok, defined over a finite domain D. For many of the object properties there are simple rules — defined when the application is being constructed — that relate the value of the property to message communication. Accordingly, suppose that message msg received by Ok from Oj has the effect of changing property P^t of Ok to the value d ∈ D. We then add to the transition relation of process Oj the clause:
\[
(P^t_{O_k})' = d \quad \text{if } msg^r_{O_j \to O_k} = 1
\]
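As a concrete illustration (with hypothetical names, not the tool's code), such a clause amounts to the following update rule:

def next_property(current_value, msg_received, new_value):
    """P^t of O_k becomes d when msg^r fires; otherwise it keeps its old value."""
    return new_value if msg_received else current_value

# example: a phone's "cover" property flips to "open" when the Open message is received
assert next_property("closed", msg_received=True, new_value="open") == "open"
assert next_property("closed", msg_received=False, new_value="open") == "closed"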
In this way, the values of the properties are updated as the objects send and receive messages. Object properties can be referred to in conditions. In fact, we take a condition expression to be a Boolean function over the domains of the object properties, $C : D_1 \times D_2 \times \cdots \times D_r \to \{0, 1\}$, so that a condition can relate to the properties of several objects. Here, the properties appearing in the condition are $P_1, P_2, \ldots, P_r$. A condition affects the transition relation of the location of a participating object. If object $O_j$ is at location $l_j - 1$ and object $O_k$ is at location $l_k - 1$ in chart $m_i$, and if their next locations correspond to a hot condition $C$, we define:

\[
l'_{m_i,O_j} =
\begin{cases}
l_j & \text{if } C(d_j, d_k) = 1 \wedge l_{m_i,O_j} = l_j - 1 \wedge l_{m_i,O_k} = l_k - 1 \\
l_j - 1 & \text{if } l_{m_i,O_j} = l_j - 1 \wedge \bigl(C(d_j, d_k) = 0 \vee l_{m_i,O_k} \neq l_k - 1\bigr)
\end{cases}
\]
Object $O_j$ moves to location $l_j$ if both objects participating in the condition are ready to evaluate the condition expression, being at locations $l_j - 1$ and $l_k - 1$, respectively, and the condition $C$ holds. Here $d_j$ and $d_k$ are the values of properties $P^s_{O_j}$ and $P^t_{O_k}$, respectively. The transition relation thus ensures synchronization of the objects when evaluating the condition and allows progress only if the condition expression holds, thus
6 This is because in order for the bound to be large there must be a very strong correlation between the messages in the prechart and the main chart, and this is usually not the case.
preventing violation of the chart. In this definition, we assumed that we have two objects, Oj and Ok , constrained by the condition, whereas in the general case there could be a single object or several objects. For a cold condition we define:
\[
l'_{m_i,O_j} =
\begin{cases}
l_j & \text{if } C(d_j, d_k) = 1 \wedge l_{m_i,O_j} = l_j - 1 \wedge l_{m_i,O_k} = l_k - 1 \\
l_s & \text{if } C(d_j, d_k) = 0 \wedge l_{m_i,O_j} = l_j - 1 \wedge l_{m_i,O_k} = l_k - 1 \\
l_j - 1 & \text{if } l_{m_i,O_j} = l_j - 1 \wedge \bigl(C(d_j, d_k) = 0 \vee l_{m_i,O_k} \neq l_k - 1\bigr)
\end{cases}
\]
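The following Python fragment is an illustrative sketch, under assumed names, of the location update of one object at a condition; it covers both the hot case above and this cold case.

def next_location_at_condition(loc_j, loc_k, l_j, l_k, l_s, cond_holds, hot):
    """l_s is the location just after the surrounding (sub)chart ("peaceful exit")."""
    ready = (loc_j == l_j - 1) and (loc_k == l_k - 1)   # both objects synchronized
    if ready and cond_holds:
        return l_j                  # condition evaluated and true: advance
    if ready and not cond_holds and not hot:
        return l_s                  # cold condition false: exit the (sub)chart
    return loc_j                    # otherwise wait (a false hot condition blocks progress)

# example: a failing cold condition exits to the end of the subchart
assert next_location_at_condition(2, 4, 3, 5, 7, cond_holds=False, hot=False) == 7
# a failing hot condition simply blocks progress
assert next_location_at_condition(2, 4, 3, 5, 7, cond_holds=False, hot=True) == 2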
The difference between this and the definition for a hot condition is that if the objects are ready to evaluate the condition but the condition does not hold, the smallest surrounding (sub)chart is exited, as per the semantics of LSCs. Here, $l_s$ is the location of object $O_j$ at the end of the surrounding (sub)chart. In such a case, all the other objects also synchronize their exit of this (sub)chart. Note that this is a "peaceful exit", and does not constitute a violation of the universal chart $m_i$.

6.5 Assignments

Assignments enable referring to system properties after they are set. An assignment of the form $x := d$ stores the value $d$ in the variable $x$. In practice, $d$ may be a constant value, a property value of some object, or the value obtained by applying some function. To handle assignments we add a Boolean variable $assign(x, d)$ that is set to 1 exactly when the assignment is performed. Actually, these variables are used only for notational clarity, since in the implementation they can be computed from the values of the location variables. The translation is straightforward:
\[
x' =
\begin{cases}
d & \text{if } l_{m_i,O_k} = l - 1 \wedge l'_{m_i,O_k} = l \wedge assign(x, d) \\
x & \text{otherwise}
\end{cases}
\]
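For illustration only, the assignment update can be sketched as follows (the helper names are hypothetical):

def step_assignment(x, x_bound, loc_before, loc_after, l, d):
    """When O_k moves past the assignment location, x takes the value d and is marked bound."""
    performed = (loc_before == l - 1) and (loc_after == l)   # assign(x, d) fires
    if performed:
        return d, 1          # x := d, and x_bound is set to 1
    return x, x_bound        # otherwise x and x_bound are unchanged

# example: the assignment at location 4 binds x to 7
assert step_assignment(None, 0, loc_before=3, loc_after=4, l=4, d=7) == (7, 1)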
Intuitively, if object $O_k$ is at location $l - 1$ in chart $m_i$, and the next location of $O_k$ corresponds to the assignment $x := d$, the value of $x$ is set to $d$. We also add to the system a Boolean variable $x_{bound}$, which determines whether variable $x$ is already bound to a concrete value. After an assignment is evaluated, $x_{bound}$ is set to 1. More information about this appears in the next subsection. Assignments are local to a chart. Typically, the variable $x$ on the left-hand side of the assignment is used later in a condition or a symbolic message.

6.6 Symbolic Messages
Symbolic messages are of the form msg(x), where x is a parameter ranging over the finite domain D . A symbolic message represents concrete messages of the form msg(d), where d ∈ D. Using symbolic messages it is possible to describe generic scenarios, which are typically instantiated and bound to concrete values during play-out. To handle symbolic messages we add a variable representing the parameter x, which can be bound to a concrete value as the result of the occurrence of a concrete message
or an assignment. The binding of this variable also affects other messages in the same chart that are parameterized by $x$, binding them to the same value. Once the variables of a symbolic message are bound to concrete values, the usual rules concerning message communication apply to it, so it affects the transition relation similarly to a regular message. Formally, for a symbolic message of the form $msg(x)$ we add a variable $x \in D$ and a Boolean variable $x_{bound}$, which determines whether variable $x$ is already bound to a concrete value. Initially we set $x_{bound}$ to 0 and define the transition relation as follows:
\[
x'_{bound} =
\begin{cases}
1 & \text{if } \varphi_1 \vee \varphi_2 \vee x_{bound} = 1 \\
0 & \text{otherwise}
\end{cases}
\]

\[
\varphi_1 = l_{m_i,O_j} = l - 1 \;\wedge\; l'_{m_i,O_j} = l \;\wedge\; \bigvee_{d \in D} \bigl(msg(d) = 1\bigr)
\qquad
\varphi_2 = \bigvee_{l_t \text{ s.t. } f(l_t) = assign(x)} \bigl(l_{m_i,O_k} = l_t - 1 \wedge l'_{m_i,O_k} = l_t\bigr)
\]
According to this definition, $x_{bound}$ is changed to 1 upon the occurrence of a concrete message $msg(d)$ with $d \in D$ (as defined by $\varphi_1$), or when $x$ appears on the left-hand side of an assignment that is being evaluated (as defined by $\varphi_2$). The transition relation for the variable $x$ is defined by:

\[
x' =
\begin{cases}
d & \text{if } l_{m_i,O_j} = l - 1 \wedge l'_{m_i,O_j} = l \wedge \bigl(msg(d) = 1 \vee assign(x, d) = 1\bigr) \\
x & \text{otherwise}
\end{cases}
\]
The first case corresponds to binding $x$ to the value $d$, either as the result of the occurrence of the concrete message $msg(d)$ or as the result of $x$ being assigned the value $d$; otherwise $x$ remains unchanged. We now define the transition relation for the location variable when the location corresponds to a symbolic message:

\[
l'_{m_i,O_j} =
\begin{cases}
l & \text{if } l_{m_i,O_j} = l - 1 \wedge \bigvee_{d \in D} \bigl(msg(d) = 1 \wedge x_{bound} = 1 \wedge x = d\bigr) \\
l - 1 & \text{if } l_{m_i,O_j} = l - 1 \wedge \bigwedge_{d \in D} \bigl(msg(d) = 0 \vee x_{bound} = 0 \vee x \neq d\bigr)
\end{cases}
\]
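The three definitions above can be pictured together in the following illustrative Python sketch (the names are hypothetical, and the finite domain is left implicit):

def bind(x, x_bound, occurred_values):
    """The parameter x is bound by the first concrete msg(d) that occurs."""
    if x_bound or not occurred_values:
        return x, x_bound
    return occurred_values[0], 1          # bind x to the observed value

def advance_on_symbolic(loc, l, x, x_bound, occurred_values):
    """The location advances only on a concrete message agreeing with the binding."""
    if loc == l - 1 and x_bound and x in occurred_values:
        return l
    return loc                            # otherwise the location is unchanged

x, x_bound = bind(None, 0, occurred_values=[5])        # x gets bound to 5
assert (x, x_bound) == (5, 1)
assert advance_on_symbolic(2, 3, x, x_bound, [5]) == 3
assert advance_on_symbolic(2, 3, x, x_bound, [6]) == 2  # wrong value: no progress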
Intuitively, if object $O_j$ is at location $l - 1$ in chart $m_i$, and the next location of $O_j$ corresponds to a symbolic message, then the location is advanced if the message $msg(d)$ occurs and $x$ is bound to the value $d \in D$.

6.7 If-Then-Else
The transition relation of this construct is a variation on the way conditions are handled in subsection 6.4. All participating objects are synchronized when the condition is evaluated and when entering and exiting the Then and Else parts. We omit the details.
6.8 Loops
A loop is a sub-chart whose behavior is iterated. Loops can be of two basic types, bounded or unbounded [7,14]. The transition relation synchronizes all objects at the beginning and end of each iteration, and for the bounded case a counter variable is added to ensure that the given bound is not exceeded. We omit the details.

6.9 Functions
As explained in the subsection dealing with object properties, message communication can affect the values of object properties. Where a simple rule relates the value of a property to message communication, this can be handled fully in the transition relation. Where more complex functions are used, the situation is more involved; we adopted a practical approach, creating a symbolic trace of events that is bound to actual values iteratively at a later stage. Here too, we omit the details.

6.10 The Model-Checking

To compute a super-step using a model checker, the system is initialized according to the current locations of instances in precharts, while all locations in the main charts are set to 0. The main chart's activation state is also initialized to reflect the current state.7 We also set the objects' properties to reflect their current values. The model checker is then given the following property to prove, stating that it is always the case that at least one of the universal charts is active:

\[
G\Bigl(\bigvee_{m_i \in M_U} (act_{m_i} = 1)\Bigr)
\]
As explained earlier, falsifying this property amounts to finding a run that leads to a point at which all active universal charts have completed successfully, with no violations, which is exactly the desired super-step.
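To illustrate the idea (not the actual symbolic algorithm), the following Python sketch performs the same search explicitly: it looks for a sequence of system events after which no universal chart remains active, i.e., a counterexample to the invariant above. The step and is_quiescent functions are placeholders for the translation described in this section.

from collections import deque

def find_superstep(initial_state, step, is_quiescent, max_depth=50):
    """BFS for a sequence of system events leading to a quiescent state."""
    queue = deque([(initial_state, [])])
    seen = {initial_state}
    while queue:
        state, run = queue.popleft()
        if is_quiescent(state):          # all active charts completed: super-step found
            return run
        if len(run) < max_depth:
            for event, succ in step(state):
                if succ not in seen:
                    seen.add(succ)
                    queue.append((succ, run + [event]))
    return None                          # no super-step within the given depth

# toy model: a counter that must reach 3 before every chart is "completed"
step = lambda s: [("tick", s + 1)] if s < 3 else []
print(find_superstep(0, step, is_quiescent=lambda s: s == 3))  # ['tick', 'tick', 'tick']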
7 Implementation and Experimental Results
We have implemented smart play-out as part of a prototype tool that links to the play-engine, thus supporting the general play-in/play-out approach. During play-out, the tool translates a play-out task into the corresponding model, runs the model checker, and then injects the obtained counter-example into the play-engine. Thus, smart play-out drives the execution. We use the Weizmann Institute model-checker TLV [30] and the CMU SMV model-checker [6], but we can easily modify the tool to use other model-checkers too. Before constructing the model we perform a static calculation to identify those charts that can potentially become active in the current super-step, and use only them when
7 After each external event, the play-engine decides which precharts have completed and sets their corresponding main charts to be active.
defining the system transition relation. This static calculation appears to reduce the size of the model-checking problem dramatically, since we have found that only a relatively small number of charts are usually active together in a single super-step, even when the LSC specification itself is large.

The model-checkers we use are BDD-based,8 where the ordering of variables has a critical influence on running time. We use structural information from the LSC specification to derive a good variable ordering. We also noticed that the message variables described in the translation section can be expressed in terms of the location variables, and can then be eliminated from the model. When obtaining the counter-example, their values can be calculated and used for constructing the "correct" super-step.

A cellular phone system we use for illustration has about 35 different charts, and handles scenarios like dialing numbers, sending and receiving calls, opening the antenna, etc. It consists of 15 objects and uses 40 different types of messages. Calculating a super-step using our current implementation of smart play-out takes less than 1 second on a standard PC. This is fast enough to give the user a seamless feeling of working with a conventional executable model. The tool also manages to satisfy existential charts for which the counter-example has more than 100 events, in less than 2 minutes. A satisfying scenario for the existential chart shown in Fig. 4 was found by the play-engine in less than 7 seconds (including the translation, model checking, and construction of the run). The scenario consists of 19 events and involves 5 different universal charts, one of which is activated 3 times.

Besides these rather dry algorithmic/performance issues, using the smart play-out tool seems to provide the user with an enhanced understanding of the behavioral requirements, and a smooth and realistic execution framework for LSCs. Given these results and the major progress verification and model checking have made in recent years, we strongly believe that such a methodology can be practical for handling real-world applications. And, as we have repeatedly mentioned, it brings us one step closer to the possibility of requirements-based, code-less development of reactive systems.
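As a rough illustration of the kind of structural ordering heuristic referred to above (this sketch is not the tool's actual heuristic), one can keep the variables of each chart adjacent in the BDD order:

def variable_order(charts):
    """charts: mapping chart name -> list of participating objects."""
    order = []
    for chart, objects in charts.items():
        order.append(f"act_{chart}")                 # activation bit of the chart first
        order += [f"loc_{chart}_{obj}" for obj in objects]   # then its location variables
    return order

example = {"Dial": ["User", "Phone"], "OpenAntenna": ["Phone", "Antenna"]}
print(variable_order(example))
# ['act_Dial', 'loc_Dial_User', 'loc_Dial_Phone',
#  'act_OpenAntenna', 'loc_OpenAntenna_Phone', 'loc_OpenAntenna_Antenna']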
8 Related Work
A large amount of work has been done on formal requirements, sequence charts, and model execution. We briefly discuss the work most relevant to ours. There are commercial tools that successfully handle the execution of graphical models (e.g., Statemate [16] and Rhapsody by I-Logix [20], ObjectTime [32], and Rose-RT by Rational [31]). However, they all execute an intra-object design model (statecharts) rather than an inter-object requirement model.

LSCs have been used for testing and verification of system models. Lettrai and Klose [26] present a methodology supported by a tool called TestConductor, which is integrated into Rhapsody [20]. The tool is used for monitoring and testing a model using a (rather restricted) subset of LSCs. During execution of a Rhapsody model the TestConductor monitors the charts and provides information on whether they have been
8 We have recently begun using bounded model checking based on SAT methods. In some cases it proves to be very effective for smart play-out, but this work is still in its initial phase.
completed successfully or whether any violations have occurred. [26] also mentions the ability to test an implementation using these sequence charts, by generating messages on behalf of the environment (or other unimplemented classes, termed stubs). Their algorithm selects the next event to be carried out at the appropriate time by the environment (or by unimplemented classes) based on a local choice, without considering the effects of the next step on the rest of the sequence, or the interaction between several charts.

Damm and Klose [8,22] describe a verification environment in which LSCs are used to describe requirements that are verified against a Statemate model implementation. The verification is based on translating an LSC chart into a timed Büchi automaton, as described in [22], and it also handles timing issues. In both this work and [26], the assumption is that a system model whose reactive parts are described by statecharts has already been constructed, and the aim is to test or verify that model. We might thus say that while our work here focuses on putting together the information in the different charts, these papers treat each chart independently.

In a recent paper [27], the language of LSCs was extended with variables and symbolic instances. A symbolic instance, associated with a class rather than with an object, may stand for any object that is an instance of the class. The information passed between the instances can also be parameterized, using symbolic variables. A symbolic message may stand for any message of the same kind, with actual values bound to its parameterized variables. The extension is useful for specifying systems with an unbounded number of objects and for parameterized systems, where an actual instantiation of the system has a bounded number of objects, but this number is given as a parameter. In [15], the language of LSCs is further extended with powerful timing constructs, and the execution mechanism is modified so that real-time systems too can be specified and simulated directly from the requirements. We intend to extend the smart play-out algorithms to deal with both symbolic instances and the timing extensions.

Application of formal methods to the analysis of software requirements captured with SCR (Software Cost Reduction) is described in [17]. The SCR method provides a tabular notation for specifying the required relation between system and environment variables. In [5], model-checking methods are used to verify that a complete SCR model satisfies certain properties, using the SMV and Spin model checkers. This work is very different from ours: in [5] model checking is used for verifying properties of a state-based model (the traditional use of model checking), while we use model checking for driving the execution of a scenario-based specification.

The idea of using sequence charts to discover design errors at early stages of development has been investigated in [3,28] for detecting race conditions, time conflicts, and pattern matching. The language used in these papers is that of classical Message Sequence Charts, with the semantics being simply the partial order of events in a chart. In order to describe system behavior, such MSCs are composed into hierarchical message sequence charts (HMSCs), which are basically graphs whose nodes are MSCs. As has been observed in several papers, e.g.
[4], allowing processes to progress along the HMSC with each one being in a different node may introduce non-regular behavior and is the cause of undecidability of certain properties. Undecidability results and approaches to restricting HMSCs in order to avoid these problems appear in [19,18,11]. In our work, the
fact that LSC semantics requires that objects are synchronized while iterating during (unbounded) loops prevents such problems.

Another direction of research strongly related to our work is synthesis, where the goal is to automatically synthesize a correct system implementation from the requirements. Work on synthesis from MSC-like languages appears in [23,24,2,34,10], and an algorithm for synthesizing statecharts from LSCs appears in [13]. Moreover, a lot of work has been done on synthesis from temporal logic, e.g., [9,1,29,25]. The main difference is that in our work the play-out algorithms search one super-step ahead (or several super-steps when satisfying existential charts), whereas synthesis algorithms do not have such restrictions; they can thus be proven to behave correctly under all circumstances. Apart from the fact that smart play-out deals with an easier problem, and therefore solutions may be more practical, we believe that play-out is complementary to synthesis. Making synthesis methodologies feasible requires designers to have good ways to understand and execute the requirements, in order to make sure that the input to the synthesis algorithm is exactly what is desired. Our approach is also useful in an iterative development cycle, where many modifications of requirements and implementations are performed; trying to run a synthesis algorithm after each modification, even assuming that synthesis becomes feasible, does not seem like a particularly good approach.
References

1. M. Abadi, L. Lamport, and P. Wolper. Realizable and unrealizable concurrent program specifications. In Proc. 16th Int. Colloq. Aut. Lang. Prog., volume 372 of Lect. Notes in Comp. Sci., pages 1-17. Springer-Verlag, 1989.
2. R. Alur, K. Etessami, and M. Yannakakis. Inference of message sequence charts. In Proc. 22nd Int. Conf. on Software Engineering (ICSE'00), Limerick, Ireland, June 2000.
3. R. Alur, G.J. Holzmann, and D. Peled. An analyzer for message sequence charts. Software Concepts and Tools, 17(2):70-77, 1996.
4. R. Alur and M. Yannakakis. Model checking of message sequence charts. In Proc. 10th Int. Conf. on Concurrency Theory (CONCUR'99), Eindhoven, Netherlands, August 1999.
5. R. Bharadwaj and C. Heitmeyer. Model Checking Complete Requirements Specifications Using Abstraction. Automated Software Engineering, 6(1):37-68, January 1999.
6. J.R. Burch, E.M. Clarke, K.L. McMillan, D.L. Dill, and J. Hwang. Symbolic model checking: 10^20 states and beyond. Information and Computation, 98(2):142-170, 1992.
7. W. Damm and D. Harel. LSCs: Breathing Life into Message Sequence Charts. Formal Methods in System Design, 19(1), 2001. (Preliminary version in Proc. 3rd IFIP Int. Conf. on Formal Methods for Open Object-Based Distributed Systems (FMOODS'99), P. Ciancarini, A. Fantechi and R. Gorrieri, eds., Kluwer Academic Publishers, 1999, pp. 293-312.)
8. W. Damm and J. Klose. Verification of a Radio-based Signalling System using the STATEMATE Verification Environment. Formal Methods in System Design, 19(2):121-141, 2001.
9. E.A. Emerson and E.M. Clarke. Using branching time temporal logic to synthesize synchronization skeletons. Science of Computer Programming, 2:241-266, 1982.
10. M. Fränzle and K. Lüth. Visual Temporal Logic as a Rapid Prototyping Tool. Computer Languages, 27:93-113, 2001.
11. E.L. Gunter, A. Muscholl, and D. Peled. Compositional message sequence charts. In Tools and Algorithms for Construction and Analysis of Systems, pages 496-511, 2001.
12. D. Harel. From Play-In Scenarios To Code: An Achievable Dream. IEEE Computer, 34(1):53-60, January 2001. (Also in Fundamental Approaches to Software Engineering (FASE), Lecture Notes in Computer Science, Vol. 1783, Tom Maibaum, ed., Springer-Verlag, March 2000, pp. 22-34.)
13. D. Harel and H. Kugler. Synthesizing State-Based Object Systems from LSC Specifications. Int. J. of Foundations of Computer Science (IJFCS), 13(1):5-51, February 2002. (Also in Proc. Fifth Int. Conf. on Implementation and Application of Automata (CIAA 2000), July 2000, Lecture Notes in Computer Science, Springer-Verlag, 2000.)
14. D. Harel and R. Marelly. Specifying and Executing Behavioral Requirements: The Play-In/Play-Out Approach. Tech. Report MCS01-15, The Weizmann Institute of Science, 2001.
15. D. Harel and R. Marelly. Playing with Time: On the Specification and Execution of Time-Enriched LSCs. In Proc. 10th IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS'02), Fort Worth, Texas, 2002. To appear.
16. D. Harel and M. Politi. Modeling Reactive Systems with Statecharts: The STATEMATE Approach. McGraw-Hill, 1998.
17. C. Heitmeyer, J. Kirby, B. Labaw, and R. Bharadwaj. SCR*: A Toolset for Specifying and Analyzing Software Requirements. In A.J. Hu and M.Y. Vardi, editors, Proc. 10th Intl. Conference on Computer Aided Verification (CAV'98), volume 1427 of Lect. Notes in Comp. Sci., Springer-Verlag, pages 5-51, 1998.
18. J.G. Henriksen, M. Mukund, K. Narayan Kumar, and P.S. Thiagarajan. On Message Sequence Graphs and finitely generated regular MSC languages. In Proceedings of the 27th International Colloquium on Automata, Languages and Programming (ICALP'2000), number 1853 in Lecture Notes in Computer Science, Geneva, Switzerland, 2000. Springer.
19. J.G. Henriksen, M. Mukund, K. Narayan Kumar, and P.S. Thiagarajan. Regular collections of Message Sequence Charts. In Proceedings of the 25th International Symposium on Mathematical Foundations of Computer Science (MFCS'2000), number 1893 in Lecture Notes in Computer Science, Bratislava, Slovakia, 2000. Springer-Verlag.
20. I-Logix, Inc., products web page. http://www.ilogix.com/fs prod.htm.
21. ITU. ITU-T recommendation Z.120: Message sequence chart (MSC).
22. J. Klose and H. Wittke. An automata based interpretation of live sequence charts. In Proc. 7th Intl. Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS'01), 2001.
23. K. Koskimies, T. Systa, J. Tuomi, and T. Mannisto. Automated support for modeling OO software. IEEE Software, 15(1):87-94, 1998.
24. I. Kruger, R. Grosu, P. Scholz, and M. Broy. From MSCs to statecharts. In Proc. DIPES'98. Kluwer, 1999.
25. O. Kupferman and M.Y. Vardi. Synthesis with incomplete information. In 2nd International Conference on Temporal Logic, pages 91-106, Manchester, July 1997.
26. M. Lettrari and J. Klose. Scenario-based monitoring and testing of real-time UML models. In Proc. 4th Int. Conf. on the Unified Modeling Language, 2001.
27. R. Marelly, D. Harel, and H. Kugler. Multiple Instances and Symbolic Variables in Executable Sequence Charts. In Proc. 17th Ann. ACM Conf. on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA'02), Seattle, WA, 2002. To appear. Also available as Tech. Report MCS02-05, Weizmann Institute of Science, 2002.
28. A. Muscholl, D. Peled, and Z. Su. Deciding properties for message sequence charts. In Foundations of Software Science and Computation Structure, pages 226-242, 1998.
29. A. Pnueli and R. Rosner. On the synthesis of a reactive module. In Proc. 16th ACM Symp. Princ. of Prog. Lang., pages 179-190, 1989.
30. A. Pnueli and E. Shahar. A platform for combining deductive with algorithmic verification. In R. Alur and T. Henzinger, editors, Proc. 8th Intl. Conference on Computer Aided Verification (CAV'96), volume 1102 of Lect. Notes in Comp. Sci., Springer-Verlag, pages 184-195, 1996.
31. Rational, Inc., web page. http://www.rational.com.
32. B. Selic, G. Gullekson, and P. Ward. Real-Time Object-Oriented Modeling. John Wiley & Sons, New York, 1994.
33. UML. Documentation of the Unified Modeling Language (UML). Available from the Object Management Group (OMG), http://www.omg.org.
34. J. Whittle and J. Schumann. Generating statechart designs from scenarios. In Proc. 22nd Int. Conf. on Software Engineering (ICSE'00), Limerick, Ireland, June 2000.
Author Index

Aagaard, Mark D. 123
Ayari, Abdelwaheb 187
Basin, David 187
Berezin, Sergey 171
Bloem, Roderick 88
Bryant, Randal E. 142
Carmona, Josep 360
Chatterjee, Prosenjit 292
Chauhan, Pankaj 33
Ciardo, Gianfranco 256
Clarke, Edmund 33
Cortadella, Jordi 360
Das, Satyaki 19
Day, Nancy A. 123
Derbyshire, Arran 342
Dill, David L. 19, 171
Frisch, Alan 238
Gamboa, Ruben 274
Ganesh, Vijay 171
Gopalakrishnan, Ganesh 292
Hachtel, Gary D. 106
Harel, David 378
Intrigila, Benedetto 202
Jones, Robert B. 1
Kugler, Hillel 378
Kukula, James 33, 52
Kwak, Hee Hwan 52
Lahiri, Shuvendu K. 142
Lou, Meng 123
Luk, Wayne 342
Marelly, Rami 378
McKeever, Steve 342
Melham, Thomas F. 1
Meulen, Meine van der 310
Moon, In-Ho 52
Pastor, Enric 220
Penna, Giuseppe Della 202
Pixley, Carl 52
Pnueli, Amir 378
Ravi, Kavita 88
Sapra, Samir 33
Sawada, Jun 274
Seger, Carl-Johan H. 70
Seshia, Sanjit A. 142
Sharp, Richard 324
Sheridan, Daniel 238
Shiple, Thomas 52
Siminiceanu, Radu 256
Solé, Marc 220
Somenzi, Fabio 88
Strichman, Ofer 160
Tronci, Enrico 202
Veith, Helmut 33
Walsh, Toby 238
Wang, Chao 106
Wang, Dong 33
Yang, Jin 70
Zilli, Marisa Venturini 202