Towards a Design Flow for Reversible Logic

Towards a Design Flow for Reversible Logic Robert Wille Rolf Drechsler Towards a Design Flow for Reversible Logic ...

Author: Robert Wille | Rolf Drechsler

52 downloads 1226 Views 3MB Size Report

This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!

Report copyright / DMCA form

DOWNLOAD PDF

Towards a Design Flow for Reversible Logic

Robert Wille Rolf Drechsler

Towards a Design Flow for Reversible Logic

Robert Wille Institute of Computer Science University of Bremen Bibliothekstr. 1 28359 Bremen Germany [email protected]

Rolf Drechsler Institute of Computer Science University of Bremen Bibliothekstr. 1 28359 Bremen Germany [email protected]

ISBN 978-90-481-9578-7 e-ISBN 978-90-481-9579-4 DOI 10.1007/978-90-481-9579-4 Springer Dordrecht Heidelberg London New York Library of Congress Control Number: 2010932404 © Springer Science+Business Media B.V. 2010 No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Cover design: eStudio Calamar S.L. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)

Preface

The development of computing machines found great success in the last decades. But the ongoing miniaturization of integrated circuits will reach its limits in the near future. Shrinking transistor sizes and power dissipation are the major barriers in the development of smaller and more powerful circuits. Reversible logic provides an alternative that may overcome many of these problems in the future. For low-power design, reversible logic offers significant advantages since zero power dissipation will only be possible if computation is reversible. Furthermore, quantum computation profits from enhancements in this area, because every quantum circuit is inherently reversible and thus requires reversible descriptions. However, since reversible logic is subject to certain restrictions (e.g. fanout and feedback are not directly allowed), the design of reversible circuits significantly differs from the design of traditional circuits. Nearly all steps in the design flow (like synthesis, verification, or debugging) must be redeveloped so that they become applicable to reversible circuits as well. But research in reversible logic is still at the beginning. No continuous design flow exists so far. In this book, contributions to a design flow for reversible logic are presented. This includes advanced methods for synthesis, optimization, verification, and debugging. Formal methods like Boolean satisfiability and decision diagrams are thereby exploited. By combining the techniques proposed in the book, it is possible to synthesize reversible circuits representing large functions. Optimization approaches ensure that the resulting circuits are of small cost. Finally, a method for equivalence checking and automatic debugging allows to verify the obtained results and helps to accelerate the search for bugs in case of errors in the design. Combining the respective approaches, a first design flow for reversible circuits of significant size results. This book addresses computer scientists and computer architects and does not require previous knowledge about the physics of reversible logic or quantum computation. The respective concepts as well as the used models are briefly introduced.

v

vi

Preface

All approaches are described in a self-contained manner. The content of the book does not only conveys a coherent overview about current research results, but also builds the basis for future work on a design flow for reversible logic. Bremen

Robert Wille Rolf Drechsler

Acknowledgements

This book is the result of more than three years of intensive research in the area of reversible logic. During this time, we experienced many support from different people for which we would like to thank them very much. In particular, the Group of Computer Architecture at the University of Bremen earns many thanks for providing a comfortable and inspirational environment. Many thanks go to Stefan Frehse, Daniel Große, Lisa Jungmann, Hoang M. Le, Sebastian Offermann, and Mathias Soeken who actively helped in the development of the approaches described in this book. Sincere thanks also go to Prof. D. Michael Miller from the University of Victoria, Prof. Gerhard W. Dueck from the University of New Brunswick, and Dr. Mehdi Saeedi from the Amirkabir University of Technology in Tehran for very fruitful collaborations. In this context, we would like to thank the German Academic Exchange Service (DAAD) which enabled the close contact with the groups in Canada. Special thanks go to the German Research Foundation (DFG) which funded parts of this work under the contract number DR 287/20-1. Finally, we would like to thank Marc Messing who did a great job of proofreading as well as Christiane and Shawn Mitchell who closely checked the manuscript for English style and grammar.

vii

Contents

1

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2

Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . 2.1 Background . . . . . . . . . . . . . . . . . . . . . . 2.1.1 Reversible Functions . . . . . . . . . . . . . . 2.1.2 Reversible Circuits . . . . . . . . . . . . . . . 2.1.3 Quantum Circuits . . . . . . . . . . . . . . . 2.2 Decision Diagrams . . . . . . . . . . . . . . . . . . . 2.2.1 Binary Decision Diagrams . . . . . . . . . . . 2.2.2 Quantum Multiple-valued Decision Diagrams 2.3 Satisfiability Solvers . . . . . . . . . . . . . . . . . . 2.3.1 Boolean Satisfiability . . . . . . . . . . . . . 2.3.2 Extended SAT Solvers . . . . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

7 7 7 9 13 16 17 19 21 21 22

3

Synthesis of Reversible Logic . . . . . . . . . 3.1 Current Synthesis Steps . . . . . . . . . . 3.1.1 Embedding Irreversible Functions . 3.1.2 Transformation-based Synthesis . . 3.2 BDD-based Synthesis . . . . . . . . . . . 3.2.1 General Idea . . . . . . . . . . . . 3.2.2 Exploiting BDD Optimization . . . 3.2.3 Theoretical Consideration . . . . . 3.2.4 Experimental Results . . . . . . . 3.3 SyReC: A Reversible Hardware Language 3.3.1 The SyReC Language . . . . . . . 3.3.2 Synthesis of the Circuits . . . . . . 3.3.3 Experimental Results . . . . . . . 3.4 Summary and Future Work . . . . . . . .

. . . . . . . . . . . . . .

. . . . . . . . . . . . . .

. . . . . . . . . . . . . .

. . . . . . . . . . . . . .

. . . . . . . . . . . . . .

. . . . . . . . . . . . . .

. . . . . . . . . . . . . .

27 28 28 30 31 32 34 37 39 46 47 50 53 56

4

Exact Synthesis of Reversible Logic . . . . . . . . . . . . . . . . . . . 4.1 Main Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 SAT-based Exact Synthesis . . . . . . . . . . . . . . . . . . . . .

57 58 61

. . . . . . . . . . . . . .

. . . . . . . . . . . . . .

. . . . . . . . . . . . . .

. . . . . . . . . . . . . .

. . . . . . . . . . . . . .

. . . . . . . . . . . . . .

1

ix

x

Contents

4.2.1 Encoding for Toffoli Circuits . . . . . . 4.2.2 Encoding for Quantum Circuits . . . . . 4.2.3 Handling Irreversible Functions . . . . . 4.2.4 Experimental Results . . . . . . . . . . 4.3 Improved Exact Synthesis . . . . . . . . . . . . 4.3.1 Exploiting Higher Levels of Abstractions 4.3.2 Quantified Exact Synthesis . . . . . . . 4.3.3 Experimental Results . . . . . . . . . . 4.4 Summary and Future Work . . . . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

61 65 68 70 76 77 81 84 91

5

Embedding of Irreversible Functions . 5.1 The Embedding Problem . . . . . 5.2 Don’t Care Assignment . . . . . . 5.2.1 Methods . . . . . . . . . . 5.2.2 Experimental Results . . . 5.3 Synthesis with Output Permutation 5.3.1 General Idea . . . . . . . . 5.3.2 Exact Approach . . . . . . 5.3.3 Heuristic Approach . . . . 5.3.4 Experimental Results . . . 5.4 Summary and Future Work . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

93 94 96 96 99 100 102 104 105 106 111

6

Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1 Adding Lines to Reduce Circuit Cost . . . . . . . . . . . . . . 6.1.1 General Idea . . . . . . . . . . . . . . . . . . . . . . . 6.1.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 6.1.3 Experimental Results . . . . . . . . . . . . . . . . . . 6.2 Reducing the Number of Circuit Lines . . . . . . . . . . . . . 6.2.1 General Idea . . . . . . . . . . . . . . . . . . . . . . . 6.2.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 6.2.3 Experimental Results . . . . . . . . . . . . . . . . . . 6.3 Optimizing Circuits for Linear Nearest Neighbor Architectures 6.3.1 NNC-optimal Decomposition . . . . . . . . . . . . . . 6.3.2 Optimizing NNC-optimal Decomposition . . . . . . . . 6.3.3 Experimental Results . . . . . . . . . . . . . . . . . . 6.4 Summary and Future Work . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . .

. . . . . . . . . . . . . .

113 114 114 116 117 124 125 127 130 131 133 134 138 138

7

Formal Verification and Debugging . . . . . . . 7.1 Equivalence Checking . . . . . . . . . . . . 7.1.1 The Equivalence Checking Problem . 7.1.2 QMDD-based Equivalence Checking 7.1.3 SAT-based Equivalence Checking . . 7.1.4 Experimental Results . . . . . . . . 7.2 Automated Debugging and Fixing . . . . . . 7.2.1 The Debugging Problem . . . . . . . 7.2.2 Determining Error Candidates . . . .

. . . . . . . . .

. . . . . . . . .

143 144 145 145 148 150 154 155 157

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . .

. . . . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

Contents

7.2.3 Determining Error Locations 7.2.4 Fixing Erroneous Circuits . . 7.2.5 Experimental Results . . . . 7.3 Summary and Future Work . . . . . 8

xi

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

161 165 167 173

Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . 175

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

Acronyms

BDDs CMOS CNF CNOT d HDL LNN NNC MCF MCT P QMDD QF_BV QBF SAT SMT SWOP

Binary Decision Diagrams Complementary Metal Oxide Semiconductor Conjunctive Normal Form Controlled-NOT Number of gates (depth) of a circuit Hardware Description Language Linear Nearest Neighbor Nearest Neighbor Cost Multiple control Fredkin Multiple control Toffoli Peres Quantum Multiple-valued Decision Diagram Quantifier free bit-vector logic Quantified Boolean Formulas Boolean satisfiability SAT Modulo Theories Synthesis with Output Permutation

xiii

Chapter 1

Introduction

In the last decades, great achievements have been made in the development of computing machines. While computers consisting of a few thousands of components filled whole rooms in the early 70’s, nowadays billions of transistors are built on some square millimeters. This is a result of the achievements made in the domain of semiconductors which are still holding on: The number of transistors in a circuit doubles every 18 months (which is also known as Moore’s Law according to the founder of Intel, Gordon E. Moore, who formulated this as a prediction in 1965 [Moo65]).1 Until today, this prediction has not lost any of its validity—each year more complex systems and chips are introduced. However, it is obvious that such an exponential growth must reach its limits in the future—at least when the miniaturization reaches a level, where single transistor sizes are approaching the atomic scale. Besides that, power dissipation more and more becomes a crucial issue for designing high performance digital circuits. In the last decades, the amount of power dissipated in the form of heat to the surrounding environment of a chip increased by orders of magnitudes. Since excessive heat may decrease the reliability of a chip (or even destroys it), power dissipation is one of the major barriers to progress the development of smaller and faster computer chips. Due to these reasons, some researchers expect that from the 2020s on, duplication of transistor density will not be possible any longer. To further satisfy the needs for more computational power, alternatives are needed that go beyond the scope of “traditional” technologies like CMOS.2 Reversible logic marks a promising new direction where all operations are performed in an invertible manner. That is, in contrast to traditional logic, all computations can be reverted (i.e. the inputs can be obtained from the outputs and vice versa). A simple standard operation like the logical AND already illustrates that reversibility is not guaranteed in traditional circuit systems. Indeed, it is possible to obtain the inputs of an AND gate if the output is assigned to 1 (then both inputs must be as1 Originally,

Moore predicted a doubling every 12 months; ten years later he updated to 18 months.

2 CMOS

is the abbreviation for Complementary Metal Oxide Semiconductor, the technology mainly used for today’s integrated circuits. R. Wille, R. Drechsler, Towards a Design Flow for Reversible Logic, DOI 10.1007/978-90-481-9579-4_1, © Springer Science+Business Media B.V. 2010

1

2

1 Introduction

signed to 1 as well). But, it is not possible to determine the input values if the AND outputs 0. In contrast, reversible logic allows bijective operations only, i.e. n-input n-output functions that map each possible input vector to a unique output vector. This reversibility builds the basis for emerging technologies that may replace or at least enhance the traditional computer chip. Two examples of such technologies making use of reversible logic are sketched in the following: • Reversible Logic for Low-Power Design As mentioned above, power dissipation and therewith heat generation is a serious problem for today’s computer chips. A significant part of energy dissipation is due to the non-ideal behaviors of transistors and materials. Here, higher levels of integration and new fabrication processes reduced the heat generation in the last decade. However, a more fundamental reason for power dissipation arises from the observations made by Landauer in 1961 [Lan61]. Landauer proved that using traditional (irreversible) logic, gates always lead to energy dissipation regardless of the underlying technology. More precisely, exactly k · T · log 2 Joule of energy is dissipated for each “lost” bit of information during the irreversible operation (where k is the Boltzmann constant and T is the temperature). While this amount of power currently does not sound significant, it may become crucial additionally considering that (1) today millions of operations are performed in some seconds (i.e. increasing processor frequency multiplies this amount) and (2) more and more operations are performed with smaller and smaller transistor sizes (i.e. in a smaller area). In contrast, Bennett showed that energy dissipation is reduced or even eliminated if computation becomes information-lossless [Ben73]. This holds for reversible logic, since data is bijectively transformed without losing any of the original information. Bennett proved that circuits with zero power dissipation are only possible if they are built from reversible gates. In 2002, first reversible circuits have been built that exploit this observation [DV02]. In fact, these circuits were powered by their input signals only (i.e. without additional power supplies). In the future, such circuits may be an alternative that can cope with the heat generation problem of traditional chips. Furthermore, since reversible circuits already work with low power, applications are also possible in domains where power is a limited resource (e.g. for mobile computation). • Reversible Logic as a Basis for Quantum Computation Quantum circuits [NC00] offer a new kind of computation. Instead of logic signals 0 and 1, quantum circuits make use of qubits. A qubit is a two level quantum system, described by a two dimensional complex Hilbert space. The resulting tools can be obtained under www.revkit.org. This allows to represent not only 0 and 1 but also a superposition of both. As a result, qubits may represent multiple states at the same time enabling enormous speed-ups in computations. For example, it has been shown that using a quantum circuit it is possible to solve the factorization problem in polynomial time, while for traditional circuits only exponential methods exist [Sho94, VSB+01].

1 Introduction

3

But, research in the area of quantum circuits is still at the beginning. Nevertheless, first promising results exist: At the University of Innsbruck one of the first quantum circuits consisting of 8 qubits was built in 2005. This has been further improved so that today circuits with dozens of qubits exists—with upward trend. Even first commercial realizations of quantum circuits (e.g. a random number generator) are available. Reversible logic is important in this area because every quantum operation is inherently reversible. Thus, progress in the domain of reversible logic can be directly applied to quantum logic. Besides that, reversible logic additionally finds application in domains like optical computing [CA87], DNA computing [TS05], as well as nanotechnologies [Mer93]. Also, cryptography or encoding/decoding methods (e.g. for music and videos) can profit from enhancements in this area (see e.g. [ML01]). Furthermore, already today reversible operations are used in instruction sets for microprocessors [SL00]. The basic concepts of reversible logic are thereby not new and have already been introduced in the 60’s by Landauer [Lan61] and were further refined by Bennett [Ben73] and Toffoli [Tof80]. They observed that due to the reversibility, a straightforward usage of fanouts and feedback is not possible in reversible logic. Furthermore, new libraries of (reversible) gates have been introduced to represent invertible operations [Tof80, FT82, Per85, NC00] and it was stated that each reversible circuit must be a cascade of these reversible gates. Even if this still represents the basis for research in the area of reversible logic, the topic was not intensively studied by computer scientists before the year 2000. The main reason for that may lie in the fact that applications of reversible logic (in particular in the domain of quantum computation) have been seen as “dreams of the future”. But, this changed as with factorization a very important problem (factorization builds the basis for most of the today’s encrypting methods) was solved on a physically implemented quantum circuit [Sho94, VSB+01]. Therewith, a proof of concept was available showing that quantum computing, in fact, may be one solution for future computational problems. In particular, this achievement (together with further ones e.g. in reversible CMOS design as mentioned above) significantly moved the topic forward so that nowadays reversible logic is seen as a promising research area. As a consequence, in the last years computer scientists have also started to develop new methods e.g. for synthesis of reversible circuits. However, no real design flow for reversible logic exists until today. This is crucial since, due to the mentioned restrictions (e.g. no fanout and feedback), the design of reversible circuits significantly differs from the design of traditional circuits. Nearly all elaborated methods for synthesis, verification, debugging, and test available for traditional circuit design must be redeveloped so that they become applicable to reversible circuits as well. Now, while applications of reversible logic are starting to become feasible and traditional technologies more and more suffer from the increasing miniaturization, it is even more necessary to work towards such a flow. Moreover, considering the traditional design flow, it can be concluded that until today, computer scientists cannot fully exploit the technical state-of-the-art. That is, the number of transistors that can be physically implemented on a chip grows faster

4

1 Introduction

Fig. 1.1 Proposed design flow

than the ability to design them in a useful manner (also known as the design gap). This becomes even more crucial if, additionally, the ability to verify the correctness of the designed circuits is considered (known as the verification gap). Once reversible logic becomes feasible for large designs in the future, researchers will be faced with similar challenges. Thus, it is worth working towards a design flow for reversible logic already today. First steps in this direction have been made in the domain of synthesis (see e.g. [SPMH03, MDM05]), verification (see e.g. [VMH07, GNP08]), and test (see e.g. [PHM04, PBL05, PFBH05]). However, they are all still far away from covering real design needs. As an example, most synthesis approaches are only applicable for small functions and often produce circuits with relatively high cost. In contrast, design methods to create complex circuits and to efficiently verify their correctness are needed. This book makes contributions to a future design flow for reversible logic by proposing advanced methods for synthesis, optimization, verification, and debugging. Figure 1.1 shows the interaction of the proposed steps in an integrated flow. The left-hand side sketches the restrictions or challenges, respectively, to be solved in comparison to traditional methods. By combining the techniques proposed in the book, it is possible to synthesize reversible circuits representing large functions. Optimization methods ensure that the resulting circuits are of small cost. Finally, methods for equivalence checking and automatic debugging allow to verify the obtained results and help to accelerate the search for bugs in case of errors in the design. In the following, the respective contributions are briefly introduced in the order they appear in this book. A more detailed description of the problems as well as the proposed solutions is given at the beginning of each chapter.

1 Introduction

5

As a starting point, synthesis is considered in Chap. 3. Currently, the synthesis of reversible logic or quantum circuits, respectively, is limited. In the past, only methods have been proposed that are applicable for relatively small functions, i.e. functions with at most 30 variables. In addition, these methods often require an enormous amount of run-time. After reviewing the reasons for these limitations, a new synthesis method based on Binary Decision Diagrams (BDDs) is proposed. This enables synthesis of functions containing over 100 variables and thus is a major step towards design of complex systems in reversible logic. Additionally, a hardware description language is introduced that allows to specify and afterwards synthesize complex circuits in reversible logic. The problem of exact synthesis is considered in Chap. 4. Exact synthesis methods generate minimal circuits for a given function. Naturally, exact synthesis approaches are only applicable for very small functions. But, the resulting circuit realizations can be used later e.g. as building blocks for heuristic approaches. Nevertheless, run-time is the limiting factor here. The chapter describes how techniques of Boolean satisfiability (SAT) can be exploited for efficient exact synthesis of reversible circuits. Further approaches incorporating problem-specific knowledge as well as quantification are then introduced. These methods allow further accelerations of the exact synthesis. With these different synthesis approaches as a basis, the problem of embedding is addressed in Chap. 5. Usually, most of the synthesis approaches require a reversible function as input. But, basic functions like AND or addition are inherently irreversible. Thus, before synthesis, these functions must be embedded into reversible ones. This requires the addition of extra circuit signals and therewith constant inputs, garbage outputs, as well as don’t care conditions at the outputs. Furthermore, the order of outputs may be chosen arbitrarily. All this affects the generated synthesis result. In Chap. 5, methods for finding good embeddings are proposed and evaluated. After synthesis, the resulting circuits often are of high cost. In particular, dedicated technology-specific constraints are not considered by synthesis approaches. To address this, in Chap. 6 three different optimization methods are introduced—each with an own focus on a particular cost metric. The first one considers the reduction of the well-established quantum cost (used in quantum circuits) and the transistor cost (used in CMOS implementations), respectively. The second one addresses the number of lines in a circuit which particularly is important for all quantum realizations. Finally, an approach is introduced that takes a new cost metric based on a dedicated physical realization of quantum circuits into account. This allows that designers can automatically optimize their circuits with respect to the special needs of the addressed technology. To ensure that the respective results (e.g. obtained by optimization) still represent the desired functionality, verification is applied. For this purpose, in Chap. 7 equivalence checkers are introduced that can handle circuits with several thousands of gates in a very short time. Furthermore, an automatic approach for debugging is proposed. Instead of manually searching for the source of an error, this method allows a fast calculation of a reduced set of error candidates to be considered or even to automatically fix the erroneous circuit, respectively.

6

1 Introduction

Fig. 1.2 Structure of the book

Altogether, the contributions of this book to the design flow for reversible logic can be summarized as follows: • Synthesis methods for large functions (i.e. functions with more than 100 variables) • A hardware description language for reversible logic • Exact approaches for synthesizing minimal circuits that can later be used as building blocks • Embedding methods to automatically realize circuits for irreversible functions • Optimization approaches to reduce the cost with respect to the addressed technology • Equivalence checking of large circuits (i.e. circuits with several thousands of gates) • Automatic debugging and fixing of erroneous circuits All proposed methods have been implemented and experimentally evaluated. To this end, a uniform format for specifying reversible functions as well as reversible circuits has been defined which was used in all experiments throughout this book (see also the note on benchmarks on p. 26). Furthermore, all benchmark functions as well as the circuits have been made online available at RevLib under www.revlib.org. The resulting tools can be obtained under www.revkit.org. This allows other researchers to compare their results with the ones obtained in this work. However, the results together with a discussion, related work, and future research directions are, of course, also given in the respective chapters. According to the outline sketched above, the remainder of this book is structured as depicted in Fig. 1.2. The next chapter gives a more detailed introduction into both reversible as well as quantum logic and provides the basic notations and definitions as used in the rest of this book. Afterwards, the chapters about synthesis (Chap. 3), optimization (Chap. 6), as well as verification and debugging (Chap. 7) can be read independently of each other. Only for Chap. 4 about exact synthesis and Chap. 5 about embedding irreversible functions is it recommended to read the previous chapters beforehand. Chapter 8 summarizes all findings and gives directions for future work.

Chapter 2

Preliminaries

This chapter provides the basic definitions and notations to keep the remaining book self-contained. The chapter is divided into three parts. In the first section, Boolean functions, reversible functions, and the respective circuit descriptions are introduced. This builds the basis for all approaches described in this book. Since many of the proposed techniques exploit decision diagrams and satisfiability solvers, respectively, the basic concepts of these core techniques are also introduced in the last two sections. All descriptions are thereby kept brief. For a more in-depth treatment, references to further reading are given in the respective sections.

2.1 Background Reversible logic realizes bijective Boolean functions. Thus, first the basics regarding Boolean functions are revisited and further extended by a description of the properties specifically applied to reversible functions. Then, reversible circuits as well as quantum circuits are introduced which are used as realizations of reversible functions.

2.1.1 Reversible Functions Every logic computation can be defined as a function over Boolean variables B ∈ {0, 1}. More precisely: Definition 2.1 A Boolean function is a mapping f : Bn → B with n ∈ N. A function f is defined over its input variables X = {x1 , x2 , . . . , xn } and hence is also denoted by f (x1 , x2 , . . . , xn ). The concrete mapping is described in terms of Boolean expressions which are formed over the variables from X and the operations ∧ (AND), ∨ (OR), as well as · (NOT). R. Wille, R. Drechsler, Towards a Design Flow for Reversible Logic, DOI 10.1007/978-90-481-9579-4_2, © Springer Science+Business Media B.V. 2010

7

8 Table 2.1 Boolean functions

2 Preliminaries (a) AND

(b) OR

(c) NOT

x1 x2

x1 ∧ x2

x1 x2

x1 ∨ x2

x1

x1

0 0

0

0 0

0

0

1

0 1

0

0 1

1

1

0

1 0

0

1 0

1

1 1

1

1 1

1

Example 2.1 Table 2.1 shows the truth tables of the operations AND, OR, and NOT, respectively. Each truth table has 2n rows, showing the mapping of each input pattern to the respective output pattern. Taking AND, OR, and NOT as a basis, every Boolean function can be derived. For example, the often used functions XOR, implication, and equivalence are derived as follows: • XOR: x1 ⊕ x2 := (x1 ∧ x 2 ) ∨ (x 1 ∧ x2 ) • Implication: x1 ⇒ x2 := x 1 ∨ x2 • Equivalence: x1 ⇔ x2 := x1 ⊕ x2 So far, single-output functions have been introduced. However, in practice also multi-output functions are widely used. Definition 2.2 A multi-output Boolean function is a mapping f : Bn → Bm with n, m ∈ N. More precisely, it is a system of Boolean functions fi (x1 , x2 , . . . , xn ) with 1 ≤ i ≤ m. In the following multi-output functions are also termed as n-input, m-output functions or n × m functions, respectively. Example 2.2 Table 2.2(a) shows the truth table of a 3-input, 2-output function representing the adder function. This book considers reversible functions. Reversible functions are a subset of multi-output functions and are defined as follows: Definition 2.3 A multi-output function f : Bn → Bm is a reversible function iff • its number of inputs is equal to the number of outputs (i.e. n = m) and • it maps each input pattern to a unique output pattern. In other words, each reversible function is a bijection that performs a permutation of the set of input patterns. A function that is not reversible is termed irreversible. Example 2.3 Table 2.2(c) shows a 3-input, 3-output function. This function is reversible, since each input pattern maps to a unique output pattern. In contrast the function depicted in Table 2.2(a) is irreversible, since n = m. Moreover, also the

2.1 Background

9

Table 2.2 Multi-output functions (a) Irrev. (Adder) (b) Irreversible

(c) Reversible

x1

x2

x3

f1

f2

x1 x2 x3

f1 f2 f3

x1 x2 x3

f1 f2 f3

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

1

0

0

1

0

0

0

0

0

1

0

1

0

0

1

0

0

1

0

1

0

0

1

0

0

1

0

1

0

0

0

1

1

1

0

0

1

1

0

1

1

0

1

1

1

0

1

1

0

0

0

1

1

0

0

1

0

0

1

0

0

0

0

1

1

0

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

1

0

1

0

1

1

0

1

1

1

1

1

0

1

1

0

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

function in Table 2.2(b) is irreversible. Here, the number n of inputs indeed is equal to the number m of outputs, but there is no unique input-output mapping. For example, both inputs 000 and 001 map to the output 000. Quite often, (irreversible) multi-output Boolean functions should be represented by reversible circuits. This necessitates the irreversible function to be embedded into a reversible one which requires the addition of constant inputs and garbage outputs defined as follows: Definition 2.4 A constant input of a reversible function is an input that is set to a fixed value (either 0 or 1). Definition 2.5 A garbage output of a reversible function is an output which is a don’t care for all possible input conditions. The problem of embedding is an integral part of synthesis which is described later in this book. In particular, Sect. 3.1.1 and Chap. 5 cover the respective aspects in detail. Reversible functions can be realized by reversible logic. Due to its special properties, reversible logic found large interest in several domains like low-power design or quantum computation (see Chap. 1). As a result, synthesis of reversible functions has become an intensively studied topic in the last years. Therefore, new kinds of circuits have been proposed that are introduced and compared to traditional circuits in the next section.

2.1.2 Reversible Circuits A circuit realizes a Boolean function. Usually, a circuit is composed of signal lines and a set of basic gates (called gate library). For traditional circuits, often the gate

10

2 Preliminaries

Fig. 2.1 Traditional circuit elements

library depicted in Fig. 2.1 is used. This includes gates for the operations AND, OR, and NOT, based on which any Boolean function can be realized. Furthermore, fanouts are applied to use signal values more than once. In contrast, to realize reversible logic some restrictions must be considered: fanouts and feedback are not directly allowed, since they would destroy the reversibility of the computation [NC00]. Also, the gate library from above as well as the traditional design flow cannot be utilized. As a result, a cascade structure over reversible gates is the established model to realize reversible logic. Definition 2.6 A reversible circuit G over inputs X = {x1 , x2 , . . . , xn } is a cascade of reversible gates gi , i.e. G = g0 g1 · · · gd−1 where d is the number of gates. A reversible gate has the form g(C, T ), where C = {xi1 , . . . , xik } ⊂ X is the set of control lines and T = {xj1 , . . . , xjl } ⊂ X with C ∩ T = ∅ is the set of target lines. C may be empty. The gate operation is applied to the target lines iff all control lines meet the required control conditions. Control lines and unconnected lines always pass through the gate unaltered. In the literature, three types of reversible gates have been established: • A (multiple control) Toffoli gate (MCT) [Tof80] has a single target line xj and maps (x1 , x2 , . . . , xj , . . . , xn ) to (x1 , x2 , . . . , xi1 xi2 · · · xik ⊕ xj , . . . , xn ). That is, a Toffoli gate inverts the target line iff all control lines are assigned to 1. • A (multiple control) Fredkin gate (MCF) [FT82] has two target lines xj1 and xj2 . The gate interchanges the values of the target lines iff the conjunction of all control lines evaluates to 1. • A Peres gate (P) [Per85] has a control line xi , a target line xj1 , and a line xj2 that serves as both, control and target. It maps (x1 , x2 , . . . , xj1 , . . . , xj2 , . . . , xn ) to (x1 , x2 , . . . , xi xj2 ⊕xj1 , . . . , xi ⊕xj2 , . . . , xn ) and thus is a cascade of two MCT gates. Example 2.4 Figure 2.2 shows a Toffoli gate (a), a Fredkin gate (b), and a Peres gate (c) together with a truth table of its functionality. A ● is used to indicate a control line, while an ⊕ (×) is used for denoting the target line of a Toffoli and Peres gate (Fredkin gate). Remark 2.1 These definitions also provide the basis for other gate types. For example, the Toffoli gate builds the basis for the NOT gate (a Toffoli gate with no control lines, i.e. with C = ∅), for the controlled-NOT gate (a Toffoli gate with one control

2.1 Background

11

Fig. 2.2 Reversible gates

Fig. 2.3 Reversible circuits

line),1 as well as for the Toffoli gate as originally proposed in [Tof80]. In contrast, the Fredkin gate builds the basis for a SWAP gate (a Fredkin gate with C = ∅, i.e. an interchanging of two lines). In the following, the notations MCT(C, xj ), MCF(C, xj1 , xj2 ), and P (xi , xj1 , xj2 ) are used to denote a Toffoli, Fredkin, and Peres gate, respectively. The number of control lines, a Toffoli (Fredkin) gates consists of, defines the size of the gate. Using these gate types, universal libraries can be composed. A gate library is called universal, if it enables the realization of any reversible function. For example, it has been proven that every reversible function can be realized using MCT gates only [MD04b]. Also, the gate library consisting of NOT, CNOT, and two-controlled Toffoli gates is universal [SPMH03]. In contrast, a library including only CNOT gates allows the realization of linear reversible functions only [PMH08]. Example 2.5 Figure 2.3 shows reversible circuits realizing the function depicted in Table 2.2(c) with the help of Toffoli and Fredkin gates, respectively. As for their traditional counterparts, the complexity of reversible circuits is measured by means of different cost metrics. More precisely, the cost of the respective circuits is defined as follows: 1 The

controlled-NOT gate is also known as CNOT or Feynman gate.

12

2 Preliminaries

Table 2.3 Quantum cost for Toffoli and Fredkin gates N O . OF CONTROL LINES

Q UANTUM COST OF A

T OFFOLI GATE

OF A

0

1

3

1

1

7

2

5

15

3

13

F REDKIN GATE

28, if at least 2 lines are unconnected 31, otherwise

4

26, if at least 2 lines are unconnected 29, otherwise

40, if at least 3 lines are unconnected 54, if 1 or 2 lines are unconnected 63, otherwise

5

6

38, if at least 3 lines are unconnected

52, if at least 4 lines are unconnected

52, if 1 or 2 lines are unconnected

82, if 1, 2 or 3 lines are unconnected

61, otherwise

127, otherwise

50, if at least 4 lines are unconnected

64, if at least 5 lines are unconnected

80, if 1, 2 or 3 lines are unconnected

102, if 1, 2, 3 or 4 lines are unconnected

125, otherwise

255, otherwise

Definition 2.7 A reversible circuit G = g0 g1 · · · gd−1 has cost of c=

d−1

ci ,

i=0

where ci denotes the cost of gate gi . The concrete cost for a single gate of course depends on the respective type but also on the addressed technology. In this book, the following cost metrics are used: • Gate count denotes the number of gates the circuit consists of (i.e. ci = 1 and c = d). • Quantum cost denotes the effort needed to transform a reversible circuit to a quantum circuit (see also next section). Table 2.3 shows the quantum cost for a selection of Toffoli and Fredkin gate configurations as introduced in [BBC+95] and further optimized in [MD04a] and [MYDM05]. As can be seen, gates of larger size are considerably more expensive than gates of smaller size. The Peres gate

2.1 Background

13

represents a special case, since it has quantum cost of 4, while the realization with two Toffoli gates would imply a cost of 6. • Transistor cost denotes the effort needed, to realize a reversible circuit in CMOS according to [TG08]. The transistor cost of a reversible gate is 8 · s where s is the number of control lines. Example 2.6 Consider the circuits from Example 2.5 depicted in Fig. 2.3. The Toffoli circuit has a gate count of 6, quantum cost of 10, and transistor cost of 56, while the Fredkin circuit has a gate count of 3, quantum cost of 13, and transistor cost of 8, respectively. As can be seen, the costs significantly differ depending on the applied cost model. Even if the number of gates in a cascade is a very simple measure of its complexity, it is the most technology-independent metric. Thus, the gate count is often used to evaluate the quality of a reversible circuit. Besides that, also the quantum cost metric is popular because it represents a measure for the most intensely studied application (namely quantum computation) and considers larger gates to be more costly. The transistor cost model is a relatively new model that arose with the application of reversible circuits to the area of low-power CMOS design. In this book, gate count and quantum cost are primarily considered as it allows a fair comparison of synthesis results with respect to previous work. Transistor costs are additionally addressed where appropriate. Finally, a special property of reversible logic is reviewed: Lemma 2.1 If the cascade of MCT gates G = g0 g1 · · · gd−1 realizes a reversible function f , then the reverse cascade G = gd−1 gd−2 · · · g0 realizes the inverse function f −1 . Proof Each reversible gate realizes a reversible function. That is, for each input pattern a unique output pattern, i.e. a one-to-one mapping, exists. Thus, calculating the inverse of the function f for an output pattern is essentially the same operation as propagating this pattern backwards through the circuit. This lemma is particularly exploited during synthesis of reversible logic as described later in this book. The next section considers quantum circuits and how they are derived from reversible logic.

2.1.3 Quantum Circuits Quantum computation [NC00] is a promising application of reversible logic. Every quantum circuit works on qubits instead of bits. In contrast to Boolean logic, qubits do not only allow to represent Boolean 0’s and Boolean 1’s, but also the superposition of both. More formally:

14

2 Preliminaries

Definition 2.8 A qubit is a two level quantum system, described by a two dimensional complex Hilbert space. The two orthogonal quantum states 1 0 |0 ≡ and |1 ≡ 0 1 are used to represent the Boolean values 0 and 1. Any state of a qubit may be written as |Ψ = α|0 + β|1, where α and β are complex numbers with |α|2 + |β|2 = 1. The quantum state of a single qubit is denoted by the vector α . β The state of a quantum system with n > 1 qubits is given by an element of the tensor product of the respective state spaces and can be represented as a normalized vector of length 2n , called the state vector. The state vector is changed through multiplication of appropriate 2n × 2n unitary matrices. Thus, each quantum computation is inherently reversible but manipulates qubits rather than pure logic values. At the end of the computation, a qubit can be measured. Then, depending on the current state of the qubit a 0 (with probability of |α|2 ) or a 1 (with probability of |β|2 ) returns, respectively. After the measurement, the state of the qubit is destroyed. In other words, using quantum computation and qubits in superposition, functions can be evaluated with different possible input assignments in parallel. But, it is not possible to obtain the current state of a qubit. Instead, if a qubit is measured, either 0 or 1 is returned depending on the respective probability. Nevertheless, researchers exploited quantum computation (in particular superposition) to solve many practically relevant problems faster than by traditional computing machines. For example, it was possible to solve the factorization problem in polynomial time— for traditional machines only exponential algorithms are known. Even if the research in this area is still at the beginning (so far, quantum algorithms with only up to 28 qubits have been implemented), these first promising results motivate further research in this area. The focus of this book is how to design reversible and quantum circuits, respectively. Thus, in the following the model for quantum circuits as used in this book is introduced. For a more detailed treatment of the respective physical background, the reader is referred to [Pit99, NC00, Mer07]. Definition 2.9 A quantum circuit Q is a cascade of quantum gates qi , i.e. Q = q0 · · · qd−1 . In this book, the following quantum gates are considered: • Inverter (NOT): A single qubit is inverted. • Controlled inverter (CNOT): The target qubit is inverted if the control qubit is 1. • Controlled V gate: A V operation is performed on the target qubit if the control qubit is 1. The V operation is also known as the square root of NOT, since two consecutive V operations are equivalent to an inversion.

2.1 Background

15

Fig. 2.4 Quantum gates

Fig. 2.5 State transitions for NOT, CNOT, V, and V+ operations

• Controlled V+ gate: A V+ operation is performed on the target qubit if the control qubit is 1. The V+ gate performs the inverse operation of the V gate, i.e. V + ≡ V −1 . The notation for these gates along with their corresponding 2n × 2n unitary matrices is shown in Fig. 2.4. In the following, the input to a quantum circuit as well as to each control line of a gate is restricted to 0 and 1. This has the effect that the value of each qubit is restricted to one value of the set {0, 1, V0 , V1 }, i.e. a 4-valued logic with 1+i 1 1 + i −i V0 = and V1 = −i 1 2 2 is applied. Figure 2.5 shows the resulting transitions with respect to the possible NOT, CNOT, V, and V+ operations. By restricting the quantum circuit model in this way, physical effects like superposition (and entanglement [NC00]) are excluded from the following consideration so that automated approaches (e.g. for synthesis, optimization, verification, etc.) become applicable. Nevertheless, the restricted model remains realistic for many applications. As an example, many of today’s quantum algorithms (e.g. Deutsch’s algorithm or Grover’s algorithm [NC00]) include quantum realizations of reversible (Boolean) functions. Thus, the mentioned restrictions are common in the design of quantum circuits.

16

2 Preliminaries

Fig. 2.6 Quantum circuit

Fig. 2.7 Pairs of quantum gates with unit cost

Example 2.7 Figure 2.6 shows a quantum circuit realizing the reversible function depicted in Table 2.2(c). All quantum gates are assumed to be the basic blocks of each quantum computation. This is also reflected in the cost metric. Definition 2.10 Each quantum gate has cost of 1. Thus, the cost of a quantum circuit is defined by the number d of its gates. Remark 2.2 In previous work, also an extended cost metric has been applied: When a CNOT and a V (or V+) gate are applied to the same two qubits, the cost of the pair can be considered unit as well [SD96, HSY+06]. The possible pairs (denoted by double gates in the following) are shown in Fig. 2.7. In this book, primarily the cost metric from Definition 2.10 is applied. However, all approaches can also be extended to consider unit cost of double gates. Exemplarily, this is shown for exact synthesis of quantum circuits in Sect. 4.2.2. Since quantum circuits are inherently reversible, every reversible circuit can be transformed to a quantum circuit. To this end, each gate of the reversible circuit is decomposed into a cascade of quantum gates. Example 2.8 Figure 2.8(a) (Fig. 2.8(b)) shows the quantum gate cascade which can be used to transform a Toffoli (Fredkin) gate to a quantum circuit. As can be seen, the number of required quantum gates is equal to the quantum cost of the Toffoli (Fredkin) gate as introduced in Table 2.3. Exploiting these decompositions, synthesis of quantum circuits can be approached from two different angles: (1) Targeting quantum gates directly during the synthesis process or (2) synthesizing reversible circuits first that are later mapped into quantum circuits.

2.2 Decision Diagrams To represent Boolean (including reversible) functions and circuits, decision diagrams can be applied. They provide an efficient data-structure that can represent

2.2 Decision Diagrams

17

Fig. 2.8 Decomposition of reversible gates to quantum circuits

large functions in a more compact way than truth tables. In the past, several types of decision diagrams have been introduced. In this book, Binary Decision Diagrams (BDDs) [Bry86] are considered to represent Boolean functions. Quantum Multiple-valued Decision Diagrams (QMDDs) [MT06, MT08] are used to represent reversible functions that may include quantum operations. Both are briefly introduced in this section.

2.2.1 Binary Decision Diagrams A Boolean function f : Bn → B can be represented by a graph-structure defined as follows: Definition 2.11 A Binary Decision Diagram (BDD) over Boolean variables X with terminals T = {0, 1} is a directed acyclic graph G = (V , E) with the following properties: 1. Each node v ∈ V is either a terminal or a non-terminal. 2. Each terminal node v ∈ V is labeled by a value t ∈ T and has no outgoing edges. 3. Each non-terminal node v ∈ V is labeled by a Boolean variable xi ∈ X and represents a Boolean function f . 4. In each non-terminal node (labeled by xi ), the Shannon decomposition [Sha38] f = x i fxi =0 + xi fxi =1 is carried out, leading to two outgoing edges e ∈ E whose successors are denoted by low(v) (for fxi =0 ) and high(v) (for fxi =1 ), respectively. The size of a BDD is defined by the number of its (non-terminal) nodes. Example 2.9 Figure 2.9 shows a BDD representing the function f = x1 ⊕ x2 · x3 . Edges leading to a node fxi =0 (fxi =1 ) are marked by a 0 (1). This BDD has a size of 5.

18

2 Preliminaries

Fig. 2.9 BDD representing f = x1 ⊕ x2 · x3

A BDD is called free if each variable is encountered at most once on each path from the root to a terminal node. A BDD is called ordered if in addition all variables are encountered in the same order on all such paths. The respective order is defined by π : {1, . . . , n} → {1, . . . , n}. Finally, a BDD is called reduced if it does neither contain isomorphic sub-graphs nor redundant nodes. To achieve reduced BDDs, reduction rules as depicted in Fig. 2.10 are applied. Applying the reduction rules leads to shared nodes, i.e. nodes that have more than one predecessor. Example 2.10 Figure 2.11 shows two reduced ordered BDDs representing the function f = x1 ·x2 +x3 ·x4 +· · ·+xn−1 ·xn . For the order x1 , x2 , . . . , xn−1 , xn , the BDD depicted in Fig. 2.11(a) has a size of O(n), while the BDD depicted in Fig. 2.11(b) with the order x1 , x3 , . . . , xn−1 , x2 , x4 , . . . , xn has size of O(2n ). Remark 2.3 In the following, reduced ordered binary decision diagrams are called BDDs for brevity. BDDs are canonical representations, i.e. for a given Boolean function and a fixed order, the BDD is unique [Bry86]. As shown by Example 2.10, BDDs are very sensitive to the chosen variable order. It has been shown in [BW96] that proving the existence of a BDD with a lower number of nodes (i.e. proving that no other order leads to a smaller BDD size) is NP-complete. As a consequence, several heuristics to find good orders have been proposed. In particular, sifting [Rud93] has been shown to be quite effective. Further reductions of the BDD size can be achieved, if complement edges [BRB90] are applied. They allow to represent a function as well as its complement by one single node only. BDDs can also be used to represent multi-output functions. Then, all BDDs for the respective functions are shared, i.e. isomorphic sub-functions are represented by a single node as well. For a more comprehensive introduction into BDDs, the reader is refereed to [DB98, EFD05]. For the application of BDDs in practice, many well-engineered BDD packages (e.g. CUDD [Som01]) are available.

2.2 Decision Diagrams

19

Fig. 2.10 Reduction rules for BDDs

Fig. 2.11 BDDs with different variable orders

2.2.2 Quantum Multiple-valued Decision Diagrams As described in Sect. 2.1.3, quantum operations are defined by 2n × 2n unitary matrices (consider again Fig. 2.4 on p. 15 for examples). Thus, to represent functions including quantum operations, an adjusted data-structure is needed. Quantum Multiple-valued Decision Diagrams (QMDDs) [MT06, MT08] provide for the representation and manipulation of r n × r n complex-valued matrices with r pure logic states. This includes unitary matrices and thus QMDDs can be applied to represent quantum gates and circuits. Since in this book QMDDs are used as black box only (in contrast to BDDs), a formal definition of QMDDs is omitted and instead they are introduced by exemplarily describing the general idea.

20

2 Preliminaries

Fig. 2.12 QMDD representing the matrix of a single V gate

A QMDD structure is based on partitioning an r n × r n matrix M into r 2 submatrices, each of dimension r n−1 × r n−1 as shown in the following equation: ⎛ ⎞ M0 M1 · · · Mr−1 ⎜ ⎟ ⎜ Mr Mr+1 · · · M2r−2 ⎟ ⎜ ⎟ . M =⎜ . .. .. ⎟ .. ⎜ .. . . . ⎟ ⎝ ⎠ Mr 2 −r Mr 2 −r+1 · · · Mr 2 −1 In the following, the concepts of QMDDs are briefly presented by way of the following example of a single V gate. Example 2.11 Figure 2.12(a) shows a V gate in a 3-line circuit. The unitary matrix describing the behavior of this gate is given in Fig. 2.12(b) where v = 1+i 2 and v = 1−i . The QMDD for this matrix is given in Fig. 2.12(c). The edges from each 2 non-terminal node point to four sub-matrices indexed 0, 1, 2, 3 from left to right. Each edge has a complex-valued weight. For clarity, edges with weight 0 are indicated as stubs. In fact, they point to the terminal node. The key features of QMDD are evident in this example. There is a single terminal node. Furthermore, each edge has a complex-valued weight. Each non-terminal node represents a matrix partitioning. For example, the top node in Fig. 2.12(c) represents the partitioning shown in Fig. 2.12(b). The non-terminal nodes lower in the diagram represent similar partitioning of the resulting sub-matrices. The representation of common sub-matrices is shared. To ensure the uniqueness of the representation, edges with weight 0 must point to the terminal node and normalization is applied to non-terminal nodes so that the lowest indexed edge with non-zero weight has weight 1. As for BDDs, an efficient implementation exists also for QMDDs. However, since QMDD involve multiple edges from nodes and are applicable to both binary

2.3 Satisfiability Solvers

21

and multiple-valued problems, the QMDD package is not built using a standard decision diagram package. Nevertheless, the implementation employs well-known decision diagram techniques like sharing, reordering, and so on. For a more comprehensive introduction into QMDDs, the reader is referred to [MT08].

2.3 Satisfiability Solvers The methods described in this book make use of techniques for solving the Boolean satisfiability problem (SAT problem). The SAT problem is one of the central NP-complete problems. In fact, it was the first known NP-complete problem that was proven by Cook in 1971 [Coo71]. Despite this proven complexity, efficient solving algorithms have been developed that found great success as proof engines for many practically relevant problems. Today there exists algorithms exploiting SAT that solve many practical problem instances, e.g. in the domain of automatic test pattern generation [Lar92, DEF+08], logic synthesis [ZSM+05], debugging [SVAV05], and verification [BCCZ99, CBRZ01, PBG05]. In this section, the SAT problem, the respective solving algorithm, and its application are introduced. Furthermore, extended SAT solvers additionally exploiting bit-vector logic, quantifiers, or problem-specific modules, respectively, are briefly reviewed. These engines are used later as core techniques for selected steps in the proposed flow for reversible logic.

2.3.1 Boolean Satisfiability The Boolean satisfiability problem (SAT problem) is defined as follows: Definition 2.12 Let h : Bn → B be a Boolean function. Then, the SAT problem is to find an assignment to the variables of h such that h evaluates to 1 or to prove that no such assignment exists. In other words, SAT asks if ∃Xh for an h over variables X and determines a satisfying assignment in this case. In this context, the Boolean formula h is often given in Conjunctive Normal Form (CNF). A CNF is a set of clauses, each clause is a set of literals, and each literal is a Boolean variable or its negation. The CNF formula is satisfied if all clauses are satisfied, a clause is satisfied if at least one of its literals is satisfied, and a variable is satisfied when 1 is assigned to the variable (the negation of a variable is satisfied under the assignment 0). Example 2.12 Let h = (x1 + x2 + x 3 )(x 1 + x3 )(x 2 + x3 ). Then, x1 = 1, x2 = 1, and x3 = 1 is a satisfying assignment for h. The values of x1 and x2 ensure that the first clause becomes satisfied, while x3 ensures this for the remaining two clauses.

22

2 Preliminaries

Fig. 2.13 Solving algorithm in modern SAT solvers

To solve SAT problems, in the past several (backtracking) algorithms (or SAT solvers, respectively) have been proposed [DP60, DLL62, MS99, MMZ+01, GN02, ES04]. Most of them apply the steps depicted in Fig. 2.13: While there are free variables left (a), a decision is made (c) to assign a value to one of these variables. Then, implications are determined due to the last assignment (d). This may cause a conflict (e) that is analyzed. If the conflict can be resolved by undoing assignments from previous decisions, backtracking is done (f). Otherwise, the instance is unsatisfiable (g). If no further decision can be made, i.e. a value is assigned to all variables and this assignment did not cause a conflict, the CNF is satisfied (b). Advanced techniques like e.g. efficient Boolean constraint propagation [MMZ+01] or conflict analysis [MS99] as well as efficient decision heuristics [GN02] are common in state-of-the-art SAT solvers today. These techniques as well as the tremendous improvements in the performance of the respective implementations [ES04] enable the consideration of problems with more than hundreds of thousands of variables and clauses. Thus, SAT is widely used in many application domains. Therefore, the real world problem is transformed into CNF and then solved by using a SAT solver as a black box.

2.3.2 Extended SAT Solvers Despite their efficiency, Boolean SAT solvers have a major drawback: they work on the Boolean level. But, many problems are formulated on a higher level of abstraction and would benefit from a more general description, respectively. As a consequence, researchers investigated the use of more expressive formulations than CNF—by still exploiting the established SAT techniques. This leads (1) to the combination of SAT solvers with decision procedures for decidable theories resulting

2.3 Satisfiability Solvers

23

in SAT Modulo Theories (SMT) [BBC+05, DM06b] and (2) to the application of quantifiers resulting in Quantified Boolean Formulas (QBF) [Bie05, Ben05]. Furthermore, problem-specific knowledge is exploited during the solving process by the SAT solver SWORD [WFG+07]. The respective concepts are briefly reviewed in the following.

2.3.2.1 SMT Solvers for Bit-vector Logic An SMT solver integrates a Boolean SAT solver with other solvers for specialized theories (e.g. linear arithmetic or bit-vector logic). The SAT solver thereby works on an abstract representation (still in CNF) of the problem and steers the overall search process, while each (partial) assignment of this representation has to be validated by the theory solver for the theory constraints. Thus, advanced SAT techniques together with specialized theory solvers are exploited. In this book, the theory of quantifier free bit-vector logic (QF_BV) is utilized. This logic is defined as follows: Definition 2.13 A bit-vector is an element b = (bn−1 , . . . , b0 ) ∈ Bn . The index [ ] : Bn × [0, n) → B maps a bit-vector b and an index i to the ith component of the vector, i.e. b[i] = bi . Conversion from (to) a natural number isdefined by i nat : Bn → N (bv : N → Bn ) with N = [0, 2n ) ⊂ N and nat(b) := n−1 i=0 bi · 2 −1 (bv := nat ). Problems can be constraint by using bit-vector operations as well as arithmetic operations. Let a, b ∈ Bn be two bit-vectors. Then, the bit-vector operation ◦ ∈ {∧, ∨, . . .} is defined by a ◦ b := (a[n − 1] ◦ b[n − 1], . . . , a[0] ◦ b[0]). An arithmetic operation • ∈ {·, +, . . .} is defined by a • b := nat(a) • nat(b). Example 2.13 Let a, b, and c be three bit-vector variables with bit-width n = 3 and (a ∨ b = c) ∧ (a + b = c) an SMT bit-vector instance over these variables. Then, a = (010), b = (001), and c = (011) is a satisfying solution of this instance, since it satisfies each constraint. To solve SMT instances in QF_BV logic either (1) a combination of a traditional SAT solver and a specialized (bit-vector) theory solver is applied (see e.g. [BBC+05, DM06a]), (2) the instance is pre-processed exploiting the higher level of abstraction before the resulting (simplified) instance is bit-blasted to a traditional SAT solver (see e.g. [GD07, BB09]), or (3) a specialized solver that directly works on the bitlevel of the problem is used (see e.g. [DBW+07]). Having an efficient solver available, similar to Boolean SAT the real world problem is transformed into a QF_BV instance. But instead of a description in terms of clauses, the higher level representation in terms of bit-vectors is used. Then, the resulting instance is passed to the solver which is again used as a black box. The higher abstraction which is now available can be exploited to accelerate the solving process.

24

2 Preliminaries

2.3.2.2 QBF Solvers Another generalization of the SAT problem is given by QBF satisfiability. Here, variables of the Boolean function h additionally can be universally and existentially quantified. More formally: Definition 2.14 Let h : Bn → B be a Boolean function over the variables X (usually given in CNF). Then, Q1 X1 . . . Qt Xt h with disjunct Xi ⊂ X and Qi ∈ {∃, ∀} is a Quantified Boolean Formula (QBF). The QBF problem is to find an assignment to the variables of h such that h evaluates to 1 with respect to the quantifiers or to prove that no such assignment exists. Example 2.14 Let ∃x2 , x3 ∀x1 (x1 + x2 + x 3 )(x 1 + x3 )(x 2 + x3 ). Then x2 = 1 and x3 = 1 is a satisfying assignment for the QBF h. The value of x2 ensures that the first clause becomes satisfied, while x3 ensures this for the remaining two clauses for all possible assignments to x1 . Obviously, solving QBF problems is significantly harder than solving pure SAT instances. In fact, it is PSPACE-complete [Pap93]. Nevertheless, QBF enables the formulation of many problems in a more compact way. In this sense, complexity is moved from the problem formulation to the solving engine, i.e. the task can be formulated in a more compact way resulting in a more complex problem to be solved by the solver. However, since usually solving engines are well-engineered with respect to the dedicated problem, this may lead to a faster solving process. Today, recent solvers (e.g. [Bie05, Ben05]) exploit techniques like symbolic skolemization to solve QBF instances (i.e. converting the instance to a normal form which enables simplifications).

2.3.2.3 SWORD Solver Due to the translation of the problem into CNF (or QF_BV logic, respectively), problem-specific knowledge is lost. More illustrative, decisions, implications, and learning schemes can only exploit the Boolean (bit-vector) description. In contrast, with more problem-specific knowledge available, more options exist how to control the search space traversion. This observation is exploited by the problem-specific SAT solver SWORD [WFG+07].2 SWORD represents the problem in terms of so called modules. Each module defines an operation over bit vectors of module variables. Each module variable is a Boolean variable. By this, structural and semantical knowledge is available which can be exploited by special algorithms for each kind of module. Furthermore, this 2 SWORD

has been co-developed by the authors of this book. Even if SWORD is focused on problem-specific knowledge, it can also be used as an SMT solver and already participated in the respective SMT competitions in 2008 [WSD08] and 2009 [JSWD09], respectively.

2.3 Satisfiability Solvers

25

Fig. 2.14 SWORD algorithm

leads to a more compact problem formulation, since representing complex operations in terms of modules substitutes a significant amount of clauses. Example 2.15 Consider an n × n-multiplier. This multiplier can be represented by n2 AND gates and n − 1 adders [MK04]. Furthermore, a single AND gate can be modeled by three clauses and requires θ (n2 ) auxiliary variables. Thus, just to encode the AND gates a CNF with θ (n2 ) auxiliary variables and clauses is required, respectively. In contrast, using SWORD only 3n module variables (for the two inputs and the output of the multiplication) and a single (multiplier) module are needed to represent the whole multiplication. Given a SAT instance including modules, the overall algorithm depicted in Fig. 2.14 is used to solve the problem. This algorithm is similar to the procedure as applied in standard SAT solvers (see Fig. 2.13 on p. 22): While free variables remain (a), a decision is made (c), implications resulting from this decision are carried out (d), and if a conflict occurs, it is analyzed (f). The important difference is that SWORD has two operation levels: The global algorithm controls the overall search process and calls local procedures of modules for decision and implication. Thus, decision making and the implication engine can be adjusted for each type of module. In more detail, the solver first chooses a particular module based on a global decision heuristic (c.1). Then, this module chooses a value for one of its variables according to a local decision heuristic (c.2). Afterwards, the solver calls the local implication procedures (d.2) of all modules that are potentially affected (d.1) by the previous decision or implication. Here, a variable watching scheme similar to the one presented in [MMZ+01] is used which can efficiently determine these modules. The chosen modules imply further assignments and detect conflicts.

26

2 Preliminaries

Due to the two operation levels, problem-specific strategies e.g. for decision making and propagation can be exploited by the modules. For example, decision making can be prioritized so that modules, which are assumed to be “more important” than others, are selected for a decision with a higher priority than less important modules. Furthermore, different modules can be equipped with different strategies. For a more detailed description of SWORD, the reader is referred to [WFG+07, WFG+09]. Therewith, all preliminaries required for this book have been introduced. Besides an introduction of reversible and quantum logic, also the applied core techniques have been briefly described. With that as a basis, the contributions towards a design flow for reversible logic are proposed in the following chapters. Decision diagrams are thereby applied for synthesis (Sect. 3.2), partially for exact synthesis (Sect. 4.3.2), and for verification (Sect. 7.1.2), while Boolean satisfiability is exploited in exact synthesis (especially in Sect. 4.2 and partially in Sect. 5.3 as well as Sect. 6.3), verification (Sect. 7.1.3), and debugging (Sect. 7.2). The extended SAT solvers (i.e. SMT solvers, QBF solvers, and SWORD) are used to improve exact synthesis (Sect. 4.3). Note on Benchmarks In the following chapters, the respective contributions are introduced and evaluated in detail. Different scopes are thereby considered that are also reflected in the benchmark sets used to experimentally evaluate the respective methods. Synthesis approaches are evaluated using large functions (for the proposed heuristic method), small functions (for exact synthesis), and irreversible functions (to evaluate the different embeddings). In contrast, the methods targeting optimization, verification, and debugging work on a given circuit description. Furthermore, different timeouts are applied in the respective evaluations (e.g. exact synthesis normally requires more run-time than heuristic synthesis). As already stated in the introduction, the benchmarks used in this book are publicly available at www.revlib.org. The resulting tools can be obtained under www.revkit.org.

Chapter 3

Synthesis of Reversible Logic

Synthesis is the most important step while building complex circuits. Considering the traditional design flow, synthesis is carried out in several individual steps such as high-level synthesis, logic synthesis, mapping, and routing (see e.g. [SSL+92]). To synthesize reversible logic, adjustments and extensions are needed. For example, further tasks such as embedding of irreversible functions must be added. Furthermore, throughout the whole flow, the restrictions caused by the reversibility (no fanout and feedback) and a completely new gate library must be considered as well. In the last years, first approaches addressing some of these issues have been introduced (see e.g. [SPMH03, MMD03, MD04b, Ker04, MDM05, GAJ06, HSY+06, MDM07]). The first section of this chapter briefly reviews existing methods for the individual steps. However, the research in this area is still at the beginning. So far, the desired behavior of the circuit to be synthesized is given by function descriptions like truth tables or permutations, respectively. As a result, current synthesis methods are applicable to relatively small functions only and often need a significant amount of run-time. This must be improved in order to design larger functions or complex reversible systems in the future. In this book, the wide area of reversible logic synthesis is covered by the following three chapters—each chapter with an own detailed view on a particular aspect. While Chap. 4 introduces exact (i.e. minimal) circuit synthesis, Chap. 5 discusses aspects of embedding in detail. The present chapter builds the basis for them and additionally proposes new methods that allows a fast synthesis of significantly larger functions and more complex circuits, respectively. Since Toffoli circuits as introduced in Sect. 2.1.2 generally build the basis for both, reversible and quantum circuits, in the following, the focus is on synthesis of Toffoli cascades. Nevertheless, quantum circuit synthesis is additionally considered when it is appropriate. As already mentioned, the first part of this chapter builds the basis for all remaining synthesis sections. Here, it is shown how irreversible functions must be embedded into reversible ones before they can be applied to existing synthesis methods. Then, by using the example of the transformation-based approach introduced in [MMD03], one of the previous synthesis methods is described and discussed. Altogether, this briefly summarizes the basic synthesis steps for reversible logic as they exist today. R. Wille, R. Drechsler, Towards a Design Flow for Reversible Logic, DOI 10.1007/978-90-481-9579-4_3, © Springer Science+Business Media B.V. 2010

27

28

3 Synthesis of Reversible Logic

Motivated by this (in particular by the limitations of the current synthesis methods), the second part of this chapter introduces a new synthesis approach [WD09, WD10] that exploits Binary Decision Diagrams (BDDs) [Bry86]. BDDs allow an efficient representation of large Boolean functions that can be mapped into reversible cascades. As a result, for the first time Toffoli circuits for functions containing over 100 variables can be derived efficiently. Finally, how to specify and synthesize more complex reversible circuits at higher abstractions is considered in the third part of this chapter. For this purpose, a new programming language (called SyReC) and a respective hierarchical synthesis approach is presented and evaluated [WOD10].

3.1 Current Synthesis Steps This section illustrates the current synthesis steps that use well-established methods. First, the problem of embedding irreversible functions is considered. Second, the synthesis itself is introduced. For the latter, a widely known approach, namely the transformation-based approach introduced in [MMD03], is used. Most of the remaining synthesis methods apply similar strategies (e.g. [Ker04, GAJ06, MDM07]) or are developed on top of this method (e.g. [MDM05]).

3.1.1 Embedding Irreversible Functions Table 3.1 shows the truth table of a 1-bit adder which is used as an example in this section. The adder has three inputs (the carry-in cin as well as the two summands x and y) and two outputs (the carry-out cout and the sum). The adder obviously is irreversible, since • the number of inputs differs from the number of outputs and • there is no unique input-output mapping. Even adding an additional output to the function (leading to the same number of inputs and outputs) would not make the function reversible. Then, without loss of generality, the first four lines of the truth table can be embedded with respect to reversibility as shown in the rightmost column of Table 3.1. However, since cout = 0 and sum = 1 already appeared two times (marked bold), no unique embedding for the fifth line is possible any longer. The same also holds for the lines marked italic. This has already been observed in [MD04b]. Here, the authors came to the conclusion that at least log2 (μ) additional (garbage) outputs are required to make an irreversible function reversible, where μ is the maximum number of times an output pattern is repeated in the truth table. Since for the adder at most three output patterns are repeated, log2 (3) = 2 additional outputs are required to make the function reversible.

3.1 Current Synthesis Steps Table 3.1 Truth table of an adder

Table 3.2 Truth table of an embedded adder

29 cin

x

y

cout

sum

0

0

0

0

0

0

0

0

1

0

1

0

0

1

0

0

1

1

0

1

1

1

0

0

1

0

0

0

1

?

1

0

1

1

0

1

1

1

0

1

0

?

1

1

1

1

1

1

0

cin

x

y

cout

sum

g1

g2

0

0

0

0

0

0

0

0

0

0

0

1

0

1

1

1

0

0

1

0

0

1

1

0

0

0

1

1

1

0

0

1

0

1

0

0

0

1

0

0

0

1

0

1

1

0

1

1

0

1

1

0

1

0

1

0

0

1

1

1

1

1

0

1

1

0

0

0

1

0

0

0

1

0

0

1

1

1

1

1

1

0

1

0

1

1

1

0

1

0

1

1

0

0

0

1

1

1

0

0

1

1

0

0

1

1

0

1

0

0

1

1

1

1

1

0

0

0

1

0

1

1

1

1

0

1

0

1

Adding new lines causes constant inputs and garbage outputs. The value of the constant inputs can be chosen by the designer. Garbage outputs are by definition don’t cares and thus can be left unspecified leading to an incompletely specified function. However, many synthesis approaches require a completely specified function so that often all don’t cares must be assigned to a concrete value. As a result, the adder is embedded in a reversible function including four variables, one constant input, and two garbage outputs. A possible assignment to the constant as well as the don’t care values is depicted in Table 3.2 (where the original adder function is marked bold). In the following, a synthesis method is introduced assuming a completely specified reversible function as input. However, the concrete embedding of irreversible functions (in particular the concrete assignment to don’t

30

3 Synthesis of Reversible Logic

Table 3.3 MDM procedure line

input

output

1st step

2nd step

3rd step

4th step

5th step

6th step

(i)

abcd

abcd

abcd

abcd

abcd

abcd

abcd

abcd

0

0000

0000

0000

0000

0000

0000

0000

0000

1

0001

0111

0101

0001

0001

0001

0001

0001

2

0010

0110

0110

0110

0010

0010

0010

0010

3

0011

1001

1011

1111

1011

0011

0011

0011

4

0100

0100

0100

0100

0100

0100

0100

0100

5

0101

1011

1001

1101

1101

1101

0101

0101

6

0110

1010

1010

1010

1110

1110

1110

0110

7

0111

1101

1111

1011

1111

0111

1111

0111

8

1000

1000

1000

1000

1000

1000

1000

1000

9

1001

1111

1101

1001

1001

1001

1001

1001

10

1010

1110

1110

1110

1010

1010

1010

1010

11

1011

0001

0011

0111

0011

1011

1011

1011

12

1100

1100

1100

1100

1100

1100

1100

1100

13

1101

0011

0001

0101

0101

0101

1101

1101

14

1110

0010

0010

0010

0110

0110

0110

1110

15

1111

0101

0111

0011

0111

1111

0111

1111

cares) can have a significant impact on the synthesis results (i.e. on the number of gates in the resulting circuit). Thus, this issue is again considered in Chap. 5 which also provides examples showing the effect of different embeddings.

3.1.2 Transformation-based Synthesis In this section, synthesis of reversible logic is exemplarily described using the approach from [MMD03]. The basic idea is to traverse each line of the truth table and to add gates to the circuit until the output values match the input values (i.e. until the identity is achieved). Gates are thereby chosen so that they don’t alter already considered lines. Furthermore, gates are added starting at the output side of the circuit (this is because output values are transformed until the identity is achieved). In the following, the approach is described using the example of the embedded adder from Table 3.2. Table 3.3 shows the respective steps. The first column denotes the truth table line numbers, while the second and third column give the function specification of the adder. For brevity, the inputs 0, cin , x, y and the outputs cout , sum, g1 , g2 are denoted by a, b, c, d, respectively. The remaining columns provide the transformed output values for the respective steps. The approach starts at truth table line 0. Since for this line the input is already equal to the output (both are assigned to 0000), no gate has to be added. In contrast, to match the output with the input in line 1, the values for c and b must be

3.2 BDD-based Synthesis

31

Fig. 3.1 Circuit obtained by transformation-based synthesis

inverted. To this end, two gates MCT({d}, c) (1st step) and MCT({d}, b) (2nd step) are added as depicted in Fig. 3.1. Due to the control line d, this does not affect the previous truth table line. In line 2 and line 3, an MCT({c}, b) as well as an MCT({c, d}, a) is added to match the values of b and a, respectively (step 3 and 4). For the latter gate, two control lines are needed to keep the already traversed truth table lines unaltered. Afterwards, only two more gates MCT({d, b}, a) (5th step) and MCT({c, b}, a) (6th step) are necessary to achieve the input-output identity. The resulting circuit is shown in Fig. 3.1. This circuit consists of six gates and has quantum cost of 18. In [MMD03], further variations of this approach are discussed. In fact, this transformation can also be applied in the inverse direction (i.e. so that the input must match the output) and in both directions simultaneously. Furthermore, in [MDM05] the approach has been extended by the application of templates. These help to reduce the size of the resulting circuits and thus to achieve circuits with lower cost. Having this as a general introduction into synthesis of reversible logic, in the following new synthesis approaches are proposed.

3.2 BDD-based Synthesis The strategy introduced in the last section (namely selecting reversible gates so that the chosen function representation becomes the identity) has been adopted and extended by many other researchers. More precisely, more compact data-structures like decision diagrams [Ker04], positive-polarity Reed-Muller expansion [GAJ06], or Reed-Muller spectra [MDM07] have been applied. But even if complementary approaches are used (e.g. [SPMH03]), so far all approaches are applicable only to relatively small functions, i.e. functions with at most 30 variables [GAJ06]. Moreover, often a significant amount of run-time is needed to achieve these results. Thus, current synthesis methods are limited. These limitations are caused by the underlying techniques. The existing synthesis approaches often rely on truth tables (or similar descriptions like permutations) of the function to be synthesized (e.g. in [SPMH03, MMD03]). But even if alternative data-structures (e.g. the ones mentioned above) are used, the same limitations can be observed. In this section, a synthesis method that can cope with significantly larger functions is introduced. The basic idea is as follows: First, for the function to be synthesized a BDD (see Sect. 2.2.1) is built. This can be efficiently done for large functions using existing well-developed techniques. Then, each node of the BDD is substituted by a cascade of reversible gates. Since BDDs may include shared nodes

32

3 Synthesis of Reversible Logic

causing fanouts (which are not allowed in reversible logic), this may require additional circuit lines. As a result, circuits composed of Toffoli or quantum gates, respectively, are obtained in time and with memory linear to the size of the BDD. Moreover, since the size of the resulting circuit is bounded by the BDD size, theoretical results known from BDDs (see e.g. [Weg00, LL92]), can be transferred to reversible circuits. The experiments show significant improvements (with respect to the resulting circuit cost as well as to the run-time) in comparison to previous approaches. Furthermore, for the first time large functions with more than hundred variables can be synthesized at very low run-time. In the remainder of this section, the BDD-based synthesis approach is introduced as follows: In Sect. 3.2.1, the general idea and the resulting synthesis approach is described in detail. How to exploit BDD optimizations is shown in Sect. 3.2.2, while Sect. 3.2.3 briefly reviews some of the already known theoretical results from reversible logic synthesis and introduces bounds which follow from the new synthesis approach. Finally, in Sect. 3.2.4 experimental results are given.

3.2.1 General Idea In this section, the general idea of the BDD-based synthesis is proposed. The aim of the approach is to determine a circuit realization for a given Boolean function. It is well known that Boolean functions can be efficiently represented by BDDs. Given a BDD G = (V , E), a reversible circuit can be derived by traversing the decision diagram and substituting each node v ∈ V with a cascade of reversible gates. The concrete cascade of gates depends on whether the successors of the node v are terminals or not. For the general case (no terminals), the first row of Table 3.4 shows a substitution with two Toffoli gates or five quantum gates, respectively. The following rows give the substitutions for the remaining cases. These cascades can be applied to derive a complete Toffoli circuit (or quantum circuit, respectively) from a BDD without shared nodes. Example 3.1 Consider the BDD in Fig. 3.2(a). Applying the substitutions given in Table 3.4 to each node of the BDD, the Toffoli circuit depicted in Fig. 3.2(b) results. Remark 3.1 As shown in Table 3.4, an additional (constant) line is necessary if one of the edges low(v) or high(v) leads to a terminal node. This is because of the reversibility which has to be ensured when synthesizing reversible logic. As an example consider a node v with high(v) = 0 (second row of Table 3.4). Without loss of generality, the first three lines of the corresponding truth table can be embedded with respect to reversibility as depicted in Table 3.5(a). However, since f is 0 in the last line, no reversible embedding for the whole function is possible. Thus, an additional line is required to make the respective substitution reversible (see Table 3.5(b)).1 1 For the same reason, it is also not possible to preserve the values for low(v) or high(v), respectively, in the substitution depicted in the first row of Table 3.4.

3.2 BDD-based Synthesis

Table 3.4 Substitution of BDD nodes to reversible/quantum circuits

33

34

3 Synthesis of Reversible Logic

Fig. 3.2 BDD and Toffoli circuit for f = x1 ⊕ x2

Table 3.5 (Partial) Truth tables for node v with high(v) = 0

(b) with additional line

(a) w/o add. line xi

low(f )

f

–

0 xi low(f )

f xi low(f )

0

0

0

0

0 0 0

0 0 0

0

1

1

1

0 0 1

1 0 1

1

0

0

1

0 1 0

0 1 0

1

1

0

?

0 1 1

0 1 1

Based on these substitutions, a method for synthesizing Boolean functions in reversible or quantum logic can be formulated: First, a BDD for function f to be synthesized is created. This can be done efficiently using state-of-the-art BDD packages (e.g. CUDD [Som01]). Next, the resulting BDD G = (V , E) is processed by a depth-first traversal. For each node v ∈ V , cascades as depicted in Table 3.4 are added to the circuit. As a result, circuits are synthesized that realize the given function f .

3.2.2 Exploiting BDD Optimization To build compact BDDs, current state-of-the-art BDD packages exploit several optimization techniques such as shared nodes [Bry86], complement edges [BRB90], or reordering [Bry86, Rud93]. In this section, it is shown how these techniques can be applied to the proposed BDD-based synthesis.

3.2.2.1 Shared Nodes If a node v has more than one predecessor, then v is called a shared node. The application of shared nodes is common for nearly all BDD packages. Shared nodes can be used to represent a sub-formula more than once without the need to rebuild the whole sub-graph. In particular, functions f : Bn → Bm (i.e. functions with more than one output) can be represented more compactly using shared nodes.

3.2 BDD-based Synthesis

35

Fig. 3.3 Substitution for shared nodes without terminals as successors

However, to apply shared nodes in reversible logic synthesis, the output value of a respective node has to be preserved until it is not needed any longer. Considering the substitutions depicted in Table 3.4, this holds for all cases where one of the edges low(v) or high(v) lead to a terminal node. Here, all values of the inputs (in particular of high(v) or low(v) that represent output values of other nodes) are preserved. In contrast, this is not the case for the general case (first row of Table 3.4). Here, only one value (namely the value from the select variable xi ) is preserved. Thus, a modified substitution for shared nodes without terminals as successors is required. Figures 3.3(a) and 3.3(b) show one possible substitution to a reversible cascade and a quantum cascade, respectively. Besides an additional constant circuit line, this requires one (three) additional reversible gates (quantum gates) in comparison to the substitution of Table 3.4. But therefore, shared nodes are supported. Moreover, this substitution also allows to represent the identity of a select variable (last row of Table 3.4) by the respective input line of the circuit (i.e. without any additional gates or lines). Previously, this was not possible, since the value of this circuit line was not necessarily preserved (as an example see Fig. 3.2 where the value of the identity node f gets lost after node f is substituted). Exploiting this, the synthesis algorithm proposed in the last section can be improved as follows: Again a BDD for the function to be synthesized is build which is afterwards traversed in a depth-first manner. Then, for each node v ∈ V , the following checks are performed: 1. Node v represents the identity of a primary input (i.e. the select input) In this case no cascade of gates is added to the circuit, since the identity can be represented by the same circuit line as the input itself. 2. Node v contains at least one edge (low(v) or high(v), respectively) leading to a terminal In this case substitutions as depicted in Table 3.4 are applied, since they often need a smaller number of gates and additionally preserve the values of all input signals. 3. The values of low(v) and high(v) are still needed, since they represent either shared nodes or the identity of an input variable In this case the substitutions depicted in Fig. 3.3 are applied, since they preserve the values of all input signals. 4. Otherwise The substitution as depicted in the first row of Table 3.4 is applied, since no input values must be preserved or a terminal successor occurs, respectively. In

36

3 Synthesis of Reversible Logic

Fig. 3.4 Toffoli circuits for shared BDD

this case, the smaller cascades (with respect to both the number of additional lines and the number of gates) are preferred. Example 3.2 In Fig. 3.4(a) a partial BDD including a shared node f is shown. Since the value of node f is used twice (by nodes f1 and f2 ), an additional line (the second one in Fig. 3.4(b)) and the cascade of gates as depicted in Fig. 3.3 are applied to substitute node f1 . Then, the value of f is still available such that the substitution of node f2 can be applied. The resulting circuit is given in Fig. 3.4(b). Figure 3.4(c) shows the resulting circuit for low(f ) = 0 and high(f ) = 1, i.e. for f representing the identity of xj . In this case no gates for f are added. Instead, the fifth line is used to store the value for both, xj and f . Besides that, the remaining substitutions are equal to the ones described above.

3.2.2.2 Complement Edges Further reductions in BDD sizes can be achieved if complement edges [BRB90] are applied. In particular, this allows to represent a function as well as its negation by a single node only. If there is a complement edge e.g. between v and low(v), then Shannon decomposition with an inverted value of low(v) is applied. To support

3.2 BDD-based Synthesis

37

complement edges in the proposed synthesis approach, adjusted substitutions have to be used that take the inversion of complemented edges into account. Table 3.6 shows the resulting cascades used in the proposed synthesis approach. Note that complements have to be considered only at the low edges of the nodes, since complements at high-edges can be mapped to them and vice versa. In some cases, this leads to larger cascades in comparison to the substitution without complement edges (e.g. compare the second row of Table 3.6 to the first row of Table 3.4). How far this can be compensated by the possible BDD reductions is discussed in the experimental evaluation in Sect. 3.2.4.

3.2.2.3 Reordering of BDDs Finally, different BDD orders may influence the synthesis results. It has been shown that the order of the variables has a high impact to the size of the resulting BDD [Bry86] (see e.g. Fig. 2.11 on p. 19). Since reducing the number of nodes may also reduce the size of the resulting circuits, reordering is considered in this section. In the past, several approaches have been proposed to achieve good orders (e.g. sifting [Rud93]) or to determine exact results (e.g. [DDG00]) with respect to the number of nodes. All these techniques can be directly applied to the BDD-based synthesis approach and need no further adjustments of the already introduced substitutions. Using these optimization techniques (i.e. shared nodes, complement edges, and reordering), in Sect. 3.2.4 it is considered how they influence the resulting Toffoli or quantum circuits, respectively. But before, it is briefly shown how the proposed approach can be used to transfer theoretical results from BDDs to reversible logic.

3.2.3 Theoretical Consideration In the past, first lower and upper bounds for the synthesis of reversible functions containing n variables have been determined. In [MD04b], it has been shown that there exists a reversible function that requires at least (2n / ln 3) + o(2n ) gates (lower bound). Furthermore, the authors proved that every reversible function can be realized with no more than n · 2n gates (upper bound). For a restricted gate library leading to smaller quantum cost and thus only consisting of NOT, CNOT, and two-controlled Toffoli gates (the same as applied for the substitutions proposed here), functions can be synthesized with at most n NOT gates, n2 CNOT gates, and 9 · n · 2n + o(n · 2n ) two-controlled Toffoli gates (according to [SPMH03]). A tighter upper bound of n NOT gates, 2 · n2 + o(n · 2n ) CNOT gates, and 3 · n · 2n + o(n · 2n ) two-controlled Toffoli gates has been proved in [MDM07]. In [PMH08] it has been shown that linear reversible functions are synthesizable with CNOT gates only. Moreover, their algorithm never needs more than Θ(n2 / log n) CNOT gates for any linear function f with n variables.

Table 3.6 Subst. of BDD nodes with complement edge to reversible/quantum circuits

38 3 Synthesis of Reversible Logic

3.2 BDD-based Synthesis

39

Using the synthesis approach proposed in the last sections, reversible circuits for a function f with a size dependent on the number of nodes in the BDD can be constructed. More precisely, let f be a function with n primary inputs which is represented by a BDD containing k nodes.2 Then, the resulting Toffoli circuit consists of at most • k + n circuit lines (since besides the input lines, for each node at most one additional line is added) and • 3 · k gates (since for each node cascades of at most 3 gates are added according to the substitutions of Table 3.4 and Fig. 3.3, respectively). Asymptotically, the resulting reversible circuits are bounded by the BDD size. Since for BDDs many theoretical results exist, using the proposed synthesis approach, these results can be transferred to reversible logic as well. In the following, some results obtained by this observation are sketched. • A BDD representing a single-output function has 2n nodes in the worst case. Thus, each function can be realized in reversible logic with at most 3 · 2n gates (where at most 2 · 2n CNOTs and 2 · 2n Toffoli gates are needed). nodes in the worst case. • A BDD representing a symmetric function has n·(n+1) 2 Thus, each symmetric function can be realized in reversible logic with a quadratic number of gates (more precisely, a quadratic number of CNOTs and a quadratic number of Toffoli gates are needed). • A BDD representing specific functions, like AND, OR, or XOR has a linear size. Thus, there exists a reversible circuit realizing these functions in linear size as well. • A BDD representing an n-bit adder has linear size. Thus, there exists a reversible circuit realizing addition in linear size as well. Further results (e.g. tighter upper bounds for general functions as well as for respective function classes) are also known (see e.g. [Weg00, LL92]). Moreover, in a similar way bounds for quantum circuits can be obtained. However, a detailed analysis of the theoretical results that can be obtained by the BDD-based synthesis is left for future work.

3.2.4 Experimental Results The BDD-based synthesis method together with the suggested improvements has been implemented in C++ on top of the BDD package CUDD [Som01]. In this section, first a case study is given evaluating the effect of the respective BDD optimization techniques on the resulting reversible or quantum circuits. Afterwards, the proposed approach is compared against two previously proposed synthesis methods. 2 For

simplicity, it is assumed that no complement edges are applied.

40

3 Synthesis of Reversible Logic

Benchmarks functions provided by RevLib [WGT+08] (including most of the functions which have been previously used to evaluate existing reversible synthesis approaches) as well as from the LGSynth package [Yan91] (a benchmark suite for evaluating traditional synthesis) have been used. All experiments have been carried out on an AMD Athlon 3500+ with 1 GB of memory. The timeout was set to 500 CPU seconds.

3.2.4.1 Effect of BDD Optimization To investigate the effect of the respective BDD optimization techniques the proposed synthesis approach has been applied to the benchmarks with the respective techniques enabled or disabled. In the following, for each optimization technique (i.e. shared nodes, complement edges, and reordering) the respective results are presented and discussed. Shared Nodes Shared nodes can be enabled or disabled by manipulating the unique table. Then, depending on the respective case, the substitutions of Table 3.4 or additionally of Fig. 3.3 are applied. The results are summarized in Table 3.7. The first two columns give the name of the benchmark (F UNCTION) as well as the number of primary inputs and outputs (PI/PO). Then, the number of resulting circuit lines (n), Toffoli gates (dTof ), or quantum gates (dQua ), as well as the run-time of the synthesis approach (in CPU seconds) is given for the naive approach (denoted by W / O S HARED N ODES) and the approach that exploits shared nodes (denoted by WITH S HARED N ODES). One can clearly conclude that the application of shared nodes leads to better realizations for reversible and quantum logic. Both, the number of lines and the number of gates can be significantly reduced. In particular, for the number of lines this might not be obvious, since additional lines are required to support shared nodes (see Sect. 3.2.2). But due to the fact that shared nodes also decrease the number of terminal nodes (which require additional lines as well), this effect is compensated. Complement Edges Complement edges are supported by the CUDD package and can easily be disabled and enabled. For comparison, circuits from both, BDDs with and BDDs without complement edges (denoted by WITH C OMPL . E DGES and W / O C OMPL . E DGES, respectively), are synthesized. In the latter case, the substitutions shown in Table 3.6 are applied whenever a successor is connected by a complement edge. Shared nodes are also applied, since they make complement edges more beneficial. The results are given in Table 3.8.3 The columns are labeled as described above for Table 3.7. Even if the cascades representing nodes with complement edges are larger in some cases (see Sect. 3.2.2), improvements in the circuit sizes can be observed (see 3 Compared to Table 3.7, also benchmarks are considered for which no result could be determined using the W / O S HARED N ODES approach.

3.2 BDD-based Synthesis

41

Table 3.7 Effect of shared nodes F UNCTION

PI/PO

W/O

n

S HARED N ODES dTof

dQua

WITH

T IME

n

S HARED N ODES T IME

dTof

dQua

7

7

21

<0.01

R EV L IB F UNCTIONS decod24_10

2/4

7

7

21

<0.01

4mod5_8 mini-alu_84

4/1

9

13

36

<0.01

9

13

36

<0.01

4/2

12

21

57

<0.01

11

20

52

<0.01

alu_9

5/1

15

30

73

<0.01

14

29

72

<0.01

rd53_68

5/3

31

85

212

<0.01

20

49

130

<0.01

hwb5_13

5/5

36

105

277

<0.01

32

91

238

<0.01

sym6_63

6/1

23

57

126

0.01

17

34

83

<0.01

hwb6_14

6/6

68

239

618

<0.01

53

167

437

<0.01

rd73_69

7/3

86

301

730

<0.01

38

105

272

<0.01

ham7_29

7/7

75

231

595

<0.01

36

88

224

<0.01

hwb7_15

7/7

136

526

1353

<0.01

84

284

744

<0.01

rd84_70

8/4

194

679

1650

0.01

52

140

373

<0.01

hwb8_64

8/8

277

1132

2903

0.02

129

456

1195

<0.01

sym9_71

9/1

104

325

724

<0.01

35

79

201

<0.01

17

40

98

<0.01

10

19

48

<0.01

LGS YNTH F UNCTIONS xor5

5/1

bw

5/28

125

381

935

0.01

97

286

747

<0.01

9sym

9/1

104

325

724

<0.01

35

79

201

<0.01

e.g. rd84_70, 9sym, or cordic). But in particular for the LGSynth functions, sometimes better circuits result, when complement edges are disabled (see e.g. spla). Here, the larger cascades obviously cannot be compensated by complement edge optimization. In contrast, for quantum circuits in nearly all cases better realizations are obtained with complement edges enabled. A reason for that is that the quantum cascades for nodes with complement edges have the same size as the respective cascades for nodes without complement edges in nearly all cases (see Table 3.4, Fig. 3.3, and Table 3.6, respectively). Thus, the advantage of complement edges (namely the possibility to create smaller BDDs) can be fully exploited without the drawback that the respective gate substitutions become larger. Reordering of BDDs To evaluate the effect of reordering the BDD on the resulting circuit sizes, three techniques are considered: (1) An order given by the occurrences of the primary inputs in the function to be synthesized (denoted by O RIG INAL ), (2) an optimized order achieved by sifting [Rud93] (denoted by S IFTING ), and (3) an exact order [DDG00] which ensures the BDD to be minimal (denoted by E XACT). Again, all created BDDs exploit shared nodes. Furthermore, complement edges are enabled in this evaluation. After applying the synthesis approach,

42

3 Synthesis of Reversible Logic

Table 3.8 Effect of complement edges F UNCTION

PI/PO

W/O

n

C OMPL . E DGES dTof

dQua

WITH

T IME

n

C OMPL . E DGES dTof

dQua

T IME

R EV L IB F UNCTIONS decod24_10

2/4

7

7

21

<0.01

6

11

23

<0.01

4mod5_8

4/1

9

13

36

<0.01

8

16

37

<0.01

mini-alu_84

4/2

11

20

52

<0.01

10

22

49

<0.01

alu_9

5/1

14

29

72

<0.01

11

25

53

<0.01

rd53_68

5/3

20

49

130

<0.01

13

34

75

<0.01

hwb5_13

5/5

32

91

238

<0.01

27

85

201

<0.01

sym6_63

6/1

17

34

83

<0.01

14

29

69

<0.01

hwb6_14

6/6

53

167

437

<0.01

46

157

377

<0.01 <0.01

rd73_69

7/3

38

105

272

<0.01

25

73

162

ham7_29

7/7

36

88

224

<0.01

18

50

82

<0.01

hwb7_15

7/7

84

284

744

<0.01

74

276

665

<0.01

rd84_70

8/4

52

140

373

<0.01

34

104

229

<0.01

hwb8_64

8/8

129

456

1195

<0.01

116

442

1067

<0.01

sym9_71

9/1

35

79

201

<0.01

27

62

153

<0.01

LGS YNTH F UNCTIONS xor5

5/1

10

19

48

<0.01

6

8

8

<0.01

bw

5/28

97

286

747

<0.01

91

317

732

<0.01

ex5p

8/63

276

680

1676

0.02

233

706

1520

0.02

9sym

9/1

35

79

201

<0.01

27

62

153

<0.01

pdc

16/40

648

2074

4844

0.12

631

2109

4803

0.12

spla

16/46

567

1422

3753

0.09

559

1728

3799

0.09

cordic

23/2

76

177

448

0.02

53

109

265

0.02

circuit sizes as summarized in Table 3.9 result. Here again, the columns are labeled as described above. The results show that the order has a significant effect on the circuit size. In particular for the LGSynth functions, the best results are achieved with the exact order. But as a drawback, this requires a longer run-time. Besides that, also in this evaluation, examples can be found, showing that optimization for BDDs not always leads to smaller circuits. Altogether, particularly for larger functions reordering is beneficial. In most of the cases it is thereby sufficient to perform sifting instead of exact reordering, since this lead to results of similar quality but without a notable increase in run-time. For the following evaluations, BDD-based synthesis with shared nodes, complement edges, and sifting has been applied.

3.2 BDD-based Synthesis

43

Table 3.9 Effect of variable ordering F UNCTION PI/PO

O RIGINAL n

S IFTING dQua T IME n

dTof

E XACT dQua T IME n

dTof

dQua T IME

dTof

R EV L IB F UNCTIONS decod24_10 2/4

6

11

23 <0.01

6

11

23 <0.01

6

11

4mod5_8

4/1

8

16

37 <0.01

7

8

18 <0.01

7

8

18 <0.01

mini-alu_84 4/2

10

22

49 <0.01 10

20

43 <0.01 10

20

43 <0.01

alu_9

11

25

53 <0.01

9

22 <0.01

5/1

7

9

rd53_68

5/3

13

34

75 <0.01 13

34

hwb5_13

5/5

27

85

201 <0.01 28

88

sym6_63

6/1

14

29

69 <0.01 14

hwb6_14

6/6

46

157

377 <0.01 46

22 <0.01

7

75 <0.01 13 205

34

0.01 28

88

29

69 <0.01 14

29

159

375 <0.01 46

159

23 <0.01

75 <0.01 205

0.01

69 <0.01 375

0.01

rd73_69

7/3

25

73

162 <0.01 25

73

162 <0.01 25

73

ham7_29

7/7

18

50

82 <0.01 21

61

107 <0.01 21

61

107

0.01

hwb7_15

7/7

74

276

665 <0.01 73

281

653 <0.01 76

278

658

0.01

rd84_70

8/4

34

104

229 <0.01 34

104

229 <0.01 34

104

229 <0.01

hwb8_64

8/8

116

sym9_71

9/1

27

442 1067 <0.01 112 62

153 <0.01 27

449 1047 <0.01 114 62

153 <0.01 27

162 <0.01

440 1051 62

0.03

153 <0.01

LGS YNTH F UNCTIONS xor5

5/1

6

8

bw

5/28

91

317

ex5p

8/63

233

9sym

9/1

27

6

8

732 <0.01 87

8 <0.01

307

706 1520 62

0.02 206

153 <0.01 27

6

8

8 <0.01

693 <0.01 84

8 <0.01

306

667 <0.01

647 1388 62

0.02 206

153 <0.01 27

647 1388 62

0.06

153

0.01

pdc

16/40

631 2109 4803

0.12 619 2080 4781

0.13 619 2087 4850

66.38

spla

16/46

559 1728 3799

0.09 489 1709 4372

0.09 483 1687 4322

86.92

cordic

23/2

0.02 52

0.03 50

53

109

265

101

247

95

237

6.90

3.2.4.2 Comparison to Previous Synthesis Approaches In this section, circuits synthesized by the BDD-based approach are compared to the results generated by (1) the RMRLS approach (described in [GAJ06] using version 0.2 in the default settings) and (2) the RMS approach (based on the concepts of [MDM07] in its most recent version including improved handling of don’t care conditions at the output). Since previous approaches (i.e. RMRLS and RMS) require reversible functions as input, non-reversible functions are embedded into reversible ones (based on the concepts introduced in Sect. 3.1.1). For BDD-based synthesis, the original function description has been used which automatically leads to an embedding. The results are summarized in Table 3.10. The first columns give the name as well as the number of the primary inputs (PI) and primary outputs (PO) of the original function. In the following columns, the number of lines (n), the gate count (dTof ),

PI/PO

2/4 4/1 4/2 5/1 5/3 5/5 6/1 6/6 6/6 7/3 7/7 7/7 8/4 8/8 9/1 9/9 12/12 12/12 13/13 13/13 15/15

F UNCTION

R EV L IB F UNCTIONS decod24_10 4mod5_8 mini-alu_84 alu_9 rd53_68 hwb5_13 sym6_63 mod5adder_66 hwb6_14 rd73_69 hwb7_15 ham7_29 rd84_70 hwb8_64 sym9_71 hwb9_65 cycle10_2_61 plus63mod4096_79 plus127mod8192_78 plus63mod8192_80 ham15_30

4 5 5 5 7 5 7 6 6 9 7 7 11 8 10 9 12 12 13 13 15

11 9 21 9 – – 36 37 – – – – – – – – 26 – – – –

55 25 173 49 – – 777 529 – – – – – – – – 1435 – – – –

497.51 0.86 495.61 122.48 >500.00 >500.00 485.47 494.46 >500.00 >500.00 >500.00 >500.00 >500.00 >500.00 >500.00 >500.00 491.87 >500.00 >500.00 >500.00 >500.00

P REVIOUS A PPROACHES RMRLS [GAJ06] dTof QC T IME n

7 5 36 9 221 42 15 35 100 1344 375 26 124 229 27 2021 41 24 25 28 –

19 9 248 25 2646 214 119 151 740 20779 3378 90 8738 3846 201 23311 1837 4873 9131 9183 – <0.01 <0.01 <0.01 0.01 0.14 0.01 0.13 0.06 0.04 1.93 0.18 0.09 9.92 0.90 3.98 1.45 26.17 17.74 57.16 57.19 >500.00

RMS [MDM07] dTof QC T IME

Table 3.10 Comparison of BDD-based synthesis to previous methods

6 7 10 7 13 28 14 32 46 13 73 21 34 112 27 170 39 23 25 25 45

n

11 8 20 9 34 88 29 96 159 73 281 61 104 449 62 699 78 49 54 53 153

dTof

27 24 60 29 98 276 93 292 507 217 909 141 304 1461 206 2275 202 89 98 97 309

QC

23 18 43 22 75 205 69 213 375 162 653 107 229 1047 153 1620 164 79 86 87 246

dQua

BDD- BASED S YNTHESIS

<0.01 −32 <0.01 −7 <0.01 −130 0.01 −27 <0.01 – 0.01 – <0.01 −708 <0.01 −316 <0.01 – <0.01 – <0.01 – <0.01 – <0.01 – 0.01 – <0.01 – 0.02 – 0.09 −1271 0.08 – 0.21 – 0.20 – 1.25 –

T IME

Δ QC (RMRLS)

4 9 −205 −3 −2571 −9 −50 62 −365 −20617 −2725 17 −8509 −2799 −48 −21691 −1673 −4794 −9045 −9096 –

Δ QC (RMS)

44 3 Synthesis of Reversible Logic

PI/PO

23/2

24/109

39/3

41/35

cordic

cps

apex2

seq

128/28

16/46

spla

ex4p

16/40

pdc

65/65

9/1

9sym

117/88

8/63

ex5p

apex5

5/28

bw

e64

5/1

xor5

LGS YNTH F UNCTIONS

F UNCTION

Table 3.10 (Continued)

∼

∼

∼

∼

∼

∼

∼

∼

∼

∼

∼

∼

∼

∼

∼

∼

∼

∼

∼

∼

–

∼

∼

10

27

6

∼

∼

∼

∼

∼

∼

∼

∼

∼

–

∼

∼

387

∼

∼

∼

∼

∼

∼

∼

∼

∼

>500.00

∼

∼

484.11

P REVIOUS A PPROACHES RMRLS [GAJ06] dTof QC T IME n

∼

∼

∼

∼

∼

∼

∼

∼

∼

27

∼

∼

8

∼

∼

∼

∼

∼

∼

∼

∼

∼

201

∼

∼

68

RMS [MDM07] dTof QC

∼

∼

∼

∼

∼

∼

∼

∼

∼

4.00

∼

∼

0.01

T IME

87

6

510

1147

195

1617

498

930

52

489

619

27

206

n

8

1277

3308

387

5990

1746

2676

101

1709

2080

62

647

307

dTof

8

4009

11292

907

19362

5922

8136

325

5925

6500

206

1843

943

QC

4435

6301

247

4372

4781

153

1388

693

8

dQua

3093

8387

713

14259

BDD- BASED S YNTHESIS

0.03

0.14

0.04

1.14

0.24

0.10

0.02

0.10

0.14

<0.01

0.02

<0.01

<0.01

T IME

–

–

–

–

–

–

–

–

–

–

–

–

−379

Δ QC (RMRLS)

–

–

–

–

–

–

–

–

–

−48

–

–

−60

Δ QC (RMS)

3.2 BDD-based Synthesis 45

46

3 Synthesis of Reversible Logic

the quantum cost (QC), and the synthesis time (T IME) for the respective approaches (i.e. RMRLS, RMS, and the BDD- BASED S YNTHESIS) are reported.4 For BDDbased synthesis, additionally the resulting number of gates (and thus the quantum cost) when directly synthesizing quantum gate circuits is given in the column denoted by dQua . Furthermore, a “∼” denotes that an embedding needed by the previous synthesis approaches could not be created within the given timeout. Finally, the last two columns (Δ QC) give the absolute difference of the quantum cost for the resulting circuits obtained by the BDD-based quantum circuit synthesis and the RMRLS and RMS approach, respectively. As a first result, one can conclude that for large functions to be synthesized it is not always feasible to create a reversible embedding needed by the previous approaches. Moreover, even if this is possible, both RMRLS and RMS need a significant amount of run-time to synthesize a circuit from the embedding. As a consequence, for most of the LGSynth benchmarks no result can be generated within the given timeout. In contrast, the BDD-based approach is able to synthesize circuits for all given functions within a few CPU seconds. Furthermore, although BDD-based synthesis often leads to larger circuits with respect to gate count and number of lines, the resulting quantum cost are significantly lower in most of the cases (except for decod24_10, 4mod5_8, mod5adder_66, and ham7_29). As an example, for plus63mod4096_79 the BDD- BASED S YNTHESIS synthesizes a circuit with twice the number of lines but with two orders of magnitude fewer quantum cost in comparison to RMS. In the best cases (e.g. hwb9_65) a reduction of several thousands in quantum cost is achieved. Note that quantum costs are more important than gate count, since they consider gates with more control lines to be more costly. Thus, even if the total number of circuit lines that have been added by the BDD- BASED S YNTHESIS is higher than by previous approaches, significant improvements in the quantum cost are obtained. Furthermore, reversible logic for functions with more than 100 variables can be automatically synthesized. How to reduce the number of circuit lines is addressed later in Sect. 6.2.

3.3 SyReC: A Reversible Hardware Language Besides synthesis of reversible functions, also how to realize more complex circuits has to be addressed in order to provide an efficient design flow. Thus, synthesis of reversible logic has to reach a level which allows the description of circuits at higher abstractions. For this purpose, programming languages can be exploited. Considering traditional synthesis, approaches using languages like VHDL [LSU89], SystemC [GLMS02], or SystemVerilog [SDF04] have been established to specify and subsequently synthesize circuits. Even if first programming languages are also available in the reversible domain (see e.g. [Abr05, PHW06, YG07]), so far they 4 T IME for BDD- BASED S YNTHESIS includes both, the time to build the BDD as well as to derive the circuit from it.

3.3 SyReC: A Reversible Hardware Language

47

only have been used to design reversible software. Similar approaches for reversible circuit synthesis are still missing. In this section, the programming language SyReC is proposed intended to specify and afterwards to automatically synthesize reversible logic. For this purpose, Janus [YG07]—an existing language designed to specify reversible software—is used as a basis and enriched by new concepts as well as operations aimed to specify reversible circuits. A hierarchical approach is presented that automatically transforms the respective statements and operations of the new programming language into a reversible circuit. Experiments show that complex circuits can be efficiently generated with the help of SyReC. Moreover, a comparison to the BDD-based synthesis approach presented in the previous section shows the advantages of SyReC, if more complex circuits instead of single functions should be synthesized. The remainder of this section is structured as follows: The SyReC programming language as well as the new concepts, operations, and restrictions applied for hardware synthesis are introduced in Sect. 3.3.1. Section 3.3.2 describes the hierarchical synthesis approach and explains in detail how reversible circuits specified in SyReC can be generated. Finally, experimental results and conclusions are given in Sect. 3.3.3.

3.3.1 The SyReC Language As mentioned above, Janus [YG07] is used as a basis for the programming language SyReC to specify reversible systems to be synthesized as circuits. This section briefly reviews the syntax of the Janus language. Afterwards, the new concepts and operations added to address circuit synthesis are introduced.

3.3.1.1 The Software Language Janus Janus is a reversible language that is simple but yet powerful enough to design practical reversible software systems [YG07]. It provides fundamental constructs to define control and data operations while still preserving reversibility. Figure 3.5 shows the syntax of Janus. Each Janus program (denoted by P) consists of variable declarations (denoted by D) and procedure declarations. The variables have non-negative integer values and are denoted by strings. They can be grouped as arrays. New variables are initially assigned to 0. Constants are denoted by c. Each procedure consists of a name (id) and a sequence of statements (denoted by S) including operations, reversible conditionals, reversible loops, as well as call and uncall of procedures (lines 4 to 7 in Fig. 3.5). Variables within statements are denoted by V . In the following, it is distinguish between reversible assignment operations (denoted by ⊕) and (not necessarily reversible) binary operations (denoted by ). The former ones assign values to a variable on the left-hand side. Therefore, the respective variable must not appear in the expression on the right-hand side. Furthermore,

48

3 Synthesis of Reversible Logic

only a restricted set of assignment operations exists, namely increase (+ =), decrease (− =), and bit-wise XOR (ˆ =), since they preserve the reversibility (i.e. it is possible to compute these operations in both directions). In particular, the bit-wise XOR is of interest because aˆ = b is equal to an assignment a = b if a is equal to 0. In contrast, binary operations, i.e. arithmetic (+, ∗, /, %, ∗/), bit-wise (&, |, ˆ), logical (&&, ||), and relational (<, >, =, ! =, <=, >=) operations, may not be reversible. Thus, they can only be used in right-hand expressions which preserve (i.e. do not modify) the values of the respective inputs. In doing so, all computations remain reversible since the input values can be applied to revert any operation. For example, to specify a multiplication (i.e. a ∗ b) in Janus, a new free variable c must be introduced which is used to store the product (i.e. cˆ = a ∗ b is applied). In comparison, to common (irreversible) programming languages this forbids statements like a = a ∗ b. Having this as basis, Janus can be used to specify reversible programs and execute them in a reversible manner (i.e. forward and backward).

3.3.1.2 The Hardware Language SyReC In the following, the programming language SyReC for synthesis of reversible circuits is described. Janus is thereby used as a basis and enriched by further concepts (e.g. declaring circuit signals of different bit-width) and operations (e.g. bit-access and shifts). Besides that, some restrictions are applied (e.g. dynamic loops are forbidden in hardware). Incorporating all these aspects, a syntax of a programming language for reversible circuit synthesis as depicted in Fig. 3.6 results. More precisely, the following extensions and restrictions have been applied: • The declaration of variables has been extended so that the designer can declare variables with different bit-widths (line 2). • Arrays are not allowed. • Operators to access single bits (x.N), a range of bits (x.N:N), as well as the size (#V) of a variable, respectively, have been added (line 3 and line 4). • Since loops must be completely unrolled during synthesis, the number of iterations has to be available before compilation. That is, dynamic loops (defined by expressions) are not allowed (line 7). • Macros for the SWAP operation (<=>) (line 5) as well as for the for-loop statement (line 8) have been added.5 • Further operations used in hardware design (e.g. shifts <) have been added (line 10 and line 14). Example 3.3 Figure 3.7 shows a simple Arithmetic Logic Unit (ALU) illustrating the core concept of the resulting hardware programming language. The basic arithmetic operations can be thereby applied directly. Furthermore, control variables can be defined with a lower bit-width than data variables. 5 These extensions are not necessarily needed (i.e. they can also be expressed by the existing operations), but they allow a more intuitive programming of reversible circuits.

3.3 SyReC: A Reversible Hardware Language

49

Fig. 3.5 Syntax of the software language Janus

(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11)

P ::= D∗ (procedure id S+ )+ D ::= x | x[c] V ::= x | x[E] S ::= V ⊕ = E | if E then S else S fi E | from E do S loop S until E | call id | uncall id | skip E ::= c | V | (E E) ⊕ ::= + | − | ˆ ::= ⊕ | ∗ | / | % | ∗/ | & | | | && | | <|>|=|!=|<=|>=

Fig. 3.6 Syntax of the hardware language SyReC

(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14)

P ::= D∗ (procedure id S+ )+ D ::= x | x ( c ) V ::= x | x.N:N | x.N N ::= c | #V S ::= V < => V | V ⊕ = E | if E then S else S fi E | from N do S loop S until N | for N do S until N | call id | uncall id | skip E ::= N | V | (E E) | (E < N) ⊕ ::= + | − | ˆ ::= ⊕ | ∗ | / | % | ∗/ | & | | | && | | < | > | = | ! = | <= | >= < ::= << | >>

Fig. 3.7 SyReC example: ALU

op ( 2 ) x0 x1 x2 procedure a l u i f ( op = 0 ) t h e n x0 ^= ( x1 + x2 ) else i f ( op = 1 ) t h e n x0 ^= ( x1 − x2 ) else i f ( op = 2 ) t h e n x0 ^= ( x1 ∗ x2 ) else x0 ^= ( x1 ^ x2 ) f i ( op = 2 ) f i ( op = 1 ) f i ( op = 0 )

50

3 Synthesis of Reversible Logic

In contrast to previous approaches, this allows a much easier specification of (complex) reversible circuits. Having this, the next section describes how circuits can be synthesized from this representation.

3.3.2 Synthesis of the Circuits Using the language introduced above it is possible to specify reversible circuits on a higher level. As demonstrated by Example 3.3 this particularly allows to design complex circuits in an easier way than e.g. by truth tables or decision diagrams. Nevertheless, the specified circuits still need to be synthesized. To this end, a hierarchical synthesis method is proposed that uses existing realizations of the individual operations (i.e. building blocks) and combines them so that the desired circuit results. More precisely, the approach (1) traverses the whole program and (2) adds cascades of reversible gates to the circuit to be synthesized for each statement or expression, respectively. In the following, the individual mappings of the operations and expressions to the respective reversible cascades are described.

3.3.2.1 Reversible Assignment Operations As introduced in Sect. 3.3.1, reversible assignment operations include those which are reversible even if they assign a new value to the variable on the left-hand side of a statement. In the following, the notation as depicted in Fig. 3.8(a) is used to denote such an operation in a circuit structure.6 Solid lines represent the variable(s) on the right-hand side of the operation, i.e. the variable(s) whose values are preserved. The simplest reversible assignment operation is the bit-wise XOR (e.g. x1 ˆ = x2 ). For Boolean variables, this operation can be synthesized by a single Toffoli gate as shown in Fig. 3.8(b). If variables with a bit-width greater than 1 are applied, then a Toffoli gate has to be analogously applied for each bit. To synthesize the increase operation (e.g. a+ = b), a modified addition network is added. In the past, several realizations of addition in reversible logic have been investigated. In particular, it is well known that the minimal realization of a onebit adder consists of four Toffoli gates (see e.g. [WGT+08]). Thus, cascading the required number of one-bit adders is a possible realization. But since every onebit adder also requires one constant input, this is a very poor solution with respect to circuit lines. In contrast, heuristic realizations exist that require a fewer number of additional lines (see e.g. [TK05]). Here, a realization with only one additional line (which additionally can be reused for any further addition operation) is used. A cascade showing this realization for a 3-bit addition is depicted in Fig. 3.8(c). Nevertheless, any other adder realization can be applied as well. 6 Figure 3.8(a) shows the notation for a single bit operation. For larger bit-widths the notation is extended accordingly.

3.3 SyReC: A Reversible Hardware Language

51

Fig. 3.8 Realization of reversible assignment operations

Fig. 3.9 Realization of binary operations

Finally, the mapping for the decrease operation is left (e.g. a− = b). Here, also the realization from Fig. 3.8(c) is applied, which is fed with a negated variable value.

3.3.2.2 Binary Operations Binary operations include operations that are not necessarily reversible so that its inputs have to be preserved to allow a (reversible) computation in both directions. To denote such operations, in the following the notation depicted in Fig. 3.9(a) is used. Again, solid lines represent the variable(s) whose values are preserved (i.e. in this case the input variables). Synthesis of irreversible functions in reversible logic is not new so that for most of the respective operations already a reversible circuit realization exists. Additional

52

3 Synthesis of Reversible Logic

lines with constant inputs are thereby applied to make an irreversible function reversible (see e.g. Sect. 3.1.1). As an example, Fig. 3.9(b) shows a reversible cascade that realizes an AND operation. As can be seen, this requires one additional circuit line with a constant input 0. Similar mappings exist for all other operations. However, since binary operations can be applied together with reversible assignment operations (e.g. cˆ = a&b), sometimes a more compact realization is possible. More precisely, additional (constant) circuit lines can be saved (at least for some operations), if the result of a binary operation is applied to a reversible assignment operation. As an example, Fig. 3.9(c) shows the realization for cˆ = a&b where no constant input is needed but the circuit line representing c is used instead. However, such a “combination” is not possible for all operations. As an example, Fig. 3.9(d) shows a two-bit addition whose result is applied to a bit-wise XOR, i.e. cˆ = a + b. Here, removing the constant lines and directly applying the XOR operation on the lines representing c would lead to a wrong result. This is because intermediate results are stored at the lines representing the sum. Since these values are reused later, performing the XOR operation “in parallel” would destroy the result. Thus, to have a combined realization of a bit-wise XOR and an addition, a concrete embedding for this case must be generated. Since finding and synthesizing respective embeddings for all affected operations and combinations, respectively, is a non-trivial task, a more detailed consideration of this aspect is left for future work. So far, constant lines are applied to realize the desired functionality. In this sense, most of the binary operations (in particular the bit-wise, logical, and relational operations as well as the addition) are synthesized. Besides that, the realization of the multiplication is of interest. A couple of possible ways are described in [OWDD10]. Figure 3.9(e) briefly shows how multiplication is realized by the proposed synthesis method. As can be seen, partial products are applied. Considering one of the factors a, each time a respective bit of this factor (denoted by ai ) is equal to 1, the respective partial product is added to the product. This allows to reuse the increase realization introduced in the previous section.

3.3.2.3 Conditional Statements, Loops, Call/Uncall Finally, the realization of control operations as reversible cascades is considered. Loops and procedure calls/uncalls can be realized in a straightforward way. More precisely, loops are realized by simple cascading (i.e. unrolling) the respective statements within a loop block for each iteration. Since the number of iterations must be available before synthesis (see Sect. 3.3.1), this results in a finite number of statements which is subsequently processed. Call and uncall of procedures are handled similarly. Here, the respective statements in the procedures are cascaded together. To realize conditional statements (e.g. the one shown in Fig. 3.10(a)), two variants are proposed. Figure 3.10(b) shows the first one, which is realized in three steps: 1. All variables in the then- or else-block, respectively, which potentially are assigned with a new value (i.e. that are on the left-hand side of a reversible assign-

3.3 SyReC: A Reversible Hardware Language

53

Fig. 3.10 Realization of an if-statement

ment operation) are duplicated. This respectively requires an additional circuit line with constant input 0. 2. The statements in the respective blocks are mapped to reversible cascades. The duplications introduced in the last step are thereby applied to intermediately store the results of the then-block and the original values of the variables in the elseblock, respectively. 3. Depending on the result of the if-statement e, the respective values of the duplicated lines and the original lines are swapped. More precisely, in the example of Fig. 3.10(a) the value of a is swapped with its (newly assigned) duplication iff e evaluates to 1. Analogously, iff e evaluates to 0 the (newly assigned) value of c is passed through. The second realization of a conditional statement is depicted in Fig. 3.10(c). In contrast to the previous one, here no duplications (and therewith no additional circuit lines) are required. Instead, control lines are added to all gates in the realization of the respective then- and else-block. Thus, the operations are computed iff the expression e is assigned to 1 or 0, respectively. A NOT gate (i.e. a Toffoli without control lines) is thereby used to flip the value of e so that the gates of the else block can be “controlled” as well. Having both realizations, it is up to the designer which one should be used during synthesis. Using the first realization leads to additional circuit lines (particularly in quantum logic a restricted resource). This is not in case in the second realization; however, here due to the additional control lines both the quantum cost and the transistor cost of the circuit significantly increase. Besides other aspects, this is also evaluated in the experiments in the next section.

3.3.3 Experimental Results The proposed synthesis approach for the programming language SyReC has been implemented in C++. In this section, experimental results obtained by this approach are provided. In particular, the different realizations of conditional statements are evaluated in more detail. Furthermore, the results obtained by the pro-

54

3 Synthesis of Reversible Logic

posed approach are compared to the BDD-based synthesis method introduced in Sect. 3.2. As benchmarks a couple of programs are used including a simple arithmetic logic unit (denoted by alu; see also Fig. 3.7), a program computing the average of 8 or 16 values (denoted by avg8 and avg16), a logic unit applying bit-wise operations instead of arithmetic (denoted by lu), as well as an arbiter with 8 clients (denoted by arb8). Thus, results obtained by programs including arithmetic (alu, avg8, and avg16) as well as bit-wise operations (lu and arb8) have been evaluated. All experiments have been carried out on an AMD DualCore Athlon 3 GHz machine with 32 GB of main memory. The time-out was set to 500 CPU seconds. In a first evaluation the effect of different if-statement realizations was considered in detail. The results are presented in Table 3.11(a). The first column gives the name of the benchmark followed by the applied bit-width of data variables (denoted by BW) and the resulting number of primary inputs (denoted by PI). The following columns give the number of constant input lines (CI), the number of gates (d), the quantum cost (QC), and the transistor cost (TC) of the circuits obtained using the if-realization with additional circuit lines (denoted by IF - STM . W / ADD . LINES) or without additional circuit lines (denoted by IF - STM . W / O ADD . LINES), respectively. Finally, the run-time is reported for both approaches in column T IME. The results confirm the discussion from the last section. If additional circuit lines are applied, the respective cost can be significantly reduced. In comparison to the realization without additional circuit lines for if-statements, approx. 80% of the quantum costs and at least for alu and lu more than 50% of the transistor costs can be saved (this does not hold for the avg benchmarks since they do not include ifstatements). In contrast, this leads to a significant increase in the number of constant inputs. Finally, the results are compared to the BDD-based synthesis approach. Here, a function given as binary decision diagram is used as input. Thus, the circuits obtained by SyReC are extracted as BDDs and re-synthesized based on the concepts introduced in Sect. 3.2.7 The results are given in Table 3.11(b) using the same denotation as described above. As can be clearly seen, the proposed approach outperforms the BDD-based synthesis in all objectives: Circuits with significantly less number of gates, quantum cost, and transistor cost, respectively, are synthesized in much less run-time (only the small arb8 is an exception). Moreover, in particular for the benchmarks including arithmetic (i.e. alu and awg) for large bit-widths no circuit can be synthesized within the given time-out. This can be explained by the fact, that in particular for the multiplication no efficient representation as BDD exists. Thus, for these examples the BDD-based approach suffers from memory explosion. Altogether, SyReC allows the specification of complex circuits that are hard to describe in terms of a decision diagram or truth table, respectively. Afterwards, the specified circuits can efficiently be synthesized. 7A

similar comparison to further work (e.g. [GAJ06, MDM07]) was not possible since due to memory limitations the respective benchmarks cannot be represented in terms of truth tables which is required by these approaches.

3.3 SyReC: A Reversible Hardware Language

55

Table 3.11 Experimental results (a) Applying the programming language SyReC B ENCH BW

PI

IF - STM . W / ADD . LINES

CI

453 2069

T IME CI

5840 0.03 s

QC

d

41

TC

T IME

8

alu

16

50 121 1345 7377 19856 0.03 s

alu

32

98 233 4473 27785 72464 0.03 s 137 4284 155677 166136 0.03 s

8

65

TC

alu

avg8

26

QC

d

IF - STM . W / O ADD . LINES

408

11125

13208 0.02 s

73 1252

40749

45240 0.03 s

72

11

405

885

4200 0.01 s

11

405

885

avg8

16 144

19

861 1853

8872 0.01 s

19

861

1853

8872 0.01 s

avg8

32 288

35 1773 3789 18216 0.01 s

35 1773

3789

18216 0.01 s

12

12

avg16

8 136

754

1654

7832 0.01 s

avg16

16 272

20 1602 3462 16536 0.01 s

20 1602

3462

16536 0.01 s

avg16

32 544

36 3298 7078 33944 0.01 s

36 3298

7078

33944 0.01 s

64

164

392

1328 0.02 s

40

119

1960

2960 0.02 s

744

2544 0.02 s

72

215

3768

5584 0.02 s

4976 0.03 s 136

407

7384

10832 0.02 s

24

746

800 0.44 s

lu

8

lu

16

50 120

308

lu

32

98 232

596 1448

arb8

26

754 1654

1

16

37

80

296

7832 0.01 s

4200 0.01 s

640 0.45 s

1

(b) BDD-based synthesis B ENCH

BW

PI

BDD-based synthesis CI

QC

d

TC

T IME

alu

8

26

768

3560

11196

70792

0.06 s

alu

16

50

541099

2842702

9380494

57530752

283.63 s

alu

32

98

–

–

–

–

>500.00 s

avg8

8

72

2933

10449

36581

217240

4.61 s

avg8

16

144

–

–

–

–

>500.00 s

avg8

32

288

–

–

–

–

>500.00 s

avg16

8

136

7410

25454

89938

532424

9.61 s

avg16

16

272

–

–

–

–

>500.00 s

avg16

32

544

–

–

–

–

>500.00 s

lu

8

26

111

331

823

5928

0.01 s

lu

16

50

215

651

1623

11688

0.03 s

lu

32

98

423

1291

3223

23208

0.08 s

1

16

15

49

101

824

0.01 s

arb8

56

3 Synthesis of Reversible Logic

3.4 Summary and Future Work Having automated synthesis methods is crucial in the design of reversible or quantum circuits. In this chapter, the current synthesis steps (including embedding and actual synthesis) have been described. Even if only a selected approach has been considered in detail, it was illustrated and discussed that synthesis of reversible logic currently is still at the beginning. Most of the existing methods are not able to synthesize large Boolean functions—not to mention complex reversible systems. One contribution towards synthesis of significantly larger circuits was made in the second part of this chapter. Here, BDDs representing the function to be synthesized are constructed whose nodes afterwards are substituted with a cascade of Toffoli or quantum gates, respectively. While previous approaches are only able to handle functions with up to 30 variables at high run-time, the BDD-based approach can synthesize circuits for functions with more than hundred variables in just a few CPU seconds. Furthermore, with respect to quantum cost (i.e. number of quantum gates), significantly smaller realizations are obtained. Due to these promising results, BDD-based synthesis should be subject to further research. In particular, a detailed analysis of the theoretical results that can be obtained by the BDD method is left. Section 3.2.3 gave a first sketch. However, BDDs are very well understood so that for sure many more results can be transferred. Furthermore, it would be of interest to evaluate the effect of the proposed approach if adjusted cost functions for reordering as well as other decompositions (e.g. positive Davio or negative Davio) are applied. Besides that, synthesis of reversible logic should reach the “next” level, i.e. the system level. Therefore, reversible hardware programming languages are needed. SyReC, as introduced in the third part of this chapter, provides a first approach in this direction. Using this language in combination with the proposed hierarchical synthesis approach enables synthesis of more complex reversible circuits for the first time. Nevertheless, the circuits resulted from both, BDD-based as well as SyReCbased synthesis, still require a notable amount of additional circuit lines. Depending on the technology this might be a drawback (e.g. for quantum systems where the number of lines or qubits, respectively, is limited). Thus, how to reduce these number of lines in a circuit is an important question. Section 6.2 introduces a postprocess approach which addresses this issue. Besides that, finding embeddings leading to fewer additional circuit lines (e.g. for the binary operations in SyReC) is an important task for future work.

Chapter 4

Exact Synthesis of Reversible Logic

In contrast to the heuristic approaches introduced in the last chapter for synthesis of reversible logic, exact methods determine a minimal solution, i.e. a circuit with a minimal number of gates or quantum cost, respectively. Ensuring minimality often causes an enormous computational overhead and thus exact approaches are only applicable to relatively small functions. Nevertheless, it is worth to consider exact methods, since they • allow finding smaller circuits than the currently best known realizations, • allow the evaluation of the quality of heuristic approaches, and • allow the computation of minimal circuits as building blocks for larger circuits. For example, improving heuristic results by 10% is significant, if this leads to optimal results, but marginal if the generated results are still factors away from the optimum. Conclusions like this are only possible, if the optimum is available. Another aspect is the computation of building blocks that can be reused to synthesize larger designs. For example, the substitutions used in the last chapter for the BDDbased synthesis have been generated using exact approaches. However, only very little research has been done in exact synthesis of reversible logic so far. A method based on a depth-first traversal with iterative deepening that uses circuit equivalences to rewrite a limited set of gates has been presented in [SPMH03]. The authors of [YSHP05] introduce an exact algorithm based on group theory. But for both approaches, only results for functions with up to three variables are reported. Furthermore, in [HSY+06] another exact synthesis method based on a reachability analysis has been proposed which is geared towards quantum gates. However, also here only functions with three and a couple of functions with four variables can be handled in a significant amount of run-time. This chapter proposes methods based on Boolean satisfiability (SAT) that allow a faster exact synthesis and that is applicable for functions with up to six variables. The general idea is as follows: The synthesis problem is formulated as a sequence of decision problems. Then, each decision problem is encoded as a SAT instance and checked for satisfiability using an off-the-shelf SAT solver. If the instance is unsatisfiable, then no realization with d gates exists and a check for another d value R. Wille, R. Drechsler, Towards a Design Flow for Reversible Logic, DOI 10.1007/978-90-481-9579-4_4, © Springer Science+Business Media B.V. 2010

57

58

4 Exact Synthesis of Reversible Logic

is performed. Otherwise, the circuit can be obtained from the satisfying assignment. Minimality is ensured by iteratively increasing d starting with d = 1. In the following, the main flow and the respective SAT encodings for Toffoli circuit synthesis [GCDD07, GWDD09a] as well as for quantum logic synthesis [GWDD08, GWDD09b] are introduced in detail in Sects. 4.1 and 4.2, respectively. Since nowadays very powerful techniques for solving SAT instances exist (see Sect. 2.3), already this enables efficient exact synthesis of reversible functions. However, further improvements are possible, if (1) the problem is formulated and solved on the SMT level [WGSD08], (2) additional knowledge provided by the dedicated solving engine SWORD is exploited [WG07, GWDD09a], or (3) quantified Boolean satisfiability is used [WLDG08]. The respective encodings and methods are described in Sect. 4.3. The last (and most efficient) method also has been applied and evaluated to synthesize reversible circuits including Fredkin and Peres gates. Finally, the chapter is concluded and future work is sketched in Sect. 4.4.

4.1 Main Flow In this section, the main concepts of the exact synthesis algorithm for reversible and quantum logic are presented. The basic idea is as follows: Given a reversible function f to be synthesized. Then, the exact synthesis of f is formulated as a sequence of decision problems. In each iteration, it is checked if for the reversible function f and a natural number d a circuit with exactly d gates exists. Here, options for specifying and solving the decision problem as well as finding the optimal value for d exist. The decision problem is encoded and solved by using SAT techniques which is described in the next section in detail. To find the optimal value for d, i.e. to determine a d where the resulting circuit has the minimal number of gates, a possible approach is to start searching a solution with d = 1. If there is no solution, i.e. the decision problem returns false, the number of gates (i.e. d) is incremented until one of the remaining decision problems becomes true. Following this procedure, minimality is ensured. Obviously, it is also possible to choose another technique to reach the optimal d. For example, upper or lower bounds can be exploited. However, for the exact synthesis problem at hand the following observation holds that is first illustrated with an example and afterwards formulated in a lemma. Example 4.1 Consider the reversible function in Fig. 4.1(a). For this function two Toffoli circuits are shown in Fig. 4.1(b). By exhaustive enumeration it has been proven that, even if there are realizations including d = 2 gates and d = 4 gates, no realization with d = 3 gates exist. Hence, if a realization with d gates has been found, minimality cannot be shown by only proving that there is no realization with d − 1 gates. However, for Toffoli circuits it is sufficient to prove that there are no realizations with d − 1 and d − 2 gates as the following lemma shows:

4.1 Main Flow

59

Fig. 4.1 Function with circuits including d = 2/4, but not d = 3 gates

Fig. 4.2 Extension of a circuit with d gates to a circuit with d + 2 gates

Fig. 4.3 Gate equivalences

Lemma 4.1 Let f : Bn → Bn be a reversible function to be synthesized. A Toffoli circuit including d gates is minimal with respect to the number of gates, if no realization with d − 1 gates and no realization with d − 2 gates exists. Proof Assuming that for a reversible function f a realization with d gates and a smaller realization with d − r gates exist (r > 0). Then, as shown in Fig. 4.2, the smaller realization can be extended by two additional NOT gates so that the resulting circuit still realizes f . By cascading the respective NOT gates, it follows that there are realizations with d − r + 2 · s gates as well (s > 0). Thus, if there is a realization with d − r gates, there has to be at least one realization with d − 1 or with d − 2 gates. If quantum gate circuits are considered this observation can be applied as well. Moreover, if at least a CNOT gate, a V gate, or a V+ gate occurs, a somewhat tighter extension is possible. Then, each of these gates can be “extended” as depicted in Fig. 4.3. This leads to valid realizations with costs d + r for any r ∈ N. As a result, it is sufficient to check for a realization with d − 1 gates to prove that d is minimal. Thus, the minimal d can be approached by two methods: (1) Start with d = 1 and iteratively increment d until a realization is found or (2) determine a value for d (e.g. by heuristics or bounds) and non-iteratively modify d until a minimal realization (approved by Lemma 4.1) is found. In this context, non-iteratively means that if there exists a circuit with d gates, then it is tried to find a better realization with only d < d gates; otherwise, it is tried to find a circuit with d > d gates. However, for the considered exact synthesis problem an iterative approach is chosen due to the complexity in solving the respective problem instances for large values of d. To illustrate this, Table 4.1 shows the results (R ES) as well as the run-time

60 Table 4.1 Iterative approach vs. non-iterative approach for mod5d1

4 Exact Synthesis of Reversible Logic d

1 2 3 4 5 6 7 total

Fig. 4.4 Main flow of exact synthesis algorithm

I TERATIVE R ES

T IME

B EST CASE NON - IT. R ES T IME

UNSAT UNSAT UNSAT UNSAT UNSAT UNSAT SAT

0.23 1.92 16.68 36.62 194.24 1625.88 218.56

– – – – UNSAT UNSAT SAT

2094.13

(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)

– – – – 194.24 1625.88 218.56 2038.68

exactSynthesis(f : Bn → Bn ) // f is given in terms of a truth table found = false; d = 1; while (found == false) do inst = encodeProblem(f, d); res = callSolver(inst); if (res == satisfiable) // f is synthesizable with costs d A = getAssignment(); extractCircuitFromAssignment(A); found = true; else // f is not synthesizable with costs d d = d + 1;

(T IME, in CPU seconds) of the respective checks that have been performed to synthesize an optimal Toffoli circuit for the function mod5d1.1 The minimal circuit for this function includes d = 7 gates. Thus, using the iterative approach seven checks are performed in total. In contrast, assuming the best case for the non-iterative approach (i.e. the minimal depth d = 7 is determined just at the beginning and additionally the two checks for d = 6 and d = 5 are performed) only three checks are necessary. However, since the run-time needed for the first checks of the iterative approach are small, the total run-time for both approaches differs only slightly (less than 3%). Hence, the non-iterative approach for reaching the minimal d is not feasible in general. This particularly holds, since this approach naturally requires checks with a d greater than the minimal d (that obviously are harder). As a result, an iterative approach as shown in Fig. 4.4 is used for exact synthesis of reversible logic. The input is the truth table of the reversible function f to 1 Therefore, the SAT-based encoding described in the next section has been used. However, the same behavior was observed for other encodings and functions, respectively.

4.2 SAT-based Exact Synthesis

61

be synthesized. The algorithm tries to find a circuit representation for f with one gate only, i.e. d is initialized to 1 and a respective SAT instance is created. If no realization with d gates exists, d is incremented. This procedure is repeated until a realization is found. The respective checks are thereby performed by 1. encoding the synthesis problem as an instance of Boolean satisfiability inst (line 6) and 2. checking the instance for satisfiability using an off-the-shelf solver (line 7). If there exists a satisfying assignment for inst a circuit representing f has been found. This circuit is extracted from the assignment of the encoding given by the solver. If inst is unsatisfiable, it has been proven that no realization for f with d gates exists. By increasing d iteratively from d = 1 minimality is ensured. Using this as the main flow, the next sections introduce concrete encodings for Toffoli and quantum circuit synthesis, respectively.

4.2 SAT-based Exact Synthesis Having the main flow as a basis, the open question still is how to encode the decision problem “Is there a circuit with exactly d gates that realizes the given reversible function f ?” as a SAT instance. In this section, the concrete SAT formulation as well as first results obtained with it are presented. Section 4.2.1 addresses Toffoli circuit synthesis, while Sect. 4.2.2 is about quantum circuit synthesis, respectively. These encodings allow an efficient handling of the embedding problem for irreversible functions (see Sect. 3.1.1) which is considered in Sect. 4.2.3 in more detail. Finally, experimental results are given in Sect. 4.2.4.

4.2.1 Encoding for Toffoli Circuits The synthesis problem for Toffoli circuits is encoded so that the resulting instance is satisfiable iff a circuit with d gates realizing a given function f exists; otherwise the instance must be unsatisfiable. To this end, Boolean variables (for brevity denoted by vectors in the following) and constraints are used as described in the following. First, the vectors defining the type of a Toffoli gate at an arbitrary depth k are introduced:2 2 The Toffoli gates in a circuit are enumerated from left to right (starting from 0). Furthermore, the term depth is used to refer to the respective position of a Toffoli gate in this enumeration.

62

4 Exact Synthesis of Reversible Logic

Fig. 4.5 Representation of Toffoli gates by assignments to tk and ck

Definition 4.1 Let f : Bn → Bn be a reversible function to be synthesized as a circuit with d gates. Then, k . . . t1k ) with 0 ≤ k < d is a Boolean vector defining the position of • tk = (tlog 2 n the target line for the Toffoli gate at depth k. More precisely, tk is a binary encoding of a natural number t k ∈ {0, . . . , n − 1} that indicates this target line. k ck k • ck = (cn−1 n−2 . . . c1 ) with 0 ≤ k < d is a Boolean vector defining the control lines of the Toffoli gate at depth k. More precisely, assigning clk = 1 with (1 ≤ l < n) means that line (t k + l) mod n becomes a control line of the Toffoli gate at depth k.

Remark 4.1 In total there are n · 2n−1 different types of Toffoli gates for a reversible function with n variables. This holds, since a Toffoli gate has exactly one target line resulting in n − 1 lines that are left as possible control lines. Thus, there are n lines possible for placing a target line and 2n−1 combinations for control lines, respectively. Example 4.2 Figure 4.5 shows all 3 · 23−1 = 12 possible types of Toffoli gates for a circuit with n = 3 lines. For each gate its assignments to the vectors tk and ck are also given. For example, the assignments tk = (01) and ck = (01) state that line [01]2 = 1 is the target line. Furthermore, because c1 is assigned to 1, line (1 + 1) mod 3 = 2 becomes a control line. In contrast, because c2 is assigned to 0, line (1 + 2) mod 3 = 0 does not become a control line. Furthermore variables representing the inputs and outputs as well as the internal signals of the circuit to be synthesized are defined:

4.2 SAT-based Exact Synthesis

63

Fig. 4.6 SAT formulation for Toffoli circuit synthesis with n = 4 and d = 4

Definition 4.2 Let f : Bn → Bn be a reversible function to be synthesized as a k k ) with 0 ≤ i < 2n and 0 ≤ k ≤ d is . . . xi0 circuit with d gates. Then, xki = (xi(n−1) a Boolean vector representing the input (for k = 0), the output (for k = d), or the internal variables (for 1 ≤ k ≤ d − 1), respectively, of the circuit to be synthesized for each truth table line i of f . So, the left side of a truth table line i corresponds to the vector x0i , while the right side corresponds to the vector xdi , respectively. Example 4.3 Figure 4.6 shows the variables needed to formulate the synthesis problem for an (embedded) adder function3 with n = 4 variables and depth d = 4. The first row gives the variables for the first truth table line, the second row the variables for the second truth table line, and so on. Thus, for each of the 24 = 16 lines in the truth table, n = 4 circuit lines with the respective vectors for input, output, and internal variables are considered (i.e. overall 4 · 16 = 64 lines are considered). The positions for the Toffoli gates to be synthesized are marked by dashed rectangles. For each depth, all possible types of Toffoli gates can be defined by assigning the respective values to tk and ck .

3 In

the example, the adder from Table 3.2 on p. 29 is used.

64

4 Exact Synthesis of Reversible Logic

Using these variables, the synthesis problem for a reversible function f with d Toffoli gates can be formulated as follows: Is there an assignment to all variables of the vectors tk and ck such that for each line i, x0i is equal to the left side of the truth table, while xdi is equal to the corresponding right side? This is encoded by the conjunction of the following three constraints: 1. The input/output constraints set the input and output of the truth table given by the function f to the respective variables x0i and xdi (see also left-hand and righthand side of Fig. 4.6), i.e. n −1 2

[x0i ]2 = i ∧ [xdi ]2 = f (i).

i=0

2. For each gate to be synthesized at depth k, functional constraints are added so that—depending on the assignments to tk and ck as well as on the input xki of the is computed, i.e. kth gate for truth table line i—the respective gate output xk+1 i n −1 d−1 2

xk+1 = t (xki , tk , ck ). i

i=0 k=0

The function t (xki , tk , ck ) covers the functionality of a Toffoli gate with target line t k = [tk ]2 and the control lines defined by ck . As an example, consider tk = (01) and ck = (100), i.e. with c3k = 1. This assignment states that the Toffoli gate at depth k has line t k = [01]2 = 1 as target line and line (t k + l) mod n = (1 + 3)

mod 4

=0 as single control line. For this case, constraints tk = (01) ∧ ck = (100) ⇒

k+1 k xi0 = xi0 k+1 k ⊕ xk ∧ xi1 = xi1 i0 k+1 k ∧ xi2 = xi2 k+1 k ∧ xi3 = xi3

are added for each truth table line i of a function with n = 4 variables. That means, the values of the circuit lines 0, 2, and 3 are passed through, while the output value of line 1 becomes inverted, if line 0 is assigned to 1. Similar constraints are added for all remaining cases.

4.2 SAT-based Exact Synthesis

65

3. Finally, exclusion constraints ensure that illegal assignments to tk are excluded, since not all values of tk are necessary to enumerate all possible target lines, i.e. d−1

[tk ]2 < n.

k=0

For example, for a circuit consisting of n = 3 lines the target line is represented by two variables tk = (t2 t1 ) as shown in Fig. 4.5. Here, the assignment tk = (11) has to be excluded, since line [11]2 = 3 does not exist. As a result, a formulation has been constructed which is satisfiable, if there is a valid assignment to tk and ck so that for all truth table lines the desired inputoutput mapping is achieved. Then, the concrete Toffoli gates can be obtained by the assignments to tk and ck as depicted in Fig. 4.5. If there is no such assignment (i.e. the instance is unsatisfiable), then it has been proven that no circuit representing the function with d gates exists. As a last step, the proposed encoding has to be transformed from bit-vector logic into a Conjunctive Normal Form (CNF)—the standard input format for SAT solvers (see Sect. 2.3.1). This is a well understood process that can be done in time and space linear in the size of the original formulation [Tse68]. A possible way is to define methods for clause generation of simple logic functions like AND, OR, etc. and extending this scheme for more complex logic like implications or comparisons. Then, in particular the functional constraints can be mapped to CNF. The assignments of the input/output constraints can be applied by using unit clauses. Finally, the exclusion constraints can be expressed by explicitly enumerating all values that are not allowed in terms of a blocking clause [McM02]. Having the formulation in CNF, the satisfiability of the instance (as well as the satisfying assignments) can be efficiently determined.

4.2.2 Encoding for Quantum Circuits To synthesize quantum circuits, an encoding similar to the one introduced in the last section is used. However, since quantum circuits may have V gates and V+ gates, circuit lines (or qubits, respectively) may not only be assigned to Boolean 0 and Boolean 1, but also to V0 and V1 (see Sect. 2.1.3). Thus, for assignments to the input, output, and internal variables xki a multi-valued encoding is applied. To this k that represent the respective end, each xijk is replaced by new variables yijk and zij values as follows: yijk

k zij

0 0 1 1

0 1 0 1

0 V0 1 V1

66

4 Exact Synthesis of Reversible Logic

k is assigned to 0 the Boolean domain is considered, otherwise the That is, if zij non-Boolean quantum states V0 and V1 are selected. Furthermore, since another gate library is applied, new variables to represent the respective types of a gate at depth k are required. Thus, the variables tk and ck used for Toffoli gates are replaced by qk defined as follows:

Definition 4.3 Let f : Bn → Bn be a reversible function to be synthesized as a k quantum circuit with d gates. Then, qk = (qlog . . . q1k ) with (0 ≤ k < d) is a 2 g Boolean vector defining the type of the quantum gate at depth k. The number g of gate types possible with n circuit lines is thereby defined by g = 3n(n − 1) + n. The number g of all gates types is determined as follows: Each CNOT gate, V gate, and V+ gate has exactly one target line and one control line leading to 3n(n−1) possible gate types for a circuit with n lines. Additionally, n NOT gates are possible (one at each line). Thus, in total 3n(n − 1) + n different types of quantum gates exist. Remark 4.2 If additionally double gates are considered (see Sect. 2.1.3), for a circuit with n lines in total g = 7n(n − 1) + n different types of quantum gates have to be considered. This holds, since in total four double gates exist (namely the ones shown in Fig. 2.7 on p. 16), leading to 4n(n − 1) additional types. Example 4.4 Figure 4.7 shows the variables needed to formulate the constraints for an (embedded) adder function. In comparison to the variables needed for Toffoli synthesis (see Example 4.3 or Fig. 4.6, respectively) the variables defining the type of a gate at depth k and the variables representing the circuit line values have been changed. Having these variables (enabling a multi-valued encoding considering the quantum gate library), in comparison to the Toffoli synthesis formulation the constraints are modified as follows: 0 and y d , zd leading to 1. The input/output constraints now argue over yij0 , zij ij ij n −1 n−1 2

0 yij0 = i[j ] ∧ zij = 0 ∧ yijd = f (i)[i] ∧ zdij = 0.

i=0 j =0

That is, each yij0 (yijd ) is assigned to 1 or 0 according to the j th position in the 0 (zd ) is assigned to 0, since Boolean truth table line i of f . Furthermore, each zij ij functions are synthesized. As an example, consider the left-hand and the righthand side of Fig. 4.7.

4.2 SAT-based Exact Synthesis

67

Fig. 4.7 SAT formulation for quantum circuit synthesis with n = 4 and d = 4

2. The functional constraints are modified so that the functionality of the new gate k , qk ), i.e. library is represented by a new function q(yijk , zij n −1 n−1 d−1 2

k+1 k yijk+1 zij = q(yijk , zij , qk ).

k=0 i=0 j =0

Therefore, a similar formulation as described in the last section for the Toffoli gate library is possible. 3. And finally, illegal assignments to qk are now excluded by d−1

[qk ]2 < g,

k=0

where g is given by 7n(n − 1) + n (including double gates) or 3n(n − 1) + n (without double gates), respectively. As for Toffoli circuit synthesis, in a last step this formulation has to be transformed into a CNF and passed to a SAT solver. If the solver returns satisfiable, then

68

4 Exact Synthesis of Reversible Logic

the quantum circuit can be obtained by the assignments to qk . Even for the multivalued encoding, this can be done efficiently for many practically relevant functions. However, before the performance of the encodings (for both, Toffoli circuit synthesis as well quantum circuit synthesis), is considered in detail in Sect. 4.2.4, a beneficial modification for exact synthesis of (embedded) irreversible functions is introduced in the following section.

4.2.3 Handling Irreversible Functions As described in Sect. 3.1.1, an irreversible function often must be embedded into a reversible one before synthesis can be applied. This also holds for the exact synthesis approach proposed in the last sections. As a result, constant inputs, garbage outputs, and don’t cares, respectively, may occur in the embedded functions. Thus, to determine the minimal reversible circuit for an irreversible function all possible embeddings (and therewith all possible assignments to constant inputs, garbage outputs, and don’t cares) have to be checked separately.4 But, if some slight modifications in the encoding are applied, several embeddings can be considered in parallel. It is thereby distinguished between don’t cares at the outputs (e.g. because of garbage outputs) and concrete values to be assigned to constant inputs. In the former case, only the input/output constraints are relaxed. Instead of forcing all output d ) to have concrete values, only constraints for the specified ones variables xijd (yijd zij are added. Then, the variables for don’t care conditions are left unspecified and are—if the instance is satisfiable—assigned by the SAT solver. The same is done for all constant inputs. But, since a constant input must have the same assignment in all truth table lines, an additional constraint 0 0 0 x0c = x1c = · · · = x(2 n−|c| −1)c

0 0 0 0 0 0 or y0c z0c = y1c z01c = · · · = y(2 n−|c| −1)c z(2n−|c| −1)c

is added for each constant input c, respectively. This restricts the SAT solver to 0 (y 0 z0 ) with the same value for each truth table line. assign all input variables xic ic ic Furthermore, since the constant inputs are now modeled symbolically (the value of each constant input is not fixed to 0 or 1) only 2n−|c| truth table lines have to be considered (where |c| is the number of constants). Example 4.5 Consider the incompletely embedded adder function shown in Table 4.2. The adder needs one additional variable to become reversible leading to a function with n = 4 variables, one constant input c, and two garbage outputs g1 4 In

principle, also embeddings with an arbitrary number of garbage outputs and different output permutations are possible. However, in the following only embeddings with minimal garbage and a fix output order are considered. Chapter 5 provides a further consideration of different embeddings.

4.2 SAT-based Exact Synthesis Table 4.2 Incomplete embedding of an adder

69 c

cin

x

y

cout

sum

g1

g2

–

0

0

0

0

0

–

–

–

0

0

1

0

1

–

–

–

0

1

0

0

1

–

–

–

0

1

1

1

0

–

–

–

1

0

0

0

1

–

–

–

1

0

1

1

0

–

–

–

1

1

0

1

0

–

–

–

1

1

1

1

1

–

–

and g2 , respectively.5 To handle this incompletely specified function, four modifications in the proposed SAT encoding are performed: • Constraints for only 2n−|c| = 23 = 8 (instead of 2n = 24 = 16) truth table lines are created, i.e. 0 ≤ i < 8. d (y d zd ) and x d (y d zd ) no output constraints are • For all output variables xig ig1 ig1 ig2 ig2 ig2 1 added, i.e. they are left unspecified. 0 (y 0 z0 ) no input constraints are added, i.e. they are left • For all input variables xic ic ic unspecified. 0 = x 0 = · · · = x 0 (y 0 z0 = y 0 z0 = · · · = y 0 z0 ) • An additional constraint x0c 7c 7c 7c 1c 0c 0c 1c 01c is added for each truth table line i. In summary, these modifications do not only simplify the SAT encoding (since a smaller number of truth table lines is considered), but also reduce the number of checks that have to be performed to find a minimal circuit. Normally, to ensure minimality both values for each constant input have to be considered. Thus, for a function with one constant input two checks (one with c = 0 and one with c = 1) have to be performed. Moreover, for functions with more than one constant input an exponential number of combinations has to be checked (e.g. values from {00, 01, 10, 11} for a function with two constant inputs). For all these combinations, a single instance must be encoded and separately solved by the solver. In contrast, using the proposed modifications a single instance to be checked is sufficient to synthesize a minimal result. This leads to significant speed-ups as the experiments in the next section show.

5 Note, that only a partial truth table is shown. Depending on the assignment to the constant input, n 2n−1 = 22 truth table lines with don’t care outputs are added either above or below the shown truth table lines. For more details see Sect. 3.1.1.

70

4 Exact Synthesis of Reversible Logic

4.2.4 Experimental Results The proposed approaches have been implemented in C++. To solve the resulting instances, the SAT solver MiniSAT [ES04] has been used. This section provides experimental results for both, exact synthesis of quantum circuits and exact synthesis of Toffoli circuits. More precisely, it is shown that exact synthesis can be applied with up to six variables. Improvements in the run-time can be obtained if irreversible functions containing constant inputs are considered. Furthermore, a comparison to heuristic approaches confirm the need of exact synthesis methods for both, finding smaller circuits than the currently best known realizations and evaluating the quality of heuristic methods. As benchmarks a wide range of functions from different domains has been used. This includes reversible functions as well as embedded irreversible functions. All benchmarks have been taken from RevLib [WGT+08]. The experiments have been carried out on an AMD Athlon 3500+ with 1 GB of memory. This section starts with an evaluation of the quantum circuit synthesis afterwards followed by an evaluation of exact Toffoli circuit synthesis.

4.2.4.1 Synthesis of Quantum Circuits For exact synthesis of quantum circuits the results of three evaluations are presented. First the modifications for irreversible functions as introduced in Sect. 4.2.3 are studied in detail. Next, the effect of the application of double gates on the synthesis results is observed. Finally, the presented approach is compared to a previously introduced method for quantum circuit synthesis. Handling of Irreversible Functions To synthesize quantum circuits for irreversible functions, appropriate embeddings including constant inputs, garbage outputs, and don’t care conditions are used. Then, circuits are synthesized (1) by assigning the respective constants before creating and solving the SAT instance (denoted by O RIG . SAT E NCODING) or (2) by applying the modifications proposed in Sect. 4.2.3 (denoted by I MPR . SAT E NCODING). The respective results are shown in Table 4.3. Note that if the O RIG . SAT E NCODING is applied for all possible combinations of constant input assignments a single instance is encoded and solved. Thus, for each function there are 2|c| entries, where |c| is the number of constant inputs. Each line below the function name corresponds to one assignment to the constant inputs. In contrast, if the I MPR . SAT E NCODING is applied, only a single instance has to be solved and therewith only a single result is reported. Besides that, columns labeled d show the number of gates of the resulting (minimal) circuit and columns labeled T IME list the corresponding run-time in CPU seconds (the number of variables in the function is given by n). As can be seen, the modified encoding offers a significant speed-up for all examples. The reduction of run-times is between 70% and 95%. Reductions are more substantial if at least one of the assignments has a solution with more gates than

4.2 SAT-based Exact Synthesis

71

Table 4.3 Handling of irreversible functions

O RIG . SAT E NCODING d

I MPR . SAT E NCODING T IME

d

T IME

Half-adder (n = 3)

total

4

1.47

4

0.85

5

2.65

–

–

4

4.12

4

0.85

Half-adder2 (n = 3)

total

4

1.28

4

0.76

5

3.53

–

–

4

4.81

4

0.76

297.569

Full-adder (n = 4) 6

total

6

209.95

7

1123.55

–

–

6

1421.11

6

209.95

low-high (n = 4)

total

7

10636.95

6

2180.66

6

1444.26

–

–

6

12081.21

6

2180.66

zero-one-two (n = 4)

total

7

430.30

6

132.46

6

91.23

–

–

6

204.46

–

–

6

477.84

–

–

6

1203.83

6

132.46

8

2162.51

8

5110.26

8

4391.96

–

–

8

6158.05

–

–

8

6063.92

–

–

8

18776.44

8

5110.26

decod24 (n = 4)

total

72 Table 4.4 Effect of double gates

4 Exact Synthesis of Reversible Logic Q UA .G ATES

F UNCTION n

d

D BL .G ATES

T IME

d

T IME

R EVERSIBLE FUNCTIONS 3_17

3

10

1641.49

8

280.98

miller

3

8

15.49

6

11.60

fredkin

3

7

7.04

5

3.28

peres

3

4

0.33

4

1.21

toffoli

3

5

0.71

5

2.38

peres-double

3

6

11.32

6

175.86

toffoli-double

3

7

86.75

7

1121.68

graycode6

6

5

66.50

5

608.11

q4example

4

6

9.08

5

24.83

0.40

4

0.85

E MBEDDED IRREVERSIBLE FUNCTIONS Half-adder

3

5

Half-adder2

3

4

0.19

4

0.76

Full-adder

4

7

145.07

6

209.95

rd32

4

6

37.75

6

436.11

low-high

4

7

2245.47

7

2180.60

zero-one-two

4

7

32.05

6

132.46

decod24

4

9

5660.77

8

5110.26

the optimal assignment. This can be observed for all functions in Table 4.3 except decod24. It should be noted that the constraining of constant input variables requires some computation time (i.e. run-times may be higher than those for solving the function with a fixed constant input assignment). However, this overhead is easily compensated by the fact that only one instance needs to be solved. Since the proposed modifications often lead to better results (with respect to run-time and resulting circuit size), in the following, they are also applied in the remaining experiments. Effect of Double Gates In [HSY+06], double gates (as introduced in Sect. 2.1.3) are assumed to have unit cost. However, other synthesis methods (e.g. [BBC+95, MYDM05]) consider the cost of a double gate to be two, since they are composed of two quantum gates. Hence, there are compelling reasons to consider synthesis that rely only on (single) quantum gates only. As described above, the proposed SAT-based formulation supports both, synthesis with quantum gates (denoted by Q UA . G ATES) only or additionally with double gates (denoted by D BL . G ATES). In one evaluation, circuits with double gates enabled and with double gates disabled have been considered. Disabling double gates reduces the number of possible gates at each depth from 7n(n − 1) + n to 3n(n − 1) + n (making the instance more compact). The results are summarized in Table 4.4. In the first two columns the name

4.2 SAT-based Exact Synthesis Table 4.5 Comparison to exact synthesis in [HSY+06]

73 F UNCTION

RA [HSY+06]

SAT (PIII)

T IME

T IME

I MPR

R EVERSIBLE FUNCTIONS miller

318.29

34.53

>9.2

fredkin

78.02

10.96

>7.1

peres

35.18

4.43

>7.9

toffoli

122.52

8.45

>14.5

E MBEDDED IRREVERSIBLE FUNCTIONS Half-adder

6.77

2.99

>2.3

Half-adder2

26.25

2.70

>9.7

25200.00

551.92

>45.7

Full-adder

of the function and its number of variables are given. The next columns provide the number of gates (d) and the run-time in CPU seconds (T IME) for both cases (quantum gates only and additionally with double gates). In general, it is expected that more choices of possible gates at each level will increase the time to find a correct solution. This can clearly be seen for the benchmark functions where the inclusion of double gates offers no advantage (i.e. both values for d are the same). For example, the run-time for graycode6 increases by one order of magnitude when double gates are considered—even though the results are identical with respect to the costs. On the other hand for some functions where the inclusion of double gates leads to smaller circuits (e.g. 3_17), the run-time can be reduced, since fewer instances have to be solved. Comparison with Previous Work To compare the SAT-based synthesis to the exact approach introduced in [HSY+06], a 733 MHz Pentium III with 512 MB of main memory has been used (that is significantly slower than a 850 MHz Pentium III processor, the system used in [HSY+06]). The outcome is shown in Table 4.5. RA denotes the run-time of the reachability analysis from [HSY+06] and SAT denotes the run-time of the proposed approach (using both quantum gates and double gates). IMPR gives the run-time improvement, i.e. the run-time of RA divided by the runtime of SAT. The table clearly shows that all functions from [HSY+06] considered for exact synthesis can be synthesized in significant shorter run-time—even on a slower processor. Improvements of at least a factor of 2 are achieved. In the best case an improvement of a factor of 45 is observed. Besides that, also synthesized results for the functions q4-example, Peres-double, and Toffoli-double are compared. For these functions, the authors of [HSY+06] constrained the search space, i.e. they restrict the target-line of the V and V+ gates to be a fix line. Therefore, they cannot guarantee optimal solutions. The comparison in Table 4.6 shows that the proposed approach is able to find the optimal results for these functions with a low run-time increase (the results of the heuristic approach of [HSY+06] are denoted by RAheu ). It is thereby proven that for Peres-double and

74 Table 4.6 Comparison to heurstic synthesis in [HSY+06]

4 Exact Synthesis of Reversible Logic F UNCTION

RAheu [HSY+06]

SAT (PIII)

d

T IME

T IME

peres-double

6

171.27

481.71

6

toffoli-double

7

853.78

2985.88

7

q4-example

6

34.78

78.09

5

d

Toffoli-double the minimal quantum gate circuits have been found in [HSY+06]. Additionally, in case of q4-example the SAT-based approach synthesizes an optimal quantum gate representation with costs 5 instead of the non-optimal circuit of size 6 obtained in [HSY+06]. Note, again all these benchmarks are carried out on a slower system than the one used in [HSY+06]. For absolute run-times on a faster machine see the rightmost column of Table 4.4. 4.2.4.2 Synthesis of Toffoli Circuits In a next series of experiments, the SAT-based synthesis of Toffoli circuits was considered. Since the effect of the improved handling for irreversible functions already has been evaluated above for quantum circuits, another discussion of the results for Toffoli circuits is omitted (for all cases similar results have been obtained). Instead, the respective handling is directly applied where appropriate. In the following the results of SAT-based Toffoli circuit synthesis in comparison to previously introduced exact and heuristic results are presented, respectively. Comparison to Exact Approaches In the past, [SPMH03] and [YSHP05] investigated exact synthesis of reversible functions for Toffoli circuits. However, both approaches give results only for reversible functions with up to n = 3 variables. More precisely, the overall synthesis time for all possible 23 ! = 40320 reversible functions with 3 variables is reported (at least 40 CPU seconds for [SPMH03] and 12 CPU seconds for [YSHP05]). Applying the SAT-based approach to each of these 40320 functions takes less than 0.01 CPU seconds for most of the instances and 0.65 CPU seconds in the worst case. Thus, the synthesis time for any 3-variable function is negligible. Adding up the run times for all 40320 functions would only accumulate errors of measurement. Thus, one can conclude that the overall synthesis time for a function with n = 3 is not crucial, i.e. exact synthesis for such functions can be performed efficiently. Furthermore, in contrast to [SPMH03, YSHP05] it is also possible to synthesize minimal circuits for functions with more than 3 variables as the next evaluation shows. Comparison to Heuristic Approaches Since several heuristic approaches for Toffoli circuit synthesis have been proposed so far, circuits obtained by SAT-based synthesis are not compared to the results obtained by a single previous method, but to the currently best known results. The results are shown in Table 4.7. For each function, the number of gates (d) as well as the source (S RC .) of the currently best

4.2 SAT-based Exact Synthesis

75

Table 4.7 Comparison of synthesis results E XACT

B EST KNOWN

F UNCTION n

d

QC

S RC .

T IME

d

QC

Δd

Δ QC

[MDM05]

5

13

48.28

0

0

[GAJ06]

5

9

0.60

0

0

R EVERSIBLE FUNCTIONS mod5mils

5

5

13

ham3

3

5

9

ex-1

3

4

8

[MDM05]

4

8

0.12

0

0

graycode3

3

2

2

[MDM05]

2

2

0.01

0

0

graycode4

4

3

3

[MDM05]

3

3

0.64

0

0

graycode5

5

4

4

[MDM05]

4

4

22.08

0

0

graycode6

6

5

5

[GAJ06]

5

5

583.14

0

0

3_17

3

6

14

[GAJ06]

6

14

0.43

0

0

mod5d1

5

8

24

[WGT+08]

7

11

2094.13

1

13

mod5d2

5

8

16

[MDM05]

8

20

1616.07

0

−4

E MBEDDED IRREVERSIBLE FUNCTIONS rd32

4

4

12

[GAJ06]

4

12

3.03

0

0

decod24

4

11

31

[GAJ06]

6

18

6.33

5

13

4gt4

5

17

89

[MDM05]

5

54

412.03

12

35

4gt5

5

13

29

[MDM05]

4

28

48.75

9

1

4gt10

5

13

53

[MDM05]

5

37

245.60

8

16

4gt11

5

12

16

[MDM05]

3

7

7.32

9

9

4gt12

5

14

58

[MDM05]

5

41

440.30

9

17

4gt13

5

14

34

[MDM05]

3

15

7.23

11

19 0

4mod5

5

5

9

[WGT+08]

5

9

125.74

0

4mod7

5

6

38

[MDM05]

6

38

653.82

0

0

one-two-three

5

11

71

[MDM05]

8

24

2186.71

3

47

alu

4

18

114

[GAJ06]

6

22

2001.32

12

92

known realization is given in column B EST KNOWN.6 In contrast, column E XACT shows the number of gates obtained by the SAT-based synthesis. The quantum costs for the respective circuits are denoted by QC. The difference with respect to the number of gates and quantum costs is given in the last two columns, respectively. Since the results from previous work have been obtained by different approaches on different machines, no run-times for these are reported. The run-time of the SAT-based synthesis is given in column T IME. Using the exact synthesis, it can be proven that for many functions minimal Toffoli circuits (with respect to gate count) already have been found (all rows with Δd equal to 0). That shows, today’s synthesis approaches are achieving very good 6 For

some functions no results have been reported before. In this case, the approach of [MDM05] has been applied to generate a heuristic result.

76

4 Exact Synthesis of Reversible Logic

results for these benchmarks. However, exact synthesis additionally enables the realization of significantly smaller circuits. For example, the circuit for 4gt5 can be reduced by more than two third. In absolute numbers, up to 12 gates can be saved for some functions. Moreover, the proposed approach also improves the quantum cost for many functions (only in one case (for mod5d2) the quantum cost increase.7 In the best case the quantum costs are reduced by 92. It can be concluded that using SAT-based synthesis as proposed, exact results can be produced for functions with up to six variables. The comparison to heuristic approaches confirm the need of exact methods for both, finding smaller circuits than the currently best known realizations and evaluating the quality of heuristic methods. However, synthesizing exact results still requires high computing times. Thus, in the next section improvements are proposed that accelerate the synthesis process.

4.3 Improved Exact Synthesis In the last section, SAT-based synthesis for quantum gate and Toffoli gate circuits has been demonstrated as a promising alternative to achieve minimal results for functions with up to six variables. But, the proposed approach still works on a problem description in CNF. In this section, two improvements are described exploiting higher levels of abstraction and leading to an increase of both, efficiency and quality of the obtained results. Therefore, from now on the focus is on circuits composed of reversible gates only. But, the described techniques can also be applied to synthesis of quantum circuits. In the first part of this section, the SAT encoding presented in the last section is used as basis but lifted to a higher level of abstraction. More precisely, the synthesis problem is encoded in bit-vector logic which can be solved by Satisfiability Modulo Theories (SMT) solvers instead of Boolean SAT solvers. Experiments show that this leads to significant speed-ups. However, after a detailed analysis of the fundamental limits of these solving paradigms, another approach is proposed that utilizes problem-specific knowledge. Consequently, the general solver framework SWORD (see Sect. 2.3.2.3) is applied for which dedicated modules have been specified. Besides a high-level and very compact problem representation, these modules allow more efficient decision and propagation strategies. But even with higher levels of abstractions, the size of the synthesis encoding is still exponential. That is, constraints are built for each truth table line of the function f to be synthesized. As an alternative, in the second part of this section an approach for reversible logic synthesis which leads to a polynomial size encoding is proposed. This encoding takes advantage of Quantified Boolean Formula (QBF) 7 This is because circuits are optimally synthesized with respect to number of gates not quantum cost. In some (few) cases circuits with larger number of gates but with lower quantum cost are possible. For results with respect to quantum gates (and therewith with respect to quantum cost) see the discussion above.

4.3 Improved Exact Synthesis

77

satisfiability—a generalization of Boolean satisfiability (see Sect. 2.3.1). More precisely, the exact synthesis problem of a reversible function f is formulated as a QBF problem by encoding the cascade structure of a reversible circuit as a functional composition of universal gates and enforcing to meet the specification of f by quantification. In this sense, complexity is moved from the problem formulation to the solving engine. Then, the quantified Boolean formula is solved by applying QBF solvers and Binary Decision Diagrams (BDDs). This leads to three major improvements: (1) the circuits are synthesized faster, (2) all minimal circuits are found in a single step which allows to choose the best one with respect to the quantum cost, and (3) different reversible gate libraries are easily supported by a simple extension of the problem formulation. In the remainder of this section, both approaches are described in detail in Sect. 4.3.1 (for SMT and SWORD) and Sect. 4.3.2 (for OBF), respectively. Afterwards, experimental results for both are presented in Sect. 4.3.3.

4.3.1 Exploiting Higher Levels of Abstractions Recalling the proposed SAT-based encoding, for a function f : Bn → Bn to be synthesized as a circuit consisting of d gates, the variables k • tk = (tlog

2 n

. . . t1k ) (to define the target line of a gate at depth k),

k ck k • ck = (cn−1 n−2 . . . c1 ) (to define the control lines of a gate at depth k), and k k k • xi = (xi(n−1) . . . xi0 ) (to represent the input, output, and internal variables)

as well as the constraints n • 2i=0−1 [x0i ]2 = i ∧ [xdi ]2 = f (i) (input/output), n k+1 = t (xki , tk , ck ) (functional), and • 2i=0−1 d−1 k=0 xi k • d−1 k=0 [t ]2 < n (exclusion) have been introduced. As can be seen, a large part of the problem formulation consists of bit-vector variables and bit-vector constraints, respectively. However, most of this high level of abstraction is lost, when the formulation is encoded as a pure Boolean formula and afterwards solved by a Boolean SAT solver. Furthermore, this transformation requires a high amount of auxiliary variables leading to an additional overhead. Thus, it is worth to consider alternative encodings. The emerging area of SAT Modulo Theories (SMT) (see Sect. 2.3.2) provides new solving engines that directly support bit-vector logic and thus allow an encoding that avoids the conversion to the Boolean level. As a result, all bit-vector variables and most of the bit-vector operations are preserved; hardly any auxiliary variables are needed. Furthermore, the formulation at this higher level of abstraction allows stronger implications. As the experiments in Sect. 4.3.3 show, already this simple “replacement” allows significant improvements in the resulting synthesis times.

78

4 Exact Synthesis of Reversible Logic

However, further accelerations can be achieved if more dedicated solving engines are exploited. One adjusted solving technique, based on the framework SWORD [WFG+07], is described in this section. The general limits of the SAT- and SMT-based approaches are thereby discussed first. Then, the dedicated implication procedures and decision heuristics are introduced.

4.3.1.1 Limits of Common SAT and SMT Solvers The input of a SAT solver is a Boolean function in terms of clauses. The input of an SMT solver is a description in bit-vector logic. Both solvers are optimized for their particular problem representation. For example, common SAT solvers utilize the two literal watching scheme to carry out implications, which exploits the special structure of clauses [MMZ+01]. SMT solvers on the other hand, use e.g. canonizing [BDL98] and term-rewriting [BB09] to efficiently handle bit-vector constraints. Furthermore, highly optimized heuristics have been developed to decide the assignment of variables if no more implications are possible. Strategies employed are based on statistical information, for example occurrences or activities of variables [Mar99]. All these techniques work very well if CNF formulas or bit-vector logic are considered in general. But, the respective solvers are not able to take specific properties of the problem into account. For example, promising problem-specific strategies for the exact Toffoli circuits synthesis would be: • The type of the Toffoli gates (represented by tk and ck ) near to the inputs should be defined first, because the corresponding input variables are already assigned by the truth table. This allows for early implications and helps to determine the types of the remaining gates or to detect conflicts faster. Thus, tk and ck with small k should be preferred in the decision procedure. Similarly, this observation also holds for modules near to the outputs. • If the assignment to an input line of a Toffoli gate is not equal to the assignment to the corresponding output line of the same gate, this line has to be the target line. This observation allows to imply the assignment to variables in tk . • If the target line of a Toffoli gate is known, the values of all remaining lines can be implied if there is an assignment at the corresponding input or output. These specific strategies cannot be provided by a standard SAT or SMT solver. Moreover, extensions of standard solvers in this direction (e.g. by modifications of the heuristics) are not possible in general, because most of the problem-specific information is lost when encoding the instance. SAT and SMT solvers just have a clause database or constraint database, respectively. Thus, strategies like the ones described above can only be exploited with a solver that is based on a problemspecific representation.8 8 In principle, this problem can be prevented by introducing additional constraints to the problem instance. But then, the encoding becomes inefficient due to a very large number of constraints.

4.3 Improved Exact Synthesis

79

4.3.1.2 Dedicated Solve Techniques for Toffoli Circuit Synthesis To overcome the limitations discussed above, the solver framework SWORD is applied. While SAT solvers provide strategies optimized for clauses and SMT solvers for bit-vector constraints, respectively, SWORD makes problem-specific information available by using so called modules. These modules enable the implementation of dedicated heuristic as well as implication strategies, while still utilizing sophisticated SAT techniques such as conflict analysis or learning. In the following, the application of SWORD to Toffoli circuit synthesis is described. Section 2.3.2.3 gives a brief overview of the underlying solving techniques (starting on p. 24). For Toffoli circuit synthesis, dedicated modules have been developed that incorporate the problem-specific strategies described above. More precisely, a concrete Toffoli synthesis instance for the reversible function f to be synthesized with d gates includes d modules in a cascade structure—one module for each depth k. Each module has access to its relating variables tk , ck , xki , and xk+1 . The functionali ity of a Toffoli gate is defined by methods of the module, i.e. a concrete Toffoli gate function is selected by assigning tk and ck . Then, each module realizes the decision and implication strategies as described in the following. Decision Strategies The decision heuristic chooses a variable to be assigned if no further implication is possible. Therefore, decisions that cause many implications are preferred. Both, the global decision strategy (deciding which module should make the next decision) and the local decision strategy (deciding which variable of the chosen module should be assigned next) are motivated by the following two observations: • A module can imply many other assignments if the target line t k of the represented gate is known (i.e. if tk is completely assigned). In this case, the input xki and the output xk+1 for all lines, except the target line, have to be equal. i • The assignment of nearly all gate inputs xki are either given by the truth table (if k = 0) or they can be implied if the types of the previous gates are defined. These observations lead to the following decision heuristics: • Global decision heuristic Modules whose target lines are still undefined are selected for a decision. If all target lines of each module are defined, the module that still has other unassigned variables (from the vector ck for example) is selected. During this process the modules are considered in ascending order starting with depth k = 0. • Local decision heuristic Variables representing the target line of the respective Toffoli gate are decided first. If all target line variables are assigned, variables corresponding to the control lines are decided. Since the overall decision strategy (global and local) ensures that the gates become completely defined from the first gate to the last gate, there is no need for deciding the variables representing the inputs or outputs. These variables are implied after the corresponding types of the previous gates are decided.

80 Fig. 4.8 Propagate routine for module at depth k

4 Exact Synthesis of Reversible Logic

(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15) (16) (17) (18) (19)

for each (truth table line i) for each (circuit line j ) k = x k+1 // input = output if (xij ij imply(tk ); // use value of j for each (circuit line j ) if (j == [tk ]2 ) continue; k or x k+1 ); imply(xij ij flipTargetLine = true; for each (clk ∈ ck ) if (clk == 1 ∧ x k k

it +l mod n)

== 0)

flipTargetLine = false; break; if (!flipTargetLine) imply(x k+1 ); // use value of x k k it k it else if (ck completely defined) k imply(x k+1 k ); // use value of x k ⊕ 1 it

it

Propagation Strategies The propagation procedures of a module consider the connected variables for the implication of the values. The pseudo-code of the propagation routine is shown in Fig. 4.8. The propagation routine consists of three parts: 1. Propagate the position of the target line (lines 2–4) If the assignment to an input is not equal to the assignment to the corresponding output of the same circuit line, then this line has to be the target line of the gate. In this case, the position j of this target line is assigned to tk . 2. Propagate non-target lines (lines 6–8) If the target line is known, all outputs are implied whose corresponding inputs are assigned (except the ones at the target line). This also holds vice versa. Thus, the output (input) is assigned to the value of the corresponding input (output). 3. Propagate the target line (lines 10–19) In the last step the output of the target line is assigned. To do so, the assignment to the control lines ck and to the corresponding input variables are considered. If a circuit line is a control line and the input of this line is assigned to 0 (line 12), then the assignment to the output of the target line has to be equal to its corresponding input assignment (line 16). Otherwise, if additionally ck is completely defined (i.e. no other control line with input value 0 can occur), the assignment to the output of the target line has to be equal to the inverted assignment to the input of this line (line 19). The presented decision and propagation strategies incorporated in the modules replace the respective functional and exclusion constraints. Furthermore, due to the problem-specific checks and heuristics, the overall formulation is lifted to a higher level of abstraction leading to faster run-times as shown in the evaluation

4.3 Improved Exact Synthesis

81

in Sect. 4.3.3. Nevertheless, the applied formulation still has to consider all truth table lines of f , i.e. the formulation is still of exponential size. How to overcome this drawback is described in the next section.

4.3.2 Quantified Exact Synthesis So far, for the respective checks “Is there a circuit with exactly d gates that realizes the given reversible function f ?”, several encodings on different levels of abstractions, i.e. on Boolean level, bit-vector level, or problem-specific level have been introduced. In all variations the problem is encoded for each truth table line separately. That is, the respective constraints representing the circuit to be synthesized are not built only for one truth table line, but they are duplicated for the remaining 2n − 1 truth table lines. Thus, the instances grow exponentially with respect to the number n of variables. In this section, an alternative problem formulation based on Quantified Boolean Formulas (QBF) (see Sect. 2.3.2) is introduced. QBF allows to encode the synthesis problem in polynomial size, i.e. the circuit to be synthesized is encoded only once and the specification of the considered function f is enforced by quantification. In doing so, complexity is moved from the problem description to the solving engine. In the following, the concrete method is described using a new formulation based on a universal gate type definition. This does not only enable to synthesize Toffoli circuits, but also reversible circuits consisting of Fredkin and Peres gates, respectively. Finally, it is shown how the resulting formulation can be solved using QBF solvers and Binary Decision Diagrams (BDDs). 4.3.2.1 Quantified Problem Formulation For the synthesis of a function f with n inputs/outputs into a reversible circuit, a set GT = {g0 , . . . , gq−1 } of q ∈ N different gate types is considered. The set GT is used to distinguish between all possible gate types in n variables. According to the chosen gate library (i.e. Toffoli gates, Fredkin gates, and/or Peres gates), the cardinality of GT varies. More precisely, let f : Bn → Bn be a reversible function to be synthesized. Then, there are • n · 2n−1 different multiple control Toffoli gate types, • n · (n − 1) · 2n−2 different multiple control Fredkin gate types, and • n · (n − 1) · (n − 2) different Peres gate types. If the gate library used for synthesis consists of more than one gate type, then the numbers above have to be added. For example, in the case of a gate library containing multiple control Toffoli gates and multiple control Fredkin gates for the synthesis of a 3 variable function, GT contains 3 · 23−1 + 3 · (3 − 1) · 23−2 = 12 + 12 = 24 different gates in total. Before the synthesis problem is formulated as a QBF instance, the universal gate is defined that covers the functionality of all gates given in the set GT.

82

4 Exact Synthesis of Reversible Logic

Definition 4.4 Let f : Bn → Bn be a reversible function to be synthesized. Then, a universal gate represents the function U GT (X, Y ) : Bn × Blog q → Bn with gt (X), if t = [y1 . . . ylog q ]2 < q, GT U (X, Y ) = X, otherwise where • X = {x1 , . . . , xn } is the set of the inputs of the gate and • Y = {y1 , . . . , ylog q } is the set of variables representing a binary encoding of a natural number t, which defines the type gt of the gate (in the following called gate select inputs). According to the assignments to the gate select inputs Y , a universal gate U GT acts either as a gate from the given set GT or as the identity gate. Remark 4.3 The variables Y = {y1 , . . . , ylog q } are comparable to the variables tk and ck used in the previous sections to define the type of a Toffoli gate. However, since now Fredkin and Peres gates are additionally considered, tk and ck cannot be applied any longer and thus are replaced by Y . Furthermore, the identity gate has been added to the definition of a universal gate to handle the case where the set GT does not exactly contain a power of two gate types. In this case, GT is extended by identity gates to fill the gap. In doing so, exclusion constraints are not needed any longer. Having a universal gate as a basis, a cascade of universal gates is defined as follows: Definition 4.5 Let f be a reversible function to be synthesized with at most d gates from the set GT. Then, a function F d is built representing the cascade structure of d universal gates U GT (X1 , Y1 ), . . . , U GT (Xd , Yd ). The output of the i-th universal gate (0 < i ≤ d) is equal to the input of the next gate, i.e. U GT (Xi , Yi ) = Xi+1 . Figure 4.9 shows the resulting cascade structure of the function F d for d universal gates. Using this structure, any reversible circuit containing d gates can be obtained by assigning the respective values to each of the gate select input variables yij ∈ Yi (0 < j ≤ log q). In other words, if a circuit realization with at most d gates for the reversible function f exists, there has to be at least one assignment to all variables yij ∈ Yi such that F d is equal to f . More formally, if f is synthesizable with at most d gates the quantified Boolean formula ∃y11 . . . ∃ydlog q ∀x1 . . . ∀xn (F d = f ) holds. This represents the new encoding of the synthesis problem which can be solved either by a QBF solver or by BDDs as described in the following.

4.3 Improved Exact Synthesis

83

Fig. 4.9 Problem formulation

4.3.2.2 Implementations Based on the proposed QBF formulation for the synthesis problem, two approaches can be applied to solve the formula: First, the problem is encoded as an instance of quantified Boolean satisfiability, which is given to a QBF solver. Second, the function F d = f is constructed as a BDD and thereafter the quantification is carried out on the BDD. A solution exists if the final BDD is not the constant 0-function. Moreover, all solutions can be extracted by traversing all paths to the 1-terminal. For both approaches, the incremental nature of F d is exploited during the construction of the formula. That is, first the formula F 0 = (x1 , . . . , xn ) is built for depth d = 0. Then, for each iteration, the function F d is incrementally built by applying F d = U GT (U GT (. . . (U GT (F 0 , Y1 ), Y2 ) . . . , Yd−1 ), Yd ). Finally, the equation to f is constrained. The next two paragraphs describe the respective steps for both approaches in more detail. Using QBF Solvers To use a common QBF solver, the formula F d = f is transformed into CNF, i.e. a representation that consists of Boolean variables and clauses. Then, the resulting set of clauses represents a cascade of d universal gates which has to meet the specification of f . The complete QBF instance is formed by adding the respective existential and universal quantifiers followed by an existential quantifier for the auxiliary variables added during the transformation into CNF (denoted as A in the following). Overall, this leads to the following quantification: ∃y11 . . . ∃ydlog q ∀x1 . . . ∀xn ∃A. Together with the CNF, afterwards this is passed to a QBF solver. In the case that the instance is satisfiable, a circuit realization of the function can be obtained from the assignments to the variables yij ∈ Yi . Otherwise it has been proven that no circuit realizing f with d gates exists. Using BDDs As shown later in the experiments, the performance of the QBF solver approach is poor. Therefore, BDDs are used as an alternative. That is, instead of building a quantified CNF and solving this instance with a QBF solver, the synthesis is carried out on a BDD representation.

84

4 Exact Synthesis of Reversible Logic

To this end, the BDD for the formula F d = f is build. This can be done efficiently using a state-of-the-art BDD package (e.g. CUDD [Som01]). The fixed variable order X, Y has thereby been applied. The alternative order Y, X leads to a blow-up of the BDD representation, since in this case, the BDD for F d would already represent all possible functions in n variables which are synthesizable with at most d gates. During the construction, isomorphic functions that result from the n output functions for F d are shared. After the computation of the equality, the resulting BDD is a single output function. For this BDD, the universal quantification of all xi variables is carried out. This is a standard operation available in a BDD package. The idea is to compute the product of the positive co-factor and the negative co-factor for a universally quantified variable, i.e. ∀x h(. . . , x, . . .) = h(. . . , 0, . . .) · h(. . . , 1, . . .). If the final BDD consists of the 0-terminal, then no reversible circuit with the given depth d exists for the function f . Otherwise, there is at least one path to the 1-terminal. Each of those paths represents an assignment to all variables yij ∈ Yi and thus can be converted into a concrete circuit realization. Since the BDD represents not only one but all 1-paths, in fact all realizations with the given depth are found in one single step. All solutions are of interest, since one can choose the best result with respect to quantum cost which is discussed later in the experiments.

4.3.3 Experimental Results The proposed improvements for exact synthesis have been implemented in C++. According to the respective encoding, the SMT solver MathSAT [BBC+05], the solver framework SWORD [WFG+07] (see also Sect. 2.3.2.3), the QBF solver sKizzo [Ben05], and the BDD package CUDD [Som01] have been used as solving engine, respectively. In this section, the described encodings are compared to each other as well as to the SAT-based encoding with MiniSAT [ES04] as solver. It is shown that higher levels of abstractions significantly improve the run-time when performing exact synthesis. Moreover, using the quantified formulation, further speed-ups can be documented and additionally the quality of the results can be strengthened. As benchmarks, again functions from RevLib [WGT+08] have been applied. All experiments have been carried out on an AMD Athlon 3500+ with 1 GB of memory. The timeout was set to 3000 CPU seconds.

4.3.3.1 Exploiting Higher Levels of Abstractions The original SAT encoding for exact synthesis (denoted by SAT) has been lifted to two higher levels of abstractions. First, instead of a Boolean formulation in CNF, the problem has been encoded in bit-vector logic that can be handled by SMT solvers (denoted by SMT). Second, problem-specific strategies developed within the solver framework SWORD provide an alternative (denoted by SWORD).

4.3 Improved Exact Synthesis

85

Results obtained by both approaches are summarized in Table 4.8. The first column provides the name of the function. Column n denotes the number of variables for each function, while in Column d the minimal number of Toffoli gates necessary to synthesize the function is given. The following columns provide the run-time of the respective synthesis approaches in CPU seconds (denoted by T IME). Furthermore, the improvements of the SMT approach and the SWORD approach are given in the last two columns, i.e. the run-time of MiniSAT divided by the run-time of MathSAT/SWORD (denoted by I MPR SAT ) and the run-time of MathSAT divided by the run-time of the SWORD approach (denoted by I MPR SMT ), respectively. The results clearly show that the chosen encoding is crucial for the resulting runtimes. For most of the functions a corresponding Toffoli circuit can be synthesized faster by the SMT approach than by using the SAT encoding. Only in some cases the SAT-based approach is slightly better. However, this only holds for functions that can be synthesized in less than one second, e.g. peres or fredkin. Overall, improvements of up to three orders of magnitude are achieved. Moreover, it is evident that the problem-specific approach outperforms the other methods. In many cases, the run-times are further reduced by a factor of approx. 30—in the best case by a factor of over 170. Furthermore, using the problemspecific approach, Toffoli circuits for the functions alu-v2 and alu-v3 are synthesized within the given timeout. In comparison to the SAT synthesis approach, speed-ups of up to four orders of magnitude are reported.

4.3.3.2 Quantified Exact Synthesis To evaluate the quantified exact synthesis encodings, three series of experiments have been carried out. First, the performance of the QBF solver and the BDD approach, respectively, is compared to each other as well as to previous encodings. Second, the fact that (using the BDD approach) all circuits for a given depth are available in parallel is evaluated. And finally, synthesis results for different gate libraries are considered. Run-time Comparison The run-times of both, the proposed QBF solver encoding (denoted by QBF S OLVER) as well as the BDD formulation (denoted by BDD S), is compared to the SAT-based and the SWORD-based approach (denoted by SAT and SWORD, respectively). A similar set of functions as in the previous sections was applied. Only some trivial functions (e.g. peres, fredkin) have been omitted. In contrast, two further functions, i.e. hwb4 and 4_49 (also taken from [WGT+08]), are additionally considered. The results are given in Table 4.9. The first columns show the name of the function as well as the number of lines (n) and the minimal number of Toffoli gates (d) of the resulting circuit, respectively. In the remaining columns, the run-times in CPU seconds (denoted by T IME) and the improvements of the new approaches with respect to the SAT solver (denoted by I MPR SAT ) and with respect to SWORD (denoted by I MPR SW. ) are given, respectively. The improvement is thereby obtained by the run-time of the SAT/SWORD approach divided by the run-time of the QBF solver/the BDD approach.

86

4 Exact Synthesis of Reversible Logic

Table 4.8 Experimental results if higher levels of abstractions are exploited F UNCTION n

d

SAT

SMT

T IME

T IME

SWORD I MPRSAT

T IME

I MPRSAT

I MPRSMT

R EVERSIBLE FUNCTIONS peres

3

2

0.01

0.03

0.33

<0.01

fredkin

3

3

0.03

0.12

0.25

peres-double

3

4

2.35

0.36

6.53

miller

3

5

0.23

0.22

mod5mils

5

5

48.28

3.81

>1.00

>33.00

<0.01

>3.00

>24.00

0.01

235.00

36.00

1.05

<0.01

>23.00

>22.00

12.67

0.08

603.50

47.63

ham3

3

5

0.60

0.29

2.07

0.01

60.00

29.00

ex-1

3

4

0.12

0.20

0.60

<0.01

>20.00

>12.00

graycode3

3

2

0.01

0.06

0.17

<0.01

>1.00

>6.00

graycode4

4

3

0.64

0.24

2.27

<0.01

>64.00

>24.00

graycode5

5

4

22.08

1.00

22.08

<0.01

graycode6

6

5

583.14

3.25

179.43

0.12

4859.50

27.08

3_17

3

6

0.43

0.72

0.59

0.03

14.33

24.00

mod5d1

5

7

2094.13

135.36

15.47

11.21

186.80

12.07

>2208.00 >100

mod5d2

5

8

1616.07

56.72

28.49

9.06

178.37

6.26

mini_alu

4

5

27.60

3.85

7.17

0.03

920.00

123.33

E MBEDDED IRREVERSIBLE FUNCTIONS rd32-v0

4

4

2.97

0.54

5.50

<0.01

>297.00

>54.00

rd32-v1

4

5

13.51

1.84

7.34

0.04

337.75

46.00

decod24-v0

4

6

6.54

1.33

4.92

0.02

327.00

66.50

decod24-v1

4

6

6.22

1.44

4.32

0.09

69.11

16.00

decod24-v2

4

6

7.25

1.35

5.37

0.02

362.50

67.50

decod24-v3

4

7

28.88

3.31

8.73

0.18

160.44

18.39

4gt4-v0

5

6

>3000.00

697.37

>4.30

21.19

>141.58

32.91

4gt4-v1

5

5

395.36

33.31

11.87

0.62

637.67

53.73

4gt5-v0

5

5

321.27

36.92

8.70

0.41

783.59

90.05

4gt5-v1

5

4

51.51

10.35

4.98

0.06

858.50

172.50

4gt10-v0

5

5

229.01

47.97

4.77

5.31

43.13

9.03

4gt10-v1

5

6

2417.04

234.85

10.29

14.69

164.54

15.99

4gt11-v0

5

3

8.54

1.32

6.47

0.01

854.00

132.00

4gt11-v1

5

4

33.39

3.30

10.12

0.24

139.13

13.75

4gt12-v0

5

5

441.79

25.86

17.08

2.39

184.85

10.80

4gt12-v1

5

5

470.96

41.17

11.44

9.53

49.42

4.32

4.3 Improved Exact Synthesis

87

Table 4.8 (Continued) F UNCTION

SAT

SMT

T IME

T IME

SWORD I MPRSAT

T IME

I MPRSAT

I MPRSMT

n

d

4gt13-v0

5

3

6.75

0.74

9.12

0.01

675.00

74.00

4gt13-v1

5

4

32.99

4.35

7.58

0.21

157.10

20.71

4mod5-v0

5

5

122.54

12.70

9.65

0.69

177.59

18.40

4mod5-v1

5

5

413.21

43.86

9.42

0.48

860.85

91.38

4mod7-v0

5

6

665.66

34.68

19.19

2.99

222.63

11.60

4mod7-v1

5

7

2055.80

100.07

20.54

133.97

15.34

0.75

one-two-three-v0

5

8

2292.26

443.21

5.17

55.66

41.18

7.96

one-two-three-v1

5

8

2094.26

481.53

4.35

71.72

29.20

6.71

one-two-three-v2

5

8

>3000.00

609.96

>4.91

78.05

>38.44

7.81

one-two-three-v3

5

8

>3000.00

250.36 >11.98

136.00

>22.06

1.80

alu-v0

5

6

1998.83

223.48

8.94

8.76

228.18

25.51

alu-v1

5

7

>3000.00

1692.29

>2.95

369.14

>13.54

4.58

alu-v2

5

7

>3000.00 >3000.00

–

840.25

>3.57

>3.57

alu-v3

5

7

>3000.00 >3000.00

–

764.04

>3.93

>3.93

From the results, it is easy to see that utilizing QBF leads to significant improvements for both, the QBF solver and the BDD approach, in comparison to the common SAT solving techniques. Only if additional knowledge is utilized, as done by SWORD, the QBF solver method is outperformed. However, the BDD approach for QBF leads to the smallest overall synthesis time for non-trivial functions. That is, for some functions the run-time, indeed, is higher than for SWORD, but this only holds for functions with an overall synthesis time of less than one second (e.g. graycode6 and decod24-v0). For all other functions, better run-times are documented. In the best case (hwb4), an improvement of more than a factor of 100 is achieved. Quantum Costs of Resulting Circuits After the efficiency of the BDD approach has been shown with respect to run-time, further experiments demonstrate the quality of the obtained results. As described in the preliminaries, quantum costs provide a good measurement of the complexity of the resulting circuits. The quantum costs thereby depend on the used Toffoli gates. Thus, it may be an advantage to determine not only one, but more Toffoli circuits for a given function. Then, by checking the resulting quantum costs for each of the realizations obtained the cheapest one with respect to quantum costs can be selected. Previous approaches for minimal Toffoli circuit synthesis determine only one circuit in each run. In contrast, using BDDs as described in Sect. 4.3.2 leads to

88

4 Exact Synthesis of Reversible Logic

Table 4.9 Comparison of quantifed encodings QBF- BASED

SAT- BASED

F UNCTION

SAT

SWORD QBF S OLVER

n d T IME R EVERSIBLE

T IME

T IME

BDD S

I MPRSAT I MPRSW. T IME

I MPRSAT I MPRSW.

FUNCTIONS

mod5mils

5

5

48.28

0.08

32.22

1.50

<0.01

0.15

321.87

0.53

graycode6

6

5

583.14

0.12

145.02

4.02

<0.01

0.46 1267.69

0.33

3_17

3

6

0.43

0.03

0.19

2.26

0.16

0.01

43.00

3.00

mod5d1

5

7

2094.13

11.21

405.96

5.16

0.03

1.68 1246.50

6.67

mod5d2

5

8

1616.17

9.06

337.49

4.79

0.03

3.84

2.36

hwb4

4 11 >3000.00 >3000.00 >3000.00

–

–

4_49

4 12 >3000.00 >3000.00 >3000.00

–

–

420.88

20.38 >147.20 >147.20 837.92

>3.58

>3.58

E MBEDDED

IRREVERSIBLE FUNCTIONS

rd32-v0

4

4

2.97

<0.01

0.22 13.50

<0.05

0.01

297.00

<1.00

rd32-v1

4

5

13.51

0.04

0.35 38.60

0.11

0.03

450.33

1.33

4mod5-v0

5

5

122.54

0.69

40.01

3.06

0.02

0.20

612.70

3.45

4mod5-v1

5

5

413.21

0.48

44.63

9.25

0.01

0.16 2582.56

3.00

decod24-v0 4

6

6.54

0.02

0.97

6.74

0.02

0.04

0.50

decod24-v1 4

6

6.22

0.09

1.28

4.86

0.07

0.04

155.50

2.25

decod24-v2 4

6

7.25

0.02

1.03

7.04

0.02

0.03

241.66

0.66

decod24-v3 4

7

28.88

0.18

2.00 14.44

0.09

0.05

577.60

3.60

1998.83

8.76

181.99 10.98

0.05

2.73

alu-v0

5

6

alu-v1

5

7 >3000.00

163.50

732.17

3.21

369.14 >3000.00

–

–

30.42 >98.62

12.13

alu-v2

5

7 >3000.00

840.25 >3000.00

–

–

34.72 >86.41

24.20

alu-v3

5

7 >3000.00

764.04 >3000.00

–

–

45.69 >65.66

16.72

all possible circuits in parallel. The differences in the resulting quantum costs are documented in Table 4.10. Column #S OL denotes the number of solutions found by the BDD approach, while QC denotes the minimal as well as the maximal quantum costs for the determined realizations. Considering the quantum costs of the obtained Toffoli circuits lead to further significant improvements. For example, circuits representing function 4_49 have quantum cost of 32 in the best case, while in the worst case quantum cost of more than 70 are required. Thus, in contrast to previous algorithms, the BDD-based synthesis is not only faster but also another quality criterion—the resulting quantum costs—is applicable. Synthesis with Extended Libraries Finally, the application of further gate types to the BDD-based synthesis is shown. This is done by extending the universal gate formula with further gates, i.e. Fredkin and Peres gates.

4.3 Improved Exact Synthesis Table 4.10 Quantum costs of resulting circuits

89 F UNCTION

#S OL

d

QC

R EVERSIBLE FUNCTIONS mod5mils

5

12

13–13

graycode6

5

1

5–5

3_17

6

7

14–14

mod5d1

7

1208

11–15

mod5d2

8

135

12–20

hwb4

11

264

23–39

4_49

12

374

32–72

E MBEDDED IRREVERSIBLE FUNCTIONS rd32-v0

4

4

12–12

rd32-v1

5

20

13–13

4mod5-v0

5

1176

9–21

4mod5-v1

5

592

9–25

decod24-v0

6

75

10–34

decod24-v1

6

3

14–22

decod24-v2

6

23

14–26

decod24-v3

7

1950

11–43

alu-v0

6

824

14–38

alu-v1

7

850

15–27

alu-v2

7

16296

15–55

alu-v3

7

132

15–39

The results are shown in Table 4.11. The respective depth (d), the run-time of the synthesis (T IME), the number of solutions (#S OL), and the quantum costs (QC) are listed. MCT+MCF denotes the results for a set of gates including multiple control Toffoli and multiple control Fredkin gates, MCT+P denotes the results for the set of gates including multiple control Toffoli and Peres gates, and MCT+MCF+P denotes the results for the set of all three gate types. As expected, extending the gate library leads to smaller realizations as for example the results for hwb4 show. While the minimal MCT-circuit for this function consists of eleven gates, it can be reduced by three more gates if additionally Peres gates are used. Furthermore, improvements with respect to the number of gates can be achieved for alu, 3_17, mod5d2, 4_49, rd32 and decod24, respectively. However, with an increasing number of gates to be considered, the run-times increase as well. This can be seen e.g. for function 4_49 or 4mod5. Only for the functions where the extension of the gate library leads to smaller circuits, the runtimes sometimes decrease (e.g. for function alu with the MCT+MCF library), since fewer iterations of the main flow have to be performed (see Sect. 4.1).

n

d

T IME

MCT+MCF

4

4_49

5

10

9

8

7

5

5

0.52

1193.90

37.36

63.96

15.08

0.02

2.77

4

4

5

5

4

4

4

4

5

5

5

5

rd32-v0

rd32-v1

4mod5-v0

4mod5-v1

decod24-v0

decod24-v1

decod24-v2

decod24-v3

alu-v0

alu-v1

alu-v2

alu-v3

5

5

5

4

6

6

5

5

5

5

5

4

10.00

8.66

6.73

0.43

0.07

0.09

0.04

0.05

3.52

4.09

0.10

0.02

I NCOMPLETELY S PECIFIED F UNCTIONS

5

mod5d1

4

5

3_17

hwb4

3

graycode6

mod5d2

5

6

mod5mils

C OMPLETELY S PECIFIED F UNCTIONS

F UNCTION

126

224

114

22

292

1435

12

13

2792

3672

780

4

32

774720

135

1352

2

1

12

#S OL

Table 4.11 Synthesis results using other gate libraries

17–33

17–39

17–33

16–30

12–32

12–38

15–23

11–25

9–29

9–23

11–41

12–12

54–58

31–51

12–20

11–19

15–15

5–5

13–13

QC

–

6

6

6

5

5

5

4

4

5

3

2

–

8

6

7

5

5

5

>3000.00

189.73

202.87

181.43

0.06

0.06

0.05

0.03

0.25

2.96

0.02

0.01

>3000.00

10.03

4.61

63.62

0.02

4.22

0.90

T IME

MCT+P d

–

280

638

29900

300

180

268

5

768

35088

12

4

–

164

8

5632

9

1

24

#S OL

–

22–40

18–32

12–64

11–29

11–23

14–27

13–14

7–18

8–27

9–9

8–8

–

23–29

12–19

10–23

11–12

5–5

12–13

QC

5

5

5

4

5

5

5

4

4

5

3

2

–

8

6

7

5

5

5

d

124.04

108.39

97.95

1.65

0.09

0.09

0.09

0.04

0.89

36.32

0.04

0.02

>3000.00

64.83

9.60

250.81

0.03

7.19

1.50

T IME

MCT+MCF+P

431

402

198

38

513

911

913

8

768

58176

12

4

–

1084

8

5856

43

1

24

#S OL

17–34

17–39

17–33

16–30

11–31

11–31

14–29

13–16

7–18

8–39

9–9

8–8

–

23–33

12–19

10–25

11–20

5–5

12–13

QC

90 4 Exact Synthesis of Reversible Logic

4.4 Summary and Future Work

91

4.4 Summary and Future Work Even if they are only applicable to small functions, exact synthesis methods are important in the context of evaluating heuristic methods, determining minimal building blocks (e.g. for the BDD-based synthesis as introduced in Sect. 3.2), and other aspects as well. In this chapter, several approaches based on satisfiability techniques have been introduced enabling exact synthesis for functions with up to six variables and leading to circuits with up to twelve gates. A comparison with the results obtained by previous approaches showed: Smaller circuits than the currently best known ones have been synthesized or, for the first time, the minimality of existing circuits was approved. Furthermore, it was shown that choosing an encoding for exact Toffoli circuit synthesis is crucial to the resulting run-times. Lifting the originally proposed Boolean SAT encoding to the SMT- or a problem-specific level accelerates the synthesis time by three or four orders of magnitudes, respectively. Complementary, applying quantifiers together with BDDs, a further speed-up by more than a factor of 100 can be observed in the best case. By the approaches proposed in this chapter, the number of reversible gates as well as the number of quantum gates (and therewith also quantum cost) have been considered. Depending on the addressed physical realization, further constraints (e.g. concurrent gates, adjacent gates, etc.) are important as well. Thus, exact synthesis with respect to further cost criteria might be a topic for future work. Chapter 6 discusses this aspect in detail and also proposes one respective exact approach for this aim. The results of this chapter build the basis for future investigations. The QBF encoding leads to the best results (in particular, with respect to run-time), since it does not need the exponential duplication of the instance for each truth table line. However, this encoding is still on the Boolean level, since a BDD is used as underlying solving engine. Thus, lifting the quantified encoding to higher levels of abstraction (as done for the Boolean SAT encoding) would be a promising task for future work. Therefore, respective solving engines efficiently supporting quantifiers must be available first. Additionally, the encoding itself is improvable. So far, all possible gate type combinations are tried by the solving engine. But, many combinations are redundant and can be ignored. Identifying easy to detect redundancies and exclude those from the search space may accelerate the solving process. Also, special function classes (e.g. symmetric functions) probably can be synthesized faster if the respective properties are fully exploited in the encoding. Furthermore, the application of advanced solving techniques like incremental SAT (see e.g. [WKS01]) seems to be promising, since several iterations of very similar instances are sequentially solved in the proposed synthesis approach.

Chapter 5

Embedding of Irreversible Functions

Quite often reversible logic should be synthesized for irreversible functions. Thus, the problem of embedding is an important aspect. Partially, how to handle irreversible functions during synthesis has already been discussed in the previous chapters (see e.g. Sects. 3.1.1 and 4.2.3). Additional lines are thereby introduced and the resulting constant inputs, garbage outputs, and don’t care conditions are arbitrarily assigned to concrete values. Further options exist how (i.e. in which order) to arrange the outputs in the circuit to be synthesized. Overall, functions can be embedded in different ways whereby the concrete don’t care assignments as well as the chosen output arrangement may have a significant impact on the resulting circuit size. As an example, for BDD-based synthesis introduced in Sect. 3.2 different output orders are applied to the building block-functions as they lead to better substitutions of the respective nodes. Since in the last chapters synthesis approaches (in particular the transformation-based approach and the exact synthesis method) have been described, now they can be used to evaluate the effect of different embeddings. In this chapter, the different aspects of embedding mentioned above are investigated in detail. First, strategies for the don’t care assignment [MDW09, MWD09] are proposed. More precisely, a greedy approach, a method based on the Hungarian algorithm, and an XOR-based strategy are introduced. Even if these strategies address don’t care assignments of the outputs only, it can be shown that the chosen method is crucial to the synthesis results. Afterwards, the order of outputs in the function to be synthesized is considered. Usually, each output is set to a fix position. But since in general, the output order is irrelevant for a given reversible function f , a new synthesis paradigm [WGDD09] is proposed that determines an equivalent circuit realization for f modulo the output permutation. That is, the result of the synthesis is a circuit whose outputs have been permuted. Therefore, distinct methods to efficiently determine “good” output permutations are introduced. As a result, significantly smaller circuits (even smaller than the ones previously obtained by the exact approaches) can be synthesized if this new synthesis paradigm is applied. In the following, the embedding problem together with the number of possibilities is described in Sect. 5.1, which therewith builds the motivation for the remaining sections. Afterwards, the approaches for don’t care determination (Sect. 5.2) R. Wille, R. Drechsler, Towards a Design Flow for Reversible Logic, DOI 10.1007/978-90-481-9579-4_5, © Springer Science+Business Media B.V. 2010

93

94

5 Embedding of Irreversible Functions

Table 5.1 Embedding of an adder (b) Incomplete embedding (a) Original adder

(c) Complete embedding

cin x y

cout sum

0 cin x y

cout sum g1 g2

0 cin x y

cout sum g1 g2

0 00

0

0

0 0

0 0

0

0

–

–

0 0

0 0

0

0

0

0

0 01

0

1

0 0

0 1

0

1

–

–

0 0

0 1

0

1

0

0

0 10

0

1

0 0

1 0

0

1

–

–

0 0

1 0

0

1

0

1

0 11

1

0

0 0

1 1

1

0

–

–

0 0

1 1

1

0

0

0

1 00

0

1

0 1

0 0

0

1

–

–

0 1

0 0

0

1

1

0

1 01

1

0

0 1

0 1

1

0

–

–

0 1

0 1

1

0

0

1

1 10

1

0

0 1

1 0

1

0

–

–

0 1

1 0

1

0

1

0

1 11

1

1

0 1

1 1

1

1

–

–

0 1

1 1

1

1

0

0

1 0

0 0

–

–

–

–

1 0

0 0

0

0

0

1

1 0

0 1

–

–

–

–

1 0

0 1

0

0

1

0

1 0

1 0

–

–

–

–

1 0

1 0

0

0

1

1

1 0

1 1

–

–

–

–

1 0

1 1

0

1

1

1

1 1

0 0

–

–

–

–

1 1

0 0

1

0

1

1

1 1

0 1

–

–

–

–

1 1

0 1

1

1

0

1

1 1

1 0

–

–

–

–

1 1

1 0

1

1

1

0

1 1

1 1

–

–

–

–

1 1

1 1

1

1

1

1

and output permutation (Sect. 5.3) are proposed and evaluated. At the end of this chapter, all results are summarized and future work is sketched.

5.1 The Embedding Problem As already described in Sect. 3.1.1, at least g = log2 (μ) additional outputs are required to embed a completely specified irreversible function into a reversible function, where μ is the maximum number of times an output pattern is repeated in the irreversible function. For an irreversible function f : Bn → Bm , this means that the reversible embedding has m + g outputs. Furthermore, c constant inputs must be added such that n + c = m + g. Once the garbage outputs and constant inputs are added, an open issue is how to assign the don’t care conditions in the expanded truth table as shown by the following example. Example 5.1 Consider the adder function shown in Table 5.1(a). This function has three inputs (the carry-in cin as well as the two summands x and y) and two outputs (the carry-out cout and the sum). The function is irreversible, because the number of inputs differs from the number of outputs. Since the output pattern 01 appears three times (as does the output pattern 10), adding one additional output (leading to the same number of input and outputs) cannot make the function reversible. In

5.1 The Embedding Problem

95

Fig. 5.1 Circuits obtained with different embeddings

fact, log2 (3) = 2 additional outputs (and therewith one constant input) must be added. This is shown in Table 5.1(b). But since this incompletely specified function is not applicable for many synthesis approaches, afterwards the don’t cares must be assigned. One possible, albeit naive, embedding is shown in Table 5.1(c). This embedding was found by assigning the garbage outputs to the patterns 00, 01, and 10 in order for each of the output patterns in the top half of the table and then completing the bottom half of the table using the remaining available output patterns in numerical order. Remark 5.1 Not every synthesis approach requires a completely specified reversible function. For example, the SAT-based approach introduced in the last chapter can also handle don’t cares (see Sect. 4.2.3). However, most of the other synthesis approaches (as e.g. [Ker04, GAJ06, MDM07] and the transformation-based method described in Sect. 3.1.2) need a completely specified function. For these approaches, a completely specified embedding is required. To appreciate the complexity of choosing a don’t care assignment, consider Table 5.1(b). There are 4, 4, 3, 4, 2, 3, 2, and 4 choices for completing the don’t cares in the top eight rows of the table, respectively for a total of 9216 choices. The bottom eight rows of the table can then be completed in 8! ways. Lastly, the outputs can be permuted in 4! = 24 ways. Combining these yields 9216 · 40320 · 24 = 8, 918, 138, 880 possible embeddings for this small example. Each respective embedding may have an effect on the synthesis results, i.e. on the size of the resulting circuits. For example, the embedding from Table 5.1(c) lead to the circuit depicted in Fig. 5.1(a) (obtained by the transformation-based approach from Sect. 3.1.2). In contrast, using the embedding introduced in Sect. 3.1.1 (see Table 3.2 on p. 29), a significantly smaller circuit result as shown in Fig. 5.1(b). Thus, in the next two sections strategies how to determine “good” don’t care assignments and output permutations, respectively, are introduced and evaluated.

96

5 Embedding of Irreversible Functions

5.2 Don’t Care Assignment In this section, methods for don’t care assignment are presented and evaluated, respectively, that complete a reversible embedding of an irreversible function. It is assumed that always the minimal number of outputs (i.e. log2 (μ)) is added. Furthermore, all constant inputs are assigned to the value 0 and are always added as the most significant inputs in the truth table. This leads to a significant computational advantage as shown below and results in a circuit overhead of at most one NOT gate per constant input.

5.2.1 Methods In total, three approaches are introduced: A greedy algorithm, a method based on the Hungarian algorithm, and an XOR-based procedure. Afterwards, an incomplete application of the don’t care assignment methods is discussed that can be applied to many existing synthesis approaches and leads to a significant simplification of the synthesis process.

5.2.1.1 Greedy Method The first method for assigning don’t cares is motivated by the basic operation of the transformation-based synthesis algorithms (see Sect. 3.1.2). Here, gates are chosen so that each input value of the truth table matches its respective output value (i.e. so that the identity is achieved). Each line of the truth table is thereby sequentially traversed. It is thus reasonable to conjecture that assigning the don’t cares so that the Hamming distance of the output patterns to the corresponding input patterns is as small as possible, should help to reduce the number of gates required. This firstly leads to a simple greedy approach. The truth table is traversed downwards starting at the first row. In each row, the following two steps are performed: 1. For each distinct output assignment in the embedding, identify the target set of rows of the table containing that pattern. Then, determine the set of output assignments which are found by assigning the don’t cares in all possible ways. The candidates are arranged in ascending numerical order. 2. For each row in the target set in turn, choose the first remaining candidate assignment with minimal Hamming distance to the input assignment for that row. Example 5.2 Table 5.2(a) shows the embedding obtained by the greedy method for the full adder. The circuits synthesized from this assignment as well as from the naive assignment given in Table 5.1(c) are shown in Fig. 5.2(a) and Fig. 5.2(c),

5.2 Don’t Care Assignment

97

Table 5.2 Resulting embeddings of an adder after don’t care assignment (b) XOR-based method (a) Greedy/Hungarian method 0

cin

x

y

cout

sum

g1

g2

0

cin

x

y

cout

sum

g1

g2

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

1

0

1

0

0

0

1

0

1

1

1

0

0

1

0

0

1

1

0

0

0

1

0

0

1

1

0

0

0

1

1

1

0

1

1

0

0

1

1

1

0

0

1

0

1

0

0

0

1

0

0

0

1

0

0

0

1

0

0

0

1

0

1

1

0

0

1

0

1

0

1

1

0

1

1

0

1

1

0

1

0

1

0

0

1

1

0

1

0

1

0

0

1

1

1

1

1

1

1

0

1

1

1

1

1

0

1

1

0

0

0

1

0

0

0

1

0

0

0

1

0

0

0

1

0

0

1

0

0

0

1

1

0

0

1

1

1

1

1

1

0

1

0

0

0

1

0

1

0

1

0

1

1

1

0

1

0

1

1

0

0

1

1

1

0

1

1

0

0

0

1

1

1

0

0

1

1

0

0

1

1

0

0

1

1

0

0

1

1

0

1

1

1

0

1

1

1

0

1

0

0

1

1

1

1

1

0

1

1

1

0

1

1

1

0

0

0

1

0

1

1

1

1

0

1

1

1

1

1

1

1

0

1

0

1

Fig. 5.2 Circuits for the embeddings

98

5 Embedding of Irreversible Functions

respectively.1 The greedy assignment method leads to a circuit with 7 gates and quantum cost of 27, while with the naive embedding a circuit with 20 gates and quantum cost of 44 results.

5.2.1.2 Applying the Hungarian Algorithm Furthermore, the Hamming distance can be applied to formulate the don’t care assigning problem as an instance of the Hungarian algorithm [HL05]. To this end, let S be the set of truth table rows sharing a common output pattern in the irreversible function and let T be the set of possible assignments to the don’t cares to complete those rows. |T | is equal to 2g where g is the number of garbage lines added to permit the embedding of the irreversible function into a reversible one. Then, the don’t care assignment problem is to associate each element of S with a unique element from T . Let K(Si , Tj ) be the “cost” of associating the don’t care assignment Tj with Si , for which the hamming distance is applied. More precisely, K(Si , Tj ) is the Hamming distance between the completely specified truth table output pattern and the corresponding input pattern when Si is completed using Tj . This formulation can be expressed in tabular form with a row for each Si and a column for each Tj with each K(Si , Tj ) in the corresponding table entry. Assigning the don’t cares to minimize the total Hamming distance is then a matter of choosing one entry in each row such that those entries appear in unique columns and such that the sum of the chosen entries is minimal. This is a standard assignment problem. The Hungarian algorithm is a well-known method [HL05] for solving the assignment problem in polynomial time and thus has been applied to solve this instance. The only issue of note here is that storing the potentially very large assignment matrix is avoided, since Hamming distance is easily computed as needed—in fact more quickly than a matrix access. Example 5.3 Applying the Hungarian algorithm to the considered adder function, the same assignment as for the greedy method results. This may happen since both approaches use the Hamming distance as cost metric. Nevertheless, the experiments in Sect. 5.2.2 show that both assignment methods lead to notable differences for other (mainly larger) functions.

5.2.1.3 XOR-based Method The third method proposed for don’t care assignments is based on the observation that for many functions (in particular for arithmetic) a good embedding of an irreversible function into a reversible one is based upon setting the don’t care outputs to XOR combinations of the primary inputs. More precisely, the following steps are performed: 1 The

transformation-based synthesis method from Sect. 3.1.2 has been used to synthesize these circuits.

5.2 Don’t Care Assignment

99

1. For each truth table row i of the embedding f : Bn → Bn (i.e. 0 ≤ i < 2n ): a. Set k = i so that k represents the current input vector as a natural number. b. Set p = 0, q = 0. c. For each output fj of the embedding (i.e. 0 ≤ j < n): i. If fj is a garbage output, set q = q ⊕ kj and pj = q with kj (pj ) denoting the j th bit-value of the binary encoding of k (p). ii. Otherwise set pj to the j th bit-value of fi . In doing so, an output assignment is created (represented by the natural number p), where the don’t cares are assigned to a value achieved by an XORcombination with the respective input values. d. If p represents an already assigned output pattern, increment k and repeat from Step 1b. e. Set the output of the ith truth table line to p. Example 5.4 Table 5.2(b) shows the embedding obtained by the XOR-based method for the full adder. The circuit obtained from this assignment is shown in Fig. 5.2(b) (also synthesized using the transformation-based synthesis method). The XOR-based method yields a circuit with five gates and quantum costs of 13, which is significantly smaller than the circuits obtained with the greedy/Hungarian and the naive embedding, respectively. Overall, these circuits clearly show the importance of a good don’t care assignment. 5.2.1.4 Incomplete Don’t Care Assignment As noted above, all constant inputs take the value 0 and are always the most significant inputs in the truth table. This means that the irreversible function is always embedded in the first rows of the reversible truth table, while the remaining rows are completely don’t care (see e.g. Table 5.1(c)). In particular, if the original function has n primary inputs and, furthermore, c constant inputs are added, only the first 2n truth table rows of the embedding are of interest. The remaining (2c − 1)2n rows can be ignored. Given this construction, a synthesis method that works row by row from the top of the truth table (as e.g. the transformation-based synthesis approach from Sect. 3.1.2 and its derivates) can stop after transforming 2n rows. Because of this, it is not necessary to complete a don’t care assignment beyond the (2n )th row.2

5.2.2 Experimental Results In this section, experimental results obtained with the described don’t care assigning approaches are documented. To this end, (irreversible) functions from 2 Note that when this simplification is employed, bidirectional synthesis methods cannot be applied,

because don’t cares occur in the latter truth table lines so that no definition of the inverse function is possible.

100

5 Embedding of Irreversible Functions

RevLib [WGT+08] have been embedded using the proposed methods. Afterwards, the resulting embeddings have been passed to (1) the transformation-based synthesis from Sect. 3.1.2 (denoted by T RANSFORMATION - BASED A LGORITHM) or (2) to an extended version combining transformation-based synthesis with a searchbased method as proposed in [MWD09] (denoted by C OMBINED S YNTHESIS A L GORITHM ), respectively. An AMD Athlon 3500+ with 1 GB of memory was used for the experiments. Table 5.3 presents the results for each of the synthesis approaches and don’t care assignment methods, respectively. For the resulting circuits, the gate count (denoted by d) and the quantum cost (denoted by QC) are shown. Furthermore, for each function, the best result with respect to quantum cost is highlighted in bold. Run-times are not documented, since every circuit in Table 5.3 was found in less than one CPU second. The results show that the chosen embedding is crucial to the synthesis results. For example, the quantum costs of the circuits representing function rd73_69 range from 1112 to 184 using the combined synthesis approach. Thus, an improvement of nearly one order of magnitude can be achieved only by modifying the assignment of the don’t cares. In the next section, the second aspect of embedding, namely permutation of outputs, is considered in detail, where similar results have been achieved.

5.3 Synthesis with Output Permutation Usually, the outputs of a reversible function (or embedding, respectively) to be synthesized are set to a fix position. Since in general the output order is irrelevant for a given reversible function f , in this section a synthesis methodology—emphasized as Synthesis with Output Permutation (SWOP)—is proposed. SWOP determines a circuit for the function f modulo output permutation. That is, the result is a circuit representing the desired function but maybe with an adjusted order of outputs. This enables the determination of smaller realizations. In a naive way, synthesis with output permutation can easily be applied to existing approaches just by encoding all permutations, synthesize each in one turn, and keep the best one. Since each respective output order has to be considered, this results in an increase of factor n! (where n is the number of variables of the reversible n! function). If garbage outputs occur, this complexity can be reduced to g! (where g is the number of garbage outputs), which might still be a large number. Thus, in this section two approaches are introduced that efficiently determine “good” output permutations for both, exact as well as heuristic synthesis approaches. Moreover, applying the proposed exact synthesis with output permutation, also minimality with respect to all possible permutations is guaranteed. In the remainder of this section, both approaches are described in detail. First, the general idea, the best case benefits, as well as the complexity of the proposed synthesis paradigm are introduced and discussed in Sect. 5.3.1. Afterwards, exact SWOP and heuristic SWOP are described in Sects. 5.3.2 and 5.3.3, respectively. Finally, experimental results are given in Sect. 5.3.4.

9

10

11

sym9_71

rd84_70

5

mini-alu_84

rd73_69

5

alu_9

7

5

4mod7_26

7

5

4mod5_8

sym6_63

5

4gt5_21

rd53_68

2

5

4gt4_20

5

5

4gt13_25

6

5

4gt12_24

one-two-three_27

5

4gt11_23

decod24-enable_32

1

5

4gt10_22

3

1

2

1

2

3

0

1

1

1

1

1

1

1

1

1

4

2

4

d

rd32_19

n

decod24_10

F UNCTION

7

9

6

6

4

2

2

3

4

2

4

4

4

4

4

4

4

2

0

g

104

76

80

36

27

15

9

22

9

21

7

3

6

1

3

1

3

5

7

1823

1047

1112

485

228

39

33

110

65

65

19

19

58

13

55

5

47

13

11

7

3

6

1

3

1

3

5

7

104

76

80

36

27

11

9

26

16

21

d

1823

1047

1112

485

228

35

33

114

72

65

19

19

58

13

55

5

47

13

11

QC

H UNGARIAN

XOR- BASED

47

51

40

17

22

14

9

25

28

21

5

6

9

3

6

4

4

5

7

d

446

573

184

133

137

42

33

125

224

65

9

30

65

27

62

12

40

13

11

QC

9

3

7

1

3

1

3

5

7

111

210

100

36

22

15

9

22

12

15

d

2100

4368

2187

777

187

39

33

98

64

55

25

19

79

13

55

5

47

17

11

QC

9

3

7

1

3

1

3

5

7

111

210

100

36

22

13

9

28

16

15

d

2100

4368

2187

777

187

37

33

108

68

55

25

19

79

13

55

5

47

17

11

QC

H UNGARIAN

7

9

3

7

1

3

1

3

5

111

210

100

36

22

15

9

22

32

15

d

2100

4368

2187

777

187

39

33

110

252

55

25

19

79

13

55

5

47

17

11

QC

XOR- BASED

G REEDY

QC

G REEDY

d

T RANSFORMATION - BASED A LGORITHM

C OMBINED S YNTHESIS A LGORITHM

Table 5.3 Comparison of don’t care assignment methods

5.3 Synthesis with Output Permutation 101

102 Table 5.4 Function specification

5 Embedding of Irreversible Functions x1

x2

x3

f1

f2

f3

0

0

0

0

0

0

0

0

1

0

1

0

0

1

0

1

0

0

0

1

1

1

1

1

1

0

0

0

0

1

1

0

1

0

1

1

1

1

0

1

0

1

1

1

1

1

1

0

Fig. 5.3 Minimal Toffoli circuits

5.3.1 General Idea Many synthesis approaches get a reversible function (or a reversible embedding, respectively) f : Bn → Bn , where each specified output has a fix position. Thus, often only a fix order of outputs is considered during the synthesis. Example 5.5 Consider the function specification shown in Table 5.4. The reversible function maps (x1 , x2 , x3 ) to (x2 , x3 , x2 x3 ⊕ x1 ) = (f1 , f2 , f3 ). A minimal Toffoli circuit for this function is shown in Fig. 5.3(a). This circuit consists of 6 gates. Usually, the order of the outputs is irrelevant and can be swapped. As shown in the following example, this can lead to a much more compact circuit. Example 5.6 In Fig. 5.3(b) a Toffoli circuit is depicted which computes the same reversible function than the Toffoli circuit shown in Fig. 5.3(a). But in contrast, the three output functions have been reordered to another position in the output vector. More precisely, the Toffoli circuit shown in Fig. 5.3(b) maps the input (x1 , x2 , x3 ) to the output (x2 x3 ⊕ x1 , x2 , x3 ) = (f3 , f1 , f2 ). This reduces the overall number of gates from 6 to 1, i.e. 5 gates have been saved. Motivated by this example, a new synthesis paradigm denoted as Synthesis with Output Permutation (SWOP) is introduced. To this end, synthesis approaches are ex-

5.3 Synthesis with Output Permutation

103

Fig. 5.4 Permutations with garbage outputs

Fig. 5.5 Realization of a permutation

tended in such a way that different (or all) output permutations are considered. This causes a significant increase in complexity, since in general all possible permutations have to be checked (resulting in n! different synthesis calls in total). This can be slightly reduced, if a function containing garbage outputs should be synthesized. n! different permutations have to be considered, since permutations of Then, only g! the garbage outputs can be ignored. Example 5.7 Figure 5.4 shows all n! possible permutations for function with n = 3 variables and g = 2 garbage outputs (denoted by g1 and g2 ). Since the garbage outputs are left unspecified, the permutations that only swap garbage outputs can be skipped (i.e. the last three permutations of Fig. 5.4). Thus, only 3! 2! = 3 permutations instead of all 3! = 6 permutations are considered. Nevertheless, the number of additional checks is quite high. In contrast, synthesis with output permutation may lead to significant reductions in the resulting circuit sizes. To illustrate this, consider Fig. 5.5 depicting the gates needed to permute two signals in a reversible circuit with Toffoli gates (in total three gates are required). Since the best position of the outputs is unknown at the beginning of the synthesis process, outputs may be placed arbitrarily in the function specification. Then, the three gates of Fig. 5.5 may be needed to permute the value of a signal to the position given by the function. If in contrast output permutation is considered during the synthesis, the number of gates of the resulting circuit may be significantly lower. Lemma 5.1 The number of gates in a reversible circuit obtained by common synthesis approaches may be up to 3 · (n − 1) higher than the number of gates in a circuit where synthesis with output permutation is applied (with n being the number of variables). Proof Let d be the minimal number of gates of a circuit obtained by enabling output permutation during synthesis. To move one output line to the position given by the function three Toffoli gates are required (see Fig. 5.5). At most n − 1 lines need to be moved. It follows that the cost of the minimal circuit, where no output permutation is allowed, is less than or equal to d + 3 · (n − 1).

104

5 Embedding of Irreversible Functions

Remark 5.2 Lemma 5.1 gives a best case improvement. Because of the heuristic nature of most of the synthesis approaches, of course also circuits with a larger number of gates may result. This motivates the investigation of methods that exploit output permutation during the synthesis. The next two sections show how this is realized using an exact synthesis approach as well as a heuristic synthesis approach, respectively.

5.3.2 Exact Approach As intensely considered in Chap. 4, exact synthesis approaches determine minimal circuits for a given function, i.e. circuits with the minimal number of gates. The methods introduced in Chap. 4 exploit Boolean satisfiability (SAT) techniques where the basic idea is to check whether there exists a Toffoli circuit for a reversible function with d gates (starting with d = 1, where d is increased in each iteration if no realization is found). The respective checks are performed by representing the problem as an instance of SAT which is afterwards solved by a SAT solver or similar (specialized) solving engines (for more details see the respective sections in Chap. 4). To describe how synthesis with output permutation is applied to this exact synthesis method, the concrete SAT encoding is sketched as follows: Definition 5.1 Let f : Bn → Bn be a reversible function to be synthesized. Then, the SAT instance of the respective synthesis problem is given as Φ∧

n −1 2

([inpi ]2 = i ∧ [outi ]2 = f (i)),

i=0

where • inpi is a Boolean vector representing the inputs of the circuit to be synthesized for truth table line i, • outi is a Boolean vector representing the outputs of the circuit to be synthesized for truth table line i, and • Φ is a set of constraints representing the synthesis problem as described in Sects. 4.2.1 and 4.3.1, respectively. As an example Fig. 5.6(a) shows the simplified representation of the synthesis problem for the function specified in Table 5.4 (where the values of the truth table are given as integers). Applying SWOP to the exact approach and still ensuring minimality, all pern! mutations have to be considered. This can be done—as mentioned above—by g! separate synthesis calls. However, exploiting the advanced techniques of the used SAT solvers leads to faster synthesis. Therefore, an adjusted encoding is proposed which requires one additional Boolean vector.

5.3 Synthesis with Output Permutation

105

Fig. 5.6 Encoding for exact synthesis

Definition 5.2 Let f : Bn → Bn be a reversible function to be synthesized. Then, p = (plog n! , . . . , p1 ) is a Boolean vector representing the binary encoding of a 2 g!

n! natural number p ∈ {1, . . . , g! } which indicates the chosen output permutation of the circuit.

Using this vector, the SAT encoding is slightly extended: According to the assignments to p (set by the SAT solver), a value for p is determined, which selects the current output permutation. Depending on this permutation the respective output order is set during the search. More formally, the encoding of Definition 5.1 is extended as follows: Φ∧

n −1 2

([inpi ]2 = i ∧ [outi ]2 = πp (f (i))).

i=0

The extended encoding of the synthesis problem for the function specified in Table 5.4 is illustrated in Fig. 5.6(b). If the solver finds a satisfying assignment for this SWOP instance, one can obtain the circuit from the result as described in Chap. 4 and the best permutation is provided by the assignment to p. Overall, this extension allows exact SWOP with only one synthesis call in conn! separate ones. Furthermore, since the variables of p are an integral part of trast to g! the search space, the permutations are checked much more efficiently. Because of modern SAT techniques (in particular conflict analysis [MS99]), during the search process reasons for conflicts are learned. This learned information prevents the solver from reentering non-solution search space, i.e. large parts of the search space are pruned. In contrast, this information is not available when each permutation is checked by separate calls of the solver. Thus, exact synthesis with output permutation is possible in feasible run-time when learning is exploited.

5.3.3 Heuristic Approach To apply SWOP to a heuristic approach, the transformation-based algorithm described in Sect. 3.1.2 is considered. To avoid the construction of all possible permutations (which would lead to a complexity increase of n!, since the transformationbased approach does not support garbage outputs), a SWOP-based synthesis heuristic using a sifting algorithm is proposed. This algorithm is inspired by [Rud93] and

106 Fig. 5.7 Heuristic SWOP

5 Embedding of Irreversible Functions

(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12)

HeuristicSWOP(f : Bn → Bn ) // f is given in terms of a truth table perm = (1, 2, . . . , n); dbest = synthesize(perm); best_perm = perm; for i = 0 to n − 2 do for j = i + 1 to n − 1 do tmp_perm = swap(perm, i, j ); dtmp = synthesize(tmp_perm); if (dtmp < dbest ) best_perm = tmp_perm; perm = best_perm;

reduces the above mentioned complexity to n2 . Because of the heuristic behavior of sifting, maybe not the best permutation is determined. However, as the experiments in Sect. 5.3.4 show, significant improvements can be achieved in feasible run-times. The pseudo-code for the sifting algorithm is given in Fig. 5.7. First, an initial permutation (given by the function) is chosen and the circuit for this function is synthesized (line 3 and line 4). The gate count of this first realization is stored. After this, for each output the best position within the current permutation is searched. This is done by swapping the position of the current output with each of the other positions leading to new permutations (line 8). For each of these new permutations, the respective circuit is synthesized (line 9). If the gate count of such a circuit is smaller than the current best known gate count (line 10), the current permutation is stored as being the best one (line 11). When each position for one output has been checked, the best permutation of these checks is used for the remaining outputs (line 12). In summary, for each of the first n − 1 outputs, the algorithm will find a new position that will result in a realization with the fewest gates—when synthesized with the transformation-based approach. Therewith the complexity of SWOP can be reduced while still improving the obtained results as the next section will show.

5.3.4 Experimental Results This section provides experimental results for SWOP. In total, four different aspects are studied: (1) the reduction of the complexity of SWOP when garbage outputs are considered, (2) the results of exact SWOP in comparison to the common exact synthesis, (3) the results of heuristic SWOP in comparison to the common transformation-based approach, and (4) the quality (with respect to the number of gates) of the circuits synthesized by SWOP in comparison to the currently best known realizations. For exact synthesis, the SWORD approach introduced in Sect. 4.3.1 has been used. The SWOP extension was implemented on top of this approach. As heuristic approach, the transformation-based approach (including template matching as described in [MDM05]) was applied. The respective benchmark functions have been

5.3 Synthesis with Output Permutation

107

Table 5.5 SWOP considering garbage outputs SWOP

F UNCTION n

g

d

n!

OPT. SWOP T IME

n! g!

T IME

I MPR

4mod5

5

4

5

120

233.18

5

7.37

decod24

4

0

5

24

0.10

24

0.10

31.6 1.0

gt4

4

3

3

24

<0.01

4

<0.01

1.0

gt5

4

3

1

24

0.01

4

<0.01

>1.0

low-high

4

3

4

24

3.71

4

0.39

9.51

zero-one-two

4

1

4

24

0.03

24

0.02

1.5

maj4_1

5

4

6

120

3500.90

5

2125.62

1.6

maj4_2

5

4

5

120

191.92

5

4.19

45.8

alu

5

4

6

120

2013.72

5

61.24

32.9

mini_alu_1

4

2

5

24

0.28

12

0.19

1.5

mini_alu_2

5

3

7

120

930.60

20

474.42

1.9

mini_alu_3

5

3

5

120

9.60

20

2.07

4.6

taken from RevLib [WGT+08]. All experiments have been carried out on an AMD Athlon 3500+ with 1 GB of main memory. All run-times are given in CPU seconds. The timeout was set to 3600 CPU seconds. 5.3.4.1 SWOP with Garbage Outputs In a first series of experiments the different complexities are compared which occur if Toffoli circuits for functions containing garbage outputs are synthesized. Here, n! permutations are considered. instead of n! permutations, only g! Table 5.5 shows the results of the exact SWOP approach with n! permutations n! permutations for functions containing garbage outputs. The first three and with g! columns provide the name of the function, the number of circuit lines n, and the number of garbage outputs g, respectively. The minimal number of gates of the obtained Toffoli circuits is given in column d. Then, the run-times of SWOP with n! n! permutations are given, respectively (denoted by T IME). Furthermore, and with g! n! the improvement of the optimized SWOP (i.e. the synthesis with only g! permutations) in comparison to SWOP with all n! permutation is provided (i.e. run-time of SWOP divided by run-time of OPT. SWOP). As expected the reduction of permutations leads to better run-times for all functions. Improvements of up to a factor of 45 can be achieved in the best case.

5.3.4.2 Exact SWOP In this section, exact SWOP is compared to the exact approach from Sect. 4.3.1. The results are shown in Table 5.6. Here again, the first column provides the name

108

5 Embedding of Irreversible Functions

Table 5.6 Comparison of exact synthesis to exact SWOP E XACT S YNTHESIS

F UNCTION n

g

d

T IME

SWOP-T IME S YN -T IME

E XACT SWOP d

vs.

n! g!

T IME

4mod5

5

4

5

0.9

5

7.4

8.2 > 5

decod24

4

0

6

0.1

5

0.1

1.0 < 24

gt4

4

3

4

<0.1

3

<0.1

gt5

4

3

3

<0.1

1

0.1

1.0 < 4

low-high

4

3

5

0.2

4

0.4

2.0 < 4

zero-one-two

4

1

5

<0.1

4

<0.1

maj4_1

5

4

6

438.0

6

2125.6

1.0 < 4

1.0 < 24 4.9 < 5

maj4_2

5

4

6

13.6

5

4.2

0.3 < 5

alu

5

4

7

423.3

6

61.2

0.1 < 5

mini_alu_1

4

2

5

<0.1

5

0.2

2.0 < 12

mini_alu_2

5

3

8

2460.0

7

474.4

0.2 < 20

mini_alu_3

5

3

5

0.2

5

2.1

10.5 < 20

3_17

3

0

6

<0.1

5

<0.1

1.0 > 6

graycode6

6

0

5

<0.1

5

13.5

13.5 < 720

mod5d1

5

0

7

11.8

7

184.1

15.6 < 120

mod5d2

5

0

8

9.9

8

1097.6

110.8 < 120

mod5mils

5

0

5

0.1

5

1.7

17.0 < 120

of the function, while n and g denote the number of variables and the number of garbage outputs, respectively. The next columns give the minimal number d of gates determined by the two approaches and the corresponding run-times. The last column shows information relating to the complexity, i.e. the run-time overhead when output -Time n! permutation is considered ( SWOP Syn-Time ) compared to the factor g! . It can be seen that for many functions, SWOP found smaller circuits than the ones generated by the previous exact synthesis approach. Thus, removing the restriction for the output order leads to smaller circuits for many of the well known benchmark functions. As expected, the run-time for SWOP is higher in comparison to the run-time of the pure exact synthesis. This is, because the search space is obviously larger due to the number of output permutations that can be chosen. However, the increase is n! not as high as the worst case complexity ( g! ). This can be seen in the last column of Table 5.6. For all benchmarks (except 4mod5 and 3_17) the run-time of SWOP divided by the run-time of the previous synthesis approach is significantly smaller n! than g! . As explained, this is due to search space pruning, possible if the encoding is extended so that all permutations are checked in parallel. Moreover, for some benchmarks (e.g. maj4_2 or alu) the run-time of SWOP is even smaller than for a single exact solution. This reduction is caused by the fact that smaller circuits are found and thus the synthesis terminates earlier.

5.3 Synthesis with Output Permutation

109

5.3.4.3 Heuristic SWOP The results obtained by common heuristic synthesis (i.e. by the transformationbased approach including templates [MDM05]) are given in Table 5.7. More precisely, the gate counts of the resulting circuits as well as the run-time needed for their synthesis is given for (1) the original algorithm (denoted by H EURISTIC S YN THESIS ), (2) for the SWOP-based synthesis where all permutations are considered (denoted by ALL PERMS), and (3) for the SWOP-based synthesis where the sifting algorithm introduced in Sect. 5.3.3 is used (denoted by S IFTING). As can clearly be seen, the effect of output permutation is significant for most of the functions. For example, for the function aj-e13 the realization is reduced by 30 percent from 40 gates to 28 gates. The best absolute reduction of gates can be obTable 5.7 Comparison of heuristic synthesis to heuristic SWOP F UNCTION

H EURISTIC SWOP

H EURISTIC S YNTHESIS n

3_17

3

6

4_49

4

4mod5

5

S IFTING

ALL PERMS

T IME

d

T IME

d

0.03

6–7

17

0.40

9

0.03

T IME

d

0.32

6

0.25

14–22

4.09

16

1.09

9–21

10.02

9

0.75

5mod5

6

18

0.13

14–37

254.14

18

3.59

aj-e10

5

33

0.63

22–51

107.03

30

8.21

aj-e11

4

12

0.09

11–22

2.46

11

0.55

aj-e12

5

26

0.35

25–57

103.37

25

8.11

aj-e13

5

40

0.97

28–51

112.70

34

12.31

ex1

3

4

<0.1

0.08

4

0.06

<0.1

graycode3

3

2

graycode4

4

3

0.01

4–8 2–5

0.01

2

0.01

3–9

0.32

3

0.07

graycode5

5

4

0.03

4–13

4.72

4

0.31

graycode6

6

5

0.08

5–18

67.25

5

1.08

hwb3

3

7

0.06

6–11

0.32

7

0.29

hwb4

4

15

0.35

10–21

3.70

10

0.69

hwb5

5

55

1.66

38–62

153.71

44

16.49

prime5

6

15

0.20

13–40

227.05

13

3.09

prime5a

6

16

0.10

14–41

291.58

14

3.92

ham3

3

5

0.01

3–5

0.02

4

0.03

hwb6

6

125

7.08

–

>3600.00

91

89.20

hwb7

7

283

33.26

–

>3600.00

259

656.82

hwb8

8

676

152.13

–

>3600.00

641

4525.22

ham7

7

23

0.34

–

>3600.00

23

49.47

rd53

7

16

0.26

–

>3600.00

13

10.04

110 Table 5.8 Best results obtained by SWOP

5 Embedding of Irreversible Functions F UNCTION

B EST KNOWN d

SWOP d

Δd

decod24

6

5

1

gt5

3

1

2 1

3_17

6

5

4_49

16

14

2

aj-e13

40

28

12

hwb4

11

10

1

served for function hwb8. Here, 35 gates are saved in total when output permutation is applied. But, not only the improvements are of interest. Even a comparison of the best and the worst permutation (shown in column d for ALL PERMS) give some interesting insight. For example, consider the function hwb5. One output permutation results in a circuit with 38 gates, while another permutation results in 62 gates. Since a heuristic synthesis procedure is used, the results will most likely not be optimal. In fact, according to Lemma 5.1 the best case improvement (i.e. the difference between the best and the worst permutation) for hwb5 cannot be greater than 3 · (5 − 1) = 12 for minimal realizations—yet it is 24. This can be explained by the heuristic nature of the approach that does not guarantee minimality. Finally it is shown that sifting provides good results in a fraction of the run-time. For most functions with more than six variables it is not feasible to minimize the circuit considering all permutations. However, sifting offers significant improvements for these cases.

5.3.4.4 Reductions Achieved by SWOP Finally, the quality (with respect to the number of gates) of some circuits synthesized by SWOP is compared to the currently best known realizations obtained by common synthesis approaches. Table 5.8 shows a selection of functions with the gate count of the currently best known circuit realization (denoted by B EST KNOWN). The obtained gate count when output permutation is considered is given in column SWOP. Synthesis with output permutation enables the realization of smaller circuits than the currently best known ones. As an interesting example, the realizations of the hwb4 function are observed in more detail. For the original function, a minimal realization with 11 gates has been synthesized by the exact approach. Now, using output permutation it is possible to synthesize a smaller realization with only 10 gates using a heuristic approach.

5.4 Summary and Future Work

111

5.4 Summary and Future Work Embedding irreversible functions is a new synthesis step, which is not needed in traditional circuit design. Nevertheless, it is a no less important aspect, since many practically relevant functions are irreversible and therewith need an embedding to become synthesizable in reversible logic. Usually, the embedding is done in a straightforward manner—Sect. 3.1.1 shows a possible approach. But, the way how the embedding is done is crucial to the resulting circuit sizes. This chapter gave examples showing this correlation. More precisely, two aspects have been investigated in detail. First, how to assign don’t cares was addressed. Don’t cares result if additional lines are added to make an irreversible function reversible. Second, the effect of different output permutations on the resulting circuits has been observed. Approaches have been proposed that exploit the respective possibilities so that significantly smaller circuits result. In doing so, it was not only possible to synthesize many important functions with smaller size; the described techniques (in particular the exact SWOP formulation from Sect. 5.3.2) also have been used to determine the respective BDD node substitutions for the BDD-based synthesis introduced in Sect. 3.2. In this sense, the made contributions are crucial in particular for synthesis of irreversible (sub-)functions that often occur in hardware design. In future work, the determination of “good” embeddings should be lifted from the truth table level to higher function representations. As an example, a 1-bit adder can efficiently be synthesized using the introduced techniques with minimal number of garbage lines, proper don’t care assignments, and the best output permutation. But, this cannot be ensured for a 32-bit adder with 232 = 4.3 · 109 truth table lines. Therefore, other synthesis techniques as e.g. the BDD-based synthesis approach from Sect. 3.2 must be applied. However, the embeddings generated by the BDDbased synthesis are not optimal with respect to the number of additional lines. Even simply cascading 32 optimally embedded 1-bit adders is not satisfying, since each 1-bit adder includes at least one constant input so that the final circuit would contain 32 additional lines. In fact, a 32-bit adder can be represented with at most one additional line only [CDKM05]. While for special functions like the adder “good” embeddings already have been manually found, developing new techniques that automatically generate embeddings for large functions are left for future work. Therewith, this chapter concludes the consideration of reversible logic synthesis in this book. In the next chapters the remaining aspects of an upcoming design flow, namely optimization as well as verification and debugging, are considered in detail.

Chapter 6

Optimization

The primary task of synthesis approaches is to generate circuits that realize the desired functions. Secondarily, it should be ensured that the resulting circuits are as compact as possible. However, the results obtained by synthesis approaches often are sub-optimal. For example, the transformation-based synthesis method described in Sect. 3.1.2 tends to produce circuits with very costly Toffoli gates (i.e. Toffoli gates with a large number of control lines). Hierarchical synthesis approaches like the BDD-based method from Sect. 3.2 or the SyReC-based approach from Sect. 3.3 lead to circuits that are not optimal with respect to the number of lines. Besides that, technology specific constraints are often not considered by synthesis approaches. Consequently, in common design flows optimization approaches are applied after synthesis. For reversible logic, only first attempts in optimization have been made in the last years. In particular, reducing the quantum cost of given circuits has been considered. For example, template matching [IKY02, MDM05, MYDM05] is a search method which looks for gate sequences that can be replaced by alternative cascades of lower cost. For many circuits, substantial improvements are achieved using this method. But, for large circuits or a high number of applied templates, respectively, this approach suffers from high run-time. As a second example, the work in [ZM06] showed how analyzing cross-point faults can identify redundant control connections in reversible circuits. Removing such control lines reduces the cost of the circuit. However, the computation needed to determine redundant control connections is extremely high. In this chapter, three new optimization approaches are introduced—each with an own focus on a particular cost metric. The first one considers the reduction of the well-established quantum cost (used in quantum circuits) and the transistor cost (used in CMOS implementations), respectively. Therefore, a (small) number of additional signal lines are added to the circuit that are used to “buffer” factors of control lines [MWD10]. Then, these factors can be reused by other gates in the circuit which reduces the size of the gates and thus decreases the cost of the circuit. A fast algorithm is presented along with results showing that even for a small number of additional lines (even 1) a significant amount of cost can be saved. R. Wille, R. Drechsler, Towards a Design Flow for Reversible Logic, DOI 10.1007/978-90-481-9579-4_6, © Springer Science+Business Media B.V. 2010

113

114

6 Optimization

The second approach consideres the line count in a circuit. While adding a small number of additional lines may be worth to reduce e.g. the quantum cost of a circuit, usually this number should be kept small (in particular for quantum circuits where circuit lines or qubits, respectively, are a limited resource). But as already mentioned above, in particular hierarchical approaches lead to a significant amount of additional circuit lines. To reduce these lines, a post-synthesis approach is introduced which re-synthesizes parts of the circuits so that lines with constant inputs can be merged [WSD10]. Therewith, notable line reductions can be achieved. Finally, an optimization method is introduced which takes a cost metric beyond the established quantum cost, transistor cost, and line count into account. This is motivated by new physical realizations of reversible and quantum circuits (see e.g. [DV02, Mas07, RO08]) leading to further limitations and restrictions. By means of so called Nearest Neighbor Cost (NNC), it is shown how reversible circuits can be optimized with respect to the resulting new cost metrics [WSD09]. NNC is important if Linear Nearest Neighbor (LNN) architectures [FDH04, Kut06, Mas07] are addressed as target technology. Here, only adjacent gates are allowed (i.e. gates where control line and target line are on adjacent circuit lines). Since ensuring adjacent gates in a naive way increases the quantum cost by about one order of magnitude, optimization approaches are introduced that significantly reduce this increase. At the end of this chapter, all results are summarized and future work is sketched.

6.1 Adding Lines to Reduce Circuit Cost This section shows how circuit cost can be significantly reduced, if additional lines are added to the circuit. Therefore, the general idea is firstly introduced in Sect. 6.1.1 before an algorithm exploiting the made observation is proposed in Sect. 6.1.2. Finally, the experimental results in Sect. 6.1.3 demonstrate the effect of the proposed optimization approach.

6.1.1 General Idea Optimization approaches, such as the two noted above preserved the number of lines in the circuit to be optimized. In contrast, this section shows how extending the circuit by additional signal lines can improve the cost of a reversible circuit. The additionally added lines are thereby denoted as helper lines in the following. Definition 6.1 Let G be a reversible circuit. Then, a helper line is an additionally added circuit line • whose input is set to a constant value 0 and • whose output is used as a garbage output.

6.1 Adding Lines to Reduce Circuit Cost

115

Fig. 6.1 Illustrating the general idea of factoring

Having a helper line available, values can be “buffered” on this line so that they can be later reused by other gates. In doing so, control lines can be saved as shown by the following definition. Definition 6.2 Let G be a reversible circuit and h be a helper line. Then, a gate ˆ t), MCT(C, t) of G can be replaced by the sequence MCT(F, h), MCT(h ∪ C, ˆ F ∩ Cˆ = ∅, and F = ∅. In the following this reMCT(F, h) where C = F ∪ C, placement is referred as factoring the initial gate, where F is a factor of MCT(C, t). Remark 6.1 The terminology “factoring” and “factor” are natural, since partitioning the control set C into F and Cˆ is essentially factoring the AND function for the control lines. This factoring depends on the fact that 0 ⊕ x1 x2 . . . xk = x1 x2 . . . xk , i.e. that the result of a factor can be “buffered” by a constant line assigned to 0. Applying Definition 6.2 to gates of a circuit, control lines can be removed. Since the number of control lines defines the amount of the circuit cost, this may lead to less costly circuits. However, this is only the case, if the total cost of the added gates is less than the cost saved by factorizing the control lines. By substituting a single gate only, this cannot happen for the transistor cost model but it can for the quantum cost model. If more than one gate can be substituted, higher cost savings are achieved (then also reductions for the transistor cost model are observed). These ideas are illustrated in the following example. Example 6.1 Consider the cascade of Toffoli gates depicted in Fig. 6.1(a). The gates in this cascade have a common control factor F = {x0 , x1 }. Hence, the cost of this circuit can be reduced as shown in Fig. 6.1(b) by adding an additional line h (at the top of the circuit) as well as the Toffoli gates MCT(F, h) before and after the cascade. This leads to additional quantum cost of 2 · 5 = 10. However, the factored gates reuse the result of F leading to a reduction of one control line per gate (dashed

116

6 Optimization

rectangle in Fig. 6.1(b)). The removed control lines are shown as white circles. In total this reduces the quantum cost from 104 to 59 and the transistor cost from 144 to 136, respectively. Note that the added line is set to constant input 0. Furthermore, the right most Toffoli gate operating on the added line is only needed if the line is to be used for another factor. Remark 6.2 In previous work, it has already been observed that more circuit lines usually lead to lower (quantum) cost (see e.g. [BBC+95] or also the results of the BDD-based approach discussed in Sect. 3.2.4). Moreover, the authors of [SPMH03] even showed that some functions cannot be synthesized for certain gate libraries unless one additional line is added. However, here these observations are exploited for the first time by proposing a constructive post-synthesis optimization approach for reversible logic. In the following section, the algorithm is now presented in detail.

6.1.2 Algorithm Based on the ideas presented in the last section, now an algorithm is proposed that adds one helper line and then employs a straightforward search procedure to use that line for optimizing the circuit. More precisely, it is shown how to extract factors from Toffoli and Fredkin gates in the circuit (the circuit may contain other types of gates). The algorithm can be applied repeatedly to add more than one helper line. It can also be iterated to add lines until adding a further line results in no cost reduction. The transistor cost model or the quantum cost model can be used and in fact the algorithm is readily adapted to any other gate-based cost model. Consider a reversible circuit G consisting of the cascade of gates G = g0 g2 . . . gd−1 . Let Ci denote the set of control lines for gi and let Ti denote the set of target lines for gi . Then, four steps are performed in total. 1. Add a single helper line h. 2. Find the highest cost reducing factor across the circuit. Therefore, the whole circuit is traversed (i.e. every gate gi with 0 ≤ i < d is considered). If gi is a reversible gate gi (Ci , Ti ) and the helper line h is available (i.e. it is not used by a previously applied factor at this point in the circuit), then ˆ with F not empty: for every partitioning of Ci into {F, C} a. Find the lowest j ≥ i so that j = d − 1 or (F ∩ (Tj +1 ∪ h)) = ∅, i.e. find the next gate gj that manipulates one of the factors in F so that the value of the helper line cannot be reused any longer. If the outputs of the circuit are reached use gd−1 instead. b. Determine the cost reduction that would result from applying this factor to all applicable gates between gi and gj , including the cost of introducing two instances of the factor gate MCT(F, h).

6.1 Adding Lines to Reduce Circuit Cost

117

c. Keep a record of the factor and the gate range that leads to the largest cost reduction. 3. If no cost reducing factor is found in Step 2, then terminate. 4. Otherwise, apply the best factor found and repeat from Step 2 on the revised circuit. Note that as already mentioned above, the rightmost MCT(F, h) gate operating on the helper line is only added if the helper line is going to be used for another factor. Example 6.2 Figure 6.2 shows the result of applying the algorithm to the circuit representing the function rd53 (depicted in Fig. 6.2(a)) using the quantum cost metric. The applied factors are highlighted by brackets at the bottom of Fig. 6.2(b) (with one helper line) and Fig. 6.2(c) (with two helper lines), respectively. While the original circuit has quantum cost of 128, that can be reduced with one helper line to 83 or with two helper lines to 66. Adding a third helper line does not reduce the quantum cost of this circuit further. The order in which factors are considered typically has an effect. As a result, the algorithm is applied to the circuit as given and then to the circuit found by reversing the order of the original circuit. The better of the two final circuits is taken as the result. Thus, the presented algorithm is a heuristic. But as the experiments in the next section show, already this leads to good results.

6.1.3 Experimental Results This section provides experimental results for the proposed approach. To this end, the method described above has been implemented in C and was applied to all benchmarks from RevLib [WGT+08]. All experiments have been carried out on an AMD Athlon 3500+ with 1 GB of memory. Since some of the circuits in RevLib already have been optimized using various approaches (e.g. extensive template post-synthesis optimization, output permutation optimization, and other techniques), to provide an even basis all circuits have been previously optimized. To this end, the approach described in [MDM05] together with a basic set of 14 templates has been used.1 Afterwards, the proposed optimization method has been applied to the resulting circuits. In doing so, all considered circuits already went through an optimization and it can be shown that, independently from this, further significant reductions can be achieved if helper lines and the algorithm introduced above are used.2 1 This took over 10 hours of computation time. Furthermore, the application to the urf series of circuits (which are quite large) has been aborted because they required too much run-time. 2 Of course, similar results are also achieved if the proposed approach is directly applied to nonoptimized circuits.

118

6 Optimization

Fig. 6.2 Reversible circuits for rd53 with one helper line and two helper lines

Table 6.1 summarizes the obtained results for one and two helper lines, respectively. The first three columns give the name of the circuit (including the unique identifier of the circuit realization as used in RevLib), the number of circuit lines (n), as well as the number of gates of the initial (already optimized) circuit (d), respectively. In the following columns, the obtained results for quantum cost and transistor cost models are presented. Therefore, the proposed approach has been applied with one and with two helper lines to both, the circuit as given as well as in the reversed order. Afterwards, the better result has been chosen. The re-

8

8

hwb8_117

hwb8_116

8

hwb8_113

9

9

hwb9_123

9

9

urf3_157

urf1_151

hwb9_120

10

urf1_150

hwb9_122

8

9

hwb8_114

8

10

9

urf5_159

urf3_156

8

hwb8_115

urf2_154

8

hwb8_118

16152

24523

14679

16510

17027

25843

38462

1202

6063

614

11941

48952

749

748

1535

1538

637

1959

1487

6976

7010

44635

44684

13460

22482

45855

2674 121716

1517

4338

4346

27402

27425

8251

13665

27740

72562

29120

7066

75644

9298

13979

8302

9333

9570

11804

16902

420

1877

69.04

37.82

38.00

38.61

38.62

38.70

39.22

39.50

40.38

40.51

40.83

40.98

42.43

43.00

43.44

43.47

43.80

54.32

56.06

65.06

3781

3776

20548

20574

6763

10996

20765

51833

21511

5869

53584

7164

8894

6478

7334

7324

9016

11533

290

1233

45.80

46.13

53.96

53.96

49.75

51.09

54.72

57.41

56.06

50.85

58.19

55.65

63.73

55.87

55.58

56.99

65.11

70.01

75.87

79.66

10536

10520

46336

46400

16896

28696

47024

100544

48616

14488

103680

17432

18640

15304

17648

18240

16200

21840

800

3272

10216

10192

41456

41496

15912

27760

41784

88928

43040

13640

91600

15768

15632

13912

15920

16352

13896

18576

568

2040

3.04

3.12

10.53

10.57

5.82

3.26

11.14

11.55

11.47

5.85

11.65

9.55

16.14

9.10

9.79

10.35

14.22

14.95

29.00

37.65

10144

10112

39544

39584

15648

27376

39760

81296

40824

13376

83160

15184

13704

13424

15384

15712

13336

17560

512

1768

3.72

3.88

14.66

14.69

7.39

4.60

15.45

19.14

16.03

7.68

19.79

12.90

26.48

12.28

12.83

13.86

17.68

19.60

36.00

45.97

2.31

2.28

8.62

8.68

3.19

6.30

8.73

19.64

9.03

2.83

20.16

3.16

3.16

2.83

3.20

3.28

3.30

4.19

0.17

1.63

M AX T RANSISTOR C OST M ODEL T IME I NITIAL A DD 1 L INE I NITIAL A DD 1 L INE A DD 2 L INES A DD 2 L INES C OST O PTIMIZED % I MPR . O PTIMIZED % I MPR . O PTIMIZED % I MPR . O PTIMIZED % I MPR . C OST C OST C OST C OST C OST

Q UANTUM C OST M ODEL

2732 128172

620

499

610

633

429

638

8

urf2_153

492

plus63mod8192_164 13

plus63mod4096_163 12

19

48

20

12

d

cycle10_2_110

n

cycle17_3_112

C IRCUIT

Table 6.1 Experimental results for RevLib circuits

6.1 Adding Lines to Reduce Circuit Cost 119

n

7

6

5

7

5

7

15

5

7

6

alu-v2_31

hwb7_61

4gt4-v0_78

rd53_136

ham15_108

4gt12-v0_87

rd53_135

hwb6_57

7

hwb7_62

hwb6_56

7

hwb7_59

ham7_104

5

4gt4-v0_73

5

7

hwb7_60

5

5

mod8-10_177

alu-v2_30

9

hwb9_121

4gt12-v0_86

9

hwb9_119

plus127mod8192_162 13

C IRCUIT

Table 6.1 (Continued)

d

910

65

16

10

70

15

13

236

13

126

23

18

14

331

289

17

166

14

1541

1544

433

68

43

403

72

53

3261

45

1329

83

111

47

2608

3939

57

1754

84

35973

35967

61425

326

51

32

294

52

38

2319

32

932

58

76

32

1775

2632

38

1153

55

23280

23269

38710

24.71

25.00

25.58

27.05

27.78

28.30

28.89

28.89

29.87

30.12

31.53

31.91

31.94

33.18

33.33

34.26

34.52

35.28

35.30

36.98

299

51

32

257

45

38

2105

32

871

58

62

32

1624

2310

38

1010

44

18940

18939

28856

30.95

25.00

25.58

36.23

37.50

28.30

35.45

28.89

34.46

30.12

44.14

31.91

37.73

41.36

33.33

42.42

47.62

47.35

47.34

53.02

976

224

104

992

200

144

5592

144

2456

272

240

136

4632

6800

144

3168

144

44304

44344

39984

928

216

104

968

200

112

5432

128

2392

272

208

104

4512

6480

144

2960

144

41408

41448

35304

4.92

3.57

0.00

2.42

0.00

22.22

2.86

11.11

2.61

0.00

13.33

23.53

2.59

4.71

0.00

6.57

0.00

6.54

6.53

11.70

928

216

104

968

200

112

5408

128

2392

272

176

104

4496

6400

144

2912

144

40336

40376

32648

4.92

3.57

0.00

2.42

0.00

22.22

3.29

11.11

2.61

0.00

26.67

23.53

2.94

5.88

0.00

8.08

0.00

8.96

8.95

18.35

0.28

0.05

0.05

0.22

0.05

0.05

1.04

0.06

0.51

0.07

0.08

0.05

1.01

1.33

0.07

0.82

0.05

8.65

8.70

8.36

M AX T RANSISTOR C OST M ODEL T IME I NITIAL A DD 1 L INE I NITIAL A DD 1 L INE A DD 2 L INES A DD 2 L INES C OST O PTIMIZED % I MPR . O PTIMIZED % I MPR . O PTIMIZED % I MPR . O PTIMIZED % I MPR . C OST C OST C OST C OST C OST

Q UANTUM C OST M ODEL

120 6 Optimization

15

7

ham15_107

sym6_145

36

132

28

16

5

7

5

hwb5_53

rd53_132

4gt13_91

10

27

55

109

3250

11

21

30

31

114

286

206

20465

65

121

230

53700

348

1174

106

65

27

99

247

176

17235

54

100

190

43893

279

916

81

49

5

10

hwb5_55

urf1_149

sym9_148

7

9 11554

rd53_133

24

cnt3-5_180

210

24

12

20

5

16

hwb5_54

72

1250

98

57770

73

120

63

1126

87

51125

64

105

9.92

11.22

11.50

12.33

12.50

12.50

12.55

12.90

13.16

13.64

14.56

15.78

16.92

17.36

17.39

18.26

19.83

21.98

23.58

24.62

1074

87

50424

64

105

63

113918

27

99

242

169

16594

54

100

174

42232

265

823

81

49

14.08

11.22

12.72

12.33

12.50

12.50

13.92

12.90

13.16

15.38

17.96

18.92

16.92

17.36

24.35

21.36

23.85

29.90

23.58

24.62

728

2360

224

176

96

176

824

1008

53664

200

216

344

232

320

240

3448

296

3432

296

184864 181816

240

320

240

423488 414016

128

176

824

1008

54416

200

216

344

171840 159120

744

2456

224

176

0.46

0.00

1.65

3.33

0.00

0.00

2.24

25.00

0.00

0.00

0.00

1.38

0.00

0.00

0.00

7.40

2.15

3.91

0.00

0.00

3416

296

181696

232

320

240

413656

96

176

824

1008

53664

200

216

344

157448

728

2336

224

176

0.93

0.00

1.71

3.33

0.00

0.00

2.32

25.00

0.00

0.00

0.00

1.38

0.00

0.00

0.00

8.38

2.15

4.89

0.00

0.00

1.11

0.07

51.47

0.05

0.09

0.09

182.19

0.05

0.06

0.18

0.27

12.03

0.04

0.06

0.08

227.23

0.18

0.67

0.07

0.05

M AX T RANSISTOR C OST M ODEL T IME I NITIAL A DD 1 L INE I NITIAL A DD 1 L INE A DD 2 L INES A DD 2 L INES C OST O PTIMIZED % I MPR . O PTIMIZED % I MPR . O PTIMIZED % I MPR . O PTIMIZED % I MPR . C OST C OST C OST C OST C OST

Q UANTUM C OST M ODEL

10 26468 132340 115731

8

15

5

one-two-three-v0_97

ham15_109

6

mod5adder_127

urf2_161

7

rd53_130

urf3_155

d

15 10740

7

urf6_160

7

rd53_131

n

rd53_137

C IRCUIT

Table 6.1 (Continued)

6.1 Adding Lines to Reduce Circuit Cost 121

32004

5

11

4gt5_76

urf4_187

5030

9

12

8

10

12

15

rd53_138

rd73_140

sym9_146

rd84_142

28

28

20

20

sys6-v0_111 10

13

10276

8

d

urf5_158

n

urf2_152

C IRCUIT

Table 6.1 (Continued) T RANSISTOR C OST M ODEL

111

108

75

43

71

160020

26

51380

25150

110

107

74

42

69

148333

24

46513

22693

0.90

0.93

1.33

2.33

2.82

7.30

7.69

9.47

9.77

110

107

74

42

68

146890

24

45662

22518

0.90

0.93

1.33

2.33

4.23

8.21

7.69

11.13

10.47

408

384

288

176

280

512064

112

164416

80480

400

376

280

168

264

501824

104

162496

79880

1.96

2.08

2.78

4.55

5.71

2.00

7.14

1.17

0.75

400

376

280

168

256

501280

104

162408

79840

1.96

2.08

2.78

4.55

8.57

2.11

7.14

1.22

0.80

0.04

44.47

18.30

M AX T IME

0.08

0.09

0.06

0.04

0.06

2135.51

I NITIAL A DD 1 L INE I NITIAL A DD 1 L INE A DD 2 L INES A DD 2 L INES C OST O PTIMIZED % I MPR . O PTIMIZED % I MPR . O PTIMIZED % I MPR . O PTIMIZED % I MPR . C OST C OST C OST C OST C OST

Q UANTUM C OST M ODEL

122 6 Optimization

6.1 Adding Lines to Reduce Circuit Cost

123

sulting cost and the percentage improvement are shown for each case relative to the initial circuit cost (i.e. the cost after template application). Finally, the last column gives the maximum CPU time (in seconds) measured for a single run for each benchmark. Results for small circuits with less than five lines and less than ten gates are omitted. Furthermore, the circuits 4gt11_82, 4gt13_90, decod24enable_126, mod5adder_128, mod5adder_129, hwb6_58, ham7_105, ham7_106, rd73_141, sys6-v0_144, sym9_147, 0410184_169, 0410184_170, rd84_143, cnt35_179, and add8_172 gave no improvement and thus are not listed in Table 6.1. Considering quantum cost, for most of the circuits significant cost reductions can be observed—even if only a single line is added. Over all circuits (including the ones that gave no improvement), adding a single line reduces the quantum cost by 22.51% on average—in the best case (cycle17_3_112) by just over 69%. This can be further improved if another line is added leading to reductions of additional 5.10% on average. If transistor cost is considered, the reductions are somewhat smaller but still significant. When adding a single line the transistor cost is reduced by 5.83% on average—in the best case (cycle17_3_112) by 37%. Adding a second line reduces the transistor cost by further 1.65%. Since the number of lines is negligible in CMOS technologies, this is a notable reduction as well. In addition, these optimizations can be achieved in very short run-time. Even for circuits including thousands of gates, the approach terminates after some minutes—in most of the cases after some seconds. Besides that, the effect of adding a certain number of helper lines on the resulting improvement has been evaluated in detail. More precisely, the proposed method has been applied with one to five helper lines to all the circuits from RevLib (including the small ones that have been omitted in Table 6.1). Again, all these circuits already have been optimized using templates as noted above. A total of 95 of the 177 circuits show an improvement in quantum cost when a single helper line is added. Of the other 82 circuits, 64 ones have a very few number of lines (less than or equal to 5) and are already highly optimized due to their relatively small size. Figure 6.3 shows the improvement in quantum cost of the remaining circuits (both, for the respective benchmarks in the plot diagram as well as on average in the table). As already discussed above, a significant improvement can be observed if a single helper line is added. This is further increased if more lines are applied. However, the improvements diminish with increasing number of helper lines. Finally, no further improvement has been observed, if a sixth line is applied. This is the expected behavior, since multiple helper lines are only useful when multiple factors sharing common gates are present. Altogether, applying the proposed approach significant cost reductions can be achieved if a single line is added to the circuit (even on already optimized realizations). Further (diminishing) improvements result if more than one helper lines are applied. The most critical issue is the fact that additional lines must be added to enable these optimizations. While this is negligible for reversible CMOS technologies, in design for quantum circuits the designer must trade off if these additional expenses are worth the additional qubit(s). Since up to 70% of the quantum cost can be saved, this may be the case for many circuits.

124

6 Optimization

Fig. 6.3 Improvement for up to five helper lines

6.2 Reducing the Number of Circuit Lines While adding a small number of additional lines may be worth to reduce e.g. the quantum cost of a circuit (as shown in the previous section), usually circuit lines are a highly limited resource (caused by the fact that the number of circuit lines corresponds to the number of qubits). Furthermore, a high number of lines (or qubits, respectively) may decrease the reliability of the resulting system. Thus, this number should be kept as small as possible. In the best case, only the minimal number of circuit lines should be used. However, to ensure minimality of circuit lines, the underyling function must be given in terms of a truth table or similar descriptions (see Sect. 3.1.1). But, if larger functions should be synthesized, only hierarchical methods (like the BDD-based method from Sect. 3.2 or the SyReC-based approach from Sect. 3.3) are available so far. These often require a significant number of additional circuit lines (with constant inputs) and, thus, lead to circuits with a large line count. As an example, consider the reversible realization of the AND function and the OR function as shown in Fig. 6.4(a) and (b), respectively. Composing these circuits (as done by hierarchical approaches), a realization with two additional circuit lines (including constant inputs) results (see Fig. 6.4(c)). But, both functions combined can be realized with one additional circuit line only (see Fig. 6.4(d)). Thus, the question is how the number of additional lines in reversible circuits can be reduced. In this section, a post-process optimization method is proposed that addresses this problem. Garbage outputs (i.e. circuit lines whose output value is don’t care) are thereby exploited. A multi-stage approach is introduced that (1) identifies garbage outputs producing don’t cares, (2) re-synthesizes parts of the circuit so that instead of these don’t cares concrete constant values are computed, and (3) connects the resulting outputs with appropriate constant inputs. In other words, circuit structures are modified so that they can be merged with constant inputs resulting in a line reduction. For the respective re-synthesis step, existing synthesis methods are used.

6.2 Reducing the Number of Circuit Lines

125

Fig. 6.4 Composition of circuits

Experimental results show that applying this approach, the number of circuit lines can be reduced by 17% on average—in the best case by more than 40%. Furthermore, depending on the used synthesis approach, these line reductions are possible only with a small increase in the number of gates and the quantum costs, respectively. In some cases the costs even can be reduced. In this sense, drawbacks of scalable but line-costly synthesis approaches are minimized. The remainder of this section is structured as follows. Section 6.2.1 illustrates the general idea of the proposed approach. Afterwards, the concrete algorithm exploiting the made observations is described in Sect. 6.2.2. Finally, Sect. 6.2.3 reports experimental results.

6.2.1 General Idea In this section, the idea how to reduce the number of lines in large reversible circuits is presented. As discussed above, ensuring minimality of circuit lines is only possible for small functions from which a truth table description can be derived. Thus, line reduction is considered as a post-optimization problem. The proposed approach thereby exploits a structure often occurring in circuits generated by scalable synthesis approaches or by composed reversible sub-circuits. This is illustrated by the following running example. Example 6.3 Consider the circuit G = g1 . . . g12 depicted in Fig. 6.5(a) representing a 3-bit adder that has been created by composing three single (minimal) 1-bit adders. This circuit consists of three additional circuit lines (with constant inputs). Not all of them are necessarily required. Furthermore, there are a couple of garbage outputs whose values are don’t care. In particular of interest in this circuit is the first usage of a line with a constant input and the last usage of a line with a garbage output. For example, the constant

126

6 Optimization

Fig. 6.5 Reducing the number of lines in a 3-bit adder circuit

input at line 4 is firstly used by the fifth gate, while at the same time the value of the last line is not needed anymore after the second gate. Since the value of the garbage output doesn’t matter (because it is a don’t care), this might offer the possibility to merge the line including the constant input with the line including the garbage output. More precisely, if it is possible to modify the circuit so that a garbage output returns a constant value (instead of an arbitrary value), then this constant value can be used in the rest of this circuit. At the same time, a constant input line can be removed. More formally: Proposition 6.1 Without loss of generality, let G = g1 . . . gd be a reversible circuit with a constant input at line lc and a garbage output at line lg (lc = lg ). Furthermore, let gi be the first gate connected with line lc (including the constant input) and let gj with j < i be the last gate connected with line lg (including the garbage output). If it j is possible to modify the sub-circuit G1 = g1 . . . gj so that line lg becomes assigned to a constant value, then line lc can be removed from G. For all gates formerly connected with line lc , line lg can be used instead.

6.2 Reducing the Number of Circuit Lines

127

Note that the constant value of the selected line lc is thereby of no importance. If necessary, the needed value can easily be generated by an additional NOT gate (i.e. a Toffoli gate without any control lines). Furthermore, constant outputs can only be produced if the considered circuit includes additional lines with constant inputs. Example 6.4 Reconsider the adder circuit G = g1 . . . g12 in the running example. The constant input at line 1 is firstly used by gate g9 , while the values of the garbage outputs at line 5, line 6, line 9, and line 10, respectively, are not needed anymore after gate g8 . Since the sub-circuit G81 = g1 . . . g8 can be modified so that e.g. the garbage output at line 5 becomes assigned to the constant value 0 (see dashed rectangle in Fig. 6.5(b)), line 1 can be removed and the newly created constant value from line 5 can be used instead. The resulting circuit is depicted in Fig. 6.5(b). Now, this circuit consists of 9 instead of 10 lines. Note that the respective modification of a sub-circuit is not always possible. For example, consider the constant input at line 4 (firstly used by gate g5 ) and the garbage outputs at line 9 and line 10 (not needed anymore after gate g4 ). This might offer the possibility to remove one more circuit line. But, the sub-circuit G41 = g1 . . . g4 cannot be modified accordingly since a realization of the 1-bit addition together with an additional constant output requires more garbage outputs. Using these observations an algorithm for reducing the number of lines in reversible circuits can be formulated. The next section describes the respective steps in detail. Afterwards, the experiments in Sect. 6.2.3 show that significant reductions can be obtained with this approach.

6.2.2 Algorithm Based on the ideas presented in the last section, now an algorithm for circuit line reduction is proposed. The respective steps are illustrated by means of an example in Fig. 6.6. At first, an appropriate sub-circuit is determined (a). Afterwards, it is tried to re-synthesize the sub-circuit so that one of the (garbage) outputs returns a constant value (b). If this was successful, the re-synthesized sub-circuit is inserted into the original circuit (c). Finally, the newly created constant output is merged with a line including a constant input (d). The algorithm terminates if no appropriate sub-circuit can be determined anymore. In the following the respective steps are described in detail. 6.2.2.1 Determine an Appropriate Sub-circuit In the considered context, appropriate sub-circuits are characterized by the fact that they include at least one garbage output which can be later used to replace a constant input. Therefore, it is important to know when lines of a circuit are used for the first time and when they are not needed anymore, respectively. This is formalized by the following two functions:

128

6 Optimization

Fig. 6.6 Reducing the number of circuit lines in four steps

Definition 6.3 Let G = g1 . . . gd be a reversible circuit. Furthermore, let l ∈ {1, . . . , n} be a line of this circuit. Then, the function firstly_used(l) returns i ∈ {1, . . . , d} iff gi is the first gate connected with line l. Accordingly, the function lastly_used(l) returns i ∈ {1, . . . , d} iff gi is the last gate connected with line l. Using these functions, the flow to determine appropriate sub-circuits can be described as follows: 1. Traverse all circuit lines lg of the circuit G = g1 . . . gd that include a garbage output. 2. Check if line lg can be merged with another line lc including a constant input, i.e. if there is a constant input line lc so that firstly_used(lc )> lastly_used(lg ). If this check fails, continue with the next garbage output line lg . 3. Check if the Gk1 = g1 . . . gk with k = lastly_used(lg ) can be modified so that line lg outputs a constant value. If this check fails, continue with the next line lg in Step 2. Otherwise, Gk1 is an appropriate sub-circuit. Example 6.5 Consider the circuit G = g1 . . . g6 depicted in Fig. 6.6(a). Applying the steps introduced above, the sub-circuit G31 = g1 g2 g3 (marked by the dashed rectangle) is determined. Note that the order in which the garbage output lines lg are considered typically has an effect. Here, the lines with the smallest value of lastly_used(lg ) is considered first. This is motivated by the fact that firstly_used(lc ) > lastly_used(lg ) is a necessary condition which, in particular, becomes true for small values of lastly_used(lg ). Besides that, the check in Step 3 is strongly related to the re-synthesis of the subcircuit which is described next.

6.2 Reducing the Number of Circuit Lines

129

6.2.2.2 Re-synthesize the Sub-circuit Given an appropriate sub-circuit Gk1 , the next task is to re-synthesize it so that one garbage output returns a constant value (instead letting it a don’t care). Generally, any available synthesis approach can be applied for this purpose. But since the number of circuit lines should be reduced, approaches that generate additional circuit lines should be avoided. Thus, synthesis methods that require a truth table description (and therewith ensure minimality with respect to circuit lines) are used. Consequently, only sub-circuits with a limited number of primary inputs are considered. To address this issue, not the whole sub-circuit Gk1 is re-synthesized. Instead, a bounded cascade of gates which affects the respective garbage output is considered. More precisely, starting at the output of line lg , the circuit is traversed towards the inputs of the circuit. Each passed gate as well the lines connected with them are added to the following consideration.3 The traversal stops, if the number of considered lines reaches a given threshold λ (in the experimental evaluations, it turned out that λ = 6 is a good choice). From the resulting cascade, a truth table description is determined. Afterwards, the truth table is modified, i.e. the former garbage output at line lg is replaced by a constant output value. It is thereby important that the modification preserves the reversibility of the function. If this is not possible, the sub-circuit is skipped and the next line with a garbage output is considered (see Step 3 from above). Otherwise, the modified truth table can be passed to a synthesis approach. Note that the modification of the truth table is only possible, if constant values at the primary inputs of the whole circuit are incorporated. Constant inputs restrict the number of possible assignments to the inputs of the considered cascade. This enables a reversible embedding with a constant output. Example 6.6 Consider the cascade highlighted by the dashed rectangle in Fig. 6.6(a) which is considered for re-synthesis. Incorporating the constant values at the primary inputs of the whole circuit, only the patterns shown in Table 6.2(a) have to be considered. The outputs for the remaining patterns are not of interest. This function can be modified so that one of the garbage outputs returns a constant value, while still reversibility of the overall function is preserved (see Table 6.2(b)). Synthesizing the modified function, the circuit shown on the right-hand side of Fig. 6.6(b) results. This circuit can be used to remove a constant line. As shown by the example, re-synthesizing the respective cascades in the described manner might lead to an increase in the number of gates as well as in the quantum costs. This is an expected behavior since circuit lines can be exploited to buffer temporary values. If such lines are removed, additional gates may be required to recompute these values. 3 In

other words, the cone of influence of the garbage output line lg is considered.

130

6 Optimization

Table 6.2 Truth tables of the sub-circuit (a) Original a

b

0

0

0

0

–

b

–

f1

a

b

0

0

–

0

–

0

0

0

–

–

–

–

–

1

–

1

–

–

–

–

–

1

–

0

–

–

–

–

–

0

–

1

–

–

–

–

... 0

1

0

0

... 1

0

0

0

... 1

1

(b) After modification

0 ...

0

0

0

0

0

... 0

1

0

0

... 1

0

0

0

... 1

1

0 ...

0

–

b

–

f1

0

0

–

0

–

–

–

–

0

1

–

1

–

–

–

–

0

1

–

0

–

–

–

–

0

0

–

1

–

–

–

–

6.2.2.3 Insert the Sub-circuit and Merge the Lines If re-synthesis was successful, the last two steps are straightforward. At first, the considered sub-circuit is replaced by the newly synthesized one. Afterwards, the considered garbage output line lg is merged with the respective constant input line lc , i.e. the respective gate connections as well as possible primary outputs are adjusted. Finally, line lc which is not needed anymore is removed. Example 6.7 Consider the circuit shown in Fig. 6.6. Replacing the highlighted sub-circuit with the re-synthesized one from Example 6.6, the circuit shown in Fig. 6.6(c) results. Here, line 1 and line 5 can be merged leading to the circuit depicted in Fig. 6.6(d) where line 5 has been removed.

6.2.3 Experimental Results The proposed approach for line reduction has been implemented in C++ and evaluated using a set of reversible circuits with a large number of constant inputs. As synthesis method for step (b) of the optimization (see Sect. 6.2.2.2), two different approaches have been evaluated, namely 1. an exact synthesis approach (based on the principles described in Chap. 4 and denoted by exact synthesis in the following) that realizes a circuit with minimal number of gates but usually requires a significant amount of run-time and 2. a heuristic synthesis approach (namely the transformation-based method introduced in Sect. 3.1.2; in the following denoted by heuristic synthesis) that does not ensure minimality but is very efficient regarding run-time. As benchmarks, reversible circuits obtained by the BDD-based synthesis approach (from Sect. 3.2) were used. These circuits include a significant number of constant inputs that originated from the synthesis and thus cannot be easily removed.

6.3 Optimizing Circuits for Linear Nearest Neighbor Architectures

131

The experiments have been carried out on an Intel Core 2 Duo 2.26 GHz with 3 GB of main memory. The results of the evaluation are presented in Table 6.3. The first four columns give the name (Benchmark), the number of circuit lines4 (Lines), the gate count (d), and the quantum cost (QC) of the original circuits. In the following columns, the respective values after line reduction as well as the run-time needed for optimization (in CPU seconds) are reported. It is thereby distinguished between results obtained by applying exact synthesis and results obtained by applying heuristic synthesis in Step (b). As can be seen by the results, the number of lines can be significantly reduced for all considered reversible circuits. On average, the number of lines can be reduced by 17%—in the best case (spla with exact synthesis) by more than 40%.5 As already mentioned in Sect. 6.2.2.2, reducing the circuit lines might lead to an increase in the number of gates as well as in the quantum costs. This is also observable in the results. In this sense, the differences between the applied synthesis approaches provide interesting insights. While the application of exact synthesis leads to larger runtimes (in the worst case more than 3 CPU hours are required), results from the heuristic method are available within minutes. But, the differences in the respective number of gates and the quantum costs, respectively, are significant. If exact synthesis is applied, the increase in number of gates and quantum cost can be kept small—for some circuits (e.g. cordic and spla) even reductions have been achieved.

6.3 Optimizing Circuits for Linear Nearest Neighbor Architectures So far, circuits have been synthesized or optimized, respectively, mainly with respect to gate count, quantum cost, or line count. In the last years, both criteria have been established as quality criterion to evaluate the results obtained by synthesis approaches. However, with new (physical) realizations of reversible logic also new criteria emerge. As an example, Linear Nearest Neighbor (LNN) architectures [FDH04, Kut06, Mas07] require adjacent quantum gates (i.e. gates where control line and target line are on adjacent circuit lines). In this section, optimization approaches to determine circuits for LNN architectures are introduced. To this end, in Sect. 6.3.1 a new cost metric, namely Nearest Neighbor Cost (NNC), is introduced that denotes the effort needed to make an arbitrary quantum circuit consisting of adjacent gates only. Furthermore, it is reviewed how NNC optimality can be achieved (i.e. how quantum circuits consisting of adjacent gates only can be determined) in a straightforward way. Since this approach 4 Including both, the number of primary inputs/outputs as well as the number of additional circuit lines. 5 Note that thereby still the number of primary inputs/outputs are considered which cannot be reduced.

61

61

13

14

27

sym6

9sym

73

87

170

206

489

hwb7

bw

hwb9

ex5p

spla

159

46

45

ham15

52

39

cycle10_2

hwb6

34

rd84

cordic

152

32

mod5adder

922

909

324

507

308

195

302

292

276

205

205

92

98

59

24

QC

71

66

40

41

37

31

25

25

26

24

24

11

12

9

6

625 1821 165

1669 5885 267

615

718

292

288

80

172

164

76

105

96

89

61

61

28

35

19

10

(5)

(3)

(0)

(1)

(0)

(0)

(0)

(1)

(0)

(2)

264

572

365

253

362

381

342

209

209

92

99

88

26

QC

(−10) 2250

(19) 2545

(6) 1167

(7) 1010

(−20)

(13)

(12)

Δd (2)

1204.57

6280.85

2141.84

2117.42

1290.73

3056.40

1899.34

3913.73

3485.33

3489.19

41.11

0.12

1.96

0.00

Time

(429)

(270)

72

65

39

41

37

30

25

25

25

22

22

11

12

9

6

342.75 329

2625.17 171

467

586

805

485

677

585

424

223

146

362

362

177

50

95

10

909 (−160) 2945

(−35)

(−18) 1354

(−15)

(−8)

(−13)

(−5)

(−8)

(−9)

(−9)

(−7)

(−3)

(−5)

(−5)

(−3)

(−1)

(−1)

(−1) (2)

26

5651

8437

2623

4385

5909

3293

4652

3867

3416

1379

915

2845

2845

1340

227

670

QC

(1276) 22392

(284)

(655)

(181)

(305)

(705)

(326)

(525)

(514)

(322)

(127)

(58)

(301)

(301)

(149)

(16)

(76)

Δd

Line reduction with heuristic synthesis Lines ΔLines d

4656.58 152

(245) 11255.00

(101)

(−60)

(65)

(57)

(58)

(60)

(89)

(66)

(4)

(4)

(0)

(1)

(29)

ΔQC

(−222) 1007 (−662) 3805 (−2080)

(−41)

(−18)

(−16)

(−7)

(−12)

(−5)

(−8)

(−8)

(−9)

(−7)

(−2)

(−3)

(−3)

(−3)

(−1)

(−1)

(−1)

Lines ΔLines d

Line reduction with exact synthesis

699 2275 152

286

281

100

71

102

96

88

27

28

sym9

hwb5

28

34

19

10

mini-alu

rd53

8

7

4mod5

Benchmark Initial Lines d

Table 6.3 Experimental results

(2)

0.19

0.54

0.51

0.05

0.01

0.02

0.00

Time

1.76

5.12

5.77

1.38

3.82

1.35

1.84

(16507) 894.81

(3830) 108.52

(6162) 254.10

(1701)

(3476)

(5585)

(2786)

(4344)

(3672)

(3114)

(1087) 487.93

(639)

(2640)

(2640)

(1248)

(129)

(611)

ΔQC

132 6 Optimization

6.3 Optimizing Circuits for Linear Nearest Neighbor Architectures

133

significantly increases the quantum cost of the resulting circuits (an important cost criteria for LNN architectures as well), improvements are suggested in Sect. 6.3.2. Finally, the effect of this new optimization method is experimentally evaluated in Sect. 6.3.3.

6.3.1 NNC-optimal Decomposition As described in Sect. 2.1.3, quantum circuits can be obtained using reversible circuits as a basis which are afterwards mapped to a cascade of quantum gates. Alternatively, quantum circuits can be directly addressed as e.g. done by the BDDbased synthesis approach described in Sect. 3.2 or by the exact method described in Sect. 4.2.2. However, the resulting quantum circuits often include non-adjacent gates and thus are not applicable to LNN architectures. To have a distinct measurement of this, the following definition introduces the new NNC cost metric. Definition 6.4 Consider a 2-qubit quantum gate q where its control and target are placed at the cth and tth line (0 ≤ c, t < n), respectively. The Nearest Neighbor Cost (NNC) of q is defined as |c − t − 1|, i.e. the distance between control and target lines. The NNC of a single qubit gate is defined as 0. The NNC of a circuit is defined as the sum of the NNCs of its gates. Optimal NNC for a circuit is 0 when all quantum gates are either 1-qubit gates or 2-qubit gates performed on adjacent qubits. Example 6.8 Figure 6.7(a) shows the standard decomposition of a Toffoli gate leading to an NNC value of 1. In a naive way, NNC optimality can easily be achieved, by applying adjacent SWAP gates whenever a non-adjacent quantum gate occurs in the standard decomposition. More precisely, SWAP gates are added in front of each gate q with nonadjacent control and target lines to “move” a control (target) line of q towards the target (control) line until they become adjacent. Afterwards, SWAP gates are added to restore the original order of circuit lines. In total, this leads to additional quantum costs given by the following lemma: Lemma 6.1 Consider a quantum gate q where its control and target are placed at the cth and tth line, respectively. Using adjacent SWAP gates as proposed, additional quantum cost of 6 · |c − t − 1| are needed. Proof In total, |c −t −1| adjacent SWAP operations are required to move the control line to the target line so that both become adjacent. Another |c − t − 1| SWAP operations are needed to restore the original order. Considering quantum cost of 3 for each SWAP operation, this leads to the additional quantum cost of 6 · |c − t − 1|. By applying this method consecutively to each non-adjacent gate, a quantum circuit with NNC of 0 can be determined in linear time.

134

6 Optimization

Fig. 6.7 Different decompositions of a Toffoli gate

Example 6.9 Consider the standard decomposition of a Toffoli gate as depicted in Fig. 6.7(a). As can be seen, the first gate is non-adjacent. Thus, to achieve NNCoptimality, SWAP gates in front and after the first gate are inserted (see Fig. 6.7(b)). Since each SWAP gate is decomposed into 3 quantum gates, this increases the total quantum cost to 11 but leads to an NNC value of 0. In the rest of this section, this method is denoted by naive NNC-based decomposition. Scenarios like this also have been applied to construct circuits for LNN architectures so far (see e.g. [CS07, Kha08]). However, this naive method might lead to a significant increase in quantum cost. Thus, in the next section more elaborated approaches for synthesizing NNC-optimal circuits are proposed.

6.3.2 Optimizing NNC-optimal Decomposition Two improved approaches for NNC-optimal quantum circuit generation are introduced. The first one exploits exact synthesis techniques, while the second one manipulates the circuit and specification, respectively.

6.3.2.1 Exploiting Exact Synthesis In Chap. 4, exact synthesis approaches have been introduced that ensure minimality of the resulting circuits. The synthesis problem is thereby expressed as a sequence of Boolean satisfiability (SAT) instances. For a given function f , it is checked whether

6.3 Optimizing Circuits for Linear Nearest Neighbor Architectures Table 6.4 List of available macros

n

M ACRO

135 C OST NAIVE

I MPR E XACT

3

P(a,b,c), P(c,b,a)

12

8

33%

3

P(a,c,b), P(c,a,b)

24

12

50%

4

P(a,b,d), P(d,c,a)

30

11

63%

3

MCT({a,b},c), MCT({c,b},a)

11

9

18%

4

MCT({a,b},d), MCT({d,c},a)

29

12

59%

3

MCT({a,c},b)

17

13

24%

4

MCT({d,b},a), MCT({a,c},d)

29

13

55%

a circuit with d gates realizing f exists. Furthermore, d is initially assigned to 1 and increased in each iteration if no realization is found. More formally, for a given d and a reversible function f : Bn → Bn , the following SAT instance (similar to the one introduced in Definition 5.1 on p. 104) is created: Φ∧

n −1 2

([inpi ]2 = i ∧ [outi ]2 = f (i)),

i=0

where • inpi is a Boolean vector representing the inputs of the circuit to be synthesized for truth table line i, • outi is a Boolean vector representing the outputs of the circuit to be synthesized for truth table line i, and • Φ is a set of constraints representing the synthesis problem for quantum circuit as described in Sect. 4.2.2. Applying this formulation for synthesis of quantum circuits, NNC-optimality can be ensured by modifying the constraints in Φ so that they do not represent the whole set of quantum gates, but only adjacent gates. In doing so, exact synthesis is performed that determines minimal circuits not only with respect to quantum gates, but also with respect to NNC. Consequently, significant better NNC-optimal decomposition than the one from Fig. 6.7(b) can be synthesized. However, the applicability of such an exact method is limited to relatively small functions. In this sense, the proposed method is sufficient to construct minimal decompositions for a set of Toffoli and Peres gate configurations as shown in Table 6.4. But nevertheless, these results can be exploited to improve the naive NNC-based decomposition: Once an exact NNC-optimal quantum circuit for a reversible gate is available (denoted by macro in the following), it can be reused as shown by the following example. Example 6.10 Reconsider the decomposition of a Toffoli gate as depicted in Fig. 6.7. Using the proposed exact synthesis approach, a minimal quantum circuit

136

6 Optimization

Fig. 6.8 Circuit of Example 6.10

(with respect to both quantum cost and NNC) as shown in Fig. 6.7(c) is determined. In comparison to the naive method (see Fig. 6.7(b)), this reduces the quantum cost from 11 to 9 while still ensuring NNC optimality. Furthermore, the realization can be reused as a macro while decomposing larger reversible circuits. For example, consider the circuit shown in Fig. 6.8. Here, for the second gate the naive method is applied (i.e. standard decomposition is performed and SWAPs are added), while for the remaining ones the obtained macro is used. This enables a quantum cost reduction from 96 to 92. In total, 13 macros have been generated as listed in Table 6.4 together with the respective costs in comparison to the costs obtained by using the naive method. As can be seen, exploiting these macros reduces the cost for each gate by up to 63%. The effect of these macros on the decomposition of reversible circuits is considered in detail in the experiments.

6.3.2.2 Reordering Circuit Lines Applying the approaches introduced so far always leads to an increase in the quantum cost for each non-adjacent gate. In contrast, by modifying the order of the circuit lines (similar to the SWOP approach introduced in Sect. 5.3), some of the additional costs can be saved. As an example, consider the circuit in Fig. 6.9(a) with quantum cost of 3 and an NNC value of 6. By reordering the lines as shown in Fig. 6.9(b), the NNC value can be reduced to 1 without increasing the total quantum cost. To determine which lines should be reordered, two heuristic methods are proposed in the following. The former one changes the order of the circuit lines according to a global view, while the latter one applies a local view to assign the line order. Global Reordering After applying the standard decomposition, a cascade of 1- and 2-qubit gates is generated. Now, an order of the circuit lines which reduces the total NNC value is desired. To do that, the “contribution” of each line to the total NNC value is calculated. More precisely, for each gate q with control line i and target line j , the NNC value is determined. This value is added to variables impi and impj which are used to save the impacts of the circuit lines i and j on the total NNC value, respectively. Next, the line with the highest NNC impact is chosen for reordering and placed at the middle line (i.e. swapped with the middle line). If the selected line is the middle line itself, a line with the next highest impact is selected. This procedure is repeated until no better NNC value is achieved. Finally, SWAP

6.3 Optimizing Circuits for Linear Nearest Neighbor Architectures

137

Fig. 6.9 Reordering circuit lines

Fig. 6.10 Global and local reordering

operations as described in the previous sections are added for each non-adjacent gate. Example 6.11 Consider the circuit depicted in Fig. 6.10(a). After calculating the NNC contributions, impx0 = 1.5, impx1 = 0, impx2 = 0.5, and impx3 = 1 result. Thus, line x0 (highest impact) and line x2 (middle line) are swapped. Since further swapping does not improve the NNC value, reordering terminates and SWAP gates are added for the remaining non-adjacent gates. The resulting circuit is depicted in Fig. 6.10(b) and has quantum cost of 9 in comparison to 21 that results if the naive method is applied. Local Reordering In order to save SWAP gates, line reordering can also be applied according to a local schema as follows. The circuit is traversed from the inputs to the outputs. As soon as there is a gate q with an NNC value greater than 0, a SWAP operation is added in front of q to enable an adjacent gate. However, in contrast to the naive NNC-based decomposition, no SWAP operation is added after q. Instead, the resulting order is used for the rest of the circuit (i.e. propagated through the remaining circuit). This process is repeated until all gates are traversed. Example 6.12 Reconsider the circuit depicted in Fig. 6.10(a). The first gate is not modified, since it has an NNC of 0. For the second gate, a SWAP operation is applied to make it adjacent. Afterwards, the new line order is propagated to all remaining gates resulting in the circuit shown in Fig. 6.10(c). This procedure is repeated until the whole circuit has been traversed. Finally, again a circuit with quantum cost of 9 (in contrast to 21) results.

138

6 Optimization

6.3.3 Experimental Results In this section, experimental results obtained with the introduced approaches are presented. The methods introduced in Sects. 6.3.1 and 6.3.2, respectively, are evaluated by measuring the overhead needed to synthesize circuits with an optimal NNC value of 0. The approaches have been implemented in C++ and applied to benchmark circuits from RevLib [WGT+08] using an AMD Athlon 3500+ with 1 GB of main memory. The results are shown in Table 6.5. The first column gives the names of the circuits. Then, the number of circuit lines (n), the gate count (d), the quantum cost (QC), and the NNC value of the original (reversible) circuits are shown. The following columns denote the quantum cost of the NNC-optimal circuits obtained by using the naive method (NAIVE), by additionally exploiting macros (+Macros), and by applying reordering (G LOBAL, L OCAL, or both), respectively. The next column gives the percentage of the best quantum cost reduction obtained by the improvements in comparison to the naive method (B EST I MPR). The last column shows the smallest overhead in terms of quantum cost needed to achieve NNC-optimality in comparison to the original circuit (OVERHEAD FOR NNC OPTIMALITY). All runtimes are less than one CPU minute and thus are omitted in the table. As can be seen, decomposing reversible circuits to have NNC-optimal quantum circuits is costly. Using the naive method, the quantum cost increases by one order of magnitude on average. However, this can be significantly improved if macros or reordering are applied. Even if reordering may worsen the results in some few cases (e.g. for local reordering in 0410184_169 or add64_184), in total this leads to an improvement of 50% on average—in the best case 83% improvement was observed. If the respective methods are separately considered, it can be concluded that the combination of global and local reordering (i.e. G LOB .+L OC .) leads to the best improvements over all benchmarks. As a result, NNC-optimal circuits can be synthesized with a moderate increase of quantum cost.

6.4 Summary and Future Work Since synthesis results often are not optimal, optimization is an established part of today’s design flows. In this chapter, three optimization approaches for reversible logic have been introduced. While the first one reduces the circuit cost in general (i.e. the sizes of the respective gates and therewith quantum or transistor cost, respectively), the second one reduces the number of lines, and the third one addresses a more dedicated technology specific cost criterion. These approaches clearly show that post-synthesis optimization often has to be done with respect to the desired technology. For example, if quantum circuits are addressed the designer has to trade-off if an up to 70% cost reduction justifies to add a new circuit line (and therewith to spend one more qubit). Moreover, the NNC-based

0410184_169 3_17_13 4_49_17 4gt10-v1_81 4gt11_84 4gt12-v1_89 4gt13-v1_93 4gt4-v0_80 4gt5_75 4mod5-v1_23 4mod7-v0_95 add16_174 add32_183 add64_184 add8_172 aj-e11_165 alu-v4_36 cnt3-5_180 cycle10_2_110 decod24-v3_46 ham15_108 ham7_104

C IRCUIT

14 3 4 5 5 5 5 5 5 5 5 49 97 193 25 4 5 16 12 4 15 7

n

d

46 6 12 6 3 5 4 5 5 8 6 64 128 256 32 13 7 20 19 9 70 23

90 14 32 34 7 42 16 34 21 24 38 192 384 768 96 45 31 120 1126 9 453 83

QC

Table 6.5 Results of NNC-optimal synthesis

24 3 21 41 7 80 26 55 20 25 36 95 191 383 47 39 35 416 3368 9 2506 158

NNC 234 32 158 282 49 525 173 366 142 174 256 762 1530 3066 378 280 242 2621 21420 63 15494 1035

QC

197 28 120 282 47 525 173 364 138 155 256 473 953 1913 233 260 238 2591 21420 63 15390 1027

QC

234 32 128 258 25 321 77 168 118 114 352 762 1530 3066 378 280 218 1457 21420 39 14030 657

QC

423 32 98 150 22 171 56 138 82 78 127 1104 3744 13632 360 181 113 731 8046 21 2627 342

QC

D ECOMPOSED (NNC- OPTIMAL ) C IRCUITS R EORDERING NAIVE +M ACROS G LOBAL L OCAL

423 32 98 147 16 168 53 141 79 78 121 1104 3744 13632 360 181 104 728 8046 24 2588 333

QC

G LOB .+L OC .

16% 13% 38% 48% 67% 68% 69% 62% 44% 55% 53% 38% 38% 38% 38% 35% 57% 72% 62% 67% 83% 68%

B EST I MPR

2.19 2.00 3.06 4.32 2.29 4.00 3.31 4.06 3.76 3.25 3.18 2.46 2.48 2.49 2.43 4.02 3.35 6.07 7.15 2.33 5.71 4.01

OPTIMALITY

OVERHEAD FOR NNC

6.4 Summary and Future Work 139

hwb4_52 hwb5_55 hwb6_58 hwb7_62 hwb8_118 hwb9_123 mod5adder_128 mod8-10_177 plus127mod8192_162 plus63mod4096_163 plus63mod8192_164 rd32-v0_67 rd53_135 rd73_140 rd84_142 sym9_148 sys6-v0_144 urf1_149 urf2_152 urf3_155 urf5_158 urf6_160

C IRCUIT

Table 6.5 (Continued)

4 5 6 7 8 9 6 5 13 12 13 4 7 10 15 10 10 9 8 10 9 15

n

11 24 42 331 633 1959 15 14 910 429 492 2 16 20 28 210 15 11554 5030 26468 10276 10740

d

23 104 142 2325 14260 18124 83 88 57400 25492 32578 10 77 76 112 4368 67 57770 25150 132340 51380 53700

QC

14 119 193 4236 28803 47373 154 147 165415 63732 99482 5 124 119 234 12184 96 122802 45338 331578 114784 239034

NNC 107 823 1304 27967 187272 304659 1011 975 1057946 407926 633994 38 822 790 1516 77556 638 794582 297178 2121808 740084 1487904

QC

83 817 1160 27869 186880 304540 978 969 1057804 407784 633852 19 750 739 1465 77556 587 735170 276882 2038584 706412 1478080

QC

107 595 1268 25939 182196 302481 675 621 1057946 407926 633994 20 702 646 1696 67428 842 659150 297178 1933372 667484 1334916

QC

65 337 614 13390 87495 124068 330 372 503516 210400 279016 32 330 304 556 20643 263 238475 101683 596368 208709 320412

QC

D ECOMPOSED (NNC- OPTIMAL ) C IRCUITS R EORDERING NAIVE +M ACROS G LOBAL L OCAL

65 340 545 12955 87498 124041 333 363 503516 210400 279016 20 303 295 586 25023 308 238490 101683 596371 208706 320409

QC

G LOB .+L OC .

39% 59% 58% 54% 53% 59% 67% 63% 52% 48% 56% 50% 63% 63% 63% 73% 59% 70% 66% 72% 72% 78%

B EST I MPR

2.83 3.24 3.84 5.57 6.14 6.84 3.98 4.13 8.77 8.25 8.56 1.90 3.94 3.88 4.96 4.73 3.93 4.13 4.04 4.51 4.06 5.97

OPTIMALITY

OVERHEAD FOR NNC

140 6 Optimization

6.4 Summary and Future Work

141

optimization is only needed, if quantum circuits for the mentioned LNN architectures should be designed. Thus, optimization approaches should be available that can be applied by the designer according to his current needs. In future work, optimization approaches for further cost metrics are needed. The constraints considered in this chapter are not complete. Besides quantum cost, transistor cost, number of lines, and nearest neighbor cost, many more cost metrics exist (see e.g. [WSD09]). But so far, previous synthesis and optimization approaches only take these metrics into account. Thus, extending these methods so that further criteria are considered is a promising task for the future.

Chapter 7

Formal Verification and Debugging

This chapter introduces approaches for formal verification and debugging and therewith completes the proposed approaches towards a design flow for reversible logic. Verification is an essential step that ensures whether obtained designs in fact realize the desired functionality or not. This is important as with increasing complexity also the risk of errors due to erroneous synthesis and optimization approaches as well as imprecise specifications grows. Considering traditional circuits, verification has become one of the most important steps in the design flow. As a result, in this domain very powerful approaches have been developed, ranging from simulative verification (see e.g. [YSP+99, Ber06, YPA06, WGHD09]) to formal equivalence checking (see e.g. [Bra93, DS07]) and model checking (see e.g. [CGP99, BCCZ99]), respectively. For a more comprehensive overview, the reader is referred to [Kro99, Dre04]. For reversible logic, verification is still at the beginning. Even if first approaches in this area exist (e.g. [VMH07, GNP08, WLTK08]), they are often not applicable (e.g. circuits representing incompletely-specified functions are not supported). Furthermore, with new synthesis approaches (e.g. the BDD-based or the SyReC-based method introduced in Chap. 3), circuits can be designed that contain 100 and more circuit lines and ten thousands of gates—with upward trend. This increase in circuit size and complexity cannot be handled manually and thus efficient automated verification approaches are required. At the same time, existing approaches for traditional circuits cannot be directly applied, since they do not support the specifics of reversible logic such as new gate libraries, quantum values, or different embeddings of the same target function. In the first part of this chapter, equivalence checkers [WGMD09] are proposed that fulfill these requirements. More precisely, two approaches are introduced that check whether two circuits realize the same target function regardless of how the target function has been embedded or whether the circuit contains quantum logic or pure Boolean logic. The first approach employs decision diagram techniques, while the second one uses Boolean satisfiability. Experimental results show that for both methods, circuits with up to 27,000 gates, as well as adders with more than 100 inputs and outputs, are handled in under three minutes with reasonable memory requirements. R. Wille, R. Drechsler, Towards a Design Flow for Reversible Logic, DOI 10.1007/978-90-481-9579-4_7, © Springer Science+Business Media B.V. 2010

143

144

7 Formal Verification and Debugging

However, while verification methods can be used to detect the existence of errors, they do not provide any information about the source of the error. Thus, in the second part of this chapter, it is shown how the debugging process can be accelerated by using an automatic debugging method (where first results have been published in [WGF+09]). This method gets an erroneous circuit as well as a set of counterexamples as input, and determines a set of gates (so called error candidates) whose replacements with other gates fixes the counterexamples. Having this set of error candidates, the debugging problem is significantly simplified, since only small parts of the circuit must be considered to find the error. The proposed debugging approach thereby also uses Boolean satisfiability and is inspired by traditional circuit debugging [SVAV05]. Moreover, this approach is further extended so that also the concrete error locations are determined, i.e. gate replacements that do not only fix the counterexamples but additionally ensures that the specification is preserved. Experiments show and discuss the effect of these methods. The following two sections describe both approaches in detail, i.e. equivalence checking is addressed in Sect. 7.1 and automated debugging in Sect. 7.2, respectively. Finally, the chapter is summarized and future work is sketched in Sect. 7.3.

7.1 Equivalence Checking In this section, two approaches to the equivalence checking problem are presented. Realizations of both, completely or incompletely specified functions are supported. The circuits can be composed of reversible gates and quantum gates, and can thus assume multiple internal values, but the primary inputs and outputs of the circuits are restricted to pure (non-quantum) logic states. The proposed approaches build on well-known proof techniques for formal verification of irreversible circuits, i.e. decision diagrams and satisfiability. The first approach employs Quantum Multiple-valued Decision Diagrams (QMDDs) (see Sect. 2.2.2). It involves the manipulation of unitary matrices describing the circuits and additional matrices specifying the total or partial don’t cares. The second approach is based on Boolean satisfiability (SAT) (see Sect. 2.3.1). It is shown that the equivalence checking problem can be transformed to a SAT instance supporting constant inputs and garbage outputs. Additional constraints are added to deal with total and partial don’t cares. Experiments on a large set of benchmarks show that both approaches are very efficient, i.e. circuits with up to 27,000 gates, as well as adders with more than 100 inputs and outputs, are handled in less than three minutes with reasonable memory requirements. The remainder of this section is structured as follows. The circuit equivalence checking problem is defined in Sect. 7.1.1. Sections 7.1.2 and 7.1.3 present the QMDD-based and the SAT-based approach, respectively. Finally, experimental results are given in Sect. 7.1.4.

7.1 Equivalence Checking

145

7.1.1 The Equivalence Checking Problem The goal of equivalence checking is to prove whether two reversible (or quantum) circuits designed to realize the same target functionality are equivalent or not. In the latter case, additionally a counterexample is generated, i.e. an input assignment showing the different output values of the two circuits. It is thereby assumed that the two circuits have the same labels for the primary inputs and primary outputs, respectively. Since all considered circuits are reversible, circuits representing irreversible functions (e.g. any n-input m-output function with n = m) might contain constant inputs, garbage outputs, and don’t care conditions (see Sect. 3.1.1). Thus, five types of functions must be considered: • Completely specified: A completely specified reversible function is given. • Constant input: At least one input is assigned to a fixed logic value. For the other assignments to these inputs, all respective outputs are don’t cares. • Garbage output: At least one output is unspecified for all input assignments. • Total don’t care condition: The values of all outputs are unspecified for a given assignment to the inputs. • Partial don’t care conditions: The values of a proper subset of the outputs are unspecified for a given assignment to the inputs. Table 7.1 shows truth tables of a completely specified function, a function with a constant input, a function with a garbage output, a function with total don’t care conditions, and a function with partial don’t care conditions, respectively. A specification with constant inputs, garbage outputs, or any don’t care conditions is denoted as incompletely specified function. Total and partial don’t cares are inherited from the irreversible function whereas constant input and garbage output don’t cares usually arise from embedding the irreversible function in a reversible specification. In the next sections both, a QMDD-based and a SAT-based approach for checking the equivalence of two circuits with respect to constant inputs, garbage outputs, and don’t care conditions is proposed, respectively.

7.1.2 QMDD-based Equivalence Checking In this section, the QMDD-based approach for equivalence checking of reversible circuits is presented. First, the completely specified case is considered followed by a description of how constant inputs, garbage outputs, and don’t care conditions can be handled.

7.1.2.1 The Completely Specified Case Given a reversible circuit with gates g0 g1 . . . gd−1 , the matrix describing the circuit is given by M = Md−1 × · · · × M1 × M0 where Mi is the matrix for gate gi .

146

7 Formal Verification and Debugging

Table 7.1 Different function types x0 x1 x2

f0 f1 f2

1 x1 x2

f0 f1 f2

x0 x1 x2

f0 f1 g

0

0

0

1

1

1

0 0 0

– – –

0

0

0

1

1

–

0

0

1

1

1

0

0 0 1

– – –

0

0

1

1

1

–

0

1

0

1

0

1

0 1 0

– – –

0

1

0

1

0

–

0

1

1

1

0

0

0 1 1

– – –

0

1

1

1

0

–

1

0

0

0

1

1

1 0 0

0 1 1

1

0

0

0

1

–

1

0

1

0

1

0

1 0 1

0 1 0

1

0

1

0

1

–

1

1

0

0

0

1

1 1 0

0 0 1

1

1

0

0

0

–

1

1

1

0

0

0

1 1 1

0 0 0

1

1

1

0

0

–

(a) Completely spec.

(b) Constant input

x0 x1 x2

f0 f1 f2

x0 x1 x2

f0 f1 f2

0

0

0

1

1

1

0

0

0

1

1

1

0

0

1

1

1

0

0

0

1

1

1

–

0

1

0

–

–

–

0

1

0

1

0

1

0

1

1

1

0

0

0

1

1

1

–

0

1

0

0

0

1

1

1

0

0

0

1

1

1

0

1

–

–

–

1

0

1

0

1

0

1

1

0

0

0

1

1

1

0

–

0

1

1

1

1

0

0

0

1

1

1

0

0

0

(d) Total don’t cares

(c) Garbage output

(e) Part. don’t cares

As reviewed in Sect. 2.2.2, QMDDs are a data-structure for the representation and manipulation of r n × r n complex-valued matrices with r pure logic states. Moreover, for a given order the QMDDs of two identical functions are canonical [MT06, MT08]. Thus, for the completely specified case, two reversible circuits that realize the same function and adhere to the same variable order have the same matrix description. Because of this uniqueness of QMDDs, to check the equivalence of two circuits it is sufficient to verify that the top edges of the two QMDDs point to the same node with the same weight. A traversal of the QMDDs is not required. Note that sorting is required when the inputs or outputs, respectively, are not aligned in the two circuits. Therefore, swap gates are added to achieve the same order for both circuits.

7.1.2.2 Constant Inputs, Garbage Outputs A constant input means that the input space is restricted to those assignments containing that value, all others do not occur. To support this, the matrix is adjusted. Consider the case when the constant input is the top-level partition variable with

7.1 Equivalence Checking

147

constant value j . The following equations show the transformation of the input space (denoted by γ ) to the output space (denoted by δ) for the general case and for the case with a constant input, respectively.1 φ δ0 M0 M1 γ0 δ0 Mj γj = , = . δ1 M2 M3 γ1 δ1 φ Mj +2 φ The variable φ thereby denotes an empty sub-matrix or sub-vector of appropriate dimension. For QMDDs, an empty sub-matrix is represented by a null edge pointer. Thus, performing these operations before comparing both QMDDs, the equivalence can be checked even if constant inputs occur. In a similar way, garbage outputs are handled. Suppose the top-level partition variable is a garbage output. In this case, the output of the circuit regardless of the value of that variable is of interest. To this end, the matrix is reduced to M0 + M 2 M 1 + M 3 γ0 δˆ = , φ φ γ1 φ where δˆ denotes the output after removal of the garbage output. To explain the addition of sub-matrices, recall that the circuit inputs and outputs are assumed to be in pure logic states, i.e. one element of γ is 1 and the others are 0. The same is true for δ. Further, M is a permutation matrix (a special case of a unitary matrix). In general, constant inputs and garbage outputs can correspond to any variables in the circuit’s QMDD. This can be handled by performing a depth-first traversal of the QMDD applying the above reductions to each node as it is encountered. In a depth-first traversal, the reductions are applied to a node’s descendants before applying them to the node itself. Note that a variable can be both a constant input and a garbage output. The order of applying the two reductions is unimportant. This traversal reduces sub-matrices as required throughout the full matrix.

7.1.2.3 Don’t Care Conditions Let Mˆ denote the matrix for a circuit after the constant input and garbage output reductions are applied. To deal with total don’t cares in the target function, a diagonal matrix D is constructed such that Di,i = 0 if the corresponding output position is a total don’t care, and Di,i = 1 otherwise. Then Mˆ × D is computed. The effect is to force all total don’t care situations to 0 by ignoring the input states corresponding to don’t care output assignments. This ensures that when the reduced matrices are compared for two circuits, differences cannot arise in total don’t care positions. Note that the easiest way to construct a QMDD for D is to start from a diagonal matrix and then use a depth-first traversal to zero the diagonal elements corresponding to total don’t cares. 1 Note

that both matrices already are partitioned to correspond to the QMDD partitioning.

148

7 Formal Verification and Debugging

Partial don’t care conditions can be handled in a similar fashion. The difference is that partial don’t care conditions apply only to a subset and not to all outputs. The simplest approach is to treat the outputs for which a set of partial don’t cares does not apply as pseudo-garbage outputs and construct a new matrix for this situation by reducing the pseudo-garbage. A diagonal matrix is then constructed for those don’t cares and the equivalence check proceeds as above. This is repeated for each subset of the outputs that have shared partial don’t cares.

7.1.3 SAT-based Equivalence Checking In this section, the SAT-based equivalence checker for reversible logic is described. The general idea is to encode the problem as an instance of Boolean satisfiability to be solved by a SAT solver (see Sect. 2.3.1). If the SAT solver returns unsatisfiable, then the checked circuits are equivalent. Otherwise, a counterexample can be extracted from the satisfying assignment of the instance.

7.1.3.1 The Completely Specified Case To formulate the problem, a so-called miter structure as proposed in [Bra93] for traditional (irreversible) circuits is built. By applying the input assignments to both circuits G1 and G2 , differences at corresponding outputs are observed by XOR operations. If at least one XOR evaluates to 1 (determined by an additional OR operation), the two circuits are not equivalent. Example 7.1 The miter structure for two circuits containing three lines is shown in Fig. 7.1. Note that the added XOR and OR operations are only used in formulating circuit equivalence checking as a SAT instance. They are not actually added to the circuits. To encode this miter as a SAT instance, a new free variable is introduced for each signal in the circuit. Furthermore, each reversible gate as well as the additional XOR and OR operations of the miter structure are represented by a set of clauses. Finally, the output of the OR gate is constrained to the value 1. These transformations can be performed in linear time and space with respect to the given circuit sizes, since only local operations (adding a few number of clauses per gate) are required. In doing so, the SAT instance becomes satisfiable iff there exists an input assignment to the circuits where at least one pair of the corresponding outputs assumes different values. In this case, from the satisfying assignment a counterexample can be extracted just by obtaining the assignments of all SAT variables representing circuit lines. If both circuits are equal, then no such assignment exists and thus the SAT solver must return unsatisfiable. If quantum circuits are considered, V and V+ gates may produce non-Boolean values. Thus, the variables for the associated signals employ a multiple-valued rather

7.1 Equivalence Checking

149

Fig. 7.1 SAT formulation for completely specified functions

than a Boolean encoding. More precisely, each signal of the circuit is represented by two Boolean variables y and z, while • • • •

00 represents the Boolean value 0, 01 represents the non-Boolean value V0 , 10 represents the Boolean value 1, and 11 represents the non-Boolean value V1 . So, both reversible as well as quantum circuits can be checked.

7.1.3.2 Constant Inputs, Garbage Outputs, and Don’t Care Conditions If circuits realizing an incompletely specified function are checked for equivalence, constant inputs, garbage outputs, as well as total and partial don’t care conditions must be considered. To this end, the SAT formulation introduced in the last section is extended as follows: • Constant inputs: The associated SAT variables are restricted to the appropriate constant values. This can be applied by using unit clauses. • Garbage outputs: Garbage outputs are by definition don’t cares and can be ignored in the SAT miter structure. • Total and partial don’t care conditions: In these cases, new variables t, pf0 , . . . , pfn−1 and additional AND operations are added to the miter structure. The variable t evaluates to 0, iff an input assignment leading to a total don’t care condition is applied. The variables pfi (0 ≤ i < n) evaluate to 0, iff an input assignment leading to a partial don’t care condition at output fi is applied. The AND operations ensure that if t (pfi ) is assigned to 0, all outputs (the respective output) are ignored by the miter. Hence, only differences in output values without don’t care conditions are detected. Example 7.2 Figure 7.2 shows the extended miter for two circuits realizing an incompletely specified function. A truth table showing the garbage output, the don’t care conditions, as well as the resulting values for t and pfi is given in the left part of Fig. 7.2. Note that the first half of the truth table includes don’t cares due to the constant input.

150

7 Formal Verification and Debugging

Fig. 7.2 SAT formulation for incompletely specified functions

7.1.4 Experimental Results This section provides experimental results. The QMDD package from [MT08] has been used with the default sizes for the computational tables, a garbage collection limit of 250,000, and a maximum of 200 circuit lines. For the SAT-based approach the SAT solver MiniSAT [ES04] has been applied. In total, two kinds of experiments have been conducted (typically using different gate libraries): experiments with equivalent circuits and with non-equivalent circuits, respectively. The experiments have been carried out on an AMD Athlon 3500+ with 1 GB of memory with a timeout of 500 CPU seconds. All considered benchmark circuits were taken from RevLib [WGT+08]. Table 7.2 shows the results obtained by comparing equivalent and non-equivalent circuits, respectively. The first column gives the name of the circuit. For equivalent circuits, two numbers following the name give the unique identifier of the circuit realizations as used in RevLib. For non-equivalent circuits, only one number is given which identifies the circuit from the corresponding equivalent test with the larger number of gates. That circuit is used as given in RevLib and in a modified form by arbitrarily altering, adding or deleting gates. Column DC shows the types of don’t cares (see table note a for the coding). In column GT the gate types used in each circuit are provided (see table note b for the coding). Column n presents the number of inputs and column d gives the number of gates for the first and second circuit, respectively. In the next three columns, the data for the QMDD-based approach is shown, i.e. peak number of QMDD nodes, run-time in CPU seconds, and memory in MByte. The peak number of nodes is the maximum number of active nodes at any point in building the circuit QMDDs and checking for equivalence. Finally, the last four columns provide the data for the SAT-based method. First, the number of variables and the number of clauses of the SAT instance are shown. Then, the runtime as well as the required memory are given. Both approaches prove or disprove the equivalence for all considered benchmarks (except one for the SAT-based approach) very quickly. The maximum runtime observed was less than three minutes. Several experiments with more than 10,000 gates are included. The largest has nearly 27,000 gates. Even for these cases, the proof times are very fast. The largest circuit in terms of the number of inputs

CT

none

none

none

decod24-v2 (43, 44)

hwb4 (52, 49)

hwb5 (55, 53)

hwb6 (56, 58)

CGT

none

urf2 (152, 154)

cnt3-5 (179, 180)

none

urf1 (149, 151)

none

none

c2 (182, 181)

urf6 (160, 160)

G

alu-v4 (36, 37)

none

G

alu-v3 (34, 35)

none

G

alu-v2 (30, 33)

urf3 (155, 157)

G

alu-v1 (28, 29)

urf5 (158, 159)

CG

G

alu-v0 (26, 27)

CG

add32 (185, 183)

add64 (186, 184)

none

CG

add16 (175, 174)

DCa

0410184 (170, 169)

E QUIVALENT C IRCUITS

NAME

C IRCUITS

Table 7.2 Experimental results

CT/CT

NCT/CT

CT/CT

NCT/NCV

CT/CT

T/T

T/CT

T/CT

T/CT

T/CT

NCV/NCT

NCT/NCT

NCT/NCT

NCT/NCT

NCT/NCT

NCT/NCT

CV/CT

CV/CT

CV/CT

NCV/NCT

GTb

6

5

4

4

16

15

9

10

8

9

35

5

5

5

5

5

193

97

49

14

n

305/116

7/7

7/7

18/7

7/7

6/6

384/256

192/128

96/64

74/46

126/42

24/55

11/17

6/9

25/20

10740/10740

10276/499

26468/2674

5030/620

11554/1487

d

3588

972

208

192

204874

258140

250061

250441

250020

250311

40767

223

177

295

188

171

109769

28761

7841

2924

0.01

<0.01

<0.01

<0.01

0.64

97.95

3.32

22.71

1.09

5.05

0.08

<0.01

<0.01

<0.01

<0.01

0.01

0.12

0.03

0.02

0.01

T IME

QMDD- BASED N ODES

12.53

12.52

12.52

12.52

21.95

24.25

23.91

23.91

23.89

23.91

14.36

12.53

12.53

12.53

12.53

12.53

17.53

13.82

12.72

12.54

M EM .

SAT- BASED

1146

448

133

115

2338

343712

107764

320578

50849

130419

26018

84

83

145

82

72

198596

50148

12788

2843

VARS

2846

1111

336

214

21976

751876

249191

735809

119592

302871

40774

189

185

340

180

159

276164

70500

18356

4640

C LSES

0.03

0.02

0.01

<0.01

0.43

>500.00

19.15

86.81

3.82

32.10

0.52

<0.01

<0.01

<0.01

<0.01

<0.01

1.12

0.29

0.08

0.05

T IME

–

3.21

2.94

2.82

2.70

9.56

48.18

134.67

23.86

55.24

10.67

2.70

2.70

2.70

2.70

2.70

48.10

15.14

6.31

3.70

M EM .

7.1 Equivalence Checking 151

CGT

sym9 (146, 147)

CG

CG

G

G

G

G

G

none

none

none

none

add32 (185)

alu-v0 (26)

alu-v1 (28)

alu-v2 (30)

alu-v3 (34)

alu-v4 (36)

c2 (182)

urf1 (149)

urf2 (152)

urf3 (155)

add16 (175)

add64 (186)

none

CG

0410184 (170)

N ON - EQUIVALENT C IRCUITS

CGT

rd53 (135, 134)

NCT/NCT

T

GTP

mod10 (171, 176)

none

hwb9 (123, 122)

mod8-10 (178, 177)

NCT/CTP

none

hwb8 (116, 115)

T/T

T/T

T/CT

NCV/NCV

NCT/CT

NCT/CT

NCT/NCT

NCT/CT

NCT/NCT

CV/NCV

CV/NCV

CV/NCV

NCV/NCV

CT/CTP

CT/TP

NCT/NCT

NCT/CTP

NCT/F

none

hwb7 (62, 60)

GTb

DCa

NAME

C IRCUITS

Table 7.2 (Continued)

7

5

4

9

8

7

10

8

9

35

5

5

5

5

5

193

97

49

14

12

n

749/614

331/166

26468/26464

5030/5029

11554/11554

305/304

7/6

7/6

18/16

7/6

6/5

384/386

192/193

96/97

74/73

28/28

16/12

9/14

10/7

1959/1541

d 22010

250441

229154

250311

34107

204

202

346

119

116

91417

21892

9516

2550

8825

593

235

122

250162

82942

44.14

2.00

9.53

0.15

0.01

<0.01

<0.01

<0.01

<0.01

0.15

0.04

0.02

<0.01

0.02

<0.01

<0.01

<0.01

1.89

0.26

0.04

T IME

QMDD- BASED N ODES

23.91

22.97

23.91

14.21

12.52

12.52

12.52

12.52

12.52

17.93

14.11

13.22

12.59

12.89

12.53

12.53

12.53

23.88

16.19

13.44

M EM . 3730

582274

90549

231099

43648

81

79

198

77

67

298632

75398

19270

4318

727

224

161

97

33454

11599

211280

531527

64262

186

178

480

172

148

353478

90522

23732

6385

1582

523

491

265

79798

27786

9249

C LSES

1323351

SAT- BASED VARS

13.72

2.12

4.58

0.32

<0.01

<0.01

0.01

<0.01

<0.01

1.53

0.38

0.10

0.04

0.02

0.01

0.01

<0.01

6.28

1.02

0.14

T IME

243.19

39.08

97.09

15.84

2.70

2.70

2.70

2.70

2.70

63.91

19.31

7.23

4.16

3.37

2.83

2.70

2.70

16.73

7.42

4.49

M EM .

152 7 Formal Verification and Debugging

CGT

sym9 (146)

b Gate

CT/CT

CT/CT

NCT/NCT

NCT/NCT

NCT/NCT

NCT/NCT

NCT/NCT

CT/CT

NCT/CT

CT/NCT

NCT/NCT

CT/NCT

T/CT

T/CT

GTb 9

12

7

5

4

9

8

7

6

5

4

4

16

15

n

28/27

16/16

9/15

10/10

1959/1958

749/750

331/331

126/125

24/54

11/18

6/5

25/26

10740/10740

10276/10276

d

10045

611

269

107

250162

61990

22541

3922

961

216

89

204745

260196

250163

0.02

<0.01

<0.01

<0.01

2.55

0.31

0.06

0.01

<0.01

<0.01

<0.01

0.63

144.93

6.91

T IME

QMDD- BASED N ODES

12.93

12.54

12.52

12.53

23.91

15.28

13.45

12.53

12.53

12.52

12.52

21.94

24.50

23.90

M EM .

714

252

166

109

36244

12438

4865

1733

442

137

60

2429

343711

205539

VARS

SAT- BASED

type: N = NOT, C = controlled-NOT, F = multiple control Fredkin, P = Peres, T = multiple control Toffoli, V = V or V+

1553

582

501

296

83568

28977

11553

4357

1094

344

143

22158

751873

472739

C LSES

none = completely-specified, C = constant input, G = garbage output, T = total don’t-care, P = partial don’t-care

CGT

rd53 (135)

a Don’t-care:

T

GTP

none

hwb9 (123)

mod8 (10)

none

mod (10)

none

hwb8 (116)

none

hwb (4)

hwb7 (62)

CT

decod24-v2 (43)

none

CGT

cnt3-5 (179)

none

none

urf6 (160)

hwb5 (55)

none

urf5 (158)

hwb6 (56)

DCa

NAME

C IRCUITS

Table 7.2 (Continued)

0.01

0.01

0.01

<0.01

0.57

0.16

0.08

0.03

0.01

0.01

<0.01

0.43

16.48

4.85

T IME

3.36

2.82

2.70

2.70

17.39

8.05

4.94

3.50

2.94

2.83

2.70

9.74

143.23

84.53

M EM .

7.1 Equivalence Checking 153

154

7 Formal Verification and Debugging

and outputs, add64 with n = 193, is a 64 bit ripple carry adder which includes 64 constant inputs and 128 garbage outputs to achieve reversibility. Comparing the runtimes of the two approaches, the QMDD method is faster in the case of equivalent circuits, while in the non-equivalent case the SAT-based approach often is faster. Regarding memory usage it can be seen that both approaches do not blow up even for instances containing tens of thousands gates.

7.2 Automated Debugging and Fixing After a circuit has been shown erroneous, the designer must locate the concrete error location to detect the source of the error. Without any additional support, this is a manual and therewith time-consuming task. Thus, for traditional circuits, automated debugging approaches have been introduced that support the designer in this task (see e.g. [LCC+95, VH99, HK00, SVAV05, ASV+05, FSVD06]). They use the counterexamples obtained by verification to reduce the set of gates that still have to be considered during debugging. In particular the approach from [SVAV05], which uses Boolean satisfiability, has been shown to be efficient and robust. However, applying this method to reversible circuits does not lead to the desired results, i.e. already for single errors no gate reduction can be achieved. Consequently, as for the approaches introduced in the former chapters and sections, respectively, a newly developed approach for reversible logic is needed. In this section, a first approach to automatically determine error candidates explaining the erroneous behavior of a reversible circuit is introduced. More precisely, given an erroneous circuit and a set of counterexamples describing the error(s), the approach returns sets of gates, whose replacements with other gates fix the counterexamples. Besides the automatic debugging approach, also theoretical results are presented. For a restricted error model, i.e. assuming single missing control line errors, already the given number of counterexamples allows to exclude a significant number of gates from being an error candidate. Thus, the size of the SAT instance is reduced or in the best case no SAT call is needed leading to a speed-up of the overall debugging process. However, as in irreversible debugging error candidates only give an approximation of the real error location. For single errors this is not a big problem, since the reduction achieved by the error candidates often is significant. In contrast, for multiple errors a second problem besides approximation arises: the determined error candidates often are misleading, i.e. an error candidate pinpoints to parts of the circuit which cannot be used for repair. Hence, an improved debugging approach is introduced that determines the concrete error locations. That is, strengthened error candidates are generated ensuring that for each gate of the error candidate, a gate replacement is possible such that not only the counterexamples are fixed, but also the specification is preserved. Experiments demonstrate the behavior of this approach. While concrete error locations obviously give the best possible debugging quality (pinpointing to the exact error and offering a fix), the run-times increase for larger circuits. Hence, there is a quality vs. time trade-off.

7.2 Automated Debugging and Fixing

155

Finally, a theoretical result is presented that can be applied to automatically fix an erroneous circuit. Therefore, a single gate must be replaced by a fixing cascade which—due to reversibility—can be computed in time linear in the size of the circuit. Altogether, the approaches presented in this section help designers to determine error candidates, to improve these candidates so that the concrete error location results, and to fix the erroneous circuit. The remainder of this section is structured as follows: The next section introduces the debugging problem in detail and briefly reviews SAT-based debugging of traditional circuits. Having this as a basis, determining error candidates for reversible circuits is proposed and further improvements are described in Sect. 7.2.2. How to determine error locations instead of the approximating error candidates is explained in Sect. 7.2.3, while Sect. 7.2.4 introduces the automatic fixing approach. Finally, Sect. 7.2.5 provides experimental results for all proposed approaches.

7.2.1 The Debugging Problem To keep the following descriptions self-contained, this section defines the debugging problem and briefly reviews the SAT-based method for debugging of traditional circuits.

7.2.1.1 Problem Description As their classic counterparts, reversible circuits may contain errors e.g. because of bugs in synthesis as well as optimization tools or imprecise specifications, respectively. These errors can be detected e.g. by verification as introduced in the last section. However, to find the source of an error, the circuit must be debugged—often a manual and time-consuming process. Thus, automatic approaches are desired that support the designer to reduce the possible error locations. Therefore, error models have been defined that represent frequently occurred errors. Possible error models for reversible logic include: Definition 7.1 Let g = MCT(C, t) be a Toffoli gate of a circuit G. Then, 1. a missing control line error appears if g is replaced by TOF(C , t), where C = C \ {xi } with xi ∈ C (i.e. a control line is removed), 2. an additional control line error appears if gate g is replaced by TOF(C , t), where C = C ∪ {xi } with xi ∈ / C ∪ {t} (i.e. a control line is added), 3. a wrong target line error appears if g is replaced by TOF(C , t ), where t = t and C may be different from C (i.e. g is replaced by a gate with another target line), and 4. a wrong gate error appears if g is replaced by TOF(C , t ), where t = t and/or C is different from C (i.e. g is replaced by another gate).

156

7 Formal Verification and Debugging

Fig. 7.3 Circuit with missing control error

Remark 7.1 Note that some error models are supersets of other models. For example, all control line errors are also wrong gate errors. As shown later, the automatic debugging approaches can be improved, if the designer is able to restrict the error to a particular model. Given an error model together with an erroneous circuit G as well as a set of counterexamples, the goal of automatic debugging approaches is to determine a set of error candidates that may explain the erroneous behavior of G. An error candidate is a set of gates gi that can be replaced by other gates (according to the error model) such that for each counterexample the correct output values result. The size of an error candidate is given by the number of gates (denoted by k). Example 7.3 Figure 7.3 shows an erroneous circuit G together with a counterexample (applied to the inputs of G). At the outputs, the wrong values (determined by the counterexamples) as well as the expected (i.e. the correct) values are annotated. For this example, {g5 } is an error candidate, since replacing g5 with another gate (namely a gate with one more control line) would correct the output values. In this case, the counterexample detects a missing control error. Having error candidates, the debugging process can be significantly accelerated, since only a small part of the circuit (highlighted by the error candidates) must be inspected. Moreover, in many cases determining error candidates directly leads to the desired error location.

7.2.1.2 Debugging of Traditional Circuits To determine error candidates, automatic approaches have been introduced for traditional circuits (see e.g. [LCC+95, VH99, HK00, SVAV05, ASV+05, FSVD06]). In particular, methods based on Boolean satisfiability (SAT) have been demonstrated to be very effective for debugging irreversible logic [SVAV05]. Here, the erroneous circuit and a set of counterexamples are used to create a SAT instance. Solving this instance using well engineered SAT solvers (see e.g. [ES04]) returns solutions from which the desired set of error candidates can be determined. The general structure for the debugging problem that is encoded as a SAT instance is shown in Fig. 7.4. For each counterexample, a copy of the circuit is created, whose inputs are assigned to values provided by the given counterexamples (denoted by cex0 , . . . , cex|CEX| ). The outputs are assigned to the correct values. Furthermore, each gate gi is extended by additional logic, i.e. a multiplexor with select

7.2 Automated Debugging and Fixing

157

Fig. 7.4 SAT-based debugging approach

line si is added. If si is assigned to 0, then the output value of gate gi is passed through, i.e. the gate works as usual. Otherwise (if si = 1), an unrestricted value is used (available via a new free variable w). Therefore, if gi is an erroneous gate, the SAT solver can assign si = 1 and choose the correct gate value to enable correct values at the output of the circuit. As depicted in Fig. 7.4, the same select value si is used for a gate gi with respect to all duplications. This ensures that free values of the respective output signals are only used, if circuit outputs are corrected for all counterexamples. Furthermore, the total number of selects si set to 1 is limited to k. Starting with k = 1, k is iteratively increased until the SAT instance becomes satisfiable. Then, each satisfying assignment yields an error candidate of size k. All gates with si set to 1 are contained in the error candidate. By performing all solution SAT (i.e. determining all solutions from the instance), all error candidates for a given k are calculated.

7.2.2 Determining Error Candidates In this section, the SAT-based debugging formulation for reversible circuits is presented. It is shown that the formulation for traditional circuits cannot be directly applied. Nevertheless, a similar concept is exploited. Furthermore, for specific error models (namely all control line errors) improvements are introduced.

7.2.2.1 SAT Formulation The debugging approach described above has been demonstrated to be very effective for determining error candidates of irreversible circuits. One-output gates for AND,

158

7 Formal Verification and Debugging

Fig. 7.5 Applying traditional debugging to reversible logic

OR, XOR, etc. are thereby considered. Thus, only a single multiplexor as shown in Fig. 7.5(a) is added to express whether a gate gi can become part of an error candidate or not. In contrast, reversible logic builds on a different gate library where each gate always has n outputs. Indeed, it is possible to convert Toffoli gates into respective AND gate and OR gate combinations (see Fig. 7.5(b) for a Toffoli gate with two control lines) and apply traditional debugging methods afterwards. But, this would lead to several drawbacks. First, more than one multiplexor with different select lines are needed to express whether a Toffoli gate gi can become part of an error candidate. Thus, the value k does not represent the number of error candidates any longer. For example, a single missing control line error may show up on two multiplexors and hence complicates debugging. Furthermore, only one single line of the Toffoli gate is considered with this formulation. Thus, errors like misplaced target lines cannot be detected. Alternatively, n multiplexors with the same select si might be added to the debugging formulation as shown in Fig. 7.5(c). However, also this lead to meaningless results as the following lemma shows. Lemma 7.1 Let G be an erroneous circuit. Using the traditional debugging approach with the additional logic formulation depicted in Fig. 7.5(c) and an arbitrary set of counterexamples, for each gate gi (0 ≤ i < d) of G a satisfying solution with si = 1 exists. Proof Let G = G1 gi G2 be an erroneous circuit with the set of counterexamples CEX. A gate gi is determined as error candidate, if a satisfying assignment with si = 1 exists such that the correct output value for each counterexample cex ∈ CEX can be calculated. Using the additional logic formulation depicted in Fig. 7.5(c), assigning si = 1 enables unrestricted values for all n outputs of the gate gi . To obtain the values leading to the correct circuit output, just G−1 2 has to be applied to the correct output values. This can be performed for each gate gi of G.

7.2 Automated Debugging and Fixing

159

Fig. 7.6 Proposed debugging formulations

As a result, each gate will be identified as an error candidate. This lemma shows that the existing SAT-based debugging formulation for irreversible circuits is too general for reversible circuits. In fact, assigning si to 1 should imply that the output values of gate gi cannot be chosen arbitrarily, but with respect to the functionality of Toffoli gates. The two main properties of Toffoli gates are: • At most one line (the target line) is inverted if the respective control lines are assigned to 1 and • all remaining lines are passed through. A new formulation respecting these properties is given in Fig. 7.6(a). For each output of a gate gi , a second multiplexor with a new select sib is added (0 ≤ b < n). By restricting the sum si0 + · · · + sin−1 to 1 it is ensured that the value of at most one line is modified, if si is set to 1. All remaining values are passing through. Therewith, the multi-output behavior including the reversibility is reflected in the debugging formulation.2 In the following example the application of the debugging formulation is demonstrated. Example 7.4 Consider the circuit realization of function 3_17 with an injected missing control error at gate g5 depicted in Fig. 7.7. The missing control error leads to four counterexamples shown in the first four rows below the circuit in Fig. 7.7. For k = 1, besides {g5 } the proposed debugging formulation also returns {g4 } as an error candidate (marked by a dashed rectangle). This is, because replacing g4 with a NOT gate at line c leads to correct output values for all counterexamples as shown in the first four rows: The bold values will be inverted such that they match the propagated values from the output of the circuit. However, as in traditional debugging an error candidate always is an approximation and thus may not necessarily be the error location. In fact, g4 is not an error location, since for the NOT gate replacement an incorrect output (with respect to 2 Note that using multiplexors obviously makes the considered circuits non-reversible. However, this formulation is only used as a logic encoding of the debugging problem.

160

7 Formal Verification and Debugging

Fig. 7.7 Circuit with single error

the specification of 3_17) using 011 as input is computed as can be seen in the fifth line of the figure. Nevertheless, the number of gates that have to be considered to detect the error is reduced from 6 to 2. As shown by the experiments in Sect. 7.2.5, a significant amount of gates can be automatically identified as non-relevant using the proposed approach, i.e. to fix the error only a very small fraction of all gates has to be considered. Besides that, some further improvements are possible as described in the following.

7.2.2.2 Improvements for Control Line Errors The proposed debugging formulation needs a substantial amount of additional logic. This can be reduced, if a restricted error-model is assumed. In this section, a simpler debugging formulation for control line errors is described. This simplification leads to a faster calculation of error candidates and thus should be applied, if the source of an error can be limited to this type of errors. Control line errors include both, missing as well as additional control lines in a Toffoli gate. They may occur, when e.g. optimization approaches or the designer himself manipulates control lines of Toffoli gates. In particular, deleting control lines is used by optimization approaches (see e.g. [ZM06] or the approach described in Sect. 6.1), since they reduce the quantum costs for the considered circuit. Errors caused by deleting control lines can be seen as missing control errors, too. Since missing control errors (as well as additional control errors) only affect the target line of a Toffoli gate, the debugging formulation can be simplified to the one shown in Fig. 7.6(b). Here, multiplexors are only added for the target line of each gate. If a gate gi includes a control line error, only the value of the target line can be erroneous. By assigning si to 1, the SAT solver can choose the correct value and thus enable correct outputs. In this case, gi becomes an element of an error candidate. Furthermore, if a single missing control error is assumed, the following holds:

7.2 Automated Debugging and Fixing

161

Lemma 7.2 Let G be a reversible circuit with a single missing control error and |CEX| the total number of counterexamples for this error. Then, the erroneous gate includes c = n − 1 − log2 |CEX| control lines. Proof Let G be a reversible circuit with a missing control error in gate gi containing c control lines. To detect the erroneous behavior, (1) all control lines of gi have to be assigned to 1 and (2) another line of gi (the missing control line) has to be assigned to 0. Due to the reversibility, these values can be propagated to the inputs of the circuit leading to |CEX| = 2n−c−1 different counterexamples in total. From this, one can conclude |CEX| = 2n−c−1 , log2 |CEX| = log2 2n−c−1 , log2 |CEX| = n − c − 1, c = n − 1 − log2 |CEX|.

Remark 7.2 If the total number of counter-examples is not available, Lemma 7.2 can still be used as an upper bound. This is, because the value of CEX can only be increased leading to a smaller value of c. Thus, all gates including more than n − 1 − log2 |CEX| control lines don’t have to be considered during debugging (where in this case |CEX| is the rounded up number of available counterexamples). Exploiting Lemma 7.2, the number of gates that have to be considered can be reduced significantly. In some cases, this reduction already leads to a single gate and therewith to the desired error location (see experiments in Sect. 7.2.5). But, even if additionally the SAT-based part to determine error candidates has to be invoked, improvements can be observed, since additional logic as depicted in Fig. 7.6(b) has to be added only to gates containing exactly c = n − 1 − log2 |CEX| control lines.

7.2.3 Determining Error Locations The debugging approach proposed in the last section basically uses a modified multiplexor formulation compared to debugging for irreversible circuits. However, traditional debugging approaches suffer from the problem that the obtained error candidates only approximate the error location(s), i.e. the verification engineer may be pinpointed to misleading parts of the circuit which cannot be used for repair. In contrast, for reversible logic the debugging formulation can be extended to overcome these limitations. As a result, the real source of an error can be calculated—the error location. In this section, first the limits of traditional debugging and error candidates are discussed in more detail, followed by a more detailed description of error locations. Then, the new debugging algorithm for computing error locations is presented.

162

7 Formal Verification and Debugging

Fig. 7.8 Circuit with multiple error

7.2.3.1 Limits of Error Candidates As in irreversible debugging, error candidates are only an approximation of the real source of the error (as shown in Example 7.4). In case of single errors (i.e. for k = 1), this can be accepted, since the number of gates that have to be considered is significantly reduced. However, if multiple errors occur in the design, the set of error candidates may be completely misleading as the following example shows. Example 7.5 In Fig. 7.8, a circuit realization of the function alu is depicted. In this circuit two missing control line errors have been injected: at gate g2 and at gate g3 , respectively. If the proposed debugging approach is applied, already for k = 1 a solution (namely {g2 }) is returned. However, by exhaustive enumeration it has been checked that no replacement for gate g2 exists such that the circuit realizes the function specification. In fact, an appropriate replacement of gate g2 only fixes the counterexamples (similar to Example 7.4), while the correct behavior according to the specification of the circuit is not preserved. Thus, this error candidate is misleading. The example clearly demonstrates the need for strengthened error candidates. This results in the formalization of error locations. An error location is an error candidate where for each gate of the error candidate there is a single gate replacement which not only fixes all counterexamples, but also preserves the overall specification. Having a single error location available, the real error is automatically highlighted in the circuit and no further manual inspection is necessary. Since error locations are strengthened error candidates, this concept guarantees to give better results than just determining error candidates. In the following, an automatic approach for determining error locations is described.

7.2.3.2 Approach The general idea of the debugging approach for calculating error locations is as follows. For increasing sizes k of error candidates, it is checked whether an error candidate is an error location or not. To determine the error candidates, first the debugging formulation of Sect. 7.2.2 is applied. Then, a second SAT instance is created that checks whether there are gate replacements such that the specification

7.2 Automated Debugging and Fixing

163

Fig. 7.9 SAT-based formulation to determine error locations of size k = 2

is fulfilled or not. In the following, first the SAT formulation for this check is described. Afterwards, the overall algorithm (that uses this formulation) is introduced and illustrated by means of an example. Usually, in the debugging process a reference circuit F (used to obtain the counterexamples) is available. Having this, a method is applied which is inspired by SAT-based equivalence checking (as introduced in Sect. 7.1.3) and exact synthesis of reversible logic (as introduced in Chap. 4). To check the existence of appropriate gates which replace each gate of a given error candidate of size k, a miter structure as depicted in Fig. 7.9 is built. Note that this figure illustrates the structure only for a concrete example, i.e. a circuit with three inputs/outputs and an error candidate of size k = 2 containing the gates gp and gq . By applying the inputs to both, to the reference circuit F as well as to the erroneous circuit G, the identity for corresponding outputs must result which is enforced by XNOR gates. An additional AND gate, where the output is set to value 1, constrains that both circuits must produce the same outputs for the same input assignment. Then, it is allowed that each gi of the current error candidate can be of any arbii i trary type. To this end, free variables tlog , ti , . . . , t1i and c1i , c2i , . . . , cn−1 2 n log2 n−1 are introduced (for brevity denoted by ti and ci in the following). According to the assignment to ti and ci , the gate gi is modified. Thereby, ti is used as a binary encoding of a natural number t i ∈ {0, . . . , n − 1} which defines the chosen target line. In contrast, ci denotes the control lines. More precisely, assigning cli = 1 (with 1 ≤ l ≤ n − 1) means that line (t i + l) mod n becomes a control line of the Toffoli gate gi . The same encoding has already been used in Chap. 4 for exact synthesis. Figure 4.5 on p. 62 gives some examples for assignments to ti and ci with their respective Toffoli gate representation. Finally, this formulation is duplicated for each possible input of the circuit, i.e. 2n times. The same variables ti and ci are thereby used for each duplication. In doing so, a functional description is constructed which is satisfiable, iff there is a valid assignment to ti and ci (i.e. iff there is a gate replacement of all gates gi ) such that

164 Fig. 7.10 Main flow of error location determination

7 Formal Verification and Debugging

(1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

determineErrorLocation(F , G, CEX) EC = ∅; // stores error candidate k = 1; while (true) do inst = formulateErrCandInstance(G, CEX, k); for each (EC = solution(inst)) inst = formulateErrLocInstance(F, G, EC); if (solution(inst) == SAT) return EC; k = k + 1;

for all inputs the same input-output mapping results. Then, a fix can be extracted from the assignments to ti and ci . If there is no such assignment (i.e. the instance is unsatisfiable), it has been proven that the considered error candidate is not an error location. Note that despite this SAT formulation, it is also possible to exhaustively enumerate all gate combinations for an error candidate. However, using modern SAT solvers sophisticated techniques (in particular search space pruning by means of conflict analysis [MS99] as well as efficient implication techniques [MMZ+01]) are exploited that significantly accelerates the solving process. In doing so, the worst case complexity still remains exponential but as the experiments in Sect. 7.2.5 show, error locations can be determined for many reversible circuits. Having this SAT formulation as a basis, the overall approach to determine error locations is summarized in Fig. 7.10. First, it is aimed to find an error location including one gate only, i.e. error candidates with k = 1 are determined while EC contains the current error candidate (line 5 and 6). Then, it is checked whether EC is an error location using the SAT formulation as described above. If this is the case (line 8), then EC is an error location and thus is returned (line 9). Otherwise, the remaining error candidates of same size are checked. If no error location of size k has been found, then k is increased and the respective steps are repeated (line 10). Example 7.6 Consider again the circuit shown in Fig. 7.8 with two injected errors. The debugging approach described in Sect. 7.2.2 first identifies EC = {g2 } as an error candidate. However, using the SAT formulation introduced above it can be verified that there is no gate replacement for EC that preserves the original circuit specification. Thus, further error candidates are generated. Since this is not possible for k = 1, k is increased leading to EC = {g2 , g3 }. Because an appropriate gate replacement can be found for this candidate, EC is the desired error location. Note that sometimes more than one error location is possible. As an example, consider the circuit given in Fig. 7.11. Here, a single missing control error has been injected at gate g1 . Nevertheless, in total there are two repairs for the erroneous circuit: g0 and g1 , respectively. Thus, if the designer wants to know if more than a single error location exists, the algorithm in Fig. 7.10 must not terminate at line 9 but iterated as long as the desired number of checks are performed.

7.2 Automated Debugging and Fixing

165

Fig. 7.11 Erroneous circuit with two error locations

In summary, using the proposed formulation, error locations can be automatically determined. This does not only pinpoint the designer to the concrete source of an error, but also allows an automatic repair, since the concrete gate replacements that fixes the erroneous behavior can be obtained by the assignments to ti and ci , respectively.

7.2.4 Fixing Erroneous Circuits So far, the goal was to determine error candidates or locations that explain the erroneous behavior in a circuit, respectively. However, the reversibility of the considered circuits additionally allows an easy computation of fixes (even easier than creating a fixing formulation and solving the corresponding SAT instance as described in the previous section). Therefore, a single gate must be replaced by a fixing cascade which—due to reversibility—can be computed in time linear in the size of the circuit. More precisely, fixes can be automatically generated by applying the following lemma. Lemma 7.3 Let F be an error-free reference circuit and G = G1 gi G2 be an erroneous realization/optimization of F . Then, G can be fixed by replacing an arbitrary fix −1 gate gi of G with a cascade of gates Gi = G−1 1 F G2 . Proof Since G−1 G realizes the identity function, it holds: fix

G1 Gi G2 = F fix −1 G−1 1 G1 Gi G2 G2

⇔

−1 = G−1 1 F G2

⇔

−1 Gi = G−1 1 F G2 . fix

fix

Thus, replacing gi with Gi fixes the erroneous circuit G.

At a first glance, applying this lemma for fixing the erroneous circuit G, i.e. refix placing some gi by Gi (which includes the circuit F ), leads to a larger circuit than fix F itself. However, in many cases Gi can be reduced to a few gates. In particular, fix if the chosen gate gi is a location of a single error, Gi can be simplified to a single

166

7 Formal Verification and Debugging

Fig. 7.12 Fixing an erroneous circuit

fix

Table 7.3 Sizes of Gi

fix

fix

G ATE

|Gi |

Simplified |Gi |

g0

13

6

g1

13

3

g2

13

1

g3

13

3

g4

13

3

gate. As a consequence, Lemma 7.3 can also be applied to determine error locations of single errors. The application of Lemma 7.3 is illustrated by the following example. Example 7.7 Consider the circuits F and G for function ham3 as depicted in Figs. 7.12(a) and (b), respectively. While F realizes the desired function, G is an erroneous optimization. Applying Lemma 7.3 for each gate gi gives the results shown in Table 7.3. The first column gives the considered gate. In the second column the fix number of gates of Gi after applying the lemma is shown, and the last column profix vides the same information after simplification of Gi . As can be seen, nearly all fixes can be significantly reduced. For gate g2 even a reduction to a single gate can be observed. For simplification any synthesis or optimization approach can be used. Thereby, fix to determine the smallest possible fix, in the worst case the simplification of Gi has to be executed for each possible gate of the circuit, since the best position is not known upfront. However, in many cases even the exact synthesis approach described in Chap. 4 leads to good results in feasible time (in particular, if it is tried fix to simplify Gi to a single gate only). In other cases, the transformation-based approach described in Sect. 3.1.2 has been shown to be quite effective. So, different positions can be tried until a satisfying one is found. Recall that in case of multiple errors, this technique repairs the circuit by substituting one gate by the simplified fix. However, this gives no information where the errors are located.

7.2 Automated Debugging and Fixing

167

7.2.5 Experimental Results The proposed methods have been implemented in C++ and were evaluated on a set of reversible circuits taken from RevLib [WGT+08]. In this section, the results of the experimental studies are presented. First, the behavior of the various debugging methods applied to single errors is discussed, followed by a consideration for multiple errors. Afterwards, the results of the automatic fixing approach are presented. For all benchmark circuits, single and multiple errors have been randomly injected to circuits taken from [WGT+08]. More precisely, a gate has been replaced with another gate (leading to a wrong gate error) or a control line has been removed (leading to a missing control error), respectively. Counterexamples showing the errors were generated using the SAT-based equivalence checker introduced in Sect. 7.1.3. For solving the respective instances, the SAT solver MiniSAT [ES04] has been used. The documented run-times include the times for instance generation and solving. All experiments have been carried out on an AMD Athlon 3500+ with 1 GB of main memory. The timeout was set to 5000 CPU seconds.

7.2.5.1 Single Errors In a first series of experiments, the debugging approaches are considered for determining single error candidates. For debugging wrong gate errors, the approach proposed in Sect. 7.2.2 has been used (denoted by D BG EC). For missing control errors additionally the improvements are applicable, namely the consideration of target lines only (denoted by TARGET L INES ONLY) and the application of Lemma 7.2 (denoted by |CEX|-based Reduction). The results are summarized in Table 7.4. Column C IRCUIT gives the circuit name. Column d, column n, and column |CEX| give the number of gates, the number of lines, and the number of counterexamples (the number within the brackets denotes the number of counterexamples for which the circuit has been duplicated), respectively. Furthermore, for each approach the number of obtained error candidates (denoted by C AND .) and the overall run-time in CPU seconds (denoted by T IME) are provided. Column C AND . for |CEX|-BASED R EDUCTION includes two values. The first denotes the remaining gates after applying Lemma 7.2, the second gives the final number of error candidates after running the SAT-based debugging approach. Finally, column R ED . lists the best reduction obtained by the approaches, i.e. the percentage of gates that are identified as non-relevant (meaning the error is not located at these gates). As shown in the table, a significant amount of gates can be automatically identified as non-relevant for debugging the error. Reductions of at least two third—for larger circuits of more than 90%—are achieved. As an example, for the wrong gate error in circuit hwb9 with 1544 gates, two error candidates are obtained in less than 100 CPU seconds. The quality of the resulting set of error candidates often depends on the used strategy. For example, to identify the missing control error

168

7 Formal Verification and Debugging

Table 7.4 Determining error candidates for single errors (a) Wrong gate errors |CEX|

C IRCUIT

3_17 4_49 4gt4 ham3 ham7 hwb4 hwb5 hwb6 hwb7 hwb8 hwb9 plus63mod4096 plus63mod8192 plus127mod8192 urf1 urf2 urf3

d

n

6 16 6 5 23 17 55 126 289 637 1544 429 492 910 11554 5030 26468

3 4 5 3 7 4 5 6 7 8 9 12 13 13 9 8 10

4(2) 8(2) 4(2) 4(2) 16(1) 8(2) 8(2) 16(1) 16(1) 4(2) 30(3) 32(3) 8(2) 16(1) 128(12) 64(6) 256(25)

D BG EC C AND . 2 1 1 1 2 4 8 16 3 1 2 1 2 12 – 2 –

R ED . T IME <0.01 <0.01 <0.01 <0.01 <0.01 <0.01 0.01 0.03 0.05 8.23 89.75 14.60 12.17 35.81 >5000.00 1563.64 >5000.00

66.7 % 87.5 % 83.3 % 60.0 % 95.7 % 94.1 % 92.7 % 87.3 % 95.2 % 99.7 % 99.9 % 96.0 % 99.2 % 89.2 % – 99.9 % –

(b) Missing control errors |CEX|

C IRCUIT d

D BG EC C AND . T IME

3_17 6 3 4_49 16 4 4gt4 6 5 ham3 5 3 ham7 23 7 hwb4 17 4 hwb5 55 5 hwb6 126 6 hwb7 289 7 hwb8 637 8 hwb9 1544 9 plus63mod4096 429 12 plus63mod8192 492 13 plus127mod8192 910 13 urf1 11554 9 urf2 5030 8 urf3 26468 10

TARGET

LINES

ONLY

n

2(2) 8(2) 2(2) 4(2) 32(3) 8(2) 8(2) 16(1) 32(3) 8(2) 16(1) 16(1) 512(51) 64(6) 128(12) 64(6) 256(25)

2 10 3 3 5 4 6 7 250 34 68 296 101 237 – 12 –

C AND . T IME

<0.01 2 <0.01 2 <0.01 3 <0.01 1 <0.01 2 2 <0.01 0.02 8 0.07 6 0.71 13 5.57 2 221.42 3 78.81 26 426.74 92 198.14 206 >5000.00 1 2143.22 6 >5000.00 4

|CEX|-BASED R ED . R EDUCTION C AND . T IME

<0.01 4/2 <0.01 66.7 % <0.01 2/1 <0.01 93.8 % <0.01 2/2 <0.01 66.7 % <0.01 1/1 <0.01 80.0 % 0.01 17/1 <0.01 95.7 % <0.01 1/1 <0.01 94.1 % 0.01 25/1 <0.01 98.2 % 0.03 20/1 0.01 99.2 % 0.05 28/5 0.02 98.3 % 0.18 167/1 0.04 99.8 % 0.56 443/1 0.87 99.9 % 1.00 84/12 0.19 97.2 % 0.99 99.2 % 25.13 43/4 69.70 128/8 0.31 99.1 % 63.28 1/1 <0.01 99.9 % 6.40 1/1 <0.01 99.9 % 358.74 1/1 <0.01 99.9 %

7.2 Automated Debugging and Fixing

169

in circuit hwb7 still 250 (out of 289) error candidates have to be considered after applying the D BG EC approach. Here, restricting the error model and using the improvements not only leads to a speed-up, but also to a smaller number of error candidates. This is also effective for other circuits (e.g. hwb4 and urf1). Here, just by applying Lemma 7.2 the set of error candidates is reduced to the single erroneous gate and no further SAT call is required. For determining error locations (instead of error candidates), under the single error assumption two approaches can be used: The SAT-based formulation introduced in Sect. 7.2.3 (using the extended miter formulation) and the approach based fix on Lemma 7.3 (i.e. generate Gi for an error candidate determined by the debugging approach a nd try to simplify it to a single gate). Table 7.5 summarizes the results of both approaches where the former one is denoted by D BG EL and the latter one by D BG EC+F IX. Again, it is distinguished between wrong gate errors and missing control errors (where |CEX|-BASED R EDUCTION can be applied). Columns C IRCUIT, d, n, and C AND . denote the name, the number of gates, the number of lines, and the number of obtained error candidates of each benchmark, respectively. The documented overall run-times of the respective approaches are given in column T IME.3 As can be clearly seen, error locations for single errors can be determined even for circuits consisting of more than 25.000 gates. Applying Lemma 7.3 (i.e. D BG EC+F IX) is thereby more efficient than using the miter-formulation (i.e. D BG EL). In contrast, the D BG EL-approach is also applicable to multiple errors which is considered in the next section.

7.2.5.2 Multiple Errors If multiple errors occur, error candidates may be misleading. This has been observed in a second series of experiments, where circuits including multiple errors have been applied to the method for error candidate determination described in Sect. 7.2.2 (denoted by D BG EC) and the method for error location determination described in Sect. 7.2.3 (denoted by D BG EL), respectively. Results for multiple missing control errors injected to a set of circuits are given in Table 7.6. Results are thereby presented for the G ENERAL C ASE (without an error assumption) and for the case, where a control line error is assumed so that TARGET LINES ONLY can be considered. The denotation of the respective columns is analog to the ones in the previous tables. First of all, it can be seen that determining error candidates only (D BG EC), in fact is misleading for many cases. As examples, for the benchmarks 3_17, ham7, hwb5, 3_17-3, hwb5-3, and hwb7-3 error candidates with lower cardinality k than for the approved error location result. Thus, replacing these gates would fix the counterexamples but does not preserve the correct behavior according to the specification of 3 Note that in both cases this also includes the debugging run-time needed to obtain the error candidates.

170

7 Formal Verification and Debugging

Table 7.5 Determining error locations for single errors C IRCUIT d

n

C AND .

D BG EC+F IX T IME

D BG EL T IME

W RONG GATE ERRORS 3_17

6

3

2

<0.01

<0.01

4_49

16

4

1

<0.01

<0.01

4gt4

6

5

1

<0.01

<0.01

ham3

5

3

1

<0.01

<0.01

ham7

23

7

2

<0.01

0.02

hwb4

17

4

4

<0.01

<0.01

hwb5

55

5

8

0.01

0.06

hwb6

126

6

16

0.03

0.03

hwb7

289

7

3

0.05

0.80

hwb8

637

8

1

8.23

15.22

hwb9

1544

9

2

89.75

133.18

plus63mod4096

429

12

1

14.60

>5000.00

plus63mod8192

492

13

2

12.17

>5000.00

plus127mod8192

910

13

12

35.81

>5000.00 >5000.00

urf1

11554

9

–

>5000.00

urf2

5030

8

2

1563.64

1570.01

urf3

26468

10

–

>5000.00

>5000.00

M ISSING CONTROL ERRORS 3_17

6

3

2

<0.01

<0.01

4_49

16

4

2

<0.01

<0.01

4gt4

6

5

3

<0.01

<0.01

ham3

5

3

1

<0.01

<0.01

ham7

23

7

2

0.01

0.02

hwb4

17

4

2

<0.01

<0.01

hwb5

55

5

8

0.01

0.06

hwb6

126

6

6

0.03

0.33

hwb7

289

7

13

0.05

0.80

hwb8

637

8

2

0.18

14.22

hwb9

1544

9

3

0.56

44.51

plus63mod4096

429

12

26

1.00

>5000.00

plus63mod8192

492

13

92

25.13

>5000.00

910

13

206

69.70

>5000.00

11554

9

1

63.28

>5000.00

plus127mod8192 urf1 urf2

5030

8

6

6.40

37.45

urf3

26468

10

4

358.74

>5000.00

7.2 Automated Debugging and Fixing

171

Table 7.6 Determining error locations and error candidates for multiple errors D BG EL

C IRCUIT

D BG EC

G ENERAL C ASE d

n

|CEX|

P OS . k T IME

TARGET LINES ONLY

TARGET LINES ONLY

P OS . k T IME

C AND . k T IME

2 injected errors 3_17

6

3

4(2)

1

2

0.00 1

2

0.00

2

1

<0.01

4_49

16

4

10(4)

1

2

0.03 1

2

0.00

3

2

<0.01

4gt4

6

5

8(4)

1

2

0.04 1

2

0.00

3

2

<0.01

ham3

5

3

4(2)

1

2

0.00 1

2

0.00

1

2

<0.01

ham7

23

7 30(12)

1

2

0.68 1

2

0.08

4

1

<0.01

hwb4

17

4

12(5)

1

2

0.04 1

2

<0.01

2

2

<0.01

hwb5

55

5

3(2)

1

2

0.52 1

2

0.04

2

1

<0.01

hwb6

126

6

22(9)

1

2

27.77 1

2

0.37

2

2

0.29

hwb7

289

7 80(32)

1

2

1204.28 1

2

5.49 13

2

0.87

hwb8

637

8 42(17)

–

– >5000.00 1

2

59.18

4

2

45.39

hwb9

1544

9 36(15)

–

– >5000.00 1

2

348.51

2

2

332.51

urf1

1517

9 30(12)

–

– >5000.00 –

– >5000.00

4

2 1007.09

urf3

2674

10 141(57)

–

– >5000.00 –

– >5000.00

4

2 2932.35

3

2

1

<0.01

0.01 79

3

<0.01

3 injected errors 3_17-3

6

3

4(2)

1

3

4_49-3

16

4

10(4)

1

3

5

0.00 1 1147

1

3

<0.01

4gt4-3

6

18(8)

1

3

0.64 1

3

0.00

6

3

<0.01

ham7-3

23

7 46(19)

1

3

8.94 1

3

0.09

4

3

<0.01

hwb4-3

17

4

12(5)

1

3

0.48 1

3

0.05 65

3

<0.01

hwb5-3

55

5 24(10)

1

3

4241.11 1

3

1.03

2

0.18

2

hwb6-3

126

6 41(17)

–

– >5000.00 1

3

7.38

4

3

6.82

hwb7-3

289

7 94(38)

–

– >5000.00 1

3

311.04

7

2

1.10

hwb8-3

637

8 97(39)

–

– >5000.00 1

3

2575.31 12

3 2606.05

the circuit (as also discussed in Example 7.5). These examples confirm the need for determining error locations. However, as also shown in Table 7.6, determining error locations is expensive. For some benchmarks (urf1 and urf3) it was not possible to obtain the error locations within the given timeout. For other benchmarks (hwb8, hwb9, hwb6-3, hwb7-3, and hwb8-3), this was only possible if the TARGET LINES ONLY improvement has additionally been applied. Nevertheless, for all other benchmarks error locations can be determined. In comparison to traditional debugging (where only approaches that determine error candidates exist), this is a significantly stronger result.

172 Table 7.7 Fixing for single and double errors

7 Formal Verification and Debugging |Gfix |

C IRCUIT |G|

|F |

n

min

max

AVG . T IME

Single errors 4_49

11

16

4

1

24

0.01

4gt4

6

17

5

1

3

0.01

hwb5

23

55

5

1

75

0.01

hwb6

41

126

6

1

157

0.02

hwb7

235

331

7

1

384

0.05 0.14

hwb8

613

749

8

1

678

hwb9

1543

1959

9

1

2645

0.84

urf1

1516

11554

9

1

2229

0.95

urf2

3249

5030

8

1

1176

0.41

11

16

4

3

22

0.01

4gt4

5

17

5

16

22

0.01

hwb5

23

55

5

3

77

0.01

hwb6

41

126

6

24

190

0.02

hwb7

235

331

7

69

431

0.05 0.21

Double errors 4_49

hwb8

613

749

8

3

1202

hwb9

1543

1959

9

489

2689

0.83

urf1

1516

11554

9

361

2655

1.03

urf2

3249

5030

8

904

1251

0.40

7.2.5.3 Automatic Fixing Finally, fixing of erroneous circuits without determining error locations is considered. For this purpose, Lemma 7.3 was applied to all gates gi of an erroneous circuit G using an additional circuit F as reference. Afterwards, the respecfix tive Gi has been simplified by extracting the function and synthesizing it using the transformation-based approach of Sect. 3.1.2. In doing so, the resulting sizes of the fixing cascade have been evaluated. Table 7.7 presents the results. Again, the first column gives the name of the circuit. The next two columns provide the sizes of the erroneous circuit G denoted by |G| and the reference circuit F denoted by |F |. Then, the minimal size (min) and the maximal size (max), as well as the average run-time to determine one fix (AVG . T IME) are shown, respectively. Obviously, the size of the respective fixes depends on both, the size of the er−1 roneous circuit G as well as on the reference circuit F (since Gfix = G−1 1 F G2 ). Nevertheless, very often the resulting Gfix can be simplified to a significantly smaller cascade. For single errors, always a single gate fix (identical to the error location) is found. For double errors, the fixes become larger. Nevertheless, also for this case very compact fixes can be efficiently computed. In fact, for most of the considered

7.3 Summary and Future Work

173

benchmarks a smaller (fixed) circuit than originally given by F can be obtained (sometimes even if the worst case fixes given in column max are applied). Since additionally never more than approx. one CPU second is needed to determine a fix, the application of Lemma 7.3 is an efficient alternative to simply fix erroneous circuits.

7.3 Summary and Future Work Verification is crucial to ensure the correctness (and therewith the quality) of the circuits. Two aspects are particularly covered in the traditional design flow: Model checking verifies if the initial description of a circuit (e.g. given in an HDL) matches the specification while equivalence checking verifies the correctness of the following (e.g. optimization) steps. In this chapter, approaches to close the gap for the latter aspect have been proposed. A typical scenario results from the application of the approaches introduced in the former chapters: Using synthesis approaches (e.g. the ones introduced in Chap. 3) a circuit for the desired function is synthesized. Since this circuit does not satisfy the cost requirements it is optimized (e.g. by one of the approaches introduced in Chap. 6 or by some manual optimization). Then, equivalence checking as proposed in this chapter is applied to verify that the optimized circuit still represents the desired function. Therefore, approaches based on decision diagrams and based on Boolean satisfiability have been introduced. Both check the equivalence with respect to the target functionality, i.e. regardless of the used embedding, the applied output permutation, the number of additional circuit lines, etc. Using these approaches, circuits consisting of thousands of gates or more than hundred variables have been efficiently verified. In contrast, model checking (in particular property checking) still is an open issue. In the traditional design flow, this aspect is often related to a hardware description language which is used to implement a given specification. The approach introduced in Sect. 3.3 is a first step towards such a language for reversible logic. To check the correctness of the respective implementations appropriate model checkers as well as property specification languages are needed. Thus, research in these areas is the next step. The methods described in this chapter for equivalence checking may provide the basis for that. However, while verification is used to detect the existence of an error only, it gives no support what to do if a circuit has been shown erroneous. In this case, the designer only has a set of counterexamples which can be used to manually debug the design. In particular for large circuits, this can be a time-consuming task. For this reason, in traditional circuit design automatic debugging approaches have been introduced that support the designer by identifying “non-relevant” gates and instead highlighting so called error candidates. However, in this chapter it was shown that a one-to-one adaption of such approaches does not lead to the desired results. Hence, a new formulation has been devised that integrates the properties of reversible gates. As a result, methods for error candidate determination now are also available for reversible circuits. The experiments showed that applying these methods, the number

174

7 Formal Verification and Debugging

of gates which have to be considered during debugging can be reduced by 66.7% to 99.9%. Moreover, also approaches that determine concrete error locations have been introduced. In particular, for multiple errors this avoids misleading results and directly pinpoints the designer to the source of an error. Finally, for the case an erroneous circuit only has to be fixed, an efficient fixing approach has been proposed as well. This approach exploits properties of reversible logic and thus allows an easy fix which only has to be simplified afterwards. Future work is mainly driven by the current performance of the proposed approaches—in particular if multiple errors are considered. Here, improvements with respect to the run-time are needed. For particular error models, first promising results have already been proposed in [FWD10, JFWD10]. But besides that, in particular the formal methods need improvement. Thus, more sophisticated solving engines and formulations are required. The exploitation of unsatisfiable cores, as e.g. done in [SFBD08] for multiple error debugging of irreversible circuits, might be a promising direction. Furthermore, also the consideration of more specialized error models (similar to the ones identified in [PFBH05] for reversible testing) can be studied in future work.

Chapter 8

Summary and Conclusions

Traditional technologies more and more start to suffer from the increasing miniaturization and the exponential growth of the number of transistors in integrated circuits. To face the upcoming challenges, alternatives are needed. Reversible logic provides such an alternative that may replace or at least enhance traditional computer chips. In the area of quantum computation and low-power design, first very promising results have been obtained already today. Nevertheless, research in reversible logic is still at the beginning. No continuous design flow exists so far. Instead, approaches only for individual steps (e.g. for synthesis) have been proposed. But, most of these methods are applicable to very small functions or circuits, respectively. This is not sufficient to design complex reversible systems. In this book, first steps towards a design flow for reversible logic have been proposed. That is, methods ranging from synthesis, embedding, optimization, verification, and debugging have been introduced and experimentally evaluated. By BDDbased synthesis, it is possible to synthesize functions with more than 100 variables. More complex reversible systems can be realized using the SyReC language. Therefore, also techniques for exact synthesis as well as for embedding have been utilized to determine the respective building blocks. Three optimization approaches have been proposed that lead to more compact circuits with respect to different criteria (i.e. quantum cost, transistor cost, number of lines, or nearest neighbor cost) and therewith with respect to different technologies. To prove that e.g. the optimization was correct, equivalence checking with the help of decision diagrams or Boolean satisfiability, respectively, has been introduced. In case of a failed verification, approaches have been proposed that help the designer to find the error or to fix the circuit. Altogether, using the approaches introduced in this book, complex functions can be synthesized and circuits with thousands of gates can be optimized, verified, and debugged—everything in a very efficient automated way. Combining the respective approaches, a first design flow results that already can handle functions and circuits of notable size. The uniform RevLib-format for reversible functions and circuits (see www.revlib.org) thereby builds the basis to link the respective steps together. The resulting tools can be obtained under www.revkit.org. So, designer of reversible circuits have a first continuous and consistent flow to create their circuits. R. Wille, R. Drechsler, Towards a Design Flow for Reversible Logic, DOI 10.1007/978-90-481-9579-4_8, © Springer Science+Business Media B.V. 2010

175

176

8 Summary and Conclusions

Besides that, the methods proposed in this book build the basis for further extensions towards a design flow that covers more elaborated designs needs. In particular, extensions “on the top” and “on the bottom” of the flow are promising. More precisely, synthesis of reversible logic should reach the system level. Therefore, it is vital to have appropriate hardware description languages as well as corresponding synthesis approaches. Only then, design of complex reversible circuits is possible. The SyReC language proposed in Sect. 3.3 is a first promising step in this direction. Followed by this, also new verification issues will emerge. In particular for complex circuits specified using e.g. hardware description languages, it often cannot be ensured that the design was implemented as intended. Thus, developing methods for property checking is a promising next step. Furthermore, questions related to test of reversible circuits will emerge in future. Already today, first models and approaches in this area exist (see e.g. [PHM04, PBL05, PFBH05]). But due to the absence of large physical realizations, it is hard to evaluate the suitability of them. Additionally, existing approaches cover only some possible technologies. With ongoing progress in the development of further (and larger) physical quantum computing or reversible CMOS realizations, new models and approaches are needed to efficiently test them. Then, at the latest, also the design flow for reversible logic needs a comprehensive consideration of testing issues. Besides this “global view” on upcoming challenges in this domain, further ideas for future work already have been discussed in the respective chapters. Overall, the development of an elaborated design flow that is comparable to the one for traditional circuit design (that has been developed in the last 25 years) will take further years of research. In this context, the contributions in this book provide a good starting point.

References

[Abr05] [ASV+05] [BB09]

[BBC+95]

[BBC+05]

[BCCZ99]

[BDL98] [Ben73] [Ben05] [Ber06] [Bie05] [Bra93] [BRB90] [Bry86] [BW96] [CA87] [CBRZ01]

S. Abramsky, A structural approach to reversible computation. Theor. Comput. Sci. 347(3), 441–464 (2005) M.F. Ali, S. Safarpour, A. Veneris, M.S. Abadir, R. Drechsler, Post-verification debugging of hierarchical designs, in Int’l Conf. on CAD (2005), pp. 871–876 R. Brummayer, A. Biere, Boolector: An efficient SMT solver for bit-vectors and arrays, in Tools and Algorithms for the Construction and Analysis of Systems (2009), pp. 174–177 A. Barenco, C.H. Bennett, R. Cleve, D.P. DiVinchenzo, N. Margolus, P. Shor, T. Sleator, J.A. Smolin, H. Weinfurter, Elementary gates for quantum computation. Am. Phys. Soc. 52, 3457–3467 (1995) M. Bozzano, R. Bruttomesso, A. Cimatti, T. Junttila, P. Rossum, S. Schulz, R. Sebastiani, The MathSAT 3 system, in Int. Conf. on Automated Deduction (2005), pp. 315–321 A. Biere, A. Cimatti, E. Clarke, Y. Zhu, Symbolic model checking without BDDs, in Tools and Algorithms for the Construction and Analysis of Systems. LNCS, vol. 1579 (Springer, Berlin, 1999), pp. 193–207 C.W. Barrett, D.L. Dill, J.R. Levitt, A decision procedure for bit-vector arithmetic, in Design Automation Conf. (1998), pp. 522–527 C.H. Bennett, Logical reversibility of computation. IBM J. Res. Dev. 17(6), 525– 532 (1973) M. Benedetti, sKizzo: a suite to evaluate and certify QBFs, in Int’l Conf. on Automated Deduction (2005), pp. 369–376 J. Bergeron, Writing Testbenches Using Systemverilog (Springer, Berlin, 2006) A. Biere, Resolve and expand, in Theory and Applications of Satisfiability Testing. LNCS, vol. 3542 (Springer, Berlin, 2005), pp. 59–70 D. Brand, Verification of large synthesized designs, in Int’l Conf. on CAD (1993), pp. 534–537 K.S. Brace, R.L. Rudell, R.E. Bryant, Efficient implementation of a BDD package, in Design Automation Conf. (1990), pp. 40–45 R.E. Bryant, Graph-based algorithms for Boolean function manipulation. IEEE Trans. Comput. 35(8), 677–691 (1986) B. Bollig, I. Wegener, Improving the variable ordering of OBDDs is NP-complete. IEEE Trans. Comput. 45(9), 993–1002 (1996) R. Cuykendall, D.R. Andersen, Reversible optical computing circuits. Opt. Lett. 12(7), 542–544 (1987) E.M. Clarke, A. Biere, R. Raimi, Y. Zhu, Bounded model checking using satisfiability solving. Form. Methods Syst. Des. 19(1), 7–34 (2001)

R. Wille, R. Drechsler, Towards a Design Flow for Reversible Logic, DOI 10.1007/978-90-481-9579-4, © Springer Science+Business Media B.V. 2010

177

178 [CDKM05] [CGP99] [Coo71] [CS07] [DB98] [DBW+07]

[DDG00] [DEF+08]

[DLL62] [DM06a] [DM06b] [DP60] [Dre04] [DS07]

[DV02] [EFD05] [ES04] [FDH04] [FSVD06]

[FT82] [FWD10] [GAJ06] [GCDD07] [GD07] [GLMS02] [GN02]

References S.A. Cuccaro, T.G. Draper, S.A. Kutin, D.P. Moulton, A new quantum ripple-carry addition circuit, in Workshop on Quantum Information Processing (2005) E.M. Clarke, O. Grumberg, D. Peled, Model Checking (MIT Press, Cambridge, 1999) S.A. Cook, The complexity of theorem proving procedures, in Symposium on Theory of Computing (1971), pp. 151–158 A. Chakrabarti, S. SurKolay, Nearest neighbour based synthesis of quantum boolean circuits. Eng. Lett. 15, 356–361 (2007) R. Drechsler, B. Becker, Binary Decision Diagrams—Theory and Implementation (Kluwer Academic, Dordrecht, 1998) S. Deng, J. Bian, W. Wu, X. Yang, Y. Zhao, EHSAT: An efficient RTL satisfiability solver using an extended DPLL procedure, in Design Automation Conf. (2007), pp. 588–593 R. Drechsler, N. Drechsler, W. Günther, Fast exact minimization of BDDs. IEEE Trans. CAD 19(3), 384–389 (2000) R. Drechsler, S. Eggersglüß, G. Fey, A. Glowatz, F. Hapke, J. Schloeffel, D. Tille, On acceleration of SAT-based ATPG for industrial designs. IEEE Trans. CAD 27, 1329–1333 (2008) M. Davis, G. Logeman, D. Loveland, A machine program for theorem proving. Commun. ACM 5, 394–397 (1962) B. Dutertre, L. Moura, A fast linear-arithmetic solver for DPLL(T), in Computer Aided Verification. LNCS, vol. 4114 (Springer, Berlin, 2006), pp. 81–94 B. Dutertre, L. Moura, The YICES SMT solver (2006). Available at http:// yices.csl.sri.com/ M. Davis, H. Putnam, A computing procedure for quantification theory. J. ACM 7, 506–521 (1960) R. Drechsler, Advanced Formal Verification (Kluwer Academic, Dordrecht, 2004) S. Disch, C. Scholl, Combinational equivalence checking using incremental SAT solving, output ordering, and resets, in ASP Design Automation Conf. (2007), pp. 938–943 B. Desoete, A. De Vos, A reversible carry-look-ahead adder using control gates. Integr. VLSI J. 33(1–2), 89–104 (2002) R. Ebendt, G. Fey, R. Drechsler, Advanced BDD Optimization (Springer, Berlin, 2005) N. Eén, N. Sörensson, An extensible SAT solver, in SAT 2003. LNCS, vol. 2919 (Springer, Berlin, 2004), pp. 502–518 A.G. Fowler, S.J. Devitt, L.C.L. Hollenberg, Implementation of Shor’s algorithm on a linear nearest neighbour qubit array. Quantum Inf. Comput. 4, 237–245 (2004) G. Fey, S. Safarpour, A. Veneris, R. Drechsler, On the relation between simulationbased and SAT-based diagnosis, in Design, Automation and Test in Europe (2006), pp. 1139–1144 E.F. Fredkin, T. Toffoli, Conservative logic. Int. J. Theor. Phys. 21(3/4), 219–253 (1982) S. Frehse, R. Wille, R. Drechsler, Efficient simulation-based debugging of reversible logic, in Int’l Symp. on Multi-Valued Logic (2010), pp. 156–161 P. Gupta, A. Agrawal, N.K. Jha, An algorithm for synthesis of reversible logic circuits. IEEE Trans. CAD 25(11), 2317–2330 (2006) D. Große, X. Chen, G.W. Dueck, R. Drechsler, Exact SAT-based Toffoli network synthesis, in ACM Great Lakes Symposium on VLSI (2007), pp. 96–101 V. Ganesh, D.L. Dill, A decision procedure for bit-vectors and arrays, in Computer Aided Verification (2007), pp. 519–531 T. Grötker, S. Liao, G. Martin, S. Swan, System Design with SystemC (Kluwer Academic, Dordrecht, 2002) E. Goldberg, Y. Novikov, BerkMin: a fast and robust SAT-solver, in Design, Automation and Test in Europe (2002), pp. 142–149

References [GNP08]

179

S. Gay, R. Nagarajan, N. Papanikolaou, QMC: A model checker for quantum systems, in Computer Aided Verification (2008), pp. 543–547 [GWDD08] D. Große, R. Wille, G.W. Dueck, R. Drechsler, Exact synthesis of elementary quantum gate circuits for reversible functions with don’t cares, in Int’l Symp. on MultiValued Logic (2008), pp. 214–219 [GWDD09a] D. Große, R. Wille, G.W. Dueck, R. Drechsler, Exact multiple control Toffoli network synthesis with SAT techniques. IEEE Trans. CAD 28(5), 703–715 (2009) [GWDD09b] D. Große, R. Wille, G.W. Dueck, R. Drechsler, Exact synthesis of elementary quantum gate circuits. J. Mult.-Valued Log. Soft Comput. 15(4), 270–275 (2009) [HK00] D.W. Hoffmann, T. Kropf, Efficient design error correction of digital circuits, in Int’l Conf. on Comp. Design (2000), pp. 465–472 [HL05] F.S. Hillier, G.J. Lieberman, Introduction to Operations Research (McGraw-Hill, New York, 2005) [HSY+06] W.N.N. Hung, X. Song, G. Yang, J. Yang, M. Perkowski, Optimal synthesis of multiple output Boolean functions using a set of quantum gates by symbolic reachability analysis. IEEE Trans. CAD 25(9), 1652–1663 (2006) [IKY02] K. Iwama, Y. Kambayashi, S. Yamashita, Transformation rules for designing CNOT-based quantum circuits, in Design Automation Conf. (2002), pp. 419–424 [JFWD10] J.C. Jung, S. Frehse, R. Wille, R. Drechsler, Enhancing debugging of multiple missing control errors in reversible logic, in Great Lakes Symp. VLSI (2010) [JSWD09] J.C. Jung, A. Sülflow, R. Wille, R. Drechsler, SWORD v1.0, Satisfiability Modulo Theories Competition (2009) [Ker04] P. Kerntopf, A new heuristic algorithm for reversible logic synthesis, in Design Automation Conf. (2004), pp. 834–837 [Kha08] M.H.A. Khan, Cost reduction in nearest neighbour based synthesis of quantum boolean circuits. Eng. Lett. 16, 1–5 (2008) [Kro99] T. Kropf, Introduction to Formal Hardware Verification (Springer, Berlin, 1999) [Kut06] S.A. Kutin, Shor’s algorithm on a nearest-neighbor machine, in Asian Conference on Quantum Information Science (2006). arXiv:quant-ph/0609001v1 [Lan61] R. Landauer, Irreversibility and heat generation in the computing process. IBM J. Res. Dev. 5, 183 (1961) [Lar92] T. Larrabee, Test pattern generation using Boolean satisfiability. IEEE Trans. CAD 11, 4–15 (1992) [LCC+95] C.-C. Lin, K.-C. Chen, S.-C. Chang, M. Marek-Sadowska, K.-T. Cheng, Logic synthesis for engineering change, in Design Automation Conf. (1995), pp. 647–651 [LL92] H. Liaw, C. Lin, On the OBDD-representation of general Boolean functions. IEEE Trans. Comput. 41, 661–664 (1992) [LSU89] R. Lipsett, C. Schaefer, C. Ussery, VHDL: Hardware Description and Design (Kluwer Academic, Dordrecht, 1989) [Mar99] J.P. Marques-Silva, The impact of branching heuristics in propositional satisfiability algorithms, in 9th Portuguese Conference on Artificial Intelligence (EPIA) (1999) [Mas07] D. Maslov, Linear depth stabilizer and quantum fourier transformation circuits with no auxiliary qubits in finite neighbor quantum architectures. Phys. Rev. 76, 052310 (2007) [McM02] K.L. McMillan, Applying SAT methods in unbounded symbolic model checking, in Computer Aided Verification (2002), pp. 250–264 [MD04a] D. Maslov, G.W. Dueck, Improved quantum cost for n-bit Toffoli gates. IEE Electron. Lett. 39, 1790 (2004) [MD04b] D. Maslov, G.W. Dueck, Reversible cascades with minimal garbage. IEEE Trans. CAD 23(11), 1497–1509 (2004) [MDM05] D. Maslov, G.W. Dueck, D.M. Miller, Toffoli network synthesis with templates. IEEE Trans. CAD 24(6), 807–817 (2005) [MDM07] D. Maslov, G.W. Dueck, D.M. Miller, Techniques for the synthesis of reversible Toffoli networks. ACM Trans. Des. Autom. Electron. Syst. 12(4), 42 (2007)

180 [MDW09]

[Mer93] [Mer07] [MK04] [ML01]

[MMD03] [MMZ+01] [Moo65] [MS99] [MT06] [MT08] [MWD09] [MWD10] [MYDM05] [NC00] [OWDD10]

[Pap93] [PBG05] [PBL05] [Per85] [PFBH05] [PHM04] [PHW06] [Pit99] [PMH08] [RO08] [Rud93]

References D.M. Miller, G.W. Dueck, R. Wille, Synthesising reversible circuits from irreversible specifications using Reed-Muller spectral techniques, in Int’l Workshop on Applications of the Reed-Muller Expansion in Circuit Design (2009), pp. 87–96 R.C. Merkle, Reversible electronic logic using switches. Nanotechnology 4, 21–40 (1993) N.D. Mermin, Quantum Computer Science: An Introduction (Cambridge University Press, Cambridge, 2007) M.M. Mano, C.R. Kime, Logic and Computer Design Fundamentals (Pearson Education, Upper Saddle River, 2004) J.P. McGregor, R.B. Lee, Architectural enhancements for fast subword permutations with repetitions in cryptographic applications, in Int’l Conf. on Comp. Design (2001), pp. 453–461 D.M. Miller, D. Maslov, G.W. Dueck, A transformation based algorithm for reversible logic synthesis, in Design Automation Conf. (2003), pp. 318–323 M.W. Moskewicz, C.F. Madigan, Y. Zhao, L. Zhang, S. Malik, Chaff: Engineering an efficient SAT solver, in Design Automation Conf. (2001), pp. 530–535 G.E. Moore, Cramming more components onto integrated circuits. Electronics 38(8), 114–117 (1965) J.P. Marques-Silva, K.A. Sakallah, GRASP: A search algorithm for propositional satisfiability. IEEE Trans. Comput. 48(5), 506–521 (1999) D.M. Miller, M.A. Thornton, QMDD: A decision diagram structure for reversible and quantum circuits, in Int’l Symp. on Multi-Valued Logic (2006), p. 6 D.M. Miller, M.A. Thornton, Multiple-Valued Logic: Concepts and Representations (Morgan and Claypool, San Rafael, 2008) D.M. Miller, R. Wille, G. Dueck, Synthesizing reversible circuits for irreversible functions, in EUROMICRO Symp. on Digital System Design (2009), pp. 749–756 D.M. Miller, R. Wille, R. Drechsler, Reducing reversible circuit cost by adding lines, in Int’l Symp. on Multi-Valued Logic (2010) D. Maslov, C. Young, G.W. Dueck, D.M. Miller, Quantum circuit simplification using templates, in Design, Automation and Test in Europe (2005), pp. 1208–1213 M. Nielsen, I. Chuang, Quantum Computation and Quantum Information (Cambridge Univ. Press, Cambridge, 2000) S. Offermann, R. Wille, G.W. Dueck, R. Drechsler, Synthesizing multiplier in reversible logic, in IEEE Symp. on Design and Diagnostics of Electronic Circuits and Systems (2010), pp. 335–340 C.H. Papadimitriou, Computational Complexity (Addison Wesley, Reading, 1993) M.R. Prasad, A. Biere, A. Gupta, A survey of recent advances in SAT-based formal verification. Softw. Tools Technol. Transf. 7(2), 156–173 (2005) M. Perkowski, J. Biamonte, M. Lukac, Test generation and fault localization for quantum circuits, in Int’l Symp. on Multi-Valued Logic (2005), pp. 62–68 A. Peres, Reversible logic and quantum computers. Phys. Rev. A 32, 3266–3276 (1985) I. Polian, T. Fiehn, B. Becker, J.P. Hayes, A family of logical fault models for reversible circuits, in Asian Test Symp. (2005), pp. 422–427 K.N. Patel, J.P. Hayes, I.L. Markov, Fault testing for reversible circuits. IEEE Trans. CAD 23(8), 1220–1230 (2004) A. De Pierro, C. Hankin, H. Wiklicky, Reversible combinatory logic. Math. Struct. Comput. Sci. 16(4), 621–637 (2006) A.O. Pittenger, An Introduction to Quantum Computing Algorithms (Birkhauser, Basel, 1999) K. Patel, I. Markov, J. Hayes, Optimal synthesis of linear reversible circuits. Quantum Inf. Comput. 8(3–4), 282–294 (2008) M. Ross, M. Oskin, Quantum computing. Commun. ACM 51(7), 12–13 (2008) R. Rudell, Dynamic variable ordering for ordered binary decision diagrams, in Int’l Conf. on CAD (1993), pp. 42–47

References [SD96] [SDF04] [SFBD08] [Sha38] [Sho94] [SL00]

[Som01] [SPMH03] [SSL+92]

[SVAV05] [TG08] [TK05] [Tof80]

[TS05] [Tse68]

[VH99] [VMH07] [VSB+01]

[WD09] [WD10]

[Weg00] [WFG+07] [WFG+09]

181 J.A. Smolin, D.P. DiVincenzo, Five two-bit quantum gates are sufficient to implement the quantum fredkin gate. Phys. Rev. A 53(4), 2855–2856 (1996) S. Sutherland, S. Davidmann, P. Flake, System Verilog for Design and Modeling (Kluwer Academic, Dordrecht, 2004) A. Sülflow, G. Fey, R. Bloem, R. Drechsler, Using unsatisfiable cores to debug multiple design errors, in Great Lakes Symp. VLSI (2008), pp. 77–82 C.E. Shannon, A symbolic analysis of relay and switching circuits. Trans. AIEE 57, 713–723 (1938) P.W. Shor, Algorithms for quantum computation: discrete logarithms and factoring, in Foundations of Computer Science (1994), pp. 124–134 Z. Shi, R.B. Lee, Bit permutation instructions for accelerating software cryptography, in Int’l Conf. on Application-Specific Systems, Architectures, and Processors (2000), pp. 138–148 F. Somenzi, CUDD: Cu Decision Diagram Package Release 2.3.1 (University of Colorado at Boulder, Boulder, 2001) V.V. Shende, A.K. Prasad, I.L. Markov, J.P. Hayes, Synthesis of reversible logic circuits. IEEE Trans. CAD 22(6), 710–722 (2003) E. Sentovich, K. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. Stephan, R. Brayton, A. Sangiovanni-Vincentelli, SIS: A system for sequential circuit synthesis. Technical Report, University of Berkeley (1992) A. Smith, A.G. Veneris, M.F. Ali, A. Viglas, Fault diagnosis and logic debugging using boolean satisfiability. IEEE Trans. CAD 24(10), 1606–1621 (2005) M.K. Thomson, R. Glück, Optimized reversible binary-coded decimal adders. J. Syst. Archit. 54, 697–706 (2008) Y. Takahashi, N. Kunihiro, A linear-size quantum circuit for addition with no ancillary qubits. Quantum Inf. Comput. 5, 440–448 (2005) T. Toffoli, Reversible computing, in Automata, Languages and Programming, ed. by W. de Bakker, J. van Leeuwen (Springer, Berlin, 1980), p. 632. Technical Memo MIT/LCS/TM-151, MIT Lab. for Comput. Sci. H. Thapliyal, M.B. Srinivas, The need of DNA computing: reversible designs of adders and multipliers using fredkin gate, in SPIE (2005) G. Tseitin, On the complexity of derivation in propositional calculus, in Studies in Constructive Mathematics and Mathematical Logic, Part 2 (Nauka, Leningrad, 1968), pp. 115–125. (Reprinted in: J. Siekmann, G. Wrightson (eds.), Automation of Reasoning, vol. 2, Springer, Berlin, 1983, pp. 466–483) A. Veneris, I.N. Hajj, Design error diagnosis and correction via test vector simulation. IEEE Trans. CAD 18(12), 1803–1816 (1999) G.F. Viamontes, I.L. Markov, J.P. Hayes, Checking equivalence of quantum circuits and states, in Int’l Conf. on CAD (2007), pp. 69–74 L.M.K. Vandersypen, M. Steffen, G. Breyta, C.S. Yannoni, M.H. Sherwood, I.L. Chuang, Experimental realization of Shor’s quantum factoring algorithm using nuclear magnetic resonance. Nature 414, 883 (2001) R. Wille, R. Drechsler, BDD-based synthesis of reversible logic for large functions, in Design Automation Conf. (2009), pp. 270–275 R. Wille, R. Drechsler, Effect of BDD optimization on synthesis of reversible and quantum logic. Electron. Notes Theor. Comput. Sci. 253(6), 57–70 (2010). Proceedings of the Workshop on Reversible Computation (RC 2009) I. Wegener, Branching Programs and Binary Decision Diagrams: Theory and Applications (Society for Industrial and Applied Mathematics, Philadelphia, 2000) R. Wille, G. Fey, D. Große, S. Eggersglüß, R. Drechsler, SWORD: A SAT like prover using word level information, in VLSI of System-on-Chip (2007), pp. 88–93 R. Wille, G. Fey, D. Große, S. Eggersglüß, R. Drechsler, SWORD: A SAT like prover using word level information, in VLSI-SoC: Advanced Topics on Systems on a Chip: A Selection of Extended Versions of the Best Papers of the Fourteenth

182

[WG07] [WGDD09] [WGF+09] [WGHD09]

[WGMD09] [WGSD08]

[WGT+08]

[WKS01] [WLDG08] [WLTK08] [WOD10] [WSD08] [WSD09] [WSD10] [Yan91] [YG07]

[YPA06] [YSHP05]

[YSP+99] [ZM06] [ZSM+05]

References International Conference on Very Large Scale Integration of System on Chip, ed. by R. Reis, V. Mooney, P. Hasler (Springer, Berlin, 2009), pp. 175–192 R. Wille, D. Große, Fast exact Toffoli network synthesis of reversible logic, in Int’l Conf. on CAD (2007), pp. 60–64 R. Wille, D. Große, G. Dueck, R. Drechsler, Reversible logic synthesis with output permutation, in VLSI Design (2009), pp. 189–194 R. Wille, D. Große, S. Frehse, G.W. Dueck, R. Drechsler, Debugging of Toffoli networks, in Design, Automation and Test in Europe (2009), pp. 1284–1289 R. Wille, D. Große, F. Haedicke, R. Drechsler, SMT-based stimuli generation in the SystemC verification library, in Forum on Specification and Design Languages (2009) R. Wille, D. Große, D.M. Miller, R. Drechsler, Equivalence checking of reversible circuits, in Int’l Symp. on Multi-Valued Logic (2009), pp. 324–330 R. Wille, D. Große, M. Soeken, R. Drechsler, Using higher levels of abstraction for solving optimization problems by Boolean satisfiability, in IEEE Annual Symposium on VLSI (2008), pp. 411–416 R. Wille, D. Große, L. Teuber, G.W. Dueck, R. Drechsler, RevLib: an online resource for reversible functions and reversible circuits, in Int’l Symp. on MultiValued Logic (2008), pp. 220–225. RevLib is available at http://www.revlib.org J. Whittemore, J. Kim, K. Sakallah, SATIRE: A new incremental satisfiability engine, in Design Automation Conf. (2001), pp. 542–545 R. Wille, H.M. Le, G.W. Dueck, D. Große, Quantified synthesis of reversible logic, in Design, Automation and Test in Europe (2008), pp. 1015–1020 S.A. Wang, C.Y. Lu, I.M. Tsai, S.Y. Kuo, An XQDD-based verification method for quantum circuits. IEICE Trans. 91-A(2), 584–594 (2008) R. Wille, S. Offermann, R. Drechsler, SyReC: A programming language for synthesis of reversible circuits, in Forum on Specification and Design Languages (2010) R. Wille, A. Sülflow, R. Drechsler, SWORD v0.2—Module-based SAT Solving, Satisfiability Modulo Theories Competition (2008) R. Wille, M. Saeedi, R. Drechsler, Synthesis of reversible functions beyond gate count and quantum cost, in Int’l Workshop on Logic Synth. (2009), pp. 43–49 R. Wille, M. Soeken, R. Drechsler, Reducing the number of lines in reversible circuits, in Design Automation Conf. (2010) S. Yang, Logic synthesis and optimization benchmarks user guide. Technical Report 1/95, Microelectronic Center of North Carolina (1991) T. Yokoyama, R. Glück, A reversible programming language and its invertible selfinterpreter, in Symp. on Partial Evaluation and Semantics-based Program Manipulation (2007), pp. 144–153 J. Yuan, C. Pixley, A. Aziz, Constraint-based Verification (Springer, Berlin, 2006) G. Yang, X. Song, W.N.N. Hung, M.A. Perkowski, Fast synthesis of exact minimal reversible circuits using group theory, in ASP Design Automation Conf. (2005), pp. 1002–1005 J. Yuan, K. Shultz, C. Pixley, H. Miller, A. Aziz, Modeling design constraints and biasing in simulation using BDDs, in Int’l Conf. on CAD (1999), pp. 584–590 J. Zhong, J.C. Muzio, Using crosspoint faults in simplifying Toffoli networks, in IEEE North-East Workshop on Circuits and Systems (2006), pp. 129–132 J. Zhang, S. Sinha, A. Mishchenko, R. Brayton, M. Chrzanowska-Jeske, Simulation and satisfiability in logic synthesis, in Int’l Workshop on Logic Synth. (2005), pp. 161–168

Index

A ALU, 48 Arithmetic Logic Unit, see ALU B BDD, 17 Binary Decision Diagram, see BDD Bit-vector logic, 23 Boolean function, 7 multi-output, 8 reversible, 8 single-output, 7 Boolean satisfiability, 21 C Circuit quantum, 14 reversible, 10 traditional, 9 Circuit composition, 124 Circuit cost, 12 CNF, 21 CNOT gate, 10 Complement edge, 18, 36 Conjunctive Normal Form, see CNF Constant input, 9 Control line, 10 Cost, 12 Counterexample, 145 D Debugging, 155 reversible, 157 traditional, 156

Decomposition NNC-optimal (exact), 134 NNC-optimal (improved), 134 NNC-optimal (naive), 133 quantum, 16 Shannon, 17 Double gates, 16 E Embedding, 28, 93 Equivalence checking, 145 QMDD-based, 145 SAT-based, 148 Error candidate, 155 Error location, 162 Error models, 155 Exact synthesis, 57 QBF-based, 81 SAT-based, 58, 61 SMT-based, 77 SWORD-based, 79 F Factoring reversible circuits, 115 Fixing, 165 Fredkin gate, 10 G Garbage output, 9, 29 Gate CNOT, 10, 14 double, 16 Fredkin, 10 NOT, 10, 14 Peres, 10

R. Wille, R. Drechsler, Towards a Design Flow for Reversible Logic, DOI 10.1007/978-90-481-9579-4, © Springer Science+Business Media B.V. 2010

183

184

Gate (cont.) quantum, 14 reversible, 10 SWAP, 11 Toffoli, 10 traditional, 9 V, 14 V+, 15 Gate count, 12 H Helper line, 114 Heuristic synthesis, see Synthesis I Inverter, see NOT gate J Janus, 47 L Linear Nearest Neighbor, see LNN LNN, 131 M Modules (SWORD), 24 N Nearest Neighbor Cost, see NNC NNC, 131, 133 NOT gate, 10, 14 O Ordering BDD, 18, 37 P Peres gate, 10 Q QBF, 24 QF_BV, see Bit-vector logic

Index

QMDD, 19 Quantified Boolean Formulas, see QBF Quantum computation, 14 Quantum cost, 12 Quantum gate, 14 Quantum Multiple-valued Decision Diagram, see QMDD Qubit, 14 S SAT Modulo Theories, see SMT SAT problem, 21 SAT solver, 22 Select line in debugging, 157 Shared node, 18, 34 SMT, 23 SMT solver, 23 Superposition, 13 SWAP gate, 11 SWOP, 100 SWORD solver, 24 Synthesis, 27 BDD-based, 31 QBF-based, 81 SAT-based, 58, 61 SMT-based, 77 SWORD-based, 79 SyReC-based, 46 transformation-based, 30 with output permutation, see SWOP T Target line, 10 Toffoli gate, 10 Transistor cost, 13 V V gate, 14 V+ gate, 15 Verification, 143