Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
5204
Cristian S. Calude, José Félix Costa, Rudolf Freund, Marion Oswald, Grzegorz Rozenberg (Eds.)
Unconventional Computation
7th International Conference, UC 2008
Vienna, Austria, August 25-28, 2008
Proceedings
Volume Editors

Cristian S. Calude
University of Auckland, Department of Computer Science
92019 Auckland, New Zealand
E-mail: [email protected]

José Félix Costa
Universidade Técnica de Lisboa, Department of Mathematics
1049-001 Lisboa, Portugal
E-mail: [email protected]

Rudolf Freund, Marion Oswald
Vienna University of Technology, Faculty of Informatics
1040 Vienna, Austria
E-mail: {rudi, marion}@emcc.at

Grzegorz Rozenberg
Leiden University, Leiden Institute of Advanced Computer Science
2333 CA Leiden, The Netherlands
and University of Colorado, Department of Computer Science
Boulder, CO 80309-0430, USA
E-mail: [email protected]
Library of Congress Control Number: 2008932587
CR Subject Classification (1998): F.1, F.2
LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues
ISSN 0302-9743
ISBN-10 3-540-85193-3 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-85193-6 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. Springer is a part of Springer Science+Business Media springer.com © Springer-Verlag Berlin Heidelberg 2008 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12458677 06/3180 543210
Preface
The 7th International Conference on Unconventional Computation, UC 2008, organized under the auspices of the EATCS by the Vienna University of Technology (Vienna, Austria) and the Centre for Discrete Mathematics and Theoretical Computer Science (Auckland, New Zealand), was held in Vienna during August 25–28, 2008. The venue for the conference was the Parkhotel Schönbrunn in the immediate vicinity of Schönbrunn Palace, which, together with its ancillary buildings and extensive park, is by virtue of its long and colorful history one of the most important cultural monuments in Austria. Vienna, located in the heart of central Europe, is an old city whose historical role as the capital of a great empire and the residence of the Habsburgs is reflected in its architectural monuments, its famous art collections and its rich cultural life, in which music has always played an important part.

The International Conference on Unconventional Computation (UC) series, https://www.cs.auckland.ac.nz/CDMTCS/conferences/uc/, is devoted to all aspects of unconventional computation – theory as well as experiments and applications. Typical, but not exclusive, topics are: natural computing including quantum, cellular, molecular, neural and evolutionary computing; chaos and dynamical-system-based computing; and various proposals for computations that go beyond the Turing model. The first venue of the Unconventional Computation Conference (formerly called Unconventional Models of Computation) was Auckland, New Zealand in 1998; subsequent sites of the conference were Brussels, Belgium in 2000, Kobe, Japan in 2002, Seville, Spain in 2005, York, UK in 2006, and Kingston, Canada in 2007.

The titles of volumes of previous UC conferences are as follows:

1. Calude, C.S., Casti, J., Dinneen, M.J. (eds.): Unconventional Models of Computation. Springer, Singapore (1998)
2. Antoniou, I., Calude, C.S., Dinneen, M.J. (eds.): Unconventional Models of Computation, UMC 2K: Proceedings of the Second International Conference. Springer, London (2001)
3. Calude, C.S., Dinneen, M.J., Peper, F. (eds.): UMC 2002. LNCS, vol. 2509. Springer, Heidelberg (2002)
4. Calude, C.S., Dinneen, M.J., Păun, G., Pérez-Jiménez, M. Jesús, Rozenberg, G. (eds.): UC 2005. LNCS, vol. 3699. Springer, Heidelberg (2005)
5. Calude, C.S., Dinneen, M.J., Păun, G., Rozenberg, G., Stepney, S. (eds.): UC 2006. LNCS, vol. 4135. Springer, Heidelberg (2006)
6. Akl, S.G., Calude, C.S., Dinneen, M.J., Rozenberg, G., Wareham, H.T. (eds.): UC 2007. LNCS, vol. 4618. Springer, Heidelberg (2007)
The Steering Committee of the International Conference on Unconventional Computation series includes Thomas Bäck (Leiden, The Netherlands), Cristian S. Calude (Auckland, New Zealand, Co-chair), Lov K. Grover (Murray Hill, NJ, USA), Jan van Leeuwen (Utrecht, The Netherlands), Seth Lloyd (Cambridge, MA, USA), Gheorghe Păun (Bucharest, Romania), Tommaso Toffoli (Boston, MA, USA), Carme Torras (Barcelona, Spain), Grzegorz Rozenberg (Leiden, The Netherlands, and Boulder, Colorado, USA, Co-chair), and Arto Salomaa (Turku, Finland).

The four keynote speakers of the conference for 2008 were:

– Časlav Brukner (Austrian Academy of Sciences, Austria): "Quantum Experiments Can Test Mathematical Undecidability"
– Anne Condon (University of British Columbia, Canada): "Computational Challenges and Opportunities in the Design of Unconventional Machines from Nucleic Acids"
– David Corne (Heriot-Watt University, UK): "Predictions for the Future of Optimization Research"
– Jon Timmis (University of York, UK): "Immune Systems and Computation: An Interdisciplinary Adventure"

In addition, UC 2008 hosted three workshops – one on "Computing with Biomolecules," organized by Erzsébet Csuhaj-Varjú (Hungarian Academy of Sciences, Hungary) and Rudolf Freund (Vienna University of Technology, Austria); one on "Optical Supercomputing," organized by Shlomi Dolev (Ben-Gurion University, Israel), Mihai Oltean (Babes-Bolyai University, Romania) and Wolfgang Osten (Stuttgart University, Germany); and one on "Physics and Computation," organized by Cristian S. Calude (University of Auckland, New Zealand) and José Félix Costa (Technical University of Lisbon, Portugal).

The Programme Committee is grateful for the highly appreciated work done by the referees for the conference. These experts were: Selim G. Akl, Cristian S. Calude, Alberto Castellini, Barry S. Cooper, David Corne, José Félix Costa, Erzsébet Csuhaj-Varjú, Michael J. Dinneen, Gerard Dreyfus, Rudolf Freund, Daniel Graça, Mika Hirvensalo, Natasha Jonoska, Jarkko Kari, Yun-Bum Kim, Manuel Lameiras Campagnolo, Vincenzo Manca, Marius Nagy, Turlough Neary, Marion Oswald, Roberto Pagliarini, Gheorghe Păun, Ferdinand Peper, Petrus H. Potgieter, Kai Salomaa, Karl Svozil, Carme Torras, Hiroshi Umeo and Damien Woods.

The Programme Committee, consisting of Selim G. Akl (Kingston, ON, Canada), Cristian S. Calude (Auckland, New Zealand), Barry S. Cooper (Leeds, UK), David Corne (Edinburgh, UK), José Félix Costa (Lisbon, Portugal, Co-chair), Erzsébet Csuhaj-Varjú (Budapest, Hungary), Michael J. Dinneen (Auckland, New Zealand), Gerard Dreyfus (Paris, France), Rudolf Freund (Vienna, Austria, Co-chair), Eric Goles (Santiago, Chile), Natasha Jonoska (Tampa, FL, USA), Jarkko Kari (Turku, Finland), Vincenzo Manca (Verona, Italy), Gheorghe Păun (Bucharest, Romania), Ferdinand Peper (Kobe, Japan), Petrus H. Potgieter (Pretoria, South Africa), Kai Salomaa (Kingston, Canada), Karl Svozil (Vienna, Austria), Carme Torras (Barcelona, Spain), Hiroshi Umeo (Osaka, Japan), Harold T.
Wareham (St. John's, NL, Canada), Damien Woods (Cork, Ireland) and Xin Yao (Birmingham, UK), selected 16 papers (out of 22) to be presented as regular contributions.

We extend our thanks to all members of the local Conference Committee, particularly to Aneta Binder, Rudolf Freund (Chair), Franziska Gusel, and Marion Oswald of the Vienna University of Technology, for their invaluable organizational work. The conference was partially supported by the Institute of Computer Languages of the Vienna University of Technology, the Kurt Gödel Society, and the OCG (Austrian Computer Society); we extend our gratitude to all of them. It is a great pleasure to acknowledge the fine co-operation with the Lecture Notes in Computer Science team of Springer in producing this volume in time for the conference.
June 2008
Cristian S. Calude
José Félix Costa
Rudolf Freund
Marion Oswald
Grzegorz Rozenberg
Table of Contents
Invited Papers

Quantum Experiments Can Test Mathematical Undecidability . . . . . 1
   Časlav Brukner
Computational Challenges and Opportunities in the Design of Unconventional Machines from Nucleic Acids . . . . . 6
   Anne Condon
Predictions for the Future of Optimisation Research . . . . . 7
   David Corne
Immune Systems and Computation: An Interdisciplinary Adventure . . . . . 8
   Jon Timmis, Paul Andrews, Nick Owens, and Ed Clark

Regular Contributions

Distributed Learning of Wardrop Equilibria . . . . . 19
   Dominique Barth, Olivier Bournez, Octave Boussaton, and Johanne Cohen
Oracles and Advice as Measurements . . . . . 33
   Edwin Beggs, José Félix Costa, Bruno Loff, and John V. Tucker
From Gene Regulation to Stochastic Fusion . . . . . 51
   Gabriel Ciobanu
A Biologically Inspired Model with Fusion and Clonation of Membranes . . . . . 64
   Giorgio Delzanno and Laurent Van Begin
Computing Omega-Limit Sets in Linear Dynamical Systems . . . . . 83
   Emmanuel Hainry
The Expressiveness of Concentration Controlled P Systems . . . . . 96
   Shankara Narayanan Krishna
On Faster Integer Calculations Using Non-arithmetic Primitives . . . . . 111
   Katharina Lürwer-Brüggemeier and Martin Ziegler
A Framework for Designing Novel Magnetic Tiles Capable of Complex Self-assemblies . . . . . 129
   Urmi Majumder and John H. Reif
The Role of Conceptual Structure in Designing Cellular Automata to Perform Collective Computation . . . . . 146
   Manuel Marques-Pita, Melanie Mitchell, and Luis M. Rocha
A Characterisation of NL Using Membrane Systems without Charges and Dissolution . . . . . 164
   Niall Murphy and Damien Woods
Quantum Wireless Sensor Networks . . . . . 177
   Naya Nagy, Marius Nagy, and Selim G. Akl
On the Computational Complexity of Spiking Neural P Systems . . . . . 189
   Turlough Neary
Self-assembly of Decidable Sets . . . . . 206
   Matthew J. Patitz and Scott M. Summers
Ultrafilter and Non-standard Turing Machines . . . . . 220
   Petrus H. Potgieter and Elemér E. Rosinger
Parallel Optimization of a Reversible (Quantum) Ripple-Carry Adder . . . . . 228
   Michael Kirkedal Thomsen and Holger Bock Axelsen
Automata on Multisets of Communicating Objects . . . . . 242
   Linmin Yang, Yong Wang, and Zhe Dang

Author Index . . . . . 259
Quantum Experiments Can Test Mathematical Undecidability

Časlav Brukner

Institute for Quantum Optics and Quantum Information, Austrian Academy of Sciences, Boltzmanngasse 3, A-1090 Vienna, Austria
Faculty of Physics, University of Vienna, Boltzmanngasse 5, A-1090 Vienna, Austria
Abstract. Whenever a mathematical proposition to be proved requires more information than is contained in an axiomatic system, it can neither be proved nor disproved, i.e. it is undecidable, within this axiomatic system. I will show that certain mathematical propositions can be encoded in quantum states and that the truth values of the propositions can be tested in quantum measurements. I will then show that whenever a proposition is undecidable within the system of axioms encoded in the state, the measurement associated with the proposition gives random outcomes. This suggests a view according to which randomness in quantum mechanics is of an irreducible nature.
In his seminal work from 1931, Gödel proved that the Hilbert programme on the axiomatization of mathematics cannot be fulfilled in principle, because any system of axioms that is capable of expressing elementary arithmetic would necessarily have to be either incomplete or inconsistent. It would always be the case that either some proposition would be at once both provably true and false, or that some propositions would never be derivable from the axioms. One may wonder what Gödel's incompleteness theorem implies for physics. For example, is there any connection between the incompleteness theorems and quantum mechanics, as both fundamentally limit our knowledge? Opinions on the impact of the incompleteness theorem on physics vary considerably, from the conclusion that, "just because physics makes use of mathematics, it is by no means required that Gödel places any limit upon the overall scope of physics to understand the laws of Nature" [1], via a demonstration that algorithmic randomness is implied by a "formal uncertainty principle" similar to Heisenberg's [2], to a derivation of the non-computability of sequences of quantum outcomes from quantum value indefiniteness [3,4]. In 1982, Chaitin gave an information-theoretical formulation of the incompleteness theorem, suggesting that it arises whenever a proposition to be proven and the axioms together contain more information than the set of axioms alone [5,6]. In this work, when relating mathematical undecidability to quantum randomness, I will exclusively refer to incompleteness in Chaitin's sense and not to the original work of Gödel.

C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, pp. 1–5, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Consider a d-valent function f(x) ∈ {0, ..., d−1} of a single binary argument x ∈ {0, 1}, with d a prime number¹. There are d² such functions. We will partition the functions in d+1 different ways, following the procedure of Ref. [7]. In a given partition, the d² functions will be divided into d different groups, each containing d functions. Enumerating the first d partitions by the integer a = 0, ..., d−1 and the groups by b = 0, ..., d−1, the groups of functions are generated from the formula:

f(1) = af(0) ⊕ b,   (1)

where the sum is modulo d. In the last partition, enumerated by a = d, the functions are divided into groups b = 0, ..., d−1 according to the functional value f(0) = b. The functions can be represented in a table in which a enumerates the rows, while b enumerates the columns. For all but the last row the table is built in the following way: (i) choose the row, a, and the column, b; (ii) vary f(0) = 0, ..., d−1 and compute f(1) according to Eq. (1); (iii) write the pairs f(0)f(1) in the cell. The last row (a = d) is built as follows: (i) choose the column b; (ii) vary f(1) = 0, ..., d−1 and put f(0) = b; (iii) write the pairs f(0)f(1) in the cell. For example, for d = 3, one has

         b = 0       b = 1       b = 2
a = 0:  00 10 20    01 11 21    02 12 22    "f(1) = b"
a = 1:  00 11 22    01 12 20    02 10 21    "f(1) = f(0) ⊕ b"
a = 2:  00 12 21    01 10 22    02 11 20    "f(1) = 2f(0) ⊕ b"
a = 3:  00 01 02    10 11 12    20 21 22    "f(0) = b"       (2)
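The construction of table (2) is easy to check mechanically. The following Python sketch (the names `functions` and `group` are ours, not the paper's) enumerates the d² functions for d = 3, rebuilds the groups of each partition from Eq. (1) and the rule f(0) = b, and verifies the information-theoretic point made below: a group of one partition intersects every group of any other partition in exactly one function, so fixing one group (one dit) fixes nothing across partitions:

```python
# Reproduce the d = 3 partition table (2) and check why, in Chaitin's
# sense, propositions from a different partition are undecidable.
d = 3  # prime dimension

# A function f: {0,1} -> {0,...,d-1} is just the pair (f(0), f(1)).
functions = [(f0, f1) for f0 in range(d) for f1 in range(d)]  # d^2 functions

def group(a, b):
    """Functions in group b of partition a, per Eqs. (1), (3), (4)."""
    if a < d:   # "f(1) = a*f(0) + b (mod d)"
        return [f for f in functions if f[1] == (a * f[0] + b) % d]
    else:       # last partition, a = d: "f(0) = b"
        return [f for f in functions if f[0] == b]

# Print the table; each cell lists the pairs f(0)f(1), as in (2).
for a in range(d + 1):
    row = ['  '.join(f'{f0}{f1}' for f0, f1 in group(a, b)) for b in range(d)]
    print(f'a={a}:  ' + '  |  '.join(row))

# Take axiom (A) = {a=1, b=0}.  Every group of any *other* partition
# m != 1 contains exactly one function of the axiom's group, so the
# one dit carried by the axiom fixes no truth value there.
axiom = set(group(1, 0))
for m in range(d + 1):
    overlaps = [len(axiom & set(group(m, n))) for n in range(d)]
    print(f'm={m}: overlap of axiom group with each group n ->', overlaps)
```

Running it prints the four rows of table (2) and, for every partition m ≠ 1, the overlap pattern [1, 1, 1], i.e. the axiom group is spread evenly over the other partitions' groups.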
The groups (cells in the table) of functions that do not belong to the last row are specified by the proposition:

{a, b}: "The function values f(0) and f(1) satisfy f(1) = af(0) ⊕ b",   (3)

while those from the last row by:

{d, b}: "The function value f(0) = b".   (4)
The propositions corresponding to different partitions a are independent of each other. For example, if one postulates the proposition (A) "f(1) = af(0) ⊕ b" to be true, i.e. if we choose it as an "axiom", then it is possible to prove that the "theorem" (T1) "f(1) = af(0) ⊕ b′" is false for all b′ ≠ b. Proposition (T1) is decidable within the axiom (A). Within the same axiom (A) it is, however, impossible to prove or disprove the "theorem" (T2) "f(1) = mf(0) ⊕ n" with m ≠ a. Having only axiom (A), i.e. only one dit of information, there is not enough information to know also the truth value of (T2). Ascribing truth values to two

¹ The considerations here can be generalized to all dimensions that are powers of primes. This is related to the fact that in these cases a complete set of mutually unbiased bases is known to exist. In all other cases this is an open question, which goes beyond the scope of this paper (see, for example, Ref. [7]).
Quantum Experiments Can Test Mathematical Undecidability
3
propositions belonging to two different partitions, e.g. to both (A) and (T2), would require two dits of information. Hence, in Chaitin's sense, proposition (T2) is mathematically undecidable within the system containing the single axiom (A).

So far, we have made only logical statements. To make a bridge to physics, consider a hypothetical device – a "preparation device" – that can encode a mathematical axiom {a, b} of the type (3) or (4) into a property of a physical system by setting a "control switch" of the apparatus to a certain position {a, b}. In an operational sense the choice of the mathematical axiom is entirely defined by the switch position, as illustrated in Figure 1. We make no particular assumptions on the physical theory (e.g., classical or quantum) that underlies the behavior of the system, besides that it fundamentally limits the information content of the system to one dit of information. Furthermore, we assume that there is a second device – a "measurement apparatus" – that can test the truth value of a chosen mathematical proposition, again by setting a control switch of the apparatus to a certain position associated with the proposition. The choice of the switch position {m}, m ∈ {0, ..., d}, corresponds to the performance of one of the d+1 possible measurements on the system, and the observation of a d-valued outcome n in the measurement is identified with finding proposition {m, n} of the type (3) or (4) to be true. Consider now a situation where the preparation device is set to {a, b}, while the measurement apparatus is set to {m}. If m = a, the outcome confirms the axiom, i.e. one has n = b. What will be the outcome in a single run of the experiment if m ≠ a?

I will show that the devices from the previous paragraph are not hypothetical at all. In fact, they can be realized in quantum mechanics. The argument is based on Ref. [7]. In the eigenbasis of the generalized Pauli operator Ẑ, denoted as |κ⟩, we define two elementary operators

Ẑ|κ⟩ = η_d^κ |κ⟩,   X̂|κ⟩ = |κ + 1⟩,   (5)

where η_d = exp(i2π/d) is a complex d-th root of unity. The eigenstates of the X̂Ẑ^a operator, expressed in the Ẑ basis, are given by

|j_a⟩ = (1/√d) Σ_{κ=0}^{d−1} η_d^{−jκ − a s_κ} |κ⟩,

where s_κ = κ + ... + (d−1) [8], and the Ẑ operator shifts them: Ẑ|j_a⟩ = |j−1_a⟩. To encode the axiom {a, b} into a quantum state, the preparation device is set to prepare the state |0_a⟩ and then to apply the unitary Û = X̂^{f(0)} Ẑ^{f(1)} to it. The action of the device is, for a = 0, ..., d−1 and up to a global phase, Û ∝ (X̂Ẑ^a)^{f(0)} Ẑ^b, which follows from Eq. (1) and the commutation relation for the elementary operators, ẐX̂ = η_d X̂Ẑ. The state leaving the preparation device is shifted exactly b times, resulting in |−b_a⟩. For the case a = d, the state is prepared in the eigenstate |0_d⟩ ≡ |0⟩ of the operator Ẑ, and the unitary transforms it into, up to a phase factor, |+b_d⟩. When the switch of the measurement apparatus is set to {m}, it measures the incoming state in the basis {|0_m⟩, ..., |d−1_m⟩}. For m = a the measurement will confirm the axiom {a, b}, giving outcome b. In
Fig. 1. Quantum experiment testing (un)decidability of mathematical propositions (3) and (4). A qudit is initialized in a definite quantum state |0_a⟩ of one of the d+1 mutually unbiased basis sets, a ∈ {0, ..., d}. Subsequently, the unitary transformation Û = X̂^{f(0)} Ẑ^{f(1)}, which encodes the d-valued function with functional values f(0) and f(1), is applied to the qudit. The final state encodes the proposition "f(1) = af(0) ⊕ b" for a = 0, ..., d−1, or the proposition "f(0) = b" for a = d. The measurement apparatus is set to measure in the m-th basis {|0_m⟩, ..., |d−1_m⟩}, which belongs to one of the d+1 mutually unbiased basis sets, m ∈ {0, ..., d}. It tests the propositions "f(1) = mf(0) ⊕ n" for m = 0, ..., d−1, or "f(0) = n" for m = d.
all other cases, the result will be completely random. This follows from the fact that the eigenbases of X̂Ẑ^a for a = 0, ..., d−1 (Ẑ^0 ≡ 1) and the eigenbasis of Ẑ are known to form a complete set of d+1 mutually unbiased basis sets [8]. They have the property that a system prepared in a state from one of the bases will give completely random results if measured in any other basis, i.e. |⟨b_a|n_m⟩|² = 1/d for all a ≠ m.

Most working scientists hold fast to the viewpoint according to which randomness can only arise due to the observer's ignorance about predetermined, well-defined properties of physical systems. But the theorems of Kochen and Specker [9] and Bell [10] have seriously put such a belief in question. I argue that an alternative viewpoint, according to which quantum randomness is irreducible, is vindicable. As proposed by Zeilinger [11], an individual quantum system can contain only a limited information content ("a single qudit carries one dit of information"). We have shown here that a quantum system can encode a finite set of axioms and that quantum measurements can test the mathematical propositions. If the proposition is decidable within the axiomatic system, the outcome will be definite. However, if it is undecidable, the response of the system must not contain any information whatsoever about the truth value of the undecidable proposition, and yet it cannot "refuse" to give an answer². Unexplained and perhaps unexplainable, it inevitably gives an outcome – a "click" in a detector or a flash of a lamp – whenever measured. I suggest that the individual outcome
² To put it in a grotesque way, the system is not allowed to respond "I am undecidable, I cannot give an answer."
must then be irreducibly random, reconciling mathematical undecidability with the fact that a system always gives an "answer" when "asked" in an experiment. Whether or not every quantum measurement (for example, a measurement not belonging to the set of mutually unbiased basis sets) can be associated with a mathematical proposition is an open question. It therefore remains unanswered whether all quantum randomness can generally be seen as a physical signature of mathematical undecidability.
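The mutual unbiasedness that drives the randomness argument can be verified numerically. The sketch below (pure Python; written from the formulas in the text, so treat it as an illustration rather than the authors' code) builds the d eigenbases of X̂Ẑ^a from the stated components η_d^{−jκ−as_κ}/√d together with the computational (Ẑ) basis, and checks |⟨b_a|n_m⟩|² = 1/d for every pair of distinct bases. This particular phase convention works for odd prime d:

```python
# Numeric check of mutual unbiasedness for the d+1 bases of the text:
# a = 0..d-1 are eigenbases of X Z^a with components eta^(-(j*k + a*s_k))/sqrt(d),
# s_k = k + ... + (d-1); a = d is the computational (Z) basis.
import cmath, math

def bases(d):
    eta = cmath.exp(2j * cmath.pi / d)
    s = [sum(range(k, d)) for k in range(d)]          # s_k = k + ... + (d-1)
    B = []
    for a in range(d):                                # eigenbases of X Z^a
        B.append([[eta ** (-(j * k + a * s[k])) / math.sqrt(d)
                   for k in range(d)] for j in range(d)])
    B.append([[1.0 if k == j else 0.0 for k in range(d)]
              for j in range(d)])                     # a = d: Z eigenbasis
    return B

def overlap2(u, v):
    """|<u|v>|^2 for two state vectors given as component lists."""
    return abs(sum(x.conjugate() * y for x, y in zip(u, v))) ** 2

d = 5                                                 # any odd prime
B = bases(d)
for a in range(d + 1):
    for m in range(d + 1):
        for j in range(d):
            for n in range(d):
                p = overlap2(B[a][j], B[m][n])
                if a == m:                            # same basis: orthonormal
                    assert abs(p - (1.0 if j == n else 0.0)) < 1e-9
                else:                                 # different basis: flat
                    assert abs(p - 1 / d) < 1e-9
print("all", d + 1, "bases pairwise mutually unbiased for d =", d)
```

A state prepared in any basis vector therefore assigns probability exactly 1/d to every outcome of a measurement in any of the other d bases, which is the "completely random result" the argument relies on.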
Acknowledgement I am grateful to T. Paterek, R. Prevedel, J. Kofler, P. Klimek, M. Aspelmeyer and A. Zeilinger for numerous discussions on the topic. This work is based on Ref. [7] and [12]. I acknowledge financial support from the Austrian Science Fund (FWF), the Doctoral Program CoQuS and the European Commission under the Integrated Project Qubit Applications (QAP).
References

1. Barrow, J.D.: Gödel and Physics. In: Horizons of Truth, Kurt Gödel Centenary Meeting, Vienna, April 27-29 (2006) arXiv:physics/0612253
2. Calude, C.S., Stay, M.A.: Int. J. Theor. Phys. 44, 1053–1065 (2005)
3. Svozil, K.: Phys. Lett. A 143, 433–437 (1990)
4. Calude, C.S., Svozil, K. (2006) arXiv:quant-ph/0611029
5. Chaitin, G.J.: Int. J. Theor. Phys. 21, 941–954 (1982)
6. Calude, C.S., Jürgensen, H.: Appl. Math. 35, 1–15 (2005)
7. Paterek, T., Dakić, B., Brukner, Č. (2008) arXiv:0804.2193
8. Bandyopadhyay, S., et al.: Algorithmica 34, 512 (2002)
9. Kochen, S., Specker, E.P.: J. Math. Mech. 17, 59 (1967)
10. Bell, J.: Physics 1, 195 (1964)
11. Zeilinger, A.: Found. Phys. 29, 631–643 (1999)
12. Paterek, T., Prevedel, R., Kofler, J., Klimek, P., Aspelmeyer, M., Zeilinger, A., Brukner, Č. (submitted)
Computational Challenges and Opportunities in the Design of Unconventional Machines from Nucleic Acids Anne Condon The Department of Computer Science, U. British Columbia, Canada
[email protected]
DNA and RNA molecules have proven to be very versatile materials for the programmable construction of nano-scale structures and for controlling motion in molecular machines. RNA molecules are also increasingly in the spotlight in recognition of their important regulatory and catalytic roles in the cell and their promise in therapeutics. Function follows form in the molecular world, and so our ability to understand nucleic acid function in the cell, as well as to design novel structures, is enhanced by reliable means for structure prediction. Prediction methods for designed molecules typically rely on a thermodynamic model of structure formation. The model associates free energies with loops in the structure, and the overall energy of a structure is the sum of its loop free energies. From the energy model, the folding pathway, the structure with minimum free energy, or the probabilities of base pair formation can be computed. Thus, the quality of predictions is limited by the quality of the energy model. In this talk, we will describe progress towards more accurate structure prediction, enabled by improved inference of energy parameters and by new algorithms. We will also describe some interesting problems in the design of nucleic acids that have prescribed structure or folding pathways.
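To make the "optimize an energy-like objective over all structures" idea concrete, here is the classic Nussinov dynamic program, which maximizes base pairs instead of minimizing loop free energies. It is a deliberately simplified stand-in for the loop-based thermodynamic model the abstract describes (real predictors score hairpin, stack, bulge and multi-loops with measured parameters), but the recursion over nested substructures has the same shape:

```python
# Nussinov-style base-pair maximization: a toy stand-in for minimum
# free energy prediction (real models score loops, not individual pairs).
PAIRS = {('A', 'U'), ('U', 'A'), ('C', 'G'), ('G', 'C'), ('G', 'U'), ('U', 'G')}

def max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs, hairpin loops >= min_loop."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]            # dp[i][j]: best for seq[i..j]
    for span in range(min_loop + 1, n):         # shorter spans stay 0
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                 # case 1: j unpaired
            for k in range(i, j - min_loop):    # case 2: j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    return dp[0][n - 1]

print(max_pairs("GGGAAAUCCC"))  # -> 3: the three G-C closings of a hairpin
```

Swapping the "+1 per pair" reward for loop free energies (and maximizing stability rather than pair count) turns this scheme into the Zuker-style minimum free energy recursion the talk refers to.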
C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, p. 6, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Predictions for the Future of Optimisation Research David Corne MACS, Earl Mountbatten Building, Heriot-Watt University, Edinburgh EH14 8AS, UK
[email protected]
The global effort to find good optimisation methods is an evolutionary algorithm (note "is", not "is analogous to"). A team's research effort is an individual, or 'chromosome', and peer review is a (very) noisy and multiobjective fitness metric. Genetic operators (new directions and ideas for research efforts) are guided partly by discussions at conferences, maybe even sometimes guided by plenary talks. In this talk I will predict what kind of research in optimisation I expect to have the highest fitness scores in the next several years. They will be, mainly, combinations of learning and optimisation that are theoretically justified, or simply justified by their excellent results, and they will be works concerned with generating algorithms that quickly solve a distribution of problem instances, rather than one instance at a time. These combinations of learning and optimisation will be informed by the (slow) realisation that several separate studies, emerging from different subfields, are converging on very similar styles of approach. A particular point is that, in this way, we see that theoretical work on optimisation is slowly beginning to understand aspects of methods used by nature. Finally, these are predictions, and certainly not prescriptions. The overarching evolutionary process that we serve cannot succeed unless lots of diversity is maintained. So, please ignore what I say.
C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, p. 7, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Immune Systems and Computation: An Interdisciplinary Adventure

Jon Timmis¹,², Paul Andrews¹, Nick Owens², and Ed Clark¹

¹ Department of Computer Science, University of York, Heslington, York YO10 5DD, UK
Tel.: +44 1904 432348
[email protected]
² Department of Electronics, University of York, Heslington, York YO10 5DD, UK
Abstract. Artificial Immune Systems (AIS) is a diverse area of research that attempts to bridge the divide between immunology and engineering. AIS are developed through the application of techniques such as mathematical and computational modeling of immunology, abstraction from those models into algorithm (and system) design, and implementation in the context of engineering. Whilst AIS has become known as an area of computer science and engineering that uses immune system metaphors for the creation of novel solutions to problems, we argue that the area of AIS is much wider and is not confined to the simple development of new algorithms. In this paper we would like to broaden the understanding of what AIS are all about, thus driving the area into a truly interdisciplinary one of genuine interaction between immunology, mathematics and engineering.
1 Introduction
Over recent years there have been a number of review papers written on Artificial Immune Systems (AIS), with the first being [5], followed by a series of others that either review AIS in general, for example [7,8,21,12,34], or more specific aspects of AIS such as data mining [39], network security [22], applications of AIS [18], theoretical aspects [34] and modelling in AIS [10]. The aim of this paper is to bring together ideas from the majority of these papers into a single position paper focussed on the interdisciplinary nature of AIS. For information, a good resource on the latest developments in AIS is the International Conference on Artificial Immune Systems (ICARIS¹) conference series dedicated to AIS [37,36,25,19,3,9], where there is an extensive number of papers on all aspects of AIS. AIS has become known as an area of computer science and engineering that uses immune system metaphors for the creation of novel solutions to problems. Whilst this forms the majority view, we argue that the area of AIS is much wider and is not confined to the development of new algorithms. In a recent

¹ http://www.artificial-immune-systems.org
C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, pp. 8–18, 2008. © Springer-Verlag Berlin Heidelberg 2008
paper, Cohen [4] concurs with this view and in fact goes onto define three types of AIS scientists. The first are those of the “literal” school that build systems in silico to try and do what the actual immune system does (e.g. build computer security systems that discriminate between self and non-self); those of the “metaphorical” school that look for inspiration from the immune system and build computational systems with the immune system in mind (so the application may be far from analogous to what the immune system does) and a third school of people who aim to understand immunity through the development of computer and mathematical models. It is in this vein that our paper is written, and we would like to broaden the understanding of what AIS is all about, thus driving the area into a true interdisciplinary one of genuine interaction between immunology, mathematics and engineering. Cohen [4] discusses the notion of the immune system using a “computational strategy” to carry out it’s functions of protecting and maintaining the body. An interesting analogy is made to the universal Turing machine that transforms input, which is represented as a sequence of information on a tape, to output, again information on a tape, and this machine operates to a set of rules. He raises interesting questions as to “what does the immune system compute” and “what might we gain from thinking about immune computation?”. Cohen’s main argument is that the immune system computes the state of the organism, based on a myriad of signals, which endows the immune system the ability to maintain and protect the host. Cohen [4] urges the immunological community to embrace working with computational scientists to aid the understanding of the nature of immune computation: this is, in part, the spirit of this paper. In recent years, the area of AIS has begun to return to the immunology from which the initial inspiration came. For example, works by Stepney et al. 
[32], Twycross and Aickelin [40], Andrews and Timmis [1], Bersini [2] and Timmis [34] all advocate a deeper understanding of the immune system, in part through the use of modelling techniques, which should lead to the development of richer, more effective immune-inspired engineered systems. This theme underpins our review paper. We have attempted to structure it in such a way as to reflect the nature of AIS research today, that is, one that encompasses (or can encompass) a range of activities from modelling immune systems to engineering systems. The paper is structured as follows: in Section 2 we outline a conceptual framework for the development of AIS, which allows us to begin thinking in an interdisciplinary way; in Section 3 we provide a very high-level discussion of the basic operation of the immune system, in order to provide the wider context; in Section 4 we discuss the role of modelling in immunology, identify it as a critical aspect of our interdisciplinary journey, and focus on one modelling tool, namely the stochastic π-calculus; in Section 5 we discuss how one can consider AIS as a wider field than a simple engineering field; and we outline our conclusions in Section 6. The concepts in this paper are discussed in greater depth and breadth in [35].
2 A Framework for Thinking about Artificial Immune Systems
As we outlined in Section 1, there has been a gradual shift in AIS towards paying more attention to the underlying biological system that serves as inspiration, taking time both to develop abstract computational models of the immune system (to help researchers understand its computational properties) and to work more closely with immunologists to better understand the biology behind the system. This is not to say that AIS researchers are now focussed only on the biology, but it would be fair to say that AIS is becoming a more interdisciplinary topic where some researchers work more on the biological aspects and others on the engineering aspects. To highlight this, in a recent paper Stepney et al. [33] (extended in [32]) proposed a methodology for the development of AIS that takes this shift into account. We will discuss that methodology here; however, we also propose that this methodology is a good way to describe AIS in its current form, and indeed it has formed the general structure for this paper. In addition, concurring with the view of Andrews and Timmis [1], Bersini [2] makes the argument that the AIS practitioner should take more seriously the role of modelling in the understanding and development of immune-inspired solutions, and adopt a more “artificial life” approach. Indeed, Bersini makes a compelling argument for undertaking such an “Alife” approach based on pedagogy and the study of emergent phenomena and qualitative predictions, all of which are beneficial to the immunologist and ultimately to engineers. Whilst we have a great deal of sympathy with this view, and indeed advocate the approach, we feel it needs to be tempered by consideration of the engineering aspects; after all, it is better engineered solutions that are the driving force behind the vast majority of research being undertaken in AIS.
That is to say, we feel both the approach encouraged by Bersini and the problem-oriented approach proposed by Freitas and Timmis [11] can sit together, and this can be achieved via the conceptual framework approach [33,32]. In their paper, Stepney et al. [33] propose that bio-inspired algorithms, such as AIS, are best developed in a more principled way than was then common in the literature. To clarify, the authors suggested that many of the AIS being developed had drifted away from the immunological inspiration that had fuelled their development, and that AIS practitioners were failing to capture the complexity and richness that the immune system offers. To remedy this, the authors suggest a conceptual framework for developing bio-inspired algorithms in a more principled manner, one that attempts to capture biological richness and complexity while appreciating the need for sound engineered systems that must work. This should avoid the “reasoning by metaphor” approach often seen in bio-inspired computing, whereby algorithms are just a weak analogy of the process on which they are based, being developed directly from (often naive) biological models and observations. One of the main problems involved in designing bio-inspired algorithms is deciding which aspects of
Fig. 1. The conceptual framework [32]. This can be seen both as a methodology for developing novel AIS that allows true interaction between disciplines, where all can benefit, and as a way of thinking about the scope of AIS and how it has broadened over the years.
the biology are necessary to generate the required behaviour, and which aspects are surplus to requirements. Thus, the conceptual framework takes an interdisciplinary approach, involving the design of AIS through a series of observational and modelling stages in order to identify the key characteristics of the immunological process on which the AIS will be based. The first stage of the conceptual framework, as outlined in Figure 1, aims to probe the biology, utilising biological observations and experiments to provide a partial view of the biological system from which inspiration is being taken. This view is used to build abstract models of the biology. These models can be both mathematical and computational, and are open to validation techniques not available for the actual biological system. From the execution of the models and their validation, insight can be gained into the underlying biological process. It is this insight that leads to the construction of bio-inspired algorithms. The whole process is iterative, and can also lead to the construction of computational frameworks that provide a suitable structure from which specific application-oriented algorithms can be designed. As noted by Stepney et al. [32], each step in the standard conceptual framework is biased, be it modelling some particular biological mechanism or designing an algorithm for which there is an intended end product or specific concept. The first instantiations of the conceptual framework will produce models specific to certain biological systems and algorithms for solutions to specific problems. One could attempt to produce a computational framework based on some biology without a particular end algorithm or application in mind, that is, examining biology and hoping to come across something applicable to a generic computational problem. This, however, would seem to be a very difficult task, and one has to ground the development of AIS in some form of application at some point.
Therefore, it is far easier to orient these steps toward some particular problem, giving the necessary focus to the modelling work [11].
3 A Quick Primer on the Immune System
AIS have been inspired by many different aspects of the human immune system. One of the first questions that might be asked is why, as engineers and mathematicians, we are interested in the immune system at all. The answer is that the immune system exhibits a number of computationally appealing properties such as pattern recognition, learning, memory and self-organisation. In this section we present an overview of much of the immunology that has inspired AIS, to give the reader a better appreciation of the discussions on AIS that follow. For a comprehensive introduction to immunology, the reader is referred to [14] and [20]. The immune system is typically described as a defence system that has evolved to protect its host from pathogens (harmful micro-organisms such as bacteria and viruses) [14]. It comprises a variety of specialised cells and molecules, along with immune organs that provide a place for the immune cells to mature and function. The interactions between immune cells and other cells of the body create a rich and complex set of immune behaviours, resulting in the recognition of pathogens and the evocation of a suitable pathogen-ridding response. The vertebrate immune system can be split functionally into two components: the innate immune system and the adaptive (or acquired) immune system. The innate immune system incorporates general pathogen-defence mechanisms that have evolved over the germline of the organism. These mechanisms remain essentially unchanged during the lifetime of an individual and include the inflammatory response, phagocytosis (ingestion of pathogens by specialised immune cells), and physiologic barriers such as temperature. The mechanisms of the adaptive immune system also develop as the organism evolves; however, they additionally have the ability to change somatically (i.e. during the lifetime of an individual).
This results in the ability of the adaptive immune system to recognise previously unseen pathogens (learning) and to remember them for future encounters (memory). The innate and adaptive immune systems typically operate over different timescales. The innate system operates on a short timescale, often initiating a reaction either instantly or within a matter of minutes, whilst the adaptive immune system operates over a longer period, taking of the order of days to initiate a reaction. It is the combination and interaction of both the innate and adaptive immune mechanisms that provides us with an effective immune system.
4 Modelling the Immune System
Within the context of the conceptual framework (Section 2), modelling plays an important role in understanding the computational aspects of the immune system. There is a vast range of modelling approaches available, each with its own advantages and disadvantages, operating at different levels of abstraction [10]. What we present in this section is an overview of some of the techniques that are commonplace in the immunological world and that help us, from a computational and engineering background, understand how the immune system “computes”.
A recent paper by Forrest and Beauchemin [10] provides an excellent review of modelling approaches in immunology (and further discussions on engineering immune systems for computer security). The authors highlight that there are a number of ways in which one can model the immune system, with each approach offering different perspectives to the modeller. Within the paper, the authors focus more on Agent-Based Modelling (ABM) as a tool, where cells might be represented as individual agents, rather than on the more traditional differential-equation models of populations of cell types. An agent in the system may be a certain type of cell that is encoded with simple rules governing its behaviours and interactions. Within ABM it is possible to observe quite easily the dynamics of the agent population that arise as a result of the interactions between the agents. One difficult aspect of ABM is defining the right level of abstraction for each agent in the model, as this will clearly affect how the simulation operates. Forrest and Beauchemin [10] argue that ABM might be a more appropriate tool for modelling immunology due to the ease with which one can incorporate knowledge into the model that cannot readily be expressed mathematically, and because multiple tests (or experiments) can be run with great ease, thus allowing the experimental immunologist a chance to perform experiments (albeit ones at a certain level of abstraction) in silico. This concurs with the view of Bersini [2], who advocates the use of object-oriented (OO) technologies; indeed, ABM is a natural implementation of the OO paradigm. Another modelling approach is that of statecharts, first proposed by Harel [16] as a mechanism for representing computational processes by means of states and events that cause transitions between states.
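To make the statechart idea of states and event-driven transitions concrete, here is a minimal sketch in Python; the cell states and event names are our own illustrative choices, not taken from [16] or from any particular immunological model:

```python
# A minimal statechart-style sketch: a transition table maps
# (current state, event) pairs to successor states.
TRANSITIONS = {
    ("resting", "antigen_detected"): "activated",
    ("activated", "antigen_cleared"): "memory",
    ("memory", "antigen_detected"): "activated",
}

def step(state, event):
    """Fire an event: move to the next state, or stay put if the current
    state does not handle this event."""
    return TRANSITIONS.get((state, event), state)

state = "resting"
for event in ["antigen_detected", "antigen_cleared", "antigen_detected"]:
    state = step(state, event)
print(state)   # resting -> activated -> memory -> activated
```

Real statecharts add hierarchy, concurrency and history to this basic picture, which is what makes them suited to the complex interactions discussed next.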
Such statecharts can be developed to model complex interactions between elements and have proved useful in the context of immunological modelling [2,10]. It seems clear that there is a great deal to be learnt from examining the immune system in more computational terms. Indeed, our position is to concur with Forrest and Beauchemin [10], Andrews and Timmis [1], Stepney et al. [32], Bersini [2], Timmis [34] and Cohen [4] that there is great benefit in the AIS practitioner engaging with the immunological modelling community, to the advantage not only of the engineers but also of the immunologists. Having now motivated the study of immunological modelling, and the role it can play not only in understanding the immune system but also potentially in the development of AIS, we briefly review immunological modelling in terms of the π-calculus.
4.1 π-Calculus
The π-calculus is a formal language used to specify concurrent computational systems. The defining feature that sets it apart from other process calculi is the possibility of expressing mobility. This allows processes to “move” by dynamically changing their channels of communication with other processes; thus one can model networks that reconfigure themselves. The π-calculus allows composition, choice, and restriction of processes, which communicate on potentially private complementary channels. There is a growing similarity between the parallelism and complexity of today’s computer systems and biological systems. As noted by
[28], computational analysis tools such as the π-calculus are just as applicable to biology as they are to computing. Regev et al. [31] apply the π-calculus to model a signal transduction pathway; the authors note that the π-calculus allows the model to be mathematically well-defined while remaining biologically faithful and transparent. They also note that the π-calculus allows only qualitative analysis of a biological system. For quantitative analysis, the stochastic π-calculus (Sπ) [29] is needed. Sπ extends the π-calculus by adding a rate parameter r to interactions; this defines an exponential distribution, such that the probability of an interaction occurring within time t is F(r, t) = 1 − e^{−rt}. Thus the average duration of an interaction is the mean 1/r. Sπ was conceived to allow performance analysis of concurrent computational systems; as a consequence, [29] demonstrates how a system described in Sπ can be turned into a continuous-time Markov chain. Priami et al. [30] follow up the work of [31] and apply Sπ to the quantitative examination of biological pathways. The mathematical nature of the π-calculus, stochastic or otherwise, can render it inaccessible to non-computer scientists and potentially unwieldy, or at least non-intuitive, when modelling biological systems. To address this issue, Phillips et al. [28] define a Graphical Stochastic π-calculus (GSπ), which represents an Sπ specification as a graph of typed nodes with labelled edges. The authors prove this to be reduction-equivalent to Sπ, ensuring that both have the same expressive power. A number of advantages of GSπ are discussed in [28], including the ease with which one can discover cycles in a system (cycles are an important facet found at all levels of biological systems) and the ability to provide a front-end to an Sπ simulator.
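As a quick numerical illustration of the rate semantics (a Python sketch of ours, not part of any Sπ tool): sampling interaction durations at rate r and checking the empirical fraction completed by time t against F(r, t) = 1 − e^{−rt}, and the average duration against the mean 1/r.

```python
import math
import random

def sample_durations(r, n, seed=0):
    """Draw n interaction durations from an exponential distribution with rate r."""
    rng = random.Random(seed)
    return [rng.expovariate(r) for _ in range(n)]

r, t = 2.0, 0.5
durations = sample_durations(r, 100_000)

# Empirical fraction of interactions completed by time t vs. F(r, t) = 1 - e^{-rt}
empirical = sum(d <= t for d in durations) / len(durations)
theoretical = 1 - math.exp(-r * t)

# Average duration should be close to the mean 1/r
mean_duration = sum(durations) / len(durations)
print(empirical, theoretical, mean_duration)
```

With 100,000 samples the empirical values agree with the theory to within about one percent.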
Clearly, the ability to simulate systems specified in Sπ is essential to understanding their behaviour; to this end there are a number of stochastic simulators, for example BioSpi [30] and SPiM [28]. Both make use of the Gillespie algorithm [13] to simulate bio-molecular interactions. However, SPiM would seem to have some advantages over BioSpi: first, it is proved to be a correct simulation of Sπ; second, it is optimised for the simulation of biology, exploiting the fact that most biological simulations contain many thousands of identical processes, i.e. many thousands of copies of the same protein; third, it provides visualisation through GSπ and through an animated 3D visualisation of Sπ. A final point raised in [31] highlights that the tools of the π-calculus can aid the understanding of biology. For example, bisimulation allows formal comparison of two π-calculus programs via an equivalence relation on their behaviour. This may allow the abstraction of concepts common to many biological systems. Such ideas have an interesting instantiation here: it may be possible to use them to pin down what in a biological system is necessary for its behaviour and what is superfluous, which would be of great benefit to AIS practitioners, as they would better understand why the biology behaves as it does. This would allow more considered steps on the route through the conceptual framework towards bio-inspired algorithms, enabling genuine interaction between disciplines with a common tool. Recent work by Owens et al. [27] has adopted the stochastic π-calculus for modelling early signalling events in T-cells.
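The Gillespie direct method [13] that both simulators build on can be sketched in a few lines of Python; the reaction system below (a single degradation reaction with an illustrative rate) is our own toy example, not taken from BioSpi or SPiM:

```python
import random

def gillespie(rates_fn, state, t_end, seed=0):
    """Gillespie direct method: repeatedly sample which reaction fires next,
    and when, from the current propensities."""
    rng = random.Random(seed)
    t = 0.0
    while True:
        reactions = rates_fn(state)          # list of (propensity, effect)
        total = sum(a for a, _ in reactions)
        if total == 0:
            break                            # nothing can fire any more
        dt = rng.expovariate(total)          # exponential time to next reaction
        if t + dt > t_end:
            break
        t += dt
        pick = rng.uniform(0, total)         # choose a reaction proportionally
        for a, effect in reactions:
            pick -= a
            if pick <= 0:
                state = effect(state)
                break
    return state

# Toy system: a protein P degrades at rate k per molecule (P -> nothing).
k = 1.0
def degradation(p):
    return [(k * p, lambda s: s - 1)]

remaining = gillespie(degradation, state=1000, t_end=2.0)
print(remaining)   # on average about 1000 * e^{-2}, i.e. roughly 135 molecules
```

The per-molecule propensity k·p is exactly the kind of rate that an Sπ specification attaches to an interaction channel.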
Their model shows the ability of T-cells to tune to a particular antagonist, altering their activation rate over time. This model is being used as the first step in a wider project to construct engineered systems with homeostatic properties based on such tuning of cells [26].
5 Towards an Interdisciplinary Adventure
In a recent position paper, Timmis [34] states that the area of AIS has reached “an impasse” and is being hampered by the lack of attention paid to the underlying biological system (both in terms of immunology and its interactions with other systems), the lack of theoretical foundations being laid, and the lack of challenging application areas to drive forward the engineering aspect of AIS. That paper takes a slightly different perspective to that of Garrett [12], in so much as Timmis argues there are a number of factors which, when combined, are hindering the progression of AIS from yet another evolutionary technique to something that is, to use Garrett’s terms, useful and distinctive. Garrett attempts to assign some meaning to the usefulness and distinctiveness criteria, but this, as we have discussed, is potentially problematic and, by its very nature, subjective. To address some of the concerns of Timmis [34], we can look at the papers of Bersini [2], Forrest and Beauchemin [10], Cohen [4] and Timmis et al. [35] and conclude that modelling, and greater interaction with immunologists, can help the development of AIS by bringing greater understanding of the immune system. Through this interaction it may well be possible to begin the development of new, useful and distinctive algorithms and systems that go well beyond what engineering has to offer to date. Indeed, at the recent ICARIS conference a number of papers were dedicated to this, exploring the usefulness of tunable activation thresholds [26,15], Cohen’s cognitive model [6,41] and immune networks [17,23]. However, there is one word of caution amid the excitement of modelling, and we echo the concerns of Neal and Trapnel [24]: just because the immune system does a certain task in a certain way, it does not mean that an AIS can do the same task in the same way: immune and engineered systems are fundamentally different things.
What is key is to abstract out the computational properties of the immune system; seeing the immune system as a computational device [4] may be the key to future development. It would be primarily logical properties that are extracted, but, in contrast to [11], who advocate only logical principles, it is possible that there are physical properties that can be used as inspiration (such as the physical structure of lymph nodes), while being mindful that physical properties are difficult to translate from natural to artificial systems. Greater collaboration with immunologists should help us better understand the intricate interactions both within and outside of the immune system, as outlined in another challenge by Timmis [34]. Neal and Trapnel [24] outline such interactions within the immune system, and it is clear even from this simple view that the interactions are complex: effective tools are going to be needed for us even to begin to understand such interactions, let alone abstract useful and distinctive computational properties for our artificial systems.
Serious developments in theory are also required to fully understand how and why the algorithms work the way they do, and there are many advances that can be made with respect to modelling the immune system [38].
6 Conclusions
We have highlighted the interdisciplinary nature of AIS: through interactions across a variety of disciplines we can begin to harness the complexity of the immune system in our engineering and, at the same time, develop new insights into the operation and functionality of the immune system. Indeed, we concur with Cohen [4] that a great deal can be learnt on all sides: through the use of the conceptual framework, the “literal” and “metaphorical” schools may gain a greater understanding and appreciation of the underlying immunology, so as to build better immune-inspired systems, and the “modelling” school may develop richer and more informative models, so as to further our understanding of this amazingly complex system. This is not easy and will take the effort of many people over many years, but we will learn many lessons along the way in our quest to create truly artificial immune systems. As a final point, we would like to advocate the application of the conceptual framework as a methodology for the development of new immune-inspired systems. The conceptual framework facilitates a truly interdisciplinary approach, where expertise from engineering can inform immunology and immunology can inform engineering, and will facilitate the “interdisciplinary adventure”.
Acknowledgements. Paul Andrews is supported by EPSRC grant number EP/E053505/1, Nick Owens by EP/E005187/1 and Ed Clark by EP/D501377/1.
References
1. Andrews, P.S., Timmis, J.: Inspiration for the next generation of artificial immune systems. In: Jacob, et al. (eds.) [19], pp. 126–138
2. Bersini, H.: Immune system modeling: The OO way. In: Bersini, Carneiro (eds.) [3], pp. 150–163
3. Bersini, H., Carneiro, J. (eds.): ICARIS 2006. LNCS, vol. 4163. Springer, Heidelberg (2006)
4. Cohen, I.R.: Real and artificial immune systems: Computing the state of the body. Imm. Rev. 7, 569–574 (2007)
5. Dasgupta, D. (ed.): Artificial Immune Systems and their Applications. Springer, Heidelberg (1999)
6. Davoudani, D., Hart, E., Paechter, B.: An immune-inspired approach to speckled computing. In: de Castro, et al. (eds.) [9], pp. 288–299
7. de Castro, L.N., Von Zuben, F.J.: Artificial immune systems: Part I—basic theory and applications. Technical Report DCA-RT 01/99, School of Computing and Electrical Engineering, State University of Campinas, Brazil (1999)
8. de Castro, L.N., Von Zuben, F.J.: Artificial immune systems: Part II—a survey of applications. Technical Report DCA-RT 02/00, School of Computing and Electrical Engineering, State University of Campinas, Brazil (2000)
9. de Castro, L.N., Von Zuben, F.J., Knidel, H. (eds.): ICARIS 2007. LNCS, vol. 4628. Springer, Heidelberg (2007)
10. Forrest, S., Beauchemin, C.: Computer immunology. Immunol. Rev. 216(1), 176–197 (2007)
11. Freitas, A., Timmis, J.: Revisiting the foundations of artificial immune systems for data mining. IEEE Trans. Evol. Comp. 11(4), 521–540 (2007)
12. Garrett, S.: How do we evaluate artificial immune systems? Evolutionary Computation 13(2), 145–177 (2005)
13. Gillespie, D.: Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81(25), 2340–2361 (1977)
14. Goldsby, R.A., Kindt, T.J., Osborne, B.A., Kuby, J.: Immunology, 5th edn. W.H. Freeman and Company, New York (2003)
15. Guzella, T., Mota-Santos, T., Caminhas, W.: Towards a novel immune inspired approach to temporal anomaly detection. In: de Castro, et al. (eds.) [9], pp. 119–130
16. Harel, D.: Statecharts: a visual formalism for complex systems. Sci. Comput. Program. 8, 231–274 (1987)
17. Hart, E., Santos, F., Bersini, H.: Topological constraints in the evolution of idiotypic networks. In: de Castro, et al. (eds.) [9], pp. 252–263
18. Hart, E., Timmis, J.: Application areas of AIS: The past, the present and the future. Applied Soft Computing 8(1), 191–201 (2008)
19. Jacob, C., Pilat, M.L., Bentley, P.J., Timmis, J.I. (eds.): ICARIS 2005. LNCS, vol. 3627. Springer, Heidelberg (2005)
20. Janeway, C.A., Travers, P., Walport, M., Shlomchik, M.: Immunobiology, 5th edn. Garland Publishing (2001)
21. Ji, Z., Dasgupta, D.: Artificial immune system (AIS) research in the last five years. In: Congress on Evolutionary Computation, Canberra, Australia, December 8–12, vol. 1, pp. 123–130. IEEE, Los Alamitos (2003)
22. Kim, J., Bentley, P., Aickelin, U., Greensmith, J., Tedesco, G., Twycross, J.: Immune system approaches to intrusion detection - a review. Natural Computing (2007) (in press)
23. McEwan, C., Hart, E., Paechter, B.: Revisiting the central and peripheral immune system. In: de Castro, et al. (eds.) [9], pp. 240–251
24. Neal, M., Trapnel, B.: Go Dutch: Exploit interactions and environments with artificial immune systems. In: In Silico Immunology, pp. 313–330. Springer, Heidelberg (2007)
25. Nicosia, G., Cutello, V., Bentley, P.J., Timmis, J. (eds.): ICARIS 2004. LNCS, vol. 3239. Springer, Heidelberg (2004)
26. Owens, N., Timmis, J., Greensted, A., Tyrrell, A.: On immune inspired homeostasis for electronic systems. In: de Castro, et al. (eds.) [9], pp. 216–227
27. Owens, N., Timmis, J., Tyrrell, A., Greensted, A.: Modelling the tunability of early T-cell activation events. In: Proceedings of the 7th International Conference on Artificial Immune Systems. LNCS. Springer, Heidelberg (2008)
28. Phillips, A., Cardelli, L.: Efficient, correct simulation of biological processes in the stochastic pi-calculus. In: Calder, M., Gilmore, S. (eds.) CMSB 2007. LNCS (LNBI), vol. 4695, pp. 184–199. Springer, Heidelberg (2007)
29. Priami, C.: Stochastic π-calculus. The Computer Journal 38(7), 578–589 (1995)
30. Priami, C., Regev, A., Shapiro, E., Silverman, W.: Application of a stochastic name-passing calculus to representation and simulation of molecular processes. Information Processing Letters 80, 25–31 (2001)
31. Regev, A., Silverman, W., Shapiro, E.: Representation and simulation of biochemical processes using the pi-calculus process algebra. In: Pacific Symposium on Biocomputing, vol. 6, pp. 459–470 (2001)
32. Stepney, S., Smith, R., Timmis, J., Tyrrell, A., Neal, M., Hone, A.: Conceptual frameworks for artificial immune systems. Int. J. Unconventional Computing 1(3), 315–338 (2006)
33. Stepney, S., Smith, R.E., Timmis, J., Tyrrell, A.M.: Towards a conceptual framework for artificial immune systems. In: Nicosia, et al. (eds.) [25], pp. 53–64
34. Timmis, J.: Artificial immune systems: Today and tomorrow. Natural Computing 6(1), 1–18 (2007)
35. Timmis, J., Andrews, P.S., Owens, N., Clark, E.: An interdisciplinary perspective on artificial immune systems. Evolutionary Intelligence 1(1), 5–26 (2008)
36. Timmis, J., Bentley, P.J., Hart, E. (eds.): ICARIS 2003. LNCS, vol. 2787. Springer, Heidelberg (2003)
37. Timmis, J., Bentley, P.J. (eds.): Proceedings of the 1st International Conference on Artificial Immune Systems (ICARIS 2002). University of Kent Printing Unit (2002)
38. Timmis, J., Hone, A., Stibor, T., Clark, E.: Theoretical advances in artificial immune systems. Theoretical Computer Science (in press, 2008) (doi:10.1016/j.tcs.2008.02.011)
39. Timmis, J., Knight, T.: Artificial immune systems: Using the immune system as inspiration for data mining. In: Data Mining: A Heuristic Approach, pp. 209–230. Idea Group (2001)
40. Twycross, J., Aickelin, U.: Towards a conceptual framework for innate immunity. In: Jacob, et al. (eds.) [19], pp. 112–125
41. Voigt, D., Wirth, H., Dilger, W.: A computational model for the cognitive immune system theory based on learning classifier systems. In: de Castro, et al. (eds.) [9], pp. 264–275
Distributed Learning of Wardrop Equilibria

Dominique Barth², Olivier Bournez¹, Octave Boussaton¹, and Johanne Cohen¹

¹ LORIA/INRIA-CNRS-UHP, 615 Rue du Jardin Botanique, 54602 Villers-lès-Nancy, France
{Olivier.Bournez,Octave.Boussaton,Johanne.Cohen}@loria.fr
² Laboratoire PRiSM, Université de Versailles, 45, avenue des Etats-Unis, 78000 Versailles, France
[email protected]
Abstract. We consider the problem of learning equilibria in a well-known game-theoretic traffic model due to Wardrop. We consider a distributed learning algorithm that we prove converges to equilibria. The proof of convergence is based on a differential equation governing the global macroscopic evolution of the system, inferred from the local microscopic evolutions of the agents. We prove that the differential equation converges with the help of Lyapunov techniques.
1 Introduction
We consider in this paper a well-known game-theoretic traffic model due to Wardrop [34] (see also [30] for an alternative presentation). This model was conceived to represent road traffic, with the idea of an infinite number of agents each being responsible for an infinitesimal amount of traffic. A network equipped with non-decreasing latency functions, mapping flow on edges to latencies, is given. For each of several commodities, a certain amount of traffic, or flow demand, has to be routed from a given source to a given destination via a collection of paths. A flow in which, for all commodities, the latencies of all used paths are minimal with respect to that commodity is called a Wardrop equilibrium of the network. Whereas it is well known that such equilibria can be computed by centralized algorithms in polynomial time, as in [31] we are interested in distributed algorithms for computing Wardrop equilibria. Actually, we consider in this paper a slightly different setting from the original Wardrop model [34] (similar to the one considered in [31]): we consider that the flow is controlled by a finite number N of agents only, each of which is responsible for a fraction of the entire flow of one commodity. Each agent has a set of admissible paths among which it may distribute its flow. Each agent aims at balancing its own flow such that the jointly computed allocation will be a Wardrop equilibrium. We consider for these networks a dynamics for learning Nash equilibria in multi-person games presented in [28]. This dynamics was proved to be such that all stable stationary points are Nash equilibria for general games. Whereas for general games the dynamics is not necessarily convergent [28], we prove that the dynamics is convergent for linear Wardrop networks. We call linear Wardrop networks those in which the latency functions are affine. Our motivation behind this study is twofold. On the one hand, we want to understand if, how and when equilibria can be learned in games. The dynamics considered here has both the advantage of being decentralized and of requiring only partial and very limited information. It is a discrete stochastic dynamics played by the N players, each of whom chooses between a finite number of strategies (paths) at each instant. After each play, players are rewarded with random payoffs. In order for players to learn optimal strategies, the game is played repeatedly; hence, after each play, each player updates his strategy based solely on his current action and payoff. Our interest is in learning equilibria in games, through distributed algorithms and with minimal information for players. On the other hand, our interest in this dynamics comes from a general plan of one of the authors behind his study of the computational properties of continuous-time dynamical systems: see e.g. [6,5,3], or the survey [4]. As we noticed in the introduction of that survey, continuous-time systems arise in the experimental sciences as soon as a huge population of agents (molecules, individuals, ...) is abstracted into real quantities. Wardrop networks constitute a clear and nice example where this holds, for systems coming from road traffic [34] or from computer network traffic [30]. One strong motivation behind the current work is also to discuss the efficiency attained by such networks, and more generally by distributed systems. Our approach is based on a macroscopic abstraction of the microscopic rules of evolution of the agents involved, in terms of a differential equation governing the global state of the system. This differential equation is proved to converge for linear Wardrop networks, using Lyapunov techniques.
C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, pp. 19–32, 2008. © Springer-Verlag Berlin Heidelberg 2008
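To make the equilibrium condition concrete, here is a small Python sketch (names and setup are ours, and this is not the algorithm of the paper) for the simplest case: one commodity splitting its demand over two parallel links with affine latencies, as in linear Wardrop networks.

```python
def wardrop_two_links(a1, b1, a2, b2, demand):
    """Wardrop equilibrium split of `demand` over two parallel links with
    affine latencies l1(x) = a1*x + b1 and l2(x) = a2*x + b2 (a1 + a2 > 0).
    At equilibrium every used link has minimal latency, so either the two
    latencies are equal or all flow sits on the cheaper link."""
    # Interior solution of a1*x + b1 == a2*(demand - x) + b2
    x = (a2 * demand + b2 - b1) / (a1 + a2)
    x = min(max(x, 0.0), demand)      # clip to a feasible flow
    return x, demand - x

# Pigou-style example: l1(x) = x, l2(x) = 1, unit demand.  At equilibrium
# all traffic takes link 1 and both links have latency 1.
x1, x2 = wardrop_two_links(1.0, 0.0, 0.0, 1.0, 1.0)
print(x1, x2)   # 1.0 0.0
```

The equal-latency condition generalises to the multi-path, multi-commodity networks considered in the paper, where no closed form is available and distributed learning becomes attractive.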
Recall, however, that for general games the considered dynamics is not always convergent [28].
2 Related Work
For a survey on continuous-time systems and their computational properties, we refer to [4]. In the history of game theory, various algorithms for learning equilibrium states have been proposed: centralized and decentralized (or distributed) algorithms, games with perfect, complete or incomplete information, with a restricted number of players, etc. See e.g. [23] for an introduction to the learning automata model, and the general references in [28] for specific studies of zero-sum games, N-person games with common payoff, non-cooperative games, and so on. The Wardrop traffic model was introduced in [34] to model road traffic. More recently, it has often been considered as a model of computer network traffic. The price of anarchy, introduced in [22] in order to compare the costs of Nash equilibria to the costs of optimal (social) states, has been intensively studied for these games: see e.g. [30,29,7,16,8]. There are a few works considering dynamical versions of these games, where agents try to learn equilibria, in the spirit of this paper.
Distributed Learning of Wardrop Equilibria
21
In [13], extending [14] and [15], Fischer et al. consider a game in the original Wardrop setting, i.e. a case where each user carries an infinitesimal amount of traffic. At each round, each agent samples an alternative routing path and compares the latency of its current path with that of the sampled one. If an agent observes that it can improve its latency, then it switches with some probability that depends on the improvement offered by the better path; otherwise, it sticks to its current path. Upper bounds on the time of convergence were established for asymmetric and symmetric games. In [31], Fischer et al. consider a more tractable version of this learning algorithm, in a model with a finite number of players, similar to ours. The considered algorithm, based on a randomized path decomposition in every communication round, is however very different from ours. Nash equilibria learning algorithms for other problems have also been considered recently, in particular for load balancing problems. First, notice that the proof of existence of a pure Nash equilibrium for the load balancing problem of [22] can be turned into a dynamics: players play in turn, and move to machines with a lower load. Such a strategy can be proved to lead to a pure Nash equilibrium. Bounds on the convergence time have been investigated in [10,11]. Since players play in turns, this is often called the Elementary Step System. Other results of convergence in this model have been investigated in [17,25,27]. Concerning models that allow concurrent redecisions, we can mention the following works. In [12], tasks are allowed to migrate in parallel from overloaded to underloaded resources. The process is proved to terminate in expected O(log log n + log m) rounds. In [2], a distributed process that avoids the latter problem is considered: only local knowledge is required. The process is proved to terminate in expected O(log log n + m^4) rounds.
The analysis is also done only for unitary weights and for identical machines. The techniques involved in the proof, relying on martingales, are somewhat related to techniques for studying the classical balls-into-bins problem of allocating balls as evenly as possible. The dynamics considered in the present paper has been studied in [28] for general stochastic games, where Thathachar et al. proved that the dynamics is weakly convergent to the solution of an ordinary differential equation. This ordinary differential equation turns out to be a replicator equation. While a sufficient condition for convergence is given there, no error bounds are provided and no Lyapunov function is established for systems similar to the ones considered in this paper. Replicator equations have been deeply studied in evolutionary game theory [20,35]. Evolutionary game theory is not restricted to these dynamics, but considers a whole family of dynamics that satisfy a so-called folk theorem in the spirit of Theorem 2. Bounds on the rate of convergence have been established in [18] for fictitious play dynamics, and in [21] for the best response dynamics. Fictitious play has been reproved to be convergent for zero-sum games using numerical analysis
methods or, more generally, stochastic approximation theory: fictitious play can be proved to be a Euler discretization of a certain continuous-time process [20]. A replicator equation for allocation games has been considered in [1], where the authors establish a potential function for it. Their dynamics is not the same as ours: we have a replicator dynamics where fitnesses are given by true costs, whereas, for some reason, marginal costs are considered in [1].
3 Wardrop's Traffic Model
A routing game [34] is given by a graph G = (V, E). To each edge e = (v_1, v_2) ∈ E, where v_1, v_2 ∈ V, is associated a continuous and non-decreasing latency function ℓ_e : [0, 1] → R^+. We are given a set [k] = {1, 2, ..., k} of commodities, each of which is specified by a triplet consisting of: a source–destination pair (s_i, t_i) of G, a directed acyclic sub-graph G_i = (V_i, E_i) of G connecting s_i to t_i, and a flow demand r_i ≥ 0. The total demand is r = Σ_{i∈[k]} r_i. We assume without loss of generality that r = 1. Let P_i denote the admissible paths of commodity i, i.e. all paths connecting s_i and t_i in G_i. We may assume that the sets P_i are disjoint and define i_P to be the unique commodity to which path P belongs. A non-negative path flow vector (f_P)_{P∈P} is feasible if it satisfies the flow demands Σ_{P∈P_i} f_P = r_i for all i ∈ [k]. A path flow vector (f_P)_{P∈P} induces an edge flow vector f = (f_{e,i})_{e∈E, i∈[k]} with f_{e,i} = Σ_{P∈P_i : e∈P} f_P. The total flow on edge e is f_e = Σ_{i∈[k]} f_{e,i}. The latency of an edge e is given by ℓ_e(f_e), and the latency of a path P is the sum of the edge latencies, ℓ_P(f) = Σ_{e∈P} ℓ_e(f_e). A flow vector in this model is considered stable when no fraction of the flow can improve its latency by moving unilaterally to another path. It is easy to see that this implies that all used paths must have the same (minimal) latency.

Definition 1 (Wardrop Equilibrium). A feasible flow vector f is at a Wardrop equilibrium if for every commodity i ∈ [k] and paths P_1, P_2 ∈ P_i with f_{P_1} > 0, ℓ_{P_1}(f) ≤ ℓ_{P_2}(f) holds.

We now extend the original Wardrop model [34] to an N-player game as follows (a similar setting has been considered in [31]). We assume that we have a finite set [N] of players. Each player is associated to one commodity, and is in charge of a fraction w_i of the total flow r_i of that commodity. Each player (agent) aims at balancing its own flow in such a way that its latency becomes minimal.
In the present work, we restrict our investigation to the case of linear cost functions: we assume that for every edge e there are constants α_e and β_e such that ℓ_e(λ) = α_e λ + β_e.
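As a concrete illustration of Definition 1 (a toy Pigou-style instance of our own, not an example from the paper), consider a single commodity with demand r = 1 over two parallel links with affine latencies ℓ_1(x) = x and ℓ_2(x) = 1. The sketch below checks the equilibrium condition numerically.

```python
# Toy Pigou-style instance (illustrative, not from the paper): one commodity,
# demand r = 1, two parallel links with affine latencies l1(x) = x, l2(x) = 1.
def latencies(f):
    # f = (f1, f2) is a feasible path flow vector with f1 + f2 = 1
    return (f[0], 1.0)

def is_wardrop(f, tol=1e-9):
    # Definition 1: every used path has latency <= that of every other path
    ls = latencies(f)
    return all(ls[p] <= ls[q] + tol
               for p in range(2) if f[p] > 0
               for q in range(2))

assert is_wardrop((1.0, 0.0))      # all flow on link 1: both paths have latency 1
assert not is_wardrop((0.5, 0.5))  # link 2 is used but link 1 is strictly cheaper
```

At the equilibrium (1, 0) every infinitesimal agent experiences latency 1, which also illustrates why a Wardrop equilibrium need not be socially optimal.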
4 Game Theoretic Settings
We assume that players distribute their flow selfishly without any centralized control and only have a local view of the system. All players know how many
paths are available. We suppose that the game is played repeatedly. At each elementary step t, players know their cost and the path they chose at each step t' < t. Each of them selects a path at time step t according to a mixed strategy q_j(t), with q_{j,s}(t) denoting the probability for player j to select path s at step t. Any player associated to commodity i has the finite set of actions P_i. We assume that paths are known by and available to all of the players. An element of P_i is called a pure strategy. Define payoff functions d_i : ∏_{j=1}^N P → [0, 1], 1 ≤ i ≤ N, by:

d_i(a_1, a_2, ..., a_N) = cost for i | player j chose action a_j ∈ P, 1 ≤ j ≤ N,    (1)
where (a_1, ..., a_N) is the tuple of pure strategies played by all the players. In our case, d_i(a_1, a_2, ..., a_N) = ℓ_{a_i}(f), where f is the flow induced by a_1, a_2, ..., a_N. We call it the payoff function, or utility function, of player i, and the objective of all players is to minimize their payoff. Now we want to extend the payoff function to mixed strategies. To do so, let S_p denote the simplex of dimension p, which is the set of p-dimensional probability vectors:

S_p = {q = (q_1, ..., q_p) ∈ [0, 1]^p : Σ_{s=1}^p q_s = 1}.    (2)

For a player associated to commodity i, we abusively write S for S_{|P_i|}, i.e. the set of its mixed strategies. We denote by K = S^N the space of mixed strategies. Payoff functions d_i defined on pure strategies in Equation (1) can be extended to functions d_i on the space of mixed strategies K as follows:

d_i(q_1, ..., q_N) = E[cost for i | player z employs strategy q_z, 1 ≤ z ≤ N]
                  = Σ_{j_1,...,j_N} d_i(j_1, ..., j_N) × ∏_{z=1}^N q_{z,j_z},    (3)
where (q_1, ..., q_N) is the tuple of mixed strategies played by the set of players and E denotes a conditional expectation.

Definition 2. The N-tuple of mixed strategies (q̃_1, ..., q̃_N) is said to be a Nash equilibrium (in mixed strategies) if, for each i, 1 ≤ i ≤ N, we have:

d_i(q̃_1, ..., q̃_{i-1}, q̃_i, q̃_{i+1}, ..., q̃_N) ≤ d_i(q̃_1, ..., q̃_{i-1}, q, q̃_{i+1}, ..., q̃_N)  ∀q ∈ S.    (4)
It is well known that every n-person game has at least one Nash equilibrium in mixed strategies [26]. We define K* = (S*)^N, where S* = {q ∈ S | q is a probability vector with one component equal to unity}, as the set of corners of the strategy space K. Clearly, K* can be put in one-to-one correspondence with the pure strategies. An N-tuple of actions (ã_1, ..., ã_N) can similarly be defined to be a pure Nash equilibrium. Now the learning problem can be stated as follows. Assume that we play a stochastic repeated game with incomplete information, and let q_i[k] be the strategy
employed by the i-th player at instant k. Let a_i[k] and c_i[k] be the action selected and the payoff obtained by player i, respectively, at time k (k = 0, 1, 2, ...). Find a decentralized learning algorithm T_i, with q_i[k+1] = T_i(q_i[k], a_i[k], c_i[k]), such that q_i[k] → q̃_i as k → +∞, where (q̃_1, ..., q̃_N) is a Nash equilibrium of the game.
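For intuition, the extension (3) to mixed strategies can be evaluated by brute-force enumeration of pure profiles. The sketch below does this for a hypothetical instance (two players on two parallel affine links; all constants are illustrative) and checks that d_i equals the q_i-average of the conditional expected costs E[cost for i | a_i = s], the quantities that reappear later as h_{i,s}.

```python
from itertools import product

alpha, beta = [0.5, 0.8], [0.2, 0.05]   # affine latencies l_e(x) = alpha*x + beta (illustrative)
w = [0.6, 0.4]                          # player weights, one commodity with r = 1

def cost(i, actions):
    # d_i(a_1, a_2): latency of player i's chosen link under the induced edge flow
    e = actions[i]
    load = sum(w[j] for j in range(2) if actions[j] == e)
    return alpha[e] * load + beta[e]

def d(i, q):
    # Equation (3): expectation over the product of the mixed strategies
    return sum(cost(i, a) * q[0][a[0]] * q[1][a[1]]
               for a in product(range(2), repeat=2))

def h(i, s, q):
    # E[cost for i | player i plays s], averaging over the opponent's strategy
    other = 1 - i
    total = 0.0
    for t in range(2):
        a = [0, 0]
        a[i], a[other] = s, t
        total += q[other][t] * cost(i, tuple(a))
    return total

q = [[0.7, 0.3], [0.2, 0.8]]
for i in range(2):
    # d_i is the q_i-average of the conditional expected costs
    assert abs(d(i, q) - sum(q[i][s] * h(i, s, q) for s in range(2))) < 1e-12
    assert 0.0 <= d(i, q) <= 1.0   # payoffs lie in [0, 1] as required
```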
5 Distributed Algorithm
We consider the following learning algorithm, already considered in [23,28], and also called the Linear Reward–Inaction (L_{R−I}) algorithm.

Definition 3 (Considered Algorithm)
1. At every time step, each player chooses an action according to its current Action Probability Vector (APV). Thus, the i-th player selects path s = a_i(k) at instant k with probability q_{i,s}(k).
2. Each player obtains a payoff based on the set of all actions. We denote the reward to player i at time k by c_i(k) = ℓ_{a_i}(f(k)).
3. Each player updates its APV according to the rule:

q_i(k+1) = q_i(k) + b × (1 − c_i(k)) × (e_{a_i(k)} − q_i(k)),  i = 1, ..., N,    (5)
where 0 < b < 1 is a parameter and e_{a_i(k)} is the unit vector with its a_i(k)-th component equal to unity. It is easy to see that decisions made by players are completely decentralized: at each time step, player i only needs c_i and a_i, respectively its payoff and its last action, to update its APV. Notice that, componentwise, Equation (5) can be rewritten as:

q_{i,s}(k+1) = q_{i,s}(k) − b(1 − c_i(k)) q_{i,s}(k)        if a_i ≠ s,
q_{i,s}(k+1) = q_{i,s}(k) + b(1 − c_i(k))(1 − q_{i,s}(k))    if a_i = s.    (6)

Let Q[k] = (q_1(k), ..., q_N(k)) ∈ K denote the state of the player team at instant k. Our interest is in the asymptotic behavior of Q[k] and its convergence to a Nash equilibrium. Clearly, under the learning algorithm specified by (5), {Q[k], k ≥ 0} is a Markov process. Observe that this dynamics can also be put in the form

Q[k+1] = Q[k] + b · G(Q[k], a[k], c[k]),
(7)
where a[k] = (a_1(k), ..., a_N(k)) denotes the actions selected by the player team at instant k, c[k] = (c_1(k), ..., c_N(k)) their resulting payoffs, and G(·,·,·) is a function representing the updating specified by Equation (5), which does not depend on b. Consider the piecewise-constant interpolation Q^b(·) of Q[k], defined by

Q^b(t) = Q[k],  t ∈ [kb, (k+1)b],

where b is the parameter used in (5).
(8)
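Before passing to the b → 0 limit, the update rule (5)–(6) can be made concrete with a minimal simulation sketch (a hypothetical two-link, two-player instance; all constants illustrative). Since the update is a convex combination of q_i(k) and the unit vector e_{a_i(k)}, the APVs remain probability vectors, which the sketch checks.

```python
import random

alpha, beta = [0.5, 0.8], [0.2, 0.05]   # affine latencies (illustrative)
w = [0.6, 0.4]                          # player weights
b = 0.05                                # learning parameter, 0 < b < 1

random.seed(0)
q = [[0.5, 0.5], [0.5, 0.5]]            # initial APVs

for _ in range(5000):
    # each player samples an action from its current APV
    a = [random.choices([0, 1], weights=q[i])[0] for i in range(2)]
    for i in range(2):
        load = sum(w[j] for j in range(2) if a[j] == a[i])
        c = alpha[a[i]] * load + beta[a[i]]          # payoff c_i(k) in [0, 1]
        for s in range(2):                           # rule (6), componentwise
            if s == a[i]:
                q[i][s] += b * (1 - c) * (1 - q[i][s])
            else:
                q[i][s] -= b * (1 - c) * q[i][s]

# The APVs remain probability vectors throughout the run.
for qi in q:
    assert all(0.0 <= x <= 1.0 for x in qi)
    assert abs(sum(qi) - 1.0) < 1e-9
```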
Distributed Learning of Wardrop Equilibria
25
Q^b(·) belongs to the space of functions from R into K that are right continuous and have left-hand limits. Now consider the sequence {Q^b(·) : b > 0}. We are interested in the limit Q(·) of this sequence as b → 0. The following is proved in [28]:

Proposition 1 ([28]). The sequence of interpolated processes {Q^b(·)} converges weakly, as b → 0, to Q(·), the (unique) solution of the Cauchy problem

dQ/dt = φ(Q),  Q(0) = Q_0,
(9)
where Q_0 = Q^b(0) = Q[0], and φ : K → K is given by φ(Q) = E[G(Q[k], a[k], c[k]) | Q[k] = Q], where G is the function in Equation (7). Recall that a family of random variables (Y_t)_{t∈R} converges weakly to a random variable Y if E[h(Y_t)] converges to E[h(Y)] for each bounded and continuous function h. This is equivalent to convergence in distribution. The proof of Proposition 1 in [28], which works for general games (even with stochastic payoffs), is based on constructions from [24], in turn based on [32], i.e. on weak-convergence methods; it is non-constructive in several aspects and does not provide error bounds. It is actually possible to provide a bound on the error between Q(t) and the expectation of Q^b(t) in some cases.

Theorem 1. Let Q[k] be a process defined by an equation of type (7), and let Q^b(·) be the corresponding piecewise-constant interpolation, given by (8). Assume that E[G(Q[k], a[k], c[k])] = φ(E[Q[k]]) for some function φ of class C^1. Let ε(t) be the error in approximating the expectation of Q^b(t) by Q(t):

ε(t) = ||E[Q^b(t)] − Q(t)||,

where Q(·) is the (unique) solution of the Cauchy problem

dQ/dt = φ(Q),  Q(0) = Q_0,    (10)

where Q_0 = Q^b(0) = Q[0]. Then we have

ε(t) ≤ M b (e^{Λt} − 1) / (2Λ)

for t of the form t = kb, where Λ = max_{i,ℓ} ||∂φ/∂q_{i,ℓ}|| and M is a bound on the norm of Q''(t) = dφ(Q(t))/dt.
Proof. The general idea of the proof is to consider the dynamics (7) as an Euler discretization of the ordinary differential equation (10), and then to use classical numerical analysis techniques to bound the error at time t.
Indeed, by hypothesis we have

E[Q[k+1]] = E[Q[k]] + b · E[G(Q[k], a[k], c[k])] = E[Q[k]] + b φ(E[Q[k]]).

Suppose that φ(·) is Λ-Lipschitz: ||φ(x) − φ(x')|| ≤ Λ ||x − x'|| for some positive Λ. From the Taylor–Lagrange inequality, we can always take Λ = max_{i,ℓ} ||∂φ/∂q_{i,ℓ}|| if φ is of class C^1. We can write

ε((k+1)b) = ||E[Q^b((k+1)b)] − Q((k+1)b)||
  ≤ ||E[Q^b((k+1)b)] − E[Q^b(kb)] − b φ(Q(kb))|| + ||E[Q^b(kb)] − Q(kb)|| + ||Q(kb) − Q((k+1)b) + b φ(Q(kb))||
  = ||b φ(E[Q^b(kb)]) − b φ(Q(kb))|| + ε(kb) + ||b φ(Q(kb)) − ∫_{kb}^{(k+1)b} φ(Q(t')) dt'||
  ≤ Λ b ||E[Q^b(kb)] − Q(kb)|| + ε(kb) + e(kb)
  ≤ (1 + Λb) ε(kb) + e(kb),

where e(kb) = ||b φ(Q(kb)) − ∫_{kb}^{(k+1)b} φ(Q(t')) dt'||. From the Taylor–Lagrange inequality, we know that e(kb) ≤ K = M b²/2, where M is a bound on the norm of Q''(t) = dφ(Q(t))/dt. By an easy recurrence on k (sometimes called the discrete Gronwall lemma, see e.g. [9]), using the inequality ε((k+1)b) ≤ (1 + Λb) ε(kb) + K, we get

ε(kb) ≤ (1 + Λb)^k ε(0) + K ((1 + Λb)^k − 1) / ((1 + Λb) − 1) ≤ K (e^{kΛb} − 1) / (Λb) = M b (e^{kΛb} − 1) / (2Λ),

using that (1 + u)^k ≤ e^{ku} for all u ≥ 0 and that ε(0) = 0. This completes the proof.

Using (6), we can rewrite E[G(Q[k], a[k], c[k])] in the general case as follows:

E[G(Q[k], a[k], c[k])]_{i,s}
  = q_{i,s}(1 − q_{i,s})(1 − E[c_i | Q(k), a_i = s]) − Σ_{s'≠s} q_{i,s} q_{i,s'}(1 − E[c_i | Q(k), a_i = s'])
  = q_{i,s} [ Σ_{s'≠s} q_{i,s'}(1 − E[c_i | Q(k), a_i = s]) − Σ_{s'≠s} q_{i,s'}(1 − E[c_i | Q(k), a_i = s']) ]
  = −q_{i,s} Σ_{s'} q_{i,s'}(E[c_i | Q(k), a_i = s] − E[c_i | Q(k), a_i = s']),    (11)

using the fact that 1 − q_{i,s} = Σ_{s'≠s} q_{i,s'}. Let h_{i,s} be the expected payoff for player i if it plays pure strategy s and the players j ≠ i play (mixed) strategies q_j. Formally,

h_{i,s}(Q) = h_{i,s}(q_1, ..., q_{i−1}, s, q_{i+1}, ..., q_N) = E[cost for i | Q(k), a_i = s].
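The bound of Theorem 1 can be checked numerically in a toy scalar case where everything is explicit: take φ(Q) = −Q with Q_0 = 1 (so Λ = 1, the exact solution is e^{−t}, and M = 1 bounds ||dφ(Q(t))/dt|| since |Q(t)| ≤ 1). The Euler iterates then play the role of E[Q^b(kb)].

```python
import math

b, Lam, M = 0.01, 1.0, 1.0   # step, Lipschitz constant, bound on |dphi(Q(t))/dt|
q = 1.0                      # Euler iterate for dQ/dt = -Q, Q(0) = 1

for k in range(1, 201):      # up to t = 2
    q += b * (-q)            # Euler step: Q[k+1] = Q[k] + b * phi(Q[k])
    t = k * b
    err = abs(q - math.exp(-t))                     # |Euler iterate - exact|
    bound = M * b * (math.exp(Lam * t) - 1) / (2 * Lam)
    assert err <= bound                             # the Theorem 1 bound holds
```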
Let h_i(Q) denote the mean value of the h_{i,s}, in the sense that h_i(Q) = Σ_s q_{i,s} h_{i,s}(Q).
We obtain from (11):

E[G(Q[k], a[k], c[k])]_{i,s} = −q_{i,s}(h_{i,s}(Q) − h_i(Q)).
(12)
Hence, the dynamics given by the ordinary differential equation (9) is, componentwise:

dq_{i,s}/dt = −q_{i,s}(h_{i,s}(Q) − h_i(Q)).    (13)

This is a replicator equation, that is to say, a well-known and much-studied dynamics in evolutionary game theory [20,35]. In this context, h_{i,s}(Q) is interpreted as the fitness of strategy s for player i in the given game, and h_i(Q) is the mean value of the expected fitness in the above sense. In particular, solutions are known to satisfy the following theorem (sometimes called the Evolutionary Game Theory Folk Theorem) [20,28].

Theorem 2 (see e.g. [20,28]). The following are true for the solutions of the replicator equation (13):
– All corners of the space K are stationary points.
– All Nash equilibria are stationary points.
– All strict Nash equilibria are asymptotically stable.
– All stable stationary points are Nash equilibria.
From this theorem, we can conclude that the dynamics (13), and hence the learning algorithm as b goes to 0, will never converge to a point of K which is not a Nash equilibrium. However, for general games there is no convergence in the general case [28]. We will now show that for linear Wardrop games there is always convergence. It will then follow that the learning algorithm we are considering converges towards Nash equilibria, i.e. solves the learning problem for linear Wardrop games. First, we specialize the dynamics to our routing games. We have

ℓ_{a_i}(f) = Σ_{e∈a_i} ℓ_e(λ_e) = Σ_{e∈a_i} [β_e + α_e w_i + α_e Σ_{j≠i} 1_{e∈a_j} w_j],    (14)

where 1_{e∈a_j} is 1 whenever e ∈ a_j, and 0 otherwise. Let us also introduce the following notation:

prob(e, Q)_i = Σ_{P∈P_i} q_{i,P} × 1_{e∈P},    (15)

which denotes the probability for player i to use edge e, given its probability vector q_i.
Using the expectation of the utility for player i using path s, we get

h_{i,s}(Q) = Σ_{e∈s} [β_e + α_e w_i + α_e Σ_{j≠i} Σ_{P∈P_j} q_{j,P} × 1_{e∈P} w_j],

which we can also write, from (15), as

h_{i,s}(Q) = Σ_{e∈s} [β_e + α_e w_i + α_e Σ_{j≠i} prob(e, Q)_j w_j].
We claim the following.

Theorem 3 (Extension of Theorem 3.3 from [28]). Suppose there is a non-negative function F : K → R such that, for some constants w_i > 0, for all i, s, Q,

∂F(Q)/∂q_{i,s} = w_i × h_{i,s}(Q).    (16)
Then the learning algorithm, for any initial condition in K − K*, always converges to a Nash equilibrium.

Proof. We claim that F(·) is monotone along trajectories. We have:

dF(Q(t))/dt = Σ_{i,s} (∂F/∂q_{i,s}) (dq_{i,s}/dt)
  = −Σ_{i,s} w_i h_{i,s}(Q) q_{i,s} [h_{i,s}(Q) − h_i(Q)]
  = −Σ_i w_i Σ_s q_{i,s} h_{i,s}(Q) Σ_{s'} q_{i,s'} [h_{i,s}(Q) − h_{i,s'}(Q)]
  = −Σ_i w_i Σ_s Σ_{s'} q_{i,s} q_{i,s'} [h_{i,s}(Q)² − h_{i,s}(Q) h_{i,s'}(Q)]
  = −Σ_i w_i Σ_s Σ_{s'>s} q_{i,s} q_{i,s'} [h_{i,s}(Q) − h_{i,s'}(Q)]²
  ≤ 0.    (17)
Thus F is decreasing along the trajectories of the ODE and, due to the nature of the ODE (13), trajectories with initial conditions in K remain confined to K. Hence, from the Lyapunov stability theorem (see e.g. [19], page 194), if we denote by Q* an equilibrium point, we can define L(Q) = F(Q) − F(Q*) as a Lyapunov function of the game. Asymptotically, all trajectories will be in the set K' = {Q* ∈ K : dF(Q*)/dt = 0}. From (17), we know that dF(Q*)/dt = 0 implies q_{i,s} q_{i,s'} [h_{i,s}(Q) − h_{i,s'}(Q)] = 0 for all i, s, s'. Such a Q* is thus a stationary point of the dynamics. Since, from Theorem 2, all stationary points that are not Nash equilibria are unstable, the theorem follows.

We claim that such a function exists for linear Wardrop games.
Proposition 2. For linear Wardrop games as we defined them earlier, the following function F satisfies the hypothesis of the previous theorem:

F(Q) = Σ_{e∈E} [ β_e Σ_{j=1}^N w_j × prob(e, Q)_j + (α_e/2)(Σ_{j=1}^N w_j × prob(e, Q)_j)² + α_e Σ_{j=1}^N w_j² × prob(e, Q)_j × (1 − prob(e, Q)_j/2) ].    (18)

Notice that the hypothesis of affine cost functions is crucial here.

Proof. We use the fact that F(Q) is of the form Σ_{e∈E} expr(e, Q) in order to lighten the next few lines. We have

∂F(Q)/∂q_{i,s} = Σ_{e∈E} ∂expr(e, Q)/∂q_{i,s} = Σ_{e∈E} (∂expr(e, Q)/∂prob(e, Q)_i) × (∂prob(e, Q)_i/∂q_{i,s}).

Note that, from (15), ∂prob(e, Q)_i/∂q_{i,s} = 1_{e∈s}; we then get

∂F(Q)/∂q_{i,s} = Σ_{e∈E} (∂expr(e, Q)/∂prob(e, Q)_i) × 1_{e∈s} = Σ_{e∈s} ∂expr(e, Q)/∂prob(e, Q)_i.    (19)

Let us now develop the derivative of each term of the sum and come back to (19) in the end. We have

∂expr(e, Q)/∂prob(e, Q)_i = β_e × w_i + α_e × w_i (Σ_{j=1}^N w_j × prob(e, Q)_j) + α_e w_i² (1 − prob(e, Q)_i)
  = β_e × w_i + α_e × w_i (Σ_{j≠i} w_j × prob(e, Q)_j) + α_e w_i².

This finally leads to:

∂F(Q)/∂q_{i,s} = Σ_{e∈s} [β_e × w_i + α_e × w_i (Σ_{j≠i} w_j × prob(e, Q)_j) + α_e w_i²] = w_i × h_{i,s}(Q).
We showed that Equation (16) holds, which ends the proof and confirms that F is a good potential function for such a game.

Proposition 3. Suppose, for example, that the cost functions were quadratic: ℓ_e(λ_e) = α_e λ_e² + β_e λ_e + γ_e, with α_e, β_e, γ_e ≥ 0 and α_e ≠ 0. Then there cannot exist a function F of class C² that satisfies (16) for all i, s, Q and a general choice of weights (w_i)_i.
Proof. By Schwarz's theorem, we must have

∂/∂q_{i',s'} (∂F/∂q_{i,s}) = ∂/∂q_{i,s} (∂F/∂q_{i',s'}),

and hence

W_i × ∂h_{i,s}/∂q_{i',s'} = W_{i'} × ∂h_{i',s'}/∂q_{i,s}

for all i, i', s, s', for some constants W_i, W_{i'}. It is easy to see that this does not hold in this case for a general choice of Q and weights (w_i)_i.

Coming back to our model (with affine costs), we obtain the following result:

Theorem 4. For linear Wardrop games, for any initial condition in K − K*, the considered learning algorithm converges to a (mixed) Nash equilibrium.
6 Conclusion
In this paper we considered the classical Wardrop traffic model, into which we introduced some specific dynamical aspects. We considered an update algorithm proposed in [28], and we proved that the depicted learning algorithm is able to learn mixed Nash equilibria of the game, extending several results of [28]. To do so, we proved that the learning algorithm is asymptotically equivalent to an ordinary differential equation, which turns out to be a replicator equation. By a folk theorem from evolutionary game theory, one knows that if the dynamics converges, it converges towards Nash equilibria. We proved, using a Lyapunov function argument, that the dynamics converges in our considered settings. We also established, in some special cases, bounds on the time required before convergence, based on the analysis of the dynamics and on numerical analysis arguments. We are also investigating the use of this dynamics on other games which are known to have a potential function, such as load balancing problems [22,33]. We also believe that this paper yields a very nice example of the study of distributed systems through a macroscopic view of a set of distributed systems defined by microscopic rules: whereas the microscopic rules are quite simple and based on local views, the macroscopic evolution computes global equilibria of the system. We intend to pursue our investigations on the computational properties of distributed systems through similar macroscopic continuous-time dynamical system views.
References 1. Altman, E., Hayel, Y., Kameda, H.: Evolutionary Dynamics and Potential Games in Non-Cooperative Routing. In: Wireless Networks: Communication, Cooperation and Competition (WNC3 2007) (2007)
2. Berenbrink, P., Friedetzky, T., Goldberg, L.A., Goldberg, P., Hu, Z., Martin, R.: Distributed Selfish Load Balancing. In: SODA 2006: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 354–363. ACM, New York (2006)
3. Bournez, O.: Modèles Continus. Calculs. Algorithmique Distribuée. Habilitation (HDR), Institut National Polytechnique de Lorraine (December 7, 2006)
4. Bournez, O., Campagnolo, M.L.: A Survey on Continuous Time Computations. In: New Computational Paradigms. Changing Conceptions of What is Computable, pp. 383–423. Springer, New York (2008)
5. Bournez, O., Campagnolo, M.L., Graça, D.S., Hainry, E.: Polynomial Differential Equations Compute All Real Computable Functions on Computable Compact Intervals. Journal of Complexity 23(3), 317–335 (2007)
6. Bournez, O., Hainry, E.: Recursive Analysis Characterized as a Class of Real Recursive Functions. Fundamenta Informaticae 74(4), 409–433 (2006)
7. Cole, R., Dodis, Y., Roughgarden, T.: How much can taxes help selfish routing? In: Proceedings of the 4th ACM Conference on Electronic Commerce (EC 2003), pp. 98–107. ACM Press, New York (2003)
8. Cominetti, R., Correa, J.R., Stier-Moses, N.E.: Network Games with Atomic Players. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4051, pp. 525–536. Springer, Heidelberg (2006)
9. Demailly, J.-P.: Analyse Numérique et Équations Différentielles. Presses Universitaires de Grenoble (1991)
10. Even-Dar, E., Kesselman, A., Mansour, Y.: Convergence Time to Nash Equilibria. In: 30th International Conference on Automata, Languages and Programming (ICALP), pp. 502–513 (2003)
11. Even-Dar, E., Kesselman, A., Mansour, Y.: Convergence Time to Nash Equilibrium in Load Balancing. ACM Transactions on Algorithms 3(3) (2007)
12. Even-Dar, E., Mansour, Y.: Fast Convergence of Selfish Rerouting. In: SODA 2005: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 772–781. Society for Industrial and Applied Mathematics (2005)
13. Fischer, S., Räcke, H., Vöcking, B.: Fast Convergence to Wardrop Equilibria by Adaptive Sampling Methods. In: Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing, pp. 653–662 (2006)
14. Fischer, S., Vöcking, B.: On the Evolution of Selfish Routing. In: Albers, S., Radzik, T. (eds.) ESA 2004. LNCS, vol. 3221. Springer, Heidelberg (2004)
15. Fischer, S., Vöcking, B.: Adaptive Routing with Stale Information. In: Proceedings of the Twenty-Fourth Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, pp. 276–283 (2005)
16. Fleischer, L.: Linear Tolls Suffice: New Bounds and Algorithms for Tolls in Single Source Networks. Theoretical Computer Science 348(2-3), 217–225 (2005)
17. Goldberg, P.W.: Bounds for the Convergence Rate of Randomized Local Search in a Multiplayer Load-Balancing Game. In: PODC 2004: Proceedings of the Twenty-Third Annual ACM Symposium on Principles of Distributed Computing, pp. 131–140. ACM, New York (2004)
18. Harris, C.: On the Rate of Convergence of Continuous-Time Fictitious Play. Games and Economic Behavior 22(2), 238–259 (1998)
19. Hirsch, M.W., Smale, S., Devaney, R.: Differential Equations, Dynamical Systems, and an Introduction to Chaos. Elsevier Academic Press, Amsterdam (2003)
20. Hofbauer, J., Sigmund, K.: Evolutionary Game Dynamics. Bulletin of the American Mathematical Society 40, 479–519 (2003)
21. Hofbauer, J., Sorin, S.: Best Response Dynamics for Continuous Zero-Sum Games. Discrete and Continuous Dynamical Systems - Series B 6(1) (2006)
22. Koutsoupias, E., Papadimitriou, C.: Worst-case Equilibria. In: Meinel, C., Tison, S. (eds.) STACS 1999. LNCS, vol. 1563, pp. 404–413. Springer, Heidelberg (1999)
23. Thathachar, M.A.L., Narendra, K.S.: Learning Automata: An Introduction. Prentice Hall, Englewood Cliffs (1989)
24. Kushner, H.J.: Approximation and Weak Convergence Methods for Random Processes, with Applications to Stochastic Systems Theory. MIT Press, Cambridge (1984)
25. Libman, L., Orda, A.: Atomic Resource Sharing in Noncooperative Networks. Telecommunication Systems 17(4), 385–409 (2001)
26. Nash, J.F.: Equilibrium Points in n-Person Games. Proceedings of the National Academy of Sciences 36, 48–49 (1950)
27. Orda, A., Rom, R., Shimkin, N.: Competitive Routing in Multi-user Communication Networks. IEEE/ACM Transactions on Networking (TON) 1(5), 510–521 (1993)
28. Thathachar, M.A.L., Sastry, P.S., Phansalkar, V.V.: Decentralized Learning of Nash Equilibria in Multi-Person Stochastic Games With Incomplete Information. IEEE Transactions on Systems, Man, and Cybernetics 24(5) (1994)
29. Roughgarden, T.: How unfair is optimal routing? In: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 203–204 (2002)
30. Roughgarden, T., Tardos, É.: How bad is selfish routing? Journal of the ACM 49(2), 236–259 (2002)
31. Olbrich, L., Fischer, S., Vöcking, B.: Approximating Wardrop Equilibria with Finitely Many Agents. In: Pelc, A. (ed.) DISC 2007. LNCS, vol. 4731, pp. 238–252. Springer, Heidelberg (2007)
32. Stroock, D.W., Varadhan, S.R.S.: Multidimensional Diffusion Processes. Springer, Heidelberg (1979)
33. Vöcking, B.: Selfish Load Balancing. In: Algorithmic Game Theory. Cambridge University Press, Cambridge (2007)
34. Wardrop, J.: Some Theoretical Aspects of Road Traffic Research. Proceedings of the Institution of Civil Engineers, Part II 1(36), 352–362 (1952)
35. Weibull, J.W.: Evolutionary Game Theory. The MIT Press, Cambridge (1995)
Oracles and Advice as Measurements

Edwin Beggs 1, José Félix Costa 2,3, Bruno Loff 2,3, and John V. Tucker 1

1 School of Physical Sciences, Swansea University, Singleton Park, Swansea, SA2 8PP, Wales, United Kingdom
[email protected], [email protected]
2 Department of Mathematics, Instituto Superior Técnico, Universidade Técnica de Lisboa, Lisboa, Portugal
[email protected], [email protected]
3 Centro de Matemática e Aplicações Fundamentais do Complexo Interdisciplinar, Universidade de Lisboa, Lisboa, Portugal
Abstract. In this paper we will try to understand how oracles and advice functions, which are mathematical abstractions in the theory of computability and complexity, can be seen as physical measurements in Classical Physics. First, we consider how physical measurements are a natural external source of information to an algorithmic computation, using a simple and engaging case study, namely: Hoyle’s algorithm for calculating eclipses at Stonehenge. Next, we argue that oracles and advice functions can help us understand how the structure of space and time has information content that can be processed by Turing machines. Using an advanced case study from Newtonian kinematics, we show that non-uniform complexity is an adequate framework for classifying feasible computations by Turing machines interacting with an oracle in Nature, and that by classifying the information content of such a natural oracle, using Kolmogorov complexity, we obtain a hierarchical structure based on measurements, advice classes and information.
1 Introduction
In computability theory, the basic operations of algorithmic models, such as register machines, may be extended with sets, or (partial) functions, called “oracles.” For example, in Turing’s original conception, any set S can be used as an oracle in an algorithm as follows: from time to time in the course of a computation, an algorithm produces a datum x and asks “Is x ∈ S?”. The basic properties of universality, undecidability, etc., can be proved for these S-computable functions. Technically, there is nothing special about the operations chosen to be basic in an algorithmic model. This fact is characteristic of computability theories over abstract algebras ([21,22]) where, typically, one chooses interesting
Corresponding author.
C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, pp. 33–50, 2008. c Springer-Verlag Berlin Heidelberg 2008
operations with which to program. In classical computability theory on the natural numbers, oracles are seen as technical devices used to compare and classify sets by means of degree theories and hierarchies. However, here we will argue that it is a useful, interesting, even beautiful, endeavour to develop a computability theory wherein oracles are natural phenomena, and to study oracles that arise in Nature. More specifically, we will consider how physical measurements can be a natural external source of information for an algorithm, especially automata and Turing machines. First, in Section 2, we reflect on an example of an algorithm that has need of a physical oracle. Hoyle's algorithm calculates eclipses using the ancient monument Stonehenge. Abstractly, it has the structure of an automaton with an oracle accessed by experimental observation. Our analysis focusses on calculating solar eclipses and how the oracle is needed to make corrections. In examining Hoyle's algorithm, our aim is to explore some of the essential features of digital computations that may depend on analogue oracles in Nature, and to set the scene for the theory that follows. Next, we study this conceptually complex type of computation by means of an advanced case study. For a physical realisation of an oracle, we choose a physical experiment that we have already studied in some detail from the computational point of view. The Scatter Machine Experiment (SME) is an experimental procedure that measures the position of a vertex of a wedge to arbitrary accuracy [7]. Since the position may itself be arbitrary, it is possible to analyse the ways in which a simple experiment in Newtonian kinematics can measure or compute an arbitrary real in the interval [0, 1].
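The way such an experiment extracts information can be pictured as a bisection search: each run of the experiment reveals on which side of the current test position the vertex lies, hence one binary digit. The sketch below is an idealized abstraction of this idea, not the precise protocol of [7]; the function names and the "left"/"right" outcome are illustrative.

```python
def scatter(cannon, vertex):
    # idealized experiment: report on which side of the vertex the shot lands
    return "right" if cannon < vertex else "left"

def measure(vertex, bits):
    # bisection: each experiment yields one binary digit of the vertex position
    lo, hi = 0.0, 1.0
    for _ in range(bits):
        mid = (lo + hi) / 2
        if scatter(mid, vertex) == "right":
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2          # estimate with error at most 2**-(bits+1)

y = 0.62890625                    # an arbitrary position in [0, 1]
assert abs(measure(y, 20) - y) < 2 ** -20
```

Each additional query halves the uncertainty, which is why the number of experiments needed grows linearly with the number of bits of the vertex position one wishes to read.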
In [5], we examined three ways in which the SME can be used as an oracle for Turing machines and established the complexity classes of the sets they define using non-uniform complexity theory; the three involve exact, arbitrary precision, and fixed precision oracle calls. With this technical knowledge from [7,5], in this paper we pull these ideas together and go on to consider how physical measurements are a natural external source of information to an algorithmic computation. Using Kolmogorov complexity theory, we introduce an information complexity notion and show that the information content of the position of the wedge can be used to classify the complexity of Turing computations with the SME as oracle, and that this classification is monotonic with respect to a natural information ordering on vertices. We find:

Theorem. The class of sets defined by Turing machines in polynomial time with the SME as oracle is the class P/poly, and the complexity of these sets can be measured by the Kolmogorov complexity of the position of the vertex.

In Section 3, we summarise what we need about oracles and advice functions in order to understand how the structure of space and time may have information content that can be processed by Turing machines (after Cooper and Odifreddi ([9]) and Copeland and Proudfoot ([10,11])). In Section 4, we introduce reductions between advice functions and, in Section 5, concepts based on the Kolmogorov complexity measure are used to express the information content that can be processed by Turing machines. In Section 6 we recall the essential details of the SME. In Section 7, we apply information complexity notions to
Oracles and Advice as Measurements
35
the SME and prove the above theorem, which suggests an inner structure of the advice class P/poly, similar to the one found in [3,20].
2 Stonehenge and Calculating with an Oracle

2.1 Hoyle’s Algorithm
Stonehenge is an arrangement of massive stones in Wiltshire. Its earliest form dates from 3100 BC and is called Stonehenge I. The astronomer Sir Fred Hoyle showed in [14] that Stonehenge can be used to predict the solar and the lunar eclipse cycles. Specifically, he gave a method, which we may call Hoyle’s algorithm, to make such calculations. For our purposes it does not really matter whether the Celts used Stonehenge I to predict the eclipse cycles; what matters is that, in our times, we can use Stonehenge I to make good predictions of celestial events, such as the azimuth of the rising Sun and of the rising Moon, and that we can use this astronomical observatory as a predictor of eclipses (see [17] for a short introduction).

Consider the prediction of eclipses, especially the solar eclipse. This is done by a process of counting time, but it also requires celestial checks and corrections. The counting of days is a purely algorithmic process. The celestial correction is an experimental process, an observation, which we interpret as consulting a physical oracle. The important structure is the circle of Aubrey holes, made of 56 stones, buried until the 17th century and discovered by John Aubrey (see Fig. 1). Three stones are used as counters that will be moved around the circle of Aubrey holes. The first counter counts the days of the lunar month along the Aubrey holes; the second counter counts the days of the year; finally, the third counter takes care of the Metonic cycle, in which the same phases of the moon are repeated on the same date of the year, to within an hour or so, after a period of nineteen years (discovered by Meton around 430 BC, but believed to have been known earlier); in other words, the third stone counts along the cycle of the lunar node, one of the intersection points of the ecliptic with the Moon’s orbit. The example of Stonehenge illustrates what is meant by an oracle that arises in Nature.
From the point of view of the Earth, both the Moon and the Sun follow approximately circular orbits, as shown in Fig. 2, which cross at the nodes N and N′. Suppose the moon is passing through N. Then a solar eclipse will occur if the sun is no further than 15° from N, and a lunar eclipse happens if the sun is within 10° of N′. If the moon is passing through N′, the situation is reversed. One can then wait for a solar eclipse, set the three tokens in the appropriate Aubrey hole, and use the following:

Simplified Hoyle’s algorithm

1. The first token, a little stone for instance, is moved along the Aubrey holes to keep track of the 28-day lunar cycle. We move the first token counterclockwise two places every day, since 56/2 = 28.
Fig. 1. A schematic drawing of Stonehenge I
2. The second token counts the days of the year. Since 56 × 13/2 = 364, we move the second token counterclockwise two places every thirteen days.
3. The third token will represent one of the nodes, say N. The nodes N and N′ themselves rotate around the Earth, describing a full cycle every 18.61 years, so we will move the third token clockwise three places every year, because 56/3 ≈ 18.67.
4. Eclipses occur when the three tokens become aligned with each other, up to one Aubrey hole to the right or to the left.

Ignoring the error for now, we conclude that simple modulo 56 arithmetic is enough to predict every eclipse with one single necessary input, namely the day of a solar eclipse, on which one sets the tokens in the first Aubrey hole. Now we introduce the celestial corrections that constitute the call to an oracle. To the northeast of Stonehenge I there is a 5-meter-tall stone called the Heelstone. On the morning of the summer solstice the sun (our oracle) rises slightly to the north of the Heelstone. To know the exact day of the summer solstice, we wait for the day when the sun rises behind the Heelstone. The sunrise should then proceed north for a few days, and then back south. We count the number of days between the first sunrise behind the Heelstone and the second. The day of the summer solstice falls in the middle of these two events. With this
Fig. 2. The approximate orbits of the Moon and the Sun around the Earth
information we can calibrate the second token to enough precision every year, so that Stonehenge I can predict eclipses indefinitely.¹

2.2 Physical Oracles
We have described an unusual form of computation, aided by an unusual oracle. Is the measurement or observation of the summer solstice in Hoyle’s algorithm a “call to an oracle”? In our discussion we could have replaced the structure Stonehenge I by a modern computer, and the corrections could be made via a link with a satellite telescope, for example. While it seems natural to consider the Sun as an oracle in the Stonehenge I algorithm described above, calling this satellite link an “oracle” may feel awkward: could one not call it “input”? However, let us point out that these two sources of information have the same nature. It is customary to consider input to be finitely bounded information that is given prior to the start of the computation, whereas the corrections are updates that over time give, in principle, an unbounded amount of data. Without such oracles, both Stonehenge I and our modern computer would eventually become incapable of predicting eclipses, although the modern computer could keep providing accurate predictions for hundreds of years. In both cases, the observations of the sun act exactly as an oracle. Hoyle’s algorithm is an example of an algorithm with a physical oracle. Said differently, the oracle notion, extended to include a physical process, is just what we need to best express Hoyle’s algorithm. Hoyle’s algorithm is also a description of a physical process. The components of Stonehenge I referring to celestial objects make a simple model of solar system dynamics: in reality we have the sky, and in the monument the big circle of Aubrey holes. The algorithm is embodied in the real world. Cooper and Odifreddi, in [9], comment on this type of phenomenon: the Turing model supports (in-)computability in
¹ The calibration procedure explained in [14] is slightly more complicated and detailed; we only illustrate it here. The remaining tokens can also be calibrated using other oracles: the phases of the moon give the adjustment of the first token, and the precise day on which a solar eclipse occurs allows for calibration of the third token.
Nature, in the sense that the Turing model is embedded in Nature in one way or another. For these authors, incomputability sounds more like an intrinsic limitation of our knowledge about the Universe than like a manifesto for hypercomputation. Do these incomputabilities come out of (i) unpredictable behaviour of the model (e.g., an uncertainty based upon mathematical limitations), or (ii) a real and essential incomputability in Nature (e.g., the hyper-computational character of some physical phenomenon)? Indeed, the following conjecture is extremely debatable.

Conjecture O (for ‘oracle’). The Universe has non-computable information which may be used as an oracle to build a hyper-computer.

The conjecture was popularised by Penrose’s search for (ii) in [18,19], and much can be written about it. Cooper and Odifreddi [9] have suggested similarities between the structure of the Universe and the structure of the Turing universe. Calude [8] investigates to what extent quantum randomness can be considered algorithmically random. The search for a physical oracle was proposed by Copeland and Proudfoot [11]. Their article and subsequent work have been severely criticised [12,13] for historical and technical errors. There is, however, an appealing aesthetic side to what Copeland and Proudfoot proposed. Consider a variation of the Church–Turing thesis: the physical world is simulable. This thesis leads us to conclude that one could, in principle, construct a Turing machine that successfully predicts eclipses forever, without the use of any oracle. Being able to predict eclipses indefinitely, however, would not imply that the physical world is simulable, unless the prediction of planet alignments is, in some sense, complete for the simulation problem.
Measuring the rise of the sun to the side of the Heelstone is a human activity very close to the abstract machine we are going to describe in the following sections: the Stonehenge apparatus measures a point in space and time, whereas the device we describe next measures a point in space. Both are real numbers in classical physics.
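The modulo-56 token arithmetic of Hoyle’s algorithm (Section 2.1) is easy to mechanise. The following sketch is our own illustration, not Hoyle’s construction: schedules are simplified to whole days, the node token uses a 365-day year, and alignment uses the one-hole tolerance of step 4.

```python
# A toy simulation of the three-token bookkeeping of Hoyle's algorithm.
HOLES = 56

def tokens(day):
    lunar = (2 * day) % HOLES           # step 1: 2 holes/day, 56/2 = 28 days
    solar = (2 * (day // 13)) % HOLES   # step 2: 2 holes per 13 days, 56*13/2 = 364
    node = (-3 * (day // 365)) % HOLES  # step 3: 3 holes/year, opposite sense
    return lunar, solar, node

def eclipse_warning(day, tol=1):
    # step 4: the three tokens aligned up to one Aubrey hole
    a, b, c = tokens(day)
    dist = lambda u, v: min((u - v) % HOLES, (v - u) % HOLES)
    return dist(a, b) <= tol and dist(a, c) <= tol and dist(b, c) <= tol
```

On day 0, the day of the initial solar eclipse on which the tokens are set, all three counters coincide and a warning is raised; thereafter alignments recur only at the eclipse seasons of this simplified model, until the oracle-based recalibration corrects the accumulated error.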
3 Some Classical Results on Non-uniform Complexity
In this paper Σ denotes an alphabet, and Σ∗ denotes the set of words over Σ (where λ stands for the empty word). A language (or just a set) is a subset of Σ∗. The census function of a set A is the function that, for each n ∈ N, gives the number of words in A of size less than or equal to n.

Definition 1. Let the set of finite sequences over the alphabet Σ be ordered alphanumerically (i.e., first by size, then alphabetically). The characteristic function of a language A ⊆ Σ∗ is the unique infinite sequence χA : N → {0, 1} such that, for all n, χA(n) is 1 if, and only if, the n-th word in that order is in A.

The pairing function is the well-known ⟨−, −⟩ : Σ∗ × Σ∗ → Σ∗, computable in linear time, that allows us to encode two words in a single word over the same
alphabet by duplicating bits and inserting the separation symbol “01.” By an advice we mean any total function f : N → Σ∗. We recall the definition of a non-uniform complexity class.

Definition 2. If F is a class of advice functions and A is a class of sets, then we define the new class A/F as the class of sets B such that there exist a set A ∈ A and an advice f ∈ F such that, for every word x ∈ Σ∗, x ∈ B if, and only if, ⟨x, f(|x|)⟩ ∈ A.

If we fix the class P of sets decidable by Turing machines in polynomial time, we still have one degree of freedom, namely the class of advice functions F that makes P/F. In this paper we will work with polynomial and subpolynomial advice functions, such that F is a class of functions with sizes bounded by polynomials and computable in polynomial time. Note that the advice functions are not, in general, computable, but the corresponding class of bounds is computable. E.g., if the class is poly, then any advice f : N → Σ∗, even if non-computable, is bounded by a computable polynomial p such that, for all n ∈ N, |f(n)| ≤ p(n). Although the class F of functions is arbitrary, it is useless to use functions with growth rate greater than exponential. Let exp be the set of advice functions bounded in size by functions in the class 2^O(n). Then P/exp contains all sets. Given this fact, we wonder whether P/poly or P/log (subclasses of P/exp) exhibit some interesting internal structure. The following result is fundamental in that it says that there are undecidable sets in P/poly. One such set is K = {0^n : the Turing machine coded by n halts on input 0}.

Proposition 1. The characteristic function of the sparse halting set K is in P/poly.

A set is said to be sparse if its census is bounded by a polynomial. We also need to recall the concept of a tally set: a set is said to be tally if it is a language over an alphabet with a single letter (we take this alphabet to be {0}). Tally sets are sparse (but not vice versa).
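Definitions 1 and 2 can be made concrete with a short sketch. The shortlex enumeration, the 0-indexed convention for χA, and the decoder below are our own illustrative choices:

```python
from itertools import count, islice, product

def shortlex(alphabet="01"):
    # enumerate the words of Σ* first by size, then alphabetically (Definition 1)
    yield ""
    for n in count(1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

def chi_prefix(A, n, alphabet="01"):
    # first n values of the characteristic sequence of A (0-indexed convention)
    return [1 if w in A else 0 for w in islice(shortlex(alphabet), n)]

def pair(x, y):
    # pairing: duplicate each bit of x, insert the separator "01", append y
    return "".join(c + c for c in x) + "01" + y

def unpair(w):
    # invert pair(): read doubled bits until the separator "01" appears
    i, x = 0, []
    while w[i:i + 2] != "01":
        x.append(w[i])
        i += 2
    return "".join(x), w[i + 2:]
```

For example, over {0, 1} the order starts λ, 0, 1, 00, 01, …, and `pair("101", "11")` yields `"1100110111"`: the doubled bits of the first word, the separator, then the second word, all decodable in one left-to-right scan.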
For each tally set T, χT is defined relative to a single-letter alphabet, e.g., Σ = {0}. The sparse halting set K above is tally. The following statement (needed to prove Proposition 3) is offered as an exercise to the reader in [1] (Chapter 5, Exercise 9). The reader is reminded that a query to the oracle is written on a special query tape, and that the oracle answers yes or no in one time step. Further, we note that adding extra tapes to a Turing machine will not affect our results, because a Turing machine with 1 working tape and 1 input tape can simulate a Turing machine with k working tapes and 1 input tape in time O(t log t), where t is the time taken by the multi-tape machine.

Proposition 2. In polynomial time, tally oracle Turing machines and advice Turing machines are equivalent.

We will also need to treat prefix non-uniform complexity classes. For these classes we may only use prefix functions, i.e., functions f such that f(n) is always a prefix of f(n + 1). The idea behind prefix non-uniform complexity classes is that the advice given for inputs of size n may also be used to decide smaller inputs.
Definition 3. Let B be a class of sets and F a class of functions. The prefix advice class B/F∗ is the class of sets A for which there exist B ∈ B and a prefix function f ∈ F such that, for every length n and input w with |w| ≤ n, w ∈ A if and only if ⟨w, f(n)⟩ ∈ B.
4 Structure within Advice Classes
If f : N → Σ∗ is an advice function, then by |f| we denote its size, i.e., the function |f| : N → N such that, for every n ∈ N, |f|(n) = |f(n)|. Let |F| = {|f| : f ∈ F}. We have already seen that log and poly are classes of advice functions. Now consider the concept of a reasonable advice class, which we adapt from [20] to our purpose.²

Definition 4. A class of reasonable advice functions is a class of advice functions F such that (a) for every f ∈ F, |f| is computable in polynomial time, (b) for every f ∈ F, |f| is bounded by a polynomial, (c) |F| is closed under addition and multiplication by positive integers, and (d) for every polynomial p with positive integer coefficients and every f ∈ F, there exists g ∈ F such that |f| ◦ p ≤ |g|.

Other definitions could have been used. (According to this definition, the polynomially long advice functions themselves constitute a class of reasonable advice functions.) Here we preferred to use the same concept already used in [3] for the purpose of classifying real numbers into different Kolmogorov complexity classes.

Definition 5. For two total functions s and r, we write s ≺ r if s ∈ o(r). This relation can be generalised to two classes of advice functions F and G by writing F ≺ G if there exists a function g ∈ G such that, for all functions f ∈ F, |f| ≺ |g|.³

Since ordered reasonable advice classes in the context of P/poly are classes of sublinear functions, the most natural chain of advice function sizes is a descending chain of iterated logarithmic functions. Define log^(0)(n) = n and log^(k+1)(n) = log(log^(k)(n)), and note that log^(k+1) ≺ log^(k) for all k ≥ 0. Now we take the reasonable class of advice functions log^(k) given by the closure of each bound under addition and multiplication by positive integers. The class of advice functions poly is reasonable if we restrict it to functions of computable size.

Proposition 3.
If F and G are two classes of reasonable sublinear advice classes⁴ such that F ≺ G, then P/F ⊂ P/G (strict inclusion).

² The so-called reasonable advice bounds there do not coincide with ours. The main reason is that functions computable in polynomial time can grow faster than polynomials.
³ Note that a quite different definition could be thought of: F ≺ G if for every function f ∈ F there exists a function g ∈ G such that |f| ≺ |g|.
⁴ I.e., classes of reasonable advice functions of sublinear sizes.
Proof. Let linear be the set of advice functions of size linear in the size of the input, and let η·linear be the class of advice functions of size η times the size of the input, where η is a number such that 0 < η < 1. There is a tally set A whose characteristic function, χA, is in P/linear but not in P/η·linear for some sufficiently small η.⁵ We prove that there is a g ∈ G (with |g| strictly sublinear) such that for all f ∈ F with |f| ∈ o(|g|), there is a set in P/g that does not belong to P/f. A new tally set T is defined in the following way: for each length n, if |g|(n) ≤ n, then the word βn = (χA↾|g|(n)) 0^{n−|g|(n)} is the unique word of size n in T; otherwise, 0^n is the unique word of size n in T.⁶ This tally set⁷ belongs trivially to the class P/g, choosing as advice the function γ(n) = χA↾|g|(n). We prove that the same set does not belong to P/f. Suppose that some Turing machine with advice f, running in polynomial time, decides T. Since |f| ∈ o(|g|), for all but finitely many n we have |f|(n) < η|g|(n) for arbitrarily small η, meaning that we can compute, for all but finitely many n, |g|(n) bits of χA using an advice of length η·|g|(n), contradicting the fact that χA is not in P/η·linear. The reconstruction of the binary sequence χA↾|g|(n) is provided by the following procedure:

M procedure:
begin
  input n;
  x := λ;
  compute |g|(n);
  for i := 1 to |g|(n) do
    query 0^i to T using advice f(i);
    if “YES” then x := x1 else x := x0
  end for;
  output x
end.

The function g itself should have a computable size |g|, due to the restriction that G be a class of reasonable advice functions. The computation of |g|(n) takes a polynomial number of steps in n; so does each query, and so does the loop (here we use Proposition 2). We end up with a polynomial number of steps in the size of the input.

The class P/poly restricted to advice functions of polynomial size constitutes itself a reasonable advice class and cannot reveal any internal structure.
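The M procedure above can be transcribed directly. In the sketch below, the polynomial-time machine with advice f that decides T is abstracted into a membership predicate; the toy oracle is a hypothetical stand-in for illustration only:

```python
def reconstruct(n, g_size, member):
    # rebuild the first |g|(n) bits of chi_A by querying the words 0^i,
    # exactly as in the M procedure (member() plays the role of the
    # advice machine deciding T)
    x = ""
    for i in range(1, g_size(n) + 1):
        x += "1" if member("0" * i) else "0"
    return x

# toy oracle: 0^i belongs to T exactly when i is odd
toy_member = lambda w: len(w) % 2 == 1
```

With `g_size` the identity, `reconstruct(4, lambda n: n, toy_member)` returns the four bits `"1010"`, one bit per query, matching the loop structure of the procedure.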
If we consider the full class P/poly, with advice functions of size less than or equal to
⁵ We can take for A the set of prefixes of Ω.
⁶ This situation can only happen for a finite number of values of n.
⁷ The set T can be seen as tally by performing the corresponding substitution of each word by the required word over the alphabet {0}.
polynomial, the same proof allows us to conclude (since λn. n is in poly) that P/poly is the supremum of all the classes of sets induced by the relation between the reasonable advice classes considered so far. To our previously defined advice classes log^(k) we add the limit advice class log^(ω) = ∩_{k≥1} log^(k). Then Proposition 3 allows us to take the infinite descending chain of advice function sizes

log^(ω) ≺ . . . ≺ log^(3) ≺ log^(2) ≺ log ≺ poly

and turn it into a strictly descending chain of sets

P/log^(ω) ⊂ . . . ⊂ P/log^(3) ⊂ P/log^(2) ⊂ P/log ⊂ P/poly.

To show that log^(ω) is not trivial, we note that the function log∗, defined by log∗(n) = min{k : log^(k)(n) ≤ 1}, is in log^(ω). Identifying this function allows us to continue the descending chain by defining log^(ω+k), for k ≥ 1, to be the class generated by log^(k) ◦ log∗. Again we take the limit log^(2ω) = ∩_{k≥1} log^(ω+k), giving the descending chain

log^(2ω) ≺ . . . ≺ log^(ω+2) ≺ log^(ω+1) ≺ log^(ω) ≺ . . . ≺ log^(3) ≺ log^(2) ≺ log ≺ poly.

Now the function log∗(2) = log∗ ◦ log∗ is in log^(2ω), so the class log^(2ω) is not trivial. We can continue descending by setting log^(2ω+k), for k ≥ 1, to be the class generated by log^(k) ◦ log∗(2). Of course, this continues till we reach log^(ω²) = ∩_{k≥1} log^(kω). To get beyond this would require finding log^(2∗) ≺ log∗(k) for all k, and this continuation is left to the reader!
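The iterated logarithms log^(k) and the function log∗ used in the chain above are easily computed; base-2 logarithms are assumed in this sketch:

```python
import math

def ilog(k, n):
    # log^(0)(n) = n, log^(k+1)(n) = log(log^(k)(n)); base-2 logs assumed
    x = float(n)
    for _ in range(k):
        x = math.log2(x)
    return x

def log_star(n):
    # log*(n) = min{k : log^(k)(n) <= 1}
    k, x = 0, float(n)
    while x > 1:
        x = math.log2(x)
        k += 1
    return k
```

For example, ilog(2, 65536) = 4 and log_star(65536) = 4, since 65536 → 16 → 4 → 2 → 1; the extremely slow growth of log∗ is what makes the chain of classes descend so far below poly.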
5 Kolmogorov Complexity
From this section on, by P we denote the set of polynomials P = {λn. n^k : k ∈ N}. We will work with one of the definitions of Kolmogorov complexity discussed by Balcázar, Gavaldà, and Hermo in [2]:

Definition 6. Let U be a universal Turing machine, let f : N → N be a total function, let g : N → N be a time constructible function, and let α ∈ {0, 1}^ω. We say that α has Kolmogorov complexity K[f, g] if there exists β ∈ {0, 1}^ω such that, for all n, the universal machine U outputs α↾n (the first n bits of α) in time g(n), when given n and β↾f(n) as inputs.

This definition can be restated as follows: the dyadic rational α↾n of size n is generated by a universal Turing machine given the dyadic rational β↾f(n) as input. The reader should view the input β↾f(n) as a binary sequence (a dyadic rational without the left leading zero) made of a prefix, which is the program for the universal Turing machine, paired with the actual input. K[f, g] can also be seen as the set of all infinite binary sequences with Kolmogorov complexity K[f, g]. K[f] is the set of all infinite binary sequences with Kolmogorov complexity K[f, g] for some arbitrary time constructible function g.
Definition 7. If G is a set of time constructible bounds, then K[F, G] is the set of all infinite binary sequences taken from the classes K[f, g] with f ∈ F and g ∈ G, i.e., K[F, G] = ∪_{f∈F, g∈G} K[f, g]. K[F] is the set of all infinite binary sequences taken from the classes K[f] with f ∈ F, i.e., K[F] = ∪_{f∈F} K[f].

A sequence is called a Kolmogorov random sequence if it belongs to K[(λn. n) − O(1)] and does not belong to any smaller class K[f]. Every sequence belongs to K[(λn. n) + O(1), P], since every sequence can be reproduced from itself in polynomial time, plus the constant amount of input which contains the program necessary for the universal Turing machine to make the copy. The class K[O(1)] contains all computable real numbers, in the sense of Turing (i.e., all the binary digits are computable). The characteristic functions of all recursively enumerable sets are in K[log]; this was proved by Kobayashi [15] in 1981, and by Loveland [16] in 1969 for a variant of the definition of Kolmogorov complexity. The Kolmogorov complexity of a real is provided by the following definition: a real is in a given Kolmogorov complexity class if the task of finding the first n binary digits of the real is in that class.
6 The Analog-Digital Scatter Machine as Oracle or Advice
Experiments with scatter machines are conducted exactly as described in [7], but, for convenience and in order to use them as oracles, we need to review and clarify some points. The scatter machine experiment (SME) is defined within Newtonian mechanics, comprising the following laws and assumptions: (a) point particles obey Newton’s laws of motion in the two-dimensional plane, (b) straight line barriers have perfectly elastic reflection of particles, i.e., kinetic energy is conserved exactly in collisions, (c) barriers are completely rigid and do not deform on impact, (d) cannons, which can be moved in position, can project a particle with a given velocity in a given direction, (e) particle detectors are capable of telling whether a particle has crossed a given region of the plane, and (f) a clock measures time.

The machine consists of a cannon for projecting a point particle, a reflecting barrier in the shape of a wedge, and two collecting boxes, as in Figure 3. The wedge can be at any position, but we will assume it is fixed for the duration of all the experimental work. Under the control of a Turing machine, the cannon will be moved and fired repeatedly to find information about the position of the wedge. Specifically, the way the SME is used as an oracle in Turing machine computations is this: a Turing machine will set a position for the cannon as a query, and will receive an observation about the result of firing the cannon as a response. For each input to the Turing machine, there will be finitely many runs of the experiment. In Figure 3, the parts of the machine are shown in bold lines, with descriptions and comments in narrow lines. The double-headed arrows give dimensions in meters, and the single-headed arrows show a sample trajectory of the particle
Fig. 3. A schematic drawing of the scatter machine: the cannon (firing at 10 m/s) and the point of the wedge each have a limit of traverse from 0 to 1, on parallel lines 5 m apart, with the left and right collecting boxes on either side of a sample trajectory
after being fired by the cannon. The sides of the wedge are at 45° to the line of the cannon, and we take the collision to be perfectly elastic, so the particle is deflected at 90° to the line of the cannon and hits either the left or the right collecting box, depending on whether the cannon is to the left or to the right of the point of the wedge. Since the initial velocity is 10 m/s, the particle will enter one of the two boxes within 1 second of being fired; any initial velocity v > 0 would work, with a corresponding waiting time. The wedge is sufficiently wide so that the particle can only hit the 45° sloping sides, given the limit of traverse of the cannon, and sufficiently rigid so that the particle cannot move the wedge from its position. We make the further assumption, without loss of generality, that the vertex of the wedge is not a dyadic rational.

Suppose that x is the arbitrarily chosen, but non-dyadic and fixed, position of the point of the wedge. For a given dyadic rational cannon position z, there are two outcomes of an experiment: (a) one second after firing, the particle is in the right box (conclusion: z > x), or (b) one second after firing, the particle is in the left box (conclusion: z < x). The SME was designed to find x to arbitrary accuracy by altering z, so in our machine 0 ≤ x ≤ 1 will be fixed, and we will perform observations at different values of 0 ≤ z ≤ 1.

Consider the precision of the experiment. When measuring the output state the situation is simple: either the ball is in one collecting box or in the other, so errors in observation do not arise. There are different postulates for the precision of the cannon, and we list some in order of decreasing strength:
Definition 8. The SME is error-free if the cannon can be set exactly to any given dyadic rational number. The SME is error-prone with arbitrary precision if the cannon can be set only to within a non-zero, but arbitrarily small, dyadic error. The SME is error-prone with fixed precision if there is a value ε > 0 such that the cannon can be set only to within precision ε.

The Turing machine is connected to the SME in the same way as it would be connected to an oracle: we replace the query state with a shooting state (qs), the “yes” state with a left state (ql), and the “no” state with a right state (qr). The resulting computational device is called the analog-digital scatter machine, and we refer to the vertex position of an analog-digital scatter machine when we mean the vertex position of the corresponding SME. In order to carry out a scatter machine experiment, the analog-digital scatter machine will write a word z on the query tape and enter the shooting state. This word will either be “1” or a binary word beginning with 0. We will use z indifferently to denote both a word z1 . . . zn ∈ {1} ∪ {0s : s ∈ {0, 1}∗} and the corresponding dyadic rational ∑_{i=1}^{n} 2^{−i+1} z_i ∈ [0, 1]. We use dyadic rationals because they correspond to the initial segments of the binary expansion of a real number. In this case, we write |z| to denote n, i.e., the size of z1 . . . zn, and say that the analog-digital scatter machine is aiming at z. The Turing machine computation will then be interrupted, and the SME will attempt to set the cannon at the position defined by the sequence of bits z ≡ z1 · z2 · · · zn, with precision ε = 2^{−n+1}. After setting the cannon, the SME will fire a projectile particle, wait one second, and then check whether the particle is in either box. If the particle is in the right collecting box, then the Turing machine computation will be resumed in the state qr.
If the particle is in the left box, then the Turing machine computation will be resumed in the state ql.

Definition 9. An error-free analog-digital scatter machine is a Turing machine connected to an error-free SME. In a similar way, we define an error-prone analog-digital scatter machine with arbitrary precision, and an error-prone analog-digital scatter machine with fixed precision.

If an error-free analog-digital scatter machine, with vertex position x ∈ [0, 1], aims at a dyadic rational z ∈ [0, 1], we are certain that the computation will be resumed in the state ql if z < x, and that it will be resumed in the state qr when z > x. We define the following decision criterion.

Definition 10. Let A ⊆ Σ∗ be a set of words over Σ. We say that an error-free analog-digital scatter machine M decides A if, for every input w ∈ Σ∗, w is accepted if w ∈ A and rejected when w ∉ A. We say that M decides A in polynomial time if M decides A and there is a polynomial p such that, for every w ∈ Σ∗, the number of steps of the computation is bounded by p(|w|).
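The protocol of firing and branching on ql/qr amounts to comparing z with x. A minimal sketch of the bisection idea, modelling an error-free SME as an exact comparison (our simplification, in floating point rather than with exact dyadic rationals):

```python
def sme(z, x):
    # error-free scatter machine: the particle lands in the right box
    # iff the cannon is to the right of the vertex (x is assumed
    # non-dyadic, so z == x never occurs)
    return "right" if z > x else "left"

def wedge_bits(x, n):
    # recover the first n binary places of x by bisection
    prefix, bits = 0.0, []
    for i in range(1, n + 1):
        z = prefix + 2.0 ** (-i)   # aim at prefix + 2^-i
        if sme(z, x) == "left":    # z < x: the i-th bit of x is 1
            bits.append("1")
            prefix = z
        else:                      # z > x: the i-th bit of x is 0
            bits.append("0")
    return "".join(bits)
```

With the vertex at x = 1/3 = 0.010101…₂, eight shots recover "01010101"; one experiment per binary place is the pattern behind Theorem 1 below.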
Gedankenexperiment: the position for firing the cannon is written as a dyadic rational on the query tape, and since it takes unit time to write a symbol on the tape, there is a limit to the accuracy of determining the wedge position that we can obtain within a given time. Conversely, using bisection, we can determine the wedge position to within a given accuracy, and if the wedge position is a good encoding, we can find the original sequence to any given length (see [6]). The following theorems are proved in [6].

Theorem 1. An error-free analog-digital scatter machine can determine the first n binary places of the wedge position x in time polynomial in n.

Theorem 2. The class of sets decided by error-free analog-digital scatter machines in polynomial time is exactly P/poly.

So measuring the position of a motionless point particle in Newtonian kinematics, using an infinite precision cannon, in polynomial time, is the same as deciding a set in P/poly. Note that the class P/poly includes the sparse halting set. In this paper we are only considering error-free analog-digital scatter machines. The error-prone analog-digital scatter machines do not behave in a deterministic way, and in this paper we are not concerned with probabilistic classes. However, lest the reader think that the computational power of the analog-digital scatter machine depends on some “unphysical” assumption of zero error, in [6,5] it is shown that the arbitrary precision machine can still compute P/poly (with suitable account taken of the time needed to set up each experiment), and that the fixed precision machines can compute BPP//log∗, according to the following definition:

Definition 11. BPP//log∗ is the class of sets A for which there exist a probabilistic polynomial Turing machine M, a function f ∈ log∗, and a constant γ < 1/2 such that M rejects ⟨w, f(|w|)⟩ with probability at most γ if w ∈ A and accepts ⟨w, f(|w|)⟩ with probability at most γ if w ∉ A.
The vertex of the wedge of the analog–digital scatter machine is placed at a position x ∈ [0, 1], a real number that can be seen either as an infinite binary sequence, or as the tally set containing exactly the words 0^n such that the n-th bit in the sequence is 1.
7   The Complexity of the Vertex Position
In this section, we apply the same methods developed in [4,3,20] for the study of neural networks with real weights to the analog–digital scatter machine. We use a “good” coding of sequences of 0s and 1s into the binary digits of a real number, one that allows a measurement of a given accuracy to determine the first n 0s and 1s (and that, in addition, never produces a dyadic rational). For example, we can replace every 0 in the original sequence with 001 and every 1 with 100. Then the sequence 0110 . . . becomes the number 0.001100100001 . . . The set of “good” encodings will typically be some form of Cantor set in [0, 1]. See [6] for more details.
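The encoding is straightforward to make concrete. The tripling scheme 0 ↦ 001, 1 ↦ 100 is from the text; the function names and the decoding-by-blocks convention below are ours:

```python
CODE = {"0": "001", "1": "100"}

def encode(bits):
    """Map a 0/1 string to the binary digits of its 'good' encoding."""
    return "".join(CODE[b] for b in bits)

def decode(digits):
    """Recover original bits from a measured prefix of the encoding; each
    complete block of three digits (001 or 100) yields one original bit."""
    blocks = [digits[i:i + 3] for i in range(0, len(digits) - len(digits) % 3, 3)]
    return "".join("0" if blk == "001" else "1" for blk in blocks)

print(encode("0110"))        # 001100100001
print(decode("001100100"))   # 011  (9 measured digits give 3 original bits)
```

Since every block contains both a 0 and a 1, the encoded expansion is never eventually constant, so the encoded real is never a dyadic rational, which is exactly what the measurement protocol requires.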
Oracles and Advice as Measurements
Proposition 4. Let S be a set of infinite binary “good” encodings and let T be the family of tally sets T = {T : χT ∈ S}. The computation times of the analog–digital scatter machines with vertex in S are polynomially related to the computation times of oracle Turing machines that consult oracles in T.

Proof. We first prove that an analog–digital scatter machine M with vertex at x ∈ S can be simulated by an oracle Turing machine M′ that consults a tally oracle T ∈ T. Let the characteristic function of T be χT = x. Let t be the running time of M (possibly a non–constructible time bound).⁸ By Theorem 1, p(t) bits of x are enough to get the desired result in time t. The oracle Turing machine M′ computes as follows:

M′ procedure:
begin
  input w; n := |w|; s := 1;
  loop
    for i = 1 to p(s): query 0^i to T, to construct the approximation ξ := x_s;
    simulate M with vertex at ξ, step by step, until time s;
    if M halts, then output the result;
    s := s + 1
  end loop
end.

To see that the output is correct, note that after the for step, M′ has the value of x with enough precision to correctly simulate t(n) steps of the computation. The simulation is polynomial in the time t(n).⁹

Conversely, we prove that an oracle Turing machine M that consults the oracle T ∈ T can be simulated by an analog–digital scatter machine with vertex exactly at χT. The query tape is replaced by a working tape, and a new query tape is added to aim the cannon. The machine reads, one by one, the number i of 0s written on the former query tape and calls the scatter machine procedure to find i bits of the vertex position using the new query tape. Each call can be executed in time polynomial in i ([5]). The overall time of the computation is polynomially related to the running time of the analog–digital scatter machine.

The following theorem is the analogue of the corresponding theorem on neural networks with real weights, due to Balcázar, Gavaldà, and Siegelmann in [3,20], and its proof is similar.
⁸ Note that M halts only after t(n) steps on input of size n, if t(n) is defined; otherwise, M does not halt.
⁹ If the time of M is constructible, then a single loop suffices to get the number of bits of x needed to conclude the simulation. However, in general, t is not constructible or, even worse, t may be undefined for a given input.
Theorem 3. If F is a class of reasonable sublinear advice functions,¹⁰ then the class P/F∗ is exactly the class of languages accepted by polynomial time analog–digital scatter machines with vertex in the subset of “good” encodings of K[|F|, P].

In consequence, the class of languages accepted by analog–digital scatter machines with vertex in K[|poly|, P] is P/poly∗ = P/poly, and the class of languages accepted by analog–digital scatter machines with vertex in K[|log|, P] is P/log∗. Thus we can reprove one of the main results of the Gedankenexperiment of Section 4 (Theorem 2). The result is the same as for neural nets with real weights computing in polynomial time (see [20]).

Theorem 4. The analog–digital scatter machines decide in polynomial time exactly the class P/poly.

Proof. From Theorem 3 we know that the analog–digital scatter machines decide in polynomial time exactly the class P/poly∗ = P/poly: take for F the class poly, restricted to advice functions of computable size. If an advice has non–computable size, but its size is bounded by a polynomial p, then we can pad the advice of size m, for inputs of size n, with the word 1 0^{p(n)−m−1}. Thus, for every advice in poly, there is always an equivalent advice of computable size that does not alter the complexity of the problem.

We can then prove a hierarchy theorem. The statement can be found in [3,20], but here the proof relies on the structure of advice classes given by Proposition 3, without the use of Kolmogorov complexity.

Proposition 5. If F and G are two classes of reasonable advice functions such that F ≺ G, then K[|F|, P] ⊂ K[|G|, P] (strict inclusion).

Proof. If F ≺ G then, by Proposition 3, P/F ⊂ P/G, from which it follows that P/F∗ ⊂ P/G∗,¹¹ and, consequently, by Proposition 3, that K[|F|, P] ⊂ K[|G|, P] (all strict inclusions).

Theorem 5.
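The padding trick in the proof of Theorem 4 can be sketched directly (the helper names are ours): an advice word of length m < p(n) is extended to exactly p(n) symbols with the suffix 1 0^{p(n)−m−1}, and the original advice is recovered by cutting at the last 1.

```python
def pad_advice(w, target_len):
    """Pad advice w to exactly target_len symbols with 1 0^(target_len-len(w)-1),
    making the advice size the computable function n -> target_len."""
    assert len(w) < target_len
    return w + "1" + "0" * (target_len - len(w) - 1)

def unpad_advice(padded):
    """Recover the original advice: drop the final 1 and the 0s after it."""
    return padded[:padded.rindex("1")]

w = "0101"
padded = pad_advice(w, 10)
print(padded)                 # 0101100000
print(unpad_advice(padded))   # 0101
```

The padded word always has a last 1 (the marker itself), so decoding is unambiguous even when the original advice ends in 0s.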
If F and G are two classes of reasonable advice functions such that F ≺ G, then the class of languages decidable by analog–digital scatter machines with vertex in K[|F|, P] is strictly included in the class of languages decidable by analog–digital scatter machines with vertex in K[|G|, P].

In the limit of a descending chain of sizes of classes of reasonable advice functions we have O(1). The class K[O(1), P] is, as we know, the class of Turing computable numbers in polynomial time.

¹⁰ I.e., a class of reasonable advice functions of sublinear sizes.
¹¹ The proof of Proposition 3 is also a proof that P/F ⊂ P/G∗. Since P/F∗ ⊂ P/F, the statement follows.
8   Conclusion
We have reflected upon the way physical experiments, measuring some quantities, can arise in computation and be viewed as special kinds of oracles; Hoyle's algorithm is an intriguing, yet simple, case study for this purpose. Next, we have inspected in some detail a case study based upon the scatter machine experiment SME, a computational Gedankenexperiment we have analysed earlier ([6,5]). Using the SME, we have shown that non-uniform complexity is an adequate framework for classifying feasible computations by Turing machines interacting with an oracle in Nature. In particular, in this paper, by classifying the information content of such an oracle using Kolmogorov complexity, we have obtained a hierarchical structure for advice classes. In our use of the scatter machine experiment as an oracle, we assume that the wedge is sharp to a point and that the vertex is placed at a point measured by a precise value x. Without these assumptions, our arguments about the scatter machine would need modification, since its computational properties arise exclusively from the value of x. The existence of an arbitrarily sharp wedge seems to contradict atomic theory, and for this reason the scatter machine is not a valid counterexample to many forms of physical Church–Turing theses. What is the relevance of the analog–digital scatter machine as a model of computation? The scatter machine is relevant when seen as a Gedankenexperiment. In our discussion, we could have replaced the barriers, particles, cannons and particle detectors with any other physical system with the same behaviour. The scatter machine becomes a tool to answer the more general question: if we have a physical system to measure the answer to the predicate y ≤ x, where x is a real number and y is a dyadic rational, to what extent can we use this system in feasible computations?
If we accept that “measuring a physical quantity” is, in essence, answering whether y ≤ x, then the scatter machine is just a generic example of a measuring device. In this way, our work studies the fundamental limitations of computation that depend on the measurement of some physical constant. As ongoing research, besides a few other aspects of the measurement apparatus not covered in this paper, we are studying a point mass in motion according to some physical law, such as Newtonian gravitation, and we will apply instrumentation to measure the position and velocity of such a point mass.

Acknowledgements. The research of José Félix Costa is supported by FEDER and FCT Plurianual 2007. Edwin Beggs and John Tucker would like to thank EPSRC for its support under grant EP/C525361/1.
References

1. Balcázar, J.L., Díaz, J., Gabarró, J.: Structural Complexity I, 2nd edn. Springer, Heidelberg (1995)
2. Balcázar, J.L., Gavaldà, R., Hermo, M.: Compressibility of infinite binary sequences. In: Sorbi, A. (ed.) Complexity, Logic, and Recursion Theory. Lecture Notes in Pure and Applied Mathematics, vol. 187, pp. 1175–1183. Marcel Dekker, New York (1997)
3. Balcázar, J.L., Gavaldà, R., Siegelmann, H.: Computational power of neural networks: a characterization in terms of Kolmogorov complexity. IEEE Transactions on Information Theory 43(4), 1175–1183 (1997)
4. Balcázar, J.L., Gavaldà, R., Siegelmann, H., Sontag, E.D.: Some structural complexity aspects of neural computation. In: Proceedings of the Eighth IEEE Structure in Complexity Theory Conference, pp. 253–265. IEEE Computer Society, Los Alamitos (1993)
5. Beggs, E., Costa, J.F., Loff, B., Tucker, J.: On the complexity of measurement in classical physics. In: Agrawal, M., Du, D., Duan, Z., Li, A. (eds.) TAMC 2008. LNCS, vol. 4978, pp. 20–30. Springer, Heidelberg (2008)
6. Beggs, E., Costa, J.F., Loff, B., Tucker, J.: Computational complexity with experiments as oracles. Proc. Royal Society, Ser. A (in press)
7. Beggs, E., Tucker, J.: Experimental computation of real numbers by Newtonian machines. Proc. Royal Society, Ser. A 463(2082), 1541–1561 (2007)
8. Calude, C.: Algorithmic randomness, quantum physics, and incompleteness. In: Margenstern, M. (ed.) MCU 2004. LNCS, vol. 3354, pp. 1–17. Springer, Heidelberg (2005)
9. Cooper, B., Odifreddi, P.: Incomputability in Nature. In: Cooper, B., Goncharov, S. (eds.) Computability and Models, Perspectives East and West. University Series in Mathematics, pp. 137–160. Springer, Heidelberg (2003)
10. Copeland, J.: The Church–Turing thesis. In: Zalta, E. (ed.) The Stanford Encyclopedia of Philosophy (published, 2002), http://plato.stanford.edu/archives/fall2002/entries/church-turing/
11. Copeland, J., Proudfoot, D.: Alan Turing's forgotten ideas in Computer Science. Scientific American 280, 99–103 (1999)
12. Davis, M.: The myth of hypercomputation. In: Teuscher, C. (ed.) Alan Turing: The Life and Legacy of a Great Thinker, pp. 195–212. Springer, Heidelberg (2006)
13. Hodges, A.: The professors and the brainstorms (published, 1999), http://www.turing.org.uk/philosophy/sciam.html
14.
Hoyle, F.: From Stonehenge to Modern Cosmology. W.H. Freeman, New York (1972)
15. Kobayashi, K.: On compressibility of infinite sequences. Technical Report C–34, Research Reports on Information Sciences (1981)
16. Loveland, D.W.: A variant of the Kolmogorov concept of complexity. Information and Control 15, 115–133 (1969)
17. Newham, C.A.: The Astronomical Significance of Stonehenge. Coats and Parker Ltd (2000) (first published 1972)
18. Penrose, R.: The Emperor's New Mind. Oxford University Press, Oxford (1989)
19. Penrose, R.: Shadows of the Mind. Oxford University Press, Oxford (1994)
20. Siegelmann, H.T.: Neural Networks and Analog Computation: Beyond the Turing Limit. Birkhäuser, Basel (1999)
21. Tucker, J.V., Zucker, J.I.: Computable functions and semicomputable sets on many sorted algebras. In: Abramsky, S., Gabbay, D., Maibaum, T. (eds.) Handbook of Logic for Computer Science. University Series in Mathematics, vol. V, pp. 317–523. Oxford University Press, Oxford (2000)
22. Tucker, J.V., Zucker, J.I.: Abstract versus concrete computation on metric partial algebras. ACM Transactions on Computational Logic 5, 611–668 (2004)
From Gene Regulation to Stochastic Fusion

Gabriel Ciobanu

“A.I. Cuza” University, Faculty of Computer Science
Blvd. Carol I no. 11, 700506 Iaşi, Romania
Romanian Academy, Institute of Computer Science
[email protected]
Abstract. Usual process algebras work with one-to-one interactions, and so it is difficult to use them to describe complex biological systems like gene regulation, where many-to-many interactions are involved. We overcome this limitation and present a stochastic fusion calculus suitable for describing dynamic behaviour involving many-to-many interactions. We extend the semantics of the fusion calculus from labelled transition systems to stochastic labelled transition systems, in which the evolution of a system is driven by probability distributions; we then analyse the stochastic distribution of the synchronization between interacting processes. Finally, we define and study a stochastic hyperequivalence, and present an axiomatic system for it.
1   Biological Many-to-Many Interaction
In living cells, genes and proteins interact in networks of gene regulation. Gene regulation is the cellular control of the amount and timing of appearance of the functional product of a gene. Although a functional gene product may be an RNA or a protein, the majority of the known mechanisms regulate the expression of protein coding genes. Any step of gene expression may be modulated, from the DNA-RNA transcription step to post-translational modification of a protein. A gene regulatory network was formally modelled in [7] by using the stochastic π-calculus. Stochastic π-calculus is applied as a modelling language for systems biology in order to investigate a prototypical instance of gene regulation in a bacterium. As a case study, the control of transcription initiation at the λ switch is modelled and simulated. Since the involved interactions are of type many-to-one or many-to-many, a more appropriate formalism should be used. In this paper we introduce stochastic fusion, a version of fusion calculus [11]. We use this formalism to model and study the network controlling transcription initiation at the λ switch. We consider a simple subsystem using many-to-many interactions (Figure 1). Following the guidelines of Regev and Shapiro [14], we represent members of the biomolecular population as processes, and biomolecular events as communication. We consider the same case as in [7]: a system with two operators of the same type OR which can be bound by proteins of two different types A and B. The operators of type OR have three possible states: vacant, A, and B. The possible states of proteins A and B are bound and unbound. There are seven

C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, pp. 51–63, 2008.
© Springer-Verlag Berlin Heidelberg 2008
G. Ciobanu
[Figure 1 (diagram): the operator OR, with states OR_vacant, OR_A and OR_B, interacts with proteins A and B over the channels pro_OR, pro_A and pro_B after a fusion; A and B (un)bind the operator.]
Fig. 1. Many-to-many interaction in gene regulation
possible combinations of molecule types with their states: A bound, A unbound, B bound, B unbound, OR vacant, OR A, OR B. Unlike in [7], where the operator region interacts with the proteins over two generic channels (pro and release), here we use specific channels for every type of operator region and protein, namely pro_A, pro_B for reactions of protein binding to the operator, and rel_A, rel_B for unbinding events. Before the protein binding to the operator takes place, a fusion over the channel names must occur. All the names in the same equivalence class are fused under the same name; this name is used from then on in further interactions. After the fusion, we can refer to a specific name in the equivalence classes given by the fusion. Many-to-many interactions are common in biological systems, and this fact provides an important motivation for introducing stochastic fusion, because the existing process algebras are only able to describe one-to-one communication. The π-calculus is a very expressive process algebra used to model the changing connectivity of interacting processes [10]. However, it is difficult to use the π-calculus to describe complex systems where many-to-one and many-to-many interactions emerge. This paper tries to overcome this limitation, using equivalence classes of names when we have multiple interactions. This can be done essentially in the fusion calculus [11], which is a symmetric generalization of the π-calculus. A stochastic version of the fusion calculus can model complex systems involving many-to-many interactions. For this reason we extend the fusion calculus and present a stochastic approach. Stochastic fusion calculus provides a concise and compositional way to describe the dynamic behaviour of systems using probability distributions, in particular the exponential distribution.
The paper is structured as follows: first we summarize the fusion calculus, using ordinary labelled transition systems extended with fusions and providing the operational semantic rules. Then we present the semantics of stochastic fusion calculus, using stochastic labelled transition systems instead of simple labelled transition systems. The stochastic nature of the new transition systems is given by the fact that the labels are pairs whose first component is an action and whose second component represents a stochastic rate associated with each
From Gene Regulation to Stochastic Fusion
transition given by an exponential distribution. For two processes running in parallel, we define the distribution of their synchronization. We extend the notion of hyperbisimulation to stochastic fusion calculus, and prove that the stochastic hyperequivalence is a congruence. We also present an axiomatic system for the stochastic hyperbisimulation.
2   Syntax and Semantics of Fusion Calculus
Fusion calculus was introduced by Parrow and Victor as a symmetric generalization of the π-calculus [11]. The π-calculus has two binding operators (prefix and restriction), its input and output actions are asymmetric, the effects of communication are local, and various bisimulations (early, late, open, . . .) are defined. Unlike in the π-calculus, in fusion calculus the effects of communication are both local and global. Fusion calculus makes input and output operations fully symmetric, and a more appropriate terminology for them might be action and co-action. A fusion is a name equivalence which allows all the names of an equivalence class to be used interchangeably in a term. Computationally, a fusion is generated as the result of a synchronization between two complementary actions, and it is propagated to processes running in parallel within the same scope of the fusion. Fusions are ideal for representing various forms of many-to-many interactions. We briefly recall the syntax and the operational semantics of fusion calculus (see [11] for details). Let N be a countably infinite set of names with a, b, . . . , x, y, . . . as metavariables. As in the π-calculus, names represent communication channels. We keep the notation x̃ for a (possibly empty) finite sequence x1, . . . , xn of names. By x̃ \ ỹ we denote the set of those xi with xi ≠ yi. We use ϕ, ψ to denote an equivalence relation called a fusion over N, which is represented in the syntax by a finite set of equalities. We write x ϕ y if x and y are related by ϕ, and {x̃ = ỹ} to denote the smallest such equivalence relation relating each xi with yi. The identity relation is 1; as a consequence, a fusion written {x = x} is the same as {y = y}, namely 1, and {x = y, x = z} is the same as {x = y, y = z}. We assume a set A of process identifiers ranged over by A, A1, . . . and a set P of processes ranged over by P, Q, . . . . Definition 1 (Fusion Calculus Syntax).
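Since a fusion is just the smallest equivalence relation generated by a finite set of equalities, its equivalence classes can be computed with a standard union–find structure. A small sketch (the representation is ours, not part of the calculus):

```python
def fusion_classes(names, equalities):
    """Smallest equivalence relation on `names` containing the given
    equalities, returned as a set of frozenset equivalence classes."""
    parent = {n: n for n in names}

    def find(n):                      # root of n's class, with path splitting
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in equalities:           # union the classes of each equality
        parent[find(a)] = find(b)

    classes = {}
    for n in names:
        classes.setdefault(find(n), set()).add(n)
    return {frozenset(c) for c in classes.values()}

# {x = y, x = z} and {x = y, y = z} generate the same fusion:
print(fusion_classes("xyz", [("x", "y"), ("x", "z")]) ==
      fusion_classes("xyz", [("x", "y"), ("y", "z")]))  # True
```

This makes concrete the remark that the two syntactic presentations {x = y, x = z} and {x = y, y = z} denote the same equivalence relation, namely the single class {x, y, z}.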
The actions, ranged over by α, and the processes, ranged over by P, are defined by:

Prefixes:    α ::= u x̃ | ū x̃ | ϕ
Processes:   P ::= 0 | α.P | P + Q | P | Q | (x)P | [x = y]P | [x ≠ y]P | A(x̃), where A(x̃) is defined by an equation A(x̃) = P

An input action u x̃ means “consider the input objects along the channel u, and replace x̃ with these objects”. Note that input does not entail binding. The output action ū x̃ means “output the objects x̃ along the channel u”. x̃ are the objects of the action, and the channel u is its subject. The fusion actions ϕ have neither subject nor objects. The process syntactic constructs have the usual interpretation. A scope (x)P defines the scope of x as P; no communication action of (x)P can have x as its subject, and fusion effects with respect to x are limited to P. Restriction and input binding of the π-calculus can be seen as
special cases of the fusion scope. For every process P we denote by fn(P) the free names in P, by bn(P) the bound names in P, and by n(P) all the names occurring in P; similarly fn(α), bn(α), n(α) for every action α. A substitution σ agrees with a fusion ϕ if, for all names x and y, x ϕ y if and only if σ(x) = σ(y); σ is a substitutive effect of a fusion ϕ if σ sends all members of each equivalence class of ϕ to one representative of the class. The only substitutive effect of a communication action is the identity substitution.

Definition 2 (Fusion Calculus Semantics). The operational semantics of fusion calculus is given by a labelled transition system defined as the least relation satisfying the following inference rules:
PREF:      α.P −α→ P

SUM:       if P −α→ P′, then P + Q −α→ P′

PAR:       if P −α→ P′, then P | Q −α→ P′ | Q

COM:       if P −ux̃→ P′ and Q −ūỹ→ Q′ with |x̃| = |ỹ|, then P | Q −{x̃=ỹ}→ P′ | Q′

PASS:      if P −α→ P′ and z ∉ fn(α), then (z)P −α→ (z)P′

OPEN:      if P −(ỹ)ax̃→ P′ with z ∈ x̃ \ ỹ and a ∉ {z, z̄}, then (z)P −(zỹ)ax̃→ P′

SCOPE:     if P −ϕ→ P′ with z ϕ x and z ≠ x, then (z)P −ϕ\z→ P′{x/z}

MATCH:     if P −α→ P′, then [x = x]P −α→ P′

MISMATCH:  if P −α→ P′ and x ≠ y, then [x ≠ y]P −α→ P′

SUBST:     if P{ỹ/x̃} −α→ P′, then A(ỹ) −α→ P′, where A(x̃) is defined by A(x̃) = P
For convenience, we define ϕ \ z to mean (ϕ ∩ (N \ {z})²) ∪ {(z, z)}. The only rule dealing with bound actions is OPEN. Using structural congruence, and pulling the relevant scope up to top level, we can still infer, e.g., P | (x)āyx.Q −(x)āyx→ P | Q using PREF and OPEN (an alpha-conversion ensuring x ∉ fn(P) may be necessary). The SCOPE rule entails a substitution of the scoped name z by a nondeterministically chosen name x related to it by ϕ (for the purpose of the equivalence defined below, it does not matter which x).

Definition 3. The structural congruence between processes, denoted by ≡, is the least congruence satisfying the following axioms:
(fusion) ϕ.P ≡ ϕ.Pσ for every substitution σ agreeing with ϕ;
(par) P | 0 ≡ P,  P | Q ≡ Q | P,  P | (Q | R) ≡ (P | Q) | R;
(scope) (x)0 ≡ 0,  (x)(y)P ≡ (y)(x)P,  (x)(P + Q) ≡ (x)P + (x)Q;
(scope extension) P | (z)Q ≡ (z)(P | Q), where z ∉ fn(P).

Definition 4 (Hyperbisimulation). A fusion bisimulation is a binary symmetric relation S over processes such that (P, Q) ∈ S implies:
if P −α→ P′ with bn(α) ∩ fn(Q) = ∅, then Q −α→ Q′ and (P′σ, Q′σ) ∈ S, for some substitutive effect σ of α, if α is a fusion. A hyperbisimulation is a substitution-closed fusion bisimulation.
Theorem 1 ([11]). Hyperequivalence is the largest congruence contained in bisimilarity.

Before providing the syntax and the semantics of stochastic fusion calculus, we recall some important properties of the exponential distribution. Let X, X1 and X2 denote exponentially distributed random variables.

a) An exponential distribution P(X ≤ t) = 1 − e^{−rt} is characterized by a single positive real parameter r, usually referred to as the rate.
b) The exponential distribution guarantees the memoryless property, which says that at each step in which an activity has started but not yet terminated, the remaining duration of the activity is still distributed as the entire duration of the activity. This means P(X > u + t | X > t) = P(X > u), for all u, t ≥ 0.
c) P(min(X1, X2) ≤ t) = 1 − e^{−(r1+r2)t}, where Xi ∼ Exp(ri). This property explains why the waiting time in a state i is exponentially distributed. Every transition i −r→ j leaving state i has an associated exponentially distributed random variable (with parameter r). It is assumed that we have a race among several transitions, i.e., they compete for a state change; the waiting time in i ends as soon as the first transition is ready to occur.
d) P(X1 < X2) = r1/(r1 + r2), and P(X2 < X1) = r2/(r1 + r2). This property determines the probability of a specific transition winning such a race.

Since we use the exponential distribution, we have some advantages derived from its memoryless property. However, many phenomena which take place in practice are described by non-exponential distributions; general distributions will be considered in future work. For this reason we take a metavariable F to stand for a general probability distribution, which in this paper is actually the exponential distribution.
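Properties c) and d) are easy to check numerically. A quick Monte Carlo sketch (illustration only, not from the paper):

```python
import random

random.seed(0)
r1, r2 = 2.0, 3.0
N = 100_000
wins = 0      # how often X1 beats X2 in the race
mins = 0.0    # accumulated min(X1, X2)
for _ in range(N):
    x1 = random.expovariate(r1)   # X1 ~ Exp(r1)
    x2 = random.expovariate(r2)   # X2 ~ Exp(r2)
    wins += x1 < x2
    mins += min(x1, x2)

print(round(wins / N, 2))   # close to r1/(r1+r2) = 0.4      (property d)
print(round(N / mins, 1))   # close to r1+r2 = 5.0, since min(X1,X2) ~ Exp(r1+r2)  (property c)
```

The second estimate uses the fact that an Exp(r) variable has mean 1/r, so the empirical rate of the minimum is the reciprocal of its sample mean.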
3   Syntax and Semantics of Stochastic Fusion Calculus
Let PDF be a set of continuous probability distributions ranged over by F, Fu, Fū, Fϕ, where u is a channel name and ϕ is a fusion. We simplify the notation for the fusion relation: we still write x ϕ y if x and y are related by ϕ, but we write {x̃, ỹ} to denote the smallest such equivalence relation relating each xi with yi. For example, a fusion ϕ written {x = y, x = z, u = v} refers in stochastic fusion calculus (SFC) to the equivalence classes {x, y, z} and {u, v}, and we write ϕ = {{x, y, z}, {u, v}}. For identity we use 1, and a fusion written {x} is the same as {y}, namely 1. [x] is the equivalence class of x, and ϕ \ z means ϕ without the equivalence class [z], but keeping the identity {z}.

Definition 5 (Stochastic Fusion Calculus Syntax). The actions, ranged over by μ, and the processes, ranged over by P, are defined as follows:

Prefixes:    μ ::= (u x̃, Fu) | (ū x̃, Fū) | (ϕ, Fϕ)
Processes:   P ::= 0 | μ.P | P + Q | P | Q | (x)P | if x ϕ y then P else Q | A(x̃), where A(x̃) is defined by an equation A(x̃) = P

Let SFC be the set of process expressions of the stochastic fusion calculus defined above. We use a generic notation μ = (α, F) for prefixes (α, F).P, where α can be either an input
u x̃, an output ū x̃, or a fusion ϕ, and the probability distribution F can be either Fu, Fū or Fϕ. By fn(μ), bn(μ), n(μ) we understand fn(α), bn(α), n(α). F = 1 − e^{−rt} is an exponential distribution, and rate(F) = r. We use the “if-then-else” syntax instead of matching and mismatching expressions, where “if x ϕ y then P else Q” means that if x and y are related by the fusion ϕ (i.e., are in the same equivalence class) then P is executed, otherwise Q is executed.

Definition 6 (Stochastic Fusion Calculus Semantics). The operational semantics of the stochastic fusion calculus is given by a labelled transition system defined as the least relation satisfying the following inference rules:

PREF:    (α, F).P −(α, F)→_1 P

SUM:     if Pj −μ→_k P′ with j ∈ I, then Σ_{i∈I} Pi −μ→_{j.k} P′

PARL:    if P −μ→_i P′ and bn(μ) ∩ fn(Q) = ∅, then P | Q −μ→_{(i,0)} P′ | Q

PARR:    if Q −μ→_i Q′ and bn(μ) ∩ fn(P) = ∅, then P | Q −μ→_{(0,i)} P | Q′

PASS:    if P −μ→_i P′ and z ∉ fn(μ), then (z)P −μ→_i (z)P′

OPEN:    if P −((ỹ)ux̃, F)→_i P′ with z ∈ x̃ \ ỹ, u ∉ {z, z̄} and F ∈ {Fu, Fū}, then (z)P −((zỹ)ux̃, F)→_i P′

SCOPE:   if P −(ϕ, Fϕ)→_i P′ with z ϕ x and z ≠ x, then (z)P −(ϕ\z, Fϕ)→_i P′{x/z}

COM:     if P −(ux̃, Fu)→_i P′ and Q −(ūỹ, Fū)→_j Q′, then P | Q −(ϕ, Fϕ)→_{(i,j)} P′ | Q′, where [x̃] ∪ [ỹ] defines the new fusion ϕ

ITE1:    if P −μ→_i P′ and (x, y) ∈ ϕ, then (if x ϕ y then P else Q) −μ→_i P′

ITE2:    if Q −μ→_i Q′ and (x, y) ∉ ϕ, then (if x ϕ y then P else Q) −μ→_i Q′

SUBST:   if P{ỹ/x̃} −μ→_i P′, then A(ỹ) −μ→_i P′, where A(x̃) is defined by A(x̃) = P
The PASS rule is similar to a local variable declaration; the restriction of z on top of P declares a new name for use in P, which cannot be used as a communication subject. However, such a restricted z might not remain local to P; it can be exported by using the OPEN rule, which removes the restriction. The COM rule expresses the synchronous communication between two processes: if we have a step from P to P′ by an input action according to an exponential distribution Fu, and a step from Q to Q′ by an output action with an exponential distribution Fū, then we have a step from the parallel process P | Q to the parallel process P′ | Q′ by a fusion action with an exponential distribution Fϕ, where the new fusion ϕ contains the class [x̃] ∪ [ỹ]. This means that not only x̃
and ỹ fuse, but all the names in the equivalence class of x̃ fuse with those of the equivalence class of ỹ. Fϕ is called the synchronization distribution. The indices appearing on the arrows are used to distinguish different derivations of the same stochastic fusion process, and they are designed such that every derivation of a process has a unique index [4]. We denote by I the set of these indices; I is the smallest set such that 0 ∈ I, j ∈ I, k ∈ I ⇒ j.k ∈ I, and i, j ∈ I ⇒ (i, j) ∈ I. Another way to keep track of the transition derivations is by using proof trees [3].

Example 1. The following examples illustrate the use of these indices. We can see how, whenever we get the same result by various derivations, different derivations are distinguished by their indices.

1. Consider the process P = (α, F).0 + (α, F).0, where α can be an input, an output or a fusion. The following transitions can be inferred:
   (α, F).0 + (α, F).0 −(α, F)→_{1.1} 0
   (α, F).0 + (α, F).0 −(α, F)→_{2.1} 0
2. Consider the process Q = (α, F).0 | ((α, F).0 + (α, F).0). Then:
   Q −(α, F)→_{(1,0)} 0 | ((α, F).0 + (α, F).0)
   Q −(α, F)→_{(0,1.1)} (α, F).0 | 0
   Q −(α, F)→_{(0,2.1)} (α, F).0 | 0
3. Consider the process R = (ux, Fu).0 | ((ūy, Fū).0 + (ūy, Fū).0). We have the following transitions:
   R −([x]∪[y], Fϕ)→_{(1,1.1)} 0 | 0
   R −([x]∪[y], Fϕ)→_{(1,2.1)} 0 | 0

Example 2. In this example we show how the rules are used. Let us infer the transition:
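The indexing discipline of Example 1 can be reproduced mechanically. The sketch below is ours (COM and the distributions are omitted); it implements only the PREF, SUM and PARL/PARR index bookkeeping on a toy process representation:

```python
# Processes: ("nil",), ("pref", act, cont), ("sum", p, q), ("par", p, q)

def transitions(p):
    """Yield (action, index, successor) triples. Indices follow the paper's
    scheme: a prefix step has index 1, summand j contributes j.k, and a step
    by the left (right) component of p | q gets index (i,0) ((0,i))."""
    kind = p[0]
    if kind == "pref":
        yield p[1], "1", p[2]
    elif kind == "sum":
        for j, branch in enumerate((p[1], p[2]), start=1):
            for act, k, q in transitions(branch):
                yield act, f"{j}.{k}", q
    elif kind == "par":
        left, right = p[1], p[2]
        for act, i, q in transitions(left):
            yield act, f"({i},0)", ("par", q, right)
        for act, i, q in transitions(right):
            yield act, f"(0,{i})", ("par", left, q)

NIL = ("nil",)
A = ("alpha", "F")                                  # a generic prefix (alpha, F)
P = ("sum", ("pref", A, NIL), ("pref", A, NIL))     # item 1 of Example 1
Q = ("par", ("pref", A, NIL), P)                    # item 2 of Example 1

print([i for _, i, _ in transitions(P)])  # ['1.1', '2.1']
print([i for _, i, _ in transitions(Q)])  # ['(1,0)', '(0,1.1)', '(0,2.1)']
```

Both derivations of P reach the same state 0, yet their indices 1.1 and 2.1 keep them apart, which is exactly the multiplicity information the cumulative rate function of Section 4 needs.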
(z)(P | (uxy, Fu).Q | (ūzw, Fū).R) −([y]∪[w], Fϕ)→_{(0,(1,2))} (P | Q | R){x/z}

We use the following rules:

PREF:   (uxy, Fu).Q −(uxy, Fu)→_1 Q
PREF:   (ūzw, Fū).R −(ūzw, Fū)→_2 R

COM:    from (uxy, Fu).Q −(uxy, Fu)→_1 Q and (ūzw, Fū).R −(ūzw, Fū)→_2 R, infer
        (uxy, Fu).Q | (ūzw, Fū).R −(ϕ, Fϕ)→_{(1,2)} Q | R, where ϕ = {[x] ∪ [z], [y] ∪ [w]}

PARR:   from (uxy, Fu).Q | (ūzw, Fū).R −(ϕ, Fϕ)→_{(1,2)} Q | R, infer
        P | (uxy, Fu).Q | (ūzw, Fū).R −(ϕ, Fϕ)→_{(0,(1,2))} P | Q | R

SCOPE:  from P | (uxy, Fu).Q | (ūzw, Fū).R −(ϕ, Fϕ)→_{(0,(1,2))} P | Q | R, infer
        (z)(P | (uxy, Fu).Q | (ūzw, Fū).R) −([y]∪[w], Fϕ)→_{(0,(1,2))} (P | Q | R){x/z}
Remark: In the last transition we use {[x] ∪ [z], [y] ∪ [w]} \ z = [y] ∪ [w].

3.1   Synchronization Distribution
Let Fi = 1 − e−λi t , i = 1, 2 be the distributions of two interacting processes. There are several ways to define the distribution Fϕ of synchronization. We define the rate of the synchronization distribution using the apparent rate as in PEPA [8]. The apparent rate rα (P ) of an action α in a process P is the sum of the rates of all actions α which are enabled in P . In PEPA the synchronization does not require complementary actions. When synchronizing two processes P and Q, where P may enable many α-actions and Q may enable many β-actions, the rate r of the synchronization process is computed using the formula: r=
rate(Fα ) rate(Fβ ) × × min{rα (P ), rβ (Q)} rα (P ) rβ (Q)
where rα (P ) is the apparent rate of an action α in process P , which is the sum of the rates of all possible actions α enabled in P , i.e., rα (P ) = rate(Fj ). P
(α,Fj )
→ Pj
The ratio rate(Fα)/rα(P) represents the probability that a transition from P by an α-action with distribution Fα occurs; this ratio expresses the race policy. If there is a single α enabled in P and a single β enabled in Q, the right-hand side of the above equation reduces to min(rate(Fα), rate(Fβ)). In stochastic fusion calculus we adapt the synchronization of PEPA, considering that the actions α and β involved in a synchronization are complementary (a similar approach is used in stochastic π-calculus [13]). In this way, by using the rule COM, the rate in the definition of the synchronization distribution Fϕ becomes

    rate(Fϕ) = (rate(Fu) / rux(P)) × (rate(Fu̅) / ru̅y(Q)) × min{rux(P), ru̅y(Q)}

where ϕ = [x̃] ∪ [ỹ]. If there is nothing to send on the channels u and u̅, then ϕ = 1, i.e., all the equivalence classes remain as they are. In such a situation we consider that rate(F1) is infinite, or a number large enough, meaning that the interaction takes place instantaneously.
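To make the race-policy formula concrete, here is a small Python sketch; this is our own illustration, not from the paper — the encoding of a process as a list of (action, rate) pairs and the helper names are invented:

```python
# Illustrative sketch of the PEPA-style synchronization rate used above.
# An exponential distribution F = 1 - exp(-λt) is represented by its rate λ,
# and a process by the actions it enables, as (action, rate) pairs.

def apparent_rate(process, action):
    """r_α(P): the sum of the rates of all α-actions enabled in P."""
    return sum(rate for a, rate in process if a == action)

def sync_rate(p, q, alpha, beta):
    """Total rate of synchronizing the α-actions of p with the β-actions of q:
    each pair contributes rate(Fα)/rα(P) * rate(Fβ)/rβ(Q) * min{rα(P), rβ(Q)}."""
    ra = apparent_rate(p, alpha)
    rb = apparent_rate(q, beta)
    return sum((la / ra) * (lb / rb) * min(ra, rb)
               for a, la in p if a == alpha
               for b, lb in q if b == beta)

P = [("ux", 2.0), ("ux", 4.0)]  # two α-actions: rα(P) = 6
Q = [("u'y", 3.0)]              # one complementary action: rβ(Q) = 3
print(sync_rate(P, Q, "ux", "u'y"))  # 3.0 = min{6, 3}, as the text predicts
```

With a single action on each side the sum collapses to min(rate(Fα), rate(Fβ)), matching the special case discussed above.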
4 Stochastic Hyperbisimulation
The definition of the stochastic hyperbisimulation is closely related to the definition of probabilistic bisimilarity for probabilistic transition systems [6,9], and to the notion of lumpability for Markov chains [5]. Two processes P and Q are lumping equivalent, denoted P ∼ Q, if for every equivalence class S under ∼, the total rate of moving from P into S equals the total rate of moving from Q into S.
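A minimal sketch of this lumpability condition (the chain, its rates, and the partition below are invented for illustration):

```python
# Two states are lumping equivalent iff, for every block S of the partition,
# they have the same total rate of moving into S.
RATES = {  # rate matrix of a small CTMC: RATES[s][t] = rate from s to t
    "p": {"s1": 1.0, "s2": 2.0},
    "q": {"s1": 2.0, "s2": 1.0},
    "s1": {}, "s2": {},
}

def total_rate(state, block):
    """Total rate of moving from `state` into the set of states `block`."""
    return sum(r for t, r in RATES[state].items() if t in block)

def lumpable(partition):
    """All states in a block must agree on the total rate into every block."""
    return all(
        len({total_rate(s, block2) for s in block1}) == 1
        for block1 in partition for block2 in partition
    )

print(lumpable([{"p", "q"}, {"s1", "s2"}]))   # True: both p and q leave at rate 3
print(lumpable([{"p", "q"}, {"s1"}, {"s2"}])) # False: rates into {s1} differ
```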
From Gene Regulation to Stochastic Fusion
If R --(α,Fα)-->_i R′, we define γα : SFC × SFC → ℝ by γα(R, R′) = rate(Fα). We first define the cumulative rate function.

Definition 7. γα : SFC × P(SFC) → ℝ is the cumulative rate function given by: for all α ∈ N, R ∈ SFC, S ⊆ SFC,

    γα(R, S) = Σ_{i∈I} { rate(Fα) | R --(α,Fα)-->_i R′, R′σ ∈ S } = Σ_{i∈I ∧ R′σ∈S} γα(R, R′)

for some substitutive effect σ of α, if α is a fusion. Essentially, γα(R, S) represents the cumulative rate of transitions labelled by α from a process R to a subset S of processes.

Definition 8 (Stochastic Bisimulation). A stochastic bisimulation is an equivalence relation R over the set SFC of processes satisfying the following property: for each pair (P, Q) ∈ R, for all actions α, and for all equivalence classes S ∈ SFC/R, we have γα(P, S) = γα(Q, S), where γα(R, S) = Σ_{i∈I} { rate(Fα) | R --(α,Fα)-->_i R′, R′σ ∈ S }, for some substitutive effect σ of a fusion α.
Two processes P and Q are stochastic bisimilar, written P ∼̇SH Q, if they are related by a stochastic bisimulation. Stochastic bisimilarity is not a congruence, and the following example is illustrative:

    (y, Fy) | (z, Fz) ∼̇SH (y, Fy).(z, Fz) + (z, Fz).(y, Fy)                              (1)
    [y]∪[z].((y, Fy) | (z, Fz)) ≁̇SH [y]∪[z].((y, Fy).(z, Fz) + (z, Fz).(y, Fy))          (2)
We therefore look for the largest congruence included in stochastic bisimilarity. This is achieved by closing the definition of stochastic bisimulation under arbitrary substitutions.

Definition 9 (Stochastic Hyperbisimulation). A stochastic hyperbisimulation is an equivalence relation R over SFC satisfying the following properties:
i) R is closed under any substitution σ, i.e., P R Q implies Pσ R Qσ for any σ;
ii) for each pair (P, Q) ∈ R, for all actions α, and for all equivalence classes S ∈ SFC/R, we have γα(P, S) = γα(Q, S).

P and Q are stochastic hyperbisimulation equivalent (or stochastic hyperequivalent), written P ∼SH Q, if they are related by a stochastic hyperbisimulation.
Example 3. (y, Fy) | (ȳ, Fȳ) ∼SH (y, Fy).(ȳ, Fȳ) + (ȳ, Fȳ).(y, Fy) + 1.
We have to show that the equivalence relation R = {(P, Q), (0, 0)} is a stochastic hyperbisimulation, where P ≡ (y, Fy) | (ȳ, Fȳ) and Q ≡ (y, Fy).(ȳ, Fȳ) + (ȳ, Fȳ).(y, Fy) + 1. The only equivalence class is S = {P, Q, 0}. The only transition that can be inferred from P is

    (y, Fy) | (ȳ, Fȳ) --(1,F1)-->(1,1) 0 | 0 ≡ 0 ∈ S.

The only transition that can be inferred from Q is

    (y, Fy).(ȳ, Fȳ) + (ȳ, Fȳ).(y, Fy) + 1 --(1,F1)-->(1,1) 0 | 0 ≡ 0 ∈ S.

Hence we have γ1(P, S) = rate(F1) = γ1(Q, S).
Definition 10. A process context C is given by the syntax:

    C ::= [ ] | μ.C | C1 + C2 | C | P | P | C | (x)C | if xϕy then C1 else C2

C[P] denotes the result of filling the hole in the context C by the process P. The elementary contexts are μ.[ ], [ ] + P, [ ] | P, P | [ ], (x)[ ], and if xϕy then [ ] else [ ]. The set of all stochastic fusion calculus contexts is denoted SFC[ ].

Theorem 2 (Congruence). Stochastic hyperequivalence is a congruence, i.e., for P, Q ∈ SFC and C ∈ SFC[ ], P ∼SH Q implies C[P] ∼SH C[Q].

Proof. The idea of this proof originates from [10]. However, the proof is a bit different, because we insist that bisimulations be equivalences, and we reason in terms of the function γα rather than using the underlying transitions; this compensates when we add the probabilistic distributions. Note that for an expression C[P], any variable in P is either bound within P, free within P but bound within C[P], or free both within P and C[P]. It is enough to show that the equivalence closure R̄ of R = {(C[P], C[Q]) | P ∼SH Q, C ∈ SFC[ ] such that C[P], C[Q] ∈ SFC} is a stochastic hyperbisimulation.
5 Axiomatization of the Stochastic Hyperequivalence
We present a sound and complete axiomatization of ∼SH for stochastic fusion calculus. Such an axiomatization makes it possible to prove the stochastic hyperequivalence of processes at a syntactic level. The axiomatization extends the original axiomatization of Parrow and Victor with stochastic axioms; in particular, axiom S4 is new and uses an additive property of the exponential distribution. The axiomatization is also related to the one presented in [1].

We use M, N to stand for a condition xϕy in the if-then-else operator, where ϕ is a fusion relation, and define the names occurring in M by n(M) = {x, y}. We use a simplified notation for the if-then-else operator, namely M?P:Q, and add a scope law for the structural congruence: (x)M?P:Q ≡ M?(x)P:(x)Q, if x ∉ n(M). Note that if we have M?(N?P:Q):Q, then we can write MN?P:Q, where MN is the conjunction of the conditions M and N. A sequence of conditions x1ϕy1 x2ϕy2 … xkϕyk is ranged over by M̃, Ñ, and we say that M̃ implies Ñ, written M̃ ⇒ Ñ, if the conjunction of all conditions in M̃ logically implies all elements in Ñ (similarly for M̃ ⇔ Ñ).

Definition 11 ([11]). A substitution σ agrees with a sequence of conditions M̃, and M̃ agrees with σ, if for all x, y which appear in M̃, σ(x)ϕσ(y) iff M̃ ⇒ xϕy.

We define ASHE, a stochastic extension of the axiom system presented in [11].

Summation
S1 P + 0 = P
S2 P + Q = Q + P
S3 P + (Q + R) = (P + Q) + R
S4 (α, F1α).P + (α, F2α).P = (α, F).P, where F is the distribution function of the minimum of the two, given by property c) of the exponential distribution.

Scope
R1 (x)0 = 0
R2 (x)(y)P = (y)(x)P
R3 (x)(P + Q) = (x)P + (x)Q
R4 (x)(α, Fα).P = (α, Fα).(x)P, if x ∉ fn(α)
R5 (x)(α, Fα).P = 0, if x is the subject of α

If-Then-Else
I1 M̃?P:Q = Ñ?P:Q, if M̃ ⇔ Ñ
I2 xϕy?P:Q = xϕy?(P{x/y}):Q
I3 M?P:P′ + M?Q:Q′ = M?(P + Q):(P′ + Q′)
I4 xϕx?P:Q = P
I5 P = xϕy?P:0 + xϕy?0:P

If-Then-Else and Scope
IR1 (x)yϕz?P:Q = yϕz?(x)P:(x)Q, if x ≠ y, x ≠ z
IR2 (x)xϕy?P:0 = 0, if x ≠ y

Fusion
F1 (ϕ, Fϕ).P = (ϕ, Fϕ).(xϕy?P:Q), if xϕy
F2 (z)(ϕ, Fϕ).P = (ϕ \ z, Fϕ).P, if z ∉ fn(P)

Expansion
E If P ≡ Σ_i Mi?(x̃i)(αi, Fαi).Pi : 0 and Q ≡ Σ_j Nj?(ỹj)(βj, Fβj).Qj : 0, where all the names in Mi (Nj) are related by fusion ϕi (ϕj, respectively), then we have:

    P | Q = Σ_i Mi?(x̃i)(αi, Fαi).(Pi | Q) : 0 + Σ_j Nj?(ỹj)(βj, Fβj).(P | Qj) : 0
          + Σ_{αi ≡ uz̃i ∧ βj ≡ u̅w̃j} MiNj?(x̃i)(ỹj)(ϕ, Fϕ).(Pi | Qj) : 0,

where ϕ = [z̃i] ∪ [w̃j], Fϕ is the synchronisation distribution, and x̃i ∉ fn(αi), ỹj ∉ fn(βj).

We also have the following derived rules:

If-Then-Else
DM1 xϕx?P:Q = P
DM2 xϕy?(α, Fα).P:Q = xϕy?((α, Fα).(xϕy?P:R)):Q
DM3 M̃?P:Q = M̃?(Pσ):Q, for σ agreeing with M̃
DM4 M̃?0:0 = 0
DM5 M̃?P:P + P = P

Fusion
DF1 (ϕ, Fϕ).P = (ϕ, Fϕ).(Pσ), where σ agrees with ϕ
DF2 (z)(ϕ, Fϕ).P = (ϕ \ z, Fϕ).(P{w/z}), if zϕw and z ≠ w
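Axiom S4 rests on the closure of exponential distributions under minimum (the property c) invoked above); for independent Fi = 1 − e^(−λi t), a one-line reminder:

```latex
F(t) \;=\; \Pr[\min(X_1,X_2) \le t]
      \;=\; 1 - \Pr[X_1 > t]\,\Pr[X_2 > t]
      \;=\; 1 - e^{-\lambda_1 t}\, e^{-\lambda_2 t}
      \;=\; 1 - e^{-(\lambda_1+\lambda_2)t},
```

so the minimum is again exponentially distributed, with rate λ1 + λ2.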
Theorem 3 (Soundness). ASHE is sound, i.e., ⊢ASHE P = Q implies P ∼SH Q.
Proof. We follow [10] in order to prove the soundness of the axioms which do not involve the distribution of the transitions. For the other axioms we follow the proof presented in [1]. We present here the proof regarding the expansion axiom. We write R for the right-hand side of axiom E, and we show that P | Q ∼SH R. We consider a relation E given by E = {(P | Q, R)} ∪ Id. There are three cases, induced by the three terms of R, denoted R1, R2, R3, respectively. We refer here to the third term of R. By applying PASS, ITE1, and in the end SUM, we get for P | Q:

    (uz̃k, Fu).Pk --(uz̃k,Fu)-->_m Pk
    ⇒ (x̃k)(uz̃k, Fu).Pk --(uz̃k,Fu)-->_m (x̃k)Pk                 by PASS
    ⇒ Mk?(x̃k)(uz̃k, Fu).Pk : 0 --(uz̃k,Fu)-->_m (x̃k)Pk          by ITE1
    ⇒ P --(uz̃k,Fu)-->_{k.m} (x̃k)Pk                              by SUM     (3)

We have similar transitions for Q, and by applying the COM rule we obtain:

    P --(uz̃k,Fu)-->_{k.m} (x̃k)Pk    Q --(u̅w̃l,Fu̅)-->_{l.m} (ỹl)Ql
    ──────────────────────────────────────────────────────────────   ϕ = [z̃k] ∪ [w̃l]
    P | Q --(ϕ,Fϕ)-->_{(k.m,l.m)} (x̃k)Pk | (ỹl)Ql

For R we apply PREF and, since x̃i ∉ fn(αi) and ỹj ∉ fn(βj), we apply PASS twice, ITE1 twice, and in the end SUM:

    MkNl?(x̃k)(ỹl)([z̃k] ∪ [w̃l], Fϕ).(Pk | Ql) : 0 --(ϕ,Fϕ)-->_m (x̃k)(ỹl)(Pk | Ql)
    R3 --(ϕ,Fϕ)-->_{kl.m} (x̃k)(ỹl)(Pk | Ql)

where ϕ = [z̃k] ∪ [w̃l], and kl is the index of a term from the sum R3. Therefore, by SUM,

    R --(ϕ,Fϕ)-->_{kl.m} (x̃k)(ỹl)(Pk | Ql)

Finally, by applying the scope extension law twice, we get:

    (x̃k)Pk | (ỹl)Ql = (x̃k)(ỹl)(Pk | Ql)
Theorem 4 (Completeness). ASHE is complete, i.e., P ∼SH Q implies ⊢ASHE P = Q.
6 Conclusion
The formalism we describe in this paper is the stochastic fusion calculus; it is suitable for describing the dynamic behaviour of biological systems with many-to-one or many-to-many interactions, and it is thus able to capture various aspects and behaviours of complex biological systems. There exist attempts to extend concurrency formalisms with quantitative information defining probabilistic [9] or stochastic aspects [2]. A probabilistic approach to the quantitative aspects of fusion calculus is presented in [1],
where the probabilistic extensions to fusion calculus follow two directions. The first kind of extension is along the lines of classical action timing based on stochastic process algebras. The second kind deals with possibly incomplete effects of fusion actions.

In this paper we introduced the stochastic fusion calculus, defining its syntax and operational semantics. The stochastic nature shows up in the labelled transition system of the operational semantics through the labels, which carry the rates of the corresponding exponential distributions. We extended the notion of hyperbisimulation to stochastic fusion calculus, proved that stochastic hyperequivalence is a congruence, and presented an axiomatic system for stochastic hyperbisimulation.
Acknowledgement. Many thanks to my former student Laura Cornăcel for her contribution.
References
1. Ciobanu, G., Mishra, J.: Performance Analysis and Name Passing Errors in Probabilistic Fusion. Scientific Annals of "A.I. Cuza" University XVI, 57–76 (2005)
2. de Alfaro, L.: Stochastic Transition Systems. In: Sangiorgi, D., de Simone, R. (eds.) CONCUR 1998. LNCS, vol. 1466, pp. 423–438. Springer, Heidelberg (1998)
3. Degano, P., Priami, C.: Proved Trees. In: Kuich, W. (ed.) ICALP 1992. LNCS, vol. 623, pp. 629–640. Springer, Heidelberg (1992)
4. van Glabbeek, R.J., Smolka, S., Steffen, B., Tofts, C.: Reactive, Generative and Stratified Models for Probabilistic Processes. Information and Computation 121, 59–80 (1995)
5. Hermanns, H.: Interactive Markov Chains. LNCS, vol. 2428. Springer, Heidelberg (2002)
6. Jonsson, B., Larsen, K., Yi, W.: Probabilistic Extensions of Process Algebras. In: Handbook of Process Algebra, pp. 685–710. Elsevier, Amsterdam (2001)
7. Kuttler, C., Niehren, J.: Gene Regulation in the Pi Calculus: Simulating Cooperativity at the Lambda Switch. In: Priami, C., Ingólfsdóttir, A., Mishra, B., Riis Nielson, H. (eds.) Transactions on Computational Systems Biology VII. LNCS (LNBI), vol. 4230, pp. 24–55. Springer, Heidelberg (2006)
8. Hillston, J.: A Compositional Approach to Performance Modelling. PhD thesis, University of Edinburgh (1994)
9. Larsen, K.G., Skou, A.: Bisimulation through Probabilistic Testing. Information and Computation 94, 1–28 (1991)
10. Milner, R., Parrow, J., Walker, D.: A Calculus of Mobile Processes. Information and Computation 100, 1–40 (1992)
11. Parrow, J., Victor, B.: The Fusion Calculus: Expressiveness and Symmetry in Mobile Processes. In: 13th IEEE Symposium on Logic in Computer Science, pp. 176–185. IEEE Computer Society, Los Alamitos (1998)
12. Parrow, J., Victor, B.: The tau-Laws of Fusion. In: Sangiorgi, D., de Simone, R. (eds.) CONCUR 1998. LNCS, vol. 1466, pp. 99–114. Springer, Heidelberg (1998)
13. Priami, C.: Stochastic π-calculus. The Computer Journal 38, 578–589 (1995)
14. Regev, A., Shapiro, E.: The π-calculus as an Abstraction for Biomolecular Systems. In: Ciobanu, G., Rozenberg, G. (eds.) Modelling in Molecular Biology. Natural Computing Series, pp. 219–266. Springer, Heidelberg (2004)
A Biologically Inspired Model with Fusion and Clonation of Membranes

Giorgio Delzanno¹ and Laurent Van Begin²,⋆

¹ Università di Genova, Italy
[email protected]
² Université Libre de Bruxelles, Belgium
[email protected]
Abstract. P-systems represent an important class of biologically inspired computational models. In this paper, we study computational properties of a variation of P-systems with rules that model in an abstract way fusion and clonation of membranes. We focus our attention on extended P-systems with an interleaving semantics and symbol objects and we investigate decision problems like reachability of a configuration, boundedness (finiteness of the state space), and coverability (verification of safety properties). In particular we use the theory of well-structured transition systems to prove that both the coverability and the boundedness problems are decidable for PB systems with fusion and clonation. Our results represent a preliminary step towards the development of automated verification procedures for concurrent systems with biologically inspired operations like fusion and clonation.
1 Introduction
In recent years several efforts have been spent to define unconventional computing models inspired by biological systems. One interesting family of this kind of models is that of P-systems [15]. P-systems are a basic model of the living cell defined by a set of hierarchically organized membranes and by rules that dynamically distribute elementary objects in the component membranes. Several variations of the basic model have been proposed in the literature, e.g., with active membranes [16], with string objects [15], with dissolution [7], division [16], and gemmation rules [4]. The PB-systems of Bernardini and Manca [3] represent one of the variants of the basic model in which rules can operate on the boundary of a membrane. A boundary rule can be used to move multisets of objects across a membrane. In biological modelling, PB-systems can be used to express complex interactions among biological membranes [10]. In this paper we take PB-systems as a starting point for studying computational properties of an extension of P-systems with two biologically inspired operations, namely fusion and clonation of membranes. Membrane fusion is defined in other artificial models of the living cell like the bio-ambients of Regev
⋆ Research fellow supported by the Belgian National Science Foundation (FNRS).
C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, pp. 64–82, 2008. © Springer-Verlag Berlin Heidelberg 2008
et al. [18], a model based on process algebra. A restricted form of division, namely replication, is also provided in biologically inspired models based on process algebra. The replication operation !P is defined (via the congruence !P ≡ P | !P) in such a way that an arbitrary number of copies of the process P can be generated during the execution of a system. Since the process !P cannot interact with the environment, replication can be viewed as an operation that dynamically injects new processes in their initial state. Differently from replication, we consider here a clonation operation in which a membrane can generate a copy of its current state (which may include other membranes). This unconventional form of dynamic process generation is inspired by biological processes. Indeed, it can be viewed as an abstract version of the division operation introduced in the context of P-systems in [17].

In this paper we focus our attention on decision problems related to basic qualitative properties of our extended notion of PB systems (PBFC systems, for short). Specifically, we investigate the decidability of properties like reachability of a given configuration, boundedness (finiteness of the state space), and coverability (reachability of an infinite set of configurations sharing a given pattern). We study all these properties using an interleaving operational semantics (i.e., no maximal parallelism) with no priorities. Furthermore, we consider membranes containing multisets (i.e., unordered collections) of objects. These limitations allow us to explore the expressiveness of our biologically inspired model independently of specific execution strategies and of additional orderings on elementary objects. Similar decision problems for the qualitative analysis of subclasses of P-systems have been studied, e.g., in [14,6,7].

Our technical results are as follows. We first show that reachability for PBFC is undecidable. This result follows from a weak encoding of multi-counter machines in PBFC. The encoding is weak in the following sense: some execution paths of the PBFC system that simulates the counter machine may take a wrong turn into a path that does not correspond to a simulation of the original model. We can, however, use information inserted in the target configuration of a reachability problem to restrict the simulation to good paths only. The encoding exploits the property that the set of reachable configurations of a PBFC model may contain configurations with unbounded width (due to the presence of clonation) and with multisets of objects of unbounded size (e.g., due to the presence of internal and boundary rules). This property, however, is not sufficient to obtain a Turing-equivalent model. Indeed, we show that boundedness and coverability are both decidable for PBFC systems. These results are based on the theory of well-structured transition systems developed in [1,9] for the analysis of infinite-state (concurrent) systems. Such a theory finds here a novel application to unconventional computing models.

The introduction of maximal parallelism and/or priorities would lead to a Turing-complete model, as in the case of PB-systems. The analysis of a model with interleaving semantics represents, however, a promising preliminary step towards the development of automated verification procedures for concurrent
models similar to ours with unconventional (biologically inspired) primitives like fusion and clonation. Related work. To our current knowledge, these are the first (un)decidability results obtained for qualitative analysis of extensions of PB systems with both fusion and clonation rules, interleaving semantics and symbol objects. Decidability results for basic PB systems have been obtained in [6,7]. Specifically, in [6] Dal Zilio and Formenti proved that reachability, boundedness and coverability are decidable for PB systems with symbol objects by using a reduction to Petri nets. In [7] we proved that reachability is still decidable for an extension of PB systems with creation of new membranes with fixed content (e.g., an empty membrane) or with membrane dissolution, but not both. Interestingly, boundedness and coverability turn out to be undecidable with creation rules. We consider here operations of different nature (e.g., clonation cannot force the newly created membrane to be empty and does not increase the depth of configurations). The resulting extension of PB systems satisfies indeed different properties (undecidability of reachability, decidability of coverability and boundedness). The universality problem of different form of division in P-systems with active membranes has been studied in [17,2]. Differently from [17,2], we consider here a more abstract notion of division, we called clonation, and different decision problems like coverability more related to verification of qualitative problems. Similar verification problems have been investigated for other variations of P-systems (e.g., signaling and catalytic P-systems) in [14,12].
2 Preliminaries
In this section we recall the main definitions of well-structured transition systems [1,9] and PB-systems [10]. We first need some preliminary notions. Let ℕ be the set of positive integers. Consider a finite alphabet Γ of symbols. A multiset over Γ is a mapping u : Γ → ℕ. For any a ∈ Γ, the value u(a) denotes the multiplicity of a in u (the number of occurrences of symbol a in u). We often write a multiset as a string a1 · … · an of symbols ai ∈ Γ. Furthermore, we use ε to denote the empty multiset, i.e., such that ε(a) = 0 for any a ∈ Γ. As an example, for Γ = {a, b, c, d}, the string abcc represents the multiset u such that u(a) = u(b) = 1, u(c) = 2, and u(d) = 0. We use Γ⊗ to denote the set of all possible multisets over the alphabet Γ. Given two multisets u, v over Γ, we write u ⪯ v if u(a) ≤ v(a) for all a ∈ Γ. We use u ≺ v to denote that u ⪯ v and u ≠ v. Furthermore, we use ⊕ and ⊖ to denote multiset union and difference, respectively: for any a ∈ Γ we have (u ⊕ v)(a) = u(a) + v(a) and (u ⊖ v)(a) = max(0, u(a) − v(a)), where max(a, b) returns the larger of a and b.

Well-structured transition systems. A transition system is a tuple G = (S, T) where S is a (possibly infinite) set of configurations and T ⊆ S × S is a transition relation between configurations. We use γ → γ′ to denote (γ, γ′) ∈ T. A quasi-ordering (S, ⪯) is a well-quasi ordering (wqo for short) if for any infinite sequence
s1 s2 … si … there exist indexes i < j such that si ⪯ sj. A transition system G = (S, T) is a well-structured transition system (wsts for short) with respect to a quasi-order ⪯ ⊆ S × S iff: (i) ⪯ is a well-quasi ordering; and (ii) for any configurations γ1, γ1′, γ2 such that γ1 ⪯ γ1′ and γ1 → γ2, there exists γ2′ such that γ1′ → γ2′ and γ2 ⪯ γ2′, i.e., G is monotonic. A wsts is said to be strictly monotonic when γ1 ≺ γ1′ (i.e., γ1 ⪯ γ1′ and γ1 ≠ γ1′) implies γ2 ≺ γ2′.

P-systems with Boundary Rules. A PB system [3] with symbol objects is a tuple Π = (Γ, N, M, R, μ0), where:
– Γ is a finite alphabet of symbols.
– N is a finite set of membrane names/types.
– M is a finite tree representing the membrane structure. Each node n of M corresponds to a membrane and is labelled with a membrane name/type type(n) ∈ N. We use nodes(M) to denote the set of nodes of M.
– R is a finite set of rules.
– μ0 : nodes(M) → Γ⊗ is the initial configuration, i.e., a mapping from membranes to multisets of objects from Γ.

Rules can be of the following two forms¹:

(1) Internal:  [i u → [i v
(2) Boundary:  u [i v → u′ [i v′

where i ∈ N and u, u′, v, v′ ∈ Γ⊗, and we assume in boundary rules that at least one of u and u′ is not empty.

The semantics of PB-systems is given in terms of transition systems. The set of configurations of a PB system Π is the set of distributions of objects of Γ in the membranes in M, i.e., a configuration μ is a mapping from nodes(M) to Γ⊗. The transition relation is defined as follows. A rule of the form (1) is enabled at μ if there exists a membrane n ∈ nodes(M) with type(n) = i and u ⪯ μ(n). Its application leads to a new configuration μ′ such that μ′(n) = (μ(n) ⊖ u) ⊕ v and μ′(n′) = μ(n′) for any other node n′ ∈ nodes(M) such that n′ ≠ n. Suppose now that a membrane m ∈ nodes(M) with type(m) = j contains as immediate successor in M a node n with type(n) = i. A rule of the form (2) is enabled at μ if u ⪯ μ(m) and v ⪯ μ(n). Its application leads to a new configuration μ′ such that μ′(m) = (μ(m) ⊖ u) ⊕ u′, μ′(n) = (μ(n) ⊖ v) ⊕ v′, and μ′(m′) = μ(m′) for any node m′ ∈ nodes(M) such that m′ ∉ {m, n}. We have a transition from μ to μ′, i.e., μ → μ′, if μ′ can be obtained from μ by
¹ We consider here a slight generalization of the model in [6] in which we allow any kind of transformation between two membranes.
applying a rule in R. A computation with initial configuration μ0 is a sequence of transitions μ0 → μ1 → … → μk. A configuration μ is reachable from μ0 if there exists a computation μ0 → μ1 → … → μk with μk = μ.
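The enabling condition and update of rules (1) and (2) can be sketched in Python. This is our own encoding, not the paper's: a configuration maps membrane node names to `Counter` multisets, and the helper names are invented.

```python
from collections import Counter

def contains(small, big):
    """Multiset inclusion: small ⪯ big."""
    return all(big[a] >= n for a, n in small.items())

def apply_boundary(cfg, m, n, u, v, u2, v2):
    """Fire a boundary rule  u [i v -> u' [i v'  with membrane n inside m.
    Returns the successor configuration, or None if the rule is not enabled."""
    u, v, u2, v2 = map(Counter, (u, v, u2, v2))
    if not (contains(u, cfg[m]) and contains(v, cfg[n])):
        return None            # enabledness: u ⪯ μ(m) and v ⪯ μ(n)
    nxt = dict(cfg)
    nxt[m] = cfg[m] - u + u2   # μ'(m) = (μ(m) ⊖ u) ⊕ u'
    nxt[n] = cfg[n] - v + v2   # μ'(n) = (μ(n) ⊖ v) ⊕ v'
    return nxt

cfg = {"m": Counter("ab"), "n": Counter("c")}
# Move object a from the parent m into the child n: rule  a [i -> [i a
nxt = apply_boundary(cfg, "m", "n", "a", "", "", "a")
print(nxt["n"])  # the child now holds both a and c
```

Internal rules are the special case acting on a single membrane, with the same ⊖/⊕ update.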
3 PB Systems with Fusion and Clonation
In this paper, we investigate an extension of PB-systems with two new operations, namely fusion and clonation of membranes. On one side, these two new operations allow us to describe basic behaviors of a living cell. On the other side, they complicate the formalism in that the membrane structure may dynamically change during the evolution. The state space of an instance of the extended model may be infinite in two dimensions: in the size of the membrane structures generated during the evolution of the system, and in the number of objects produced inside the corresponding membranes. Formally, a PB system with fusion and clonation rules (PBFC, for short) provides, in addition to internal and boundary rules, two other kinds of rules of the following form:

(3) Fusion:    [i u [j v → [k w
(4) Clonation: [i u → [i v [i w

where i, j, k ∈ N and u, v, w ∈ Γ⊗. The rule (3) models the fusion of a membrane m with type(m) = i containing the multiset of objects u with a membrane m′ with type(m′) = j containing the multiset of objects v. Objects in u and v are consumed during this process. The fusion of the two membranes generates a new membrane n with type(n) = k that contains w and the (remaining) contents of both m and m′. A clonation rule like (4) clones a sub-tree rooted at a membrane n with type(n) = i containing the multiset of objects u. During the clonation, the objects in u are consumed and replaced by the multiset of objects v in n and by the multiset of objects w in the clone of n. This definition allows us to define perfect clones (i.e., two copies of the same membrane) or to distinguish the clone from the original membrane by using the objects w and v, respectively. The latter type of clonation can be used to disable a second application of clonation immediately after the generation of the clone (i.e., to avoid clonation rules that are enabled forever and thus applied without control).

To make the semantics formal, we make the membrane structure part of the current configuration, M0 being the initial tree. Thus, a configuration is now a pair c = (M, μ), where M is a tree and μ : nodes(M) → Γ⊗ is a mapping from nodes of M to Γ⊗. Rules of type (1) and (2) operate on a configuration c = (M, μ) without changing the tree structure M, changing μ as specified in the semantics of PB systems. A fusion rule like (3) operates on a configuration c = (M, μ) as follows. Suppose that m and m′ are two descendants of a node p in M such that type(m) = i and type(m′) = j. The rule is enabled if u ⪯ μ(m) and v ⪯ μ(m′). Its application leads to a new configuration c′ = (M′, μ′) such that
– M′ is the tree obtained by removing the nodes m and m′, adding a new node n with type(n) = k, and letting all successor nodes of m and m′ become successors of n. The parent node of n is p, the parent of the nodes m and m′ in the tree M;
– μ′ is the mapping defined as μ′(n) = (μ(m) ⊖ u) ⊕ (μ(m′) ⊖ v) ⊕ w and μ′(n′) = μ(n′) for any other node n′ ∈ nodes(M′) such that n′ ≠ n.

A clonation rule like (4) operates on a configuration c = (M, μ) as follows. Suppose that M has a node m with a successor n with type(n) = i. The rule is enabled if u ⪯ μ(n). Its application leads to a new configuration c′ = (M′, μ′) such that
– M′ is the tree obtained by adding a new copy of the tree rooted at n as a sub-tree of m;
– μ′ is the mapping defined as follows. For any node n′ in the sub-tree rooted at n, let Clone(n′) be its copy in the new sub-tree. Then we have that
  • μ′(n) = (μ(n) ⊖ u) ⊕ v;
  • μ′(Clone(n)) = (μ(n) ⊖ u) ⊕ w;
  • μ′(Clone(n′)) = μ(n′) for any node n′ ≠ n in the sub-tree rooted at n;
  • μ′(n′) = μ(n′) for the other nodes n′ ∈ nodes(M).

The notions of computation and reachable configuration extend naturally to PBFC systems.

Example 1. Consider a PBFC system with Γ = {a, b, c, d, e, f, g, h, u, v, w} and N = {i}. For simplicity, configurations are represented here as terms. Specifically, objects are represented as constants, and a membrane of type i containing t1, …, tn as a term of the form [t1, …, tn]. Hence, [a [b] [c]] represents the configuration whose root node contains the object a and two membranes, with objects b and c, respectively. Now consider the initial configuration [a [b [d]] [c]] and the following set of rules:

(r1) [i d → [i f [i g      (clonation)
(r2) b [i g → e [i h       (boundary)
(r3) [i e → [i v [i u      (clonation)
(r4) [i u [i c → [i w      (fusion)
Then, we have the following computation:

[a [b [d]] [c]] →r1 [a [b [f] [g]] [c]] →r2 [a [e [f] [h]] [c]] →r3 [a [v [f] [h]] [u [f] [h]] [c]] →r4 [a [v [f] [h]] [w [f] [h]]]

Decision Problems. In this paper, we focus our attention on decision problems related to the dynamic behavior of PBFC systems. The first problem we consider is the reachability of a configuration.
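Steps r1 and r4 of Example 1 can be replayed with a toy encoding (ours, not the paper's): a membrane is a pair (Counter of objects, list of child membranes).

```python
from collections import Counter
from copy import deepcopy

def membrane(objs, children=()):
    return (Counter(objs), list(children))

def clone(m, u, v, w):
    """Clonation [i u -> [i v [i w: consume u in m, return the updated
    membrane (marked with v) and its clone (marked with w)."""
    objs, children = m
    rest = objs - Counter(u)           # the objects in u are consumed
    return ((rest + Counter(v), children),
            (rest + Counter(w), deepcopy(children)))

def fuse(m1, u, m2, v, w):
    """Fusion [i u [j v -> [k w: merge two sibling membranes, consuming u
    and v, adding w; the new membrane inherits the children of both."""
    return ((m1[0] - Counter(u)) + (m2[0] - Counter(v)) + Counter(w),
            m1[1] + m2[1])

# rule r1 applied to [d]: the membrane clones into [f] and [g]
orig, copy_ = clone(membrane("d"), "d", "f", "g")
print(sorted(orig[0]), sorted(copy_[0]))    # ['f'] ['g']

# rule r4: [u [f] [h]] fuses with [c] into [w [f] [h]]
fused = fuse(membrane("u", [membrane("f"), membrane("h")]), "u",
             membrane("c"), "c", "w")
print(sorted(fused[0]), len(fused[1]))      # ['w'] 2
```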
Definition 1 (Reachability). Given a PBFC system Π with initial configuration (M0, μ0) and a configuration (M, μ), the reachability problem consists in checking if (M, μ) is reachable from (M0, μ0).

The reachability problem is decidable for PB systems with symbol objects [6], and it remains decidable for PB systems with dissolution [7] and for PB systems with dynamic creation of membranes [7]. When adding fusion and clonation, two-counter machines can be weakly simulated with PBFC, in the sense that some executions of the PBFC system do not simulate an execution of the counter machine. However, in those cases a particular membrane contains objects that allow us to distinguish such executions from the ones simulating the counter machine, by looking at the content of that membrane in the last configuration of the execution. Hence, the reachability problem is undecidable for PBFC systems.

Theorem 1. The reachability problem is undecidable for PBFC systems.

Proof. We reduce the reachability problem for two-counter machines. Our reduction uses the following types of membranes: cs (for control structure), c1, c2, and trash. Each configuration has a root node of type cs. Counters are encoded with membranes of types c1 and c2. Those membranes and one trash membrane are the children of the root. The trash membrane is used to validate executions. The set of objects contains the control states of the counter machine and their primed versions, i.e., the root membrane containing control state l means that the counter machine is in control state l; the primed versions correspond to intermediate states. We also have objects o that are used to encode the values of the counters, i.e., a membrane of type c1 with k objects o represents the value k for the first counter, and the objects active1, active2, clone1, clone2, fusion and fused. The six last objects are auxiliary objects used in the simulation.
The initial configuration, with control state l0 and both counters equal to 0, is encoded by a configuration where the root (of type cs) contains l0, the child of type ci contains the object activei, and the trash membrane is empty. An increment of counter i from location l1 to location l2 is simulated by a rule

    l1 [cs activei → l2 [cs activei · o

A decrement of counter i from location l1 to location l2 is simulated by a rule

    l1 [cs activei · o → l2 [cs activei

Finally, a zero test on counter i and a move from l1 to l2 is simulated by four rules. The first two clone membrane ci:

    l1 [cs activei → l1′ [cs clonei
    [ci clonei → [ci activei [ci fusion

The next rule can be fired only after the preceding ones, and fuses the trash membrane with the copy of membrane ci containing the fusion object:

    [trash [ci fusion → [trash fused
A Biologically Inspired Model with Fusion and Clonation of Membranes
Finally, after the fusion the control state moves to l2 by applying the following rule:

l1 [cs fused → l2 [cs

Notice that if we simulate a test for zero on counter i and the membrane ci contains at least one object o, then the trash membrane contains at least one object o after the simulation. Furthermore, there is no rule that decreases the number of o in trash. Hence, trash remains empty as long as the PBFC system correctly simulates the counter machine. So, the state with control state l and counter ci equal to vi is reachable iff the configuration where the root node contains l, its child ci contains vi instances of o (and one object activei) and the trash membrane is empty is reachable. □

Theorem 1 shows the power of PBFC systems: they have the same expressive power as Turing machines when reachability of a particular configuration is taken as the accepting condition. However, as we prove in the remainder of the paper, and contrary to Turing machines, some interesting properties of PBFC systems can be checked automatically. In particular, we concentrate on two other important decision problems, namely boundedness and coverability of a configuration. The boundedness problem poses a basic question on the behavior of a system, i.e., the finiteness of its state space.

Definition 2 (Boundedness). Given a PBFC system Π with the initial configuration (M0, μ0), the boundedness problem consists in deciding whether the set of configurations that are reachable from (M0, μ0) is finite.

The coverability problem is a weaker form of reachability often used for qualitative analysis and verification of infinite-state systems [1,9]. Instead of checking whether a specific configuration is reachable, coverability is defined as the reachability of a (typically infinite) set of configurations that share a certain pattern.
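The effect of the three gadgets of the proof (increment, decrement, zero test) can be illustrated at the level of an abstraction that records only the counter values and the trash content; the dictionary representation below is our own simplification for illustration, not the paper's notation.

```python
def increment(cfg, i):
    # l1 [cs active_i -> l2 [cs active_i . o : one more object o in membrane c_i
    cfg = dict(cfg)
    cfg[f"c{i}"] += 1
    return cfg

def decrement(cfg, i):
    # l1 [cs active_i . o -> l2 [cs active_i : enabled only if c_i holds an o
    assert cfg[f"c{i}"] > 0
    cfg = dict(cfg)
    cfg[f"c{i}"] -= 1
    return cfg

def zero_test(cfg, i):
    # clone c_i and fuse the copy (carrying every o of c_i) into trash:
    # testing a non-empty counter leaves a visible trace in trash
    cfg = dict(cfg)
    cfg["trash"] += cfg[f"c{i}"]
    return cfg
```

An execution of the PBFC system corresponds to a run of the counter machine exactly when trash stays empty, which is what the reachability query of Theorem 1 checks.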
The most convenient way to formalize this idea is to introduce an ordering on configurations and to formulate the coverability problem with respect to upward-closed sets of configurations. Since PBFC configurations have a tree structure, it seems natural to consider here patterns of trees and, thus, to introduce an ordering on trees (with multisets of objects as labels). Specifically, we use here the tree embedding (a.k.a. Kruskal) order ≤K over trees [13].

Definition 3 (Tree Embedding). Let M and M′ be two trees with sets of nodes S and S′, respectively, and assume a quasi order ⪯ on the nodes of trees. Then M ≤K M′ iff there exists an injection ρ : S → S′ such that (i) for all n ∈ S, n ⪯ ρ(n), and (ii) for all n, n′ ∈ S, we have that n′ is in the sub-tree rooted at n iff ρ(n′) is in the sub-tree rooted at ρ(n).

In the case of PBFC configurations, the order ⪯ is defined as follows. Given a node n of a configuration (M, μ) and a node n′ of (M′, μ′), n ⪯ n′ iff type(n) = type(n′) and μ(n) ⊆ μ′(n′) (multiset inclusion). From the Kruskal tree theorem [13] (the version for unordered trees can be found in [5]), we know that if ⪯ is a well-quasi ordering (wqo) then ≤K is also a
G. Delzanno and L. Van Begin
wqo (see the preliminaries for the definition of wqo). By Dickson's lemma [8], the order ⪯ is a wqo. Thus, the order ≤K is a wqo over PBFC configurations. The coverability problem is then defined as follows:

Definition 4 (Coverability). Given a PBFC system Π with the initial configuration (M0, μ0) and a configuration (M, μ), the ≤K-coverability problem consists in checking whether there is a configuration (M′, μ′) which is reachable from (M0, μ0) and such that (M, μ) ≤K (M′, μ′).

The intuition here is that the configuration (M, μ) defines the pattern of the set of configurations for which we ask the reachability question. A pattern is defined here as a tree with certain objects in each node.

A note about PBFC systems and (extended) Petri nets. In [6], it is shown that PB systems can be encoded into (mimicked by) Petri nets. A Petri net is composed of a finite set of places P and a finite set of transitions T. A Petri net configuration m : P → N, called a marking, assigns m(p) (black) tokens to each place p ∈ P. Each transition t ∈ T removes/adds a fixed number of tokens from/to each place p ∈ P (see [20] for a more detailed description of Petri nets). For instance, a transition may remove one token from a place p1 and add one token into another place p2. The boundedness problem and the ≤-coverability problem are defined as in the case of PBFC systems, where the order over markings is the pointwise extension of the order ≤ over N. When fusion and clonation are added to PB systems, transfer Petri nets can be simulated. Those models are Petri nets extended with transfer arcs that move all the tokens contained in one place to another in one step. Hence, the number of tokens transferred is not fixed a priori. Since transfer Petri nets are more expressive than Petri nets [11], contrary to PB systems, PBFC systems cannot be encoded into (mimicked by) Petri nets. A transfer Petri net N is encoded into a PBFC system as follows. For each place p we have a membrane name/type p.
We also have a membrane name/type N. A marking m is encoded into a configuration composed of a root membrane of name/type N which has, for each place p, two children of name/type p. The first child of type p contains only an object wp and is used to simulate transfers from the place p. The second child contains one object okp and as many objects • as the number of tokens assigned by m to p, i.e., it is used to encode the content of the place p. The root membrane contains an object that describes the state of the simulation: the PBFC system is either ready to simulate a new transition or currently simulating one. Fig. 1 shows how a transfer from place p to p′ is simulated: the membrane of type p encoding the content of the place p is fused with the membrane of type p′ encoding the content of the place p′. Moreover, the other membrane of name/type p is cloned and the new copy is used to encode the content of the place p after the transfer. Let r be the object contained in the root membrane of type N when the PBFC system is ready to simulate a new transition, and assume that the transition t consists only of a transfer from the place p to p′. Then, the transition t is simulated with the following rules.
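For comparison, a transfer arc differs from an ordinary Petri net transition in that the number of moved tokens is not fixed a priori. A minimal sketch of the two firing rules (the marking layout as a dictionary is an assumption of this example):

```python
def fire_ordinary(marking, consume, produce):
    # ordinary transition: removes/adds a fixed number of tokens per place
    assert all(marking.get(p, 0) >= k for p, k in consume.items())
    m = dict(marking)
    for p, k in consume.items():
        m[p] -= k
    for p, k in produce.items():
        m[p] = m.get(p, 0) + k
    return m

def fire_transfer(marking, src, dst):
    # transfer arc: moves *all* tokens of src to dst in one step
    m = dict(marking)
    m[dst] = m.get(dst, 0) + m.get(src, 0)
    m[src] = 0
    return m
```

The transfer step is exactly what the fusion/clonation gadget of Fig. 1 reproduces at the membrane level.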
[Figure omitted: three tree diagrams showing (a) the starting configuration, (b) the simulation of the transfer, and (c) the final configuration.]

Fig. 1. Simulation of a transfer with a PBFC system. Rectangles represent membranes.
The first two rules clone the membrane of type p containing the object wp:

r [N wp → r1 [N cp
[p cp → [p wp [p clonedp

The next three rules fuse the membranes encoding the contents of the places p and p′ into a unique membrane of type p′ (encoding the content of the place p′ after the transfer). Those rules can be applied only after the two previous ones.

r1 [N clonedp → r2 [N clonedp
r2 [N okp → r3 [N fpp
[p fpp [p okp → [p fusedp

Finally, the cloned membrane of name/type p becomes the membrane encoding the content of the place p, and the result of the fusion encodes p′.

r3 [N fusedp → r4 [N okp
r4 [N clonedp → r [N okp
4
Decidability of Boundedness for Extended PB Systems
In this section, we prove that the boundedness problem is decidable for PBFC systems. To achieve that goal, we use the theory of well-structured transition systems [1,9]. In order to apply the results provided in [1,9], we must first prove that PBFC systems together with a wqo over configurations form well-structured transition systems (see the preliminaries for definitions). We first notice that PBFC systems with the Kruskal order are not well-structured. Indeed, consider only one type of membrane i and a boundary rule r = a [i b → c [i d. Now consider two configurations (M, μ) and (M′, μ′). The first one is composed of two nodes, the root and its unique child. The root contains the object a and its child contains b. Hence, r is applicable. The second
configuration is similar to the first one except that there is an intermediate membrane between the root and its child. That intermediate membrane contains the object c; hence r is not applicable, and condition (ii) of the definition of wsts (monotonicity) does not hold. Thus, we cannot directly use the theory of well-quasi orderings for PBFC systems and the order ≤K to solve the boundedness problem. Instead, we use another order, denoted ≤D, for which PBFC systems are strictly monotonic. Assume two trees M and M′ with S and S′ as sets of nodes and r and r′ as roots, respectively. Assume also a quasi order ⪯ on the nodes of trees.

Definition 5 (The order ≤D). We say that M ≤D M′ iff there exists an injection ρ : S → S′ such that ρ(r) = r′, for all n ∈ S, n ⪯ ρ(n), and for all n, n′ ∈ S, we have that n′ is a child of n iff ρ(n′) is a child of ρ(n).

In the case of PBFC configurations, the order ⪯ between labels of nodes is multiset inclusion, as for ≤K. For any pair of configurations c and c′, we write c <D c′ when c ≤D c′ holds and c′ ≤D c does not.
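For small configurations, Definition 5 can be checked by brute force over injections of children into children; the Node class and the search below are our own illustration, not part of the paper.

```python
from itertools import permutations
from collections import Counter

class Node:
    def __init__(self, mtype, objs, children=()):
        self.mtype, self.objs, self.children = mtype, Counter(objs), list(children)

def label_leq(n, m):
    # n ⪯ m : same membrane type and multiset inclusion of the contents
    return n.mtype == m.mtype and all(n.objs[o] <= m.objs[o] for o in n.objs)

def leq_D(n, m):
    # search for an injection of n's children into m's children that
    # preserves the parent/child relation (roots are mapped to roots)
    if not label_leq(n, m) or len(n.children) > len(m.children):
        return False
    return any(all(leq_D(c, m.children[i]) for c, i in zip(n.children, idx))
               for idx in permutations(range(len(m.children)), len(n.children)))
```

Unlike ≤K, no intermediate membranes may be skipped: a pattern does not embed into a configuration that interposes an extra membrane, which is exactly what restores monotonicity.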
– Internal rule. First, notice that the tree structure of configurations does not change when applying an internal rule. Furthermore, if an internal rule [i u → [i v is applied on the node n, then the internal rule can also be applied on ρ(n), since (a) i = type(n) = type(ρ(n)) and (b) μ(n) ⊆ μ′(ρ(n)). Moreover, from (b) we deduce that (μ(n) ⊖ u) ⊕ v ⊆ (μ′(ρ(n)) ⊖ u) ⊕ v. Hence, by defining ρ′ = ρ we conclude that the successor configurations are again related by ≤D. Finally, since the tree structure does not change, if M′ has strictly more nodes than M then the same holds for the successors. Furthermore, if M contains a node n′ such that μ(n′) ≺ μ′(ρ(n′)), then this strict inclusion also holds for the successors. Hence, we conclude that the successors are related by <D.
M′. Hence, if the number of nodes in M is strictly smaller than in M′, then the number of nodes in the successor of M is strictly smaller than in the successor of M′. Furthermore, if M contains a node n such that μ(n) ≺ μ′(ρ(n)), then the same strict inclusion holds for the successors. Hence, we conclude that the successors are related by <D. □
5
Decidability of Coverability for Extended PB Systems
We now apply the theory of well-structured transition systems [1,9] to solve the ≤K-coverability problem for PBFC systems. Since PBFC systems together with the wqo ≤K do not form a wsts, we cannot directly apply the results of [1,9]. Instead, we reuse the wqo ≤D introduced in the previous section. The ≤D-coverability problem is defined similarly to the ≤K-coverability problem by replacing ≤K with ≤D. In the remainder of the section, we show how to instantiate the algorithm proposed in [1,9] to solve the ≤D-coverability problem. Then, we show how to use that result to solve the ≤K-coverability problem.

5.1 Solving the ≤D-Coverability Problem
The algorithm in [1,9] that solves the coverability problem manipulates upward-closed sets of configurations. Remember that we restrict the set of configurations of a b-depth bounded PBFC system to b-depth bounded configurations. A set of PBFC configurations U is upward-closed if and only if for any pair of configurations c, c′, we have that c ∈ U and c ≤D c′ imply c′ ∈ U. Since ≤D is a wqo (and a partial order) over b-depth bounded configurations, any upward-closed set U of b-depth bounded configurations has a finite set of
minimal elements, i.e., a finite and minimal (with respect to ⊆) set B such that U = {c | ∃c′ ∈ B : c′ ≤D c}. Given a set of b-depth bounded configurations F, F↑ denotes the upward-closed set of b-depth bounded configurations {c | ∃c′ ∈ F : c′ ≤D c}. If F is finite, then we say that it is a finite basis of F↑. Notice that a finite basis may contain useless elements, i.e., configurations c such that there exists another configuration c′ in the basis with c′ ≤D c.
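Pruning useless elements from a finite basis only needs the (decidable) order itself; a generic sketch, where leq is an arbitrary quasi order passed as a parameter:

```python
def minimize_basis(basis, leq):
    """Keep only elements whose upward closure is not already covered
    by another kept element."""
    kept = []
    for c in basis:
        if any(leq(d, c) for d in kept):
            continue                      # c is useless: some kept d is below it
        kept = [d for d in kept if not leq(c, d)]
        kept.append(c)
    return kept
```

For instance, with pairs of naturals under the pointwise order, the basis {(1,2), (0,1), (2,0), (0,3)} reduces to {(0,1), (2,0)}.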
Notice that the intersection of upward-closed sets is always an upward-closed set.
Proposition 2. There exists an algorithm that builds, given a b-depth bounded PBFC system Π and a finite basis of an upward-closed set U of b-depth bounded configurations, a finite basis of the set Pre(U).

Proof. Let F be a finite basis of U and F′ be the finite basis of Pre(U) computed by the algorithm. The algorithm is defined by cases as follows.

Internal rule. Assume an internal rule [i u → [i v and c = (M, μ) ∈ F. We consider two cases: either the rule is backwardly applied on a node which appears explicitly in c, or not. To take into account the first case, for any node n with type(n) = i³ we have that F′ contains the configuration (M, μ′) such that μ′(n) = (μ(n) ⊖ v) ⊕ u and μ′(n′) = μ(n′) for all n′ ≠ n. For the second case, we must ensure that the predecessors have a node containing the objects u. Notice that the new node may appear at any depth (bounded by b). Hence, F′ contains any b-depth bounded configuration (M′, μ′) obtained from c by adding a sub-tree composed of nodes n1, . . . , nk (k ≥ 1) such that for all 1 ≤ j < k, type(nj) is any membrane type, μ′(nj) = ∅ and nj has nj+1 as unique child; type(nk) = i and μ′(nk) = u.

Boundary rule. Assume a boundary rule u [i v → u′ [i v′ and c = (M, μ) ∈ F. We consider three cases:
– The rule is applied on two nodes that appear explicitly in c. In that case, for any pair of nodes n, n′ of c such that type(n) = i and n′ is a child of n, the following holds. We have a predecessor configuration that contains a node m with type(m) = i with one child m′ such that m contains the objects (μ(n) ⊖ u′) ⊕ u and m′ contains the objects (μ(n′) ⊖ v′) ⊕ v. More precisely, F′ contains the configuration (M, μ′) where μ′(n) = (μ(n) ⊖ u′) ⊕ u, μ′(n′) = (μ(n′) ⊖ v′) ⊕ v, and μ′(n′′) = μ(n′′) for all n′′ ∉ {n, n′};
– The rule is applied on one node of type i that explicitly appears in c and on one of its children that does not appear explicitly. In that case, for any node n of c such that type(n) = i, the following holds.
We have predecessor configurations that contain a node m with type(m) = i containing the objects (μ(n) ⊖ u′) ⊕ u and having one child containing the objects v. More precisely, F′ contains a configuration (M′, μ′) where M′ is obtained from M by adding one new child n′ to n, where type(n′) is any membrane type, and μ′ is such that μ′(n) = (μ(n) ⊖ u′) ⊕ u, μ′(n′) = v and μ′(n′′) = μ(n′′) for all n′′ ∉ {n, n′};
– Finally, the rule may be applied on two nodes that do not explicitly appear in c. In that case, the predecessor configurations must contain two nodes containing the objects u and v, respectively. As in the case of internal rules, those nodes may appear at any depth (smaller than or equal to b). Hence, F′ contains any b-depth bounded configuration (M′, μ′) obtained from c by adding nodes n1, . . . , nk (k ≥ 2) such that for all 1 ≤ j < k−1, type(nj) is any type, μ′(nj) = ∅ and nj has nj+1 as unique child; type(nk−1) = i, μ′(nk−1) = u and nk−1 has nk as unique child; type(nk) is any type and μ′(nk) = v.
³ We do not impose that v ⊆ μ(n) because some elements of v may not appear explicitly in μ(n).
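The explicit-node case of the internal rule can be sketched directly from the formula μ′(n) = (μ(n) ⊖ v) ⊕ u; the configuration layout (node id → (type, Counter)) is our own, and ⊖ is a truncated multiset difference, in line with the footnote above.

```python
from collections import Counter

def pre_internal_explicit(config, i, u, v):
    # backward-apply the internal rule [i u -> [i v on every node of type i
    preds = []
    for n, (mtype, objs) in config.items():
        if mtype != i:
            continue
        new = Counter(objs)
        new.subtract(v)     # mu(n) "minus" v; counts may go negative ...
        new = +new          # ... so clamp at 0: v need not be included in mu(n)
        new.update(u)       # ... "plus" u
        preds.append({**config, n: (mtype, new)})
    return preds
```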
Fusion rule. Assume a fusion rule [i u [j v → [k w. We consider two cases:
– The fused sub-tree does not appear explicitly in c. In that case, we must ensure that in the predecessor configurations at least one node has one child n′ with type(n′) = i containing the objects u and one child n′′ with type(n′′) = j containing the objects v. Notice that those nodes may appear at any depth smaller than or equal to b. More precisely, F′ contains all the b-depth bounded configurations (M′, μ′) where M′ is built from M by adding nodes n1, . . . , nk, nk+1 (k ≥ 1) such that n1 is a child of a node n0 appearing in M; for all 1 ≤ i ≤ k−1, ni is a child of ni−1 and type(ni) is any membrane type; and nk and nk+1 are children of nk−1, with type(nk) = i and type(nk+1) = j. Furthermore, μ′(nk) = u, μ′(nk+1) = v, for all 1 ≤ i ≤ k−1, μ′(ni) = ∅, and for all nodes n′ ∉ {n1, . . . , nk+1} we have μ′(n′) = μ(n′);
– The fused sub-tree appears explicitly in c. In that case, M contains a node n with type(n) = k. Then, F′ contains all the configurations (M′, μ′) such that M′ is built from M by replacing the sub-tree rooted at n by two sub-trees rooted at nodes n′ with type(n′) = i and n′′ with type(n′′) = j. The sub-trees rooted at the children of n′ and n′′ form the set of sub-trees rooted at the children of n. Furthermore, the function μ′ satisfies the following: μ′(n′) and μ′(n′′) are multisets of objects such that u ⊆ μ′(n′), v ⊆ μ′(n′′) and (μ′(n′) ⊖ u) ⊕ (μ′(n′′) ⊖ v) = μ(n) ⊖ w; and for all m ∉ {n′, n′′}, μ′(m) = μ(m).

Clonation rule. Assume a rule [i u → [i v [i w. We consider several cases:
– The cloned sub-tree and the resulting sub-tree of the clonation do not appear in c. In that case, the predecessor configurations must contain a node of type i containing the objects u at any depth smaller than or equal to b. More precisely, F′ contains any configuration (M′, μ′) obtained from (M, μ) by adding nodes n1, . . .
, nk (k ≥ 1) such that for all 1 ≤ j < k, type(nj) is any type, μ′(nj) = ∅ and nj has nj+1 as unique child; type(nk) = i and μ′(nk) = u;
– The cloned sub-tree appears explicitly in c but the resulting sub-tree of the clonation does not. For any node n of c such that type(n) = i, F′ contains a configuration (M, μ′) where μ′(n) = (μ(n) ⊖ v) ⊕ u and μ′(n′) = μ(n′) for all n′ ≠ n;
– The resulting sub-tree of the clonation appears explicitly in c but the cloned sub-tree does not. For any node n of c such that type(n) = i, F′ contains a configuration (M, μ′) where μ′(n) = (μ(n) ⊖ w) ⊕ u and μ′(n′) = μ(n′) for all n′ ≠ n;
– The cloned sub-tree and the resulting sub-tree of the clonation both appear explicitly in c. For any nodes n and n′ such that n and n′ have the same parent n′′, type(n) = i and type(n′) = i, we have the following. Consider the configuration (M, μ′) such that μ′(n) = μ(n) ⊖ v, μ′(n′) = μ(n′) ⊖ w, and μ′(m) = μ(m) for all m ∉ {n, n′}. Assume that (Mn, μn) and (Mn′, μn′) are the configurations corresponding to the sub-trees rooted at n and n′, respectively, and let B be a finite basis of the set {(Mn, μn)}↑ ∩ {(Mn′, μn′)}↑. Notice that the set B is computable by Lemma 3. For any (M∩, μ∩) ∈ B, F′ contains a
configuration (M′′, μ′′) built from (M, μ′) by removing the sub-trees rooted at n and n′ and by adding the tree M∩ as a sub-tree of n′′. The mapping μ′′ coincides with μ′ on the nodes that do not appear in M∩; for the root r of M∩ we have μ′′(r) = μ∩(r) ⊕ u, and for the other nodes m ≠ r of M∩ we have μ′′(m) = μ∩(m). □

Notice that the union of upward-closed sets is computed by taking the union of their finite bases, and the convergence of the sequence (Pi)i≥0 can be decided by testing ≤D on the elements of the finite bases. Since ≤D is decidable, from Lemma 2 and Proposition 2 we get the decidability of the ≤D-coverability problem. Moreover, the encoding given in Section 3 allows us to reduce the ≤-coverability problem for transfer Petri nets to the ≤D-coverability problem for PBFC systems (in polynomial time), and a trivial modification of the construction given in [19] allows us to prove that the coverability problem for transfer Petri nets has nonprimitive recursive complexity. Hence the next proposition.

Proposition 3. The ≤D-coverability problem is decidable for PBFC systems and has nonprimitive recursive complexity.

5.2 Decidability of the ≤K-Coverability Problem
We now show how to reduce the ≤K-coverability problem to the ≤D-coverability problem. Let c = (M, μ) be a b-depth bounded configuration and let M(c, b) be the set of b-depth bounded configurations (M′, μ′) built from c by possibly adding intermediate nodes between the nodes of M. Those intermediate nodes n have any membrane type, satisfy μ′(n) = ∅, and have exactly one successor, either another intermediate node or a node appearing in M. In other words, the configurations in M(c, b) have a structure similar to that of c, except that the nodes of M are potentially at a larger depth (bounded by b). By definition, any b-depth bounded configuration (M′′, μ′′) has a depth bounded by b ∈ N. Hence, we have that (M, μ) ≤K (M′′, μ′′) if and only if there exists a b-depth bounded configuration (M′, μ′) ∈ M(c, b) such that (M′, μ′) ≤D (M′′, μ′′). Since the set M(c, b) is finite and computable, we can solve the ≤K-coverability problem for a b-depth bounded PBFC system Π and a b-depth bounded configuration c = (M, μ) by solving ≤D-coverability problems for Π and the configurations in M(c, b). Finally, the encoding of transfer Petri nets into PBFC systems given in Section 3 and [19] allow us to conclude that the complexity of the ≤K-coverability problem is nonprimitive recursive.

Theorem 3. The ≤K-coverability problem is decidable for PBFC systems and has nonprimitive recursive complexity.
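The decision procedure of this section instantiates a generic backward saturation loop; the sketch below takes the pre-basis computation and the order as parameters (the names are ours), and termination is guaranteed by the wqo property, not by the code itself.

```python
def coverable(init, target_basis, pre_basis, leq):
    # grow a finite basis of Pre*(U) until it covers init or stabilizes
    basis = list(target_basis)
    while True:
        if any(leq(b, init) for b in basis):
            return True
        fresh = [p for b in basis for p in pre_basis(b)
                 if not any(leq(d, p) for d in basis)]
        if not fresh:
            return False
        basis.extend(fresh)
```

For instance, in a one-counter system whose only transition adds a token, covering the value 3 from 0 succeeds; with no transitions at all it fails.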
6
Conclusion
In this paper we have investigated the computational properties of a biologically inspired model providing the notion of membrane and rules for fusion and
clonation of membranes. The resulting model, called PBFC, is an extension of PB-systems [3]. PBFC systems seem to be an interesting example of an unconventional computing model in between Turing machines and Petri nets. Indeed, they provide non-trivial built-in operations for the manipulation of tree structured collections of objects (reachability is undecidable). However, as for Petri nets, decision problems that are strictly related to the verification of structural and qualitative properties like boundedness and coverability turn out to be decidable for PBFC systems.
References

1. Abdulla, P.A., Čerāns, K., Jonsson, B., Yih-Kuen, T.: General decidability theorems for infinite-state systems. In: LICS 1996, pp. 313–321 (1996)
2. Alhazov, A., Freund, R., Riscos-Núñez, A.: Membrane division, restricted membrane creation and object complexity in P systems. Computer Mathematics 83(7), 529–547 (2006)
3. Bernardini, F., Manca, V.: P systems with boundary rules. In: Păun, G., Rozenberg, G., Salomaa, A., Zandron, C. (eds.) WMC 2002. LNCS, vol. 2597, pp. 107–118. Springer, Heidelberg (2003)
4. Besozzi, D., Zandron, C., Mauri, G., Sabadini, N.: P systems with gemmation of mobile membranes. In: Restivo, A., Ronchi Della Rocca, S., Roversi, L. (eds.) ICTCS 2001. LNCS, vol. 2202, pp. 136–153. Springer, Heidelberg (2001)
5. Bezem, M., Klop, J.W., de Vrijer, R.: Term Rewriting Systems. Cambridge University Press, Cambridge (2003)
6. Dal Zilio, S., Formenti, E.: On the dynamics of PB systems: a Petri net view. In: Martín-Vide, C., Mauri, G., Păun, G., Rozenberg, G., Salomaa, A. (eds.) WMC 2003. LNCS, vol. 2933, pp. 153–167. Springer, Heidelberg (2004)
7. Delzanno, G., Van Begin, L.: On the dynamics of PB systems with volatile membranes. In: Eleftherakis, G., Kefalas, P., Păun, G., Rozenberg, G., Salomaa, A. (eds.) WMC 2007. LNCS, vol. 4860, pp. 240–256. Springer, Heidelberg (2007)
8. Dickson, L.E.: Finiteness of the odd perfect and primitive abundant numbers with distinct factors. Amer. J. Math. 35, 413–422 (1913)
9. Finkel, A., Schnoebelen, Ph.: Well-structured transition systems everywhere! TCS 256(1–2), 63–92 (2001)
10. Franco, G., Manca, V.: A membrane system for the leukocyte selective recruitment. In: Martín-Vide, C., Mauri, G., Păun, G., Rozenberg, G., Salomaa, A. (eds.) WMC 2003. LNCS, vol. 2933, pp. 181–190. Springer, Heidelberg (2004)
11. Geeraerts, G., Raskin, J.-F., Van Begin, L.: Well-structured languages. Acta Informatica 44(3–4), 249–288 (2007)
12. Ibarra, O.H., Dang, Z., Egecioglu, Ö.: Catalytic P systems, semilinear sets, and vector addition systems. TCS 312(2–3), 379–399 (2004)
13. Kruskal, J.B.: Well-quasi ordering, the tree theorem, and Vazsonyi's conjecture. Trans. American Math. Soc. 95, 210–225 (1960)
14. Li, C., Dang, Z., Ibarra, O.H., Yen, H.-C.: Signaling P systems and verification problems. In: Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung, M. (eds.) ICALP 2005. LNCS, vol. 3580, pp. 1462–1473. Springer, Heidelberg (2005)
15. Păun, Gh.: Computing with membranes. J. of Computer and System Sciences 61(1), 108–143 (2000)
16. Păun, Gh.: P systems with active membranes: attacking NP-complete problems. J. of Automata, Languages and Combinatorics 6(1), 75–90 (2001)
17. Păun, Gh., Suzuki, Y., Tanaka, H., Yokomori, T.: On the power of membrane division in P systems. TCS 324(1), 61–85 (2004)
18. Regev, A., Panina, E.M., Silverman, W., Cardelli, L., Shapiro, E.: BioAmbients: an abstraction for biological compartments. TCS 325(1), 141–167 (2004)
19. Schnoebelen, Ph.: Verifying lossy channel systems has nonprimitive recursive complexity. IPL 83(5), 251–261 (2002)
20. Reisig, W.: Petri Nets – An Introduction. Springer, Heidelberg (1985)
Computing Omega-Limit Sets in Linear Dynamical Systems

Emmanuel Hainry

LORIA, Université Henri Poincaré, Campus scientifique, BP 239, 54506 Vandœuvre-lès-Nancy, France
[email protected]
Abstract. Dynamical systems allow us to model various phenomena or processes by describing only their law of evolution. Studying the global and the limit behaviour of such systems is an important matter. A possible description of this limit behaviour is via the omega-limit set: the set of points that can be limits of subtrajectories. The omega-limit set is in general uncomputable, and it can be highly difficult to apprehend; some systems, for example, have a fractal omega-limit set. However, in some specific cases this set can be computed. This problem is important for verifying properties of dynamical systems, in particular for predicting their collapse or infinite expansion. We prove in this paper that for linear continuous-time dynamical systems the omega-limit set is in fact computable. Moreover, we prove that the ω-limit set is a semi-algebraic set. The algorithm to compute this set can easily be derived from the proof.

Keywords: Dynamical Systems, omega-limit set, hybrid systems, reachable set, verification, safety properties.
1
Introduction
The physical equations that govern interactions between celestial bodies give a local description of the trajectory of those bodies: given the positions and speeds of all the stars and planets, we know the evolution of those variables. Other systems, motivated by meteorological phenomena, chemical interactions, biological examples, mathematical equations or computing systems, can be described in a similar local manner: given any set of instantaneous conditions, the local behaviour of the system is defined. Those examples can be described as dynamical systems. A dynamical system evolves either in discrete time or in continuous time. In both cases, it is defined by an initial position and a dynamics map. In the discrete case, the dynamics predicts, from the conditions at time n, the position at time n + 1. In the continuous case, the direction in which the system moves from a given state point x is a function of x. The evolution of a dynamical system is hence described in a very simple way, but it can be hard to grasp where a point that undergoes the dynamics will go. Hence, one of the fundamental questions about such systems is their asymptotic behaviour. Knowing whether they collapse to one single point, diverge or become

C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, pp. 83–95, 2008. © Springer-Verlag Berlin Heidelberg 2008
periodic is important for grasping the evolution of a dynamical system. The case of celestial bodies is a fine example of the complexity of this problem: we can predict the whole trajectory of a system with two bodies, but as soon as there are three or more bodies, it becomes undecidable whether the bodies will eventually collide. Dynamical systems are much studied and used to describe various phenomena belonging to mathematics [12], physics, biology [18], and so on. The famous Lorenz attractor [15] is an example of a dynamical system describing a meteorological phenomenon. However, as standard as those systems are, and as simple as the description of their dynamics may be, many important problems such as limit and reachability are undecidable. The challenge is of interest in computer science, as computational models can be modeled by dynamical systems. Hybrid systems in particular rely on dynamical systems plus some discrete behaviour and, as such, if a problem is difficult for dynamical systems, it is bound to be more difficult for hybrid systems. The difficulty of predicting the trajectory of dynamical systems is testified by many undecidability results for natural problems on such systems. Some problems are decidable, but undecidability comes fast into the picture. Even considering polynomial systems yields many undecidable problems: [10] for example shows that it is possible to simulate a Turing machine using a polynomial dynamical system. It is hence undecidable whether or not a trajectory will reach the region corresponding to the halting state of the machine. This particular problem can be seen as a continuous version of the Skolem–Pisot problem [17,4,11], which asks whether a component of a discrete linear system will reach 0. This problem is equivalent to deciding whether the system reaches a hyperplane of the space, described by yk = 0 where k is the index of the component considered.
The (point-to-point) reachability problem, which is undecidable in the general case, has been shown undecidable for various restricted classes of dynamical systems, such as Piecewise Constant Derivative systems [7], where the dynamics are really simple: the space is partitioned into regions on each of which the derivative is constant. Other results on reachability and the undecidability of problems in hybrid systems are studied in [1,2,3,5]. The problem of the limit set of dynamical systems is also undecidable in the general case. It is of interest for ensuring safety properties such as the absence of infinite expansion or the ultimate presence in a given region. In this paper, we study this problem, more precisely the ω-limit set, in a simple class of continuous-time dynamical systems: linear dynamical systems. As Turing machines can be encoded in dynamical systems, a description of the ω-limit set would give an answer to the halting problem; hence it is not computable for polynomial dynamical systems. However, this article proves that the ω-limit set is computable for linear dynamical systems and gives a way to compute a semi-algebraic representation of this set. Section 2 presents the problems we are going to solve and mathematical notions that will be useful in the following. The next sections are the core of this paper: they prove the main result, Theorem 1, which asserts that the
ω-limit set of a given linear system is semi-algebraic and thus computable. Section 3 recalls that putting the matrix into Jordan form is effective. Section 4 then shows how to solve the problem in the specific case where the matrix is in Jordan form.
2
Prerequisites
In this section we first present the problems that motivate this document and some basic definitions and results on polynomials and matrices.

2.1 Linear Continuous-Time Dynamical Systems
The dynamics of a linear dynamical system is described by a linear differential equation. To describe such a system, we take a matrix of real numbers, which represents the dynamics, and a vector of reals, which is the initial point. We use here classical definitions and notations that can be found in [13].

Definition 1 (Linear continuous-time dynamical system). Given a matrix A ∈ R^{n×n} and a vector X0 ∈ R^n, we define X as the solution of the following Cauchy problem:

X′ = AX,  X(0) = X0.  (1)

X is called a trajectory of the system.

Definition 2 (Reachability). Given A ∈ R^{n×n}, X0 ∈ R^n, Y ∈ R^n, the system is said to reach Y from X0 if there exists t ∈ R such that X(t) = Y, with X the trajectory defined as the solution of (1).

Definition 3 (ω-limit points). Given a trajectory X, a point Y is an ω-limit point of X if there is a diverging increasing sequence (tn) ∈ R^N such that Y = lim_{n→+∞} X(tn).

Definition 4 (ω-limit sets). The ω-limit set of a dynamical system is the set of its ω-limit points: ω(X) = ⋂_{n} cl(⋃_{t>n} {X(t)}), where cl(A) denotes the closure of the set A.

Definition 5 (Semi-algebraic set). A subset S of R^n is called semi-algebraic if it can be defined by a finite sequence of polynomial equations and inequalities, or a finite union of sets so defined. Formally, it can be written

S = ⋃_{i=1}^{m} { x ∈ R^n : ⋀_j p_{i,j}(x) = 0 ∧ ⋀_l p_{i,l}(x) > 0 }
Let us now define the problem that we will be interested in solving: the ω-limit set problem.

Problem 1 (ω-limit set). Given a dynamical system, compute a representation of its ω-limit set.

Theorem 1 gives an answer to this problem for linear dynamical systems and proves that the ω-limit set is semi-algebraic in this case.
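As a concrete illustration of Definitions 3–5 (not part of the paper's algorithm; the script and its names are ours), consider the planar system X′ = AX with A = ((0, −1), (1, 0)). Its matrix exponential is a rotation, so the trajectory stays on the circle through X_0, and that circle is exactly the ω-limit set — a semi-algebraic set:

```python
import math

# For A = [[0, -1], [1, 0]] (eigenvalues +-i), the solution of X' = AX
# is X(t) = exp(tA) X0, and exp(tA) is the rotation matrix of angle t.
def trajectory(x0, y0, t):
    c, s = math.cos(t), math.sin(t)
    return (c * x0 - s * y0, s * x0 + c * y0)

x0, y0 = 2.0, 0.0
for t in [0.0, 1.0, 5.0, 50.0]:
    x, y = trajectory(x0, y0, t)
    print(f"t={t:5.1f}  X(t)=({x:+.3f},{y:+.3f})  x^2+y^2={x*x + y*y:.6f}")

# The polynomial invariant x^2 + y^2 = x0^2 + y0^2 holds for every t, so
# the omega-limit set is the circle of radius 2 -- a semi-algebraic set.
```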
E. Hainry

2.2 Polynomials
Let us now recall a few notations, mathematical tools and algorithms on polynomials. In the following, we use a field K that is a subfield of C; we will usually use Q as this field.

Definition 6 (Ring of polynomials). We denote by K[X] the ring of polynomials of one variable with coefficients in K. A polynomial can be written as P(X) = Σ_{i=0}^{n} a_i X^i, with a_i ∈ K. The integer n is the degree of P.

Definition 7 (Roots of a polynomial). The set Z(P) of roots of a polynomial P is defined as Z(P) = {x ∈ C ; P(x) = 0}.

Definition 8 (Algebraic numbers). The set of roots of polynomials with coefficients in Q is the set of algebraic numbers. An algebraic number can be represented uniquely by the minimal polynomial it annuls (minimal in Q[X] for divisibility) and a ball containing only one root of that polynomial. Note that the size of the ball can be chosen using only the values of the coefficients of the polynomial, as [16] gives a bound on the distance between roots of a polynomial in terms of its coefficients.

Definition 9 (Representation of an algebraic number). An algebraic number α will be represented by (P, (a, b), ρ) where P is the minimal polynomial of α, a + ib is an approximation of α such that |α − (a + ib)| < ρ, and α is the only root of P in the open ball B(a + ib, ρ).

It can be shown that given the representations of two algebraic numbers α and β, the representations of α + β, α − β, αβ and α/β can be computed: indeed the approximation and bound are easy to obtain, and the minimal polynomial can be obtained using classical properties of the resultant, which gives the polynomial whose roots are the H(α_i, β_j) with H a polynomial. See [6,8] for details.

We will also use the term commensurable in a specific way: two numbers are commensurable if one is a multiple of the other by a rational factor.

Definition 10 (Commensurable numbers). Two numbers a and b are commensurable if there exists a rational number p/q such that a = (p/q) b.
Let us note that it is easy to check whether two algebraic numbers are commensurable: given the representations of those two numbers, we know how to compute the representation of their quotient. It then suffices to check whether this quotient is rational, which is equivalent to its minimal polynomial being of degree 1.

Proposition 1 (Q-linearly independent algebraic numbers). Given the representations of n algebraic numbers p_1, ..., p_n, it is decidable whether they are Q-linearly dependent, that is, whether there exists (α_1, ..., α_n) ∈ Q^n − {0} such that Σ_i α_i p_i = 0. If so, then this n-tuple is computable.
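The commensurability test just described is easy to carry out in a computer algebra system; the sketch below (using SymPy, which is our choice of tool, not the paper's) checks whether the minimal polynomial of the quotient has degree 1:

```python
from sympy import sqrt, minimal_polynomial, degree, Symbol

x = Symbol('x')

def commensurable(a, b):
    # a and b are commensurable iff a/b is rational, i.e. iff the
    # minimal polynomial of a/b over Q has degree 1 (Definition 10).
    return degree(minimal_polynomial(a / b, x), x) == 1

print(commensurable(sqrt(8), sqrt(2)))  # True:  sqrt(8)/sqrt(2) = 2
print(commensurable(sqrt(3), sqrt(2)))  # False: sqrt(3/2) is irrational
```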
2.3 Matrices
Definition 11 (Characteristic polynomial). Given a matrix A ∈ K^{n×n}, its characteristic polynomial is χ_A(X) = det(X I_n − A).

Definition 12 (Exponential of a matrix). Given a matrix A, its exponential, denoted exp(A), is the matrix

exp(A) = Σ_{i=0}^{+∞} (1/i!) A^i.

Note that the exponential is well defined for all real matrices.

Given a square matrix A, we want to solve the Cauchy problem (1). To do that, we will first put the matrix into a useful form, Jordan form, and then compute the exponential of the matrix, as it is known that the solution of the linear differential equation is closely related to the exponential of the matrix A. All matrices can be put in Jordan form, which allows the exponential to be computed easily. To find out more about Jordan matrices and blocks, the reader may consult [13] or [14].

Definition 13 (Jordan block). A Jordan block is a square matrix of one of the two following forms:

\[
\begin{pmatrix} \lambda & & & \\ 1 & \lambda & & \\ & \ddots & \ddots & \\ & & 1 & \lambda \end{pmatrix}
\quad\text{or}\quad
\begin{pmatrix} B & & & \\ I_2 & B & & \\ & \ddots & \ddots & \\ & & I_2 & B \end{pmatrix}
\quad\text{with}\quad
B = \begin{pmatrix} a & -b \\ b & a \end{pmatrix},\quad
I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
\]

Definition 14 (Jordan form). A matrix that contains Jordan blocks on its diagonal is said to be in Jordan form:

\[
\begin{pmatrix}
D_1 & 0 & \cdots & 0 \\
0 & D_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & D_n
\end{pmatrix}
\]

Proposition 2 ([14]). Any matrix A ∈ R^{n×n} is similar to a matrix in Jordan form. In other words, there exist P ∈ GL(R^{n×n}) and J in Jordan form such that A = P^{−1} J P.
3 Computing the Jordan Form of a Matrix
We are given a matrix A and an initial vector X_0 containing rational elements. We want to compute a formal solution of the Cauchy problem (1). To do that, we will compute the Jordan form of this matrix and the similarity matrices. The process of putting the matrix A into Jordan form is a classical one which consists of four parts, detailed in the appendix:

– computing the characteristic polynomial;
– factorizing the polynomial in Q[X] (Section A.1);
– computing the roots (Section A.2);
– jordanizing the matrix (Section A.3).
Let us note that the matrix we obtain is composed of algebraic numbers, hence we know how to compute on those matrices.
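For matrices with rational entries, this pipeline is available off the shelf in computer algebra systems; for instance SymPy's `jordan_form` (our illustration, with the caveat that SymPy places the 1s on the superdiagonal, i.e. the transpose of the convention of Definition 13):

```python
from sympy import Matrix

# chi_A(X) = (X - 2)^2, but the eigenspace of 2 has dimension 1,
# so A is similar to a single 2x2 Jordan block for the eigenvalue 2.
A = Matrix([[1, 1], [-1, 3]])
P, J = A.jordan_form()      # SymPy convention: A == P * J * P**-1
print(J)                    # Matrix([[2, 1], [0, 2]])
assert A == P * J * P.inv()
```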
4 Computing the ω-Limit Set of a Dynamical System
Let us now suppose that the matrix A is in Jordan form and that A and X_0 are composed of algebraic numbers. Our goal is to compute the ω-limit set of the dynamical system defined by the differential equation (1).

4.1 Computing the Solution of the Cauchy Problem
Let us first remark that the solution of this differential equation is simple to express. We have

\[
A = \begin{pmatrix}
D_1 & 0 & \cdots & 0 \\
0 & D_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & D_k
\end{pmatrix}
\]

with the D_i being Jordan blocks of the form

\[
D_i = \begin{pmatrix} \lambda & & & \\ 1 & \lambda & & \\ & \ddots & \ddots & \\ & & 1 & \lambda \end{pmatrix}   (2)
\]

or

\[
D_i = \begin{pmatrix} B & & & \\ I_2 & B & & \\ & \ddots & \ddots & \\ & & I_2 & B \end{pmatrix}
\quad\text{with}\quad
B = \begin{pmatrix} a & -b \\ b & a \end{pmatrix},\quad
I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.   (3)
\]

The solution of the Cauchy problem is then X(t) = exp(tA) X_0, which we can write as

\[
X(t) = \begin{pmatrix}
\exp(tD_1) & & & \\
& \exp(tD_2) & & \\
& & \ddots & \\
& & & \exp(tD_k)
\end{pmatrix} X_0.
\]

And computing the exp(tD_i) is simple: in case D_i is of the form (2),

\[
\exp(tD_i) = e^{t\lambda} \begin{pmatrix}
1 & & & & \\
t & 1 & & & \\
\frac{t^2}{2} & t & 1 & & \\
\vdots & \ddots & \ddots & \ddots & \\
\frac{t^m}{m!} & \cdots & \frac{t^2}{2} & t & 1
\end{pmatrix};
\]

if on the other hand D_i is of the form (3), then

\[
\exp(tD_i) = e^{ta} \begin{pmatrix}
B_2 & & & & \\
tB_2 & B_2 & & & \\
\frac{t^2}{2} B_2 & tB_2 & B_2 & & \\
\vdots & \ddots & \ddots & \ddots & \\
\frac{t^m}{m!} B_2 & \cdots & \frac{t^2}{2} B_2 & tB_2 & B_2
\end{pmatrix}
\quad\text{with}\quad
B_2 = \begin{pmatrix} \cos(tb) & -\sin(tb) \\ \sin(tb) & \cos(tb) \end{pmatrix}.
\]
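Since D_i = λI + N with N nilpotent and the two commuting, the exponential series collapses to a finite sum, which is what gives the triangular matrices above. A small SymPy check (our sketch, for a block of form (2)) reproduces this:

```python
from sympy import exp, factorial, simplify, symbols, zeros

t, lam = symbols('t lam')
n = 3
N = zeros(n, n)
for i in range(1, n):
    N[i, i - 1] = 1           # subdiagonal of 1s: D = lam*I + N, form (2)

# N is nilpotent (N**n == 0) and commutes with lam*I, so
# exp(tD) = e^{t lam} * sum_{k=0}^{n-1} (tN)^k / k!   -- a finite sum.
E = exp(t * lam) * sum(((t * N) ** k / factorial(k) for k in range(n)),
                       zeros(n, n))
print(E)   # lower-triangular: diagonal 1s, then t, then t**2/2, times e^{t lam}
```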
4.2 Simplifying the Matrix
We can, without losing information, delete the Jordan blocks corresponding to zeros of the initial vector. Indeed those blocks have no impact on the behaviour of the system, as the corresponding components will forever stay 0. In the same spirit, we can remove certain lines and columns from the Jordan blocks if they will forever stay 0.

Let us write X_0^i for the components of X_0 corresponding to each D_i. Formally,

\[
X_0 = \begin{pmatrix} X_0^1 \\ X_0^2 \\ \vdots \\ X_0^k \end{pmatrix}
\]

with the size of X_0^i equal to the size of D_i. This way, we can write

\[
X(t) = \begin{pmatrix} \exp(tD_1) X_0^1 \\ \exp(tD_2) X_0^2 \\ \vdots \\ \exp(tD_k) X_0^k \end{pmatrix}.
\]

If X_0^i = 0, then ∀t, X_i(t) = 0. Hence we need not consider the i-th block to compute the system. If the l first components of X_0^j are 0, then we can erase the l first lines and columns from the j-th Jordan block if it is of form (2), and only the 2⌊l/2⌋ first lines and columns if the block is of form (3).
We then obtain a representation of the solution where all Jordan blocks are useful and all dimensions of the Jordan blocks have a repercussion on the result. From now on, when we talk of the multiplicity of a Jordan block, it will refer to its size in this new matrix; moreover, in case (3), as both a + ib and its conjugate have the same influence, the multiplicity of the corresponding Jordan block will be half the size of the matrix.

Definition 15 (Multiplicity of a Jordan block). Let D ∈ R^{m×m} be a Jordan block. If D is of the form (2), its multiplicity is defined as being m. If D is of the form (3), its multiplicity is defined as being m/2.

Notice that for the case (3), the size of the matrix is even. Notice also that, since an eigenvalue can be responsible for more than one Jordan block, the multiplicity of a Jordan block is not the same as the multiplicity of the eigenvalue.

4.3 Computing the ω-Limit Set
There are now a few cases to consider. We will first consider the simplest cases and finish with the more complicated ones. The cases we will consider are:

– there is an eigenvalue with positive real part;
– there is an eigenvalue with null real part corresponding to a Jordan block of multiplicity > 1;
– there are only eigenvalues with negative real part;
– other cases (the only eigenvalues with non-negative real part have null real part and multiplicity one).

If an eigenvalue has a positive real part, then a term e^{λt} appears with λ > 0. It means that the corresponding component will grow unboundedly. Hence the ω-limit set is empty. If an eigenvalue has a null real part, the exponential part disappears, and the first component will have a bounded trajectory. However, since we suppose that the multiplicity is greater than one, a factor t will have to be taken into account. This factor makes the second component grow unboundedly and hence the ω-limit set is empty. In the case where all the eigenvalues have a negative real part, all components will have a decreasing exponential in their expression and, since for every integer m, t^m exp(−t) converges towards 0, all components will converge towards 0. Otherwise, there appears a trajectory that stays in a given region of the space and is either periodic (circles, or multi-dimensional Lissajous curves) or dense in a specific semi-algebraic set.

Theorem 1. Given a linear dynamical system, its ω-limit set is computable and is a semi-algebraic set.

Proof. Let us compute the ω-limit set Ω for the different possible cases.

– If one eigenvalue has a positive real part, then
Ω = ∅. Indeed, the corresponding component diverges towards infinity, hence no real point will be a limit of a subtrajectory.
– If one eigenvalue has a null real part and a multiplicity greater than 1, Ω = ∅. Indeed, the second component related to this eigenvalue will diverge to +∞ due to the t term in the exponential matrix.
– If all eigenvalues have negative real part, all the components will converge to 0, regardless of the multiplicity of the eigenvalues, hence Ω = {0^k}.
– If all eigenvalues are non-positive reals, then all the components corresponding to negative eigenvalues will converge to 0 as in the third case, and the components corresponding to a null eigenvalue will either be constant or diverge to +∞ if the multiplicity is greater than 1. Hence either Ω = {(..., x_{0i}, 0, ...)} or Ω = ∅.
– Otherwise we have complex eigenvalues of null real part and multiplicity 1, and we may have other eigenvalues: either 0 with multiplicity 1 (whose component will be constant), or eigenvalues with negative real part (whose components will converge to 0). Only the complex eigenvalues with null real part are of interest, so let us consider only them for now. We have eigenvalues ib_1, −ib_1, ..., ib_n, −ib_n, with the b_i being real algebraic numbers. There are two cases to consider: either the family (b_1, b_2, ..., b_n) is Q-linearly independent, or it is not.
  • Let us assume that (b_1, ..., b_n) is Q-linearly independent. In this case, the trajectory will not be periodic but instead will be dense in the set of points whose projections on each plane (x_{2k+1}, x_{2k+2}) are the circles defined by x_{2k+1}² + x_{2k+2}² = x_{2k+1,0}² + x_{2k+2,0}². Indeed, it is trivial if n = 1. Let us consider it true for n = k: it means that for any given point (α_{1,1}, α_{1,2}, ..., α_{k,1}, α_{k,2}, α_{k+1,1}, α_{k+1,2}) of that set, there exists a sequence of times (t_i)_{i∈N} such that

‖(x_1(t_i), ..., x_{2k}(t_i)) − (α_{1,1}, ..., α_{k,2})‖ < 1/2^i.

We can similarly, for any α, build a sequence of times (t_j)_{j∈N} such that ‖(x_{2k+1}(t_j), x_{2k+2}(t_j)) − (α_{k+1,1}, α_{k+1,2})‖ < 1/2^j: indeed, there exists a time t_0 such that (x_{2k+1}(t_0), x_{2k+2}(t_0)) = (α_{k+1,1}, α_{k+1,2}), so choosing t_j = t_0 + 2jπ/b_{k+1} verifies this constraint. As the x are continuous functions, those inequalities remain true on neighbourhoods V_i, V_j of those t_i, t_j. As b_{k+1} is not a linear combination of b_1, ..., b_k, for all i_0, j_0 there exist i > i_0 and j > j_0 such that V_i ∩ V_j ≠ ∅. If we take t′ ∈ V_i ∩ V_j, then we have

‖(x_1(t′), ..., x_{2k+2}(t′)) − (α_{1,1}, ..., α_{k+1,2})‖ < 1/2^{i_0+1}.

Hence we have exhibited a sequence that converges towards the said point. Finally,

Ω = {(x_1, ..., x_{2n}) ; ∀i, x_{2i+1}² + x_{2i+2}² = x_{2i+1,0}² + x_{2i+2,0}²}.

  • Let us assume there exist α_1, ..., α_n ∈ Q with α_n ≠ 0 such that Σ_i α_i b_i = 0. Let Ω_1 be the ω-limit set obtained while considering only the n − 1 first components. Let us first recall that

(cos(bt) −sin(bt) ; sin(bt) cos(bt))   is similar to   (e^{ibt} 0 ; 0 e^{−ibt}).

Hence, if we do this change of variables, we obtain |X_i(t)| = |X_{0i}|, and we have ∏_i e^{i b_i α_i t} = 1, so e^{−i b_n α_n t} = ∏_{i=1}^{n−1} e^{i b_i α_i t} and

X_n(t)^{α_n} ∏_{i=1}^{n−1} X_i(t)^{α_i} = X_{0n}^{α_n} ∏_{i=1}^{n−1} X_{0i}^{α_i}.

This polynomial equation is verified by all points of the trajectory and hence constitutes a constraint on the ω-limit set. By an argument similar to the one in the previous item, we can show that the set of points verifying this constraint as well as all the projection constraints is effectively contained in the ω-limit set. Hence, with X_i = x_{2i−1} + i x_{2i}, we have

Ω = Ω_1 ∩ {(x_1, ..., x_{2n}) ; x_{2n−1}² + x_{2n}² = x_{2n−1,0}² + x_{2n,0}² ∧ X_n^{α_n} ∏_{i=1}^{n−1} X_i^{α_i} = X_{0n}^{α_n} ∏_{i=1}^{n−1} X_{0i}^{α_i}}.

In each case, we have been able to give a formal representation of the ω-limit set: either the empty set, a single point, or a combination of polynomial equations. All those descriptions are semi-algebraic, which proves the semi-algebraicity of the ω-limit set. ∎
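The case analysis of the proof can be mirrored mechanically. The sketch below is our code and is deliberately coarse: it ignores the simplification with respect to X_0 and the Jordan-block multiplicities, so it only separates the unconditional cases from the ones that need the finer analysis above:

```python
from sympy import Matrix, re

def omega_limit_case(A):
    # Coarse classification following the proof of Theorem 1.
    eigenvalues = A.eigenvals()            # {eigenvalue: multiplicity}
    real_parts = [re(lam) for lam in eigenvalues]
    if any(p > 0 for p in real_parts):
        return "empty"                     # a component diverges: Omega = {}
    if all(p < 0 for p in real_parts):
        return "origin"                    # contraction: Omega = {0}
    return "imaginary-axis case"           # needs the finer analysis

print(omega_limit_case(Matrix([[0, -1], [1, 0]])))   # eigenvalues +-i
print(omega_limit_case(Matrix([[-1, 0], [0, -2]])))  # both negative
```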
References

1. Asarin, E., Maler, O., Pnueli, A.: Reachability analysis of dynamical systems having piecewise-constant derivatives. Theoretical Computer Science 138, 35–65 (1995)
2. Asarin, E., Schneider, G.: Widening the boundary between decidable and undecidable hybrid systems. In: Brim, L., Jančar, P., Křetínský, M., Kučera, A. (eds.) CONCUR 2002. LNCS, vol. 2421, pp. 193–208. Springer, Heidelberg (2002)
3. Asarin, E., Schneider, G., Yovine, S.: On the decidability of the reachability problem for planar differential inclusions. In: Di Benedetto, M.D., Sangiovanni-Vincentelli, A.L. (eds.) HSCC 2001. LNCS, vol. 2034, pp. 89–104. Springer, Heidelberg (2001)
4. Berstel, J., Mignotte, M.: Deux propriétés décidables des suites récurrentes linéaires. Bulletin de la Société Mathématique de France 104, 175–184 (1976)
5. Blondel, V., Tsitsiklis, J.N.: A survey of computational complexity results in systems and control. Automatica 36(9), 1249–1274 (2000)
6. Bostan, A.: Algorithmique efficace pour des opérations de base en calcul formel. PhD thesis, École Polytechnique (December 2003)
7. Bournez, O.: Complexité algorithmique des systèmes dynamiques continus et hybrides. PhD thesis, École Normale Supérieure de Lyon (January 1999)
8. Brawley, J.V., Carlitz, L.: Irreducibles and the composed product for polynomials over a finite field. Discrete Mathematics 65(2), 115–139 (1987)
9. Cohen, H.: A Course in Computational Algebraic Number Theory. Springer, Heidelberg (1993)
10. Graça, D.S., Campagnolo, M.L., Buescu, J.: Robust simulations of Turing machines with analytic maps and flows. In: Cooper, S.B., Löwe, B., Torenvliet, L. (eds.) CiE 2005. LNCS, vol. 3526, pp. 169–179. Springer, Heidelberg (2005)
11. Halava, V., Harju, T., Hirvensalo, M., Karhumäki, J.: Skolem's problem – on the border between decidability and undecidability. Technical Report 683, Turku Center for Computer Science (2005)
12. Hirsch, M.W., Smale, S., Devaney, R.: Differential Equations, Dynamical Systems, and an Introduction to Chaos. Elsevier Academic Press, Amsterdam (2003)
13. Hirsch, M.W., Smale, S.: Differential Equations, Dynamical Systems, and Linear Algebra. Academic Press, London (1974)
14. Lelong-Ferrand, J., Arnaudiès, J.-M.: Cours de mathématiques, tome 1: algèbre. Dunod (1971)
15. Lorenz, E.N.: Deterministic non-periodic flow. Journal of the Atmospheric Sciences 20, 130–141 (1963)
16. Mignotte, M.: An inequality about factors of polynomials. Mathematics of Computation 28(128), 1153–1157 (1974)
17. Mignotte, M.: Suites récurrentes linéaires. In: Séminaire Delange-Pisot-Poitou. Théorie des nombres, vol. 15, pp. G14-1–G14-9 (1974)
18. Murray, J.D.: Mathematical Biology, 2nd edn. Biomathematics, vol. 19. Springer, Berlin (1993)
19. von zur Gathen, J., Gerhard, J.: Modern Computer Algebra. Cambridge University Press, Cambridge (2003)
A Appendix

A.1 Factorizing a Polynomial in Q[X]
The characteristic polynomial χ_A(X) of the matrix A ∈ Q^{n×n} belongs to Q[X]. We will first factorize χ_A(X) in Q[X] to obtain some square-free polynomials. This is a classical problem; one solution is to use Yun's algorithm [19, p. 371], which writes our polynomial χ_A into the form

χ_A = ∏_i R_i^i
where the R_i are square-free and do not share roots. The polynomial ∏_i R_i is then a square-free polynomial that has the same roots as χ_A.

Proposition 3. Suppose given a polynomial P that we can write as P = ∏_j (X − α_j)^{β_j} with the α_j distinct. Let Q = P / gcd(P, P′); then Q is square-free and

Q = ∏_j (X − α_j).

We then want to factorize this polynomial Q into irreducible factors in Q[X]. This is again a classical problem; an algorithm that achieves this goal is, for example, presented in [9, p. 139].

Proposition 4. Given a square-free polynomial P ∈ Q[X], we can compute its factorization in Q[X].

So we have obtained Q = ∏_i Q_i with the Q_i being polynomials that are irreducible in Q[X].
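Both propositions are directly available in computer algebra systems; the following SymPy sketch (our illustration) computes Q = P / gcd(P, P′) and then its irreducible factors over Q:

```python
from sympy import Poly, gcd, symbols

x = symbols('x')
P = Poly((x - 1)**3 * (x**2 - 2)**2, x)

# Proposition 3: Q = P / gcd(P, P') is square-free with the same roots.
Q, remainder = P.div(gcd(P, P.diff(x)))
assert remainder.is_zero

# Proposition 4: factor the square-free Q into irreducibles over Q.
print(Q.factor_list())   # factors (x - 1) and (x**2 - 2), each once
```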
A.2 Computing the Roots
To obtain χ_A's roots, we are going to compute the roots of Q. Those are algebraic numbers; we then only need to compute a representation of each of those roots. This means finding the minimal polynomial and giving a rational approximation of the root, together with an error bound that discriminates it from the other roots of the minimal polynomial. Let us consider one Q_i. There can be both real roots and complex roots that are not real. Sturm's theorem allows us to know the number of each of them [9, pp. 153–154]. We can then find the real roots with, for example, Newton's iteration algorithm [19, sec. 9.4]. The complex roots can, for example, be computed with Schönhage's method. From this, we obtain approximations of the roots of the polynomial Q_i. Let α_j be one of those roots. The minimal polynomial of α_j divides Q_i and belongs to Q[X]. As Q_i is irreducible in Q[X], the minimal polynomial can only be Q_i (1 has no root and hence cannot be a minimal polynomial). We then obtain a factorization of Q as ∏_j (X − α_j) with the α_j explicitly defined as algebraic numbers.
A.3 Jordanizing the Matrix
The final step needed to use the method described earlier is to factorize χ_A in C[X]. In fact, it is sufficient to do it in Q({α_j})[X] to obtain a factorization into linear factors. So from now on, we will work in Q({α_j}), the field generated from Q and the algebraic numbers {α_j}.
To find the multiplicity of each root, we just need to know how many times its minimal polynomial divides χ_A. We then obtain a decomposition

χ_A(X) = ∏_i (X − a_i)^{b_i} ∏_i ((X − α_i)(X − ᾱ_i))^{β_i}

with the α_i being the complex non-real roots and the a_i the real roots. The different Jordan blocks composing the matrix are either

\[
\begin{pmatrix} a_i & & & \\ 1 & a_i & & \\ & \ddots & \ddots & \\ & & 1 & a_i \end{pmatrix}
\quad\text{or}\quad
\begin{pmatrix} B & & & \\ I_2 & B & & \\ & \ddots & \ddots & \\ & & I_2 & B \end{pmatrix}
\quad\text{with}\quad
B = \begin{pmatrix} p & -q \\ q & p \end{pmatrix} \text{ for } \alpha_i = p + iq.
\]

Note that an eigenvalue can be responsible for more than one block. The number of different blocks an eigenvalue λ creates is dim(ker(A − λI)). Similarly, let δ_i = dim(ker((A − λI)^i)); then δ_{i+1} − δ_i is the number of blocks of size at least i + 1. We can hence know the number of blocks of each size and write a Jordan matrix J consisting of blocks in decreasing size order (any order would be fine). This Jordan matrix is similar to the original matrix A. We finally need to compute the similarity matrix P, which will be such that A = P^{−1} J P. This matrix is obtained by computing the eigenvectors of the matrix A (or J).
The Expressiveness of Concentration Controlled P Systems Shankara Narayanan Krishna Department of Computer Science & Engineering, IIT Bombay, Powai, Mumbai, India 400 076
[email protected]
Abstract. In this paper, we study concentration controlled P systems having catalysts, bi-stable catalysts and mobile catalysts. We show that computational universality can be obtained for pure catalytic P systems using 2 bi-stable catalysts and 1 membrane, improving the known universality result [2]. We also give universality results using catalysts, and mobile catalysts. Further, we identify some subclasses of these which are not computationally complete.
1 Introduction
P systems are a class of distributed parallel computing devices introduced in [10]. Various variants [11], [14] of P systems have been considered ever since. In this paper, we study a particular variant of P systems introduced in [2], where the transfer of symbols depends only on the differences in concentration between the various regions. This variant was introduced to eliminate the unrealistic features of communication used in P systems, wherein a target could be specified for moving a symbol. A known variant where communication happens without specifying targets explicitly is [9]. In the variant of [2], bi-stable catalysts are used, and all communication is governed only by the concentration difference. It has been shown that using arbitrarily many bi-stable catalysts one can obtain Turing completeness, while using a single bi-stable catalyst the family of matrix languages can be obtained. Bi-stable catalysts are catalysts which can remain in two states (like on/off) and toggle between these states. In this paper, we continue the study of this variant; [12] mentions its study as an open problem (Problem Q4). We show that using controlled concentration, we can obtain universality using (i) a single mobile catalyst and 3 membranes, (ii) 2 bi-stable catalysts and 1 membrane in pure catalytic systems, and (iii) 2 catalysts and 1 membrane. We also obtain some non-universality results. Mobile catalysts were introduced in [7], and the power of mobile catalysts and bi-stable catalysts has been studied in [7], [6].
2 Preliminaries
C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, pp. 96–110, 2008. © Springer-Verlag Berlin Heidelberg 2008

We refer to [1] and [13] for the elements of formal language theory we use here. N denotes the set of natural numbers, while N_k denotes the set of all natural numbers greater than or equal to k, for k ≥ 1; V denotes a finite alphabet; V* is the free monoid generated by V under the operation of concatenation and has the empty string, denoted by λ, as unit element. The length of a string x ∈ V* is denoted by |x| and the number of occurrences of a symbol a ∈ V in x is denoted by |x|_a.

Parikh Vectors: For V = {a_1, ..., a_n}, the Parikh mapping associated with V is ψ_V : V* → N^n defined by ψ_V(x) = (|x|_{a_1}, ..., |x|_{a_n}), for all x ∈ V*. For a language L, its Parikh set ψ_V(L) = {ψ_V(x) | x ∈ L} is the set of all Parikh vectors of all words x ∈ L. For a family FL of languages, we denote by PsFL the family of Parikh sets of vectors associated with languages in FL. For basic elements of membrane computing we refer to [11]; for the state of the art of the domain, the reader may consult the bibliography at the web address [14]. For proving computational universality, we use the concept of Minsky's register machine [8] and matrix grammars with appearance checking [1].

Register Machines: Some of the universality proofs in this paper are based on the concept of Minsky's register machine [8]. Such a machine runs a program consisting of numbered instructions of several simple types. Several variants of register machines with different numbers of registers and different instruction sets have been shown to be computationally universal. An n-register machine is a construct M = (n, P, i, h), where: (i) n is the number of registers, (ii) P is a set of labeled instructions of the form j : (op(r), k, l), where op(r) is an operation on register r of M, and j, k, l are labels from the set Lab(M) (which numbers the instructions in a one-to-one manner), (iii) i is the initial label, and (iv) h is the final label.
The machine is capable of the following instructions:

- (add(r), k, l): Add one to the contents of register r and proceed to instruction k or to instruction l; in the deterministic variants usually considered in the literature we demand k = l.
- (sub(r), k, l): If register r is not empty, then subtract one from its contents and go to instruction k, otherwise proceed to instruction l.
- halt: Stop the machine. This additional instruction can only be assigned to the final label h.

In their deterministic variant, such n-register machines can be used to compute any partial recursive function f : N^α → N^β; starting with (n_1, ..., n_α) ∈ N^α in registers 1 to α, M has computed f(n_1, ..., n_α) = (r_1, ..., r_β) if it halts in the final label h with registers 1 to β containing r_1 to r_β. If the final label cannot be reached, then f(n_1, ..., n_α) remains undefined. A deterministic n-register machine can also analyze an input (n_1, ..., n_α) ∈ N^α in registers 1 to α, which is recognized if the register machine finally stops by the halt instruction with all its registers being empty. If the machine does not halt, the analysis was not successful. In their non-deterministic variant, n-register machines can compute any recursively enumerable set of non-negative integers (or of vectors of non-negative integers). Starting with all registers being empty, we consider a computation of
the n-register machine to be successful if it halts with the result being contained in the first β register(s) and with all other registers being empty. Some classic results on register machines are mentioned below.

Proposition 1. For any partial recursive function f : N^α → N^β there exists a deterministic (max{α, β} + 2)-register machine M computing f in such a way that, when starting with (n_1, ..., n_α) ∈ N^α in registers 1 to α, M has computed f(n_1, ..., n_α) = (r_1, ..., r_β) if it halts in the final label h with registers 1 to β containing r_1 to r_β (and with all other registers being empty); if the final label cannot be reached, f(n_1, ..., n_α) remains undefined.

Proposition 2. For any recursively enumerable set L ⊆ N^β of vectors of non-negative integers there exists a non-deterministic (β + 2)-register machine M generating L in such a way that, when starting with all registers 1 to β + 2 being empty, M non-deterministically halts with n_i in register i, 1 ≤ i ≤ β, and registers β + 1 and β + 2 being empty if and only if (n_1, ..., n_β) ∈ L.

A register machine can also be used for defining a language, in the following way. If V = {a_1, ..., a_k}, then each string w ∈ V* can be interpreted as a number in base k + 1. Specifically, if w = a_{i_1} a_{i_2} ... a_{i_n}, 1 ≤ i_j ≤ k, 1 ≤ j ≤ n, then val(w) = i_1 (k+1)^{n−1} + ... + i_{n−1} (k+1) + i_n. Then, we have:

Proposition 3. If L ⊆ V*, card(V) = k, L ∈ RE, then a 3-register machine M exists such that for every w ∈ V* we have w ∈ L if and only if M halts when starting with val_{k+1}(w) in its first register; in the halting step, all registers of the machine are empty.

Random Context Grammars: A random context grammar is a construct G = (N, T, S, R) where N is the set of non-terminals, T is the set of terminals, S is the start symbol and R is a set of rules of the form p : (A → w, E_1, E_2), where A → w is a context-free production over N ∪ T and E_1, E_2 are subsets of N.
Then, p can be applied to a string x ∈ (N ∪ T)* only if x = x_1 A x_2, E_1 ⊆ alph(x_1 x_2), and E_2 ∩ alph(x_1 x_2) = ∅; alph(x_1 x_2) stands for the set of symbols occurring in x_1 x_2. If E_1 or E_2 is the empty set, then no condition is imposed by E_1 or E_2, respectively. E_1 is said to be the set of permitting, and E_2 the set of forbidding, context conditions of p. We denote by RC_{p,f} the family of languages generated by random context grammars with permitting and forbidding contexts and λ-free rules. If λ-rules are allowed, the family is represented by RC^λ_{p,f}. If in all rules the set E_1 is empty, we denote the family of languages generated by RC^λ_f; similarly, if in all rules the set E_2 is empty, the resulting family of languages is denoted by RC^λ_p. It is known that NCF ⊆ NRC_{p,f} ⊂ NRE and that NRC^λ_f ⊂ NRE.

Matrix Grammars: A context-free matrix grammar without appearance checking is a construct G = (N, T, S, M) where N, T are disjoint alphabets of non-terminals and terminals, S ∈ N is the axiom, and M is a finite set of matrices of the form (A_1 → x_1, ..., A_n → x_n) of context-free rules. For a string w, a matrix m : (r_1, ..., r_n) is executed by applying the productions r_1, ..., r_n one after another, following the order in which they appear in the matrix.
We write w ⇒_m z if there is a matrix m : (A_1 → x_1, ..., A_n → x_n) in M and strings w_1, ..., w_{n+1} in (N ∪ T)* such that w = w_1, w_{n+1} = z, and for each i = 1, 2, ..., n we have w_i = w_i′ A_i w_i″ and w_{i+1} = w_i′ x_i w_i″. We denote by ⇒* the reflexive and transitive closure of the relation ⇒. The language generated by G is L(G) = {x ∈ T* | S ⇒* x}. The family of languages generated by context-free matrix grammars is denoted by MAT or MAT^λ (depending on whether λ-free rules or λ-rules are used).

A matrix grammar with appearance checking is a construct G = (N, T, S, M, F), where N, T are disjoint alphabets, S ∈ N, M is a finite set of sequences of the form (A_1 → x_1, ..., A_n → x_n), n ≥ 1, of context-free rules over N ∪ T (with A_i ∈ N, x_i ∈ (N ∪ T)*, in all cases), and F is a set of occurrences of rules in M. For w, z ∈ (N ∪ T)*, we write w ⇒ z if there is a matrix (A_1 → x_1, ..., A_n → x_n) in M and strings w_i ∈ (N ∪ T)*, 1 ≤ i ≤ n + 1, such that w = w_1, z = w_{n+1}, and, for all 1 ≤ i ≤ n, either w_i = w_i′ A_i w_i″ and w_{i+1} = w_i′ x_i w_i″ for some w_i′, w_i″ ∈ (N ∪ T)*, or w_i = w_{i+1}, A_i does not appear in w_i, and the rule A_i → x_i appears in F. The rules of a matrix are applied in order, possibly skipping the rules in F if they cannot be applied; we say that these rules are applied in the appearance checking mode. If F = ∅, then the grammar is said to be without appearance checking (and F is no longer mentioned). The language generated by G is defined by L(G) = {w ∈ T* | S ⇒* w}. The family of languages of this form is denoted by MAT^λ_ac. When we use only grammars without λ-rules, the obtained family is denoted by MAT_ac. In [4], it has been shown that each recursively enumerable language can be generated by a matrix grammar in the strong binary normal form.
Such a grammar is a construct G = (N, T, S, M, F), where N = N_1 ∪ N_2 ∪ {S, #}, with these three sets mutually disjoint, two distinguished symbols B^(1), B^(2) ∈ N_2, and the matrices in M of one of the following forms:

(1) (S → XA), with X ∈ N_1, A ∈ N_2,
(2) (X → Y, A → x), with X, Y ∈ N_1, A ∈ N_2, x ∈ (N_2 ∪ T)*, |x| ≤ 2,
(3) (X → Y, B^(j) → #), with X, Y ∈ N_1, j = 1, 2,
(4) (X → λ, A → x), with X ∈ N_1, A ∈ N_2, x ∈ T*.

Moreover, there is only one matrix of type 1, and F consists of all the rules B^(j) → #, j = 1, 2, appearing in matrices of type 3. # is a trap symbol: once introduced, it is never removed. Clearly, a matrix of type 4 is used only once, in the last step of a derivation. The corresponding family of generated languages is denoted by MAT^λ_ac. It is known that MAT^λ_ac = RE and MAT_ac ⊂ CS.

Random Context Matrix Grammars: A random context matrix grammar is a construct G = (N, T, M, S, F) where N, T, S are as in a usual matrix grammar and M is a finite set of triples ((A_1 → x_1, A_2 → x_2, ..., A_n → x_n), Q, R) where the A_i → x_i are context-free rules, 1 ≤ i ≤ n, Q, R ⊆ N, Q ∩ R = ∅. A matrix can be applied to a string x = x_1 X_1 x_2 X_2 ... X_l x_{l+1} in order to rewrite effectively the symbols X_1, ..., X_l only if x_1 ... x_{l+1} contains all symbols of Q and no symbols of R. We denote by RCM(M, β, max(α, γ)) the family of languages generated
S.N. Krishna
by random context matrix grammars G = (N, T, S, M, F) with rules of type β, β ∈ {CF, CF − λ}, with arbitrary F if γ = ac or with empty F if γ is empty, and with arbitrary R in ((r1, . . . , rn), Q, R) ∈ M if α = ac or with empty R if no forbidding contexts are involved; max(α, γ) = ac if at least one of α, γ is ac. Thus, if neither appearance checking nor forbidding contexts are used, we have the family RCM(M, β, ∅). It is known [1] that RCM(M, CF − λ, ∅) = MAT and RCM(M, CF − λ, ac) = MATac ⊂ CS. It is also known [1] that the family RCM(M, CF, ∅) = MAT^λ is closed under arbitrary homomorphisms.
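The random context condition is a simple membership test on the residual string. A hedged Python sketch (names ours): one occurrence of each left-hand side is set aside as the rewritten symbols X1, . . . , Xl, and the permitting set Q and forbidding set R are checked against what remains:

```python
def matrix_applicable(w, matrix, Q, R):
    """Random context test for a triple ((A1 -> x1, ..., An -> xn), Q, R):
    w must contain an occurrence of every left-hand side, and the rest
    of the string (x1 ... x_{l+1}) must contain all symbols of Q and no
    symbol of R."""
    rest = list(w)
    for lhs, _rhs in matrix:
        if lhs not in rest:
            return False           # some Ai has no occurrence to rewrite
        rest.remove(lhs)           # exclude the rewritten occurrence
    context = set(rest)
    return set(Q) <= context and not (context & set(R))
```

With Q = R = ∅ this degenerates to plain matrix applicability, the RCM(M, β, ∅) case.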
3 Concentration Controlled P Systems
A concentration controlled P system [2] is a construct Π = (V, Vc, T, C, μ, R1, . . . , Rn), where (i) V is the basic alphabet, (ii) Vc ⊆ V is the subset of V whose concentration will be controlled, (iii) T ⊆ V is the terminal alphabet, (iv) C is a set of bi-stable catalysts, C ∩ V = ∅, (v) μ is the initial membrane structure, along with the initial multisets of objects as well as bi-stable catalysts in the appropriate regions, and (vi) Ri, 1 ≤ i ≤ n, are the rules governing the actions in the regions 1, . . . , n of μ. The rules in Ri are of the form a → v, or ca → c̄v, c̄b → cw, where a, b ∈ V, v, w ∈ V∗, and c, c̄ are the two states a bi-stable catalyst can be in. If there are k ≥ 1 bi-stable catalysts in C, we represent C as {c1, . . . , ck}. Each catalyst can be in either of the two states ci, c̄i at any point of time.

The concentration control of objects in Vc is done as follows: whenever objects of Vc are introduced in a region i, they are immediately redistributed such that all regions have an equal number of symbols of Vc; a difference of one occurrence is allowed when the total number of symbols introduced is not divisible by the number of regions. If there are more objects than regions, the extra objects are redistributed according to the following principle: move objects to the regions at the lowest distance from region i. For instance, if there are m regions in μ, and if m + 1 symbols of Vc are introduced in a region i, then all regions, including the outer region (the environment), get one object each. If m + 2 objects are introduced in region i, then all regions including the outer region get a single object, while region i gets two objects. This principle is simple if the objects of Vc to be redistributed are introduced in a single region at any point of time. In this paper, we consider only this simple case. Thus, in this framework, no targets are necessary.
The objects of Vc introduced in a region i will move according to their concentration, while objects of V − Vc remain in the regions where they were introduced. We assume that the redistribution is done instantaneously. The rules are applied in a maximally parallel manner: all objects, from all membranes, which can evolve by a rule must evolve. A sequence of transitions between configurations is called a computation; a computation is successful only if it halts, that is, if no rule is applicable to any object in any membrane. The result of a successful computation is the number of terminal symbols leaving the skin membrane. We denote by N(Π) the set of natural numbers computed by Π
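The redistribution step is just a balanced split with ties broken by distance: with n regions in total (membranes plus the environment), k freshly introduced symbols are divided as evenly as possible, and the remainder goes to the regions closest to the introducing one. A small Python sketch under that reading (the region list is assumed to be supplied already ordered by increasing distance from the introducing region):

```python
def redistribute(k, regions_by_distance):
    """Distribute k fresh symbols of Vc over all regions, given the
    regions (membranes plus environment) ordered by increasing distance
    from the region where the symbols were introduced.  All counts
    differ by at most one; the closest regions receive the remainder."""
    n = len(regions_by_distance)
    base, extra = divmod(k, n)
    return {region: base + (1 if j < extra else 0)
            for j, region in enumerate(regions_by_distance)}
```

With m = 3 membranes this reproduces the examples in the text: m + 1 = 4 introduced symbols give every region one copy, and m + 2 = 5 give the introducing region two copies and every other region one.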
The Expressiveness of Concentration Controlled P Systems
and by NOPm(con, bcatk) the family of sets N(Π) generated by P systems with controlled concentration, using at most k bi-stable catalysts and having at most m membranes, k, m ≥ 1. In the following subsections, we consider concentration controlled P systems where the set C can be a set of bi-stable catalysts, of mobile catalysts, or of (ordinary) catalysts. Catalysts and mobile catalysts are special objects not in V. Rules involving catalysts are of the form ca → cx, where a ∈ V and x ∈ V∗, while those involving mobile catalysts are of the form ca → (c, move)x, where a ∈ V, x ∈ V∗. The option move allows the mobile catalyst c to move from a region to any adjacent region. The families of sets computed are denoted accordingly by NOPm(con, mcatk) (using mobile catalysts) and NOPm(con, catk) (using catalysts). Systems Π where all rules involve catalysts are called purely catalytic. The families of sets of numbers computed by these systems are denoted by NOPm^p(con, α), α ∈ {mcatk, bcatk, catk}. Purely catalytic systems were studied in [5], [3] and [6].
3.1 Bi-stable Catalysts
We show that 2 bi-stable catalysts give Turing completeness even in purely catalytic systems with one membrane only, thereby answering the question posed in [12].

Theorem 1. NOP1^p(con, bcat2) = NRE.

Proof. We only prove the assertion NRE ⊆ NOP1^p(con, bcat2), and infer the other inclusion from the Church-Turing thesis. The proof is based on the observation that each set from NRE is the range of a recursive function. Thus, we will prove the following assertion: for each recursively enumerable function f : N → N, there is a P system Π with 1 membrane having 2 bi-stable catalysts satisfying the following condition: for any arbitrary x ∈ N, the system Π first "generates" a multiset of the form o1^x and halts if and only if f(x) is defined, and, if so, the result of the computation is f(x).

In order to prove this assertion, we consider a deterministic register machine with 3 registers, the last one being a special output register which is never decremented. Let there be a program P consisting of n instructions P1, . . . , Pn which computes f. Let Pn correspond to the instruction HALT and P1 be the first instruction. The input value x is expected to be in register 1 and the output value in register 3. Without loss of generality, we can assume that all registers other than the first one are empty at the beginning of a computation. We can also assume that in the halting configuration all registers except the third, where the result of the computation is stored, are empty.

Construct Π = (V, Vc, T, {c1, c2}, [ c1c2β ]1, R1) with V = Vc ∪ {o1, o2, β, #} ∪ {Pj, P̄j, Nj^r, Qj^r, Rj^r, Sj^r, Tj^r, Uj^r | 1 ≤ j ≤ n, r ∈ {1, 2}}, Vc = T = {o3}, and rules as follows:

1. Generation of o1^x, the initial contents of register 1
0. c1β → c̄1o1β, c̄1β → c1o1β, c1β → c1P1.
An arbitrary number of copies of o1 is produced; at the end, β is replaced with P1, the first instruction.

2. Simulation of Pi : (ADD(r), j), r ∈ {1, 2, 3}
1. c1Pi → c̄1P̄i or, i = (ADD(r), j), r ∈ {1, 2},
   c1Pi → c̄1P̄i o3o3, i = (ADD(3), j),
2. c2P̄i → c̄2#, c̄1P̄i → c1Pj, i = (ADD(r), j),
   ci# → c̄i#, c̄i# → ci#, i ∈ {1, 2}.
To simulate addition, c1 interacts with Pi and changes to state c̄1; P̄i and or are produced. If r = 3, we obtain o3o3, which is redistributed, sending a copy outside. The symbol P̄i is then acted upon by c̄1, which replaces it with Pj; otherwise, c2 replaces it with #, triggering a non-halting computation.

3. Simulation of Pi : (SUB(r), j, k), r ∈ {1, 2}
3. crPi → c̄rQi^r, i = (SUB(r), j, k).
To simulate a subtract instruction, cr acts on Pi, replacing it with Qi^r.

Case 1: The register value is non-zero.
4. c̄ror → cr, c3−rQi^r → c̄3−rRi^rSi^r,
5. crRi^r → c̄rNj^r, c̄3−rSi^r → c3−r,
6. c̄rNj^r → crPj, c3−rNj^r → c̄3−r#, c3−rSi^r → c̄3−r#.
cr acts upon a symbol or, erasing it; in parallel, c3−r acts upon Qi^r and replaces it with Ri^rSi^r. Next, Ri^r is replaced with Nj^r by cr, while c3−r erases Si^r. If c3−r acts upon o3−r instead of Si^r, then in the next step we have either (i) the rules c̄rNj^r → crPj and c3−rSi^r → c̄3−r#, or (ii) cr acts upon or and c3−r acts upon Si^r or Nj^r, giving # and triggering a non-halting computation.

Case 2: The register value is zero.
4. c3−rQi^r → c̄3−rRi^rSi^r,
5. c̄rRi^r → crTk^r, c̄3−rSi^r → c3−r,
6. crTk^r → c̄rUk^r, c3−rSi^r → c̄3−r#,
7. c̄rUk^r → crPk.
In case there are no or's, c3−r acts on Qi^r, replacing it with Ri^rSi^r; we then have the catalysts c̄r, c̄3−r in the membrane. Next, c̄r acts on Ri^r, replacing it with Tk^r, while in parallel c̄3−r acts on Si^r, erasing it. If c̄3−r had acted upon o3−r instead of Si^r, then in the next step cr replaces Tk^r with Uk^r while c3−r replaces Si^r with #. Next, c̄r replaces Uk^r with Pk. Note that there is no danger of cr acting upon or instead of Uk^r, as there are no symbols or in the membrane.

Thus, if all instructions are simulated correctly, we obtain the label Pn of the halting instruction in the membrane. Since there are no more instructions after Pn, there are no more applicable rules in the membrane. The system halts, and the output is the number of symbols o3 that have been sent out.
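For reference, the machine model being simulated can be pinned down in a few lines. The sketch below (our own encoding, not from the paper) runs a deterministic register machine whose instructions are ADD(r), j (increment r, go to j) and SUB(r), j, k (decrement r and go to j if r > 0, otherwise go to k), with the last instruction being HALT:

```python
def run_register_machine(program, regs):
    """Execute a deterministic register machine.  `program` maps labels
    to ('ADD', r, j), ('SUB', r, j, k) or ('HALT',); `regs` maps
    register names to nonnegative counts.  Returns the final registers."""
    pc = 1
    while program[pc][0] != 'HALT':
        op = program[pc]
        if op[0] == 'ADD':
            regs[op[1]] += 1
            pc = op[2]
        else:                       # ('SUB', r, j, k)
            _, r, j, k = op
            if regs[r] > 0:
                regs[r] -= 1        # successful decrement: jump to j
                pc = j
            else:
                pc = k              # register was zero: jump to k
    return regs
```

For example, the doubling program {P1 = (SUB(1), 2, 4), P2 = (ADD(3), 3), P3 = (ADD(3), 1), P4 = HALT} drains register 1 while adding two copies to the output register 3 on each round.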
Theorem 2. NOP1^p(con, bcat1) = NRCM(M, CF, ∅) = NMAT^λ ⊂ NRE.
Proof. Consider a P system Π = (V, Vc, T, {c}, [ cw ]1, R1). Let V̄ = {ā | a ∈ V}. Construct a random context matrix grammar without appearance checking G = (N, T′, M, S), where N = V ∪ V̄ ∪ {c, c̄, d, d̄} ∪ {ψ, S}, T′ = T ∪ {ψ}, having matrices as follows:
1. ((S → cw̄), ∅, ∅),
2. ((ā → xȳ, c → c̄), ∅, ∅), if ca → c̄z ∈ R1, where x and ȳ are obtained from z as follows: if z contains k symbols of Vc, then ȳ contains (barred copies of) any ⌈k/2⌉ symbols of Vc in z, |x| = ⌊k/2⌋ and x contains the remaining symbols of Vc in z, and the Parikh vectors of ȳ and z are the same over the alphabet V − Vc,
3. ((ā → xȳ, c̄ → c), ∅, ∅), if c̄a → cz ∈ R1, with x, ȳ obtained from z in the same way as above,
4. ((c → d), ∅, ∅),
5. ((c̄ → d̄), ∅, ∅),
6. ((a → λ), ∅, ∅), a ∈ V − T,
7. ((ā → λ, d → d), ∅, ∅), if c̄a → cv ∈ R1,
8. ((ā → λ, d̄ → d̄), ∅, ∅), if ca → c̄v ∈ R1,
9. ((d → ψ), ∅, ∅), ((d̄ → ψ), ∅, ∅).

The first matrix is the starting matrix; it replaces S with cw̄, where w is the initial multiset in the membrane, and c records that a rule of the form ca → c̄v should be applied first. Matrices 2, 3 keep track of the application of a catalytic rule: a symbol ā is replaced with xȳ, and c with c̄, if there was a rule ca → c̄z in R1. Here x and ȳ are obtained as follows: if z had k symbols of Vc, then ⌊k/2⌋ of these are redistributed outside, while the remaining symbols, as well as those of V − Vc, stay inside. The symbols that have been sent out are represented by x. Matrices 2, 3 can be applied as long as there are applicable rules. Non-deterministically, we can use one of the matrices 4, 5, replacing c or c̄ with d or d̄, respectively; these matrices guess that no rule is applicable from this point onwards. Matrix 6 can be used at any time: the sent-out symbols of V − T can be erased, since they do not contribute to N(Π). Matrices 7 to 9 are the only ones applicable now; these can be used only if we have guessed a halting computation correctly.
All symbols ā ∈ V̄ which do not correspond to an applicable rule are erased: we must be able to erase all remaining symbols of V̄ in the string in order to obtain a string in L(G). Matrix 9 replaces d or d̄ with ψ. Thus, we obtain a string in L(G) of the form ψv, v ∈ T∗, if Π halts and if we detected the halting configuration correctly. Note that N(L(G)) = N(Π) + 1. Since MAT^λ is closed under arbitrary morphisms, we can define a morphism h on T′ by h(ψ) = λ, h(a) = a, a ∈ T. The resulting language h(L(G)) is also
in MAT^λ, and N(h(L(G))) = N(Π). The case of NOP1(con, bcat1) has been left open; we conjecture that NOP1(con, bcat1) is also strictly included in NRE.
3.2 Mobile Catalysts
In this section, we consider concentration controlled P systems where the set C contains mobile catalysts. A mobile catalyst can move from one region to
another. The rules are either of the form a → w or, involving mobile catalysts, of the form ca → (c, move)w or ca → cw, where move specifies that the catalyst moves to any one adjacent region. If the option move is not specified, then the catalyst remains where it was. Note that the option move is more general than the options in and out considered in [11]. The objects of Vc in w are redistributed as defined earlier. We assume that the catalyst does not leave the system by the move option.

Theorem 3. NOP3(con, mcat1) = NRE.

Proof. Consider a matrix grammar G = (N, T, S, M, F) in the strong binary normal form. Assume that we have n1 matrices of type 2 or 4, numbered mi, 1 ≤ i ≤ n1, and n2 matrices of type 3, numbered mi, n1 + 1 ≤ i ≤ n1 + n2. Construct Π = (V, Vc, T, {c}, [ cXA [ ]2 [ ]3 ]1, R1, R2, R3) with
V = Vc ∪ N ∪ {Xk,j | X ∈ N1, 1 ≤ k ≤ n1, j ≥ 0} ∪ {X̄, X′ | X ∈ N1} ∪ {α, α′, H, H′, H″},
Vc = {Xi,i, Ai,i | X ∈ N1, A ∈ N2, 1 ≤ i ≤ n1} ∪ {x̄ | x ∈ (N2 ∪ T)∗, |x| ≤ 2} ∪ {X̄, X̄i | X ∈ N1, n1 + 1 ≤ i ≤ n1 + n2},
with rules as follows:
R1 : 1. X → Xi,i^4, cA → (c, move)Ai,i^4, if there exists mi : (X → Y or λ, A → x)
of type-2 or 4,
2. Xi,i → λ, Ai,i → λ,
3. cX → (c, move)X̄i^4, if there exists mi : (X → Y, B(j) → #),
4. X̄i → λ,
5. Ȳ → Y′, Y ∈ N1,
6. Y′ → Y, Y ∈ N1, x̄ → x, x ∈ (N2 ∪ T)∗,
7. a → λ, a ∈ T,
R2 : 1. cXi,k → cXi,k−1, Ai,j → Ai,j−1, k ≥ 2, j ≥ 1,
2. cXi,1 → cȲ^4α, if there exists mi : (X → Y, A → x) of type-2,
   cXi,1 → cHα, if there exists mi : (X → λ, A → x) of type-4,
3. cAi,0 → cx̄^4, if there exists mi : (X → Y, A → x) of type-2 or 4, α → α′, H → H′,
4. cα′ → (c, move), H′ → H″, α → #, Ai,0 → #, Xi,j → #, j ≥ 0, # → #,
5. cH″ → (c, move)H″, X̄i → λ, i ∈ lab2, x̄ → x, x ∈ (N2 ∪ T)∗, β → λ, β ∈ N1 ∪ N2\{B(1)} ∪ T ∪ {Ȳ | Y ∈ N1}, X̄i → Ȳ, cB(1) → c#, if there exists mi : (X → Y, B(1) → #) (here labj denotes the set of labels of type-3 matrices containing the rule B(j) → #),
6. cȲ → (c, move)Ȳ^4,
7. Y → #, Y′ → λ, Ȳ → λ, Y ∈ N1,
R3 : 1. X̄i → Ȳ, cB(2) → c#, if there exists mi : (X → Y, B(2) → #), X̄k → λ, k ∈ lab1, Xi,i → λ, Ai,i → λ,
2. cȲ → (c, move)Ȳ^4,
3. # → #, Y → #, Y′ → λ, Ȳ → λ, Y ∈ N1, x̄ → x, A → λ, A ∈ N2 ∪ T, A ≠ B(2).

In the initial configuration, membrane 1 contains the symbols X, A corresponding to the first matrix (S → XA).

1. Simulation of mi : (X → Y, A → x), 1 ≤ i ≤ n1. The symbols X, A corresponding to mi are found in membrane 1, and rule 1 of R1 is used: X is replaced with Xi,i^4, while A is replaced with Ai,i^4. The mobile catalyst c enters membrane 2 or 3. Assume that it enters membrane 2. A copy of Xi,i, Ai,i is sent to each membrane, and one copy goes out. The copies of Xi,i, Ai,i which reach membranes 1 and 3 are erased. In membrane 2, the indices j, k of Xi,j and Al,k are decremented until j reaches 1 and k reaches 0. When Xi,1 is obtained, it is replaced with Ȳ^4α, where α is a special symbol. In the next step, we will be able to replace Al,0 with x̄^4 if l = i. The symbols of x except B(1), as well as the symbol Ȳ, are erased in membrane 2; in membrane 3, Ȳ as well as all symbols of x except B(2) are erased, while membrane 1 retains all symbols of N2 ∪ N1. If l > i in the symbols Al,l and Xi,i, then Al,0 will be replaced by #, since the mobile catalyst c returns to membrane 1 after erasing α′. If l < i, then either Al,0 is replaced with # or some Xi,j, j > 0, is replaced with #. In both cases, a non-halting computation is obtained due to the rule # → #. Thus, we obtain a correct simulation only when l = i. If c had entered membrane 3, we would have obtained a non-halting computation due to the rules Xi,j → #. Note that in the absence of a symbol A ∈ N2 corresponding to mi, a non-halting computation is obtained, since the mobile catalyst c will still be in membrane 1 and the rule Xi,j → # will be used.

2. Simulation of mi : (X → Y, B(j) → #), n1 + 1 ≤ i ≤ n1 + n2. In this case, rule 3 of R1 is used.
X is replaced with X̄i^4, and the mobile catalyst should move to membrane 2 or 3, depending on whether i ∈ lab1 or i ∈ lab2. The symbols X̄i are erased in membrane 2 if i ∈ lab2, and in membrane 3 if i ∈ lab1. Assume i ∈ lab1. Then, in membrane 2, X̄i is replaced with Ȳ, while, in parallel, a copy of B(1) (if any) is replaced with the trap symbol #. In the next step, the mobile catalyst moves back to membrane 1, replacing Ȳ with Ȳ^4. Note that the catalyst does not get a chance to attack B(1) during the simulation of type-2/4 matrices, since it is kept busy by Xi,j, Ai,0 and then α′, in that order. The simulation of a type-3 matrix when i ∈ lab2 happens in membrane 3 and is easy to see. Now, if the catalyst had moved to membrane 3 or 2 while trying to simulate a matrix mi with i ∈ lab2 or i ∈ lab1, respectively, we would have obtained a non-halting computation due to the rules Y → #.
3. Halting: If all matrices are simulated correctly as described above, we reach the last matrix, of type 4. After simulating this matrix, we obtain H″ in membrane 2, and the mobile catalyst c comes back to membrane 1. If any symbol A ∈ N2 remains, then, by rule 1 of R1, A will be replaced with Ai,i^4; if c moves to membrane 2, it is sent back to membrane 1 by H″. Thus, if there are n symbols A1, . . . , An ∈ N2 in membrane 1, and if c always moves to membrane 2, at least the n-th symbol An will trigger a non-halting computation using the rules Ai,0 → #, # → #. If c moves to membrane 3, then it stays put there, and a non-halting computation will be induced by some Aj. In case there are no more symbols of N2 in membrane 1, we obtain a halting configuration. The output consists of the number of terminal symbols that have been sent out during the simulation of type-2/type-4 matrices.
If we consider purely catalytic systems with mobile catalysts, systems with 2 membranes and 1 catalyst do not characterize NRE.

Theorem 4. NOP2^p(con, mcat1) ⊂ NRE.

Proof. It has been proved in [6] that systems with 2 membranes, having a single mobile catalyst and using purely catalytic rules, are strictly weaker than RE. It is easy to see that a concentration controlled P system can be simulated by a system where targets are explicitly mentioned. Thus, given a concentration controlled system Π with 2 membranes and 1 mobile catalyst, we can construct a system Π′ where targets are explicitly mentioned (calculate the number of objects each region would receive, and then accordingly mention explicit targets). Applying the proof given in [6] to the constructed system Π′, it can be shown that NOP2^p(con, mcat1) ⊂ NRE.
3.3 Catalysts
In this section, we consider concentration controlled P systems where C is a set of catalysts. The rules involved are either of the form a → w or ca → cw. Again, the objects of Vc in w are redistributed.

Theorem 5. NOP1(con, cat2) = NRE.

Proof. Consider a register machine with 3 registers and a program P with n instructions P1, . . . , Pn, with Pn being the last instruction, HALT. Construct the P system Π = (V, Vc, T, {c1, c2}, [ c1c2β ]1, R1) with V = {o1, o2, o3, β, #} ∪ {dr, dr′ | r ∈ {1, 2}} ∪ {Pj, Qj, Rj^r, Sj^r, Tj^r, Uj^r, Vj^r, Wj^r, Xj^r, Yj^r | 1 ≤ j ≤ n, r ∈ {1, 2}}, Vc = T = {o3}, and rules as follows:

1. Generation of o1^x, the initial contents of register 1:
0. c1β → c1βo1, c1β → c1P1Q1.
The symbol β produces arbitrarily many symbols o1 and is eventually replaced with P1Q1, where P1 is the first instruction.
2. Simulation of an ADD instruction Pi = (ADD(r), j), 1 ≤ r ≤ 3, 1 ≤ i, j ≤ n
1. c1Pi → c1PjQj or, r ∈ {1, 2}, c1Pi → c1PjQj o3o3, r = 3, c2Qi → c2, Pi → #, Qi → #, # → #.
The catalyst c1 acts on Pi, replacing it with PjQj and also producing or; in case r = 3, o3o3 is produced, so that a copy of o3 is distributed to the outer region; in parallel, Qi is erased by c2.

3. Simulation of a SUB instruction Pi = (SUB(r), k, j), 1 ≤ i, j, k ≤ n, r = 1, 2
We deal with the two cases of register r being zero or non-zero by appropriately guessing the case.

Case 1: The register r is non-zero
2. crPi → crRi^rSi^r, c3−rQi → c3−r,
3. cror → crdr, c3−rRi^r → c3−r, crSi^r → cr#,
4. crdr → crdr′, c3−rSi^r → c3−rTi^r,
5. crTi^r → crPkQk, c3−rdr′ → c3−r,
6. dr → #, dr′ → #, Ri^r → #, Ti^r → #, Pi → #, Qi → #, # → #.
Assume that register r is non-zero. Then we replace Pi with Ri^rSi^r using cr, while, in parallel, Qi is erased by c3−r. Next, cr replaces a copy of or with dr, while c3−r erases Ri^r. This is followed by cr replacing dr with dr′, while c3−r replaces Si^r with Ti^r. Finally, c3−r erases dr′, while cr replaces Ti^r with PkQk. Note that if the catalysts do not behave as described above, a non-halting computation is induced by the symbols involved, as can be seen from rule 6. Note also that if our guess was wrong, and register r was in fact zero, then, instead of acting on or, cr would have acted on Si^r, inducing a non-halting computation.

Case 2: The register r is zero
7. crPi → crUi^rVi^r, c3−rQi → c3−r,
8. crUi^r → cr, c3−rVi^r → c3−rWi^r,
9. c3−rWi^r → c3−rXi^rYi^r,
10. crXi^r → crPjQj, c3−rYi^r → c3−r,
11. Ui^r → #, Vi^r → #, Wi^r → #, Xi^r → #, Yi^r → #, Pi → #, Qi → #, # → #.
If we guess that register r is zero, then cr replaces Pi with Ui^rVi^r, while c3−r erases Qi. Next, cr erases Ui^r, while c3−r replaces Vi^r with Wi^r. This is followed by c3−r replacing Wi^r with Xi^rYi^r, while cr stays idle.
If we had guessed wrongly, then cr would act on an or, giving rise to dr. The symbols Xi^r, Yi^r are next acted upon by cr, c3−r, yielding PjQj and λ. Note that if a dr were produced, we would obtain a non-halting computation, as all of the symbols dr, Xi^r, Yi^r require the catalysts to act on them.
Thus, if all the instructions are simulated correctly, we obtain Pn in the membrane. No further rules are applicable, and the system halts. The copies of o3 sent out of the skin membrane constitute the output.
Theorem 6. NOP1(con, cat1) = NRCf^λ ⊂ NRE.

Proof. Consider Π = (V, Vc, T, {c}, [ cw ]1, R1). Let V̄ = {ā | a ∈ V}, V̂ = {â | a ∈ V}, and AV = {Aa | a ∈ V}. Construct the random context grammar with forbidding contexts G = (V ∪ {P, S, S1, S2, S3} ∪ V̄ ∪ V̂ ∪ AV ∪ {c}, T, P, R) with rules as follows:
1. (P → Sw, ∅, ∅),
2. (S → S1, ∅, ∅),
3. (a → cŵȳ, ∅, {S, S2, S3} ∪ {c}), if ca → cx ∈ R1, where ŵ and ȳ are obtained from x as follows: if x contains k symbols of Vc, then ŵ contains (hatted copies of) any ⌈k/2⌉ symbols of Vc in x, |ȳ| = ⌊k/2⌋ and ȳ contains (barred copies of) the remaining symbols of Vc in x, and the Parikh vectors of x and ŵ are the same over the alphabet V − Vc,
4. (a → ŵȳ, ∅, {S, S2, S3}), if a → x ∈ R1, with ŵ, ȳ obtained from x as above,
5. (a → Aa, ∅, {S, S2, S3}), if a has no rule in R1,
6. (S1 → S2, ∅, V),
7. (c → λ, ∅, {S, S1, S3}),
8. (â → a, ∅, {S, S1, S3}),
9. (S2 → S, ∅, {c} ∪ V̂),
10. (S2 → S3, ∅, {c} ∪ V̂),
11. (S3 → λ, ∅, ∅),
12. (Aa → λ, ∅, ∅),
13. (ā → λ, ∅, {S, S1, S2}), for a ∈ V − T.

The start symbol of G is P. It is replaced with Sw, where w is the initial multiset of objects in the membrane. S is replaced with S1 non-deterministically. All replacements of symbols are expected to happen in the presence of S1. Rule 3 describes how a rule involving the catalyst is implemented: a symbol a is replaced with cŵȳ in the absence of c (that is, at most one catalytic rule can be used per simulated step). The string ȳ represents the symbols of Vc in x which are sent to the outer region, while ŵ represents the symbols of x which are retained in membrane 1. Thus, all symbols of V − Vc present in x are also present in ŵ. Since there are two regions over which to redistribute symbols of Vc, this is done in the obvious way: if x contains k symbols of Vc, then ⌊k/2⌋ of those go out and the remaining ones stay in. The same holds for rules of the form a → w. Symbols a which do not have any applicable rule are replaced with Aa.
At the end of a round of simulation, we are left with symbols over V̄, which represent symbols that have been sent out, symbols over V̂, which represent symbols in the membrane after replacement, and symbols of AV, which can no longer be replaced. S1 is replaced with S2 at the end of each round of simulation. In the presence of S2 (equivalently, in the absence of S and S1), we replace all symbols â with a and erase c. The symbols in AV can be erased at any point of time. We can continue with the
next round of simulations after all symbols of V̂ have been replaced with the corresponding symbols of V. Then S2 is replaced with S, and we continue. We can also non-deterministically guess that the system has reached a halting configuration and replace S2 with S3. S3, as well as the symbols of AV, are then erased. Since the output of Π consists of symbols which have been sent out, we should analyze V̄: the symbols ā, a ∈ V − T, can be erased, since they do not contribute to the output. If we did not guess a halting configuration correctly, then we are left with symbols a ∈ V which cannot be replaced in the absence of S1. Thus, if we correctly guess the halting configuration, we obtain N(L(G)) over T, which is the same as N(Π) over T; N(L(G)) cannot be obtained if we do not guess the halting computation correctly.
4 Discussion
In this paper, we have improved an existing universality result [2] and obtained some new universality results using mobile catalysts and catalysts. Purely catalytic systems with one membrane have been shown to be non-universal in the case of bi-stable catalysts and of catalysts (with one bi-stable catalyst and one catalyst, respectively), while, with one mobile catalyst, purely catalytic systems with two membranes are non-universal. Several questions remain open here, among them: (i) the optimality of the bounds proved in this paper, (ii) identifying interesting non-universal subclasses of these systems, and (iii) answering decision questions related to descriptional complexity measures of these systems.
References
1. Dassow, J., Păun, Gh.: Regulated Rewriting in Formal Language Theory. Springer, Heidelberg (1989)
2. Dassow, J., Păun, Gh.: P Systems with Communication Based on Concentration. Acta Cybernetica 15(1), 9–24 (2001)
3. Freund, R., Kari, L., Oswald, M., Sosik, P.: Computationally Universal P Systems without Priorities: Two Catalysts Are Sufficient. Theoretical Computer Science 330, 251–266 (2005)
4. Freund, R., Păun, Gh.: On the Number of Non-terminals in Graph-controlled, Programmed, and Matrix Grammars. In: Proceedings of UMC, pp. 214–225 (2001)
5. Ibarra, O.H., Dang, Z., Egecioglu, O., Saxena, G.: Characterizations of Catalytic Membrane Computing Systems. In: Rovan, B., Vojtáš, P. (eds.) MFCS 2003. LNCS, vol. 2747, pp. 480–489. Springer, Heidelberg (2003)
6. Krishna, S.N.: On Pure Catalytic P Systems. In: Calude, C.S., Dinneen, M.J., Păun, G., Rozenberg, G., Stepney, S. (eds.) UC 2006. LNCS, vol. 4135, pp. 152–165. Springer, Heidelberg (2006)
7. Krishna, S.N., Păun, A.: Results on Catalytic and Evolution-Communication P Systems. New Generation Computing 22(4), 377–394 (2004)
8. Minsky, M.L.: Computation: Finite and Infinite Machines. Prentice Hall, Englewood Cliffs (1967)
9. Păun, Gh.: Computing with Membranes, A Variant. International Journal of Foundations of Computer Science 11(1), 167–182 (2000)
10. Păun, Gh.: Computing with Membranes. Journal of Computer and System Sciences 61(1), 108–143 (2000)
11. Păun, Gh.: Membrane Computing. An Introduction. Springer, Heidelberg (2002)
12. Păun, Gh.: Tracing Some Open Problems in Membrane Computing. Romanian Journal of Information Science and Technology 10(4) (2007)
13. Salomaa, A.: Formal Languages. Academic Press, London (1973)
14. The membrane computing web page, http://psystems.disco.unimib.it or http://ppage.psystems.eu/
On Faster Integer Calculations Using Non-arithmetic Primitives

Katharina Lürwer-Brüggemeier and Martin Ziegler

Heinz Nixdorf Institute, University of Paderborn, 33095 Paderborn, Germany
Abstract. The unit cost model is both convenient and largely realistic for describing integer decision algorithms over +, ×. Additional operations like division with remainder or bitwise conjunction, although equally supported by computing hardware, may lead to a considerable drop in complexity. We show a variety of concrete problems to benefit from such non-arithmetic primitives by presenting and analyzing corresponding fast algorithms.
1 Introduction
The Turing machine is generally accepted as the appropriate model for describing both the capabilities (computability) and the complexity (bit cost) of calculations on actual digital computers; but it is cumbersome to handle when developing algorithms (upper complexity bounds) as well as when proving lower bounds, and it is therefore often replaced by algebraic models such as the random access machine (RAM). The latter operates on entire integers (as opposed to bits) and comes in various flavors, depending on which primitives it is permitted to employ: e.g. incrementation, addition, subtraction, multiplication, comparisons "=" and "<", division with remainder "x div y", integer constants, bitwise conjunction "&", shifts "x ← y = x·2^y" and "x → y = x div 2^y", indirect addressing, etc. Notice that bitwise conjunction and integer division (when the numerator is not a multiple of the denominator) are examples of non-arithmetic operations over Z that are nevertheless commonly hardware-supported by digital computers (see Section 4 below). The choice among these instructions heavily affects a RAM's power in comparison to the normative Turing machine; e.g. a decision based on polynomially many applications of (+, −, ×, =) can, in spite of exponentially long intermediate results, be simulated within RP [31]; whereas polynomially many steps over (+, ×, =, div) already cover NP [31]; and over (+, −, ×, =, &) even all of PSPACE [27]; compare also [33] and [6]. We are interested in the effect of these additional instructions on selected problems of complexity possibly much lower than polynomial; specifically, in accelerations to linear and sublinear running times, in the spirit of the following example.
M. Ziegler is supported by DFG project Zi1009/1-1.
C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, pp. 111–128, 2008. © Springer-Verlag Berlin Heidelberg 2008
K. L¨ urwer-Br¨ uggemeier and M. Ziegler
Example 1.¹
a) Over (+, −, ×, ≤, div), not only primality testing but even factorization of a given integer x is possible in time O(log x), linear in its binary length.
b) Given a, k ∈ N and some arbitrary integer b ≥ a^(2^k), one can compute a^(2^k) over (+, −, ×, ≤, div) within O(√k) steps.
c) Over (+, −, ×, ≤, div) and using indirect addressing, the greatest common divisor gcd(x, y) of given integers can be calculated in O(log N/ loglog N) steps, where N = max{x, y}.
d) Over (+, −, ≤, &, ←, →) (but without indirect addressing as for Bucket Sort), n given integers x1, . . . , xn can be sorted in O(n); over (+, −, ×, div, ≤) this can be achieved in O(n · loglog maxi xi).
e) 3SUM, that is, the question whether for given integers x1, . . . , xn, y1, . . . , yn, z1, . . . , zn there exist i, j, k with xi + yj = zk, can be decided in O(n) operations over (+, −, ×, ≤, &).
f) The permanent perm(A) = Σ_{π∈Sn} a1,π(1) · · · an,π(n) of a given n × n matrix A can be calculated in optimal O(n²) steps over (+, −, ×, div). Without integer division, perm is (Valiant) NP-complete in the uniform-cost model [5, Theorem 21.17] (and even #P-complete in the bit model).

Similarly, 3SUM is considered 'n²-complete' in a certain sense [15]. Regarding d), describing the permutation mapping the input to its sorted output requires Ω(n · log n) bits. Similarly, compare c) with the running time Θ(log N) of the Euclidean algorithm attained on Fibonacci numbers x = Fn = N, y = Fn−1. And finally observe that in b) mere repeated squaring, i.e. without resorting to integer division, yields only running time O(k); the additional input b can thus be seen as a 'catalyst'; cf. Section 3.2 below.

Proof. a) See [32]; b) see [7] or Section 3.2 below; c) see [9]; d) see [21], and [17] for an account of more recent results on sorting using various sets of operations and costs; f) appears e.g. in [1, Proposition 2.4].
Claim e) can be concluded from (the much more general considerations in) the work [4] which, applied to our setting, simplifies to the following observation: for 0 ≤ a0, . . . , aN−1, b0, . . . , bN−1 < 2^(t−1), let A := Σ_{i=0}^{N−1} ai · 2^(ti), B := Σ_i bi · 2^(ti), and C := Σ_i 2^(t−1) · 2^(ti). Then

∀i = 0, . . . , N − 1 : ai ≥ bi  ⇔  (A + C − B) & C = C.
In particular, subject to the above encodings, “∃i : ai = bi ” can be tested in constant time over (+, −, &). Now such an encoding can be obtained for the double sequence (xi + yj )i+nj in linear time O(n) over (+, −, ×, ≤); cf. e.g. our proof of Observation 17 below.
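The packed comparison is easy to check experimentally. In the Python sketch below (function names ours), each value occupies a t-bit block whose top bit 2^(t−1) acts as a sentinel: since all values are below 2^(t−1), the blockwise sums ai + 2^(t−1) − bi never borrow across blocks, and block i keeps its sentinel bit exactly when ai ≥ bi. Python's unbounded integers play the role of the RAM's single long word:

```python
def pack(values, t):
    """Pack nonnegative integers, each < 2**(t-1), into one integer
    using t-bit blocks: value i occupies bits [t*i, t*(i+1))."""
    A = 0
    for i, v in enumerate(values):
        assert 0 <= v < 1 << (t - 1)
        A |= v << (t * i)
    return A

def all_geq(a, b, t):
    """Decide 'a[i] >= b[i] for every i' with a constant number of
    word operations on the packed encodings, via (A + C - B) & C == C:
    the sentinel bit of block i survives the subtraction exactly when
    a[i] >= b[i], and blocks cannot borrow from each other."""
    A, B = pack(a, t), pack(b, t)
    C = sum(1 << (t * i + t - 1) for i in range(len(a)))
    return (A + C - B) & C == C
```

For instance, all_geq([3, 5, 2], [3, 4, 0], 4) holds, while flipping the middle pair to 5 < 6 breaks it.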
2  Polynomial Evaluation
Polynomial evaluation occurs ubiquitously in computer science, e.g. in connection with splines or with Reed–Solomon codes. It is commonly performed by

¹ We thank Riko Jacob for pointing us to Items d) and e) in Example 1.
On Faster Integer Calculations Using Non-arithmetic Primitives
Horner's method, using O(d) arithmetic operations where d denotes the polynomial's degree. While this has been proven optimal in many cases [5, Theorem 6.5], over (certain subrings of) the integers it is not. Specifically, if the integer polynomial to be evaluated has coefficients which are small (e.g. only 0s and 1s) compared to its degree, Horner's method can be slightly accelerated:

Proposition 2. Given p₀, …, p_{d−1} ∈ ℤ with |pₙ| < P ∈ ℕ and x ∈ ℤ, the value Σ_{n=0}^{d−1} pₙ·xⁿ can be calculated using O(d / log_P d) operations over (+, ×, =).

Proof. We treat the terms with negative coefficients separately and may therefore suppose pₙ ≥ 0. For k ∈ ℕ to be chosen later, decompose p into ⌈d/k⌉ polynomials qᵢ ∈ ℕ[X] of degree less than k. Notice that, since their coefficients belong to {0, 1, …, P−1}, there exist at most P^k distinct such polynomials. Evaluate all of them at the given argument x ∈ ℤ: P^k separate executions of Horner's method result in a total running time of O(k·P^k). In a second phase, apply Horner's method to evaluate Σ_{i=0}^{⌈d/k⌉} qᵢ(x)·Yⁱ at Y = x^k and obtain p(x) as desired. Together this leads to a total number of O(d/k + k·P^k) operations which, for k :≈ log_P d − 2·log_P log_P d, becomes O(d / log_P d) as claimed.

2.1  Throwing in Integer Division
Again for fixed p, the running time obtained in Proposition 2 can be accelerated when admitting integer division as an operational primitive and restricting to a finite domain:

Observation 3. For X ∈ ℕ and Z > max_{0≤x≤X} p(x), store the integer Y := Σ_{x=0}^{X} Z^x · p(x). Then, given 0 ≤ x ≤ X, decode this 'table' to obtain p(x) as follows: calculate Z^x using repeated squaring in O(log x) steps; now p(x) equals (Y div Z^x) rem Z.

We thus have a running time independent of deg(p) and logarithmic in x. Surprisingly, [25] and [10] observed that also the latter dependence can be removed:

Algorithm 4. Fix X ∈ ℕ and an integer polynomial p of degree at most d with nonnegative coefficients. Then, for Z ∈ ℕ sufficiently large, one can evaluate {0, 1, …, X} ∋ x ↦ p(x) as follows:
1) Input x ∈ {0, 1, …, X}.
2) Compute Z^{d+1} div (Z − x),
3) multiply the result by p(Z),
4) integer-divide this in turn by Z^d,
5) and output the remainder from division by Z.
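A Python sketch of Algorithm 4, with Z chosen according to the bounds of Scholium 6 below; in the intended setting, Z and p(Z) are precomputed once for the fixed polynomial, so that Steps 2–5 cost a constant number of integer operations:

```python
def eval_fixed_poly(p, x, X):
    """Algorithm 4: evaluate the polynomial with nonnegative coefficient
    list p (p[n] = coefficient of x^n) at 0 <= x <= X."""
    d = len(p) - 1
    P = sum(p)                                   # >= ||p||_1
    Z = max(X**d * P, (X**d + 1) * X) + 1        # 'sufficiently large' (Scholium 6)
    pZ = sum(c * Z**n for n, c in enumerate(p))  # stored constant p(Z)
    y = Z**(d + 1) // (Z - x)   # Step 2: equals Z^d + Z^(d-1)*x + ... + x^d
    y *= pZ                     # Step 3
    y //= Z**d                  # Step 4
    return y % Z                # Step 5: the Z-ary digit at position d is p(x)

p = [3, 0, 2, 7]                # p(x) = 3 + 2x^2 + 7x^3
assert all(eval_fixed_poly(p, x, 20) == 3 + 2*x*x + 7*x**3 for x in range(21))
```

The geometric-series quotient in Step 2 packs 1, x, …, x^d into the base-Z digits; multiplying by p(Z) then makes p(x) appear as the digit at position d.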
By pre-computing and storing the integers Z, Z^d, and p(Z) one arrives at

Corollary 5. Over integer operations {+, −, ×c, div}, an arbitrary fixed polynomial p ∈ ℤ[x] can be evaluated on an arbitrary finite domain D ⊆ ℤ in constant time, independent of p (of its degree) and of D.

Here, ×c denotes unary multiplication (scaling) of the argument by a fixed integer constant. Indeed, evaluation of p at a negative argument −x reduces to the evaluation at positive x of p(−x); and every integer polynomial is the difference of two with nonnegative coefficients. Concerning the correctness of Algorithm 4, we repeat a proof due to [10] and obtain the following strengthening used in Section 4:

Scholium 6. Let p = Σ_{n=0}^{d} pₙ·xⁿ be of degree at most d and norm ‖p‖₁ := |p₀| + ⋯ + |p_d| ≤ P; then every Z > max{X^d · P, (X^d + 1) · X} is 'sufficiently large' for Algorithm 4 to succeed.

Note that the constants Z and p(Z) to store as preprocessing of course depend on p, d, and X; but the running time of Algorithm 4 does not.

Proof. It holds

  Z^{d+1} div (Z − x) = ⌊Z^d / (1 − x/Z)⌋ = ⌊Z^d · Σ_{m=0}^{∞} (x/Z)^m⌋
    = Z^d + Z^{d−1}·x + ⋯ + Z·x^{d−1} + x^d,    (1)

since the tail Σ_{m=d+1}^{∞} Z^d·(x/Z)^m = x^{d+1}/(Z − x) < 1 because Z > (x^d + 1)·x by prerequisite. Hence the result of Step 3 equals

  Σ_{n,m=0}^{d} pₙ · x^m · Z^{n+d−m} = Σ_{k=0}^{2d} q_k · Z^k

where 0 ≤ q_k < Z for k ≤ d because Z > x^d · (p₀ + ⋯ + p_d). In particular q_d = Σ_{n=0}^{d} pₙ · xⁿ = p(x) is isolated by Steps 4 and 5.

2.2  First Consequences
Corollary 7. Over integer operations {+, −, ×c, div}, every finite integer sequence y₀, y₁, …, y_N (or, more formally, the mapping {0, 1, …, N} ∋ n ↦ yₙ) is computable in constant time independent of (the length N of) the sequence!

Proof. Consider an interpolation polynomial p ∈ ℚ[X] of degree ≤ N + 1 such that p(n) = yₙ for n ∈ {0, …, N}. Take M ∈ ℕ such that M·p ∈ ℤ[X]. Apply
Corollary 5 in order to calculate n ↦ M·p(n) in constant time, then integer-divide the result by M.

It has been shown in [20] that every language L ⊆ ℤ (rather than ℤ*) which can be decided over {+, −, ×c, div} at all can be decided in constantly many steps; that is, in time independent of the input x ∈ ℤ (but of course depending on L).

Observation 8. Every finite language L ⊆ ℤ is decidable over integer operations {+, −, ×c, div} within constant time independent of L.

Proof. Let L ⊆ {0, 1, …, N} and apply Corollary 7 to the characteristic sequence (y₀, …, y_N) of L, defined by yₙ := 1 for n ∈ L and yₙ := 0 for n ∉ L.

The next subsection implies the same to hold for finite sequences (y₀, …, y_N) in ℤ^d and for finite languages L ⊆ ℤ^d as long as d is fixed.

2.3  Multi-variate Case
We extend Algorithm 4 to obtain

Proposition 9. Over integer operations {+, −, ×, div, ≤}, any fixed polynomial p ∈ ℤ[x₁, …, xₙ] can be evaluated on an arbitrary finite domain D ⊆ ℤⁿ in time O(n) independent of p and D.

Proof. We devise 2ⁿ separate algorithms: one for each of the polynomials p(±x₁, ±x₂, …, ±xₙ) to be evaluated at non-negative argument vectors x ∈ ℕⁿ. Then, for a given input in ℤⁿ, one can in time O(n) determine which of these polynomials to evaluate at (|x₁|, |x₂|, …, |xₙ|) in order to yield the aimed value p(x). Moreover, decomposition of a polynomial into a part with positive and one with negative coefficients reduces to the case p = Σ_{i₁,…,iₙ=0}^{d−1} a_{i₁,…,iₙ}·x₁^{i₁} ⋯ xₙ^{iₙ} with a_ı ∈ ℕ. As in Equation (1), Z^d div (Z − x) equals Z^{d−1} + Z^{d−2}·x + ⋯ + Z·x^{d−2} + x^{d−1} for all integers Z ≥ Ω(x^d). Applied to x₂ and Z₂ := Z^d, one obtains

  (Z^{d²} div (Z^d − x₂)) · (Z^d div (Z − x₁)) = Σ_{i₁,i₂=0}^{d−1} Z^{d²−1−(d·i₂+i₁)} · x₂^{i₂} · x₁^{i₁}

and inductively, using O(n) operations from {+, −, ×, div},

  Σ_{i₁,…,iₙ=0}^{d−1} Z^{dⁿ−1−(d^{n−1}·iₙ+⋯+d·i₂+i₁)} · xₙ^{iₙ} ⋯ x₂^{i₂} · x₁^{i₁}.

Then multiply this counterpart to Step 2) of Algorithm 4 with the constant

  p(Z, Z^d, Z^{d²}, …, Z^{d^{n−1}}) = Σ_{i₁,…,iₙ=0}^{d−1} a_{i₁,…,iₙ} · Z^{i₁+d·i₂+d²·i₃+⋯+d^{n−1}·iₙ}

(cf. Step 3) and extract the term corresponding to Z^{dⁿ−1} (Steps 4+5).
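A bivariate sketch (n = 2, maximum degree d = 3) of this construction; the generously chosen Z and the helper name are our illustration choices:

```python
def eval_fixed_poly2(a, x1, x2):
    """Bivariate instance of the Proposition-9 construction: a[i1][i2] are
    the nonnegative coefficients of x1^i1 * x2^i2; Z and p(Z, Z^d) would be
    precomputed once for the fixed polynomial."""
    d = 3
    Z = 10**6                      # comfortably 'sufficiently large' for x <= 5
    pZ = sum(a[i1][i2] * Z**(i1 + d*i2)
             for i1 in range(d) for i2 in range(d))     # p(Z, Z^d)
    y = (Z**(d*d) // (Z**d - x2)) * (Z**d // (Z - x1))  # counterpart to Step 2
    y *= pZ                                             # Step 3
    return y // Z**(d*d - 1) % Z                        # Steps 4+5

a = [[1, 3, 0], [2, 0, 0], [0, 0, 4]]  # p = 1 + 3*x2 + 2*x1 + 4*x1^2*x2^2
assert all(eval_fixed_poly2(a, x1, x2) == 1 + 3*x2 + 2*x1 + 4*x1**2 * x2**2
           for x1 in range(6) for x2 in range(6))
```

The two geometric-series quotients pack all monomials x₁^{i₁}·x₂^{i₂} into distinct base-Z digit positions, so a single multiplication by p(Z, Z^d) assembles p(x₁, x₂) at position Z^{d²−1}.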
2.4  Evaluation on all Integers: Exploiting Bitwise Conjunction
As opposed to Horner's method, Algorithm 4 and its above generalization restrict polynomial evaluation to arguments x from an arbitrary yet finite domain. Indeed, Scholium 6 derives from a bound X on x a bound on Z to avoid spill-overs in the Z-ary expansion of the product of Z^{d+1} div (Z − x) with p(Z). Now Z can of course be chosen adaptively with respect to x; but how do we then adapt and calculate p(Z) accordingly? This becomes possible when allowing, in addition to integer division, bitwise conjunction as an operational primitive.

Proposition 10. Fix p ∈ ℕ[x] of degree d. Then evaluation ℕ ∋ x ↦ p(x) is possible using O(log d) operations over {+, −, ×, div, &}.

This is much faster than Horner's method and asymptotically optimal.
Fig. 1. Expansions of the calculations employed in the proof of Proposition 10
Proof. Since p is fixed, one may store p(Y) as a constant for some sufficiently large integer Y, w.l.o.g. a power of two. Notice that Y − 1 can then serve as a mask for bitwise conjunction: for 0 ≤ qₙ < Y and Z a multiple of Y it holds

  (Σₙ qₙ · Zⁿ) & ((Y − 1) · Z^m) = q_m · Z^m;

compare Figure 1. Now, given x ∈ ℕ, we compute Z₀ := x^{d+2} using repeated squaring within O(log d) steps; hence Z := Z₀ · Y satisfies the conditions of Scholium 6. Then, using another O(log d) steps, calculate Z^{d+1} and, from that, Σ_{i=0}^{d} Zⁱ = Z^{d+1} div (Z − 1) as in Equation (1). Multiply the latter by p(Y) and, to the result, apply bitwise conjunction with the mask Σ_{i=0}^{d} (Y − 1) · Yⁱ · Zⁱ; the latter can
be obtained, again, as (Y − 1) · ((Z^{d+1} · Y^{d+1}) div (Z·Y − 1)). Based on the mask property of Y − 1 mentioned above, this yields Σ_{i=0}^{d} pᵢ · (ZY)ⁱ = p(ZY): now continue as in Algorithm 4 (with ZY in place of Z).

A review of the above proof reveals that the O(log d) steps are spent calculating Z₀ = x^{d+2} and Z^d; everything else proceeds in constant time based on pre-computed constants like Y^d. Now when x ≤ O(2^d), x^d and Z^d are faster to obtain by starting the repeated squaring not from x but from 2^d and 2^{d²}, respectively, taking O(log log x) steps. Alternatively we may choose d as a power of two to invoke Example 1b) and arrive at

Scholium 11. Fix p ∈ ℤ[x] of degree d. Given x ∈ ℤ, one can calculate p(x) using O(log log |x|) operations over {+, −, ×, div, &}. If in addition some arbitrary integer y ≥ |x|^{d²} is given, a running time of O(√min{log d, log log |x|}) is also feasible.

As in Proposition 9, this extends to the multi-variate case:

Theorem 12. Over integer operations {+, −, ×, div, &}, any fixed polynomial p ∈ ℤ[x₁, …, xₙ] of maximum degree less than d can be evaluated in time O(n · min{log d, log log maxᵢ |xᵢ|}). If, in addition to the argument (x₁, …, xₙ), some integer y ≥ (maxᵢ |xᵢ|)^{d^{n+1}} is given, the running time reduces to O(n · √min{log d, log log maxᵢ |xᵢ|}).
Proof. According to the proof of Proposition 9, for some integer Z > Ω(x^d) we need to know (Z^d, Z^{d²}, …, Z^{dⁿ}) and p(Z, Z^d, …, Z^{d^{n−1}}). Since the latter is a univariate polynomial in Z of degree < d^{n+1}, the proof of Proposition 10 shows how to obtain this value from p(Y, Y^d, …, Y^{d^{n−1}}) using bitwise conjunction. Repeated squaring, either of maxᵢ |xᵢ| or of (2^d, 2^{d²}, …, 2^{dⁿ}), yields (Z^d, Z^{d²}, …, Z^{dⁿ}) in time O(n · min{log d, log log maxᵢ |xᵢ|}); the additional input y accelerates this to O(n · √min{log d, log log maxᵢ |xᵢ|}) according to Example 1b).

2.5  Storing and Extracting Algebraic Numbers
When permitting not '&' but only (+, −, ×, div), Horner's method seems to remain the fastest known algorithm for evaluating an arbitrary but fixed polynomial on all of ℕ. Its running time O(d) leaves a doubly exponential gap to the lower bound of Ω(log log d) due to [23, Corollary 3].

Question 13. Does every (fixed) polynomial p ∈ ℕ[x] admit evaluation x ↦ p(x) on all integers x ∈ ℕ in time o(deg p) over (+, −, ×, div)?

In view of the previous considerations, the answer is positive if one can, from given x, within the requested time bounds and using the operations under
consideration, obtain the number p(Z) for some Z > Ω(x^d), where d > deg p. To this end in turn, choose Zₙ := Y·2ⁿ where Y = 2^k > ‖p‖₁, and encode the sequence p(Zₙ) < Zₙ^d·‖p‖₁ ≤ 2^{K+dn}, where n ∈ ℕ and K := k·(d+1), as in Observation 3 into the binary expansion, now of a real number like

  ρ_p := Σₙ p(Zₙ) · 2^{−n·(K+dn)}.    (2)

Then, given x ∈ ℕ, it suffices to approximate ρ_p up to error < 2^{−n·(K+dn)} for some n ≥ Ω(d·log x) in order to extract² the corresponding p(Zₙ).

Lemma 14. Fix α ∈ ℝ algebraic of degree < δ. Then, given n ∈ ℕ, one can calculate u, v ∈ ℕ such that |α − u/v| ≤ 2^{−n} using O(δ·log n) operations over (+, −, ×).

Similar results are known to hold, although by very different methods, for certain transcendental numbers [8].

Proof (Sketch). Apply Newton iteration to the minimal polynomial q ∈ ℤ[x] of α. Since the latter is fixed, q, q′, and an appropriate starting point for quadratic convergence can be stored beforehand. O(log n) iterations suffice to attain the desired precision; and each one amounts to evaluating q and q′ at cost O(δ) via Horner's method.

So, when permitting a mild dependence of the running time on x and if ρ_p is algebraic of degree o(deg p), we obtain a positive answer to Question 13:

Proposition 15. Let p ∈ ℕ[x] be of degree < d and suppose that Σₙ 2^{−dn²} is algebraic of degree < δ. Then ℕ ∋ x ↦ p(x) can be calculated over (+, −, ×, div) using O(δ·log log x) steps.

Unfortunately the question whether Σₙ 2^{−dn²} is algebraic (not to mention what its degree is) constitutes a deep open problem in Number Theory [30, Section 10.7.B, Example 1, p. 314]. We are currently pursuing a different approach to Question 13 with a mild dependence on x: namely, by exploiting integer division in some of the algorithms described in [14], in combination with the following:

Observation 16. Let p ∈ ℚ[x] be of degree < d and c ∈ ℕ. Then the integer sequence p(1), p(c), p(c²), …, p(cⁿ), … is linearly recurrent of degree < d; that is, there exist a₀, a₁, …, a_d ∈ ℤ such that p(c^{n+1}) = (a₁·p(cⁿ) + ⋯ + a_d·p(c^{n−d+1}))/a₀ for all n ∈ ℕ.

² Strictly speaking, this approximation does not permit determining e.g. the least bit of p(Zₙ), due to iterated carries of less significant ones; however, this can be overcome by slightly modifying the encoding to force the least bit to be, e.g., zero.
Proof. For k = d − 1, the (d + 1) polynomials p(cx), p(x), p(x/c), …, p(x·c^{−k}) all have degree < d and must therefore be linearly dependent over ℚ: q₀·p(cx) + q₁·p(x) + ⋯ + q_{k+1}·p(x·c^{−k}) ≡ 0, w.l.o.g. with qᵢ ∈ ℤ. Choosing k minimal implies q₀ ≠ 0.
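For instance, with p(x) = x² + 1 and c = 2, already a₀ = 1, a₁ = 5, a₂ = −4 (and a₃ = 0) work, since the characteristic roots are c² = 4 and 1; a quick numerical check:

```python
# p(x) = x^2 + 1 with c = 2: s_n := p(2^n) = 4^n + 1 satisfies the
# recurrence s_{n+1} = 5*s_n - 4*s_{n-1}, an instance of Observation 16.
p = lambda x: x * x + 1
s = [p(2**n) for n in range(10)]
assert all(s[n + 1] == 5 * s[n] - 4 * s[n - 1] for n in range(1, 9))
```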
3  Applications to Linear Algebra

Naive multiplication of n × n matrices takes cubic running time, but V. Strassen has set off a race for faster methods, with the current record O(n^ω) for ω < 2.38 held by D. Coppersmith and S. Winograd; see [5, Section 15] for a thorough account. However, these considerations apply to the uniform-cost model over the arithmetic operations +, −, ×, where division provably does not help [5, Theorem 7.1]; whereas over ℤ, when permitting integer division as a non-arithmetic operation, optimal quadratic running time can easily be attained:

Observation 17. Given A ∈ ℤ^{k×n} and B ∈ ℤ^{n×m}, one can compute C := A·B ∈ ℤ^{k×m} using O(kn + nm + km) operations over {+, −, ×, div}.
Fig. 2. Encoding matrices (a_{iℓ}) and (b_{ℓj}) into integers α, β; and decoding (c_{ij}) from α·β
Proof. We want to calculate c_{i,j} = Σ_{ℓ=1}^{n} a_{i,ℓ}·b_{ℓ,j} for i = 1, …, k and j = 1, …, m. W.l.o.g. a_{i,ℓ}, b_{ℓ,j} ≥ 0; otherwise decompose. Choose Z > (max_{i,ℓ} a_{i,ℓ}) · (max_{ℓ,j} b_{ℓ,j}) · n; then compute

  α := Σ_{i=1}^{k} Σ_{ℓ=1}^{n} a_{i,ℓ} · Z^{(ℓ−1)+2nm(i−1)}   and   β := Σ_{ℓ=1}^{n} Σ_{j=1}^{m} b_{ℓ,j} · Z^{(n−ℓ)+2n(j−1)}.

As indicated in Figure 2, the Z-adic expansion of their product γ := α·β contains all the desired numbers c_{i,j} at 'position' Z^{2n(j−1)+(n−1)+2nm(i−1)}, from which they are easily extracted using division with remainder.

Observe that most of the time is spent encoding and decoding the input and output, respectively. However, the right factor is encoded differently from the left one; hence binary powering yields computation of A^k from A ∈ ℤ^{n×n} within O(n²·log k), whereas a running time of O(n² + log k), i.e. by encoding and decoding only at the beginning and the end, seems infeasible. We shall return to this topic in Section 3.2.
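The proof's encoding can be sketched in Python (helper name ours; arbitrary-precision integers stand in for the unit-cost registers, and entries are assumed nonnegative as in the proof):

```python
def matmul_packed(A, B):
    """Observation 17's scheme: multiply a k×n by an n×m matrix of
    nonnegative integers via one big-integer product, using O(kn+nm+km)
    encoding/decoding operations."""
    k, n, m = len(A), len(B), len(B[0])
    Z = max(max(map(max, A)), 1) * max(max(map(max, B)), 1) * n + 1
    alpha = sum(A[i][l] * Z**(l + 2*n*m*i) for i in range(k) for l in range(n))
    beta = sum(B[l][j] * Z**((n - 1 - l) + 2*n*j) for l in range(n) for j in range(m))
    gamma = alpha * beta          # every c_ij appears as a Z-adic digit of gamma
    return [[gamma // Z**((n - 1) + 2*n*j + 2*n*m*i) % Z for j in range(m)]
            for i in range(k)]

assert matmul_packed([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]
```

Matching row- and reversed column-offsets make the inner products accumulate at disjoint digit positions; the bound Z > (max a)·(max b)·n keeps each digit carry-free.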
3.1  Determinant and Permanent
Over arithmetic operations (+, −, ×), the asymptotic complexities of matrix multiplication and of determinant computation are, although unknown, arbitrarily close to each other [5, Section 16.4]. We show the same to hold as well when including integer division: not by means of reduction but by exhibiting explicit algorithms. The determinant of an n × n matrix A is not too difficult to see computable in a polynomial number O(n³) of steps by bringing A into echelon form using Gaussian elimination. The permanent on the other hand is (Valiant-)NP-complete in this algebraic model [5, Theorem 21.17] (and even #P-complete in the bit model). Nevertheless, when throwing in integer division, it is known [1, Proposition 2.4]:

Fact 18. One can calculate ℕ^{n×n} ∋ A ↦ perm(A) = Σ_{π∈Sₙ} a_{1,π(1)} ⋯ a_{n,π(n)} over (+, −, ×, div) in O(n²) steps.

Theorem 19. Given A ∈ ℤ^{n×n}, one can calculate det(A) within O(n²) operations over {+, −, ×, div}.

Notice that, as opposed to Theorem 12, bitwise conjunction '&' is not needed!

Proof. Let

  det₊(A) := Σ_{π∈Sₙ, sgn(π)=+} a_{1,π(1)} ⋯ a_{n,π(n)}   and   det₋(A) := Σ_{π∈Sₙ, sgn(π)=−} a_{1,π(1)} ⋯ a_{n,π(n)}.

Hence perm(A) = det₊(A) + det₋(A) whereas det(A) = det₊(A) − det₋(A). Also, both det₊(A) and det₋(A) are polynomials in the n² variables x_{i+n(j−1)} := a_{i,j} of maximum degree less than d := 2 (their total degree is n) with coefficients 0, 1. As in Section 2.4 it thus suffices, in view of the proof of Proposition 9, to obtain the values of det₊ = (perm + det)/2 and of det₋ = (perm − det)/2 at the point (x₀, …, x_{n²−1}) := (Z′, Z′², Z′⁴, …, Z′^{2^{n²−1}}), where Z′ := Z·Y for Z := (max_k |x_k|)² and Y some appropriate constant. Now this point can be computed in O(n²) operations; and so can its permanent according to Fact 18; whereas its determinant amounts to
the n × n determinant

  det ( Z′^{2^{(i−1)·n + (j−1)}} )_{1≤i,j≤n} ,

a (generalized) Vandermonde determinant in the n nodes Z′^{2^{(i−1)n}}, i = 1, …, n, whose explicit factorization into powers of Z′ and differences Z′^{2^{(j−1)n}} − Z′^{2^{(i−1)n}}, 1 ≤ i < j ≤ n, renders it easily evaluable in O(n²) steps.
Question 20. Let P ⊆ Sₙ be an arbitrary family of permutations on [n]. Can Σ_{π∈P} Π_{i=0}^{n−1} x_{i+n·π(i)} be evaluated using O(n²) steps over {+, −, ×, div}?

3.2  Integer Matrix Powering: Exploiting GCD
The unit cost assigned to multiplication '×' allows high powers like a^{2^k} of a given input a to be computed by squaring k times. However, the presence of integer division, and the additional input of a sufficiently large but otherwise arbitrary integer b, yields a^{2^k} in only O(√k) steps; recall Example 1b). We now generalize this to powering integer matrices.

Definition 21. For X, C ∈ ℤ^{d×d}, let gcd(C) := gcd(c_{ij} : 1 ≤ i, j ≤ d) and X rem C := (x_{ij} rem gcd(C))_{ij} extend the gcd and remainder from natural numbers to integer matrices. Also write 'X ≡ Y (mod C)' if gcd(C) divides each entry x_{ij} − y_{ij} of X − Y.

For fixed C, this obviously yields an equivalence relation on ℤ^{d×d}; in fact a two-sided congruence relation³, since one easily verifies:

Lemma 22. a) If X ≡ Y (mod C), then S·X·T ≡ S·Y·T (mod C).

³ Conversely, every two-sided ideal in ℤ^{d×d} is of this form [19, Proposition III.2.1].
b) For each n ∈ ℕ it holds Xⁿ ≡ Yⁿ (mod X − Y).
c) X rem C ≡ X (mod C).
d) If 0 ≤ x_{ij} < gcd(C) then X rem C = X.

Claim b) follows from a) and the non-commutative binomial theorem. Now apply Lemma 22 to n := 2, X := A^{2^{j−1}}, Y := B, and C := Y − X in order to conclude

  A^{2^j} = (A^{2^{j−1}})² = B² rem (B − A^{2^{j−1}})    (3)

provided that gcd(B − A^{2^{j−1}}) is larger than the entries of A^{2^j}. In the case d = 1 treated in Example 1b), this simply amounts to the condition that B = (b) be larger than a^{2^j}. In the general case d > 1, we need that gcd(B − C) be 'large' for 'not too large' matrices C. Section 3.3 below reveals that there are plenty of, and describes how to obtain, such matrices B appropriate for our

Theorem 23. Let k ∈ ℕ and A ∈ ℕ^{d×d} be given and abbreviate γ := d^{2^k−1} · (max_{ij} a_{ij})^{2^k}. Given furthermore some B ∈ ℕ^{d×d} such that gcd(B − C) > γ holds for all C ∈ {0, 1, …, γ}^{d×d}, one can compute A^{2^k} using O(d²·√k) operations over {+, −, ×, div, gcd}.
Notice that such a B will 'catalyze' not only the calculation of A^{2^k} but also that of A′^{2^{k′}}, for any 0 ≤ k′ ≤ k and any 0 ≤ a′_{ij} ≤ a_{ij}, in time O(d²·√k′).
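In the scalar case d = 1, the iteration behind Theorem 23 can be sketched as follows; this illustrates only the Equation-(3) congruence trick, batching ℓ doublings per remainder so that O(√k) operations suffice for k = ℓ² (the sufficient condition b > 2·a^{2^k} is our simplification):

```python
from math import isqrt

def pow_2k_with_catalyst(a, k, b):
    """Scalar (d = 1) instance of Equation (3): compute a^(2^k) for k = l*l
    with O(l) = O(sqrt(k)) operations, given a 'catalyst' b > 2 * a^(2^k)."""
    l = isqrt(k)
    assert l * l == k, "sketch assumes k to be a perfect square"
    B = b
    for _ in range(l):           # B := b^(2^l) by repeated squaring
        B *= B
    x = a                        # invariant: x = a^(2^(j*l)) after j steps
    for _ in range(l):           # l remainder steps, no further multiplications
        x = B % (b - x)          # b^(2^l) ≡ x^(2^l) (mod b - x), and x^(2^l) < b - x
    return x

assert pow_2k_with_catalyst(2, 9, 2 * 2**512 + 1) == 2**512
```

Each remainder step jumps ℓ doublings at once: since b ≡ x (mod b − x), the pre-computed b^{2^ℓ} reduces to x^{2^ℓ}, which the size condition on b keeps below the modulus.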
Proof. It suffices to treat the case k = ℓ². First calculate B^{2^ℓ} using repeated squaring within O(d²·ℓ) operations according to Observation 17. Then proceed, inductively for j = 1, …, ℓ, from A^{2^{(j−1)·ℓ}} to A^{2^{j·ℓ}} according to Equation (3) (with n := 2^ℓ and the pre-computed B^{2^ℓ} in place of B²), at cost O(d²) each. Indeed, the m-th power of a d × d matrix with entries in {0, 1, …, s} has entries in {0, 1, …, d^{m−1}·s^m}; hence the prerequisite on B shows that Lemma 22d) applies. The binary gcd operation is used to compute gcd(C) in O(d²) steps and then X rem C according to Definition 21. In fact we were surprised to realize that the above sequence A^{2^{j·ℓ}}, j = 0, …, ℓ, is obtained according to Equation (3) merely by componentwise remainder calculations.

3.3  Locally Lower-Bounding the GCD
Upper semi-continuity of a real function f : ℝ^d → ℝ at x means that, for arguments u sufficiently close to x, the values f(u) do not drop below f(x) too much; recall e.g. [28, Chapter 6.7]. Now the greatest common divisor (gcd) function is discrete, and such topological concepts are hence inapplicable in the strict sense. Nevertheless one may say that gcd does admit points x of arbitrarily close 'approximate' upper semi-continuity:

Lemma 24. For all d, r, s ∈ ℕ there exist x₁, x₂, …, x_d ∈ ℕ such that, for all v₁, …, v_d ∈ {0, 1, …, s − 1}, it holds gcd(x₁ + v₁, …, x_d + v_d) ≥ r.
Proof. Take pairwise coprime integers p_v ≥ r, v ∈ {0, 1, …, s−1}^d; e.g. distinct prime numbers will do. For i = 1, …, d and j = 0, …, s−1 let u_{i,j} := Π_{v : vᵢ = j} p_v. Then, for fixed i, the numbers u_{i,0}, u_{i,1}, …, u_{i,s−1} are pairwise coprime themselves. Hence, by the Chinese Remainder Theorem, there exists xᵢ ∈ ℕ such that u_{i,j} divides xᵢ + j for all j = 0, 1, …, s−1. In particular p_v, which is common to all u_{i,vᵢ}, divides xᵢ + vᵢ for each i = 1, …, d; and thus also divides gcd(x₁ + v₁, …, x_d + v_d): which therefore must be at least as large as p_v ≥ r.

Scholium 25. a) x₁, …, x_d according to Lemma 24 can be chosen to lie between 0 and (r·S)^{O(S)}, where S := s^d. b) They can be constructed (although not necessarily within this bound) using O(S) operations over (+, −, ×, div, gcdex).

Here 'gcdex' denotes the extended (binary) gcd function returning, for given x, y ∈ ℕ, some s, t ∈ ℤ (w.l.o.g. coprime) such that gcd(x, y) = sx + ty.

Proof. a) According to the Prime Number Theorem, the k-th prime p_k has magnitude O(k·log k) and there are at most π(n) ≤ O(n/log n) primes below n. Hence the first prime at least as large as r has index k_r ≤ O(r/log r); and we are interested in bounding the product N = p_{k_r} ⋯ p_{k_r+S}, that is, basically the quotient of primorials (r+ℓ)#/r# where r+ℓ = p_{k_r+S} = r + O(S·log S). It has been shown⁴ [24] that π(r+ℓ) − π(r) ≤ 2·π(ℓ) holds; that is, between r and r+ℓ there are at most O(ℓ/log ℓ) = O(S) primes, each obviously not larger than r+ℓ. Hence (r+ℓ)#/r# ≤ (r+ℓ)^{O(ℓ/log ℓ)} ≤ (r·ℓ)^{O(ℓ/log ℓ)} for ℓ = O(S·log S). b) Pairwise coprime integers pᵢ ≥ r can be found iteratively as p₁ := r, p₂ := r+1, p₃ := p₁·p₂ + 1, and p_{i+1} := p₁ ⋯ pᵢ + 1. Then apply the next lemma.

Lemma 26 (Chinese Remainder). Given a₁, …, aₙ ∈ ℕ and pairwise coprime m₁, …, mₙ ∈ ℕ, one can calculate x ∈ ℕ with x ≡ aᵢ (mod mᵢ) for i = 1, …, n using O(log n · Σ_{i=1}^{n} log mᵢ) operations over (+, −, ×, div).
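The construction of Lemma 24 via Chinese remaindering can be sketched in Python; `pow(M, -1, m)` supplies the modular inverse that a gcdex call would, and all helper names are our own:

```python
import itertools
from math import gcd, prod

def crt(residues, moduli):
    """Chinese remaindering over pairwise coprime moduli (Lemma 26)."""
    x, M = 0, 1
    for a, m in zip(residues, moduli):
        t = (a - x) * pow(M, -1, m) % m   # solve x + M*t ≡ a (mod m)
        x, M = x + M * t, M * m
    return x

def gcd_friendly_point(d, r, s):
    """Lemma 24 (sketch): x_1..x_d with gcd(x_1+v_1, ..., x_d+v_d) >= r
    for every shift vector v in {0, ..., s-1}^d."""
    vs = list(itertools.product(range(s), repeat=d))
    ps = []                                   # pairwise coprime, all >= r
    for i in range(len(vs)):                  # Scholium 25b's iteration
        ps.append(r if i == 0 else r + 1 if i == 1 else prod(ps) + 1)
    pv = dict(zip(vs, ps))
    xs = []
    for i in range(d):
        u = [prod(pv[v] for v in vs if v[i] == j) for j in range(s)]
        xs.append(crt([(-j) % u[j] for j in range(s)], u))  # u_{i,j} | x_i + j
    return xs

xs = gcd_friendly_point(d=2, r=5, s=3)
assert all(gcd(xs[0] + v0, xs[1] + v1) >= 5 for v0 in range(3) for v1 in range(3))
```

As the proof of Scholium 25 notes, the iteratively constructed pᵢ grow doubly exponentially; choosing distinct primes instead would keep the xᵢ within the bound of Part a).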
When permitting in addition gcdex as a primitive, the running time drops down⁵ to O(n).

Proof. Calculate N := m₁ ⋯ mₙ and, for each i = 1, …, n, eᵢ := tᵢ·N/mᵢ where 1 = gcd(mᵢ, N/mᵢ) = sᵢ·mᵢ + tᵢ·N/mᵢ with sᵢ, tᵢ ∈ ℤ returned by gcdex. Then it holds eᵢ ≡ 1 (mod mᵢ) and eᵢ ≡ 0 (mod mⱼ) for j ≠ i; hence x := Σᵢ eᵢ·aᵢ satisfies the requirements. When working over (+, −, ×, div), the extended Euclidean algorithm computes gcdex(mᵢ, N/mᵢ) within O(log N) = O(Σⱼ log mⱼ) steps, for each i = 1, …, n separately: leading to a total running time of O(n·log N). In order to improve this with respect to n, arrange the equations 'x ≡ aᵢ (mod mᵢ)', i = 1, …, n, into a binary tree: first compute simultaneous solutions yⱼ to y ≡ a_{2j} (mod m_{2j}) and y ≡ a_{2j+1} (mod m_{2j+1}) for j = 1, …, n/2; then solve adjacent quadruples as x ≡ y_{2j} (mod m_{4j}·m_{4j+1}) and x ≡ y_{2j+1} (mod m_{4j+2}·m_{4j+3})
⁴ We are grateful to our colleague Stefan Wehmeier for pointing out this bound!
⁵ Observe that, since m₁, …, mₙ are pairwise coprime, Σ_{i=1}^{n} log mᵢ ≥ Ω(n).
for j = 1, …, n/4; and so on. The k-th level thus consists of solving n/2^k separate systems of congruences involving disjoint k-tuples out of m₁, …, mₙ; that is, the corresponding extended Euclidean algorithms incur cost O(Σᵢ log mᵢ) independent of k = 1, …, O(log n).

3.4  Constructing Primes Using Integer Division
The (last of the) S pairwise coprime numbers pⱼ ≥ r (and thus also the integers xᵢ) computed according to Part b) of Scholium 25 are of order Ω(r^{2^{S−2}}) and thus much, much larger than the ones asserted feasible in Part a) by choosing the pⱼ as prime numbers. This raises the question of the benefit of our non-arithmetic operations for calculating primes, i.e. for solving Problem (b) mentioned in the beginning of [29, Chapter 3] and addressed in Section II thereof. The Sieve of Eratosthenes finds all primes up to N using O(N) operations over (+, −). This can be accelerated [26] to O(N/log log N), which is almost optimal in view of the output consisting of Θ(N/log N) primes according to the Prime Number Theorem. This also yields a simple randomized way of finding a prime:

Observation 27. Given N ∈ ℕ, guess some integer N ≤ n < 2N. Then, with probability Θ(1/log N), n is a prime number: hence after O(log N) independent trials we have, with constant probability, found one. Using Example 1a) to test primality, this leads to O(log² N) expected steps over (+, −, ×, div).

Indeed the Bertrand–Chebyshev Theorem asserts a prime to always exist between N and 2N. This trivial algorithm can be slightly improved:

Proposition 28. Given N ∈ ℕ, a randomized algorithm can, with constant probability and within O(log² N / log log N) steps over (+, −, ×, div), obtain a prime p ≥ N.

Proof. First check whether N itself is prime: by testing whether N divides (N−1)! (Wilson's Theorem); using integer division, this can be done in O(log N) operations over (+, −, ×, div) [32, Section 3]. From that, each adjacent factorial (N+k)!, k = 0, …, K−1, is reachable in constant time: that is, after having tested primality of N, corresponding checks for N+1, N+2, …, N+K are basically free when K := O(log N). So now guess some O(log N)-bit number M ≤ N and then test the integers N+M, N+M+1, …, N+M+K in total time O(log N) as above.
We claim that this succeeds with probability Ω(log log N / log N); hence the claim follows by repeating independently at random for O(log N / log log N) times. Indeed the Prime Number Theorem asserts that between N and 2N lie Ω(N/log N) primes; on the other hand, every interval of length K between N and 2N contains at most π(K) ≤ O(K/log K) primes [24]: hence, by pigeonhole, among these N/K intervals a fraction of at least Ω(log K / log N) must contain at least one prime.
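The Wilson-criterion check at the heart of the proof can be illustrated as follows; the O(log N)-operation computation of the factorial via integer division [32] is not reproduced here, a naive modular product merely stands in for it:

```python
def wilson_prime(n):
    """Wilson's criterion: n >= 2 is prime iff (n-1)! ≡ -1 (mod n)."""
    f = 1
    for i in range(2, n):
        f = f * i % n
    return n >= 2 and f == n - 1

assert [n for n in range(2, 30) if wilson_prime(n)] == [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```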
Concerning an even faster and deterministic way of constructing primes, we

Remark 29. In 1947, W. H. Mills proved the existence of a real number θ ≈ 1.3063789 [11] such that pₙ := ⌊θ^{3ⁿ}⌋, n ∈ ℕ, yields a (sub-)sequence of primes with p_{n+1} > pₙ³. It is not known whether θ is rational; if it is, one can straightforwardly extract from θ a prime pₙ > 3ⁿ =: N within O(n) = O(log N) steps over (+, −, ×, div). But even if θ turns out to be an algebraic irrational, we still obtain the same time bounds! Indeed, in order to compute ⌊θ^N⌋, writing θ = θ̃ + ε, the expansion

  θ^N = θ̃^N + N·ε·θ̃^{N−1} + Σ_{k=2}^{N} C(N, k)·ε^k·θ̃^{N−k},

whose correction terms sum to less than 1 for ε ≈ 2^{−N}/N, shows that it suffices to calculate a rational approximation θ̃ of θ up to this error, according to Lemma 14 in time O(log N), and then to take ⌊θ̃^N⌋.
4  Practical Relevance
Any real computer is of course far from able to operate in constant time on arbitrarily large integers; the above algorithms are therefore not practical in any way. Or are they? The technological progress described by Moore's Law over the last decades includes an exponential increase in the width of processors' arithmetic-logical units (ALUs). Indeed, nowadays CPUs can commonly operate on 64 or even 128 bits in one single instruction: that is, the unit-cost model is valid for surprisingly large inputs and likely to become valid for even larger ones.
  deg(p) ≤       5       5       4       4       4        3
  ‖p‖₁   ≤       5      15       9      13      23       56
  x      ≤       4       3       8       7       6       21
  Z      =  0x1401  0x1000  0x9001  0x8000  0x7471  0x80000

Fig. 3. Polynomials and argument ranges for Algorithm 4 to work on x86-64 CPUs
Specifically concerning Algorithm 4: it already now covers many polynomials of degree up to five (i.e. with six free coefficients). One can easily see the largest intermediate result to arise from the multiplication in Step 3, which then gets integer-divided (and thus smaller again) in Step 4. This corresponds rather nicely to two instructions provided by systems like AMD64 [2, Section 3.3.6] and Intel64 [18, Section 3.2]: mulq multiplies two 64-bit unsigned integers to return a full 128-bit result, while divq obtains both quotient and remainder of dividing a 128-bit numerator by a 64-bit denominator. So whenever, in addition to the conditions on
Z imposed by Scholium 6, p(Z) < 2^64 holds, each step of Algorithm 4 translates straightforwardly to one x86-64 instruction. Figure 3 lists some example classes of polynomials⁶ and argument ranges which comply with these constraints; the shaded areas indicate that Z can be chosen as a power of 2, so as to further replace the integer divisions in Steps 4 and 5 by a shift and a binary conjunction, respectively. This leads to the realization indicated in the left of Figure 4.

# Input: Z − x in %rsi.  Constants:
#   p(Z) in %rdi,
#   Z^(d+1) in %rdx:%rax (may occupy >64 bit),
#   Z − 1 in %ebx,
#   64 − d·log2(Z) in %cl
divq %rsi
mulq %rdi
shld %cl,%rax,%rdx
andl %ebx,%edx
# Output: p(x) in %edx; %rax destroyed.

# x (byte!) in %eax = %ecx;
# d + 1 coefficient bytes at (%esi)
mulb (%esi)
xorl %ebx,%ebx
movb 1(%esi),%bl
addl %ebx,%eax
mull %ecx
  ⋮
mull %ecx
movb d(%esi),%bl
addl %ebx,%eax
# p(x) in %eax; %ecx, %ebx, %edx destroyed

Fig. 4. x86-64 GNU assembler realization of Algorithm 4 (left) and of Horner's method (right)
In comparison with Horner’s Method depicted to the right, this amounts to essentially the elimination of d − 1 (out of d) multiplications at the expense of one division—in a sense a counter-part to the converse direction, taken e.g. in [16], of replacing integer divisions by multiplications. Now an actual performance prediction, and even a meaningful experimental evaluation, is difficult in the age of caching hierarchies and speculative execution. For instance (e.g. traditional) 32-bit applications may leave large parts of a modern superscalar CPU’s 64-bit ALU essentially idle, in which case the left part of Figure 4 as a separate (hyper-)thread can execute basically for free. However even shorter than both Horner’s and Bshouty’s Algorithm for the evaluation of a fixed polynomial p is one (!) simple lookup in a pre-computed table storing p’s values for x = 0, 1, . . . , X. On the other hand when there are many polynomials to be evaluated, the tables required in this approach may become pretty large; e.g. in case of d = 3, X = 21, and p1 ≤ 56 (right-most column in Figure 3), the values of p(x) reach up to X d · p1 , hence do not fit into 16 bit and thus occupy a total of (X + 1) × 4 bytes for each of the
p 1 +d+1 = 487, 635 possible polynomials p: far too much to be held in cache d+1 and thus prone to considerably stall a modern computer; whereas the 487, 635 possible 64-bit values p(Z) do fit nicely into the 4MB L2-cache of modern CPUs, the corresponding four byte coefficients per polynomial (cf. right part of Figure 4) 6
Footnote 6: Polynomials of higher degree D can be treated as ⌈D/(d + 1)⌉ polynomials of degree ≤ d, then combined by applying Horner's method to x^(d+1).
On Faster Integer Calculations Using Non-arithmetic Primitives
even fit into 2MB. One may therefore regard Algorithm 4 as a compromise between table-lookup and Horner’s Method.
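To make the single-division evaluation concrete, the trick behind Algorithm 4 can be sketched as follows (an illustrative Python reconstruction, not the authors' code; Python big ints stand in for the 64-bit registers, and the paper's size constraints are what let each step map to one machine instruction):

```python
def eval_by_division(pZ, Z, d, x):
    """Evaluate p(x) given only p(Z), i.e. the coefficients packed in base Z.

    Requires x^(d+1) < Z - x and that no base-Z digit of the intermediate
    product overflows, mirroring the constraints of Scholium 6.
    """
    q = Z**(d + 1) // (Z - x)      # = Z^d + x*Z^(d-1) + ... + x^d     (divq)
    prod = q * pZ                  # digit convolution: p(x) lands at Z^d (mulq)
    return (prod // Z**d) % Z      # extract that digit (shld + and for Z = 2^k)

# Example: p(x) = 3x^2 + 5x + 7 with Z = 2^16
Z, d = 1 << 16, 2
pZ = sum(c * Z**j for j, c in enumerate([7, 5, 3]))   # p(Z)
print(eval_by_division(pZ, Z, d, 10))                  # → 357 = p(10)
```

Horner's method would use d multiplications here; the sketch replaces all but the one in `q * pZ` by a single division, matching the count stated in the text.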
5 Conclusion
We presented algorithms which, using integer division and related non-arithmetic operations like bitwise conjunction or greatest common divisor, accelerate polynomial evaluation, linear algebra, and number-theoretic calculations to optimal running times. Several solutions would depend on deep open number-theoretical hypotheses, showing that corresponding lower bounds are probably quite difficult to obtain. Other problems turned out to be solvable surprisingly fast (actually beating information-theoretic lower bounds) when some more or less generic integers are provided as additional input. On the other hand, it would suffice for these large numbers to be of size 'only' doubly exponential, and thus they become quickly computable when permitting left-shifts y ↦ 2^y as in [33] or, more generally, exponentiation (x, y) ↦ x^y on N × N as a primitive at unit cost. In view of the hierarchy "addition, multiplication, exponentiation", it seems interesting to gauge the benefit of level ℓ of Ackermann's function A(ℓ, ·) to seemingly unrelated natural problems over the integers.
References
1. Allender, E., Bürgisser, P., Kjeldgaard-Pedersen, J., Miltersen, P.B.: On the Complexity of Numerical Analysis. In: Proc. 21st Annual IEEE Conference on Computational Complexity (CCC 2006), pp. 331–339 (2006)
2. AMD64 Architecture Programmer's Manual, vol. 1: Application Programming, Publication #24592 (Revision 3.13, July 2007)
3. Bach, E., Shallit, J.: Algorithmic Number Theory, vol. 1: Efficient Algorithms. MIT Press, Cambridge (1996)
4. Baran, I., Demaine, E.D., Pătraşcu, M.: Subquadratic Algorithms for 3SUM. In: Dehne, F., López-Ortiz, A., Sack, J.-R. (eds.) WADS 2005. LNCS, vol. 3608, pp. 409–421. Springer, Heidelberg (2005)
5. Bürgisser, P., Clausen, M., Shokrollahi, M.A.: Algebraic Complexity Theory. Springer, Heidelberg (1997)
6. Bertoni, A., Mauri, G., Sabadini, N.: Simulations Among Classes of Random Access Machines and Equivalence Among Numbers Succinctly Represented. Ann. Discrete Math. 25, 65–90 (1985)
7. Bshouty, N.H., Mansour, Y., Schieber, B., Tiwari, P.: Fast Exponentiation using the Truncation Operation. Computational Complexity 2, 244–255 (1992)
8. Borwein, J., Borwein, P.: PI and the AGM. Wiley, Chichester (1987)
9. Bshouty, N.: Euclidean GCD algorithm is not optimal (preprint, 1989)
10. Bshouty, N.: Private communication (1992)
11. Caldwell, C.K., Cheng, Y.: Determining Mills' Constant and a Note on Honaker's Problem. Journal of Integer Sequences 8, article 05.4.1 (2005)
12. Cheng, Q.: On the Ultimate Complexity of Factorials. In: Alt, H., Habib, M. (eds.) STACS 2003. LNCS, vol. 2607, pp. 157–166. Springer, Heidelberg (2003)
K. Lürwer-Brüggemeier and M. Ziegler
13. Coppersmith, D., Winograd, S.: Matrix Multiplication via Arithmetic Progressions. Journal of Symbolic Computation 9, 251–280 (1990)
14. Fiduccia, C.M.: An Efficient Formula for Linear Recurrences. SIAM J. Comput. 14(1), 106–112 (1985)
15. Gajentaan, A., Overmars, M.H.: On a Class of O(n^2) Problems in Computational Geometry. Computational Geometry: Theory and Applications 5, 165–185 (1995)
16. Granlund, T., Montgomery, P.L.: Division by Invariant Integers using Multiplication. In: ACM SIGPLAN Notices, pp. 61–72 (June 1994)
17. Han, Y.: Deterministic Sorting in O(n log log n) Time and Linear Space. Journal of Algorithms 50, 96–105 (2004)
18. Intel 64 and IA-32 Architectures Software Developer's Manual, vol. 2A: Instruction Set Reference, A–M (order no. 253666, May 2007)
19. Jacobson, N.: Structure of Rings. American Mathematical Society Colloquium Publications 37 (1964)
20. Just, B., auf der Heide, F.M., Wigderson, A.: On Computations with Integer Division. RAIRO Informatique Théorique 23(1), 101–111 (1989)
21. Kirkpatrick, D., Reisch, S.: Upper Bounds for Sorting Integers on Random Access Machines. Theoretical Computer Science 28(3), 263–276 (1983)
22. Koiran, P.: Valiant's Model and the Cost of Computing Integers. Computational Complexity 13, 131–146 (2004)
23. Lürwer-Brüggemeier, K., auf der Heide, F.M.: Capabilities and Complexity of Computations with Integer Division. In: Enjalbert, P., Wagner, K.W., Finkel, A. (eds.) STACS 1993. LNCS, vol. 665, pp. 463–472. Springer, Heidelberg (1993)
24. Montgomery, H.L., Vaughan, R.C.: The Large Sieve. Mathematika 20, 119–134 (1973)
25. Mansour, Y., Schieber, B., Tiwari, P.: The Complexity of Approximating the Square Root. In: Proc. 30th IEEE Symposium on Foundations of Computer Science (FOCS 1989), pp. 325–330 (1989)
26. Pritchard, P.: A Sublinear Additive Sieve for Finding Prime Numbers. Communications of the ACM 24, 18–23 (1981)
27. Pratt, V.R., Rabin, M.O., Stockmeyer, L.J.: A Characterization of the Power of Vector Machines. In: Proc. 6th Annual ACM Symposium on Theory of Computing (STOC 1974), pp. 122–134 (1974)
28. Randolph, J.F.: Basic Real and Abstract Analysis. Academic Press, London (1968)
29. Ribenboim, P.: The New Book of Prime Number Records, 3rd edn. Springer, Heidelberg (1996)
30. Ribenboim, P.: My Numbers, My Friends. Springer, Heidelberg (2000)
31. Schönhage, A.: On the Power of Random Access Machines. In: Maurer, H.A. (ed.) ICALP 1979. LNCS, vol. 71, pp. 520–529. Springer, Heidelberg (1979)
32. Shamir, A.: Factoring Numbers in O(log n) Arithmetic Steps. Information Processing Letters 8(1), 28–31 (1979)
33. Simon, J.: Division is Good. In: Proc. 20th Annual Symposium on Foundations of Computer Science (IEEE FOCS 1979), pp. 411–420 (1979)
A Framework for Designing Novel Magnetic Tiles Capable of Complex Self-assemblies

Urmi Majumder and John H. Reif

Department of Computer Science, Duke University, Durham, NC, USA
{urmim,reif}@cs.duke.edu
Abstract. Self-assembly has been immensely successful in creating complex patterns at the molecular scale. However, the use of self-assembly techniques at the macroscopic level has so far been limited to the formation of simple patterns. For example, in a number of prior works, self-assembling units or tiles formed aggregates based on the polarity of magnetic pads on their sides. The complexity of the resulting assemblies was limited, however, due to the small variety of magnetic pads that were used: namely just positive or negative polarity. This paper addresses the key challenge of increasing the variety of magnetic pads for tiles, which would allow the tiles to self-assemble into more complex patterns. We introduce a barcode scheme which potentially allows for the generation of arbitrarily complex structures using magnetic self-assembly at the macro-scale. Development of a framework for designing such barcode schemes is the main contribution of the paper. We also present a physical model based on Newtonian mechanics and Maxwellian magnetics. Additionally, we present a preliminary software simulation system that models the binding of these tiles using magnetic interactions as well as external forces (e.g. wind) which provide energy to the system. Although we have not performed any physical experiments, nevertheless, we show that it is possible to use the simulation results to extract a higher level kinetic model that can be used to predict assembly yield on a larger scale and provide better insight into the dynamics of the real system.
1 Introduction
1.1 Motivation
Self-assembly is a process where small components spontaneously organize themselves into a larger structure. This phenomenon is prevalent on all scales, from molecules to galaxies. Though self-assembly is a bottom-up process not utilizing an overall central control, it is theoretically capable of constructing arbitrarily complex objects. One of the most well-studied sub-fields of self-assembly is molecular self-assembly. However, many interesting applications of self-assembling processes can be found at a larger scale. Massively parallel self-assembling systems present
C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, pp. 129–145, 2008. © Springer-Verlag Berlin Heidelberg 2008
U. Majumder and J.H. Reif
a promising alternative to conventional manufacturing (which mostly uses sequential pick-and-place assembly). There are many examples of self-assembling systems at this scale which may be relevant to robotics and manufacturing, such as self-assembled monolayers, the patterned assembly of electronic components and MEMS devices, and the assembly of micro-robots and/or sensors. In this paper, we explore magnetic self-assembly with the ultimate goal of discovering the practical limits for its use in manufacturing and computing systems. Most of the related work described below focuses on the demonstration of macro- and micro-scale self-assembly. However, this paper focuses more on the design issues relevant to the generation of more complex structures using our novel barcode scheme.
1.2 Previous Work
Recent work in the field of macro-scale self-assembly includes the development of systems based on capillary interactions among millimeter-scale components either floating at a fluid-fluid interface or suspended in an approximately iso-dense fluid medium [1,2,3,4,5,6,7,8,9]. In fact, this technique has been adopted commercially [10,11]. Rothemund [12] demonstrated the use of hydrophobic and hydrophilic interactions to generate self-assemblies of moderate complexity and scale. His work is notable as the only work demonstrating computational self-assembly at the macro-scale.
Magnetic Passive and Active Assemblies. Magnetic assembly [13,14,15] is a form of macro-scale self-assembly that is directed by magnetic dipole interaction. One successful application of magnetic assembly is the spontaneous folding of elastomeric sheets, patterned with magnetic dipoles, into free-standing, 3D spherical shells [16]. This technique has been shown to generate relatively simple structures, largely due to the limited nature of the magnetic interactions. This kind of self-assembly is also known as passive self-assembly, since assembly takes place without external control. Here we address the key challenge of going beyond such limitations and aim to design more complex structures via magnetic assembly. To increase the complexity of magnetic assemblies, Klavins et al. [17] developed programmable units that move passively on an air-table and bind to each other upon random collisions. These programmable units have on-board processors that can change the magnetic properties of the units dynamically during assembly. Once attached, they execute local rules that determine how their internal states change and whether they should remain bound, based on the theory of graph grammars [17]. This form of assembly is referred to as active assembly. However, our goal is to generate complex magnetic assemblies without the use of on-board processors (i.e. passive assembly).
Computational Molecular Self-assemblies. In 1966 Berger proved that, in theory, universal computation can be done via tiling assemblies [18]. This essentially showed that tiling assemblies can generate arbitrarily complex structures. However, these theoretical ideas were not put to practice until much later.
In 1982, Seeman [19] proposed that DNA nano-structures can be self-assembled by using Watson-Crick complementarity and thus that DNA can form the basis of programmable nano-fabrication (this was later demonstrated in the 1990s). A seminal paper by Adleman [20] in 1994 used one-dimensional DNA self-assembly to solve an instance of the Hamiltonian path problem, thus establishing the first experimental connection between DNA self-assembly and computation. This work inspired Winfree [21] to apply the theory of Wang tiles to show that two-dimensional DNA self-assembly is capable of performing Turing Universal computation. This proposal was later verified experimentally with the demonstration of a Sierpinski Triangle pattern composed of DNA tiles [22].
1.3 Our Contribution
The goal of this paper is to develop techniques that will allow the self-assembly of complex structures at the macro-scale. This task is quite challenging, since the available binding mechanisms (using magnetic and capillary interaction) currently used at the macro-scale provide only for binary binding (e.g., positive and negative in the case of magnetic binding and hydrophobic/hydrophilic interactions in the case of capillary binding). By contrast, DNA provides a large number of specific bindings through the use of complementary pairs of DNA sequences that can hybridize selectively. Here, we mimic the techniques and principles of molecular self-assembly to build complex structures at the macroscopic level. The key challenge is then to extend the binding mechanisms at this scale to a much larger number of specific bindings, rather than just two. We achieve this by using the magnetic barcode technique described in this paper. Our testbed is an example of a distributed system where a large number of relatively simple components interact locally to produce interesting global behavior. Square programmable tiles float passively on a forced air-table, mixed randomly by oscillating fans (simulating the Brownian motion of the molecular scale). The tiles have a magnetic encoding on each of their faces. When two tiles collide, if the facing poles are exactly complementary, the tile faces bind to each other, and this process repeats to generate our desired final structure. We discuss how our barcode scheme relates to achievable shapes and how we can optimize our tile design. We further describe how a rigid-body simulation environment can be used to model the testbed, and we perform a very preliminary validation of the feasibility of using self-assembly of magnetic tiles (with barcoded pads) for the generation of patterned lattices using our simulation system. We conclude with a discussion of scalability issues and of how we can use our simulation results to predict yields at larger scales.
Since we have yet to perform physical experiments, the emphasis of this discussion is on the methodology that makes it possible to extract high-level parameters from the low-level simulation environment.
1.4 Organization of the Paper
Section 1 introduces the main theme of the paper: using magnetic barcodes to direct tile-assembly. Section 2 presents the overall scheme, in particular the
barcode scheme and the set of achievable shapes. Section 2.3 discusses the various combinatorial, thermodynamic and physical optimization rules that can be applied to improve the yield of assembly. It also presents techniques from robot motion planning that can be applied as well to improve tile designs. Section 3 presents the simulation model and some preliminary results from simulating a simple two-tile system. It also discusses the feasibility of extracting a higher level kinetic model based on assembly/disassembly rates from the low level physical simulation model and includes a discussion on scaling of the system and yield optimization. Finally Section 4 concludes the paper with some future directions.
2 Design of a Magnetic Self-assembly System
Self-assembly at the macro-scale can happen through a wide range of forces, viz. gravitational, electrostatic, magnetic, capillary, etc. In the past, the driving force has mostly been capillary interaction [23]. An important point to note here is that the choice of the driving force depends on several factors such as the scale and magnitude of the force, environmental compatibility, and the influence of the interactions on the function of the system. We have chosen magnetic force as the driving force for our self-assembling system mainly because magnetic interactions are insensitive to the surrounding medium and are independent of surface chemistry. Also, the range of magnetic forces can be engineered so as to control the long-range and short-range interactions between components. This is important because a key issue in the design of programmable self-assembly is the recognition between components, governed by the design, surface chemistry and topology of the interacting surfaces.
2.1 The Overall Scheme
The overall design of our system is as follows: the main component of our self-assembling system is a set of square wooden tiles. Each edge of a tile is lined with a sequence of magnetic dipoles, perpendicular to the face, each with either its north or its south pole facing out. A typical encoding on a tile face may be {NNNS}, where N or S denotes whether the north or the south pole of the dipole faces out of the tile. The tiles float on a custom-made air-table. A set of fans placed around the air-table mixes the tiles, so all interactions between the tiles are due to chance collisions. The idea is that if a tile face (e.g. with encoding {NNNS}) collides with a tile face with matching encoding (i.e. {SSSN}), they stick together, thus resulting in an assembly (Fig. 1).
2.2 The Barcode Scheme and Achievable Shapes
In the context of our magnetic self-assembly, a barcode is a series of bar magnet poles that face out of the tile on any face (e.g. NSN and SNS as in Fig. 1). If we have an n-character barcode on each face of every square tile in our tile
Fig. 1. A typical magnetic assembly
set, then the number of distinct tiles is 2^(4n). However, there can be different types of assemblies, ranging from uniquely addressable to homogeneous. A uniquely addressable lattice is one where each tile in the assembly has a unique location. Any such lattice of size m × n calls for m(n − 1) + n(m − 1) different barcodes. Thus, in this case, we need barcodes of length O(log(mn)). At the other extreme is a homogeneous lattice, which calls for exactly one tile type and can be constructed with O(1)-length barcodes. In between these two extremes lie computational assemblies, which have been shown to be Turing Universal [24]. Here we treat each tile as a computational unit where the east and south faces are inputs of the computation while the north and west faces are outputs, which are then used in the next step of the computation. In other words, a tile assembly of size nT simulates a Blocked Cellular Automaton of size n running in time T [24]. For any such computation, a barcode of length n generates a tile set of size 2^(2n). Further, the number of functions we can have is (2^(2n))^(2^(2n)).
Examples of Complex Assemblies. Some examples of complex assemblies are given in Fig. 2. Each of these assemblies is based on Winfree's Tile Assembly Model [24] and uses only a small number of tile types (O(1)), as is characteristic of any computational assembly.
Complexity of Achievable Shapes. This question was first addressed by Rothemund and Winfree [26] for computational molecular assembly. However, their results also hold for macroscopic assemblies. Suppose that τ is defined as the parameter which decides when a tile is added to a growing assembly: when the total interaction strength of a tile with its neighbors exceeds τ, the tile gets added to the assembly. Then the minimum number of distinct tiles required to self-assemble an N × N square decreases from N^2 to O(log N) as τ is increased from 1 (non-cooperative bonding) to 2 (cooperative bonding).
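The counting arguments above can be sketched as follows (a small illustration; the helper names are ours):

```python
import math

def distinct_tiles(n):
    """Four faces, n dipoles each, two polarities per dipole: 2^(4n) tiles."""
    return 2 ** (4 * n)

def addressable_barcodes(m, n):
    """Interior edges of a uniquely addressable m x n lattice: m(n-1) + n(m-1)."""
    return m * (n - 1) + n * (m - 1)

def min_barcode_length(m, n):
    """Shortest {N, S} word length giving enough distinct barcodes: O(log(mn))."""
    return math.ceil(math.log2(addressable_barcodes(m, n)))

print(distinct_tiles(3))            # → 4096
print(addressable_barcodes(4, 4))   # → 24
print(min_barcode_length(4, 4))     # → 5
```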
An alternative measure is to compute the minimum number of distinct side labels used for assembling the square. It is still an open question whether both measures give asymptotically similar results. The latter will be more useful for a practical implementation of the system since, in reality, the number of distinct binding interactions is limited due to imperfect specificity of binding. It should
Fig. 2. Examples of Complex Assemblies: (a) Unary Square, (b) Binary Tree, (c) Binary Counter, (d) Beaver Square, (e) Spiral, made with XGROW [25]
be mentioned here that Adleman et al. [27] later proved a tighter bound of Θ(log N / log log N) for the minimum number of distinct tiles required to uniquely assemble an N × N square, by demonstrating that self-assembly can compute changes in the base representation of numbers. A further decrease was achieved by Kao et al. [28], who proved that using a sequence of O(m) changes in τ (where m is an arbitrarily long binary number), a general tile set of size O(1) can uniquely assemble any N × N square. For arbitrary shapes (e.g. non-squares) no such tight bounds exist as yet. However, Soloveichik et al. [29] recently showed that the minimal number of distinct tile types required to self-assemble an arbitrarily scaled structure can be bounded both above and below by the shape's Kolmogorov complexity, where the Kolmogorov complexity of a string I is defined to be the length of the shortest program that computes or outputs I when run on some fixed reference universal computer.
2.3 Tile Programming
This section describes our barcode design scheme. We will sometimes refer to the barcode on a tile face as a word. Here our goal is to design a set of words such that the energy difference between a pair of perfectly matched faces and
a pair of partially or completely unmatched faces is maximized. Tulpan et al. [30] proposed a DNA-design algorithm based on a local search approach that can be utilized for our magnetic barcode design with minimal modifications. The algorithm takes as input the length of each code word, the number of unique code words that need to be generated, and a set of constraints that the output set must satisfy. We describe some of the constraints for magnetic tile design below.
Combinatorial Optimization. Some examples of combinatorial constraints [31] are as follows:
1. The number of mismatches in a perfect alignment of two tile faces must be above a user-defined threshold. For instance, if a tile face encoded as NNNN aligns with a face encoded as SNSN, then there will be two mismatches in such an alignment. Note that mismatches need not be consecutive and can be minimized using prefix codes and Hamming distance maximization.
2. The number of mismatches between a perfect alignment of one tile face encoding and the complement of another tile face encoding should also be above some threshold.
3. Tile binding can be made more complicated by the presence of a slide match configuration (e.g. when a tile face bearing NNNSSNSNSN matches another tile face bearing NNSNSNSNNS, starting at the fourth location on the second face and the first location on the first). Hence the number of mismatches in a slide of one tile face over another must be above some threshold. The problem of slide match configurations can be handled using shift-distinct codes or complementary shapes for tile faces.
4. The maximum number of consecutive matches between all slides of one tile face encoding over the other must be in a user-defined range.
Thermodynamic Optimization. Thermodynamic constraints are based on the free energy of a pair of binding tiles. The free energy of an assembly is not just a function of the encodings, but also of the number, orientation and speed of the fans and the number of tiles. However, any model incorporating so many free parameters would be quite complicated. Hence, for simplicity, we assume that the sole contributor to free energy in our case is the magnetic interaction between two tile faces when they are perfectly aligned. Effects of adjacent faces (e.g. north and east) can be neglected because of shielding (Section 3.1). Some thermodynamic constraints used in the algorithm [31] are as follows:
1. The free energy of a perfect match must be below a given threshold.
2. The free energy of a code word and the complement of another code word, of two words, or of two complements must also be in a desired range.
Eventually, the goal is to obtain a positive free energy gap between the perfect match and the imperfect matches of a code word. Since our magnetic assembly is a mechanical system, we also take some physical factors into consideration while designing tiles.
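The combinatorial constraints listed earlier can be sketched as a word-set filter (a minimal illustration in Python; the threshold values are our assumptions, not values from [30,31]):

```python
# Candidate barcode words over {N, S}. Two facing poles "match" (bind)
# when they are complementary; equal facing poles repel (a mismatch).

def mismatches(a, b):
    """Mismatches in a perfect alignment: positions with equal poles."""
    return sum(x == y for x, y in zip(a, b))

def complement(w):
    return w.translate(str.maketrans("NS", "SN"))

def max_consecutive_matches(a, b):
    """Longest run of complementary (binding) poles over all slides of a along b."""
    n, best = len(a), 0
    for shift in range(-(n - 1), n):
        run = 0
        for i in range(n):
            j = i + shift
            if 0 <= j < n and a[i] != b[j]:   # complementary poles bind
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best

def acceptable(word, chosen, min_mismatch=2, max_run=3):
    """Constraints 1, 2 and 4 checked against the already-chosen word set."""
    for w in chosen:
        if mismatches(word, w) < min_mismatch:              # constraint 1
            return False
        if mismatches(word, complement(w)) < min_mismatch:  # constraint 2
            return False
        if max_consecutive_matches(word, w) > max_run:      # constraint 4
            return False
    return True

print(mismatches("NNNN", "SNSN"))   # → 2, the example in constraint 1
```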
Physical Optimization. We can minimize intra- and inter-tile magnetic crosstalk using the following techniques (Fig. 3):
1. A large tile to magnetic dipole size ratio (minimizes interaction between adjacent tile faces).
2. Barcodes towards the center of the face (minimizes interaction between adjacent tile faces).
3. Use of spacer sequences, thus increasing the alphabet size.
4. Use of long thin magnets, essentially minimizing the effect of one pole on another.
5. Use of magnetic shields (a coating of soft iron on the magnets prevents coupling of flux lines between two adjacent bar magnets).
An alternative method is to use a Halbach array, a special arrangement of permanent magnets that augments the magnetic field on one side of the device while canceling the field to near zero on the other side [32]. Although in this scheme we can intensify the magnetic field at one end of the tile faces and minimize it at the other end of the magnetic arrangement, the method cannot handle sideways magnetic crosstalk.
[Figure 3 schematics: long thin magnets; pairwise alternate encoding; complementary shaped markers; centered encoding; use of spacer sequences; large tile to dipole size ratio; iron shield.]
Fig. 3. Techniques for minimizing magnetic crosstalk
2.4 Improving Tile Designs Using Motion Planning Studies
Complementary shape matching (Fig. 3) is a useful technique in optimal tile design [33]. We can verify the "goodness" of a match using a motion planning technique called a probabilistic roadmap [34], which is mostly used to check the connectivity of a collision-free path in a high-dimensional space. It can also be used to capture the extrema of a continuous function (e.g. potential energy) over a high-dimensional space [35], since the map captures the connectivity of the low-energy subset of the configuration space in the form of a network of weighted pathways. In our context, it can be used to study the potential energy landscape of a two-tile assembly. Specifically, it will be interesting to find out
whether an energetically favorable path exists between any randomly generated configuration of the tiles and their final bound state and, if it exists, to compute the energy barrier. Further, it may be useful to study how the energy barrier varies with various complementary shapes and with the depth of the binding site. The conformational space for a two-tile system with one fixed tile and another with some initial velocity is essentially three-dimensional (x, y and θ). The energy function is based on the magnetic interaction model (see Sect. 3.1). Milestones in this configuration space are generated randomly using rejection sampling, where the probability of accepting a milestone depends on the tile configuration. An edge exists between two milestones in the configuration space if the path between them is energetically favorable, and its weight is determined by the energy of the path. Once the graph is constructed there are many ways to use it. One typical query is the shortest-weight path between two configurations; another is to characterize a true binding site based on the energy barrier mentioned above.
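This roadmap construction can be sketched as follows (an illustration only: the single-well energy function stands in for the magnetic model of Sect. 3.1, and the sampling and connection rules are our assumptions):

```python
import math, random, heapq

def energy(q):
    # Hypothetical potential with its minimum at the bound configuration (0,0,0).
    x, y, th = q
    return -math.exp(-(x*x + y*y + th*th))

def sample_milestones(k, rng=random.Random(0)):
    """Rejection sampling: acceptance probability -energy(q) favors low energy."""
    ms = []
    while len(ms) < k:
        q = (rng.uniform(-2, 2), rng.uniform(-2, 2),
             rng.uniform(-math.pi, math.pi))
        if rng.random() < -energy(q):
            ms.append(q)
    return ms

def build_roadmap(ms, radius=1.5):
    """Connect nearby milestones; edge weight = peak energy along the segment."""
    edges = {i: [] for i in range(len(ms))}
    for i in range(len(ms)):
        for j in range(i + 1, len(ms)):
            if math.dist(ms[i], ms[j]) <= radius:
                w = max(energy(tuple(a + t * (b - a)
                                     for a, b in zip(ms[i], ms[j])))
                        for t in [s / 10 for s in range(11)])
                edges[i].append((j, w))
                edges[j].append((i, w))
    return edges

def energy_barrier(edges, src, dst):
    """Minimax path query: lowest peak energy needed to go from src to dst."""
    best = {src: float("-inf")}
    pq = [(float("-inf"), src)]
    while pq:
        peak, u = heapq.heappop(pq)
        if u == dst:
            return peak
        for v, w in edges[u]:
            cand = max(peak, w)
            if cand < best.get(v, float("inf")):
                best[v] = cand
                heapq.heappush(pq, (cand, v))
    return None   # dst not reachable from src

ms = sample_milestones(30)
roadmap = build_roadmap(ms)
barrier = energy_barrier(roadmap, 0, 1)
```

The minimax query corresponds to the energy-barrier characterization of a binding site mentioned in the text; the shortest-weight path query can be obtained by replacing `max(peak, w)` with an additive cost.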
3 Simulation of a Two-Tile System
3.1 Simulation Model
Since actual experimentation would require elaborate patterning of tiles and careful placement of fans with appropriate driving force, we evaluated our barcode scheme by simulating a two-tile assembly. This section presents the physical model underlying the simulation.
Tile Motion Model. The air-table provides a two-dimensional fluid environment for the tiles. As tiles traverse the testbed, they will lose kinetic energy due to friction and variations in their air cushion. In our model, we assume that our air-table surface has a very low coefficient of friction, minimizing energy losses as the tiles traverse the air-bed.
Fan Model. We use an exponentially decaying function to model the fans. Our simulation assumes that the potential energy E_f is a function of the distance r from a tile to a fan and takes the form E_f = e^(−|r|^2). Hence the fan force can be obtained as the gradient of the potential energy. Interestingly, the oscillating fans simulate the Brownian motion which is the main driving force behind diffusion and self-assembly at the molecular level.
Collision and Friction Model. We assume that the coefficient of restitution between two tiles is small, letting short-range magnetic forces decide whether a binding event will take place or not. Our friction model is essentially an approximation of Coulomb's friction model.
Magnetic Interaction Model. Since the magnets are glued to the tile surface and are shielded, intra-tile magnetic interaction is negligible. For interfacing tiles, our design ensures that only the magnets on the nearest face will have any effect on a given bar magnet.
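The fan potential and its gradient can be sketched and checked numerically (an illustration; the sign convention, with the force pushing tiles away from the fan, is our assumption):

```python
import math

def fan_potential(rx, ry):
    # E_f = exp(-|r|^2), with r the vector from the fan to the tile
    return math.exp(-(rx*rx + ry*ry))

def fan_force(rx, ry):
    """Analytic F = -grad E_f = 2 r exp(-|r|^2): repulsive, decaying quickly."""
    e = fan_potential(rx, ry)
    return (2*rx*e, 2*ry*e)

def numeric_force(rx, ry, h=1e-6):
    """Central-difference check of the analytic gradient."""
    return (-(fan_potential(rx + h, ry) - fan_potential(rx - h, ry)) / (2*h),
            -(fan_potential(rx, ry + h) - fan_potential(rx, ry - h)) / (2*h))

fx, fy = fan_force(0.5, -0.25)
nx, ny = numeric_force(0.5, -0.25)
assert abs(fx - nx) < 1e-6 and abs(fy - ny) < 1e-6
```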
Magnetic Dipole Approximation: We approximate our bar magnets as magnetic dipoles. We do not have any source of electric current in our system, so Maxwell's equations for magnetostatics apply; specifically,

∇ · B = 0,    ∇ × B = (4π/c) J    (1)
where B is the flux density of the magnet and J is its electric current density. If we define B = ∇ × A, then for simple geometries we can perform Coulomb-like integrals for A and then a multi-pole expansion of it up to the dipole term, yielding the flux density B = (3(m · r̂)r̂ − m)/r^3 at a distance r due to a magnet with dipole moment m. Hence the force on a dipole in an external magnetic field is F = ∇(m · B). In particular, suppose we want to compute the force experienced by a magnet M1 on tile T1 due to the magnetic field B2 of a magnet M2 on tile T2 located at a distance r = x i + y j. Let the dipole moment of magnet M1 be m1 = mx1 i + my1 j and that of M2 be m2 = mx2 i + my2 j. Then,

B2 = (3(m2 · r̂)r̂ − m2)/r^3
   = (3(mx2 x^2 + my2 xy) i + 3(mx2 xy + my2 y^2) j)/(x^2 + y^2)^(5/2) − (mx2 i + my2 j)/(x^2 + y^2)^(3/2)    (2)
Consequently,

F = ∇(m1 · B2)
  = ∇[ 3((mx1 mx2 x^2 + mx1 my2 xy) + (my1 mx2 xy + my1 my2 y^2))/(x^2 + y^2)^(5/2) − (mx1 mx2 + my1 my2)/(x^2 + y^2)^(3/2) ]
  = [ −15x ((mx1 mx2 x^2 + mx1 my2 xy) + (my1 mx2 xy + my1 my2 y^2))/(x^2 + y^2)^(7/2) + (3(2 mx1 mx2 x + mx1 my2 y) + 3 my1 mx2 y)/(x^2 + y^2)^(5/2) + 3x (mx1 mx2 + my1 my2)/(x^2 + y^2)^(5/2) ] i
  + [ −15y ((mx1 mx2 x^2 + mx1 my2 xy) + (my1 mx2 xy + my1 my2 y^2))/(x^2 + y^2)^(7/2) + (3 mx1 my2 x + 3 my1 mx2 x + 6 my1 my2 y)/(x^2 + y^2)^(5/2) + 3y (mx1 mx2 + my1 my2)/(x^2 + y^2)^(5/2) ] j    (3)

We can compute the dipole moment of a bar magnet of length l and square cross-sectional area a = 4r^2 as follows. With long thin magnets, we can approximate the bar magnet by a cylindrical bar magnet, which can be further
approximated by a solenoid which is l units long, has N turns each of area πr^2 square units, and carries current i. The magnetic field at the end of the coil is

B0 = μ0 N i / (2 √(l^2 + r^2))    (4)

and, following the analogous calculation for an electric dipole, the magnetic dipole moment is

|M| = 2 B0 a l / √(l^2 + r^2)    (5)

and its direction is from the north pole to the south pole. In our case |M| is the same for all magnets and can be set to some pre-defined value.
FEMM Simulations: We used Finite Element Method Magnetics [36] to verify the effect of all the techniques described above. FEMM has a much greater capacity to represent the physical situation than the rigid-body simulation. The results in Fig. 4 clearly show that magnetic shielding is an effective technique for minimizing magnetic crosstalk.
Fig. 4. FEMM simulations showing the effect of magnetic shielding: the left tile has minimal magnetic crosstalk due to magnetic shielding on its dipoles, unlike the right tile, which has no shielding on its dipoles
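As a sanity check on Eqs. (2) and (3), the dipole field and force can be evaluated numerically. The sketch below (an illustration, not the paper's code) absorbs the constant $\mu_0/4\pi$ into the dipole moments and obtains the force by central-difference differentiation of $\vec{m}_1\cdot\vec{B}_2$, which can then be compared against the closed-form expression of Eq. (3):

```python
import math

def dipole_field(m2, r):
    # Eq. (2): B = (3 (m2 . rhat) rhat - m2) / |r|^3, point-dipole
    # approximation with the constant mu0/(4 pi) absorbed into the moments
    rn = math.hypot(r[0], r[1])
    rhat = (r[0] / rn, r[1] / rn)
    dot = m2[0] * rhat[0] + m2[1] * rhat[1]
    return tuple((3.0 * dot * rhat[i] - m2[i]) / rn ** 3 for i in range(2))

def dipole_force(m1, m2, r, h=1e-6):
    # Eq. (3): F = grad(m1 . B2), here evaluated by central differences
    # as a numerical cross-check of the closed-form expansion
    def U(x, y):
        Bx, By = dipole_field(m2, (x, y))
        return m1[0] * Bx + m1[1] * By
    x, y = r
    return ((U(x + h, y) - U(x - h, y)) / (2.0 * h),
            (U(x, y + h) - U(x, y - h)) / (2.0 * h))
```

Evaluating the closed form of Eq. (3) at the same point agrees with this gradient to finite-difference accuracy.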
Tile Motion Model. Once the individual forces have been calculated, we can model the tile motion. On the testbed, a tile's motion is described in terms of its two-dimensional linear acceleration $(\frac{d^2x}{dt^2}, \frac{d^2y}{dt^2})$ and one-dimensional angular acceleration $(\frac{d^2\theta}{dt^2})$:
$$\begin{pmatrix} \frac{d^2x}{dt^2} \\ \frac{d^2y}{dt^2} \\ \frac{d^2\theta}{dt^2} \end{pmatrix} = \begin{pmatrix} -\mu & 0 & 0 \\ 0 & -\mu & 0 \\ 0 & 0 & -\mu \end{pmatrix} \begin{pmatrix} \frac{dx}{dt} \\ \frac{dy}{dt} \\ \frac{d\theta}{dt} \end{pmatrix} + \vec{g} + \frac{\vec{F}_f(x,y)}{m} + \frac{\vec{F}_m(x,y)}{m} + \frac{\vec{\tau}(\vec{r}, \vec{F}_m)}{I} \qquad (6)$$
140
U. Majumder and J.H. Reif
where $\vec{g}$ is the acceleration due to gravity, $m$ is the mass of the tile, $I$ is the moment of inertia about the axis passing through its centroid perpendicular to the bed, $\mu$ is the coefficient of friction, $\vec{F}_m$ is the magnetic dipole force, and $\vec{\tau}$ is the torque exerted on the tile by the force $\vec{F}_m$ acting at the magnetic point $\vec{r}$ relative to the tile's center of mass. For simplicity, we apply the fan force to the center of the tile, making the torque due to the fan force equal to zero. Tiles also receive impulse forces and torques when they collide with each other or with the sides of the air-table. The force and torque imparted during these events conserve linear and angular momentum but not kinetic energy, since the collisions are partially inelastic.
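Equation (6) lends itself to a simple time-stepping scheme. The sketch below is an illustration of the motion model only, not the paper's simulator: the force and torque inputs are hypothetical placeholders (in the framework they would come from Eq. (3) and the fan model), and it advances one explicit-Euler step:

```python
def step_tile(state, forces, params, dt=1e-3):
    # One explicit-Euler step of the motion model in Eq. (6).
    # state  = (x, y, theta, vx, vy, omega)
    # forces = (Ffx, Ffy, Fmx, Fmy, tau): fan force, net magnetic force,
    #          and the torque of the magnetic force (hypothetical inputs)
    # params = (m, I, mu, gx, gy): mass, moment of inertia, friction
    #          coefficient, and in-plane gravity components (zero on a
    #          level air table)
    x, y, th, vx, vy, om = state
    Ffx, Ffy, Fmx, Fmy, tau = forces
    m, I, mu, gx, gy = params
    ax = -mu * vx + gx + (Ffx + Fmx) / m       # linear acceleration, x
    ay = -mu * vy + gy + (Ffy + Fmy) / m       # linear acceleration, y
    alpha = -mu * om + tau / I                 # angular acceleration
    return (x + vx * dt, y + vy * dt, th + om * dt,
            vx + ax * dt, vy + ay * dt, om + alpha * dt)
```

With all forces zero and $\mu > 0$, repeated steps show the expected frictional decay of the velocity.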
3.2 Preliminary Simulation Results
Our simulation uses the Open Dynamics Engine [37] library, which can compute the trajectories of all the tiles and determine the results of the collisions. The goal of our simulation is to discover the range of the magnetic force effective for tile binding in the absence of wind energy. Here, one tile has a fixed position and the other tile has an initial random position. We gave the second tile some initial velocity and computed the likelihood of a correct match, given the kinodynamic (position, orientation and velocity) constraints on this tile. Note that by providing the random initial velocity we are essentially simulating the exponentially decaying potential function of the wind energy source. It is important to note that in our simulation we call two tile faces connected if the corresponding matching dipoles are within some pre-defined threshold distance. Also, for estimating the likelihood of a match in any simulation, we declare the tiles connected only when they remain connected until the end of the simulation. Our air-bed is 2 m wide and 2 m long. The air-table has a very small coefficient of friction, specifically 0.0005. The dimension of a tile is 43 × 43 × 1.3 cm³ while that of each bar magnet is 1 × 1 × 0.3 cm³. This ensures a large tile-to-dipole size ratio. Each tile has a mass of 100 g. The frictional coefficient between two tile surfaces is assumed to be 0.3, while the coefficient of restitution for inter-tile collision is 0.01. An example simulation of a two-tile assembly is shown in Figure 5. It should be remembered that the emphasis of this paper is the design framework, and hence we present only preliminary experiments.
3.3 Interpretation of Simulation Data
Kinetic Model. The low-level simulation model based on Newtonian mechanics and Maxwellian magnetics serves as the basis for a higher level kinetic model based on on/off rates, very similar to chemical kinetics [38]. Chemical kinetics is useful for analyzing yields in large assemblies and understanding assembly dynamics without having to consider the innumerable free parameters in the low-level physical model. Although the number of tiles in our preliminary experimental setup is quite small and is not very suitable for deducing higher level model parameters, the goal here is to establish the feasibility of the process. Hence
Fig. 5. (Top to Bottom, Left first, then Right) A simulation snapshot of two self-assembling square magnetic tiles (decorated with four bar magnets on each face and without complementary shapes), based on the original simulator from Klavins et al. [17]
if we model tile attachment as a Poisson process, the waiting times between attachment events will be exponentially distributed with on-rate λon. We use the simulation data with Monte Carlo integration to estimate λon. Similarly, the off-rate can be determined using the data on the time intervals between when tiles attach and when they disconnect. Figure 6 gives the probability distribution of a correct match in an assembly when the relative orientation of the two tiles is in (−π/2, π/2), the relative velocity is in (0, 1.3 m/s) (based on tile mass and dimensions), and the relative distance is in (0, 2.82 m) (based on the dimensions of the air-table). Unfortunately, there is no reality check on this probability distribution since we have not performed any physical experiments. Consequently, this discussion is meant to present the feasibility of such an interpretation and its related benefits. As part of future work, we intend to perform an actual validation with real data.
Scaling of the Simulation System. We consider two types of scaling. In the first interpretation, we consider the relationship between the yield of an assembly and the number of component tiles in the system. Intuitively, if the system is too crowded, random collisions are not possible. However, if the system is well mixed such that random collisions are possible, then the yield of an assembly is directly proportional to the number of component magnetic tiles. We discuss yield optimization further in Section 3.3. Another interpretation of scale is the length scale. A major limitation to down-scaling our system is the rapid increase in the magnitude of the interactions between magnetic dipoles with decreasing particle size [39]. The dipole-dipole forces are attractive and scale as d⁻⁴, where d is the distance between their centers. In particular, at the nanometer scale there is a critical
Fig. 6. Probability distribution for assembly of two tile faces for different initial positions of the moving tile (from simulation data of a two-tile system)
length beyond which coercivity almost vanishes and the material becomes superparamagnetic [40].
Yield Optimization. Since our low-level physical model leads to a model similar to a chemical kinetics model, it is possible to extract the steady-state component distribution and hence use this information to design better tiles. In particular, if we interpret the system as a Markov process then we can use the Master Equation [38] to obtain the time evolution of the probability that the system adopts one of the exponentially many configurations. We can derive the average behavior of the system using Kolmogorov's Forward Equation [41] and thus compute the expected number of tiles of each type in the steady state. Based on the Markov chain interpretation it is also possible to construct a linear program to obtain the probabilities that would maximize the yield subject to the rate constraints, as was done by Klavins et al. [42] for active magnetic assembly. However, our system is essentially passive; hence the best we can do is to use these values to make small changes in the parameter space, altering the effective on and off rates, and so make incremental improvements to our yield.
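As a minimal illustration of the kinetic-model idea, the sketch below reduces the system to a toy two-state master equation for a single tile pair (bound/unbound). The rate estimators are simple exponential maximum-likelihood fits to attachment and detachment intervals, a hypothetical stand-in for the Monte Carlo integration procedure described above:

```python
def estimate_rates(attach_intervals, detach_intervals):
    # Exponential waiting times (the Poisson-process assumption):
    # the maximum-likelihood rate estimate is 1 / (mean interval)
    lam_on = len(attach_intervals) / sum(attach_intervals)
    lam_off = len(detach_intervals) / sum(detach_intervals)
    return lam_on, lam_off

def steady_state_bound(lam_on, lam_off):
    # Two-state master equation for one tile pair:
    #   dp_b/dt = lam_on * (1 - p_b) - lam_off * p_b
    # Setting dp_b/dt = 0 gives the steady-state bound probability.
    return lam_on / (lam_on + lam_off)
```

The steady-state expression is the two-state special case of the expected component distribution discussed above; the real system has exponentially many configurations.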
4 Future Directions
One of the immediate goals is to extend the simulation model to a multi-tile system with fans. However, the significance of demonstrating an actual magnetic assembly cannot be overstated. Hence, one possible future direction is the actual demonstration of the assembly, followed by a comparison of the experimental and simulation results, particularly the yield and the size of the assembly. Another possible direction is to study the potential of a magnetic self-assembling system in three dimensions; the situation becomes more complicated in 3D due to the increase in the degrees of freedom. We would also like to study our encoding technique in a more general manner so that it can be applied to any macro- and micro-scale self-assembling system. For instance, one possible direction is the study of complex self-assembly using the capillary interaction of MEMS tiles patterned with wetting codes. Nonetheless, as an enabling technique, our hope is that this assembly approach will be applicable to generic tiles for the generation of arbitrarily complex macro-scale systems.
References 1. Bowden, N.B., Terfort, A., Carbeck, J., Whitesides, G.: Science 276, 233–235 (1997) 2. Bowden, N.B., Oliver, S., Whitesides, G.: Journal of Phys. Chem. 104, 2714–2724 (2000) 3. Jackman, R., Brittain, S.T., Adams, A., Prentiss, M., Whitesides, G.: Science 280, 2089–2091 (1998) 4. Clark, T.D., Tien, J., Duffy, D.C., Paul, K., Whitesides, G.: J. Am Chem. Soc. 123, 7677–7682 (2001) 5. Oliver, S.R.J., Bowden, N.B., Whitesides, G.M.: J. Colloid Interface Sci. 224, 425– 428 (2000)
6. Bowden, N., Choi, I.S., Grzybowski, B., Whitesides, G.M.: J. Am. Chem. 121, 5373–5391 (1999) 7. Grzybowski, B., Bowden, N., Arias, F., Yang, H., Whitesides, G.: J. Phys. Chem. 105, 404–412 (2001) 8. Syms, R.R.A., Yeatman, E.M.: Electronics Lett. 29, 662–664 9. Harsh, K.F., Bright, V.M., Lee, Y.C.: Sens Actuators A 77, 237–244 (1999) 10. Yeh, H.J.J., Smith, J.S.: IEEE Photon Technol. Lett. 6, 706–708 (1994) 11. Srinivasan, U., Liepmann, D., Howe, R.T.: J. Microelectromech. Syst. 10, 17–24 (2001) 12. Rothemund, P.W.K.: Using lateral capillary forces to compute by self-assembly. PNAS (2000) 13. Gryzbowski, B., Whitesides, G.M.: Nature 405, 1033–1036 (2000) 14. Gryzbowski, B., Whitesides, G.M.: Science 296, 718–721 (2002) 15. Grzybowski, B., Jiang, X., Stone, H.A., Whitesides, G.M.: Phy. Rev. E 64(111603), 1–12 (2001) 16. Boncheva, M., Andreev, S.A., Mahadevan, L., Winkleman, A., Reichman, D.R., Prentiss, M.G., Whitesides, S., Whitesides, G.: PNAS 102, 3924–3929 (2005) 17. Bishop, J., Burden, S., Klavins, E., Kreisberg, R., Malone, W., Napp, N., Nguyen, T.: Self-organizing programmable parts. In: Intl. Conf. on Intelligent Robots and Systems (2005) 18. Berger, R.: The undecidability of the domino problem. Memoirs of the American Mathematical Society 66 (1966) 19. Seeman, N.C.: Nucleic acid junctions and lattices. Journal of Theor. Biology 99, 237–247 (1982) 20. Adleman, L.M.: Science 266, 1021–1024 (1994) 21. Winfree, E.: DNA Based Computers, pp. 199–221 (1996) 22. Rothemund, P.W.K., Papadakis, N., Winfree, E.: PLoS Biology 2(12) (December, 2004) 23. Whitesides, G., Boncheva, M.: PNAS 99, 4769–4774 (2002) 24. Winfree, E.: Algorithmic Self-Assembly of DNA. PhD thesis, California Institute of Technology (1998) 25. Winfree, E.: Simulation of computing by self-assembly. Technical Report 1998.22, Caltech (1998) 26. Rothemund, P., Winfree, E.: STOC, pp. 459–468. ACM Press, New York (2000) 27. Adleman, L., Cheng, Q., Goel, A., Huang, M.: STOC, pp. 740–748. 
ACM Press, New York (2001) 28. Kao, M., Schweller, R.: SODA. ACM Press, New York (2006) 29. Soloveichik, D., Winfree, E.: DNA Based Computers 10. LNCS (2005) 30. Tulpan, D., Hoos, H., Xiang, Y., Chaib-draa, B. (eds.) Canadian AI 2003. LNCS (LNAI), vol. 2671, pp. 418–433. Springer, Heidelberg (2003) 31. Tulpan, D., Andronescu, M., Chang, S.B., Shortreed, M.R., Condon, A., Hoos, H., Smith, L.M.: NAR 33(15), 4951–4964 (2005) 32. Mallinson, J.C., Shute, H., Wilton, D.: One-sided fluxes in planar, cylindrical and spherical magnetized structures. IEEE Transactions on Magnetics 36(2) (March 2000) 33. Fang, J., Liang, S., Wang, K., Xiong, X., Bohringer, K.: Self-assembly of flat micro components by capillary forces and shape recognition. In: FNANO (2005) 34. Kavraki, L., Svetska, P., Latombe, J., Overmars, M.: IEEE Trans. Rob. Autom. 12(4), 566–580 (1996) 35. Apaydin, M., Singh, A., Brutlag, D., Latombe, J.: In: ICRA (2001)
36. FEMM: Finite Element Method Magnetics, http://femm.foster-miller.net/wiki/HomePage 37. Open Dynamics Engine, http://ode.org 38. Gillespie, D.: J. Phys. Chem. 81, 2340–2361 (1977) 39. Gryzbowski, B., Whitesides, G.: J. Phys. Chem. 106, 1188–1194 (2002) 40. Hu, R.L., Soh, A., Ni, Y.: J. Phys. D: Appl. Phys. 39, 1987–1992 (2006) 41. Strook, D.: An Introduction to Markov Processes. Springer, Heidelberg (2005) 42. Klavins, E., Burden, S., Napp, N.: Optimal rules for programmed stochastic self-assembly. In: RRS (2006)
The Role of Conceptual Structure in Designing Cellular Automata to Perform Collective Computation

Manuel Marques-Pita 1,2,3, Melanie Mitchell 3, and Luis M. Rocha 1,2

1 Indiana University
2 Instituto Gulbenkian de Ciência
3 Portland State University
Abstract. The notion of conceptual structure in CA rules that perform the density classification task (DCT) was introduced by [1]. Here we investigate the role of process-symmetry in CAs that solve the DCT, in particular the idea of conceptual similarity, which defines a novel search space for CA rules. We report on two new process-symmetric one-dimensional rules for the DCT which have the highest "balanced" performance observed to date on this task, as well as the highest-performing CA known to perform the DCT in two dimensions. Finally, we investigate the more general problem of assessing how different learning strategies (based on evolution and coevolution, with and without spatial distribution), previously compared by [2], are suited to exploit conceptual structure in learning CAs to perform collective computation.
1 Introduction
The study of computation in cellular automata (CAs) and related cellular architectures has lately garnered renewed interest due to advances in the related fields of reconfigurable hardware, sensor networks, and molecular-scale computing systems. In particular, cellular array architectures are thought to be appropriate for constructing physical devices such as field-configurable gate arrays for electronics, networks of robots for environmental sensing, and nano-devices embedded in interconnect fabric used for fault-tolerant nanoscale computing [3]. A current stumbling block for CA computing is the difficulty of programming CAs to perform desired computations, due to the decentralized architectures and nonlinear behavior of these systems. One approach is to use genetic algorithms or other evolutionary computation methods to evolve cellular automata transition rules that will perform desired computations. However, this approach has problems of scaling, due to the large search spaces for non-elementary CAs—those with larger than nearest-neighbor cell communication or with multiple states per cell. In this paper we describe our investigation of reducing the dimensionality of these search spaces by using automatically-discovered conceptual structures of rule tables that are common to CAs likely to be successful for a particular computational task. We show that for one well-studied task—two-state density

C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, pp. 146–163, 2008.
© Springer-Verlag Berlin Heidelberg 2008
classification—a particular conceptual structure of CA rule tables that we call degree of process symmetry is correlated with success on the task, and is implicitly increased by genetic algorithms evolving CAs for this task. We also show that process symmetry provides a search space of significantly reduced dimensionality, in which a genetic algorithm can more easily discover high-performing one- and two-dimensional CAs for this task.
2 Cellular Automata
A cellular automaton (CA) consists of a regular lattice of N cells. Each cell is in one of k allowed states at a given time t. Let ω ∈ {0, 1, ..., k − 1} denote a possible state of a cell. Let state ω = 0 be referred to as the quiescent state, and any other state as an active state. Each cell is connected to a number of neighbors. Let a local neighborhood configuration (LNC) be denoted by μ, and its size by n. For each LNC in a CA an output state is assigned to each cell. This defines a CA rule string, φ, the size of which is k^n. In binary CAs, in which only two states are allowed (k = 2), it is possible to classify individual cell state-updates into three categories: (1) preservations, where a cell does not change its state at the next time step t + 1; (2) generations, state-updates in which the cell goes from the quiescent to the active state; and (3) annihilations, state-updates where the cell goes from the active to the quiescent state. The execution of a CA for a number M of discrete time steps, starting with a given initial configuration (IC) of states, is represented as the set Θ containing M + 1 lattice state configurations.
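The classification of state-updates above can be made concrete. The sketch below (illustrative only; it uses a simple local-majority rule with n = 3 as a stand-in, not one of the DCT rules discussed later) sorts a binary CA's look-up table into preservations, generations, and annihilations:

```python
from itertools import product

def classify_updates(phi, n):
    # phi maps each local neighborhood configuration (LNC) to an output
    # state; for a binary CA (k = 2) the rule string has length k^n = 2^n
    kinds = {"preservation": [], "generation": [], "annihilation": []}
    for lnc in product((0, 1), repeat=n):
        out, center = phi[lnc], lnc[n // 2]
        if out == center:
            kinds["preservation"].append(lnc)
        elif center == 0:   # quiescent -> active
            kinds["generation"].append(lnc)
        else:               # active -> quiescent
            kinds["annihilation"].append(lnc)
    return kinds

# local-majority rule on a size-3 neighborhood, used only as an example
majority = {lnc: int(sum(lnc) >= 2) for lnc in product((0, 1), repeat=3)}
```

For this majority rule, six of the eight look-up-table entries are preservations, with one generation ({1,0,1} → 1) and one annihilation ({0,1,0} → 0).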
2.1 The Density Classification Task
The Density Classification Task (DCT) is a widely cited example of collective computation in cellular automata. The goal is to find a one-dimensional binary CA rule (with periodic boundary conditions) that can classify the majority state in a given, random IC (with an odd number of cells). If the majority of cells in the IC are in the quiescent state, after a number of time steps M the lattice should converge to a homogeneous state where every cell is in the quiescent state, with analogous behavior for an IC with a majority of active cells. Devising CA rules that perform this task is not trivial, because cells in a CA lattice update their states based only on local neighborhood information. However, in this particular task, it is required that information be transferred across time and space in order to achieve a correct global classification. The definition of the DCT used in our studies is the same as the one given by [4]. We define the performance P_N^K(φ) as the fraction of K initial configurations on an N-cell lattice that produce correct classifications (all quiescent for a majority of quiescent states in the IC; all active for a majority of active states in the IC). Nine of the cellular automata rules with highest performance on the DCT were analyzed to determine whether there is conceptual structure not explicit in them, and if so, to investigate the possible conceptual similarity among them using a cognitively inspired mechanism (Aitana) [1]. Three of these rules were produced
by human engineering: φGKL [5,6], φDavis95 and φDas95 [7]; three were learned with genetic algorithms (φDMC [8]) or coevolution methods (φCOE1 and φCOE2 [9]). Finally, three of the rules were learned with genetic programming or gene expression programming: φGP1995 [7], φGEP1 and φGEP2 [10]. The next section summarizes the basics of Aitana's architecture and the conceptual properties found in the studied CAs that perform the DCT.
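The performance measure P_N^K(φ) can be estimated with a small harness like the one below. This is an illustrative sketch: the rule shown is plain local majority, which is not a DCT solver, and the lattice size, IC count, and step limit are toy values, far smaller than the N = 149, K = 10^5 used in the text:

```python
import random
from itertools import product

def step(config, phi, radius):
    # one synchronous update with periodic boundary conditions
    N = len(config)
    return [phi[tuple(config[(i + d) % N] for d in range(-radius, radius + 1))]
            for i in range(N)]

def dct_performance(phi, radius, N=21, K=200, M=100, seed=1):
    # estimate P_N^K(phi): fraction of K random ICs (odd N) classified
    # correctly within M time steps
    rng = random.Random(seed)
    correct = 0
    for _ in range(K):
        config = [rng.randint(0, 1) for _ in range(N)]
        majority = int(sum(config) > N // 2)
        for _ in range(M):
            nxt = step(config, phi, radius)
            if nxt == config:       # fixed point reached
                break
            config = nxt
        correct += all(c == majority for c in config)
    return correct / K

# radius-1 local-majority rule, used only to exercise the harness
majority_rule = {lnc: int(sum(lnc) >= 2)
                 for lnc in product((0, 1), repeat=3)}
```

Homogeneous lattices are fixed points of the majority rule, and the estimated performance is a fraction in [0, 1].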
3 CA Schemata Redescription
Aitana is largely based on an explanatory framework for cognitive development in humans known as the Representational Redescription Model developed by [11], and the Conceptual Spaces framework proposed by [12]. There are a number of (recurrent) phases in Aitana's algorithm: (1) Behavioral Mastery, during which CAs that perform some specific collective computation are learned using, for example, genetic algorithms or coevolution. The learned rules are assumed to be in a representational format we call implicit (conceptual structure is not explicit). (2) Representational Redescription Phase I takes as input the implicit representations (CA look-up tables) and attempts to compress them into explicit-1 (E1) schemata by exploiting structure within the input rules. (3) Phase II and beyond look for ways to further compress E1 representations, for example by looking at how groups of cells change together, and how more complex schemata are capable of generating regular patterns in the dynamics of the CA. The focus in this paper is on Phase I redescription. E1 representations in Aitana are produced by different modules. In particular, two modules were explored by [13]: the density and wildcard modules. Modules in Aitana can be equated to representational transducers, where each module takes implicit CA rules and outputs a set of E1 schemata that redescribe them. The nine high-performing CA rules we report on here were redescribed with the wildcard module, introduced in the next section.
3.1 The Wildcard Module
This module uses regularities in the set of entries in a CA's look-up table in order to produce E1 representations captured by wildcard schemata. These schemata are defined in the same way as the look-up table entries for each LNC of a CA rule, but allowing an extra symbol to replace the state of one or more cells within them. This new symbol is denoted by "#". When it appears in an E1 schema it means that, in the place where it appears, any of the possible k states is accepted for state update. The idea of using wildcards in representational structures was first proposed by [14], when introducing Classifier Systems. Wildcard schemata can be general or process-specific. The first variation allows wildcards to appear in the position of the updating cell in any schema. Process-specific schemata do not allow this, therefore making it possible for them to describe processes in the CA rule unambiguously. For example, given a one-dimensional, binary CA with local neighborhoods of length 7, a generation, process-specific, wildcard schema
{#, #, #, 0, 1, #, 1} → 1 prescribes that a cell in state 0, with immediate-right and end-right neighbors in state 1, updates its state to 1 regardless of the states of the other neighbors. The implementation of the wildcard module in Aitana consists of a simple McCulloch-Pitts neural network that is instantiated distinctly for each combination of values for neighborhood size n and number of allowed states k of an input CA rule. In this assimilation network, input units represent each look-up table entry (one for each LNC), and output units represent all the schemata available to redescribe segments of the input rule (see [13]).
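Wildcard schemata have a direct computational reading. The sketch below (illustrative; not Aitana's neural-network implementation) tests whether a schema matches an LNC and enumerates the look-up-table entries a schema redescribes. For instance, the generation schema above, having four wildcards, covers 2^4 = 16 entries:

```python
from itertools import product

def matches(schema, lnc):
    # '#' accepts either state; every other position must agree exactly
    return all(s == '#' or s == c for s, c in zip(schema, lnc))

def covered(schema):
    # the look-up-table entries (LNCs) that the wildcard schema redescribes
    return [lnc for lnc in product('01', repeat=len(schema))
            if matches(schema, lnc)]
```

Here a schema is a tuple over {'0', '1', '#'} and an LNC a tuple over {'0', '1'}, mirroring the notation in the text.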
3.2 Assimilation and Accommodation
Phase I redescription in Aitana depends on two interrelated mechanisms: assimilation and accommodation.¹ During Phase I, the units in the input layer of an assimilation network will be activated according to the output states in the CA rule to be processed. The firing of these units will spread, thus activating other units across the network. When some unit in the network (representing an E1 schema) has excitatory input above a threshold, it fires. This firing signals that the schema represented by the unit becomes an E1 redescription of the lower-level units that caused its activation. When this happens, inhibitory signals are sent back to those lower-level units so that they stop firing (since they have been redescribed). At the end of assimilation, the units that remain firing represent the set of wildcard schemata redescribing the input CA rule. Once the process of assimilation has been completed, Aitana will try to force the assimilation of any (wildcard-free) look-up table entry that was not redescribed, i.e., any input unit that is still firing. This corresponds to the accommodation process implemented in Aitana [13].
4 Conceptual Structure
One of the main findings reported in [1] is that most rules that perform the density classification task are process-symmetric. A binary CA rule is defined as process-symmetric if a particular bijective mapping (defined below) maps each schema representing a generation into exactly one of the schemata representing an annihilation, and vice versa. The bijective function transforms a schema s into its corresponding process-symmetric schema s′ by (1) reversing the elements in s using a mirror function M(s), and (2) exchanging ones for zeros and zeros for ones (leaving wildcards untouched) using a negation function N(s). Thus, in every process-symmetric CA rule, given the set S = {s1, s2, ..., sz} of all schemata si prescribing a state-change process, the elements of the set of schemata S′ = {s′1, s′2, ..., s′z} prescribing the converse process can be found by applying the bijective mapping between processes defined by the composition s′i = (M ◦ N)(si).
¹ These two processes are inspired by those defined by Piaget in his theory of Constructivism [15,16].
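The bijection (M ◦ N) is easy to implement directly. The sketch below (illustrative) computes the process-symmetric counterpart of a schema and checks whether a pair of generation/annihilation schema sets is process-symmetric in the sense defined above:

```python
def process_symmetric(schema):
    # s' = (M o N)(s): negate states 0 <-> 1 (wildcards untouched) and
    # mirror; the two elementwise operations commute, so the order of
    # reversal and negation does not matter
    flip = {'0': '1', '1': '0', '#': '#'}
    return tuple(flip[c] for c in reversed(schema))

def is_process_symmetric(generations, annihilations):
    # every generation schema must map onto an annihilation schema and
    # vice versa under the bijection
    return {process_symmetric(s) for s in generations} == set(annihilations)
```

Since the mapping is an involution, applying it twice returns the original schema.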
Six out of the nine rules analyzed by [1] were found to be process-symmetric. The remaining three, φCOE1, φCOE2 and φDMC, are not. It is interesting to note that the latter three CA rules were discovered via evolutionary algorithms (GAs and coevolutionary search) which apply variation to genetic encodings of the look-up tables of CAs. Therefore, genotype variation in these evolutionary algorithms operates at the low level of the bits of the look-up table—what we referred to as the implicit representation of a CA. In contrast, the search (Genetic Programming and Gene Expression Programming) and human design processes that led to the other six (process-symmetric) rules, while not looking explicitly for process symmetry, were based on mechanisms and reasoning trading in the higher-level behavior and structure of the CA—what we refer to as the explicit representation of a CA.² The same research also determined that it is possible to define conceptual similarity between the process-symmetric CA rules for the DCT. For example, the rule φGP1995 can be derived from φGKL [1]. Moreover, the best process-symmetric rule known for this task (at the time) was found via conceptual transformations: φMM401,³ with performance P_149^{10^5} ≈ 0.83. However, this is still below the performance of the highest-performance rule so far discovered for the DCT, namely φCOE2, with P_149^{10^5} ≈ 0.86.
5 The 4-Wildcard Space
Starting with the conceptual similarities observed between φGKL and φGP1995, we studied the "conceptual space" in which these two CA rules can be found: the space of process-symmetric binary CA rules with neighborhood size n = 7, where all state-change schemata have four wildcards. A form of evolutionary search was used to evaluate rules in this space as follows: the search starts with a population of sixty-four different process-symmetric rules containing only 4-wildcard schemata; the generation and annihilation schema sets for an individual were allowed to have any number of schemata in the range between two and eight; crossover operators were not defined; a mutation operator was set, allowing the removal or addition of up to two randomly chosen 4-wildcard schemata (repetitions not allowed), as long as a minimum of two schemata is kept in each schema set; in every generation the fitness of each member of the population is evaluated against 10^4 ICs, keeping the top 25% of rules (the elite) for the next generation without modification; offspring are generated by choosing a random member of the elite and applying the mutation operator until the population is filled with distinct CA rules; a run consisted of 500 generations, and the search was executed for 8 runs.

² When we refer to implicit and explicit representations of CA rules, we are preserving the terminology of the Representational Redescription Model (§3), the basis of the cognitively-inspired Aitana. We do not mean to imply that state-transition rules of CA are implicit, but rather that it is not clear from these rules what conceptual properties they embody.
³ In inverse lexicographical (hex) notation, φMM401 is ffaaffa8ffaaffa8f0aa00a800aa00a8.
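The elitist, mutation-only search described above can be sketched generically. This is an illustration under toy assumptions: the `fitness` and `mutate` arguments stand in for DCT performance over 10^4 ICs and for the add/remove-schemata operator, neither of which is reproduced here:

```python
import random

def evolve(population, fitness, mutate, elite_frac=0.25, generations=500,
           rng=None):
    # Elitist, mutation-only evolutionary search as described in the text:
    # the top 25% survive unchanged; offspring are mutated copies of
    # random elite members (no crossover), kept distinct from the rest
    # of the current population.
    rng = rng or random.Random(0)
    pop, size = list(population), len(population)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: max(1, int(size * elite_frac))]
        pop = list(elite)
        while len(pop) < size:
            child = mutate(rng.choice(elite), rng)
            if child not in pop:        # repetitions not allowed
                pop.append(child)
    return max(pop, key=fitness)
```

Because the elite survives every generation, the best fitness in the population never decreases.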
There are 60 possible 4-wildcard process-symmetric schema pairs. Thus, our search space contains approximately 3 × 10^9 rules defined by generation and annihilation schema sets of size between 2 and 8. As reported in [17], our search found one rule with higher performance than φMM401. This rule, φMM0711,⁴ has P_149^{10^5} ≈ 0.8428. Even though this search resulted in an improvement, the performance gap between the best process-symmetric rule, φMM0711, and φCOE2 is still close to 2%. Is it possible, then, that a process-symmetric rule exists "hidden" in the conceptually "messy" φCOE2?
6 Process-Symmetry in φCOE2
Figure 1 shows the state-change schema sets for φCOE2. The performance of this rule is P_149^{10^5} ≈ 0.86. We generated random ICs (binomial distribution with ρ = 0.5), where each IC was put in one of two sets, with membership depending on whether the IC has a majority of 0's or 1's. This was done until each set contained 10^5 ICs. Then the DCT performance measure was calculated for the two sets of ICs. These were, respectively, P_149^{10^5}(majority-0 ICs) ≈ 0.83 and P_149^{10^5}(majority-1 ICs) ≈ 0.89. Even though on average this is the best-performing CA rule for the DCT, its performance is noticeably higher on the majority-1 set of ICs. We claim that this divergence in behavior is due to the fact that φCOE2 is not process-symmetric. Evaluation of split performance on the ten known highest-performing rules for the DCT supports this hypothesis (see Table 1). The difference between the split performance measures for the non-process-symmetric rules is one or two orders of magnitude larger than for the process-symmetric rules. This indicates that process symmetry seems to lead to more balanced rules—those that respond equally well to both types of problem.
Rule φCOE2

Generation:
g1 {1, 0, 1, 0, #, #, #}    g2 {1, 0, #, 0, #, 1, 1}    g3 {1, 1, #, 0, 1, #, #}
g4 {1, #, 1, 0, 1, #, #}    g5 {1, #, 1, 0, #, 0, #}    g6 {1, #, #, 0, 1, 1, #}
g7 {1, #, #, 0, 1, #, 1}    g8 {#, 0, 0, 0, 1, 0, 1}    g9 {#, 0, 1, 0, 0, 1, #}
g10 {#, 0, #, 0, 0, 1, 1}   g11 {#, 1, 1, 0, 1, #, 0}   g12 {#, 1, 1, 0, #, 0, #}

Annihilation:
a1 {0, 0, 1, 1, 1, 1, #}    a2 {0, 0, #, 1, #, 1, 0}    a3 {0, 1, 0, 1, 1, #, #}
a4 {0, #, 0, 1, #, #, 0}    a5 {1, 0, 0, 1, #, 0, #}    a6 {#, 0, 0, 1, #, #, 0}
a7 {#, #, 0, 1, 1, 0, #}    a8 {#, #, 0, 1, #, 0, 0}    a9 {#, #, #, 1, 0, #, 0}
Fig. 1. E1 schemata prescribing state changes for φCOE2. This is the highest-performing rule for the DCT found to date, and it does not show clear process symmetry.
⁴ φMM0711 is faffba88faffbaf8fa00ba880a000a88.
Table 1. Split performances of the ten best DCT rules. The first column shows performance for ICs in which there is a majority of 0s; the second shows performance when ICs have a majority of 1s; the third shows the difference between the two performances. Darker rows correspond to process-symmetric rules; white rows refer to non-process-symmetric rules.
P 149
M0
105
P 149
M1
P. DIFF.
GKL
0.8135
0.8143
0.0008
Davis95
0.8170
0.8183
0.0013
Das95
0.8214
0.8210
0.0004
GP1995
0.8223
0.8245
0.0022
DMC
0.8439
0.7024
0.1415
COE1
0.8283
0.8742
0.0459
COE2
0.8337
0.888
0.0543
GEP1
0.8162
0.8173
0.0011
GEP2
0.8201
0.8242
0.0041
MM0711
0.8428
0.8429
0.0001
A relevant question at this point concerns the existence of a process-symmetric rule in the conceptual vicinity of φCOE2 whose performance is as good as (or higher than) that of the original φCOE2. There are two ways in which it is possible to think about conceptual vicinities, where new neighboring rules are produced by different accommodation mechanisms. One approach is to work only with schemata that are in the original set describing the analyzed rule. In this context, it is possible to produce new rules by deleting schemata (e.g., deleting a schema from a generation set whose process-symmetric counterpart is not in the original annihilation set), or by adding process-symmetric schemata to a set, provided their process-symmetric counterparts are present in the original rule. We will refer to this as the "naive" approach to accommodation. Note that accommodation here has the goal of generating process-symmetric rules, instead of ensuring full assimilation as described in §3.2. A second approach is to work with manipulations on the LNC (implicit) representational level, followed by a necessary re-assimilation of the manipulated rule. This type of accommodation will produce new sets of schemata that replace (fully or partially) the ones in the original rule, due to the fact that the LNCs in the rule were manipulated. This approach will be referred to as the "reconceptualization" approach to accommodation. When working with rules such as φCOE2, which were evolved by learning mechanisms that are unaware of process symmetry, the first approach just described is "naive". It is so in the sense that it is likely that evolution produced pairs of schemata (for generation and annihilation) which are only partially process-symmetric; they may contain key process-symmetric LNC pairs that are necessary to perform the computation. Thus, simply adding and deleting
The Role of Conceptual Structure in Designing Cellular Automata
153
schemata to attain process symmetry may miss subtle interactions present on the implicit representational level of a CA rule. This makes the naive approach too “coarse” for dealing with CA rules evolved with learning strategies that do not take process symmetry into account. To answer the question about the possible existence of a process-symmetric rule in the conceptual vicinity of φCOE2, we performed a number of tests. First, using the naive approach, we looked at the CA rule resulting from keeping all annihilations in φCOE2, and using only their process-symmetric generations. The performance of that rule was P^105_149 ≈ 0.73. A second test was the reverse of the first: keeping all generations of φCOE2, and using only their process-symmetric annihilations. The resulting rule has a performance P^105_149 ≈ 0.47—a large decrease in performance.
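The process-symmetric counterpart of an LNC is its left-right reversal with all cell states complemented, which is consistent with the generation/annihilation pairs listed in Figure 3 (B). A minimal Python sketch (ours, not from the paper):

```python
def ps_counterpart(lnc):
    """Process-symmetric counterpart of a local neighborhood configuration
    (LNC): reverse the spatial order and complement every cell state."""
    return [1 - s for s in reversed(lnc)]

# The first generation/annihilation pair from Figure 3 (B):
gen = [0, 1, 1, 0, 1, 0, 1]            # radius-3 neighborhood, center at index 3
ann = ps_counterpart(gen)              # -> [0, 1, 0, 1, 0, 0, 1]
assert ann == [0, 1, 0, 1, 0, 0, 1]
assert ann[3] == 1 - gen[3]            # the center cell changes in the opposite direction
assert ps_counterpart(ann) == gen      # the mapping is an involution
```

Note that a generation (center 0 → 1) always maps to an annihilation (center 1 → 0) and vice versa, which is why the matrix A pairs the two kinds of rows.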
Fig. 2. Representation of the matrix A used to determine the degree of process symmetry for a CA rule (here φCOE2). Of the 128 possible LNCs, only the first and last four, plus the middle eighteen, are shown. Colored matrix elements in the first nine rows correspond to annihilation LNCs (labeled at the top). Analogously, the darker elements in the bottom twelve rows correspond to generation LNCs (labeled at the bottom). The curved connecting lines represent the ordering of the columns as process-symmetric pairs; the vertical dotted line represents an annihilation LNC that is not process-symmetric; and the horizontal dotted line represents that the annihilation LNCs in that row are part of schema a9.
154
M. Marques-Pita, M. Mitchell, and L.M. Rocha
In order to interpret the results of these first two tests, it would be necessary to study how different schemata and LNCs interact when they form coherent time-space patterns. The set of annihilations in φCOE2 seems to contribute more to the overall collective computation than the set of original generations, since this set of annihilation schemata by itself, working with its corresponding process-symmetric generation set, results in a CA with significantly higher performance than in the other case (second test). Nonetheless, the naive (coarser) approach to accommodation did not “uncover” a process-symmetric rule near φCOE2 that keeps (or improves) the original average performance. For the next test, we used the “finer” approach to accommodation in order to explore the conceptual vicinity of φCOE2, plus some additional constraints (explained later). First of all, we looked at the degree of process symmetry already existing in φCOE2. To find this we used the matrix-form representation of φCOE2 illustrated in Figure 2. Each column corresponds to one of the 128 LNCs for a one-dimensional binary CA rule with neighborhood radius three. These LNCs are not arranged in lexicographical order; instead they are arranged as process-symmetric pairs: the first and last LNCs (columns) are process-symmetric, the second and next-to-last are also process-symmetric, and so on, until the two LNCs in the center, which are also process-symmetric. Each row corresponds to one of the E1 (wildcard) state-changing schemata for φCOE2. The first nine rows correspond to the annihilation schemata, and the subsequent twelve rows to the generation schemata for φCOE2. In any of the first nine rows, a shaded cell represents two things: (1) that the LNC in that column is an annihilation; and (2) that the LNC is part of the E1 schema labeled in the row where it appears. The twelve rows for generation schemata are reversed in the figure.
This makes it simple to inspect visually which process-symmetric LNCs are present in the rule: this is the case when, for a given column, there is at least one cell shaded in one of the first nine rows (an active annihilation, light gray), and at least one cell shaded in one of the bottom twelve rows (an active generation, dark gray). We will refer to the schemata × LNC matrix representation in Figure 2 as A. As just described, given the ordering of elements in the columns of Figure 2, if a generation row is isolated and then reversed, the result can be matched against any of the annihilation rows to calculate the total degree of process symmetry between the two schemata represented in the two rows. A total match means that the original generation schema is process-symmetric with the matched annihilation schema. A partial match indicates a degree of process symmetry. This partial match can be used by Aitana’s accommodation mechanism to force a highly process-symmetric pair into a fully process-symmetric one, keeping the modified representation only if there is no loss of performance. More concretely, the degree of process symmetry existing between two schemata Sg and Sa prescribing opposite processes (a generation schema and an annihilation schema, respectively) is calculated as follows: 1. Pick rows Sg and Sa from matrix A (Sg corresponds to a generation and Sa to an annihilation).
2. Reverse one of the rows (e.g. Sa). This makes it possible to compare each LNC (the columns) with its process-symmetric pair, by looking at the ith element of each of the two row vectors. 3. Calculate the degree of process symmetry as:

(2 × Sg · Sa) / (|Sg| + |Sa|)

where, for binary vectors, Sg · Sa is the number of component matches (i.e. the count of all the ith components that are one in both vectors); and |S| is the number of ones in a binary vector.5 All the generation rows were matched against all the annihilation rows in matrix A, recording the proportion of matches found. Figure 3 (A) shows the results of this matching procedure (only highest matches shown). The darker rows correspond to schema pairs that are fully process-symmetric. The first three light gray rows (with matching score 66%) show an interesting, almost complete process-symmetry subset, involving generation schemata g1, g4 and g5, and annihilation schema a9. Using the accommodation mechanism in Aitana, we “generalized” schemata g1, g4 and g5 into the more general process-symmetric schema of a9 (which encompasses the three generation processes), and tested the resulting CA rule.
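The three-step matching procedure above can be sketched directly; the row vectors below are hypothetical stand-ins for rows of A:

```python
def ps_degree(sg, sa):
    """Degree of process symmetry between a generation row sg and an
    annihilation row sa of the schemata x LNC matrix A.
    Step 2: reverse sa so that column i lines up with its process-symmetric
    pair; step 3: 2 * (number of shared 1s) / (|sg| + |sa|)."""
    sa_rev = sa[::-1]
    dot = sum(g & a for g, a in zip(sg, sa_rev))   # component matches
    return 2 * dot / (sum(sg) + sum(sa))

# A fully process-symmetric pair: sa is exactly sg reversed.
sg = [1, 1, 0, 0, 1, 0, 0, 0]
assert ps_degree(sg, sg[::-1]) == 1.0

# A partial match gives a score strictly between 0 and 1.
sa_partial = [0, 0, 0, 1, 0, 0, 1, 0]
assert abs(ps_degree(sg, sa_partial) - 0.8) < 1e-12
```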
A
Generation schemata   Annihilation schemata   Matching score
g1                    a9                      66%
g2                    a2                      100%
g3                    a8                      100%
g4                    a9                      66%
g5                    a9                      66%
g6                    a6                      100%
g7                    a4                      66%
g8                    a3                      66%
g9                    a2                      25%
g10                   a1                      66%
g11                   a5                      50%
g12                   a9                      33%
B
Generation              Annihilation
{0, 1, 1, 0, 1, 0, 1}   {0, 1, 0, 1, 0, 0, 1}
{0, 1, 1, 0, 1, 0, 0}   {1, 1, 0, 1, 0, 0, 1}
{0, 1, 1, 0, 0, 0, 1}   {0, 1, 1, 1, 0, 0, 1}
{0, 1, 1, 0, 0, 0, 0}   {1, 1, 1, 1, 0, 0, 1}
{0, 0, 1, 0, 0, 1, 1}   {0, 0, 1, 1, 0, 1, 1}
{0, 0, 1, 0, 0, 1, 0}   {1, 0, 1, 1, 0, 1, 1}
{0, 1, 0, 0, 1, 1, 1}   {0, 0, 0, 1, 1, 0, 1}
{1, 1, 1, 0, 0, 1, 1}   {0, 0, 1, 1, 0, 0, 0}
{1, 1, 1, 0, 0, 1, 0}   {1, 0, 1, 1, 0, 0, 0}
{0, 1, 0, 0, 1, 1, 0}   {1, 0, 0, 1, 1, 0, 1}
{0, 1, 0, 0, 1, 0, 1}   {0, 1, 0, 1, 1, 0, 1}
{0, 1, 0, 0, 1, 0, 0}   {1, 1, 0, 1, 1, 0, 1}
Fig. 3. (A) Degree of process symmetry amongst all the generation and annihilation schemata in φCOE2. Darker rows indicate full process symmetry, while light gray rows indicate a high degree of process symmetry. (B) The set R, containing the twelve LNCs in φCOE2 (white background) whose corresponding process-symmetric LNCs are preservations (gray background).
5
While |x| is the notation typically used for cardinality of sets, here, we use it to represent the 1-norm, more commonly denoted by ||x||1 .
A: performance (y-axis, 0.0–1.0) of the tested rule subsets, grouped by the number of process-symmetric LNC pairs added (x-axis: 1 p. to 12 p., labeled “Process-symmetric tested sets”).

B:
RULE MM0802
Generation:   {1, 0, 1, 0, #, #, #} {1, 0, #, 0, #, 1, 1} {1, 1, #, 0, 1, #, #} {1, #, 1, 0, 1, #, #} {1, #, 1, 0, #, 0, #} {1, #, #, 0, 1, 1, #} {1, #, #, 0, 1, #, 1} {#, 0, 0, 0, 0, 1, 1} {#, 1, 0, 0, 1, #, #} {#, 1, #, 0, 1, 0, #} {#, 1, #, 0, 1, #, 0} {#, #, 0, 0, 1, 0, 1}
Annihilation: {0, 0, 1, 1, 1, 1, #} {0, 0, #, 1, #, 1, 0} {0, 1, 0, 1, 1, #, #} {0, #, 0, 1, #, #, 0} {1, #, 0, 1, #, 0, #} {#, 0, 0, 1, #, #, 0} {#, 1, 0, 1, #, 0, #} {#, 1, #, 1, 0, #, 0} {#, #, 0, 1, 0, #, 0} {#, #, 0, 1, 1, 0, #} {#, #, 0, 1, #, 0, 0} {#, #, #, 1, 0, 1, 0}
Fig. 4. (A) Performances of the 4096 process-symmetric CAs in the immediate conceptual vicinity of φCOE2. The best specimen CA is φCOE2−clean plus one of the combinations of 6 LNC pairs from R. (B) E1 schemata prescribing state changes for φMM0802. This is the highest-performing known process-symmetric rule for the DCT.
We also “specialized” by breaking a9 into the three process-symmetric schemata of g1, g4 and g5, and forcing the remaining LNCs to become preservations. For both of the resulting rules the performance decreased significantly, P^105_149 < 0.6. Notice that for these tests, the approach used to define which rules are in the conceptual vicinity is more fine-grained, but still constrained to working with schemata, allowing mechanisms such as the generalization of schemata (e.g. g1, g4 and g5) into a single one to work. However, these tests were also unsuccessful in uncovering a high-performing CA derived from φCOE2. Using the re-conceptualization approach, it is possible to extract a matrix representation A′ that contains only those LNC process-symmetric pairs that are both 1s in A. In other words, each column in A′ will be exactly as in A, as long as the column contains 1s for both annihilation and generation rows; otherwise the column is all 0s—the latter is the case for all columns marked with dotted lines in Figure 2. We will refer to the rule represented by the matrix A′ as φCOE2−clean—the CA rule that preserves all the process symmetry already present in φCOE2. The “orphan” LNCs removed from A are shown in Figure 3 (B) (white background). Their process-symmetric pairs are in the same figure (gray background). We will refer to this set of process-symmetric pairs as R. The last test to be reported here consisted in evaluating the performance of each CA rule derived from (1) taking φCOE2−clean as base (each time); (2) adding to it a number of process-symmetric pairs from R; and (3) evaluating the resulting CA rule. This set contains all CA rules that are the same as φCOE2−clean but add one of the twelve pairs in R; it also contains all the rules that are as φCOE2−clean but include combinations of two pairs from R (66 rules), and so on. The total number of CA rules derived in this way is 4096.
6
Note that each of the rules tested comes from adding a particular combination of pairs each time to the original φCOE2−clean , as opposed to adding pairs of LNCs cumulatively to φCOE2−clean .
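The count of 4096 can be checked directly: every subset of the twelve pairs in R (including the empty subset, i.e. φCOE2−clean itself) yields one candidate rule. A quick Python check, where indices stand in for the actual LNC pairs:

```python
from itertools import combinations

R = list(range(12))  # stand-ins for the 12 process-symmetric LNC pairs in R

# One candidate rule per subset of R added to the base rule.
variants = [set(c) for k in range(len(R) + 1)
                   for c in combinations(R, k)]

assert len(variants) == 2 ** 12 == 4096
# 66 of them add exactly two pairs, matching the count in the text.
assert sum(1 for v in variants if len(v) == 2) == 66
```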
The performance of the 4096 rules is shown in Figure 4 (A). Each column shows the performance of the subset of rules adding one pair of LNCs from R, the subset adding combinations of two pairs, and so on. Note that the median performance in each subset decreases for rules containing more pairs of LNCs from R. However, the performance of the best CA rule in each subset increases for all subsets including up to six LNC pairs, and then decreases. One of the tested CAs, containing six LNC pairs added to φCOE2−clean, is the best process-symmetric CA for the DCT, with P^105_149 ≈ 0.85. Its schemata, φMM0802, are shown in Figure 4 (B). φMM0802 has a performance that is very close to that of the second highest-performing rule known for the DCT, φCOE1 [1]. However, φMM0802 is the highest-performing CA for split performance for the DCT—which means that it classifies correctly the two types of IC it can encounter (majority 1s or majority 0s).
7 Implicit Evolution of Conceptual Properties?
From the work reported in previous sections, we have established that process symmetry is a conceptual property present in CAs that perform the DCT. Indeed, our experiments have shown that full process symmetry in a high-performing CA ensures that it classifies the two types of IC it encounters equally well. We have also shown that most of the highest-performing CA rules for the DCT are process-symmetric [1]. However, in order to make our results generally useful, i.e. for learning to program cellular arrays that perform a range of tasks requiring collective computation, it is important to determine what learning strategy best exploits conceptual properties. For example, CA rules for a different task might not be as amenable to redescription using wildcard schemata (though another type of schema might be appropriate), and they would not necessarily exhibit process symmetry, but perhaps would exhibit other conceptual properties. Therefore, it is important to determine what makes a learning mechanism (e.g. coevolution working with standard CA look-up tables) more likely to exploit conceptual structure effectively during learning. In previous work, [2] evaluated learning strategies based on evolution and coevolution, with or without using spatial distribution and local interactions during learning.
In particular, they evaluated four methods:
– Spatial Coevolution, in which hosts (CA rules) and parasites (ICs) coevolve in a spatial grid in which fitness is calculated and evolutionary selection is done in local grid neighborhoods;
– Non-spatial Coevolution, which is the same as spatial coevolution except that fitness calculation and selection are performed using random samples of parasites or hosts that are not spatially correlated;
– Spatial Evolution, which uses the same spatial grid method as spatial coevolution, except that the ICs do not evolve but are generated at random at each generation; and
– Non-spatial Evolution, which is similar to a traditional genetic algorithm.
Their results have shown that spatial coevolution is substantially more successful than the other methods in producing high-performance rules. [2] gave evidence that this learning strategy ensures the highest diversity in the host (evolving programs) populations, which allows higher-performing CA rules to be discovered. The preliminary results we report here suggest that this diversity, coupled with the arms-race mechanisms at work during spatial coevolution, leads over
Fig. 5. Average performance of the best individual CA rule in a generation (dots), and its degree of process-symmetry (line) for different runs of four learning strategies to evolve CAs that perform the DCT
time, to the survival of CAs that are more generally capable of solving the two types of IC they encounter. This is illustrated in Figure 5, where the degree of process symmetry (continuous line) and the overall performance (dots) for the best individual in a population are shown during a number of runs for each of the learning strategies (see footnote 7). It is clear from the plot in Figure 6 that spatial coevolution has the smallest differences between performances for the two types of IC over time, and that there appears to be a correlation between performance and degree of process symmetry. Moreover, there seem to be sudden changes occurring in some of the plots. In particular, for spatial coevolution, these changes show correlated increases in overall performance and degree of process symmetry. Concerning the apparent correlation between degree of process symmetry and performance, Table 2 shows the Pearson correlation coefficients for the data analyzed and partially plotted in Figure 5. Using 1000 degrees of freedom, with 99.99% confidence (critical value 0.104), the Non-Spatial Coevolution strategy has weak negative correlation for the 1st run; no correlation for the 2nd; weak positive correlation for the 3rd; and no correlation for the 4th. The Non-Spatial Evolution strategy has significant positive correlation for the 1st run; significant negative correlation for the 2nd; and weak negative correlation for the 3rd. The Spatial Coevolution strategy has significant positive correlation for the 1st and 3rd runs; weak positive correlation for the 2nd; and very strong positive correlation for the 4th. Lastly, the Spatial Evolution strategy has significant positive correlation for the 1st run; weak positive correlation for the 2nd and 3rd runs; and no correlation for the 4th. Clearly, if process symmetry is taken to be a learning goal, spatial coevolution appears to be the only strategy capable of achieving this learning.
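The coefficients in Table 2 are standard Pearson correlations between the per-generation performance and process-symmetry series; a minimal sketch with illustrative data (not the actual run data):

```python
from math import sqrt

def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length series,
    e.g. best-rule performance vs. its degree of process symmetry."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Perfectly correlated series give r = 1; anti-correlated give r = -1.
assert abs(pearson([1, 2, 3, 4], [2, 4, 6, 8]) - 1.0) < 1e-12
assert abs(pearson([1, 2, 3, 4], [8, 6, 4, 2]) + 1.0) < 1e-12
```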
To a lesser degree, the spatial evolution strategy can also achieve this, while the non-spatial strategies do not achieve this learning consistently. We investigated the apparent sudden changes mentioned earlier (most noticeable in the spatial coevolution plots in Figure 5). Figure 6 shows the same data plotted in Figure 5, but splitting the performance by type of IC. The lighter dots

Table 2. Correlation between performance and degree of process-symmetry for each run of each evolution strategy
          Run 1   Run 2   Run 3   Run 4
N.S. Coe  -0.15    0.05    0.11    0.05
N.S. Evo   0.43   -0.48   -0.18     --
SP. Coe    0.62    0.13    0.65    0.8
SP. Evo    0.31    0.17    0.11    0.07
7
Here only two runs for each strategy are plotted for clarity. However, a larger number of runs (mostly four per strategy) was analyzed. Full plots are available from http://mypage.iu.edu/∼marquesm/Site/Online Materials/.
Fig. 6. Split performance of the best individual CA rule in a generation—lighter dots for performance on ICs with majority 0s, darker dots for performance on ICs with majority 1s, and its degree of process-symmetry (line) for different runs of four learning strategies to evolve CAs that perform the DCT. For each best individual CA in a generation, a vertical light-gray line is used to join the two different performances, showing the difference between them.
show the rule’s performance in classifying correctly cases in which the IC has majority 0s; darker dots show the performance for the converse type of problem (majority 1s) and the continuous line is the degree of process symmetry. It becomes clear from the figure that, for the spatial coevolution strategy, there is
The Role of Conceptual Structure in Designing Cellular Automata
161
an initial period in which the hosts are very good at solving one type of problem, but very poor at the converse type. This is followed by an abrupt change, in which the performance differences per type of IC shrink and process symmetry also increases.
8 Solving the DCT in 2D
In §5 we described a methodology to perform evolutionary search in a space of process-symmetric rules, looking for the best process-symmetric rule to perform the DCT in one dimension. We applied the same methodology to search the much larger space of rules (containing 2^(2^9) CAs) to perform the DCT in two dimensions, using the Moore neighborhood (center cell and 8 adjacent cells). Instead of looking in the space of 4-wildcards, we searched the spaces of four, five and six wildcards. In the space of six wildcards, our search discovered the highest-performing 2D CA rule for the DCT found to date. The performance of this rule on 2D lattices of 19×19 cells is about 85%. Moreover, Aitana’s redescription of this rule, φMM2D320, is very compact (shown in Figure 7), which shows the rule is parsimonious.
RULE MM2D320
Generation:   {#,#,#,#,0,#,#,1,1} {#,#,1,#,0,1,#,#,#} {#,1,#,#,0,1,#,#,#}
Annihilation: {0,0,#,#,1,#,#,#,#} {#,#,#,0,1,#,0,#,#} {#,#,#,0,1,#,#,0,#}
Fig. 7. E1 of φMM2D320. The nine elements of each schema correspond to the NW, N, NE, W, updating (center), E, SW, S, and SE neighbors, in that order
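Matching a wildcard schema of Figure 7 against a flattened Moore neighborhood (NW, N, NE, W, center, E, SW, S, SE) can be sketched as follows; the neighborhood below is an illustrative example, not taken from the paper:

```python
def matches(schema, nbhd):
    """True if the 9-cell Moore neighborhood satisfies the E1 schema,
    where '#' is a wildcard matching either state."""
    return all(s == '#' or int(s) == b for s, b in zip(schema, nbhd))

# The three generation schemata of phi_MM2D320 from Figure 7.
generation = [list("####0##11"), list("##1#01###"), list("#1##01###")]

# Center cell (index 4) is 0 and its S and SE neighbors are 1:
# the first generation schema fires, so the center would become 1.
nbhd = [0, 0, 0, 0, 0, 0, 0, 1, 1]
assert any(matches(s, nbhd) for s in generation)
```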
9 Conclusions and Future Work
In this paper we have demonstrated that a particular conceptual structure, process symmetry, is correlated with performance on the density classification task. We have also demonstrated that restricting the evolutionary algorithm’s search to the space of process-symmetric rules can more easily produce high-performance rules—for both one- and two-dimensional CAs—than allowing the EA an unrestricted search space. Furthermore, we have provided evidence that spatial coevolution, previously shown to be a powerful method for evolving cellular automata for the DCT, implicitly increases the degree of process symmetry in CAs over generations, and that this increase is correlated with the CAs’ improvement in performance. The major goals for future work on this topic are (1) determining how well Aitana can discover useful conceptual structures for other, more complex computational tasks for CAs; (2) developing a better understanding of why particular conceptual structures such as process symmetry enable higher performance; and (3) further investigation of the implicit evolution of conceptual structures in CA
rule tables, and determining if and how these structures are related to characterizations of the space-time behavior of CAs, such as the domains and particles framework of Crutchfield et al. [18]. In recent work, [19] found new CA rules for the 1-dimensional DCT problem with performances over 88%. Future work is needed in order to determine the split performances of these new, high-performing CAs, as well as their conceptual structure—both in terms of parsimoniousness and their levels of process symmetry. Acknowledgements. Melanie Mitchell acknowledges support from the Focus Center Research Program (FCRP), Center on Functional Engineered Nano Architectonics (FENA). This work was partially supported by Fundação para a Ciência e a Tecnologia (Portugal) grant 36312/2007.
References
1. Marques-Pita, M., Manurung, R., Pain, H.: Conceptual representations: What do they have to say about the density classification task by cellular automata? In: Jost, J., Reed-Tsotchas, F., Schuster, P. (eds.) ECCS 2006: European Conference on Complex Systems (2006)
2. Mitchell, M., Thomure, M.D., Williams, N.L.: The role of space in the success of coevolutionary learning. In: Proceedings of Artificial Life X: Tenth Annual Conference on the Simulation and Synthesis of Living Systems (2006)
3. Zhirnov, V., Cavin, R., Lemming, G., Galatsis, K.: An assessment of integrated digital cellular automata architectures. Computer 41(1), 38–44 (2008)
4. Mitchell, M., Crutchfield, J., Hraber, P.: Revisiting the edge of chaos: Evolving cellular automata to perform computations. Complex Systems 7, 89–130 (1993)
5. Gacs, P., Kurdyumov, L., Levin, L.: One-dimensional uniform arrays that wash out finite islands. Probl. Peredachi. Inform. 14, 92–98 (1978)
6. Gonzaga de Sá, P., Maes, C.: Gacs-Kurdyumov-Levin automaton revisited. Journal of Statistical Physics 67(3-4), 507–522 (1992)
7. Andre, D., Bennett III, F., Koza, J.: Discovery by genetic programming of a cellular automata rule that is better than any known rule for the majority classification problem. In: Koza, J., Goldberg, D., Fogel, D. (eds.) Proceedings of the First Annual Conference on Genetic Programming, pp. 3–11. MIT Press, Cambridge (1996)
8. Das, R., Mitchell, M., Crutchfield, J.: A genetic algorithm discovers particle-based computation in cellular automata. In: Davidor, Y., Schwefel, H.P., Männer, R. (eds.) Proceedings of the Int. Conf. on Evolutionary Computation, pp. 344–353 (1994)
9. Juillé, H., Pollack, B.: Coevolving the ideal trainer: Application to discovery of cellular automata rules. In: Garzon, M.H., Goldberg, D.E., Iba, H., Riolo, R. (eds.) Genetic Programming 1998: Proceedings of the Third Annual Conference. Morgan Kaufmann, San Francisco (1998)
10. Ferreira, C.: Gene expression programming: A new adaptive algorithm for solving problems. Complex Systems 13(2), 87–129 (2001)
11. Karmiloff-Smith, A.: Beyond Modularity: A Developmental Perspective on Cognitive Science. MIT Press, Cambridge (1992)
12. Gärdenfors, P.: Conceptual Spaces: The Geometry of Thought. MIT Press/Bradford Books (2000)
13. Marques-Pita, M.: Aitana: A Developmental Cognitive Artifact to Explore the Evolution of Conceptual Representations of Cellular Automata-based Complex Systems. PhD thesis, School of Informatics, University of Edinburgh, Edinburgh, UK (2006)
14. Holland, J., Holyoak, K., Nisbett, R., Thagard, P.: Induction: Processes of Inference, Learning and Discovery. MIT Press, Cambridge (1986)
15. Piaget, J.: The Origins of Intelligence in Children. International University Press (1952)
16. Piaget, J.: The Child’s Construction of Reality. Routledge and Kegan Paul (1955)
17. Marques-Pita, M., Rocha, L.M.: Conceptual structure in cellular automata: The density classification task. In: Bullock, S., Noble, J., Watson, R.A., Bedau, M.A. (eds.) Proceedings of the Eleventh International Conference on Artificial Life (Alife XI). MIT Press, Cambridge (2008)
18. Crutchfield, J.P., Mitchell, M., Das, R.: The evolutionary design of collective computation in cellular automata. In: Crutchfield, J.P., Schuster, P.K. (eds.) Evolutionary Dynamics—Exploring the Interplay of Selection, Neutrality, Accident, and Function, pp. 361–411. Oxford University Press, New York (2003)
19. Wolz, D., De Oliveira, P.: Very effective evolutionary techniques for searching cellular automata rule spaces. Journal of Cellular Automata (to appear)
A Characterisation of NL Using Membrane Systems without Charges and Dissolution

Niall Murphy¹ and Damien Woods²

¹ Department of Computer Science, National University of Ireland, Maynooth, Ireland
[email protected]
² Department of Computer Science and Artificial Intelligence, University of Seville, Seville, Spain
[email protected]
Abstract. We apply techniques from complexity theory to a model of biological cellular membranes known as membrane systems or P-systems. Like circuits, membrane systems are defined as uniform families. To date, polynomial time uniformity has been the accepted uniformity notion for membrane systems. Here, we introduce the idea of using AC0 and L uniformities and investigate the computational power of membrane systems under these tighter conditions. It turns out that the computational power of some systems is lowered from P to NL, so it seems that our tighter uniformities are more reasonable for these systems. Interestingly, other systems that are known to be lower bounded by P are shown to retain their computational power under the new uniformity conditions. Similarly, a number of membrane systems that are lower bounded by PSPACE retain their power under the new uniformity conditions.
1 Introduction
Membrane systems [14] are a model of computation inspired by living cells. In this paper we explore the computational power of cell division (mitosis) and dissolution (apoptosis) by investigating a variant of the model called active membranes [13]. An instance of the model consists of a number of (possibly nested) membranes, or compartments, which themselves contain objects. During a computation, the objects, depending on the compartment they are in, become other objects or pass through membranes. In the active membrane model it is also possible for a membrane to completely dissolve, and for a membrane to divide into two child membranes. This membrane model can be regarded as a model of parallel computation; however, it has a number of features that make it somewhat unusual when compared to other parallel models. For example, object interactions are nondeterministic so confluence plays an important role, membranes contain multisets of objects, there are many parameters to the model, etc. In order to clearly see the power of the model we analyse it from the computational complexity point of view, the goal being to characterise the model in terms of the set of problems

C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, pp. 164–176, 2008.
© Springer-Verlag Berlin Heidelberg 2008
that it can solve in reasonable time. One can also interpret our results as classifying the computational complexity of simulating biological phenomena that are modelled by the membrane systems under consideration. Another, more specific, motivation is the so-called P-conjecture [15], which states that recogniser membrane systems with division rules (active membranes), but without charges, characterise P. On the one hand, it was shown that this conjecture does not hold for systems with non-elementary division, as PSPACE upper [18] and lower [1] bounds were found for this variant (non-elementary division is where a membrane containing multiple membranes and objects may be copied in a single timestep). On the other hand, the P-conjecture was thought to hold for all active membrane systems without dissolution rules, when Gutiérrez-Naranjo et al. [7] gave a P upper bound. The corresponding P lower bound (trivially) came from the fact that the model is defined to be P-uniform. However, here we argue that the aforementioned P lower bound highlights a problem with using P-uniformity, as it does not tell us whether this membrane model itself has (in some sense) the ability to solve all of P in polynomial time, or if the uniformity condition is providing the power. In this paper we show that when we use weaker, and more reasonable, uniformity conditions the model does not in fact have the ability to solve all problems in P (assuming P ≠ NL). We find that with either AC0- or L-uniformity the model characterises NL in the semi-uniform case, and we give an NL upper bound for the uniform case. We also show that the PSPACE lower and upper bounds mentioned above still hold under these restricted uniformity conditions. Using the notation of membrane systems (defined in Section 2), our upper bound on L-uniform and L-semi-uniform families of membrane systems can be stated as follows.

Theorem 1.
PMCAM0−d ⊆ NL

Essentially this theorem states that polynomial-time active membrane systems without dissolution rules solve no more than those problems in NL. Despite the fact that these systems run for polynomial time (and can even create exponentially many objects and membranes), they cannot solve all of P (assuming NL ≠ P). This result is illustrated by the bottom four nodes in Figure 1. The upper bound in Theorem 1 is found by showing that the construction in [7] can be reduced to an instance of the NL-complete problem s-t-connectivity (STCON). The full proof appears in Section 3. Next we give a corresponding lower bound.

Theorem 2. NL ⊆ PMCAM0−d,−u

To show this lower bound we provide an AC0-semi-uniform membrane family that solves STCON. The full proof is in Section 4, and the result is illustrated by the bottom left two nodes in Figure 1. Therefore, in the semi-uniform case we have a characterisation of NL.
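For reference, STCON asks whether a directed graph contains a path from a designated source s to a target t. A plain reachability check is easy to sketch (this runs in polynomial time and linear space; the point of NL-completeness is that the problem is also solvable nondeterministically in logarithmic space):

```python
from collections import deque

def stcon(n, edges, s, t):
    """Is vertex t reachable from vertex s in the directed graph
    with vertices 0..n-1 and the given edge list?"""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
    seen, frontier = {s}, deque([s])
    while frontier:
        u = frontier.popleft()
        if u == t:
            return True
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return False

assert stcon(4, [(0, 1), (1, 2), (2, 3)], 0, 3)       # path 0 -> 1 -> 2 -> 3
assert not stcon(4, [(1, 0), (2, 1), (3, 2)], 0, 3)   # all edges point the wrong way
```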
166
N. Murphy and D. Woods
[Figure 1 content: best known upper bound / lower bound for each model variant; a single entry is a characterisation:
+d, +ne, -u: PSPACE          +d, +ne, +u: PSPACE
+d, -ne, -u: NP / P          +d, -ne, +u: NP / P
-d, +ne, -u: NL              -d, +ne, +u: NL / ?
-d, -ne, -u: NL              -d, -ne, +u: NL / ?]
Fig. 1. A diagram showing the currently known upper and lower bounds on the variations of the model. The top part of a node represents the best known upper bounds, and the lower part the best known lower bounds. An undivided node represents a characterisation. Arrows represent inclusions.
Corollary 1. NL = PMCAM0−d,−u. We have not yet shown an analogous lower bound result for uniform families. In Section 4.1 we briefly explore some issues relating to this problem. So far we have shown that four models, which characterise P when polynomial-time uniformity is used, are actually upper bounded by NL when restricted to be AC0-uniform (or L-uniform). Interestingly, we also show that two other polynomial-time uniform membrane systems that are known [11] to characterise P actually retain this P characterisation when restricted to be AC0-uniform (or L-uniform). This result is stated as a P lower bound on membrane systems with dissolution: Theorem 3. P ⊆ PMCAM0+d,+u. The proof appears in Section 5 and is illustrated by the top front two nodes in Figure 1. Here we remark that the NP upper bounds given by the top two front nodes in Figure 1 are easily derived from the main result in [11]. In Section 2.4 we observe that the known PSPACE characterisations (top two nodes in Figure 1) remain unchanged under AC0-uniformity conditions.
2
Membrane Systems
In this section we define membrane systems and complexity classes. These definitions are from Păun [13,14], and Sosík and Rodríguez-Patón [18]. We also introduce the notion of AC0-uniformity for membrane systems. 2.1
Active Membrane Systems
Active membrane systems are a class of membrane systems with membrane division rules. Division rules can either act only on elementary membranes, or else
A Characterisation of NL Using Membrane Systems
167
on both elementary and non-elementary membranes. An elementary membrane is one which does not contain other membranes (a leaf node, in tree terminology).
Definition 1. An active membrane system without charges is a tuple Π = (O, H, μ, w1, . . . , wm, R) where:
1. m > 1 is the initial number of membranes;
2. O is the alphabet of objects;
3. H is the finite set of labels for the membranes;
4. μ is a membrane structure, consisting of m membranes, labelled with elements of H;
5. w1, . . . , wm are strings over O, describing the multisets of objects placed in the m regions of μ;
6. R is a finite set of developmental rules, of the following forms:
(a) [ a → u ]h, for h ∈ H, a ∈ O, u ∈ O∗
(b) a[ ]h → [ b ]h, for h ∈ H, a, b ∈ O
(c) [ a ]h → [ ]h b, for h ∈ H, a, b ∈ O
(d) [ a ]h → b, for h ∈ H, a, b ∈ O
(e) [ a ]h → [ b ]h [ c ]h, for h ∈ H, a, b, c ∈ O
(f) [ a [ ]h1 [ ]h2 [ ]h3 ]h0 → [ b [ ]h1 [ ]h3 ]h0 [ c [ ]h2 [ ]h3 ]h0, for h0, h1, h2, h3 ∈ H, a, b, c ∈ O.
These rules are applied according to the following principles:
– All the rules are applied in a maximally parallel manner. That is, in one step, one object of a membrane is used by at most one rule (chosen in a non-deterministic way), but any object which can evolve by one rule of any form must evolve.
– If at the same time a membrane labelled with h is divided by a rule of type (e) or (f) and there are objects in this membrane which evolve by means of rules of type (a), then we suppose that first the evolution rules of type (a) are used, and then the division is produced. This process takes only one step.
– The rules associated with membranes labelled with h are used for membranes with that label. At one step, a membrane can be the subject of only one rule of types (b)–(f).
The environment is an indissoluble membrane that is the ultimate parent of all other membranes in the system.
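To make the rule types concrete, here is a small illustrative sketch (not part of the formal model, and with an invented rule set) of how type-(a) evolution rules act on a membrane's multiset in one maximally parallel step. The model's non-deterministic choice among competing rules is avoided here by giving each object at most one rule.

```python
from collections import Counter

# Hypothetical type (a) rules [a -> u]_h for a single membrane labelled h,
# written as a dict from an object to the string of objects it becomes.
rules_h = {"a": "bc", "b": "bb"}

def evolve(multiset: Counter, rules: dict) -> Counter:
    """One maximally parallel step restricted to type (a) rules: every
    object that has an applicable rule must evolve; the rest persist."""
    nxt = Counter()
    for obj, count in multiset.items():
        if obj in rules:
            for produced in rules[obj]:
                nxt[produced] += count  # all copies evolve simultaneously
        else:
            nxt[obj] += count           # no applicable rule: unchanged
    return nxt

step1 = evolve(Counter("aab"), rules_h)  # {a: 2, b: 1} evolves in one step
```

Here both copies of a become bc and the single b becomes bb, so after one step the membrane holds four b objects and two c objects.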
2.2
Recogniser Membrane Systems
In this paper we study the language recognising variant of membrane systems that solves decision problems. Definition 2. A recogniser membrane system is a membrane system such that the result of the computation (a solution to the instance) is “yes” if a distinguished object yes appears in the environment or “no” if no appears. Such a membrane system is called deterministic if for each input a unique sequence of configurations exists. A membrane system is called confluent if it always halts and, starting from the same initial configuration, it always gives the same result, either always “yes” or always “no”. Therefore, the following interpretation holds: given a fixed initial configuration, a confluent membrane system non-deterministically chooses one from a number of valid configuration sequences, but all of them must lead to the same result. 2.3
Complexity Classes
Here we introduce the notion of AC0-uniformity to membrane systems. Previous work on the computational complexity of membrane systems used (Turing machine) polynomial-time uniformity [16]. Consider a decision problem X, i.e. a set of instances X = {x1, x2, . . .} over some finite alphabet such that to each xi there is a unique answer "yes" or "no". We say that a family of membrane systems solves a decision problem if each instance of the problem is solved by some family member. We denote by |x| = n the length of any instance x ∈ X. AC0 circuits are DLOGTIME-uniform, polynomial-size (in input length n), constant-depth circuits with AND, OR, and NOT gates, and unbounded fan-in [4]. Definition 3 (AC0-uniform families of membrane systems). Let D be a class of membrane systems and let f : N → N be a total function. The class of problems solved by uniform families of membrane systems of type D in time f, denoted by MCD (f ), contains all problems X such that:
– There exists an AC0-uniform family of membrane systems, ΠX = (ΠX (1), ΠX (2), . . .) of type D: that is, there exists an AC0 circuit family such that on unary input 1n the nth member of the circuit family constructs ΠX (n). We refer to this circuit family as the family machine.
– There exists an AC0-uniform circuit family such that on input x ∈ X, of length |x| = n, the nth member of the family encodes x as a multiset of input objects placed in the distinct input membrane hin. We refer to this circuit family as the input encoding machine.
– Each ΠX (n) is sound: ΠX (n), starting with an encoded input x of length n, expels out a distinguished object yes if and only if the answer to x is "yes".
– Each ΠX (n) is confluent: all computations of ΠX (n) with the same input x of size n give the same result; either always "yes" or else always "no".
– ΠX is f-efficient: ΠX (n) always halts in at most f (n) steps.
Using this definition of AC0-uniform families, we define AC0-semi-uniform families of membrane systems ΠX = (ΠX (x1), ΠX (x2), . . .) such that there exists an AC0-uniform circuit family which, on an input x ∈ X of length |x| = n, constructs the membrane system ΠX (x). Here a single circuit family (which we refer to as the input encoding machine) is used to construct the semi-uniform membrane family, and so the problem instance is encoded using objects, membranes, and rules. In this case, for each instance of X we have a special membrane system which therefore does not need a separately constructed input. The resulting class of problems is denoted by MCD,−u (f ). Obviously, MCD (f ) ⊆ MCD,−u (f ) for a given class D and a complexity function f [3]. Logspace, or L, uniform families of membrane systems are defined analogously, where we use two deterministic logspace Turing machines, instead of the two AC0 circuit families, for the uniformity conditions. Similarly we define L-semi-uniformity using a logspace Turing machine instead of an AC0 circuit family. We define PMCD and PMCD,−u as

PMCD = ⋃_{k∈N} MCD (O(n^k)),    PMCD,−u = ⋃_{k∈N} MCD,−u (O(n^k)).
In other words, PMCD (and PMCD,−u) is the class of problems solvable by uniform (respectively semi-uniform) families of membrane systems in polynomial time. We denote by AM0 the classes of membrane systems with active membranes and no charges. We denote by AM0−ne the classes of membrane systems with active membranes, only elementary membrane division, and no charges. We denote by AM0+ne the classes of membrane systems with active membranes, both non-elementary and elementary membrane division, and no charges. We denote by PMCAM0−d the classes of problems solvable by uniform families of membrane systems in polynomial time with no charges and no dissolution rules. In this paper we are using DLOGTIME-AC0-uniformity, which can be somewhat cumbersome to analyse; therefore in our proofs we use an AC0-equivalent model called the constant time Concurrent Random Access Machine (constant time CRAM) [2,8]. Definition 4 (CRAM [8]). A CRAM is a concurrent-read concurrent-write PRAM with a polynomial number of processors. Each processor is able to shift a word in memory by a polynomial number of bits. 2.4
AC0 -Uniformity and PSPACE Results
Membrane systems with active membranes, without charges, and using non-elementary division have been shown to characterise PSPACE [1,18]. For the lower bound, a P-uniform membrane system is given [1] that solves instances of QSAT in polynomial time. Clearly, stricter uniformity notions have no effect on the PSPACE upper bound. We now show that the use of AC0-uniformity does not change this lower bound.
The family machine inputs the numbers n and m representing the number of variables and clauses of the QSAT instance, and uses them to construct a polynomial number of objects, rules and membranes. We observe that the construction in [1] is in AC0 : the most complicated aspect involves multiplication by constants (essentially addition) which is known [9] to be in AC0 . Although we omit the details, it is not difficult to see that a constant time CRAM constructs the membrane system in constant time from n and m. Similarly, the encoding of the instance as objects to be placed in the input membrane involves only addition.
3
NL Upper Bound on Active Membranes without Dissolution Rules
Previously the upper bound on all active membrane systems without dissolution was P [7]. As an aside, we remark that this is a very enlightening proof since it first highlighted the importance of dissolution. Without dissolution, membrane division, even non-elementary division, can be modelled as a special case of object evolution. It is also worth noting that these systems can create exponential numbers of objects and membranes, yet they cannot compute anything outside P. Since membrane systems are usually P-uniform, this P upper bound was considered a characterisation of P. However, having a lower bound of the same power as the uniformity condition is somewhat unsatisfactory, as it tells us little about the computing power of the actual membrane system itself. This is because the input encoding machine (in the uniform and semi-uniform case) takes an instance of the problem as input; thus if the problem is contained in the set of problems solvable by the encoder, it simply outputs a yes or no object directly. In this section we show that if we tighten the uniformity condition to be AC0, or even L, it is possible to decide in NL whether or not the system accepts. We give an overview rather than the full details. The proof of the P upper bound in [7] involves the construction of a dependency graph representing all possible computation paths of a membrane system on an input. The dependency graph for a membrane system Π is a directed graph GΠ = (VΠ, EΠ). Each vertex a in the graph is a pair a = (v, h) ∈ Γ × H, where Γ is the set of objects and H is the set of membrane labels. An edge connects vertex a to vertex b if there is an evolution rule such that the left-hand side of the rule has the same object-membrane pair as a and the right-hand side has an object-membrane pair matching b. If we can trace a path from the vertex (yes, env) (indicating an accepting computation) back to a node representing the input it is clear that this system must be an accepting one.
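To illustrate, the dependency graph of a small hypothetical system can be built and queried as follows. This sketch handles only type-(a) evolution and type-(c) send-out rules; the labels, rules, and objects are invented for the example.

```python
# Vertices of the dependency graph are (object, membrane-label) pairs;
# 'env' stands for the environment.
parent = {"inner": "skin", "skin": "env"}

# type (a) rules (h, a, u):  [a -> u]_h
type_a = [("inner", "s", "bt")]
# type (c) rules (h, a, b):  [a]_h -> []_h b   (b is sent to the parent)
type_c = [("inner", "t", "y"), ("skin", "y", "y")]

edges = set()
for h, a, u in type_a:
    for x in u:                          # an edge to each object produced
        edges.add(((a, h), (x, h)))
for h, a, b in type_c:
    edges.add(((a, h), (b, parent[h])))  # the object moves up one level

def reachable(graph, src, dst):
    """Reachability on the dependency graph (the actual upper-bound
    argument solves this in NL rather than by explicit search)."""
    seen, stack = {src}, [src]
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        for (x, y) in graph:
            if x == v and y not in seen:
                seen.add(y)
                stack.append(y)
    return False
```

In this toy system the object s in membrane inner leads, via t, to a y expelled into the environment, so (y, env) is reachable from (s, inner).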
It is worth noting that, unlike upper bound proofs for a number of other computational models, the dependency graph does not model entire configuration sequences, but rather models only those membranes and objects that lead to a yes output.
The original statement of the proof constructed the graph in polynomial time, and a path was found from the accepting node to the start node in polynomial time. We make the observation that the graph GΠ can be constructed in deterministic logspace. We omit the details, but our claim can be verified by checking that the construction in [7] can easily be computed using only a fixed number of binary counters. Also we note that the problem of finding a path from the accepting vertex to one of the input vertices is actually an instance of MSTCON, a variation of the NL-complete problem STCON. STCON is also known as PATH [17] and REACHABILITY [12]. Definition 5 (STCON). Given a directed graph G = (V, E) and vertices s, t ∈ V, is there a directed path in G from s to t? Definition 6 (MSTCON). Given a directed graph G = (V, E), a vertex t ∈ V and a set S ⊆ V, is there a directed path in G from any element of S to t? MSTCON is NL-complete, since a logspace machine or AC0 circuit can add a new start vertex s′, with edges from s′ to each vertex in S, to give an instance of STCON. Since we have shown that the problem of simulating a membrane system without charges and without dissolution can be encoded as an NL-complete problem, we have proved Theorem 1. The proof holds for both AC0- and L-uniformity, as well as for both uniform and semi-uniform families of membrane systems without dissolution.
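The reduction from MSTCON to STCON described above can be sketched as follows. The helper names are ours; `stcon` here uses deterministic search, whereas the NL algorithm would instead guess the path one vertex at a time using logarithmic space.

```python
def mstcon_to_stcon(vertices, edges, S, t):
    """Reduce MSTCON (path from any vertex of S to t) to STCON by
    adding a fresh start vertex with an edge to every element of S."""
    s_new = object()  # fresh vertex, distinct from all existing ones
    return (list(vertices) + [s_new],
            list(edges) + [(s_new, v) for v in S],
            s_new, t)

def stcon(vertices, edges, s, t):
    """Is there a directed path from s to t? (Depth-first search.)"""
    seen, stack = {s}, [s]
    while stack:
        v = stack.pop()
        if v == t:
            return True
        for (a, b) in edges:
            if a == v and b not in seen:
                seen.add(b)
                stack.append(b)
    return False
```

For example, with edges {(1, 2), (2, 3)}, S = {1} and t = 3, the reduced instance is a yes-instance of STCON; deleting the edge (1, 2) makes it a no-instance.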
4
NL Lower Bound for Semi-uniform Active Membranes without Dissolution
Here we provide a proof of Theorem 2 by giving a membrane system that solves STCON in a semi-uniform manner. The algorithm works by representing edges in the problem instance graph as object evolution rules. There is only one membrane, which serves as the input and output membrane. The system is initialised with an s object in this membrane. If there are edges from s to any other nodes in the graph then we have evolution rules indicating this. For example, edges (s, b), (s, c), (s, d) are represented as the rule [s → bcd]. In this manner the presence of an object in a configuration indicates that the system is currently at this node while following (or simulating) each different path through the graph in parallel. If the t object is ever evolved the system outputs a yes object and halts. Otherwise, a no object is output from the system. We now give a proof of Theorem 2. Proof. Each instance of the problem STCON is of the form ((V, E), s, t). We let n and m be the number of vertices and edges in the graph respectively. We assume an ordering on instances (say by n and then lexicographically). We define a function f(k), computable in AC0, that maps the k-th instance to the following membrane system Πk.
– The set of labels is {h}.
– The initial membrane structure is [ ]h.
– The working objects are {yes, no} ∪ {ci | 0 ≤ i ≤ |V| + 2} ∪ V.
– The initial multiset is c|V|+2, s.
In the input membrane we place the object node given by s. The evolution rules are as follows. If vertex vi has out-degree d ∈ N and d edges {(vi, vj1), (vi, vj2), . . . , (vi, vjd)}, then we encode it as a type (a) rule [ vi → ui ]h where ui = vj1 vj2 · · · vjd. When the object t is evolved we want it to become a yes object and send it out to the environment:
[ t ]h → [ ]h yes
We also have a counter that counts down in parallel with the above steps:
[ ci → ci−1 ]h where i ∈ {1, 2, . . . , |V| + 2}
If we output a yes, this occurs on or before timestep 2n. Therefore, when the counter reaches zero, there must not have been a yes object, so we output a no to the environment:
[ c0 ]h → [ ]h no
This family of membrane systems is easily constructed by a logspace Turing machine. However, if we wish to use AC0-uniformity we need to insist on a limited out-degree d on all nodes. We can make this restriction without loss of generality. A CRAM to construct the above family for this restricted version of STCON will run in d + 1 time steps. Each processor of the CRAM works with one edge of the graph. There is a register assigned for each node in the graph. Each processor writes the source node of its edge to the matching register; this will be the left-hand side of the rule. The processor will continue to write to this same register in the following timesteps. In the next d time steps the processor tries to write its destination node to this register. If the register is being used by another processor, it waits and tries to write again the next time step. Once it writes its node successfully it stops. The CRAM then outputs the contents of the registers, which are the membrane rules of the system. Note that we encode the edges of the graph as rules, rather than objects. In the membrane computing framework, for uniform membrane systems, inputs must be specified (encoded) as objects. □
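The parallel path-following behaviour of this construction can be sketched at the object level. This is our own set abstraction of the membrane's multiset, not the formal system: the node objects present at any step form the frontier of graph nodes reached so far, and the countdown counter bounds the number of steps.

```python
def membrane_stcon(vertices, edges, s, t):
    """Object-level sketch of the semi-uniform STCON system."""
    out = {v: [w for (u, w) in edges if u == v] for v in vertices}
    frontier = {s}                       # initially just the object s
    for _ in range(len(vertices) + 2):   # counter c_{|V|+2}, ..., c_0
        if t in frontier:                # rule [t]_h -> []_h yes fires
            return "yes"
        # every node object evolves to its successors simultaneously
        frontier = {w for v in frontier for w in out[v]}
    return "no"                          # counter exhausted: [c_0]_h -> []_h no
```

Any directed path from s to t has length at most |V|, so if t is reachable the t object appears before the counter runs out; otherwise the frontier either cycles or empties and a no is emitted.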
Therefore our algorithm is semi-uniform as we require a different membrane system for each unique problem instance. 4.1
Differences between Circuit and Membrane Uniformity
To date we have no lower bound for uniform families of active membrane systems without dissolution. Our search for such a lower bound has highlighted some interesting differences between circuit and membrane uniformity.
In circuit complexity we assume a reasonable binary encoding of the input to the circuit, so we only need to consider bounding the complexity of the family machine which constructs the circuit family. However, with uniform families of active membrane systems we construct our input multiset with an input encoding machine. The family machine that constructs the membrane system Π(n) takes a unary number n as input, where n is the input length, similar to circuit uniformity. However, the input encoding machine takes the actual input instance; this potentially allows it to solve the problem. For example, consider the following membrane system. Its family machine is DLOGTIME-AC0 but its input encoding machine is NC1. The input encoding machine processes the input in such a way that it becomes trivial to solve the problem PARITY. PARITY is the problem of telling whether the number of 1 symbols in the input word is odd. This problem is known [5] to be outside of AC0, and so AC0 would be a reasonable uniformity condition in this case. Our family machine takes as input n ∈ N and constructs a set of objects {odd_{1^i 0^j}, even_{1^i 0^j} | i, j ≥ 0 such that i + j = n}. Objects yes and no are also created. A type (a) rule is created mapping every odd object with i "1" symbols to the even object with i − 1 "1" symbols in it. A type (a) rule is created mapping every even object with i "1" symbols to the odd object with i − 1 "1" symbols in it. A rule is created from object odd_{00...0} to yes and from even_{00...0} to no. The NC1 input encoding machine rearranges the input word w by moving all 1 symbols to the left and all 0 symbols to the right, to give w′. Then the symbol even_{w′} is placed in the input membrane. (Note, the complexity of this problem has been previously analysed [2].)
As the system runs, the initial object evolves alternately between odd and even until only 0 symbols are left in the subscript; then a yes (or no) is evolved, indicating that the input word contained an odd (or even) number of 1 symbols. It is possible to decide the parity of such preprocessed binary strings with an AC0 circuit. This indicates that our preprocessing step (the input encoding machine) was too powerful. Also, it can be noted that while for circuits it is open whether or not P-uniform AC0 = DLOGTIME-AC0, an analogous statement does not hold for membrane systems. Essentially, the use of a P-uniform input encoding machine allows the system to solve at least the problems in P.
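The odd/even evolution just described can be traced with a short sketch. The function name and the use of sorting to stand in for the NC1 encoder's preprocessing are ours.

```python
def parity_membrane(w: str) -> str:
    """Sketch of the PARITY example: the (too powerful) input encoding
    machine moves the 1s of the input word to the left, and the system
    then alternates between even/odd objects, erasing one 1 per step."""
    w_sorted = "".join(sorted(w, reverse=True))    # encoder's preprocessing
    obj = ("even", w_sorted)                       # initial object even_{w'}
    while "1" in obj[1]:
        flag, sub = obj
        obj = ("odd" if flag == "even" else "even",
               sub.replace("1", "0", 1))           # drop one 1, flip parity
    # final rules: odd_{00...0} -> yes, even_{00...0} -> no
    return "yes" if obj[0] == "odd" else "no"
```

Starting from even, an input with i ones takes i flips to reach the all-zero subscript, ending on odd exactly when i is odd.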
5
P Lower Bound on Uniform Families of Active Membrane Systems with Dissolving Rules
So far we have seen that by tightening the uniformity condition from P to AC0 we lower the power of some models from P down to NL (see Figure 1). In this section we show that this does not happen for all models with at least P power. More precisely, we prove Theorem 3 by showing that AC0-uniform, polynomial-time membrane systems with dissolution are lower bounded by P. Naturally this result also holds for the semi-uniform case.
[Figure 2 content — gadget rules:
AND gadget: T [ ]t → [ T ]t ; [ T ]t → λ ; F [ ]f → [ F ]f ; [ F ]f → λ ; [ 1 ]AND → [ ]AND T ; [ 0 ]AND → [ ]AND F
OR gadget: F [ ]f → [ F ]f ; [ F ]f → λ ; T [ ]t → [ T ]t ; [ T ]t → λ ; [ 0 ]OR → [ ]OR F ; [ 1 ]OR → [ ]OR T ]
Fig. 2. AND and OR gadgets which can be nested together to simulate a circuit. Here “input” is either T , F , or a nested gadget membrane.
Proof. A constant time CRAM encodes an instance of the Circuit Value Problem (CVP) [10] as a PMCAM0+d,+u membrane system using the gadget membranes and rules shown in Figure 2. The figure shows AND and OR gadgets; a NOT gadget can be made with the rules [ T ]NOT → [ ]NOT F, [ F ]NOT → [ ]NOT T. The resulting membrane system directly solves the instance of CVP in polynomial time. To ensure uniformity we have an input membrane (inside the skin membrane) where the initial input assignments for each variable are placed. For example, if input gate i is true and input gate j is false we would have input objects Ti and Fj in the input membrane. When the computation starts, the truth assignments descend into the encoded circuit until they reach their appropriate "input gate" gadget, where they start the computation. We simulate multiple fan-outs by outputting multiple copies of the resulting truth value of each gate. We also give each gadget a unique label, and the output of each gate is tagged accordingly. The output of a gate moves up through the layers of the membrane system until it reaches the correct gate according to its tag.
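The logical behaviour the nested gadgets implement is ordinary bottom-up circuit evaluation, which can be sketched as follows (membrane mechanics elided; the tuple encoding of circuits is ours, not the paper's).

```python
def eval_gate(gate):
    """Evaluate a nested AND/OR/NOT circuit, mirroring how T/F truth
    objects propagate out of the nested gadget membranes of Figure 2."""
    if gate in (True, False):           # an input object T or F
        return gate
    op, *args = gate
    vals = [eval_gate(a) for a in args]  # inner gadgets resolve first
    if op == "AND":
        return all(vals)
    if op == "OR":
        return any(vals)
    return not vals[0]                   # "NOT"

# (x OR y) AND (NOT z) with x = False, y = True, z = False
circuit = ("AND", ("OR", False, True), ("NOT", False))
```

As in the membrane construction, each gate's value is determined only after its nested sub-gadgets have expelled their truth objects.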
6
Future Directions
We have introduced AC0-uniform active membrane systems and shown an NL characterisation of semi-uniform systems without dissolution; this is an improvement over the previous P upper bound. Interestingly, some existing P [11] and PSPACE [1,18] characterisations remain unchanged under the tighter uniformity conditions. This is the first characterisation of an active membrane system that is neither P nor PSPACE. This raises the possibility that other variants may characterise other complexity classes such as NP or the arguably more realistic NC hierarchy [6].
We have yet to show a lower bound for uniform active membranes without dissolution. Perhaps there is a way to further tighten the upper bound; this would be the first gap between the computing power of the uniform and semi-uniform versions of an active membrane model. In Section 4.1 we briefly explore the possibility of having different uniformity conditions and encoding conditions. Acknowledgements. Niall Murphy is funded by the Irish Research Council for Science, Engineering and Technology. Damien Woods is supported by Science Foundation Ireland grant 04/IN3/1524 and Junta de Andalucía grant TIC-581. We would like to thank Mario J. Pérez-Jiménez and Agustín Riscos-Núñez and the other members of the Research Group on Natural Computing in Seville for interesting discussions and for spotting an ambiguity in an earlier version of our uniformity definition.
References
1. Alhazov, A., Pérez-Jiménez, M.J.: Uniform solution to QSAT using polarizationless active membranes. In: Durand-Lose, J., Margenstern, M. (eds.) MCU 2007. LNCS, vol. 4664, pp. 122–133. Springer, Heidelberg (2007)
2. Allender, E., Gore, V.: On strong separations from AC0. DIMACS Series in Discrete Mathematics and Theoretical Computer Science 13, 21–37 (1993)
3. Balcázar, J.L., Díaz, J., Gabarró, J.: Structural Complexity I, 2nd edn. Springer, New York (1988)
4. Barrington, D.A.M., Immerman, N., Straubing, H.: On uniformity within NC1. Journal of Computer and System Sciences 41(3), 274–306 (1990)
5. Furst, M.L., Saxe, J.B., Sipser, M.: Parity, circuits and the polynomial-time hierarchy. Theory of Computing Systems (formerly Mathematical Systems Theory) 17(1), 13–27 (1984)
6. Greenlaw, R., Hoover, H.J., Ruzzo, W.L.: Limits to Parallel Computation: P-Completeness Theory. Oxford University Press, New York (1995)
7. Gutiérrez-Naranjo, M.A., Pérez-Jiménez, M.J., Riscos-Núñez, A., Romero-Campero, F.J.: Computational efficiency of dissolution rules in membrane systems. International Journal of Computer Mathematics 83(7), 593–611 (2006)
8. Immerman, N.: Expressibility and parallel complexity. SIAM Journal on Computing 18(3), 625–638 (1989)
9. Karp, R.M., Ramachandran, V.: Parallel algorithms for shared memory machines. In: van Leeuwen, J. (ed.) Handbook of Theoretical Computer Science, ch. 17, vol. A, pp. 869–941. Elsevier, Amsterdam (1990)
10. Ladner, R.E.: The circuit value problem is log space complete for P. SIGACT News 7(1), 18–20 (1975)
11. Murphy, N., Woods, D.: Active membrane systems without charges and using only symmetric elementary division characterise P. In: Eleftherakis, G., Kefalas, P., Păun, G., Rozenberg, G., Salomaa, A. (eds.) WMC 2007. LNCS, vol. 4860, pp. 367–384. Springer, Heidelberg (2007)
12. Papadimitriou, C.H.: Computational Complexity. Addison-Wesley, Reading (1993)
13. Păun, G.: P systems with active membranes: Attacking NP-complete problems. Journal of Automata, Languages and Combinatorics 6(1), 75–90 (2001); CDMTCS TR 102, Univ. of Auckland (1999), www.cs.auckland.ac.nz/CDMTCS
14. Păun, G.: Membrane Computing: An Introduction. Springer, Berlin (2002)
15. Păun, G.: Further twenty six open problems in membrane computing. In: Proceedings of the Third Brainstorming Week on Membrane Computing, Sevilla (Spain), January 31–February 4, pp. 249–262 (2005)
16. Pérez-Jiménez, M.J., Romero-Jiménez, A., Sancho-Caparrini, F.: Complexity classes in models of cellular computing with membranes. Natural Computing 2(3), 265–285 (2003)
17. Sipser, M.: Introduction to the Theory of Computation. PWS Publishing Company (1996)
18. Sosík, P., Rodríguez-Patón, A.: Membrane computing and complexity theory: A characterization of PSPACE. Journal of Computer and System Sciences 73(1), 137–152 (2007)
Quantum Wireless Sensor Networks Naya Nagy, Marius Nagy, and Selim G. Akl School of Computing, Queen’s University Kingston, Ontario K7L 3N6 Canada {nagy,marius,akl}@cs.queensu.ca
Abstract. Security in sensor networks, though an important issue for widely available wireless networks, has been studied less extensively than other properties of these networks, such as, for example, their reliability. The few security schemes proposed so far are based on classical cryptography. In contrast, the present paper develops a totally new security solution, based on quantum cryptography. The scheme developed here comes with the advantages quantum cryptography has over classical cryptography, namely, effectively unbreakable keys and therefore unbreakable messages. Our security system ensures privacy of the measured data field in the presence of an intruder listening to messages broadcasted in the field. Keywords: wireless sensor networks, quantum cryptography, quantum teleportation, entanglement swapping.
1
Introduction
Wireless sensor networks are becoming increasingly more feasible in monitoring or evaluating various data fields. Their domain of applicability is steadily increasing, ranging from civil objective surveillance to strategic surveillance, from environmental forest condition monitoring to urban information gathering. Given the large variety of working environments, the question of protecting the privacy of the gathered data is almost overdue and will be addressed here. In general, a sensor network is a collection of sensor nodes arbitrarily spread over a geographic field [14]. The purpose of the network is to collect or monitor data from the field. From an abstract point of view, each point of the field is defined by a small set of significant parameters. Each node in its turn is able to measure (sense) the field parameters of its geographical location. Sensor nodes can communicate with each other via radio signals, which means that they are not hardwired to one another. Each node has a certain transmission power, and it can send messages to any of the nodes within its transmission range. Also, a sensor node can receive messages sent by another node. Note that the energy consumed to receive a message is independent of the distance between the source and the destination and thus, a node can receive a message from arbitrarily large distances (provided that it falls within the transmission range of the sender). As the nodes are deployed at random across the field, they self-organize into a network, restricted only by their transmission range. Each sensor node has a local limited computational capacity and is therefore able to perform modest-sized computations locally.
C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, pp. 177–188, 2008. © Springer-Verlag Berlin Heidelberg 2008
1.1
Protecting the Sensor Network
The reliability of sensor networks [1] has been studied extensively and refers to the correct functioning of the network in the face of adverse events and failure of some of the nodes. Indeed, sensor nodes function in more challenging and unpredictable circumstances than regular computers and therefore can fail for multiple reasons. For example, sensor nodes are battery operated and battery failure implicitly causes the failure of the node. Again, sensor nodes are deployed in real natural environments, where natural events may destroy the node. Thus, the network as a whole needs to be operational, even though a fraction of the nodes are not operational. Algorithms to deal with node failure are basic to sensor network management and ensure that sensor networks work reliably. Note that all the challenges of the network considered up to now are natural, that is, unintentional. In this paper, by contrast, we explore some aspects of a malevolent intervention in the network. We note here that the issue of security in a sensor network has been studied very little compared to, for example, the reliability of such networks. Security treats the situation where an intruder purposefully inserts itself in the sensor network. The intruder may intend to perform one or more of the following actions:
1. Listen to the environment for messages transmitted among sensor nodes,
2. Tamper with the content of messages,
3. Insert false messages in the network,
4. Insert itself on a privileged communication line and then drop a message.
Perrig et al. [9] designed a subsystem to provide security of communication in a wireless sensor network. Their messages are encrypted with secret keys. The whole subsystem was implemented in a small network at Berkeley, consisting of nodes communicating with a base station. Messages are either destined for the base station or originate at the base station. Our paper describes a totally new approach to protecting the privacy of the data field. The method relies on quantum means to obtain security. We envision sensor nodes that have both a classical work memory and a set of quantum bits. Quantum cryptography methods will be used to establish effectively unbreakable secret keys. Experiments with quantum bits are very impressive. Although mostly in the experimental stage, the age of commercially used quantum devices may be nearer than we expect. Already, practical implementations of the BB84 [3] protocol are commercially available. Our security scheme has a requirement that is not yet practically feasible. Quantum bits, as used in our protocol, have to be entangled . Entanglement
Quantum Wireless Sensor Networks
179
will be defined in the next section and has been obtained experimentally in several settings. Additionally, our quantum bits have to persist in time: they must retain their state for a reasonable amount of time and be able to be moved and deployed along with the sensor nodes. Trapping and transporting entangled quantum bits has not yet been achieved. Nevertheless, once entangled quantum bits can be stored and transported, applications of the kind described in this paper become very attractive indeed. The rest of the paper is organized as follows. Entangled qubits are introduced in Section 2. Section 3 defines the sensor network with quantum properties. Section 4 describes quantum teleportation, which is the essential mechanism in our security scheme. The algorithm that allows secret message exchange in the network is given in Section 5. The paper concludes with Section 6.
2 Entangled Qubits in Quantum Cryptography
It is well known that quantum cryptography offers improved security for communication over classical cryptography. Two parties, Alice and Bob, intend to communicate secretly. They go through a quantum key distribution protocol and establish a binary secret key, whose value is then known to both Alice and Bob. This secret key is used afterwards to encrypt and decrypt classical messages. The secret key obtained from a quantum key distribution protocol has several desirable and important properties:

1. The secret key is unbreakable [10]. This means that the protocol that establishes the key does not reveal any information about the value of the key. There is no advantage for an intruder, Eve, in listening to the quantum key distribution protocol: any particular bit in the secret key still has a 50% chance of being either 0 or 1.
2. Intrusion detection is possible with high probability [10]. If Eve tampers with the messages and the quantum bits during the protocol, her presence is detected.
3. Information exchanged during the protocol is public [7]. There is no need for classical authentication of messages between Alice and Bob. Such authentication would typically require a small secret key known to Alice and Bob prior to the protocol, yet the quantum protocol described in [7] provides authentication based on protected public information only.

Many quantum key distribution algorithms rely on entangled qubits [5], [4], [11]. Two qubits that are entangled are described by a single quantum state. Consider an entangled qubit pair: Alice holds the first qubit and Bob holds the second. If one party, say Alice, measures her qubit, Bob's qubit collapses to the state compatible with Alice's measurement. The vast majority of key distribution protocols based on entanglement rely on Bell entangled qubits [8]. The qubit pair is in one of the four Bell states:

Φ+ = (1/√2)(|00⟩ + |11⟩)
180
N. Nagy, M. Nagy, and S.G. Akl
Φ− = (1/√2)(|00⟩ − |11⟩)

Ψ+ = (1/√2)(|01⟩ + |10⟩)

Ψ− = (1/√2)(|01⟩ − |10⟩)

Suppose Alice and Bob share a pair of entangled qubits described by the first Bell state:

Φ+ = (1/√2)(|00⟩ + |11⟩)

Alice has the first qubit and Bob has the second. If Alice measures her qubit and sees a 0, then Bob's qubit has collapsed to |0⟩ as well: Bob will measure a 0 with certainty, that is, with probability 1. Likewise, if Alice measures a 1, Bob will measure a 1 with probability 1. The same scenario occurs if Bob is the first to measure his qubit. Note that any measurement on one qubit of this entanglement collapses the other qubit to a classical state. This property holds for all four Bell states and is exploited by key distribution protocols: if Alice measures her qubit, she knows what value Bob will measure.
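The perfect correlation described above can be illustrated with a short classical simulation. This is only a sketch, not a quantum computation: the 50/50 collapse of a shared Φ+ pair is modelled by a single shared random bit.

```python
import random

def measure_phi_plus():
    """Simulate measuring both halves of a (|00> + |11>)/sqrt(2) pair.

    Each basis term has amplitude 1/sqrt(2), so each outcome occurs
    with probability 1/2, and the two results always agree.
    """
    outcome = random.choice([0, 1])  # collapse: 0 -> |00>, 1 -> |11>
    alice, bob = outcome, outcome    # perfectly correlated results
    return alice, bob

# Over many trials, Alice's and Bob's bits always agree.
for _ in range(1000):
    a, b = measure_phi_plus()
    assert a == b
```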
3 The Definition of a Quantum Sensor Network
The goal of our sensor network is to monitor a geographic data field for the benefit of a mobile agent (or person) walking in the field (see Fig. 1). The agent should be able to make decisions based on the information gathered from the field. Consider the following toy example. The agent is a fox hunting rabbits. The sensor nodes are able to detect the presence of a rabbit and also its size. The fox wants to know where the rabbits are without walking through the whole field; indeed, it wants this information without moving from its present location. Once the fox knows the positions and sizes of the rabbits, it will go catch the largest one. The security question translates, in our game, to the following scenario. Besides the fox, there is also a large cat walking in the field. Formally, we call the cat the intruder, or adversary. The cat also wants to catch rabbits. The task of the network is to prevent the cat from gathering any knowledge about the rabbits in the field. The cat is able to listen to the environment and record the messages transmitted among the sensor nodes. The protocol presented below makes these messages unintelligible to the cat. Sensor nodes are deployed at random in the field. We assume that the nodes know their geographic location. Each node has a small work memory to prepare and transmit messages. Also, an arbitrary node s has a set of n quantum bits q_s^1, q_s^2, q_s^3, ..., q_s^n. The only operation that the node needs to be able to perform on its qubits is to measure them.
Fig. 1. A network of sensor nodes with a friendly agent walking in the field (the figure labels the base station, the friendly agent, and the sensor nodes)
The (legitimate) agent a has greater computational power and a larger memory than a sensor node. It also owns a larger set of m quantum bits q_a^1, q_a^2, q_a^3, ..., q_a^m, where m > n. The operations the agent is able to perform on its qubits are measurement and simple transformations. In fact, only two transformations are necessary: phase rotation (the Z operator) and negation (the NOT operator). The agent wishes to be able to query the field. These queries give the agent information about the field; the collected information then affects its decisions and movement in the field. The adversary, or intruder, on the other hand, is interested in gathering the same information as the legitimate agent but harbors malevolent plans. The sensor network should be able to answer the queries of the agent while protecting its measured data from the adversary. For each query, the agent consumes a certain constant number of qubits k; the number of qubits used for one query depends on the desired level of security. Likewise, the sensor node answering the query consumes the same number of qubits k. Sensor nodes that merely pass on an already encrypted message do not use their qubits. Note that the number of the agent's qubits is limited by m. Therefore, the number of secret queries that an agent is able to perform on a field is limited by m/k = O(m). Likewise, any sensor node is able to answer n/k = O(n) queries. Wireless communication is not secure: the adversary can listen to the environment for broadcast messages. Therefore, our security scheme provides the means to encrypt the messages, so that the intruder gains nothing from intercepting them.
To be able to effectively use the quantum bits, we require the existence of a base station (see Fig. 1). The base station is situated anywhere outside the field. It does not need to be in the communication range of any sensor node; it can be far from the sensor field and is not directly connected to the sensor nodes. The agent is able to communicate with the base station over an authenticated telephone line, which can be made available prior to any interaction between the agent and the field. The purpose of the base station is to make the connection between the agent and the sensor nodes in terms of quantum bits. Every quantum bit of a sensor node is entangled with a companion qubit physically situated at the base station. As such, the qubits of node s are pairwise entangled with a set of qubits q_s^1', q_s^2', q_s^3', ..., q_s^n' at the base station. The base station manages these quantum bits and knows the correspondence between the quantum bits at the station and the geographic sensor nodes in the field. The entanglement is of the type Φ+ described in the previous section. Additionally, the base station also owns a larger set of quantum bits q_a^1', q_a^2', q_a^3', ..., q_a^m' entangled with the quantum bits of the agent. This entanglement is also of the type Φ+. In short, both the sensor nodes and the agent are entangled via multiple quantum bits with the base station, and the main purpose of the base station is to manage these quantum bits (see Fig. 2). Following a quantum teleportation protocol, described in the next section, the base station is able to entangle qubits of the agent with qubits of some chosen sensor node. The result is that the agent is then directly entangled with a sensor node of its choice and can establish a secure secret key.
Fig. 2. For every sensor node and for the agent, the base station manages the entangled pairs of several qubits. The figure shows only one pair for the agent and one pair for an arbitrary sensor node.
It is important to mention that in this security scheme several parties are trusted, namely:

1. The base station is trusted. This is a reasonable assumption, as the base station is not part of the field and can be located in a secure place.
2. The agent is trusted. The agent is the basic decision-making component and thus is given authority and trust.
3. The sensor nodes are trusted.

On the other hand, the environment is not trusted. Messages among sensor nodes can be freely intercepted. Also, the telephone line between the agent and the base station is not secure, though authenticated: the adversary can listen to the telephone conversations.
4 Quantum Teleportation and Entanglement Swapping
Quantum teleportation was defined in [2], [12]. It refers to the transfer of an unknown quantum state from a source location to a destination location. This state transfer does not involve any transfer of matter from the source to the destination. It needs an entangled qubit pair, with the first qubit located at the source and the second at the destination; the second qubit receives the desired unknown state. In transferring the state to the destination, the state disappears from the source, thus respecting the "no cloning" theorem [13]. To obtain the desired teleported state at the destination, two bits of classical information need to be sent from the source to the destination. Depending on this information, the destination qubit may need to be transformed by a simple gate. This property complies with the principle that information cannot be transmitted faster than the speed of light. A variant of quantum teleportation is entanglement swapping (see Fig. 3). Note that, in teleportation, the quantum state of the source qubit q_source disappears from the source location and reappears in the destination qubit q_destination as exactly the same state. If the original state of q_source was entangled with some other qubit q_pair, this entanglement is transferred to the destination qubit q_destination, causing the latter to be entangled with q_pair. This scenario is called entanglement swapping and has been demonstrated in practice [6]. Entanglement swapping will be described in detail below in the particular setting of our sensor network; it is the basic step towards private communication between the agent and a sensor node. Consider some qubit of the agent q_a^i entangled with its base station companion qubit q_a^i'. The agent intends to communicate secretly with node s. The node's qubit offered for this entanglement swapping may be q_s^j, entangled with the base station's qubit q_s^j'. These four qubits form an ensemble
ensemble = q_a^i q_a^i' q_s^j' q_s^j.
Fig. 3. Before and after swapping: the entanglement is transferred to the two qubits belonging to the agent and the sensor node respectively
Note that the first qubit of the ensemble belongs to the agent, the second and third qubits belong to the base station, and the fourth qubit belongs to the sensor node. This order has been chosen so that the transformations applied by the base station and the agent are easier to see. As both the agent's qubit pair and the sensor node's qubit pair are entangled in the Φ+ Bell state, the ensemble can be rewritten as

ensemble = (1/√2)(|00⟩ + |11⟩) ⊗ (1/√2)(|00⟩ + |11⟩)
         = (1/2)(|0000⟩ + |0011⟩ + |1100⟩ + |1111⟩).
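This tensor-product expansion can be checked numerically. The sketch below uses NumPy and orders the four qubits as in the ensemble above, so basis index 0b0011 stands for |0011⟩, and so on.

```python
import numpy as np

# Basis states |0>, |1> and the Bell state |Phi+> = (|00> + |11>)/sqrt(2)
ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
phi_plus = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)

# Tensor product of the two entangled pairs (agent-station, station-node)
ensemble = np.kron(phi_plus, phi_plus)

# Expected expansion: (|0000> + |0011> + |1100> + |1111>) / 2
expected = np.zeros(16)
for idx in (0b0000, 0b0011, 0b1100, 0b1111):
    expected[idx] = 0.5

assert np.allclose(ensemble, expected)
```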
The following formula rewrites the base station's two qubits in the Bell basis:

ensemble = (1/2) ( |0⟩ ⊗ (1/√2)(|Φ+⟩ + |Φ−⟩) ⊗ |0⟩
                 + |0⟩ ⊗ (1/√2)(|Ψ+⟩ + |Ψ−⟩) ⊗ |1⟩
                 + |1⟩ ⊗ (1/√2)(|Ψ+⟩ − |Ψ−⟩) ⊗ |0⟩
                 + |1⟩ ⊗ (1/√2)(|Φ+⟩ − |Φ−⟩) ⊗ |1⟩ )

         = (1/(2√2)) ( |0⟩ ⊗ |Φ+⟩ ⊗ |0⟩ + |1⟩ ⊗ |Φ+⟩ ⊗ |1⟩
                     + |0⟩ ⊗ |Φ−⟩ ⊗ |0⟩ − |1⟩ ⊗ |Φ−⟩ ⊗ |1⟩
                     + |0⟩ ⊗ |Ψ+⟩ ⊗ |1⟩ + |1⟩ ⊗ |Ψ+⟩ ⊗ |0⟩
                     + |0⟩ ⊗ |Ψ−⟩ ⊗ |1⟩ − |1⟩ ⊗ |Ψ−⟩ ⊗ |0⟩ ).

The base station now measures qubits two and three, located at the station. The qubits are measured in the Bell basis (Φ+, Φ−, Ψ+, Ψ−). It is interesting to see what happens to the state of the other two qubits after this measurement. The base station will have to communicate the result of the measurement to the agent; this is done via the insecure classical channel. If the station's measurement was:

1. Φ+. The remaining qubits have collapsed to

   ensemble_{1,4} = (1/√2)(|00⟩ + |11⟩)

   This is a Bell Φ+ entanglement, the desired one. The agent and the field node are now entangled.
2. Φ−. The remaining qubits have collapsed to

   ensemble_{1,4} = (1/√2)(|00⟩ − |11⟩)

   This is not quite a Φ+ entanglement, but it can easily be transformed into one. The agent has to change the phase of his qubit and can do so by applying the gate defined by the Pauli matrix [8]:

   Z = | 1  0 |
       | 0 −1 |
3. Ψ+. The remaining qubits have collapsed to

   ensemble_{1,4} = (1/√2)(|01⟩ + |10⟩)

   In this case the agent has a qubit whose bit values (|0⟩ and |1⟩) are reversed compared to the field node's. The agent has to apply the gate for the Pauli matrix that performs a NOT:

   NOT = | 0 1 |
         | 1 0 |
4. Ψ−. The remaining qubits have collapsed to

   ensemble_{1,4} = (1/√2)(|01⟩ − |10⟩)

   Now the agent's qubit has both bit values reversed and the phase rotated. Thus, the agent applies a gate defined by the product:

   Z · NOT = |  0 1 |
             | −1 0 |

The agent has to communicate with the base station in order to know what transformation, if any, to apply to his qubit to obtain the final Φ+ entanglement with the field node. This is why they need the telephone line. The base station communicates to the agent the outcome of its measurement; as there are four possible outcomes, two classical bits suffice to discriminate among them. After this step, the agent and the field node have Φ+ entangled qubits, without ever having met.
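The four correction cases can be verified numerically: applying the listed gate to the agent's (first) qubit maps each collapsed state back to Φ+, up to an irrelevant global phase. A sketch assuming NumPy and the standard 2x2 Pauli matrices:

```python
import numpy as np

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

def bell(a, b, sign):
    # (|0>|a> + sign * |1>|b>) / sqrt(2)
    return (np.kron(ket0, a) + sign * np.kron(ket1, b)) / np.sqrt(2)

phi_plus  = bell(ket0, ket1, +1)   # (|00> + |11>)/sqrt(2)
phi_minus = bell(ket0, ket1, -1)   # (|00> - |11>)/sqrt(2)
psi_plus  = bell(ket1, ket0, +1)   # (|01> + |10>)/sqrt(2)
psi_minus = bell(ket1, ket0, -1)   # (|01> - |10>)/sqrt(2)

I   = np.eye(2)
Z   = np.array([[1.0, 0.0], [0.0, -1.0]])
NOT = np.array([[0.0, 1.0], [1.0, 0.0]])

# Correction applied to the agent's (first) qubit for each Bell outcome
corrections = [(phi_plus, I), (phi_minus, Z),
               (psi_plus, NOT), (psi_minus, Z @ NOT)]

for state, U in corrections:
    corrected = np.kron(U, I) @ state
    # Phi+ is recovered, possibly with a global phase of -1 (Psi- case)
    assert (np.allclose(corrected, phi_plus)
            or np.allclose(corrected, -phi_plus))
```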
5 Security Protocols
The following two scenarios will be discussed:

1. Agent query. The agent has a map of the field and wishes to obtain information from a selected location (x, y) regarding a possible event e. The location (x, y) to be queried will be visible to the intruder, yet the nature of the event and its parameters will be private.
2. Sensor node event signaling. A sensor node located at (x, y) detects an event of importance. It sends a signal to the agent, and the agent then queries the node as to the nature and parameters of the event. Again, the intruder will know the location of the event but will not obtain any information about the nature of the event or its parameters.

We are now ready to describe an algorithm that allows the agent to query the field at some specific location. For simplicity, let us consider that the secret key that encrypts the messages is just three bits long, k = k1 k2 k3. This is of course a short key for practical purposes. The agent query algorithm follows the steps below:
1. The agent a sends the location (x, y) of the query to the base station.
2. The base station locates a sensor node s that is closest to (x, y) and performs an entanglement swapping for three qubit pairs.
3. The agent and the node s establish a secret key k of three bits.
4. The agent uses this secret key to encrypt a message containing the nature of the event of interest. It then broadcasts the message in the network. The message will be unintelligible to all nodes except s, which shares the secret key k.
5. When s receives the encrypted message, it reads the parameters of the requested event. These parameters are then encrypted using the same key k. The new message is broadcast in the field, and the agent eventually receives the desired information.

Most steps are straightforward and need no further explanation. We will elaborate on step 3, establishing the secret key. The agent and the node share three entangled quantum bit pairs. Remember that we trust both the agent and the node. A simple measurement performed in the computational basis yields the same three classical bits for both the agent and the node; these three classical bits are the key k. In the second scenario, in which the sensor node signals the event, the procedure is very similar to the previous one, with one step performed ahead of the previous algorithm:

1. The sensor node that has detected an event broadcasts its location on the network.

The agent reads this message with the position of the sensor node and starts a query procedure with this location. The important feature of both algorithms is that the wireless environment reveals neither the measured parameters nor the nature of the event. The only information that is not encrypted in the network is the location of the event or query. Note that, in the process that establishes the value of the secret key, no information concerning this value is ever visible in the environment.
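Steps 3 and 4 can be sketched classically. The paper does not fix a particular cipher for the short key, so the sketch below assumes a one-time-pad (XOR) encryption and models each correlated pair measurement as a shared random bit; establish_key and xor_encrypt are illustrative names, not part of the protocol's specification.

```python
import random

def establish_key(num_bits):
    """Both parties measure their halves of num_bits Phi+ pairs.

    Each measurement yields the same random bit on both sides, so the
    agent and the node end up with identical keys without any key
    material ever crossing the wireless channel.
    """
    agent_key, node_key = [], []
    for _ in range(num_bits):
        bit = random.choice([0, 1])  # correlated collapse of one pair
        agent_key.append(bit)
        node_key.append(bit)
    return agent_key, node_key

def xor_encrypt(bits, key):
    # one-time-pad style encryption/decryption with the shared key
    return [b ^ k for b, k in zip(bits, key)]

agent_key, node_key = establish_key(3)       # k = k1 k2 k3
query = [1, 0, 1]                            # hypothetical 3-bit event code
ciphertext = xor_encrypt(query, agent_key)   # broadcast in the clear
assert xor_encrypt(ciphertext, node_key) == query  # node s decrypts
```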
The key is therefore unbreakable by the intruder listening to the environment.
6 Conclusion
We have shown that sensor networks can benefit from quantum cryptography. In particular, the issue of security in sensor networks can find basic solutions in the already well-established field of quantum cryptography. We described a security scheme for sensor networks using entangled qubits. The scheme protects the measured data of the field in the insecure wireless environment. The intruder is assumed to be able to listen to the environment, but unable to inject data into the data field or corrupt a sensor node. The issue of an intruder behaving as a sensor node in the field and injecting false messages will be treated in future work.
In the definition of the sensor network we considered all sensor nodes to be trusted. This is a strong assumption: an intruder may try to insert itself into the network, or corrupt an existing sensor node, and then send spurious messages. Work is in progress to address these issues in future schemes. Acknowledgments. This research was supported by the Natural Sciences and Engineering Research Council of Canada. The authors wish to thank Waleed Al Salih for his important comments on this paper.
References

1. AboElFotoh, H.M.F., ElMallah, E.S., Hassanein, H.S.: On the reliability of wireless sensor networks. In: IEEE International Conference on Communications (ICC), June 2006, pp. 3455–3460 (2006)
2. Bennett, C.H., Brassard, G., Crepeau, C., Jozsa, R., Peres, A., Wootters, W.K.: Teleporting an unknown quantum state via dual classical Einstein-Podolsky-Rosen channels. Physical Review Letters 70, 1895–1899 (1993)
3. Bennett, C.H., Brassard, G.: Quantum cryptography: Public key distribution and coin tossing. In: Proceedings of IEEE International Conference on Computers, Systems and Signal Processing, Bangalore, India, December 1984, pp. 175–179. IEEE, New York (1984)
4. Bennett, C.H., Brassard, G., Mermin, N.D.: Quantum cryptography without Bell's theorem. Physical Review Letters 68(5), 557–559 (1992)
5. Ekert, A.: Quantum cryptography based on Bell's theorem. Physical Review Letters 67, 661–663 (1991)
6. Halder, M., Beveratos, A., Gisin, N., Scarani, V., Simon, C., Zbinden, H.: Entangling independent photons by time measurement. Nature Physics 3, 659–692 (2007)
7. Nagy, N., Nagy, M., Akl, S.G.: Key distribution versus key enhancement in quantum cryptography. Technical Report 2007-542, School of Computing, Queen's University, Kingston, Ontario (2007)
8. Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge University Press, Cambridge (2000)
9. Perrig, A., Szewczyk, R., Wen, V., Culler, D.E., Tygar, J.D.: SPINS: security protocols for sensor networks. In: Mobile Computing and Networking, pp. 189–199 (2001)
10. Lomonaco Jr., S.J.: A Talk on Quantum Cryptography or How Alice Outwits Eve. In: Proceedings of Symposia in Applied Mathematics, Washington, DC, vol. 58, pp. 237–264 (2002)
11. Shi, B.-S., Li, J., Liu, J.-M., Fan, X.-F., Guo, G.-C.: Quantum key distribution and quantum authentication based on entangled states. Physics Letters A 281(23), 83–87 (2001)
12. Vaidman, L.: Teleportation of quantum states. Phys. Rev. A 49(2), 1473–1476 (1994)
13. Wootters, W.K., Zurek, W.H.: A single quantum cannot be cloned. Nature 299, 802–803 (1982)
14. Zhao, F., Guibas, L.: Wireless Sensor Networks - An Information Processing Approach. Elsevier, Amsterdam (2004)
On the Computational Complexity of Spiking Neural P Systems

Turlough Neary
Boole Centre for Research in Informatics, University College Cork, Ireland
[email protected]
Abstract. It is shown that there is no standard spiking neural P system that simulates Turing machines with less than exponential time and space overheads. The spiking neural P systems considered here have a constant number of neurons that is independent of the input length. Following this we construct a universal spiking neural P system with exhaustive use of rules that simulates Turing machines in polynomial time and has only 18 neurons.
1 Introduction
Since their inception within the last decade, P systems [12] have spawned a variety of hybrid systems. One such hybrid, that of spiking neural P systems [3], results from a fusion with spiking neural networks. It has been shown that these systems are computationally universal. Here the time/space computational complexity of spiking neural P systems is examined. We begin by showing that counter machines simulate standard spiking neural P systems with linear time and space overheads. Fischer et al. [2] have previously shown that counter machines require exponential time and space to simulate Turing machines. Thus it immediately follows that there is no spiking neural P system that simulates Turing machines with less than exponential time and space overheads. These results are for spiking neural P systems that have a constant number of neurons independent of the input length. Extended spiking neural P systems with exhaustive use of rules were proved computationally universal in [4]. However, the technique used to prove universality involved the simulation of counter machines and thus suffers from an exponential time overhead. In the second part of the paper we give an extended spiking neural P system with exhaustive use of rules that simulates Turing machines in polynomial time and has only 18 neurons. Previously, Păun and Păun [11] gave a small universal spiking neural P system with 84 neurons and another, using extended rules, with 49 neurons. Both of these spiking neural P systems require exponential time and space to simulate Turing machines, but do not have exhaustive use of rules. Chen et al. [1] have shown that, with exponential pre-computed resources, SAT is solvable in constant time with spiking neural P systems.

C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, pp. 189–205, 2008. © Springer-Verlag Berlin Heidelberg 2008

Leporati et al. [6]
gave a semi-uniform family of extended spiking neural P systems that solve the Subset Sum problem in constant time. In later work, Leporati et al. [7] gave a uniform family of maximally parallel spiking neural P systems with more general rules that solve the Subset Sum problem in polynomial time. All the above solutions to NP-hard problems rely on families of spiking neural P systems. Specifically, the size of the problem instance determines the number of neurons in the spiking neural P system that solves that particular instance. This is similar to solving problems with uniform circuit families, where each input size has a specific circuit that solves it. Ionescu and Sburlan [5] have shown that spiking neural P systems simulate circuits in linear time. In the next two sections we give definitions for spiking neural P systems and counter machines and explain the operation of both. Following this, in Section 4, we prove that counter machines simulate spiking neural P systems in linear time, thus proving that there exists no universal spiking neural P system that simulates Turing machines in less than exponential time. In Section 5 we present our universal spiking neural P system that simulates Turing machines in polynomial time and has only 18 neurons. Finally, we end the paper with some discussion and conclusions.
2 Spiking Neural P Systems
Definition 1 (Spiking neural P systems). A spiking neural P system is a tuple Π = (O, σ1, σ2, ..., σm, syn, in, out), where:

1. O = {s} is the unary alphabet (s is known as a spike),
2. σ1, σ2, ..., σm are neurons, of the form σi = (ni, Ri), 1 ≤ i ≤ m, where:
   (a) ni ≥ 0 is the initial number of spikes contained in σi,
   (b) Ri is a finite set of rules of the following two forms:
       i. E/s^b → s; d, where E is a regular expression over s, b ≥ 1 and d ≥ 1,
       ii. s^e → λ; 0, where λ is the empty word, e ≥ 1, and for all E/s^b → s; d from Ri, s^e ∉ L(E), where L(E) is the language defined by E,
3. syn ⊆ {1, 2, ..., m} × {1, 2, ..., m} is the set of synapses between neurons, where i ≠ j for all (i, j) ∈ syn,
4. in, out ∈ {σ1, σ2, ..., σm} are the input and output neurons, respectively.

In the same manner as in [11], spikes are introduced into the system from the environment by reading in a binary sequence (or word) w ∈ {0, 1}* via the input neuron σ1. The sequence w is read from left to right, one symbol at each timestep. If the read symbol is 1 then a spike enters the input neuron on that timestep. A firing rule r = E/s^b → s; d is applicable in a neuron σi if there are j ≥ b spikes in σi and s^j ∈ L(E), where L(E) is the set of words defined by the regular expression E. If, at time t, rule r is executed, then b spikes are removed from the neuron, and at time t + d − 1 the neuron fires. When a neuron σi fires, a spike is sent to each neuron σj for every synapse (i, j) in Π. Also, the neuron σi remains closed and does not receive spikes until time t + d − 1, and no other rule may execute in σi until time t + d. We note here that in 2b(i) it is standard to have
a d ≥ 0. However, we have d ≥ 1 as it simplifies explanations throughout the paper. This does not affect the operation, as the neuron fires at time t + d − 1 instead of t + d. A forgetting rule r = s^e → λ; 0 is applicable in a neuron σi if there are exactly e spikes in σi. If r is executed then e spikes are removed from the neuron. At each timestep t a rule must be applied in each neuron if there is one or more applicable rules at time t. Thus, while the application of rules in each individual neuron is sequential, the neurons operate in parallel with each other. Note from 2b(i) of Definition 1 that there may be two rules of the form E/s^b → s; d that are applicable in a single neuron at a given time. If this is the case, then the next rule to execute is chosen non-deterministically. The output is the time between the first and second spike in the output neuron σm. An extended spiking neural P system [11] has more general rules of the form E/s^b → s^p; d, where b ≥ p ≥ 0. Note that if p = 0 then E/s^b → s^p; d is a forgetting rule. An extended spiking neural P system with exhaustive use of rules [4] applies its rules as follows. If a neuron σi contains k spikes and the rule E/s^b → s^p; d is applicable, then the neuron σi sends out gp spikes after d timesteps, leaving u spikes in σi, where k = bg + u, u < b and k, g, u ∈ N. Thus, a synapse in a spiking neural P system with exhaustive use of rules may transmit an arbitrary number of spikes in a single timestep. In the sequel we allow the input neuron of a system with exhaustive use of rules to receive an arbitrary number of spikes in a single timestep. This is a generalisation of the input allowed by Ionescu et al. [4]. In the sequel, each spike in a spiking neural P system represents a single unit of space. The maximum number of spikes in a spiking neural P system at any given timestep during a computation is the space used by the system.
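The applicability condition for a firing rule E/s^b → s; d can be sketched in a few lines by representing the neuron's contents as the unary word s^j and E as an ordinary regular expression over the single letter s. Here applicable_firing_rule is a hypothetical helper, not a construct from the paper:

```python
import re

def applicable_firing_rule(E, b, spikes):
    """Check applicability of a firing rule E/s^b -> s; d in a neuron
    holding `spikes` spikes: the unary word s^spikes must belong to
    L(E) and at least b spikes must be available for consumption."""
    word = "s" * spikes
    return spikes >= b and re.fullmatch(E, word) is not None

# Example: the rule (ss)*ss / s^2 -> s; 1 fires only on even,
# positive spike counts.
assert applicable_firing_rule(r"(ss)*ss", 2, 4)
assert not applicable_firing_rule(r"(ss)*ss", 2, 3)
```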
3 Counter Machines
The definition we give for counter machine is similar to that of Fischer et al. [2].

Definition 2 (Counter machine). A counter machine is a tuple C = (z, cm, Q, q0, qh, Σ, f), where z gives the number of counters, cm is the output counter, Q = {q0, q1, ..., qh} is the set of states, q0, qh ∈ Q are the initial and halt states respectively, Σ is the input alphabet and f is the transition function

f : (Σ × Q × g(i)) → ({Y, N} × Q × {INC, DEC, NULL})

where g(i) is a binary valued function and 0 ≤ i ≤ z, Y and N control the movement of the input read head, and INC, DEC, and NULL indicate the operation to carry out on counter ci. Each counter ci stores a natural number value x. If x > 0 then g(i) is true, and if x = 0 then g(i) is false. The input to the counter machine is read in from an input tape with alphabet Σ. The movement of the scanning head on the input tape
is one-way, so each input symbol is read only once. When a computation begins, the scanning head is over the leftmost symbol α of the input word αw ∈ Σ* and the counter machine is in state q0. We give three examples below to explain the operation of the transition function f.

– f(α, qj, g(i)) = (Y, qk, INC(i)): move the read head right on the input tape to read the next input symbol, change to state qk and increment the value x stored in counter ci by 1.
– f(α, qj, g(i)) = (N, qk, DEC(i)): do not move the read head, change to state qk and decrement the value x stored in counter ci by 1. Note that g(i) must evaluate to true for this rule to execute.
– f(α, qj, g(i)) = (N, qk, NULL): do not move the read head and change to state qk.

A single application of f is a timestep. Thus in a single timestep only one counter may be incremented or decremented by 1. Our definition of counter machine, given above, is more restricted than the definition given by Fischer [2]. In Fischer's definition, INC and DEC may be applied to every counter in the machine in a single timestep. Clearly, the more general counter machines of Fischer simulate our machines with no extra space or time overheads. Fischer has shown that counter machines are exponentially slow in terms of computation time, as the following theorem illustrates.

Theorem 1 (Fischer [2]). There is a language L, real-time recognizable by a one-tape TM, which is not recognizable by any k-CM in time less than T(n) = 2^(n/2k).

In Theorem 1, a one-tape TM is an offline Turing machine with a single read-only input tape and a single work tape, a k-CM is a counter machine with k counters, n is the input length, and real-time recognizable means recognizable in n timesteps. For his proof, Fischer noted that the language L = {w a w^r | w ∈ {0, 1}*}, where w^r is w reversed, is recognisable in n timesteps on a one-tape offline Turing machine. He then noted that time of 2^(n/2k) is required to process input words of length n due to the unary data storage used by the counters of the k-CM. Note that Theorem 1 also holds for non-deterministic counter machines, as they use the same unary storage method.
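The restriction that only one counter changes per timestep can be made concrete with a small sketch of the counter operations; apply_operation is an illustrative helper, and the state/read-head bookkeeping of f is omitted.

```python
def apply_operation(counters, op):
    """Apply one counter operation per timestep, as in the restricted
    model above: op is ('INC', i), ('DEC', i) or ('NULL', None).
    Counters hold natural numbers, so DEC requires g(i) to be true."""
    kind, i = op
    if kind == 'INC':
        counters[i] += 1
    elif kind == 'DEC':
        assert counters[i] > 0  # g(i) must evaluate to true before DEC
        counters[i] -= 1
    # 'NULL' leaves every counter unchanged
    return counters

c = {1: 0, 2: 3}
apply_operation(c, ('INC', 1))
apply_operation(c, ('DEC', 2))
assert c == {1: 1, 2: 2}
```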
4  Non-deterministic Counter Machines Simulate Spiking Neural P Systems in Linear Time
Theorem 2. Let Π be a spiking neural P system with m neurons that completes its computation in time T and space S. Then there is a non-deterministic counter machine CΠ that simulates the operation of Π in time O(T xr^2 m + T m^2) and space O(S), where xr is a constant dependent on the rules of Π.

Proof Idea. Before we give the proof of Theorem 2 we give the main idea behind it. Each neuron σi from the spiking neural P system Π is simulated by a counter ci from the counter machine CΠ. If a neuron σi contains y spikes, then
On the Computational Complexity of Spiking Neural P Systems
193
the counter will have value y. A single synchronous update of all the neurons at a given timestep t is simulated as follows. If the number of spikes in a neuron σi is decreasing by b spikes in order to execute a rule, then the value y stored in the simulated neuron ci is decremented b times using DEC(i) to give y − b. This process is repeated for each neuron that executes a rule at time t. If neuron σi fires at time t and has synapses to neurons {σi1 , . . . , σiv }, then for each open neuron σij in {σi1 , . . . , σiv } at time t we increment the simulated neuron cij using INC(ij). This process is repeated until all firing neurons have been simulated. This simulation of the synchronous update of Π at time t is completed by CΠ in constant time. Thus we get the linear time bound given in Theorem 2.

Proof. Let Π = (O, σ1 , σ2 , · · · , σm , syn, in, out) be a spiking neural P system where in = σ1 and out = σm . We explain the operation of a non-deterministic counter machine CΠ that simulates the operation of Π in time O(T xr^2 m + T m^2) and space O(S). There are m + 1 counters c1 , c2 , c3 , · · · , cm , cm+1 in CΠ . Each counter ci emulates the activity of a neuron σi . If σi contains y spikes then counter ci will store the value y. The states of the counter machine are used to control which neural rules are simulated in each counter and also to synchronise the operations of the simulated neurons (counters).

Input Encoding. It is sufficient for CΠ to have a binary input tape. The binary word w ∈ {1, 0}∗ that is placed on the terminal to be read into CΠ is identical to the binary sequence read in from the environment by the input neuron σ1 . A single symbol is read from the terminal at each simulated timestep. The counter c1 (the simulated input neuron) is incremented only on timesteps when a 1 (a simulated spike) is read. As such, at each simulated timestep t, a simulated spike is received by c1 if and only if a spike is received by the input neuron σ1 .
At the start of the computation, before the input is read in, each counter ci simulating σi is incremented ni times to simulate the ni spikes in each neuron given by 2(a) of Definition 1. This takes a constant amount of time.

Storing Neural Rules in the Counter Machine States. Recall from Definition 1 that the applicability of a rule in a neuron is dependent on a regular expression over a unary alphabet. Let r = E/s^b → s; d be a rule in neuron σi . Then there is a finite state machine G that accepts the language L(E) and thus decides if the number of spikes in σi permits the application of r in σi at a given time in the computation. G is given in Figure 1. If gj is an accept state in G then j > b. This ensures that there are enough spikes to execute r. We also place the restriction on G that x > b. During a computation we may use G to decide if r is applicable in σi by passing an s to G each time a spike enters σi . However, G may not give the correct result if spikes leave the neuron, as it does not record spikes leaving σi . Thus using G we may construct a second machine G′ such that G′ records the movement of spikes going into and out of the neuron. G′ is constructed as follows: G′ has all the same states (including accept states) and transitions as G, along with an extra set of transitions that record spikes leaving the neuron. This extra set of transitions is given as follows: for each transition
[Figure 1: the chain automaton G with transitions on s through states g1 , . . . , gx , . . . , gy , and the automaton G′ with forward transitions on +s and backward transitions on −s.]
Fig. 1. Finite state machine G decides if a particular rule is applicable in a neuron given the number of spikes in the neuron at a given time in the computation. Each s represents a spike in the neuron. Machine G′ keeps track of the movement of spikes into and out of the neuron and decides whether or not a particular rule is applicable at each timestep in the computation. +s represents a single spike entering the neuron and −s represents a single spike exiting the neuron.
on s from a state gi to a state gj in G there is a new transition on −s going from state gj to gi in G′ that records the removal of a spike from the neuron. By recording the dynamic movement of spikes, G′ is able to decide if the number of spikes in σi permits the application of r in σi at each timestep during the computation. G′ is also given in Figure 1. Note that forgetting rules s^e → λ; 0 depend on simpler regular expressions, thus we will not give a machine G′ for forgetting rules here.

Let neuron σi have the greatest number l of rules of any neuron in Π. Thus the applicability of rules r1 , r2 , · · · , rl in σi is decided by the automata G′1 , G′2 , · · · , G′l . We record whether a rule may be simulated in a neuron at any given timestep during the computation by recording the current state of its G′ automaton (Figure 1) in the states of the counter machine. There are m neurons in Π. Thus each state in our counter machine remembers the current states of at most ml different G′ automata in order to determine which rules are applicable in each neuron at a given time. Recall that in each rule of the form r = E/s^b → s; d, the value d specifies the number of timesteps between the removal of b spikes from the neuron and the spiking of the neuron. The number of timesteps < d remaining until a neuron will spike is recorded in the states of CΠ . Each state in our counter machine remembers at most m different values < d.

Algorithm overview. Next we explain the operation of CΠ by explaining how it simulates the synchronous update of all neurons in Π at an arbitrary timestep t. The algorithm has 3 stages. A single iteration of Stage 1 identifies which applicable rule to simulate in a simulated open neuron. Then the correct number y of simulated spikes is removed by decrementing the counter y times (y = b or y = e in 2b of Definition 1). Stage 1 is iterated until all simulated open neurons have had the correct number of simulated spikes removed.
A single iteration of Stage 2 identifies all the synapses leaving a firing neuron and increments
every counter that simulates an open neuron at the end of one of these synapses. Stage 2 is iterated until all firing neurons have been simulated by incrementing the appropriate counters. Stage 3 synchronises each neuron with the global clock and increments the output counter if necessary. If the entire word w has not been read from the input tape, the next symbol is read.

Stage 1. Identify rules to be simulated and remove spikes from neurons. Recall that d = 0 indicates a neuron is open and that the value of d in each neuron is recorded in the states of the counter machine. Thus our algorithm begins by determining which rule to simulate in counter ci1 , where i1 = min{i | d = 0 for σi } and the current state of the counter machine encodes an accept state for one or more of the G′ automata for the rules in σi1 at time t. If there is more than one rule applicable, the counter machine non-deterministically chooses which rule to simulate. Let r = E/s^b → s; d be the rule that is to be simulated. Using the DEC(i1) instruction, counter ci1 is decremented b times. With each decrement of ci1 the new current state of each automaton G′1 , G′2 , · · · , G′l is recorded in the counter machine's current state. After b decrements of ci1 the simulation of the removal of b spikes from neuron σi1 is complete. Note that the value of d from rule r is recorded in the counter machine state.

There is a case not covered by the above paragraph. To see this, note that in G′ in Figure 1 there is a single non-deterministic choice to be made. This choice is at state gx if a spike is being removed (−s). Thus, if one of the automata is in such a state gx , our counter machine resolves this by decrementing the counter x times using the DEC instruction. If ci1 = 0 after the counter has been decremented x times, then the counter machine simulates state gx−1 ; otherwise state gy is simulated. Immediately after this the counter is incremented x − 1 times to restore it to the correct value.
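The special case just described, resolving the non-deterministic −s choice at state gx by a zero test, can be sketched as follows. The function models only the counter arithmetic; the state names follow Figure 1 and the counter value v is assumed to be at least x.

```python
# Sketch of the zero-test trick described above: to decide whether the
# automaton should move to g_{x-1} or to g_y when a spike is removed
# in state g_x, DEC is applied x times, the counter is tested for
# zero, and x-1 INC operations then restore it, so the net change is
# exactly one spike removed. Illustrative only, assuming v >= x.

def resolve_minus_s(v, x):
    c = v
    for _ in range(x):
        if c > 0:            # DEC is only applied to a non-zero counter
            c -= 1
    state = "g_{x-1}" if c == 0 else "g_y"
    c += x - 1               # restore: counter now holds v - 1
    return state, c
```

This costs O(x) counter operations per removed spike, which is where the xr^2 factor in the time analysis below comes from.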
When the simulation of the removal of b spikes from neuron σi1 is complete, the above process is repeated with counter ci2 , where i2 = min{i | i > i1 , d = 0 for σi } and the current state of the counter machine encodes an accept state for one or more of the G′ automata for the rules in σi2 at time t. This process is iterated until every simulated open neuron with an applicable rule at time t has had the correct number of simulated spikes removed.

Stage 2. Simulate spikes. This stage of the algorithm begins by simulating spikes traveling along synapses of the form (i1 , j), where i1 = min{i | d = 1 for σi } (if d = 1 the neuron is firing). Let {(i1 , j1 ), (i1 , j2 ), · · · , (i1 , jk )} be the set of synapses leaving σi1 , where ju < ju+1 and d ≤ 1 in σju at time t (if d ≤ 1 the neuron is open and may receive spikes). Then the following sequence of instructions is executed: INC(j1 ), INC(j2 ), · · · , INC(jk ), thus incrementing any counter (simulated neuron) that receives a simulated spike. The above process is repeated for synapses of the form (i2 , j), where i2 = min{i | i > i1 , d = 1 for σi }. This process is iterated until every simulated neuron ci that is open has been incremented once for each spike σi receives at time t.
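A single pass of Stage 2 can be sketched as follows; the dictionary-and-set representation of counters, synapses and open neurons is illustrative, not part of the formal construction.

```python
# Sketch of Stage 2 as described above: for each firing neuron, taken
# in increasing index order, increment every counter that simulates an
# open neuron at the end of one of its synapses. Each INC is one
# counter-machine timestep; with at most m synapses per neuron and at
# most m firing neurons this is O(m^2) per simulated timestep.

def stage2(counters, synapses, firing, open_neurons):
    ops = 0
    for i in sorted(firing):                  # i1 = min{...}, then i2, ...
        for j in sorted(synapses.get(i, ())):
            if j in open_neurons:             # only open neurons receive spikes
                counters[j] += 1              # INC(j)
                ops += 1
    return ops
```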
Stage 3. Reading input, decrementing d, updating output counter and halting. If the entire word w has not been read from the input tape, then the next symbol is read. If this is the case and the symbol read is a 1, then counter c1 is incremented, thus simulating a spike being read in by the input neuron. In this stage the state of the counter machine changes to record the fact that each value k that records the number of timesteps until a currently closed neuron will fire is decremented to k − 1. If the counter cm , which simulates the output neuron, has spiked only once prior to the simulation of timestep t + 1, then this stage will also increment the output counter cm+1 . If during the simulation of timestep t counter cm has simulated a spike for the second time in the computation, then the counter machine enters the halt state. When the halt state is entered, the number stored in counter cm+1 is equal to the unary output that is given by the time between the first two spikes in σm .

Space analysis. The length of the input word on the binary tape of CΠ is identical to the length of the binary sequence read in by the input neuron of Π. Counters c1 to cm use the same space as neurons σ1 to σm . Counter cm+1 uses the same amount of space as the unary output of the computation of Π. Thus CΠ simulates Π in space O(S).

Time analysis. The simulation involves 3 stages. Recall that x > b. Let xr be the maximum value for x of any G′ automaton; thus xr is greater than the maximum number of spikes deleted in a neuron.

Stage 1. In order to simulate the deletion of a single spike, in the worst case the counter will have to be decremented xr times and incremented xr − 1 times, as in the special case. This is repeated a maximum of b < xr times (where b is the number of spikes removed). Thus a single iteration of Stage 1 takes O(xr^2) time. Stage 1 is iterated a maximum of m times per simulated timestep, giving O(xr^2 m) time.

Stage 2. The maximum number of synapses leaving a neuron σi is m.
A single spike traveling along a synapse is simulated in one step. Stage 2 is iterated a maximum of m times per simulated timestep, giving O(m^2) time.

Stage 3. Takes a small constant number of steps.

Thus a single timestep of Π is simulated by CΠ in O(xr^2 m + m^2) time, and T timesteps of Π are simulated in linear time O(T xr^2 m + T m^2) by CΠ .

The following is an immediate corollary of Theorems 1 and 2.

Corollary 1. There exists no universal spiking neural P system that simulates Turing machines with less than exponential time and space overheads.
5  A Universal Spiking Neural P System That Is Both Small and Time Efficient
In this section we construct a universal spiking neural P system that allows exhaustive use of rules, has only 18 neurons, and simulates Turing machines
in polynomial time. The system constructed efficiently simulates the computation of an existing small universal Turing machine [9]. This universal machine has 6 states and 4 symbols and is called U6,4 . The following theorem gives the time/space simulation overheads for U6,4 .

Theorem 3 ([9]). Let M be a single tape Turing machine that runs in time T . Then U6,4 simulates the computation of M in time O(T^6) and space O(T^3).

This result is used in the proof of our main theorem, which is as follows.

Theorem 4. Let M be a single tape Turing machine that runs in time T . Then there is a universal spiking neural P system ΠU6,4 with exhaustive use of rules that simulates the computation of M in time O(T^6) and space O(32^{T^3}) and has only 18 neurons.

If the reader would like to get a quick idea of how our spiking neural P system with 18 neurons operates, they should skip to the algorithm overview subsection in the proof below.

Proof. We give a spiking neural P system ΠU6,4 that simulates the universal Turing machine U6,4 in linear time and exponential space. The algorithm given for ΠU6,4 is deterministic and is mainly concerned with the simulation of an arbitrary transition rule for any Turing machine with the same state-symbol product as U6,4 , provided it has the same halting condition. Thus it is not necessary to give a detailed explanation of the operation of U6,4 . Any details about U6,4 will be given where necessary.

Encoding a configuration of universal Turing machine U6,4 . Each unique configuration of U6,4 is encoded as three natural numbers using a well known technique. A configuration of U6,4 is given by the following equation

Ck = ur , · · · ccc a−x · · · a−3 a−2 a−1 a0 a1 a2 a3 · · · ay ccc · · ·
(1)
where ur is the current state, c is the blank symbol, each ai is a tape cell of U6,4 and the tape head of U6,4 , given by an underline, is over a0 . Also, tape cells a−x and ay both contain c, and the cells between a−x and ay include all of the cells on U6,4 's tape that have either been visited by the tape head prior to configuration Ck or contain part of the input to U6,4 . The tape symbols of U6,4 are c, δ, b, and g and are encoded as ⟨c⟩ = 1, ⟨δ⟩ = 2, ⟨b⟩ = 3, and ⟨g⟩ = 4, where the encoding of object x is given by ⟨x⟩. Each tape cell ai in configuration Ck is encoded as ⟨ai⟩ = ⟨α⟩ where α is a tape symbol of U6,4 . We encode the tape contents in Equation (1) to the left and right of the tape head as the numbers X = Σ_{i=1}^{x} 32^i ⟨a−i⟩ and Y = Σ_{j=1}^{y} 32^j ⟨aj⟩, respectively. The
states of U6,4 are u1 , u2 , u3 , u4 , u5 , and u6 and are encoded as ⟨u1⟩ = 5, ⟨u2⟩ = 9, ⟨u3⟩ = 13, ⟨u4⟩ = 17, ⟨u5⟩ = 21 and ⟨u6⟩ = 25. Thus the entire configuration Ck is encoded as three natural numbers via the equation
⟨Ck⟩ = (X, Y, ⟨ur⟩ + ⟨α1⟩)
(2)
where ⟨Ck⟩ is the encoding of Ck from Equation (1) and α1 is the symbol being read by the tape head in cell a0 . A transition rule ur , α1 , α2 , D, us of U6,4 is executed on Ck as follows. If the current state is ur and the tape head is reading the symbol α1 in cell a0 , the write symbol α2 is printed to cell a0 , the tape head moves one cell to the left to a−1 if D = L or one cell to the right to a1 if D = R, and us becomes the new current state. A simulation of transition rule ur , α1 , α2 , D, us on the encoded configuration ⟨Ck⟩ from Equation (2) is given by the equation

⟨Ck+1⟩ = (X/32 − (X/32 mod 32), 32Y + 32⟨α2⟩, (X/32 mod 32) + ⟨us⟩)   if D = L
⟨Ck+1⟩ = (32X + 32⟨α2⟩, Y/32 − (Y/32 mod 32), (Y/32 mod 32) + ⟨us⟩)   if D = R     (3)

where configuration Ck+1 results from executing a single transition rule on configuration Ck , and (b mod c) = d where d < c, b = ec + d and b, c, d, e ∈ N. In Equation (3) the top case simulates a left move transition rule and the bottom case simulates a right move transition rule. In the top case, following the left move, the sequence to the right of the tape head is longer by 1 tape cell, as cell a0 is added to the sequence. Cell a0 is overwritten with the write symbol α2 and thus we compute 32Y + 32⟨α2⟩ to simulate cell a0 becoming part of the right sequence. Also, in the top case the sequence to the left of the tape head is getting shorter by 1 tape cell, thus we compute X/32 − (X/32 mod 32). The rightmost cell of the left sequence, a−1 , is the new tape head location and the tape symbol it contains is encoded as (X/32 mod 32). Thus the value (X/32 mod 32) is added to the new encoded current state ⟨us⟩. For the bottom case, a right move, the sequence to the right gets shorter, which is simulated by Y/32 − (Y/32 mod 32), and the sequence to the left gets longer, which is simulated by 32X + 32⟨α2⟩. The leftmost cell of the right sequence, a1 , is the new tape head location and the tape symbol it contains is encoded as (Y/32 mod 32).

Input to ΠU6,4 . Here we give an explanation of how the input is read into ΠU6,4 . We also give a rough outline of how the input to ΠU6,4 is encoded in linear time. A configuration Ck given by Equation (2) is read into ΠU6,4 as follows. All the neurons of the system initially have no spikes, with the exception of σ3 , which has 30 spikes. The input neuron σ1 receives X spikes at the first timestep t1 , Y spikes at time t2 , and ⟨α1⟩ + ⟨ur⟩ spikes at time t3 . Using the rule s∗/s → s; 1, neuron σ1 sends all the spikes it receives during timestep ti to σ6 at timestep ti+1 .
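Stepping back to the arithmetic itself, the base-32 encoding of Equation (2) and one application of Equation (3) can be sanity-checked with a short numerical sketch. The symbol values follow the text (⟨c⟩ = 1, ⟨δ⟩ = 2, ⟨b⟩ = 3, ⟨g⟩ = 4); the Python functions and the example rule are illustrative, not part of the construction.

```python
# Numerical sketch of Equations (2) and (3): each tape half is packed
# into one number in base 32, and a transition rule is executed with
# div/mod arithmetic. Symbol encodings follow the text; the example
# transition is hypothetical.

ENC = {"c": 1, "delta": 2, "b": 3, "g": 4}

def encode_side(cells):
    """cells[0] is the cell adjacent to the head (a_-1 or a_1)."""
    return sum(ENC[a] * 32 ** (i + 1) for i, a in enumerate(cells))

def apply_rule(X, Y, write_enc, direction, new_state_enc):
    """One application of Equation (3) to the encoded configuration."""
    if direction == "L":                          # top case of Equation (3)
        return (X // 32 - (X // 32) % 32,         # left part loses cell a_-1
                32 * Y + 32 * write_enc,          # rewritten a_0 joins right part
                (X // 32) % 32 + new_state_enc)   # <a_-1> + <u_s>
    return (32 * X + 32 * write_enc,              # bottom case: right move
            Y // 32 - (Y // 32) % 32,
            (Y // 32) % 32 + new_state_enc)
```

For example, with left tape a−1 a−2 = c b we get X = 1·32 + 3·32^2 = 3104; a left move writing g with new state u2 (⟨u2⟩ = 9) and empty right tape yields (96, 128, 10), and (X/32 mod 32) = 1 correctly recovers the c now under the head.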
Thus using the rules (s^64 (s^32)*/s → s; 1) and (s^{⟨α1⟩+⟨ur⟩}/s → s; 1) in σ6 , the rule (s^64 (s^32)*/s → s; 2) in σ5 , the rule (s^64 (s^32)*/s → s; 1) in σ7 , and the rule s^30/s^30 → λ; 5 in σ3 , the spiking neural P system has X spikes in σ2 , Y spikes in σ3 , and ⟨α1⟩ + ⟨ur⟩ spikes in σ5 and σ7 at time t6 . Note that the rule s^30/s^30 → λ; 5 in σ3 prevents the first X spikes from entering σ3 and the rule (s^64 (s^32)*/s → s; 2) in σ5 prevents the spikes encoding Y from entering σ2 .
Forgetting rules (s^64 (s^32)*/s → λ; 0) and (s^{⟨α1⟩+⟨ur⟩}/s → λ; 0) are applied in σ8 , σ9 , σ10 , and σ11 to get rid of superfluous spikes.

Given a configuration of U6,4 , the input to our spiking neural P system in Figure 2 is computed in linear time. This is done as follows: a configuration of U6,4 is encoded as three binary sequences w1 , w2 , and w3 . Each of these sequences encodes one of the numbers from Equation (2). We then use a spiking neural P system Πinput with exhaustive use of rules that takes each sequence and converts it into a number of spikes that is used as input by our system in Figure 2. We give a rough idea of how Πinput operates. The input neuron of Πinput receives the binary sequence w as a sequence of spikes and no-spikes. If a 1 is read at a given timestep, a single spike is sent into Πinput . As each bit of the binary sequence is read, the total number of spikes in the system is multiplied by 2 (this is a simplification of what actually happens). Thus, Πinput completes its computation in time that is linear in the length of the tape contents of U6,4 . Also, w1 , w2 , and w3 are computed in time that is linear in the length of the tape contents of U6,4 .

Algorithm overview. To help simplify the explanation, some of the rules given here in the overview differ slightly from those in the more detailed simulation below. The numbers from Equation (2), encoding a Turing machine configuration, are stored in the neurons of our system as X, Y and ⟨α1⟩ + ⟨ur⟩ spikes. Equation (3) is implemented in Figure 2 to give a spiking neural P system ΠU6,4 that simulates the transition rules of U6,4 . The two values X and Y are stored in neurons σ2 and σ3 , respectively. If X or Y is to be multiplied, the spikes that encode X or Y move down through the network of neurons from either σ2 or σ3 , respectively, until they reach σ18 .
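The two mechanisms mentioned above, the doubling-per-bit conversion performed by Πinput and the multiplication performed as spikes descend towards σ18 , come down to simple arithmetic on spike counts. The following sketch models the counts only (layer sizes taken from Figure 2), not the rule-level behaviour.

```python
# Sketch of the binary-to-spikes conversion attributed to Pi_input:
# the spike count doubles as each bit is read, so the word ends up
# stored as the number it denotes in binary. (The text notes this is
# a simplification of what actually happens inside Pi_input.)
def binary_word_to_spikes(w):
    spikes = 0
    for bit in w:                     # one simulated timestep per input bit
        spikes = 2 * spikes + (1 if bit == "1" else 0)
    return spikes

# Sketch of the multiply-by-32 descent from sigma_6 to sigma_18: at
# each hop every sender fires all of its spikes into every neuron of
# the next layer, so the per-neuron count is multiplied by the number
# of senders (1, then 4, then 4, then 2 -- a factor of 32 in total).
def multiply_by_32(n):
    per_neuron = n                    # spikes held in sigma_6
    for senders in (1, 4, 4, 2):      # into sigma_8..11, 12..15, 16..17, 18
        per_neuron *= senders
    return per_neuron                 # 32 * n spikes arrive in sigma_18
```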
Note in Figure 2 that there are synapses from σ6 to σ8 , σ9 , σ10 and σ11 , thus the number N of spikes in σ6 becomes 4N when it fires, as it sends N spikes to each of the neurons σ8 , σ9 , σ10 and σ11 . If 32Y is to be computed we calculate 4Y by firing σ6 , then 16Y by firing σ8 , σ9 , σ10 , and σ11 , and finally 32Y by firing σ12 , σ13 , σ14 , and σ15 . 32X is computed using the same technique.

We give the general idea of how the neurons compute X/32 − (X/32 mod 32) and (X/32 mod 32) from Equation (3) (a slightly different strategy is used in the simulation). We begin with X spikes in σ2 . The rule (s^32)*/s^32 → s; 1 is applied in σ2 , sending X/32 spikes to σ5 . Following this, (s^32)* s^{(X/32 mod 32)}/s^32 → s^32; 1 is applied in σ5 , which sends X/32 − (X/32 mod 32) spikes to σ2 , leaving (X/32 mod 32) spikes in σ5 . The values Y/32 − (Y/32 mod 32) and (Y/32 mod 32) are computed in a similar manner. Finally, using the encoded current state ⟨ur⟩ and the encoded read symbol ⟨α1⟩, the values 32⟨α2⟩ and ⟨us⟩ are computed. Using the technique outlined in the first paragraph of the algorithm overview, the value 32(⟨ur⟩ + ⟨α1⟩) is computed by sending ⟨ur⟩ + ⟨α1⟩ spikes from σ6 to σ18 in Figure 2. Then the rule s^{32(⟨ur⟩+⟨α1⟩)}/s^{32(⟨ur⟩+⟨α1⟩)−⟨us⟩} → s^{32⟨α2⟩}; 1 is applied in σ18 , which sends 32⟨α2⟩ spikes out to neurons σ5 and σ7 . This rule uses 32(⟨ur⟩ + ⟨α1⟩) − ⟨us⟩ spikes, thus leaving ⟨us⟩ spikes remaining in σ18 and 32⟨α2⟩ spikes in both σ5 and σ7 . This completes our sketch of how ΠU6,4 in Figure 2 computes the values
Fig. 2. Universal spiking neural P system ΠU6,4 . Each oval shape is a neuron and each arrow represents the direction spikes move along a synapse between a pair of neurons.
in Equation (3) to simulate a transition rule. A more detailed simulation of a transition rule follows.

Simulation of ur , α1 , α2 , L, us (top case of Equation (3)). The simulation of the transition rule begins at time ti with X spikes in σ2 , Y spikes in σ3 , and ⟨ur⟩ + ⟨α1⟩ spikes in σ5 and σ7 . We explain the simulation by giving the number of spikes in each neuron and the rule that is to be applied in each neuron at time t. For example at time ti we have

ti : σ2 = X, σ3 = Y,
     σ5 = ⟨ur⟩ + ⟨α1⟩,   s^{⟨ur⟩+⟨α1⟩}/s → s; 1,
     σ7 = ⟨ur⟩ + ⟨α1⟩,   s^{⟨ur⟩+⟨α1⟩}/s → s; 1.
where on the left σj = k gives the number k of spikes in neuron σj at time ti and on the right is the next rule that is to be applied at time ti , if there is an applicable rule at that time. Thus from Figure 2, when we apply the rule s^{⟨ur⟩+⟨α1⟩}/s → s; 1 in neurons σ5 and σ7 at time ti we get

ti+1 : σ2 = X + ⟨ur⟩ + ⟨α1⟩,   s^64 (s^32)* s^{⟨ur⟩+⟨α1⟩}/s^32 → s; 9,
       σ3 = Y + ⟨ur⟩ + ⟨α1⟩,   (s^32)* s^{⟨ur⟩+⟨α1⟩}/s → s; 1.

ti+2 : σ2 = X + ⟨ur⟩ + ⟨α1⟩,   s^64 (s^32)* s^{⟨ur⟩+⟨α1⟩}/s^32 → s; 8,
       σ4 = Y + ⟨ur⟩ + ⟨α1⟩,   (s^32)* s^{⟨ur⟩+⟨α1⟩}/s^32 → s^32; 1 if ⟨ur⟩ + ⟨α1⟩ = ⟨u6⟩ + ⟨c⟩,
                               (s^32)* s^{⟨ur⟩+⟨α1⟩}/s → λ; 0 if ⟨ur⟩ + ⟨α1⟩ ≠ ⟨u6⟩ + ⟨c⟩,
       σ6 = Y + ⟨ur⟩ + ⟨α1⟩,   (s^32)* s^{⟨ur⟩+⟨α1⟩}/s → s; 1,
       σ7 = Y + ⟨ur⟩ + ⟨α1⟩,   s^32 (s^32)* s^{⟨ur⟩+⟨α1⟩}/s → λ; 0.

ti+3 : σ2 = X + ⟨ur⟩ + ⟨α1⟩,   s^64 (s^32)* s^{⟨ur⟩+⟨α1⟩}/s^32 → s; 7,
       σ5 , σ7 = Y + ⟨ur⟩ + ⟨α1⟩,   s^32 (s^32)* s^{⟨ur⟩+⟨α1⟩}/s → λ; 0,
       σ8 , σ9 , σ10 , σ11 = Y + ⟨ur⟩ + ⟨α1⟩,   s^32 (s^32)* s^{⟨ur⟩+⟨α1⟩}/s → s; 1.
In timestep ti+2 above, σ4 , the output neuron, fires if and only if the encoded current state is ⟨ur⟩ = ⟨u6⟩ and the encoded read symbol is ⟨α1⟩ = ⟨c⟩. The universal Turing machine U6,4 halts if and only if it encounters the state-symbol pair (u6 , c). Also, when U6,4 halts the entire tape contents are to the right of the tape head, thus only Y, the encoding of the right sequence, is sent out of the system. Thus the unary output is a number of spikes that encodes the tape contents of U6,4 . Note that at timestep ti+3 each of the neurons σ12 , σ13 , σ14 , and σ15 receives Y + ⟨ur⟩ + ⟨α1⟩ spikes from each of the four neurons σ8 , σ9 , σ10 , and σ11 . Thus at timestep ti+4 each of the neurons σ12 , σ13 , σ14 , and σ15 contains 4(Y + ⟨ur⟩ + ⟨α1⟩) spikes. Neurons σ12 , σ13 , σ14 , and σ15 are fired at time ti+4 to give 16(Y + ⟨ur⟩ + ⟨α1⟩) spikes in each of the neurons σ16 and σ17 at timestep ti+5 . Firing neurons σ16 and σ17 at timestep ti+5 gives 32(Y + ⟨ur⟩ + ⟨α1⟩) spikes in σ18 at timestep ti+6 .

ti+4 : σ2 = X + ⟨ur⟩ + ⟨α1⟩,   s^64 (s^32)* s^{⟨ur⟩+⟨α1⟩}/s^32 → s; 6,
       σ12 , σ13 , σ14 , σ15 = 4(Y + ⟨ur⟩ + ⟨α1⟩),   (s^128)* s^{4(⟨ur⟩+⟨α1⟩)}/s → s; 1.

ti+5 : σ2 = X + ⟨ur⟩ + ⟨α1⟩,   s^64 (s^32)* s^{⟨ur⟩+⟨α1⟩}/s^32 → s; 5,
       σ16 , σ17 = 16(Y + ⟨ur⟩ + ⟨α1⟩),   (s^512)* s^{16(⟨ur⟩+⟨α1⟩)}/s → s; 1.
ti+6 : σ2 = X + ⟨ur⟩ + ⟨α1⟩,   s^64 (s^32)* s^{⟨ur⟩+⟨α1⟩}/s^32 → s; 4,
       σ18 = 32(Y + ⟨ur⟩ + ⟨α1⟩),   (s^{32^2})* s^{32(⟨ur⟩+⟨α1⟩)}/s^{32^2} → s^{32^2}; 1.

Note that (32Y mod 32^2) = 0 and also that 32(⟨ur⟩ + ⟨α1⟩) < 32^2. Thus in neuron σ18 at time ti+6 the rule (s^{32^2})* s^{32(⟨ur⟩+⟨α1⟩)}/s^{32^2} → s^{32^2}; 1 separates the encoding of the right side of the tape, s^{32Y}, and the encoding of the current state and read symbol, s^{32(⟨ur⟩+⟨α1⟩)}. To see this, note the number of spikes in neurons σ7 and σ18 at time ti+7 . The rule s^{32(⟨ur⟩+⟨α1⟩)}/s^{32(⟨ur⟩+⟨α1⟩)−⟨us⟩} → s^{32⟨α2⟩}; 1, applied in σ18 at timestep ti+7 , computes the new encoded current state ⟨us⟩ and the write symbol 32⟨α2⟩. To see this, note the number of spikes in neurons σ7 and σ18 at time ti+8 . The reason the value 32⟨α2⟩ appears in σ7 instead of ⟨α2⟩ is that the cell containing α2 becomes part of the sequence on the right and is added to 32Y (as in Equation (3)) at timestep ti+9 . Note that d > 1 in σ2 at timesteps ti+7 and ti+8 , indicating σ2 is closed. Thus the spikes sent out from σ5 at these times do not enter σ2 .

ti+7 : σ2 = X + ⟨ur⟩ + ⟨α1⟩,   s^64 (s^32)* s^{⟨ur⟩+⟨α1⟩}/s^32 → s; 3,
       σ5 = 32Y,   (s^32)*/s^32 → s; 1,
       σ7 = 32Y,   (s^32)*/s^32 → s; 1,
       σ18 = 32(⟨ur⟩ + ⟨α1⟩),   s^{32(⟨ur⟩+⟨α1⟩)}/s^{32(⟨ur⟩+⟨α1⟩)−⟨us⟩} → s^{32⟨α2⟩}; 1.

ti+8 : σ2 = X + ⟨ur⟩ + ⟨α1⟩,   s^64 (s^32)* s^{⟨ur⟩+⟨α1⟩}/s^32 → s; 2,
       σ3 = 32Y,
       σ5 = 32⟨α2⟩,   (s^32)*/s^32 → s; 1,
       σ7 = 32⟨α2⟩,   (s^32)*/s^32 → s; 1,
       σ18 = ⟨us⟩,   s^{⟨us⟩}/s → s; 4.

ti+9 : σ2 = X + ⟨ur⟩ + ⟨α1⟩,   s^64 (s^32)* s^{⟨ur⟩+⟨α1⟩}/s^32 → s; 1,
       σ3 = 32Y + 32⟨α2⟩,
       σ18 = ⟨us⟩,   s^{⟨us⟩}/s → s; 3.
At time ti+10 , in neuron σ5 the rule (s^32)* s^{(X/32 mod 32)}/s^32 → s^32; 1 is applied, sending X/32 − (X/32 mod 32) spikes to σ2 and leaving (X/32 mod 32) spikes in σ5 . At the same time, in neuron σ6 the rule (s^32)* s^{(X/32 mod 32)}/s^32 → λ; 0 is applied, leaving only (X/32 mod 32) spikes in σ6 . Note that from Equation (1) and the value of X, (X/32 mod 32) = ⟨αj⟩ where αj is the symbol in cell a−1 , the new tape head location.
ti+10 : σ2 = ⟨ur⟩ + ⟨α1⟩,   s^{⟨ur⟩+⟨α1⟩}/s → λ; 0,
        σ3 = 32Y + 32⟨α2⟩,
        σ5 = X/32,   (s^32)* s^{(X/32 mod 32)}/s^32 → s^32; 1,
        σ6 = X/32,   (s^32)* s^{(X/32 mod 32)}/s^32 → λ; 0,
        σ18 = ⟨us⟩,   s^{⟨us⟩}/s → s; 2.

ti+11 : σ2 = X/32 − (X/32 mod 32),
        σ3 = 32Y + 32⟨α2⟩,
        σ5 = (X/32 mod 32),   s^{(X/32 mod 32)}/s^{(X/32 mod 32)} → λ; 0,
        σ6 = (X/32 mod 32),   s^{(X/32 mod 32)}/s^{(X/32 mod 32)} → s; 1,
        σ18 = ⟨us⟩,   s^{⟨us⟩}/s → s; 1.

ti+12 : σ2 = X/32 − (X/32 mod 32),
        σ3 = 32Y + 32⟨α2⟩,
        σ5 = (X/32 mod 32) + ⟨us⟩,   s^{(X/32 mod 32)+⟨us⟩}/s → s; 1,
        σ7 = (X/32 mod 32) + ⟨us⟩,   s^{(X/32 mod 32)+⟨us⟩}/s → s; 1,
        σ8 , σ9 , σ10 , σ11 = (X/32 mod 32),   s^{(X/32 mod 32)}/s^{(X/32 mod 32)} → λ; 0.
The simulation of the left moving transition rule is now complete. Note that the numbers of spikes in σ2 , σ3 , σ5 , and σ7 at timestep ti+12 are the values given by the top case of Equation (3) and encode the configuration after the left move transition rule. The case where the tape head moves onto a part of the tape that is to the left of a−x+1 in Equation (1) is not covered by the simulation. For example, when the tape head is over cell a−x+1 , then X = 32 (recall a−x contains c). If the tape head moves to the left, from Equation (3) we get X = 0. Therefore the length of the left sequence is increased, to simulate the infinite blank symbols (c symbols) to the left, as follows. The rule s^{32+⟨α1⟩+⟨ur⟩}/s^32 → s^32; 1 is applied in σ2 at time ti+9 . Then at time ti+10 the rule s^32/s^32 → s^32; 1 is applied in σ5 and the rule s^32/s^32 → s; 1 is applied in σ6 . Thus at time ti+10 there are 32 spikes in σ2 , which simulates another c symbol to the left. Also at time ti+10 , there is 1 spike in σ5 and σ7 to simulate the current read symbol c. We have shown how to simulate an arbitrary left moving transition rule of U6,4 . Right moving transition rules are also simulated in 12 timesteps in a
manner similar to that of left moving transition rules. Thus a single transition rule of U6,4 is simulated by ΠU6,4 in 12 timesteps, and from Theorem 3 the entire computation of M is simulated in O(T^6) timesteps. From Theorem 3 and Equation (2), M is simulated in O(32^{T^3}) space.

It was mentioned at the end of Section 2 that we generalised the previous definition of spiking neural P systems with exhaustive use of rules to allow the input neuron to receive an arbitrary number of spikes in a single timestep. If the synapses of the system can transmit an arbitrary number of spikes in a single timestep, then it does not seem unreasonable to allow an arbitrary number of spikes to enter the input neuron in a single timestep. This generalisation can be removed from our system. This is done by modifying the spiking neural P system Πinput mentioned in the subsection "Input to ΠU6,4", and attaching its output neuron to the input neuron of ΠU6,4 in Figure 2. The input neuron of this new system is the input neuron of Πinput and receives no more than a single spike at each timestep. This new universal spiking neural P system would be larger than the one in Figure 2, but there would be less work done in encoding the input.

While the small universal spiking neural P system in Figure 2 simulates Turing machines with a polynomial time overhead, it requires an exponential space overhead. This requirement may be shown by proving that it is simulated by a counter machine using the same space. However, it is not unreasonable to expect efficiency from simple universal systems, as many of the simplest computationally universal models have polynomial time and space overheads [8,13,10]. A more time efficient simulation of Turing machines may be given by spiking neural P systems with exhaustive use of rules. Using similar techniques it can be shown that for each multi-tape Turing machine M there is a spiking neural P system with exhaustive rules that simulates M in linear time.
ΠU6,4 from Figure 2 is easily altered to simulate other small universal Turing machines (i.e. to simulate them directly and not via U6,4 ). Using the same basic algorithm, the number of neurons grows at a rate that is logarithmic in the state-symbol product of the Turing machine being simulated. One approach to finding spiking neural P systems smaller than that in Figure 2 is to simulate the universal Turing machines in [10]. These machines are weakly universal, which means that they have an infinitely repeated word to the left of their input and another to the right. The smallest of these machines has a state-symbol product of 8, and so perhaps the above algorithm could be altered to give a system with fewer neurons.

Acknowledgements. The author would like to thank the anonymous reviewers for their careful reading and observations. The author is funded by Science Foundation Ireland Research Frontiers Programme grant number 07/RFP/CSMF641.
References

1. Chen, H., Ionescu, M., Ishdorj, T.: On the efficiency of spiking neural P systems. In: Gutiérrez-Naranjo, M.A., et al. (eds.) Proceedings of the Fourth Brainstorming Week on Membrane Computing, Sevilla, February 2006, pp. 195–206 (2006)
2. Fischer, P.C., Meyer, A., Rosenberg, A.: Counter machines and counter languages. Mathematical Systems Theory 2(3), 265–283 (1968)
3. Ionescu, M., Păun, G., Yokomori, T.: Spiking neural P systems. Fundamenta Informaticae 71(2-3), 279–308 (2006)
4. Ionescu, M., Păun, G., Yokomori, T.: Spiking neural P systems with exhaustive use of rules. International Journal of Unconventional Computing 3(2), 135–153 (2007)
5. Ionescu, M., Sburlan, D.: Some applications of spiking neural P systems. In: Eleftherakis, G., et al. (eds.) Proceedings of the Eighth Workshop on Membrane Computing, Thessaloniki, June 2007, pp. 383–394 (2007)
6. Leporati, A., Zandron, C., Ferretti, C., Mauri, G.: On the computational power of spiking neural P systems. In: Gutiérrez-Naranjo, M.A., et al. (eds.) Proceedings of the Fifth Brainstorming Week on Membrane Computing, Sevilla, January 2007, pp. 227–245 (2007)
7. Leporati, A., Zandron, C., Ferretti, C., Mauri, G.: Solving numerical NP-complete problems with spiking neural P systems. In: Eleftherakis, G., et al. (eds.) Proceedings of the Eighth Workshop on Membrane Computing, Thessaloniki, June 2007, pp. 405–423 (2007)
8. Neary, T., Woods, D.: P-completeness of cellular automaton Rule 110. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4051, pp. 132–143. Springer, Heidelberg (2006)
9. Neary, T., Woods, D.: Four small universal Turing machines. In: Durand-Lose, J., Margenstern, M. (eds.) MCU 2007. LNCS, vol. 4664, pp. 242–254. Springer, Heidelberg (2007)
10. Neary, T., Woods, D.: Small weakly universal Turing machines. Technical Report arXiv:0707.4489v1, arXiv online report (July 2007)
11. Păun, A., Păun, G.: Small universal spiking neural P systems. BioSystems 90(1), 48–60 (2007)
12. Păun, G.: Membrane Computing: An Introduction. Springer, Heidelberg (2002)
13. Woods, D., Neary, T.: On the time complexity of 2-tag systems and small universal Turing machines. In: 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS), Berkeley, California, October 2006, pp. 439–448. IEEE, Los Alamitos (2006)
Self-assembly of Decidable Sets Matthew J. Patitz and Scott M. Summers Department of Computer Science Iowa State University Ames, IA 50011, U.S.A. {mpatitz,summers}@cs.iastate.edu
Abstract. The theme of this paper is computation in Winfree’s Abstract Tile Assembly Model (TAM). We first review a simple, well-known tile assembly system (the “wedge construction”) that is capable of universal computation. We then extend the wedge construction to prove the following result: if a set of natural numbers is decidable, then it and its complement’s canonical two-dimensional representation self-assemble. This leads to a novel characterization of decidable sets of natural numbers in terms of self-assembly. Finally, we prove that our construction is, in some “natural” sense, optimal with respect to the amount of space it uses.
1 Introduction
In his 1998 Ph.D. thesis, Erik Winfree [9] introduced the (abstract) Tile Assembly Model (TAM) - a mathematical model of laboratory-based nanoscale self-assembly. The TAM is also an extension of Wang tiling [7,8]. In the TAM, molecules are represented by un-rotatable, but translatable, two-dimensional square “tiles,” each side of which has a particular glue “color” and “strength” associated with it. Two tiles that are placed next to each other interact if the glue colors on their abutting sides match, and they bind if the strengths on their abutting sides match and are at least a certain “temperature.” Extensive refinements of the TAM were given by Rothemund and Winfree in [5,4], and Lathrop et al. [3] gave an elegant treatment of the model that does not discriminate against the self-assembly of infinite structures.
In this paper, we explore the notion of computation in the TAM - what is it, and how is it accomplished? Despite its deliberate over-simplification, the TAM is a computationally expressive model. For instance, Winfree proved [9] that in two or more spatial dimensions, the TAM is capable of Turing-universal computation. In other words, it is possible to construct, for any Turing machine M and any input string w, a finite assembly system (i.e., a finite set of tile types) that tiles the first quadrant and encodes the set of all configurations that M goes through when processing the input string w. This implies that the process
This author’s research was supported in part by NSF-IGERT Training Project in Computational Molecular Biology Grant number DGE-0504304.
C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, pp. 206–219, 2008. c Springer-Verlag Berlin Heidelberg 2008
of self-assembly can (1) be directed algorithmically, and (2) be used to evaluate computable functions. One can also regard the process of self-assembly itself as computation that takes as input some initial configuration of tiles, and produces output in the form of some particular connected shape, and nothing else (i.e., strict self-assembly [3]). The self-assembly of shapes, and their associated Kolmogorov (shape) complexity, was studied extensively by Soloveichik and Winfree in [6], where they proved the counter-intuitive fact that sometimes fewer tile types are required to self-assemble a “scaled-up” version of a particular shape than the shape itself.
Another flavor of computation in the TAM is the self-assembly of a language A ⊆ N. Of course, one must make some additional assumptions about the self-assembly of A, since A is one-dimensional, and not necessarily connected. In this case, it only makes sense to talk about the weak self-assembly [3] of A. We say that A weakly self-assembles if “black” tiles are placed on, and only on, the points that are in A. One can also view weak self-assembly as painting a picture of the set A onto a much larger canvas of tiles. It is clear that if A weakly self-assembles, then A is necessarily computably enumerable. Moreover, Lathrop et al. [2] discovered that the converse of the previous statement holds in the following sense: if the set A is computably enumerable, then a “simple” representation of A as points along the x-axis weakly self-assembles. In this paper, we continue the work of Lathrop et al. [2]. Specifically, we focus our attention on the self-assembly of decidable sets in the TAM.
We first reproduce Winfree’s proof of the universality of the TAM [9] in the form of a simple construction called the “wedge construction.” The wedge construction self-assembles the computation history of an arbitrary TM M on input w in the space to the right of the y-axis, above the x-axis, and above the line y = x − |w| − 2. Our first main result follows from a straightforward extension of the wedge construction, and gives a new characterization of decidable languages of natural numbers in terms of self-assembly. We prove that a set A ⊆ N is decidable if and only if A × {0} and Ac × {0} weakly self-assemble. Technically speaking, our characterization is (exactly) the first main theorem from Lathrop et al. [2] with “computably enumerable” replaced by “decidable,” and f (n) = n. Finally, we establish that, if A ⊆ N is a decidable set having sufficient space complexity, then it is impossible to “naturally” self-assemble the set A × {0} without placing tiles in more than one quadrant.
2 The Tile Assembly Model
We now give a brief intuitive sketch of the abstract TAM. See [9,5,4,3] for other developments of the model. We work in the 2-dimensional discrete Euclidean space. We write U2 = {(0, 1), (1, 0), (0, −1), (−1, 0)}. We refer to the first quadrant N2 as Q1 , the second quadrant as Q2 , etc. Intuitively, a tile type t is a unit square that can be translated, but not rotated, having a well-defined “side u” for each u ∈ U2 . Each side u of t has a “glue” of
“color” colt(u) - a string over some fixed alphabet Σ - and “strength” strt(u) - a natural number - specified by its type t. Two tiles t and t′ that are placed at the points a and a + u respectively bind with strength strt(u) if and only if (colt(u), strt(u)) = (colt′(−u), strt′(−u)). Given a set T of tile types, an assembly is a partial function α : Z2 ⇢ T. An assembly is stable if it cannot be broken up into smaller assemblies without breaking bonds of total strength at least τ = 2. If α is an assembly, and X ⊆ Z2 , then we write the restriction of α to X as α↾X.
Self-assembly begins with a seed assembly σ and proceeds asynchronously and nondeterministically, with tiles adsorbing one at a time to the existing assembly in any manner that preserves stability at all times. A tile assembly system (TAS) is an ordered triple T = (T, σ, τ ), where T is a finite set of tile types, σ is a seed assembly with finite domain, and τ = 2 is the temperature. An assembly α is terminal, and we write α ∈ A[T ], if no tile can be stably added to it. A TAS T is directed, or produces a unique assembly, if it has exactly one terminal assembly.
A set X ⊆ Z2 weakly self-assembles [3] if there exist a TAS T = (T, σ, τ ) and a set B ⊆ T such that α−1 (B) = X holds for every terminal assembly α. That is, there is a set B of “black” tile types such that every terminal assembly has black tiles on points in the set X and only X.
An assembly sequence in a TAS T = (T, σ, τ ) is an infinite sequence α = (α0 , α1 , α2 , . . .) of assemblies in which α0 = σ and each αi+1 is obtained from αi by the “τ -stable” addition of a single tile. We define the result of an assembly sequence α to be the unique assembly α = res(α) satisfying dom α = ⋃0≤i dom αi .
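The binding rule just stated is easy to make concrete. The following sketch (our own illustrative encoding; the tile-type representation and names are not from the paper) models a tile type as a map from unit vectors to (color, strength) glue pairs and checks that two abutting glues bind exactly when they agree in both color and strength:

```python
# Illustrative sketch (our encoding): a tile type maps each unit direction
# in U2 to a (color, strength) glue pair.

U2 = [(0, 1), (1, 0), (0, -1), (-1, 0)]

def binds(t, t_prime, u):
    """Strength with which tiles at points a and a+u bind: str_t(u) if the
    abutting glues agree in both color and strength, and 0 otherwise."""
    neg_u = (-u[0], -u[1])
    if t[u] == t_prime[neg_u]:
        return t[u][1]
    return 0

# Two tiles sharing a strength-2 glue "x" on their east/west sides
# bind with strength 2, which meets the temperature tau = 2.
t1 = {u: ("", 0) for u in U2}; t1[(1, 0)] = ("x", 2)
t2 = {u: ("", 0) for u in U2}; t2[(-1, 0)] = ("x", 2)
assert binds(t1, t2, (1, 0)) == 2
```

At temperature τ = 2, a single strength-1 glue would not suffice for a stable addition; a tile would need either one strength-2 bond (as above) or two cooperating strength-1 bonds.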
3 The Wedge Construction
In this section, we review the “wedge construction” - a simple, well-known TAS that simulates an arbitrary Turing machine on some binary string in the first quadrant of the discrete Euclidean plane. We will later use the wedge construction to prove our main result.
Construction 1 (wedge construction). Let M = (Q, Σ, Γ, δ, q0 , qA , qR ) be a standard TM, x ∈ {0, 1}∗, and define the TAS TM(x) = (TM(x) , σ, τ ), where TM(x) is the set of tile types defined in Section 3.1, σ is the seed assembly satisfying dom σ = {0, . . . , |x| − 1} × {0} that encodes the initial configuration of M , and τ = 2.

3.1 Tile Types for Construction 1

We construct the set of tile types TM(x) as follows.
[The tile-type diagrams for the following six groups did not survive text extraction; only the group descriptions are reproduced.]

1. For all x ∈ Γ , add the seed row tile types (leftmost, interior, and rightmost variants).
2. For all x ∈ Γ , add the tile types that copy tape cells to the left and to the right of the tape head.
3. Add two tile types that grow the tape to the right (the second-rightmost and the rightmost tape cell).
4. For all p, q ∈ Q, and all a, b, c ∈ Γ satisfying (q, b, R) = δ(p, a) and q ∉ {qA , qR } (i.e., for each transition moving the tape head to the right into a non-halting state), add the tile types for the tape cell with the output head after the transition and for the cell that receives the tape value after the transition.
5. For all p, q ∈ Q, and all a, b, c ∈ Γ satisfying (q, b, L) = δ(p, a) and q ∉ {qA , qR } (i.e., for each transition moving the tape head to the left into a non-halting state), add the analogous tile types.
6. For all p ∈ Q, all a, b ∈ Γ , and all h ∈ {ACCEPT, REJECT} satisfying δ(p, a) ∈ {qA , qR } × Γ × {L, R} (i.e., for each transition moving the tape head into a halting state), add the corresponding tile types.
3.2 Proof of Correctness
Lemma 1. If M is a standard TM, and x ∈ {0, 1}∗, then the TAS TM(x) is locally deterministic.
Proof (sketch). It is straightforward to define an assembly sequence α, leading to a terminal assembly α = res(α), in which (1) the j-th configuration Cj of M is encoded in the row Rj = {0, . . . , |x| − 1 + j} × {j}, and (2) α self-assembles Ci in its entirety before Cj whenever i < j. It follows easily from Construction 1 that every tile that binds in α does so deterministically, and with exactly strength 2, whence TM(x) is locally deterministic.
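The row-by-row picture in this proof sketch can be mirrored in a few lines of code. The following sketch is ours (the paper gives no implementation of Construction 1; only the related Construction 2 is implemented, in C++): it computes the successive rows of the wedge, with configuration Cj one cell wider than Cj−1 and the head cell stored as a (state, symbol) pair. It assumes a TM whose head never moves left of cell 0, as in the wedge construction:

```python
# Sketch (ours): the rows of the wedge construction for a one-way-tape TM.
# Row j encodes configuration C_j and is one cell wider than row j-1.

def wedge_rows(delta, q0, halting, x, blank="-", max_rows=100):
    row = [(q0, x[0])] + list(x[1:])    # seed row encodes C_0
    pos, rows = 0, [row[:]]
    for _ in range(max_rows):
        state, a = row[pos]
        if state in halting:
            break
        q, b, d = delta[(state, a)]
        row[pos] = b                    # write the output symbol
        pos += 1 if d == "R" else -1    # move the head
        row = row + [blank]             # each row grows one cell to the right
        row[pos] = (q, row[pos])        # embed the new head cell
        rows.append(row[:])
    return rows

# Example machine (ours): overwrite the input with 1s, then halt on blank.
delta = {("s", "0"): ("s", "1", "R"),
         ("s", "1"): ("s", "1", "R"),
         ("s", "-"): ("h", "-", "R")}
rows = wedge_rows(delta, "s", {"h"}, "001")
# row j has width |x| + j, mirroring the wedge shape
```

Each returned row corresponds to one horizontal row of tiles in the terminal assembly, which is exactly the encoding used in the proof of Lemma 1.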
4 A New Characterization of Decidable Languages
We now turn our attention to the self-assembly of decidable sets of positive integers in the TAM. We will modify the wedge construction from the previous section in order to prove that, for every decidable set A ⊆ N, there exists a directed TAS TA×{0} = (TA×{0} , σ, τ ) in which A × {0} and Ac × {0} weakly self-assemble. Throughout our discussion, we assume that M = (Q, Σ, Γ, δ, q0 , qA , qR ) is a standard, total TM having ‘-’ as its blank symbol, and satisfying L(M ) = A.
Our proof relies on the simple observation that, for every input w ∈ N, there exists a t ∈ N such that M halts on w after t steps. This means that we can essentially stack wedge constructions one on top of the other. Intuitively, our main construction is the “self-assembly version” of the following enumerator.

while 0 ≤ n < ∞ do
    simulate M on the binary representation of n
    if M accepts then
        output 1
    else
        output 0
    end if
    n := n + 1
end while

Just as the above enumerator prints the characteristic sequence of A, our construction will self-assemble the characteristic sequence of A along the positive x-axis.

4.1 Rigorous Construction of TA×{0}
In this section we present a full definition of the tile set TA×{0} , and in the next section we provide a higher level description of the behavior of our tile set. Note that in both sections we will be discussing a version of TA×{0} in which the simulations of M proceed from the bottom up since it is often more natural to think about this particular orientation. However, to be technically consistent we ultimately rotate all of the tile types in TA×{0} by 270 degrees, and then assign the
seed tile to the location (−1, 0). The full construction is implemented in C++, and is available at the following URL: http://www.cs.iastate.edu/~lnsa.
In our construction, we use the following sets of strings (where ‘∗’ and ‘-’ simply represent the literal characters).

C = {M0∗L, M1, M1∗L, M1∗, 0∗L, 1L, 0, 0∗, 1, -, -∗}
C[no blank] = {M0∗L, M1, M1∗L, M1∗, 0∗L, 1L, 0, 0∗, 1}
C[∗] = {M0∗L, M1∗L, M1∗, 0∗L, 0∗}
C[no ∗] = C[no blank] − C[∗]
M = {x ∈ C | the string x begins with M}
N = C[no blank] − M

Intuitively, the set C contains the glue colors that appear on the north and south edges of some set of tile types that self-assembles a log-width binary counter (i.e., a binary counter that counts from 1 to infinity, where the width of each row is proportional to the log of the number it represents). We will embed these strings, and hence the behavior of a binary counter, into the tile types of the wedge construction. We will do so as follows. Let T be the set of tile types given in Construction 1 that are not in groups (1) or (3). For each tile type t ∈ T , c ∈ C, and u ∈ U2 , define the tile type tc such that

tc(u) = (colt(u), strt(u)) if u ∈ {(1, 0), (−1, 0)}, and
tc(u) = (colt(u) ◦ (c), strt(u)) otherwise.

Note that “colt(u) ◦ (c)” means concatenate the string c, surrounded by parentheses, to the end of the string colt(u). The set {tc | t ∈ T and c ∈ C} makes up part of the tile set TA×{0} , and we define the remaining tile types in groups (1)–(8) below. [The tile-type diagrams for these groups did not survive text extraction; only their descriptions are reproduced.]

1. Seed tile types.
2. Tile types for the initial configuration of M on some input:
   (a) tile types that store the location of the tape head, for all m ∈ M and all b ∈ {0, 1}: (i) if there exists h ∈ {qA , qR } such that δ(q0 , b) ∈ {h} × Γ × {L, R}, add an ACCEPT tile type when h = qA and a REJECT tile type when h = qR ; (ii) if δ(q0 , b) ∉ {qA , qR } × Γ × {L, R}, add the corresponding non-halting tile types;
   (b) tile types that represent the tape contents to the right of the tape head, for all n ∈ N ∪ {-} and all a ∈ Γ .
3. Halting row tile types. For all h ∈ {ACCEPT, REJECT}:
   (a) tile types that initiate the halting signal, for all u ∈ C[no blank], with separate cases for u ∈ C[∗] and u ∈ C[no ∗];
   (b) tile types that propagate the halting signal to the right edge, for all u ∈ C[no blank] and all a ∈ Γ , again with separate cases for u ∈ C[∗] and u ∈ C[no ∗].
4. Further halting row tile types, which fill in the space to the left of the initial halting tile, for all u ∈ C[no blank].
5. Tile types that perform the counter increment operations.
6. Tile types that propagate blank tape cells to the north.
7. Tile types that self-assemble a one-tile-wide path from the halting configuration to some location on the positive x-axis, for all h ∈ {ACCEPT, REJECT}.
8. Solution tile types, for all h ∈ {ACCEPT, REJECT}.
Construction 2. Let TA×{0} = (TA×{0} , σ, τ ) be the TAS, where TA×{0} = {tc | t ∈ T and c ∈ C} ∪ {t | t is a tile type defined in the above list}, τ = 2, and σ consists of the leftmost tile type in group (1) of the above list placed at the point (0, 1).

4.2 Overview of Construction 2
This section gives a high level, intuitive description of Construction 2. Note that TA×{0} is singly-seeded, with the leftmost tile in group (1) of Section 4.1 being the seed tile type placed at the point (0, 1). The tile set TA×{0} is constructed in two phases. First, we use the definition of the TM M to generate TM(x) as in Construction 1. We then “embed” a binary counter directly into these tile types in order to simulate the self-assembly version of a loop. This creates a tile set which can simulate M on every input x ∈ N (assuming A is decidable), while passing the values of a binary counter up through the assembly. These are the tiles that form the white portion of the structure shown in Figure 1, and labeled M (0), M (1), and M (2). In order to provide M with a one-way, infinite-to-the-right work tape, every row in our construction that represents a computation step grows the tape by one tape cell to the right. The binary counter used to simulate a loop, running M on each input, is log-width and grows left into the second quadrant (represented by the dark grey tiles on the leftmost side of Figure 1). An increment operation is performed immediately above each halting configuration of M . The tile types that represent the initial configuration of M (on some input x) are shown in group (2) of Section 4.1. These tile types initiate each computation by using the value of x, embedded in the tile types of the binary counter, to
Fig. 1. The left-most (dark grey) vertical bars represent a binary counter that is embedded into the tile types of the TM; the darkest (black) rows represent the initial configuration of M on inputs 0, 1, and 2; and the (light grey) horizontal rows that contain a white/black tile represent halting configurations of M . Although this image seems to imply that the embedded binary counter increases its width (to the left) on each input, this is not true in our construction. This image merely depicts the conceptual “shape” of the log-width counter that is embedded in our construction.
construct a TM configuration with x located in its leftmost portion and q0 reading the leftmost symbol of x. Next, we construct the tile types for the ACCEPT and REJECT rows (i.e., halting configurations of M ). To do this, we construct tile types that form a row immediately above any row that represents a halting configuration of M . Conceptual examples of these rows are shown in Figure 1 as the light grey rows with the single black or white tiles which represent ACCEPT and REJECT signals, respectively. The tile types that make up halting configurations are constructed in groups (3) and (4) of Section 4.1. It is straightforward to construct the set of tile types that self-assemble a row that increments the value of the embedded binary counter (on top of the row that represents the halting configuration of M on x). These tile types are shown in group (5) of Section 4.1. After the counter increments, it initiates the simulation of M on input x + 1. We prefix the north edge colors of the tile types that make up a counter row with ‘∼’ so as to signal that the next row should be the initial configuration of M on x + 1. This has the effect of simulating M on x + 1 directly on top of the simulation of M on x.
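Row to row, the embedded counter performs an ordinary binary increment, widening by one bit exactly when the carry propagates off the end, which is what keeps it log-width. A sketch in our own notation (the real construction encodes the carry and the ‘∼’ marking in the glue strings of C):

```python
# Sketch (ours): the increment performed by the embedded log-width counter.
# bits is little-endian; the counter widens only on a full carry, so a
# counter holding n occupies about log2(n) cells.

def increment(bits):
    out, carry = [], 1
    for b in bits:
        out.append(b ^ carry)   # sum bit
        carry = b & carry       # carry out
    if carry:
        out.append(1)           # counter row widens (to the left, in Fig. 1)
    return out

n = [0]
for _ in range(5):
    n = increment(n)
# n is now [1, 0, 1], i.e. 5 in little-endian binary
```

In the construction, one such increment row appears above each halting configuration, and its north glues (prefixed with ‘∼’) seed the initial configuration of M on the next input.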
Fig. 2. The white tiles represent successive simulations of M . When M halts and accepts, an accept signal (darkest grey tiles) is sent down along the right side of the assembly to the appropriate point on the negative y-axis. The reject signals (middle shade of grey tiles) operate in the same fashion. The diagonal (D) signal allows each halting signal to essentially “turn” the corner.
The tile types in group (6) of Section 4.1 simply allow the blank symbol to propagate up through the assembly. The final component of TA×{0} is a group of tile types that carry the ACCEPT and REJECT signals to the appropriate location on the x-axis. These tile types are shown in groups (7) and (8) of Section 4.1, and their functionality can be seen in Figure 2.

4.3 Proof of First Main Theorem
Lemma 2. Let A ⊆ N be decidable. The set A × {0} weakly self-assembles in the locally deterministic TAS TA×{0} .
Proof. The details of this proof are tedious, and therefore omitted from this version of the paper.
The following technical result is a primitive self-assembly simulator.
Lemma 3. Let A ⊆ Z2 . If A weakly self-assembles, then there exists a TM MA with L(MA ) = A.
Proof. Assume that A weakly self-assembles. Then there exists a TAS T = (T, σ, τ ) in which the set A weakly self-assembles. Let B be the set of “black” tile types given in the definition of weak self-assembly. Fix some enumeration a1 , a2 , a3 , . . . of Z2 , and let MA be the TM defined as follows.

Require: v ∈ Z2
α := σ
while v ∉ dom α do
    choose the least j ∈ N such that some tile can be added to α at aj
    choose some t ∈ T that can be added to α at aj
    add t to α at aj
end while
if α(v) ∈ B then
    accept
else
    reject
end if

It is routine to verify that MA accepts A.
Lemma 4. Let A ⊆ N. If A × {0} and Ac × {0} weakly self-assemble, then A is decidable.
Proof. Assume the hypothesis. Then by Lemma 3, there exist TMs MA×{0} and MAc×{0} satisfying L(MA×{0} ) = A × {0} and L(MAc×{0} ) = Ac × {0}, respectively. Now define the TM M as follows.

Require: n ∈ N
simulate both MA×{0} and MAc×{0} on input (n, 0) in parallel
if MA×{0} accepts then accept end if
if MAc×{0} accepts then reject end if

It is clear that M is a decider for A.
Lemma 5. Let A ⊆ N. If the set A is decidable, then A × {0} and Ac × {0} weakly self-assemble.
Proof. This follows immediately from Construction 2 and Lemma 2. Note that the choice of the set B determines whether the set A × {0} or Ac × {0} weakly self-assembles.
We now have the machinery to prove our main result.
Theorem 1 (first main theorem). Let A ⊆ N. The set A is decidable if and only if A × {0} and Ac × {0} weakly self-assemble.
Proof. This follows from Lemmas 4 and 5.
In the next section, we will prove that our construction is optimal in some natural sense with respect to the amount of space that it uses.
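The decider M of Lemma 4 is a standard dovetailed simulation: run both semi-deciders in parallel and answer according to whichever accepts first. A sketch with the two TMs modeled as hypothetical step-bounded predicates (the names and the step-function interface are ours, purely for illustration):

```python
# Sketch (ours): Lemma 4's decider M.  accepts_A(n, s) / accepts_Ac(n, s)
# stand in for "M_{A x {0}} (resp. M_{A^c x {0}}) accepts (n, 0) within s
# steps"; exactly one of them eventually returns True for each n.

def decider(accepts_A, accepts_Ac, n, max_steps=10**6):
    for steps in range(1, max_steps):
        if accepts_A(n, steps):
            return True       # n is in A
        if accepts_Ac(n, steps):
            return False      # n is in A^c
    raise RuntimeError("step bound exceeded (cannot occur for valid inputs)")

# Toy instantiation: A = even numbers; each "TM" needs n steps to answer.
in_A = lambda n, s: s >= n and n % 2 == 0
in_Ac = lambda n, s: s >= n and n % 2 == 1
assert decider(in_A, in_Ac, 8) and not decider(in_A, in_Ac, 7)
```

Totality of the decider rests exactly on the hypothesis of Lemma 4: since A × {0} and its complement both weakly self-assemble, one of the two semi-deciders is guaranteed to accept.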
5 Two Quadrants Are Sufficient and Necessary
In the proof of Theorem 1, we exhibited a directed TAS that placed at least one tile in each of three different quadrants. This leads one to ask the natural question: is it possible to do any better than three quadrants? In other words, does Theorem 1 hold if only two quadrants of space are allowed?
It turns out that the answer to the previous question is yes. Namely, if we simply shift the embedded binary counter in our construction to the right as its width grows, then we only need two quadrants of space to self-assemble the set A × {0}. (There is enough space to accommodate the counter bits because the right edge of the TM simulation grows to the right faster than the right edge of the counter.) Note that the modifications to the tile set are straightforward, requiring the modification of only five tile types.
Now one must ask the question: does Theorem 1 hold when no more than one quadrant of space is available? First note that Winfree [9] proved that one spatial dimension is sufficient to self-assemble A × {0} if A is regular. It is also easy to see that if A ∈ DSPACE(n), then it is possible to modify our construction to weakly self-assemble A × {0} using only one quadrant of space. However, in the remainder of this section, we will prove that, if A ∈ DSPACE(2^n), then it is impossible to weakly self-assemble the set A × {0} in any “natural” way without using more than one quadrant. Note that, because of space constraints, we merely sketch the proof of our second main theorem in this version of the paper.
Definition 1. Let A ⊆ N be a decidable set and T be a TAS in which the set A × {0} weakly self-assembles. We say that T row-computes A if, for every α ∈ A[T ], the following conditions hold.
1. Let α be an assembly sequence of T with α = res(α). For all n ∈ N, there exists a unique point (x0 , y0 ) ∈ Q1 ∪ Q2 such that there is a path Pn = (x0 , y0 ), (x1 , y1 ), . . . , (xl−1 , yl−1 ) in the precedence graph Gα , where (xl−1 , yl−1 ) = (n, 0) and y0 > y1 ≥ · · · ≥ yl−1 = 0.
2. Let P = ⋃n≥1 Pn , and α′ = α↾(dom α − P ). For all m ∈ N, there is a finite assembly sequence α′ = (αi | 0 ≤ i < k) satisfying α0 = α′↾(Z × {0, . . . , m − 1}) and dom res(α′) = dom (α′↾(Z × {0, . . . , m})).
We assume that if T row-computes a set A ⊆ N, then every terminal assembly α of T consists of two components: a simulation of some TM M with L(M ) = A, and the paths that determine the fate of the points along the x-axis. Intuitively, condition (1) says that for every point (n, 0) along the x-axis, there is a unique point in the first or second quadrant, and the path Pn that connects the former point to the latter carries the answer to the following question: “Is n ∈ A?” For technical reasons, we assume that the path Pn never grows “up.” Finally, condition (2) says that the simulation component of α can self-assemble one row at a time.
It is clear that, for any decidable set A ⊆ N, the construction that we outlined at the beginning of this section row-computes A.
Theorem 2 (second main theorem). Let A ⊆ N. If A ∈ DSPACE(2^n), and T is any TAS that row-computes A, then for all α ∈ A[T ], dom α ⊈ Q1 .
Proof (sketch). Assume for the sake of contradiction that for every terminal assembly α of T , dom α ⊆ Q1 . Since T row-computes A, there must be a path P in Gα from some point (x0 , y0 ) ∈ Q1 to some point along the x-axis. Moreover, the path P must “turn left” at some point. If this were not the case for every such path, then it would be possible to use condition (2) in the definition of row-computes to show that A ∈ DSPACE(n), which contradicts the fact that A ∈ DSPACE(2^n). Since there is one path that, en route to the x-axis, turns left (at some point), every successive path must do so. Because dom α ⊆ Q1 , there exists n ∈ N for which a path terminating at the point (n, 0) goes through the point (n + 1, 0). This clearly violates condition (1) of the definition of row-computes. Hence, our initial assumption must be wrong, and the theorem follows.
In other words, Theorem 2 says that if A has sufficient space complexity, then it is impossible to weakly self-assemble the set A × {0} in any “natural” way with the assembly contained entirely in the first quadrant. This is the sense in which the construction that we outlined at the beginning of this section is optimal.
6 Conclusion
In this paper, we investigated the self-assembly of decidable sets of natural numbers in the TAM. We first proved that, for every decidable language A ⊆ N, A × {0} and Ac × {0} weakly self-assemble. This implied a novel characterization of decidable sets in terms of self-assembly. Our second main theorem established that in order to achieve this compactness (i.e., self-assembly of A × {0} as opposed to f (A) × {0} for some function f ) for spatially complex languages, any “natural” construction will inevitably utilize strictly more than one quadrant of space. In fact, we conjecture that Theorem 2 holds for any TAS T in which A × {0} weakly self-assembles. Our results continue to expose the rich interconnectedness between geometry and computation in the TAM. Acknowledgments. This research was supported in part by National Science Foundation Grants 0652569 and 0728806. Both authors wish to thank Dave Doty for pointing out simplifications to Section 5.
References

1. Cheng, Q., Goel, A., de Espanés, P.M.: Optimal self-assembly of counters at temperature two. In: Proceedings of the First Conference on Foundations of Nanoscience: Self-assembled Architectures and Devices (2004)
2. Lathrop, J.I., Lutz, J.H., Patitz, M.J., Summers, S.M.: Computability and complexity in self-assembly. In: Proceedings of the Fourth Conference on Computability in Europe, Athens, Greece, June 15-20 (to appear, 2008)
3. Lathrop, J.I., Lutz, J.H., Summers, S.M.: Strict self-assembly of discrete Sierpinski triangles. In: Proceedings of the Third Conference on Computability in Europe, Siena, Italy, June 18-23 (2007)
4. Rothemund, P.W.K.: Theory and experiments in algorithmic self-assembly. Ph.D. thesis, University of Southern California (December 2001)
5. Rothemund, P.W.K., Winfree, E.: The program-size complexity of self-assembled squares (extended abstract). In: Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, pp. 459–468 (2000)
6. Soloveichik, D., Winfree, E.: Complexity of self-assembled shapes. SIAM Journal on Computing 36, 1544–1569 (2007)
7. Wang, H.: Proving theorems by pattern recognition – II. The Bell System Technical Journal XL(1), 1–41 (1961)
8. Wang, H.: Dominoes and the AEA case of the decision problem. In: Proceedings of the Symposium on Mathematical Theory of Automata (New York, 1962), pp. 23–55. Polytechnic Press of Polytechnic Inst. of Brooklyn, Brooklyn, NY (1963)
9. Winfree, E.: Algorithmic self-assembly of DNA. Ph.D. thesis, California Institute of Technology (June 1998)
Ultrafilter and Non-standard Turing Machines

Petrus H. Potgieter¹ and Elemér E. Rosinger²

¹ Department of Decision Sciences, University of South Africa (Pretoria), P.O. Box 392, Unisa, 0003
[email protected], [email protected]
² Department of Mathematics and Applied Mathematics, University of Pretoria, Pretoria, 0002
[email protected]
Abstract. We consider several kinds of non-finitary computation, using ordinary Turing machines, as usual, as the reference case. The main problem which this short paper tries to address is that of defining the output, or final message, of a machine which has run for a countably infinite number of steps. A modest scheme, using non-standard numbers, is proposed.
1 Introduction
Non-finitary machines are the work-horses of hypercomputation. However, these kinds of machines are subject to the usual problems of definability of the events – after the fact – as discussed in more detail in [1], among others. In fact, we are quite happy to interrogate the results – if convenient – of such a process, but we are rather reluctant to consider the wreck(s) of the poor machine. In some approaches the “evidence” is, conveniently, destroyed by some astrophysical event [2]. This paper is not a survey of the field of hypercomputation, of which there are very many incarnations [3], nor a position on non-finitary computation, of which, ditto [4]. It restricts itself, rather, to the question of how to deal with the machine and output description when we consider a realised non-finitary computation¹. We shall consider only Turing machines with unbounded input and output tapes, with an unbounded working tape, over a finite alphabet. This exposition does not specify any new feature of the machine, but rather proposes a definition of the output of non-halting computations. In this sense, it is akin to the infinite time Turing machines introduced by Hamkins and Seabold [5,6], although our approach differs – as we shall see – substantially from theirs. Our approach attempts to put the halting of classical Turing machines and the defined output of non-halting machines (a kind of machine in infinite time) within similar frameworks – based on filters and ultrafilters, respectively. This is hypercomputation in the same sense that any supertask machine (e.g. the infinite-time machines of Hamkins and Seabold) is. Again, like Hamkins and Seabold, we do not propose
Which some people would deny is a “computation” at all.
C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, pp. 220–227, 2008. c Springer-Verlag Berlin Heidelberg 2008
Ultrafilter and Non-standard Turing Machines
221
a physical mechanism for realising such machines. Obviously any of the speculative models for concretising supertask machines, e.g. based on Relativity Theory [7], could be used to propose a (pseudo-)physical model of our machines. Our model is essentially logical – as it has to be – since we are most concerned about defining the output of the machine for certain inputs for which it does not halt in finite time. We are, implicitly, assuming that the interesting forms of hypercomputation are related to infinite computational time. However, the logical difficulties – à la Zeno – do not arise from the identification of logical with computational time. Logical difficulties with infinite time computation have also been discussed elsewhere, e.g. [8,9]. These difficulties arise from much the same source as the cigars in the folkloric Hilbert Non-Smoking Hotel, where all guests are searched for, and relieved of, all tobacco on arrival but where each guest can nevertheless enjoy a cigar in their room – as Guest 1 got his from Guest 2, who got two cigars from Guest 3, who got three cigars from Guest 4, etc.
2
Accepting Computations by Stable Global States
A Turing machine computation on a machine T can be fully described by a sequence of natural numbers (k_n)_{n=0}^∞ where each k_i describes the condition (sometimes called the global state) of the machine – including the tapes, head position and internal state – after the i-th step in the computation. Call such a sequence a run of T. The input is, naturally, encoded by k_0 and if the input leads to an accepting computation, then the sequence is eventually constant. It is, incidentally, the concept of eventually constant which we shall try to generalise in this section. The sequence (k_n)_{n=0}^∞ is fully determined by k_0 and the rules that determine the machine T. Hyper-computational paradigms that incorporate time infinities normally restrict themselves to descriptions of ordinary Turing machines, and we shall freely abuse the notation to refer to T as both the ordinary Turing machine and the description of its rules of operation (i.e. the transition function), possibly interpreted in some hyper-computation paradigm. Consider the Fréchet filter F = {L ⊆ N : N \ L is finite} consisting of all co-finite subsets of the natural numbers N, where 0 ∈ N – as customary in computer science. Recall – although these properties will not be used immediately – that a filter on a set X is a family of subsets of X, not containing the empty set, and closed with respect to the taking of finite intersections and the forming of supersets.

Definition 1. A sequence (k_n)_{n=0}^∞ is an accepting computation whenever it is eventually constant, i.e. whenever it is constant on some element of the Fréchet filter F.

We have de-emphasised the notion of explicitly defined accepting state, without loss of generality. For, a Turing machine with an explicitly defined accepting
222
P.H. Potgieter and E.E. Rosinger
state obviously satisfies Definition 1, and a Turing machine with an accepting computation by the same definition can be rewritten with an explicitly defined accepting state. Suppose, now, G ⊇ F is an arbitrary collection of infinite subsets of N. We attempt to redefine an accepting computation using the elements of G instead of F as earlier, so as to – prospectively – enlarge the number of runs of a Turing machine T that could be considered to actually compute something.

Definition 2. A sequence (k_n)_{n=0}^∞ is a G-accepting computation whenever it is constant on some element of G.

When G is the Fréchet filter F, a G-accepting computation is an accepting computation in the usual sense. Definition 2 is especially interesting when we consider U-accepting computations, where U is an ultrafilter. Recall that an ultrafilter on X is a filter with the property that for each A ⊆ X, either A or its complement belongs to the filter. The Axiom of Choice guarantees the existence of ultrafilters U on N which contain F. Fix one such U. The subsets belonging to a specific filter are often seen as the large sets. The F-large sets are the cofinite subsets of N, and the natural generalisation of the cofinite sets is the collection of U-large sets. We can summarise the definitions so far:

Accepting computation. There exists an F-large set of points in time where the tape content as well as the internal state of the machine remain constant.
U-accepting computation. There exists a U-large set of points in time where the tape content as well as the internal state of the machine remain constant.

To see how a Thomson's lamp2 realised by a Turing machine is avoided in this approach, consider a machine T_TL with alphabet {−1, 1} which at time n writes (−1)^n to the first position of the output tape and has a minimal number of states.
The output tape of the machine is now a Thomson's lamp in the usual sense, but any run of this machine is a U-accepting computation since, by the properties of an ultrafilter, either the set of odd points in time or the set of even points belongs to U, and therefore T_TL is, w.r.t. U-accepting computation, equivalent to a machine outputting a constant bit on the tape – and that bit is either 1 or −1, depending on the filter U. It is not extraordinarily liberal to consider a run of T_TL an accepting computation, since this machine simply oscillates between two global states. As we shall see, the notion of G-accepting computations does not, unfortunately, go much beyond machines of the type T_TL. The following observation is self-evident.

Proposition 1. If a filter G ⊇ F then every G-accepting computation is either (i) an F-accepting computation, i.e. a usual accepting computation; or 2
Thomson’s lamp [10] is a variation of the Zeno paradox, where the switch of a lamp is toggled after 1 minute, then after half a minute, then after a quarter minute and so forth. The question is whether the lamp is on or off after two minutes have elapsed.
(ii) a computation that ends in a finite cycle of global states of the machine, in the fashion of Thomson's lamp.

Proof. Suppose that (i) does not hold, i.e. that the machine in question has a run (k_n)_{n=0}^∞ which is constant on some A ∈ G \ F. A is infinite, since if it were not, we would have A^c ∈ F ⊂ G by definition and hence ∅ = A ∩ A^c ∈ G, which would contradict the assumption that G is a filter. Since A is infinite, it has at least two distinct members, i.e. k_{m1} = k_{m2} for some m1, m2 ∈ A with m1 ≠ m2. The Turing machine is deterministic, of course, and hence k_{m1+1} = k_{m2+1}, etc.

This almost trivial remark shows that any departure from the usual notion of accepting computation, via the Fréchet filter F for ordinary Turing machine descriptions, requires one to abandon the notion of global stability for the global state of the machine k_n – even when relaxing it only slightly. Contrast this interpretation with the treatment of T_TL in the infinite machine framework of Hamkins and Seabold [5]. In their framework, the state of the machine is defined at the first limit ordinal ω and – if lim sup has been chosen over lim inf as the limiting operator for each cell of the tape – the content of the tape “at” the ordinal ω is just +1. In our approach, we do not – of course – know whether a run of T_TL, being a U-accepting computation, will “output” +1 or −1. For each choice of U it will be one of the two, but we do not know which, unless we have more information about U. Consider a slightly accelerated T_ATL which at time n writes (−1)^{n+1} to the first position of the output tape and has a minimal number of states. Clearly the “output” of T_ATL is either +1 or −1, depending on the choice of U. Furthermore, considered as U-accepting computations, the machines T_TL and T_ATL have opposite outputs. In the infinite-time machines of Hamkins and Seabold, however, T_TL and T_ATL are in the same state at ordinal ω. The present authors find it perhaps more intuitive that T_TL and T_ATL compute different outputs. For a start, the two machines – if started at the same time – are always in opposite states. On the other hand, if they are started one unit of time apart then they would always seem to be in the same state. It could very well be that the question – whether T_TL and T_ATL compute the same or opposite thing, or neither – is analogous to speculation about the virginity of Hamlet3 .
Simile used by Martin Davis in the Notices of the AMS, May 2008, with regard to the continuum hypothesis.
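The parity argument above can be sketched concretely. This is our own illustration, not from the paper: since the runs of T_TL and T_ATL are 2-periodic, U-acceptance needs only one membership question – whether the set of even times belongs to U. A genuine free ultrafilter is not computable, so the boolean `evens_in_U` below stands in for that single hypothetical oracle answer.

```python
# Sketch: outputs of the Thomson's-lamp machines T_TL and T_ATL
# relative to a hypothetical ultrafilter U containing the Frechet filter.
# `evens_in_U` is the (assumed) answer to the one membership question
# "does the set of even times belong to U?".

def run_T_TL(n):           # value at position 0 of the output tape at time n
    return (-1) ** n

def run_T_ATL(n):          # the "accelerated" variant, opposite phase
    return (-1) ** (n + 1)

def u_output(run, evens_in_U):
    """Output of a 2-periodic run w.r.t. U: the run is constant on
    whichever parity class U selects, and that constant is the output."""
    return run(0) if evens_in_U else run(1)

# Whichever parity class U contains, T_TL and T_ATL get opposite outputs:
for evens_in_U in (True, False):
    assert u_output(run_T_TL, evens_in_U) == -u_output(run_T_ATL, evens_in_U)
```

Either choice of parity class yields a well-defined constant output for each machine, and the two machines always disagree – matching the discussion above.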
3
Ultrafilter Machines
In addition to the two kinds of computation described in the preceding section, one can use the filters F and U to describe two further notions. The additional notions are described for machines with a dedicated output tape. In each case the content of the output tape is considered only for the appropriate large set of points in time.

Limit computation. For each F-small set of positions on the output tape, there exists an F-large set of points in time where those positions on the output tape do not change.
Ultrafilter computation. There exists a U-large set of points in time where the output tape is constant.

Limit computability is a well-known notion that is easily accommodated in this framework. Ultrafilter computation is a different and apparently somewhat simpler notion than limit computation, but it still avoids some pathologies, like the undefinability of the output of the Thomson's lamp machine T_TL. However, both of these notions are unclear about which part of the output tape (or all of it?) should be read as the output of the machine, and this leads – rather naturally – to the notion of non-standard machines. However, before we discuss these, let us consider one further toy example. Consider a Turing machine T_d which operates with the alphabet {−1, 1} as follows.

write "-1" on the tape up to position 98;
n = 0;
while 1 > 0 do
    write "+1" on position 99;
    go back and write "-1" on position 99;
    write "-1" in the 2^n positions to the right of 99;
    move back to position 99;
    n = n + 1;
end while;

Clearly the machine has neither an accepting computation nor a U-accepting computation (as defined above). It also does not describe any limit-computable function, because there is no F-large set in time for which position 99 is unchanged. However, it is conceivable that – for the right choice of U – it represents some ultrafilter computation.
If one only considers the output tape, then this is not so unreasonable – the tape is actually mainly filled with “-1”, with the exception of a very occasional appearance of “+1” in position 99. If one assumes that the tape is originally filled by the symbol “-1”, then the output tape of T_d (apart from the movement of the head) behaves exactly like the output tape of the machine executing the following actions.

write "-1" on the tape up to position 98;
n = 0;
while 1 > 0 do
    write "+1" on position 99;
    go back and write "-1" on position 99;
    wait for as long as it takes to write "-1" in the 2^n positions to the right of 99;
    n = n + 1;
end while;

Now, in the ordinal computation of Hamkins and Seabold, the machine T_d has the same state at ω as a device executing the following program.

write "-1" on the tape up to position 98;
n = 0;
while 1 > 0 do
    write "+1" on position 99;
    go back and write "-1" on position 99;
end while;

This seems mildly counter-intuitive to us.
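The sparseness of the “+1” instants can be made concrete. The following is our own illustration (not from the paper): a step-by-step timeline of the value at position 99 of T_d's output tape, showing that position 99 holds “+1” at infinitely many but ever sparser instants, so it is “-1” at an overwhelming majority of times without being constant on any F-large set.

```python
# Sketch: timeline of position 99 of T_d's output tape.  In iteration n
# the machine briefly writes +1, restores -1, and then spends 2^n steps
# writing -1 to the right, during which position 99 stays -1.

def position_99_timeline(iterations):
    timeline = []
    for n in range(iterations):
        timeline.append(+1)            # write "+1" on position 99
        timeline.append(-1)            # restore "-1" on position 99
        timeline.extend([-1] * 2**n)   # 2^n writes to the right; 99 stays -1
    return timeline

t = position_99_timeline(12)
assert t.count(+1) == 12               # +1 appears once per iteration...
assert t.count(+1) / len(t) < 0.01     # ...but at a vanishing fraction of times
```

This is why an ultrafilter computation for T_d is plausible: any U-large set witnessing constancy could simply avoid the sparse “+1” instants.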
4
Non-standard Turing Machines
The motivation in this section is to handle the output of Turing-type machines – even machines which are non-halting in the classical sense – in a unified framework. The basic idea is elementary: to simply define the sequence of output tape contents, at each discrete moment in time, as the output of the machine. For a start, if p(k) denotes the content of the output tape of a machine when its global description is k, then for each run (k_n) of the machine, (p(k_n)) will be the sequence of output tape contents. For the classically accepting computations, we shall want to identify the output sequence (p(k_n)) – which will be constant after a finite number of terms – with the limit of the sequence, which is exactly the classical output of the machine. If U is the ultrafilter discussed earlier, we proceed to use the notions of non-standard analysis, as in Robinson's development.

Definition 3. For each sequence of natural numbers (a_n) we set (a_n)_U = {(b_n) : {m : a_m = b_m} ∈ U}, which is the equivalence class of all sequences that agree with (a_n) on a U-large set of indices.

We shall abuse the notation and write k for the equivalence class of the sequence which has only one value, k. The following observation follows trivially from the preceding discussion.

Theorem 1. If a run (a_n) of a classical Turing machine T is an accepting computation with output k then
(i) for some ℓ ∈ N we have (a_n)_U = ℓ; and (ii) (p(a_n))_U = k.

The converse is, of course, not true – the Turing lamp machine T_TL being the obvious counter-example. We are now ready to see the output of a Turing-type machine as a, possibly infinite, non-standard natural number (c_n)_U where c_n = p(k_n) for some run (k_n) of the machine. However, the input of the machine can also be made a non-standard natural number. Suppose (a_i) is a sequence of natural numbers and let (k_n^i) denote a run of Turing machine T on input a_i – in the classical sense. We now define a run of T on (a_i) to be the diagonal sequence (k_i^i).

Remark 1. If (a_n)_U = (b_n)_U and (k_n) and (l_n) are runs of T on the two respective non-standard numbers, then (k_n)_U = (l_n)_U.

The preceding remark shows that runs, and consequently outputs (whether finite or infinite), are now well-defined for T on non-standard natural number inputs.

Definition 4. The output of T on input (a_n)_U is the class (p(k_n))_U where (k_n) is a run of T on (a_n).

The definition is well-founded, by the remark above. One can, furthermore, easily see that within this framework the halting problem for ordinary Turing machines can be solved by a machine that outputs (1)_U if the machine halts, and (0)_U otherwise. Let us call ordinary Turing machine descriptions, equipped with the scheme of operation described above, non-standard Turing machines (NSTMs or NST machines). It is clear, of course, what the concatenation of two NST machines would compute, as the output of an NSTM is always a valid input for another NSTM. However, exactly how this concatenation would be implemented on an NSTM – and whether this would be possible at all – is not clear. It is likely, for example, that certain additional conditions on the ultrafilter U will be required. The non-standard approach to computability has been investigated before, i.a.
in [11] and it is absolutely conceivable that an approach via functions (which would largely eliminate the time-dynamics of the output tape) is more sensible. The approach of the earlier literature is focussed, as far as the authors are aware, on characterizing classical computability.
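The collapse of Theorem 1 can be illustrated in code. This is our own sketch, not the paper's construction: for eventually constant runs – modelled here by a finite prefix plus a tail value, a representation we introduce for the demo – two sequences agreeing on a cofinite set are U-equivalent for every ultrafilter U containing the Fréchet filter, so the class (p(k_n))_U equals the classical output regardless of the choice of U.

```python
# Sketch: eventually constant sequences represented as (prefix, tail value).
# Agreement on a cofinite set (= only finitely many disagreements) makes two
# such sequences equivalent in the quotient by ANY ultrafilter U containing
# the Frechet filter F, so the class value is just the tail value.

def frechet_equivalent(prefix_a, tail_a, prefix_b, tail_b):
    """True iff the two eventually constant sequences agree on a cofinite
    set of indices; different tails force infinitely many disagreements."""
    return tail_a == tail_b

def u_class_value(prefix, tail):
    """The standard natural number that the class (a_n)_U equals,
    for every ultrafilter U containing the Frechet filter."""
    return tail

# A halting run: some transient garbage, then the final output 42 forever.
assert u_class_value([7, 3, 9], 42) == 42
# Changing finitely many terms does not change the class:
assert frechet_equivalent([7, 3, 9], 42, [0, 0, 0, 0, 0], 42)
assert not frechet_equivalent([7, 3, 9], 42, [7, 3, 9], 41)
```

For runs that are not eventually constant – such as those of T_TL – the class genuinely depends on U, which is exactly where the non-standard output extends the classical one.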
5
Conclusion
The paper has explored how the filter and ultrafilter concepts can be used to characterise the behaviour of certain non-classical computation schemes based on Turing machines. A fully non-standard scheme w.r.t. the input, output and run length – in which, however, the machine still has a classically finite description – is proposed as one way to overcome the problem of defining the output or final
global state of the machine. The authors regard this as a tentative proposal with some promise – at least for extending the vocabulary of hypercomputation by accelerated Turing machines.
References
1. Potgieter, P.H.: Zeno machines and hypercomputation. Theoretical Computer Science 358, 23–33 (2006)
2. Hogarth, M.L.: Does general relativity allow an observer to view an eternity in a finite time? Foundations of Physics Letters 5(2), 173–181 (1992)
3. Ord, T.: The many forms of hypercomputation. Applied Mathematics and Computation 178(1), 143–153 (2006)
4. Davis, M.: Why there is no such discipline as hypercomputation. Applied Mathematics and Computation 178(1), 4–7 (2006)
5. Hamkins, J.D., Lewis, A.: Infinite time Turing machines. The Journal of Symbolic Logic 65, 567–604 (2000)
6. Hamkins, J.D., Seabold, D.E.: Infinite time Turing machines with only one tape. Mathematical Logic Quarterly 47, 271–287 (2001)
7. Hogarth, M.L.: Does general relativity allow an observer to view an eternity in a finite time? Foundations of Physics Letters 5, 173–181 (1992)
8. Cotogno, P.: Hypercomputation and the physical Church-Turing thesis. British Journal for the Philosophy of Science 54, 181–223 (2003)
9. Cohen, R.S., Gold, A.Y.: Theory of ω-languages. I. Characterizations of ω-context-free languages. Journal of Computer and System Sciences 15, 169–184 (1977)
10. Thomson, J.: Tasks and super-tasks. Analysis 15, 1–13 (1954–1955)
11. Richter, M.M., Szabo, M.E.: Nonstandard methods in combinatorics and theoretical computer science. Studia Logica 47, 181–191 (1988)
Parallel Optimization of a Reversible (Quantum) Ripple-Carry Adder Michael Kirkedal Thomsen and Holger Bock Axelsen DIKU, Department of Computer Science, University of Copenhagen Universitetsparken 1, DK-2100 Copenhagen Ø, Denmark
[email protected],
[email protected]
Abstract. The design of fast arithmetic logic circuits is an important research topic for reversible and quantum computing. A special challenge in this setting is the computation of standard arithmetical functions without the generation of garbage. The CDKM-adder is a recent garbage-less reversible (quantum) ripple-carry adder. We optimize this design with a novel parallelization scheme wherein m parallel k-bit CDKM-adders are combined to form a reversible mk-bit ripple-block carry adder with logic depth O(m + k), for a minimal logic depth O(√(mk)), thus improving on the mk-bit CDKM-adder logic depth O(m · k). We also show designs for garbage-less reversible set-less-than circuits. We compare the circuit costs of the CDKM and parallel adders in measures of circuit delay, width, gate and transistor count, and find that the parallelized adder offers significant speedups at realistic word sizes with modest parallelization overhead. Keywords: Reversible computing, circuits, adders, quantum computing.
1
Introduction
We are reaching the end of Moore's law [10]. In the near future, miniaturization will bottom out at the atomic level, and the classical circuit model will be insufficient as a computational model for future performance gains in a realm where quantum effects dominate. However, we have already reached a point where power consumption and dissipation impose severe constraints on processing throughput. For this reason unconventional computation paradigms must be developed and evaluated presently [11]. Reversible computing [8,2,14,17,12], wherein computations are organized without any (logical) information loss, promises to reduce power consumption in the computation process dramatically. This follows from Landauer's principle [8], which states that the erasure, not the generation, of information necessitates energy dissipation as heat, which translates directly to power consumption: lossy operations make a processor power-hungry and hot. The immediate deployment of reversible computing principles in current CMOS technology [15] could help alleviate both these problems for a dual power saving, as the energy used to cool computing machinery is now comparable to the energy needed to drive the computations. C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, pp. 228–241, 2008. c Springer-Verlag Berlin Heidelberg 2008
In addition to this immediate application scope, reversible computing is also of great importance in the growing field of quantum computing. Non-measurement transformations of a quantum state must be unitary and therefore reversible. Indeed, the improvements in this paper arise from purely classical reversibility considerations w.r.t. a quantum adder. Cuccaro et al. [3] improved on a reversible (quantum) ripple-carry adder by Vedral et al. [16]. The resulting CDKM-adder requires only one ancilla bit and no garbage bits.1 Here, we improve on this design by parallelizing m such k-bit CDKM-adders to form an mk-bit ripple-block carry adder with circuit delay O(m + k), as opposed to O(m · k) for an mk-bit CDKM-adder. The resulting circuit is preferable to ordinary ripple-carry adders when circuit delay is critical, even for small mk. We also present novel less-than comparison circuits, used for each k-bit block in the parallelization. Overview. In Sec. 2 we provide an introduction to reversible circuits, and in Sec. 3 reversible adders are defined. We show the CDKM-adder (Sec. 3.1) and present a novel less-than comparison circuit (Sec. 3.2). In Sec. 4, we introduce our parallelization scheme, formally prove that the parallelization mechanism is correct, and compare with the CDKM-adder. We conclude in Sec. 5.
2
Reversible Circuits
This paper describes reversible (quantum) circuits. We use the formalism of [16,3], which is based on [14]. The reversible circuit model mirrors the classical irreversible model except that (1) fan-out is not permitted, and (2) only reversible gates are allowed. A reversible logic circuit is thus an acyclic network of reversible logic gates. The reversible gates used in this paper are described below. 2.1
Reversible Logic Gates
A conventional logic gate as used in today’s computers is a function from one or more boolean inputs to one boolean output. Common for nearly all of these gates is that they are not reversible. For a gate of width n to be (logically) reversible, it must hold that (1) the number of input lines is equal to the number of output lines (written n × n) and (2) the boolean function Bn → Bn of the gate is bijective. Not gate. The NOT gate (Fig. 2) is the simplest nontrivial reversible gate, having one input and one output line. 1
In practical quantum computing, minimal circuit width is critical, as more bits require a larger number of qubits to be coherent, a difficult physics and engineering challenge that has yet to be solved efficiently.
Fig. 1. Shorthand: diagram (a) is abbreviated as diagram (b)
Fig. 2. NOT gate (1 × 1)
Fig. 3. n-bit CNOT gate (n × n)
Fig. 4. Fredkin gate (3 × 3)
Controlled-not gate. The n-bit controlled-not (CNOT) gate [14] (Fig. 3) is an n × n gate. Line An is negated iff all controls A1 , A2 , ..., An−1 are true. The control lines remain unchanged. For historical reasons the 2-bit CNOT gate is called a Feynman gate and the 3-bit CNOT gate is called a Toffoli gate. Fredkin gate. The Fredkin gate [7] (Fig. 4) is a 3 × 3 gate that performs a controlled swap operation: lines B and C are swapped iff the control A is true.2 Notation. The following notation for boolean logic expressions is used throughout the paper: A for the negation ¬A, AB for the conjunction A ∧ B, A ⊕ B for the exclusive-or A XOR B. The order of precedence is A, AB, A ⊕ B. The shorthand in Fig. 1(b) is used in place of Fig. 1(a) for denoting negative control. 2.2
Cost Metrics
The cost of realizing reversible logic gates depends greatly on the underlying technology (e.g., quantum computing, low-power CMOS). Reversible gates can and do have different relative costs in different technologies. This can lead to different optimizations for different implementations and suggests technology-motivated cost metrics [9]. In this paper, we consider a dual-line pass-transistor CMOS technology for calculating circuit delay and accurate transistor costs. Silicon prototypes of reversible circuits have been fabricated using this technology [4,15]. In a dual-line CMOS realization, where each logic variable A is represented by two physical lines, one representing A and one representing its negation, the NOT gate and negative control are simple cross-overs and, thus, incur no transistor cost or gate delay. Gate costs. An abstract cost metric that does not take into account the complexity of the gates may derive a lower cost than does the actual implementation [9]. Several cost functions, including a simple gate count, have been reported 2
The principle generalizes to n-bit controlled-swap gates, where the Fredkin gate is the 3-bit version.
in the literature. However, a simple gate count does not always lead to a technologically preferable solution. To provide a more accurate cost estimate, we use the transistor cost of realizing the circuits in dual-line pass-transistor CMOS [15]. The NOT gate has no transistor cost, the n-bit CNOT gate costs 8(n − 1) transistors, and the Fredkin gate costs 16 transistors. Circuit delay. We use a simple delay metric with the assumptions that (1) all circuit inputs are available at the same time, (2) n-bit CNOT and Fredkin gates have unit delay, and (3) a NOT gate has no delay. This cost function is identical to the logic depth if we ignore NOT gates, and we use the terms interchangeably. The circuit delay is thus equal to the maximum number of n-bit CNOT and Fredkin gates on any input-output path of a circuit. Garbage and ancilla bits. As one of the central goals of reversible circuits is to avoid information destruction, the number of auxiliary bits used is a third important characteristic of circuit complexity. A garbage bit is a non-constant output line that is not part of the wanted result, but is required for logical reversibility. An ancilla bit (usually used as a scratch variable) is a bit-line in a circuit that is assured to be constant (e.g., set to 0) at both input and output. The distinction is important: while garbage bits will accumulate with repeated use of a circuit, ancillae will not.
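The defining property of Sec. 2.1 – that a reversible gate of width n is a bijection on B^n – can be checked by exhaustive enumeration. The following is our own minimal sketch (not part of the paper's formalism), modelling the CNOT and Fredkin gates as functions on bit tuples:

```python
from itertools import product

# Sketch: model the reversible gates of Sec. 2.1 as functions on bit tuples
# and verify that each is a bijection B^n -> B^n, the defining property of
# an n x n reversible gate.

def cnot(bits):
    """n-bit CNOT: negate the last line iff all control lines are 1."""
    *controls, target = bits
    return (*controls, target ^ int(all(controls)))

def fredkin(bits):
    """Fredkin gate: swap lines B and C iff control A is 1."""
    a, b, c = bits
    return (a, c, b) if a else (a, b, c)

def is_bijection(gate, width):
    images = {gate(bits) for bits in product((0, 1), repeat=width)}
    return len(images) == 2 ** width       # injective on a finite set

assert is_bijection(fredkin, 3)
for n in (2, 3, 4):                        # Feynman (n=2), Toffoli (n=3), ...
    assert is_bijection(cnot, n)
```

Both gates are also self-inverse, which is the property exploited later when the CDKM-adder uncomputes its carries.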
3
Reversible Adders
Fast logic adders are essential for efficient computing, and have therefore received much attention in conventional circuit design. The same importance holds for reversible (and quantum) circuits. An immediate problem for reversible adder implementation is that addition is not a reversible function in itself. Given just the value of the sum A + B, one cannot determine A and B uniquely. This means that addition cannot be directly implemented in reversible (and quantum) computing. What we can do instead is to use reversible updates [19,1,18]. Intuitively, we can reversibly augment a value A with a value that does not depend on A. The concept is illustrated in the n-bit controlled-not gate, where (A1, A2, . . . , An−1, An) → (A1, A2, . . . , An−1, An ⊕ (A1 A2 . . . An−1)). In the case of n-bit addition, the natural reversible update is (A, B) → (A, B + A). The sum A + B may overflow, however, so we use modular addition, (A, B) → (A, B + A mod 2^n), to define reversible addition. The preservation of the A input gives us that this is an injective function, and it should therefore be implementable reversibly.
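The reversible update just defined can be sketched arithmetically. This is our own illustration (the word size N = 4 is chosen only for the demo): since (A, B) → (A, B + A mod 2^n) is injective, it has an inverse – modular subtraction – and composing the two restores the input, which is what "implementable reversibly" amounts to at the functional level.

```python
# Sketch: the reversible update (A, B) -> (A, B + A mod 2^n) and its
# inverse.  Preserving A makes the map injective, hence invertible.

N = 4                                    # word size in bits (demo assumption)
MOD = 2 ** N

def add(a, b):
    return a, (b + a) % MOD              # forward reversible update

def sub(a, s):
    return a, (s - a) % MOD              # the inverse update

for a in range(MOD):
    for b in range(MOD):
        assert sub(*add(a, b)) == (a, b)   # reversibility on the whole domain
```

The CDKM circuit in Sec. 3.1 realizes exactly this function at the gate level, with B overwritten by the sum and A preserved.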
Fig. 5. 1-bit CDKM-adder (majority, unmajority, and sum calculation circuits)
Fig. 6. n-bit CDKM ripple-carry adder [3]. MAJ is the majority circuit while UMS is the combined unmajority and sum calculation circuit.
Notation. Let A and B be n-bit numbers, with 0 ≤ i < j < n. By A_i we denote the i'th bit of A. A_{i..j} denotes the j − i + 1 bits of A from A_i to A_j inclusive. The formulas for calculating the sum and carry-out are the well-known ones from conventional logic,

S_i = C_i ⊕ A_i ⊕ B_i                          (1)
C_{i+1} = C_i(A_i ⊕ B_i) ⊕ A_i B_i             (2)

Note that this implies S = (A + B + C_0 mod 2^n), and C_n = (A + B + C_0 ≥ 2^n). The partial carry _iC_j denotes the carry-out of A_{i..j−1} + B_{i..j−1}, i.e. it is the carry-out C_j under the assumption that the carry-in C_i = 0. Note that _0C_j = C_j.

3.1
The CDKM-adder
A recent approach to the reversible adder [16,3] uses the fact that the carry-out and the sum are not necessarily needed at the same time in a ripple-carry adder. The CDKM-adder (Fig. 5) first calculates the carry-out, which can be propagated in standard ripple-carry fashion to the next adder (Fig. 6). The carry calculation is implemented by the majority circuit. When the carry-out has been used, it can be uncomputed by an unmajority circuit, and the sum calculated by a sum circuit. When implementing the circuit, the unmajority and add circuits can be merged into one unmajority-and-sum circuit, reducing the depth by two gates.3 3
The rightmost gate of the unmajority circuit and the leftmost gate of the add circuit are the same, and since controlled-not gates are their own inverses they can be omitted.
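The compute-then-uncompute ripple can be sketched at bit level. The gate ordering below is our own reconstruction of one consistent variant, not a transcription of [3]: MAJ computes the carry-out in place on the a-line with two CNOTs and a Toffoli, and the merged UMS circuit uncomputes it, restores the carry-in line, and leaves the sum of (1) on the b-line, using no garbage and only the single carry-in ancilla.

```python
# Sketch of the CDKM ripple principle (our reconstruction, one of several
# equivalent gate orderings).  MAJ leaves MAJ(a, b, c) = carry-out on the
# a-line; UMS undoes it and leaves the sum a XOR b XOR c on the b-line.

def maj(c, b, a):
    b ^= a          # CNOT
    c ^= a          # CNOT
    a ^= b & c      # Toffoli: a now holds the carry-out MAJ(a, b, c)
    return c, b, a

def ums(c, b, a):
    a ^= b & c      # Toffoli: uncompute the carry, restoring a
    c ^= a          # CNOT: restore the carry-in on the c-line
    b ^= c          # CNOT: b now holds the sum a XOR b XOR c_in
    return c, b, a

def cdkm_add(a, b, n, c0=0):
    """Garbage-free n-bit addition: block i's carry-in line is the a-line
    of block i-1, exactly as in the ripple of Fig. 6."""
    A = [(a >> i) & 1 for i in range(n)]
    B = [(b >> i) & 1 for i in range(n)]
    anc = c0
    for i in range(n):                     # forward ripple of MAJ blocks
        cin = anc if i == 0 else A[i - 1]
        cin, B[i], A[i] = maj(cin, B[i], A[i])
        if i == 0: anc = cin
        else: A[i - 1] = cin
    for i in reversed(range(n)):           # backward ripple of UMS blocks
        cin = anc if i == 0 else A[i - 1]
        cin, B[i], A[i] = ums(cin, B[i], A[i])
        if i == 0: anc = cin
        else: A[i - 1] = cin
    assert anc == c0                       # ancilla line restored
    return sum(B[i] << i for i in range(n))

assert all(cdkm_add(a, b, 4) == (a + b) % 16
           for a in range(16) for b in range(16))
```

Exhaustive 4-bit checking confirms both the sum of equations (1)-(2) and the restoration of the A bits and the ancilla, which is the garbage-freeness claimed above.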
This is a dramatic improvement compared to previous adders [6], as this approach reduces the width of the circuit to a minimum and the number of garbage bits to zero, using just one ancilla bit. This comes at the cost of more gates and a deeper circuit, but both of these caveats are acceptable compared with garbage generation. Implementing an n-bit adder from the 1-bit CDKM-adder is done using the ripple-carry principle (Fig. 6). The first half is a ripple of majority circuits and the second half is a “backwards” ripple of the unmajority-and-sum circuits. 3.2
Comparison Circuits
Comparison of values is important to determine the control flow in programs and processors alike. Along with the adder, comparison circuits are thus a necessary part of the ALU of any processor. In reversible logic the adders have so far received significantly more attention than comparison circuits. Anticipating the parallelization scheme in Sec. 4, we shall require garbage-free reversible circuits for equality (=) and less-than (<) comparison of n-bit numbers. Equality comparison of two bits A_i and B_i is quite straightforward to implement as A_i ⊕ B_i, and this principle is easy to generalize. This section will therefore focus on implementing the set-less-than circuit (SLT). Given two n-bit numbers A = A_{0..n−1} and B = B_{0..n−1}, the comparison A < B can be defined as

A < B  =def  (A_{n−1} < B_{n−1}) ∨ (A_{n−1} = B_{n−1} ∧ (A_{0..n−2} < B_{0..n−2}))
       =     ¬A_{n−1} B_{n−1} ⊕ (¬(A_{n−1} ⊕ B_{n−1}) ∧ (A_{0..n−2} < B_{0..n−2})).

This defines a recurrence for the less-than of the i first bits L_i (analogous to the carry for the i first bits C_i) by

L_{i+1}  =def  L_i ¬(A_i ⊕ B_i) ⊕ ¬A_i B_i            (3)
         =     L_i ¬(A_i ⊕ B_i) ⊕ B_i (A_i ⊕ B_i),    (4)

where L_0 = 0. We have that _iL_j = (A_{i..j−1} < B_{i..j−1}), reusing the notation for partial carries. A 1-bit less-than circuit can be implemented in reversible logic using one Feynman and one Fredkin gate (Fig. 7). The use of a Fredkin gate instead of Toffoli gates is inspired by the adder implementation in [12]. Multiple 1-bit less-thans can be combined into an n-bit SLT circuit with the ripple principle used in the CDKM-adder, as shown in Fig. 8 (where r-SLT denotes the inverse of the 1-bit SLT circuit). Cost estimates for this circuit can be seen in Table 1. The ripple SLT requires an ancilla bit, and has a delay of approximately 2n. Note that if the ancilla input in the ripple circuit is set to 1 instead of 0, the circuit then calculates less-than-or-equal-to (≤). The recurrence (3) can also be combinatorially unfolded into n controlled-not gates, as shown in Fig. 9. This improves the depth of the less-than circuit to approximately n, and obviates the need for an ancilla. However, the number of
234
M.K. Thomsen and H.B. Axelsen Li Ai Bi
Li Ai Bi
Li+1
Fig. 7. Block in ripple SLT (A < B) circuit
Fig. 8. Ripple SLT circuit
Fig. 9. Triangular SLT circuit
Table 1. Costs of reversible n-bit circuits

                  CDKM-adder   Ripple SLT   Triangular SLT   Carry correction
Transistor cost   64n          48n + 8      4n² + 28n        4n² + 4n
Gate cost         6n           4n + 1       3n               n
Garbage bits      0            0            0                0
Ancillae bits     1            1            0                0
Circuit width     2n + 1       2n + 2       2n + 1           n + 1
Delay in gates    5n + 1       2n + 3       n + 2            n
line connections (and hence the transistor cost) increases quadratically with n. When less-than functionality is needed, the choice between the two designs will depend on which resources are considered more precious in the context of the application.
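As a software sanity check (ours, not a circuit from the paper), the recurrence (3) — with the complement bars restored as L_{i+1} = L_i·¬(A_i ⊕ B_i) ⊕ ¬A_i·B_i — can be simulated and compared against ordinary integer comparison:

```python
def less_than(a_bits, b_bits):
    """Evaluate A < B by rippling L_i from the least significant bit."""
    l = 0  # L_0 = 0
    for ai, bi in zip(a_bits, b_bits):
        # L_{i+1} = L_i * NOT(A_i XOR B_i)  XOR  NOT(A_i) * B_i
        l = (l & (ai ^ bi ^ 1)) ^ ((ai ^ 1) & bi)
    return l

def bits(x, n):
    return [(x >> i) & 1 for i in range(n)]  # LSB first

n = 4
assert all(less_than(bits(a, n), bits(b, n)) == int(a < b)
           for a in range(2 ** n) for b in range(2 ** n))
print("recurrence (3) matches < on all 4-bit pairs")
```

The exhaustive assertion over all 256 pairs of 4-bit operands passes, which also confirms the equivalent form (4), since ¬A_i·B_i = B_i·(A_i ⊕ B_i).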
Parallel Optimization of a Reversible (Quantum) Ripple-Carry Adder
4 Optimization Using Parallelization
The delay of an n-bit ripple-carry adder increases linearly with n. When n becomes large, ripple-carry adders therefore become prohibitively slow. This has led to the development of numerous adder optimization schemes, such as the carry-lookahead, carry-save, and conditional-sum adders. Common to these schemes is that they have been designed as irreversible adders and cannot be used cleanly⁴ for optimizing a reversible adder [5]. Here, we present a parallelization scheme that makes use of m parallel blocks of k-bit CDKM-adders, and combines them to form an mk-bit adder without garbage. The instance where m = k = 4 is shown in Fig. 11. The idea for this scheme originates in the addition of binary-coded decimal numbers, which is easily parallelized into blocks of 4 bits [13]. The parallelization scheme works as follows.

4.1 Carry Correction
Each of the m parallel k-bit CDKM-adders assumes that the carry-in to the block is 0, i.e., if a block works on Ai..i+k−1 and Bi..i+k−1 then for all j, 0 ≤ j ≤ k, the (majority) block will compute with carries set to ⁱCi+j, not Ci+j (unless i = 0). Here, ⁱCj denotes the carry at position j computed under the assumption that the carry into position i is 0. In order for parallelization to work, we need to establish an efficiently computable connection between the carries calculated by a block and the correct carries for that block.

Lemma 1 (Carry correction). Let A and B be n-bit binary numbers. Then for all i and j such that 0 ≤ i < j ≤ n it holds that

Cj = (Ci (Ai ⊕ Bi)(Ai+1 ⊕ Bi+1) · · · (Aj−1 ⊕ Bj−1)) ⊕ ⁱCj.

Proof. By mathematical induction on the difference j − i.
Base case: The difference is 1 (j = i + 1). We must prove that Ci+1 = Ci (Ai ⊕ Bi) ⊕ ⁱCi+1. By the definition of carry-out (2), ⁱCi+1 = 0·(Ai ⊕ Bi) ⊕ Ai Bi = Ai Bi, and so Ci (Ai ⊕ Bi) ⊕ ⁱCi+1 = Ci (Ai ⊕ Bi) ⊕ Ai Bi, which by definition is Ci+1.
Induction step: Assume that the induction hypothesis holds for a difference of h (j = i + h),

Ci+h = (Ci (Ai ⊕ Bi)(Ai+1 ⊕ Bi+1) · · · (Ai+h−1 ⊕ Bi+h−1)) ⊕ ⁱCi+h.

We then need to prove that the equality holds for j = i + h + 1, i.e., that

Ci+h+1 = (Ci (Ai ⊕ Bi)(Ai+1 ⊕ Bi+1) · · · (Ai+h ⊕ Bi+h)) ⊕ ⁱCi+h+1.
⁴ Where clean means without the generation of garbage bits.
Fig. 10. Triangular circuit for carry correction
We have that

Ci+h+1 =(a) Ci+h (Ai+h ⊕ Bi+h) ⊕ Ai+h Bi+h
       =(b) ((Ci (Ai ⊕ Bi)(Ai+1 ⊕ Bi+1) · · · (Ai+h−1 ⊕ Bi+h−1)) ⊕ ⁱCi+h)(Ai+h ⊕ Bi+h) ⊕ Ai+h Bi+h
       =(c) (Ci (Ai ⊕ Bi)(Ai+1 ⊕ Bi+1) · · · (Ai+h ⊕ Bi+h)) ⊕ ⁱCi+h (Ai+h ⊕ Bi+h) ⊕ Ai+h Bi+h
       =(a) (Ci (Ai ⊕ Bi)(Ai+1 ⊕ Bi+1) · · · (Ai+h ⊕ Bi+h)) ⊕ ⁱCi+h+1,

where =(a) follows by the definition of carry-out (2), =(b) follows by the induction hypothesis, and =(c) follows by distribution of AND over exclusive-or.

Given Ci and (A ⊕ B)i..j−1, we now have a formula for correcting an intermediate carry ⁱCj to the correct carry Cj. A triangular carry-correction circuit for correcting the outputs following a majority ripple-block is shown in Fig. 10 (cost of circuit in Table 1). For the parallelized mk-bit adder (cf. Fig. 11) carry correction can be done with a delay of m + k − 1. Following this correction, the mk-bit sum S can be computed for each k-bit block in parallel, using m unmajority-and-sum circuits.
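Lemma 1 can also be spot-checked in software; the sketch below (helper names are ours, not the paper's) brute-forces all 4-bit operand pairs, computing the partial carries ⁱC_j by restarting the majority recurrence at position i with carry 0:

```python
from itertools import product

def carries(a_bits, b_bits, c0):
    """Majority carry recurrence: C_{l+1} = C_l(A_l ^ B_l) ^ A_l B_l."""
    cs = [c0]
    for al, bl in zip(a_bits, b_bits):
        cs.append((cs[-1] & (al ^ bl)) ^ (al & bl))
    return cs  # cs[l] = C_l

n = 4
for a in product([0, 1], repeat=n):
    for b in product([0, 1], repeat=n):
        for c0 in (0, 1):
            c = carries(a, b, c0)              # the correct carries
            for i in range(n):
                ic = carries(a[i:], b[i:], 0)  # ic[j - i] = iC_j
                for j in range(i + 1, n + 1):
                    prod = c[i]
                    for l in range(i, j):      # Ci * (A_i^B_i) * ... * (A_{j-1}^B_{j-1})
                        prod &= a[l] ^ b[l]
                    assert c[j] == prod ^ ic[j - i]
print("Lemma 1 verified for all 4-bit operands")
```

Note that the lemma holds for any value of the initial carry C0, as the inductive proof never assumes C0 = 0.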
4.2 Uncomputation of Carries Ck, C2k, . . . , C(m−1)k
The parallel sum computation leaves m − 1 intermediate carries Ck, C2k, . . . , C(m−1)k as garbage (cf. Fig. 11). If these are instead to be considered ancillae, they must be uncomputed for safe removal. To uncompute these carries efficiently, we shall prove a simple arithmetical dependency between the carry-in C0, carry-out Ck, addend A, and sum S = (A + B + C0 mod 2^k) of a k-bit full addition (noting that corresponding values are available for the uncomputation of the intermediate carries).
Lemma 2 (Carry-sum dependency). Let A, B, S be k-bit (non-negative) binary numbers ∈ {0 . . . 2^k − 1} and C0, Ck be single bits (truth values) as described above. Then

Ck = C0 (A = S) ⊕ (S < A).

Proof. By case analysis on the truth value of the equality comparison A = S.
Case (A = S) ∧ C0: A = S implies that (B + C0 mod 2^k) = 0. Since C0 = 1 this implies that B = 2^k − 1, and so A + B + C0 ≥ 2^k, and therefore Ck = 1.
Case (A = S) ∧ ¬C0: As above, (B + C0 mod 2^k) = 0, and since C0 = 0 this implies that B = 0. Hence A + B + C0 < 2^k, and so Ck = 0.
Case A > S: S cannot be smaller than A unless A + B + C0 ≥ 2^k, as B and C0 are non-negative. Hence Ck = 1.
Case A < S: If A is smaller than S, then, since B + C0 ≤ 2^k, we have S = A + B + C0, and because this is less than 2^k, Ck = 0.
Noting that these cases are mutually exclusive, we can combine them to get
Ck = C0 (A = S) ⊕ (S < A).

Corollary 1. Let A, B, S be n-bit numbers defined as in Lemma 2. For all i, j with 0 ≤ i < j ≤ n, the following recurrence holds:

Cj = Ci (Ai..j−1 = Si..j−1) ⊕ (Si..j−1 < Ai..j−1).

Proof. By the definition of sum and carry, and a simple application of Lemma 2.
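Lemma 2 is easy to verify exhaustively for small word sizes; the check below is our own sketch, not part of the paper:

```python
# Exhaustive check of Lemma 2 for k = 4: with S = (A + B + C0) mod 2^k,
# the carry-out satisfies C_k = C0(A = S) XOR (S < A).
k = 4
for a in range(2 ** k):
    for b in range(2 ** k):
        for c0 in (0, 1):
            s = (a + b + c0) % (2 ** k)
            ck = int(a + b + c0 >= 2 ** k)  # carry-out of the k-bit addition
            assert ck == (c0 & int(a == s)) ^ int(s < a)
print("Lemma 2 verified for k = 4")
```

All 512 cases pass, covering each of the four branches of the proof.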
We have thus reduced the carry-out Cj to depend only on Ci, Ai..j−1 and Si..j−1. Note that the equality comparison Ai..j−1 = Si..j−1 is easy to compute as

(Ai..j−1 = Si..j−1) = ¬(Ai ⊕ Si) ¬(Ai+1 ⊕ Si+1) · · · ¬(Aj−1 ⊕ Sj−1),

since Ai..j−1 = Si..j−1 iff they agree on all bits. We can use the recurrence of Cor. 1 in a manner entirely analogous to the forwards carry correction and remove the dependency on Cik from C(i+1)k (where 0 < i < m − 1), leaving only the truth value of (Si..i+k−1 < Ai..i+k−1) = ⁱLi+k in the ancillae. These can then be (un)computed in parallel for each k-bit block (see Fig. 11), using the (triangular) SLT circuit defined in Sec. 3.2. Thus, the parallelized uncomputation of the m − 1 intermediate carries requires a circuit with delay m + k.
4.3 Total Cost of Parallelization
The full ripple-block carry adder (RBCA) is implemented as shown in Fig. 11, which shows the instance where m = k = 4. In Table 2 we give the circuit cost according to the metrics of Sec. 2.2. Notice that the final circuit has no garbage bits, uses m ancillae bits, and has delay O(m + k). Using these costs we calculate the combination of k and m such that the delay is minimal. The exact delay in gates is given by RBCAd (m, k) = 2m + 7k ,
Fig. 11. Layout of the ripple-block carry adder, where m = k = 4. The right-side fat Feynman gates denote a Feynman gate for each pair of lines. The fat control on the other CNOT gates signifies a control for each line.

Table 2. Costs of ripple-block carry adder. Costs for sub-circuits are found in Table 1.

n-bit parallelized adder with m blocks of size k
Transistor cost   (m−1)(8k + 8 + CCt(k) + SLTt(k)) + (m−2)(8k + 8) + m·CDKMt(k)
Gate cost         2m − 3 + (m−1)(CCg(k) + SLTg(k)) + m·CDKMg(k)
Garbage bits      0
Ancillae bits     m
Circuit width     2n + m
Delay in gates    2m − 3 + CCd(k) + SLTd(k) + CDKMd(k)
which yields the values kmin = √(2n/7) and mmin = √(7n/2) for the minimal delay

RBCAd(mmin, kmin) = 2√(14n).
The corresponding transistor cost for the minimal delay circuit is then given by

RBCAt(mmin, kmin) = 8(√(2/7)·n√n + 96n/7 − 3).

Table 3 shows some common combinations and compares them with the CDKM-adder in the delay and transistor count cost metrics. The comparison shows that, e.g., for 64-bit numbers it is possible to get a speedup of more than 5 over the CDKM-adder. Not surprisingly, the speedup from parallelization comes at the cost of more transistors, though still less than a factor of 3 for 128-bit adders.
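The continuous optimum can be compared against the best integer split; this brute-force sketch (ours) minimizes RBCA_d(m, k) = 2m + 7k over factorizations n = m·k:

```python
import math

def best_split(n):
    """Minimum of 2m + 7k over integer factorizations n = m * k."""
    return min((2 * m + 7 * (n // m), m, n // m)
               for m in range(1, n + 1) if n % m == 0)

for n in (8, 16, 32, 64, 128):
    d, m, k = best_split(n)
    print(f"n={n:4d}: best integer delay {d:3d} at m={m:2d}, k={k:2d}"
          f"  (2*sqrt(14n) = {2 * math.sqrt(14 * n):6.1f})")
```

For n = 64 this yields a delay of 60 at m = 16, k = 4, matching the lowest-delay entry in Table 3 and closely tracking the continuous bound 2√(14·64) ≈ 59.9.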
Table 3. The delay and transistor cost of the ripple-block carry adder compared to the CDKM-adder. The numbers marked with * are those with the lowest delay and therefore those that are compared with the CDKM-adder. The percentages indicate performance of the ripple-block carry adder relative to the CDKM-adder as benchmark.

Delay in gates
 n \ m       2       4       8      16      32   CDKM-adder
   8        32     *22      23       –       –        41    54%
  16        60      36     *30      39       –        81    37%
  32       116      64     *44      46      71       161    27%
  64       228     120      72     *60      78       321    19%
 128       452     232     128     *88      92       641    14%

Transistor cost
 n \ m       2       4       8      16      32   CDKM-adder
   8       800    *912     992       –       –       512   178%
  16      1856    1984   *2000    2080       –      1024   195%
  32      4736    4704   *4352    4176    4256      2048   213%
  64     13568   12448   10400   *9088    8528      4096   222%
 128     43520   37152   27872  *21792   18560      8192   266%

4.4 Discussion
Improved designs for majority-based ripple-carry adders exist [3,12] that have better run-time (delay) constants than those for the CDKM-adder used here. Employing these designs will improve the delay of a ripple-carry adder, but they will also improve the delay of the parallelized ripple-block carry adder. The transistor cost of the triangular SLT and carry-correction circuits, used in Fig. 11, grows quadratically with the size of the blocks, so for larger block sizes it may be preferable to use ripple circuits instead. It is possible to exchange both triangular circuits with ripple circuits, at the cost of a higher delay and more ancilla bits, but we shall not show this here. The dual-line pass-transistor CMOS technology considered here has the fortunate property that the different cost metrics of the controlled-not gates grow at most linearly with the number of control lines. Other technologies, such as the ones suggested for quantum computing, do not necessarily have this property. However, implementations in such domains can still benefit from the optimization by using the ripple versions of the triangular circuits that do not use wide gates. Power efficiency. As mentioned, the optimization comes at the price of additional hardware. We therefore expect the ripple-block carry adder to consume more power while running than the CDKM-adder, simply because it takes up more real estate on the chip. However, per operation we expect the ripple-block carry adder to be more power efficient than the CDKM-adder. A more reasonable measure for power efficiency than mere power consumption is how the computational performance (in operations per second) compares with power consumption (in watts), i.e., how many operations we can execute per
dissipated joule of energy. Assuming that power consumption is proportional to the number of transistors, and that operations per second is inversely proportional to the depth of the circuit, the performance of the ripple-block carry adder relative to the CDKM-adder is given by

(CDKMt(n) · CDKMd(n)) / (RBCAt(m, k) · RBCAd(m, k)),

where n = mk. As an example, in the case of n = 64 we expect the fastest of the optimized adders (m = 16, k = 4) to be at least twice as power efficient as the CDKM-adder (1/(2.22 · 0.19), cf. Table 3). Whether this expectation is realistic remains an open question for implementation and experiments.
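The back-of-the-envelope estimate above can be reproduced directly from the Table 3 figures for n = 64 (our own arithmetic, not from the paper):

```python
# Relative power efficiency (ops per joule) of the n = 64 RBCA
# (m = 16, k = 4) versus the CDKM-adder, using the Table 3 figures.
cdkm_transistors, cdkm_delay = 4096, 321
rbca_transistors, rbca_delay = 9088, 60
efficiency_ratio = (cdkm_transistors * cdkm_delay) / (rbca_transistors * rbca_delay)
print(f"ops per joule, RBCA relative to CDKM: {efficiency_ratio:.2f}")  # about 2.4
```

The exact ratio (≈2.41) is slightly above the rounded estimate 1/(2.22·0.19) ≈ 2.37 from the percentage figures.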
5 Conclusion
The reversible ripple-block carry adder presented here is an optimization of the CDKM reversible (quantum) ripple-carry adder [3] by the parallelization of m k-bit CDKM-adders. The parallelized adder produces no garbage bits and requires only m ancillae bits. The depth complexity is improved from O(n) to O(√n) for mk-bit addition, with an increase in hardware cost from O(n) to O(n√n). Specifically, the parallelized mk-bit ripple-block carry adder is only 2(m + k) − 1 gates deeper than a k-bit CDKM-adder. For realistic word sizes the optimization delivers significant speedup over the CDKM-adder. Since the speedup is proportionally larger than the increase in transistor cost, the optimization holds the promise of increased power efficiency as well as faster computation times. Reversible circuits differ radically from conventional designs, and further work is needed to develop mature design methodologies for efficient circuits.

Acknowledgements. The authors wish to thank Alexis De Vos and Robert Glück for insightful discussions on reversible adders. This work was in part supported by the Foundations for Innovative Research-based Software Technologies research school (FIRST).
References

1. Axelsen, H.B., Glück, R., Yokoyama, T.: Reversible machine code and its abstract processor architecture. In: Diekert, V., Volkov, M.V., Voronkov, A. (eds.) CSR 2007. LNCS, vol. 4649, pp. 56–69. Springer, Heidelberg (2007)
2. Bennett, C.H.: Logical reversibility of computation. IBM Journal of Research and Development 17, 525–532 (1973)
3. Cuccaro, S.A., Draper, T.G., Kutin, S.A., Moulton, D.P.: A new quantum ripple-carry addition circuit. In: 8th Workshop on Quantum Information Processing (2005) arXiv:quant-ph/0410184v1
4. De Vos, A.: Reversible computing. Progress in Quantum Electronics 23(1), 1–49 (1999)
5. Desoete, B., De Vos, A.: A reversible carry-look-ahead adder using control gates. Integration, the VLSI Journal 33(1-2), 89–104 (2002)
6. Feynman, R.: Quantum mechanical computers. Optics News 11, 11–20 (1985)
7. Fredkin, E., Toffoli, T.: Conservative logic. International Journal of Theoretical Physics 21(3-4), 219–253 (1982)
8. Landauer, R.: Irreversibility and heat generation in the computing process. IBM Journal of Research and Development 5(3), 183–191 (1961)
9. Maslov, D., Miller, D.M.: Comparison of the cost metrics through investigation of the relation between optimal NCV and optimal NCT three-qubit reversible circuits. IET Computers & Digital Techniques 1(2), 98–104 (2007)
10. Moore, G.: Transcript of interview, Intel Developer Forum. Technical report, Intel Corp. (2007)
11. Munakata, T.: Beyond silicon: New computing paradigms. Communications of the ACM 50(9), 30–72 (2007) (special issue)
12. Skoneczny, M., Van Rentergem, Y., De Vos, A.: Reversible Fourier transform chip (accepted for MIXDES) (2008)
13. Thomsen, M.K., Glück, R.: Optimized reversible binary-coded decimal adders. Journal of Systems Architecture (to appear, 2008)
14. Toffoli, T.: Reversible computing. In: de Bakker, J.W., van Leeuwen, J. (eds.) ICALP 1980. LNCS, vol. 85, pp. 632–644. Springer, Heidelberg (1980)
15. Van Rentergem, Y., De Vos, A.: Optimal design of a reversible full adder. International Journal of Unconventional Computing 1(4), 339–355 (2005)
16. Vedral, V., Barenco, A., Ekert, A.: Quantum networks for elementary arithmetic operations. Physical Review A 54(1), 147–153 (1996)
17. Vitányi, P.: Time, space, and energy in reversible computing. In: Conference on Computing Frontiers. Proceedings, pp. 435–444. ACM Press, New York (2005)
18. Yokoyama, T., Axelsen, H.B., Glück, R.: Principles of a reversible programming language. In: Conference on Computing Frontiers. Proceedings, pp. 43–54. ACM Press, New York (2008)
19. Yokoyama, T., Glück, R.: A reversible programming language and its invertible self-interpreter. In: Partial Evaluation and Program Manipulation. Proceedings, pp. 144–153. ACM Press, New York (2007)
Automata on Multisets of Communicating Objects

Linmin Yang¹, Yong Wang², and Zhe Dang¹

¹ School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164, USA
² Google Inc., Mountain View, CA 94043, USA
Abstract. Inspired by P systems initiated by Gheorghe Păun, we study a computation model over a multiset of communicating objects. The objects in our model are instances of finite automata. They interact with each other by firing external transitions between two objects. Our model, called service automata, is intended to specify, at a high level, a service provided on top of network devices abstracted as communicating objects. We formalize the concept of processes, running over a multiset of objects, of a service automaton and study the computing power of both single-process and multiprocess service automata. In particular, in the multiprocess case, regular maximal parallelism is defined for inter-process synchronization. It turns out that single-process service automata are equivalent to vector addition systems and hence can define nonregular processes. Among other results, we also show that the Presburger reachability problem for single-process service automata is decidable, while it becomes undecidable in the multiprocess case. Hence, multiprocess service automata are strictly more powerful than single-process service automata.
1 Introduction

Network services nowadays can be viewed as programs running on top of a (possibly large) number of devices, such as cellular phones, laptops, PDAs and sensors. How to design and implement such programs has become a central research topic in areas like pervasive computing [15,19], a proposal of building distributed software systems from (a massive number of) devices that are pervasively hidden in the environment. In fact, such a view has already been embedded in algorithmic studies inspired by ant colonies (where each ant resembles a communicating device in our context) [4,7,8], as well as in more recent studies on P systems, a biologically inspired abstract computing model running on, in the simplest setting, multisets of symbol or string objects [13,14]. As an unconventional computing model motivated by natural phenomena of cell evolution and chemical reactions, P systems were initiated by Gh. Păun [13,14] several years ago. The model abstracts from the way living cells process chemical compounds in their compartmental structures. Thus, regions defined by a membrane structure contain objects that evolve according to given rules. The objects can be described by symbols or by strings of symbols, in such a way that multisets of objects are placed in regions of the membrane structure. The membranes themselves are organized as a Venn diagram or a tree structure where one membrane may contain other membranes. By using the rules in a nondeterministic, maximally parallel manner, transitions between the system configurations can be obtained. A sequence of transitions shows how the system is

C.S. Calude et al. (Eds.): UC 2008, LNCS 5204, pp. 242–257, 2008.
© Springer-Verlag Berlin Heidelberg 2008
evolving. Objects in P systems are typed but addressless, which is an attractive property for modeling high-level networks. Inspired by P systems, we introduce an automata-theoretic model for the programs over network devices, called service automata, to specify services running over communicating objects (which are an abstraction of, e.g., the network devices mentioned earlier). Our model is high-level. That is, the communicating objects are typed but addressless (i.e., the objects do not have individual identifiers). In other words, unique identifiers such as IP addresses for network devices are left (and of course also necessary) for the implementation level. For instance, in a fire truck scheduling system, which is also an example used throughout our paper, a fire emergency calls for one or more trucks that are currently available. In this scenario, exactly which truck is dispatched is not so important as long as the truck is available. Hence, a service automaton runs on multisets of communicating objects. This also resembles traditional high-level programming languages that run on a memory, in the sense that a variable is often mapped to a concrete memory address only at compile time. In a service automaton, (communicating) objects are logical representations of physical devices and entities in a network. The functions of such a device or entity are abstracted as an automaton specified in the corresponding object. In this paper, we mostly study the case when the automaton has finitely many states, i.e., is a finite automaton (FA). As we mentioned earlier, objects are typed but addressless in our model and the type of an object is the FA associated with it. In other words, a service automaton runs on a multiset of objects, which are modeled as finite automata. We depict a service automaton as a finite diagram consisting of a number of big circles.
Each circle represents an object type, that is, an FA whose state transitions, called internal transitions, are drawn inside the circle. Notice that an unbounded number of objects could share the same object type. Communications between objects are specified by external transitions, each of which connects two (internal) transitions. An example service automaton is depicted in Fig. 1. We shall emphasize that, in a service automaton, the total number of objects is not specified. That is, the automaton can run on any multiset of objects that are of the object types specified in the diagram of the service automaton. The service starts from an initial object (of a predefined initial object type) and, at this moment, we say that the object is active. Roughly speaking, at each step, the service automaton runs as follows. Suppose that the current active object O is of type A and is at state q. At this step, either the active object fires a purely internal transition (that is, an internal transition not connected to any external transitions in the diagram of the service automaton) from its current state q to a new state and remains active, or the active object O communicates with another nondeterministically chosen object O′ (we use B to denote its type and p to denote its current state) through firing an external transition. Suppose that the external transition is r. To ensure that r is firable, we further require that, on the diagram of the service automaton, r connects from an internal transition tA inside the big circle of type A to an internal transition tB inside the big circle of type B. Furthermore, the two internal transitions start from the current states of the two objects O and O′, respectively. On firing the external transition, both objects O and O′ fire the two internal transitions,
L. Yang, Y. Wang, and Z. Dang
respectively and simultaneously. After firing the external transition, the current active object becomes O′ and the object O is no longer active. Actually, we can view an active object as one holding a token. When an external transition (between two objects) is fired, it can pass the token to the other object. When we follow the flow of the token, we can define a process of the service automaton as a sequence (of labels) of transitions that the token passes through. Hence, the service defined by the service automaton is the set (i.e., language) of all such processes. In the paper, we show that service automata and vector addition systems are equivalent and hence can define nonregular services. We also discuss other variations and verification problems of service automata. One interesting open question is that we currently do not know whether there is a nontrivial subclass of service automata that only define regular services. In the service automaton given above, there is only one object being active at any time (i.e., there is only one token), and hence it is a single-process service automaton. In the paper, we also study multiprocess service automata, where there are multiple active objects at any time; i.e., there are multiple tokens, each of which corresponds to a process. Inter-process communication is also defined through our notion of regular maximal parallelism among processes, which generalizes Păun's [14] classic maximal parallelism as well as other derivation modes [6,10] in the context of P systems. One of our results shows that multiprocess service automata are strictly stronger than (single-process) service automata. We also study variations and verification problems for multiprocess service automata. Our service automata, in their current form (where each object type specifies an FA), can be treated as a variation of P systems where each object is a pair of a symbol and a state.
Roughly speaking, an external transition that connects from the internal transition q → q′ in an automaton of type A to the internal transition p → p′ in an automaton of type B can be depicted as a P system rule of the following form: Āq Bp → Aq′ B̄p′, where the symbol objects Āq and B̄p′ indicate the active objects. Tailored for network applications, our model has the following features and differences:
– In this paper, we mostly consider the case when the communicating objects are of finite states. However, when communicating objects in our model are augmented with some unbounded storage device (such as a counter), it is difficult to directly translate transitions in such generalized service automata into P system rules. Hence, it is necessary to further study P systems on "automata objects" in addition to symbol and string objects.
– In P systems, the notion of "threads" or "processes" is hard to abstract. Naturally, in network service applications, such a notion is extremely important since, essentially, the applications are distributed and concurrent in nature. Targeting these applications, our model suggests a subclass of P systems where single/multiple processes can be clearly defined and, therefore, opens the door for further applying the model of P systems in areas like pervasive/mobile/distributed computing.
– In multiprocess service automata, we introduce the notion of regular maximal parallelism among processes, which is able to specify both Gh. Păun's classical maximal parallelism and some other restricted forms of (maximal) parallelism [6,10].
However, we shall point out that, for network applications, maximal parallelism in general is hard or expensive to implement. Therefore, it is a future research topic to study the cost of implementing restricted forms of regular maximal parallelism. There has been much work on modeling distributed systems using automata. For instance, an input/output (I/O) automaton [12] models and reasons about a concurrent and distributed discrete event system based on broadcast communication. The name "service automata" also appears in the work [11] that analyzes behaviors over open workflow nets. We reuse the name "service automata" in our paper but with a completely different context and meaning. In short, in the aforementioned papers, a system is composed of a finite and fixed number of automata, while in our work, a service automaton runs on a multiset of automata (whose size is not specified). The differences remain when one compares our work with some research in pervasive computing models [1,2,3] and mobile agents [16]. Linda [5] is another model of communication among processes, where communications are achieved by creating new objects in a tuple space, which is a quite practical model. Our previous work, Bond Computing Systems [21], is also an addressless model for analyzing network behaviors. However, that work treats a network system from a global view and focuses on how symbol objects (without states) are formed, without using maximal parallelism, into bonds, while in this paper we focus on automata objects and, from a local view, study processes on how state changes between objects.
2 Definitions

Let Σ = {A1, ..., Ak} (k ≥ 1) be an alphabet of symbols. Each Ai, i = 1, · · · , k, is called a type. An instance of a symbol Ai, for some i, in Σ is called an object of type Ai, or simply an Ai-object. Without loss of generality, we take A1 to be the initial type. Each Ai is associated with a (nondeterministic) finite automaton (which we still denote by Ai), which is a 3-tuple Ai = (Si, δi, qi0), where Si = {Si1, ..., Sil} (some l ≥ 1) is a finite set of internal states (one can assume that the Si's are disjoint), δi ⊆ Si × Si is the set of internal state transitions, and qi0 ∈ Si is the initial state of the automaton Ai. We use ti : Siu → Siv to denote a transition ti = (Siu, Siv) ∈ δi. In this way, an Ai-object itself is simply an instance of the finite automaton Ai. Inter-object communications are achieved by external transitions in a given Δ, and each external transition r ∈ Δ is in the following rule form: r : (Ai, ti) → (Aj, tj), for some i and j, where ti ∈ δi and tj ∈ δj are internal state transitions. We will see in a moment that the ti and tj associated with r must be fired together with r, and cannot be fired alone. If an internal state transition t is not associated with any external transition, we call such a t a purely internal state transition. In summary, a service automaton is a tuple G = ⟨Σ, Δ⟩
Fig. 1. An example service automaton G for a fire truck scheduling system
where Σ and Δ are specified above. As we will see in a moment, G could run over any number of objects and the number is not specified in G itself.

Example 1. We now model a fire truck scheduling system by a service automaton G = ⟨Σ, Δ⟩, where Σ = {Scheduler, Fire_Truck} with Scheduler being the initial type, and Δ will be given in a moment. The service automaton G is shown in Fig. 1, where the automata Scheduler and Fire_Truck are represented by the big (and bold) circles, internal state transitions (five in total) are represented by arrows within a big circle, and external transitions (four in total) are represented by arrows crossing the big circles. In Scheduler, busy is the initial state, while in Fire_Truck, on_call is the initial state. Again, the number of Scheduler-objects and Fire_Truck-objects is not specified in G.

We now define the semantics of G. To specify an object O, we need only know its (unique) type A and its (unique) current state s of the finite automaton that is associated with the type; i.e., O is an instance of (A, s), where for some i, A = Ai ∈ Σ and s ∈ Si (sometimes, we just call O an (A, s) object). A collection (C, O) is a multiset C of objects with O ∈ C being the only active object. Let (C, O) and (C′, O′) be two collections and r be a transition. We use

(C, O) →r (C′, O′)
to denote the fact that the collection (C, O) changes to the collection (C′, O′) by firing the transition r, which is defined formally as follows. We first consider the case when r is a purely internal transition, say ti : Siu → Siv in δi (i.e., the transition is inside an Ai-object specified by the automaton Ai). We say that O →ti O′ when O is at state Siu and is of type Ai, and O′ is the result of changing the current state in O to Siv. Now, (C, O) →r (C′, O′)
if the following conditions are satisfied:
– O →ti O′.
– C′ is the same as C except that the object O is changed into O′.
Therefore, when the purely internal transition ti is fired, the active object must be at state Siu and, after firing the transition, the current state of the object is Siv and it remains the active object. Next, we consider the case when r is an external transition, say r : (Ai, ti) → (Aj, tj), where ti : Siu → Siv in δi and tj : Sjp → Sjq in δj are internal state transitions. In this case, (C, O) →r (C′, O′) if, for some O′′ ∈ C (with O and O′′ being distinct) and some object O′′′,
– O →ti O′′′,
– O′′ →tj O′, and
– C′ is the result of, in C, replacing O with O′′′ and replacing O′′ with O′.
Therefore, when the external transition r is fired, the active object O must be an Ai-object in state Siu, and an Aj-object O″ in state Sjp is nondeterministically chosen from the collection. The Ai-object O will transit from state Siu to Siv (and evolve into O‴ defined above), and the Aj-object O″ will transit from state Sjp to Sjq (and evolve into O′ defined above), in parallel. After the transition is fired, the active object is changed from O to O′. The collection (C, O) is initial if all objects in C are in their initial states and O is a designated initial and active object (i.e., the type of O is the initial type A1). For an initial collection (C, O), we write

(C, O) ;G (C′, O′)    (1)

if there are collections (C, O) = (C0, O0), · · · , (Cz, Oz) = (C′, O′), for some z, such that

(C0, O0) →^{r1} (C1, O1) · · · →^{rz} (Cz, Oz),    (2)
for some (purely internal and external) transitions r1, · · · , rz in G. In fact, G defines a computing model that modifies a collection (C, O) into another collection (C′, O′) through (C, O) ;G (C′, O′). To characterize the relation ;G that G can compute, we need more definitions. Consider a set T ⊆ {(Ai, s) : s ∈ Si, for all i}. For each pair t = (A, s) ∈ T, we use #t(C, O) to denote the number of objects in C that are of type A and at state s. Clearly, when a proper ordering is applied to T, we may collect the numbers #t(C, O), t ∈ T, into a vector #T(C, O). We use RG,T, called the binary reachability of G wrt T, to denote the set of all vector pairs (#T(C, O), #T(C′, O′)) for all initial collections (C, O) and collections (C′, O′) satisfying (C, O) ;G (C′, O′). In particular, when T = {(Ai, s) : s ∈ Si, for all i}, we simply write RG for RG,T.
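As a sanity check on the definitions above, the following Python sketch encodes objects as (type, state) pairs and fires one external transition on a collection; the encoding and all names (fire_external, count_vector, 'Sch', 'FT') are our own illustration, not part of the paper's formalism.

```python
from collections import Counter

# An object is a (type, state) pair; a collection is a Counter of such pairs
# plus one designated active object.  An external transition
# r : (Ai, ti) -> (Aj, tj), with ti : s_iu -> s_iv and tj : s_jp -> s_jq,
# is encoded by the two state triples it touches.
def fire_external(coll, active, r):
    """Fire r = ((Ai, s_iu, s_iv), (Aj, s_jp, s_jq)); return (coll', active')
    or None if r is not enabled.  The partner object is any (Aj, s_jp)
    object distinct from the active one."""
    (Ai, siu, siv), (Aj, sjp, sjq) = r
    if active != (Ai, siu):
        return None
    partner = (Aj, sjp)
    avail = coll[partner] - (1 if partner == active else 0)
    if avail < 1:
        return None
    new = Counter(coll)
    new[(Ai, siu)] -= 1; new[(Ai, siv)] += 1   # active object evolves
    new[(Aj, sjp)] -= 1; new[(Aj, sjq)] += 1   # chosen partner evolves
    new += Counter()          # drop zero entries
    return new, (Aj, sjq)     # the evolved Aj-object becomes active

def count_vector(coll, T):
    """#_T(C, O): occurrence counts of the pairs in T."""
    return tuple(coll[t] for t in T)
```

Firing the dispatch transition of Fig. 1 on two busy schedulers and three on-call trucks yields the count vector (2, 2, 1), as in Example 3 below.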
Example 2. We now explain the semantics of the example service automaton in Fig. 1. Roughly speaking, the service automaton G specifies a fire truck scheduling system in which there can be an unspecified number of schedulers and fire trucks. Schedulers dispatch or call back fire trucks as needed, and once a fire truck changes its state, it sends back an acknowledgment message to a scheduler. According to the finite automaton Scheduler, a scheduler is busy all the time. For the finite automaton Fire Truck, the internal state transition out means that a fire truck is sent out to extinguish a fire, in means that a fire truck finishes its work and comes back, idle means that a fire truck remains on call, and work means that a fire truck keeps working (remains on duty). The external transition dispatch sends an on-call fire truck to extinguish a fire; dispatch ACK describes that a dispatched fire truck sends an acknowledgment message to a scheduler (we assume that all schedulers can communicate with each other through an underlying network); call back simply makes a scheduler call an on-duty fire truck back; similarly to dispatch ACK, call back ACK means that once an on-duty fire truck is called back and becomes on call, it sends an acknowledgment message named call back ACK to a scheduler. Again, we emphasize that the number of Scheduler-objects and Fire Truck-objects in G is not specified in the definition; that is, G could run over any number of Scheduler-objects and Fire Truck-objects. In the next example, we illustrate a scenario and explain in detail how the example service automaton runs. Example 3. We now illustrate an example run of the service automaton G specified in Fig. 1. The run, as shown in Fig. 2, is on two schedulers and three fire trucks.
Telephones are used to depict Schedulers, which are always in state busy, and fire trucks are used for Fire Trucks; a fire truck with a black dot on it denotes a Fire Truck in state on duty, while one without denotes a Fire Truck in state on call. Consider T = {t1, t2, t3}, where t1 = (Scheduler, busy), t2 = (Fire Truck, on call) and t3 = (Fire Truck, on duty). By definition, #t1(C, O), #t2(C, O), and #t3(C, O) are the numbers of Schedulers in state busy, Fire Trucks in state on call, and Fire Trucks in state on duty in a given collection (C, O), respectively. Let #T(C, O) be the vector (#t1(C, O), #t2(C, O), #t3(C, O)). We focus on an initial collection (C0, O0), where O0 is an initial and active object (described in a moment), and, accordingly, #T(C0, O0) = (m, n, 0), where m and n could be any numbers. In this example, we assign m = 2 and n = 3. Initially, according to the definition, there are only two Schedulers in state busy and three Fire Trucks in state on call, since busy and on call are the initial states of the automata Scheduler and Fire Truck, respectively. Since Scheduler is the initial type, we nondeterministically choose a Scheduler in state busy, say O0, as the initial and active object. Note that O0 (the same for O1, · · · , O5, O′1, · · · , O′5, O6 defined later) is only for notational convenience; it is not an identifier: our system is addressless. Since all the internal state transitions are associated with external transitions, the internal state transitions cannot be fired alone, and hence we only need to consider external transitions. According to Fig. 1, the external transition dispatch requires some active Scheduler in state busy and some Fire Truck in state on call; the
Fig. 2. An illustration of how the example service automaton G runs on two schedulers and three fire trucks. Telephones denote Schedulers, which are always in state busy, and fire trucks denote Fire Trucks; a fire truck with a black dot on it denotes a Fire Truck in state on duty, while one without denotes a Fire Truck in state on call.
external transition dispatch ACK requires some active Fire Truck in state on duty and some Scheduler in state busy; the external transition call back requires some active Scheduler in state busy and some Fire Truck in state on duty; and, finally, the external transition call back ACK requires some active Fire Truck in state on call and some Scheduler in state busy. Initially, dispatch is the only external transition that can be fired, since there are only two Schedulers in state busy and three Fire Trucks in state on call in the initial collection, and the active object is some Scheduler O0. We nondeterministically pick a Fire Truck in state on call, say O1, to fire dispatch. After firing dispatch, O0 is still in state busy, while O1 changes to state on duty (a black dot is added to O1 in Fig. 2 (1)) and becomes the active object O′1. Now, we have (C0, O0) →^{dispatch} (C1, O′1) with #T(C1, O′1) = (2, 2, 1). At this moment, the only firable external transition is dispatch ACK, which requires some active Fire Truck in state on duty and some Scheduler in state busy. The active Fire Truck in state on duty is O′1, and we nondeterministically pick a Scheduler in state busy, say O2. Note that O2 and O0 may or may not (actually this is the case here) be the same object. After firing dispatch ACK, O′1 is still in state on duty, and O2 is still in state busy and becomes the active object O′2. So, we have (C1, O′1) →^{dispatch ACK} (C2, O′2), where #T(C2, O′2) = (2, 2, 1). Fig. 2 (1) shows the run (C0, O0) →^{dispatch} (C1, O′1) →^{dispatch ACK} (C2, O′2). Next, both dispatch and call back become firable. Suppose that dispatch is nondeterministically picked to fire; similarly, we get (C2, O′2) →^{dispatch} (C3, O′3) for some Fire Truck O′3 in state on duty (a black dot is added to the picked truck O3 in Fig. 2 (2)), and #T(C3, O′3) = (2, 1, 2). Next, dispatch ACK becomes the only firable external transition again. Suppose that (C3, O′3) →^{dispatch ACK} (C4, O′4) for some Scheduler O4 in state busy, and #T(C4, O′4) = (2, 1, 2). Fig. 2 (2) shows the run (C0, O0) →^{dispatch} (C1, O′1) →^{dispatch ACK} (C2, O′2) →^{dispatch} (C3, O′3) →^{dispatch ACK} (C4, O′4).
Now, both dispatch and call back become firable. Suppose that this time call back is nondeterministically picked to fire. We nondeterministically pick a Fire Truck in state on duty from O′1 and O′3 (in Fig. 2 (3), O′1 is picked), say O5, to fire call back. After firing call back, O5 changes to state on call (the black dot is removed from O5 in Fig. 2 (3)) and becomes the active object O′5. We get (C4, O′4) →^{call back} (C5, O′5), where #T(C5, O′5) = (2, 2, 1). Similarly, call back ACK is the only firable external transition now, and we can get (C5, O′5) →^{call back ACK} (C6, O6), for some Scheduler O6 in state busy, and #T(C6, O6) = (2, 2, 1). Fig. 2 (3) shows the run (C0, O0) →^{dispatch} (C1, O′1) →^{dispatch ACK} (C2, O′2) →^{dispatch} (C3, O′3) →^{dispatch ACK} (C4, O′4) →^{call back} (C5, O′5) →^{call back ACK} (C6, O6). Hence, (C0, O0) ;G (C6, O6).
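Because every purely internal transition in Fig. 1 fires only together with an external one, the run above can be replayed as plain vector bookkeeping on #T = (busy, on call, on duty). A minimal sketch, with a delta encoding that is our own simplification:

```python
# Per-transition effect on #_T = (busy schedulers, on-call trucks,
# on-duty trucks); labels follow Fig. 1.
DELTA = {
    'dispatch':      (0, -1, +1),   # one on-call truck goes on duty
    'dispatch_ACK':  (0,  0,  0),   # pure acknowledgment, no count change
    'call_back':     (0, +1, -1),   # one on-duty truck returns on call
    'call_back_ACK': (0,  0,  0),
}

def replay(start, labels):
    vec, trace = start, [start]
    for r in labels:
        vec = tuple(a + b for a, b in zip(vec, DELTA[r]))
        trace.append(vec)
    return trace

run = ['dispatch', 'dispatch_ACK', 'dispatch', 'dispatch_ACK',
       'call_back', 'call_back_ACK']
trace = replay((2, 3, 0), run)   # final #_T is (2, 2, 1), matching the run
```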
3 Decidability of Presburger Reachability

Let Y = {y1, · · · , ym} be a finite set of variables over the integers. For all integers ay (y ∈ Y), b, and c (with b > 0), Σ_{y∈Y} ay·y < c is an atomic linear relation on Y and Σ_{y∈Y} ay·y ≡b c is a linear congruence on Y. A linear relation on Y is a Boolean combination (using ¬ and ∧) of atomic linear relations on Y. A Presburger formula P(y1, · · · , ym) [9] on Y is a Boolean combination of atomic linear relations on Y and linear congruences on Y. We say that a vector (z1, · · · , zm) satisfies P if P(z1, · · · , zm) holds. A simple but important class of verification queries concerns reachability. In this section, we study the Presburger reachability problem for service automata. Intuitively, the problem asks whether some collection satisfying a given Presburger constraint is reachable. More precisely, the Presburger reachability problem is defined as follows:
Given: a service automaton G, a T ⊆ Σ × S, and a Presburger formula P.
Question: Are there an initial collection (C, O) and a collection (C′, O′) such that (C, O) ;G (C′, O′) and #T(C′, O′) satisfies P?
Before we proceed further, we need more definitions. An n-dimensional vector addition system with states (VASS) M is a 5-tuple ⟨V, p0, pf, S, δ⟩, where V is a finite set of addition vectors in Z^n, S is a finite set of states, δ ⊆ S × S × V is the transition relation, and p0, pf ∈ S are the initial state and the final state, respectively. Elements (p, q, v) of δ are called transitions and are usually written as p → (q, v). A configuration of a VASS is a pair (p, u), where p ∈ S and u ∈ N^n. The transition p → (q, v) can be applied to the configuration (p, u) and yields the configuration (q, u + v), provided that u + v ≥ 0 (in this case, we write (p, u) → (q, u + v)). For vectors x and y in N^n, we say that x can reach y, written x ;M y, if for some j, (p0, x) → (p1, x + v1) → · · · → (pj, x + v1 + · · ·
+ vj), where p0 is the initial state, pj is the final state, y = x + v1 + · · · + vj, and each vi ∈ V. It is well known that Petri nets and VASS are equivalent. Consider a number k ≤ n. We use x(k) to denote the result of projecting the n-ary vector x onto its first k components, and RM(k) to denote the set of all pairs (x(k), y(k)) with x ;M y. When k = n, we simply write RM for RM(k). We say that a service automaton G can be simulated by a VASS M if, for some number
k, RG = RM(k). We say that a VASS M can be simulated by a service automaton G if, for some T, RG,T = RM. If both hold, we simply say that they are equivalent (in terms of computing power). Theorem 1. Service automata are equivalent to VASS, and therefore the Presburger reachability problem for service automata is decidable. The above theorem characterizes the computing power of service automata when they are interpreted as computation devices. In the following, we treat service automata as language acceptors, and can therefore characterize the processes generated by such services. We need more definitions. Let Π = {a1, · · · , an} (n ≥ 1) be an alphabet of (activity) labels. We are given a function that assigns the empty label Λ to each purely internal transition and either Λ or an activity label in Π to each external transition. Recall that we write (C, O) ;G (C′, O′) if there are collections (C0, O0), · · · , (Cz, Oz), for some z, such that (C0, O0) →^{r1} (C1, O1) · · · →^{rz} (Cz, Oz), for some purely internal and external transitions r1, · · · , rz in G. We use α to denote the sequence of labels of the transitions r1, · · · , rz. To emphasize the α, we simply write (C, O) ;^α_G (C′, O′) for (C, O) ;G (C′, O′). In this case, we say that α is a process of G. The set L(G) of all processes of G is called the service defined by the service automaton G. Example 4. Consider the example service automaton in Fig. 1. We assign a1, Λ, a2, and Λ to dispatch, dispatch ACK, call back, and call back ACK, respectively. By definition, the service is L(G) = {α : (C, O) ;^α_G (C′, O′)}. Define #a(w) as the number of times the symbol a appears in a word w. We easily get that L(G) = {α : #a1(α′) ≥ #a2(α′) for every prefix α′ of α}, since the number of fire trucks dispatched is always at least the number of fire trucks called back. Hence, the service L(G) specified by the service automaton in Fig. 1 is nonregular.
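The prefix condition defining L(G) in Example 4 is exactly a one-counter check, in the style of a blind counter. A small sketch, with the label names a1 and a2 as in the example:

```python
# Membership in the example service L(G): every prefix must contain at least
# as many a1's (dispatch) as a2's (call back).  One pass with a counter that
# must never go negative; the final balance need not be zero.
def in_service(word, a1='a1', a2='a2'):
    balance = 0
    for sym in word:
        balance += (sym == a1) - (sym == a2)
        if balance < 0:   # a prefix with more call-backs than dispatches
            return False
    return True
```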
A multicounter machine M is a nondeterministic finite automaton (with a one-way input tape) augmented with a number of counters. Each counter takes nonnegative integer values and can be incremented by 1, decremented by 1, and tested for 0. It is well known that M is universal when it has two counters. A counter is blind if it cannot be tested for 0; however, when its value becomes negative, the machine crashes. A blind counter machine is a multicounter machine M whose counters are blind and must all be 0 when the computation ends. It is known that blind counter machines are essentially VASS treated as language acceptors. Therefore, Theorem 2. Services defined by service automata are exactly the class of languages accepted by blind counter machines. From the above theorem, it is clear that service automata can define fairly complex processes, which are not necessarily regular, context-free, or semilinear. Therefore, we are curious about what happens if we put some restrictions on the syntax of service automata, and which characteristics are essential to the computation power of service automata.
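To make the VASS dynamics behind Theorems 1 and 2 concrete, here is a toy stepper; the encoding of transitions as (state, next state, vector) triples and the step bound are our own choices, the latter because the full reachability set can be infinite:

```python
from collections import deque

# A toy VASS explorer: from configuration (p0, x), apply transitions
# (q0, q1, v) whenever the resulting counter vector stays nonnegative,
# collecting all configurations reachable within max_steps steps.
def reachable(transitions, p0, x, max_steps=10):
    seen = {(p0, x)}
    frontier = deque([(p0, x)])
    for _ in range(max_steps):
        nxt = deque()
        while frontier:
            p, u = frontier.popleft()
            for (q0, q1, v) in transitions:
                if q0 != p:
                    continue
                w = tuple(a + b for a, b in zip(u, v))
                if min(w) >= 0 and (q1, w) not in seen:
                    seen.add((q1, w))
                    nxt.append((q1, w))
        frontier = nxt
    return seen
```

With a single transition adding (1, −1), the second counter acts like a blind counter being drained into the first; exploration stops when it would go negative.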
One interesting case is when a service automaton only has objects of one type; i.e., Σ is of size 1, say Σ = {A}. We call such a service automaton a 1-type service automaton. Surprisingly, we obtain the following result, which implies that the number of object types is not critical to the computation power. Theorem 3. 1-type service automata can simulate any service automaton, and, therefore, services defined by 1-type service automata are equivalent to those defined by service automata. Our next question is whether purely internal state transitions are necessary for a service automaton. We call a service automaton without purely internal state transitions internal-free. Theorem 4. Any service automaton can be simulated by an internal-free service automaton. However, it currently remains a difficult problem to characterize a nontrivial class of service automata that define exactly the regular services.
4 Multiprocess Service Automata

In the previous sections, we modeled a single process from the local view of the active object; i.e., we view a single process by following the flow of its active object. We describe a process in a service automaton G by recording the trace of the active object in a collection, as given in (2),

(C0, O0) →^{r1} (C1, O1) · · · →^{rz} (Cz, Oz),

for some purely internal and external transitions r1, · · · , rz in G. In a real network, there are often multiple processes in execution at the same time, and each process has its own active object. To model multiple processes, we need to take all the active objects into consideration. Let G = ⟨Σ, Δ⟩ be a service automaton; we can define a corresponding multiprocess service automaton Gmp as follows. A multiprocess collection (C, O) is a multiset C of objects with O ⊆ C being the active multiset (i.e., each object in O is active). Suppose that there are in total m purely internal and external transitions r1, · · · , rm in G. Let R = {r1^{n1}, · · · , rm^{nm}} be a transition multiset, where each ni ∈ N ∪ {∗} (1 ≤ i ≤ m) is the multiplicity of transition ri (the meaning of ∗ will be made clear in a moment). A multiprocess service automaton is a tuple Gmp = ⟨Σ, Δ, R⟩, where R is a finite set of transition multisets. For each transition multiset R = {r1^{n1}, · · · , rm^{nm}} ∈ R, we have a corresponding Presburger formula PR(y1, · · · , ym) defined in this way: for each i, when ni ∈ N, we define an atomic linear relation Pi as yi = ni; when ni = ∗, Pi is defined as yi ≥ 0. Finally, PR = ⋀_i Pi. For instance, for the
transition multiset {r1^∗, r5^2} (transitions with multiplicity 0 are omitted in the R), the corresponding Presburger formula PR(y1, · · · , ym) is

y1 ≥ 0 ∧ y5 = 2 ∧ ⋀_{i≠1,5} yi = 0.
Let (C, O) and (C′, O′) be two multiprocess collections and R be a transition multiset in R. Now, (C, O) →^R (C′, O′) if the following conditions are satisfied:
(i) there are some disjoint multisets Cj ⊂ C, each of which satisfies (Cj, Oj) →^{ri} (C′j, O′j) for some transition ri, multisets C′j, and objects Oj and O′j (notice that, by the definition of →^{ri}, Oj ∈ Cj and O′j ∈ C′j). Notice that, for each j, the ri is fired once.
(ii) suppose that the total number of times the transitions ri are fired in (i) is #ri. Then, the corresponding Presburger formula PR(#r1, · · · , #rm) of R holds.
(iii) the ri's are fired in a maximally parallel manner. That is, each ri should be fired as many times as possible under (i) and (ii); i.e., the vector (#r1, · · · , #rm) satisfying (i) and (ii) is maximal.
(iv) C′ is the result of replacing each sub-multiset Cj in C with C′j, and O′ is the result of replacing each object Oj in O with O′j.
Actually, (C, O) →^R (C′, O′) fires the transitions in R in a maximally parallel manner with respect to the constraint PR. By definition, the Presburger formula PR is only capable of comparing a variable with a constant. Hence, the maximally parallel notion used here is called regular maximal parallelism. It is not hard to see that it generalizes Păun's [14] classic maximal parallelism (taking the transition multiset in the form {r1^∗, · · · , rm^∗}) as well as some other restricted forms [6,10]. A multiprocess collection (C, O) is initial if the objects in the initial active multiset O are of the initial type A1 and in the initial state of A1. For an initial multiprocess collection (C, O), we write

(C, O) ;Gmp (C′, O′)    (3)

if, for some z, there are multiprocess collections (C, O) = (C0, O0), (C1, O1), · · · , (Cz, Oz) = (C′, O′) such that

(C0, O0) →^{R1} (C1, O1) · · · →^{Rz} (Cz, Oz),    (4)

for some transition multisets R1, · · · , Rz in R. Similarly, we can define #t(C, O) for t = (A, s) as the number of (A, s) objects in C, and the vector #T(C, O) as well as the binary reachability RGmp,T can be defined similarly to single-process service automata.
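One →^R step can be sketched as follows; the pool/need/prod encoding is our own abstraction (objects produced within a step are set aside so that the disjointness requirement in condition (i) is respected), not the paper's construction:

```python
# pools: how many objects sit in each (type, state) pool; need[r]/prod[r]:
# the pools a transition r consumes from and produces into; R: the transition
# multiset, mapping r to a fixed multiplicity or '*' (fire as often as possible).
def max_parallel_step(pools, need, prod, R):
    pools = dict(pools)
    made = {}                       # produced this step; not reusable yet
    fired = {r: 0 for r in R}
    def fire(r):
        if not all(pools.get(p, 0) >= 1 for p in need[r]):
            return False
        for p in need[r]:
            pools[p] -= 1
        for p in prod[r]:
            made[p] = made.get(p, 0) + 1
        fired[r] += 1
        return True
    for r, n in R.items():          # fixed multiplicities: fire exactly n times
        if n != '*':
            for _ in range(n):
                if not fire(r):
                    return None     # constraint P_R cannot be met
    changed = True
    while changed:                  # then saturate the '*' transitions
        changed = False
        for r, n in R.items():
            if n == '*' and fire(r):
                changed = True
    for p, k in made.items():       # only now release the produced objects
        pools[p] = pools.get(p, 0) + k
    return fired, pools
```

On five busy schedulers and eight on-call trucks with R = {dispatch^∗}, the step fires dispatch exactly five times, reproducing the first step of Example 5 below.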
Example 5. Example 3 gives a service automaton that models a fire truck scheduling system, where transitions are fired sequentially. In the real world, if there are multiple schedulers, they can work in parallel; i.e., some schedulers may dispatch on-call fire trucks while others call back on-duty fire trucks, and those actions can happen in parallel provided that different actions work upon disjoint objects. Based on this observation, we can define a multiprocess service automaton Gmp = ⟨Σ, Δ, R⟩ based on the example service automaton G defined in Example 1, with R = {R}, where R is the only transition multiset, defined as follows: R = {dispatch^∗, dispatch ACK^∗, call back^∗, call back ACK^∗}, which means that R fires the four transitions in a maximally parallel manner. Suppose that O0 = {(Scheduler, busy)^5} is the initial active multiset; i.e., initially, there are five Schedulers in state busy that are ready to start five processes. As in Example 3, we define T = {t1, t2, t3}, where t1 = (Scheduler, busy), t2 = (Fire Truck, on call) and t3 = (Fire Truck, on duty). We focus on the initial multiprocess collection (C0, O0), where #T(C0, O0) = (m, n, 0), with m ≥ 5 and n any number. Let us designate m = 5 and n = 8; i.e., there are five Schedulers and eight Fire Trucks in the initial multiprocess collection. Below, we illustrate how the multiprocess service automaton runs. Following the analysis in Example 3, we know that initially dispatch is the only external transition that can be fired, and at most five dispatch's can be fired, since there are only five Schedulers. After firing R, we have five Fire Trucks in state on duty, which all become active, and there are (8−5=3) three Fire Trucks in state on call. Now, (C0, O0) →^R (C1, O1) with O1 = {(Fire Truck, on duty)^5} and #T(C1, O1) = (5, 3, 5). Next, at most five dispatch ACK's can be fired,
and (C1, O1) →^R (C2, O2), where O2 = {(Scheduler, busy)^5} and #T(C2, O2) = (5, 3, 5). At this moment, R can fire both dispatch and call back, and in fact we can fire at most five of them in total. Suppose that we nondeterministically pick one dispatch and four call back's to fire; then one new Fire Truck is dispatched and four on-duty Fire Trucks are called back, and hence there are (3+4−1=6) six Fire Trucks in state on call with four of them active, and (5−4+1=2) two Fire Trucks in state on duty with one of them active. That is, we have (C2, O2) →^R (C3, O3), where O3 = {(Fire Truck, on duty)^1, (Fire Truck, on call)^4} and #T(C3, O3) = (5, 6, 2). At this time, at most one dispatch ACK and four call back ACK's could be fired, and hence (C3, O3) →^R (C4, O4), where O4 = {(Scheduler, busy)^5} and #T(C4, O4) = (5, 6, 2). Therefore, we have (C0, O0) ;Gmp (C4, O4).
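The #T bookkeeping of the maximally parallel steps above can be checked with plain arithmetic (ACK steps are omitted since they leave #T unchanged; the function and its constraints are our own condensed reading of the semantics):

```python
# vec = (busy schedulers, on-call trucks, on-duty trucks).
def apply_step(vec, n_dispatch, n_call_back):
    busy, on_call, on_duty = vec
    assert n_dispatch <= on_call              # each dispatch needs an on-call truck
    assert n_call_back <= on_duty             # each call back needs an on-duty truck
    assert n_dispatch + n_call_back <= busy   # ... and its own busy scheduler
    return (busy,
            on_call - n_dispatch + n_call_back,
            on_duty + n_dispatch - n_call_back)

v1 = apply_step((5, 8, 0), 5, 0)   # five parallel dispatches -> (5, 3, 5)
v2 = apply_step(v1, 1, 4)          # one dispatch, four call backs -> (5, 6, 2)
```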
Obviously, multiprocess service automata can simulate service automata, and hence they can simulate VASS. Next we will show that multiprocess service automata are strictly more powerful than VASS.
5 Undecidability of Presburger Reachability for Multiprocess Service Automata

Now we study the Presburger reachability problem for multiprocess service automata:
Given: a multiprocess service automaton Gmp, a set T ⊆ Σ × S, and a Presburger formula P.
Question: Are there some initial multiprocess collection (C, O) and some multiprocess collection (C′, O′) such that (C, O) ;Gmp (C′, O′) and #T(C′, O′) satisfies P?
To proceed further, we need more definitions. A linear polynomial over nonnegative integer variables x1, · · · , xn is a polynomial of the form a0 + a1x1 + · · · + anxn, where each coefficient ai, 0 ≤ i ≤ n, is an integer. The polynomial is nonnegative if each coefficient ai, 0 ≤ i ≤ n, is in N. A k-system is a quadratic Diophantine equation system that consists of k equations over nonnegative integer variables s1, · · · , sm, t1, · · · , tn, for some m, n, in the following form:

Σ_{1≤j≤l} B1j(t1, · · · , tn) A1j(s1, · · · , sm) = C1(s1, · · · , sm)
...    (5)
Σ_{1≤j≤l} Bkj(t1, · · · , tn) Akj(s1, · · · , sm) = Ck(s1, · · · , sm)

where the A's, B's and C's are nonnegative linear polynomials, and l, m, n are positive integers. It is pointed out in [20] that the k-system in (5) can be simplified into the following form:

t1 A11(s1, · · · , sm) + · · · + tn A1n(s1, · · · , sm) = C1(s1, · · · , sm)
...    (6)
t1 Ak1(s1, · · · , sm) + · · · + tn Akn(s1, · · · , sm) = Ck(s1, · · · , sm)

Theorem 5. The Presburger reachability problem for multiprocess service automata is decidable if and only if it is decidable whether a k-system has a solution, for any k. From [20], we obtain the following theorem: Theorem 6. There is a fixed k such that it is undecidable whether a k-system has a solution. From Theorems 5 and 6 we directly obtain: Corollary 1. The Presburger reachability problem for multiprocess service automata is undecidable.
Therefore, from Theorem 1, we can conclude that multiprocess service automata are strictly stronger than (single-process) service automata.
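For illustration, checking whether given candidate values satisfy a k-system in the simplified form (6) is straightforward; the hardness in Theorems 5 and 6 lies entirely in deciding whether any solution exists. A sketch with the linear polynomials encoded as Python callables (our own encoding choice):

```python
# A[i][j] is the polynomial A_{i+1, j+1}; C[i] is C_{i+1}; t and s are the
# candidate values for t1..tn and s1..sm.  Verifies every equation
#   t1*A_i1(s) + ... + tn*A_in(s) = C_i(s).
def satisfies_k_system(A, C, t, s):
    return all(
        sum(tj * Aij(*s) for tj, Aij in zip(t, row)) == Ci(*s)
        for row, Ci in zip(A, C)
    )
```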
6 Discussions

Service automata are a form of P system based high-level network programs, running over a network virtual machine. The virtual machine specifies abstract network communicating objects and the operations among the objects. In parallel to the idea of the Java
Fig. 3. A comparison of Java and P system based high-level network programs: (a) Java programs → Java virtual machine → concrete machine code over CPU and memory; (b) P system based high-level network programs → network virtual machine → concrete network protocol over a physical network.
Virtual Machine [17], shown in Fig. 3, service automata can be automatically compiled into programs on the network virtual machine and later mapped to concrete network protocols on physical networks. One can refer to [18] for a detailed mechanism of the compiler and the mapping. Since service automata are independent of the underlying physical networks, similarly to Java, network applications specified by service automata are more portable and easier to verify and test.
References

1. Cardelli, L., Ghelli, G., Gordon, A.: Types for the ambient calculus (2002)
2. Cardelli, L., Ghelli, G., Gordon, A.D.: Ambient groups and mobility types. In: Watanabe, O., Hagiya, M., Ito, T., van Leeuwen, J., Mosses, P.D. (eds.) TCS 2000. LNCS, vol. 1872, pp. 333–347. Springer, Heidelberg (2000)
3. Cardelli, L., Gordon, A.D.: Mobile ambients. In: Nivat, M. (ed.) FOSSACS 1998. LNCS, vol. 1378. Springer, Heidelberg (1998)
4. Caro, G.D., Dorigo, M.: Two ant colony algorithms for best-effort routing in datagram networks. In: Proceedings of the Tenth IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 1998), pp. 541–546 (1998)
5. Carriero, N., Gelernter, D.: Linda in context. Commun. ACM 32(4), 444–458 (1989)
6. Dang, Z., Ibarra, O.H.: On one-membrane P systems operating in sequential mode. Int. J. Found. Comput. Sci. 16(5), 867–881 (2005)
7. Dorigo, M., Caro, G.D.: The ant colony optimization meta-heuristic. In: New Ideas in Optimization, pp. 11–32. McGraw-Hill, London (1999)
8. Dorigo, M., Gambardella, L.M.: Ant colony system: A cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation 1(1), 53–66 (1997)
9. Ginsburg, S., Spanier, E.: Semigroups, Presburger formulas, and languages. Pacific J. of Mathematics 16, 285–296 (1966)
10. Ibarra, O.H., Yen, H., Dang, Z.: On various notions of parallelism in P systems. Int. J. Found. Comput. Sci. 16(4), 683–705 (2005)
11. Lohmann, N., Massuthe, P., Wolf, K.: Operating guidelines for finite-state services. In: Kleijn, J., Yakovlev, A. (eds.) ICATPN 2007. LNCS, vol. 4546, pp. 321–341. Springer, Heidelberg (2007)
12. Lynch, N.A., Tuttle, M.R.: An introduction to input/output automata. CWI-Quarterly 2(3), 219–246 (1989)
13. Păun, Gh.: Introduction to membrane computing. See P Systems Web Page, http://psystems.disco.unimib.it
14. Păun, Gh.: Computing with membranes. Journal of Computer and System Sciences 61(1), 108–143 (2000)
15. Satyanarayanan, M.: Pervasive computing: vision and challenges. IEEE Personal Communications 8(4), 10–17 (2001)
16. Di Marzo Serugendo, G., Muhugusa, M., Tschudin, C.F.: A survey of theories for mobile agents. World Wide Web 1(3), 139–153 (1998)
17. Sun: Java remote method invocation (2007), http://java.sun.com/j2se/1.4.2/docs/guide/rmi/
18. Wang, Y.: Clustering, grouping, and process over networks. PhD thesis, Washington State University (2007)
19. Weiser, M.: The computer for the 21st century. Scientific American 265(3), 66–75 (1991)
20. Xie, G., Dang, Z., Ibarra, O.H.: A solvable class of quadratic Diophantine equations with applications to verification of infinite state systems. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 668–680. Springer, Heidelberg (2003)
21. Yang, L., Dang, Z., Ibarra, O.H.: Bond computing systems: a biologically inspired and high-level dynamics model for pervasive computing. In: Akl, S.G., Calude, C.S., Dinneen, M.J., Rozenberg, G., Wareham, H.T. (eds.) UC 2007. LNCS, vol. 4618, pp. 226–241. Springer, Heidelberg (2007)
Author Index
Akl, Selim G. 177
Andrews, Paul 8
Axelsen, Holger Bock 228
Barth, Dominique 19
Beggs, Edwin 33
Bournez, Olivier 19
Boussaton, Octave 19
Brukner, Časlav 1
Ciobanu, Gabriel 51
Clark, Ed 8
Cohen, Johanne 19
Condon, Anne 6
Corne, David 7
Costa, José Félix 33
Dang, Zhe 242
Delzanno, Giorgio 64
Hainry, Emmanuel 83
Krishna, Shankara Narayanan 96
Loff, Bruno 33
Lürwer-Brüggemeier, Katharina 111
Majumder, Urmi 129
Marques-Pita, Manuel 146
Mitchell, Melanie 146
Murphy, Niall 164
Nagy, Marius 177
Nagy, Naya 177
Neary, Turlough 189
Owens, Nick 8
Patitz, Matthew J. 206
Potgieter, Petrus H. 220
Reif, John H. 129
Rocha, Luis M. 146
Rosinger, Elemér E. 220
Summers, Scott M. 206
Thomsen, Michael Kirkedal 228
Timmis, Jon 8
Tucker, John V. 33
Van Begin, Laurent 64
Wang, Yong 242
Woods, Damien 164
Yang, Linmin 242
Ziegler, Martin 111