Lecture Notes in Computer Science 1346
Edited by G. Goos, J. Hartmanis and J. van Leeuwen
Advisory Board: W. Brauer, D. Gries, J. Stoer
S. Ramesh G. Sivakumar (Eds.)
Foundations of Software Technology and Theoretical Computer Science 17th Conference Kharagpur, India, December 18-20, 1997 Proceedings
Springer
Series Editors: Gerhard Goos, Karlsruhe University, Germany; Juris Hartmanis, Cornell University, NY, USA; Jan van Leeuwen, Utrecht University, The Netherlands
Volume Editors: S. Ramesh, G. Sivakumar, Indian Institute of Technology, Department of Computer Science, Powai, Mumbai 400 076, India. E-mail: fsttcs@cse.iitb.ernet.in
Cataloging-in-Publication data applied for
Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Foundations of software technology and theoretical computer science : 17th conference, Kharagpur, India, December 18-20, 1997 ; proceedings / S. Ramesh ; G. Sivakumar (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Budapest ; Hong Kong ; London ; Milan ; Paris ; Santa Clara ; Singapore ; Tokyo : Springer, 1997 (Lecture notes in computer science ; Vol. 1346) ISBN 3-540-63876-8
CR Subject Classification (1991): F.3-4, D.3, D.1, I.2, F.1-2, G.2
ISSN 0302-9743
ISBN 3-540-63876-8 Springer-Verlag Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.
© Springer-Verlag Berlin Heidelberg 1997
Printed in Germany
Typesetting: Camera-ready by author
SPIN 10652655 06/3142 - 5 4 3 2 1 0
Printed on acid-free paper
Preface

The Foundations of Software Technology and Theoretical Computer Science conference, held annually in India, is a well established forum for researchers to present original research results. It is now organized by the Indian Association for Research in Computing Science (IARCS, http://www.imsc.ernet.in/~iarcs). This year's conference is the seventeenth in the series. It attracted 68 submissions from as many as 15 countries. Each submission was reviewed by at least three referees, including one program committee member. Based on the reviews, 18 submissions were selected at the program committee meeting held on July 26, 1997 at the Indian Institute of Technology, Bombay. We thank all the reviewers for their assistance in ensuring the high quality of the program.

One of the attractions of the FSTTCS conference is the invited talks. It is a great pleasure for us to thank the five invited speakers for this year -- Ed Clarke, Deepak Kapur, Madhu Sudan, Vijaya Ramachandran, and Moshe Vardi -- for readily agreeing to give talks and for providing written submissions for the proceedings. This year's conference is preceded by a two-day workshop on randomized algorithms. Special thanks to Vijaya Ramachandran, Edgar Ramos, Aravind Srinivasan, Sandeep Sen, and Madhu Sudan for their participation in the workshop.

The conference and the workshop are held on the campus of the Indian Institute of Technology, Kharagpur. We thank the local organizers and IIT Kharagpur for providing the infrastructural support. We also thank the governmental agencies (DST, DOE, AICTE, INSA) and other organizations (IEEE Kharagpur, Synetics Corporation, Viewlogic Systems, Compaq Computer Asia, Digital Equipment (India) Ltd., Alumnus Software, Motorola India Electronics Ltd., Price Waterhouse Associates, Vedika Software, and Modi Xerox Ltd.) who have extended financial support.
Special thanks to our students Parag Deshmukh, Sridhar Iyer, and Ramesh Babu, and to the office staff of our department for their help in various ways. We also thank Alfred Hofmann, Anna Kramer, and the staff at Springer-Verlag for their continued support in bringing out the proceedings.

IIT Bombay, October 1997
S. Ramesh G. Sivakumar
Program Committee: Manindra Agrawal (IIT Kanpur), Ralph Back (Abo Akademi, Finland), John Case (U. Delaware), Vijay Chandru (IISc Bangalore), Joxan Jaffar (NUS, Singapore), Nachum Dershowitz (UIUC, Illinois), Tamal K. Dey (IIT Kharagpur), Anna Gal (Princeton), Asish Mukhopadhyay (IIT Kanpur), Madhavan Mukund (SMI Madras) (Publicity Chair), Ketan Mulmuley (IIT Bombay/U. Chicago), C. Pandurangan (IIT Madras), P. K. Pandya (TIFR Bombay), A. K. Pujari (U. Hyd., Hyderabad), Vijaya Ramachandran (U. Texas), Krithi Ramamritham (U. Mass.), Venkatesh Raman (IMSc Madras), S. Ramesh (IIT Bombay) (Co-Chair), Bala Ravikumar (U. Rhode Island), Willem-Paul de Roever (Kiel University), Sandeep Sen (IIT Delhi), R. K. Shyamasundar (TIFR Bombay), G. Sivakumar (IIT Bombay) (Co-Chair), Ashok Subramanian (IISc Bangalore)
Organizing Committee: S. C. De Sarkar (IIT Kharagpur) (Chair), T. K. Dey (IIT Kharagpur), P. P. Chakrabarti (IIT Kharagpur), A. Bagchi (IIM Calcutta), M. K. Chakraborty (Calcutta Univ.), P. P. Das (IIT Kharagpur), P. Dasgupta (IIT Kharagpur), G. L. Datta (IIT Kharagpur), A. Pal (IIT Kharagpur), S. P. Pal (IIT Kharagpur), D. Sarkar (IIT Kharagpur), B. P. Sinha (ISI Calcutta)
List of Referees
Manindra Agrawal, Rajeev Alur, S. Arun-Kumar, V. Arvind, Purandar Bhaduri, Binay K. Bhattacharya, Pushpak Bhattacharyya, S. Biswas, Chiara Bodei, Ravi B. Boppana, Roberto Bruni, H. Buhrman, Diego Calvanese, Luca Cardelli, John Case, Ilaria Castellani, Sharat Chandran, Pallab Dasgupta, Abhi Dattasharma, N. Dershowitz, Joerg Desel, Tamal K. Dey, A. A. Diwan, Javier Esparza, Anna Gal, Michael R. Hansen, Nevin Heintze, Matthew Hennessy, Martin Henz, Monika Henzinger, Lucas Hui, Sanjay Jain, J. James, Tao Jiang, L. V. Kale, B. Kalyanasundaram, Deepak Kapur, Padmanabhan Krishnan, K. Narayan Kumar, Yassine Lakhnech, J. C. Liou, Kamal Lodaya, Phil Long, C. E. Veni Madhavan, Meena Mahajan, Massimo Marchiori, Y. Kao Ming, Madhavan Mukund, Ketan Mulmuley, M. Narasimha Murty, Gopalan Nadathur, Y. Narahari, David Naumann, M. Nielsen, Friedrich Otto, Paritosh K. Pandya, Sachin Patkar, Carsta Petersohn, Jaikumar Radhakrishnan, Vijaya Ramachandran, K. Ramamritham, Venkatesh Raman, R. Ramanujam, S. Ramesh, Rafael Ramirez, Edgar Ramos, A. Ranade, Narayan Rangaraj, Desh Ranjan, M. R. K. Krishna Rao, B. Ravikumar, Sandeep Sen, S. Seshadri, Anil Seth, Nimish R. Shah, N. Shankar, Priti Shankar, H. Shrikumar, R. K. Shyamasundar, E. Simon, R. de Simone, Ambuj Singh, Ramesh Sitaraman, G. Sivakumar, Milind Sohoni, Neelam Soundarajan, Aravind Srinivasan, Ashok Subramanian, K. G. Subramanian, P. R. Subramanya, S. Sudarshan, Tiow Seng Tan, P. S. Thiagarajan, Ashish Tiwari, Frits Vaandrager, L. Valiant, G. Venkatesh, H. Venkateswaran, V. Vinay, V. Visvanathan, Limsoon Wong, Qiwen Xu, Sheng Yu, Job Zwiers
Table of Contents

Invited Talk 1
Vijaya Ramachandran
QSM: A general purpose shared-memory model for parallel computation

Contributed Paper Session 1
T. K. Dey, A. Roy, N. R. Shah
Approximating geometric domains through topological triangulations . . . 6

S. Mahajan, E. A. Ramos, K. V. Subrahmanyam
Solving some discrepancy problems in NC . . . 22

K. Cirino, S. Muthukrishnan, N. S. Narayanaswamy, H. Ramesh
Graph editing to bipartite interval graphs: Exact and asymptotic bounds . . . 37

Invited Talk 2
Edmund M. Clarke Jr.
Model checking . . . 54

Contributed Paper Session 2
A. J. Kfoury
Recursion versus iteration at higher-orders . . . 57

A. D. Gordon, P. D. Hankin, S. B. Lassen
Compilation and equivalence of imperative objects . . . 74

M. Marchiori
On the expressive power of rewriting . . . 88

Invited Talk 3
Deepak Kapur, M. Subramaniam
Mechanizing Verification of Arithmetic Circuits: SRT Division . . . 103

Contributed Paper Session 3
E. Pontelli, D. Ranjan, G. Gupta
On the complexity of parallel implementation of logic programs . . . 123

Jia-Huai You, Li-Yan Yuan, Randy Goebel
An abductive semantics for disjunctive logic programs and its proof procedure . . . 138

Contributed Paper Session 4
S. Mohalik, R. Ramanujam
Assumption-Commitment in automata . . . 153

S. S. Kulkarni, A. Arora
Compositional design of multitolerant repetitive Byzantine agreement . . . 169

Invited Talk 4
Madhu Sudan
Algorithmic issues in coding theory . . . 184

Contributed Paper Session 5
A. Seth
Sharper results on the expressive power of generalized quantifiers . . . 200

N. V. Vinodchandran
Improved lowness results for solvable black-box group problems . . . 220

V. Arvind, J. Koebler
On resource-bounded measure and pseudorandomness . . . 235

Invited Talk 5
Moshe Y. Vardi
Verification of open systems . . . 250

Contributed Paper Session 6
F. S. de Boer, U. Hannemann, W.-P. de Roever
Hoare-style compositional proof systems for reactive shared variable concurrency . . . 267

K. S. Namjoshi
A simple characterization of stuttering bisimulation . . . 284

Contributed Paper Session 7
R. Devillers, H. Klaudel, R.-C. Riemann
General refinement for high level Petri nets . . . 297

C. Dufourd, A. Finkel
Polynomial-time many-one reductions for Petri nets . . . 312

B. Graves
Computing reachability properties hidden in finite net unfoldings . . . 327

Author Index . . . 343
QSM: A General Purpose Shared-Memory Model for Parallel Computation*
Vijaya Ramachandran
Department of Computer Sciences, The University of Texas at Austin, Austin, Texas 78712
Abstract. The Queuing Shared Memory (QSM) model is a general purpose shared-memory model for parallel computation. The QSM provides a high-level shared-memory abstraction for effective parallel algorithm design, as well as the ability to capture bandwidth limitations, as evidenced by a randomized work-preserving emulation on the BSP, which is a lower-level, distributed-memory model.
Summary

A fundamental challenge in parallel processing is to develop effective models for parallel computation that balance simplicity, accuracy, and broad applicability. In particular, a simple "bridging" model, i.e., a model that spans the range from algorithm design to architecture to hardware, is an especially desirable one. In [8] we proposed the Queuing Shared Memory (QSM) model as a bridging shared-memory model for parallel computation. The QSM provides a high-level shared-memory abstraction for parallel algorithm design, as well as the capability to model bandwidth limitations and other features of current parallel machines, as evidenced by a randomized work-preserving emulation of the QSM on the Bulk Synchronous Parallel model [14], which is a lower-level, distributed-memory model.
Model Definition and Overview

The Queuing Shared Memory (QSM) model [8] consists of a number of identical processors, each with its own private memory, communicating by reading and writing locations in a shared memory. Processors execute a sequence of synchronized phases, each consisting of an arbitrary interleaving of the following operations:
1. Shared-memory reads: Each processor i copies the contents of r_i shared-memory locations into its private memory. The value returned by a shared-memory read can be used only in a subsequent phase.
* This research was supported in part by NSF grant CCR/GER-90-23059. Email: vlr@cs.utexas.edu. URL: http://www.cs.utexas.edu/users/vlr
2. Shared-memory writes: Each processor i writes to w_i shared-memory locations.
3. Local computation: Each processor i performs c_i RAM operations involving only its private state and private memory.

Concurrent reads or writes (but not both) to the same shared-memory location are permitted in a phase. In the case of multiple writers to a location x, an arbitrary write to x succeeds in writing the value present in x at the end of the phase. The maximum contention of a QSM phase is the maximum, over all locations x, of the number of processors reading x or the number of processors writing x. A phase with no reads or writes is defined to have maximum contention one.

Consider a QSM phase with maximum contention κ. Let m_op = max_i {c_i} for the phase, i.e., the maximum over all processors i of its number of local operations, and let m_rw = max{1, max_i {r_i, w_i}} for the phase. Then the time cost for the phase is max{m_op, g · m_rw, κ}. (Alternatively, the time cost could be m_op + g · m_rw + κ; this affects the bounds by at most a factor of 3, and we choose to use the former definition.) The time of a QSM algorithm is the sum of the time costs for its phases. The work of a QSM algorithm is its processor-time product.

The particular instance of the Queuing Shared Memory model in which the gap parameter, g, equals 1 is the Queue-Read Queue-Write (QRQW) PRAM model defined in [6]. A variant of the QSM in which a gap parameter g is applied to accesses at memory (in addition to requests to global memory from processors) is called the s-QSM. In the s-QSM the time cost of a phase is max{m_op, g · m_rw, g · κ}. It is shown in [8, 12] that the QSM and the s-QSM are interchangeable models for the most part.

The BSP model [14] and the LogP model [4] are well-known distributed-memory parallel computation models. We now state two theorems relating the s-QSM to the BSP. The first theorem states that the s-QSM can be emulated in a work-preserving manner on the BSP with only a modest slowdown.
The second theorem proves the converse, namely, that the BSP can be emulated in a work-preserving manner on the s-QSM with only a logarithmic slowdown. Similar results relating the QSM and BSP are established in [8, 12] (for the QSM we need p ≤ p′/((L/g) + g log p) for Theorem 1 to hold, while Theorem 2 holds as stated below for the s-QSM). In the following two theorems g is the gap parameter for the s-QSM and the BSP, and L is the latency parameter [14] for the BSP.

Theorem 1. A p′-processor s-QSM algorithm that runs in time t′ can be emulated on a p-processor BSP in time t = t′ · (p′/p) w.h.p., provided p ≤ p′/((L/g) + log p) and t′ is bounded by a polynomial in p.
Theorem 2. An algorithm that runs in time t(n) on an n-component BSP, where t(n) is bounded by a polynomial in n, can be emulated with high probability on an s-QSM to run in time O(t(n) · log n) with n/log n processors.
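The phase-cost accounting defined above, and the emulation bound of Theorem 1, can be sketched in code. The sketch below is our own illustration of the definitions, not an implementation from the paper; the class and function names and the example parameters are assumptions chosen for the example.

```python
import math
from dataclasses import dataclass

@dataclass
class Phase:
    """One QSM phase: per-processor read/write/compute counts plus the
    maximum contention kappa over all shared-memory locations."""
    reads: list        # r_i for each processor i
    writes: list       # w_i for each processor i
    local_ops: list    # c_i for each processor i
    contention: int    # kappa; defined to be at least 1

def phase_time(ph, g, s_qsm=False):
    # Time cost of a phase: max{m_op, g*m_rw, kappa} for the QSM,
    # and max{m_op, g*m_rw, g*kappa} for the s-QSM variant.
    m_op = max(ph.local_ops)
    m_rw = max([1] + ph.reads + ph.writes)
    k = g * ph.contention if s_qsm else ph.contention
    return max(m_op, g * m_rw, k)

def algorithm_cost(phases, p, g):
    # Time is the sum of the phase costs; work is the processor-time product.
    t = sum(phase_time(ph, g) for ph in phases)
    return t, p * t

def bsp_emulation_time(t_prime, p_prime, p, g, L):
    # Theorem 1 (sketch): a p'-processor s-QSM algorithm of time t' runs on
    # a p-processor BSP in time t'*(p'/p) w.h.p., provided
    # p <= p' / ((L/g) + log p); the emulation is then work-preserving.
    if p > p_prime / ((L / g) + math.log2(p)):
        raise ValueError("too many BSP processors for a work-preserving emulation")
    return t_prime * p_prime / p

# Example with made-up parameters: 4 processors, gap g = 2, contention 3.
ph = Phase(reads=[2, 2, 2, 2], writes=[1, 1, 1, 1],
           local_ops=[5, 5, 5, 5], contention=3)
print(phase_time(ph, g=2))                 # max{5, 2*2, 3} = 5
print(algorithm_cost([ph, ph], p=4, g=2))  # (10, 40)
print(bsp_emulation_time(t_prime=100, p_prime=1024, p=32, g=4, L=64))  # 3200.0
```

Note that the emulated BSP run does the same total work as the s-QSM run: p · t = 32 · 3200 = 102400 = p′ · t′, which is what "work-preserving" means here.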
Since the QSM (and s-QSM) are higher-level models having fewer parameters than the BSP, we believe that they are more suitable as general-purpose models than the BSP. Furthermore, the QSM models have shared memory, which is a very convenient framework for parallel algorithm design. Table 1 summarizes some algorithmic results for basic problems on the QSM, together with citations to the papers that present the results.

problem (n = size of input)          QSM result^1                                      source
prefix sums, list ranking, etc.^2    O(g log n) time, Θ(gn)^3 work                     EREW
linear compaction                    O(√(g log n) + g log log n) time, O(gn) work      QRQW^4 [6]
random permutation                   O(g log n) time, Θ(gn) work w.h.p.                QRQW [7]
multiple compaction                  O(g log n) time, Θ(gn) work w.h.p.                QRQW [7]
parallel hashing                     O(g log n) time, Θ(gn) work w.h.p.                QRQW [7]
load balancing, max. load L          O(g(√(log n log log L) + log L)) time,            QRQW [7]
                                     Θ(gn) work w.h.p.
broadcast to n mem. locations        Θ(g log n/(log g)) time, Θ(ng) work               QSM [1]
sorting                              O(g log n) time, O(gn log n) work                 EREW [3]
simple fast sorting (sample sort)    O(g log n + log² n/(log log n)) time,             QSM [8]
                                     O(gn log n) work w.h.p.
work-optimal sorting (sample sort)   O(n^ε (g + log n)) time, ε > 0,                   BSP [5]
                                     Θ(gn + n log n) work w.h.p.

Table 1. Summary of some algorithmic results for the QSM.

^1 The time bound stated is the fastest for the given work bound; by [8], any slower time is possible within the same work bound.
^2 Since any EREW result maps onto the QSM with the work and time both increasing by a factor of g [8], the two problems cited in this line are representatives of the large class of problems for which logarithmic time, linear work EREW PRAM algorithms are known [10, 9, 13].
^3 The use of Θ in a work or time bound implies that the result is the best possible, to within a constant factor.
^4 Any QRQW result maps onto the QSM with the work and time both increased by a factor of g [8]. This is the source of all QSM results attributed in Table 1 to QRQW, except the result for linear compaction, where the QRQW algorithm was fine-tuned to reduce the dependence on g.

The random permutation algorithm of [7] cited in Table 1 has been implemented on three parallel machines: an implementation on the MasPar MP-1 is reported in [7] and implementations on the Cray C90 and Cray J90 are reported in [2]. In both studies, the performance of this algorithm was compared to the best previous parallel algorithm for this problem. On all three machines, the algorithm in [7] was found to outperform the other algorithm, giving evidence that the QSM is a good model for parallel algorithm design on these machines.
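As footnotes 2 and 4 note, any EREW or QRQW PRAM bound transfers to the QSM by scaling both time and work by the gap parameter g. A minimal sketch of that mapping (our own illustrative code, with example numbers chosen by us):

```python
import math

def map_to_qsm(time_bound, work_bound, g):
    """Map an EREW or QRQW PRAM bound onto the QSM by scaling both the
    time and the work by the gap parameter g (footnotes 2 and 4)."""
    return g * time_bound, g * work_bound

# E.g. prefix sums: an EREW PRAM bound of O(log n) time and O(n) work
# yields a QSM bound of O(g log n) time and O(gn) work, as in Table 1.
n, g = 1 << 20, 3
t, w = map_to_qsm(math.log2(n), n, g)
print(t, w)   # 60.0 3145728
```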
Conclusion
The QSM is a general purpose shared-memory model for parallel computation that holds the promise of serving as a bridge between parallel algorithms and architectures. Much work still remains to be done in developing efficient algorithms and lower bounds for this model, and in experimental evaluation of the model and algorithms developed for it. In recent work, we have developed some lower bound results for the QSM for several basic problems such as linear compaction, load balancing, parity, list ranking, and sorting [11].
References
1. M. Adler, P. B. Gibbons, Y. Matias, and V. Ramachandran. Modeling parallel bandwidth: Local vs. global restrictions. In Proc. 9th ACM Symp. on Parallel Algorithms and Architectures, pages 94-105, June 1997.
2. G. E. Blelloch, P. B. Gibbons, Y. Matias, and M. Zagha. Accounting for memory bank contention and delay in high-bandwidth multiprocessors. In Proc. 7th ACM Symp. on Parallel Algorithms and Architectures, pages 84-94, July 1995.
3. R. Cole. Parallel merge sort. SIAM Journal on Computing, 17(4):770-785, 1988.
4. D. Culler, R. Karp, D. Patterson, A. Sahay, K. E. Schauser, E. Santos, R. Subramonian, and T. von Eicken. LogP: Towards a realistic model of parallel computation. In Proc. 4th ACM SIGPLAN Symp. on Principles and Practices of Parallel Programming, pages 1-12, May 1993.
5. A. V. Gerbessiotis and L. Valiant. Direct bulk-synchronous parallel algorithms. Journal of Parallel and Distributed Computing, 22:251-267, 1994.
6. P. B. Gibbons, Y. Matias, and V. Ramachandran. The Queue-Read Queue-Write PRAM model: Accounting for contention in parallel algorithms. SIAM Journal on Computing, 1997. To appear. Preliminary version appears in Proc. 5th ACM-SIAM Symp. on Discrete Algorithms, pages 638-648, January 1994.
7. P. B. Gibbons, Y. Matias, and V. Ramachandran. Efficient low-contention parallel algorithms. Journal of Computer and System Sciences, 53(3):417-442, 1996. Special issue devoted to selected papers from the 1994 ACM Symp. on Parallel Algorithms and Architectures.
8. P. B. Gibbons, Y. Matias, and V. Ramachandran. Can a shared-memory model serve as a bridging model for parallel computation? In Proc. 9th ACM Symp. on Parallel Algorithms and Architectures, pages 72-83, June 1997.
9. J. JaJa. An Introduction to Parallel Algorithms. Addison-Wesley, Reading, MA, 1992.
10. R. M. Karp and V. Ramachandran. Parallel algorithms for shared-memory machines. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, Volume A, pages 869-941. Elsevier Science Publishers B.V., Amsterdam, The Netherlands, 1990.
11. P. MacKenzie and V. Ramachandran. Manuscript under preparation, 1997.
12. V. Ramachandran. A general purpose shared-memory model for parallel computation. Invited paper for IMA Volume in Mathematics and Its Applications on 'Algorithms for Parallel Processing', R. Schreiber, M. Heath, A. Ranade, eds. Springer-Verlag. To appear.
13. J. H. Reif, editor. A Synthesis of Parallel Algorithms. Morgan-Kaufmann, San Mateo, CA, 1993.
14. L. G. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8):103-111, 1990.
15. L. G. Valiant. General purpose parallel architectures. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, Volume A, pages 943-972. Elsevier Science Publishers B.V., Amsterdam, The Netherlands, 1990.
Approximating Geometric Domains through Topological Triangulations
Tamal K. Dey 1, Arunabha Roy 1, Nimish R. Shah 2
1 Dept. of CSE, I.I.T. Kharagpur, Kharagpur 721302, India
2 Synopsys Inc., 700 East Middlefield Road, Mt. View, CA 94043, USA

Abstract. This paper introduces a 2-dimensional triangulation scheme based on a topological triangulation that approximates a given domain X within a specified Hausdorff distance from X. The underlying space of the resulting good quality triangulation is homeomorphic to X and contains either equilateral triangles or right angled triangles with 30°, 60°, and 90° angles. For a particular range of approximation tolerance, the number of triangles in the triangulation produced by the method is O(t log² t), where t is the number of triangles in an optimal triangulation, where the optimum is taken over bounded aspect ratio triangulations satisfying a certain boundary condition with respect to X. The method can also produce a quadrangulation of X having similar properties. Relevant implementation issues and results are discussed.
1 Introduction
Triangulations of geometric spaces are used in many applications such as finite element methods, computer graphics and solid modeling. Triangular elements are usually linear and therefore a finite number of triangular elements cannot decompose a curved domain. On the other hand, it is easier to handle linear elements computationally. As a compromise, in most cases a curved domain is approximated with linear elements. A solid is often modeled with a linear boundary which is further broken up into triangular elements for efficient representation [10], for other postprocessing such as finite element analysis [1], for rendering [8], etc. Numerous methods exist to triangulate a domain with linear boundaries; see [3] for a comprehensive coverage of these methods. The requirement that the triangular elements cover the given domain exactly often conflicts with other objectives. For example, consider bounded aspect ratio triangulations in which each triangular element is guaranteed to have an aspect ratio within some limit. A polygon with a very sharp angle can never be triangulated without producing angles less than or equal to that sharp angle. All recent methods of guaranteed aspect ratio triangulation [2, 4, 6, 13] are constrained by this. Frequently, the domain being triangulated does not represent the original object exactly. Therefore, the constraint of triangulating the input domain exactly is too restrictive in these cases. Of independent interest are approximations of geometric domains being used recently in other applications such as simplification of scenes [9] for rendering and collision detection [11] in animations. In this
paper we relax the condition that the underlying space of the triangulation be exactly the same as the given domain. Our triangulation scheme approximates a given two dimensional geometric domain within a user specified tolerance while keeping topological equivalence with the given domain. Since the constraint of fitting the boundary of the domain is not present, we have more flexibility in restricting the angles of the triangulation. In fact, we show that our triangulation produces only two types of triangles, viz. equilateral triangles and right angled triangles with 30°, 60° and 90° angles. The algorithm of [2] guarantees a minimum angle of 18° if the input polygon has all angles greater than 18°. For no-large-angle methods, see [4, 5]. All these algorithms decompose the input polygons exactly. Relaxing this constraint allows our algorithm to handle more general topological domains and produce non-obtuse triangulations with the smallest angle greater than or equal to 30°. The method can be modified very easily to produce a quadrangulation of the domain, which finds applications in finite element methods.

Section 2 introduces some terminology and describes a triangulation method based on the topological triangulation introduced by Edelsbrunner and Shah [7]. The method of generating a topological triangulation is based on a theorem that states the condition for the dual complex of a subdivision of the domain to be a triangulation with underlying space topologically the same as the given domain. A major computational drawback of the method was overcome when Shah [14] introduced a systematic method of choosing Steiner points from the given domain to build the triangulation. However, these earlier works were not concerned with implementation issues, the size of the triangulation, or the quality of the generated triangles. This paper addresses these issues and introduces topological triangulation as an approximation scheme.
Section 3 discusses a triangulation algorithm based on the topological triangulation scheme described in section 2. Implementation issues are discussed in section 4. While implementing the algorithm we discovered some tricks that ease the implementation effort and produce better output with respect to the number of triangles. These tricks are discussed in section 4. Section 5 discusses certain properties of the triangulation produced by the algorithm in section 3. It is shown that the Hausdorff distance between the underlying space of the triangulation and the given domain is at most δ_T, where δ_T is the given approximation tolerance. Equilateral triangular grids, as opposed to the square grids used by Shah [14], are shown to produce triangles with good aspect ratio. The triangulation is a subcomplex of some Delaunay triangulation and has non-obtuse angles only. These properties are desirable in many applications. Let T(X, δ_T) denote the set of bounded aspect ratio triangulations T such that the triangles of T cover the boundary of X and the triangles of T touching the boundary of the underlying space of T are within a distance (2/3)·δ_T from the boundary of X. Let |T| denote the number of triangles in a triangulation T and let t denote the minimum |T| over T ∈ T(X, δ_T). For a particular range of the approximation tolerance δ_T, we show in section 5 that our scheme produces a triangulation S such that |S| = O(t log² t). For other values of the approximation tolerance, we expect the method to produce a triangulation of reasonable size. This is corroborated by
our implementation results. Section 6 describes some implementation results and discusses some ongoing research.
2 Preliminaries

2.1 Basic definitions
In the following we only consider topological spaces that are subspaces of Euclidean real spaces. The d-dimensional real space is denoted by R^d.

SIMPLEX. A simplex σ_T is the convex hull of an affinely independent point set T. If |T| = k + 1, then σ_T is a k-simplex. A j-simplex σ_U is a j-face of σ_T if U ⊆ T. A 0-simplex is also called a vertex, a 1-simplex is called an edge and a 2-simplex is called a triangle.

SIMPLICIAL COMPLEX. A simplicial complex K is a collection of simplices such that the following properties hold. (i) If σ_U ∈ K and V ⊆ U, then σ_V ∈ K. (ii) If σ_U, σ_V ∈ K, then σ_U ∩ σ_V = σ_{U ∩ V}. The first condition implies that all faces of a simplex in K are contained in K. The second condition says that the (possibly empty) intersection of any two simplices in K is their common face. The underlying space of K is denoted by U_K.
HOMEOMORPHISM. A continuous function f : X → Y between two topological spaces X and Y is a homeomorphism if it is a bijection and its inverse is also continuous. X and Y are homeomorphic if there exists a homeomorphism between them.

CLOSED BALLS. A topological space X ⊆ R^d is a closed k-ball if X is homeomorphic to B^k = {p ∈ R^k : |po| ≤ 1}, the unit k-ball. Here, o is the origin of R^k.

HAUSDORFF DISTANCE. Define the distance between a point p ∈ R^d and a compact set X ⊆ R^d to be d(p, X) = min{|px| : x ∈ X}. For a compact set X ⊆ R^d and a non-negative r ∈ R, define X + r = {p ∈ R^d | d(p, X) ≤ r}. The Hausdorff distance between two compact sets X, Y ⊆ R^d is the minimum r ≥ 0 such that X ⊆ Y + r and Y ⊆ X + r. It is straightforward to verify that the Hausdorff distance is indeed a metric on the space of compact sets.

2.2 Topological triangulation
A topological triangulation of a topological space X is a simplicial complex K such that U_K, the underlying space of K, is homeomorphic to X. In this paper X is a compact 2-manifold embedded in R². In general, a connected component of X has an outer boundary and possibly a set of inner boundaries. We denote the boundary and interior of X by bd(X) and int(X) respectively. A Δ-tree E is a hierarchical subdivision of an equilateral triangle, represented by the root of the tree, into smaller equilateral triangles. An internal node of E represents an equilateral triangle which is split into four smaller, congruent, equilateral triangles in this hierarchical subdivision. If the length of the path from
a node n in E to the root is l and the centroid of the triangle t corresponding to n is p, then we shall use V_p^l to refer to both t and n. The usage will be clear from the context. Figure 2.1 illustrates splitting, where a node a is split into four equilateral triangles b, c, d, and e. Observe that the centroid of the triangle corresponding to a is the same as the centroid of the triangle corresponding to d. Let E_X denote the subtree of E whose leaf nodes are exactly those leaf nodes of E which have non-empty intersection with X. Assume that the triangle corresponding to the root of E contains X.
Fig. 2.1. Splitting a node into four children.
Let P = {p | V_p^{l_p} is a leaf node in E_X}. For p ∈ P, we shall refer to V_p^{l_p} simply as V_p, since l_p is determined by the fact that the node is a leaf node. For any T ⊆ P, define I_T to be the intersection of all V_p such that p ∈ T. For convenience, we define I_T = ∅ when T = ∅. Define I_{T,X} = I_T ∩ X. Along the line of topological triangulation in [7, 14], we construct a simplicial complex from subcomplexes defined for each non-empty I_{T,X} as follows. When |T| = 1, define S(T) = {p}, where T = {p}. If |T| = 2, define S(T) = {p_1, p_2, σ_{p_1,p_2}}, where T = {p_1, p_2}. When |T| ≥ 3, I_T ≠ ∅ is a singleton set, say I_T = {v}. Let p_1, p_2, ..., p_k ∈ T be some sequence of centroids of triangles ordered around v. Consider the polygonal cell P_v with the boundary p_1, p_2, ..., p_k, p_1. Triangulate this polygon, possibly using points inside it, and let S(T) be the corresponding simplicial complex. For an illustration see the polygonal cells P_x and P_y in Figure 3.2 (the cells are not marked in the figure). Finally, let S(X) be the union of all S(T) with non-empty I_{T,X}. It can be shown that S(X) is a simplicial complex. The following theorem, which follows from the results of [14], is at the heart of the validity of topological triangulation. This was proved for the case of cubic subdivisions (quadtree in R² and octree in R³) in [14] and the most general form of the theorem is proved in [7]. The theorem uses a closed ball property, which we simplify for our domain X.
CLOSED BALL PROPERTY. A k-simplex σ satisfies the closed ball property with respect to X if σ ∩ X is either empty or a closed k-ball and σ ∩ bd(X) is either empty or a closed (k − 1)-ball.

Theorem 1. U_{S(X)} is homeomorphic to X if, for any T ⊆ P, I_T = ∅ or I_T satisfies the closed ball property with respect to X.
3 Algorithm
In this section we describe an algorithm to compute a topological triangulation of an input domain X. The input to our algorithm is X and a tolerance of approximation δ_T. The output is a topological triangulation S(X, δ) of X, where δ = (2/3)·δ_T. All triangles of S(X, δ) have bounded aspect ratio. In particular, they are either equilateral triangles or right angled triangles of a specific type. Furthermore, the Hausdorff distance between S = ∪S(X, δ) and X is at most δ_T. The algorithm builds E by successively splitting nodes that do not meet certain conditions. Initially, E consists of a sufficiently large equilateral triangle, with edge length 2^j · δ, j ≥ 0, that contains X. The choice of j is made such that the area of this triangle is O(L²), where L denotes the length³ of bd(X). The size ℓ(σ) of a triangle σ is the length of the longest edge of σ. A leaf node V_p of E_X is split if it does not satisfy one of the following three conditions. (i) V_p and all of its edges and vertices satisfy the closed ball property with respect to X. (ii) If V_p ∩ bd(X) is non-empty then ℓ(V_p) ≤ δ. (iii) If V_p and V_q are adjacent then (1/2)·ℓ(V_q) ≤ ℓ(V_p) ≤ 2·ℓ(V_q). Condition (i) is imposed to achieve the topological equivalence according to Theorem 1. Condition (ii) guarantees approximation within the specified tolerance. Condition (iii) ensures that the sizes of adjacent triangles are within a factor of two, and hence, as we will show, the triangles of S(X, δ) have bounded aspect ratio. Finally, when all leaf nodes of E_X satisfy the above three conditions, we construct a simplicial complex S(X) and define it to be S(X, δ). This triangulation consists of triangles that are formed as follows. Let E(X) denote the set of triangles corresponding to the leaf nodes of E_X. Let P be as defined in the previous section and let T be a subset of P such that |T| ≥ 3 and I_{T,X} ≠ ∅. Then I_{T,X} is a singleton set, say I_{T,X} = {v}. Consider the triangles of E(X) that contain v.
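The splitting loop of this section can be sketched as follows. This is a simplified illustration, not the paper's implementation (which was written in C): the domain X is taken to be a disc, the closed-ball condition (i) and the balance condition (iii) are omitted, and `crosses_boundary` is a vertex/centroid sampling heuristic standing in for a proper intersection test. Only the size condition (ii) is enforced.

```python
import math

def split(tri):
    """Split an equilateral triangle (3 vertex tuples) into 4 congruent children."""
    a, b, c = tri
    mid = lambda p, q: ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
    ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

def size(tri):
    """Length of the longest edge (the paper's l(sigma); all edges are equal here)."""
    a, b, c = tri
    d = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    return max(d(a, b), d(b, c), d(c, a))

def crosses_boundary(tri, center, r):
    # heuristic stand-in for a real intersection test with bd(X): sample the
    # vertices and the centroid; the triangle "meets" bd(X) if samples disagree
    cx = sum(v[0] for v in tri) / 3.0
    cy = sum(v[1] for v in tri) / 3.0
    pts = list(tri) + [(cx, cy)]
    inside = [math.hypot(p[0] - center[0], p[1] - center[1]) <= r for p in pts]
    return any(inside) and not all(inside)

def refine(root, center, r, delta):
    """Split every leaf meeting bd(X) until its size is at most delta (condition (ii))."""
    leaves, out = [root], []
    while leaves:
        t = leaves.pop()
        if crosses_boundary(t, center, r) and size(t) > delta:
            leaves.extend(split(t))
        else:
            out.append(t)
    return out
```

Since every split replaces one leaf by four, the leaf count always stays congruent to 1 modulo 3, and each split halves the edge length, so the loop terminates after O(log(initial size / δ)) levels.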
We have two possible configurations of these triangles around v. Configuration (i): there are exactly four triangles around v and exactly one triangle has double the size of the other three incident to v. In Figure 3.2(b), z is such a vertex. Configuration (ii): there are exactly six triangles incident to v, each of a size within a factor of two of the others. In Figure 3.2(b), y is such a vertex. No other configuration is possible, which can be observed from the fact that the triangles span either an angle of 180° (Figure 3.2(b)) or an angle of 60° around v, and condition (iii) ensures the specific ratios of the sizes of these triangles. In configuration (i), the quadrilateral formed by the centroids of the four triangles is triangulated by adding the unique diagonal that connects the centroids of the larger triangle and a smaller triangle; see Figure 3.2(c). In configuration (ii), the hexagon P_v is triangulated by joining v to each of its vertices, as in Figure 3.2(c).

³ We assume that X is well-behaved so that its area and the length of its boundary are well-defined.

Fig. 3.2. (a) X and E(X). (b) Centroids connected around vertices of triangles in E(X). (c) S(X).

4 Implementation

4.1 Closed ball property
For a triangle V_p, we first check if bd(X) intersects any of its edges in at most one point. This condition is derived from the requirement that bd(X) ∩ I_T is either empty or a 0-ball when |T| = 2. If this condition is satisfied, then the intersection of an edge of V_p with X is either empty or a 1-ball, as required. It only remains to be verified that V_p ∩ X is empty or a 2-ball and that V_p ∩ bd(X) is empty or a 1-ball. These conditions are satisfied iff V_p ∩ X = ∅ or the boundary of V_p ∩ X has one connected component. The latter condition is verified by checking whether any component of bd(X) lies in the interior of V_p.
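For a polygonal bd(X), the first check above (each edge of V_p meets bd(X) in at most one point) reduces to counting segment crossings. A minimal sketch, assuming bd(X) is given as a simple polygon and that degenerate contacts have already been removed by the implicit perturbations of section 4.3:

```python
def orient(p, q, r):
    # sign of the cross product (q - p) x (r - p)
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def segments_cross(a, b, c, d):
    # proper crossing of segments ab and cd (degeneracies assumed perturbed away)
    return orient(a, b, c) != orient(a, b, d) and orient(c, d, a) != orient(c, d, b)

def edge_crossings(edge, polygon):
    """Number of times the closed polygonal boundary crosses a triangle edge."""
    n = len(polygon)
    return sum(segments_cross(edge[0], edge[1], polygon[i], polygon[(i + 1) % n])
               for i in range(n))
```

The closed-ball test for an edge then amounts to `edge_crossings(edge, boundary) <= 1`.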
4.2 Perturbation
Excessive, unnecessary splitting may occur in certain regions of X in some situations. For example, consider the situation illustrated in Figure 4.3. A part of X is slightly above the edge e of a triangle, say V_p. To satisfy the closed ball property, V_p is continuously split until sufficiently small triangles divide V_p ∩ X to meet the closed ball condition. This minute splitting is further propagated to meet condition (iii). We introduce ε-perturbation to prevent this excessive splitting. A topological space Y ⊆ R^d is ε-perturbed to W ⊆ R^d if Y and W are homeomorphic and the Hausdorff distance between the two is less than or equal to ε. To prevent excessive splitting we check if there is a possible δ/2-perturbation of X that would allow V_p to satisfy the closed ball property. In that case we perform a δ/2-perturbation and declare V_p to satisfy the closed ball property. This perturbation may also affect other triangles. For example, the two triangles incident to e in Figure 4.3 satisfy the closed ball property after perturbation. Other, more involved, δ/2-perturbations are not detailed in this version. Experimental results show that this saves a lot of unnecessary splitting. For example,
the domain in Figure A.1 (see appendix) generated 7930 triangles without any perturbation, and 1458 triangles with perturbation.
Fig. 4.3. Excess splitting.
4.3 Handling degeneracies
Let σ ⊆ R² be a k-simplex. We say σ intersects X generically if (i) σ ∩ bd(X) is empty or σ ∩ bd(X) is a (k − 1)-manifold⁴ and (ii) bd(X) ∩ int(σ) = int(bd(X) ∩ σ). Degeneracies occur if this generic intersection property is violated. We handle these degeneracies by small perturbations of bd(X) as shown in Figure 4.4. In Figure 4.4(a), condition (ii) of generic intersection is violated. In Figures 4.4(b) and (c), condition (i) of generic intersection is violated. Note that we do not perform these perturbations geometrically. Rather, we make implicit perturbations and assume that these degeneracies have not occurred. The implicit assumptions are reflected while checking the closed ball properties for triangles participating in degenerate intersections.
Fig. 4.4. Implicit perturbations to avoid degeneracy: (a) bd(X) "touches" an edge in a point; (b) bd(X) intersects an edge in a line segment; (c) bd(X) intersects a vertex.
⁴ A (k − 1)-manifold is defined to be the empty set when k = 0.
4.4 Sequencing condition checking
Excessive splitting may also result from an arbitrary ordering of the checking of the three conditions. For example, suppose V ∈ E has been checked to satisfy the closed ball property. Subsequently, if V is split due to condition (ii) or (iii), the resulting children may not satisfy the closed ball property, and this may trigger a new series of splittings. See Figure 4.5 for an illustration. To avoid this we check the conditions in the following order and adopt a trick. First, all nodes intersecting bd(X) are split until condition (ii) is satisfied. Next, the nodes violating condition (i) even after a δ/2-perturbation are split. At this stage, all nodes satisfy the closed ball property with a δ/2-perturbation. Before we carry out the splitting due to condition (iii), we replace V ∩ bd(X) as follows for each V intersecting bd(X). bd(X) intersects two edges of V, say at x and y (see Figure 4.5(a)). Replace V ∩ bd(X) by the straight line segment connecting x and y. It is easy to check that all children of V satisfy the closed ball property with this replacement of bd(X) and thus do not cause any further splitting due to condition (i). Since we perform the splitting due to condition (ii) first, the approximation distance of S(X, δ), as explained below, remains within the desired bound even with this implicit modification of bd(X).
Fig. 4.5. (a) V satisfies the closed ball property. (b) Three children of V do not satisfy the closed ball property.
5 Triangulation properties
The triangulation S(X, δ) has certain properties that are desirable in most applications requiring triangulations of input domains.
5.1 Hausdorff distance
The Hausdorff distance between S = ∪S(X, δ) and X is at most δ_T = (3/2)·δ. First, we show that S ⊆ X + δ. For a triangle V ∈ E(X), either V ⊆ X or V ∩ bd(X) ≠ ∅. In the latter case, V ⊆ X + δ because of condition (ii) of splitting a node. Since the triangles in S(X, δ) are covered by the triangles in E(X), it follows that S ⊆ X + δ. Now consider a point x ∈ X − S. There exists a triangle V_p ∈ E(X) such that V_p ∩ bd(X) ≠ ∅ and either x ∈ V_p, or x is at a distance of at most δ/2 from some point in V_p. By condition (ii), the size of V_p is at most δ, and so |px| ≤ (3/2)·δ, implying that X ⊆ S + (3/2)·δ.
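The bound above concerns the continuous sets S and X; on finite point samples (say, vertices of S and a discretization of bd(X)) the metric of section 2.1 can be evaluated directly. A brute-force sketch:

```python
import math

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two finite point sets A and B."""
    d = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    # directed distance: how far a point of P can be from its nearest point of Q
    h = lambda P, Q: max(min(d(p, q) for q in Q) for p in P)
    return max(h(A, B), h(B, A))
```

Each directed distance takes the maximum over P of the distance to the nearest point of Q; the Hausdorff distance is the larger of the two directions, which is why it is symmetric even though each direction alone is not.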
5.2 Triangle quality
The method described in section 3 produces only two types of triangles, viz. equilateral triangles and right angled triangles that have angles 30°, 60° and 90°. In configuration (i), only right angled triangles are created. To see this, consider the quadrilateral pqrs where p is the centroid of the larger triangle. It is a simple observation that the angle at p is 60° and the angle at the opposite vertex r is 120°. The diagonal pr dissects the quadrilateral into two congruent triangles, and thus each triangle has angles 30°, 60° and 90°. A right angled triangle with these angles has aspect ratio⁵ 2. In configuration (ii), both equilateral and right angled triangles are generated. It can be shown using simple geometry that σ_{p_i, p_{i+1}, v} is an equilateral triangle (Figure 3.2(c)) if both p_i and p_{i+1} are centroids of triangles of equal size. When these two centroids come from triangles, one of whose size is double the size of the other, the triangle σ_{p_i, p_{i+1}, v} is a right angled triangle whose other two angles are 30° and 60°. This implies that the triangles produced by the method in section 3 have aspect ratio 2 or 2/√3. Experiments with different domains have shown that most of the triangles are equilateral triangles. We remark that, in general, a factor of 2^j for j > 0 can be used in condition (iii). As j increases the aspect ratio increases and the size of the triangulation decreases. Considering the trade-off between the aspect ratio and the size of a triangulation, j = 1 seems to be the right choice. For j > 1 triangles may not remain non-obtuse, and for j = 0 all triangles are equilateral and have the same size, which results in a large number of triangles in the triangulation.
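The claim that only these two triangle shapes occur can be checked numerically; a small sketch (law of cosines) that classifies a triangle by its sorted interior angles:

```python
import math

def angles(a, b, c):
    """Interior angles of triangle abc in degrees, sorted increasingly."""
    d = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    x, y, z = d(b, c), d(c, a), d(a, b)        # sides opposite a, b, c
    # law of cosines: cos A = (y^2 + z^2 - x^2) / (2 y z)
    ang = lambda opp, s1, s2: math.degrees(
        math.acos((s1 * s1 + s2 * s2 - opp * opp) / (2.0 * s1 * s2)))
    return sorted([ang(x, y, z), ang(y, z, x), ang(z, x, y)])
```

Equilateral triangles classify as (60°, 60°, 60°) and the right angled ones as (30°, 60°, 90°), matching the two types named above.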
5.3 Delaunayhood
It can be shown that each triangle formed in configurations (i) and (ii) is Delaunay. In configuration (i), the four centroids are co-circular and this circle does not contain any other centroid or vertex of triangles in E(X). In configuration (ii), the circumcircle of each triangle σ_{p_i, p_{i+1}, v} does not contain any other centroid or vertex of triangles in E(X). It means that S(X, δ) is a subcomplex of a Delaunay triangulation of the vertex set of S(X, δ).

⁵ The aspect ratio of a triangle is the length of the hypotenuse (longest side) divided by the length of the altitude from the hypotenuse.
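The Delaunay property can be verified with the standard in-circle predicate; for the co-circular centroids of configuration (i) the determinant vanishes. A sketch (the sign is exact for integer or rational input):

```python
def in_circle(a, b, c, d):
    """Sign of the in-circle determinant: positive iff d lies strictly inside the
    circle through a, b, c (listed in counterclockwise order), zero if co-circular."""
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    # 3x3 determinant of the lifted, translated points
    det = a1 * (b2 * c3 - b3 * c2) - a2 * (b1 * c3 - b3 * c1) + a3 * (b1 * c2 - b2 * c1)
    return (det > 0) - (det < 0)
```

A triangle abc of a triangulation is locally Delaunay when `in_circle(a, b, c, d) <= 0` for every other vertex d.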
5.4 Size of S(X, δ)
Usually triangulations are used for certain postprocessing in most applications. The larger the size of the triangulation, the more time required for postprocessing. As a result, attempts are made to keep the size of a triangulation as small as possible. In general, finding an optimal triangulation having the minimum number of triangles and satisfying a set of constraints seems hard. Many triangulation methods guarantee an output within a constant factor of the optimum [2, 13] when the input domain is a polygon.
Fig. 5.6. Approximating with a single triangle.
Achieving such a bound for general domains in R² eludes us. For example, the domain in Figure 5.6 can be approximated with a single triangle within reasonably small tolerance. The topological triangulation method described in this paper may produce many more triangles due to certain placements of the Δ-tree. Still, we can show that our method produces a triangulation whose size is not too far away from the optimal under certain restrictions. We say that δ_T satisfies the locality property if the following two conditions hold. Recall that δ = (2/3)·δ_T.
1. For any triangle σ with size δ or less, σ ∩ bd(X) has length at most O(ℓ(σ)).
2. Any triangle σ with size δ or less satisfies the closed ball property with respect to X with a δ/2-perturbation of X.
Let B(δ) denote the strip obtained by taking the union of all geometric discs of radius δ centered at points on bd(X). A triangle of a triangulation is a boundary triangle if it has non-empty intersection with the boundary of the underlying space of the triangulation. A triangulation T(X, δ) is boundary restricted if all boundary triangles lie within B(δ) and bd(X) is covered by the boundary triangles of T(X, δ). Let T*(X, δ) denote an optimal triangulation with respect to triangulation size, where the minimum is taken over all boundary restricted triangulations with bounded aspect ratio⁶. We show that the algorithm in section 3 produces a boundary restricted triangulation which is nearly optimal when δ_T satisfies the locality property.

⁶ A triangulation T has aspect ratio c if no triangle in T has aspect ratio greater than c. A triangulation T has bounded aspect ratio if there exists a constant c such that the aspect ratio of T is c.
First we prove the following useful lemmas for the triangulation S(X, δ). In the following, C, with and without subscripts, denotes suitable constants, and L denotes the Euclidean length of bd(X).

Lemma 2. For any δ_T satisfying the locality property, triangles in E(X) corresponding to the leaves of E_X have size at least δ.

Proof. Suppose that there are triangles in E(X) with size smaller than δ. Let V_p be the first node in the Δ-tree with size smaller than δ. Then V_p cannot be generated by a split due to condition (iii). The size of V_p must be δ/2 and so the size of the parent V_q of V_p must be δ. V_q satisfies condition (ii) and, since δ_T satisfies the locality property, V_q could not have been split due to condition (i). This contradicts the existence of V_p.

Lemma 3. The depth of any leaf V_p ∈ E(X) is at most C · log(L/δ) for some
constant C.

Proof. By lemma 2, the area of V_p is at least C_1·δ² for some constant C_1. The area of a parent in the Δ-tree is four times that of any of its children. Since the triangle corresponding to the root of the Δ-tree has area O(L²), the depth of a leaf cannot be larger than log₄(O(L²)/(C_1·δ²)) = O(log(L/δ)), and the bound follows.

Lemma 4. For any δ_T satisfying the locality property, |S(X, δ)| ≤ C · k · log²(L/δ) for some constant C, where k is the number of boundary triangles in S(X, δ).
Proof. We note that |S(X, δ)| = O(|E(X)|) and so it suffices to count the number of triangles in E(X). The number of nodes in E_X at any level⁷ intersecting bd(X) is at most k. Since these nodes have three siblings, there are at most 4k nodes created at any level due to condition (i) or (ii). Let h be the height of E_X. Thus the number of nodes in E_X created due to condition (i) or (ii) is at most 4kh. We claim, without giving details, that the splitting of a node due to condition (i) or (ii) causes at most 3h nodes to split to satisfy condition (iii). Thus the number of nodes in E_X generated due to condition (iii) is at most 4kh · (3h · 4) = 48kh², since four nodes are generated when a node splits. The bound on E(X) then follows from the bound on h from lemma 3.

Lemma 5. For any δ_T satisfying the locality property, |T*(X, δ)| ≥ C · (L/δ + β(X)) for some constant C, where β(X) is the number of components of bd(X).
Proof. Let σ ∈ T*(X, δ) be any boundary triangle. Since σ lies inside B(δ), its inscribed circle has a radius less than δ. Also, the aspect ratio of σ is bounded by a constant, whence it follows that the size of σ is at most C_1·δ. One can easily show that σ covers at most C_2·δ of bd(X). To see this, divide σ into four similar, congruent triangles and carry on this subdivision successively on the resulting smaller triangles until the triangles have a size smaller than δ. At

⁷ We adopt the convention that the root has level 0 and a child has a level one greater than that of its parent.
most O(C_1²) triangles are created. Because δ_T satisfies the locality property, the length of bd(X) contained in any small triangle is O(δ). This means that at least L/(C_2·δ) boundary triangles are needed to account for the entire boundary of X. Therefore, |T*(X, δ)| ≥ C · (L/δ + β(X)), since at least Ω(β(X)) triangles are needed to account for all components of bd(X).

Theorem 6. For any δ_T satisfying the locality property, |S(X, δ)| ≤ C · |T*(X, δ)| · log²|T*(X, δ)| for some constant C.
Proof. Let k be the number of boundary triangles in S(X, δ) and let E′(X) denote the collection of all triangles V_p ∈ E(X) such that V_p ∩ bd(X) ≠ ∅. We note that k = O(|E′(X)|). Let β(X) denote the number of components of bd(X). We show that |E′(X)| is at most C_1 · (L/δ + β(X)). The required bound then follows from lemma 4 and lemma 5. Consider a triangle V_p ∈ E′(X). Let u be the unique vertex of V_p that lies on the side of bd(X) opposite to the side containing the other two vertices of V_p. We call u the cap vertex of V_p. Let v be another cap vertex connected to u. If no such v exists, then a component of bd(X) surrounds u and u is incident to at most six triangles; see Figure 5.7(a). If v exists, then we claim that the two triangles V_{p1} and V_{p2} incident to uv cover at least (√3/2)·δ length of bd(X). Let V_{p1} denote the triangle whose two edges incident to u are intersected by bd(X). Let bd(X) intersect uv at y, the other edge of V_{p1} incident to u at x, and the other edge of V_{p2} incident to v at z; see Figure 5.7(b). It is easy to prove using the sine law that |uy| ≤ (2/√3)·|xy| and |vy| ≤ (2/√3)·|yz|, whence it follows that |uv| ≤ (2/√3)·|xz|. Since all triangles of E′(X) are of size no smaller than δ (by lemma 2), |uv| ≥ δ, implying that |xz| ≥ (√3/2)·δ. Hence, any two triangles incident to two adjacent cap vertices cover at least (√3/2)·δ length of bd(X). It means that at most (2/√3)·(L/δ) + β(X) cap vertices are present. The bound on |E′(X)| follows since at most six triangles can be incident to any cap vertex.
Fig. 5.7. (a) A cap vertex surrounded by a component of bd(X). (b) u and v are cap vertices.
5.5 Quadrangulations
In finite element methods, quadrilaterals are often used as basic elements. This requires a quadrangulation of an input domain. Most of the existing algorithms convert triangulations to quadrangulations [12]. Our method can provide a quadrangulation approximating the input domain directly from E(X). Configuration (i) already produces convex quadrilaterals. In configuration (ii), we add one of the diagonals of P_v to break it up into two convex quadrilaterals. We choose the diagonal that produces the most symmetric quadrilaterals. Figure A.3 (see appendix) shows quadrangulations with two different approximation tolerances. It is not hard to show that the quadrilaterals have good aspect ratio, i.e., the ratio of the longest dimension to the shortest dimension is bounded by a constant.
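The configuration (ii) step can be sketched as follows. Balancing the perimeters of the two halves is our stand-in proxy for the "most symmetric" criterion, which is not specified further in this version of the paper.

```python
import math

def split_hexagon(hx):
    """Split hexagon hx (6 counterclockwise vertices) into two quadrilaterals along
    the main diagonal whose two halves have the most balanced perimeters."""
    d = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    perim = lambda poly: sum(d(poly[i], poly[(i + 1) % len(poly)])
                             for i in range(len(poly)))
    best = None
    for i in range(3):                        # the three main diagonals (v_i, v_{i+3})
        q1 = [hx[i], hx[i + 1], hx[i + 2], hx[i + 3]]
        q2 = [hx[(i + 3) % 6], hx[(i + 4) % 6], hx[(i + 5) % 6], hx[i]]
        gap = abs(perim(q1) - perim(q2))
        if best is None or gap < best[0]:
            best = (gap, q1, q2)
    return best[1], best[2]
```

For the hexagonal cells P_v of configuration (ii), whose vertices are centroids of triangles of comparable size, each main diagonal yields two convex quadrilaterals, so the choice only affects their symmetry.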
6 Experimental results and discussion
We implemented the algorithm of section 3 in C on a Unix platform. It is our experience that the perturbation method decreases the number of triangles considerably as we lower the value of δ_T. Figure A.2 (see appendix) shows triangulations of a domain with three different values of the tolerance. As expected, the number of triangles increases as we decrease the tolerance. The numbers of triangles in the three triangulations are 370, 1258 and 5102. Quadrangulations of the same domain are shown in Figure A.3 (see appendix) for two different tolerances. This approximation scheme extends to three dimensions. Topological triangulation with a cubic grid is dealt with in [14]. It needs to be investigated whether the quality of the tetrahedral elements can be improved with some other grid. An implementation of the method in R³ is expected to face other challenges. Currently we are in the process of implementing the method using a cubic grid.
References

1. T. Baker. Automatic mesh generation for complex three-dimensional regions using constrained Delaunay triangulation. Engineering with Computers, vol. 5, 1989, 161-175.
2. M. Bern, D. Eppstein and J. Gilbert. Provably good mesh generation. Proc. 31st IEEE Sympos. Found. Comput. Sci., 1990, 231-241.
3. M. Bern and D. Eppstein. Mesh generation and optimal triangulation. In Computing in Euclidean Geometry, D. Z. Du and F. K. Hwang (eds.), World Scientific, 1992.
4. M. Bern, D. Dobkin and D. Eppstein. Triangulating polygons without large angles. Proc. 8th Ann. Sympos. Comput. Geom., 1992, 222-231.
5. M. Bern, S. A. Mitchell and J. Ruppert. Linear-size nonobtuse triangulation of polygons. Proc. 10th Ann. Sympos. Comput. Geom., 1994, 221-230.
6. L. P. Chew. Guaranteed quality triangular meshes. Tech. Report TR89-983, Dept. of Comput. Sc., Cornell Univ., Ithaca, NY, 1989.
7. H. Edelsbrunner and N. R. Shah. Triangulating topological spaces. Proc. 10th Ann. Sympos. Comput. Geom., 1994, 285-292.
8. J. D. Foley, A. Van Dam, S. K. Feiner, et al. Computer Graphics: Principles and Practice. Addison-Wesley, Reading, MA, 1990.
9. P. S. Heckbert and M. Garland. Multiresolution modeling for fast rendering. Proc. Graphics Interface, 1994, 43-50.
10. C. M. Hoffmann. Geometric and Solid Modeling: An Introduction. Morgan Kaufmann Publishers, Inc., 1989.
11. P. M. Hubbard. Approximating polyhedra with spheres for time-critical collision detection. ACM Trans. Graphics, vol. 15, 1996, 179-210.
12. S. Ramaswami, P. Ramos and G. Toussaint. Converting triangulations to quadrangulations. Proc. 7th Canad. Conf. Comput. Geom., 1995, 297-302.
13. J. Ruppert. A Delaunay refinement algorithm for quality 2-dimensional mesh generation. Manuscript, 1993.
14. N. R. Shah. Homeomorphic meshes in R³. Proc. 7th Canad. Conf. Comput. Geom., 1995, 25-30.
Appendix
Fig. A.1. The triangulation on the left is without perturbation and the triangulation on the right is with perturbation.
Fig. A.2. A domain and its triangulation for three tolerances. The number of triangles grows as the tolerance decreases.
Fig. A.3. Quadrangulation of the domain in Figure A.2 for two tolerances. The number of quadrilaterals grows as the tolerance decreases.
Solving Some Discrepancy Problems in NC

Sanjeev Mahajan,¹* Edgar A. Ramos,²** and K. V. Subrahmanyam³***

¹ LSI Logic, Milpitas, CA 95035, USA
² Max-Planck-Institut für Informatik, Im Stadtwald, 66123 Saarbrücken, Germany
³ SPIC Mathematical Institute, 92 G.N. Chetty Road, T. Nagar, Madras 600 017, India
Abstract. We show that several discrepancy-like problems can be solved in NC² nearly achieving the discrepancies guaranteed by a probabilistic analysis and achievable sequentially. For example, given a set system (X, S), where X is a ground set and S ⊆ 2^X, a set R ⊆ X can be computed in NC² so that, for each S ∈ S, the discrepancy ||R ∩ S| − |S \ R|| is O(√(|S| log |S|)). Whereas previous NC algorithms could only achieve O(√(|S|^{1+ε} log |S|)), ours matches the probabilistic bound achieved sequentially within a multiplicative factor 1 + o(1). Other problems whose NC solution we improve are lattice approximation, ε-approximations of range spaces of bounded VC-exponent, sampling in geometric configuration spaces, approximation of integer linear programs, and edge coloring of graphs.
1 Introduction
Problem and Previous Work. Discrepancy is an important concept in combinatorics, see e.g. [1,5], and theoretical computer science, see e.g. [29,25,10]. It attempts to capture the idea of a good sample from a set. The simplest example, the set discrepancy problem, considers a set system (X, S), where X is a ground set and S ⊆ 2^X is a family of subsets of X. Here one is interested in a subset R ⊆ X such that for each S ∈ S the difference ||R ∩ S| − |S \ R||, called the discrepancy, is small. Using Chernoff-Hoeffding bounds [11,16,30,29,6,31], it is found that a random sample R ⊆ X, with each x ∈ X taken into R independently with probability 1/2, results with nonzero probability in a low discrepancy set: for each S ∈ S, ||R ∩ S| − |S \ R|| = O(√(|S| log |S|)). In [29], using the method of conditional probabilities, this was derandomized to obtain a deterministic sequential algorithm computing such a sample R. In parallel, several approaches have been used (k-wise independence combined with the method of conditional probabilities and relaxed to biased spaces [7,25,27,8]). However, so far these efforts to compute a sample in parallel have resulted only in discrepancies O(√(|S|^{1+ε} log |S|)).

* Work performed while at MPI Informatik, Germany. E-mail: msanjeev@lsil.com
** The work by this author was started at DIMACS/Rutgers Univ., New Brunswick, NJ, USA, supported by a postdoctoral fellowship. E-mail:
[email protected] * * Work performed while visiting MPI Informatik, Germany. E-mail: kv@smi, ornet, in
Results. In this paper, we describe NC algorithms (specifically, the algorithms run in O(log² n) time using O(n^C) processors, for some constant C, in the EREW PRAM model) that achieve the probabilistic bounds (achievable sequentially) within a multiplicative factor 1 + o(1). The technique we use is to model the sampling by randomized finite automata (RFAs)¹ and then fool these automata with a probability distribution of polynomial size support. The approach is not new; in fact, Karger and Koller [18] show how to fool such automata via the lattice approximation problem, using a solution for that problem developed in [25]. However, they apparently did not realize that the lattice approximation problem itself can be modeled by RFAs, and that as a result this and other discrepancy-like problems can be solved in parallel nearly achieving the probabilistic bounds. We also describe how the work of Nisan [28] on fooling RFAs in the context of pseudorandom generators fits the same general approach. We consider a sample R from X with each x_j ∈ X selected into R independently with probability p_j. The goodness of the sample is determined by a polynomial number (in |X|) of random variables c_i = Σ_j a_ij q_j with coefficients a_ij in [0,1] and q_j = 1 iff x_j ∈ R (the indicators for R). More precisely, R is good if for each i, |c_i − μ_i| ≤ λ_i, where μ_i = Σ_j a_ij p_j is the expected value of c_i, and λ_i is a deviation guaranteed by probabilistic (Chernoff-Hoeffding) bounds. Each coefficient a_ij is restricted to have O(log |X|) bits, so that the number of possible values of c_i is polynomial in |X| and therefore c_i can be represented by an RFA of polynomial size, one RFA for each i.
A key point, which perhaps explains why our observations had not been noticed before, is that it is sufficient to fool the individual transition probabilities of the RFAs simultaneously, rather than the joint transition probabilities, since the probability of a bad sample is bounded by the sum of the probabilities that each individual constraint does not hold. Although limited, this framework includes the lattice approximation problem, the discrepancy problem, and sampling problems in computational geometry. Also, since the lattice approximation problem can be used to obtain approximate solutions to integer linear programs [30,29], this leads to improved results in the parallel context. As a result, with no extra effort, we improve on the recent work in [2]. Our improvement also translates to the derandomization in [25] of an algorithm for graph edge coloring by Karloff and Shmoys [20].²

Contents of the Paper. We first state the Chernoff-Hoeffding bounds used in this paper. In Sect. 2, we state and model the lattice approximation problem by RFAs; in Sect. 3, we present the techniques for fooling RFAs and the resulting algorithm for the lattice approximation problem; in Sect. 4, we consider the discrepancy problem and its application to solving the lattice approximation problem; in Sect. 5, we present two applications to computational geometry; finally, in Sect. 6, we briefly mention the applications to approximating integer linear programs and to edge coloring of graphs.

¹ Finite automata in which transitions from a state to its immediate successors occur with a certain probability.
² We thank an anonymous referee for pointing out this application.
Chernoff-Hoeffding Bounds. For independent random variables X_1, …, X_n with values in [0,1], X = Σ_{i=1}^n X_i and μ = E[X], let Δ(μ, z) denote the absolute deviation for which Pr(|X − μ| > Δ(μ, z)) < z. A bound for Δ(μ, z) is obtained using the Chernoff-Hoeffding bounds [11,16,29,1]:

  Δ(μ, z) = Θ(√(μ log(1/z)))              if μ ≥ c log(1/z),
            Θ(log(1/z) / log(log(1/z)/μ))  otherwise.                (1)

We define Δ_k(μ, z) likewise when X is the sum of k-wise independent random variables X_1, …, X_n with values in [0,1]. In this case [6,31]:

  Δ_k(μ, z) = O(√μ (1/z)^{1/k})   if μ ≥ k²,
              O(k (1/z)^{1/k})    otherwise.                          (2)

2 Lattice Approximation
In the lattice approximation (latt. app.) problem we are given an m × n matrix A with a_ij ∈ [0,1] and an n × 1 vector p with p_j ∈ [0,1], and we are to compute an n × 1 vector q with q_j ∈ {0,1}, a lattice vector, that achieves small discrepancies

  Δ_i = |Σ_{j=1}^n a_ij (p_j − q_j)|.

2.1 Randomized Rounding
Raghavan's [29] solution to the latt. app. problem is to set each q_j to 1 with probability p_j, independently of all others, a process called randomized rounding. Let μ_i = Σ_{j=1}^n a_ij p_j. The Chernoff-Hoeffding bounds guarantee that for each i, with probability less than 1/m, Δ_i > Δ(μ_i, 1/m); therefore, with nonzero probability, for all i, Δ_i ≤ Δ(μ_i, 1/m) (m is the number of equations). For μ_i = Ω(log m) (which will be the case most of the time), this is O(√(μ_i log m)). Raghavan [29] converted this probabilistic existence argument into a deterministic algorithm through the so-called method of conditional probabilities (achieving the discrepancies guaranteed by the Chernoff-Hoeffding bounds). A parallel version by Motwani et al. [25] used polynomial size spaces with limited independence, together with a bit-by-bit rounding approach. Unfortunately, with the requirement that the algorithm be in NC, limited independence can only produce discrepancies Δ_i = O(√(μ_i^{1+ε} log m)). Using the Chernoff-Hoeffding bounds for arbitrary p_j's and the construction of k-wise independent probability spaces in [17], it is possible to avoid the bit-by-bit rounding and obtain a faster and simpler algorithm (checking all the points in the probability space in a straightforward manner), though with larger discrepancy and work (the product of the time and the number of processors) bounds. This algorithm, with k = 2, turns out to be useful as a part of the main algorithm. To simplify later expressions, we assume that m is polynomial in n, so that log(n + m) = O(log n). In any case, the resulting work bound is polynomial in n only if this is the case.
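The randomized rounding step itself is simple to sketch. The code below (illustrative names and a toy instance; the deterministic conditional-probabilities version is of course what the paper derandomizes) rounds p to a lattice vector and checks each Δ_i against the √(μ_i log m) scale:

```python
import math
import random

def randomized_rounding(p, seed=0):
    """Set q_j = 1 with probability p_j, independently (randomized rounding)."""
    rng = random.Random(seed)
    return [1 if rng.random() < pj else 0 for pj in p]

def discrepancies(A, p, q):
    """Delta_i = |sum_j a_ij (p_j - q_j)| for each row i of A."""
    return [abs(sum(aij * (pj - qj) for aij, pj, qj in zip(row, p, q)))
            for row in A]

# Toy instance with coefficients and probabilities in [0, 1].
rng = random.Random(2)
m_rows, n_cols = 20, 500
A = [[rng.random() for _ in range(n_cols)] for _ in range(m_rows)]
p = [rng.random() for _ in range(n_cols)]
q = randomized_rounding(p)
mu = [sum(row[j] * p[j] for j in range(n_cols)) for row in A]
# Each Delta_i should sit well below the sqrt(mu_i log m) Chernoff scale.
for d_i, mu_i in zip(discrepancies(A, p, q), mu):
    assert d_i < 10 * math.sqrt(mu_i * math.log(m_rows))
```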
Lemma 1. A lattice vector with discrepancies Δ_i = O(√μ_i · m^{1/k}) can be computed in O(log(m+n)) = O(log n) time using O(n^{k+1} m) processors in the EREW PRAM model.

2.2 Modeling Rounding with Leveled RFAs
Limiting the Precision. In order to derandomize the rounding procedure while getting closer to the probabilistic bound, it is useful to model it with RFAs. Specifically, the idea is to have one RFA for each of the m equations, so that in the i-th RFA, states correspond to the different partial sums Σ_{j=1}^l a_ij q_j, l = 0, …, n and q_j ∈ {0,1}. For this to be useful, the number of states must be polynomial. Fortunately, as observed in [25], the fractional part of the coefficients a_ij (and so of the partial sums) can be truncated without a significant increase in the discrepancies. Also, it will be useful later to limit the precision of the probabilities p_j. More precisely, these parameters can be truncated to L' = log(3n/ε) fractional bits while increasing the discrepancy by at most ε. Let ã_ij and p̃_j be the corresponding truncated numbers; then the discrepancy |Σ_j (a_ij q_j − a_ij p_j)| with respect to the original parameters can be upper bounded by

  Σ_j |a_ij − ã_ij| + Δ̃_i + Σ_j |p̃_j − p_j| + Σ_j |a_ij − ã_ij| ≤ Δ̃_i + ε,

where Δ̃_i is the discrepancy achieved for the truncated parameters. Furthermore, for the integer part of the partial sums, L'' = O(log n) bits suffice. If 1/ε is polynomially bounded, then so is the number of states needed in the RFAs. We assume that ε = O(1) is sufficient, and so L = L' + L'' = 2 log n + O(1) bits are sufficient to represent the different possible sums.
Leveled RFAs. Thus, the rounding procedure can be modeled with m leveled RFAs. The i-th RFA, M_i, consists of n + 1 levels of states N_{i,0}, …, N_{i,n}, so that in N_{i,j} there is a state (i, j, r) for each number r with L bits. The transitions in M_i are between consecutive levels N_{i,j−1} and N_{i,j} in the natural way: (i, j−1, r) is connected to (i, j, r) under q_j = 0, and (i, j−1, r) is connected to (i, j, r + ã_ij) under q_j = 1. The only state s_i = (i, 0, 0) in N_{i,0} is the start state of M_i. A state (i, n, r) in the last level N_{i,n} is accepting if r is within a specified deviation λ_i from μ_i, that is, if |r − μ_i| ≤ λ_i. Let R_i denote the set of rejecting states in N_{i,n}. For two states s and t in some M_i and a string w, s →_w t denotes that starting at s the string w leads to t, and [s →_w t] is an indicator equal to 1 if s →_w t holds and equal to 0 otherwise. Let D be a probability distribution on Σ^l, the set of all 0/1 strings of length l. For w ∈ Σ^l, Pr_D{w} denotes the probability of w in D, and Pr_D{s → t} denotes the probability of s →_w t when w is chosen at random according to D. Then

  Pr_D{s → t} = Σ_{w∈Σ^l} [s →_w t] Pr_D{w}.
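Under full independence, the level-by-level structure of one such M_i is just a dynamic program over partial sums. A minimal sketch with small integer coefficients (so states are exact sums; the names are illustrative):

```python
from collections import defaultdict

def rfa_state_distribution(coeffs, probs):
    """For the leveled RFA of one row: process q_1..q_n and return the
    distribution over final states r = sum_j a_j q_j, with q_j = 1 w.p. p_j."""
    dist = {0: 1.0}                      # start state s_i = (i, 0, 0)
    for a_j, p_j in zip(coeffs, probs):  # transitions N_{i,j-1} -> N_{i,j}
        nxt = defaultdict(float)
        for r, pr in dist.items():
            nxt[r] += pr * (1 - p_j)     # q_j = 0: stay at r
            nxt[r + a_j] += pr * p_j     # q_j = 1: move to r + a_j
        dist = dict(nxt)
    return dist

dist = rfa_state_distribution([1, 2, 1], [0.5, 0.5, 0.5])
print(sorted(dist.items()))  # final states are the possible sums 0..4
```

Fooling the RFA means producing a small-support distribution over the input strings whose induced final-state distribution is close to this one.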
Basic Approach. Let F_n be the fully independent distribution on Σ^n according to the specified bit probabilities p_j. Suppose that we can construct in polynomial time a distribution D_n on Σ^n with polynomial size support such that for each i,

  Σ_{r∈R_i} |Pr_{D_n}{s_i → r} − Pr_{F_n}{s_i → r}| < ε.

Then Σ_{r∈R_i} Pr_{D_n}{s_i → r} < ε + Σ_{r∈R_i} Pr_{F_n}{s_i → r}, and if we set λ_i = Δ(μ_i, 1/2m), the right hand side of this inequality is at most ε + 1/2m. Thus, summing over all i, Σ_i Σ_{r∈R_i} Pr_{D_n}{s_i → r} < mε + 1/2. For ε = 1/2m, this is at most 1. That is, there is at least one event in D_n that gives a lattice vector solution almost as good as that guaranteed by the probabilistic bound under F_n. As a result, we obtain discrepancies within a multiplicative factor 1 + o(1): Δ(μ_i, 1/2m) rather than Δ(μ_i, 1/m). (We could get even closer to the probabilistic bound by further reducing the error in the approximation at the expense of a greater amount of work.) Thus, derandomizing the rounding procedure becomes a problem of fooling a set of leveled RFAs, which is discussed in the next section.

3 Fooling Leveled RFAs in Parallel
Techniques to fool an RFA are found in the work of Nisan [28], in the context of pseudorandom generators, and in the work of Karger and Koller [18], in the context of parallel derandomization. Karger and Koller's approach is stronger in that it achieves relative error in the transition probabilities, while Nisan's approach achieves absolute error. Although Nisan's approach has the advantage of a compact representation, that is not important for our purposes. So far it has gone unnoticed that these techniques are precisely what is needed to nearly achieve the probabilistic bounds for the latt. app. problem in parallel. We present these two approaches in a unified manner for the particular case of leveled RFAs, which results in better processor bounds than if general RFAs are considered.

3.1 General Approach
The goal is to construct a distribution D_n on Σ^n that fools each RFA M_i. We emphasize that we can fool simultaneously the individual transition probabilities of all the RFAs, Pr_{F_n}{s_i → r_i} for all i, but cannot fool the joint transition probabilities Pr_{F_n}{s_1 → r_1, …, s_m → r_m}. Let E_0 be an integer parameter which will correspond to the (approximate) size of D_n, and let W = log E_0.
Algorithm. As in [28,18], D_n is determined by a divide and conquer approach in which the generic procedure fool(l, l') constructs a distribution that fools the transition probabilities between levels l and l' in all the RFAs. fool(l, l') works as follows: it computes, using fool(l, l'') and fool(l'', l') recursively, distributions D_1 and D_2, each of size at most E_0(1 + o(1)), that fool the transitions between states in levels l and l'' = ⌊(l + l')/2⌋, and between states in levels l'' and l'; reduce(D_1 × D_2) then combines D_1 and D_2 into a distribution D of size at most E_0(1 + o(1)) that fools the transitions between states in levels l and l' in all the RFAs. At the bottom of the recursion we use a 0/1 distribution F_1 with support of size E_0, implemented by W unbiased bits, which preserves the transition probabilities exactly.
fool(l, l')
1. if l' = l + 1 then return F_1
2. l'' = ⌊(l + l')/2⌋
3. D_1 = fool(l, l'')
4. D_2 = fool(l'', l')
5. return reduce(D_1 × D_2)
Reduce. Let D̃ = D_1 × D_2 be the product distribution with support supp(D̃) = {w_1w_2 : w_i ∈ supp(D_i)} and Pr_D̃{w_1w_2} = Pr_{D_1}{w_1} Pr_{D_2}{w_2}. A randomized version of the combining is, as in [18]: retain each w ∈ D̃ with a certain probability q(w) into supp(D), with Pr_D{w} = Pr_D̃{w}/q(w). Thus, for all states s, t, the transition probabilities are preserved in expectation:

  E[Pr_D{s → t}] = Σ_w [s →_w t] q(w) · Pr_D̃{w}/q(w) = Pr_D̃{s → t}.    (3)
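This randomized retention step can be sketched directly: keep each w with probability q(w) and reweight by 1/q(w), so each transition probability is preserved in expectation (illustrative names; the deterministic algorithm below replaces the coin flips by a latt. app. solution):

```python
import random

def reduce_randomized(support, prob, q, seed=0):
    """support: list of strings w; prob[w] = Pr_D~{w}; q[w] = retention prob.
    Returns the retained support with reweighted probabilities
    Pr_D{w} = Pr_D~{w} / q(w)."""
    rng = random.Random(seed)
    return {w: prob[w] / q[w] for w in support if rng.random() < q[w]}

# Uniform product distribution over 4-bit strings, target size E0 = 8.
support = [format(i, "04b") for i in range(16)]
prob = {w: 1.0 / 16 for w in support}
E0 = 8
q = {w: E0 / len(support) for w in support}  # uniform case: q(w) = E0/|supp|

D = reduce_randomized(support, prob, q, seed=4)
# In expectation |supp(D)| = E0, and the total retained mass is 1 in
# expectation: E[sum_w Pr_D{w}] = sum_w q(w) * prob[w]/q(w) = 1.
print(len(D), round(sum(D.values()), 3))
```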
This selection also implies that the expected size of supp(D) is Σ_w q(w). We will bound this by our desired value E_0(1 + o(1)) and formulate these conditions as a randomized rounding problem. This is exactly the approach of Karger and Koller [18]; but they missed the fact that the latt. app. problem itself can be modeled by RFAs, and that as a result the probabilistic bound can be nearly achieved. Next, we describe and analyze deterministic procedures to obtain a distribution D of size at most E_0(1 + o(1)) such that for all states s, t the difference |Pr_D{s → t} − Pr_D̃{s → t}| is small. We distinguish two cases according to whether we aim for absolute or relative error in the approximation. These cases correspond to the work in [28] and in [18], respectively.³ Our aim is a unified and self-contained presentation adapted to our situation, emphasizing how new instances of the latt. app. problem appear naturally in solving the original instance.

3.2 Absolute Error
Let D be the distribution resulting from fool(l, l') at the k-th level of the recursive computation, the 0-th level being the bottom of the recursion. D should fool the RFAs in the sense that, for each s = s_{i,l} = (i, l, 0),

  Σ_{t∈N_{i,l'}} |Pr_D{s → t} − Pr_{F_h}{s → t}| ≤ ε_k,

where h = l' − l and ε_k is an upper bound on the absolute error accumulated up to the k-th recursion level. Note that if the transitions from s_{i,l} = (i, l, 0) are fooled, then the transitions from the other states (i, l, r), r ≠ 0, in N_{i,l} are automatically fooled as well (because a string w induces a transition from (i, l, 0) to (i, l', δ) iff it induces a transition from (i, l, r) to (i, l', r + δ)).

³ For most of our applications, absolute error suffices. However, it turns out that for some range of the parameters of the latt. app. problem, using the relative error option results in a lower work bound. See Sect. 5 for an application in which relative error seems to be needed.
Accumulation of Error. Let us assume that D, obtained from D̃ at level k, satisfies

  Σ_{t∈N_{i,l'}} |Pr_D̃{s → t} − Pr_D{s → t}| ≤ ε̂    (4)

for each s = s_{i,l}. Since (proof omitted here)

  Σ_{t∈N_{i,l'}} |Pr_D̃{s → t} − Pr_{F_h}{s → t}| ≤ 2ε_{k−1},

then ε_k ≤ 2ε_{k−1} + ε̂, and so ε_k ≤ (2^k − 1)ε̂. Let d = ⌈log n⌉ be the last level. In order to achieve final error ε_d ≤ ε, we choose ε̂ = ε/n.
Computing D from D̃. At each stage of the algorithm the partial distribution D constructed will be uniform on its support. If |supp(D̃)| is less than E_0, then D = D̃ and nothing needs to be done. Otherwise, D is obtained from D̃ as follows. We have the following equations for every pair of states s = s_{i,l} and t ∈ N_{i,l'}:

  Σ_{w∈supp(D̃)} [s →_w t] Pr_D̃{w} = Pr_D̃{s → t};    (5)

and there is also the normalization condition:

  Σ_{w∈supp(D̃)} Pr_D̃{w} = 1.    (6)

Multiplying each of these equations by E_0, and with q(w) = E_0/|supp(D̃)|, we obtain the following equations:

  Σ_{w∈supp(D̃)} [s →_w t] q(w) = Pr_D̃{s → t} E_0    for each M_i, s = s_{i,l} and t ∈ N_{i,l'},    (7)

  Σ_{w∈supp(D̃)} q(w) = E_0.    (8)

These equations define a latt. app. problem, whose solution is the desired probability space D: the support of D will be precisely the support of this lattice vector, and the elements in the support will be assigned probability 1/|supp(D)|.⁴ A solution to this latt. app. problem, as indicated earlier, is to retain into D each element w ∈ supp(D̃) with probability q(w).

⁴ To satisfy Eqn. (3) exactly we would need to assign each element retained in D the probability Pr_D̃{w}/q(w). However, D may not satisfy the requirements of a probability distribution under such an assignment, and so we normalize the probability of every element to 1/|supp(D)|. We show below that even under such an assignment D is a probability distribution of small size approximating D̃ well.
Let η = 2^L be the number of states in a level, and let N = mη be the number of pairs i, t. So the latt. app. problem in Eqns. (7–8) has N + 1 equations. Using the Chernoff-Hoeffding bounds, there exists a lattice vector (whose support is identified with supp(D) in the sequel) such that for all states s, t the following holds with nonzero probability:

  |Σ_{w∈supp(D)} [s →_w t] − Pr_D̃{s → t} E_0| ≤ Δ(Pr_D̃{s → t} E_0, Pr_D̃{s → t}/(m + 1)),

  ||supp(D)| − E_0| ≤ Δ(E_0, 1/(m + 1)).

The probability is nonzero since Σ_i Σ_{t∈N_{i,l'}} Pr_D̃{s → t}/(m + 1) + 1/(m + 1) = 1. Letting γ = |supp(D)|/E_0, this is equivalent to

  |Pr_D{s → t} γ − Pr_D̃{s → t}| ≤ Δ(Pr_D̃{s → t} E_0, Pr_D̃{s → t}/(m + 1))/E_0,

  |γ − 1| ≤ Δ(E_0, 1/(m + 1))/E_0.

So, for all s = s_{i,l} and t ∈ N_{i,l'}, the following holds with nonzero probability:

  |Pr_D̃{s → t} − Pr_D{s → t}| ≤ |Pr_D̃{s → t} − Pr_D{s → t} γ| + Pr_D{s → t} |γ − 1|
    ≤ Δ(Pr_D̃{s → t} E_0, Pr_D̃{s → t}/(m + 1))/E_0 + Δ(E_0, 1/(m + 1))/E_0.

In order to achieve the error bound between D̃ and D expressed by Eqn. (4), it is sufficient that |Pr_D̃{s → t} − Pr_D{s → t}| ≤ ε̂/η = ε/(nη).
Choice of E_0. If the w's are selected with probability q(w) using a k-wise independent probability distribution, then using the estimate for Δ_k(μ, z) in Eqn. (2), we obtain that |Pr_D̃{s → t} − Pr_D{s → t}| ≤ C m^{1/k}/√E_0. So we need that C m^{1/k}/√E_0 ≤ ε/(nη). We then choose E_0 so that this holds:

  E_0 ≥ C n²η²m^{2/k}/ε².    (9)

3.3 Relative Error
In this case, D should fool the RFAs in the sense that, for each s = s_{i,l} and t ∈ N_{i,l'},

  |Pr_D{s → t}/Pr_{F_h}{s → t} − 1| ≤ δ_k,

where δ_k is the relative error accumulated up to the k-th recursion level. To achieve this, the distribution D is allowed to be non-uniform on its support (a probability distribution uniform on a support of polynomial size cannot have events with very small probability). The probabilities q(w) with which elements in D̃ are retained into D are also non-uniform. As in the absolute error case, we set up a latt. app. problem, and the support of D will be precisely the support of a solution to it. Instead of assigning each element in the support of D a probability Pr_D̃{w}/q(w), as required to satisfy Eqn. (3), we normalize it by γ = Σ_{w∈supp(D)} Pr_D̃{w}/q(w); that is, Pr_D{w} = Pr_D̃{w}/(q(w)γ).
Accumulation of Error. Let us assume that D, obtained from D̃ at level k, satisfies

  |Pr_D{s → t}/Pr_D̃{s → t} − 1| ≤ δ̂    (10)

for each s = s_{i,l} and t ∈ N_{i,l'}. Since (proof omitted here)

  |Pr_D̃{s → t}/Pr_{F_h}{s → t} − 1| + 1 ≤ (1 + δ_{k−1})²,

then (1 + δ_k) ≤ (1 + δ_{k−1})²(1 + δ̂), and δ_d ≤ (1 + δ̂)^n − 1 ≤ 2nδ̂ for δ̂ sufficiently small. Accordingly, we choose δ̂ = δ/2n to achieve total relative error δ.
Choice of q(w). If |supp(D̃)| is less than E_0, then D = D̃ and nothing needs to be done. Otherwise, we proceed as follows. Let Λ = E_0/(N + 1). We rewrite Eqns. (5) and (6) as

  Σ_{w∈supp(D̃)} [s →_w t] Pr_D̃{w} Λ / Pr_D̃{s → t} = Λ,    (11)

  Σ_{w∈supp(D̃)} Pr_D̃{w} Λ = Λ.    (12)

The probabilities q(w) are chosen as small as possible (to reduce the size of the support) while keeping each coefficient in this system of equations at most 1, so that these equations constitute a latt. app. problem. Therefore, as in [18], we choose:

  q(w) = max( max_{i,s,t} [s →_w t] Pr_D̃{w} Λ / Pr_D̃{s → t},  Pr_D̃{w} Λ ).

Replacing the maximum by a summation, we find that Σ_w q(w) is upper bounded by

  Σ_{i,s,t} Σ_w [s →_w t] Pr_D̃{w} Λ / Pr_D̃{s → t} + Σ_w Pr_D̃{w} Λ = NΛ + Λ = (N + 1)Λ.

That is, the expected size of supp(D) is at most (N + 1)Λ = E_0, as desired.
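This choice of q(w) can be checked on a toy instance with a single RFA (three unbiased bits, state = number of ones). The sketch below (illustrative names; q is additionally clipped to 1 as a probability, which only decreases the sum) verifies that Σ_w q(w) ≤ (N + 1)Λ:

```python
from itertools import product
from math import comb

# One toy leveled RFA: 3 unbiased bits, final state = number of ones.
support = ["".join(bits) for bits in product("01", repeat=3)]
prob = {w: 1.0 / 8 for w in support}                # Pr_D~{w}, uniform
p_trans = {t: comb(3, t) / 8 for t in range(4)}     # Pr_D~{s -> t}

N = len(p_trans)        # number of (RFA, final-state) pairs
E0 = 5.0
Lam = E0 / (N + 1)      # Lambda = E0 / (N + 1)

def q(w):
    t = w.count("1")    # the final state reached by w
    return min(1.0, max(prob[w] * Lam / p_trans[t], prob[w] * Lam))

expected_support = sum(q(w) for w in support)
# Replacing the max by a sum shows sum_w q(w) <= (N + 1) * Lambda = E0.
assert expected_support <= E0 + 1e-9
print(round(expected_support, 3), "<=", E0)
```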
Computing D from D̃. The latt. app. problem in Eqns. (11–12) is solved, and the support of D is defined to be the support of the lattice vector so obtained. Using the Chernoff-Hoeffding bounds, there exists a lattice vector (whose support is identified with supp(D) in the sequel) such that for each M_i, s = s_{i,l} and t ∈ N_{i,l'} the following holds with nonzero probability:

  |Σ_{w∈supp(D)} [s →_w t] Pr_D̃{w} Λ / (q(w) Pr_D̃{s → t}) − Λ| ≤ Δ(Λ, 1/(N + 1)),

  |Σ_{w∈supp(D)} Pr_D̃{w} Λ / q(w) − Λ| ≤ Δ(Λ, 1/(N + 1)).

After dividing by Λ this becomes, with ρ = Pr_D{s → t}/Pr_D̃{s → t},

  |ργ − 1| ≤ Δ(Λ, 1/(N + 1))/Λ    and    |γ − 1| ≤ Δ(Λ, 1/(N + 1))/Λ.

Thus, using ρ ≤ 2, with nonzero probability

  |Pr_D{s → t}/Pr_D̃{s → t} − 1| = |ρ − 1| ≤ ρ|1 − γ| + |ργ − 1| ≤ 3Δ(Λ, 1/(N + 1))/Λ.

Choice of Λ and E_0. We need 3Δ(Λ, 1/(N + 1))/Λ ≤ δ̂. Solving the latt. app. problem using a k-wise independent distribution, and using Eqn. (2), we obtain a condition for E_0 (and Λ, since E_0 = (N + 1)Λ):

  E_0 ≥ C n²(ηm)^{1+2/k}/δ².    (13)

3.4 Work and Time Bounds
A variation of the algorithm in Lemma 1 is used for reduce. The recurrence for the number of processors used by fool(l, l') is W(h) ≤ 2W(h/2) + C f(E_0)E_0 m, where f(z) is the size of a k-wise independent probability space for z variables. Then the total number of processors is O(f(E_0)E_0 mn). This is minimized when k = 2. In the case of absolute error, one can have a better processor bound than that in Lemma 1, because a uniform 2-wise independent probability space of linear size can be constructed using hash functions as in [28]. Using Eqns. (9) and (13), we finally obtain the following (details omitted).

Theorem 2. A leveled RFA can be fooled with absolute error ε in O(log² n) time using O(n⁷η⁶m⁴/ε⁶) processors, and with relative error δ in O(log² n) time using O(n¹¹η¹⁰m⁶/δ¹⁰) processors.

For the latt. app. problem, it is sufficient to use either absolute error with ε = 1/2m, or relative error with δ = 1. Thus, we obtain the following.

Theorem 3. The latt. app. problem can be solved deterministically in the EREW PRAM model, resulting in discrepancies within a multiplicative factor 1 + o(1) of the probabilistic bound, using O(log² n) time and O(n⁷η⁶m⁶ min(m⁴, n⁴η⁴)) processors.
4 Discrepancy

4.1 Problem
The particular case of the latt. app. problem in which each a_ij is 0 or 1 and each p_j is 1/2 corresponds to the well-known discrepancy problem. It is usually stated as follows. We are given a set system (X, S), where X is a ground set and S is a collection of subsets of X, n = |X| and m = |S|, and we are to compute a subset R of X such that for each S ∈ S the discrepancy ||R ∩ S| − |S \ R|| is small. Let R be a sample from X with each x ∈ X selected into R independently with probability 1/2. Then the Chernoff-Hoeffding bound for full independence, Eqn. (1), guarantees that with nonzero probability, for each S ∈ S:

  ||R ∩ S| − |S \ R|| ≤ Δ(|S|/2, 1/m) = O(√(|S| log m)).
IIRn Sl- IRn sll _<,~(Isl/2,1/m)= e(IV/~ilogm). Generalizations and variations of the discrepancy problem have been extensively studied in combinatorics and combinatorial geometry (where 5 is determined from X C Ra by specific geometric objects) 5,1. Computationally, it has also been object of extensive research 25,27,7. Because of its importance, we consider in detail the work and time requirements for its solution in NC. Also, it is shown in 25 that an algorithm for the discrepancy problem can be used to solve the more general latt. app. problem. As a result, if we are willing to loose a log n factor in the running time, and a constant factor in the value of discrepancy achieved, then this represents a substantial saving in the amount of work performed (though still much higher than the work performed sequentially). 4.2
Algorithm
The algorithm is just the specialization of the latt. app. algorithm of Sect. 2. The RFAs effectively work as counters that, for each S ∈ S, store the number of elements of S that have been selected into R. Thus η = n + 1. The threshold λ_S that determines the rejecting states of M_S is set to Δ(|S|/2, 1/2m) = O(√(|S| log m)), so that even after an absolute error less than 1/2m per RFA, or a relative error less than 1, there is still a good set with nonzero probability. This choice of λ_S results in a discrepancy that is larger than the probabilistic bound (which is achievable sequentially) by only a factor 1 + o(1). Plugging the corresponding parameters into Thm. 3, we obtain the following.

Theorem 4. The discrepancy problem can be solved deterministically in the EREW PRAM model in O(log n log(n + m)) = O(log² n) time using O(n¹³m⁶ min(m⁴, n⁸)) processors.

4.3 Lattice Approximation via Discrepancy
The algorithm for the latt. app. problem in [25] is obtained by a reduction to the discrepancy problem. The resulting latt. app. algorithm achieves discrepancies a constant factor larger, while it has essentially the same work bound as the discrepancy algorithm and a running time larger by a factor log n. The reduction uses as an intermediate step, for the purpose of analysis, the vector balancing problem. This problem is a latt. app. problem in which each p_j = 1/2. Our improvement also translates to this algorithm (analysis omitted). As a result, we obtain the following.

Theorem 5. The latt. app. problem can be solved deterministically, resulting in discrepancies within a multiplicative factor O(1) of the probabilistic bound, for μ_i ≥ log m, in the EREW PRAM model in O(L log² n) = O(log³ n) time using O(n¹³m⁶ min(m⁴, n⁸)) processors.

5 Sampling in Computational Geometry
Randomized algorithms have been very successful in computational geometry [12,26] and, as a result, there has been interest in their derandomization. For this, two concepts capturing the characteristics of a sample have been developed: approximations of range spaces and sampling in configuration spaces. In both cases, our approach improves on previous NC constructions.
5.1 Approximations of Range Spaces
A range space is a set system (X, R) consisting of a ground set X, n = |X|, and a set R of subsets of X called ranges. A subset A ⊆ X is called an ε-approximation for (X, R) if for each R ∈ R, ||A ∩ R|/|A| − |R|/|X|| ≤ ε. For Y ⊆ X, the restriction R|_Y is the set {Y ∩ R : R ∈ R}. (X, R) is said to have bounded VC-exponent if there is a constant d such that for any Y ⊆ X, |R|_Y| = O(|Y|^d). For (X, R) with bounded VC-exponent, a random sample of size O(r² log r), where the multiplicative constant depends on d, is a (1/r)-approximation with nonzero probability [32,1]. Sequentially, the method of conditional probabilities leads to a polynomial time algorithm for constructing these approximations with optimal size (matching the probabilistic bound). With a constant factor loss in the size, they can be constructed in O(nr^C) time, for some constant C that depends on (X, R) [22,10]. Furthermore, for some range spaces that here we just call linearizable, and for r ≤ n^ε, some ε > 0 depending on the range space, the construction can be performed in O(n log r) time [23]. In parallel (NC), however, only size O(r^{2+ε}) has been achieved, using k-wise independent probability spaces [13-15]. There is a close relation to the discrepancy problem. In fact, when the random sample R is of size |X|/2, the low discrepancy and approximation properties are (almost) equivalent. From the definition, it is clear that the same approach used for the discrepancy problem can be used to compute an approximation of optimal size in parallel. Taking advantage of the good behavior of approximations under partitioning and iteration [22], the running times of the algorithms can be improved as follows, with only a constant factor loss in the size (details omitted here). The results for the CRCW PRAM model in [14,15] can be similarly improved.
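For intuition, the ε-approximation property is easy to test on a concrete range space such as intervals on the line. The sketch below (illustrative names; a random subset rather than the derandomized construction) checks the definition directly:

```python
import random

def is_eps_approximation(X, A, ranges, eps):
    """Check | |A∩R|/|A| - |R|/|X| | <= eps for every range R."""
    for R in ranges:
        Rs = set(R)
        in_A = sum(1 for x in A if x in Rs)
        if abs(in_A / len(A) - len(R) / len(X)) > eps:
            return False
    return True

rng = random.Random(5)
X = list(range(1000))
# Ranges: all intervals [l, u) with endpoints on a coarse grid.
grid = range(0, 1001, 50)
ranges = [[x for x in X if l <= x < u] for l in grid for u in grid if l < u]
A = rng.sample(X, 200)   # a random subset, roughly r^2 log r sized
print(is_eps_approximation(X, A, ranges, eps=0.15))
```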
Theorem 6. A (1/r)-approximation of size O(r² log r) of a range space (X, R), |X| = n, can be computed deterministically in the EREW PRAM model in O(log n + log² r) time using O(nr^C) work, for some C > 0. If (X, R) is linearizable, then for r ≤ n^ε, for some 0 < ε < 1, the construction can be performed in O(log n log r) time using O(n log r) work.
5.2 Sampling in Geometric Configuration Spaces
Configuration spaces [12,9,26,24] provide a general framework for geometric sampling. A configuration space is a 4-tuple (X, T, trig, kill) where: X is a finite set of objects, n = |X|; T is a mapping that assigns to each S ⊆ X a set T(S), called the regions determined by S; let R(X) = ∪_{S⊆X} T(S); trig is a mapping R(X) → 2^X indicating for each σ ∈ R(X) the set of objects in X that trigger σ; kill is a mapping R(X) → 2^X indicating for each σ ∈ R(X) the set of objects in X that kill σ. We are interested in configuration spaces that satisfy the following axioms: (i) d = max{|trig(σ)| : σ ∈ R(X)} is a constant, called the dimension of the configuration space; furthermore, for S ⊆ X with |S| ≤ d, the number of regions determined by S is at most a constant number E. (ii) For all S ⊆ X and σ ∈ R(X), σ ∈ T(S) iff trig(σ) ⊆ S and S ∩ kill(σ) = ∅. The following sampling theorem is the basis for many geometric algorithms [12].

Theorem 7. Let (X, T, trig, kill) be a configuration space, with n = |X|, satisfying axioms (i) and (ii), and for an integer 1 ≤ r ≤ n let R be a sample from X with each element of X taken into R independently with probability p = r/n. Then:

  E[ Σ_{σ∈T(R)} exp((p/2)|kill(σ)|) ] ≤ 2^{d+1} f(r/2),

where f(r) is an upper bound for E[|T(R)|]. It follows that with nonzero probability: (1) for all σ ∈ T(R), |kill(σ)| ≤ C (n/r) log r; and (2) for all integers j ≥ 0, Σ_{σ∈T(R)} |kill(σ)|^j ≤ C_j (n/r)^j f(r/2).
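A minimal concrete configuration space is 1-dimensional: X is a set of numbers, the regions of a sample R are the gaps between consecutive sampled values (d = 2, taking the two gap endpoints as triggers), and kill(σ) is the set of points of X strictly inside a gap. Part (1) of the theorem then says every gap of a random sample contains O((n/r) log r) points; a sketch (illustrative names, sentinel gaps included):

```python
import math
import random

def gap_kill_sizes(X, R):
    """Regions of sample R: gaps between consecutive sampled values (with
    sentinels at ±infinity); kill(gap) = points of X strictly inside it."""
    xs = sorted(X)
    bounds = sorted(R) + [float("inf")]
    sizes, lo = [], float("-inf")
    for hi in bounds:
        sizes.append(sum(1 for x in xs if lo < x < hi))
        lo = hi
    return sizes

rng = random.Random(6)
n, r = 5000, 100
X = [rng.random() for _ in range(n)]
R = [x for x in X if rng.random() < r / n]   # each point kept w.p. p = r/n
sizes = gap_kill_sizes(X, R)
worst = max(sizes)
print(worst, "vs (n/r) log r =", round((n / r) * math.log(r), 1))
```

The largest gap should be within a small constant factor of the (n/r) log r scale predicted by part (1).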
Sequentially, a sample as guaranteed by the sampling theorem can be computed in polynomial time (using the method of conditional probabilities). Through the use of a (1/r)-approximation, the time can be reduced to O(nr^C), and for linearizable configuration spaces, for r ≤ n^ε, to O(n log r). In parallel (NC), k-wise independence can only guarantee part (2) of the theorem for j = O(k) (but not part (1)) [3,4]. Modeling the sampling with leveled RFAs, and fooling them with relative error, we can construct in parallel a sample as guaranteed by the sampling theorem, except for a constant multiplicative factor. Relative error is needed because of the exponential weighting, which makes even small probability events relevant. We obtain the following (details omitted here).

Theorem 8. A sample as guaranteed by the sampling theorem can be computed deterministically in the EREW PRAM model in O(log n + log² r) time using O(nr^C) work; and, in the case of a linearizable configuration space and r ≤ n^ε, in O(log n log r) time using O(n log r) work.
6 Other Applications

6.1 Approximation of Integer Linear Programs
An NC algorithm for approximating positive linear programs was proposed in [21]. To solve positive integer linear programs approximately in NC, [2] proposes mimicking the approach of [29]: first solve the program approximately without the integrality constraints, using [21], and then use the NC lattice approximation algorithm of [25] as a rounding black box to obtain an integral solution. However, the second step introduces an additional error, since [25] only guarantees discrepancy O(√ℓᵢ log m) for the relevant sets. [2] corrects, in some cases, the error introduced by using the lattice approximation algorithm of [25]. Our algorithm essentially reduces the error introduced by lattice approximation to the minimum possible.

6.2 Edge Coloring of Graphs
Let G = (V, E) be an undirected graph whose maximum degree is Δ. A legal edge coloring is an assignment of colors to the edges such that two edges incident to the same vertex do not have the same color. Vizing's theorem states that G can be edge colored with Δ + 1 colors, and it implies a polynomial time sequential algorithm to find such a coloring. The best deterministic parallel algorithm is the derandomization in [25] of an algorithm in [20]. It uses a discrepancy algorithm and produces a coloring with Δ + O(√Δ log n) colors. For Δ = Ω(log n), substituting our discrepancy algorithm there produces a coloring with Δ + O(√(Δ log n)) colors.
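For contrast with the Δ + 1 bound above, the naive sequential strategy is easy to state: give each edge the smallest color absent at both of its endpoints. This greedy rule only guarantees 2Δ − 1 colors, not Vizing's Δ + 1, and it is not the algorithm of [20] or [25]; the sketch below (names ours) is just to make the notion of a legal edge coloring concrete:

```python
from collections import defaultdict

def greedy_edge_coloring(edges):
    """Color edges so that edges sharing a vertex get distinct colors.
    Each edge sees at most 2*(Delta - 1) forbidden colors, so the
    greedy choice uses at most 2*Delta - 1 colors overall."""
    used = defaultdict(set)        # vertex -> colors on incident edges
    coloring = {}
    for u, v in edges:
        c = 0
        while c in used[u] or c in used[v]:
            c += 1
        coloring[(u, v)] = c
        used[u].add(c)
        used[v].add(c)
    return coloring

# K4 has Delta = 3, so the greedy bound allows at most 5 colors (0..4)
example = greedy_edge_coloring([(0, 1), (0, 2), (0, 3),
                                (1, 2), (1, 3), (2, 3)])
```

Closing the gap from 2Δ − 1 down to Δ + 1 (Vizing) or Δ + o(Δ) (the parallel algorithms above) is exactly where the discrepancy machinery earns its keep.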
References

1. N. Alon and J. Spencer. The Probabilistic Method. Wiley-Interscience, 1992.
2. N. Alon and A. Srinivasan. Improved parallel approximation of a class of integer programming problems. Algorithmica 17 (1997) 449-462.
3. N. M. Amato, M. T. Goodrich, and E. A. Ramos. Parallel algorithms for higher-dimensional convex hulls. In Proc. 35th Annu. IEEE Sympos. Found. Comput. Sci., 1994, 683-694.
4. N. M. Amato, M. T. Goodrich, and E. A. Ramos. Computing faces in segment and simplex arrangements. In Proc. 27th Annu. ACM Sympos. Theory Comput., 1995, 672-682.
5. J. Beck and W. Chen. Irregularities of Distribution. Cambridge University Press, 1987.
6. M. Bellare and J. Rompel. Randomness-efficient oblivious sampling. In Proc. 35th Annu. IEEE Sympos. Found. Comput. Sci., 1994, 276-287.
7. B. Berger and J. Rompel. Simulating (log^c n)-wise independence in NC. Journal of the ACM 38 (1991) 1026-1046.
8. S. Chari, P. Rohatgi and A. Srinivasan. Improved algorithms via approximations of probability distributions. In Proc. 26th Annu. ACM Sympos. Theory Comput., 1994, 584-592.
9. B. Chazelle and J. Friedman. A deterministic view of random sampling and its use in geometry. Combinatorica 10 (1990) 229-249.
10. B. Chazelle and J. Matoušek. On linear-time deterministic algorithms for optimization problems in fixed dimension. In Proc. 4th ACM-SIAM Sympos. Discrete Algorithms, 1993, 281-290.
11. H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics 23 (1952) 493-509.
12. K. L. Clarkson and P. W. Shor. Applications of random sampling in computational geometry, II. Discrete Comput. Geom. 4 (1989) 387-421.
13. M. T. Goodrich. Geometric partitioning made easier, even in parallel. In Proc. 9th Annu. ACM Sympos. Comput. Geom., 1993, 73-82.
14. M. T. Goodrich. Fixed-dimensional parallel linear programming via relative ε-approximations. In Proc. 7th ACM-SIAM Sympos. Discrete Algorithms (SODA), 1996, 132-141.
15. M. T. Goodrich and E. A. Ramos. Bounded independence derandomization of geometric partitioning with applications to parallel fixed-dimensional linear programming. To appear in Discrete Comput. Geom.
16. W. Hoeffding. Probability inequalities for sums of bounded random variables. American Statist. Assoc. J. 58 (1963) 13-30.
17. A. Joffe. On a set of almost deterministic k-independent random variables. Annals of Probability 2 (1974) 161-162.
18. D. R. Karger and D. Koller. (De)randomized constructions of small sample spaces in NC. In Proc. 35th Annu. IEEE Sympos. Found. Comput. Sci., 1994, 252-263.
19. H. J. Karloff and Y. Mansour. On construction of k-wise independent random variables. In Proc. 26th Annu. ACM Sympos. Theory Comput., 1994, 564-573.
20. H. J. Karloff and D. B. Shmoys. Efficient parallel algorithms for edge coloring problems. J. Algorithms 8 (1987) 39-52.
21. M. Luby and N. Nisan. A parallel approximation algorithm for positive linear programming. In Proc. 25th Annu. ACM Sympos. Theory Comput., 1993, 448-457.
22. J. Matoušek. Approximations and optimal geometric divide-and-conquer. In Proc. 23rd Annu. ACM Sympos. Theory Comput., 1991, 505-511. Also in J. Comput. Syst. Sci. 50 (1995) 203-208.
23. J. Matoušek. Efficient partition trees. Discrete Comput. Geom. 8 (1992) 315-334.
24. J. Matoušek. Derandomization in computational geometry.
Available at http://www.ms.mff.cuni.cz/acad/kam/matousek/. An earlier version appeared in J. Algorithms.
25. R. Motwani, J. Naor and M. Naor. The probabilistic method yields deterministic parallel algorithms. J. Comput. Syst. Sci. 49 (1994) 478-516.
26. K. Mulmuley. Computational Geometry: An Introduction Through Randomized Algorithms. Prentice Hall, Englewood Cliffs, NJ, 1993.
27. J. Naor and M. Naor. Small-bias probability spaces: efficient constructions and applications. SIAM J. Comput. 22 (1993) 838-856.
28. N. Nisan. Pseudorandom generators for space-bounded computation. Combinatorica 12 (1992) 449-461.
29. P. Raghavan. Probabilistic construction of deterministic algorithms: approximating packing integer programs. J. Comput. Syst. Sci. 37 (1988) 130-143.
30. P. Raghavan and C. D. Thompson. Randomized rounding: a technique for provably good algorithms and algorithmic proofs. Combinatorica 7 (1987) 365-374.
31. J. P. Schmidt, A. Siegel and A. Srinivasan. Chernoff-Hoeffding bounds for applications with limited independence. SIAM J. Discrete Math. 8 (1995) 223-250.
32. V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16 (1971) 264-280.
Graph Editing to Bipartite Interval Graphs: Exact and Asymptotic Bounds

K. Cirino* S. Muthukrishnan** N. S. Narayanaswamy*** H. Ramesh†
Abstract. Graph editing problems deal with the complexity of transforming a given input graph G from a class 𝒢 to any graph H in a target class ℋ by adding and deleting edges. Motivated by a physical mapping scenario in Computational Biology, we consider graph editing to the class of bipartite interval graphs (BIGs). We prove asymptotic and exact bounds on the minimum number of editions needed to convert a graph into a BIG.
1 Introduction
Graph editing problems deal with the complexity of transforming a given input graph G from a class 𝒢 to any graph H in a target class ℋ using editing operations, namely adding edges, deleting edges, or a combination of both; we denote this process by 𝒢 → ℋ. Suppose we allow only edge additions. Then graph editing problems are what are known as "graph completion" problems. A classical example is the MINIMUM FILL-IN problem, where ℋ is the class of chordal graphs, 𝒢 is arbitrary, and edges may only be added [22, 24]. Suppose instead we allow only edge deletions. Then graph editing problems become what are known as "largest subgraph" problems. See [10] for references on these problems. In this paper, we consider graph editing problems where both additions and deletions are allowed; we use editions to refer to an addition or a deletion. Graph editing problems have been extensively studied in Graph Theory and its application areas for special classes of graphs: path and edge graphs (see [14], pages 198-199), trees [11], various interval graphs [13],

* Northeastern University. Work supported by DIMACS Special Year on Computational Biology.
** Information Sciences Center, Bell Laboratories Innovations, Lucent Technologies, Murray Hill, NJ 07974.
[email protected]. *** Dept. of Computer Science and Automation, Indian Institute of Science, Bangalore 560012, India.
[email protected]. t Dept. of Computer Science and Automation, Indian Institute of Science, Bangalore 560012, India.
[email protected].
chordal graphs, etc. In the past 4 years, there has been a tremendous resurgence in the study of graph editing problems motivated by Computational Biology [5, 6, 7, 12, 13, 16, 17, 19]. The majority of these draw their motivation from DNA physical mapping. In those applications, ℋ is a class of interval graphs.⁵ The problems we study in this paper are motivated by Computational Biology as well. Specifically, we study the graph editing problem 𝒢 → ℋ for the class ℋ of bipartite interval graphs (BIGs), a specially important subclass of interval graphs of biological interest. We present the biological scenario and the motivation in Section 6 as an Appendix, and focus here on the graph-theoretic and complexity issues of the graph editing problem. While editing using insertions and deletions separately has been well studied for various classes of graphs, graph editing problems appear harder when both addition and deletion operations are allowed, since there are potentially many more paths from the source graph to the target graph, passing through intermediate graphs which have little, or nothing, in common with either. While our results are theoretical and do not have direct practical relevance for the biological scenarios that motivated us, we hope that our insight into the complexity of this problem helps practitioners focus experiments better. We now describe our results in detail.

Our Results and Their Relevance. We prove asymptotic and exact bounds on the complexity of editing graphs to BIGs.

Asymptotic Results. To start, we prove:
- Finding the minimum number of editions needed to convert a graph to a BIG is NP-hard. This holds even if the input graph is bipartite and has bounded degree.

On the other hand, we show that there is an easy factor 3 approximation algorithm for this problem. In addition, if the graph has at least 3n edges then it is trivial to obtain a factor 2 approximation.
Thus sparse graphs (with, say, fewer than 3n edges) and additive approximations seem more meaningful for the problem than arbitrary graphs and standard multiplicative approximations. Next we prove that:
- The minimum number of editions to convert a graph to a BIG cannot be approximated to an additive term of O(n^{1−ε}), for 0 < ε < 1, unless P = NP. In fact, this is true even when the input graph is of bounded degree and is restricted to be sparse, i.e., m ≤ n(1 + c), for some constant c.

⁵ A graph is an interval graph if each vertex can be assigned an interval on the real line such that there is an edge between two vertices if and only if their corresponding intervals overlap.

On the other hand, we consider trees, that is, graphs with m = n − 1, and prove a positive result.
- The minimum number of editions needed to convert a tree to a BIG can be determined in linear time.

We also observe that if the input graph has treewidth w, then the minimum number of editions needed to convert it to a BIG can be determined in O(2^{w²} poly(n)) time by adapting standard methods for processing bounded treewidth graphs
[3].⁶ Bounded degree graphs are motivated by Computational Biology. Making the biologically realistic assumption that the graphs are of bounded degree gives polynomial time algorithms for some problems whose unconditional versions are NP-complete [16, 18]. Our results above show that this is not the case here. In proving the results above, we establish a relationship between the minimum number of editions and what is called the caterpillar cover of a graph (see Section 2 for the definition). We prove complexity results on approximating the caterpillar cover which may be of independent graph-theoretic interest.

Exact Complexity Bounds. We prove exact bounds on the worst case complexity of editing trees to BIGs.
- The minimum number of editions needed to convert a connected tree to a BIG is at most n − 5 changes when n is odd, and at most n − 6 changes otherwise.
- The minimum number of editions needed to convert a connected tree to a (general, not necessarily connected) interval graph is at most ⌊(n − 5)/2⌋.
- The minimum number of editions needed to convert a connected tree to a (general) connected interval graph is at most ⌊(2n − 1)/3⌋.

In each case, we prove the upper bound to be tight by exhibiting graphs that require that many editions. All our proofs here make heavy use of the known characterization of BIGs in terms of its forbidden subgraphs.
⁶ See [4] for the definition of treewidth and a survey of results. A tree has treewidth 1.
Map. Some preliminary results needed are stated in Section 2. Our asymptotic hardness results are in Section 3. Our asymptotic and exact complexity bounds for trees are in Section 4. Some related observations appear in Section 5. The relevance of our results to Computational Biology is described in Section 6.
2 Preliminaries
A caterpillar is a tree in which the deletion of all nodes of degree 1 results in a path. Suppose T is a tree which is not a caterpillar. Then there exists a splitting (T₁, T₂) of T where T₁ is a caterpillar with |T₁| ≥ 5 (see [1]). Here |T₁| is the number of vertices in T₁. A graph is an interval graph if each vertex can be assigned an interval on the real line such that there is an edge between two vertices if and only if their corresponding intervals overlap. Two forbidden classes in the structural characterisation of interval graphs that are relevant to us are cycles of length 4 or more, and asteroidal triples. An asteroidal triple is a graph with the following structure: the path u − v − w − a − b together with the path w − y − z. We can conclude that a tree is an interval graph if and only if it does not contain an asteroidal triple (since none of the other forbidden classes of graphs are trees). It immediately follows that the trees that are interval graphs are precisely the trees in the class of BIGs; we call such a BIG a caterpillar. A caterpillar cover of a graph G is a partition of V(G) such that the induced subgraph of G on each set of the partition has a spanning tree that is a caterpillar. Throughout, we denote the degree of a vertex v by δ(v). We use 𝒯, ℐ, ℬ, 𝒞ℐ and 𝒞 for the classes of trees, interval graphs, bipartite graphs, connected interval graphs and caterpillars respectively.
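The caterpillar definition above is easy to check directly: in a tree, the non-leaf vertices always induce a subtree, so the tree is a caterpillar iff no non-leaf vertex has three or more non-leaf neighbours. A small Python sketch (function names are ours, for illustration only):

```python
def tree_adj(edges):
    """Adjacency lists of a tree given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def is_caterpillar(adj):
    """True iff deleting all degree-1 vertices of the tree leaves a
    (possibly empty) path."""
    spine = {v for v, nbrs in adj.items() if len(nbrs) >= 2}
    # the spine induces a subtree; it is a path iff max degree <= 2
    return all(sum(1 for u in adj[v] if u in spine) <= 2 for v in spine)

# the asteroidal triple u-v-w-a-b plus w-y-z is not a caterpillar
triple = tree_adj([('u', 'v'), ('v', 'w'), ('w', 'a'), ('a', 'b'),
                   ('w', 'y'), ('y', 'z')])
```

On the asteroidal triple, vertex w has three non-leaf neighbours (v, a, y), which is exactly the obstruction the forbidden-subgraph characterization names.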
3 Asymptotic Complexity
First we make a characterization in terms of caterpillar covers.

Lemma 1. Let the minimum number of edge editions on a bipartite graph G with n vertices and m edges needed to make it a caterpillar be p(G). Let κ be the cardinality of the smallest caterpillar cover of G. Then

p(G) = m − n − 1 + 2κ.
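For intuition, the formula can be checked by brute force on a tiny example. The 4-cycle C₄ is bipartite with m = 4, n = 4, and its smallest caterpillar cover has κ = 1 (the whole vertex set has a spanning path), so Lemma 1 predicts p(C₄) = 4 − 4 − 1 + 2 = 1. The helper below (our hypothetical code, not from the paper) searches over all caterpillars on the same vertex set:

```python
from itertools import combinations

def is_caterpillar_tree(vs, edges):
    """True iff (vs, edges) is a spanning tree that is a caterpillar."""
    if len(edges) != len(vs) - 1:
        return False
    adj = {v: set() for v in vs}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = set(), [next(iter(vs))]
    while stack:                      # connectivity check
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(adj[x])
    if seen != set(vs):
        return False
    spine = {v for v in vs if len(adj[v]) >= 2}
    return all(sum(1 for u in adj[v] if u in spine) <= 2 for v in spine)

def min_editions_to_caterpillar(vs, edges):
    """Brute force: min |E(G) xor E(C)| over all caterpillars C on vs."""
    E = {frozenset(e) for e in edges}
    pairs = [frozenset(p) for p in combinations(vs, 2)]
    costs = [len(E ^ set(cand))
             for cand in combinations(pairs, len(vs) - 1)
             if is_caterpillar_tree(vs, [tuple(e) for e in cand])]
    return min(costs)

vs, edges = [0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)]
m, n, kappa = len(edges), len(vs), 1
```

Deleting any one edge of C₄ leaves a spanning path, matching the predicted single edition.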
Lemma 2. Let the minimum number of edge editions required to convert a connected graph G with n vertices and m edges into a caterpillar be p(G). Then there exists a polynomial time algorithm that approximates p(G) within a factor of 3. In addition, for any c > 0, if m ≥ n + n/c, a trivial algorithm approximates p(G) to a factor of 2c + 1.

Proof. Our algorithm simply finds the optimal caterpillar cover C_T of an arbitrary spanning tree T of G using Theorem 7, discards the edges not in this cover, and connects the caterpillars together using the required number of edges. The number of editions performed in this process is at most m − n − 1 + 2κ_T, where κ_T is the number of caterpillars in C_T. The first part of the lemma now follows from Lemma 1 and from the fact that κ_T ≤ m − n + 1 + κ, where κ is the cardinality of the optimal caterpillar cover of G. For the second part of the lemma, just delete all edges and add n − 1 edges to connect the vertices into a single caterpillar. The number of editions made is m + n − 1. If m ≥ n + n/c then m + n − 1 ≤ (2c + 1)(m − n − 1 + 2κ). □

We remark that there exist graphs on which the above algorithm does indeed give an approximation factor of 3. Next, we consider additive approximations. We prove complexity results on finding the minimum caterpillar cover and use them to derive our results for p(G).

Theorem 3. Deciding whether a bounded degree bipartite graph has a caterpillar cover of cardinality k is NP-complete. This is true even when the graph has at most n(1 + c) edges, for some constant c > 0.

Proof. (Sketch) The problem is clearly in NP. The hardness reduction is from the Hamiltonian Path problem for directed graphs with bounded in- and out-degrees (the latter problem is NP-complete; see pages 199-200 in [14]). Given a digraph D with bounded in- and out-degrees, we will obtain a bipartite graph G with bounded degree which has a Hamiltonian path if and only if D has a Hamiltonian path.
First, a graph G′ is obtained as follows. For each vertex v in D, there is a path on four vertices v_i, v_1, v_2, v_o in G′. For each edge (u, v) in D, there is an edge (u_o, v_i) in G′. It is easy to see that G′ has a Hamiltonian path if and only if D has a Hamiltonian path. Further, coloring the v_i's and v_2's with one color and the v_1's and v_o's with another shows that G′ is bipartite. In addition, G′ has bounded degree. Next, G is obtained by
augmenting G′ in two ways: first, by adding a vertex w′ for each vertex w in G′ and adding the edge (w, w′), and second, by adding k − 1 isolated vertices. G is easily seen to be of bounded degree and bipartite. Further, G has a caterpillar cover of cardinality k if and only if G′ has a Hamiltonian path. G can be made sparse by replacing one of the isolated vertices by a chain of sufficient length. □
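The vertex-splitting step of this reduction is mechanical and can be sanity-checked in a few lines. The sketch below (identifiers such as reduction_graph are ours) builds G from the arc list of D and verifies bipartiteness by BFS 2-coloring; each degree grows by at most the one pendant edge, so boundedness is preserved:

```python
from collections import deque

def reduction_graph(arcs, k):
    """Build G of Theorem 3 from a digraph: each vertex v of D becomes
    the path v_i - v_1 - v_2 - v_o, each arc (u, v) becomes the edge
    (u_o, v_i); then a pendant w' is attached to every vertex and
    k - 1 isolated vertices are added."""
    verts = sorted({u for arc in arcs for u in arc})
    base = [(v, t) for v in verts for t in ('i', '1', '2', 'o')]
    edges = []
    for v in verts:
        edges += [((v, 'i'), (v, '1')), ((v, '1'), (v, '2')),
                  ((v, '2'), (v, 'o'))]
    for u, v in arcs:
        edges.append(((u, 'o'), (v, 'i')))
    edges += [(x, (x, 'pendant')) for x in base]
    nodes = base + [(x, 'pendant') for x in base] \
                 + [('isolated', j) for j in range(k - 1)]
    return nodes, edges

def is_bipartite(nodes, edges):
    """Standard BFS 2-coloring."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colour = {}
    for s in nodes:
        if s in colour:
            continue
        colour[s] = 0
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in colour:
                    colour[y] = 1 - colour[x]
                    queue.append(y)
                elif colour[y] == colour[x]:
                    return False
    return True
```

For a directed triangle with k = 2 this yields 25 vertices and 24 edges, bipartite, with maximum degree 3.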
Corollary 4. Computing the minimum number of editions needed to convert a bipartite graph to a caterpillar is NP-hard.
Proof. Follows from Theorem 3 and Lemma 1. □
Theorem 5. There exists no polynomial time algorithm that will find a caterpillar cover of size αn^{1−ε}·k in a bounded degree bipartite graph, unless P = NP. Here, k is the cardinality of the smallest caterpillar cover, ε is any fixed number between 0 and 1, and α > 0. Further, this holds even when the graph is sparse, i.e., it has at most n(1 + c) edges, for some constant c > 0.
Proof. (Sketch) We show that if such an algorithm exists then the Hamiltonian Cycle problem for bounded degree sparse bipartite graphs can be solved in polynomial time. But this problem can be seen to be NP-complete as follows. The Hamiltonian Cycle problem in digraphs with bounded in-degree and out-degree is NP-complete [14]. This holds even when the digraph is sparse, because any digraph can be converted to a sparse one preserving Hamiltonicity by stretching an arbitrary vertex into a chain. This problem can now be reduced to the problem of deciding Hamiltonicity in bounded degree, sparse, bipartite graphs. The reduction used is essentially the same as the one described in Theorem 3. Given a bounded degree sparse bipartite graph H with p vertices, we construct a bounded degree bipartite graph G with n vertices. G has a caterpillar cover of cardinality 1 if H has a Hamiltonian cycle, and no caterpillar cover of cardinality less than βp^t + 1 otherwise. We choose t and β in such a way that βp^t + 1 = αn^{1−ε} + 1. First, we define a graph G′ which consists of the graph H plus two other vertices, called i and o. In G′, i is connected to an arbitrary vertex v
of H, and o is connected to all the neighbors of v in H. Next, a graph G″ is defined as having βp^t copies of G′ chained together, i.e., the o vertex of one copy is connected to the i vertex of the next copy. Next, G is obtained by augmenting G″ by adding one extra vertex w′ for each vertex w in G″ and adding the edges (w′, w). It can be seen that G′, G″ and G are all of bounded degree, bipartite and sparse. In addition, G″ has a Hamiltonian path if H has a Hamiltonian cycle. It follows that G has a caterpillar cover of cardinality 1 if H has a Hamiltonian cycle. Suppose H does not have a Hamiltonian cycle. Then G has no caterpillar cover of cardinality less than βp^t + 1. Since the size n of G is at most 8βp^{t+1}, any caterpillar cover in G has size at least β(n/(8β))^{t/(t+1)} + 1. Setting αn^{1−ε} + 1 to be equal to β(n/(8β))^{t/(t+1)} + 1, we get t/(t+1) = 1 − ε and β = 8^t α^{t+1}. □
Theorem 6. The number of editions required to obtain a caterpillar from a bounded degree bipartite graph G cannot be approximated to an additive term of O(n^{1−ε}) for any fixed ε, 0 < ε < 1, unless P = NP.

Proof. (Sketch) Let κ be the cardinality of the smallest caterpillar cover of G. At least m − n − 1 + 2κ editions are required. Suppose one could obtain a caterpillar using m − n − 1 + 2κ + O(n^{1−ε}) editions. Then one could obtain a caterpillar cover of size κ + O(n^{1−ε}) in polynomial time. But this contradicts Theorem 5. □
4 Exact Complexity

We first present a linear time algorithm for the optimal solution of editing trees to caterpillars.
Theorem 7. An optimal caterpillar cover of a tree T can be found in O(n) time.
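A direct recursive Python sketch of the greedy procedure described in the proof below (quadratic rather than linear time, and with all identifiers ours; the linear-time stack implementation is sketched in the proof):

```python
def tree_adj(edges):
    """Adjacency lists of a tree given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def caterpillar_cover_size(adj):
    """Number of caterpillars in the greedy cover of a tree: repeatedly
    locate a deepest vertex with two or more non-leaf children (three
    or more at the root) and cut off one caterpillar."""
    root = max(adj, key=lambda v: len(adj[v]))   # root at a non-leaf
    removed = set()

    def children(v, parent):
        return [u for u in adj[v] if u != parent and u not in removed]

    def nonleaf_children(v, parent):
        return [u for u in children(v, parent) if children(u, v)]

    def remove_subtree(v, parent):
        removed.add(v)
        for u in children(v, parent):
            remove_subtree(u, v)

    def find_cut(v, parent):
        """Deepest qualifying vertex, returned with its parent."""
        for u in children(v, parent):
            hit = find_cut(u, v)
            if hit is not None:
                return hit
        need = 3 if parent is None else 2
        return (v, parent) if len(nonleaf_children(v, parent)) >= need else None

    count = 0
    while True:
        hit = find_cut(root, None)
        if hit is None:            # the remaining tree is one caterpillar
            return count + 1
        v, par = hit
        nl = nonleaf_children(v, par)
        if par is not None and len(nl) == 2:
            remove_subtree(v, par)     # the subtree at v is a caterpillar
        else:
            remove_subtree(nl[0], v)   # cut one non-leaf child's subtree
        count += 1
```

On a path this returns 1, and on the asteroidal-triple spider (three legs of length two) it returns 2, matching the exchange argument in the proof.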
Proof Sketch. We will identify a particular caterpillar C in T whose removal gives a new tree T′ with the following property: the optimal caterpillar cover of T is exactly one more than that of T′. We can assume that T is rooted at a non-leaf vertex. We also assume that there is always a vertex which either is the root and has at least 3
non-leaf children, or is a non-root node and has at least 2 non-leaf children; if such a vertex does not exist then T is itself a caterpillar. Let v be such a vertex with the added property that in the subtree rooted at v, no other vertex has more than one non-leaf child. Then the subtrees rooted at the children of v are caterpillars. The caterpillar C is determined as follows. If v has exactly two non-leaf children then the subtree rooted at v is itself a caterpillar; we take C to be this caterpillar. Otherwise, if v has more than two non-leaf children, then pick the subtree rooted at any child of v as C. Clearly, the optimum caterpillar cover of T has at most one more caterpillar than the optimum caterpillar cover of T − C = T′. Further, from any optimum cover of T, by at most one simple edge exchange operation, an optimum cover of T containing C can be constructed. This shows that the optimum cover of T′ is at most one less than the optimum cover of T. Therefore, the size of the optimal cover for T is exactly one more than that for T − C = T′.

The following is a linear time implementation of the algorithm. By a depth first traversal of the tree, a stack of vertices which have at least two non-leaf children is constructed. Every vertex which has at least two non-leaf children is pushed onto the stack the first time it is visited. Let v be the vertex at the top of the stack. The following procedure is repeated till the stack becomes empty. If v has more than two non-leaf children in T, then the subtree rooted at a child of v is added to the caterpillar cover and is removed from T. If v has exactly two non-leaf children in T, then the subtree rooted at v is added to the caterpillar cover and is removed from T, and v is popped off the stack. In either case, let the resulting tree be denoted by T′.
If the stack is not empty, then the above procedure is repeated with T := T′. We maintain the invariant that in the subtree rooted at v, no other vertex is present in the stack. Clearly, this is a linear time implementation of the algorithm. □

Combined with Lemma 1, it follows that the minimum number of editions needed to convert a given tree to a caterpillar can be computed in linear time. This result can be extended to graphs of treewidth w in O(2^{w²} poly(n)) time using standard ideas from [3]; we omit the details. Next we prove a series of three exact bounds on the minimum number of editions needed to convert a given tree into various interval graphs.

Theorem 8. The graph editing problem 𝒯 → 𝒞 takes at most n − 5 editions if n is odd, and at most n − 6 editions otherwise; here n is the number of vertices in T ∈ 𝒯.
Proof. The proof is by induction on n. The base case is n = 7. In that case, the only non-interval tree is the asteroidal triple, which can be changed to a caterpillar by removing the edge (u, v) and adding the edge (u, w). When n = 8, the three non-interval trees are formed from the asteroidal triple by attaching an additional vertex to vertex u, to v and to w respectively. Each such tree can be made into a caterpillar by removing the edge (u, v) and adding the edge (u, w). Now assume the induction hypothesis holds for k < n. Consider two cases.

Case 1. Let T be a tree, T not a caterpillar, |T| = n odd. Let v be an end vertex of T, v adjacent to vertex u. Consider the graph T − v; |T − v| = n − 1 is even. Change T − v to the caterpillar (T − v)′ using ≤ (n − 1) − 6 = n − 7 steps. Now if (T − v)′ ∪ {v, (u, v)} is a caterpillar, we are done. Else, remove the edge (u, v) and add the edge (v, s₁), where s₁ is the first vertex in the labeling of the spine of (T − v)′. The total number of editions is ≤ n − 7 + 2 = n − 5.

Case 2. Let T be a tree, T not a caterpillar, |T| = n even. Then by the result of [1], there exists a splitting of T at v into T₁ and T₂ such that |T₁| ≥ 5 and T₁ is a caterpillar. Now we wish to change T₂ into a caterpillar. If |T₁| = 5, |T₂| = n − 4, which is even since n is even. Hence by the induction hypothesis, T₂ can be converted using ≤ (n − 4) − 6 = n − 10 editions. Otherwise, if |T₁| > 5, |T₂| ≤ n − 5. Again by the induction hypothesis, T₂ can be converted using ≤ (n − 5) − 5 = n − 10 editions. In either case, the number of editions to convert T₂ to a caterpillar T₂′ is ≤ n − 10. It remains only to re-associate v in T₁ and T₂′ to form the target graph T′.

Subcase A. The vertex v is a leaf vertex of either T₁ or T₂′; w.l.o.g. say T₁. Let u be the vertex adjacent to v in T₁. Remove the edge (u, v) from T′ to get a disconnected graph T* containing the two components X and Y, both of which are caterpillars.
Now we can add one edge between X and Y so that T* is a connected caterpillar. The total number of editions is ≤ n − 10 + 2 = n − 8.

Subcase B. The vertex v is not a leaf vertex in either T₁ or T₂′. Consider v in T₁. Since T₁ is a caterpillar, v is adjacent to at most two non-leaf vertices, say x and y. Remove the edges (v, x) and (v, y) from the graph T′. The resulting graph T* contains three components, all of which are caterpillars. Now the three components can be connected up by adding at most two edges. The total number of editions is ≤ n − 10 + 4 = n − 6. □

Theorem 9. There exists a tree T ∈ 𝒯 on n nodes, for odd n, that requires at least n − 5 editions (respectively n − 6 when n is even) to be converted into a caterpillar.
Proof. Consider any sequence of editions of minimum length that converts a tree into a caterpillar. Clearly this sequence does not contain both the insertion of an edge e and the deletion of that same edge e (in either order). Note that the operations in such a sequence can be arbitrarily permuted without changing the overall outcome. Therefore, we can assume without loss of generality that any such sequence has all the delete operations preceding all the insert operations. Say n is odd. Consider the tree T in Figure 1, where the center node is denoted w. Each path of length two emanating from w (including w) is called a branch. There are k = (n − 1)/2 branches in T. Note that any three of these branches taken together form a 2-star. In order to convert T into a caterpillar, all such 2-stars must be destroyed. If any edge deletion occurs in a branch, we say that branch is broken, and it is unbroken otherwise.
Fig. 1. The trees for Theorem 9 (left: n odd; right: n even).

Let x be the number of broken branches after all deletions have been performed; clearly the number of deletions is at least x. The number of disconnected components left behind after all the deletions have been performed is at least x + 1, since each edge deletion creates one additional component. Any sequence of additions that composes a connected BIG from these components must perform at least x additions. Thus the total number of editions is at least 2x.

Claim. x ≥ k − 2.

Proof. Suppose otherwise that x < k − 2. In that case, at least three branches are unbroken after all deletions have been performed; this implies the presence of at least one 2-star. Consider the component that contains a 2-star. In order to destroy the induced subgraph forming this 2-star, one or more edges have to be added within this component. But adding any edge within this component introduces cycles, and hence the resulting graph cannot be a caterpillar. That gives the contradiction. □

Thus it follows that the total number of editions is at least 2x ≥ 2k − 4 ≥ n − 5, proving one part of the theorem. Suppose n is even. Consider the tree in Figure 1. The number of branches there is k = (n − 2)/2. The rest of the argument above holds, and the minimum number of editions needed is at least 2x ≥ 2(k − 2) ≥ n − 6, proving the other part. □
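The lower-bound tree for odd n is easy to generate and check mechanically. A hedged sketch (our code, reusing the caterpillar test from Section 2) builds the k-branch tree and confirms the arithmetic 2(k − 2) = n − 5:

```python
def lower_bound_tree(k):
    """The odd-n tree of Fig. 1: a center w with k branches, each a
    path of length two, so n = 2k + 1 vertices."""
    edges = []
    for j in range(k):
        edges += [('w', ('mid', j)), (('mid', j), ('leaf', j))]
    return edges

def is_caterpillar(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    spine = {v for v, ns in adj.items() if len(ns) >= 2}
    return all(sum(1 for u in adj[v] if u in spine) <= 2 for v in spine)

k = 5
edges = lower_bound_tree(k)
n = 2 * k + 1                  # number of vertices, odd
# Theorem 9 forces at least 2(k - 2) = n - 5 editions on this tree
```

For k ≥ 3 the center has k ≥ 3 non-leaf neighbours, so the tree is never a caterpillar, as the 2-star argument requires.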
Theorem 10. The graph editing problem 𝒯 → ℐ takes at most ⌊(n − 5)/2⌋ editions, where ℐ is the class of not necessarily connected interval graphs.
Proof. The proof is by induction on n. The base cases for n = 7 and n = 8 are the same as in Theorem 8. In each case the removal of the edge (v, w) yields an interval graph. We assume the hypothesis is true for k and show it for k + 2. First we observe that any tree T, |T| ≥ 3, contains one of the following two subgraphs: (A) a vertex w connected to u and to v, with the rest of the tree connected to w, and u and v not connected to each other; (B) a vertex w connected to u, which is connected to v, with the rest of the tree connected to w, and w and v not connected to each other. In both cases, u and v are not adjacent to any other vertices. Suppose |T| = k + 2.

Case 1. T contains subgraph A. Consider T with u, v removed. By the induction hypothesis, this can be changed to an interval graph T′ using ≤ ⌊(k − 5)/2⌋ editions. Consider T″ obtained from T′ with the edges (w, u) and (w, v) added. Suppose T″ contains an asteroidal triple. Then both u and v must be part of asteroidal triples, since T′ did not contain any. Let y₁, y₂, u and y₁, y₂, v be asteroidal triples in T″. We claim that the degree of w, denoted δ(w), is 3. Suppose not. Then there exist vertices x and z adjacent to w. But then either y₁, y₂, x or y₁, y₂, z is an asteroidal triple in T′. But T′ is interval, and therefore only one of x and z can be adjacent to w. Hence δ(w) = 3. Let vertex x be adjacent to w. Remove the edge (w, x) in T″. The resulting graph is interval. So the number of editions is ≤ ⌊(k − 5)/2⌋ + 1 = ⌊((k + 2) − 5)/2⌋.
Case 2. T contains subgraph B. Consider T with u, v removed. By the induction hypothesis, this can be changed to an interval graph T′ using ≤ ⌊(k − 5)/2⌋ editions. Consider T″ obtained from T′ with the edges (w, u) and (u, v) added. Suppose either u or v forms an asteroidal triple. Remove the edge (w, u) in T″. The resulting graph is interval. So the number of editions is ≤ ⌊(k − 5)/2⌋ + 1 = ⌊((k + 2) − 5)/2⌋. □

Theorem 11. There exists a tree T ∈ 𝒯 that takes at least ⌊(n − 5)/2⌋ editions to be converted into a not necessarily connected interval graph.
We omit this proof here; use the same example as in Theorem 9. It suffices to stop the argument in Theorem 9 at the case of deletions. □

We note that the construction in Theorem 10 produces disconnected interval graphs in some cases (for example, in Case 2 above). In what follows we study the graph editing problem from trees under editions when the target graph is required to be a connected interval graph.

Theorem 12. The graph editing problem T → CI takes at most ⌊(2n-11)/3⌋ editions, where CI is the class of connected interval graphs.

Proof. The proof is by induction on n. The base case is when n = 7 or n = 8. When n = 7, there is only one non-interval tree, the asteroidal triple. In this case one edition suffices, since adding the edge (w, x) destroys the 2-star and converts the tree into a connected interval graph. That proves the base case for n = 7. When n = 8, there are three non-interval trees (up to isomorphism), as in the proof of Theorem 8; in each case, simply adding the edge (w, x) suffices, and the base case holds. For the induction hypothesis, assume the theorem is true for all k < n. Let T be a tree with |T| = n. Suppose the longest path in T has length p. Consider the set of longest paths in T.

Case 1. There exists at least one longest path v1, v2, ..., vp such that either δ(v2) ≥ 3 or δ(v_{p-1}) ≥ 3. W.l.o.g. say δ(v2) ≥ 3. Now we remove the edge (v2, v3). This results in two components T1 and T2 such that T1 is a star (hence a caterpillar), |T1| = m ≥ 3, and |T2| = n - m. The total number of editions to transform T into a connected interval graph is the cumulative cost of converting T2 to a connected interval graph, that of converting T1, and the additional editing to form and later reconnect the components. The first takes at most ⌊(2(n-m)-11)/3⌋ editions. The second takes none. The third amounts to 2: one for deleting the edge (v2, v3) and one for connecting the two components. Thus, the number of editions needed in all is at most

⌊(2(n-m)-11)/3⌋ + 2 ≤ ⌊(2(n-3)-11)/3⌋ + 2

since m ≥ 3. That reduces to at most ⌊(2n-11)/3⌋, which finishes Case 1.

Before we consider Case 2, we prove a useful lemma.

Lemma 13. Let T be a connected tree with longest path v1, v2, ..., v5 of length 5, |T| = n ≥ 7, and δ(u) ≤ 2 for all u except u = v3. Then it takes at most ⌊(n-5)/2⌋ editions to convert T to a connected interval graph.
Proof. T has the longest path v1, ..., v5 and additional paths of length 1 or 2 starting from v3. Let b be the number of additional paths of length 2 from v3. We can label these paths v3, v_{i,1}, v_{i,2} for 1 ≤ i ≤ b. Now adding the edges (v3, v_{i,2}) for all i gives an interval graph. So we must add exactly b edges. But b ≤ ⌊(n-5)/2⌋. The lemma follows. □

Now we return to Case 2 in the proof of Theorem 12.

Case 2. In every longest path v1, v2, ..., vp, δ(v2) = 2 and δ(v_{p-1}) = 2. (Note that otherwise we have Case 1 above.) If the longest path in the tree is of length 5, then we have precisely the case of Lemma 13. So that takes at most ⌊(n-5)/2⌋ editions, which is at most ⌊(2n-11)/3⌋ for n ≥ 7, as needed. So we need only consider trees with longest path of length at least 6. Consider some longest path v1, ..., vp. The removal of the edge (v3, v4) results in two components T1 (containing at least the vertices v1, v2, v3) and T2 (containing at least the vertices v4, v5, v6). So |T1| = m ≥ 3 and |T2| = n - m ≥ 3. The longest path in T1 must be of length at most 5, since otherwise we could construct a path in T longer than v1, ..., vp. So it takes at most ⌊(m-5)/2⌋ editions to convert T1 to a connected interval graph, by Lemma 13. (If m ≤ 5, the number of editions is 0.)

Subcase A. Suppose |T2| ≥ 7. Then the total number of editions is at most ⌊(m-5)/2⌋ for T1, ⌊(2(n-m)-11)/3⌋ for T2, and 2 to remove the edge (v3, v4) and then connect the resulting components. Hence, when m ≥ 5, the total number of editions is at most

⌊(m-5)/2⌋ + ⌊(2(n-m)-11)/3⌋ + 2 ≤ ⌊(4n-m-25)/6⌋,

which when m ≥ 5 is at most ⌊(4n-30)/6⌋ ≤ ⌊(2n-11)/3⌋, which completes the proof in this subcase for m ≥ 5.

Fig. 2. The trees for Theorem 14.

On the other hand, when m < 5, the total number of editions is at most

0 + ⌊(2(n-m)-11)/3⌋ + 2 = ⌊(2(n-m)-5)/3⌋ ≤ ⌊(2(n-3)-5)/3⌋ = ⌊(2n-11)/3⌋,
which completes the proof of this subcase.

Subcase B. Suppose |T2| < 7. Then the number of editions needed to convert T2 to a connected interval graph is 0. So the total number of editions is at most

⌊(m-5)/2⌋ + 0 + 2 = ⌊(m-1)/2⌋ ≤ ⌊(n-4)/2⌋

since m ≤ n - 3. This is at most ⌊(2n-11)/3⌋ when n ≥ 9. The cases n = 7 and n = 8 are the base cases proved earlier. That completes the proof. □
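The floor-function arithmetic used in the proofs of Theorems 10 and 12 can be verified by brute force. The following check is ours, not part of the paper; it confirms the induction step and both subcase inequalities for small n:

```python
# Brute-force check of the floor arithmetic in Theorems 10 and 12.
for n in range(9, 120):
    bound = (2 * n - 11) // 3                  # Theorem 12 bound
    # Theorem 10 induction step: one extra edition per two added vertices.
    assert (n - 5) // 2 + 1 == (n - 3) // 2
    for m in range(3, n - 2):                  # |T1| = m, |T2| = n - m >= 3
        if n - m >= 7:                         # Subcase A
            if m >= 5:
                assert (m - 5) // 2 + (2 * (n - m) - 11) // 3 + 2 <= bound
            else:
                assert (2 * (n - m) - 5) // 3 <= bound
        else:                                  # Subcase B
            assert (m - 1) // 2 <= bound
print("ok")
```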
Theorem 14. There exists a tree T of n nodes such that it needs at least ⌊(2n-14)/3⌋ editions to convert it into a connected interval graph.
Proof. We can argue as in the proof of Theorem 9 and conclude that in any minimum-length sequence of editions converting T to a connected interval graph, all the delete operations precede all the insert operations, and that such a sequence does not include the operations of inserting an edge e and deleting that same edge e (in either order). First suppose n = 3k + 7 for an integer k = (n-7)/3. The tree T which satisfies the claim here is shown in Figure 2. Each "fork" emanating from w (including w) is called a branch. There are k + 2 branches in T. If any edge deletion occurs in a branch, we say that branch is broken; it is unbroken otherwise.

Case 1. Suppose at least 2 branches are unbroken. Then each of the remaining k branches has either had at least one deletion or none. In the former case, it needs at least one addition to reconnect it. In the latter case, it needs two additions to remove the 2-stars. Hence, in all, we need at least 2k editions.

Case 2. Suppose fewer than 2 branches are unbroken. Then at least k branches have had at least one deletion, and each needs at least one addition to reconnect it. Hence we need at least 2k editions in this case as well.
In either case, the number of editions is at least 2k = 2(n-7)/3 = (2n-14)/3. That gives the lower bound when n - 7 is a multiple of 3, in other words, when k is an integer. When k is not an integer, we construct T' by adding sufficiently many singleton nodes to the branches of T. The same argument as above, coupled with the fact that the number of editions must be an integer, can be used to argue that T' needs at least

⌈2(n-7)/3⌉ = ⌊2(n-7)/3 + 1⌋ = ⌊(2n-11)/3⌋

editions, which proves the theorem. □

5 Concluding Remarks
Using techniques similar to the ones above, it can be shown that the graph editing problem to caterpillars is NP-complete, even with the restriction that the number of additions is restricted to p. Further, approximating the minimum number of additions required to edit a graph to a caterpillar within an O(n^{1-ε}) additive factor is not possible for any ε, 0 < ε < 1, unless P = NP. Both these results follow from the fact that the minimum number of additions required is exactly one less than the size of the smallest caterpillar cover. It can also be shown that approximating the minimum caterpillar cover for chordal graphs to within an O(n^{1-ε}) multiplicative factor for any fixed ε, 0 < ε < 1, is impossible unless P = NP.
Acknowledgements. We thank the referees for numerous comments, especially the suggestions which led to Lemma 2.
References
1. T. Andreae and M. Aigner. The total interval number of a graph. J. Comb. Theory, Series B, 46:7-21, 1989.
2. F. Alizadeh, R. M. Karp, D. K. Weisser, and G. Zweig. Physical mapping of chromosomes using unique probes. J. Comp. Bio., 2(2):153-158, 1995.
3. S. Arnborg and A. Proskurowski. Linear time algorithms for NP-hard problems restricted to partial k-trees. Discrete Applied Mathematics, 23:11-24, 1989.
4. H. Bodlaender. A tourist guide through treewidth. Manuscript, 1995.
5. H. Bodlaender, M. Fellows, M. Hallett, T. Wareham, and T. Warnow. The hardness of problems on thin colored graphs. Manuscript, 1995.
6. H. Bodlaender and B. de Fluiter. Intervalizing k-colored graphs. Proc. ICALP, 1995. See also http://www.cs.ruu.nl/~hansb/mypapers2.html for the journal version.
7. H. Bodlaender, M. Fellows, and M. Hallett. Beyond NP-completeness for problems of bounded width. Proc. STOC, 449-458, 1994.
8. K. Booth and G. Lueker. Testing for the consecutive ones property, interval graphs and graph planarity using PQ-tree algorithms. J. Comput. Syst. Sciences, 13:335-379, 1976.
9. N. G. Cooper (editor). The Human Genome Project: Deciphering the Blueprint of Heredity. University Science Books, Mill Valley, California, 1994.
10. P. Crescenzi and V. Kann. The NP Completeness Compendium; see the section on subgraphs and supergraphs. http://www.nada.kth.se/viggo/problemlist/compendium.
11. M. Farach, S. Kannan, and T. Warnow. A robust model for constructing evolutionary trees. Proc. STOC, 1994.
12. M. Fellows, M. Hallett, and T. Wareham. DNA physical mapping: three ways difficult. Proc. First ESA, 157-168, 1993.
13. P. W. Goldberg, M. C. Golumbic, H. Kaplan, and R. Shamir. Four strikes against physical mapping of DNA. J. Comp. Bio., 2(1):139-152, 1995.
14. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness, pp. 199-200. Freeman, San Francisco, CA, 1979.
15. M. C. Golumbic. Algorithmic Graph Theory and Perfect Graphs. Academic Press, New York, 1980.
16. M. Golumbic, H. Kaplan, and R. Shamir. On the complexity of DNA physical mapping. Adv. Appl. Math., 15:251-261, 1994.
17. H. Kaplan and R. Shamir. Pathwidth, bandwidth and completion problems to proper interval graphs with small cliques. To appear in SIAM J. Computing, 1996.
18. H. Kaplan and R. Shamir. Physical mapping and interval sandwich problems: bounded degrees help. Manuscript, 1996.
19. H. Kaplan, R. Shamir, and R. Tarjan. Tractability of parameterized completion problems on chordal and interval graphs: minimum fill-in and physical mapping. Procs. of FOCS, 1994.
20. R. Karp. Mapping the genome: some combinatorial problems arising in molecular biology. 25th ACM STOC, 1993.
21. P. Pevzner and M. Waterman. Open combinatorial problems in computational molecular biology. Proceedings of the Third Israel Symposium on Theory of Computing and Systems, Jan 4-6, 1995, Tel Aviv, Israel.
22. D. Rose. A graph-theoretic study of the numerical solution of sparse positive definite systems of linear equations. In R. Read (ed.), Graph Theory and Computing, 183-217, Academic Press, NY, 1972.
23. M. Waterman and J. R. Griggs. Interval graphs and maps of DNA. Bull. of Math. Biol., 48:189-195, 1986.
24. M. Yannakakis. Computing the minimum fill-in is NP-complete. SIAM J. Alg. Disc. Methods, 2, 1981.
25. C. Wang. A subgraph problem from restriction maps of DNA chain. Journal of Computational Biology, 1995.
6 Appendix I. Motivating Biological Scenario
In this section, we very briefly review the motivating biological scenario for studying graph editing problems for bipartite interval graphs (BIGs).

Motivating Biological Scenario. Graph editing problems G → H, where H is a class of BIGs, arise in DNA physical mapping in the presence of experimental errors. This scenario is nicely motivated in [23]. Consider mapping a DNA molecule by double digest restriction mapping. The basic operation is that of cutting a DNA strand into disjoint fragments by a restriction enzyme. Two different restriction enzymes A and B are employed separately and then simultaneously, in a total of three experiments. From the overlap information for the pieces thus obtained, we can construct an overlap graph with vertices representing the fragments cut by A on the left and those cut by B on the right; there is an edge between two vertices if their corresponding fragments overlap. Thus the overlap graph is a bipartite interval graph (BIG). Clearly, in the experimental process the order of the fragments is lost, and the goal is to reconstruct that order based on the overlap graph. This is an easy task provided there were no experimental errors and the overlap graph is a BIG [8, 23]. However, biological experiments have significant rates of false positive errors (the presence of an edge between two vertices whose corresponding intervals do not overlap) and false negative errors (the absence of an edge between two vertices whose corresponding intervals do overlap). In that case, the overlap graph is not an interval graph, and biologists are interested in reconstructing the "true" overlap graph (a BIG) from the graph obtained from the error-prone experiments, assuming only a few experimental errors occurred; in other words, with few edge edits. That gives the graph editing problem. It is easy to see that the addition of edges compensates for false negative errors and the deletion of edges for false positive errors.
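As a small illustration (the fragment intervals and function names below are invented, not from the paper), the error-free overlap graph of a double digest can be computed directly from the fragment intervals:

```python
# Sketch: building the bipartite overlap graph of a double digest from
# known fragment intervals. Real inputs come from noisy experiments, in
# which case the resulting graph must be edited back into a BIG.

def overlap(iv1, iv2):
    """True if two half-open intervals [a, b) and [c, d) intersect."""
    (a, b), (c, d) = iv1, iv2
    return a < d and c < b

def overlap_graph(frags_a, frags_b):
    """Edges between enzyme-A fragments and enzyme-B fragments that overlap."""
    return {(i, j)
            for i, fa in enumerate(frags_a)
            for j, fb in enumerate(frags_b)
            if overlap(fa, fb)}

# Enzyme A cuts the strand [0, 10) at position 4; enzyme B cuts it at 2 and 7.
frags_a = [(0, 4), (4, 10)]
frags_b = [(0, 2), (2, 7), (7, 10)]
print(sorted(overlap_graph(frags_a, frags_b)))  # → [(0, 0), (0, 1), (1, 1), (1, 2)]
```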
Model Checking*

Edmund M. Clarke
Department of Computer Science, Carnegie Mellon, Pittsburgh
ABSTRACT: Model checking is an automatic technique for verifying finite-state reactive systems, such as sequential circuit designs and communication protocols. Specifications are expressed in temporal logic, and the reactive system is modeled as a state-transition graph. An efficient search procedure is used to determine whether or not the state-transition graph satisfies the specifications. We describe the basic model checking algorithm and show how it can be used with binary decision diagrams to verify properties of large state-transition graphs. We illustrate the power of model checking to find subtle errors by verifying part of the Contingency Guidance Requirements for the Space Shuttle.

Keywords: automatic verification, temporal logic, model checking, binary decision diagrams

Model checking is an automatic technique for verifying finite-state reactive systems. Specifications are expressed in a propositional temporal logic, and the reactive system is modeled as a state-transition graph. An efficient search procedure is used to determine automatically if the specifications are satisfied by the state-transition graph. The technique was originally developed in 1981 by Clarke and Emerson [10, 11]. Quielle and Sifakis [18] independently discovered a similar verification technique shortly thereafter. An alternative approach, based on showing inclusion between ω-automata, was later devised by Robert Kurshan at AT&T Bell Laboratories [14, 15]. Model checking has a number of advantages over verification techniques based on automated theorem proving. The most important is that the procedure is highly automatic. Typically, the user provides a high level representation of the model and the specification to be checked. The model checker will either terminate with the answer true, indicating that the model satisfies the specification, or give a counterexample execution that shows why the formula is not satisfied.
The counterexamples are particularly important in finding subtle errors in complex reactive systems. The first model checkers were able to verify small examples [1, 2, 3, 4, 11, 13, 16]. However, they were unable to handle very large examples due to the state explosion problem. Because of this limitation, many researchers in formal verification predicted that model checking would never be useful in practice. The possibility of verifying systems with realistic complexity changed dramatically in the late 1980's with the discovery of how to represent transition relations using ordered binary decision diagrams (OBDDs) [5]. This discovery was made independently by three

* This research is sponsored in part by the Wright Laboratory, Aeronautical Systems Center, Air Force Materiel Command, USAF, and the Advanced Research Projects Agency (ARPA) under grant F33615-93-1-1330, in part by the National Science Foundation under Grant No. CCR-9217549, and in part by the Semiconductor Research Corporation under Contract 92-DJ-294. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. government.
research teams [8, 12, 17] and is basically quite simple. Assume that the behavior of a reactive system is determined by n boolean state variables v1, v2, ..., vn. Then the transition relation of the system can be expressed as a boolean formula

N(v1, ..., vn, v'1, ..., v'n),

where v1, v2, ..., vn represents the current state and v'1, v'2, ..., v'n represents the next state. By converting this formula to a BDD, a very concise representation of the transition relation may be obtained. The original model checking algorithm, together with the new representation for transition relations, is called symbolic model checking [7, 8, 9]. By using this combination, it is possible to verify extremely large reactive systems. In fact, some examples with more than 10^120 states have been verified [6, 9]. This is possible because the number of nodes in the OBDDs that must be constructed no longer depends on the actual number of states or the size of the transition relation. Because of this breakthrough, it is now possible to verify reactive systems with realistic complexity, and a number of major companies, including Intel, Motorola, Fujitsu, and AT&T, have started using symbolic model checkers to verify actual circuits and protocols. In several cases, errors have been found that were missed by extensive simulation. We illustrate the power of model checking to find subtle errors by considering a protocol used by the Space Shuttle. We discuss the verification of the Three-Engines-Out Contingency Guidance Requirements using the SMV model checker. The example describes what should be done in a situation where all three main engines of the Space Shuttle fail during the ascent. The main task of the Space Shuttle Digital Autopilot is to separate the shuttle from the external tank and dump extra fuel if necessary. The task involves a large number of cases and has many different input parameters. Thus, it is important to make sure that all possible cases and input values are taken into account and that the tank will eventually separate. The Digital Autopilot chooses one of six contingency regions depending on the current flight conditions. Each region uses different maneuvers for separating from the external tank.
This involves computing a guidance quaternion. Usually, the region is chosen once at the beginning of the contingency and is maintained until separation occurs. However, under certain conditions a change of region is allowed. In this case, it is necessary to recompute the quaternion and certain other output values. Using SMV, we were able to find a counterexample in the program for this task. We discovered that when a transition between regions occurs, the autopilot system may fail to recompute the quaternion and cause the wrong maneuver to be made. The guidance program consists of about 1200 lines of SMV code. The number of reachable states is 2 · 10^14, and it takes 60 seconds to verify 40 CTL formulas.
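The fixpoint computation at the heart of such reachability analyses can be sketched as follows. This is our own toy, not the paper's algorithm: it uses explicit Python sets where a symbolic checker such as SMV would represent the state sets as OBDDs, and the mod-8 counter transition relation is invented for illustration:

```python
# Sketch of the fixpoint computation behind (symbolic) model checking.
# States are integers standing for bit-vectors; N(v, w) is the transition
# relation. A real symbolic checker stores the sets below as OBDDs.

N = lambda v, w: w == (v + 1) % 8      # transition relation of a mod-8 counter

def reachable(init):
    """Least fixpoint: R := init; R := R ∪ image(R) until stable."""
    R = set(init)
    while True:
        image = {w for v in R for w in range(8) if N(v, w)}
        if image <= R:
            return R
        R |= image

print(sorted(reachable({0})))   # → [0, 1, 2, 3, 4, 5, 6, 7]
```

Checking a safety property then amounts to asking whether the reachable set intersects the set of bad states.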
References
1. M. C. Browne and E. M. Clarke. SML: A high level language for the design and verification of finite state machines. In IFIP WG 10.2 International Working Conference from HDL Descriptions to Guaranteed Correct Circuit Designs, Grenoble, France. IFIP, September 1986.
2. M. C. Browne, E. M. Clarke, and D. Dill. Checking the correctness of sequential circuits. In Proceedings of the 1985 International Conference on Computer Design, Port Chester, New York, October 1985. IEEE.
3. M. C. Browne, E. M. Clarke, and D. Dill. Automatic circuit verification using temporal logic: Two new examples. In Formal Aspects of VLSI Design. Elsevier Science Publishers (North Holland), 1986.
4. M. C. Browne, E. M. Clarke, D. L. Dill, and B. Mishra. Automatic verification of sequential circuits using temporal logic. IEEE Transactions on Computers, C-35(12):1035-1044, 1986.
5. R. E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35(8), 1986.
6. J. R. Burch, E. M. Clarke, and D. E. Long. Symbolic model checking with partitioned transition relations. In A. Halaas and P. B. Denyer, editors, Proceedings of the 1991 International Conference on Very Large Scale Integration, August 1991. Winner of the Sidney Michaelson Best Paper Award.
7. J. R. Burch, E. M. Clarke, K. L. McMillan, and D. L. Dill. Sequential circuit verification using symbolic model checking. In Proceedings of the 27th ACM/IEEE Design Automation Conference. IEEE Computer Society Press, June 1990.
8. J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and J. Hwang. Symbolic model checking: 10^20 states and beyond. In Proceedings of the Fifth Annual Symposium on Logic in Computer Science. IEEE Computer Society Press, June 1990.
9. Jerry R. Burch, Edmund M. Clarke, David E. Long, Kenneth L. McMillan, and David L. Dill. Symbolic model checking for sequential circuit verification. IEEE Transactions on Computer-Aided Design of Integrated Circuits, 13(4):401-424, April 1994.
10. E. M. Clarke and E. A. Emerson. Synthesis of synchronization skeletons for branching time temporal logic. In Logic of Programs: Workshop, Yorktown Heights, NY, May 1981, volume 131 of Lecture Notes in Computer Science. Springer-Verlag, 1981.
11. E. M. Clarke, E. A. Emerson, and A. P. Sistla. Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Transactions on Programming Languages and Systems, 8(2):244-263, 1986.
12. O. Coudert, C.
Berthet, and J. C. Madre. Verification of synchronous sequential machines based on symbolic execution. In J. Sifakis, editor, Proceedings of the 1989 International Workshop on Automatic Verification Methods for Finite State Systems, Grenoble, France, volume 407 of Lecture Notes in Computer Science. Springer-Verlag, June 1989.
13. D. L. Dill and E. M. Clarke. Automatic verification of asynchronous circuits using temporal logic. IEE Proceedings, Part E 133(5), 1986.
14. Z. Har'El and R. P. Kurshan. Software for analytical development of communications protocols. AT&T Technical Journal, 69(1):45-59, Jan.-Feb. 1990.
15. R. P. Kurshan. Analysis of discrete event coordination. In J. W. de Bakker, W.-P. de Roever, and G. Rozenberg, editors, Proceedings of the REX Workshop on Stepwise Refinement of Distributed Systems, Models, Formalisms, Correctness, volume 430 of Lecture Notes in Computer Science. Springer-Verlag, May 1989.
16. B. Mishra and E. M. Clarke. Hierarchical verification of asynchronous circuits using temporal logic. Theoretical Computer Science, 38:269-291, 1985.
17. C. Pixley. A computational theory and implementation of sequential hardware equivalence. In R. Kurshan and E. Clarke, editors, Proc. CAV Workshop (also DIMACS Tech. Report 90-31), Rutgers University, NJ, June 1990.
18. J. P. Quielle and J. Sifakis. Specification and verification of concurrent systems in CESAR. In Proceedings of the Fifth International Symposium in Programming, 1981.
Recursion Versus Iteration at Higher-Orders

A. J. Kfoury
Boston University, Boston, MA 02215, USA
e-mail: kfoury@cs.bu.edu
Abstract. We extend the well-known analysis of recursion-removal in first-order program schemes to a higher-order language of finitely typed and polymorphically typed functional programs, the semantics of which is based on call-by-name parameter-passing. We introduce methods for recursion-removal, i.e. for translating higher-order recursive programs into higher-order iterative programs, and determine conditions under which this translation is possible. Just as finitely typed recursive programs are naturally classified by their orders, so are finitely typed iterative programs. This syntactic classification of recursive and iterative programs corresponds to a semantic (or computational) classification: the higher the order of programs, the more functions they can compute.
1 Background and Motivation
Although our analysis is entirely theoretical, as it combines methods from typed λ-calculi, from abstract recursion theory, and from denotational semantics, the problems we consider have a strong practical motivation. The translation of recursive procedures into iterative procedures, sometimes called recursion-removal, has been a standard feature of programming practice for a long time. A frequent and important use of recursion-removal is, for example, in compiler design; modern treatments of this topic are in [6, Chapter 8] and [18, Chapter 21]. Discussions of other benefits of recursion-removal can be found elsewhere in the literature, e.g. in the books [1, 10, 14], among several others. The usual implementation of a recursive procedure requires a stack, in addition to basic control mechanisms to change the order of execution in a computation, such as goto instructions. The stack in question is used to save information, the so-called activation records [2, Chapter 10], from successive recursive calls of the procedure. By contrast, the implementation of an iterative procedure does not require a stack, resulting in more efficient use of storage space. By appropriate coding and decoding of arbitrarily long sequences of activation records, it is in principle always possible to avoid the use of a stack and directly implement recursive procedures as iterative ones. The problem is that such a coding and decoding mechanism is not always available, and even if it is, it often leads to just as expensive implementations by trading storage space for

* Partly supported by NSF grant CCR-9417382.
computational complexity. In the absence of such a coding and decoding mechanism, there are recursive procedures that can be translated into iterative ones, while there are others that cannot. A procedure in the latter case is inherently recursive, in the sense that any other procedure defining the same function cannot be iterative and requires a stack for its implementation, if we also preclude the use of a coding and decoding mechanism to simulate a stack.
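The role of the stack of activation records can be made concrete with a small example of our own: a recursive procedure and the standard stack-based implementation of the same function, in which the implicit call stack is made explicit:

```python
# Illustration (ours, not from the paper): implementing a recursive
# procedure with an explicit stack of activation records.

def depth_rec(t):
    """Recursive: height of a tree given as nested tuples."""
    return 1 + max((depth_rec(c) for c in t), default=0)

def depth_iter(t):
    """Same function; each stack frame records (subtree, depth so far)."""
    best, stack = 0, [(t, 1)]
    while stack:
        node, d = stack.pop()
        best = max(best, d)
        stack.extend((c, d + 1) for c in node)
    return best

tree = ((), ((),))
assert depth_rec(tree) == depth_iter(tree) == 3
```

Note that the second version still uses a stack, i.e. it is exactly the implementation the text describes, not a stack-free iterative program; the point of the schematology results is that for some functions the stack (or a coding of it) cannot be avoided.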
Comparative Schematology. The preceding facts are well-known among computer scientists. In the early 1970's, people working in comparative schematology aimed at rigorously establishing these and other similar results. Classes of program schemes were defined, modelling different control mechanisms in conventional programming languages, such as recursion and iteration. An elegant and sometimes highly technical theory was developed to examine questions of inter-translatability between these classes. The first paper in this area, sparking intensive research activity for many years, was by Paterson and Hewitt [17], who spelled out precise conditions under which recursion-removal is not possible. We summarize the Paterson-Hewitt result by writing ITER < REC, meaning that iteration is a strictly weaker control structure than recursion.

Pebbling vs. Counting. Paterson and Hewitt gave two very different proofs for ITER < REC. Their first proof was based on the pebble game. As it turned out in later years, the pebble game and many of its variants became a basic technique in other areas of theoretical computer science; in particular, the most numerous applications have been in complexity theory, e.g. in time-space tradeoffs [19, 20], rather than in comparative schematology or in logics of programs. The second proof by Paterson and Hewitt was a counting argument. Freely speaking, counting (rather than pebbling) is used to show that a simulating iterative procedure runs out of time (rather than space). This dichotomy between pebbling and counting pervades nearly all studies of the original question of recursion vs. iteration and its later variants. For the early results, see for example the survey in [8, Chapters VII and VIII].
Other Theoretical Studies. Recursion pervades the whole field of computer science. Ways of removing or simplifying recursion abound. The background reviewed above by no means exhausts the topic and only covers that part of the field directly motivating our work. There is, for example, an extensive literature on boundedness of database logic programs; see [9] and the references therein. A database program is bounded if the depth of recursion is a constant independent of the input, in which case the program is equivalent to a recursion-free first-order formula. Another motivation for the study of recursion vs. iteration comes from inductive definability in abstract recursion theory; see for example [15] and [16], where a formal logic framework is defined to study the question. Another work still is [5], where the question is considered in relation to program synthesis. However interesting these and other studies are, none is directly related to this
paper. Closer to our concerns is the part of the literature on continuations that deals with issues of recursion-removal, specifically, the use of CPS (continuation-passing style) to transform a recursive functional program into tail-recursive form. Although "iteration" and "tail-recursion" coincide in the case of first-order programs, the two notions are not exactly the same at higher orders. In the full version [12] of this report, we adduce several reasons for this distinction; in any case, an attractive theory and many non-trivial results emerge out of the distinction, as shown in [12].

Some Past Results and Some Questions. This history of results, starting with the Paterson-Hewitt result in 1970, is entirely concerned with the first-order case of recursion vs. iteration. That is, recursive and iterative procedures are always activated by receiving input values of ground type, and they always return an output value, if any, of ground type. Many modern programming languages allow their programs to use higher-order objects in their computations. The role of such objects is purely auxiliary in that, semantically, higher-order values are not mentioned in the input-output relation of the program. In our setting, only the main procedure in a program is first-order, but it can call intermediate procedures of arbitrary finite orders. It is now important to distinguish between a single procedure and a finite collection of procedures calling each other, of which only one is the main procedure. In this paper, we call such a collection a functional program. The order of a functional program is the highest order of any of its procedures. In general, a functional program is recursive, as no restriction is imposed on its syntax; under the restriction that all of its procedures are iterative, we say that the functional program is iterative.
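For first-order programs, the CPS transformation mentioned above can be illustrated with a toy example of our own (it does not capture the higher-order subtleties the paper is after, where "tail-recursive" and "iterative" come apart):

```python
# Illustration (ours): CPS turns a recursive program into a
# tail-recursive one, at the price of a higher-order argument.

def fact(n):
    return 1 if n == 0 else n * fact(n - 1)   # not tail-recursive

def fact_cps(n, k=lambda r: r):
    # The only recursive call is in tail position; all pending work
    # has moved into the continuation k, a functional (order-1) argument.
    return k(1) if n == 0 else fact_cps(n - 1, lambda r: k(n * r))

assert fact(10) == fact_cps(10) == 3628800
```

Note how the transformation raises the order of the program: the continuation parameter is itself a function.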
If ITERn and RECn are, respectively, the set of all iterative functional programs and the set of all recursive functional programs of order ≤ n, where n is a positive integer, then:

(1) ITERn ⊂ RECn,

i.e. every program in ITERn, as a syntactic object, is a program in RECn, but not vice-versa. We also have the following strict syntactic hierarchies:

(2) ITER1 ⊂ ITER2 ⊂ ITER3 ⊂ ⋯
(3) REC1 ⊂ REC2 ⊂ REC3 ⊂ ⋯

(ITER1 and REC1 here are the ITER and REC mentioned earlier.) It is well-known that there is a strict computational hierarchy (see, for example, [7] or [11] or references cited therein):

(4) REC1 < REC2 < REC3 < ⋯

In words, increasing the order of functional programs results in a gain of computational power. On the other hand, so far we can only assert, by (2):

(5) ITER1 ≤ ITER2 ≤ ITER3 ≤ ⋯
and the obvious question is whether this computational hierarchy is strict. In the presence of higher-order procedures, the question of recursion vs. iteration splits into two parts. We first ask, for n ≥ 2:

(6) Is ITERn < RECn?

If the answer to question (6) is yes, it will generalize to higher orders the Paterson-Hewitt result in [17], namely: at every order, there are functional programs that are inherently recursive. We further ask, for n ≥ 1:

(7) Is RECn ≤ ITERn′ for some n′ ≥ n? If yes, what is the least such n′?

In words, question (7) asks whether recursion-removal is always possible after all, but now at the price of increasing the order of the program. These and other related questions, to be formulated in due course, are the immediate motivation for the present work.

Organization of the Paper. The first task is to define precisely the framework of the investigation. Our choices are not the only possible ones: how we define higher-order procedures, and how we interpret and execute them, depend on choices inspired by a particular programming paradigm. This is the paradigm of strongly-typed pure functional programs, where there are no side-effects and where the only execution mechanism is call-by-name parameter-passing. This is all done in Section 2. Hence, equivalence between programs here means equivalence under an operational semantics based on call-by-name execution. In addition to the hierarchies {RECn} and {ITERn} mentioned earlier, we define the class p-REC of polymorphically typed recursive programs and the class p-ITER of polymorphically typed iterative programs. Our key technical lemmas are presented in Section 3 and Section 4. From these results, we draw several consequences about the hierarchies {RECn} and {ITERn} in Section 5. Proofs are only sketched in this conference report. Details, as well as further material and related results, are in [12].
2   Basic Definitions: Syntax and Semantics
What we set up is the syntax of a typed λ-calculus + conditionals + recursion. We introduce recursion by means of mutually recursive functional equations, and not by application of fixpoint operators.

Definition 1 (Types). Let TVar be an infinite set of type variables. Let * and bool be two type constants, which we call ground types. The atomic types are {*, bool} ∪ TVar. The set T of types is the least set such that

T ⊇ {*, bool} ∪ TVar ∪ { (σ → τ) | σ, τ ∈ T }

For σ ∈ T, let TVar(σ) denote the finite set of type variables occurring in σ. We partition T into the set Tfin of finite types and the set Tgen of generic types:

Tfin = { σ ∈ T | TVar(σ) = ∅ }   and   Tgen = T − Tfin.
The order of a finite type σ ∈ Tfin is:

order(σ) = 0, if σ = * or bool;
order(σ) = max{ order(τ1) + 1, order(τ2) }, if σ = (τ1 → τ2).

We do not define order(σ) if σ is generic. In what follows, every term M has a type τ, which we indicate by writing M : τ. If τ is a finite type and M : τ, we say that M is finitely typed. If τ is a generic type and M : τ, we say that M is polymorphically typed.

A type substitution is a map S : TVar → T such that {α ∈ TVar | S(α) ≠ α} is finite. Every type substitution extends in a natural way to a {*, bool, →}-homomorphism S : T → T. For σ, τ ∈ T, we say that τ is an instance of σ, in symbols σ ⪯ τ, if there is a type substitution S such that τ = S(σ).

Definition 2 (Terms). A class of functional programs is defined relative to a fixed first-order signature Σ_A = Σ ∪ A, where Σ is a finite set of relation and function symbols and A is a countable set of individual symbols. To avoid trivial and uninteresting situations, we require that both Σ ≠ ∅ and A ≠ ∅. Every relation (or function) symbol f ∈ Σ has a fixed arity k ≥ 1, in which case its type is

* → ··· → * → ρ,   abbreviated as   *^k → ρ,

where ρ = bool (or *, resp.), i.e. a finite type of order 1. Every constant in A is of type *. It is convenient to use two disjoint sets of variables: object variables and function names. For every type τ there is a countably infinite set of object variables of type τ, and for every non-atomic type τ a countably infinite set of function names of type τ. The set of terms is the smallest set containing
{tt, ff} ∪ A   (ground constants)
∪ { if_τ | τ ∈ T }   (other constants)
∪ { object variables } ∪ { instantiated function names }   (variables)
and closed under application and λ-abstraction. The details of the definition of well-typed terms are given in Figure 1. We omit the type subscript τ in if_τ and in an instantiated function name F_τ whenever possible, if no ambiguity is introduced. (if M then N else P) is a sugared version of (if M N P). For simplicity, we often omit the type of a λ-binding, and therefore write (λv M) instead of (λv : σ. M). If a closed term M and all its subterms are finitely typed, the order of M is:

order(M) = max { order(τ) | N ⊆ M and N : τ }

If M is not closed and N ≡ (λv1 ⋯ λvm M) is the closure of M, then order(M) = order(N). We do not define the order of a polymorphically typed term. By a slight abuse of terminology, we often say "M is an n-th order term" to mean that order(M) ≤ n rather than order(M) = n.
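The order function on finite types is easy to compute mechanically. The following Python sketch uses an encoding of our own (not from the paper): a finite type is either the string "*" or "bool", or a pair (sigma, tau) standing for σ → τ.

```python
def order(t):
    # order(*) = order(bool) = 0
    # order(sigma -> tau) = max(order(sigma) + 1, order(tau))
    if t in ("*", "bool"):
        return 0
    sigma, tau = t
    return max(order(sigma) + 1, order(tau))

# A binary first-order symbol * -> * -> * has order 1;
# a functional (* -> *) -> * that consumes a function has order 2.
first_order = ("*", ("*", "*"))
second_order = (("*", "*"), "*")
```

Note that order counts nesting on the left of arrows only, which is why *^k → ρ is order 1 for every arity k.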
Ground constants:  c : ρ, where type(c) = ρ.

δ-terms:  from N1 : *, …, Nk : * and type(f) = *^k → ρ, conclude (f N1 ⋯ Nk) : ρ.

if-terms:  from N1 : bool, N2 : τ, N3 : τ, conclude (if N1 N2 N3) : τ.

Variables:  v : σ, where type(v) = σ.

Function names:  F_τ : τ, where type(F) = σ and σ ⪯ τ.

Applications:  from M : σ → τ and N : σ, conclude (M N) : τ.

Abstractions:  from M : τ and type(v) = σ, conclude (λv : σ. M) : (σ → τ).

Programs:  from M1 : σ1, M2 : σ2, …, Mℓ : σℓ, conclude {F1 = M1, F2 = M2, …, Fℓ = Mℓ}, provided Mi is a closed term and type(Fi) = σi for i ∈ {1, …, ℓ}.

Fig. 1. Rules for well-typed terms and programs. Throughout, ρ ∈ {*, bool}.
Definition 3 (Functional Programs). Let F be a function name of type σ ∈ T. A function definition for F is an equation of the form F = M where M is an arbitrary closed term of type σ. F on the left of "=" is not instantiated, while every occurrence of F on the right of "=" is instantiated as F_τ, for some τ such that type(F) ⪯ τ, and corresponds to a recursive call to F at type τ. A functional program P is a finite non-empty set of function definitions, together with a distinguished function name F such that:
- The type of F is *^k → ρ for some k ≥ 1, where ρ = * or bool.
- For every function name G, P has at most one function definition for G.
- For every function name G, P has a definition for G if there is an instance G_τ of G appearing in P (i.e. on the right of "=", a call to G at some type τ).
The restrictions we impose on the type of the distinguished F have to do with the fact that the inputs and output, if any, of a program are always ground values.
Let P be the functional program {Fi = Mi}_{1≤i≤ℓ}, presented as

(F1, …, Fℓ) = (M1, …, Mℓ)  in  F1 v1 ⋯ vk

where F1 is the distinguished function of arity k ≥ 1. If the types of all the terms M1, …, Mℓ and their subterms are finite, then P is finitely typed; otherwise, P is polymorphically typed. If P is finitely typed, then its order is

order(P) = max{ order(M1), …, order(Mℓ) }
If this order is n ≥ 1, we also say this is an order-n functional program. We do not define the order of a polymorphically typed program. Specific examples of finitely typed and polymorphically typed programs are given in Sections 3 and 4, respectively. Although it does not restrict the later analysis in any way, let us assume that the right-hand side of every function definition Fi = Mi in P is in β-normal form. Under this assumption, it is easy to check that if 1 ≤ order(Fi) < order(Mi) then there is an instantiated function name Fj occurring in Mi such that order(Fj) = order(Mi). Hence, under this assumption, we can equivalently and more simply define order(P) by: order(P) = max{ order(F1), …, order(Fℓ) }. The syntactic hierarchy of functional programs is given by:
- RECn = { finitely typed functional programs of order ≤ n }
- REC = ∪_{n≥1} RECn
- p-REC = { polymorphically typed functional programs }
Definition 4 (Call-by-Name Semantics). A functional program computes relative to a Σ_A-structure A which assigns a meaning to every symbol in the signature Σ. We take the universe of A to be precisely the set A of all individual constants of type *. The meaning in A of the symbol f ∈ Σ of type *^k → ρ, where k ≥ 1 and ρ = * (or bool), is a total function from A^k to A (or to {tt, ff}). For the functional program P = {Fi = Mi}_{1≤i≤ℓ} we define the reduction relation →_{A,P} relative to A by the rewrite rules of Figure 2. In the δ-reduction rule, a1, …, ak are arbitrary elements of A, f ∈ Σ has arity k ≥ 1, and f interpreted in A is a function that maps a1 ⋯ ak to a_{k+1} ∈ A ∪ {tt, ff}. We often leave the structure A and the program P implicit, and write → instead of →_{A,P}. We write →→ for the reflexive transitive closure of →.

The first term in a computation of program P is always (F a1 ⋯ ak), where F is the distinguished function name of P with arity k ≥ 1, for some a1, …, ak ∈ A. We call ā = a1 ⋯ ak an input vector for P. We call (A, ā) an interpretation for P. Note that (F a1 ⋯ ak) is a closed term of a ground type, * or bool. The next proposition is proved in [11], Sections 2 and 3. A term is in normal form if it does not contain any redexes, i.e. it cannot be reduced. To reduce a term in normal order means to reduce its leftmost redex at every step.
((λv N) P) → N[v := P]   (β-reduction)

(if b then N else P) → N, if b = tt;   (if b then N else P) → P, if b = ff   (if-reduction)

(f a1 ⋯ ak) → a_{k+1}   (δ-reduction)

(Fi)_τ → Mi   (F-reduction: unfolding the definition of Fi)

and the congruence rules: if N → P, then (N Q) → (P Q), (Q N) → (Q P), and (λv N) → (λv P).

Fig. 2. Reduction rules for program P = {Fi = Mi}_{1≤i≤ℓ}. All the arrows above denote →_{A,P}.
Proposition 5. Let P be a program over the signature Σ_A, whose distinguished function name is F. Let (A, ā) be an interpretation for P. If applying the reduction rules in some arbitrary order causes (F ā) to terminate at ground value b, then applying the reduction rules in normal order also causes (F ā) to terminate at the same ground value b.

The call-by-name semantics of functional programs corresponds to carrying out their computations, viewed as reduction sequences, in normal order. Let P be a functional program over the signature Σ_A, whose distinguished function name is F : *^k → ρ where k ≥ 1 and ρ = * or bool. Over a Σ_A-structure A, the program P defines a partial function P^A : A^k → A or P^A : A^k → {tt, ff}, given by:

P^A = { (ā, b) | (F ā) →→ b }

where b ∈ A or b ∈ {tt, ff}, respectively. Implicit in this definition, by the preceding proposition, is that P^A is the call-by-name semantics of P over the structure A.
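The operational content of call-by-name is easy to mimic in an eager language by passing arguments as thunks. In this Python sketch (an illustration of the mechanism only, not the paper's formalism), the unused branch of a conditional is never evaluated, which is the intuition behind why normal-order reduction finds a ground value whenever any reduction order does:

```python
def cbn_if(b, n, p):
    # n and p are thunks (zero-argument functions); under call-by-name
    # only the branch selected by b is ever forced.
    return n() if b else p()

def diverge():
    # stands in for a computation with no normal form
    raise RuntimeError("would not terminate")

# The diverging argument is passed, but its thunk is never demanded:
value = cbn_if(True, lambda: 42, lambda: diverge())
```

Under call-by-value the call above would already fail while evaluating the arguments; call-by-name postpones each argument until (and unless) its value is needed.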
Definition 6 (Iterative Form). A term M is simple if M does not mention instantiated function names. A term M is left-linear if either M is simple or M ≡ (F N1 ⋯ Nk) where F is an instantiated function name and N1, …, Nk are simple terms. A function definition F = M is in iterative form (or left-linear form) if
- either M ≡ λv1 ⋯ λvm. N, where N is left-linear,
- or M ≡ λv1 ⋯ λvm. if N then P else Q, where N is simple, and P and Q are left-linear.
A functional program {Fi = Mi} is in iterative form if every definition in it is in iterative form. The syntactic hierarchy of iterative programs is given by:
- ITERn = { finitely typed iterative programs of order ≤ n }
- ITER = ∪_{n≥1} ITERn
- p-ITER = { polymorphically typed iterative programs }
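Definition 6 is the higher-order analogue of tail recursion: in a left-linear body the recursive call is never nested inside another term, so no return context needs to be remembered. As a first-order illustration (our own example, not the paper's), the definition F = λx.λa. if x = 0 then a else F (x−1) (a·x) is in iterative form, and it runs as a loop over the program variables:

```python
def fact_iter(x, a=1):
    # Body shape: if N then P else Q, where the test N is simple and the
    # else-branch is the single head recursive call F (x-1) (a*x).
    while x != 0:
        x, a = x - 1, a * x   # the tail call just updates the variables
    return a
```

By contrast, a definition such as F x = g (F (L x)) (F (R x)) is not left-linear: the recursive calls occur under g, so some context must be saved across them.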
3   From Recursion to Iteration
We restrict our analysis in this section to finitely typed programs, as we do not know how to extend it to polymorphically typed programs. Unless stated otherwise, all programs in this section are finitely typed. Let P = {Fi = Mi}_{1≤i≤ℓ} with Fi : σi for 1 ≤ i ≤ ℓ, where F1 is the distinguished function name, σ1 = *^k → ρ for some arity k ≥ 1 and some ground type ρ. We temporarily introduce ℓ + 1 special symbols ⊥0 : ρ, ⊥1 : σ1, …, ⊥ℓ : σℓ with the exhibited types. Intuitively, ⊥i stands for "the evaluation of function Fi has not yet converged". We define a "flowchart" P̄ whose call-by-name semantics coincides with the call-by-name semantics of P. P̄ is shown in Fig. 3, where x1, …, xk are k input variables all of type * and z is a fresh variable of type ρ. The construction of P̄ from P is suggested by standard methods in denotational semantics. P̄ is a convenient description suggesting the later transition to iterative form. In the form of a functional program, not legal in our formalism because of the artificial symbols ⊥0, ⊥1, …, ⊥ℓ, we can write P̄ as

G x = H x ⊥0 ⊥1 ⋯ ⊥ℓ
H x z F1 ⋯ Fℓ = if (z ≠ ⊥0) then (F1 x) else H x (F1 x) M1* ⋯ Mℓ*

where x = x1 ⋯ xk and Mi* is Mi with the function names F1, …, Fℓ replaced by the variables F1, …, Fℓ, respectively, and G and H are fresh function names with the following types:

G : *^k → ρ
H : *^k → ρ → σ1 → ⋯ → σℓ → ρ

The challenge here is how to turn P̄ into a legal functional program, in particular how to simulate the artificial symbols ⊥0, ⊥1, …, ⊥ℓ. For this, we first transform P̄ into ⟨⟨P̄⟩⟩, which incorporates an appropriate simulation of ⊥0, ⊥1, …, ⊥ℓ. In a second stage, we transform ⟨⟨P̄⟩⟩ into iter(P), the final iterative form of P.
input x1 ⋯ xk
(F1, …, Fℓ) := (⊥1, …, ⊥ℓ)
loop:  (F1, …, Fℓ) := (M1*(F1, …, Fℓ), …, Mℓ*(F1, …, Fℓ))
       z := F1 x1 ⋯ xk
       if z = ⊥0 goto loop
output z

Fig. 3. Transforming P into P̄.
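The flowchart of Fig. 3 is Kleene-style fixpoint iteration: each function variable starts at ⊥ and is repeatedly improved by one unfolding of its definition until the answer on the given input is defined. A Python sketch of the idea on a single sample definition (the names, the sample definition, and the BOT marker are ours, not the paper's):

```python
BOT = object()   # plays the role of the undefined value, the paper's ⊥

def body(F):
    # One unfolding of the sample definition F x = if x = 0 then 1 else x * F(x-1);
    # the result is BOT wherever the current approximation F is still undefined.
    def improved(x):
        if x == 0:
            return 1
        prev = F(x - 1)
        return BOT if prev is BOT else x * prev
    return improved

def run_flowchart(x):
    F = lambda _: BOT          # F := ⊥  (undefined everywhere)
    z = F(x)
    while z is BOT:            # the loop of Fig. 3
        F = body(F)            # F := M*(F)
        z = F(x)
    return z
```

After m trips around the loop, F is the m-th Kleene approximant of the recursively defined function; the loop exits exactly when the approximant is already defined on the input.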
The transformation ⟨⟨·⟩⟩ is defined in Fig. 5. If P mentions object variable v (resp. function name F), then ⟨⟨P⟩⟩ mentions both v (resp. F) and a fresh object variable v̲ (resp. fresh function name F̲). If P = {Fi = Mi}_{1≤i≤ℓ} then ⟨⟨P⟩⟩ is the program:

⟨⟨P⟩⟩ = { Fi = Mi }_{1≤i≤ℓ} ∪ { F̲i = ⟨⟨Mi⟩⟩ }_{1≤i≤ℓ}
Definition 7 (iter(P)). Let P = {Fi = Mi}_{1≤i≤ℓ} be a functional program where:
- F1 : σ1, …, Fℓ : σℓ for some σ1, …, σℓ ∈ T,
- F1 is the distinguished function, σ1 = *^k → ρ and ρ ∈ {*, bool}.

The iterative form iter(P) of P is the following functional program:

G x = H x ff c1 c̲1 ⋯ cℓ c̲ℓ
H x z F1 F̲1 ⋯ Fℓ F̲ℓ = if z then (F1 x1 ⋯ xk) else H x (F̲1 tt x1 ⋯ tt xk) M1* ⟨⟨M1⟩⟩* ⋯ Mℓ* ⟨⟨Mℓ⟩⟩*

where:
1. x = x1 ⋯ xk.
input x1 ⋯ xk
loop:  (F1, F̲1, …, Fℓ, F̲ℓ) := (M1, ⟨⟨M1⟩⟩, …, Mℓ, ⟨⟨Mℓ⟩⟩)
       z := F̲1 tt x1 ⋯ tt xk
       if z = ff goto loop
output F1 x1 ⋯ xk

Fig. 4. Transforming P̄ into ⟨⟨P̄⟩⟩.

2. F1 : σ1, F̲1 : ⟨⟨σ1⟩⟩, …, Fℓ : σℓ, F̲ℓ : ⟨⟨σℓ⟩⟩ are fresh variables.
3. Mi* is Mi with F1, …, Fℓ replaced by F1, …, Fℓ, respectively.
4. ⟨⟨Mi⟩⟩* is ⟨⟨Mi⟩⟩ with F1, F̲1, …, Fℓ, F̲ℓ replaced by F1, F̲1, …, Fℓ, F̲ℓ.
5. For i ∈ {1, …, ℓ}, if σi ≡ τ_{i,1} → ⋯ → τ_{i,ki} → ρi with ki ≥ 1 and ρi ∈ {*, bool}, then ci ≡ (λv1 : τ_{i,1}. ⋯ λv_{ki} : τ_{i,ki}. a), where a is any ground constant, or variable in {x1, …, xk}, of type ρi.
6. G and H are fresh function names with the following types:

G : *^k → ρ
H : *^k → bool → σ1 → ⟨⟨σ1⟩⟩ → ⋯ → σℓ → ⟨⟨σℓ⟩⟩ → ρ
Theorem 8. If P is a finitely typed functional program of order n ≥ 1, then iter(P) is an iterative program of order n + 1 equivalent to P.

Proof. Let P = {Fi = Mi}_{1≤i≤ℓ} as in Definition 7. The equivalence of P and iter(P) follows from the preceding discussion. To check that iter(P) is in iterative form is a straightforward inspection of its definition. Finally, if order(P) = n, we can take:

n = max{ order(F1), …, order(Fℓ) } = max{ order(σ1), …, order(σℓ) }

(see Definition 3). This implies:

order(iter(P)) = order(H) = order(*^k → bool → σ1 → ⟨⟨σ1⟩⟩ → ⋯ → σℓ → ⟨⟨σℓ⟩⟩ → ρ) = n + 1

The last equality (to n + 1) is immediate from the definition of order.
Types:
⟨⟨*⟩⟩ = bool
⟨⟨bool⟩⟩ = bool
⟨⟨σ → τ⟩⟩ = ⟨⟨σ⟩⟩ → σ → ⟨⟨τ⟩⟩

Special symbols:
Ω_* ≡ ff : bool
Ω_bool ≡ ff : bool
Ω_{σ→τ} ≡ (λv̲ : ⟨⟨σ⟩⟩. λv : σ. Ω_τ) : ⟨⟨σ → τ⟩⟩

Terms:
⟨⟨c : ρ⟩⟩ = tt : ⟨⟨ρ⟩⟩
⟨⟨f N1 ⋯ Nk : ρ⟩⟩ = (⟨⟨N1⟩⟩ and ⋯ and ⟨⟨Nk⟩⟩) : ⟨⟨ρ⟩⟩
⟨⟨if N1 N2 N3 : τ⟩⟩ = (if ⟨⟨N1⟩⟩ (if N1 ⟨⟨N2⟩⟩ ⟨⟨N3⟩⟩) Ω_τ) : ⟨⟨τ⟩⟩
⟨⟨v : τ⟩⟩ = v̲ : ⟨⟨τ⟩⟩
⟨⟨F : τ⟩⟩ = F̲ : ⟨⟨τ⟩⟩
⟨⟨M N : τ⟩⟩ = (⟨⟨M⟩⟩ ⟨⟨N⟩⟩ N) : ⟨⟨τ⟩⟩
⟨⟨λv : σ. M : σ → τ⟩⟩ = (λv̲ : ⟨⟨σ⟩⟩. λv : σ. ⟨⟨M⟩⟩) : ⟨⟨σ → τ⟩⟩

Programs:
⟨⟨{ F1 = M1, …, Fℓ = Mℓ }⟩⟩ = { F1 = M1, …, Fℓ = Mℓ, F̲1 = ⟨⟨M1⟩⟩, …, F̲ℓ = ⟨⟨Mℓ⟩⟩ }

Fig. 5. The transformation ⟨⟨·⟩⟩ of finitely typed functional programs. Throughout, ρ ∈ {*, bool} and (b1 and b2) is shorthand for (if b1 b2 ff).
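At ground type, the effect of ⟨⟨·⟩⟩ can be mimicked by pairing every value with a boolean "has converged" flag, since ⟨⟨*⟩⟩ = bool: the undefined value becomes a false flag with a dummy payload, and a δ-step is defined exactly when all of its arguments are, the conjunction of shadow values in Fig. 5. A Python sketch of this encoding (the explicit pairing is our device; the paper keeps values and their shadow flags in separate variables):

```python
def omega(dummy):
    # the simulated undefined value: flag ff plus an arbitrary payload
    return (False, dummy)

def delta(f, *args):
    # simulated (f N1 ... Nk): defined iff every argument is defined,
    # i.e. the 'and' of the arguments' shadow flags
    if all(flag for flag, _ in args):
        return (True, f(*(v for _, v in args)))
    return (False, args[0][1])

three = (True, 3)
ok = delta(lambda a, b: a + b, three, (True, 4))   # defined result
bad = delta(lambda a, b: a + b, three, omega(0))   # undefined: flag is False
```

The payload carried by an undefined pair is arbitrary, exactly as the constants ci of Definition 7 are arbitrary ground values: only the flag is ever consulted.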
Example 1. The following first-order functional program P is from the Paterson-Hewitt paper [17]:

F x = if p x then x else g (F (L x)) (F (R x))

where the signature is Σ = {p, L, R, g}, and the types are p : * → bool; L, R : * → *; g : * → * → *; and F : * → *. Using the pebble game, Paterson and Hewitt show that this functional program cannot be simulated by a first-order iterative program. By Theorem 8, iter(P) is a second-order iterative program equivalent to P. The details of iter(P) are given next. Let M denote the right-hand side of the function definition in P, i.e. M ≡ (λx. if p x then x else g (F (L x)) (F (R x))).

The functional program iter(P) is:

G x = H x ff (λv. x) (λv̲ : bool. λv : *. ff)
H x z F F̲ = if z then F x else H x (F̲ tt x) N P

where G and H are fresh function names with the following types:

G : * → *
H : * → bool → (* → *) → (bool → * → bool) → *

and N and P are the following terms:

N ≡ λx. if p x then x else g (F (L x)) (F (R x))
P ≡ λx̲ : bool. λx : *. if x̲ then (if p x then x̲ else ((F̲ x̲ (L x)) and (F̲ x̲ (R x)))) else ff

N is simply M with the function name F replaced by the variable F, and P is ⟨⟨N⟩⟩.

Open Problem. Extend the transformation ⟨⟨·⟩⟩ to polymorphically typed functional programs. Define the iterative form iter(P) of a polymorphically typed P based on this extension of ⟨⟨·⟩⟩.
4   Polymorphic Iteration
We prove that there are polymorphically typed iterative programs that cannot be translated into equivalent finitely typed functional programs. We start with an example of a polymorphically typed iterative program P, which we use in the proof of Theorem 9.

Example 2. P is the following polymorphically typed functional program:

F x = G x f a
G x F Z = if r x then F Z else G (f x) (λv.λw. v (v w)) F Z

The types of the symbols in the signature Σ = {f, r}, the ground constant a, the variables {x, v, w, F, Z}, and the function names {F, G} are

a, x : *
f : * → *
r : * → bool
v, F : α → α   (F the variable)
w, Z : α
F : * → *   (F the function name)
G : * → (α → α) → α → α

The instance of G in the definition for F has type

* → (* → *) → * → *

The instance of G in the definition for G has type

* → ((α → α) → (α → α)) → (α → α) → (α → α)
Noting that P is in iterative form, what we have in this example is a case of polymorphic iteration. If n is the least natural number such that r(f^(n) x) is tt, we have the following converging computation:

F x → G x f a → G (f x) 2 f a → G (f^(2) x) 2 2 f a → ⋯

where 2 denotes the term (λv.λw. v (v w)), and the function e is given by e(0) = 1 and e(n + 1) = 2^{e(n)} for all n ≥ 0. Each call to G in the computation is at a different type. An explicitly typed intermediate term in this computation is (types inserted as superscripts):

G^{* → *_{k+1} → *_{k+1}} (f^(k) x)^{*} 2^{*_{k+1}} 2^{*_k} ⋯ 2^{*_2} f^{*_1} a^{*}

where 0 ≤ k ≤ n and we use the following type abbreviations: *_0 = * and *_{k+1} = (*_k → *_k). Note that because the program is polymorphic, G is repeatedly applied to finitely typed arguments with increasingly complex types. This example is adapted from [11], Example 5.3.

Theorem 9. There is a polymorphically typed iterative program P which is not
equivalent to any finitely typed functional program.

Proof.² The desired P is the program given in Example 2. It suffices to show that for every finitely typed functional program Q of order n ≥ 1, over the signature Σ of P, there is a Σ_A-structure A such that P^A ≠ Q^A. We choose A to be of the form ({a1, …, a_u} ∪ ℕ, r, f) where u is a positive integer (to be appropriately selected depending on Q), ℕ is the set of natural numbers, r is the predicate such that (r x) = tt iff x = a_u, and f is the function: (f a_i) = a_{i+1} for 1 ≤ i < u, (f a_u) = a_u, and (f i) = i + 1 for every i ∈ ℕ. P mentions only one ground constant, namely a, while Q may mention a as well as several other ground constants. We choose a to be 0 in ℕ, and we can always define A so that all the other ground constants mentioned by Q are in {a1, …, a_u}. No matter what the value of u is, the computation of P relative to the interpretation (A, a1) converges and returns the value:

(f^{e(u)} a) = (f^{e(u)} 0) = e(u)

We need to select u large enough so that the computation of Q, relative to the same interpretation (A, a1), either diverges or converges and returns a value different from e(u). By Theorem 8, we can restrict Q to be in iterative form. We can therefore write Q in the form of an order-n "flowchart" consisting of, say, k ≥ 1 instructions and ℓ ≥ 1 variables, all of order < n. As there is no "communication" between the elements in {a1, …, a_u} and the elements in ℕ, the behavior of Q is entirely determined by the substructure B = ({a1, …, a_u}, r, f). Define the function exp such that exp(0, n) = n and exp(m + 1, n) = 2^{exp(m, n)}, for all m, n ∈ ℕ. It is not difficult to show there is a fixed polynomial ψ : ℕ → ℕ such that if v : σ is a variable in Q and order(σ) = m ≥ 0, then v will be assigned at most exp(m, ψ(u)) distinct values, i.e. functions of type σ over the universe {a1, …, a_u}, in the course of the computation of Q relative to the interpretation (B, a1). If a state of this computation consists of an instruction label (k of them) in Q, together with the values assigned to the ℓ variables in Q, then k · (exp(n, ψ(u)))^ℓ is an upper bound on the number of distinct states that Q can visit in the course of its computation. If a state is repeated, the computation is doomed to diverge. Noting that k, ℓ and n depend on Q and are therefore fixed in this proof, let r = k · (exp(n, ψ(u)))^ℓ. Suppose z : * is a variable in Q which is assigned the final output, if any. The value assigned to z can be changed at most r times in the course of a converging computation of Q, now relative to (A, a1). Using the fact that a finitely-typed simple term M reduces to normal form in at most exp(|M|, |M|) steps, see [21], the final value assigned to z cannot exceed exp(p · r, ψ(u)) for some p ∈ ℕ depending on Q. The desired conclusion now follows, because e(u) > exp(p · r, ψ(u)) for sufficiently large u. □

² Joint with Paweł Urzyczyn.
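Because Python erases types, the engine of Example 2 can be run directly, with the Church numeral 2 stacking up exactly as in the computation displayed in Example 2. The sketch below (the currying style and the concrete r, f, a are our choices) also checks the tower function e over the ℕ-part of a structure like the one in the proof:

```python
two = lambda v: lambda w: v(v(w))   # Church numeral 2 = λv.λw. v (v w)

def make_F(r, f, a):
    # F x = G x f a;  G x F Z = if r x then F Z else G (f x) 2 F Z,
    # curried so that the recursive call G (f x) 2 F is itself applied to Z.
    def G(x, F):
        def at(Z):
            if r(x):
                return F(Z)
            return G(f(x), two)(F)(Z)
        return at
    return lambda x: G(x, f)(a)

def e(n):
    # e(0) = 1, e(n+1) = 2^e(n)
    return 1 if n == 0 else 2 ** e(n - 1)

succ = lambda m: m + 1
# Over the naturals with r x = (x == 3), the program makes three recursive
# calls starting from x = 0 and returns succ applied e(3) = 16 times to 0.
F = make_F(lambda x: x == 3, succ, 0)
```

Each pending 2 composes the accumulated function with itself, so n recursive calls yield f iterated e(n) times; the untyped run hides the escalating types *_2, *_3, … that make the program inexpressible with any finite typing.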
5   Hierarchies
The results of Sections 3 and 4 do not presume that the signature Σ contains a symbol eq : *^2 → bool which is always interpreted as the equality relation on the universe A of a Σ_A-structure. To compare the hierarchies {ITERn} and {RECn}, and use the results of [11], we now assume the existence of such a symbol eq. It is easy to see that Theorems 8 and 9 imply the following computational hierarchy:

ITER1 < REC1 ≤ ITER2 ≤ REC2 ≤ ITER3 ≤ REC3 ≤ ⋯ < p-ITER ≤ p-REC

The strictness of the first level, ITER1 < REC1, is the classical result of Paterson and Hewitt [17]. Using the already known fact that RECn < RECn+1 for every n ≥ 1, see [11], Theorem 3.9, we can conclude:

ITER1 < ITER3 < ITER5 < ⋯ < ITER < p-ITER   (odd orders)
ITER2 < ITER4 < ITER6 < ⋯ < ITER < p-ITER   (even orders)

Although we are not yet able to separate two consecutive levels in this hierarchy, we have already shown that increasing the order of finitely-typed iterative programs results in a net gain of computational power, and that adding polymorphic iteration results in a further gain of computational power. Based on the preceding, it is only natural to state the following.
Conjecture. ITER1 < ITER2 < ITER3 < ITER4 < ⋯   (all orders)
To settle this conjecture, we can proceed in one of two ways to prove that consecutive levels of the hierarchy {ITERn} can indeed be separated. This is similar to the situation in the first-order case, where separation results can be established in two different ways, depending on whether "counting" or "pebbling" is used. The first approach is to sharpen the counting arguments we have used so far, with a view to separating consecutive levels in the hierarchy. The second approach, more problematic at this point, is to try some kind of pebbling argument. This clearly raises the question of how to define a higher-order pebble game. We can define it with the aim to directly show that ITERn < ITERn+1 or, using the already established reduction RECn ≤ ITERn+1 of Theorem 8, the stronger result that ITERn < RECn for every n ≥ 2. In both cases there are several technical issues to be sorted out. Either of these two results would settle the above conjecture.
References
1. Abelson, H., and Sussman, G., Structure and Interpretation of Computer Programs, MIT Press/McGraw-Hill, NY, 1984.
2. Aho, A.V., and Ullman, J.D., Principles of Compiler Design, Addison-Wesley, 1979.
3. Auslander, M.A., and Strong, H.R., "Systematic recursion removal", Communications of the ACM, 21, no. 2, pp 127-134, Feb 1978.
4. Barendregt, H.P., The Lambda Calculus, Its Syntax and Semantics, revised edition, North-Holland, Amsterdam, 1984.
5. Böhm, C., and Berarducci, A., "Automatic synthesis of typed lambda-programs on term algebras", Theoretical Computer Science, 39, pp 135-154, 1985.
6. Friedman, D.P., Wand, M., and Haynes, C.T., Essentials of Programming Languages, MIT Press/McGraw-Hill, NY, 1992.
7. Goerdt, A., "On the computational power of the finitely typed lambda-terms", in Proceedings of 13th MFCS, LNCS 324, pp 318-328, 1988.
8. Greibach, S.A., Theory of Program Structures: Schemes, Semantics, Verification, LNCS 36, Springer-Verlag, 1975.
9. Hillebrand, G.G., Kanellakis, P.C., Mairson, H.G., and Vardi, M.Y., "Undecidable boundedness problems for Datalog programs", Journal of Logic Programming, 25:2, pp 163-190, 1995.
10. Kamin, S.N., Programming Languages: An Interpreter-Based Approach, Addison-Wesley, 1990.
11. Kfoury, A.J., Tiuryn, J., and Urzyczyn, P., "On the expressive power of finitely typed and universally polymorphic recursive procedures", Theoretical Computer Science, 93, pp 1-41, 1992.
12. Kfoury, A.J., "Recursion, tail-recursion, and iteration at higher-orders". In preparation.
13. Kozen, D., and Tiuryn, J., "Logics of programs", in Handbook of Theoretical Computer Science, Vol. B: Formal Methods and Semantics, ed. J. van Leeuwen, Elsevier Science Publ. and The MIT Press, pp 789-840, 1990.
14. Mitchell, J.C., Foundations for Programming Languages, MIT Press, Cambridge, Mass., 1996.
15. Moschovakis, Y.N., Elementary Induction on Abstract Structures, North-Holland, 1974.
16. Moschovakis, Y.N., "The formal language of recursion", Journal of Symbolic Logic, 54, pp 1216-1252, 1989.
17. Paterson, M.S., and Hewitt, C., "Comparative schematology", MIT A.I. Lab Technical Memo No. 201 (also in Proc. of Project MAC Conference on Concurrent Systems and Parallel Computation), 1970.
18. Peyton Jones, S.L., The Implementation of Functional Programming Languages, Prentice-Hall, 1987.
19. Pippenger, N., "Pebbling", Fifth Symposium on Mathematical Foundations of Computer Science, IBM Japan, 1980.
20. Pippenger, N., "Advances in pebbling", Proc. of 9th ICALP, LNCS 140, Springer-Verlag, 1982.
21. Statman, R., "The typed λ-calculus is not elementary recursive", Theoretical Computer Science, 9, pp 73-81, 1979.
22. Strong, H.R., "Translating recursion equations into flowcharts", J. Computer and System Sciences, 5, pp 254-285, 1971.
23. Walker, S.A., and Strong, H.R., "Characterizations of flowchartable recursions", J. Computer and System Sciences, 7, pp 404-447, 1973.
Approximating Geometric Domains through Topological Triangulations

Tamal K. Dey¹, Arunabha Roy¹, Nimish R. Shah²

¹ Dept. of CSE, I.I.T. Kharagpur, Kharagpur 721302, India
² Synopsys Inc., 700 East Middlefield Road, Mt. View, CA 94043, USA

Abstract. This paper introduces a 2-dimensional triangulation scheme based on a topological triangulation that approximates a given domain X within a specified Hausdorff distance from X. The underlying space of the resulting good quality triangulation is homeomorphic to X and contains either equilateral triangles or right angled triangles with 30°, 60° and 90° angles. For a particular range of approximation tolerance, the number of triangles in the triangulation produced by the method is O(t log² t) where t is the number of triangles in an optimal triangulation, where the optimum is taken over bounded aspect ratio triangulations satisfying a certain boundary condition with respect to X. The method can also produce a quadrangulation of X having similar properties. Relevant implementation issues and results are discussed.
1  Introduction

Triangulations of geometric spaces are used in many applications such as finite element methods, computer graphics and solid modeling. Triangular elements are usually linear and therefore a finite number of triangular elements cannot decompose a curved domain. On the other hand, it is easier to handle linear elements computationally. As a compromise, in most of the cases a curved domain is approximated with linear elements. A solid is often modeled with a linear boundary which is further broken up into triangular elements for efficient representation [10], for other postprocessing such as finite element analysis [1], for rendering [8], etc. Numerous methods exist to triangulate a domain with linear boundaries. See [3] for a comprehensive coverage of these methods. The requirement that the triangular elements cover the given domain exactly often conflicts with other objectives. For example, consider bounded aspect ratio triangulations in which each triangular element is guaranteed to have an aspect ratio within some limit. A polygon with a very sharp angle can never be triangulated without producing angles less than or equal to that sharp angle. All recent methods of guaranteed aspect ratio triangulation [2, 4, 6, 13] are constrained by this. Frequently, the domain being triangulated does not represent the original object exactly. Therefore, the constraint of triangulating the input domain exactly is too restrictive in these cases. Of independent interest are approximations of geometric domains being used recently in other applications such as simplification of scenes [9] for rendering and collision detection [11] in animations. In this
paper we relax the condition that the underlying space of the triangulation be exactly the same as the given domain. Our triangulation scheme approximates a given two dimensional geometric domain within a user specified tolerance, keeping topological equivalence with the given domain. Since the constraint of fitting the boundary of the domain is not present, we have more flexibility in restricting the angles of the triangulation. In fact, we show that our triangulation produces only two types of triangles, viz. equilateral triangles and right angled triangles with 30°, 60° and 90° angles. The algorithm of [2] guarantees a minimum angle of 18° if the input polygon has all angles greater than 18°. For no-large-angle methods, see [4, 5]. All these algorithms decompose the input polygons exactly. Relaxing this constraint allows our algorithm to handle more general topological domains and produce non-obtuse triangulations with the smallest angle greater than or equal to 30°. The method can be modified very easily to produce a quadrangulation of the domain, which finds applications in finite element methods. Section 2 introduces some terminology and describes a triangulation method based on the topological triangulation introduced by Edelsbrunner and Shah [7]. The method of generating a topological triangulation is based on a theorem that states the condition for the dual complex of a subdivision of the domain to be a triangulation with underlying space topologically the same as the given domain. A major computational drawback of the method was overcome when Shah [14] introduced a systematic method of choosing Steiner points from the given domain to build the triangulation. However, these earlier works were not concerned with implementation issues, size of the triangulation and quality of the generated triangles. This paper addresses these issues and introduces topological triangulation as an approximation scheme.
Section 3 discusses a triangulation algorithm based on the topological triangulation scheme described in Section 2. Implementation issues are discussed in Section 4. While implementing the algorithm we discovered some tricks that ease the implementation effort and produce better output with respect to the number of triangles. These tricks are discussed in Section 4. Section 5 discusses certain properties of the triangulation produced by the algorithm in Section 3. It is shown that the Hausdorff distance between the underlying space of the triangulation and the given domain is at most ε, where ε is the given approximation tolerance. Equilateral triangular grids, as opposed to the square grids used by Shah [14], are shown to produce triangles with good aspect ratio. The triangulation is a subcomplex of some Delaunay triangulation and has non-obtuse angles only. These properties are desirable in many applications. Let T(X, ε) denote the set of bounded aspect ratio triangulations T such that the triangles of T cover the boundary of X and the triangles of T touching the boundary of the underlying space of T are within a distance (2/3)ε from the boundary of X. Let |T| denote the number of triangles in a triangulation T and let t denote the minimum |T|, T ∈ T(X, ε). For a particular range of the approximation tolerance ε, we show in Section 5 that our scheme produces a triangulation S such that |S| = O(t log² t). For other values of the approximation tolerance, we expect the method to produce a triangulation of reasonable size. This is corroborated by
our implementation results. Section 6 describes some implementation results and discusses some ongoing research.
2 Preliminaries

2.1 Basic definitions

In the following we only consider topological spaces that are subspaces of Euclidean real spaces. The d-dimensional real space is denoted by R^d.

Simplex. A simplex σ_T is the convex hull of an affinely independent point set T. If |T| = k + 1, then σ_T is a k-simplex. A j-simplex σ_U is a j-face of σ_T if U ⊆ T. A 0-simplex is also called a vertex, a 1-simplex is called an edge and a 2-simplex is called a triangle.

Simplicial complex. A simplicial complex K is a collection of simplices such that the following properties hold: (i) if σ_U ∈ K and V ⊆ U, then σ_V ∈ K; (ii) if σ_U, σ_V ∈ K, then σ_U ∩ σ_V = σ_{U ∩ V}. The first condition implies that all faces of a simplex in K are contained in K. The second condition says that the (possibly empty) intersection of any two simplices in K is their common face. The underlying space of K, the union of its simplices, is denoted by ∪K.

Homeomorphism. A continuous function f : X → Y between two topological spaces X and Y is a homeomorphism if its inverse is also continuous. X and Y are homeomorphic if there exists a homeomorphism between them.

Closed balls. A topological space X ⊆ R^d is a closed k-ball if X is homeomorphic to B^k = {p ∈ R^k : |po| ≤ 1}, the unit k-ball. Here, o is the origin of R^k.

Hausdorff distance. Define the distance between a point p ∈ R^d and a compact set X ⊆ R^d to be d(p, X) = min{|px| : x ∈ X}. For a compact set X ⊆ R^d and a non-negative r ∈ R, define X + r = {p ∈ R^d : d(p, X) ≤ r}. The Hausdorff distance between two compact sets X, Y ⊆ R^d is the minimum r ≥ 0 such that X ⊆ Y + r and Y ⊆ X + r. It is straightforward to verify that the Hausdorff distance is indeed a metric on the space of compact sets.
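The Hausdorff distance can be checked directly on finite point samples with a brute-force computation; the following minimal sketch is ours, not the paper's:

```python
import math

def hausdorff(X, Y):
    """Hausdorff distance between finite point sets X and Y:
    the smallest r such that X lies inside Y + r and Y inside X + r."""
    def d(p, S):
        # d(p, S) = min over q in S of |pq|
        return min(math.dist(p, q) for q in S)
    return max(max(d(p, Y) for p in X),
               max(d(q, X) for q in Y))

# A unit segment sampled at its endpoints vs. its midpoint alone.
X = [(0.0, 0.0), (1.0, 0.0)]
Y = [(0.5, 0.0)]
print(hausdorff(X, Y))  # 0.5
```

For compact sets that are not finite, the same definition applies with min and max taken over the sets; the finite case above is only an illustration of the metric.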
2.2 Topological triangulation

A topological triangulation of a topological space X is a simplicial complex K such that ∪K, the underlying space of K, is homeomorphic to X. In this paper X is a compact 2-manifold embedded in R². In general, a connected component of X has an outer boundary and possibly a set of inner boundaries. We denote the boundary and interior of X by bd(X) and int(X) respectively.

A Δ-tree E is a hierarchical subdivision of an equilateral triangle, represented by the root of the tree, into smaller equilateral triangles. An internal node of E represents an equilateral triangle which is split into four smaller, congruent, equilateral triangles in this hierarchical subdivision. If the length of the path from a node n in E to the root is l and the centroid of the triangle t corresponding to n is p, then we shall use V_p^l to refer to both t and n; the usage will be clear from the context. Figure 2.1 illustrates splitting, where a node a is split into four equilateral triangles b, c, d, and e. Observe that the centroid of the triangle corresponding to a is the same as the centroid of the triangle corresponding to d. Let E_X denote the subtree of E whose leaf nodes are exactly those leaf nodes of E which have non-empty intersection with X. Assume that the triangle corresponding to the root of E contains X.
Fig. 2.1. Splitting a node into four children.

Let P = {p : V_p^l is a leaf node in E_X}. For p ∈ P, we shall refer to V_p^l simply as V_p, since the level l is determined by the fact that the node is a leaf node. For any T ⊆ P, define I_T to be the intersection of all V_p such that p ∈ T. For convenience, we define I_T = ∅ when T = ∅. Define I_{T,X} = I_T ∩ X. Along the line of topological triangulation in [7, 14], we construct a simplicial complex from subcomplexes defined for each non-empty I_{T,X} as follows. When |T| = 1, define S(T) = {p}, where T = {p}. If |T| = 2, define S(T) = {p₁, p₂, p₁p₂}, where T = {p₁, p₂} and p₁p₂ is the edge connecting p₁ and p₂. When |T| ≥ 3, I_T ≠ ∅ is a singleton set, say I_T = {v}. Let p₁, p₂, ..., p_k ∈ T be some sequence of centroids of triangles ordered around v. Consider the polygonal cell P_v with the boundary p₁, p₂, ..., p_k, p₁. Triangulate this polygon, possibly using points inside it, and let S(T) be the corresponding simplicial complex. For an illustration see the polygonal cells P_x and P_y in Figure 3.2 (the cells are not marked in the figure). Finally, let S(X) be the union of all S(T) with non-empty I_{T,X}. It can be shown that S(X) is a simplicial complex. The following theorem, which follows from the results of [14], is at the heart of the validity of topological triangulation. It was proved for the case of cubic subdivisions (quadtree in R² and octree in R³) in [14] and the most general form of the theorem is proved in [7]. The theorem uses a closed ball property, which we simplify for our domain X.
Closed ball property. A k-simplex σ satisfies the closed ball property with respect to X if σ ∩ X is either empty or a closed k-ball and σ ∩ bd(X) is either empty or a closed (k − 1)-ball.

Theorem 1. ∪S(X) is homeomorphic to X if, for any T ⊆ P, I_T = ∅ or I_T satisfies the closed ball property with respect to X.
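The splitting step of Section 2.2 is ordinary midpoint subdivision of an equilateral triangle into four congruent children; the middle (inverted) child shares its centroid with the parent, as observed for nodes a and d in Figure 2.1. A minimal sketch (the function names are ours):

```python
import math

def centroid(t):
    (ax, ay), (bx, by), (cx, cy) = t
    return ((ax + bx + cx) / 3.0, (ay + by + cy) / 3.0)

def split(t):
    """Split triangle t into four congruent children by joining the
    edge midpoints; the last child is the inverted middle one."""
    a, b, c = t
    m = lambda p, q: ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
    ab, bc, ca = m(a, b), m(b, c), m(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

parent = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))  # equilateral
children = split(parent)
gp, gm = centroid(parent), centroid(children[3])
print(math.isclose(gp[0], gm[0]) and math.isclose(gp[1], gm[1]))  # True
```

The shared-centroid fact is what makes the notation V_p^l well behaved: the same point p can name both a node and the middle child of its split, distinguished only by the level l.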
3 Algorithm

In this section we describe an algorithm to compute a topological triangulation of an input domain X. The input to our algorithm is X and a tolerance of approximation ε_T. The output is a topological triangulation S(X, ε) of X, where ε = (2/3)ε_T. All triangles of S(X, ε) have bounded aspect ratio; in particular, they are either equilateral triangles or right angled triangles of a specific type. Furthermore, the Hausdorff distance between S = ∪S(X, ε) and X is at most ε_T.

The algorithm builds E by successively splitting nodes that do not meet certain conditions. Initially, E consists of a sufficiently large equilateral triangle, with edge length ε2^j, j ≥ 0, that contains X. The choice of j is made such that the area of this triangle is O(L²), where L denotes the length³ of bd(X). The size ℓ(σ) of a triangle σ is the length of the longest edge of σ. A leaf node V_p of E_X is split if it does not satisfy one of the following three conditions.

i) V_p and all of its edges and vertices satisfy the closed ball property with respect to X.
ii) If V_p ∩ bd(X) is non-empty then ℓ(V_p) ≤ ε.
iii) If V_p and V_q are adjacent then (1/2)ℓ(V_q) ≤ ℓ(V_p) ≤ 2ℓ(V_q).

Condition (i) is imposed to achieve the topological equivalence according to Theorem 1. Condition (ii) guarantees approximation within the specified tolerance. Condition (iii) ensures that the sizes of adjacent triangles are within a factor of two and hence, as we will show, that the triangles of S(X, ε) have bounded aspect ratio. Finally, when all leaf nodes of E_X satisfy the above three conditions, we construct a simplicial complex S(X) and define it to be S(X, ε). This triangulation consists of triangles that are formed as follows. Let E(X) denote the set of triangles corresponding to the leaf nodes of E_X. Let P be as defined in the previous section and let T be a subset of P such that |T| ≥ 3 and I_{T,X} ≠ ∅. Then I_{T,X} is a singleton set, say I_{T,X} = {v}. Consider the triangles of E(X) that contain v.
We have two possible configurations of these triangles around v. Configuration (i): there are exactly four triangles around v and exactly one triangle has double the size of the other three incident to v; in Figure 3.2(b) x is such a vertex. Configuration (ii): there are exactly six triangles incident to v, each with a size within a factor of two of the others; in Figure 3.2(b) y is such a vertex. No other configuration is possible, which can be observed from the fact that each triangle spans either an angle of 180° (Figure 3.2(b)) or an angle of 60° around v, and condition (iii) ensures the specific ratios of the sizes of these triangles. In configuration (i) the quadrilateral formed by the centroids of the four triangles is triangulated by adding the unique diagonal that connects the centroids of the larger triangle and a smaller triangle; see Figure 3.2(c). In configuration (ii), the hexagon P_v is triangulated by joining v to each of its vertices, as in Figure 3.2(c).

³ We assume that X is well-behaved so that its area and the length of its boundary are well-defined.

Fig. 3.2. (a) X and E(X) (b) Centroids connected around vertices of triangles in E(X) (c) S(X).
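To illustrate the refinement loop, here is a self-contained sketch of condition (ii) alone: leaf triangles meeting the boundary (a single segment stands in for bd(X) here) are split until their size is at most ε. Conditions (i) and (iii) and the Δ-tree bookkeeping are elided, and all names are ours:

```python
import math

def orient(a, b, c):
    # twice the signed area of triangle abc
    return (b[0]-a[0])*(c[1]-a[1]) - (b[1]-a[1])*(c[0]-a[0])

def crosses(p, q, r, s):
    # proper crossing of segments pq and rs (general position assumed)
    return (orient(r, s, p)*orient(r, s, q) < 0 and
            orient(p, q, r)*orient(p, q, s) < 0)

def inside(p, t):
    a, b, c = t  # counter-clockwise triangle
    return orient(a, b, p) >= 0 and orient(b, c, p) >= 0 and orient(c, a, p) >= 0

def meets_boundary(t, p, q):
    a, b, c = t
    return (inside(p, t) or inside(q, t) or
            any(crosses(p, q, u, v) for u, v in [(a, b), (b, c), (c, a)]))

def size(t):
    a, b, c = t
    return max(math.dist(a, b), math.dist(b, c), math.dist(c, a))

def split(t):
    a, b, c = t
    m = lambda p, q: ((p[0]+q[0])/2.0, (p[1]+q[1])/2.0)
    ab, bc, ca = m(a, b), m(b, c), m(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

def refine(root, p, q, eps):
    """Condition (ii): split leaves meeting segment pq until size <= eps."""
    work, leaves = [root], []
    while work:
        t = work.pop()
        if meets_boundary(t, p, q) and size(t) > eps:
            work.extend(split(t))
        else:
            leaves.append(t)
    return leaves

root = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3)/2))
seg = ((0.2, 0.1), (0.6, 0.3))
leaves = refine(root, seg[0], seg[1], 0.25)
print(all(size(t) <= 0.25 for t in leaves if meets_boundary(t, *seg)))  # True
```

The full algorithm interleaves this with the closed-ball test (condition (i)) and the 2:1 size balance (condition (iii)); Section 4.4 discusses why the order of those checks matters.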
4 Implementation

4.1 Closed ball property

For a triangle V_p, we first check whether bd(X) intersects each of its edges in at most one point. This condition is derived from the requirement that bd(X) ∩ I_T is either empty or a 0-ball when |T| = 2. If this condition is satisfied, then the intersection of an edge of V_p with X is either empty or a 1-ball, as required. It only remains to be verified that V_p ∩ X is empty or a 2-ball and that V_p ∩ bd(X) is empty or a 1-ball. These conditions are satisfied iff V_p ∩ X = ∅ or the boundary of V_p ∩ X has one connected component. The latter condition is verified by checking whether any component of bd(X) lies in the interior of V_p.

4.2 Perturbation
Excessive, unnecessary splitting may occur at certain regions of X in some situations. For example, consider the situation illustrated in Figure 4.3: a part of X lies slightly above the edge e of a triangle, say V_p. To satisfy the closed ball property, V_p is repeatedly split until sufficiently small triangles divide V_p ∩ X so as to meet the closed ball condition. This minute splitting is further propagated to meet condition (iii). We introduce δ-perturbation to prevent this excessive splitting. A topological space Y ⊆ R^d is δ-perturbed to W ⊆ R^d if Y and W are homeomorphic and the Hausdorff distance between the two is less than or equal to δ. To prevent excessive splitting we check whether there is a possible ε/2-perturbation of X that would allow V_p to satisfy the closed ball property. In that case we perform an ε/2-perturbation and declare V_p to satisfy the closed ball property. This perturbation may also affect other triangles; for example, the two triangles incident to e in Figure 4.3 satisfy the closed ball property after perturbation. Other, more involved, ε/2-perturbations are not detailed in this version. Experimental results show that this saves a lot of unnecessary splitting. For example, the domain in Figure A.1 (see appendix) generated 7930 triangles without any perturbation, and 1458 triangles with perturbation.

Fig. 4.3. Excess splitting.
4.3 Handling degeneracies

Let σ ⊆ R² be a k-simplex. We say σ intersects X generically if (i) σ ∩ bd(X) is empty or σ ∩ bd(X) is a (k − 1)-manifold⁴ and (ii) bd(X) ∩ int(σ) = int(bd(X) ∩ σ). Degeneracies occur if this generic intersection property is violated. We handle these degeneracies by small perturbations of bd(X) as shown in Figure 4.4. In Figure 4.4(a) condition (ii) of generic intersection is violated. In Figure 4.4(b) and (c) condition (i) of generic intersection is violated. Note that we do not perform these perturbations geometrically. Rather, we make implicit perturbations and assume that these degeneracies have not occurred. The implicit assumptions are reflected while checking the closed ball properties for triangles participating in degenerate intersections.
Fig. 4.4. Implicit perturbations to avoid degeneracy: (a) bd(X) "touches" an edge in a point (b) bd(X) intersects an edge in a line segment (c) bd(X) intersects a vertex.

⁴ A (k − 1)-manifold is defined to be the empty set when k = 0.
4.4 Sequencing condition checking

Excessive splitting may also result from an arbitrary ordering of the checking of the three conditions. For example, suppose V ∈ E has been checked to satisfy the closed ball property. If V is subsequently split due to condition (ii) or (iii), the resulting children may not satisfy the closed ball property, and this may set off a new series of splittings; see Figure 4.5 for an illustration. To avoid this we check the conditions in the following order and adopt a trick. First, all nodes intersecting bd(X) are split until condition (ii) is satisfied. Next, the nodes violating condition (i) even after an ε/2-perturbation are split. At this stage, all nodes satisfy the closed ball property with an ε/2-perturbation. Before we carry out the splitting due to condition (iii), we replace V ∩ bd(X) as follows, for each V intersecting bd(X). bd(X) intersects two edges of V, say at x and y (see Figure 4.5(a)). Replace V ∩ bd(X) by the straight line segment connecting x and y. It is easy to check that all children of V satisfy the closed ball property with this replacement of bd(X) and thus do not cause any further splitting due to condition (i). Since we perform the splitting due to condition (ii) first, the approximation distance of S(X, ε), as explained below, remains within the desired bound even with this implicit modification of bd(X).
Fig. 4.5. (a) V satisfies the closed ball property (b) Three children of V do not satisfy the closed ball property.
5 Triangulation properties

The triangulation S(X, ε) has certain properties that are desirable in most applications requiring triangulations of input domains.
5.1 Hausdorff distance

The Hausdorff distance between S = ∪S(X, ε) and X is at most ε_T = (3/2)ε. First, we show that S ⊆ X + ε. For a triangle V ∈ E(X), either V ⊆ X or V ∩ bd(X) ≠ ∅. In the latter case, V ⊆ X + ε because of condition (ii) of splitting a node. Since the triangles in S(X, ε) are covered by the triangles in E(X), it follows that S ⊆ X + ε. Now consider a point x ∈ X − S. There exists a triangle V_p ∈ E(X) such that V_p ∩ bd(X) ≠ ∅ and either x ∈ V_p, or x is at a distance of at most ε/2 from some point in V_p. By condition (ii), the size of V_p is at most ε and so |px| ≤ (3/2)ε, implying that X ⊆ S + (3/2)ε.
5.2 Triangle quality

The method described in Section 3 produces only two types of triangles, viz., equilateral triangles and right angled triangles that have angles 30°, 60° and 90°. In configuration (i), only right angled triangles are created. To see this, consider the quadrilateral pqrs where p is the centroid of the larger triangle. It is a simple observation that the angle at p is 60° and the angle at the opposite vertex r is 120°. The diagonal pr dissects the quadrilateral into two isometric triangles and thus each triangle has angles 30°, 60° and 90°. A right angled triangle with these angles has aspect ratio⁵ 2. In configuration (ii), both equilateral and right angled triangles are generated. It can be shown using simple geometry that {p_i, p_{i+1}, v} is an equilateral triangle (Figure 3.2(c)) if both p_i and p_{i+1} are centroids of triangles of equal size. When these two centroids come from triangles one of whose sizes is double that of the other, the triangle {p_i, p_{i+1}, v} is a right angled triangle whose other two angles are 30° and 60°. This implies that the triangles produced by the method in Section 3 have aspect ratio 2 or 2/√3. Experiments with different domains have shown that most of the triangles are equilateral triangles.

We remark that, in general, a factor of 2^j for j ≥ 0 can be used in condition (iii). As j increases the aspect ratio increases and the size of the triangulation decreases. Considering the trade-off between the aspect ratio and the size of a triangulation, j = 1 seems to be the right choice. For j > 1 triangles may not remain non-obtuse, and for j = 0 all triangles are equilateral and have the same size, which results in a large number of triangles in the triangulation.

5.3 Delaunayhood

It can be shown that each triangle formed in configurations (i) and (ii) is Delaunay. In configuration (i), the four centroids are co-circular and this circle does not contain any other centroid or vertex of triangles in E(X). In configuration (ii), the circumcircle of {p_i, p_{i+1}, v} does not contain any other centroid or vertex of triangles in E(X). It means that S(X, ε) is a subcomplex of a Delaunay triangulation of the vertex set of S(X, ε).

⁵ The aspect ratio of a triangle is the length of the hypotenuse (longest side) divided by the length of the altitude from the hypotenuse.
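The Delaunay property asserted in Section 5.3 — no other centroid or vertex strictly inside a triangle's circumcircle — is what the standard lifted in-circle determinant tests. A generic sketch of that predicate (not code from the paper; the names are ours):

```python
def in_circle(a, b, c, d):
    """For a counter-clockwise triangle abc, return a value that is
    > 0 if d lies strictly inside the circumcircle of abc,
    = 0 if the four points are co-circular, and < 0 if d lies outside."""
    # Lift each point relative to d onto the paraboloid z = x^2 + y^2
    # and take the 3x3 determinant of the lifted difference vectors.
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = [
        (p[0]-d[0], p[1]-d[1], (p[0]-d[0])**2 + (p[1]-d[1])**2)
        for p in (a, b, c)]
    return (a1*(b2*c3 - b3*c2) - a2*(b1*c3 - b3*c1) + a3*(b1*c2 - b2*c1))

# Triangle inscribed in the unit circle (counter-clockwise).
a, b, c = (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)
print(in_circle(a, b, c, (0.0, 0.0)) > 0)    # True: centre is inside
print(in_circle(a, b, c, (0.0, -1.0)) == 0)  # True: co-circular
print(in_circle(a, b, c, (0.0, -2.0)) < 0)   # True: outside
```

Applying this test to the four co-circular centroids of configuration (i), or to the circumcircle of {p_i, p_{i+1}, v} in configuration (ii), is one way to verify the Delaunayhood claims on concrete instances.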
5.4 Size of S(X, ε)

Triangulations are usually subjected to further postprocessing in most applications. The larger the size of the triangulation, the more time the postprocessing requires. As a result, attempts are made to keep the size of a triangulation as small as possible. In general, finding an optimal triangulation having the minimum number of triangles and satisfying a set of constraints seems hard. Many triangulation methods guarantee an output within a constant factor of the optimum [2, 13] when the input domain is a polygon.
Fig. 5.6. Approximating with a single triangle.

Achieving such a bound for general domains in R² eludes us. For example, the domain in Figure 5.6 can be approximated with a single triangle within a reasonably small tolerance, whereas the topological triangulation method described in this paper may produce many more triangles due to certain placements of the Δ-tree. Still, we can show that our method produces a triangulation whose size is not too far from the optimal under certain restrictions. We say that ε_T satisfies the locality property if the following two conditions hold. Recall that ε = (2/3)ε_T.

1. For any triangle σ with size ε or less, σ ∩ bd(X) has length at most O(ℓ(σ)).
2. Any triangle with size ε or less satisfies the closed ball property with respect to X with an ε/2-perturbation of X.

Let B(ε) denote the strip obtained by taking the union of all geometric discs of radius ε centered at points on bd(X). A triangle of a triangulation is a boundary triangle if it has non-empty intersection with the boundary of the underlying space of the triangulation. A triangulation of X is boundary restricted if all its boundary triangles lie within B(ε) and bd(X) is covered by its boundary triangles. Let T(X, ε) denote an optimal triangulation with respect to triangulation size, where the minimum is taken over all boundary restricted triangulations with bounded aspect ratio⁶. We show that the algorithm in Section 3 produces a boundary restricted triangulation which is nearly optimal when ε_T satisfies the locality property.

⁶ A triangulation T has aspect ratio c if no triangle in T has aspect ratio greater than c. A triangulation T has bounded aspect ratio if there exists a constant c such that T has aspect ratio c.
First we prove the following useful lemmas for the triangulation S(X, ε). In the following C, with and without subscripts, denotes suitable constants, and L denotes the Euclidean length of bd(X).

Lemma 2. For any ε_T satisfying the locality property, the triangles in E(X) corresponding to the leaves of E_X have size at least ε.

Proof. Suppose that there are triangles in E(X) with size smaller than ε. Let V_p be the first node in the Δ-tree with size smaller than ε. Then V_p cannot have been generated by a split due to condition (iii). The size of V_p must be ε/2 and so the size of the parent V_q of V_p must be ε. V_q satisfies condition (ii) and, since ε_T satisfies the locality property, V_q could not have been split due to condition (i). This contradicts the existence of V_p.

Lemma 3. The depth of any leaf of E_X is at most C log(L/ε) for some constant C.

Proof. By Lemma 2, the area of a leaf triangle V_p is at least C₁ε² for some constant C₁. The area of a parent in the Δ-tree is four times that of any of its children. Since the triangle corresponding to the root of the Δ-tree has area O(L²), the depth of a leaf cannot be larger than log₄(C₂L²/(C₁ε²)), and the bound follows.
Lemma 4. For any ε_T satisfying the locality property, |S(X, ε)| ≤ C k log²(L/ε) for some constant C, where k is the number of boundary triangles in S(X, ε).

Proof. We note that |S(X, ε)| = Θ(|E(X)|) and so it suffices to count the number of triangles in E(X). The number of nodes in E_X at any level⁷ intersecting bd(X) is at most k. Since these nodes have three siblings, there are at most 4k nodes created at any level due to condition (i) or (ii). Let h be the height of E_X. Thus the number of nodes in E_X created due to condition (i) or (ii) is at most 4kh. We claim, without giving details, that the splitting of a node due to condition (i) or (ii) causes at most 3h nodes to split to satisfy condition (iii). Thus the number of nodes in E_X generated due to condition (iii) is at most 4kh · (3h · 4) = 48kh², since four nodes are generated when a node splits. The bound on E(X) then follows from the bound on h from Lemma 3.

⁷ We adopt the convention that the root has level 0 and a child has a level one greater than that of its parent.

Lemma 5. For any ε_T satisfying the locality property, |T(X, ε)| ≥ C(L/ε + κ(X)) for some constant C, where κ(X) is the number of components of bd(X).

Proof. Let σ ∈ T(X, ε) be any boundary triangle. Since σ lies inside B(ε), its inscribed circle has a radius less than ε. Also, the aspect ratio of σ is bounded by a constant, whence it follows that the size of σ is at most C₁ε. One can easily show that σ covers at most C₂ε of bd(X). To see this, divide σ into four similar, congruent triangles and carry on this subdivision successively on the resulting smaller triangles until the triangles have a size smaller than ε. At most O(C₁²) triangles are created. Because ε_T satisfies the locality property, the length of bd(X) contained in any small triangle is O(ε). This means that at least L/(C₂ε) boundary triangles are needed to account for the entire boundary of X. Therefore, |T(X, ε)| ≥ C(L/ε + κ(X)), since at least Ω(κ(X)) triangles are needed to account for all components of bd(X).
Theorem 6. For any ε_T satisfying the locality property, |S(X, ε)| ≤ C |T(X, ε)| log² |T(X, ε)| for some constant C.

Proof. Let k be the number of boundary triangles in S(X, ε) and let E′(X) denote the collection of all triangles V_p ∈ E(X) such that V_p ∩ bd(X) ≠ ∅. We note that k = Θ(|E′(X)|). Let κ(X) denote the number of components of bd(X). We show that |E′(X)| is at most C₁(L/ε + κ(X)). The required bound then follows from Lemma 4 and Lemma 5.

Consider a triangle V_p ∈ E′(X). Let u be the unique vertex of V_p that lies on the side of bd(X) opposite to the side containing the other two vertices of V_p. We call u the cap vertex of V_p. Let v be another cap vertex connected to u. If no such v exists, then a component of bd(X) surrounds u and u is incident to at most six triangles; see Figure 5.7(a). If v exists, then we claim that the two triangles V_{p1} and V_{p2} incident to uv cover at least (3/4)ε length of bd(X). Let V_{p1} denote the triangle whose two edges incident to u are intersected by bd(X). Let bd(X) intersect uv at y, the other edge of V_{p1} incident to u at x, and the other edge of V_{p2} incident to v at z; see Figure 5.7(b). It is easy to prove using the sine law that |uy| ≤ (2/√3)|xy| and |vy| ≤ (2/√3)|yz|, whence it follows that |uv| ≤ (4/3)|xz|. Since all triangles of E(X) are of size no smaller than ε (by Lemma 2), |uv| ≥ ε, implying that |xz| ≥ (3/4)ε. Hence, any two triangles incident to two adjacent cap vertices cover at least (3/4)ε length of bd(X). It means that at most (8/3)(L/ε) + κ(X) cap vertices are present. The bound on |E′(X)| follows since at most six triangles can be incident to any cap vertex.
Fig. 5.7. (a) A cap vertex surrounded by a component of bd(X). (b) u and v are cap vertices.
5.5 Quadrangulations

In finite element methods, quadrilaterals are often used as basic elements. This requires a quadrangulation of an input domain. Most of the existing algorithms convert triangulations to quadrangulations [12]. Our method can provide a quadrangulation approximating the input domain directly from E(X). Configuration (i) already produces convex quadrilaterals. In configuration (ii), we add one of the diagonals of P_v to break it up into two convex quadrilaterals. We choose the diagonal that produces the most symmetric quadrilaterals. Figure A.3 (see appendix) shows quadrangulations with two different approximation tolerances. It is not hard to show that the quadrilaterals have good aspect ratio, i.e., the ratio of the longest dimension to the shortest dimension is bounded by a constant.
6 Experimental results and discussion

We implemented the algorithm of Section 3 in C on a Unix platform. It is our experience that the perturbation method decreases the number of triangles considerably as we lower the value of ε_T. Figure A.2 (see appendix) shows triangulations of a domain with three different values of the tolerance. As expected, the number of triangles increases as we decrease the tolerance. The numbers of triangles in the three triangulations are 370, 1258 and 5102. Quadrangulations of the same domain are shown in Figure A.3 (see appendix) for two different tolerances. This approximation scheme extends to three dimensions. Topological triangulation with a cubic grid is dealt with in [14]. It needs to be investigated whether the quality of the tetrahedral elements can be improved with some other grid. An implementation of the method in R³ is expected to face other challenges. Currently we are in the process of implementing the method using a cubic grid.
References

1. T. Baker. Automatic mesh generation for complex three-dimensional regions using constrained Delaunay triangulation. Engineering with Computers, vol. 5, 1989, 161-175.
2. M. Bern, D. Eppstein and J. Gilbert. Provably good mesh generation. Proc. 31st IEEE Sympos. Found. Comput. Sci., 1990, 231-241.
3. M. Bern and D. Eppstein. Mesh generation and optimal triangulation. In Computing in Euclidean Geometry, D.-Z. Du and F. K. Hwang (eds.), World Scientific, 1992.
4. M. Bern, D. Dobkin and D. Eppstein. Triangulating polygons without large angles. Proc. 8th Ann. Sympos. Comput. Geom., 1992, 222-231.
5. M. Bern, S. A. Mitchell and J. Ruppert. Linear-size nonobtuse triangulation of polygons. Proc. 10th Ann. Sympos. Comput. Geom., 1994, 221-230.
6. L. P. Chew. Guaranteed quality triangular meshes. Tech. Report TR89-983, Dept. of Comput. Sci., Cornell Univ., Ithaca, NY, 1989.
7. H. Edelsbrunner and N. R. Shah. Triangulating topological spaces. Proc. 10th Ann. Sympos. Comput. Geom., 1994, 285-292.
8. J. D. Foley, A. van Dam, S. K. Feiner, et al. Computer Graphics: Principles and Practice. Addison-Wesley, Reading, MA, 1990.
9. P. S. Heckbert and M. Garland. Multiresolution modeling for fast rendering. Proc. Graphics Interface, 1994, 43-50.
10. C. M. Hoffmann. Geometric and Solid Modeling: An Introduction. Morgan Kaufmann Publishers, Inc., 1989.
11. P. M. Hubbard. Approximating polyhedra with spheres for time-critical collision detection. ACM Trans. Graphics, vol. 15, 1996, 179-210.
12. S. Ramaswami, P. Ramos and G. Toussaint. Converting triangulations to quadrangulations. Proc. 7th Canad. Conf. Comput. Geom., 1995, 297-302.
13. J. Ruppert. A Delaunay refinement algorithm for quality 2-dimensional mesh generation. Manuscript, 1993.
14. N. R. Shah. Homeomorphic meshes in R³. Proc. 7th Canad. Conf. Comput. Geom., 1995, 25-30.
Appendix
Fig. A.1. The triangulation on the left is without perturbation and the triangulation on the right is with perturbation.
Fig. A.2. A domain and its triangulation for three tolerances. The number of triangles grows as the tolerance decreases.
Fig. A.3. Quadrangulation of the domain in Figure A.2 for two tolerances. The number of quadrilaterals grows as the tolerance decreases.
Compilation and Equivalence of Imperative Objects

A.D. Gordon¹, P.D. Hankin¹, and S.B. Lassen²

¹ Computer Laboratory, University of Cambridge
² BRICS, Computer Science Department, University of Aarhus
Abstract. We adopt the untyped imperative object calculus of Abadi and Cardelli as a minimal setting in which to study problems of compilation and program equivalence that arise when compiling object-oriented languages. Our main result is a direct proof, via a small-step unloading machine, of the correctness of compilation to a closure-based abstract machine. Our second result is that contextual equivalence of objects coincides with a form of Mason and Talcott's CIU equivalence; the latter provides a tractable means of establishing operational equivalences. Finally, we prove correct an algorithm, used in our prototype compiler, for statically resolving method offsets. This is the first study of correctness of an object-oriented abstract machine, and of CIU equivalence for an object-oriented language.
1 Motivation
This paper collates and extends a variety of operational techniques for describing and reasoning about programming languages and their implementation. We focus on the implementation of imperative object-oriented programs. The language we describe is essentially the untyped imperative object calculus of Abadi and Cardelli [1-3], a small but extremely rich language that directly accommodates object-oriented, imperative and functional programming styles. Abadi and Cardelli invented the calculus to serve as a foundation for understanding object-oriented programming; in particular, they use the calculus to develop a range of increasingly sophisticated type systems for object-oriented programming. We have implemented the calculus as part of a broader project to investigate concurrent object-oriented languages. This paper develops formal foundations and verification methods to document and better understand various aspects of our implementation. Our work recasts techniques originating in studies of the λ-calculus in the setting of the imperative object calculus. In particular, our reduction relation for the object calculus, our design of an object-oriented abstract machine, our compiler correctness proof and our notion of program equivalence are all based on earlier studies of the λ-calculus. This paper is the first application of these techniques to an object calculus and shows they may easily be re-used in an object-oriented setting. Our system compiles the imperative object calculus to bytecodes for an abstract machine, implemented in C, based on the ZAM of Leroy's CAML Light
[16]. A type-checker enforces the system of primitive self types of Abadi and Cardelli. Since the results of the paper are independent of this type system, we will say no more about it. In Section 2 we present the imperative object calculus together with a small-step substitution-based operational semantics. Section 3 gives a formal description of an object-oriented abstract machine, a simplification of the machine used in our implementation. We present a compiler from the object calculus to instructions for the abstract machine. We prove the compiler correct by adapting a proof of Rittri [23] to cope with state and objects. In Section 4, we develop a theory of operational equivalence for the imperative object calculus, based on the CIU equivalence of Mason and Talcott [18]. We establish useful equivalence laws and prove that CIU equivalence coincides with Morris-style contextual equivalence [20]. In Section 5, we exercise operational equivalence by specifying and verifying a simple optimisation that resolves at compile-time certain method labels to integer offsets. We discuss related work at the ends of Sections 3, 4 and 5. Finally, we review the contributions of the paper in Section 6. The full version of this paper, with proofs, is available as a technical report [9].

2 An Imperative Object Calculus
We begin with the syntax of an untyped imperative object calculus, the impς-calculus of Abadi and Cardelli [3] augmented to include store locations as terms. Let x, y, and z range over an infinite collection of variables. Let ι range over an infinite collection of locations, the addresses of objects in the store. The set of terms of the calculus is given as follows:

a, b ::=                              term
    x                                 variable
    ι                                 location
    [ℓᵢ = ς(xᵢ)bᵢ  i∈1..n]            object (ℓᵢ distinct)
    a.ℓ                               method selection
    a.ℓ ⇐ ς(x)b                       method update
    clone(a)                          cloning
    let x = a in b                    let
Informally, when an object is created, it is put at a fresh location, ι, in the store, and referenced thereafter by ι. Method selection runs the body of the method with the self parameter (the x in ς(x)b) bound to the location of the object containing the method. Method update allows an existing method in a stored object to be updated. Cloning makes a fresh copy of an object in the store at a new location. The reader unfamiliar with object calculi is encouraged to consult the book of Abadi and Cardelli [3] for many examples and a discussion of the design choices that led to this calculus. Here are the scoping rules for variables: in a method ς(x)b, variable x is bound in b; in let x = a in b, variable x is bound in b. If φ is a phrase of syntax we write fv(φ) for the set of variables that occur free in φ. We say phrase φ is
76 closed if fv(r = O. We write r162 for the substitution of phrase r for each free occurrence of variable x in phrase r We identify all phrases of syntax up to alpha-conversion; hence a = b, for instance, means that we can obtain term b from term a by systematic renaming of bound variables. Let o range over objects, terms of the form s = ~(xi)bi iet..,~. In general, the notation r iel..n means
r
Cn.
Unlike Abadi and Cardelli, we do not identify objects up to re-ordering of methods, since the order of methods in an object is important for an algorithm we present in Section 5 for statically resolving method offsets. Moreover, we include locations in the syntax of terms. This is so we may express the dynamic behaviour of the calculus using a substitution-based operational semantics. In Abadi and Cardelli's closure-based semantics, locations appear only in closures and not in terms. If φ is a phrase of syntax, let locs(φ) be the set of locations that occur in φ. Let a term a be a static term if locs(a) = ∅. The static terms correspond to the source syntax accepted by our compiler. Terms containing locations arise during reduction. As an example of programming in the imperative object calculus, here is an encoding of the call-by-value λ-calculus:
λ(x)b ≜ [arg = ς(z)z.arg, val = ς(s)let x = s.arg in b]
b(a) ≜ let y = a in (b.arg ⇐ ς(z)y).val

where y ≠ z, and s and y do not occur free in b. It is like an encoding from Abadi and Cardelli's book but with right-to-left evaluation of function application. Given updateable methods, we can easily extend this encoding to express an ML-style call-by-value λ-calculus with updateable references. Before proceeding with the formal semantics for the calculus, we fix notation for finite lists and finite maps. We write finite lists in the form φ1, . . . , φn, which we usually write as φi i∈1..n. Let φ :: φi i∈1..n = φ, φ1, . . . , φn. Let φi i∈1..m @ ψj j∈1..n = φ1, . . . , φm, ψ1, . . . , ψn.
Let a finite map, f, be a list of the form xi ↦ φi i∈1..n, where the xi are distinct. When f = xi ↦ φi i∈1..n is a finite map, let dom(f) = {xi i∈1..n}. For the finite map f = f′ @ x ↦ φ @ f″, let f(x) = φ. When f and g are finite maps, let the map f + (x ↦ φ) be f′ @ x ↦ φ @ f″ if f = f′ @ x ↦ φ′ @ f″, and otherwise (x ↦ φ) :: f. Now we specify a small-step substitution-based operational semantics for the calculus [8,18]. Let a store, σ, be a finite map ιi ↦ oi i∈1..n from locations to objects. Each stored object consists of a collection of labelled methods. The methods may be updated individually. Abadi and Cardelli use a method store, a finite map from locations to methods, in their operational semantics of imperative objects. We prefer to use an object store, as it explicitly represents the grouping of methods in objects. Let a configuration, c or d, be a pair (a, σ) where a is a term and σ is a store. Let a reduction context, R, be a term given by the following grammar, with one free occurrence of a distinguished hole, •:

R ::= • | R.ℓ | R.ℓ ⇐ ς(x)b | clone(R) | let x = R in b
We write R[a] for the outcome of filling the single occurrence of the hole • in a reduction context R with the term a. Let the small-step substitution-based reduction relation, c → d, be the smallest relation satisfying the following, where in each rule the hole in the reduction context R represents 'the point of execution'.

(Red Object) (R[o], σ) → (R[ι], σ′) if σ′ = (ι ↦ o) :: σ and ι ∉ dom(σ).

(Red Select) (R[ι.ℓj], σ) → (R[bj{ι/xj}], σ) if σ(ι) = [ℓi = ς(xi)bi i∈1..n] and j ∈ 1..n.

(Red Update) (R[ι.ℓj ⇐ ς(x)b], σ) → (R[ι], σ′) if σ(ι) = [ℓi = ς(xi)bi i∈1..n], j ∈ 1..n, and σ′ = σ + (ι ↦ [ℓi = ς(xi)bi i∈1..j−1, ℓj = ς(x)b, ℓi = ς(xi)bi i∈j+1..n]).

(Red Clone) (R[clone(ι)], σ) → (R[ι′], σ′) if σ(ι) = o, σ′ = (ι′ ↦ o) :: σ and ι′ ∉ dom(σ).

(Red Let) (R[let x = ι in b], σ) → (R[b{ι/x}], σ).

Let a store σ be well formed if and only if fv(σ(ι)) = ∅ and locs(σ(ι)) ⊆ dom(σ) for each ι ∈ dom(σ). Let a configuration (a, σ) be well formed if and only if fv(a) = ∅, locs(a) ⊆ dom(σ) and σ is well formed. A routine case analysis shows that reduction sends a well formed configuration to a well formed configuration, and that reduction is deterministic up to the choice of freshly allocated locations in the rules for object formation and cloning. Let a configuration c be terminal if and only if there is a store σ and a location ι such that c = (ι, σ). We say a configuration c converges to d, written c ⇓ d, if and only if d is a terminal configuration and c →* d. Because reduction is deterministic, whenever c ⇓ d and c is well formed, the configuration d is unique up to the renaming of any newly generated locations in the store component of d. Abadi and Cardelli define a big-step closure-based operational semantics for the calculus: it relates a configuration directly to the final outcome of taking many individual steps of computation, and it uses closures, rather than a substitution primitive, to link variables to their values. We find the small-step substitution-based semantics better suited for the proofs in Sections 3 and 5 as well as for developing the theory of operational equivalence in Section 4. We have proved, using an inductively defined relation unloading closures to terms, that our semantics is consistent with theirs in the following sense:

Proposition 1. For any closed static term a, there is d such that (a, ∅) ⇓ d if and only if evaluation of a converges in Abadi and Cardelli's semantics.
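The reduction rules can be animated as a small evaluator. The sketch below is our own illustration (not the authors' code), using a hypothetical tuple encoding of terms; Python recursion on subterms plays the role of the reduction context R, and the store is a dict from integer locations to tuples of methods, so a closed static term is driven to a terminal configuration (ι, σ).

```python
# Tuple encoding (hypothetical): ("var", x), ("loc", i),
# ("obj", ((l, x, b), ...)), ("sel", a, l), ("upd", a, l, x, b),
# ("clone", a), ("let", x, a, b).

def subst(t, x, loc):
    """t{loc/x}: substitute the location term loc for free x."""
    tag = t[0]
    if tag == "var":
        return loc if t[1] == x else t
    if tag == "loc":
        return t
    if tag == "obj":
        return ("obj", tuple((l, s, b if s == x else subst(b, x, loc))
                             for (l, s, b) in t[1]))
    if tag == "sel":
        return ("sel", subst(t[1], x, loc), t[2])
    if tag == "upd":
        b = t[4] if t[3] == x else subst(t[4], x, loc)
        return ("upd", subst(t[1], x, loc), t[2], t[3], b)
    if tag == "clone":
        return ("clone", subst(t[1], x, loc))
    if tag == "let":
        b = t[3] if t[1] == x else subst(t[3], x, loc)
        return ("let", t[1], subst(t[2], x, loc), b)
    raise ValueError(t)

def eval_(t, store):
    """Evaluate closed term t in store (dict loc -> method tuple);
    return the resulting location.  Assumes selected labels exist."""
    tag = t[0]
    if tag == "loc":
        return t[1]
    if tag == "obj":                 # (Red Object): allocate a fresh location
        loc = len(store)
        store[loc] = t[1]
        return loc
    if tag == "sel":                 # (Red Select): run body with self bound
        loc = eval_(t[1], store)
        for (l, s, b) in store[loc]:
            if l == t[2]:
                return eval_(subst(b, s, ("loc", loc)), store)
        raise KeyError(t[2])
    if tag == "upd":                 # (Red Update): replace the method in place
        loc = eval_(t[1], store)
        store[loc] = tuple((l, s, b) if l != t[2] else (t[2], t[3], t[4])
                           for (l, s, b) in store[loc])
        return loc
    if tag == "clone":               # (Red Clone): copy the object afresh
        loc = eval_(t[1], store)
        new = len(store)
        store[new] = store[loc]
        return new
    if tag == "let":                 # (Red Let)
        loc = eval_(t[2], store)
        return eval_(subst(t[3], t[1], ("loc", loc)), store)
    raise ValueError(t)
```

For instance, selecting a method that returns self yields the location at which the object was allocated, and cloning produces a second location bound to an identical method list.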
3 Compilation to an Object-Oriented Abstract Machine
In this section we present an abstract machine for imperative objects, a compiler sending the object calculus to the instruction set of the abstract machine and a proof of correctness. The proof depends on an unloading procedure which converts configurations of the abstract machine back into configurations of the
object calculus from Section 2. The unloading procedure depends on a modified abstract machine whose accumulator and environment contain object calculus terms as well as locations. The instruction set of our abstract machine consists of the operations, ranged over by op, given as follows: access i, object(ℓi, opsi) i∈1..n (ℓi distinct), select ℓ, update(ℓ, ops) or let(ops), where ops ranges over operation lists. We represent compilation of a term a to an operation list ops by the judgment xs ⊢ a ⇒ ops, defined by the following rules. The variable list xs includes all the free variables of a; it is needed to compute the de Bruijn index of each variable occurring in a.
(Trans Var) xi i∈1..n ⊢ xj ⇒ access j if j ∈ 1..n.

(Trans Object) xs ⊢ [ℓi = ς(yi)ai i∈1..n] ⇒ object(ℓi, opsi) i∈1..n if yi :: xs ⊢ ai ⇒ opsi and yi ∉ xs for all i ∈ 1..n.

(Trans Select) xs ⊢ a.ℓ ⇒ ops@select ℓ if xs ⊢ a ⇒ ops.

(Trans Update) xs ⊢ (a.ℓ ⇐ ς(x)a′) ⇒ ops@update(ℓ, ops′) if xs ⊢ a ⇒ ops and x :: xs ⊢ a′ ⇒ ops′ and x ∉ xs.

(Trans Clone) xs ⊢ clone(a) ⇒ ops@clone if xs ⊢ a ⇒ ops.

(Trans Let) xs ⊢ let x = a in a′ ⇒ ops@let(ops′) if xs ⊢ a ⇒ ops and x :: xs ⊢ a′ ⇒ ops′ and x ∉ xs.

An abstract machine configuration, C or D, is a pair (P, Σ), where P is a state and Σ is a store, given as follows:

P, Q ::= (ops, E, AC, RS)      machine state
E ::= ιi i∈1..n                environment
AC ::= ◊ | ι                   accumulator
RS ::= Fi i∈1..n               return stack
F ::= (ops, E)                 closure
O ::= (ℓi, Fi) i∈1..n          stored object (ℓi distinct)
Σ ::= ιi ↦ Oi i∈1..n           store (ιi distinct)
In a configuration ((ops, E, AC, RS), Σ), ops is the current program. Environment E contains variable bindings. Accumulator AC either holds the result of evaluating a term, AC = ι, or a dummy value, AC = ◊. Return stack RS holds return addresses during method invocations. Store Σ associates locations with objects. Two transition relations, given next, represent execution of the abstract machine. A β-transition, P →β Q, corresponds directly to a reduction in the object calculus. A τ-transition, P →τ Q, is an internal step of the abstract machine, either a method return or a variable lookup. Lemma 3 relates reductions of the object calculus and transitions of the abstract machine.

(τ Return) ((∅, E, AC, (ops, E′) :: RS), Σ) →τ ((ops, E′, AC, RS), Σ).

(τ Access) ((access j :: ops, E, ◊, RS), Σ) →τ ((ops, E, ιj, RS), Σ) if E = ιi i∈1..n and j ∈ 1..n.

(β Clone) ((clone :: ops, E, ι, RS), Σ) →β ((ops, E, ι′, RS), Σ′) if Σ(ι) = O, Σ′ = (ι′ ↦ O) :: Σ and ι′ ∉ dom(Σ).

(β Object) ((object(ℓi, opsi) i∈1..n :: ops, E, ◊, RS), Σ) →β ((ops, E, ι, RS), (ι ↦ (ℓi, (opsi, E)) i∈1..n) :: Σ) if ι ∉ dom(Σ).

(β Select) ((select ℓj :: ops, E, ι, RS), Σ) →β ((opsj, ι :: Ej, ◊, (ops, E) :: RS), Σ) if Σ(ι) = (ℓi, (opsi, Ei)) i∈1..n and j ∈ 1..n.

(β Update) ((update(ℓ, ops′) :: ops, E, ι, RS), Σ) →β ((ops, E, ι, RS), Σ′) if Σ(ι) = O@(ℓ, F)@O′ and Σ′ = Σ + (ι ↦ O@(ℓ, (ops′, E))@O′).

(β Let) ((let(ops′) :: ops, E, ι, RS), Σ) →β ((ops′, ι :: E, ◊, (ops, E) :: RS), Σ).
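The transitions can likewise be read off as a driver loop. The sketch below is our own illustration with a hypothetical instruction encoding (("access", j), ("object", ((l, ops), ...)), ("select", l), ("update", l, ops), ("clone",), ("let", ops)); None stands for the empty accumulator ◊, fresh locations are consecutive integers, and the store maps locations to tuples of (label, closure) pairs, where a closure is (ops, env).

```python
def run(ops, store):
    """Run an instruction list to completion, returning the final
    accumulator (a location).  Assumes well-compiled input, so the
    accumulator-emptiness tests of the rules are omitted."""
    env, acc, rs = [], None, []
    ops = list(ops)
    while ops or rs:
        if not ops:                          # (tau Return)
            (ops, env), rs = rs[0], rs[1:]
            ops = list(ops)
            continue
        ins, ops = ops[0], ops[1:]
        if ins[0] == "access":               # (tau Access): env lookup
            acc = env[ins[1] - 1]
        elif ins[0] == "object":             # (beta Object): close bodies over env
            loc = len(store)
            store[loc] = tuple((l, (body, tuple(env))) for (l, body) in ins[1])
            acc = loc
        elif ins[0] == "select":             # (beta Select): push return frame
            body, e = dict(store[acc])[ins[1]]
            rs = [(tuple(ops), env)] + rs
            env, ops, acc = [acc] + list(e), list(body), None
        elif ins[0] == "update":             # (beta Update): swap in new closure
            store[acc] = tuple((l, f) if l != ins[1] else (l, (ins[2], tuple(env)))
                               for (l, f) in store[acc])
        elif ins[0] == "clone":              # (beta Clone): copy stored object
            loc = len(store)
            store[loc] = store[acc]
            acc = loc
        elif ins[0] == "let":                # (beta Let): run body, then return
            rs = [(tuple(ops), env)] + rs
            env, ops, acc = [acc] + env, list(ins[1]), None
    return acc
```

Running the code produced for `let x = [] in x`, or for a select of a method returning self, yields the location of the allocated object.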
Each rule apart from the first tests whether the accumulator is empty or not. We can show that this test is always redundant when running code generated by our compiler. In the machine of the full version of this paper [9], we replace the accumulator with an argument stack, a list of values. To prove the abstract machine and compiler correct, we need to convert back from a machine state to an object calculus term. To do so, we load the state into a modified abstract machine, the unloading machine, and when this unloading machine terminates, its accumulator contains the term corresponding to the original machine state. The unloading machine is like the abstract machine, except that instead of executing each instruction, it reconstructs the corresponding source term. Since no store lookups or updates are performed, the unloading machine does not act on a store. An unloading machine state is like an abstract machine state, except that locations are generalised to arbitrary terms. Let an unloading machine state, p or q, be a quadruple (ops, e, ac, RS) where e takes the form ai i∈1..n and ac takes the form ◊ or a. Next we make a simultaneous inductive definition of a u-transition relation, p →u q, and an unloading relation, (ops, e) ⇝ (x)b, that unloads a closure to a method.

(u Access) (access j :: ops′, e, ◊, RS) →u (ops′, e, aj, RS) if j ∈ 1..n and e = ai i∈1..n.

(u Object) (object(ℓi, opsi) i∈1..n :: ops′, e, ◊, RS) →u (ops′, e, [ℓi = ς(xi)bi i∈1..n], RS) if (opsi, e) ⇝ (xi)bi for each i ∈ 1..n.

(u Clone) (clone :: ops′, e, a, RS) →u (ops′, e, clone(a), RS).

(u Select) (select ℓ :: ops′, e, a, RS) →u (ops′, e, a.ℓ, RS).

(u Update) (update(ℓ, ops) :: ops′, e, a, RS) →u (ops′, e, a.ℓ ⇐ ς(x)b, RS) if (ops, e) ⇝ (x)b.

(u Let) (let(ops′) :: ops″, e, a, RS) →u (ops″, e, let x = a in b, RS) if (ops′, e) ⇝ (x)b.

(u Return) (∅, e, ac, (ops, E) :: RS) →u (ops, E, ac, RS).

(Unload Closure) (ops, e) ⇝ (x)b if x ∉ fv(e) and (ops, x :: e, ◊, ∅) →u* (∅, e′, b, ∅).
We complete the machine with the following unloading relations: O ⇝ o (on objects), Σ ⇝ σ (on stores) and C ⇝ c (on configurations).

(Unload Object) (ℓi, (opsi, Ei)) i∈1..n ⇝ [ℓi = ς(xi)bi i∈1..n] if (opsi, Ei) ⇝ (xi)bi for all i ∈ 1..n.

(Unload Store) ιi ↦ Oi i∈1..n ⇝ ιi ↦ oi i∈1..n if Oi ⇝ oi for all i ∈ 1..n.

(Unload Config) ((ops, E, AC, RS), Σ) ⇝ (a, σ) if Σ ⇝ σ and (ops, E, AC, RS) →u* (∅, e′, a, ∅).

We can prove the following:

Lemma 2. Whenever ∅ ⊢ a ⇒ ops then ((ops, ∅, ◊, ∅), ∅) ⇝ (a, ∅).

Lemma 3. (1) If C ⇝ c and C →τ D then D ⇝ c. (2) If C ⇝ c and C →β D then there is d such that D ⇝ d and c → d.

Let a big-step transition relation, C ⇓ D, on machine states hold if and only if there are ι, E, Σ with D = ((∅, E, ι, ∅), Σ) and C (→τ ∪ →β)* D.

Lemma 4. (1) If C ⇝ c and C ⇓ D then there is d with D ⇝ d and c ⇓ d. (2) If C ⇝ c and c ⇓ d then there is D with D ⇝ d and C ⇓ D.

Theorem 5. Whenever ∅ ⊢ a ⇒ ops, for all d, (a, ∅) ⇓ d if and only if there is D with ((ops, ∅, ◊, ∅), ∅) ⇓ D and D ⇝ d.

Proof. By Lemma 2 we have ((ops, ∅, ◊, ∅), ∅) ⇝ (a, ∅). Suppose (a, ∅) ⇓ d. By Lemma 4(2), ((ops, ∅, ◊, ∅), ∅) ⇝ (a, ∅) and (a, ∅) ⇓ d imply there is D with D ⇝ d and ((ops, ∅, ◊, ∅), ∅) ⇓ D. Conversely, suppose ((ops, ∅, ◊, ∅), ∅) ⇓ D for some D. By Lemma 4(1), ((ops, ∅, ◊, ∅), ∅) ⇝ (a, ∅) and ((ops, ∅, ◊, ∅), ∅) ⇓ D imply there is d with D ⇝ d and (a, ∅) ⇓ d. □

In the full version of this paper [9], we prove correct a richer machine, based on the machine used in our implementation, that supports functions as well as objects. The full machine has a larger instruction set than the one presented here, needs a more complex compiler and has an argument stack instead of an accumulator. The correctness proof is similar to the one for the machine presented here.
There is a large literature on proofs of interpreters based on abstract machines, such as Landin's SECD machine [12,22,25]. Since no compiled machine code is involved, unloading such abstract machines is easier than unloading an abstract machine based on compiled code. The VLISP project [11], using denotational semantics as a metalanguage, is the most ambitious verification to date of a compiler-based abstract machine. Other work on compilers deploys metalanguages such as calculi of explicit substitutions [13] or process calculi [28]. Rather than introduce a metalanguage, we prove correctness of our abstract machine directly from its operational semantics. We adopted Rittri's idea [23] of unloading a machine state to a term via a specialised unloading machine. Our proof is simpler than Rittri's, and goes beyond it by dealing with state and objects. Even in the full version of the paper there are differences, of course, between our formal model of the abstract machine and our actual implementation. One difference is that we have modelled programs as finitely branching trees, whereas in the implementation programs are tables of bytecodes indexed by a program counter. Another difference is that our model omits garbage collection, which is essential to the implementation. Therefore Theorem 5 only implies that the compilation strategy is correct; bugs may remain in its implementation.

4 Operational Equivalence of Imperative Objects
The standard operational definition of term equivalence is Morris-style contextual equivalence [20]: two terms are equivalent if and only if they are interchangeable in any program context without any observable difference; the observations are typically the programs' termination behaviour. Contextual equivalence is the largest congruence relation that distinguishes observably different programs. Mason and Talcott [18] prove that, for functional languages with state, contextual equivalence coincides with so-called CIU ("Closed Instances of Use") equivalence. Informally, two terms are CIU equivalent if and only if they have identical termination behaviour when placed in the redex position in an arbitrary configuration and locations are substituted for the free variables. Although contextual equivalence and CIU equivalence are the same relation, the definition of the latter is typically easier to work with in proofs. In this section we adopt CIU equivalence as our notion of operational equivalence for imperative objects. We establish a variety of laws of equivalence. We show that operational equivalence is a congruence, and hence supports compositional equational reasoning. Finally, we prove that CIU equivalence coincides with contextual equivalence, as in Mason and Talcott's setting. We define static terms a and a′ to be operationally equivalent, a ≈ a′, if, for all variables x1, . . . , xn, all static reduction contexts R with fv(R[a]) ∪ fv(R[a′]) ⊆ {x1, . . . , xn}, all well formed stores σ, and all locations ι1, . . . , ιn ∈ dom(σ), we have that the configurations (R[a]{ιi/xi i∈1..n}, σ) and (R[a′]{ιi/xi i∈1..n}, σ) either both converge or both do not converge. It follows easily from the definition of operational equivalence that it is an equivalence relation on static terms and, moreover, that it is preserved by static
reduction contexts:

(≈ Cong R) if a ≈ a′ and locs(R) = ∅ then R[a] ≈ R[a′]

From the definition of operational equivalence, it is possible to show a multitude of equational laws for the constructs of the calculus. For instance, the let construct satisfies laws corresponding to those of Moggi's computational λ-calculus [19], presented here in the form given by Talcott [27].
Proposition 6. (1) (let x = y in b) ≈ b{y/x}. (2) (let x = a in R[x]) ≈ R[a], if x ∉ fv(R).

The effect of invoking a method that has just been updated is the same as running the method body of the update with the self parameter bound to the updated object.
Proposition 7. (a.ℓ ⇐ ς(x)b).ℓ ≈ (let x = (a.ℓ ⇐ ς(x)b) in b)
The following laws characterise object constants and their interaction with the other constructs of the calculus.
Proposition 8. Suppose o = [ℓi = ς(xi)bi i∈1..n] and j ∈ 1..n.
(1) o.ℓj ≈ (let xj = o in bj)
(2) (o.ℓj ⇐ ς(x)b) ≈ [ℓi = ς(xi)bi i∈1..j−1, ℓj = ς(x)b, ℓi = ς(xi)bi i∈j+1..n]
(3) clone(o) ≈ o
(4) (let x = o in R[clone(x)]) ≈ (let x = o in R[o]), if x ∉ fv(o)
(5) (let x = o in b) ≈ b, if x ∉ fv(b)
(6) (let x = a in let y = o in b) ≈ (let y = o in let x = a in b), if x ∉ fv(o) and y ∉ fv(a)
It is also possible to give equational laws for updating and cloning, but we omit the details. Instead, let us look at an example of equational reasoning using the laws above. Recall the encoding of call-by-value functions from Section 2.

λ(x)b ≜ [arg = ς(z)z.arg, val = ς(s)let x = s.arg in b]
b(a) ≜ let y = a in (b.arg ⇐ ς(z)y).val

From the laws for let and for object constants, the following calculation shows the validity of βv-reduction, (λ(x)b)(y) ≈ b{y/x}. Let o = [arg = ς(z)y, val = ς(s)let x = s.arg in b] where z ≠ y.

(λ(x)b)(y) ≈ ((λ(x)b).arg ⇐ ς(z)y).val        by Prop. 6(1)
           ≈ o.val                             by Prop. 8(2) and (≈ Cong R)
           ≈ let s = o in let x = s.arg in b   by Prop. 8(1)
           ≈ let x = o.arg in b                by Prop. 6(2)
           ≈ let x = (let z = o in y) in b     by Prop. 8(1) and (≈ Cong R)
           ≈ let x = y in b                    by Prop. 8(5) and (≈ Cong R)
           ≈ b{y/x}                            by Prop. 6(1)
This derivation uses the fact that operational equivalence is preserved by static reduction contexts, (≈ Cong R). More generally, to reason compositionally we need operational equivalence to be preserved by arbitrary term constructs, that is, to be a congruence. The following may be proved in several ways, most simply by an adaptation of the corresponding congruence proof for a λ-calculus with references by Honsell, Mason, Smith and Talcott [14].

Proposition 9. Operational equivalence is a congruence.

From Proposition 9 it easily follows that operational equivalence coincides with Morris-style contextual equivalence. Let a term context, C, be a term containing some holes. Let the term C[a] be the outcome of filling each hole in the context C with the term a.

Theorem 10. a ≈ a′ if and only if, for all term contexts C with locs(C) = ∅ such that C[a] and C[a′] are closed, (C[a], ∅) converges if and only if (C[a′], ∅) converges.

Earlier studies of operational equivalence of stateless object calculi [10,15,24] rely on bisimulation equivalence. See Stark [26] for an account of the difficulties of defining bisimulation in the presence of imperative effects. The main influence on this section is the literature on operational theories for functional languages with state [14,18]. Agha, Mason, Smith and Talcott study contextual equivalence, but not CIU equivalence, for a concurrent object-oriented language based on actors [5]. Ours is the first development of CIU equivalence for an object-oriented language. Our experience is that existing techniques for functional languages with state scale up well to deal with the object-oriented features of the imperative object calculus. Some transformations for rearranging side effects are rather cumbersome to express in terms of equational laws, as they depend on variables being bound to distinct locations. We have not pursued this issue in great depth.
For further study it would be interesting to consider program logics such as VTLoE [14] where it is possible to express such conditions directly.

5 Example: Static Resolution of Labels
In Section 3 we showed how to compile the imperative object calculus to an abstract machine that represents objects as finite lists of labels paired with method closures. A frequent operation is to resolve a method label, that is, to compute the offset of the method with that label from the beginning of the list. This operation is needed to implement both method select and method update. In general, resolution of method labels needs to be carried out dynamically since one cannot always compute statically the object to which a select or an update will apply. However, when the select or update is performed on a newly created object, or to self, it is possible to resolve method labels statically. The purpose of this section is to exercise our framework by presenting an algorithm for statically resolving method labels in these situations and proving it correct.
To represent our intermediate language, we begin by extending the syntax of terms to include selects of the form a.j and updates of the form a.j ⇐ ς(x)b, where j is a positive integer offset. The intention is that at runtime, a resolved select ι.j proceeds by running the jth method of the object stored at ι. If the jth method of this object has label ℓ, this will have the same effect as ι.ℓ. Similarly, an update ι.j ⇐ ς(x)b proceeds by updating the jth method of the object stored at ι with method ς(x)b. If the jth method of this object has label ℓ, this will have the same effect as ι.ℓ ⇐ ς(x)b. To make this precise, the operational semantics of Section 2 and the abstract machine and compiler of Section 3 may easily be extended with integer offsets. We omit all the details. All the results proved in Sections 3 and 4 remain true for this extended language. We need the following definitions to express the static resolution algorithm.

A ::= ℓi i∈1..n            layout type (ℓi distinct)
SE ::= xi ↦ Ai i∈1..n      static environment (xi distinct)

The algorithm infers a layout type, A, for each term it encounters. If the layout type A is ℓi i∈1..n, with n > 0, the term must evaluate to an object of the form [ℓi = ς(xi)bi i∈1..n]. On the other hand, if the layout type A is the empty sequence ∅, nothing has been determined about the layout of the object to which the term will evaluate. An environment SE is a finite map that associates layout types to the free variables of a term. We express the algorithm as the following recursive routine resolve(SE, a), which takes an environment SE and a static term a with fv(a) ⊆ dom(SE), and produces a pair (a′, A), where static term a′ is the residue of a after resolution of labels known from layout types to integer offsets, and A is the layout type of both a and a′. We use p to range over both labels and integer offsets.
resolve(SE, x) ≜ (x, SE(x)) where x ∈ dom(SE)

resolve(SE, [ℓi = ς(xi)ai i∈1..n]) ≜ ([ℓi = ς(xi)ai′ i∈1..n], A) where A = ℓi i∈1..n and (ai′, Bi) = resolve((xi ↦ A) :: SE, ai), xi ∉ dom(SE), for each i ∈ 1..n

resolve(SE, a.p) ≜ (a′.j, ∅) if j ∈ 1..n and p = ℓj; (a′.p, ∅) otherwise; where (a′, ℓi i∈1..n) = resolve(SE, a)

resolve(SE, a.p ⇐ ς(x)b) ≜ (a′.j ⇐ ς(x)b′, A) if j ∈ 1..n and p = ℓj; (a′.p ⇐ ς(x)b′, A) otherwise; where (a′, A) = resolve(SE, a), A = ℓi i∈1..n and (b′, B) = resolve((x ↦ A) :: SE, b), x ∉ dom(SE)

resolve(SE, clone(a)) ≜ (clone(a′), A) where (a′, A) = resolve(SE, a)

resolve(SE, let x = a in b) ≜ (let x = a′ in b′, B) where (a′, A) = resolve(SE, a) and (b′, B) = resolve((x ↦ A) :: SE, b), x ∉ dom(SE)
To illustrate the algorithm in action, suppose that false is the object:

[val = ς(s)s.ff, tt = ς(s)…, ff = ς(s)…]

Then resolve(∅, false) returns the following:

([val = ς(s)s.3, tt = ς(s)…, ff = ς(s)…], (val, tt, ff))
The method select s.ff has been statically resolved to s.3. The layout type val, tt, ff asserts that false will evaluate to an object with this layout. Our prototype implementation of the imperative object calculus optimises any closed static term a by running the routine resolve(∅, a) to obtain an optimised term a′ paired with a layout type A. We have proved that this optimisation is correct in the sense that a′ is operationally equivalent to a.

Theorem 11. Suppose a is a closed static term. If routine resolve(∅, a) returns (a′, A), then a ≈ a′.

On a limited set of test programs, the algorithm converts a majority of selects and updates into the optimised form. However, the speedup ranges from modest (10%) to negligible; the interpretive overhead in our bytecode-based system tends to swamp the effect of optimisations such as this. It is likely to be more effective in a native code implementation. In general, there are many algorithms for optimising access to objects; see Chambers [7], for instance, for examples and a literature survey. The idea of statically resolving labels to integer offsets is found also in the work of Ohori [21], who presents a λ-calculus with records and a polymorphic type system such that a compiler may compute integer offsets for all uses of record labels. Our system is rather different, in that it exploits object-oriented references to self.

6 Conclusions
In this paper, we have collated and extended a range of operational techniques which we have used to verify aspects of the implementation of a small object-oriented programming language, Abadi and Cardelli's imperative object calculus. The design of our object-oriented abstract machine was not particularly difficult; we simply extended Leroy's abstract machine [16] with instructions for manipulating objects. Our first result is a correctness proof for the abstract machine and its compiler, Theorem 5. Such results are rather more difficult than proofs of interpretive abstract machines. Our contribution is a direct proof method which avoids the need for any metalanguage, such as a calculus of explicit substitutions. Our second result is that Mason and Talcott's CIU equivalence coincides with Morris-style contextual equivalence, Theorem 10. The benefit of CIU equivalence is that it allows the verification of compiler optimisations. We illustrate this by proving Theorem 11, which asserts that an optimisation algorithm from our implementation preserves contextual equivalence.
This is the first study of correctness of compilation to an object-oriented abstract machine. It is also the first study of program equivalence for the imperative object calculus, a topic left unexplored by Abadi and Cardelli's book. To the best of our knowledge, the only other work on the imperative object calculus is a program logic due to Abadi and Leino [4] and a brief presentation, without discussion of equivalence, of a labelled transition system for untyped imperative objects in the thesis of Andersen and Pedersen [6]. In principle, we believe our compiler correctness proof would scale up to proving correctness of a Java compiler emitting instructions for the Java virtual machine (JVM) [17]. To carry this out would require formal descriptions of the operational semantics of Java, the JVM and the compiler. Due to the scale of the task, the proof would require machine support.
Acknowledgements. Martin Abadi, Carolyn Talcott and several anonymous referees commented on a draft. Gordon holds a Royal Society University Research Fellowship. Hankin holds an EPSRC Research Studentship. Lassen is supported by a grant from the Danish Natural Science Research Council.
References

1. M. Abadi and L. Cardelli. An imperative object calculus: Basic typing and soundness. In Proceedings SIPL'95, 1995. Technical Report UIUCDCS-R-95-1900, Department of Computer Science, University of Illinois at Urbana-Champaign.
2. M. Abadi and L. Cardelli. An imperative object calculus. Theory and Practice of Object Systems, 1(3):151-166, 1996.
3. M. Abadi and L. Cardelli. A Theory of Objects. Springer-Verlag, 1996.
4. M. Abadi and K.R.M. Leino. A logic of object-oriented programs. In Proceedings TAPSOFT'97, volume 1214 of Lecture Notes in Computer Science, pages 682-696. Springer-Verlag, April 1997.
5. G. Agha, I. Mason, S. Smith and C. Talcott. A foundation for actor computation. Journal of Functional Programming, 7(1), January 1997.
6. D.S. Andersen and L.H. Pedersen. An operational approach to the ς-calculus. Master's thesis, Department of Mathematics and Computer Science, Aalborg, 1996. Available as Report R-96-2034.
7. C. Chambers. The Design and Implementation of the Self Compiler, an Optimizing Compiler for Object-Oriented Programming Languages. PhD thesis, Computer Science Department, Stanford University, March 1992.
8. M. Felleisen and D. Friedman. Control operators, the SECD-machine, and the λ-calculus. In Formal Description of Programming Concepts III, pages 193-217. North-Holland, 1986.
9. A.D. Gordon, S.B. Lassen and P.D. Hankin. Compilation and equivalence of imperative objects. Technical Report 429, University of Cambridge Computer Laboratory, 1997. Also appears as BRICS Report RS-97-19, BRICS, Department of Computer Science, University of Aarhus.
10. A.D. Gordon and G.D. Rees. Bisimilarity for a first-order calculus of objects with subtyping. In Proceedings POPL'96, pages 386-395. ACM, 1996. Accepted for publication in Information and Computation.
11. J.D. Guttman, V. Swarup and J. Ramsdell. The VLISP verified Scheme system. Lisp and Symbolic Computation, 8(1/2):33-110, 1995.
12. J. Hannan and D. Miller. From operational semantics to abstract machines. Mathematical Structures in Computer Science, 4(2):415-489, 1992.
13. T. Hardin, L. Maranget and B. Pagano. Functional back-ends and compilers within the lambda-sigma calculus. In ICFP'96, May 1996.
14. F. Honsell, I. Mason, S. Smith and C. Talcott. A variable typed logic of effects. Information and Computation, 119(1):55-90, 1993.
15. H. Hüttel and J. Kleist. Objects as mobile processes. In Proceedings MFPS'96, 1996.
16. X. Leroy. The ZINC experiment: an economical implementation of the ML language. Technical Report 117, INRIA, 1990.
17. T. Lindholm and F. Yellin. The Java Virtual Machine Specification. The Java Series. Addison-Wesley, 1997.
18. I. Mason and C. Talcott. Equivalence in functional languages with effects. Journal of Functional Programming, 1(3):287-327, 1991.
19. E. Moggi. Notions of computations and monads. Information and Computation, 93:55-92, 1989. Earlier version in Proceedings LICS'89.
20. J.H. Morris. Lambda-Calculus Models of Programming Languages. PhD thesis, MIT, December 1968.
21. A. Ohori. A compilation method for ML-style polymorphic record calculi. In Proceedings POPL'92, pages 154-165. ACM, 1992.
22. G.D. Plotkin. Call-by-name, call-by-value and the lambda calculus. Theoretical Computer Science, 1:125-159, 1975.
23. M. Rittri. Proving compiler correctness by bisimulation. PhD thesis, Chalmers, 1990.
24. D. Sangiorgi. An interpretation of typed objects into typed π-calculus. In FOOL 3, New Brunswick, 1996.
25. P. Sestoft. Deriving a lazy abstract machine. Technical Report 1994-146, Department of Computer Science, Technical University of Denmark, September 1994.
26. I. Stark. Names, equations, relations: Practical ways to reason about new. In TLCA'97, number 1210 in LNCS, pages 336-353. Springer, 1997.
27. C. Talcott. Reasoning about functions with effects. In Higher Order Operational Techniques in Semantics, Publications of the Newton Institute, pages 347-390. Cambridge University Press, 1997. To appear.
28. M. Wand. Compiler correctness for parallel languages. In Proceedings FPCA'95, pages 120-134. ACM, June 1995.
On the Expressive Power of Rewriting

Massimo Marchiori
CWI, Kruislaan 413, NL-1098 SJ Amsterdam, The Netherlands
max@cwi.nl
Abstract
In this paper we address the open problem of classifying the expressive power of classes of rewriting systems. We introduce a framework to reason about the relative expressive power between classes of rewrite systems, with respect to every property of interest P. In particular, we investigate four main classes of rewriting systems: left-linear Term Rewriting Systems, Term Rewriting Systems, Normal Conditional Term Rewriting Systems and Join Conditional Term Rewriting Systems. It is proved that, for all the main properties of interest of rewriting systems (completeness, termination, confluence, normalization etc.) these four classes form a hierarchy of increasing expressive power, with two total gaps, between left-linear TRSs and TRSs, and between TRSs and normal CTRSs, and with no gaps between normal CTRSs and join CTRSs. Therefore, these results formally prove the strict increase of expressive power between left-linear and non-left-linear term rewriting, and between unconditional and conditional term rewriting, and clarify in what sense normal CTRSs can be seen as equivalent in power to join CTRSs.
Keywords: Term Rewriting Systems, Conditional Term Rewriting Systems, Observable Properties, Compilers.
1 Introduction
While term rewriting is a well-established field, a satisfactory formal study of the expressive power of classes of rewriting systems is still an open problem. All the works that have so far tried to shed some light on this fundamental topic managed only to focus on particular instances of the problem, failing to provide a general approach to the study of expressive power. The first work on the subject is [3]: imposing the restriction that no new symbol can be added, it provides an example of a conditional algebraic specification that is not expressible via unconditional ones. While this basic result is interesting, it only started shaping a view on the subject, since the imposed restriction of not allowing new symbols is extremely limiting.
A subsequent attempt to study some aspects of the expressibility of rewrite systems was made in [2], where it was shown that 'weakly uniquely terminating' TRSs are more expressive than complete TRSs, in the sense that they can express some 'TRS-suitable' congruence/representatives pairs that cannot be expressed by the latter class. Later, [6] showed that linear TRSs are in a sense less powerful than general term rewriting systems: that paper showed that linear TRSs generate fewer sets of terms than non-linear TRSs when so-called "OI passes" or "IO passes" are considered. Both of these works only focus on particular instances of the expressibility problem, and suffer from a severe lack of generality. The first work exhibits an ad-hoc result for a specific property and a suitable notion of "expressibility". In the second work, TRSs are employed using "passes" and not the usual reductions; also, the method cannot be used to prove other gaps w.r.t. other paradigms, for instance ones more expressive than TRSs, since TRSs already generate every recursively enumerable set of terms. Finally, the restriction to linear TRSs is rather strong. A somewhat related work is [9], where the authors successfully investigated the equivalence among various types of conditional rewriting systems, and studied the confluence property. However, no concept of the 'expressive power' of a class is investigated there. In essence, the big problem is to set up a meaningful definition of expressive power. If we identify expressive power with computational power, then the problem becomes of little interest: for instance, every class of rewrite systems containing the left-linear TRSs is equivalent to the class of left-linear TRSs, since a Turing machine can be simulated via a left-linear rewrite rule ([5]).
In this paper, we give a rigorous definition of what it means for a class of rewrite systems to be at least as expressive as another class with respect to a certain property of interest P. The solution is to employ a constructive transformation that translates every rewrite system of one class into a rewrite system of the other. The translation must satisfy some regularity conditions: roughly speaking, the produced rewrite system must not compute 'less' than the original one, and moreover the structure of the target class has to be respected, in the sense that if part of the rewrite system is already in it, it is left untouched. We show how, via such mappings, called unravelings, we can study the expressive power of rewriting systems with respect to every property of interest. More precisely, we focus on the relative expressive power of four main classes of rewriting systems: left-linear Term Rewriting Systems, Term Rewriting Systems, Normal Conditional Term Rewriting Systems, and Join Conditional Term Rewriting Systems. It is formally proven that, for all the main properties of interest of rewriting systems (termination, confluence, normalization, etc.), these four classes form a hierarchy of increasing expressive power, with two total gaps, one between left-linear TRSs and TRSs and the other between TRSs and normal CTRSs, and no gap between normal CTRSs and join CTRSs.
Therefore, these results formally prove the strict increase of expressive power between left-linear and non-left-linear term rewriting, and between unconditional and conditional term rewriting. Also, they formalize exactly in what sense normal and join CTRSs can be considered equivalent: there is no difference in expressive power between these two paradigms for any major observable property. Besides the theoretical relevance, it is also shown how this analysis of expressive power can clarify the intrinsic difficulty of analyzing certain classes of rewriting systems with respect to others, and the power of existing transformations among classes of rewriting systems (for instance, compilations). The paper is organized as follows. After some short preliminaries in Section 2, we introduce in Section 3 the notions of unraveling and of expressiveness w.r.t. a given property. In the subsequent three sections, we perform a thorough study of the relative expressive power of left-linear TRSs, TRSs, normal CTRSs and join CTRSs: Section 4 compares the expressive power of left-linear TRSs with that of TRSs; Section 5 compares the expressive power of TRSs with that of normal CTRSs; Section 6 compares the expressive power of normal CTRSs with that of join CTRSs. Finally, Section 7 presents the resulting expressive hierarchy of rewriting systems, discusses the gap results obtained under slightly different hypotheses, and explains the impact of the expressive power analysis for compilers and transformations.
2 Preliminaries
We assume knowledge of the basic notions regarding term rewriting systems and conditional term rewriting systems (cf. [7, 13]). In this paper we will deal with join and normal CTRSs: in the first case rules are of the form l → r ⇐ s1↓t1, ..., sk↓tk (with Var(r, s1, t1, ..., sk, tk) ⊆ Var(l), where Var(s) denotes the variables of the term s), and in the second of the form l → r ⇐ s1 →* n1, ..., sk →* nk (with Var(l) ⊇ Var(r, s1, ..., sk), and n1, ..., nk ground normal forms). In place of join CTRSs we will often simply say CTRSs. As far as the major properties of (C)TRSs are concerned, we will employ the standard acronym UN→ to denote uniqueness of normal forms w.r.t. reduction (a term can have at most one normal form). Also, we will consider the standard notions of completeness (confluence plus termination), normalization (every term has a normal form) and semi-completeness (confluence plus normalization). If R is a rewrite system, then its use as a subscript of a rewrite relation indicates that the rewriting takes place in R: for example, s →*_R t means that s reduces to t in R. Finally, to enhance readability, we will often identify a single rule with the corresponding rewrite system: for instance, instead of writing a one-rule TRS like {a → b}, we will simply write a → b.
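To make these notions concrete, here is a small Python sketch of first-order terms and (unconditional) rewriting; the encoding (variables as strings, applications as head-first tuples) and all helper names are our own illustration, not part of the formal development:

```python
# Toy first-order terms: variables are strings like "X"; a function
# application f(t1, ..., tn) is the tuple ("f", t1, ..., tn); a constant
# a is the 1-tuple ("a",).

def is_var(t):
    return isinstance(t, str)

def match(pattern, term, subst=None):
    """Match `pattern` against `term`; return a substitution dict or None.
    A repeated variable must match syntactically equal subterms."""
    if subst is None:
        subst = {}
    if is_var(pattern):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        subst[pattern] = term
        return subst
    if is_var(term) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        if match(p, t, subst) is None:
            return None
    return subst

def substitute(t, subst):
    if is_var(t):
        return subst.get(t, t)
    return (t[0],) + tuple(substitute(a, subst) for a in t[1:])

def rewrite_step(term, rules):
    """Perform one rewrite step (outermost-first), or return None if
    `term` is a normal form for `rules`."""
    for lhs, rhs in rules:
        s = match(lhs, term)
        if s is not None:
            return substitute(rhs, s)
    if not is_var(term):
        for i in range(1, len(term)):
            new = rewrite_step(term[i], rules)
            if new is not None:
                return term[:i] + (new,) + term[i + 1:]
    return None

def normal_form(term, rules, limit=1000):
    """Reduce `term` until no rule applies (bounded, since rewriting
    need not terminate in general)."""
    for _ in range(limit):
        new = rewrite_step(term, rules)
        if new is None:
            return term
        term = new
    raise RuntimeError("no normal form reached within the step limit")

# The one-rule TRS {a -> b} mentioned above:
rules = [(("a",), ("b",))]
print(normal_form(("f", ("a",), ("a",)), rules))  # -> ('f', ('b',), ('b',))
```

Here f(a, a) reduces in two steps to its unique normal form f(b, b).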
3 Expressiveness
Our approach allows us to study the expressive power of rewrite systems in detail, focusing on every single property of interest. In a sense, it is a 'behavioural' approach, since we consider expressiveness with respect to a given property (the 'observable'). If a class C' of rewrite systems has at least the same "expressive power" as another class C with respect to a certain observable property P, then there should be a transformation that, given a rewrite system R in C, produces a corresponding rewrite system R' in C' that is 'behaviourally equivalent' to R and that 'computes at least as much as' R. Behaviourally equivalent means that R' and R should be the same with respect to the observable property P: that is to say, R ∈ P ⟺ R' ∈ P. On the other hand, the fact that R' 'computes at least as much as' R already has a quite standard definition in the literature. In [9] the notion of logical strength was introduced; more precisely, given two rewriting systems R and R', R' is said to be logically stronger than R if ↓_R ⊆ ↓_{R'} and ↓_{R'} ⊈ ↓_R; also, R' is said to have the same logical strength as R if ↓_R = ↓_{R'}. Thus, R' 'computes at least as much as' R can be read as R' 'has at least the same logical strength as' R, that is to say ↓_R ⊆ ↓_{R'}. The proper formalization of the intuitive notion of transformation is given by the concept of unraveling:

Definition 3.1 Given two classes C and C' of rewrite systems, an unraveling of C into C' is a computable map U from C to C' such that
1. ∀R ∈ C : ↓_R ⊆ ↓_{U(R)}
2. ∀R ∈ C : if R = R' ∪ R'' with R' ∈ C', then U(R) = R' ∪ U(R'')
3. ∀R ∈ C : if R is finite, then U(R) is finite

The first condition is just, as said, the requirement that the produced rewrite system has at least the same logical strength as the original one. The second condition says that if we are unraveling a rewrite system into C', we can extract from it the part that is already in C', and then go on computing the unraveling (roughly speaking, the unraveling must respect the structure of C', since we are interested in the relative increase of expressive power). The third condition ensures that unraveling a finite system does not yield an infinite one. We can then define the notion of expressiveness with respect to a given property P:

Definition 3.2 Given two classes C and C' of rewrite systems, C' is said to be at least as expressive as C w.r.t. the property P if there is an unraveling U of C into C' such that ∀R ∈ C : R ∈ P ⟺ U(R) ∈ P.
C' is said to be as expressive as C (w.r.t. P) if C' is at least as expressive as C w.r.t. P and vice versa. Finally, C' is said to be more expressive than C (w.r.t. P) if C' is at least as expressive as C w.r.t. P but not vice versa. The following proposition formalizes the intuitively obvious fact that if a class of rewrite systems C' contains another class C, then it is also at least as expressive as C.

Proposition 3.3 Given two classes of rewrite systems C and C', if C ⊆ C' then C' is at least as expressive as C w.r.t. every property P.

Proof Take a property P. The identity map I_{C,C'} from C to C' is an unraveling of C into C', as is immediate to check. Readily, ∀R ∈ C : R ∈ P ⟺ I_{C,C'}(R) = R ∈ P, and so C' is at least as expressive as C.

The importance of establishing whether or not a certain class of rewrite systems C' is at least as expressive as another one C w.r.t. a certain property is not only theoretical, but has practical impact as well. The typical case is when C' ⊆ C, that is, when one wants to show whether passing from C' to a greater class C leads to a proper increase in expressive power (or, vice versa, whether restricting from C to C' leads to a proper loss of expressive power). If C and C' have the same expressive power w.r.t. P, then the analysis of the observable property P for objects in C can be reduced to the analysis of P for objects in the restricted class C' (one just uses the corresponding unraveling to translate a rewrite system R ∈ C into an R' ∈ C', and analyzes R'). On the other hand, if C is more expressive than C' w.r.t. P, then the analysis of the observable property P is inherently more difficult for rewrite systems in C than for those in C'. For example, consider the case of compilers, or of transformational toolkits aiming at obtaining more efficient code (e.g. fold/unfold systems, etc.). Usually, they translate a program written in a high-level expressive language (C) into a low-level subset of it (C'). A minimum requirement for such a compilation/transformation to be sound could be, for instance, that if the starting program is terminating, then its transformed version terminates as well. But if C' is not as expressive as C w.r.t. termination, then such a compiler/transformation cannot exist (unless it does not satisfy the second regularity condition of Definition 3.1).
To make this informal reasoning more concrete, take as C the class of CTRSs, and as C' the class of TRSs. Then, for instance, if TRSs are less expressive than CTRSs w.r.t. termination, such a compilation is impossible (unless the compilation mapping does not satisfy the second condition of Definition 3.1). We will return to these topics in Section 7, after having completed the analysis of the expressive power among left-linear TRSs, TRSs, normal CTRSs and join CTRSs.
As far as the observable property P is concerned, we will perform the expressibility analysis with respect to all the major properties of rewrite systems, that is:
- Termination
- Confluence
- Completeness
- Uniqueness of normal forms w.r.t. reduction (UN→)
- Normalization
- Semi-completeness
4 Left-linear TRSs versus TRSs
In this section we analyze the expressive power of left-linear Term Rewriting Systems versus Term Rewriting Systems.

Theorem 4.1 TRSs are more expressive than left-linear TRSs w.r.t. completeness.

Proof By Proposition 3.3, TRSs are at least as expressive as left-linear TRSs w.r.t. completeness. So, we have to prove that left-linear TRSs are not at least as expressive as TRSs w.r.t. completeness. Ab absurdo, suppose there is an unraveling U of TRSs into left-linear TRSs such that every TRS R is complete if and only if U(R) is complete. Take the rule p : f(X, X) → X. Since in p we have f(X, X) ↓ X, we also have f(X, X) ↓_{U(p)} X. This implies that f(X, X) →+_{U(p)} X. But since U(p) is left-linear, from f(X, X) →+_{U(p)} X it follows that f(X, Y) →+_{U(p)} X (or f(X, Y) →+_{U(p)} Y): suppose w.l.o.g. that f(X, Y) →+_{U(p)} X. Now, consider f(X, Y) → Y ∪ p: it is a complete TRS, and so its unraveling U(f(X, Y) → Y ∪ p) = f(X, Y) → Y ∪ U(p) must be complete as well. But f(X, Y) →+_{{f(X,Y)→Y} ∪ U(p)} X and f(X, Y) →+_{{f(X,Y)→Y} ∪ U(p)} Y, so f(X, Y) reaches the two distinct normal forms X and Y, a contradiction.
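The role of left-linearity in the proof of Theorem 4.1 can be illustrated concretely: a non-left-linear rule such as f(X, X) → X must test two subterms for equality at matching time, which no left-linear pattern can do. A minimal Python sketch (the helper names are ours, purely illustrative; terms are head-first tuples):

```python
# A non-left-linear rule like f(X, X) -> X can only fire when both
# arguments are identical, so matching must perform an equality test.

def matches_nonlinear_fxx(term):
    """Does the non-left-linear pattern f(X, X) match `term`?"""
    return term[0] == "f" and len(term) == 3 and term[1] == term[2]

def matches_linear_fxy(term):
    """A left-linear pattern f(X, Y) ignores whether the arguments agree."""
    return term[0] == "f" and len(term) == 3

assert matches_nonlinear_fxx(("f", ("a",), ("a",)))
assert not matches_nonlinear_fxx(("f", ("a",), ("b",)))
# The left-linear approximation cannot distinguish the two cases:
assert matches_linear_fxy(("f", ("a",), ("a",)))
assert matches_linear_fxy(("f", ("a",), ("b",)))
```

This is exactly why a left-linear U(p) that collapses f(X, X) must already collapse f(X, Y) to one of its arguments.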
Theorem 4.2 TRSs are more expressive than left-linear TRSs w.r.t. confluence, UN→, and semi-completeness.

Proof Completely analogous to the proof of Theorem 4.1.

Theorem 4.3 TRSs are more expressive than left-linear TRSs w.r.t. termination.

Proof By Proposition 3.3, we just have to prove that left-linear TRSs are not at least as expressive as TRSs w.r.t. termination. Ab absurdo, suppose there is an unraveling U of TRSs into left-linear TRSs such that every TRS R is terminating if and only if U(R) is terminating. As in the proof of Theorem 4.1, using the rule p : f(X, X) → X we obtain that f(X, Y) →+_{U(p)} X. Now, consider g(a) → g(f(a, b)) ∪ p: it is a terminating TRS, and so its unraveling g(a) → g(f(a, b)) ∪ U(p) is terminating as well. But g(a) →_{g(a)→g(f(a,b))} g(f(a, b)) →+_{U(p)} g(a), giving an infinite reduction, a contradiction.
Theorem 4.4 TRSs are more expressive than left-linear TRSs w.r.t. normalization.

Proof By Proposition 3.3, we just have to prove that left-linear TRSs are not at least as expressive as TRSs w.r.t. normalization. Ab absurdo, suppose there is an unraveling U of TRSs into left-linear TRSs such that every TRS R is normalizing if and only if U(R) is normalizing. As in the proof of Theorem 4.1, using the rule p : f(X, X) → X we obtain that f(X, Y) →+_{U(p)} X. Since p is normalizing, U(p) is normalizing as well. Consider the (left-linear) TRS f(X, Y) → f(X, Y) ∪ U(p): we have that →*_{{f(X,Y)→f(X,Y)} ∪ U(p)} = →*_{U(p)}. Moreover, no normal form of U(p) contains the symbol f (since f(X, Y) →+_{U(p)} X). Hence, from the fact that U(p) is normalizing it follows that f(X, Y) → f(X, Y) ∪ U(p) is normalizing too. Now, take the TRS f(X, Y) → f(X, Y) ∪ p: since it is not normalizing (f(X, Y) has no normal form), its unraveling U(f(X, Y) → f(X, Y) ∪ p) = f(X, Y) → f(X, Y) ∪ U(p) must not be normalizing, a contradiction.
5 TRSs versus normal CTRSs
In this section we analyze the expressive power of Term Rewriting Systems versus Normal Conditional Term Rewriting Systems.

Theorem 5.1 Normal CTRSs are more expressive than TRSs w.r.t. termination.

Proof By Proposition 3.3, we just have to prove that TRSs are not at least as expressive as normal CTRSs w.r.t. termination. Ab absurdo, suppose there is an unraveling U of CTRSs into TRSs such that every CTRS R is terminating if and only if U(R) is terminating. Take the rule p : a → b ⇐ c →* d. Since in c → d ∪ p we have a ↓ b, in U(c → d ∪ p) = c → d ∪ U(p) we have a ↓_{c→d ∪ U(p)} b as well, i.e. a →*_{c→d ∪ U(p)} n and b →*_{c→d ∪ U(p)} n, for some n. If the reduction a →* n uses the rule c → d, then a →+_{U(p)} C[c] for some context C. Now, c → a ∪ p is terminating, and so c → a ∪ U(p) is terminating as well. But in the latter TRS we have the infinite reduction a →+ C[c] → C[a] →+ C[C[c]] → ..., a contradiction. The same reasoning (using c → b in place of c → a) excludes that the rule c → d is used in the reduction b →* n. Therefore, we have that a →*_{U(p)} n ←*_{U(p)} b. So, f(X, X) → f(a, b) ∪ p being terminating implies that f(X, X) → f(a, b) ∪ U(p) is terminating as well, while in this TRS there is the reduction f(a, b) →* f(n, n) → f(a, b), a contradiction.
Theorem 5.2 Normal CTRSs are more expressive than TRSs w.r.t. completeness.

Proof Completely analogous to the proof of the above theorem.
Theorem 5.3 Normal CTRSs are more expressive than TRSs w.r.t. confluence.

Proof Sketch Consider the rule p : f(X) → X ⇐ c →* d. Since f(X) ↓_{c→d ∪ p} X, we also have f(X) ↓_{c→d ∪ U(p)} X, hence f(X) →*_{c→d ∪ U(p)} X. This reduction can be decomposed in the following way:

f(X) →+_{U(p)} t1 →+_{c→d} t2 →+_{U(p)} t3 →+_{c→d} t4 ... →+ tk ≡ X

p being finite, we can take two constants A and B which do not appear in the rules of U(p). Consider the rule f(A) → B. f(A) → B ∪ p being confluent, f(A) → B ∪ U(p) is confluent too. Hence, since in the latter TRS f(A) rewrites both to B and to t1[X/A], we have that B and t1[X/A] join. But B being new, the only possibility is that t1[X/A] →+_{{f(A)→B} ∪ U(p)} B. Now, it is not difficult to see that for every term s without occurrences of B, if s →+_{f(A)→B} s' →+_{U(p)} u then the commutation property holds, i.e. there is an s'' such that s →+_{U(p)} s'' →+_{f(A)→B} u. Therefore, in the reduction t1[X/A] →+_{{f(A)→B} ∪ U(p)} B we can commute all the applications of the rule f(A) → B, finally obtaining t1[X/A] →+_{U(p)} f(A) → B. So, A being new, we have that t1 →+_{U(p)} f(X). Therefore f(X) →+_{U(p)} t1 →+_{U(p)} f(X), and t2 →+_{d→c} t1 (reversing the c → d steps). {f(A) → B} ∪ {d → c} ∪ p being confluent, {f(A) → B} ∪ {d → c} ∪ U(p) is confluent as well. Hence, since t3[X/A] and B are convertible in this TRS (B ← f(A) →+_{U(p)} t1[X/A] ←+_{d→c} t2[X/A] →+_{U(p)} t3[X/A]), we have that t3[X/A] ↓_{{f(A)→B} ∪ {d→c} ∪ U(p)} B, and B being new this implies that t3[X/A] →+_{{f(A)→B} ∪ {d→c} ∪ U(p)} B. By applying the aforementioned commutation result, from this reduction we can get t3[X/A] →+_{{d→c} ∪ U(p)} f(A) → B, and A being new we obtain t3 →+_{{d→c} ∪ U(p)} f(X). By repeating this reasoning for t4, we can prove that t4 →+_{{d→c} ∪ U(p)} f(X), and so on, till at the end we obtain that tk ≡ X →+_{{d→c} ∪ U(p)} f(X), a contradiction (a variable cannot be reduced).
Theorem 5.4 Normal CTRSs are more expressive than TRSs w.r.t. UN→.

Proof By Proposition 3.3, we just have to prove that TRSs are not at least as expressive as normal CTRSs w.r.t. UN→. Ab absurdo, suppose there is an unraveling U of CTRSs into TRSs such that every CTRS R is UN→ if and only if U(R) is UN→. Take the rule p : a → b ⇐ c →* d. Since in c → d ∪ p we have a ↓ b, a and b also join in U(c → d ∪ p) = c → d ∪ U(p), that is to say there are reductions a →*_{c→d ∪ U(p)} n and b →*_{c→d ∪ U(p)} n.

Is a a normal form in U(p)? Suppose it is not. Since {e → a, e → f} ∪ p ∉ UN→, we have {e → a, e → f} ∪ U(p) ∉ UN→ too. But since a is not a normal form in U(p), it is not one in {e → a, e → f} ∪ U(p) either. Hence, adding the rule a → a leaves this TRS not UN→ (since no normal form contains a, and →*_{{e→a,e→f} ∪ U(p)} = →*_{{a→a,e→a,e→f} ∪ U(p)}). But {a → a, e → a, e → f} ∪ p ∈ UN→ implies {a → a, e → a, e → f} ∪ U(p) ∈ UN→, a contradiction. So, a is a normal form in U(p).

From a ↓_{c→d ∪ p} b we get a ↓_{c→d ∪ U(p)} b. But a, being a normal form in U(p), is a normal form in c → d ∪ U(p) too. So a ↓_{c→d ∪ U(p)} b means b →*_{c→d ∪ U(p)} a.

Is e a normal form in c → d ∪ U(p)? The reasoning is similar to the one just seen for a. Suppose it is not. Then {f → e, f → g, c → d} ∪ p ∉ UN→ implies {f → e, f → g, c → d} ∪ U(p) ∉ UN→. But since e is not a normal form in c → d ∪ U(p), it is not one in {f → e, f → g, c → d} ∪ U(p) either. Hence, adding e → e to this TRS leaves it not UN→ (since no normal form contains e, and →*_{{f→e,f→g,c→d} ∪ U(p)} = →*_{{e→e,f→e,f→g,c→d} ∪ U(p)}). But {e → e, f → e, f → g, c → d} ∪ p ∈ UN→ implies {e → e, f → e, f → g, c → d} ∪ U(p) ∈ UN→, a contradiction. So, e is a normal form in c → d ∪ U(p).

Thus, a being a normal form in U(p) and e a normal form in c → d ∪ U(p), both a and e are normal forms in b → e ∪ c → d ∪ U(p). Therefore b → e ∪ c → d ∪ p ∈ UN→ implies b → e ∪ c → d ∪ U(p) ∈ UN→, whilst in this TRS b rewrites to the two different normal forms a and e, a contradiction.

Theorem 5.5 Normal CTRSs are more expressive than TRSs w.r.t. normalization.

Proof By Proposition 3.3, we just have to prove that TRSs are not at least as expressive as normal CTRSs w.r.t. normalization. Ab absurdo, suppose there is an unraveling U of CTRSs into TRSs such that every CTRS R is normalizing if and only if U(R) is normalizing. Take the rule p : a → b ⇐ c →* d. Since in c → d ∪ p we have a ↓ b, a and b also join in U(c → d ∪ p) = c → d ∪ U(p), that is to say there are reductions a →*_{c→d ∪ U(p)} n and b →*_{c→d ∪ U(p)} n.

If a is a normal form in c → d ∪ U(p), then b →*_{c→d ∪ U(p)} a. Also, since {b → b, c → d} ∪ p is not normalizing, {b → b, c → d} ∪ U(p) is not normalizing either. On the other hand, →*_{{b→b,c→d} ∪ U(p)} = →*_{c→d ∪ U(p)}, and the normal forms in {b → b, c → d} ∪ U(p) are the same as in c → d ∪ U(p), since none of them contains b (b is not a normal form in c → d ∪ U(p)); so, c → d ∪ U(p) being normalizing (c → d ∪ p is normalizing), {b → b, c → d} ∪ U(p) is normalizing as well, a contradiction. So, a is not a normal form in c → d ∪ U(p), and a fortiori not in U(p).

p being normalizing, U(p) is normalizing as well. Also, a → a ∪ p not being normalizing implies that a → a ∪ U(p) is not normalizing. But →*_{a→a ∪ U(p)} = →*_{U(p)}, and the normal forms in a → a ∪ U(p) are the same as in U(p), since none of them contains a (a is not a normal form in U(p)); so, U(p) being normalizing implies that a → a ∪ U(p) is normalizing as well, a contradiction.
Theorem 5.6 Normal CTRSs are more expressive than TRSs w.r.t. semi-completeness.

Proof The proof is like that of Theorem 5.5, once the word normalizing is replaced by the word semi-complete (it uses the fact that if two TRSs R1 and R2 satisfy →*_{R1} = →*_{R2}, then R1 is confluent iff R2 is confluent).
6 Normal CTRSs versus CTRSs
In this section we analyze the expressive power of Normal Conditional Term Rewriting Systems versus join Conditional Term Rewriting Systems. We will first employ the simulation of CTRSs via normal CTRSs introduced in [9, 8]. A CTRS is transformed into a normal CTRS by replacing every rule l → r ⇐ s1↓t1, ..., sk↓tk with the rules l → r ⇐ eq(s1, t1) →* true, ..., eq(sk, tk) →* true and eq(X, X) → true (where eq and true are new distinguished symbols). Call U_ext this mapping. Then we have:

Theorem 6.1 U_ext is an unraveling of CTRSs into normal CTRSs.

Proof The first point of Definition 3.1 has been proved in [8], while the others are trivial.
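The mapping U_ext is straightforward to implement. The following Python sketch uses a hypothetical encoding in which a conditional rule is a triple (lhs, rhs, conds), with each join condition s ↓ t represented as the pair (s, t), each produced normal condition eq(s, t) →* true as the pair (eq(s, t), true), and terms encoded as head-first tuples (e.g. the constant true as ("true",)):

```python
# U_ext: turn a join CTRS into a normal CTRS. A rule is (lhs, rhs, conds);
# a join condition s "down" t is the pair (s, t); the produced normal
# condition eq(s, t) ->* true is the pair (("eq", s, t), ("true",)).

EQ_RULE = (("eq", "X", "X"), ("true",), [])  # the added rule eq(X, X) -> true

def u_ext(ctrs):
    out = []
    for lhs, rhs, conds in ctrs:
        out.append((lhs, rhs,
                    [(("eq", s, t), ("true",)) for s, t in conds]))
    return out + [EQ_RULE]

# Example: { a -> b <= c "down" d } is unraveled into
# { a -> b <= eq(c, d) ->* true } plus eq(X, X) -> true.
rule_p = (("a",), ("b",), [(("c",), ("d",))])
print(u_ext([rule_p]))
```

Note that unconditional rules (empty conds) pass through unchanged, in line with the second condition of Definition 3.1, and finite systems stay finite, in line with the third.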
Using this unraveling, it can be proved that CTRSs and normal CTRSs have equal expressive power w.r.t. the following major properties:

Theorem 6.2 Normal CTRSs are as expressive as CTRSs w.r.t. termination.

Proof Sketch It suffices to prove that for every terminating CTRS T, T ⊕ {eq(X, X) → true} is terminating (here, ⊕ denotes as usual the disjoint sum operator, i.e. the two systems are assumed to have disjoint signatures). Algebraically, this means that we have to prove that {eq(X, X) → true} is in the kernel (cf. [15]). Let us define the eq-rank of a term t as the greatest number of nested eq symbols. The eq-rank of a term cannot increase under reduction, as is easy to see. So, we can perform a proof by induction on the eq-rank. Suppose ab absurdo that there is an infinite reduction in T ⊕ {eq(X, X) → true}. Take a term t with smallest eq-rank among those having an infinite reduction. Now, replace all the occurrences in t of the symbol eq by the symbol true. It is not difficult to prove that one can still mimic the original reduction; moreover, this reduction is still infinite, by the eq-rank minimality assumption. But then, we have an infinite reduction starting from a term having eq-rank zero, which means that T is not terminating, a contradiction.

Theorem 6.3 Normal CTRSs are as expressive as CTRSs w.r.t. confluence, normalization, and semi-completeness.

Proof Sketch Essentially, the proof depends on the fact that all these properties are modular, so that adding to a CTRS T the TRS {eq(X, X) → true} does not change the property of interest for T.

Theorem 6.4 Normal CTRSs are as expressive as CTRSs w.r.t. completeness.

Proof This follows from the previous two theorems, once it is noticed that completeness is confluence plus termination. Note that we cannot directly apply the proof argument of Theorem 6.3, since completeness is not modular (cf. e.g. [16]).

Finally, the UN→ property remains:

Theorem 6.5 Normal CTRSs are as expressive as CTRSs w.r.t. UN→.

Proof Sketch The proof uses a new unraveling U'_ext, which is a slight variation of the unraveling U_ext (which, as can easily be seen, does not work here). U'_ext is defined in such a way that every conditional rule l → r ⇐ s1↓t1, ..., sk↓tk is replaced by the rules l → r ⇐ eq(s1, t1) →* true, ..., eq(sk, tk) →* true, eq(X, X) → true, and eq(X, Y) → eq(X, Y) (with eq and true new distinguished symbols). Next, one essentially has to prove that for every CTRS T which is UN→, T ⊕ {eq(X, X) → true, eq(X, Y) → eq(X, Y)} is still UN→, and this is not difficult to show (the proof is, analogously to the case of Theorem 6.2, by induction on the eq-rank), since no normal form can contain an eq symbol.
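The measure driving the induction in the proof sketch of Theorem 6.2 is easy to compute; as a small illustration (our own toy term encoding: variables as strings, applications as head-first tuples):

```python
# eq-rank: the greatest number of nested eq symbols in a term
# (variables are strings, applications are head-first tuples).

def eq_rank(term):
    if isinstance(term, str):          # a variable contains no eq
        return 0
    inner = max((eq_rank(a) for a in term[1:]), default=0)
    return inner + 1 if term[0] == "eq" else inner

assert eq_rank(("a",)) == 0
assert eq_rank(("eq", ("a",), ("b",))) == 1
# nested: eq(eq(a, b), c) has eq-rank 2
assert eq_rank(("eq", ("eq", ("a",), ("b",)), ("c",))) == 2
# contracting eq(X, X) -> true cannot increase the rank:
assert eq_rank(("true",)) <= eq_rank(("eq", ("a",), ("a",)))
```

The asserts check the two facts the induction rests on: the rank is zero exactly when no eq occurs, and a contraction of eq(X, X) → true never increases it.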
[Figure omitted: the four classes ordered by increasing expressive power, with the two gaps marked.]
Figure 1: The Expressiveness Hierarchy of Rewriting Systems.
7 The Expressiveness Hierarchy
Summing up the results obtained in the previous three sections, we have formally shown that: 1) there is a gap in expressive power when restricting term rewriting to left-linear term rewriting, with respect to every main property of rewriting systems (Section 4), thereby extending the gap result of [6], which only proved a gap between linear TRSs and TRSs. In Section 6 we have shown that: 2) normal CTRSs and join CTRSs have the same expressive power w.r.t. all the main properties of rewriting systems, so normal CTRSs can be seen as equivalent to join CTRSs for every observable property of interest. Combining these results with those of Section 5, we obtain that 3) there is a gap in expressive power when passing from unconditional rewriting to conditional rewriting, with respect to every main property of rewriting systems. Graphically, the resulting Expressiveness Hierarchy is illustrated in Figure 1. The conditions in the definition of unraveling (Definition 3.1) can be slightly modified, obtaining a variety of other similar results. For instance, one may want to consider the more abstract case where the third, finiteness condition is dropped (i.e., loosening the concept of expressive power by allowing the extra power to build "infinite" systems). In this respect, it is easy to see that the proofs we have given for the gap results between left-linear TRSs and TRSs still hold in this more general setting (i.e., even allowing the extra power to build infinite left-linear TRSs, the power gap remains), thus showing that the gap is in a sense even stronger. Another case that can be considered concerns the first
condition of unraveling: here, the standard notion of logical strength (cf. Section 3) has been employed, which is based on the join relation. However, one could consider another natural condition, for instance one based on reduction:

∀R ∈ C : →+_R ⊆ →+_{U(R)}

This way, the intuitive concept that the system U(R) computes 'at least as much as' the system R is formally represented by the fact that if in the system R a term t rewrites to another term t', then the same can be done in U(R). We thus obtain a stronger form of equivalence, where the systems are required to behave in the same way not only with respect to logical strength, but even with respect to reducibility. It is not difficult to see that, in this new setting, all the proofs concerning the non-gap results between normal CTRSs and join CTRSs still hold. Also, trivially, all the other gap results, between left-linear TRSs and TRSs and between TRSs and normal CTRSs, still hold, since the definition of unraveling has been strengthened. Hence, the Expressiveness Hierarchy remains true even in this new expressiveness setting. Another point worth mentioning is that the gap results given in this paper between TRSs and normal CTRSs are in a sense much stronger: for example, all the proofs that we have given for these cases still hold when only normal CTRSs with at most one conditional rule having one ground condition (and, even, made up of constants only) are considered, thus proving the stronger expressive gap between TRSs and this quite limited subclass of normal CTRSs (in a sense, the outcome is that a single conditional test, even in such a limited form, suffices to produce an increase in expressive power).
Besides the major properties of rewriting systems studied here, it is not difficult to investigate along the same lines many other properties of rewriting systems, like equational consistency, the equational normal form property, innermost termination, innermost normalization and so on (see e.g. [13]), obtaining similar gap results as for the major properties. From the practical point of view, the presence of the gaps between left-linear and non-left-linear term rewriting and between unconditional and conditional term rewriting can be seen as a formal confirmation that the analysis of all the major properties of TRSs (resp. CTRSs) is intrinsically more complex than for left-linear TRSs (resp. TRSs), cf. the discussion in Section 3. For instance, in [14] it has been studied to what extent the properties of CTRSs can be automatically inferred from those of TRSs. This study has been carried out using unravelings that 'behave well' with respect to some property, in the sense that the unraveled TRS safely approximates the original CTRS. The number of results that one can automatically obtain is high, but it is not clear in general to what extent results from a simpler field like TRSs can be extended to CTRSs. The results proved in this paper formally show that there is an intrinsic limiting factor due to the expressive power gap: it is impossible to fully recover the results known for any of the major properties of interest of CTRSs by resorting only to the
simpler TRS paradigm, since there is no unraveling able to fully preserve them; in other words, every approximating TRS must be lossy. Last but not least, another related consequence lies in the field of compilation of CTRSs via TRSs. The presence of the gap between unconditional and conditional rewriting gives an a posteriori justification of the fact that so far all existing compilations of CTRSs via TRSs either are 'impure', in the sense that they have to use an ad hoc restriction of the reduction strategy, or cannot act on the whole class of conditional term rewriting systems. In the first category we have the work by Aida, Goguen and Meseguer ([1]), and the work by Kaplan ([12]), which compiles CTRSs into Lisp code (the resulting Lisp programs could, with some effort, as claimed in [11], be compiled into TRSs using Combinatory Logic). All the other works, i.e. [4, 10, 11], fall into the second category, since they considerably restrict the class of CTRSs that can be compiled.
Acknowledgments I wish to thank Jan Willem Klop for his support.
References

[1] H. Aida, J. Goguen, and J. Meseguer. Compiling concurrent rewriting onto the rewrite rule machine. In S. Kaplan and M. Okada, editors, Proceedings 2nd International Workshop on Conditional and Typed Rewriting Systems, volume 516 of LNCS, pages 320-332. Springer-Verlag, 1990.
[2] J. Avenhaus. On the descriptive power of term rewriting systems. Journal of Symbolic Computation, 2:109-122, 1986.
[3] J. Bergstra and J.-Ch. Meyer. On specifying sets of integers. Journal of Information Processing and Cybernetics (EIK), 20(10/11):531-541, 1984.
[4] J. Bergstra and J.W. Klop. Conditional rewrite rules: Confluence and termination. Journal of Computer and System Sciences, 32(3):323-362, 1986.
[5] M. Dauchet. Simulation of Turing machines by a left-linear rewrite rule. In N. Dershowitz, editor, Proceedings of the Third International Conference on Rewriting Techniques and Applications, volume 355 of LNCS, pages 109-120. Springer-Verlag, 1989.
[6] M. Dauchet and F. De Comite. A gap between linear and non-linear term rewriting systems. In Proceedings of the Second International Conference on Rewriting Techniques and Applications, volume 256 of LNCS, pages 95-104, Bordeaux, France. Springer-Verlag, 1987.
[7] N. Dershowitz and J.-P. Jouannaud. Rewrite systems. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B, chapter 6, pages 243-320. Elsevier - MIT Press, 1990.
[8] N. Dershowitz and M. Okada. A rationale for conditional equational programming. Theoretical Computer Science, 75:111-138, 1990.
[9] N. Dershowitz, M. Okada, and G. Sivakumar. Canonical conditional rewrite systems. In Proceedings of the 9th International Conference on Automated Deduction, volume 310 of LNCS, pages 538-549. Springer-Verlag, 1988.
[10] E. Giovannetti and C. Moiso. Notes on the elimination of conditions. In S. Kaplan and J.-P. Jouannaud, editors, Proceedings 1st International Workshop on Conditional and Typed Rewriting Systems, volume 308 of LNCS, pages 91-97. Springer-Verlag, 1988.
[11] C. Hintermeier. How to transform canonical decreasing CTRSs into equivalent canonical TRSs. In N. Dershowitz and N. Lindenstrauss, editors, Proceedings 4th International Workshop on Conditional and Typed Rewriting Systems, volume 968 of LNCS, pages 186-205. Springer-Verlag, 1995.
[12] S. Kaplan. A compiler for conditional term rewriting systems. In P. Lescanne, editor, Proceedings 2nd International Conference on Rewriting Techniques and Applications, volume 256 of LNCS, pages 25-41. Springer-Verlag, 1987.
[13] J.W. Klop. Term rewriting systems. In S. Abramsky, D.M. Gabbay, and T.S.E. Maibaum, editors, Handbook of Logic in Computer Science, volume 2, chapter 1, pages 1-116. Clarendon Press, Oxford, 1992.
[14] M. Marchiori. Unravelings and ultra-properties. In Proceedings of the Fifth International Conference on Algebraic and Logic Programming (ALP'96), volume 1139 of LNCS, pages 107-121. Springer-Verlag, 1996.
[15] M. Marchiori. The theory of vaccines. In Proceedings of the Twenty-fourth International Colloquium on Automata, Languages, and Programming (ICALP'97), volume 1256 of LNCS, pages 660-670. Springer-Verlag, 1997.
[16] Y. Toyama, J.W. Klop, and H.P. Barendregt. Termination for direct sums of left-linear complete term rewriting systems. Journal of the ACM, 42(6):1275-1304, 1995.
Mechanizing Verification of Arithmetic Circuits: SRT Division*

Deepak Kapur^1 and M. Subramaniam^2**

^1 Computer Science Department, State University of New York, Albany, NY 12222
[email protected]
^2 Functional Verification Group, Silicon Graphics Inc., Mountain View, CA 94040
[email protected]

Abstract. The use of a rewrite-based theorem prover for verifying properties of arithmetic circuits is discussed. A prover such as Rewrite Rule Laboratory (RRL) can be used effectively for establishing number-theoretic properties of adders, multipliers and dividers. Since the verification of adders and multipliers has been discussed in earlier papers, the focus in this paper is on a divider circuit. An SRT division circuit similar to the one used in the Intel Pentium processor is mechanically verified using RRL. The number-theoretic correctness of the division circuit is established from its equational specification. The proof is generated automatically, and follows easily using the inference procedures for contextual rewriting and a decision procedure for the quantifier-free theory of numbers (Presburger arithmetic) already implemented in RRL. Additional enhancements to rewrite-based provers such as RRL that would further facilitate verifying properties of circuits with structure similar to that of the SRT division circuit are discussed.
1 Introduction
There has been considerable interest recently in using automated reasoning techniques to help enhance confidence in hardware designs. A number of researchers have been exploring the use of BDD-based software, model checkers, theorem provers and verification systems for verifying properties of arithmetic circuits, cache-coherence protocols, and different kinds of processors, including pipelined and scalable processors, as well as a commercial processor. Papers on these attempts have appeared in recent conferences such as CAV and FMCAD. Intrigued by these attempts and results, we decided to try our theorem prover Rewrite Rule Laboratory (RRL) [11] for hardware verification, with the main objective of exploring which circuits and properties can be verified automatically in a push-button mode. We have also been interested in identifying extensions and enhancements to RRL which would make it better suited for this application. In [8] and [7], we discussed how RRL had been used for verifying ripple-carry, carry-lookahead and carry-save adders, as well as a family of multipliers including Wallace-tree and Dadda multipliers.

* Partially supported by the National Science Foundation Grant no. CCR-9712366.
** This work was done while the author was at State University of New York, Albany.
Our experience in using RRL has been very encouraging. RRL can be used effectively, essentially in push-button style, for proving number-theoretic properties of these circuits without having to fix their widths. Parametric circuits can be verified; descriptions common to a family of related circuits can be given and reasoned about. Proofs of components can be reused while attempting proofs of larger circuits; for example, while reasoning about multipliers, the adders used in them can be treated as black boxes insofar as they satisfy their specifications. In this paper, we discuss how RRL can be used for reasoning about SRT division circuits. After reading [2] and [18], we first suspected that considerable user interaction with, and guidance to, RRL might be needed to verify the main properties of the circuit. The reported use of Mathematica and Maple in [2, 4] for reasoning about inequalities and real numbers, as well as the use of dependent types, a table data structure, and other higher order features in [18], initially discouraged us from attempting a mechanical verification of the division circuit using RRL. We subsequently discovered, to our pleasant surprise, that the proof reported in [2] could easily be found using RRL without any user guidance; a brief sketch of that proof is given in [5]. In fact, the mechanization of that proof was the easiest to do in RRL, in contrast to the proofs of adders and multipliers in [8, 7]. We have recently found a much simpler and easier proof of the SRT division circuit by explicitly representing the quotient selection table. (It is widely believed that the bug in the Intel Pentium processor was in the quotient selection table.) In this paper, we discuss this new proof. Later, we contrast this proof with our earlier proof attempt as well as the proofs in [2, 18]. Four major features seem to have contributed to RRL's effectiveness in mechanization attempts for hardware verification:

1. Fast contextual rewriting and reasoning about equality [23].
2. Decision procedures for numbers and freely constructed recursive data structures such as lists and sequences, and, most importantly, their effective integration with contextual rewriting [6].
3. The cover set method for mechanization of proofs by induction [24], and its integration with contextual rewriting and decision procedures.
4. Intermediate lemma speculation heuristics.

In the next section, the SRT division algorithm and circuit are informally explained, with a special focus on the radix 4 SRT circuit. The interaction between the appropriate choice of radix, redundancy in quotient digits, quotient selection and remainder computations is briefly reviewed. The third section is a brief overview of the theorem prover RRL. Section 4 is an equational formalization of the SRT division circuit description in RRL. Section 5 is a brief sketch of how the proof of the two invariant properties of the circuit was done using RRL. Section 6 is a discussion of related work and our experience in using RRL for the SRT division circuit. Section 7 concludes with some remarks on possible enhancements to RRL to make it better suited for verifying circuits that use preprogrammed read-only memory (ROM).
2 SRT Division Algorithm and Circuit
The basic principles underlying the SRT division algorithm are reviewed in this section. The SRT division algorithm, proposed by Sweeney, Robertson [17] and Tocher [19], has been frequently used in commercial microprocessors due to its efficiency and ease of hardware implementation [20, 22]. Several expositions of the design of hardware divider circuits based on this algorithm appear in the literature [20, 15, 16, 3]. The SRT algorithm takes as input two normalized fractions, the dividend and the positive divisor, and outputs the quotient and the remainder. The focus in this paper is on this part of the division circuit, as in [4, 2, 18]. It is assumed that a normalization circuit for handling signs and exponents is correct. Much like the paper-and-pencil grade school division method, the SRT division algorithm is iterative: the quotient is computed digit by digit by repeatedly subtracting multiples of the divisor from the dividend. In each iteration, the algorithm selects a quotient digit, multiplies it with the divisor, and subtracts the result from the partial remainder computed so far. The result of the subtraction is the partial remainder for the next step. The partial remainder is initialized to the dividend divided by r. The algorithm terminates once all the quotient digits have been computed. The algorithm can be formalized in terms of the following recurrences:

P_0 := dividend / r,
Q_0 := 0,
P_{j+1} := r * P_j - q_{j+1} * divisor, for j = 0, ..., n-1,
Q_{j+1} := r * Q_j + q_{j+1}, for j = 0, ..., n-1,

where P_j is the partial remainder at the beginning of the j-th iteration, with 0 <= P_j < divisor for all j; Q_j is the quotient at the beginning of iteration j; q_j is the quotient digit at iteration j; n is the number of digits in the quotient; and r is the radix used for representing numbers.
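For concreteness, the recurrences can be exercised with a small Python model (an illustration added here, not part of the original formalization; it uses the naive non-redundant digit choice rather than an SRT selection table, and assumes 0 <= dividend/r < divisor):

```python
from fractions import Fraction

def digit_recurrence_divide(dividend, divisor, r, n):
    """Model of the radix-r digit recurrence with the naive (non-redundant)
    digit choice 0 <= q_{j+1} < r, which keeps 0 <= P_j < divisor."""
    p = Fraction(dividend, r)             # P_0 := dividend / r
    q = 0                                 # Q_0 := 0
    for _ in range(n):
        shifted = r * p
        digit = int(shifted // divisor)   # largest multiple of divisor not exceeding r * P_j
        p = shifted - digit * divisor     # P_{j+1} := r * P_j - q_{j+1} * divisor
        q = r * q + digit                 # Q_{j+1} := r * Q_j + q_{j+1}
    return q, p

# Dividing 7 by 3 with r = 4 and n = 3 digits; at the end, the invariant
# r**(n-1) * dividend == Q_n * divisor + P_n holds.
q, p = digit_recurrence_divide(7, 3, 4, 3)
```

Running the sketch, q and p satisfy 4**2 * 7 == q * 3 + p with 0 <= p < 3, which is the scaled form of dividend = quotient * divisor + remainder.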
The alignment of the partial remainders and the multiples of the divisor being subtracted is achieved by left shifting the partial remainder at each step (i.e., by multiplying P_j by the radix r). The correct positional placement of the quotient digit is similarly ensured by left shifting the partial quotient. The invariant 0 <= P_j < divisor ensures that at each step, the highest multiple of the divisor not exceeding the partial remainder is subtracted. SRT dividers used in practice incorporate several performance enhancing techniques while realizing the above recurrence. An important issue in implementing such an algorithm in hardware is the selection of the correct quotient digit at each step. A brute force strategy of enumerating multiples of the divisor until the subtraction leads to a number less than the divisor could be prohibitively expensive. SRT dividers instead use quotient digit selection functions, in the form of look-up tables, for guessing a quotient digit at each step of the division based on the partial remainder and the divisor. Two other major aspects resulting in the increased performance of SRT dividers are the choice of the radix for representing the quotient, and the choice
of a signed digit representation for the quotient digits. The former reduces the number of iterations required to compute the quotient, and the latter reduces the time taken in each iteration by speeding up the partial remainder computation. In [20], the tradeoffs between speed, radix choice, and redundancy of quotient digits are discussed.

2.1 Choosing Quotient Radix
In an SRT divider using radix 2, each iteration produces one quotient bit, and n iterations are required to produce a quotient of n-bit accuracy. The number of iterations can be reduced by choosing a higher radix. For example, with radix 4 only n/2 iterations are needed, since two quotient bits are generated at each step. The choice of a higher radix, however, entails a larger time per iteration, since the selection of the quotient digit and the generation of divisor multiples become more complicated. Typically, radix 4 is used in practice, since it seems to provide a reasonable trade-off between the number of iterations and the time spent in each iteration [20]. Multiplication by the quotient digits 0, 1, 2, and 3 can be performed by shifting and adding/subtracting. The SRT divider specified and verified in this paper uses radix 4.
2.2 Redundant Quotient Digit Representation
SRT dividers reduce the latency of each iteration by using a redundant signed-digit representation for the quotient digits. Typically, the digit values of a quotient represented with radix r range from 0 through r - 1. In contrast, in a redundant signed-digit representation, the digit values of a quotient with radix r are a consecutive set of integers [-a, a], where a is at least ⌈r/2⌉. Depending upon a, this allows for some redundancy. For example, a redundant signed-digit representation for a quotient with radix 4 would use the quotient digit set {-2, -1, 0, 1, 2}; this is in contrast to the 4 quotient digits commonly used for radix 4: {0, 1, 2, 3}. The value of a quotient with signed digits is interpreted by subtracting the positional weights of the negative digits from the non-negative ones. Due to the redundancy in the representation, more than one quotient can map onto the same number. For example, the quotients 10(-2) and 1(-1)2 in radix 4 both have the value 14 = 1 * 4^2 - 2 * 1 = 1 * 4^2 - (1 * 4) + 2. An advantage of using the above quotient digit set is that divisor multiples are generated simply by shifting. This is in contrast to the unsigned quotient digit representation for radix 4, for which it is necessary to implement a shift followed by an add/subtract to generate 3 times the divisor. More importantly, redundancy among quotient digits allows the quotient digits to be selected based on only a few significant bits of the partial remainder and the divisor. This reduces the complexity of the quotient selection table, and allows the multiplication and subtraction stages of an iteration to be overlapped with the quotient selection stage of the successive iteration. The radix 4 SRT divider in this paper uses the redundant signed-digit representation [-2, 2].
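The signed-digit interpretation above can be sketched as a small Python helper (illustrative only; digit lists are written most significant digit first, and the function name sd_value is ours):

```python
def sd_value(digits, r=4):
    """Value of a signed-digit numeral: negative digits subtract
    their positional weight, non-negative digits add it."""
    v = 0
    for d in digits:
        v = r * v + d
    return v

# Two redundant radix-4 representations of the same value:
#   1 0 (-2)  ->  1*4**2 - 2*1      = 14
#   1 (-1) 2  ->  1*4**2 - 1*4 + 2  = 14
```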
Fig. 1. P-D Plot for Radix 4
2.3 Quotient Selection Function
The SRT division algorithm with a redundant signed-digit quotient representation allows the selected quotient digits to be inexact within certain bounds; the partial remainder generated in a step may even be negative. The bound on the successive partial remainders using a redundant signed-digit representation [-a, a] for radix r is

-D * a/(r - 1) <= P_j <= D * a/(r - 1),

where D is the divisor. By substituting the recurrence for the successive partial remainders, the range of shifted partial remainders that allows a quotient digit k to be chosen is:

[(k - a/(r - 1)) * D, (k + a/(r - 1)) * D].
The correlation between the shifted partial remainder P and the divisor D in SRT division algorithms is diagrammatically captured in a P-D plot. The shifted partial remainder and the divisor form the axes of the plot, which illustrates the shifted partial remainder ranges in which a quotient digit can be selected without violating the bounds on the next partial remainder. The P-D plot for a radix 4 quotient with redundant digit set [-2, 2] is given in Figure 1. As the reader will notice, when the partial remainder is in the range [5/3 D, 8/3 D], the quotient digit 2 is selected. The shaded regions represent quotient digit overlaps, where more than one quotient digit selection is feasible. So if the partial remainder is in the range [4/3 D, 5/3 D], either 2 or 1 can be used.
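The overlap just described follows directly from the range formula; an illustrative Python check (added here, with Fraction used to keep the endpoints exact):

```python
from fractions import Fraction

def digit_range(k, r=4, a=2):
    """Endpoints (as multiples of the divisor D) of the shifted partial
    remainder range in which quotient digit k may be selected:
    [(k - a/(r-1)) * D, (k + a/(r-1)) * D]."""
    rho = Fraction(a, r - 1)
    return k - rho, k + rho

lo2, hi2 = digit_range(2)               # (4/3, 8/3): digit 2 is usable here
lo1, hi1 = digit_range(1)               # (1/3, 5/3): digit 1 is usable here
overlap = max(lo1, lo2), min(hi1, hi2)  # (4/3, 5/3): digits 1 and 2 both work
```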
Table 1. Quotient Digit Selection Table (reproduced from [20]). Rows are indexed by the shifted truncated partial remainder g7g6g5g4.g3g2g1 (1010.0 through 0101.1); columns are indexed by the truncated divisor f1.f2f3f4 (1.000 through 1.111); entries are the quotient digits, including the symbolic entries A, B, C, D, E, with '-' marking inaccessible combinations.
For selecting an appropriate quotient digit, it is not necessary to know the exact value of the shifted partial remainder P or the divisor D. It suffices to know the region of Figure 1 in which the ratio P/D lies. Due to the overlap between the lower bound of the P/D ratio for quotient digit k and the upper bound for quotient digit k - 1, the P/D ratio can be approximated when choosing quotient digits. For instance, for a radix 4 SRT divider with partial remainders and divisor of width n, n > 8, it suffices to consider partial remainders up to 7 bits of accuracy and a divisor up to 4 bits of accuracy [20]. The quotient selection table implementing the P-D plot for radix 4 is reproduced above from [20]. Rows are indexed by the shifted truncated partial remainder g7g6g5g4.g3g2g1; columns are indexed by the truncated divisor f1.f2f3f4; table entries are the quotient digits. The table is compressed by considering row indices of only up to 5 bits, since only a few entries in the table depend upon the 2 least significant bits g2g1 of the shifted partial remainder. For those cases, the table entries are the symbolic values A, B, C, D, E, defined as:

A = -(2 - g2*g1),   B = -(2 - g2),   C = 1 + g2,   D = -1 + g2,   E = g2.
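These definitions can be transcribed directly; a minimal Python rendering (the function name symbolic_entry is ours, added for illustration):

```python
def symbolic_entry(name, g2, g1):
    """Quotient digit denoted by a symbolic table entry, as a function of the
    two least significant bits g2, g1 of the shifted truncated partial remainder."""
    return {
        'A': -(2 - g2 * g1),
        'B': -(2 - g2),
        'C': 1 + g2,
        'D': -1 + g2,
        'E': g2,
    }[name]
```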
These entries, as well as other aspects of the selection table, are further discussed in subsection 4.1, where we show how the table is input to RRL. The '-' entries in the table are for those pairs of shifted truncated partial remainder and truncated divisor that are not supposed to arise during the computations.

2.4 Divider Circuit
Fig. 2. SRT Division Circuit using Radix 4

A radix 4 SRT divider circuit using the signed digit representation [-2, 2] is given in Figure 2. The registers divisor and remainder in the circuit hold the value of the divisor and the successive partial remainders, respectively. The register q holds the selected quotient digit along with its sign; the registers QPOS and QNEG hold the positive and negative quotient digits of the quotient. A multiplexor MUX is used to generate the correct multiple of the divisor based on the selected quotient digit, by appropriately shifting the divisor. The hardware component QUO LOGIC stands for the quotient selection table, and is typically implemented using an array of preprogrammed read-only memory. The hardware component DALU is a full width ALU that computes the partial remainder at each iteration. The component GALU (the guess ALU [20]) is an 8-bit ALU that computes the approximate 8-bit partial remainder to be used for quotient selection. The <<2 components perform a left shift by two bits (multiplication by 4). The circuit is initialized by loading dividend/4 (obtained by right shifting the dividend by 2 bits) and the divisor into the remainder and divisor registers. The quotient is initialized to zero by setting the registers QPOS and QNEG to zero. The quotient digit register q is initialized by the appropriate alignment of the dividend and the divisor. At each iteration, the correct multiple of the quotient digit and the divisor is output by MUX. This output and the partial remainder in the remainder register are input to DALU to compute the next partial remainder. An 8-bit estimate of the partial remainder in the remainder register and an 8-bit estimate of the output of MUX are input to GALU.
GALU computes an 8-bit estimate of the next partial remainder, which is left shifted (multiplied by 4) and then used, together with the truncated divisor (dl), to index into QUO LOGIC to select the quotient digit for the next iteration. Note that GALU and the quotient digit selection run in parallel with the full width DALU, so that the correct quotient digit value is already available in the register q at the beginning of each iteration. This relationship between the inputs and outputs of DALU and GALU is captured using the predicate GALU_desc(rout, rin, md, rin1, md1, qsign, rout1) in Section 5.2, where the correctness of the circuit is discussed. The circuit is formalized in Section 4.
3 A Brief Overview of RRL
Rewrite Rule Laboratory (RRL) 11 is different in its design philosophy from most proof checkers such as PVS, IMP, HOL, Isabelle, NUPRL, LP, in the sense it attempts to perform most inferences automatically without user guidance. Many proofs in RRL can be generated automatically; RRL can be used in such cases as a push-button theorem prover. In fact, that is how we typically use RRL for finding proofs, starting without having any clue about how a proof can be done manually. When a proof attempt fails and a proof cannot be found automatically, the transcript is looked at, which may reveal a variety of things. The conjecture may have to be modified, a definition may have be to fixed, or perhaps, an intermediate lemma needs to be hypothesized. RRL supports some heuristics for automatically generating intermediate lemmas based on formulas generated during a proof attempt. Lemmas which cannot be generated automatically by RRL must be provided by the user. This is where RRL needs guidance from the user. Below, we briefly discuss main features of RRL found useful for hardware verification. The specification language supported in RRL is equational, with support for defining abstract data types using constructors. Circuits and their behavioral specifications are given as equations and conditional equations. Definitions are distinguished from properties (lemmas) using := for definitions and == for properties to stand for the equality symbol. The correctness of circuit descriptions is established by proving various properties about these descriptions, and showing that they meet the behavioral specifications. After transforming definitions into terminating rewrite rules, RRL attempts to prove a conjecture by normalizing its two sides using contextual rewriting 23 and the decision procedures for discharging any hypotheses, if any, and checking whether the normal forms of the two sides of the conjecture are identical. 
If it succeeds, then the proof is said to have been obtained using equational reasoning and decision procedures. Otherwise, a proof by induction is attempted using the cover set method of generating induction schemes [24]. RRL has built-in heuristics for:

1. orienting equations into terminating rewrite rules,
2. identifying the next rewrite rule to apply for simplification, and, for that, determining the instantiation of the free variables and discharging the conditions, if any, of the rewrite rule,
3. invoking decision procedures for numbers (quantifier-free Presburger arithmetic), bits, data types with free constructors, and propositional logic,
4. selecting the next inference rule,
5. automatic case analyses,
6. choosing induction schemes based on the definitions of the function symbols appearing in a conjecture and the interaction among these definitions,
7. generating the intermediate lemmas needed, as well as
8. automatic backtracking when a proof attempt fails.

The user is thus relieved of the task of having to determine the sequence in which rewrite rules should be applied, when decision procedures should be invoked, how rewrite rules should be instantiated, when induction is performed, and what variables and induction scheme should be used for induction. The cover set method generates induction schemes based on the well-founded orderings used to establish termination of the function definitions. Based on an induction scheme, the conjecture is split into subgoals to be proved in order to prove the original conjecture. Each subgoal is then tried just like the original conjecture. If a proof attempt based on a particular induction scheme does not lead to a counterexample, but also does not succeed, RRL automatically backtracks to pick another induction scheme. While attempting to find proofs, RRL automatically attempts to generate the new intermediate lemmas needed to prove a conjecture. Currently, RRL implements a simple heuristic for generalizing conjectures by abstracting to new variables common subexpressions appearing in a conjecture and satisfying certain criteria.
New intermediate lemma speculation heuristics have been investigated in [9, 10], and will be implemented, as we consider intermediate lemma speculation research to be the most critical for automating proofs by induction.
4 Formalizing SRT Division in RRL
The SRT divider in Figure 2 is equationally specified in RRL. We first discuss how the quotient selection table is axiomatized. The recurrence relations for the partial remainder and the quotient are then axiomatized using the quotient digit.

4.1 Formalizing Quotient Selection Table
The quotient selection table can be input essentially as is into RRL. As discussed earlier, even though the partial remainder is truncated to 7 bits, only 5 bits are used to index the rows of the table. Every entry in the table thus covers four remainder estimates. In some cases, the table entry depends upon the
two least significant bits in the estimate. For showing that dependency, the symbolic entries A, B, C, D, E are used in the table, and their values depend upon the values of these bits. The table is input to RRL by defining the function nextqdigit below, such that, given a row and a column index, it gives the entry in the table. Instead of using fractional numbers for indices, it is more convenient and faster for the prover to use their scaled integer versions as indices to the table. So all row and column indices are scaled up by 8. (Scaling up effectively amounts to using the number representations of the bit vectors of the shifted truncated partial remainder estimate and the truncated divisor estimate, by dropping the decimal point.) Since the table is big, we give only a partial specification of one of the rows, the eighth row, to illustrate how it is input into RRL. The eighth row is indexed by -5/2 (ignoring the two least significant bits), scaled up by multiplying by 8 to -20 (2's complement representation is used in the table for row indices); the columns are indexed by 8/8 to 15/8, and they are also scaled up by multiplying by 8. The function m below stands for the minus operation.

nextqdigit(m(20),  8) := m(2),    nextqdigit(m(20),  9) := m(2),
nextqdigit(m(20), 10) := m(2),    nextqdigit(m(20), 11) := m(2),
nextqdigit(m(20), 12) := m(1),    nextqdigit(m(20), 13) := m(1),
nextqdigit(m(20), 14) := m(1),    nextqdigit(m(20), 15) := m(1).
The eighth row corresponds to four shifted truncated remainder estimates, {-5/2, -19/8, -9/4, -17/8}, depending upon the values of g2g1. It includes B when the column index is 1.011, where B = -(2 - g2). For all other column indices, the entries do not depend upon g2g1, so nextqdigit is the same irrespective of whether the first argument is -20, -19, -18, or -17. If the second argument of nextqdigit is 11, then its value is -2 if the first argument is -20 or -19, since in that case g2 is 0; if the first argument is -18 or -17, then nextqdigit is -1. Below, we give the specification for those cases when the second argument is 11.

nextqdigit(m(19), 11) := m(2),
nextqdigit(m(18), 11) := m(1),
nextqdigit(m(17), 11) := m(1).
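For illustration, the eighth-row fragment can be mirrored in Python (a hypothetical transliteration, not RRL input; m(x) becomes -x, and a fallback models the convention that rows -19, -18, -17 share row -20's entries unless overridden):

```python
# Explicit entries of the eighth-row fragment; keys are (scaled row, scaled column).
eighth_row = {(-20, c): -2 for c in (8, 9, 10, 11)}
eighth_row.update({(-20, c): -1 for c in (12, 13, 14, 15)})
# Column 11 holds the symbolic entry B = -(2 - g2), so rows -19..-17 are listed explicitly.
eighth_row.update({(-19, 11): -2, (-18, 11): -1, (-17, 11): -1})

def nextqdigit(truncp, truncd):
    # Rows -19, -18, -17 share row -20's entries except where overridden above.
    return eighth_row.get((truncp, truncd), eighth_row.get((-20, truncd)))
```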
Other rows in the table lead to similar specifications, with each row defining 32 entries. Inaccessible table entries (represented as '-' in the above table) are represented as a large negative or a large positive integer, depending on whether they lie at the top or the bottom of the table, respectively. The specification defines nextqdigit on 768 pairs of indices. In our first proof attempt for the above SRT circuit, as sketched in [5], we followed an approach similar to the one taken in [2] for specifying the quotient selection table. The table in that proof is not explicitly represented, but is instead abstracted as a predicate defined using six boundary values denoting the endpoints of the partial remainder ranges for the choice of the five quotient digits in [-2, 2]. For example, the quotient digit -2 is chosen whenever the partial remainder is in the range [b1, b2). The boundary value b1 is the minimum value of
the shifted truncated partial remainder estimate for which the selected quotient digit is -2; b1 is explicitly enumerated for every truncated divisor value. Similarly, the other boundary values are specified. RRL was successful in obtaining a proof similar to the one reported in [2]. That proof has fewer cases, but the reasoning about each case is complex and more tedious, taking more CPU time, in contrast to the proof discussed later, which has more cases but in which each case can be easily proved. We will contrast that proof, based on such an intensional formalization of the quotient selection table, after we have discussed, in the next section, the proofs of invariants using the above explicit, extensional representation of the table.

4.2 Formalizing Partial Remainder and Quotient Computations
The recurrence relations for the partial remainders and quotients discussed above can be equationally axiomatized as two functions, nextparrem and nextquot, respectively.^3 The inputs of nextquot are: the quotient computed so far, as well as the scaled truncated partial remainder and the scaled truncated divisor, used for indexing the quotient selection table when selecting the quotient digit. The new quotient is obtained by either adding or subtracting the selected quotient digit from the previous quotient, based on the sign of the selected quotient digit.^4

nextquot(quotient, truncp, truncd)
    := 4 * quotient + nextqdigit(truncp, truncd).
The function nextparrem for computing the partial remainder is similarly defined. Its inputs are: the partial remainder computed at the previous iteration, the scaled truncated previous partial remainder, the divisor, and the scaled truncated divisor.

nextparrem(parrem, truncp, divisor, truncd)
    := 4 * parrem - nextqdigit(truncp, truncd) * divisor.
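Transliterated into Python (an illustrative sketch; the selection table is passed in as a function, since only a fragment of it is specified here):

```python
def nextquot(quotient, truncp, truncd, nextqdigit):
    # Q_{j+1} := 4 * Q_j + q_{j+1}, with the digit read from the selection table
    return 4 * quotient + nextqdigit(truncp, truncd)

def nextparrem(parrem, truncp, divisor, truncd, nextqdigit):
    # P_{j+1} := 4 * P_j - q_{j+1} * divisor
    return 4 * parrem - nextqdigit(truncp, truncd) * divisor
```

Whatever digit the table returns, Q_{j+1} * divisor + P_{j+1} equals 4 * (Q_j * divisor + P_j), which is precisely the invariant proved in the next section.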
5 A Correctness Proof of SRT Division in RRL
The correctness of the SRT divider circuit is established by proving two invariants about the circuit. It is assumed that the divider circuit is initialized appropriately, with dividend/4 as the initial partial remainder and with the initial quotient being zero.

^3 nextparrem and nextquot were called, respectively, nrout and nquot in [5].
Footnote 4: In the circuit, addition/subtraction in GALU and DALU is selected based on the sign of the quotient digit. The functions nextquot and nextparrem could easily have been specified as adding or subtracting based on the sign of the quotient digit, exactly mimicking that selection in the circuit; in fact, that is how nextquot and nextparrem were specified in our first proof attempt [5]. Proofs using such specifications can be found by RRL just as easily, except that there are more subcases. The above formalization is more compact, with the quotient sign not made explicit but instead being part of the quotient digit. The DALU computation (addition or subtraction) gets abstracted, leading to fewer subcases.
5.1 Invariant Relating the Partial Remainder and Quotient to the Dividend
The first invariant states that in every iteration, the dividend equals the sum of the partial remainder computed so far and the divisor multiplied by the quotient computed so far. However, as discussed in Section 2, to mechanically align the partial remainder with the multiple of the divisor, in each step the partial remainder is left-shifted; similarly, the quotient is also left-shifted to align the quotient digit. The quotient and the partial remainder at step i are thus scaled up from their actual values by 4^i. The relationship between the dividend, the partial remainder, and the divisor is:

4^(i-1) * dividend = Q_i * divisor + P_i.
The above property can be verified by first showing that it holds initially (i = 0), which is indeed the case assuming that the partial remainder and quotient are properly initialized. For the other steps, it follows from the following invariant relating the partial remainders and quotients of successive iterations:

Q_{i+1} * divisor + P_{i+1} = (Q_i * divisor + P_i) * 4.
This is input to RRL as:

nextparrem(parrem, truncp, divisor, truncd) + nextquot(quotient, truncp, truncd) * divisor == 4 * (parrem + (quotient * divisor))
This property is established automatically in RRL by contextual rewriting and the decision procedure for linear arithmetic.

5.2 Partial Remainders Never Go Out of Bounds
The second invariant is the more interesting one. It establishes that the division circuit converges to a correct quotient and does not overflow. As discussed in Section 2, to ensure this for a radix-4 SRT divider with the redundant quotient digit set [-2, 2], the successive partial remainders must be bounded as:

-2/3 * divisor < P_i <= 2/3 * divisor.

Since the partial remainder computation in every iteration depends on the quotient digit selected, this invariant also establishes the correctness of the quotient selection table indexed by the shifted truncated partial remainder and the truncated divisor. The second invariant is specified to RRL as follows (m denotes unary minus, so m(2) stands for -2):

m(2) * divisor <= 3 * nextparrem(parrem, truncp, divisor, truncd) and 3 * nextparrem(parrem, truncp, divisor, truncd) <= 2 * divisor
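The convergence condition behind this invariant, namely that whenever the current partial remainder is within bounds, some digit in {-2, ..., 2} keeps the next one within the same bounds, can be spot-checked exhaustively over a grid of exact rationals. The following is an illustrative Python check of ours, not RRL input; the grid step of 1/64 is an arbitrary choice for the sketch.

```python
from fractions import Fraction

# Grid check: whenever -2/3*d < P <= 2/3*d, some digit q in {-2,...,2}
# keeps the next partial remainder 4*P - q*d within the same bounds.
def digit_exists(P, d):
    return any(-2 * d < 3 * (4 * P - q * d) <= 2 * d for q in range(-2, 3))

ok = True
for dnum in range(64, 128):                      # divisors d in [1, 2), step 1/64
    d = Fraction(dnum, 64)
    for pnum in range(-2 * dnum, 2 * dnum + 1):  # sample 3*P in (-2d, 2d]
        P = Fraction(pnum, 3 * 64)
        if -2 * d < 3 * P <= 2 * d:
            ok = ok and digit_exists(P, d)
print(ok)   # True: the five digit intervals tile the reachable range
```

Of course the grid check is only suggestive; the point of the RRL proof is that the property holds for all reals in the range, via the table's case analysis.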
where

GALU_desc(rout, rin, md, rinl, mdl, qsign, routl) ==
  (rinl <= rin) and (64 * rin < 64 * rinl + 1) and
  (mdl <= md) and (64 * md < 64 * mdl + 1) and
  cond(qsign = 0, routl = rinl + mdl, 64 * routl = 64 * rinl - 64 * mdl - 1) and
  cond(qsign = 0, rout = rin + md, rout = rin - md).
The invariant states that if

1. the partial remainder is within the bounds in the previous iteration,
2. truncd is the scaled-up version of the truncation of the divisor up to 3 bits after the decimal point, and
3. truncp is the scaled-up version of the shifted output routl (multiplication by 4) of GALU,

then the partial remainder in the next iteration is also within the bounds. The predicate GALU_desc describes the 8-bit GALU unit, which approximates the DALU computation. Given a partial remainder and a quotient digit multiplied by the divisor, DALU computes the next partial remainder by subtraction (or addition, depending upon the sign of the quotient digit). GALU performs the same computation as the full ALU unit DALU; however, GALU operates on truncated versions of the inputs to DALU, and is much faster. The quotient digits are selected using the GALU output, and this allows the quotient digit selection for the next iteration to be done in parallel with the fully precise DALU computation. The first two conjuncts in the formula describing the GALU circuit assert that the truncated partial remainder approximates the corresponding partial remainder up to 6 bits. The next two conjuncts make a similar assertion for the second input, the truncated divisor multiplied by the quotient digit. The first conditional expression describes the behavior of GALU and the second describes the behavior of DALU (footnote 5). The second invariant is proved automatically by RRL by case analysis on the quotient selection table entries. Subcases are generated from the cover set of nextqdigit. There are 768 subcases, one for each pair of shifted truncated partial remainder and truncated divisor estimates. In some subcases, further case analysis is done on qsign being 0 or 1. This analysis is repeated by RRL for the upper bound and the lower bound.
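The truncation conjuncts in GALU_desc can also be checked mechanically outside RRL. The sketch below is hypothetical Python of ours (trunc64 is our name for floor-truncation to 6 fractional bits); it confirms that rinl <= rin and 64 * rin < 64 * rinl + 1 hold whenever rinl is rin truncated to a multiple of 1/64.

```python
import math
from fractions import Fraction

def trunc64(x):
    # Truncate x downward to a multiple of 1/64 (6 fractional bits).
    return Fraction(math.floor(64 * x), 64)

# The two GALU_desc conjuncts for the first input hold by construction,
# including for negative values:
for num in range(-500, 500):
    rin = Fraction(num, 97)          # arbitrary exact sample points
    rinl = trunc64(rin)
    assert rinl <= rin and 64 * rin < 64 * rinl + 1
```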
Each of these cases is proved very easily using rewriting and the linear arithmetic decision procedure, with RRL taking a total CPU time of 58 seconds on a Sun Sparc 5 with 64MB of memory. This is despite the fact that RRL currently uses a unary representation of numbers, leading to very large expressions which must be rewritten. A slight modification, representing numbers efficiently, is likely to reduce the total proof time by at least an order of magnitude, if not two.

Footnote 5: The built-in primitive cond in RRL is used for constructing conditional expressions. It is used by the prover for generating case analyses.
For those pairs of truncated estimates used as indices to select a quotient digit in the range [-2, 2], the property that the partial remainder in the next iteration remains within the bounds follows from the property that the partial remainder in the previous iteration is within the bounds. For truncated estimates corresponding to the table entry '-', the hypotheses in the above invariant become inconsistent because of the constraint that the partial remainder in the previous iteration was within the bounds. We also show that no pair of indices other than those in Table 1 needs to be considered. Only the specified column indices can occur, since the divisor is normalized and positive, and we have assumed that the truncated divisor correctly approximates the divisor up to 3 bits. The completeness of the row indices is established in RRL by the following property, which is easily proved by RRL using rewriting and linear arithmetic:

not(truncp < m(48)) and not(47 < truncp) if m(2) * divisor <= 3 * parrem and 3 * parrem <= 2 * divisor
A major noticeable difference between this proof and our earlier proof reported in [5] (as well as the proof in [2]) is that the second invariant is proved directly, without using any assumptions. In contrast, the second invariant relied on two assumptions in our earlier proof, much like the proof in [2]. These assumptions (that the output of GALU is the truncation of the output of DALU, and that the output of GALU along with the truncated divisor selects proper quotient digits from the quotient selection table) first had to be identified manually and then proved separately, thus making the proof interactive. In the above proof, these properties are derived automatically and used appropriately to establish the second key invariant. Additionally, the new proof establishes the completeness of the indices of the table in [20]. This is in contrast to our earlier proof (and the proof in [2]), where the rows in the table of [20] implicitly extend in both directions with the out-of-bounds values.
5.3 Detecting Errors in the Quotient Digit Selection Table
Despite our careful transcription of the table from [20], two errors were made in the table specification that was input into RRL. We discuss below how these bugs were detected using RRL, since this could be illustrative of finding possible bugs in a quotient selection table. Both errors in the transcription of the quotient selection table were detected while attempting a proof of the second invariant. Since every quotient selection table entry gives rise to an intermediate subgoal of the second invariant, RRL is unable to prove the subgoal corresponding to an erroneous entry. It also
explicitly displays the table index values from which the corresponding subgoal was generated. Using this, it is straightforward to determine the table indices associated with a failing subgoal. The first error was in the table entry indexed by (1111.0, 1.000). This entry was erroneously assigned a value of -2 instead of -1. With this entry, RRL established that the partial remainder would violate its upper bound, as explained below. Based on the table entry, routl = -1/4, truncp = -8 (i.e., m(8)), truncd = 8, and nextqdigit(m(8), 8) = -2. Let divisor = 1 and parrem = -6/25. Under these values, the conjuncts m(2) * divisor <= 3 * parrem and 3 * parrem <= 2 * divisor in the hypothesis of the second invariant reduce to true. The conjuncts truncd <= 8 * divisor and 8 * divisor < truncd + 1 also reduce to true. Finally, the predicate GALU_desc is true, implying that GALU approximates the partial remainder correctly up to 5 bits. However, the next partial remainder computed violates the upper bound, since 4 * (-6/25) + 2 * 1 = 26/25 is greater than 2/3.
The second error was in the table entry indexed by (0100.0, 1.111). The entry was erroneously assigned the out-of-bounds value (we used 10 for it) instead of 2. For this entry, using an analysis similar to the one above, RRL established that the partial remainder would violate its lower bound.
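The arithmetic behind the first counterexample can be replayed directly. The following is plain Python with exact fractions, our illustration rather than RRL output: with the erroneous digit -2 the next partial remainder overshoots the upper bound, while the correct digit -1 keeps it in bounds.

```python
from fractions import Fraction

divisor = Fraction(1)
parrem = Fraction(-6, 25)

# Previous partial remainder is within bounds: -2/3 <= parrem <= 2/3.
assert -2 * divisor <= 3 * parrem <= 2 * divisor

# Next partial remainder is 4*parrem - q*divisor for the selected digit q.
bad  = 4 * parrem - (-2) * divisor   # erroneous entry q = -2 gives 26/25
good = 4 * parrem - (-1) * divisor   # correct entry q = -1 gives 1/25

assert 3 * bad > 2 * divisor                    # upper bound violated
assert -2 * divisor < 3 * good <= 2 * divisor   # within bounds
```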
6 Comparison with Related Work
Given that we have now discussed our proof, this section compares it with related work. We believe that the above formalization, as well as the proof, are much simpler than other formalizations of the related circuits in the literature (footnote 6). Verkest et al. [21] discussed a proof of a nonrestoring division algorithm using Boyer and Moore's prover. Leeser and O'Leary [12] discussed a subtractive radix-2 square root circuit using the NuPRL system. Square root and division circuits are considered to be related to each other [20], but the circuits verified in [21, 12] are not based on the SRT method. Since Intel's Pentium bug was reported in the media, there has been a lot of interest in automated verification of SRT divider circuits [18, 2, 4, 13]. Bryant [1] discussed how BDDs can be used to perform a limited analysis of some of the invariants of the SRT circuit. As reported in [18], Bryant had to construct a checker circuit much larger than the verified circuit to capture its specification. As reported in [2], German and Clarke [4] formalized Taylor's description of the SRT division circuit [20] as a set of algebraic relations on the real numbers. According to them, most of the hardware for the SRT algorithm can be described using linear inequalities. They used Maple, a computer algebra system, to prove properties of the SRT circuit by reasoning about linear inequalities using its Simplex algorithm package.

Footnote 6: We are, however, also guilty of not making the proof even simpler by using fractions instead of integers.
This formalization was subsequently improved in [2] by Clarke, German, and Zhao using the language of Analytica, a theorem prover built on top of Mathematica, another commercially available computer algebra system. To quote the authors: "Analytica is the first theorem prover to use symbolic computation techniques in a major way.... Compared to Analytica, most theorem provers require significant user interaction. The main problem is the large amount of domain knowledge that is required for even the simplest proofs. Our theorem prover, on the other hand, is able to exploit the mathematical knowledge that is built into the symbolic computation and is highly automatic." A correctness proof of the SRT divider circuit was then done using Analytica. The main feature of the proof was an abstraction of the quotient selection table using the six boundary value predicates, as discussed in Subsection 4.1. This abstraction had to be provided manually by Clarke et al. The proof of the invariants using this intensional representation of the quotient selection table involves reasoning about inequalities, which can become quite tedious and involved. Perhaps that is why Clarke et al. had to use Analytica: Mathematica algorithms for symbolic manipulation and inequality reasoning, and the Analytica part built on Mathematica for logical reasoning. Even though it is claimed in [2] that the proof is "fully automatic" (p. 111 in [2]), the proof (especially the proof of the second invariant regarding the boundedness of partial remainders) had to be decomposed manually, and the two assumptions had to be discharged manually. Our first proof attempt, discussed in [5], was essentially an exercise to determine how much of the Analytica proof [2] could be done automatically by RRL without having to use any symbolic computation algorithms of computer algebra systems.
We mimicked the proof in [2], making the data dependency of the various circuit components on different data paths explicit, as well as identifying the different assumptions made in the proof reported in [2]. Much to our surprise, once we succeeded in translating the Analytica specification into RRL's equational language (which, by the way, was the most nontrivial part of this proof), RRL was able to find proofs of all the formulas (the first invariant and the second invariant with the assumptions, as well as the discharging of the assumptions) automatically, without any interaction. The main inference method used was contextual rewriting integrated with the linear arithmetic procedure in RRL. No additional mechanism was found necessary, and no extensions had to be made to RRL. Further, the entire proof (including the proof of the second invariant, which involved over 96 cases) could be done in less than 15 minutes on a Sun Sparc 5 workstation (footnote 7). No timing information is given in [2] for their proof using Analytica. A brief comparison of the above proof with the proof reported in [5], based on representing the table using boundary value predicates, follows:

1. The axiomatization using the explicit table representation is much simpler, and it does away with an aspect of specification development in which human guidance is used for abstracting table entries as predicates.

Footnote 7: We would also like to point out that RRL is an old piece of software written in Common Lisp.
2. Since the abstract representation of the quotient selection table using boundary value predicates in [5, 2] considers only the minimum and maximum values of the partial remainder for every quotient digit, thus losing information on other relations among entries, it is possible to certify erroneous tables as correct. For instance, the first error in the table discussed above could be detected using the explicit representation of the table, but would go undetected using the boundary value predicate formulation (footnote 8).

3. Even though the above proof has nearly 800 subcases, the proof of each subcase is much easier and a lot quicker to obtain, as a subformula typically involves numeric constants that can be easily simplified and reasoned about. This is in contrast to the proof in [5] from the specification of the table using boundary value predicates, which has 96 subcases. Each subcase in that proof involves a formula with variables under linear constraints, and the reasoning is not as simple.

4. The above proof of the second invariant is direct, without making any assumptions. In our earlier proof (as in the proof in [2]), two assumptions had to be made, which were subsequently established.

5. The new proof takes less than 1 minute even with more subcases, in contrast to the earlier proof, which took around 15 minutes using the same version of the prover on the same machine.

The proof reported in [18] using the PVS system is more general than the above proof as well as the proofs in [2, 5]. Its structure is essentially based on an earlier proof of Clarke and German [4]. First, a general theory of SRT division for an arbitrary radix r and an arbitrary redundant quotient digit range [-a, a] is developed. Constraints on a quotient digit selection table are identified in terms of r, a, and other parameters. This theory is then instantiated for the radix-4 SRT division circuit. The constraints on the table are also instantiated for a specific quotient digit selection table.
The specification and the proof are organized manually using sophisticated mechanisms of the PVS language, which supports higher-order logic, dependent types, overloading, a module facility, and a special table data type [18]. The specification is developed with considerable human ingenuity, and the resulting proof is manually driven, even though parts of it can be done automatically using previously developed PVS tactics. As reported in [18], the correctness proof of the table implementation alone took 3 hours of CPU time, with the whole proof taking much longer even with the user's help. Miner and Leathrum's work [13] is a generalization of the proof in [18]; it formalizes a subset of the IEEE floating point standard and uses it to carry out the proof for floating point arithmetic, thus providing a formal link relating the SRT division algorithm, among other things, to the IEEE standard subject to IEEE-compliant rounding. This proof uses even more sophisticated features of the PVS system and is highly interactive.

Footnote 8: It may, however, be possible to avoid such errors if additional constraints, such as the monotonicity of entries in every row of the table, are also included in its specification along with the boundary value predicates.
Moore et al. [14] reported a proof of correctness of the kernel of a microcoded floating point division algorithm implemented in AMD's 5K86 processor. The algorithm is defined in terms of floating point addition and multiplication. The proof is done using ACL2, a descendant of Boyer and Moore's prover. No claim is made about making the proof automatic; rather, the main emphasis is on formalizing IEEE floating point arithmetic and compliant rounding, proving a library of relevant and useful properties of floating point arithmetic, and using it to verify the division algorithm, which is based on the Newton-Raphson method.

7 Concluding Remarks and Future Enhancements to RRL
We have discussed a proof of correctness of the SRT division algorithm using the rewrite-based theorem prover RRL. It is believed that an algorithm similar to the one discussed above was implemented in the Intel Pentium chip, which was found to be buggy. The bug is suspected to be either wrong entries in the quotient digit selection table, or that certain portions of the table considered inaccessible could in fact be accessed during the division computation. The salient features of the above proof are that:

1. the formalization is much simpler than the ones reported in [2, 18];
2. the quotient digit selection table is specified explicitly, in contrast to the specification of the table in [2] in terms of boundary value predicates, an abstraction that a human designer would have to perform;
3. the proof of the second key invariant about the circuit is simpler, making fewer assumptions than the proof reported in [2];
4. finding the proof takes far fewer resources than the other proofs (58 seconds of CPU time on a Sun Sparc 5 with 64MB of memory); and
5. possible bugs in the quotient digit selection table can be easily identified.

From our work on using RRL for mechanizing the verification of arithmetic circuits, a number of lessons can be drawn about possible enhancements to rewrite-based provers such as RRL to make them better suited to hardware verification. Perhaps the most critical feature of a prover for this application is that it should be able to perform simplification (rewriting) using conditional rewrite rules very efficiently. Secondly, a prover should be able to do automatic case analyses quickly. An important research problem is to develop heuristics for efficient case analysis that minimize duplication of effort by recognizing a common subset of assumptions sufficient to establish many related subcases.
Thirdly, for proofs of parameterized and generic circuits with subcircuits specified by constraints on their input/output behavior, a theorem prover should be able to perform proofs by induction. A related issue is the automatic generation/speculation of intermediate lemmas, as behavioral specifications often cannot be proved directly from circuit specifications. We have briefly discussed these issues in [5]. Mechanizing the verification of the SRT division circuit highlights additional issues related to representing and reasoning about data structures found useful
in describing such circuits. The above proof involves reasoning about numeric constants while dealing with truncated partial remainders and truncated divisors, as well as bounds on the partial remainder. A prover should be able to efficiently represent, display, and reason about numeric constants. RRL does not provide good support for explicitly representing and manipulating numeric constants. Numbers are represented in unary notation using 0, successor, and predecessor. In order to specify the SRT divider, the numeric constants have to be specified by the user as abbreviations defined in terms of these constructors. This imposes an unnecessary burden on the user, who must specify such trivial details. Further, in proofs generated by RRL, these abbreviations are expanded into their unary representations, leading to unnecessarily large formulas. Rewriting such formulas is considerably slower, and intermediate formulas with such large term structures also make the mechanical proofs hard to read and understand. A better representation for numbers, and reasoning support for rational numbers, will be helpful in formalizing floating point arithmetic and proving properties of floating point circuits. Such enhancements to RRL will help in the verification of hardware implementations of sophisticated graphics and media processing algorithms [26, 25], which are in widespread use. Many circuits, including the SRT divider, rely on preprogrammed read-only memory to implement tables for fast computation using look-ups. A direct encoding of a table data structure, support for reasoning about it, and efficient handling of the large case analyses arising from the many table entries would all help. Specifications of such circuits are error-prone because of the many cases to be considered as well as the large tables involving many numeric values.
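The cost of the unary numeral representation is easy to see concretely. The following Python sketch is our illustration, not RRL's actual term data structure: a constant n becomes the term s(s(...s(0)...)), whose size grows linearly in n, versus logarithmically for a positional encoding.

```python
def unary(n):
    # Unary numeral built from 0 and successor s: e.g. 2 -> "s(s(0))".
    term = "0"
    for _ in range(n):
        term = "s(" + term + ")"
    return term

print(len(unary(48)))    # 145 characters to spell the constant 48
print(len(bin(48)) - 2)  # 6 bits in a positional (binary) encoding
```

Every rewrite step over such terms must traverse this linear-size structure, which is why the unary encoding slows the proof and bloats intermediate formulas.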
When a theorem prover detects a bug, it would be helpful for it to trace the bug back to the specification, in particular identifying the offending subcase and/or table entry. Identifying multiple bugs in a single run would further enhance the designer's productivity in developing such specifications.
References

1. R.E. Bryant, Bit-level Analysis of an SRT Divider Circuit. Tech. Rep. CMU-CS-95-140, Carnegie Mellon University, April 1995.
2. E.M. Clarke, S.M. German, and X. Zhao, "Verifying the SRT division algorithm using theorem proving techniques," Proc. Computer Aided Verification, 8th Intl. Conf. (CAV'96), New Brunswick, July/August 1996, Springer LNCS 1102 (eds. Alur and Henzinger), 111-122.
3. M.D. Ercegovac and T. Lang, Division and Square Root: Digit Recurrence Algorithms and Implementations. Kluwer, 1994.
4. S. German, Towards Automatic Verification of Arithmetic Hardware. Lecture Notes, 1995.
5. D. Kapur, "Rewriting, decision procedures and lemma speculation for automated hardware verification," Proc. 10th Intl. Conf. Theorem Proving in Higher Order Logics, Springer LNCS 1275 (eds. Gunter and Felty), Murray Hill, NJ, August 1997, 171-182.
6. D. Kapur and X. Nie, "Reasoning about numbers in Tecton," Proc. 8th Intl. Symp. Methodologies for Intelligent Systems (ISMIS'94), Charlotte, North Carolina, October 1994, 57-70.
7. D. Kapur and M. Subramaniam, "Mechanically verifying a family of multiplier circuits," Proc. Computer Aided Verification, 8th Intl. Conf. (CAV'96), New Brunswick, July/August 1996, Springer LNCS 1102 (eds. Alur and Henzinger), 135-146.
8. D. Kapur and M. Subramaniam, "Mechanical verification of adder circuits using powerlists," Dept. of Computer Science Tech. Report, SUNY Albany, November 1995. Accepted for publication in J. of Formal Methods in System Design.
9. D. Kapur and M. Subramaniam, "Lemma discovery in automating induction," Proc. Intl. Conf. on Automated Deduction (CADE-13), Springer LNAI 1104 (eds. McRobbie and Slaney), New Jersey, July 1996.
10. D. Kapur and M. Subramaniam, "Intermediate lemma generation from circuit descriptions," in preparation, State University of New York, Albany, NY, October 1997.
11. D. Kapur and H. Zhang, "An overview of Rewrite Rule Laboratory (RRL)," J. of Computer and Mathematics with Applications, 29, 2, 1995, 91-114.
12. M. Leeser and J. O'Leary, "Verification of a subtractive radix-2 square root algorithm and implementation," Proc. ICCD'95, IEEE Computer Society Press, 1995, 526-531.
13. P.S. Miner and J.F. Leathrum Jr., "Verification of IEEE compliant subtractive division algorithms," Proc. FMCAD'96, Palo Alto, CA, 1996.
14. J Moore, T. Lynch, and M. Kaufmann, A Mechanically Checked Proof of the Correctness of the AMD5K86 Floating Point Division Algorithm. CL Inc. Technical Report, March 1996.
15. S.F. Obermann and M.J. Flynn, An Analysis of Division Algorithms and Implementations. Technical Report CSL-TR-95-675, Stanford University, July 1995.
16. A.R. Omondi, Computer Arithmetic Systems: Algorithms, Architecture and Implementations. Prentice Hall, 1994.
17. J.E. Robertson, "A new class of digital division methods," IRE Transactions on Electronic Computers, 1958, 218-222.
18. H. Ruess, N. Shankar, and M.K. Srivas, "Modular verification of SRT division," Proc. Computer Aided Verification, 8th Intl. Conf. (CAV'96), New Brunswick, July/August 1996, Springer LNCS 1102 (eds. Alur and Henzinger), 123-134.
19. K.D. Tocher, "Techniques of multiplication and division for automatic binary computers," Quarterly Journal of Mechanics and Applied Mathematics, 11(3), 1958.
20. G.S. Taylor, "Compatible hardware for division and square root," Proc. 5th IEEE Symp. on Computer Arithmetic, May 1981.
21. D. Verkest, L. Claesen, and H. De Man, "A proof of the nonrestoring division algorithm and its implementation on an ALU," J. Formal Methods in System Design, 4, January 1994, 5-31.
22. T.E. Williams and M. Horowitz, "A 160nS 54-bit CMOS division implementation using self-timing and symmetrically overlapped SRT stages," Proc. 10th IEEE Symp. on Computer Arithmetic, 1991.
23. H. Zhang, "Implementing contextual rewriting," Proc. 3rd Intl. Workshop on Conditional Term Rewriting Systems, Springer LNCS 656 (eds. Remy and Rusinowitch), 1992, 363-377.
24. H. Zhang, D. Kapur, and M.S. Krishnamoorthy, "A mechanizable induction principle for equational specifications," Proc. 9th Intl. Conf. Automated Deduction (CADE), Springer LNCS 310 (eds. Lusk and Overbeek), Chicago, 1988, 250-265.
25. Proc. of the Eighth Symp. on HOT Chips, IEEE Computer Society, California, 1996.
26. Proc. of the Ninth Symp. on HOT Chips, IEEE Computer Society, California, 1997.
On the Complexity of Parallel Implementation of Logic Programs*
(Extended Abstract)

E. Pontelli, D. Ranjan, G. Gupta
Dept. of Computer Science
New Mexico State University
Las Cruces, NM 88003, USA
{epontell, dranjan, gupta}@cs.nmsu.edu
Abstract. We study several data structures and operations that commonly arise in parallel implementations of logic programming languages. The main problems that arise in implementing such parallel systems are abstracted out and precisely stated. Upper and lower bounds are derived for several of these problems. We prove a lower bound of Ω(log n) on the overhead incurred in implementing even a simplified version of or-parallelism. We prove that the aliasing problem in parallel logic programming is at least as hard as the union-find problem. We prove that an and-parallel implementation can be realized on an extended pointer machine with an O(1) overhead.
1 Introduction
Logic programming (LP) is a popular programming paradigm that has been used in a wide variety of applications, ranging from Artificial Intelligence, Genetic Sequencing, Database programming, Expert Systems, Natural Language Processing, and Constraint-based Optimization, to general purpose programming and problem solving. A nice property of logic programming languages is that parallelism can be extracted automatically from logic programs by the compiler or the runtime system. However, the implementation of a parallel logic programming system poses many interesting and challenging problems. Several solutions have been proposed for these problems, and parallel implementations have been realized in the past. To date, however, little attention has been paid to analyzing the complexity of the operations involved in such implementations. The problem of implementing a logic programming language can be abstracted as the process of maintaining a dynamic tree. The operational semantics of the logic language determine how this tree is built and what operations on the tree are of interest. As execution proceeds according to the operational semantics, this tree grows and shrinks. In a parallel implementation the tree can

* This work has been partially supported by NSF grants CCR 96-25358, INT 95-15256, and HRD 9628450, and by NATO Grant CRG 921318.
grow and shrink in parallel. Various operations are needed during parallel execution to guarantee correct behaviour. For example, at runtime we may need to determine whether a given node is, at a particular moment, on the leftmost branch of the tree that exists at that moment; or, given two nodes, whether one is an ancestor of the other; etc. Although dynamic data structures have been studied extensively [2, 5, 23, 8, 20, 21, 4], to the best of our knowledge the specific data structures needed to support parallel logic programming have not been studied formally. In this paper we derive upper and lower bounds for some of these operations. We prove a lower bound of Ω(log n) on the overhead incurred in implementing even a restricted version of or-parallelism (no aliasing). We prove that the aliasing problem in a parallel implementation of Prolog is at least as hard as the union-find problem. This gives us a lower bound of Ω(n + mα(m, n)) for the aliasing problem on pointer machines, due to the results of [18, 21]. We prove that an and-parallel implementation can be realized on an extended pointer machine with an O(1) overhead. We also give a scheme to support a restricted version of or-parallelism in time O(~n) per operation; to our knowledge this is the best scheme known to date. We also give a tight relationship between the and-parallelism problem and the problem of "time-stamping" on pointer machines. Elsewhere, we also show that the side-effect problem in parallel logic programming can be solved with a constant time overhead per operation [16]. We believe that abstracting out the problems in the parallel implementation of logic programs as that of building data structures to support certain operations of interest is one of the major contributions of this paper. All the results presented are novel, as these problems have never been considered before.
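The tree queries mentioned above (membership in the leftmost branch, ancestry) can be stated concretely. The following naive Python sketch is ours for illustration, not one of the data structures analyzed in the paper; it answers each query in time proportional to the depth of the queried node, which is exactly the kind of cost the specialized structures aim to beat.

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def is_ancestor(a, b):
    # Walk from b toward the root; O(depth) per query in this naive version.
    while b is not None:
        if b is a:
            return True
        b = b.parent
    return False

def on_leftmost_branch(n, root):
    # n is on the leftmost branch iff every node from n up to the root
    # is the first child of its parent.
    while n is not root:
        if n.parent.children[0] is not n:
            return False
        n = n.parent
    return True

root = Node()
left, right = Node(root), Node(root)
leaf = Node(left)
assert is_ancestor(root, leaf) and not is_ancestor(right, leaf)
assert on_leftmost_branch(leaf, root) and not on_leftmost_branch(right, root)
```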
The only work that comes close is that of Gupta and Jayaraman [7], which establishes some results on the complexity of realizing an or-parallel implementation. The work presented here is far from complete, and numerous open problems remain. One of the goals of this paper is to bring these problems to the attention of the complexity theory and data structures community. We hope this paper will initiate further research in the area of data structures for the dynamic graphs that arise in parallel implementations of declarative languages and parallel AI systems. The reader is assumed to be familiar with the general concepts and terminology of logic programming and Prolog [12].
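The union-find problem, to which the aliasing problem is compared above, is the standard one. For reference, a compact implementation with union by rank and path compression (our Python sketch; the pointer-machine lower bound cited above concerns this problem, not this particular code) is:

```python
def make_sets(n):
    # n singleton sets; parent[i] == i marks a root, rank bounds tree height.
    return list(range(n)), [0] * n

def find(parent, x):
    # Two passes: locate the root, then compress the path onto it.
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root

def union(parent, rank, x, y):
    rx, ry = find(parent, x), find(parent, y)
    if rx == ry:
        return
    if rank[rx] < rank[ry]:
        rx, ry = ry, rx
    parent[ry] = rx                  # attach shorter tree under taller
    if rank[rx] == rank[ry]:
        rank[rx] += 1

parent, rank = make_sets(6)
union(parent, rank, 0, 1)
union(parent, rank, 1, 2)
assert find(parent, 0) == find(parent, 2)
assert find(parent, 0) != find(parent, 3)
```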
1.1 Parallel Logic Programming
The presence of non-determinism is a peculiar feature of logic programming. This non-determinism in logic programs can be used as a source of implicit parallelism. Two major forms of parallelism can be identified [3]:
- And-Parallelism (AP) [9, 15], which arises from don't-care non-determinism: given a goal, several subgoals can be selected for simultaneous reduction.
- Or-Parallelism (OP) [1, 13], which arises from don't-know non-determinism: given a subgoal, the multiple clauses whose heads unify with it can be solved in parallel.
A major research direction in parallel logic programming has been to design parallel implementations of Prolog [3, 9, 1, 13]. This implies that the parallel execution mechanisms have to be designed in such a way that a user sees the same externally observable behaviour during parallel execution as is observed during sequential Prolog execution. We say that such a parallel system preserves Prolog semantics. Thus, in a parallel system that preserves Prolog semantics: (i) the same solutions are produced, in the same order, as in sequential execution; (ii) the same side-effects are observed, in the same order, as in sequential execution. In this paper, we only consider parallel implementations that preserve Prolog semantics.

2 Problems in Parallel Logic Programming
Various parallel implementations of Prolog have been proposed and realized [3, 10]. Most of the current systems suffer from severe efficiency problems. Many of these inefficiencies, especially in and-parallel systems, arise from the need to guarantee Prolog semantics during parallel execution. Other inefficiencies, such as those arising in or-parallelism, are present due to the complexity entailed in implementing don't-know non-determinism. Preservation of Prolog semantics, in fact, requires various operations (e.g., execution of side-effects) to be ordered correctly; this creates dependences between the concurrent executions. Only executions that respect all such dependences can be accepted. It is possible to combine and-parallelism and or-parallelism [3, 6]; however, for the sake of simplicity, we consider the two forms of parallelism independently. In the rest of this work we will also not deal with the issue of implementing the correct ordering of side-effects: this problem has been studied elsewhere [16] and shown to have a worst-case time complexity of O(1) per operation.

2.1 And-Parallelism
Given a resolvent B1, ..., Bn, multiple subgoals in the resolvent can be concurrently reduced in and-parallel execution. And-parallel execution can be visualized through the and-tree. The root of the and-tree is labeled with the initial goal. If a node contains a conjunction B1, ..., Bn, then it will have n children: the i-th child of the node is labeled with the body of the clause used to solve Bi. The main problem in the implementation of and-parallelism is how to efficiently manage the unifiers produced by the concurrent reduction of different subgoals. Two subgoals Bi and Bj (1 ≤ i < j ≤ n) in the resolvent B1, ..., Bn should agree on the bindings of all the variables that are common to them (such variables are termed dependent variables in parallel logic programming terminology). In sequential Prolog execution, usually, Bi, the goal to the left, binds the common variable and Bj works with the binding produced by Bi. In and-parallel execution, when Bj is started, the common variable will be unbound. Bj may attempt to instantiate this variable, violating Prolog semantics. The key problem is to ensure that bindings to common variables are made in the same order as in a sequential execution. This requirement is much stronger than just requiring that Prolog semantics be preserved (i.e., that externally observable behavior during parallel execution be the same as in the sequential execution). If a parallel
system satisfies this stronger requirement, we say that it preserves strong Prolog semantics. Preserving strong Prolog semantics is important, as otherwise a considerable amount of redundant computation may be performed [14]. Note that if strong Prolog semantics is preserved, then Prolog semantics is preserved, but not vice versa. And-parallel implementations handle the requirement of preserving strong Prolog semantics by assigning producer or consumer status to each subgoal that shares the dependent variable [14]. The leftmost goal in the resolvent that has access to the dependent variable is designated as the producer subgoal for that variable; all others are consumers. A consumer subgoal is not allowed to bind the dependent variable; it is only allowed to read its binding. If a consumer subgoal attempts to unify against the unbound dependent variable, it has to suspend until the producer goal binds it. If the producer subgoal finishes execution without binding the dependent variable, then the producer status is transferred to the leftmost consumer subgoal for that variable. The producer subgoal for a dependent variable, therefore, can change dynamically during execution. Thus, a major problem in an and-parallel implementation is to keep track of the leftmost subgoal that can (directly or indirectly) access each variable.

2.2 Or-Parallelism
In the case of OP the parallel computation can be described as a tree, called the or-tree. Each node contains a goal obtained from the computation. The root node is labeled with the initial goal. To expand a node further, the leftmost subgoal in the goal of that node is selected and the matching clauses are found. If B1, ..., Bn is the goal at the node, then for each clause Hj :- D1j, ..., Dkj such that θj = mgu(Hj, B1), a child node labeled with the goal (D1j, ..., Dkj, B2, ..., Bn)θj is created. Note also that the child node ni corresponding to a matching clause Ci that textually precedes another matching clause Cj is placed to the left of nj, where nj is the child node corresponding to Cj. Sequential execution corresponds to building the or-tree one node at a time, in depth-first order. In the or-tree, each branch is forced to maintain a local view of the substitution computed. This requirement emerges from the fact that, during a reduction like the one mentioned above, the substitutions θj produced by unification may potentially conflict and must be kept separate in different branches of the or-tree. These conflicting bindings need to be maintained separately, as they will each lead to a different solution for the initial resolvent. The main issue, thus, in designing an implementation scheme capable of efficiently supporting OP is the development of an efficient way of associating the correct set of bindings to each branch of the or-tree. The naive approach of keeping θj for each branch is clearly highly inefficient, since it requires the creation of complete copies of the substitution (which can be arbitrarily large) every time a branching takes place [7].

3 Abstraction of the Problems
Let us consider a labeled tree T with n nodes and a root. Without loss of generality we will focus exclusively on binary trees. Trees are manipulated
through three instructions: (i) create_tree(), which creates a tree containing only the root; (ii) expand(n, b1, b2), which, given a leaf n and two labels b1 and b2, creates two new nodes (one for each label) and adds them as children of n (b1 as left child and b2 as right child); (iii) remove(n), which, given a leaf n of the tree, removes it from the tree. These three operations are assumed to be the only ones available for modifying the "physical structure" of the tree. The tree implements a partial ordering between the nodes. Given two nodes n and n′, we write n ⪯ n′ if n is an ancestor of n′; n ≺ n′ additionally says that n ≠ n′. We will often be referring to the notion of leftmost branch. Given a node n, the leftmost branch of the subtree rooted in n can be defined inductively: (i) if n does not have any children, then the branch containing only n is the leftmost branch; (ii) if n has children, and l is the leftmost child, then the leftmost branch is n followed by the leftmost branch of l. We can define in particular the following partial order n ⊴ m:
n ⊴ m ⟺ n is a node in the leftmost branch of the subtree rooted at m.
Given a node n, let μ(n) denote the node min_⪯ {m ∈ N | n ⊴ m}. Intuitively, μ(n) indicates the highest node m in the tree (i.e., the one closest to the root) such that n is in the leftmost branch of the subtree rooted at m. μ(n) is also known in the logic programming community as the subroot node of n.
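The three tree instructions and the subroot function μ(n) can be sketched as follows (a minimal Python illustration; the names are ours, and μ is computed by a naive O(depth) walk rather than by any of the efficient schemes discussed later):

```python
class Node:
    """A node of the binary computation tree, with a parent pointer."""
    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.left = None
        self.right = None

def create_tree(label):
    # (i) create a tree containing only the root
    return Node(label)

def expand(n, b1, b2):
    # (ii) given a leaf n, add b1 as left child and b2 as right child
    assert n.left is None and n.right is None, "expand is defined on leaves only"
    n.left, n.right = Node(b1, n), Node(b2, n)
    return n.left, n.right

def remove(n):
    # (iii) remove a leaf from the tree
    assert n.left is None and n.right is None, "remove is defined on leaves only"
    if n.parent is not None:
        if n.parent.left is n:
            n.parent.left = None
        else:
            n.parent.right = None

def subroot(n):
    """mu(n): the highest node m such that n lies on the leftmost
    branch of the subtree rooted at m (walk up through left edges)."""
    m = n
    while m.parent is not None and m.parent.left is m:
        m = m.parent
    return m
```

For instance, after two expansions along the left, the subroot of the left grandchild is the root itself, since it is reached from the root through left edges only.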
3.1 Abstracting Or-Parallelism
The main issue in dealing with OP is the management of the variables and their bindings. The development of the OP computation takes place as described in the previous section. Variables that arise during execution, whose multiple bindings have to be correctly maintained, can be modeled as attributes of nodes. We can assume that a number m of attributes are available (i.e., there are m variables). If the computation tree has size n, then we can assume that m = O(n). At each node n three operations on attributes are possible:
- assign a value v to an attribute a; as a consequence, every reference to a in a node m such that n ⪯ m will produce v. For the sake of simplicity we can assume that, given the node n where the assignment is performed and the attribute a, we can uniquely infer the value v. Let us use the function access(n, a) to obtain the value v if assign(a, v) was performed in n, and ⊥ otherwise;
- alias two attributes a1 and a2; this means that every reference to a1 (a2) in a node m such that n ⪯ m will produce the same result as a2 (a1);
- dereference an attribute a, that is, identify the value (if any) of a. This is equivalent to finding the node k = max_⪯ {m | access(m, a) ≠ ⊥ ∧ m ≺ n}.
Let us refer to the problem of finding this node k as the OP problem. Later we are going to discuss the problems of aliasing and dereferencing separately.

3.2 Abstracting And-Parallelism
The main issue in implementing AP is the management of shared variables. Informally, each node may create a variable, try to bind a variable, or alias
two variables to each other. More precisely, we need to support the following operations on a variable v:
- creation: for our purposes, we can assume that creation is implicit in the expand operation. Let us denote with e(v) the node where v is created;
- binding: assign a value to the variable v;
- aliasing: given two unbound variables v and v′, make them aliases of each other. Once two variables are aliased, they will reference the same value.
A variable v will be directly accessed or used only in the subtree rooted at e(v). Aliasing of variables, in general, does not create an immediate problem for and-parallel computation (it will not directly create violations of the strong Prolog semantics, as long as the subgoal which performs the aliasing is a producer of at least one of the two aliased variables). On the other hand, binding of an aliased variable does create a problem. If two variables v and v′ are aliased, and later an attempt to bind one of them is made, then the binding should be allowed only if it satisfies the binding requirements. More precisely, let us use the notation α(v) to denote the set of variables that are aliased to the variable v. For an un-aliased variable we have α(v) = {v}. The binding condition can then be expressed as follows: a binding taking place in a leaf n of the tree can be safely performed (w.r.t. strong Prolog semantics) iff the node n lies on the leftmost branch of each node in α(v). Equivalently, ∀w ∈ α(v) (n ⊴ e(w)). In the following we will denote with verify_leftmost(l, n) the operation which, given a leaf node l and a node n (guaranteed to be on the path from the root to l), verifies that l is leftmost in the subtree rooted in n. Thus, apart from the general operations on the tree (creation, expansion, and removal), we need to perform two additional (and, in our opinion, orthogonal) operations to realize and-parallelism.
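The binding condition just stated can be sketched as follows (a naive Python illustration under our own naming; alias sets are kept as plain sets rather than with union-find, and verify_leftmost walks parent pointers in O(depth)):

```python
class Node:
    """Minimal tree node: parent pointer plus a flag marking left children."""
    def __init__(self, parent=None, is_left=False):
        self.parent = parent
        self.is_left = is_left

def verify_leftmost(l, n):
    """Is leaf l on the leftmost branch of the subtree rooted at n?"""
    while l is not n:
        if l.parent is None or not l.is_left:
            return False
        l = l.parent
    return True

class Variable:
    def __init__(self, created_at):
        self.created_at = created_at  # e(v): the node where v is created
        self.aliases = {self}         # alpha(v), initially the singleton {v}

def alias(v, w):
    # merge the alias sets of v and w
    merged = v.aliases | w.aliases
    for u in merged:
        u.aliases = merged

def safe_to_bind(v, n):
    # binding at leaf n is safe iff n is leftmost under e(w) for all w in alpha(v)
    return all(verify_leftmost(n, w.created_at) for w in v.aliases)
```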
Given a variable v for which a binding is attempted at leaf n of the tree, in order to successfully bind v we need to:
- identify in α(v) the "oldest" variable w in the set (a variable v is older than another variable w if e(v) is an ancestor of e(w)). This is necessary in order to determine the reference node for the leftmost test and to locate a memory cell to store the binding;
- verify whether the node n is leftmost with respect to the node identified in the previous step (verify_leftmost(n, w)).
Let us refer to the first problem (identification of the "oldest" node in the set and of the memory location) as the AP1 problem, and to the second problem (determining if a node is leftmost in the subtree rooted in a given node) as the AP2 problem. Later, we will discuss simpler versions of these problems, where one or more operations are not considered (e.g., the AP problem without the alias operation).

4 Pointer Machines and Extended Pointer Machines
A pure pointer machine consists of a finite but expandable collection R of records, called global memory, and a finite collection of registers. Each record is uniquely identified through an address (let N be the set of all addresses). A special address nil is used to denote an invalid address. Each record is a finite
collection of named fields. All the records in the global memory have the same structure, i.e., they all contain the same number of fields, with the same names. Each field may contain either a datum (from a generic data domain Data) or an address (i.e., an element of N). If F is the finite collection of field names, then the function field_access : F × N → Data ∪ N allows access to the value stored in a field of a record. In the rest, for the sake of simplicity, we will use the notation n(r) to denote the value returned by field_access(n, address(r)). The pointer machine is also supplied with two finite collections of registers, d1, d2, ... (data registers) and r1, r2, ... (pointer registers). Each register di can contain an element from Data, while each register ri can contain one element of N. The machine can execute programs. A program P is a finite, numbered sequence of instructions (where each instruction in P is uniquely identified by one number). The instructions allow moving addresses and data between registers and between registers and records' fields. The only "constant" which can be explicitly assigned to a register is nil. Special instructions are used to create a new record (returning its address) and to perform conditional jumps. The only conditions allowed in the jumps are true and equality comparison between two pointer registers. Observe that the content of the data fields will never affect the behaviour of the computation. In terms of analysis of complexity, it is assumed that each instruction has a unit cost. For further details on the structure and behaviour of the pure pointer machine the reader is referred to [11, 21, 19]. As evident from the previous description, the pure pointer machine model is quite limited in its abilities. It provides a good basis for modeling implementations of linked data structures, like trees and lists. But it fails to capture some computational aspects of a real computer architecture.
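As a rough rendering of this memory model, one might picture the global memory as follows (a toy Python sketch; names such as Memory and field_store are ours, and the fixed field set is an arbitrary example):

```python
NIL = None                                # the invalid address nil
FIELD_NAMES = ("parent", "left", "data")  # the fixed collection F of field names

class Memory:
    """Global memory of a pure pointer machine: uniform records,
    each field holding a datum or an address (possibly NIL)."""
    def __init__(self):
        self.records = {}   # address -> record
        self.next_addr = 0

    def new_record(self):
        # create a new record with all fields set to NIL; return its address
        addr = self.next_addr
        self.next_addr += 1
        self.records[addr] = {f: NIL for f in FIELD_NAMES}
        return addr

    def field_access(self, name, addr):
        # field_access : F x N -> Data u N
        assert name in FIELD_NAMES
        return self.records[addr][name]

    def field_store(self, name, addr, value):
        # store a datum or an address into a field
        assert name in FIELD_NAMES
        self.records[addr][name] = value
```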
Two major drawbacks of the pure pointer machine model are: (1) the lack of arithmetic: numbers have to be explicitly represented as linked lists of bits. This is realistic in the sense that, for arbitrarily large problems requiring arithmetic operations, the time to compute such operations will be a function of the size of the problem. Nevertheless, models which assume the uniform cost criterion (like the RAM model), i.e., where the cost of an operation is independent of the word size, have been frequently used. (2) the lack of pointer arithmetic: the only operation allowed on pointers is dereferencing (following the pointer). Real machines allow arithmetic operations on pointers, allowing implementations of arrays and other direct-access structures. Of the two possible extensions, we consider in this framework the first one. That is, we allow data registers to contain non-negative natural numbers and we allow for some additional machine instructions, which initialize a data field/register to 0, increment the content of a data field/register, and compare (<) two data registers. Unit cost of the operations is assumed only on numbers whose maximum size in bits is lg n, where n is the size of the problem at hand.

4.1 Complexity of And-Parallelism
The Problem AP1. The Union-Find problem involves maintaining a collection of disjoint dynamic sets. Sets can be created and merged, and each set is
uniquely identified through a representative element. The following operations are allowed: (1) Make_set(x): given an element x, this operation creates a singleton set containing x; x is clearly used as the representative element for this set. Disjointness requires that x does not belong to any other set of the collection. (2) Union(x, y): given two elements x and y belonging to two disjoint sets Ax and Ay, this operation destroys the two sets and creates a new set containing the elements Ax ∪ Ay. An element in Ax ∪ Ay is selected as representative of the newly created set. (3) Find_set(x): given an element x which belongs to one set of the collection, the function returns the representative of the set containing x. The general problem we are considering is that of performing an arbitrary (correct) sequence of the above operations, containing n Make_set operations and m Union and Find_set operations. The operations in the sequence are assumed to be performed on-line, that is, each operation is not started until the immediately preceding one has been completed. Various researchers have investigated the complexity of this problem. The major result (originally [21] and then extended in [18]) is that this problem has a lower bound of Ω(n + mα(n, m)), where α is the inverse of the Ackermann function (thus a very slowly growing function). Optimal algorithms achieving this complexity have been proposed [21]. Tarjan was also the first one to prove the upper bound complexity of the problem [22], which happens to be O(n + mα(n, m)). We can easily show that the AP1 problem has a complexity greater than or equal to that of union-find. Let us consider an arbitrary solution of the AP1 problem. This means that we have means to represent variables, keep track of aliasing, and retrieve a selected representative (the oldest variable). Given a set of objects {a1, ..., an} and a sequence of union and find operations, we can implement them on top of AP1.
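The three union-find operations just described can be sketched with the textbook union-by-rank/path-compression scheme, which attains the O(n + mα(n, m)) amortized bound [21, 22] (the Python class below is our illustration, not the pointer machine formulation):

```python
class UnionFind:
    """Disjoint sets with union by rank and path compression."""
    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        # (1) create the singleton set {x}, represented by x
        self.parent[x] = x
        self.rank[x] = 0

    def find_set(self, x):
        # (3) return the representative; compress the path along the way
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        # (2) merge the sets containing x and y (union by rank)
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
```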
Each element ai is associated with a distinct variable vi (e.g., the memory location associated with the variable). Each time a union is required, the corresponding variables are aliased. A find operation for an element ai can be realized by searching for the representative variable in α(vi). The mapping between elements ai and variables does not require more than a finite number of steps. This implies that the amortized complexity of AP1 is Ω(n + mα(n, m)), where n is the number of variables and m is the number of alias/find operations. Moreover, from the results of [2], the single-operation worst-case complexity for this problem is Ω(lg n / lg lg n) for a large class of algorithms on pointer machines. An optimal upper bound for this problem is still unknown. In general, AP1 seems to have a complexity greater than that of union-find. This arises from the fact that when a variable v is about to be bound, and we are performing a find on α(v), then we want to obtain as result not an arbitrary element of the set but the oldest variable in α(v). This requires the use of some sort of time-stamping, which may result in a time complexity worse than that of union-find. A simple (amortized) upper bound is O((n + mα(n, m)) lg n). This is due to the fact that we can "time-stamp" each variable with its creation time and, amongst the aliased variables, keep the time-stamp of the oldest variable. A naive scheme to maintain time-stamps requires an O(lg n) overhead on pointer machines. This scheme for time-stamping is not optimal. Using the
results in [17], where it is shown that the time-stamping can be performed with complexity O((lg lg n)^k) for some small k, we can show that the AP1 problem can be solved in time O((n + mα(n, m))(lg lg n)^k) on pointer machines.

The Problem AP2. In this section we will argue that the problem AP2 has an upper bound of O(n + mα(n, m)), where n is the number of nodes created and m is the number of remaining operations. The upper bound for the AP2 problem on a pure pointer machine can be derived from the results known for the union-find problem [22, 21, 18, 4]. Let us assume that we have an implementation of the union-find problem, i.e., we have a notion of disjoint set and the three operations Make_set, Find_set, and Union. We show that this can be adapted to give an efficient implementation of the tree manipulation operations required for AP2. We assume that we are working on a pointer machine. Trees are represented in the traditional way. Each node is encoded in a record, and each record is assumed to have at least three pointers available, one to identify the parent node (nil if the node is the root of the tree), and two to identify the two children. The intuitive idea is to use the union-find operations to keep track of the "left branches" in the tree. The following operations on the tree are performed (the assumption is that we are processing binary trees):
- The creation of the tree does not require any special action. A new singleton set containing the node is created through the Make_set operation.
- If we expand one node n to the right, adding a node m below it, then we have two possible cases: if the node n has a left child, then nothing else is required; if the node n does not have a left child, then a Union operation is performed between the set containing n and the one containing m.
- If we expand one node n to the left, then we should perform a Union operation between the new node m added and the set containing n.
- Let us consider the various cases that may occur when a leaf n is removed, and let m be the predecessor of n in the tree: if n is the right child of m, then no special action is required upon removal of n; if n is the left child of m and m does not have any right child, then again no special action is required; if n is the left child of m and m has a right child v, then upon removal of n it is necessary to apply a Union operation between the set containing m and the one containing v.
- The operation of determining if a node n is leftmost in the subtree rooted at a certain node m can be solved by applying the Find_set operation to both n and m and verifying whether they fall in the same set.
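The case analysis above can be sketched as follows (our own Python illustration; a simple union-find with path halving stands in for the optimal structure of [21], and nodes are identified by arbitrary hashable keys):

```python
class LeftBranchTracker:
    """Track the left branches of a binary tree with union-find:
    n is leftmost under m iff n and m fall in the same set."""
    def __init__(self):
        self.uf = {}      # union-find parent pointers
        self.left = {}    # node -> left child (or None)
        self.right = {}   # node -> right child (or None)
        self.up = {}      # node -> tree parent (or None)

    def _find(self, x):
        while self.uf[x] != x:
            self.uf[x] = self.uf[self.uf[x]]  # path halving
            x = self.uf[x]
        return x

    def _union(self, x, y):
        self.uf[self._find(x)] = self._find(y)

    def _new_node(self, n, parent):
        self.uf[n] = n                        # Make_set(n)
        self.left[n] = self.right[n] = None
        self.up[n] = parent

    def create_tree(self, root):
        self._new_node(root, None)

    def expand(self, n, l, r):
        self._new_node(l, n)
        self._new_node(r, n)
        self.left[n], self.right[n] = l, r
        self._union(l, n)   # the left child joins n's left-branch set;
                            # the right child keeps its own singleton set

    def remove(self, n):
        m = self.up[n]
        if m is None:
            return
        if self.left[m] == n:
            self.left[m] = None
            v = self.right[m]
            if v is not None:
                self._union(m, v)   # the right child becomes leftmost
        else:
            self.right[m] = None

    def is_leftmost_under(self, n, m):
        # Find_set on both nodes and compare representatives
        return self._find(n) == self._find(m)
```

After removing a left child, its right sibling joins the parent's set, so subsequent leftmost tests through that sibling succeed, as in the last case of the analysis.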
From the above discussion we can infer that for AP2, given a sequence of n create operations and m operations of all other types, the problem can be solved with a time complexity of O(n + mα(n, m)). This provides an amortized
upper bound to the problem. Regarding lower bounds on pure pointer machines, clearly Ω(n + m) is an obvious lower bound. We conjecture that this is not tight. In Section 6 we propose some related conjectures, proved for special cases, which indeed show that the lower bound is ω(n + m). Summarizing, the complexity C_AP2(n, m) of AP2 satisfies

Ω(m + n) ≤ C_AP2(n, m) ≤ O(n + mα(n, m))
Our conjecture is that the problem indeed has the same complexity as union-find. This is also substantiated by the ability to show a lower bound of Ω(m + nα(n, m)) for a slightly different problem [14]. We would like to mention that, using the "time-stamping" scheme of [17], the AP2 problem can be solved with a single-operation worst-case time complexity of O(lg lg n) per operation. This is better than the O(lg n / lg lg n) scheme that would result if we used the union-find scheme from [2].

Complexity on Extended Pointer Machines. The extended pointer machines add a fundamental component to the design: the ability to perform basic increment and comparison operations on natural numbers. This feature is sufficient to modify some of the lower bounds previously discussed. In particular, the presence of counters and the ability to compare them in constant time allows us to solve the AP2 problem in time Θ(1) per operation. It is possible to show that keeping track of the μ(n) information for each leaf of the tree can be realized with complexity O(1) per operation [16]. This scheme can be modified on an extended pointer machine to solve the AP2 problem. Let us assume that each record in the memory of the pointer machine contains a data field capable of maintaining a counter, called age. During the create_tree operation, the root of the tree is created and the value 0 is stored in the age field of that record. Every time a node n is expanded, the two new nodes created (n1 and n2) receive a value in their age field obtained by incrementing by one the age of n. The remove operation leaves unaffected the age fields of the various nodes. We can assume that each variable v is itself represented by a record, which contains a pointer to the node e(v) (the node where v was created). For each group of aliased variables we also maintain the information max(v) = min_⪯ {m | m = e(w) ∧ w ∈ α(v)}.
When the binding of a variable v is attempted, we need to verify whether the current computation point (represented by the leaf n in the tree) is leftmost in the subtree rooted at max(v). At this point, the operation verify_leftmost can be easily realized. The intuitive idea is that the counter can be used to compare the depths of nodes. The sub field points to the highest node in the tree for which the given node is on the leftmost branch. If such a node is below the expected node, then the test fails; otherwise it succeeds. On the other hand, it is also quite easy to see that, as a consequence of the uniform cost assumption on the management of counters (addition and comparison), we can propose an upper bound for AP1. It is relatively straightforward to add a simple comparison in the implementation of the union operation in Tarjan's optimal algorithm [21] to force the root of the tree to always contain the older
variable. This allows direct use of the union-find algorithm to support the AP1 problem, resulting in a complexity of O(n + mα(n, m)). The two results combined reflect exactly the "practical" results achieved in [14], where a "constant-time" practical implementation scheme for handling the AP problem without aliasing is described.
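The age-counter scheme can be sketched as follows (our Python illustration; for simplicity the sub pointer, i.e. μ(n), is fixed at node creation and removals are ignored, whereas the scheme of [16] maintains it in O(1) per operation):

```python
class Node:
    """Node on the 'extended pointer machine': an age counter (its depth)
    plus a sub pointer to mu(n), the highest node having this node on
    the leftmost branch of its subtree."""
    def __init__(self, parent=None, is_left=False):
        self.parent = parent
        self.age = 0 if parent is None else parent.age + 1
        # a left child inherits mu from its parent; otherwise mu(n) = n
        self.sub = parent.sub if (parent is not None and is_left) else self

def expand(n):
    # create the two children n1 and n2, with age incremented by one
    return Node(n, True), Node(n, False)

def verify_leftmost(leaf, m):
    """Assuming m is an ancestor of leaf: leaf is leftmost in the subtree
    rooted at m iff mu(leaf) is at least as old (as high) as m."""
    return leaf.sub.age <= m.age
```

The age comparison is the constant-time step that the pure pointer machine cannot perform: it replaces a pointer walk by a single comparison of counters.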
5 Complexity of Or-Parallelism
The problem of or-parallelism is considerably more complex than those we have explored earlier. As mentioned earlier, the only previous work that deals with the complexity of the mechanisms required in parallel logic programming was in the context of or-parallelism [7]. The results of this work show that a generic OP problem with n attributes and m operations per branch generates a lower bound which is strictly worse than Ω(n + m). It is quite easy to observe that along each branch of the tree, attributes are managed in a way very similar to the way in which elements are managed in the union-find problem. Initially all attributes are distinct; given an attribute α, let us denote with A(α) the set of all the attributes that are currently (on the branch at hand) aliased to α. If a new aliasing between α1 and α2 takes place, then the set A is modified as A(α1) = A(α1) ∪ A(α2). If we assume that the value of an attribute α is always associated with the representative of A(α), then any dereference operation is based on a find operation in order to detect the representative. The observation above can be used to provide a new proof of the non-constant-time nature of the management of or-parallel computations.

Theorem 1. The amortized time complexity of OP is Ω(n + mα(n, m)), where n is the number of attributes and m is the number of tree operations.
Proof. (Sketch) The proof is based on the idea that OP subsumes the union-find problem. Let us consider a generic union-find problem with n elements and m union-find operations. From the results in [21, 18] we know that this problem has a lower bound of Ω(n + mα(n, m)). Let us show how we can use the mechanisms of OP to implement a solution to the union-find problem. Each of the n elements is encoded as one of the attributes. Each time an operation is performed, a new node is added along the left spine of the tree (we can ignore in this context the nodes that will be added to the right during the expansion). Each union operation is translated into an aliasing between the two attributes. Each find operation corresponds to the dereferencing of the corresponding attribute. Clearly each operation in the union-find problem can be encoded using the OP support and a constant number of additional operations. Thus, each solution to OP cannot be better than the lower bound for the union-find problem. This result relies mainly on the execution of aliasing operations. Nevertheless, it is possible to show that the result holds even when we disallow aliasing during OP execution. Thus, the inability to achieve constant-time behaviour is inherent in the management of the attributes in the tree.
Theorem 2. On pointer machines, the worst-case time complexity of OP is Ω(lg n) per operation, even without aliasing.
The basic idea of the proof is that, since there is no direct addressing in pointer machines, starting from a particular node only a "small" number of nodes can be accessed in a small number of steps. Without loss of generality, assume that each record has at most two pointers. Assume that we have a pointer machine program that performs the find operation in worst-case c(n) time for some function c. The find procedure receives only two pointers as arguments, one to the record that contains the name of the attribute whose value is being searched, and the other to the leaf node for which this attribute value is being searched. Starting from either of these records, we can access at most 2^{c(n)+1} different records within c(n) steps. We use this fact to prove that c(n) = Ω(lg n). Suppose c(n) is not Ω(lg n). Then for large enough n, c(n) < (1/8) lg n. Consider the tree with n nodes that has the following structure. It has depth t1(n) + t2(n). The first t1(n) levels constitute a complete binary tree (let us indicate with Γ1 the first t1(n) levels of the tree); the remaining t2(n) levels consist of 2^{t1(n)} lists of nodes, each of length t2(n) and each attached to a different leaf of Γ1. Clearly, we must have that 2^{t1(n)+1} − 1 + t2(n) · 2^{t1(n)} = n, or that t2(n) = (n + 1)/2^{t1(n)} − 2. Let us also assume that we have t2(n) different attributes, {a1, ..., a_{t2(n)}}. All the nodes at level t1(n) + i of the tree perform a bind operation on a_i, and each node assigns a different value to a_i. Suppose we choose t1(n) and t2(n) such that

2^{c(n)+1} · t2(n) < 2^{t1(n)}    (1)

Then there exists a leaf node l in the tree such that none of the last t2(n) nodes on the path from the root to l is accessed starting from the pointer to the attribute node in any of the computations find(a_i, l), i = 1, ..., t2(n). This implies that for any of the t2(n) different calls find(a_i, l) the appropriate node on the path must be accessed starting from the leaf node.
Since the find procedure works in time c(n), we must have that 2^{c(n)+1} ≥ t2(n). This gives us that c(n) ≥ lg t2(n) − 1. Since c(n) < (1/8) lg n, we have 2^{c(n)+1} < 2n^{1/8}. Let us choose t1(n) = (3/4) lg n. Then 2^{t1(n)} = n^{3/4} and t2(n) = (n + 1)/n^{3/4} − 2 ≈ n^{1/4}. These t1(n), t2(n) satisfy inequality (1) above, as 2^{c(n)+1} t2(n) ≈ 2n^{3/8} < n^{3/4}. Hence we can say that c(n) ≥ lg t2(n) − 1, i.e., c(n) ≥ (1/4) lg n − 1, which contradicts our assumption that c(n) < (1/8) lg n. Hence c(n) = Ω(lg n).
Upper Bound for OP

The related research on the complexity of the OP problem has been limited to showing that a constant time cost per operation cannot be achieved in any implementation scheme. To the best of our knowledge, no attempt has been made to supply a tight upper bound for this problem. Most of the implementation schemes proposed in the literature can be shown to have a worst-case complexity of O(n) per operation. Currently, the best result we have been able to achieve is the following:
Theorem 3. The OP problem with no aliasing can be solved on a pointer machine with a single operation worst-case time complexity of O(√n (lg n)^k) for some small constant k.

Proof. (Sketch) The OP problem can be cast in the following simplified version. Each node in the tree is assigned a label. The same label can be used in various nodes of the tree, but no label can occur more than once in each branch of the tree. The only two operations allowed are insert(n, a1, a2), which creates two new nodes under node n (required to be a leaf) and assigns the labels a1 and a2, respectively, to the newly created nodes, and label(n, a), which retrieves the node (if any) with label a present on the path between n and the root of the tree. Observe that we are simplifying the problem by disallowing deletions of nodes and restricting attention to binary trees. To obtain the worst-case time complexity O(√n) per operation, we maintain the collection of nodes assigned the same label a in tables of size √n, and record in the tree the nodes where one table gets filled (fill nodes). Furthermore, every √n fill nodes a table is created summarizing the effect of the previous √n fill nodes. A label operation is performed by first scanning the fill nodes up to the last created table, and then visiting table by table. Either an entry is found in one of the initial nodes or one of the tables (which will directly point to the group containing the node of interest), or the node of interest is in the first table for such label (i.e., the first table has not been filled yet), or the label has never been used along the considered branch. Testing each entry in a table requires checking for the nearest common ancestor [8, 5, 23].
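For concreteness, here is the naive O(n)-per-operation baseline for the simplified problem used in the proof sketch: insert grows a leaf, and label walks the branch toward the root. The class and method names are ours; the O(√n (lg n)^k) scheme of Theorem 3 replaces the branch walk with the √n-sized tables anchored at fill nodes.

```python
class Node:
    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []

class LabelledTree:
    """Naive baseline for the simplified OP problem: label(n, a) scans the
    branch from n up to the root, costing O(depth) = O(n) worst case."""
    def __init__(self, root_label):
        self.root = Node(root_label)

    def insert(self, n, a1, a2):
        # n must be a leaf; create two children labelled a1 and a2
        assert not n.children
        n.children = [Node(a1, n), Node(a2, n)]
        return n.children

    def label(self, n, a):
        # walk from n toward the root, returning the first node labelled a
        while n is not None:
            if n.label == a:
                return n
            n = n.parent
        return None
```

Note that the single-occurrence-per-branch restriction makes the first node found on the walk the unique answer.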
6 Open Problems
Various problems are still open and are currently being researched. First of all, for various problems we have proposed lower and upper bounds which are rather distant; these gaps need to be closed. The gap is particularly evident for the OP problem (even without aliasing), where the single operation worst-case lower bound is Ω(lg n), while the upper bound is O(√n) per operation in the worst case. We still lack a lower bound for the AP² problem. The problem, as illustrated in [17], seems to have very strong ties with the mentioned time stamping problem. Let us consider the following problem. Suppose we use a pointer machine to maintain an ordered sequence of elements. The only two operations allowed are insert_element, which adds a new element to the sequence, and precedes, which tests whether one element has been inserted before another. Let us consider the problem of performing a sequence of n insertions followed by a precedes test between two arbitrary elements of the sequence (we will refer to this problem as the IP problem). Our conjecture is the following:

Conjecture 4. On a pure pointer machine, an arbitrary (correct) sequence of n operations belonging to the IP problem has a complexity strictly worse than
Θ(n); in particular, there is no solution where both insertion and test use a constant number of steps per operation.

In [17] we show that the IP problem has a single operation worst-case upper bound of O(poly(lg lg n)). We can prove this conjecture for a special class of algorithms, but this is not sufficient to establish the above conjecture. We are confident that the result holds and can be proved. The relevance of this conjecture can be seen from the following:

Lemma 5. The problem AP² has a single operation worst-case lower bound of ω(1) iff Conjecture 4 is true.
Proof. If we can solve IP in constant time (i.e., Conjecture 4 is false), then AP² can also be solved with single operation worst-case time complexity O(1). This follows from the scheme described in Section 4.1. Conversely, let us assume by contradiction that we are capable of implementing all the operations required by the AP² problem in constant time. These operations can be used to provide a solution to the IP problem. In fact, the operation of inserting a new element can be translated directly into an expand operation applied to the leftmost leaf of the tree; the first argument of the expand is the node inserted by the IP operation, while the second node inserted is a dummy node. The precedes operation can be directly translated into an is_leftmost operation. In fact, given two nodes n1 and n2 which have been previously inserted, we know by construction that both lie on the leftmost branch of the tree. Thus, n1 is leftmost in the tree rooted at n2 iff n2 has been inserted before n1.
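The reduction in this proof can be sketched concretely. Below, expand and is_leftmost are naive stand-ins for the hypothetical constant-time AP² operations (all class and method names are ours); only their semantics matter for the translation of insert_element and precedes:

```python
class TreeNode:
    __slots__ = ('left', 'right', 'parent')
    def __init__(self):
        self.left = self.right = self.parent = None

class AP2Tree:
    """Stand-in for the AP2 operations used in the proof: expand a leaf
    into two children, and test whether a node lies on the leftmost
    branch of a given subtree. Implemented naively here."""
    def __init__(self):
        self.root = TreeNode()
        self.leftmost_leaf = self.root

    def expand(self, leaf):
        a, b = TreeNode(), TreeNode()   # b plays the role of the dummy node
        a.parent = b.parent = leaf
        leaf.left, leaf.right = a, b
        if leaf is self.leftmost_leaf:
            self.leftmost_leaf = a
        return a, b

    def is_leftmost(self, n1, n2):
        # does n1 lie on the leftmost branch of the subtree rooted at n2?
        m = n1
        while m is not None:
            if m is n2:
                return True
            p = m.parent
            if p is not None and p.left is not m:
                return False    # n1 is not on a leftmost branch of n2
            m = p
        return False

class IPviaAP2:
    """IP operations translated into AP2 operations, as in the proof of
    Lemma 5: insert expands the leftmost leaf, precedes becomes an
    is_leftmost test."""
    def __init__(self):
        self.t = AP2Tree()

    def insert_element(self):
        node, _dummy = self.t.expand(self.t.leftmost_leaf)
        return node

    def precedes(self, n1, n2):
        # n1 was inserted strictly before n2 iff n2 lies below n1
        # on the leftmost branch
        if n1 is n2:
            return False
        return self.t.is_leftmost(n2, n1)
```

A constant-time AP² implementation plugged into this translation would give constant-time IP operations, which is exactly what Conjecture 4 rules out.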
7 Conclusions
In this paper we studied the complexity of several operations associated with the parallel execution of Prolog programs. The following results were achieved:
- On both the pure and extended pointer machines, execution of and-parallelism requires non-constant time per operation, due to the need to manage aliased variables. If aliasing is disallowed, constant time behaviour is achievable on extended pointer machines. An upper bound for the problem was found: we showed that managing and-parallelism has an upper bound equivalent to the union-find problem.
- On both models, execution of or-parallelism was shown to have a non-constant time lower bound. In contrast to and-parallelism, the non-constant time holds even when we consider extended pointer machines and disallow aliasing of variables.

Some problems are still open, the main one being to show that the problem AP² has a non-constant time lower bound on pure pointer machines. This result is currently based on the conjecture about the IP problem. This conjecture has been proved to hold in special cases, but a general proof is still missing. Another open problem of interest is the development of an optimal upper bound for the OP problem in the case of no aliasing between variables. This problem has been shown to have a non-constant time lower bound, but an optimal upper bound is still a subject of research.
References

1. K.A.M. Ali and R. Karlsson. Full Prolog and Scheduling Or-parallelism in Muse. International Journal of Parallel Programming, 19(6):445-475, 1991.
2. N. Blum. On the single-operation worst-case time complexity of the disjoint set union problem. SIAM Journal on Computing, 15(4), 1986.
3. M. Carlsson, G. Gupta, K.M. Ali, and M.V. Hermenegildo. Parallel execution of Prolog programs: a survey. Journal of Logic Programming, 1998, to appear.
4. M. Fredman and M. Saks. The Cell Probe Complexity of Dynamic Data Structures. In Proc. of 21st ACM Symposium on Theory of Computing. ACM, 1989.
5. H.N. Gabow. Data structures for weighted matching and nearest common ancestors. In ACM Symp. on Discrete Algorithms, 1990.
6. G. Gupta and V. Santos Costa. Cuts and Side-effects in And/Or Parallel Prolog. Journal of Logic Programming, 27(1):45-71, 1996.
7. G. Gupta and B. Jayaraman. Analysis of or-parallel execution models. ACM TOPLAS, 15(4):659-680, 1993.
8. D. Harel and R.E. Tarjan. Fast Algorithms for Finding Nearest Common Ancestors. SIAM Journal on Computing, 13(2), 1984.
9. M. Hermenegildo and K. Greene. &-Prolog and its Performance. In Int'l Conf. on Logic Programming. MIT Press, 1990.
10. J.C. Kergommeaux and P. Codognet. Parallel logic programming systems. ACM Computing Surveys, 26(3), 1994.
11. D.E. Knuth. The Art of Computer Programming, volume 1. Addison-Wesley, 1968.
12. J.W. Lloyd. Foundations of Logic Programming. Springer-Verlag, 1987.
13. E. Lusk et al. The Aurora Or-parallel Prolog System. New Generation Computing, 7(2,3), 1990.
14. E. Pontelli and G. Gupta. Implementation mechanisms for dependent and-parallelism. In International Conference on Logic Programming. MIT Press, 1997.
15. E. Pontelli, G. Gupta, and M. Hermenegildo. &ACE: A High-performance Parallel Prolog System. In IPPS 95. IEEE Computer Society, 1995.
16. E. Pontelli, D. Ranjan, and G. Gupta. On the complexity of parallelism in logic programming. Technical report, New Mexico State University, 1997.
17. E. Pontelli, D. Ranjan, and G. Gupta. On the complexity of the insertion/precedes problem. Technical report, New Mexico State University, 1997.
18. H. La Poutré. Lower Bounds for the Union-Find and the Split-Find Problem on Pointer Machines. Journal of Computer and System Sciences, 52:87-99, 1996.
19. A. Schönhage. Storage Modification Machines. SIAM Journal on Computing, 9(3):490-508, August 1980.
20. D.D. Sleator and R.E. Tarjan. A data structure for dynamic trees. Journal of Computer and System Sciences, 26, 1983.
21. R.E. Tarjan. A Class of Algorithms which Require Nonlinear Time to Maintain Disjoint Sets. Journal of Computer and System Sciences, 18(2), 1979.
22. R.E. Tarjan. Data Structures and Network Algorithms. CBMS-NSF, 1983.
23. A.K. Tsakalidis. The Nearest Common Ancestor in a Dynamic Tree. Acta Informatica, 25, 1988.
An Abductive Semantics for Disjunctive Logic Programs and Its Proof Procedure

Jia-Huai You, Li Yan Yuan, Randy Goebel
Department of Computing Science
University of Alberta
Edmonton, Alberta, Canada T6G 2H1
{you, yuan, goebel}@cs.ualberta.ca
Abstract. While it is well-known how normal logic programs may be viewed as a form of abduction and argumentation, the problem of how disjunctive programs may be used for abductive reasoning is rarely discussed. In this paper we propose an abductive semantics for disjunctive logic programs with default negation and show that Eshghi and Kowalski's abductive proof procedure for normal programs can be adopted to compute abductive solutions for disjunctive programs.
1 Introduction
In its simplest form, abduction is the problem: from A and A ← B, infer B as a possible explanation of A. Nonmonotonic reasoning has been explored as a form of abductive reasoning. In particular, default assumptions in logic programs have been treated as abductive hypotheses, and a number of reasoning mechanisms and semantics have been proposed [7, 10, 16, 18, 19]. Chief among these is Eshghi and Kowalski's formulation of an elegant abductive proof procedure for normal programs where default assumptions are viewed as abducibles. Kakas et al. presented a comprehensive exploration of abductive logic programming [16, 17]. A fundamental insight is that abductive reasoning embodies an argumentation approach to logic program semantics. Dung [8], as well as Bondarenko et al. [2], subsequently showed that nonmonotonic reasoning in general is a form of argumentation using default assumptions.

There are important applications of abductive reasoning with disjunctive programs. For example, in AI planning and scheduling, in general we are interested in whether there is a plan that achieves the goal, and whether there is a schedule that satisfies the specified constraints. Such a solution corresponds to an explanation (the abducibles in a plan, for example) of an observation (the goal to be
achieved) in abduction. Furthermore, since abduction is a form of hypothetical reasoning, it embodies a form of learning and prediction; e.g., one is often able to abductively complete partial descriptions given as observations. For example, suppose a robot has two hands, the left being designed to be capable of picking up a block and placing it down in the same orientation, and the right rotating it. Now suppose an observation is made that the block's orientation is changed and that either the left or the right hand is broken.¹ An explanation of this observation must include a prediction that it is the left hand that is broken. We note that the static semantics for disjunctive programs [24] is not designed to be capable of accommodating these kinds of applications.

Despite well understood results relating normal programs with abduction and, in more general cases, relating more general inference systems with abduction and argumentation [2], the problem of how disjunctive programs may be viewed as abduction is still open. For example, Dung showed in [5] that acyclic disjunctive programs can be interpreted as abductive programs in the sense of Eshghi and Kowalski, but he left open the question of what constitutes an abductive semantics for the class of all disjunctive programs. The main problem is that the method used in Dung's approach, which is based on a program transformation called shifting, does not even preserve the minimal model semantics for positive disjunctive programs. The problem has been noticed by a number of authors in dealing with disjunctive defaults (e.g. [12, 26]).

In this paper we show a simple extension of the various abductive semantics for normal programs to disjunctive programs. The central idea is to augment the standard first order deduction relation by the inclusion of an additional inference rule, called the rule of assumption commitment, which resolves an assumption and a classic disjunction. Its ground form is:
    not_φ    φ ∨ ψ
    ---------------
           ψ

This inference rule is similar to that of resolution (also called disjunction elimination in natural deduction). That is, if one identifies not_φ with ¬φ, it becomes the resolution inference rule. Intuitively, it means that if we assume not_φ, we can strengthen our assumption in an attempt to derive additional information. For example, with the following disjunctive program
    pay_cash(x, y) ∨ pay_by_credit(x, y) ←

¹ Here the term observation is used in a broad sense, not necessarily an act of physically observing something.
"~40
where ∨ is interpreted as epistemic disjunction (i.e., as exclusive as possible). The observation that bob pays cash to greg (pay_cash(bob, greg)) is explained by the assumption that bob does not pay greg by credit,
    not_pay_by_credit(bob, greg),

which resolves with the above disjunction to yield pay_cash(bob, greg).

The approach taken here falls into the general frameworks of abduction and argumentation [2, 8, 17] in that a specific inference system (using the above inference rule) is adopted for the purpose of capturing the meaning of the epistemic disjunction initially formulated by Gelfond and Lifschitz. The most interesting result of such an adoption is a new semantics that extends the stable semantics for disjunctive programs in the same way as regular/preferential semantics extends the stable semantics for normal programs. Furthermore, with partial evaluation as pre-processing [3], Eshghi and Kowalski's abductive proof procedure, with a mechanism of positive loop checking, can be used to compute abductive solutions for a disjunctive program. For example, if an AI planning problem is formulated as a disjunctive program, one can use the Eshghi-Kowalski abductive proof procedure to answer the question whether a goal is achievable. The procedure computes the abducibles, in backward chaining fashion, that are needed in order to achieve the goal. The procedure is sound but not complete with respect to the new semantics.² This seems to be the first backward chaining proof procedure for any semantics for disjunctive programs. The stable semantics in principle does not possess a top down proof procedure, for the simple reason that it does not have the locality property; i.e., a backward chaining proof may be wrong simply due to the nonexistence of a stable model. Almost all current implementations of the static semantics rely on a bottom-up generation of minimal models. Logic programming has been identified with a goal-oriented programming paradigm. An implementation of a nonmonotonic semantics could benefit from the techniques used in implementing Prolog [4]. Thus, the existence of a top down proof procedure is a significant feature for any semantics.
It is important to mention three remarkable facts about Eshghi and Kowalski's procedure. First, it is conceptually simple: all it needs, on top of a traditional negation-as-failure proof procedure, is a mechanism to handle two kinds of (negative dependency) loops. Secondly, it is now known that this procedure is sound, and complete as well with a mechanism of positive loop checking, for a number of independently proposed but equivalent semantics for normal programs (cf. [7, 13, 20, 30]). This includes the preferential semantics based on abduction [7], the regular semantics [29] and the partial stable semantics [25] using model-theoretic approaches, and a stronger version of the stable class

² The procedure becomes complete when properly combined with a complete resolution procedure.
semantics [1]. Perhaps more importantly, this procedure can be implemented efficiently, and this efficiency could be comparable with that of a completely top down proof procedure for the well-founded semantics.³ An interesting conclusion is that the efficiency of credulous reasoning (w.r.t. one model) based on a weaker notion of stable models, i.e., regular models, is largely comparable with the efficiency of reasoning using the well-founded semantics when query answering is based on a backward chaining proof.

The next section sets the stage for disjunctive programs to be interpreted as abductive programs. Section 3 shows, using Dung's preferred extension as an example, that any abductive semantics for normal programs can be simply extended to disjunctive programs. This specifically yields a new abductive semantics for disjunctive programs. Section 4 gives an elegant fixpoint characterization of the minimal model semantics, stable semantics, and the new semantics in terms of the extended inference relation. Section 5 shows a result which enables the use of Eshghi and Kowalski's abductive proof procedure for the proposed semantics, with Section 6 containing additional related work and remarks.
2 Disjunctive Logic Programming as Abduction
In general, an abductive framework is a triple (P, Ab, IC), where P is a first order theory, Ab is the set of abducibles, and IC is a set of first order formulas serving as integrity constraints. We restrict attention to a special class of abductive frameworks which correspond to disjunctive programs. For each ordinary predicate atom φ = p(t1, ..., tn), not_φ denotes the corresponding abducible atom not_p(t1, ..., tn). We let P be a first order theory that consists of clauses of the form

    α1 ∨ ... ∨ αk ← β1, ..., βm, not_γ1, ..., not_γn

where the αi's and βi's are atoms, and the not_γi's are abducibles (also called assumptions). We may denote the above clause by A ← B, not_C, where A is the set of the atoms in the head of the clause, B the set of the atoms in the body, and not_C the set of the assumptions in the body.

³ This is not a claim on the complexity of the semantics, which is known to be intractable for the former and tractable for the latter. This claim merely reflects the fact that, as far as backward chaining proof is concerned, the behaviors of these two semantics are quite similar: odd (negative dependency) loops are treated exactly the same in both semantics, and for any even loop, a procedure for the well-founded semantics assigns (tentatively) the value undefined based on the current loop, and a procedure for the regular semantics simply indicates "in a model". In either case, no further derivation beyond the loop is performed.
In the literature, disjunction in a program is usually expressed by a special connective, sometimes called epistemic disjunction, whose intuitive meaning is that it be interpreted as exclusive as possible. Here we replace it by classic disjunction ∨; its intended meaning is enforced by the underlying abductive semantics. Note that since such a clause is just a first order formula, the first order derivation relation ⊢ is applicable. For example, with P consisting of

    a ∨ b ← not_c
    a ← b
    b ← a

we have P ⊢ a ← not_c, P ∪ {not_c} ⊢ a, etc. In Dung's terminology, not_c is evidence for a. Further, we let

    IC = {⊥ ← (φ1 ∨ ... ∨ φn), not_φ1, ..., not_φn | n ≥ 1}
When n = 1, the constraints in IC become those for normal programs. The special atom ⊥ denotes a violation of a constraint. Although it embodies a meaning of inconsistency, it is not the same as logical inconsistency; in particular, its presence in a theory does not trivialize the theory by allowing anything to be concluded. We argue that this is an advantage of logic programming, as a notion of "local inconsistency" may be accommodated.

For the purpose of abductive reasoning with disjunctive programs, we augment the standard first order derivation relation by a new, resolution-like inference rule:

Rule of Assumption Commitment (RAC):

    not_φ    ψ1 ∨ ... ∨ ψn
    --------------------------------------------
    θ(ψ1 ∨ ... ∨ ψj−1 ∨ ψj+1 ∨ ... ∨ ψn)

where φ is an atom, the ψi's are literals, and φ and ψj are unifiable with m.g.u. θ. When n = 1, the resolvent is ⊥. This inference rule says that if one assumes not_φ, one may well commit it to ¬φ in an attempt to derive additional information. In the sequel, we use ⊢d to denote the standard first order derivation relation augmented by the above rule of inference. Note that ⊢d is a monotonic relation. We say that a set E of abducibles is an explanation of an atom φ if

    P ∪ E ⊢d φ    and    P ∪ E ∪ IC ⊬d ⊥
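For ground programs, the derivation relation ⊢d and the explanation test can be prototyped by forward chaining. The sketch below is our own (function names and clause encoding included); it deliberately under-approximates full first order derivation, but it derives disjunctions, applies RAC against the assumed atoms, and detects ⊥ as a disjunction resolved away entirely:

```python
def derive(program, assumptions):
    """Forward-chaining approximation of the ground derivation relation |-d.

    program: list of clauses (head, body, neg), where head is a set of atoms
    read as a disjunction, body a set of atoms, and neg a set of atoms gamma
    whose assumptions not_gamma appear in the clause body.
    assumptions: the set of atoms gamma such that not_gamma is assumed.
    Returns the set of derived disjunctions as frozensets; the empty
    frozenset encodes bottom. A clause fires only when each body atom has
    been derived outright, so full first order case analysis is not covered,
    but the examples in the text go through."""
    derived = set()
    changed = True
    while changed:
        changed = False
        units = {a for d in derived if len(d) == 1 for a in d}
        for head, body, neg in program:
            if body <= units and neg <= assumptions:
                d = frozenset(head)
                if d not in derived:
                    derived.add(d)
                    changed = True
        # Rule of Assumption Commitment: resolve an assumed not_gamma
        # against any derived disjunction containing gamma.
        for d in list(derived):
            for g in assumptions & d:
                r = d - {g}
                if r not in derived:
                    derived.add(r)
                    changed = True
    return derived

def explains(program, assumptions, atom):
    # E explains atom if P u E |-d atom and no bottom arises; with RAC,
    # bottom arises exactly when a derived disjunction is resolved away
    # entirely by the assumptions (mirroring the constraints in IC).
    d = derive(program, assumptions)
    return frozenset({atom}) in d and frozenset() not in d
```

On the pay_cash program from the introduction, assuming not_pay_by_credit resolves the disjunction down to pay_cash, while assuming both disjuncts away yields ⊥ and so fails the explanation test.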
Example 1. Consider the situation where a block x is put down onto the table or onto another block y:

    on_table(x) ∨ on_block(x, y) ← putdown(x)
    putdown(a) ←
Suppose we see that a is on block b. Then E = {not_on_table(a)} forms an explanation.

An interesting aspect of abductive reasoning is prediction. As a specific form of prediction, it is about completing, or enriching, the initially specified, incomplete information.

Example 2. Consider the popular broken-hand example originally discussed in the context of default reasoning [23]: we know either the left hand is broken or the right hand is broken, and in general, a hand is usable if not broken. The given information is incomplete, as we don't know which hand is broken and which is not (perhaps both could have been broken). For the purpose of demonstrating the point of augmenting partial information, we further assume that the left hand being usable leads to the use of it, which results in moving a block; and the use of the right hand leads to moving the table.

    lh_broken ∨ rh_broken ←
    lh_usable ← not_lh_broken
    rh_usable ← not_rh_broken
    move_block ← lh_usable
    move_table ← rh_usable
Now suppose we observe that the block is moved from its original location (and suppose we cannot see any operations). Under the closed world assumption for operations (namely, no operations other than the ones performed by the program may be performed), we can predict that it is the right hand that is broken.
3 Abductive Semantics for Disjunctive Programs
In this section we demonstrate that an abductive semantics for normal programs can be simply extended to disjunctive programs by augmenting only the underlying derivation relation with RAC. We do this for Dung's semantics based on preferred extensions, as there is additional interest in a fixpoint characterization and in the adoption of Eshghi and Kowalski's proof procedure.
In the sequel, for the sake of convenience, our technical exploration will be based on ground programs. Also, since in an abductive framework (P, Ab, IC) both Ab and IC are fixed, we only mention P. In the following definition, we paraphrase Dung's definition of a preferred extension; by adopting ⊢d, a new semantics for disjunctive programs is obtained automatically. First, we need some terminology: an assumption set S is called ⊥-consistent if P ∪ S ⊬d ⊥.

Definition 1. Let P be a disjunctive program. An assumption not_φ is said to be acceptable w.r.t. an assumption set S if for any assumption set S' such that P ∪ S' ⊢d φ, we have P ∪ S ⊢d ψ for some not_ψ ∈ S'. A preferred extension E is a maximal assumption set that is ⊥-consistent and such that every not_φ ∈ E is acceptable w.r.t. E.

In terms of argumentation (see, for example, [2, 8, 16, 28]), not_φ is acceptable w.r.t. S if any assumption set S' that attacks not_φ is counter-attacked by S itself. Note that the definition is precisely that of Dung's if we use ⊢ instead of ⊢d. In fact, for normal programs, ⊢d coincides with ⊢, since RAC has no real effect. Because of this, the properties of preferred extensions for normal programs can be easily extended to disjunctive programs. For example, it is easy to show that there is at least one preferred extension for any disjunctive program.

Dung uses the Barber's paradox to illustrate the drawback of the stable semantics for the case of normal programs [7]. To illustrate the same point for disjunctive programs, we extend the example to a disjunctive program.
Example 3. Consider the disjunctive program:

    shave(bob, x) ← not_shave(x, x)
    pay_cash(y, x) ∨ pay_by_credit(y, x) ← shave(x, y)
    accepted(x, y) ← pay_cash(x, y)
    accepted(x, y) ← pay_by_credit(x, y)

This program intuitively says that bob shaves those who do not shave themselves; if x shaves y, then y pays x by cash or credit; either way is accepted. Assume there is another person, called greg. Then clearly, we should conclude that bob shaves greg, and that greg pays bob by cash or by credit, either way being accepted. This program has no stable model, but it has two preferred extensions, both containing not_shave(greg, greg). In addition, one of the two contains not_pay_cash(greg, bob) and the other not_pay_by_credit(greg, bob). Thus, if we know greg pays cash to bob, then the abductive solution is that we must assume that greg does not pay bob by credit.
The use of the inference rule RAC may be viewed as semantically shifting the disjunctive clauses in a program P. Recall that the idea of shifting a clause A ← B, not_C (e.g. see [5, 12]) is to syntactically transform such a clause into a set of normal clauses where, for each atom φ in A, there is a normal clause with φ as the head, and any other atom ψ in A is moved to the body as not_ψ. More formally, given a clause R:

    A ← B, not_C

the set of normal clauses obtained by shifting R, denoted Rshift, is:

    Rshift = {φ ← B, not_C' : φ ∈ A, not_C' = not_C ∪ {not_ψ : ψ ∈ A, ψ ≠ φ}}

Given a disjunctive program P, we denote by Pshift the set of all normal clauses obtained by shifting all the clauses in P. It is known that this technique is too strong to capture even the minimal model semantics of a positive disjunctive program (cf. [12, 26]).
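The shifting transformation is easy to state operationally. A minimal sketch for ground clauses, under our own encoding (a clause as a triple of sets):

```python
def shift_clause(head, body, neg):
    """Shift a ground disjunctive clause A <- B, not_C into normal clauses:
    one clause per head atom phi, with every other head atom psi moved to
    the body as the assumption not_psi (the set Rshift in the text)."""
    return [(phi, set(body), set(neg) | {psi for psi in head if psi != phi})
            for phi in head]

def shift_program(program):
    # clauses are (head, body, neg): head a set of atoms read disjunctively,
    # neg the atoms gamma whose assumptions not_gamma occur in the body
    return [c for clause in program for c in shift_clause(*clause)]
```

As the surrounding text notes, this purely syntactic transformation can lose minimal models when the program has head-cycles; the semantic effect of RAC does not suffer from this problem.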
Example 4. Programs with head-cycles are often used to illustrate the difficulties with syntactically shifting a program. Consider, for example, the following program P:

    a ∨ b ∨ c ←
    a ← b
    b ← a

The program has a head-cycle between a and b. In terms of stratification, a should be one level higher than b (by the second clause) and b should be one level higher than a (by the third clause); but a and b also appear in the same head of a clause; hence the term head-cycle. The program has two minimal models, {c} and {a, b}, which correspond to the two preferred extensions E1 = {not_a, not_b} and E2 = {not_c}. Let us verify that E1 is a preferred extension. The smallest assumption set that attacks not_a is {not_c} (i.e., P ∪ {not_c} ⊢d a), which is counter-attacked by E1 itself (i.e., P ∪ E1 ⊢d c).

The normal program Pshift consists of the following clauses:

    a ← not_b, not_c
    b ← not_a, not_c
    c ← not_a, not_b
    a ← b
    b ← a

Pshift, however, has no stable model corresponding to the minimal model {a, b} of P.
We now show that there is a one-to-one relationship between a type of preferred extensions, called stable extensions, and stable models. We first recall the definition of a stable model. A stable model M of a disjunctive program P is a set of atoms which is a minimal model of the following transformed program:

    P_M = {A ← B : (A ← B, not_C) ∈ P, and for every not_γ ∈ not_C, γ ∉ M}

Similar to the case of normal programs [7], we define a stable extension S of a disjunctive program P as a total preferred extension of P: for any ordinary atom φ, either not_φ ∈ S or P ∪ S ⊢d φ. The stable models of a disjunctive program, and in particular the minimal models of a positive disjunctive program, are expressible as stable extensions.

Theorem 2. Let P be a disjunctive program and M be a set of atoms. M is a stable model of P iff S_M is a stable extension of P, where S_M = {not_φ : φ ∉ M}.
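The transformed program P_M and the stable-model test can be prototyped for small ground programs. This is a brute-force sketch under our own clause encoding, not the paper's machinery:

```python
from itertools import combinations

def reduct(program, M):
    """Gelfond-Lifschitz style transformation P^M for ground disjunctive
    programs. Clauses are (head, body, neg) with head a set of atoms read
    disjunctively; drop clauses having some not_gamma with gamma in M,
    and delete the remaining assumptions."""
    return [(head, body) for head, body, neg in program if not (neg & M)]

def is_model(positive_program, M):
    # M satisfies A <- B when body <= M implies the head meets M
    return all(not body <= M or head & M for head, body in positive_program)

def is_stable(program, M):
    """M is a stable model iff it is a minimal model of the reduct P^M
    (minimality checked here by brute force over proper subsets)."""
    pm = reduct(program, M)
    if not is_model(pm, M):
        return False
    ms = sorted(M)
    for r in range(len(ms)):
        for sub in combinations(ms, r):
            if is_model(pm, set(sub)):
                return False
    return True
```

On the positive head-cycle program of Example 4, the reduct is the program itself, and both minimal models {c} and {a, b} pass the test.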
This result provides another way to understand the results by Eiter and Gottlob, as well as by Marek et al. [9, 14, 22]: disjunctive default logic falls into the same complexity level as default logic. In Gottlob's explanation of these complexity levels, this can be said rather plainly as: for disjunctive programs, epistemic disjunction does not present one more source of computational difficulty than classic disjunction.
4 A Fixpoint Characterization
We show that some of the results presented in [30], namely that regular models/preferred extensions are maximal normal alternating fixpoints of a suitable operator for normal programs, can be extended to the case of disjunctive programs. This yields an elegant fixpoint characterization of all three semantics: minimal models for positive disjunctive programs, and stable models as well as preferred extensions for disjunctive programs. We define a function F_P over sets S of assumptions as:

    F_P(S) = {not_φ : φ is an ordinary atom such that P ∪ S ⊬d φ}

It is easy to check that this function has the following property (called anti-monotonicity):

    S1 ⊆ S2 implies F_P(S2) ⊆ F_P(S1)

This holds because the derivation relation ⊢d is monotonic: the more information we have, the less information we fail to derive. Then, the composite function that applies F_P twice, denoted F_P², is monotonic. That is,

    S1 ⊆ S2 implies F_P²(S1) ⊆ F_P²(S2)
A fixpoint of the function F_P² is called an alternating fixpoint of F_P (or simply, of P; cf. [1, 11, 30]). An alternating fixpoint S is called normal if S ⊆ F_P(S). For credulous reasoning, we are interested in maximal normal alternating fixpoints. On the other hand, Dung's preferred extensions are, by definition, the fixpoints of the following operator over sets of assumptions:

    D_P(S) = {not_φ | not_φ is acceptable w.r.t. S}

Note that F_P² depends only on a derivation relation. As a result, the operation of extending an assumption set using F_P² is rather mechanical. On the other hand, the definition of D_P, which is based on the notion of acceptability, is less direct. However, we will show that these two operators are equivalent. First, let us see an example.
Example 5. Consider the following disjunctive program:

    a ∨ b ← not_a, not_b
    c ← not_d

It is easy to show that S = {not_d} is a maximal normal alternating fixpoint. First, F_P(S) = {not_d, not_a, not_b}. Then, F_P²(S) = {not_d}. Further, S ⊆ F_P(S). Thus, S is a normal alternating fixpoint. In addition, it is easy to see that S is the only normal alternating fixpoint. For example, S' = {not_d, not_a} is not an alternating fixpoint, since F_P(S') = {not_d, not_a, not_b} and F_P²(S') = {not_d}. Further, S'' = {not_d, not_a, not_b} is an alternating fixpoint, but it is not normal. Thus, S is trivially maximal.

S is also a preferred extension. This is more tedious to verify, since one has to consider, for each assumption not_φ, all the possible assumption sets that may attack not_φ, and for each such assumption set that does attack not_φ, whether it is counter-attacked by S. We leave this to the reader to verify.

We now show that there is a one-to-one correspondence between maximal normal alternating fixpoints and preferred extensions. First, we have a lemma.

Lemma 3. Let P be a disjunctive program and S be an assumption set. Then, F_P²(S) = D_P(S).

For an alternating fixpoint S = F_P²(S), the normality condition S ⊆ F_P(S) ensures consistency. E.g., in the above example, {not_d, not_a, not_b} is an alternating fixpoint, but it is not ⊥-consistent and thus it is not normal. From this lemma, it is easy to show:

Theorem 4. Let P be a disjunctive program and S be an assumption set. Then, S is a preferred extension of P iff S is a maximal normal alternating fixpoint of P.
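The operator F_P and the alternating-fixpoint check of Example 5 can be prototyped directly under a ground-program forward-chaining approximation of ⊢d (all names here are ours; derive_units under-approximates ⊢d but suffices for this example):

```python
def derive_units(program, S):
    """Atoms phi with P u S |-d phi, under a forward-chaining approximation
    of |-d for ground programs. Clauses are (head, body, neg) with head a
    disjunction (set of atoms); S holds atoms gamma such that not_gamma is
    assumed. RAC removes assumed atoms from derived disjunctions."""
    derived = set()
    changed = True
    while changed:
        changed = False
        units = {a for d in derived if len(d) == 1 for a in d}
        for head, body, neg in program:
            if body <= units and neg <= S and frozenset(head) not in derived:
                derived.add(frozenset(head))
                changed = True
        for d in list(derived):
            for g in S & d:
                if d - {g} not in derived:
                    derived.add(d - {g})
                    changed = True
    return {a for d in derived if len(d) == 1 for a in d}

def F(program, atoms, S):
    # F_P(S), represented as the set of atoms phi with not_phi in F_P(S)
    return atoms - derive_units(program, S)

# Example 5:  a v b <- not_a, not_b ;  c <- not_d
P5 = [({'a', 'b'}, set(), {'a', 'b'}), ({'c'}, set(), {'d'})]
atoms5 = {'a', 'b', 'c', 'd'}
S5 = {'d'}                      # stands for the assumption set {not_d}
F1 = F(P5, atoms5, S5)          # {'a', 'b', 'd'}, i.e. F_P(S)
F2 = F(P5, atoms5, F1)          # {'d'}: S is a normal alternating fixpoint
```

Iterating F twice returns to S, and S ⊆ F_P(S) holds, matching the computation in the example.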
As a corollary of this theorem, stable models of a disjunctive program (and minimal models of a positive disjunctive program) are a particular type of maximal normal alternating fixpoint; namely, they are fixpoints of F_P (and hence trivially fixpoints of F_P²).
Corollary 5. Let P be a disjunctive program and S an assumption set. Then, S is a stable extension of P iff S is a fixpoint of F_P.
5 Proof Procedure
In this section we show that the extensions of a head-cycle-free disjunctive program P are precisely those of the normal program P_shift. This result enables the use of Eshghi and Kowalski's abductive proof procedure to answer queries, soundly, using the normal program obtained by shifting. It can be seen as an extension of results presented by Dung in [8] on the use of the Eshghi-Kowalski proof procedure for disjunctive programs. Dung's result relies on two restrictions. The first is that the program must be head-cycle-free, the same as ours. The other is that no default negation may go through recursion. Our result does not need this second restriction. First, let us look at an example.
Example 6. Consider the disjunctive program P = {a ∨ b ← not_a}. P is head-cycle-free simply because there is no positive body literal in the clause. The normal program obtained by shifting is P_shift:
a ← not_a, not_b
b ← not_a
Clearly, both P and P_shift have the same unique preferred extension, {not_a}.
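The shifting transformation itself is mechanical and can be sketched as follows. The helper `shift` and its clause encoding (head-atom set, body-assumption set) are our own illustrative choices, not from the paper.

```python
# Hedged sketch of the shift transformation: each disjunctive clause
# H <- B yields, for every h in H, the normal clause
# h <- B, not_(H \ {h}).

def shift(program):
    shifted = set()
    for head, body in program:
        for h in head:
            shifted.add((h, frozenset(body | {f"not_{g}" for g in head - {h}})))
    return shifted

P = [(frozenset({"a", "b"}), frozenset({"not_a"}))]   # a v b <- not_a
assert shift(P) == {
    ("a", frozenset({"not_a", "not_b"})),             # a <- not_a, not_b
    ("b", frozenset({"not_a"})),                      # b <- not_a
}
```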
Now we show the relationship between the extensions of a head-cycle-free disjunctive program P and those of its normal transformation P_shift. As there seems to be particular interest in this result, we provide a proof sketch here.
Theorem 6. Let P be a finite ground, head-cycle-free disjunctive program, and let P_shift be the normal program obtained by shifting. Then, an assumption set S is a preferred extension of P iff it is a preferred extension of P_shift.
Proof Sketch: We only show the ⇒ part. Without loss of generality, we assume that the given program P is free of positive body literals, since partial evaluation [3] can remove them.
We show that, for any assumption not_α, if not_α is acceptable w.r.t. S using program P and under ⊢_d, then not_α is acceptable w.r.t. S using P_shift and under ⊢. The if-clause in this statement is: for any assumption set W such that P ∪ W ⊢_d α, we have P ∪ S ⊢_d β for some not_β ∈ W. The conclusion part is: for any assumption set W such that P_shift ∪ W ⊢ α, we have P_shift ∪ S ⊢ β for some not_β ∈ W.
Suppose W is such that P_shift ∪ W ⊢ α. Then there is a clause α ← not_C, not_Rest in P_shift, where not_C ∪ not_Rest ⊆ W and not_Rest denotes the set of assumptions shifted from the head of the clause {α} ∪ Rest ← not_C in P. Then it is easy to see that there is a sequence of derivation steps using RAC that resolve {α} ∪ Rest with those assumptions in not_Rest to yield α. Thus, P ∪ W ⊢_d α. It then follows from the if-clause in the statement above that P ∪ S ⊢_d β, for some not_β ∈ W. This implies there is a clause {β} ∪ Rest' ← not_D in P such that not_D ⊆ S and {β} ∪ Rest' resolves with the assumptions in not_Rest' (not_Rest' ⊆ S) to yield β. Then, clearly, the corresponding shifted clause β ← not_D, not_Rest' derives β. That is, P_shift ∪ S ⊢ β for some not_β ∈ W. This shows the conclusion part. □
There is something of further interest in this result: a finite ground disjunctive program can be pre-processed by partial evaluation [3] to remove all the positive body literals in a program, so that the resulting program is trivially head-cycle-free.⁴ Thus, the Eshghi-Kowalski proof procedure plus partial evaluation provides a proof theory for the abductive semantics proposed in this paper. The soundness of such a proof theory is easy to verify, mainly because partial evaluation is semantics-preserving for the semantics proposed in this paper, and this semantics reduces to the preferential/regular semantics, for which the Eshghi-Kowalski procedure is sound and complete (with positive loop checking).
However, because disjunction cannot be simulated completely by normal programs, a procedure for normal programs cannot be complete in general for disjunctive programs.
Example 7. First, consider the following head-cycle-free program P:
a ∨ b
c ← a
c ← b
The program has two preferred extensions, {not_a} and {not_b}. Its shifted
⁴ If a program P is nonground, partial evaluation may not terminate. Even though it can be made to terminate for finite ground programs, the process is in general intractable.
program, P_shift, consists of
a ← not_b
b ← not_a
c ← a
c ← b
It has the same preferred extensions. Given the query ← c,
the Eshghi-Kowalski procedure will construct two proofs (by backtracking), one with abducible not_b and the other with not_a. The Eshghi-Kowalski procedure is complete in this case. However, the situation changes if we add the two clauses {a ← not_a; b ← not_b} to P; denote the result by P'. P' has one preferred extension, ∅, and P' ⊨_a c. Now P'_shift is:
a ← not_a
b ← not_b
a ← not_b
b ← not_a
c ← a
c ← b
Although it has the same preferred extension as P', it is no longer the case that P'_shift ⊨_a c, and the Eshghi-Kowalski procedure cannot answer the query ← c using P'_shift. To make it a complete proof procedure, the Eshghi-Kowalski procedure should be combined with a linear resolution procedure. In this case, even head-cyclic programs can be handled, since partial evaluation is a form of resolution procedure. Recently, it has come to our attention that, in an unpublished manuscript [6], Dung defined a procedure that combines the Eshghi-Kowalski procedure with a form of linear resolution, the SLI-resolution of Lobo et al. [21]. The relationship between this procedure and our semantics is currently being investigated.
6 Related Work and Final Remarks
Work by Inoue and Sakama [15, 27] yields important insights into a different aspect of the problem: how to represent abductive programs by (extended) disjunctive programs. In [15], they proposed a transformation from the former to the latter, and in [27] they showed that, in general, an abductive program can be viewed as a disjunctive program with priorities. Our work proposes an abductive/argumentation framework for disjunctive programs. Further, our
abductive semantics for disjunctive programs reduces to the regular/preferential semantics for normal programs. This is the key to applying Eshghi and Kowalski's abductive proof procedure. Dung shows that there is a natural abduction/argumentation interpretation for explicit negation. His techniques can be applied to the semantics proposed here. How to accommodate explicit negation in the Eshghi-Kowalski procedure requires further investigation.
References
1. C. Baral and V. Subrahmanian. Stable and extension class theory for logic programs and default logic. J. Automated Reasoning, pages 345-366, 1992.
2. A. Bondarenko, F. Toni, and R. Kowalski. An assumption-based framework for nonmonotonic reasoning. In Proc. 2nd Int'l Workshop on LPNMR, pages 171-189. MIT Press, 1993.
3. S. Brass and J. Dix. Disjunctive semantics based upon partial and bottom-up evaluation. In Proc. 12th ICLP, pages 199-216, 1995.
4. W. Chen and D. Warren. Extending Prolog with nonmonotonic reasoning. J. Logic Programming, 27(2):169-183, 1996.
5. P. Dung. Acyclic disjunctive logic programs with abductive procedures as proof procedure. In Proc. 1992 Fifth Generation Computer Systems, pages 555-561, 1992.
6. P. Dung. An abductive procedure for disjunctive logic programs. Unpublished manuscript, 1993.
7. P. Dung. An argumentation theoretic foundation for logic programming. J. Logic Programming, 22:151-177, 1995. A short version appeared in Proc. ICLP'91.
8. P. Dung. On the acceptability of argument and its fundamental role in nonmonotonic reasoning and logic programming and n-person games. Artificial Intelligence, 76, 1995. A short version appeared in Proc. IJCAI'93.
9. T. Eiter and G. Gottlob. On the computational cost of disjunctive logic programming: propositional case. Annals of Mathematics and Artificial Intelligence, 15:289-324, 1993.
10. K. Eshghi and R. A. Kowalski. Abduction compared with negation by failure. In Proc. 6th ICLP, pages 234-254. MIT Press, 1989.
11. A. Van Gelder. The alternating fixpoint of logic programs with negation. J. Computer and System Sciences, pages 185-221, 1993. The preliminary version appeared in PODS'89.
12. M. Gelfond, V. Lifschitz, H. Przymusinska, and M. Truszczynski. Disjunctive defaults. In Proc. 2nd Int'l Conf. on Principles of Knowledge Representation and Reasoning, pages 230-237, 1991.
13. L. Giordano, A. Martelli, and M. Sapino. A semantics for Eshghi and Kowalski's abductive procedure. In Proc. 10th ICLP. MIT Press, 1993.
14. G. Gottlob. Complexity results for nonmonotonic logics. Journal of Logic and Computation, 2:397-425, 1992.
15. K. Inoue and C. Sakama. Transforming abductive logic programs to disjunctive programs. In Proc. 10th ICLP. MIT Press, 1993.
16. A. Kakas, R. Kowalski, and F. Toni. Abductive logic programming. J. Logic and Computation, 2:719-770, 1992.
17. A. Kakas, R. Kowalski, and F. Toni. The role of abduction in logic programming. In Handbook of Logic in Artificial Intelligence and Logic Programming. Oxford University Press, 1995.
18. A. Kakas and P. Mancarella. Generalized stable models: a semantics for abduction. In Proc. 9th ECAI, 1990.
19. A. Kakas and P. Mancarella. Stable theories for logic programs. In Proc. ILPS. MIT Press, 1991.
20. A. Kakas and P. Mancarella. Preferred extensions are partial stable models. J. Logic Programming, pages 341-348, 1992.
21. J. Lobo, J. Minker, and A. Rajasekar. Foundations of Disjunctive Logic Programming. MIT Press, 1992.
22. W. Marek, A. Rajasekar, and M. Truszczynski. Complexity of computing with extended propositional logic programs. In H. Blair, W. Marek, A. Nerode, and J. Remmel, editors, Proc. of the Workshop on Structural Complexity and Recursion-Theoretic Methods in Logic Programming, pages 93-102, Washington DC, 1992. Cornell University.
23. D. Poole. What the lottery paradox tells us about default reasoning. In Proc. KR'89, pages 333-340, 1989.
24. T. C. Przymusinski. Static semantics for normal and disjunctive logic programs. Annals of Mathematics and Artificial Intelligence, 1995.
25. D. Saccà and C. Zaniolo. Stable models and non-determinism in logic programs with negation. In Proc. 9th ACM PODS, pages 205-217, 1990.
26. C. Sakama and K. Inoue. Relating disjunctive logic programs to default theories. In Proc. 2nd Int'l Workshop on LPNMR, pages 266-282. MIT Press, 1993.
27. C. Sakama and K. Inoue. Representing priorities in logic programs. In Proc. Int'l Conference and Symposium on Logic Programming. MIT Press, 1996.
28. J. You, R. Cartwright, and M. Li. Iterative belief revision in extended logic programming. Theoretical Computer Science, 170(1-2):383-406, 1996.
29. J. You and L. Yuan. A three-valued semantics for deductive databases and logic programs. J. Computer and System Sciences, 49:334-361, 1994. A short version appeared in Proc. PODS'90.
30. J. You and L. Yuan. On the equivalence of semantics for normal logic programs. J. Logic Programming, 22(3):211-222, 1995.
Assumption-Commitment in Automata
Swarup Mohalik and R. Ramanujam
The Institute of Mathematical Sciences, C.I.T. Campus, Chennai - 600 113, India
{swarup, jam}@imsc.ernet.in
ABSTRACT In the study of distributed systems, the assumption-commitment framework is crucial for compositional specification of processes. The idea is that we reason about each process separately, making suitable assumptions about the other processes in the system. Symmetrically, each process commits to certain actions which the other processes can rely on. We study such a framework from an automata-theoretic viewpoint. We present systems of finite state automata which make assumptions about the behaviour of other automata and make commitments about their own behaviour. We characterize the languages accepted by these systems as the regular trace languages (of Mazurkiewicz) over an associated independence alphabet, and present a syntactic characterization of these languages using top-level parallelism. The results generalize smoothly to automata over infinite words.
1 Introduction
A distributed system usually consists of a finite number of processes, which proceed asynchronously and periodically exchange information with each other. A theoretically simple model of a distributed system is one where the network of processes is static, so that the number of processes and the channels of communication between them are fixed, and messages are not buffered, so that all communication can be studied as synchronization. It was such a simplification that led to the model of Communicating Sequential Processes [H78, H85]. The attractiveness of such a model is that when there are n processes in the system, we can represent the system as a parallel composition (P1 ‖ ··· ‖ Pn), and in each of the Pi we can refer to synchronizations with other Pj. When analyzing the behaviours of the system, ideally, we would like to study the behaviours of each process separately and put them together. Such compositional reasoning formed a major issue of concern in the eighties [Z89, PJ91].
Early on, it was realized that the assumption-commitment [FP78, MC81] or rely-guarantee [J83] paradigm was required for local reasoning. While there are some differences between these two approaches, we are not concerned with these differences here, at least for the purposes of this paper. The main idea is that each process makes assumptions about the behaviour of other processes and commits to fulfilling those made by other processes about its own behaviour. We can see this as conditional commitment to a transaction: "if others commit to doing x and y, then I commit to doing z". Symmetrically, all processes proceed this way, and we can compose processes only when mutual assumptions are met. This facilitates local reasoning: we can reason about the behaviour of each process separately, assuming that others maintain relevant properties, and reason globally about their compatibility. While a number of researchers have studied this framework in the context of programming methodology, process algebras or temporal logics, there seems to have been little effort in formulating it from an automata-theoretic viewpoint. Implicitly, these models assume each process to be a machine of some sort, but studying formally the implications of each process being a finite-state machine is a different exercise altogether. The closest to such an approach seems to be that of asynchronous cellular automata [CMZ93], in the sense that the moves of a process may depend on the moves of others so that they move jointly. However, the connection of these automata to local reasoning as described here is not quite transparent. In this paper, we study systems of finite state automata where, along with a distributed alphabet of actions, we also have a commitment alphabet to specify conditional synchronization. An automaton A1 may synchronize with another automaton A2 on an action ⟨a, c1, c2⟩, where we see A1 as committing to c1 provided A2 commits to c2.
Symmetrically, for such a synchronization to occur, A2 must have a transition on a where it commits to c2. (In this case, A2 may not require the c1 commitment from A1.) We propose such automata and study their products. We consider the class of languages accepted by such systems and show them to be the same as the regular trace languages (of Mazurkiewicz), thus showing that these systems have the same power as asynchronous (cellular) automata. We then present a Kleene characterization of these languages using parallel composition of regular expressions. We then point out that, when considering infinite behaviours of these automata, the results generalize smoothly. Why should one look for an automata-theoretic account of the assumption-commitment framework? An important reason is that these automata can serve as natural models for temporal logics based on local reasoning (for instance, the one in [R96]). Compositional model checking is one of the major goals of computer-aided verification [Att95], and we believe that the automata presented here (particularly over infinite words) may help. Consider a temporal logic with local assertions that may refer to other agents' local formulas, and with global formulas that assert compatibility. Then each formula can be associated with an automaton of our kind, so that model checking can be done in layers: individually for each agent and globally for compatibility.
An important problem that we do not study here is that of dynamic networks of processes. The assumption-commitment framework is general enough to include such networks, which can, in principle, accommodate unboundedly many processes. In future work, we hope to study the automata theory of networks where processes may fork and join at will. These automata also bear a close resemblance to knowledge-based programs [FHMV95], where a process makes conditional transitions based on what it knows about other processes. Incorporating such knowledge-based transitions in automata constitutes a proper generalization of our systems: here, an automaton may refer to assumptions about others only at synchronization points, whereas a knowledge-based program can refer to its knowledge about others at any stage of its computation. We see this as an interesting issue for future study.
2 The automata
We begin with some notation. An automaton over a finite nonempty alphabet Σ is a tuple M = (Q, δ, q0, F), where Q is a finite set of states, δ ⊆ Q × Σ × Q is the transition relation, q0 ∈ Q is the initial state, and F ⊆ Q is the set of final states of the automaton. As usual, we write q --a--> q' to mean that (q, a, q') ∈ δ. Note that the automaton is in general nondeterministic. A run from q0 ∈ Q to qk ∈ Q is a sequence of transitions of the automaton q0 --a1--> q1 --a2--> ··· --ak--> qk, where k ≥ 0. When qk ∈ F, we say that the sequence a1···ak is accepted by M, and we denote the set of all accepted strings by L(M), the (regular) language accepted by M.
Here we consider systems of such automata, for which we define the notion of a distributed alphabet. Fix a finite set of locations Loc = {1, ..., n}, n > 0. A distributed alphabet is a tuple Σ̃ = (Σ1, ..., Σn), where each Σi is a finite nonempty set of actions of the i-th automaton. When a ∈ Σi ∩ Σj, i ≠ j, we think of it as a synchronization action between i and j. Given a distributed alphabet Σ̃, we often speak of the set Σ =def Σ1 ∪ ··· ∪ Σn as the alphabet of the system. We also make implicit use of the associated function loc: Σ → 2^{1,...,n} defined by loc(a) =def {i | a ∈ Σi}.
As suggested in the last section, we have another alphabet, which we call the commitment alphabet. This is a tuple C̃ = ⟨C1, C2, ..., Cn⟩, where for all i ≠ j, Ci ∩ Cj = {⊥}. The element ⊥ is the null assumption (or commitment). We call C = C1 ∪ ··· ∪ Cn the commit set. For a ∈ Σ, we use the notation Ca to denote the set ∪_{i ∈ loc(a)} Ci. Let Φa denote the set of commitment functions φ: loc(a) → Ca with φ(i) ∈ Ci for each i ∈ loc(a). When φ(i) = ⊥ for i ∈ loc(a), we treat this as a "don't care" condition.
Definition 2.1 Given a distributed alphabet Σ̃ = (Σ1, ..., Σn) and a commit alphabet C̃, we define extended alphabets as follows:
Σi^C =def {⟨a, φ⟩ | a ∈ Σi and φ ∈ Φa}.
Σ̃^C =def ⟨Σ1^C, ..., Σn^C⟩.
Σ^C =def ∪_{i ∈ Loc} Σi^C.
We use λ, μ, etc. to denote members of Σ^C. We extend the function loc to Σ^C as well: loc(⟨a, φ⟩) =def {i | a ∈ Σi}. Note that the extended alphabet is more general than we need. When a ∈ Σ, we need not consider functions φ where ∀i ∈ loc(a), φ(i) = ⊥; further, when loc(a) = {i}, there is no need to consider commitments at all. In the interest of simple notation, we retain the generality of this presentation. (In cases like these, we will unabashedly refer to a ∈ Σi^C.)
We will need three kinds of projection maps, which we define below. The first is the commit erasure map σ: Σ^C* → Σ*, defined by σ(⟨a1, φ1⟩ ··· ⟨ak, φk⟩) =def a1···ak. The second is the component projection map ↾: (Σ^C* × Loc) → Σ^C*, defined by: ε↾i =def ε (where ε denotes the null string), and (λx)↾i =def λ(x↾i) if i ∈ loc(λ), and x↾i otherwise. At times, we abuse notation and use the same symbol as a component projection map from (Σ* × Loc) to Σ*, defined in identical fashion. The third is the commit projection map ↓: (Σ^C* × Loc) → C*, defined by: ε↓i =def ε, and (⟨a, φ⟩x)↓i =def φ(i)(x↓i) if i ∈ loc(a), and x↓i otherwise.
We are now ready to define AC-automata, the class of machines which make assumptions and commitments. We first define individual automata and then systems of such automata.
Definition 2.2 Consider the distributed alphabet Σ̃ = (Σ1, ..., Σn), the commit alphabet C̃ and the associated extended alphabet Σ̃^C over the set of locations Loc. Let i ∈ {1, 2, ..., n}.
1. An AC-automaton over Σi^C is a tuple Mi = (Qi, →i, si^0), where Qi is a finite set of states, si^0 is the initial state, and →i ⊆ Qi × Σi^C × Qi is the transition relation.
2. A system of AC-automata over the extended alphabet Σ̃^C is a tuple M̃ = (M1, M2, ..., Mn, F), where each Mi is an AC-automaton over Σi^C, and F ⊆ (Q1 × ··· × Qn).
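Under the assumption that a word over the extended alphabet is represented as a list of (action, commitment-function) pairs, the three projection maps can be sketched as follows. The location map `LOC`, the commitment values, and the helper names are our own illustrative choices, not the paper's.

```python
# Hedged sketch of the commit erasure, component projection, and
# commit projection maps for a two-agent example.

LOC = {"a": {1}, "b": {2}, "c": {1, 2}}   # loc: action -> locations

def erase(word):                  # sigma: drop the commitments
    return [a for a, _ in word]

def project(word, i):             # word |` i: keep i's letters
    return [(a, phi) for a, phi in word if i in LOC[a]]

def commits(word, i):             # word | i: i's commitment sequence
    return [phi[i] for a, phi in word if i in LOC[a]]

w = [("a", {1: "x1"}), ("b", {2: "y1"}), ("c", {1: "x2", 2: "y2"})]
assert erase(w) == ["a", "b", "c"]
assert project(w, 1) == [("a", {1: "x1"}), ("c", {1: "x2", 2: "y2"})]
assert commits(w, 2) == ["y1", "y2"]
```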
A remark is in order. It is not necessary that the commit alphabet be fixed universally for the system, as we have done above. We could have defined each AC-automaton with its own commit alphabet and subsequently ensured, in the definition of systems, that for all i, j ∈ Loc, the j-th commit set of automaton i is contained in the j-th commit set of automaton j. However, this tends to clutter up notation considerably, so we stick with the (more restricted) notion of a globally determined commit alphabet. Note that we have associated final states only with systems rather than with individual automata. This is natural in the assumption-commitment framework:
local reachability of any state of any automaton is dependent on the behaviour of the other automata in the system, and hence properties of the system are best given in terms of global states.¹ We might wish to partially specify global states with a "don't care" condition on some components of the system, but this is easily done using a set of global states as above. The global behaviour of M̃ is given below as that of the product automaton M̂ associated with the system. Note that the system is then a (finite state) machine over Σ, thus hiding away assumptions and commitments as internal details. This fits with the intuition that the behaviour of a distributed system is globally specified on Σ, and the machines in the system are programmed to achieve this using internal coordination mechanisms like synchronization and commitments among themselves.
Definition 2.3 Given a system of AC-automata M̃ = (M1, M2, ..., Mn, F)
over Σ̃^C, the product automaton associated with the system is given by M̂ = (Q̂, ⟹, ⟨s1^0, s2^0, ..., sn^0⟩, F), where Q̂ = (Q1 × ··· × Qn), and ⟹ ⊆ Q̂ × Σ × Q̂ is defined by: ⟨p1, p2, ..., pn⟩ ==a==> ⟨q1, q2, ..., qn⟩ iff
1. ∀i ∉ loc(a), pi = qi, and
2. ∀j ∈ loc(a), there exists φj ∈ Φa such that ∀i ∈ loc(a), pi --⟨a,φi⟩-->_i qi, and for all i, k ∈ loc(a), φi(k) = ⊥ or φi(k) = φk(k).
We extend the one-step transition relation ⟹ to words over Σ*. Then the product accepts a string x ∈ Σ* if there is a state q̂ ∈ F such that ⟨s1^0, s2^0, ..., sn^0⟩ ==x==> q̂. The language accepted by the system M̃ is then given as {x ∈ Σ* | x is accepted by M̂}, and is denoted L = L(M̃). The class of languages over Σ accepted by systems of AC-automata is denoted L(ACM_Σ̃). Formally, L(ACM_Σ̃) = {L ⊆ Σ* | there is a commit alphabet C̃ and an AC-system M̃ over Σ̃^C such that L = L(M̃)}. Since the product is a finite state automaton, we note the following simple fact.
Fact 2.4 L(ACM_Σ̃) is included in the set of regular languages over Σ.
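The commitment-matching condition in the product construction can be sketched as follows. The helper `compatible`, the use of `None` for the null assumption ⊥, and the dict encoding of the φ's are our own illustrative choices; the check itself mirrors the requirement that what any participant assumes about k must agree with k's own commitment, with ⊥ acting as "don't care".

```python
# Hedged sketch of the synchronization condition: each participant i
# in loc(a) fires a local transition on <a, phi_i>; the joint move is
# allowed iff every non-bottom assumption phi_i[k] equals k's own
# commitment phi_k[k].

BOT = None   # plays the role of the null assumption _|_

def compatible(phis):
    """phis: dict mapping each i in loc(a) to its function phi_i."""
    return all(
        phi_i[k] == phis[k][k]
        for i, phi_i in phis.items()
        for k in phi_i
        if phi_i[k] is not BOT
    )

# Agent 1 assumes agent 2 commits "y1"; agent 2 indeed commits "y1":
assert compatible({1: {1: "x1", 2: "y1"}, 2: {1: BOT, 2: "y1"}})
# Mismatch: agent 1 expects "y2" but agent 2 commits "y1":
assert not compatible({1: {1: "x1", 2: "y2"}, 2: {1: BOT, 2: "y1"}})
```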
Note that we have defined AC-automata to be nondeterministic, and the products are nondeterministic as well. Moreover, we even have systems where each automaton is deterministic while the product is not. For instance, consider the system of two automata where one has the transitions r0 --⟨a,(x,⊥)⟩--> r1 and r0 --⟨a,(⊥,⊥)⟩--> r2, and the other has the single transition s0 --⟨a,(⊥,x)⟩--> s1. Then the product will have two nondeterministic transitions on a from (r0, s0), to (r1, s1) and to (r2, s1). The following observation ensures that indeterminacy in products of deterministic automata can only arise in this way.
¹ We can formally show that the theorems in the paper go through even if we have only local sets of final states; but this makes the presentation messy, so we omit it.
Call an AC-automaton M = (Q, →, q0) complete if, whenever there is a transition q --⟨a,φ⟩--> q' in M, then for all i ∈ loc(a), φ(i) ≠ ⊥.
Observation 2.5 For a system of complete deterministic AC-automata M̃, the product automaton M̂ associated with M̃ is also deterministic.
Let L(cd-ACM_Σ̃) denote the set of all languages over Σ accepted by systems of complete deterministic AC-automata. We will show in the next section that this is the same as L(ACM_Σ̃), so the extra generality in the definitions is only for convenience and does not add expressiveness.
Fig. 1. AC-system for the language (abc + aabbc)*.
Σ1 = {a, c}, Σ2 = {b, c}.
We now present a simple example of these automata. Consider an AC-system of two agents which operate on the alphabets {a, c} and {b, c} respectively; a and b are local actions and c is a synchronization. Suppose we want the system behaviour in which the agents synchronize whenever each has done one local action, or when each has done two local actions. Simple products of automata would not suffice here, as they would allow a synchronization when one agent has done one local action while the other has done two, and such a stretch of behaviour is disallowed by the specification. A solution is presented in Figure 1, where the commit alphabet is ({x1, x2}, {y1, y2}). Here x1 is a commitment to early synchronization by agent 1 and x2 to late synchronization, and similarly for agent 2.
3 The languages
In this section we study the class of languages accepted by systems of AC-automata, which we have called L(ACM_Σ̃). Since the behaviour of these systems is given by products of automata, these languages are a kind of synchronized shuffle of languages. The only additional complication is to introduce assumptions and commitments and match them during the shuffle operation. For this section, fix a distributed alphabet Σ̃ = (Σ1, ..., Σn), a commit alphabet C̃, and hence the associated extended alphabet Σ̃^C over Loc.
Definition 3.1 Let xi ∈ Σi^C*, i ∈ {1, ..., n}.
1. We say that the tuple (x1, x2, ..., xn) is compatible iff for all i, j ∈ Loc:
(a) σ(xi)↾j = σ(xj)↾i;
(b) (xi↾j)↓j = (xj↾i)↓j.
2. Suppose (x1, x2, ..., xn) is compatible, and x ∈ Σ*. We say that x is generated by the tuple iff for all i ∈ Loc, x↾i = σ(xi).
3. Let Li ⊆ Σi^C*, i ∈ {1, ..., n}. We define the n-ary AC-shuffle of these languages by:
L1 ‖ L2 ‖ ··· ‖ Ln =def {x ∈ Σ* | x is generated by a compatible tuple (x1, x2, ..., xn) where for all i, xi ∈ Li}.
Let L(AC-Shuffle_Σ̃) denote the least class that includes the set {L ⊆ Σ* | for some commit alphabet C̃, there exist regular languages Li ⊆ Σi^C* such that L = L1 ‖ ··· ‖ Ln} and is closed under union.
We now show that the languages in AC-Shuffle are exactly the languages in L(ACM_Σ̃). Before that, we need some propositions.² For now, fix a system of AC-automata M̃ = (M1, M2, ..., Mn, F) over Σ̃^C, and its associated product M̂ = (Q̂, ⟹, ⟨s1^0, s2^0, ..., sn^0⟩, F). Let ⟹i be the natural extension to words over Σi^C* of the transition relation →i.
Proposition 3.2 Let ⟨r1, r2, ..., rn⟩ ==x==> ⟨q1, q2, ..., qn⟩ in M̂. Then, for all i ∈ Loc, there exist xi ∈ Σi^C* such that ri ==xi==>i qi, the tuple (x1, x2, ..., xn) is compatible, and x is generated by it.
Proposition 3.3 Suppose x ∈ Σ*. If for all i ∈ Loc there exist xi ∈ Σi^C* such that ri ==xi==>i qi, the tuple (x1, x2, ..., xn) is compatible, and x is generated by it, then ⟨r1, r2, ..., rn⟩ ==x==> ⟨q1, q2, ..., qn⟩ in M̂.
Lemma 3.4 Let M̃ be an AC-system and let Fi ⊆ Qi for all i. Let Li = L(Mi, Fi) and F = Π_{i∈Loc} Fi. Then L(M̃) = L1 ‖ L2 ‖ ··· ‖ Ln.
Proof. Let x ∈ L(M̃). Then there is an accepting path ρ: ⟨q1^0, q2^0, ..., qn^0⟩ ==x==> ⟨q1', q2', ..., qn'⟩ in M̂, where qi' ∈ Fi according to the assumption.
2 Proofs of Propositions 3.2 and 3.3 are given in Appendix 1
By Proposition 3.2, for all i ∈ Loc there exist xi ∈ Σi^C* such that si^0 ==xi==>i qi' and x is generated by the compatible tuple (x1, x2, ..., xn). Then xi ∈ Li and, by the definition of ‖, x ∈ L1 ‖ L2 ‖ ··· ‖ Ln.
Conversely, let x ∈ L1 ‖ L2 ‖ ··· ‖ Ln. Then, by the definition of ‖, there exist xi ∈ Li such that x is generated by the compatible tuple (x1, x2, ..., xn). So for all i ∈ Loc there is a qi' ∈ Fi such that si^0 ==xi==>i qi'. By Proposition 3.3, ⟨q1^0, q2^0, ..., qn^0⟩ ==x==> ⟨q1', q2', ..., qn'⟩. Since ⟨q1', q2', ..., qn'⟩ ∈ F by its construction, x ∈ L(M̃). □
The lemma at once gives the following theorem as its corollary:
Theorem 3.5 L(ACM_Σ̃) ⊆ L(AC-Shuffle_Σ̃).
To show the converse of the above theorem, we take recourse to Mazurkiewicz trace theory [DR95]. Fix the distributed alphabet Σ̃.
Definition 3.6 Define the relation ~ on Σ* as follows: for all x, y ∈ Σ*, x ~ y =def x↾i = y↾i for all i ∈ Loc.
It is easy to see that ~ is an equivalence. The equivalence classes of ~ are called traces. In trace theory, it is customary to present ~ in an alternative form: given Σ̃, we define an irreflexive and symmetric relation I ⊆ Σ × Σ as I = {(a, b) | loc(a) ∩ loc(b) = ∅}. I is called the independence relation.
Definition 3.7 Two words x and y are 1-trace equivalent, x ~t y, if there are words u, v ∈ Σ* and (a, b) ∈ I such that x = uabv and y = ubav. The trace equivalence ~t* is the reflexive transitive closure of ~t.
Since the relations ~ and ~t* can be shown to coincide, we will use the symbol ~ for both. M(Σ, I) = Σ*/~ is called the trace monoid. Let τ: Σ* → M(Σ, I) be the morphism such that τ(x) = [x]. The syntactic congruence ~T on M(Σ, I) is defined by: for all r, t ∈ M(Σ, I), t ~T r iff for all t1, t2 ∈ M(Σ, I), t1·t·t2 ∈ T ⟺ t1·r·t2 ∈ T.
Definition 3.8 A trace language T ⊆ M(Σ, I) is regular iff the syntactic congruence ~T is of finite index. Equivalently, T is regular iff τ^{-1}(T) is a regular subset of Σ*.
One can read the definition of regular trace languages as the regular languages that are closed under the equivalence relation ~. Note that the closure of a regular language under ~ need not be regular. For example, let L = (ab)*. If I = {(a, b), (b, a)}, then τ^{-1}(τ(L)) is the language of strings with equal numbers of a's and b's, which is not regular any more. Let RTL_Σ̃ denote the class of regular trace languages over Σ̃. The following proposition ensures that AC-automata accept regular trace languages.
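The closure of a word under ~ (repeatedly swapping adjacent independent letters, as in Definition 3.7) can be sketched as follows; `trace_closure` is our own illustrative helper, and the independence relation shown is the one from the (ab)* example above.

```python
# Hedged sketch of the ~-closure of a single word by exhaustive
# adjacent swaps of independent letters.  With a and b fully
# independent, the closure of "aabb" contains all interleavings of
# two a's and two b's, illustrating why closing (ab)* under ~ can
# leave the regular languages.

INDEP = {("a", "b"), ("b", "a")}

def trace_closure(word):
    seen, todo = {word}, [word]
    while todo:
        w = todo.pop()
        for i in range(len(w) - 1):
            if (w[i], w[i + 1]) in INDEP:
                v = w[:i] + w[i + 1] + w[i] + w[i + 2:]
                if v not in seen:
                    seen.add(v)
                    todo.append(v)
    return seen

assert trace_closure("ab") == {"ab", "ba"}
assert len(trace_closure("aabb")) == 6    # all C(4,2) interleavings
```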
Proposition 3.9 L(AC-Shuffle_Σ̃) ⊆ RTL_Σ̃.
Proof. From Lemma 3.4 we first see that L(AC-Shuffle_Σ̃) is indeed included in the set of regular languages over Σ. Next, we show that every language L in L(AC-Shuffle_Σ̃) is closed under ∼. It suffices to show that xbay ∈ L whenever xaby ∈ L, for x, y ∈ Σ*, a, b ∈ Σ and (a,b) ∈ I. By definition of AC-shuffle, xaby is generated by a compatible tuple (x₁, x₂, ..., x_n), with x_i ∈ L_i. But then xbay is also generated by (x₁, x₂, ..., x_n), since for all i ∈ Loc, xbay↾i = xaby↾i = σ(x_i). Hence, by definition, xbay ∈ L.

We now set out to show that every regular trace language over the distributed alphabet Σ̃ is accepted by an AC-system. For this, we first need Zielonka's theorem characterizing regular trace languages by asynchronous automata [Z87]. A Zielonka automaton on Σ̃ with n processes is a tuple A = (A₁, ..., A_n, Δ, F), where for every i ∈ Loc, A_i = (Q_i, Σ_i, Δ_i, s_i⁰) is the i-th automaton, Q = Π_{i∈Loc} Q_i is the state space of A, F ⊆ Q is the set of final states, and s⁰ = (s₁⁰, ..., s_n⁰) denotes the initial state of A. Δ = {δ_a | a ∈ Σ} is the next-state function, where δ_a : Π_{i∈loc(a)} Q_i → 2^{Π_{i∈loc(a)} Q_i}. A is deterministic if ∀a ∈ Σ, ∀s ∈ Π_{i∈loc(a)} Q_i, |δ_a(s)| ≤ 1.

The transition relation ⇒_A between any two global states (p₁, p₂, ..., p_n) and (q₁, q₂, ..., q_n) of A is defined as: (p₁, ..., p_n) ⇒_A^a (q₁, ..., q_n) iff (q_{i₁}, ..., q_{i_k}) ∈ δ_a(p_{i₁}, ..., p_{i_k}), where {i₁, ..., i_k} = loc(a), and p_j = q_j for all j ∉ loc(a). This transition among the global states is extended to words over Σ* in the natural way. The language accepted by A is defined as: L(A) = {x ∈ Σ* | ∃s ∈ F : s⁰ ⇒_A^x s}. An immediate corollary of the transition on global states is the following.

Proposition 3.10 If (a,b) ∈ I, then for all s, s' ∈ Q, s ⇒_A^{ab} s' iff s ⇒_A^{ba} s'. Consequently L(A) is closed under ∼.

The trace language accepted by A is defined as: T(A) = {t ∈ M(Σ̃, I) | ∀u ∈ t, ∃s ∈ F : s⁰ ⇒_A^u s}. Then from the above proposition, we get:

Corollary 1. ⋃T(A) = L(A).
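The global transition of an asynchronous automaton, and the diamond property of Proposition 3.10, can be illustrated in miniature. This is our own sketch, with a hypothetical two-process automaton; the encoding of loc and the local maps is ours:

```python
# An action a updates only the local states of the processes in loc(a);
# all other components of the global state are left untouched.

def global_step(state, a, loc, delta):
    """state: tuple of local states; loc[a]: processes of a; delta[a]: local map."""
    involved = tuple(state[i] for i in loc[a])
    new_local = delta[a][involved]          # deterministic: one successor
    out = list(state)
    for i, q in zip(loc[a], new_local):
        out[i] = q
    return tuple(out)

# Two processes; 'a' is local to process 0 and 'b' to process 1, so the
# pair (a, b) is independent and ab, ba reach the same global state.
loc = {"a": (0,), "b": (1,)}
delta = {"a": {(0,): (1,)}, "b": {(0,): (1,)}}
s0 = (0, 0)
s_ab = global_step(global_step(s0, "a", loc, delta), "b", loc, delta)
s_ba = global_step(global_step(s0, "b", loc, delta), "a", loc, delta)
assert s_ab == s_ba == (1, 1)   # Proposition 3.10 in miniature
```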
Theorem 3.11 (Zielonka)

1. For every Zielonka automaton A, the trace language T(A) ⊆ M(Σ̃, I) is regular.
2. For every regular trace language T ⊆ M(Σ̃, I) there is a deterministic Zielonka automaton A such that T = T(A).

We now show that every Zielonka automaton can also be presented as an AC-system of automata with the same behaviour.

Lemma 3.12 RTL_Σ̃ ⊆ L(ACM_Σ̃).
Proof. Let L ∈ RTL_Σ̃. Then by Zielonka's theorem, there is a deterministic Zielonka automaton A accepting L, say A = (A₁, ..., A_n, Δ, F), where for every i ∈ Loc, A_i = (Q_i, Σ_i, Δ_i, s_i⁰) is the i-th automaton. Consider the commit alphabet given by C_i = Q_i, for i ∈ Loc. Define a system of AC-machines over the associated extended alphabet as follows: M = (M₁, M₂, ..., M_n, F), where M_i = (Q_i, →_i, s_i⁰). The proof of the lemma follows easily from the following claim:

(p̄, a, q̄) ∈ Δ iff p̄ ⇒^a q̄ in M.

(⇒): Let (p̄, a, q̄) ∈ Δ. Then for all i ∉ loc(a), p_i = q_i, and by construction, for all i ∈ loc(a), p_i ⟨a,c⟩→_i q_i, where the commit letter c records c_j = p_j for all j ∈ loc(a). Then, by definition of transition in M, p̄ ⇒^a q̄.

(⇐): Let p̄ ⇒^a q̄. Then, by definition, for all i ∉ loc(a), p_i = q_i, and for all i ∈ loc(a), p_i ⟨a,c⟩→_i q_i, whence (p̄, a, q̄) ∈ Δ. Thus L ∈ L(ACM_Σ̃).
Thus, we have the following characterization theorem for the class of languages over Σ̃ accepted by systems of AC-automata.

Theorem 3.14 L(ACM_Σ̃) = L(AC-Shuffle_Σ̃) = RTL_Σ̃.

4 The syntax
We now present a syntax for the languages studied in the last section. Given that [O85] already provides a syntax for regular trace languages, this exercise may seem superfluous. However, the syntax of [O85] is global in nature, rather than a parallel composition of regular expressions. Given the intuition described in the introduction, namely that of describing systems of parallel programs which make assumptions and commitments about other programs in the system, looking for such a syntactic presentation seems well-motivated. Below, we write L₁·L₂ to denote language concatenation, and L* for language (Kleene) iteration. The syntax is given in two layers, one for 'local' expressions
and another for parallel composition. Fix a distributed alphabet Σ̃, a commit alphabet C̃, and the associated extended alphabet.

ACREG_i ::= ⟨a, c⟩ ∈ Σ_i^C | p + q | p ; q | p*
ACREG ::= r₁ ‖ r₂ ‖ ⋯ ‖ r_n, r_i ∈ ACREG_i | R₁ + R₂, R₁, R₂ ∈ ACREG

The semantics of these expressions is given as follows: for each i ∈ Loc, we have a map ⟦·⟧_i : ACREG_i → 2^{(Σ_i^C)*} and globally a map ⟦·⟧ : ACREG → 2^{Σ*}. These maps are defined by structural induction:

- ⟦⟨a, c⟩⟧_i = {⟨a, c⟩}. ⟦p + q⟧_i = ⟦p⟧_i ∪ ⟦q⟧_i. ⟦p ; q⟧_i = ⟦p⟧_i · ⟦q⟧_i. ⟦p*⟧_i = (⟦p⟧_i)*.
- ⟦r₁ ‖ r₂ ‖ ⋯ ‖ r_n⟧ = ⟦r₁⟧₁ ‖ ⟦r₂⟧₂ ‖ ⋯ ‖ ⟦r_n⟧_n. ⟦R₁ + R₂⟧ = ⟦R₁⟧ ∪ ⟦R₂⟧.

The class of languages generated by the ACREG expressions is denoted L(ACREG). Formally, L(ACREG) = {L ⊆ Σ* | for some commit alphabet C̃, there is an R ∈ ACREG over Σ^C such that L = ⟦R⟧}.

Lemma 4.1 L(ACM_Σ̃) ⊆ L(ACREG).
Proof. Let L ∈ L(ACM_Σ̃). Then there is a system M = (M₁, M₂, ..., M_n, F) such that L = L(M). Then L = ⋃_{f∈F} L(M_f), where M_f = (⟨M₁, ..., M_n⟩, {f}). Let f = (f₁, f₂, ..., f_n) and L_i = L(M_i, f_i). By Lemma 3.4, L(M_f) = L₁ ‖ L₂ ‖ ⋯ ‖ L_n. Since each L_i ⊆ (Σ_i^C)* is regular, there is an r_i ∈ ACREG_i such that ⟦r_i⟧_i = L_i. Let X_f = r₁ ‖ r₂ ‖ ⋯ ‖ r_n. Then, by the last observation, ⟦X_f⟧ = L(M_f). Now let X = X_{f₁} + X_{f₂} + ⋯ + X_{f_k}, where F = {f₁, f₂, ..., f_k}. X ∈ ACREG and it is obvious that L = ⟦X⟧.

Lemma 4.2 L(ACREG) ⊆ RTL_Σ̃.
Proof. Suppose L ∈ L(ACREG). Then there is a commit alphabet C̃ and an ACREG expression R over it such that L = ⟦R⟧. The proof is by induction on the number m of +'s in the ACREG expression R. For the base case, when m = 0, R = r₁ ‖ r₂ ‖ ⋯ ‖ r_n and ⟦R⟧ = ‖ L_i, where L_i = ⟦r_i⟧_i for all i ∈ Loc. Since r_i ∈ ACREG_i, each L_i is regular. Then there exist a deterministic finite automaton M_i over Σ_i^C and a set F_i ⊆ Q_i such that L_i = L(M_i, F_i). Let M = (M₁, M₂, ..., M_n, Π_{i∈Loc} F_i) be an AC-system. By Lemma 3.4, ‖ L_i = L(M). Hence, by Theorem 3.14, ⟦R⟧ ∈ RTL_Σ̃. The induction step follows routinely from the observation that the class of regular trace languages is closed under union.
Thus, we have our version of Kleene's theorem:

Theorem 4.3 L(ACREG) = L(ACM_Σ̃).

5 Infinite behaviours
We now present AC-systems whose behaviours are given by infinite words. For lack of space, we merely define the systems and mention the results here. The details will be given in the full paper.

Formally, fix a distributed alphabet Σ̃ = (Σ₁, ..., Σ_n), a commit alphabet C̃ = (C₁, ..., C_n) and the associated extended alphabet Σ^C. An ωAC-system is a tuple M = (⟨M₁, M₂, ..., M_n⟩, T), where the M_i are AC-automata over Σ^C, i ∈ Loc, and T ⊆ Π_{i∈Loc} 2^{Q_i}. Let M̂ be the product of the local AC-machines M_i. The behaviour of the system M, denoted L_ω(M), is defined as the subset of Σ^ω accepted by the product M̂ with the acceptance table T. The automaton M̂ accepts a string x = a₀a₁⋯ ∈ Σ^ω if there exists an infinite run ρ = s⁰s¹⋯ of the product system and a tuple U = (U₁, ..., U_n) ∈ T such that for all i ∈ Loc, Inf_i(ρ) = U_i, where Inf_i(ρ) = {q | s_i^j = q for infinitely many j}. Then L_ω(M) = {x ∈ Σ^ω | M̂ accepts x}. Thus, L(ωACM_Σ̃) = {L ⊆ Σ^ω | there exists a commit alphabet C̃ and an ωAC-system M over Σ^C such that L = L_ω(M)}.

Note that we can extend all the earlier definitions (of ↾i, σ, ‖, etc.) to infinite strings, and define L₁ ‖ ⋯ ‖ L_n as before, where L_i ⊆ (Σ_i^C)^ω. We can again show that such ωAC-shuffle languages exactly capture the behaviour of ωAC-systems³, but the detour is now taken via Müller asynchronous automata, introduced by [DM93], and the proof uses results from [GP92].

³ Here again the main results follow from two propositions that are analogous to Propositions 3.2 and 3.3. We give their proof sketches in Appendix 2.

The syntax is a smooth generalization, mirroring the way we construct ω-regular expressions. We now have three layers:

ACREG_i ::= ⟨a, c⟩ ∈ Σ_i^C | p + q | p ; q | p*
ωACREG_i ::= p ; q^ω | r + s, r, s ∈ ωACREG_i, p, q ∈ ACREG_i
ωACREG ::= R₁ ‖ R₂ ‖ ⋯ ‖ R_n, R_i ∈ ωACREG_i | X₁ + X₂, X_i ∈ ωACREG

The semantics of these expressions is defined in the obvious way, and again we can show a Kleene theorem: these expressions exactly characterize the class of ω-regular languages accepted by ωAC-systems.

Example: We conclude the paper with an example of a simple two-process mutual exclusion problem that can be captured naturally by the ACAs. For simplicity, we abstract away the internal computational states and assume that the processors are always requesting for, or executing in, the critical section. We model this as follows. Each of the processes can either be in state w (waiting to
enter the critical section) or in state c (having entered the critical section). In order to gain access to the critical section from the wait state, the processes do a joint action c. The commitment alphabet is ⟨{p1, np1}, {p2, np2}⟩, where pi, i = 1, 2, denotes that process i is permitted, and npi that it is not permitted, to enter the critical section. The design of process 1 is then as follows: when process 1 is in state w1, it stays in the same state if it is not permitted entry to the critical section. When it is permitted entry, assuming that process 2 is not permitted entry, it can go to the state c1, denoting access to the critical section. Process 2 is designed in a symmetric way. Figure 2 shows the two processes and also the product giving the global behaviour. It is clear that at no point can both processes be in the critical section, thus satisfying the safety requirement. A Müller acceptance table {{(w1, w2), (c1, w2), (w1, c2)}} ensures the liveness of both processes.

Acknowledgment: We thank the anonymous referees for their valuable suggestions.

Fig. 2. ACA for two-processor mutual exclusion
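The safety claim for this example can be checked by enumerating the product's reachable global states. The transition table below is our own abstraction of the informal description (we also let a process leave the critical section back to waiting), not the paper's automaton:

```python
# Global states are pairs over {"w", "c"}: (state of process 1, state of
# process 2). From (w, w) exactly one process may be granted entry; a
# process in the critical section eventually returns to waiting.
transitions = {
    ("w", "w"): [("c", "w"), ("w", "c")],   # one process granted entry
    ("c", "w"): [("w", "w")],               # process 1 leaves the section
    ("w", "c"): [("w", "w")],               # process 2 leaves the section
}

reachable, frontier = {("w", "w")}, [("w", "w")]
while frontier:
    s = frontier.pop()
    for t in transitions.get(s, []):
        if t not in reachable:
            reachable.add(t)
            frontier.append(t)

assert ("c", "c") not in reachable   # safety: never both critical
assert reachable == {("w", "w"), ("c", "w"), ("w", "c")}
```

The three reachable states are exactly the entries of the Müller acceptance table given in the text.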
References
[AH95] Alur, R., and Henzinger, T., "Local liveness for compositional modelling of fair reactive systems", LNCS 939, 1995, 166-179.
[CMZ93] Cori, R., Metivier, Y., and Zielonka, W., "Asynchronous mappings and asynchronous cellular automata", Information and Computation, vol 106, 1993, 159-202.
[DM93] Diekert, V., and Muscholl, A., "Deterministic asynchronous automata for infinite traces", LNCS 665, 1993, 617-628.
[DR95] Diekert, V., and Rozenberg, G., The Book of Traces, World Scientific Press, 1995.
[FHMV95] Fagin, R., Halpern, J., Moses, Y., and Vardi, M., Reasoning about Knowledge, M.I.T. Press, 1995.
[FP78] Francez, N., and Pnueli, A., "A proof method for cyclic programs", Acta Inf., vol 9, 1978, 138-158.
[GP92] Gastin, P., and Petit, A., "Asynchronous cellular automata for infinite traces", LNCS 627, 1992, 583-594.
[H78] Hoare, C.A.R., "Communicating sequential processes", Comm. ACM, vol 21, 1978, 666-677.
[H85] Hoare, C.A.R., Communicating Sequential Processes, Prentice Hall, 1985.
[J83] Jones, C.B., "Specification and design of (parallel) programs", Proc. IFIP 83, 1983, 321-331.
[MC81] Misra, J., and Chandy, M., "Proofs of networks of processes", IEEE Trans. on Soft. Engg., vol 7, 1981, 417-426.
[O85] Ochmanski, E., "Regular behaviour of concurrent systems", Bulletin of the EATCS, vol 27, 1985, 56-67.
[PJ91] Pandya, P.K., and Joseph, M., "P-A logic: a compositional proof system for distributed programs", Distributed Computing, vol 5, 1991, 37-54.
[R96] Ramanujam, R., "Locally linear time temporal logic", Proc. IEEE LICS, New Jersey, 1996, 118-127.
[Z87] Zielonka, W., "Notes on finite asynchronous automata", RAIRO-Inf. Theor. et Appli., vol 21, 1987, 99-135.
[Z89] Zwiers, J., Compositionality, Concurrency and Partial Correctness, Springer LNCS 321, 1989.
Appendix 1

Here we prove Proposition 3.2. Proposition 3.3 is proved similarly. Fix a system of AC-automata M = (M₁, M₂, ..., M_n, F) over Σ^C, and its associated product M̂ = (Q, ⇒, ⟨s₁⁰, s₂⁰, ..., s_n⁰⟩, F).

Proposition 3.2 Let ⟨r₁, r₂, ..., r_n⟩ ⇒^x ⟨q₁, q₂, ..., q_n⟩ in M̂. Then for all i ∈ Loc, there exist x_i ∈ (Σ_i^C)* such that r_i →_i^{x_i} q_i, the tuple (x₁, x₂, ..., x_n) is compatible, and x is generated by it.
Proof. The proof is by induction on the length of x. The base case, when x = ε, is trivial, as ε is generated by (ε, ε, ..., ε). For the induction step, let x = ya. Let ⟨r₁, r₂, ..., r_n⟩ ⇒^y ⟨p₁, p₂, ..., p_n⟩ ⇒^a ⟨q₁, q₂, ..., q_n⟩ be a path in M̂. By the induction hypothesis (IH), there exist y_i ∈ (Σ_i^C)* for all i such that r_i →_i^{y_i} p_i and y is generated by (y₁, y₂, ..., y_n). By the definition of transition in M̂ there exists a set of transitions p_i ⟨a,c⟩→_i q_i for all i ∈ loc(a) such that the commit letters of all j, k ∈ loc(a) agree. We need to define the compatible tuple (x₁, x₂, ..., x_n) and show that x is generated by it. If i ∉ loc(a), define x_i to be y_i, and if i ∈ loc(a), define x_i = y_i · ⟨a, c⟩. By IH and the observation on i-transitions above, r_i →_i^{x_i} q_i for all i ∈ Loc. We now show that x↾i = σ(x_i) for all i ∈ Loc. Suppose i ∉ loc(a). Then x↾i = (ya)↾i = y↾i. Using IH, we get x↾i = σ(y_i) = σ(x_i). On
the other hand, if i ∈ loc(a), x↾i = y↾i · a. Using IH, we get x↾i = σ(y_i) · a = σ(y_i · ⟨a, c⟩) = σ(x_i). Thus, we only need to show that the tuple (x₁, x₂, ..., x_n) is compatible, for which we have already proved one condition. For i, j ∈ Loc, σ(x_i↾j) = (σ(x_i))↾j = (x↾i)↾j = (x↾j)↾i = (σ(x_j))↾i = σ(x_j↾i). We now show the other condition for compatibility. Suppose i ∉ loc(a), and j ∈ Loc.

1. (x_i↾j)↓j = (y_i↾j)↓j, by construction of x_i.
2. (y_i↾j)↓j ⪯_j (y_j↾i)↓j, by IH.
3. (y_j↾i)↓j = (x_j↾i)↓j, again by construction.
4. (x_i↾j)↓j ⪯_j (x_j↾i)↓j, from 1, 2 and 3.
Now consider the case when i ∈ loc(a), and j ∈ Loc. If j ∉ loc(a), IH gives the result, so further assume that j ∈ loc(a).

1. (x_i↾j)↓j = ((y_i · ⟨a, c⟩)↾j)↓j, by construction of x_i.
2. ((y_i · ⟨a, c⟩)↾j)↓j = (y_i↾j)↓j · c_j, from the definition of ↾ and ↓.
3. (y_i↾j)↓j ⪯_j (y_j↾i)↓j, by IH.
4. (y_i↾j)↓j · c_j ⪯_j (y_j↾i)↓j · c_j, from 3.
5. (y_i↾j)↓j · c_j ⪯_j (x_j↾i)↓j, from 1, 2, 3, 4 and the construction of x_j.
Thus we have (x_i↾j)↓j ⪯_j (x_j↾i)↓j for all i, j ∈ Loc, as required.

Appendix 2
The two most crucial propositions in the infinite case are analogous to Propositions 3.2 and 3.3. Here we give rough sketches of their proofs. Let α be a finite run of any machine M. Then define last(α, M) as the last state of α and first(α, M) as the first state of α. Let ρ be a finite (or infinite) path in M̂. Define states_i(ρ) = {q | s_i = q for some s̄ visited in ρ}.

Proposition 5.1 Let ρ be an infinite run on some x ∈ Σ^ω in M̂ such that Inf_i(ρ) = U_i. Then for all i ∈ Loc, there exist runs ρ_i in M_i on x_i ∈ (Σ_i^C)^ω such that Inf(ρ_i) = U_i and x is generated by (x₁, x₂, ..., x_n).

Proof. After some finite point, say N, only the states from the U_i's occur in ρ. So ρ can be written as ρ = ρ⁰ρ¹ρ²⋯, where |ρ⁰| = N and each ρ^k, k ≥ 1, is a finite path on some x^k ∈ Σ* with states_i(ρ^k) = U_i for all i. By Proposition 3.2, let each x^k be generated by a compatible tuple (x₁^k, x₂^k, ..., x_n^k). Let x_j = x_j¹ x_j² x_j³ ⋯ for j ∈ Loc. Each x_j^k has a finite path ρ_j^k in M_j. Also, last(ρ_j^k, M_j) = first(ρ_j^{k+1}, M_j) for all k ≥ 0. Hence, each x_j has an infinite path ρ_j in M_j and Inf(ρ_j) = U_j. We verify the following as well. Let i, j ∈ Loc.

1. σ(x_i↾j) = σ(x_i¹↾j) σ(x_i²↾j) ⋯ = σ(x_j¹↾i) σ(x_j²↾i) ⋯ = σ(x_j↾i).
2. (x_i↾j)↓j = (x_i¹↾j)↓j · (x_i²↾j)↓j ⋯ ⪯_j (x_j¹↾i)↓j · (x_j²↾i)↓j ⋯ = (x_j↾i)↓j.
3. x↾i = x¹↾i · x²↾i ⋯ = σ(x_i¹) σ(x_i²) ⋯ = σ(x_i).

Hence, (x₁, x₂, ..., x_n) is compatible and generates x.

Proposition 5.2 Suppose x ∈ Σ^ω and for all i ∈ Loc, there exists a run ρ_i on some x_i ∈ (Σ_i^C)^ω in M_i such that Inf(ρ_i) = U_i and x is generated by a compatible tuple (x₁, x₂, ..., x_n). Then there is an infinite path ρ in M̂ for x such that Inf_i(ρ) = U_i for all i ∈ Loc.

Proof. The following claim is easy to verify.
Claim 1 Let x be generated by (x₁, x₂, ..., x_n). Let x' be a finite prefix of x and let m_i = |x'↾i|. Then x' is generated by (x₁[m₁], x₂[m₂], ..., x_n[m_n]), where x_i[m_i] denotes the prefix of x_i of length m_i.
Using the previous claim, write x = x¹x²⋯ such that each x^j is generated by (x₁^j, x₂^j, ..., x_n^j) and, for all i, states_i(ρ_i^j) = U_i, where ρ_i^j is the finite path in M_i for the finite substring x_i^j. By Proposition 3.3, for every x^j there is a finite path ρ^j in M̂, and last(ρ^j, M̂) = first(ρ^{j+1}, M̂). Hence the infinite path ρ = ρ¹ρ²⋯ for x = x¹x²⋯ is in M̂. By the construction of the ρ^j's, it follows that for all i, Inf_i(ρ) = U_i.
Compositional Design of Multitolerant Repetitive Byzantine Agreement¹

Sandeep S. Kulkarni    Anish Arora

Department of Computer and Information Science
The Ohio State University
Columbus, OH 43210 USA

Abstract

We illustrate in this paper a compositional and stepwise method for designing programs that offer a potentially unique tolerance to each of their fault-classes. More specifically, our illustration is a design of a repetitive agreement program that offers two tolerances: (a) it masks the effects of Byzantine failures and (b) it is stabilizing in the presence of transient and Byzantine failures.

1 Introduction
Dependable systems are required to acceptably satisfy their specification in the presence of faults, security intrusions, safety hazards, configuration changes, load variations, etc. Designing dependable systems is difficult, essentially because some desired dependability property, say, availability in the presence of faults, may interfere with some other desired dependability property, say, security in the presence of intrusions. As an example, in electronic commerce systems, design by replication facilitates availability but complicates security: replicas can be used to deal with faults which lose electronic currency, but they can also make it easier for intruders to double spend the money.

To effectively formulate multiple dependability properties, we have proposed the concept of multitolerance [1]: Each source of undependability is formulated as a "fault-class" and each dependability property in the presence of that fault-class is formulated as a type of "tolerance". Thus, multitolerance refers to the ability of a system to tolerate multiple classes of faults, each in a possibly different way.

To design multitolerant systems, we recommend a component-based method [1]: Our method, explained briefly, starts with an intolerant system and adds a set of components, one for each desired type of tolerance. Thus, the complexity of multitolerant system design is reduced to that of designing the components and of correctly adding them to the intolerant system. And, the complexity of reasoning about the interference between different tolerance properties is often reduced to considering only the relevant components, as opposed to involving the whole system.

¹ Research supported in part by NSF Grant CCR-93-08640, NSA Grant MDA904-96-11011 and OSU Grant 221506. Email: {kulkarni,anish}@cis.ohio-state.edu; Web: http://www.cis.ohio-state.edu/~{kulkarni,anish}.
To further simplify the complexity of adding multiple components to the system, the method observes the principle of stepwise refinement: in the first step an intolerant program is designed; in each successive step, a component is added to the system resulting from the previous step, to offer a desired tolerance to a previously unconsidered fault-class, while preserving the tolerances to the previously considered fault-classes.

The basic idea of transforming an intolerant program into one that possesses the required tolerance properties is indeed well understood and practiced; see for example [2, 3]. In designing multitolerant programs, however, we have to deal with the additional complexity of multiple fault-classes: with respect to each fault-class, each component needs to behave in some desirable manner. To handle this complication in our compositional and stepwise method, while designing and adding a component in each step, we focus attention on the following strategy.

(1) How to ensure that the added component will offer a desired tolerance to the system in the presence of faults in the fault-class being considered?

(2) How to ensure that execution of the component will not interfere with the correct execution of the system in the absence of faults in all fault-classes being considered?

(3) How to ensure that execution of the component will not interfere with the tolerance of the system corresponding to a previously considered fault-class in the presence of faults in that previously considered fault-class?
In this paper, we present a case study illustrating these three issues in the context of a compositional and stepwise design of a repetitive agreement program that offers two tolerances: (a) it masks the effects of Byzantine failures, by which we mean that it continues to satisfy the specification of repetitive agreement in the presence of Byzantine faults, and (b) it is stabilizing in the presence of transient and Byzantine faults, by which we mean that upon starting from an arbitrary state (which may result from the occurrence of these faults) it eventually reaches a state from where it satisfies the specification of repetitive agreement. The resulting multitolerant program is, to the best of our knowledge, the first program for repetitive agreement that is both masking tolerant and stabilizing tolerant. (Previous designs are only masking tolerant, e.g. [4], or only nonmasking tolerant, e.g. [5, 6], but none is multitolerant. In fact, we are not aware of any previous design that is stabilizing tolerant.)

The rest of the paper is organized as follows. In Section 2, we specify the problem of repetitive agreement. In Section 3, we identify a simple fault-intolerant program for solving the problem. In Section 4, we add a component to the program for masking the effect of Byzantine failure. In Section 5, we add another component for stabilizing from transient and Byzantine failures, while preserving the masking tolerance to Byzantine failures alone. We present an extension of our program in Section 6 and discuss its multitolerance-preserving refinement in Section 7. We comment on the general aspects of our method in Section 8, and make concluding remarks in Section 9.
2 Problem Statement: Repetitive Agreement
A system consists of a set of processes, including a "general" process, g. Each computation of the system consists of an infinite sequence of rounds; in each round, the general chooses a binary decision value d.g and, depending upon this value, all other processes output a binary decision value of their own. The system is subject to two fault-classes: The first one permanently and undetectably corrupts some processes to be Byzantine, in the following sense: each Byzantine process follows the program skeleton of its non-Byzantine version, i.e., it sends messages and performs output of the appropriate type whenever required by its non-Byzantine version, but the data sent in the messages and the output may be arbitrary. The second one transiently and undetectably corrupts the state of the processes in an arbitrary manner and possibly also permanently corrupts some processes to be Byzantine. (Note that, if need be, the model of a Byzantine process can be readily weakened to handle the case when the Byzantine process does not send its messages or perform its output, by detecting their absence and generating arbitrary messages or output in response.) The problem. In the absence of faults, repetitive agreement requires that each round in the system computation satisfies Validity and Agreement, defined below.
* Validity: If g is non-Byzantine, the decision value output by every non-Byzantine process is identical to d.g.
* Agreement: Even if g is Byzantine, the decision values output by all non-Byzantine processes are identical.

Masking tolerance. In the presence of the faults in the first fault-class, i.e., Byzantine faults, repetitive agreement requires that each round in the system computation satisfies Validity and Agreement.

Stabilizing tolerance. In the presence of the faults in the second fault-class, i.e., transient and Byzantine faults, repetitive agreement requires that eventually each round in the system computation satisfies Validity and Agreement. In other words, upon starting from an arbitrary state (which may be reached if transient and Byzantine failures occur), eventually a state must be reached in the system computation from where every future round satisfies Validity and Agreement.

Before proceeding to compositionally design a masking as well as stabilizing tolerant repetitive agreement program, let us recall the well-known fact that for repetitive agreement to be masking tolerant it is both necessary and sufficient for the system to have at least 3t + 1 processes, where t is the total number of Byzantine processes [4]. Therefore, for ease of exposition, we will initially restrict our attention, in Sections 3-5, to the special case where the total number of processes in the system (including g) is 4 and, hence, t is 1. In other words, the Byzantine failure fault-class may corrupt at most one of the four processes. Later, in Section 6, we will extend our multitolerant program for the case where t may exceed 1.
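As a reading aid (ours, not the paper's formalism), Validity and Agreement for a single round can be phrased as predicates over one round's outputs, given which processes are Byzantine:

```python
def validity(d_g, outputs, byz, g_byz):
    """Validity: if g is non-Byzantine, every non-Byzantine output equals d.g."""
    return g_byz or all(o == d_g for j, o in outputs.items() if j not in byz)

def agreement(outputs, byz):
    """Agreement: all non-Byzantine outputs are identical."""
    vals = {o for j, o in outputs.items() if j not in byz}
    return len(vals) <= 1

# Outputs of three non-general processes; process 3 is Byzantine and deviates.
outs = {1: 0, 2: 0, 3: 1}
assert agreement(outs, byz={3})          # holds over the correct processes
assert not agreement(outs, byz=set())    # fails if no process were faulty
assert validity(0, outs, byz={3}, g_byz=False)
```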
Programming notation. Each system process will be represented by a set of "variables" and a finite set of "actions". Each variable ranges over a predefined nonempty domain. Each action has a unique name and is of the form:

⟨name⟩ :: ⟨guard⟩ ---> ⟨statement⟩

The guard of each action is a boolean expression over the variables of that process and possibly other processes. The execution of the statement of each action atomically and instantaneously updates the values of zero or more variables of that process, possibly based on the values of the variables of that and other processes. For convenience in specifying an action as a restriction of another action, we will use the notation ⟨name′⟩ :: ⟨guard′⟩ ∧ ⟨name⟩ to define an action ⟨name′⟩ whose guard is obtained by restricting the guard of action ⟨name⟩ with ⟨guard′⟩, and whose statement is identical to the statement of action ⟨name⟩. Operationally speaking, ⟨name′⟩ is executed only if the guard of ⟨name⟩ and the guard ⟨guard′⟩ are both true.

Let S be a system. A "state" of S is defined by a value for each variable in the processes of S, chosen from the domain of the variable. A state predicate of S is a boolean expression over the variables in the processes of S. An action of S is "enabled" in a state iff its guard (a state predicate) evaluates to true in that state. Each computation of S is assumed to be a fair sequence of steps: in every step, an action in a process of S that is enabled in the current state is chosen and the statement of the action is executed atomically. Fairness of the sequence means that each action in a process in S that is continuously enabled along the states in the sequence is eventually chosen for execution.
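The execution model just described can be sketched as a tiny interpreter. This is our illustration with illustrative names, not the authors' formalism; a genuinely fair scheduler is more subtle than the random choice used here:

```python
import random

class Action:
    """An action: a name, a guard (state -> bool), and a statement
    (state -> None) that updates the state in place, atomically."""
    def __init__(self, name, guard, statement):
        self.name = name
        self.guard = guard
        self.statement = statement

def step(state, actions, rng):
    """Execute one enabled action; return its name, or None if deadlocked."""
    enabled = [a for a in actions if a.guard(state)]
    if not enabled:
        return None
    a = rng.choice(enabled)   # random choice is not fair in general
    a.statement(state)
    return a.name

# Example: a one-variable toggle system where exactly one action is enabled
# in every state, so the computation strictly alternates the two actions.
state = {"x": 0}
actions = [
    Action("set",   lambda s: s["x"] == 0, lambda s: s.update(x=1)),
    Action("reset", lambda s: s["x"] == 1, lambda s: s.update(x=0)),
]
rng = random.Random(0)
names = [step(state, actions, rng) for _ in range(4)]
assert names == ["set", "reset", "set", "reset"]
```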
3 Designing an Intolerant Program
The following simple program suffices in the absence of faults: In each round, the general sends its new d.g value to all other processes. When a process receives this d.g value, it outputs that value and sends an acknowledgment to the general. After the general receives acknowledgments from all the other processes, it starts the next round, which repeats the same procedure. We let each process j maintain a variable d.j, denoting the decision of j, that is set to ⊥ when j has not yet copied the decision of the general. Also, we let j maintain a sequence number sn.j, sn.j ∈ {0..1}, to distinguish between successive rounds. (In Section 7, we consider the case where the sequence numbers are from the set {0..K−1} where K ≥ 2.)

The general process. The general executes only one action, RG1: when the sequence numbers of all processes become identical, the general starts a new round by choosing a new value for d.g and incrementing its sequence number, sn.g. Thus, letting ⊕ denote addition modulo 2, the action of the general is:

RG1 :: (∀k :: sn.k = sn.g) ---> d.g, sn.g := new-decision(), sn.g ⊕ 1
173
The non-general processes. Each other process j executes two actions: The first action, RO1, is executed after the general has started a new round, in which case j copies the decision of the general. It then executes its second action, RO2, which outputs its decision, increments its sequence number to denote that it is ready to participate in the next round, and resets its decision to ⊥ to denote that it has not yet copied the decision of the general in that round. Thus, the two actions of j are:

RO1 :: d.j = ⊥ ∧ (sn.j ⊕ 1 = sn.g) ---> d.j := d.g
RO2 :: d.j ≠ ⊥ ---> { output d.j }; d.j, sn.j := ⊥, sn.j ⊕ 1
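Program R can be simulated round by round. The following sketch (ours, not the authors' code) collapses RG1, RO1 and RO2 into a sequential loop for three non-general processes and checks Validity and Agreement in the absence of faults:

```python
import random

def run_R(rounds, n=3, seed=0):
    """Simulate the fault-intolerant program R for the given number of rounds."""
    rng = random.Random(seed)
    d_g, sn_g = None, 0
    d = [None] * n            # d.j = None models d.j = bottom
    sn = [0] * n
    outputs = []
    for _ in range(rounds):
        # RG1: all sequence numbers equal, so the general starts a new round.
        assert all(s == sn_g for s in sn)
        d_g, sn_g = rng.randint(0, 1), (sn_g + 1) % 2
        round_out = []
        for j in range(n):
            # RO1: j copies the general's decision.
            assert d[j] is None and (sn[j] + 1) % 2 == sn_g
            d[j] = d_g
            # RO2: j outputs, resets its decision, and advances its round.
            round_out.append(d[j])
            d[j], sn[j] = None, (sn[j] + 1) % 2
        outputs.append((d_g, round_out))
    return outputs

# In the absence of faults every process outputs d.g in every round.
for d_g, outs in run_R(5):
    assert all(o == d_g for o in outs)
```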
The correctness proof of R is straightforward. (The interested reader will find the proof in [7].)

4 Adding Masking Tolerance to Byzantine Faults
Program R is neither masking tolerant nor stabilizing tolerant to Byzantine failure. In particular, R may violate Agreement if the general becomes Byzantine and sends different values to the other processes. Note, however, that since these values are binary, at least two of them are identical. Therefore, for R to mask the Byzantine failure of any one process, it suffices to add a "masking" component to R that restricts action RO2 in such a way that each non-general process only outputs a decision that is the majority of the values received by the non-general processes.

For the masking component to compute the majority, it suffices that each non-general process obtain the values received by the other non-general processes. Based on these values, each process can correct its decision value to that of the majority. We associate with each process j an auxiliary boolean variable b.j that is true iff j is Byzantine. For each process k (including j itself), we let j maintain a local copy of d.k in D.j.k. Hence, the decision value of the majority can be computed over the set of D.j.k values for all k. To determine whether a value D.j.k is from the current round or from the previous round, j also maintains a local copy of the sequence number of k in SN.j.k, which is updated whenever D.j.k is.
The general process. To capture the effect of Byzantine failure, one action, MRG2, is added to the original action RG1 (which we rename as MRG1): MRG2 lets g change its decision value arbitrarily and is executed only if g is Byzantine. Thus, the actions for g are:

MRG1 :: RG1
MRG2 :: b.g ---> d.g := 0|1
The non-general processes. We add the masking component "between" the actions RO1 and RO2 at j to get the five actions MRO1-5: MRO1 is identical to RO1. MRO2 is executed after j receives a decision value from g, to set D.j.j to d.j, provided that all other processes had obtained a copy of D.j.j in the previous round. MRO3 is executed after another process k has obtained
174
a decision value for the new round, to set D.j.k to d.k. MRO4 is executed if j needs to correct its decision value to the majority of the decision values of its neighbors in the current round. MRO5 is a restricted version of action RO2 that allows j to perform its output if its decision value is that of the majority. Together, the actions MRO2-4 and the restriction to action RO2 in MRO5 define the masking component. To model Byzantine execution of j, we introduce action MRO6 that is executed only if b.j is true: MRO6 lets j change D.j.j and, thereby, affect the value read by process k when k executes MRO3. MRO6 also lets j obtain arbitrary values for D.j.k and, thereby, affect the value of d.j when j executes MRO4. Thus, the six actions of MRO are as follows:

MRO1 :: RO1
MRO2 :: ¬b.j ∧ d.j ≠ ⊥ ∧ (sn.j ≠ SN.j.j) ∧ compl.j ---> D.j.j, SN.j.j := d.j, sn.j
MRO3 :: SN.j.k ⊕ 1 = SN.k.k ---> D.j.k, SN.j.k := D.k.k, SN.k.k
MRO4 :: d.j ≠ ⊥ ∧ majdefined.j ∧ d.j ≠ maj.j ---> d.j := maj.j
MRO5 :: d.j ≠ ⊥ ∧ majdefined.j ∧ d.j = maj.j ---> output_decision(d.j); d.j, sn.j := ⊥, sn.j ⊕ 1
MRO6 :: b.j ---> D.j.j := 0|1; (‖ k : SN.j.k ⊕ 1 = SN.k.k : D.j.k, SN.j.k := 0|1, SN.k.k)

where
compl.j = (∀k :: SN.j.j = SN.k.j)
majdefined.j = compl.j ∧ (∀k :: SN.j.j = SN.j.k)
maj.j = (majority k :: D.j.k)
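The core of the masking argument, that a binary majority over the exchanged values yields agreement even when the general is Byzantine, can be checked exhaustively. This sketch is ours and abstracts away the D.j.k and SN.j.k bookkeeping:

```python
from itertools import product

def mask_round(received):
    """Each non-general process outputs the majority of all received values."""
    maj = 1 if sum(received) * 2 > len(received) else 0
    return [maj for _ in received]   # every correct process computes the same majority

# A Byzantine general may send any combination of bits to the three others;
# with binary values and three receivers, a strict majority always exists.
for received in product([0, 1], repeat=3):
    outs = mask_round(list(received))
    assert len(set(outs)) == 1                      # Agreement

# A non-Byzantine general sends the same d.g to everyone.
for d_g in (0, 1):
    assert mask_round([d_g] * 3) == [d_g] * 3       # Validity
```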
Fault Actions. If the number of Byzantine processes is less than 1, the fault actions make some process Byzantine. Thus, letting l and m range over all processes, the fault actions are:

|{l : b.l}| < 1 ---> b.m := true
Proof of correctness. In accordance with the design issues discussed in the introduction, this proof consists of two parts: (1) the masking component offers masking tolerance to R in the presence of Byzantine faults, and (2) the masking component does not interfere with R in the absence of faults.

(1) For each round of the system computation, let v.j denote the value obtained by j in that round when it executes RO1, and let cordec be defined as follows.

cordec = d.g                     if ¬b.g
cordec = (majority j :: v.j)     otherwise
Observe that in the start state of the round (where the sequence numbers of all processes are identical, i.e. (∀j, k :: sn.j = SN.j.k = sn.g), and no non-Byzantine process has read the decision of g, i.e. (∀j : ¬b.j : d.j = ⊥)) only action RG1 in g can be executed. Thereafter, the only action enabled at each non-Byzantine process j is RO1. After j executes RO1, j can only execute its masking component. Moreover, j cannot execute RO2 until the masking component in j terminates in that round.
Thus, the masking component in j executes between the actions RO1 and RO2 in j. The masking component in j first executes action MRO2 to increment SN.j.j. By the same token, the masking component in k increments SN.k.k. Subsequently, the masking component in j can execute MRO3, to update SN.j.k and D.j.k. Likewise, each k can execute MRO3 to update SN.k.j and D.k.j. Note that if k is non-Byzantine, D.j.k is the same as v.k, which in turn is equal to d.g if g is also non-Byzantine. It follows that eventually majdefined.j ∧ maj.j = cordec holds, and the masking component in j can subsequently ensure that d.j = maj.j before it terminates in that round. After the masking component in j terminates, j can only execute action RO2. It follows that, in the presence of a Byzantine fault, each round of the system computation satisfies Validity and Agreement.

(2) Observe that, in the absence of a Byzantine fault, the masking component eventually satisfies majdefined.j ∧ d.j = maj.j in each round and then terminates. Therefore, the masking component does not interfere with R in the absence of a Byzantine fault.
5 Adding Stabilizing Tolerance to Transient and Byzantine Failures
Despite the addition of the masking component to the program R, the resulting program MR is not yet stabilizing tolerant to transient and Byzantine failures. For example, MR deadlocks if its state is transiently corrupted into one where some non-general process j incorrectly believes that it has completed its last round, i.e., d.j = ⊥ ∧ SN.j.j ≠ sn.j. It therefore suffices to add a "stabilizing" component to MR that ensures stabilizing tolerance to transient and Byzantine failures while preserving the masking tolerance to Byzantine failure. Towards designing the stabilizing component, we observe that in the absence of transient faults the following state predicates are invariantly true of MR: (i) whenever d.j is set to ⊥, by executing action MRO5, j increments sn.j, thus satisfying SN.j.j = sn.j; and (ii) whenever j sets sn.j to be equal to sn.g, by executing action MRO5, d.j is the same as ⊥. In the presence of transient faults, however, these two state predicates may be violated. Therefore, to add stabilizing tolerance, we need to guarantee that these two state predicates are corrected. To this end, we add two corresponding correction actions, namely MRO7 and MRO8, to the non-general processes. Action MRO7 is executed when d.j is ⊥ and SN.j.j is different from sn.j, and it sets SN.j.j to be equal to sn.j. Action MRO8 is executed when sn.j is the same as sn.g but d.j is different from ⊥, and it sets d.j to be equal to ⊥. With the addition of this stabilizing component to MR, we get a multitolerant program SMR.
MRO7 :: d.j = ⊥ ∧ SN.j.j ≠ sn.j → SN.j.j := sn.j
MRO8 :: d.j ≠ ⊥ ∧ sn.j = sn.g → d.j := ⊥
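These two guarded correction actions translate directly into code. A minimal Python sketch (the paper gives guarded commands, not executable code; the dictionary layout of the state and the modeling of ⊥ as None are our assumptions):

```python
BOTTOM = None  # models the undefined decision value "bottom"

def mro7(j, state):
    """MRO7: if d.j = bottom but SN.j.j != sn.j, realign SN.j.j."""
    if state["d"][j] is BOTTOM and state["SN"][j][j] != state["sn"][j]:
        state["SN"][j][j] = state["sn"][j]

def mro8(j, state, g="g"):
    """MRO8: if sn.j = sn.g but d.j != bottom, reset d.j to bottom."""
    if state["d"][j] is not BOTTOM and state["sn"][j] == state["sn"][g]:
        state["d"][j] = BOTTOM

# A transiently corrupted state for process "j":
state = {"d": {"j": 1}, "sn": {"j": 0, "g": 0}, "SN": {"j": {"j": 1}}}
mro8("j", state)   # guard holds: sn.j = sn.g and d.j != bottom
assert state["d"]["j"] is BOTTOM
mro7("j", state)   # now d.j = bottom and SN.j.j != sn.j
assert state["SN"]["j"]["j"] == 0
```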
Fault Actions. In addition to the Byzantine fault actions, we now consider the transient state corruption fault actions (let j and k range over non-general processes):
true → d.g, sn.g := 0|1, 0|1
true → d.j, sn.j := 0|1, 0|1
true → SN.j.k, D.j.k := 0|1, 0|1
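These fault actions can be exercised as nondeterministic corruption of the listed variables. A minimal fault-injection sketch (our own Python; the state layout is an assumption, matching the guarded-command sketches above only in spirit):

```python
import random

def transient_fault(state, p):
    """Transient corruption: the decision and sequence number of
    process p may each be set, nondeterministically, to 0 or 1."""
    state["d"][p] = random.choice([0, 1])
    state["sn"][p] = random.choice([0, 1])

state = {"d": {"g": 0}, "sn": {"g": 0}}
transient_fault(state, "g")
# After the fault, the variables hold arbitrary (but bounded) values:
assert state["d"]["g"] in (0, 1) and state["sn"]["g"] in (0, 1)
```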
Proof of Correctness. In accordance with the design issues discussed in the introduction, this proof consists of three parts: (1) the stabilizing component offers stabilizing tolerance to MR in the presence of transient and Byzantine faults, (2) the stabilizing component does not interfere with the execution of MR in the absence of faults, and (3) the stabilizing component does not interfere with the masking tolerance of MR in the presence of Byzantine faults only. (1) Observe that execution of the component in isolation ensures that eventually the program reaches a state where the state predicate S holds, where
S = (d.j = ⊥ ⇒ SN.j.j = sn.j) ∧ (sn.j = sn.g ⇒ d.j = ⊥).
Since both conjuncts in S are preserved by the execution of all actions in MR, program MR does not interfere with the correction of S by the stabilizing component. Starting from a state satisfying S, at most one round is executed incorrectly. For reasons of space, we omit the proof here and refer the interested reader to [7]. (2) Observe that, in the absence of faults, S continues to be preserved, and hence the stabilizing component is never executed. Therefore, the stabilizing component does not interfere with MR in the absence of faults. (3) As in part (2), observe that, in the presence of Byzantine faults only, S continues to be preserved and, hence, the stabilizing component is never executed. Therefore, the stabilizing component does not interfere with MR in the presence of Byzantine faults.
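The state predicate S is easy to state as an executable check; the sketch below (our own Python, with ⊥ as None and flattened field names) tests S for one non-general process against a legitimate state and two transiently corrupted ones:

```python
BOTTOM = None

def holds_S(d_j, SN_jj, sn_j, sn_g):
    """State predicate S for one non-general process j:
    (d.j = bottom  =>  SN.j.j = sn.j)  and  (sn.j = sn.g  =>  d.j = bottom)."""
    first = (d_j is not BOTTOM) or (SN_jj == sn_j)
    second = (sn_j != sn_g) or (d_j is BOTTOM)
    return first and second

assert holds_S(d_j=BOTTOM, SN_jj=3, sn_j=3, sn_g=3)       # legitimate state
assert not holds_S(d_j=BOTTOM, SN_jj=2, sn_j=3, sn_g=3)   # first conjunct violated
assert not holds_S(d_j=1, SN_jj=3, sn_j=3, sn_g=3)        # second conjunct violated
```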
6 Extension to Tolerate Multiple Byzantine Faults
To motivate the generalization of SMR to handle t Byzantine failures given n non-general processes, where n ≥ 3t, let us take a closer look at how program SMR is derived from R. To design SMR, we added to each process j a set of components C(j) (see Figure 1).

g : Actions MRG1-2;  j : Actions RO1-2 and MRO6;  C(j) : Actions MRO2-4, MRO7-8, and the restriction of MRO5
(a) : Program R   (b) : Program SMR

Figure 1: Structure of R and SMR
Note that action MRO2 is of the form of RG1 and that action MRO3 is of the form RO1 followed by RO2. (D.j.j and SN.j.j play the role of d.g and sn.g, and D.j.k and SN.j.k play the role of the d values at the non-general processes.) In other words, C(j) itself contains a repetitive agreement program! With this insight, we are now ready to generalize program SMR to handle multiple Byzantine faults, based on an idea that is essentially due to Lamport, Shostak, and Pease [4]. (Our generalization, of course, is distinguished by being multitolerant.) Let g denote the general process, X denote the set of non-general processes, and t denote the maximum number of Byzantine processes. We define SMR(g, X, t) = BYZ(g, X, t, ⟨⟩), where
BYZ(g, X, t, s) =
    inp(g, X, t, s) ∧ MRG1(g, X, t, s)
  [] MRG2(g, X, t, s)
  [] ([] j : j ∈ X : RO1(j, X, t, s) [] w(j, X, t, s) ∧ RO2(j, X, t, s) [] MRO6(j, X, t, s))
  [] ([] j : j ∈ X : C(j, X, t, s∘g))

and

inp(g, X, t, s) = d.(last(s), X ∪ {g}, t+1, trlast(s)) ≠ ⊥ ∧
                  sn.(g, X, t, s) = sn.(last(s), X ∪ {g}, t+1, trlast(s))   if s ≠ ⟨⟩
                = new_decision()                                            otherwise

w(j, X, t, s)   = majdefined.(j, X, t, s)    if t > 0
                = true                        otherwise

C(j, X, t, s)   = MRO4(j, X, t, trlast(s)) ∧ d.(j, X, t, s) = maj.(j, X, t, s)
                  [] MRO5(j, X, t, trlast(s))
                  [] MRO7(j, X, t, trlast(s))
                  [] BYZ(j, X−{j}, t−1, s)   if t > 0
                = the empty program           otherwise
Here s is a sequence; last(s) denotes the last element of s; trlast(s) denotes the sequence obtained by omitting the last element of s; s∘j denotes the sequence obtained by appending j to s; and action ac in program j is modified as follows:
• j is replaced with the quadruple (j, X, t, s).
• The quantification over k in compl is over the set {(k, X−{j}, t−1, s∘j) : k ∈ X−{j}} ∪ {(j, X, t, s)}.
• The quantification over k in majdefined and maj is over the set {(j, X−{k}, t−1, s∘k) : k ∈ X−{j}} ∪ {(j, X, t, s)}.
• If s is nonempty, the output_decision is assigned to the variable D.(j, X, t, s).(j, X ∪ {last(s)}, t+1, trlast(s)).

Observe that if the definition of SMR(g, X, t) is instantiated with t = 0, the resulting program is R. And, if the definition is instantiated with t = 1, the resulting program is SMR (with the previously noted exception that action MRO3 in j of SMR is implemented by RO1 and RO2 in the bottommost instantiation of BYZ, namely BYZ(j, X−{j}, 0, ⟨g⟩)). Program SMR(g, X, t) is multitolerant, i.e., it is masking tolerant to Byzantine faults and stabilizing tolerant to transient and Byzantine faults. We note that the structure of the proof of stabilization is the same as the proof for SMR: upon
starting from any state, the program reaches a state where S holds; subsequently, g is guaranteed to start a new round infinitely often; and when g starts the (t+1)-th round, the resulting computation satisfies Validity and Agreement. The proof of masking tolerance is similar to the one in [4].
7 Refining the Atomicity While Preserving Multitolerance
Our design thus far has assumed read-and-write atomicity, whereby each action of a process can atomically read the variables of the other processes and update the variables of that process. In this section, we show that our design can be refined into read-or-write atomicity, whereby each action of a process can either atomically read the variables of some other process or atomically write its own variables, but not both. We choose a standard refinement [8]: in each process j, a copy of every variable that j reads from another process is introduced. For each of these variables, an action is added to j that asynchronously reads that variable into the copy. Moreover, the actions of j are modified so that every occurrence of these variables is replaced by their corresponding copies. We perform this refinement successively in each step of our design. Thus, we refine R first, the masking component second, and the stabilizing component last. Below, we prove the properties of the program resulting from each refinement step, in terms of the issues (1)-(3) discussed in the introduction.

Step 1: Correctness of the refined R. In the absence of faults, when g increments sn.g by executing action RG1, the only actions of program R that can be executed are RO1 and then RO2 at each non-general process. Until each non-general process j executes RO2, g cannot execute any further action. Thus, by refining R, even if j first reads d.g and sn.g, then updates its local copies of d.g and sn.g, and later executes the refined action RO1, g cannot execute any other actions in the meanwhile. Hence, the computations of the refined R have the same effect as those of R.

Step 2: Correctness of the refined masking component. (1) In the presence of Byzantine faults, the refined actions of R do not interfere with the refined masking component.
To see this, consider the refinement of action MRO2 of the masking component: to execute MRO2, j needs to read the variable SN.k.j of process k. The refinement introduces a copy of SN.k.j at j. For the refined action MRO2 to be enabled, j must first update these copies from other processes. Also, if compl.j is true then it continues to be true unless j changes SN.j.j by executing MRO2. Hence, MRO2 can be correctly refined. Likewise, actions MRO4 and MRO5 can be correctly refined. Regarding action MRO3, recall from Section 6 that MRO3 is equivalent to the simultaneous execution of RO1 and RO2 and, hence, it too can be correctly refined. Hence, the masking component executes only between the executions of RO1 and RO2, and thus the refined actions of R do not interfere with the refined masking component. (2) In the absence of Byzantine faults, just as in Section 4, the refined masking component eventually satisfies majdefined.j ∧ d.j = maj.j in each round and then terminates. Therefore, the refined masking component does not interfere with the refined R in the absence of Byzantine faults.
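The copy-variable refinement that underlies these steps — replacing each remote read by an asynchronously updated local copy, so that every action either reads one remote variable or writes its own — can be sketched as follows. All class and field names here are ours; this is an illustration of the refinement idea, not the paper's program:

```python
class Process:
    """Read-or-write atomicity: each action either reads ONE remote
    variable into a local copy, or writes its own variables -- never both."""
    def __init__(self, name):
        self.name = name
        self.sn = 0            # own variable
        self.copies = {}       # local copies of remote variables

    def read_action(self, other, var):
        # Atomic read of a single remote variable into the local copy.
        self.copies[(other.name, var)] = getattr(other, var)

    def write_action(self):
        # Atomic write using only own state and local copies
        # (e.g. the refined RO1 uses the copy of sn.g, not sn.g itself).
        self.sn = self.copies.get(("g", "sn"), 0)

g, j = Process("g"), Process("j")
g.sn = 1
j.read_action(g, "sn")   # step 1: copy sn.g into j
j.write_action()         # step 2: update own sn from the copy
assert j.sn == 1
```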
Step 3: Correctness of the refined stabilizing component. (1) Towards preserving stabilizing tolerance in the presence of transient and Byzantine faults while refining the stabilizing component, we claim that the set of possible sequence numbers has to be increased to {0..K−1} where K ≥ 4. (This claim follows from the fact that between g and j there are four sequence numbers, namely, sn.g, sn.j, and the copies of sn.g and sn.j at j and g respectively; for details, see [8].) Moreover, a deadlock has to be avoided in the states of the refined version of R where sn.j ≠ sn.g and sn.j ≠ sn.g ⊖ 1. (These states do not exist in SMR since its sequence numbers are either 0 or 1.) Therefore, to preserve stabilization, we need to add actions to the stabilizing component that establish sn.j ∈ {sn.g, sn.g ⊖ 1}. If these actions set sn.j to sn.g, d.j must also be set to ⊥; otherwise, action MRO2 may interfere with this component by incrementing sn.j. Alternatively, these actions may set sn.j to sn.g ⊖ 1. Either alternative is acceptable. Since the refined R and the refined masking component preserve each constraint satisfied by the refined stabilizing component, the former does not interfere with the latter in the presence of transient and Byzantine faults. (2) In the absence of Byzantine faults, just as in Section 5, the stabilizing component never executes. Therefore, the refined stabilizing component does not interfere with the other refined components in the absence of faults. (3) In the presence of Byzantine faults only, again the stabilizing component never executes. Therefore, the refined stabilizing component does not interfere with the other refined components in the presence of Byzantine faults. In summary, program SMR can be refined into read-or-write atomicity while preserving its multitolerance by asynchronously updating copies of the variables of "neighboring" processes.
Note that the copies can be implemented by channels of unit length between the neighboring processes that lose any existing message in them when they receive a new message. It follows that the refined program is multitolerant for a message passing network where each channel has unit capacity. Moreover, using standard transformations, one can further refine the program into a message passing one with bounded capacity channels.
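A unit-capacity channel of the kind described — one that loses any pending message when a new one is sent — is straightforward to model. A sketch in Python (our own illustration of the channel semantics stated in the text):

```python
class UnitChannel:
    """Unit-capacity channel that overwrites (loses) any pending
    message when a new one arrives -- this implements the
    asynchronously updated copy of a neighbor's variable."""
    def __init__(self):
        self.slot = None
    def send(self, msg):
        self.slot = msg        # any existing message is lost
    def receive(self):
        msg, self.slot = self.slot, None
        return msg

ch = UnitChannel()
ch.send(1)
ch.send(2)                     # overwrites the pending 1
assert ch.receive() == 2
assert ch.receive() is None    # channel is empty again
```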
8 Generalizing From Our Design
In this section, we discuss the general aspects of our method in the context of the design of SMR. We find that our stepwise method of adding a component to provide each desired tolerance property facilitated the solution of the problem at hand. It is worthwhile to point out that this method is general enough to design programs obtained from various existing fault-tolerance design methods such as replication, checkpointing and recovery, Schneider's state machine approach, exception handling, and Randell's recovery blocks.

Types of tolerance components. The stabilizing component we added to MR, to ensure that the state predicate S holds, is an instance of a corrector. Corrector components suffice for the design of stabilizing tolerance and, more generally, nonmasking tolerant programs. Well-known examples of correctors include reset procedures, rollback-recovery, forward recovery, error correction codes, constraint (re)satisfaction, exception handlers, and alternate procedures
in recovery blocks. Large correctors can be designed in a stepwise and hierarchical fashion by parallel and/or sequential composition of small correctors. The masking component we added to R is itself composed of two sub-components: one a detector and the other a corrector. The detector consists of actions MRO2-3 and the restriction to RO2 in MRO5, while the corrector consists of actions MRO2-4. The task of the detector is to help preserve the safety properties (namely, Validity and Agreement) in the presence of Byzantine failure, by detecting the state predicate "the decision of j is that of the majority of the non-general processes", while the task of the corrector is to ensure that the same state predicate holds. Note that adding this detector but not the corresponding corrector would have yielded only fail-safe tolerance instead of masking tolerance. In other words, in the presence of Byzantine failure, Validity and Agreement would be satisfied if all processes output their decision, although some processes may never output their decision. More generally, detector components suffice for the design of fail-safe tolerance and, together, detector and corrector components suffice for the design of masking tolerance. Well-known examples of detectors include snapshot procedures, acceptance tests, error detection codes, consistency checkers, watchdog programs, snooper programs, and exception conditions. Analogous to the compositional design of large correctors, large detectors can be designed in a stepwise and hierarchical fashion, by parallel and/or sequential composition of small detectors. The interested reader is referred to a companion paper [1] for an in-depth study of detectors and correctors.

Figure 2: Components that suffice for design of various tolerances (intolerant system + detectors → fail-safe tolerant system; + correctors → stabilizing (nonmasking) tolerant system; + detectors and correctors → masking tolerant system)
Self-tolerances of components. Since a component that is added to tolerate a fault-class is itself subject to the fault-class, the question arises: what sort of tolerance should the component itself possess to that fault-class?
We observe that in SMR, the masking component is itself masking tolerant to Byzantine faults and the stabilizing component is itself stabilizing tolerant to transient and Byzantine faults. In fact, in general, for adding stabilizing (nonmasking) tolerance, it suffices that the added component be stabilizing (nonmasking) tolerant. Likewise, for adding fail-safe tolerance, it suffices that the added component be fail-safe tolerant. And, for adding masking tolerance, it suffices that the added component be masking tolerant. In practice, the detectors and correctors often possess the desired tolerance trivially. But if they do not, one way to design components to be self-tolerant is by the analogous addition of more detector and corrector components to them. Alternative ways are exemplified by designs that yield self-checking, self-stabilizing, or inherently fault-tolerant programs.
Figure 3: Self-tolerances of components for various tolerances (intolerant system + fail-safe components → fail-safe tolerant system; + stabilizing components → stabilizing tolerant system; + masking components → masking tolerant system)

Stepwise design of tolerances. We observe that our decision to make the program masking tolerant first and stabilizing tolerant second is not crucial. The same program could also be obtained by adding components in the reverse order, to deal with stabilization first and masking second. In fact, in general, the same multitolerant program can be designed by adding the tolerance components in different orders. For the special case of adding both detector and corrector components for masking tolerance, the design may be simplified by using a stepwise approach [5]: for instance, we may first augment the program with detectors and then augment the resulting fail-safe tolerant program with correctors. Alternatively, we may first augment the program with correctors and then augment the resulting nonmasking tolerant program with detectors.
Figure 4: Two approaches for stepwise design of masking tolerance (intolerant system + detectors → fail-safe tolerant system, then + correctors → masking tolerant system; or intolerant system + correctors → stabilizing (nonmasking) tolerant system, then + detectors → masking tolerant system)

On a related note, we observe that adding detectors to a stabilizing program suffices to enhance the tolerance of the correctors in the program from stabilizing to masking. Likewise, adding correctors to a fail-safe program suffices to enhance the tolerance of the detectors in the program from fail-safe to masking.

Stepwise refinement of tolerances. Our example illustrates the general principle that a component-based multitolerant program can be refined in the same manner as it is designed: in the first step, the original program is refined and its correctness in the absence of faults is shown. Then, the refined version of the first component is added and the tolerance properties in the presence of the first fault-class are shown. And so on for each fault-class. Thus, in obtaining the refined program, the design/proof of the original program can be reused.

Alternative tolerances to Byzantine failures. In the presence of Byzantine failures alone, SMR satisfies the specification of repetitive agreement in each round. The reader may wonder whether this strong guarantee is true of every repetitive agreement program that tolerates Byzantine failures. The answer is negative: Zhao and Bastani [6] have presented a program that is nonmasking tolerant to Byzantine failures, i.e., that could violate the specification in some finite number of rounds only. Moreover, we present here a program that is
stabilizing tolerant --but not masking tolerant-- to Byzantine failure. This program is composed from SMR and a nonmasking tolerant program outlined below. Again, for simplicity, we consider the special case where there are four processes and at most one is Byzantine. In our nonmasking tolerant program, each non-general process j chooses a "parent" of j that is initially g. In each round, j receives the decision value of its parent and outputs that value as the decision of j. In parallel, j obtains the decision value of g and forwards it to the other non-general processes. If the values that j receives from g and the other two processes are not all identical, j is allowed to change its parent, so that it will output a correct decision in the following rounds, as follows. Let j, k, and l be the three non-general processes. We consider two cases: (1) g is Byzantine and (2) g is non-Byzantine. Case (1): If g sends the value B to l and B ⊕ 1 to j and the remaining process k, j and k will suspect that l or g is Byzantine, and l will know that g is Byzantine and that j and k are non-Byzantine. Without loss of generality, let the id of j be greater than that of k. We let both j and l change their parent to k (to avoid forming a cycle in the parent relation, k retains g as its parent). In all future rounds, j and l output the value received from k and, hence, the decision output by j, k, and l is identical. Case (2): Since the values sent by both j and k are the same, both j and k are non-Byzantine. Again, assuming the id of k is greater than that of j, it is safe to let j change its parent to k. In all future rounds, the decision output by j and k is the same as that output by g. It follows that the nonmasking tolerant program executes only a finite number of rounds incorrectly in the presence of at most one Byzantine failure. This program is made stabilizing by adding SMR to it, as follows: each process j is in one of two modes: nonmasking or stabilizing.
It executes the nonmasking tolerant program when it is in the nonmasking mode, and it executes SMR when it is in the stabilizing mode. Further, it is allowed to change from the nonmasking mode to the stabilizing mode, but not vice versa. Observe that the nonmasking tolerant program satisfies the state predicate "if the parent of j is k for some k ≠ g, then k is non-Byzantine, the parent of k is g, and the parent of l is k provided l is non-Byzantine". Hence, if j suspects that this predicate is violated, i.e., in some round j detects that either g or k is Byzantine, or the parent of k is not g, or the parent of l is not k, then j changes to the stabilizing mode and starts executing SMR. Moreover, whenever j detects that some other process is in the stabilizing mode, it changes its mode to stabilizing. Thus, if the composite program is perturbed to a state that is not reached by the nonmasking tolerant program, eventually all processes execute actions of SMR. It follows that the composite program is stabilizing tolerant but not masking tolerant to Byzantine failures.
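The one-way mode switch of this composite program can be sketched compactly. The Python below (our own illustration; class and method names are assumptions) captures the two rules: a process switches to stabilizing mode when it suspects the parent-structure predicate is violated, and also when it observes any peer already in stabilizing mode — and no action ever switches it back:

```python
class ModeProcess:
    """Mode logic of the composite program: run the nonmasking
    program until a violation is suspected, then switch -- one way
    only -- to stabilizing mode (i.e., to executing SMR)."""
    def __init__(self):
        self.mode = "nonmasking"

    def suspect_violation(self):
        # j detects that the parent-structure predicate is violated.
        self.mode = "stabilizing"

    def observe(self, peers):
        # j follows any peer that is already in stabilizing mode.
        if any(p.mode == "stabilizing" for p in peers):
            self.mode = "stabilizing"

j, k, l = ModeProcess(), ModeProcess(), ModeProcess()
j.suspect_violation()       # j suspects the predicate is violated
k.observe([j, l])           # k sees j and follows
l.observe([j, k])
assert all(p.mode == "stabilizing" for p in (j, k, l))
```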
9 Concluding Remarks
In this paper, we presented a case study in the design of multitolerance. Starting with an intolerant program, we first added a component for masking tolerance and then added another component for stabilizing tolerance. Our design illustrated the issues of: (i) how to add a component for offering a tolerance in
the presence of a fault-class, (ii) how to ensure that the added component does not interfere with the satisfaction of the program specification in the absence of faults, and (iii) how to ensure that the added component does not interfere with the tolerances offered to the previously considered fault-classes. While our proofs of interference-freedom were presented in informal terms, they are readily formalized, e.g., in the temporal logic framework of [1]. The formalization builds upon previous work on compositionality, e.g., [9, 10, 11]. We observed that a similar design is possible where we first add stabilizing tolerance and then masking tolerance. Also, not every repetitive agreement program that stabilizes in the presence of transient and Byzantine faults needs to be masking tolerant to Byzantine faults; in particular, it could be only nonmasking tolerant to Byzantine faults. Moreover, as discussed in Section 7, our initial design which assumed read-and-write atomicity is readily refined within the context of our design method into read-or-write atomicity or message-passing. A reviewer of this paper correctly observed that our design method could benefit from using the principles of adaptive fault-tolerance [12]. Indeed, one way of simplifying the interference-freedom obligation between tolerance components is to choose only one of the tolerance components for execution at a time and to adapt this choice as the (fault) environment and internal state of the system changes. Note, however, that the mechanism for adapting the choice of tolerance component must itself be multitolerant.
References

[1] A. Arora and S. S. Kulkarni. Component-based design of multitolerance. Technical Report OSU-CISRC TR37, Ohio State University, 1996.
[2] Z. Liu and M. Joseph. Transformation of programs for fault-tolerance. Formal Aspects of Computing, 4(5):442-469, 1992.
[3] K. P. Birman and R. van Renesse. Reliable distributed computing using the Isis Toolkit. IEEE Computer Society Press, 1994.
[4] L. Lamport, R. Shostak, and M. Pease. The Byzantine generals problem. ACM Transactions on Programming Languages and Systems, 1982.
[5] A. Arora and S. S. Kulkarni. Designing masking fault-tolerance via nonmasking fault-tolerance. IEEE Transactions on Software Engineering, 1997, to appear.
[6] Y. Zhao and F. B. Bastani. A self-adjusting algorithm for Byzantine agreement. Distributed Computing, 5:219-226, 1992.
[7] S. S. Kulkarni and A. Arora. Compositional design of multitolerant repetitive Byzantine agreement (preliminary version). Third Workshop on Self-Stabilizing Systems (WSS 97), University of California, Santa Barbara, 1997.
[8] A. Arora and M. G. Gouda. Distributed reset. IEEE Transactions on Computers, 43(9):1026-1038, 1994.
[9] K. Apt, N. Francez, and W.-P. de Roever. A proof system for communicating sequential processes. ACM Transactions on Programming Languages and Systems, pages 359-385, 1980.
[10] H. Schepers. Fault Tolerance and Timing of Distributed Systems: Compositional specification and verification. PhD thesis, Eindhoven University, 1994.
[11] S. Owicki and D. Gries. An axiomatic proof technique for parallel programs. Acta Informatica, 6:319-340, 1976.
[12] J. Goldberg, I. Greenberg, and T. Lawrence. Adaptive fault-tolerance. Proceedings of the IEEE Workshop on Advances in Parallel and Distributed Systems, pages 127-138, 1993.
Algorithmic Issues in Coding Theory

Madhu Sudan
Laboratory for Computer Science, MIT, Cambridge, MA 02139, U.S.A.
madhu@lcs.mit.edu
Abstract. The goal of this article is to provide a gentle introduction to the basic definitions, goals and constructions in coding theory. In particular we focus on the algorithmic tasks tackled by the theory. We describe some of the classical algebraic constructions of error-correcting codes including the Hamming code, the Hadamard code and the Reed-Solomon code. We describe simple proofs of their error-correction properties. We also describe simple and efficient algorithms for decoding these codes. It is our aim that a computer scientist with just a basic knowledge of linear algebra and modern algebra should be able to understand every proof given here. We also describe some recent developments and some salient open problems.
1 Introduction
Error-correcting codes are combinatorial structures that allow for the transmission of information over a noisy channel and the recovery of the information without any loss at the receiving end. Error-correcting codes come in two basic formats. (1) The "block error-correcting code": here the information is broken up into small pieces. Each piece contains a fixed finite amount of information. The encoding method is applied to each piece individually (independently). The resulting encoded pieces (or blocks) are sent over the noisy channel. (2) The "convolutional codes": here the information is viewed as a potentially infinite stream of bits and the encoding method is structured so as to handle an infinite stream. This survey will be restricted to the coverage of some standard block error-correcting codes. Formally, a block error-correcting code may be specified by an encoding function C. The input to C is a message m, which is a k-letter string over some alphabet Σ (typically Σ = {0, 1}, but we will cover more general codes as well). C maps m into a longer n-letter string over the same alphabet.¹ The mapped string is referred to as a codeword. The basic idea is that in order to send the message m over to the receiver, we transmit instead the codeword C(m). By the time this message reaches the destination it will be corrupted, i.e., a few letters in C(m)

¹ The assumption that the message is a k-letter string over Σ is just made for notational convenience. As it will become obvious, the representation of the message space is irrelevant to the communication channel. The representation of the encoded string is however very relevant!
would have changed. Say the received word is R. Hopefully R will still be able to convey the original message m even if it is not identically equal to C(m). The only way to preserve this form of redundancy is by ensuring that no two codewords are too "close" to each other. This brings us to the important notion of "close"-ness used, namely the Hamming distance. The Hamming distance between two strings x, y ∈ Σⁿ, denoted Δ(x, y), is the number of letters where x and y differ. Notice that Δ forms a metric, i.e., Δ(x, y) = 0 ⇒ x = y, Δ(x, y) = Δ(y, x), and Δ(x, y) + Δ(y, z) ≥ Δ(x, z). A basic parameter associated with a code is its distance, i.e., the maximum value d such that any two codewords are a Hamming distance of at least d apart. Given a code of distance d and a received word R that differs from C(m) in at most e ≤ d − 1 places, the error in the transmission can be detected. Specifically, we can tell that some letter(s) has been corrupted in the transmission, even though we may not know which letters are corrupted. In order to actually correct errors we have to be able to recover m uniquely based on R and a bound t on the number of errors that may have occurred. To get the latter property t has to be somewhat smaller than d − 1. Specifically, if t ≤ ⌊(d − 1)/2⌋, then we notice that indeed there can be at most one message m such that Δ(C(m), R) ≤ t. (If m₁ and m₂ both satisfy Δ(C(m₁), R), Δ(C(m₂), R) ≤ t, then Δ(C(m₁), C(m₂)) ≤ Δ(C(m₁), R) + Δ(R, C(m₂)) ≤ 2t ≤ d − 1, contradicting the distance of C.) Thus in an information-theoretic sense R maintains the information contained in m. Recovering the information m efficiently from R is another matter and we will come back to this topic presently. To summarize the discussion above we adopt the following terse notation that is standard in coding theory. A code C is an [n, k, d]_q code if C : Σ^k → Σ^n, where |Σ| = q, with min_{x≠y ∈ Σ^k} {Δ(C(x), C(y))} = d.
With some abuse of notation we will use C to denote the image of the map C (i.e., C may denote the collection of codewords rather than the map). C is called an e-error-detecting code for e = d − 1 and a t-error-correcting code for t = ⌊(d − 1)/2⌋. In the remaining sections of this article we will describe some common constructions of [n, k, d]_q codes for various choices of the parameters n, k, d and q. We will also describe the algorithmic issues motivated by these combinatorial objects and try to provide some solutions (and summarize the open problems). (We assume some familiarity with the algebra of finite fields [10, 19].) Before going on to these issues, we once again stress the importance of the theory of error-correcting codes and its relevance to computer science. The obvious applications of error-correcting codes are to areas where dealing with error becomes important, such as storage of information on disks and CDs, and communication over modems. Additionally, and this is where they become important to the theoretical computer scientist, error-correcting codes come into play in several ways in complexity theory: for example, in fault-tolerant computing, in cryptography, in the derandomization of randomized algorithms and in the construction of probabilistically checkable proofs. In several of these cases it is not so much the final results as the notions, methods and ingredients from coding theory that help. All of this makes it important that a theoretical computer scientist be comfortable with the methods of this field, and this is the goal of this article. A reader
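These definitions translate directly into code. The Python sketch below checks the detection and correction bounds on a toy code of our own choosing (the [3, 1, 3]_2 repetition code, which is not one of the article's examples): with d = 3, it detects e = 2 errors and corrects t = ⌊(d − 1)/2⌋ = 1 error by nearest-codeword decoding.

```python
def hamming(x, y):
    """Hamming distance: number of positions where x and y differ."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

# Toy [3, 1, 3]_2 repetition code: 0 -> 000, 1 -> 111.
codewords = {"0": "000", "1": "111"}

def decode(received):
    """Nearest-codeword (minimum-distance) decoding."""
    return min(codewords, key=lambda m: hamming(codewords[m], received))

assert hamming("000", "111") == 3   # the code's distance d
assert decode("010") == "0"         # one error: uniquely corrected
assert decode("110") == "1"         # one error: uniquely corrected
```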
interested in further details may try one of the more classical texts [2, 11, 17]. Also, the article of Vardy [18] is highly recommended for a more detailed account of progress in coding theory. That article is also rich with pointers to topics of current interest.
2  Linear Codes
While all questions relating to coding theory can be stated in general, we will focus in this article on a subset of codes called linear codes. These codes are obtained by restricting the underlying alphabet Σ to be a finite field of cardinality q with binary operations "+" and "·". Thus a string in Σ^n can be thought of as a vector in n-dimensional space, with induced operations "+" (vector addition) and "·" (scalar multiplication). Thus a code C ⊆ Σ^n is now a subset of the vectors. If this subset of vectors forms a subspace then the code is linear, as made formal below:

Definition 1. C ⊆ Σ^n is a linear code if for all a ∈ Σ and x, y ∈ C, we have x + y ∈ C and a·x ∈ C.

Many of the parameters of error-correcting codes become very clean in the case of linear codes. For instance, how does one specify a code C ⊆ Σ^n? For general codes, succinct representations may not exist! However, for every linear code a succinct representation of size polynomial in n does exist. In particular, we have the following two representations:

1. For every [n, k, d]_q linear code C there exists an n × k "generator" matrix G = G_C with entries from Σ such that C = {Gx | x ∈ Σ^k}.
2. For every [n, k, d]_q code C there exists an (n − k) × n "parity check" matrix H = H_C over Σ such that C = {y ∈ Σ^n s.t. Hy = 0}.

Conversely, the following hold: Every n × k matrix G over Σ defines an [n, k′, d]_q code C_G, for some d ≥ 1 and k′ ≤ k, having as codewords {Gx | x ∈ Σ^k}. Similarly, every (n − k) × n matrix H defines an [n, k′, d]_q code C_H, for some d ≥ 1 and k′ ≤ k, having as codewords {y ∈ Σ^n | Hy = 0}.

Exercise:
1. Prove properties (1) and (2) above.
2. Given the generator matrix G_C of a code C, give a polynomial time algorithm to compute a parity check matrix H_C for C.
3. Show that if G is of full column rank (H is of full row rank) then the code C_G (C_H) is an [n, k, d]_q code.
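To make the two representations concrete, here is a small Python sketch of our own (not from the text) using the [7, 4, 3] binary Hamming code in a systematic form chosen for the example. It checks the parity-check property, linearity, and the minimum distance by brute force. (The text's generator matrix is n × k and encodes as Gx; below we use the transposed row convention mG, which is equivalent.)

```python
import itertools

# Generator (4x7) and parity-check (3x7) matrices of the [7,4,3] Hamming
# code in a systematic form: G = [I_4 | A], H = [A^T | I_3], over GF(2).
A = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
G = [tuple(int(i == r) for i in range(4)) + A[r] for r in range(4)]
H = [tuple(A[r][c] for r in range(4)) + tuple(int(i == c) for i in range(3))
     for c in range(3)]

def encode(m):                       # m is a length-4 bit tuple
    return tuple(sum(m[r] * G[r][j] for r in range(4)) % 2 for j in range(7))

def syndrome(y):                     # y is a length-7 bit tuple
    return tuple(sum(H[r][j] * y[j] for j in range(7)) % 2 for r in range(3))

codewords = [encode(m) for m in itertools.product((0, 1), repeat=4)]

# Representation (2): every codeword satisfies H*y = 0.
assert all(syndrome(c) == (0, 0, 0) for c in codewords)

# Linearity: the sum of two codewords is again a codeword.
c = tuple((a + b) % 2 for a, b in zip(codewords[3], codewords[9]))
assert c in codewords

# Minimum distance of this code is 3 (see Section 3.1).
d = min(sum(x != y for x, y in zip(c1, c2))
        for i, c1 in enumerate(codewords) for c2 in codewords[i + 1:])
print(d)  # 3
```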
3  Some common constructions of codes
In this section we describe some common constructions of codes. But first let us establish the goal for this section. In general we would like to find families of
[n, k, d]_q codes for infinitely many triples (n, k, d) for some fixed q. The property we would really like is that k/n and d/n are bounded away from zero as n → ∞. Such a code is termed asymptotically good, and the two properties k/n > 0 and d/n > 0 are termed constant message-rate and constant distance-rate respectively. Unfortunately we will not be able to get to this goal in this article. But we will settle for what we term weakly good codes. These are codes with polynomial message-rate, i.e., k = Ω(n^ε) for some ε > 0, and constant distance-rate.
3.1  Hamming code
Hamming codes are defined for every positive n such that there exists an integer l such that n = 2^l − 1. The Hamming code of block size n over the alphabet {0, 1} is given by an l × n parity check matrix H^HAM whose columns are all the distinct l-dimensional non-zero vectors. Notice that there are exactly 2^l − 1 of these.

Lemma 2. For every positive integer n such that n = 2^l − 1 for some integer l, the Hamming code of block size n is an [n, n − l, 3]_2 code.

Proof Sketch. Notice that the rank of H^HAM is l. In particular the column vectors containing exactly one 1 are linearly independent and there are l of them. Thus we find that the Hamming code is an [n, k, d]_2 code for k = n − l. We now move to showing that the distance of the Hamming code is 3. Notice that the code has no elements of weight 1 or 2: a codeword of weight 1 would imply that some column of the parity check matrix is zero, and a codeword of weight 2 would imply that two columns in the parity check matrix are identical. This implies the distance is at least 3. Now consider any two column vectors v1 and v2 in H^HAM. Notice that the vector v1 + v2 is also a column vector of H^HAM and is distinct from v1 and v2. Now consider the n-dimensional vector which is zero everywhere except in the coordinates corresponding to the vectors v1, v2 and v1 + v2. This vector has weight 3 and is easily seen to be an element of the Hamming code. Thus the distance of the Hamming code is exactly 3.

The Hamming code is a simple code with a very good rate. Unfortunately it can only correct 1 error, definitely far from our goal of constant distance-rate. Next we move on to a code with good error-correcting properties, but with very low rate.

3.2  Hadamard code
A Hadamard matrix is an n × n matrix M with entries from {+1, −1} such that M·Mᵀ = n·I_n, where I_n is the n × n identity matrix. A Hadamard matrix immediately leads to an error-correcting code where the rows of M are the codewords. This yields a code over the alphabet Σ = {+1, −1}. We prove the distance property of the code first.
Lemma 3. If M is a Hadamard matrix then any two distinct rows agree in exactly n/2 places.

Proof. Say the rows of interest are the ith and jth rows. Then consider the element (MMᵀ)_ij. This element is the sum of n terms, with the kth term being m_ik·m_jk. Notice that this term evaluates to +1 if m_ik = m_jk and −1 otherwise. Thus if the ith and jth rows disagree in t places, then (MMᵀ)_ij = (n − t) − t. Since (MMᵀ)_ij = 0, we have that n − 2t = 0 and hence the two rows (dis)agree in exactly n/2 places.

Thus the task of constructing a Hadamard code reduces to the task of constructing Hadamard matrices. Constructions of Hadamard matrices have been a subject of much interest in combinatorics. It is clear (from Lemma 3) that for an n × n Hadamard matrix to exist, n must be even (in fact, for n > 2, n must be divisible by 4). Whether a Hadamard matrix exists for every such n is still an open question. What is known is that an n × n matrix exists for every n of the form p + 1 where p is a prime congruent to 3 modulo 4. It is also known that if an n1 × n1 Hadamard matrix exists and an n2 × n2 Hadamard matrix exists, then an n1n2 × n1n2 matrix exists. Many other such constructions are also known, but not all possibilities are covered yet. Here we give the basic construction, which applies when n is a power of 2. These matrices are described recursively as follows:
    M_1^HDM = [ +1 +1 ; +1 −1 ],    M_l^HDM = [ +M_{l−1}^HDM +M_{l−1}^HDM ; +M_{l−1}^HDM −M_{l−1}^HDM ]

(rows of each matrix separated by semicolons).

Lemma 4. For every l, the rows of M_l^HDM form a [2^l, l, 2^{l−1}]_2 code.
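The recursion above is easy to exercise directly. The following sketch (our own illustration of this Sylvester-type construction, with l = 3 as an arbitrary choice) builds M_3^HDM and verifies both the defining identity M·Mᵀ = n·I and the agreement property of Lemma 3.

```python
def hadamard(l):
    """Recursive construction of the 2^l x 2^l matrix M_l^HDM."""
    M = [[1]]
    for _ in range(l):
        # [ M  M ; M  -M ]
        M = [row + row for row in M] + [row + [-x for x in row] for row in M]
    return M

M = hadamard(3)                      # an 8 x 8 Hadamard matrix
n = len(M)

# Check M * M^T = n * I.
for i in range(n):
    for j in range(n):
        dot = sum(M[i][k] * M[j][k] for k in range(n))
        assert dot == (n if i == j else 0)

# Lemma 3: any two distinct rows agree in exactly n/2 places.
agree = sum(1 for k in range(n) if M[0][k] == M[5][k])
print(agree)  # 4
```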
Proof. Left as an exercise to the reader.

The Hadamard codes maintain a constant distance-rate. However, their message-rate approaches zero very quickly. Next we describe a code with constant message-rate and distance-rate. The catch is that the code uses an alphabet of growing size.

3.3  Reed Solomon code
The Reed Solomon codes are a family of codes defined over an alphabet of growing size, with n ≈ q. The more common definition of this code is not (we feel) as intuitive or as useful as the "folklore" definition. We present both definitions here, starting with the more useful one, and then show the equivalence of the two.

Definition 5 (Reed Solomon codes). Let Σ be a field of size q, n ≤ q, and let x_0, ..., x_{n−1} be some fixed enumeration of n of the elements of Σ. (It is standard to pick n = q − 1 and x_i = α^i for some primitive element α.²) Then for every 1 ≤ k ≤ n, the Reed Solomon code C_RS,n,k,q is defined as follows: A message m = m_0 ... m_{k−1} corresponds to the degree k − 1 polynomial M(x) = Σ_{i=0}^{k−1} m_i x^i. The encoding of m is C_RS,n,k,q(m) = c_0 ... c_{n−1} where c_j = M(x_j).

² α is a primitive element of the field GF(q) if α^j ≠ 1 for any 0 < j < q − 1.

The distance properties of the Reed Solomon codes follow immediately from the fact that a degree k − 1 polynomial may have at most k − 1 zeroes unless all of its coefficients are zero.
Lemma 6. For every n ≤ q and k ≤ n, the Reed Solomon code C_RS,n,k,q forms an [n, k, n − k]_q linear code.

Proof. The fact that the code is linear follows from the fact that if M_0(x) and M_1(x) are polynomials of degree at most k − 1 then so is M_0(x) + M_1(x). The distance follows from the fact that if M_0(x_j) = M_1(x_j) for k values of j (or equivalently, if M_0(x_j) − M_1(x_j) is zero for k values of j), then M_0 − M_1 is the zero polynomial and hence M_0 = M_1.
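Definition 5 is easy to exercise over a small prime field. The sketch below is our own toy instance (q = 7, n = 6, k = 2, with the primitive element 3, all hypothetical choices): it encodes every message and checks the minimum distance against Lemma 6 by brute force.

```python
import itertools

q = 7                                   # the prime field GF(7)
n, k = 6, 2                             # a [6, 2]_7 Reed Solomon code
xs = [1, 3, 2, 6, 4, 5]                 # successive powers of the primitive element 3

def rs_encode(m):
    """m = (m_0, ..., m_{k-1}) -> (M(x_0), ..., M(x_{n-1})) over GF(q)."""
    return tuple(sum(m[i] * pow(x, i, q) for i in range(k)) % q for x in xs)

codewords = [rs_encode(m) for m in itertools.product(range(q), repeat=k)]
dist = min(sum(a != b for a, b in zip(c1, c2))
           for i, c1 in enumerate(codewords) for c2 in codewords[i + 1:])
print(dist)  # 5, i.e. n - k + 1; Lemma 6 promises at least n - k
```

Two distinct polynomials of degree at most k − 1 = 1 agree on at most one evaluation point, which is exactly why the distance comes out as n − (k − 1).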
Finally, for the sake of completeness, we present a second definition of Reed Solomon codes. This definition is more commonly seen in the texts, but we feel this part may be safely skipped at first reading.

Definition 7 (Reed Solomon codes). Let Σ be a field of size q with primitive element α, and let n = q − 1, k ≤ n. Let P_{k,q}(x) be the polynomial (x − α)(x − α²) ··· (x − α^{n−k}). The Reed Solomon code C_RS,n,k,q is defined as follows: A message m = m_0 ... m_{k−1} corresponds to the degree k − 1 polynomial M(x) = Σ_{i=0}^{k−1} m_i x^i. The encoding of m is C_RS,n,k,q(m) = c_0 ... c_{n−1} where c_j is the coefficient of x^j in the polynomial P_{k,q}(x)M(x).

Viewed this way it is hard to see the correspondence between the two definitions (or the distance property). We prove an equivalence next.
Lemma 8. The definitions of Reed Solomon codes given in Definitions 5 and 7 coincide for n = q − 1 and the standard enumeration of the elements of GF(q).
Proof. Notice that it suffices to prove that every codeword according to the first definition is a codeword according to the second definition; the fact that the two sets are of the same size then implies that they are identical. Consider the encoding of m = m_0 ... m_{k−1}. This encoding is C_RS,n,k,q(m) = c_0 ... c_{n−1} with c_i = Σ_{j=0}^{k−1} m_j (α^i)^j. To show that this is a codeword according to the second definition we need to verify that the polynomial C(x) = Σ_{i=0}^{n−1} c_i x^i has (x − α^l) as a factor for every l ∈ {1, ..., n − k}. Equivalently, it suffices to verify that C(α^l) = 0, which we do next:

    C(α^l) = Σ_{i=0}^{n−1} c_i α^{il}
           = Σ_{i=0}^{n−1} Σ_{j=0}^{k−1} m_j α^{ij} α^{il}
           = Σ_{j=0}^{k−1} m_j Σ_{i=0}^{n−1} (α^{j+l})^i
           = Σ_{j=0}^{k−1} m_j Σ_{i=0}^{q−2} γ_{j,l}^i

where γ_{j,l} = α^{j+l}. Notice that for every j, l s.t. j + l ≠ q − 1, γ_{j,l} ≠ 1. Notice further that for every such γ_{j,l} the summation Σ_{i=0}^{q−2} γ_{j,l}^i = 0.³ Since j ∈ {0, ..., k − 1}, we find that γ_{j,l} ≠ 1 for every l ∈ {1, ..., q − 1 − k}. Thus for every l ∈ {1, ..., n − k}, we find that C(α^l) = 0. This concludes the proof.

3.4  Multivariate polynomial codes
The next family of codes we describe is not very commonly used in coding theory, but has turned out to be fairly useful in complexity theory, and in particular in the results on probabilistically checkable proofs. Surprisingly, these codes turn out to be a common generalization of Hadamard codes and Reed Solomon codes!

Definition 9 (Multivariate polynomial code). For integer parameters m, l and q with l < q, the multivariate polynomial code C_POLY,m,l,q has as message a string of coefficients m = {m_{i_1,...,i_m}} with i_j ≥ 0 and Σ_j i_j ≤ l. This sequence is interpreted as the m-variate polynomial M(x_1, ..., x_m) = Σ_{i_1,...,i_m} m_{i_1,...,i_m} x_1^{i_1} ··· x_m^{i_m}. The encoding of m is the string of letters {M(x_1, ..., x_m)}, with one letter for every (x_1, ..., x_m) ∈ Σ^m.

Obviously the multivariate polynomial codes form a generalization of the Reed Solomon codes (again using the first definition given here of Reed Solomon codes). The distance property of the multivariate polynomial codes also follows from the distance property of multivariate polynomials (cf. [5, 13, 21]).
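The definition can be checked on a tiny instance. The sketch below is our own example with m = 2, l = 1 and q = 5 (arbitrary choices): it encodes all degree-1 bivariate polynomials and computes the code's minimum distance by brute force, for comparison with the bound of Lemma 10.

```python
import itertools

q, m, l = 5, 2, 1                  # a toy C_POLY instance: bivariate, degree 1
points = list(itertools.product(range(q), repeat=m))

def poly_encode(coeffs):
    """coeffs[(i, j)] for i + j <= l; evaluate the polynomial on all of GF(q)^2."""
    return tuple(sum(c * pow(x, i, q) * pow(y, j, q)
                     for (i, j), c in coeffs.items()) % q
                 for x, y in points)

monomials = [(0, 0), (1, 0), (0, 1)]
codewords = [poly_encode(dict(zip(monomials, v)))
             for v in itertools.product(range(q), repeat=len(monomials))]
dist = min(sum(a != b for a, b in zip(c1, c2))
           for i, c1 in enumerate(codewords) for c2 in codewords[i + 1:])
print(dist)  # 20, i.e. (q - l) * q**(m - 1), matching Lemma 10
```

The worst case is a difference polynomial that is a single line in the plane GF(q)², which vanishes on exactly q of the q² points.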
Lemma 10. For integers m, l and q with l < q, the code C_POLY,m,l,q is an [n, k, d]_q code with n = q^m, k = (m+l choose m) and d = (q − l)q^{m−1}.

Proof. The bound on n is immediate. The fact that the number of coefficients i_1, ..., i_m s.t. Σ_j i_j ≤ l is (m+l choose m) is a well-known exercise in counting. Finally, the bound on the distance follows from the fact that a non-zero degree l polynomial can be zero on at most an l/q fraction of its inputs. (This is an easy inductive argument based on the number of variables. The base case is well known, and inductively one picks a random assignment to the variables x_1, ..., x_{m−1} and argues that the resulting polynomial in x_m is non-zero with high probability. Finally, one uses the base case again to conclude that the final polynomial in x_m is left non-zero by a random assignment to x_m.)

³ This identity is obtained as follows: Recall that Fermat's little theorem asserts that γ^{q−1} − 1 = 0 for every non-zero γ in GF(q). Factoring the left hand side, we find that either γ − 1 = 0 or Σ_{i=0}^{q−2} γ^i = 0. Since γ ≠ 1, the latter must be the case.

It is easy to see that the code C_RS,q,k,q is the same as the code C_POLY,1,k−1,q. Also notice that the code C_POLY,m,1,2 forms a [2^m, m, 2^{m−1}]_2 code, with the same parameters as the Hadamard code given by the rows of M_m^HDM. It turns out that these two codes are in fact identical. The proof is left as an exercise to the reader.

3.5  Concatenated codes
Each code in the collection of codes we have accumulated above has some flaw or the other. The Hamming codes don't correct too many errors, the Hadamard codes are too low-rate, and the Reed Solomon codes depend on a very large alphabet. Yet it turns out it is possible to put some of these codes together and obtain a code with reasonably good behavior ("polynomially good"). This is made possible by a simple idea called "concatenation", defined next.

Definition 11 (Concatenation of codes). Let C1 be an [n1, k1, d1]_{q1} code over the alphabet Σ1 and let C2 be an [n2, k2, d2]_{q2} code over the alphabet Σ2. If q1 = q2^{k2} then the code C1 ∘ C2 is defined as follows: Associate every letter in Σ1 with a codeword of C2. Encode every message first using the code C1 and then encode every letter in the encoded string using the code C2. More formally, given a message m ∈ Σ1^{k1} = Σ2^{k1·k2}, let C1(m) = c_1 ... c_{n1} ∈ Σ1^{n1}. The encoding C1 ∘ C2(m) is given by c_{11} ... c_{1n2} c_{21} ... c_{n1n2} ∈ Σ2^{n1·n2}, where for every i ∈ {1, ..., n1}, c_{i1} ... c_{in2} = C2(c_i).

Almost immediately we get the following property of concatenation.

Lemma 12. If C1 is an [n1, k1, d1]_{q1} code and C2 is an [n2, k2, d2]_{q2} code with q1 = q2^{k2}, then C1 ∘ C2 is an [n1n2, k1k2, d′]_{q2} code, for some d′ ≥ d1d2.

Proof. The block size and message size bounds follow from the definition. To see the distance property, consider two distinct messages m^1, m^2 ∈ Σ1^{k1}. For b ∈ {1, 2}, let c_1^b ... c_{n1}^b be the encoding of m^b using C1 and let c_{11}^b ... c_{n1n2}^b be its encoding using C1 ∘ C2. Notice that there must exist at least d1 values of i such that c_i^1 ≠ c_i^2 (by the distance of C1). For every such i, there must exist at least d2 values of j such that c_{ij}^1 ≠ c_{ij}^2 (by the distance of C2). Thus we find that C1 ∘ C2(m^1) and C1 ∘ C2(m^2) differ in at least d1d2 places.
To best see the power of concatenation, consider the following simple application: Let C1 be a Reed Solomon code with q = 2^m, n = q and k = .4n. I.e., C1 is an [n, .4n, .6n]_{2^m} code with n = 2^m. Let C2 be the Hadamard code [2^m, m, 2^{m−1}]_2. The concatenation C1 ∘ C2 is an [n², .4n log n, .3n²]_2 code. I.e., the resulting code has constant distance-rate, polynomial rate and is over the binary
alphabet! Thus this satisfies our weaker goal of obtaining a weakly good code. Even the goal of obtaining an asymptotically good code is close now. In particular, the code of Justesen is obtained by an idea similar to that of concatenation. Unfortunately we shall not be able to cover this material in this article.
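Concatenation itself is mechanical to implement. The sketch below is our own toy instance (not the Reed Solomon/Hadamard pair above): a [4, 2]_5 Reed Solomon code composed with the [3, 1, 3]_5 repetition code, so that q1 = q2^{k2} = 5. It verifies the distance bound of Lemma 12 by brute force.

```python
import itertools

q = 5
xs = [1, 2, 3, 4]                              # evaluation points in GF(5)

def rs_encode(m):                              # C1: a [4, 2, 3]_5 RS code
    return tuple(sum(m[i] * pow(x, i, q) for i in range(2)) % q for x in xs)

def rep_encode(sym):                           # C2: the [3, 1, 3]_5 repetition code
    return (sym, sym, sym)

def concat_encode(m):                          # C1 o C2
    return tuple(s for c in rs_encode(m) for s in rep_encode(c))

codewords = [concat_encode(m) for m in itertools.product(range(q), repeat=2)]
dist = min(sum(a != b for a, b in zip(c1, c2))
           for i, c1 in enumerate(codewords) for c2 in codewords[i + 1:])
print(dist)  # 9, i.e. at least d1 * d2 as guaranteed by Lemma 12
```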
4  Algorithmic tasks
We now move on to the algorithmic tasks of interest. The obvious first candidate is encoding.
Problem 13 (Encoding). INPUT: n × k matrix G and message m ∈ Σ^k. OUTPUT: C(m), where C = C_G is the code with G as the generator matrix.

It is clear that the problem as specified above is easily solved in time O(nk) and hence in time polynomial in n. For specific linear codes such as the Reed Solomon codes it is possible to encode faster, in time O(n log^c n) for some constant c. However, until recently no asymptotically good code was known to be encodable in linear time. In a recent breakthrough, Spielman [15] presented the first known code that is encodable in linear time. We will discuss this more in a little bit. The next obvious candidate problem is the decoding problem. Once again it is clear that if the received word has no errors, then this problem is only as hard as solving a linear system and thus can be easily solved in polynomial time. So our attention moves to the case where the received word has errors. We first define the error detection problem.
Problem 14 (Error detection). INPUT: n × k generator matrix G for a code C = C_G; and a received word R ∈ Σ^n. OUTPUT: Is R a codeword?

The error detection problem is also easy to solve in polynomial time: we find the parity check matrix H for the code C and then check whether HR = 0. We now move to the problem of decoding in the presence of errors. This problem comes in several variants. We start with the simplest definition first:
Problem 15 (Maximum likelihood decoding). INPUT: n × k generator matrix G for a code C = C_G; and a received word R ∈ Σ^n. OUTPUT: Find a codeword x ∈ C that is nearest to R in Hamming distance. (Ties may be broken arbitrarily.)

There are two obvious strategies for solving the maximum likelihood decoding problem:

Brute Force 1: Enumerate all the codewords and find the one that is closest to R.

Brute Force 2: For t = 0, 1, ..., do: enumerate all possible words within a Hamming distance of t from R and check if any such word is a codeword. Output the first match.

Despite the naivete of the search strategies above, there are some simple cases where these strategies work in polynomial time. For instance, the first strategy above does work in polynomial time for Hadamard codes. The second strategy above works in polynomial time for Hamming codes (why?). However, both strategies start taking exponential time once the number of codewords becomes large while the distance also remains large. In particular, for "asymptotically good" or even "weakly good" codes, both strategies above run in exponential time. One may wonder if this exponential time behavior is inherent to the decoding problem. In perhaps the first "complexity" result in coding theory, Berlekamp, McEliece and van Tilborg [4] present the answer to this question.
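The two brute-force strategies can be sketched as follows (a toy illustration of ours using the length-5 binary repetition code; as discussed above, both decoders are exponential in general).

```python
import itertools

def brute_force_1(codewords, R):
    """Strategy 1: scan every codeword for the one nearest to R."""
    return min(codewords, key=lambda c: sum(a != b for a, b in zip(c, R)))

def brute_force_2(is_codeword, R, alphabet):
    """Strategy 2: grow a Hamming ball around R until it contains a codeword."""
    n = len(R)
    for t in range(n + 1):
        for positions in itertools.combinations(range(n), t):
            for values in itertools.product(alphabet, repeat=t):
                w = list(R)
                for pos, v in zip(positions, values):
                    w[pos] = v
                if is_codeword(tuple(w)):
                    return tuple(w)

# Toy instance: the binary repetition code of length 5 (distance 5).
code = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]
R = (1, 1, 0, 1, 0)
print(brute_force_1(code, R))                          # (1, 1, 1, 1, 1)
print(brute_force_2(lambda w: w in code, R, (0, 1)))   # (1, 1, 1, 1, 1)
```

Strategy 1 costs time proportional to the number of codewords; Strategy 2 costs roughly n^t for the smallest t that succeeds, which is why each is feasible only in the special cases mentioned above.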
Theorem 16 [4]. The maximum likelihood decoding problem for general linear codes is NP-hard.
Problem17 (Bounded distance decoding). INPUT: n โข k generator matrix G for a code C = Ca; a received word R E Z n and a positive integer t. OUTPUT: Find any/all codewords in C within a Hamming distance of t from R. The hardness result of 4 actually applies to the Bounded distance decoding problem as well. However one could hope for a result of the form: "There exists an e > 0, such that for every n, k, dq linear code C, the bounded distance decoding problem for C with t = ed is solvable in polynomial time". One bottleneck to such a general result is that we don't know how to compute d for a generic linear code. This motivates the following problem:
Problem 18 (Minimum distance). INPUT: n × k generator matrix G for a code C = C_G and an integer parameter d. OUTPUT: Is the distance of C at least d?

This problem was conjectured to be coNP-hard in [4]. The problem remained open for nearly two decades. Recently, in a major breakthrough, this problem was shown to be coNP-complete by Vardy [18]. While this does not directly rule out the possibility that a good bounded distance decoding algorithm may exist, the result should be viewed as one more reason that general positive results may be unlikely. Thus we move from general results, i.e., where the code is specified as part of the input, to specific results, i.e., for well-known families of codes. The first question that may be asked is: "Is there a family of asymptotically good [n, k, d]_q
linear codes and an ε > 0 for which a polynomial time bounded distance decoding algorithm exists for t ≥ εd?" For this question the answer is "yes". A large number of algebraic codes do have such polynomial time bounded distance decoding algorithms. In particular, the Reed Solomon codes are known to have such a decoding algorithm for t ≤ ⌊(d − 1)/2⌋ (cf. [2, 11, 17]). This classical result is very surprising given the non-trivial nature of this task. This result is also very crucial for many of the known asymptotically good codes, since many of these codes are constructed by concatenating Reed Solomon codes with some other codes. In the next section we shall cover the decoding of Reed Solomon codes in more detail. Lastly, there is another class of codes, constructed by combinatorial means, for which bounded distance decoding for some t ≥ εd can be performed in polynomial time. These are the expander codes, due to Sipser and Spielman [14] and Spielman [15]. The results culminate in a code with very strong -- linear time (!!!) -- encoding and bounded distance decoding algorithms. In addition to being provably fast, the algorithms for the encoding and decoding of these codes are surprisingly simple and clean. However, the description of the codes and the analysis of the algorithms are somewhat out of the scope of this paper. We refer the reader to the original articles [14, 15] for details.

5  Decoding of Reed Solomon codes
As mentioned earlier, a polynomial time algorithm for bounded distance decoding is known, and this algorithm corrects up to t ≤ ⌊(d − 1)/2⌋ errors. Notice that this coincides exactly with the error-correction bound of the code (i.e., a Reed Solomon code of distance d is a t-error-correcting code for t = ⌊(d − 1)/2⌋). This bound on the correction capability is inherent if one wishes to determine the codeword uniquely. However, in the bounded distance decoding problem we do allow for multiple solutions. Given this latitude, it is reasonable to hope for a polynomial-time decoding algorithm that corrects more errors -- say up to t ≤ (1 − ε)d, where ε is some fixed constant. However, no such algorithm is known for all possible values of (n, k, d = n − k). Recently, in [16], we presented an algorithm which does correct up to (1 − ε)d errors, provided k/n → 0. This algorithm was inspired by an algorithm of Welch and Berlekamp [20, 3] for decoding Reed Solomon codes. That algorithm is especially clean and elegant. Our solution uses similar ideas to correct even more errors, and we present this next. Notice first that the decoding problem for Reed Solomon codes can be solved by solving the following cleanly stated problem:
Problem 19 (Reed Solomon decoding). INPUT: n pairs of points {(x_i, y_i)}, x_i, y_i ∈ GF(q); and integers t, k. OUTPUT: All polynomials p of degree at most k − 1 such that y_i ≠ p(x_i) for at most t values of i.

The basic idea in both the Welch-Berlekamp algorithm and ours is to find an algebraic description of all the given points, and to then use the algebraic description to extract p. The algebraic description we settle for is an "algebraic curve in the plane", i.e., a polynomial Q(x, y) in two variables x and y such that Q(x_i, y_i) = 0 for every given pair (x_i, y_i). Given this basic strategy, the performance of the algorithm depends on the choice of the degree of Q which allows such a curve to exist and still be useful! (For example, if we allow Q to be 0, or if we pick the degree of Q to be n in x and 0 in y, then such polynomials do exist, but are of no use. On the other hand, a non-zero polynomial Q of degree n/10 in x and 0 in y may be useful, but will probably not exist for the given data points.) To determine what kind of polynomial Q we should search for, we pick two parameters l and m and impose the following conditions on Q(x, y) = Σ_{i,j} q_ij x^i y^j:

1. Q should not be the zero polynomial. (I.e., some q_ij should be non-zero.)
2. q_ij is non-zero implies j ≤ m and i + (k − 1)j ≤ l. (The reason for this restriction will become clear shortly.)
3. Q(x_i, y_i) = 0 for every given pair (x_i, y_i).

Now consider the task of searching for such a Q. This amounts to finding values for the unknown coefficients q_ij. The conditions in (3) above amount to homogeneous linear equations in the q_ij. By elementary linear algebra, a non-zero solution to such a system exists and can be found in polynomial time provided the number of unknowns (i.e., the number of (i, j) pairs such that 0 ≤ i, j, j ≤ m and i + (k − 1)j ≤ l) strictly exceeds the number of equations (n). It is easy to count the number of such coefficients. The existence of such coefficients will determine our choice of m and l. Having determined such a polynomial, we will apply the following useful lemma to show that p can be extracted from Q.

Lemma 20 [1]. Let Q(x, y) = Σ_{i,j} q_ij x^i y^j be such that q_ij = 0 for every i, j with i + (k − 1)j > l. If p(x) is a polynomial of degree at most k − 1 such that for strictly more than l values of i, y_i = p(x_i) and Q(x_i, y_i) = 0, then y − p(x) divides the polynomial Q(x, y).

Proof. Consider first the polynomial g(x) obtained from Q by substituting y = p(x). Notice that the term q_ij x^i y^j becomes a polynomial in x of degree i + (k − 1)j, which by property (2) above is a polynomial of degree at most l in x. Thus g(x) = Q(x, p(x)) is a polynomial in x of degree at most l. Now, for every i such that y_i = p(x_i) and Q(x_i, y_i) = 0, we have that g(x_i) = Q(x_i, p(x_i)) = 0. But there are more than l such values of i. Thus g is identically zero. This immediately implies that Q(x, y) is divisible by y − p(x). (The division theorem for polynomials says that if a polynomial h(y) evaluates to 0 at y = ζ then y − ζ divides h(y). Applying this fact to the polynomial Q_x(y) = Q(x, y) and y = p(x), we obtain the desired result. Notice that in doing so we are switching our perspective: we are thinking of Q as a polynomial in y with coefficients from the ring of polynomials in x.)
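For the m = 1 regime discussed next, where Q(x, y) = A(x)y + B(x), the whole pipeline fits in a short sketch: interpolate by solving the homogeneous linear system of condition (3), then extract p(x) = −B(x)/A(x) by polynomial division, which Lemma 20 guarantees is exact. All parameters below (GF(7), n = 6, k = 2, l = 3, so that there are 7 unknowns against 6 equations and up to t = 2 errors satisfy n − t > l) are hypothetical choices of ours for illustration.

```python
p_mod = 7                       # a prime, so GF(7); inverses via Fermat
xs = [1, 2, 3, 4, 5, 6]         # n = 6 evaluation points
k, l = 2, 3                     # message length k; degree bound l
degA, degB = l - (k - 1), l     # Q(x, y) = A(x) y + B(x) with i + (k-1)j <= l

def solve_nullspace(rows, q):
    """One nonzero solution of a homogeneous linear system over GF(q), q prime."""
    m = [r[:] for r in rows]
    ncols = len(m[0])
    pivots, r = [], 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(m)) if m[i][c]), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        inv = pow(m[r][c], q - 2, q)          # inverse by Fermat's little theorem
        m[r] = [v * inv % q for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c]:
                f = m[i][c]
                m[i] = [(a - f * b) % q for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    free = next(c for c in range(ncols) if c not in pivots)
    sol = [0] * ncols
    sol[free] = 1                             # set one free variable to 1
    for row, c in zip(m, pivots):
        sol[c] = -row[free] % q
    return sol

def decode(ys):
    # Interpolation step: A(x_i) y_i + B(x_i) = 0 for every point (x_i, y_i).
    rows = [[pow(x, j, p_mod) * y % p_mod for j in range(degA + 1)] +
            [pow(x, j, p_mod) for j in range(degB + 1)]
            for x, y in zip(xs, ys)]
    sol = solve_nullspace(rows, p_mod)
    A, B = sol[:degA + 1], sol[degA + 1:]
    # Extraction step: by Lemma 20, y - p(x) divides Q, so B = -A*p and the
    # polynomial division below is exact.
    rem = [-b % p_mod for b in B]
    da = max(i for i, a in enumerate(A) if a)
    inv = pow(A[da], p_mod - 2, p_mod)
    quot = [0] * (len(rem) - da)
    for i in reversed(range(len(quot))):
        c = rem[i + da] * inv % p_mod
        quot[i] = c
        for j in range(da + 1):
            rem[i + j] = (rem[i + j] - c * A[j]) % p_mod
    return quot[:k]

# Encode p(x) = 3 + 2x and corrupt t = 2 of the n = 6 positions.
ys = [(3 + 2 * x) % p_mod for x in xs]
ys[1] = (ys[1] + 1) % p_mod
ys[4] = (ys[4] + 3) % p_mod
print(decode(ys))  # [3, 2]
```

Any nonzero solution of the system works: if A were zero then B, of degree at most 3, would vanish at all 6 points and hence be zero too, so A ≠ 0 and the division recovers p regardless of which nullspace vector the solver returns.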
number of coefficients is more than n. In this case the polynomial Q(x, y) found by the algorithm is of the form A(x)y + B(x). Lemma 20 above guarantees that if t ≤ (n − k)/2 then y − p(x) divides Q. Thus p(x) = −B(x)/A(x) and can be computed easily by a simple polynomial division. Thus in this case we can decode from ⌊(n − k)/2⌋ errors, thus recovering the results of [20]. In fact, in this case the algorithm essentially mimics the algorithm of [20], though the correspondence may not be immediately obvious. At a different extreme one may pick m and l on the order of √(n/k) and √(nk) respectively, and in this case Lemma 20 works for t ≈ n − 2√(nk). In this case, to recover p(x) from Q, one first factors the bivariate polynomial Q. This gives a list of all polynomials p_j(x) such that y − p_j(x) divides Q. From this list we pull out all the polynomials p_j such that p_j(x_i) ≠ y_i for at most t values of i. Thus in this case also we have a polynomial time algorithm, provided Q can be factored in polynomial time. Fortunately, such algorithms are known, due to Kaltofen [8] and Grigoriev [7] (see Kaltofen [9] for a survey of polynomial factorization algorithms). For k/n → 0, the number of errors corrected by this algorithm approaches (1 − o(1))n. A more detailed analysis of this algorithm and the number of errors corrected by it appears in [16]. The analysis shows that, given an [n, κn, (1 − κ)n]_q Reed Solomon code, the fraction of errors corrected by this algorithm approaches 1 − √(2κ), up to lower-order terms (the exact expression is derived in [16]).
A plot of this curve against κ appears in Figure 1. Also shown in the figure are the distance of the code ((1 − κ)n) and the classical error-correction bound ((1 − κ)n/2).

6  Open questions
Given that the fundamental maximum likelihood decoding problem is NP-hard for a general linear code, the next direction to look to is a bounded distance decoding algorithm for every [n, k, d]_q linear code. The bottleneck to such an approach is that in general we can't compute d in polynomial time, due to the recent result of Vardy [18]. Thus the next step in this direction seems to suggest an application of approximation algorithms:

Open Problem 1. Given an n × k matrix G, approximate the distance d of the code C_G to within a factor of α(n).

The goal here is to find the smallest factor α(n) for which a polynomial time approximation algorithm exists. Currently no non-trivial (i.e., with α(n) = o(n)) approximation algorithm is known. A non-trivial α(n) approximation algorithm would then suggest the following candidate for bounded distance decoding:

Open Problem 2. Given an n × k matrix G, a word R ∈ Σ^n and an integer t, find all codewords within a Hamming distance of t from R, or show that the minimum distance of the code is less than t·α₁(n).
[Figure 1: plot of the fraction of errors corrected (e/n) against the rate (k/n), showing three curves: the new correction bound of [16], the diameter bound (1 − κ), and the classical correction bound (1 − κ)/2.]

Fig. 1. Fraction of errors corrected by the algorithm from [16] plotted against the rate of the code. Also plotted are the distance of the code and the classical error-correction bound.
A similar problem is posed by Vardy [18] for α₁ = 2. Here the hope would be to find the smallest value of α₁ for which a polynomial time algorithm exists. While there is no immediate formal reason to believe so, it seems reasonable to believe that α₁ will be larger than α. Next we move to questions in the area of the design of efficient codes, motivated by the work of Spielman [15].

Open Problem 3. For every κ > 0, design a family of [n, κn, δn]_2 codes C_n so that the bounded distance decoding problem on C_n with parameter t ≤ γn can be solved in linear time.

The goal above is to make γ as large as possible for every fixed κ. Spielman's result allows for the construction of codes which match the best known values of δ for any [n, κn, δn]_2 linear code. However, the value of γ is still far from δ in these results. We now move towards questions directed towards decoding Reed Solomon codes. We direct the reader's attention to Figure 1. Clearly every point above
the solid curve and below the distance bound of the code represents an open problem. In particular, we feel that the following version may be solvable in polynomial time:

Open Problem 4. Find a bounded distance decoding algorithm for an [n, κn, (1 − κ)n]_q Reed Solomon code that decodes up to t < (1 − √κ)n errors.

The motivation for this particular version is that in order to solve the bounded distance decoding problem, one needs to ensure that the number of outputs (i.e., the number of codewords within the given bound t) is polynomial in n. Such a bound does exist for the value of t as given above [6, 12], thus raising the hope that this problem may be solvable in polynomial time also. Similar questions may also be raised about decoding multivariate polynomials. In particular, we don't have polynomial time algorithms matching the bounded distance decoding algorithm from [16], even for the case of bivariate polynomials. This, we feel, may be the most tractable problem here.

Open Problem 5. Find a bounded distance decoding algorithm for the bivariate polynomial code C_POLY,2,κn,n that decodes up to t < (1 − √κ)n² errors.
References

1. S. Ar, R. Lipton, R. Rubinfeld and M. Sudan. Reconstructing algebraic functions from mixed data. SIAM Journal on Computing, to appear. Preliminary version in Proceedings of the 33rd Annual IEEE Symposium on Foundations of Computer Science, pp. 503-512, 1992.
2. E. R. Berlekamp. Algebraic Coding Theory. McGraw Hill, New York, 1968.
3. E. R. Berlekamp. Bounded distance +1 soft-decision Reed-Solomon decoding. IEEE Transactions on Information Theory, 42(3):704-720, May 1996.
4. E. R. Berlekamp, R. J. McEliece and H. C. A. van Tilborg. On the inherent intractability of certain coding problems. IEEE Transactions on Information Theory, 24:384-386, 1978.
5. R. DeMillo and R. Lipton. A probabilistic remark on algebraic program testing. Information Processing Letters, 7(4):193-195, June 1978.
6. O. Goldreich, R. Rubinfeld and M. Sudan. Learning polynomials with queries: The highly noisy case. Proceedings of the 36th Annual IEEE Symposium on Foundations of Computer Science, pp. 294-303, 1995.
7. D. Grigoriev. Factorization of polynomials over a finite field and the solution of systems of algebraic equations. Translated from Zapiski Nauchnykh Seminarov Leningradskogo Otdeleniya Matematicheskogo Instituta im. V. A. Steklova AN SSSR, Vol. 137, pp. 20-79, 1984.
8. E. Kaltofen. A polynomial-time reduction from bivariate to univariate integral polynomial factorization. In 23rd Annual Symposium on Foundations of Computer Science, pages 57-64, 1982.
9. E. Kaltofen. Polynomial factorization 1987-1991. LATIN '92, I. Simon (Ed.), Springer LNCS, v. 583:294-313, 1992.
10. R. Lidl and H. Niederreiter. Introduction to Finite Fields and their Applications. Cambridge University Press, 1986.
11. F. J. MacWilliams and N. J. A. Sloane. The Theory of Error-Correcting Codes. North-Holland, Amsterdam, 1981.
12. J. Radhakrishnan. Personal communication, January 1996.
13. J. T. Schwartz. Fast probabilistic algorithms for verification of polynomial identities. Journal of the ACM, 27(4):701-717, 1980.
14. M. Sipser and D. A. Spielman. Expander codes. IEEE Transactions on Information Theory, 42(6):1710-1722, 1996.
15. D. A. Spielman. Linear-time encodable and decodable error-correcting codes. IEEE Transactions on Information Theory, 42(6):1723-1731, 1996.
16. M. Sudan. Decoding of Reed-Solomon codes beyond the error-correction bound. Journal of Complexity, 13(1):180-193, March 1997. See also http://theory.lcs.mit.edu/~madhu/papers.html for a more recent version.
17. J. H. van Lint. Introduction to Coding Theory. Springer-Verlag, New York, 1982.
18. A. Vardy. Algorithmic complexity in coding theory and the minimum distance problem. Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, pp. 92-109, 1997.
19. B. L. van der Waerden. Algebra, Volume 1. Frederick Ungar Publishing Co., Inc., page 82.
20. L. Welch and E. R. Berlekamp. Error correction of algebraic block codes. US Patent Number 4,633,470, issued December 1986.
21. R. E. Zippel. Probabilistic algorithms for sparse polynomials. EUROSAM '79, Lecture Notes in Computer Science, 72:216-226, 1979.
Sharper Results on the Expressive Power of Generalized Quantifiers

Anil Seth

The Institute of Mathematical Sciences, C.I.T. Campus, Taramani, Madras 600113, India
e-mail: seth@imsc.ernet.in
Abstract. In this paper we improve on some results of [3] and extend them to the setting of implicit definability. We show a strong necessary condition on classes of structures on which PSPACE can be captured by extending PFP with a finite set of generalized quantifiers. For IFP and PTIME the limitation of the expressive power of generalized quantifiers is shown only on some specific nontrivial classes. These results easily extend to the implicit closure of these logics. In fact, we obtain a nearly complete characterization of classes of structures on which IMP(PFP) can capture PSPACE if finitely many generalized quantifiers are also allowed. We give a new proof of one of the main results of [3], characterizing the classes of structures on which L^ω_∞ω(Q) collapses to FO(Q), where Q is a finite set of generalized quantifiers. This proof generalizes easily to the case of implicit definability, unlike the quantifier elimination argument of [3], which is not easily adapted to the implicit definability setting. This result is then used to show the limitation of the expressive power of the implicit closure of L^ω_∞ω(Q). Finally, we adapt the technique of quantifier elimination due to Scott Weinstein, used in [3], to show that IMP(L^k(Q))-types can be isolated in the same logic.
1 Introduction
Since the expressive power of first order logic is quite limited on finite structures, some natural fixed point extensions of it, such as least fixed point (LFP) and partial fixed point (PFP) logics, have been studied in finite model theory. LFP and PFP capture PTIME and PSPACE respectively on classes of ordered structures. However, on unordered structures even a powerful extension of these logics, L^ω_∞ω, fails to include all PTIME queries. In fact, it is an open question whether there is a logic which captures PTIME on all structures. One way of extending the expressive power of a logic is by adding generalized quantifiers to it. This is a uniform way of enriching a logic by an arbitrary property without going to second order logic. In [3], it was shown that no finite set of generalized quantifiers can be added to IFP to capture exactly PTIME on all structures
and similarly for PFP and PSPACE. However, this result was proved only for those classes of structures which realize, for each k, a uniformly bounded number of k-automorphism classes in each structure. These classes are called "trivial classes" in [3]. An example of such a class is the class of complete structures in a given vocabulary. Nevertheless, most of the interesting classes of structures do not satisfy this condition, and it remains open whether on such classes extensions of fixed point logics by finitely many generalized quantifiers can capture the corresponding complexity class. For example, consider the class of complete binary trees studied in [7]. It does not follow from [3] that for any finite set of generalized quantifiers Q, PFP(Q) ≠ PSPACE on the class of complete binary trees. In this paper, we prove a more general result which shows that any extension of PFP by finitely many generalized quantifiers cannot capture PSPACE on any recursively enumerable class of structures which, roughly speaking, cannot realize polynomially many automorphism types. As an example application of this result, it follows that on the class of complete binary trees mentioned above, for any finite set of generalized quantifiers Q, PFP(Q) ≠ PSPACE. While we cannot prove a general theorem similar to the above result for IFP extended with generalized quantifiers and PTIME, for some special classes such as complete binary trees we show a similar limitation for any finite set of generalized quantifiers. Another main result of [3] is a characterization of the collapse of L^ω_∞ω(Q) to FO(Q) on a class of structures, in terms of boundedness of L^k(Q)-types in the structures of this class. This is proved using a novel technique of quantifier elimination which is due to Scott Weinstein.
We provide another proof of this result without using the quantifier elimination argument; instead, we obtain it by generalizing the quotient structure construction of [2, 1] to the presence of generalized quantifiers. Next we turn to implicit definability. The implicit closure of various logics on subclasses of structures has been studied in recent years by defining a notion of partial queries (see [5]). Partial queries implicitly definable in various logics far exceed the expressive power of fixed point logics. For instance, IMP(PFP) captures PSPACE on the class of rigid structures and IMP(L^ω_∞ω) can express every query on rigid structures. This raises the question whether IMP(LFP) or IMP(PFP), possibly in the presence of finitely many generalized quantifiers, can capture the corresponding complexity classes. We answer this question in the negative. The proof of our previous theorem easily extends to show that even IMP(PFP(Q)), where Q is a finite set of generalized quantifiers, cannot capture PSPACE on any class of structures which does not realize polynomially many automorphism types. In the case of IMP(PFP(Q)) a converse of this result also holds, if we consider queries only up to some given arity. Next we define the notion of k-types for IMP(L^k(Q)) and prove a result analogous to the one in [3], characterizing the collapse of IMP_∞(L^ω_∞ω(Q)) to IMP(FO(Q)) over a class of structures in terms of boundedness of IMP(L^k(Q))-types, for all k, over this class. Here, IMP_∞(L^ω_∞ω(Q)) is a stronger closure of L^ω_∞ω(Q) under implicit definability, which allows countably many query variables, than
IMP(L^ω_∞ω(Q))
in which only finitely many query variables are allowed. As a corollary to this result, we get that for any finite set Q of PTIME computable generalized quantifiers, IMP_∞(L^ω_∞ω(Q)) cannot express all PTIME queries on the class of complete structures. The above characterization theorem itself is proved by extending our proof of the theorem characterizing the collapse of L^ω_∞ω(Q) to FO(Q). Its proof makes use of our quotient structure construction in the presence of generalized quantifiers. This justifies our presenting a new proof of the already known theorem of [3], characterizing the collapse of L^ω_∞ω(Q) to
FO(Q).
We do not know how to extend the quantifier elimination argument of [3] to prove the above characterization theorem. The two techniques, the quantifier elimination argument of [3] and the quotiented structure construction of this paper, appear to have different limitations and therefore seem incomparable. We cannot prove the isolation of IMP(L^k(Q))-types using the quotiented structure construction. In the end, we provide a non-obvious adaptation of the quantifier elimination argument to isolate IMP(L^k(Q))-types in the same logic. This extension is not obvious because, unlike in the case of an L^k(Q) formula, the subformulae of a sentence defining a query implicitly may not define any query, and hence an inductive argument does not work. This isolation theorem, however, is not sufficient to prove our characterization theorem, because we cannot show an upper bound on the rank of the IMP(L^k(Q)) formulae isolating IMP(L^k(Q))-types within a structure in terms of the number of types realized in the structure.
2 Preliminaries
A vocabulary σ is a finite sequence ⟨R₁, ..., R_m⟩ of relation symbols of fixed arities. A σ-structure A = ⟨A, R₁^A, ..., R_m^A⟩ consists of a set A, called the universe of A, and relations R_i^A ⊆ A^{r_i}, where r_i is the arity of the relation symbol R_i, 1 ≤ i ≤ m. We shall assume our structures to be finite and classes of structures to be closed under isomorphism. A Boolean query Q over a class C of structures is a mapping from structures in C to {0, 1}; furthermore, if A is isomorphic to B then Q(A) = Q(B). For any positive integer k, a k-ary query over C is a mapping which associates to every structure A in C a k-ary relation on A. Again, if f is an isomorphism from A to B then f should also be an isomorphism from ⟨A, Q(A)⟩ to ⟨B, Q(B)⟩.
2.1 Logics with Fixed Point Operators
Let φ(z̄, x₁, ..., xₙ, S) be a first order formula in the vocabulary σ ∪ {S}, where S is an n-ary relation symbol not in σ. Let A be a structure; for any assignment c̄ of elements in A to the variables z̄, the formula φ gives rise to an operator Φ(S) from n-ary relations on the universe A of A to n-ary relations on A as follows: Φ(S) = {(a₁, ..., aₙ) : A ⊨ φ(c̄, a₁, ..., aₙ, S)} for every n-ary relation S on A. The variables z̄ are parameter variables. Every such operator can be iterated and gives rise to the sequence of stages Φ^m, m ≥ 1, where Φ¹ = Φ(∅) and Φ^{l+1} = Φ(Φ^l). Each Φ^m is an n-ary relation on A.
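The stage sequence Φ¹ = Φ(∅), Φ^{l+1} = Φ(Φ^l) can be computed directly on a small finite structure. The sketch below iterates an operator until two consecutive stages coincide, and returns the empty relation when no fixed point is reached, matching the partial fixed point convention defined shortly. The function names are illustrative, not from the paper.

```python
from itertools import product

def pfp(universe, step, arity, max_iter=None):
    """Iterate the stages S_1 = F(empty), S_{l+1} = F(S_l) of the operator F
    induced by a formula; return the fixed point if the sequence converges,
    and the empty relation otherwise (partial fixed point semantics)."""
    if max_iter is None:
        # there are only 2^(|A|^n) n-ary relations, so convergence (if any)
        # happens within that many steps
        max_iter = 2 ** (len(universe) ** arity)
    stage = frozenset()
    for _ in range(max_iter):
        nxt = frozenset(t for t in product(universe, repeat=arity)
                        if step(t, stage))
        if nxt == stage:          # S_{m0} = S_{m0+1}: fixed point reached
            return stage
        stage = nxt
    return frozenset()            # no fixed point: result is empty

# Example: transitive closure of an edge relation, via the positive formula
# phi(x, y, S) = E(x, y) or exists z (E(x, z) and S(z, y)).
A = [1, 2, 3, 4]
E = {(1, 2), (2, 3), (3, 4)}
tc = pfp(A, lambda t, S: t in E or any((t[0], z) in E and (z, t[1]) in S
                                       for z in A), 2)
```

Because the example formula is positive in S, the stages grow monotonically and the iteration converges; a non-monotone operator may oscillate forever, in which case `pfp` returns the empty relation.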
If the above formula φ is positive in S, that is, each occurrence of S in φ is within an even number of negations, then the above sequence Φ^m, m ≥ 1, converges; that is, for each structure A there is an m₀ such that Φ^{m₀} = Φ^m for all m ≥ m₀. We define Φ^∞ = Φ^{m₀}. Least fixed point logic (LFP) arises by closing first order logic (FO) under the following new rule, called the least fixed point rule, to form new formulae: if φ(z̄, x₁, ..., xₙ, S) is a formula in which S occurs positively, then lfp(S, x₁, ..., xₙ)φ(y₁, ..., yₙ) is also a formula, with z̄ and y₁, ..., yₙ as its free variables. The meaning of this formula on a given structure A, for an assignment c̄ of elements in A to the variables z̄, is as follows: lfp(S, x₁, ..., xₙ)φ(y₁, ..., yₙ) is true iff (y₁, ..., yₙ) ∈ Φ^∞, where Φ^∞ is as defined above. Partial fixed point (PFP) logic is defined in the same way as LFP, except that the construct pfp(S, x₁, ..., xₙ)φ(y₁, ..., yₙ) is available for every formula φ(x₁, ..., xₙ, S), not just for φ in which S occurs positively. The meaning of the n-ary relation pfp(S, x₁, ..., xₙ)φ with respect to the variables y₁, ..., yₙ is defined to be Φ^{m₀} if there is an m₀ such that Φ^{m₀} = Φ^{m₀+1}, and is defined to be ∅ if there is no such m₀. While the definition of the fixed point construct allows for parameter variables, it is easy to eliminate them by suitably enlarging the arity of the relation symbol in the fixed point construct; see [4, Lemma 7.1.10(b)]. So every LFP (PFP) formula is logically equivalent to an LFP (PFP) formula in which all instances of fixed point construction involve no parameter variables, that is, in which the variables z̄ of the above definition are absent. In the following we will take advantage of this observation and will use, without loss of generality, the simplified definition of
LFP (PFP).
2.2 Generalized Quantifiers
Generalized quantifiers were first studied by Lindström to increase the expressive power of first order logic without using second order quantifiers. In recent years generalized quantifiers have been used in finite model theory to extend the expressive power of various fixed point logics, and to show limitations on the expressive power that can be obtained by means of such extensions. In this section we provide a standard introduction to generalized quantifiers along with basic definitions, as in [3]. Let C be a class of structures over a vocabulary σ = ⟨R₁, ..., R_m⟩ (where R_i has arity n_i) that is closed under isomorphism. We associate with C the generalized quantifier Q_C. For a logic L, define the extension L(Q_C) by closing the set of formulas of L under the following formula formation rule: if ψ₁, ..., ψ_m are formulas in L(Q_C) and x̄₁, ..., x̄_m are tuples of variables of arities n₁, ..., n_m respectively, then Q_C x̄₁, ..., x̄_m (ψ₁, ..., ψ_m) is a formula of L(Q_C), with all occurrences in ψ_i of the variables among x̄_i bound. The semantics of the quantifier Q_C is given by: A, s ⊨ Q_C x̄₁, ..., x̄_m (ψ₁(x̄₁, ȳ), ..., ψ_m(x̄_m, ȳ)) iff (A, ψ₁^A(s), ..., ψ_m^A(s)) ∈ C, where A is the domain of A and ψ_i^A(s) = {t̄ ∈ A^{n_i} | A ⊨ ψ_i[t̄, s]}.
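The semantics above can be made concrete on a small structure: evaluate each ψ_i to a relation and test whether the resulting structure belongs to C. A minimal sketch, with the class C supplied as a membership test and all names hypothetical:

```python
from itertools import product

def eval_quantifier(universe, in_class, formulas, arities):
    """A |= Q_C x1...xm (psi_1, ..., psi_m) iff the structure
    (A, psi_1^A, ..., psi_m^A), obtained by evaluating each psi_i
    over the universe, belongs to the class C (the test `in_class`)."""
    relations = [
        {t for t in product(universe, repeat=n) if psi(t)}
        for psi, n in zip(formulas, arities)
    ]
    return in_class(universe, relations)

# The ordinary existential quantifier corresponds to the class
# {(A, U) : U subseteq A, U nonempty}:
exists_class = lambda A, rels: len(rels[0]) > 0

A = [0, 1, 2, 3]
# "exists x (x is even)" over the universe A:
print(eval_quantifier(A, exists_class, [lambda t: t[0] % 2 == 0], [1]))  # True
```

Other classes give other quantifiers in the same way; only the membership test changes, which is what makes this a uniform extension mechanism.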
The type of the generalized quantifier Q_C as above is defined to be ⟨n₁, ..., n_m⟩ and its arity is defined to be max{n₁, ..., n_m}. We say that Q_C is in the complexity class DSPACE(s(n)) or DTIME(t(n)) if there is an algorithm which, when given the encoding of any structure over σ, decides in space s(n) or time t(n) respectively, in the size n of the encoding of the input, whether the given input structure is in C.

Examples:

1. The usual first order existential quantifier is associated with the class {(A, U) | U ⊆ A, U ≠ ∅}.
2. The counting quantifier C_i is associated with the class {(A, U) | U ⊆ A, |U| ≥ i}. Both these quantifiers are unary and computable in linear time.
3. The planarity quantifier P is associated with the class of planar graphs, {(A, R) | R ⊆ A × A, (A, R) is planar}.
4. The quantifier multiple, M, is associated with the class {(A, U₁, U₂) | U₁, U₂ ⊆ A, |U₁| = k·|U₂| for some k ∈ N}.

In this paper we will consider the logics FO(Q), PFP(Q) and L^ω_∞ω(Q). As is customary in the presence of generalized quantifiers, we will consider IFP(Q) instead of LFP(Q), because syntactic restrictions on LFP(Q) formulae guaranteeing the monotonicity of relations constructed during fixed point iterations are not obvious, and the semantics of LFP formulae without monotonicity conditions is not defined.
2.3 L^k(Q)-Types
By L^k we mean the fragment of first order logic which uses at most k distinct variables (including both free and bound variables). Similarly, L^k(Q) denotes the k-variable fragment of FO(Q). In the following, we will assume Q to be an arbitrary but fixed set of finitely many generalized quantifiers. The idea of L^k-types was introduced in [2, 1]. In [3] this notion is generalized to define L^k(Q)-types. We reproduce briefly the relevant definitions and results from their work.

Definition 1. Let A be a structure and let ā = ⟨a₁, ..., a_l⟩ be a sequence of elements from A, where l ≤ k. The L^k(Q)-type of ā is the set of all L^k(Q)-formulae φ(x₁, ..., x_l) such that A ⊨ φ(a₁, ..., a_l).

Note that k-types induce an equivalence relation on the set {(A, a₁, ..., a_k) | A a finite structure, (a₁, ..., a_k) ∈ A^k}. By the set of k-types realized in a class C of structures we mean the equivalence classes of the above relation where A is a structure in C. By the k-types realized in a given structure A we mean the equivalence classes of k-tuples of elements of A induced by the above relation. An interesting fact about L^k(Q)-types is that they can be isolated in L^k(Q). This is stated more precisely in the following lemma.
Lemma 1 ([3]). Given (A, a₁, ..., a_k) there is an L^k(Q) formula φ such that for all (B, b₁, ..., b_k), B ⊨ φ(b₁, ..., b_k) iff (A, a₁, ..., a_k) and (B, b₁, ..., b_k) have the same L^k(Q)-type.
This result is proved in [3], using a quantifier elimination argument due to Scott Weinstein.
2.4 Implicit Definability on Finite Structures
Let C be a class of finite structures.

Definition 2. Let L be a logical language over vocabulary Σ. Let φ(R₁, ..., Rₙ) be a formula in L, for some n and R₁, ..., Rₙ ∉ Σ. φ implicitly defines a query over C in the language L if for every structure A ∈ C there is exactly one sequence R₁^A, ..., Rₙ^A of relations over A for which φ(R₁^A, ..., Rₙ^A) is true in A. The query defined by R₁ is said to be the principal query and the queries defined by R₂, ..., Rₙ are said to be auxiliary queries. IMP(L) is the set of queries which are principal queries defined by a formula as above.

Note that it follows from the uniqueness of the sequence R₁^A, ..., Rₙ^A in the definition above that the relations R₁^A, ..., Rₙ^A are closed under automorphisms of A. Therefore these relations actually define a sequence of queries. Implicit definability, as a logic over finite structures, was first studied in [6]. We have given the notion of implicit definability relative to a class of structures. In the standard logic literature C is taken to be the class of all structures. However, in finite model theory C is often taken to be the class of interest (a proper subclass of finite structures), and the resulting query is a partial query (even over finite structures) [5]. For defining a Boolean query, we may represent "true" by a nonempty relation and "false" by the empty relation. Alternatively, we may represent "true" by the full relation (all tuples included) and "false" by the empty relation.
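The uniqueness requirement of Definition 2 can be checked by brute force on a small structure: enumerate all candidate relations and verify that exactly one satisfies φ. A sketch for a single relation variable with no auxiliary relations; the helper names are hypothetical:

```python
from itertools import product, combinations, chain

def powerset(s):
    """All subsets of the (finite) collection s."""
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def implicitly_defines(universe, phi, arity):
    """Return the unique relation R with phi(R) true on this structure,
    or None if phi has zero or several satisfying relations (so it does
    not implicitly define a query here, in the sense of Definition 2)."""
    witnesses = [set(R) for R in powerset(product(universe, repeat=arity))
                 if phi(set(R))]
    return witnesses[0] if len(witnesses) == 1 else None

# Example: on a graph (A, E), the condition
#   forall x ( R(x) <-> not exists y (E(x, y) or E(y, x)) )
# pins R down to the set of isolated vertices.
A = [1, 2, 3]
E = {(1, 2)}
phi = lambda R: all(((x,) in R) == (not any((x, y) in E or (y, x) in E
                                            for y in A)) for x in A)
print(implicitly_defines(A, phi, 1))  # prints {(3,)}
```

The biconditional makes this particular example explicit as well; the interest of implicit definability lies in formulas whose unique witness is not given by any explicit definition, which the same checker handles identically.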
3 Evaluating a Formula Efficiently on Structures with Few Automorphism Types
Let us recall the following definition of k-automorphism types from [3], which will play a crucial role in this section.

Definition 3. Given a structure A, an equivalence relation ≈_k on A^k is defined as follows: (a₁, ..., a_k) ≈_k (b₁, ..., b_k) if there is an automorphism f of the structure A such that f(a_i) = b_i for 1 ≤ i ≤ k. The equivalence classes of A^k under this relation are called the k-automorphism types of A.

In this section, we present a new algorithm to evaluate a PFP formula on a structure, in the presence of generalized quantifiers, which is more efficient than the obvious method of evaluation if the number of automorphism types realized
in the structure is small. An efficient method to evaluate a PFP formula, in the same sense, was presented in [8] by constructing the quotiented structure A/≈ for a given structure A. The contribution of this section is to obtain the efficient evaluation of a PFP formula in the presence of generalized quantifiers. It is not clear how to construct a quotiented structure in the presence of generalized quantifiers, though we will solve this problem later in this paper. However, the quotiented structure that we construct there grows exponentially in the number of automorphism types of the original structure, unless we assume some specific properties of the generalized quantifiers. So the approach of constructing a quotiented structure cannot help us in solving our problem. We present a more direct method for evaluation of a formula which meets our objectives. We exploit the fact that all the intermediate sets that need to be constructed during the evaluation are closed under suitable automorphisms. As observed towards the end of Section 2.1, we need not consider parameter variables in the construction of fixed point formulae. This observation is also valid in the presence of generalized quantifiers, by the identical argument. So in the following, without loss of any generality, we will assume that all fixed point constructions are without any parameter variables. It may therefore be noted that by a k-variable IFP (PFP) formula we will mean a formula using at most k distinct variables and without parameter variables in IFP (PFP) definitions. How exactly we define the notion of the k-variable fragment of PFP(Q) is not critical for us; we just need a stratification of PFP(Q) formulae to obtain a convenient complexity bound for the evaluation of a formula at each level.

Lemma 2. Let S be a fixed vocabulary, and k ≥ the maximum arity of the relation
symbols in S. Let Q = (Q_i)_{i∈N} be a family of generalized quantifiers which have arity ≤ r and are computable in DSPACE(n^s). For any k-variable PFP(Q) formula φ there is a constant c (independent of n, m) such that on all S-structures A, φ can be evaluated in c·(n + n·m)^{rs+1} space, where n is the size of the structure |A| and m is the number of k-automorphism types of A.

Proof. The proof is by induction on the structure of the formula φ. To make the induction step go through, we prove a stronger statement in which φ can also have finitely many relational variables occurring free. Details are given in the full version.
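The quantity m of Lemma 2, the number of k-automorphism types, can be computed naively on small structures by enumerating all automorphisms and collecting orbits of k-tuples, directly following Definition 3. A brute-force sketch (not the paper's algorithm, whose point is to avoid this cost):

```python
from itertools import permutations, product

def automorphisms(universe, edges):
    """All automorphisms of the structure (universe, edges), found naively
    by testing every permutation of the universe."""
    for perm in permutations(universe):
        f = dict(zip(universe, perm))
        if {(f[a], f[b]) for (a, b) in edges} == set(edges):
            yield f

def k_automorphism_types(universe, edges, k):
    """Count the ~=_k classes: k-tuples related by some automorphism."""
    autos = list(automorphisms(universe, edges))
    seen, classes = set(), 0
    for t in product(universe, repeat=k):
        if t in seen:
            continue
        classes += 1
        for f in autos:                      # the orbit of t under Aut(A)
            seen.add(tuple(f[a] for a in t))
    return classes

# A directed 4-cycle is vertex-transitive (its automorphisms are the four
# rotations), so it has a single 1-automorphism type:
C4 = {(0, 1), (1, 2), (2, 3), (3, 0)}
print(k_automorphism_types([0, 1, 2, 3], C4, 1))  # 1
```

For k = 2 the same cycle realizes four 2-automorphism types, one for each difference of positions along the cycle, illustrating how the count m can grow with k even on a highly symmetric structure.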
4 Some Necessary Conditions for the Existence of Q with PFP(Q) = PSPACE

We begin this section by combining the results of the previous section with the diagonalization arguments of [8] to obtain some necessary conditions on a class of finite structures for the existence of a suitable set Q of generalized quantifiers such that PFP(Q) = PSPACE on this class of structures.

Theorem 1. Let C be a class of finite structures and let Q = (Q_i)_{i∈N} be the family of all generalized quantifiers of arity ≤ r computable in DSPACE(n^s),
for some fixed r, s. If there is a number l such that for all k ≥ l, for all natural numbers i, and for all real numbers ε > 0 there is a structure A ∈ C such that the number of k-automorphism types of A is < |A|^ε but the number of l-automorphism types of A is > i, then PSPACE(C) ≠ PFP(Q)(C).

Proof. This can be proved using Lemma 2 above and the diagonalization argument of [8, Theorem 3]. □

As an immediate application of the theorem above, consider the example of complete binary trees (CBT) originating from [7].

Example 1. Let Q be a finite set of PSPACE computable generalized quantifiers. On any infinite class of complete binary trees PFP(Q) ≠ PSPACE.

In [8, page 362] we remarked that there is an O(n^p) time algorithm, where p is independent of k, to find the k-automorphism class of a k-tuple in a complete binary tree. By examining the proof of Lemma 2 and Theorem 1 we see that the diagonalization argument there can be adapted over any infinite class of complete binary trees to create a PTIME query which diagonalizes all IFP(Q) formulae on this class. So we have the following generalization of an observation in [8, page 362].

Example 2. Let Q be a finite set of PTIME computable generalized quantifiers. On any infinite class of complete binary trees IFP(Q) ≠ PTIME.

Note that the above examples cannot be deduced from the results of [3]. The next theorem applies even to trivial classes, in the sense of [3, Definition 4.6], provided they are recursively enumerable.

Theorem 2. Let C be a recursively enumerable class of finite structures and let Q = (Q_i)_{i∈N} be the family of all generalized quantifiers of arity ≤ r computable in DSPACE(n^s), for some fixed r, s. If for every k and all real numbers ε > 0 there are infinitely many structures A_i ∈ C such that the number of k-automorphism types of A_i is < |A_i|^ε, i = 1, 2, 3, ..., then PSPACE(C) ≠ PFP(Q)(C).
In fact, there is a Boolean query in PSPACE(C) but not in PFP(Q)(C).

Proof. This can be proved using Lemma 2 above and the diagonalization argument of [8, Theorem 4]. □

As an application of Theorem 2 we have the following.

Example 3. On any infinite recursively enumerable class of cliques PSPACE ≠ PFP(Q), for any set Q of bounded arity DSPACE(n^s) generalized quantifiers, for a given s.

Using the fact that a representation of the k-automorphism types in a structure with one binary relation symbol, interpreted as an equivalence relation, can be constructed efficiently, we can easily deduce the following.
Example 4. On any infinite recursively enumerable class of equivalence relations in which any structure A has equivalence classes of at most O(log |A|) distinct cardinalities, PTIME ≠ IFP(Q), for any set Q of bounded arity DTIME(n^s) generalized quantifiers, for a given s.

The following lemma shows that the results of Theorems 1 and 2 cannot be improved in, at least, some ways.

Lemma 3. Let C be a class of finite structures. If there is a natural number k and a real number ε > 0 such that for all structures A ∈ C the number of k-automorphism types of A is ≥ |A|^ε, then for each l there is a PSPACE computable query Q_l such that for all l-ary queries PSPACE(C) = PFP_{Q_l}(C). Here PFP_{Q_l} is the language PFP augmented with an additional, built-in relation symbol Q_l which on any structure A is interpreted as Q_l(A).

Proof. Easy, will be given in the final version.
□
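Examples 1 and 2 rest on the fact that complete binary trees realize few automorphism types: any two vertices at the same depth are exchanged by an automorphism, so the number of 1-automorphism types equals the number of levels, O(log n) in the size of the tree. A brute-force check on a small tree (illustrative code, not from [7] or [8]):

```python
from itertools import permutations

def one_automorphism_types(nodes, edges):
    """Count orbits of single vertices under Aut(A), enumerated naively."""
    autos = []
    for perm in permutations(nodes):
        f = dict(zip(nodes, perm))
        if {(f[a], f[b]) for (a, b) in edges} == set(edges):
            autos.append(f)
    seen, classes = set(), 0
    for v in nodes:
        if v not in seen:
            classes += 1
            seen.update(f[v] for f in autos)  # the whole orbit of v
    return classes

def complete_binary_tree(depth):
    """Parent-to-child edges of a complete binary tree with nodes
    numbered 1 .. 2^(depth+1) - 1 in heap order."""
    nodes = list(range(1, 2 ** (depth + 1)))
    edges = {(v, c) for v in nodes for c in (2 * v, 2 * v + 1) if c in nodes}
    return nodes, edges

# Vertices at equal depth are exchangeable by swapping subtrees, so the
# number of 1-automorphism types is the number of levels, depth + 1:
nodes, edges = complete_binary_tree(2)   # 7 nodes, 3 levels
print(one_automorphism_types(nodes, edges))  # 3
```

A tree on n nodes thus has only O(log n) 1-automorphism types, which is exactly the "fewer than |A|^ε automorphism types" regime that the diagonalization arguments of this section exploit.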
Notice that while it is clear that any Boolean query can be represented as a generalized quantifier, it is not clear that this can be done for queries of arity > 0 as well. So we do not get a real converse to Theorem 2. This situation changes if we allow implicit definitions too, as we shall see later.
5 L^k(Q) Invariant
In this section we associate with each structure A and a finite set Q of generalized quantifiers an object called its L^k(Q) invariant, such that if A and B have the same invariant then their L^k(Q) theories are the same. The invariant is an extension of the structure quotiented w.r.t. the type equivalence relation of [1, 2]. However, it is not quite a first order structure: in order to keep information about generalized quantifiers, the quotiented structure also has some second order functions over its domain. We begin by recalling an elementary observation from [3].

Observation 1. For any structure A, any finite set Q of generalized quantifiers and any k, there are formulae φ₁(x₁, ..., x_k), ..., φ_m(x₁, ..., x_k) which partition A^k such that each φ_i(x₁, ..., x_k) isolates an L^k(Q)-type in A.

Proof. Let A realize m distinct L^k(Q)-types of k-tuples. We can number these types as 1, 2, ..., m. By definition, for each ā₁, ā₂ in different classes (say of types i and j respectively) we have an L^k(Q) formula φ_{i,j}(x₁, ..., x_k) such that A ⊨ φ_{i,j}(ā₁) and A ⊨ ¬φ_{i,j}(ā₂). Let φ_i = ∧_{1≤j≤m, j≠i} φ_{i,j}. □

Let σ be a vocabulary consisting of relation symbols R₁, ..., R_m and let Q be the finite set of generalized quantifiers under consideration. To avoid separate treatment of the standard first order quantifier ∃, we assume it to be included in the set Q as the unary generalized quantifier of Example 1 in Section 2.2. Let k be ≥ the arities of the relations R₁, ..., R_m and the arities of the quantifiers
in Q. Note that if we are considering k-variable logic then we can replace any relation R_i of arity > k by several relations (but only finitely many) of arity ≤ k, depending on the pattern of variable repetitions that can occur if we place k variables as arguments to R_i, such that for any L^k formula in the old vocabulary there is an equivalent formula in the new vocabulary. A similar transformation can be done on generalized quantifiers of arity > k to obtain several (but only finitely many) new generalized quantifiers of arity ≤ k, by considering all patterns of variable repetitions in the sequence of relations in the class associated with the generalized quantifier. For a quantifier Q of type ⟨n₁, ..., n_j⟩ we define a set S_Q of j-tuples of sequences as follows: S_Q = {⟨s₁, ..., s_j⟩ | s_i is a (k − n_i)-length sequence of distinct variables from (x₁, ..., x_k)}. The invariant will be a structure over a vocabulary σ'. The vocabulary σ' consists of symbols =', R'₁, ..., R'_m, P_{s₁}, ..., P_{s_j}, (f_s^Q)_{Q∈Q, s∈S_Q}, where =', R'₁, ..., R'_m are unary relation symbols and P_{s₁}, ..., P_{s_j}, j = k^k, are binary relation symbols. For each quantifier Q of type ⟨n₁, ..., n_j⟩ and s ∈ S_Q, f_s^Q is a function from P^j to P, where P is the power set of the domain. Given a structure A, its L^k(Q) invariant A/≡_Q^k is defined as
A/≡_Q^k = ⟨A^k/≡_Q^k, =', R'₁, ..., R'_m, P_{s₁}, ..., P_{s_j}, (f_s^Q)_{Q∈Q, s∈S_Q}⟩, where

- ='(a₁, ..., a_k) iff a₁ = a₂;
- R'_i(a₁, ..., a_k) iff R_i(a₁, ..., a_l), where l is the arity of R_i.
Let s = ⟨i₁, ..., i_k⟩ be a sequence of integers from {1, ..., k}. P_s is defined as: P_s = {((a₁, ..., a_k), (a_{i₁}, ..., a_{i_k})) | a₁, ..., a_k ∈ A}. If Q is a quantifier of type ⟨n₁, ..., n_j⟩ and s = ⟨s₁, ..., s_j⟩ ∈ S_Q then f_s^Q : P(A^k/≡_Q^k)^j → P(A^k/≡_Q^k) is defined as follows. Given ⟨I₁, ..., I_j⟩, where each I_l ⊆ A^k/≡_Q^k, let θ_l(x̄_l, s_l), 1 ≤ l ≤ j, be a formula over σ, where x̄_l is the sequence of those of x₁, ..., x_k not in s_l, such that I_l = {(a₁, ..., a_k) | A ⊨ θ_l(a₁, ..., a_k)}. θ_l can be constructed using the φ₁, ..., φ_m of Observation 1 above. Let ψ(x₁, ..., x_k) = Q x̄₁, ..., x̄_j (θ₁(x̄₁, s₁), ..., θ_j(x̄_j, s_j)). f_s^Q(I₁, ..., I_j) is defined to be the set of types of the tuples x₁, ..., x_k for which ψ is true; that is, f_s^Q(I₁, ..., I_j) = {(a₁, ..., a_k) | A ⊨ ψ(a₁, ..., a_k)}. Note that the set of tuples satisfying ψ is ≡_Q^k-closed, by the definition of ≡_Q^k, as ψ is an L^k(Q) formula. Given an FO(Q) formula φ(z₁, ..., z_k) constructed using variables z₁, ..., z_k (where z₁, ..., z_k is a permutation of x₁, ..., x_k), we define a formula φ'(x) over σ' as follows. φ'(x) will in general be a formula of higher order logic and not of first order logic.

- If φ ≡ z_i = z_j then φ'(x) = ∃y(P_s(x, y) ∧ ='(y)), where s is a sequence chosen so that s = ⟨i, j, ...⟩.
- If φ ≡ R_j(z_{i₁}, ..., z_{i_m}) then φ'(x) = ∃y(P_s(x, y) ∧ R'_j(y)), where s is a sequence chosen so that s = ⟨i₁, ..., i_m, ...⟩.
- If φ ≡ ¬φ₁ then φ'(x) = ¬φ'₁(x).
- If φ ≡ φ₁ ∧ φ₂ then φ'(x) = φ'₁(x) ∧ φ'₂(x).
- We now consider the case of generalized quantifiers (which, by our assumption about Q, also includes the case φ ≡ ∃xφ₁). Let φ ≡ Q ȳ₁ ... ȳ_j (φ₁(ȳ₁, ū₁), ..., φ_j(ȳ_j, ū_j)), where Q is of type ⟨n₁, ..., n_j⟩. All variables in φ are among {x₁, ..., x_k}. Without loss of generality, we may assume that the length of each ū_i is k − n_i (if it is less, we can add some dummy variables to it). Let s be the sequence ⟨ū₁, ..., ū_j⟩. φ'(x) is defined as ∃y(P_h(y, x) ∧ y ∈ f_s^Q({z | φ'₁(z)}, ..., {z | φ'_j(z)})), where z₁, ..., z_k = x_{i₁}, ..., x_{i_k} and h = ⟨i₁, ..., i_k⟩. Note that in this case φ'(x) is not a first order formula.

Lemma 4. Let z₁, ..., z_k be a permutation of x₁, ..., x_k. Let φ(z₁, ..., z_k) be an L^k(Q) formula constructed using only the variables x₁, ..., x_k. Then for all a₁, ..., a_k ∈ A, A ⊨ φ(a₁, ..., a_k) iff A/≡_Q^k ⊨ φ'([a₁, ..., a_k]).

Proof. This is proved by induction on the structure of φ.
Details are omitted from this extended abstract.
By Lemma 4, we get that if two structures A, B have different L^k(Q) theories then their invariants are different. It is also interesting to note the following converse, though we do not need it for our results later.

Lemma 5. If two structures A, B have the same L^k(Q) theories then their invariants are also the same (up to isomorphism).

Proof. Easy, given in the full version.
□
Remarks:

1. The size of the invariant defined above is exponential in the number of types realized in the structure. This seems to be unavoidable in the most general case, although for a nice family of generalized quantifiers it may often be possible to come up with much smaller, and perhaps first order, quotiented structures exploiting the specific properties of these quantifiers. The formula isolating L^k(Q)-types in [3] is also an invariant for a structure, but its size too is exponential in the number of types realized.
2. One can always construct first order (many sorted) structures to represent higher order structures by keeping higher order objects in different sorts. So the fact that we constructed a second order structure does not indicate an intrinsic limitation; it was done to give a natural description of the quotiented object.
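In the special case where tuples are identified by automorphism (which refines L^k(Q)-type equivalence), the quotient underlying the invariant is easy to picture: each class collapses to a single point. The sketch below quotients a binary relation by 1-automorphism classes; it omits entirely the second order functions f_s^Q, which are the real content of the construction above, so it is only an illustration of the first order part:

```python
from itertools import permutations

def quotient_by_orbits(nodes, edges):
    """Collapse each 1-automorphism class to a point, keeping an edge
    between two classes iff some pair of representatives is linked."""
    autos = []
    for perm in permutations(nodes):
        f = dict(zip(nodes, perm))
        if {(f[a], f[b]) for (a, b) in edges} == set(edges):
            autos.append(f)
    orbit = {}
    for v in nodes:
        if v not in orbit:
            cls = frozenset(f[v] for f in autos)  # orbit of v under Aut(A)
            for w in cls:
                orbit[w] = cls
    q_nodes = set(orbit.values())
    q_edges = {(orbit[a], orbit[b]) for (a, b) in edges}
    return q_nodes, q_edges

# A directed 4-cycle is vertex-transitive, so its quotient has a single
# point carrying a loop:
qn, qe = quotient_by_orbits([0, 1, 2, 3], {(0, 1), (1, 2), (2, 3), (3, 0)})
print(len(qn), len(qe))  # 1 1
```

A rigid structure, by contrast, is its own quotient; the savings promised by Lemma 2 materialize exactly when the quotient is much smaller than the structure.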
6

Collapsing $L^k_{\infty\omega}(\mathbf{Q})$ to $FO(\mathbf{Q})$
Using the results of the previous section, we now present a new proof of one of the main results of [3]. We give details of every step, as some of this will be generalized to the setting of implicit definability in the next section.

Theorem 3 ([3]). Let $C$ be a class of finite structures over vocabulary $S$ and let $\mathbf{Q}$ be a finite set of generalized quantifiers. For any $k$ the following are equivalent.
1. There is a number $m_k$ such that the number of $L^k(\mathbf{Q})$-types realized in each structure in $C$ is bounded by $m_k$.
2. The number of $L^k(\mathbf{Q})$-types realized over $C$ is finite.
3. $L^k_{\infty\omega}(\mathbf{Q})$ collapses to $L^k(\mathbf{Q})$ over $C$.
4. There are only finitely many distinct $L^k_{\infty\omega}(\mathbf{Q})$ queries over $C$.
Proof. We give here the proofs of (1) ⇒ (2) and (2) ⇒ (3) only, which differ significantly from those in [3].

(1) ⇒ (2). Given $(A, a_1,\ldots,a_k)$, $A \in C$, the $L^k(\mathbf{Q})$ type of $(A, a_1,\ldots,a_k)$ is captured by $(A/{\equiv^k_Q}, a_1,\ldots,a_k)$, by Lemma 4. That is, if $(A/{\equiv^k_Q}, a_1,\ldots,a_k)$ and $(B/{\equiv^k_Q}, b_1,\ldots,b_k)$ are isomorphic, then the $L^k(\mathbf{Q})$-types of $(A, a_1,\ldots,a_k)$ and $(B, b_1,\ldots,b_k)$ are the same. By (1), the size of $A/{\equiv^k_Q}$ for all $A \in C$ is bounded by $m_k$. There are only finitely many nonisomorphic structures of size $\leq m_k$ possible in the vocabulary of quotiented structures over $C$. Hence there are only finitely many $L^k(\mathbf{Q})$-types realized over $C$.

(2) ⇒ (3). (2) implies that there are only finitely many distinct queries in $L^k(\mathbf{Q})$ over $C$, as every $L^k(\mathbf{Q})$ query over $C$ is a union of $\equiv^k_Q$-types. Using this, (3) is proved by induction on the structure of an $L^k_{\infty\omega}(\mathbf{Q})$ formula. The only case where we need to use (2) is when $\varphi = \bigvee_{i\in N}\varphi_i$, where the $\varphi_i$, $i \in N$, are $L^k_{\infty\omega}(\mathbf{Q})$ formulae. By the induction hypothesis, for each $\varphi_i$ there is a $\psi_i \in L^k(\mathbf{Q})$ which is equivalent to $\varphi_i$ over $C$. As there are only finitely many distinct queries in $L^k(\mathbf{Q})$ over $C$, only finitely many, say $\psi_{i_1},\ldots,\psi_{i_r}$, of the $\psi_i$ are logically inequivalent over $C$. Therefore $\varphi$ is equivalent, over $C$, to the $L^k(\mathbf{Q})$ formula $\psi_{i_1} \vee \cdots \vee \psi_{i_r}$.

We also note the following natural observations. (These are not difficult to prove, but are not mentioned in [3].)

Lemma 6. Let $\mathbf{Q}$ be any set of generalized quantifiers. $L^k(\mathbf{Q})$ and $L^k_{\infty\omega}(\mathbf{Q})$ define the same type equivalence relation on finite structures.

Proof. It suffices to show that for any $(A, a_1,\ldots,a_k)$ and $(B, b_1,\ldots,b_k)$, if there is a $\varphi(z_1,\ldots,z_k) \in L^k_{\infty\omega}(\mathbf{Q})$ such that $A \models \varphi(a_1,\ldots,a_k)$ and $B \models \neg\varphi(b_1,\ldots,b_k)$, then there is a $\psi(z_1,\ldots,z_k) \in L^k(\mathbf{Q})$ such that $A \models \psi(a_1,\ldots,a_k)$ and $B \models \neg\psi(b_1,\ldots,b_k)$. This is not difficult to prove by induction on the structure of $\varphi$.

Using the lemma above and the result from [3] that $L^k(\mathbf{Q})$ types can be isolated in $L^k(\mathbf{Q})$, we obtain the following normal form theorem for $L^k_{\infty\omega}(\mathbf{Q})$ queries.
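The (2) ⇒ (3) collapse step can be summarized as a single chain of equivalences over $C$, where $\psi_i$ denotes an $L^k(\mathbf{Q})$ formula equivalent over $C$ to the disjunct $\varphi_i$, and $\psi_{i_1},\ldots,\psi_{i_r}$ are representatives of the finitely many inequivalent ones:

```latex
\varphi \;=\; \bigvee_{i\in\mathbb{N}} \varphi_i
\;\equiv_C\; \bigvee_{i\in\mathbb{N}} \psi_i
\;\equiv_C\; \psi_{i_1} \vee \cdots \vee \psi_{i_r},
\qquad \psi_i \in L^k(\mathbf{Q}).
```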
Corollary 1. Let $\mathbf{Q}$ be a finite set of generalized quantifiers. Every query in $L^k_{\infty\omega}(\mathbf{Q})$ can be written as a countable disjunction of $L^k(\mathbf{Q})$ formulae.
7
Generalizations to Implicit Definability
In this section we generalize the results of the previous sections to richer logics, by considering the implicit closure of the logics considered there.
7.1

$IMP(PFP(\mathbf{Q}))$ and PSPACE
First we extend Lemma 2.

Lemma 7. Let $S$ be a fixed vocabulary, and $k \geq$ the maximum arity of the relation symbols in $S$. Let $\mathbf{Q} = (Q_i)_{i\in N}$ be a family of generalized quantifiers which have arity $\leq r$ and are computable in $DSPACE(n^s)$. For every $IMP(PFP(\mathbf{Q}))$ query $P$ definable by a $k$-variable $PFP(\mathbf{Q})$ formula $\varphi$, there is a constant $c$ (independent of $n$, $m$) such that on all $S$-structures $A$, $P$ can be evaluated in $c\cdot(n + n\cdot m)^{rs+1}$ space, where $n$ is the size $|A|$ of the structure and $m$ is the number of $k$-automorphism types of $A$.

Proof. Easy. See the full version.

Theorems 1 and 2 are easily extended as below, using Lemma 7.

Theorem 4. Let $C$ be a class of finite structures and let $\mathbf{Q} = (Q_i)_{i\in N}$ be the family of all generalized quantifiers of arity $\leq r$ and computable in $DSPACE(n^s)$, for some fixed $r$, $s$. If there is a number $l$ such that for all $k \geq l$, for all natural numbers $i$, and for all real numbers $\epsilon > 0$ there is a structure $A \in C$ such that the number of $k$-automorphism types of $A$ is $\leq |A|^{\epsilon}$ but the number of $l$-automorphism types of $A$ is $\geq i$, then $PSPACE(C) \neq IMP(PFP(\mathbf{Q}))(C)$.

Example 5. Let $\mathbf{Q}$ be a finite set of PSPACE-computable generalized quantifiers. On any infinite class of complete binary trees, $IMP(PFP(\mathbf{Q})) \neq PSPACE$.

Theorem 5. Let $C$ be a recursively enumerable class of finite structures and let $\mathbf{Q} = (Q_i)_{i\in N}$ be the family of all generalized quantifiers of arity $\leq r$ and computable in $DSPACE(n^s)$, for some fixed $r$, $s$. If for every $k$ and all real numbers $\epsilon > 0$ there are infinitely many structures $A_i \in C$, $i = 1, 2, 3, \ldots$, such that the number of $k$-automorphism types of $A_i$ is $\leq |A_i|^{\epsilon}$, then $PSPACE(C) \neq IMP(PFP(\mathbf{Q}))(C)$. In fact there is a Boolean query in $PSPACE(C)$ but not in $IMP(PFP(\mathbf{Q}))(C)$.

Example 6.
On any infinite recursively enumerable trivial class of structures, $PSPACE \neq IMP(PFP(\mathbf{Q}))$ for any set $\mathbf{Q}$ of bounded arity $DSPACE(n^s)$ generalized quantifiers, for a given $s$.

Example 7. On any infinite recursively enumerable class of equivalence relations in which any structure $A$ has equivalence classes of at most $O(\log\log|A|)$ distinct cardinalities, $PTIME \neq IMP(IFP(\mathbf{Q}))$ for any set $\mathbf{Q}$ of bounded arity $DTIME(n^s)$ generalized quantifiers, for a given $s$.
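To see why few distinct class cardinalities force few automorphism types (the phenomenon Examples 7 and 8 exploit): in a pure equivalence relation, the $k$-automorphism type of a tuple is determined by which components are equal, which lie in the same class, and the sizes of those classes. The sketch below (our illustration, not from the paper) counts realized invariants of this kind directly; the count depends only on the set of class sizes, not on the size of the universe.

```python
from itertools import product

def k_types_of_equivalence_relation(classes, k):
    """Count k-automorphism types of (A, E), where E is the equivalence
    relation whose classes are the given disjoint lists.  Invariant of a
    tuple, per position: first earlier position holding the same element,
    first earlier position holding an element of the same class, and the
    size of that class."""
    cls_of = {x: i for i, c in enumerate(classes) for x in c}
    size = {i: len(c) for i, c in enumerate(classes)}
    universe = [x for c in classes for x in c]
    invariants = set()
    for tup in product(universe, repeat=k):
        inv = []
        for i, a in enumerate(tup):
            j_eq = next((j for j in range(i) if tup[j] == a), i)
            j_cl = next((j for j in range(i) if cls_of[tup[j]] == cls_of[a]), i)
            inv.append((j_eq, j_cl, size[cls_of[a]]))
        invariants.add(tuple(inv))
    return len(invariants)

# Class sizes {2, 2, 1}: six 2-types; adding further classes of the
# same sizes creates no new types.
print(k_types_of_equivalence_relation([[0, 1], [2, 3], [4]], 2))
print(k_types_of_equivalence_relation([[0, 1], [2, 3], [4, 5], [6]], 2))
```

With at most $c$ distinct cardinalities, the number of invariants is bounded by a function of $c$ and $k$ alone, which is how a $O(\log\log|A|)$ bound on $c$ keeps the type count small.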
Note that in the above example we have assumed the number of types bounded by $O(\log\log|A|)$ instead of $O(\log|A|)$ as in Example 4. This is to take into account the additional time required to search over all sequences of automorphism-closed relations in computing the implicit closure of $IFP(\mathbf{Q})$, as in the proof of Lemma 7. However, no significant additional space is required to evaluate the implicit closure of a $PFP(\mathbf{Q})$ formula, so we also have:

Example 8. On any infinite recursively enumerable class of equivalence relations in which any structure $A$ has equivalence classes of at most $O(\log|A|)$ distinct cardinalities, $PSPACE \neq IMP(PFP(\mathbf{Q}))$ for any set $\mathbf{Q}$ of bounded arity $DSPACE(n^s)$ generalized quantifiers, for a given $s$.

We will now exploit Lemma 3 to get a sufficient condition for $IMP(PFP(\mathbf{Q}))(C) = PSPACE(C)$ for a finite set $\mathbf{Q}$ of generalized quantifiers.

Observation 2. Let $P$ be a $PSPACE$ query of any arity. Then there is a $PSPACE$ computable generalized quantifier $Q$ such that $P$ is expressible in $IMP(FO(Q))$.

Proof. For simplicity, consider structures in the vocabulary of one binary relation only. Let $P$ be an $l$-ary $PSPACE$ query. Consider the generalized quantifier $Q_P$ associated with the class $\{(A, R, P(\mathbf{A})) \mid \mathbf{A} = (A, R),\ R \subseteq A^2,\ A \text{ finite}\}$. $P$ is implicitly defined by the formula $\varphi = Q_P\, x_1 x_2, y_1 \ldots y_l\,(R(x_1,x_2), S(y_1 \ldots y_l))$, which has principal query variable $S$.

It is a simple observation that if a structure $A$ of size $n$ realizes $n^{\epsilon}$ $k$-automorphism types, for some $\epsilon > 0$, then for every $p$ there is an $h$ such that $A$ realizes at least $n^p$ $h$-automorphism types. We used this fact in the proof of Lemma 3. This motivates the following definition.

Definition 4. Let $C$ be a class of finite structures. We say that $C$ realizes polynomially many automorphism types if for every $p$ there is a $k$ such that each structure of size $n$ in $C$ realizes at least $n^p$ $k$-automorphism types.
By combining the results of Theorem 5, Lemma 3 and Observation 2 we get the following characterization.

Theorem 6. Let $C$ be a recursively enumerable class of finite structures. Let $l$ be a natural number. The following are equivalent.

1. There is a finite set $\mathbf{Q}$ of generalized quantifiers such that $IMP(PFP(\mathbf{Q})) = PSPACE$ over $C$ for queries of arity $\leq l$.
2. $C$ realizes polynomially many automorphism types.
Notice that Theorem 6 is only a partial converse to Theorem 5. It seems to be an open problem to show that for every set $\mathbf{Q}$ of $PSPACE$ computable, bounded arity generalized quantifiers, there is a $PSPACE$ query (of some arity) which is not in $IMP(PFP(\mathbf{Q}))$. Note that we always consider structures over an arbitrary but fixed signature.
7.2 $IMP(L^k(\mathbf{Q}))$-types

In order to generalize Theorem 3, we need to define the notion of $IMP(L^k(\mathbf{Q}))$ types. We define the type equivalence relation for an arbitrary logical language $L$ below. Let $C$ be a class of finite structures. For each $l$, let $\mathcal{A}_l = \{(A, \bar{a}) \mid A \in C,\ \bar{a} \in A^l\}$.

Definition 5. For each $l$, $L$ defines an equivalence relation $\equiv_L$ on the set $\mathcal{A}_l$ as follows: $(A, a_1,\ldots,a_l) \equiv_L (B, b_1,\ldots,b_l)$ if for all $l$-ary queries $P$ over $C$ definable in $L$, $P(A)(a_1,\ldots,a_l)$ iff $P(B)(b_1,\ldots,b_l)$.

In the following, we will mainly be interested in the $k$-type equivalence relation
$\equiv_{IMP(L^k(\mathbf{Q}))}$, with $\mathbf{Q}$ a finite set of generalized quantifiers. We have defined the notion of $k$-types in terms of queries rather than in terms of formulae, as was done in [2, 3]. This is more convenient for logics such as implicit definability, where the formulae defining queries may not even be closed under simple syntactic operations. Also, note that $\equiv_{IMP(L^k(\mathbf{Q}))}$ depends on the class $C$ of structures under consideration. In order to be able to work conveniently with $IMP(L^k(\mathbf{Q}))$, we note some of its simple closure properties in the Lemma below.

Lemma 8. Let $P_1$, $P_2$ be $l$-ary queries in $IMP(L^k(\mathbf{Q}))$. Then $P_1 \cup P_2$, $P_1 \cap P_2$, $\neg P_1$ are in $IMP(L^k(\mathbf{Q}))$. More generally, if $\varphi(P_1,\ldots,P_n) \in L^k(\mathbf{Q})$ and $P_1,\ldots,P_n$ are $IMP(L^k(\mathbf{Q}))$ queries, then so is the query defined by $\varphi$. That is, $IMP(L^k(\mathbf{Q}))$ queries are closed under $L^k(\mathbf{Q})$ operations.

Observation 3. Let $C$ be a class of finite structures. For any structure $A \in C$, any finite set $\mathbf{Q}$ of generalized quantifiers and any $k$, there are $k$-ary $IMP(L^k(\mathbf{Q}))$ queries $P_1,\ldots,P_m$ which partition $A^k$ such that each $P_i$ isolates an $IMP(L^k(\mathbf{Q}))$ type in $A$.

Proof. The proof is the same as in Observation 1, using the definition of $IMP(L^k(\mathbf{Q}))$ types and Lemma 8 above. □

Below we generalize the construction of $A/{\equiv^k_Q}$ in Section 5 to obtain the structure quotiented by the $\equiv_{IMP(L^k(\mathbf{Q}))}$ relation. Given a finite set $\mathbf{Q}$ of generalized quantifiers, the structure $A/{\equiv_{IMP(L^k(\mathbf{Q}))}}$ is defined in an identical manner to $A/{\equiv^k_Q}$, except that we use $\equiv_{IMP(L^k(\mathbf{Q}))}$ instead of the $\equiv^k_Q$ relation in the definition. The following Lemma is an analogue of Lemma 4 and is proved in the same way.

Lemma 9. For any $L^k(\mathbf{Q})$ sentence $\varphi$ in the language $\sigma'$, with $P_1,\ldots,P_m \in \sigma'$ being relational symbols of arity $\leq k$, there is a sentence $\varphi^*$ in the language $\sigma''$ ($\sigma''$ is $\sigma'$, as in Section 5, augmented by relational symbols $P_1^*,\ldots,P_m^*$ of arity one) such that for any structure $A$, $A \models \varphi$ iff $A/{\equiv_{IMP(L^k(\mathbf{Q}))}} \models \varphi^*$, where $P_1,\ldots,P_m$ are $\equiv_{IMP(L^k(\mathbf{Q}))}$-closed, $P_i$ has arity $p_i$, and $P_i$, $P_i^*$, $1 \leq i \leq m$, are related as follows: $P_i^* = \{\langle b_1,\ldots,b_k\rangle \mid P_i(b_1,\ldots,b_{p_i}) \wedge b_{p_i} = b_{p_i+1} = \cdots = b_k\}$.

We can also observe the following, which is proved in the same way as Lemma 5.
Lemma 10. If two structures $(A, a_1,\ldots,a_k)$, $(B, b_1,\ldots,b_k)$ have the same $IMP(L^k(\mathbf{Q}))$ theories then their invariants $(A/{\equiv_{IMP(L^k(\mathbf{Q}))}}, a_1,\ldots,a_k)$, $(B/{\equiv_{IMP(L^k(\mathbf{Q}))}}, b_1,\ldots,b_k)$ are also the same (up to isomorphism).

In defining the implicit closure of $L^k_{\infty\omega}(\mathbf{Q})$, we allow a countable number of query variables and denote the resulting class of queries by $IMP_\infty(L^k_{\infty\omega}(\mathbf{Q}))$. It can be shown that this logic is as expressive as the logic obtained if we allowed arbitrarily many queries. The subscript in the notation is introduced to distinguish it from the implicit closure where only finitely many queries are allowed, which is denoted by $IMP(L^k_{\infty\omega}(\mathbf{Q}))$. The version $IMP(L^k_{\infty\omega})$ was studied in [5] over rigid structures. We choose $IMP_\infty(L^k_{\infty\omega}(\mathbf{Q}))$ instead of $IMP(L^k_{\infty\omega})$ mainly for two reasons: first, since we show limitations of expressibility, such a result is more general if stated for $IMP_\infty(L^k_{\infty\omega}(\mathbf{Q}))$; second, there are some natural properties, such as closure under countable union of queries, which hold for $IMP_\infty(L^k_{\infty\omega}(\mathbf{Q}))$ queries but are not known for $IMP(L^k_{\infty\omega}(\mathbf{Q}))$. The following Lemma notes the simple properties of $IMP_\infty(L^k_{\infty\omega}(\mathbf{Q}))$ that we will use.

Lemma 11. $IMP_\infty(L^k_{\infty\omega}(\mathbf{Q}))$ is closed under countable conjunction, countable disjunction and complementation.

We will use the following normal form Lemma in the proof of Theorem 7 below.

Lemma 12. Each query in $IMP_\infty(L^k_{\infty\omega}(\mathbf{Q}))$, where $\mathbf{Q}$ is a finite set of generalized quantifiers, is implicitly definable by a countable disjunction $\bigvee_{i\in N}\varphi_i$, where each $\varphi_i$ is of the form $\theta_i(P_1,\ldots,P_{n_i}) \wedge \bigwedge_{m>n_i}(P_m = P_{m_i})$, $m_i \leq n_i$, and $\theta_i(P_1,\ldots,P_{n_i}) \in L^k(\mathbf{Q})$ over the vocabulary $S \cup \{P_1,\ldots,P_{n_i}\}$.

Proof. This is proved by elementary techniques using Corollary 1.
7.3

Collapsing $IMP_\infty(L^k_{\infty\omega}(\mathbf{Q}))$ to $IMP(FO(\mathbf{Q}))$
We are now ready to prove an analogue of Theorem 3 for implicit definability.

Theorem 7. Let $C$ be a class of finite structures over vocabulary $S$ and let $\mathbf{Q}$ be a finite set of generalized quantifiers. For any $k$ the following are equivalent.

1. There is a number $m_k$ such that the number of $IMP(L^k(\mathbf{Q}))$-types realized in each structure in $C$ is bounded by $m_k$.
2. The number of $IMP(L^k(\mathbf{Q}))$-types realized over $C$ is finite.
3. $IMP_\infty(L^k_{\infty\omega}(\mathbf{Q}))$ collapses to $IMP(L^k(\mathbf{Q}))$ over $C$.
4. There are only finitely many distinct $IMP_\infty(L^k_{\infty\omega}(\mathbf{Q}))$ queries over $C$.

Proof. All cases except (2) ⇒ (3) follow the same argument as the corresponding cases in Theorem 3. For the case (2) ⇒ (3), we give a brief sketch here.
(2) ⇒ (3). The corresponding claim in Theorem 3 was proved by induction on the structure of an $L^k_{\infty\omega}(\mathbf{Q})$ formula; however, it is not clear how to induct on the structure of a formula defining a query in $IMP_\infty(L^k_{\infty\omega}(\mathbf{Q}))$. So we use the normal form Lemma 12 above. First, we can rule out the case when there are infinitely many non-isomorphic structures of the form $A/{\equiv_{IMP(L^k(\mathbf{Q}))}}$, $A \in C$: in this case one can show that there are infinitely many $IMP(L^k(\mathbf{Q}))$-types realized over $C$, contradicting assumption (2). So we only have to consider the case when the structures of the form $A/{\equiv_{IMP(L^k(\mathbf{Q}))}}$, $A \in C$, are finitely many up to isomorphism. We proceed as follows. Let $\varphi(P_1, P_2, P_3, \ldots)$ be an $L^k_{\infty\omega}(\mathbf{Q})$ sentence over the vocabulary $S \cup \bigcup_{i\in N} P_i$ defining a query in $IMP_\infty(L^k_{\infty\omega}(\mathbf{Q}))$. Consider a structure $A \in C$. $\varphi$ has a satisfying assignment on $A$. By the normal form Lemma there is a $\theta_{i_1}(P_1,\ldots,P_{n_{i_1}}) \in L^k(\mathbf{Q})$ such that $\theta_{i_1}(P_1,\ldots,P_{n_{i_1}}) \wedge \bigwedge_{m>n_{i_1}}(P_m = P_{m_{i_1}})$, $m_{i_1} \leq n_{i_1}$, has a satisfying assignment on $A$. Let $B \in C$ be any other structure such that $A/{\equiv_{IMP(L^k(\mathbf{Q}))}}$ is isomorphic to $B/{\equiv_{IMP(L^k(\mathbf{Q}))}}$. By considering $\theta^*$, it is easy to see by Lemma 9 that $\theta_{i_1}(P_1,\ldots,P_{n_{i_1}})$ also has a satisfying assignment on $B$. Since there are only finitely many non-isomorphic quotiented structures over $C$, we have a finite collection $\theta_{i_1}(P_1,\ldots,P_{n_{i_1}}),\ldots,\theta_{i_r}(P_1,\ldots,P_{n_{i_r}})$ of $L^k(\mathbf{Q})$ formulae constructed as above such that for each structure $A \in C$ there is a $\theta_{i_j}$ having a satisfying assignment on $A$. Let $n = \max\{n_{i_1},\ldots,n_{i_r}\}$. Define, for $1 \leq j \leq r$, $\varphi_j = \theta_{i_j}(P_1,\ldots,P_{n_{i_j}}) \wedge \bigwedge_{n_{i_j}<m\leq n}(P_m = P_{m_{i_j}})$. Consider the formula $\psi = \bigvee_{1\leq j\leq r}\varphi_j$. It is easy to see that $\psi$ defines over $C$ the same query in $IMP(L^k(\mathbf{Q}))$ as defined by $\varphi$. □
Remark: It might look as if the implication (1) ⇒ (3) in the above Theorem can be deduced from the corresponding implication in Theorem 3. This is not so. Given that the number of $IMP(L^k(\mathbf{Q}))$ types is bounded over $C$, the number of $L^k(\mathbf{Q})$-types is also bounded over $C$. By Theorem 3, this gives us that $L^k_{\infty\omega}(\mathbf{Q})$ collapses to $L^k(\mathbf{Q})$ over $C$. However, this does not imply that $IMP_\infty(L^k_{\infty\omega}(\mathbf{Q}))$ collapses to $IMP(L^k(\mathbf{Q}))$ over $C$. In fact, one can easily construct a class of structures on which $L^k_{\infty\omega}$ collapses to $L^k$ but $IMP(L^k_{\infty\omega})$ does not collapse to $IMP(L^k)$.

We give some applications of Theorem 7 below. Notice that the number of $k$-automorphism types bounds the number of $IMP(L^k)$-types in any structure. So the following example is a corollary to Theorem 7 and Theorem 5.

Example 9. Let $C$ be an infinite recursively enumerable trivial class of structures and let $\mathbf{Q}$ be a finite set of $PSPACE$ computable generalized quantifiers. $IMP_\infty(L^k_{\infty\omega}(\mathbf{Q}))$ on $C$ is properly contained in $PSPACE$.

For $PTIME$ we have the following examples.
Example 10. Let $C$ be an infinite recursively enumerable class of cliques and let $\mathbf{Q}$ be a finite set of poly-time computable generalized quantifiers. $IMP_\infty(L^k_{\infty\omega}(\mathbf{Q}))$ on $C$ is properly contained in $PTIME$.
Example 11. Let $E_r$ be an infinite recursively enumerable class of equivalence relations, in which any structure $A$ has equivalence classes of at most $r$ distinct cardinalities, and let $\mathbf{Q}$ be a finite set of poly-time generalized quantifiers. $IMP_\infty(L^k_{\infty\omega}(\mathbf{Q}))$ on $E_r$ is properly contained in $PTIME$.
8
Quantifier Elimination for Implicit Definability
In this section we adapt the technique of quantifier elimination due to Scott Weinstein, used in [3], to isolate $IMP(L^k(\mathbf{Q}))$-types. First, we present a general lemma about quantifier elimination, independent of implicit definability or types. In the following, $\mathbf{Q}$ will be an arbitrary but fixed set of finitely many generalized quantifiers. The lemma below is a version of the essential idea behind the proof of Theorem 4.2 in [3].

Lemma 13. Let $A$, $B$ be two relational structures. Let $S_1,\ldots,S_m$ be $k$-ary relational symbols in the vocabulary of these structures such that $S_1^A,\ldots,S_m^A$ defines a partition on $A^k$ and $S_1^B,\ldots,S_m^B$ defines a partition on $B^k$. Assume that for each $L^k(\mathbf{Q})$ formula $\varphi(\bar{y})$, $\bar{y} \subseteq \{x_1,\ldots,x_k\}$, of quantifier rank at most 1, there is an $I \subseteq \{1,\ldots,m\}$ such that $A, B \models \forall x_1,\ldots,x_k\,(\bigvee_{i\in I} S_i(x_1,\ldots,x_k) \leftrightarrow \varphi)$. Then the following hold.

(i) For every $L^k(\mathbf{Q})$ formula $\varphi(\bar{y})$, $\bar{y} \subseteq \{x_1,\ldots,x_k\}$, there is a $J \subseteq \{1,\ldots,m\}$ such that $A, B \models \forall x_1,\ldots,x_k\,(\bigvee_{i\in J} S_i(x_1,\ldots,x_k) \leftrightarrow \varphi)$.
(ii) $A$ and $B$ satisfy the same set of $L^k(\mathbf{Q})$ sentences.

Proof. (i) By induction on the quantifier rank of $L^k(\mathbf{Q})$ formulae $\varphi$. This uses the quantifier elimination argument as in [3]. (ii) Follows easily from (i).

Let $(A, a_1,\ldots,a_k)$ be given, where $A$ is a structure over vocabulary $\Sigma$. We now give the construction of an $IMP(L^k(\mathbf{Q}))$ formula to isolate the $IMP(L^k(\mathbf{Q}))$ type of $(A, a_1,\ldots,a_k)$. Let $\theta(P_1,\ldots,P_m)$ be an $IMP(L^k(\mathbf{Q}))$ formula (with possibly other auxiliary queries not displayed) such that its satisfying assignments to $P_1,\ldots,P_m$ in $A$ isolate the $IMP(L^k(\mathbf{Q}))$ types of $A^k$ and $a_1,\ldots,a_k \in P_1$. Let $\alpha(P_1,\ldots,P_m)$ be defined as: $\bigwedge\cdots$
Define $\eta_A(P, P_1,\ldots,P_m)$ as: $\theta(P_1,\ldots,P_m) \wedge \forall x_1,\ldots,x_k\,(P(x_1,\ldots,x_k) \leftrightarrow \alpha(P_1,\ldots,P_m) \wedge \beta \wedge P_1(x_1,\ldots,x_k))$. Let $P$ be the principal query in $\eta_A$. (As stated above for $\theta$, there may be some auxiliary queries other than $P, P_1,\ldots,P_m$ in $\eta_A$ which are not displayed, but these will not play any role in the proofs below, so we will not mention them.)

Lemma 14. For any structure $D$, let $P(D), P_1(D),\ldots,P_m(D)$ denote the satisfying assignment of $\eta_A(P, P_1,\ldots,P_m)$. Let $(B, b_1,\ldots,b_k)$ be in the model of the query implicitly defined by $\eta_A(P, P_1,\ldots,P_m)$, that is, $(b_1,\ldots,b_k) \in P(B)$. Then for all $L^k(\mathbf{Q})$ sentences $\varphi$ in the vocabulary $S \cup \{P_1,\ldots,P_m\}$, $(A, P_1(A),\ldots,P_m(A)) \models \varphi$ iff $(B, P_1(B),\ldots,P_m(B)) \models \varphi$.

Proof. Since the query implicitly defined by $\eta_A(P, P_1,\ldots,P_m)$ is nonempty on $B$, $\alpha(P_1,\ldots,P_m)$ and $\beta$ are true on $B$ by the definition of $\eta_A$. If we consider the extended structures $(A, P_1(A),\ldots,P_m(A))$ and $(B, P_1(B),\ldots,P_m(B))$ in the vocabulary $S \cup \{P_1,\ldots,P_m\}$, the hypothesis of Lemma 13 is satisfied and the result follows by part (ii) of that Lemma.

Theorem 8. $IMP(L^k(\mathbf{Q}))$ types can be isolated in $IMP(L^k(\mathbf{Q}))$.

Proof. Let $(A, a_1,\ldots,a_l)$, $l \leq k$, be given. First assume $l = k$. We can construct a formula $\eta_A(P, P_1,\ldots,P_m)$ as above. It is not difficult to show that the $IMP(L^k(\mathbf{Q}))$ query $P$ defined by $\eta_A$ isolates the $IMP(L^k(\mathbf{Q}))$-type of $(A, a_1,\ldots,a_k)$. The case $l < k$ can also be handled easily. Details are given in the full version.

Notice that Lemma 8 is not sufficient to prove Theorem 7, in particular to prove implication (1) ⇒ (2). In the case of $L^k(\mathbf{Q})$, it is clear that the quantifier rank of the formulae isolating types within a structure is bounded by the number of types in the structure. However, this is not clear in the case of isolating $IMP(L^k(\mathbf{Q}))$ types within a structure.
In fact, we do not know any bound on the quantifier rank of the formulae isolating types within a structure in terms of the number of types in the structure alone. Another obstacle is that, for a given quantifier rank, the number of $L^k(\mathbf{Q})$ formulae defining $IMP(L^k(\mathbf{Q}))$ queries is not finite, because the number of query variables is not bounded. However, this does not appear to be a serious obstacle, as it may be handled by stratifying the $\equiv_{IMP(L^k(\mathbf{Q}))}$ relation according to the number of query variables.

9
Conclusion
We have given an elementary proof showing the limitation on the expressive power of fixed point logics in the presence of arity-bounded and complexity-bounded generalized quantifiers. Our results extend many earlier results on the limitations of the expressive power of finitely many generalized quantifiers, yet the proofs do not use any concepts from logic beyond mere definitions. As a result, these techniques also seem applicable to other situations, such as other variants of implicit definability.
Our argument involving quotiented structures seems very natural for establishing that uniform boundedness of types within the structures of a class implies finitely many types (of a fixed variable fragment) over the class. Though the construction of quotiented structures in the presence of generalized quantifiers may look technically tedious, the ideas are simple. This method also seems applicable to logics for which types may not be isolatable. If we are interested in determining whether $L^k_{\infty\omega}(\mathbf{Q})$ or $IMP_\infty(L^k_{\infty\omega}(\mathbf{Q}))$ contains all $PSPACE$ queries on a class of structures, for finitely many $PSPACE$ computable generalized quantifiers $\mathbf{Q}$ (similarly for $PTIME$), then Theorem 3 and Theorem 7 are of limited help, as they characterize the collapse of $L^k_{\infty\omega}(\mathbf{Q})$ etc. to a smaller set, $FO(\mathbf{Q})$ etc. We do not know how to prove a theorem similar to Theorems 3 and 7 to answer the above question.

We mention some open questions related to implicit definability. We do not know if Theorem 7 holds for $IMP(L^k_{\infty\omega}(\mathbf{Q}))$ instead of $IMP_\infty(L^k_{\infty\omega}(\mathbf{Q}))$. Our proof fails because we do not know if $IMP(L^k_{\infty\omega}(\mathbf{Q}))$ is closed under countable conjunctions and disjunctions. In fact, we do not even know if $IMP(L^k_{\infty\omega}) = IMP_\infty(L^k_{\infty\omega})$. Another open question is to prove or disprove that $IMP(L^k)$-types and $IMP_\infty(L^k_{\infty\omega})$-types are the same.
References

1. A. Dawar. Feasible Computation through Model Theory. PhD thesis, University of Pennsylvania, Philadelphia, 1993.
2. A. Dawar, S. Lindell, and S. Weinstein. Infinitary logic and inductive definability over finite structures. Information and Computation, 119(2):160-175, June 1995.
3. A. Dawar and L. Hella. The expressive power of finitely many generalized quantifiers. Information and Computation, 123(2):172-184, 1995. Extended abstract appeared in LICS 94.
4. H. Ebbinghaus and J. Flum. Finite Model Theory. Perspectives in Mathematical Logic. Springer, 1995.
5. L. Hella, P. G. Kolaitis, and K. Luosto. How to define a linear order on finite models. In Ninth Annual IEEE Symposium on Logic in Computer Science, pages 40-49, 1994.
6. P. G. Kolaitis. Implicit definability and unambiguous computation. In Fifth Annual IEEE Symposium on Logic in Computer Science, pages 168-180, 1990.
7. S. Lindell. An analysis of fixed-point queries on binary trees. Theoretical Computer Science, 85(1):75-95, 1991.
8. A. Seth. When do fixed point logics capture complexity classes? In Tenth Annual IEEE Symposium on Logic in Computer Science, 1995.
Improved Lowness Results for Solvable Black-box Group Problems

N. V. Vinodchandran
Institute of Mathematical Sciences, Chennai 600113, India.
E-mail: vinod@imsc.ernet.in

Abstract. In order to study the complexity of computational problems that arise from group theory in a general framework, Babai and Szemerédi [4, 2] introduced the theory of black-box groups. They proved that several problems over black-box groups are in the class NP ∩ co-AM, thereby implying that these problems are low (powerless as an oracle) for $\Sigma_2^p$ and hence cannot be complete for NP unless the polynomial hierarchy collapses. In [1], Arvind and Vinodchandran study the counting complexity of a number of computational problems over solvable groups. Using a constructive version of a fundamental structure theorem about finite abelian groups and a randomized algorithm from [3] for computing generator sets for the commutator series of any solvable group, they prove that these problems are in randomized versions of low-complexity counting classes like SPP and LWPP, and hence low for the class PP. In this paper, we improve the upper bounds of [1] for these problems. More precisely, we avoid the randomized algorithm from [3] for computing the commutator series. This immediately places all these problems in either SPP or LWPP. These upper bounds imply lowness of these problems for classes other than PP. In particular, SPP is low for all gap-definable counting classes [9] (PP, C=P, Mod_kP etc.) and LWPP is known to be low for PP and C=P. These results are in favor of the belief that these problems are unlikely to be complete for NP.
1
Introduction
Computational problems in group theory have recently been of considerable interest in computational complexity theory. One of the main reasons for this is that the exact complexity of many of the computational problems that arise from group theory is not exactly characterized. For example, consider the basic problem of testing membership in matrix groups over finite fields, presented by a set of generators. While no polynomial time algorithm is known for this problem, it is not known whether it is hard for NP. Although, in the case of permutation groups, polynomial time algorithms are known for many computational problems like testing membership and computing the order [11], there are many other problems over permutation groups whose complexity is not exactly characterized [13, 14]. In order to study the complexity of computational group theoretic problems in a generalized framework, Babai and Szemerédi [4] introduced the theory of
black-box groups. Intuitively speaking, in this framework we have an infinite family of abstract groups. The elements of each group in the family are uniquely encoded as strings of uniform length. The group operations (product, inverse, etc.) are assumed to be provided by a group oracle and hence are easily computable. Black-box groups are subgroups of groups from such a family, and they are presented by generator sets. For example, matrix groups over finite fields and permutation groups presented by generator sets can be seen as examples of black-box groups. It is shown in [4] that Membership Testing in general black-box groups (and many other computational problems over black-box groups) is in NP.

A central problem considered in the papers of Babai and Szemerédi [4, 2] is Order Verification: given a black-box group G presented by a generator set and a positive integer n, verify that |G| = n. This problem is important because it turns out that several other problems reduce to Order Verification. In [2], using randomization to compute approximate lower bounds and sophisticated group-theoretic tools, it is shown that Order Verification for general black-box groups is in AM. As a consequence, it turns out that several problems for black-box groups (e.g., Membership Testing, Group Intersection, etc.) are in NP ∩ co-AM. It follows that these problems cannot be NP-complete unless the polynomial hierarchy collapses to $\Sigma_2^p$ [6, 16]. These results strongly indicate that these problems may be of intermediate complexity between P and NP.¹

Recently, there has been interest in studying the complexity of computational group-theoretic problems with respect to counting complexity classes. In [15], Köbler et al. have looked at the counting complexity of Graph Isomorphism and some group-theoretic problems like Group Intersection, Group Factorization, etc., over permutation groups. They show that these problems are in counting classes that are low for PP and C=P.
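As a toy illustration of the black-box setup described above (our sketch, not from the paper): elements of $Z_n$ are hidden behind fixed-length string labels chosen by a secret bijection, the oracle supplies product and inverse, and a client can test membership in the subgroup generated by a set only through oracle calls. All class and function names here are hypothetical.

```python
import random

class BlackBoxZn:
    """Toy black-box group: Z_n under addition, elements hidden behind
    fixed-length bit-string labels chosen by a secret random bijection."""
    def __init__(self, n, seed=0):
        self.n = n
        width = max(1, (n - 1).bit_length())
        codes = [format(i, "0{}b".format(width)) for i in range(n)]
        random.Random(seed).shuffle(codes)
        self._enc = dict(enumerate(codes))           # secret encoding
        self._dec = {c: i for i, c in self._enc.items()}
    def label(self, i):
        """For setting up examples only; a black-box client never sees this."""
        return self._enc[i % self.n]
    def identity(self):
        return self._enc[0]
    def mul(self, a, b):                             # oracle: product
        return self._enc[(self._dec[a] + self._dec[b]) % self.n]
    def inv(self, a):                                # oracle: inverse
        return self._enc[(-self._dec[a]) % self.n]

def member(group, gens, target):
    """Membership in <gens> by naive closure under oracle calls
    (exponential in general; fine for a toy)."""
    closure = {group.identity()}
    frontier = set(gens) | {group.inv(g) for g in gens}
    while frontier:
        closure |= frontier
        frontier = {group.mul(a, g) for a in closure for g in frontier}
        frontier -= closure
    return target in closure

G = BlackBoxZn(8)
print(member(G, [G.label(2)], G.label(4)))  # 4 lies in <2> = {0, 2, 4, 6}
print(member(G, [G.label(2)], G.label(1)))
```

The point of the model is that the client code in `member` uses only the oracle operations; everything it learns about the group structure comes through `mul` and `inv` on opaque labels.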
Their results are also in favor of the belief that these problems are not complete for NP. Motivated by the above results, in [1], Arvind and Vinodchandran study the counting complexity of a number of problems over solvable black-box groups.² Since solvable groups are a generalization of abelian groups, they first consider these problems over abelian groups. Using a constructive version of a fundamental theorem on the structure of finite abelian groups, the authors prove tight upper bounds on the counting complexity of these problems. More precisely, they show that, over abelian groups, the problems Membership Testing, Order Verification, Group Isomorphism and Group Intersection are in the class SPP. Since SPP is low for any gap-definable [9] counting class, these upper bounds imply that these problems are low for the classes PP, C=P and Mod_kP for any k ≥ 2.

¹ It is interesting to note that, while most of the natural problems that are not known to be in P are proved to be NP-complete, candidates for natural problems of intermediate complexity are very few. The problem of testing whether two labeled graphs are isomorphic (in short, GI) is one such candidate.
² It is worth noting that solvable finite groups constitute a very large subclass of all finite groups. The celebrated Feit-Thompson Theorem states that all finite groups of odd order are solvable.
They also show lowness of the problems Group Factorization, Coset Intersection and Double-Coset Membership over abelian groups, for PP and C=P. In the case of solvable groups, using a randomized algorithm for computing the commutator series of a solvable black-box group from [3], and a constructive version of the fundamental theorem for abelian factor groups, in order to construct a special set of generators called a canonical generator set, the authors of [1] were able to show that all the above-mentioned problems are in randomized versions of SPP or LWPP (more precisely, in $ZPP^{SPP}$ or in $ZPP^{LWPP}$). These classes are low for the probabilistic class PP. Because of the randomization involved as a bottleneck in the results for solvable groups in [1], the lowness properties of these problems for classes like C=P, Mod_kP, and other gap-definable counting classes were not clear from [1]. The present paper overcomes this bottleneck. The main contribution of this paper is the design of a deterministic oracle Turing machine for converting a set of generators of a solvable group to a canonical generator set. Thus, by avoiding the randomized algorithm of [3] for computing a generator set for the commutator series, we improve the upper bounds on the counting complexity of all the above-mentioned problems over solvable black-box groups. More precisely, we show that, over solvable black-box groups, the problems Solvability Testing, Membership Testing, Order Verification, Group Isomorphism and Group Intersection are in the class SPP, thereby implying that these problems are low for any gap-definable counting class. We also show that the problems Group Intersection, Group Factorization, Coset Intersection and Double-Coset Membership over solvable groups are in the class LWPP. This implies that these problems over solvable groups are low for C=P also.
These results indicate that, at least with respect to counting complexity, problems over solvable groups may not be harder than their counterparts over abelian groups. The rest of the paper is organized as follows. In Section 2, we give the complexity-theoretic and group-theoretic notations and definitions which are necessary for the paper. The definition of a canonical generator set for solvable groups and results relating to it are also given. Section 3 is devoted to the design of a deterministic oracle algorithm, CanonicalGenerator, for computing a canonical generator set for a solvable group from an arbitrary generator set. Finally, in Section 4, we give definitions of the computational problems that we are interested in, and improve the upper bounds on the counting complexity of these problems, using the algorithm given in Section 3.

2
Preliminaries
We refer the reader to [5] for standard complexity-theoretic definitions. Here we give only the minimal notations and definitions. We fix the finite alphabet Σ = {0, 1}. Let A, B ⊆ Σ* be two languages. The language 0A ∪ 1B is denoted by A ⊕ B. In [9], a uniform method is given for defining a number of counting classes using GapP, the class of gap-definable functions. Those classes that can be defined in this manner are called gap-definable classes. Refer to [9] for details.
We give explicit definitions of only two classes, SPP and LWPP. Let GapP [9] denote the class of gap-definable functions. A language L is in LWPP if there are functions f ∈ GapP and h ∈ FP such that: x ∈ L implies that f(x) = h(|x|), and x ∉ L implies that f(x) = 0. A language L is in SPP if there is an f ∈ GapP such that: x ∈ L implies that f(x) = 1, and x ∉ L implies that f(x) = 0. It follows that SPP ⊆ LWPP. The concept of lowness is a well-studied notion in structural complexity theory. Intuitively, we say that a class D is low for a class C if D is powerless as an oracle to a machine accepting languages in C. Next we give the formal definition of lowness.

Definition 2.1 Let C be any complexity class which allows natural relativization. Then the class D is said to be low for C if for all L ∈ D, C^L = C, where C^L denotes the complexity class obtained by relativizing C with respect to the language L.

The main interest in these classes is because of the following theorem, proved in [9], regarding their complexity. This result indicates that SPP and LWPP are counting classes of low complexity. It is believed that problems in SPP or LWPP cannot be complete for NP.

Theorem 2.2 ([9]) The class SPP is low for all gap-definable counting classes. The class LWPP is low for PP and C=P.

Let M be an oracle NP machine, let A ∈ NP be accepted by an NP machine N, and let f ∈ FP. We say that M^A makes f(n)-guarded queries if, on length-n inputs, M^A only asks queries y for which N(y) has either 0 or f(n) accepting paths, for each n. In this terminology, we state a weaker version of a theorem from [15] which gives a method for proving membership of some languages in the classes SPP and LWPP.

Theorem 2.3 ([15]) Let M be a deterministic polynomial-time oracle machine and let f be a polynomial-time computable function.
If A ∈ NP is such that M^A makes f(n)-guarded queries, then there is a polynomial q such that the function h, where h(x) = f(|x|)^{q(|x|)} if M^A on input x accepts and h(x) = 0 if M^A on input x rejects, is in GapP.

For example, to prove that a language L is in SPP, it is enough to show the existence of a deterministic polynomial-time oracle machine accepting L which makes only 1-guarded queries to a language in NP. We now give some basic group-theoretic definitions. The reader can refer to standard textbooks [7, 12] for details. Let G be a finite group. For an element g ∈ G, the order of g (denoted o(g)) is the smallest positive integer such that g^{o(g)} = e, where e is the identity of G. The order of G is denoted |G|. A subset H of G is a subgroup of G (denoted H ≤ G) if H is a group under the group operation of G. Let X ⊆ G. The group generated by X is the smallest subgroup of G containing X and is denoted ⟨X⟩. A group G is abelian if ∀g₁, g₂ ∈ G : g₁g₂ =
g₂g₁. Observe that if S is a set of generators for G, then G is abelian iff ∀g₁, g₂ ∈ S : g₁g₂ = g₂g₁. The element xyx⁻¹y⁻¹ is called the commutator of elements x and y in G. The subgroup of G generated by the set {xyx⁻¹y⁻¹ | x, y ∈ G} is called the commutator subgroup of G. We denote the commutator subgroup of G by G'. It is well known that G' is a normal subgroup of G and the factor group G/G' is abelian. For a group G, the sequence G = G₀ > G₁ > ..., where each group G_i is the commutator subgroup of G_{i−1}, is called the commutator series of G. A group G is said to be solvable if the commutator series terminates in the trivial subgroup {e} in finitely many steps. Now we define the notion of black-box groups.

Definition 2.4 A group family is a countable sequence B = {B_m}_{m≥1} of finite groups B_m such that there are polynomials p and q satisfying the following conditions. For each m ≥ 1, elements of B_m are uniquely encoded as strings in Σ^{p(m)}. The group operations (inverse, product, and testing for identity) of B_m can be performed in time bounded by q(m), for every m ≥ 1. The order of B_m is computable in time bounded by q(m), for each m. We refer to the groups B_m of a group family and their subgroups (presented by generator sets) as black-box groups.³ A class C of finite groups is said to be a subclass of B if every G ∈ C is a subgroup of some B_m ∈ B.

For example, let S_n denote the permutation group on n elements. Then {S_n}_{n≥1} is a group family of all permutation groups S_n. As another example, let GL_n(q) denote the group of all n × n invertible matrices over the finite field F_q of size q. The collection GL(q) = {GL_n(q)}_{n≥1} is a group family. The class of all solvable subgroups, {G | G ≤ GL_n(q) for some n and G is solvable}, is a subclass of GL(q).
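To make the commutator-series definitions concrete, the following sketch (ours, not from the paper; plain Python with permutations on {0, ..., n−1} encoded as tuples) generates a finite group from a set of permutations by closure, computes its commutator subgroup by brute force, and iterates the series to test solvability. This is only an illustration for tiny groups; the paper's algorithms work with oracle access instead of explicit enumeration.

```python
def compose(p, q):
    # (p o q)(i) = p(q(i)); permutations as tuples
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

def generate(gens):
    # <gens>: close a finite set of permutations under the product
    ident = tuple(range(len(next(iter(gens)))))
    G, frontier = {ident}, set(gens)
    while frontier:
        members = G | frontier
        new = set()
        for a in members:
            for b in members:
                c = compose(a, b)
                if c not in members:
                    new.add(c)
        G, frontier = members, new
    return G

def commutator_subgroup(G):
    # subgroup generated by all commutators x y x^-1 y^-1
    comms = {compose(compose(x, y), compose(inverse(x), inverse(y)))
             for x in G for y in G}
    return generate(comms)

def is_solvable(gens):
    # iterate the commutator series G > G' > G'' > ... down to {e}
    G = generate(set(gens))
    while len(G) > 1:
        H = commutator_subgroup(G)
        if len(H) == len(G):   # series stabilized above {e}
            return False
        G = H
    return True
```

For instance, S₃ (generated by a transposition and a 3-cycle) is solvable, while S₅ is not, since its series stabilizes at A₅.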
In this paper we are interested in the complexity of computational group-theoretic problems (the exact definitions of the problems we consider are given in Section 4) when the groups involved are solvable. But, since solvable groups are a generalization of abelian groups, some remarks about the complexity of these problems over abelian black-box groups are in order. For proving tight upper bounds on the counting complexity of the above-mentioned problems over abelian groups in [1], the authors employ a constructive version of a fundamental theorem about the structure of finite abelian groups. This theorem says that any finite abelian group G can be uniquely represented as a direct product of some cyclic subgroups of G. One of the immediate consequences of this theorem is the existence of a special generator set, called an independent generator set, for any abelian group. To be precise, let G be a finite abelian group. An element g ∈ G is said to be independent of a set X ⊆ G if ⟨g⟩ ∩ ⟨X⟩ = {e}. A generator set S of G is an independent generator set for G if every g ∈ S is independent of S − {g}. One of the very useful properties of independent generator sets is the following. Let S be an independent generator set for an abelian group G. Then

³ Note that the black-box groups we define above are a restricted version of the black-box groups introduced in [4]. The black-box group defined in [4] is technically more general: there, the black-box group is defined so as to incorporate factor groups.
for any g ∈ G, there exist unique indices l_h for h ∈ S, l_h ≤ o(h), such that g = ∏_{h∈S} h^{l_h}. Hence membership testing in G can be done in a "1-guarded way" if G is presented by an independent generator set. In [1], an algorithm is given for converting a given generator set to an independent generator set, which is used in proving the upper bounds on the counting complexity for problems over abelian black-box groups. For proving the upper bounds for problems over solvable black-box groups in [1], the authors introduce a generalization of the notion of independent generator set, called a canonical generator set, for any class of finite groups. We now give the definition of a canonical generator set. The existence of canonical generator sets for the class of solvable groups is shown in [1].
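The unique-representation property can be checked by brute force on a small abelian group. The sketch below is our illustration, not from the paper: it uses the additive group Z₂ × Z₄ (component-wise addition, rather than the paper's multiplicative notation) and exponent ranges 0 ≤ l_h < o(h), which is equivalent modulo o(h). It counts how many exponent vectors represent a given element; for an independent generator set the count is exactly one.

```python
from itertools import product

MOD = (2, 4)  # the abelian group Z_2 x Z_4, written additively

def add(a, b):
    return tuple((x + y) % m for x, y, m in zip(a, b, MOD))

def order(g):
    # smallest k >= 1 with k*g equal to the identity
    e, x, k = (0, 0), g, 1
    while x != e:
        x, k = add(x, g), k + 1
    return k

def representations(S, g):
    # count exponent vectors (l_h), 0 <= l_h < o(h), with sum of l_h * h = g
    count = 0
    for ls in product(*(range(order(h)) for h in S)):
        acc = (0, 0)
        for l, h in zip(ls, S):
            for _ in range(l):
                acc = add(acc, h)
        if acc == g:
            count += 1
    return count
```

With the independent set {(1,0), (0,1)} every element has exactly one representation; adding the dependent generator (1,1) makes every element have several.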
Definition 2.5 Let B = {B_m}_{m>0} be any group family. Let C be a subclass of B. The class of groups C has canonical generator sets if for every G ∈ C with G ≤ B_m there is an ordered set S = {g₁, g₂, ..., g_s} ⊆ G such that each g ∈ G can be uniquely expressed as g = g₁^{l₁} g₂^{l₂} ⋯ g_s^{l_s}, where 0 ≤ l_i < o(g_i), 1 ≤ i ≤ s. Furthermore, s ≤ q(m) for a polynomial q. S is called a canonical generator set for G.

Notice that the above definition is a generalization of the definition of an independent generator set in the sense that the uniqueness property of the indices is preserved. Now, define a language L as follows.
L = {(0^m, S, g) | S ⊆ B_m, g ∈ B_m; ∀h ∈ S ∃l_h, 1 ≤ l_h ≤ o(h), such that g = ∏_{h∈S} h^{l_h}}

The following proposition brings out the fact that the language L can act as a "pseudo Membership Testing" in the sense that if S is a canonical generator set, then (0^m, S, g) ∈ L if and only if g ∈ ⟨S⟩. More importantly, in this case the NP machine M (given in the proposition) will have a unique accepting path for those instances inside L.
Proposition 2.6 Let B = {B_m}_{m>0} be any group family. Then there exists an NP machine M witnessing L ∈ NP. Let C be a subclass of B which has canonical generator sets and let S be a canonical generator set for ⟨S⟩ ∈ C. Then M will have a unique accepting path if g ∈ ⟨S⟩, and M will have no accepting path if g ∉ ⟨S⟩. The behavior of M is unspecified if the input does not satisfy the promise.

Proof. In [10], it is shown that checking whether a number is prime is in the class UP. Using this, one can easily design an unambiguous nondeterministic polynomial-time transducer which computes the prime factorization of any number. Let M' be such a machine. Now, it is easy to see that the order of any g ∈ B_m can be computed if the prime factorization of |B_m| is given. So, M first computes |B_m| in polynomial time. Then, by simulating M', it computes the prime factorization of |B_m| and computes the order o(h) for all h ∈ S. Now, M guesses indices l_h such that 1 ≤ l_h ≤ o(h), accepts if g = ∏_{h∈S} h^{l_h}, and rejects otherwise. From the definition of a canonical generator set, it follows that M has the behavior described in the proposition.
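The proof uses the fact that o(g) is computable once the prime factorization of the group order is known. A standard way to do this (our sketch, not from the paper) is to start from the group order and strip out prime factors while the corresponding power of g is still the identity; since o(g) divides |G|, the remaining value is exactly o(g). The example instantiates it for the multiplicative group modulo 13, whose order 12 = 2²·3 has a known factorization.

```python
def order_from_factorization(g, n, primes, mul, e):
    # order of g in a group of order n, given the distinct prime
    # divisors of n: strip each prime p from k while g^(k/p) is still e
    def power(x, k):
        # square-and-multiply exponentiation in the black-box group
        acc, base = e, x
        while k:
            if k & 1:
                acc = mul(acc, base)
            base = mul(base, base)
            k >>= 1
        return acc
    k = n
    for p in primes:
        while k % p == 0 and power(g, k // p) == e:
            k //= p
    return k
```

For example, modulo 13 the element 2 is a primitive root (order 12), while 3 has order 3 and 12 has order 2.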
The next lemma shows the existence of canonical generator sets for any solvable group.

Lemma 2.7 ([1]) Let B = {B_m}_{m>0} be a group family such that |B_m| ≤ 2^{q(m)} for a polynomial q. Let G ≤ B_m be a finite solvable group and let G = G₀ > G₁ > ⋯ > G_{k−1} > G_k = {e} be the commutator series of G. Let T_i = {h_{i1}, h_{i2}, ..., h_{is_i}} be a set of distinct coset representatives corresponding to an independent set of generators for the abelian group H_i = G_{i−1}/G_i. Then for any i, 1 ≤ i ≤ k, the ordered set⁴ S_i = ∪_{j=i}^{k} T_j forms a canonical generator set for the group G_i, and |S_i| ≤ q(m). Thus the class of solvable groups from B has canonical generator sets.

The basic steps implicitly involved in the upper bound proofs given in [1], for problems over solvable black-box groups, are the following.
1. A deterministic oracle algorithm (let us call it CANONIZE) is developed which takes an arbitrary set of generators for the commutator series of a solvable black-box group as input and converts it into a canonical generator set by making 1-guarded queries to a language in NP.
2. By carefully combining the algorithm CANONIZE with a randomized algorithm from [3] for computing generator sets of the commutator series of any solvable black-box group, a randomized oracle algorithm (let us call it RAND CANONICAL GENERATOR) is given for converting a generator set of any solvable group to a canonical generator set (making 1-guarded queries to an NP language).
3. RAND CANONICAL GENERATOR is then easily modified to give membership of many computational problems over solvable groups in randomized counting classes which are low for PP.

In this paper, we avoid the randomization involved in step 2. More precisely, by using the algorithm CANONIZE as a subroutine, we give a deterministic oracle algorithm CANONICAL GENERATOR (which makes 1-guarded queries to an NP language) for converting an arbitrary generator set to a canonical generator set, for any solvable black-box group G.
This immediately gives improved upper bounds on the counting complexity of many problems over solvable groups, which in turn gives lowness of these problems for many counting classes. In the next section we present the algorithm CANONICAL GENERATOR for converting an arbitrary generator set to a canonical generator set, for any solvable group. Since we will be using the algorithm CANONIZE as a subroutine in CANONICAL GENERATOR, we describe the behavior of CANONIZE as a theorem.

Theorem 2.8 ([1]) Let B = {B_m}_{m>0} be a group family. Then there is a deterministic oracle machine CANONIZE and a language L' ∈ NP such that CANONIZE takes (0^m, S₀, ..., S_k), S_i ⊆ B_m, as input and L' as oracle. Suppose the

⁴ The elements of the set ∪_{j=i}^{k} T_j are ordered on increasing values of the index j, and lexicographically within each set T_j.
input satisfies the promise that ⟨S₀⟩ is solvable and, for 0 ≤ i ≤ k, S_i generates the i-th commutator subgroup of ⟨S₀⟩. Then CANONIZE outputs canonical generator sets for ⟨S_i⟩ for 0 ≤ i ≤ k. Moreover, CANONIZE runs in time polynomial in the length of the input and makes only 1-guarded queries to L'. The behavior of CANONIZE is unspecified if the input does not satisfy the promise.

3 Computing a Canonical Generator Set
This section is devoted to the proof of the following theorem.

Theorem 3.1 Let B = {B_m}_{m>0} be a group family. Then there is a language L_cg ∈ NP and a deterministic oracle machine CANONICAL GENERATOR that takes (0^m, S) as input and L_cg as oracle, and outputs a canonical generator set for ⟨S⟩ if ⟨S⟩ is solvable and outputs NOT SOLVABLE otherwise. Moreover, CANONICAL GENERATOR runs in time polynomial in the length of the input and makes only 1-guarded queries to L_cg.

Before going into the formal proof of the theorem, we give the basic ideas behind the proof. Let S be a set of generators for a solvable group. Let ⟨S⟩ = G₀ > ... > G_i > ... > G_k = {e} be the commutator series of ⟨S⟩. We are interested in computing short generator sets for each G_i. It is clear that this problem essentially boils down to the problem of computing a generator set for the commutator subgroup of any group. The following theorem provides a method for this computation. The proof is standard group theory.

Theorem 3.2 Let G be a finite group generated by the set S. Then the commutator subgroup of G is the normal closure of the set {ghg⁻¹h⁻¹ | g, h ∈ S} in G.

The above theorem gives us the following easy polynomial-time oracle algorithm COMMUTATOR SUBGROUP, which takes (0^m, S) as input and Membership Testing as oracle and computes a generator set for the commutator subgroup of ⟨S⟩.
COMMUTATOR SUBGROUP(0^m, S)
1  X ← {ghg⁻¹h⁻¹ | g, h ∈ S}
2  while ∃g ∈ S, x ∈ X such that (0^m, X, gxg⁻¹) ∉ Membership Testing
3    do X ← X ∪ {gxg⁻¹}
4  end-while
5  Output X

It easily follows from Theorem 3.2 that COMMUTATOR SUBGROUP, on input (0^m, S), outputs a generator set for the commutator subgroup ⟨S⟩'. Let X_i be the set X at the beginning of the i-th iteration of the while-loop. If, after the i-th iteration, no new element is added to X_i, then X_i is output. Otherwise, if X_{i+1} = X_i ∪ {g}, it follows from Lagrange's theorem that |⟨X_{i+1}⟩| ≥ 2|⟨X_i⟩|. Hence the number of iterations of the while-loop is bounded by p(m).
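The oracle algorithm above can be rendered directly in code if the Membership Testing oracle is replaced by brute-force group closure, which is feasible only for tiny groups. In the sketch below (ours; permutations as tuples, all helper names are our own), the normal-closure loop mirrors lines 1–5 of COMMUTATOR SUBGROUP.

```python
def compose(p, q):
    # (p o q)(i) = p(q(i)); permutations as tuples
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

def closure(gens):
    # brute-force <gens>; stands in for the Membership Testing oracle
    ident = tuple(range(len(next(iter(gens)))))
    G, frontier = {ident}, set(gens)
    while frontier:
        G |= frontier
        frontier = {compose(a, b) for a in G for b in G} - G
    return G

def commutator_subgroup_gens(S):
    # Theorem 3.2: normal closure in <S> of the commutators of generators
    X = {compose(compose(g, h), compose(inverse(g), inverse(h)))
         for g in S for h in S}
    H = closure(X)
    changed = True
    while changed:
        changed = False
        for g in S:
            for x in list(X):
                c = compose(compose(g, x), inverse(g))
                if c not in H:        # the query (0^m, X, g x g^-1)
                    X.add(c)
                    H = closure(X)
                    changed = True
    return X
```

For S₄ generated by a transposition and a 4-cycle, the returned set generates the derived subgroup A₄ of size 12.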
Since in the above algorithm the queries to the Membership Testing oracle may not be 1-guarded, a straightforward adaptation of the algorithm for computing generator sets for all elements in the commutator series seems difficult. Suppose we can make sure that whenever a query (0^m, X, g) to Membership Testing is made, X is a canonical generator set for the solvable group ⟨X⟩; then, from Proposition 2.6, we can replace the Membership Testing oracle with the NP language L, and it follows that the query will be 1-guarded. We ensure this promise by constructing the commutator series in stages. Let S_i^j denote the partial generator set for the i-th element in the commutator series of G₀ constructed at the end of stage j − 1. At stage 1 we have S₁¹ = S and S_i¹ = {e} for 1 < i ≤ p(m), where p is the polynomial bounding the length of any element in the group family. The input to stage j is the tuple (i, S_i^j, ..., S_{p(m)}^j) such that for l > i, S_l^j is a canonical generator set for the solvable group ⟨S_l^j⟩. At the end of the stage, we update each S_i^j to S_i^{j+1} such that ⟨S_i^{j+1}⟩ is still a subgroup of G_i, the i-th commutator subgroup of G₀. To keep the running time within a polynomial bound, we make sure that within every p(m) stages there exists a k such that the k-th partial commutator subgroup doubles in size. Then, from Lagrange's theorem, it will follow that the commutator series is generated after p³(m) stages. We now formally prove the theorem.
Proof. (of Theorem 3.1) We first give the formal description of the algorithm CANONICAL GENERATOR and then prove its correctness. CANONICAL GENERATOR uses the oracle algorithms CHECK COMMUTATOR and CANONIZE as subroutines. CHECK COMMUTATOR takes as input (0^m, X, Y) with X, Y ⊆ B_m and checks whether ⟨Y⟩ contains the commutator subgroup of ⟨X⟩. This is done by first checking whether the commutators of all the elements in X are in ⟨Y⟩. If this is not the case, the algorithm returns such a commutator. Otherwise, it further checks whether ⟨Y⟩ is normal in ⟨X⟩. Notice that to do this it is enough to verify that ∀x ∈ X, y ∈ Y : xyx⁻¹ ∈ ⟨Y⟩. If this condition is false, the algorithm returns an element xyx⁻¹ ∉ ⟨Y⟩. If both conditions are true, it follows from Theorem 3.2 that ⟨Y⟩ contains the commutator subgroup
of ⟨X⟩. CHECK COMMUTATOR makes oracle queries to the language L (defined in the previous section) for testing membership in ⟨Y⟩. It should be noted that, for CHECK COMMUTATOR to work as intended, Y should be a canonical generator set for the group ⟨Y⟩. We will make sure that CANONICAL GENERATOR makes calls to CHECK COMMUTATOR with (0^m, X, Y) as input only when Y is a canonical generator set for the solvable group ⟨Y⟩. A formal description of the subroutine CHECK COMMUTATOR is given below.
CHECK COMMUTATOR(0^m, X, Y)
1   if ∃x₁, x₂ ∈ X such that (0^m, Y, x₁x₂x₁⁻¹x₂⁻¹) ∉ L
2     then g ← x₁x₂x₁⁻¹x₂⁻¹
3          Return g
4     else if ∃x ∈ X, y ∈ Y such that (0^m, Y, xyx⁻¹) ∉ L
5       then g ← xyx⁻¹
6            Return g
7       else g ← YES
8            Return g
9     end-if
10  end-if
The subroutine CANONIZE is the algorithm promised by Theorem 2.8 for computing a canonical generator set for a solvable black-box group G, given an arbitrary generator set for the commutator series of G. CANONIZE makes 1-guarded queries to the NP language L' if the input satisfies the promise given in Theorem 2.8. We use the notation CANONIZE(·)_l to denote the generator set produced by CANONIZE for the l-th element G_l in the commutator series of G. The following is the description of the algorithm CANONICAL GENERATOR. Define the language L_cg as L_cg = L' ⊕ L. Notice that the oracle access to L_cg is implicit in the description; that is, CANONICAL GENERATOR queries L' through the subroutine CANONIZE and L through CHECK COMMUTATOR.
CANONICAL GENERATOR(0^m, S)
1   Stage 0
2     S₁¹ ← S; S_i¹ ← {e} for 1 < i ≤ p(m)
3     i ← 1
4   Stage j (Input to this stage is (i, S_i^j, ..., S_{p(m)}^j))
5     k ← i
6     g ← CHECK COMMUTATOR(0^m, S_k^j, S_{k+1}^j)
7     while g ≠ YES
8       do S_{k+1}^j ← S_{k+1}^j ∪ {g}
9          k ← k + 1
10         if k = p(m)
11           then Output NOT SOLVABLE
12         end-if
13         g ← CHECK COMMUTATOR(0^m, S_k^j, S_{k+1}^j)
14    end-while
15    if k = 1
16      then Output CANONIZE(S₁^j, S₂^j, ..., S_{p(m)}^j)₁
17      else S_l^{j+1} ← S_l^j for 1 ≤ l ≤ (k − 1)
18           S_l^{j+1} ← CANONIZE(S_k^j, ..., S_{p(m)}^j)_l for k ≤ l ≤ p(m)
19           i ← (k − 1)
20           goto Stage j + 1
21    end-if
Now we are ready to prove the correctness of CANONICAL GENERATOR. We first prove a series of claims, from which the correctness will follow easily.

Claim 3.2.1 In the algorithm CANONICAL GENERATOR, at any stage j, it holds that ∀i, 1 ≤ i < p(m): ⟨S_{i+1}^j⟩ ≤ ⟨S_i^j⟩'.
Proof. We prove this by induction on the stages. For the base case, when j = 0, it is clear that the claim holds. Assume that it is true for the (j − 1)-th stage. Now consider S_{i+1}^j and S_i^j. Depending on how the sets S_{i+1}^j and S_i^j are updated in lines 17–18 of CANONICAL GENERATOR, we have the following cases.

Case 1. S_i^j = S_i^{j−1}; S_{i+1}^j = S_{i+1}^{j−1}. In this case, from the induction hypothesis, it is clear that ⟨S_{i+1}^j⟩ ≤ ⟨S_i^j⟩'.

Case 2. S_i^j = S_i^{j−1} ∪ {g_i}; S_{i+1}^j = S_{i+1}^{j−1}. From the induction hypothesis, it follows that ⟨S_{i+1}^j⟩ = ⟨S_{i+1}^{j−1}⟩ ≤ ⟨S_i^{j−1}⟩' ≤ ⟨S_i^j⟩'.

Case 3. S_i^j = S_i^{j−1}; S_{i+1}^j = S_{i+1}^{j−1} ∪ {g_{i+1}}. The element g_{i+1} is added to the set S_{i+1}^{j−1} at line 8 of the algorithm, where g_{i+1} is the element returned by the subroutine CHECK COMMUTATOR. Suppose g_{i+1} is a commutator of the set S_i^j = S_i^{j−1}. Then g_{i+1} = xyx⁻¹y⁻¹ for some elements x, y ∈ S_i^j. From the induction hypothesis and the definition of the commutator subgroup of a group, it follows that ⟨S_{i+1}^j⟩ = ⟨S_{i+1}^{j−1} ∪ {g_{i+1}}⟩ ≤ ⟨S_i^j⟩'. On the other hand, suppose g_{i+1} is of the form xyx⁻¹ for some x ∈ S_i^j = S_i^{j−1} and y ∈ S_{i+1}^{j−1}. We have ⟨S_{i+1}^{j−1}⟩ ≤ ⟨S_i^{j−1}⟩' = ⟨S_i^j⟩'. But we know that ⟨S_i^j⟩' is normal in ⟨S_i^j⟩. So, in particular, g_{i+1} ∈ ⟨S_i^j⟩' and hence ⟨S_{i+1}^j⟩ ≤ ⟨S_i^j⟩'.

Case 4. S_i^j = S_i^{j−1} ∪ {g_i}; S_{i+1}^j = S_{i+1}^{j−1} ∪ {g_{i+1}}. From the induction hypothesis, we have ⟨S_{i+1}^{j−1}⟩ ≤ ⟨S_i^{j−1}⟩'. It follows that ⟨S_{i+1}^{j−1}⟩ ≤ ⟨S_i^{j−1} ∪ {g_i}⟩'. Now, using a very similar argument as in Case 3, it is easy to show that ⟨S_{i+1}^j⟩ ≤ ⟨S_i^j⟩'. Hence the claim.
Claim 3.2.2 In CANONICAL GENERATOR, the input (i, S_i^j, S_{i+1}^j, ..., S_{p(m)}^j) to any stage j is such that for all i < t ≤ p(m), S_t^j is a canonical generator set for the solvable group ⟨S_t^j⟩.

Proof. We prove this by induction. For j = 1, it is easily verified that the claim is true. Let us assume that the claim is true for the j-th stage. Let (i, S_i^j, ..., S_{p(m)}^j) be the input to the j-th stage. Suppose the while-loop is exited through line 14 after l iterations with the value of g = YES. (If the loop is exited through line 11, then there are no more stages to be considered.) Then the value of k is i + l and, for t > k, S_t^j is not updated inside the loop; hence, by the induction hypothesis, it remains a canonical generator set for the solvable group ⟨S_t^j⟩. Since the value of g = CHECK COMMUTATOR(0^m, S_k^j, S_{k+1}^j) is YES, we have that ⟨S_k^j⟩' ≤ ⟨S_{k+1}^j⟩. From Claim 3.2.1 we have ⟨S_{k+1}^j⟩ ≤ ⟨S_k^j⟩'. Hence ⟨S_{k+1}^j⟩ = ⟨S_k^j⟩'. It follows that ⟨S_k^j⟩ is solvable and that the sets S_t^j for k ≤ t ≤ p(m) are generator sets for the commutator series of ⟨S_k^j⟩. Hence at line 18, CANONIZE will output a canonical generator set for each of the elements in the commutator series of ⟨S_k^j⟩. At line 19, i is updated to k − 1, and the input to the (j + 1)-th stage is (k − 1, S_{k−1}^{j+1}, S_k^{j+1}, ..., S_{p(m)}^{j+1}), where S_t^{j+1} is a canonical generator set for the solvable group ⟨S_t^{j+1}⟩ for k ≤ t ≤ p(m). Hence the claim.
Claim 3.2.3 In the algorithm CANONICAL GENERATOR, for any stage j, it holds that there exists an i such that |⟨S_i^{j+p(m)}⟩| ≥ 2|⟨S_i^j⟩|, if stage j + p(m) exists.

Proof. Notice that, if the algorithm at stage j enters the while-loop, then there exists an i such that S_i^{j+1} = S_i^j ∪ {g} for a g ∉ ⟨S_i^j⟩. So, it is enough to show that the while-loop is entered at least once in every p(m) stages, if such a stage exists. Suppose stage j is entered with the value of i = i'. It is clear from the algorithm that if the algorithm never enters the while-loop in the next p(m) stages, then at stage (j + p(m) + 1) the value of i is i' − (p(m) + 1) < 0 for all i' ≤ p(m), which is impossible, since the algorithm terminates when the value of i reaches 0. Hence the claim.

To complete the proof of the theorem, we first show that the algorithm CANONICAL GENERATOR runs in time polynomial in the length of the input. Observe that it is enough to show that the number of stages executed by the algorithm is bounded by a polynomial, since the number of iterations of the while-loop in lines 7–14 is bounded by p(m). Now, the claim is that the number of stages executed by the algorithm is bounded by 2p³(m). Firstly, notice that for any H ≤ B_m, |H| ≤ 2^{p(m)}. Hence for any j, ∏_{i=1}^{p(m)} |⟨S_i^j⟩| ≤ 2^{p²(m)}. Suppose the claim is false. Then from Claim 3.2.3 it follows that ∏_{i=1}^{p(m)} |⟨S_i^{j+p(m)}⟩| ≥ 2 ∏_{i=1}^{p(m)} |⟨S_i^j⟩|. Hence ∏_{i=1}^{p(m)} |⟨S_i^{2p³(m)}⟩| > 2^{p²(m)}, a contradiction.

Now we show that CANONICAL GENERATOR makes only 1-guarded queries to L_cg, where L_cg = L' ⊕ L. Let us first see that the queries to L through CHECK COMMUTATOR are 1-guarded. It is enough to show that whenever CANONICAL GENERATOR calls CHECK COMMUTATOR with argument (0^m, S_k^j, S_{k+1}^j) in stage j, S_{k+1}^j is a canonical generator set. But from Claim 3.2.2, the input (i, S_i^j, S_{i+1}^j, ..., S_{p(m)}^j) to any stage j is such that for all i < t ≤ p(m), S_t^j is a canonical generator set for the solvable group ⟨S_t^j⟩. Now, by inspecting the description of the algorithm, it follows that whenever CANONICAL GENERATOR calls CHECK COMMUTATOR with argument (0^m, S_k^j, S_{k+1}^j), S_{k+1}^j is a canonical generator set. To see that the queries to L' through CANONIZE are 1-guarded, notice that calls to CANONIZE are made outside the while-loop. This means that CHECK COMMUTATOR with input (0^m, S_k^j, S_{k+1}^j) returns YES; that is, ⟨S_k^j⟩' ≤ ⟨S_{k+1}^j⟩. Hence ⟨S_k^j⟩' = ⟨S_{k+1}^j⟩ from Claim 3.2.1. So it follows that calls to CANONIZE with argument (S_k^j, S_{k+1}^j, ..., S_{p(m)}^j) will be such that the sets S_l^j for k ≤ l ≤ p(m) generate the commutator series of ⟨S_k^j⟩. It follows from Theorem 2.8 that queries to L' will be 1-guarded.

Finally, we show that the above algorithm, on input (0^m, S), outputs a canonical generator set for the group G = ⟨S⟩ if G is solvable, and outputs NOT SOLVABLE otherwise. Observe that if H₁ ≤ H₂ are two finite groups, then H₁' ≤ H₂'. Hence it follows from Claim 3.2.1 that ⟨S_i^j⟩ ≤ G_i for any i at any stage j, where G_i is the i-th element in the commutator series of G. We know that after the execution of at most 2p³(m) stages, the algorithm outputs either a set
X ⊆ B_m or NOT SOLVABLE. Suppose it outputs NOT SOLVABLE in stage j. This happens after the value of the variable k inside the while-loop is assigned p(m). From the description of the algorithm inside the loop, it follows that the group ⟨S_{p(m)}^j⟩ does not contain the commutator subgroup of ⟨S_{p(m)−1}^j⟩. But if G were solvable, then we know that G_{p(m)} = {e}, and since ⟨S_{p(m)}^j⟩ ≤ G_{p(m)} from Claim 3.2.1, we have a contradiction. Suppose the algorithm outputs a set X ⊆ B_m at line 16 in stage j. Then the value of the variable k is 1. Notice that, inside the while-loop, the value of k is only incremented. This implies that at stage j the while-loop is not entered (the value of i could not have become 0 at a previous stage). So the input to stage j is (1, S₁^j, ..., S_{p(m)}^j). From Claim 3.2.2, it follows that for all 2 ≤ t ≤ p(m), ⟨S_t^j⟩ is solvable and S_t^j is a canonical generator set for the group ⟨S_t^j⟩. From the value of g = YES and Claim 3.2.1, it follows that ⟨S₁^j⟩' = ⟨S₂^j⟩. Also, since S₁^j = S for any stage j, it follows that S_i^j generates the i-th element in the commutator series of ⟨S⟩ = G. Hence, from Theorem 2.8, it follows that CANONIZE(S₁^j, ..., S_{p(m)}^j)₁ is a canonical generator set for G. Hence the theorem.
4 Improving the Bounds
In this section, combining the algorithm CANONICAL GENERATOR with the algorithms developed in [1], we prove upper bounds on the complexity of the following problems over solvable black-box groups. Let B = {B_m}_{m>0} be a group family. The following decision problems, which we consider in this paper, are well studied in computational group theory [8, 4, 15, 2].

Solvability Testing ≜ {(0^m, S) | ⟨S⟩ ≤ B_m and ⟨S⟩ is solvable}.
Membership Testing ≜ {(0^m, S, g) | ⟨S⟩ ≤ B_m and g ∈ ⟨S⟩}.
Order Verification ≜ {(0^m, S, n) | ⟨S⟩ ≤ B_m and |⟨S⟩| = n}.
Group Isomorphism ≜ {(0^m, S₁, S₂) | ⟨S₁⟩, ⟨S₂⟩ ≤ B_m and ⟨S₁⟩, ⟨S₂⟩ are isomorphic}.
Group Intersection ≜ {(0^m, S₁, S₂) | ⟨S₁⟩, ⟨S₂⟩ ≤ B_m and ⟨S₁⟩ ∩ ⟨S₂⟩ ≠ {e}}.
Group Factorization ≜ {(0^m, S₁, S₂, g) | ⟨S₁⟩, ⟨S₂⟩ ≤ B_m and g ∈ ⟨S₁⟩⟨S₂⟩}.
Coset Intersection ≜ {(0^m, S₁, S₂, g) | ⟨S₁⟩, ⟨S₂⟩ ≤ B_m and ⟨S₁⟩g ∩ ⟨S₂⟩ ≠ ∅}.
Double-Coset Membership ≜ {(0^m, S₁, S₂, g, h) | ⟨S₁⟩, ⟨S₂⟩ ≤ B_m and g ∈ ⟨S₁⟩h⟨S₂⟩}.

Theorem 4.1 Over any group family, Solvability Testing is in SPP and hence low for all gap-definable counting classes.

Remark. In [3], a co-RP algorithm is given for Solvability Testing. But this upper bound does not give lowness for gap-definable counting classes other than PP.
In view of the above theorem, for all the problems that we consider here, we assume without loss of generality that the groups encoded in the problem instances are solvable. From Theorems 3.1 and 2.3 and Proposition 2.6, it easily follows that Membership Testing over solvable groups is in SPP.

Theorem 4.2 Over any group family, Membership Testing for the subclass of solvable groups is in SPP and hence low for all gap-definable counting classes.

For proving upper bounds for Group Isomorphism and Order Verification, we use the following theorem, which is essentially proved in [1]. We omit the proof here.

Theorem 4.3 ([1]) Let B = {B_m}_{m>0} be a group family. Then there are polynomial-time deterministic oracle machines:
1. M_o, that takes (0^m, S, n) as input satisfying the promise that S is a canonical generator set for the solvable group ⟨S⟩ and L_o ∈ NP as oracle, such that M_o makes 1-guarded queries to L_o and accepts if and only if |⟨S⟩| = n.
2. M_is, that takes (0^m, S₁, S₂) as input satisfying the promise that S₁, S₂ are canonical generator sets for the solvable groups ⟨S₁⟩, ⟨S₂⟩ respectively and L_is ∈ NP as oracle, such that M_is makes 1-guarded queries to L_is and accepts if and only if ⟨S₁⟩ is isomorphic to ⟨S₂⟩.
The behavior of the machines is not specified if the input does not satisfy the promise.

The above theorem, along with Theorems 3.1 and 2.3, gives the upper bound for Group Isomorphism and Order Verification over solvable black-box groups.

Theorem 4.4 Over any group family, Group Isomorphism and Order Verification for the subclass of solvable groups are in SPP and hence low for all gap-definable counting classes.

Let P denote any of the problems Group Intersection, Group Factorization, Coset Intersection, and Double-Coset Membership. Next we show that Theorem 3.1, along with the following theorem from [1], gives membership of P in the class LWPP. It follows that each such problem P over solvable groups is low for the classes PP and C=P.
Theorem 4.5 ([1]) Let B = {B_m}_{m>0} be a group family. Then there are polynomial-time deterministic oracle machines M_P that take an instance x of problem P ((0^m, S₁, S₂) if P is Group Intersection; (0^m, S₁, S₂, g) if P is Group Factorization or Coset Intersection; (0^m, S₁, S₂, g, h) if P is Double-Coset Membership) as input satisfying the promise that S₁, S₂ are canonical generator sets for the solvable groups ⟨S₁⟩, ⟨S₂⟩ respectively, and L_P ∈ NP as oracle, such that M_P makes |B_m|-guarded queries to L_P and accepts if and only if x ∈ P. The behavior of the machines is not specified if the input does not satisfy the promise.
Finally, we have the following theorem.

Theorem 4.6 Over any group family, the problems Group Intersection, Group Factorization, Coset Intersection, and Double-Coset Membership for the subclass of solvable groups are in LWPP and hence low for the classes PP and C=P.
Acknowledgments. I would like to thank V. Arvind for the discussions we had about the paper and his suggestions which improved its readability. I would also like to thank Meena Mahajan, Antoni Lozano, and the referees for their suggestions which improved the readability of the paper.
References
1. V. Arvind and N. V. Vinodchandran. Solvable black-box group problems are low for PP. Symposium on Theoretical Aspects of Computer Science, LNCS Vol. 1046, 99-110, 1996.
2. L. Babai. Bounded round interactive proofs in finite groups. SIAM Journal of Discrete Mathematics, 5: 88-111, 1992.
3. L. Babai, G. Cooperman, L. Finkelstein, E. Luks and Á. Seress. Fast Monte Carlo algorithms for permutation groups. Journal of Computer and System Sciences, 50: 296-308, 1995.
4. L. Babai and M. Szemerédi. On the complexity of matrix group problems I. Proc. 25th IEEE Symposium on Foundations of Computer Science, 229-240, 1984.
5. J. L. Balcázar, J. Díaz and J. Gabarró. Structural Complexity I & II. Springer-Verlag, Berlin Heidelberg, 1988.
6. R. Boppana, J. Håstad and S. Zachos. Does co-NP have short interactive proofs? Information Processing Letters, 25: 127-132, 1987.
7. W. Burnside. Theory of Groups of Finite Order. Dover Publications, Inc., 1955.
8. G. Cooperman and L. Finkelstein. Random algorithms for permutation groups. CWI Quarterly, 5 (2): 93-105, 1992.
9. S. Fenner, L. Fortnow and S. Kurtz. Gap-definable counting classes. Journal of Computer and System Sciences, 48: 116-148, 1994.
10. M. Fellows and N. Koblitz. Self-witnessing polynomial time complexity and prime factorization. Proc. 6th Structure in Complexity Theory Conference, 107-110, 1992.
11. M. Furst, J. E. Hopcroft and E. Luks. Polynomial time algorithms for permutation groups. Proc. 21st IEEE Symposium on Foundations of Computer Science, 36-45, 1980.
12. M. Hall. The Theory of Groups. Macmillan, New York, 1959.
13. C. Hoffmann. Group-Theoretic Algorithms and Graph Isomorphism. Lecture Notes in Computer Science 136, Springer-Verlag, 1982.
14. C. Hoffmann. Subcomplete generalizations of graph isomorphism. Journal of Computer and System Sciences, 25: 332-359, 1982.
15. J. Köbler, U. Schöning and J. Torán. Graph isomorphism is low for PP. Journal of Computational Complexity, 2: 301-310, 1992.
16. U. Schöning. Graph isomorphism is in the low hierarchy. Journal of Computer and System Sciences, 37: 312-323, 1988.
On Resource-Bounded Measure and Pseudorandomness

V. Arvind¹ and J. Köbler²

¹ Institute of Mathematical Sciences, Chennai 600113, India. E-mail: arvind@imsc.ernet.in
² Theoretische Informatik, Universität Ulm, D-89069 Ulm, Germany. E-mail: koebler@informatik.uni-ulm.de
Abstract. In this paper we extend a key result of Nisan and Wigderson [17] to the nondeterministic setting: for all α > 0 we show that if there is a language in E = ∪_{c>0} DTIME(2^{cn}) that is hard to approximate by nondeterministic circuits of size 2^{αn}, then there is a pseudorandom generator that can be used to derandomize BP.NP (in symbols, BP.NP = NP). By applying this extension we are able to answer some open questions in [14] regarding the derandomization of the classes BP.Σ_k^P and BP.Θ_k^P under plausible measure-theoretic assumptions. As a consequence, if Θ_2^P does not have p-measure 0, then AM ∩ coAM is low for Θ_2^P. Thus, in this case, the graph isomorphism problem is low for Θ_2^P. By using the Nisan-Wigderson design of a pseudorandom generator we unconditionally show the inclusion MA ⊆ ZPP^NP and that MA ∩ coMA is low for ZPP^NP.
1 Introduction
In recent years, following the development of resource-bounded measure theory, pioneered by Lutz [12, 13], plausible complexity-theoretic assumptions like P ≠ NP have been replaced by the possibly stronger, but arguably plausible, measure-theoretic assumption µ_p(NP) ≠ 0. With this assumption as hypothesis, a number of interesting complexity-theoretic conclusions have been derived which are not known to follow from P ≠ NP. Two prominent examples of such results are: there are Turing-complete sets for NP that are not many-one complete [15], and there are NP problems for which search does not reduce to decision [15, 7]. Recently, Lutz [14] has shown that the hypothesis µ_p(NP) ≠ 0 (in fact, the possibly weaker hypothesis µ_p(Δ_k^P) ≠ 0, k ≥ 2) implies that BP.Δ_k^P = Δ_k^P (in other words, BP.Δ_k^P can be derandomized). This has an improved lowness consequence: it follows that if µ_p(Δ_2^P) ≠ 0 then AM ∩ coAM is low for Δ_2^P (i.e., any AM ∩ coAM language is powerless as oracle to Δ_2^P machines). It also follows from µ_p(Δ_2^P) ≠ 0 that if NP ⊆ P/poly then PH = Δ_2^P. Thus the results of Lutz's paper [14] have opened up a study of derandomization of randomized complexity classes and new lowness properties under assumptions about the resource-bounded measure of different complexity classes. The results of Lutz in [14] (and also a preceding paper [13]) are intimately related to research on derandomizing randomized algorithms based on the idea of trading hardness for randomness [22, 25, 17]. In particular, Lutz makes essential
use of the explicit design of a pseudorandom generator that stretches a short random string to a long pseudorandom string that looks random to deterministic polynomial-size circuits. More precisely, the Nisan-Wigderson generator is built from a set (assumed to exist) that is in E and, for some α > 0, is hard to approximate by circuits of size 2^{αn}. As shown in [17], such a pseudorandom generator can be used to derandomize BPP. In Section 3 of the present paper we extend the just mentioned result of Nisan and Wigderson to the nondeterministic setting. We show that their generator can also be used to derandomize the Arthur-Merlin class AM = BP.NP, provided it is built from a set in E that is hard to approximate by nondeterministic circuits of size 2^{αn} for some α > 0. Very recently [9], the result of Nisan and Wigderson has been improved by weakening the assumption that there exists a set A in E that is hard to approximate: it actually suffices that A has worst-case circuit complexity 2^{Ω(n)}. We leave it as an open question whether a similar improvement is possible for the nondeterministic case. (For related results on derandomizing BPP see [2, 3].) In Section 4 we apply our extension of the Nisan-Wigderson result to the nondeterministic case to answer some questions left open by Lutz in [14]. We show that for all k ≥ 2, µ_p(Δ_k^P) ≠ 0 implies BP.Σ_k^P = Σ_k^P (see Figs. 1 and 2 for a comparison of the known inclusion structure with the inclusion structure of these classes if µ_p(Δ_2^P) ≠ 0). Furthermore, we show under the possibly stronger assumption µ_p(NP) ≠ 0 that with the help of a logarithmic number of advice bits also BP.NP can be derandomized (i.e., BP.NP ⊆ NP/log). Under the hypothesis µ_p(NP ∩ coNP) ≠ 0 we are able to prove that indeed BP.NP = NP, which has some immediate strong implications; for example, Graph Isomorphism is in NP ∩ coNP. Relatedly, in Section 5 we show that for all k ≥ 2, µ_p(Θ_k^P) ≠ 0 implies BP.Θ_k^P = Θ_k^P, answering an open problem stated in [14]. Thus, µ_p(Θ_2^P) ≠ 0 has the remarkable consequence that AM ∩ coAM (and consequently the graph isomorphism problem) is low for Θ_2^P. Finally, we show in Section 6 that the Arthur-Merlin class MA is contained in ZPP^NP and that MA ∩ coMA is even low for ZPP^NP.
2 Preliminaries
In this section we give formal definitions and describe the results of Nisan and Wigderson [17] and of Lutz [14] which we generalize in this paper. We use the binary alphabet Σ = {0, 1}. The cardinality of a finite set X is denoted by ‖X‖ and the length of x ∈ Σ* by |x|. The join A ⊕ B of two sets A and B is defined as A ⊕ B = {0x | x ∈ A} ∪ {1x | x ∈ B}. The characteristic function of a language L ⊆ Σ* is defined as L(x) = 1 if x ∈ L, and L(x) = 0 otherwise. The restriction of L(x) to strings of length n can be considered as an n-ary boolean function that we denote by L^{=n}. Conversely, each n-ary boolean function g defines a finite language {x ∈ Σ^n | g(x) = 1} that we denote by L_g.
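These elementary operations on languages can be made concrete at toy scale. The following minimal sketch (all helper names such as join and char_fn are hypothetical, not from the paper) represents finite languages as Python sets of 0/1-strings:

```python
# Toy encodings of the preliminaries: languages as finite sets of 0/1-strings.
# All names here (join, char_fn, restrict) are illustrative, not from the paper.

def join(A, B):
    """The join A (+) B = {0x | x in A} union {1x | x in B}."""
    return {"0" + x for x in A} | {"1" + x for x in B}

def char_fn(L):
    """Characteristic function L(x) of a language L."""
    return lambda x: 1 if x in L else 0

def restrict(L, n):
    """L^{=n}: the n-ary boolean function induced by L on strings of length n."""
    return lambda x: 1 if len(x) == n and x in L else 0

A = {"", "1"}
B = {"0"}
print(join(A, B))  # {'0', '01', '10'}
```

The join tags each member with the set it came from, so membership in A or B can be recovered from the first bit.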
[Figures 1 and 2: inclusion diagrams of the classes P, NP, coNP, BPP, AM, coAM and the levels of the polynomial hierarchy; diagrams not reproduced.]
Fig. 1. Known inclusion structure
Fig. 2. Inclusion structure if µ_p(Δ_2^P) ≠ 0
The definitions of the complexity classes we consider, like P, NP, AM, E, EXP etc., can be found in standard books [6, 5, 18]. By log we denote the function log x = max{1, ⌈log₂ x⌉}, and ⟨·, ·⟩ denotes a standard pairing function. For a class C of sets and a class F of functions from 1* to Σ*, let C/F [11] be the class of sets A such that there is a set B ∈ C and a function h ∈ F such that for all x ∈ Σ*, x ∈ A ⟺ ⟨x, h(1^{|x|})⟩ ∈ B. The function h is called an advice function for A. The BP-operator [21] assigns to each complexity class C a randomized version BP.C as follows. A set L belongs to BP.C if there exist a polynomial p and a set D ∈ C such that for all x, |x| = n,

x ∈ L ⟹ Prob_{r ∈_R {0,1}^{p(n)}}[⟨x, r⟩ ∈ D] ≥ 3/4,
x ∉ L ⟹ Prob_{r ∈_R {0,1}^{p(n)}}[⟨x, r⟩ ∈ D] ≤ 1/4.

Here, the subscript r ∈_R {0,1}^{p(n)} means that the probability is taken by choosing r uniformly at random from {0,1}^{p(n)}. We next define boolean functions that are hard to approximate and related notions. For a function s : ℕ → ℕ⁺ and an oracle set A ⊆ Σ*, CIR^A(n, s) denotes the class of boolean functions f : {0,1}^n → {0,1} that can be computed by some oracle circuit c of size at most s(n) having access to A. In case A = ∅ we denote this class by CIR(n, s). Furthermore, let CIR(s) = ∪_{n>0} CIR(n, s) and CIR^A(s) = ∪_{n>0} CIR^A(n, s).
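As a toy illustration of the BP-operator's promise gap (the predicate D and all names below are hypothetical stand-ins, not from the paper), the acceptance probability of a bounded-error predicate can be estimated by sampling r uniformly:

```python
import random

# Hypothetical bounded-error predicate D(x, r): decides whether x has an even
# number of 1s, but errs exactly when r begins with |r|/4 zeros (an event of
# probability 2^{-|r|/4} <= 1/4 for |r| >= 8), so the BP promise gap holds.
def D(x, r):
    in_L = x.count("1") % 2 == 0
    err = set(r[: len(r) // 4]) <= {"0"}  # leading block of r is all zeros
    return in_L != err

def acceptance_prob(x, p_n, trials=5000):
    """Estimate Prob_{r in {0,1}^{p_n}}[D(x, r) = 1] by uniform sampling."""
    rng = random.Random(0)
    hits = sum(D(x, "".join(rng.choice("01") for _ in range(p_n)))
               for _ in range(trials))
    return hits / trials

print(acceptance_prob("1111", 16))  # well above 3/4: x is in L
print(acceptance_prob("1000", 16))  # well below 1/4: x is not in L
```

With |r| = 16 the error event has probability 1/16, so the estimated acceptance probabilities sit near 15/16 and 1/16 respectively, comfortably inside the 3/4 and 1/4 thresholds.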
Definition 1. (cf. [25, 17])
1. Let f : {0,1}^n → {0,1} be a boolean function, C be a set of boolean functions, and let r ∈ ℝ⁺ be a positive real number. f is said to be r-hard for C if for all n-ary boolean functions g in C,

1/2 − 1/r < ‖{x ∈ {0,1}^n | f(x) = g(x)}‖ / 2^n < 1/2 + 1/r.

2. Let r : ℕ → ℝ⁺ and L ⊆ Σ*. L is said to be r-hard for C if for all but finitely many n, the n-ary boolean function L^{=n} is r(n)-hard for C.
3. A class D is called r-hard for C if some language L ∈ D is r-hard for C.
4. A boolean function f (a language L, or a language class D) is called CIR^A(r)-hard if f (resp. L, D) is r-hard for CIR^A(r).
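To make the hardness condition of Definition 1 concrete, the following toy sketch (hypothetical names; the finite predictor class stands in for a circuit class) computes the agreement rate ‖{x | f(x) = g(x)}‖ / 2^n between f and a candidate predictor g. A function is r-hard exactly when every predictor's agreement stays strictly inside 1/2 ± 1/r:

```python
from itertools import product

# Toy check of the r-hardness condition via exhaustive agreement counting.
def agreement(f, g, n):
    """Fraction of inputs x in {0,1}^n on which f and g agree."""
    inputs = list(product([0, 1], repeat=n))
    return sum(f(x) == g(x) for x in inputs) / len(inputs)

def r_hard_against(f, predictors, n, r):
    """True iff f is r-hard for the given finite class of predictors."""
    return all(abs(agreement(f, g, n) - 0.5) < 1.0 / r for g in predictors)

parity = lambda x: sum(x) % 2
const0 = lambda x: 0
first_bit = lambda x: x[0]

# Parity agrees with any constant or single-bit predictor on exactly half
# of the inputs, so it is r-hard for this tiny class for any r > 2.
print(agreement(parity, const0, 3))                       # 0.5
print(r_hard_against(parity, [const0, first_bit], 3, 4))  # True
```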
The already discussed result of Nisan and Wigderson can be stated in relativized form as follows.

Theorem 2. [17] For all α > 0 and all oracles A, if E^A is CIR^A(2^{αn})-hard, then P^A = BPP^A.

The concept of resource-bounded measure was introduced in [12]. We briefly recall some basic definitions from [12, 14] leading to the definition of a language class having p-measure 0. Intuitively, if a class C of languages has p-measure 0, then C ∩ E forms a negligibly small subclass of the complexity class E (where E = ∪_{c>0} DTIME(2^{cn}); see [12, 14] for more motivation).

Definition 3. [12, 14]
1. A function d : Σ* → ℝ⁺ is called a supermartingale if for all w ∈ Σ*,

d(w) ≥ (d(w0) + d(w1))/2.

2. The success set of a supermartingale d is defined as

S^∞[d] = {A | limsup_{l→∞} d(A(s_1) ... A(s_l)) = ∞},

where s_1 = λ, s_2 = 0, s_3 = 1, s_4 = 00, s_5 = 01, ... is the standard enumeration of Σ* in lexicographic order. The unitary success set of d is

S^1[d] = ∪_{w : d(w) > 1} C_w,

where, for each string w ∈ Σ*, C_w is the class of languages A such that A(s_1) ... A(s_{|w|}) = w, i.e., the smallest language in C_w is L_w = {s_i | w_i = 1}.
3. A function d : ℕ^i × Σ* → ℝ⁺ is said to be p-computable if there is a function f : ℕ^{i+1} × Σ* → ℚ such that f(r, k_1, ..., k_i, w) is computable in time (r + k_1 + ... + k_i + |w|)^{O(1)} and |f(r, k_1, ..., k_i, w) − d(k_1, ..., k_i, w)| ≤ 2^{−r}.
4. A class X of languages has p-measure 0 (in symbols, µ_p(X) = 0) if there is a p-computable supermartingale d such that X ⊆ S^∞[d].
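The averaging condition in Definition 3 can be checked mechanically on a toy example (hypothetical names; this betting function is illustrative and is not part of the paper's construction):

```python
from itertools import product

# Toy supermartingale: bet all capital that the next bit of the characteristic
# sequence is 0.  Capital doubles on each 0 and drops to 0 on the first 1.
def d(w):
    capital = 1.0
    for bit in w:
        capital = 2 * capital if bit == "0" else 0.0
    return capital

# Verify the supermartingale inequality d(w) >= (d(w0) + d(w1)) / 2
# exhaustively on all prefixes w of length < 6.
ok = all(d(w) >= (d(w + "0") + d(w + "1")) / 2
         for n in range(6)
         for w in ("".join(t) for t in product("01", repeat=n)))
print(ok)            # True: equality holds, so d is in fact a martingale
print(d("0" * 10))   # 1024.0: capital grows without bound on the empty language
```

Along the all-zero characteristic sequence (the empty language) the capital doubles forever, so the empty language lies in the success set S^∞[d].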
In the context of resource-bounded measure, it is interesting to ask for the measure of the class of all sets A for which E^A is not CIR^A(2^{αn})-hard. Building on initial results in [13], it is shown in [1] that this class has p-measure 0.

Lemma 4. [1] For all 0 < α < 1/3, µ_p{A | E^A is not CIR^A(2^{αn})-hard} = 0.

Lutz strengthened this to the following result that is more useful for some applications.

Lemma 5. [14] For all 0 < α < 1/3 and all oracles B ∈ E,

µ_p{A | E^A is not CIR^{A⊕B}(2^{αn})-hard} = 0.

As a consequence of the above lemma, Lutz derives the following theorem.

Theorem 6. [14] For k ≥ 2, if µ_p(Δ_k^P) ≠ 0 then BP.Δ_k^P ⊆ Δ_k^P.
It is not hard to see that Theorem 6 can be extended to any complexity class C ⊆ EXP = ∪_{c>0} DTIME(2^{n^c}) that is closed under join and polynomial-time Turing reducibility (see also Corollary 22). For example, if ⊕P does not have p-measure 0, then BP.⊕P ⊆ ⊕P, implying [24] that the polynomial hierarchy is contained in ⊕P. In Sects. 4 and 5 we address the question whether BP.Σ_k^P = Σ_k^P (or BP.Θ_k^P = Θ_k^P) can also be derived from µ_p(Δ_k^P) ≠ 0, and whether stronger consequences can be derived from µ_p(NP) ≠ 0 and µ_p(NP ∩ coNP) ≠ 0.

3 Derandomizing AM in Relativized Worlds
In this section we show that the Nisan-Wigderson generator can also be used to derandomize the Arthur-Merlin class AM = BP.NP [4]. We first define the counterpart of Definition 1 for nondeterministic circuits and the corresponding notion of hard-to-approximate boolean functions. A nondeterministic circuit c has two kinds of input gates: in addition to the actual inputs x_1, ..., x_n, c has a series of distinguished guess inputs y_1, ..., y_m. The value computed by c on input x ∈ Σ^n is 1 if there exists a y ∈ Σ^m such that c(xy) = 1, and 0 otherwise [23]. We now define hardness for nondeterministic circuits. NCIR^A(s) denotes the union ∪_{n>0} NCIR^A(n, s), where NCIR^A(n, s) consists of all boolean functions f : {0,1}^n → {0,1} that can be computed by some nondeterministic oracle circuit c of size at most s(n), having access to oracle A.

Definition 7. A boolean function f (a language L, or a language class D) is called NCIR^A(r)-hard if f (resp. L, D) is r-hard for NCIR^A(r).

We continue by recalling some notation from [17]. Let p, l, m, k be positive integers. A collection D = (D_1, ..., D_p) of sets D_i ⊆ {1, ..., l} is called a (p, l, m, k)-design if
- for all i = 1, ..., p, ‖D_i‖ = m, and
- for all i ≠ j, ‖D_i ∩ D_j‖ ≤ k.
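The design condition is easy to verify exhaustively at toy scale. The following sketch (hypothetical names; this greedy search is illustrative only and is not the explicit polynomial-time construction the paper relies on, and it would be far too slow at real parameters) finds a small design:

```python
from itertools import combinations

# Greedy construction of a small (p, l, m, k)-design: pick size-m subsets of
# {0,...,l-1} in lexicographic order, keeping each one whose intersection with
# every previously kept set has size at most k.
def greedy_design(p, l, m, k):
    design = []
    for cand in combinations(range(l), m):
        s = set(cand)
        if all(len(s & d) <= k for d in design):
            design.append(s)
            if len(design) == p:
                return design
    raise ValueError("no (p,l,m,k)-design found by greedy search")

D = greedy_design(p=6, l=12, m=4, k=2)
for d in D:
    print(sorted(d))
```

Every returned collection satisfies the two design conditions by construction; the search merely demonstrates that such objects exist at small parameters.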
Using D we get from a boolean function g : {0,1}^m → {0,1} a sequence of boolean functions g_i : {0,1}^l → {0,1}, i = 1, ..., p, defined as

g_i(s_1, ..., s_l) = g(s_{i_1}, ..., s_{i_m}), where D_i = {i_1, ..., i_m}.

By concatenating the values of these functions we get a function g_D : {0,1}^l → {0,1}^p where g_D(s) = g_1(s) ... g_p(s). As shown by Nisan and Wigderson [17, Lemma 2.4], the output of g_D looks random to any small deterministic circuit, provided g is hard to approximate by deterministic circuits of a certain size (in other words, the hardness of g implies that the pseudorandom generator g_D is secure against small deterministic circuits). The following lemma shows that g_D is also secure against small nondeterministic circuits, provided g is hard to approximate by nondeterministic circuits of a certain size. As pointed out in [19], this appears somewhat counterintuitive since a nondeterministic circuit c might guess the seed given to the pseudorandom generator g_D and then verify that the guess is correct. But note that in our case this strategy is ruled out by the size restriction on c, which prevents c from simulating g_D.

Lemma 8. Let D be a (p, l, m, k)-design and let g : {0,1}^m → {0,1} be an NCIR^A(m, p² + p2^k)-hard function. Then the function g_D has the property that for every p-input nondeterministic oracle circuit c of size at most p²,

|Prob_{y ∈_R {0,1}^p}[c^A(y) = 1] − Prob_{s ∈_R {0,1}^l}[c^A(g_D(s)) = 1]| < 1/p.
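Before turning to the proof, here is a toy evaluation of the generator g_D itself (hypothetical names; parity is only a stand-in for a genuinely hard function g, and the design is hand-picked rather than constructed):

```python
# Toy Nisan-Wigderson-style generator g_D: the i-th output bit applies g to
# the seed positions selected by the i-th design set D_i.
def nw_generator(seed_bits, design, g):
    """g_D(s) = g_1(s) ... g_p(s), where g_i reads the seed bits in D_i."""
    return [g([seed_bits[j] for j in sorted(d)]) for d in design]

parity = lambda bits: sum(bits) % 2                      # stand-in for g
design = [{0, 1, 2}, {0, 3, 4}, {1, 3, 5}, {2, 4, 5}]    # a (4,6,3,1)-design

seed = [1, 0, 1, 1, 0, 0]
print(nw_generator(seed, design, parity))  # [0, 0, 1, 1]
```

A 6-bit seed is stretched to 4 output bits; because any two design sets share at most one position, each pair of output bits depends on nearly disjoint parts of the seed.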
Proof. The proof follows along similar lines as the one of [17, Lemma 2.4]. We show that if there is a nondeterministic oracle circuit c of size at most p² such that

|Prob_{y ∈_R {0,1}^p}[c^A(y) = 1] − Prob_{s ∈_R {0,1}^l}[c^A(g_D(s)) = 1]| > 1/p,

then g is not NCIR^A(m, p² + p2^k)-hard. Let S_1, ..., S_l and Z_1, ..., Z_p be independently and uniformly distributed random variables over {0,1}, and let S = (S_1, ..., S_l). Then we can restate the inequality above as follows:

|Prob[c^A(Z_1, ..., Z_p) = 1] − Prob[c^A(g_1(S), ..., g_p(S)) = 1]| > 1/p,

where g_i(s) denotes the ith bit of g_D(s), i = 1, ..., p. Now consider the random variables X_i = c^A(g_1(S), ..., g_{i−1}(S), Z_i, ..., Z_p), i = 1, ..., p+1. Since X_1 = c^A(Z_1, ..., Z_p) and since X_{p+1} = c^A(g_1(S), ..., g_p(S)), we can fix an index j ∈ {1, ..., p} such that

|Prob[X_j = 1] − Prob[X_{j+1} = 1]| > 1/p².   (1)

Consider the boolean function h : {0,1}^l × {0,1}^{p−j+1} → {0,1} defined as

h(s, z_j, ..., z_p) = z_j if c^A(g_1(s), ..., g_{j−1}(s), z_j, ..., z_p) = 0, and h(s, z_j, ..., z_p) = 1 − z_j otherwise.
Since

Prob[h(S, Z_j, ..., Z_p) = g_j(S)] − 1/2
= Prob[X_j = 0 ∧ Z_j = g_j(S)] + Prob[X_j = 1 ∧ Z_j ≠ g_j(S)] − 1/2
= Prob[Z_j = g_j(S)] + Prob[X_j = 1] − 2·Prob[X_j = 1 ∧ Z_j = g_j(S)] − 1/2
= Prob[X_j = 1] − 2·Prob[X_j = 1 ∧ Z_j = g_j(S)]
= Prob[X_j = 1] − Prob[X_{j+1} = 1],

it follows that (1) is equivalent to

|Prob[h(S, Z_j, ..., Z_p) = g_j(S)] − 1/2| > 1/p².   (2)

Since g_j(s_1, ..., s_l) only depends on the bits s_i with i ∈ D_j, we can apply an averaging argument to find fixed bits s̄_i, i ∉ D_j, and fixed bits z̄_j, ..., z̄_p such that (2) still holds under the condition that S_i = s̄_i for all i ∉ D_j and Z_i = z̄_i for all i = j, ..., p. Since g_j(s_1, ..., s_l) = g(s_1, ..., s_m) (for notational convenience we assume w.l.o.g. that D_j = {1, ..., m}) we thus get

|Prob[h(S_1, ..., S_m, s̄_{m+1}, ..., s̄_l, z̄_j, ..., z̄_p) = g(S_1, ..., S_m)] − 1/2| > 1/p².
Now consider the nondeterministic oracle circuit c' that on input s_1, ..., s_m first evaluates the functions g_1, g_2, ..., g_{j−1} on (s_1, ..., s_m, s̄_{m+1}, ..., s̄_l), and then simulates the oracle circuit c to compute

c^A(g_1(s_1, ..., s_m, s̄_{m+1}, ..., s̄_l), ..., g_{j−1}(s_1, ..., s_m, s̄_{m+1}, ..., s̄_l), z̄_j, ..., z̄_p).

Then, c'^A either computes the boolean function that maps (s_1, ..., s_m) to h(s_1, ..., s_m, s̄_{m+1}, ..., s̄_l, z̄_j, ..., z̄_p) or it computes the negation of this function (depending on whether z̄_j = 0 or z̄_j = 1), and hence it follows that

|Prob[c'^A(S_1, ..., S_m) = g(S_1, ..., S_m)] − 1/2| > 1/p².

Since each of g_1(s_1, ..., s_m, s̄_{m+1}, ..., s̄_l), ..., g_{j−1}(s_1, ..., s_m, s̄_{m+1}, ..., s̄_l) depends on at most k input bits, these values can be computed by a deterministic subcircuit of size at most 2^k (namely, the brute-force circuit that evaluates that particular k-ary boolean function). This means that the size of c' is at most p² + p2^k, implying that g is not NCIR^A(m, p² + p2^k)-hard. ∎

For our extension of Theorem 2 we also need the following lemma.
Lemma 9. [17] Let c be a positive integer and let the integer-valued functions l, m, k be defined as l(p) = 2c² log p, m(p) = c log p, and k(p) = log p. Then there is a polynomial-time algorithm that on input 1^p computes a (p, l(p), m(p), k(p))-design.
Theorem 10. Let A and B be oracles and let α > 0. If E^A is NCIR^B(2^{αn})-hard, then BP.NP^B ⊆ NP^B/FP^A. In particular, if E^A is NCIR^A(2^{αn})-hard, then BP.NP^A = NP^A.
Proof. Let L ∈ BP.NP^B. Then there exist a polynomial p and a set D ∈ NP^B such that for all x, |x| = n,

x ∈ L ⟹ Prob_{r ∈_R {0,1}^{p(n)}}[⟨x, r⟩ ∈ D] ≥ 3/4,
x ∉ L ⟹ Prob_{r ∈_R {0,1}^{p(n)}}[⟨x, r⟩ ∈ D] ≤ 1/4.

For a fixed input x, the decision procedure for D on input ⟨x, r⟩ can be simulated by some nondeterministic oracle circuit c_x with input r, implying that

x ∈ L ⟹ Prob_{r ∈_R {0,1}^{p(n)}}[c_x^B(r) = 1] ≥ 3/4,
x ∉ L ⟹ Prob_{r ∈_R {0,1}^{p(n)}}[c_x^B(r) = 1] ≤ 1/4,

where w.l.o.g. we can assume that the size of c_x is bounded by p²(|x|). Let α > 0 and let C ∈ E^A be an NCIR^B(2^{αn})-hard language. Then for almost all n, the boolean function C^{=n} : {0,1}^n → {0,1} is NCIR^B(n, 2^{αn})-hard. Thus, letting c = 3/α and m(n) = c log p(n), it follows that for almost all n, C^{=m(n)} is NCIR^B(m(n), p(n)³)-hard. Now let l(n) = 2c² log p(n) and k(n) = log p(n). Then we can apply Lemmas 8 and 9 to get for almost all n a (p(n), l(n), m(n), k(n))-design D such that the boolean function C_D^{m(n)} : {0,1}^{l(n)} → {0,1}^{p(n)} has, for every p(n)-input nondeterministic oracle circuit c of size at most p(n)², the property that

|Prob_{y ∈_R {0,1}^{p(n)}}[c^B(y) = 1] − Prob_{s ∈_R {0,1}^{l(n)}}[c^B(C_D^{m(n)}(s)) = 1]| ≤ 1/p(n).

Notice that since m(n) = O(log n) and since C ∈ E^A, it is possible to compute the advice function h(1^n) = C(0^{m(n)}) ... C(1^{m(n)}) in FP^A. Hence, the following procedure witnesses L ∈ NP^B/FP^A:

  input x, |x| = n, and the advice string h(1^n) = C(0^{m(n)}) ... C(1^{m(n)});
  compute a (p(n), l(n), m(n), k(n))-design D and let r_1, ..., r_{2^{l(n)}} be the pseudorandom strings produced by C_D^{m(n)} on all seeds from {0,1}^{l(n)};
  if the number of r_i for which c_x^B(r_i) = 1 is at least 2^{l(n)−1} then accept else reject
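The majority vote over all seeds at the heart of this procedure can be sketched at toy scale (hypothetical names; the hand-picked design, the parity stand-in for the hard function, and the test predicate are illustrative, and no oracle or advice is modeled):

```python
from itertools import product

# Toy derandomization by seed enumeration: every seed of a small NW-style
# generator is tried and the answers are decided by majority vote, mirroring
# the "at least 2^{l(n)-1}" threshold in the procedure above.
def nw_generator(seed_bits, design, g):
    return tuple(g([seed_bits[j] for j in sorted(d)]) for d in design)

def derandomized_accept(test, design, g, l):
    votes = sum(test(nw_generator(s, design, g))
                for s in product([0, 1], repeat=l))
    return votes >= 2 ** (l - 1)   # accept iff at least half the seeds accept

parity = lambda bits: sum(bits) % 2
design = [{0, 1, 2}, {0, 3, 4}, {1, 3, 5}, {2, 4, 5}]  # a (4,6,3,1)-design

# A predicate that accepts a 4-bit string iff it contains a 1; only 8 of the
# 64 seeds map to the all-zero output, so the majority vote accepts.
print(derandomized_accept(lambda r: any(r), design, parity, l=6))  # True
```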
4 Derandomizing BP.Σ_k^P if Δ_k^P is Not Small
In this section we apply the relativized derandomization of the previous section to extend Lutz's Theorem 6 to the Σ levels of the polynomial hierarchy. A crucial result used in the proof of Lutz's Lemma 5 is the fact that there are many n-ary boolean functions that are CIR(n, 2^{αn})-hard. In Lemma 12 we establish the same bound for the nondeterministic case.

Lemma 11. [13] For each α such that 0 < α < 1/3, there is a constant n_0 such that for all n ≥ n_0 and all oracles A, the number of boolean functions f : {0,1}^n → {0,1} that are not CIR^A(n, 2^{αn})-hard is at most 2^{2^n} · e^{−2^{αn}/4}.
Lemma 12. For each α such that 0 < α < 1/3, there is a constant n_0 such that for all n ≥ n_0 and all oracles A, the number of n-ary boolean functions that are not NCIR^A(n, 2^{αn})-hard is at most 2^{2^n} · e^{−2^{αn}/4}.
Proof. The proof follows an essentially similar counting argument as in the deterministic case (see [13]). In the sequel, let q = 2^{αn} and let NCIR_j^A(n, q) denote the class of n-ary boolean functions computed by nondeterministic oracle circuits of size q with exactly j guess inputs, having access to oracle A. Notice that NCIR^A(n, q) = ∪_{j=0}^{q−n} NCIR_j^A(n, q), implying that ‖NCIR^A(n, q)‖ ≤ Σ_{j=0}^{q−n} ‖NCIR_j^A(n, q)‖. It is shown in [16] by a standard counting argument that for n ≤ q,

‖CIR^A(n, q)‖ ≤ a(4eq)^q, where a = 2685.

Since each function in NCIR_j^A(n, q) is uniquely determined by an (n+j)-ary boolean function in CIR^A(n+j, q), it follows that

‖NCIR^A(n, q)‖ ≤ Σ_{j=0}^{q−n} a(4eq)^q ≤ aq(4eq)^q.

We now place a bound on the number of n-ary boolean functions that are not NCIR^A(n, q)-hard. Let

DELTA(n, q) = {D ⊆ Σ^n | 1/q < |2^{−n}·‖D‖ − 1/2|}.

By applying standard Chernoff bounds, as shown in [13], it can be seen that ‖DELTA(n, q)‖ ≤ 2^{2^n}·2^{−c·2^n/q²}, where c > 0 is a small constant. Now, from the notion of NCIR^A(n, q)-hard functions (Definition 7), it is easy to see that there are at most

‖NCIR^A(n, q)‖ · ‖DELTA(n, q)‖ ≤ aq(4eq)^q · 2^{2^n}·2^{−c·2^n/q²}

distinct n-ary boolean functions that are not NCIR^A(n, q)-hard. Hence, using the fact that 0 < α < 1/3, we can easily find a constant n_0 such that for n ≥ n_0 the above number is bounded above by 2^{2^n}·e^{−2^{αn}/4}, as required. ∎

We further need the important Borel-Cantelli-Lutz Lemma [12]. A series Σ_{k=0}^∞ a_k of nonnegative reals is said to be p-convergent if there is a polynomial q such that for all r ∈ ℕ, Σ_{k=q(r)}^∞ a_k ≤ 2^{−r}.

Theorem 13. [12] Assume that d : ℕ × Σ* → ℝ⁺ is a function with the following properties:
1. d is p-computable.
2. For each k ∈ ℕ, the function d_k, defined by d_k(w) = d(k, w), is a supermartingale.
3. The series Σ_{k=0}^∞ d_k(λ) is p-convergent.
Then µ_p(∪_{j≥0} ∩_{k≥j} S^1[d_k]) = 0.
Now we are ready to extend Lutz's Lemma 5 to the case of nondeterministic circuits.

Lemma 14. For all 0 < α < 1/3 and all oracles B ∈ E,

µ_p{A | E^A is not NCIR^{A⊕B}(2^{αn})-hard} = 0.
Proof. Let 0 < α < 1/3 and B ∈ E. For each language A define the test language³ C(A) = {x | x10^{2^{|x|}} ∈ A}, and let X = {A | C(A) is not NCIR^{A⊕B}(2^{αn})-hard}. Notice that since C(A) ∈ E^A, the lemma follows from the following claim.

Claim. µ_p(X) = 0.

Proof of Claim. The proof follows the same lines as in [14, Theorem 3.2], except for minor changes to take care of the fact that we are dealing with nondeterministic circuits. For each k ≥ 0, let

X_k = {A | C(A) is not NCIR^{A⊕B}(n, 2^{αn})-hard} if k = 2^n for some n, and X_k = ∅ otherwise.

It follows immediately that X ⊆ ∪_{j≥0} ∩_{k≥j} X_k. We will show that µ_p(X) = 0 by applying the Borel-Cantelli-Lutz Lemma (Theorem 13). Let n_0 be the constant provided by Lemma 12 and let k_0 = 2^{n_0}. In order to apply Theorem 13 we define d : ℕ × Σ* → ℝ⁺ as follows (exactly as in [14]):

1. If k < k_0 or k is not a power of 2, then d_k(w) = 0.
2. If k = 2^n ≥ k_0 and |w| < 2^{k+1}, then d_k(w) = e^{−k^{1/4}}.
3. If k = 2^n ≥ k_0 and |w| ≥ 2^{k+1}, then

d_k(w) = Σ_{g ∈ NCIR^{L_w⊕B}(n, 2^{αn}), D ∈ DELTA(n, 2^{αn})} Prob[L_g = C(A)^{=n} △ D | A ∈ C_w],

where d_k(w) = d(k, w) and the conditional probabilities are taken by deciding the membership of each string z ∈ Σ* in the random language A by an independent toss of a fair coin. Now, the following properties of d can be proved along similar lines as in [14]:

³ This test language was originally defined in [1] and later used in [14].
1. d is p-computable.
2. For each k ≥ 0, d_k is a supermartingale with d_k(λ) ≤ e^{−k^{1/4}}.
3. For all k ≥ k_0, X_k ⊆ S^1[d_k].
4. X ⊆ ∪_{j≥0} ∩_{k≥j} S^1[d_k].

The only point where a different argument is required is in showing that d is p-computable, because the circuits used to define d_k(w) are nondeterministic. Nevertheless, notice that the only nontrivial case to be handled in the definition of d_k is when k = 2^n ≥ k_0 and |w| ≥ 2^{k+1}. In this case, the size of the considered nondeterministic oracle circuits is bounded by 2^{αn} ≤ k. Therefore, in time polynomial in 2^k ≤ |w| it is possible to evaluate these circuits by exhaustive search. ∎

It is now easy to derandomize BP.Σ_k^P under the assumption that Δ_k^P has non-zero p-measure.

Theorem 15. For all k ≥ 2, if µ_p(Δ_k^P) ≠ 0, then BP.Σ_k^P = Σ_k^P.
Proof. Assume the hypothesis and let B be a fixed Σ_{k−1}^P-complete set. We know from Lemma 14 that for α = 1/4,

µ_p{A | E^A is not NCIR^{A⊕B}(2^{αn})-hard} = 0.

On the other hand, µ_p(Δ_k^P) ≠ 0. Hence, there is a set A ∈ Δ_k^P such that E^A (and thus also E^{A⊕B}) is NCIR^{A⊕B}(2^{αn})-hard. Applying Theorem 10 we get

Σ_k^P = NP^{A⊕B} = BP.NP^{A⊕B} = BP.Σ_k^P,

which completes the proof. ∎

Furthermore, we obtain the following two interesting consequences.

Corollary 16. If µ_p(NP ∩ coNP) ≠ 0, then BP.NP = NP.
Proof. Assuming that µ_p(NP ∩ coNP) ≠ 0, similar to the proof of Theorem 15 it follows that there is a set A ∈ NP ∩ coNP such that NP^A = BP.NP^A. From the fact that NP^{NP∩coNP} = NP, we immediately get that NP = BP.NP. ∎

Corollary 17. If µ_p(NP) ≠ 0, then BP.NP ⊆ NP/log.

Proof. If µ_p(NP) ≠ 0, then from Theorem 10 and Lemma 14 it follows that there is a set A ∈ NP such that BP.NP ⊆ NP/FP^A. Actually, from the proof of Lemma 14 we know something stronger: namely, the test language C(A) = {x | x10^{2^{|x|}} ∈ A} is in E^A and is NCIR(2^{αn})-hard. Hence, we can assume that A is sparse, and therefore we get BP.NP ⊆ NP/log by using a census argument [10]. ∎
5 Derandomizing BP.Θ_k^P if Θ_k^P is Not Small

In [14] it was an open question whether BP.Θ_2^P = Θ_2^P can be proven as a consequence of µ_p(NP) ≠ 0. We answer this question by proving the same consequence from a possibly weaker assumption. For a complexity class K ∈ {P, BPP, E} and oracle A, let K_∥^A denote the respective relativized class where only parallel queries to A are allowed.

Definition 18. Let A ⊆ Σ* be an oracle set. Let CIR_∥^A(n, s) denote the class of boolean functions f : {0,1}^n → {0,1} that can be computed by some oracle circuit c of size at most s(n) that makes only parallel queries to oracle A. Furthermore, let CIR_∥^A(s) = ∪_{n>0} CIR_∥^A(n, s).

It is not hard to verify that Nisan and Wigderson's result (Theorem 2) also holds in the parallel setting.

Theorem 19. For all α > 0 and all oracles A, if E_∥^A is CIR_∥^A(2^{αn})-hard, then P_∥^A = BPP_∥^A.

Corollary 20. For all k ≥ 2, if µ_p(Θ_k^P) ≠ 0, then BP.Θ_k^P = Θ_k^P.
Proof. Assume the hypothesis and let B be a fixed Σ_{k−1}^P-complete set. Observe that if µ_p(Θ_k^P) ≠ 0, then it follows from the proof of Lemma 5 (as given in [14]) that for α = 1/4 there is a set A ∈ Θ_k^P such that C(A) is CIR^{A⊕B}(2^{αn})-hard. Since C(A) ∈ E_∥^A ⊆ E_∥^{A⊕B} and since CIR_∥^{A⊕B}(2^{αn}) ⊆ CIR^{A⊕B}(2^{αn}), it follows that E_∥^{A⊕B} is CIR_∥^{A⊕B}(2^{αn})-hard, implying that

Θ_k^P = P_∥^{A⊕B} = BPP_∥^{A⊕B} = BP.Θ_k^P,

where the second equality follows by Theorem 19. ∎

Corollary 20 has the following immediate lowness consequence.

Corollary 21. If µ_p(Θ_2^P) ≠ 0 then AM ∩ coAM (and hence the graph isomorphism problem) is low for Θ_2^P.

Corollary 20 can easily be extended to further complexity classes.

Corollary 22. For any complexity class C ⊆ EXP closed under join and polynomial-time truth-table reducibility, µ_p(C) ≠ 0 implies that BP.C ⊆ C.

Proof. Assume the hypothesis and let L be a set in BP.C, witnessed by some set B ∈ C. Since C is closed under many-one reducibility, we can define a suitably padded version B̂ of B in C ∩ E such that L belongs to BP.{B̂}. Now, exactly as in the proof of Corollary 20 we can argue that there is a set A ∈ C with the property that E_∥^{A⊕B̂} is CIR_∥^{A⊕B̂}(2^{αn})-hard. Hence, by Theorem 19 it follows that L ∈ BP.{B̂} ⊆ BPP_∥^{A⊕B̂} = P_∥^{A⊕B̂} ⊆ C. ∎

For example, using the fact that PP is closed under polynomial-time truth-table reducibility [8], it follows that if µ_p(PP) ≠ 0, then BP.PP = PP.
6 MA is Contained in ZPP^NP

In this section we apply the Nisan-Wigderson generator to show that MA is contained in ZPP^NP and, as a consequence, that MA ∩ coMA is low for ZPP^NP. This improves on a result of [26], where a quantifier simulation technique is used to show that NP^BPP (a subclass of MA) is contained in ZPP^NP. The proof of the next theorem also makes use of the fact that there are many n-ary boolean functions that are CIR(2^{αn})-hard (Lemma 11).

Theorem 23. MA is contained in ZPP^NP.
Proof. Let L be a set in MA. Then there exist a polynomial p and a set B ∈ P such that for all x, |x| = n,

x ∈ L ⟹ ∃y, |y| = p(n) : Prob_{r ∈_R {0,1}^{p(n)}}[⟨x, y, r⟩ ∈ B] ≥ 3/4,
x ∉ L ⟹ ∀y, |y| = p(n) : Prob_{r ∈_R {0,1}^{p(n)}}[⟨x, y, r⟩ ∈ B] ≤ 1/4.

For fixed strings x and y, the decision procedure for B on input ⟨x, y, r⟩ can be simulated by some circuit c_{x,y} with inputs r_1, ..., r_{p(n)}, implying that

x ∈ L ⟹ ∃y, |y| = p(n) : Prob_{r ∈_R {0,1}^{p(n)}}[c_{x,y}(r) = 1] ≥ 3/4,
x ∉ L ⟹ ∀y, |y| = p(n) : Prob_{r ∈_R {0,1}^{p(n)}}[c_{x,y}(r) = 1] ≤ 1/4,

where w.l.o.g. we can assume that the size of c_{x,y} is bounded by p²(|x|). It follows by the deterministic version of Lemma 8 that for any (p, l, m, k)-design D and any CIR(p² + p2^k)-hard boolean function g : {0,1}^m → {0,1},

|Prob_{y ∈_R {0,1}^p}[c(y) = 1] − Prob_{s ∈_R {0,1}^l}[c(g_D(s)) = 1]| ≤ 1/p

holds for every p-input circuit c of size at most p². Now let m(n) = 12 log p(n), l(n) = 2·12² log p(n), and k(n) = log p(n). Furthermore, by Lemma 11 we know that for all sufficiently large n, a randomly chosen boolean function g : {0,1}^{m(n)} → {0,1} is CIR(2^{m(n)/4})-hard (and thus CIR(p(n)² + p(n)2^{k(n)})-hard) with probability at least 1 − e^{−2^{m(n)/4}/4}. Hence, the following algorithm together with the NP oracle set

S = {⟨x, r_1, ..., r_k⟩ | ∃y ∈ Σ^{p(|x|)} : ‖{1 ≤ i ≤ k | c_{x,y}(r_i) = 1}‖ ≥ k/2}

witnesses L ∈ ZPP^NP:

  input x, |x| = n;
  compute a (p(n), l(n), m(n), k(n))-design D;
  choose randomly g : {0,1}^{m(n)} → {0,1};
  if g is CIR(2^{m(n)/4})-hard then {this can be decided by an NP oracle}
    compute the pseudorandom strings r_1, ..., r_{2^{l(n)}} of g_D on all seeds;
    if ⟨x, r_1, ..., r_{2^{l(n)}}⟩ ∈ S then accept else reject
  else output ?
We note that Theorem 23 cannot be further improved to AM ⊆ ZPP^NP by relativizing techniques, since there is an oracle relative to which AM is not contained in Σ_2^P [20]. From the closure properties of MA (namely that MA is closed under conjunctive truth-table reductions), it easily follows that NP^{MA∩coMA} ⊆ MA. From Theorem 23 we have MA ⊆ ZPP^NP. Hence, NP^{MA∩coMA} ⊆ ZPP^NP, implying that ZPP^{NP^{MA∩coMA}} ⊆ ZPP^{ZPP^NP} = ZPP^NP. We have proved the following corollary.

Corollary 24. MA ∩ coMA is low for ZPP^NP and, consequently, BPP is low for ZPP^NP.
Acknowledgement We would like to thank Lance Fortnow for interesting discussions on the topic of this paper.
References

1. E. Allender and M. Strauss. Measure on small complexity classes with applications for BPP. In Proc. 35th IEEE Symposium on Foundations of Computer Science, 807-818. IEEE Computer Society Press, 1994.
2. A. Andreev, A. Clementi, and J. Rolim. Hitting sets derandomize BPP. In Proc. 23rd International Colloquium on Automata, Languages, and Programming, Lecture Notes in Computer Science #1099, 357-368. Springer-Verlag, 1996.
3. A. Andreev, A. Clementi, and J. Rolim. Worst-case hardness suffices for derandomization: a new method for hardness-randomness trade-offs. In Proc. 24th International Colloquium on Automata, Languages, and Programming, Lecture Notes in Computer Science #1256. Springer-Verlag, 1997.
4. L. Babai. Trading group theory for randomness. In Proc. 17th ACM Symposium on Theory of Computing, 421-429. ACM Press, 1985.
5. J. Balcázar, J. Díaz, and J. Gabarró. Structural Complexity II. Springer-Verlag, 1990.
6. J. Balcázar, J. Díaz, and J. Gabarró. Structural Complexity I. Springer-Verlag, second edition, 1995.
7. M. Bellare and S. Goldwasser. The complexity of decision versus search. SIAM Journal on Computing, 23:97-119, 1994.
8. L. Fortnow and N. Reingold. PP is closed under truth-table reductions. Information and Computation, 124(1):1-6, 1996.
9. R. Impagliazzo and A. Wigderson. P = BPP unless E has sub-exponential circuits: derandomizing the XOR lemma. In Proc. 29th ACM Symposium on Theory of Computing. ACM Press, 1997.
10. J. Kadin. P^{NP[O(log n)]} and sparse Turing-complete sets for NP. Journal of Computer and System Sciences, 39:282-298, 1989.
11. R. M. Karp and R. J. Lipton. Some connections between nonuniform and uniform complexity classes. In Proc. 12th ACM Symposium on Theory of Computing, 302-309. ACM Press, 1980.
12. J. H. Lutz. Almost everywhere high nonuniform complexity. Journal of Computer and System Sciences, 44:220-258, 1992.
249
13. J. H. LUTZ. A pseudorandom oracle characterization of BPP. SIAM Journal on Computing, 22:1075-1086, 1993. 14. J. H. LUTZ. Observations on measure and lowness for A P. Theory of Computing Systems, 30:429-442, 1997. 15. J. H. LUTZ AND E. MAYORDOMO. Cook versus Karp-Levin: separating reducibilities if NP is not small. Theoretical Computer Science, 164:141-163, 1996. 16. J. H. LUTZ AND W. J. SCHMIDT. Circuit size relative to pseudorandom oracles. Theoretical Computer Science, 107:95-120, 1993. 17. N. NISAN AND A. WIGDERSON. Hardness vs randomness. Journal o~ Computer and System Sciences, 49:149-167, 1994. 18. C. PAPADIMITRIOU. Computational Complexity. Addison-Wesley, 1994. 19. S. RUDICH. Super-bits, demi-bits, and NQP-natural proofs. In Proe. 1st Intern.
Syrup. on Randomization and Approximation Techniques in Computer Science (Random'97), Lecture Notes in Computer Science #1269. Springer-Verlag, 1997. 20. M. SANTHA. Relativized Arthur-Merlin versus Merlin-Arthur games. Information and Computation, 80(1):44-49, 1989. 21. U. SCH6NING. Probabilistic complexity classes and lowness. Journal of Computer and System Sciences, 39:84-100, 1989. 22. A. SHAMIB.. On the generation of cryptographically strong pseudo-random sequences. In Proc. 8th International Colloquium on Automata, Languages, and Programming, Lecture Notes in Computer Science #62, 544-550. Springer-Verlag, 1981. 23. S. SKYUM AND L.G. VALIANT. A complexity theory based on boolean algebra. Journal of the ACM, 32:484-502, 1985. 24. S. TODA. PP is as hard as the polynomial-time hierarchy. SIAM Journal on Computing, 20:865-877, 1991. 25. A. C. YAo. Theory and applications of trapdoor functions. In Proc. $3rd IEEE Symposium on the Foundations of Computer Science, 80-91. IEEE Computer Society Press, 1982. 26. S. ZACHOS AND M. F/51~rt. Probabilistic quantifiers vs. distrustful adversaries. In Proc. 7th Conerence on Foundations of Software Technology and Theoretical Computer Science, Lecture Notes in Computer Science #287, 443-455. SpringerVerlag, 1987.
Verification of Open Systems

Moshe Y. Vardi*

Rice University, Department of Computer Science, Houston, TX 77251-1892, U.S.A.
Email: vardi@cs.rice.edu, URL: http://www.cs.rice.edu/~vardi
Abstract. In computer system design, we distinguish between closed and open systems. A closed system is a system whose behavior is completely determined by the state of the system. An open system is a system that interacts with its environment and whose behavior depends on this interaction. The ability of temporal logics to describe an ongoing interaction of a reactive program with its environment makes them particularly appropriate for the specification of open systems. Nevertheless, model-checking algorithms used for the verification of closed systems are not appropriate for the verification of open systems. Correct verification of open systems should check the system with respect to arbitrary environments and should take into account uncertainty regarding the environment. This is not the case with current model-checking algorithms and tools. Module checking is an algorithmic method that checks, given an open system (modeled as a finite structure) and a desired requirement (specified by a temporal-logic formula), whether the open system satisfies the requirement with respect to all environments. In this paper we describe and examine the module-checking problem, and study its computational complexity. Our results show that module checking is computationally harder than model checking.
1 Introduction
Temporal logics, which are modal logics geared towards the description of the temporal ordering of events, have been adopted as a powerful tool for specifying and verifying reactive systems [Pnu81]. One of the most significant developments in this area is the discovery of algorithmic methods for verifying temporal-logic properties of finite-state systems [CE81, QS81, LP85, CES86, VW86a]. This derives its significance both from the fact that many synchronization and communication protocols can be modeled as finite-state systems, as well as from the great ease of use of fully algorithmic methods. Experience has shown that algorithmic verification techniques scale up to industrial-sized designs [CGH+95], and tools based on such techniques are gaining acceptance in industry [BBG+94].

We distinguish here between two types of temporal logics: universal and non-universal. Both logics describe the computation tree induced by the system. Formulas of universal temporal logics, such as LTL, ∀CTL, and ∀CTL*, describe requirements that should hold in all the branches of the tree [GL94]. These requirements may be either linear (e.g., in all computations, only finitely many requests are sent) as in LTL or branching (e.g., in all computations we eventually reach a state from which, no matter how we continue, no requests are sent) as in ∀CTL. In both cases, the more behaviors the system has, the harder it is for the system to satisfy the requirements. Indeed, universal temporal logics induce the simulation order between systems [Mil71, CGB86]. That is, a system M simulates a system M' if and only if all universal temporal logic formulas that are satisfied in M' are satisfied in M as well. On the other hand, formulas of non-universal temporal logics, such as CTL and CTL*, may also impose possibility requirements on the system (e.g., there exists a computation in which only finitely many requests are sent) [EH86].
* Supported in part by NSF grants CCR-9628400 and CCR-9700061 and by a grant from the Intel Corporation.

Here, it is no longer
true that simulation between systems corresponds to agreement on satisfaction of requirements. Indeed, it might be that adding behaviors to the system helps it to satisfy a possibility requirement or, equivalently, that disabling some of its behaviors causes the requirement not to be satisfied.

We also distinguish between two types of systems: closed and open [HP85]. A closed system is a system whose behavior is completely determined by the state of the system. An open system is a system that interacts with its environment and whose behavior depends on this interaction. Thus, while in a closed system all the nondeterministic choices are internal, and resolved by the system, in an open system there are also external nondeterministic choices, which are resolved by the environment [Hoa85]. In order to check whether a closed system satisfies a required property, we translate the system into some formal model, specify the property with a temporal-logic formula, and check formally that the model satisfies the formula. Hence the name model checking for the verification methods derived from this viewpoint. In order to check whether an open system satisfies a required property, we should check the behavior of the system with respect to any environment, and often there is much uncertainty regarding the environment [FZ88]. In particular, it might be that the environment does not enable all the external nondeterministic choices. To see this, consider a sandwich-dispensing machine that serves, upon request, sandwiches with either ham or cheese. The machine is an open system and an environment for the system is an infinite line of hungry people. Since each person in the line can like either both ham and cheese, or only ham, or only cheese, each person suggests a different disabling of the external nondeterministic choices. Accordingly, there are many different possible environments to consider.
It turned out that model-checking methods are applicable also for verification of open systems with respect to universal temporal-logic formulas [MP92, KV96, KV97a]. To see this, consider an execution of an open system in a maximal environment; i.e., an environment that enables all the external nondeterministic choices. The result is a closed system, and it is simulated by any other execution of the system in some environment. Therefore, one can check satisfaction of universal requirements in an open system by model checking the system viewed as a closed system (i.e., all nondeterministic choices are internal). This approach, however, cannot be adapted when verifying an open system with respect to non-universal requirements. Here, satisfaction of the requirements with respect to the maximal environment does not imply their satisfaction with respect to all environments. Hence, we should explicitly make sure that all possibility requirements are satisfied, no matter how the environment restricts the system. For example, when verifying that the sandwich-dispensing machine described above can always eventually serve ham, we want to make sure that this can happen no matter what the eating habits of the people in line are. Note that while this requirement holds with respect to the maximal environment, it does not hold, for instance, in an environment in which all the people in line do not like ham.

Module checking is suggested in [KV96, KVW97, KV97a] as a general method for verification of open systems (we use the terms "open system" and "module" interchangeably). Given a module M and a temporal-logic formula ψ, the module-checking problem asks whether for all possible environments E, the execution of M in E satisfies ψ. There are two ways to model open systems. In the first approach [KV96, KVW97], we model open systems by transition systems with a partition of the states into two sets. One set contains system states and corresponds to states where the system makes a transition.
The second set contains environment states and corresponds to states where the environment makes a transition. For a module M, let ⟨TM, VM⟩ denote the unwinding of M into an infinite tree. We say that M satisfies ψ iff ψ holds in all the trees obtained by pruning from ⟨TM, VM⟩ subtrees whose root is a successor of an environment state. The intuition is that each such tree corresponds to a different (and possible) environment. We want ψ to hold in every such tree since, of course, we want the open system to satisfy its specification no matter how the environment behaves. We examine the complexity of the module-checking problem for non-universal temporal
logics. It turns out that for such logics module checking is much harder than model checking; in fact, module checking is as hard as satisfiability. Thus, CTL module checking is EXPTIME-complete and CTL* module checking is 2EXPTIME-complete. In both cases the complexity in terms of the size of the module is polynomial.

In the second approach to modeling open systems [KV97a], we look at the states of the transition system in more detail. We view these states as assignments of values to variables. These variables are controlled either by the system or by the environment. In this approach we can capture the phenomenon in which the environment has incomplete information about the system; i.e., not all the variables are readable by the environment. Let us explain this issue in greater detail. An interaction between a system and its environment proceeds through a designated set of input and output variables. In addition, the system often has internal variables, which the environment cannot read. If two states of the system differ only in the values of unreadable variables, then the environment cannot distinguish between them. Similarly, if two computations of the system differ only in the values of unreadable variables along them, then the environment cannot distinguish between them either and thus, its behaviors along these computations are the same. More formally, when we execute a module M with an environment E, and several states in the execution look the same and have the same history according to E's incomplete information, then the nondeterministic choices made by E in each of these states coincide. In the sandwich-dispensing machine example, the people in line cannot see whether the ham and the cheese are fresh.
Therefore, their choices are independent of this missing information. Given an open system M with a partition of M's variables into readable and unreadable, and a temporal-logic formula ψ, the module-checking problem with incomplete information asks whether the execution of M in E satisfies ψ, for all environments E whose nondeterministic choices are independent of the unreadable variables (that is, E behaves the same in indistinguishable states). It turns out that the presence of incomplete information makes module checking more complex. The problem of module checking with incomplete information is EXPTIME-complete and 2EXPTIME-complete for CTL and CTL*, respectively. In both cases, however, the complexity in terms of the size of the module is exponential, making module checking with incomplete information quite intractable.
2 Module Checking
The logic CTL* is a branching temporal logic. A path quantifier, E ("for some path") or A ("for all paths"), can prefix an assertion composed of an arbitrary combination of linear-time operators. There are two types of formulas in CTL*: state formulas, whose satisfaction is related to a specific state, and path formulas, whose satisfaction is related to a specific path. Formally, let AP be a set of atomic proposition names. A CTL* state formula is either:
- true, false, or p, for p ∈ AP.
- ¬φ, φ ∨ ψ, or φ ∧ ψ, where φ and ψ are CTL* state formulas.
- Eφ or Aφ, where φ is a CTL* path formula.
A CTL* path formula is either:
- A CTL* state formula.
- ¬φ, φ ∨ ψ, φ ∧ ψ, Gφ, Fφ, Xφ, or φUψ, where φ and ψ are CTL* path formulas.
The logic CTL* consists of the set of state formulas generated by the above rules.

The logic CTL is a restricted subset of CTL*. In CTL, the temporal operators G, F, X, and U must be immediately preceded by a path quantifier. Formally, it is the subset of CTL* obtained by
restricting the path formulas to be Gφ, Fφ, Xφ, or φUψ, where φ and ψ are CTL state formulas. Thus, for example, the CTL* formula φ = AGF(p ∧ EXq) is not a CTL formula. Adding a path quantifier, say A, before the F temporal operator in φ results in the formula AGAF(p ∧ EXq), which is a CTL formula.

The logic ∀CTL* is a restricted subset of CTL* that allows only universal path quantification. Thus, it allows only the path quantifier A, which must always be in the scope of an even number of negations. Note that assertions of the form ¬Aψ, which is equivalent to E¬ψ, are not possible. Thus, the logic ∀CTL* is not closed under negation. The formula φ above is not a ∀CTL* formula. Changing the path quantifier E in φ to the path quantifier A results in the formula AGF(p ∧ AXq), which is a ∀CTL* formula. The logic ∀CTL is defined similarly, as the restricted subset of CTL that allows only universal path quantification. The logics ∃CTL* and ∃CTL are defined analogously, as the existential fragments of CTL* and CTL, respectively. Note that negating a ∀CTL* formula results in an ∃CTL* formula.

The semantics of the logic CTL* (and its sub-logics) is defined with respect to a program P = (AP, W, R, w0, L), where AP is the set of atomic propositions, W is a set of states, R ⊆ W × W is a transition relation that must be total (i.e., for every w ∈ W there exists w' ∈ W such that R(w, w')), w0 is an initial state, and L : W → 2^AP maps each state to the set of atomic propositions true in this state. For w and w' with R(w, w'), we say that w' is a successor of w, and we use bd(w) to denote the number of successors that w has. A path of P is an infinite sequence π = w0, w1, ... of states such that for every i ≥ 0, we have R(w_i, w_{i+1}). The suffix w_i, w_{i+1}, ... of π is denoted by π^i.
We use w ⊨ φ to indicate that a state formula φ holds at state w, and we use π ⊨ φ to indicate that a path formula φ holds at path π (with respect to a given program P). The relation ⊨ is inductively defined as follows.
- For all w, we have that w ⊨ true and w ⊭ false.
- For an atomic proposition p ∈ AP, we have w ⊨ p iff p ∈ L(w).
- w ⊨ ¬φ iff w ⊭ φ.
- w ⊨ φ ∨ ψ iff w ⊨ φ or w ⊨ ψ.
- w ⊨ Eφ iff there exists a path π = w0, w1, ... such that w0 = w and π ⊨ φ.
- π ⊨ φ for a state formula φ iff w0 ⊨ φ, where w0 is the first state of π.
- π ⊨ ¬φ iff π ⊭ φ.
- π ⊨ Xφ iff π^1 ⊨ φ.
- π ⊨ φUψ iff there exists j ≥ 0 such that π^j ⊨ ψ and for all 0 ≤ i < j, we have π^i ⊨ φ.

The semantics above considers the Boolean operators ¬ ("negation") and ∨ ("or"), the temporal operators X ("next") and U ("until"), and the path quantifier E. The other operators are superfluous and can be viewed as the following abbreviations:
- φ ∧ ψ = ¬((¬φ) ∨ (¬ψ)) ("and").
- Fφ = trueUφ ("eventually").
- Gφ = ¬F¬φ ("always").
- Aφ = ¬E¬φ ("for all paths").
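On a finite program, the CTL fragment of these semantics can be evaluated by the classical fixpoint labeling procedure. The following is a minimal sketch; the dictionary encoding of programs and the function name `sat` are our own illustration, not taken from the paper:

```python
# Minimal CTL model-checking sketch over a finite program P = (AP, W, R, w0, L).
# Formulas are nested tuples, e.g. ('EU', ('ap', 'p'), ('ap', 'q')) for E pUq.

def sat(P, phi):
    """Return the set of states of P satisfying the CTL state formula phi."""
    W, R, L = P['W'], P['R'], P['L']
    succ = {w: [v for (u, v) in R if u == w] for w in W}
    op = phi[0]
    if op == 'true':
        return set(W)
    if op == 'ap':
        return {w for w in W if phi[1] in L[w]}
    if op == 'not':
        return set(W) - sat(P, phi[1])
    if op == 'or':
        return sat(P, phi[1]) | sat(P, phi[2])
    if op == 'EX':
        S = sat(P, phi[1])
        return {w for w in W if any(v in S for v in succ[w])}
    if op == 'EU':        # least fixpoint: start from S2, add S1-states with a marked successor
        S1, S2 = sat(P, phi[1]), sat(P, phi[2])
        Z = set(S2)
        while True:
            Znew = Z | {w for w in S1 if any(v in Z for v in succ[w])}
            if Znew == Z:
                return Z
            Z = Znew
    if op == 'EG':        # greatest fixpoint: keep states with a successor still in Z
        Z = sat(P, phi[1])
        while True:
            Znew = {w for w in Z if any(v in Z for v in succ[w])}
            if Znew == Z:
                return Z
            Z = Znew
    raise ValueError(op)
```

The remaining operators follow from the abbreviations above, e.g. EFφ = E trueUφ and AGφ = ¬EF¬φ.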
A closed system is a system whose behavior is completely determined by the state of the system. We model a closed system by a program. An open system is a system that interacts with its environment and whose behavior depends on that interaction. We model an open system by a module M = (AP, Ws, We, R, w0, L), where AP, R, w0, and L are as in programs, Ws is a set of system states, We is a set of environment states, and we often use W to denote Ws ∪ We. We assume that the states in M are ordered. For each state w ∈ W, let succ(w) be an ordered tuple of w's R-successors; i.e., succ(w) = (w1, ..., w_{bd(w)}), where for all 1 ≤ i ≤ bd(w), we
have R(w, w_i), and the w_i's are ordered. Consider a system state ws and an environment state we. Whenever a module is in the state ws, all the states in succ(ws) are possible next states. In contrast, when the module is in the state we, there is no certainty with respect to the environment transitions and not all the states in succ(we) are possible next states. The only thing guaranteed, since we consider environments that cannot block the system, is that not all the transitions from we are disabled. For a state w ∈ W, let step(w) denote the set of the possible (ordered) tuples of w's next successors during an execution. By the above, step(ws) = {succ(ws)} and step(we) contains all the nonempty sub-tuples of succ(we).

For k ∈ ℕ, let [k] denote the set {1, 2, ..., k}. An infinite tree with branching degrees bounded by k is a nonempty set T ⊆ [k]* such that if x·c ∈ T, where x ∈ [k]* and c ∈ [k], then also x ∈ T, and for all 1 ≤ c' < c, we have that x·c' ∈ T. In addition, if x ∈ T, then x·1 ∈ T. The elements of T are called nodes, and the empty word ε is the root of T. For every node x ∈ T, we denote by d(x) the branching degree of x; that is, the number of c ∈ [k] for which x·c ∈ T. A path of T is a set π ⊆ T such that ε ∈ π and for all x ∈ π, there exists a unique c ∈ [k] such that x·c ∈ π. Given an alphabet Σ, a Σ-labeled tree is a pair (T, V) where T is a tree and V : T → Σ maps each node of T to a letter in Σ.

A module M can be unwound into an infinite tree ⟨TM, VM⟩ in a straightforward way. When we examine a specification with respect to M, the specification should hold not only in ⟨TM, VM⟩ (which corresponds to a very specific environment that never restricts the set of its next states), but in all the trees obtained by pruning from ⟨TM, VM⟩ subtrees whose root is a successor of a node corresponding to an
environment state. Let exec(M) denote the set of all these trees. Formally, (T, V) ∈ exec(M) iff the following holds:
- V(ε) = w0.
- For all x ∈ T with V(x) = w, there exists (w1, ..., wn) ∈ step(w) such that T ∩ ({x}·ℕ) = {x·1, x·2, ..., x·n} and for all 1 ≤ c ≤ n we have V(x·c) = wc.
Intuitively, each tree in exec(M) corresponds to a different behavior of the environment. We will sometimes view the trees in exec(M) as 2^AP-labeled trees, taking the label of a node x to be L(V(x)). Which interpretation is intended will be clear from the context.

Given a module M and a CTL* formula ψ, we say that M satisfies ψ, denoted M ⊨ᵣ ψ, if all the trees in exec(M) satisfy ψ. The problem of deciding whether M satisfies ψ is called module checking. We use M ⊨ ψ to indicate that when we regard M as a program (thus refer to all its states as system states), then M satisfies ψ. The problem of deciding whether M ⊨ ψ is the usual model-checking problem [CE81, CES86, EL85, QS81]. It is easy to see that while M ⊨ᵣ ψ implies that M ⊨ ψ, the other direction is not necessarily true. Also, while M ⊨ ¬ψ implies that M ⊭ᵣ ψ, the other direction is not true as well. Indeed, M ⊨ᵣ ψ requires all the trees in exec(M) to satisfy ψ. On the other hand, M ⊨ ψ means that the tree ⟨TM, VM⟩ satisfies ψ. Finally, M ⊭ᵣ ψ only tells us that there exists some tree in exec(M) that satisfies ¬ψ. As explained earlier, the distinction between model checking and module checking does not apply to universal temporal logics.

Lemma 1 [KV96, KVW97]. For universal temporal logics, the module-checking problem and
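The sets step(w) used in the definition of exec(M) can be enumerated directly: a system state contributes only the full tuple succ(ws), while an environment state contributes every nonempty, order-preserving sub-tuple of succ(we). A small sketch, with a dictionary module encoding of our own:

```python
from itertools import combinations

def step(module, w):
    """All possible ordered tuples of enabled successors of w during an execution."""
    succ = module['succ'][w]                     # ordered tuple of w's R-successors
    if w in module['system_states']:
        return [succ]                            # system transitions cannot be blocked
    # environment state: any nonempty sub-tuple, with the original order preserved
    return [t for r in range(1, len(succ) + 1)
            for t in combinations(succ, r)]
```

For an environment state with k successors this yields 2^k − 1 options, which is exactly the source of the branching over environments that module checking must consider.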
the model-checking problem coincide.

In order to solve the module-checking problem for non-universal logics, we use nondeterministic tree automata. Tree automata run on Σ-labeled trees. A Büchi tree automaton is A = (Σ, D, Q, q0, δ, F), where Σ is an alphabet, D is a finite set of branching degrees (positive integers), Q is a set of states, q0 ∈ Q is an initial state, δ : Q × Σ × D → 2^{Q*} is a transition function satisfying δ(q, σ, d) ⊆ Q^d, for every q ∈ Q, σ ∈ Σ, and d ∈ D, and F ⊆ Q is an acceptance condition.
A run of A on an input Σ-labeled tree (T, V) with branching degrees in D is a Q-labeled tree (T, r) such that r(ε) = q0 and for every x ∈ T, we have (r(x·1), r(x·2), ..., r(x·d(x))) ∈ δ(r(x), V(x), d(x)). If, for instance, r(1·1) = q, V(1·1) = σ, d(1·1) = 2, and δ(q, σ, 2) = {(q1, q2), (q4, q5)}, then either r(1·1·1) = q1 and r(1·1·2) = q2, or r(1·1·1) = q4 and r(1·1·2) = q5. Given a run (T, r) and a path π ⊆ T, we define Inf(r|π) = {q ∈ Q : for infinitely many x ∈ π, we have r(x) = q}. That is, Inf(r|π) is the set of states that r visits infinitely often along π. A run (T, r) is accepting iff for all paths π ⊆ T, we have Inf(r|π) ∩ F ≠ ∅. Namely, along all the paths of T, the run visits states from F infinitely often. An automaton A accepts (T, V) iff there exists an accepting run (T, r) of A on (T, V). We use L(A) to denote the language of the automaton A; i.e., the set of all trees accepted by A.

In addition to Büchi tree automata, we also refer to Rabin tree automata. There, F ⊆ 2^Q × 2^Q, and a run is accepting iff for every path π ⊆ T, there exists a pair (G, B) ∈ F such that Inf(r|π) ∩ G ≠ ∅ and Inf(r|π) ∩ B = ∅. The size of an automaton A, denoted |A|, is defined as |Q| + |δ| + |F|, where |δ| is the sum of the lengths of the tuples that appear in the transitions in δ, and |F| is the sum of the sizes of the sets appearing in F (a single set in the case A is a Büchi automaton, and 2m sets in the case A is a Rabin automaton with m pairs). Note that |A| is independent of the sizes of Σ and D. Note also that A can be stored in space O(|A|).
3 The Complexity of Module Checking
We have already seen that for non-universal temporal logics, the model-checking problem and the module-checking problem do not coincide. In this section we study the complexity of CTL and CTL* module checking. We show that the difference between the model-checking and the module-checking problems is reflected in their complexities, and in a very significant manner.

Theorem 2 [KV96].
(1) The module-checking problem for CTL is EXPTIME-complete. (2) The module-checking problem for CTL* is 2EXPTIME-complete.

Proof (sketch): We start with the upper bounds. Given M and ψ, we define two tree automata. Essentially, the first automaton accepts the set of trees in exec(M) and the second automaton accepts the set of trees that do not satisfy ψ. Thus, M ⊨ᵣ ψ iff the intersection of the automata is empty. Recall that each tree in exec(M) is obtained from ⟨TM, VM⟩ by pruning some of its subtrees. The tree ⟨TM, VM⟩ is a 2^AP-labeled tree. We can think of a tree (T, V) ∈ exec(M) as the (2^AP ∪ {⊥})-labeled tree obtained from ⟨TM, VM⟩ by replacing the labels of nodes pruned in (T, V) by ⊥. Doing so, all the trees in exec(M) have the same shape (they all coincide with TM), and they differ only in their labeling. Accordingly, we can think of an environment to ⟨TM, VM⟩ as a strategy for placing ⊥'s in ⟨TM, VM⟩: placing a ⊥ in a certain node corresponds to the environment disabling the transition to that node. Since we consider environments that do not "block" the system, at least one successor of each node is not labeled with ⊥. Also, once the environment places a ⊥ in a certain node x, it should keep placing ⊥'s in all the nodes of the subtree that has x as its root. Indeed, all the nodes of this subtree are disabled. The first automaton, A_M, accepts all the (2^AP ∪ {⊥})-labeled trees obtained from ⟨TM, VM⟩ by such a "legal" placement of ⊥'s. Formally, given a module M = (AP, Ws, We, R, w0, L), we define A_M = (2^AP ∪ {⊥}, D, Q, q0, δ, Q), where
- D = ∪_{w ∈ W} {bd(w)}. That is, D contains all the branching degrees in M (and hence also all the branching degrees in TM).
- Q = W × {⊤, ⊢, ⊥}. Thus, every state w of M induces three states (w, ⊤), (w, ⊢), and (w, ⊥) in A_M. Intuitively, when A_M is in state (w, ⊥), it can read only the letter ⊥. When A_M is in state (w, ⊤), it can read only letters in 2^AP. Finally, when A_M is in state (w, ⊢), it can read both letters in 2^AP and the letter ⊥. Thus, while a state (w, ⊢) leaves it for the environment to decide whether the transition to w is enabled, a state (w, ⊤) requires the environment to enable the transition to w, and a state (w, ⊥) requires the environment to disable the transition to w. The three types of states help us to make sure that the environment enables all transitions from system states, enables at least one transition from each environment state, and disables transitions from states to which the transition has already been disabled.
- q0 = (w0, ⊤).
- The transition function δ : Q × (2^AP ∪ {⊥}) × D → 2^{Q*} is defined for w ∈ W and k = bd(w) as follows. Let succ(w) = (w1, ..., wk).
  - For w ∈ Ws ∪ We and m ∈ {⊢, ⊥}, we have δ((w, m), ⊥, k) = {((w1, ⊥), (w2, ⊥), ..., (wk, ⊥))}.
  - For w ∈ Ws and m ∈ {⊤, ⊢}, we have δ((w, m), L(w), k) = {((w1, ⊤), (w2, ⊤), ..., (wk, ⊤))}.
  - For w ∈ We and m ∈ {⊤, ⊢}, we have δ((w, m), L(w), k) = { ((w1, ⊤), (w2, ⊢), ..., (wk, ⊢)), ((w1, ⊢), (w2, ⊤), ..., (wk, ⊢)), ..., ((w1, ⊢), (w2, ⊢), ..., (wk, ⊤)) }.
That is, δ((w, m), L(w), k) contains k k-tuples. When the automaton proceeds according to the ith tuple, the environment can disable the transitions to all of w's successors except the transition to wi, which must be enabled. Note that δ is not defined when k ≠ bd(w), when the input does not meet the restriction imposed by the ⊤, ⊢, and ⊥ annotations, or when the input does not match the labeling of w.

Let k be the maximal branching degree in M. It is easy to see that |Q| ≤ 3·|W| and |δ| ≤ k·|R|. Thus, assuming that |W| ≤ |R|, the size of A_M is bounded by O(k·|R|).

Recall that a node of (T, V) ∈ L(A_M) that is labeled ⊥ stands for a node that actually does not exist in the corresponding pruning of ⟨TM, VM⟩. Accordingly, if we interpret CTL* formulas over the trees obtained by pruning subtrees of ⟨TM, VM⟩ by means of the trees recognized by A_M, we should treat a node that is labeled by ⊥ as a node that does not exist. To do this, we define a function f : CTL* formulas → CTL* formulas such that f(ψ) restricts path quantification to paths that never visit a state labeled with ⊥. We define f inductively as follows.
- f(q) = q.
- f(¬ξ) = ¬f(ξ).
- f(ξ1 ∨ ξ2) = f(ξ1) ∨ f(ξ2).
- f(Eξ) = E((G¬⊥) ∧ f(ξ)).
- f(Aξ) = A((F⊥) ∨ f(ξ)).
- f(Xξ) = Xf(ξ).
- f(ξ1Uξ2) = f(ξ1)Uf(ξ2).
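The transition function δ of A_M can be written out concretely. The sketch below is our own illustration of the construction (the annotations ⊤, ⊢, ⊥ are encoded as strings, and the module is a dictionary), not an implementation from the paper:

```python
TOP, MAYBE, BOT = 'top', 'maybe', 'bot'   # the annotations ⊤, ⊢, ⊥

def delta(module, state, letter):
    """Transitions of A_M from automaton state (w, mark) on the given letter."""
    w, mark = state
    succ = module['succ'][w]               # ordered tuple (w1, ..., wk)
    if letter == BOT:                      # a pruned node: the whole subtree stays pruned
        if mark in (MAYBE, BOT):
            return [tuple((v, BOT) for v in succ)]
        return []                          # a ⊤-marked node may not read ⊥
    if letter != module['L'][w] or mark == BOT:
        return []                          # label mismatch, or a pruned node reading 2^AP
    if w in module['system_states']:       # all system transitions must stay enabled
        return [tuple((v, TOP) for v in succ)]
    # environment state: the ith move forces the ith successor enabled, rest optional
    return [tuple((v, TOP if j == i else MAYBE) for j, v in enumerate(succ))
            for i in range(len(succ))]
```

Each move returned for an environment state is one of the k k-tuples described in the text: exactly one successor marked ⊤ and the rest marked ⊢.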
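The transformation f above is a straightforward recursion over the formula tree. A sketch with our own tuple encoding of CTL* formulas (the encoding and the distinguished proposition 'bot' for ⊥ are illustrative assumptions):

```python
# f rewrites a CTL* formula so that path quantifiers ignore paths visiting ⊥.
# Formulas are tuples: ('ap', p), ('not', g), ('or', g1, g2), ('and', g1, g2),
# ('E', g), ('A', g), ('X', g), ('U', g1, g2), ('G', g), ('F', g).
BOT = ('ap', 'bot')   # atomic proposition marking a pruned (⊥-labeled) node

def f(phi):
    op = phi[0]
    if op == 'ap':
        return phi
    if op == 'E':                          # f(Eξ) = E((G ¬⊥) ∧ f(ξ))
        return ('E', ('and', ('G', ('not', BOT)), f(phi[1])))
    if op == 'A':                          # f(Aξ) = A((F ⊥) ∨ f(ξ))
        return ('A', ('or', ('F', BOT), f(phi[1])))
    # all other operators commute with f
    return (op,) + tuple(f(arg) for arg in phi[1:])
```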
For example, f(EqU(AFp)) = E((G¬⊥) ∧ (qU(A((F⊥) ∨ Fp)))). When ψ is a CTL formula, the formula f(ψ) is not necessarily a CTL formula. Still, it has a restricted syntax: its path formulas have either a single linear-time operator or two linear-time operators connected by a Boolean operator. By [KG96], formulas of this syntax have a linear translation to CTL.

Given ψ, let A_{D,¬ψ} be a Büchi tree automaton that accepts exactly all the tree models of f(¬ψ) with branching degrees in D. By [VW86b], such an A_{D,¬ψ} of size 2^{O(k·|ψ|)} exists. By the definition of satisfaction, we have that M ⊨ᵣ ψ iff all the trees in exec(M) satisfy ψ; in other words, iff no tree in exec(M) satisfies ¬ψ. Recall that the automaton A_M accepts a (2^AP ∪ {⊥})-labeled tree iff it corresponds to a "legal" pruning of ⟨TM, VM⟩ by the environment, with a pruned node being labeled by ⊥. Also, the automaton A_{D,¬ψ} accepts a (2^AP ∪ {⊥})-labeled tree iff it does not satisfy ψ, with path quantification ranging only over paths that never meet a node labeled with ⊥. Hence, checking whether M ⊨ᵣ ψ can be reduced to testing L(A_M) ∩ L(A_{D,¬ψ}) for emptiness. Equivalently, we have to test L(A_M × A_{D,¬ψ}) for emptiness. By [VW86b], the nonemptiness problem of Büchi tree automata can be solved in quadratic time, which gives us an algorithm of time complexity O(|R|² · 2^{O(k·|ψ|)}).

The proof is similar for CTL*. Here, following [ES84, EJ88], we have that A_{D,¬ψ} is a Rabin tree automaton with 2^{k·2^{O(|ψ|)}} states and 2^{O(|ψ|)} pairs. By [EJ88, PR89], checking the emptiness of L(A_M × A_{D,¬ψ}) can then be done in time (k·|R|)^{2^{O(|ψ|)}} · 2^{2^{O(|ψ|)}}.

It remains to prove the lower bounds. To get an EXPTIME lower bound for CTL, we reduce CTL satisfiability, proved to be EXPTIME-complete in [FL79, Pra80], to CTL module checking.
Given a CTL formula ψ, we construct a module M and a CTL formula φ such that the size of M is quadratic in the length of ψ, the length of φ is linear in the length of ψ, and ψ is satisfiable iff M ⊭ᵣ ¬φ. The proof is the same for CTL*. Here, we do a reduction from satisfiability of CTL*, proved to be 2EXPTIME-hard in [VS85]. See [KV96] for more details.

When analyzing the complexity of model checking, a distinction should be made between complexity in the size of the input structure and complexity in the size of the input formula; it is the complexity in the size of the structure that is typically the computational bottleneck [LP85]. We now consider the program complexity [VW86a] of module checking; i.e., the complexity of this problem in terms of the size of the input module, assuming the formula is fixed. It is known that the program complexity of LTL, CTL, and CTL* model checking is NLOGSPACE [VW86a, BVW94]. This is very significant since it implies that if the system to be checked is obtained as the product of the components of a concurrent program (as is usually the case), the space required is polynomial in the size of these components rather than of the order of the exponentially larger composition. We have seen that when we measure the complexity of the module-checking problem in terms of both the program and the formula, then module checking of CTL and CTL* formulas is much harder than their model checking. We now claim that when we consider program complexity, module checking is still harder.

Theorem 3 [KV96].
The program complexity of CTL and CTL* module checking is PTIME-
complete.

Proof: Since the algorithms given in the proof of Theorem 2 are polynomial in the size of the module, membership in PTIME is immediate. We prove hardness in PTIME by reducing the Monotonic Circuit Value Problem (MCV), proved to be PTIME-hard in [Gol77], to module checking of the CTL formula EFp. In the MCV problem, we are given a monotonic Boolean circuit α (i.e., a circuit constructed solely of AND gates and OR gates), and a vector (x1, ..., xn) of Boolean input values. The problem is to determine whether the output of α on (x1, ..., xn) is 1.
Let us denote a monotonic circuit by a tuple α = (G∀, G∃, G_in, g_out, T), where G∀ is the set of AND gates, G∃ is the set of OR gates, G_in is the set of input gates (identified as g₁, ..., gₙ), g_out ∈ G∀ ∪ G∃ ∪ G_in is the output gate, and T ⊆ G × G denotes the acyclic dependencies in α; that is, (g, g') ∈ T iff the output of gate g' is an input of gate g. Given a monotonic circuit α = (G∀, G∃, G_in, g_out, T) and an input vector x = (x₁, ..., xₙ), we construct a module M_{α,x} = ({0, 1}, G∀, G∃ ∪ G_in, R, g_out, L), where

- R = T ∪ {(g, g) : g ∈ G_in}.
- For g ∈ G∀ ∪ G∃, we have L(g) = {1}. For gᵢ ∈ G_in, we have L(gᵢ) = {xᵢ}.

Clearly, the size of M_{α,x} is linear in the size of α. Intuitively, each tree in exec(M_{α,x}) corresponds to a decision of α as to how to satisfy its OR gates (we satisfy an OR gate by satisfying any nonempty subset of its inputs). It is therefore easy to see that there exists a tree (T, V) ∈ exec(M_{α,x}) such that (T, V) ⊨ AG 1 iff the output of α on x is 1. Hence, by the definition of module checking, we have that M_{α,x} ⊨ᵣ EF 0 iff the output of α on x is 0.
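To make the reduction concrete, the following sketch (our own illustration, not part of the paper; names such as `eval_monotone` are hypothetical) evaluates a monotonic circuit on an input vector. The OR gates play the role of the environment states of M_{α,x}: choosing which inputs of an OR gate to keep corresponds to the environment pruning transitions.

```python
# Illustrative sketch (ours, not from the paper): evaluating a monotonic
# circuit alpha. `feeds[g]` lists the gates whose outputs feed gate g
# (the relation T, read as an adjacency list); `inputs` maps g_i to x_i.

def eval_monotone(and_gates, or_gates, inputs, g_out, feeds):
    """Return the value (0 or 1) of gate g_out on the given inputs."""
    memo = {}

    def value(g):
        if g in memo:
            return memo[g]
        if g in inputs:                      # input gate g_i carries x_i
            v = inputs[g]
        elif g in and_gates:                 # AND gate: all inputs must be 1
            v = min(value(h) for h in feeds[g])
        else:                                # OR gate: some input must be 1
            v = max(value(h) for h in feeds[g])
        memo[g] = v
        return v

    return value(g_out)

# ((x1 AND x2) OR x3) on x = (1, 0, 1): a pruning that keeps only the
# branch of the OR gate to x3 yields a tree satisfying AG 1.
and_gates = {"a"}
or_gates = {"o"}
inputs = {"x1": 1, "x2": 0, "x3": 1}
feeds = {"o": ["a", "x3"], "a": ["x1", "x2"]}
print(eval_monotone(and_gates, or_gates, inputs, "o", feeds))  # prints 1
```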
4 Module Checking with Incomplete Information
We first need to generalize the definition of trees from Section 2. Given a finite set Υ, a Υ-tree is a nonempty set T ⊆ Υ* such that if x·υ ∈ T, where x ∈ Υ* and υ ∈ Υ, then also x ∈ T. When Υ is not important or clear from the context, we call T a tree. The elements of T are called nodes, and the empty word ε is the root of T. For every x ∈ T, the nodes x·υ ∈ T, where υ ∈ Υ, are the children of x. A Υ-tree T is a full infinite tree if T = Υ*. Each node x of T has a direction in Υ. The direction of the root is some designated υ₀ ∈ Υ. The direction of a node x·υ is υ. A path π of T is a set π ⊆ T such that ε ∈ π and for every x ∈ π there exists a unique υ ∈ Υ such that x·υ ∈ π. Given two finite sets Υ and Σ, a Σ-labeled Υ-tree is a pair (T, V) where T is a Υ-tree and V : T → Σ maps each node of T to a letter in Σ. When Υ and Σ are not important or clear from the context, we call (T, V) a labeled tree.

For finite sets X and Y, and a node x ∈ (X × Y)*, let hide_Y(x) be the node in X* obtained from x by replacing each letter (z, y) by the letter z. For example, when X = Y = {0, 1}, the node (0,0)·(1,0) of an (X × Y)-tree corresponds, by hide_Y, to the node 0·1 of an X-tree. Note that the nodes (0,0)·(1,1), (0,1)·(1,0), and (0,1)·(1,1) of the (X × Y)-tree also correspond to the node 0·1 of the X-tree. Let Z be a finite set. For a Z-labeled X-tree (T, V), we define the Y-widening of (T, V), denoted wide_Y((T, V)), as the Z-labeled (X × Y)-tree (T', V') where for every x ∈ T, we have hide_Y⁻¹(x) ⊆ T', and for every t ∈ T', we have V'(t) = V(hide_Y(t)). Note that for every node t ∈ T' and z ∈ X, the children t·(z, y) of t, for all y ∈ Y, agree on their label in (T', V'). Indeed, they are all labeled with V(hide_Y(t)·z).

We now describe a second approach to modeling open systems. We describe an open system by a module M = (I, O, H, W, w₀, R, L), where I, O, and H are sets of input, readable output, and hidden (internal) variables, respectively.
- We assume that I, O, and H are pairwise disjoint. We use K to denote the variables known to the environment; thus K = I ∪ O, and we use P to denote all variables; thus P = K ∪ H.
- W is a set of states, and w₀ ∈ W is an initial state.
- R ⊆ W × W is a total transition relation. For (w, w') ∈ R, we say that w' is a successor of w. Requiring R to be total means that every state w has at least one successor.
- L : W → 2^P maps each state to the set of variables that hold in this state. The intuition is that in every state w, the module reads L(w) ∩ I and writes L(w) ∩ (O ∪ H).
A computation of M is a sequence w₀, w₁, ... of states such that for all i ≥ 0 we have (wᵢ, wᵢ₊₁) ∈ R. We define the size |M| of M as (|W| · |P|) + |R|. We assume, without loss of generality, that all the states of M are labeled differently; i.e., there exist no w₁ and w₂ in W for which L(w₁) = L(w₂) (otherwise, we can add variables in H that differentiate states with identical labeling). With each module M we can associate a computation tree (T_M, V_M) obtained by unwinding M from the initial state. More formally, (T_M, V_M) is a 2^P-labeled 2^P-tree (not necessarily with a fixed branching degree). Each node of (T_M, V_M) corresponds to a state of M, with the root corresponding to the initial state. A node corresponding to a state w is labeled by L(w), and its children correspond to the successors of w in M. The assumption that the nodes are labeled differently enables us to embed (T_M, V_M) in a 2^P-tree, with a node with direction υ labeled υ.

A module M is closed iff I = ∅. Otherwise, it is open. Consider an open module M. The module interacts with some environment E that supplies its inputs. When M is in state w, its ability to move to a certain successor w' of w is conditioned by the behavior of its environment. If, for example, L(w') ∩ I = σ and the environment does not supply σ to M, then M cannot move to w'. Thus, the environment may disable some of M's transitions. We can think of an environment to M as a strategy E : (2^K)* → {⊤, ⊥} that maps a finite history s of a computation (as seen by the environment) to either ⊤, meaning that the environment enables M to execute s, or ⊥, meaning that the environment does not enable M to execute s. In other words, if M reaches a state w by executing some s ∈ (2^K)*, and a successor w' of w has L(w') ∩ K = σ, then an interaction of M with E can proceed from w to w' iff E(s·σ) = ⊤. We say that the tree ((2^K)*, E) maintains the strategy applied by E.
We denote by M ◁ E the execution of M in E; that is, the tree obtained by pruning from the computation tree (T_M, V_M) subtrees according to E. Note that E may disable all the successors of a state w. We say that a composition M ◁ E is deadlock free iff for every state w, at least one successor of w is enabled. Given M, we can define the maximal environment E_max for M. The maximal environment has E_max(s) = ⊤ for all s ∈ (2^K)*; thus it enables all the transitions of M. Recall that in Section 2, we modeled open systems using system and environment states, and only transitions from environment states may be disabled. Here, the interaction of the system with its environment is more explicit, and transitions are disabled by the environment assigning values to the system's input variables.

The hiding and widening operators enable us to refer to the interaction of M with E as seen by both M and E. As we shall see below, this interaction looks different from the two points of view. First, clearly, the labels of the computation tree of M, as seen by E, do not contain variables in H. Consequently, E thinks that (T_M, V_M) is a 2^K-tree, rather than a 2^P-tree. Indeed, E cannot distinguish between two nodes that differ only in the values of the variables in H in their labels. Accordingly, a branch of (T_M, V_M) into two such nodes is viewed by E as a single transition. This incomplete information of E is reflected in its strategy, which is independent of H. Thus, successors of a state that agree on the labeling of the readable variables are either all enabled or all disabled. Formally, if ((2^K)*, E) is the {⊤, ⊥}-labeled 2^K-tree that maintains the strategy applied by E, then the {⊤, ⊥}-labeled 2^P-tree wide_{2^H}(((2^K)*, E)) maintains the "full" strategy for E, as seen by someone who sees both K and H. Another way to see the effect of incomplete information is to associate with each environment E a tree obtained from (T_M, V_M) by pruning some of its subtrees.
A subtree with root x ∈ T_M is pruned iff E(hide_{2^H}(x)) = ⊥. Every two nodes x₁ and x₂ that are indistinguishable according to E's incomplete information have hide_{2^H}(x₁) = hide_{2^H}(x₂). Hence, either both subtrees with roots x₁ and x₂ are pruned or both are not pruned. Note that once E(s) = ⊥ for some s ∈ (2^K)*, we can assume that E(s·t) = ⊥ for all t ∈ (2^K)* as well. Indeed, once the environment disables the transition to a certain node s, it actually disables the transitions to all the nodes
in the subtree with root s. Note also that M ◁ E is deadlock free iff for every x ∈ T_M with E(hide_{2^H}(x)) = ⊤, at least one direction υ ∈ 2^P has x·υ ∈ T_M and E(hide_{2^H}(x·υ)) = ⊤.
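The machinery of this section, hiding the variables in H, strategies over readable histories, and the pruned composition M ◁ E, can be sketched as follows (our own bounded-depth illustration with hypothetical names, not code from the paper):

```python
# Illustrative sketch (ours): a module as successor/label dicts, the hiding
# of hidden variables, and the pruning M <| E induced by a strategy that
# sees only readable histories.

def hide(label, K):
    """hide_{2^H}: project a state label onto the readable variables K."""
    return frozenset(label) & K

def compose(succ, label, w0, K, strategy, depth):
    """Unwind M from w0 to `depth`, pruning successors whose readable
    history the strategy maps to False. Returns the pruned tree as nested
    dicts, or None if the composition deadlocks at some enabled node."""
    def walk(w, hist, d):
        if d == 0:
            return {}
        children = {}
        for w2 in succ[w]:
            h2 = hist + (hide(label[w2], K),)
            if strategy(h2):            # all H-variants of h2 share this verdict
                sub = walk(w2, h2, d - 1)
                if sub is None:
                    return None
                children[w2] = sub
        if not children:                # R is total, so this is a deadlock
            return None
        return children
    root_hist = (hide(label[w0], K),)
    return walk(w0, root_hist, depth) if strategy(root_hist) else {}

# A 3-state module; K = {p}, H = {h}. States w1 and w2 differ only on h,
# so any strategy enables or disables them together.
succ = {"w0": ["w1", "w2"], "w1": ["w0"], "w2": ["w0"]}
label = {"w0": {"p"}, "w1": {"p"}, "w2": {"p", "h"}}
K = frozenset({"p"})
tree = compose(succ, label, "w0", K, lambda hist: True, 3)  # maximal environment
print(sorted(tree))  # prints ['w1', 'w2']
```

Because `hide` collapses H-variants of a history before the strategy is consulted, successors agreeing on the readable variables are necessarily enabled or disabled together, as required above.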
5 The Complexity of Module Checking with Incomplete Information
The module-checking with incomplete information problem is defined as follows. Let M be a module, and let ψ be a temporal-logic formula over the set P of M's variables. Does M ◁ E satisfy ψ for every environment E for which M ◁ E is deadlock free? When the answer to the module-checking question is positive, we say that M reactively satisfies ψ, denoted M ⊨ᵣ ψ. Note that when H = ∅, i.e., there are no hidden variables, we get the module-checking problem, which was studied in Section 3. Even with incomplete information, the distinction between model checking and module checking does not apply to universal temporal logics.
Lemma 4 ([KV97a]). For universal temporal logics, the module-checking with incomplete information problem and the model-checking problem coincide.
Dealing with incomplete information for non-universal logics is more complicated. The solution we suggest is based on alternating tree automata and is outlined below. In Sections 5.1 and 5.2, we define alternating tree automata and describe the solution in detail. We start by recalling the solution to the module-checking problem. Given M and ψ, we proceed as follows.

1. Define a nondeterministic tree automaton A_M that accepts all the 2^P-labeled trees that correspond to compositions of M with some E for which M ◁ E is deadlock free.
We now turn to a detailed description of the solution of the module-checking problem with incomplete information, and the complexity results it entails. For that, we first formally define alternating tree automata.
5.1 Alternating Tree Automata
Alternating tree automata generalize nondeterministic tree automata and were first introduced in [MS87]. An alternating tree automaton A = (Σ, Q, q₀, δ, α) runs on full Σ-labeled Υ-trees (for an agreed set Υ of directions). It consists of a finite set Q of states, an initial state q₀ ∈ Q, a transition function δ, and an acceptance condition α (a condition that defines a subset of Q^ω). For a set Υ of directions, let B⁺(Υ × Q) be the set of positive Boolean formulas over Υ × Q; i.e., Boolean formulas built from elements in Υ × Q using ∧ and ∨, where we also allow the formulas true and false and, as usual, ∧ has precedence over ∨. The transition function δ : Q × Σ → B⁺(Υ × Q) maps a state and an input letter to a formula that suggests a new configuration for the automaton. For example, when Υ = {0, 1}, having δ(q, σ) = (0, q₁) ∧ (0, q₂) ∨ (0, q₂) ∧ (1, q₂) ∧ (1, q₃) means that when the automaton is in state q and reads the letter σ, it can either send two copies, in states q₁ and q₂, to direction 0 of the tree, or send a copy in state q₂ to direction 0 and two copies, in states q₂ and q₃, to direction 1. Thus, unlike nondeterministic tree automata, here the transition function may require the automaton to send several copies to the same direction, or allow it not to send copies to all directions. A run of an alternating automaton A on an input Σ-labeled Υ-tree (T, V) is a tree (T_r, r) in which the root is labeled by q₀ and every other node is labeled by an element of Υ* × Q. Each node of T_r corresponds to a node of T. A node in T_r labeled by (x, q) describes a copy of the automaton that reads the node x of T and visits the state q. Note that many nodes of T_r can correspond to the same node of T; in contrast, in a run of a nondeterministic automaton on (T, V) there is a one-to-one correspondence between the nodes of the run and the nodes of the tree. The labels of a node and its children have to satisfy the transition function. For example, if
(T, V) is a {0, 1}-tree with V(ε) = a and δ(q₀, a) = ((0, q₁) ∨ (0, q₂)) ∧ ((0, q₃) ∨ (1, q₂)), then the nodes of (T_r, r) at level 1 include the label (0, q₁) or (0, q₂), and include the label (0, q₃) or (1, q₂). Each infinite path ρ in (T_r, r) is labeled by a word r(ρ) in Q^ω. Let inf(ρ) denote the set of states in Q that appear in r(ρ) infinitely often. A run (T_r, r) is accepting iff all its infinite paths satisfy the acceptance condition. In Büchi alternating tree automata, α ⊆ Q, and an infinite path ρ satisfies α iff inf(ρ) ∩ α ≠ ∅. As with nondeterministic automata, an automaton accepts a tree iff there exists an accepting run on it. We denote by L(A) the language of the automaton A; i.e., the set of all labeled trees that A accepts. We say that an automaton is nonempty iff L(A) ≠ ∅. We define the size |A| of an alternating automaton A = (Σ, Q, q₀, δ, α) as |Q| + |α| + |δ|, where |Q| and |α| are the respective cardinalities of the sets Q and α, and where |δ| is the sum of the lengths of the satisfiable (i.e., not false) formulas that appear as δ(q, σ) for some q and σ.

5.2 Solving the Problem of Module Checking with Incomplete Information
Theorem 5 ([KV97a]). Given a module M and a CTL formula ψ over the sets I, O, and H of M's variables, there exists an alternating Büchi tree automaton A_{M,ψ} over {⊤, ⊥}-labeled 2^{I∪O}-trees, of size O(|M| · |ψ|), such that L(A_{M,ψ}) is exactly the set of strategies E such that M ◁ E is deadlock free and satisfies ψ.
Proof (sketch): Let M = (I, O, H, W, w₀, R, L), and let K = I ∪ O. For w ∈ W and υ ∈ 2^K, we define s(w, υ) = {w' | (w, w') ∈ R and L(w') ∩ K = υ} and d(w) = {υ | s(w, υ) ≠ ∅}.
That is, s(w, υ) contains all the successors of w that agree in their readable variables with υ. Each such successor corresponds to a node in (T_M, V_M) with a direction in hide_{2^H}⁻¹(υ). Accordingly, d(w) contains all directions υ for which nodes corresponding to w in (T_M, V_M) have at least one successor with a direction in hide_{2^H}⁻¹(υ). Essentially, the automaton A_{M,ψ} is similar to the product alternating tree automaton obtained in the alternating-automata-theoretic framework for CTL model checking [BVW94]. There, as there is a single computation tree with respect to which the formula is checked, the automaton obtained is a 1-letter automaton. Here, as there are many computation trees to check, we get a 2-letter automaton: each {⊤, ⊥}-labeled tree induces a different computation tree, and A_{M,ψ} considers them all. In addition, it checks that the composition of the strategy in the input with M is deadlock free. We assume that ψ is given in positive normal form; thus negations are applied only to atomic propositions. We define A_{M,ψ} = ({⊤, ⊥}, Q, q₀, δ, α), where
- Q = (W × (cl(ψ) ∪ {p⊤}) × {∀, ∃}) ∪ {q₀}, where cl(ψ) denotes the set of ψ's subformulas. Intuitively, when the automaton is in state (w, φ, ∀), it accepts all strategies for which w is either pruned or satisfies φ, where φ = p⊤ is satisfied iff the root of the strategy is labeled ⊤. When the automaton is in state (w, φ, ∃), it accepts all strategies for which w is not pruned and it satisfies φ. We call ∀ and ∃ the mode of the state. While the states in W × {p⊤} × {∀, ∃} check that the composition of M with the strategy in the input is deadlock free, the states in W × cl(ψ) × {∀, ∃} check that this composition satisfies ψ. The initial state q₀ sends copies to check both the deadlock freeness of the composition and the satisfaction of ψ.
- The transition function δ : Q × Σ → B⁺(2^K × Q) is defined as follows (with m ∈ {∃, ∀}).
  • δ(q₀, ⊥) = false, and δ(q₀, ⊤) = δ((w₀, p⊤, ∃), ⊤) ∧ δ((w₀, ψ, ∃), ⊤).
  • For all w and φ, we have δ((w, φ, ∀), ⊥) = true and δ((w, φ, ∃), ⊥) = false.
  • δ((w, p⊤, m), ⊤) = (⋁_{υ∈d(w)} ⋁_{w'∈s(w,υ)} (υ, (w', p⊤, ∃))) ∧ (⋀_{υ∈2^K} ⋀_{w'∈s(w,υ)} (υ, (w', p⊤, ∀))).
  • δ((w, p, m), ⊤) = true if p ∈ L(w), and δ((w, p, m), ⊤) = false if p ∉ L(w).
  • δ((w, ¬p, m), ⊤) = true if p ∉ L(w), and δ((w, ¬p, m), ⊤) = false if p ∈ L(w).
  • δ((w, φ₁ ∧ φ₂, m), ⊤) = δ((w, φ₁, m), ⊤) ∧ δ((w, φ₂, m), ⊤).
  • δ((w, φ₁ ∨ φ₂, m), ⊤) = δ((w, φ₁, m), ⊤) ∨ δ((w, φ₂, m), ⊤).
  • δ((w, AXφ, m), ⊤) = ⋀_{υ∈2^K} ⋀_{w'∈s(w,υ)} (υ, (w', φ, ∀)).
  • δ((w, EXφ, m), ⊤) = ⋁_{υ∈2^K} ⋁_{w'∈s(w,υ)} (υ, (w', φ, ∃)).
  • δ((w, Aφ₁Uφ₂, m), ⊤) = δ((w, φ₂, m), ⊤) ∨ (δ((w, φ₁, m), ⊤) ∧ ⋀_{υ∈2^K} ⋀_{w'∈s(w,υ)} (υ, (w', Aφ₁Uφ₂, ∀))).
  • δ((w, Eφ₁Uφ₂, m), ⊤) = δ((w, φ₂, m), ⊤) ∨ (δ((w, φ₁, m), ⊤) ∧ ⋁_{υ∈2^K} ⋁_{w'∈s(w,υ)} (υ, (w', Eφ₁Uφ₂, ∃))).
  • δ((w, AGφ, m), ⊤) = δ((w, φ, m), ⊤) ∧ ⋀_{υ∈2^K} ⋀_{w'∈s(w,υ)} (υ, (w', AGφ, ∀)).
  • δ((w, EGφ, m), ⊤) = δ((w, φ, m), ⊤) ∧ ⋁_{υ∈2^K} ⋁_{w'∈s(w,υ)} (υ, (w', EGφ, ∃)).

Consider, for example, a transition from the state (w, AXφ, ∃). First, if the transition to w is disabled (that is, the automaton reads ⊥), then, as the current mode is existential, the run is rejecting. If the transition to w is enabled, then the successors of w that are enabled should satisfy φ. The state w may have several successors that agree on some labeling υ ∈ 2^K and differ only on the labeling of the variables in H. These successors are indistinguishable by the environment, and the automaton sends them all to the same direction υ. This guarantees that either all these successors are enabled by the strategy (in case the letter to be read in direction υ is ⊤) or all are disabled (in case the letter in direction υ is ⊥). In addition, since the requirement to satisfy φ concerns only successors of w that are enabled, the mode of the new states is universal. The copies of A_{M,ψ} that check the composition with the strategy
to be deadlock free guarantee that at least one successor of w is enabled. Note that as the transition relation R is total, the conjunctions and disjunctions in δ cannot be empty.
- α = W × G(ψ) × {∃, ∀}, where G(ψ) is the set of all formulas of the form AGφ or EGφ in cl(ψ). Thus, while the automaton cannot get trapped in states associated with "Until-formulas" (then, the eventuality of the until is not satisfied), it may get trapped in states associated with "Always-formulas" (then, the safety requirement is never violated).

We now consider the size of A_{M,ψ}. Clearly, |Q| = O(|W| · |ψ|). Also, as the transition associated with a state (w, φ, m) depends on the successors of w, we have that |δ| = O(|R| · |ψ|). Finally, |α| ≤ |Q|, and we are done. □

Extending the alternating automata described in [BVW94] to handle incomplete information is possible thanks to the special structure of the automata, which alternate between universal and existential modes. This structure (the "hesitation condition", as it is called in [BVW94]) exists also in automata associated with CTL* formulas, and implies the following analogous theorem.
Theorem 6 ([KV97a]). Given a module M and a CTL* formula ψ over the sets I, O, and H of M's variables, there exists an alternating Rabin tree automaton A_{M,ψ} over {⊤, ⊥}-labeled 2^{I∪O}-trees, with |W| · 2^{O(|ψ|)} states and two pairs, such that L(A_{M,ψ}) is exactly the set of strategies E such that M ◁ E is deadlock free and satisfies ψ.
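The positive Boolean transition formulas B⁺(Υ × Q) on which these automata are built can be made concrete with a small sketch (ours, with hypothetical names; not code from the paper). It encodes the example transition δ(q, σ) from Section 5.1 and checks whether a set of (direction, state) moves chosen by the automaton satisfies it:

```python
# Illustrative sketch (ours): positive Boolean formulas over pairs
# (direction, state) and the satisfaction check used to validate one
# step of a run of an alternating tree automaton.

TRUE, FALSE = ("true",), ("false",)

def atom(d, q):
    return ("atom", (d, q))

def conj(*fs):
    return ("and",) + fs

def disj(*fs):
    return ("or",) + fs

def sat(formula, moves):
    """Does the set `moves` of (direction, state) pairs satisfy the formula?"""
    tag = formula[0]
    if tag == "true":
        return True
    if tag == "false":
        return False
    if tag == "atom":
        return formula[1] in moves
    if tag == "and":
        return all(sat(f, moves) for f in formula[1:])
    return any(sat(f, moves) for f in formula[1:])   # tag == "or"

# delta(q, sigma) = (0,q1) ^ (0,q2)  v  (0,q2) ^ (1,q2) ^ (1,q3),
# the example transition from Section 5.1.
delta = disj(conj(atom(0, "q1"), atom(0, "q2")),
             conj(atom(0, "q2"), atom(1, "q2"), atom(1, "q3")))

print(sat(delta, {(0, "q1"), (0, "q2")}))             # prints True
print(sat(delta, {(0, "q2"), (1, "q2"), (1, "q3")}))  # prints True
print(sat(delta, {(0, "q2")}))                        # prints False
```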
We now consider the complexity bounds that follow from our algorithm.

Theorem 7 ([KV97a]). The module-checking problem with incomplete information is EXPTIME-complete for CTL and 2EXPTIME-complete for CTL*.
Proof (sketch): The lower bounds follow from the known bounds for module checking with complete information [KV96]. For the upper bounds, in Theorems 5 and 6 we reduced the problem M ⊨ᵣ ψ to the problem of checking the nonemptiness of the automaton A_{M,¬ψ}. When ψ is a CTL formula, A_{M,¬ψ} is an alternating Büchi automaton of size O(|M| · |ψ|). By [VW86b, MS95], checking the nonemptiness of A_{M,¬ψ} is then exponential in the sizes of M and ψ. When ψ is a CTL* formula, the automaton A_{M,¬ψ} is an alternating Rabin automaton with |W| · 2^{O(|ψ|)} states and two pairs. Accordingly, by [EJ88, MS95], checking the nonemptiness of A_{M,¬ψ} is exponential in |W| and doubly exponential in |ψ|. □

As the module-checking problem for CTL is already EXPTIME-hard for environments with complete information, it might seem as if incomplete information can be handled at no cost. This is, however, not true. By Theorem 3, the program complexity of CTL module checking with complete information is PTIME-complete. On the other hand, the time complexity of the algorithm we present here is exponential in the size of both the formula and the system. Can we do better? In Theorem 8 below, we answer this question negatively. To see why, consider a module M with hidden variables. When M interacts with an environment E, the module seen by E is different from M. Indeed, every state of the module seen by E corresponds to a set of states of M. Therefore, coping with incomplete information involves some subset construction, which blows up the state space exponentially. In our algorithm, the subset construction hides in the emptiness test of A_{M,¬ψ}.

Theorem 8 ([KV97b]). The program complexity of CTL module checking with incomplete information is EXPTIME-complete.
Proof (sketch): The upper bound follows from Theorem 7. For the lower bound, we do a reduction from the outcome problem for two-player games with incomplete information, proved to be EXPTIME-hard in [Rei84]. A two-player game with incomplete information consists of an AND-OR graph with an initial state and a set of designated states. Each of the states in the graph is labeled by readable and unreadable observations. The game is played between two players, called the OR-player and the AND-player. The two players generate together a path in the graph. The path starts at the initial state. Whenever the game is at an OR-state, the OR-player determines the next state. Whenever the game is at an AND-state, the AND-player determines the next state. The outcome problem is to determine whether the OR-player has a strategy that depends only on the readable observations (that is, a strategy that maps finite sequences of sets of readable observations to a set of known observations) such that following this strategy guarantees that, no matter how the AND-player plays, the path eventually visits one of the designated states. Given an AND-OR graph G as above, we define a module M_G such that M_G reactively satisfies a fixed CTL formula ψ iff the OR-player has no strategy as above. The environments of M_G correspond to strategies for the OR-player. Each environment suggests a pruning of (T_{M_G}, V_{M_G}) such that the set of paths in the pruned tree corresponds to a set of paths that the OR-player can force the game into, no matter how the AND-player plays. The module M_G is very similar to G, and the formula ψ requires the existence of a computation that never visits a designated state. The formal definition of M_G and ψ involves some technical complications required in order to make sure that the environment disables only transitions from OR-states. □
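The exponential cost of incomplete information comes from tracking knowledge sets: the sets of states a player (or environment) might currently be in, given only the readable observations. A minimal sketch of this subset construction (ours, with hypothetical names; the actual reductions in [Rei84] and [KV97b] are more involved):

```python
# Illustrative sketch (ours): one step of the knowledge-subset construction.
# An observer that cannot see the full state moves between *sets* of states,
# one set per readable observation -- hence the exponential blow-up.

def knowledge_successors(states, succ, obs):
    """From a knowledge set `states`, group all successors by their
    readable observation: the observer moves to one knowledge set per
    observation it may see next."""
    by_obs = {}
    for w in states:
        for w2 in succ[w]:
            by_obs.setdefault(obs[w2], set()).add(w2)
    return by_obs

# v1 and v2 carry the same readable observation "b", so after one step the
# observer knows only that the game is in {v1, v2}.
succ = {"u": ["v1", "v2"], "v1": ["u"], "v2": ["u"]}
obs = {"u": "a", "v1": "b", "v2": "b"}
k = knowledge_successors({"u"}, succ, obs)
print(sorted(k["b"]))  # prints ['v1', 'v2']
```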
6 Discussion
The discussion of the relative merits of linear versus branching temporal logics is almost as old as these paradigms [Lam80]. One of the beliefs dominating this discussion has been "while specifying is easier in LTL, model checking is easier for CTL". Indeed, the restricted syntax of CTL limits its expressive power, and many important behaviors (e.g., strong fairness) cannot be specified in CTL. On the other hand, while model checking for CTL can be done in time O(|P| · |ψ|) [CES86], it takes time O(|P| · 2^|ψ|) for LTL [LP85]. Since LTL model checking is PSPACE-complete [SC85], the latter bound probably cannot be improved. The attractive computational complexity of CTL model checking has compensated for its lack of expressive power, and branching-time model-checking tools can handle systems with extremely large state spaces [BCM+90, McM93, CGL93]. If we examine this issue more closely, however, we find that the computational superiority of CTL over LTL is not that clear. For example, as shown in [Var95, KV95], the advantage that CTL enjoys over LTL disappears also when the complexity of modular verification is considered. The distinction between closed and open systems discussed in this paper questions the computational superiority of the branching-time paradigm further. Our conclusion is that the debate about the relative merits of the linear and branching paradigms will not be settled by technical arguments such as expressive power or computational complexity. Rather, the discussion should focus on the attractiveness of the approaches to practitioners who practice computer-aided verification in realistic settings. We believe that this discussion will end with the conclusion that both approaches have their merits, and that computer-aided verification tools should therefore combine the two approaches rather than "religiously" adhere to one or the other.

References

[BBG+94] I. Beer, S. Ben-David, D. Geist, R. Gewirtzman, and M. Yoeli. Methodology and system for practical formal verification of reactive hardware. In Proc. 6th Conference on Computer Aided
Verification, volume 818 of Lecture Notes in Computer Science, pages 182-193, Stanford, June 1994.
[BCM+90] J.R. Burch, E.M. Clarke, K.L. McMillan, D.L. Dill, and L.J. Hwang. Symbolic model checking: 10^20 states and beyond. In Proceedings of the 5th Symposium on Logic in Computer Science, pages 428-439, Philadelphia, June 1990.
[BVW94] O. Bernholtz, M.Y. Vardi, and P. Wolper. An automata-theoretic approach to branching-time model checking. In D.L. Dill, editor, Computer Aided Verification, Proc. 6th Int. Conference, volume 818 of Lecture Notes in Computer Science, pages 142-155, Stanford, June 1994. Springer-Verlag, Berlin.
[CE81] E.M. Clarke and E.A. Emerson. Design and synthesis of synchronization skeletons using branching time temporal logic. In Proc. Workshop on Logic of Programs, volume 131 of Lecture Notes in Computer Science, pages 52-71. Springer-Verlag, 1981.
[CES86] E.M. Clarke, E.A. Emerson, and A.P. Sistla. Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Transactions on Programming Languages and Systems, 8(2):244-263, January 1986.
[CGB86] E.M. Clarke, O. Grumberg, and M.C. Browne. Reasoning about networks with many identical finite-state processes. In Proc. 5th ACM Symposium on Principles of Distributed Computing, pages 240-248, Calgary, Alberta, August 1986.
[CGH+95] E.M. Clarke, O. Grumberg, H. Hiraishi, S. Jha, D.E. Long, K.L. McMillan, and L.A. Ness. Verification of the Futurebus+ cache coherence protocol. Formal Methods in System Design, 6:217-232, 1995.
[CGL93] E.M. Clarke, O. Grumberg, and D. Long. Verification tools for finite-state concurrent systems. In J.W. de Bakker, W.-P. de Roever, and G. Rozenberg, editors, A Decade of Concurrency: Reflections and Perspectives (Proceedings of REX School), volume 803 of Lecture Notes in Computer Science, pages 124-175. Springer-Verlag, 1993.
[EH86] E.A. Emerson and J.Y. Halpern. Sometimes and not never revisited: On branching versus linear time. Journal of the ACM, 33(1):151-178, 1986.
[EJ88] E.A. Emerson and C. Jutla. The complexity of tree automata and logics of programs. In Proceedings of the 29th IEEE Symposium on Foundations of Computer Science, pages 368-377, White Plains, October 1988.
[EL85] E.A. Emerson and C.-L. Lei. Modalities for model checking: Branching time logic strikes back. In Proceedings of the Twelfth ACM Symposium on Principles of Programming Languages, pages 84-96, New Orleans, January 1985.
[Eme85] E.A. Emerson. Automata, tableaux, and temporal logics. In Proc. Workshop on Logic of Programs, volume 193 of Lecture Notes in Computer Science, pages 79-87. Springer-Verlag, 1985.
[ES84] E.A. Emerson and A.P. Sistla. Deciding branching time logic. In Proc. 16th ACM Symposium on Theory of Computing, Washington, April 1984.
[FL79] M.J. Fischer and R.E. Ladner. Propositional dynamic logic of regular programs. J. of Computer and Systems Sciences, 18:194-211, 1979.
[FZ88] M.J. Fischer and L.D. Zuck. Reasoning about uncertainty in fault-tolerant distributed systems. In M. Joseph, editor, Proc. Symp. on Formal Techniques in Real-Time and Fault-Tolerant Systems, volume 331 of Lecture Notes in Computer Science, pages 142-158. Springer-Verlag, 1988.
[GL94] O. Grumberg and D.E. Long. Model checking and modular verification. ACM Trans. on Programming Languages and Systems, 16(3):843-871, 1994.
[Gol77] L.M. Goldschlager. The monotone and planar circuit value problems are log space complete for P. SIGACT News, 9(2):25-29, 1977.
[Hoa85] C.A.R. Hoare. Communicating Sequential Processes. Prentice-Hall, 1985.
[HP85] D. Harel and A. Pnueli. On the development of reactive systems. In K. Apt, editor, Logics and Models of Concurrent Systems, volume F-13 of NATO Advanced Summer Institutes, pages 477-498. Springer-Verlag, 1985.
[KG96] O. Kupferman and O. Grumberg. Buy one, get one free!! Journal of Logic and Computation, 6(4):523-539, 1996.
[KV95] O. Kupferman and M.Y. Vardi. On the complexity of branching modular model checking. In Proc. 6th Conference on Concurrency Theory, volume 962 of Lecture Notes in Computer Science, pages 408-422, Philadelphia, August 1995. Springer-Verlag.
[KV96] O. Kupferman and M.Y. Vardi. Module checking. In Computer Aided Verification, Proc. 8th Int. Conference, volume 1102 of Lecture Notes in Computer Science, pages 75-86. Springer-Verlag, 1996.
[KV97a] O. Kupferman and M.Y. Vardi. Module checking revisited. In Computer Aided Verification, Proc. 9th Int. Conference, volume 1254 of Lecture Notes in Computer Science, pages 36-47. Springer-Verlag, 1997.
[KV97b] O. Kupferman and M.Y. Vardi. Weak alternating automata are not that weak. In 5th Israeli Symposium on Theory of Computing and Systems, pages 147-158. IEEE Computer Society Press, 1997.
[KVW97] O. Kupferman, M.Y. Vardi, and P. Wolper. Module checking. 1997.
[Lam80] L. Lamport. Sometimes is sometimes "not never" - on the temporal logic of programs. In Proceedings of the 7th ACM Symposium on Principles of Programming Languages, pages 174-185, January 1980.
[LP85] O. Lichtenstein and A. Pnueli. Checking that finite state concurrent programs satisfy their linear specification. In Proceedings of the Twelfth ACM Symposium on Principles of Programming Languages, pages 97-107, New Orleans, January 1985.
[McM93] K.L. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993.
[Mil71] R. Milner. An algebraic definition of simulation between programs. In Proceedings of the 2nd International Joint Conference on Artificial Intelligence, pages 481-489, September 1971.
[MP92] Z. Manna and A. Pnueli. Temporal specification and verification of reactive modules. 1992.
[MS87] D.E. Muller and P.E. Schupp. Alternating automata on infinite trees. Theoretical Computer Science, 54:267-276, 1987.
[MS95] D.E. Muller and P.E. Schupp. Simulating alternating tree automata by nondeterministic automata: New results and new proofs of theorems of Rabin, McNaughton and Safra. Theoretical Computer Science, 141:69-107, 1995.
[Pnu81] A. Pnueli. The temporal semantics of concurrent programs. Theoretical Computer Science, 13:45-60, 1981.
[PR89] A. Pnueli and R. Rosner. On the synthesis of a reactive module. In Proceedings of the Sixteenth ACM Symposium on Principles of Programming Languages, Austin, January 1989.
[Pra80] V.R. Pratt. A near-optimal method for reasoning about action. J. on Computer and System Sciences, 20(2):231-254, 1980.
[QS81] J.P. Queille and J. Sifakis. Specification and verification of concurrent systems in Cesar. In Proc. 5th International Symp. on Programming, volume 137 of Lecture Notes in Computer Science, pages 337-351. Springer-Verlag, 1981.
[Rei84] J.H. Reif. The complexity of two-player games of incomplete information. J. on Computer and System Sciences, 29:274-301, 1984.
[SC85] A.P. Sistla and E.M. Clarke. The complexity of propositional linear temporal logic. J. ACM, 32:733-749, 1985.
[Var95] M.Y. Vardi. On the complexity of modular model checking. In Proceedings of the 10th IEEE Symposium on Logic in Computer Science, June 1995.
[VS85] M.Y. Vardi and L. Stockmeyer. Improved upper and lower bounds for modal logics of programs. In Proc. 17th ACM Symp. on Theory of Computing, pages 240-251, 1985.
[VW86a] M.Y. Vardi and P. Wolper. An automata-theoretic approach to automatic program verification. In Proceedings of the First Symposium on Logic in Computer Science, pages 322-331, Cambridge, June 1986.
[VW86b] M.Y. Vardi and P. Wolper. Automata-theoretic techniques for modal logics of programs. Journal of Computer and System Science, 32(2):182-221, April 1986.
Hoare-Style Compositional Proof Systems for Reactive Shared Variable Concurrency

F.S. de Boer*¹, U. Hannemann**², and W.-P. de Roever***²

¹ Utrecht University, Department of Computer Science, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
² Christian-Albrechts-Universität zu Kiel, Institut für Informatik und Praktische Mathematik II, Preusserstrasse 1-9, 24105 Kiel, Germany
Abstract. A new compositional logic for verifying safety properties of shared variable concurrency is presented. In order to characterize infinite computations, a Hoare-style I/pre/post format is used, where the invariant I expresses the communication interface, enabling the characterization of reactive programs. This logic relates to the Rely/Guarantee paradigm of Jones [11], in that Rely/Guarantee formulae can be expressed within our formalism. As a novel feature we characterize prefixes of computations through so-called time-diagrams, mappings from a discrete total well-founded ordering to states, and combine these with action predicates (already introduced in early work of, e.g., Lamport) in order to obtain a compositional formalism. The use of time diagrams enables the expression of strongest postconditions and strongest invariants directly within the assertion language, instead of through an encoding within the natural numbers. A proof of Dekker's mutual exclusion algorithm is given.
1 Introduction
This paper represents part of our research into the usefulness and the scope of possible approaches to compositional formalisms for shared variable concurrency. It serves as a foundation for the corresponding chapter in [16]. In 1965 E.W. Dijkstra introduced the parbegin statement for describing parallel composition of processes which communicate via shared variables. But it is only recently that the compositional and fully abstract semantics of shared variable concurrency has been studied, in [4, 5]. On the other hand, the first complete logic for proving partial correctness properties of concurrent programs * e-mail: [email protected] ** e-mail: [email protected] *** e-mail: [email protected]
appeared already in 1976 and was developed by S. Owicki and D. Gries in [21]. However, their proof method is not compositional in the sense that it does not allow a derivation of a correctness specification of a parallel program in terms of local specifications of its components without reference to their internal structure. Consequently this proof method cannot be used to support top-down program design. Moreover, the relevance of a compositional reasoning pattern with respect to the complexity of (mechanically supported) correctness proofs of concurrent systems lies in the fact that the verification (in a compositional proof system) of the local components of a system can in most practical cases be mechanized fully, or at least to a very large extent. What remains is a proof that some logical combination of the specifications of the components implies the desired specification of the entire system. This latter proof in general involves purely mathematical reasoning about the underlying data-structures and as such does not involve any reasoning about specific control structures (see also [12], where the use of 'mathematics' for specification and verification of concurrent programs is strongly advocated). This abstraction from the flow of control allows for a greater control of the complexity of correctness proofs. For the model of concurrency described by CSP, which is based on a synchronous communication mechanism, several (relatively) complete compositional proof methods have been introduced, e.g., in [6, 9, 15, 13, 18, 24, 26]. These proof methods formalize reasoning about synchronous communication in terms of a trace logic, a trace being a sequence of records of communications. For the parallel composition of processes it is important to notice that their specification should only refer to projections of the trace onto those communications that involve the process at hand.
Interpreting shared variables as CSP-like processes, these methods also inspired proof methods for shared variables [19]. The first compositional characterization of shared variable concurrency was called the Rely/Guarantee (R/G) method and was conceived by Jones [11]; for complete versions of this proof system consult [17, 22]. Validity of an R/G specification of a process states that provided the environment satisfies the rely condition R, that process fulfills the guarantee condition G. The difference with the so-called assumption/commitment (A/C) method [13], introduced by Misra and Chandy in 1981, is that validity of an A/C specification of a process S stipulates that C holds after each communication of S provided A holds after all communications before that particular one, whereas for any given so-called reactive sequence of a process, as described in [4, 5], the assumption R in the R/G method refers to all its environmental moves and the commitment G to all its process transitions in every prefix of the computation at hand. The A/C and R/G methods have in common that soundness of the network rules in both systems can be proved by an inductive argument on the length of their computation sequences (respectively, traces or reactive sequences). The R/G method can be regarded as a compositional reformulation of the Owicki/Gries method, as argued in [23] on the basis of a comparison of their respective completeness proofs, since both are based on the introduction of a special kind of auxiliary variables, namely the so-called history variables which record the sequence of state-changes, and both
proofs use the same style of strongest postcondition assertions. In [2] a compositional proof method is presented which formalizes reasoning about shared variable concurrency directly in terms of histories, i.e., they need not be introduced through the addition of extra auxiliary variables. In other words, histories form an integral part of our programming logic, similarly as in the compositional proof method of [26] for CSP. In order to be able to describe parallel composition logically as conjunction we represent computation histories by time-diagrams as follows: given a discrete total well-founded ordering which represents time abstractly, a program variable is naturally viewed as a function from this abstract notion of time to the domain of values, a so-called time-diagram. Interpreting time-instances as possible interleaving points and introducing boolean variables (so-called action variables) which indicate for each time-instance whether the given process is active or not (these action variables are also used for the same purpose in [3]), we can describe logically the compositional semantics of [4, 5] in terms of time-diagrams. Thus we show in that paper that compositional pre/post style reasoning about shared variable concurrency, apart from the given underlying data structures, involves reasoning about a discrete total well-founded ordering, the first-order logic of which is decidable. Since the proof method described above only reasons about the input/output behaviour of concurrent systems, its applicability to reactive systems is limited. In this paper we extend that method in order to reason about non-terminating computations. The specification style here incorporates, in addition to the pre- and postconditions, an invariant which is interpreted w.r.t. all computations of a process, including infinite ones.
These generalized correctness formulae are suited to reactive processes, as the invariant can also be seen as an interface specification towards the environment that guarantees a certain behaviour. We demonstrate that R/G style proofs can be embedded in our proof method. On the other hand, there is still a not yet understood difference between R/G and our time-diagram based method, in that until now nobody has succeeded in extending the R/G method to real time, whereas for our approach this extension is only natural. The plan of the paper is as follows: in the next section we describe a programming language for shared variable concurrency. In Section 3 we introduce the assertion language and correctness specifications and describe their semantics. The proof system is presented in Section 4. An example of a correctness proof of a mutual exclusion algorithm is presented in Section 5. Section 6 discusses an embedding of the Rely/Guarantee formalism.
2 Programming Language
In this section we present a programming language for shared variable concurrency. Let Pvar be the set of program variables, with typical elements x, y, .... For ease of presentation we restrict the domain of values to the integers and booleans only.
Definition 1. In the grammar of the programming language below, boolean expressions are denoted by b, whereas e denotes either an arithmetic or a boolean expression (we abstract from the syntactical structure of arithmetic and boolean expressions).
S ::= b.x := e | S1; S2 | ◻_{i=1}^n b_i → S_i | ∗◻_{i=1}^n b_i → S_i | S1 ∥ S2
The execution of the guarded assignment b.x := e corresponds with the execution of an await-statement of the form await b → x := e. Namely, the execution of b.x := e is suspended in case b evaluates to false. In case b evaluates to true, control proceeds immediately with the execution of the assignment x := e, which is executed atomically. Thus the evaluation of the guard and the execution of the assignment cannot be interleaved. Sequential composition is denoted as usual by the semicolon. Execution of the choice construct ◻_{i=1}^n b_i → S_i consists of the execution of an S_i for which the corresponding guard b_i evaluates to true. The control point between the evaluation of b_i and the subsequent execution of S_i constitutes an interleaving point. The evaluation of a boolean guard itself is atomic. In case none of the boolean guards evaluates to true, the execution of the choice construct suspends. The execution of the iterative construct ∗◻_{i=1}^n b_i → S_i consists of the repeated execution of ◻_{i=1}^n b_i → S_i until all the boolean guards are false. Parallel composition of the statements S1 and S2 is denoted by S1 ∥ S2. Its execution consists of an interleaving of the atomic actions, that is, the guarded assignments and the boolean guards of S1 and S2. A program S in which all variables are local, i.e., no parallel environment refers to them, is called closed.
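The interleaving just described can be made concrete with a small executable sketch (ours, not part of the paper's formalism): each process is a list of atomic (guard, update) steps over a shared state, and all terminating interleavings are enumerated; a step whose guard evaluates to false does not fire, mirroring suspension.

```python
def runs(p, q, state):
    """Enumerate the final states of all terminating interleavings of
    processes p and q, each a list of atomic (guard, update) steps
    acting on a shared state (a dict).  A false guard blocks that
    branch, so a fully blocked pair yields nothing (deadlock)."""
    if not p and not q:
        yield dict(state)
        return
    if p:
        guard, update = p[0]
        if guard(state):
            s = dict(state)
            update(s)
            yield from runs(p[1:], q, s)
    if q:
        guard, update = q[0]
        if guard(state):
            s = dict(state)
            update(s)
            yield from runs(p, q[1:], s)

# Two atomic guarded assignments racing on x: true.x := x+1 || true.x := 2*x.
inc = [(lambda s: True, lambda s: s.update(x=s["x"] + 1))]
dbl = [(lambda s: True, lambda s: s.update(x=2 * s["x"]))]
finals = {s["x"] for s in runs(inc, dbl, {"x": 1})}
```

The two distinct final values of x come from the two possible orders of the atomic actions; this interference is exactly what a proof system for shared variables has to account for.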
3 The Mathematics of Shared-Variable Concurrency
In this section we discuss the mathematical structures and corresponding logics needed to describe and reason about shared-variable concurrency in a compositional manner. Hence we must be able to distinguish between state changes performed by the process at hand and state changes caused by the environment. In [4, 5] a compositional semantics for shared variable concurrency is introduced based on so-called reactive sequences. A reactive sequence is a sequence of pairs of states: (σ1, σ1'), (σ2, σ2'), .... A pair of states (σ, σ') represents a computation step of the process which transforms the input state σ into σ'. A 'gap' (σ1', σ2) between two consecutive computation steps (σ1, σ1') and (σ2, σ2') represents the state-changes introduced by the environment. Parallel composition in this model is then described by interleaving of reactive sequences. In a full, closed, system
gaps have disappeared; then one only considers sequences which are connected, i.e., for which σ1' = σ2. In order to be able to describe parallel composition logically as conjunction we introduce a representation of reactive sequences as time-diagrams: given a discrete total well-founded ordering which represents time, a program variable is naturally viewed as a function from time to the domain of values, a so-called time-diagram. Interpreting time-instances as possible interleaving points and introducing boolean variables which change in time to indicate whether the given process is active or not, we can describe logically the compositional semantics of [4, 5] in terms of time-diagrams. Thus compositional reasoning about shared variable concurrency, apart from the underlying data structures, involves reasoning about a discrete total well-founded ordering, providing a very simple time-structure which is sufficient for this purpose. In the context of mechanically supported program verification it is of interest to note that the first-order logic of discrete total well-founded orderings is decidable. Moreover it should be observed that we have only a qualitative notion of time, which is introduced in order to model interleaving of parallel processes and as such should be distinguished from the notion of real time as studied in, e.g., [10]. Formally we define the (typed) assertion language for describing and reasoning about time-diagrams as follows. We assume given the standard type of the integers, denoted by int, and the type of booleans, denoted by bool. Furthermore we assume given the type of points in time, denoted by time. As introduced in the previous section, the set of program variables is given by Pvar. For each x ∈ Pvar we have that x is either an integer or a boolean variable. We distinguish a set Avar ⊆ Pvar of boolean variables. Variables of Avar, with typical element a, ...
, will also be called action variables, since they will be used to indicate whether a given process is active or not. We assume that action variables do not occur in statements. The set of logical variables is denoted by Lvar (which is supposed to be disjoint from Pvar). A logical variable can be of any of the above given types int, bool or time. In the sequel we will use the symbols t, ... both for denoting time variables and time-instances (i.e., the elements of a given time-domain T).

Definition 2. We present the following main cases of a logical expression l.

l ::= time | z | x(l) | t1 ≼ t2 | x1(l) = x2(l) | ...

with z ∈ Lvar, x, x1, x2 ∈ Pvar, and t1, t2 of type time. In the above definition time is a constant of type time which is intended to denote the current time instant. The intended meaning of a logical expression x(l), where it is implicitly assumed that l is of type time, is the value of the program variable x at the time-instant denoted by l. The precedence relation on time is denoted by ≼. More complex logical expressions can be constructed using the standard vocabulary of the integers and booleans.
Definition 3. Next we define the syntax of an assertion p.

p ::= l | ¬p | p ∧ q | ∃z. p | ∃a. p
where l is of type bool, z ∈ Lvar and a ∈ Avar. Assertions are constructed from boolean logical expressions by means of the logical operations of negation, conjunction and (existential) quantification over logical variables and action variables. Note that we do not allow quantification of variables of Pvar \ Avar, that is, the variables which may occur in statements. In order to describe formally the semantics of the assertion language we need the following definitions.

Definition 4. Let Val denote the set of all possible values. The set of states Σ, with typical element σ, is given by Pvar → Val (assuming that a state maps integer variables to integers and boolean variables to booleans).

Definition 5. Given a discrete well-founded total ordering (T, ≼), a time-domain T for short, a time-diagram d is an element of D = T →fd Σ, where T →fd Σ denotes the set of partial functions from T to Σ whose domain is nonempty and downward-closed, i.e., if d(t) is defined and t' ≼ t then also d(t') is defined.

While a state assigns values to the program variables, as usual, a time-diagram describes the state-changes in time. The domain of a diagram d is denoted by dom(d). If d is finite, we denote the last time instant of d by max(d). Although computations can be both finite and infinite, we only need to check the finite computations in verifying partial correctness and safety properties, because for any program, if there is an infinite computation which is invalid, there is also an invalid finite computation. Therefore, if considering all the finite computations leads to validity, then considering all the infinite computations also leads to validity. Thus the definition of max(d) is unambiguous, as we restrict ourselves to the evaluation of assertions w.r.t. finite time diagrams. Semantically, assertions are evaluated with respect to a (time) diagram d ∈ D = T →fd Σ and a logical environment e ∈ Lvar → Val.
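For the simplest time-domain, the naturals with their usual order, the time-diagrams of Definition 5 and the restriction operation used later in Lemma 3.1 can be sketched as follows (the Python encoding and helper names are ours):

```python
def is_time_diagram(d):
    """d: dict mapping natural-number instants to states.  Over the
    naturals, a nonempty downward-closed domain is exactly the initial
    segment {0, 1, ..., max(d)}."""
    return bool(d) and set(d) == set(range(max(d) + 1))

def restrict(d, t):
    """The prefix of d up to and including instant t."""
    return {u: s for u, s in d.items() if u <= t}

# A finite diagram of a single variable x; here max(d) = 2.
d = {0: {"x": 0}, 1: {"x": 0}, 2: {"x": 1}}
```

A dict with a gap in its key set, such as {0: ..., 2: ...}, fails the downward-closure test and is therefore not a time-diagram in this encoding.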
Formally we have the following truth-definition.

Definition 6. Let σ =v σ', where v ⊆ Pvar, if for all x ∈ v we have that σ(x) = σ'(x). This notion is lifted to diagrams as follows: d =v d' if dom(d) = dom(d') and, in case both d(t) and d'(t) are defined, d(t) =v d'(t), for every t. The value of a logical expression l in a logical environment e and a diagram d, denoted by l(e)(d), is defined by a straightforward induction on l, for example,
time(e)(d) = max(d)   and   x(l1)(e)(d) = d(l1(e)(d))(x).
The truth-value of an assertion p in a logical environment e and a diagram d, denoted by p(e)(d) (or sometimes also by e, d ⊨ p), is defined by induction on p. We give the following cases:
- For z ∈ Lvar of type int or bool, we define ⊨ ∃z. p(e)(d) if there exists a value v of a corresponding type such that ⊨ p(e{v/z})(d).
- For z ∈ Lvar of type time, we define ⊨ ∃z. p(e)(d) if there exists a t ∈ dom(d) such that ⊨ p(e{t/z})(d).
- For a ∈ Avar, we define ⊨ ∃a. p(e)(d) if there exists a d' such that d =v d', for v = Pvar \ {a}, and ⊨ p(e)(d'). Observe that this in fact introduces quantification over sequences of time-instances, i.e., a second-order feature.
Note that thus quantification over time is restricted to the domain of the given diagram.

Definition 7. A logical environment e and a time-diagram d are defined to be consistent if e maps every time variable to an element of the domain of d. An assertion p is valid if for any discrete well-founded total ordering (T, ≼), we have that ⊨ p(e)(d), for any consistent e and d.

For notational convenience we introduce the next-time operator ◯l, where l is an expression of type time, and the strict precedence relation ≺. Note that the next-time operator (like all the other standard temporal operators) and the strict precedence relation can be expressed.³ In order to describe logically progress in time we introduce the following substitution operation.

Definition 8. Given an assertion p and a time variable t, the assertion p[t/time] denotes the result of the (usual) replacement of (occurrences of) time in p by t and, additionally, the replacement of every subformula ∃t'. q (∀t'. q) by the bounded quantification ∃t'(t' ≼ t ∧ q) (∀t'(t' ≼ t → q)). (Formulas of the form ∃t'(t' ≼ t ∧ q) and ∀t'(t' ≼ t → q) will also be denoted by ∃t' ≼ t. q and ∀t' ≼ t. q, respectively.)

For example, given an assertion p, the passing of one time-unit can be described by ∃t(p[t/time] ∧ time = ◯t). Observe that due to the introduction of bounded quantification the assertion p[t/time] thus refers to the time interval determined by t, which by the substitution operation is initialized to the 'old' value of time. This is formalized by the following substitution lemma.

Lemma 3.1 Let d↾t, with t a time-instance, denote the time-diagram d restricted to the set of time-instances preceding (and including) t. For any consistent logical environment e and time-diagram d, and assertion p, we have
e, d ⊨ p[t/time]   iff   e, d↾t ⊨ p
In [16] a compositional interleaving semantics of statements is presented which, given a time-domain (T, ≼), assigns to every statement S a meaning function

M⟦S⟧ ∈ (T →fd Σ) → P(T →fd Σ)
³ E.g., ◯l = l̂ with l̂ the instant satisfying l ≺ l̂ and ∀l'. l ≺ l' → l̂ ≼ l'.
(the semantics M is a straightforward reformulation of the semantics introduced in [4]). The intuition is that d' ∈ M⟦S⟧(d) if d' is an extension of d which consists of an interleaved terminating execution of S. The semantics M uses a fixed action variable a to indicate the state-changes induced by the process itself. We can then define the truth of a partial correctness specification, denoted by ⊨ {p} S {q}, formally in terms of the semantics M: ⊨ {p} S {q} if we have that whenever p(e)(d) evaluates to true and d' ∈ M⟦S⟧(d), then q(e)(d') evaluates to true as well. A proof system for partial correctness w.r.t. this semantics is presented in [2]. While this style of semantics describes the initial/final state transformation, we aim at an appropriate characterization of reactive processes, i.e., possibly non-terminating programs, which require observation of all intermediate states. Therefore we define a semantics M≤ which contains all time diagrams which are a prefix of a (possibly infinite) computation. For example, the semantics of an assignment b.x := e contains all those time-diagrams which consist of a waiting period possibly followed by the actual assignment.

Definition 9. Generalized correctness specifications (invariant specifications) are of the form I : {p} S {q}. ⊨ I : {p} S {q} if, for all time diagrams d, whenever p holds in d and d' ∈ M≤⟦S⟧(d), then (i) I holds in d', and (ii) if d' ∈ M⟦S⟧(d) then q holds in d'.

Note that the condition d' ∈ M⟦S⟧(d) states that d' is an extension of d which describes a terminating computation of S. We could have avoided this reference to our M semantics by introducing a fin predicate as termination flag of a process. The invariant I has to hold both in the initial state and in the final state of a terminated computation, since d is trivially a prefix of d, and if d' ∈ M⟦S⟧(d), then also d' ∈ M≤⟦S⟧(d) holds.
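Definition 9 can be rephrased operationally when the behaviour of a process is given as an explicit, finite set of maximal computations. The sketch below (our illustration; I, p and q are Python predicates) checks the invariant on every prefix, playing the role of the M≤ semantics, and the postcondition on the completed run:

```python
def satisfies_spec(I, p, q, computations):
    """Check I : {p} S {q} against computations, each a nonempty list
    of states whose first element is the initial state and which is
    assumed to be a complete (terminating) run."""
    for run in computations:
        if not p(run[0]):
            continue                      # precondition filters runs
        if any(not I(run[:k]) for k in range(1, len(run) + 1)):
            return False                  # invariant fails on a prefix
        if not q(run):
            return False                  # postcondition at termination
    return True

# x := x + 1 started from x = 0, recorded as the value of x over time.
comps = [[0, 1]]
ok = satisfies_spec(lambda d: all(v <= 1 for v in d),   # invariant I
                    lambda s: s == 0,                   # precondition p
                    lambda d: d[-1] == 1,               # postcondition q
                    comps)
```

For genuinely non-terminating processes only the prefix checks apply, which is why the postcondition of a non-terminating program can simply be false.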
4 The Proof System
In this section we present a proof system for deriving partial correctness specifications as introduced in the previous section. Within the proof system we frequently use the predicate wait_b(t, t'), denoting that a process waits between t and t' to pass the guard b without succeeding. 'Failed' tests of the guard result in stuttering steps, i.e., all variables (denoted by v̄) keep their values.
wait_b(t, t') ≝ ∀t̄. t ≼ t̄ ≺ t'. ((a(t̄) → ¬b(t̄)) ∧ (a(t̄) → v̄(◯t̄) = v̄(t̄)))

The actual assignment at the moment t is characterized by
exec(b.x := e, ȳ)(t) ≝ a(t) ∧ b(t) ∧ time = ◯t ∧ ȳ(time) = ȳ(t) ∧ x(time) = e(t)

Here ȳ(time) = ȳ(t) (where ȳ is a sequence of variables different from x) denotes the conjunction ∧_i y_i(time) = y_i(t) (ȳ = y1, ..., yn). A guarded assignment b.x := e is characterized w.r.t. a precondition p by the following axiom. Note that we abbreviate x := x by skip.

Assignment axiom: Let ȳ be a sequence of variables different from x.

I : {p} b.x := e {q}

where
q ≡ ∃t. ((p ∧ I)[t/time] ∧ ∃t'. wait_b(t, t') ∧ exec(b.x := e, ȳ)(t'))

and

I ≡ ∃t. p[t/time] ∧ (∃t'. wait_b(t, t') ∧ (t' = time ∨ exec(b.x := e, ȳ)(t')))
Due to the substitution of time by t in p, the (quantified) time variable t in the postcondition refers to the value of time before the execution of b.x := e. The idling period, which represents possible interleavings of parallel processes (and stuttering steps), is given by the values of t and t'. At time t' the condition b evaluates to true, execution of b.x := e takes place, and it takes one time-unit. While q coincides with the well-known characterization of the strongest postcondition of a standard assignment statement x := e in sequential programming ([8]), I can be interpreted as a strongest invariant w.r.t. p and b.x := e, as used in completeness proofs for distributed message passing systems, e.g., in [25, 14]. The rules for the other statements are as usual. Sequential rule:
I : {p} S1 {r},   I : {r} S2 {q}
--------------------------------
I : {p} S1; S2 {q}
Choice rule:

I : {p} b_i.skip; S_i {q}   i = 1, ..., n
-----------------------------------------
I : {p} ◻_{i=1}^n b_i → S_i {q}
Iteration rule:

I : {p} ◻_{i=1}^n b_i → S_i {p},   I : {p} (∧_i ¬b_i).skip {q}
--------------------------------------------------------------
I : {p} ∗◻_{i=1}^n b_i → S_i {q}

Parallel composition is described as follows.

Parallel rule:
I1 : {p1} S1 {q1},   I2 : {p2} S2 {q2}
--------------------------------------
I : {p1 ∧ p2 ∧ time = t0} S1 ∥ S2 {∃a1, a2, t1, t2. (q'1 ∧ q'2 ∧ fin ∧ act)}

where q'_i denotes the formula q_i[a_i, t_i / a, time], for i = 1, 2; fin denotes the conjunction of the formulas time = max(t1, t2) and ∀t. t_i ≺ t ≼ time → ¬a_i(t), i = 1, 2; and act denotes the formula ∀t. t0 ≺ t ≼ time → ¬(a1(t) ∧ a2(t)) ∧ (a(t) ↔ (a1(t) ∨ a2(t))). Furthermore I denotes the formula ∃a1, a2. I1[a1/a] ∧ I2[a2/a] ∧ act. The quantified action variables a1 and a2 in the postcondition of the conclusion of the above rule are introduced to distinguish the computation steps of S1 and S2, respectively. The execution times of S1 and S2 are given by the time variables t1 and t2, respectively. The initial time of the execution of the parallel composition of S1 and S2 is given by the time variable t0, which is initialized to the value of time in the precondition. The assertion
∀t. t0 ≺ t ≼ time. ¬(a1(t) ∧ a2(t)) ∧ (a(t) ↔ (a1(t) ∨ a2(t)))

expresses that the execution of S1 ∥ S2 consists of an interleaving of S1 and S2. For actual reasoning about correctness formulae, the following adaptation rules are used. An invariance axiom expresses that read-only variables are not changed:

Invariance axiom: Let x be a read-only variable of S.

∀t. t0 ≼ t ≺ time. (a(t) → x(t) = x(◯t)) : {t0 = time} S {true}

We have a similar rule to express that local variables (in the sense that any environment has at most read access to them) are not changed outside of a process:

Locality axiom: Let x be a local variable of S.

∀t. t0 ≼ t ≺ time. (¬a(t) → x(t) = x(◯t)) : {t0 = time} S {true}
Consequence rule:

I : {p} S {q},   I → I', p' → p, q → q'
---------------------------------------
I' : {p'} S {q'}
Invariance introduction:

I : {p} S {q}
-----------------
I : {p} S {q ∧ I}

Reasoning about a statement under the assumption that it cannot be interleaved, i.e., about a closed system, can be axiomatized simply by the following rule.

Non-interleaving rule:

I : {p} S {q}
------------------------------------
I ∧ ∀t. a(t) : {p} S {q ∧ ∀t. a(t)}

The additional information in the conclusion of the above rule expresses that S is active at all times. Moreover we have the elimination rule and the conjunction rule.
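To see the wait and exec predicates of the assignment axiom at work, one can evaluate them over a concrete finite diagram. In the sketch below (our encoding, not the paper's), a diagram is a list of states indexed by natural-number instants, each state carrying the action flag a, and frame lists the variables required to stutter:

```python
def wait_pred(d, b, t, t1, frame):
    """wait_b(t, t'): at every active instant in [t, t'), the guard b
    fails and the variables in frame keep their values one step later."""
    return all(not d[u]["a"]
               or (not b(d[u]) and all(d[u + 1][x] == d[u][x] for x in frame))
               for u in range(t, t1))

def exec_pred(d, b, x, e, t, frame):
    """exec(b.x := e)(t): the process is active at t, b holds, and one
    time-unit later x carries e's value while frame is unchanged."""
    s, s1 = d[t], d[t + 1]
    return (s["a"] and b(s) and s1[x] == e(s)
            and all(s1[y] == s[y] for y in frame))

# flag.x := x + 1: the environment raises flag at instant 0, then the
# process performs the assignment at instant 1.
d = [{"a": False, "flag": False, "x": 0},
     {"a": True,  "flag": True,  "x": 0},
     {"a": True,  "flag": True,  "x": 1}]
b = lambda s: s["flag"]
```

At instant 0 the process is not active, so wait_pred holds trivially; the assignment itself is witnessed by exec_pred at instant 1.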
In [16] we prove soundness, i.e., every derivable correctness specification is valid, and completeness, i.e., every valid correctness specification is derivable, of the proof system w.r.t. the compositional semantics M≤. The completeness proof follows the lines of the general pattern introduced by Cook in [7]. It is based on the expressibility in the assertion language of the strongest postcondition and of the strongest invariant.

Definition 10. For a given statement S and precondition p the strongest postcondition, denoted by SP(p, S), is defined by

{d | there exists d' s.t. d' ⊨ p and d ∈ M⟦S⟧(d')},

and the strongest invariant, denoted by SInv(p, S), is defined by

{d | there exists d' s.t. d' ⊨ p and d ∈ M≤⟦S⟧(d')}

(we assume that p does not contain free logical variables, therefore reference to a logical environment is omitted). It is worthwhile to remark here that we can express both the strongest postcondition and the strongest invariant in the assertion language directly, that is, we do not need the usual coding techniques (see [20]); this constitutes the main advantage of our axiomatization based on time diagrams.
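With computations enumerated explicitly, Definition 10 amounts to two set comprehensions. The sketch below (our illustration) takes the terminating semantics as a function from an initial diagram to its completed extensions and derives the prefix semantics from it:

```python
def prefixes(run):
    """All nonempty prefixes of a finite diagram (list of states)."""
    return [run[:k] for k in range(1, len(run) + 1)]

def SP(p, sem, initials):
    """Strongest postcondition: completed diagrams reachable from an
    initial diagram satisfying p."""
    return [d for d0 in initials if p(d0) for d in sem(d0)]

def SInv(p, sem, initials):
    """Strongest invariant: every prefix of such a diagram (our
    stand-in for the prefix semantics)."""
    return [pre for d in SP(p, sem, initials) for pre in prefixes(d)]

# x := x + 1, with diagrams recorded as lists of x-values.
sem = lambda d0: [d0 + [d0[-1] + 1]]
post = SP(lambda d: d[-1] == 0, sem, [[0], [5]])
inv = SInv(lambda d: d[-1] == 0, sem, [[0], [5]])
```

The initial diagram [5] is filtered out by the precondition, so only the run starting from x = 0 and its prefixes survive.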
5 Example: Proving a Mutual Exclusion Property
This proof style is suited to prove safety properties, e.g., mutual exclusion of concurrent processes. As an example of this type of algorithm we prove the mutual exclusion property for Dekker's well-known algorithm. The algorithm consists of two symmetrical processes P1 and P2 that use boolean variables req_i to establish that P_i requests access to its critical section and a variable turn to record which process may be in its critical section.

P_i :
∗ true → <noncritical_i>;
         req_i := true;
         ∗ req_j → ( turn = j → req_i := false;
                     ∗ turn = j → skip;
                     req_i := true );
         cflag_i := true;
         <critical_i>;
         cflag_i := false;
         req_i := false;
         turn := j
for i, j ∈ {1, 2}, i ≠ j. We introduce local booleans cflag_i to indicate when P_i is in its critical section. These processes do not terminate, thus the postcondition of their parallel composition will turn out to be false. The fact that the processes never reach their critical regions simultaneously is expressed by the invariant ¬(cflag1 ∧ cflag2). When we start executing the program we assume that no process is in its critical region and none of them has already requested this. Furthermore we regard the program in isolation, i.e., we assume that no other process exists which changes any variable occurring in our program. The correctness formula to be proved is:

¬(cflag1 ∧ cflag2) : {¬req1 ∧ ¬req2 ∧ ¬cflag1 ∧ ¬cflag2} P1 ∥ P2 {false}.
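The sections below establish this formula deductively; for an algorithm of this size the claimed invariant can also be cross-checked by brute force. The following sketch (ours, not part of the paper's method) encodes the two processes as explicit transition systems over the atomic steps of the program above and searches every reachable interleaving for a state with cflag1 ∧ cflag2:

```python
from collections import deque

def step(s, i):
    """One atomic step of process i in Dekker's algorithm.  State:
    (req1, req2, turn, cflag1, cflag2, pc1, pc2); pc encodes the
    control points between the atomic actions of the program text."""
    req1, req2, turn, c1, c2, pc1, pc2 = s
    j = 2 if i == 1 else 1
    pc = pc1 if i == 1 else pc2
    req_j = req2 if i == 1 else req1

    def out(pc_new, req=None, turn_new=None, cflag=None):
        r1, r2, cc1, cc2 = req1, req2, c1, c2
        if req is not None:
            if i == 1: r1 = req
            else: r2 = req
        if cflag is not None:
            if i == 1: cc1 = cflag
            else: cc2 = cflag
        t = turn if turn_new is None else turn_new
        p1, p2 = (pc_new, pc2) if i == 1 else (pc1, pc_new)
        return (r1, r2, t, cc1, cc2, p1, p2)

    if pc == 0: return out(1, req=True)                 # req_i := true
    if pc == 1: return out(2 if req_j else 5)           # outer guard req_j
    if pc == 2: return out(3, req=False) if turn == j else out(1)
    if pc == 3: return out(3 if turn == j else 4)       # busy-wait on turn
    if pc == 4: return out(1, req=True)                 # req_i := true, retest
    if pc == 5: return out(6, cflag=True)               # cflag_i := true
    if pc == 6: return out(7, cflag=False)              # leave critical section
    if pc == 7: return out(8, req=False)                # req_i := false
    if pc == 8: return out(0, turn_new=j)               # turn := j

def mutual_exclusion():
    """Breadth-first search of all interleavings; True iff no reachable
    state has both processes in their critical sections."""
    init = (False, False, 1, False, False, 0, 0)
    seen, todo = {init}, deque([init])
    while todo:
        s = todo.popleft()
        if s[3] and s[4]:
            return False
        for i in (1, 2):
            n = step(s, i)
            if n not in seen:
                seen.add(n)
                todo.append(n)
    return True
```

Exhaustive search is feasible here only because the reachable state space is tiny; the compositional proof below, by contrast, does not enumerate interleavings at all.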
5.1 Local Proof
We first consider the processes in isolation, constructing a specification from the proof system, and then in a second step examine their parallel composition. We assume that the statements called critical_i and noncritical_i do not refer to the variables req1, req2, turn, cflag1 and cflag2. Since the processes are symmetrical, we can restrict our local proof w.l.o.g. to P1. P1 essentially is an infinitely often executed loop construct. The following abbreviations are used frequently in this proof:
rely1 ≝ (∀t ≺ time. ¬a(t) → (req1(t) = req1(◯t) ∧ cflag1(t) = cflag1(◯t)))

rely1 formulates the assumption that req1 and cflag1 are local variables of P1.
guar1 ≝ (cflag1(time) → (∃t, t'. ¬cflag1(t) ∧ req1(t) ∧ ¬req2(t) ∧ wait(t, t') ∧ a(t') ∧ cflag1(◯t') ∧ ∀t̄. t' ≺ t̄ ≺ time. a(t̄) → (req1(t̄) = req1(◯t̄) ∧ cflag1(t̄) = cflag1(◯t̄))))

As we will see, the guar1 predicate excludes all possible computations that could violate the mutual exclusion property.
inv1 ≝ (∀t ≺ time. a(t) → (req2(t) = req2(◯t)))

Since req2 is a read-only variable of P1, we can guarantee that it is not changed by P1. Define the invariant I1 ≝ (rely1 → guar1) ∧ inv1, claiming that as long as the environment does not change req1, P1 guarantees that the critical section is entered only if ¬cflag1 ∧ req1 ∧ ¬req2 held before. Furthermore, inv1 states that P1 does not change the value of the read-only variable req2. We want to prove that

I1 : {¬req1 ∧ ¬cflag1} P1 {false}

Applying the loop rule, the 'desired' postcondition of the loop, false, is obtained trivially since the outer loop's guard is identically true.
We have to check each assignment statement with respect to this invariant. First of all
I1 : {(rely1 → (¬cflag1(time) ∧ ¬req1(time))) ∧ I1} noncritical1 {rely1 → (¬cflag1(time) ∧ ¬req1(time))}

is established mainly using the invariance axiom, since neither cflag1 nor req1 occurs in noncritical1, and the consequence rule. Similarly we can prove

I1 : {(rely1 → (¬cflag1(time) ∧ ¬req1(time))) ∧ I1} req1 := true {rely1 → (¬cflag1(time) ∧ req1(time))}

with the assignment axiom and the consequence rule. We omit here the details of the derivation of the correctness formula for the inner loop, abbreviated by
inloop1 ≝ ∗ req2 → ( turn = 2 → req1 := false;
                     ∗ turn = 2 → skip;
                     req1 := true )
which satisfies the correctness formula

I1 : {(rely1 → (req1(time) ∧ ¬cflag1(time))) ∧ I1}
     inloop1
     {∃t, t'. ((∀t̄. t0 ≼ t̄ ≺ t. rely1[t̄/time] → req1(t̄) ∧ ¬cflag1(t̄)) ∧ wait(t, t') ∧ a(t') ∧ ¬req2(t') ∧ time = ◯t' ∧ req1(time) = req1(t') ∧ cflag1(time) = cflag1(t'))}

using rely1 → (req1(time) ∧ ¬cflag1(time)) as loop invariant. The postcondition above implies
(rely1 → (¬cflag1(time) ∧ req1(time) ∧ ¬req2(time))).

The remaining assignments and critical1 are treated similarly to the assignment and noncritical1 above; we obtain formulae for these with the common invariant I1 and pre- and postconditions such that the postcondition of a statement can be adapted to the precondition of the following statement, e.g., by the invariance introduction rule. Then we apply the sequential composition rule and derive the following formula for the outer loop body P1':

I1 : {rely1 → (¬cflag1(time) ∧ ¬req1(time))}
     P1'
     {rely1 → (¬cflag1(time) ∧ ¬req1(time))}

for P1 ≡ ∗P1'; hence another application of the consequence rule and the loop rule as indicated before leads to the desired correctness formula for P1.
5.2 Parallel Composition
Given the two local proofs of P1 and P2 respectively, we clearly see that the intended precondition is the conjunction of the local preconditions, and that the postcondition is false. Our main focus here is the mutual exclusion property of the critical sections of the processes, formalized by the requirement
Vt.'~(cflagl (t) A cflag~ (l)). First we realize that cflagi is local to Pi, hence the Locality rule allows to establish loci A I as invariant of Pi, where loci ~e Vt ~_ time.-,ai (t) -+ cflagi (t) = cflagi(Ot). Now the parallel composition rule leads to the following invariant for the cooperating processes:
I1 ∧ I2 ∧ ∀t. ¬(a1(t) ∧ a2(t)) ∧ (a(t) ↔ (a1(t) ∨ a2(t))) ∧ loc1 ∧ loc2.
Since we are only interested in the cooperation of P1 and P2, we restrict ourselves to the cases where no outside interference has to be regarded: all process variables (including those of the critical sections) are changed only within these two processes. Thus the application of the Non-interleaving rule together with the invariant above implies
guar1 ∧ guar2 ∧ inv1 ∧ inv2 ∧ (∀t. ¬(a1(t) ∧ a2(t)) ∧ (a1(t) ∨ a2(t))),
since inv_j ∧ loc_j = rely_i under the assumption par, i ≠ j, i, j ∈ {1, 2}. Next we assume that cflag1(time) ∧ cflag2(time) holds during the execution of P and show that this assumption together with the formula above yields false, thus proving the requested property. Now cflag1(time) ∧ cflag2(time) and the formula above imply the following expression:
∃t1. req1(t1) ∧ ¬req2(t1) ∧ ∀t. t1 ≺ t ≼ time. req1(t) = req1(O(t)) ∧ ∃t3. req2(t3) ∧ ¬req1(t3) ∧ ∀t. t3 ≺ t ≼ time. req2(t) = req2(O(t)).
Using that T is a total order, and that
- t1 = t3 is impossible, since it contradicts req1(t1) ∧ ¬req1(t3),
- t1 ≺ t3 is impossible, as the above implies req1(t1) ∧ ∀t. t1 ≺ t ≼ time. req1(t), hence req1(t3); contradiction,
- t3 ≺ t1 is impossible for a symmetrical reason,
we obtain a contradiction. Thus ¬(cflag1(time) ∧ cflag2(time)) is a conclusion from our invariant, and we have succeeded in proving the mutual exclusion property by establishing the formula
(¬(cflag1 ∧ cflag2)) : {¬req1 ∧ ¬req2 ∧ ¬cflag1 ∧ ¬cflag2} P1 ‖ P2 {false}.
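The mutual exclusion property proved compositionally above can also be confirmed by brute force on a finite-state abstraction. The sketch below is our illustration, not part of the paper: the encoding of the Dekker-style algorithm (program counters, atomicity of the tests) and all names are assumptions. It explores every reachable state and checks that the two processes are never simultaneously critical:

```python
# Brute-force exploration of a Dekker-style mutual exclusion algorithm.
# Program counters: 0 noncritical, 1 raise request, 2 test other's request,
# 3 back off, 4 wait for turn, 5 re-raise request, 6 critical, 7 release.
# A state is (pc1, pc2, req1, req2, turn); cflag_i corresponds to pc_i == 6.

def dekker_reachable():
    init = (0, 0, False, False, 1)

    def moves(pc, req_i, req_j, turn, other):
        # One step of a single process; yields (pc', req_i', turn').
        if pc == 0:
            yield (1, req_i, turn)
        elif pc == 1:
            yield (2, True, turn)
        elif pc == 2:                          # inner loop test
            if not req_j:
                yield (6, req_i, turn)         # enter critical section
            elif turn == other:
                yield (3, req_i, turn)         # other has priority: back off
            else:
                yield (2, req_i, turn)         # keep spinning
        elif pc == 3:
            yield (4, False, turn)             # withdraw request
        elif pc == 4:                          # busy-wait while turn = other
            yield ((4 if turn == other else 5), req_i, turn)
        elif pc == 5:
            yield (2, True, turn)              # re-raise request
        elif pc == 6:
            yield (7, req_i, turn)             # leave critical section
        else:                                  # pc == 7: release
            yield (0, False, other)            # drop request, pass the turn

    seen, todo = set(), [init]
    while todo:
        s = todo.pop()
        if s in seen:
            continue
        seen.add(s)
        pc1, pc2, r1, r2, turn = s
        for npc, nr, nt in moves(pc1, r1, r2, turn, other=2):
            todo.append((npc, pc2, nr, r2, nt))
        for npc, nr, nt in moves(pc2, r2, r1, turn, other=1):
            todo.append((pc1, npc, r1, nr, nt))
    return seen

reachable = dekker_reachable()
mutex = all(not (pc1 == 6 and pc2 == 6) for (pc1, pc2, *_) in reachable)
```

The exploration is finite (at most 8 · 8 · 2 · 2 · 2 states), and `mutex` comes out true: a process enters its critical section only when the other's request flag is down, and that flag stays up throughout the critical section.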
6 Embedding the Rely and Guarantee Formalism
In the Rely/Guarantee formalism [11, 17, 22] a specification is split up into four parts. There are two assumptions on the environment: a precondition pre characterizing the initial state, and a rely condition on state pairs that characterizes a relation any transition from the environment is supposed to satisfy. These assumptions describe conditions under which the program is used. The expected behavior of the program when used under these conditions consists of a postcondition post on the final state of the program in case it terminates, and a guarantee predicate guar which characterizes a relation any transition performed by the program itself should satisfy. Formally, P sat (pre, rely, guar, post) denotes that program P satisfies the specification quadruple if for all computations τ of P, whenever τ starts in a state which satisfies pre, and any environment transition in τ satisfies rely, then any component transition in τ satisfies guar and, if τ terminates, its final state satisfies post. Now we can embed the R/G formalism into our (generalized) system in the following way. First note that the pre- and postcondition of the R/G formalism correspond to a restricted kind of pre- and postcondition in our system; namely, we only have to 'time' them: pre(time) and post(time) denote the formulas obtained from pre and post by replacing every occurrence of a program variable x by x(time). Using the action variable a to indicate the environmental steps by requiring ¬a, and the steps of the process itself by a, the rely and guar parts of an R/G specification then correspond with the following invariant I:
∀t. t0 ≺ t. (¬a(t) → rely(σ(t), σ(O(t)))) → ∀t. t0 ≺ t. (a(t) → guar(σ(t), σ(O(t))))
Hence we can express an R/G formula within our system as follows:
P sat (pre, rely, guar, post)   iff   I : {pre(time) ∧ time = t0} P {post(time)}
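To make the quadruple reading concrete, here is a small sketch (our illustration; the state representation as dictionaries and all function names are assumptions, not the paper's notation) that checks one finite computation against an R/G specification. A computation is a start state followed by (state, actor) pairs, the actor recording whether the environment or the program made the step:

```python
# Checking a single finite computation against an R/G quadruple.
# If the pre or rely assumptions are violated, nothing is demanded
# of the program, so the implication holds vacuously.

def sat(comp, pre, rely, guar, post):
    s0, steps = comp[0], comp[1:]
    if not pre(s0):
        return True                      # precondition assumption violated
    prev = s0
    for state, actor in steps:
        if actor == "env":
            if not rely(prev, state):    # rely assumption violated
                return True
        else:
            if not guar(prev, state):    # program step must satisfy guar
                return False
        prev = state
    return post(prev)                    # final state must satisfy post

# Example: the program may only increment x by one; the environment is
# unconstrained (rely = true).
pre  = lambda s: s["x"] == 0
rely = lambda s, t: True
guar = lambda s, t: t["x"] == s["x"] + 1
post = lambda s: s["x"] >= 0

ok = sat([{"x": 0}, ({"x": 5}, "env"), ({"x": 6}, "prog")],
         pre, rely, guar, post)
```

In the example `ok` is true: the environment step from x = 0 to x = 5 is allowed by rely, and the program step 5 → 6 satisfies guar. A program step that jumped by two would falsify the quadruple.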
7 Final Remarks
In the case of distributed communication, for any given fully abstract model basically four different compositional logics exist, which are intertranslatable [26, 14]; these are the sat-style, pre/post-style, I/pre/post-style and Assumption/Commitment logics. For shared variable concurrency only one such logic is known, that of Jones [11], based on the Rely/Guarantee (R/G) paradigm. There are several problems with the resulting proof methods, e.g., correctness proofs for mutual exclusion algorithms turn out rather difficult to give. By introducing the concept of time diagrams we were able to give such a compositional logic for real-time shared variable concurrency. In the present paper we extended a Hoare style formalism [2] to a compositional Hoare-style I/pre/post logic for reactive systems communicating through shared variables. We additionally embedded the R/G formalism within this logic
as another step towards a similarly mutually intertranslatable system of proof styles for shared variable concurrency, as needed for the compositional verification of reactive systems.
Acknowledgements
The authors would like to thank Qiwen Xu for his helpful comments on various versions of this manuscript.
References
1. P. Aczel. On an inference rule for parallel composition. Unpublished note, 1993.
2. F.S. de Boer, U. Hannemann and W.-P. de Roever. A compositional proof system for shared variable concurrency. To appear at FME '97, LNCS, 1997.
3. H. Barringer, R. Kuiper, and A. Pnueli. Now you may compose temporal logic specifications. In 16th ACM Symposium on Theory of Computing, pages 51-63, 1984.
4. F.S. de Boer, J.N. Kok, C. Palamidessi, and J.J.M.M. Rutten. The failure of failures: Towards a paradigm for asynchronous communication. In Proceedings of Concur '91, Lecture Notes in Computer Science, Vol. 527, pages 111-126, 1991.
5. S. Brookes. A fully abstract semantics of a shared variable parallel language. In Proceedings 8th Annual IEEE Symposium on Logic in Computer Science, IEEE Computer Society Press, pages 98-109, 1993.
6. S.D. Brookes, C.A.R. Hoare and A.W. Roscoe. A theory of Communicating Sequential Processes. JACM 31(3), pp. 560-599, 1984.
7. S.A. Cook. Soundness and completeness of an axiom system for program verification. SIAM J. on Computing 7, pp. 70-90, 1978.
8. R.W. Floyd. Assigning meaning to programs. Mathematical Aspects of Computer Science XIX, American Mathematical Society, 1967.
9. E.C.R. Hehner, C.A.R. Hoare. A more complete model of communicating processes. TCS 26, pp. 105-120, 1983.
10. J. Hooman. Specification and compositional verification of real-time systems. Lecture Notes in Computer Science, Vol. 558, 1992.
11. C.B. Jones. Development methods for computer programs including a notion of interference. PhD thesis, Oxford University Computing Laboratory, 1981.
12. L. Lamport. Verification and specification of concurrent programs. In A Decade of Concurrency (eds. J.W. de Bakker, W.-P. de Roever and G. Rozenberg), Lecture Notes in Computer Science, Vol. 803, 1993.
13. J. Misra and K.M. Chandy. Proofs of networks of processes. IEEE Transactions on Software Engineering, 7(7):417-426, 1981.
14. P. Pandya. Compositional Verification of Distributed Programs. PhD thesis, University of Bombay, 1988.
15. P. Pandya and M. Joseph. P-A logic - a compositional proof system for distributed programs. Distributed Computing, Vol. 5, pp. 37-54, 1991.
16. W.-P. de Roever, F.S. de Boer, U. Hannemann, J. Hooman, Y. Lakhnech, P. Pandya, M. Poel, H. Schepers, Q. Xu and J. Zwiers. State-Based Proof Theory of Concurrency: from Noncompositional to Compositional Methods. To appear, 1998.
17. K. Stølen. Development of Parallel Programs on Shared Data-structures. PhD thesis, Computer Science Department, Manchester University, 1990.
18. N. Soundararajan. Axiomatic semantics of communicating sequential processes. TOPLAS, 6:647-662, 1984.
19. N. Soundararajan. A proof technique for parallel programs. Theoretical Computer Science, Vol. 31, pp. 13-29, 1984.
20. J.V. Tucker and J.I. Zucker. Program correctness over abstract data types, with error-state semantics. CWI Monograph Series, Vol. 6, Centre for Mathematics and Computer Science/North-Holland, 1988.
21. S. Owicki and D. Gries. An axiomatic proof technique for parallel programs. Acta Informatica, 6:319-340, 1976.
22. Q. Xu. A theory of state-based parallel programming. PhD thesis, Oxford University Computing Laboratory, 1992.
23. Q. Xu, W.-P. de Roever and J. He. Rely-guarantee method for verifying shared variable concurrent programs. Formal Aspects of Computing, 1997 (to appear).
24. C.C. Zhou and C.A.R. Hoare. Partial correctness of CSP. Proc. IEEE Int. Conf. on Distributed Computer Systems, pp. 1-12, 1981.
25. J. Zwiers, W.-P. de Roever and P. van Emde Boas. Compositionality and concurrent networks: soundness and completeness of a proof system. Technical Report 57, University of Nijmegen, The Netherlands, 1984.
26. J. Zwiers. Compositionality, Concurrency, and Partial Correctness. Lecture Notes in Computer Science, Vol. 321, Springer-Verlag, 1989.
A Simple Characterization of Stuttering Bisimulation Kedar S. Namjoshi* Department of Computer Sciences, The University of Texas at Austin, U.S.A.
Abstract. Showing equivalence of two systems at different levels of abstraction often entails mapping a single step in one system to a sequence of steps in the other, where the relevant state information does not change until the last step. In [BCG 88, dNV 90], bisimulations that take into account such "stuttering" are formulated. These definitions are, however, difficult to use in proofs of bisimulation, as they often require one to exhibit a finite, but unbounded, sequence of transitions to match a single transition, thus introducing a large number of proof obligations. We present an alternative formulation of bisimulation under stuttering, in terms of a ranking function over a well-founded set. It has the desirable property, shared with strong bisimulation [Mil 90], that it requires matching single transitions only, which considerably reduces the number of proof obligations. This makes proofs of bisimulation short, and easier to demonstrate and understand. We show that the new formulation is equivalent to the original one, and illustrate its use with non-trivial examples that have infinite state spaces and exhibit unbounded stuttering.
1 Introduction
Showing equivalence between two systems at different levels of abstraction may entail mapping a single step in one system to a sequence of steps in the other, which is defined with a greater amount of detail. For instance, a compiler may transform the single assignment statement "x := x * 10 + 2" to several low-level instructions. When proving correctness of the compiler, the single assignment statement step is matched with a sequence of low-level steps, in which the value of x remains unchanged until the final step. If the program state is defined by the values of program variables, then the intermediate steps introduce a finite repetition of the same state, a phenomenon called "stuttering" by Lamport [La 80]. Stuttering arises in various contexts, especially as a result of operations that hide information, or refine actions to a finer grain of atomicity. In [BCG 88, dNV 90], bisimulations that take into account such "stuttering" are defined. It is shown in [BCG 88] that states related by a stuttering bisimulation
* This work was supported in part by SRC Contract 96-DP-388. The author can be reached at kedar@cs.utexas.edu.
satisfy the same formulas of the powerful branching temporal logic CTL* [EH 82] that do not use the next-time operator X. Although these definitions are well suited to showing the relationship with CTL*, they are difficult to use in proofs of bisimulation, as they often require one to exhibit a finite, but unbounded, sequence of transitions to match a single transition, thus introducing a large number of proof obligations. The main contribution of this paper is a simple alternative formulation, called well-founded bisimulation, because it is based on the reduction of a rank function over a well-founded set. The new formulation has the pleasant property that, like strong bisimulation [Mil 90], it can be checked by considering single transitions only. This substantially reduces the number of proof obligations, which is highly desirable in applications to infinite state systems such as communication protocols with unbounded channels or parameterized protocols, where checks of candidate relations are often performed by hand or with the assistance of a theorem prover. We demonstrate the use of the new formulation with some non-trivial examples that have infinite state spaces and exhibit unbounded stuttering. The use of rank functions and well-founded sets is inspired by their use in replacing operational arguments for termination of do-od loops with a proof rule that is checked for a single generic iteration (cf. [AO 91]). To the best of our knowledge, this is the first use of such concepts in a bisimulation definition. It seems possible that the ideas in this paper are applicable to other forms of bisimulation under stuttering, such as weak bisimulation [Mil 90], and branching bisimulation [GW 89]. We have chosen to focus on stuttering bisimulation because of its close connection to CTL*. The paper is structured as follows: Section 2 contains the definition of stuttering bisimulation from [BCG 88], and the definition of well-founded bisimulation.
The equivalence of the two formulations is shown in Section 3. Applications of the well-founded bisimulation proof rule to the alternating bit protocol and token ring protocols are presented in Section 4, together with a new quotient construction for stuttering bisimulation equivalences. The paper concludes with a discussion of related work and future directions.
2 Preliminaries

Notation:
Function application is denoted by a ".", i.e., for a function f : A → B and an element a ∈ A, f.a is the value of f at a. Quantified expressions are written in the format (Qx : r.x : p.x), where Q is the quantifier (one of ∀, ∃, min, max), x is the "dummy", r.x is an expression indicating the range of x, and p.x is the expression being quantified over. For example, in this notation, ∀x. r(x) ⇒ p(x) is written as (∀x : r.x : p.x), and ∃x. r(x) ∧ p(x) is written as (∃x : r.x : p.x).
Definition (Transition System) A Transition System (TS) is a structure (S, →, L, I, AP), where S is a set of states, → ⊆ S × S is the transition relation, AP is the set of atomic propositions, L : S → P(AP) is the labelling function, which maps each state to the subset of atomic propositions that hold at the state, and I is the set of initial states. We write s → t instead of (s, t) ∈ →. We only consider transition systems with denumerable branching, i.e., where for every state s, |{t | s → t}| is at most ω.

Definition (Stuttering Bisimulation) (cf. [BCG 88]¹) Let A = (S, →, L, I, AP) be a TS. A relation B ⊆ S × S is a stuttering bisimulation on A iff B is symmetric, and for every s, t such that (s, t) ∈ B,
1. L.s = L.t,
2. (∀σ : fp.(s, σ) : (∃δ : fp.(t, δ) : match.B.(σ, δ))),
where fp.(s, σ) is true iff σ is a path starting at s which is either infinite, or whose last state has no successors w.r.t. →. match.B.(σ, δ) is true iff σ and δ can be divided into an equal number of non-empty, finite segments such that any pair of states from segments with the same index is in the relation B. The formal definition of match is given in the appendix. States s and t are stuttering bisimilar iff there is a stuttering bisimulation relation B for which (s, t) ∈ B.
Examples:

[Figure: three transition structures, L, M, and N.]
¹ [BCG 88] defines "stuttering equivalence" for finite-state, total transition systems, as the limit of a converging sequence of equivalences. For finite-state systems, these are just the Knaster-Tarski approximations to the greatest solution of the symmetric version of this definition.
States a and c are not stuttering bisimilar in structures L and M, but they are in structure N. Indeed, L, c ⊨ AF.P, but L, a ⊭ AF.P. Structure M shows that stuttering bisimulation distinguishes between deadlock (state c) and divergence (state a): M, c ⊭ EX.true, but M, a ⊨ EX.true². The dotted lines show a stuttering bisimulation on structure N. Our alternative formulation is based on a simple idea from program semantics: we define a mapping from states to a well-founded set, and require, roughly, that the mapping decrease with each stuttering step. Thus, each stuttering segment is forced to be of finite length, which makes it possible to construct matching fullpaths from related states.

Definition (Well-Founded Bisimulation) Let A = (S, →, L, I, AP) be a TS. Let rank : S × S × S → W be a total function, where (W, ≺) is well-founded³. A relation B ⊆ S × S is a well-founded bisimulation on A w.r.t. rank iff B is symmetric, and for every s, t such that (s, t) ∈ B,
1. L.s = L.t,
2. (∀u : s → u :
     (∃v : t → v : (u, v) ∈ B)                                           (a)
   ∨ ((u, t) ∈ B ∧ rank.(u, u, t) ≺ rank.(s, s, t))                      (b)
   ∨ ((u, t) ∉ B ∧ (∃v : t → v : (s, v) ∈ B ∧ rank.(u, s, v) ≺ rank.(u, s, t))))  (c)

Notice that if W is a singleton, then clauses (b) and (c) are not applicable, so B is a strong bisimulation. The intuition behind this definition is that when (s, t) ∈ B and s → u, either there is a matching transition from t (clause (2a)); or (u, t) ∈ B (clause (2b)), in which case the rank decreases, allowing (2b) to be applied only a finite number of times; or (u, t) ∉ B, in which case (by clause (2c)) there must be a successor v of t such that (s, v) ∈ B. As the rank decreases at each application of (2c), clause (2c) can be applied only a finite number of times. Hence, eventually, a state related to u by B is reached. Theorem 1 (soundness) is proved along these lines.
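On a finite system the three clauses can be checked directly, one transition at a time. The following sketch is our illustration (the data representation and all names are assumptions, not the paper's): it examines every pair in a candidate relation and every single transition, exactly as the definition prescribes.

```python
# Checking the well-founded bisimulation conditions on a finite TS.
# succ maps each state to its successors; B is a symmetric set of pairs;
# rank(u, s, t) yields an element of W, compared by the strict order `less`.

def is_wf_bisimulation(succ, label, B, rank, less):
    for (s, t) in B:
        if label[s] != label[t]:
            return False                      # clause (1)
        for u in succ[s]:
            if any((u, v) in B for v in succ[t]):
                continue                      # clause (2a): matching move
            if (u, t) in B and less(rank(u, u, t), rank(s, s, t)):
                continue                      # clause (2b): stutter, rank drops
            if (u, t) not in B and any(
                    (s, v) in B and less(rank(u, s, v), rank(u, s, t))
                    for v in succ[t]):
                continue                      # clause (2c): t catches up
            return False
    return True

# Tiny example: x0 -> x1 -> z and y0 -> z, with x0, x1, y0 labelled "A"
# and z labelled "B". Rank = sum of distances to z of the 2nd and 3rd args.
succ  = {"x0": ["x1"], "x1": ["z"], "y0": ["z"], "z": []}
label = {"x0": "A", "x1": "A", "y0": "A", "z": "B"}
dist  = {"x0": 2, "x1": 1, "y0": 1, "z": 0}
rank  = lambda u, s, t: dist[s] + dist[t]
B = {("x0", "y0"), ("y0", "x0"), ("x1", "y0"), ("y0", "x1"), ("z", "z")}
ok = is_wf_bisimulation(succ, label, B, rank, less=lambda a, b: a < b)
```

Here the stutter step x0 → x1 is justified by clause (2b), and the pair (y0, x0) uses clause (2c); `ok` is true. Dropping the pair (x1, y0) from B breaks clause (2c) for (x0, y0) and the check fails.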
3 Equivalence of the two formulations
The equivalence of the two formulations is laid out in the following theorems.
² The [dNV 90] formulation of stuttering bisimulation considers states a and c of N to be bisimilar. The difference between our formulations is only in the treatment of deadlock vs. divergence in non-total structures.
³ (W, ≺) is well-founded iff there is no infinite subset {a.i | i ∈ N} of W that is a strictly decreasing chain, i.e. where for all i ∈ N, a.(i + 1) ≺ a.i.
Theorem 1 (Soundness). Any well-founded bisimulation on a TS is a stuttering bisimulation.

Proof. Let B be a well-founded bisimulation on a TS A, w.r.t. a function rank and a well-founded structure (W, ≺). Let (s, t) be an arbitrary pair in B. Then L.s = L.t, by clause (1) of the well-founded bisimulation definition. We show that if σ is a fullpath starting at s, then there is a fullpath δ starting at t such that match.B.(σ, δ) holds. In the following, we use the symbol ';' for concatenation of finite paths, and ∘ for concatenation with removal of the duplicate state. For example, aa; ab = aaab, and aa ∘ ab = aab. We construct δ inductively. For the base case, δ.0 = t. Inductively assume that after i steps, i ≥ 0, δ has been constructed to the point where it matches a prefix γ of σ such that the end states of γ and δ mark the beginning of the ith segments. Let u be the last state of γ and v be the last state of δ. By the inductive hypothesis, (u, v) ∈ B.
If σ ends at u, then u has no successor states. Let π be any fullpath starting at v. Since u has no successors, a simple induction using (2b) shows that for every state x in π, (x, u) is in B. Each application of (2b) strictly decreases rank along π, hence π must be finite. The fullpath δ ∘ π is a finite fullpath matching the finite fullpath σ.
If σ does not end at u, let w be the successor of u in σ. As (u, v) ∈ B,
(i) If (2a) holds, there is a successor x of v such that (w, x) ∈ B. Let w and x mark the beginning of a new segment. Extend δ to δ; x, which matches γ; w. The induction step is proved. Otherwise,
(ii) If (2a) does not hold, but (2b) does, then (w, v) ∈ B. Let λ be the longest prefix of the suffix of σ starting at u such that for every state a in λ, (a, v) ∈ B, and only (2b) holds for (a, v) w.r.t. a → b for every successive pair of states a; b in λ. λ has at least one pair, as u; w is a prefix of λ. λ cannot be infinite, as by (2b), for each successive pair a; b in λ, rank.(b, b, v) ≺ rank.(a, a, v), so the rank decreases strictly in the well-founded set. Let y be the last state of λ. If σ terminates at y, the argument given earlier applies. Otherwise, y has a successor y' in σ, but as λ is maximal, either (2a) or (2c) must apply for (y, v) ∈ B w.r.t. y → y'. (2c) cannot apply, as then there is a successor x of v such that (y, x) ∈ B, which contradicts the properties of λ. Hence (2a) must apply. Let x be the successor of v such that (y', x) ∈ B. Let y' and x mark the beginning of a new segment, and extend δ to δ; x, which matches (γ ∘ λ); y'.
(iii) If (2c) is the only clause that holds of (u, v) w.r.t. u → w, let π be a finite path maximal w.r.t. prefix ordering such that π starts at v, and for every successive pair of states a; b in π, (u, a) ∈ B, only (2c) is applicable w.r.t. u → w, and b is the successor of a given by the application of (2c).
Such a maximal finite path exists as, otherwise, there is an infinite path π satisfying the conditions above. By (2c), for successive states a; b in π, rank.(w, u, b) ≺ rank.(w, u, a); so there would be an infinite strictly decreasing chain in (W, ≺), which contradicts the well-foundedness of (W, ≺). Let x be the last state in π. Then (u, x) ∈ B, and as π is maximal, either (2a) or (2b) holds of (u, x) w.r.t. u → w. So x ≠ v. (2b) cannot hold, as then (w, x) is in B; but then (2a) would hold for the predecessor of x in π. Hence (2a) holds; so x has a successor z for which (w, z) ∈ B. Let w and z mark the beginning of a new segment, and extend δ to (δ ∘ π); z, which matches γ; w. The induction step is shown in either case.
The inductive argument shows that successively longer prefixes of σ have successively longer matching finite paths, which are totally ordered by prefix order. Hence, if σ is infinite, the limit of these matching paths is an infinite path from t which matches σ using the partitioning into finite non-empty segments constructed in the proof.
It is also desirable to have completeness: that for every stuttering bisimulation, there is a rank function over a well-founded set which gives rise to a well-founded bisimulation.

Theorem 2 (Completeness). For any stuttering bisimulation B on a TS A, there is a well-founded structure (W, ≺) and corresponding function rank such that B is a well-founded bisimulation on A w.r.t. rank.

Let A = (S, →, L, I, AP). The well-founded set W is defined as the product W0 × W1 of two well-founded sets, with the new ordering being lexicographic order. The definitions of the well-founded sets W0 and W1, and associated functions rank0 and rank1, are given below. Informally, rank0.(a, b) measures the height of a finite-depth computation tree rooted at a, whose states are related to b but not to any successor of b. rank1.(a, b, c) measures the shortest finite path from c that matches b and ends in a state related to the successor a of b.

Definition of (W0, ≺0) and rank0. For a pair (s, t) of states of A, construct a tree, tree.(s, t), by the following (possibly non-effective) procedure, which is based on clause (2b) of the definition of well-founded bisimulation:
1. The tree is empty if the pair (s, t) is not in B. Otherwise,
2. s is the root of the tree. The following invariant holds of the construction: for any node y of the current tree, (y, t) ∈ B, and if y is not a leaf node, then for every child z of y in the tree, z is a successor of y in A, and there is no successor v of t in A such that (z, v) ∈ B.
3. For a leaf node y, and any successor z of y in A, if (z, t) ∈ B, but there is no successor v of t in A such that (z, v) ∈ B, then add z as a child of y in the tree. If no such successor exists for y, then terminate the branch at y. Repeat step 3 for every leaf node on an unterminated branch.
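For a finite system the construction is effective, and the height of tree.(s, t) can be computed by a simple recursion. The sketch below is our illustration (names and representation are assumptions); it presumes every tree.(s, t) is well-founded, which Lemma 3 below establishes for stuttering bisimulations.

```python
# rank0.(s, t) as the height of tree.(s, t). Height convention as in the
# text: the empty tree has height 0, and a non-empty tree has height
# sup { height of immediate subtree + 1 }.

def rank0(s, t, succ, B, memo=None):
    if memo is None:
        memo = {}
    if (s, t) in memo:
        return memo[(s, t)]
    if (s, t) not in B:
        memo[(s, t)] = 0                 # empty tree
        return 0
    # children of s in tree.(s, t): successors z of s with (z, t) in B
    # but no successor v of t with (z, v) in B
    children = [z for z in succ[s]
                if (z, t) in B and not any((z, v) in B for v in succ[t])]
    h = max((rank0(z, t, succ, B, memo) + 1 for z in children), default=0)
    memo[(s, t)] = h
    return h

# Example: s -> u stutters against t -> w; u is related to t but to no
# successor of t, so u is a child of s in tree.(s, t).
succ = {"s": ["u"], "u": [], "t": ["w"], "w": []}
B = {("s", "t"), ("u", "t")}
```

Here rank0.("s", "t") is 1 and rank0.("u", "t") is 0, exhibiting the strict decrease from a node to its child that Lemma 4 asserts.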
Lemma 3. tree.(s, t) is well-founded.

Proof. Suppose to the contrary that there is an infinite branch σ, which is therefore a fullpath, starting at s. Let u be the successor of s on σ, and let σ1 be the fullpath that is the suffix of σ starting at u. By construction of the tree, for every state x on σ1, (x, t) ∈ B, and for every successor v of t, (x, v) ∉ B. However, as (u, t) ∈ B, there must be a fullpath δ starting at t for which match.B.(σ1, δ) holds. Let w be the successor of t on δ. From the definition of match, for some x on σ1, (x, w) ∈ B. This is a contradiction. Hence, every branch of the tree must be of finite length.

Since tree.(s, t) is well-founded, it can be assigned an ordinal height using a standard bottom-up assignment technique for well-founded trees: assign the empty tree height 0, and any non-empty tree T the ordinal sup.{height.S + 1 | S ◁ T}, where S ranges over the immediate subtrees of T. Let W0 be the ordinals with the usual order ≺0, and define rank0.(s, t) to be the height of tree.(s, t).

Lemma 4. If u is a child of s in tree.(s, t), then rank0.(u, t) ≺0 rank0.(s, t).

Proof. From the construction, tree.(u, t) is the subtree of tree.(s, t) rooted at node u; hence its height is strictly smaller.

Definition of (W1, ≺1) and rank1. Let W1 = N, the set of natural numbers, and let ≺1 be the usual order < on N. The definition of rank1 is as follows: for a tuple (u, s, t) of states of A,
1. If (s, t) ∈ B, s → u, (u, t) ∉ B, and for every successor v of t, (u, v) ∉ B, then rank1.(u, s, t) is the length of the shortest initial segment that matches s among all matching fullpaths s; σ and δ, where σ starts at u, and δ starts at t. Formally⁴,
rank1.(u, s, t) = (min δ, ξ, σ, π : fp.(t, δ) ∧ fp.(u, σ) ∧ π, ξ ∈ INC ∧ corr.((s; σ, π), (δ, ξ)) : |seg.0.(δ, ξ)|).
As (s, t) ∈ B, and s → u, there exist matching fullpaths s; σ and δ, with σ starting at u and δ starting at t. As (u, t) ∉ B, and no successor of t matches u, under any partition ξ of any fullpath δ that matches a fullpath s; σ, the initial segment seg.0.(δ, ξ) matches s, and must contain at least two states: t and some successor of t. Thus, rank1.(u, s, t) is defined, and is at least 2.
⁴ The appendix has precise definitions of INC and corr.
2. Otherwise, rank1.(u, s, t) = 0.
Theorem 2 (Completeness). For any stuttering bisimulation B on a TS A, there is a well-founded set (W, ≺) and corresponding function rank such that B is a well-founded bisimulation on A w.r.t. rank.
Proof. Let W = W0 × W1. The ordering ≺ on W is the lexicographic ordering on W0 × W1, i.e., (a, b) ≺ (c, d) ≡ (a ≺0 c) ∨ (a = c ∧ b ≺1 d). Define rank.(u, s, t) = (rank0.(u, t), rank1.(u, s, t)). W is well-founded, and rank is a total function. We have to show that B is a well-founded bisimulation w.r.t. rank. Let (s, t) ∈ B.
1. L.s = L.t, from the definition of stuttering bisimulation.
2. Let u be any successor of s. If there is no successor v of t such that (u, v) ∈ B, consider the following cases:
- (u, t) ∈ B: As no successor of t is related to u by B, u is a child of s in tree.(s, t), and by Lemma 4, rank0.(u, t) ≺0 rank0.(s, t). Hence, rank.(u, u, t) ≺ rank.(s, s, t).
- (u, t) ∉ B: As no successor of t is related to u by B, rank1.(u, s, t) is non-zero. Let fullpath δ starting at t and partition ξ "witness" the value of rank1.(u, s, t). Let v be the successor of t in the initial segment seg.0.(δ, ξ). This successor exists, as the length of the segment is at least 2. rank1.(u, s, v) is at most rank1.(u, s, t) − 1, so rank1.(u, s, v) ≺1 rank1.(u, s, t). As no successor of t is related by B to u, (u, v) ∉ B, so rank0.(u, v) = 0. As (u, t) ∉ B, rank0.(u, t) = 0. Since rank is defined by lexicographic ordering, rank.(u, s, v) ≺ rank.(u, s, t).
Hence, one of (2a), (2b) or (2c) holds for (s, t) ∈ B w.r.t. s → u.
For a transition system that is finite-branching (every state has finitely many successor states), tree.(s, t) for any s, t is a finite, finitely-branching tree; so its height is a natural number. Hence, W0 = N.

Proposition 5. For a finite-branching transition system, W = N × N.

Theorem 6 (Main). Let A = (S, →, L, I, AP) be a transition system. A relation B on A is a stuttering bisimulation iff B is a well-founded bisimulation w.r.t. some rank function.

Proof. The claim follows immediately from Theorems 1 and 2.

For simplicity, the definitions are structured so that a bisimulation is a symmetric relation. The main theorem holds for bisimulations that are not symmetric, but the definition of rank has to be modified slightly, to take the direction of matching (by B or by B⁻¹) into account. Details will appear in the full paper.
4 Applications
The definition of a well-founded bisimulation is, by Theorem 6, in itself a simple proof rule for determining whether a relation is indeed a bisimulation up to stuttering. In this section, we look at several applications of this proof rule. We outline the proofs of well-founded bisimulation for the alternating bit protocol from [Mil 90], and a class of token ring protocols studied in [EN 95]. We also present a new quotient construction for a well-founded bisimulation that is an equivalence. In all of these applications, the construction of the appropriate well-founded set and ranking function is quite straightforward. We believe that this is the case in other applications of stuttering bisimulation as well.
4.1 The Alternating Bit Protocol
A version of the alternating bit protocol is given in [Mil 90], which we follow closely. The protocol has four entities: Sender and Replier processes, and message (Trans) and acknowledgement (Ack) channels. Messages and acknowledgements are tagged with bits 0 and 1 alternately. For simplicity, message contents are ignored; both channels are sequences of bits. For a channel c, let order.c represent the sequence resulting from removing duplicates from c, and let count.c be a vector of the numbers of duplicate bits. Vectors are compared component-wise if they have the same length. For example, order.(0^3; 1^2) = 0; 1, count.(0^3; 1^2) = (3, 2), and count.(1^5) = (5). The bisimulation B relates only those states where the order of each channel is of length at most two; hence count vectors have length at most two. Let (s, t) ∈ B iff in s and t, the local states of the sender and replier processes are identical, and the order of messages in both channels is the same. Note that the number of duplicate messages is abstracted away. Let α.s = (count.(Trans.s), count.(Ack.s)), and let rank.(u, s, t) be (α.s, α.t). The operations of the protocol are sending a bit or receiving a bit on either channel, and duplicating or deleting a bit on either channel. It is straightforward to verify that B is a well-founded bisimulation. The rank function is used, for instance, at a receive action in s from a channel with contents a^l; b, while the same channel in the corresponding state t has contents a^m; b^n (n > 1). The receive action at s results in a state u with channel content a^l, while the same action at t results in a state v with channel content a^m; b^(n-1). u and v are not related, but v is related to s, and rank.(u, s, v) < rank.(u, s, t) (cf. clause (2c)). The example exhibits unbounded stuttering. With the original formulations of stuttering bisimulation, one would have to construct a computation of length n from state t to match the receive action from state s.
This introduces n proof obligations, and complicates the proof. In contrast, with the new formulation, one need consider only a single transition from t.
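The channel abstractions are easy to state concretely. The sketch below is our Python rendering (the function names follow the text; representing a channel as a list of bits is an assumption):

```python
from itertools import groupby

def order(chan):
    # Sequence with duplicate runs collapsed: order of 0,0,0,1,1 is 0,1.
    return [b for b, _ in groupby(chan)]

def count(chan):
    # Vector of run lengths: count of 0,0,0,1,1 is (3, 2).
    return tuple(len(list(g)) for _, g in groupby(chan))
```

Consuming a duplicate bit leaves order unchanged while shrinking one component of count, e.g. count([0, 0, 1, 1]) = (2, 2) dominates count([0, 0, 1]) = (2, 1) component-wise; this is exactly how the rank decreases under clause (2c).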
4.2 Simple Token Ring Protocols
In [EN 95] (cf. [BCG 89]), stuttering bisimulation is used to show that for token rings of similar processes, a ring of a small cutoff size is equivalent to one of any larger size. [EN 95] shows that the computation trees of process 0 in rings of size 2 and of size n, n > 2, are stuttering bisimilar. It follows that a property over process 0 is true of all sizes of rings iff it is true of the ring of size 2. By symmetry arguments (cf. [ES 93, CFJ 93]), a property holds of all processes iff it holds for process 0. The proof given in that paper uses the [BCG 88] definition and is quite lengthy; we indicate here how to use well-founded bisimulation instead. Each process alternates between blocking receive and send token-transfer actions, with a finite number of local steps in between. For an n-process system with state space S_n, define α_n : S_n → N² as the function given by α_n.s = (i, j) where, in state s, if process m has the token, then i = (n − m) mod n is the distance of the token from process 0, and j is the sum over processes of the maximum number of steps of each process from its local state to the first token-transfer action. The tuples are ordered lexicographically. Let the rank function be rank.(u, s, t) = (α_m.s, α_n.t), where s and t are states in instances with m and n processes respectively. Let the relation B be defined by (s, t) ∈ B iff the local state of process 0 is identical in s and t. It is straightforward to verify that B is a well-founded bisimulation w.r.t. rank. The rank function is used in the situation where the token is received by process 0 by a move from state s to state u; however, the reception action is not enabled for process 0 in a state t related to s by B. In this case, some move of a process other than 0 is enabled at t, and results in a state v that reduces α_n, and hence the rank, either by a transfer of the token to the next process, or by reducing the number of steps to the first token-transfer action. The next state v is related to s by B (cf. clause (2c) of the definition).
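The token-ring rank function can be made concrete with a small sketch. The modelling of a state as a token position plus remaining local-step counts, and all names below, are ours, not the paper's; only the definition of α_n is taken from the text.

```python
# Sketch of the rank component alpha_n for the token-ring example.
# A global state is modelled as (token_pos, local_steps), where
# local_steps[m] counts the steps process m still needs before its
# next token-transfer action (an illustrative encoding, not the paper's).

def alpha(n, token_pos, local_steps):
    """Return (i, j): i is the distance of the token from process 0
    around the ring; j sums the remaining local steps."""
    i = (n - token_pos) % n
    j = sum(local_steps)
    return (i, j)

# Python tuples compare lexicographically, matching the ordering used
# in the paper: passing the token closer to process 0, or using up a
# local step, strictly decreases the rank.
s = alpha(4, token_pos=2, local_steps=[1, 0, 2, 1])  # token at process 2
t = alpha(4, token_pos=3, local_steps=[1, 0, 0, 1])  # token passed on
assert t < s  # (1, 2) < (2, 4): the rank decreased
```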
4.3 Quotient Structures
For a bisimulation B on a TS A that is an equivalence relation, a quotient structure A/B (read as A "mod" B) can be defined, where the states are equivalence classes (w.r.t. B) of states of A, and the new transition relation is derived from the transition relation of A. Quotient structures are usually much smaller than the original; a bisimulation with finitely many classes induces a finite quotient, as is the case in the examples given in the previous sections. Let A = (S, →, L, I, AP) be a TS, and B be a well-founded bisimulation on A w.r.t. a rank function α, that is an equivalence relation on S. The equivalence class of a state s is denoted by [s]. Define A/B as the TS (S̄, ⇒, L̄, Ī, AP) given by:
- S̄ = {[s] | s ∈ S}
- The transition relation is given by: for C, D ∈ S̄, C ⇒ D iff either
  1. C ≠ D, and (∃s, t : s ∈ C ∧ t ∈ D : s → t), or
  2. C = D, and (∀s : s ∈ C : (∃t : t ∈ C : s → t)).
The distinction between the two cases is made in order to prevent spurious self-loops in the quotient, arising from stuttering steps in the original.
- The labelling function is given by L̄.C = L.s, for some s in C (states in an equivalence class have the same label).
- The set of initial states, Ī, equals {[s] | s ∈ I}.
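The two-case transition rule above can be sketched directly. This is a finite sketch under our own encoding (successor map and class ids as dictionaries); it is not the paper's notation.

```python
def quotient(states, succ, class_of):
    """Build the quotient transition relation of Section 4.3 (a sketch;
    names are ours). succ[s] is the set of successors of s; class_of[s]
    is a hashable id of the equivalence class of s."""
    classes = {}
    for s in states:
        classes.setdefault(class_of[s], set()).add(s)
    edges = set()
    for C_id, C in classes.items():
        for D_id, D in classes.items():
            if C_id != D_id:
                # case 1: some s in C has a successor in D
                if any(t in D for s in C for t in succ[s]):
                    edges.add((C_id, D_id))
            else:
                # case 2: a self-loop only if EVERY s in C has a
                # successor inside C (prevents spurious stuttering loops)
                if all(any(t in C for t in succ[s]) for s in C):
                    edges.add((C_id, C_id))
    return edges

# Class {a, b} with a -> b -> c: no self-loop on the class,
# because b has no successor inside it.
succ = {"a": {"b"}, "b": {"c"}, "c": {"c"}}
cls = {"a": 0, "b": 0, "c": 1}
assert quotient(["a", "b", "c"], succ, cls) == {(0, 1), (1, 1)}
```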
Theorem 7. A is stuttering bisimilar to A/B.
Proof. Form the disjoint union of the TS's A and A/B. The bisimulation on this structure relates states of A and A/B as follows: (a, b) ∈ R iff a = [b] ∨ b = [a]. Let sw : S̄ → S (read "state witness") be a partial function, defined at C iff C ⇒ C does not hold. When defined, v = sw.C is such that v ∈ C, but no successor of v w.r.t. → is in C. Such a v exists by the definition of ⇒. Let ew : S̄² → S² (read "edge witness") be a partial function, defined at (D, C) iff C ⇒ D. When defined, (v, u) = ew.(D, C) is such that u ∈ C, v ∈ D, and u → v.
Let rank be a function with values in W ∪ {⊥} (⊥ is a new element unrelated to any elements of W), defined by: if u, s ∈ S, and sw.C is defined, then rank.(u, s, C) = α.(u, s, sw.C). If D, C ∈ S̄ and s ∈ S, then rank.(D, C, s) = α.(ew.(D, C), s), if ew.(D, C) is defined. Otherwise, rank.(a, b, c) = ⊥. Let (a, b) ∈ R. From the definition of R, a and b have the same label.
- a ∈ S: for clarity, we rename (a, b) to (s, C). By the definition of R, C = [s]. Let s → u. If [s] ⇒ [u], then there is a successor D = [u] of C such that (u, D) ∈ R, and clause (2a) holds. If the edge from [s] to [u] is absent, then [s] must equal [u], and sw.C is defined. Let x = sw.C. As (s, x) ∈ B, and (u, x) ∈ B, but x has no successors to match u, clause (2b) holds for B, i.e., α.(u, u, x) ≺ α.(s, s, x). By definition of rank, rank.(u, u, C) ≺ rank.(s, s, C), so (2b) holds for R.
- a ∈ S̄: for clarity, we rename (a, b) to (C, s). Let C ⇒ D. Let (y, x) = ew.(D, C). As x → y, and (x, s) ∈ B, there are three cases to consider:
  1. There is a successor u of s such that (y, u) ∈ B. Then [y] = [u], so (D, u) ∈ R, and (2a) holds.
  2. (y, s) ∈ B. Then [y] = [x], so C = D. As C ⇒ D, and s ∈ C, s has a successor u such that u ∈ C; hence (D, u) is in R and (2a) holds.
  3. (y, s) ∉ B and there exists u such that s → u, (x, u) ∈ B, and α.(y, x, u) ≺ α.(y, x, s). Hence, (C, u) ∈ R, and rank.(D, C, u) ≺ rank.(D, C, s). So clause (2c) holds.
Related Work and Conclusions

Other formulations of bisimulation under stuttering have been proposed; however, they too involve reasoning about finite, but unbounded, sequences of transitions. Examples include branching bisimulation [GW 89], divergence sensitive stuttering [dNV 90], and weak bisimulation [Mil 90]. We believe that it is possible to characterize branching bisimulation in a manner similar to our characterization of stuttering bisimulation, given the close connection between the two that is pointed out in [dNV 90]. An interesting question is whether a similar characterization can be shown for weak bisimulation [Mil 90]. Many proof rules for temporal properties are based on well-foundedness arguments, especially those for termination of programs under fairness constraints (cf. [GFMdR 83, Fr 86, AO 91]). Vardi [Va 87], and Klarlund and Kozen [KK 91], develop such proof rules for very general types of linear temporal properties. Our use of well-foundedness arguments for defining a bisimulation appears to be new, and, we believe, of intrinsic mathematical interest. The motivation in each of these instances is the same: to replace reasoning about unbounded or infinite paths with reasoning about single transitions. Earlier definitions of stuttering bisimulation are difficult to apply to large problems essentially because of the difficulty of reasoning about unbounded stuttering paths. Our new characterization, which replaces such reasoning with reasoning about single steps, makes proofs of equivalence under stuttering easier to demonstrate and understand. In the example applications, it was quite straightforward to determine an appropriate well-founded set and rank function. Indeed, rank functions are implicit in proofs that use the earlier formulations. As the examples demonstrate, using rank functions explicitly leads to proofs that are shorter, and which can be carried out with assistance from a theorem prover.

Acknowledgements. Thanks to Prof. E. 
Allen Emerson, Peter Manolios, Jun Sawada, Robert Sumners, and Richard Trefler for carefully reading an earlier draft of this paper. Peter Manolios helped to strengthen some of the theorems and simplify the proofs. The comments from the referees helped to improve the presentation.
References

[AO 91] Apt, K. R., Olderog, E.-R. Verification of Sequential and Concurrent Programs, Springer-Verlag, 1991.
[BCG 88] Browne, M. C., Clarke, E. M., Grumberg, O. Characterizing Finite Kripke Structures in Propositional Temporal Logic, Theor. Comp. Sci., vol. 59, pp. 115-131, 1988.
[BCG 89] Browne, M. C., Clarke, E. M., Grumberg, O. Reasoning about Networks with Many Identical Finite State Processes, Information and Computation, vol. 81, no. 1, pp. 13-31, April 1989.
[CFJ 93] Clarke, E. M., Filkorn, T., Jha, S. Exploiting Symmetry in Temporal Logic Model Checking, 5th CAV, Springer-Verlag LNCS 697.
[EH 82] Emerson, E. A., Halpern, J. Y. "Sometimes" and "Not Never" Revisited: On Branching versus Linear Time Temporal Logic, in POPL, 1982.
[EN 95] Emerson, E. A., Namjoshi, K. S. Reasoning about Rings, in POPL, 1995.
[ES 93] Emerson, E. A., Sistla, A. P. Symmetry and Model Checking, 5th CAV, Springer-Verlag LNCS 697.
[Fr 86] Francez, N. Fairness, Springer-Verlag, 1986.
[GW 89] van Glabbeek, R. J., Weijland, W. P. Branching time and abstraction in bisimulation semantics, in Information Processing 89, Elsevier Science Publishers, North-Holland, 1989.
[GFMdR 83] Grumberg, O., Francez, N., Makowsky, J., de Roever, W.-P. A proof rule for fair termination, in Information and Control, 1983.
[KK 91] Klarlund, N., Kozen, D. Rabin measures and their applications to fairness and automata theory, in LICS, 1991.
[La 80] Lamport, L. "Sometimes" is Sometimes "Not Never", in POPL, 1980.
[Mil 90] Milner, R. Communication and Concurrency, Prentice-Hall International Series in Computer Science. Edited by C. A. R. Hoare.
[dNV 90] de Nicola, R., Vaandrager, F. Three logics for branching bisimulation, in LICS, 1990. Full version in Journal of the ACM, 42(2):458-487, 1995.
[Va 87] Vardi, M. Verification of Concurrent Programs - The Automata Theoretic Framework, in LICS, 1987. Full version in Annals of Pure and Applied Logic, 51:79-98, 1991.
6 Appendix
Definition of match. Let INC be the set of strictly increasing sequences of natural numbers starting at 0. Precisely, INC = {π | π : N → N ∧ π.0 = 0 ∧ (∀i : i ∈ N : π.i < π.(i + 1))}. Let σ be a path, and π a member of INC. For i ∈ N, let intv.i.(σ, π) = [π.i, min.(π.(i + 1), length.σ)). The ith segment of σ w.r.t. π, seg.i.(σ, π), is defined by the sequence of states of σ with indices in intv.i.(σ, π). Let σ and δ, under partitions π and ξ respectively, correspond w.r.t. B iff they are subdivided into the same number of segments, and any pair of states in segments with the same index are related by B. Precisely, corr.B.((σ, π), (δ, ξ)) ≡ (∀i : i ∈ N : (intv.i.(σ, π) ≠ ∅ ≡ intv.i.(δ, ξ) ≠ ∅) ∧ (∀m, n : m ∈ intv.i.(σ, π) ∧ n ∈ intv.i.(δ, ξ) : (σ.m, δ.n) ∈ B)). Paths σ and δ match iff there exist partitions that make them correspond. Precisely, match.B.(σ, δ) ≡ (∃π, ξ : π, ξ ∈ INC : corr.B.((σ, π), (δ, ξ))).
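The segment-wise correspondence check can be sketched for finite paths. The use of finite strings and finite cut sequences is our simplification (the appendix works with the infinite sequences of INC); all function names are ours.

```python
def segments(path, pi):
    """Split a finite path into segments according to a strictly
    increasing cut sequence pi starting at 0 (a finite stand-in for
    the INC sequences of the appendix)."""
    cuts = list(pi) + [len(path)]
    return [path[cuts[i]:min(cuts[i + 1], len(path))]
            for i in range(len(pi)) if cuts[i] < len(path)]

def match(B, sigma, delta, pi, xi):
    """corr.B for one candidate pair of partitions: the same number of
    nonempty segments, and every pair of states in like-indexed
    segments related by B."""
    segs1, segs2 = segments(sigma, pi), segments(delta, xi)
    return (len(segs1) == len(segs2)
            and all((x, y) in B
                    for g1, g2 in zip(segs1, segs2)
                    for x in g1 for y in g2))

# Stuttering example: 'aab' matches 'ab' by cutting the repeated 'a'
# into a single segment; B relates equal letters.
B = {(x, y) for x in "ab" for y in "ab" if x == y}
assert match(B, "aab", "ab", [0, 2], [0, 1])
assert not match(B, "aab", "ab", [0, 1], [0, 1])
```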
General Refinement for High Level Petri Nets

Raymond Devillers¹, Hanna Klaudel² and Robert-C. Riemann³

¹ Université Libre de Bruxelles, Belgium, [email protected]
² Université Paris XII, IUT de Fontainebleau, France, [email protected]
³ Université Paris-Sud, France and Universität Hildesheim, Germany, [email protected]

Abstract. The algebra of M-nets, a high level class of labelled Petri nets, was introduced in the Petri Box Calculus in order to cope with the size problem of the low level nets, especially when they are applied as a semantical domain for parallel programming languages. A general, unrestricted refinement operator, intended to represent the procedure call mechanism for concurrent calls, is introduced into the M-net calculus. Its coherence with the low level refinements is exhibited, together with its main properties.
1 Introduction

While the algebra of Petri boxes [2, 1, 9, 7, 8, 10] has been introduced with the aim of modelling the semantics of concurrent programming languages (and has succeeded in doing so to some extent, e.g. [6]), in practical situations (and in particular when dealing with large value domains for program variables), this generally leads to huge (possibly infinite) nets, well defined mathematically but difficult to represent graphically and thus to grasp intuitively. In order to cope with this problem, higher level models have been introduced [17, 18, 11], and in particular a fruitful class of so-called M-nets [3, 4, 5] which nicely unfolds into low level boxes and thus allows large (possibly infinite) systems to be represented in a clear and compact way. The same operations should be defined on the M-net level as on the low level one, and in particular a refinement (meta-)operation. A first step in this direction has been presented in [11], where the definition of the refinement for M-nets assumed some restrictions, however, on the interface of the refined transitions and on the entry/exit interface of the refining nets; unfortunately, this leads to difficulties when one wants to take concurrent procedure calls into account. In [19, 13] a further attempt is made to use a more general refinement operator for M-nets, both papers aiming at defining an M-net semantics for a parallel programming language with procedures; a refinement is then necessary in order to distinguish between concurrent instances of the same procedure. The approach defined in those papers is not fully satisfactory, however, since it does not commute with the unfolding operation, and, furthermore, it hides several steps in the construction, while not being completely general. A next step has then been presented in [12]: a more general refinement mechanism is there defined for a slightly extended M-net model, but it still needs some restrictions to commute with the unfolding operation, and is thus not fully satisfactory. In particular,
these restrictions may lead to difficulties when applying successive refinements. The present paper aims at overcoming these difficulties and weaknesses.

2 The M-net Model
Let Val be a fixed but suitably large⁴ set of values and Var be a suitably large⁵ set of variables. The set of all well-formed predicates built from the sets Val, Var and a suitable set of operators is denoted by Pr. We assume the existence of a fixed set A of action symbols, also called actions for short. Each action symbol A ∈ A is assumed to have an arity ar(A), a natural number describing the number of its parameters. A construct A(v₁, ..., v_ar(A)), where A is an action symbol and ∀j ∈ {1, ..., ar(A)} : v_j ∈ Var ∪ Val, is a parameterised action. The set of all parameterised actions is denoted by PA. A parameterised action A(v₁, ..., v_ar(A)) is called elementary if ∀j ∈ {1, ..., ar(A)} : v_j ∈ Val. The set of all elementary parameterised actions will be denoted by EA. We also assume the existence of a fixed but suitably large⁵ set X of hierarchical actions. The latter will be the key to refinements, and thus to any hierarchical presentation of a system, since they represent a kind of 'hole' to be later replaced by some corresponding (M-)net. Finally, we shall also use a set of structured annotations, built from the value and variable sets, which will denote nonempty sets of values. Their exact syntax will be specified later; at that point let us just notice that they include the sets Val and Var, a value v representing in that case the singleton set {v} and a variable x representing the singleton set {v} when the value of x is v. The main difference between M-nets and predicate/transition or coloured nets [14, 16] is that M-nets carry additional information in their place and transition inscriptions to support composition operations.
In M-nets, besides the usual annotations on places (set of allowed tokens), arcs (multiset of structured annotations) and transitions (occurrence condition), we have an additional label on places denoting their status (entry, exit or internal) and an additional label on transitions, denoting the communication or hierarchical interface.

Definition 1 (M-nets). An M-net is a triple (S, T, ι), where S is a set of places, T is a set of transitions with S ∩ T = ∅, and ι is an inscription function with domain S ∪ (S × T) ∪ (T × S) ∪ T such that:
• For every place s ∈ S, ι(s) is a pair λ(s).α(s), where λ(s) ∈ {e, i, x}, called the label of s, and α(s), the type of s, is a nonempty set of values.
• For every transition t ∈ T, ι(t) is a triple var(t).λ(t).α(t), where var(t), the variables of t, is a finite set of variables from Var; λ(t), the label of t, is a finite multiset of parameterised actions (t will then be called a communication transition), or a hierarchical action symbol (t will then be called a hierarchical transition); and α(t), the guard of t, is a finite set of predicates from Pr; the variables occurring either in λ(t) or α(t) are assumed to belong to var(t).
• For every arc (s, t) ∈ (S × T), ι((s, t)) is a multiset of structured annotations (analogously for arcs (t, s) ∈ (T × S)); each structured annotation represents some nonempty set⁶ of values absorbed or produced by the transition on the place; ι((s, t)) will generally be abbreviated as ι(s, t); again, the variables occurring in ι(s, t) are assumed to belong to var(t). ◇

Each type α(s) delimits the set of tokens allowed on place s, and λ(s) describes the status (entry e, internal i or exit x) of s. The label of a transition t can either be a multiset of parameterised actions expressing synchronisation capabilities of t, or a hierarchical action symbol informing about a possible future refinement of t. For reasons of simplicity, in figures, we will omit brackets around arc inscriptions, and arcs with empty inscriptions. Figure 1 shows three M-nets, which will be used as a running example. We intend to refine N1 into transition t1 and N2 into t2.

⁴ In particular, this means that Val includes all the structured values which will be constructed through the refinement operation (see the definition of place types in section 5).
⁵ In order to be able to rename them whenever necessary to avoid name clashes.
Fig. 1. An M-net N with two hierarchical transitions t1 and t2, and two refining M-nets N1 and N2.
Given a transition t ∈ T, the part of N which consists of the transition t and all its incident arcs is called the area of t: area(t) = (S × {t}) ∪ {t} ∪ ({t} × S). Note that areas of different transitions are always disjoint, and that var(t) comprises all the variables occurring in the inscriptions of area(t). A binding for t is a function σ : var(t) → Val. If e is an entity depending on the variables of

⁶ Notice that it will never represent a multiset of values: the multiset aspect is handled by the fact that ι(s, t) is itself a multiset of structured annotations, and by the fact that two distinct structured annotations may have common values in their represented sets.
var(t), we shall denote by eσ the evaluation of this entity under the binding σ; in general, this will be obtained by replacing⁷ in e each variable a ∈ var(t) occurring in it (if any) by its value σ(a). For instance, λ(t)σ ∈ M_f(EA), and α(t)σ ∈ M_f({true, false}) (after the evaluation of the terms). The guard α(t) plays the rôle of an occurrence condition in the sense that t may occur under a binding σ only if α(t) is true for σ, i.e., if all⁸ terms in α(t)σ evaluate to true. The arc inscriptions specify the token flow. An empty arc inscription means that no tokens may ever flow along that arc, i.e., there exists no effective connection along it. A binding σ of t will be said to be enabling if α(t)σ ∈ M_f({true}), i.e., if it satisfies the guard, and if moreover ∀s ∈ S : ι(s, t)σ ∪ ι(t, s)σ ∈ M(α(s)), i.e., the flow of tokens respects place types. We shall assume that there is always at least one enabling binding for each transition (otherwise, it may be dropped). The hierarchical transition t1 in the M-net N of our running example in figure 1 has a single enabling binding σ1 = (a/1), while t2 is enabled for σ2 = (a/1, b/1) and σ3 = (a/1, b/2); the (silent) communication transition t3 is enabled by σ4 = (a/1). In N2 we have for τ2 the two bindings ρ1 = (c/5) and ρ2 = (c/6), for τ3 the bindings ρ3 = (c/5) and ρ4 = (c/6), and finally τ4 is enabled by ρ5 = (d/7). A marking of an M-net (S, T, ι) is a mapping M : S → M(Val) which associates to each place s ∈ S a multiset of values from α(s). In particular, we shall distinguish the entry marking, where M(s) = α(s) if λ(s) = e and the empty (multi-)set otherwise, and the exit marking, where M(s) = α(s) if λ(s) = x and the empty (multi-)set otherwise. For an M-net N = (S, T, ι) we will denote the set of entry (respectively, exit) places of N by °N (respectively, N°); S \ (°N ∪ N°) is the set of internal places of N.
The transition rule specifies the circumstances under which a marking M′ is reachable from a marking M. The effect of an occurrence of t is to remove all tokens used for the enabling binding σ of t from the input places and to add tokens according to σ to its output places.

Definition 2. A transition t is enabled for an enabling binding σ at a marking M1 if there is a marking M such that ∀s ∈ S : M1(s) = ι(s, t)σ + M(s). The occurrence of t at M1 under σ then leads to the marking M2 such that ∀s ∈ S : M2(s) = M(s) + ι(t, s)σ.

As a consequence, the semantics of an M-net is not modified if we rename locally (i.e., independently) the variables in each area. Without loss of generality, it will thus always be possible to assume that if t ≠ t′, then var(t) and var(t′) are disjoint.

⁷ The evaluation rule will be slightly more complex for structured annotations; this will be clarified in definition 6.
⁸ In other words, the set of predicates could be replaced by their mere conjunction; the reason why this is not done directly here is due to technical reasons; moreover, it could happen that the conjunction has not been included in the allowed operators.
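The enabling condition and the marking change of Definition 2 can be read operationally. A minimal sketch, assuming the arc inscriptions have already been evaluated under an enabling binding; multisets are represented as collections.Counter and all names are ours.

```python
from collections import Counter

def contains(big, small):
    # Multiset inclusion (avoids Counter.__ge__, which needs Python 3.10+).
    return all(big.get(k, 0) >= v for k, v in small.items())

def enabled(marking, pre):
    """t is enabled under a binding if the evaluated input-arc multiset
    pre[s] is contained in marking[s] for every place s."""
    return all(contains(marking[s], pre.get(s, Counter())) for s in marking)

def fire(marking, pre, post):
    """Remove the consumed tokens and add the produced ones,
    as in Definition 2: M2(s) = M1(s) - pre(s) + post(s)."""
    assert enabled(marking, pre)
    return {s: marking[s] - pre.get(s, Counter()) + post.get(s, Counter())
            for s in marking}

# A transition consuming value 1 from place 's1' and producing 5 on 's2',
# as the evaluated arc inscriptions of some enabling binding would.
M = {"s1": Counter({1: 1}), "s2": Counter()}
M2 = fire(M, pre={"s1": Counter({1: 1})}, post={"s2": Counter({5: 1})})
assert M2 == {"s1": Counter(), "s2": Counter({5: 1})}
```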
As usual, two (marked) M-nets N and N′ are called isomorphic if there are (marking-preserving, label-preserving and arc-inscription-preserving, up to local renamings) bijections between their places and transitions.

3 Unfolding of an M-net
The unfolding operation associates a labelled low level net (see e.g. [2]) U(N) with every M-net N, as well as a marking U(M) of U(N) with every marking M of N.

Definition 3. Let N = (S, T, ι) be an M-net; then U(N) = (U(S), U(T), W, λ) is defined as follows:
• U(S) = {s_v | s ∈ S and v ∈ α(s)}, and for each s_v ∈ U(S) : λ(s_v) = λ(s).
• U(T) = {t_σ | t ∈ T and σ is an enabling binding of t}, and for each t_σ ∈ U(T) : λ(t_σ) = λ(t)σ.
• W(s_v, t_σ) = Σ_{a ∈ ι(s,t)} ι(s, t)(a) · χ_{aσ}(v), and analogously for W(t_σ, s_v). ◇

Let M be any marking of N. U(M) is defined as follows: for every place s_v ∈ U(S), U(M)(s_v) = M(s)(v). Thus, each elementary place s_v ∈ U(S) contains as many values v as the number of times this value occurs in the marking M(s). The unfoldings for N and N2 of the running example are given in figure 2.
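Definition 3 can be sketched on a finite M-net fragment. The dictionary encoding of place types, enabling bindings and evaluated arc inscriptions is ours; only the shape of U(S), U(T) and W follows the definition.

```python
def unfold(place_types, bindings, arc):
    """Finite sketch of Definition 3 (names ours): an elementary place
    s_v for each value v in the type of s; a low level transition
    t_sigma for each enabling binding; the arc weight counts how often
    v occurs in the evaluated inscription arc(s, t, sigma)."""
    U_S = [(s, v) for s, vals in sorted(place_types.items()) for v in vals]
    U_T = [(t, i) for t, bs in sorted(bindings.items()) for i in range(len(bs))]
    W = {}
    for (s, v) in U_S:
        for (t, i) in U_T:
            w = arc(s, t, bindings[t][i]).count(v)
            if w:
                W[(s, v), (t, i)] = w
    return U_S, U_T, W

# Place s of type {1, 2}; transition t with the two bindings x=1 and
# x=2; the arc inscription {x} evaluates to the single value of x.
types = {"s": {1, 2}}
binds = {"t": [{"x": 1}, {"x": 2}]}
U_S, U_T, W = unfold(types, binds, lambda s, t, b: [b["x"]])
assert sorted(U_S) == [("s", 1), ("s", 2)] and len(U_T) == 2
assert W == {(("s", 1), ("t", 0)): 1, (("s", 2), ("t", 1)): 1}
```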
Fig. 2. Unfoldings U(N) and U(N2).

4 Low Level Refinement
The refinement N[X_i ← N_i | i ∈ I] means 'N where all X_i-labelled transitions are refined into (i.e., replaced by a copy of) N_i, for each i in the indexing set I'. In order to ease the understanding of the next sections, and to exhibit the differences as well as the similarities between the low and high level approaches, we shall first shortly recall how this operation is introduced in the low level theory. Its definition is technically somewhat complex, due to the great generality that is allowed. Indeed, refinements are easier to define when the refining nets N_i have a single entry and
a single exit place [11], or when there is a single transition to refine without side loop [15]. However, here we want to allow any kind of configuration: any number of refined transitions (possibly infinitely many), any connectivity network, any arc weighting, any number of entry/exit places (possibly continuously infinitely many, due to the cardinality explosion phenomenon [2, 1]). The definition uses a labelled tree device which nicely generalises the kind of multiplicative Cartesian cross product (pre/post places of the transitions to be refined with entry/exit places of the refining net) commonly used in the literature as the interface places [15]. This setting has not been chosen just for the purpose of treating the general case, but also to obtain easily the main properties of the refinement operator. In this respect, it has been successfully reused in [9, 7, 8, 11, 13].
Definition 4. If X is a hierarchical action name and X_I = {X_i | i ∈ I} is a family of (distinct) such names, let us define T^X = {t ∈ T | λ(t) = X} and T^{X_I} = ∪_{X ∈ X_I} T^X. Let Σ = (S, T, W, λ) and Σ_i = (S_i, T_i, W_i, λ_i) be labelled nets (for each i ∈ I). Σ[X_i ← Σ_i | i ∈ I] is defined as the labelled net Σ̄ = (S̄, T̄, W̄, λ̄) with:

T̄ = (T \ T^{X_I}) ∪ ∪_{i ∈ I} T^i, where T^i = {t.t_i | t ∈ T^{X_i} and t_i ∈ T_i};

S̄ = ∪_{i ∈ I} S^i ∪ ∪_{s ∈ S} S^s, where S^i = {t.s_i | t ∈ T^{X_i} and s_i is an internal place of Σ_i}, and S^s is the set of all the labelled trees of the following form: the root is labelled by s, and the arcs are labelled by a transition and a direction; for each i ∈ I and for each (if any) t ∈ s• with a label of the form X_i, there is an arc labelled t going (down) to (a node labelled by) some (arbitrarily chosen) entry place e_t of Σ_i, and for each (if any) t′ ∈ •s with a label of the form X_i, there is an arc labelled t′ coming (up) from (a node labelled by) some (arbitrarily chosen) exit place z_{t′} of Σ_i.

W̄(t̄, s̄) = W(t, s) if t̄ = t ∈ (T \ T^{X_I}) and s̄ ∈ S^s (with root s);
W̄(t̄, s̄) = W(t, s) · W_i(t_i, z_i) if t̄ = t.t_i ∈ T^i and an arc t.z_i occurs in s̄ ∈ S^s;
W̄(t̄, s̄) = W_i(t_i, s_i) if t̄ = t.t_i ∈ T^i and s̄ = t.s_i ∈ S^i;
W̄(t̄, s̄) = 0 otherwise;
and W̄(s̄, t̄) is defined symmetrically.

λ̄(t̄) = λ_i(t_i) if t̄ = t.t_i, λ(t) = X_i and t_i ∈ T_i; λ̄(t̄) = λ(t) if t̄ = t ∈ (T \ T^{X_I}).
λ̄(s̄) = λ(s) if s̄ ∈ S^s (with root s); λ̄(s̄) = i otherwise.

A tree in S^s may be represented by a set of sequences {s, t.e_t, ..., t′.z_{t′}, ...}, describing the root and all the children labels together with the corresponding arc labels. ◇

The definition is illustrated by figure 3, and it may be checked that the behaviour of the refined net indeed corresponds to what should be expected.
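The interface trees S^s, in the set-of-sequences representation mentioned at the end of the definition, amount to one tree per choice of an entry place for each refined transition after s and an exit place for each refined transition before s. A sketch under our own encoding (all names are illustrative):

```python
from itertools import product

def interface_places(s, post_refined, pre_refined, entries, exits):
    """Sketch of the trees in S^s: for each refined transition t in
    s-post, pick an entry place of the refining net; for each refined
    transition t' in pre-s, pick an exit place. Each combination of
    choices yields one interface place, represented as a frozenset of
    (transition, direction, place) sequences plus the root s."""
    down = [[(t, "down", e) for e in entries[t]] for t in post_refined]
    up = [[(t, "up", z) for z in exits[t]] for t in pre_refined]
    return [frozenset([s] + list(choice))
            for choice in product(*(down + up))]

# s has one refined output transition t with entry places {e1, e2},
# and one refined input transition t' with a single exit place x1:
# two interface trees, one per choice of entry place.
trees = interface_places("s", ["t"], ["t'"],
                         entries={"t": ["e1", "e2"]},
                         exits={"t'": ["x1"]})
assert len(trees) == 2
```

This multiplicative structure is exactly the cardinality-explosion source mentioned in the text: the number of interface places is the product, over the refined transitions around s, of the numbers of entry/exit places of the refining nets.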
Fig. 3. An example of the low level refinement.
5 General Refinement

Here we want to extend the low level definition to the M-net framework, without restricting the kind of inscriptions allowed by this model. In order to grasp the difficulty of the problem, let us consider the example exhibited at the end of [11] and shown in figure 4: if X in the M-net fragment given by the first net N were replaced 'naively' by the M-net N′, the refinement would look like the third fragment; but while in the first fragment the two a variables were the same (they occur in the surrounding of the same transition), this is no longer the case in the third one, since variables only have a local meaning and may be changed independently around each transition; hence they may be fixed independently by enabling bindings. It is therefore necessary to transport the identity of the variables, or at least their bindings, from the entry of the refined copy to the exit.
Fig. 4. Illustration of the difficulty of the general case.
The intuition behind the general refinement operator is thus the following: a hierarchical transition t (labelled by X) of an M-net N has a set of enabling bindings⁹, i.e., possibly more than a single one. Each enabling binding σ for t can be understood as a 'mode' under which the refining M-net N′ is executed. Once (if ever) the refining M-net has reached its exit marking, the execution of N is supposed to be continued in the state (marking) corresponding to the result of the occurrence of t under the considered mode σ. Therefore two problems have to be solved: first, the decision for the mode under which the refining M-net should be executed has to be taken; second, the chosen binding has to be 'memorised' in order to produce the corresponding marking after N_i has reached its final state. To guarantee the commutativity of the refinement operation with the unfolding, a labelled tree device similar to the one used for low level nets above will be used here. The main difference with the low level case, and with the previous attempts to define refinements at the M-net level [12], will be that, in our context, the place types (and consequently the evaluations of the structured annotations in arc inscriptions) will be sets of labelled trees, but the interface places themselves will remain unstructured. Like above, if X is a hierarchical action symbol and X_I = {X_i | i ∈ I} is a set of such actions, T^X = {t ∈ T | λ(t) = X} is the set of all X-labelled hierarchical transitions, and T^{X_I} = ∪_{X ∈ X_I} T^X is the set of all hierarchical transitions with a label from X_I. Like for the definition of refinements for low level nets, the places of the net N[X_i ← N_i | i ∈ I] will be of two kinds: the interface places and the copied internal places:
• Each place s ∈ S of the M-net N will also be a place of the refined M-net, with the same label as before. The only difference is that its type will be a set of labelled trees constructed from the old value set and the entry/exit interface of the refining nets.

⁹ In order to abbreviate the annotations in the figures, we shall often omit the guard α(t) when it is empty, and the variable set var(t) when it may easily be reconstructed from the annotations in the area of t.
The new type ᾱ(s) of s is the set of all the (isomorphism classes of) labelled trees of the following form: the root is labelled by a value v ∈ α(s); the arcs are labelled by bindings σ_t of transitions t (in T^{X_I} ∩ (s• ∪ •s)) and a direction (up or down). More precisely, for each i ∈ I, for each (if any) t ∈ s• with a label of the form X_i and for each enabling binding σ_t of t such that v ∈ ι(s, t)σ_t, there is an arc labelled σ_t going down to a node labelled by some arbitrarily chosen pair (e, w) where e ∈ °N_i and w ∈ α_i(e); and for each i ∈ I, each (if any) t′ ∈ •s with a label of the form X_i and for each enabling binding σ_{t′} of t′ such that v ∈ ι(t′, s)σ_{t′}, there is an arc labelled σ_{t′} going up from a node labelled by some arbitrarily chosen pair (z, w′) where z ∈ N_i° and w′ ∈ α_i(z).
• Copied internal places of N_i form the set S^i of all the pairs t.s_i where t is a transition of N labelled by X_i (λ(t) = X_i) and s_i is an internal place of the refining net N_i. The label of such a place will always be internal. The type of t.s_i will be the set ᾱ(t.s_i) of all the pairs σ_t.v, where σ_t is an enabling binding of t and v ∈ α(s_i) is any value allowed on s_i.
We give the types for the interface places s1, s2, s3, s4 and the internal place t2.p
in N[X_i ← N_i | i ∈ {1, 2}] to illustrate the definitions for the values allowed on interface places and internal places.
[Display of the place types, only partially recoverable here: ᾱ(s1), ᾱ(s3) and ᾱ(s4) are sets of labelled trees whose roots come from the original place types and whose arcs, labelled by the bindings σ1, σ2, σ3, lead to entry/exit pairs such as (e1, 3), (e1, 4), (e2, 5), (e2, 6), (e3, 7); moreover ᾱ(s2) = {1} and ᾱ(t2.p) = {σ2.5, σ2.6, σ3.5, σ3.6}.]
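The type of a copied internal place has the simplest shape of the two kinds: every enabling binding of the refined transition paired with every value of the original place type. A direct transcription as a sketch (the string encoding of bindings is ours):

```python
from itertools import product

def copied_place_type(enabling_bindings, value_type):
    """Type of a copied internal place t.s_i: all pairs sigma.v with
    sigma an enabling binding of t and v a value allowed on s_i,
    modelled here as tuples (sigma, v)."""
    return set(product(enabling_bindings, value_type))

# t2 has enabling bindings sigma2 and sigma3, and the internal place p
# of N2 has type {5, 6}, giving the four pairs of the ᾱ(t2.p) example.
assert copied_place_type(("sigma2", "sigma3"), {5, 6}) == {
    ("sigma2", 5), ("sigma2", 6), ("sigma3", 5), ("sigma3", 6)}
```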
Like for the definition of refinements for low level nets, the transitions of N̄ = N[X_i ← N_i | i ∈ I] will also be of two kinds: the untouched transitions t ∈ T \ T^{X_I}, with the same inscription as before (ι(t) is the same in N and N̄), and the copied transitions t.t_i, where λ(t) = X_i ∈ X_I and t_i ∈ T_i. The set of those copied transitions is denoted by T^i. As for the inscription of the latter, we shall assume (without loss of generality) that var(t) ∩ var_i(t_i) = ∅; then var(t.t_i) = var(t) ∪ var_i(t_i), λ(t.t_i) = λ_i(t_i) and α(t.t_i) = α(t) ∪ α_i(t_i). In order to understand the rationale of the structured annotations occurring in the inscription of arcs of the refined M-net, let us consider the following example depicted in figure 5.
Fig. 5. Illustration of the structured annotations.
Transition t̄ is untouched, but place s now has a type composed of labelled trees, and the occurrence of t̄ must produce one instance of each tree with a root labelled by a value produced through a in N; this will be represented by the notation (a, s).ω, where the 'hole' symbol ω means that there is no constraint on the son labels. Transition t.t″, on the other hand, must absorb from s one instance of each tree with a root labelled by a value absorbed through b in N and a son corresponding to the selected mode (determined by the values of the variables from t) labelled by a value absorbed through c in e1 by t″: this will be represented by the notation (b, s).(c, e1); it will also absorb trees of the same shape, but with the son labelled by a value absorbed through c in e2 by t″, which will be represented, similarly, by the notation (b, s).(c, e2). Notice that the fact that t″ absorbs one token from each of the two entry places in N′ is replaced, in the refined net, by the fact that t.t″ absorbs from s the tokens of two structured annotations, as if the whole entry interface °N′ were replaced by a single entry place; this is due to the fact that the new place s gathers all the tree values produced by the new t̄. Finally, transition t.t″ must also produce in place t.s′ one instance of each value σ.v, where v is any value produced through d in N′ and σ is any enabling binding for t (i.e., a mode determined by the values of the variables from t): this will be represented by the notation ω.(d, s′), where the 'hole' symbol ω means that there is no special constraint on the first part of the value.

Definition 5 (General Refinement). Let N = (S, T, ι) and N_i = (S_i, T_i, ι_i), where i ∈ I, be M-nets. N[X_i ← N_i | i ∈ I] is defined as the M-net N̄ = (S̄, T̄, ῑ), where S̄ = ∪_{i ∈ I} S^i ∪ S, T̄ = (T \ T^{X_I}) ∪ ∪_{i ∈ I} T^i, and:

ῑ(s̄) = λ(s).ᾱ(s) if s̄ = s ∈ S;
ῑ(s̄) = i.ᾱ(s̄) if s̄ = t.s_i ∈ S^i.

ῑ(t̄) = ι(t) if t̄ = t ∈ (T \ T^{X_I});
ῑ(t̄) = var(t) ∪ var_i(t_i).λ_i(t_i).α(t) ∪ α_i(t_i) if t̄ = t.t_i, λ(t) = X_i, t_i ∈ T_i.

ῑ(s̄, t̄) = Σ_{a ∈ ι(s,t)} ι(s, t)(a) · {(a, s).ω} if t̄ = t ∈ (T \ T^{X_I}) and s̄ = s ∈ S;
ῑ(s̄, t̄) = Σ_{e_i ∈ °N_i} Σ_{a ∈ ι(s,t)} Σ_{b ∈ ι_i(e_i,t_i)} ι(s, t)(a) · ι_i(e_i, t_i)(b) · {(a, s).(b, e_i)} if t̄ = t.t_i and s̄ = s ∈ S;
ῑ(s̄, t̄) = Σ_{b ∈ ι_i(s_i,t_i)} ι_i(s_i, t_i)(b) · {ω.(b, s_i)} if t̄ = t.t_i and s̄ = t.s_i ∈ S^i;
ῑ(s̄, t̄) = 0 otherwise;

and analogously for arcs ῑ(t̄, s̄). ◇
We apply the general refinement to our running example of figure 1; the resulting M-net is depicted in figure 6. The previously given place types are omitted.

Fig. 6. General refinement applied to N, N1, and N2: N[Xi ← Ni | i ∈ {1, 2}].
We still have to specify the evaluation of the structured annotations under a binding; this will lead to a set (possibly infinite) of values (labelled trees, possibly
reduced to their root). In order to do that, let us first notice that, since for a hierarchical transition t of N and a transition ti of the refining M-net Ni we have var(t.ti) = var(t) ⊎ vari(ti), each binding μ of t.ti is the union of a binding σ for t and a binding ρ for ti, while the bindings of an untouched transition t' are the same as in N.

Definition 6. The evaluation of a structured annotation in the inscription ι̃(s̃, t̃) or ι̃(t̃, s̃) of an arc in the refined net for a binding μ = σ ∪ ρ or μ = σ (as specified above) is defined by:
- (a, s).ω[μ] = {τ ∈ ι̃(s) | the root of τ belongs to a[σ]}, if t̃ = t ∈ (T \ T^Xi) and s ∈ S,
- (a, s).(b, s')[μ] = {τ ∈ ι̃(s) | the root of τ belongs to a[σ] and the son corresponding to the arc (down or up, depending on the inscripted arc) labelled s' has a label (s', w) with w ∈ b[ρ]}, if t̃ = t.ti with t ∈ T^Xi and s ∈ S,
- ω.(b, s')[μ] = {σ.v | v ∈ b[ρ]}, if t̃ = t.ti with t ∈ T^Xi and s̃ = t.si ∈ S̃i.
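As a concrete, entirely unofficial illustration of the first two evaluation rules of Definition 6, one can model tree tokens over a finite sample of a place type. The encoding of trees as pairs (root value, tuple of (son name, value)), the function names and the finite type set below are our own modelling assumptions, not the paper's notation:

```python
# A token of a refined place s is a labelled tree, modelled here as
# (root_value, ((son_name, son_value), ...)); type_of_s is a finite
# sample of the (possibly infinite) type of s.

def eval_root_omega(a_values, type_of_s):
    # (a, s).omega : trees whose root carries a value in a[sigma];
    # omega places no constraint on the sons
    return {t for t in type_of_s if t[0] in a_values}

def eval_root_son(a_values, son, b_values, type_of_s):
    # (a, s).(b, e) : root in a[sigma] and the son labelled e carries
    # a value in b[rho]
    return {t for t in type_of_s
            if t[0] in a_values and dict(t[1]).get(son) in b_values}
```

With two sample trees, selecting a root value picks out the trees it labels, and additionally constraining a son narrows the selection further, mirroring how a copied transition absorbs only the trees matching its chosen mode.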
Then, it is not hard to see that the enabling bindings of t' in Ñ are the enabling bindings of t' in N, and the enabling bindings of t.ti are the unions of an enabling binding for t and an enabling binding for ti. The evaluation defined for structured annotations will be illustrated for the occurrences of transitions t1.γ1 and t2.γ2 in the refined M-net from figure 6. Consider the place s1 with its initial marking, i.e., s1 contains a token for each value of its type; transition t1.γ1 can fire for every binding μ composed of an enabling binding σ for t1 and an enabling binding ρ for γ1. There is a unique binding σ1 = (a/1) for t1 and there are two enabling bindings for γ1, namely ρ'1 = (b1/4, b2/4) and ρ'2 = (b1/4, b2/3). Take for instance μ = σ1 ∪ ρ'1 (but notice that the other combination might also be considered). The evaluation {(a, s1).(b1, e1), (a, s1).(b2, e1)}μ gives us the set ι̃(s1), hence the occurrence of t1.γ1 under this marking is possible. The evaluation {(a, s3).(b1, x1)}μ yields the entire type ι̃(s3), since the root of each tree value in ι̃(s3) belongs to a[σ1] and the son corresponding to the arc x1 belongs to b1[ρ'1]. Hence, the occurrence of t1.γ1 puts each tree value from ι̃(s3) on s3. Now we try to fire transition t2.γ2. It is enabled by bindings composed of σ2 or σ3 for t2, and ρ1 = (c/5) or ρ2 = (c/6) for γ2. One might expect that t2.γ2 can be fired twice under the given marking, since γ2 is enabled twice under the initial marking of N2 (by ρ1 and ρ2). Let us take μ = σ2 ∪ ρ1. The evaluation of the incoming arc yields three tree values (their drawings are not reproduced here), which are taken from s3 when firing t2.γ2, while {ω.(c, p)}ρ1 generates {σ2.5} on t2.p. Notice that the present marking (after firing t2.γ2 under binding σ2 ∪ ρ1) no longer allows the execution of t2.γ2 under binding σ3 ∪ ρ2 (nor of t2.γ4 under mode σ3), since the evaluation of the arcs adjacent to (and hence the enabling of) t2.γ2 (respectively, t2.γ4) is defined with respect to the type of the adjacent place, i.e., with respect to ι̃(s3), and not only with respect to the present marking of
the place. The execution of t2.γ2 (respectively, t2.γ4) under mode σ3 would require tokens (labelled trees) which have already been removed from s3 by the occurrence of a transition from N2 under mode σ2. This is the means to transport the chosen mode for the hierarchical transition through the refining M-net, and to ensure that once one transition of the refining net has chosen a mode, the decision is valid for the entire net, even if there are transitions which are concurrent (as in our example t2.γ2 and t2.γ4) and hence independent. We might now fire t2.γ2 under binding μ = σ2 ∪ ρ2. The evaluation of the incoming arc yields three tokens (labelled trees, not reproduced here), which are still in s3. The execution of t2.γ2 under this binding then yields σ2.6 on t2.p.
6 Some Properties of the General Refinement
The basic property of the general refinement is its commutativity with the unfolding operation, stated in the following theorem and illustrated in figure 7.

Theorem 7 (Commutativity). Let N = (S, T, ι) and Ni = (Si, Ti, ιi), where i ∈ I, be M-nets; then, up to isomorphism,

U(N[Xi ← Ni | i ∈ I]) = U(N)[Xi ← U(Ni) | i ∈ I].

Fig. 7. Illustration of the commutativity.
Proof. In Nl as in Nr there are two kinds of transitions and two kinds of places: those coming from the net N to be refined and those coming from the refining nets Ni; we shall exhibit a one-to-one correspondence between the members of each category. Let t'' be a transition of N which is not in T^Xi, and σ'' one of its enabling bindings; let t be a transition of N with label Xi, and σ one of its enabling bindings; let t' be a transition of Ni, and σ' one of its enabling bindings; let s be a place of N, v one of its values, σj an enabling binding of a transition tj to be refined absorbing value v from s, and σk an enabling binding of a transition tk to be refined producing value v in s; let ej be some entry place of the net refining tj and vj one of its values; let xk be some exit place of the net
refining tk and vk one of its values; finally, let s' be an internal place of Ni and v' one of its values. The one-to-one correspondence between the elements constituting both sides of the equation may be schematised in a table matching, in Nl = U(N[Xi ← Ni]) and in Nr = U(N)[Xi ← U(Ni)], the untouched transitions t''σ'' with each other, the copied transitions (t.t')σ∪σ' with tσ.t'σ', the places built from s and its values v, those built from the entry places (ej, vj), those built from the exit places (xk, vk), and the internal places (t.s')σ.v'.
The mapping between the arcs follows immediately from the fact that the arc weights are directly driven by the (corresponding) name structures, so that an arc in U(N[Xi ← Ni]) and the corresponding arc in U(N)[Xi ← U(Ni)] (between the related place and transition) have the same weight.

We also have a general property about successive refinements, similar to the one already obtained in the low-level domain. Since the variable sets of two successive refinements are not necessarily disjoint, we shall separate the second set into a common part and a disjoint part. We then have the following general expansion law for refinements, which allows one to reduce any succession of simultaneous refinements to a single refinement (whose refining components may themselves be refinements).

Theorem 8 (Expansion law). Let N, Ni, N'j, and N''k, where i ∈ I, j ∈ J, and k ∈ K, be M-nets. If J ⊆ I, I ∩ K = ∅ and {Yk | k ∈ K} ∩ {Xi | i ∈ I} = ∅, then up to isomorphism

N[Xi ← Ni | i ∈ I][Xj ← N'j, Yk ← N''k | j ∈ J, k ∈ K]
= N[Xi ← Ni[Xj ← N'j, Yk ← N''k | j ∈ J, k ∈ K], Yk ← N''k | i ∈ I, k ∈ K].
Proof. Let us denote by Nl the left-hand side and by Nr the right-hand side of the equation. The two M-nets Nl and Nr are the same modulo parenthesizing, and dropping some redundancy. The bijection between the two nets can be obtained through a transformation of net Nl into net Nr (or vice versa) by rewriting the identity of transitions and places, the types of the places and the inscriptions of arcs according to the parenthesizing. Consider the example depicted in figure 8, exhibiting the various types of configurations. The correspondence table may be constructed from the example as in the previous proof.
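To see the shape of the expansion law concretely, here is a toy symbolic check of our own: nets are reduced to dictionaries mapping transition names to labels, ignoring places, arcs and inscriptions entirely, and we only assume that dotted names concatenate as in the paper. This is an abstraction for illustration, not the calculus itself.

```python
# A "net" is {transition_name: label}; hierarchical labels (here "X", "Y")
# are refined by replacing the transition with dotted copies of the
# refining net's transitions.

def refine(net, subs):
    out = {}
    for t, lab in net.items():
        if lab in subs:
            # copied transitions t.u inherit the label of u
            for u, lab_u in subs[lab].items():
                out[f"{t}.{u}"] = lab_u
        else:
            out[t] = lab  # untouched transition
    return out

N  = {"t1": "X", "t2": "Y", "t3": "a"}
N1 = {"u1": "Y", "u2": "b"}
N2 = {"v1": "c"}

# N[X <- N1][Y <- N2]  versus  N[X <- N1[Y <- N2], Y <- N2]
lhs = refine(refine(N, {"X": N1}), {"Y": N2})
rhs = refine(N, {"X": refine(N1, {"Y": N2}), "Y": N2})
assert lhs == rhs
```

The check passes because dotted-name concatenation is associative, which is exactly what makes the two parenthesizings coincide modulo renaming; note that the example respects the theorem's hypothesis that the second-stage variable Y is distinct from X.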
Fig. 8. Illustration of the expansion law.

7 Conclusion
We have provided the M-net domain with the same algebraic structure as the low-level Petri Box one, by introducing a general simultaneous refinement operator; the coherence of the corresponding structure has been exhibited through the unfolding operation, and the properties are inherited from the low-level ones. We have thus established a fully general and coherent framework in which the ideas of [19, 13] can be satisfactorily developed. To our knowledge, no other high-level framework possesses an equally general refinement satisfying the desired algebraic properties. The basic ideas of this paper are most likely applicable to other high-level Petri net models, although the formalisation is given here only for the M-net Calculus.

Acknowledgments

This work was performed while the first author visited the UPVM's Equipe d'Informatique Fondamentale in spring 1997: our thanks thus go to E. Pelz and the Université Paris Val de Marne for the invitation. We thank as well the anonymous referees for their careful reading and helpful comments.
References

1. E. Best, R. Devillers, and J. Esparza. General Refinement and Recursion for the Box Calculus. STACS'93. Springer, LNCS Vol. 665, 130-140 (1993).
2. E. Best, R. Devillers, and J.G. Hall. The Box Calculus: a New Causal Algebra with Multilabel Communication. Advances in Petri Nets 92. Springer, LNCS Vol. 609, 21-69 (1992).
3. E. Best, H. Fleischhack, W. Fraczak, R.P. Hopkins, H. Klaudel, and E. Pelz. A Class of Composable High Level Petri Nets. Application and Theory of Petri Nets 1995. Springer, LNCS Vol. 935, 103-120 (1995).
4. E. Best, H. Fleischhack, W. Fraczak, R.P. Hopkins, H. Klaudel, and E. Pelz. An M-Net Semantics of B(PN)^2. Structures in Concurrency Theory: STRICT'95 Proceedings. Springer, 85-100 (1995).
5. E. Best, W. Fraczak, R.P. Hopkins, H. Klaudel, and E. Pelz. M-nets: an Algebra of High Level Petri Nets, with an Application to the Semantics of Concurrent Programming Languages. To appear in Acta Informatica.
6. E. Best, R.P. Hopkins. B(PN)^2 - a Basic Petri Net Programming Notation. Proceedings of PARLE'93. Springer, LNCS Vol. 694, 379-390 (1993).
7. E. Best and M. Koutny. A Refined View of the Box Algebra. Application and Theory of Petri Nets 1995. Springer, LNCS Vol. 935, 1-20 (1995).
8. E. Best and M. Koutny. Solving Recursive Net Equations. Automata, Languages and Programming 1995. Springer, LNCS Vol. 944, 605-623 (1995).
9. R. Devillers. The Synchronisation Operator Revisited for the Petri Box Calculus. Technical Report LIT-290, Université Libre de Bruxelles (1994).
10. R. Devillers. S-Invariant Analysis of General Recursive Petri Boxes. Acta Informatica, Vol. 32, 313-345 (1995).
11. R. Devillers and H. Klaudel. Refinement and Recursion in a High Level Petri Box Calculus. Structures in Concurrency Theory: STRICT'95 Proceedings. Springer, 144-159 (1995).
12. R. Devillers, H. Klaudel and R.-C. Riemann. General Refinement in the M-net Calculus. Technical Report LIT-357, Université Libre de Bruxelles (1997).
13. H. Fleischhack and B. Grahlmann. A Petri Net Semantics for B(PN)^2 with Procedures. Parallel and Distributed Software Engineering, Boston Ma., 1997.
14. H. Genrich. Predicate-Transition Nets. In Petri Nets: Central Models and their Properties, Advances in Petri Nets 1986 Part I. Springer, LNCS Vol. 254, 207-247 (1987).
15. R.J. van Glabbeek and U. Goltz. Refinement of Actions in Causality Based Models. Stepwise Refinement of Distributed Systems. Springer, LNCS Vol. 430, 267-300 (1989).
16. K. Jensen. Coloured Petri Nets. Basic Concepts, Analysis Methods and Practical Use. EATCS Monographs on Theoretical Computer Science, Vol. 1. Springer (1992).
17. H. Klaudel. Modèles algébriques, basés sur les réseaux de Petri, pour la sémantique des langages de programmation concurrents. PhD Thesis, Université Paris XI-Orsay (1995).
18. H. Klaudel and E. Pelz. Communication as Unification in the Petri Box Calculus. Fundamentals of Computation Theory. Springer, LNCS Vol. 965, 303-312 (1995).
19. J. Lilius and E. Pelz. An M-net Semantics for B(PN)^2 with Procedures. In ISCIS XI, Vol. I, 365-374, Antalya, November 1996. Middle East Technical University.
Polynomial-Time Many-One Reductions for Petri Nets

Catherine DUFOURD and Alain FINKEL

LSV, CNRS URA 2236, ENS de Cachan, 61 av. du Pdt. Wilson, 94235 CACHAN Cedex, FRANCE. {Catherine.DUFOURD, Alain.FINKEL}@lsv.ens-cachan.fr

Abstract. We apply to Petri net theory the technique of polynomial-time many-one reductions. We study boundedness, reachability, deadlock and liveness problems and some of their variations. We derive three main results. Firstly, we highlight the expressive power of reachability, which can polynomially give evidence of unboundedness. Secondly, we prove that reachability and deadlock are polynomial-time equivalent; this improves the known recursive reduction and it complements the result of Cheng et al. [4]. Moreover, we show the polynomial equivalence of liveness and t-liveness. Hence, we regroup the problems into three main classes: boundedness, reachability and liveness. Finally, we give an upper bound on the boundedness for post self-modifying nets: 2^{O(|N|^2 * log |N|)}. This improves a decidability result of Valk [18].

Key words: Petri net theory; Complexity theory; Program verification; Equivalences.
1 Introduction

The boundedness, the reachability, the deadlock, the t-liveness and the liveness problems are among the main problems studied in Petri nets. Solving these problems requires huge space and time resources. For boundedness, Lipton [13] proved that a lower space-bound is 2^{e*sqrt(|N|)}, improved to 2^{e*|N|} by Bouziane [2] (where e is some constant and |N| is the size of the input net); Rackoff [17] proved that an upper space-bound for this problem is 2^{O(|N|*log |N|)}. For reachability, decidability has been proved by Mayr [14] and Kosaraju [12]; Cardoza, Lipton, Mayr and Meyer [3,15] established that this problem is EXPSPACE-hard. However, until now, it is not known whether the reachability, the deadlock and the liveness problems are primitive recursive or not. In this paper, our aim is to compare these problems, to regroup similar problems into classes and to order these classes. We use polynomial-time many-one reductions [9]. The idea is to take one instance of a problem A and to polynomially transform it into one instance of another problem B. The problem B is seen as an oracle used to solve the problem A. In the literature, we often find two other kinds of reductions: polynomial-time Turing reductions, which allow the oracle to be consulted not only once but a
polynomial number of times, and recursive reductions. We obtain two sorts of results. Firstly, we prove three main theorems:

- Boundedness is polynomially reducible to reachability,
- Reachability and deadlock are polynomially equivalent,
- Liveness and t-liveness are polynomially equivalent.
For instance, we show that a Petri net N is unbounded if and only if a certain marking M_r is reachable in a net N̂ which is polynomially constructed from N. Let us note that our second theorem strengthens a recent result of Cheng, Esparza and Palsberg [4], who showed that reachability is polynomially reducible to deadlock. Secondly, we establish a strong relation between Petri nets and Post Self-Modifying nets (PSM-nets) on the boundedness problem. Post self-modifying nets, defined by Valk [18], are extended Petri nets in which a transition may add a "dynamic number" of tokens (which is an affine function, with a specific form, of the current marking) in its output places. Valk has proven that the boundedness problem is decidable for post self-modifying nets. Here, we improve his decidability result by giving 2^{O(|N|^2 * log |N|)} as an upper space-bound. Moreover, this upper bound is not so far from the lower bound Ω(2^{|N|}).
There are four advantages in grouping problems together. Firstly, even if we still do not know the exact complexity of reachability and deadlock, it is instructive to know that they have the same complexity, modulo a polynomial transformation. Secondly, when we know that seven problems are polynomially equivalent, as for those of the reachability class, we may focus our attention on only one of these problems, to produce a good implementation of an algorithm solving it; this unique program may then be used for solving the six other problems. Thirdly, the obtained results confirm our intuition about the hardness of problems in Petri nets. Basically we obtain the following order:

Boundedness ≤ Reachability ≡ Deadlock ≤ Liveness

Fourthly, we obtain a new complexity result by using the equivalence between boundedness for Petri nets and boundedness for post self-modifying nets. In the next section, we give the basic definitions of Petri nets and polynomial-time reductions; then we give an overview of the known many-one polynomial-time reductions. In section 3, we reduce boundedness to reachability. In section 4, we prove that reachability is polynomially equivalent to deadlock; moreover, both are polynomially equivalent to reachability and deadlock for normalized Petri nets (for which valuations over arcs and the initial marking are bounded by 1). In section 5, we show that liveness is equivalent to t-liveness. In section 6, we prove that boundedness for Petri nets and boundedness for post self-modifying nets are polynomially equivalent; we deduce from there the upper bound on the boundedness problem for PSM-nets. We conclude in section 7.
2 Petri nets and polynomial-time reductions
Let ℕ be the set of nonnegative integers and let ℕ^k (k ≥ 1) be the set of k-dimensional column vectors of elements in ℕ. Let X ∈ ℕ^k; X(i) (1 ≤ i ≤ k) is the i-th component of X. Let X, Y ∈ ℕ^k; we have X ≺ Y iff the two following conditions hold: (a) X(i) ≤ Y(i) (1 ≤ i ≤ k) and (b) ∃j, 1 ≤ j ≤ k, s.t. X(j) < Y(j). Let Σ be a finite alphabet; Σ* is the set of all finite words (or sequences) over Σ. We write |S| for the cardinal of a finite set S, and |N| for the size of a Petri net N.

2.1 Petri nets, properties and complexity
A Petri net is a 4-tuple N = <P, T, F, M0> where P is a finite set of places, T is a finite set of transitions with P ∩ T = ∅, F : (P × T) ∪ (T × P) → ℕ is a flow function and M0 ∈ ℕ^|P| is the initial marking. A Petri net is normalized, or ordinary, if F is a function into {0, 1} and M0 is a function into {0, 1}^|P|. A transition t is firable from a marking M ∈ ℕ^|P|, written M -t->, if for every place p we have F(p, t) ≤ M(p). Firing t from M leads to a new marking M', written M -t-> M', defined as follows: for every place p, we have M'(p) = M(p) − F(p, t) + F(t, p). A marking M' is reachable from M, written M ->* M', if there exists a sequence σ ∈ T* such that M -σ-> M'. A marking is dead if no transition is firable from it. The reachability set of N, denoted RS(N), contains all the markings reachable from M0. A Petri net is unbounded if its reachability set is infinite. A transition t is quasi-live from M if it is firable from a marking M' with M ->* M'. A transition t ∈ T is live if it is quasi-live from any marking in RS(N). A Petri net is live if all its transitions are live.
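The firing rule just defined can be rendered directly in code. The following minimal Python model is our own illustration (the class name, the dictionary encoding of F and the fixed place ordering are conventions of ours, not the paper's):

```python
from typing import Dict, List, Tuple

Marking = Tuple[int, ...]  # one component per place, in the order of `places`

class PetriNet:
    """N = <P, T, F, M0>, with F split as pre[t][p] = F(p, t), post[t][p] = F(t, p)."""

    def __init__(self, places: List[str],
                 pre: Dict[str, Dict[str, int]],
                 post: Dict[str, Dict[str, int]],
                 m0: Marking):
        self.places, self.pre, self.post, self.m0 = places, pre, post, m0

    def firable(self, m: Marking, t: str) -> bool:
        # t is firable from M iff F(p, t) <= M(p) for every place p
        return all(m[i] >= self.pre[t].get(p, 0)
                   for i, p in enumerate(self.places))

    def fire(self, m: Marking, t: str) -> Marking:
        # M'(p) = M(p) - F(p, t) + F(t, p)
        assert self.firable(m, t)
        return tuple(m[i] - self.pre[t].get(p, 0) + self.post[t].get(p, 0)
                     for i, p in enumerate(self.places))

    def dead(self, m: Marking) -> bool:
        # a marking is dead if no transition is firable from it
        return not any(self.firable(m, t) for t in self.pre)
```

For instance, a net with places p, q and a single transition t moving one token from p to q reaches the dead marking (0, 1) after one firing.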
Definition 1. Given a Petri net N = <P, T, F, M0>, t ∈ T and M ∈ ℕ^|P|:
- The Boundedness Problem (BP) is to determine whether N is bounded or not.
- The Reachability Problem (RP) is to determine whether M ∈ RS(N) or not.
- The Deadlock Problem (DP) is to determine whether RS(N) contains a dead marking or not.
- The t-Liveness Problem (t-LP) is to determine whether the transition t is live or not.
- The Liveness Problem (LP) is to determine whether N is live or not.

These problems have been widely studied. They are all decidable [11,8,12,14,7], but intractable in practice. A lower space-bound for the RP and BP is 2^{e*sqrt(|N|)} [13]. Reachability is EXPSPACE-hard [3,15], but we don't know yet whether the RP is primitive recursive or not. There exists a family of bounded Petri nets such that every net N of the family has a reachability set of non-primitive recursive size in |N| [10]. An upper space-bound for deciding the BP is 2^{O(|N|*log |N|)} [17]. This bound comes from the following theorem:
A Petri net N = < P,T,F, Mo > is unbounded if and only if there exists two sequences ~1, ~r2 E T* such as Mo 2_~ M1 ~ M2 with
315 M1 -~ M2. The net is unbounded if and only if there exists such an execution of length less than a double exponential in the size of N. If we talk about complexity, we need to determine what is the size of a Petri net. The representation we have chosen which is slightly different from the one in 20 commonly used. Let V be the greatest integer found over the flow function and the initial marking. We propose to encode the flow function of a Petri net with two matrices of size (PI x IT) containing O(logY) bits: one matrix for input arcs and an other for output arcs. A Petri net is encoded with a sequence of bits giving the number of places, the number of transitions, the size of V, the flow function and finally the initial marking. The total size belongs to:
O(log IPl+log ITl+loglogV + 2* IPI*ITI*log V + IPI*Iog V) = O(IPI*ITI*Iog V) 2.2
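Theorem 2's witness (M0 -σ1-> M1 -σ2-> M2 with M1 ≺ M2) suggests a direct, if hopelessly inefficient, search procedure. The sketch below is our own illustration, not an algorithm from the paper: the names and the depth cutoff are assumptions, and by the theorem a complete search would need a doubly exponential depth, so this only demonstrates the witness shape.

```python
from collections import deque

def strictly_covers(m2, m1):
    # m1 -< m2 : componentwise <= and strictly less in at least one component
    return all(a <= b for a, b in zip(m1, m2)) and any(a < b for a, b in zip(m1, m2))

def unbounded_witness(m0, transitions, depth):
    """Breadth-first search for a path M0 ->* M1 ->* M2 with M1 -< M2.
    transitions: list of (pre, post) vectors; exploration is cut at `depth`."""
    queue = deque([(tuple(m0), [tuple(m0)])])  # (current marking, markings on path)
    while queue:
        m, path = queue.popleft()
        if len(path) > depth:
            continue
        for pre, post in transitions:
            if all(x >= p for x, p in zip(m, pre)):          # firable
                m2 = tuple(x - p + q for x, p, q in zip(m, pre, post))
                if any(strictly_covers(m2, m1) for m1 in path):
                    return path + [m2]                       # unboundedness witness
                queue.append((m2, path + [m2]))
    return None
```

On a one-place net whose only transition produces a token from nothing, the witness is found after a single firing; on the bounded self-loop net it is never found.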
2.2 Known polynomial-time reductions for Petri nets
Reductions [9] are used to compare different problems for which, most of the time, no efficient algorithm is known. We manipulate decision problems, i.e. problems requiring Yes or No as output. We ask questions of the kind: "Does Petri net N possess the property P or not?". The net given in input is called the instance of problem P. Most of the time, instances of our problems are Petri nets, but it may happen that we need to specify a marking (as for the RP) or a transition (as for the t-LP). We denote by I_P the set of instances associated to problem P. We say that P is many-one polynomial-time reducible to Q, written P ≤_poly Q, if there exists a polynomial-time computable function f : I_P → I_Q such that x is a positive instance of P if and only if f(x) is a positive instance of Q.

Fig. 1. Summary of known polynomial-time many-one reductions.
We give in the current section an overview of the known many-one polynomial-time reductions, focusing on the BP, RP, DP, LP and t-LP and some of their variations. Fig. 1 summarizes the relations between the problems with a diagram. All problems put in a same box are equivalent. An arrow from a box to another indicates the existence of a reduction from the first class to the other. An arc labeled with "not" refers to a reduction to the complement of a problem. The boundedness problem for post self-modifying nets is written Boundedness-PSMN (the definition of PSM-nets is recalled in section 6).

Normalization: The normalization proposed in [6] is performed in quadratic time and preserves boundedness, reachability and t-liveness. We add the suffix -norm to designate the classical problems over normalized, or ordinary, Petri nets. We have BP equivalent to BP-norm, RP equivalent to RP-norm and t-LP equivalent to t-LP-norm.

Reachability: Many polynomial reductions about reachability properties were given by Hack [8,16]. Hack pointed out three problems equivalent to the RP. The Submarking Reachability Problem (Sub-RP) over <N, M'>, where M' is a marking over a subset P' ⊆ P, is to decide whether there exists a reachable marking M such that for all p ∈ P', M(p) = M'(p). The Zero-Reachability Problem (Zero-RP) over <N> is to decide whether there exists a reachable marking in which all the places are empty. The Single-Place Zero-Reachability Problem (SP-Zero-RP) over <N, p> is to decide whether there exists a reachable marking for which place p is empty. Cheng et al. [4] showed that reachability is polynomially reducible to deadlock.

Liveness: Reachability is polynomially reducible to not-liveness [16]. The other direction of the reduction is known to be recursive, but no polynomial reduction is known. More recently, Cheng et al. [4] showed that the deadlock problem is polynomially reducible to not-liveness. But as for the RP, the other direction is not known. Liveness appears to be a very expressive property. Hack [8] mentions a reduction from t-LP to LP performed in almost linear time.
3 From unboundedness to reachability
Let us compare the current state of knowledge about boundedness and reachability. Firstly, about complexity, we know an upper space-bound for solving boundedness [17], but we still do not know whether reachability is primitive recursive or not. Moreover, this last question remains one of the hardest open questions in Petri net theory. Secondly, we know that if we increase the power of Petri nets a little bit then reachability becomes undecidable while boundedness seems more resistant. An illustrative example is the class of the post self-modifying nets, for which boundedness is decidable but not reachability (see section 6). Reachability seems to be a stronger property than boundedness because BP is in EXPSPACE and RP is EXPSPACE-hard; in the current section, we explicitly give the reduction from BP to RP. The other direction, reachability to unboundedness, is probably impossible; otherwise we would obtain a surprising upper bound on the complexity of solving reachability.

Fig. 2. Reduction from boundedness to reachability.

Theorem 3. Unboundedness is polynomially reducible to reachability.

Proof: Let N = <P, T, F, M0> be a Petri net. Recall that N is not bounded if and only if there exists an execution of N, M0 ->* M' ->* M'', such that M' ≺ M'' [11]. The difference Md = M'' − M' is a nonnegative vector with at least one strictly positive component. Our strategy is to look for such a marking Md. But we want to detect Md through reachability, by asking whether a specific marking is reachable, and this implies that we need to characterize Md in a standard way. Let us suppose that we add a summing-place that contains at any step the sum over all the places (a summing-place can easily be implemented in a Petri net by adding to each transition an arc labeled with the total effect of the transition). The marking Md is then certainly strictly greater than the marking with 0 in all the places except 1 in the summing-place. We use this characterization for the final question of the reduction. Let us explain our reduction with the help of Fig. 2. We build N̂ = <P̂, T̂, F̂, M̂0> as follows:
- Make two copies of N, in N1 = <P1, T1, F1, M01> and N2 = <P2, T2, F2, M02>, with M0 = M01 = M02;
- Add two summing-places: at first, the summing-place of N1 contains the sum over all the places of N1 and the summing-place of N2 contains the sum over all the places of N2;
- Each transition t ∈ T2 is duplicated, leading to a new transition t' in T2 (note that N2 is now no longer an exact copy of N);
- Make the fusion of N1 and N2 over pairs (t1, t2) where t1 ∈ N1 and t2 ∈ N2 are copies of the same original transition in N;
- Add four levels of control, which are activated successively during an execution. Levels are managed with permission-places labeled explicitly in the picture. Control is first given at level 1 and moves as follows: level 1 → level 2 → level 3 → level 4. The dashed arcs link a permission-place to the set of transitions that it allows to be fired:
  - Level 1 allows the two nets N1 and N2 to fire the original transitions together;
  - Level 2 allows only N2 to continue to fire the original transitions while N1 and its summing-place are frozen;
  - Level 3 allows the simultaneous emptying of two associated places (p1, p2), where p1 is in P1 or is the summing-place of N1, and p2 is its corresponding place in P2 or the summing-place of N2;
  - Level 4 allows emptying the places of N2 and its summing-place only.
Correctness: N is unbounded if and only if M_r = (0, 0, 0, 1, 0···0, 0, 0···0, 1) is reachable in N̂. The first four positions in M_r are related to the four levels. The last position in M_r is related to the summing-place of N2. The other positions, all equal to 0, are related to the remaining places of N1 and N2. Note that M_r is a marking at level 4 (M_r(4) = 1). By construction, in N̂, at any time M' in N1 and M'' in N2 are two markings appearing along an execution of N. The only way to empty correctly P1 and the summing-place of N1 while keeping at least one token in the summing-place of N2 is to have M' ≺ M''; this happens if and only if N is unbounded. Finally, level 4 allows cleaning up the remaining places in order to reach exactly M_r when N is unbounded.

Complexity: The net N̂ contains O(|P|) places and O(|P| + |T|) transitions. The greatest value in N̂ is (|P| * V), because of the summing-places (recall that V is the greatest value of N). The total size is thus O(|P| * (|P| + |T|) * log(|P| * V)) and the construction is linear in this size. We conclude that the time-complexity of the reduction is O(log |P| * |N|²) and this concludes the proof. ∎
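The summing-place gadget used in the proof can be made explicit. The sketch below is our own rendering, under assumptions: transitions are (pre, post) dictionaries mapping place names to arc weights, and the new place is arbitrarily named "sum". Each transition gets one extra arc weighted by its total effect, so that the new place always holds Σ_p M(p):

```python
def add_summing_place(transitions, m0):
    """transitions: list of (pre, post) dicts mapping place -> arc weight;
    m0: dict place -> initial token count.  Returns the augmented net."""
    augmented = []
    for pre, post in transitions:
        effect = sum(post.values()) - sum(pre.values())   # total effect of t
        pre2, post2 = dict(pre), dict(post)
        if effect > 0:
            post2["sum"] = post2.get("sum", 0) + effect   # t adds tokens overall
        elif effect < 0:
            pre2["sum"] = pre2.get("sum", 0) - effect     # t removes tokens overall
        augmented.append((pre2, post2))
    m0_aug = dict(m0)
    m0_aug["sum"] = sum(m0.values())  # invariant: "sum" = total token count
    return augmented, m0_aug
```

A transition producing two tokens thus feeds the summing-place with weight 2, and a transition consuming one token drains it with weight 1, preserving the invariant through every firing.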
4 Polynomial equivalence of reachability and deadlock
Reachability and deadlock are decidable and thus recursively equivalent [4]. In the current section, we prove that reachability, deadlock, reachability for normalized Petri nets and deadlock for normalized Petri nets are polynomially equivalent. Recall that a Petri net is normalized if the flow function returns an integer in {0, 1} and the initial marking belongs to {0, 1}^|P|. The reachability set, however, may be infinite, and thus normalized Petri nets should not be confused
with 1-safe nets, for which any reachable marking contains only 0 or 1 as values. Normalization provides a simpler representation of Petri nets; in this sense, it is interesting to notice that studying the RP or DP may be restricted to this class modulo a polynomial transformation. Our proofs use some known results, but we explain in detail the main reduction from deadlock to reachability.

Proposition 4. Reachability, deadlock, reachability for normalized PN and deadlock for normalized PN are polynomial-time equivalent.

Proof: We prove that RP ≤_poly RP-norm ≤_poly DP-norm ≤_poly DP ≤_poly RP.

→ The first reduction, from RP to RP-norm, holds by the normalization in [6], which is performed in quadratic time and preserves reachability. To make an efficient normalization, the main idea is to use the binary strings encoding the integers appearing in F and M0, instead of their values.

→ The second reduction, from RP-norm to DP-norm, holds by the reduction of Cheng et al. [4]. The main idea of the reduction is the following: let the original net run with dummy self-loop transitions. At any time, the current marking can be tested. The expected marking (which is part of the input) is subtracted from the current marking. If the current marking was the expected one, the dummy transitions are no longer firable and this leads to a deadlock. However, to preserve normalization, we need to perform a pre-normalization over the expected marking.

→ The third reduction, from DP-norm to DP, is trivial.

→ We explain in detail the fourth reduction, from DP to RP. A natural Turing reduction is to list all the partial dead markings and to ask, for each of them, whether it is reachable or not. However, there exists an exponential number of dead markings and this strategy is not polynomial.
Construction, from DP to RP. Let N = <P, T, F, M0> be a Petri net. A deadlock Md in N is a reachable marking allowing no transition to be fired. This means that for every transition t, there exists a place p such that Md(p) < F(p, t). It is not necessary to describe the marking Md over all the places; a subset of places is sufficient. The main idea is to guess a partial marking, to validate it as a good candidate for a deadlock, to let the original net run, and finally to compare the guessed Md with the current marking M of the original net. For that, Md is subtracted from M (token by token). If the markings are the same, 0 is reachable in the chosen places for M and Md. Fig. 3 gives the general skeleton of the reduction. We construct a net with 4 levels of control. Each level controls a specific subnet, isolated in a box. However, two boxes may have common transitions, and this is illustrated with non-oriented dashed arcs. Control is given first at level 1 and moves as follows: level 1 → level 2 → level 3 → level 4. We explain in detail the four levels.
- At level 1, a subset of places P' ⊆ P is chosen, and a marking Md is guessed over P'. In Fig. 3, the guessed marking appears in the central box. If the place p is chosen, then a place Yes_p is marked; otherwise, a place No_p is marked.
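The deadness condition above (for every transition t there is an input place p with Md(p) < F(p, t)) can be checked directly. A minimal Python sketch, with a hypothetical dictionary encoding of the flow function F and of markings:

```python
def is_dead(marking, places, transitions, F):
    """A marking is dead iff no transition is firable, i.e. for every
    transition t there exists a place p with marking[p] < F[(p, t)]."""
    return all(
        any(marking[p] < F.get((p, t), 0) for p in places)
        for t in transitions
    )

# Toy net: one transition t needing 2 tokens from place p.
places, transitions = ["p"], ["t"]
F = {("p", "t"): 2}
assert is_dead({"p": 1}, places, transitions, F)      # 1 < 2: t blocked, dead
assert not is_dead({"p": 3}, places, transitions, F)  # t is firable
```

The reduction in the text avoids enumerating all such markings by guessing one inside the net itself.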
Fig. 3. Reduction from deadlock to reachability.
For each original place p ∈ P', the aimed Md(p) is stored into a place labeled with p'. Fig. 4 gives the details of the implementation for place p. An Md(p) cannot be greater than V, where V is the greatest valuation of the original net. To guess Md(p), i.e. the content of p', we use a complementary place, labeled with C_p'. Places of kind p' are initialized with 0, and complementary places with V. At any time, the sum over a place and its complementary place is the constant V.
Fig. 4. From deadlock to reachability: to choose p and to guess Md(p) into p'.
- At level 2, the net verifies that Md is a good candidate: Md must under-evaluate, for every transition, the number of tokens required by at least one
input place. If the condition holds, then the place p_sat is marked. To confirm that Md is a good candidate, we verify the following boolean equation:

  AND_{t ∈ T} OR_{p ∈ •t} (Md(p) < F(p, t)) ∧ (p ∈ P')

The condition Md(p) < F(p, t) is easily implemented using the complementary places: indeed, if p', i.e. Md(p), contains less than F(p, t) tokens, then its complementary place contains at least V − F(p, t) + 1 tokens. The condition p ∈ P' is verified by using the Yes_p places. We illustrate the construction in Fig. 5, where we focus on transition t1, which has here as input places p1 and an arbitrary pi. If the guessed marking is dead for t1, then a place "Dead for t1" is marked. The same implementation is done for all the transitions. Note that we use reflexive arcs, because places of kind Yes_p or C_p' may be used for more than one original transition. When Md is recognized as dead for all transitions, then p_sat may be marked (once here, but this is only a choice of construction).
Fig. 5. From deadlock to reachability: to verify Md.
- At level 3, the net emulates the behavior of N. A copy of N is included in the current construction, with a permission-token for level 3.
- At level 4, the net stops the emulation and tests whether Md and the current marking M in the copy of N coincide. For that, the Yes_p places are used to debit the chosen places simultaneously in M and Md. The other, non-chosen places of M are emptied using the No_p places. The remaining non-relevant places of the construction are emptied without condition.
Correctness: N reaches a dead marking if and only if Mr = (1, 0, 0, 0, 1, 0, ..., 0) is reachable in Ñ, where the first position of Mr refers to p_sat and the fifth one refers to level 4. It is evident that if a dead marking is reachable in N, then it is possible to choose it as a good candidate and to finally reach Mr. In the other direction, if no dead marking is reachable in N, then there are two cases: either p_sat is not marked; or p_sat is marked, but this means that the guessed marking is not reachable and that the current marking in the copy of N and Md will never coincide.

Complexity: The net Ñ finally contains O(|P| + |T|) places and O(|P| * |T|) transitions (because of the module which verifies Md). The greatest value in Ñ is V. The total size is thus in O(|N|^2), and the construction is linear in this size, hence quadratic. This concludes the proof. □
5
Polynomial equivalence of liveness and t-liveness
There exists a polynomial reduction from reachability to not-liveness [16], using the variation of RP which asks whether a place p may be emptied. A similar reduction exists from deadlock to not-liveness [4]. The converse reductions, from not-LP to RP and from not-LP to DP, are not known. Hack [8] gave a reduction from t-liveness to liveness. In the current section we show the other direction of the reduction, from liveness to t-liveness, making the two problems many-one polynomially equivalent. Note that we do not have this equivalence for the subclass of bounded free-choice nets, where t-liveness is NP-complete while liveness is polynomial [5].

Theorem 5. Liveness is polynomially reducible to t-liveness.

Proof: Let N = <P, T, F, M0> be a Petri net. The construction of Ñ is as follows: (1) Add a place p_t in output of every transition t ∈ T; (2) Add a transition t_test having as input places the set of places {p_t | t ∈ T}. All the original transitions are live if and only if t_test is quasi-live from any reachable marking in Ñ. In Ñ we add |T| places and O(|T|) transitions. The total size of the net is O((|T| + |P|) * |T|), and this size is quadratic in |N|. The total time is linear in this size and thus polynomial. □
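The two-step construction in the proof of Theorem 5 can be sketched as follows (a minimal sketch with a hypothetical tuple/dictionary encoding of nets; F maps arcs to weights):

```python
def liveness_to_t_liveness(P, T, F):
    """Build N~ from N = (P, T, F): add one place p_t per transition t,
    fed by t (step 1), plus one transition t_test consuming every p_t
    (step 2)."""
    P2 = list(P) + [f"p_{t}" for t in T]
    T2 = list(T) + ["t_test"]
    F2 = dict(F)
    for t in T:
        F2[(t, f"p_{t}")] = 1          # step (1): t outputs into p_t
        F2[(f"p_{t}", "t_test")] = 1   # step (2): t_test inputs all p_t
    return P2, T2, F2

P2, T2, F2 = liveness_to_t_liveness(["p1"], ["a", "b"], {("p1", "a"): 1})
assert len(P2) == 1 + 2 and len(T2) == 2 + 1  # |T| new places, 1 new transition
assert F2[("p_a", "t_test")] == 1
```

The size accounting in the proof is visible here: |T| places and one transition with |T| input arcs are added.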
6
An upper-bound on solving boundedness for Post Self-Modifying nets
Post Self-Modifying nets (PSM-nets), defined by Valk [18,19], are more powerful than Petri nets. In this model, transitions have extended arcs and/or classical arcs. Extended arcs occur only in output of transitions. Let us suppose that there exists an extended arc from t to place p2 labeled with 21*p1 + 4*p3. Firing t from M leads to a new marking M' such that M'(p2) = M(p2) + 21*M(p1) + 4*M(p3). Thus, the next marking depends narrowly on the current one, and this is why one uses the qualifier "self-modifying".
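The firing rule for extended arcs can be sketched as follows (a simplified model that assumes the transition is firable and ignores classical input/output arcs; the dictionary encoding is hypothetical):

```python
def fire_extended(M, t_outputs):
    """Fire a transition whose extended arcs are given as
    t_outputs[p_out] = {p_consulted: coefficient, ...}; the number of
    tokens added to p_out depends on the *current* marking M."""
    M2 = dict(M)
    for p_out, coeffs in t_outputs.items():
        M2[p_out] += sum(c * M[p] for p, c in coeffs.items())
    return M2

# The example arc from the text: t adds 21*M(p1) + 4*M(p3) tokens to p2.
M = {"p1": 2, "p2": 0, "p3": 1}
M2 = fire_extended(M, {"p2": {"p1": 21, "p3": 4}})
assert M2["p2"] == 21 * 2 + 4 * 1  # = 46
```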
A PSM-net is a 5-tuple <P, T, F, M0, E>. The four first components are the same as in Petri nets, and the fifth one, component E, is a function (T × P × P) → N which returns a multiplicative coefficient, given a transition, an output place and a place to be consulted. In our example, we have E(t, p2, p1) = 21. Although PSM-nets are more expressive, boundedness is still decidable, and this is what makes this model attractive. The proof [18] is similar to the original one for Petri nets. However, reachability is undecidable. Let us define a lower bound on the size of a PSM-net. Let V be the greatest integer found over F, M0 and E. We encode the flow functions with matrices, as for Petri nets. The size of a PSM-net belongs to Ω(|P| * |T| * log V). In the current section, we give an upper bound on solving boundedness for PSM-nets. We prove that we have a polynomial-time equivalence between boundedness for Petri nets and boundedness for post self-modifying nets. The non-trivial direction of the reduction, from BP-PSMN to BP, requires quadratic time. As boundedness for Petri nets is decidable in space 2^{O(|N| log |N|)}, we obtain 2^{O(|N|^2 log |N|)} as an upper space-bound for BP-PSMN. The main idea is to build a net Ñ that emulates the behavior of N but computes the number of tokens output along extended arcs in a weak sense. This means that, in the best case, the computation will be the right one but, in any other case, the computation will under-evaluate the number of tokens to be produced. Any marking reachable in Ñ is, in some sense, covered by a marking reachable in N, and this implies that Ñ is unbounded if and only if N is unbounded.

Theorem 6. Boundedness for PSM-nets is decidable in space 2^{O(|N|^2 log |N|)}.
Proof: Let N = <P, T, F, E, M0> be a post self-modifying net. We reduce BP-PSMN to BP; the time complexity is O(|N|^2), leading to the theorem above. To construct Ñ = <P̃, T̃, F̃, M̃0>, we decompose the effect of any original transition for the weak computation of the tokens to be produced in output. Every transition is replaced by a subnet, as illustrated on an example in Fig. 6. For that, we need to associate with every original place p a place reservoir-p, initialized with 0. We ensure mutual exclusion between the |T| subnets, such that as long as a current decomposition is not over, it is impossible to emulate another original transition. In Fig. 6, transition t has p4 as input place, p5 as classical output place and p2, p6 as "extended" output places. The arc to p2 is labeled with 21*p1 + 4*p3 and the arc to p6 with 7*p1. This implies that firing t from M has for consequence the addition of 21*M(p1) + 4*M(p3) tokens in p2 and 7*M(p1) tokens in p6. The emulation of t is performed in four steps:
- Start t: the decomposition begins with the update of input places (here p4) and classical output places (here p5). Control is given to the next step.
- Update by p1: the weak computations of 21*M(p1) and 7*M(p1) take place here. As long as desired, t_p1 debits p1 of 1 token, crediting at the same time its reservoir place of 1 token, p2 of 21 tokens and p6 of 7 tokens. If the process ends when p1 is empty, then p2 and p6 received the exact number of tokens; otherwise they received fewer tokens than aimed.
Fig. 6. Reduction from boundedness-PSMN to boundedness: weak firing of t.
- Update by p3: the weak computation of the multiplicative coefficient for p3 takes place here. The value 4*M(p3) is calculated in a weak sense, debiting p3 while keeping a trace in reservoir-p3 at the same time.
- Restore altered places: we now have to restore the original contents of places p1 and p3. As long as desired, the contents of the reservoirs are put back into the original places. If the process continues up to emptying the reservoirs, then p1 and p3 are restored; otherwise, they receive fewer tokens than aimed. Note that in this last case we have not, however, lost any tokens, because the remaining ones are in the reservoir places. Control is given to the next transition to be emulated.
When all the steps are fully processed, we find in places p6 and p2 the right number of tokens, and we leave p1 and p3 unaltered. At any time, and this is the interesting point, if we "merge" any pair p and its reservoir by taking their sum, we find a marking which is covered by a marking that is reachable in the original PSM-net. Moreover, when the decompositions are well performed, we find a marking reachable in the original PSM-net. These two facts are sufficient to make the reduction correct. Note that the construction needs to be adapted a bit for other cases, such as reflexive extended arcs.

Correctness: The original net N is unbounded if and only if the built net Ñ is unbounded. If N is unbounded, then Ñ is unbounded, because there is always a way to emulate the original net correctly. If N is bounded, then either Ñ fully performs the decomposition steps and produces as many tokens as N produces at any step, or it produces fewer tokens.
Complexity: The original places, the reservoirs and the mechanism which restores the places are common to all the decompositions of original transitions. Each decomposition of an original transition requires O(|P|) places and transitions in the worst case. The whole net Ñ contains O(|P| + (|T| * |P|)) places and O(|T| + (|T| * |P|)) transitions. The greatest value in Ñ is V. The total size is thus O((|T| * |P|)^2 * log V), and the construction is linear in this size, thus O(|N|^2), and this concludes the proof. □
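The weak emulation of the example transition t can be sketched as a simulation (illustrative only, covering the p1 loop and the restore loop; the p3 loop is analogous; when both loops run to completion the computation is exact):

```python
def weak_fire_t(M, steps_p1, steps_restore):
    """Weakly emulate t: each iteration of the 'update by p1' loop moves
    one token from p1 to reservoir-p1 while adding 21 tokens to p2 and 7
    to p6; the restore loop moves tokens back from the reservoir to p1."""
    M = dict(M)
    for _ in range(min(steps_p1, M["p1"])):
        M["p1"] -= 1
        M["res_p1"] += 1
        M["p2"] += 21
        M["p6"] += 7
    for _ in range(min(steps_restore, M["res_p1"])):
        M["res_p1"] -= 1
        M["p1"] += 1
    return M

M0 = {"p1": 3, "p2": 0, "p6": 0, "res_p1": 0}
exact = weak_fire_t(M0, steps_p1=3, steps_restore=3)  # loops run to the end
assert exact["p2"] == 21 * 3 and exact["p6"] == 7 * 3 and exact["p1"] == 3
weak = weak_fire_t(M0, steps_p1=2, steps_restore=0)   # stopped early
assert weak["p2"] <= exact["p2"]          # under-evaluation, never more
assert weak["p1"] + weak["res_p1"] == 3   # no tokens are lost
```

The two assertions at the end mirror the two facts used in the correctness argument: the weak run is covered by the exact one, and merging each place with its reservoir never loses tokens.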
7
Conclusion
Fig. 7. Summary of polynomial-time many-one reductions (among Reachability, Boundedness, Deadlock, Liveness, t-Liveness, their normalized variants, Sub-RP, Zero-RP, SPZero-RP, and Boundedness-PSMN).
In this paper we were interested in ordering Petri net problems (boundedness, reachability, deadlock, liveness and t-liveness) through their complexity. Fig. 7 summarizes the contribution of our work. The main results are the following. We give an illustration of the expressive power of reachability by reducing to it the not-boundedness and the deadlock problems. Reachability is a very vulnerable property in terms of decidability and often becomes undecidable as soon as the power of Petri nets is increased; an example of an extended model for which RP is undecidable is the class of Petri nets allowing Reset arcs [1]; a Reset arc clears a place as a consequence of a firing. We put the reachability and the deadlock problems in the same class. These problems were known to be recursively equivalent; thus our comparison is more precise. We give 2^{O(|N|^2 log |N|)} as an upper bound on the space complexity of boundedness for post self-modifying nets, and this bound is not so far from the one for Petri nets, even though PSM-nets are strictly more powerful than Petri nets.
Acknowledgments. Thanks to the anonymous referees for their useful remarks.
References
1. T. Araki and T. Kasami. Some decision problems related to the reachability problem for Petri nets. TCS, 3(1):85-104, 1977.
2. Z. Bouziane. Algorithmes primitifs récursifs et problèmes EXPSPACE-complets pour les réseaux de Petri cycliques. PhD thesis, LSV, École Normale Supérieure de Cachan, France, November 1996.
3. E. Cardoza, R. Lipton, and A. Meyer. Exponential space complete problems for Petri nets and commutative semigroups. In Proc. of the 8th Annual ACM Symposium on Theory of Computing, pages 50-54, May 1976.
4. A. Cheng, J. Esparza, and J. Palsberg. Complexity results for 1-safe nets. TCS, 147:117-136, 1995.
5. J. Desel and J. Esparza. Free Choice Petri Nets. Cambridge University Press, 1995.
6. C. Dufourd and A. Finkel. A polynomial A-bisimilar normalization for Petri nets. Technical report, LIFAC, ENS de Cachan, July 1996. Presented at AFL'96, Salgótarján, Hungary, 1996.
7. J. Esparza and M. Nielsen. Decidability issues on Petri nets - a survey. Bulletin of the EATCS, 52:254-262, 1994.
8. M. Hack. Decidability questions for Petri Nets. PhD thesis, M.I.T., 1976.
9. J.E. Hopcroft and J.D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 1979.
10. M. Jantzen. Complexity of Place/Transition nets. In Petri Nets: Central Models and Their Properties, volume 254 of LNCS, pages 413-434. Springer-Verlag, 1986.
11. R.M. Karp and R.E. Miller. Parallel program schemata. Journal of Computer and System Sciences, 3:146-195, 1969.
12. R. Kosaraju. Decidability of reachability in vector addition systems. In Proc. of the 14th Annual ACM Symposium on Theory of Computing, San Francisco, pages 267-281, May 1982.
13. R.J. Lipton. The reachability problem requires exponential space. Technical Report 62, Yale University, Department of Computer Science, January 1976.
14. E.W. Mayr. An algorithm for the general Petri net reachability problem. SIAM Journal on Computing, 13(3):441-460, 1984.
15. E.W. Mayr and R. Meyer. The complexity of the word problem for commutative semigroups and polynomial ideals. Advances in Mathematics, 46:305-329, 1982.
16. J.L. Peterson. Petri Net Theory and the Modeling of Systems. Prentice Hall, 1981.
17. C. Rackoff. The covering and boundedness problems for vector addition systems. TCS, 6(2), 1978.
18. R. Valk. Self-modifying nets, a natural extension of Petri nets. In Proc. of ICALP'78, volume 62 of LNCS, pages 464-476. Springer-Verlag, September 1978.
19. R. Valk. Generalizations of Petri nets. In Proc. of the 10th Symposium on Mathematical Foundations of Computer Science, volume 118 of LNCS, pages 140-155. Springer-Verlag, 1981.
20. R. Valk and G. Vidal-Naquet. Petri nets and regular languages. Journal of Computer and System Sciences, 23(3):299-325, 1981.
Solving Some Discrepancy Problems in NC

Sanjeev Mahajan¹*, Edgar A. Ramos²**, and K. V. Subrahmanyam³***

¹ LSI Logic, Milpitas, CA 95035, USA
² Max-Planck-Institut für Informatik, Im Stadtwald, 66123 Saarbrücken, Germany
³ SPIC Mathematical Institute, 92 G.N. Chetty Road, T. Nagar, Madras 600 017, India
Abstract. We show that several discrepancy-like problems can be solved in NC² nearly achieving the discrepancies guaranteed by a probabilistic analysis and achievable sequentially. For example, given a set system (X, S), where X is a ground set and S ⊆ 2^X, a set R ⊆ X can be computed in NC² so that, for each S ∈ S, the discrepancy ||R ∩ S| − |S \ R|| is O(√(|S| log |S|)). Whereas previous NC algorithms could only achieve O(√(|S|^{1+ε} log |S|)), ours matches the probabilistic bound achieved sequentially within a multiplicative factor 1 + o(1). Other problems whose NC solution we improve are lattice approximation, ε-approximations of range spaces of bounded VC-exponent, sampling in geometric configuration spaces, approximation of integer linear programs, and edge coloring of graphs.
1 Introduction

Problem and Previous Work. Discrepancy is an important concept in combinatorics, see e.g. [1,5], and theoretical computer science, see e.g. [29,25,10]. It attempts to capture the idea of a good sample from a set. The simplest example, the set discrepancy problem, considers a set system (X, S), where X is a ground set and S ⊆ 2^X is a family of subsets of X. Here one is interested in a subset R ⊆ X such that for each S ∈ S the difference ||R ∩ S| − |S \ R||, called the discrepancy, is small. Using Chernoff-Hoeffding bounds [11,16,30,29,6,31], it is found that a random sample R ⊆ X, with each x ∈ X taken into R independently with probability 1/2, results with nonzero probability in a low-discrepancy set: for each S ∈ S, ||R ∩ S| − |S \ R|| = O(√(|S| log |S|)). In [29], using the method of conditional probabilities, this was derandomized to obtain a deterministic sequential algorithm computing such a sample R. In parallel, several approaches have been used (k-wise independence combined with the method of conditional probabilities, and relaxed to biased spaces [7,25,27,8]). However, so far these efforts to compute a sample in parallel have resulted only in discrepancies O(√(|S|^{1+ε} log |S|)).

* Work performed while at MPI Informatik, Germany. E-mail: [email protected]
** The work by this author was started at DIMACS/Rutgers Univ., New Brunswick, NJ, USA, supported by a postdoctoral fellowship. E-mail: [email protected]
*** Work performed while visiting MPI Informatik, Germany. E-mail: [email protected]
Results. In this paper, we describe NC algorithms (specifically, the algorithms run in O(log² n) time using O(n^C) processors, for some constant C, in the EREW PRAM model) that achieve the probabilistic bounds (achievable sequentially) within a multiplicative factor 1 + o(1). The technique we use is to model the sampling by randomized finite automata (RFAs)¹ and then fool these automata with a probability distribution of polynomial-size support. The approach is not new; in fact, Karger and Koller [18] show how to fool such automata via the lattice approximation problem, using a solution for that problem developed in [25]. However, they apparently did not realize that the lattice approximation problem itself can be modeled by RFAs, and that as a result this and other discrepancy-like problems can be solved in parallel nearly achieving the probabilistic bounds. We also describe how the work of Nisan [28] for fooling RFAs in the context of pseudorandom generators fits the same general approach. We consider a sample R from X with each x_j ∈ X selected into R independently with probability p_j. The goodness of the sample is determined by a polynomial number (in |X|) of random variables c_i = Σ_{x_j ∈ X} a_ij q_j, with coefficients a_ij in [0, 1] and q_j = 1 iff x_j ∈ R (the indicators for R). More precisely, R is good if, for each i, |c_i − μ_i| ≤ δ_i, where μ_i = Σ_{x_j ∈ X} a_ij p_j is the expected value of c_i, and δ_i is a deviation guaranteed by probabilistic (Chernoff-Hoeffding) bounds. Each coefficient a_ij is restricted to have O(log |X|) bits, so that the number of possible values of c_i is polynomial in |X| and can therefore be represented by an RFA of polynomial size; one RFA for each i.
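The goodness criterion can be stated directly as code. A minimal sketch (the matrix A, probabilities p, and deviation targets delta are hypothetical inputs; q is the indicator vector of R):

```python
def is_good_sample(A, p, q, delta):
    """R (given by indicators q) is good if, for each row i,
    |c_i - mu_i| <= delta_i, where c_i = sum_j a_ij q_j and
    mu_i = sum_j a_ij p_j."""
    for a_row, d in zip(A, delta):
        c = sum(a * qj for a, qj in zip(a_row, q))
        mu = sum(a * pj for a, pj in zip(a_row, p))
        if abs(c - mu) > d:
            return False
    return True

A = [[1.0, 1.0, 1.0, 1.0]]  # one constraint: c_1 = |R|
p = [0.5] * 4               # so mu_1 = 2
assert is_good_sample(A, p, [1, 1, 0, 0], delta=[1.0])      # c_1 = 2
assert not is_good_sample(A, p, [1, 1, 1, 1], delta=[1.0])  # c_1 = 4
```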
A key point, which perhaps explains why our observations had not been noticed before, is that it is sufficient to fool the individual transition probabilities of the RFAs simultaneously, rather than the joint transition probabilities, since the probability of a bad sample is bounded by the sum of the probabilities that each individual constraint does not hold. Although limited, this framework includes the lattice approximation problem, the discrepancy problem, and sampling problems in computational geometry. Also, since the lattice approximation problem can be used to obtain approximate solutions to integer linear programs [30,29], this leads to improved results in the parallel context. As a result, with no extra effort, we improve on the recent work in [2]. Our improvement also translates to the derandomization in [25] of an algorithm for graph edge coloring by Karloff and Shmoys [20].²
Contents of the paper. We first state the Chernoff-Hoeffding bounds used in this paper. In Sect. 2, we state and model the lattice approximation problem by RFAs; in Sect. 3, we present the techniques for fooling RFAs and the resulting algorithm for the lattice approximation problem; in Sect. 4, we consider the discrepancy problem and its application to solving the lattice approximation problem; in Sect. 5, we present two applications to computational geometry; finally, in Sect. 6, we briefly mention the applications to approximating integer linear programs and to edge coloring of graphs.

¹ Finite automata in which the transition from a state to its immediate successor occurs with a certain probability.
² We thank an anonymous referee for pointing out this application.
Chernoff-Hoeffding Bounds. For independent random variables X_1, ..., X_n in [0, 1], X = Σ_{i=1}^n X_i and μ = E[X], let Δ(μ, x) denote the absolute deviation for which Pr(|X − μ| > Δ(μ, x)) < x. A bound for Δ(μ, x) is obtained using the Chernoff-Hoeffding bounds [11,16,29,1]:

  Δ(μ, x) = Θ(√(μ log(1/x)))               if μ ≥ c log(1/x),
            Θ(log(1/x) / log(log(1/x)/μ))  otherwise.              (1)

We define, likewise, Δ_k(μ, x) when X is the sum of k-wise independent random variables X_1, ..., X_n with values in [0, 1]. In this case [6,31]:

  Δ_k(μ, x) = Θ(√(k μ (1/x)^{1/k}))  if μ ≥ k,
              Θ(k (1/x)^{1/k})       otherwise.                    (2)
2 Lattice Approximation

In the lattice approximation (latt. app.) problem we are given an m × n matrix A with a_ij ∈ [0, 1] and an n × 1 vector p with p_j ∈ [0, 1], and we are to compute an n × 1 vector q with q_j ∈ {0, 1}, a lattice vector, that achieves small discrepancies Δ_i = |Σ_{j=1}^n a_ij (p_j − q_j)|.
2.1 Randomized Rounding
Raghavan's [29] solution to the latt. app. problem is to set each q_j to 1 with probability p_j, independently of all others, a process called randomized rounding. Let μ_i = Σ_{j=1}^n a_ij p_j. The Chernoff-Hoeffding bounds guarantee that, for each i, with probability less than 1/m, Δ_i > Δ(μ_i, 1/m); therefore, with nonzero probability, for all i, Δ_i ≤ Δ(μ_i, 1/m) (m is the number of equations). For μ_i = Ω(log m) (which will be the case most of the time), this is Θ(√(μ_i log m)). Raghavan [29] converted this probabilistic existence argument into a deterministic algorithm through the so-called method of conditional probabilities (achieving the discrepancies guaranteed by the Chernoff-Hoeffding bounds). A parallel version by Motwani et al. [25] used polynomial-size spaces with limited independence, together with a bit-by-bit rounding approach. Unfortunately, with the requirement that the algorithm be in NC, limited independence can only produce discrepancies δ_i = O(√(μ_i^{1+ε} log m)). Using the Chernoff-Hoeffding bounds for arbitrary p_j's and the construction of k-wise independent probability spaces in [17], it is possible to avoid the bit-by-bit rounding and obtain a faster and simpler algorithm (checking all the points in the probability space in a straightforward manner), though with larger discrepancy and work (the product of the time and the number of processors) bounds. This algorithm, with k = 2, turns out to be useful as a part of the main algorithm. To simplify later expressions, we assume that m is polynomial in n, so that log(n + m) = O(log n). In any case, the resulting work bound is polynomial in n only if such is the case.
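Randomized rounding and the resulting discrepancies can be sketched as follows (illustrative only; the paper's contribution is precisely to derandomize this process in NC):

```python
import random

def randomized_rounding(A, p, rng=random.Random(0)):
    """Round each p_j to q_j = 1 with probability p_j, independently,
    and report the discrepancies D_i = |sum_j a_ij (p_j - q_j)|."""
    q = [1 if rng.random() < pj else 0 for pj in p]
    disc = [abs(sum(a * (pj - qj) for a, pj, qj in zip(row, p, q)))
            for row in A]
    return q, disc

A = [[1.0] * 100, [j / 100 for j in range(100)]]
p = [0.5] * 100
q, disc = randomized_rounding(A, p)
assert all(qj in (0, 1) for qj in q)
# Loose deterministic bound: |p_j - q_j| <= 1/2 here, so D_i <= sum_j a_ij / 2.
assert all(d <= 50 for d in disc)
```

With high probability the discrepancies are much smaller than this loose bound, as quantified by the Chernoff-Hoeffding estimate Δ(μ_i, 1/m).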
Lemma 1. A lattice vector with discrepancies Δ_i = O(√(μ_i m^{1/k})) can be computed in O(log(m+n)) = O(log n) time using O(n^{k+1} m) processors in the EREW PRAM model.
2.2 Modeling Rounding with Leveled RFAs
Limiting the Precision. In order to derandomize the rounding procedure while getting closer to the probabilistic bound, it is useful to model it with RFAs. Specifically, the idea is to have one RFA for each of the m equations, so that in the i-th RFA, states correspond to the different partial sums Σ_{j=1}^l a_ij q_j, l = 0, ..., n and q_j ∈ {0, 1}. For this to be useful, the number of states must be polynomial. Fortunately, as observed in [25], the fractional part of the coefficients a_ij (and so the partial sums) can be truncated without a significant increment in the discrepancies. Also, it will be useful later to limit the precision of the probabilities p_j. More precisely, these parameters can be truncated to L' = ⌈log(3n/ε̂)⌉ fractional bits while increasing the discrepancy by at most ε̂. Let ã_ij and p̃_j be the corresponding truncated numbers; thus the discrepancy |Σ_j (a_ij q_j − a_ij p_j)| with respect to the original parameters can be upper bounded by
X X , a~ij qj ) + (~aij qj , ~aij p~j ) + (~aij p~j j j X X
jaij , a~ij j + ~i +
j
X , ~aij pj ) + (~aij pj j X
ja~ij , aij j ~i + ^;
jp~j , pj j +
j
, aij pj )
j
where Δ̃_i is the discrepancy achieved for the truncated parameters. Furthermore, for the integer part of the partial sums, L'' = ⌈log n⌉ bits suffice. If 1/ε̂ is polynomially bounded, then so is the number of states needed in the RFAs. We assume that ε̂ = O(1) is sufficient, and so L = L' + L'' = 2 log n + O(1) bits are sufficient to represent the different possible sums.

Leveled RFAs. Thus, the rounding procedure can be modeled with m leveled RFAs. The i-th RFA, M_i, consists of n + 1 levels of states N_{i,0}, ..., N_{i,n}, so that in N_{i,j} there is a state ⟨i, j, r⟩ for each number r with L bits. The transitions in M_i are between consecutive levels N_{i,j−1} and N_{i,j} in the natural way: ⟨i, j−1, r⟩ is connected to ⟨i, j, r⟩ under q_j = 0, and ⟨i, j−1, r⟩ is connected to ⟨i, j, r + ã_ij⟩ under q_j = 1. The only state s_i = ⟨i, 0, 0⟩ in N_{i,0} is the start state of M_i. A state ⟨i, n, r⟩ in the last level N_{i,n} is accepting if r is within a specified deviation δ_i from μ_i, that is, if |r − μ_i| ≤ δ_i. Let R_i denote the set of rejecting states in N_{i,n}. For two states s and t in some M_i and a string w, s →^w t denotes that starting at s the string w leads to t, and [s →^w t] is an indicator equal to 1 if s →^w t holds and equal to 0 otherwise. Let D be a probability distribution on Σ^l, the set of all 0/1 strings of length l. For w ∈ Σ^l, Pr_D{w} denotes the probability of w in D, and Pr_D{s⇝t} denotes the probability of s →^w t when w is chosen at random according to D. Then

  Pr_D{s⇝t} = Σ_{w ∈ Σ^l} [s →^w t] Pr_D{w}.
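The leveled structure makes the transition probabilities computable level by level. A minimal sketch for a single RFA under an independent-bits distribution (illustrative encoding, with states identified by their partial sums):

```python
def transition_probs(a_row, p):
    """For the RFA of one constraint, propagate state probabilities
    level by level: reading bit q_j = 1 (prob p_j) adds a_ij to the
    partial sum, and q_j = 0 (prob 1 - p_j) leaves it unchanged."""
    states = {0.0: 1.0}  # level 0: start state <i, 0, 0>
    for a, pj in zip(a_row, p):
        nxt = {}
        for r, pr in states.items():
            nxt[r] = nxt.get(r, 0.0) + pr * (1 - pj)
            nxt[r + a] = nxt.get(r + a, 0.0) + pr * pj
        states = nxt
    return states  # distribution over the final partial sums c_i

final = transition_probs([1.0, 1.0], [0.5, 0.5])
assert abs(final[1.0] - 0.5) < 1e-12  # Pr(c_i = 1) = 1/2
assert abs(sum(final.values()) - 1.0) < 1e-12
```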
Basic Approach. Let F_n be the fully independent distribution on Σ^n according to the specified bit probabilities p_j. Suppose that we can construct in polynomial time a distribution D_n on Σ^n with polynomial-size support such that, for each i,

  Σ_{r ∈ R_i} |Pr_{D_n}{s_i ⇝ r} − Pr_{F_n}{s_i ⇝ r}| ≤ ε.
P
Then r2R PrD fsi rg + r2R PrF fsi rg, and if we set i = (i ; 1=2m) thePright hand side of this equation is at most + 21m . Thus, summing over all P i, i r2R PrD fsi rg < m + 1=2. For = 21m , this is at most 1. That is, there is at least one event in Dn that gives a lattice vector solution almost as good as that guaranteed by the probabilistic bound under Fn. As a result, we obtain discrepancies within a multiplicative factor 1 + o(1): (i ; 1=2m) rather than (i ; 1=m). (We could get even closer to the probabilistic bound by further reducing the error in the approximation at the expense of a greater amount of work.) Thus, derandomizing the rounding procedure becomes a problem of fooling a set of leveled RFAs, which is discussed in the next section. i
n
i
n
i
n
3 Fooling Leveled RFAs in Parallel Techniques to fool a RFA are found in the work of Nisan [28] in the context of pseudorandom generators, and in the work of Karger and Koller [18] in the context of parallel derandomization. Karger and Koller's approach is stronger in that it achieves relative error in the transition probabilities, while Nisan's approach achieves absolute error. Although Nisan's approach has the advantage of a compact representation, that is not important for our purposes. So far it has gone unnoticed that these techniques are precisely what is needed to nearly achieve the probabilistic bounds for the latt. app. problem in parallel. We present these two approaches in a uni ed manner for the particular case of leveled RFAs, which results in better processor bounds than if general RFAs are considered.
3.1 General Approach The goal is to construct a distribution Dn on n that fools each RFA Mi . We emphasize that we can fool simultaneously the individual transition probabilities of all the RFAs, PrF fsi ! rig for all i, but cannot fool the joint transition probabilities PrF fs1 ! r1; : : :; sm ! rm g. Let E0 be an integer parameter which will correspond to the (approximate) size of Dn , and let W = dlog E0e. n
n
Algorithm. As in [28,18], Dn is determined by a divide and conquer approach in which the generic procedure fool(l; l0) constructs a distribution that fools the transition probabilities between level l and l0 in all the RFAs. fool(l; l0 ) works as follows: It computes, using fool(l; l00) and fool(l00; l0 ) recursively, distributions D1 and D2 , each of size00 at most E00(1 + o(1)), that fool the transitions 00between states in levels l and l = b(l + l )=2c, and between states in levels l and l; reduce(D1 D2 ) then combines D1 and D2 into a distribution D of size at
most E0(1 + o(1)) that fools the transitions between states in levels l and l0 in all the RFAs. In the bottom of the recursion we use a 0=1 distribution F1 with support of size E0 implemented by W unbiased bits, which preserves the transition probabilities exactly. fool(l; l ) 1. if l = l then return F1 2. l = b(l + l )=2c 3. D1 = fool(l; l ) 4. D2 = fool(l ; l ) 5. return reduce(D1 D2 ) 0
0
This selection also implies that the expected size of supp(D) is w q(w). We will bound this by our desired value E0(1 + o(1)) and formulate these conditions as a randomized rounding problem. This is exactly the approach of Karger and Koller [18]; but they missed the fact that the latt. app. problem itself can be modeled by RFAs and as a result the probabilistic bound can be nearly achieved. Next, we describe and analyze deterministic procedures to obtain a distribution D of size at most E0(1 + o(1)) such that for all states s; t the dierence jPrD fstg , PrD~ fstgj is small. We distinguish two cases according to whether we aim for absolute or relative error in the approximation. These cases correspond to the work in [28] and in [18] respectively.3 Our aim is a uni ed and self contained presentation adapted to our situation, emphasizing how new instances of the latt. app. problem appear naturally in solving the original instance.
3.2 Absolute Error
Let D be the distribution resulting from fool(l, l') at the k-th level of the recursive computation, with the 0-th level being the bottom of the recursion. D should fool the RFAs in the sense that, for each s = s_{i,l} = ⟨i, l, 0⟩,

  Σ_{t ∈ N_{i,l'}} |Pr_D{s→t} − Pr_{F^h}{s→t}| ≤ ε_k,
³ For most of our applications, absolute error suffices. However, it turns out that for some range of the parameters of the latt. app. problem, using the relative error option results in a lower work bound. See Sect. 5 for an application in which relative error seems to be needed.
where h = l' − l and ε_k is an upper bound on the absolute error accumulated up to the k-th recursion level. Note that if the transitions from s_{i,l} = ⟨i, l, 0⟩ are fooled, then the transitions from the other states ⟨i, l, r⟩, r ≠ 0, in N_{i,l} are automatically fooled as well (because a string w induces a transition from ⟨i, l, 0⟩ to ⟨i, l', δ⟩ iff it induces a transition from ⟨i, l, r⟩ to ⟨i, l', r + δ⟩).

Accumulation of Error. Let us assume that D, obtained from D̃ at level k, satisfies

  Σ_{t ∈ N_{i,l'}} |Pr_D̃{s→t} − Pr_D{s→t}| ≤ ε̃   (4)
for each s = s_{i,l}. Since (proof omitted here)

  Σ_{t ∈ N_{i,l'}} |Pr_D̃{s→t} − Pr_{F^h}{s→t}| ≤ 2ε_{k−1},

then ε_k ≤ 2ε_{k−1} + ε̃, and so ε_k ≤ (2^k − 1)ε̃. Let d = ⌈log n⌉ be the last level. In order to achieve final error ε_d ≤ ε, we choose ε̃ = ε/n.

Computing D from D̃. At each stage of the algorithm the partial distribution D constructed will be uniform on its support. If |supp(D̃)| is less than E0, then D = D̃ and nothing needs to be done. Otherwise, D is obtained from D̃ as follows. We have the following equations for every pair of states s = s_{i,l} and t ∈ N_{i,l'}:

  Σ_{w ∈ supp(D̃)} [s →^w t] Pr_D̃{w} = Pr_D̃{s→t},   (5)

and there is also the normalization condition:

  Σ_{w ∈ supp(D̃)} Pr_D̃{w} = 1.   (6)

Multiplying each of these equations by E0, and with q(w) = E0·Pr_D̃{w} = E0/|supp(D̃)|, we obtain the following equations:

  Σ_{w ∈ supp(D̃)} [s →^w t] q(w) = Pr_D̃{s→t}·E0   for each Mi, s = s_{i,l} and t ∈ N_{i,l'},   (7)

  Σ_{w ∈ supp(D̃)} q(w) = E0.   (8)

These equations define a latt. app. problem, whose solution is the desired probability space D: the support of D will be precisely the support of this lattice vector, and the elements in the support will be assigned probability 1/|supp(D)|.⁴ A solution to this latt. app. problem, as indicated earlier, is to retain into D each element w ∈ supp(D̃) with probability q(w).

⁴ To satisfy Eqn. (3) exactly we need to assign each element retained in D a probability Pr_D̃{w}/q(w). However, D may not satisfy the requirements of being a probability distribution under such an assignment, and so we normalize the probability of every element to 1/|supp(D)|. We show below that even under such an assignment D is a probability distribution of small size approximating D̃ well.
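The error recurrence ε_k ≤ 2ε_{k−1} + ε̃ stated above unrolls to its closed form by repeated substitution, with ε_0 = 0 at the bottom of the recursion:

```latex
\varepsilon_k \;\le\; 2\varepsilon_{k-1} + \tilde{\varepsilon}
 \;\le\; 4\varepsilon_{k-2} + 2\tilde{\varepsilon} + \tilde{\varepsilon}
 \;\le\; \cdots \;\le\;
 \bigl(2^{k-1} + \cdots + 2 + 1\bigr)\,\tilde{\varepsilon}
 \;=\; (2^k - 1)\,\tilde{\varepsilon}.
```

Each recursion level doubles the error inherited from the two halves and adds one fresh rounding error ε̃, which is exactly the geometric sum above.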
Let λ = 2^L be the number of states in a level, and N = mλ the number of pairs i, t. So the latt. app. problem in Eqns. (7)-(8) has N + 1 equations. Using the Chernoff-Hoeffding bounds, there exists a lattice vector (whose support is identified with supp(D) in the sequel) such that for all states s, t the following holds with nonzero probability:

  | Σ_{w ∈ supp(D)} [s →^w t] − Pr_D̃{s→t}·E0 | ≤ Δ(Pr_D̃{s→t}·E0, Pr_D̃{s→t}/(m + 1)),

  | Σ_{w ∈ supp(D)} 1 − E0 | ≤ Δ(E0, 1/(m + 1)).

The probability is nonzero since Σ_{i=1}^{m} Σ_{t ∈ N_{i,l'}} Pr_D̃{s→t}/(m + 1) + 1/(m + 1) = 1. Letting α = |supp(D)|/E0, this is equivalent to

  |Pr_D{s→t}·α − Pr_D̃{s→t}| ≤ Δ(Pr_D̃{s→t}·E0, Pr_D̃{s→t}/(m + 1))/E0,

  |α − 1| ≤ Δ(E0, 1/(m + 1))/E0.

So, for all s = s_{i,l} and t ∈ N_{i,l'}, the following holds with nonzero probability:

  |Pr_D̃{s→t} − Pr_D{s→t}| ≤ |Pr_D̃{s→t} − Pr_D{s→t}·α| + Pr_D{s→t}·|α − 1|
    ≤ Δ(Pr_D̃{s→t}·E0, Pr_D̃{s→t}/(m + 1))/E0 + Δ(E0, 1/(m + 1))/E0.

In order to achieve the error bound between D̃ and D expressed by Eqn. (4), it is sufficient that |Pr_D̃{s→t} − Pr_D{s→t}| ≤ ε̃/λ = ε/(nλ).

Choice of E0. If the w's are selected with probability q(w) using a k-wise independent probability distribution, then using the estimate for Δ_k(μ, x) in Eqn. (2), we obtain that |Pr_D̃{s→t} − Pr_D{s→t}| ≤ Cm^{1/k}/√E0. So we need that Cm^{1/k}/√E0 ≤ ε/(nλ). We then choose E0 so that this holds:

  E0 ≥ C²n²λ²m^{2/k}/ε².   (9)

3.3 Relative Error
In this case, D should fool the RFAs in the sense that for each s = s_{i,l} and t ∈ N_{i,l'},

  |Pr_D{s→t}/Pr_{F^h}{s→t} − 1| ≤ ε_k,

where ε_k is the relative error accumulated up to the k-th recursion level. To achieve this, the distribution D is allowed to be non-uniform on its support (a probability distribution uniform on a support of polynomial size cannot have events with very small probability). The probabilities q(w) with which elements in D̃ are retained into D are also non-uniform. As in the absolute error case, we set up a latt. app. problem, and the support of D will be precisely the support of a solution to it. Instead of assigning each element in the support of D a probability Pr_D̃{w}/q(w), as required to satisfy Eqn. (3), we normalize it by ρ = Σ_{w ∈ supp(D)} Pr_D̃{w}/q(w); that is, Pr_D{w} = Pr_D̃{w}/(q(w)ρ).

Accumulation of Error. Let us assume that D, obtained from D̃ at level k, satisfies

  |Pr_D{s→t}/Pr_D̃{s→t} − 1| ≤ ε̃   (10)

for each s = s_{i,l} and t ∈ N_{i,l'}. Since (proof omitted here)

  |Pr_D{s→t}/Pr_{F^h}{s→t} − 1| + 1 ≤ (1 + ε_{k−1})²(1 + ε̃),

then (1 + ε_k) ≤ (1 + ε_{k−1})²(1 + ε̃), and ε_d ≤ (1 + ε̃)^n − 1 ≤ 2nε̃ for ε̃ sufficiently small. Accordingly, we choose ε̃ = ε/2n to achieve total relative error ε.

Choice of q(w). If |supp(D̃)| is less than E0, then D = D̃ and nothing needs to be done. Otherwise, we proceed as follows. Let β = E0/(N + 1). We rewrite Eqns. (5) and (6) as

  Σ_{w ∈ supp(D̃)} ( β[s →^w t]Pr_D̃{w} / (q(w)·Pr_D̃{s→t}) ) · q(w) = β,   (11)

  Σ_{w ∈ supp(D̃)} ( βPr_D̃{w}/q(w) ) · q(w) = β.   (12)

The probabilities q(w) are chosen as small as possible (to reduce the size of the support), while each coefficient in this system of equations is at most 1, so that these equations constitute a latt. app. problem. Therefore, as in [18], we choose:

  q(w) = max( max_{s,t} β[s →^w t]Pr_D̃{w}/Pr_D̃{s→t}, βPr_D̃{w} ).
Replacing maximum by summation, we find that Σ_w q(w) is upper bounded by

  Σ_w ( Σ_{s,t} β[s →^w t]Pr_D̃{w}/Pr_D̃{s→t} + βPr_D̃{w} )
    = β Σ_{s,t} Σ_w [s →^w t]Pr_D̃{w}/Pr_D̃{s→t} + β = β(N + 1).

That is, the expected size of supp(D) is at most β(N + 1) = E0, as desired.
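This choice of q(w) and the resulting support-size bound can be checked numerically. The instance below is a hypothetical toy of ours (three words, one source state, two targets); the code verifies that with q(w) as defined, every coefficient in Eqns. (11)-(12) is at most 1 and the expected support size Σ_w q(w) is at most β(N + 1):

```python
# Toy check of the relative-error rounding setup (all numbers are ours).
beta = 0.5
pr = {"a": 0.25, "b": 0.25, "c": 0.5}      # Pr_D~{w}
# trans[(s,t)][w] = 1 iff word w drives s to t (hypothetical transitions)
trans = {("s", "t1"): {"a": 1, "b": 0, "c": 1},
         ("s", "t2"): {"a": 0, "b": 1, "c": 0}}
# Pr_D~{s->t} = sum of Pr{w} over words inducing the transition
pr_st = {st: sum(col[w] * pr[w] for w in pr) for st, col in trans.items()}

# q(w) = max( max_{s,t} beta*[s-w->t]*Pr{w}/Pr{s->t},  beta*Pr{w} )
q = {w: max(max(beta * trans[st][w] * pr[w] / pr_st[st] for st in trans),
            beta * pr[w])
     for w in pr}
N = len(trans)

coeffs_ok = all(beta * trans[st][w] * pr[w] / (q[w] * pr_st[st]) <= 1 + 1e-12
                for st in trans for w in pr)
print(coeffs_ok, sum(q.values()) <= beta * (N + 1) + 1e-12)
```

Taking the maximum instead of the sum is what keeps each q(w) small; the summation bound above then caps the expected support at β(N + 1) = E0.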
Computing D from D̃. The latt. app. problem in Eqns. (11)-(12) is solved and the support of D is defined to be the support of the lattice vector so obtained. Using the Chernoff-Hoeffding bounds, there exists a lattice vector (whose support is identified with supp(D) in the sequel) such that for each Mi, s = s_{i,l} and t ∈ N_{i,l'}, the following holds with nonzero probability:

  | Σ_{w ∈ supp(D)} β[s →^w t]Pr_D̃{w}/(q(w)·Pr_D̃{s→t}) − β | ≤ Δ(β, 1/(N + 1)),

  | Σ_{w ∈ supp(D)} βPr_D̃{w}/q(w) − β | ≤ Δ(β, 1/(N + 1)).

After dividing by β this becomes, with γ = ρ·Pr_D{s→t}/Pr_D̃{s→t},

  |γ − 1| ≤ Δ(β, 1/(N + 1))/β   and   |ρ − 1| ≤ Δ(β, 1/(N + 1))/β.

Thus, using ρ ≥ 2/3, with nonzero probability

  |Pr_D{s→t}/Pr_D̃{s→t} − 1| = |γ/ρ − 1| ≤ (|γ − 1| + |1 − ρ|)/ρ ≤ 3Δ(β, 1/(N + 1))/β.

Choice of β and E0. We need 3Δ(β, 1/(N + 1))/β ≤ ε̃. Solving the latt. app. problem using a k-wise independent distribution, and using Eqn. (2), we obtain a condition for E0 (and thus for β, since E0 = β(N + 1)):

  E0 ≥ C²n²(mλ²)^{1+2/k}/ε².   (13)
3.4 Work and Time Bounds

A variation of the algorithm in Lemma 1 is used for reduce. The recurrence for the number of processors used by fool(l, l') is W(h) ≤ 2W(h/2) + Cf(E0²)E0m, where f(x) is the size of a k-wise independent probability space for x variables. Then the total number of processors is O(f(E0²)E0mn). This is minimized when k = 2. In the case of absolute error, one can obtain a better processor bound than that in Lemma 1 because a uniform 2-wise independent probability space of linear size can be constructed using hash functions as in [28]. Using Eqns. (9) and (13), we finally obtain the following (details omitted).

Theorem 2. A leveled RFA can be fooled with absolute error ε in O(log² n) time using O(n^7 λ^6 m^4 / ε^6) processors, and with relative error ε in O(log² n) time using O(n^{11} λ^{10} m^6 / ε^{10}) processors.

For the latt. app. problem, it is sufficient to use either absolute error with ε = 1/2m, or relative error with ε = 1. Thus, we obtain the following.

Theorem 3. The latt. app. problem can be solved deterministically in the EREW PRAM model, resulting in discrepancies within a multiplicative factor 1 + o(1) of the probabilistic bound, using O(log² n) time and O(n^7 λ^6 m^6 min(m^4, n^4 λ^4)) processors.
4 Discrepancy

4.1 Problem
The particular case of the latt. app. problem in which each aij is 0 or 1 and each pj is 1/2 corresponds to the well-known discrepancy problem. It is usually stated as follows. We are given a set system (X, S) where X is a ground set and S is a collection of subsets of X, with n = |X| and m = |S|, and we are to compute a subset R of X such that for each S ∈ S the discrepancy ||R ∩ S| − |R̄ ∩ S|| is small, where R̄ = X \ R. Let R be a sample from X with each x ∈ X selected into R independently with probability 1/2. Then the Chernoff-Hoeffding bound for full independence, Eqn. (1), guarantees that with nonzero probability for each S ∈ S:

  ||R ∩ S| − |R̄ ∩ S|| ≤ Δ(|S|/2, 1/m) = O(√(|S| log m)).
Generalizations and variations of the discrepancy problem have been extensively studied in combinatorics and combinatorial geometry (where S is determined from X ⊆ IR^d by specific geometric objects) [5,1]. Computationally, it has also been the object of extensive research [25,27,7]. Because of its importance, we consider in detail the work and time requirements for its solution in NC. Also, it is shown in [25] that an algorithm for the discrepancy problem can be used to solve the more general latt. app. problem. As a result, if we are willing to lose a log n factor in the running time, and a constant factor in the value of the discrepancy achieved, then this represents a substantial saving in the amount of work performed (though still much higher than the work performed sequentially).
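The sequential route to the probabilistic bound mentioned above is the method of conditional probabilities. A compact, self-contained sketch (ours, not the paper's NC algorithm) uses the classic hyperbolic-cosine pessimistic estimator: elements are assigned signs one by one, each time picking the sign that does not increase the potential Σ_S cosh(λ·d_S), which deterministically guarantees max_S |d_S| ≤ √(2n ln(2m)):

```python
import math

def low_discrepancy_coloring(n, sets):
    """Signs sigma in {-1,+1}^n with max_S |sum_{x in S} sigma_x|
    <= sqrt(2 n ln(2m)); taking R = {x : sigma_x = +1} gives
    ||R cap S| - |R-bar cap S|| = |sum_{x in S} sigma_x|."""
    m = len(sets)
    lam = math.sqrt(2.0 * math.log(2 * m) / n)
    d = [0.0] * m          # running signed discrepancy of each set
    sigma = []
    for x in range(n):
        def phi(s):        # potential if x gets sign s
            return sum(math.cosh(lam * (d[j] + s))
                       for j, S in enumerate(sets) if x in S)
        s = 1 if phi(1) <= phi(-1) else -1   # greedy sign choice
        sigma.append(s)
        for j, S in enumerate(sets):
            if x in S:
                d[j] += s
    return sigma

n = 16
sets = [set(range(i, i + 8)) for i in range(9)] + [set(range(0, 16, 2))]
sigma = low_discrepancy_coloring(n, sets)
disc = max(abs(sum(sigma[x] for x in S)) for S in sets)
print(disc <= math.sqrt(2 * n * math.log(2 * len(sets))))
```

Each greedy choice keeps the potential from growing beyond its expectation, so the final potential certifies the discrepancy bound; the paper's contribution is obtaining comparable guarantees in NC rather than with this inherently sequential scan.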
4.2 Algorithm

The algorithm is just the specialization of the latt. app. algorithm of Sect. 2. The RFAs effectively work as counters that for each S ∈ S store the number of elements of S that have been selected into R. Thus λ = n + 1. The threshold θ_S that determines the rejecting states of M_S is set to Δ(|S|/2, 1/2m) = O(√(|S| log m)), so that even after an absolute error less than 1/2m per RFA, or a relative error less than 1, there is still a good set with nonzero probability. This choice of θ_S results in a discrepancy that is larger than the probabilistic bound (which is achievable sequentially) by only a factor 1 + o(1). Plugging the corresponding parameters into Thm. 3, we obtain the following.

Theorem 4. The discrepancy problem can be solved deterministically in the EREW PRAM model in O(log n log(n + m)) = O(log² n) time using O(n^{13} m^6 min(m^4, n^8)) processors.
4.3 Lattice Approximation Via Discrepancy

The algorithm for the latt. app. problem in [25] is obtained by a reduction to the discrepancy problem. The resulting latt. app. algorithm achieves discrepancies a constant factor larger, while it has essentially the same work bound as the discrepancy algorithm and a running time larger by a factor log n. The reduction uses as an intermediate step, for the purpose of analysis, the vector balancing problem. This problem is a latt. app. problem in which each pj = 1/2. Our improvement also translates to this algorithm (analysis omitted). As a result, we obtain the following.

Theorem 5. The latt. app. problem can be solved deterministically, resulting in discrepancies within a multiplicative factor O(1) from the probabilistic bound, for Δ_i ≥ log m, in the EREW PRAM model in O(L log n) = O(log² n) time using O(n^{13} m^6 min(m^4, n^8)) processors.
5 Sampling in Computational Geometry

Randomized algorithms have been very successful in computational geometry [12,26] and, as a result, there has been interest in their derandomization. For this, two concepts capturing the characteristics of a sample have been developed: approximations of range spaces and sampling in configuration spaces. In both cases, our approach improves on previous NC constructions.
5.1 Approximations of Range Spaces

A range space is a set system (X, R) consisting of a ground set X, n = |X|, and a set R of subsets of X called ranges. A subset A ⊆ X is called an ε-approximation for (X, R) if for each R ∈ R, ||A ∩ R|/|A| − |R|/|X|| ≤ ε. For Y ⊆ X, the restriction R|Y is the set {Y ∩ R : R ∈ R}. (X, R) is said to have bounded VC-exponent if there is a constant d such that for any Y ⊆ X, |R|Y| = O(|Y|^d). For (X, R) with bounded VC-exponent, a random sample of size O(r² log r), where the multiplicative constant depends on d, is a (1/r)-approximation with nonzero probability [32,1]. Sequentially, the method of conditional probabilities leads to a polynomial time algorithm for constructing these approximations with optimal size (matching the probabilistic bound). With a constant factor loss in the size, they can be constructed in O(nr^C) time, for some constant C that depends on (X, R) [22,10]. Furthermore, for some range spaces that here we just call linearizable, and for r ≤ n^δ, some δ > 0 depending on the range space, the construction can be performed in O(n log r) time [23]. In parallel (NC), however, only size O(r^{2+ε}) has been achieved, using k-wise independent probability spaces [13-15]. There is a close relation to the discrepancy problem: in fact, when the random sample R is of size |X|/2, the low discrepancy and approximation properties are (almost) equivalent. From the definition, it is clear that the same approach used for the discrepancy problem can be used to compute an approximation of optimal size in parallel. Taking advantage of the good behavior of approximations under partitioning and iteration [22], the running times of the algorithms can be improved as follows, with only a constant factor loss in the size (details omitted here). The results for the CRCW PRAM model in [14,15] can be similarly improved.

Theorem 6. A (1/r)-approximation of size O(r² log r) of a range space (X, R), |X| = n, can be computed deterministically in the EREW PRAM model in O(log n + log r) time using O(nr^C) work, for some C > 0. If (X, R) is linearizable, then for r ≤ n^δ, for some 0 < δ < 1, the construction can be performed in O(log n log r) time using O(n log r) work.
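The (1/r)-approximation definition is easy to check mechanically. The toy below (ours) takes X = {0, ..., n−1} with the ranges being all contiguous intervals, and a hand-picked candidate A; it verifies that the relative frequency of every range in A is within 1/r of its frequency in X:

```python
# Toy check of the (1/r)-approximation definition for interval ranges.
n, r = 12, 3
A = {1, 5, 9}  # candidate approximation, chosen by hand (evenly spread)
ranges = [set(range(i, j + 1)) for i in range(n) for j in range(i, n)]

# max over ranges of | |A∩R|/|A| - |R|/|X| |
err = max(abs(len(A & R) / len(A) - len(R) / n) for R in ranges)
print(err <= 1 / r)
```

For this hypothetical range space the worst range (e.g. the gap {2, 3, 4}, which misses A entirely) has error 3/12 = 0.25 ≤ 1/3, so A is indeed a (1/3)-approximation; for bounded VC-exponent spaces the theorem above produces such sets of size O(r² log r) deterministically in NC.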
5.2 Sampling in Geometric Configuration Spaces

Configuration spaces [12,9,26,24] provide a general framework for geometric sampling. A configuration space is a 4-tuple (X, T, trig, kill) where: X is a finite set of objects, n = |X|; T is a mapping that assigns to each S ⊆ X a set T(S), called the regions determined by S; let R(X) = ∪_{S⊆X} T(S); trig is a mapping R(X) → 2^X indicating for each σ ∈ R(X) the set of objects in X that trigger σ; kill is a mapping R(X) → 2^X indicating for each σ ∈ R(X) the set of objects in X that kill σ. We are interested in configuration spaces that satisfy the following axioms: (i) d = max{|trig(σ)| : σ ∈ R(X)} is a constant, called the dimension of the configuration space; furthermore, for S ⊆ X with |S| ≤ d, the number of regions determined by S is at most a constant number E. (ii) For all S ⊆ X and σ ∈ R(X), σ ∈ T(S) iff trig(σ) ⊆ S and S ∩ kill(σ) = ∅. The following sampling theorem is the basis for many geometric algorithms [12].

Theorem 7. Let (X, T, trig, kill) be a configuration space, with n = |X|, satisfying axioms (i) and (ii), and for an integer 1 ≤ r ≤ n let R be a sample from X with each element of X taken into R independently with probability p = r/n. Then:

  E[ Σ_{σ ∈ T(R)} exp( (r/2n)·|kill(σ)| ) ] ≤ 2^{d+1} f(r/2),

where f(r) is an upper bound for E[|T(R)|]. It follows that with nonzero probability: (1) for all σ ∈ T(R): |kill(σ)| ≤ C(n/r) log r, and (2) for all integers j ≥ 0: Σ_{σ ∈ T(R)} |kill(σ)|^j ≤ C(n/r)^j f(r/2).

Sequentially, a sample as guaranteed by the sampling theorem can be computed in polynomial time (using the method of conditional probabilities). Through the use of a (1/r)-approximation, the time can be reduced to O(nr^C), and for linearizable configuration spaces, for r ≤ n^δ, to O(n log r). In parallel (NC), k-wise independence can only guarantee part (2) of the theorem for j = O(k) (but not part (1)) [3,4]. Modeling the sampling with leveled RFAs, and fooling them with relative error, we can construct in parallel a sample as guaranteed by the sampling theorem, except for a constant multiplicative factor. Relative error is needed because of the exponential weighting that makes even small probability events relevant. We obtain the following (details omitted here).

Theorem 8. A sample as guaranteed by the sampling theorem can be computed deterministically in the EREW PRAM model in O(log n + log r) time using O(nr^C) work; and in the case of a linearizable configuration space and r ≤ n^δ, in O(log n log r) time using O(n log r) work.
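Axioms (i) and (ii) are concrete enough to instantiate. The toy configuration space below (ours) takes X to be points on a line; a region is the open interval determined by a pair of points, trig(σ) is the pair (so d = 2), and kill(σ) is the set of points strictly inside. By axiom (ii), T(R) then consists exactly of the intervals between consecutive points of the sample R:

```python
# Toy configuration space: points on a line, regions = open intervals.
X = [3, 1, 4, 1.5, 9, 2.6, 5, 8]

def kill(a, b):
    # objects of X that destroy the interval (a, b): points strictly inside
    return {x for x in X if min(a, b) < x < max(a, b)}

def T(S):
    # axiom (ii): region (a,b) in T(S) iff trig = {a,b} subset of S
    # and no element of S kills it
    pts = sorted(S)
    return [(a, b) for i, a in enumerate(pts) for b in pts[i + 1:]
            if not (kill(a, b) & set(S))]

R = {1, 4, 9}
regions = T(R)
print(sorted(regions))  # the consecutive pairs of R: [(1, 4), (4, 9)]
```

In this instance |T(S)| ≤ |S| − 1, and |kill(σ)| for σ ∈ T(R) measures how many unsampled points land in a gap of R, which is exactly the quantity that parts (1) and (2) of Theorem 7 control.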
6 Other Applications
6.1 Approximation of Integer Linear Programs

An NC algorithm for approximating positive linear programs was proposed in [21]. To solve positive integer linear programs approximately in NC, [2] proposes mimicking the philosophy of [29]: first solve the program without integrality constraints, approximately, using [21], and then use the NC latt. app. algorithm of [25] as a rounding black box to obtain an integral solution. However, the second step introduces an additional error, since [25] only guarantees O(√(s^{1+ε} log m)) discrepancy sets. [2] attempts to correct, in some cases, the error introduced as a result of using the latt. app. algorithm of [25]. Our algorithm essentially reduces the error introduced by latt. app. to the minimum possible.
6.2 Edge Coloring of Graphs

Let G = (V, E) be an undirected graph whose maximal degree is Δ. A legal edge coloring is an assignment of colors to the edges such that two edges incident to the same vertex cannot have the same color. Vizing's theorem states that G can be edge colored with Δ + 1 colors, and it implies a polynomial time sequential algorithm to find such a coloring. The best deterministic parallel algorithm is the derandomization in [25] of an algorithm in [20]. It uses a discrepancy algorithm and produces a coloring with Δ + O(Δ^{(1+ε)/2}) colors. For Δ = Ω(log n), substituting there our discrepancy algorithm produces a coloring with Δ + O(√(Δ log n)) colors.
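To make the notion of a legal edge coloring concrete, here is a minimal greedy sketch of ours. It is far weaker than the algorithms discussed above (it may use up to 2Δ − 1 colors rather than Δ + 1, and it is sequential), but it illustrates the legality constraint directly: each edge takes the smallest color unused at both of its endpoints.

```python
# Greedy edge coloring (ours): uses at most 2*Delta - 1 colors, since an
# edge (u, v) sees fewer than 2*Delta other incident edges.
def greedy_edge_coloring(edges):
    used = {}       # vertex -> set of colors already on incident edges
    coloring = {}
    for u, v in edges:
        c = 0
        while c in used.get(u, set()) or c in used.get(v, set()):
            c += 1  # smallest color free at both endpoints
        coloring[(u, v)] = c
        used.setdefault(u, set()).add(c)
        used.setdefault(v, set()).add(c)
    return coloring

edges = [(0, 1), (1, 2), (2, 0), (0, 3)]   # a triangle plus a pendant edge
col = greedy_edge_coloring(edges)
# legality: edges sharing a vertex must get distinct colors
legal = all(col[e] != col[f] for e in col for f in col
            if e != f and set(e) & set(f))
print(legal)
```

Here Δ = 3 (vertex 0), so the greedy bound allows up to 5 colors, while Vizing guarantees that 4 always suffice; closing that gap efficiently, and in parallel, is exactly what the discrepancy-based algorithms above are about.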
References

1. N. Alon and J. Spencer. The Probabilistic Method. Wiley-Interscience, 1992.
2. N. Alon and A. Srinivasan. Improved parallel approximation of a class of integer programming problems. Algorithmica 17 (1997) 449-462.
3. N. M. Amato, M. T. Goodrich, and E. A. Ramos. Parallel algorithms for higher-dimensional convex hulls. In Proc. 35th Annu. IEEE Sympos. Found. Comput. Sci., 1994, 683-694.
4. N. M. Amato, M. T. Goodrich, and E. A. Ramos. Computing faces in segment and simplex arrangements. In Proc. 27th Annu. ACM Sympos. Theory Comput., 1995, 672-682.
5. J. Beck and W. Chen. Irregularities of Distribution. Cambridge University Press, 1987.
6. M. Bellare and J. Rompel. Randomness-efficient oblivious sampling. In Proc. 35th Annu. IEEE Sympos. Found. Comput. Sci., 1994, 276-287.
7. B. Berger and J. Rompel. Simulating (log^c n)-wise independence in NC. Journal of the ACM 38 (1991) 1026-1046.
8. S. Chari, P. Rohatgi and A. Srinivasan. Improved algorithms via approximations of probability distributions. In Proc. ACM Sympos. Theory Comput., 1994, 584-592.
9. B. Chazelle and J. Friedman. A deterministic view of random sampling and its use in geometry. Combinatorica 10 (1990) 229-249.
10. B. Chazelle and J. Matousek. On linear-time deterministic algorithms for optimization problems in fixed dimension. In Proc. 4th ACM-SIAM Sympos. Discrete Algorithms, 1993, 281-290.
11. H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics 23 (1952) 493-509.
12. K. L. Clarkson and P. W. Shor. Applications of random sampling in computational geometry, II. Discrete Comput. Geom. 4 (1989) 387-421.
13. M. T. Goodrich. Geometric partitioning made easier, even in parallel. In Proc. 9th Annu. ACM Sympos. Comput. Geom., 1993, 73-82.
14. M. T. Goodrich. Fixed-dimensional parallel linear programming via relative epsilon-approximations. In Proc. 7th ACM-SIAM Sympos. Discrete Algorithms (SODA), 1996, 132-141.
15. M. T. Goodrich and E. A. Ramos. Bounded independence derandomization of geometric partitioning with applications to parallel fixed-dimensional linear programming. To appear in Discrete Comput. Geom.
16. W. Hoeffding. Probability inequalities for sums of bounded random variables. American Statist. Assoc. J. 58 (1963) 13-30.
17. A. Joffe. On a set of almost deterministic k-independent random variables. Annals of Probability 2 (1974) 161-162.
18. D. R. Karger and D. Koller. (De)randomized constructions of small sample spaces in NC. In Proc. 35th Annu. IEEE Sympos. Found. Comput. Sci., 1994, 252-263.
19. H. J. Karloff and Y. Mansour. On construction of k-wise independent random variables. In Proc. 26th Annu. ACM Sympos. Theory Comput., 1994, 564-573.
20. H. J. Karloff and D. B. Shmoys. Efficient parallel algorithms for edge coloring problems. J. Algorithms 8 (1987) 39-52.
21. M. Luby and N. Nisan. A parallel approximation algorithm for positive linear programming. In Proc. 25th Annu. ACM Sympos. Theory Comput., 1993, 448-457.
22. J. Matousek. Approximations and optimal geometric divide-and-conquer. In Proc. 23rd Annu. ACM Sympos. Theory Comput., 1991, 505-511. Also in J. Comput. Syst. Sci. 50 (1995) 203-208.
23. J. Matousek. Efficient partition trees. Discrete Comput. Geom. 8 (1992) 315-334.
24. J. Matousek. Derandomization in computational geometry. Available in the web site: http://www.ms.m.cuni.cz/acad/kam/matousek/. Earlier version appeared in J. Algorithms.
25. R. Motwani, J. Naor and M. Naor. The probabilistic method yields deterministic parallel algorithms. J. Comput. Syst. Sci. 49 (1994) 478-516.
26. K. Mulmuley. Computational Geometry: An Introduction Through Randomized Algorithms. Prentice Hall, Englewood Cliffs, NJ, 1993.
27. J. Naor and M. Naor. Small-bias probability spaces: efficient constructions and applications. SIAM J. Comput. 22 (1993) 838-856.
28. N. Nisan. Pseudorandom generators for space-bounded computation. Combinatorica 12 (1992) 449-461.
29. P. Raghavan. Probabilistic construction of deterministic algorithms: Approximating packing integer programs. J. Comput. Syst. Sci. 37 (1988) 130-143.
30. P. Raghavan and C. D. Thompson. Randomized rounding: A technique for provably good algorithms and algorithmic proofs. Combinatorica 7 (1987) 365-374.
31. J. P. Schmidt, A. Siegel and A. Srinivasan. Chernoff-Hoeffding bounds for applications with limited independence. SIAM J. Discrete Math. 8 (1995) 223-250.
32. V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16 (1971) 264-280.
Computing Reachability Properties Hidden in Finite Net Unfoldings

Burkhard Graves

Universität Hildesheim, Institut für Informatik, Marienburger Platz 22, D-31141 Hildesheim, Germany
Fax: +49 5121 860475, email: graves@informatik.uni-hildesheim.de
(August 1997)

Abstract. It is commonly known that every reachable marking of a finite-state Petri net system is represented in its finite unfolding according to McMillan. The reachability of markings from each other is also represented in the finite unfolding, but it is almost unknown that this information can be hidden very deep. This paper presents an efficient method for gaining this information, which is of course of great importance for potential model checkers working on finite unfoldings. All results presented in this paper also hold for a recently proposed optimized unfolding method.
1 Introduction and Motivation
A major drawback of interleaving semantics, and of model checkers based upon them, is the so-called state explosion problem. One among other approaches to cope with this problem is the use of partial order semantics instead of interleaving semantics [8]. Like many other works done in this area, this paper uses finite 1-safe Petri net systems to represent finite-state concurrent systems. Partial order semantics describes the behaviour of a net system by the set of its processes [3] or by its maximal branching process [5], also called the maximal unfolding of the system, which can be seen as the union of all processes of the given system. However, if a system can exhibit an infinite behaviour, then at least one of its processes, and consequently its maximal unfolding, is infinite and therefore unsuitable for verification purposes. McMillan proposed in [8] an elegant algorithm for the computation of a finite initial part of the maximal branching process, called the finite unfolding, in which every reachable marking of the system is represented. This work was refined and optimized by Esparza, Römer and Vogler in [7]; the finite unfolding calculated by their method is never bigger and often much smaller (in terms of orders of magnitude) than McMillan's finite unfolding, while still representing every reachable marking. However, in this paper we neglect the difference between these two unfolding methods. All results hold for both unfolding methods, and the systems serving as examples have been chosen in such a way that both unfolding methods yield the same finite unfolding (up to isomorphism). As already mentioned above, every reachable marking of a given system is represented in its finite unfolding. However,
Fig. 1. A finite 1-safe net system.

Fig. 2. Its finite unfolding.
the reachability of markings from each other is deeply embedded in the finite unfolding. Consider for example the system and its finite unfolding displayed in Fig. 1 and Fig. 2. The reachable marking {p2, p3} is represented by a process contained in the finite unfolding corresponding to the configuration {e1, e3, e5} (describing the occurrence sequence t3t2t3). Obviously, the deadlock marking {p1, p5} is reachable from {p2, p3} (by the occurrence sequence t1t2t1t4, for example), but how can this information be gained from the finite unfolding? Now, imagine all processes describing a run of the system to a given marking. In general, only a few of these processes are totally contained in the finite unfolding, but it is always the case that some initial part of every such process is contained in the finite unfolding. In our example, there are infinitely many processes describing a run of the system into the deadlock marking {p1, p5}, but only two of them are totally contained in the finite unfolding. One corresponds to the configuration {e2}, the other one corresponds to the configuration {e1, e3, e4, e5}. Each of the remaining processes is prefixed by a process corresponding to the configuration {e1, e3, e4, e5}. Since {e1, e3, e5} is a subset of {e1, e3, e4, e5}, it can be concluded that {p1, p5} is reachable from {p2, p3}. This example demonstrates that a specific classification of the configurations contained in the finite unfolding is of great importance for potential model checkers. Three types of configurations can be distinguished wrt. a given marking: configurations of the first type correspond to processes describing runs to the given marking, configurations of the second type correspond to processes which cannot be extended to processes describing runs to the given marking, and configurations of the third type are configurations which are neither of type one nor of type two. Since this classification is a disjoint partitioning of all configurations, the knowledge of two classes yields the third one. The configurations of type one can be easily calculated; two different algorithms for this task can be found in [6]. However, the classification of the remaining configurations is a problem. Esparza tries to solve this problem in [6] by introducing a 'shift' operator working on configurations. Configurations of the third type should be calculated by repeated applications of this operator on configurations of the first type. Unfortunately, this does not work in some cases, as we will see in Sect. 4. One could say that the finite unfolding as defined by
McMillan or by Esparza/Römer/Vogler is 'too small', because the problem can be fixed by creating a sufficiently large finite unfolding which does not contain the special cases mentioned above. But this 'brute force' method would significantly slow down potential model checking algorithms, e.g. the one proposed in [6]. This paper presents another solution, namely a modification of the 'shift' operator, such that it works as the old operator was supposed to on the McMillan unfolding as well as on the Esparza/Römer/Vogler unfolding.
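The firing-rule semantics underlying all of the above (formalized in the next section) can be sketched in a few lines of code. The two-place net below is a hypothetical toy of ours, not the system of Fig. 1; markings of a 1-safe system are represented as frozensets of places, and the set-based firing rule used here assumes the system stays 1-safe:

```python
# Minimal 1-safe net-system sketch (hypothetical net, not Fig. 1).
from collections import deque

pre  = {"t1": {"p1"}, "t2": {"p2"}}   # preset of each transition
post = {"t1": {"p2"}, "t2": {"p1"}}   # postset of each transition
M0 = frozenset({"p1"})                # initial marking

def enabled(M):
    # t is enabled at M iff every place of its preset is marked
    return [t for t in pre if pre[t] <= M]

def fire(M, t):
    # 1-safe version of M'(s) = M(s) - F(s,t) + F(t,s)
    return frozenset((M - pre[t]) | post[t])

def reachable(M0):
    seen, todo = {M0}, deque([M0])
    while todo:
        M = todo.popleft()
        for t in enabled(M):
            M2 = fire(M, t)
            if M2 not in seen:
                seen.add(M2)
                todo.append(M2)
    return seen

print(sorted(sorted(M) for M in reachable(M0)))
```

The breadth-first exploration computes [M0⟩ explicitly; the state explosion problem is precisely that this set can be exponential in the net size, which is what unfolding-based methods try to avoid.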
2 Basic Definitions
Following is a series of definitions, notions and theorems (without proofs) in a very brief form. More details can be found in the referenced literature.

Petri Nets. A triple N = (S, T, F) is a net if S ∩ T = ∅ and F ⊆ (S × T) ∪ (T × S). The elements of S are called places, the elements of T transitions. Places and transitions are generically called nodes. N is finite if |S ∪ T| ∈ ℕ. We identify the flow relation F with its characteristic function on the set (S × T) ∪ (T × S). The preset of a node x, denoted by •x, is the set {y ∈ S ∪ T | F(y, x) = 1}. The postset of x, denoted by x•, is the set {y ∈ S ∪ T | F(x, y) = 1}. Presets and postsets are generalized to sets of nodes X ⊆ S ∪ T in the following way: •X = ∪_{x∈X} •x, X• = ∪_{x∈X} x• (notice •∅ = ∅• = ∅). A marking M of a net (S, T, F) is a mapping S → ℕ. A 4-tuple Σ = (S, T, F, M⁰) is a net system if (S, T, F) is a net and M⁰ is a marking of (S, T, F); M⁰ is called the initial marking of Σ. Σ is finite if the underlying net is finite. A marking M enables a transition t if ∀s ∈ S: M(s) ≥ F(s, t). A marking enabling no transition is a deadlock marking. If a transition t is enabled at M, then it can occur, and its occurrence leads to a new marking M', denoted by M t⟩ M', such that ∀s ∈ S: M'(s) = M(s) − F(s, t) + F(t, s). A sequence of transitions σ = t1...tn (n ∈ ℕ) is an occurrence sequence if there exist markings M0, ..., Mn such that M0 t1⟩ M1 t2⟩ ... tn⟩ Mn. Mn is the marking reached from M0 by the occurrence of σ, denoted by M0 σ⟩ Mn. M' is reachable from M if there exists an occurrence sequence σ such that M σ⟩ M'. The set of all markings which can be reached from M is denoted by [M⟩. A marking M of a net (S, T, F) is 1-safe if ∀s ∈ S: M(s) ≤ 1. We identify 1-safe markings with the set of places s such that M(s) = 1. A system is 1-safe if all its reachable markings are 1-safe. Figure 1 shows a finite 1-safe system; its initial marking is {p1, p4}.

Branching Processes.
A branching process of a system is a special kind of net, called an occurrence net, together with a certain homomorphism showing that this net can be interpreted as an unfolding of the system, containing information about both concurrency and conflicts. In order to avoid confusion arising from the fact that the semantics of a (marked) net is again a (labelled) net, different names are used for the nodes of the net system and for those of the occurrence net which describes the system's semantics: the places of occurrence nets are called conditions, and their transitions are called events. We quickly review the
main definitions and results of [5], where the notion 'branching process' was first introduced. Let (S, T, F) be a net. The transitive closure of F, denoted by ≺, is called the causal relation. The symbol ⪯ denotes the reflexive and transitive closure of F. Min(N) equals {x ∈ S ∪ T | ¬∃y ∈ S ∪ T: y ≺ x}. For x ∈ S ∪ T and X ⊆ S ∪ T, we say x ≺ X if ∃y ∈ X: x ≺ y (analogously for ⪯, ≻ and ⪰). Two nodes x1, x2 ∈ S ∪ T are in conflict, denoted by x1 # x2, if ∃t1, t2 ∈ T with t1 ≠ t2 and •t1 ∩ •t2 ≠ ∅ such that t1 ⪯ x1 and t2 ⪯ x2. A node x ∈ S ∪ T is in self-conflict if x # x. We say x1 co x2 if neither x1 ≺ x2 nor x2 ≺ x1 nor x1 # x2 holds. An occurrence net is a net N = (B, E, F) such that

(i) ∀b ∈ B: |•b| ≤ 1
(ii) ¬∃x ∈ B ∪ E: x ≺ x
(iii) ¬∃e ∈ E: e # e
(iv) ∀x ∈ B ∪ E: |{y ∈ B ∪ E | y ≺ x}| ∈ ℕ.

If moreover |b•| ≤ 1 holds for every b ∈ B, then N is called a causal net. Let N1 = (S1, T1, F1) and N2 = (S2, T2, F2) be two nets. A homomorphism from N1 to N2 is a mapping h: S1 ∪ T1 → S2 ∪ T2 with h(S1) ⊆ S2 and h(T1) ⊆ T2 such that for every t ∈ T1 the restriction of h to •t is a bijection between •t and •h(t), and analogously for t• and h(t)•. A branching process of a net system Σ = (N, M⁰) is a pair β = (N', p) where N' = (B, E, F) is an occurrence net and p is a homomorphism from N' to N such that the restriction of p to Min(N') is a bijection between Min(N') and M⁰, and ∀e1, e2 ∈ E: (•e1 = •e2 ∧ p(e1) = p(e2)) ⇒ e1 = e2. If N' is a causal net, then β is a process of Σ. Let β1 = (N1, p1) and β2 = (N2, p2) be two branching processes of a net system. A homomorphism from β1 to β2 is a homomorphism h from N1 to N2 such that p2 ∘ h = p1 and the restriction of h to Min(N1) is a bijection between Min(N1) and Min(N2). β1 and β2 are isomorphic if there is a bijective homomorphism from β1 to β2. Intuitively, two isomorphic branching processes differ only in the names of their conditions and events. It is shown in [5] that a net system has a unique maximal branching process up to isomorphism. We call it the maximal unfolding of the system and denote it by βm = (Bm, Em, Fm, pm). β1 is a prefix of β2 if N1 is a subnet of N2 and, moreover, there exists an injective homomorphism from β1 to β2. Figure 2 shows a prefix of the maximal unfolding βm of the finite 1-safe system displayed in Fig. 1. It should be clear that a repeated continuation with four events and five conditions labelled and 'arranged' like e5, ..., e8 and b5, ..., b9, respectively, yields the maximal unfolding.¹

Configurations, Cuts and More. A configuration of an occurrence net N = (B, E, F) is a causally closed, conflict-free set of events C ⊆ E, which means ∀e, e' ∈ E: (e ≺ e' ∧ e' ∈ C) ⇒ e ∈ C, and ∀e, e' ∈ C: ¬(e # e').
Given e ∈ E, the set [e] = {e′ ∈ E | e′ ≤ e} is a configuration, called the local configuration of e. A set of conditions B′ ⊆ B is a co-set if its elements are pairwise in the co relation. A maximal co-set wrt. set inclusion is a cut. A marking M of a

¹ For example: p(e7) = t2, •e7 = {b7, b9}, e7• = {b10, b11}, p(b10) = p3, p(b11) = p4, etc.
system Σ is represented in a branching process β = (N, p) of Σ if β contains a cut c such that, for each place s of Σ, c contains exactly M(s) conditions b with p(b) = s. Every marking represented in a branching process is reachable, and every reachable marking is represented in the maximal unfolding of the net system. Finite configurations and cuts are tightly related: Let C be a finite configuration of a branching process β = (N, p). Then Cut(C) = (Min(N) ∪ C•) \ •C is a cut representing the marking Mark(C) = p(Cut(C)). Two configurations C1 and C2 of a branching process correspond to each other if Mark(C1) = Mark(C2). A pair (C1, C2) of corresponding configurations is called a cc-pair. Let β = (B, E, F, p) be a branching process of a net system Σ = (N, M⁰) and let c be a cut of β. The set {x ∈ B ∪ E | x ⪰ c ∧ ∀y ∈ c : ¬(x # y)} is denoted by ↑c. Identifying F and p with their restrictions to ↑c, ↑c = (B ∩ ↑c, E ∩ ↑c, F, p) is a branching process of (N, p(c)); moreover, if β = β_m then ↑c is the maximal branching process of (N, p(c)). It follows that ↑Cut(C1) and ↑Cut(C2) are isomorphic, provided (C1, C2) is a cc-pair; in this case we denote the (unique) isomorphism from ↑Cut(C1) to ↑Cut(C2) by I_(C1,C2).

McMillan's Finite Unfolding. Here we only present McMillan's unfolding method. The refined method of Esparza, Römer and Vogler is more complicated; interested readers are referred to [7]. As already mentioned, the differences between these two unfolding methods are not relevant for this paper. Let β = (B, E, F, p) be a branching process of a net system Σ. We say that β is complete if every reachable marking of Σ is represented in β and, moreover, β contains an event labelled by t whenever a transition t can occur in Σ. The maximal unfolding of a net system is always complete. Since a finite 1-safe net system has only finitely many reachable markings, its maximal unfolding contains at least one complete finite prefix.
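The identities Cut(C) = (Min(N) ∪ C•) \ •C and Mark(C) = p(Cut(C)) compute directly from pre-set/post-set data. The following sketch is illustrative only; the paper gives no code, and the data structures and names are assumptions:

```python
# Sketch: Cut(C) and Mark(C) for a finite configuration C of a branching
# process. pre/post map each event to its pre-set/post-set of conditions;
# p labels conditions with places. All names here are illustrative.

def cut(config, pre, post, min_conditions):
    """Cut(C) = (Min(N) ∪ C•) \\ •C."""
    post_c = {b for e in config for b in post[e]}   # C•
    pre_c = {b for e in config for b in pre[e]}     # •C
    return (min_conditions | post_c) - pre_c

def mark(config, pre, post, min_conditions, p):
    """Mark(C) = p(Cut(C)); a frozenset of places suffices for 1-safe nets."""
    return frozenset(p[b] for b in cut(config, pre, post, min_conditions))

# Toy occurrence net: Min = {b1, b2}; event e1 consumes b1 and produces b3.
pre = {"e1": {"b1"}}
post = {"e1": {"b3"}}
p = {"b1": "p1", "b2": "p2", "b3": "p3"}
mn = {"b1", "b2"}

assert cut(set(), pre, post, mn) == {"b1", "b2"}
assert cut({"e1"}, pre, post, mn) == {"b2", "b3"}
assert mark({"e1"}, pre, post, mn, p) == frozenset({"p2", "p3"})
```

The empty configuration yields the initial cut Min(N), matching the requirement that the restriction of p to Min(N′) is a bijection onto M⁰.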
An event e ∈ E is a cut-off event if there exists a local configuration [e′] such that Mark([e′]) = Mark([e]) and |[e′]| < |[e]|. To achieve that e is a cut-off event if Mark([e]) = M⁰, a new 'virtual' event ⊥ is introduced and [⊥] is defined as the empty configuration: Mark([⊥]) = M⁰ = Mark([e]) and |[⊥]| = 0 < |[e]|. Given a cut-off event e, there may exist several e′ such that Mark([e′]) = Mark([e]) and |[e′]| < |[e]|. We assume in the sequel that for each cut-off event e one of these e′ is fixed; we call it the corresponding event of e and denote it by e⁰. Moreover, we assume without loss of generality that e⁰ is not a cut-off event. Let E_f ⊆ E_m be defined by: e ∈ E_f iff no event e′ < e is a cut-off event. The (unique) prefix of β_m having E_f as its set of events is called McMillan's finite unfolding and denoted by β_f = (B_f, E_f, F_f, p_f). In [6], β_f is proved to be always complete. Off denotes the set of cut-off events of β_f. 𝓕_f denotes the set of all configurations of β_f. 𝓓 denotes the set of all maximal configurations of β_f wrt. set inclusion. The set of all configurations contained in the maximal unfolding β_m is denoted by 𝓕_m. Figure 2 shows the finite unfolding β_f of the finite 1-safe system displayed in Fig. 1. e4 is the only cut-off event; e4⁰ = ⊥ is its corresponding event. Note that indeed Mark([e4]) = Mark([⊥]) = {p1, p4}. 𝓓 contains three maximal configurations: D1 = {e2}, D2 = {e1, e3, e4, e6} and D3 = {e1, e3, e4, e5}.
3 Mutual Reachability of Markings
Remark. To simplify subsequent definitions, we assume a finite 1-safe net system Σ = (S, T, F, M⁰) together with its (possibly infinite) maximal unfolding β_m = (B_m, E_m, F_m, p_m) and its finite unfolding β_f = (B_f, E_f, F_f, p_f) to be given throughout the rest of this paper. The prefix and the postfix of a (set of) node(s) and the operators ↑ and ⇑ always refer to the maximal unfolding β_m. It is beyond the scope of this paper to explain the ideas of [6], where a model checking algorithm is introduced which is able to check formulas of a simple branching time logic. However, the following is similar to what can be found in [6], but since we are only interested in recognizing the mutual reachability of markings, we are able to simplify some definitions.

Definition 1. Let C be a configuration and let 𝓒, 𝓒1 and 𝓒2 be sets of configurations. We say

C ≤ 𝓒 iff ∃C′ ∈ 𝓒 : C ⊆ C′

and

𝓒1 ≤ 𝓒2 iff ∀C ∈ 𝓒1 : C ≤ 𝓒2.

The restriction onto the finite unfolding β_f is denoted by

∇C = C ∩ E_f and ∇𝓒 = {∇C | C ∈ 𝓒}.
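The order of Definition 1, its lift to sets of configurations, and the max operator used below are direct to implement. A small illustrative sketch, with configurations represented as frozensets (not code from the paper):

```python
# Sketch: C ≤ 𝓒 (∃C' ∈ 𝓒 : C ⊆ C'), its lift 𝓒1 ≤ 𝓒2, and max wrt. set
# inclusion. Configurations are frozensets of event names.

def leq_set(c, cs):
    """C ≤ 𝓒 iff some member of 𝓒 contains C."""
    return any(c <= cp for cp in cs)

def leq(cs1, cs2):
    """𝓒1 ≤ 𝓒2 iff every C ∈ 𝓒1 satisfies C ≤ 𝓒2."""
    return all(leq_set(c, cs2) for c in cs1)

def max_configs(cs):
    """Maximal elements of 𝓒 wrt. set inclusion."""
    return {c for c in cs if not any(c < cp for cp in cs)}

a, b = frozenset({1}), frozenset({1, 2})
assert leq({a}, {b}) and not leq({b}, {a})
assert max_configs({a, b}) == {b}
```

This also illustrates the direction of Lemma 2 below: two finite families have equal maxima exactly when the maxima are mutually ≤-related.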
The set of the maximal elements contained in 𝓒 wrt. set inclusion is denoted by

max(𝓒) = {C ∈ 𝓒 | ¬∃C′ ∈ 𝓒 : C ⊂ C′}.

Notice that max(𝓒) may equal the empty set if 𝓒 ⊄ 𝓕_f. The following Lemma is needed for a proof later on; observe that it does not hold without the max operator.

Lemma 2. Let 𝓒1 and 𝓒2 be sets of configurations. Then

max(𝓒1) = max(𝓒2) iff max(𝓒1) ≤ max(𝓒2) ∧ max(𝓒2) ≤ max(𝓒1).

Definition 3. Let M be a marking. We define

Sat_m(M) = {C | C ∈ 𝓕_m ∧ Mark(C) = M},
Sat_f(M) = {C | C ∈ 𝓕_f ∧ Mark(C) = M},
Last(M) = max(Sat_f(M)).

In terms of Sect. 1, the set Sat_m(M) contains configurations of the first type, which correspond to processes describing runs to the marking M. The same holds for Sat_f(M), but wrt. the finite unfolding β_f. Last(M) can be seen as a 'compact representation' of Sat_f(M), because every configuration contained in Sat_f(M) is (at least) a subset of a configuration contained in Last(M). Due to
its compactness, the set Last(M) can be calculated easily (provided that 𝓓 is known, but this set can be calculated in advance by the unfolding mechanism); two different algorithms for this task can be found in [6]. In our first example, Last({p1, p5}) equals {{e2}, {e1, e3, e4, e6}}. But as we have seen, the knowledge of Last({p1, p5}) is not enough to detect that {p1, p5} is reachable from {p2, p3}. The following proposition shows that we are instead interested in max(∇Sat_m(M)), the compact representation of ∇Sat_m(M).

Proposition 4. Let M1 and M2 be two markings. Then
M2 is reachable from M1 iff max(∇Sat_m(M1)) ≤ max(∇Sat_m(M2)).

Unfortunately, the sets Sat_m(M1) and Sat_m(M2) may be infinite. Fortunately, the following section shows that max(∇Sat_m(M)) is equal to the maximum of a set which can be calculated by finitely many applications of a 'shift' operator on the finite set Last(M).

4 The Shift Operators
In the following, we present two shift operators which are generalizations of (slightly modified²) operators originally defined in [6]. The first operator shifts a configuration wrt. a cc-pair; the second operator shifts a set of configurations wrt. a set of cc-pairs. By choosing these cc-pairs, the operators may be tuned. If they are tuned in such a way that they correspond to the ones defined in [6], we obtain a problem, as we will show. Hence we propose a somewhat different tuning and show, as a result, that the problem disappears.

Let (C1, C2) be a cc-pair. As in [6], the branching process ↑Cut(C2) can be thought of as ↑Cut(C1) 'shifted forward' (or 'shifted backwards' if C2 < C1). Accordingly, if C1 is a subset of some configuration C of β_m, then C \ C1 is a configuration of ↑Cut(C1), I_(C1,C2)(C \ C1) is a configuration of ↑Cut(C2) and C2 ∪ I_(C1,C2)(C \ C1) is again a configuration of β_m, which can be thought of as C 'shifted forward' wrt. the cc-pair (C1, C2). The following is a formal definition of this operation.

Definition 5. Let 𝓣 ⊆ 𝓕_m × 𝓕_m be a set of cc-pairs and (C1, C2) ∈ 𝓣. The basic shift operator is defined by

S_(C1,C2) : {C ∈ 𝓕_m | C1 ⊆ C} → {C ∈ 𝓕_m | C2 ⊆ C}
           C ↦ C2 ∪ I_(C1,C2)(C \ C1).

The complex shift operator is defined by

S_𝓣 : 2^𝓕_m → 2^𝓕_m
      𝓒 ↦ 𝓒 ∪ {S_(C1,C2)(C) | (C1, C2) ∈ 𝓣 ∧ C1 ⊆ C ∈ 𝓒}.

The least fixpoint of S_𝓣 containing 𝓒 is given by

μS_𝓣.𝓒 = ⋃_{n≥0} S_𝓣ⁿ(𝓒).
2 Merely for technical reasons, in order to get more algebraic properties.
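Definition 5 becomes executable once the isomorphism I_(C1,C2) is given extensionally, e.g. as a finite map on events. A sketch under that assumption; all concrete names are illustrative, not from the paper:

```python
# Sketch: the basic shift S_(C1,C2)(C) = C2 ∪ I_(C1,C2)(C \ C1) and one
# application of the complex shift S_T. The isomorphism I is a dict; each
# cc-pair is carried as a triple (C1, C2, iso). Names are illustrative.

def basic_shift(config, c1, c2, iso):
    assert c1 <= config, "S_(C1,C2) is only defined when C1 ⊆ C"
    return c2 | {iso[e] for e in config - c1}

def complex_shift(configs, cc_pairs):
    """One application of S_T: keep 𝓒 and add all enabled basic shifts."""
    out = set(configs)
    for (c1, c2, iso) in cc_pairs:
        for c in configs:
            if c1 <= c and all(e in iso for e in c - c1):
                out.add(basic_shift(c, c1, c2, iso))
    return out

# Toy cc-pair ([⊥], [e4]) with I mapping e2 ↦ e6, mimicking the example
# S_e4({e2}) = [e4] ∪ I_e4({e2}) discussed in the text.
c1, c2 = frozenset(), frozenset({"e1", "e3", "e4"})
iso = {"e2": "e6"}
assert basic_shift(frozenset({"e2"}), c1, c2, iso) == {"e1", "e3", "e4", "e6"}
```

By Lemma 6(iii) the basic shift is a bijection, so the same dict read in reverse realizes S_(C2,C1).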
Lemma 6. Let C ∈ 𝓕_m and let (C1, C2) be a cc-pair with C1 ⊆ C. Then

(i) Mark(C) = Mark(S_(C1,C2)(C)),
(ii) |C1| < |C2| ⇒ |S_(C1,C2)(C)| > |C|, and
(iii) S_(C1,C2) is bijective (S_(C1,C2)⁻¹ = S_(C2,C1)) and monotonic wrt. ⊆.

For convenience and in accordance with [6], we fix the following

Notation 7. We abbreviate S_𝓣(𝓒) by S(𝓒) if 𝓣 = {([e⁰], [e]) | e ∈ Off}. Moreover, if (C1, C2) = ([e⁰], [e]) holds for some e ∈ Off, we abbreviate I_([e⁰],[e])(C) by I_e(C) and S_([e⁰],[e])(C) by S_e(C); the latter is called an elementary shift. In Fig. 2 we have, for example, S_e4({e2}) = [e4] ∪ I_e4({e2}) = {e1, e3, e4} ∪ {e6}. The following is a first step towards the aim formulated at the end of Sect. 3.

Theorem 8.
max(∇Sat_m(M)) = max(∇μS.Sat_f(M)) = max(∇μS.Last(M)).
Proof. The first equality follows directly from Sat_m(M) = μS.Sat_f(M), which is proven in [6]. The second equality is proven indirectly in [6], but due to a problem in that proof, we give a direct one:

Last(M) = max(Sat_f(M))
⇒ {definition of ≤ and max}
Sat_f(M) ≤ Last(M)
⇒ {monotonicity of S_(C1,C2)}
μS.Sat_f(M) ≤ μS.Last(M)
⇒ {set theory}
∇μS.Sat_f(M) ≤ ∇μS.Last(M)
⇒ {definition of max}
max(∇μS.Sat_f(M)) ≤ max(∇μS.Last(M))

Together with max(∇μS.Last(M)) ≤ max(∇μS.Sat_f(M)), which follows directly from Last(M) ⊆ Sat_f(M), Lemma 2 yields the requested equality. □

Of course, there are still some problems: the calculation of μS.Last(M) requires infinitely many applications of S, μS.Last(M) itself may be infinite and, in particular, potential model checkers have only the finite unfolding β_f at their disposal. Therefore, they can only work with finite versions of the shift operators.

Definition 9. Let 𝓣 be a set of cc-pairs, (C1, C2) ∈ 𝓣, 𝓒 ⊆ 𝓕_f and C ∈ 𝓒. We identify the finite versions of the shift operators by overbarring:

S̄_(C1,C2)(C) = ∇S_(C1,C2)(C), S̄_𝓣(𝓒) = ∇S_𝓣(𝓒) and μS̄_𝓣.𝓒 = ⋃_{n≥0} S̄_𝓣ⁿ(𝓒).
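Since the finite unfolding has only finitely many configurations, the iteration of the finite complex shift stabilizes, which gives a terminating fixpoint computation for μS̄_𝓣.𝓒. A sketch with configurations as frozensets of events of β_f; the shift_once argument stands in for S̄_𝓣 and is an assumption, not code from the paper:

```python
# Sketch: compute μS̄_T.𝓒 by iterating the finite complex shift until
# S̄_T^k(𝓒) = S̄_T^(k+1)(𝓒). Termination holds because β_f has finitely
# many configurations. ∇ restricts a configuration to E_f.

def restrict(config, finite_events):
    """∇C = C ∩ E_f."""
    return frozenset(config) & finite_events

def finite_fixpoint(configs, shift_once, finite_events):
    current = {restrict(c, finite_events) for c in configs}
    while True:
        nxt = {restrict(c, finite_events) for c in shift_once(current)}
        nxt |= current
        if nxt == current:      # fixpoint reached: S̄_T^k = S̄_T^(k+1)
            return current
        current = nxt

# Degenerate shift adding nothing: the fixpoint is the restricted input.
ef = frozenset({"e1", "e2"})
result = finite_fixpoint([{"e1"}, {"e1", "e3"}], lambda cs: cs, ef)
assert result == {frozenset({"e1"})}
```

The loop makes the stabilization index k from the text explicit: it is simply the first round in which no new restricted configuration appears.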
Observe that S̄_𝓣(𝓒) = 𝓒 ∪ {S̄_(C1,C2)(C) | (C1, C2) ∈ 𝓣 ∧ C1 ⊆ C ∈ 𝓒} and that the calculation of S̄_(C1,C2)(C) does not require the maximal branching process β_m; the knowledge of the pairs (e′, I_(C1,C2)(e′)) ∈ E_f² for every (C1, C2) ∈ 𝓣 is sufficient (an algorithm for calculating these pairs can be found in [6]). Moreover, since S̄_𝓣ⁿ(𝓒) is a set of configurations of the finite unfolding for every n, and the finite unfolding contains only finitely many configurations, there exists some k ∈ ℕ such that S̄_𝓣ᵏ(𝓒) = S̄_𝓣ᵏ⁺¹(𝓒). We then have μS̄_𝓣.𝓒 = ⋃_{0≤n≤k} S̄_𝓣ⁿ(𝓒), so one would like to use

max(∇μS.Last(M)) = max(μS̄.Last(M)). (1)

However, the system in Fig. 3 shows that (1) does not hold in general.
Fig. 3. A finite 1-safe net system and its finite unfolding β_f.
Consider the (always) reachable deadlock marking {p7}. Since Last({p7}) equals {{e2, e5, e8}}, we have³

S_e6(S_e4({e2, e5, e8})) = S_e6({e1, e4, e5′, e8′}) = {e1, e3, e6, e7, e9, e10, e9′},

therefore {e1, e3, e6, e7, e9, e10, e9′} ∈ μS.{{e2, e5, e8}} and {e1, e3, e6, e7, e9, e10} ∈ ∇μS.{{e2, e5, e8}}. Notice that e5 was shifted out of β_f and then returned (as e10). However, {e1, e3, e6, e7, e9, e10} ∉ μS̄.Last({p7}), because once the event e5 is shifted out

³ The (double) primed events are not contained in the finite unfolding, because they lie after some cut-off event. But it should be clear which events are meant in the maximal branching process, e.g. •e9′ = {b14}.
of the finite unfolding, the finite version S̄ of the shift function S forgets about it. The problem is caused by the event e4. This event belongs to a class of events which is characterized in the next definition.
Definition 10 (Tricky events). Let e ∈ Off and let (C1, C2) be a cc-pair. e tricks (C1, C2) iff e ∈ ↑Cut(C1) ∧ (I_(C1,C2)([e]))• ∩ ↑Cut(C2) ∩ E_f ≠ ∅.

Tricky events are recognized when calculating⁴ the pairs (e′, I_(C1,C2)(e′)) ∈ E_f² for a cc-pair (C1, C2). In our example, e4 is recognized as tricky (it tricks (∅, [e6])) because e4 ∈ ↑Cut(∅) ∧ (I_(∅,[e6])([e4]))• ∩ ↑Cut([e6]) ∩ E_f = {e10} ≠ ∅. Equation (1) is only true for finite unfoldings containing no cut-off events e1 and e2 such that e1 tricks ([e2⁰], [e2]). This could be achieved in a 'brute force' way by enlarging the finite unfolding in an appropriate way, e.g. by introducing the additional requirement that for every cut-off event e its corresponding event e⁰ must be an element of its local configuration (e⁰ ∈ [e]). But this method would significantly slow down potential model checking algorithms, e.g. the one proposed in [6]. Keeping all events which fall out of the finite unfolding during repeated applications of S̄ 'in mind' has a similar additional space and time complexity and is therefore out of the question. The approach we follow is to take tricky events into account by combining sequences of shifts disturbed by them into 'clean' single shifts. In our example, instead of shifting the configuration {e2, e5, e8} two times, first wrt. e4 and then wrt. e6 (losing the event e5 or, respectively, e10), we circumvent the tricky event e4 by shifting {e2, e5, e8} wrt. ({e2}, {e1, e3, e6, e7, e9}) in order to get the desired configuration {e1, e3, e6, e7, e9, e10}. Therefore, by adding (as few as possible) appropriate cc-pairs (V⁰, V) to the set {([e⁰], [e]) | e ∈ Off}, we now construct a set Ω of cc-pairs taking all tricky events into account such that max(∇μS.Last(M)) = max(μS̄_Ω.Last(M)) holds. This solves our remaining problems, because the set μS̄_Ω.Last(M) is finite and its calculation requires only a finite number of applications of S̄_Ω.
The construction of Ω does not establish ∇μS.Last(M) = μS̄_Ω.Last(M) (it can be checked later on that Fig. 3 is a counterexample), nor max(∇μS.𝓒) = max(μS̄_Ω.𝓒) for an arbitrary 𝓒 ⊆ 𝓕_f. A counterexample concerning the second equation is omitted due to its complexity; interested readers should contact the author. In general, each event e tricking a cc-pair (V1⁰, V1) is taken into account by introducing a cc-pair (V2⁰, V2), whereby V2 can be easily calculated by shifting V1⁰ ∪ [e] wrt. (V1⁰, V1). Observe that V2 = S_(V1⁰,V1)(V1⁰ ∪ [e]) contains only cut-off events which are already contained in V1. The calculation of V2⁰ is a little bit harder, because two unpleasant things can happen. First, it would be nice to calculate V2⁰ by shifting V1⁰ ∪ [e] backwards wrt. e. This would work in our example, because S̄_e4⁻¹(∅ ∪ [e4]) = {e2}. Unfortunately, if there is another event e′ tricking ([e⁰], [e]), it can happen that S̄_e⁻¹(V1⁰ ∪ [e]) ∉ 𝓕_f. This would mean that S̄_e⁻¹(V1⁰ ∪ [e]) does not exist, or in other words, ∇S_e⁻¹(V1⁰ ∪ [e]) is probably not a corresponding configuration of V2. Therefore, it can happen that
the calculation of V2⁰ depends on the existence of another cc-pair (V3⁰, V3) already contained in Ω, which takes the tricky event e′ into account. This shows that the elements of Ω must be calculated in a certain order. An example concerning this 'first unpleasant thing' can be found in App. A.

Second, suppose there is some 'appropriate' cc-pair (V3⁰, V3) and let

(V2⁰, V2) = (S_(V3⁰,V3)⁻¹(V1⁰ ∪ [e]), S_(V1⁰,V1)(V1⁰ ∪ [e])).

In this case, it can happen that V2⁰ contains some cut-off event e*. Let b′ ∈ e*• and b ∈ Cut(V2) with p(b) = p(b′). If there is some e″ ∈ b• ∩ ↑Cut(V2) ∩ E_f, then we are again confronted with the problem that there is no e‴ ∈ ↑Cut(V2⁰) ∩ E_f with I_(V2⁰,V2)(e‴) = e″. To solve this problem, we shift V1⁰ ∪ [e] backwards (starting wrt. some appropriate (V3⁰, V3) ∈ Ω) until it contains no cut-off events anymore⁵. Notice that this sequence of reverse shifts may require a sequence of appropriate cc-pairs already contained in Ω. We call Ω shift complete wrt. V1⁰ ∪ [e] if such cc-pairs exist. For simplification reasons we fix the following notation.

Notation 11. Let C ∈ 𝓕_m and let σ = (V1⁰, V1) … (Vn⁰, Vn) be a sequence of cc-pairs (n ∈ ℕ, σ = ε if n = 0). We define S_σ(C) = X_n and S_σ⁻¹(C) = Y_n with X_0 = Y_0 = C and, for 0 ≤ i < n,

X_{i+1} = S_(V_{i+1}⁰, V_{i+1})(X_i) if X_i is defined, and undefined otherwise,

Y_{i+1} = S_(V_{n-i}⁰, V_{n-i})⁻¹(Y_i) if Y_i is defined, and undefined otherwise.
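Notation 11 composes shifts along a sequence, propagating undefinedness. A small sketch, with 'undefined' modelled as None; the concrete shift and unshift functions below are illustrative stand-ins, not the paper's operators:

```python
# Sketch: S_σ and S_σ⁻¹ from Notation 11. A sequence σ of cc-pairs is
# applied left to right (forward) or right to left (reverse); if any step
# is undefined, the whole composition is undefined (None).

def seq_shift(config, sigma, shift):
    x = config                      # X_0 = C
    for pair in sigma:              # X_{i+1} = S_pair(X_i), if defined
        if x is None:
            return None
        x = shift(x, pair)
    return x

def seq_shift_rev(config, sigma, unshift):
    y = config                      # Y_0 = C
    for pair in reversed(sigma):    # Y_{i+1} uses the pairs in reverse order
        if y is None:
            return None
        y = unshift(y, pair)
    return y

# Toy shifts: each 'pair' is just a label added/removed from the config.
add = lambda c, p: c | {p}
remove = lambda c, p: (c - {p}) if p in c else None   # undefined if absent

sigma = ["s1", "s2"]
assert seq_shift(frozenset(), sigma, add) == {"s1", "s2"}
assert seq_shift_rev(frozenset({"s1", "s2"}), sigma, remove) == frozenset()
assert seq_shift_rev(frozenset(), sigma, remove) is None
```

The last assertion mirrors the situation exploited in Definition 12: a reverse shift sequence may simply fail to exist for a given configuration.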
Definition 12. Let 𝓣 be a set of cc-pairs. We call 𝓣 shift complete with respect to a configuration C ∈ 𝓕_f if C ∩ Off = ∅ or if there exist cc-pairs (W1⁰, W1), …, (Wn⁰, Wn) ∈ 𝓣 (n ∈ ℕ \ {0}) such that

∀i, 1 ≤ i ≤ n : S_(Wi⁰,Wi)…(Wn⁰,Wn)⁻¹(C) ⊆ E_f and S_(W1⁰,W1)…(Wn⁰,Wn)⁻¹(C) ∩ Off = ∅.

In this case, we denote the sequence (W1⁰, W1) … (Wn⁰, Wn) by 𝓣_C. If 𝓣 is shift complete wrt. all C ∈ 𝓕_f, then we call 𝓣 shift complete. Observe that 𝓣_C does not have to be unique.

Now, as mentioned one page before, by iteratively adding appropriate cc-pairs to the set {([e⁰], [e]) | e ∈ Off} we construct a shift complete set Ω. The construction cannot be done in one step, because one tricky event can prevent the consideration of another tricky event (remember the 'unpleasant things').

⁵ An example concerning this 'second unpleasant thing' is omitted due to its complexity; interested readers should contact the author.
Definition 13. The set Ω is defined by Ω = ⋃_{i≥0} X_i with

X_0 = {([e⁰], [e]) | e ∈ Off}

and

X_{i+1} = X_i ∪ {(V2⁰, V2) | ∃e ∈ Off ∃(V1⁰, V1) ∈ X_i : e tricks (V1⁰, V1)
    ∧ X_i is shift complete wrt. V1⁰ ∪ [e]
    ∧ V2 = S_(V1⁰,V1)(V1⁰ ∪ [e])
    ∧ V2⁰ = S_D⁻¹(V1⁰ ∪ [e]) with D = (X_i)_{V1⁰ ∪ [e]}}.

Since Ω ⊆ 𝓕_f × 𝓕_f, there exists some k ∈ ℕ such that Ω = ⋃_{0≤i≤k} X_i.

Theorem 14. Ω is shift complete.
Proof. Assume Ω not to be shift complete wrt. some C′ ∈ 𝓕_f. This means that there is no sequence Ω_{C′}, which implies that there is some C, obtained by shifting C′ backwards as often as possible without falling out of β_f, such that

∀(V⁰, V) ∈ Ω with V ⊆ C ∃e ∈ Off : e tricks (V⁰, V) ∧ (I_(V⁰,V)([e]))• ∩ C ≠ ∅. (2)

If (V1⁰, V1) is one of these (V⁰, V) and e1 is the corresponding e, then Ω is not shift complete wrt. V1⁰ ∪ [e1] (otherwise the tricky event e1 would have been taken into account due to the construction of Ω) and |V1⁰ ∪ [e1]| < |C′|. This means that there is no sequence Ω_{V1⁰ ∪ [e1]}, which implies that there is some C, obtained by shifting V1⁰ ∪ [e1] backwards as often as possible without falling out of β_f, such that (2) holds again. If (V2⁰, V2) is one of these (V⁰, V) and e2 is the corresponding e, then Ω is not shift complete wrt. V2⁰ ∪ [e2] and |V2⁰ ∪ [e2]| < |V1⁰ ∪ [e1]|. This argumentation can be repeated ad infinitum. Since there cannot exist an infinite sequence of configurations of decreasing size, the assumption must be false. □

A repeated application of the following Lemma shows that each non-elementary shift corresponds to a sequence of elementary shifts.
Lemma 15. As in Definition 13, let (V2⁰, V2) ∈ X_{i+1} with V2⁰ = S_D⁻¹(V1⁰ ∪ [e]) and V2 = S_(V1⁰,V1)(V1⁰ ∪ [e]) (i ∈ ℕ, D = (X_i)_{V1⁰ ∪ [e]}), and let C ∈ 𝓕_m. Then

S_(V2⁰,V2)(C) = S_(V1⁰,V1)(S_D(C)).

Finally we have

Theorem 16. max(∇μS.Last(M)) = max(μS̄_Ω.Last(M)).
Proof. Lemma 15 already yields max(∇μS.Last(M)) ⊇ max(μS̄_Ω.Last(M)). Let lc(𝓒) = {C ∈ 𝓒 | ¬∃C′ ∈ 𝓒 : |C′| < |C|} denote the set of configurations of lowest cardinality contained in some 𝓒 ⊆ 𝓕_m, and let C1 ∈ max(∇μS.Last(M)).

Since Ω is shift complete, we can compute (by a finite number of reverse shifts) C1⁰ with C1⁰ ∩ Off = ∅ and Mark(C1⁰) = Mark(C1). Suppose C1⁰ ≰ Last(M) (otherwise we are done). Since M is reachable from Mark(C1⁰), 𝓒 = lc{C ∈ 𝓕_m | Mark(C) = M ∧ C1⁰ ⊆ C} ≠ ∅. Let C2 ∈ ∇𝓒. Now we compute C2⁰ with C2⁰ ∩ Off = ∅ and Mark(C2⁰) = Mark(C2). If again C2⁰ ≰ Last(M), we iterate the above procedure until we get some Cn⁰ (n ≥ 2) with Cn⁰ ≤ Last(M); this will happen sooner or later, because |C′ \ Ci⁰| with C′ ∈ lc{C ∈ 𝓕_m | Mark(C) = M ∧ Ci⁰ ⊆ C} gets smaller in each iteration (i = 1, 2, …). By inverting all finite shifts that have been done, we can re-compute C1, which shows

max(∇μS.Last(M)) ⊆ max(μS̄_Ω.Last(M)). □
5 Summary, Conclusion and Outlook

This paper can be seen as a continuation of the work done in [6]. Its main contribution is a generalization and correction of the shift operators presented there. The necessity of this generalization is shown by uncovering several subtleties of the finite unfolding (e.g. the existence of tricky events). Finally, it is shown that with the new operators the computation of reachability properties hidden in the finite unfolding of a given system is an easy job (Prop. 4, Theorems 8 and 16).

A properly working shift operator does not come for free. But since the construction of the shift complete set Ω is not very expensive⁶ and has to be done only once and, in particular, before the potential start of a model checker working on the finite unfolding, it costs nearly nothing compared with the potentially numerous and expensive model checker runs. On the contrary, by summarizing sequences of shifts into single shifts, there may be chances for significant reductions of the time complexity of potential model checkers, e.g. the one proposed in [6]. This should be examined in future work. The author is working on an extended version of this paper containing the proofs and examples left out here; it will be published soon.

Acknowledgments. I would like to thank Eike Best and Javier Esparza for reading an early draft version of this paper. Special thanks go to Michaela Huhn; she gave me a major hint for the proof of the shift completeness of Ω.
⁶ A lot of examples have been checked; in general, even complex unfoldings contain only a few tricky events. Exact complexity investigations seem to be very difficult and require further study.
A An Example Concerning the 'First Unpleasant Thing'
Fig. 4. A finite 1-safe net system and its finite unfolding β_f. e11 is a tricky event; it tricks ([e6⁰], [e6]).

Notice that Mark({e1, e2, e8, e12, e16, e17, e18, e19}) = {p1, p8}, but

∇S_e11⁻¹([e6⁰] ∪ [e11]) = ∇{e3, e5, e7, e9′} = {e3, e5, e7}

and Mark({e3, e5, e7}) = {p10, p6}. This is caused by another tricky event, namely e7 tricking ([e8⁰], [e8]), which can be taken into account by (S_e8⁻¹([e8⁰] ∪ [e7]), S_e8([e8⁰] ∪ [e7])) = ({e4, e10}, {e1, e5, e11, e2}). Observe, for example, that S_e11(S_e8({e4, e9, e10})) = S_e11({e3, e7, e8, e9′}) = {e1, e5, e11, e2, e6} and S̄_e11(S̄_e8({e4, e9, e10})) = {e1, e5, e11, e2}, but S_({e4,e10},{e1,e5,e11,e2})({e4, e9, e10}) = {e1, e5, e11, e2, e6}. With ({e4, e10}, {e1, e5, e11, e2}) ∈ Ω, the first component of the cc-pair taking the tricky event e11 into account can be calculated as S_D⁻¹([e6⁰] ∪ [e11]) with D = Ω_{[e6⁰] ∪ [e11]}.
References

1. Bernardinello, L., De Cindio, F.: A survey of basic net models and modular net classes. In: G. Rozenberg (ed.), Advances in Petri Nets 1992, Lecture Notes in Computer Science 609 (Springer, Berlin, 1992) 304-351.
2. Best, E., Devillers, R.: Sequential and concurrent behaviour in Petri net theory. Theoretical Computer Science 55(1) (1987) 299-323.
3. Best, E., Fernández, C.: Nonsequential Processes - A Petri Net View. EATCS Monographs on Theoretical Computer Science 13 (1988).
4. Clarke, E.M., Emerson, E.A., Sistla, A.P.: Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Transactions on Programming Languages and Systems 8(2) (1986) 244-263.
5. Engelfriet, J.: Branching processes of Petri nets. Acta Informatica 28 (1991) 575-591.
6. Esparza, J.: Model checking using net unfoldings. Science of Computer Programming 23 (1994) 151-195.
7. Esparza, J., Römer, S., Vogler, W.: An improvement of McMillan's unfolding algorithm. In: T. Margaria, B. Steffen (eds.), Proceedings of TACAS'96, LNCS 1055 (1996) 87-106.
8. McMillan, K.L.: Using unfoldings to avoid the state explosion problem in the verification of asynchronous circuits. Proceedings of the 4th Workshop on Computer Aided Verification (Montreal, 1992) 164-174.
9. Petri, C.A.: Kommunikation mit Automaten. Schriften des Institutes für Instrumentelle Mathematik (Bonn, 1962).
10. Queille, J.P., Sifakis, J.: Specification and verification of concurrent systems in CESAR. Proceedings of the 5th International Symposium on Programming, LNCS 137 (1981) 337-351.
Graph Editing to Bipartite Interval Graphs: Exact and Asymptotic Bounds

K. Cirino⋆, S. Muthukrishnan⋆⋆, N. S. Narayanaswamy⋆⋆⋆, H. Ramesh†
Abstract. Graph editing problems deal with the complexity of transforming a given input graph G from a class 𝒢 to any graph H in a target class ℋ by adding and deleting edges. Motivated by a physical mapping scenario in Computational Biology, we consider graph editing to the class of bipartite interval graphs (BIGs). We prove asymptotic and exact bounds on the minimum number of editions needed to convert a graph into a BIG.
1 Introduction

Graph editing problems deal with the complexity of transforming a given input graph G from a class 𝒢 to any graph H in a target class ℋ using editing operations, namely adding edges, deleting edges, or doing a combination of both; we denote this process by 𝒢 → ℋ. Suppose we allow only edge additions. Then graph editing problems are what are known as 'graph completion' problems. A classical example is the MINIMUM FILL-IN problem, where ℋ is the class of chordal graphs, 𝒢 is arbitrary, and edges may only be added [22, 24]. Suppose instead we allow only edge deletions. Then graph editing problems become what are known as 'largest subgraph' problems. See [10] for references on these problems. In this paper, we consider graph editing problems where both additions and deletions are allowed; we use editions to refer to an addition or a deletion.

Graph editing problems have been extensively studied in Graph Theory and its application areas for special classes of graphs: path and edge graphs (see [14], pages 198-199), trees [11], various interval graphs [13], chordal graphs, etc. In the past 4 years, there has been a tremendous resurgence in the study of graph editing problems motivated by Computational Biology [5, 6, 7, 12, 13, 16, 17, 19]. The majority of these draw

⋆ Northeastern University. Work supported by DIMACS Special Year on Computational Biology.
⋆⋆ Information Sciences Center, Bell Laboratories Innovations, Lucent Technologies, Murray Hill, NJ 07974. [email protected].
⋆⋆⋆ Dept. of Computer Science and Automation, Indian Institute of Science, Bangalore 560012, India. [email protected].
† Dept. of Computer Science and Automation, Indian Institute of Science, Bangalore 560012, India. [email protected].
their motivation from DNA physical mapping. In those applications, ℋ is a class of interval graphs⁵. The problems we study in this paper are motivated by Computational Biology as well. Specifically, we study the graph editing problem 𝒢 → ℋ for the class ℋ of bipartite interval graphs (BIGs), a specially important subclass of interval graphs of biological interest. We present the biological scenario and the motivation in Section 6 as an Appendix, and focus here on the graph-theoretic and complexity issues of the graph editing problem. While editing using insertions and deletions separately has been well studied for various classes of graphs, graph editing problems appear harder when both addition and deletion operations are allowed, since they can take potentially many more paths from the source graph to the target graph, passing through intermediate graphs which have little, or nothing, in common with both. While our results are theoretical and do not have a practical relevance for the biological scenarios that motivated us, we hope that our insight into the complexity of this problem helps practitioners focus experiments better. We now describe our results in detail.
Our Results and Their Relevance.
We prove asymptotic and exact bounds on the complexity of editing graphs to BIGs.
Asymptotic Results.
To start, we prove:

- Finding the minimum number of editions needed to convert a graph to a BIG is NP-hard. This holds even if the input graph is bipartite and has bounded degree.

On the other hand, we show that there is an easy factor-3 approximation algorithm for this problem. In addition, if the graph has at least 3n edges then it is trivial to obtain a factor-2 approximation. Thus sparse graphs (with, say, fewer than 3n edges) and additive approximations seem more meaningful for the problem than arbitrary graphs and standard multiplicative approximations. Next we prove that

- The minimum number of editions to convert a graph to a BIG cannot be approximated to an additive term of O(n^{1-ε}), for 0 < ε < 1, unless P = NP. In fact, this is true even when the input graph is of bounded degree and is restricted to be sparse, i.e., m ≤ n(1 + 1/c), for some constant c.

On the other hand, we consider trees, that is, graphs with m = n - 1, and prove a positive result.

- The minimum number of editions needed to convert a tree to a BIG can be determined in linear time.

We also observe that if the input graph has treewidth ω, then the minimum number of editions needed to convert it to a BIG can be determined in O(2^{ω²} poly(n)) time by
⁵ A graph is an interval graph if each vertex can be assigned an interval on the real line such that there is an edge between two vertices if and only if their corresponding intervals overlap.
adapting standard methods for processing bounded treewidth graphs [3]⁶.

Bounded degree graphs are motivated from Computational Biology. Making the biologically realistic assumption that the graphs are of bounded degree gives polynomial time algorithms for some problems whose unconditional versions are NP-complete [16, 18]. Our results above show that this is not the case here. In proving the results above, we establish a relationship between the minimum number of editions and what is called the caterpillar cover of a graph (see Section 2 for the definition). We prove complexity results on approximating the caterpillar cover which may be of independent graph-theoretic interest.

Exact Complexity Bounds. We prove exact bounds on the worst case complexity of editing trees to BIGs.

- The minimum number of editions needed to convert a connected tree to a BIG is at most n - 5 changes when n is odd, and at most n - 6 changes otherwise.
- The minimum number of editions needed to convert a connected tree to a (general, not necessarily connected) interval graph is at most ⌊(n-5)/2⌋.
- The minimum number of editions needed to convert a connected tree to a (general) connected interval graph is at most ⌊(2n-11)/3⌋.

In each case, we prove the upper bound to be tight by exhibiting graphs that require that many editions. All our proofs here make heavy use of the known characterization of BIGs in terms of its forbidden subgraphs.
Map. Some preliminary results needed are stated in Section 2. Our asymptotic hardness results are in Section 3. Our asymptotic and exact complexity bounds for trees are in Section 4. Some related observations appear in Section 5. The relevance of our results to Computational Biology is described in Section 6.
2 Preliminaries

A caterpillar is a tree in which the deletion of all nodes of degree 1 results in a path. Suppose T is a tree which is not a caterpillar. Then there exists a splitting (T1, T2) of T where T1 is a caterpillar with |T1| ≥ 5 (see [1]). Here |T1| is the number of vertices in T1. A graph is an interval graph if each vertex can be assigned an interval on the real line such that there is an edge between two vertices if and only if their corresponding intervals overlap. Two forbidden classes in the structural characterisation of interval graphs that are relevant to us are cycles of length 4 or more, and asteroidal triples. An asteroidal triple is a graph with the following structure: u - v - w - a - b and w - y - x. We can conclude that a tree is an interval graph if and only if it does not have asteroidal triples (since all the other classes of forbidden graphs are not trees). It immediately follows that trees that are interval graphs are precisely the class of BIGs. We call a BIG a caterpillar. A caterpillar cover of a graph G is a partition of V(G) such that the induced subgraph of G on each set of the partition has a spanning tree that is a caterpillar. Throughout, we denote the degree of a vertex v by deg(v). We use 𝒯, ℐ, ℬ, 𝒞ℐ and 𝒞 for the classes of trees, interval graphs, bipartite graphs, connected interval graphs and caterpillars respectively.

⁶ See [4] for the definition of treewidth and a survey of results. A standard tree has treewidth 1.
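The caterpillar definition yields a simple linear-time test: delete all leaves and check that what remains is a path. Since the remainder of a tree is again a tree, it is a path exactly when every remaining vertex keeps degree at most 2. A sketch under an assumed adjacency-dict representation (not code from the paper):

```python
# Sketch: test whether a tree (adjacency dict: vertex -> set of neighbours)
# is a caterpillar, i.e. deleting all degree-1 vertices leaves a path.

def is_caterpillar(adj):
    """The residue of a tree after leaf deletion is a tree; a tree is a
    path iff its maximum degree is at most 2."""
    leaves = {v for v, nb in adj.items() if len(nb) <= 1}
    return all(len(set(nb) - leaves) <= 2
               for v, nb in adj.items() if v not in leaves)

# A path a-b-c-d with one extra leaf hanging off b is a caterpillar:
cat = {"a": {"b"}, "b": {"a", "c", "x"}, "c": {"b", "d"},
       "d": {"c"}, "x": {"b"}}
assert is_caterpillar(cat)

# Subdividing every edge of the star K_{1,3} gives a non-caterpillar tree
# (it contains an asteroidal triple):
spider = {"c": {"1", "2", "3"}, "1": {"c", "1l"}, "2": {"c", "2l"},
          "3": {"c", "3l"}, "1l": {"1"}, "2l": {"2"}, "3l": {"3"}}
assert not is_caterpillar(spider)
```

The spider example is exactly the smallest tree excluded by the asteroidal-triple characterization above.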
3 Asymptotic Complexity

First we make a characterization in terms of caterpillar covers.
Lemma 1. Let λ(G) be the minimum number of edge editions needed to make a bipartite graph G with n vertices and m edges a caterpillar. Let ρ be the cardinality of the smallest caterpillar cover of G. Then λ(G) = m − n − 1 + 2ρ.
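The count in Lemma 1 decomposes into deletions and additions: keep a spanning caterpillar forest with ρ components, deleting the other m − (n − ρ) edges, then add ρ − 1 edges to join the caterpillars into one. A quick numeric sketch (our own illustration):

```python
def editions_from_cover(m, n, rho):
    # Lemma 1's count: delete the m - (n - rho) edges that lie outside
    # a spanning caterpillar forest with rho components, then add
    # rho - 1 edges joining the caterpillars into a single one.
    deletions = m - (n - rho)
    additions = rho - 1
    return deletions + additions      # = m - n - 1 + 2*rho
```

For the 4-cycle (bipartite, n = m = 4, smallest cover ρ = 1) this gives 1: deleting any single edge already leaves a path. For the 7-vertex 2-star (a tree, so m = 6, n = 7, with ρ = 2) it gives 2, matching the n − 5 bound proved later in Theorem 9.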
Lemma 2. Let λ(G) be the minimum number of edge editions required to convert a connected graph G with n vertices and m edges into a caterpillar. Then there exists a polynomial time algorithm that approximates λ(G) within a factor of 3. In addition, if m ≥ n + n/c for some c > 0, there is a trivial algorithm that approximates λ(G) within a factor of 2c + 1.
Proof. Our algorithm simply finds the optimal caterpillar cover C_T of an arbitrary spanning tree T of G using Theorem 7, discards the edges not in this cover, and connects the caterpillars together using the required number of edges. The number of editions performed in this process is at most m − n − 1 + 2ρ_T, where ρ_T is the number of caterpillars in C_T. The first part of the lemma now follows from Lemma 1 and from the fact that ρ_T ≤ m − n + 1 + ρ, where ρ is the size of the optimal caterpillar cover of G. For the second part of the lemma, just delete all m edges and add n − 1 edges to connect the vertices into a single caterpillar. The number of editions made is m + n − 1. If m ≥ n + n/c, then m + n − 1 ≤ (2c + 1)(m − n − 1 + 2ρ).
⊓⊔
We remark that there exist graphs on which the above algorithm does indeed achieve an approximation factor of 3, so the analysis is tight. Next, we consider additive approximations. We prove complexity results on finding the minimum caterpillar cover and use them to derive our results for λ(G).
Theorem 3. Deciding if a bounded degree bipartite graph has a caterpillar cover of cardinality k is NP-complete. This is true even when the graph has at most n(1 + 1/c) edges, for some constant c > 0.
Proof. (Sketch) The problem is clearly in NP. The hardness reduction is from the Hamiltonian Path problem for directed graphs with bounded in- and out-degrees (the latter problem is NP-complete; see pages 199–200 in [14]). Given a digraph D with bounded in- and out-degrees, we obtain a bounded degree bipartite graph G which has a caterpillar cover of cardinality k if and only if D has a Hamiltonian path. First, a graph G′ is obtained as follows. For each vertex v in D, there is a path of four vertices v_i, v_1, v_2, v_o in G′. For each edge (u, v) in D, there is an edge (u_o, v_i) in G′. It is easy to see that G′ has a Hamiltonian path if and only if D has a Hamiltonian path. Further, coloring the v_i's and v_2's with one color and the v_1's and v_o's with another shows that G′ is bipartite. In addition, G′ has bounded degree. Next, G is obtained by augmenting G′ in two ways: first, by adding a vertex w′ for each vertex w in G′ together with the edge (w, w′), and second, by adding k − 1 isolated vertices. G is easily seen to be of bounded degree and bipartite. Further, G has a caterpillar cover of cardinality k if and only if G′ has a Hamiltonian path. G can be made sparse by replacing one of the isolated vertices by a sufficiently long chain. ⊓⊔
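The construction in this sketch is mechanical, so it can be written out directly. The function below is our own illustration of it (the vertex naming scheme and the returned edge-set representation are our choices; each undirected edge is stored once as an ordered pair for brevity):

```python
def build_theorem3_graph(arcs, vertices, k):
    """Sketch of the Theorem 3 reduction: digraph D -> bipartite G.
    arcs and vertices describe D; returns (nodes, edges) of G."""
    nodes, edges = set(), set()
    # each vertex v of D becomes the 4-vertex path v_i - v_1 - v_2 - v_o
    for v in vertices:
        chain = [f"{v}_i", f"{v}_1", f"{v}_2", f"{v}_o"]
        nodes.update(chain)
        edges.update(zip(chain, chain[1:]))
    # each arc (u, v) of D becomes the edge (u_o, v_i)
    for u, v in arcs:
        edges.add((f"{u}_o", f"{v}_i"))
    # first augmentation: a pendant vertex w' on every vertex w
    pendants = {w + "'" for w in nodes}
    edges.update((w, w + "'") for w in nodes)
    nodes |= pendants
    # second augmentation: k - 1 isolated vertices pad the cover to k
    nodes.update(f"iso{j}" for j in range(k - 1))
    return nodes, edges
```

For a two-vertex digraph with a single arc and k = 3, the output has 4·2 path vertices, one pendant per vertex, and 2 isolated vertices, exactly as the proof prescribes.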
Corollary 4. Computing the minimum number of editions needed to convert a bipartite graph into a caterpillar is NP-hard.
Proof. Follows from Theorem 3 and Lemma 1.
⊓⊔
Theorem 5. There exists no polynomial time algorithm that finds a caterpillar cover of size n^(1−ε)·k in a bounded degree bipartite graph, unless P = NP. Here, k is the cardinality of the smallest caterpillar cover and ε is any fixed number with 0 < ε < 1. Further, this holds even when the graph is sparse, i.e., it has at most n(1 + 1/c) edges, for some constant c > 0.
Proof. (Sketch) We show that if such an algorithm exists, then the Hamiltonian Cycle problem for bounded degree sparse bipartite graphs can be solved in polynomial time. This problem can be seen to be NP-complete as follows. The Hamiltonian Cycle problem in digraphs with bounded in-degree and out-degree is NP-complete [14]. This holds even when the digraph is sparse, because any digraph can be converted to a sparse one preserving Hamiltonicity by stretching an arbitrary vertex into a chain. The problem can then be reduced to deciding Hamiltonicity in bounded degree, sparse, bipartite graphs; the reduction used is essentially the same as the one described in Theorem 3.

Given a bounded degree sparse bipartite graph H with p vertices, we construct a bounded degree bipartite graph G with n vertices. G has a caterpillar cover of cardinality 1 if H has a Hamiltonian cycle, and no caterpillar cover of cardinality less than ⌈p^ℓ⌉ + 1 otherwise. We choose ℓ and ε in such a way that ⌈p^ℓ⌉ + 1 = n^(1−ε) + 1. First, we define a graph G′ which consists of the graph H plus two other vertices, called i and o. In G′, i is connected to an arbitrary vertex v of H, and o is connected to all the neighbors of v in H. Next, a graph G″ is defined as ⌈p^ℓ⌉ copies of G′ chained together, i.e., the o vertex of one copy is connected to the i vertex of the next copy. Finally, G is obtained by augmenting G″ with one extra vertex w′ for each vertex w in G″ together with the edge (w′, w). It can be seen that G′, G″ and G are all of bounded degree, bipartite, and sparse. In addition, G″ has a Hamiltonian path if H has a Hamiltonian cycle. It follows that G has a caterpillar cover of cardinality 1 if H has a Hamiltonian cycle. Suppose H does not have a Hamiltonian cycle. Then G has no caterpillar cover of cardinality less than ⌈p^ℓ⌉ + 1. Since the size n of G is at most 8p^(ℓ+1), any caterpillar cover in G has size at least (n/8)^(ℓ/(ℓ+1)) + 1. Setting n^(1−ε) + 1 to be equal to (n/8)^(ℓ/(ℓ+1)) + 1, we get ℓ/(ℓ+1) = 1 − ε, i.e., ℓ = (1−ε)/ε. ⊓⊔
Theorem 6. The number of editions required to obtain a caterpillar from a bounded degree bipartite graph G cannot be approximated to within an additive term of O(n^(1−ε)) for any fixed ε, 0 < ε < 1, unless P = NP.
Proof. (Sketch) Let ρ be the cardinality of the smallest caterpillar cover of G. At least m − n − 1 + 2ρ editions are required. Suppose one could obtain a caterpillar in m − n − 1 + 2ρ + O(n^(1−ε)) editions. Then we could obtain a caterpillar cover of size ρ + O(n^(1−ε)) in polynomial time. But this contradicts Theorem 5. ⊓⊔
4 Exact Complexity

We first present a linear time algorithm for optimally editing trees to caterpillars.
Theorem 7. An optimal caterpillar cover of a tree T can be found in O(n) time.
Proof. (Sketch) We will identify a particular caterpillar C in T whose removal gives a new tree T′ with the following property: the optimal caterpillar cover of T is exactly one more than that of T′. We can assume that T is rooted at a non-leaf vertex. We also assume that there is always a vertex which is either the root with at least 3 non-leaf children, or a non-root node with at least 2 non-leaf children; if no such vertex exists then T is itself a caterpillar. Let v be such a vertex with the added property that in the subtree rooted at v, no other vertex has more than one non-leaf child. Then the subtrees rooted at the children of v are caterpillars. The caterpillar C is determined as follows. If v has exactly two non-leaf children, then the subtree rooted at v is itself a caterpillar; we take C to be this caterpillar. Otherwise, if v has more than two non-leaf children, then pick the subtree rooted at any child of v as C. Clearly, the optimum caterpillar cover of T has at most one more caterpillar than the optimum caterpillar cover of T − C = T′. Further, from any optimum cover of T, an optimum cover of T containing C can be constructed by at most one simple edge exchange operation. This shows that the optimum cover of T′ is at most one less than the optimum cover of T. Therefore, the size of the optimal cover for T is exactly one more than that for T − C = T′.

The following is a linear time implementation of the algorithm. By a depth first traversal of the tree, a stack of vertices which have at least two non-leaf children is constructed: every vertex which has at least two non-leaf children is pushed onto the stack the first time it is visited. Let v be the vertex at the top of the stack. The following procedure is repeated till the stack becomes empty. If v has more than two non-leaf children in T, then the subtree rooted at a child of v is added to the caterpillar cover and is removed from T. If v has exactly two non-leaf children in T, then the subtree rooted at v is added to the caterpillar cover and is removed from T, and v is popped off the stack. In either case, let the resulting tree be denoted by T′. If the stack is not empty, the above procedure is repeated with T := T′. We maintain the invariant that in the subtree rooted at v, no other vertex is present in the stack. Clearly, this is a linear time implementation of the algorithm. ⊓⊔
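The peeling rule can be sketched directly in code. The sketch below (representation and names are ours) processes vertices bottom-up and applies the rule verbatim; for simplicity it recomputes non-leaf children on demand, so unlike the stack-based implementation described above it is not linear time in the worst case.

```python
from collections import defaultdict

def caterpillar_cover(adj, root):
    """Bottom-up sketch of Theorem 7's peeling rule (our illustration).
    adj maps each vertex to the set of its neighbours in the tree;
    root must be a non-leaf vertex.  Returns a list of vertex lists,
    one per caterpillar of the cover."""
    # BFS from the root to get parent pointers and a top-down order
    parent, order = {root: None}, [root]
    for v in order:
        for w in adj[v]:
            if w != parent[v]:
                parent[w] = v
                order.append(w)
    children = defaultdict(list)
    for v in order[1:]:
        children[parent[v]].append(v)

    removed, cover = set(), []

    def peel(v):                      # remove subtree(v); return its vertices
        out, stack = [], [v]
        while stack:
            u = stack.pop()
            removed.add(u)
            out.append(u)
            stack.extend(c for c in children[u] if c not in removed)
        return out

    def nonleaf_children(v):          # children that still have a child left
        return [c for c in children[v] if c not in removed
                and any(g not in removed for g in children[c])]

    # children appear after their parents in BFS order, so reversed(order)
    # processes every subtree before its ancestors
    for v in reversed(order):
        if v in removed:
            continue
        nl = nonleaf_children(v)
        if v == root:
            while len(nl) >= 3:       # root keeps at most two non-leaf legs
                cover.append(peel(nl.pop()))
        elif len(nl) >= 2:
            while len(nl) > 2:        # peel extra legs, one caterpillar each
                cover.append(peel(nl.pop()))
            cover.append(peel(v))     # exactly two legs: subtree(v) is one
    # whatever remains around the root is the final caterpillar
    cover.append([v for v in order if v not in removed])
    return cover
```

On the 7-vertex 2-star rooted at its center, the sketch peels one leg and keeps the rest, yielding a cover of size 2; on a path it returns a single caterpillar.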
Combined with Lemma 1, it follows that the minimum number of editions needed to convert a given tree to a caterpillar can be computed in linear time. This result can be extended to graphs of treewidth ω in O(2^O(ω) · poly(n)) time using standard ideas from [3]; we omit the details. Next we prove a series of three exact bounds on the minimum number of editions needed to convert a given tree into various interval graphs.
Theorem 8. The graph editing problem T → C takes at most n − 5 editions if n is odd, and at most n − 6 editions otherwise; here n is the number of vertices in T ∈ T.

Proof. The proof is by induction on n. The base case is when n = 7.
In that case, the only non-interval tree is the asteroidal triple, which can be changed to a caterpillar by removing the edge (u, v) and adding the edge (u, w). When n = 8, the three non-interval trees are formed from the asteroidal triple by attaching an additional vertex to vertex u, to v, and to w respectively. Each such tree can be made into a caterpillar by removing the edge (u, v) and adding the edge (u, w). Now assume the induction hypothesis holds for all k < n. Consider two cases.
Case 1. Let T be a tree, T not a caterpillar, |T| = n odd. Let v be an end vertex of T, v adjacent to vertex u. Consider the graph T − v; |T − v| = n − 1 is even. Change T − v to a caterpillar (T − v)′ using (n − 1) − 6 = n − 7 steps. Now if (T − v)′ ∪ {v, (u, v)} is a caterpillar, we are done. Else, remove the edge (u, v) and add the edge (v, s₁), where s₁ is the first vertex in the labeling of the spine of (T − v)′. The total number of editions is n − 7 + 2 = n − 5.

Case 2. Let T be a tree, T not a caterpillar, |T| = n even. Then by the result of [1], there exists a splitting of T at v into T₁ and T₂ such that |T₁| ≥ 5 and T₁ is a caterpillar. Now we wish to change T₂ into a caterpillar. If |T₁| = 5, then |T₂| = n − 4, which is even since n is even. Hence by the induction hypothesis, T₂ can be edited with at most (n − 4) − 6 = n − 10 editions. Otherwise, if |T₁| > 5, then |T₂| < n − 4, i.e., |T₂| ≤ n − 5. Again by the induction hypothesis, at most (n − 5) − 5 = n − 10 editions suffice. In either case, the number of editions needed to convert T₂ to a caterpillar T₂′ is at most n − 10. It remains only to re-associate v in T₁ and T₂′ to form the target graph T.

Subcase A. The vertex v is a leaf vertex of either T₁ or T₂′; w.l.o.g. say T₁. Let u be the vertex adjacent to v in T₁. Remove the edge (u, v) to get a disconnected graph containing two components X and Y, both of which are caterpillars. Now we can add one edge between X and Y so that the result is a connected caterpillar. The total number of editions is at most n − 10 + 2 = n − 8.

Subcase B. The vertex v is not a leaf vertex in either T₁ or T₂′. Consider v in T₁. Since T₁ is a caterpillar, v is adjacent to at most two non-leaf vertices, say x and y. Remove the edges (v, x) and (v, y). The resulting graph contains three components, all of which are caterpillars. Now the three components can be connected up by adding at most two edges. The total number of editions is at most n − 10 + 4 = n − 6. ⊓⊔
Theorem 9. There exists a tree T ∈ T on n nodes that requires at least n − 5 editions (respectively n − 6 editions when n is even) to be converted into a caterpillar.
Proof. Consider any sequence of editions of minimum length that converts a tree into a caterpillar. Clearly this sequence does not contain both the operation of inserting an edge e and the operation of deleting that same edge e (in either order). Note that the operations in the sequence can be arbitrarily permuted without changing the overall outcome. Therefore, we can assume without loss of generality that any such sequence has all the delete operations preceding all the insert operations.

Say n is odd. Consider the tree T in Figure 1, where the center node is denoted w. Each path of length two emanating from w (including w) is called a branch. There are k = (n − 1)/2 branches in T. Note that any three of these branches taken together form a 2-star. In order to convert T into a caterpillar, all such 2-stars must be destroyed. If any edge deletion occurs in a branch, we say that branch is broken, and it is unbroken otherwise.
Fig. 1. The trees for Theorem 9: stars of (n−1)/2 (n odd) or (n−2)/2 (n even) length-two branches around a center w.

Let x be the number of broken branches after all deletions have been performed; clearly the number of deletions is at least x. The number of disconnected components left behind after all the deletions have been performed is at least x + 1, since each edge deletion creates one additional component. Any sequence of additions that composes a connected BIG from these components must perform at least x additions. Thus the total number of editions is at least 2x.

Claim. x ≥ k − 2.

Proof. Suppose otherwise that x < k − 2. In that case, at least three branches are unbroken after all deletions have been performed; this implies the presence of at least one 2-star. Consider the component that contains a 2-star. In order to destroy this induced 2-star, one or more edges have to be added within this component. But adding any edge within this component introduces cycles, and hence the resulting graph cannot be a caterpillar. That gives the contradiction. ⊓⊔

Thus it follows that the total number of editions is at least 2x ≥ 2k − 4 = n − 5, proving one part of the theorem. Suppose n is even. Consider the second tree in Figure 1. The number of branches there is k = (n − 2)/2. The rest of the argument above holds, and the minimum number of editions needed is at least 2x ≥ 2(k − 2) = n − 6, proving the other part. ⊓⊔
Theorem 10. The graph editing problem T → I takes at most ⌊(n−5)/2⌋ editions, where I is the class of not necessarily connected interval graphs.

Proof. The proof is by induction on n. The base cases for n = 7 and
n = 8 are the same as in Theorem 8. In each case, the removal of the edge (v, w) yields an interval graph. We assume the hypothesis is true for k and show it for k + 2. First we observe that any tree T with |T| ≥ 3 contains one of the following two subgraphs: (A) a vertex w connected to u and to v, with the rest of the tree connected through w, and u and v not connected to each other; (B) a vertex w connected to u, which is connected to v, with the rest of the tree connected through w, and w and v not connected to each other. In both cases, u and v are not adjacent to any other vertices. Suppose |T| = k + 2.
Case 1. T contains subgraph A. Consider T with u and v removed. By the induction hypothesis, this can be changed to an interval graph T′ using ⌊(k−5)/2⌋ editions. Consider T″ obtained from T′ with the edges (w, u) and (w, v) added. Suppose T″ contains an asteroidal triple. Then both u and v must be part of asteroidal triples, since T′ did not contain any. Let y₁, y₂, u and y₁, y₂, v be asteroidal triples in T″. We claim that the degree of w, denoted δ(w), is 3. Suppose not. Then there exist some vertices x and z adjacent to w. But then either y₁, y₂, x or y₁, y₂, z is an asteroidal triple in T′. But T′ is interval, and therefore only one of x and z can be adjacent to w. Hence δ(w) = 3. Let x be the vertex adjacent to w. Remove the edge (w, x) in T″. The resulting graph is interval. So the number of editions is ⌊(k−5)/2⌋ + 1 = ⌊((k+2)−5)/2⌋.

Case 2. T contains subgraph B. Consider T with u and v removed. By the induction hypothesis, this can be changed to an interval graph T′ using ⌊(k−5)/2⌋ editions. Consider T″ obtained from T′ with the edges (w, u) and (w, v) added. Suppose either u or v forms an asteroidal triple. Remove the edge (w, u) in T″. The resulting graph is interval. So the number of editions is ⌊(k−5)/2⌋ + 1 = ⌊((k+2)−5)/2⌋. ⊓⊔
Theorem 11. There exists a tree T ∈ T such that it takes at least ⌊(n−5)/2⌋ editions to convert it into a not necessarily connected interval graph.
We omit the proof here; use the same example as in Theorem 9. It suffices to stop the argument in Theorem 9 after the deletions have been accounted for. We note that the construction in Theorem 10 produces disconnected interval graphs in some cases (for example, in Case 2 above). In what follows, we study the graph editing problem from trees when the target graph is required to be a connected interval graph.
Theorem 12. The graph editing problem T → CI takes at most ⌊(2n−11)/3⌋ editions, where CI is the class of connected interval graphs.
Proof. The proof is by induction on n. The base cases are n = 7 and n = 8. When n = 7, there is only one non-interval tree, the asteroidal triple. In this case, one edition suffices, since adding the edge (w, x) destroys the 2-star and converts it into a connected interval graph. That proves the base case for n = 7. When n = 8, there are three non-interval trees (up to isomorphism), as in the proof of Theorem 8; in each case, simply adding the edge (w, x) suffices, and the base case holds. For the induction hypothesis, assume the theorem is true for all k < n. Let T be a tree with |T| = n, and suppose the longest path in T has length p. Consider the set of longest paths in T.

Case 1. There exists at least one longest path v₁, v₂, …, v_p such that either δ(v₂) ≥ 3 or δ(v_{p−1}) ≥ 3. W.l.o.g. say δ(v₂) ≥ 3. Now we remove the edge (v₂, v₃). This results in two components T₁ and T₂ such that
T₁ is a star (hence a caterpillar), |T₁| = m ≥ 3, and |T₂| = n − m. The total number of editions needed to transform T into a connected interval graph is the cumulative cost of converting T₂ to a connected interval graph, that of converting T₁, and the additional editing to form and later reconnect the components. The first of these takes at most ⌊(2(n−m)−11)/3⌋ editions. The second takes none. The third amounts to 2: one for deleting the edge (v₂, v₃) and one for connecting the two components. Thus, the number of editions needed in all is at most

⌊(2(n−m)−11)/3⌋ + 2 = ⌊(2(n−m)−11)/3 + 2⌋ ≤ ⌊(2(n−3)−11)/3 + 2⌋

since m ≥ 3. That reduces to at most ⌊(2n−11)/3⌋, which finishes Case 1. Before we consider Case 2, we prove a useful lemma.
Lemma 13. Let T be a connected tree with longest path v₁, v₂, …, v₅ of length 5, |T| = n ≥ 7, and δ(u) ≤ 2 for all u except u = v₃. Then it takes at most ⌊(n−5)/2⌋ editions to convert T into a connected interval graph.

Proof. T has the longest path v₁, …, v₅ and additional paths of length 1 or 2 starting from v₃. Let b be the number of additional paths of length 2 from v₃. We can label these paths v₃, v_{i,1}, v_{i,2} for 1 ≤ i ≤ b. Now adding the edges (v₃, v_{i,2}) for all i gives an interval graph. So we must add exactly b edges. But b ≤ ⌊(n−5)/2⌋. The lemma follows. ⊓⊔
Now we return to Case 2 in the proof of Theorem 12.

Case 2. In every longest path v₁, v₂, …, v_p, δ(v₂) = 2 and δ(v_{p−1}) = 2. (Note that otherwise we have Case 1 above.) If the longest path in the tree is of length 5, then we have precisely the case in Lemma 13, so at most ⌊(n−5)/2⌋ editions are needed, which is at most ⌊(2n−11)/3⌋ for n ≥ 7, as required. So we need only consider trees whose longest path has length at least 6. Consider some longest path v₁, …, v_p. The removal of the edge (v₃, v₄) results in two components T₁ (containing at least the vertices v₁, v₂, v₃) and T₂ (containing at least the vertices v₄, v₅, v₆). So |T₁| = m ≥ 3 and |T₂| = n − m ≥ 3. The longest path in T₁ must be of length at most 5, since otherwise we could construct a path in T longer than v₁, …, v_p. So it takes at most ⌊(m−5)/2⌋ editions to convert T₁ to a connected interval graph by Lemma 13. (If m ≤ 5, the number of editions is 0.)

Subcase A. Suppose |T₂| ≥ 7. Then the total number of editions is at most ⌊(m−5)/2⌋ for T₁, ⌊(2(n−m)−11)/3⌋ for T₂, and 2 to remove the edge (v₃, v₄) and then connect the resulting components. Hence when m ≥ 5, the total number of editions is at most

⌊(m−5)/2⌋ + ⌊(2(n−m)−11)/3⌋ + 2 ≤ ⌊(4n−m−25)/6⌋,

which when m ≥ 5 is at most ⌊(4n−30)/6⌋ = ⌊(2n−15)/3⌋ ≤ ⌊(2n−11)/3⌋,
which completes the proof of Subcase A for m ≥ 5. On the other hand, when m < 5, the total number of editions is at most

0 + ⌊(2(n−m)−11)/3⌋ + 2 = ⌊(2(n−m)−5)/3⌋ ≤ ⌊(2(n−3)−5)/3⌋ = ⌊(2n−11)/3⌋,

which completes the proof of this subcase.

Subcase B. Suppose |T₂| < 7. Then the number of editions needed to convert T₂ to a connected interval graph is 0. So the total number of editions is at most ⌊(m−5)/2⌋ + 0 + 2 = ⌊(m−1)/2⌋ ≤ ⌊(n−4)/2⌋ since m ≤ n − 3. This is at most ⌊(2n−11)/3⌋ when n ≥ 9. The cases n = 7 and n = 8 form the base cases proved earlier. That completes the proof. ⊓⊔

Fig. 2. The trees for Theorem 14.
Theorem 14. There exists a tree T on n nodes such that it needs at least ⌊(2n−11)/3⌋ editions to convert it into a connected interval graph.
Proof. We can argue as in the proof of Theorem 9 and conclude that in any minimum length sequence of editions converting T to a connected interval graph, all the delete operations precede all the insert operations, and that such a sequence does not include both the insertion of an edge e and the deletion of that same edge e (in either order).

First suppose n = 3k + 7 for an integer k = (n−7)/3. The tree T which satisfies the claim here is shown in Figure 2. Each "fork" emanating from w (including w) is called a branch. There are k + 2 branches in T. If any edge deletion occurs in a branch, we say that branch is broken, and it is unbroken otherwise.

Case 1. Suppose at least 2 branches are unbroken. Then each of the remaining k branches has either had at least one deletion or none. In the former case, it needs at least one addition to be reconnected. In the latter case, it needs two additions to remove the 2-stars. Hence, in all, we need at least 2k editions.
Case 2. Suppose fewer than 2 branches are unbroken. Then at least k branches have had at least one deletion, and each needs at least one addition to be reconnected. Hence we need at least 2k editions in this case as well.

In either case, the number of editions is at least 2k = 2(n−7)/3 = (2n−14)/3. That gives the lower bound when n − 7 is a multiple of 3, in other words, when k is an integer. When (n−7)/3 is not an integer, we construct T′ by adding sufficiently many singleton nodes to the branches of T. The same argument as above, coupled with the fact that the number of editions must be an integer, can be used to argue that T′ needs at least ⌈2(n−7)/3⌉ = ⌊2(n−7)/3 + 1⌋ = ⌊(2n−11)/3⌋ editions, which proves the theorem. ⊓⊔
5 Concluding Remarks

Using techniques similar to the ones above, it can be shown that the graph editing problem to caterpillars is NP-complete even with the restriction that the number of additions is at most p. Further, approximating the minimum number of additions required to edit a graph to a caterpillar within an O(n^(1−ε)) additive term is not possible for any ε, 0 < ε < 1, unless P = NP. Both these results follow from the fact that the minimum number of additions required is exactly one less than the size of the smallest caterpillar cover. It can also be shown that approximating the minimum caterpillar cover for chordal graphs to within an O(n^(1−ε)) multiplicative factor for any fixed ε, 0 < ε < 1, is impossible unless P = NP.
Acknowledgements

We thank the referees for numerous comments, especially the suggestions which led to Lemma 2.
References
1. T. Andreae and M. Aigner. The total interval number of a graph. J. Comb. Theory, Series B, 46:7–21, 1989.
2. F. Alizadeh, R.M. Karp, D.K. Weisser, and G. Zweig. Physical mapping of chromosomes using unique probes. J. Comp. Bio., 2(2):153–158, 1995.
3. S. Arnborg and A. Proskurowski. Linear time algorithms for NP-hard problems restricted to partial k-trees. Discrete Applied Mathematics, 23:11–24, 1989.
4. H. Bodlaender. A tourist guide through treewidth. Manuscript, 1995.
5. H. Bodlaender, M. Fellows, M. Hallett, T. Wareham, and T. Warnow. The hardness of problems on thin colored graphs. Manuscript, 1995.
6. H. Bodlaender and B. de Fluiter. Intervalizing k-colored graphs. Proc. ICALP, 1995. See http://www.cs.ruu.nl/~hansb/mypapers2.html for the journal version.
7. H. Bodlaender, M. Fellows, and M. Hallett. Beyond NP-completeness for problems of bounded width. Proc. STOC, 449–458, 1994.
8. K. Booth and G. Lueker. Testing for the consecutive ones property, interval graphs and graph planarity using PQ-tree algorithms. J. Comput. Syst. Sciences, 13:335–379, 1976.
9. N.G. Cooper (editor). The Human Genome Project: Deciphering the Blueprint of Heredity. University Science Books, Mill Valley, California, 1994.
10. P. Crescenzi and V. Kann. The NP-Completeness Compendium, section on subgraphs and supergraphs. http://www.nada.kth.se/~viggo/problemlist/compendium.
11. M. Farach, S. Kannan, and T. Warnow. A robust model for constructing evolutionary trees. Proc. STOC, 1994.
12. M. Fellows, M. Hallett, and T. Wareham. DNA physical mapping: three ways difficult. Proc. First ESA, 157–168, 1993.
13. P.W. Goldberg, M.C. Golumbic, H. Kaplan, and R. Shamir. Four strikes against physical mapping of DNA. J. Comp. Bio., 2(1):139–152, 1995.
14. M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness, pp. 199–200. Freeman, San Francisco, CA, 1979.
15. M.C. Golumbic. Algorithmic Graph Theory and Perfect Graphs. Academic Press, New York, 1980.
16. M. Golumbic, H. Kaplan, and R. Shamir. On the complexity of DNA physical mapping. Adv. Appl. Math., 15:251–261, 1994.
17. H. Kaplan and R. Shamir. Pathwidth, bandwidth and completion problems to proper interval graphs with small cliques. To appear in SIAM J. Computing, 1996.
18. H. Kaplan and R. Shamir. Physical mapping and interval sandwich problems: bounded degrees help. Manuscript, 1996.
19. H. Kaplan, R. Shamir, and R. Tarjan.
Tractability of parameterized completion problems on chordal and interval graphs: minimum fill-in and physical mapping. Proc. FOCS, 1994.
20. R. Karp. Mapping the genome: some combinatorial problems arising in molecular biology. Proc. 25th ACM STOC, 1993.
21. P. Pevzner and M. Waterman. Open combinatorial problems in computational molecular biology. Proc. Third Israel Symposium on Theory of Computing and Systems, Tel Aviv, Israel, Jan 4–6, 1995.
22. J. Rose. A graph-theoretic study of the numerical solution of sparse positive definite systems of linear equations. In R. Read (ed.), Graph Theory and Computing, 183–217, Academic Press, NY, 1972.
23. M. Waterman and J.R. Griggs. Interval graphs and maps of DNA. Bull. of Math. Biol., 48:189–195, 1986.
24. M. Yannakakis. Computing the minimum fill-in is NP-complete. SIAM J. Alg. Disc. Methods, 2, 1981.
25. C. Wang. A subgraph problem from restriction maps of DNA chain. Journal of Computational Biology, 1995.
6 Appendix I. Motivating Biological Scenario

In this section, we very briefly review the motivating biological scenario for studying graph editing problems for bipartite interval graphs (BIGs). Graph editing problems G → H, where H is a class of BIGs, arise in DNA physical mapping in the presence of experimental errors. This scenario is nicely motivated in [23]. Consider mapping a DNA molecule by double digest restriction mapping. The basic operation is that of cutting a DNA strand into disjoint fragments by a restriction enzyme. Two different restriction enzymes A and B are employed separately, and then simultaneously, in a total of three experiments. From the overlap information for the pieces thus obtained, we can construct an overlap graph with vertices representing the fragments cut by A on the left and those cut by B on the right; there is an edge between two vertices if their corresponding fragments overlap. Thus the overlap graph is a bipartite interval graph (BIG). Clearly, in the experimental process the order of the fragments is lost, and the goal is to reconstruct that order based on the overlap graph. This is an easy task provided there were no experimental errors, so that the overlap graph is a BIG [8, 23]. However, biological experiments have significant rates of false positive errors (the presence of an edge between two vertices whose corresponding intervals do not overlap) and false negative errors (the absence of an edge between two vertices whose corresponding intervals do overlap). In that case, the overlap graph is not an interval graph, and biologists are interested in reconstructing the "true" overlap graph (a BIG) from the graph obtained from the error-prone experiments, assuming only a few experimental errors occurred; in other words, with few edge edits. That gives the graph editing problem. It is easy to see that the addition of edges compensates for false negative errors and the deletion of edges compensates for false positive errors.
Model Checking ⋆

Edmund M. Clarke
Department of Computer Science
Carnegie Mellon University, Pittsburgh
ABSTRACT: Model checking is an automatic technique for verifying finite-state reactive systems, such as sequential circuit designs and communication protocols. Specifications are expressed in temporal logic, and the reactive system is modeled as a state-transition graph. An efficient search procedure is used to determine whether or not the state-transition graph satisfies the specifications. We describe the basic model checking algorithm and show how it can be used with binary decision diagrams to verify properties of large state-transition graphs. We illustrate the power of model checking to find subtle errors by verifying part of the Contingency Guidance Requirements for the Space Shuttle.

Keywords: automatic verification, temporal logic, model checking, binary decision diagrams
Model checking is an automatic technique for verifying finite-state reactive systems. Specifications are expressed in a propositional temporal logic, and the reactive system is modeled as a state-transition graph. An efficient search procedure is used to determine automatically if the specifications are satisfied by the state-transition graph. The technique was originally developed in 1981 by Clarke and Emerson [10, 11]. Queille and Sifakis [18] independently discovered a similar verification technique shortly thereafter. An alternative approach, based on showing inclusion between ω-automata, was later devised by Robert Kurshan at AT&T Bell Laboratories [14, 15].

Model checking has a number of advantages over verification techniques based on automated theorem proving. The most important is that the procedure is highly automatic. Typically, the user provides a high level representation of the model and the specification to be checked. The model checker will either terminate with the answer true, indicating that the model satisfies the specification, or give a counterexample execution that shows why the formula is not satisfied. The counterexamples are particularly important in finding subtle errors in complex reactive systems. The first model checkers were able to verify small examples ([1], [2], [3], [4], [11], [13], [16]). However, they were unable to handle very large examples due to the state explosion
⋆ This research is sponsored in part by the Wright Laboratory, Aeronautical Systems Center, Air Force Materiel Command, USAF, and the Advanced Research Projects Agency (ARPA) under grant F33615-93-1-1330, in part by the National Science Foundation under Grant No. CCR-9217549, and in part by the Semiconductor Research Corporation under Contract 92-DJ-294. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. government.
problem. Because of this limitation, many researchers in formal verification predicted that model checking would never be useful in practice. The possibility of verifying systems with realistic complexity changed dramatically in the late 1980s with the discovery of how to represent transition relations using ordered binary decision diagrams (OBDDs) [5]. This discovery was made independently by three research teams [8, 12, 17] and is basically quite simple. Assume that the behavior of a reactive system is determined by n boolean state variables v1, v2, ..., vn. Then the transition relation of the system can be expressed as a boolean formula

R(v1, v2, ..., vn, v1', v2', ..., vn')

where v1, v2, ..., vn represents the current state and v1', v2', ..., vn' represents the next state. By converting this formula to a BDD, a very concise representation of the transition relation may be obtained. The original model checking algorithm, together with the new representation for transition relations, is called symbolic model checking [7, 8, 9]. By using this combination, it is possible to verify extremely large reactive systems. In fact, some examples with more than 10^120 states have been verified [6, 9]. This is possible because the number of nodes in the OBDDs that must be constructed no longer depends on the actual number of states or the size of the transition relation. Because of this breakthrough it is now possible to verify reactive systems with realistic complexity, and a number of major companies, including Intel, Motorola, Fujitsu, and AT&T, have started using symbolic model checkers to verify actual circuits and protocols. In several cases, errors have been found that were missed by extensive simulation.

We illustrate the power of model checking to find subtle errors by considering a protocol used by the Space Shuttle. We discuss the verification of the Three-Engines-Out Contingency Guidance Requirements using the SMV model checker. The example describes what should be done in a situation where all three main engines of the Space Shuttle fail during the ascent. The main task of the Space Shuttle Digital Autopilot is to separate the shuttle from the external tank and dump extra fuel if necessary. The task involves a large number of cases and has many different input parameters. Thus, it is important to make sure that all possible cases and input values are taken into account and that the tank will eventually separate. The Digital Autopilot chooses one of six contingency regions depending on the current flight conditions. Each region uses different maneuvers for separating from the external tank.
This involves computing a guidance quaternion. Usually, the region is chosen once at the beginning of the contingency and is maintained until separation occurs. However, under certain conditions a change of region is allowed. In this case, it is necessary to recompute the quaternion and certain other output values. Using SMV we were able to find a counterexample in the program for this task. We discovered that when a transition between regions occurs, the autopilot system may fail to recompute the quaternion and cause the wrong maneuver to be made. The guidance program consists of about 1200 lines of SMV code. The number of reachable states is 2 × 10^14, and it takes 60 seconds to verify 40 CTL formulas.
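The fixpoint computation at the heart of CTL model checking can be sketched in a few lines. The fragment below is a minimal illustration, not the SMV algorithm: the 4-state transition relation and the "bad" states are invented for the example, and plain Python sets stand in for the OBDDs a symbolic model checker would use. It computes EF(bad) by the least-fixpoint characterization EF p = p ∨ EX (EF p); a safety property AG(¬bad) fails from exactly the states in EF(bad), and the frontier of each iteration is what a model checker walks backwards to produce a counterexample trace.

```python
# Toy backward-reachability check: which states satisfy EF(bad)?
# 'trans' is a hypothetical 4-state transition relation invented
# for this illustration; sets of states stand in for OBDDs.
trans = {0: {1, 2}, 1: {0}, 2: {3}, 3: {3}}

def ef(goal):
    """Least fixpoint of Z = goal ∪ pre(Z): the states satisfying EF(goal)."""
    Z = set(goal)
    while True:
        pre = {s for s, succs in trans.items() if succs & Z}  # EX Z
        if pre <= Z:                # fixpoint reached
            return Z
        Z |= pre

bad = {3}
print(ef(bad))       # all states from which 'bad' is reachable
print(0 in ef(bad))  # AG(not bad) fails from initial state 0
```

With genuine OBDDs, `pre` is computed by boolean operations on the formula R(v, v') rather than by enumerating states, which is what makes 10^120-state systems tractable.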
References

1. M. C. Browne and E. M. Clarke. SML: A high level language for the design and verification of finite state machines. In IFIP WG 10.2 International Working Conference from HDL Descriptions to Guaranteed Correct Circuit Designs, Grenoble, France. IFIP, September 1986.
2. M. C. Browne, E. M. Clarke, and D. Dill. Checking the correctness of sequential circuits. In Proceedings of the 1985 International Conference on Computer Design, Port Chester, New York, October 1985. IEEE.
3. M. C. Browne, E. M. Clarke, and D. Dill. Automatic circuit verification using temporal logic: Two new examples. In Formal Aspects of VLSI Design. Elsevier Science Publishers (North Holland), 1986.
4. M. C. Browne, E. M. Clarke, D. L. Dill, and B. Mishra. Automatic verification of sequential circuits using temporal logic. IEEE Transactions on Computers, C-35(12):1035-1044, 1986.
5. R. E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35(8), 1986.
6. J. R. Burch, E. M. Clarke, and D. E. Long. Symbolic model checking with partitioned transition relations. In A. Halaas and P. B. Denyer, editors, Proceedings of the 1991 International Conference on Very Large Scale Integration, August 1991. Winner of the Sidney Michaelson Best Paper Award.
7. J. R. Burch, E. M. Clarke, K. L. McMillan, and D. L. Dill. Sequential circuit verification using symbolic model checking. In Proceedings of the 27th ACM/IEEE Design Automation Conference. IEEE Computer Society Press, June 1990.
8. J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and J. Hwang. Symbolic model checking: 10^20 states and beyond. In Proceedings of the Fifth Annual Symposium on Logic in Computer Science. IEEE Computer Society Press, June 1990.
9. J. R. Burch, E. M. Clarke, D. E. Long, K. L. McMillan, and D. L. Dill. Symbolic model checking for sequential circuit verification. IEEE Transactions on Computer-Aided Design of Integrated Circuits, 13(4):401-424, April 1994.
10. E. M. Clarke and E. A. Emerson. Synthesis of synchronization skeletons for branching time temporal logic. In Logic of Programs: Workshop, Yorktown Heights, NY, May 1981, volume 131 of Lecture Notes in Computer Science. Springer-Verlag, 1981.
11. E. M. Clarke, E. A. Emerson, and A. P. Sistla. Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Transactions on Programming Languages and Systems, 8(2):244-263, 1986.
12. O. Coudert, C. Berthet, and J. C. Madre. Verification of synchronous sequential machines based on symbolic execution. In J. Sifakis, editor, Proceedings of the 1989 International Workshop on Automatic Verification Methods for Finite State Systems, Grenoble, France, volume 407 of Lecture Notes in Computer Science. Springer-Verlag, June 1989.
13. D. L. Dill and E. M. Clarke. Automatic verification of asynchronous circuits using temporal logic. IEE Proceedings, Part E, 133(5), 1986.
14. Z. Har'El and R. P. Kurshan. Software for analytical development of communications protocols. AT&T Technical Journal, 69(1):45-59, Jan.-Feb. 1990.
15. R. P. Kurshan. Analysis of discrete event coordination. In J. W. de Bakker, W.-P. de Roever, and G. Rozenberg, editors, Proceedings of the REX Workshop on Stepwise Refinement of Distributed Systems, Models, Formalisms, Correctness, volume 430 of Lecture Notes in Computer Science. Springer-Verlag, May 1989.
16. B. Mishra and E. M. Clarke. Hierarchical verification of asynchronous circuits using temporal logic. Theoretical Computer Science, 38:269-291, 1985.
17. C. Pixley. A computational theory and implementation of sequential hardware equivalence. In R. Kurshan and E. Clarke, editors, Proc. CAV Workshop (also DIMACS Tech. Report 90-31), Rutgers University, NJ, June 1990.
18. J. P. Quielle and J. Sifakis. Specification and verification of concurrent systems in CESAR. In Proceedings of the Fifth International Symposium in Programming, 1981.
This article was processed using the LaTeX macro package with LLNCS style
Recursion Versus Iteration at Higher-Orders
A. J. Kfoury Boston University, Boston, MA 02215, USA e-mail: [email protected]
Abstract. We extend the well-known analysis of recursion-removal in
first-order program schemes to a higher-order language of finitely typed and polymorphically typed functional programs, the semantics of which is based on call-by-name parameter-passing. We introduce methods for recursion-removal, i.e., for translating higher-order recursive programs into higher-order iterative programs, and determine conditions under which this translation is possible. Just as finitely typed recursive programs are naturally classified by their orders, so are finitely typed iterative programs. This syntactic classification of recursive and iterative programs corresponds to a semantic (or computational) classification: the higher the order of programs, the more functions they can compute.
1 Background and Motivation

Although our analysis is entirely theoretical, as it combines methods from typed λ-calculi, from abstract recursion theory, and from denotational semantics, the problems we consider have a strong practical motivation. The translation of recursive procedures into iterative procedures, sometimes called recursion-removal, has been a standard feature of programming practice for a long time. A frequent and important use of recursion-removal is, for example, in compiler design; modern treatments of this topic are in [6, Chapter 8] and [18, Chapter 21]. Discussions of other benefits of recursion-removal can be found elsewhere in the literature, e.g., in the books [1, 10, 14], among several others.

The usual implementation of a recursive procedure requires a stack, in addition to basic control mechanisms to change the order of execution in a computation, such as goto instructions. The stack in question is used to save information, the so-called activation records [2, Chapter 10], from successive recursive calls of the procedure. By contrast, the implementation of an iterative procedure does not require a stack, resulting in more efficient use of storage space. By appropriate coding and decoding of arbitrarily long sequences of activation records, it is in principle always possible to avoid the use of a stack and directly implement recursive procedures as iterative ones. The problem is that such a coding and decoding mechanism is not always available, and even if it is, it often leads to implementations that are just as expensive, trading storage space for
Partly supported by NSF grant CCR-9417382.
computational complexity. In the absence of such a coding and decoding mechanism, there are recursive procedures that can be translated into iterative ones, while there are others that cannot. A procedure in the latter case is inherently recursive, in the sense that any other procedure defining the same function cannot be iterative and requires a stack for its implementation, if we also preclude the use of a coding and decoding mechanism to simulate a stack.
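The standard stack-based implementation of recursion can be made concrete with a small, generic illustration (not taken from the paper). The explicit stack below stores exactly what an implementation would keep in activation records, here just the pending multiplier:

```python
# The recursive definition:
def fact(n):
    return 1 if n == 0 else n * fact(n - 1)

# Its iterative translation with an explicit stack of activation
# records: each push saves the caller's state; the second loop
# "returns" from the saved calls in reverse order.
def fact_iter(n):
    stack = []
    while n != 0:
        stack.append(n)      # push an activation record
        n -= 1
    acc = 1
    while stack:
        acc *= stack.pop()   # unwind: resume a suspended call
    return acc
```

For this particular function an accumulator would remove the stack entirely; the schematology results discussed next show that, without such coding tricks, some first-order recursive programs admit no stack-free iterative translation at all.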
Comparative Schematology. The preceding facts are well-known among computer scientists. In the early 1970s, people working in comparative schematology aimed at rigorously establishing these and other similar results. Classes of program schemes were defined, modelling different control mechanisms in conventional programming languages, such as recursion and iteration. An elegant and sometimes highly technical theory was developed to examine questions of inter-translatability between these classes. The first paper in this area, sparking an intensive research activity for many years, was by Paterson and Hewitt [17], who spelled out precise conditions under which recursion-removal is not possible. We summarize the Paterson-Hewitt result by writing ITER < REC, meaning that iteration is a strictly weaker control structure than recursion.
Pebbling vs. Counting. Paterson and Hewitt gave two very different proofs for ITER < REC. Their first proof was based on the pebble game. As it turned out in later years, the pebble game and many of its variants became a basic technique in other areas of theoretical computer science; in particular, the most numerous applications have been in complexity theory, e.g., in time-space tradeoffs [19, 20], rather than in comparative schematology or in logics of programs. The second proof by Paterson and Hewitt was a counting argument. Freely speaking, counting (rather than pebbling) is used to show that a simulating iterative procedure runs out of time (rather than space). This dichotomy between pebbling and counting pervades nearly all studies of the original question of recursion vs. iteration and its later variants. For the early results, see for example the survey in [8, Chapters VII and VIII].
Other Theoretical Studies. Recursion pervades the whole field of computer science. Ways of removing or simplifying recursion abound. The background reviewed above by no means exhausts the topic and only covers that part of the field directly motivating our work. There is, for example, an extensive literature on boundedness of database logic programs; see [9] and the references therein. A database program is bounded if the depth of recursion is a constant independent of the input, in which case the program is equivalent to a recursion-free first-order formula. Another motivation for the study of recursion vs. iteration comes from inductive definability in abstract recursion theory; see for example [15] and [16], where a formal logic framework is defined to study the question. Yet another work is [5], where the question is considered in relation to program synthesis. However interesting these and other studies are, none is directly related to this
paper. Closer to our concerns is the part of the literature on continuations that deals with issues of recursion-removal, specifically, the use of CPS (continuation-passing style) to transform a recursive functional program into tail-recursive form. Although "iteration" and "tail-recursion" coincide in the case of first-order programs, the two notions are not exactly the same at higher orders. In the full version [12] of this report, we adduce several reasons for this distinction; in any case, an attractive theory and many non-trivial results emerge out of the distinction, as shown in [12].
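The CPS route to tail form can be illustrated on a small tree-recursive function; this is a generic sketch, not an example from [12]. Note that while every call in the transformed version is a tail call, the continuation `k` is a functional value built up during the computation, which is precisely why tail-recursion and first-order iteration come apart at higher orders:

```python
# A tree-recursive function: depth of a binary tree (None = leaf,
# a pair = an internal node with two subtrees).
def depth(t):
    return 0 if t is None else 1 + max(depth(t[0]), depth(t[1]))

# CPS transform: the pending work after each recursive call is
# packaged into the continuation 'k', so every call is a tail call,
# but 'k' itself is a higher-order argument.
def depth_cps(t, k=lambda d: d):
    if t is None:
        return k(0)
    return depth_cps(t[0],
                     lambda dl: depth_cps(t[1],
                                          lambda dr: k(1 + max(dl, dr))))
```

The stack has not disappeared: it has been reified into the nested closures that make up `k`, raising the order of the program.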
Some Past Results and Some Questions. This history of results, starting with the Paterson-Hewitt result in 1970, is entirely concerned with the first-order case of recursion vs. iteration. That is, recursive and iterative procedures are always activated by receiving input values of ground type, and they always return an output value, if any, of ground type.

Many modern programming languages allow their programs to use higher-order objects in their computations. The role of such objects is purely auxiliary in that, semantically, higher-order values are not mentioned in the input-output relation of the program. In our setting, only the main procedure in a program is first-order, but it can call intermediate procedures of arbitrary finite orders. It is now important to distinguish between a single procedure and a finite collection of procedures calling each other, of which only one is the main procedure. In this paper, we call such a collection a functional program. The order of a functional program is the highest order of any of its procedures. In general, a functional program is recursive, as no restriction is imposed on its syntax; under the restriction that all of its procedures are iterative, we say that the functional program is iterative.

If ITERn and RECn are, respectively, the set of all iterative functional programs and the set of all recursive functional programs of order ≤ n, where n is a positive integer, then:

(1) ITERn ⊆ RECn

i.e., every program in ITERn, as a syntactic object, is a program in RECn, but not vice-versa. We also have the following strict syntactic hierarchies:

(2) ITER1 ⊂ ITER2 ⊂ ITER3 ⊂ ⋯
(3) REC1 ⊂ REC2 ⊂ REC3 ⊂ ⋯

(ITER1 and REC1 here are the ITER and REC mentioned earlier.) It is well-known that there is a strict computational hierarchy (see, for example, [7] or [11] or references cited therein):

(4) REC1 < REC2 < REC3 < ⋯

In words, increasing the order of functional programs results in a gain of computational power. On the other hand, so far we can only assert, by (2):

(5) ITER1 ≤ ITER2 ≤ ITER3 ≤ ⋯
and the obvious question is whether this computational hierarchy is strict. In the presence of higher-order procedures, the question of recursion vs. iteration splits into two parts. We first ask, for n ≥ 2:

(6) Is ITERn < RECn ?

If the answer to question (6) is yes, it will generalize to higher orders the Paterson-Hewitt result in [17], namely: at every order, there are functional programs that are inherently recursive. We further ask, for n ≥ 1:

(7) Is RECn ≤ ITERn′ for some n′ > n ? If yes, what is the least such n′ ?

In words, question (7) asks whether recursion-removal is always possible after all, but now at the price of increasing the order of the program. These and other related questions, to be formulated in due course, are the immediate motivation for the present work.
Organization of the Paper. The first task is to define precisely the framework of the investigation. Our choices are not the only possible ones: how we define higher-order procedures, and how we interpret and execute them, depends on choices inspired by a particular programming paradigm. This is the paradigm of strongly typed pure functional programs, where there are no side-effects and where the only execution mechanism is call-by-name parameter-passing. This is all done in Section 2. Hence, equivalence between programs here means equivalence under an operational semantics based on call-by-name execution. In addition to the hierarchies {RECn} and {ITERn} mentioned earlier, we define the class p-REC of polymorphically typed recursive programs and the class p-ITER of polymorphically typed iterative programs. Our key technical lemmas are presented in Section 3 and Section 4. From these results, we draw several consequences about the hierarchies {RECn} and {ITERn} in Section 5. Proofs are only sketched in this conference report. Details, as well as further material and related results, are in [12].
2 Basic Definitions: Syntax and Semantics

What we set up is the syntax of a typed λ-calculus + conditionals + recursion. We introduce recursion by means of mutually recursive functional equations, and not by application of fixpoint operators.

Definition 1 (Types). Let TVar be an infinite set of type variables. Let ι and bool be two type constants, which we call ground types. The atomic types are {ι, bool} ∪ TVar. The set T of types is the least set such that

T ⊇ {ι, bool} ∪ TVar ∪ { (σ → τ) | σ, τ ∈ T }

For τ ∈ T, let TVar(τ) denote the finite set of type variables occurring in τ. We partition T into the set Tfin of finite types and the set Tgen of generic types:

Tfin = { τ ∈ T | TVar(τ) = ∅ }   and   Tgen = T − Tfin.
The order of a finite type τ ∈ Tfin is:

order(τ) = 0, if τ = ι or bool;
order(τ) = max{ order(τ1) + 1, order(τ2) }, if τ = (τ1 → τ2).

We do not define order(τ) if τ is generic. In what follows, every term M has a type τ, which we indicate by writing M : τ. If τ is a finite type and M : τ, we say that M is finitely typed. If τ is a generic type and M : τ, we say that M is polymorphically typed. A type substitution is a map S : TVar → T such that { α ∈ TVar | S(α) ≠ α } is finite. Every type substitution extends in a natural way to a {ι, bool, →}-homomorphism S : T → T. For σ, τ ∈ T, we say that τ is an instance of σ, in symbols σ ≼ τ, if there is a type substitution S such that τ = S(σ).

Definition 2 (Terms). A class of functional programs is defined relative to a fixed first-order signature ΣA = Σ ∪ A, where Σ is a finite set of relation and function symbols and A is a countable set of individual symbols. To avoid trivial and uninteresting situations, we require that both Σ ≠ ∅ and A ≠ ∅. Every relation (or function) symbol f ∈ Σ has a fixed arity k ≥ 1, in which case its type is

ι → ⋯ → ι → γ   (with k occurrences of ι), abbreviated as ι^k → γ,
where = bool (or , resp.), i.e. a nite type of order 1. Every constant in A is of type . It is convenient to use two disjoint sets of variables: object variables and function names. For every type there is a countably in nite set of object variables of type , and for every non-atomic type a countably in nite set of function names of type . The set of terms is the smallest containing
f tt, g [ A [
(ground constants)
[ f if j 2 T g [
(other constants)
f object variables g [ f instantiated function names g (variables) and closed under application and -abstraction. The details of the de nition of well-typed terms are given in Figure 1. We omit the type subscript in if and in an instantiated function name F whenever possible, if no ambiguity is introduced. (if M then N else P ) is a sugared version of (if M N P ). For simplicity, we often omit the type of a -binding, and therefore write (v M ) instead of (v : : M ). If a closed term M and all its subterms are nitely typed, the order of M is: order (M ) = max f order ( ) j N M and N : g If M is not closed and N (v M ) is the closure of M , then order (M ) = order (N ). We do not de ne the order of a polymorphically typed term. By a slight abuse of terminology, we often say \M is a n-th order term" to mean that order (M ) 6 n rather than order (M ) = n.
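The order function of Definition 1 is easy to mechanize. The sketch below uses our own encoding, not the paper's notation: finite types are the strings 'iota' and 'bool' or nested triples ('->', σ, τ).

```python
# Finite types encoded as 'iota', 'bool', or ('->', sigma, tau).
def order(t):
    """order(tau) per Definition 1."""
    if t in ('iota', 'bool'):
        return 0
    _, s1, s2 = t                       # t = (s1 -> s2)
    return max(order(s1) + 1, order(s2))

iota = 'iota'
t1 = ('->', iota, iota)                 # iota -> iota        : order 1
t2 = ('->', t1, iota)                   # (iota -> iota) -> iota : order 2
t3 = ('->', iota, t1)                   # iota -> iota -> iota   : order 1
```

Note the asymmetry of the definition: orders rise only through the left of an arrow, so a curried first-order function like `t3` stays at order 1 while `t2`, which takes a function as argument, reaches order 2.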
The rules of Figure 1 are as follows (throughout, γ ∈ {ι, bool}):

Ground constants: c : ι, where type(c) = ι.
Σ-terms: if N1 : ι, ..., Nk : ι and type(f) = ι^k → γ, then (f N1 ⋯ Nk) : γ.
if-terms: if N1 : bool, N2 : τ, and N3 : τ, then (if N1 N2 N3) : τ.
Variables: v : τ, where type(v) = τ.
Function names: F_τ : τ, where type(F) = σ and σ ≼ τ.
Applications: if M : σ → τ and N : σ, then (M N) : τ.
Abstractions: if M : τ, then (λv : σ. M) : (σ → τ).
Programs: if M1 : τ1, M2 : τ2, ..., Mℓ : τℓ, where each Mi is a closed term and type(Fi) = τi for i ∈ {1, ..., ℓ}, then {F1 = M1, F2 = M2, ..., Fℓ = Mℓ} is a well-typed program.
Fig. 1. Rules for well-typed terms and programs. Throughout, γ ∈ {ι, bool}.

Definition 3 (Functional Programs). Let F be a function name of type σ ∈ T. A function definition for F is an equation of the form F = M, where M is an arbitrary closed term of type σ. F on the left of "=" is not instantiated, while every occurrence of F on the right of "=" is instantiated as F_τ, for some τ such that type(F) ≼ τ, and corresponds to a recursive call to F at type τ. A functional program P is a finite non-empty set of function definitions, together with a distinguished function name F such that:

– The type of F is ι^k → γ for some k ≥ 1, where γ = ι or bool.
– For every function name G, P has at most one function definition for G.
– For every function name G, P has a definition for G if there is an instance G_τ of G appearing in P (i.e., on the right of "=", a call to G at some type τ).

The restrictions we impose on the type of the distinguished F have to do with the fact that the inputs and output, if any, of a program are always ground values.
Let P be the functional program {Fi = Mi}1≤i≤ℓ. In an ML-like language that supports polymorphic recursion, P can be written as:

λv1 ⋯ vk. letrec (F1, ..., Fℓ) = (M1, ..., Mℓ) in F1 v1 ⋯ vk

where F1 is the distinguished function of arity k ≥ 1. If the types of all the terms M1, ..., Mℓ and their subterms are finite, then P is finitely typed; otherwise, P is polymorphically typed. If P is finitely typed, then its order is

order(P) = max{ order(M1), ..., order(Mℓ) }
If this order is n ≥ 1, we also say this is an order-n functional program. We do not define the order of a polymorphically typed program. Specific examples of finitely typed and polymorphically typed programs are given in Sections 3 and 4, respectively. Although it does not restrict the later analysis in any way, let us assume that the right-hand side of every function definition Fi = Mi in P is in β-normal form. Under this assumption, it is easy to check that if 1 ≤ order(Fi) < order(Mi), then there is an instantiated function name Fj occurring in Mi such that order(Fj) = order(Mi). Hence, under this assumption, we can equivalently and more simply define order(P) by: order(P) = max{ order(F1), ..., order(Fℓ) }. The syntactic hierarchy of functional programs is given by:
RECn = { finitely typed functional programs of order ≤ n }
REC = ∪n≥1 RECn
p-REC = { polymorphically typed functional programs }

Definition 4 (Call-by-Name Semantics). A functional program computes relative to a ΣA-structure A which assigns a meaning to every symbol in the signature Σ. We take the universe of A to be precisely the set A of all individual constants of type ι. The meaning in A of the symbol f ∈ Σ of type ι^k → γ, where k ≥ 1 and γ = ι (or bool), is a total function from A^k to A (or to {tt, ff}). For the functional program P = {Fi = Mi}1≤i≤ℓ we define the reduction relation ⟶A,P relative to A by the rewrite rules of Figure 2. In the δ-reduction rule, a1, ..., ak are arbitrary elements of A, f ∈ Σ has arity k ≥ 1, and f interpreted in A is a function that maps a1 ⋯ ak to ak+1 ∈ A ∪ {tt, ff}. We often leave the structure A and the program P implicit, and write ⟶ instead of ⟶A,P. We write ⟶* for the reflexive transitive closure of ⟶.

The first term in a computation of program P is always (F a1 ⋯ ak), where F is the distinguished function name of P with arity k ≥ 1, for some a1, ..., ak ∈ A. We call a = a1 ⋯ ak an input vector for P. We call ⟨A, a⟩ an interpretation for P. Note that (F a1 ⋯ ak) is a closed term of a ground type, ι or bool. The next proposition is proved in [11, Sections 2 and 3]. A term is in normal form if it does not contain any redexes, i.e., it cannot be reduced. To reduce a term in normal order means to reduce its leftmost redex at every step.
((λv N) P) ⟶A,P N[v := P]   (β-reduction)

Fi ⟶A,P Mi   (P-reduction)

(if b then N else P) ⟶A,P N, if b = tt;
(if b then N else P) ⟶A,P P, if b = ff   (if-reduction)

(f a1 ⋯ ak) ⟶A,P ak+1   (δ-reduction)

If N ⟶A,P P, then (N Q) ⟶A,P (P Q), (Q N) ⟶A,P (Q P), and (λv N) ⟶A,P (λv P).
Fig. 2. Reduction rules for program P = {Fi = Mi}1≤i≤ℓ.

Proposition 5. Let P be a program over the signature ΣA, whose distinguished function name is F. Let ⟨A, a⟩ be an interpretation for P. If applying the reduction rules in some arbitrary order causes (F a) to terminate at ground value b, then applying the reduction rules in normal order also causes (F a) to terminate at the same ground value b. □

The call-by-name semantics of functional programs corresponds to carrying out their computations, viewed as reduction sequences, in normal order. Let P be a functional program over the signature ΣA, whose distinguished function name is F : ι^k → γ, where k ≥ 1 and γ = ι or bool. Over a ΣA-structure A, the program P defines a partial function P^A : A^k ⇀ A or P^A : A^k ⇀ {tt, ff}, given by:

P^A = { ⟨a, b⟩ | (F a) ⟶* b }

where b ∈ A or b ∈ {tt, ff}, respectively. Implicit in this definition, by the preceding proposition, is that P^A is the call-by-name semantics of P over the structure A.

Definition 6 (Iterative Form). A term M is simple if M does not mention instantiated function names. A term M is left-linear if either M is simple or M ≡ (F_τ N1 ⋯ Nk), where F_τ is an instantiated function name and N1, ..., Nk are simple terms. A function definition F = M is in iterative form (or left-linear form) if
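The force of Proposition 5, that normal-order reduction terminates whenever any reduction order does, can be seen concretely by modelling call-by-name with thunks. This Python sketch is illustrative only (Python itself is call-by-value): arguments are wrapped as zero-argument functions and forced only when used, which mirrors reducing the leftmost redex first.

```python
# Call-by-name modelled with thunks: an argument is a zero-argument
# function, evaluated only at its use sites.
def loop():
    while True:      # a diverging computation
        pass

def K(x, y):         # K discards its second argument
    return x()       # only x is ever forced; y is never evaluated

# Under call-by-value, K(42, loop()) would diverge before K is even
# entered; with thunks, the diverging argument is simply dropped,
# as in normal-order reduction.
result = K(lambda: 42, lambda: loop())
```

Applicative (innermost-first) order corresponds to forcing every thunk on entry, and would diverge here; this is why the semantics of Definition 4 fixes normal order.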
– either M ≡ λv̄. N, where N is left-linear,
– or M ≡ λv̄. if N then P else Q, where N is simple, and P and Q are left-linear.

A functional program {Fi = Mi} is in iterative form if every definition in it is in iterative form. The syntactic hierarchy of iterative programs is given by:

ITERn = { finitely typed iterative programs of order ≤ n }
ITER = ∪n≥1 ITERn
p-ITER = { polymorphically typed iterative programs }
3 From Recursion to Iteration

We restrict our analysis in this section to finitely typed programs, as we do not know how to extend it to polymorphically typed programs. Unless stated otherwise, all programs in this section are finitely typed.

Let P = {Fi = Mi}1≤i≤ℓ be a functional program where F1 is the distinguished function name. The ℓ function definitions in P are mutually recursive. To make explicit the dependence of the function definitions on each other, we may write P = {Fi = Mi(F1, ..., Fℓ)}1≤i≤ℓ, meaning that each Mi is a term that may mention some of the function names in {F1, ..., Fℓ}. Suppose F1 : τ1, ..., Fℓ : τℓ for some types τ1, ..., τℓ ∈ T. As F1 is the distinguished function name, let τ1 ≡ ι^k → γ for some k ≥ 1 and some ground type γ.

We temporarily introduce ℓ + 1 special symbols ⊥0 : γ, ⊥1 : τ1, ..., ⊥ℓ : τℓ with the exhibited types. Intuitively, ⊥i stands for "the evaluation of function Fi has not yet converged". We define a "flowchart" P♭ whose call-by-name semantics coincides with the call-by-name semantics of P. P♭ is shown in Fig. 3, where x1, ..., xk are k input variables, all of type ι, and z is a fresh variable of type γ. The construction of P♭ from P is suggested by standard methods in denotational semantics. P♭ is a convenient description suggesting the later transition to iterative form. In the form of a functional program, not legal in our formalism because of the artificial symbols ⊥0, ⊥1, ..., ⊥ℓ, we can write P♭ as

G x = H x ⊥0 ⊥1 ⋯ ⊥ℓ
H x z F1 ⋯ Fℓ = if (z ≠ ⊥0) then (F1 x) else H x (F1 x) M1′ ⋯ Mℓ′

where x = x1 ⋯ xk, Mi′ is Mi with the function names F1, ..., Fℓ replaced by the variables F1, ..., Fℓ, respectively, and G and H are fresh function names with the following types:

G : ι^k → γ
H : ι^k → γ → τ1 → ⋯ → τℓ → γ

The challenge here is how to turn P♭ into a legal functional program, in particular how to simulate the artificial symbols ⊥0, ⊥1, ..., ⊥ℓ. For this, we first transform P into ((P)), which incorporates an appropriate simulation of ⊥0, ⊥1, ..., ⊥ℓ. In a second stage, we transform ((P)) into iter(P), the final iterative form of P.
Fig. 3. Transforming P into P♭. The flowchart reads:

input x1 ⋯ xk;
(F1, ..., Fℓ) := (⊥1, ..., ⊥ℓ);
repeat
(F1, ..., Fℓ) := (M1(F1, ..., Fℓ), ..., Mℓ(F1, ..., Fℓ));
z := F1 x1 ⋯ xk
until z ≠ ⊥0;
output z.
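The loop of Fig. 3 is just Kleene fixpoint iteration from ⊥. The sketch below runs it for a one-definition program of the Paterson-Hewitt shape, F(x) = if p(x) then x else g(F(L(x)), F(R(x))); the interpretation of p, L, R, g over the naturals is an assumption invented for the demonstration, and `None` plays the role of ⊥:

```python
# Assumed interpretation, chosen so that the recursion terminates.
p = lambda y: y >= 3
L = lambda y: y + 1
R = lambda y: y + 2
g = max

def step(F):
    """One pass of the loop body, F := M(F), with None propagated as ⊥."""
    def Fn(y):
        if p(y):
            return y
        a, b = F(L(y)), F(R(y))
        return None if a is None or b is None else g(a, b)
    return Fn

F = lambda y: None       # F := ⊥
z, x0 = None, 0
while z is None:         # until z ≠ ⊥0
    F = step(F)          # unfold the definition once more
    z = F(x0)            # z := F x0
```

Each pass through the loop produces the next finite approximation of the least fixpoint; z leaves ⊥ exactly when enough unfoldings have accumulated to evaluate F at the given input.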
The transformation ((·)) is defined in Fig. 5. If P mentions an object variable v (resp. a function name F), then ((P)) mentions both v (resp. F) and a fresh object variable v̄ (resp. a fresh function name F̄). If P = {Fi = Mi}1≤i≤ℓ, then ((P)) is the program:

((P)) = { Fi = Mi }1≤i≤ℓ ∪ { F̄i = ((Mi)) }1≤i≤ℓ

Note that P and ((P)) are over the same signature. The earlier definition of P♭ from P suggests the construction of the "flowchart" ((P))♭ from ((P)), shown in Fig. 4, where ci is any closed term of type τi for 1 ≤ i ≤ ℓ. In the form of a functional program, ((P))♭ is basically the desired iterative form of P, denoted iter(P). The formal definition of iter(P) is given next.

Definition 7 (Transformation to Iterative Form). Let P = {Fi = Mi}1≤i≤ℓ be a finitely typed functional program, where:

– F1 : τ1, ..., Fℓ : τℓ for some τ1, ..., τℓ ∈ T,
– F1 is the distinguished function, τ1 ≡ ι^k → γ, and γ ∈ {ι, bool}.

The iterative form iter(P) of P is the following functional program (here Ωτi is the everywhere-ff special symbol of type ((τi)) from Fig. 5):

G x = H x ff c1 Ωτ1 ⋯ cℓ Ωτℓ
H x z F1 F̄1 ⋯ Fℓ F̄ℓ = if z then (F1 x1 ⋯ xk) else H x (F̄1 tt x1 ⋯ tt xk) M1′ ((M1))′ ⋯ Mℓ′ ((Mℓ))′

where

1. x = x1 ⋯ xk.
Fig. 4. Transforming ((P)) into ((P))♭. The flowchart reads:

input x1 ⋯ xk;
(F1, F̄1, ..., Fℓ, F̄ℓ) := (c1, Ωτ1, ..., cℓ, Ωτℓ);
repeat
(F1, F̄1, ..., Fℓ, F̄ℓ) := (M1′, ((M1))′, ..., Mℓ′, ((Mℓ))′);
z := F̄1 tt x1 ⋯ tt xk
until z = tt;
output F1 x1 ⋯ xk.
2. F1 : τ1, F̄1 : ((τ1)), ..., Fℓ : τℓ, F̄ℓ : ((τℓ)) are fresh variables.
3. Mi′ is Mi with F1, ..., Fℓ replaced by the variables F1, ..., Fℓ, respectively.
4. ((Mi))′ is ((Mi)) with F1, F̄1, ..., Fℓ, F̄ℓ replaced by the variables F1, F̄1, ..., Fℓ, F̄ℓ.
5. For i ∈ {1, ..., ℓ}, if τi ≡ τi,1 → ⋯ → τi,ki → γi with ki ≥ 1 and γi ∈ {ι, bool}, then ci ≡ (λv1 : τi,1. ⋯ λvki : τi,ki. a), where a is any ground constant, or variable in {x1, ..., xk}, of type γi.
6. G and H are fresh function names with the following types:

G : ι^k → γ
H : ι^k → bool → τ1 → ((τ1)) → ⋯ → τℓ → ((τℓ)) → γ

Theorem 8. If P is a finitely typed functional program of order n ≥ 1, then iter(P) is an iterative program of order n + 1 equivalent to P.

Proof. Let P = {Fi = Mi}1≤i≤ℓ be as in Definition 7. The equivalence of P and iter(P) follows from the preceding discussion. To check that iter(P) is in iterative form is a straightforward inspection of its definition. Finally, if order(P) = n, we can take:

n = max{ order(F1), ..., order(Fℓ) } = max{ order(τ1), ..., order(τℓ) }

(see Definition 3). This implies:

order(iter(P)) = order(H) = order(ι^k → bool → τ1 → ((τ1)) → ⋯ → τℓ → ((τℓ)) → γ) = n + 1

The last equality (to n + 1) is immediate from the definition of order. □
Types
((ι)) = bool
((bool)) = bool
((σ → τ)) = ((σ)) → σ → ((τ))

Special symbols
Ωι ≡ ff : bool
Ωbool ≡ ff : bool
Ωσ→τ ≡ (λv̄ : ((σ)). λv : σ. Ωτ) : ((σ → τ))

Terms
((c : γ)) = tt : ((γ))
((f N1 ⋯ Nk : γ)) = ((N1)) and ⋯ and ((Nk)) : ((γ))
((if N1 N2 N3 : τ)) = (if ((N1)) (if N1 ((N2)) ((N3))) Ωτ) : ((τ))
((v : τ)) = v̄ : ((τ))
((F : τ)) = F̄ : ((τ))
((M N : τ)) = (((M)) ((N)) N) : ((τ))
((λv : σ. M : σ → τ)) = (λv̄ : ((σ)). λv : σ. ((M))) : ((σ → τ))

Programs
(({F1 = M1, ..., Fℓ = Mℓ})) = { F1 = M1, ..., Fℓ = Mℓ, F̄1 = ((M1)), ..., F̄ℓ = ((Mℓ)) }

Fig. 5. The transformation ((·)) of finitely typed functional programs. Throughout, γ ∈ {ι, bool}, and (b1 and b2) is shorthand for (if b1 b2 ff).

Example 1. The following first-order functional program P is from the Paterson-Hewitt paper [17]:
  F x = if p x then x else g (F (L x)) (F (R x))

where the signature is Σ = {p, L, R, g}, and the types are p : ι → bool, L, R : ι → ι, F : ι → ι, and g : ι → ι → ι. Using the pebble game, Paterson and Hewitt show that this functional program cannot be simulated by a first-order iterative program. By Theorem 8, iter(P) is a second-order iterative program equivalent to P. The details of iter(P) are given next. Let M denote the right-hand side of the function definition in P, i.e. M = λx. if p x then x else g (F (L x)) (F (R x)).
The functional program iter(P) is:

  G x = H x (λv. x) (λv. λw. ff)
  H x z F F' = if z then F x else H x (F' tt x) N P

where G and H are fresh function names with the following types:

  G : ι → ι
  H : ι → bool → (ι → ι) → (bool → ι → bool) → ι

and N and P are the following terms:

  N = λx. if p x then x else g (F (L x)) (F (R x))
  P = λx'. λx. if x' then (if p x then x' else (F' x' (L x)) and (F' x' (R x))) else ff

N is simply M with the function name F replaced by the variable F, and P is ((N)). □

Open Problem. Extend the transformation ((·)) to polymorphically typed functional programs. Define the iterative form iter(P) of a polymorphically typed P based on this extension of ((·)).
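To make Definition 7 concrete, here is a small executable sketch of Example 1 in Python, under an assumed interpretation of the signature that the paper leaves abstract: p(x) = x ≤ 0, L(x) = x − 1, R(x) = x − 2, and g as addition over the integers. The loop maintains the pair (F, F'), where F is the current finite approximation of the recursive function and F' plays the role of the definedness tracker computed by ((·)):

```python
# Hypothetical interpretation of the signature {p, L, R, g} over the integers.
p = lambda x: x <= 0
L = lambda x: x - 1
R = lambda x: x - 2
g = lambda a, b: a + b

def F_rec(x):
    # The recursive program P:  F x = if p x then x else g (F (L x)) (F (R x))
    return x if p(x) else g(F_rec(L(x)), F_rec(R(x)))

def F_iter(x):
    # The iterative form iter(P): iterate the pair (F, F') until the
    # definedness tracker F' reports tt on the input, then read off F.
    F = lambda v: x                     # c1: an arbitrary closed term
    Fd = lambda vd, v: False            # c1': tracker, initially nowhere defined
    while not Fd(True, x):              # z := F' tt x; loop while z = ff
        F, Fd = (
            lambda v, Fp=F: v if p(v) else g(Fp(L(v)), Fp(R(v))),
            lambda vd, v, Fdp=Fd: vd and (vd if p(v)
                                          else Fdp(vd, L(v)) and Fdp(vd, R(v))),
        )
    return F(x)                         # output F x

assert F_iter(10) == F_rec(10)
```

Note how the loop body manipulates F and F' as first-class values, mirroring the jump from order n to order n + 1 in Theorem 8.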
4 Polymorphic Iteration

We prove that there are polymorphically typed iterative programs that cannot be translated into equivalent finitely typed functional programs. We start with an example of a polymorphically typed iterative program P, which we use in the proof of Theorem 9.

Example 2. P is the following polymorphically typed functional program:

  F x = G x f a
  G x F Z = if r x then F Z else G (f x) (λv. λw. v (v w)) F Z

The types of the symbols in the signature Σ = {f, r}, the ground constant a, the variables {x, v, w, F, Z}, and the function names {F, G} are:

  a, x : ι        f : ι → ι        r : ι → bool
  v, F : α → α    w, Z : α
  F : ι → ι       G : ι → (α → α) → α → α

The instance of G in the definition for F has type

  ι → (ι → ι) → ι → ι

The instance of G in the definition for G has type

  ι → ((α → α) → (α → α)) → (α → α) → (α → α)

Noting that P is in iterative form, what we have in this example is a case of polymorphic iteration. If n is the least natural number such that r (f⁽ⁿ⁾ x) is tt, we have the following converging computation:
  F x ⟶ G x f a
      ⟶ G (f x) 2 f a
      ⟶ G (f⁽²⁾ x) 2 2 f a
      ⋮
      ⟶ G (f⁽ⁿ⁾ x) 2 ⋯ 2 f a        (n occurrences of 2)
      ⟶ ⋯ ⟶ 2 ⋯ 2 f a               (n occurrences of 2)
      ⟶ ⋯ ⟶ f (f (⋯ (f a) ⋯)) = (f⁽ᵉ⁽ⁿ⁾⁾ a)   (e(n) occurrences of f)

where 2 denotes the term (λv. λw. v (v w)), and the function e is given by e(0) = 1 and e(n + 1) = 2^e(n) for all n ≥ 0. Each call to G in the computation is at a different type. An explicitly typed intermediate term in this computation is (types inserted as superscripts):

  G^(ι → [k+1] → [k+1]) (f⁽ᵏ⁾ x) 2^[k+1] 2^[k] ⋯ 2^[2] f^[1] a

where 0 ≤ k ≤ n and we use the following type abbreviations: [0] = ι and [k + 1] = ([k] → [k]). Note that because the program is polymorphic, G is repeatedly applied to finitely typed arguments with increasingly complex types. This example is adapted from [11, Example 5.3]. □
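The computation above can be replayed in an untyped setting, where Python's dynamic typing stands in for the polymorphic instantiation of G. The concrete interpretation — naturals, a = 0, f = successor, r true after n applications of f — is our assumption for this sketch, not fixed by the paper:

```python
two = lambda v: lambda w: v(v(w))       # the term 2 = λv.λw. v (v w)

def poly_iter(n):
    # F x = G x f a;  G x F Z = if r x then F Z else G (f x) 2 F Z
    a, f, r = 0, (lambda x: x + 1), (lambda x: x >= n)
    def G(x, F, Z):
        if r(x):
            return F(Z)
        return G(f(x), two, F)(Z)       # each call uses G at a more complex type
    return G(0, f, a)

def e(n):
    # e(0) = 1 and e(n + 1) = 2^e(n)
    return 1 if n == 0 else 2 ** e(n - 1)

# Keep n small: the answer (and the call-nesting depth) is a tower of exponents.
assert [poly_iter(k) for k in range(4)] == [e(k) for k in range(4)]
```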
Theorem 9. There is a polymorphically typed iterative program P which is not equivalent to any finitely typed functional program.
Proof.² The desired P is the program given in Example 2. It suffices to show that for every finitely typed functional program Q of order n ≥ 1, over the signature of P, there is a Σ-structure A such that P^A ≠ Q^A. We choose A to be of the form

  ⟨{a1, …, au} ∪ N; r, f⟩

where u is a positive integer (to be appropriately selected depending on Q), N is the set of natural numbers, r is the predicate such that (r x) = tt iff x = au, and f is the function with (f ai) = ai+1 for 1 ≤ i < u, (f au) = au, and (f i) = i + 1 for every i ∈ N. P mentions only one ground constant, namely a, while Q may mention a as well as several other ground constants. We choose a to be 0 in N, and we can always define A so that all the other ground constants mentioned by Q are in {a1, …, au}. No matter what the value of u is, the computation of P relative to the interpretation ⟨A, a1⟩ converges and returns the value:

  (f⁽ᵉ⁽ᵘ⁾⁾ a) = (f⁽ᵉ⁽ᵘ⁾⁾ 0) = e(u)

² Joint with Pawel Urzyczyn.
We need to select u large enough so that the computation of Q, relative to the same interpretation ⟨A, a1⟩, either diverges, or converges and returns a value ≠ e(u). By Theorem 8, we can restrict Q to be in iterative form. We can therefore write Q in the form of an order-n "flowchart" consisting of, say, k ≥ 1 instructions and ℓ ≥ 1 variables, all of order < n. As there is no "communication" between the elements in {a1, …, au} and the elements in N, the behavior of Q is entirely determined by the substructure B = ⟨{a1, …, au}; r, f⟩.

Define the function exp such that exp(0, n) = n and exp(m + 1, n) = 2^exp(m,n), for all m, n ∈ N. It is not difficult to show there is a fixed polynomial φ : N → N such that if v : σ is a variable in Q and order(σ) = m ≥ 0, then v will be assigned at most exp(m, φ(u)) distinct values, i.e. functions of type σ over the universe {a1, …, au}, in the course of the computation of Q relative to the interpretation ⟨B, a1⟩. If a state of this computation consists of an instruction label (k of them) in Q, together with the values assigned to the ℓ variables in Q, then k · (exp(n, φ(u)))^ℓ is an upper bound on the number of distinct states that Q can visit in the course of its computation. If a state is repeated, the computation is doomed to diverge. Noting that k, ℓ and n depend on Q and are therefore fixed in this proof, let ψ(u) = k · (exp(n, φ(u)))^ℓ.

Suppose z is a variable in Q which is assigned the final output, if any. The value assigned to z can be changed at most ψ(u) times in the course of a converging computation of Q, now relative to ⟨A, a1⟩. Using the fact that a finitely-typed simple term M reduces to normal form in at most exp(|M|, |M|) steps (see [21]), the final value assigned to z cannot exceed exp(p·ψ(u), p·ψ(u)) for some p ∈ N depending on Q. The desired conclusion now follows, because e(u) > exp(p·ψ(u), p·ψ(u)) for sufficiently large u. □
5 Hierarchies

The results of Sections 3 and 4 do not presume that the signature Σ contains a symbol eq : ι² → bool which is always interpreted as the equality relation on the universe A of a Σ-structure. To compare the hierarchies {ITERn} and {RECn}, and use the results of [11], we now assume the existence of such a symbol eq. It is easy to see that Theorems 8 and 9 imply the following computational hierarchy:

  ITER1 < REC1 ≤ ITER2 ≤ REC2 ≤ ITER3 ≤ REC3 ≤ ⋯ < p-ITER ≤ p-REC

The strictness of the first level, ITER1 < REC1, is the classical result of Paterson and Hewitt [17]. Using the already known fact that RECn < RECn+1 for every n ≥ 1 (see [11, Theorem 3.9]), we can conclude:

  ITER1 < ITER3 < ITER5 < ⋯ < p-ITER   (odd orders)
  ITER2 < ITER4 < ITER6 < ⋯ < p-ITER   (even orders)

Although we are not yet able to separate two consecutive levels in this hierarchy, we have already shown that increasing the order of finitely-typed iterative programs results in a net gain of computational power, and that adding polymorphic iteration results in a further gain of computational power. Based on the preceding, it is only natural to state the following.

Conjecture. ITER1 < ITER2 < ITER3 < ITER4 < ⋯   (all orders)

To settle this conjecture, we can proceed in one of two ways to prove that consecutive levels of the hierarchy {ITERn} can indeed be separated. This is similar to the situation in the first-order case, where separation results can be established in two different ways, depending on whether "counting" or "pebbling" is used. The first approach is to sharpen the counting arguments we have used so far, with a view to separating consecutive levels in the hierarchy. The second approach, more problematic at this point, is to try some kind of pebbling argument. This clearly raises the question of how to define a higher-order pebble game. We can define it with the aim of showing directly that ITERn < ITERn+1 or, using the already established reduction RECn ≤ ITERn+1 of Theorem 8, the stronger result that ITERn < RECn for every n ≥ 2. In both cases there are several technical issues to be sorted out. Either of these two results would settle the above conjecture.
References

1. Abelson, H., and Sussman, G., Structure and Interpretation of Computer Programs, MIT Press/McGraw-Hill, NY, 1984.
2. Aho, A.V., and Ullman, J.D., Principles of Compiler Design, Addison-Wesley, 1979.
3. Auslander, M.A., and Strong, H.R., "Systematic recursion removal", Communications of the ACM, 21, no. 2, pp 127-134, Feb 1978.
4. Barendregt, H.P., The Lambda Calculus, Its Syntax and Semantics, revised edition, North-Holland, Amsterdam, 1984.
5. Böhm, C., and Berarducci, A., "Automatic synthesis of typed lambda-programs on term algebras", Theoretical Computer Science, 39, pp 135-154, 1985.
6. Friedman, D.P., Wand, M., and Haynes, C.T., Essentials of Programming Languages, MIT Press/McGraw-Hill, NY, 1992.
7. Goerdt, A., "On the computational power of the finitely typed lambda-terms", in Proceedings of 13th MFCS, LNCS 324, pp 318-328, 1988.
8. Greibach, S.A., Theory of Program Structures: Schemes, Semantics, Verification, LNCS 36, Springer-Verlag, 1975.
9. Hillebrand, G.G., Kanellakis, P.C., Mairson, H.G., and Vardi, M.Y., "Undecidable boundedness problems for Datalog programs", Journal of Logic Programming, 25:2, pp 163-190, 1995.
10. Kamin, S.N., Programming Languages: An Interpreter-Based Approach, Addison-Wesley, 1990.
11. Kfoury, A.J., Tiuryn, J., and Urzyczyn, P., "On the expressive power of finitely typed and universally polymorphic recursive procedures", Theoretical Computer Science, 93, pp 1-41, 1992.
12. Kfoury, A.J., "Recursion, tail-recursion, and iteration at higher-orders". In preparation.
13. Kozen, D., and Tiuryn, J., "Logics of programs", in Handbook of Theoretical Computer Science, Vol. B: Formal Models and Semantics, ed. J. van Leeuwen, Elsevier Science Publ. and The MIT Press, pp 789-840, 1990.
14. Mitchell, J.C., Foundations for Programming Languages, MIT Press, Cambridge, Mass., 1996.
15. Moschovakis, Y.N., Elementary Induction on Abstract Structures, North-Holland, 1974.
16. Moschovakis, Y.N., "The formal language of recursion", Journal of Symbolic Logic, 54, pp 1216-1252, 1989.
17. Paterson, M.S., and Hewitt, C., "Comparative schematology", MIT A.I. Lab Technical Memo No. 201 (also in Proc. of Project MAC Conference on Concurrent Systems and Parallel Computation), 1970.
18. Peyton Jones, S.L., The Implementation of Functional Programming Languages, Prentice-Hall, 1987.
19. Pippenger, N., "Pebbling", Fifth Symposium on Mathematical Foundations of Computer Science, IBM Japan, 1980.
20. Pippenger, N., "Advances in pebbling", Proc. of 9th ICALP, LNCS 140, Springer-Verlag, 1982.
21. Statman, R., "The typed λ-calculus is not elementary recursive", Theoretical Computer Science, 9, pp 73-81, 1979.
22. Strong, H.R., "Translating recursion equations into flowcharts", J. Computer and System Sciences, 5, pp 254-285, 1971.
23. Walker, S.A., and Strong, H.R., "Characterizations of flowchartable recursions", J. Computer and System Sciences, 7, pp 404-447, 1973.
This article was processed using the LaTeX macro package with LLNCS style
Compilation and Equivalence of Imperative Objects

A.D. Gordon¹, P.D. Hankin¹, and S.B. Lassen²

¹ Computer Laboratory, University of Cambridge
² BRICS, Computer Science Department, University of Aarhus
Abstract. We adopt the untyped imperative object calculus of Abadi and Cardelli as a minimal setting in which to study problems of compilation and program equivalence that arise when compiling object-oriented languages. Our main result is a direct proof, via a small-step unloading machine, of the correctness of compilation to a closure-based abstract machine. Our second result is that contextual equivalence of objects coincides with a form of Mason and Talcott's CIU equivalence; the latter provides a tractable means of establishing operational equivalences. Finally, we prove correct an algorithm, used in our prototype compiler, for statically resolving method offsets. This is the first study of correctness of an object-oriented abstract machine, and of CIU equivalence for an object-oriented language.
1 Motivation

This paper collates and extends a variety of operational techniques for describing and reasoning about programming languages and their implementation. We focus on the implementation of imperative object-oriented programs. The language we describe is essentially the untyped imperative object calculus of Abadi and Cardelli [1-3], a small but extremely rich language that directly accommodates object-oriented, imperative and functional programming styles. Abadi and Cardelli invented the calculus to serve as a foundation for understanding object-oriented programming; in particular, they use the calculus to develop a range of increasingly sophisticated type systems for object-oriented programming. We have implemented the calculus as part of a broader project to investigate concurrent object-oriented languages. This paper develops formal foundations and verification methods to document and better understand various aspects of our implementation.

Our work recasts techniques originating in studies of the λ-calculus in the setting of the imperative object calculus. In particular, our reduction relation for the object calculus, our design of an object-oriented abstract machine, our compiler correctness proof and our notion of program equivalence are all based on earlier studies of the λ-calculus. This paper is the first application of these techniques to an object calculus and shows they may easily be re-used in an object-oriented setting.

Our system compiles the imperative object calculus to bytecodes for an abstract machine, implemented in C, based on the ZAM of Leroy's CAML Light [16]. A type-checker enforces the system of primitive self types of Abadi and Cardelli. Since the results of the paper are independent of this type system, we will say no more about it.

In Section 2 we present the imperative object calculus together with a small-step substitution-based operational semantics. Section 3 gives a formal description of an object-oriented abstract machine, a simplification of the machine used in our implementation. We present a compiler from the object calculus to instructions for the abstract machine. We prove the compiler correct by adapting a proof of Rittri [23] to cope with state and objects. In Section 4, we develop a theory of operational equivalence for the imperative object calculus, based on the CIU equivalence of Mason and Talcott [18]. We establish useful equivalence laws and prove that CIU equivalence coincides with Morris-style contextual equivalence [20]. In Section 5, we exercise operational equivalence by specifying and verifying a simple optimisation that resolves at compile-time certain method labels to integer offsets. We discuss related work at the ends of Sections 3, 4 and 5. Finally, we review the contributions of the paper in Section 6. The full version of this paper, with proofs, is available as a technical report [9].
2 An Imperative Object Calculus

We begin with the syntax of an untyped imperative object calculus, the imp-ς calculus of Abadi and Cardelli [3] augmented to include store locations as terms. Let x, y, and z range over an infinite collection of variables. Let ι range over an infinite collection of locations, the addresses of objects in the store. The set of terms of the calculus is given as follows:

  a, b ::=                           term
      x                              variable
      ι                              location
      [ℓi = ς(xi)bi i∈1..n]          object (ℓi distinct)
      a.ℓ                            method selection
      a.ℓ ⇐ ς(x)b                    method update
      clone(a)                       cloning
      let x = a in b                 let

Informally, when an object is created, it is put at a fresh location, ι, in the store, and referenced thereafter by ι. Method selection runs the body of the method with the self parameter (the x in ς(x)b) bound to the location of the object containing the method. Method update allows an existing method in a stored object to be updated. Cloning makes a fresh copy of an object in the store at a new location. The reader unfamiliar with object calculi is encouraged to consult the book of Abadi and Cardelli [3] for many examples and a discussion of the design choices that led to this calculus. Here are the scoping rules for variables: in a method ς(x)b, variable x is bound in b; in let x = a in b, variable x is bound in b. If φ is a phrase of syntax we write fv(φ) for the set of variables that occur free in φ. We say phrase φ is
closed if fv(φ) = ∅. We write φ{{ψ/x}} for the substitution of phrase ψ for each free occurrence of variable x in phrase φ. We identify all phrases of syntax up to alpha-conversion; hence a = b, for instance, means that we can obtain term b from term a by systematic renaming of bound variables. Let o range over objects, terms of the form [ℓi = ς(xi)bi i∈1..n]. In general, the notation φi i∈1..n means φ1, …, φn. Unlike Abadi and Cardelli, we do not identify objects up to re-ordering of methods, since the order of methods in an object is important for an algorithm we present in Section 5 for statically resolving method offsets. Moreover, we include locations in the syntax of terms. This is so we may express the dynamic behaviour of the calculus using a substitution-based operational semantics. In Abadi and Cardelli's closure-based semantics, locations appear only in closures and not in terms. If φ is a phrase of syntax, let locs(φ) be the set of locations that occur in φ. Let a term a be a static term if locs(a) = ∅. The static terms correspond to the source syntax accepted by our compiler. Terms containing locations arise during reduction. As an example of programming in the imperative object calculus, here is an encoding of the call-by-value λ-calculus:
  λ(x)b ≜ [arg = ς(z)z.arg, val = ς(s)let x = s.arg in b]
  b(a) ≜ let y = a in (b.arg ⇐ ς(z)y).val

where y ≠ z, and s and y do not occur free in b. It is like an encoding from Abadi
and Cardelli's book, but with right-to-left evaluation of function application. Given updateable methods, we can easily extend this encoding to express an ML-style call-by-value λ-calculus with updateable references.

Before proceeding with the formal semantics for the calculus, we fix notation for finite lists and finite maps. We write finite lists in the form [φ1, …, φn], which we usually write as [φi i∈1..n]. Let φ :: [φi i∈1..n] = [φ, φi i∈1..n]. Let [φi i∈1..m] @ [ψj j∈1..n] = [φi i∈1..m, ψj j∈1..n]. Let a finite map, f, be a list of the form [xi ↦ φi i∈1..n], where the xi are distinct. When f = [xi ↦ φi i∈1..n] is a finite map, let dom(f) = {xi i∈1..n}. For the finite map f = f′ @ [x ↦ φ] @ f″, let f(x) = φ. When f and g are finite maps, let the map f + (x ↦ φ) be f′ @ [x ↦ φ] @ f″ if f = f′ @ [x ↦ ψ] @ f″, and otherwise (x ↦ φ) :: f.

Now we specify a small-step substitution-based operational semantics for the calculus [8,18]. Let a store, σ, be a finite map [ιi ↦ oi i∈1..n] from locations to objects. Each stored object consists of a collection of labelled methods. The methods may be updated individually. Abadi and Cardelli use a method store, a finite map from locations to methods, in their operational semantics of imperative objects. We prefer to use an object store, as it explicitly represents the grouping of methods in objects. Let a configuration, c or d, be a pair (a, σ) where a is a term and σ is a store. Let a reduction context, R, be a term given by the following grammar, with one free occurrence of a distinguished variable, •:
  R ::= • | R.ℓ | R.ℓ ⇐ ς(x)b | clone(R) | let x = R in b
We write R[a] for the outcome of filling the single occurrence of the hole • in a reduction context R with the term a. Let the small-step substitution-based reduction relation, c → d, be the smallest relation satisfying the following, where in each rule the hole in the reduction context R represents 'the point of execution'.

(Red Object) (R[o], σ) → (R[ι], σ′)
  if σ′ = (ι ↦ o) :: σ and ι ∉ dom(σ).
(Red Select) (R[ι.ℓj], σ) → (R[bj{{ι/xj}}], σ)
  if σ(ι) = [ℓi = ς(xi)bi i∈1..n] and j ∈ 1..n.
(Red Update) (R[ι.ℓj ⇐ ς(x)b], σ) → (R[ι], σ′)
  if σ(ι) = [ℓi = ς(xi)bi i∈1..n], j ∈ 1..n, and
  σ′ = σ + (ι ↦ [ℓi = ς(xi)bi i∈1..j−1, ℓj = ς(x)b, ℓi = ς(xi)bi i∈j+1..n]).
(Red Clone) (R[clone(ι)], σ) → (R[ι′], σ′)
  if σ(ι) = o, σ′ = (ι′ ↦ o) :: σ and ι′ ∉ dom(σ).
(Red Let) (R[let x = ι in b], σ) → (R[b{{ι/x}}], σ).

Let a store σ be well formed if and only if fv(σ(ι)) = ∅ and locs(σ(ι)) ⊆ dom(σ) for each ι ∈ dom(σ). Let a configuration (a, σ) be well formed if and only if fv(a) = ∅, locs(a) ⊆ dom(σ) and σ is well formed. A routine case analysis shows that reduction sends a well formed configuration to a well formed configuration, and that reduction is deterministic up to the choice of freshly allocated locations in the rules for object formation and cloning. Let a configuration c be terminal if and only if there is a store σ and a location ι such that c = (ι, σ). We say a configuration c converges to d, written c ⇓ d, if and only if d is a terminal configuration and c →* d. Because reduction is deterministic, whenever c ⇓ d and c is well formed, the configuration d is unique up to the renaming of any newly generated locations in the store component of d. Abadi and Cardelli define a big-step closure-based operational semantics for the calculus: it relates a configuration directly to the final outcome of taking many individual steps of computation, and it uses closures, rather than a substitution primitive, to link variables to their values.
We find the small-step substitution-based semantics better suited for the proofs in Sections 3 and 5, as well as for developing the theory of operational equivalence in Section 4. We have proved, using an inductively defined relation unloading closures to terms, that our semantics is consistent with theirs in the following sense:

Proposition 1. For any closed static term a, there is d such that (a, []) ⇓ d if and only if evaluation of a converges in Abadi and Cardelli's semantics.
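The reduction rules determine a simple definitional interpreter. The sketch below uses a tuple encoding of terms of our own devising (not the paper's), with the store held as a Python dict from integer locations to method lists; recursive descent plays the role of the reduction context R in (Red Object), (Red Select), (Red Update), (Red Clone) and (Red Let):

```python
def subst(t, x, v):
    """Substitute closed term v for variable x in term t."""
    tag = t[0]
    if tag == 'var':
        return v if t[1] == x else t
    if tag == 'loc':
        return t
    if tag == 'obj':
        return ('obj', [(l, s, b if s == x else subst(b, x, v))
                        for (l, s, b) in t[1]])
    if tag == 'sel':
        return ('sel', subst(t[1], x, v), t[2])
    if tag == 'upd':
        _, a, l, s, b = t
        return ('upd', subst(a, x, v), l, s, b if s == x else subst(b, x, v))
    if tag == 'clone':
        return ('clone', subst(t[1], x, v))
    if tag == 'let':
        _, y, a, b = t
        return ('let', y, subst(a, x, v), b if y == x else subst(b, x, v))

def ev(t, store):
    """Evaluate configuration (t, store) to a location, updating the store."""
    tag = t[0]
    if tag == 'loc':
        return t[1]
    if tag == 'obj':                     # (Red Object): allocate a fresh location
        i = len(store)
        store[i] = list(t[1])
        return i
    if tag == 'sel':                     # (Red Select): run body with self bound
        i = ev(t[1], store)
        for (l, s, b) in store[i]:
            if l == t[2]:
                return ev(subst(b, s, ('loc', i)), store)
    if tag == 'upd':                     # (Red Update): replace method in place
        _, a, l, s, b = t
        i = ev(a, store)
        store[i] = [(l2, s2, b2) if l2 != l else (l, s, b)
                    for (l2, s2, b2) in store[i]]
        return i
    if tag == 'clone':                   # (Red Clone): copy object at fresh loc
        i = ev(t[1], store)
        store[len(store)] = list(store[i])
        return len(store) - 1
    if tag == 'let':                     # (Red Let)
        _, y, a, b = t
        i = ev(a, store)
        return ev(subst(b, y, ('loc', i)), store)

# let x = [get = ς(s)s] in clone(x).get — selecting get returns the clone itself
prog = ('let', 'x', ('obj', [('get', 's', ('var', 's'))]),
        ('sel', ('clone', ('var', 'x')), 'get'))
store = {}
assert ev(prog, store) == 1 and len(store) == 2
```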
3 Compilation to an Object-Oriented Abstract Machine

In this section we present an abstract machine for imperative objects, a compiler sending the object calculus to the instruction set of the abstract machine, and a proof of correctness. The proof depends on an unloading procedure which converts configurations of the abstract machine back into configurations of the
object calculus from Section 2. The unloading procedure depends on a modified abstract machine whose accumulator and environment contain object calculus terms as well as locations.

The instruction set of our abstract machine consists of the operations, ranged over by op, given as follows: access i; object[(ℓi, ops_i) i∈1..n] (ℓi distinct); select ℓ; update(ℓ, ops); clone; or let(ops), where ops ranges over operation lists. We represent compilation of a term a to an operation list ops by the judgment xs ⊢ a ⇒ ops, defined by the following rules. The variable list xs includes all the free variables of a; it is needed to compute the de Bruijn index of each variable occurring in a.

(Trans Var) [xi i∈1..n] ⊢ xj ⇒ [access j] if j ∈ 1..n.
(Trans Object) xs ⊢ [ℓi = ς(yi)ai i∈1..n] ⇒ [object[(ℓi, ops_i) i∈1..n]]
  if yi :: xs ⊢ ai ⇒ ops_i and yi ∉ xs for all i ∈ 1..n.
(Trans Select) xs ⊢ a.ℓ ⇒ ops @ [select ℓ] if xs ⊢ a ⇒ ops.
(Trans Update) xs ⊢ (a.ℓ ⇐ ς(x)a′) ⇒ ops @ [update(ℓ, ops′)]
  if xs ⊢ a ⇒ ops and x :: xs ⊢ a′ ⇒ ops′ and x ∉ xs.
(Trans Clone) xs ⊢ clone(a) ⇒ ops @ [clone] if xs ⊢ a ⇒ ops.
(Trans Let) xs ⊢ let x = a in a′ ⇒ ops @ [let(ops′)]
  if xs ⊢ a ⇒ ops and x :: xs ⊢ a′ ⇒ ops′ and x ∉ xs.

An abstract machine configuration, C or D, is a pair (P, σ), where P is a state and σ is a store, given as follows:

  P, Q ::= (ops, E, AC, RS)      machine state
  E ::= [ιi i∈1..n]              environment
  AC ::= [] | [ι]                accumulator
  RS ::= [Fi i∈1..n]             return stack
  F ::= (ops, E)                 closure
  O ::= [(ℓi, Fi) i∈1..n]        stored object (ℓi distinct)
  σ ::= [ιi ↦ Oi i∈1..n]         store (ιi distinct)

In a configuration ((ops, E, AC, RS), σ), ops is the current program. Environment E contains variable bindings. Accumulator AC either holds the result of evaluating a term, AC = [ι], or a dummy value, AC = []. Return stack RS holds return addresses during method invocations. Store σ associates locations with objects. Two transition relations, given next, represent execution of the abstract machine. A β-transition, C ↦β
D, corresponds directly to a reduction in the object calculus. A τ-transition, C ↦τ D, is an internal step of the abstract machine, either a method return or a variable lookup. Lemma 3 relates reductions of the object calculus and transitions of the abstract machine.

(τ Return) (([], E, AC, (ops, E′) :: RS), σ) ↦τ ((ops, E′, AC, RS), σ).
(τ Access) ((access j :: ops, E, [], RS), σ) ↦τ ((ops, E, [ιj], RS), σ)
  if E = [ιi i∈1..n] and j ∈ 1..n.
(β Clone) ((clone :: ops, E, [ι], RS), σ) ↦β ((ops, E, [ι′], RS), σ′)
  if σ(ι) = O, σ′ = (ι′ ↦ O) :: σ and ι′ ∉ dom(σ).
(β Object) ((object[(ℓi, ops_i) i∈1..n] :: ops, E, [], RS), σ) ↦β
  ((ops, E, [ι], RS), (ι ↦ [(ℓi, (ops_i, E)) i∈1..n]) :: σ) if ι ∉ dom(σ).
(β Select) ((select ℓj :: ops, E, [ι], RS), σ) ↦β
  ((ops_j, ι :: Ej, [], (ops, E) :: RS), σ)
  if σ(ι) = [(ℓi, (ops_i, Ei)) i∈1..n] and j ∈ 1..n.
(β Update) ((update(ℓ, ops′) :: ops, E, [ι], RS), σ) ↦β ((ops, E, [ι], RS), σ′)
  if σ(ι) = O @ [(ℓ, F)] @ O′ and σ′ = σ + (ι ↦ O @ [(ℓ, (ops′, E))] @ O′).
(β Let) ((let(ops′) :: ops, E, [ι], RS), σ) ↦β
  ((ops′, ι :: E, [], (ops, E) :: RS), σ).
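These transition rules can be animated directly. The sketch below (our own encoding, not the paper's implementation: instructions are tuples, the store a dict, and AC is None when the accumulator is empty) runs compiled operation lists to termination:

```python
def run(ops, store):
    E, AC, RS = [], None, []
    while True:
        if not ops:
            if not RS:                     # terminal state
                return AC, store
            (ops, E), RS = RS[0], RS[1:]   # (Return): pop the return stack
            continue
        op, ops = ops[0], ops[1:]
        if op[0] == 'access':              # (Access): 1-based de Bruijn index
            AC = E[op[1] - 1]
        elif op[0] == 'object':            # (Object): allocate fresh location
            i = len(store)
            store[i] = [(l, (body, E)) for (l, body) in op[1]]
            AC = i
        elif op[0] == 'select':            # (Select): enter method closure
            body, E2 = dict(store[AC])[op[1]]
            RS = [(ops, E)] + RS
            ops, E, AC = body, [AC] + E2, None
        elif op[0] == 'update':            # (Update): overwrite method closure
            _, l, body = op
            store[AC] = [(l2, f) if l2 != l else (l, (body, E))
                         for (l2, f) in store[AC]]
        elif op[0] == 'clone':             # (Clone): copy object at fresh loc
            store[len(store)] = list(store[AC])
            AC = len(store) - 1
        elif op[0] == 'let':               # (Let): bind AC, push return address
            RS = [(ops, E)] + RS
            ops, E, AC = op[1], [AC] + E, None

# code for: let x = [get = ς(s)s] in x.get
ops = [('object', [('get', [('access', 1)])]),
       ('let', [('access', 1), ('select', 'get')])]
AC, store = run(ops, {})
assert AC == 0 and list(store) == [0]
```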
Each rule apart from the first tests whether the accumulator is empty or not. We can show that this test is always redundant when running code generated by our compiler. In the machine of the full version of this paper [9], we replace the accumulator with an argument stack, a list of values.

To prove the abstract machine and compiler correct, we need to convert back from a machine state to an object calculus term. To do so, we load the state into a modified abstract machine, the unloading machine, and when this unloading machine terminates, its accumulator contains the term corresponding to the original machine state. The unloading machine is like the abstract machine, except that instead of executing each instruction, it reconstructs the corresponding source term. Since no store lookups or updates are performed, the unloading machine does not act on a store. An unloading machine state is like an abstract machine state, except that locations are generalised to arbitrary terms. Let an unloading machine state, p or q, be a quadruple (ops, e, ac, RS) where e takes the form [ai i∈1..n] and ac takes the form [] or [a]. Next we make a simultaneous inductive definition of a u-transition relation p ↦u q and an unloading relation, (ops, e) ↝ (x)b, that unloads a closure to a method.

(u Access) (access j :: ops′, e, [], RS) ↦u (ops′, e, [aj], RS)
  if j ∈ 1..n and e = [ai i∈1..n].
(u Object) (object[(ℓi, ops_i) i∈1..n] :: ops′, e, [], RS) ↦u
  (ops′, e, [[ℓi = ς(xi)bi i∈1..n]], RS) if (ops_i, e) ↝ (xi)bi for each i ∈ 1..n.
(u Clone) (clone :: ops′, e, [a], RS) ↦u (ops′, e, [clone(a)], RS).
(u Select) (select ℓ :: ops′, e, [a], RS) ↦u (ops′, e, [a.ℓ], RS).
(u Update) (update(ℓ, ops) :: ops′, e, [a], RS) ↦u (ops′, e, [a.ℓ ⇐ ς(x)b], RS)
  if (ops, e) ↝ (x)b.
(u Let) (let(ops′) :: ops″, e, [a], RS) ↦u (ops″, e, [let x = a in b], RS)
  if (ops′, e) ↝ (x)b.
(u Return) ([], e, ac, (ops, E) :: RS) ↦u (ops, E, ac, RS).
(Unload Closure) (ops, e) ↝ (x)b if x ∉ fv(e) and (ops, x :: e, [], []) ↦u* ([], e′, [b], []).

We complete the machine with the following unloading relations: O ↝ o (on objects), σ ↝ s (on stores) and C ↝ c (on configurations).

(Unload Object) [(ℓi, (ops_i, Ei)) i∈1..n] ↝ [ℓi = ς(xi)bi i∈1..n]
  if (ops_i, Ei) ↝ (xi)bi for all i ∈ 1..n.
(Unload Store) [ιi ↦ Oi i∈1..n] ↝ [ιi ↦ oi i∈1..n] if Oi ↝ oi for all i ∈ 1..n.
(Unload Config) ((ops, E, AC, RS), σ) ↝ (a, s)
  if σ ↝ s and (ops, E, AC, RS) ↦u* ([], e′, [a], []).

We can prove the following:

Lemma 2. Whenever [] ⊢ a ⇒ ops, then ((ops, [], [], []), []) ↝ (a, []).
Lemma 3.
(1) If C ↝ c and C ↦τ D then D ↝ c.
(2) If C ↝ c and C ↦β D then there is d such that D ↝ d and c → d.

Let a big-step transition relation, C ⇓ D, on machine states hold if and only if there are ι, E, σ with D = (([], E, [ι], []), σ) and C (↦β ∪ ↦τ)* D.
Lemma 4.
(1) If C ↝ c and C ⇓ D then there is d with D ↝ d and c ⇓ d.
(2) If C ↝ c and c ⇓ d then there is D with D ↝ d and C ⇓ D.
Theorem 5. Whenever [] ⊢ a ⇒ ops, then for all d, (a, []) ⇓ d if and only if there is D with ((ops, [], [], []), []) ⇓ D and D ↝ d.

Proof. By Lemma 2 we have ((ops, [], [], []), []) ↝ (a, []). Suppose (a, []) ⇓ d. By Lemma 4(2), ((ops, [], [], []), []) ↝ (a, []) and (a, []) ⇓ d imply there is D with D ↝ d and ((ops, [], [], []), []) ⇓ D. Conversely, suppose ((ops, [], [], []), []) ⇓ D for some D. By Lemma 4(1), ((ops, [], [], []), []) ↝ (a, []) and ((ops, [], [], []), []) ⇓ D imply there is d with D ↝ d and (a, []) ⇓ d. □

In the full version of this paper [9], we prove correct a richer machine, based on the machine used in our implementation, that supports functions as well as objects. The full machine has a larger instruction set than the one presented here, needs a more complex compiler, and has an argument stack instead of an accumulator. The correctness proof is similar to the one for the machine presented here.
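The compilation judgment xs ⊢ a ⇒ ops is syntax-directed, so it transcribes into a short recursive function. The sketch below uses tuple encodings of terms and instructions of our own devising, one clause per (Trans ...) rule:

```python
def compile_(xs, t):
    tag = t[0]
    if tag == 'var':                      # (Trans Var): 1-based de Bruijn index
        return [('access', xs.index(t[1]) + 1)]
    if tag == 'obj':                      # (Trans Object)
        return [('object', [(l, compile_([s] + xs, b)) for (l, s, b) in t[1]])]
    if tag == 'sel':                      # (Trans Select)
        return compile_(xs, t[1]) + [('select', t[2])]
    if tag == 'upd':                      # (Trans Update)
        _, a, l, s, b = t
        return compile_(xs, a) + [('update', l, compile_([s] + xs, b))]
    if tag == 'clone':                    # (Trans Clone)
        return compile_(xs, t[1]) + [('clone',)]
    if tag == 'let':                      # (Trans Let)
        _, x, a, b = t
        return compile_(xs, a) + [('let', compile_([x] + xs, b))]

# [] ⊢ let x = [get = ς(s)s] in x.get ⇒ [object[...], let[access 1, select get]]
prog = ('let', 'x', ('obj', [('get', 's', ('var', 's'))]),
        ('sel', ('var', 'x'), 'get'))
assert compile_([], prog) == [
    ('object', [('get', [('access', 1)])]),
    ('let', [('access', 1), ('select', 'get')]),
]
```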
There is a large literature on proofs of interpreters based on abstract machines, such as Landin's SECD machine [12,22,25]. Since no compiled machine code is involved, unloading such abstract machines is easier than unloading an abstract machine based on compiled code. The VLISP project [11], using denotational semantics as a metalanguage, is the most ambitious verification to date of a compiler-based abstract machine. Other work on compilers deploys metalanguages such as calculi of explicit substitutions [13] or process calculi [28]. Rather than introduce a metalanguage, we prove correctness of our abstract machine directly from its operational semantics. We adopted Rittri's idea [23] of unloading a machine state to a term via a specialised unloading machine. Our proof is simpler than Rittri's, and goes beyond it by dealing with state and objects.

Even in the full version of the paper there are differences, of course, between our formal model of the abstract machine and our actual implementation. One difference is that we have modelled programs as finitely branching trees, whereas in the implementation programs are tables of bytecodes indexed by a program counter. Another difference is that our model omits garbage collection, which is essential to the implementation. Therefore Theorem 5 only implies that the compilation strategy is correct; bugs may remain in its implementation.
4 Operational Equivalence of Imperative Objects

The standard operational definition of term equivalence is Morris-style contextual equivalence [20]: two terms are equivalent if and only if they are interchangeable in any program context without any observable difference; the observations are typically the programs' termination behaviour. Contextual equivalence is the largest congruence relation that distinguishes observably different programs. Mason and Talcott [18] prove that, for functional languages with state, contextual equivalence coincides with so-called CIU ("Closed Instances of Use") equivalence. Informally, two terms are CIU equivalent if and only if they have identical termination behaviour when placed in the redex position in an arbitrary configuration and locations are substituted for the free variables. Although contextual equivalence and CIU equivalence are the same relation, the definition of the latter is typically easier to work with in proofs.

In this section we adopt CIU equivalence as our notion of operational equivalence for imperative objects. We establish a variety of equivalence laws. We show that operational equivalence is a congruence, and hence supports compositional equational reasoning. Finally, we prove that CIU equivalence coincides with contextual equivalence, as in Mason and Talcott's setting.

We define static terms a and a′ to be operationally equivalent, a ≃ a′, if, for all variables x1, …, xn, all static reduction contexts R with fv(R[a]) ∪ fv(R[a′]) ⊆ {x1, …, xn}, all well formed stores σ, and all locations ι1, …, ιn ∈ dom(σ), we have that the configurations (R[a]{{ιi/xi i∈1..n}}, σ) and (R[a′]{{ιi/xi i∈1..n}}, σ) either both converge or both do not converge.

It follows easily from the definition of operational equivalence that it is an equivalence relation on static terms and, moreover, that it is preserved by static
reduction contexts:
(Cong R)   if a ≈ a′ and locs(R) = ∅ then R[a] ≈ R[a′]
From the definition of operational equivalence, it is possible to show a multitude of equational laws for the constructs of the calculus. For instance, the let construct satisfies laws corresponding to those of Moggi's computational λ-calculus [19], presented here in the form given by Talcott [27].
Proposition 6.
(1) (let x = y in b) ≈ b{{y/x}}
(2) (let x = a in R[x]) ≈ R[a], if x ∉ fv(R)
The effect of invoking a method that has just been updated is the same as running the method body of the update with the self parameter bound to the updated object.
Proposition 7. (a.ℓ ⇐ ς(x)b).ℓ ≈ (let x = (a.ℓ ⇐ ς(x)b) in b)
The following laws characterise object constants and their interaction with the other constructs of the calculus.
Proposition 8. Suppose o = [ℓi = ς(xi)bi i∈1..n] and j ∈ 1..n.
(1) o.ℓj ≈ (let xj = o in bj)
(2) (o.ℓj ⇐ ς(x)b) ≈ [ℓi = ς(xi)bi i∈1..j−1; ℓj = ς(x)b; ℓi = ς(xi)bi i∈j+1..n]
(3) clone(o) ≈ o
(4) (let x = o in R[clone(x)]) ≈ (let x = o in R[o]), if x ∉ fv(o)
(5) (let x = o in b) ≈ b, if x ∉ fv(b)
(6) (let x = a in let y = o in b) ≈ (let y = o in let x = a in b), if x ∉ fv(o) and y ∉ fv(a)
It is also possible to give equational laws for updating and cloning, but we omit the details. Instead, let us look at an example of equational reasoning using the laws above. Recall the encoding of call-by-value functions from Section 2.
λ(x)b  def=  [arg = ς(z)z.arg; val = ς(s)let x = s.arg in b]
b(a)  def=  let y = a in (b.arg ⇐ ς(z)y).val
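As an illustrative aside, this encoding can be exercised with a small Python model of imperative objects (entirely our own sketch, not the paper's implementation: objects are mutable dictionaries of methods in a store, values are store locations, and the names `alloc`, `select`, `update`, `lam` and `apply_` are invented here):

```python
# Minimal model: a value is a location (int) into a store of objects;
# an object is a dict mapping labels to methods; a method is a Python
# function taking the store and the self-location.
def alloc(store, methods):
    store.append(dict(methods))
    return len(store) - 1

def select(store, loc, label):
    return store[loc][label](store, loc)        # invoke with self bound to loc

def update(store, loc, label, method):
    store[loc][label] = method                  # imperative update in place
    return loc

# Encoding of call-by-value functions:
#   lambda(x)b = [arg = sigma(z) z.arg, val = sigma(s) let x = s.arg in b]
#   b(a)       = let y = a in (b.arg <= sigma(z) y).val
def lam(body):                                  # body: (store, x_value) -> value
    return lambda store: alloc(store, {
        "arg": lambda st, z: select(st, z, "arg"),   # loops if invoked before update
        "val": lambda st, s: body(st, select(st, s, "arg")),
    })

def apply_(store, f_loc, arg_value):
    y = arg_value                               # let y = a in ...
    return select(store, update(store, f_loc, "arg", lambda st, z: y), "val")

# beta_v on a concrete instance: (lambda(x) x)(v) yields v.
store = []
v = alloc(store, {})                            # an arbitrary object value
identity = lam(lambda st, x: x)(store)
print(apply_(store, identity, v) == v)          # True
```

Note how the default arg method diverges if invoked before being updated, which matches the encoding's intent: application always overwrites arg with the actual argument before selecting val.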
From the laws for let and for object constants, the following calculation shows the validity of βv-reduction, (λ(x)b)(y) ≈ b{{y/x}}. Let o = [arg = ς(z)y; val = ς(s)let x = s.arg in b], where z ≠ y.
(λ(x)b)(y)
  ≈ ((λ(x)b).arg ⇐ ς(z)y).val          by Prop. 6(1)
  ≈ o.val                              by Prop. 8(2) and (Cong R)
  ≈ let s = o in let x = s.arg in b    by Prop. 8(1)
  ≈ let x = o.arg in b                 by Prop. 6(2)
  ≈ let x = (let z = o in y) in b      by Prop. 8(1) and (Cong R)
  ≈ let x = y in b                     by Prop. 8(5) and (Cong R)
  ≈ b{{y/x}}                           by Prop. 6(1)
This derivation uses the fact that operational equivalence is preserved by static reduction contexts, (Cong R). More generally, to reason compositionally we need operational equivalence to be preserved by arbitrary term constructs, that is, to be a congruence. The following may be proved in several ways, most simply by an adaptation of the corresponding congruence proof for a λ-calculus with references by Honsell, Mason, Smith and Talcott [14].
Proposition 9. Operational equivalence is a congruence. From Proposition 9 it easily follows that operational equivalence coincides with Morris-style contextual equivalence. Let a term context, C, be a term containing some holes. Let the term C[a] be the outcome of filling each hole in the context C with the term a.
Theorem 10. a ≈ a′ if and only if, for all term contexts C with locs(C) = ∅ such that C[a] and C[a′] are closed, (C[a], []) converges if and only if (C[a′], []) converges.
Earlier studies of operational equivalence of stateless object calculi [10,15,24] rely on bisimulation equivalence. See Stark [26] for an account of the difficulties of defining bisimulation in the presence of imperative effects. The main influence on this section is the literature on operational theories for functional languages with state [14,18]. Agha, Mason, Smith and Talcott study contextual equivalence, but not CIU equivalence, for a concurrent object-oriented language based on actors [5]. Ours is the first development of CIU equivalence for an object-oriented language. Our experience is that existing techniques for functional languages with state scale up well to deal with the object-oriented features of the imperative object calculus. Some transformations for rearranging side effects are rather cumbersome to express in terms of equational laws, as they depend on variables being bound to distinct locations. We have not pursued this issue in great depth. For further study it would be interesting to consider program logics such as VTLoE [14], where it is possible to express such conditions directly.
5 Example: Static Resolution of Labels
In Section 3 we showed how to compile the imperative object calculus to an abstract machine that represents objects as finite lists of labels paired with method closures. A frequent operation is to resolve a method label, that is, to compute the offset of the method with that label from the beginning of the list. This operation is needed to implement both method select and method update. In general, resolution of method labels needs to be carried out dynamically, since one cannot always compute statically the object to which a select or an update will apply. However, when the select or update is performed on a newly created object, or on self, it is possible to resolve method labels statically. The purpose of this section is to exercise our framework by presenting an algorithm for statically resolving method labels in these situations and proving it correct.
To represent our intermediate language, we begin by extending the syntax of terms to include selects of the form a.j and updates of the form a.j ⇐ ς(x)b, where j is a positive integer offset. The intention is that at runtime, a resolved select ι.j proceeds by running the jth method of the object stored at ι. If the jth method of this object has label ℓ, this will have the same effect as ι.ℓ. Similarly, an update ι.j ⇐ ς(x)b proceeds by updating the jth method of the object stored at ι with method ς(x)b. If the jth method of this object has label ℓ, this will have the same effect as ι.ℓ ⇐ ς(x)b. To make this precise, the operational semantics of Section 2 and the abstract machine and compiler of Section 3 may easily be extended with integer offsets. We omit all the details. All the results proved in Sections 3 and 4 remain true for this extended language. We need the following definitions to express the static resolution algorithm.
A ::= [ℓi i∈1..n]          layout type (ℓi distinct)
SE ::= [xi ↦ Ai i∈1..n]    static environment (xi distinct)
The algorithm infers a layout type, A, for each term it encounters. If the layout type A is [ℓi i∈1..n], with n > 0, the term must evaluate to an object of the form [ℓi = ς(xi)bi i∈1..n]. On the other hand, if the layout type A is [], nothing has been determined about the layout of the object to which the term will evaluate. An environment SE is a finite map that associates layout types to the free variables of a term. We express the algorithm as the following recursive routine resolve(SE, a), which takes an environment SE and a static term a with fv(a) ⊆ dom(SE), and produces a pair (a′, A), where static term a′ is the residue of a after resolution of labels known from layout types to integer offsets, and A is the layout type of both a and a′. We use p to range over both labels and integer offsets.
resolve(SE, x)  def=  (x, SE(x))
  where x ∈ dom(SE)
resolve(SE, [ℓi = ς(xi)ai i∈1..n])  def=  ([ℓi = ς(xi)ai′ i∈1..n], A)
  where A = [ℓi i∈1..n] and (ai′, Bi) = resolve((xi ↦ A) :: SE, ai), xi ∉ dom(SE), for each i ∈ 1..n
resolve(SE, a.p)  def=  (a′.j, []) if j ∈ 1..n and p = ℓj; (a′.p, []) otherwise
  where (a′, [ℓi i∈1..n]) = resolve(SE, a)
resolve(SE, a.p ⇐ ς(x)b)  def=  (a′.j ⇐ ς(x)b′, A) if j ∈ 1..n and p = ℓj; (a′.p ⇐ ς(x)b′, A) otherwise
  where (a′, A) = resolve(SE, a), A = [ℓi i∈1..n] and (b′, B) = resolve((x ↦ A) :: SE, b), x ∉ dom(SE)
resolve(SE, clone(a))  def=  (clone(a′), A)
  where (a′, A) = resolve(SE, a)
resolve(SE, let x = a in b)  def=  (let x = a′ in b′, B)
  where (a′, A) = resolve(SE, a) and (b′, B) = resolve((x ↦ A) :: SE, b), x ∉ dom(SE)
To illustrate the algorithm in action, suppose that false is the object:
[val = ς(s)s.ff; tt = ς(s)[]; ff = ς(s)[]]
Then resolve([], false) returns the following:
([val = ς(s)s.3; tt = ς(s)[]; ff = ς(s)[]], [val; tt; ff])
The method select s.ff has been statically resolved to s.3. The layout type [val; tt; ff] asserts that false will evaluate to an object with this layout. Our prototype implementation of the imperative object calculus optimises any closed static term a by running the routine resolve([], a) to obtain an optimised term a′ paired with a layout type A. We have proved that this optimisation is correct in the sense that a′ is operationally equivalent to a.
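A minimal executable sketch of the resolve routine, over our own toy tuple encoding of terms and with tuples as layout types (an assumption for illustration, not the paper's prototype), reproduces the example above:

```python
# Toy AST (our own encoding):
#   ("var", x) | ("obj", [(label, self_var, body), ...]) |
#   ("sel", a, p) | ("upd", a, p, self_var, body) |
#   ("clone", a) | ("let", x, a, b)
# A layout type is a tuple of labels; () means "layout unknown".

def resolve(env, t):
    """Return (t', A): t with statically known labels replaced by
    1-based integer offsets, plus the inferred layout type of t."""
    kind = t[0]
    if kind == "var":
        return t, env[t[1]]
    if kind == "obj":
        _, methods = t
        A = tuple(label for (label, _, _) in methods)
        new_methods = []
        for (label, s, body) in methods:
            body2, _ = resolve({**env, s: A}, body)   # self has layout A
            new_methods.append((label, s, body2))
        return ("obj", new_methods), A
    if kind == "sel":
        _, a, p = t
        a2, A = resolve(env, a)
        if p in A:                       # label statically known: use offset
            return ("sel", a2, A.index(p) + 1), ()
        return ("sel", a2, p), ()        # result layout is unknown
    if kind == "upd":
        _, a, p, s, body = t
        a2, A = resolve(env, a)
        body2, _ = resolve({**env, s: A}, body)
        if p in A:
            return ("upd", a2, A.index(p) + 1, s, body2), A
        return ("upd", a2, p, s, body2), A
    if kind == "clone":
        a2, A = resolve(env, t[1])
        return ("clone", a2), A
    if kind == "let":
        _, x, a, b = t
        a2, A = resolve(env, a)
        b2, B = resolve({**env, x: A}, b)
        return ("let", x, a2, b2), B
    raise ValueError(kind)

# The example: false = [val = sigma(s) s.ff, tt = sigma(s)[], ff = sigma(s)[]]
false_obj = ("obj", [("val", "s", ("sel", ("var", "s"), "ff")),
                     ("tt", "s", ("obj", [])),
                     ("ff", "s", ("obj", []))])
optimised, layout = resolve({}, false_obj)
print(layout)           # ('val', 'tt', 'ff')
print(optimised[1][0])  # ('val', 's', ('sel', ('var', 's'), 3))
```

The select on self is resolved to offset 3 precisely because self is given the object's own layout type when each method body is analysed.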
Theorem 11. Suppose a is a closed static term. If routine resolve([], a) returns (a′, A), then a ≈ a′.
On a limited set of test programs, the algorithm converts a majority of selects and updates into the optimised form. However, the speedup ranges from modest (10%) to negligible; the interpretive overhead in our bytecode-based system tends to swamp the effect of optimisations such as this. It is likely to be more effective in a native code implementation. In general, there are many algorithms for optimising access to objects; see Chambers [7], for instance, for examples and a literature survey. The idea of statically resolving labels to integer offsets is found also in the work of Ohori [21], who presents a λ-calculus with records and a polymorphic type system such that a compiler may compute integer offsets for all uses of record labels. Our system is rather different, in that it exploits object-oriented references to self.
6 Conclusions
In this paper, we have collated and extended a range of operational techniques which we have used to verify aspects of the implementation of a small object-oriented programming language, Abadi and Cardelli's imperative object calculus. The design of our object-oriented abstract machine was not particularly difficult; we simply extended Leroy's abstract machine with instructions for manipulating objects. Our first result is a correctness proof for the abstract machine and its compiler, Theorem 5. Such results are rather more difficult than proofs of interpretive abstract machines. Our contribution is a direct proof method which avoids the need for any metalanguage, such as a calculus of explicit substitutions. Our second result is that Mason and Talcott's CIU equivalence coincides with Morris-style contextual equivalence, Theorem 10. The benefit of CIU equivalence is that it allows the verification of compiler optimisations. We illustrate this by proving Theorem 11, which asserts that an optimisation algorithm from our implementation preserves contextual equivalence.
This is the first study of correctness of compilation to an object-oriented abstract machine. It is also the first study of program equivalence for the imperative object calculus, a topic left unexplored by Abadi and Cardelli's book. To the best of our knowledge, the only other work on the imperative object calculus is a program logic due to Abadi and Leino [4] and a brief presentation, without discussion of equivalence, of a labelled transition system for untyped imperative objects in the thesis of Andersen and Pedersen [6]. In principle, we believe our compiler correctness proof would scale up to proving correctness of a Java compiler emitting instructions for the Java virtual machine (JVM) [17]. To carry this out would require formal descriptions of the operational semantics of Java, the JVM and the compiler. Due to the scale of the task, the proof would require machine support.
Acknowledgements Martín Abadi, Carolyn Talcott and several anonymous referees commented on a draft. Gordon holds a Royal Society University Research Fellowship. Hankin holds an EPSRC Research Studentship. Lassen is supported by a grant from the Danish Natural Science Research Council.
References
1. M. Abadi and L. Cardelli. An imperative object calculus: Basic typing and soundness. In Proceedings SIPL'95, 1995. Technical Report UIUCDCS-R-95-1900, Department of Computer Science, University of Illinois at Urbana-Champaign.
2. M. Abadi and L. Cardelli. An imperative object calculus. Theory and Practice of Object Systems, 1(3):151-166, 1996.
3. M. Abadi and L. Cardelli. A Theory of Objects. Springer-Verlag, 1996.
4. M. Abadi and K.R.M. Leino. A logic of object-oriented programs. In Proceedings TAPSOFT '97, volume 1214 of Lecture Notes in Computer Science, pages 682-696. Springer-Verlag, April 1997.
5. G. Agha, I. Mason, S. Smith and C. Talcott. A foundation for actor computation. Journal of Functional Programming, 7(1), January 1997.
6. D.S. Andersen and L.H. Pedersen. An operational approach to the ς-calculus. Master's thesis, Department of Mathematics and Computer Science, Aalborg, 1996. Available as Report R-96-2034.
7. C. Chambers. The Design and Implementation of the Self Compiler, an Optimizing Compiler for Object-Oriented Programming Languages. PhD thesis, Computer Science Department, Stanford University, March 1992.
8. M. Felleisen and D. Friedman. Control operators, the SECD-machine, and the λ-calculus. In Formal Description of Programming Concepts III, pages 193-217. North-Holland, 1986.
9. A.D. Gordon, S.B. Lassen and P.D. Hankin. Compilation and equivalence of imperative objects. Technical Report 429, University of Cambridge Computer Laboratory, 1997. Also appears as BRICS Report RS-97-19, BRICS, Department of Computer Science, University of Aarhus.
10. A.D. Gordon and G.D. Rees. Bisimilarity for a first-order calculus of objects with subtyping. In Proceedings POPL'96, pages 386-395. ACM, 1996. Accepted for publication in Information and Computation.
11. J.D. Guttman, V. Swarup and J. Ramsdell. The VLISP verified Scheme system. Lisp and Symbolic Computation, 8(1/2):33-110, 1995.
12. J. Hannan and D. Miller. From operational semantics to abstract machines. Mathematical Structures in Computer Science, 4(2):415-489, 1992.
13. T. Hardin, L. Maranget and B. Pagano. Functional back-ends and compilers within the lambda-sigma calculus. In ICFP'96, May 1996.
14. F. Honsell, I. Mason, S. Smith and C. Talcott. A variable typed logic of effects. Information and Computation, 119(1):55-90, 1993.
15. H. Hüttel and J. Kleist. Objects as mobile processes. In Proceedings MFPS'96, 1996.
16. X. Leroy. The ZINC experiment: an economical implementation of the ML language. Technical Report 117, INRIA, 1990.
17. T. Lindholm and F. Yellin. The Java Virtual Machine Specification. The Java Series. Addison-Wesley, 1997.
18. I. Mason and C. Talcott. Equivalence in functional languages with effects. Journal of Functional Programming, 1(3):287-327, 1991.
19. E. Moggi. Notions of computation and monads. Information and Computation, 93:55-92, 1989. Earlier version in Proceedings LICS'89.
20. J.H. Morris. Lambda-Calculus Models of Programming Languages. PhD thesis, MIT, December 1968.
21. A. Ohori. A compilation method for ML-style polymorphic record calculi. In Proceedings POPL'92, pages 154-165. ACM, 1992.
22. G.D. Plotkin. Call-by-name, call-by-value and the lambda calculus. Theoretical Computer Science, 1:125-159, 1975.
23. M. Rittri. Proving compiler correctness by bisimulation. PhD thesis, Chalmers, 1990.
24. D. Sangiorgi. An interpretation of typed objects into typed π-calculus. In FOOL 3, New Brunswick, 1996.
25. P. Sestoft. Deriving a lazy abstract machine. Technical Report 1994-146, Department of Computer Science, Technical University of Denmark, September 1994.
26. I. Stark. Names, equations, relations: Practical ways to reason about new. In TLCA '97, number 1210 in LNCS, pages 336-353. Springer, 1997.
27. C. Talcott. Reasoning about functions with effects. In Higher Order Operational Techniques in Semantics, Publications of the Newton Institute, pages 347-390. Cambridge University Press, 1997. To appear.
28. M. Wand. Compiler correctness for parallel languages. In Proceedings FPCA'95, pages 120-134. ACM, June 1995.
On the Expressive Power of Rewriting
Massimo Marchiori
CWI, Kruislaan 413, NL-1098 SJ Amsterdam, The Netherlands
[email protected]
Abstract
In this paper we address the open problem of classifying the expressive power of classes of rewriting systems. We introduce a framework to reason about the relative expressive power between classes of rewrite systems, with respect to any property of interest P. In particular, we investigate four main classes of rewriting systems: left-linear Term Rewriting Systems, Term Rewriting Systems, Normal Conditional Term Rewriting Systems and Join Conditional Term Rewriting Systems. It is proved that, for all the main properties of interest of rewriting systems (completeness, termination, confluence, normalization etc.), these four classes form a hierarchy of increasing expressive power, with two total gaps, between left-linear TRSs and TRSs, and between TRSs and normal CTRSs, and with no gaps between normal CTRSs and join CTRSs. Therefore, these results formally prove the strict increase of expressive power between left-linear and non-left-linear term rewriting, and between unconditional and conditional term rewriting, and clarify in what sense normal CTRSs can be seen as equivalent in power to join CTRSs.
Keywords: Term Rewriting Systems, Conditional Term Rewriting Systems, Observable Properties, Compilers.
1 Introduction
While term rewriting is a well established field, a satisfactory formal study of the expressive power of classes of rewriting systems is still an open problem. All the works that have so far tried to shed some light on this fundamental topic managed only to focus on particular instances of the problem, failing to provide a general approach to the study of expressive power. The first work on the subject is [3]: imposing the restriction that no new symbol can be added, it provides an example of a conditional algebraic specification that is not expressible via unconditional ones. While this basic result is interesting, it only started shaping a view on the subject, since the imposed restriction of not having new symbols is clearly extremely limiting.
A subsequent attempt to study some aspects of expressibility of rewrite systems was made in [2], where it was shown that 'weakly uniquely terminating' TRSs are more expressive than complete TRSs, in the sense that they can express some 'TRS-suitable' congruence/representatives pair that cannot be expressed by the latter class. Later, [6] showed that linear TRSs are in a sense less powerful than term rewriting systems: that paper showed that linear TRSs generate fewer sets of terms than non-linear TRSs when so-called "OI" or "IO passes" are considered. Both of these works only focus on particular instances of the expressibility problem, and suffer from a severe lack of generality. The first work exhibits an ad-hoc result for a specific property and with a suitable notion of "expressibility". In the second work, TRSs are employed considering "passes" and not the usual reductions; also, the method cannot be used to prove other gaps w.r.t. other paradigms, for instance ones more expressive than TRSs, since TRSs already generate every recursively enumerable set of terms. Finally, the restriction to linear TRSs is rather strong. A somewhat related work is [9], where the authors have successfully investigated the equivalence among various types of conditional rewriting systems, and studied the confluence property. However, no concept of 'expressive power' of a class is investigated. In essence, the big problem is to set up a meaningful definition of expressive power. If we identify expressive power with computational power, then the problem becomes of little interest: for instance, every class of rewrite systems containing the left-linear TRSs is equivalent to the class of left-linear TRSs, since a Turing machine can be simulated via a left-linear rewrite rule ([5]). In this paper, we give a rigorous definition of what it means for a class of rewrite systems to be at least as expressive as another class with respect to a certain property of interest P.
The solution is to employ a constructive transformation that translates every rewrite system of one class into a rewrite system of the other. The translation must satisfy some regularity conditions: roughly speaking, the produced rewrite system must not compute 'less' than the original one, and moreover the structure of the target class has to be respected, in the sense that if part of the rewrite system is already in it, it is left untouched. We show how, via such mappings, called unravelings, we can study the expressive power of rewriting systems with respect to any property of interest P. More precisely, we focus on the relative expressive power of four main classes of rewriting systems: left-linear Term Rewriting Systems, Term Rewriting Systems, Normal Conditional Term Rewriting Systems, and Join Conditional Term Rewriting Systems. It is formally proven that, for all the main properties of interest of rewriting systems (termination, confluence, normalization etc.), these four classes form a hierarchy of increasing expressive power, with two total gaps, one between left-linear TRSs and TRSs, and the other between TRSs and normal CTRSs, and no gaps between normal CTRSs and join CTRSs.
Therefore, these results formally prove the strict increase of expressive power between left-linear and non-left-linear term rewriting, and between unconditional and conditional term rewriting. Also, they exactly formalize in what sense normal and join CTRSs can be considered equivalent: there is no difference in expressive power between these two paradigms for any major observable property. Besides the theoretical relevance, it is also shown how this difference of expressive power can clarify the intrinsic difficulty of analysis of certain classes of rewriting systems with respect to others, and the power of existing transformations among classes of rewriting systems (for instance, compilations). The paper is organized as follows. After some short preliminaries in Section 2, we introduce in Section 3 the notions of unraveling and of expressiveness w.r.t. a given property. In the subsequent three sections, we perform a thorough study of the relative expressive power of left-linear TRSs, TRSs, normal CTRSs and join CTRSs: Section 4 compares the expressive power of left-linear TRSs with that of TRSs. Section 5 compares the expressive power of TRSs with that of normal CTRSs. Section 6 compares the expressive power of normal CTRSs with that of join CTRSs. Finally, Section 7 presents the resulting expressive hierarchy of rewriting systems, discusses the gap results obtained via slightly different hypotheses, and explains the impact of the expressive power analysis for compilers and transformations.
2 Preliminaries
We assume knowledge of the basic notions regarding conditional term rewriting systems and term rewriting systems (cf. [7, 13]). In this paper we will deal with join and normal CTRSs, that is, in the first case rules are of the form l → r ⇐ s1↓t1, ..., sk↓tk (with Var(r, s1, t1, ..., sk, tk) ⊆ Var(l), where Var(s) denotes the variables of the term s), and in the second case of the form l → r ⇐ s1 →! n1, ..., sk →! nk (with Var(r, s1, ..., sk) ⊆ Var(l), and n1, ..., nk ground normal forms). In place of join CTRSs we will often simply say CTRSs. As far as the major properties of (C)TRSs are concerned, we will employ the standard acronym UN→ to denote uniqueness of normal forms w.r.t. reduction (a term can have at most one normal form). Also, we will consider the standard notions of completeness (confluence plus termination), normalization (every term has a normal form) and semi-completeness (confluence plus normalization). If R is a rewrite system, then its use as a subscript of a rewrite relation indicates that the rewriting is meant to be in R: for example, s →_R t means that s reduces to t in R. Finally, to enhance readability, we will often identify a single rule with the corresponding rewrite system: for instance, instead of writing a one-rule TRS like {a → b}, we will simply write a → b.
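As a concrete aside (our own toy encoding, not from the paper), the notions above can be observed on ground TRSs with a few lines of Python: terms are nested tuples, and enumerating the normal forms reachable from a term is enough to witness a failure of UN→. The sketch handles ground rules only, so variable rules like f(X, X) → X used later are out of its scope.

```python
# Terms are nested tuples: ("f", ("a",)) stands for f(a).
def subterms(t):
    """Yield (position, subterm) pairs; positions are tuples of child indices."""
    yield (), t
    for i, child in enumerate(t[1:], 1):
        for pos, s in subterms(child):
            yield (i,) + pos, s

def replace(t, pos, new):
    if not pos:
        return new
    i = pos[0]
    return t[:i] + (replace(t[i], pos[1:], new),) + t[i + 1:]

def one_step(rules, t):
    """All one-step reducts of ground term t under ground rules (lhs, rhs)."""
    return [replace(t, pos, rhs)
            for (lhs, rhs) in rules
            for pos, s in subterms(t) if s == lhs]

def normal_forms(rules, t):
    """All normal forms reachable from t (assumes the system is terminating)."""
    reducts = one_step(rules, t)
    if not reducts:
        return {t}
    return set().union(*(normal_forms(rules, u) for u in reducts))

# {a -> b, a -> c}: the term a reaches two distinct normal forms, so UN-> fails.
R = [(("a",), ("b",)), (("a",), ("c",))]
print(normal_forms(R, ("a",)))           # the two normal forms b and c
print(normal_forms(R, ("f", ("a",))))    # f(b) and f(c)
```

This system is terminating but neither confluent nor UN→, which also separates completeness (confluence plus termination) from plain termination on a concrete instance.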
3 Expressiveness
Our approach allows us to study the expressive power of rewrite systems in detail, focusing on every single property of interest. In a sense, it is a 'behavioural' approach, since we consider expressiveness with respect to a given property (the 'observable'). If a class C′ of rewrite systems has at least the same "expressive power" as another class C with respect to a certain observable property P, then there should be a transformation that, given a rewrite system R in C, produces a corresponding rewrite system R′ in C′ that is 'behaviourally equivalent' to R and that 'computes at least as much as' R. Behaviourally equivalent means that R′ and R should be the same with respect to the observable property P: that is to say, R ∈ P if and only if R′ ∈ P. On the other hand, the fact that R′ 'computes at least as much as' R already has a quite standard definition in the literature. In [9] the notion of logical strength has been introduced; more precisely, given two rewriting systems R and R′, R′ is said to be logically stronger than R if ↓R ⊆ ↓R′ and ↓R′ ⊈ ↓R; also, R′ is said to have the same logical strength as R if ↓R = ↓R′. Thus, R′ 'computes at least as much as' R can be read as R′ 'has at least the same logical strength as' R, that is to say ↓R ⊆ ↓R′. The proper formalization of the intuitive notion of transformation is given by the concept of unraveling:

Definition 3.1 Given two classes C and C′ of rewrite systems, an unraveling of C into C′ is a computable map U from C to C′ such that
1. ∀R ∈ C: ↓R ⊆ ↓U(R)
2. ∀R ∈ C: if R = R′ ∪ R″ with R′ ∈ C′, then U(R) = R′ ∪ U(R″)
3. ∀R ∈ C: if R is finite, then U(R) is finite

The first condition is just, as said, the requirement that the produced rewrite system has at least the same logical strength as the original one. The second condition says that if we are unraveling a rewrite system into C′, we can extract from it the part that is already in C′, and then go on computing the unraveling (roughly speaking, the unraveling must respect the structure of C′, since we are interested in the relative increase of expressive power). The third condition ensures that if we have a finite system we don't get an infinite system by unraveling. We can then define the notion of expressiveness with respect to a given property P:

Definition 3.2 Given two classes C and C′ of rewrite systems, C′ is said to be at least as expressive as C w.r.t. the property P if there is an unraveling U of C into C′ such that ∀R ∈ C: R ∈ P if and only if U(R) ∈ P.

C′ is said to be as expressive as C (w.r.t. P) if C′ is at least as expressive as C w.r.t. P and vice versa. Finally, C′ is said to be more expressive than C (w.r.t. P) if C′ is at least as expressive as C w.r.t. P but not vice versa. The following proposition formalizes the intuitively obvious fact that if a class of rewrite systems C′ contains another class C, then it is also at least as expressive as C.

Proposition 3.3 Given two classes of rewrite systems C and C′, if C ⊆ C′ then C′ is at least as expressive as C w.r.t. every property P.
Proof Take a property P. The identity map I from C to C′ is an unraveling of C into C′, as is immediate to check. Readily, ∀R ∈ C: R ∈ P if and only if I(R) = R ∈ P, and so C′ is at least as expressive as C.

The importance of establishing whether or not a certain class of rewrite systems (C′) is at least as expressive as another one (C) w.r.t. a certain property is not only theoretical, but has practical impact as well. The typical case is when C′ ⊆ C, that is, when one wants to show whether passing from C′ to a greater class C leads to a proper increase in expressive power (or, vice versa, whether restricting from C to C′ leads to a proper loss in expressive power). If C and C′ have the same expressive power w.r.t. P, then the analysis of the observable property P for objects in C can be reduced to the analysis of P for objects in the restricted class C′ (one just uses the corresponding unraveling to translate a rewrite system R ∈ C into an R′ ∈ C′, and analyzes R′). On the other hand, if C is more expressive than C′ w.r.t. P, then the analysis of the observable property P is inherently more difficult for rewrite systems in C than for those in C′. For example, consider the case of compilers, or of transformational toolkits aiming at obtaining more efficient code (e.g. fold-unfold systems, etc.). Usually, they translate a program written in a high-level expressive language (C) to a low-level subset of it (C′). A minimum requirement for such a compilation/transformation to be sound could be, for instance, that if the starting program is terminating, then its transformed version terminates as well. But if C′ is not as expressive as C w.r.t. termination, then such a compiler/transformation cannot exist (unless it does not satisfy the second regularity condition of Definition 3.1). To make this informal reasoning more concrete, consider as C the class of CTRSs, and as C′ the class of TRSs. So, for instance, if TRSs are less expressive than CTRSs w.r.t. termination, then such a compilation is impossible (unless the compilation mapping does not satisfy the second condition of Definition 3.1).
We will return to these topics in Section 7, after having completed the analysis of the expressive power among left-linear TRSs, TRSs, normal CTRSs and join CTRSs.
As far as the observable property P is concerned, we will perform the expressibility analysis with respect to all the major properties of rewrite systems, that is: termination, confluence, completeness, uniqueness of normal forms w.r.t. reduction (UN→), normalization, and semi-completeness.
U
+
U
( )
+
( )
follows that f (X; Y ),,! X (or ,,! Y ): suppose w.l.o.g. that f (X; Y ),,! U U U X . Now, consider f (X; Y ) ! Y [ : it is a complete TRS, and so its unraveling U(f (X; Y ) ! Y [ ) = f (X; Y ) ! Y [ U() must be complete as well. But f (X; Y ),,,,,,,,,,! X and f (X; Y ),,,,,,,,,,! Y , a contradiction. +
+
( )
f X;Y (
+
U
!Y [
)
( )
f X;Y
( )
+
( )
(
!Y [
)
U
+
( )
Theorem 4.2 TRSs are more expressive than left-linear TRSs w.r.t. confluence, UN→, and semi-completeness.
Proof Completely analogous to the proof of Theorem 4.1.
Theorem 4.3 TRSs are more expressive than left-linear TRSs w.r.t. termination.
Proof By Proposition 3.3, we just have to prove that left-linear TRSs are not at least as expressive as TRSs w.r.t. termination. Ab absurdo, suppose there is an unraveling U of TRSs into left-linear TRSs such that every TRS R is terminating if and only if U(R) is terminating. As in the proof of Theorem 4.1, using the rule ρ: f(X, X) → X we can obtain that f(X, Y) →+_{U(ρ)} X. Now, consider g(a) → g(f(a, b)) ∪ ρ: it is a terminating TRS, and so its unraveling g(a) → g(f(a, b)) ∪ U(ρ) is terminating as well. But g(a) →+_{g(a)→g(f(a,b)) ∪ U(ρ)} g(f(a, b)) →+_{g(a)→g(f(a,b)) ∪ U(ρ)} g(a), a contradiction.
Theorem 4.4 TRSs are more expressive than left-linear TRSs w.r.t. normalization.
Proof By Proposition 3.3, we just have to prove that left-linear TRSs are not at least as expressive as TRSs w.r.t. normalization. Ab absurdo, suppose there is an unraveling U of TRSs into left-linear TRSs such that every TRS R is normalizing if and only if U(R) is normalizing. As in the proof of Theorem 4.1, using the rule ρ: f(X, X) → X we can obtain that f(X, Y) →+_{U(ρ)} X. Since ρ is normalizing, U(ρ) is normalizing as well. Consider the (left-linear) TRS f(X, Y) → f(X, Y) ∪ U(ρ): its rewrite relation contains that of U(ρ). Moreover, no normal form of U(ρ) contains the symbol f (since f(X, Y) →+_{U(ρ)} X, every term with an f-subterm is U(ρ)-reducible). Hence, from the fact that U(ρ) is normalizing it follows that f(X, Y) → f(X, Y) ∪ U(ρ) is normalizing too. Now, take the TRS f(X, Y) → f(X, Y) ∪ ρ: since it is not normalizing (f(X, Y) has no normal forms), its unraveling U(f(X, Y) → f(X, Y) ∪ ρ) = f(X, Y) → f(X, Y) ∪ U(ρ) must not be normalizing, a contradiction.
5 TRSs versus normal CTRSs
In this section we analyze the expressive power of Term Rewriting Systems versus Normal Conditional Term Rewriting Systems.
Theorem 5.1 Normal CTRSs are more expressive than TRSs w.r.t. termination.
Proof By Proposition 3.3, we just have to prove that TRSs are not at least as expressive as normal CTRSs w.r.t. termination. Ab absurdo, suppose there is an unraveling U of CTRSs into TRSs such that every CTRS R is terminating if and only if U(R) is terminating. Take the rule ρ: a → b ⇐ c →! d. Since in c → d ∪ ρ we have a↓b, then in U(c → d ∪ ρ) = c → d ∪ U(ρ) we have a↓b as well, i.e. a →*_{c→d ∪ U(ρ)} n ←*_{c→d ∪ U(ρ)} b, for some n.
If a →*_{c→d ∪ U(ρ)} n uses the rule c → d, then a →*_{U(ρ)} C[c] for some context C[ ]. Now, c → a ∪ ρ is terminating, and so c → a ∪ U(ρ) is terminating as well. But in the latter TRS we have the infinite reduction a →* C[c] → C[a] →* C[C[c]] → ..., a contradiction. The same reasoning allows us to exclude that the rule c → d can be used in the reduction b →*_{c→d ∪ U(ρ)} n.
Therefore, we have that a →*_{U(ρ)} n ←*_{U(ρ)} b. So, f(X,X) → f(a,b) ∪ ρ is terminating implies that f(X,X) → f(a,b) ∪ U(ρ) is terminating as well, while in this TRS there is the reduction f(a,b) →* f(n,n) → f(a,b), a contradiction.
Theorem 5.2 Normal CTRSs are more expressive than TRSs w.r.t. completeness.
Proof Completely analogous to the proof of the above theorem.
Theorem 5.3 Normal CTRSs are more expressive than TRSs w.r.t. confluence.
Proof Sketch Consider the rule ρ: f(X) → X ⇐ c →! d. Since f(X) →_{c→d ∪ ρ} X, then f(X) →+_{c→d ∪ U(ρ)} X. This reduction can be decomposed in the following way:

f(X) →+_{U(ρ)} t1 →_{c→d} t2 →+_{U(ρ)} t3 →_{c→d} t4 ... →+_{U(ρ)} tk ≡ X

U(ρ) being finite, we can take two constants A and B which do not appear in the rules of U(ρ). Consider the rule f(A) → B. f(A) → B ∪ ρ being confluent, f(A) → B ∪ U(ρ) is confluent too. Hence, since in the latter TRS f(A) rewrites to B and to t1[X/A], we have that B and t1[X/A] join. But B being new, the only possibility is that t1[X/A] →+_{f(A)→B ∪ U(ρ)} B. Now, it is not difficult to see that for every term s without occurrences of B, if s →_{f(A)→B} s′ →+_{U(ρ)} u then the commutation property holds, i.e. there is an s″ such that s →+_{U(ρ)} s″ →_{f(A)→B} u. Therefore, in the reduction t1[X/A] →+_{f(A)→B ∪ U(ρ)} B we can commute all the applications of the rule f(A) → B, finally obtaining t1[X/A] →+_{U(ρ)} f(A) →_{f(A)→B} B. So, A being new, we have that t1 →+_{U(ρ)} f(X).
Therefore, f(X) ←+_{U(ρ)} t1 ←_{d→c} t2 →+_{U(ρ)} t3. {f(A) → B} ∪ {d → c} ∪ ρ being confluent, {f(A) → B} ∪ {d → c} ∪ U(ρ) is confluent as well. Hence, t3[X/A] ↓_{{f(A)→B} ∪ {d→c} ∪ U(ρ)} B (both being reducts of t2[X/A] in this TRS), and B being new this implies that t3[X/A] →+_{{f(A)→B} ∪ {d→c} ∪ U(ρ)} B. By applying the aforementioned commutation result, from this reduction we can get t3[X/A] →+_{d→c ∪ U(ρ)} f(A) →_{f(A)→B} B, and A being new we obtain t3 →+_{d→c ∪ U(ρ)} f(X).
By repeating this reasoning for t4, we can prove that t4 →+_{d→c ∪ U(ρ)} f(X), and so on, till at the end we obtain that tk ≡ X →+_{d→c ∪ U(ρ)} f(X), a contradiction.
Theorem 5.4 Normal CTRSs are more expressive than TRSs w.r.t. UN→.
Proof By Proposition 3.3, we just have to prove that TRSs are not at least as expressive as normal CTRSs w.r.t. UN→. Ab absurdo, suppose there is an unraveling U of CTRSs into TRSs such that every CTRS R is UN→ if and only if U(R) is UN→. Take the rule ρ: a → b ⇐ c →! d. Since in c → d ∪ ρ we have a↓b, a and b join in U(c → d ∪ ρ) = c → d ∪ U(ρ) as well, that is to say there are the reductions a →*_{c→d ∪ U(ρ)} n and b →*_{c→d ∪ U(ρ)} n.
Is a a normal form in U(ρ)? Suppose it is not. Since {e → a, e → f} ∪ ρ ∉ UN→, {e → a, e → f} ∪ U(ρ) ∉ UN→ too. But since a is not a normal form in U(ρ), it is not a normal form in {e → a, e → f} ∪ U(ρ) either. Hence, adding a → a keeps this TRS not UN→ (since no normal form contains a, and →*_{{e→a, e→f} ∪ U(ρ)} = →*_{{a→a, e→a, e→f} ∪ U(ρ)}). But {a → a, e → a, e → f} ∪ ρ ∈ UN→ implies {a → a, e → a, e → f} ∪ U(ρ) ∈ UN→, a contradiction. So, a is a normal form in U(ρ). From a ↓_{c→d ∪ ρ} b we get a ↓_{c→d ∪ U(ρ)} b. But a being a normal form in U(ρ), it is a normal form in c → d ∪ U(ρ) too. So, a ↓_{c→d ∪ U(ρ)} b implies b →*_{c→d ∪ U(ρ)} a.
Is e a normal form in c → d ∪ U(ρ)? The reasoning is similar to the one just seen for a. Suppose it is not. Then, {f → e, f → g, c → d} ∪ ρ ∉ UN→ implies {f → e, f → g, c → d} ∪ U(ρ) ∉ UN→. But since e is not a normal form in c → d ∪ U(ρ), it is not a normal form in {f → e, f → g, c → d} ∪ U(ρ) either. Hence, adding e → e to this TRS keeps it not UN→ (since no normal form contains e, and →*_{{f→e, f→g, c→d} ∪ U(ρ)} = →*_{{e→e, f→e, f→g, c→d} ∪ U(ρ)}). But {e → e, f → e, f → g, c → d} ∪ ρ ∈ UN→ implies {e → e, f → e, f → g, c → d} ∪ U(ρ) ∈ UN→, a contradiction. So, e is a normal form in c → d ∪ U(ρ).
Thus, a being a normal form in U(ρ) and e a normal form in c → d ∪ U(ρ), we get that both a and e are normal forms in b → e ∪ c → d ∪ U(ρ). Therefore, b → e ∪ c → d ∪ ρ ∈ UN→ implies b → e ∪ c → d ∪ U(ρ) ∈ UN→, whilst b in this TRS rewrites to the two different normal forms a and e.
Theorem 5.5 Normal CTRSs are more expressive than TRSs w.r.t. normalization.
Proof By Proposition 3.3, we just have to prove that TRSs are not at least as expressive as normal CTRSs w.r.t. normalization. Ab absurdo, suppose there is an unraveling U which is complete for normalization.
Take the rule ρ: a → b ⇐ c →! d. Since in c → d ∪ ρ we have a↓b, a and b also join in U(c → d ∪ ρ) = c → d ∪ U(ρ), that is to say there are the reductions a →*_{c→d ∪ U(ρ)} n and b →*_{c→d ∪ U(ρ)} n.
If a is a normal form in c → d ∪ U(ρ), then b →*_{c→d ∪ U(ρ)} a. Also, since {b → b, c → d} ∪ ρ is not normalizing, {b → b, c → d} ∪ U(ρ) is not normalizing either. On the other hand, →*_{{b→b, c→d} ∪ U(ρ)} = →*_{c→d ∪ U(ρ)}, and the normal forms in {b → b, c → d} ∪ U(ρ) are the same as in c → d ∪ U(ρ), since none of them contains b (b is not a normal form in c → d ∪ U(ρ)); so c → d ∪ U(ρ) being normalizing implies that {b → b, c → d} ∪ U(ρ) is normalizing as well, a contradiction.
So, a is not a normal form in c → d ∪ U(ρ), and a fortiori not in U(ρ). ρ being normalizing, U(ρ) is normalizing as well. Also, a → a ∪ ρ is not normalizing implies that a → a ∪ U(ρ) is not normalizing. But →*_{a→a ∪ U(ρ)} = →*_{U(ρ)}, and the normal forms in a → a ∪ U(ρ) are the same as in U(ρ), since none of them contains a (a is not a normal form in U(ρ)): so, U(ρ) being normalizing implies that a → a ∪ U(ρ) is normalizing as well, a contradiction.
Theorem 5.6 Normal CTRSs are more expressive than TRSs w.r.t. semi-completeness.
Proof The proof is like that of Theorem 5.5, once the word normalizing is replaced by the word semi-complete (it uses the fact that if two TRSs R1 and R2 satisfy →*_{R1} = →*_{R2}, then R1 is confluent iff R2 is).
6 Normal CTRSs versus CTRSs
In this section we analyze the expressive power of Normal Conditional Term Rewriting Systems versus join Conditional Term Rewriting Systems. We will first employ the simulation of CTRSs via normal CTRSs introduced in [9, 8]. A CTRS is transformed into a normal CTRS by replacing every rule l → r ⇐ s1 ↓ t1, ..., sk ↓ tk with the rules l → r ⇐ eq(s1, t1) →! true, ..., eq(sk, tk) →! true and eq(X, X) → true (where eq and true are new distinguished symbols). Call U_ext this mapping. Then we have:

Theorem 6.1 U_ext is an unraveling of CTRSs into normal CTRSs.
Proof The first point of Definition 3.1 has been proved in [8], while the second is trivial.
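The shape of the U_ext transformation is purely syntactic and can be sketched mechanically. A small Python illustration (the string encoding of terms and the (lhs, rhs, conditions) rule triples are ours, just to exhibit the mapping; they are not from the paper):

```python
def unravel_ext(lhs, rhs, conds):
    """U_ext on one join rule  lhs -> rhs <= s1 ↓ t1, ..., sk ↓ tk:
    each join condition s ↓ t becomes the normal condition
    eq(s, t) ->! true, and the rule eq(X, X) -> true is added.
    Rules are (lhs, rhs, conditions) triples; terms are plain strings."""
    normal_conds = [("eq(%s,%s)" % (s, t), "true") for (s, t) in conds]
    return [(lhs, rhs, normal_conds), ("eq(X,X)", "true", [])]

rules = unravel_ext("f(Z)", "Z", [("g(Z)", "h(Z)")])
print(rules[0])  # ('f(Z)', 'Z', [('eq(g(Z),h(Z))', 'true')])
print(rules[1])  # ('eq(X,X)', 'true', [])
```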
Using this unraveling, it can be proved that CTRSs and normal CTRSs have equal expressive power w.r.t. the following major properties:
Theorem 6.2 Normal CTRSs are as expressive as CTRSs w.r.t. termination.
Proof Sketch It suffices to prove that for every terminating CTRS T, T ⊕ {eq(X, X) → true} is terminating (here ⊕ denotes as usual the disjoint sum operator, i.e. the two systems are assumed to have disjoint signatures). Algebraically, this means that we have to prove that {eq(X, X) → true} is in the kernel (cf. [15]). Let us define the eq-rank of a term t as the greatest number of nested eq symbols. The eq-rank of a term cannot increase with reductions, as is easy to see. So, we can perform a proof by induction on the eq-rank. Suppose ab absurdo that there is an infinite reduction in T ⊕ {eq(X, X) → true}. Take a term t with smallest eq-rank among those having an infinite reduction. Now, replace all the occurrences in t of the symbol eq by the symbol true. It is not difficult to prove that one can still mimic the original reduction; moreover, this reduction is still infinite, by the eq-rank minimality assumption. But then, we have an infinite reduction starting from a term having eq-rank zero, which means that T is not terminating, a contradiction.
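The eq-rank used in this induction is straightforward to compute. A small sketch (the tuple encoding of terms is ours, for illustration only):

```python
def eq_rank(t):
    """Greatest number of nested eq symbols in a term.
    Terms are nested tuples with the head symbol first;
    constants and variables are plain strings (rank 0)."""
    if not isinstance(t, tuple):
        return 0
    sub = max((eq_rank(s) for s in t[1:]), default=0)
    return sub + 1 if t[0] == "eq" else sub

# eq(f(eq(x, y)), z) has two nested eq symbols:
print(eq_rank(("eq", ("f", ("eq", "x", "y")), "z")))  # 2
```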
Theorem 6.3 Normal CTRSs are as expressive as CTRSs w.r.t. confluence, normalization, and semi-completeness.
Proof Sketch Essentially, the proof depends on the fact that all these properties are modular, so that adding to a CTRS T the TRS {eq(X, X) → true} does not change the property of interest for T.
Theorem 6.4 Normal CTRSs are as expressive as CTRSs w.r.t. completeness.
Proof This follows from the previous two theorems, once it is noticed that completeness is confluence plus termination. Note that we cannot directly apply the proof argument of Theorem 6.3, since completeness is not modular (cf. e.g. [16]). Finally, the UN→ property remains:
Theorem 6.5 Normal CTRSs are as expressive as CTRSs w.r.t. UN→.
Proof Sketch The proof uses the new unraveling Ũ_ext, a slight variation of the unraveling U_ext (which, as can easily be seen, does not work here). Ũ_ext is defined in such a way that every conditional rule l → r ⇐ s1 ↓ t1, ..., sk ↓ tk is replaced by the rules l → r ⇐ eq(s1, t1) →! true, ..., eq(sk, tk) →! true, eq(X, X) → true, and eq(X, Y) → eq(X, Y) (with eq and true new distinguished symbols). Next, one essentially has to prove that for every CTRS T which is UN→, T ⊕ {eq(X, X) → true, eq(X, Y) → eq(X, Y)} is still UN→, and this is not difficult to show (the proof is, analogously to the case of Theorem 6.2, by induction on the eq-rank), since no normal form can contain an eq symbol.
[Figure 1: The Expressiveness Hierarchy of Rewriting Systems. Ordered by increasing expressive power: left-linear TRSs, then (gap) TRSs, i.e. unconditional rewriting, then (gap) normal CTRSs and CTRSs, which coincide, i.e. conditional rewriting.]
7 The Expressiveness Hierarchy
Summing up the results obtained in the previous three sections, we have formally shown that: 1) there is a gap in expressive power when restricting term rewriting to left-linear term rewriting, with respect to every main property of rewriting systems (Section 4), this way extending the gap result of [6], which only proved a gap between linear TRSs and TRSs. In Section 6 we have shown that: 2) normal CTRSs and join CTRSs have the same expressive power w.r.t. all the main properties of rewriting systems. So, normal CTRSs can be seen as equivalent to join CTRSs for every observable property of interest. Combining these results with those of Section 5, we obtain that 3) there is a gap in expressive power when passing from unconditional rewriting to conditional rewriting, with respect to every main property of rewriting systems. Graphically, the resulting Expressiveness Hierarchy is illustrated in Figure 1.
The conditions in the definition of unraveling (Definition 3.1) can be slightly modified, obtaining a variety of other similar results. For instance, one may want to consider the more abstract case where the third finiteness condition is dropped (i.e., loosening the concept of expressive power by allowing the extra power to build "infinite" systems). In this respect, it is easy to see that the proofs that we have given for the gap results between left-linear TRSs and TRSs still hold in this more general setting (i.e., even allowing the extra power to build infinite left-linear TRSs, the power gap remains), thus showing that the gap is in a sense even stronger. Another case that can be considered concerns the first condition of unraveling: here, the standard notion of logical strength (cf. Section 3) has been employed, which is based on the join relation. However, one could consider another natural condition, for instance one based on reduction:

∀R ∈ C: →+_R ⊆ →+_{U(R)}

This way, the intuitive concept that the system U(R) computes `at least as much as' the system R is formally represented by the fact that if in the system R a term t rewrites to another term t′, then the same can be done in U(R). This way we have a stronger form of equivalence, where the systems are required to behave in the same way not only with respect to the logical strength, but even with respect to reducibility. It is not difficult to see that, in this new setting, all the proofs concerning the non-gap results between normal CTRSs and join CTRSs still hold. Also, trivially, all the other gap results between left-linear TRSs and TRSs, and between TRSs and normal CTRSs, still hold, the definition of unraveling having been strengthened. Hence, the Expressiveness Hierarchy remains true even in this new expressiveness setting.
Another point worth mentioning is that the gap results given in this paper between TRSs and normal CTRSs are in a sense much stronger: for example, all the proofs that we have given for these cases still hold when only normal CTRSs with at most one conditional rule having one ground condition (and, even, made up of constants only) are considered, this way proving the stronger expressive gap between TRSs and this quite limited subclass of normal CTRSs (in a sense, the outcome is that a single conditional test, even in such a limited form, suffices to produce an increase in expressive power).
Besides the major properties of rewriting systems studied here, it is not difficult to investigate along the same lines many other properties of rewriting systems, like equational consistency, the equational normal form property, innermost termination, innermost normalization and so on (see e.g. [13]), obtaining similar gap results as for the major properties.
From the practical point of view, the presence of the gaps between left-linear and non-left-linear term rewriting and between unconditional and conditional term rewriting can be seen as a formal confirmation that the analysis of all the major properties of TRSs (resp. CTRSs) is intrinsically more complex than for left-linear TRSs (resp. TRSs), cf. the discussion in Section 3.
For instance, in [14] it has been studied to what extent the properties of CTRSs can be automatically inferred from those of TRSs. This study has been carried out using unravelings that `behave well' with respect to some property, in the sense that the unraveled TRS safely approximates the original CTRS. The number of results that one can automatically obtain is high, but it is not clear in general to what extent results from a simpler field like TRSs can be extended to CTRSs. The results proved in this paper formally establish that there is an intrinsic factor due to the expressive power gap: it is impossible to fully recover the results known for any of the major properties of interest of CTRSs by resorting only to the simpler TRS paradigm, since there is no unraveling able to fully preserve them; in other words, every approximating TRS must be lossy.
Last but not least, another related consequence is in the field of compilation of CTRSs via TRSs. The presence of the gap between unconditional and conditional rewriting gives an a posteriori justification of the fact that so far all existing compilations of CTRSs via TRSs either are `impure', in the sense that they have to use an ad hoc reduction strategy restriction, or they cannot act on the whole class of conditional term rewriting systems. So, in the first category we have the works by Aida, Goguen and Meseguer ([1]), and the work by Kaplan ([12]), which compiles CTRSs into Lisp code (the resulting Lisp programs could, with some effort, as claimed in [11], be compiled into TRSs using Combinatory Logic). All the other works, i.e. [4, 10, 11], fall in the second category, since they considerably restrict the class of CTRSs that can be compiled.
Acknowledgments
I wish to thank Jan Willem Klop for his support.
References
[1] H. Aida, J. Goguen, and J. Meseguer. Compiling concurrent rewriting onto the rewrite rule machine. In S. Kaplan and M. Okada, editors, Proceedings 2nd International Workshop on Conditional and Typed Rewriting Systems, volume 516 of LNCS, pages 320-332. Springer-Verlag, 1990.
[2] J. Avenhaus. On the descriptive power of term rewriting systems. Journal of Symbolic Computation, 2:109-122, 1986.
[3] J. Bergstra and J.-Ch. Meyer. On specifying sets of integers. Journal of Information Processing and Cybernetics (EIK), 20(10/11):531-541, 1984.
[4] J. Bergstra and J.W. Klop. Conditional rewrite rules: Confluence and termination. Journal of Computer and System Sciences, 32(3):323-362, 1986.
[5] M. Dauchet. Simulation of Turing machines by a left-linear rewrite rule. In N. Dershowitz, editor, Proceedings 3rd International Conference on Rewriting Techniques and Applications, volume 355 of LNCS, pages 109-120. Springer-Verlag, 1989.
[6] M. Dauchet and De Comite. A gap between linear and non-linear term-rewriting systems. In Proceedings 2nd International Conference on Rewriting Techniques and Applications, volume 256 of LNCS, pages 95-104, Bordeaux, France. Springer-Verlag, 1987.
[7] N. Dershowitz and J.-P. Jouannaud. Rewrite systems. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B, chapter 6, pages 243-320. Elsevier / MIT Press, 1990.
[8] N. Dershowitz and M. Okada. A rationale for conditional equational programming. Theoretical Computer Science, 75:111-138, 1990.
[9] N. Dershowitz, M. Okada, and G. Sivakumar. Canonical conditional rewrite systems. In Proceedings 9th International Conference on Automated Deduction, volume 310 of LNCS, pages 538-549. Springer-Verlag, 1988.
[10] E. Giovannetti and C. Moiso. Notes on the elimination of conditions. In S. Kaplan and J.-P. Jouannaud, editors, Proceedings 1st International Workshop on Conditional and Typed Rewriting Systems, volume 308 of LNCS, pages 91-97. Springer-Verlag, 1988.
[11] C. Hintermeier. How to transform canonical decreasing CTRSs into equivalent canonical TRSs. In N. Dershowitz and N. Lindenstrauss, editors, Proceedings 4th International Workshop on Conditional and Typed Rewriting Systems, volume 968 of LNCS, pages 186-205. Springer-Verlag, 1995.
[12] S. Kaplan. A compiler for conditional term rewriting systems. In P. Lescanne, editor, Proceedings 2nd International Conference on Rewriting Techniques and Applications, volume 256 of LNCS, pages 25-41. Springer-Verlag, 1987.
[13] J.W. Klop. Term rewriting systems. In S. Abramsky, Dov M. Gabbay, and T.S.E. Maibaum, editors, Handbook of Logic in Computer Science, volume 2, chapter 1, pages 1-116. Clarendon Press, Oxford, 1992.
[14] M. Marchiori. Unravelings and ultra-properties. In Proceedings 5th International Conference on Algebraic and Logic Programming (ALP'96), volume 1139 of LNCS, pages 107-121. Springer-Verlag, 1996.
[15] M. Marchiori. The theory of vaccines. In Proceedings 24th International Colloquium on Automata, Languages, and Programming (ICALP'97), volume 1256 of LNCS, pages 660-670. Springer-Verlag, 1997.
[16] Y. Toyama, J.W. Klop, and H.P. Barendregt. Termination for direct sums of left-linear complete term rewriting systems. Journal of the ACM, 42(6):1275-1304, 1995.
Mechanizing Verification of Arithmetic Circuits: SRT Division*
Deepak Kapur¹ and M. Subramaniam²**
¹ Computer Science Department, State University of New York, Albany, NY 12222
[email protected]
² Functional Verification Group, Silicon Graphics Inc., Mountain View, CA 94040
[email protected]
Abstract. The use of a rewrite-based theorem prover for verifying
properties of arithmetic circuits is discussed. A prover such as Rewrite Rule Laboratory (RRL) can be used effectively for establishing number-theoretic properties of adders, multipliers and dividers. Since verification of adders and multipliers has been discussed elsewhere in earlier papers, the focus in this paper is on a divider circuit. An SRT division circuit similar to the one used in the Intel Pentium processor is mechanically verified using RRL. The number-theoretic correctness of the division circuit is established from its equational specification. The proof is generated automatically, and follows easily using the inference procedures for contextual rewriting and a decision procedure for the quantifier-free theory of numbers (Presburger arithmetic) already implemented in RRL. Additional enhancements to rewrite-based provers such as RRL that would further facilitate verifying properties of circuits with structure similar to that of the SRT division circuit are discussed.
1 Introduction
There has been considerable interest recently in using automated reasoning techniques to aid in enhancing confidence in hardware designs. A number of researchers have been exploring the use of BDD-based software, model checkers, theorem provers and verification systems for verifying properties of arithmetic circuits, cache-coherence protocols, different kinds of processors including pipelined and scalable processors, as well as a commercial processor. Papers on these attempts have appeared in recent conferences such as CAV and FMCAD. Intrigued by these attempts and results, we decided to try our theorem prover Rewrite Rule Laboratory (RRL) [11] for hardware verification, with the main objective of exploring circuits and their properties that can be verified automatically in a push-button mode. We have also been interested in identifying extensions and enhancements to RRL which would make it better suited for this application. In [8] and [7], we discussed how RRL had been used for verifying ripple-carry, carry-lookahead and carry-save adders, as well as a family of multipliers including Wallace-tree and Dadda multipliers.

* Partially supported by the National Science Foundation Grant no. CCR-9712366.
** This work was done while the author was at State University of New York, Albany.
Our experience in using RRL has been very encouraging. RRL can be used effectively, essentially in the push-button style, for proving number-theoretic properties of these circuits without having to fix their widths. Parametric circuits can be verified; descriptions common to a family of related circuits can be given and reasoned about. Proofs of components can be reused while attempting proofs of larger circuits; as an example, while reasoning about multipliers, adders used in them can be treated as black boxes insofar as they satisfy their specifications.
In this paper, we discuss how RRL can be used for reasoning about SRT division circuits. After reading [2] and [18], we first suspected that considerable user interaction with and guidance to RRL might be needed to verify the main properties of the circuit. The reported use of Mathematica and Maple in [2, 4] for reasoning about inequalities and real numbers, as well as the use of dependent types, the table data structure, and other higher-order features in [18], initially discouraged us from attempting a mechanical verification of the division circuit using RRL. We subsequently discovered to our pleasant surprise that the proof reported in [2] could be easily found using RRL without any user guidance; a brief sketch of that proof is given in [5]. In fact, the mechanization of that proof was the easiest to do in RRL, in contrast to the proofs of adders and multipliers in [8, 7]. We have recently found a much simpler and easier proof of the SRT division circuit by explicitly representing the quotient selection table. (It is widely believed that the bug in the Intel Pentium processor was in the quotient selection table.) In this paper, we discuss this new proof. Later, we contrast this proof with our earlier proof attempt as well as the proofs in [2, 18].
Four major features seem to have contributed to RRL being effective in these mechanization attempts for hardware verification:
1. Fast contextual rewriting and reasoning about equality [23].
2. Decision procedures for numbers and freely constructed recursive data structures such as lists and sequences, and, most importantly, their effective integration with contextual rewriting [6].
3. The cover set method for mechanization of proofs by induction [24], and its integration with contextual rewriting and decision procedures.
4. Intermediate lemma speculation heuristics.
In the next section, the SRT division algorithm and circuit are informally explained, with a special focus on the radix 4 SRT circuit. The interaction between the appropriate choice of radix, redundancy in quotient digits, quotient selection and remainder computation is briefly reviewed. The third section is a brief overview of the theorem prover RRL. Section 4 is an equational formalization of the SRT division circuit description in RRL. Section 5 is a brief sketch of how the proof of the two invariant properties of the circuit was done using RRL. Section 6 is a discussion of related work and of our experience in using RRL for the SRT division circuit. Section 7 concludes with some remarks on possible enhancements to RRL to make it better suited for verifying circuits using preprogrammed read-only memory (ROM).
2 SRT Division Algorithm and Circuit
The basic principles underlying the SRT division algorithm are reviewed. The SRT division algorithm, proposed by Sweeney, Robertson [17] and Tocher [19], has been frequently used in commercial microprocessors due to its efficiency and ease of hardware implementation [20, 22]. Several expositions of the design of hardware divider circuits based on this algorithm appear in the literature [20, 15, 16, 3]. The SRT algorithm takes as input two normalized fractions, the dividend and the positive divisor, and outputs the quotient and the remainder. The focus in this paper is on this part of the division circuit, as in [4, 2, 18]. It is assumed that a normalization circuit for handling signs and exponents is correct.
Much like the paper-and-pencil grade school division method, the SRT division algorithm is iterative: the quotient is computed digit by digit by repeatedly subtracting multiples of the divisor from the dividend. In each iteration, the algorithm selects a quotient digit, multiplies it with the divisor, and the result is subtracted from the partial remainder computed so far. The result of the subtraction is the partial remainder for the next step. The partial remainder is initialized to be the dividend divided by r. The algorithm terminates once all the quotient digits have been computed. The algorithm can be formalized in terms of the following recurrences:

P_0 := dividend / r;   Q_0 := 0;
P_{j+1} := r · P_j − q_{j+1} · divisor,   for j = 0, ..., n − 1;
Q_{j+1} := r · Q_j + q_{j+1},   for j = 0, ..., n − 1;
where P_j is the partial remainder at the beginning of the j-th iteration, with 0 ≤ P_j < divisor for all j; Q_j is the quotient at the beginning of iteration j; q_j is the quotient digit at iteration j; n is the number of digits in the quotient; and r is the radix used for representing numbers. The alignment of the partial remainders and the multiples of the divisor being subtracted is achieved by left-shifting the partial remainder at each step (i.e., by multiplying P_j by the radix r). The correct positional placement of the quotient digit is similarly ensured by left-shifting the partial quotient. And the invariant 0 ≤ P_j < divisor ensures that at each step the highest multiple of the divisor less than the partial remainder is subtracted.
SRT dividers used in practice incorporate several performance-enhancing techniques while realizing the above recurrence. An important issue in implementing such an algorithm in hardware is the selection of the correct quotient digit at each step. A brute-force strategy of enumerating the multiples of the divisor until the subtraction leads to a number that is less than the divisor could be prohibitively expensive. SRT dividers instead use quotient digit selection functions in the form of look-up tables for guessing a quotient digit at each step of division, based on the partial remainder and the divisor. Two other major aspects resulting in the increased performance of SRT dividers are the choice of the radix in representing the quotient, and the choice of a signed-digit representation for the quotient digits. The former reduces the number of iterations required to obtain the quotient, and the latter reduces the time taken in each iteration by speeding up the partial remainder computation. In [20], trade-offs between speed, radix choice, and redundancy of quotient digits are discussed.
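The recurrences above can be prototyped directly. The following Python sketch is ours, not the paper's RRL specification: it uses exact rational arithmetic, assumes normalized operands, and stands in an idealized quotient selection (the digit nearest to r·P_j/divisor, clipped to [−a, a]) for the look-up table used in hardware; it checks both the redundancy bound |P_j| ≤ a/(r−1)·divisor on partial remainders and the defining identity dividend·r^(n−1) = Q_n·divisor + P_n.

```python
from fractions import Fraction

def srt_divide(dividend, divisor, n, r=4, a=2):
    """Prototype of the SRT recurrence with signed quotient digits in [-a, a].

    P_0 = dividend / r;  P_{j+1} = r*P_j - q_{j+1}*divisor;
    Q_{j+1} = r*Q_j + q_{j+1}.  Quotient selection here is idealized
    (nearest digit, clipped), standing in for a hardware look-up table."""
    P, Q, digits = Fraction(dividend) / r, 0, []
    bound = Fraction(a, r - 1) * divisor          # redundancy bound on |P_j|
    for _ in range(n):
        q = max(-a, min(a, round(r * P / divisor)))
        P = r * P - q * divisor                   # next partial remainder
        Q = r * Q + q                             # accumulate quotient value
        digits.append(q)
        assert abs(P) <= bound                    # the invariant is maintained
    return digits, Q, P

# e.g. 1/2 divided by 3/4 to 8 radix-4 digits (both operands normalized):
digits, Q, P = srt_divide(Fraction(1, 2), Fraction(3, 4), 8)
assert Fraction(1, 2) * 4**7 == Q * Fraction(3, 4) + P   # defining identity
```

Unrolling the recurrence gives P_n = r^(n−1)·dividend − Q_n·divisor, which is the identity asserted on the last line.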
2.1 Choosing Quotient Radix
In an SRT divider using radix 2, each iteration produces one quotient bit, and n iterations are required to produce a quotient of n-bit accuracy. The number of iterations can be reduced by choosing a higher radix. For example, choosing the radix to be 4, only n/2 iterations are needed; at each step, two quotient bits can be generated. The choice of a higher radix, however, entails more time in each iteration, since the selection of the quotient digit and the generation of divisor multiples become more complicated. Typically, radix 4 is used in practice, since it seems to provide a reasonable trade-off between the number of iterations and the time spent in each iteration [20]. Multiplication by the quotient digits 0, 1, 2, and 3 can be performed by shifting and adding/subtracting. The SRT divider specified and verified in this paper uses radix 4.
2.2 Redundant Quotient Digit Representation
SRT dividers reduce the latency of each iteration by using a redundant signed-digit representation for the quotient digits. Typically, the digit values of a quotient represented with radix r range from 0 through r − 1. In contrast, in a redundant signed-digit representation, the digit values of a quotient with radix r are a consecutive set of integers [−a, a], where a is at least ⌈r/2⌉. Depending upon a, this allows for some redundancy. For example, a redundant signed-digit representation for a quotient with radix 4 would be the quotient digit set {−2, −1, 0, 1, 2}; this is in contrast to the 4 quotient digits commonly used for radix 4: {0, 1, 2, 3}.
8/3D
16/3 5 14/3 13/3 4 11/3 10/3 Shifted Partial 3 8/3 Remainder 7/3
qj = 2
5/3D 4/3D
qj = (1, 2)
2 5/3 4/3 1 2/3 1/3 0
qj = 1
2/3D
qj = (0, 1)
1/3D
qj = 0 8/8
12/8 Divisor
15/8
Fig. 1. P-D Plot for Radix 4
2.3 Quotient Selection Function
The SRT division algorithm with a redundant signed-digit quotient representation allows the selected quotient digits to be inexact within certain bounds; the partial remainder generated in a step may even be negative. The bound on the successive partial remainders using a redundant signed-digit representation [-a, a] for radix r is

  -D * a/(r-1) <= P_j <= D * a/(r-1),

where D is the divisor. By substituting the recurrence for the successive partial remainders, the range of shifted partial remainders that allows a quotient digit k to be chosen is:

  [(k - a/(r-1)) * D, (k + a/(r-1)) * D].

The correlation between the shifted partial remainder P and the divisor D in SRT division algorithms is plotted diagrammatically as a P-D plot. The shifted partial remainder and the divisor form the axes of the plot, which illustrates the shifted partial remainder ranges in which a quotient digit can be selected without violating the bounds on the next partial remainder. The P-D plot for a radix 4 quotient with the redundant digit set [-2, 2] is given in Figure 1. As the reader will notice, when the shifted partial remainder is in the range [5/3*D, 8/3*D], the quotient digit 2 is selected. The shaded regions represent quotient digit overlaps, where more than one quotient digit selection is feasible. So if the shifted partial remainder is in the range [4/3*D, 5/3*D], either 2 or 1 can be used.
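For radix r = 4 and a = 2, the selection intervals and their overlap can be computed exactly (a sketch under our naming; intervals are in multiples of the divisor D):

```python
from fractions import Fraction

r, a = 4, 2
rho = Fraction(a, r - 1)   # redundancy factor a/(r-1) = 2/3

def selection_interval(k):
    """Range of the shifted partial remainder (in multiples of D)
    for which digit k keeps the next remainder within bounds."""
    return (k - rho, k + rho)

assert selection_interval(2) == (Fraction(4, 3), Fraction(8, 3))
assert selection_interval(1) == (Fraction(1, 3), Fraction(5, 3))

# Overlap region where either 1 or 2 may be selected: [4/3, 5/3]
lo2, _ = selection_interval(2)
_, hi1 = selection_interval(1)
assert (lo2, hi1) == (Fraction(4, 3), Fraction(5, 3))
```

The computed overlap [4/3, 5/3] matches the shaded qj = (1, 2) band in Figure 1.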
[Table 1 residue omitted: the quotient digit selection table from [20]. Rows are indexed by the shifted truncated partial remainder parrem = g7g6g5g4.g3g2g1 (5-bit row indices 1010.0 through 0101.1, in 2's complement), columns by the truncated divisor f1.f2f3f4 (1.000 through 1.111). Entries are the quotient digits -2 through 2, the symbolic entries A, B, C, D, E, and - for index pairs that cannot arise.]
Table 1. Quotient Digit Selection Table
For selecting an appropriate quotient digit, it is not necessary to know the exact value of the shifted partial remainder P or the divisor D. It suffices to know the region of Figure 1 in which the ratio P/D lies. Due to the overlap between the lower bound of the P/D ratio for quotient digit k and the upper bound for quotient digit k-1, the P/D ratio can be approximated when choosing quotient digits. For instance, for a radix 4 SRT divider with partial remainders and divisor of width n, n > 8, it suffices to consider the partial remainder up to 7 bits of accuracy and the divisor up to 4 bits of accuracy [20]. The quotient selection table implementing the P-D plot for radix 4 is reproduced above from [20]. Rows are indexed by the shifted truncated partial remainder g7g6g5g4.g3g2g1; columns are indexed by the truncated divisor f1.f2f3f4; table entries are the quotient digits. The table is compressed by considering only row indices up to 5 bits, since only a few entries in the table depend upon the 2 least significant bits g2g1 of the shifted partial remainder. For those cases, the table entries are the symbolic values A, B, C, D, E, defined as:

  A = -(2 - g2*g1),  B = -(2 - g2),  C = 1 + g2,  D = -1 + g2,  E = g2.

These entries, as well as other aspects of the selection table, are further discussed in subsection 4.1, where we show how the table is input to RRL. The - entries in the table are for those pairs of shifted truncated partial remainder and truncated divisor that are not supposed to arise during the computation.
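Reading the juxtaposition g2 g1 as the product of the two bit values (our reading of the definitions above, stated as an assumption), the symbolic entries resolve to ordinary digits as follows (a sketch):

```python
def resolve_symbolic(entry, g2, g1):
    """Resolve a symbolic table entry to a quotient digit, given the
    two least significant remainder bits g2, g1 (each 0 or 1).
    Assumption: 'g2 g1' in the definition of A denotes g2*g1."""
    return {
        'A': -(2 - g2 * g1),
        'B': -(2 - g2),
        'C': 1 + g2,
        'D': -1 + g2,
        'E': g2,
    }[entry]

assert resolve_symbolic('B', 0, 0) == -2   # g2 = 0 gives -2
assert resolve_symbolic('B', 1, 0) == -1   # g2 = 1 gives -1
assert resolve_symbolic('A', 1, 1) == -1
assert resolve_symbolic('C', 1, 0) == 2
```

The resolution of B agrees with the row-index discussion in subsection 4.1, where B yields -2 when g2 is 0 and -1 when g2 is 1.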
2.4 Divider Circuit
A radix 4 SRT divider circuit using the signed-digit representation [-2, 2] is given in Figure 2. The registers divisor and remainder in the circuit hold the values of the divisor and the successive partial remainders, respectively. The register
[Figure 2 residue omitted. The datapath comprises the REMAINDER and Divisor registers; the quotient-digit register q with sign bit qsign (1 bit) and magnitude qdigit (2 bits); the quotient registers QPOS and QNEG; the multiplexor MUX producing the divisor multiple md; the full-width DALU (computing A+B or A-B); the 8-bit GALU with inputs rin1 and md1 (8 bits each) and output rout1; the quotient selection table QUO LOGIC, indexed by the shifted GALU output 4*rout1 and the truncated divisor d1 (3 bits); and the shifters <<1 and <<2.]
Fig. 2. SRT Division Circuit using Radix 4
q holds the selected quotient digit along with its sign; the registers QPOS and QNEG hold the positive and negative quotient digits of the quotient. A multiplexor MUX is used to generate the correct multiple of the divisor for the selected quotient digit by appropriately shifting the divisor. The hardware component QUO LOGIC implements the quotient selection table, and is typically realized as an array of preprogrammed read-only memory. The hardware component DALU is a full-width ALU that computes the partial remainder at each iteration. The component GALU (the guess ALU [20]) is an 8-bit ALU that computes the approximate 8-bit partial remainder used for quotient selection. The components <<2 shift left by two bits (multiplication by 4). The circuit is initialized by loading dividend/4 (obtained by right shifting the dividend by 2 bits) and the divisor into the remainder and divisor registers. The quotient is initialized to zero by clearing the registers QPOS and QNEG. The quotient digit register q is initialized by the appropriate alignment of the dividend and the divisor. At each iteration, the correct multiple of the quotient digit and the divisor is output by MUX. This output and the partial remainder in the remainder register are input to DALU to compute the next partial remainder. An 8-bit estimate of the partial remainder in the remainder register and an 8-bit estimate of the output of MUX are input to GALU.
GALU computes an 8-bit estimate of the next partial remainder, which is left shifted (multiplied by 4) and then used, together with the truncated divisor (d1), to index into QUO LOGIC to select the quotient digit for the next iteration. Note that the GALU computation and the quotient digit selection are done in parallel with the full-width DALU, so that the correct quotient digit value is already available in the register q at the beginning of each iteration. This relationship between the inputs and outputs of DALU and GALU is captured by the predicate GALU_desc(rout, rin, md, rin1, md1, qsign, rout1) in section 5.2, where the correctness of the circuit is discussed. The circuit is formalized in Section 4.
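The circuit's iteration can be mimicked by an idealized software sketch (ours, not the paper's formalization): digit selection here rounds the exact ratio 4*P/D rather than looking up truncated GALU estimates in the table, but the recurrences for P and Q are the same.

```python
from fractions import Fraction

def srt4_divide(dividend, divisor, n):
    """Idealized radix-4 SRT recurrence with digit set {-2,...,2}.
    Assumes the operands are aligned so that |P_0| <= 2/3 * divisor."""
    P = dividend / 4            # P_0 = dividend/4, as loaded into REMAINDER
    Q = 0                       # Q_0 = 0 (QPOS and QNEG cleared)
    for _ in range(n):
        q = max(-2, min(2, round(4 * P / divisor)))  # quotient digit
        P = 4 * P - q * divisor                      # P_{i+1} = 4*P_i - q*D
        Q = 4 * Q + q                                # Q_{i+1} = 4*Q_i + q
        assert -2 * divisor <= 3 * P <= 2 * divisor  # remainder stays bounded
    return Q, P

Q, P = srt4_divide(Fraction(7, 8), Fraction(11, 8), 6)
# first invariant: 4^(n-1) * dividend == Q * divisor + P
assert 4**5 * Fraction(7, 8) == Q * Fraction(11, 8) + P
```

Exact rationals make both invariants checkable without rounding error; the real circuit replaces the rounding step with the QUO LOGIC table lookup.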
3 A Brief Overview of RRL

Rewrite Rule Laboratory (RRL) [11] differs in its design philosophy from most proof checkers, such as PVS, IMP, HOL, Isabelle, NUPRL, and LP, in the sense that it attempts to perform most inferences automatically, without user guidance. Many proofs in RRL can be generated automatically; in such cases RRL can be used as a push-button theorem prover. In fact, that is how we typically use RRL for finding proofs, starting without any clue about how a proof might be done manually. When a proof attempt fails and a proof cannot be found automatically, the transcript is examined, which may reveal a variety of things: the conjecture may have to be modified, a definition may have to be fixed, or perhaps an intermediate lemma needs to be hypothesized. RRL supports some heuristics for automatically generating intermediate lemmas based on formulas generated during a proof attempt. Lemmas which cannot be generated automatically by RRL must be provided by the user; this is where RRL needs guidance. Below, we briefly discuss the main features of RRL found useful for hardware verification.

The specification language supported in RRL is equational, with support for defining abstract data types using constructors. Circuits and their behavioral specifications are given as equations and conditional equations. Definitions are distinguished from properties (lemmas) by the equality symbol used: := for definitions and == for properties. The correctness of circuit descriptions is established by proving various properties about these descriptions, showing that they meet the behavioral specifications. After transforming definitions into terminating rewrite rules, RRL attempts to prove a conjecture by normalizing its two sides using contextual rewriting [23] and decision procedures for discharging hypotheses, if any, and checking whether the normal forms of the two sides are identical. If it succeeds, the proof is said to have been obtained using equational reasoning and decision procedures. Otherwise, a proof by induction is attempted using the cover set method of generating induction schemes [24].
RRL has built-in heuristics for
1. orienting equations into terminating rewrite rules,
2. identifying the next rewrite rule to apply for simplification, and, for that, determining the instantiation of the free variables and discharging the conditions, if any, of the rewrite rule,
3. invoking decision procedures for numbers (quantifier-free Presburger arithmetic), bits, data types with free constructors, and propositional logic,
4. selecting the next inference rule,
5. automatic case analyses,
6. choosing induction schemes based on the definitions of the function symbols appearing in a conjecture and the interaction among these definitions,
7. generating needed intermediate lemmas, as well as
8. automatic backtracking when a proof attempt fails.

The user is thus relieved of the task of having to determine the sequence in which rewrite rules should be applied, when decision procedures should be invoked, how rewrite rules should be instantiated, when induction is performed, as well as which variables and induction scheme to use for induction. The cover set method generates induction schemes based on the well-founded orderings used to establish termination of function definitions. Based on an induction scheme, the conjecture is split into subgoals to be proved in order to prove the original conjecture. Each subgoal is then tried just like the original conjecture. If a proof attempt based on a particular induction scheme does not lead to a counterexample, but also does not succeed, RRL automatically backtracks to pick another induction scheme. While attempting to find proofs, RRL automatically attempts to generate new intermediate lemmas needed to find a proof of a conjecture. Currently, RRL implements a simple heuristic for generalizing conjectures by abstracting to new variables common subexpressions that appear in a conjecture and satisfy certain criteria.
New intermediate lemma speculation heuristics have been investigated in [9, 10] and will be implemented, as we consider intermediate lemma speculation to be the most critical research direction for automating proofs by induction.
4 Formalizing SRT Division in RRL

The SRT divider in Figure 2 is equationally specified in RRL. We first discuss how the quotient selection table is axiomatized. The recurrence relations for the partial remainder and the quotient are then axiomatized using the quotient digit.
4.1 Formalizing Quotient Selection Table

The quotient selection table can be input essentially as is into RRL. As discussed earlier, even though the partial remainder is truncated to 7 bits, only 5 bits are used to index the rows of the table. Every entry in the table thus stands for four remainder estimates. In some cases, the table entry depends upon the two least significant bits in the estimate. To show that dependency, the symbolic entries A, B, C, D, E are used in the table; their values depend upon the values of these bits. The table is input to RRL by defining the function nextqdigit below, such that given a row index and a column index, it gives the entry in the table. Instead of using fractional numbers for indices, it is more convenient and faster for the prover to use their scaled integer versions as indices to the table; so all row and column indices are scaled up by 8. (Scaling up effectively amounts to using the number representations of the bit vectors of the shifted truncated partial remainder estimate and the truncated divisor estimate, obtained by dropping the decimal point.) Since the table is big, we give only a partial specification of one of the rows, the eighth row, to illustrate how it is input into RRL. The eighth row is indexed by -5/2 (ignoring the two least significant bits), scaled up by multiplying by 8 to -20 (2's complement representation is used in the table for row indices); the columns are indexed by 8/8 to 15/8, and they are also scaled up by multiplying by 8. The function m below stands for the minus operation.

nextqdigit(m(20),  8) := m(2),    nextqdigit(m(20),  9) := m(2),
nextqdigit(m(20), 10) := m(2),    nextqdigit(m(20), 11) := m(2),
nextqdigit(m(20), 12) := m(1),    nextqdigit(m(20), 13) := m(1),
nextqdigit(m(20), 14) := m(1),    nextqdigit(m(20), 15) := m(1).
The eighth row corresponds to four shifted truncated remainder estimates, {-5/2, -19/8, -9/4, -17/8}, depending upon the values of g2g1. It includes the entry B when the column index is 1.011, where B = -(2 - g2). For all other column indices, the entries do not depend upon g2g1; so for those column indices, nextqdigit is the same irrespective of whether the first argument is -20, -19, -18, or -17. If the second argument of nextqdigit is 11, then its value is -2 if the first argument is -20 or -19, since in that case g2 is 0; if the first argument is -18 or -17, then nextqdigit is -1. Below, we give the specification for those cases when the second argument is 11.

nextqdigit(m(19), 11) := m(2),
nextqdigit(m(18), 11) := m(1),
nextqdigit(m(17), 11) := m(1).
Other rows in the table lead to similar specifications, with each row defining 32 entries. Inaccessible table entries (represented as - in the above table) are represented as a large negative or a large positive integer, depending upon whether they occur at the top or the bottom of the table, respectively. The specification defines nextqdigit on 768 pairs of indices.

In our first proof attempt for the above SRT circuit, as sketched in [5], we followed an approach similar to the one taken in [2] for specifying the quotient selection table. The table in that proof is not explicitly represented but is instead abstracted as a predicate defined using six boundary values denoting the endpoints of the partial remainder ranges for the choice of the five quotient digits [-2, 2]. For example, the quotient digit -2 is chosen whenever the partial remainder is in the range [b1, b2). The boundary value b1 is the minimum value of the shifted truncated partial remainder estimate for which the selected quotient digit is -2; b1 is explicitly enumerated for every truncated divisor value. The other boundary values are specified similarly. RRL was successful in obtaining a proof similar to the one reported in [2]. That proof has fewer cases, but the reasoning in each case is more complex and tedious, taking more cpu time; in contrast, the proof discussed later has more cases, but each case is easily proved. We will contrast that proof, based on such an intensional formalization of the quotient selection table, after we have discussed the proofs of the invariants in the next section using the above explicit, extensional representation of the table.
4.2 Formalizing Partial Remainder and Quotient Computations
The recurrence relations for partial remainders and quotients discussed above can be equationally axiomatized as two functions, nextparrem and nextquot, respectively.3 The inputs of nextquot are: the quotient computed so far, as well as the scaled truncated partial remainder and the scaled truncated divisor, used for indexing the quotient selection table to select the quotient digit. The new quotient is obtained by shifting the previous quotient and then adding or subtracting the selected quotient digit, based on its sign.4

nextquot(quotient,truncp,truncd) :=
    4 * quotient + nextqdigit(truncp,truncd).
The function nextparrem for computing the partial remainder is similarly defined. Its inputs are: the partial remainder computed at the previous iteration, the scaled truncated previous partial remainder, the divisor, and the scaled truncated divisor.

nextparrem(parrem,truncp,divisor,truncd) :=
    4 * parrem - nextqdigit(truncp, truncd) * divisor.
5 A Correctness Proof of SRT Division in RRL

The correctness of the SRT divider circuit is established by proving two invariants about the circuit. It is assumed that the divider circuit is initialized appropriately, with dividend/4 as the initial partial remainder and with the initial quotient being zero.

3 nextparrem and nextquot were called nrout and nquot, respectively, in [5].
4 In the circuit, addition/subtraction in GALU and DALU is selected based on the sign of the quotient digit. The functions nextquot and nextparrem could easily have been specified as adding or subtracting based on the sign of the quotient digit, exactly mimicking that selection in the circuit; in fact, that is how nextquot and nextparrem were specified in our first proof attempt [5]. The proofs using such specifications can be found by RRL just as easily, except that there are more subcases. The above formalization is more compact: the quotient sign is not made explicit but is instead part of the quotient digit, and the DALU computation (addition or subtraction) gets abstracted, leading to fewer subcases.
5.1 Invariant relating Partial Remainder and Quotient to Dividend
The first invariant states that in every iteration, the dividend equals the sum of the partial remainder computed so far and the divisor multiplied by the quotient computed so far. However, as discussed in section 2, to mechanically align the partial remainder with the multiple of the divisor, the partial remainder is left shifted in each step; similarly, the quotient is left shifted to align the quotient digit. The quotient and the partial remainder at step i are thus scaled up from their actual values by 4^i. The relationship between the dividend, the partial remainder, and the divisor is:

  4^(i-1) * dividend = Q_i * divisor + P_i.

The above property can be verified by first showing that it holds initially (i = 0), which is indeed the case assuming that the partial remainder and the quotient are properly initialized. For the other steps, it follows from the following invariant relating the partial remainders and quotients in successive iterations:

  Q_{i+1} * divisor + P_{i+1} = 4 * (Q_i * divisor + P_i).

This is input to RRL as:

nextparrem(parrem,truncp,divisor,truncd)
  + nextquot(quotient,truncp,truncd)*divisor
  == 4*(parrem+(quotient*divisor)).
This property is automatically established in RRL by contextual rewriting and the decision procedure for linear arithmetic.
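The identity underlying this invariant holds for any selected quotient digit, which can be checked numerically (a sketch with the table lookup abstracted to a plain digit argument, so the signatures below are simplified relative to the RRL definitions):

```python
import random
from fractions import Fraction

def nextquot(quotient, qdigit):
    """Quotient recurrence with the table lookup abstracted away."""
    return 4 * quotient + qdigit

def nextparrem(parrem, qdigit, divisor):
    """Partial remainder recurrence, likewise abstracted."""
    return 4 * parrem - qdigit * divisor

# Invariant: Q'*D + P' == 4*(Q*D + P), whatever digit is chosen.
for _ in range(100):
    Q = random.randint(-50, 50)
    P = Fraction(random.randint(-20, 20), 8)
    D = Fraction(random.randint(8, 15), 8)
    qd = random.randint(-2, 2)
    assert nextparrem(P, qd, D) + nextquot(Q, qd) * D == 4 * (P + Q * D)
```

The divisor term contributed by the digit cancels between the two recurrences, which is why RRL can discharge this goal purely by rewriting and linear arithmetic.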
5.2 Partial Remainders Never Go Out of Bounds
The second invariant is the more interesting one. It establishes that the division circuit converges to a correct quotient and does not overflow. As discussed in section 2, to ensure this for a radix 4 SRT divider with the redundant quotient digit representation [-2, 2], it is necessary that the successive partial remainders be bounded as:

  -2/3 * divisor <= P_i <= 2/3 * divisor.

Since the partial remainder computation in every iteration depends on the quotient digit selected, this invariant also establishes the correctness of the quotient selection table with respect to the shifted truncated partial remainder and the truncated divisor. The second invariant is specified to RRL as follows:

m(2) * divisor <= 3 * nextparrem(parrem,truncp,divisor,truncd) and
3 * nextparrem(parrem,truncp,divisor,truncd) <= 2 * divisor
  if m(2) * divisor <= 3 * parrem and 3 * parrem <= 2 * divisor
     and truncd <= 8*divisor and 8 * divisor < truncd + 1
     and truncp = 32*rout1
     and GALU_desc(parrem, rin, qd*divisor, rin1, md1, qsign, rout1),
where

GALU_desc(rout, rin, md, rin1, md1, qsign, rout1) ==
  (rin1 <= rin) and (64 * rin < 64 * rin1 + 1) and
  (md1 <= md) and (64 * md < 64 * md1 + 1) and
  cond(qsign=0, rout1 = rin1+md1, 64*rout1 = 64*rin1-64*md1-1) and
  cond(qsign=0, rout = rin + md, rout = rin - md).
The invariant states that if
1. the partial remainder is within the bounds in the previous iteration,
2. truncd is the scaled-up version of the truncation of the divisor to 3 bits after the decimal point, and
3. truncp is the scaled-up version of the shifted output rout1 (multiplication by 4) of GALU,
then the partial remainder in the next iteration is also within the bounds. The predicate GALU_desc describes the 8-bit GALU unit, which approximates the DALU computation. Given a partial remainder and a quotient digit multiplied with the divisor, DALU computes the next partial remainder by subtraction (or addition, depending upon the sign of the quotient digit). GALU performs the same computation as the full ALU unit DALU; however, GALU operates on truncated versions of the inputs to DALU, and is much faster. The quotient digits are selected using the GALU output, and this allows the quotient digit selection for the next iteration to be done in parallel with the fully precise DALU computation. The first two conjuncts in the formula describing the GALU circuit assert that the truncated partial remainder approximates the corresponding partial remainder up to 6 bits. The next two conjuncts make a similar assertion for its second input, the truncated divisor multiplied by the quotient digit. The first conditional expression describes the behavior of GALU, and the second one describes the behavior of DALU.5 The second invariant is automatically proved by RRL by case analysis on the quotient selection table entries. Subcases are generated from the cover set of nextqdigit. There are 768 subcases, one for each pair of shifted truncated partial remainder and truncated divisor estimates. In some subcases, further case analysis is done on qsign being 0 or 1. This analysis is repeated by RRL for the upper bound and the lower bound.
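The GALU_desc predicate can be transcribed directly into an executable check (a sketch in Python with exact arithmetic; the function names are ours):

```python
from fractions import Fraction

def cond(test, then_val, else_val):
    """Mirror of RRL's built-in cond primitive."""
    return then_val if test else else_val

def galu_desc(rout, rin, md, rin1, md1, qsign, rout1):
    """Direct transcription of the GALU_desc predicate from the text."""
    return (rin1 <= rin and 64 * rin < 64 * rin1 + 1
            and md1 <= md and 64 * md < 64 * md1 + 1
            and cond(qsign == 0,
                     rout1 == rin1 + md1,
                     64 * rout1 == 64 * rin1 - 64 * md1 - 1)
            and cond(qsign == 0,
                     rout == rin + md,
                     rout == rin - md))

# Sample instance: inputs truncated toward minus infinity to 6
# fractional bits; qsign = 0 selects addition in both ALUs.
rin, md = Fraction(3, 32), Fraction(11, 128)      # exact DALU inputs
rin1, md1 = Fraction(6, 64), Fraction(5, 64)      # 6-bit truncations
assert galu_desc(rin + md, rin, md, rin1, md1, 0, rin1 + md1)
```

The sample values satisfy the truncation conjuncts (each exact input exceeds its truncation by less than 1/64), illustrating what the hypotheses of the second invariant admit.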
Each of these cases is proved very easily using rewriting and the linear arithmetic decision procedure, with RRL taking a total CPU time of 58 seconds on a Sun Sparc 5 with 64MB of memory. This is despite the fact that RRL currently uses a unary representation of numbers, leading to very large expressions that must be rewritten. A slight modification to represent numbers efficiently is likely to reduce the total proof time by at least an order of magnitude, if not two.

5 The built-in primitive cond in RRL is used for constructing conditional expressions. It is used by the prover for generating case analyses.
For those pairs of truncated estimates used as indices to select a quotient digit in the range [-2, 2], the property that the partial remainder in the next iteration remains within the bounds follows from the property that the partial remainder in the previous iteration is within the bounds. For truncated estimates corresponding to the table entry -, the hypotheses of the above invariant become inconsistent, because of the constraint that the partial remainder in the previous iteration was within the bounds. We also show that no pair of indices other than those in Table 1 needs to be considered. Only the specified column indices can occur, since the divisor is normalized and positive, and we have assumed that the truncated divisor correctly approximates the divisor up to 3 bits. The completeness of the row indices is established in RRL by the following property, which is easily proved by RRL using rewriting and linear arithmetic.

not(truncp < m(48)) and not(47 < truncp)
  if m(2) * divisor <= 3 * parrem and 3 * parrem <= 2 * divisor
     and 8 <= truncd and truncd <= 15
     and m(2) <= qd and qd <= 2
     and truncd <= 8*divisor and 8*divisor < truncd+1
     and truncp = 32*rout1
     and GALU_desc(parrem, rin, qd * divisor, rin1, md1, qsign, rout1).
A major noticeable difference between this proof and our earlier proof reported in [5] (as well as the proof in [2]) is that the second invariant is proved directly, without using any assumptions. In contrast, the second invariant relied on two assumptions in our earlier proof, much like the proof in [2]. These assumptions (that the output of GALU is the truncation of the output of DALU, and that the output of GALU together with the truncated divisor selects proper quotient digits from the quotient selection table) first had to be manually identified and then proved separately, thus making the proof interactive. In the above proof, these properties are automatically derived and appropriately used to establish the second key invariant. Additionally, the new proof establishes the completeness of the indices of the table in [20]. This is in contrast to our earlier proof (and the proof in [2]), where the rows in the table of [20] implicitly extend in both directions with the out-of-bounds values.
5.3 Detecting Errors in Quotient Digit Selection Table

Despite our careful transcription of the table from [20], two errors were made in the table specification input into RRL. We discuss below how these bugs were detected using RRL, since this may be illustrative of finding possible bugs in a quotient selection table. Both errors in the transcription of the quotient selection table were detected while attempting a proof of the second invariant. Since every quotient selection table entry gives rise to an intermediate subgoal of the second invariant, RRL is unable to prove the subgoal corresponding to an erroneous entry. It also explicitly displays the table index values from which the corresponding subgoal was generated. Using this, it is straightforward to determine the table indices associated with a failing subgoal.

The first error was in the table entry indexed by (1111.0, 1.000), which was erroneously assigned the value -2 instead of -1. With this entry, RRL established that the partial remainder would violate its upper bound, as explained below. Based on the table entry, rout1 = -1/4, truncp = -8, truncd = 8, and nextqdigit(m(8), 8) = -2. Let divisor = 1 and parrem = -6/25. Under these values, the conjuncts m(2) * divisor <= 3 * parrem and 3 * parrem <= 2 * divisor in the hypothesis of the second invariant reduce to true. The conjuncts truncd <= 8*divisor and 8 * divisor < truncd + 1 also reduce to true. Finally, the predicate GALU_desc is true, implying that GALU approximates the partial remainder correctly up to 5 bits. However, the next partial remainder violates the upper bound, since 4*(-6/25) + 2*1 is greater than 2/3.

The second error was in the table entry indexed by (0100.0, 1.111). The entry was erroneously assigned the out-of-bounds value (we used 10 for it) instead of 2. For this entry, using an analysis similar to the above, RRL established that the partial remainder would violate its lower bound.
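The first counterexample can be replayed directly (a sketch; the correct digit -1 is included for contrast):

```python
from fractions import Fraction

D = Fraction(1)            # divisor used in the counterexample
P = Fraction(-6, 25)       # previous partial remainder
# hypothesis of the second invariant: -2/3*D <= P <= 2/3*D
assert -2 * D <= 3 * P <= 2 * D

# the erroneous table entry selects digit -2 ...
bad = 4 * P - (-2) * D     # = 26/25
assert 3 * bad > 2 * D     # ... and the upper bound is violated

# the correct entry -1 keeps the next remainder within bounds
good = 4 * P - (-1) * D    # = 1/25
assert -2 * D <= 3 * good <= 2 * D
```

With the wrong digit the next remainder is 26/25 > 2/3, exactly the violation RRL reports; with the correct digit it is 1/25, well inside the bounds.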
6 Comparison with Related Work

Having now discussed our proof, this section compares it with the related work. We believe that the above formalization as well as the proof are much simpler than other formalizations of related circuits in the literature.6 Verkest et al. [21] discussed a proof of a nonrestoring division algorithm using Boyer and Moore's prover. Leeser and O'Leary [12] discussed a circuit for subtractive radix-2 square root using the NuPRL system. Square root and division circuits are considered to be related to each other [20], but the circuits verified in [21, 12] are not based on the SRT method.

Since Intel's Pentium bug was reported in the media, there has been a lot of interest in automated verification of SRT divider circuits [18, 2, 4, 13]. Bryant [1] discussed how BDDs can be used to perform a limited analysis of some of the invariants of the SRT circuit. As reported in [18], Bryant had to construct a checker circuit much larger than the verified circuit to capture its specification. As reported in [2], German and Clarke [4] formalized Taylor's description of the SRT division circuit [20] as a set of algebraic relations on the real numbers. According to them, most of the hardware for the SRT algorithm could be described using linear inequalities. They used Maple, a computer algebra system, to prove properties of the SRT circuit by reasoning about linear inequalities using its Simplex algorithm package.

6 We are, however, also guilty of not making the proof even simpler by using fractions instead of integers.
This formalization was subsequently improved in [2] by Clarke, German, and Zhao using the language of Analytica, a theorem prover built on top of Mathematica, another commercially available computer algebra system. To quote the authors, "Analytica is the first theorem prover to use symbolic computation techniques in a major way. ... Compared to Analytica, most theorem provers require significant user interaction. The main problem is the large amount of domain knowledge that is required for even the simplest proofs. Our theorem prover, on the other hand, is able to exploit the mathematical knowledge that is built into the symbolic computation and is highly automatic." A correctness proof of the SRT divider circuit was then done using Analytica. The main feature of the proof was an abstraction of the quotient selection table using the six boundary value predicates, as discussed in subsection 4.1. This abstraction had to be provided manually by Clarke et al. The proof of the invariants using this intensional representation of the quotient selection table involves reasoning about inequalities, which can become quite tedious and involved. Perhaps that is why Clarke et al. had to use Analytica, with the Mathematica algorithms for symbolic manipulation and inequality reasoning, and the Analytica part built on Mathematica for logical reasoning. Even though it is claimed in [2] that the proof is "fully automatic" (p. 111 in [2]), the proof (especially the proof of the second invariant regarding the boundedness of partial remainders) had to be decomposed manually, and the two assumptions had to be discharged manually.

Our first proof attempt, discussed in [5], was essentially an exercise to determine how much of the Analytica proof [2] could be done automatically by RRL without using any symbolic computation algorithms of computer algebra systems. We mimicked the proof in [2], but made the data dependency of the various circuit components on the different data paths explicit, and identified the different assumptions made in the proof reported in [2]. Much to our surprise, once we succeeded in translating the Analytica specification into RRL's equational language (which, by the way, was the most nontrivial part of this proof), RRL was able to find proofs of all the formulas (the first invariant and the second invariant with the assumptions, as well as the discharging of the assumptions) automatically, without any interaction. The main inference method used was contextual rewriting integrated with the linear arithmetic procedure in RRL. No additional mechanism was found necessary, and no extensions had to be made to RRL. Further, the entire proof (including the proof of the second invariant, involving over 96 cases) could be done in less than 15 minutes on a Sparc 5 Sun workstation.7 No timing information is given in [2] for their proof using Analytica.

A brief comparison of the above proof with the proof reported in [5], based on representing the table using boundary value predicates:

1. The axiomatization using the explicit table representation is much simpler, and it does away with an aspect of specification development where human guidance is needed for abstracting table entries as predicates.

7 We would also like to point out that RRL is an old piece of software written in Common Lisp.
2. Since the abstract representation of the quotient selection table using boundary value predicates in [5, 2] just considers the minimum and maximum values of a partial remainder for every quotient digit, thus losing information on other relations among entries, it is possible to certify erroneous tables correct. For instance, the rst error in the table discussed above could be detected using the explicit representation of the table, but would be undetected using the boundary value predicate formulation.8 3. Even though the above proof has nearly 800 subcases, the proof of each subcase is much easier and a lot quicker to obtain as a subformula typically involves numeric constants that can be easily simpli ed and reasoned about. This is in contrast to the proof in [5] from the speci cation of the table using boundary value predicates which has 96 subcases. Each subcase in that proof involves a formula with variables with linear constraints, and the reasoning is not as simple and easy. 4. The above proof of the second invariant is direct without making any assumptions. In our earlier proof (as in the proof in [2]), two assumptions had to be made, which were subsequently established. 5. The new proof takes less than 1 minute even with more subcases, in contrast to the earlier proof which took around 15 minutes using the same version of the prover on the same machine. The proof reported in [18] using the PVS system is more general than the above proof as well as the proofs in [2, 5]. Its structure is essentially based on an earlier proof of Clarke and German [4]. First, a general theory of SRT division for arbitrary radix r and arbitrary redundant quotient digit range [,a; a] is developed. Constraints on a quotient digit selection table are identi ed in terms of r; a and other parameters. This theory is then instantiated for the radix 4 SRT division circuit. The constraints on the table are also instantiated on a speci c quotient digit selection table. 
The specification and the proof are organized manually using sophisticated mechanisms of the PVS language, which supports higher-order logic, dependent types, overloading, a module facility, and a special table data type [18]. The specification is developed with considerable human ingenuity, and the resulting proof is manually driven, even though parts of the proof can be done automatically using previously developed PVS tactics. As reported in [18], the correctness proof of the table implementation itself took 3 hours of CPU time, with the whole proof taking much longer even with the user's help. Miner and Leathrum's work [13] is a generalization of the proof in [18]; it formalizes a subset of the IEEE floating point standard and uses it to do the proof for
floating point arithmetic, thus providing a formal link relating the SRT division algorithm, among other things, to the IEEE standard subject to IEEE compliant rounding. This proof uses even more sophisticated features of the PVS system, and is highly interactive.
8 It may, however, be possible to avoid such errors if additional constraints, such as monotonicity of the entries in every row of the table, are also included in its specification along with the boundary value predicates.
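The monotonicity constraint mentioned in footnote 8 can be made concrete with a toy sketch (the intervals and entries below are hypothetical, not the real radix-4 table): because redundant quotient digits make the allowed remainder intervals of adjacent digits overlap, a per-entry "boundary value" check can pass a table whose entries are wrong relative to one another, while an explicit representation also lets us check relations such as row monotonicity.

```python
# Hypothetical overlapping remainder intervals per quotient digit (redundancy).
RANGE = {0: (-2, 2), 1: (1, 5), 2: (4, 8)}

def boundary_ok(table):
    # Boundary-value-style check: each entry lies in its digit's interval.
    return all(RANGE[q][0] <= p <= RANGE[q][1] for p, q in table.items())

def monotone_ok(table):
    # Explicit-representation check: digits are nondecreasing along a row.
    ps = sorted(table)
    return all(table[a] <= table[b] for a, b in zip(ps, ps[1:]))

# A correct toy row, and a corrupted copy with two interior digits swapped.
good = {-2: 0, -1: 0, 0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2, 8: 2}
bad = dict(good)
bad[4], bad[5] = 2, 1   # both swapped entries still fall inside the overlap

# The boundary check accepts both tables; only the monotonicity check,
# available with the explicit representation, catches the corruption.
```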
Moore et al. [14] reported a proof of correctness of the kernel of a microcoded
floating point division algorithm implemented in AMD's 5K86 processor. The algorithm is defined in terms of floating point addition and multiplication. The proof is done using ACL2, a descendant of Boyer and Moore's prover. No claim is made about making the proof automatic; rather, the main emphasis is on formalizing IEEE floating point arithmetic and compliant rounding, proving a library of relevant and useful properties of floating point arithmetic, and using it to verify the division algorithm, which is based on the Newton-Raphson method.
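The Newton-Raphson kernel mentioned above can be sketched in idealized real arithmetic (the verified microcode of course works with IEEE formats and rounding, which this sketch ignores): the reciprocal of d is a root of f(x) = 1/x - d, giving the iteration x ← x(2 - dx), which needs only multiplications and subtractions and doubles the number of correct digits per step.

```python
def nr_divide(n, d, steps=6):
    # Assumes d has been scaled into [1, 2), as hardware normalization does.
    x = 0.5                      # crude seed for 1/d on [1, 2)
    for _ in range(steps):
        x = x * (2.0 - d * x)    # Newton step: no division is performed
    return n * x                 # n/d = n * (1/d)
```

With the error e = 1 - dx squaring at every step, six iterations from the crude seed already reach double precision on this range.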
7 Concluding Remarks and Future Enhancements to RRL
We have discussed a proof of correctness of the SRT division algorithm using the rewrite-based theorem prover RRL. It is believed that an algorithm similar to the one discussed above was implemented in the Intel Pentium chip, which was found to be buggy. The bug is suspected to be either wrong entries in the quotient digit selection table or the fact that certain portions of the table considered inaccessible could be accessed during the division computation. The salient features of the above proof are that 1. the formalization is much simpler than the ones reported in [2, 18], 2. the quotient digit selection table is explicitly specified, in contrast to the specification of the table in [2] in terms of boundary value predicates, an abstraction that a human designer would have to perform, 3. the proof of the second key invariant about the circuit is simpler, making fewer assumptions than in the proof reported in [2], 4. the proof search takes far fewer resources than the other proofs (58 seconds of CPU time on a Sparc 5 Sun with 64MB of memory), and 5. possible bugs in the quotient digit selection table can be easily identified. From our work on using RRL for mechanizing verification of arithmetic circuits, a number of lessons can be drawn about possible enhancements to rewrite-based provers such as RRL in order to make them better suited to the application of hardware verification. Perhaps the most critical feature of a prover for this application is that it should be able to perform simplification (rewriting) using conditional rewrite rules very efficiently. Secondly, a prover should also be able to do automatic case analyses quickly. An important research problem is to develop heuristics for efficient case analysis that minimize duplication of effort by recognizing a common subset of assumptions sufficient to establish many related subcases.
Thirdly, for proofs of parameterized and generic circuits with subcircuits specified by constraints on their input/output behavior, a theorem prover should be able to perform proofs by induction. A related issue is that of automatic generation/speculation of intermediate lemmas, as behavioral specifications often cannot be proved directly from circuit specifications. We have briefly discussed these issues in [5]. Mechanizing verification of the SRT division circuit highlights additional issues related to representing and reasoning about data structures found useful
in describing such circuits. The above proof involves reasoning about numeric constants while dealing with truncated partial remainders and truncated divisors, as well as bounds on the partial remainder. A prover should be able to efficiently represent, display and reason about numeric constants. RRL does not provide good support for explicitly representing and manipulating numeric constants. Numbers are represented in unary notation using 0, successor and predecessor. In order to specify the SRT divider, the numeric constants have to be specified by the user as abbreviations defined in terms of these constructors. This imposes an unnecessary burden on the user, who has to specify such trivial details. Further, in proofs generated by RRL, these abbreviations are expanded into their unary representations, leading to unnecessarily large formulas. Rewriting such formulas is considerably slower, and intermediate formulas with such large term structures also make the mechanical proofs hard to read and understand. A better representation for numbers and reasoning support for rational numbers would be helpful in formalizing floating point arithmetic and proving properties of floating point circuits. Such enhancements to RRL will help in the verification of hardware implementations of sophisticated graphics and media processing algorithms [26, 25], which have widespread use. Many circuits, including the SRT divider, rely on preprogrammed read-only memory to implement tables for fast computation using look-ups. A direct encoding of a table data structure, support for reasoning about it, and efficient handling of the large case analyses arising from the many entries in such a table would all help. Specifications of such circuits are error-prone because of the many cases to be considered as well as the large tables involving many numeric values.
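The representational burden of unary notation described above can be illustrated with a small sketch (the constructor names are illustrative, not RRL syntax): the term for a number n built from 0, successor and predecessor has size proportional to n, whereas a positional representation grows only with log n.

```python
def unary(n):
    # Build the unary term for integer n from 0, successor s and predecessor p.
    t = "0"
    for _ in range(abs(n)):
        t = ("s(" if n > 0 else "p(") + t + ")"
    return t

# unary(5) yields "s(s(s(s(s(0)))))": even a modest table entry such as 12
# expands to a 12-deep term, and formulas full of such terms are slow to
# rewrite and hard to read.
```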
When a theorem prover detects a bug, it would be helpful for it to trace the bug back to the specification, in particular identifying the offending subcase and/or table entry. Identifying multiple bugs in a single run would further enhance the designer's productivity in developing such specifications.
References 1. R.E. Bryant, Bit-level Analysis of an SRT Divider Circuit. Tech. Rep. CMU-CS-95-140, Carnegie Mellon University, April 1995. 2. E.M. Clarke, S.M. German and X. Zhao, "Verifying the SRT division algorithm using theorem proving techniques," Proc. Computer Aided Verification, 8th Intl. Conf. - CAV'96, New Brunswick, July/August 1996, Springer LNCS 1102 (eds. Alur and Henzinger), 111-122. 3. M.D. Ercegovac and T. Lang, Division and Square Root: Digit Recurrence Algorithms and Implementations. Kluwer, 1994. 4. S. German, Towards Automatic Verification of Arithmetic Hardware. Lecture Notes, 1995. 5. D. Kapur, "Rewriting, decision procedures and lemma speculation for automated hardware verification," Proc. 10th Intl. Conf. Theorem Proving in Higher Order Logics, LNCS 1275 (eds. Gunter and Felty), Murray Hill, NJ, Aug 1997, 171-182. 6. D. Kapur and X. Nie, "Reasoning about numbers in Tecton," Proc. 8th Intl. Symp. Methodologies for Intelligent Systems (ISMIS'94), Charlotte, North Carolina, October 1994, 57-70.
7. D. Kapur and M. Subramaniam, "Mechanically verifying a family of multiplier circuits," Proc. Computer Aided Verification, 8th Intl. Conf. - CAV'96, New Brunswick, July/August 1996, Springer LNCS 1102 (eds. Alur and Henzinger), 1996, 135-146. 8. D. Kapur and M. Subramaniam, "Mechanical verification of adder circuits using powerlists," Dept. of Computer Science Tech. Report, SUNY Albany, November 1995. Accepted for publication in J. of Formal Methods in System Design. 9. D. Kapur and M. Subramaniam, "Lemma discovery in automating induction," Proc. Intl. Conf. on Automated Deduction, CADE-13, LNAI 1104 (eds. McRobbie and Slaney), New Jersey, July 1996. 10. D. Kapur and M. Subramaniam, "Intermediate lemma generation from circuit descriptions," under preparation, State University of New York, Albany, NY, October 1997. 11. D. Kapur and H. Zhang, "An overview of Rewrite Rule Laboratory (RRL)," J. of Computers and Mathematics with Applications, 29, 2, 1995, 91-114. 12. M. Leeser and J. O'Leary, "Verification of a subtractive radix-2 square root algorithm and implementation," Proc. ICCD'95, IEEE Computer Society Press, 1995, 526-531. 13. P.S. Miner and J.F. Leathrum Jr., "Verification of an IEEE compliant subtractive division algorithm," Proc. FMCAD'96, Palo Alto, CA, 1996. 14. J Moore, T. Lynch and M. Kaufmann, A Mechanically Checked Proof of the Correctness of the AMD5K86 Floating Point Division Algorithm. Computational Logic, Inc. Technical Report, March 1996. 15. S.F. Obermann and M.J. Flynn, An Analysis of Division Algorithms and Implementations. Technical Report CSL-TR-95-675, Stanford University, July 1995. 16. A.R. Omondi, Computer Arithmetic Systems: Algorithms, Architecture and Implementations, Prentice Hall, 1994. 17. J.E. Robertson, "A new class of digital division methods," IRE Transactions on Electronic Computers, 1958, 218-222. 18. H. Ruess, N. Shankar and M.K. Srivas, "Modular verification of SRT division," Proc. Computer Aided Verification, 8th Intl. Conf.
- CAV'96, New Brunswick, July/August 1996, Springer LNCS 1102 (eds. Alur and Henzinger), 123-134. 19. K.D. Tocher, "Techniques of multiplication and division for automatic binary computers," Quarterly Journal of Mechanics and Applied Mathematics, 11(3), 1958. 20. G.S. Taylor, "Compatible hardware for division and square root," Proc. 5th IEEE Symp. on Computer Arithmetic, May 1981. 21. D. Verkest, L. Claesen, and H. De Man, "A proof of the nonrestoring division algorithm and its implementation on an ALU," J. Formal Methods in System Design, 4, Jan. 1994, 5-31. 22. T.E. Williams and M. Horowitz, "A 160ns 54-bit CMOS division implementation using self-timing and symmetrically overlapped SRT stages," Proc. 10th IEEE Symp. on Computer Arithmetic, 1991. 23. H. Zhang, "Implementing contextual rewriting," Proc. 3rd Intl. Workshop on Conditional Term Rewriting Systems, Springer LNCS 656 (eds. Remy and Rusinowitch), 1992, 363-377. 24. H. Zhang, D. Kapur, and M.S. Krishnamoorthy, "A mechanizable induction principle for equational specifications," Proc. 9th Intl. Conf. Automated Deduction (CADE), Springer LNCS 310 (eds. Lusk and Overbeek), Chicago, 1988, 250-265. 25. Proc. of the Eighth Symp. on HOT Chips, IEEE Computer Society, California, 1996. 26. Proc. of the Ninth Symp. on HOT Chips, IEEE Computer Society, California, 1997.
On the Complexity of Parallel Implementation of Logic Programs* (Extended Abstract)
E. Pontelli, D. Ranjan, G. Gupta Dept. of Computer Science New Mexico State University Las Cruces, NM 88003 USA
{epontell,dranjan,gupta}@cs.nmsu.edu
Abstract. We study several data structures and operations that commonly arise in parallel implementations of logic programming languages. The main problems that arise in implementing such parallel systems are abstracted out and precisely stated. Upper and lower bounds are derived for several of these problems. We prove a lower bound of Ω(log n) on the overhead incurred in implementing even a simplified version of or-parallelism. We prove that the aliasing problem in parallel logic programming is at least as hard as the union-find problem. We prove that an and-parallel implementation can be realized on an extended pointer machine with an O(1) overhead.
1 Introduction Logic programming (LP) is a popular programming paradigm that has been used in a wide variety of applications, ranging from Artificial Intelligence, Genetic Sequencing, Database Programming, Expert Systems, Natural Language Processing, and Constraint-based Optimization to general purpose programming and problem solving. A nice property of logic programming languages is that parallelism can be automatically extracted from logic programs by the compiler or the runtime system. However, the implementation of a parallel logic programming system poses many interesting and challenging problems. Several solutions have been proposed for these problems and parallel implementations have been realized in the past. To date, however, little attention has been paid to analyzing the complexity of the operations involved in such implementations. The problem of implementing a logic programming language can be abstracted as the process of maintaining a dynamic tree. The operational semantics of the logic language determines how this tree is built and what operations on the tree are of interest. As execution proceeds according to the operational semantics, this tree grows and shrinks. In a parallel implementation the tree can
* This work has been partially supported by NSF grants CCR 96-25358, INT 95-15256, and HRD 9628450, and by NATO Grant CRG 921318.
grow and shrink in parallel. Various operations are needed during parallel execution to guarantee correct behaviour. For example, at runtime we may need to determine whether, at a particular moment, a given node is in the leftmost branch of the tree that exists at that moment; or, given two nodes, whether one node is an ancestor of the other; etc. Although dynamic data structures have been studied extensively [2, 5, 23, 8, 20, 21, 4], to the best of our knowledge the specific data structures needed to support parallel logic programming have not been studied formally. In this paper we derive upper and lower bounds for some of these operations. We prove a lower bound of Ω(log n) on the overhead incurred in implementing even a restricted version of or-parallelism (no aliasing). We prove that the aliasing problem in parallel implementation of Prolog is at least as hard as the union-find problem. This gives us a lower bound of Ω(n + mα(m, n)) for the aliasing problem on pointer machines, due to the results of [18, 21]. We prove that an and-parallel implementation can be realized on an extended pointer machine with an O(1) overhead. We also give a scheme to support a restricted version of or-parallelism in time Õ(∛n) per operation. To our knowledge this is the best scheme known to date. We also give a tight relationship between the and-parallelism problem and the problem of "time-stamping" on pointer machines. Elsewhere, we also show that the side-effect problem in parallel logic programming can be solved with a constant time overhead per operation [16]. We believe that abstracting out the problems in parallel implementation of logic programs as those of building data structures to support certain operations of interest is one of the major contributions of this paper. All the results presented are novel, as these problems have never been considered before.
The only work that comes close is that of Gupta and Jayaraman [7], which establishes some results on the complexity of realizing an or-parallel implementation. The work presented here is far from complete and numerous open problems remain. One of the goals of this paper is to bring these problems to the attention of the complexity theory and data structures community. We hope this paper will initiate further research in the area of data structures for the dynamic graphs that arise in parallel implementations of declarative languages and parallel AI systems. The reader is assumed to be familiar with the general concepts and terminology of logic programming and Prolog [12].
1.1 Parallel Logic Programming
The presence of non-determinism is a peculiar feature of logic programming. This non-determinism in logic programs can be used as a source of implicit parallelism. Two major forms of parallelism can be identified [3]: And-Parallelism (AP) [9, 15], which arises from don't care non-determinism: given a goal, several subgoals can be selected for simultaneous reduction. Or-Parallelism (OP) [1, 13], which arises from don't know non-determinism: given a subgoal, the multiple clauses whose heads unify with it can be solved in parallel. A major research direction in parallel logic programming has been to design parallel implementations of Prolog [3, 9, 1, 13]. This implies that the parallel execution mechanisms have to be designed in such a way that a user sees the same externally observable behaviour during parallel execution as is observed during sequential Prolog execution. We say that such a parallel system preserves Prolog semantics. Thus, in a parallel system that preserves Prolog semantics: (i) the same solutions are produced, in the same order, as in sequential execution; (ii) the same side-effects are observed, in the same order, as in sequential execution. In this paper, we only consider parallel implementations that preserve Prolog semantics.
2 Problems in Parallel Logic Programming
Various parallel implementations of Prolog have been proposed and realized [3, 10]. Most of the current systems suffer from severe efficiency problems. Many of these inefficiencies, especially in and-parallel systems, arise from the need to guarantee Prolog semantics during parallel execution. Other inefficiencies, such as those arising in or-parallelism, are present due to the complexity entailed in implementing don't know non-determinism. Preservation of Prolog semantics, in fact, requires various operations (e.g., execution of side-effects) to be ordered correctly; this creates dependences between the concurrent executions. Only executions that respect all such dependences can be accepted. It is possible to combine and-parallelism and or-parallelism [3, 6]; however, for the sake of simplicity, we consider the two forms of parallelism independently. In the rest of this work we will also not deal with the issue of implementing the correct ordering of side-effects: this problem has been studied elsewhere [16] and shown to have a worst-case time complexity of O(1) per operation.
2.1 And-Parallelism
Given a resolvent B1, …, Bn, multiple subgoals in the resolvent can be concurrently reduced in and-parallel execution. And-parallel execution can be visualized through the and-tree. The root of the and-tree is labeled with the initial goal. If a node contains a conjunction B1, …, Bn, then it will have n children: the ith child of the node is labeled with the body of the clause used to solve Bi. The main problem in the implementation of and-parallelism is how to efficiently manage the unifiers produced by the concurrent reduction of different subgoals. Two subgoals Bi and Bj (1 ≤ i < j ≤ n) in the resolvent B1, …, Bn should agree on the bindings of all the variables that are common to them (such variables are termed dependent variables in parallel logic programming terminology). In sequential Prolog execution, usually, Bi, the goal to the left, binds the common variable and Bj works with the binding produced by Bi. In and-parallel execution, when Bj is started, the common variable will be unbound. Bj may attempt to instantiate this variable, violating Prolog semantics. The key problem is to ensure that bindings to common variables are made in the same order as in a sequential execution. This requirement is much stronger than just requiring that Prolog semantics be preserved (i.e., that the externally observable behavior during parallel execution be the same as in the sequential execution). If a parallel
system satisfies this stronger requirement, we say that it preserves strong Prolog semantics. Preserving strong Prolog semantics is important, as otherwise a considerable amount of redundant computation may be performed [14]. Note that if strong Prolog semantics is preserved, then Prolog semantics is preserved, but not vice versa. And-parallel implementations handle the requirement of preserving strong Prolog semantics by assigning producer or consumer status to each subgoal that shares a dependent variable [14]. The leftmost goal in the resolvent that has access to the dependent variable is designated as the producer subgoal for that variable; all others are consumers. A consumer subgoal is not allowed to bind the dependent variable; it is only allowed to read its binding. If a consumer subgoal attempts to unify against the unbound dependent variable, it has to suspend until the producer goal binds it. If the producer subgoal finishes execution without binding the dependent variable, then producer status is transferred to the leftmost consumer subgoal for that variable. The producer subgoal for a dependent variable, therefore, can change dynamically during execution. Thus, a major problem in an and-parallel implementation is to keep track of the leftmost subgoal that can (directly or indirectly) access each variable.
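The producer/consumer protocol just described can be sketched as follows (a minimal illustration of ours, not taken from any implementation in the paper; subgoal names and the class interface are assumptions):

```python
class DepVar:
    """A dependent variable shared by several subgoals, listed left to right."""
    def __init__(self, subgoals):
        self.waiting = list(subgoals)   # leftmost entry is the current producer
        self.value = None

    def producer(self):
        return self.waiting[0]

    def bind(self, subgoal, value):
        # Only the producer may bind; a consumer attempting to bind suspends.
        if subgoal != self.producer():
            return "suspend"
        self.value = value
        return "bound"

    def finish(self, subgoal):
        # Subgoal completed without binding: producer status passes leftward.
        self.waiting.remove(subgoal)
```

For example, with subgoals B1, B2, B3 sharing the variable, B2's attempt to bind suspends while B1 is the producer, and succeeds once B1 finishes without binding.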
2.2 Or-Parallelism
In the case of OP the parallel computation can be described as a tree, called the or-tree. Each node contains a goal obtained from the computation. The root node is labeled with the initial goal. To expand a node further, the leftmost subgoal in the goal of that node is selected and the matching clauses are found. If B1, …, Bn is the goal at the node, then for each clause Hj :- Dj1, …, Djm such that θj = mgu(Hj, B1), a child node labeled with the goal (Dj1, …, Djm, B2, …, Bn)θj is created. Note also that the child node ni corresponding to a matching clause Ci that textually precedes another matching clause Cj is placed to the left of nj, where nj is the child node corresponding to Cj. Sequential execution corresponds to building the or-tree one node at a time, in depth-first order. In the or-tree, each branch is forced to maintain a local view of the substitution computed. This requirement emerges from the fact that, during a reduction like the one mentioned above, the substitutions θj produced by unification may potentially conflict and must be kept separate in different branches of the or-tree. These conflicting bindings need to be maintained separately, as they will each lead to a different solution for the initial resolvent. The main issue, thus, in designing an implementation scheme capable of efficiently supporting OP is the development of an efficient way of associating the correct set of bindings with each branch of the or-tree. The naive approach of keeping θj for each branch is clearly highly inefficient, since it requires the creation of complete copies of the substitution (which can be arbitrarily large) every time a branching takes place [7].
3 Abstraction of the Problems
Let us consider a labeled tree T with n nodes and a root root. Without loss of generality we will focus exclusively on binary trees. Trees are manipulated
through three instructions: (i) create_tree(), which creates a tree containing only the root; (ii) expand(n, b1, b2), which, given a leaf n and two labels b1 and b2, creates two new nodes (one for each label) and adds them as children of n (b1 as left child and b2 as right child); (iii) remove(n), which, given a leaf n of the tree, removes it from the tree. These three operations are assumed to be the only ones available for modifying the "physical structure" of the tree. The tree defines a partial order on the nodes. Given two nodes n and n′, we write n ⪯ n′ if n is an ancestor of n′; n ≺ n′ additionally says that n ≠ n′. We will often refer to the notion of leftmost branch. Given a node n, the leftmost branch of the subtree rooted at n can be defined inductively: (i) if n does not have any children, then the branch containing only n is the leftmost branch; (ii) if n has children, and l is the leftmost child, then the leftmost branch is n followed by the leftmost branch of l. We can in particular define the following partial order: n ⊑ m iff n is a node in the leftmost branch of the subtree rooted at m. Given a node n, let ρ(n) indicate the node min⪯{m ∈ N | n ⊑ m}. Intuitively, ρ(n) indicates the highest node m in the tree (i.e., the one closest to the root) such that n is in the leftmost branch of the subtree rooted at m. ρ(n) is also known in the logic programming community as the subroot node of n.
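The tree interface and the subroot computation can be sketched directly (the names and the O(depth) climb are ours, for illustration; the data-structure question studied in the paper is precisely how to answer such queries faster). Since n lies on the leftmost branch of exactly those ancestors reached by left-child steps, ρ(n) is found by climbing while the current node is the left child of its parent.

```python
class Node:
    def __init__(self, label, parent=None):
        self.label, self.parent = label, parent
        self.left = self.right = None

def create_tree(label):
    return Node(label)

def expand(n, b1, b2):
    # n must be a leaf: attach b1 as left child and b2 as right child.
    n.left, n.right = Node(b1, n), Node(b2, n)
    return n.left, n.right

def remove(n):
    # n must be a leaf: detach it from its parent.
    p = n.parent
    if p is not None:
        if p.left is n:
            p.left = None
        else:
            p.right = None

def subroot(n):
    # rho(n): climb while n is a left child; the last node reached is rho(n).
    while n.parent is not None and n.parent.left is n:
        n = n.parent
    return n
```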
3.1 Abstracting Or-Parallelism The main issue in dealing with OP is the management of the variables and their bindings. The development of the OP computation takes place as described in the previous section. Variables that arise during execution, whose multiple bindings have to be correctly maintained, can be modeled as attributes of nodes. We can assume that a number m of attributes are available (i.e., there are m variables). If the computation tree has size n, then we can assume that m = O(n). At each node n three operations on attributes are possible: (i) assign a value v to an attribute α; as a consequence, every reference to α in a node m with n ⪯ m will produce v. For the sake of simplicity we can assume that, given the node n where the assignment is performed and the attribute α, we can uniquely infer the value v. Let us use the function access(n, α) to obtain the value v if assign(α, v) was performed in n, and ⊥ otherwise. (ii) alias two attributes α1 and α2; this means that every reference to α1 (α2) in a node m with n ⪯ m will produce the same result as α2 (α1). (iii) dereference an attribute α, that is, identify the value (if any) of α. This is equivalent to finding the node k = max⪯{m | access(m, α) ≠ ⊥ ∧ m ⪯ n}. Let us refer to the problem of supporting these operations as the OP problem. Later we are going to discuss the problems of aliasing and dereferencing separately.
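A naive realization of assign and dereference for the restricted OP problem without aliasing can be sketched as follows (our illustration: assignments are recorded at nodes, and dereferencing walks from n toward the root looking for the nearest assigning ancestor, which is linear in the branch length; the paper's Ω(log n) lower bound concerns how much of this overhead is unavoidable for any scheme):

```python
class N:
    """A tree node carrying its own attribute bindings."""
    def __init__(self, parent=None):
        self.parent, self.bindings = parent, {}

def assign(node, attr, value):
    node.bindings[attr] = value        # access(node, attr) = value

def dereference(node, attr):
    # Scan the ancestors n, parent(n), ... for the nearest assignment to attr.
    m = node
    while m is not None:
        if attr in m.bindings:
            return m.bindings[attr]
        m = m.parent
    return None                        # attr is unbound on this branch
```

Because each branch consults only its own ancestors, conflicting assignments in sibling branches remain separate, as the or-tree semantics requires.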
3.2 Abstracting And-Parallelism The main issue in implementing AP is the management of shared variables. Informally, each node may create a variable, try to bind a variable, or alias
two variables to each other. More precisely, we need to support the following operations on a variable v: creation: for our purposes, we can assume that creation is implicit in the expand operation. Let us denote by χ(v) the node where v is created; binding: assign a value to the variable v; aliasing: given two unbound variables v and v′, make them aliases of each other. Once two variables are aliased, they will reference the same value. A variable v will be directly accessed or used only in the subtree rooted at χ(v). Aliasing of variables, in general, does not create an immediate problem for and-parallel computation (it will not directly create violations of strong Prolog semantics, as long as the subgoal which performs the aliasing is a producer of at least one of the two aliased variables). On the other hand, binding of an aliased variable does create a problem. If two variables v and v′ are aliased, and later an attempt to bind one of them is made, then the binding should be allowed only if it satisfies the binding requirements. More precisely, let us use the notation Γ(v) to denote the set of variables that are aliased to the variable v. For an un-aliased variable we have Γ(v) = {v}. The binding condition can then be expressed as follows: a binding taking place in a leaf n of the tree can be safely performed (w.r.t. strong Prolog semantics) iff the node n lies on the leftmost branch of the subtree rooted at the creation node of each variable in Γ(v). Equivalently, ∀w ∈ Γ(v) (n ⊑ χ(w)). In the following we will denote by verify_leftmost(l, n) the operation which, given a leaf node l and a node n (guaranteed to be on the path from the root to l), verifies that l is leftmost in the subtree rooted at n. Thus, apart from the general operations on the tree (creation, expansion, and removal), we need to perform two additional (and, in our opinion, orthogonal) operations to realize and-parallelism.
Given a variable v for which a binding is attempted at a leaf n of the tree, in order to successfully bind v we need to: identify in Γ(v) the "oldest" variable w in the set (a variable v is older than another variable w if χ(v) is an ancestor of χ(w)). This is necessary in order to determine the reference node for the leftmost test and to locate a memory cell to store the binding; verify whether the node n is leftmost with respect to the node identified in the previous step (verify_leftmost(n, χ(w))). Let us refer to the first problem (identification of the "oldest" node in the set and of the memory location) as the AP1 problem, and to the second problem (determining whether a node is leftmost in the subtree rooted at a given node) as the AP2 problem. Later, we will discuss simpler versions of these problems, where one or more operations are not considered (e.g., the AP problem without the alias operation).
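The binding condition can be checked directly with an unoptimized sketch (our own; the node class, `attach` helper, and creation-node map are illustrative): verify_leftmost(l, n) climbs from l to n, confirming that every step is a left-child step, and a binding at leaf l is safe iff this holds for the creation node of every variable aliased together. AP1 is the bookkeeping of the alias sets; AP2 is the leftmost test itself.

```python
class T:
    """Minimal binary tree node with parent/left/right links."""
    def __init__(self):
        self.parent = self.left = self.right = None

def attach(parent):
    # Give a leaf two children, as the expand operation does.
    l, r = T(), T()
    l.parent = r.parent = parent
    parent.left, parent.right = l, r
    return l, r

def verify_leftmost(l, n):
    # True iff l lies on the leftmost branch of the subtree rooted at n.
    while l is not n:
        p = l.parent
        if p is None or p.left is not l:
            return False          # a non-left step: l is not leftmost under n
        l = p
    return True

def safe_to_bind(leaf, aliased, creation):
    # aliased: the alias set Gamma(v); creation[w]: the creation node chi(w).
    return all(verify_leftmost(leaf, creation[w]) for w in aliased)
```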
4 Pointer Machines and Extended Pointer Machines
A pure pointer machine consists of a finite but expandable collection R of records, called the global memory, and a finite collection of registers. Each record is uniquely identified by an address (let N be the set of all addresses). A special address nil is used to denote an invalid address. Each record is a finite
collection of named fields. All the records in the global memory have the same structure, i.e., they all contain the same number of fields, with the same names. Each field may contain either a datum (from a generic data domain Data) or an address (i.e., an element of N). If F is the finite collection of field names, then the function field_access : F × N → Data ∪ N allows one to access the value stored in a field of a record. In the rest of the paper, for the sake of simplicity, we will use the notation n(r) to denote the value returned by field_access(n, address(r)). The pointer machine is also supplied with two finite collections of registers, d1, d2, … (data registers) and r1, r2, … (pointer registers). Each register di can contain an element of Data, while each register ri can contain an element of N. The machine can execute programs. A program P is a finite, numbered sequence of instructions (where each instruction in P is uniquely identified by a number). The instructions allow moving addresses and data between registers and between registers and record fields. The only "constant" which can be explicitly assigned to a register is nil. Special instructions are used to create a new record (returning its address) and to perform conditional jumps. The only conditions allowed in the jumps are true and equality comparison between two pointer registers. Observe that the content of the data fields never affects the behaviour of the computation. In terms of complexity analysis, it is assumed that each instruction has unit cost. For further details on the structure and behaviour of the pure pointer machine the reader is referred to [11, 21, 19]. As is evident from the previous description, the pure pointer machine model is quite limited in its abilities. It provides a good basis for modeling implementations of linked data structures, like trees and lists, but it fails to capture some computational aspects of a real computer architecture.
Two major drawbacks of the pure pointer machine model are: (1) the lack of arithmetic: numbers have to be explicitly represented as linked lists of bits. This is realistic in the sense that, for arbitrarily large problems requiring arithmetic operations, the time to compute such operations will be a function of the size of the problem. Nevertheless, models which assume the uniform cost criterion (like the RAM model), i.e., in which the cost of an operation is independent of the word size, have been frequently used. (2) the lack of pointer arithmetic: the only operation allowed on pointers is dereferencing (following the pointer). Real machines allow arithmetic operations on pointers, allowing implementations of arrays and other direct-access structures. Of the two possible extensions, we consider in this framework the first one. That is, we allow data registers to contain non-negative natural numbers and we provide some additional machine instructions, which allow initializing a data field/register to 0, incrementing the content of a data field/register, and comparing (≤) two data registers. Unit cost of the operations is assumed only on numbers whose maximum size in bits is ⌈lg n⌉, where n is the size of the problem at hand.
4.1 Complexity of And-Parallelism

The Problem AP1. The Union-Find problem involves maintaining a collection of disjoint dynamic sets. Sets can be created and merged, and each set is
uniquely identified through a representative element. The following operations are allowed: (1) Make_set(x): given an element x, this operation creates a singleton set containing x; x is used as the representative element for this set. Disjointness requires that x not belong to any other set of the collection; (2) Union(x, y): given two elements x and y belonging to two disjoint sets Ax and Ay, this operation destroys the two sets and creates a new set containing the elements Ax ∪ Ay. An element of Ax ∪ Ay is selected as representative of the newly created set; (3) Find_set(x): given an element x which belongs to one set of the collection, the function returns the representative of the set containing x. The general problem we are considering is that of performing an arbitrary (correct) sequence of the above operations, containing n Make_set operations and m Union and Find_set operations. The operations in the sequence are assumed to be performed on-line, that is, each operation is not started until the immediately preceding one has been completed. Various researchers have investigated the complexity of this problem. The major result (originally [21] and then extended in [18]) is that this problem has a lower bound of Ω(n + mα(n, m)), where α is the inverse of the Ackermann function (thus a very slowly growing function). Optimal algorithms achieving this complexity have been proposed [21]. Tarjan was also the first to prove the upper bound complexity of the problem [22], which happens to be O(n + mα(n, m)). We can easily show that the AP1 problem has a complexity greater than or equal to that of union-find. Let us consider an arbitrary solution of the AP1 problem. This means that we have the means to represent variables, keep track of aliasing, and retrieve a selected representative (the oldest variable). Given a set of objects {a1, ..., an} and a sequence of union and find operations, we can implement them on top of AP1.
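The union-find operations above can be sketched with the textbook union-by-rank plus path-compression scheme, which attains the O(n + mα(n, m)) bound cited. This is only an illustrative sketch in Python; all names are our own.

```python
# Disjoint-set (union-find) structure with union by rank and path
# compression; illustrative sketch of the operations Make_set, Union,
# and Find_set discussed in the text.

class UnionFind:
    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        # x must not belong to any existing set (disjointness requirement).
        self.parent[x] = x
        self.rank[x] = 0

    def find_set(self, x):
        # Locate the root, then compress: hang every visited node off it.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
```

With both heuristics, a sequence of n make_set and m union/find_set operations runs in O(n + mα(n, m)) total time, matching the Tarjan bound quoted above.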
Each element ai is associated to a distinct variable vi (e.g., the memory location associated to the variable). Each time a union is required, the corresponding variables are aliased. A find operation for an element ai can be realized by searching for the representative variable in the alias set alias(vi). The mapping between elements ai and variables does not require more than a finite number of steps. This implies that the amortized complexity of AP1 is Ω(n + mα(n, m)), where n is the number of variables and m is the number of alias/find operations. Moreover, from the results of [2], the single-operation worst-case complexity for this problem is
Ω(lg n / lg lg n) for a large class of algorithms on pointer machines. An optimal upper bound for this problem is still unknown. In general, AP1 seems to have a complexity greater than that of union-find. This arises from the fact that when a variable v is about to be bound, and we are performing a find on alias(v), we want to obtain as result not an arbitrary element of the set but the oldest variable in alias(v). This requires some sort of time stamping, which may result in time complexity worse than that of union-find. A simple (amortized) upper bound is O((n + mα(n, m)) lg n). This is due to the fact that we can "time-stamp" each variable with its creation time and, among the aliased variables, keep the time stamp of the oldest variable. A naive scheme to maintain time stamps incurs an O(lg n) overhead on pointer machines. This scheme for time-stamping is not optimal. Using the
results in [17], where it is shown that the time-stamping can be performed with complexity O((lg lg n)^k) for some small k, we can show that the AP1 problem can be solved in time O((n + mα(n, m))(lg lg n)^k) on pointer machines.
The problem AP2. In this section we argue that the problem AP2 has an upper bound of O(n + mα(n, m)), where n is the number of nodes created and m is the number of remaining operations. The upper bound for the AP2 problem on a pure pointer machine can be derived from the results known for the union-find problem [22, 21, 18, 4]. Let us assume that we have an implementation of the union-find problem, i.e., we have a notion of disjoint set and the three operations Make_set, Find_set, and Union. We show that this can be adapted to give an efficient implementation of the tree manipulation operations required for AP2. We assume that we are working on a pointer machine. Trees are represented in the traditional way. Each node is encoded in a record, and each record is assumed to have at least three pointers available: one to identify the parent node (nil if the node is the root of the tree), and two to identify the two children. The intuitive idea is to use the union-find operations to keep track of the "left branches" in the tree. The following operations on the tree are performed (the assumption is that we are processing binary trees):
- the creation of the tree does not require any special action. A new singleton set containing the node is created through the Make_set operation.
- if we expand one node n to the right, adding a node m below it, then we have two possible cases: if the node n has a left child, then nothing else is required; if the node n does not have a left child, then a Union operation is performed between the set containing n and the one containing m.
- if we expand one node n to the left, then we perform a Union operation between the new node m added and the set containing n.
- let us consider the various cases that may occur when a leaf n is removed. Let m be the predecessor of n in the tree.
If n is the right child of m, then no special action is required upon removal of n; if n is the left child of m and m does not have a right child, then again no special action is required; if n is the left child of m and m has a right child v, then upon removal of n it is necessary to apply a Union operation between the set containing m and the one containing v.
- the operation of determining whether a node n is leftmost in the subtree rooted at a certain node m can be solved by applying the Find_set operation to both n and m and verifying whether they fall in the same set.
From the above discussion we can infer that, for AP2, given a sequence of n create operations and m operations of all other types, the problem can be solved with a time complexity of O(n + mα(n, m)). This provides an amortized
upper bound to the problem. Regarding lower bounds on pure pointer machines, clearly Ω(n + m) is an obvious lower bound. We conjecture that this is not tight. In Section 6 we propose some related conjectures, proved for special cases, which indeed suggest that the lower bound is ω(n + m). Summarizing, the complexity C_AP2(n, m) of AP2 satisfies
    Ω(m + n) ≤ C_AP2(n, m) ≤ O(n + mα(n, m)).

Our conjecture is that the problem indeed has the same complexity as union-find. This is also substantiated by the ability to show a lower bound of Ω(m + nα(n, m)) for a slightly different problem [14]. We would like to mention that, using the "time-stamping" scheme of [17], the AP2 problem can be solved with a single-operation worst-case time complexity of O(lg lg n) per operation. This is better than the O(lg n / lg lg n) scheme that would result if we used the union-find scheme from [2].
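The case analysis above can be sketched directly on top of a union-find structure. The following Python sketch mirrors the bullets (expand to the left or right, remove a leaf, leftmost test); the class and method names are ours, and the embedded union-find uses simple path halving without rank for brevity.

```python
# Sketch of the AP2 tree operations reduced to union-find over
# "left branch" groups, following the case analysis in the text.

class TreeNode:
    def __init__(self):
        self.parent = None
        self.left = None
        self.right = None

class AP2Tree:
    def __init__(self):
        self.rep = {}  # union-find parent pointers over tree nodes

    # -- minimal union-find (path halving, no rank) --
    def _make_set(self, n):
        self.rep[n] = n
    def _find(self, n):
        while self.rep[n] is not n:
            self.rep[n] = self.rep[self.rep[n]]
            n = self.rep[n]
        return n
    def _union(self, a, b):
        self.rep[self._find(b)] = self._find(a)

    def create_tree(self):
        root = TreeNode()
        self._make_set(root)
        return root

    def expand_right(self, n, m):
        m.parent, n.right = n, m
        self._make_set(m)
        if n.left is None:
            self._union(n, m)   # m is currently leftmost below n

    def expand_left(self, n, m):
        m.parent, n.left = n, m
        self._make_set(m)
        self._union(n, m)       # a left child joins n's left-branch set

    def remove_leaf(self, n):
        m = n.parent
        if m is None:
            return
        if m.left is n:
            m.left = None
            if m.right is not None:
                self._union(m, m.right)  # right child becomes leftmost
        else:
            m.right = None

    def is_leftmost(self, n, m):
        # n is leftmost in the subtree rooted at m iff they share a set.
        return self._find(n) == self._find(m)
```

With a rank heuristic added to the embedded union-find, each operation costs amortized O(α(n, m)), which is the source of the O(n + mα(n, m)) bound stated above.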
Complexity on Extended Pointer Machines. The extended pointer machines add a fundamental component to the design: the ability to perform basic increment and comparison operations on natural numbers. This feature is sufficient to modify some of the lower bounds previously discussed. In particular, the presence of counters and the ability to compare them in constant time allows us to solve the AP2 problem in time O(1) per operation. It is possible to show that keeping track of the leftmost-branch information for each leaf of the tree can be realized with complexity O(1) per operation [16]. This scheme can be modified on an extended pointer machine to solve the AP2 problem. Let us assume that each record in the memory of the pointer machine contains a data field capable of maintaining a counter, called age. During the create_tree operation, the root of the tree is created and the value 0 is stored in the age field of that record. Every time a node n is expanded, the two new nodes created (n1 and n2) receive in their age field the value obtained by incrementing the age of n by one. The remove operation leaves the age fields of the various nodes unaffected. We can assume that each variable v is itself represented by a record, which contains a pointer to the node node(v) where v was created. For each group of aliased variables we also maintain the oldest creation node max(v) = min{ node(w) | w ∈ alias(v) }, where nodes are compared by age. When the binding for a variable v is attempted, we need to verify whether the current computation point (represented by the leaf n in the tree) is leftmost in the subtree rooted at max(v). At this point, the verify_leftmost operation can be easily realized. The intuitive idea is that the counter can be used to compare the depths of nodes. Each node carries a field pointing to the highest node in the tree on whose leftmost branch it lies: if this node is below the expected node, then the test fails; otherwise it succeeds.
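The age-counter bookkeeping just described can be sketched as follows. This is only an illustration of how the counters propagate and how the oldest creation node of an alias group is maintained with a single counter comparison per merge; the names (AgeNode, Var, alias) are ours, and the naive group merge below iterates over the whole group, whereas the paper's scheme would perform it inside the union-find structure.

```python
# Sketch of age counters on an extended pointer machine: depths are
# stored as counters, and each alias group keeps its minimum-age node.

class AgeNode:
    def __init__(self, parent=None):
        # The root stores age 0; every expansion stores parent.age + 1.
        self.age = 0 if parent is None else parent.age + 1
        self.parent = parent

class Var:
    def __init__(self, node):
        self.node = node        # node(v): where the variable was created
        self.group = {self}     # alias(v): its current alias group
        self.oldest = node      # minimum-age creation node in the group

def alias(v, w):
    # Merge the two alias groups; one counter comparison (unit cost on
    # the extended model) keeps the oldest creation node up to date.
    merged = v.group | w.group
    oldest = v.oldest if v.oldest.age <= w.oldest.age else w.oldest
    for u in merged:            # illustration only: not the real bound
        u.group, u.oldest = merged, oldest
```

The point of the sketch is the single age comparison: on the extended model it is a unit-cost operation, which is exactly what the pure pointer machine lacks.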
On the other hand, it is also quite easy to see that, as a consequence of the uniform cost assumption on the management of counters (addition and comparison), we can propose an upper bound for AP1. It is relatively straightforward to add a simple comparison to the implementation of the union operation in Tarjan's optimal algorithm [21] to force the root of the tree to always contain the older
variable. This allows direct use of the union-find algorithm to support the AP1 problem, resulting in a complexity of O(n + mα(n, m)). The two results combined reflect exactly the "practical" results achieved in [14], where a "constant-time" practical implementation scheme for handling the AP problem without aliasing is described.
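The modification just mentioned can be sketched as follows: the union-by-rank structure is kept untouched, and a single timestamp comparison per union keeps track of the oldest variable of each set at its root. The class name and the per-root `oldest` slot are our own devices; on a pure pointer machine the timestamp comparison would itself cost extra, which is the point made above.

```python
# Union-find augmented so each set's representative records its oldest
# variable, via one timestamp comparison per union.

class OldestUnionFind:
    def __init__(self):
        self.parent, self.rank = {}, {}
        self.stamp, self.oldest = {}, {}
        self.clock = 0

    def make_set(self, x):
        self.parent[x], self.rank[x] = x, 0
        self.stamp[x] = self.clock      # creation time of x
        self.oldest[x] = x
        self.clock += 1

    def find_set(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find_set(self.parent[x])  # compression
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        # The extra comparison: the root keeps the older variable.
        if self.stamp[self.oldest[ry]] < self.stamp[self.oldest[rx]]:
            self.oldest[rx] = self.oldest[ry]

    def oldest_in(self, x):
        return self.oldest[self.find_set(x)]
```

Since the comparison adds O(1) per union on the extended model, the O(n + mα(n, m)) bound of the underlying algorithm is preserved.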
5 Complexity of Or-Parallelism

The problem of or-parallelism is considerably more complex than those we have explored earlier. As mentioned earlier, the only previous work that deals with the complexity of the mechanisms required in parallel logic programming was in the context of or-parallelism [7]. The results of that work show that a generic OP problem with n attributes and m operations per branch has a lower bound which is strictly worse than Θ(n + m). It is quite easy to observe that along each branch of the tree, attributes are managed in a way very similar to the way in which elements are managed in the union-find problem. Initially all attributes are distinct; given an attribute α, let us denote by A(α) the set of all the attributes that are currently (on the branch at hand) aliased to α. If a new aliasing between α1 and α2 takes place, then the sets are modified as A(α1) = A(α1) ∪ A(α2). If we assume that the value of an attribute α is always associated to the representative of A(α), then any dereference operation is based on a find operation to detect the representative. The observation above can be used to provide a new proof of the non-constant-time nature of the management of or-parallel computations.
Theorem 1. The amortized time complexity of OP is Ω(n + mα(n, m)), where n is the number of attributes and m is the number of tree operations.
Proof. (Sketch) The proof is based on the idea that OP subsumes the union-find problem. Let us consider a generic union-find problem with n elements and m union-find operations. From the results in [21, 18] we know that this problem has a lower bound of Ω(n + mα(n, m)). Let us show how we can use the mechanisms of OP to implement a solution to the union-find problem. Each of the n elements is encoded as one of the attributes. Each time an operation is performed, a new node is added along the left spine of the tree (we can ignore in this context the nodes that will be added to the right during the expansion). Each union operation is translated into an aliasing between the two attributes. Each find operation corresponds to the dereferencing of the corresponding attribute. Clearly, each operation in the union-find problem can be encoded using the OP support and a constant number of additional operations. Thus, no solution to OP can be better than the lower bound for the union-find problem. □
This result relies mainly on the execution of aliasing operations. Nevertheless, it is possible to show that the result holds even when we disallow aliasing during OP execution. Thus, the inability to achieve constant-time behaviour is inherent in the management of the attributes in the tree.
Theorem 2. On pointer machines, the worst-case time complexity of OP is Ω(lg n) per operation, even without aliasing.
The basic idea of the proof is that, since there is no direct addressing in pointer machines, starting from a particular node only a "small" number of nodes can be accessed in a small number of steps. Without loss of generality, assume that each record has at most two pointers. Assume that we have a pointer machine program that performs the find operation in worst-case c(n) time for some function c. The find procedure receives only two pointers as arguments: one to the record that contains the name of the attribute whose value is being searched, and the other to the leaf node for which this attribute value is being searched. Starting from either of these records, we can access at most 2^{c(n)+1} different records within c(n) steps. We use this fact to prove that c(n) = Ω(lg n). Suppose c(n) is not Ω(lg n). Then for large enough n, c(n) < (1/8) lg n. Consider the tree with n nodes that has the following structure. It has depth t1(n) + t2(n). The first t1(n) levels constitute a complete binary tree (let us indicate with Γ1 the first t1(n) levels of the tree); the remaining t2(n) levels consist of 2^{t1(n)} lists of nodes, each of length t2(n) and each attached to a different leaf of Γ1. Clearly, we must have 2^{t1(n)+1} − 1 + t2(n) · 2^{t1(n)} = n, i.e., t2(n) = (n + 1)/2^{t1(n)} − 2. Let us also assume that we have t2(n) different attributes, α1, ..., α_{t2(n)}. All the nodes at level t1(n) + i of the tree perform a bind operation on αi, and each node assigns a different value to αi. Suppose we choose t1(n) and t2(n) such that

    2^{c(n)+1} · t2(n) < 2^{t1(n)}     (1)

Then there exists a leaf node l in the tree such that none of the last t2(n) nodes on the path from the root to l is accessed starting from the pointer to the attribute node in any of the computations find(αi, l), i = 1, ..., t2(n). This implies that for any of the t2(n) different calls find(αi, l), the appropriate node on the path must be accessed starting from the leaf node.
Since the find procedure works in time c(n), we must have 2^{c(n)+1} ≥ t2(n). This gives c(n) ≥ lg t2(n) − 1. Since c(n) < (1/8) lg n, 2^{c(n)+1} < 2n^{1/8}. Let us choose t1(n) = (3/4) lg n. Then 2^{t1(n)} = n^{3/4} and t2(n) = (n + 1)/n^{3/4} − 2 ≈ n^{1/4}. These t1(n), t2(n) satisfy inequality (1) above, as 2^{c(n)+1} · t2(n) ≤ 2n^{3/8} < n^{3/4}. Hence we can say that c(n) > lg t2(n) − 1, i.e., c(n) > (1/4) lg n − 1, which contradicts our assumption that c(n) < (1/8) lg n. Hence c(n) = Ω(lg n). □
Upper Bound for OP. Related research on the complexity of the OP problem has been limited to showing that a constant time cost per operation cannot be achieved by any implementation scheme. No attempt has been made, to the best of our knowledge, to supply a tight upper bound for this problem. Most of the implementation schemes proposed in the literature can be shown to have a worst-case complexity of O(n) per operation. Currently, the best result we have been able to achieve is the following:
Theorem 3. The OP problem with no aliasing can be solved on a pointer machine with a single-operation worst-case time complexity of O(n^{1/3}(lg n)^k) for some small constant k.
Proof. (Sketch) The OP problem can be cast in the following simplified version. Each node in the tree is assigned a label. The same label can be used in various nodes of the tree, but no label can occur more than once on each branch of the tree. The only two operations allowed are insert(n, λ1, λ2), which creates two new nodes under node n (required to be a leaf) and assigns the labels λ1 and λ2 respectively to the newly created nodes, and label(n, λ), which retrieves (if any) the node with label λ present on the path between n and the root of the tree. Observe that we are simplifying the problem by disallowing deletions of nodes and restricting attention to binary trees. To obtain the worst-case time complexity Õ(n^{1/3}) per operation, we maintain the collection of nodes assigned the same label in tables of size n^{1/3}, and record in the tree the nodes where one table gets filled (fill nodes). Furthermore, every n^{1/3} fill nodes a table is created summarizing the effect of the previous n^{1/3} fill nodes. A label operation is performed by first scanning the fill nodes up to the last created table, and then visiting table by table. Either an entry is found in one of the initial nodes or one of the tables (which will directly point to the group containing the node of interest), or the node of interest is in the first table for that label (i.e., the first table has not been filled yet), or the label has never been used along the considered branch. Testing each entry in a table requires checking for the nearest common ancestor [8, 5, 23]. □
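For reference, the simplified problem above admits the naive baseline mentioned earlier in this section: insert is trivial, and label walks the branch back to the root, costing O(n) per operation in the worst case. The table construction in the proof improves on exactly this scan. The following sketch (names ours) shows only the naive baseline.

```python
# Naive baseline for the simplified OP problem: labelled binary-tree
# insertion plus label lookup by walking the branch to the root.

class OPNode:
    def __init__(self, label, parent=None):
        self.label, self.parent = label, parent

def insert(n, label1, label2):
    # n is required to be a leaf; returns the two new labelled children.
    return OPNode(label1, n), OPNode(label2, n)

def label(n, lab):
    # Walk from n to the root; a label occurs at most once per branch,
    # so the first match (the deepest occurrence) is the answer.
    while n is not None:
        if n.label == lab:
            return n
        n = n.parent
    return None
```

A branch of depth d makes label cost O(d), i.e., O(n) in the worst case; the fill-node tables replace the linear scan with O(n^{1/3}) table probes, each resolved via a nearest-common-ancestor test.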
6 Open Problems

Various problems are still open and are currently being researched. First of all, for various problems we have proposed lower and upper bounds which are rather distant; these gaps need to be closed. The gap is particularly evident for the OP problem (even without aliasing), where the single-operation worst-case lower bound is Ω(lg n), while the upper bound is Õ(n^{1/3}) per operation in the worst case. We still lack a lower bound for the AP2 problem. The problem, as illustrated in [17], seems to have very strong ties with the time-stamping problem mentioned earlier. Let us consider the following problem. Suppose we use a pointer machine to maintain an ordered sequence of elements. The only two operations allowed are insert_element, which adds a new element to the sequence, and precedes, which tests whether one element was inserted before another. Let us consider the problem of performing a sequence of n insertions followed by a precedes test between two arbitrary elements of the sequence (we will refer to this as the IP problem). Our conjecture is the following:
Conjecture 4. On a pure pointer machine, an arbitrary (correct) sequence of n operations belonging to the IP problem has a complexity strictly worse than Θ(n); in particular, there is no solution where both insertion and test use a constant number of steps per operation.

In [17] we show that the IP problem has a single-operation worst-case upper bound of O(poly(lg lg n)). We can prove the conjecture for a special class of algorithms, but this is not sufficient to establish it in general. We are confident that the result holds and can be proved. The relevance of this conjecture can be seen from the following:

Lemma 5. The problem AP2 has a single-operation worst-case lower bound of ω(1) iff Conjecture 4 is true.

Proof. If we can solve IP in constant time (i.e., Conjecture 4 is false), then AP2 can also be solved with single-operation worst-case time complexity O(1); this follows from the scheme described in Section 4.1. Conversely, let us assume by contradiction that we are capable of implementing all the operations required by the AP2 problem in constant time. These operations can be used to provide a solution to the IP problem. In fact, the operation of inserting a new element can be translated directly into an expand operation applied to the leftmost leaf of the tree; the first of the two nodes created by the expand represents the element inserted by the IP operation, while the second node inserted is a dummy node. The precedes operation can be translated directly into an is_leftmost operation. In fact, given two nodes n1 and n2 which have been previously inserted, we know by construction that both lie on the leftmost branch of the tree. Thus, n1 is leftmost in the subtree rooted at n2 iff n2 was inserted before n1. □
7 Conclusions
In this paper we studied the complexity of several operations associated with the parallel execution of Prolog programs. The following results were achieved. On both the pure and extended pointer machines, execution of and-parallelism requires non-constant execution time, due to the need to manage aliased variables. If aliasing is disallowed, constant-time behaviour is achievable on extended pointer machines. An upper bound for the problem was found: we showed that managing and-parallelism has an upper bound equivalent to that of the union-find problem. On both models, execution of or-parallelism was shown to have a non-constant-time lower bound. In contrast to and-parallelism, the non-constant time holds even when we consider extended pointer machines and disallow aliasing of variables. Some problems are still open, the main one being to show that the problem AP2 has a non-constant-time lower bound on pure pointer machines. This result currently rests on the conjecture about the IP problem. This conjecture has been proved to hold in special cases, but a general proof is still missing. Another open problem of interest is the development of an upper bound for OP in the case of no aliasing between variables. This problem has been shown to have a non-constant-time lower bound, but an optimal upper bound is still a subject of research.
References
1. K.A.M. Ali and R. Karlsson. Full Prolog and Scheduling Or-parallelism in Muse. International Journal of Parallel Programming, 19(6):445-475, 1991.
2. N. Blum. On the single-operation worst-case time complexity of the disjoint set union problem. SIAM Journal on Computing, 15(4), 1986.
3. M. Carlsson, G. Gupta, K.M. Ali, and M.V. Hermenegildo. Parallel execution of Prolog programs: a survey. Journal of Logic Programming, 1998. To appear.
4. M. Fredman and M. Saks. The Cell Probe Complexity of Dynamic Data Structures. In Proceedings of the 21st ACM Symposium on Theory of Computing. ACM, 1989.
5. H.N. Gabow. Data structures for weighted matching and nearest common ancestors. In ACM Symposium on Discrete Algorithms, 1990.
6. G. Gupta and V. Santos Costa. Cuts and Side-effects in And/Or Parallel Prolog. Journal of Logic Programming, 27(1):45-71, 1996.
7. G. Gupta and B. Jayaraman. Analysis of or-parallel execution models. ACM TOPLAS, 15(4):659-680, 1993.
8. D. Harel and R.E. Tarjan. Fast Algorithms for Finding Nearest Common Ancestors. SIAM Journal on Computing, 13(2), 1984.
9. M. Hermenegildo and K. Greene. &-Prolog and its Performance. In International Conference on Logic Programming. MIT Press, 1990.
10. J.C. Kergommeaux and P. Codognet. Parallel logic programming systems. ACM Computing Surveys, 26(3), 1994.
11. D.E. Knuth. The Art of Computer Programming, volume 1. Addison-Wesley, 1968.
12. J.W. Lloyd. Foundations of Logic Programming. Springer-Verlag, 1987.
13. E. Lusk et al. The Aurora Or-parallel Prolog System. New Generation Computing, 7(2,3), 1990.
14. E. Pontelli and G. Gupta. Implementation mechanisms for dependent and-parallelism. In International Conference on Logic Programming. MIT Press, 1997.
15. E. Pontelli, G. Gupta, and M. Hermenegildo. &ACE: A High-performance Parallel Prolog System. In IPPS 95. IEEE Computer Society, 1995.
16. E. Pontelli, D. Ranjan, and G. Gupta. On the Complexity of Parallelism in Logic Programming. Technical report, New Mexico State University, 1997.
17. E. Pontelli, D. Ranjan, and G. Gupta. On the complexity of the insertion/precedes problem. Technical report, New Mexico State University, 1997.
18. H. La Poutré. Lower Bounds for the Union-Find and the Split-Find Problem on Pointer Machines. Journal of Computer and System Sciences, 52:87-99, 1996.
19. A. Schönhage. Storage Modification Machines. SIAM Journal on Computing, 9(3):490-508, August 1980.
20. D.D. Sleator and R.E. Tarjan. A data structure for dynamic trees. Journal of Computer and System Sciences, 26, 1983.
21. R.E. Tarjan. A Class of Algorithms which Require Nonlinear Time to Maintain Disjoint Sets. Journal of Computer and System Sciences, 18(2), 1979.
22. R.E. Tarjan. Data Structures and Network Algorithms. CBMS-NSF, SIAM, 1983.
23. A.K. Tsakalidis. The Nearest Common Ancestor in a Dynamic Tree. Acta Informatica, 25, 1988.
An Abductive Semantics for Disjunctive Logic Programs and its Proof Procedure Jia-Huai You, Li Yan Yuan, Randy Goebel
Department of Computing Science University of Alberta Edmonton, Alberta, Canada T6G 2H1 fyou, yuan, [email protected]
Abstract. While it is well-known how normal logic programs may be
viewed as a form of abduction and argumentation, the problem of how disjunctive programs may be used for abductive reasoning is rarely discussed. In this paper we propose an abductive semantics for disjunctive logic programs with default negation, and show that Eshghi and Kowalski's abductive proof procedure for normal programs can be adapted to compute abductive solutions for disjunctive programs.
1 Introduction

In its simplest form abduction is the problem: from A and A ← B, infer B as a possible explanation of A. Nonmonotonic reasoning has been explored as a form of abductive reasoning. In particular, default assumptions in logic programs have been treated as abductive hypotheses, and a number of reasoning mechanisms and semantics have been proposed [7, 10, 16, 18, 19]. Chief among these is Eshghi and Kowalski's formulation of an elegant abductive proof procedure for normal programs, where default assumptions are viewed as abducibles. Kakas et al. presented a comprehensive exploration of abductive logic programming [16, 17]. A fundamental insight is that abductive reasoning embodies an argumentation approach to logic program semantics. Dung [8], as well as Bondarenko et al. [2], subsequently showed that nonmonotonic reasoning in general is a form of argumentation using default assumptions. There are important applications of abductive reasoning with disjunctive programs. For example, in AI planning and scheduling, in general we are interested in whether there is a plan that achieves the goal, and whether there is a schedule that satisfies the specified constraints. Such a solution corresponds to an explanation (the abducibles in a plan, for example) of an observation (the goal to be
achieved) in abduction. Furthermore, since abduction is a form of hypothetical reasoning, it embodies a form of learning and prediction; e.g., one is often able to abductively complete partial descriptions given as observations. For example, suppose a robot has two hands, the left being designed to be capable of picking up a block and placing it down in the same orientation, and the right rotating it. Now suppose an observation is made that the block's orientation has changed and that either the left or the right hand is broken.1 An explanation of this observation must include a prediction that it is the left hand that is broken. We note that the static semantics for disjunctive programs [24] is not designed to be capable of accommodating this kind of application. Despite well-understood results relating normal programs with abduction and, in more general cases, relating more general inference systems with abduction and argumentation [2], the problem of how disjunctive programs may be viewed as abduction is still open. For example, Dung showed in [5] that acyclic disjunctive programs can be interpreted as abductive programs in the sense of Eshghi and Kowalski, but he left open the question of what constitutes an abductive semantics for the class of all disjunctive programs. The main problem is that the method used in Dung's approach, which is based on a program transformation called shifting, does not even preserve the minimal model semantics for positive disjunctive programs. The problem has been noticed by a number of authors in dealing with disjunctive defaults (e.g., [12, 26]). In this paper we show a simple extension of the various abductive semantics for normal programs to disjunctive programs. The central idea is to augment the standard first-order deduction relation with an additional inference rule, called the rule of assumption commitment, which resolves an assumption and a classic disjunction. Its ground form is:

    not β    α ∨ β
    --------------
          α

This inference rule is similar to resolution (or to disjunction elimination in natural deduction). That is, if one identifies not with ¬, it becomes the resolution inference rule. Intuitively, it means that if we assume not β we can strengthen our assumption in an attempt to derive additional information. For example, consider the following disjunctive program:

    pay_cash(x, y) ∨ pay_by_credit(x, y)

1
Here the term observation is used in a broad sense, not necessarily an act of physically observing something.
where ∨ is interpreted as epistemic disjunction (or: as exclusive as possible). The observation that bob pays cash to greg (pay_cash(bob, greg)) is explained by the assumption that bob does not pay greg by credit, not pay_by_credit(bob, greg), which resolves with the above disjunction to yield pay_cash(bob, greg). The approach taken here falls into the general frameworks of abduction and argumentation [2, 8, 17] in that a specific inference system (using the above inference rule) is adopted for the purpose of capturing the meaning of the epistemic disjunction initially formulated by Gelfond and Lifschitz. The most interesting result of such an adoption is a new semantics that extends the stable semantics for disjunctive programs in the same way as the regular/preferential semantics extends the stable semantics for normal programs. Furthermore, with partial evaluation as pre-processing [3], Eshghi and Kowalski's abductive proof procedure, with a mechanism of positive loop checking, can be used to compute abductive solutions for a disjunctive program. For example, if an AI planning problem is formulated as a disjunctive program, one can use the Eshghi-Kowalski abductive proof procedure to answer the question of whether a goal is achievable. The procedure computes the abducibles, in backward-chaining fashion, that are needed in order to achieve the goal. The procedure is sound but not complete with respect to the new semantics.2 This seems to be the first backward-chaining proof procedure for any semantics for disjunctive programs. The stable semantics in principle does not possess a top-down proof procedure, for the simple reason that it does not have the locality property; i.e., a backward-chaining proof may be wrong simply due to the nonexistence of a stable model. Almost all current implementations of the static semantics rely on a bottom-up generation of minimal models. Logic programming has been identified with a goal-oriented programming paradigm.
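The assumption-commitment step in the pay_cash example can be illustrated with a tiny propositional sketch: an assumed-false atom resolves against a disjunction that contains it, leaving the strengthened disjunction. The encoding (atoms as strings, clauses as frozensets) is our own.

```python
# Propositional sketch of the assumption-commitment rule:
# from `not b` and a disjunction containing b, derive the rest.

def commit(assumption, clause):
    """assumption: an atom b assumed false (i.e., `not b`);
    clause: a frozenset of atoms read as a disjunction.
    Returns the strengthened disjunction (or the clause unchanged)."""
    if assumption in clause:
        return clause - {assumption}
    return clause

# The ground program clause pay_cash(bob,greg) v pay_by_credit(bob,greg):
clause = frozenset({"pay_cash(bob,greg)", "pay_by_credit(bob,greg)"})
# Assuming not pay_by_credit(bob,greg) commits us to pay_cash(bob,greg):
derived = commit("pay_by_credit(bob,greg)", clause)
```

Identifying `not b` with classical negation makes this exactly one binary resolution step, which is the observation made in the text.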
An implementation of a nonmonotonic semantics could benefit from the techniques used in implementing Prolog [4]. Thus, the existence of a top-down proof procedure is a significant feature for any semantics. It is important to mention three remarkable facts about Eshghi and Kowalski's procedure. First, it is conceptually simple: all it needs, on top of a traditional negation-as-failure proof procedure, is a mechanism to handle two kinds of (negative dependency) loops. Secondly, it is now known that this procedure is sound, and complete as well with a mechanism of positive loop checking, for a number of independently proposed but equivalent semantics for normal programs (cf. [7, 13, 20, 30]). This includes the preferential semantics based on abduction [7], the regular semantics [29] and the partial stable semantics [25] using model-theoretic approaches, and a stronger version of the stable class

2
The procedure becomes complete when properly combined with a complete resolution procedure.
semantics [1]. Perhaps more importantly, this procedure can be implemented efficiently, and this efficiency could be comparable with that of a completely top-down proof procedure for the well-founded semantics.3 An interesting conclusion is that the efficiency of credulous reasoning (w.r.t. one model) based on a weaker notion of stable models, i.e., regular models, is largely comparable with the efficiency of reasoning using the well-founded semantics, when query answering is based on a backward chaining proof. The next section sets the stage for disjunctive programs to be interpreted as abductive programs. Section 3 shows, using Dung's preferred extensions as an example, that any abductive semantics for normal programs can be simply extended to disjunctive programs. This specifically yields a new abductive semantics for disjunctive programs. Section 4 gives an elegant fixpoint characterization of the minimal model semantics, the stable semantics, and the new semantics in terms of the extended inference relation. Section 5 shows a result which enables the use of the Eshghi-Kowalski abductive proof procedure for the proposed semantics, with Section 6 containing related work and additional remarks.
2 Disjunctive Logic Programming as Abduction

In general, an abductive framework is a triple ⟨P, Ab, IC⟩ where P is a first order theory, Ab is the set of abducibles, and IC is a set of first order formulas serving as integrity constraints. We restrict ourselves to a special class of abductive frameworks which correspond to disjunctive programs. For each ordinary predicate atom α = p(t1, ..., tn), not α denotes the corresponding abducible atom not_p(t1, ..., tn). We let P be a first order theory that consists of clauses of the form

α1 ∨ ... ∨ αk ← β1, ..., βm, not γ1, ..., not γn

where the αi's and βi's are atoms, and the not γi's are abducibles (also called assumptions). We may denote the above clause by

A ← B, not C

where A is the set of atoms in the head of the clause, B the set of atoms in the body, and not C the set of assumptions in the body.
3 This is not a claim about the complexity of the semantics, which is known to be intractable for the former and tractable for the latter. This claim merely reflects the fact that, as far as backward chaining proof is concerned, the behaviors of these two semantics are quite similar: odd (negative dependency) loops are treated exactly the same in both semantics, and for any even loop, a procedure for the well-founded semantics (tentatively) assigns the value undefined based on the current loop, while a procedure for the regular semantics simply indicates "in a model". In either case, no further derivation beyond the loop is performed.
In the literature, disjunction is usually expressed by |, which is sometimes called epistemic disjunction. The intuitive meaning of epistemic disjunction is that it be interpreted as exclusively as possible. Here we replace it by the classic disjunction ∨; its intended meaning is enforced by the underlying abductive semantics. Note that since such a clause is just a first order formula, the first order derivation relation ⊢ is applicable. For example, with P consisting of

a ∨ b ← not c
a ← b
b ← a

we have P ⊢ a ← not c, P ∪ {not c} ⊢ a, etc. In Dung's terminology, not c is evidence for a. Further, we let

IC = {⊥ ← (α1 ∨ ... ∨ αn), not α1, ..., not αn | n ≥ 1}

When n = 1, the constraints in IC become those for normal programs. The special atom ⊥ denotes a violation of a constraint. Although it embodies a meaning of inconsistency, it is not the same as logical inconsistency; in particular, its presence in a theory does not trivialize the theory by allowing anything to be concluded. We argue that this is an advantage of logic programming, as a notion of "local inconsistency" may be accommodated. For the purpose of abductive reasoning with disjunctive programs, we augment the standard first order derivation relation by a new, resolution-like inference rule:
Rule of Assumption Commitment (RAC):

    not α        β1 ∨ ... ∨ βn
    -------------------------------------
    (β1 ∨ ... ∨ βi−1 ∨ βi+1 ∨ ... ∨ βn)θ

where α is an atom, the βj's are literals, and α and βi are unifiable with m.g.u. θ. When n = 1, the resolvent is ⊥. This inference rule says that if one assumes not α, one may as well commit α to false in an attempt to derive additional information. In the sequel, we use ⊢d to denote the standard first order derivation relation augmented by the above rule of inference. Note that ⊢d is a monotonic relation. We say that a set E of abducibles is an explanation of an atom α if

P ∪ E ⊢d α   and   P ∪ E ∪ IC ⊬d ⊥
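In the ground (propositional) case, where unification is trivial, a single RAC step can be sketched directly. The set-based representation of disjunctions below is our own illustration, not the paper's:

```python
# Propositional sketch of one RAC step. A disjunction is a frozenset of
# atoms; the empty disjunction plays the role of the resolvent ⊥ (falsum).

FALSUM = frozenset()

def rac(atom, disjunction):
    """Resolve the assumption 'not atom' against a disjunction containing
    that atom, committing the atom to false and dropping it."""
    if atom not in disjunction:
        raise ValueError("the assumption does not resolve with this disjunction")
    return disjunction - {atom}

# Committing 'not c' against a ∨ b ∨ c leaves a ∨ b:
print(sorted(rac("c", frozenset({"a", "b", "c"}))))  # ['a', 'b']
# The n = 1 case: committing 'not a' against the unit disjunction a gives ⊥:
print(rac("a", frozenset({"a"})) == FALSUM)          # True
```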
Example 1. Consider the situation where a block x is put down onto the table or onto another block y.

on_table(x) ∨ on_block(x, y) ← putdown(x)
putdown(a)

Suppose we see that a is on block b. Then E = {not on_table(a)} forms an explanation. An interesting aspect of abductive reasoning is prediction. As a specific form of prediction, it is about completing, or enriching, the initially specified, incomplete information.

Example 2. Consider the popular broken-hand example originally discussed in
the context of default reasoning [23]: we know either the left hand is broken or the right hand is broken, and in general, a hand is usable if not broken. The given information is incomplete, as we don't know which hand is broken and which is not (perhaps both could have been broken). For the purpose of demonstrating the point of augmenting partial information, we further assume that the left hand being usable leads to its use, which results in moving a block; and the use of the right hand leads to moving the table.

lh_broken ∨ rh_broken
lh_usable ← not lh_broken
rh_usable ← not rh_broken
move_block ← lh_usable
move_table ← rh_usable

Now suppose we observe that the block is moved from its original location (and suppose we cannot see any operations). Under the closed world assumption for operations (namely, no operations other than the ones performed by the program may be performed), we can predict that it is the right hand that is broken.
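The prediction in Example 2 can be reproduced by brute force over assumption sets. The sketch below is ours, not the paper's procedure: derivability under ⊢d is only approximated by forward chaining in which a disjunctive head commits to one disjunct when all other disjuncts are assumed false (a RAC-style step); full classical case analysis is not modelled. An assumption set is represented by the atoms assumed false.

```python
from itertools import chain, combinations

# Clauses as (head_atoms, positive_body, negated_body): the broken-hand program.
PROGRAM = [
    ({"lh_broken", "rh_broken"}, set(), set()),  # lh_broken ∨ rh_broken
    ({"lh_usable"}, set(), {"lh_broken"}),       # lh_usable  ← not lh_broken
    ({"rh_usable"}, set(), {"rh_broken"}),       # rh_usable  ← not rh_broken
    ({"move_block"}, {"lh_usable"}, set()),      # move_block ← lh_usable
    ({"move_table"}, {"rh_usable"}, set()),      # move_table ← rh_usable
]

def derivable(program, assumed_false):
    """Forward chaining; a disjunctive head yields an atom only when every
    other disjunct is in the assumption set (RAC-style commitment)."""
    derived, changed = set(), True
    while changed:
        changed = False
        for head, pos, neg in program:
            if pos <= derived and neg <= assumed_false:
                for atom in head:
                    if head - {atom} <= assumed_false and atom not in derived:
                        derived.add(atom)
                        changed = True
    return derived

def explanations(program, abducibles, observation):
    """Consistent assumption sets E such that P ∪ E derives the observation."""
    result = []
    for subset in chain.from_iterable(
            combinations(abducibles, r) for r in range(len(abducibles) + 1)):
        E = set(subset)
        atoms = derivable(program, E)
        if observation in atoms and not (E & atoms):  # no 'not a' with a derived
            result.append(E)
    return result

print(explanations(PROGRAM, ["lh_broken", "rh_broken"], "move_block"))
# [{'lh_broken'}] : assuming 'not lh_broken' explains the moved block,
# and RAC then commits the disjunction to rh_broken.
```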
3 Abductive Semantics for Disjunctive Programs

In this section we demonstrate that an abductive semantics for normal programs can be simply extended to disjunctive programs by augmenting only the underlying derivation relation with RAC. We do this for Dung's semantics based on preferred extensions, as there is additional interest in a fixpoint characterization and in the adoption of Eshghi and Kowalski's proof procedure.
In the sequel, for the sake of convenience, our technical exploration will be based on ground programs. Also, since given an abductive framework ⟨P, Ab, IC⟩ both Ab and IC are fixed, we only mention P. In the following definition, we paraphrase Dung's definition of a preferred extension; by adopting ⊢d, a new semantics for disjunctive programs is obtained automatically. First, we need some terminology: an assumption set S is called ⊥-consistent if P ∪ S ⊬d ⊥.
Definition 1. Let P be a disjunctive program. An assumption not α is said to be acceptable w.r.t. an assumption set S if for any assumption set S′ such that P ∪ S′ ⊢d α, we have P ∪ S ⊢d β for some not β ∈ S′. A preferred extension E is a maximal assumption set that is ⊥-consistent and such that every not α ∈ E is acceptable w.r.t. E.

In terms of argumentation (see, for example, [2, 8, 16, 28]), not α is acceptable w.r.t. S if any assumption set S′ that attacks not α is counter-attacked by S itself. Note that the definition is precisely Dung's if we use ⊢ instead of ⊢d. In fact, for normal programs, ⊢d coincides with ⊢, since RAC has no real effect. Because of this, the properties of preferred extensions for normal programs can be easily extended to disjunctive programs. For example, it is easy to show that there is at least one preferred extension for any disjunctive program. Dung uses the Barber's paradox to illustrate the drawback of the stable semantics in the case of normal programs [7]. To illustrate the same point for disjunctive programs, we extend the example to a disjunctive program.

Example 3. Consider the disjunctive program:
shave(bob, x) ← not shave(x, x)
pay_cash(y, x) ∨ pay_by_credit(y, x) ← shave(x, y)
accepted(x, y) ← pay_cash(x, y)
accepted(x, y) ← pay_by_credit(x, y)

This program intuitively says that bob shaves those who do not shave themselves; if x shaves y then y pays x by cash or credit; either way is accepted. Assume there is another person, called greg. Then clearly we should conclude that bob shaves greg, and that greg pays bob by cash or by credit, either way being accepted. This program has no stable model, but it has two preferred extensions, both containing not shave(greg, greg). In addition, one of the two contains not pay_cash(greg, bob) and the other not pay_by_credit(greg, bob). Thus, if we know greg pays cash to bob, then the abductive solution is that we must assume that greg does not pay bob by credit.
The use of the inference rule RAC may be viewed as semantically shifting the disjunctive clauses in a program P. Recall that the idea of shifting a clause A ← B, not C (e.g., see [5, 12]) is to syntactically transform such a clause into a set of normal clauses where, for each atom α in A, there is a normal clause with α as the head, and every other atom β in A is moved to the body as not β. More formally, given a clause R:

A ← B, not C

the set of normal clauses obtained by shifting R, denoted Rshift, is:

Rshift = {α ← B, not C′ : α ∈ A, not C′ = not C ∪ {not β : β ∈ A, β ≠ α}}
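The shifting transformation is purely syntactic and is easy to state as code. A minimal sketch for ground clauses, with our own set-based clause representation:

```python
def shift(clause):
    """Shift one disjunctive clause A ← B, not C into normal clauses:
    for each atom alpha in A, every other head atom beta moves into
    the body as 'not beta'."""
    head, pos, neg = clause  # the sets A, B, C
    return [({alpha}, set(pos), set(neg) | (set(head) - {alpha}))
            for alpha in sorted(head)]

def shift_program(program):
    return [rule for clause in program for rule in shift(clause)]

# Shifting a ∨ b ∨ c (empty body) gives the three normal clauses
# a ← not b, not c;  b ← not a, not c;  c ← not a, not b:
for head, pos, neg in shift(({"a", "b", "c"}, set(), set())):
    print(sorted(head), "<-", sorted(pos), "not", sorted(neg))
```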
Given a disjunctive program P, we denote by Pshift the set of all normal clauses obtained by shifting all the clauses in P. It is known that this technique is too strong to capture even the minimal model semantics of a positive disjunctive program (cf. [12, 26]).

Example 4. Programs with head-cycles are often used to illustrate the difficulties with syntactically shifting a program. Consider, for example, the following program P:

a ∨ b ∨ c
a ← b
b ← a

The program has a head-cycle between a and b. In terms of stratification, a should be one level higher than b (by the second clause) and b should be one level higher than a (by the third clause); but a and b also appear in the same head of a clause; hence the term head-cycle. The program has two minimal models, {c} and {a, b}, which correspond to the two preferred extensions E1 = {not a, not b} and E2 = {not c}. Let us verify that E1 is a preferred extension. The smallest assumption set that attacks not a is {not c} (i.e., P ∪ {not c} ⊢d a), which is counter-attacked by E1 itself (i.e., P ∪ E1 ⊢d c). The normal program Pshift consists of the following clauses:

a ← not b, not c
b ← not a, not c
c ← not a, not b
a ← b
b ← a

Pshift, however, does not have any stable model.
We now show that there is a one-to-one relationship between a type of preferred extensions, called stable extensions, and stable models. We first recall the definition of a stable model. A stable model M of a disjunctive program P is a set of atoms which is a minimal model of the following transformed program:

P^M = {A ← B : A ← B, not C ∈ P, and ∀(not γ) ∈ not C, γ ∉ M}

Similarly to the case of normal programs [7], we define a stable extension S of a disjunctive program P as a total preferred extension of P: for any ordinary atom α, either not α ∈ S or P ∪ S ⊢d α. The stable models of a disjunctive program, and in particular the minimal models of a positive disjunctive program, are expressible as stable extensions.

Theorem 2. Let P be a disjunctive program and M a set of atoms. M is a stable model of P iff S_M is a stable extension of P, where S_M = {not α : α ∉ M}. □

This result provides another way to understand the results by Eiter and Gottlob as well as by Marek et al. [9, 14, 22]: disjunctive default logic falls into the same complexity level as default logic. In Gottlob's explanation of these complexity levels, this can be said rather plainly as: for disjunctive programs, epistemic disjunction does not present one more source of computational difficulty than classic disjunction.
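For small ground programs, the stable model definition above can be checked directly. This is a brute-force sketch of our own (with an exponential minimality test), shown on the program of Example 4:

```python
from itertools import combinations

# Clauses as (head_atoms, positive_body, negated_body).

def reduct(program, M):
    """The transformed program P^M: delete clauses whose negated body
    mentions an atom of M, then drop all negated bodies."""
    return [(h, b, set()) for h, b, c in program if not (c & M)]

def is_model(positive_program, M):
    """Every clause whose body is true in M has some head atom in M."""
    return all(not (b <= M) or bool(h & M) for h, b, _ in positive_program)

def is_stable_model(program, M):
    P_M = reduct(program, M)
    if not is_model(P_M, M):
        return False
    # Minimality: no proper subset of M may be a model of the reduct.
    return not any(is_model(P_M, set(sub))
                   for r in range(len(M))
                   for sub in combinations(sorted(M), r))

# Example 4's program: a ∨ b ∨ c;  a ← b;  b ← a.
P = [({"a", "b", "c"}, set(), set()),
     ({"a"}, {"b"}, set()),
     ({"b"}, {"a"}, set())]
print(is_stable_model(P, {"c"}))        # True
print(is_stable_model(P, {"a", "b"}))   # True
print(is_stable_model(P, {"a"}))        # False: a forces b
```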
4 A Fixpoint Characterization

We show that some of the results presented in [30], namely that regular models/preferred extensions are the maximal normal alternating fixpoints of a suitable operator for normal programs, can be extended to the case of disjunctive programs. This yields an elegant fixpoint characterization of all three semantics: minimal models for positive disjunctive programs, and stable models as well as preferred extensions for disjunctive programs. We define a function F_P over sets S of assumptions as:

F_P(S) = {not α : α is an ordinary atom such that P ∪ S ⊬d α}.

It is easy to check that this function has the following property (called anti-monotonicity):

S1 ⊆ S2 ⇒ F_P(S2) ⊆ F_P(S1)

This holds because the derivation relation ⊢d is monotonic; the more information we have, the less we fail to derive. Consequently, the composite function that applies F_P twice, denoted F_P², is monotonic. That is,

S1 ⊆ S2 ⇒ F_P²(S1) ⊆ F_P²(S2)
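For ground programs, F_P can be sketched directly. As before, this is our own approximation: derivability under ⊢d is replaced by forward chaining in which a disjunctive head commits to a disjunct when all other disjuncts are assumed false; an assumption set is represented by the atoms assumed false, so F returns a set of atoms rather than 'not'-literals.

```python
def derivable(program, assumed_false):
    """Atoms derivable from P ∪ S under RAC-style forward chaining."""
    derived, changed = set(), True
    while changed:
        changed = False
        for head, pos, neg in program:
            if pos <= derived and neg <= assumed_false:
                for atom in head:
                    if head - {atom} <= assumed_false and atom not in derived:
                        derived.add(atom)
                        changed = True
    return derived

def F(program, atoms, S):
    """F_P(S) = { not alpha : alpha not derivable from P ∪ S }."""
    return atoms - derivable(program, S)

# The program  a ∨ b ← not a, not b;  c ← not d  (used in Example 5).
P = [({"a", "b"}, set(), {"a", "b"}),
     ({"c"}, set(), {"d"})]
ATOMS = {"a", "b", "c", "d"}

S = {"d"}                                   # the assumption set {not d}
print(sorted(F(P, ATOMS, S)))               # ['a', 'b', 'd'] : F_P(S)
print(sorted(F(P, ATOMS, F(P, ATOMS, S))))  # ['d']           : F_P²(S) = S
```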
A fixpoint of the function F_P² is called an alternating fixpoint of F_P (or simply, of P; cf. [1, 11, 30]). An alternating fixpoint S is called normal if S ⊆ F_P(S). For credulous reasoning, we are interested in maximal normal alternating fixpoints. On the other hand, Dung's preferred extensions are, by definition, the fixpoints of the following operator over sets of assumptions:

D_P(S) = {not α | not α is acceptable w.r.t. S}.

Note that F_P² depends only on a derivation relation. As a result, the operation of extending an assumption set using F_P² is rather mechanical. On the other hand, the definition of D_P, which is based on the notion of acceptability, is less direct. However, we will show that these two operators are equivalent. First, let us see an example.

Example 5. Consider the following disjunctive program:

a ∨ b ← not a, not b
c ← not d

It is easy to show that S = {not d} is a maximal normal alternating fixpoint. First, F_P(S) = {not d, not a, not b}. Then, F_P²(S) = {not d}. Further, S ⊆ F_P(S). Thus, S is a normal alternating fixpoint. In addition, it is easy to see that S is the only normal alternating fixpoint. For example, S′ = {not d, not a} is not an alternating fixpoint, since F_P(S′) = {not d, not a, not b} and F_P²(S′) = {not d}. Further, S″ = {not d, not a, not b} is an alternating fixpoint, but it is not normal. Thus, S is trivially maximal. S is also a preferred extension. This is more tedious to verify, since one has to consider, for each assumption not α, all the possible assumption sets that may attack not α, and for each such assumption set that does attack not α, whether it is counter-attacked by S. We leave the verification to the reader.

We now show that there is a one-to-one correspondence between maximal normal alternating fixpoints and preferred extensions. First, we have a lemma.

Lemma 3. Let P be a disjunctive program and S an assumption set. Then, F_P²(S) = D_P(S). □

For an alternating fixpoint S = F_P²(S), the normality condition S ⊆ F_P(S) ensures consistency.
E.g., in the above example, {not d, not a, not b} is an alternating fixpoint, but it is not ⊥-consistent and thus it is not normal. From this lemma, it is easy to show:

Theorem 4. Let P be a disjunctive program and S an assumption set. Then, S is a preferred extension of P iff S is a maximal normal alternating fixpoint of P.
As a corollary of this theorem, stable models of a disjunctive program (and minimal models of a positive disjunctive program) are a particular type of maximal normal alternating fixpoints; namely, they are fixpoints of F_P (which are then trivially fixpoints of F_P²).
Corollary 5. Let P be a disjunctive program and S an assumption set. Then, S is a stable extension of P iff S is a fixpoint of F_P.
5 Proof Procedure

In this section we show that the extensions of a head-cycle free disjunctive program P are precisely those of the normal program Pshift. This result enables the use of Eshghi and Kowalski's abductive proof procedure to answer queries, soundly, using the normal program obtained by shifting. This result can be seen as an extension of those presented by Dung in [8] on the use of Eshghi and Kowalski's proof procedure for disjunctive programs. Dung's result relies on two restrictions. The first is that a program should be head-cycle free, the same as ours. The other is that there should be no default negation going through recursion. Our result does not need this second restriction. First, let us look at an example.

Example 6. Consider the disjunctive program P = {a ∨ b ← not a}. P is head-cycle free simply because there is no positive body literal in the clause. The normal program obtained by shifting is Pshift:

a ← not a, not b
b ← not a
Clearly, both P and Pshift have the same unique preferred extension, {not a}. We now show the relationship between the extensions of a head-cycle free disjunctive program P and those of its normal transformation Pshift. As there seems to be particular interest in this result, we provide a proof sketch here.
Theorem 6. Let P be a finite ground, head-cycle free disjunctive program, and Pshift the normal program obtained by shifting. Then, an assumption set S is a preferred extension of P iff it is a preferred extension of Pshift.

Proof Sketch: We only show the ⇒ part. Without loss of generality, we assume that the given program P is free of positive body literals; this is because partial evaluation [3] can reduce them.
We show that, for any assumption not α, if not α is acceptable w.r.t. S using program P under ⊢d, then not α is acceptable w.r.t. S using Pshift under ⊢. The hypothesis of this statement is: for any assumption set W such that P ∪ W ⊢d α, we have P ∪ S ⊢d β for some not β ∈ W. The conclusion is: for any assumption set W such that Pshift ∪ W ⊢ α, we have Pshift ∪ S ⊢ β for some not β ∈ W. Suppose W is such that Pshift ∪ W ⊢ α. Then there is a clause α ← not C, not Rest in Pshift, where not C ∪ not Rest ⊆ W and not Rest denotes the set of assumptions shifted from the head of the clause {α} ∪ Rest ← not C in P. Then, it is easy to see that there is a sequence of derivation steps using RAC that resolve {α} ∪ Rest with the assumptions in not Rest to yield α. Thus, P ∪ W ⊢d α. It then follows from the hypothesis above that P ∪ S ⊢d β for some not β ∈ W. This implies there is a clause {β} ∪ Rest′ ← not D in P such that not D ⊆ S and {β} ∪ Rest′ resolves with the assumptions in not Rest′ (not Rest′ ⊆ S) to yield β. Then, clearly, the corresponding shifted clause β ← not D, not Rest′ derives β. That is, Pshift ∪ S ⊢ β for some not β ∈ W. This shows the conclusion. □

There is something of further interest in this result: a finite ground disjunctive program can be pre-processed by partial evaluation [3] to reduce all the positive body literals, so that the resulting program is trivially head-cycle free.4 Thus, the Eshghi-Kowalski proof procedure plus partial evaluation provides a proof theory for the abductive semantics proposed in this paper. The soundness of such a proof theory is easy to verify, mainly because partial evaluation preserves the semantics proposed in this paper, and the proposed semantics reduces to the preferential/regular semantics, for which the Eshghi-Kowalski procedure is sound and complete (with positive loop checking).
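The pre-processing step can be illustrated by a naive unfolding loop for ground programs: a clause with a positive body atom β is replaced by its resolvents against every clause with β in the head, until no positive body atoms remain. This is our own much-simplified sketch of the idea behind [3], not the full procedure; it assumes no positive loops, on which it would not terminate.

```python
def partial_eval(program):
    """Unfold positive body atoms away, one clause at a time.
    Clauses are (head_atoms, positive_body, negated_body)."""
    program = list(program)
    while True:
        idx = next((i for i, (_, pos, _) in enumerate(program) if pos), None)
        if idx is None:
            return program
        head, pos, neg = program.pop(idx)
        beta = sorted(pos)[0]
        rest = pos - {beta}
        # Resolve beta against every remaining clause with beta in its head.
        resolvents = [(head | (h2 - {beta}), rest | p2, neg | n2)
                      for h2, p2, n2 in program
                      if beta in h2]
        program.extend(resolvents)

# Example 7's program P: a ∨ b;  c ← a;  c ← b.
P = [({"a", "b"}, set(), set()),
     ({"c"}, {"a"}, set()),
     ({"c"}, {"b"}, set())]
result = partial_eval(P)
for head, pos, neg in result:
    print(sorted(head), "<-", sorted(pos))
# The unfolded program  a ∨ b;  b ∨ c;  a ∨ c;  c  has no positive
# body literals and is trivially head-cycle free.
```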
However, because disjunction cannot be completely simulated by normal programs, a procedure for normal programs cannot in general be complete for disjunctive programs.

Example 7. First, consider the following head-cycle free program P:
a ∨ b
c ← a
c ← b

The program has two preferred extensions, {not a} and {not b}. Its shifted
4 If a program P is nonground, partial evaluation may not terminate. Even though it can be made to terminate for finite ground programs, the process is in general intractable.
program, Pshift, consists of

a ← not b
b ← not a
c ← a
c ← b

It has the same preferred extensions. Given the query ← c, the Eshghi-Kowalski procedure will construct two proofs (by backtracking), one with abducible not b and the other with not a. The Eshghi-Kowalski procedure is complete in this case. However, the situation changes if we add the two clauses {a ← not a, b ← not b} to P; denote the result by P′. P′ has one preferred extension, ∅, and P′ ⊢d c. Now P′shift is:

a ← not a
b ← not b
a ← not b
b ← not a
c ← a
c ← b

Although it has the same preferred extension as P′, it is no longer the case that P′shift ⊢d c, and the Eshghi-Kowalski procedure cannot answer the query ← c using P′shift. To make it a complete proof procedure, the Eshghi-Kowalski procedure should be combined with a linear resolution procedure. In this case, even head-cyclic programs can be handled, since partial evaluation is a form of resolution procedure. Recently, it has come to our attention that, in an unpublished manuscript [6], Dung defined a procedure that combines the Eshghi-Kowalski procedure with a form of linear resolution, the SLI-resolution of Lobo et al. [21]. The relationship between this procedure and our semantics is currently being investigated.
6 Related Work and Final Remarks

Work by Inoue and Sakama [15, 27] yields important insights into a different aspect of the problem: how to represent abductive programs by (extended) disjunctive programs. In [15], they proposed a transformation from the former to the latter, and in [27] they showed that, in general, an abductive program can be viewed as a disjunctive program with priorities. Our work proposes an abductive/argumentation framework for disjunctive programs. Further, our
abductive semantics for disjunctive programs reduces to the regular/preferential semantics for normal programs. This is the key to applying Eshghi and Kowalski's abductive proof procedure. Dung shows that there is a natural abduction/argumentation interpretation for explicit negation. His techniques can be applied to the semantics proposed here. How to accommodate explicit negation in the Eshghi-Kowalski procedure requires further investigation.
References

1. C. Baral and V. Subrahmanian. Stable and extension class theory for logic programs and default logic. J. Automated Reasoning, pages 345-366, 1992.
2. A. Bondarenko, F. Toni, and R. Kowalski. An assumption-based framework for nonmonotonic reasoning. In Proc. 2nd Int'l Workshop on LPNMR, pages 171-189. MIT Press, 1993.
3. S. Brass and J. Dix. Disjunctive semantics based upon partial and bottom-up evaluation. In Proc. 12th ICLP, pages 199-216, 1995.
4. W. Chen and D. Warren. Extending Prolog with nonmonotonic reasoning. J. Logic Programming, 27(2):169-183, 1996.
5. P. Dung. Acyclic disjunctive logic programs with abductive procedures as proof procedure. In Proc. 1992 Fifth Generation Computer Systems, pages 555-561, 1992.
6. P. Dung. An abductive procedure for disjunctive logic programs. Unpublished manuscript, 1993.
7. P. Dung. An argumentation theoretic foundation for logic programming. J. Logic Programming, 22:151-177, 1995. A short version appeared in Proc. ICLP '91.
8. P. Dung. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence, 76, 1995. A short version appeared in Proc. IJCAI '93.
9. T. Eiter and G. Gottlob. On the computational cost of disjunctive logic programming: propositional case. Annals of Mathematics and Artificial Intelligence, 15:289-324, 1993.
10. K. Eshghi and R. A. Kowalski. Abduction compared with negation by failure. In Proc. 6th ICLP, pages 234-254. MIT Press, 1989.
11. A. Van Gelder. The alternating fixpoint of logic programs with negation. J. Computer and System Sciences, pages 185-221, 1993. A preliminary version appeared in PODS '89.
12. M. Gelfond, V. Lifschitz, H. Przymusinska, and M. Truszczynski. Disjunctive defaults. In Proc. 2nd Int'l Conf. on Principles of Knowledge Representation and Reasoning, pages 230-237, 1991.
13. L. Giordano, A. Martelli, and M. Sapino. A semantics for Eshghi and Kowalski's abductive procedure. In Proc. 10th ICLP. MIT Press, 1993.
14. G. Gottlob. Complexity results for nonmonotonic logics. Journal of Logic and Computation, 2:397-425, 1992.
15. K. Inoue and C. Sakama. Transforming abductive logic programs to disjunctive programs. In Proc. 10th ICLP. MIT Press, 1993.
16. A. Kakas, R. Kowalski, and F. Toni. Abductive logic programming. J. Logic and Computation, 2:719-770, 1992.
17. A. Kakas, R. Kowalski, and F. Toni. The role of abduction in logic programming. In Handbook of Logic in Artificial Intelligence and Logic Programming. Oxford University Press, 1995.
18. A. Kakas and P. Mancarella. Generalized stable models: a semantics for abduction. In Proc. 9th ECAI, 1990.
19. A. Kakas and P. Mancarella. Stable theories for logic programs. In Proc. ILPS. MIT Press, 1991.
20. A. Kakas and P. Mancarella. Preferred extensions are partial stable models. J. Logic Programming, pages 341-348, 1992.
21. J. Lobo, J. Minker, and A. Rajasekar. Foundations of Disjunctive Logic Programming. MIT Press, 1992.
22. W. Marek, A. Rajasekar, and M. Truszczynski. Complexity of computing with extended propositional logic programs. In H. Blair, W. Marek, A. Nerode, and J. Remmel, editors, Proc. Workshop on Structural Complexity and Recursion-Theoretic Methods in Logic Programming, pages 93-102, Washington DC, 1992. Cornell University.
23. D. Poole. What the lottery paradox tells us about default reasoning. In Proc. KR '89, pages 333-340, 1989.
24. T. C. Przymusinski. Static semantics for normal and disjunctive logic programs. Annals of Mathematics and Artificial Intelligence, 1995.
25. D. Sacca and C. Zaniolo. Stable models and non-determinism in logic programs with negation. In Proc. 9th ACM PODS, pages 205-217, 1990.
26. C. Sakama and K. Inoue. Relating disjunctive logic programs to default theories. In Proc. 2nd Int'l Workshop on LPNMR, pages 266-282. MIT Press, 1993.
27. C. Sakama and K. Inoue. Representing priorities in logic programs. In Proc. Int'l Conference and Symposium on Logic Programming. MIT Press, 1996.
28. J. You, R. Cartwright, and M. Li. Iterative belief revision in extended logic programming. Theoretical Computer Science, 170(1-2):383-406, 1996.
29. J. You and L. Yuan. A three-valued semantics for deductive databases and logic programs. J. Computer and System Sciences, 49:334-361, 1994. A short version appeared in Proc. PODS '90.
30. J. You and L. Yuan. On the equivalence of semantics for normal logic programs. J. Logic Programming, 22(3):211-222, 1995.
This article was processed using the LaTeX macro package with LLNCS style
Assumption-Commitment in Automata

Swarup Mohalik and R. Ramanujam
The Institute of Mathematical Sciences
C.I.T. Campus, Chennai 600 113, India
fswarup, [email protected]
ABSTRACT. In the study of distributed systems, the assumption-commitment framework is crucial for compositional specification of processes. The idea is that we reason about each process separately, making suitable assumptions about the other processes in the system. Symmetrically, each process commits to certain actions which the other processes can rely on. We study such a framework from an automata-theoretic viewpoint. We present systems of finite state automata which make assumptions about the behaviour of other automata and make commitments about their own behaviour. We characterize the languages accepted by these systems as the regular trace languages (of Mazurkiewicz) over an associated independence alphabet, and present a syntactic characterization of these languages using top-level parallelism. The results generalize smoothly to automata over infinite words as well.
1 Introduction

A distributed system usually consists of a finite number of processes, which proceed asynchronously and periodically exchange information with each other. A theoretically simple model of a distributed system is one where the network of processes is static, so that the number of processes and the channels of communication between them are fixed, and messages are not buffered, so that all communication can be studied as synchronization. It was such a simplification that led to the model of Communicating Sequential Processes [H78, H85]. The attractiveness of such a model is that when there are n processes in the system, we can represent the system as a parallel composition (P1 || P2 || ... || Pn), and in each of the Pi we can refer to synchronizations with other Pj. When analyzing the behaviours of the system, ideally we would like to study the behaviours of each process separately and put them together. Such compositional reasoning formed a major issue of concern in the eighties [Z89, PJ91].
Early on, it was realized that the assumption-commitment [FP78, MC81] or rely-guarantee [J83] paradigm was required for local reasoning. While there are some differences between these two approaches, we are not concerned with these differences here, at least for the purposes of this paper. The main idea is that each process makes assumptions about the behaviour of other processes and commits to fulfilling the assumptions made by other processes about its own behaviour. We can see this as conditional commitment to a transaction: "if others commit to doing x and y, then I commit to doing z". Symmetrically, all processes proceed this way, and we can compose processes only when mutual assumptions are met. This facilitates local reasoning: we can reason about the behaviour of each process separately, assuming that the others maintain relevant properties, and reason globally about their compatibility. While a number of researchers have studied this framework in the context of programming methodology, process algebras or temporal logics, there seems to have been little effort in formulating it from an automata-theoretic viewpoint. Implicitly, these models assume each process to be a machine of some sort, but studying formally the implications of each process being a finite-state machine is a different exercise altogether. The closest to such an approach seems to be that of asynchronous cellular automata [CMZ93], in the sense that the moves of a process may depend on the moves of others so that they move jointly. However, the connection of these automata to local reasoning as described here is not quite transparent. In this paper, we study systems of finite state automata, where along with a distributed alphabet of actions we also have a commitment alphabet to specify conditional synchronization. An automaton A1 may synchronize with another automaton A2 on an action (a, c1, c2), where we see A1 as committing to c1 provided A2 commits to c2.
Symmetrically, for such a synchronization to occur, A2 must have a transition on a where it commits to c2. (In this case, A2 may not require the c1 commitment from A1.) We propose such automata and study their products. We consider the class of languages accepted by such systems and show them to be the same as the regular trace languages (of Mazurkiewicz), thus showing that these systems have the same power as asynchronous (cellular) automata. We then present a Kleene characterization of these languages using parallel composition of regular expressions. We then point out that the results generalize smoothly when infinite behaviours of these automata are considered. Why should one look for an automata-theoretic account of the assumption-commitment framework? An important reason is that these automata can serve as natural models for temporal logics based on local reasoning (for instance, the one in [R96]). Compositional model checking is one of the major goals of computer-aided verification ([AH95]), and we believe that the automata presented here (particularly over infinite words) may help. Consider a temporal logic with local assertions that may refer to other agents' local formulas, and with global formulas that assert compatibility. Then each formula can be associated with an automaton of our kind, so that model checking can be done in layers: individually for each agent and globally for compatibility.
An important problem that we do not study here is that of dynamic networks of processes. The assumption-commitment framework is general enough to include such networks, which can, in principle, accommodate unboundedly many processes. In future work, we hope to study the automata theory of networks where processes may fork and join at will. These automata also bear a close resemblance to knowledge-based programs [FHMV95], where a process makes conditional transitions based on what it knows about other processes. Incorporating such knowledge-based transitions in automata constitutes a proper generalization of our systems: here, an automaton may refer to assumptions about others only at synchronization points, whereas a knowledge-based program can refer to its knowledge about others at any stage of its computation. We see this as an interesting issue for future study.
2 The automata

We begin with some notation. An automaton over a finite nonempty alphabet Σ is a tuple M = (Q, δ, q0, F), where Q is a finite set of states, δ ⊆ Q × Σ × Q is the transition relation, q0 ∈ Q is the initial state, and F ⊆ Q is the set of final states of the automaton. As usual, we write q –a→ q′ to mean that (q, a, q′) ∈ δ. Note that the automaton is in general nondeterministic. A run from q0 ∈ Q to qk ∈ Q is a sequence of transitions of the automaton: q0 –a1→ q1 –a2→ … –ak→ qk, where k ≥ 0. When qk ∈ F, we say that the sequence a1…ak is accepted by M, and denote the set of all accepted strings by L(M), the (regular) language accepted by M.

Here we consider systems of such automata, for which we define the notion of a distributed alphabet. Fix a finite set of locations Loc = {1, …, n}, n > 0. A distributed alphabet is a tuple Σ̃ = (Σ1, …, Σn), where each Σi is a finite nonempty set of actions of the i-th automaton. When a ∈ Σi ∩ Σj, i ≠ j, we think of it as a synchronization action between i and j. Given a distributed alphabet Σ̃, we often speak of the set Σ = Σ1 ∪ … ∪ Σn as the alphabet of the system. We also make implicit use of the associated function loc : Σ → 2^Loc defined by loc(a) = {i | a ∈ Σi}.

As suggested in the last section, we have another alphabet, which we call the commitment alphabet. This is a tuple C̃ = ⟨C1, C2, …, Cn⟩, where for all i ≠ j, Ci ∩ Cj = {⊥}. The element ⊥ is the null assumption (or commitment). We call C = C1 ∪ … ∪ Cn the commit set. For a ∈ Σ, we use the notation Ca to denote the set ∪_{i ∈ loc(a)} Ci.
Let ⊑i ⊆ Ci × Ci be the partial order defined by: c ⊑i d iff c = ⊥ or c = d. We extend the relation pointwise to sequences over Ci as follows: for all σ1, σ2 ∈ Ci*, σ1 ⊑i σ2 iff |σ1| = |σ2| and for all j ≤ |σ1|, σ1(j) ⊑i σ2(j). Let Γa = {γ : loc(a) → Ca | ∀i ∈ loc(a), γ(i) ∈ Ci}. Below we extend the distributed alphabet using commitments so that we have actions of the form ⟨a, γ⟩ where γ ∈ Γa. When γ(i) = ⊥ for i ∈ loc(a), we treat this as a "don't care" condition.
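The basic machinery above (locations, loc(a), the order ⊑i, and the commitment maps Γa) can be prototyped in a few lines. The sketch below uses our own encoding (names like `BOT`, `sigma`, `commits` are ours, not the paper's), with ⊥ encoded as `None` and Γa enumerated by brute force:

```python
# Hypothetical encoding of a distributed alphabet with commitments.
from itertools import product

BOT = None  # the null assumption/commitment, written ⊥ in the text

# A two-location distributed alphabet, as in the paper's Figure 1.
sigma = [{"a", "c"}, {"b", "c"}]

def loc(action, sigma):
    # loc(a) = {i | a in Sigma_i}
    return {i for i, s in enumerate(sigma) if action in s}

def leq(c, d):
    # c ⊑_i d iff c = ⊥ or c = d (the order is the same at every location here)
    return c is BOT or c == d

def gammas(action, sigma, commits):
    # Gamma_a: all maps gamma : loc(a) -> C_a with gamma(i) in C_i (or ⊥),
    # represented as dicts; commits[i] is the commit set C_i without ⊥.
    locs = sorted(loc(action, sigma))
    choices = [sorted(commits[i]) + [BOT] for i in locs]
    return [dict(zip(locs, combo)) for combo in product(*choices)]
```

For the commit alphabet ({x1, x2}, {y1, y2}) of Figure 1, `gammas("c", …)` enumerates 3 × 3 = 9 commitment maps for the synchronization action c.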
Definition 2.1 Given a distributed alphabet Σ̃ = (Σ1, …, Σn) and a commit alphabet C̃, we define extended alphabets as follows: Σi^C = {⟨a, γ⟩ | a ∈ Σi and γ ∈ Γa}. Σ̃^C = ⟨Σ1^C, …, Σn^C⟩. Σ^C = ∪_{i ∈ Loc} Σi^C.
We use α, β, etc. to denote the members of Σ^C. We extend the function loc to Σ^C as well: loc(⟨a, γ⟩) = {i | a ∈ Σi}. Note that the extended alphabet is more general than we need. When a ∈ Σ, we need not consider functions γ where ∀i ∈ loc(a), γ(i) = ⊥; further, when loc(a) = {i}, there is no need to consider commitments at all. In the interest of simple notation, we retain the generality of this presentation. (In cases like these, we will unabashedly refer to a ∈ Σi^C.)

We will need three kinds of projection maps, and we define them below. The first is the commit erasure map π : (Σ^C)* → Σ*, defined as: π(⟨a1, γ1⟩ … ⟨ak, γk⟩) = a1 … ak. The second is the component projection map ↾ : (Σ^C)* × Loc → (Σi^C)*, defined by: ε↾i = ε (where ε denotes the null string), and (xα)↾i = (x↾i)α if i ∈ loc(α), and is x↾i otherwise. At times, we abuse notation and use ↾ as a component projection map from Σ* × Loc to Σ*, defined in identical fashion. The third is the commit projection map ↓ : (Σ^C)* × Loc → C*, defined by: ε↓i = ε, and (⟨a, γ⟩x)↓i = γ(i)(x↓i) if i ∈ loc(a), and is x↓i otherwise.

We are now ready to define AC-automata, the class of machines which make assumptions and commitments. We first define individual automata and then systems of such automata.

Definition 2.2 Consider the distributed alphabet Σ̃ = (Σ1, …, Σn), the commit alphabet C̃ and the associated extended alphabet Σ̃^C over the set of locations Loc. Let i ∈ {1, 2, …, n}.
1. An AC-automaton over Σi^C is a tuple Mi = (Qi, ↪i, s0i), where Qi is a finite set of states, s0i is the initial state, and ↪i ⊆ Qi × Σi^C × Qi is the transition relation.
2. A system of AC-automata over the extended alphabet Σ̃^C is a tuple M̃ = (M1, M2, …, Mn, F), where each Mi is an AC-automaton over Σi^C, and F ⊆ (Q1 × … × Qn).
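The three projection maps can be rendered directly. In the sketch below (our encoding, not the paper's notation) an extended word is a list of pairs `(a, gamma)`, with `gamma` a dict from participating locations to commitments and ⊥ encoded as `None`; `locf` is an assumed callable computing loc(a):

```python
def erase(word):
    # commit erasure: <a1,g1>...<ak,gk> |-> a1...ak
    return "".join(a for a, g in word)

def comp_proj(word, i, locf):
    # component projection (written d / ↾ in the text):
    # keep only the letters whose action involves location i
    return [(a, g) for a, g in word if i in locf(a)]

def commit_proj(word, i, locf):
    # commit projection (written # / ↓ in the text):
    # the sequence of commitments gamma(i) along i's letters
    return [g[i] for a, g in word if i in locf(a)]
```

For example, projecting the extended word ⟨a,(⊥)⟩⟨c,(x1,y1)⟩ to location 2 keeps only the synchronization letter, and its commit projection there is the sequence y1.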
A remark is in order. It is not necessary that the commit alphabet be fixed universally for the system, as we have done above. We could have defined each AC-automaton with its own commit alphabet and subsequently ensured, in the definition of systems, that for all i, j ∈ Loc, the j-th commit set of automaton i is contained in the j-th commit set of automaton j. However, this tends to clutter up notation considerably, so we stick with the (more restricted) notation of a globally determined commit alphabet.

Note that we have associated final states only with systems rather than with individual automata. This is natural in the assumption-commitment framework: local reachability of any state of any automaton is dependent on the behaviour of the other automata in the system, and hence properties of the system are best given in terms of global states¹. We might wish to partially specify global states with a "don't care" condition on some components of the system, but this is easily specified using a set of global states as above.

The global behaviour of M̃ is given below as that of the product automaton M̂ associated with the system. Note that the system is then a (finite state) machine over Σ, thus hiding away assumptions and commitments as internal details. This fits with the intuition that the behaviour of a distributed system is globally specified on Σ, and the machines in the system are programmed to achieve this, using internal coordination mechanisms like synchronization and commitments among themselves.
Definition 2.3 Given a system of AC-automata M̃ = (M1, M2, …, Mn, F) over Σ̃^C, the product automaton associated with the system is given by M̂ = (Q̂, ⇒, ⟨s01, s02, …, s0n⟩, F), where Q̂ = (Q1 × … × Qn), and ⇒ ⊆ Q̂ × Σ × Q̂ is defined by: ⟨p1, p2, …, pn⟩ ⇒a ⟨q1, q2, …, qn⟩ iff
1. ∀i ∉ loc(a), pi = qi, and
2. for every j ∈ loc(a) there exists γj ∈ Γa such that pj –⟨a, γj⟩→j qj, and for all i, k ∈ loc(a), γi(k) ⊑k γk(k).

We extend the one-step transition relation ⇒ to words over Σ in the usual way. Then the product accepts a string x ∈ Σ* if there is a state q ∈ F such that ⟨s01, s02, …, s0n⟩ ⇒x q. The language accepted by the system M̃ is then given as {x ∈ Σ* | x is accepted by M̂}, and is denoted L = L(M̃).
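The matching condition in clause 2 can be checked mechanically. The sketch below (our encoding; ⊥ is `None`, local transitions are tuples `(p, a, gamma, q)` with `gamma` a dict over loc(a)) computes the global a-successors of a global state:

```python
# A joint a-move exists iff every participating location has a local move
# whose commitments are pairwise compatible under ⊑ (clause 2 above).
from itertools import product as iproduct

BOT = None  # ⊥

def leq(c, d):
    # c ⊑ d iff c = ⊥ or c = d
    return c is BOT or c == d

def product_moves(local_trans, action, loca, state):
    """Global a-successors of `state`; local_trans[i] lists location i's
    transitions (p, a, gamma, q); loca = loc(action)."""
    keys = sorted(loca)
    cands = [[(g, q) for (p, a, g, q) in local_trans[i]
              if p == state[i] and a == action] for i in keys]
    results = set()
    for combo in iproduct(*cands):
        gam = dict(zip(keys, (g for g, q in combo)))
        # clause 2: gamma_i(k) ⊑ gamma_k(k) for all i, k in loc(a)
        if all(leq(gam[i][k], gam[k][k]) for i in keys for k in keys):
            new = list(state)
            for i, (g, q) in zip(keys, combo):
                new[i] = q
            results.add(tuple(new))
    return results
```

Run on the two-automaton example discussed after Fact 2.4, this reproduces the two nondeterministic a-moves of the product from (r0, s0).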
The class of languages over Σ accepted by systems of AC-automata is denoted L(ACM)Σ̃. Formally, L(ACM)Σ̃ = {L | there is a commit alphabet C̃ and an AC-system M̃ over Σ̃^C such that L = L(M̃)}. Since the product is a finite state automaton, we note the following simple fact.

Fact 2.4 L(ACM)Σ̃ is included in the set of regular languages over Σ.
Note that we have defined AC-automata to be nondeterministic, and the products are nondeterministic as well. Moreover, we even have systems where each automaton is deterministic whereas the product is not. For instance, consider a system of two automata synchronizing on a where, for some commitment y ∈ C2, one has the transitions r0 –⟨a, (⊥, y)⟩→ r1 and r0 –⟨a, (⊥, ⊥)⟩→ r2, and the other has the single transition s0 –⟨a, (⊥, y)⟩→ s1. Then the product will have two nondeterministic transitions on a from (r0, s0), to (r1, s1) and to (r2, s1). The following observation ensures that indeterminacy in products of deterministic automata can only arise in this way.
¹ We can formally show that the theorems in the paper go through even if we have only local sets of final states; but this makes the presentation messy, so we omit it.
Call an AC-automaton M = (Q, ↪, q0) complete if, whenever there is a transition q –⟨a, γ⟩→ q′ in M, then for all i ∈ loc(a), γ(i) ≠ ⊥.
Observation 2.5 For a system of complete deterministic AC-automata M̃, the product automaton M̂ associated with M̃ is also deterministic.

Let L(cd-ACM)Σ̃ denote the set of all languages over Σ accepted by systems of complete deterministic AC-automata. We will show in the next section that this is the same as L(ACM)Σ̃, so the extra generality in the definitions is only for convenience and does not add expressiveness.
[Figure 1 here. Fig. 1. AC-system for the language ([ab]c + [aabb]c): Σ1 = {a, c}, Σ2 = {b, c}. The figure shows the automaton M1 (states p1, p2, p3, with a-transitions and synchronization labels (c, x1, y1) and (c, x2, y2)), the automaton M2 (states q1, q2, q3, with b-transitions and the same synchronization labels), and their product M̂ over the global states (pi, qj).]
We now present a simple example of these automata. Consider an AC-system of two agents which operate on the alphabets {a, c} and {b, c} respectively. a and b are local actions and c is a synchronization. Suppose we want the system behaviour that they synchronize whenever they have each done one local action, or when they have each done two local actions. Simple products of automata would not suffice here, as they would allow for a synchronization when one of them has done one local action whereas the other has done two, and such a stretch of behaviour is disallowed by the specification. A solution is presented in Figure 1, where the commit alphabet is ({x1, x2}, {y1, y2}). Here x1 is a commitment for early synchronization by agent 1 and x2 for late synchronization, and similarly for agent 2.
3 The languages

In this section we study the class of languages accepted by systems of AC-automata, which we have called L(ACM). Since the behaviour of these systems has been given by products of automata, these languages are a kind of synchronized shuffle of languages. The only additional complication is to introduce assumptions and commitments and match them during the shuffle operation. For this section, fix a distributed alphabet Σ̃ = (Σ1, …, Σn), a commit alphabet C̃, and hence the associated extended alphabet Σ̃^C over Loc.
Definition 3.1 Let xi ∈ (Σi^C)*, i ∈ {1, …, n}.
1. We say that the tuple (x1, x2, …, xn) is compatible iff for all i, j ∈ Loc:
(a) π(xi↾j) = π(xj↾i);
(b) (xi↾j)↓j ⊑j (xj↾i)↓j.
2. Suppose (x1, x2, …, xn) is compatible, and x ∈ Σ*. We say that x is generated by the tuple iff for all i ∈ Loc, x↾i = π(xi).
3. Let Li ⊆ (Σi^C)*, i ∈ {1, …, n}. We define the n-ary AC-shuffle of these languages by: L1 ∥ L2 ∥ … ∥ Ln = {x ∈ Σ* | x is generated by a compatible tuple (x1, x2, …, xn) where for all i, xi ∈ Li}.

Let L(AC-Shuffle) denote the least class that includes the set {L | for some commit alphabet C̃, there exist regular languages Li ⊆ (Σi^C)* such that L = L1 ∥ … ∥ Ln} and is closed under union.

We now show that the languages in L(AC-Shuffle) are exactly the languages in L(ACM). Before that, we need some propositions (their proofs are given in Appendix 1). For now, fix a system of AC-automata M̃ = (M1, M2, …, Mn, F) over Σ̃^C, and its associated product M̂ = (Q̂, ⇒, ⟨s01, s02, …, s0n⟩, F). Let ⇒i be the natural extension of the transition relation ↪i to words over Σi^C.

Proposition 3.2 Let ⟨r1, r2, …, rn⟩ ⇒x ⟨q1, q2, …, qn⟩ in M̂. Then for all i ∈ Loc, there exist xi ∈ (Σi^C)* such that ri ⇒xi qi, the tuple (x1, x2, …, xn) is compatible, and x is generated by it.

Proposition 3.3 Suppose x ∈ Σ*. If for all i ∈ Loc, there exist xi ∈ (Σi^C)* such that ri ⇒xi qi, the tuple (x1, x2, …, xn) is compatible, and x is generated by it, then ⟨r1, r2, …, rn⟩ ⇒x ⟨q1, q2, …, qn⟩ in M̂.
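The compatibility and "generated by" conditions of Definition 3.1 are finitary checks and can be prototyped directly. The sketch below uses our encoding from earlier examples (extended words as lists of `(a, gamma)` pairs, ⊥ as `None`, `locf` an assumed loc function):

```python
# Brute-force rendering of Definition 3.1 (our encoding, not the paper's).
def proj_word(x, i, locf):
    # projection of a plain word x to location i
    return "".join(a for a in x if i in locf(a))

def compatible(tup, locf):
    n = len(tup)
    def d(w, j):        # component projection of an extended word
        return [(a, g) for a, g in w if j in locf(a)]
    def erase(w):       # commit erasure
        return "".join(a for a, g in w)
    def commits(w, j):  # commit projection at location j
        return [g[j] for a, g in w if j in locf(a)]
    def leq_seq(s1, s2):  # pointwise extension of ⊑, with ⊥ = None
        return len(s1) == len(s2) and all(c is None or c == e for c, e in zip(s1, s2))
    return all(erase(d(tup[i], j)) == erase(d(tup[j], i)) and
               leq_seq(commits(d(tup[i], j), j), commits(d(tup[j], i), j))
               for i in range(n) for j in range(n))

def generated_by(x, tup, locf):
    # condition 2: x projected to i equals the erasure of x_i, for every i
    return all(proj_word(x, i, locf) == "".join(a for a, g in tup[i])
               for i in range(len(tup)))
```

Note that a single compatible tuple generates every interleaving of the independent local actions, which is exactly why AC-shuffles turn out to be trace-closed (Proposition 3.9).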
Lemma 3.4 Let M̃ be an AC-system and let Fi ⊆ Qi, for all i. Let Li = L(Mi, Fi) and F = ∏_{i∈Loc} Fi. Then L(M̃) = L1 ∥ L2 ∥ … ∥ Ln.

Proof. Let x ∈ L(M̃). Then there is an accepting path ρ = ⟨s01, s02, …, s0n⟩ ⇒x ⟨qf1, qf2, …, qfn⟩ in M̂, where qfi ∈ Fi according to the assumption. By Proposition 3.2, for all i ∈ Loc there exist xi ∈ (Σi^C)* such that s0i ⇒xi qfi and x is generated by the compatible tuple (x1, x2, …, xn). Then xi ∈ Li and, by the definition of ∥, x ∈ L1 ∥ L2 ∥ … ∥ Ln.

Let x ∈ L1 ∥ L2 ∥ … ∥ Ln. Then, by definition of ∥, there exist xi ∈ Li such that x is generated by the compatible tuple (x1, x2, …, xn). So for all i ∈ Loc, there is a qfi ∈ Fi such that s0i ⇒xi qfi. By Proposition 3.3, ⟨s01, s02, …, s0n⟩ ⇒x ⟨qf1, qf2, …, qfn⟩. Since ⟨qf1, qf2, …, qfn⟩ ∈ F by its construction, x ∈ L(M̃). ∎

The lemma at once gives the following theorem as its corollary:

Theorem 3.5 L(ACM)Σ̃ ⊆ L(AC-Shuffle)Σ̃.
To show the converse of the above theorem, we take recourse to Mazurkiewicz trace theory [DR95]. Fix the distributed alphabet Σ̃.

Definition 3.6 Define the relation ∼ on Σ* as: for all x, y ∈ Σ*, x ∼ y iff x↾i = y↾i for all i ∈ Loc.

It is easy to see that ∼ is an equivalence. The equivalence classes of ∼ are called traces. In trace theory, it is customary to present ∼ in an alternative form: given Σ̃, we define an irreflexive and symmetric relation I ⊆ Σ × Σ as I = {(a, b) | loc(a) ∩ loc(b) = ∅}. I is called the independence relation.
Definition 3.7 Two words x and y are 1-trace equivalent, written x ∼t¹ y, if there are words u, v ∈ Σ* and (a, b) ∈ I such that x = uabv and y = ubav. The trace equivalence ∼t is the reflexive transitive closure of ∼t¹.

Since the definitions of ∼ and ∼t can be shown to be equivalent, we will use the symbol ∼ to mean ∼t as well. M(Σ, I) = Σ*/∼ is called the trace monoid. Let φ : Σ* → M(Σ, I) be the morphism such that φ(x) = [x]. The syntactic congruence ∼T on M(Σ, I) is defined by: for all r, t ∈ M(Σ, I), t ∼T r iff ∀t1, t2 ∈ M(Σ, I): t1 t t2 ∈ T ⟺ t1 r t2 ∈ T.

Definition 3.8 A trace language T ⊆ M(Σ, I) is regular iff the syntactic congruence ∼T is of finite index. Equivalently, T is regular iff φ⁻¹(T) is a regular subset of Σ*.
One can read the definition of regular trace languages as the regular languages that are closed under the equivalence relation ∼. Note that the closure of a regular language under ∼ need not be regular. For example, let L = (ab)*. If I = {(a, b), (b, a)}, then φ⁻¹(φ(L)) is the language containing strings with an equal number of a's and b's, which is not regular any more. Let RTLΣ̃ denote the class of regular trace languages over Σ̃. The following proposition ensures that AC-automata accept regular trace languages.
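The swap-based Definition 3.7 suggests a direct (if exponential) way to compute the ∼-closure of a finite set of words, which makes the (ab)* example above concrete. A minimal sketch, with `indep` an assumed set of independent ordered pairs:

```python
def trace_closure(words, indep):
    """All words trace-equivalent to some word in `words`, computed by
    repeatedly swapping adjacent independent letters (Definition 3.7)."""
    seen = set(words)
    frontier = list(words)
    while frontier:
        w = frontier.pop()
        for k in range(len(w) - 1):
            if (w[k], w[k + 1]) in indep:
                v = w[:k] + w[k + 1] + w[k] + w[k + 2:]
                if v not in seen:
                    seen.add(v)
                    frontier.append(v)
    return seen
```

With a and b fully independent, the closure of abab is all six words with two a's and two b's; closing longer and longer prefixes of (ab)* this way shows the counting behaviour that takes the closure outside the regular languages.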
Proposition 3.9 L(AC-Shuffle)Σ̃ ⊆ RTLΣ̃.

Proof. From Lemma 3.4 we first see that L(AC-Shuffle)Σ̃ is indeed included in the set of regular languages over Σ. Next, we show that every language L in L(AC-Shuffle)Σ̃ is closed under ∼. It suffices to show that xbay ∈ L whenever xaby ∈ L, for x, y ∈ Σ*, a, b ∈ Σ and (a, b) ∈ I. By definition of the AC-shuffle, xaby is generated by a compatible tuple (x1, x2, …, xn), with xi ∈ Li. But then xbay is also generated by (x1, x2, …, xn), since for all i ∈ Loc, xbay↾i = xaby↾i = π(xi). Hence, by definition, xbay ∈ L. ∎

We now set out to show that every regular trace language over the distributed alphabet Σ̃ is accepted by an AC-system. For this, we first need Zielonka's theorem characterizing regular trace languages by asynchronous automata [Z87]. A Zielonka automaton on Σ̃ with n processes is a tuple A = (A1, …, An, Δ, F), where for every i ∈ Loc, Ai = (Qi, Σi, s0i) is the i-th automaton, Q = ∏_{i∈Loc} Qi is the state space of A, F ⊆ Q is the set of final states, and s0 = (s01, …, s0n) denotes the initial state of A. Δ = {δa | a ∈ Σ} is the next-state function, where δa : ∏_{i∈loc(a)} Qi → 2^{∏_{i∈loc(a)} Qi}. A is deterministic if ∀a ∈ Σ, ∀s ∈ ∏_{i∈loc(a)} Qi, |δa(s)| ≤ 1.

The transition relation ⇒A between any two global states (p1, p2, …, pn) and (q1, q2, …, qn) of A is defined as: (p1, p2, …, pn) ⇒aA (q1, q2, …, qn) iff (q_{i1}, …, q_{ik}) ∈ δa(p_{i1}, …, p_{ik}), where {i1, …, ik} = loc(a), and pj = qj for all j ∉ loc(a). This transition among global states is extended to words over Σ in the natural way. The language accepted by A is defined as: L(A) = {x ∈ Σ* | ∃s ∈ F: s0 ⇒xA s}. An immediate corollary of the transition on global states is the following.

Proposition 3.10 If (a, b) ∈ I then for all s, s′ ∈ Q, s ⇒abA s′ iff s ⇒baA s′. Consequently, L(A) is closed under ∼.

The trace language accepted by A is defined as: T(A) = {t ∈ M(Σ, I) | ∀u ∈ t, ∃s ∈ F: s0 ⇒uA s}. Then, from the above proposition, we get:

Corollary 1 φ⁻¹(T(A)) = L(A).

Theorem 3.11 (Zielonka)
1. For every Zielonka automaton A, the trace language T(A) ⊆ M(Σ, I) is regular.
2. For every regular trace language T ⊆ M(Σ, I), there is a deterministic Zielonka automaton A such that T = T(A).
We now show that every Zielonka automaton can also be presented as an AC-system of automata with the same behaviour.
Lemma 3.12 RTLΣ̃ ⊆ L(ACM)Σ̃.
Proof. Let L ∈ RTL. Then, by Zielonka's theorem, there is a deterministic Zielonka automaton A accepting L. Let it be A = (A1, …, An, Δ, F), where for every i ∈ Loc, Ai = (Qi, Σi, s0i) is the i-th automaton. Consider the commit alphabet given by Ci = Qi, for i ∈ Loc. Define a system of AC-machines over the associated extended alphabet as follows: M̃ = (M1, M2, …, Mn, F), where Mi = (Qi, ↪i, s0i) and pi –⟨a, γi⟩→i qi iff there is a transition (p, a, q) ∈ Δ with p[i] = pi, q[i] = qi and, for all j ∈ loc(a), γi(j) = p[j]. Let the product of M̃ be M̂ = (Q̂, ⇒, ⟨s01, s02, …, s0n⟩, F). The proof of the lemma follows easily from the following claim:

Claim: (p, a, q) ∈ Δ iff p ⇒a q in M̂.

(⇒): Let (p, a, q) ∈ Δ. Then for all i ∉ loc(a), pi = qi, and by construction, for all i ∈ loc(a), pi –⟨a, γi⟩→i qi, where p[i] = pi, q[i] = qi and, for all j ∈ loc(a), γi(j) = p[j]. In particular, for all i, k ∈ loc(a), γi(k) = p[k] = γk(k), so the commitments are compatible. Then, by the definition of transitions in M̂, p ⇒a q.

(⇐): Let p ⇒a q in M̂. Then, by definition, for all i ∉ loc(a), pi = qi, and for all i ∈ loc(a), pi –⟨a, γi⟩→i qi such that for all j, k ∈ loc(a), γj(k) ⊑k γk(k). By construction, for each i ∈ loc(a) there is a transition (pⁱ, a, qⁱ) ∈ Δ such that pⁱ[i] = pi and qⁱ[i] = qi; also, for all j ∈ loc(a), γi(j) = pⁱ[j]. Since for any k ∈ loc(a), pᵏ[i] = γk(i) = γi(i) = pⁱ[i] = pi, all the pᵏ agree with p on loc(a). Because δa is deterministic, qᵏ = qʲ for all j, k ∈ loc(a), and hence qᵏ agrees with q on loc(a). Hence we get (p, a, q) ∈ Δ, as required.

Note that in the system constructed above, each AC-automaton is complete and deterministic. Thus we have shown that RTLΣ̃ ⊆ L(cd-ACM)Σ̃. Putting this together with Theorem 3.5 and Proposition 3.9, we get:

Corollary 3.13 L(ACM)Σ̃ = L(cd-ACM)Σ̃.

Thus, we have the following characterization theorem for the class of languages over Σ̃ accepted by systems of AC-automata.

Theorem 3.14 L(ACM)Σ̃ = L(AC-Shuffle)Σ̃ = RTLΣ̃.
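The construction in the proof of Lemma 3.12 — take Ci = Qi and let each location commit to the local states of all participants — is mechanical enough to sketch. Below, a Zielonka next-state relation is encoded (our assumption) as a dict `delta` from actions to sets of global-state pairs, with `loca_map[a]` giving loc(a):

```python
def zielonka_to_ac(delta, loca_map, n):
    """Lemma 3.12 construction, sketched in our encoding: local transition
    p_i -<a, gamma_i>-> q_i exists iff some global (p, a, q) in delta has
    p[i] = p_i, q[i] = q_i, and gamma_i(j) = p[j] for all j in loc(a).
    Gamma is stored as a sorted tuple of (location, committed state) pairs."""
    local_trans = [set() for _ in range(n)]
    for a, moves in delta.items():
        for p, q in moves:
            gamma = tuple(sorted((j, p[j]) for j in loca_map[a]))
            for i in loca_map[a]:
                local_trans[i].add((p[i], a, gamma, q[i]))
    return local_trans
```

Note that every location participating in a receives the same commitment map (the loc(a)-components of the global source state), which is what makes the resulting AC-automata complete and deterministic.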
4 The syntax

We now present a syntax for the languages studied in the last section. Given that [O85] already provides a syntax for regular trace languages, this exercise may seem superfluous. However, the [O85] syntax is global in nature, rather than a parallel composition of regular expressions. Given the intuition described in the introduction, namely that of describing systems of parallel programs which make assumptions and commitments about other programs in the system, looking for such a syntactic presentation seems well motivated.

Below, we write L1 · L2 to denote language concatenation, and L* for language (Kleene) iteration. The syntax is given in two layers, one for `local' expressions and another for parallel composition. Fix a distributed alphabet Σ̃, a commit alphabet C̃, and the associated extended alphabet.

ACREGi ::= ⟨a, γ⟩ ∈ Σi^C | p + q | p; q | p*
ACREG ::= r1 ∥ r2 ∥ … ∥ rn (ri ∈ ACREGi) | R1 + R2 (Ri ∈ ACREG)

The semantics of these expressions is given as follows: for each i ∈ Loc, we have a map [·]i : ACREGi → 2^{(Σi^C)*}, and globally a map [·] : ACREG → 2^{Σ*}. These maps are defined by structural induction:
- [⟨a, γ⟩]i = {⟨a, γ⟩}.
- [p + q]i = [p]i ∪ [q]i.
- [p; q]i = [p]i · [q]i.
- [p*]i = ([p]i)*.
- [r1 ∥ r2 ∥ … ∥ rn] = [r1]1 ∥ [r2]2 ∥ … ∥ [rn]n.
- [R1 + R2] = [R1] ∪ [R2].

The class of regular languages generated by the ACREG expressions is denoted L(ACREG). Formally, L(ACREG)Σ̃ = {L ⊆ Σ* | for some commit alphabet C̃, there is an R ∈ ACREG over Σ̃^C such that L = [R]}.

Lemma 4.1 L(ACM)Σ̃ ⊆ L(ACREG)Σ̃.
Proof. Let L ∈ L(ACM). Then there is a system M̃ = (M1, M2, …, Mn, F) such that L = L(M̃). Then L = ∪_{f∈F} L(M̃f), where M̃f = (⟨M1, …, Mn⟩, {f}).

Let f = (f1, f2, …, fn) and Li = L(Mi, {fi}). By Lemma 3.4, L(M̃f) = L1 ∥ L2 ∥ … ∥ Ln. Since each Li ⊆ (Σi^C)* is regular, there is an ri ∈ ACREGi such that [ri]i = Li. Let Xf = r1 ∥ r2 ∥ … ∥ rn. Then, by the last observation, [Xf] = L(M̃f). Now let X = Xf1 + Xf2 + … + Xfk, where F = {f1, f2, …, fk}. X ∈ ACREG and it is obvious that L = [X]. ∎

Lemma 4.2 L(ACREG)Σ̃ ⊆ RTLΣ̃.

Proof. Suppose L ∈ L(ACREG)Σ̃. Then there is a commit alphabet C̃ and an ACREG expression R over it such that L = [R]. The proof is by induction on the number m of +'s in the ACREG expression R. For the base case, when m = 0, R = r1 ∥ r2 ∥ … ∥ rn and [R] = L1 ∥ … ∥ Ln, where Li = [ri]i for all i ∈ Loc. Since ri ∈ ACREGi, the Li are regular. Then there exist deterministic finite automata Mi over Σi^C and sets Fi ⊆ Qi such that Li = L(Mi, Fi). Let M̃ = (M1, M2, …, Mn, ∏_{i∈Loc} Fi) be an AC-system. By Lemma 3.4, L1 ∥ … ∥ Ln = L(M̃). Hence, by Theorem 3.14, [R] ∈ RTL. The induction step follows routinely from the observation that the class of regular trace languages is closed under union. ∎
Thus, we have our version of Kleene's theorem:

Theorem 4.3 L(ACM)Σ̃ = L(ACREG)Σ̃.
5 Infinite behaviours

We now present AC-systems whose behaviours are given by infinite words. For lack of space, we merely define the systems and mention the results here. The details will be given in the full paper.

Formally, fix a distributed alphabet Σ̃ = (Σ1, …, Σn), a commit alphabet C̃ = (C1, …, Cn) and the associated extended alphabet Σ̃^C. An ωAC-system is a tuple M̃ = (⟨M1, M2, …, Mn⟩, F), where the Mi are AC-automata over Σi^C, i ∈ Loc, and F ⊆ ∏_{i∈Loc} 2^{Qi}.

Let M̂ be the product of the local AC-machines Mi. The behaviour of the system M̃, denoted Lω(M̃), is defined as the subset of Σ^ω accepted by the product M̂ with the acceptance table F. The automaton M̂ accepts a string x = a0a1… ∈ Σ^ω if there exists an infinite run ρ = s0 –a0→ s1 –a1→ … of the product system, and a tuple U = (U1, …, Un) ∈ F such that for all i ∈ Loc, Infi(ρ) = Ui, where Infi(ρ) = {q | ∃ infinitely many j, sj[i] = q}. Then Lω(M̃) = {x ∈ Σ^ω | M̂ accepts x}. Thus, L(ωACM)Σ̃ = {L ⊆ Σ^ω | there exists a commit alphabet C̃ and an ωAC-system M̃ over Σ̃^C such that L = Lω(M̃)}.

Note that we can extend all the earlier definitions (of ⊑i, ↾, π, ↓, etc.) to infinite strings, and define L1 ∥ … ∥ Ln as before, where Li ⊆ (Σi^C)^ω. We can again show that such ωAC-shuffle languages exactly capture the behaviour of ωAC-systems³, but the detour is now taken via Muller asynchronous automata, introduced by [DM93], and the proof uses results from [GP92]. The syntax is a smooth generalization, mirroring the way we construct ω-regular expressions. We now have three layers:

ACREGi ::= ⟨a, γ⟩ ∈ Σi^C | p + q | p; q | p*
ωACREGi ::= r; s^ω | R1 + R2 (r, s ∈ ACREGi, Ri ∈ ωACREGi)
ωACREG ::= R1 ∥ R2 ∥ … ∥ Rn (Ri ∈ ωACREGi) | X1 + X2 (Xi ∈ ωACREG)

The semantics of these expressions is defined in the obvious way, and again we can show a Kleene theorem: these expressions exactly characterize the class of ω-regular languages accepted by ωAC-systems.

Example: We conclude the paper with an example of a simple two-process mutual exclusion problem that can be captured naturally by the ACA's. For simplicity, we abstract away the internal computational states and assume that the processors are always requesting for, or executing in, the critical section. We model this as follows. Each of the processes can either be in state w (waiting to
³ Here again the main results follow from two propositions that are analogous to Propositions 3.2 and 3.3. We give their proof sketches in Appendix 2.
enter the critical section) or in state c (having entered the critical section). In order to gain access to the critical section from the wait state, the processes do a joint action c. The commitment alphabet is ⟨{p1, np1}, {p2, np2}⟩, where pi (i = 1, 2) denotes that process i is permitted, and npi that it is not permitted, to enter the critical section. The design of process 1 is then as follows: when process 1 is in state w1, it stays in the same state if it is not permitted entry to the critical section. When it is permitted entry, assuming that process 2 is not permitted entry, it can go to the state c1, denoting access to the critical section. Process 2 is designed in a symmetric way. Figure 2 shows the two processes and also the product exhibiting the global behaviour. It is clear that at no point can both processes be in the critical section, thus satisfying the safety requirement. A Muller acceptance table {{(w1, w2), (c1, w2), (w1, c2)}} ensures the liveness of both processes.

Acknowledgment: We thank the anonymous referees for their valuable suggestions.

[Figure 2 here. Fig. 2. ACA for two-processor mutual exclusion: process M1 (states W1, C1, with transition labels (c, np1, -) and (c, p1, np2)), process M2 (states W2, C2, with labels (c, -, np2) and (c, np1, p2)), and the product over the global states (W1, W2), (C1, W2), (W1, C2).]
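The safety claim for this example can be checked by brute force over the product's global state space. The sketch below is our reading of Figure 2, not the paper's construction: the action names (`c_np`, `c_p1`, `c_p2`, `c`) and the return edges to (W1, W2) are our assumptions about how the commitments resolve.

```python
# Global transitions of the mutual-exclusion product (our encoding of Fig. 2).
product = {
    ("W1", "W2"): {"c_np": ("W1", "W2"),   # neither process permitted
                   "c_p1": ("C1", "W2"),   # 1 permitted, 2 not permitted
                   "c_p2": ("W1", "C2")},  # 2 permitted, 1 not permitted
    ("C1", "W2"): {"c": ("W1", "W2")},     # 1 leaves the critical section
    ("W1", "C2"): {"c": ("W1", "W2")},     # 2 leaves the critical section
}

def reachable(start, trans):
    # depth-first search over the global state graph
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for t in trans.get(s, {}).values():
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen
```

Exploring from (W1, W2) visits exactly the three global states of the figure, and in particular never a state with both processes critical, which is the safety requirement; liveness is the separate Muller condition on the acceptance table.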
References

[AH95] Alur, R., and Henzinger, T., "Local liveness for compositional modelling of fair reactive systems", LNCS 939, 1995, 166-179.
[CMZ93] Cori, R., Metivier, Y., and Zielonka, W., "Asynchronous mappings and asynchronous cellular automata", Information and Computation, vol 106, 1993, 159-202.
[DM93] Diekert, V., and Muscholl, A., "Deterministic asynchronous automata for infinite traces", LNCS 665, 1993, 617-628.
[DR95] Diekert, V., and Rozenberg, G., The Book of Traces, World Scientific Press, 1995.
[FHMV95] Fagin, R., Halpern, J., Moses, Y., and Vardi, M., Reasoning about Knowledge, M.I.T. Press, 1995.
[FP78] Francez, N., and Pnueli, A., "A proof method for cyclic programs", Acta Informatica, vol 9, 1978, 138-158.
[GP92] Gastin, P., and Petit, A., "Asynchronous cellular automata for infinite traces", LNCS 627, 1992, 583-594.
[H78] Hoare, C.A.R., "Communicating sequential processes", Comm. ACM, vol 21, 1978, 666-677.
[H85] Hoare, C.A.R., Communicating Sequential Processes, Prentice Hall, 1985.
[J83] Jones, C.B., "Specification and design of (parallel) programs", Proc. IFIP 83, 1983, 321-331.
[MC81] Misra, J., and Chandy, M., "Proofs of networks of processes", IEEE Trans. on Software Engineering, vol 7, 1981, 417-426.
[O85] Ochmanski, E., "Regular behaviour of concurrent systems", Bulletin of the EATCS, vol 27, 1985, 56-67.
[PJ91] Pandya, P.K., and Joseph, M., "P-A logic: a compositional proof system for distributed programs", Distributed Computing, vol 5, 1991, 37-54.
[R96] Ramanujam, R., "Locally linear time temporal logic", Proc. IEEE LICS, New Jersey, 1996, 118-127.
[Z87] Zielonka, W., "Notes on finite asynchronous automata", RAIRO Inf. Theor. et Appl., vol 21, 1987, 99-135.
[Z89] Zwiers, J., Compositionality, Concurrency and Partial Correctness, Springer LNCS 321, 1989.
Appendix 1
Here we prove Proposition 3.2; Proposition 3.3 is proved similarly. Fix a system of AC-automata M̃ = (M1, M2, …, Mn, F) over Σ̃^C, and its associated product M̂ = (Q̂, ⇒, ⟨s01, s02, …, s0n⟩, F).

Proposition 3.2 Let ⟨r1, r2, …, rn⟩ ⇒x ⟨q1, q2, …, qn⟩ in M̂. Then for all i ∈ Loc, there exist xi ∈ (Σi^C)* such that ri ⇒xi qi, the tuple (x1, x2, …, xn) is compatible, and x is generated by it.

Proof. The proof is by induction on the length of x. The base case, x = ε, is trivial, as ε is generated by (ε, ε, …, ε). For the induction step, let x = ya. Let ⟨r1, r2, …, rn⟩ ⇒y ⟨p1, p2, …, pn⟩ ⇒a ⟨q1, q2, …, qn⟩ be a path in M̂. By the induction hypothesis (IH), there exist yi ∈ (Σi^C)* for all i such that ri ⇒yi pi and y is generated by the compatible tuple (y1, y2, …, yn). By the definition of M̂, there exist transitions pi –⟨a, γi⟩→i qi for all i ∈ loc(a) such that for all j, k ∈ loc(a), γj(k) ⊑k γk(k). We need to define the compatible tuple (x1, x2, …, xn) and show that x is generated by it. If i ∉ loc(a), define xi to be yi, and if i ∈ loc(a), define xi = yi⟨a, γi⟩. By IH and the observation on the i-transitions above, ri ⇒xi qi for all i ∈ Loc.

We now show that x↾i = π(xi) for all i ∈ Loc. Suppose i ∉ loc(a). Then x↾i = ya↾i = y↾i. Using IH, we get x↾i = π(yi) = π(xi). On the other hand, if i ∈ loc(a), x↾i = (y↾i)a. Using IH, we get x↾i = π(yi)a = π(yi⟨a, γi⟩) = π(xi).

Thus, we only need to show that the tuple (x1, x2, …, xn) is compatible, for which we have already proved one condition. For i, j ∈ Loc, π(xi↾j) = π(xi)↾j = (x↾i)↾j = (x↾j)↾i = π(xj)↾i = π(xj↾i).

We now show the other condition for compatibility. Suppose i ∉ loc(a), and j ∈ Loc.
1. (xi↾j)↓j = (yi↾j)↓j, by construction of xi.
2. (yi↾j)↓j ⊑j (yj↾i)↓j, by IH.
3. (yj↾i) = (xj↾i), by construction of xj (since i ∉ loc(a)).
4. (xi↾j)↓j ⊑j (xj↾i)↓j, from 1, 2 and 3.

Now consider the case when i ∈ loc(a), and j ∈ Loc. If j ∉ loc(a), the previous case gives the result, so further assume that j ∈ loc(a).
1. (xi↾j)↓j = (yi⟨a, γi⟩↾j)↓j, by construction of xi.
2. (yi⟨a, γi⟩↾j)↓j = ((yi↾j)↓j) γi(j), from the definitions of ↾ and ↓.
3. (yi↾j)↓j ⊑j (yj↾i)↓j, by IH.
4. ((yi↾j)↓j) γi(j) ⊑j ((yj↾i)↓j) γj(j), from 3 and γi(j) ⊑j γj(j).
5. (xi↾j)↓j ⊑j (xj↾i)↓j, from 1, 2, 3, 4 and the construction of xj.

Thus we have (xi↾j)↓j ⊑j (xj↾i)↓j for all i, j ∈ Loc, as required. ∎
Appendix 2
The two most crucial propositions in the infinite case are analogous to Propositions 3.2 and 3.3. Here we give rough sketches of their proofs. Let ρ be a finite run of any machine M. Then define last(ρ, M) as the last state of ρ, and first(ρ, M) as the first state of ρ. Let ρ be a finite (or infinite) path in M̂. Define statesi(ρ) = {q | s[i] = q for some state s visited in ρ}.

Proposition 5.1 Let ρ be an infinite run on some x ∈ Σ^ω in M̂ such that Infi(ρ) = Ui. Then for all i ∈ Loc, there exist runs ρi in Mi on xi ∈ (Σi^C)^ω such that Inf(ρi) = Ui and x is generated by the compatible tuple (x1, x2, …, xn).

Proof. After some finite point, say N, only the states from the Ui's occur in ρ. So ρ can be written as ρ = ρ0 ρ1 ρ2 …, where |ρ0| = N and the ρk, k ≥ 1, are finite paths on some x^k ∈ Σ* with statesi(ρk) = Ui for all i. By Proposition 3.2, let x^k be generated by a compatible tuple (x^k_1, x^k_2, …, x^k_n). Let x_j = x^1_j x^2_j x^3_j … for j ∈ Loc. Each x^k_j has a finite path ρ^k_j in Mj. Also, last(ρ^k_j, Mj) = last(ρk, M̂)[j] = first(ρk+1, M̂)[j] = first(ρ^{k+1}_j, Mj) for all k ≥ 0. Hence each x_j has an infinite path ρj in Mj, and Inf(ρj) = Uj. We verify the following as well. Let i, j ∈ Loc.
1. π(x_i↾j) = π(x^1_i↾j) π(x^2_i↾j) … = π(x^1_j↾i) π(x^2_j↾i) … = π(x_j↾i).
2. (x_i↾j)↓j = (x^1_i↾j)↓j (x^2_i↾j)↓j … ⊑j (x^1_j↾i)↓j (x^2_j↾i)↓j … = (x_j↾i)↓j.
3. x↾i = x^1↾i x^2↾i … = π(x^1_i) π(x^2_i) … = π(x_i).
Hence (x1, x2, …, xn) is compatible and generates x. ∎
Proposition 5.2 Suppose x ∈ Σ^ω, and for all i ∈ Loc there exists a run ρi on some xi ∈ (Σi^C)^ω in Mi such that Inf(ρi) = Ui and x is generated by a compatible tuple (x1, x2, …, xn). Then there is an infinite path ρ in M̂ for x such that Infi(ρ) = Ui for all i ∈ Loc.

Proof. The following claim is easy to verify.

Claim 1 Let x be generated by (x1, x2, …, xn). Let x′ be a finite prefix of x, and let mi = |x′↾i|. Then x′ is generated by (x1[m1], x2[m2], …, xn[mn]), where xi[mi] denotes the prefix of xi of length mi.

Using the claim, write x = x^1 x^2 … such that each x^j is generated by (x^j_1, x^j_2, …, x^j_n) and, for all i, states(ρ^j_i) = Ui, where ρ^j_i is the finite path in Mi for the finite substring x^j_i. By Proposition 3.3, for every x^j there is a path ρ^j in M̂, and last(ρ^j, M̂) = first(ρ^{j+1}, M̂). Hence the infinite path ρ = ρ^1 ρ^2 … for x = x^1 x^2 … is in M̂. By the construction of the ρ^j's, it follows that for all i, Infi(ρ) = Ui. ∎
Compositional Design of Multitolerant Repetitive Byzantine Agreement¹
Sandeep S. Kulkarni
Anish Arora
Department of Computer and Information Science The Ohio State University Columbus, OH 43210 USA
Abstract
We illustrate in this paper a compositional and stepwise method for designing programs that offer a potentially unique tolerance to each of their fault-classes. More specifically, our illustration is a design of a repetitive agreement program that offers two tolerances: (a) it masks the effects of Byzantine failures, and (b) it is stabilizing in the presence of transient and Byzantine failures.
1 Introduction
Dependable systems are required to acceptably satisfy their specification in the presence of faults, security intrusions, safety hazards, configuration changes, load variations, etc. Designing dependable systems is difficult, essentially because some desired dependability property, say, availability in the presence of faults, may interfere with some other desired dependability property, say, security in the presence of intrusions. As an example, in electronic commerce systems, design by replication facilitates availability but complicates security: replicas can be used to deal with faults which lose electronic currency, but they can also make it easier for intruders to double spend the money. To effectively formulate multiple dependability properties, we have proposed the concept of multitolerance [1]: Each source of undependability is formulated as a "fault-class" and each dependability property in the presence of that fault-class is formulated as a type of "tolerance". Thus, multitolerance refers to the ability of a system to tolerate multiple classes of faults, each in a possibly different way. To design multitolerant systems, we recommend a component-based method [1]: Our method, explained briefly, starts with an intolerant system and adds a set of components, one for each desired type of tolerance. Thus, the complexity of multitolerant system design is reduced to that of designing the components and of correctly adding them to the intolerant system. And the complexity of reasoning about the interference between different tolerance properties is often reduced to considering only the relevant components, as opposed to involving the whole system.

¹ Research supported in part by NSF Grant CCR-93-08640, NSA Grant MDA904-96-11011 and OSU Grant 221506. Email: {kulkarni,anish}@cis.ohio-state.edu; Web: http://www.cis.ohio-state.edu/{~kulkarni,~anish}.
To further simplify the complexity of adding multiple components to the system, the method observes the principle of stepwise refinement: in the first step an intolerant program is designed; in each successive step, a component is added to the system resulting from the previous step, to offer a desired tolerance to a previously unconsidered fault-class, while preserving the tolerances to the previously considered fault-classes. The basic idea of transforming an intolerant program into one that possesses the required tolerance properties is indeed well understood and practiced; see for example [2, 3]. In designing multitolerant programs, however, we have to deal with the additional complexity of multiple fault-classes: with respect to each fault-class, each component needs to behave in some desirable manner. To handle this complication in our compositional and stepwise method, while designing and adding a component in each step, we focus attention on the following issues. (1) How to ensure that the added component will offer a desired tolerance to the system in the presence of faults in the fault-class being considered? (2) How to ensure that execution of the component will not interfere with the correct execution of the system in the absence of faults in all fault-classes being considered? (3) How to ensure that execution of the component will not interfere with the tolerance of the system corresponding to a previously considered fault-class in the presence of faults in that previously considered fault-class?
In this paper, we present a case study illustrating these three issues in the context of a compositional and stepwise design of a repetitive agreement program that offers two tolerances: (a) it masks the effects of Byzantine failures, by which we mean that it continues to satisfy the specification of repetitive agreement in the presence of Byzantine faults, and (b) it is stabilizing in the presence of transient and Byzantine faults, by which we mean that upon starting from an arbitrary state (which may result from the occurrence of these faults) it eventually reaches a state from where it satisfies the specification of repetitive agreement. The resulting multitolerant program is, to the best of our knowledge, the first program for repetitive agreement that is both masking tolerant and stabilizing tolerant. (Previous designs are only masking tolerant, e.g., [4], or only nonmasking tolerant, e.g., [5, 6], but none is multitolerant. In fact, we are not aware of any previous design that is stabilizing tolerant.) The rest of the paper is organized as follows. In Section 2, we specify the problem of repetitive agreement. In Section 3, we identify a simple fault-intolerant program for solving the problem. In Section 4, we add a component to the program for masking the effects of Byzantine failure. In Section 5, we add another component for stabilizing from transient and Byzantine failures, while preserving the masking tolerance to Byzantine failures alone. We present an extension of our program in Section 6 and discuss its multitolerance-preserving refinement in Section 7. We comment on the general aspects of our method in Section 8, and make concluding remarks in Section 9.
2 Problem Statement: Repetitive Agreement
A system consists of a set of processes, including a "general" process, g. Each computation of the system consists of an infinite sequence of rounds; in each round, the general chooses a binary decision value d.g and, depending upon this value, all other processes output a binary decision value of their own. The system is subject to two fault-classes: The first one permanently and undetectably corrupts some processes to be Byzantine, in the following sense: each Byzantine process follows the program skeleton of its non-Byzantine version, i.e., it sends messages and performs output of the appropriate type whenever required by its non-Byzantine version, but the data sent in the messages and the output may be arbitrary. The second one transiently and undetectably corrupts the state of the processes in an arbitrary manner and possibly also permanently corrupts some processes to be Byzantine. (Note that, if need be, the model of a Byzantine process can be readily weakened to handle the case when the Byzantine process does not send its messages or perform its output, by detecting their absence and generating arbitrary messages or output in response.) The problem. In the absence of faults, repetitive agreement requires that each round in the system computation satisfies Validity and Agreement, defined below. Validity: If g is non-Byzantine, the decision value output by every non-Byzantine process is identical to d.g. Agreement: Even if g is Byzantine, the decision values output by all non-Byzantine processes are identical. Masking tolerance. In the presence of the faults in the first fault-class, i.e., Byzantine faults, repetitive agreement requires that each round in the system computation satisfies Validity and Agreement. Stabilizing tolerance. In the presence of the faults in the second fault-class, i.e., transient and Byzantine faults, repetitive agreement requires that eventually each round in the system computation satisfies Validity and Agreement.
In other words, upon starting from an arbitrary state (which may be reached if transient and Byzantine failures occur), eventually a state must be reached in the system computation from where every future round satisfies Validity and Agreement. Before proceeding to compositionally design a masking as well as stabilizing tolerant repetitive agreement program, let us recall the well-known fact that for repetitive agreement to be masking tolerant it is both necessary and sufficient for the system to have at least 3t+1 processes, where t is the total number of Byzantine processes [4]. Therefore, for ease of exposition, we will initially restrict our attention, in Sections 3-5, to the special case where the total number of processes in the system (including g) is 4 and, hence, t is 1. In other words, the Byzantine failure fault-class may corrupt at most one of the four processes. Later, in Section 6, we will extend our multitolerant program for the case where t may exceed 1.
Programming notation. Each system process will be represented by a set of "variables" and a finite set of "actions". Each variable ranges over a predefined nonempty domain. Each action has a unique name and is of the form:

⟨name⟩ :: ⟨guard⟩ −→ ⟨statement⟩

The guard of each action is a boolean expression over the variables of that process and possibly other processes. The execution of the statement of each action atomically and instantaneously updates the value of zero or more variables of that process, possibly based on the values of the variables of that and other processes. For convenience in specifying an action as a restriction of another action, we will use the notation

⟨name′⟩ :: ⟨guard′⟩ ∧ ⟨name⟩

to define an action ⟨name′⟩ whose guard is obtained by restricting the guard of action ⟨name⟩ with ⟨guard′⟩, and whose statement is identical to the statement of action ⟨name⟩. Operationally speaking, ⟨name′⟩ is executed only if the guard of ⟨name⟩ and the guard ⟨guard′⟩ are both true. Let S be a system. A "state" of S is defined by a value for each variable in the processes of S, chosen from the domain of the variable. A state predicate of S is a boolean expression over the variables in the processes of S. An action of S is "enabled" in a state iff its guard (a state predicate) evaluates to true in that state. Each computation of S is assumed to be a fair sequence of steps: In every step, an action in a process of S that is enabled in the current state is chosen and the statement of the action is executed atomically. Fairness of the sequence means that each action in a process of S that is continuously enabled along the states in the sequence is eventually chosen for execution.
3 Designing an Intolerant Program
The following simple program suffices in the absence of faults: In each round, the general sends its new d.g value to all other processes. When a process receives this d.g value, it outputs that value and sends an acknowledgment to the general. After the general receives acknowledgments from all the other processes, it starts the next round, which repeats the same procedure. We let each process j maintain a variable d.j, denoting the decision of j, that is set to ⊥ when j has not yet copied the decision of the general. Also, we let j maintain a sequence number sn.j, sn.j ∈ {0..1}, to distinguish between successive rounds. (In Section 7, we consider the case where the sequence numbers are from the set {0..K−1} where K ≥ 2.) The general process. The general executes only one action, RG1: when the sequence numbers of all processes become identical, the general starts a new round by choosing a new value for d.g and incrementing its sequence number, sn.g. Thus, letting ⊕ denote addition modulo 2, the action of the general is:

RG1 :: (∀k :: sn.k = sn.g) −→ d.g, sn.g := new_decision(), sn.g ⊕ 1
The non-general processes.
Each other process j executes two actions: The first action, RO1, is executed after the general has started a new round, in which case j copies the decision of the general. It then executes its second action, RO2, which outputs its decision, increments its sequence number to denote that it is ready to participate in the next round, and resets its decision to ⊥ to denote that it has not yet copied the decision of the general in that round. Thus, the two actions of j are:

RO1 :: d.j = ⊥ ∧ (sn.j ⊕ 1 = sn.g) −→ d.j := d.g
RO2 :: d.j ≠ ⊥ −→ {output d.j}; d.j, sn.j := ⊥, sn.j ⊕ 1
The correctness proof of R is straightforward. (The interested reader will find the proof in [7].)
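As an illustration, the round structure of R can be simulated directly. The following Python sketch (the class and function names are ours, not part of the formal program) executes the enabled actions under a simple schedule and logs each output; in every completed round all processes output the general's decision, so Validity and Agreement hold trivially in the absence of faults.

```python
import random

BOT = None  # models the undefined decision, written ⊥ in the text

class ProgramR:
    """Fault-free repetitive agreement: a general g plus n other processes."""
    def __init__(self, n=3):
        self.n = n
        self.d_g, self.sn_g = BOT, 0      # d.g and sn.g
        self.d = [BOT] * n                # d.j for each non-general process j
        self.sn = [0] * n                 # sn.j for each non-general process j
        self.outputs = []                 # log of (process, value) pairs

    def step(self):
        # RG1: all sequence numbers equal -> the general starts a new round
        if all(s == self.sn_g for s in self.sn):
            self.d_g = random.randint(0, 1)          # new_decision()
            self.sn_g = (self.sn_g + 1) % 2          # sn.g := sn.g ⊕ 1
            return
        for j in range(self.n):
            if self.d[j] is BOT and (self.sn[j] + 1) % 2 == self.sn_g:
                self.d[j] = self.d_g                 # RO1: copy d.g
            elif self.d[j] is not BOT:
                self.outputs.append((j, self.d[j]))  # RO2: output d.j,
                self.d[j], self.sn[j] = BOT, (self.sn[j] + 1) % 2

def run(rounds=4):
    r = ProgramR()
    while len(r.outputs) < rounds * r.n:
        r.step()
    return r.outputs
```

Grouping the output log into rounds of n entries each, all values within a round are identical and equal to the d.g chosen for that round.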
4 Adding Masking Tolerance to Byzantine Faults
Program R is neither masking tolerant nor stabilizing tolerant to Byzantine failure. In particular, R may violate Agreement if the general becomes Byzantine and sends different values to the other processes. Note, however, that since these values are binary, at least two of them are identical. Therefore, for R to mask the Byzantine failure of any one process, it suffices to add a "masking" component to R that restricts action RO2 in such a way that each non-general process only outputs a decision that is the majority of the values received by the non-general processes. For the masking component to compute the majority, it suffices that each non-general process obtain the values received by the other non-general processes. Based on these values, each process can correct its decision value to that of the majority. We associate with each process j an auxiliary boolean variable b.j that is true iff j is Byzantine. For each process k (including j itself), we let j maintain a local copy of d.k in D.j.k. Hence, the decision value of the majority can be computed over the set of D.j.k values for all k. To determine whether a value D.j.k is from the current round or from the previous round, j also maintains a local copy of the sequence number of k in SN.j.k, which is updated whenever D.j.k is. The general process. To capture the effect of Byzantine failure, one action, MRG2, is added to the original action RG1 (which we rename as MRG1): MRG2 lets g change its decision value arbitrarily and is executed only if g is Byzantine. Thus, the actions for g are:

MRG1 :: RG1
MRG2 :: b.g −→ d.g := 0|1
The non-general processes.
We add the masking component "between" the actions RO1 and RO2 at j to get the five actions MRO1–5: MRO1 is identical to RO1. MRO2 is executed after j receives a decision value from g, to set D.j.j to d.j, provided that all other processes had obtained a copy of D.j.j in the previous round. MRO3 is executed after another process k has obtained a decision value for the new round, to set D.j.k to d.k. MRO4 is executed if j needs to correct its decision value to the majority of the decision values of its neighbors in the current round. MRO5 is a restricted version of action RO2 that allows j to perform its output iff its decision value is that of the majority. Together, the actions MRO2–4 and the restriction to action RO2 in MRO5 define the masking component (cf. the dashed box below). To model Byzantine execution of j, we introduce action MRO6 that is executed only if b.j is true: MRO6 lets j change D.j.j and, thereby, affect the value read by process k when k executes MRO3. MRO6 also lets j obtain arbitrary values for D.j.k and, thereby, affect the value of d.j when j executes MRO4. Thus, the six actions of MRO are as follows:

MRO1 :: RO1
MRO2 :: d.j ≠ ⊥ ∧ SN.j.j = sn.j ∧ compl.j −→ D.j.j, SN.j.j := d.j, SN.j.j ⊕ 1
MRO3 :: SN.j.k ⊕ 1 = SN.k.k −→ D.j.k, SN.j.k := D.k.k, SN.k.k
MRO4 :: d.j ≠ ⊥ ∧ majdefined.j ∧ d.j ≠ maj.j −→ d.j := maj.j
MRO5 :: d.j ≠ ⊥ ∧ majdefined.j ∧ d.j = maj.j −→ output_decision(d.j); d.j, sn.j := ⊥, sn.j ⊕ 1
MRO6 :: b.j −→ D.j.j := 0|1; (|| k : SN.j.k ⊕ 1 = SN.k.k : D.j.k, SN.j.k := 0|1, SN.k.k)

where,
compl.j ≡ (∀k :: SN.j.j = SN.k.j)
majdefined.j ≡ compl.j ∧ (∀k :: SN.j.j = SN.j.k) ∧ (sn.j ≠ SN.j.j)
maj.j = (majority k :: D.j.k)
Fault Actions.
If the number of Byzantine processes is less than 1, the fault actions make some process Byzantine. Thus, letting l and m range over all processes, the fault actions are:

|{l : b.l}| < 1 −→ b.m := true
Proof of correctness. In accordance with the design issues discussed in the
introduction, this proof consists of two parts: (1) The masking component offers masking tolerance to R in the presence of Byzantine faults, and (2) the masking component does not interfere with R in the absence of faults. (1) For each round of the system computation, let v.j denote the value obtained by j in that round when it executes RO1, and let cordec be defined as follows:

cordec = d.g, if ¬b.g
       = (majority j :: v.j), otherwise

Observe that in the start state of the round (where the sequence numbers of all processes are identical, i.e., (∀j,k :: sn.j = SN.j.k = sn.g), and no non-Byzantine process has read the decision of g, i.e., (∀j : ¬b.j : d.j = ⊥)) only action RG1 in g can be executed. Thereafter, the only action enabled at each non-Byzantine process j is RO1. After j executes RO1, j can only execute its masking component. Moreover, j cannot execute RO2 until the masking component in j terminates in that round.
Thus, the masking component in j executes between the actions RO1 and RO2 in j. The masking component in j first executes action MRO2 to increment SN.j.j. By the same token, the masking component in k increments SN.k.k. Subsequently, the masking component in j can execute MRO3, to update SN.j.k and D.j.k. Likewise, each k can execute MRO3 to update SN.k.j and D.k.j. Note that if k is non-Byzantine, D.j.k is the same as v.k, which in turn is equal to d.g if g is also non-Byzantine. It follows that eventually majdefined.j ∧ maj.j = cordec holds, and the masking component in j can subsequently ensure that d.j = maj.j before it terminates in that round. After the masking component in j terminates, j can only execute action RO2. It follows that, in the presence of a Byzantine fault, each round of the system computation satisfies Validity and Agreement. (2) Observe that, in the absence of a Byzantine fault, the masking component eventually satisfies majdefined.j ∧ d.j = maj.j in each round and then terminates. Therefore, the masking component does not interfere with R in the absence of a Byzantine fault.
5 Adding Stabilizing Tolerance to Transient & Byzantine Failures
Despite the addition of the masking component to the program R, the resulting program MR is not yet stabilizing tolerant to transient and Byzantine failures. For example, MR deadlocks if its state is transiently corrupted into one where some non-general process j incorrectly believes that it has completed its last round, i.e., d.j = ⊥ ∧ SN.j.j ≠ sn.j. It therefore suffices to add a "stabilizing" component to MR that ensures stabilizing tolerance to transient and Byzantine failures while preserving the masking tolerance to Byzantine failure. Towards designing the stabilizing component, we observe that in the absence of transient faults the following state predicates are invariantly true of MR: (i) whenever d.j is set to ⊥, by executing action MRO5, j increments sn.j, thus satisfying SN.j.j = sn.j; and (ii) whenever j sets sn.j to be equal to sn.g, by executing action MRO5, d.j is the same as ⊥. In the presence of transient faults, however, these two state predicates may be violated. Therefore, to add stabilizing tolerance, we need to guarantee that these two state predicates are corrected. To this end, we add two corresponding correction actions, namely MRO7 and MRO8, to the non-general processes. Action MRO7 is executed when d.j is ⊥ and SN.j.j is different from sn.j, and it sets SN.j.j to be equal to sn.j. Action MRO8 is executed when sn.j is the same as sn.g but d.j is different from ⊥, and it sets d.j to be equal to ⊥. With the addition of this stabilizing component to MR, we get a multitolerant program SMR.

MRO7 :: d.j = ⊥ ∧ SN.j.j ≠ sn.j −→ SN.j.j := sn.j
MRO8 :: d.j ≠ ⊥ ∧ sn.j = sn.g −→ d.j := ⊥
Fault Actions. In addition to the Byzantine fault actions, we now consider the transient state corruption fault actions (let j and k range over non-general processes):

true −→ d.g, sn.g := 0|1, 0|1
true −→ d.j, sn.j := 0|1, 0|1
true −→ SN.j.k, D.j.k := 0|1, 0|1
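The correction performed by MRO7 and MRO8 can be checked exhaustively over the local state of a single process. The following Python sketch (the names S_holds and stabilize are ours) verifies that, executed in isolation, the two correction actions establish the state predicate S used in the proof of Section 5 from every transiently corrupted local state:

```python
import itertools

BOT = None  # models ⊥

def S_holds(d_j, sn_j, SN_jj, sn_g):
    # S = (d.j = ⊥ ⇒ SN.j.j = sn.j) ∧ (sn.j = sn.g ⇒ d.j = ⊥)
    return ((d_j is not BOT) or (SN_jj == sn_j)) and \
           ((sn_j != sn_g) or (d_j is BOT))

def stabilize(d_j, sn_j, SN_jj, sn_g):
    """Run MRO7/MRO8 until neither is enabled; terminates within two steps."""
    while True:
        if d_j is BOT and SN_jj != sn_j:         # MRO7
            SN_jj = sn_j
        elif d_j is not BOT and sn_j == sn_g:    # MRO8
            d_j = BOT
        else:
            return d_j, sn_j, SN_jj

# Exhaustive check over all corrupted local states (d.j ∈ {⊥, 0, 1}, bits 0/1):
for d_j in (BOT, 0, 1):
    for sn_j, SN_jj, sn_g in itertools.product((0, 1), repeat=3):
        assert S_holds(*stabilize(d_j, sn_j, SN_jj, sn_g), sn_g)
```

This only demonstrates the convergence of the correction actions in isolation; closure of S under the remaining actions of MR is argued separately in the proof below.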
Proof of Correctness. In accordance with the design issues discussed in the
introduction, this proof consists of three parts: (1) The stabilizing component offers stabilizing tolerance to MR in the presence of transient and Byzantine faults, (2) the stabilizing component does not interfere with the execution of MR in the absence of faults, and (3) the stabilizing component does not interfere with the masking tolerance of MR in the presence of Byzantine faults only. (1) Observe that execution of the component in isolation ensures that eventually the program reaches a state where the state predicate S holds, where

S = (d.j = ⊥ ⇒ SN.j.j = sn.j) ∧ (sn.j = sn.g ⇒ d.j = ⊥).

Since both conjuncts in S are preserved by the execution of all actions in MR, program MR does not interfere with the correction of S by the stabilizing component. Starting from a state satisfying S, at most one round is executed incorrectly. For reasons of space, we omit the proof here and refer the interested reader to [7]. (2) Observe that, in the absence of faults, S continues to be preserved, and hence the stabilizing component is never executed. Therefore, the stabilizing component does not interfere with MR in the absence of faults. (3) As in part (2), observe that, in the presence of Byzantine faults only, S continues to be preserved and, hence, the stabilizing component is never executed. Therefore, the stabilizing component does not interfere with MR in the presence of Byzantine faults.
6 Extension to Tolerate Multiple Byzantine Faults

To motivate the generalization of SMR to handle t Byzantine failures given n non-general processes, where n ≥ 3t, let us take a closer look at how program SMR is derived from R. To design SMR, we added to each process j a set of components C(j) (see Figure 1).
Figure 1: Structure of R and SMR. (a) Program R: the general g and the non-general processes 1..n. (b) Program SMR: g comprises actions MRG1–2; each non-general process j comprises actions RO1–2 and MRO6; and each added component C(j) comprises actions MRO2–4, MRO7–8, and the restriction of MRO5.
Note that action MRO2 is of the form of RG1 and that action MRO3 is of the form of RO1 followed by RO2. (D.j.j and SN.j.j play the role of d.g and sn.g, and D.j.k and SN.j.k play the role of the d values at the non-general processes.) In other words, C(j) itself contains a repetitive agreement program! With this insight, we are now ready to generalize program SMR to handle multiple Byzantine faults, based on an idea that is essentially due to Lamport, Shostak, and Pease [4]. (Our generalization, of course, is distinguished by being multitolerant.) Let g denote the general process, X denote the set of non-general processes, and t denote the maximum number of Byzantine processes. We define SMR(g, X, t) = BYZ(g, X, t, ⟨⟩), where

BYZ(g, X, t, s) =
    inp(g, X, t, s) ∧ MRG1(g, X, t, s)
    [] MRG2(g, X, t, s)
    [] ([] j : j ∈ X : RO1(j, X, t, s) [] w(j, X, t, s) ∧ RO2(j, X, t, s) [] MRO6(j, X, t, s))
    [] ([] j : j ∈ X : C(j, X, t, s·g))
and

inp(g, X, t, s) = d.(last(s), X ∪ {g}, t+1, trlast(s)) ≠ ⊥ ∧
                  sn.(g, X, t, s) = sn.(last(s), X ∪ {g}, t+1, trlast(s))        if s ≠ ⟨⟩
                = new_decision()                                                  otherwise
w(j, X, t, s)   = majdefined.(j, X, t, s) ∧ d.(j, X, t, s) = maj.(j, X, t, s)     if t > 0
                = true                                                            otherwise
C(j, X, t, s)   = MRO4(j, X, t, trlast(s)) [] MRO7(j, X, t, trlast(s)) []
                  MRO8(j, X, t, trlast(s)) [] BYZ(j, X − {j}, t − 1, s)           if t > 0
                = the empty program                                               otherwise

Here s is a sequence; last(s) denotes the last element of s; trlast(s) denotes the sequence obtained by omitting the last element of s; s·j denotes the sequence obtained by appending j to s; and action ac in program j is modified as follows: j is replaced with the quadruple (j, X, t, s). The quantification over k in compl is over the set {(k, X − {j}, t − 1, s·j) : k ∈ (X − {j})} ∪ {(j, X, t, s)}. The quantification over k in majdefined and maj is over the set {(j, X − {k}, t − 1, s·k) : k ∈ (X − {j})} ∪ {(j, X, t, s)}. If s is nonempty, the output decision is assigned to the variable D.(j, X, t, s).(j, X ∪ {last(s)}, t+1, trlast(s)). Observe that if the definition of SMR(g, X, t) is instantiated with t = 0, the resulting program is R. And, if the definition is instantiated with t = 1, the resulting program is SMR (with the previously noted exception that action MRO3 in j of SMR is implemented by RO1 and RO2 in the bottommost instantiation of BYZ, namely BYZ(j, X − {j}, 0, ⟨g⟩)). Program SMR(g, X, t) is multitolerant, i.e., it is masking tolerant to Byzantine faults and stabilizing tolerant to transient and Byzantine faults. We note that the structure of the proof of stabilization is the same as the proof for SMR: upon
starting from any state, the program reaches a state where S holds; subsequently, g is guaranteed to start a new round infinitely often; and when g starts the (t+1)-th round, the resulting computation satisfies Validity and Agreement. The proof of masking tolerance is similar to the one in [4].
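The recursive structure above mirrors the classic oral-messages algorithm of [4]. The following Python sketch of that recursion (the function om and its model of Byzantine behavior are our simplification for illustration, not the multitolerant SMR(g, X, t) itself) shows the nested relaying and the majority vote at work for t = 1 and four processes:

```python
from collections import Counter

def om(m, commander, value, lieutenants, byz, flip=0):
    """Recursive agreement in the style of [4]: returns each lieutenant's decision.
    A Byzantine sender (member of byz) sends receiver-dependent garbage."""
    def send(src, v, dst):
        return (dst + flip) % 2 if src in byz else v
    received = {j: send(commander, value, j) for j in lieutenants}
    if m == 0:
        return received
    # Each lieutenant k relays its received value, acting as commander in OM(m-1).
    sub = {k: om(m - 1, k, received[k],
                 [x for x in lieutenants if x != k], byz, flip)
           for k in lieutenants}
    # Each lieutenant j decides the majority of its own value and the relays.
    return {j: Counter([received[j]] +
                       [sub[k][j] for k in lieutenants if k != j]
                       ).most_common(1)[0][0]
            for j in lieutenants}

for flip in (0, 1):
    # Byzantine commander, honest lieutenants: Agreement holds.
    dec = om(1, 0, 1, [1, 2, 3], byz={0}, flip=flip)
    assert len(set(dec.values())) == 1
    # Honest commander sending 1, one Byzantine lieutenant: Validity holds.
    dec = om(1, 0, 1, [1, 2, 3], byz={3}, flip=flip)
    assert dec[1] == dec[2] == 1
```

In the generalized program, each C(j, X, t, s·g) plays the role of one of these nested relays, with MRO7–8 additionally restoring the state after transient corruption.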
7 Refining the Atomicity While Preserving Multitolerance

Our design thus far has assumed read-and-write atomicity, whereby each action of a process can atomically read the variables of the other processes and update the variables of that process. In this section, we show that our design can be refined into read-or-write atomicity, whereby each action of a process can either atomically read the variables of some other process or atomically write its own variables, but not both. We choose a standard refinement [8]: In each process j, a copy of every variable that j reads from another process is introduced. For each of these variables, an action is added to j that asynchronously reads that variable into the copy. Moreover, the actions of j are modified so that every occurrence of these variables is replaced by their corresponding copies. We perform this refinement successively in each step of our design. Thus, we refine R first, the masking component second, and the stabilizing component last. Below, we prove the properties of the program resulting from each refinement step, in terms of the issues (1)-(3) discussed in the introduction. Step 1: Correctness of the refined R. In the absence of faults, when g increments sn.g by executing action RG1, the only actions of program R that can be executed are RO1 and then RO2 at each non-general process. Until each non-general process j executes RO2, g cannot execute any further action. Thus, in the refined R, even if j first reads d.g and sn.g, then updates its local copies of d.g and sn.g, and later executes the refined action RO1, g cannot execute any other actions in the meanwhile. Hence, the computations of the refined R have the same effect as those of R. Step 2: Correctness of the refined masking component. (1) In the presence of Byzantine faults, the refined actions of R do not interfere with the refined masking component.
To see this, consider the refinement of action MRO2 of the masking component: To execute MRO2, j needs to read the variable SN.k.j of process k. The refinement introduces a copy of SN.k.j at j. For the refined action MRO2 to be enabled, j must first update these copies from the other processes. Also, if compl.j is true then it continues to be true unless j changes SN.j.j by executing MRO2. Hence, MRO2 can be correctly refined. Likewise, actions MRO4 and MRO5 can be correctly refined. Regarding action MRO3, recall from Section 6 that MRO3 is equivalent to the simultaneous execution of RO1 and RO2 and, hence, it too can be correctly refined. Hence, the masking component executes only between the executions of RO1 and RO2, and thus the refined actions of R do not interfere with the refined masking component. (2) In the absence of Byzantine faults, just as in Section 4, the refined masking component eventually satisfies majdefined.j ∧ d.j = maj.j in each round and then terminates. Therefore, the refined masking component does not interfere with the refined R in the absence of Byzantine faults.
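The copy-variable refinement used in Steps 1 and 2 can be sketched concretely for action RO1 (a minimal Python illustration; the class and field names are ours): the read of g's variables and the write of j's variables become two separate atomic actions, with the guard and statement of RO1 rewritten over j's local copies.

```python
BOT = None  # models ⊥

class RefinedJ:
    """Non-general process j under read-or-write atomicity."""
    def __init__(self):
        self.d, self.sn = BOT, 0        # d.j and sn.j
        self.cd_g, self.csn_g = BOT, 0  # j's local copies of d.g and sn.g

    def read_g(self, d_g, sn_g):
        # Read action: atomically read g's variables into the copies only.
        self.cd_g, self.csn_g = d_g, sn_g

    def ro1(self):
        # Refined RO1: the original guard and statement, over the copies.
        if self.d is BOT and (self.sn + 1) % 2 == self.csn_g:
            self.d = self.cd_g

j = RefinedJ()
j.read_g(1, 1)   # g has started a round with d.g = 1, sn.g = 1
j.ro1()
assert j.d == 1  # j adopted the general's decision via its copies
```

The argument in Step 1 is precisely that interposing read_g between g's update and j's ro1 cannot change the outcome, since g is blocked until every j completes its round.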
Step 3: Correctness of the refined stabilizing component. (1) Towards preserving stabilizing tolerance in the presence of transient and Byzantine faults while refining the stabilizing component, we claim that the set of possible sequence numbers has to be increased to {0..K−1} where K ≥ 4. (This claim follows from the fact that between g and j there are four sequence numbers, namely, sn.g, sn.j, and the copies of sn.g and sn.j at j and g respectively; for details, see [8].) Moreover, a deadlock has to be avoided in the states of the refined version of R where sn.j ≠ sn.g and sn.j ≠ sn.g ⊕ 1. (These states do not exist in SMR since its sequence numbers are either 0 or 1.) Therefore, to preserve stabilization, we need to add actions to the stabilizing component that establish sn.j ∈ {sn.g, sn.g ⊕ 1}. If these actions set sn.j to sn.g, d.j must also be set to ⊥; otherwise, action MRO2 may interfere with this component by incrementing sn.j. Alternatively, these actions may set sn.j to sn.g ⊕ 1. Either alternative is acceptable. Since the refined R and the refined masking component preserve each constraint satisfied by the refined stabilizing component, the former does not interfere with the latter in the presence of transient and Byzantine faults. (2) In the absence of faults, just as in Section 5, the stabilizing component never executes. Therefore, the refined stabilizing component does not interfere with the other refined components in the absence of faults. (3) In the presence of Byzantine faults only, again the stabilizing component never executes. Therefore, the refined stabilizing component does not interfere with the other refined components in the presence of Byzantine faults. In summary, program SMR can be refined into read-or-write atomicity while preserving its multitolerance, by asynchronously updating copies of the variables of "neighboring" processes. Note that the copies can be implemented by channels of unit length between the neighboring processes that lose any existing message in them when they receive a new message.
It follows that the refined program is multitolerant for a message-passing network where each channel has unit capacity. Moreover, using standard transformations, one can further refine the program into a message-passing one with bounded-capacity channels.
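The unit-capacity, overwriting channel mentioned above can be sketched in a few lines of Python (the class name is ours): sending displaces any message already in the channel, so the channel's single slot behaves exactly like the asynchronously updated copy of a remote variable.

```python
class UnitChannel:
    """Unit-length channel that loses the existing message when a new one
    arrives. Its single slot plays the role of j's copy of a remote variable:
    the reader always sees the most recently sent value."""
    def __init__(self):
        self.slot = None  # empty channel

    def send(self, msg):
        self.slot = msg   # any message already in the channel is lost

    def receive(self):
        return self.slot  # the current value of the copy

ch = UnitChannel()
ch.send(0)
ch.send(1)                # overwrites the 0, as the refinement requires
assert ch.receive() == 1
```

Losing stale messages is harmless here precisely because the refined actions only ever need the latest value of the remote variable, not its history.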
8 Generalizing From Our Design
In this section, we discuss the general aspects of our method in the context of the design of SMR. We find that our stepwise method of adding a component to provide each desired tolerance property facilitated the solution of the problem at hand. It is worthwhile to point out that this method is general enough to design programs obtained from various existing fault-tolerance design methods such as replication, checkpointing and recovery, Schneider's state machine approach, exception handling, and Randell's recovery blocks. Types of tolerance components. The stabilizing component we added to MR, to ensure that the state predicate S holds, is an instance of a corrector. Corrector components suffice for the design of stabilizing tolerance and, more generally, of nonmasking tolerant programs. Well-known examples of correctors include reset procedures, rollback recovery, forward recovery, error correction codes, constraint (re)satisfaction, exception handlers, and alternate procedures
in recovery blocks. Large correctors can be designed in a stepwise and hierarchical fashion by parallel and/or sequential composition of small correctors. The masking component we added to R is itself composed of two sub-components, one a detector and the other a corrector. The detector consists of actions MRO2–3 and the restriction to RO2 in MRO5, while the corrector consists of actions MRO2–4. The task of the detector is to help preserve the safety properties (namely, Validity and Agreement) in the presence of Byzantine failure, by detecting the state predicate "the decision of j is that of the majority of the non-general processes", while the task of the corrector is to ensure that the same state predicate holds. Note that adding this detector but not the corresponding corrector would have yielded only fail-safe tolerance instead of masking tolerance. In other words, in the presence of Byzantine failure, Validity and Agreement would be satisfied if all processes output their decision, although some processes may never output their decision. More generally, detector components suffice for the design of fail-safe tolerance and, together, detector and corrector components suffice for the design of masking tolerance. Well-known examples of detectors include snapshot procedures, acceptance tests, error detection codes, consistency checkers, watchdog programs, snooper programs, and exception conditions. Analogous to the compositional design of large correctors, large detectors can be designed in a stepwise and hierarchical fashion, by parallel and/or sequential composition of small detectors. The interested reader is referred to a companion paper [1] for an in-depth study of detectors and correctors.

Figure 2: Components that suffice for the design of various tolerances. (An intolerant system plus detectors yields a fail-safe tolerant system; plus correctors, a stabilizing (nonmasking) tolerant system; plus both detectors and correctors, a masking tolerant system.)
Self-tolerances of components. Since a component that is added to tolerate a fault-class is itself subject to that fault-class, the question arises: what sort of tolerance should the component itself possess to that fault-class? We observe that in SMR, the masking component is itself masking tolerant to Byzantine faults and the stabilizing component is itself stabilizing tolerant to transient and Byzantine faults. In fact, in general, for adding stabilizing (nonmasking) tolerance, it suffices that the added component be stabilizing (nonmasking) tolerant. Likewise, for adding fail-safe tolerance, it suffices that the added component be fail-safe tolerant. And, for adding masking tolerance, it suffices that the added component be masking tolerant. In practice, the detectors and correctors often possess the desired tolerance trivially. But if they do not, one way to design components to be self-tolerant is by the analogous addition of more detector and corrector components to them. Alternative ways are exemplified by designs that yield self-checking, self-stabilizing, or inherently fault-tolerant programs.
Figure 3: Self-tolerances of components for various tolerances. (An intolerant system is made fail-safe tolerant by adding fail-safe components, masking tolerant by adding masking components, and stabilizing tolerant by adding stabilizing components.)

Stepwise design of tolerances. We observe that our decision to make the program masking tolerant first and stabilizing tolerant second is not crucial. The same program could also be obtained by adding components in the reverse order, to deal with stabilization first and masking second. In fact, in general, the same multitolerant program can be designed by adding the tolerance components in different orders. For the special case of adding both detector and corrector components for masking tolerance, the design may be simplified by using a stepwise approach [5]: For instance, we may first augment the program with detectors and then augment the resulting fail-safe tolerant program with correctors. Alternatively, we may first augment the program with correctors and then augment the resulting nonmasking tolerant program with detectors.
[Figure 4 here: two paths from an intolerant system to a masking tolerant system; adding detectors first yields a fail-safe tolerant system that correctors then make masking tolerant, while adding correctors first yields a stabilizing (nonmasking) tolerant system that detectors then make masking tolerant.]
Figure 4: Two approaches for stepwise design of masking tolerance

On a related note, we observe that adding detectors to a stabilizing program suffices to enhance the tolerance of the correctors in the program from stabilizing to masking. Likewise, adding correctors to a fail-safe program suffices to enhance the tolerance of the detectors in the program from fail-safe to masking.

Stepwise refinement of tolerances. Our example illustrates the general principle that a component-based multitolerant program can be refined in the same manner as it is designed: in the first step, the original program is refined and its correctness in the absence of faults is shown. Then, the refined version of the first component is added and the tolerance properties in the presence of the first fault-class are shown. And so on for each fault-class. Thus, in obtaining the refined program, the design/proof of the original program can be reused.

Alternative tolerances to Byzantine failures. In the presence of Byzantine failures alone, SMR satisfies the specification of repetitive agreement in each round. The reader may wonder whether this strong guarantee is true of every repetitive agreement program that tolerates Byzantine failures. The answer is negative: Zhao and Bastani [6] have presented a program that is nonmasking tolerant to Byzantine failures, i.e., one that could violate the specification in some finite number of rounds only. Moreover, we present here a program that is
stabilizing tolerant, but not masking tolerant, to Byzantine failures. This program is composed from SMR and a nonmasking tolerant program outlined below. Again, for simplicity, we consider the special case where there are four processes and at most one is Byzantine.

In our nonmasking tolerant program, each non-general process j chooses a "parent" of j that is initially g. In each round, j receives the decision value of its parent and outputs that value as the decision of j. In parallel, j obtains the decision value of g and forwards it to the other non-general processes. If the values that j receives from g and the other two processes are not all identical, j is allowed to change its parent, so that it will output a correct decision in the following rounds, as follows. Let j, k, and l be the three non-general processes. We consider two cases: (1) g is Byzantine and (2) g is non-Byzantine.

Case (1): If g sends the value B to l and a different value B′ to j and the remaining process k, then j and k will suspect that l or g is Byzantine, and l will know that g is Byzantine and that j and k are non-Byzantine. Without loss of generality, let the id of j be greater than that of k. We let both j and l change their parent to k (to avoid forming a cycle in the parent relation, k retains g as its parent). In all future rounds, j and l output the value received from k and, hence, the decision output by j, k, and l is identical.

Case (2): Since the values sent by both j and k are the same, both j and k are non-Byzantine. Again, assuming the id of k is greater than that of j, it is safe to let j change its parent to k. In all future rounds, the decision output by j and k is the same as that output by g.

It follows that the nonmasking tolerant program executes only a finite number of rounds incorrectly in the presence of at most one Byzantine failure. This program is made stabilizing by adding SMR to it, as follows: each process j is in one of two modes, nonmasking or stabilizing.
It executes the nonmasking tolerant program when it is in the nonmasking mode, and it executes SMR when it is in the stabilizing mode. Further, it is allowed to change from the nonmasking mode to the stabilizing mode, but not vice versa. Observe that the nonmasking tolerant program satisfies the state predicate "if the parent of j is k for some k ≠ g, then k is non-Byzantine, the parent of k is g, and the parent of l is k provided l is non-Byzantine". Hence, if j suspects that this predicate is violated, i.e., in some round j detects that either g or k is Byzantine, or the parent of k is not g, or the parent of l is not k, then j changes to the stabilizing mode and starts executing SMR. Moreover, whenever j detects that some other process is in the stabilizing mode, it changes its own mode to stabilizing. Thus, if the composite program is perturbed to a state that is not reached by the nonmasking tolerant program, eventually all processes execute actions of SMR. It follows that the composite program is stabilizing tolerant but not masking tolerant to Byzantine failures.
9 Concluding Remarks
In this paper, we presented a case study in the design of multitolerance. Starting with an intolerant program, we first added a component for masking tolerance and then added another component for stabilizing tolerance. Our design illustrated the issues of: (i) how to add a component for offering a tolerance in the presence of a fault-class, (ii) how to ensure that the added component does not interfere with the satisfaction of the program specification in the absence of faults, and (iii) how to ensure that the added component does not interfere with the tolerances offered to the previously considered fault-classes. While our proofs of interference-freedom were presented in informal terms, they are readily formalized, e.g., in the temporal logic framework of [1]. The formalization builds upon previous work on compositionality, e.g., [9, 10, 11].

We observed that a similar design is possible where we first add stabilizing tolerance and then masking tolerance. Also, not every repetitive agreement program that stabilizes in the presence of transient and Byzantine faults needs to be masking tolerant to Byzantine faults; in particular, it could be only nonmasking tolerant to Byzantine faults. Moreover, as discussed in Section 7, our initial design, which assumed read-and-write atomicity, is readily refined within the context of our design method into read-or-write atomicity or message passing.

A reviewer of this paper correctly observed that our design method could benefit from using the principles of adaptive fault-tolerance [12]. Indeed, one way of simplifying the interference-freedom obligation between tolerance components is to choose only one of the tolerance components for execution at a time and to adapt this choice as the (fault) environment and internal state of the system change. Note, however, that the mechanism for adapting the choice of tolerance component must itself be multitolerant.
References

[1] A. Arora and S. S. Kulkarni. Component-based design of multitolerance. Technical Report OSU-CISRC TR37, Ohio State University, 1996.
[2] Z. Liu and M. Joseph. Transformation of programs for fault-tolerance. Formal Aspects of Computing, 4(5):442-469, 1992.
[3] K. P. Birman and R. van Renesse. Reliable Distributed Computing Using the Isis Toolkit. IEEE Computer Society Press, 1994.
[4] L. Lamport, R. Shostak, and M. Pease. The Byzantine generals problem. ACM Transactions on Programming Languages and Systems, 1982.
[5] A. Arora and S. S. Kulkarni. Designing masking fault-tolerance via nonmasking fault-tolerance. IEEE Transactions on Software Engineering, 1997, to appear.
[6] Y. Zhao and F. B. Bastani. A self-adjusting algorithm for Byzantine agreement. Distributed Computing, 5:219-226, 1992.
[7] S. S. Kulkarni and A. Arora. Compositional design of multitolerant repetitive Byzantine agreement (preliminary version). Third Workshop on Self-Stabilizing Systems (WSS 97), University of California, Santa Barbara, 1997.
[8] A. Arora and M. G. Gouda. Distributed reset. IEEE Transactions on Computers, 43(9):1026-1038, 1994.
[9] K. Apt, N. Francez, and W.-P. de Roever. A proof system for communicating sequential processes. ACM Transactions on Programming Languages and Systems, pages 359-385, 1980.
[10] H. Schepers. Fault Tolerance and Timing of Distributed Systems: Compositional Specification and Verification. PhD thesis, Eindhoven University, 1994.
[11] S. Owicki and D. Gries. An axiomatic proof technique for parallel programs. Acta Informatica, 6:319-340, 1976.
[12] J. Goldberg, I. Greenberg, and T. Lawrence. Adaptive fault-tolerance. Proceedings of the IEEE Workshop on Advances in Parallel and Distributed Systems, pages 127-138, 1993.
Algorithmic issues in coding theory Madhu Sudan Laboratory for Computer Science, MIT Cambridge, MA 02139, U.S.A. [email protected]
Abstract. The goal of this article is to provide a gentle introduction to
the basic definitions, goals and constructions in coding theory. In particular we focus on the algorithmic tasks tackled by the theory. We describe some of the classical algebraic constructions of error-correcting codes, including the Hamming code, the Hadamard code and the Reed Solomon code. We describe simple proofs of their error-correction properties. We also describe simple and efficient algorithms for decoding these codes. It is our aim that a computer scientist with just a basic knowledge of linear algebra and modern algebra should be able to understand every proof given here. We also describe some recent developments and some salient open problems.
1 Introduction

Error-correcting codes are combinatorial structures that allow for the transmission of information over a noisy channel and the recovery of the information without any loss at the receiving end. Error-correcting codes come in two basic formats. (1) The "block error-correcting codes": here the information is broken up into small pieces, each containing a fixed finite amount of information. The encoding method is applied to each piece individually (independently), and the resulting encoded pieces (or blocks) are sent over the noisy channel. (2) The "convolutional codes": here the information is viewed as a potentially infinite stream of bits, and the encoding method is structured so as to handle an infinite stream. This survey will be restricted to the coverage of some standard block error-correcting codes.

Formally, a block error-correcting code may be specified by an encoding function C. The input to C is a message m, which is a k-letter string over some alphabet Σ (typically Σ = {0, 1}, but we will cover more general codes as well). C maps m into a longer n-letter string over the same alphabet¹. The mapped string is referred to as a codeword. The basic idea is that in order to send the message m over to the receiver, we transmit instead the codeword C(m). By the time this message reaches the destination it will be corrupted, i.e., a few letters in C(m)

¹ The assumption that the message is a k-letter string over Σ is just made for notational convenience. As will become obvious, the representation of the message space is irrelevant to the communication channel. The representation of the encoded string is, however, very relevant!
would have changed. Say the received word is R. Hopefully R will still be able to convey the original message m even if it is not identically equal to C(m). The only way to preserve this form of redundancy is by ensuring that no two codewords are too "close" to each other. This brings us to the important notion of "close"-ness used, namely the Hamming distance. The Hamming distance between two strings x, y ∈ Σⁿ, denoted Δ(x, y), is the number of letters where x and y differ. Notice that Δ forms a metric, i.e., Δ(x, y) = 0 ⇒ x = y, Δ(x, y) = Δ(y, x), and Δ(x, y) + Δ(y, z) ≥ Δ(x, z).

A basic parameter associated with a code is its distance, i.e., the maximum value d such that any two codewords are a Hamming distance of at least d apart. Given a code of distance d and a received word R that differs from C(m) in at most e ≤ d − 1 places, the error in the transmission can be detected. Specifically, we can tell that some letter(s) has been corrupted in the transmission, even though we may not know which letters are corrupted. In order to actually correct errors we have to be able to recover m uniquely based on R and a bound t on the number of errors that may have occurred. To get the latter property, t has to be somewhat smaller than d − 1. Specifically, if t ≤ ⌊(d − 1)/2⌋, then we notice that indeed there can be at most one message m such that Δ(C(m), R) ≤ t. (If m₁ and m₂ both satisfy Δ(C(m₁), R), Δ(C(m₂), R) ≤ t, then Δ(C(m₁), C(m₂)) ≤ Δ(C(m₁), R) + Δ(R, C(m₂)) ≤ 2t ≤ d − 1, contradicting the distance of C.) Thus in an information-theoretic sense R maintains the information contained in m. Recovering the information m efficiently from R is another matter, and we will come back to this topic presently.

To summarize the discussion above, we adopt the following terse notation that is standard in coding theory. A code C is an [n, k, d]_q code if C : Σᵏ → Σⁿ, where |Σ| = q, with min over distinct x, y ∈ Σᵏ of Δ(C(x), C(y)) equal to d.
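To make the notions concrete, here is a minimal Python sketch (the function names are our own, illustrative choices) of the Hamming distance and of unique decoding within radius t = ⌊(d − 1)/2⌋, using the 3-fold repetition code as a toy example:

```python
# Minimal sketch: Hamming distance and the unique-decoding radius
# t = (d - 1) // 2, illustrated on the binary 3-repetition code.

def hamming(x, y):
    """Number of positions where strings x and y differ."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

# The binary repetition code of length 3: a [3, 1, 3]_2 code.
codewords = {"0": "000", "1": "111"}
d = hamming("000", "111")      # distance of the code: 3
t = (d - 1) // 2               # unique-decoding radius: 1

# A received word with one corrupted letter is still closest to a
# unique codeword, so the message is recoverable.
received = "010"
nearest = min(codewords, key=lambda m: hamming(codewords[m], received))
```

Here `nearest` recovers the message `"0"`, since the received word differs from `000` in one place but from `111` in two.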
With some abuse of notation we will use C to denote the image of the map C (i.e., C may denote the collection of codewords rather than the map). C is called an e-error-detecting code for e = d − 1 and a t-error-correcting code for t = ⌊(d − 1)/2⌋.

In the remaining sections of this article we will describe some common constructions of [n, k, d]_q codes for various choices of the parameters n, k, d and q. We will also describe the algorithmic issues motivated by these combinatorial objects and try to provide some solutions (and summarize the open problems). (We assume some familiarity with the algebra of finite fields [10, 19].) Before going on to these issues, we once again stress the importance of the theory of error-correcting codes and its relevance to computer science. The obvious applications of error-correcting codes are to areas where dealing with error becomes important, such as storage of information on disks and CDs, communication over modems, etc. Additionally, and this is where they become important to the theoretical computer scientist, error-correcting codes come into play in several ways in complexity theory: for example, in fault-tolerant computing, in cryptography, in the derandomization of randomized algorithms, and in the construction of probabilistically checkable proofs. In several of these cases it is not so much the final results as the notions, methods and ingredients from coding theory that help. All of this makes it important that a theoretical computer scientist be comfortable with the methods of this field, and this is the goal of this article. A reader
interested in further details may try one of the more classical texts [2, 11, 17]. Also, the article of Vardy [18] is highly recommended for a more detailed account of progress in coding theory. The article is also rich with pointers to topics of current interest.
2 Linear Codes

While all questions relating to coding theory can be stated in general, we will focus in our article on a subset of codes called linear codes. These codes are obtained by restricting the underlying alphabet Σ to be a finite field of cardinality q with binary operations "+" and "·". Thus a string in Σⁿ can be thought of as a vector in n-dimensional space, with induced operations "+" (vector addition) and "·" (scalar multiplication). Thus a code C ⊆ Σⁿ is now a subset of the vectors. If this subset of vectors forms a subspace then the code is linear, as made formal below:
Definition 1. C ⊆ Σⁿ is a linear code if for all a ∈ Σ and x, y ∈ C, we have x + y ∈ C and a · x ∈ C.

Many of the parameters of error-correcting codes become very clean in the case of linear codes. For instance, how does one specify a code C ⊆ Σⁿ? For general codes, succinct representations may not exist! However, for every linear code a succinct representation, of size polynomial in n, does exist. In particular, we have the following two representations:

1. For every [n, k, d]_q linear code C there exists an n × k "generator" matrix G = G_C with entries from Σ such that C = {Gx | x ∈ Σᵏ}.
2. For every [n, k, d]_q linear code C there exists an (n − k) × n parity check matrix H = H_C over Σ such that C = {y ∈ Σⁿ s.t. Hy = 0}.

Conversely, the following hold: every n × k matrix G over Σ defines an [n, k′, d]_q code C_G, for some d ≥ 1 and k′ ≤ k, having as codewords {Gx | x ∈ Σᵏ}. Similarly, every (n − k) × n matrix H defines an [n, k′, d]_q code C_H, for some d ≥ 1 and k′ ≥ k, having as codewords {y ∈ Σⁿ | Hy = 0}.
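As a small illustration of the two representations (a sketch with illustrative names, not part of the original text), the following Python fragment builds the [3, 1, 3]₂ repetition code both from a generator matrix and from a parity check matrix and confirms that the two descriptions coincide:

```python
# Sketch of the two representations over GF(2) (all arithmetic mod 2),
# on the [3, 1, 3]_2 repetition code.
from itertools import product

def mat_vec_mod2(M, x):
    """Matrix-vector product over GF(2)."""
    return [sum(mij * xj for mij, xj in zip(row, x)) % 2 for row in M]

G = [[1], [1], [1]]            # 3 x 1 generator matrix
H = [[1, 1, 0], [0, 1, 1]]     # (3 - 1) x 3 parity check matrix

# Codewords via the generator: {Gx : x in GF(2)^k}.
code_from_G = {tuple(mat_vec_mod2(G, [x])) for x in (0, 1)}

# Codewords via the parity check: {y : Hy = 0}.
code_from_H = {y for y in product((0, 1), repeat=3)
               if mat_vec_mod2(H, list(y)) == [0, 0]}
```

Both sets come out as {000, 111}, matching properties (1) and (2).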
Exercise:
1. Prove properties (1) and (2) above.
2. Given the generator matrix G_C of a code C, give a polynomial time algorithm to compute a parity check matrix H_C for C.
3. Show that if G is of full column rank (H is of full row rank) then the code C_G (respectively C_H) is an [n, k, d]_q code.
3 Some common constructions of codes

In this section we describe some common constructions of codes. But first let us establish the goal for this section. In general we would like to find families of [n, k, d]_q codes for infinitely many triples (n, k, d) for some fixed q. The property we would really like is that k/n and d/n are bounded away from zero as n → ∞. Such a code is termed asymptotically good, and the two properties k/n > 0 and d/n > 0 are termed constant message-rate and constant distance-rate respectively. Unfortunately we will not be able to get to this goal in this article. But we will settle for what we term weakly good codes. These are codes with polynomial message-rate, i.e., k = Ω(nᵉ) for some e > 0, and constant distance-rate.
3.1 Hamming code

Hamming codes are defined for every positive n such that there exists an integer l with n = 2ˡ − 1. The Hamming code of block size n over the alphabet {0, 1} is given by an l × n parity check matrix H^Hmg whose columns are all the distinct l-dimensional non-zero vectors. Notice that there are exactly 2ˡ − 1 of these.
Lemma 2. For every positive integer n such that n = 2ˡ − 1 for some integer l, the Hamming code of block size n is an [n, n − l, 3]₂ code.

Proof Sketch. Notice that the rank of H^Hmg is l. In particular, the column vectors containing exactly one 1 are linearly independent, and there are l of them. Thus we find that the Hamming code is an [n, k, d]₂ code for k = n − l. We now move to showing that the distance of the Hamming code is 3. Notice that the code has no elements of weight 1 or 2: a codeword of weight 1 would require a zero column in the parity check matrix, and a codeword of weight 2 would imply that two columns of the parity check matrix are identical. This implies the distance is at least 3. Now consider any two column vectors v₁ and v₂ in H^Hmg. Notice that the vector v₁ + v₂ is also a column vector of H^Hmg and is distinct from v₁ and v₂. Now consider the n-dimensional vector which is zero everywhere except in the coordinates corresponding to the vectors v₁, v₂ and v₁ + v₂. This vector has weight 3 and is easily seen to be an element of the Hamming code. Thus the distance of the Hamming code is exactly 3.

The Hamming code is a simple code with a very good rate. Unfortunately it can only correct 1 error, definitely far from our goal of constant error-rate. Next we move on to a code with good error-correcting properties, but with very low rate.
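The single-error correction promised by Lemma 2 can be sketched as follows; we use the [7, 4, 3]₂ instance (l = 3), and the column ordering and function names are our own illustrative choices:

```python
# Sketch of the [7, 4, 3]_2 Hamming code (l = 3): the columns of the parity
# check matrix are the nonzero vectors of GF(2)^3, and the syndrome of a
# word with one flipped bit equals the column at the error position.
from itertools import product

l = 3
n = 2 ** l - 1                                 # block length 7
# Columns: all nonzero l-bit vectors (the ordering is a free choice).
cols = [c for c in product((0, 1), repeat=l) if any(c)]

def syndrome(y):
    """H y over GF(2): mod-2 sum of the columns where y has a 1."""
    s = [0] * l
    for bit, col in zip(y, cols):
        if bit:
            s = [(a + b) % 2 for a, b in zip(s, col)]
    return tuple(s)

def correct_one_error(y):
    """Syndrome decoding: flip the (at most one) corrupted position."""
    s = syndrome(y)
    if any(s):
        i = cols.index(s)                      # error position
        y = y[:i] + (1 - y[i],) + y[i + 1:]
    return y

codeword = (0,) * n                            # the all-zero codeword
received = (0, 0, 1, 0, 0, 0, 0)               # one bit flipped
```

Running `correct_one_error(received)` recovers the all-zero codeword, since the syndrome points directly at the flipped coordinate.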
3.2 Hadamard code

A Hadamard matrix is an n × n matrix M with entries from {+1, −1} such that M·Mᵀ = n·Iₙ, where Iₙ is the n × n identity matrix. A Hadamard matrix immediately leads to an error-correcting code where the rows of M are the codewords. This gives a code over the alphabet Σ = {+1, −1}. We prove the distance property of the code first.
Lemma 3. If M is a Hadamard matrix then any two distinct rows agree in exactly n/2 places.
Proof. Say the rows of interest are the i-th and j-th rows. Then consider the element (M·Mᵀ)ᵢⱼ. This element is the sum of n terms, the k-th term being mᵢₖ·mⱼₖ. Notice that this term evaluates to +1 if mᵢₖ = mⱼₖ and to −1 otherwise. Thus if the i-th and j-th rows disagree in t places, then (M·Mᵀ)ᵢⱼ = (n − t) − t. Since (M·Mᵀ)ᵢⱼ = 0, we have that n − 2t = 0 and hence the two rows (dis)agree in exactly n/2 places.

Thus the task of constructing a Hadamard code reduces to the task of constructing Hadamard matrices. Constructions of Hadamard matrices have been a subject of much interest in combinatorics. It is clear (from Lemma 3) that for an n × n Hadamard matrix to exist, n must be even. The converse is not known to be true and is still an open question. What is known is that an n × n Hadamard matrix exists for every n of the form p + 1 where p is a prime congruent to 3 modulo 4. It is also known that if an n₁ × n₁ Hadamard matrix and an n₂ × n₂ Hadamard matrix exist, then an n₁n₂ × n₁n₂ Hadamard matrix exists. Many other such constructions are also known, but not all possibilities are covered yet. Here we give the basic construction, which applies when n is a power of 2. These matrices are described recursively as follows:

  M₁^Hdm = [ +1  +1 ]          M_l^Hdm = [ +M_{l−1}^Hdm  +M_{l−1}^Hdm ]
           [ +1  −1 ],                    [ +M_{l−1}^Hdm  −M_{l−1}^Hdm ].
Lemma 4. For every l, the rows of M_l^Hdm form a [2ˡ, l, 2ˡ⁻¹]₂ code.

Proof. Left as an exercise to the reader.
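The recursive construction and the distance claim are easy to spot-check; the following sketch (our own illustrative code) builds M₃^Hdm and verifies that every pair of distinct rows of the 8 × 8 matrix disagrees in exactly 4 places:

```python
# Sketch of the recursive (Sylvester-style) construction of M_l^Hdm and a
# check of Lemma 3 for l = 3: distinct rows disagree in exactly n/2 places.

def hadamard(l):
    """The 2^l x 2^l Hadamard matrix over {+1, -1}, built recursively."""
    M = [[1]]
    for _ in range(l):
        M = ([row + row for row in M] +                 # [ M   M ]
             [row + [-x for x in row] for row in M])    # [ M  -M ]
    return M

M = hadamard(3)
dists = [sum(a != b for a, b in zip(M[i], M[j]))
         for i in range(8) for j in range(i + 1, 8)]
```

Every entry of `dists` equals 4 = 2ˡ⁻¹, matching the distance stated in Lemma 4.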
The Hadamard codes maintain a constant distance-rate. However, their message-rate approaches zero very quickly. Next we describe a code with constant message-rate and distance-rate. The catch is that the code uses an alphabet of growing size.
3.3 Reed Solomon code
The Reed Solomon codes are a family of codes defined over an alphabet of growing size, with n ≤ q. The more common definition of this code is not (we feel) as intuitive or as useful as the "folklore" definition. We present both definitions here, starting with the more useful one, and then show the equivalence of the two.

Definition 5 (Reed Solomon codes). Let Σ be a field of size q, n ≤ q, and let x₀, …, x_{n−1} be some fixed enumeration of n of the elements of Σ. (It is standard to pick n = q − 1 and xᵢ = αⁱ for some primitive element² α ∈ Σ.) Then for every 1 ≤ k ≤ n, the Reed Solomon code C^{RS1}_{Σ,n,k,q} is defined as follows: a message m = m₀…m_{k−1} corresponds to the degree k − 1 polynomial M(x) = Σ_{i=0}^{k−1} mᵢxⁱ. The encoding of m is C^{RS1}_{Σ,n,k,q}(m) = c₀…c_{n−1} where cⱼ = M(xⱼ).

² α is a primitive element of the field GF(q) if αʲ ≠ 1 for any 0 < j < q − 1.
The distance properties of the Reed Solomon codes follow immediately from the fact that a degree k − 1 polynomial may only have k − 1 zeroes unless all of its coefficients are zero.

Lemma 6. For every n ≤ q and k ≤ n, the Reed Solomon code C^{RS1}_{Σ,n,k,q} forms an [n, k, n − k + 1]_q linear code.
Proof. The fact that the code is linear follows from the fact that if M₀(x) and M₁(x) are polynomials of degree at most k − 1 then so is M₀(x) + M₁(x). The distance follows from the fact that if M₀(xⱼ) = M₁(xⱼ) for k values of j then M₀ ≡ M₁ (or equivalently, if M₀(xⱼ) − M₁(xⱼ) is zero for k values of j, then M₀ − M₁ is the zero polynomial).

Finally, for the sake of completeness, we present a second definition of Reed Solomon codes. This definition is more commonly seen in the texts, but we feel this part may be safely skipped at first reading.
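Definition 5 is easy to exercise on a small prime field. The following sketch (our own illustrative instance: GF(7), n = 6, k = 3, primitive element 3) encodes all messages by polynomial evaluation and confirms the distance n − k + 1 = 4 by brute force:

```python
# Sketch of Reed Solomon encoding (Definition 5) over GF(7), with
# n = q - 1 = 6, k = 3 and primitive element alpha = 3, plus a brute-force
# check that the minimum distance is n - k + 1 = 4 on this instance.
from itertools import product

q, n, k, alpha = 7, 6, 3, 3
xs = [pow(alpha, i, q) for i in range(n)]      # enumeration x_i = alpha^i

def rs_encode(m):
    """c_j = M(x_j) where M(x) = sum_i m_i x^i, all arithmetic mod q."""
    return tuple(sum(mi * pow(x, i, q) for i, mi in enumerate(m)) % q
                 for x in xs)

codewords = [rs_encode(m) for m in product(range(q), repeat=k)]
dmin = min(sum(a != b for a, b in zip(c1, c2))
           for i, c1 in enumerate(codewords)
           for c2 in codewords[i + 1:])
```

Since a nonzero polynomial of degree at most k − 1 = 2 vanishes at no more than 2 of the 6 evaluation points, `dmin` comes out as 4.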
Definition 7 (Reed Solomon codes). Let Σ be a field of size q with primitive element α, and let n = q − 1 and k ≤ n. Let P_{k,q}(x) be the polynomial (x − α)(x − α²)···(x − α^{n−k}). The Reed Solomon code C^{RS2}_{Σ,n,k,q} is defined as follows: a message m = m₀…m_{k−1} corresponds to the degree k − 1 polynomial M(x) = Σ_{i=0}^{k−1} mᵢxⁱ. The encoding of m is C^{RS2}_{Σ,n,k,q}(m) = c₀…c_{n−1} where cⱼ is the coefficient of xʲ in the polynomial P_{k,q}(x)·M(x).
Viewed this way it is hard to see the correspondence between the two definitions (or the distance property). We prove an equivalence next.
Lemma 8. The definitions of Reed Solomon codes given in Definitions 5 and 7 coincide for n = q − 1 and the standard enumeration of the elements of GF(q).

Proof. Notice that it suffices to prove that every codeword according to the first definition is a codeword according to the second definition. The fact that the sets are of the same size then implies that they are identical.

Consider the encoding of m = m₀…m_{k−1}. This encoding is C^{RS1}_{Σ,n,k,q}(m) = c₀…c_{n−1} with cᵢ = Σ_{j=0}^{k−1} mⱼ(αⁱ)ʲ. To show that this is a codeword according to the second definition we need to verify that the polynomial C(x) = Σ_{i=0}^{n−1} cᵢxⁱ has (x − αˡ) as a factor for every l ∈ {1, …, n − k}. Equivalently, it suffices to verify that C(αˡ) = 0, which we do next:

  C(αˡ) = Σ_{i=0}^{n−1} cᵢ(αˡ)ⁱ
        = Σ_{i=0}^{n−1} Σ_{j=0}^{k−1} mⱼ(αⁱ)ʲ(αˡ)ⁱ
        = Σ_{j=0}^{k−1} mⱼ Σ_{i=0}^{n−1} (α^{j+l})ⁱ
        = Σ_{j=0}^{k−1} mⱼ Σ_{i=0}^{q−2} β_{j,l}ⁱ

where β_{j,l} = α^{j+l}. Notice that for every j, l s.t. j + l ≠ q − 1, β_{j,l} ≠ 1. Notice further that for every such β_{j,l} the summation Σ_{i=0}^{q−2} β_{j,l}ⁱ = 0³. Since j ∈ {0, …, k − 1}, we find that β_{j,l} ≠ 1 for every l ∈ {1, …, q − 1 − k}. Thus for every l ∈ {1, …, n − k}, we find that C(αˡ) = 0. This concludes the proof.
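The computation in the proof can be spot-checked numerically. This sketch (our own illustrative instance over GF(7), with an arbitrarily chosen message) encodes with Definition 5 and verifies that C(x) vanishes at αˡ for l = 1, …, n − k, i.e., that it has the factors Definition 7 requires:

```python
# Numeric spot-check of Lemma 8 over GF(7): a Definition-5 codeword,
# read as the polynomial C(x) = sum_i c_i x^i, vanishes at alpha^l
# for l = 1, ..., n - k.

q, n, k, alpha = 7, 6, 3, 3
m = (2, 5, 1)                                   # an arbitrary message

# Definition 5 encoding at the standard points x_i = alpha^i.
c = [sum(mj * pow(pow(alpha, i, q), j, q) for j, mj in enumerate(m)) % q
     for i in range(n)]

def C(x):
    """Evaluate C(x) = sum_i c_i x^i mod q."""
    return sum(ci * pow(x, i, q) for i, ci in enumerate(c)) % q

roots_ok = all(C(pow(alpha, l, q)) == 0 for l in range(1, n - k + 1))
```

Here `roots_ok` is True, exactly as the geometric-sum argument in the proof predicts.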
3.4 Multivariate polynomial codes

The next family of codes we describe is not very commonly used in coding theory, but has turned out to be fairly useful in complexity theory, and in particular in the results on probabilistically checkable proofs. Surprisingly, these codes turn out to be a common generalization of Hadamard codes and Reed Solomon codes!
Definition 9 (Multivariate polynomial code). For integer parameters m, l and q with l < q, the multivariate polynomial code C_{poly,m,l,q} has as message a string of coefficients m = {m_{i₁,i₂,…,i_m}} with iⱼ ≥ 0 and Σⱼ iⱼ ≤ l. This sequence is interpreted as the m-variate polynomial M(x₁, …, x_m) = Σ_{i₁,…,i_m} m_{i₁,…,i_m} x₁^{i₁}···x_m^{i_m}. The encoding of m is the string of letters {M(x₁, …, x_m)}, with one letter for every (x₁, …, x_m) ∈ Σᵐ.

Obviously the multivariate polynomial codes form a generalization of the Reed Solomon codes (again using the first definition given here of Reed Solomon codes). The distance property of the multivariate polynomial codes also follows from the distance property of multivariate polynomials (cf. [5, 13, 21]).
Lemma 10. For integers m, l and q with l < q, the code C_{poly,m,l,q} is an [n, k, d]_q code with n = qᵐ, k = (m+l choose m) and d = (q − l)·qᵐ⁻¹.

Proof. The bound on n is immediate. The fact that the number of coefficients i₁, …, i_m s.t. Σⱼ iⱼ ≤ l is (m+l choose l) is a well-known exercise in counting. Finally, the bound on the distance follows from the fact that a degree l polynomial can only be zero on an l/q fraction of its inputs. (This is an easy inductive argument based on the number of variables. The base case is well known, and inductively one
³ This identity is obtained as follows: recall that Fermat's little theorem asserts that β^{q−1} − 1 = 0 for every non-zero β in GF(q). Factoring the left hand side, we find that either β − 1 = 0 or Σ_{i=0}^{q−2} βⁱ = 0. Since β ≠ 1, the latter must be the case.
picks a random assignment to the variables x₁, …, x_{m−1} and argues that the resulting polynomial in x_m is non-zero with high probability. Finally, one uses the base case again to conclude that the final polynomial in x_m is left non-zero by a random assignment to x_m.)

It is easy to see that the code C^{RS1}_{Σ,q,k,q} is the same as the code C_{poly,1,k−1,q}. Also notice that the code C_{poly,m,1,2} forms a [2ᵐ, m, 2ᵐ⁻¹]₂ code, the same as the parameters of the Hadamard code given by the rows of M_m^Hdm. It turns out that these two codes are in fact identical. The proof is left as an exercise to the reader.
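Lemma 10 can be exercised on a small instance. This sketch (our own illustrative choice: m = 2, l = 1, q = 3) encodes every degree-≤1 bivariate polynomial over GF(3) by evaluation at all q^m = 9 points and confirms k = 3 and d = (q − l)·q^{m−1} = 6 by enumeration:

```python
# Sketch of the multivariate polynomial code for m = 2, l = 1, q = 3:
# messages are coefficients of a + b*x1 + c*x2 over GF(3), encoded by
# evaluation at all 9 points of GF(3)^2.
from itertools import product

q, m, l = 3, 2, 1
points = list(product(range(q), repeat=m))     # all of GF(3)^2

def encode(msg):
    a, b, c = msg                              # coefficients of 1, x1, x2
    return tuple((a + b * x1 + c * x2) % q for x1, x2 in points)

codewords = [encode(msg) for msg in product(range(q), repeat=3)]
dmin = min(sum(u != v for u, v in zip(c1, c2))
           for i, c1 in enumerate(codewords) for c2 in codewords[i + 1:])
```

A nonzero polynomial a + b·x₁ + c·x₂ with (b, c) ≠ (0, 0) vanishes on an affine line of q = 3 points, so every nonzero codeword has weight at least 9 − 3 = 6, and `dmin` comes out as 6.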
3.5 Concatenated codes

Each code in the collection we have accumulated above has some flaw or other. The Hamming codes don't correct many errors, the Hadamard codes have too low a rate, and the Reed Solomon codes depend on a very large alphabet. Yet it turns out that it is possible to put some of these codes together and obtain a code with reasonably good behavior ("polynomially good"). This is made possible by a simple idea called "concatenation", defined next.
Definition 11 (Concatenation of codes). Let C₁ be an [n₁, k₁, d₁]_{q₁} code over the alphabet Σ₁ and let C₂ be an [n₂, k₂, d₂]_{q₂} code over the alphabet Σ₂. If q₁ = q₂^{k₂} then the code C₁ ∘ C₂ is defined as follows: associate every letter in Σ₁ with a codeword of C₂. Encode every message first using the code C₁ and then encode every letter in the encoded string using the code C₂. More formally, given a message m ∈ Σ₁^{k₁} = Σ₂^{k₁k₂}, let C₁(m) = c₁…c_{n₁} ∈ Σ₁^{n₁}. The encoding C₁ ∘ C₂(m) is given by c_{1,1}…c_{1,n₂} c_{2,1}…c_{n₁,n₂} ∈ Σ₂^{n₁n₂}, where for every i ∈ {1, …, n₁}, c_{i,1}…c_{i,n₂} = C₂(cᵢ).

Almost immediately we get the following property of concatenation.
Lemma 12. If C₁ is an [n₁, k₁, d₁]_{q₁} code and C₂ is an [n₂, k₂, d₂]_{q₂} code with q₁ = q₂^{k₂}, then C₁ ∘ C₂ is an [n₁n₂, k₁k₂, d′]_{q₂} code, for some d′ ≥ d₁d₂.

Proof. The block size and message size bounds follow from the definition. To see the distance property, consider two distinct messages m₁, m₂ ∈ Σ₁^{k₁}. For l ∈ {1, 2}, let c_{l,1}…c_{l,n₁} be the encoding of m_l using C₁ and let c_{l,1,1}…c_{l,n₁,n₂} be its encoding using C₁ ∘ C₂. Notice that there must exist at least d₁ values of i such that c_{1,i} ≠ c_{2,i} (by the distance of C₁). For every such i, there must exist at least d₂ values of j such that c_{1,i,j} ≠ c_{2,i,j} (by the distance of C₂). Thus we find that C₁ ∘ C₂(m₁) and C₁ ∘ C₂(m₂) differ in at least d₁d₂ places.

To best see the power of concatenation, consider the following simple application. Let C₁ be a Reed Solomon code with q = 2ᵐ, n = q and k = 0.4n; i.e., C₁ is an [n, 0.4n, 0.6n]_{2ᵐ} code with n = 2ᵐ. Let C₂ be the Hadamard code [2ᵐ, m, 2ᵐ⁻¹]₂. The concatenation C₁ ∘ C₂ is an [n², 0.4n·log n, 0.3n²]₂ code; i.e., the resulting code has constant distance-rate, polynomial rate, and is over the binary
alphabet! Thus this satisfies our weaker goal of obtaining a weakly good code. Even the goal of obtaining an asymptotically good code is close now. In particular, the code of Justesen is obtained by an idea similar to that of concatenation. Unfortunately we shall not be able to cover this material in this article.
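Definition 11 and Lemma 12 can be illustrated with a deliberately tiny pair of codes (our own illustrative choice, not from the original text): C₁ the [2, 1, 2]₄ repetition code and C₂ a [3, 2, 2]₂ parity code, so that q₁ = q₂^{k₂} = 4 and the concatenation should have distance at least d₁d₂ = 4:

```python
# Toy concatenation (Definition 11): C1 = [2, 1, 2]_4 repetition code,
# C2 = [3, 2, 2]_2 parity code; Lemma 12 predicts a [6, 2, d']_2 code
# with d' >= d1 * d2 = 4.
from itertools import product

# C2: append a parity bit to each 2-bit message -> [3, 2, 2]_2.
C2 = {s: s + ((s[0] + s[1]) % 2,) for s in product((0, 1), repeat=2)}

# Identify each letter of C1's alphabet {0, ..., 3} with a 2-bit string.
letter = {2 * a + b: (a, b) for a, b in product((0, 1), repeat=2)}

def concat_encode(msg):                 # msg is one letter of {0, ..., 3}
    c1 = (msg, msg)                     # C1: repetition, [2, 1, 2]_4
    out = ()
    for ci in c1:                       # encode each C1 letter with C2
        out += C2[letter[ci]]
    return out

codewords = [concat_encode(a) for a in range(4)]
dmin = min(sum(u != v for u, v in zip(x, y))
           for i, x in enumerate(codewords) for y in codewords[i + 1:])
```

On this instance the bound of Lemma 12 is tight: `dmin` equals d₁d₂ = 4.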
4 Algorithmic tasks

We now move on to the algorithmic tasks of interest. The obvious first candidate is encoding.

Problem 13 (Encoding).
Input: n × k generator matrix G and message m ∈ Σᵏ.
Output: C(m), where C = C_G is the code with G as the generator matrix.

It is clear that the problem as specified above is easily solved in time O(nk) and hence in time polynomial in n. For specific linear codes such as the Reed Solomon codes it is possible to encode faster, in time O(n logᶜ n) for some constant c. However, till recently no asymptotically good code was known to be encodable in linear time. In a recent breakthrough, Spielman [15] presented the first known code that is encodable in linear time. We will discuss this more in a little bit.

The next obvious candidate problem is the decoding problem. Once again it is clear that if the received word has no errors, then this problem is only as hard as solving a linear system and thus can be easily solved in polynomial time. So our attention moves to the case where the received word has errors. We first define the error detection problem.
Problem 14 (Error detection).
Input: n × k generator matrix G for a code C = C_G, and a received word R ∈ Σⁿ.
Output: Is R a codeword?

The error detection problem is also easy to solve in polynomial time: we find the parity check matrix H for the code C and then check whether HR = 0. We now move to the problem of decoding in the presence of errors. This problem comes in several variants. We start with the simplest definition first.
Problem 15 (Maximum likelihood decoding).
Input: n × k generator matrix G for a code C = C_G, and a received word R ∈ Σⁿ.
Output: Find a codeword x ∈ C that is nearest to R in Hamming distance. (Ties may be broken arbitrarily.)

There are two obvious strategies for solving the maximum likelihood decoding problem:

Brute Force 1: Enumerate all the codewords and find the one that is closest to R.
Brute Force 2: For t = 0, 1, …, do: enumerate all possible words within a Hamming distance of t from R and check if the word is a codeword. Output the first match.

Despite the naivete of the search strategies above, there are some simple cases where these strategies work in polynomial time. For instance, the first strategy works in polynomial time for Hadamard codes. The second strategy works in polynomial time for Hamming codes (why?). However, both strategies start taking exponential time once the number of codewords becomes large while the distance also remains large. In particular, for "asymptotically good" or even "weakly good" codes, both strategies above run in exponential time. One may wonder if this exponential time behavior is inherent to the decoding problem. In perhaps the first "complexity" result in coding theory, Berlekamp, McEliece and van Tilborg [4] presented the answer to this question.
Theorem 16 [4]. The Maximum likelihood decoding problem for general linear codes is NP-hard.
There are two potential ways to attempt to circumvent this result. One method is to define and solve the maximum likelihood decoding problem for specific linear codes. We will come to this question momentarily. The other hope is to attempt to correct only a limited number of errors. In order to do so, we further parameterize the maximum likelihood decoding problem as follows:
Problem 17 (Bounded distance decoding).
Input: An $n \times k$ generator matrix $G$ for a code $C = C_G$, a received word $R \in \Sigma^n$, and a positive integer $t$.
Output: Any/all codewords in $C$ within a Hamming distance of $t$ from $R$.
The hardness result of [4] actually applies to the bounded distance decoding problem as well. However, one could hope for a result of the form: "There exists an $\epsilon > 0$ such that for every $[n, k, d]_q$ linear code $C$, the bounded distance decoding problem for $C$ with $t = \epsilon d$ is solvable in polynomial time." One bottleneck to such a general result is that we don't know how to compute $d$ for a generic linear code. This motivates the following problem:
Problem 18 (Minimum distance).
Input: An $n \times k$ generator matrix $G$ for a code $C = C_G$ and an integer parameter $d$.
Question: Is the distance of $C$ at least $d$?
This problem was conjectured to be coNP-hard in [4]. The problem remained open for nearly two decades. Recently, in a major breakthrough, this problem was shown to be coNP-complete by Vardy [18]. While this does not directly rule out the possibility that a good bounded distance decoding algorithm may exist, the result should be read as one more reason that general positive results are unlikely.
Thus we move from general results, i.e., where the code is specified as part of the input, to specific results, i.e., for well-known families of codes. The first question that may be asked is: "Is there a family of asymptotically good $[n, k, d]_q$ linear codes and an $\epsilon > 0$ for which a polynomial time bounded distance decoding algorithm exists for $t \le \epsilon d$?" For this question the answer is "yes". A large number of algebraic codes do have such polynomial time bounded distance decoding algorithms. In particular, the Reed Solomon codes are known to have such a decoding algorithm for $t \le \lfloor (d-1)/2 \rfloor$ (cf. [2, 11, 17]). This classical result is very surprising given the non-trivial nature of the task. This result is also very crucial for many of the known asymptotically good codes, since many of these codes are constructed by concatenating Reed Solomon codes with some other codes. In the next section we shall cover the decoding of Reed Solomon codes in more detail.
Lastly, there is another class of codes, constructed by combinatorial means, for which bounded distance decoding for some $t \le \epsilon d$ can be performed in polynomial time. These are the expander codes, due to Sipser and Spielman [14] and Spielman [15]. The results culminate in a code with very strong (linear time!) encoding and bounded distance decoding algorithms. In addition to being provably fast, the algorithms for the encoding and decoding of these codes are surprisingly simple and clean. However, the description of the codes and the analysis of the algorithms are somewhat outside the scope of this paper. We refer the reader to the original articles [14, 15] for details.
5 Decoding of Reed Solomon codes
As mentioned earlier, a polynomial time algorithm for bounded distance decoding is known, and this algorithm corrects up to $t \le \lfloor (d-1)/2 \rfloor$ errors. Notice that this coincides exactly with the error-correction bound of the code (i.e., a Reed Solomon code of distance $d$ is a $t$-error-correcting code for $t = \lfloor (d-1)/2 \rfloor$). This bound on the correction capability is inherent if one wishes to determine the codeword uniquely. However, in the bounded distance decoding problem we do allow for multiple solutions. Given this latitude, it is reasonable to hope for a polynomial-time decoding algorithm that corrects more errors, say up to $t < (1 - \epsilon)d$ where $\epsilon$ is some fixed constant. However, no such algorithm is known for all possible values of $(n, k, d = n - k + 1)$. Recently, in [16], we presented an algorithm which does correct up to $(1 - \epsilon)d$ errors, provided $k/n \to 0$. This algorithm was inspired by an algorithm of Welch and Berlekamp [20, 3] for decoding Reed Solomon codes. That algorithm is especially clean and elegant; our solution uses similar ideas to correct even more errors, and we present it next. Notice first that the decoding problem for Reed Solomon codes can be solved by solving the following cleanly stated problem:
Problem 19 (Reed Solomon decoding).
Input: $n$ pairs of points $\{(x_i, y_i)\}$, $x_i, y_i \in GF(q)$, and integers $t, k$.
Output: All polynomials $p$ of degree at most $k - 1$ such that $y_i \ne p(x_i)$ for at most $t$ values of $i$.
The basic idea in both the Welch-Berlekamp algorithm and ours is to find an algebraic description of all the given points, and to then use the algebraic
description to extract $p$. The algebraic description we settle for is an "algebraic curve in the plane", i.e., a polynomial $Q(x, y)$ in two variables $x$ and $y$ such that $Q(x_i, y_i) = 0$ for every given point $(x_i, y_i)$. Given this basic strategy, the performance of the algorithm depends on the choice of the degree of $Q$, which must allow such a curve to exist and still be useful. (For example, if we allow $Q$ to be 0, or if we pick the degree of $Q$ to be $n$ in $x$ and 0 in $y$, then such polynomials do exist but are of no use. On the other hand, a non-zero polynomial $Q$ of degree $n/10$ in $x$ and 0 in $y$ may be useful, but will probably not exist for the given data points.) To determine what kind of polynomial $Q$ we should search for, we pick two parameters $l$ and $m$ and impose the following conditions on $Q(x, y) = \sum_{i,j} q_{ij} x^i y^j$:
1. $Q$ should not be the zero polynomial. (I.e., some $q_{ij}$ should be non-zero.)
2. $q_{ij}$ non-zero implies $j \le m$ and $i + (k-1)j \le l$. (The reason for this restriction will become clear shortly.)
3. $Q(x_i, y_i) = 0$ for every given pair $(x_i, y_i)$.
Now consider the task of searching for such a $Q$. This amounts to finding values for the unknown coefficients $q_{ij}$. The conditions in (3) above amount to homogeneous linear equations in the $q_{ij}$. By elementary linear algebra, a non-zero solution to such a system exists and can be found in polynomial time provided the number of unknowns (i.e., the number of pairs $(i, j)$ such that $0 \le i, j$, $j \le m$ and $i + (k-1)j \le l$) strictly exceeds the number of equations ($n$). It is easy to count the number of such coefficients, and the existence of such coefficients will determine our choice of $m$ and $l$. Having determined such a polynomial, we will apply the following useful lemma to show that $p$ can be extracted from $Q$.
Lemma 20 [1]. Let $Q(x, y) = \sum_{i,j} q_{ij} x^i y^j$ be such that $q_{ij} = 0$ for every $i, j$ with $i + (k-1)j > l$. If $p(x)$ is a polynomial of degree at most $k - 1$ such that for strictly more than $l$ values of $i$, $y_i = p(x_i)$ and $Q(x_i, y_i) = 0$, then $y - p(x)$ divides the polynomial $Q(x, y)$.
Proof. Consider first the polynomial $g(x)$ obtained from $Q$ by substituting $y = p(x)$. Notice that the term $q_{ij} x^i y^j$ becomes a polynomial in $x$ of degree $i + (k-1)j$, which by property (2) above is a polynomial of degree at most $l$ in $x$. Thus $g(x) = Q(x, p(x))$ is a polynomial in $x$ of degree at most $l$. Now, for every $i$ such that $y_i = p(x_i)$ and $Q(x_i, y_i) = 0$, we have that $g(x_i) = Q(x_i, p(x_i)) = 0$. But there are more than $l$ such values of $i$. Thus $g$ is identically zero. This immediately implies that $Q(x, y)$ is divisible by $y - p(x)$. (The division theorem for polynomials says that if a polynomial $h(y)$ evaluates to 0 at $y = \alpha$, then $y - \alpha$ divides $h(y)$. Applying this fact to the polynomial $Q_x(y) = Q(x, y)$ and $\alpha = p(x)$, we obtain the desired result. Notice that in doing so we are switching our perspective: we are thinking of $Q$ as a polynomial in $y$ with coefficients from the ring of polynomials in $x$.)
Going back to the choice of $m$ and $l$, we have several possible choices. At one extreme we can settle for $m = 1$; then if $l \ge (n + k)/2$, we find that the
number of coefficients is more than $n$. In this case the polynomial $Q(x, y)$ found by the algorithm is of the form $A(x)y + B(x)$. Lemma 20 above guarantees that if $t \le \lfloor (n-k)/2 \rfloor$ then $y - p(x)$ divides $Q$. Thus $p(x) = -B(x)/A(x)$ and can be computed easily by a simple polynomial division. So in this case we can decode from $\lfloor (n-k)/2 \rfloor$ errors, thereby recovering the results of [20]. In fact, in this case the algorithm essentially mimics the algorithm of [20], though the correspondence may not be immediately obvious.
At a different extreme one may pick $m \approx \sqrt{n/k}$ and $l \approx \sqrt{nk}$, and in this case Lemma 20 works for $t \le n - 2\sqrt{nk}$. In this case, to recover $p(x)$ from $Q$, one first factors the bivariate polynomial $Q$. This gives a list of all polynomials $p_j(x)$ such that $y - p_j(x)$ divides $Q$. From this list we pull out all the polynomials $p_j$ such that $p_j(x_i) \ne y_i$ for at most $t$ values of $x_i$. Thus in this case too we have a polynomial time algorithm, provided $Q$ can be factored in polynomial time. Fortunately, such algorithms are known, due to Kaltofen [8] and Grigoriev [7] (see Kaltofen [9] for a survey of polynomial factorization algorithms). For $k/n \to 0$, the number of errors corrected by this algorithm approaches $(1 - o(1))n$. A more detailed analysis of this algorithm and the number of errors corrected by it appears in [16]. The result shows that, given an $[n, \delta n, (1 - \delta)n]_q$ Reed Solomon code, the number of errors corrected by this algorithm approaches
$$ n \left( 1 - \frac{1}{1 + \gamma} - \frac{\delta \gamma}{2} \right), \qquad \text{where } \gamma = \left\lfloor \sqrt{\frac{2}{\delta} + \frac{1}{4}} - \frac{1}{2} \right\rfloor. $$
A plot of this curve against $\delta$ appears in Figure 1. Also shown in the figure are the distance of the code ($(1 - \delta)n$) and the classical error-correction bound ($((1 - \delta)/2)n$).
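The $m = 1$ case discussed above can be sketched concretely. The following is a minimal illustration of ours, not the algorithm of [20] verbatim: it interpolates $Q(x, y) = A(x)y + B(x)$ through the points by solving the homogeneous linear system over a small prime field, then recovers $p(x) = -B(x)/A(x)$ by polynomial division. The field size, variable names, and helper routines are all our own choices.

```python
# Welch-Berlekamp-style decoding sketch over GF(P), P a small prime (assumed).
P = 13

def solve_homogeneous(M):
    """One nonzero solution of M*v = 0 over GF(P), by Gaussian elimination."""
    rows, cols = len(M), len(M[0])
    M = [row[:] for row in M]
    pivot_of, r = {}, 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c] % P), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], P - 2, P)          # inverse via Fermat's little theorem
        M[r] = [v * inv % P for v in M[r]]
        for i in range(rows):
            if i != r and M[i][c] % P:
                f = M[i][c]
                M[i] = [(a - f * b) % P for a, b in zip(M[i], M[r])]
        pivot_of[c] = r
        r += 1
        if r == rows:
            break
    free = next(c for c in range(cols) if c not in pivot_of)
    v = [0] * cols
    v[free] = 1                                # set one free variable to 1
    for c, r_ in pivot_of.items():
        v[c] = (-M[r_][free]) % P
    return v

def wb_decode(points, k, e):
    """Decode assuming at most e errors; returns coeffs of p (low degree first).
    Unknowns: a_0..a_e (coeffs of A) and b_0..b_{e+k-1} (coeffs of B), with
    one homogeneous equation A(x_i)*y_i + B(x_i) = 0 per point."""
    M = [[y * pow(x, j, P) % P for j in range(e + 1)] +
         [pow(x, j, P) for j in range(e + k)]
         for x, y in points]
    v = solve_homogeneous(M)
    A, B = v[:e + 1], v[e + 1:]
    A = A[:max(i for i, c in enumerate(A) if c) + 1]   # strip leading zeros
    negB = [(-c) % P for c in B]
    quot = [0] * (len(negB) - len(A) + 1)
    inv = pow(A[-1], P - 2, P)
    for i in range(len(quot) - 1, -1, -1):             # divide -B by A
        quot[i] = negB[i + len(A) - 1] * inv % P
        for j, c in enumerate(A):
            negB[i + j] = (negB[i + j] - quot[i] * c) % P
    return quot
```

For instance, evaluating $p(x) = 2 + 3x + x^2$ at $x = 1, \ldots, 7$ over GF(13) and corrupting two of the seven values leaves $e = \lfloor (n-k)/2 \rfloor = 2$ errors, within the bound, and the decoder recovers the coefficients of $p$.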
6 Open questions
Given that the fundamental maximum likelihood decoding problem is NP-hard for a general linear code, the next direction to look to is a bounded distance decoding algorithm for every $[n, k, d]_q$ linear code. The bottleneck to such an approach is that in general we can't compute $d$ in polynomial time, due to the recent result of Vardy [18]. Thus the next step in this direction seems to suggest an application of approximation algorithms:
Open Problem 1 Given an $n \times k$ matrix $G$, approximate the distance $d$ of the code $C_G$ to within a factor of $\gamma(n)$.
The goal here is to find the smallest factor $\gamma(n)$ for which a polynomial time approximation algorithm exists. Currently no non-trivial (i.e., with $\gamma(n) = o(n)$) approximation algorithm is known. A non-trivial $\gamma(n)$ approximation algorithm would then suggest the following candidate for bounded distance decoding:
Open Problem 2 Given an $n \times k$ matrix $G$, a word $R \in \Sigma^n$ and an integer $t$, find all codewords within a Hamming distance of $t$ from $R$, or show that the minimum distance of the code is less than $t \gamma_1(n)$.
[Figure 1: fraction of errors ($e/n$) plotted against rate ($k/n$); curves: "New Correction Bound", "Diameter Bound" $(1 - x)$, and "Classical Correction Bound" $(1 - x)/2$.]
Fig. 1. Fraction of errors corrected by the algorithm from [16] plotted against the rate of the code. Also plotted are the distance of the code and the classical error-correction bound.
A similar problem is posed by Vardy [18] for $\gamma_1 = 2$. Here the hope would be to find the smallest value of $\gamma_1$ for which a polynomial time algorithm exists. While there is no immediate formal reason to believe so, it seems reasonable to believe that $\gamma_1$ will be larger than $\gamma$. Next we move to questions in the area of the design of efficient codes, motivated by the work of Spielman [15].
Open Problem 3 For every $\epsilon > 0$, design a family of $[n, \epsilon n, \delta n]_2$ codes $C_n$ so that the bounded distance decoding problem on $C_n$ with parameter $t \le \gamma n$ can be solved in linear time.
The goal above is to make $\gamma$ as large as possible for every fixed $\epsilon$. Spielman's result allows for the construction of codes which match the best known values of $\delta$ for any $[n, \epsilon n, \delta n]_2$ linear code. However, the value of $\gamma$ is still far from $\delta$ in these results.
We now move towards questions directed at decoding Reed-Solomon codes. We direct the reader's attention to Figure 1. Clearly every point above
the solid curve and below the distance bound of the code represents an open problem. In particular, we feel that the following version may be solvable in polynomial time:
Open Problem 4 Find a bounded distance decoding algorithm for an $[n, \delta n, (1 - \delta)n]_q$ Reed Solomon code that decodes up to $t \le (1 - \sqrt{\delta})n$ errors.
The motivation for this particular version is that in order to solve the bounded distance decoding problem, one needs to ensure that the number of outputs (i.e., the number of codewords within the given bound $t$) is polynomial in $n$. Such a bound does exist for the value of $t$ given above [6, 12], thus raising the hope that this problem may be solvable in polynomial time as well. Similar questions may also be raised about decoding multivariate polynomials. In particular, we don't have polynomial time algorithms matching the bounded distance decoding algorithm from [16], even for the case of bivariate polynomials. This, we feel, may be the most tractable problem here.
Open Problem 5 Find a bounded distance decoding algorithm for the bivariate polynomial code $C_{\mathrm{poly},2,\delta,n}$ that decodes up to $t \le (1 - \sqrt{2\delta})n^2$ errors.
References
1. S. Ar, R. Lipton, R. Rubinfeld and M. Sudan. Reconstructing algebraic functions from mixed data. SIAM Journal on Computing, to appear. Preliminary version in Proceedings of the 33rd Annual IEEE Symposium on Foundations of Computer Science, pp. 503-512, 1992.
2. E. R. Berlekamp. Algebraic Coding Theory. McGraw Hill, New York, 1968.
3. E. R. Berlekamp. Bounded distance +1 soft-decision Reed-Solomon decoding. IEEE Transactions on Information Theory, 42(3):704-720, May 1996.
4. E. R. Berlekamp, R. J. McEliece and H. C. A. van Tilborg. On the inherent intractability of certain coding problems. IEEE Transactions on Information Theory, 24:384-386, 1978.
5. R. DeMillo and R. Lipton. A probabilistic remark on algebraic program testing. Information Processing Letters, 7(4):193-195, June 1978.
6. O. Goldreich, R. Rubinfeld and M. Sudan. Learning polynomials with queries: The highly noisy case. Proceedings of the 36th Annual IEEE Symposium on Foundations of Computer Science, pp. 294-303, 1995.
7. D. Grigoriev. Factorization of polynomials over a finite field and the solution of systems of algebraic equations. Translated from Zapiski Nauchnykh Seminarov Leningradskogo Otdeleniya Matematicheskogo Instituta im. V. A. Steklova AN SSSR, Vol. 137, pp. 20-79, 1984.
8. E. Kaltofen. A polynomial-time reduction from bivariate to univariate integral polynomial factorization. In 23rd Annual Symposium on Foundations of Computer Science, pp. 57-64, 1982.
9. E. Kaltofen. Polynomial factorization 1987-1991. LATIN '92, I. Simon (Ed.), Springer LNCS, v. 583:294-313, 1992.
10. R. Lidl and H. Niederreiter. Introduction to Finite Fields and their Applications. Cambridge University Press, 1986.
11. F. J. MacWilliams and N. J. A. Sloane. The Theory of Error-Correcting Codes. North-Holland, Amsterdam, 1981.
12. J. Radhakrishnan. Personal communication, January 1996.
13. J. T. Schwartz. Fast probabilistic algorithms for verification of polynomial identities. Journal of the ACM, 27(4):701-717, 1980.
14. M. Sipser and D. A. Spielman. Expander codes. IEEE Transactions on Information Theory, 42(6):1710-1722, 1996.
15. D. A. Spielman. Linear-time encodable and decodable error-correcting codes. IEEE Transactions on Information Theory, 42(6):1723-1731, 1996.
16. M. Sudan. Decoding of Reed Solomon codes beyond the error-correction bound. Journal of Complexity, 13(1):180-193, March 1997. See also http://theory.lcs.mit.edu/~madhu/papers.html for a more recent version.
17. J. H. van Lint. Introduction to Coding Theory. Springer-Verlag, New York, 1982.
18. A. Vardy. Algorithmic complexity in coding theory and the minimum distance problem. Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, pp. 92-109, 1997.
19. B. L. van der Waerden. Algebra, Volume 1. Frederick Ungar Publishing Co., Inc., page 82.
20. L. Welch and E. R. Berlekamp. Error correction of algebraic block codes. US Patent Number 4,633,470, issued December 1986.
21. R. E. Zippel. Probabilistic algorithms for sparse polynomials. EUROSAM '79, Lecture Notes in Computer Science, 72:216-226, 1979.
Sharper Results on the Expressive Power of Generalized Quantifiers
Anil Seth
The Institute of Mathematical Sciences, C.I.T. Campus, Taramani, Madras 600113, India
e-mail: [email protected]
Abstract. In this paper we improve on some results of [3] and extend them to the setting of implicit definability. We show a strong necessary condition on classes of structures on which PSPACE can be captured by extending PFP with a finite set of generalized quantifiers. For IFP and PTIME the limitation of the expressive power of generalized quantifiers is shown only on some specific nontrivial classes. These results easily extend to the implicit closure of these logics. In fact, we obtain a nearly complete characterization of classes of structures on which IMP(PFP) can capture PSPACE if finitely many generalized quantifiers are also allowed. We give a new proof of one of the main results of [3], characterizing the classes of structures on which $L^{\omega}_{\infty\omega}(Q)$ collapses to $FO(Q)$, where $Q$ is a finite set of generalized quantifiers. This proof easily generalizes to the case of implicit definability, unlike the quantifier elimination argument of [3], which does not adapt easily to the implicit definability setting. This result is then used to show the limitation of the expressive power of the implicit closure of $L^{\omega}_{\infty\omega}(Q)$. Finally, we adapt the technique of quantifier elimination due to Scott Weinstein, used in [3], to show that IMP($L^k(Q)$)-types can be isolated in the same logic.
1 Introduction
Since the expressive power of first order logic is quite limited on finite structures, some natural fixed point extensions of it, such as least fixed point (LFP) and partial fixed point (PFP) logics, have been studied in finite model theory. LFP and PFP capture PTIME and PSPACE respectively on classes of ordered structures. However, on unordered structures even a powerful extension of these logics, $L^{\omega}_{\infty\omega}$, fails to include all PTIME queries. In fact, it is an open question whether there is a logic which captures PTIME on all structures. One way of extending the expressive power of a logic is by adding generalized quantifiers to it. This is a uniform way of enriching a logic by an arbitrary property without going to second order logic. In [3], it was shown that no finite set of generalized quantifiers can be added to IFP to capture exactly PTIME on all structures, and similarly for PFP and PSPACE. However, this result was proved only for those classes of structures which realize, for each $k$, a uniformly bounded number of $k$-automorphism classes in each structure. These classes are called "trivial classes" in [3]. An example of such a class is the class of complete structures in a given vocabulary. Nevertheless, most of the interesting classes of structures do not satisfy this condition, and it remains open whether on such classes an extension of fixed point logics by finitely many generalized quantifiers can capture the corresponding complexity class. For example, consider the class of complete binary trees studied in [7]. It does not follow from [3] that for any finite set of generalized quantifiers $Q$, $PFP(Q) \ne$ PSPACE on the class of complete binary trees.
In this paper, we prove a more general result which shows that any extension of PFP by finitely many generalized quantifiers cannot capture PSPACE on any recursively enumerable class of structures which, roughly speaking, cannot realize polynomially many automorphism types. As an example application of this result, it follows that on the class of complete binary trees mentioned above, for any finite set of generalized quantifiers $Q$, $PFP(Q) \ne$ PSPACE. While we cannot prove a general theorem similar to the above result for IFP extended with generalized quantifiers and PTIME, for some special classes such as complete binary trees we show a similar limitation for any finite set of generalized quantifiers.
Another main result of [3] is a characterization of the collapse of $L^{\omega}_{\infty\omega}(Q)$ to $FO(Q)$ on a class of structures, in terms of the boundedness of $L^k(Q)$-types in the structures of this class. This is proved using a novel technique of quantifier elimination which is due to Scott Weinstein. We provide another proof of this result without using the quantifier elimination argument; instead we obtain it by generalizing the quotient structure construction of [2, 1] to the presence of generalized quantifiers.
Next we turn to implicit definability. The implicit closure of various logics on subclasses of structures has been studied in recent years by defining a notion of partial queries (see [5]). Partial queries implicitly definable in various logics far exceed the expressive power of fixed point logics. For instance, IMP(PFP) captures PSPACE on the class of rigid structures and IMP($L^{\omega}_{\infty\omega}$) can express every query on rigid structures. This raises the question whether IMP(LFP) or IMP(PFP), possibly in the presence of finitely many generalized quantifiers, can capture the corresponding complexity classes. We answer this question in the negative. The proof of our previous theorem easily extends to show that even IMP($PFP(Q)$), where $Q$ is a finite set of generalized quantifiers, cannot capture PSPACE on any class of structures which does not realize polynomially many automorphism types. In the case of IMP($PFP(Q)$) a converse of this result also holds, if we consider queries only up to some given arity.
Next we define the notion of $k$-types for IMP($L^k(Q)$) and prove a result analogous to the one in [3], characterizing the collapse of IMP$_{\infty}$($L^{\omega}_{\infty\omega}(Q)$) to IMP($FO(Q)$) over a class of structures in terms of the boundedness of IMP($L^k(Q)$)-types, for all $k$, over this class. Here, IMP$_{\infty}$($L^{\omega}_{\infty\omega}(Q)$) is a closure of $L^{\omega}_{\infty\omega}(Q)$ under implicit definability which is stronger than IMP($L^{\omega}_{\infty\omega}(Q)$): it allows countably many query variables, whereas IMP($L^{\omega}_{\infty\omega}(Q)$) allows only finitely many. As a corollary to this result we get that for any finite set $Q$ of PTIME computable generalized quantifiers, IMP$_{\infty}$($L^{\omega}_{\infty\omega}(Q)$) cannot express all PTIME queries on the class of complete structures. The above characterization theorem itself is proved by extending our proof of the theorem characterizing the collapse of $L^{\omega}_{\infty\omega}(Q)$ to $FO(Q)$. Its proof makes use of our quotient structure construction in the presence of generalized quantifiers. This justifies our presenting a new proof of the already known theorem of [3] characterizing the collapse of $L^{\omega}_{\infty\omega}(Q)$ to $FO(Q)$. We do not know how to extend the quantifier elimination argument of [3] to prove the above characterization theorem. The two techniques, the quantifier elimination argument of [3] and the quotiented structure construction of this paper, appear to have different limitations and therefore seem incomparable. We cannot prove the isolation of IMP($L^k(Q)$)-types using the quotiented structure construction. In the end, we provide a non-obvious adaptation of the quantifier elimination argument to isolate IMP($L^k(Q)$)-types in the same logic. This extension is not obvious because, unlike in the case of an $L^k(Q)$ formula, the subformulae of a sentence defining a query implicitly may not define any query, and hence an inductive argument does not work. This isolation theorem, however, is not sufficient to prove our characterization theorem, because we cannot show an upper bound on the rank of the IMP($L^k(Q)$) formulae isolating IMP($L^k(Q)$)-types within a structure in terms of the number of types realized in the structure.
2 Preliminaries
A vocabulary $\sigma$ is a finite sequence $\langle R_1, \ldots, R_m \rangle$ of relation symbols of fixed arities. A structure $\mathcal{A} = \langle A, R_1^{\mathcal{A}}, \ldots, R_m^{\mathcal{A}} \rangle$ consists of a set $A$, called the universe of $\mathcal{A}$, and relations $R_i^{\mathcal{A}} \subseteq A^{r_i}$, where $r_i$ is the arity of the relation symbol $R_i$, $1 \le i \le m$. We shall assume our structures to be finite and classes of structures to be closed under isomorphism. A Boolean query $Q$ over a class $C$ of structures is a mapping from structures in $C$ to $\{0, 1\}$ such that if $\mathcal{A}$ is isomorphic to $\mathcal{B}$ then $Q(\mathcal{A}) = Q(\mathcal{B})$. For any positive integer $k$, a $k$-ary query over $C$ is a mapping which associates to every structure $\mathcal{A}$ in $C$ a $k$-ary relation on $A$. Again, if $f$ is an isomorphism from $\mathcal{A}$ to $\mathcal{B}$ then $f$ should also be an isomorphism from $\langle \mathcal{A}, Q(\mathcal{A}) \rangle$ to $\langle \mathcal{B}, Q(\mathcal{B}) \rangle$.
2.1 Logics with Fixed Point Operators
Let $\varphi(\bar{z}, x_1, \ldots, x_n, S)$ be a first order formula in the vocabulary $\sigma \cup \{S\}$, where $S$ is an $n$-ary relation symbol not in $\sigma$. Let $\mathcal{A}$ be a structure; for any assignment $\bar{c}$ of elements of $A$ to the variables $\bar{z}$, the formula $\varphi$ gives rise to an operator $\Phi$ from $n$-ary relations on the universe $A$ of $\mathcal{A}$ to $n$-ary relations on $A$, as follows: $\Phi(S) = \{(a_1, \ldots, a_n) : \mathcal{A} \models \varphi(\bar{c}, a_1, \ldots, a_n, S)\}$ for every $n$-ary relation $S$ on $A$. The variables $\bar{z}$ are parameter variables. Every such operator can be iterated, giving rise to the sequence of stages $\Phi^m$, $m \ge 1$, where $\Phi^1 = \Phi(\emptyset)$ and $\Phi^{l+1} = \Phi(\Phi^l)$. Each $\Phi^m$ is an $n$-ary relation on $A$.
If the above formula $\varphi$ is positive in $S$, that is, each occurrence of $S$ in $\varphi$ is within an even number of negations, then the above sequence $\Phi^m$, $m \ge 1$, converges; that is, for each structure $\mathcal{A}$ there is an $m_0$ such that $\Phi^{m_0} = \Phi^m$ for all $m \ge m_0$. We define $\varphi^{\infty} = \Phi^{m_0}$. Least fixed point logic (LFP) arises by closing first order logic (FO) under the following new rule, called the least fixed point rule, to form new formulae: if $\varphi(\bar{z}, x_1, \ldots, x_n, S)$ is a formula in which $S$ occurs positively, then $lfp(S, x_1, \ldots, x_n)\varphi(y_1, \ldots, y_n)$ is also a formula, with $\bar{z}$ and $y_1, \ldots, y_n$ as its free variables. The meaning of this formula on a given structure $\mathcal{A}$, for an assignment $\bar{c}$ of elements of $A$ to the variables $\bar{z}$, is as follows: $lfp(S, x_1, \ldots, x_n)\varphi(y_1, \ldots, y_n)$ is true iff $(y_1, \ldots, y_n) \in \varphi^{\infty}$, where $\varphi^{\infty}$ is as defined above. The partial fixed point (PFP) logic is defined in the same way as LFP, except that the $pfp(S, x_1, \ldots, x_n)\varphi(y_1, \ldots, y_n)$ construct is available for each formula $\varphi(x_1, \ldots, x_n, S)$, not just for $\varphi$ in which $S$ occurs positively. The meaning of the $n$-ary relation $pfp(S, x_1, \ldots, x_n)\varphi$ with respect to the variables $y_1, \ldots, y_n$ is defined to be $\Phi^{m_0}$ if there is an $m_0$ such that $\Phi^{m_0} = \Phi^{m_0 + 1}$, and is defined to be $\emptyset$ if there is no such $m_0$.
While the definition of the fixed point construct allows for parameter variables, it is easy to eliminate them by suitably enlarging the arity of the relation symbol in the fixed point construct; see [4, Lemma 7.1.10 (b)]. So every LFP (PFP) formula is logically equivalent to an LFP (PFP) formula in which all instances of fixed point construction involve no parameter variables, that is, in the above definition the variables $\bar{z}$ are absent. In the following we will take advantage of this observation and will, without loss of generality, use the simplified definition of LFP (PFP).
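The staged iteration $\Phi^1, \Phi^2, \ldots$ can be sketched directly. The following illustration (ours, not from the paper) computes the least fixed point of the standard positive formula defining transitive closure, $\varphi(x_1, x_2, S) \equiv E(x_1, x_2) \vee \exists z\,(E(x_1, z) \wedge S(z, x_2))$, on a small directed graph:

```python
def lfp(phi):
    """Iterate the operator Phi from the empty relation until the stages
    converge; for a positive (hence monotone) formula this is phi^infinity."""
    stage = set()
    while True:
        nxt = phi(stage)
        if nxt == stage:
            return stage
        stage = nxt

# A sample structure: the directed path 1 -> 2 -> 3 -> 4.
V = {1, 2, 3, 4}
E = {(1, 2), (2, 3), (3, 4)}

def phi(S):
    """The operator induced by E(x1,x2) or exists z (E(x1,z) and S(z,x2))."""
    return {(a, b) for a in V for b in V
            if (a, b) in E or any((a, z) in E and (z, b) in S for z in V)}

closure = lfp(phi)   # the transitive closure of E
```

On a structure with $|A| = n$, the iteration of an LFP formula stabilizes after at most $n^{\text{arity}}$ stages, since the stages grow monotonically; a PFP iteration, by contrast, need not be monotone and may cycle, which is exactly the case the "no such $m_0$" clause handles.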
2.2 Generalized Quantifiers
Generalized quantifiers were first studied by Lindström to increase the expressive power of first order logic without using second order quantifiers. In recent years generalized quantifiers have been used in finite model theory to extend the expressive power of various fixed point logics and to show limitations on the expressive power that can be obtained by means of such extensions. In this section we provide a standard introduction to generalized quantifiers along with basic definitions, as in [3].
Let $C$ be a class of structures over a vocabulary $\tau = \langle R_1, \ldots, R_m \rangle$ (where $R_i$ has arity $n_i$) that is closed under isomorphism. We associate with $C$ the generalized quantifier $Q_C$. For a logic $L$, define the extension $L(Q_C)$ by closing the set of formulas of $L$ under the following formula formation rule: if $\psi_1, \ldots, \psi_m$ are formulas in $L(Q_C)$ and $\bar{x}_1, \ldots, \bar{x}_m$ are tuples of variables of arities $n_1, \ldots, n_m$ respectively, then $Q_C \bar{x}_1, \ldots, \bar{x}_m (\psi_1, \ldots, \psi_m)$ is a formula of $L(Q_C)$, with all occurrences in $\psi_i$ of the variables among $\bar{x}_i$ bound. The semantics of the quantifier $Q_C$ is given by: $\mathcal{A}, s \models Q_C \bar{x}_1, \ldots, \bar{x}_m (\psi_1(\bar{x}_1, \bar{y}), \ldots, \psi_m(\bar{x}_m, \bar{y}))$ iff $(A, \psi_1^{\mathcal{A}}(s), \ldots, \psi_m^{\mathcal{A}}(s)) \in C$, where $A$ is the domain of $\mathcal{A}$ and $\psi_i^{\mathcal{A}}(s) = \{\bar{t} \in A^{n_i} \mid \mathcal{A} \models \psi_i(\bar{t}, s)\}$.
The type of the generalized quantifier $Q_C$ as above is defined to be $\langle n_1, \ldots, n_m \rangle$ and its arity is defined to be $\max\{n_1, \ldots, n_m\}$. We say that $Q_C$ is in the complexity class DSPACE[$S(n)$] or DTIME[$t(n)$] if there is an algorithm which, given the encoding of any structure over $\tau$, decides in DSPACE[$S(n)$] or DTIME[$t(n)$] respectively, in the size of the encoding of the input, whether the given input structure is in $C$.
Examples:
1. The usual first order existential quantifier is associated with the class $\{(A, U) \mid U \subseteq A, U \ne \emptyset\}$.
2. The counting quantifier $C_i$ is associated with the class $\{(A, U) \mid U \subseteq A, |U| \ge i\}$. Both these quantifiers are unary and computable in linear time.
3. The planarity quantifier $P$ is associated with the class of planar graphs, $\{(A, R) \mid R \subseteq A \times A, (A, R) \text{ is planar}\}$.
4. The quantifier multiple, $M$, is associated with the class $\{(A, U_1, U_2) \mid U_1, U_2 \subseteq A, |U_1| = k \cdot |U_2| \text{ for some } k \in \mathbb{N}\}$.
In this paper we will consider the logics $FO(Q)$, $PFP(Q)$ and $L^{\omega}_{\infty\omega}(Q)$. As is customary in the presence of generalized quantifiers, we will consider $IFP(Q)$ instead of $LFP(Q)$, because syntactic restrictions on $LFP(Q)$ formulae guaranteeing the monotonicity of the relations constructed during fixed point iterations are not obvious, and the semantics of LFP formulae without monotonicity conditions is not defined.
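The semantics above can be sketched operationally: a generalized quantifier is just a membership test for its defining class $C$, applied to the relations defined by the subformulae. A small illustration of ours (unary quantifiers only, with the counting quantifier $C_3$ from example 2 as the class):

```python
def holds(C, universe, formulas, assignment):
    """A |= Q_C x1 ... xm (psi_1, ..., psi_m): compute the set defined by each
    psi_i (unary here, for simplicity) and test membership in the class C,
    represented as a Boolean predicate on (A, U_1, ..., U_m)."""
    sets = [frozenset(a for a in universe if psi(a, assignment))
            for psi in formulas]
    return C(universe, *sets)

# The counting quantifier C_3 is given by {(A, U) | U subset of A, |U| >= 3}.
def count3(universe, U):
    return len(U) >= 3

# psi(x) := "x is even"; the second argument is the (here unused) assignment s.
even = lambda a, s: a % 2 == 0
```

With universe $\{0, \ldots, 5\}$ the set defined by "even" has three elements, so $C_3$ holds; with universe $\{0, \ldots, 3\}$ it has only two, so it fails. Deciding membership in $C$ is exactly where the complexity of the quantifier, in the DSPACE/DTIME sense above, enters.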
2.3 $L^k(Q)$-Types
By $L^k$ we mean the fragment of first order logic which uses at most $k$ distinct variables (including both free and bound variables). Similarly, $L^k(Q)$ denotes the $k$-variable fragment of $FO(Q)$. In the following, we will assume $Q$ to be an arbitrary but fixed finite set of generalized quantifiers. The idea of $L^k$-types was introduced in [2, 1]; in [3] this notion is generalized to define $L^k(Q)$-types. We briefly reproduce the relevant definitions and results from their work.
Definition 1. Let $\mathcal{A}$ be a structure and let $s = \langle a_1, \ldots, a_l \rangle$ be a sequence of elements from $A$, where $l \le k$. The $L^k(Q)$-type of $s$ is the set of all $L^k(Q)$-formulae $\phi(x_1, \ldots, x_l)$ such that $\mathcal{A} \models \phi(a_1, \ldots, a_l)$.
Note that $k$-types induce an equivalence relation on the set $\{(\mathcal{A}, a_1, \ldots, a_k) \mid \mathcal{A} \text{ a finite structure}, (a_1, \ldots, a_k) \in A^k\}$. By the set of $k$-types realized in a class $C$ of structures we mean the equivalence classes of the above relation where $\mathcal{A}$ is a structure in $C$. By the $k$-types realized in a given structure $\mathcal{A}$ we mean the equivalence classes of $k$-tuples of elements of $A$ induced by the above relation. An interesting fact about $L^k(Q)$-types is that they can be isolated in $L^k(Q)$. This is stated more precisely in the following lemma.
Lemma 1. [3] Given $(\mathcal{A}, a_1, \ldots, a_k)$, there is an $L^k(Q)$ formula $\phi(x_1, \ldots, x_k)$ such that for all $(\mathcal{B}, b_1, \ldots, b_k)$: $\mathcal{B} \models \phi(b_1, \ldots, b_k)$ iff $(\mathcal{A}, a_1, \ldots, a_k)$ and $(\mathcal{B}, b_1, \ldots, b_k)$ have the same $L^k(Q)$-type.
2.4 Implicit De nability on Finite Structures Let C be a class of nite structures. De nition 2. Let L be a logical language over vocabulary . Let (R1; : : :; Rn) be a formula in L for some n and R1; : : :; Rn 62 . implicitly de nes a query over C , in language L if for every structure A 2 C there is exactly one sequence RA1 ; : : :; RAn of relations over A for which (RA1 ; : : :; RAn) is true in A. The query de ned by R1 is said to be a principal query and queries de ned by R2; : : :; Rn are said to be auxiliary queries. IMP(L) is the set of queries which are principal queries de ned by a formula as above.
Note that it follows from the uniqueness of the sequence RA1 ; : : :; RAn in the de nition above that relations RA1 ; : : :; RAn are closed under automorphisms of A. Therefore these relations actually de ne a sequence of queries. Implicit de nability, as a logic over nite structures, was rst studied in [6]. We have given the notion of implicit de nability relative to a class of structures. In standard logic literature C is taken to be the class of all structures. However, in nite model theory C is often taken to be the class of interest (a proper subclass of nite structures) and the resulting query is a partial query (even over nite structures), [5]. For de ning a Boolean query, we may represent \true" by nonempty relation and \false" by empty relation. Alternatively, we may represent \true" by full relation (all tuples included), and \false" by empty relation.
3 Evaluating a Formula Efficiently on Structures with Few Automorphism Types

Let us recall the following definition of k-automorphism types from [3], which will play a crucial role in this section.
Definition 3. Given a structure A, an equivalence relation ≅_k on A^k is defined as follows: (a_1, …, a_k) ≅_k (b_1, …, b_k) if there is an automorphism f of the structure A such that f(a_i) = b_i for 1 ≤ i ≤ k. The equivalence classes of A^k under this relation are called the k-automorphism types of A.
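For very small structures, Definition 3 can be carried out by brute force: enumerate all automorphisms, then group the k-tuples into orbits. The following sketch is an illustration of ours, not an algorithm from the paper; the example structure (a path on three vertices) is hypothetical.

```python
# Illustrative brute-force computation of k-automorphism types
# (Definition 3) for a tiny structure with one binary relation E.
from itertools import permutations, product

def automorphisms(universe, E):
    """All permutations f of the universe with E(a,b) iff E(f(a),f(b))."""
    autos = []
    for perm in permutations(universe):
        f = dict(zip(universe, perm))
        if all(((f[a], f[b]) in E) == ((a, b) in E)
               for a in universe for b in universe):
            autos.append(f)
    return autos

def k_automorphism_types(universe, E, k):
    """Partition universe^k into orbits under the automorphism group."""
    autos = automorphisms(universe, E)
    seen, types = set(), []
    for tup in product(universe, repeat=k):
        if tup in seen:
            continue
        orbit = {tuple(f[a] for a in tup) for f in autos}
        seen |= orbit
        types.append(orbit)
    return types

# An undirected path 1-2-3: the only non-trivial automorphism reverses it.
universe = [1, 2, 3]
E = {(1, 2), (2, 1), (2, 3), (3, 2)}
print(len(k_automorphism_types(universe, E, 1)))  # 2 types: {1,3} and {2}
print(len(k_automorphism_types(universe, E, 2)))  # 5 types of pairs
```

This exhaustive method is exponential and only meant to make the definition concrete; the point of this section is precisely that the *number* of such types, not the cost of finding them, governs evaluation complexity.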
In this section we present a new algorithm for evaluating a PFP formula on a structure in the presence of generalized quantifiers; it is more efficient than the obvious method of evaluation when the number of automorphism types realized in the structure is small. An efficient method, in the same sense, for evaluating a PFP formula was presented in [8] by constructing the quotiented structure A/≅ for a given structure A. The contribution of this section is to obtain efficient evaluation of a PFP formula in the presence of generalized quantifiers. It is not clear how to construct a quotiented structure in the presence of generalized quantifiers; we solve that problem later in this paper, but the quotiented structure constructed there grows exponentially in the number of automorphism types of the original structure unless we assume specific properties of the generalized quantifiers. So the approach of constructing a quotiented structure cannot solve our problem here. We instead present a more direct evaluation method that meets our objectives, exploiting the fact that all the intermediate sets constructed during the evaluation are closed under suitable automorphisms.

As observed towards the end of Section 2.1, we need not consider parameter variables in the construction of fixed-point formulae. This observation remains valid in the presence of generalized quantifiers, by an identical argument. So in the following, without loss of generality, we assume that all fixed-point constructions are without parameter variables. By a k-variable IFP (PFP) formula we therefore mean a formula using at most k distinct variables and without parameter variables in IFP (PFP) definitions. How exactly we define the k-variable fragment of PFP(Q) is not critical for us; we just need a stratification of PFP(Q) formulae that yields a convenient complexity bound for evaluating a formula at each level.

Lemma 2. Let σ be a fixed vocabulary, and let k be greater than the maximum arity of the relation symbols in σ. Let Q = (Q_i)_{i∈N} be a family of generalized quantifiers which have arity r and are computable in DSPACE(n^s). For any k-variable PFP(Q) formula φ(x̄), there is a constant c (independent of n, m) such that on all structures A, φ(x̄) can be evaluated in c·(n + n·m)^{r·s+1} space, where n is the size of the structure (|A|) and m is the number of k-automorphism types of A.

Proof. The proof is by induction on the structure of the formula φ. To make the induction step go through, we prove a stronger statement in which φ may also have finitely many relational variables occurring free. Details are given in the full version. □
4 Some Necessary Conditions for the Existence of Q with PFP(Q) = PSPACE

We begin this section by combining the results of the previous section with the diagonalization arguments of [8] to obtain some necessary conditions on a class of finite structures for the existence of a suitable set Q of generalized quantifiers such that PFP(Q) = PSPACE on this class of structures.

Theorem 1. Let C be a class of finite structures and let Q = (Q_i)_{i∈N} be the family of all generalized quantifiers of arity r computable in DSPACE(n^s), for some fixed r, s. If there is a number l such that for all k ≥ l, for all natural numbers i, and for all real numbers ε > 0 there is a structure A ∈ C such that the number of k-automorphism types of A is < |A|^ε but the number of l-automorphism types of A is > i, then PSPACE(C) ≠ PFP(Q)(C).
Proof. This can be proved using Lemma 2 above and the diagonalization argument of [8, Theorem 3]. □
As an immediate application of the theorem above, consider the example of complete binary trees (CBTs), originating from [7].

Example 1. Let Q be a finite set of PSPACE-computable generalized quantifiers. On any infinite class of complete binary trees, PFP(Q) ≠ PSPACE.

In [8, page 362] we remarked that there is an O(n^p) time algorithm, where p is independent of k, to find the k-automorphism class of a k-tuple in a complete binary tree. By examining the proofs of Lemma 2 and Theorem 1 we see that the diagonalization argument there can be adapted, over any infinite class of complete binary trees, to create a PTIME query which diagonalizes all IFP(Q) formulae on this class. So we have the following generalization of an observation in [8, page 362].
Example 2. Let Q be a finite set of PTIME-computable generalized quantifiers. On any infinite class of complete binary trees, IFP(Q) ≠ PTIME.

Note that the above examples cannot be deduced from the results of [3]. The next theorem applies even to trivial classes, in the sense of [3, Definition 4.6], provided they are recursively enumerable.

Theorem 2. Let C be a recursively enumerable class of finite structures and let Q = (Q_i)_{i∈N} be the family of all generalized quantifiers of arity r computable in DSPACE(n^s), for some fixed r, s. If for every k and all real numbers ε > 0 there are infinitely many structures A_i ∈ C, i = 1, 2, 3, …, such that the number of k-automorphism types of A_i is < |A_i|^ε, then PSPACE(C) ≠ PFP(Q)(C). In fact, there is a Boolean query in PSPACE(C) but not in PFP(Q)(C).

Proof. This can be proved using Lemma 2 above and the diagonalization argument of [8, Theorem 4]. □

As an application of Theorem 2 we have the following.

Example 3. On any infinite recursively enumerable class of cliques, PSPACE ≠ PFP(Q) for any set Q of bounded-arity DSPACE(n^s) generalized quantifiers, for a given s.

Using the fact that a representation of the k-automorphism types in a structure with one binary relation symbol interpreted as an equivalence relation can be constructed efficiently, we can easily deduce the following.

Example 4. On any infinite recursively enumerable class of equivalence relations in which any structure A has equivalence classes of at most O(log |A|) distinct cardinalities, PTIME ≠ IFP(Q) for any set Q of bounded arity
DTIME(n^s) generalized quantifiers, for a given s.

The following lemma shows that the results of Theorems 1 and 2 cannot be improved in at least some ways.

Lemma 3. Let C be a class of finite structures. If there is a natural number k and a real number ε > 0 such that for all structures A ∈ C the number of k-automorphism types of A is ≥ |A|^ε, then for each l there is a PSPACE-computable query Q_l such that, for all l-ary queries, PSPACE(C) = PFP_{Q_l}(C). Here PFP_{Q_l} is the language PFP augmented with an additional built-in relation symbol Q_l, which in any structure A is interpreted as Q_l(A).

Proof. Easy; will be given in the final version. □

Notice that while it is clear that any Boolean query can be represented as a generalized quantifier, it is not clear that this can be done for queries of arity > 0 as well. So we do not get a real converse to Theorem 2. This situation will change if we also allow implicit definitions, as we shall see later.
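The intuition behind Examples 1 and 2, namely that complete binary trees realize very few automorphism types, can be checked by brute force on a tiny instance. The sketch below is ours (not from [7] or [8]); the parent-to-child edge encoding of the tree is a hypothetical choice.

```python
# Illustrative check that a depth-2 complete binary tree has a large
# automorphism group but only 3 one-automorphism types (its 3 levels).
from itertools import permutations, product

# Nodes 1..7 in heap order: node i has children 2i and 2i+1.
universe = list(range(1, 8))
E = ({(i, 2 * i) for i in range(1, 4)} |
     {(i, 2 * i + 1) for i in range(1, 4)})      # parent -> child edges

autos = []
for p in permutations(universe):
    f = dict(zip(universe, p))
    if all(((f[a], f[b]) in E) == ((a, b) in E)
           for a in universe for b in universe):
        autos.append(f)

def orbit_count(k):
    """Number of k-automorphism types of the tree."""
    seen, count = set(), 0
    for t in product(universe, repeat=k):
        if t not in seen:
            seen |= {tuple(g[a] for a in t) for g in autos}
            count += 1
    return count

# 2^3 = 8 automorphisms (independently swap children at each inner node),
# yet only 3 one-types: the root, the middle level, and the leaves.
print(len(autos), orbit_count(1))
```

A depth-d tree has n = 2^{d+1} − 1 elements but only d + 1 one-automorphism types, i.e. O(log n): exactly the "few types" situation in which Theorems 1 and 2 apply.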
5 L^k(Q) Invariant

In this section we associate with each structure A and finite set Q of generalized quantifiers an object called its L^k(Q) invariant, such that if A and B have the same invariant then their L^k(Q) theories are the same. The invariant is an extension of the structure quotiented w.r.t. the type equivalence relation of [1,2]. It is, however, not quite a first-order structure: in order to keep information about the generalized quantifiers, the quotiented structure also has some second-order functions over its domain. We begin by recalling an elementary observation from [3].
Observation 1. For any structure A, any finite set Q of generalized quantifiers and any k, there are formulae ξ_1(x_1, …, x_k), …, ξ_m(x_1, …, x_k) which partition A^k such that each ξ_i(x_1, …, x_k) isolates an L^k(Q)-type in A.

Proof. Let A realize m distinct L^k(Q)-types of k-tuples. We can number these types 1, 2, …, m. By definition, for any ā_1, ā_2 in different classes (say of types i and j respectively) we have an L^k(Q)-formula ξ_{i,j}(x_1, …, x_k) such that A ⊨ ξ_{i,j}(ā_1) and A ⊨ ¬ξ_{i,j}(ā_2). Let ξ_i(x_1, …, x_k) = ∧_{j≤m, j≠i} ξ_{i,j}(x̄). □

Let σ be a vocabulary consisting of relation symbols R_1, …, R_m, and let Q be the finite set of generalized quantifiers under consideration. To avoid separate treatment of the standard first-order quantifier ∃, we assume it is included in the set Q as the unary generalized quantifier of Example 1 in Section
2.2. Let k be greater than the arities of the relations R_1, …, R_m and the arities of the quantifiers in Q. [Note that if we are considering k-variable logic then we can replace any relation R_i of arity ≥ k by several relations (but only finitely many) of arity ≤ k, depending on the patterns of variable repetitions that can occur when we place k variables as arguments to R_i, in such a way that for any L^k formula in the old vocabulary there is an equivalent formula in the new vocabulary. A similar transformation can be applied to generalized quantifiers of arity > k, yielding several (but only finitely many) new generalized quantifiers of arity ≤ k by considering all patterns of variable repetitions in the sequence of relations in the class associated with the generalized quantifier.]

For a quantifier Q of type ⟨n_1, …, n_j⟩ we define a set S_Q of j-tuples of sequences as follows: S_Q = {⟨s_1, …, s_j⟩ | s_i is a (k − n_i)-length sequence of distinct variables from (x_1, …, x_k)}.

The invariant will be a structure over a vocabulary σ′. Vocabulary σ′ consists of symbols =′, R′_1, …, R′_m, P_{s_1}, …, P_{s_j}, (f_s^Q)_{Q∈Q, s∈S_Q}, where =′, R′_1, …, R′_m are unary relation symbols and P_{s_1}, …, P_{s_j}, j = k^k, are binary relation symbols. For each quantifier Q of type ⟨n_1, …, n_j⟩ and s ∈ S_Q, f_s^Q is a function from P^j to P, where P is the power set of the domain.

Given a structure A, its L^k(Q) invariant A/≡_k^Q is defined as (A^k/≡_k^Q, =′, R′_1, …, R′_m, P_{s_1}, …, P_{s_j}, (f_s^Q)_{Q∈Q, s∈S_Q}), where:
– =′([a_1, …, a_k]) iff a_1 = a_2.
– R′_i([a_1, …, a_k]) iff R_i(a_1, …, a_l), where l is the arity of R_i.
– For a sequence s = ⟨i_1, …, i_k⟩ of integers from {1, …, k}, P_s is defined as P_s = {([a_1, …, a_k], [a_{i_1}, …, a_{i_k}]) | a_1, …, a_k ∈ A}.
– If Q is a quantifier of type ⟨n_1, …, n_j⟩ and s = ⟨s_1, …, s_j⟩ ∈ S_Q, then f_s^Q : [P(A^k/≡_k^Q)]^j → P(A^k/≡_k^Q) is defined as follows. Given ⟨I_1, …, I_j⟩, where each I_l ⊆ A^k/≡_k^Q, let ψ_l(t_l, s_l), 1 ≤ l ≤ j, be a formula over σ, where t_l is the sequence of those of x_1, …, x_k not in s_l, such that I_l = {[a_1, …, a_k] | A ⊨ ψ_l(a_1, …, a_k)}. Each ψ_l can be constructed from the ξ_1, …, ξ_m of Observation 1 above. Let φ(x_1, …, x_k) = Q t_1, …, t_j (ψ_1(t_1, s_1), …, ψ_j(t_j, s_j)). Then f_s^Q(I_1, …, I_j) is defined to be the set of types of the tuples x_1, …, x_k for which φ(x_1, …, x_k) is true; that is, f_s^Q(I_1, …, I_j) = {[a_1, …, a_k] | A ⊨ φ(a_1, …, a_k)}. Note that the set of tuples satisfying φ is ≡_k^Q-closed, by the definition of ≡_k^Q, as φ is an L^k(Q) formula.

Given an FO(Q) formula φ(z_1, …, z_k) constructed using the variables x_1, …, x_k (z_1, …, z_k being a permutation of x_1, …, x_k), we define a formula φ*(z) over σ′ as follows. In general φ*(z) will be a formula of higher-order logic, not of first-order logic.
– If φ is z_i = z_j, then φ*(z) = ∃y(P_s(z, y) ∧ =′(y)), where s is a sequence chosen so that s = ⟨i, j, …⟩.
– If φ is R_j(z_{i_1}, …, z_{i_m}), then φ*(z) = ∃y(P_s(z, y) ∧ R′_j(y)), where s is a sequence chosen so that s = ⟨i_1, …, i_m, …⟩.
– If φ is ¬φ_1, then φ*(z) = ¬φ_1*(z). If φ is φ_1 ∧ φ_2, then φ* = φ_1* ∧ φ_2*.
We now consider the case of generalized quantifiers (which, by our assumption about Q, also includes the case of ∃x). Let φ = Q y_1 … y_j (ψ_1(y_1, u_1), …, ψ_j(y_j, u_j)), where Q is of type ⟨n_1, …, n_j⟩. All variables in φ are among {x_1, …, x_k}. Without loss of generality we may assume that the length of each u_i is k − n_i (if it is less, we can pad it with dummy variables). Let s be the sequence ⟨u_1, …, u_j⟩. φ*(z) is defined as ∃y[P_h(y, z) ∧ y ∈ f_s^Q({z̄ | ψ_1*(z̄)}, …, {z̄ | ψ_j*(z̄)})], where z_1, …, z_k = x_{i_1}, …, x_{i_k} and h = ⟨i_1, …, i_k⟩. Note that in this case φ*(z) is not a first-order formula.

Lemma 4. Let z_1, …, z_k be a permutation of x_1, …, x_k. Let φ(z_1, …, z_k) be an L^k(Q) formula constructed using only the variables x_1, …, x_k. Then for all a_1, …, a_k ∈ A, A ⊨ φ(a_1, …, a_k) iff A/≡_k^Q ⊨ φ*([a_1, …, a_k]).

Proof. This is proved by induction on the structure of φ. Details are omitted from this extended abstract. □

By Lemma 4 we get that if two structures A, B have different L^k(Q) theories then their invariants are different. It is also interesting to note the following converse, though we do not need it for our later results.

Lemma 5. If two structures A, B have the same L^k(Q) theories then their invariants are also the same (up to isomorphism).
Proof. Easy; given in the full version. □

Remarks:
1. The size of the invariant defined above is exponential in the number of types realized in the structure. This seems unavoidable in the most general case, although for a nice family of generalized quantifiers it may often be possible to come up with much smaller, and perhaps first-order, quotiented structures by exploiting the specific properties of those quantifiers. The formula isolating L^k(Q)-types in [3] is also an invariant for a structure, but its size is likewise exponential in the number of types realized.
2. One can always construct first-order (many-sorted) structures to represent higher-order structures by keeping the higher-order objects in different sorts. So the fact that we constructed a second-order structure does not indicate an intrinsic limitation; it was done to give a natural description of the quotiented object.
6 Collapsing L^ω_∞ω(Q) to FO(Q)

Using the results of the previous section, we now present a new proof of one of the main results of [3]. We give details of every step, as some of this will be generalized to the setting of implicit definability in the next section.

Theorem 3. [3] Let C be a class of finite structures over vocabulary σ and let Q be a finite set of generalized quantifiers. For any k the following are equivalent.

1. There is a number m_k such that the number of L^k(Q)-types realized in each structure in C is bounded by m_k.
2. The number of L^k(Q)-types realized over C is finite.
3. L^k_∞ω(Q) collapses to L^k(Q) over C.
4. There are only finitely many distinct L^k_∞ω(Q) queries over C.
Proof. We give here only the proofs of (1) ⇒ (2) and (2) ⇒ (3), which differ significantly from those in [3].
(1) ⇒ (2): Given (A, a_1, …, a_k), A ∈ C, the L^k(Q)-type of (A, a_1, …, a_k) is captured by (A/≡_k^Q, [a_1, …, a_k]), by Lemma 4. That is, if (A/≡_k^Q, [a_1, …, a_k]) and (B/≡_k^Q, [b_1, …, b_k]) are isomorphic, then the L^k(Q)-types of (A, a_1, …, a_k) and (B, b_1, …, b_k) are the same. By (1), the size of A/≡_k^Q for all A ∈ C is bounded by m_k. There are only finitely many non-isomorphic structures of size ≤ m_k in the vocabulary of quotiented structures over C. Hence only finitely many L^k(Q)-types are realized over C.

(2) ⇒ (3): (2) implies that there are only finitely many distinct queries in L^k(Q) over C, as every L^k(Q) query over C is a union of ≡_k^Q-types. Using this, (3) is proved by induction on the structure of an L^k_∞ω(Q) formula. The only case where we need to use (2) is when φ(x_1, …, x_k) = ∨_{i∈N} φ_i(x_1, …, x_k), where φ and the φ_i, i ∈ N, are L^k_∞ω(Q) formulae. By the induction hypothesis, for each φ_i(x_1, …, x_k) there is a ψ_i(x_1, …, x_k) ∈ L^k(Q) which is equivalent to φ_i(x_1, …, x_k) over C. As there are only finitely many distinct queries in L^k(Q) over C, only finitely many of the ψ's, say ψ_{i_1}, …, ψ_{i_r}, are logically inequivalent over C. Therefore φ is equivalent, over C, to the L^k(Q) formula ψ_{i_1} ∨ … ∨ ψ_{i_r}. □

We also note the following natural observations. (These are not difficult to prove, but are not mentioned in [3].)
Lemma 6. Let Q be any set of generalized quantifiers. L^k(Q) and L^k_∞ω(Q) define the same type equivalence relation on finite structures.
Proof. It suffices to show that for any (A, a_1, …, a_k) and (B, b_1, …, b_k), if there is a φ(x_1, …, x_k) ∈ L^k_∞ω(Q) such that A ⊨ φ(a_1, …, a_k) and B ⊨ ¬φ(b_1, …, b_k), then there is a ψ(x_1, …, x_k) ∈ L^k(Q) such that A ⊨ ψ(a_1, …, a_k) and B ⊨ ¬ψ(b_1, …, b_k). This is not difficult to prove by induction on the structure of φ. □

Using the lemma above and the result from [3] that L^k(Q)-types can be isolated in L^k(Q), we obtain the following normal form theorem for L^k_∞ω(Q) queries.
Corollary 1. Let Q be a finite set of generalized quantifiers. Every query in L^k_∞ω(Q) can be written as a countable disjunction of L^k(Q) formulae.
7 Generalizations to Implicit Definability

In this section we generalize the results of the previous sections to richer logics, by considering the implicit closure of the logics considered there.

7.1 IMP(PFP(Q)) and PSPACE

First we extend Lemma 2.
Lemma 7. Let σ be a fixed vocabulary, and let k be greater than the maximum arity of the relation symbols in σ. Let Q = (Q_i)_{i∈N} be a family of generalized quantifiers which have arity r and are computable in DSPACE(n^s). For every IMP(PFP(Q)) query P definable by a k-variable PFP(Q) formula φ(x̄), there is a constant c (independent of n, m) such that on all structures A, P can be evaluated in c·(n + n·m)^{r·s+1} space, where n is the size of the structure (|A|) and m is the number of k-automorphism types of A.

Proof. Easy. See the full version. □
Theorems 1 and 2 are easily extended as below, using Lemma 7.
Theorem 4. Let C be a class of finite structures and let Q = (Q_i)_{i∈N} be the family of all generalized quantifiers of arity r computable in DSPACE(n^s), for some fixed r, s. If there is a number l such that for all k ≥ l, for all natural numbers i, and for all real numbers ε > 0 there is a structure A ∈ C such that the number of k-automorphism types of A is < |A|^ε but the number of l-automorphism types of A is > i, then PSPACE(C) ≠ IMP(PFP(Q))(C).

Example 5. Let Q be a finite set of PSPACE-computable generalized quantifiers. On any infinite class of complete binary trees, IMP(PFP(Q)) ≠ PSPACE.

Theorem 5. Let C be a recursively enumerable class of finite structures and let Q = (Q_i)_{i∈N} be the family of all generalized quantifiers of arity r computable in DSPACE(n^s), for some fixed r, s. If for every k and all real numbers ε > 0 there are infinitely many structures A_i ∈ C, i = 1, 2, 3, …, such that the number of k-automorphism types of A_i is < |A_i|^ε, then PSPACE(C) ≠ IMP(PFP(Q))(C). In fact, there is a Boolean query in PSPACE(C) but not in IMP(PFP(Q))(C).

Example 6. On any infinite recursively enumerable trivial class of structures, PSPACE ≠ IMP(PFP(Q)) for any set Q of bounded-arity DSPACE(n^s) generalized quantifiers, for a given s.
Example 7. On any infinite recursively enumerable class of equivalence relations in which any structure A has equivalence classes of at most O(log log |A|) distinct cardinalities, PTIME ≠ IMP(IFP(Q)) for any set Q of bounded-arity DTIME(n^s) generalized quantifiers, for a given s.
Note that in the above example we have assumed the number of types bounded by O(log log |A|), instead of O(log |A|) as in Example 4. This is to take into account the additional time required to search over all sequences of automorphism-closed relations when computing the implicit closure of IFP(Q), as in the proof of Lemma 7. However, no significant additional space is required to evaluate the implicit closure of a PFP(Q) formula, so we also have:
Example 8. On any infinite recursively enumerable class of equivalence relations in which any structure A has equivalence classes of at most O(log |A|) distinct cardinalities, PSPACE ≠ IMP(PFP(Q)) for any set Q of bounded-arity DSPACE(n^s) generalized quantifiers, for a given s.
We now exploit Lemma 3 to get a sufficient condition for IMP(PFP(Q))(C) = PSPACE(C) for a finite set Q of generalized quantifiers.

Observation 2. Let P be a PSPACE query of any arity. Then there is a PSPACE-computable generalized quantifier Q such that P is expressible in IMP(FO(Q)).

Proof. For simplicity, consider structures in a vocabulary of one binary relation only. Let P be an l-ary PSPACE query. Consider the generalized quantifier Q_P associated with the class {(A, R, P(A)) | A = (A, R), R ⊆ A², A finite}. P is implicitly defined by the formula φ(S) = Q_P x_1 x_2, y_1 … y_l (R(x_1, x_2), S(y_1 … y_l)), which has principal query variable S. □

It is a simple observation that if a structure A of size n has at least n^ε k-automorphism types, for some ε > 0, then for every p there is an h such that A has at least n^p h-automorphism types. We used this fact in the proof of Lemma 3. This motivates the following definition.

Definition 4. Let C be a class of finite structures. We say that C realizes polynomially many automorphism types if for every p there is a k such that each structure of size n in C realizes at least n^p k-automorphism types.

By combining the results of Theorem 5, Lemma 3 and Observation 2 we get the following characterization.

Theorem 6. Let C be a recursively enumerable class of finite structures. Let l be a natural number. The following are equivalent.
1. There is a finite set Q of generalized quantifiers such that IMP(PFP(Q)) = PSPACE over C for queries of arity l.
2. C realizes polynomially many automorphism types.
Notice that Theorem 6 is only a partial converse to Theorem 5. It seems to be an open problem to show that for every set Q of PSPACE-computable bounded-arity generalized quantifiers there is a PSPACE query (of some arity) which is not in IMP(PFP(Q)). Note that we always consider structures over an arbitrary but fixed signature.
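The two sides of Definition 4 can be made concrete by contrasting a clique (very few automorphism types) with a linear order (rigid, so every tuple is its own type). The sketch below is an illustration of ours, not from the paper, using hypothetical toy structures of size 4.

```python
# Illustrative contrast for Definition 4: number of 2-automorphism types
# of a clique (one per equality pattern of the pair) versus a linear
# order (rigid: every pair is its own type).
from itertools import permutations, product

def orbit_count(universe, E, k):
    """Brute-force count of k-automorphism types of (universe, E)."""
    autos = []
    for p in permutations(universe):
        f = dict(zip(universe, p))
        if all(((f[a], f[b]) in E) == ((a, b) in E)
               for a in universe for b in universe):
            autos.append(f)
    seen, count = set(), 0
    for t in product(universe, repeat=k):
        if t not in seen:
            seen |= {tuple(f[a] for a in t) for f in autos}
            count += 1
    return count

n = 4
V = list(range(n))
clique = {(a, b) for a in V for b in V if a != b}
order  = {(a, b) for a in V for b in V if a < b}
print(orbit_count(V, clique, 2), orbit_count(V, order, 2))  # 2 vs 16
```

A class of cliques realizes only constantly many k-types for each k, so it fails Definition 4 (matching Example 3); a class of linear orders realizes all n^k types, so it satisfies it and falls under Theorem 6.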
7.2 IMP(L^k(Q))-types
In order to generalize Theorem 3, we need to define the notion of IMP(L^k(Q))-types. We define the type equivalence relation for an arbitrary logical language L below. Let C be a class of finite structures. For each l, let A_l = {(A, ā) | A ∈ C, ā ∈ A^l}.

Definition 5. For each l, L defines an equivalence relation ≡_L on the set A_l as follows: (A, a_1, …, a_l) ≡_L (B, b_1, …, b_l) if for all l-ary queries P over C definable in L, P(A)(a_1, …, a_l) iff P(B)(b_1, …, b_l).

In the following we will mainly be interested in the k-type equivalence relation ≡_{IMP(L^k(Q))}, with Q a finite set of generalized quantifiers. We have defined the notion of k-types in terms of queries rather than in terms of formulae, as was done in [2,3]. This is more convenient for logics such as implicit definability, where the formulae defining queries may not even be closed under simple syntactic operations. Also note that ≡_{IMP(L^k(Q))} depends on the class C of structures under consideration. In order to work conveniently with IMP(L^k(Q)), we note some of its simple closure properties in the lemma below.

Lemma 8. Let P_1, P_2 be l-ary queries in IMP(L^k(Q)). Then P_1 ∪ P_2, P_1 ∩ P_2 and ¬P_1 are in IMP(L^k(Q)). More generally, if φ(x̄, P_1, …, P_n) ∈ L^k(Q) and P_1, …, P_n are IMP(L^k(Q)) queries, then so is the query defined by φ. That is, IMP(L^k(Q)) queries are closed under L^k(Q) operations.

Observation 3. Let C be a class of finite structures. For any structure A ∈ C, any finite set Q of generalized quantifiers and any k, there are k-ary IMP(L^k(Q)) queries P_1, …, P_m which partition A^k such that each P_i isolates an IMP(L^k(Q))-type in A.

Proof. The proof is the same as for Observation 1, using the definition of IMP(L^k(Q))-types and Lemma 8 above. □

Below we generalize the construction of A/≡_k^Q in Section 5 to obtain the structure quotiented by the ≡_{IMP(L^k(Q))} relation.
Given a finite set Q of generalized quantifiers, the structure A/≡_{IMP(L^k(Q))} is defined in a manner identical to A/≡_k^Q, except that we use the ≡_{IMP(L^k(Q))} relation instead of ≡_k^Q in the definition. The following lemma is an analogue of Lemma 4 and is proved in the same way.

Lemma 9. For any L^k(Q) sentence φ(P_1, …, P_m) in the language σ, with P_1, …, P_m ∉ σ relational symbols of arity ≤ k, there is a sentence φ*(P′_1, …, P′_m) in the language σ″ (σ″ is the σ′ of Section 5 augmented by new unary relational symbols P′_1, …, P′_m ∉ σ′) such that for any structure A, A ⊨ φ(P_1, …, P_m) iff A/≡_{IMP(L^k(Q))} ⊨ φ*(P′_1, …, P′_m), where P_1, …, P_m are ≡_{IMP(L^k(Q))}-closed, P_i has arity p_i, and P_i, P′_i, 1 ≤ i ≤ m, are related as follows: P′_i = {[b_1, …, b_k] | P_i(b_1, …, b_{p_i}) ∧ b_{p_i} = b_{p_i+1} = … = b_k}.

We can also observe the following, which is proved in the same way as Lemma 5.
Lemma 10. If two structures (A, a_1, …, a_k), (B, b_1, …, b_k) have the same IMP(L^k(Q)) theories, then their invariants (A/≡_{IMP(L^k(Q))}, [a_1, …, a_k]) and (B/≡_{IMP(L^k(Q))}, [b_1, …, b_k]) are also the same (up to isomorphism).

In defining the implicit closure of L^k_∞ω(Q), we allow a countable number of query variables and denote the resulting class of queries by IMP_∞(L^k_∞ω(Q)).
It can be shown that this logic is as expressive as the logic obtained by allowing arbitrarily many queries. The subscript in the notation distinguishes it from the implicit closure in which only finitely many queries are allowed, denoted IMP(L^k_∞ω(Q)). The version IMP(L^k_∞ω) was studied in [5] over rigid structures. We choose IMP_∞(L^k_∞ω(Q)) rather than IMP(L^k_∞ω) here mainly for two reasons: first, since we show limitations of expressibility, such a result is more general when stated for IMP_∞(L^k_∞ω(Q)); second, there are some natural properties, such as closure under countable unions of queries, which hold for IMP_∞(L^k_∞ω(Q)) but are not known for IMP(L^k_∞ω(Q)). The following lemma records the simple properties of IMP_∞(L^k_∞ω(Q)) that we will use.
Lemma 11. IMP_∞(L^k_∞ω) is closed under countable conjunction, countable disjunction and complementation.
We will use the following normal form lemma in the proof of Theorem 7 below.
Lemma 12. Each query in IMP_∞(L^k_∞ω(Q)), where Q is a finite set of generalized quantifiers, is implicitly definable by a countable disjunction ∨_{i∈N} θ_i, where each θ_i is of the form ψ_i(P_1, …, P_{n_i}) ∧ (∧_{m>n_i} [P_m = P_{m_i}]), with m_i ≤ n_i and ψ_i(P_1, …, P_{n_i}) ∈ L^k(Q) over the vocabulary σ ∪ {P_1, …, P_{n_i}}.

Proof. This is proved by elementary techniques using Corollary 1. □
We are now ready to prove an analogue of Theorem 3 for implicit de nability.
Theorem 7. Let C be a class of nite structures over vocabulary and let Q be a nite set of generalized quanti ers. For any k the following are equivalent.
1. There is a number mk such that number of IMP(Lk (Q))-types realized in each structure in C is bounded by mk . 2. Number of IMP (Lk (Q))-types realized over C is nite. 3. IMP1 (Lk1;! (Q)) collapses to IMP(Lk (Q)) over C . 4. There are only nitely many distinct IMP1 (Lk1;! (Q)) queries over C .
Proof. All cases except (2) ) (3) follow the same argument as the corresponding cases in Theorem 3. For the case (2) ) (3), we give a brief sketch here.
2: ) 3: The corresponding claim in Theorem 3 was proved by induction on the structure of Lk1;! (Q), however it is not clear how to induct on the structure of a formula de ning a query in IMP1 (Lk1;! (Q)). So we use the
(2) ⇒ (3): The corresponding claim in Theorem 3 was proved by induction on the structure of an L^k_∞ω(Q) formula; however, it is not clear how to induct on the structure of a formula defining a query in IMP_∞(L^k_∞ω(Q)). So we use the normal form Lemma 12 above. First, we can rule out the case in which there are infinitely many non-isomorphic structures of the form A/≡_{IMP(L^k(Q))}, A ∈ C: in this case one can show that infinitely many IMP(L^k(Q))-types are realized over C, contradicting assumption (2). So we only have to consider the case in which the structures of the form A/≡_{IMP(L^k(Q))}, A ∈ C, are finite up to isomorphism. We proceed as follows.

Let Φ(P_1, P_2, P_3, …) be an L^k_∞ω(Q) sentence over the vocabulary σ ∪ ∪_{i∈N} {P_i} defining a query in IMP_∞(L^k_∞ω(Q)). Consider a structure A ∈ C. Φ has a satisfying assignment on A. By the normal form lemma there is a ψ_{i_1}(P_1, …, P_{n_{i_1}}) ∈ L^k(Q) such that ψ_{i_1}(P_1, …, P_{n_{i_1}}) ∧ (∧_{m>n_{i_1}} [P_m = P_{m_{i_1}}]), m_{i_1} ≤ n_{i_1}, has a satisfying assignment on A. Let B ∈ C be any other structure such that A/≡_{IMP(L^k(Q))} is isomorphic to B/≡_{IMP(L^k(Q))}. By considering ψ*_{i_1}, it is easy to see by Lemma 9 that ψ_{i_1}(P_1, …, P_{n_{i_1}}) also has a satisfying assignment on B. Since there are only finitely many non-isomorphic quotiented structures over C, we obtain a finite collection ψ_{i_1}(P_1, …, P_{n_{i_1}}), …, ψ_{i_r}(P_1, …, P_{n_{i_r}}) of L^k(Q) formulae, constructed as above, such that for each structure A ∈ C some ψ_{i_j} has a satisfying assignment on A. Let n = max{n_{i_1}, …, n_{i_r}}. Define, for 1 ≤ j ≤ r, θ_{i_j}(P_1, …, P_n) = ψ_{i_j}(P_1, …, P_{n_{i_j}}) ∧ (∧_{n_{i_j}<m≤n} [P_m = P_{m_{i_j}}]). Consider the formula θ = ∨_{1≤j≤r} θ_{i_j}. It is easy to see that θ defines over C the same query in IMP(L^k(Q)) as defined by Φ. □

Remark: It might look as if the implication (1) ⇒ (3) in the above theorem can be deduced from the corresponding implication in Theorem 3. This is not so. Given that the number of IMP(L^k(Q))-types is bounded over C, the number of L^k(Q)-types is also bounded over C, and Theorem 3 then gives us that L^k_∞ω(Q) collapses to L^k(Q) over C. However, this does not imply that IMP_∞(L^k_∞ω(Q)) collapses to IMP(L^k(Q)) over C. In fact, one can easily construct a class of structures on which L^k_∞ω collapses to L^k but IMP(L^k_∞ω) does not collapse to IMP(L^k).

We give some applications of Theorem 7 below. Notice that the number of k-automorphism types bounds the number of IMP(L^k)-types in any structure. So the following example is a corollary to Theorems 7 and 5.

Example 9. Let C be an infinite recursively enumerable trivial class of structures and let Q be a finite set of PSPACE-computable generalized quantifiers. IMP_∞(L^ω_∞ω(Q)) on C is properly contained in PSPACE.

For PTIME we have the following examples.

Example 10. Let C be an infinite recursively enumerable class of cliques and let Q be a finite set of poly-time computable generalized quantifiers. IMP_∞(L^ω_∞ω(Q)) on C is properly contained in PTIME.
Example 11. Let E_r be an infinite recursively enumerable class of equivalence relations in which any structure A has equivalence classes of at most r distinct cardinalities, and let Q be a finite set of poly-time generalized quantifiers. IMP_∞(L^ω_∞ω(Q)) on E_r is properly contained in PTIME.
8 Quantifier Elimination for Implicit Definability

In this section we adapt the quantifier elimination technique of Scott Weinstein used in [3] to isolate IMP(L^k(Q))-types. First, we present a general lemma about quantifier elimination, independent of implicit definability or types. In the following, Q will be an arbitrary but fixed finite set of generalized quantifiers. The lemma below is a version of the essential idea behind the proof of Theorem 4.2 in [3].
Lemma 13. Let A, B be two relational structures. Let S_1, …, S_m be k-ary relational symbols in the vocabulary of these structures such that S_1^A, …, S_m^A defines a partition of A^k and S_1^B, …, S_m^B defines a partition of B^k. Assume that for each L^k(Q) formula φ(x̄), x̄ ⊆ {x_1, …, x_k}, of quantifier rank ≤ 1, there is an I ⊆ {1, …, m} s.t. A, B ⊨ ∀x_1, …, x_k [∨_{i∈I} S_i(x_1, …, x_k) ↔ φ(x̄)]. Then the following hold.
(i) For every L^k(Q) formula ψ(ȳ), ȳ ⊆ {x_1, …, x_k}, there is a J ⊆ {1, …, m} s.t. A, B ⊨ ∀x_1, …, x_k [∨_{i∈J} S_i(x_1, …, x_k) ↔ ψ(ȳ)].
(ii) A and B satisfy the same set of L^k(Q) sentences.
Proof. (i) By induction on the quantifier rank of the L^k(Q) formula ψ, using the quantifier elimination argument as in [3]. (ii) Follows easily from (i). □
Let (A, a_1, …, a_k) be given, where A is a structure over vocabulary σ. We now give the construction of an IMP(L^k(Q)) formula isolating the IMP(L^k(Q))-type of (A, a_1, …, a_k). Let ξ(P_1, …, P_m) be an IMP(L^k(Q)) formula (with possibly other auxiliary queries not displayed) such that its satisfying assignment to P_1, …, P_m in A isolates the IMP(L^k(Q))-types of A^k, with (a_1, …, a_k) ∈ P_1.

Let α(P_1, …, P_m) be defined as:
[∧_{1≤i≤m} ∃x̄ P_i(x̄)] ∧ [∀x̄ ∨_{1≤i≤m} P_i(x̄)] ∧ [∀x̄ ∧_{1≤i,j≤m, i≠j} (¬P_i(x̄) ∨ ¬P_j(x̄))].
α(P_1, …, P_m) is an L^k(Q) sentence asserting that P_1, …, P_m partition the k-th power of the domain.

Let χ_1, …, χ_r be the set of all L^k(Q) formulae of quantifier rank ≤ 1 in the language σ ∪ {P_1, …, P_m}. In A, for each L^k(Q) formula φ(x̄), x̄ ⊆ {x_1, …, x_k}, there is a unique I ⊆ {1, …, m} s.t. A ⊨ ∀x_1, …, x_k [∨_{i∈I} P_i(x_1, …, x_k) ↔ φ(x̄)], where the empty disjunction is taken as an abbreviation of x_1 ≠ x_1. This is because, by the definition of types, every IMP(L^k(Q)) query on A is a union of IMP(L^k(Q))-types in A. For each χ_i with its corresponding set I_i, let β_i be ∀x_1, …, x_k [∨_{j∈I_i} P_j(x_1, …, x_k) ↔ χ_i(x̄)], and let β be ∧_{1≤i≤r} β_i.
De ne A (P; P1; : : :; Pm ) as: (P1 ; : : :; Pm )^[8x1; : : :; xk(P(x1; : : :; xk ) $ [(P1; : : :; Pm)^ ^P1 (x1; : : :; xk )])]. Let P be principal query in A. (As stated above for , there may be some auxiliary queries other than P; P1; : : :; Pm in A which are not displayed, but these will not play any role in the proofs below so we will not mention them). Lemma 14. For any structure D, let P(D); P1(D); : : :; Pm (D) denote the satisfying assignment of A(P; P1; : : :; Pm). Let (B; b1 ; : : :; bk) be in the model of query implicitly de ned by A(P; P1; : : :; Pm), that is (b1; : : :; bk ) 2 P(B). Then for all Lk (Q) sentences in the vocabulary [fP1; : : :; Pmg, (A; P1(A); : : :; Pm(A)) j= i (B; P1 (B); : : :; Pm (B)) j= . Proof. Since query implicitly de ned by A (P; P1; : : :; Pm) is nonempty on B, (P1; : : :; Pm ) and are true on B by de nition of A. If we consider the extended structures (A; P1(A); : : :; Pm(A)) and (B; P1(B); : : :; Pm (B)) in vocabulary [ fP1; : : :; Pm g, the hypothesis of Lemma 13 is satis ed and the result follows by part (ii) of that Lemma. ut Theorem 8. IMP (Lk (Q)) types can be isolated in IMP(Lk(Q)). Proof. Let (A; a1; : : :; al), l k, be given. First assume l = k. We can construct a formula A (P; P1; : : :; Pm ) as above. It is not dicult to show that IMP (Lk (Q)) query P de ned by A isolates IMP(Lk (Q))-type of (A; a1; : : :; ak ). The case l < k can also be handled easily. Details are given in the full version.
ut
Notice that Lemma 8 is not sufficient to prove Theorem 7, in particular to prove the implication (1) ⇒ (2). In the case of L^k(Q), it is clear that the quantifier rank of the formulae isolating types within a structure is bounded by the number of types in the structure. However, this is not clear in the case of isolating IMP(L^k(Q)) types within a structure. In fact, we do not know any bound on the quantifier rank of formulae isolating types within a structure in terms of the number of types alone in the structure. Another obstacle is that for a given r the number of L^k(Q) formulae defining IMP(L^k(Q)) queries is not finite, because the number of query variables is not bounded. However, this does not appear to be a serious obstacle, as it may be handled by stratifying the IMP(L^k(Q)) relation according to the number of query variables.
9 Conclusion

We have given an elementary proof showing the limitation on the expressive power of fixed point logics in the presence of arity-bounded and complexity-bounded generalized quantifiers. Our results extend many earlier results on the limitations of the expressive power of finitely many generalized quantifiers, yet the proofs do not use any concepts from logic beyond mere definitions. As a result, these techniques also seem applicable to other situations, such as other variants of implicit definability.
Our argument involving quotiented structures seems very natural for establishing that uniform boundedness of types within structures of a class implies finitely many types (of a fixed variable fragment) over the class. Though the construction of quotiented structures in the presence of generalized quantifiers may look technically tedious, the ideas are simple. This method also seems applicable to logics for which types may not be isolatable. If we are interested in determining whether L^ω_∞ω(Q) or IMP_∞(L^ω_∞ω(Q)) contains all PSPACE queries on a class of structures, for finitely many PSPACE-computable generalized quantifiers Q (similarly for PTIME), then Theorem 3 or Theorem 7 are of limited help, as they characterize the collapse of L^ω_∞ω(Q) etc. to a smaller set, FO(Q) etc. We do not know how to prove a theorem similar to Theorems 3 and 7 to answer the above question. We mention some open questions related to implicit definability. We do not know if Theorem 7 holds for IMP(L^k_∞ω(Q)) instead of IMP_∞(L^k_∞ω(Q)). Our proof fails because we do not know if IMP(L^k_∞ω(Q)) is closed under countable conjunctions and disjunctions. In fact, we do not even know if IMP(L^ω_∞ω) = IMP_∞(L^ω_∞ω). Another open question is to prove or disprove that IMP(L^k)-types and IMP_∞(L^k_∞ω)-types are the same.
References

1. A. Dawar. Feasible Computation through Model Theory. PhD thesis, University of Pennsylvania, Philadelphia, 1993.
2. A. Dawar, S. Lindell, and S. Weinstein. Infinitary logic and inductive definability over finite structures. Information and Computation, 119(2):160-175, June 1995.
3. A. Dawar and L. Hella. The expressive power of finitely many generalized quantifiers. Information and Computation, 123(2):172-184, 1995. Extended abstract appeared in LICS '94.
4. H. Ebbinghaus and J. Flum. Finite Model Theory. Perspectives in Mathematical Logic. Springer, 1995.
5. L. Hella, P. G. Kolaitis, and K. Luosto. How to define a linear order on finite models. In Ninth Annual IEEE Symposium on Logic in Computer Science, pages 40-49, 1994.
6. P. G. Kolaitis. Implicit definability and unambiguous computation. In Fifth Annual IEEE Symposium on Logic in Computer Science, pages 168-180, 1990.
7. S. Lindell. An analysis of fixed-point queries on binary trees. Theoretical Computer Science, 85(1):75-95, 1991.
8. A. Seth. When do fixed point logics capture complexity classes? In Tenth Annual IEEE Symposium on Logic in Computer Science, 1995.
Improved Lowness Results for Solvable Black-box Group Problems

N. V. Vinodchandran

Institute of Mathematical Sciences, Chennai 600113, India.
email: [email protected]
Abstract. In order to study the complexity of computational problems
that arise from group theory in a general framework, Babai and Szemeredi [4, 2] introduced the theory of black-box groups. They proved that several problems over black-box groups are in the class NP ∩ co-AM, thereby implying that these problems are low (powerless as an oracle) for Σ_2^p and hence cannot be complete for NP unless the polynomial hierarchy collapses. In [1], Arvind and Vinodchandran study the counting complexity of a number of computational problems over solvable groups. Using a constructive version of a fundamental structure theorem about finite abelian groups and a randomized algorithm from [3] for computing generator sets for the commutator series of any solvable group, they prove that these problems are in randomized versions of low-complexity counting classes like SPP and LWPP, and hence low for the class PP. In this paper, we improve the upper bounds of [1] for these problems. More precisely, we avoid the randomized algorithm from [3] for computing the commutator series. This immediately places all these problems in either SPP or LWPP. These upper bounds imply lowness of these problems for classes other than PP. In particular, SPP is low for all gap-definable counting classes [9] (PP, C=P, Mod_kP, etc.) and LWPP is known to be low for PP and C=P. These results are in favor of the belief that these problems are unlikely to be complete for NP.
1 Introduction

Computational problems in group theory have recently been of considerable interest in computational complexity theory. One of the main reasons for this is that the exact complexity of many of the computational problems that arise from group theory is not exactly characterized. For example, consider the basic problem of testing membership in matrix groups over finite fields, presented by a set of generators. While no polynomial time algorithm is known for this problem, it is also not known whether it is hard for NP. Although, in the case of permutation groups, polynomial time algorithms are known for many computational problems like testing membership and computing the order [11], there are many other problems over permutation groups whose complexity is not exactly characterized [13, 14]. In order to study the complexity of computational group-theoretic problems in a generalized framework, Babai and Szemeredi [4] introduced the theory of
black-box groups. Intuitively speaking, in this framework we have an infinite family of abstract groups. The elements of each group in the family are uniquely encoded as strings of uniform length. The group operations (product, inverse, etc.) are assumed to be provided by a group oracle and hence are easily computable. Black-box groups are subgroups of groups from such a family, and they are presented by generator sets. For example, matrix groups over finite fields and permutation groups presented by generator sets can be seen as examples of black-box groups. It is shown in [4] that Membership Testing in general black-box groups (and many other computational problems over black-box groups) is in NP. A central problem considered in the papers of Babai and Szemeredi [4, 2] is Order Verification: given a black-box group G presented by a generator set and a positive integer n, verify that |G| = n. This problem is important because it turns out that several other problems reduce to Order Verification. In [2], using randomization to compute approximate lower bounds and sophisticated group-theoretic tools, it is shown that Order Verification for general black-box groups is in AM. As a consequence, it turns out that several problems for black-box groups (e.g., Membership Testing, Group Intersection, etc.) are in NP ∩ co-AM. It follows that these problems cannot be NP-complete unless the polynomial hierarchy collapses to Σ_2^p [6, 16]. These results strongly indicate that these problems may be of intermediate complexity between P and NP.¹ Recently, there has been interest in studying the complexity of computational group-theoretic problems with respect to counting complexity classes. In [15], Köbler et al. have looked at the counting complexity of Graph Isomorphism and some group-theoretic problems like Group Intersection, Group Factorization, etc., over permutation groups. They show that these problems are in counting classes that are low for PP and C=P.
Their results are also in favor of the belief that these problems are not complete for NP. Motivated by the above results, in [1], Arvind and Vinodchandran study the counting complexity of a number of problems over solvable black-box groups.² Since solvable groups are a generalization of abelian groups, they first consider these problems over abelian groups. Using a constructive version of a fundamental theorem on the structure of finite abelian groups, the authors prove tight upper bounds on the counting complexity of these problems. More precisely, they show that, over abelian groups, the problems Membership Testing, Order Verification, Group Isomorphism and Group Intersection are in the class SPP. Since SPP is low for any gap-definable [9] counting class, these upper bounds imply that these problems are low for the classes PP, C=P and Mod_kP for any k ≥ 2.

¹ It is interesting to note that, while most of the natural problems that are not known to be in P are proved to be NP-complete, candidates for natural problems of intermediate complexity are very few. The problem of testing whether two labeled graphs are isomorphic (in short, GI) is one such candidate.
² It is worth noting that solvable finite groups constitute a very large subclass of all finite groups. The celebrated Feit-Thompson Theorem states that all finite groups of odd order are solvable.
They also show lowness of the problems Group Factorization, Coset Intersection and Double-Coset Membership over abelian groups, for PP and C=P. In the case of solvable groups, using a randomized algorithm from [3] for computing the commutator series of a solvable black-box group, and a constructive version of the fundamental theorem for abelian factor groups, in order to construct a special set of generators called a canonical generator set, the authors of [1] were able to show that all the above-mentioned problems are in randomized versions of SPP or LWPP (more precisely, in ZPP^SPP or in ZPP^LWPP). These classes are low for the probabilistic class PP. Because of the randomization involved as a bottleneck in the results for solvable groups in [1], the lowness properties of these problems for classes like C=P, Mod_kP, and other gap-definable counting classes were not clear from [1]. The present paper overcomes this bottleneck. The main contribution of this paper is the design of a deterministic oracle Turing machine for converting a set of generators of a solvable group to a canonical generator set. Thus, by avoiding the randomized algorithm of [3] for computing a generator set for the commutator series, we improve the upper bounds on the counting complexity of all the above-mentioned problems over solvable black-box groups. More precisely, we show that, over solvable black-box groups, the problems Solvability Testing, Membership Testing, Order Verification, Group Isomorphism and Group Intersection are in the class SPP, thereby implying that these problems are low for any gap-definable counting class. We also show that the problems Group Intersection, Group Factorization, Coset Intersection and Double-Coset Membership over solvable groups are in the class LWPP. This implies that these problems over solvable groups are low for C=P also.
These results indicate that, at least with respect to counting complexity, problems over solvable groups may not be harder than their counterparts over abelian groups. The rest of the paper is organized as follows. In Section 2, we give the complexity-theoretic and group-theoretic notations and definitions which are necessary for the paper. The definition of a canonical generator set for solvable groups and results relating to it are also given. Section 3 is devoted to the design of a deterministic oracle algorithm, Canonical Generator, for computing a canonical generator set for a solvable group from an arbitrary generator set. Finally, in Section 4, we give definitions of the computational problems that we are interested in, and improve the upper bounds on the counting complexity of these problems, using the algorithm given in Section 3.
2 Preliminaries

We refer the reader to [5] for standard complexity-theoretic definitions. Here we give only the minimal notations and definitions. We fix the finite alphabet Σ = {0, 1}. Let A, B be two languages. The language 0A ∪ 1B is denoted by A ⊕ B. In [9], a uniform method for defining a number of counting classes using GapP, the class of gap-definable functions, is given. Those classes that can be defined in this manner are called gap-definable classes. Refer to [9] for details.
We give explicit definitions of only two classes, SPP and LWPP. Let GapP [9] denote the class of gap-definable functions. A language L is in LWPP if there are functions f ∈ GapP and h ∈ FP such that: x ∈ L implies that f(x) = h(|x|), and x ∉ L implies that f(x) = 0. A language L is in SPP if there is an f ∈ GapP such that: x ∈ L implies that f(x) = 1, and x ∉ L implies that f(x) = 0. It follows that SPP ⊆ LWPP. The concept of lowness is a well-studied notion in structural complexity theory. Intuitively, we say that a class 𝓛 is low for a class C if 𝓛 is powerless as an oracle to a machine accepting languages in C. Next we give the formal definition of lowness.

Definition 2.1 Let C be any complexity class which allows natural relativization. Then the class 𝓛 is said to be low for C if, for all L ∈ 𝓛, C^L = C, where C^L denotes the complexity class that is obtained by relativizing C with respect to the language L.

The main interest in these classes is because of the following theorem, proved in [9], regarding their complexity. This result indicates that SPP and LWPP are counting classes of low complexity. It is believed that problems in SPP or LWPP cannot be complete for NP.
Theorem 2.2 ([9]) The class SPP is low for all gap-definable counting classes. The class LWPP is low for PP and C=P.
Let M be an oracle NP machine, let A ∈ NP be accepted by an NP machine N, and let f ∈ FP. We say that M^A makes f(n)-guarded queries if, on inputs of length n, M^A only asks queries y for which N(y) has either 0 or f(n) accepting paths. In this terminology, we state a weaker version of a theorem from [15] which gives a method for proving membership of some languages in the classes SPP and LWPP.
Theorem 2.3 ([15]) Let M be a deterministic polynomial-time oracle machine and let f be a polynomial-time computable function. If A ∈ NP is such that M^A makes f(n)-guarded queries, then there is a polynomial q such that the function h, where h(x) = f(|x|)^{q(|x|)} if M^A on input x accepts and h(x) = 0 if M^A on input x rejects, is in GapP.
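As a toy illustration of the gap formalism (our own sketch, not from the paper): an NP machine that has exactly one accepting path per "witness" and otherwise balances accepting against rejecting paths realizes a GapP function equal to the number of witnesses; for a predicate with at most one witness this is exactly the SPP pattern, corresponding to f(n) = 1 in the theorem above.

```python
def gap(paths):
    # paths: list of +1 (accepting) / -1 (rejecting) leaves of an NP
    # machine's computation tree; the GapP value is #accepting - #rejecting
    return sum(paths)

def machine_paths(candidates, is_witness):
    # one accepting leaf per witness guess; every non-witness guess branches
    # into one accepting and one rejecting leaf, contributing 0 to the gap
    paths = []
    for y in candidates:
        if is_witness(y):
            paths.append(+1)
        else:
            paths.extend([+1, -1])
    return paths

# unique witness -> gap 1 (x in L); no witness -> gap 0 (x not in L)
print(gap(machine_paths([3, 7, 8], lambda y: y == 7)))   # 1
print(gap(machine_paths([3, 5, 8], lambda y: y == 7)))   # 0
```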
For example, to prove that a language L is in SPP, it is enough to show the existence of a deterministic polynomial-time oracle machine accepting L which makes only 1-guarded queries to a language in NP. We now give some basic group-theoretic definitions. The reader can refer to standard textbooks [7, 12] for details. Let G be a finite group. For an element g ∈ G, the order of g (denoted o(g)) is the smallest positive integer such that g^{o(g)} = e, where e is the identity of G. The order of G is denoted |G|. A subset H of G is a subgroup of G (denoted H < G) if H is a group under the group operation of G. Let X ⊆ G. The group generated by X is the smallest subgroup of G containing X and is denoted ⟨X⟩. A group G is abelian if ∀g_1, g_2 ∈ G : g_1g_2 = g_2g_1. Observe that if S is a set of generators for G, then G is abelian iff ∀g_1, g_2 ∈ S : g_1g_2 = g_2g_1. The element xyx⁻¹y⁻¹ is called the commutator of elements x and y in G. The subgroup of G generated by the set {xyx⁻¹y⁻¹ | x, y ∈ G} is called the commutator subgroup of G. We denote the commutator subgroup of G by G′. It is well known that G′ is a normal subgroup of G and the factor group G/G′ is abelian. For a group G, the sequence G = G_0 > G_1 > …, where each group G_i is the commutator subgroup of G_{i−1}, is called the commutator series of G. A group G is said to be solvable if the commutator series terminates in the trivial subgroup {e} in finitely many steps.
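These definitions can be made concrete on a tiny example. The following Python sketch (a minimal sketch of ours, not the paper's code; permutations are represented as tuples mapping index to image) computes the commutator series of S_3 by brute force and confirms that S_3 is solvable.

```python
from itertools import product

def compose(p, q):
    # (p o q)(i) = p[q[i]]
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def generate(gens, identity):
    # <gens>: close a finite set of permutations under composition
    group = {identity} | set(gens)
    while True:
        new = {compose(g, h) for g, h in product(group, repeat=2)} - group
        if not new:
            return group
        group |= new

def commutator_subgroup(group, identity):
    # subgroup generated by all commutators x y x^-1 y^-1
    comms = {compose(compose(x, y), compose(inverse(x), inverse(y)))
             for x in group for y in group}
    return generate(comms, identity)

e = (0, 1, 2)
s3 = generate([(1, 0, 2), (1, 2, 0)], e)   # S_3 from a transposition and a 3-cycle
series = [s3]
while len(series[-1]) > 1:                 # G = G_0 > G_1 > ... > {e}
    series.append(commutator_subgroup(series[-1], e))
print([len(g) for g in series])            # [6, 3, 1]
```

The orders 6 > 3 > 1 reflect the series S_3 > A_3 > {e}; since it reaches the trivial subgroup, S_3 is solvable, and each factor group (of orders 2 and 3 here) is abelian.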
Now we define the notion of black-box groups.

Definition 2.4 A group family is a countable sequence B = {B_m}_{m≥1} of finite groups B_m, such that there are polynomials p and q satisfying the following conditions. For each m ≥ 1, elements of B_m are uniquely encoded as strings in Σ^{p(m)}. The group operations (inverse, product and testing for identity) of B_m can be performed in time bounded by q(m), for every m ≥ 1. The order of B_m is computable in time bounded by q(m), for each m.

We refer to the groups B_m of a group family and their subgroups (presented by generator sets) as black-box groups.³ A class C of finite groups is said to be a subclass of B if every G ∈ C is a subgroup of some B_m ∈ B. For example, let S_n denote the permutation group on n elements. Then {S_n}_{n≥1} is a group family of all permutation groups S_n. As another example, let GL_n(q) denote the group of all n × n invertible matrices over the finite field F_q of size q. The collection GL(q) = {GL_n(q)}_{n≥1} is a group family. The class of all solvable subgroups, {G | G < GL_n(q) for some n and G is solvable}, is a subclass of GL(q). In this paper we are interested in the complexity of computational group-theoretic problems (the exact definitions of the problems we consider are given in Section 4) when the groups involved are solvable. But, since solvable groups are a generalization of abelian groups, some remarks about the complexity of these problems over abelian black-box groups are in order. For proving tight upper bounds on the counting complexity of the above-mentioned problems over abelian groups in [1], the authors employ a constructive version of a fundamental theorem about the structure of finite abelian groups. This theorem says that any finite abelian group G can be uniquely represented as a direct product of some cyclic subgroups of G. One of the immediate consequences of this theorem is the existence of a special generator set, called an independent generator set, for any abelian group.
To be precise, let G be a finite abelian group. An element g ∈ G is said to be independent of a set X ⊆ G if ⟨g⟩ ∩ ⟨X⟩ = {e}. A generator set S of G is an independent generator set for G if every g ∈ S is independent of S − {g}. One very useful property of independent generator sets is the following. Let S be an independent generator set for an abelian group G. Then
³ Note that the black-box groups we define above are a restricted version of the black-box groups introduced in [4]. The black-box group defined in [4] is technically more general; there the black-box group is defined so as to incorporate factor groups.
for any g ∈ G, there exist unique indices l_h for h ∈ S, 0 ≤ l_h < o(h), such that g = ∏_{h∈S} h^{l_h}. Hence membership testing in G can be done in a "1-guarded way" if G is presented by an independent generator set. In [1], an algorithm for converting a given generator set to an independent generator set is given, which is used in proving the upper bounds on the counting complexity for problems over abelian black-box groups. For proving the upper bounds for problems over solvable black-box groups in [1], the authors introduce a generalization of the notion of an independent generator set, called a canonical generator set, for any class of finite groups. We now give the definition of a canonical generator set. The existence of canonical generator sets for the class of solvable groups is shown in [1].
Definition 2.5 Let B = {B_m}_{m>0} be any group family. Let C be a subclass of B. The class of groups C has canonical generator sets if for every G ∈ C with G < B_m there is an ordered set S = {g_1, g_2, …, g_s} ⊆ G such that each g ∈ G can be uniquely expressed as g = g_1^{l_1} g_2^{l_2} … g_s^{l_s}, where 0 ≤ l_i < o(g_i), 1 ≤ i ≤ s. Furthermore, s ≤ q(m) for a polynomial q. S is called a canonical generator set for G.
Notice that the above definition is a generalization of the definition of an independent generator set, in the sense that the uniqueness property of the indices is preserved. Now, define a language L as follows.

L = {(0^m, S, g) | S ⊆ B_m, g ∈ B_m, ∀h ∈ S ∃l_h, 1 ≤ l_h ≤ o(h), and g = ∏_{h∈S} h^{l_h}}

The following proposition brings out the fact that the language L can act as a "pseudo Membership Testing" language, in the sense that if S is a canonical generator set, then (0^m, S, g) ∈ L if and only if g ∈ ⟨S⟩. More importantly, in this case the NP machine M (given in the proposition) will have a unique accepting path for those instances inside L.
Proposition 2.6 Let B = {B_m}_{m>0} be any group family. Then there exists an NP machine M witnessing L ∈ NP. Let C be a subclass of B which has canonical generator sets and let S be a canonical generator set for ⟨S⟩ ∈ C. Then M will have a unique accepting path if g ∈ ⟨S⟩, and M will have no accepting path if g ∉ ⟨S⟩. The behavior of M is unspecified if the input does not satisfy the promise.
Proof. In [10], it is shown that checking whether a number is prime is in the class UP. Using this, one can easily design an unambiguous nondeterministic polynomial-time transducer which computes the prime factorization of any number. Let M′ be such a machine. Now, it is easy to see that the order of any g ∈ B_m can be computed if the prime factorization of |B_m| is given. So, M first computes |B_m| in polynomial time. Then, by simulating M′, it computes the prime factorization of |B_m| and computes the order o(h) for all h ∈ S. Now, M guesses indices l_h such that 1 ≤ l_h ≤ o(h) and accepts if g = ∏_{h∈S} h^{l_h} and rejects otherwise. From the definition of a canonical generator set, it follows that M has the behavior described in the proposition. □
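The 1-guardedness promised by the proposition can be seen concretely. The Python sketch below (a toy of ours, not the paper's: the "black-box" group is Z_4 × Z_3 under componentwise addition, and exponents range over 0 ≤ l_h < o(h), which is equivalent up to h^{o(h)} = e) counts the witnesses, i.e. exponent tuples, that the machine M would guess. A canonical (here, independent) generator set yields exactly one witness per group element, while a redundant generator set does not.

```python
from itertools import product

IDENTITY = (0, 0)   # toy black-box group: Z_4 x Z_3, written additively

def mul(a, b):
    return ((a[0] + b[0]) % 4, (a[1] + b[1]) % 3)

def power(g, n):
    r = IDENTITY
    for _ in range(n):
        r = mul(r, g)
    return r

def order(g):
    n, x = 1, g
    while x != IDENTITY:
        x = mul(x, g)
        n += 1
    return n

def witnesses(S, g):
    # number of exponent tuples (l_h), 0 <= l_h < o(h), with g = prod h^l_h;
    # each tuple corresponds to one nondeterministic guess of the machine M
    count = 0
    for ls in product(*[range(order(h)) for h in S]):
        val = IDENTITY
        for h, l in zip(S, ls):
            val = mul(val, power(h, l))
        if val == g:
            count += 1
    return count

S = [(1, 0), (0, 1)]            # independent generators, orders 4 and 3
elements = list(product(range(4), range(3)))
print(all(witnesses(S, g) == 1 for g in elements))   # True: 1-guarded

T = [(1, 0), (2, 0), (0, 1)]    # redundant: (2,0) lies in <(1,0)>
print(witnesses(T, IDENTITY))   # 2: more than one accepting path
```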
The next lemma shows the existence of canonical generator sets for any solvable group.
Lemma 2.7 ([1]) Let B = {B_m}_{m>0} be a group family such that |B_m| ≤ 2^{q(m)} for a polynomial q. Let G < B_m be a finite solvable group and let G = G_0 > G_1 > … > G_{k−1} > G_k = {e} be the commutator series of G. Let T_i = {h_{i1}, h_{i2}, …, h_{is_i}} be a set of distinct coset representatives corresponding to an independent set of generators for the abelian group H_i = G_{i−1}/G_i. Then for any i, 1 ≤ i ≤ k, the ordered set⁴ S_i = ⋃_{j=i}^{k} T_j forms a canonical generator set for the group G_i and |S_i| ≤ q(m). Thus the class of solvable groups from B has canonical generator sets.
The basic steps implicitly involved in the upper bound proofs given in [1], for problems over solvable black-box groups, are the following.
1. A deterministic oracle algorithm (let us call it Canonize) is developed which takes an arbitrary set of generators for the commutator series of a solvable black-box group as input, and converts it into a canonical generator set by making 1-guarded queries to a language in NP.
2. By carefully combining the algorithm Canonize with a randomized algorithm from [3] for computing generator sets of the commutator series for any solvable black-box group, a randomized oracle algorithm (let us call it Rand Canonical Generator) for converting a generator set for any solvable group to a canonical generator set (which makes 1-guarded queries to an NP language) is given.
3. Rand Canonical Generator is then easily modified to give membership of many computational problems over solvable groups in randomized counting classes which are low for PP.
In this paper, we avoid the randomization involved in step 2. More precisely, by using the algorithm Canonize as a subroutine, we give a deterministic oracle algorithm Canonical Generator (which makes 1-guarded queries to an NP language) for converting an arbitrary generator set to a canonical generator set, for any solvable black-box group G. This immediately gives improved upper bounds on the counting complexity of many problems over solvable groups, which in turn gives lowness of these problems for many counting classes. In the next section we present the algorithm Canonical Generator. Since we will be using the algorithm Canonize as a subroutine in Canonical Generator, we describe the behavior of Canonize as a theorem.
Theorem 2.8 ([1]) Let B = {B_m}_{m>0} be a group family. Then there is a deterministic oracle machine Canonize and a language L′ ∈ NP such that Canonize takes ⟨0^m, S_0, …, S_k⟩, S_i ⊆ B_m, as input and L′ as oracle. Suppose the input satisfies the promise that ⟨S_0⟩ is solvable and that, for 0 ≤ i ≤ k, S_i generates the i-th commutator subgroup of ⟨S_0⟩. Then Canonize outputs canonical generator sets for ⟨S_i⟩ for 0 ≤ i ≤ k. Moreover, Canonize runs in time polynomial in the length of the input and makes only 1-guarded queries to L′. The behavior of Canonize is unspecified if the input does not satisfy the promise.

⁴ The elements of the set ⋃_{j=i}^{k} T_j are ordered on increasing values of the index j, and lexicographically within each set T_j.
3 Computing a Canonical Generator Set

This section is devoted to the proof of the following theorem.
Theorem 3.1 Let B = {B_m}_{m>0} be a group family. Then there is a language L_ca ∈ NP and a deterministic oracle machine Canonical Generator that takes (0^m, S) as input and L_ca as oracle, and outputs a canonical generator set for ⟨S⟩ if ⟨S⟩ is solvable and outputs NOT SOLVABLE otherwise. Moreover, Canonical Generator runs in time polynomial in the length of the input and makes only 1-guarded queries to L_ca.
Before going into the formal proof of the theorem, we give the basic ideas behind it. Let S be a set of generators for a solvable group. Let ⟨S⟩ = G_0 > … > G_i > … > G_k = {e} be the commutator series of ⟨S⟩. We are interested in computing short generator sets for each G_i. It is clear that this problem essentially boils down to the problem of computing a generator set for the commutator subgroup of any group. The following theorem provides a method for this computation. The proof is standard group theory.
Theorem 3.2 Let G be a finite group generated by the set S. Then the commutator subgroup of G is the normal closure of the set {ghg⁻¹h⁻¹ | g, h ∈ S} in G.

The above theorem gives us the following easy polynomial-time oracle algorithm Commutator Subgroup, which takes (0^m, S) as input and Membership Testing as oracle, and computes a generator set for the commutator subgroup of ⟨S⟩.

Commutator Subgroup(0^m, S)
1  X ← {ghg⁻¹h⁻¹ | g, h ∈ S}
2  while ∃g ∈ S, x ∈ X such that (0^m, X, gxg⁻¹) ∉ Membership Testing
3      do X ← X ∪ {gxg⁻¹}
4  end-while
5  Output X

It easily follows from Theorem 3.2 that Commutator Subgroup, on input (0^m, S), outputs a generator set for the commutator subgroup ⟨S⟩′. Let X_i be the set X at the beginning of the i-th iteration of the while-loop. If, after the i-th iteration, no new element is added to X_i, then X_i is output. Otherwise, if X_{i+1} = X_i ∪ {g}, it follows from Lagrange's theorem that |⟨X_{i+1}⟩| ≥ 2|⟨X_i⟩|. Hence the number of iterations of the while-loop is bounded by p(m).
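The loop above can be sketched directly. In the following minimal Python sketch (ours, not the paper's: permutations are tuples, and the Membership Testing oracle is simulated by brute-force closure of the current set X), X starts as the commutators of the generators and is grown one conjugate at a time, exactly as in lines 1-3 of the procedure.

```python
from itertools import product

def compose(p, q):
    # (p o q)(i) = p[q[i]]
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def closure(X, n):
    # <X>: brute-force stand-in for the Membership Testing oracle
    group = {tuple(range(n))} | set(X)
    while True:
        new = {compose(g, h) for g, h in product(group, repeat=2)} - group
        if not new:
            return group
        group |= new

def commutator_subgroup_gens(S, n):
    # X starts as the commutators of the generators (Theorem 3.2) and is
    # closed under conjugation by the generators, one element per iteration
    X = {compose(compose(g, h), compose(inverse(g), inverse(h)))
         for g, h in product(S, repeat=2)}
    while True:
        H = closure(X, n)
        outside = [compose(compose(g, x), inverse(g))
                   for g in S for x in X
                   if compose(compose(g, x), inverse(g)) not in H]
        if not outside:
            return X
        X.add(outside[0])

S = [(1, 0, 2), (1, 2, 0)]        # generators of S_3
gens = commutator_subgroup_gens(S, 3)
print(len(closure(gens, 3)))      # 3, the order of <S>' = A_3
```

For S_3 the loop terminates immediately, since A_3 is already normal; the doubling argument via Lagrange's theorem only matters for the polynomial bound on the number of iterations.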
Since in the above algorithm the queries to the Membership Testing oracle may not be 1-guarded, a straightforward adaptation of the algorithm to compute generator sets for all elements in the commutator series seems difficult. Suppose we can make sure that whenever a query ⟨0^m, X, g⟩ to Membership Testing is made, X is a canonical generator set for the solvable group ⟨X⟩; then, by Proposition 2.6, we can replace the Membership Testing oracle with the NP language L, and it follows that the query will be 1-guarded. We ensure this promise by constructing the commutator series in stages. Let S_i^j denote the partial generator set for the i-th element in the commutator series of G_0 constructed at the end of stage (j − 1). At stage 1 we have S_0^1 = S and S_i^1 = {e} for 1 ≤ i ≤ p(m), where p is the polynomial bounding the length of any element in the group family. The input to stage j is the tuple ⟨i, S_i^j, …, S_{p(m)}^j⟩ such that for l > i, S_l^j is a canonical generator set for the solvable group ⟨S_l^j⟩. At the end of the stage, we update each S_i^j to S_i^{j+1} such that ⟨S_i^{j+1}⟩ is still a subgroup of G_i, the i-th commutator subgroup of G_0. To keep the running time within a polynomial bound, we make sure that after p(m) stages there exists a k such that the k-th partial commutator subgroup doubles in size. From Lagrange's theorem, it will then follow that the commutator series is generated within p³(m) stages. We now formally prove the theorem.

Proof. (of Theorem 3.1) We first give the formal description of the algorithm Canonical Generator and then prove its correctness. Canonical Generator uses the oracle algorithms Check Commutator and Canonize as subroutines. Check Commutator takes as input (0^m, X, Y) such that X, Y ⊆ B_m and checks whether ⟨Y⟩ contains the commutator subgroup of ⟨X⟩. This is done by first checking whether the commutators of all the elements in X are in ⟨Y⟩. If this is not the case, the algorithm returns such a commutator.
Otherwise, it further checks whether ⟨Y⟩ is normal in ⟨X⟩. Notice that, to do this, it is enough to verify that ∀x ∈ X, y ∈ Y : xyx⁻¹ ∈ ⟨Y⟩. If this condition is false, the algorithm returns an element xyx⁻¹ ∉ ⟨Y⟩. If both conditions are true, it follows from Theorem 3.2 that ⟨Y⟩ contains the commutator subgroup of ⟨X⟩. Check Commutator makes oracle queries to the language L (defined in the previous section) for testing membership in ⟨Y⟩. It should be noted that, for Check Commutator to work as intended, Y should be a canonical generator set for the group ⟨Y⟩. We will make sure that Canonical Generator calls Check Commutator with (0^m, X, Y) as input only when Y is a canonical generator set for the solvable group ⟨Y⟩. A formal description of the subroutine Check Commutator is given below.

Check Commutator(0^m, X, Y)
1  if ∃x_1, x_2 ∈ X such that (0^m, Y, x_1x_2x_1⁻¹x_2⁻¹) ∉ L
2      then g ← x_1x_2x_1⁻¹x_2⁻¹
3          Return g
4  else if ∃x ∈ X, y ∈ Y such that (0^m, Y, xyx⁻¹) ∉ L
5      then g ← xyx⁻¹
6          Return g
7  else g ← YES
8      Return g
9  end-if
10 end-if

The subroutine Canonize is the algorithm promised by Theorem 2.8 for computing a canonical generator set for a solvable black-box group G, given an arbitrary generator set for the commutator series of G. Canonize makes 1-guarded queries to the NP language L′ if the input satisfies the promise given in Theorem 2.8. We use the notation [Canonize(·)]_l to denote the generator set produced by Canonize for the l-th element G_l in the commutator series of G. The following is the description of the algorithm Canonical Generator. Define the language L_ca as L_ca = L′ ⊕ L. Notice that the oracle access to L_ca is implicit in the description; that is, Canonical Generator queries L′ through the subroutine Canonize and L through Check Commutator.

Canonical Generator(0^m, S)
1  Stage 0
2      S_1^0 ← S; S_i^0 ← {e} for 2 ≤ i ≤ p(m)
3      i ← 1
4  Stage j (Input to this stage is ⟨i, S_i^j, …, S_{p(m)}^j⟩)
5      k ← i
6      g ← Check Commutator(0^m, S_k^j, S_{k+1}^j)
7      while g ≠ YES
8          do S_{k+1}^j ← S_{k+1}^j ∪ {g}
9              k ← k + 1
10             if k = p(m)
11                 then Output NOT SOLVABLE
12             end-if
13             g ← Check Commutator(0^m, S_k^j, S_{k+1}^j)
14     end-while
15     if k = 1
16         then Output [Canonize(S_1^j, S_2^j, …, S_{p(m)}^j)]_1
17         else S_l^{j+1} ← S_l^j for 1 ≤ l ≤ (k − 1)
18             S_l^{j+1} ← [Canonize(S_k^j, S_{k+1}^j, …, S_{p(m)}^j)]_l for k ≤ l ≤ p(m)
19             i ← (k − 1)
20             goto Stage j + 1
21     end-if

Now we are ready to prove the correctness of Canonical Generator. We first prove a series of claims from which the correctness will follow easily.

Claim 3.2.1 In the algorithm Canonical Generator, at any stage j, it holds that for all i, 1 ≤ i < p(m), ⟨S_{i+1}^j⟩ < ⟨S_i^j⟩′.
Proof. We prove this by induction on the stages. For the base case, $j = 0$, the claim clearly holds. Assume it is true for the $(j-1)$th stage, and consider $S_{i+1}^j$ and $S_i^j$. Depending on how the sets $S_i^j$ and $S_{i+1}^j$ are updated in lines 17-18 of Canonical Generator, we have the following cases.

Case 1. $S_i^j = S_i^{j-1}$; $S_{i+1}^j = S_{i+1}^{j-1}$. In this case, it is clear from the induction hypothesis that $\langle S_{i+1}^j\rangle \le \langle S_i^j\rangle'$.

Case 2. $S_i^j = S_i^{j-1} \cup \{g_i\}$; $S_{i+1}^j = S_{i+1}^{j-1}$. From the induction hypothesis it follows that $\langle S_{i+1}^j\rangle = \langle S_{i+1}^{j-1}\rangle \le \langle S_i^{j-1}\rangle' \le \langle S_i^j\rangle'$.

Case 3. $S_i^j = S_i^{j-1}$; $S_{i+1}^j = S_{i+1}^{j-1} \cup \{g_{i+1}\}$. The element $g_{i+1}$ is added to the set $S_{i+1}^{j-1}$ at line 8 of the algorithm, where $g_{i+1}$ is the element returned by the subroutine Check Commutator. Suppose $g_{i+1}$ is a commutator of the set $S_i^j = S_i^{j-1}$; then $g_{i+1} = xyx^{-1}y^{-1}$ for some elements $x, y \in S_i^j$. From the induction hypothesis and the definition of the commutator subgroup of a group, it follows that $\langle S_{i+1}^j\rangle = \langle S_{i+1}^{j-1} \cup \{g_{i+1}\}\rangle \le \langle S_i^j\rangle'$. On the other hand, suppose $g_{i+1}$ is of the form $xyx^{-1}$ for some $x \in S_i^j = S_i^{j-1}$ and $y \in S_{i+1}^{j-1}$. We have $\langle S_{i+1}^{j-1}\rangle \le \langle S_i^{j-1}\rangle' = \langle S_i^j\rangle'$. But we know that $\langle S_i^j\rangle'$ is normal in $\langle S_i^j\rangle$. So, in particular, $g_{i+1} \in \langle S_i^j\rangle'$ and hence $\langle S_{i+1}^j\rangle \le \langle S_i^j\rangle'$.

Case 4. $S_i^j = S_i^{j-1} \cup \{g_i\}$; $S_{i+1}^j = S_{i+1}^{j-1} \cup \{g_{i+1}\}$. From the induction hypothesis, we have $\langle S_{i+1}^{j-1}\rangle \le \langle S_i^{j-1}\rangle'$. It follows that $\langle S_{i+1}^{j-1}\rangle \le \langle S_i^{j-1} \cup \{g_i\}\rangle'$. Now, by an argument very similar to that of Case 3, it is easy to show that $\langle S_{i+1}^j\rangle \le \langle S_i^j\rangle'$.

Hence the claim. □
Claim 3.2.2 In Canonical Generator, the input $\langle i, S_i^j, S_{i+1}^j, \ldots, S_{p(m)}^j\rangle$ to any stage $j$ is such that for all $i < t \le p(m)$, $S_t^j$ is a canonical generator set for the solvable group $\langle S_t^j\rangle$.
Proof. We prove this by induction. For $j = 1$, it is easily verified that the claim is true. Assume the claim is true for the $j$th stage, and let $\langle i, S_i^j, \ldots, S_{p(m)}^j\rangle$ be the input to the $j$th stage. Suppose the while-loop is exited through line 14 after $l$ iterations with the value $g =$ YES. (If the loop is exited through line 11, then there are no more stages to be considered.) Then the value of $k$ is $i + l$ and, for $t > k$, $S_t^j$ is not updated inside the loop; hence, by the induction hypothesis, it remains a canonical generator set for the solvable group $\langle S_t^j\rangle$. Since the value of $g =$ Check Commutator$(0^m, S_k^j, S_{k+1}^j)$ is YES, we have $\langle S_k^j\rangle' \le \langle S_{k+1}^j\rangle$. From Claim 3.2.1 we have $\langle S_{k+1}^j\rangle \le \langle S_k^j\rangle'$. Hence $\langle S_{k+1}^j\rangle = \langle S_k^j\rangle'$. It follows that $\langle S_k^j\rangle$ is solvable and that the $S_t^j$ for $k \le t \le p(m)$ are generator sets for the commutator series of $\langle S_k^j\rangle$. Hence at line 18, Canonize will output a canonical generator set for each of the elements in the commutator series of $\langle S_k^j\rangle$. At line 19, $i$ is updated to $k - 1$, and the input to the $(j+1)$th stage is $\langle k-1, S_{k-1}^{j+1}, S_k^{j+1}, \ldots, S_{p(m)}^{j+1}\rangle$, where $S_t^{j+1}$ is a canonical generator set for the solvable group $\langle S_t^{j+1}\rangle$ for $k \le t \le p(m)$. Hence the claim. □
Claim 3.2.3 In the algorithm Canonical Generator, for any stage $j$, it holds that $\exists i$ such that $|\langle S_i^{j+p(m)}\rangle| \ge 2\,|\langle S_i^j\rangle|$, if stage $j + p(m)$ exists.

Proof. Notice that if the algorithm at stage $j$ enters the while-loop, then for some $i$ an element $g \notin \langle S_i^j\rangle$ is added to $S_i^j$, and adding such an element at least doubles the order of the generated group. So it is enough to show that the while-loop is entered at least once in every $p(m)$ stages, whenever such a stage exists. Suppose stage $j$ is entered with the value $i = i_0$. It is clear from the algorithm that if the while-loop is never entered in the next $p(m)$ stages, then at stage $j + p(m) + 1$ the value of $i$ is $i_0 - (p(m) + 1) < 0$ for every $i_0 \le p(m)$, which is impossible, since the algorithm terminates when the value of $i$ reaches $0$. Hence the claim. □
To complete the proof of the theorem, we first show that the algorithm Canonical Generator runs in time polynomial in the length of the input.
Observe that it is enough to show that the number of stages executed by the algorithm is bounded by a polynomial, since the number of iterations of the while-loop in lines 7-14 is bounded by $p(m)$. We claim that the number of stages executed by the algorithm is bounded by $2p^3(m)$. Firstly, notice that for any $H \le B_m$, $|H| \le 2^{p(m)}$. Hence for any $j$, $\prod_{i=1}^{p(m)} |\langle S_i^j\rangle| \le 2^{p^2(m)}$. Suppose the claim is false. From Claim 3.2.3 it follows that $\prod_{i=1}^{p(m)} |\langle S_i^{j+p(m)}\rangle| \ge 2 \prod_{i=1}^{p(m)} |\langle S_i^j\rangle|$. Hence $\prod_{i=1}^{p(m)} |\langle S_i^{2p^3(m)}\rangle| > 2^{p^2(m)}$, a contradiction.

Now we show that Canonical Generator makes only 1-guarded queries to $L_{ca}$, where $L_{ca} = L' \oplus L$. Let us first see that the queries to $L$ through Check Commutator are 1-guarded. It is enough to show that whenever Canonical Generator calls Check Commutator with argument $(0^m, S_k^j, S_{k+1}^j)$ in stage $j$, $S_{k+1}^j$ is a canonical generator set. But from Claim 3.2.2, the input $\langle i, S_i^j, S_{i+1}^j, \ldots, S_{p(m)}^j\rangle$ to any stage $j$ is such that for all $i < t \le p(m)$, $S_t^j$ is a canonical generator set for the solvable group $\langle S_t^j\rangle$. Now, by inspecting the description of the algorithm, it follows that whenever Canonical Generator calls Check Commutator with argument $(0^m, S_k^j, S_{k+1}^j)$, $S_{k+1}^j$ is a canonical generator set.

To see that the queries to $L'$ through Canonize are 1-guarded, notice that calls to Canonize are made outside the while-loop. This means that Check Commutator with input $(0^m, S_k^j, S_{k+1}^j)$ returns YES, that is, $\langle S_k^j\rangle' \le \langle S_{k+1}^j\rangle$. Hence $\langle S_k^j\rangle' = \langle S_{k+1}^j\rangle$ from Claim 3.2.1. So it follows that calls to Canonize with argument $(S_i^j, S_{i+1}^j, \ldots, S_{p(m)}^j)$ will be such that the $S_l^j$ for $i \le l \le p(m)$ generate the commutator series of $\langle S_i^j\rangle$. It follows from Theorem 2.8 that the queries to $L'$ will be 1-guarded.

Finally, we show that the above algorithm on input $(0^m, S)$ outputs a canonical generator set for the group $G = \langle S\rangle$ if $G$ is solvable, and outputs NOT SOLVABLE otherwise.
Now, observe that if $H_1 \le H_2$ are two finite groups, then $H_1' \le H_2'$. Hence it follows from Claim 3.2.1 that $\langle S_i^j\rangle \le G_i$ for any $i$ at any stage $j$, where $G_i$ is the $i$th element in the commutator series of $G$. We know that after the execution of $2p^3(m)$ stages, the algorithm outputs either a set $X \subseteq B_m$ or NOT SOLVABLE.

Suppose it outputs NOT SOLVABLE in stage $j$. This happens after the loop variable $k$ is assigned the value $p(m)$ inside the while-loop. From the description of the algorithm inside the loop, it follows that the group $\langle S_{p(m)}^j\rangle$ does not contain the commutator subgroup of $\langle S_{p(m)-1}^j\rangle$. But if $G$ were solvable, then we would have $G_{p(m)} = \{e\}$, and since $\langle S_{p(m)}^j\rangle \le G_{p(m)}$ by Claim 3.2.1, we have a contradiction.

Suppose the algorithm outputs a set $X \subseteq B_m$ at line 16 in stage $j$. Then the value of the variable $k$ is $1$. Notice that inside the while-loop the value of $k$ is only incremented. This implies that at stage $j$ the while-loop is not entered (the value of $i$ could not have become $0$ at a previous stage). So the input to stage $j$ is $\langle 1, S_1^j, \ldots, S_{p(m)}^j\rangle$. From Claim 3.2.2, it follows that for all $2 \le t \le p(m)$, $\langle S_t^j\rangle$ is solvable and $S_t^j$ is a canonical generator set for the group $\langle S_t^j\rangle$. From the value $g =$ YES and Claim 3.2.1, it follows that $\langle S_1^j\rangle' = \langle S_2^j\rangle$. Also, since $S_1^j = S$ for every stage $j$, it follows that $S_i^j$ generates the $i$th element in the commutator series of $\langle S\rangle = G$ for all $i$. Hence, from Theorem 2.8, it follows that $[\mathrm{Canonize}(S_1^j, \ldots, S_{p(m)}^j)]_1$ is a canonical generator set for $G$. Hence the theorem. □
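The commutator series underlying Canonical Generator can be computed explicitly for small permutation groups. The sketch below is our own transparent (non-black-box) illustration: it computes the derived series $G \ge G' \ge G'' \ge \cdots$ by naive closure and thereby tests solvability directly.

```python
from itertools import product

def comp(p, q):
    # (p o q)(i) = p[q[i]] for permutations given as tuples
    return tuple(p[q[i]] for i in range(len(p)))

def inv(p):
    r = [0] * len(p)
    for i, x in enumerate(p):
        r[x] = i
    return tuple(r)

def close(gens):
    # subgroup generated by gens (naive closure; tiny groups only)
    n = len(next(iter(gens)))
    G = {tuple(range(n))}
    while True:
        new = {comp(g, s) for g in G for s in gens} - G
        if not new:
            return G
        G |= new

def derived_series(gens):
    # G, G', G'', ...; G is solvable iff the series reaches {e}
    series = [close(gens)]
    while len(series[-1]) > 1:
        G = series[-1]
        comms = {comp(comp(g, h), comp(inv(g), inv(h))) for g, h in product(G, G)}
        H = close(comms)
        if H == G:          # series stabilises above {e}: not solvable
            break
        series.append(H)
    return series
```

For $S_4$ (generated by a transposition and a 4-cycle) this yields the series $S_4 > A_4 > V_4 > \{e\}$ of orders 24, 12, 4, 1, confirming that $S_4$ is solvable.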
4 Improving the Bounds

In this section, combining the algorithm Canonical Generator with the algorithms developed in [1], we prove upper bounds on the complexity of the following problems over solvable black-box groups. Let $B = \{B_m\}_{m>0}$ be a group family. The following decision problems, which we consider in this paper, are well studied in computational group theory [8, 4, 15, 2].

Solvability Testing $= \{(0^m, S) \mid \langle S\rangle \le B_m$ and $\langle S\rangle$ is solvable$\}$.
Membership Testing $= \{(0^m, S, g) \mid \langle S\rangle \le B_m$ and $g \in \langle S\rangle\}$.
Order Verification $= \{(0^m, S, n) \mid \langle S\rangle \le B_m$ and $|\langle S\rangle| = n\}$.
Group Isomorphism $= \{(0^m, S_1, S_2) \mid \langle S_1\rangle, \langle S_2\rangle \le B_m$ and are isomorphic$\}$.
Group Intersection $= \{(0^m, S_1, S_2) \mid \langle S_1\rangle, \langle S_2\rangle \le B_m$ and $\langle S_1\rangle \cap \langle S_2\rangle \ne (e)\}$.
Group Factorization $= \{(0^m, S_1, S_2, g) \mid \langle S_1\rangle, \langle S_2\rangle \le B_m$ and $g \in \langle S_1\rangle\langle S_2\rangle\}$.
Coset Intersection $= \{(0^m, S_1, S_2, g) \mid \langle S_1\rangle, \langle S_2\rangle \le B_m$ and $\langle S_1\rangle g \cap \langle S_2\rangle \ne \emptyset\}$.
Double-Coset Memb $= \{(0^m, S_1, S_2, g, h) \mid \langle S_1\rangle, \langle S_2\rangle \le B_m$ and $g \in \langle S_1\rangle h \langle S_2\rangle\}$.

Firstly, observe that Theorem 3.1 together with Theorem 2.3 gives the following.

Theorem 4.1 Over any group family, Solvability Testing is in SPP and hence low for all gap-definable counting classes.

Remark. In [3], a co-RP algorithm is given for Solvability Testing. But this upper bound does not give lowness for gap-definable counting classes other than PP.
In view of the above theorem, for all the problems that we consider here, we assume without loss of generality that the groups encoded in the problem instances are solvable. From Theorems 3.1 and 2.3 and Proposition 2.6 it easily follows that Membership Testing over solvable groups is in SPP.

Theorem 4.2 Over any group family, Membership Testing for the subclass of solvable groups is in SPP and hence low for all gap-definable counting classes.

For proving upper bounds for Group Isomorphism and Order Verification, we use the following theorem, which is essentially proved in [1]. We omit the proof here.
Theorem 4.3 ([1]) Let $B = \{B_m\}_{m>0}$ be a group family. Then there are polynomial time deterministic oracle machines:
1. $M_o$, which takes $(0^m, S, n)$ as input satisfying the promise that $S$ is a canonical generator set for the solvable group $\langle S\rangle$, and an oracle $L_o \in$ NP, such that $M_o$ makes 1-guarded queries to $L_o$ and accepts if and only if $|\langle S\rangle| = n$.
2. $M_{is}$, which takes $(0^m, S_1, S_2)$ as input satisfying the promise that $S_1, S_2$ are canonical generator sets for the solvable groups $\langle S_1\rangle, \langle S_2\rangle$ respectively, and an oracle $L_{is} \in$ NP, such that $M_{is}$ makes 1-guarded queries to $L_{is}$ and accepts if and only if $\langle S_1\rangle$ is isomorphic to $\langle S_2\rangle$.
The behavior of the machines is not specified if the input does not satisfy the promise.
The above theorem along with Theorems 3.1 and 2.3 gives the upper bound for Group Isomorphism and Order Verification over solvable black-box groups.

Theorem 4.4 Over any group family, Group Isomorphism and Order Verification for the subclass of solvable groups are in SPP and hence low for all gap-definable counting classes.
Let $P$ denote any of the problems Group Intersection, Group Factorization, Coset Intersection, Double-Coset Memb. Next we show that Theorem 3.1, along with the following theorem from [1], gives membership of $P$ in the class LWPP. It follows that each of these problems, restricted to solvable groups, is low for the classes PP and C$_=$P.

Theorem 4.5 ([1]) Let $B = \{B_m\}_{m>0}$ be a group family. Then there is a polynomial time deterministic oracle machine $M_P$ that takes an instance $x$ of problem $P$ ($(0^m, S_1, S_2)$ if $P$ is Group Intersection; $(0^m, S_1, S_2, g)$ if $P$ is Group Factorization or Coset Intersection; $(0^m, S_1, S_2, g, h)$ if $P$ is Double-Coset Memb) as input satisfying the promise that $S_1, S_2$ are canonical generator sets for the solvable groups $\langle S_1\rangle, \langle S_2\rangle$ respectively, and an oracle $L_P \in$ NP, such that $M_P$ makes $|B_m|$-guarded queries to $L_P$ and accepts if and only if $x \in P$. The behavior of the machine is not specified if the input does not satisfy the promise.
Finally we have the following theorem.

Theorem 4.6 Over any group family, the problems Group Intersection, Group Factorization, Coset Intersection, Double-Coset Memb for the subclass of solvable groups are in LWPP and hence low for the classes PP and C$_=$P.

Acknowledgments. I would like to thank V. Arvind for the discussions we had about the paper and for his suggestions, and Meena Mahajan, Antoni Lozano, and the referees for their suggestions, all of which improved the readability of the paper.
References
1. V. Arvind and N. V. Vinodchandran. Solvable black-box group problems are low for PP. Symposium on Theoretical Aspects of Computer Science, LNCS Vol. 1046, 99-110, 1996.
2. L. Babai. Bounded round interactive proofs in finite groups. SIAM Journal of Discrete Mathematics, 5: 88-111, 1992.
3. L. Babai, G. Cooperman, L. Finkelstein, E. Luks and Á. Seress. Fast Monte Carlo algorithms for permutation groups. Journal of Computer and System Sciences, 50: 296-308, 1995.
4. L. Babai and M. Szemerédi. On the complexity of matrix group problems I. Proc. 25th IEEE Symposium on Foundations of Computer Science, 229-240, 1984.
5. J. L. Balcázar, J. Díaz and J. Gabarró. Structural Complexity I & II. Springer-Verlag, Berlin Heidelberg, 1988.
6. R. Boppana, J. Håstad and S. Zachos. Does co-NP have short interactive proofs? Information Processing Letters, 25: 127-132, 1987.
7. W. Burnside. Theory of Groups of Finite Order. Dover Publications, Inc., 1955.
8. G. Cooperman and L. Finkelstein. Random algorithms for permutation groups. CWI Quarterly, 5 (2): 93-105, 1992.
9. S. Fenner, L. Fortnow and S. Kurtz. Gap-definable counting classes. Journal of Computer and System Sciences, 48: 116-148, 1994.
10. M. Fellows and N. Koblitz. Self-witnessing polynomial time complexity and prime factorization. Proc. 6th Structure in Complexity Theory Conference, 107-110, 1992.
11. M. Furst, J. E. Hopcroft and E. Luks. Polynomial time algorithms for permutation groups. Proc. 21st IEEE Symposium on Foundations of Computer Science, 36-45, 1980.
12. M. Hall. The Theory of Groups. Macmillan, New York, 1959.
13. C. Hoffmann. Group-Theoretic Algorithms and Graph Isomorphism. Lecture Notes in Computer Science #136, Springer-Verlag, 1982.
14. C. Hoffmann. Subcomplete generalizations of graph isomorphism. Journal of Computer and System Sciences, 25: 332-359, 1982.
15. J. Köbler, U. Schöning and J. Torán. Graph isomorphism is low for PP. Journal of Computational Complexity, 2: 301-310, 1992.
16. U. Schöning. Graph isomorphism is in the low hierarchy. Journal of Computer and System Sciences, 37: 312-323, 1988.
On Resource-Bounded Measure and Pseudorandomness

V. Arvind¹ and J. Köbler²

¹ Institute of Mathematical Sciences, Chennai 600113, India
email: [email protected]
² Theoretische Informatik, Universität Ulm, D-89069 Ulm, Germany
email: [email protected]
Abstract. In this paper we extend a key result of Nisan and Wigderson [17] to the nondeterministic setting: for all $\varepsilon > 0$ we show that if there is a language in $E = \mathrm{DTIME}(2^{O(n)})$ that is hard to approximate by nondeterministic circuits of size $2^{\varepsilon n}$, then there is a pseudorandom generator that can be used to derandomize BP$\cdot$NP (in symbols, BP$\cdot$NP = NP). By applying this extension we are able to answer some open questions in [14] regarding the derandomization of the classes BP$\cdot\Sigma_k^P$ and BP$\cdot\Pi_k^P$ under plausible measure theoretic assumptions. As a consequence, if $\Sigma_2^P$ does not have p-measure 0, then AM $\cap$ coAM is low for $\Sigma_2^P$. Thus, in this case, the graph isomorphism problem is low for $\Sigma_2^P$. By using the Nisan-Wigderson design of a pseudorandom generator we unconditionally show the inclusion MA $\subseteq$ ZPP$^{\mathrm{NP}}$ and that MA $\cap$ coMA is low for ZPP$^{\mathrm{NP}}$.
1 Introduction

In recent years, following the development of resource-bounded measure theory pioneered by Lutz [12, 13], plausible complexity-theoretic assumptions like P $\ne$ NP have been replaced by the possibly stronger, but arguably plausible, measure-theoretic assumption $\mu_p(\mathrm{NP}) \ne 0$. With this assumption as hypothesis, a number of interesting complexity-theoretic conclusions have been derived which are not known to follow from P $\ne$ NP. Two prominent examples of such results are: there are Turing-complete sets for NP that are not many-one complete [15]; there are NP problems for which search does not reduce to decision [15, 7]. Recently, Lutz [14] has shown that the hypothesis $\mu_p(\mathrm{NP}) \ne 0$ (in fact, the possibly weaker hypothesis $\mu_p(\Delta_k^P) \ne 0$, $k \ge 2$) implies that BP$\cdot\Delta_k^P = \Delta_k^P$ (in other words, BP$\cdot\Delta_k^P$ can be derandomized). This has an improved lowness consequence: it follows that if $\mu_p(\Delta_2^P) \ne 0$ then AM $\cap$ coAM is low for $\Delta_2^P$ (i.e., any AM $\cap$ coAM language is powerless as oracle to $\Delta_2^P$ machines). It also follows from $\mu_p(\Delta_2^P) \ne 0$ that if NP $\subseteq$ P/poly then PH = $\Delta_2^P$. Thus the results of Lutz's paper [14] have opened up a study of derandomization of randomized complexity classes and of new lowness properties under assumptions about the resource-bounded measure of different complexity classes.

The results of Lutz in [14] (and also a preceding paper [13]) are intimately related to research on derandomizing randomized algorithms based on the idea of trading hardness for randomness [22, 25, 17]. In particular, Lutz makes essential
use of the explicit design of a pseudorandom generator that stretches a short random string to a long pseudorandom string that looks random to deterministic polynomial-size circuits. More precisely, the Nisan-Wigderson generator is built from a set (assumed to exist) that is in E and, for some $\varepsilon > 0$, is hard to approximate by circuits of size $2^{\varepsilon n}$. As shown in [17], such a pseudorandom generator can be used to derandomize BPP.

In Section 3 of the present paper we extend the just mentioned result of Nisan and Wigderson to the nondeterministic setting. We show that their generator can also be used to derandomize the Arthur-Merlin class AM = BP$\cdot$NP, provided it is built from a set in E that is hard to approximate by nondeterministic circuits of size $2^{\varepsilon n}$ for some $\varepsilon > 0$. Very recently [9], the result of Nisan and Wigderson has been improved by weakening the assumption that there exists a set $A$ in E that is hard to approximate: it actually suffices that $A$ has worst-case circuit complexity $2^{\Omega(n)}$. We leave it as an open question whether a similar improvement is possible for the nondeterministic case. (For related results on derandomizing BPP see [2, 3].)

In Section 4 we apply our extension of the Nisan-Wigderson result to the nondeterministic case to answer some questions left open by Lutz in [14]. We show that for all $k \ge 2$, $\mu_p(\Delta_k^P) \ne 0$ implies BP$\cdot\Sigma_k^P = \Sigma_k^P$ (see Figs. 1 and 2 for a comparison of the known inclusion structure with the inclusion structure of these classes if $\mu_p(\Delta_2^P) \ne 0$). Furthermore, we show under the possibly stronger assumption $\mu_p(\mathrm{NP}) \ne 0$ that, with the help of a logarithmic number of advice bits, BP$\cdot$NP can also be derandomized (i.e., BP$\cdot$NP $\subseteq$ NP/log). Under the hypothesis $\mu_p(\mathrm{NP} \cap \mathrm{coNP}) \ne 0$ we are able to prove that indeed BP$\cdot$NP = NP, which has some immediate strong implications; for example, Graph Isomorphism is in NP $\cap$ coNP. Relatedly, in Section 5 we show that for all $k \ge 2$, $\mu_p(\Sigma_k^P) \ne 0$ implies BP$\cdot\Pi_k^P = \Pi_k^P$, answering an open problem stated in [14]. Thus, $\mu_p(\Sigma_2^P) \ne 0$ has the remarkable consequence that AM $\cap$ coAM (and consequently the graph isomorphism problem) is low for $\Sigma_2^P$. Finally, we show in Section 6 that the Arthur-Merlin class MA is contained in ZPP$^{\mathrm{NP}}$ and that MA $\cap$ coMA is even low for ZPP$^{\mathrm{NP}}$.
2 Preliminaries

In this section we give formal definitions and describe the results of Nisan and Wigderson [17] and of Lutz [14] which we generalize in this paper.

We use the binary alphabet $\Sigma = \{0, 1\}$. The cardinality of a finite set $X$ is denoted by $\|X\|$ and the length of $x \in \Sigma^*$ by $|x|$. The join $A \oplus B$ of two sets $A$ and $B$ is defined as $A \oplus B = \{0x \mid x \in A\} \cup \{1x \mid x \in B\}$. The characteristic function of a language $L$ is defined as $L(x) = 1$ if $x \in L$, and $L(x) = 0$ otherwise. The restriction of $L$ to strings of length $n$ can be considered as an $n$-ary boolean function that we denote by $L^{=n}$. Conversely, each $n$-ary boolean function $g$ defines a finite language $\{x \in \Sigma^n \mid g(x) = 1\}$ that we denote by $L_g$.
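The join and restriction operations can be spelled out concretely for finite languages represented as sets of strings. The following toy sketch (function names are our own, not the paper's) implements $A \oplus B$ and the tabulated boolean function $L^{=n}$:

```python
from itertools import product

def join(A, B):
    # A (+) B = {0x | x in A} | {1x | x in B}
    return {'0' + x for x in A} | {'1' + x for x in B}

def restriction(L, n):
    # L^{=n} viewed as an n-ary boolean function, tabulated over {0,1}^n
    return {''.join(bits): int(''.join(bits) in L)
            for bits in product('01', repeat=n)}
```

For example, `join({'', '1'}, {'0'})` gives `{'0', '01', '10'}`, and `restriction({'01', '11'}, 2)` is the table of the 2-ary boolean function that is 1 exactly on 01 and 11.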
[Figures 1 and 2, inclusion diagrams of the classes BPP, NP, coNP, AM, coAM, $\Sigma_k^P$, $\Pi_k^P$ and their BP$\cdot$ versions, are not reproduced here.]
Fig. 1. Known inclusion structure
Fig. 2. Inclusion structure if $\mu_p(\Delta_2^P) \ne 0$
The definitions of the complexity classes we consider, like P, NP, AM, E, EXP etc., can be found in standard books [6, 5, 18]. By log we denote the function $\log x = \max\{1, \lceil \log_2 x\rceil\}$, and $\langle\cdot,\cdot\rangle$ denotes a standard pairing function.

For a class $\mathcal{C}$ of sets and a class $F$ of functions from $1^*$ to $\Sigma^*$, let $\mathcal{C}/F$ [11] be the class of sets $A$ such that there is a set $B \in \mathcal{C}$ and a function $h \in F$ such that for all $x \in \Sigma^*$, $x \in A \Leftrightarrow \langle x, h(1^{|x|})\rangle \in B$. The function $h$ is called an advice function for $A$.

The BP-operator [21] assigns to each complexity class $\mathcal{C}$ a randomized version BP$\cdot\mathcal{C}$ as follows. A set $L$ belongs to BP$\cdot\mathcal{C}$ if there exist a polynomial $p$ and a set $D \in \mathcal{C}$ such that for all $x$, $|x| = n$,
$$x \in L \Rightarrow \mathrm{Prob}_{r \in_R \{0,1\}^{p(n)}}[\langle x, r\rangle \in D] \ge 3/4,$$
$$x \notin L \Rightarrow \mathrm{Prob}_{r \in_R \{0,1\}^{p(n)}}[\langle x, r\rangle \in D] \le 1/4.$$
Here, the subscript $r \in_R \{0,1\}^{p(n)}$ means that the probability is taken by choosing $r$ uniformly at random from $\{0,1\}^{p(n)}$.

We next define boolean functions that are hard to approximate, and related notions. For a function $s : \mathbb{N} \to \mathbb{N}^+$ and an oracle set $A \subseteq \Sigma^*$, $\mathrm{CIR}^A(n, s)$ denotes the class of boolean functions $f : \{0,1\}^n \to \{0,1\}$ that can be computed by some oracle circuit $c$ of size at most $s(n)$ having access to $A$. In case $A = \emptyset$ we denote this class by $\mathrm{CIR}(n, s)$. Furthermore, let $\mathrm{CIR}(s) = \bigcup_{n \ge 0} \mathrm{CIR}(n, s)$ and $\mathrm{CIR}^A(s) = \bigcup_{n \ge 0} \mathrm{CIR}^A(n, s)$.
Definition 1. (cf. [25, 17])
1. Let $f : \{0,1\}^n \to \{0,1\}$ be a boolean function, $C$ be a set of boolean functions, and let $r \in \mathbb{R}^+$ be a positive real number. $f$ is said to be $r$-hard for $C$ if for all $n$-ary boolean functions $g$ in $C$,
$$\frac{1}{2} - \frac{1}{r} \;<\; \frac{\|\{x \in \{0,1\}^n \mid f(x) = g(x)\}\|}{2^n} \;<\; \frac{1}{2} + \frac{1}{r}.$$
2. Let $r : \mathbb{N} \to \mathbb{R}^+$ and $L \subseteq \Sigma^*$. $L$ is said to be $r$-hard for $C$ if for all but finitely many $n$, the $n$-ary boolean function $L^{=n}$ is $r(n)$-hard for $C$.
3. A class $\mathcal{D}$ is called $r$-hard for $C$ if some language $L \in \mathcal{D}$ is $r$-hard for $C$.
4. A boolean function $f$ (a language $L$, or a language class $\mathcal{D}$) is called $\mathrm{CIR}^A(r)$-hard if $f$ (resp. $L$, $\mathcal{D}$) is $r$-hard for $\mathrm{CIR}^A(r)$.

The already discussed result of Nisan and Wigderson can be stated in relativized form as follows.

Theorem 2. [17] For all $\varepsilon > 0$ and all oracles $A$, if $E^A$ is $\mathrm{CIR}^A(2^{\varepsilon n})$-hard, then $\mathrm{P}^A = \mathrm{BPP}^A$.

The concept of resource-bounded measure was introduced in [12]. We briefly recall some basic definitions from [12, 14] leading to the definition of a language class having p-measure 0. Intuitively, if a class $\mathcal{C}$ of languages has p-measure 0, then $\mathcal{C} \cap \mathrm{E}$ forms a negligibly small subclass of the complexity class E (where $\mathrm{E} = \bigcup_{c>0} \mathrm{DTIME}(2^{cn})$; see [12, 14] for more motivation).

Definition 3. [12, 14]
1. A function $d : \Sigma^* \to \mathbb{R}^+$ is called a supermartingale if for all $w \in \Sigma^*$, $d(w) \ge (d(w0) + d(w1))/2$.
2. The success set of a supermartingale $d$ is defined as
$$S^\infty[d] = \{A \mid \limsup_{l\to\infty} d(A(s_1) \cdots A(s_l)) = \infty\},$$
where $s_1 = \lambda, s_2 = 0, s_3 = 1, s_4 = 00, s_5 = 01, \ldots$ is the standard enumeration of $\Sigma^*$ in lexicographic order. The unitary success set of $d$ is
$$S^1[d] = \bigcup_{d(w) \ge 1} C_w,$$
where, for each string $w \in \Sigma^*$, $C_w$ is the class of languages $A$ such that $A(s_1) \cdots A(s_{|w|}) = w$; i.e., the smallest language in $C_w$ is $L_w = \{s_i \mid w_i = 1\}$.
3. A function $d : \mathbb{N}^i \times \Sigma^* \to \mathbb{R}$ is said to be p-computable if there is a function $f : \mathbb{N}^{i+1} \times \Sigma^* \to \mathbb{R}$ such that $f(r, k_1, \ldots, k_i, w)$ is computable in time $(r + k_1 + \cdots + k_i + |w|)^{O(1)}$ and $|f(r, k_1, \ldots, k_i, w) - d(k_1, \ldots, k_i, w)| \le 2^{-r}$.
4. A class $X$ of languages has p-measure 0 (in symbols, $\mu_p(X) = 0$) if there is a p-computable supermartingale $d$ such that $X \subseteq S^\infty[d]$.
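To make the supermartingale condition of Definition 3(1) concrete, here is a toy betting strategy (our own illustration, not from [12, 14]): starting with capital 1, it bets a fixed fraction $b$ of its current capital that the next bit of the characteristic sequence is 0. The condition $d(w) \ge (d(w0)+d(w1))/2$ holds with equality, and the capital grows without bound on characteristic sequences consisting mostly of 0s, so such languages lie in $S^\infty[d]$.

```python
def d(w, b=0.5):
    # Capital after betting along prefix w of a characteristic sequence:
    # d(w0) = (1+b)*d(w), d(w1) = (1-b)*d(w), so (d(w0)+d(w1))/2 = d(w).
    cap = 1.0
    for bit in w:
        cap *= (1 + b) if bit == '0' else (1 - b)
    return cap
```

For instance, $d(0^{10}) = 1.5^{10} \approx 57.7$, while on a balanced sequence of alternating bits the capital tends to 0, since $(1+b)(1-b) < 1$.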
In the context of resource-bounded measure, it is interesting to ask for the measure of the class of all sets $A$ for which $E^A$ is not $\mathrm{CIR}^A(2^{\varepsilon n})$-hard. Building on initial results in [13], it is shown in [1] that this class has p-measure 0.
Lemma 4. [1] For all $0 < \varepsilon < 1/3$, $\mu_p\{A \mid E^A$ is not $\mathrm{CIR}^A(2^{\varepsilon n})$-hard$\} = 0$.

Lutz strengthened this to the following result, which is more useful for some applications.

Lemma 5. [14] For all $0 < \varepsilon < 1/3$ and all oracles $B \in \mathrm{E}$,
$$\mu_p\{A \mid E^A \text{ is not } \mathrm{CIR}^{A \oplus B}(2^{\varepsilon n})\text{-hard}\} = 0.$$

As a consequence of the above lemma, Lutz derives the following theorem.

Theorem 6. [14] For $k \ge 2$, if $\mu_p(\Delta_k^P) \ne 0$ then BP$\cdot\Delta_k^P \subseteq \Delta_k^P$.

It is not hard to see that Theorem 6 can be extended to any complexity class $\mathcal{C} \subseteq \mathrm{EXP} = \bigcup_{c>0} \mathrm{DTIME}(2^{n^c})$ that is closed under join and polynomial-time Turing reducibility (see also Corollary 22). For example, if $\oplus$P does not have p-measure 0, then BP$\cdot\oplus$P $\subseteq$ $\oplus$P, implying [24] that the polynomial hierarchy is contained in $\oplus$P. In Sects. 4 and 5 we address the question whether BP$\cdot\Sigma_k^P = \Sigma_k^P$ (or BP$\cdot\Pi_k^P = \Pi_k^P$) can also be derived from $\mu_p(\Delta_k^P) \ne 0$, and whether stronger consequences can be derived from $\mu_p(\mathrm{NP}) \ne 0$ and $\mu_p(\mathrm{NP} \cap \mathrm{coNP}) \ne 0$.
3 Derandomizing AM in Relativized Worlds

In this section we show that the Nisan-Wigderson generator can also be used to derandomize the Arthur-Merlin class AM = BP$\cdot$NP [4]. We first define the counterpart of Definition 1 for nondeterministic circuits and the corresponding notion of hard-to-approximate boolean functions.

A nondeterministic circuit $c$ has two kinds of input gates: in addition to the actual inputs $x_1, \ldots, x_n$, $c$ has a series of distinguished guess inputs $y_1, \ldots, y_m$. The value computed by $c$ on input $x \in \Sigma^n$ is 1 if there exists a $y \in \Sigma^m$ such that $c(xy) = 1$, and 0 otherwise [23]. We now define hardness for nondeterministic circuits. $\mathrm{NCIR}^A(s)$ denotes the union $\bigcup_{n \ge 0} \mathrm{NCIR}^A(n, s)$, where $\mathrm{NCIR}^A(n, s)$ consists of all boolean functions $f : \{0,1\}^n \to \{0,1\}$ that can be computed by some nondeterministic oracle circuit $c$ of size at most $s(n)$, having access to oracle $A$.
Definition 7. A boolean function $f$ (a language $L$, or a language class $\mathcal{D}$) is called $\mathrm{NCIR}^A(r)$-hard if $f$ (resp. $L$, $\mathcal{D}$) is $r$-hard for $\mathrm{NCIR}^A(r)$.

We continue by recalling some notation from [17]. Let $p, l, m, k$ be positive integers. A collection $D = (D_1, \ldots, D_p)$ of sets $D_i \subseteq \{1, \ldots, l\}$ is called a $(p, l, m, k)$-design if
- for all $i = 1, \ldots, p$: $\|D_i\| = m$, and
- for all $i \ne j$: $\|D_i \cap D_j\| \le k$.
Using $D$ we get from a boolean function $g : \{0,1\}^m \to \{0,1\}$ a sequence of boolean functions $g_i : \{0,1\}^l \to \{0,1\}$, $i = 1, \ldots, p$, defined as
$$g_i(s_1, \ldots, s_l) = g(s_{i_1}, \ldots, s_{i_m}) \quad \text{where } D_i = \{i_1, \ldots, i_m\}.$$
By concatenating the values of these functions we get a function $g_D : \{0,1\}^l \to \{0,1\}^p$, where $g_D(s) = g_1(s) \cdots g_p(s)$.

As shown by Nisan and Wigderson [17, Lemma 2.4], the output of $g_D$ looks random to any small deterministic circuit, provided $g$ is hard to approximate by deterministic circuits of a certain size (in other words, the hardness of $g$ implies that the pseudorandom generator $g_D$ is secure against small deterministic circuits). The following lemma shows that $g_D$ is also secure against small nondeterministic circuits, provided $g$ is hard to approximate by nondeterministic circuits of a certain size. As pointed out in [19], this appears somewhat counterintuitive, since a nondeterministic circuit $c$ might guess the seed given to the pseudorandom generator $g_D$ and then verify that the guess is correct. But note that in our case this strategy is ruled out by the size restriction on $c$, which prevents $c$ from simulating $g_D$.

Lemma 8. Let $D$ be a $(p, l, m, k)$-design and let $g : \{0,1\}^m \to \{0,1\}$ be an $\mathrm{NCIR}^A(m, p^2 + p2^k)$-hard function. Then the function $g_D$ has the property that for every $p$-input nondeterministic oracle circuit $c$ of size at most $p^2$,
$$\left|\mathrm{Prob}_{y \in_R \{0,1\}^p}[c^A(y) = 1] - \mathrm{Prob}_{s \in_R \{0,1\}^l}[c^A(g_D(s)) = 1]\right| \le 1/p.$$

Proof. The proof follows along similar lines as the one of [17, Lemma 2.4]. We show that if there is a nondeterministic oracle circuit $c$ of size at most $p^2$ such that
$$\left|\mathrm{Prob}_{y \in_R \{0,1\}^p}[c^A(y) = 1] - \mathrm{Prob}_{s \in_R \{0,1\}^l}[c^A(g_D(s)) = 1]\right| > 1/p,$$
then $g$ is not $\mathrm{NCIR}^A(m, p^2 + p2^k)$-hard. Let $S_1, \ldots, S_l$ and $Z_1, \ldots, Z_p$ be independently and uniformly distributed random variables over $\{0,1\}$, and let $S = (S_1, \ldots, S_l)$.
Then we can restate the inequality above as follows:
$$\left|\mathrm{Prob}[c^A(Z_1, \ldots, Z_p) = 1] - \mathrm{Prob}[c^A(g_1(S), \ldots, g_p(S)) = 1]\right| > 1/p,$$
where $g_i(s)$ denotes the $i$th bit of $g_D(s)$, $i = 1, \ldots, p$. Now consider the random variables
$$X_i = c^A(g_1(S), \ldots, g_{i-1}(S), Z_i, \ldots, Z_p), \quad i = 1, \ldots, p+1.$$
Since $X_1 = c^A(Z_1, \ldots, Z_p)$ and since $X_{p+1} = c^A(g_1(S), \ldots, g_p(S))$, we can fix an index $j \in \{1, \ldots, p\}$ such that
$$\mathrm{Prob}[X_j = 1] - \mathrm{Prob}[X_{j+1} = 1] > 1/p^2. \tag{1}$$

Consider the boolean function $h : \{0,1\}^l \times \{0,1\}^{p-j+1} \to \{0,1\}$ defined as
$$h(s, z_j, \ldots, z_p) = \begin{cases} z_j, & \text{if } c^A(g_1(s), \ldots, g_{j-1}(s), z_j, \ldots, z_p) = 0; \\ 1 - z_j, & \text{otherwise.} \end{cases}$$
Since
$$\begin{aligned}
\mathrm{Prob}[h(S, Z_j, \ldots, Z_p) = g_j(S)] - 1/2 &= \mathrm{Prob}[X_j = 0 \wedge Z_j = g_j(S)] + \mathrm{Prob}[X_j = 1 \wedge Z_j \ne g_j(S)] - 1/2 \\
&= \mathrm{Prob}[Z_j = g_j(S)] + \mathrm{Prob}[X_j = 1] - 2\,\mathrm{Prob}[X_j = 1 \wedge Z_j = g_j(S)] - 1/2 \\
&= \mathrm{Prob}[X_j = 1] - 2\,\mathrm{Prob}[X_{j+1} = 1 \wedge Z_j = g_j(S)] \\
&= \mathrm{Prob}[X_j = 1] - \mathrm{Prob}[X_{j+1} = 1],
\end{aligned}$$
it follows that (1) is equivalent to
$$\mathrm{Prob}[h(S, Z_j, \ldots, Z_p) = g_j(S)] - 1/2 > 1/p^2. \tag{2}$$
Since $g_j(s_1, \ldots, s_l)$ only depends on the bits $s_i$ with $i \in D_j$, we can apply an averaging argument to find fixed bits $\hat{s}_i$, $i \notin D_j$, and fixed bits $\hat{z}_j, \ldots, \hat{z}_p$ such that (2) still holds under the condition that $S_i = \hat{s}_i$ for all $i \notin D_j$ and $Z_i = \hat{z}_i$ for all $i = j, \ldots, p$. Since $g_j(s_1, \ldots, s_l) = g(s_1, \ldots, s_m)$ (for notational convenience we assume w.l.o.g. that $D_j = \{1, \ldots, m\}$) we thus get
$$\mathrm{Prob}[h(S_1, \ldots, S_m, \hat{s}_{m+1}, \ldots, \hat{s}_l, \hat{z}_j, \ldots, \hat{z}_p) = g(S_1, \ldots, S_m)] - 1/2 > 1/p^2.$$

Now consider the nondeterministic oracle circuit $c'$ that on input $s_1, \ldots, s_m$ first evaluates the functions $g_1, g_2, \ldots, g_{j-1}$ on $(s_1, \ldots, s_m, \hat{s}_{m+1}, \ldots, \hat{s}_l)$, and then simulates the oracle circuit $c$ to compute
$$c^A(g_1(s_1, \ldots, s_m, \hat{s}_{m+1}, \ldots, \hat{s}_l), \ldots, g_{j-1}(s_1, \ldots, s_m, \hat{s}_{m+1}, \ldots, \hat{s}_l), \hat{z}_j, \ldots, \hat{z}_p).$$
Then $c'^A$ either computes the boolean function that maps $(s_1, \ldots, s_m)$ to $h(s_1, \ldots, s_m, \hat{s}_{m+1}, \ldots, \hat{s}_l, \hat{z}_j, \ldots, \hat{z}_p)$, or it computes the negation of this function (depending on whether $\hat{z}_j = 0$ or $\hat{z}_j = 1$), and hence it follows that
$$\left|\mathrm{Prob}[c'^A(S_1, \ldots, S_m) = g(S_1, \ldots, S_m)] - 1/2\right| > 1/p^2.$$
Since each of $g_1(s_1, \ldots, s_m, \hat{s}_{m+1}, \ldots, \hat{s}_l), \ldots, g_{j-1}(s_1, \ldots, s_m, \hat{s}_{m+1}, \ldots, \hat{s}_l)$ depends on at most $k$ input bits, these values can be computed by a deterministic subcircuit of size at most $2^k$ (namely, the brute-force circuit that evaluates that particular $k$-ary boolean function). This means that the size of $c'$ is at most $p^2 + p2^k$, implying that $g$ is not $\mathrm{NCIR}^A(m, p^2 + p2^k)$-hard. □

For our extension of Theorem 2 we also need the following lemma.

Lemma 9. [17] Let $c$ be a positive integer and let the integer valued functions $l, m, k$ be defined as $l(p) = 2c^2 \log p$, $m(p) = c \log p$, and $k(p) = \log p$. Then there is a polynomial-time algorithm that on input $1^p$ computes a $(p, l(p), m(p), k(p))$-design.
Theorem 10. Let $A$ and $B$ be oracles and let $\varepsilon > 0$. If $E^A$ is $\mathrm{NCIR}^B(2^{\varepsilon n})$-hard, then BP$\cdot$NP$^B \subseteq$ NP$^B$/FP$^A$. In particular, if $E^A$ is $\mathrm{NCIR}^A(2^{\varepsilon n})$-hard, then BP$\cdot$NP$^A$ = NP$^A$.
Proof. Let L 2 BP NPB . Then there exist a polynomial p and a set D 2 NPB
such that for all x, jxj = n x 2 L ) Prob r2R f0;1gp(n) [hx; ri 2 D] 3=4; x 62 L ) Prob r2R f0;1gp(n) [hx; ri 2 D] 1=4: For a xed input x, the decision procedure for D on input x; r can be simulated by some nondeterministic oracle circuit cx with input r, implying that x 2 L ) Prob r2R f0;1gp(n) [cBx (r) = 1] 3=4; x 62 L ) Prob r2R f0;1gp(n) [cBx (r) = 1] 1=4 where w.l.o.g. we can assume that the size of cx is bounded by p2 (jxj). Let > 0 and let C 2 EA be an NCIRB (2n)-hard language. Then for almost all n, the boolean function C =n : f0; 1gn ! f0; 1g is NCIRB (n; 2n)hard. Thus, letting c = d3=e and m(n) = c log p(n), it follows that for almost all n, C =m(n) is NCIRB (m(n); p(n)3 )-hard. Now let l(n) = 2c2 logp(n) and k(n) = log p(n). Then we can apply Lemmas 8 and 9 to get for almost all n a (p(n); l(n); m(n); k(n))-design D such that the boolean function CD=m(n) : f0; 1gl(n) ! f0; 1gp(n) has for every p(n)-input nondeterministic oracle circuit c of size at most p(n)2 the property that
Proby2R f0;1gp(n) [cB (y) = 1] , Probs2R f0;1gl(n) [cB (CD=m(n) (s)) = 1] 1=p(n):
Notice that since m(n) = O(log n) and since C ∈ E^A, it is possible to compute the advice function h(1^n) = C(0^{m(n)}) ··· C(1^{m(n)}) in FP^A. Hence, the following procedure witnesses L ∈ NP^B/FP^A:
input x, |x| = n, and the advice h(1^n) = C(0^{m(n)}) ··· C(1^{m(n)});
compute a (p(n), l(n), m(n), k(n))-design D and let r_1, ..., r_{2^{l(n)}} be the pseudorandom strings produced by C_D^{=m(n)} on all seeds from {0,1}^{l(n)};
if the number of r_i for which c_x^B(r_i) = 1 is at least 2^{l(n)−1} then accept else reject
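The body of this procedure is a majority vote over the Nisan–Wigderson strings. A small self-contained sketch under toy parameters, where `hard_bits` stands in for the advice (the truth table of the hard function) and `circuit` for the oracle circuit c_x^B — both hypothetical stand-ins, not the paper's objects:

```python
def nw_strings(hard_bits, design, l):
    """All pseudorandom strings C_D(s): for each seed s in {0,1}^l,
    output bit i is the hard function evaluated on s restricted to
    the i-th design set S_i (each of size m, with len(hard_bits) == 2^m)."""
    strings = []
    for seed in range(2 ** l):
        s = [(seed >> j) & 1 for j in range(l)]
        out = []
        for S in design:
            idx = 0
            for pos, j in enumerate(sorted(S)):
                idx |= s[j] << pos
            out.append(hard_bits[idx])
        strings.append(tuple(out))
    return strings

def derandomized_accept(circuit, hard_bits, design, l):
    """Accept iff the circuit outputs 1 on at least half of the 2^l
    pseudorandom strings (the majority vote of the procedure above)."""
    rs = nw_strings(hard_bits, design, l)
    return sum(circuit(r) for r in rs) >= len(rs) / 2

# Toy run: parity as 'hard' function on a 3-set design over {0,1,2}.
accepts = derandomized_accept(lambda r: r[0], [0, 1, 1, 0],
                              [{0, 1}, {1, 2}, {0, 2}], 3)
```

The nondeterminism of c_x is absorbed by the surrounding NP computation; this sketch only shows the deterministic seed enumeration.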
□
4 Derandomizing BP·Σ_k^P if Δ_k^P is Not Small
In this section we apply the relativized derandomization of the previous section to extend Lutz's Theorem 6 to the Σ_k^P levels of the polynomial hierarchy. A crucial result used in the proof of Lutz's Lemma 5 is the fact that there are many n-ary boolean functions that are CIR(n, 2^{εn})-hard. In Lemma 12 we establish the same bound for the nondeterministic case.
Lemma 11. [13] For each ε such that 0 < ε < 1/3, there is a constant n_0 such that for all n ≥ n_0 and all oracles A, the number of boolean functions f: {0,1}^n → {0,1} that are not CIR^A(n, 2^{εn})-hard is at most 2^{2^n} e^{−2^{n/4}}.
Lemma 12. For each ε such that 0 < ε < 1/3, there is a constant n_0 such that for all n ≥ n_0 and all oracles A, the number of n-ary boolean functions that are not NCIR^A(n, 2^{εn})-hard is at most 2^{2^n} e^{−2^{n/4}}.
Proof. The proof follows an essentially similar counting argument as in the deterministic case (see [13]). In the sequel, let q = 2^{εn} and let NCIR_j^A(n, q) denote the class of n-ary boolean functions computed by nondeterministic oracle circuits of size q with exactly j guess inputs, having access to oracle A. Notice that NCIR^A(n, q) = ⋃_{j=0}^{q−n} NCIR_j^A(n, q), implying that ‖NCIR^A(n, q)‖ ≤ Σ_{j=0}^{q−n} ‖NCIR_j^A(n, q)‖. It is shown in [16] by a standard counting argument that for n ≤ q, ‖CIR^A(n, q)‖ ≤ a(4eq)^q, where a = 2685. Since each function in NCIR_j^A(n, q) is uniquely determined by an (n+j)-ary boolean function in CIR^A(n+j, q), it follows that
‖NCIR^A(n, q)‖ ≤ Σ_{j=0}^{q−n} a(4eq)^q ≤ aq(4eq)^q.
We now place a bound on the number of n-ary boolean functions that are not NCIR^A(n, q)-hard. Let
DELTA(n, q) = {D ⊆ Σ^n : |2^{−n}‖D‖ − 1/2| ≥ 1/q}.
By applying standard Chernoff bounds, as shown in [13], it can be seen that ‖DELTA(n, q)‖ ≤ 2^{2^n}·2^{−c·2^{(1−2ε)n}}, where c > 0 is a small constant. Now, from the notion of NCIR^A(n, q)-hard functions (Definition 7) it is easy to see that there are at most
‖NCIR^A(n, q)‖ · ‖DELTA(n, q)‖ ≤ q(q+1)(144eq)^q · 2^{2^n}·2^{−c·2^{(1−2ε)n}}
distinct n-ary boolean functions that are not NCIR^A(n, q)-hard. Hence, using the fact that 0 < ε < 1/3, we can easily find a constant n_0 such that for n ≥ n_0 the above number is bounded above by 2^{2^n} e^{−2^{n/4}} as required. □
We further need the important Borel-Cantelli-Lutz Lemma [12]. A series Σ_{k=0}^∞ a_k of nonnegative reals is said to be p-convergent if there is a polynomial q such that for all r ∈ ℕ, Σ_{k=q(r)}^∞ a_k ≤ 2^{−r}.
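The Chernoff-style bound on ‖DELTA(n, q)‖ can be checked exactly for tiny n by summing binomial coefficients over the deviating densities; a toy illustration (our own, for intuition only):

```python
from math import comb

def delta_count(n, q):
    """Exact count of subsets D of {0,1}^n whose density ||D||/2^n
    deviates from 1/2 by at least 1/q (the sets that the Chernoff
    bound shows are a vanishing fraction of all 2^(2^n) subsets)."""
    N = 2 ** n
    return sum(comb(N, s) for s in range(N + 1) if abs(s / N - 0.5) >= 1 / q)
```

For n = 2 and q = 4 this counts the subsets of a 4-element universe with 0, 1, 3, or 4 elements.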
Theorem 13. [12] Assume that d: ℕ × Σ* → ℝ^+ is a function with the following properties:
1. d is p-computable.
2. For each k ∈ ℕ, the function d_k, defined by d_k(w) = d(k, w), is a supermartingale.
3. The series Σ_{k=0}^∞ d_k(λ) is p-convergent.
Then μ_p(⋂_{j=0}^∞ ⋃_{k=j}^∞ S^∞[d_k]) = 0.
Now we are ready to extend Lutz's Lemma 5 to the case of nondeterministic circuits.
Lemma 14. For all ε with 0 < ε < 1/3 and all oracles B ∈ E,
μ_p({A | E^A is not NCIR^{A⊕B}(2^{εn})-hard}) = 0.
Proof. Let 0 < ε < 1/3 and B ∈ E. For each language A define the test language³
C(A) = {x | x10^{2|x|} ∈ A},
and let X = {A | C(A) is not NCIR^{A⊕B}(2^{εn})-hard}. Notice that since C(A) ∈ E^A, the lemma follows from the following claim.
Claim. μ_p(X) = 0.
Proof of Claim. The proof follows the same lines as in [14, Theorem 3.2] except
for minor changes to take care of the fact that we are dealing with nondeterministic circuits. For each k > 0, let
X_k = {A | C(A) is not NCIR^{A⊕B}(n, 2^{εn})-hard}, if k = 2^n for some n; and X_k = ∅, otherwise.
It follows immediately that
X = ⋂_{j≥0} ⋃_{k≥j} X_k.
We will show that μ_p(X) = 0 by applying the Borel-Cantelli-Lutz Lemma (Theorem 13). Let n_0 be the constant provided by Lemma 12 and let k_0 = 2^{n_0}. In order to apply Theorem 13 we define d: ℕ × Σ* → ℝ^+ as follows (exactly as in [14]):
1. If k < k_0 or k is not a power of 2, then d_k(w) = 0.
2. If k = 2^n ≥ k_0 and |w| < 2^{k+1}, then d_k(w) = e^{−k^{1/4}}.
3. If k = 2^n ≥ k_0 and |w| ≥ 2^{k+1}, then d_k(w) =
Σ_{g ∈ NCIR^{L_w⊕B}(n, 2^{εn}), D ∈ DELTA(n, 2^{εn})} Prob[L_g = C(A)^{=n} △ D | A ∈ C_w]
where d_k(w) = d(k, w) and the conditional probabilities are taken by deciding the membership of each string x ∈ Σ* in the random language A by an independent toss of a fair coin. Now, the following properties of d can be proved along similar lines as in [14]:
³ This test language was originally defined by [1] and later used in [14].
1. d is p-computable.
2. For each k > 0, d_k is a supermartingale with d_k(λ) ≤ e^{−k^{1/4}}.
3. For all k ≥ k_0, X_k ⊆ S^∞[d_k].
4. X ⊆ ⋂_{j≥0} ⋃_{k≥j} S^∞[d_k].
The only point where a different argument is required is in showing that d is p-computable, because the circuits used to define d_k(w) are nondeterministic. Nevertheless, notice that the only nontrivial case to be handled in the definition of d_k is when k = 2^n ≥ k_0 and |w| ≥ 2^{k+1}. In this case, the size of the considered nondeterministic oracle circuits is bounded by 2^{εn} ≤ k. Therefore, in time polynomial in 2^k < |w| it is possible to evaluate these circuits by exhaustive search. □
It is now easy to derandomize BP·Σ_k^P under the assumption that Δ_k^P has non-zero p-measure.
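The exhaustive-search evaluation invoked above is elementary: a nondeterministic circuit accepts iff some setting of its guess inputs makes the underlying deterministic circuit accept, so trying all 2^j guess assignments suffices. A minimal sketch, with `circuit(x, y)` as a hypothetical stand-in for the deterministic part of the circuit:

```python
from itertools import product

def nondet_eval(circuit, x, num_guess_bits):
    """Evaluate a nondeterministic circuit on input x by exhaustive
    search over its guess inputs: accept iff some assignment y of the
    guess bits makes the deterministic part output 1. Runs in time
    2^num_guess_bits times the cost of one circuit evaluation."""
    return any(circuit(x, y) for y in product((0, 1), repeat=num_guess_bits))
```

Since the circuits in the definition of d_k have size (and hence guess-input count) at most k, this search is polynomial in 2^k < |w|, which is exactly what p-computability of d requires.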
Theorem 15. For all k ≥ 2, if μ_p(Δ_k^P) ≠ 0, then BP·Σ_k^P = Σ_k^P.
Proof. Assume the hypothesis and let B be a fixed Σ_{k−1}^P-complete set. We know from Lemma 14 that for ε = 1/4,
μ_p({A | E^A is not NCIR^{A⊕B}(2^{εn})-hard}) = 0.
On the other hand, μ_p(Δ_k^P) ≠ 0. Hence, there is a set A ∈ Δ_k^P such that E^A (and thus also E^{A⊕B}) is NCIR^{A⊕B}(2^{εn})-hard. Applying Theorem 10 we get
Σ_k^P = NP^{A⊕B} = BP·NP^{A⊕B} = BP·Σ_k^P,
which completes the proof. □
Furthermore, we obtain the following two interesting consequences.
Corollary 16. If μ_p(NP ∩ coNP) ≠ 0, then BP·NP = NP.
Proof. Assuming that μ_p(NP ∩ coNP) ≠ 0, similarly to the proof of Theorem 15 it follows that there is a set A ∈ NP ∩ coNP such that NP^A = BP·NP^A. From the fact that NP^{NP∩coNP} = NP, we immediately get NP = BP·NP. □
Corollary 17. If μ_p(NP) ≠ 0, then BP·NP ⊆ NP/log.
Proof. If μ_p(NP) ≠ 0, then from Theorem 10 and Lemma 14 it follows that there is a set A ∈ NP such that BP·NP ⊆ NP/FP^A. Actually, from the proof of Lemma 14 we know something stronger. Namely, we know that the test language
C(A) = {x | x10^{2|x|} ∈ A} is in E^A and is NCIR(2^{εn})-hard. Hence, we can assume that A is sparse, and therefore we get BP·NP ⊆ NP/log by using a census argument [10]. □
5 Derandomizing BP·Σ_k^P if Σ_k^P is Not Small
In [14] it was an open question whether BP·Σ_2^P = Σ_2^P can be proven as a consequence of μ_p(NP) ≠ 0. We answer this question by proving the same consequence from a possibly weaker assumption. For a complexity class K ∈ {P, BPP, E} and oracle A, let K_∥^A denote the respective relativized class where only parallel queries to A are allowed.
Definition 18. Let A be an oracle set. Let CIR_∥^A(n, s) denote the class of boolean functions f: {0,1}^n → {0,1} that can be computed by some oracle circuit c of size at most s(n) that makes only parallel queries to oracle A. Furthermore, let CIR_∥^A(s) = ⋃_{n≥0} CIR_∥^A(n, s).
It is not hard to verify that Nisan and Wigderson's result (Theorem 2) also holds in the parallel setting.
Theorem 19. For all ε > 0 and all oracles A, if E_∥^A is CIR_∥^A(2^{εn})-hard, then P_∥^A = BPP_∥^A.
Corollary 20. For all k ≥ 2, if μ_p(Σ_k^P) ≠ 0, then BP·Σ_k^P = Σ_k^P.
Proof. Assume the hypothesis and let B be a fixed Σ_{k−1}^P-complete set. Observe that if μ_p(Σ_k^P) ≠ 0, then it follows from the proof of Lemma 5 (as given in [14]) that for ε = 1/4 there is a set A ∈ Σ_k^P such that C(A) is CIR^{A⊕B}(2^{εn})-hard. Since C(A) ∈ E_∥^A ⊆ E_∥^{A⊕B} and since CIR_∥^{A⊕B}(2^{εn}) ⊆ CIR^{A⊕B}(2^{εn}), it follows that E_∥^{A⊕B} is CIR_∥^{A⊕B}(2^{εn})-hard, implying that
Σ_k^P = P_∥^{A⊕B} = BPP_∥^{A⊕B} = BP·Σ_k^P,
where the second equality follows by Theorem 19. □
Corollary 20 has the following immediate lowness consequence.
Corollary 21. If μ_p(Σ_2^P) ≠ 0, then AM ∩ coAM (and hence the graph isomorphism problem) is low for Σ_2^P.
Corollary 20 can easily be extended to further complexity classes.
Corollary 22. For any complexity class C ⊆ EXP closed under join and polynomial-time truth-table reducibility, μ_p(C) ≠ 0 implies that BP·C ⊆ C.
Proof. Assume the hypothesis and let L be a set in BP·C, witnessed by some set B ∈ C. Since C is closed under many-one reducibility we can define a suitably padded version B̂ of B in C ∩ E such that L belongs to BP·{B̂}.
Now, exactly as in the proof of Corollary 20, we can argue that there is a set A ∈ C with the property that E_∥^{A⊕B̂} is CIR_∥^{A⊕B̂}(2^{εn})-hard. Hence, by Theorem 19 it follows that
L ∈ BP·{B̂} ⊆ BPP_∥^{A⊕B̂} = P_∥^{A⊕B̂} ⊆ C.
□
For example, using the fact that PP is closed under polynomial-time truth-table reducibility [8], it follows that if μ_p(PP) ≠ 0, then BP·PP = PP.
6 MA is Contained in ZPP^NP
In this section we apply the Nisan-Wigderson generator to show that MA is contained in ZPP^NP and, as a consequence, that MA ∩ coMA is low for ZPP^NP. This improves on a result of [26], where a quantifier simulation technique is used to show that NP^BPP (a subclass of MA) is contained in ZPP^NP. The proof of the next theorem also makes use of the fact that there are many n-ary boolean functions that are CIR(2^{εn})-hard (Lemma 11).
Theorem 23. MA is contained in ZPP^NP.
Proof. Let L be a set in MA. Then there exist a polynomial p and a set B ∈ P
such that for all x, |x| = n,
x ∈ L ⇒ ∃y, |y| = p(n): Prob_{r∈_R{0,1}^{p(n)}}[⟨x, y, r⟩ ∈ B] ≥ 3/4;
x ∉ L ⇒ ∀y, |y| = p(n): Prob_{r∈_R{0,1}^{p(n)}}[⟨x, y, r⟩ ∈ B] ≤ 1/4.
For fixed strings x and y, the decision procedure for B on input ⟨x, y, r⟩ can be simulated by some circuit c_{x,y} with inputs r_1, ..., r_{p(n)}, implying that
x ∈ L ⇒ ∃y, |y| = p(n): Prob_{r∈_R{0,1}^{p(n)}}[c_{x,y}(r) = 1] ≥ 3/4;
x ∉ L ⇒ ∀y, |y| = p(n): Prob_{r∈_R{0,1}^{p(n)}}[c_{x,y}(r) = 1] ≤ 1/4,
where w.l.o.g. we can assume that the size of c_{x,y} is bounded by p²(|x|). It follows by the deterministic version of Lemma 8 that for any (p, l, m, k)-design D and any CIR(p² + p·2^k)-hard boolean function g: {0,1}^m → {0,1},
|Prob_{y∈_R{0,1}^p}[c(y) = 1] − Prob_{s∈_R{0,1}^l}[c(g_D(s)) = 1]| ≤ 1/p
holds for every p-input circuit c of size at most p². Now let m(n) = 12·log p(n), l(n) = 2·12²·log p(n), and k(n) = log p(n). Furthermore, by Lemma 11 we know that for all sufficiently large n, a randomly chosen boolean function g: {0,1}^{m(n)} → {0,1} is CIR(2^{m(n)/4})-hard (and thus CIR(p(n)² + p(n)·2^{k(n)})-hard) with probability at least 1 − e^{−2^{m(n)/4}}. Hence, the following algorithm together with the NP oracle set
B′ = {⟨x, r_1, ..., r_k⟩ | ∃y ∈ Σ^{p(|x|)}: ‖{1 ≤ i ≤ k | c_{x,y}(r_i) = 1}‖ ≥ k/2}
witnesses L ∈ ZPP^NP:
input x, |x| = n;
compute a (p(n), l(n), m(n), k(n))-design D;
choose randomly g: {0,1}^{m(n)} → {0,1};
if g is CIR(2^{m(n)/4})-hard then {this can be decided by an NP oracle}
compute the pseudorandom strings r_1, ..., r_{2^{l(n)}} of g_D on all seeds;
if ⟨x, r_1, ..., r_{2^{l(n)}}⟩ ∈ B′ then accept else reject
else output ?
□
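The NP oracle set used above existentially quantifies a Merlin witness y and then takes a majority vote over the supplied strings. A brute-force stand-in (exponential, for illustration only; `c` is a hypothetical stand-in for the circuit c_{x,y}):

```python
from itertools import product

def oracle_B(x, rs, c, p_len):
    """Brute-force stand-in for the NP oracle set: accept iff some
    witness y of length p_len makes the circuit c(x, y, r) output 1
    on at least half of the pseudorandom strings rs."""
    for y in product((0, 1), repeat=p_len):
        if sum(c(x, y, r) for r in rs) >= len(rs) / 2:
            return True
    return False
```

In the actual proof this search over y is delegated to a single NP query, so the ZPP^NP machine itself only enumerates seeds and counts.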
We note that Theorem 23 cannot be further improved to AM ⊆ ZPP^NP by relativizing techniques, since there is an oracle relative to which AM is not contained in Σ_2^P [20]. From the closure properties of MA (namely, that MA is closed under conjunctive truth-table reductions) it easily follows that NP^{MA∩coMA} ⊆ MA. From Theorem 23 we have MA ⊆ ZPP^NP. Hence, NP^{MA∩coMA} ⊆ ZPP^NP, implying that ZPP^{NP^{MA∩coMA}} ⊆ ZPP^{ZPP^NP} = ZPP^NP. We have proved the following corollary.
Corollary 24. MA ∩ coMA is low for ZPP^NP and, consequently, BPP is low for ZPP^NP.
Acknowledgement
We would like to thank Lance Fortnow for interesting discussions on the topic of this paper.
References
1. E. Allender and M. Strauss. Measure on small complexity classes with applications for BPP. In Proc. 35th IEEE Symposium on the Foundations of Computer Science, 807–818. IEEE Computer Society Press, 1994.
2. A. Andreev, A. Clementi, and J. Rolim. Hitting sets derandomize BPP. In Proc. 23rd International Colloquium on Automata, Languages, and Programming, Lecture Notes in Computer Science #1099, 357–368. Springer-Verlag, 1996.
3. A. Andreev, A. Clementi, and J. Rolim. Worst-case hardness suffices for derandomization: a new method for hardness-randomness trade-offs. In Proc. 24th International Colloquium on Automata, Languages, and Programming, Lecture Notes in Computer Science #1256. Springer-Verlag, 1997.
4. L. Babai. Trading group theory for randomness. In Proc. 17th ACM Symposium on Theory of Computing, 421–429. ACM Press, 1985.
5. J. Balcázar, J. Díaz, and J. Gabarró. Structural Complexity II. Springer-Verlag, 1990.
6. J. Balcázar, J. Díaz, and J. Gabarró. Structural Complexity I. Springer-Verlag, second edition, 1995.
7. M. Bellare and S. Goldwasser. The complexity of decision versus search. SIAM Journal on Computing, 23:97–119, 1994.
8. L. Fortnow and N. Reingold. PP is closed under truth-table reductions. Information and Computation, 124(1):1–6, 1996.
9. R. Impagliazzo and A. Wigderson. P = BPP if E requires exponential circuits: derandomizing the XOR lemma. In Proc. 29th ACM Symposium on Theory of Computing. ACM Press, 1997.
10. J. Kadin. P^{NP[log n]} and sparse Turing-complete sets for NP. Journal of Computer and System Sciences, 39:282–298, 1989.
11. R. M. Karp and R. J. Lipton. Some connections between nonuniform and uniform complexity classes. In Proc. 12th ACM Symposium on Theory of Computing, 302–309. ACM Press, 1980.
12. J. H. Lutz. Almost everywhere high nonuniform complexity. Journal of Computer and System Sciences, 44:220–258, 1992.
13. J. H. Lutz. A pseudorandom oracle characterization of BPP. SIAM Journal on Computing, 22:1075–1086, 1993.
14. J. H. Lutz. Observations on measure and lowness for Δ_2^P. Theory of Computing Systems, 30:429–442, 1997.
15. J. H. Lutz and E. Mayordomo. Cook versus Karp-Levin: separating completeness notions if NP is not small. Theoretical Computer Science, 164:141–163, 1996.
16. J. H. Lutz and W. J. Schmidt. Circuit size relative to pseudorandom oracles. Theoretical Computer Science, 107:95–120, 1993.
17. N. Nisan and A. Wigderson. Hardness vs. randomness. Journal of Computer and System Sciences, 49:149–167, 1994.
18. C. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.
19. S. Rudich. Super-bits, demi-bits, and NQP-natural proofs. In Proc. 1st Intern. Symp. on Randomization and Approximation Techniques in Computer Science (Random'97), Lecture Notes in Computer Science #1269. Springer-Verlag, 1997.
20. M. Santha. Relativized Arthur-Merlin versus Merlin-Arthur games. Information and Computation, 80(1):44–49, 1989.
21. U. Schöning. Probabilistic complexity classes and lowness. Journal of Computer and System Sciences, 39:84–100, 1989.
22. A. Shamir. On the generation of cryptographically strong pseudo-random sequences. In Proc. 8th International Colloquium on Automata, Languages, and Programming, Lecture Notes in Computer Science #62, 544–550. Springer-Verlag, 1981.
23. S. Skyum and L. G. Valiant. A complexity theory based on boolean algebra. Journal of the ACM, 32:484–502, 1985.
24. S. Toda. PP is as hard as the polynomial-time hierarchy. SIAM Journal on Computing, 20:865–877, 1991.
25. A. C. Yao. Theory and applications of trapdoor functions. In Proc. 23rd IEEE Symposium on the Foundations of Computer Science, 80–91. IEEE Computer Society Press, 1982.
26. S. Zachos and M. Fürer. Probabilistic quantifiers vs. distrustful adversaries. In Proc. 7th Conference on Foundations of Software Technology and Theoretical Computer Science, Lecture Notes in Computer Science #287, 443–455. Springer-Verlag, 1987.
Verification of Open Systems
Moshe Y. Vardi⋆
Rice University, Department of Computer Science, Houston, TX 77251-1892, U.S.A. Email: [email protected], URL: http://www.cs.rice.edu/~vardi
Abstract. In computer system design, we distinguish between closed and open systems. A closed system is a system whose behavior is completely determined by the state of the system. An open system is a system that interacts with its environment and whose behavior depends on this interaction. The ability of temporal logics to describe an ongoing interaction of a reactive program with its environment makes them particularly appropriate for the specification of open systems. Nevertheless, model-checking algorithms used for the verification of closed systems are not appropriate for the verification of open systems. Correct verification of open systems should check the system with respect to arbitrary environments and should take into account uncertainty regarding the environment. This is not the case with current model-checking algorithms and tools. Module checking is an algorithmic method that checks, given an open system (modeled as a finite structure) and a desired requirement (specified by a temporal-logic formula), whether the open system satisfies the requirement with respect to all environments. In this paper we describe and examine the module-checking problem and study its computational complexity. Our results show that module checking is computationally harder than model checking.
1 Introduction
Temporal logics, which are modal logics geared towards the description of the temporal ordering of events, have been adopted as a powerful tool for specifying and verifying reactive systems [Pnu81]. One of the most significant developments in this area is the discovery of algorithmic methods for verifying temporal-logic properties of finite-state systems [CE81, QS81, LP85, CES86, VW86a]. This derives its significance both from the fact that many synchronization and communication protocols can be modeled as finite-state systems, as well as from the great ease of use of fully algorithmic methods. Experience has shown that algorithmic verification techniques scale up to industrial-sized designs [CGH+95], and tools based on such techniques are gaining acceptance in industry [BBG+94]. We distinguish here between two types of temporal logics: universal and non-universal. Both logics describe the computation tree induced by the system. Formulas of universal temporal logics, such as LTL, ∀CTL, and ∀CTL*, describe requirements that should hold in all the branches of the tree [GL94]. These requirements may be either linear (e.g., in all computations, only finitely many requests are sent) as in LTL or branching (e.g., in all computations we eventually reach a state from which, no matter how we continue, no requests are sent) as in ∀CTL. In both cases, the more behaviors the system has, the harder it is for the system to satisfy the requirements. Indeed, universal temporal logics induce the simulation order between systems [Mil71, CGB86]. That is, a system M simulates a system M′ if and only if all universal temporal logic formulas that are satisfied in M′ are satisfied in M as well. On the other hand, formulas of non-universal temporal logics, such as CTL and CTL*, may also impose possibility requirements on the system (e.g., there exists a computation in which only finitely many requests are sent) [EH86]. Here, it is no longer
⋆ Supported in part by NSF grants CCR-9628400 and CCR-9700061 and by a grant from the Intel Corporation.
true that simulation between systems corresponds to agreement on satisfaction of requirements. Indeed, it might be that adding behaviors to the system helps it to satisfy a possibility requirement or, equivalently, that disabling some of its behaviors causes the requirement not to be satisfied. We also distinguish between two types of systems: closed and open [HP85]. A closed system is a system whose behavior is completely determined by the state of the system. An open system is a system that interacts with its environment and whose behavior depends on this interaction. Thus, while in a closed system all the nondeterministic choices are internal, and resolved by the system, in an open system there are also external nondeterministic choices, which are resolved by the environment [Hoa85]. In order to check whether a closed system satisfies a required property, we translate the system into some formal model, specify the property with a temporal-logic formula, and check formally that the model satisfies the formula. Hence the name model checking for the verification methods derived from this viewpoint. In order to check whether an open system satisfies a required property, we should check the behavior of the system with respect to any environment, and often there is much uncertainty regarding the environment [FZ88]. In particular, it might be that the environment does not enable all the external nondeterministic choices. To see this, consider a sandwich-dispensing machine that serves, upon request, sandwiches with either ham or cheese. The machine is an open system and an environment for the system is an infinite line of hungry people. Since each person in the line can like either both ham and cheese, or only ham, or only cheese, each person suggests a different disabling of the external nondeterministic choices. Accordingly, there are many different possible environments to consider. 
It turned out that model-checking methods are applicable also for verification of open systems with respect to universal temporal-logic formulas [MP92, KV96, KV97a]. To see this, consider an execution of an open system in a maximal environment; i.e., an environment that enables all the external nondeterministic choices. The result is a closed system, and it is simulated by any other execution of the system in some environment. Therefore, one can check satisfaction of universal requirements in an open system by model checking the system viewed as a closed system (i.e., all nondeterministic choices are internal). This approach, however, cannot be adapted when verifying an open system with respect to non-universal requirements. Here, satisfaction of the requirements with respect to the maximal environment does not imply their satisfaction with respect to all environments. Hence, we should explicitly make sure that all possibility requirements are satisfied, no matter how the environment restricts the system. For example, when verifying that the sandwich-dispensing machine described above can always eventually serve ham, we want to make sure that this can happen no matter what the eating habits of the people in line are. Note that while this requirement holds with respect to the maximal environment, it does not hold, for instance, in an environment in which all the people in line do not like ham. Module checking is suggested in [KV96, KVW97, KV97a] as a general method for verification of open systems (we use the terms "open system" and "module" interchangeably). Given a module M and a temporal-logic formula ψ, the module-checking problem asks whether for all possible environments E, the execution of M in E satisfies ψ. There are two ways to model open systems. In the first approach [KV96, KVW97], we model open systems by transition systems with a partition of the states into two sets.
One set contains system states and corresponds to states where the system makes a transition. The second set contains environment states and corresponds to states where the environment makes a transition. For a module M, let ⟨T_M, V_M⟩ denote the unwinding of M into an infinite tree. We say that M satisfies ψ iff ψ holds in all the trees obtained by pruning from ⟨T_M, V_M⟩ subtrees whose root is a successor of an environment state. The intuition is that each such tree corresponds to a different (and possible) environment. We want ψ to hold in every such tree since, of course, we want the open system to satisfy its specification no matter how the environment behaves. We examine the complexity of the module-checking problem for non-universal temporal
logics. It turns out that for such logics module checking is much harder than model checking; in fact, module checking is as hard as satisfiability. Thus, CTL module checking is EXPTIME-complete and CTL* module checking is 2EXPTIME-complete. In both cases the complexity in terms of the size of the module is polynomial. In the second approach to modeling open systems [KV97a], we look at the states of the transition system in more detail. We view these states as assignments of values to variables. These variables are controlled either by the system or by the environment. In this approach we can capture the phenomenon in which the environment has incomplete information about the system; i.e., not all the variables are readable by the environment. Let us explain this issue in greater detail. An interaction between a system and its environment proceeds through a designated set of input and output variables. In addition, the system often has internal variables, which the environment cannot read. If two states of the system differ only in the values of unreadable variables, then the environment cannot distinguish between them. Similarly, if two computations of the system differ only in the values of unreadable variables along them, then the environment cannot distinguish between them either, and thus its behaviors along these computations are the same. More formally, when we execute a module M with an environment E, and several states in the execution look the same and have the same history according to E's incomplete information, then the nondeterministic choices made by E in each of these states coincide. In the sandwich-dispensing machine example, the people in line cannot see whether the ham and the cheese are fresh. Therefore, their choices are independent of this missing information.
Given an open system M with a partition of M's variables into readable and unreadable, and a temporal-logic formula ψ, the module-checking problem with incomplete information asks whether the execution of M in E satisfies ψ for all environments E whose nondeterministic choices are independent of the unreadable variables (that is, E behaves the same in indistinguishable states). It turns out that the presence of incomplete information makes module checking more complex. The problem of module checking with incomplete information is EXPTIME-complete for CTL and 2EXPTIME-complete for CTL*. In both cases, however, the complexity in terms of the size of the module is exponential, making module checking with incomplete information quite intractable.
2 Module Checking
The logic CTL* is a branching temporal logic. A path quantifier, E ("for some path") or A ("for all paths"), can prefix an assertion composed of an arbitrary combination of linear-time operators. There are two types of formulas in CTL*: state formulas, whose satisfaction is related to a specific state, and path formulas, whose satisfaction is related to a specific path. Formally, let AP be a set of atomic proposition names. A CTL* state formula is either:
– true, false, or p, for p ∈ AP.
– ¬φ, φ ∨ ψ, or φ ∧ ψ, where φ and ψ are CTL* state formulas.
– Eφ or Aφ, where φ is a CTL* path formula.
A CTL* path formula is either:
– A CTL* state formula.
– ¬φ, φ ∨ ψ, φ ∧ ψ, Gφ, Fφ, Xφ, or φUψ, where φ and ψ
are CTL* path formulas.
The logic CTL* consists of the set of state formulas generated by the above rules. The logic CTL is a restricted subset of CTL*. In CTL, the temporal operators G, F, X, and U must be immediately preceded by a path quantifier. Formally, it is the subset of CTL* obtained by
restricting the path formulas to be Gφ, Fφ, Xφ, or φUψ, where φ and ψ are CTL state formulas. Thus, for example, the CTL* formula φ = AGF(p ∧ EXq) is not a CTL formula. Adding a path quantifier, say A, before the F temporal operator in φ results in the formula AGAF(p ∧ EXq), which is a CTL formula. The logic ∀CTL* is a restricted subset of CTL* that allows only universal path quantification. Thus, it allows only the path quantifier A, which must always be in the scope of an even number of negations. Note that assertions of the form ¬Aψ, which is equivalent to E¬ψ, are not possible. Thus, the logic ∀CTL* is not closed under negation. The formula φ above is not a ∀CTL* formula. Changing the path quantifier E in φ to the path quantifier A results in the formula AGF(p ∧ AXq), which is a ∀CTL* formula. The logic ∀CTL is defined similarly, as the restricted subset of CTL that allows only universal path quantification. The logics ∃CTL* and ∃CTL are defined analogously, as the existential fragments of CTL* and CTL, respectively. Note that negating a ∀CTL* formula results in an ∃CTL* formula. The semantics of the logic CTL* (and its sub-logics) is defined with respect to a program P = ⟨AP, W, R, w_0, L⟩, where AP is the set of atomic propositions, W is a set of states, R ⊆ W × W is a transition relation that must be total (i.e., for every w ∈ W there exists w′ ∈ W such that R(w, w′)), w_0 is an initial state, and L: W → 2^AP maps each state to the set of atomic propositions true in that state. For w and w′ with R(w, w′), we say that w′ is a successor of w, and we use bd(w) to denote the number of successors that w has. A path of P is an infinite sequence π = w_0, w_1, ... of states such that for every i ≥ 0, we have R(w_i, w_{i+1}). The suffix w_i, w_{i+1}, ... of π is denoted by π^i. We use w ⊨ φ to indicate that a state formula φ holds at state w, and we use π ⊨ φ to indicate that a path formula φ holds at path π (with respect to a given program P).
The relation ⊨ is inductively defined as follows.
– For all w, we have w ⊨ true and w ⊭ false.
– For an atomic proposition p ∈ AP, we have w ⊨ p iff p ∈ L(w).
– w ⊨ ¬φ iff w ⊭ φ.
– w ⊨ φ ∨ ψ iff w ⊨ φ or w ⊨ ψ.
– w ⊨ Eφ iff there exists a path π = w_0, w_1, ... such that w_0 = w and π ⊨ φ.
– π ⊨ φ for a state formula φ iff w_0 ⊨ φ.
– π ⊨ ¬φ iff π ⊭ φ.
– π ⊨ φ ∨ ψ iff π ⊨ φ or π ⊨ ψ.
– π ⊨ Xφ iff π^1 ⊨ φ.
– π ⊨ φUψ iff there exists j ≥ 0 such that π^j ⊨ ψ and for all 0 ≤ i < j, we have π^i ⊨ φ.
The semantics above considers the Boolean operators ¬ ("negation") and ∨ ("or"), the temporal operators X ("next") and U ("until"), and the path quantifier E. The other operators are superfluous and can be viewed as the following abbreviations.
– φ ∧ ψ = ¬((¬φ) ∨ (¬ψ)) ("and").
– Fφ = trueUφ ("eventually").
– Gφ = ¬F¬φ ("always").
– Aφ = ¬E¬φ ("for all paths").
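Over a finite program, satisfaction of a CTL formula can be computed by the standard bottom-up labelling algorithm. A minimal sketch (this is ordinary model checking, not the module checking studied in this paper; the tuple encoding of formulas is our own):

```python
def sat(states, R, L, f):
    """Set of states of a program satisfying a CTL formula, computed
    bottom-up. Formulas are tuples: ('true',), ('p', name),
    ('not', g), ('or', g, h), ('EX', g), ('EU', g, h)."""
    op = f[0]
    if op == 'true':
        return set(states)
    if op == 'p':
        return {w for w in states if f[1] in L[w]}
    if op == 'not':
        return set(states) - sat(states, R, L, f[1])
    if op == 'or':
        return sat(states, R, L, f[1]) | sat(states, R, L, f[2])
    if op == 'EX':
        good = sat(states, R, L, f[1])
        return {w for w in states if any(v in good for v in R[w])}
    if op == 'EU':  # least fixpoint of  h or (g and EX(.))
        g, good = sat(states, R, L, f[1]), sat(states, R, L, f[2])
        while True:
            step = good | {w for w in g if any(v in good for v in R[w])}
            if step == good:
                return good
            good = step
    raise ValueError(op)

# EF q, i.e. E(true U q), on a 3-state program with q labelling state 2.
STATES = {0, 1, 2}
TRANS = {0: {1}, 1: {2}, 2: {2}}
LABEL = {0: set(), 1: set(), 2: {'q'}}
```

The remaining operators reduce to these via the abbreviations above (e.g., AXφ = ¬EX¬φ).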
A closed system is a system whose behavior is completely determined by the state of the system. We model a closed system by a program. An open system is a system that interacts with its environment and whose behavior depends on that interaction. We model an open system by a module M = ⟨AP, W_s, W_e, R, w_0, L⟩, where AP, R, w_0, and L are as in programs, W_s is a set of system states, W_e is a set of environment states, and we often use W to denote W_s ∪ W_e. We assume that the states in M are ordered. For each state w ∈ W, let succ(w) be the ordered tuple of w's R-successors; i.e., succ(w) = ⟨w_1, ..., w_{bd(w)}⟩, where for all 1 ≤ i ≤ bd(w), we
have R(w, w_i), and the w_i's are ordered. Consider a system state w_s and an environment state w_e. Whenever a module is in the state w_s, all the states in succ(w_s) are possible next states. In contrast, when the module is in state w_e, there is no certainty with respect to the environment transitions, and not all the states in succ(w_e) are possible next states. The only thing guaranteed, since we consider environments that cannot block the system, is that not all the transitions from w_e are disabled. For a state w ∈ W, let step(w) denote the set of the possible (ordered) tuples of w's next successors during an execution. By the above, step(w_s) = {succ(w_s)} and step(w_e) contains all the nonempty sub-tuples of succ(w_e). For k ∈ ℕ, let [k] denote the set {1, 2, ..., k}. An infinite tree with branching degrees bounded by k is a nonempty set T ⊆ [k]* such that if x·c ∈ T, where x ∈ [k]* and c ∈ [k], then also x ∈ T, and for all 1 ≤ c′ < c, we have x·c′ ∈ T. In addition, if x ∈ T, then x·1 ∈ T. The elements of T are called nodes, and the empty word ε is the root of T. For every node x ∈ T, we denote by d(x) the branching degree of x; that is, the number of c ∈ [k] for which x·c ∈ T. A path of T is a set π ⊆ T such that ε ∈ π and for all x ∈ π, there exists a unique c ∈ [k] such that x·c ∈ π. Given an alphabet Σ, a Σ-labeled tree is a pair ⟨T, V⟩ where T is a tree and V: T → Σ maps each node of T to a letter in Σ. A module M can be unwound into an infinite tree ⟨T_M, V_M⟩ in a straightforward way. When we examine a specification with respect to M, the specification should hold not only in ⟨T_M, V_M⟩ (which corresponds to a very specific environment that never restricts the set of its next states), but in all the trees obtained by pruning from ⟨T_M, V_M⟩ subtrees whose root is a successor of a node corresponding to an environment state. Let exec(M) denote the set of all these trees. Formally, ⟨T, V⟩ ∈ exec(M) iff the following holds:
– V(ε) = w_0.
– For all $x \in T$ with $V(x) = w$, there exists $\langle w_1, \ldots, w_n\rangle \in step(w)$ such that $T \cap (\{x\} \cdot \mathbb{N}) = \{x \cdot 1, x \cdot 2, \ldots, x \cdot n\}$ and for all $1 \le c \le n$ we have $V(x \cdot c) = w_c$. Intuitively, each tree in $exec(M)$ corresponds to a different behavior of the environment. We will sometimes view the trees in $exec(M)$ as $2^{AP}$-labeled trees, taking the label of a node $x$ to be $L(V(x))$. Which interpretation is intended will be clear from the context. Given a module $M$ and a CTL$^*$ formula $\psi$, we say that $M$ satisfies $\psi$, denoted $M \models_r \psi$, if all the trees in $exec(M)$ satisfy $\psi$. The problem of deciding whether $M$ satisfies $\psi$ is called module checking. We use $M \models \psi$ to indicate that when we regard $M$ as a program (thus refer to all its states as system states), then $M$ satisfies $\psi$. The problem of deciding whether $M \models \psi$ is the usual model-checking problem [CE81, CES86, EL85, QS81]. It is easy to see that while $M \models_r \psi$ implies $M \models \psi$, the converse is not necessarily true. Similarly, while $M \models \psi$ implies $M \not\models_r \neg\psi$, the converse does not hold either. Indeed, $M \models_r \psi$ requires all the trees in $exec(M)$ to satisfy $\psi$. On the other hand, $M \models \psi$ only means that the single tree $\langle T_M, V_M\rangle$ satisfies $\psi$. Finally, $M \not\models_r \neg\psi$ only tells us that there exists some tree in $exec(M)$ that satisfies $\psi$. As explained earlier, the distinction between model checking and module checking does not apply to universal temporal logics. Lemma 1. [KV96, KVW97] For universal temporal logics, the module-checking problem and the model-checking problem coincide. In order to solve the module-checking problem for non-universal logics, we use nondeterministic tree automata. Tree automata run on $\Sigma$-labeled trees. A Büchi tree automaton is $A = \langle \Sigma, D, Q, q_0, \delta, F\rangle$, where $\Sigma$ is an alphabet, $D$ is a finite set of branching degrees (positive integers), $Q$ is a set of states, $q_0 \in Q$ is an initial state, $\delta : Q \times \Sigma \times D \to 2^{Q^*}$ is a transition function satisfying $\delta(q, \sigma, d) \subseteq Q^d$ for every $q \in Q$, $\sigma \in \Sigma$, and $d \in D$, and $F \subseteq Q$ is an acceptance condition.
A run of $A$ on an input $\Sigma$-labeled tree $\langle T, V\rangle$ with branching degrees in $D$ is a $Q$-labeled tree $\langle T, r\rangle$ such that $r(\varepsilon) = q_0$ and for every $x \in T$, we have $\langle r(x \cdot 1), r(x \cdot 2), \ldots, r(x \cdot d(x))\rangle \in \delta(r(x), V(x), d(x))$. If, for instance, $r(1 \cdot 1) = q$, $V(1 \cdot 1) = \sigma$, $d(1 \cdot 1) = 2$, and $\delta(q, \sigma, 2) = \{\langle q_1, q_2\rangle, \langle q_4, q_5\rangle\}$, then either $r(1 \cdot 1 \cdot 1) = q_1$ and $r(1 \cdot 1 \cdot 2) = q_2$, or $r(1 \cdot 1 \cdot 1) = q_4$ and $r(1 \cdot 1 \cdot 2) = q_5$. Given a run $\langle T, r\rangle$ and a path $\pi \subseteq T$, we define $Inf(r|\pi) = \{q \in Q :$ for infinitely many $x \in \pi$, we have $r(x) = q\}$. That is, $Inf(r|\pi)$ is the set of states that $r$ visits infinitely often along $\pi$. A run $\langle T, r\rangle$ is accepting iff for all paths $\pi \subseteq T$, we have $Inf(r|\pi) \cap F \neq \emptyset$. Namely, along all the paths of $T$, the run visits states from $F$ infinitely often. An automaton $A$ accepts $\langle T, V\rangle$ iff there exists an accepting run $\langle T, r\rangle$ of $A$ on $\langle T, V\rangle$. We use $L(A)$ to denote the language of the automaton $A$; i.e., the set of all trees accepted by $A$. In addition to Büchi tree automata, we also refer to Rabin tree automata. There, $F \subseteq 2^Q \times 2^Q$, and a run is accepting iff for every path $\pi \subseteq T$, there exists a pair $\langle G, B\rangle \in F$ such that $Inf(r|\pi) \cap G \neq \emptyset$ and $Inf(r|\pi) \cap B = \emptyset$. The size of an automaton $A$, denoted $|A|$, is defined as $|Q| + |\delta| + |F|$, where $|\delta|$ is the sum of the lengths of the tuples that appear in the transitions in $\delta$, and $|F|$ is the sum of the sizes of the sets appearing in $F$ (a single set in the case $A$ is a Büchi automaton, and $2m$ sets in the case $A$ is a Rabin automaton with $m$ pairs). Note that $|A|$ is independent of the sizes of $\Sigma$ and $D$. Note also that $A$ can be stored in space $O(|A|)$.
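Two of the combinatorial notions above can be made concrete in a short sketch: $step(w)$ (the nonempty ordered sub-tuples available to the environment) and the Büchi and Rabin acceptance conditions evaluated on an ultimately periodic ("lasso") path, where $Inf(r|\pi)$ is simply the set of states in the loop. All function names below are ours, for illustration only; they are not part of the paper's formalism.

```python
from itertools import combinations

def step(succ_tuple, is_system):
    """Possible ordered tuples of enabled successors.

    A system state offers exactly succ(w); an environment state offers
    any nonempty sub-tuple, since the environment may disable some
    transitions but can never block the system completely.
    """
    if is_system:
        return [tuple(succ_tuple)]
    return [sub
            for r in range(1, len(succ_tuple) + 1)
            for sub in combinations(succ_tuple, r)]  # order-preserving

def buchi_accepts(loop, F):
    """Buchi condition on a lasso path: Inf = set(loop) meets F."""
    return bool(set(loop) & set(F))

def rabin_accepts(loop, pairs):
    """Rabin condition: some pair (G, B) with Inf meeting G and missing B."""
    inf = set(loop)
    return any(inf & set(G) and not (inf & set(B)) for G, B in pairs)
```

For example, an environment state with successors $\langle w_1, w_2\rangle$ yields the three tuples $\langle w_1\rangle$, $\langle w_2\rangle$, and $\langle w_1, w_2\rangle$.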
3 The Complexity of Module Checking
We have already seen that for non-universal temporal logics, the model-checking problem and the module-checking problem do not coincide. In this section we study the complexity of CTL and CTL$^*$ module checking. We show that the difference between the model-checking and the module-checking problems is reflected in their complexities, and in a very significant manner. Theorem 2. [KV96] (1) The module-checking problem for CTL is EXPTIME-complete. (2) The module-checking problem for CTL$^*$ is 2EXPTIME-complete. Proof (sketch): We start with the upper bounds. Given $M$ and $\psi$, we define two tree automata. Essentially, the first automaton accepts the set of trees in $exec(M)$ and the second automaton accepts the set of trees that do not satisfy $\psi$. Thus, $M \models_r \psi$ iff the intersection of the automata is empty. Recall that each tree in $exec(M)$ is obtained from $\langle T_M, V_M\rangle$ by pruning some of its subtrees. The tree $\langle T_M, V_M\rangle$ is a $2^{AP}$-labeled tree. We can think of a tree $\langle T, V\rangle \in exec(M)$ as the $(2^{AP} \cup \{\bot\})$-labeled tree obtained from $\langle T_M, V_M\rangle$ by replacing the labels of nodes pruned in $\langle T, V\rangle$ by $\bot$. Doing so, all the trees in $exec(M)$ have the same shape (they all coincide with $T_M$), and they differ only in their labeling. Accordingly, we can think of an environment to $\langle T_M, V_M\rangle$ as a strategy for placing $\bot$'s in $\langle T_M, V_M\rangle$: placing a $\bot$ in a certain node corresponds to the environment disabling the transition to that node. Since we consider environments that do not "block" the system, at least one successor of each node is not labeled with $\bot$. Also, once the environment places a $\bot$ in a certain node $x$, it should keep placing $\bot$'s in all the nodes of the subtree that has $x$ as its root. Indeed, all the nodes of this subtree are disabled. The first automaton, $A_M$, accepts all the $(2^{AP} \cup \{\bot\})$-labeled trees obtained from $\langle T_M, V_M\rangle$ by such a "legal" placement of $\bot$'s. Formally, given a module $M = \langle AP, W_s, W_e, R, w_0, L\rangle$, we define $A_M = \langle 2^{AP} \cup \{\bot\}, D, Q, q_0, \delta, Q\rangle$, where
– $D = \bigcup_{w \in W} \{bd(w)\}$. That is, $D$ contains all the branching degrees in $M$ (and hence also all
branching degrees in $T_M$). – $Q = W \times \{\top, \vdash, \bot\}$. Thus, every state $w$ of $M$ induces the three states $\langle w, \top\rangle$, $\langle w, \vdash\rangle$, and $\langle w, \bot\rangle$ in $A_M$. Intuitively, when $A_M$ is in state $\langle w, \bot\rangle$, it can read only the letter $\bot$. When $A_M$ is in state $\langle w, \top\rangle$, it can read only letters in $2^{AP}$. Finally, when $A_M$ is in state $\langle w, \vdash\rangle$, it can read both letters in $2^{AP}$ and the letter $\bot$. Thus, while a state $\langle w, \vdash\rangle$ leaves it for the environment to decide whether the transition to $w$ is enabled, a state $\langle w, \top\rangle$ requires the environment to enable the transition to $w$, and a state $\langle w, \bot\rangle$ requires the environment to disable the transition to $w$. The three types of states help us to make sure that the environment enables
all transitions from system states, enables at least one transition from each environment state, and disables all transitions from states the transition to which has already been disabled. – $q_0 = \langle w_0, \top\rangle$. – The transition function $\delta : Q \times (2^{AP} \cup \{\bot\}) \times D \to 2^{Q^*}$ is defined for $w \in W$ and $k = bd(w)$ as follows. Let $succ(w) = \langle w_1, \ldots, w_k\rangle$. For $w \in W_s \cup W_e$ and $m \in \{\vdash, \bot\}$, we have
$\delta(\langle w, m\rangle, \bot, k) = \{\langle\langle w_1, \bot\rangle, \langle w_2, \bot\rangle, \ldots, \langle w_k, \bot\rangle\rangle\}$. For $w \in W_s$ and $m \in \{\top, \vdash\}$, we have $\delta(\langle w, m\rangle, L(w), k) = \{\langle\langle w_1, \top\rangle, \langle w_2, \top\rangle, \ldots, \langle w_k, \top\rangle\rangle\}$. For $w \in W_e$ and $m \in \{\top, \vdash\}$, we have $\delta(\langle w, m\rangle, L(w), k) = \{ \langle\langle w_1, \top\rangle, \langle w_2, \vdash\rangle, \ldots, \langle w_k, \vdash\rangle\rangle,$ $\langle\langle w_1, \vdash\rangle, \langle w_2, \top\rangle, \ldots, \langle w_k, \vdash\rangle\rangle,$ $\ldots,$
$\langle\langle w_1, \vdash\rangle, \langle w_2, \vdash\rangle, \ldots, \langle w_k, \top\rangle\rangle \}$. That is, $\delta(\langle w, m\rangle, L(w), k)$ contains $k$ $k$-tuples. When the automaton proceeds according to the $i$th tuple, the environment can disable the transitions to all of $w$'s successors except the transition to $w_i$, which must be enabled. Note that $\delta$ is not defined for the case $k \neq bd(w)$, or when the input does not meet the restrictions imposed by the $\top$, $\vdash$, and $\bot$ annotations, or the labeling of $w$.
Let $k$ be the maximal branching degree in $M$. It is easy to see that $|Q| \le 3 \cdot |W|$ and $|\delta| \le k \cdot |R|$. Thus, assuming that $|W| \le |R|$, the size of $A_M$ is bounded by $O(k \cdot |R|)$. Recall that a node of $\langle T, V\rangle \in L(A_M)$ that is labeled $\bot$ stands for a node that actually does not exist in the corresponding pruning of $\langle T_M, V_M\rangle$. Accordingly, if we interpret CTL$^*$ formulas over the trees obtained by pruning subtrees of $\langle T_M, V_M\rangle$ by means of the trees recognized by $A_M$, we should treat a node that is labeled by $\bot$ as a node that does not exist. To do this, we define a function $f :$ CTL$^*$ formulas $\to$ CTL$^*$ formulas such that $f(\psi)$ restricts path quantification to paths that never visit a state labeled with $\bot$. We define $f$ inductively as follows.
– $f(q) = q$, for an atomic proposition $q$.
– $f(\neg\psi) = \neg f(\psi)$.
– $f(\psi_1 \vee \psi_2) = f(\psi_1) \vee f(\psi_2)$.
– $f(E\psi) = E((G\neg\bot) \wedge f(\psi))$.
– $f(A\psi) = A((F\bot) \vee f(\psi))$.
– $f(X\psi) = Xf(\psi)$.
– $f(\psi_1 U \psi_2) = f(\psi_1) U f(\psi_2)$.
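The translation $f$ can be written down directly over a small formula representation. In this sketch, formulas are nested tuples; the constructors `'and'`, `'G'`, and `'F'` abbreviate the usual derived operators, and the proposition name `'bot'` stands for $\bot$. The encoding and all names are ours, chosen only for illustration.

```python
BOT = ('ap', 'bot')  # fresh proposition marking pruned (bottom-labeled) nodes

def f(phi):
    """Restrict path quantification to paths that never visit a bot-node."""
    op = phi[0]
    if op == 'ap':                     # f(q) = q
        return phi
    if op == 'not':                    # f(~psi) = ~f(psi)
        return ('not', f(phi[1]))
    if op == 'or':                     # f(psi1 | psi2) = f(psi1) | f(psi2)
        return ('or', f(phi[1]), f(phi[2]))
    if op == 'E':                      # f(E psi) = E((G ~bot) & f(psi))
        return ('E', ('and', ('G', ('not', BOT)), f(phi[1])))
    if op == 'A':                      # f(A psi) = A((F bot) | f(psi))
        return ('A', ('or', ('F', BOT), f(phi[1])))
    if op == 'X':                      # f(X psi) = X f(psi)
        return ('X', f(phi[1]))
    if op == 'U':                      # f(psi1 U psi2) = f(psi1) U f(psi2)
        return ('U', f(phi[1]), f(phi[2]))
    raise ValueError('unknown operator: %s' % op)
```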
For example, $f(E qU(AFp)) = E((G\neg\bot) \wedge (qU(A((F\bot) \vee Fp))))$. When $\psi$ is a CTL formula, the formula $f(\psi)$ is not necessarily a CTL formula. Still, it has a restricted syntax: its path formulas have either a single linear-time operator or two linear-time operators connected by a Boolean operator. By [KG96], formulas of this syntax have a linear translation to CTL. Given $\psi$, let $A_{D,\neg\psi}$ be a Büchi tree automaton that accepts exactly all the tree models of $f(\neg\psi)$ with branching degrees in $D$. By [VW86b], such an $A_{D,\neg\psi}$ of size $2^{O(k \cdot |\psi|)}$ exists. By the definition of satisfaction, we have that $M \models_r \psi$ iff all the trees in $exec(M)$ satisfy $\psi$; in other words, iff no tree in $exec(M)$ satisfies $\neg\psi$. Recall that the automaton $A_M$ accepts a $(2^{AP} \cup \{\bot\})$-labeled tree iff it corresponds to a "legal" pruning of $\langle T_M, V_M\rangle$ by the environment, with a pruned node being labeled by $\bot$. Also, the automaton $A_{D,\neg\psi}$ accepts a $(2^{AP} \cup \{\bot\})$-labeled tree iff it does not satisfy $\psi$, with path quantification ranging only over paths that never meet a node labeled with $\bot$. Hence, checking whether $M \models_r \psi$ can be reduced to testing $L(A_M) \cap L(A_{D,\neg\psi})$ for emptiness. Equivalently, we have to test $L(A_M \times A_{D,\neg\psi})$ for emptiness. By [VW86b], the nonemptiness problem of Büchi tree automata can be solved in quadratic time, which gives us an algorithm of time complexity $O(|R|^2 \cdot 2^{O(k \cdot |\psi|)})$. The proof is similar for CTL$^*$. Here, following [ES84, EJ88], we have that $A_{D,\neg\psi}$ is a Rabin tree automaton with $2^{k \cdot 2^{O(|\psi|)}}$ states and $2^{O(|\psi|)}$ pairs. By [EJ88, PR89], checking the emptiness of $L(A_M \times A_{D,\neg\psi})$ can then be done in time $(k \cdot |R|)^{2^{O(|\psi|)}} \cdot 2^{k \cdot 2^{O(|\psi|)}}$. It remains to prove the lower bounds. To get an EXPTIME lower bound for CTL, we reduce CTL satisfiability, proved to be EXPTIME-complete in [FL79, Pra80], to CTL module checking. Given a CTL formula $\xi$, we construct a module $M$ and a CTL formula $\varphi$ such that the size of $M$ is quadratic in the length of $\xi$, the length of $\varphi$ is linear in the length of $\xi$, and $\xi$ is satisfiable iff $M \not\models_r \neg\varphi$.
The proof is the same for CTL$^*$. Here, we do a reduction from satisfiability of CTL$^*$, proved to be 2EXPTIME-hard in [VS85]. See [KV96] for more details. $\Box$ When analyzing the complexity of model checking, a distinction should be made between complexity in the size of the input structure and complexity in the size of the input formula; it is the complexity in the size of the structure that is typically the computational bottleneck [LP85]. We now consider the program complexity [VW86a] of module checking; i.e., the complexity of this problem in terms of the size of the input module, assuming the formula is fixed. It is known that the program complexity of LTL, CTL, and CTL$^*$ model checking is NLOGSPACE [VW86a, BVW94]. This is very significant, since it implies that if the system to be checked is obtained as the product of the components of a concurrent program (as is usually the case), the space required is polynomial in the size of these components rather than of the order of the exponentially larger composition. We have seen that when we measure the complexity of the module-checking problem in terms of both the program and the formula, module checking of CTL and CTL$^*$ formulas is much harder than their model checking. We now claim that when we consider program complexity, module checking is still harder. Theorem 3. [KV96] The program complexity of CTL and CTL$^*$ module checking is PTIME-complete. Proof: Since the algorithms given in the proof of Theorem 2 are polynomial in the size of the module, membership in PTIME is immediate. We prove hardness in PTIME by reducing the Monotonic Circuit Value Problem (MCV), proved to be PTIME-hard in [Gol77], to module checking of a fixed CTL formula of the form $EFp$. In the MCV problem, we are given a monotonic Boolean circuit $\alpha$ (i.e., a circuit constructed solely of AND gates and OR gates), and a vector $\langle x_1, \ldots, x_n\rangle$ of Boolean input values. The problem is to determine whether the output of $\alpha$ on $\langle x_1, \ldots, x_n\rangle$ is $1$.
Let us denote a monotonic circuit by a tuple $\alpha = \langle G_\wedge, G_\vee, G_{in}, g_{out}, T\rangle$, where $G_\wedge$ is the set of AND gates, $G_\vee$ is the set of OR gates, $G_{in}$ is the set of input gates (identified as $g_1, \ldots, g_n$), $g_{out} \in G_\wedge \cup G_\vee \cup G_{in}$ is the output gate, and $T \subseteq G \times G$ denotes the acyclic dependencies in $\alpha$; that is, $\langle g, g'\rangle \in T$ iff the output of gate $g'$ is an input of gate $g$. Given a monotonic circuit $\alpha = \langle G_\wedge, G_\vee, G_{in}, g_{out}, T\rangle$ and an input vector $x = \langle x_1, \ldots, x_n\rangle$, we construct a module $M_{\alpha,x} = \langle \{0, 1\}, G_\wedge, G_\vee \cup G_{in}, R, g_{out}, L\rangle$, where – $R = T \cup \{\langle g, g\rangle : g \in G_{in}\}$. – For $g \in G_\wedge \cup G_\vee$, we have $L(g) = \{1\}$. For $g_i$
$\in G_{in}$, we have $L(g_i) = \{x_i\}$.
Clearly, the size of $M_{\alpha,x}$ is linear in the size of $\alpha$. Intuitively, each tree in $exec(M_{\alpha,x})$ corresponds to a decision of the environment as to how to satisfy the OR gates of $\alpha$ (an OR gate is satisfied by satisfying any nonempty subset of its inputs). It is therefore easy to see that there exists a tree $\langle T, V\rangle \in exec(M_{\alpha,x})$ such that $\langle T, V\rangle \models AG1$ iff the output of $\alpha$ on $x$ is $1$. Hence, by the definition of module checking, we have that $M_{\alpha,x} \models_r EF0$ iff the output of $\alpha$ on $x$ is $0$. $\Box$
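The reduction above is easy to sanity-check against a direct evaluation of the monotonic circuit. The sketch below is our own helper, not part of the construction; the dictionary encoding of the dependencies and the gate names are illustrative assumptions.

```python
def eval_gate(g, and_gates, inputs, deps):
    """Evaluate gate g of a monotonic circuit.

    `deps[g]` lists the gates feeding g, `inputs` maps input gates to
    their Boolean values, and `and_gates` is the set of AND gates; the
    remaining internal gates are OR gates.  Acyclicity of the
    dependencies makes the recursion well-founded.
    """
    if g in inputs:
        return inputs[g]
    vals = [eval_gate(h, and_gates, inputs, deps) for h in deps[g]]
    return all(vals) if g in and_gates else any(vals)

# g_out = AND(o1, g1), with o1 = OR(g2, g3) and inputs x = (1, 0, 1).
deps = {"g_out": ["o1", "g1"], "o1": ["g2", "g3"]}
print(eval_gate("g_out", {"g_out"},
                {"g1": True, "g2": False, "g3": True}, deps))
```

On this example the OR gate is satisfiable (via $g_3$), so the output is 1; correspondingly, some pruning of the computation tree keeps only $1$-labeled states.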
4 Module Checking with Incomplete Information We first need to generalize the definition of trees from Section 2. Given a finite set $\Upsilon$, an $\Upsilon$-tree is a nonempty set $T \subseteq \Upsilon^*$ such that if $s \cdot \upsilon \in T$, where $s \in \Upsilon^*$ and $\upsilon \in \Upsilon$, then also $s \in T$. When $\Upsilon$ is not important or clear from the context, we call $T$ a tree. The elements of $T$ are called nodes, and the empty word $\varepsilon$ is the root of $T$. For every $s \in T$, the nodes $s \cdot \upsilon \in T$, where $\upsilon \in \Upsilon$, are the children of $s$. An $\Upsilon$-tree $T$ is a full infinite tree if $T = \Upsilon^*$. Each node $s$ of $T$ has a direction in $\Upsilon$. The direction of the root is some designated $\upsilon_0 \in \Upsilon$. The direction of a node $s \cdot \upsilon$ is $\upsilon$. A path of $T$ is a set $\pi \subseteq T$ such that $\varepsilon \in \pi$ and for every $s \in \pi$ there exists a unique $\upsilon \in \Upsilon$ such that $s \cdot \upsilon \in \pi$. Given two finite sets $\Upsilon$ and $\Sigma$, a $\Sigma$-labeled $\Upsilon$-tree is a pair $\langle T, V\rangle$ where $T$ is an $\Upsilon$-tree and $V : T \to \Sigma$ maps each node of $T$ to a letter in $\Sigma$. When $\Upsilon$ and $\Sigma$ are not important or clear from the context, we call $\langle T, V\rangle$ a labeled tree. For finite sets $X$ and $Y$, and a node $s \in (X \times Y)^*$, let $hide_Y(s)$ be the node in $X^*$ obtained from $s$ by replacing each letter $(x, y)$ by the letter $x$. For example, when $X = Y = \{0, 1\}$ and we write each letter $(x, y)$ as the pair of bits $xy$, the node $0010$ of an $(X \times Y)$-tree corresponds, by $hide_Y$, to the node $01$ of an $X$-tree. Note that the nodes $0011$, $0110$, and $0111$ of the $(X \times Y)$-tree also correspond to the node $01$ of the $X$-tree. Let $Z$ be a finite set. For a $Z$-labeled $X$-tree $\langle T, V\rangle$, we define the $Y$-widening of $\langle T, V\rangle$, denoted $wide_Y(\langle T, V\rangle)$, as the $Z$-labeled $(X \times Y)$-tree $\langle T', V'\rangle$ where for every $s \in T$, we have $hide_Y^{-1}(s) \subseteq T'$, and for every $t \in T'$, we have $V'(t) = V(hide_Y(t))$. Note that for every node $t \in T'$ and $x \in X$, the children $t \cdot (x, y)$ of $t$, for all $y \in Y$, agree on their label in $\langle T', V'\rangle$. Indeed, they are all labeled with $V(hide_Y(t) \cdot x)$. We now describe a second approach to modeling open systems. We describe an open system by a module $M = \langle I, O, H, W, w_0, R, L\rangle$, where – $I$, $O$, and $H$ are sets of input, readable output, and hidden (internal) variables, respectively.
We assume that $I$, $O$, and $H$ are pairwise disjoint. We use $K$ to denote the variables known to the environment, thus $K = I \cup O$, and we use $P$ to denote all variables, thus $P = K \cup H$. – $W$ is a set of states, and $w_0 \in W$ is an initial state. – $R \subseteq W \times W$ is a total transition relation. For $\langle w, w'\rangle \in R$, we say that $w'$ is a successor of $w$. Requiring $R$ to be total means that every state $w$ has at least one successor. – $L : W \to 2^P$ maps each state to the set of variables that hold in this state. The intuition is that in every state $w$, the module reads $L(w) \cap I$ and writes $L(w) \cap (O \cup H)$.
A computation of $M$ is a sequence $w_0, w_1, \ldots$ of states such that for all $i \ge 0$ we have $\langle w_i, w_{i+1}\rangle \in R$. We define the size $|M|$ of $M$ as $(|W| \cdot |P|) + |R|$. We assume, without loss of generality, that all the states of $M$ are labeled differently; i.e., there exist no $w_1$ and $w_2$ in $W$ for which $L(w_1) = L(w_2)$ (otherwise, we can add variables in $H$ that differentiate states with identical labeling). With each module $M$ we can associate a computation tree $\langle T_M, V_M\rangle$, obtained by unwinding $M$ from the initial state. More formally, $\langle T_M, V_M\rangle$ is a $2^P$-labeled $2^P$-tree (not necessarily with a fixed branching degree). Each node of $\langle T_M, V_M\rangle$ corresponds to a state of $M$, with the root corresponding to the initial state. A node corresponding to a state $w$ is labeled by $L(w)$, and its children correspond to the successors of $w$ in $M$. The assumption that the nodes are labeled differently enables us to embody $\langle T_M, V_M\rangle$ in a $(2^P)^*$-tree, with a node with direction $\upsilon$ labeled $\upsilon$. A module $M$ is closed iff $I = \emptyset$. Otherwise, it is open. Consider an open module $M$. The module interacts with some environment $E$ that supplies its inputs. When $M$ is in state $w$, its ability to move to a certain successor $w'$ of $w$ is conditioned by the behavior of its environment. If, for example, $L(w') \cap I = \sigma$ and the environment does not supply the input $\sigma$ to $M$, then $M$ cannot move to $w'$. Thus, the environment may disable some of $M$'s transitions. We can think of an environment to $M$ as a strategy $E : (2^K)^* \to \{\top, \bot\}$ that maps a finite history $s$ of a computation (as seen by the environment) to either $\top$, meaning that the environment enables $M$ to execute $s$, or $\bot$, meaning that the environment does not enable $M$ to execute $s$. In other words, if $M$ reaches a state $w$ by executing some $s \in (2^K)^*$, and a successor $w'$ of $w$ has $L(w') \cap K = \sigma$, then an interaction of $M$ with $E$ can proceed from $w$ to $w'$ iff $E(s \cdot \sigma) = \top$. We say that the tree $\langle (2^K)^*, E\rangle$ maintains the strategy applied by $E$.
We denote by $M \lhd E$ the execution of $M$ in $E$; that is, the tree obtained by pruning subtrees of the computation tree $\langle T_M, V_M\rangle$ according to $E$. Note that $E$ may disable all the successors of a state $w$. We say that a composition $M \lhd E$ is deadlock free iff for every state $w$, at least one successor of $w$ is enabled. Given $M$, we can define the maximal environment $E_{max}$ for $M$. The maximal environment has $E_{max}(s) = \top$ for all $s \in (2^K)^*$; thus it enables all the transitions of $M$. Recall that in Section 2, we modeled open systems using system and environment states, and only transitions from environment states may be disabled. Here, the interaction of the system with its environment is more explicit, and transitions are disabled by the environment assigning values to the system's input variables. The hiding and widening operators enable us to refer to the interaction of $M$ with $E$ as seen by both $M$ and $E$. As we shall see below, this interaction looks different from the two points of view. First, clearly, the labels of the computation tree of $M$, as seen by $E$, do not contain variables in $H$. Consequently, $E$ regards $\langle T_M, V_M\rangle$ as a $2^K$-tree, rather than a $2^P$-tree. Indeed, $E$ cannot distinguish between two nodes that differ only in the values of the variables in $H$ in their labels. Accordingly, a branch of $\langle T_M, V_M\rangle$ into two such nodes is viewed by $E$ as a single transition. This incomplete information of $E$ is reflected in its strategy, which is independent of $H$. Thus, successors of a state that agree on the labeling of the readable variables are either all enabled or all disabled. Formally, if $\langle (2^K)^*, E\rangle$ is the $\{\top, \bot\}$-labeled $2^K$-tree that maintains the strategy applied by $E$, then the $\{\top, \bot\}$-labeled $2^P$-tree $wide_{2^H}(\langle (2^K)^*, E\rangle)$ maintains the "full" strategy for $E$, as seen by someone that sees both $K$ and $H$. Another way to see the effect of incomplete information is to associate with each environment $E$ a tree obtained from $\langle T_M, V_M\rangle$ by pruning some of its subtrees.
A subtree with root $s \in T_M$ is pruned iff $E(hide_{2^H}(s)) = \bot$. Every two nodes $s_1$ and $s_2$ that are indistinguishable according to $E$'s incomplete information have $hide_{2^H}(s_1) = hide_{2^H}(s_2)$. Hence, either both subtrees with roots $s_1$ and $s_2$ are pruned, or neither is. Note that once $E(s) = \bot$ for some $s \in (2^K)^*$, we can assume that $E(s \cdot t) = \bot$ for all $t \in (2^K)^*$ as well. Indeed, once the environment disables the transition to a certain node $s$, it actually disables the transitions to all the nodes
in the subtree with root $s$. Note also that $M \lhd E$ is deadlock free iff for every $s \in T_M$ with $E(hide_{2^H}(s)) = \top$, at least one direction $\upsilon \in 2^P$ has $s \cdot \upsilon \in T_M$ and $E(hide_{2^H}(s \cdot \upsilon)) = \top$.
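The $hide_Y$ and $wide_Y$ operators of this section have a direct rendering when nodes are represented as tuples of letters. The sketch below uses our own function names, for illustration only.

```python
def hide_y(node):
    """hide_Y: project a node of an (X x Y)-tree, i.e. a tuple of
    (x, y) pairs, to the corresponding node of the X-tree."""
    return tuple(x for x, _y in node)

def wide_y(labeling):
    """wide_Y: turn a labeling V of the X-tree into the labeling V' of
    the (X x Y)-tree, with V'(t) = V(hide_Y(t)).  All Y-variants of a
    node therefore agree on their label, which is exactly how a
    strategy independent of the hidden variables H behaves."""
    return lambda node: labeling(hide_y(node))
```

For instance, with $X = Y = \{0, 1\}$, the nodes $(0,0)(1,0)$ and $(0,0)(1,1)$ both project to $01$, so any widened labeling assigns them the same value.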
5 The Complexity of Module Checking with Incomplete Information The module-checking with incomplete information problem is defined as follows. Let $M$ be a module, and let $\psi$ be a temporal-logic formula over the set $P$ of $M$'s variables. Does $M \lhd E$ satisfy $\psi$ for every environment $E$ for which $M \lhd E$ is deadlock free? When the answer to the module-checking question is positive, we say that $M$ reactively satisfies $\psi$, denoted $M \models_r \psi$. Note that when $H = \emptyset$, i.e., there are no hidden variables, we get the module-checking problem, which was studied in Section 3. Even with incomplete information, the distinction between model checking and module checking does not apply to universal temporal logics. Lemma 4. [KV97a] For universal temporal logics, the module-checking with incomplete information problem and the model-checking problem coincide. Dealing with incomplete information for non-universal logics is complicated. The solution we suggest is based on alternating tree automata and is outlined below. In Sections 5.1 and 5.2, we define alternating tree automata and describe the solution in detail. We start by recalling the solution to the module-checking problem. Given $M$ and $\psi$, we proceed as follows. A1. Define a nondeterministic tree automaton $A_M$ that accepts all the $2^P$-labeled trees that correspond to compositions of $M$ with some $E$ for which $M \lhd E$ is deadlock free. Thus, each tree accepted by $A_M$ is obtained from $\langle T_M, V_M\rangle$ by pruning some of its subtrees. A2. Define a nondeterministic tree automaton $A_{\neg\psi}$ that accepts all the $2^P$-labeled trees that do not satisfy $\psi$. A3. $M \models_r \psi$ iff no composition $M \lhd E$ satisfies $\neg\psi$, thus iff the intersection of $A_M$ and $A_{\neg\psi}$ is empty. The reduction of the module-checking problem to the emptiness problem for tree automata implies, by the finite-model property of tree automata [Eme85], that defining reactive satisfaction with respect to only finite-state environments is equivalent to the current definition.
In the presence of incomplete information, not all possible prunings of $\langle T_M, V_M\rangle$ correspond to compositions of $M$ with some $E$. In order to correspond to such a composition, a tree should be consistent in its pruning. A tree is consistent in its pruning iff for every two nodes such that the paths leading to them differ only in the values of variables in $H$ (i.e., every two nodes that have the same history according to $E$'s incomplete information), either both nodes are pruned or both nodes are not pruned. Intuitively, hiding variables from the environment makes it easier for $M$ to reactively satisfy a requirement: out of all the prunings of $\langle T_M, V_M\rangle$ that should satisfy the requirement in the case of complete information, only those that are consistent should satisfy the requirement in the presence of incomplete information. Unfortunately, the consistency condition is non-regular, and cannot be checked by a tree automaton. In order to circumvent this difficulty, we employ alternating tree automata. We solve the module-checking problem with incomplete information as follows. B1. Define an alternating tree automaton $A_{M,\neg\psi}$ that accepts a $\{\top, \bot\}$-labeled $2^K$-tree iff it corresponds to a strategy $\langle (2^K)^*, E\rangle$ such that $M \lhd E$ is deadlock free and does not satisfy $\psi$. B2. $M \models_r \psi$ iff all deadlock-free compositions of $M$ with an $E$ that is independent of $H$ satisfy $\psi$, thus iff no strategy induces a computation tree that does not satisfy $\psi$, thus iff $A_{M,\neg\psi}$ is empty.
We now turn to a detailed description of the solution of the module-checking problem with incomplete information, and of the complexity results it entails. For that, we first formally define alternating tree automata. 5.1 Alternating Tree Automata
Alternating tree automata generalize nondeterministic tree automata and were first introduced in [MS87]. An alternating tree automaton $A = \langle \Sigma, Q, q_0, \delta, \alpha\rangle$ runs on full $\Sigma$-labeled $\Upsilon$-trees (for an agreed set $\Upsilon$ of directions). It consists of a finite set $Q$ of states, an initial state $q_0 \in Q$, a transition function $\delta$, and an acceptance condition $\alpha$ (a condition that defines a subset of $Q^\omega$). For a set $\Upsilon$ of directions, let $B^+(\Upsilon \times Q)$ be the set of positive Boolean formulas over $\Upsilon \times Q$; i.e., Boolean formulas built from elements in $\Upsilon \times Q$ using $\wedge$ and $\vee$, where we also allow the formulas true and false and, as usual, $\wedge$ has precedence over $\vee$. The transition function $\delta : Q \times \Sigma \to B^+(\Upsilon \times Q)$ maps a state and an input letter to a formula that suggests a new configuration for the automaton. For example, when $\Upsilon = \{0, 1\}$, having
$\delta(q, \sigma) = (0, q_1) \wedge (0, q_2) \vee (0, q_2) \wedge (1, q_2) \wedge (1, q_3)$ means that when the automaton is in state $q$ and reads the letter $\sigma$, it can either send two copies, in states $q_1$ and $q_2$, to direction $0$ of the tree, or send a copy in state $q_2$ to direction $0$ and two copies, in states $q_2$ and $q_3$, to direction $1$. Thus, unlike nondeterministic tree automata, here the transition function may require the automaton to send several copies to the same direction, or allow it not to send copies to all directions. A run of an alternating automaton $A$ on an input $\Sigma$-labeled $\Upsilon$-tree $\langle T, V\rangle$ is a tree $\langle T_r, r\rangle$ in which the root is labeled by $(\varepsilon, q_0)$ and every node is labeled by an element of $\Upsilon^* \times Q$. Each node of $T_r$ corresponds to a node of $T$. A node in $T_r$, labeled by $(x, q)$, describes a copy of the automaton that reads the node $x$ of $T$ and visits the state $q$. Note that many nodes of $T_r$ can correspond to the same node of $T$; in contrast, in a run of a nondeterministic automaton on $\langle T, V\rangle$ there is a one-to-one correspondence between the nodes of the run and the nodes of the tree. The labels of a node and its children have to satisfy the transition function. For example, if $\langle T, V\rangle$ is a $\{0, 1\}$-tree with $V(\varepsilon) = a$ and $\delta(q_0, a) = ((0, q_1) \vee (0, q_2)) \wedge ((0, q_3) \vee (1, q_2))$, then the nodes of $\langle T_r, r\rangle$ at level $1$ include the label $(0, q_1)$ or $(0, q_2)$, and include the label $(0, q_3)$ or $(1, q_2)$. Each infinite path $\rho$ in $\langle T_r, r\rangle$ is labeled by a word $r(\rho)$ in $Q^\omega$. Let $inf(\rho)$ denote the set of states in $Q$ that appear in $r(\rho)$ infinitely often. A run $\langle T_r, r\rangle$ is accepting iff all its infinite paths satisfy the acceptance condition. In Büchi alternating tree automata, $\alpha \subseteq Q$, and an infinite path $\rho$ satisfies $\alpha$ iff $inf(\rho) \cap \alpha \neq \emptyset$. As with nondeterministic automata, an automaton accepts a tree iff there exists an accepting run on it. We denote by $L(A)$ the language of the automaton $A$; i.e., the set of all labeled trees that $A$ accepts. We say that an automaton $A$ is nonempty iff $L(A) \neq \emptyset$.
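Satisfaction of a formula in $B^+(\Upsilon \times Q)$ by a chosen set of (direction, state) atoms, i.e. by one candidate way of distributing copies, can be sketched as follows. The tuple encoding and the names are ours, for illustration only.

```python
def sat(formula, atoms):
    """Evaluate a positive Boolean formula over direction/state pairs.

    Formulas are True, False, ('atom', d, q), ('and', f1, f2), or
    ('or', f1, f2); `atoms` is the set of pairs (d, q) chosen true,
    i.e. the copies the automaton actually sends.
    """
    if isinstance(formula, bool):
        return formula
    op = formula[0]
    if op == 'atom':
        return (formula[1], formula[2]) in atoms
    if op == 'and':
        return sat(formula[1], atoms) and sat(formula[2], atoms)
    if op == 'or':
        return sat(formula[1], atoms) or sat(formula[2], atoms)
    raise ValueError('unknown operator: %s' % op)

# delta(q, sigma) = (0,q1) & (0,q2)  |  (0,q2) & (1,q2) & (1,q3)
delta = ('or',
         ('and', ('atom', 0, 'q1'), ('atom', 0, 'q2')),
         ('and', ('atom', 0, 'q2'),
                 ('and', ('atom', 1, 'q2'), ('atom', 1, 'q3'))))
print(sat(delta, {(0, 'q1'), (0, 'q2')}))
```

Sending copies $q_1, q_2$ to direction $0$ satisfies the example transition; sending only $q_2$ to direction $0$ does not.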
We define the size $|A|$ of an alternating automaton $A = \langle \Sigma, Q, q_0, \delta, \alpha\rangle$ as $|Q| + |\Sigma| + |\delta|$, where $|Q|$ and $|\Sigma|$ are the respective cardinalities of the sets $Q$ and $\Sigma$, and where $|\delta|$ is the sum of the lengths of the satisfiable (i.e., not false) formulas that appear as $\delta(q, \sigma)$ for some $q$ and $\sigma$. 5.2 Solving the Problem of Module Checking with Incomplete Information
Theorem 5. [KV97a] Given a module $M$ and a CTL formula $\psi$ over the sets $I$, $O$, and $H$ of $M$'s variables, there exists an alternating Büchi tree automaton $A_{M,\psi}$ over $\{\top, \bot\}$-labeled $2^{I \cup O}$-trees, of size $O(|M| \cdot |\psi|)$, such that $L(A_{M,\psi})$ is exactly the set of strategies $E$ such that $M \lhd E$ is deadlock free and satisfies $\psi$. Proof (sketch): Let $M = \langle I, O, H, W, w_0, R, L\rangle$, and let $K = I \cup O$. For $w \in W$ and $\sigma \in 2^K$, we define $s(w, \sigma) = \{w' \mid \langle w, w'\rangle \in R$ and $L(w') \cap K = \sigma\}$ and $d(w) = \{\sigma \mid s(w, \sigma) \neq \emptyset\}$.
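The functions $s$ and $d$ are plain set comprehensions over the transition relation. In the sketch below, the dictionary encoding of $L$, the frozenset valuations, and the names are our own illustrative choices.

```python
def s(w, sigma, R, L, K):
    """Successors of w whose readable variables (labels restricted to
    the known variables K) equal the valuation sigma."""
    return {v for (u, v) in R if u == w and (L[v] & K) == sigma}

def d(w, R, L, K):
    """Readable valuations sigma for which s(w, sigma) is nonempty."""
    return {frozenset(L[v] & K) for (u, v) in R if u == w}
```

Note that two successors differing only on hidden variables in $H$ fall into the same set $s(w, \sigma)$, which is exactly what makes them indistinguishable to the environment.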
That is, $s(w, \sigma)$ contains all the successors of $w$ that agree in their readable variables with $\sigma$. Each such successor corresponds to a node in $\langle T_M, V_M\rangle$ with a direction in $hide_{2^H}^{-1}(\sigma)$. Accordingly, $d(w)$ contains all directions $\sigma$ for which nodes corresponding to $w$ in $\langle T_M, V_M\rangle$ have at least one successor with a direction in $hide_{2^H}^{-1}(\sigma)$. Essentially, the automaton $A_{M,\psi}$ is similar to the product alternating tree automaton obtained in the alternating-automata-theoretic framework for CTL model checking [BVW94]. There, as there is a single computation tree with respect to which the formula is checked, the automaton obtained is a 1-letter automaton. Here, as there are many computation trees to check, we get a 2-letter automaton: each $\{\top, \bot\}$-labeled tree induces a different computation tree, and $A_{M,\psi}$ considers them all. In addition, it checks that the composition of the strategy in the input with $M$ is deadlock free. We assume that $\psi$ is given in positive normal form; thus negations are applied only to atomic propositions. We define $A_{M,\psi} = \langle \{\top, \bot\}, Q, q_0, \delta, \alpha\rangle$, where
– $Q = (W \times (cl(\psi) \cup \{p_\top\}) \times \{\forall, \exists\}) \cup \{q_0\}$, where $cl(\psi)$ denotes the set of $\psi$'s subformulas. Intuitively, when the automaton is in state $\langle w, \varphi, \forall\rangle$, it accepts all strategies for which $w$ is either pruned or satisfies $\varphi$, where $\varphi = p_\top$ is satisfied iff the root of the strategy is labeled $\top$. When the automaton is in state $\langle w, \varphi, \exists\rangle$, it accepts all strategies for which $w$ is not pruned and satisfies $\varphi$. We call $\forall$ and $\exists$ the mode of the state. While the states in $W \times \{p_\top\} \times \{\forall, \exists\}$ check that the composition of $M$ with the strategy in the input is deadlock free, the states in $W \times cl(\psi) \times \{\forall, \exists\}$ check that this composition satisfies $\psi$. The initial state $q_0$ sends copies
to check both the deadlock freeness of the composition and the satisfaction of $\psi$. – The transition function $\delta : Q \times \{\top, \bot\} \to B^+(2^K \times Q)$ is defined as follows (with $m \in \{\exists, \forall\}$).
$\delta(q_0, \bot) = $ false, and $\delta(q_0, \top) = \delta(\langle w_0, p_\top, \exists\rangle, \top) \wedge \delta(\langle w_0, \psi, \exists\rangle, \top)$.
For all $w$ and $\varphi$, we have $\delta(\langle w, \varphi, \forall\rangle, \bot) = $ true and $\delta(\langle w, \varphi, \exists\rangle, \bot) = $ false.
$\delta(\langle w, p_\top, m\rangle, \top) = (\bigvee_{\sigma \in 2^K} \bigvee_{w' \in s(w,\sigma)} (\sigma, \langle w', p_\top, \exists\rangle)) \wedge (\bigwedge_{\sigma \in 2^K} \bigwedge_{w' \in s(w,\sigma)} (\sigma, \langle w', p_\top, \forall\rangle))$.
$\delta(\langle w, p, m\rangle, \top) = $ true if $p \in L(w)$, and $\delta(\langle w, p, m\rangle, \top) = $ false if $p \notin L(w)$.
$\delta(\langle w, \neg p, m\rangle, \top) = $ true if $p \notin L(w)$, and $\delta(\langle w, \neg p, m\rangle, \top) = $ false if $p \in L(w)$.
$\delta(\langle w, \varphi_1 \wedge \varphi_2, m\rangle, \top) = \delta(\langle w, \varphi_1, m\rangle, \top) \wedge \delta(\langle w, \varphi_2, m\rangle, \top)$.
$\delta(\langle w, \varphi_1 \vee \varphi_2, m\rangle, \top) = \delta(\langle w, \varphi_1, m\rangle, \top) \vee \delta(\langle w, \varphi_2, m\rangle, \top)$.
$\delta(\langle w, AX\varphi, m\rangle, \top) = \bigwedge_{\sigma \in 2^K} \bigwedge_{w' \in s(w,\sigma)} (\sigma, \langle w', \varphi, \forall\rangle)$.
$\delta(\langle w, EX\varphi, m\rangle, \top) = \bigvee_{\sigma \in 2^K} \bigvee_{w' \in s(w,\sigma)} (\sigma, \langle w', \varphi, \exists\rangle)$.
$\delta(\langle w, A\varphi_1 U \varphi_2, m\rangle, \top) = \delta(\langle w, \varphi_2, m\rangle, \top) \vee (\delta(\langle w, \varphi_1, m\rangle, \top) \wedge \bigwedge_{\sigma \in 2^K} \bigwedge_{w' \in s(w,\sigma)} (\sigma, \langle w', A\varphi_1 U \varphi_2, \forall\rangle))$.
$\delta(\langle w, E\varphi_1 U \varphi_2, m\rangle, \top) = \delta(\langle w, \varphi_2, m\rangle, \top) \vee (\delta(\langle w, \varphi_1, m\rangle, \top) \wedge \bigvee_{\sigma \in 2^K} \bigvee_{w' \in s(w,\sigma)} (\sigma, \langle w', E\varphi_1 U \varphi_2, \exists\rangle))$.
$\delta(\langle w, AG\varphi, m\rangle, \top) = \delta(\langle w, \varphi, m\rangle, \top) \wedge \bigwedge_{\sigma \in 2^K} \bigwedge_{w' \in s(w,\sigma)} (\sigma, \langle w', AG\varphi, \forall\rangle)$.
$\delta(\langle w, EG\varphi, m\rangle, \top) = \delta(\langle w, \varphi, m\rangle, \top) \wedge \bigvee_{\sigma \in 2^K} \bigvee_{w' \in s(w,\sigma)} (\sigma, \langle w', EG\varphi, \exists\rangle)$.
Consider, for example, a transition from the state $\langle w, AX\varphi, \exists\rangle$. First, if the transition to $w$
is disabled (that is, the automaton reads $\bot$), then, as the current mode is existential, the run is rejecting. If the transition to $w$ is enabled, then those successors of $w$ that are enabled should satisfy $\varphi$. The state $w$ may have several successors that agree on some labeling $\sigma \in 2^K$ and differ only in the labeling of the variables in $H$. These successors are indistinguishable by the environment, and the automaton sends them all to the same direction $\sigma$. This guarantees that either all these successors are enabled by the strategy (in case the letter to be read in direction $\sigma$ is $\top$) or all are disabled (in case the letter in direction $\sigma$ is $\bot$). In addition, since the requirement to satisfy $\varphi$ concerns only successors of $w$ that are enabled, the mode of the new states is universal. The copies of $A_{M,\psi}$ that check the composition with the strategy
to be deadlock free guarantee that at least one successor of $w$ is enabled. Note that, as the transition relation $R$ is total, the conjunctions and disjunctions in $\delta$ cannot be empty. – $\alpha = W \times G(\psi) \times \{\exists, \forall\}$, where $G(\psi)$ is the set of all formulas of the form $AG\varphi$ or $EG\varphi$ in $cl(\psi)$. Thus, while the automaton cannot get trapped in states associated with "Until" formulas (there, the eventuality of the until would not be satisfied), it may get trapped in states associated with "Always" formulas (there, the safety requirement is never violated). We now consider the size of $A_{M,\psi}$. Clearly, $|Q| = O(|W| \cdot |\psi|)$. Also, as the transition associated with a state $\langle w, \varphi, m\rangle$ depends on the successors of $w$, we have that $|\delta| = O(|R| \cdot |\psi|)$. Finally, $|\alpha| \le |Q|$, and we are done. $\Box$ Extending the alternating automata described in [BVW94] to handle incomplete information is possible thanks to the special structure of the automata, which alternate between universal and existential modes. This structure (the "hesitation condition", as it is called in [BVW94]) exists also in automata associated with CTL$^*$ formulas, and implies the following analogous theorem. Theorem 6. [KV97a] Given a module $M$ and a CTL$^*$ formula $\psi$ over the sets $I$, $O$, and $H$ of $M$'s variables, there exists an alternating Rabin tree automaton $A_{M,\psi}$ over $\{\top, \bot\}$-labeled $2^{I \cup O}$-trees, with $|W| \cdot 2^{O(|\psi|)}$ states and two pairs, such that $L(A_{M,\psi})$ is exactly the set of strategies $E$ such that $M \lhd E$ is deadlock free and satisfies $\psi$. We now consider the complexity bounds that follow from our algorithm. Theorem 7. [KV97a] The module-checking problem with incomplete information is EXPTIME-complete for CTL and is 2EXPTIME-complete for CTL$^*$. Proof (sketch): The lower bounds follow from the known bounds for module checking with complete information [KV96]. For the upper bounds, in Theorems 5 and 6 we reduced the problem of deciding whether $M \models_r \psi$ to the problem of checking the nonemptiness of the automaton $A_{M,\neg\psi}$. When $\psi$ is a CTL formula, $A_{M,\neg\psi}$ is an alternating Büchi automaton of size $O(|M| \cdot |\psi|)$.
By [VW86b, MS95], checking the nonemptiness of A_{M,¬ψ} is then exponential in the sizes of M and ψ. When ψ is a CTL* formula, the automaton A_{M,¬ψ} is an alternating Rabin automaton with |W| · 2^{O(|ψ|)} states and two pairs. Accordingly, by [EJ88, MS95], checking the nonemptiness of A_{M,¬ψ} is exponential in |W| and doubly exponential in |ψ|.
As the module-checking problem for CTL is already EXPTIME-hard for environments with complete information, it might seem as if incomplete information can be handled at no cost. This is, however, not true. By Theorem 3, the program complexity of CTL module checking with complete information is PTIME-complete. On the other hand, the time complexity of the algorithm we present here is exponential in the size of both the formula and the system. Can we do better? In Theorem 8 below, we answer this question negatively. To see why, consider a module M with hidden variables. When M interacts with an environment E, the module seen by E is different from M. Indeed, every state of the module seen by E corresponds to a set of states of M. Therefore, coping with incomplete information involves a subset construction, which blows up the state space exponentially. In our algorithm, the subset construction is hidden in the emptiness test of A_{M,¬ψ}.
Theorem 8. [KV97b] The program complexity of CTL module checking with incomplete information is EXPTIME-complete.
Proof (sketch): The upper bound follows from Theorem 7. For the lower bound, we do a reduction from the outcome problem for two-player games with incomplete information, proved to be EXPTIME-hard in [Rei84]. A two-player game with incomplete information consists of an AND-OR graph with an initial state and a set of designated states. Each of the states in the graph is labeled by readable and unreadable observations. The game is played between two players, called the OR-player and the AND-player. The two players together generate a path in the graph. The path starts at the initial state. Whenever the game is at an OR-state, the OR-player determines the next state. Whenever the game is at an AND-state, the AND-player determines the next state. The outcome problem is to determine whether the OR-player has a strategy that depends only on the readable observations (that is, a strategy that maps finite sequences of sets of readable observations to a set of known observations) such that following this strategy guarantees that, no matter how the AND-player plays, the path eventually visits one of the designated states. Given an AND-OR graph G as above, we define a module M_G such that M_G reactively satisfies a fixed CTL formula φ iff the OR-player has no strategy as above. The environments of M_G correspond to strategies for the OR-player. Each environment suggests a pruning of ⟨T_{M_G}, V_{M_G}⟩ such that the set of paths in the pruned tree corresponds to a set of paths that the OR-player can force the game into, no matter how the AND-player plays. The module M_G is very similar to G, and the formula φ requires the existence of a computation that never visits a designated state. The formal definition of M_G and φ involves some technical complications required in order to make sure that the environment disables only transitions from OR-states.
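To make the subset construction and the game-theoretic flavour of this reduction concrete, here is a small sketch: the first function builds the "knowledge" module that an environment with incomplete information actually sees (its states are sets of indistinguishable module states, grouped by readable observation), and the second solves the complete-information outcome problem for an AND-OR reachability game by the standard least-fixpoint attractor computation. All names, the graph encoding, and the observation function are illustrative assumptions, not constructions from [Rei84] or from the paper.

```python
def subset_module(trans, obs, init):
    """Knowledge-based subset construction: from each set of states the
    environment considers possible, group the successors by their
    readable observation.  Illustrative sketch only."""
    start = frozenset([init])
    seen, frontier, edges = {start}, [start], {}
    while frontier:
        cell = frontier.pop()
        buckets = {}
        for w in cell:
            for v in trans.get(w, ()):
                buckets.setdefault(obs(v), set()).add(v)
        edges[cell] = {o: frozenset(ws) for o, ws in buckets.items()}
        for nxt in edges[cell].values():
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen, edges

def or_player_wins(states, succ, is_or, targets, init):
    """Least-fixpoint 'attractor' for the complete-information variant:
    the OR-player wins from s if s is a target, or s is an OR-state with
    some winning successor, or an AND-state all of whose successors win."""
    win = set(targets)
    changed = True
    while changed:
        changed = False
        for s in states:
            if s in win:
                continue
            succs = succ.get(s, [])
            if succs and ((is_or(s) and any(v in win for v in succs)) or
                          (not is_or(s) and all(v in win for v in succs))):
                win.add(s)
                changed = True
    return init in win
```

The EXPTIME-hardness of [Rei84] arises precisely because the incomplete-information game must first be turned into a complete-information game over sets of states, which is the exponential blow-up discussed above.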
6 Discussion
The discussion of the relative merits of linear versus branching temporal logics is almost as old as the paradigms themselves [Lam80]. One of the beliefs dominating this discussion has been that "while specifying is easier in LTL, model checking is easier for CTL". Indeed, the restricted syntax of CTL limits its expressive power, and many important behaviors (e.g., strong fairness) cannot be specified in CTL. On the other hand, while model checking for CTL can be done in time O(|P| · |ψ|) [CES86], it takes time O(|P| · 2^{|ψ|}) for LTL [LP85]. Since LTL model checking is PSPACE-complete [SC85], the latter bound probably cannot be improved. The attractive computational complexity of CTL model checking has compensated for its lack of expressive power, and branching-time model-checking tools can handle systems with extremely large state spaces [BCM+ 90, McM93, CGL93]. If we examine this issue more closely, however, we find that the computational superiority of CTL over LTL is not that clear. For example, as shown in [Var95, KV95], the advantage that CTL enjoys over LTL also disappears when the complexity of modular verification is considered. The distinction between closed and open systems discussed in this paper questions the computational superiority of the branching-time paradigm further. Our conclusion is that the debate about the relative merits of the linear and branching paradigms will not be settled by technical arguments such as expressive power or computational complexity. Rather, the discussion should focus on the attractiveness of the approaches to practitioners who practice computer-aided verification in realistic settings. We believe that this discussion will end with the conclusion that both approaches have their merits, and that computer-aided verification tools should therefore combine the two approaches rather than "religiously" adhere to one or the other.
References [BBG+ 94] I. Beer, S. Ben-David, D. Geist, R. Gewirtzman, and M. Yoeli. Methodology and system for practical formal verification of reactive hardware. In Proc. 6th Conference on Computer Aided
Verification, volume 818 of Lecture Notes in Computer Science, pages 182–193, Stanford, June 1994. [BCM+ 90] J.R. Burch, E.M. Clarke, K.L. McMillan, D.L. Dill, and L.J. Hwang. Symbolic model checking: 10^20 states and beyond. In Proceedings of the 5th Symposium on Logic in Computer Science, pages 428–439, Philadelphia, June 1990. [BVW94] O. Bernholtz, M.Y. Vardi, and P. Wolper. An automata-theoretic approach to branching-time model checking. In D. L. Dill, editor, Computer Aided Verification, Proc. 6th Int. Conference, volume 818 of Lecture Notes in Computer Science, pages 142–155, Stanford, June 1994. Springer-Verlag, Berlin. [CE81] E.M. Clarke and E.A. Emerson. Design and synthesis of synchronization skeletons using branching time temporal logic. In Proc. Workshop on Logic of Programs, volume 131 of Lecture Notes in Computer Science, pages 52–71. Springer-Verlag, 1981. [CES86] E.M. Clarke, E.A. Emerson, and A.P. Sistla. Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Transactions on Programming Languages and Systems, 8(2):244–263, January 1986. [CGB86] E.M. Clarke, O. Grumberg, and M.C. Browne. Reasoning about networks with many identical finite-state processes. In Proc. 5th ACM Symposium on Principles of Distributed Computing, pages 240–248, Calgary, Alberta, August 1986. [CGH+ 95] E.M. Clarke, O. Grumberg, H. Hiraishi, S. Jha, D.E. Long, K.L. McMillan, and L.A. Ness. Verification of the Futurebus+ cache coherence protocol. Formal Methods in System Design, 6:217–232, 1995. [CGL93] E.M. Clarke, O. Grumberg, and D. Long. Verification tools for finite-state concurrent systems. In J.W. de Bakker, W.-P. de Roever, and G. Rozenberg, editors, Decade of Concurrency – Reflections and Perspectives (Proceedings of REX School), volume 803 of Lecture Notes in Computer Science, pages 124–175. Springer-Verlag, 1993. [EH86] E.A. Emerson and J.Y. Halpern. Sometimes and not never revisited: On branching versus linear time.
Journal of the ACM, 33(1):151–178, 1986. [EJ88] E.A. Emerson and C. Jutla. The complexity of tree automata and logics of programs. In Proceedings of the 29th IEEE Symposium on Foundations of Computer Science, pages 368–377, White Plains, October 1988. [EL85] E.A. Emerson and C.-L. Lei. Modalities for model checking: Branching time logic strikes back. In Proceedings of the Twelfth ACM Symposium on Principles of Programming Languages, pages 84–96, New Orleans, January 1985. [Eme85] E.A. Emerson. Automata, tableaux, and temporal logics. In Proc. Workshop on Logic of Programs, volume 193 of Lecture Notes in Computer Science, pages 79–87. Springer-Verlag, 1985. [ES84] E.A. Emerson and A. P. Sistla. Deciding branching time logic. In Proc. 16th ACM Symposium on Theory of Computing, Washington, April 1984. [FL79] M.J. Fischer and R.E. Ladner. Propositional dynamic logic of regular programs. Journal of Computer and System Sciences, 18:194–211, 1979. [FZ88] M.J. Fischer and L.D. Zuck. Reasoning about uncertainty in fault-tolerant distributed systems. In M. Joseph, editor, Proc. Symp. on Formal Techniques in Real-Time and Fault-Tolerant Systems, volume 331 of Lecture Notes in Computer Science, pages 142–158. Springer-Verlag, 1988. [GL94] O. Grumberg and D.E. Long. Model checking and modular verification. ACM Trans. on Programming Languages and Systems, 16(3):843–871, 1994. [Gol77] L.M. Goldschlager. The monotone and planar circuit value problems are log space complete for P. SIGACT News, 9(2):25–29, 1977. [Hoa85] C.A.R. Hoare. Communicating Sequential Processes. Prentice-Hall, 1985. [HP85] D. Harel and A. Pnueli. On the development of reactive systems. In K. Apt, editor, Logics and Models of Concurrent Systems, volume F-13 of NATO Advanced Summer Institutes, pages 477–498. Springer-Verlag, 1985. [KG96] O. Kupferman and O. Grumberg. Buy one, get one free!!! Journal of Logic and Computation, 6(4):523–539, 1996.
[KV95] O. Kupferman and M.Y. Vardi. On the complexity of branching modular model checking. In Proc. 6th Conference on Concurrency Theory, volume 962 of Lecture Notes in Computer Science, pages 408–422, Philadelphia, August 1995. Springer-Verlag.
[KV96] O. Kupferman and M.Y. Vardi. Module checking. In Computer Aided Verification, Proc. 8th Int. Conference, volume 1102 of Lecture Notes in Computer Science, pages 75–86. Springer-Verlag, 1996.
[KV97a] O. Kupferman and M.Y. Vardi. Module checking revisited. In Computer Aided Verification, Proc. 9th Int. Conference, volume 1254 of Lecture Notes in Computer Science, pages 36–47. Springer-Verlag, 1997.
[KV97b] O. Kupferman and M.Y. Vardi. Weak alternating automata are not that weak. In 5th Israeli Symposium on Theory of Computing and Systems, pages 147–158. IEEE Computer Society Press, 1997.
[KVW97] O. Kupferman, M.Y. Vardi, and P. Wolper. Module checking. 1997.
[Lam80] L. Lamport. Sometimes is sometimes "not never" - on the temporal logic of programs. In Proceedings of the 7th ACM Symposium on Principles of Programming Languages, pages 174–185, January 1980.
[LP85] O. Lichtenstein and A. Pnueli. Checking that finite state concurrent programs satisfy their linear specification. In Proceedings of the Twelfth ACM Symposium on Principles of Programming Languages, pages 97–107, New Orleans, January 1985.
[McM93] K.L. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993.
[Mil71] R. Milner. An algebraic definition of simulation between programs. In Proceedings of the 2nd International Joint Conference on Artificial Intelligence, pages 481–489, September 1971.
[MP92] Z. Manna and A. Pnueli. Temporal specification and verification of reactive modules. 1992.
[MS87] D.E. Muller and P.E. Schupp. Alternating automata on infinite trees. Theoretical Computer Science, 54:267–276, 1987.
[MS95] D.E. Muller and P.E. Schupp. Simulating alternating tree automata by nondeterministic automata: New results and new proofs of theorems of Rabin, McNaughton and Safra. Theoretical Computer Science, 141:69–107, 1995.
[Pnu81] A. Pnueli. The temporal semantics of concurrent programs. Theoretical Computer Science, 13:45–60, 1981.
[PR89] A. Pnueli and R. Rosner. On the synthesis of a reactive module. In Proceedings of the Sixteenth ACM Symposium on Principles of Programming Languages, Austin, January 1989.
[Pra80] V.R. Pratt. A near-optimal method for reasoning about action. Journal of Computer and System Sciences, 20(2):231–254, 1980.
[QS81] J.P. Queille and J. Sifakis. Specification and verification of concurrent systems in Cesar. In Proc. 5th International Symp. on Programming, volume 137 of Lecture Notes in Computer Science, pages 337–351. Springer-Verlag, 1981.
[Rei84] J.H. Reif. The complexity of two-player games of incomplete information. Journal of Computer and System Sciences, 29:274–301, 1984.
[SC85] A.P. Sistla and E.M. Clarke. The complexity of propositional linear temporal logic. Journal of the ACM, 32:733–749, 1985.
[Var95] M.Y. Vardi. On the complexity of modular model checking. In Proceedings of the 10th IEEE Symposium on Logic in Computer Science, June 1995.
[VS85] M.Y. Vardi and L. Stockmeyer. Improved upper and lower bounds for modal logics of programs. In Proc. 17th ACM Symp. on Theory of Computing, pages 240–251, 1985.
[VW86a] M.Y. Vardi and P. Wolper. An automata-theoretic approach to automatic program verification. In Proceedings of the First Symposium on Logic in Computer Science, pages 322–331, Cambridge, June 1986.
[VW86b] M.Y. Vardi and P. Wolper. Automata-theoretic techniques for modal logics of programs. Journal of Computer and System Science, 32(2):182–221, April 1986.
Hoare-Style Compositional Proof Systems for Reactive Shared Variable Concurrency
F.S. de Boer¹, U. Hannemann², and W.-P. de Roever²
¹ Utrecht University, Department of Computer Science, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
² Christian-Albrechts-Universität zu Kiel, Institut für Informatik und Praktische Mathematik II, Preusserstrasse 1-9, 24105 Kiel, Germany
Abstract. A new compositional logic for verifying safety properties of shared variable concurrency is presented in which, in order to characterize infinite computations, a Hoare-style I/pre/post format is used, where I expresses the communication interface, enabling the characterization of reactive programs. This logic relates to the Rely/Guarantee paradigm of Jones [11] in that Rely/Guarantee formulae can be expressed within our formalism. As a novel feature we characterize prefixes of computations through so-called time-diagrams, mappings from a discrete total well-founded ordering to states, and combine these with action predicates (already introduced in old work of, e.g., Lamport) in order to obtain a compositional formalism. The use of time-diagrams enables the expression of strongest postconditions and strongest invariants directly within the assertion language, instead of through encoding within the natural numbers. A proof of Dekker's mutual exclusion algorithm is given.
1 Introduction
This paper represents part of our research into the usefulness and the scope of possible approaches to compositional formalisms for shared variable concurrency. It serves as a foundation for the corresponding chapter in [16]. In 1965 E.W. Dijkstra introduced the parbegin statement for describing parallel composition of processes which communicate via shared variables. But it is only recently that the compositional and fully abstract semantics of shared variable concurrency has been studied in [4, 5]. On the other hand, the first complete logic for proving partial correctness properties of concurrent programs
appeared already in 1976 and was developed by S. Owicki and D. Gries in [21]. However, their proof method is not compositional, in the sense that it does not allow a derivation of a correctness specification of a parallel program in terms of local specifications of its components without reference to their internal structure. Consequently, this proof method cannot be used to support top-down program design. Moreover, the relevance of a compositional reasoning pattern with respect to the complexity of (mechanically supported) correctness proofs of concurrent systems lies in the fact that the verification (in a compositional proof system) of the local components of a system can in most practical cases be mechanized fully, or at least to a very large extent. What remains is a proof that some logical combination of the specifications of the components implies the desired specification of the entire system. This latter proof in general involves purely mathematical reasoning about the underlying data-structures and as such does not involve any reasoning about specific control structures (see also [12], where the use of 'mathematics' for specification and verification of concurrent programs is strongly advocated). This abstraction from the flow of control allows for a greater control of the complexity of correctness proofs. For the model of concurrency described by CSP, which is based on a synchronous communication mechanism, several (relatively) complete compositional proof methods have been introduced, e.g., in [6, 9, 15, 13, 18, 24, 26]. These proof methods formalize reasoning about synchronous communication in terms of a trace logic, a trace being a sequence of records of communications. For the parallel composition of processes it is important to notice that their specification should only refer to projections of the trace onto those communications that involve the process at hand.
Interpreting shared variables as CSP-like processes, these methods also inspired proof methods for shared variables [19]. The first compositional characterization of shared variable concurrency was called the Rely/Guarantee (R/G) method and was conceived by Jones [11]; for complete versions of this proof system consult [17, 22]. Validity of an R/G specification of a process states that, provided the environment satisfies the rely condition R, the process fulfills the guarantee condition G. The difference with the so-called assumption/commitment (A/C) method [13], introduced by Misra and Chandy in 1981, is that validity of an A/C specification of a process S stipulates that C holds after each communication of S provided A holds after all communications before that particular one, whereas for any given so-called reactive sequence of a process, as described in [4, 5], the assumption R in the R/G method refers to all its environmental moves, and the commitment G to all its process transitions, in every prefix of the computation at hand. The A/C and R/G methods have in common that soundness of the network rules in both systems can be proved by an inductive argument on the length of their computation sequences (traces and reactive sequences, respectively). The R/G method can be regarded as a compositional reformulation of the Owicki/Gries method, as argued in [23] on the basis of a comparison of their respective completeness proofs, since both are based on the introduction of a special kind of auxiliary variables, namely the so-called history variables which record the sequence of state-changes, and both
proofs use the same style of strongest postcondition assertions. In [2] a compositional proof method is presented which formalizes reasoning about shared variable concurrency directly in terms of histories, i.e., they need not be introduced through the addition of extra auxiliary variables. In other words, histories form an integral part of our programming logic, similarly as in the compositional proof method of [26] for CSP. In order to be able to describe parallel composition logically as conjunction we represent computation histories by time-diagrams as follows: given a discrete total well-founded ordering which represents time abstractly, a program variable is naturally viewed as a function from this abstract notion of time to the domain of values, a so-called time-diagram. Interpreting time-instances as possible interleaving points and introducing boolean variables (so-called action variables) which indicate for each time-instance whether the given process is active or not (these action variables are also used for the same purpose in [3]), we can describe logically the compositional semantics of [4, 5] in terms of time-diagrams. Thus we show in that paper that compositional pre/post style reasoning about shared variable concurrency, apart from the given underlying data structures, involves reasoning about a discrete total well-founded ordering, the first-order logic of which is decidable. Since the proof method described above only reasons about the input/output behaviour of concurrent systems, its applicability to reactive systems is limited. In this paper we extend that method in order to reason about non-terminating computations. The specification style here incorporates, in addition to the pre- and postconditions, an invariant which is interpreted with respect to all computations of a process, including infinite ones.
These generalized correctness formulae are suited to reactive processes, as the invariant can also be seen as an interface specification towards the environment that guarantees a certain behaviour. We demonstrate that R/G style proofs can be embedded in our proof method. On the other hand, there is still a difference between R/G and our time-diagram based method that is not yet understood: until now nobody has succeeded in extending the R/G method to real time, whereas for our approach this extension is natural. The plan of the paper is as follows: in the next section we describe a programming language for shared variable concurrency. In Section 3 we introduce the assertion language and correctness specifications and describe their semantics. The proof system is presented in Section 4. An example of a correctness proof of a mutual exclusion algorithm is presented in Section 5. Section 6 discusses an embedding of the Rely/Guarantee formalism.
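The Rely/Guarantee reading of reactive sequences sketched above can be made concrete as follows: a finite reactive sequence is a list of process transitions (s, s′), and the "gaps" between consecutive pairs are the environment's moves. The encoding and the function name below are illustrative assumptions, not the formal definition of validity used later in the paper.

```python
def rg_holds(seq, rely, guar, init=None):
    """Per-prefix Rely/Guarantee check on one finite reactive sequence:
    each process step must satisfy guar as long as every earlier
    environment move (the gap between one step's output state and the
    next step's input state) satisfied rely.  Illustrative sketch."""
    prev_out = init
    for s_in, s_out in seq:
        if prev_out is not None and not rely(prev_out, s_in):
            return True          # rely already violated: vacuously valid
        if not guar(s_in, s_out):
            return False         # a process transition broke the guarantee
        prev_out = s_out
    return True
```

For instance, with rely "the environment never decreases the state" and guarantee "each process step increments it by one", the sequence (0,1)(1,2) with connecting gap (1,1) is valid, a single step (0,2) is not, and a sequence whose gap decreases the state is vacuously valid.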
2 Programming Language
In this section we present a programming language for shared variable concurrency. Let Pvar be the set of program variables, with typical elements x, y, … . For ease of presentation we restrict ourselves to the domain of values consisting of the integers
and booleans only.
Definition 1. In the grammar of the programming language below, boolean expressions are denoted by b, whereas e denotes either an arithmetic or a boolean expression (we abstract from the syntactical structure of arithmetic and boolean expressions).

S ::= b:x := e | S₁ ; S₂ | [ []ⁿᵢ₌₁ bᵢ → Sᵢ ] | *[ []ⁿᵢ₌₁ bᵢ → Sᵢ ] | S₁ ∥ S₂
The execution of the guarded assignment b:x := e corresponds to the execution of an await-statement of the form await b → x := e: the execution of b:x := e is suspended in case b evaluates to false. In case b evaluates to true, control proceeds immediately with the execution of the assignment x := e, which is executed atomically. Thus the evaluation of the guard and the execution of the assignment cannot be interleaved. Sequential composition is denoted as usual by the semicolon. Execution of the choice construct [ []ⁿᵢ₌₁ bᵢ → Sᵢ ] consists of the execution of an Sᵢ for which the corresponding guard bᵢ evaluates to true. The control point between the evaluation of bᵢ and the subsequent execution of Sᵢ constitutes an interleaving point. The evaluation of a boolean guard itself is atomic. In case none of the boolean guards evaluates to true, the execution of the choice construct suspends. The execution of the iterative construct *[ []ⁿᵢ₌₁ bᵢ → Sᵢ ] consists of the repeated execution of [ []ⁿᵢ₌₁ bᵢ → Sᵢ ] until all the boolean guards are false. Parallel composition of the statements S₁ and S₂ is denoted by S₁ ∥ S₂. Its execution consists of an interleaving of the atomic actions, that is, the guarded assignments and the boolean guards of S₁ and S₂. A program S in which all variables are local, i.e., no parallel environment refers to them, is called closed and will be denoted by [S].
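The interleaving semantics just described can be illustrated with a toy interpreter: each process is a list of atomic guarded assignments, and an explicit schedule picks which process attempts its next atom; a false guard makes the process wait, mirroring the suspension of b:x := e. This is a hypothetical sketch for intuition, not the paper's formal semantics.

```python
def run_interleaved(procs, state, schedule):
    """Each process is a list of (guard, update) pairs executed in order;
    'schedule' is a list of process indices picking who moves next.  The
    guard test and the assignment together form one atomic step; a
    process whose next guard is false simply waits and may be retried
    later.  Illustrative sketch only."""
    pcs = [0] * len(procs)                   # one program counter per process
    for i in schedule:
        if pcs[i] >= len(procs[i]):
            continue                         # process i already terminated
        guard, update = procs[i][pcs[i]]
        if guard(state):                     # guard + assignment: one atom
            state = update(state)
            pcs[i] += 1                      # otherwise: suspend, try later
    return state, pcs
```

Scheduling the second process first shows the suspension: its guard x = 1 fails until the first process has run.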
3 The Mathematics of Shared-Variable Concurrency
In this section we discuss the mathematical structures and corresponding logics needed to describe and reason about shared-variable concurrency in a compositional manner. Hence we must be able to distinguish between state changes performed by the process at hand and state changes caused by an environment [1]. In [4, 5] a compositional semantics for shared variable concurrency is introduced based on so-called reactive sequences. A reactive sequence is a sequence of pairs of states: ⟨σ₁, σ₁′⟩, ⟨σ₂, σ₂′⟩, … . A pair of states ⟨σ, σ′⟩ represents a computation step of the process which transforms the input state σ into σ′. A 'gap' ⟨σ₁′, σ₂⟩ between two consecutive computation steps ⟨σ₁, σ₁′⟩ and ⟨σ₂, σ₂′⟩ represents the state-changes introduced by the environment. Parallel composition in this model is then described by interleaving of reactive sequences. In a full, closed, system
gaps have disappeared; then one only considers sequences which are connected, i.e., for which σ₁′ = σ₂. In order to be able to describe parallel composition logically as conjunction we introduce a representation of reactive sequences as time-diagrams: given a discrete total well-founded ordering which represents time, a program variable is naturally viewed as a function from time to the domain of values, a so-called time-diagram. Interpreting time-instances as possible interleaving points and introducing boolean variables which change in time to indicate whether the given process is active or not, we can describe logically the compositional semantics of [4, 5] in terms of time-diagrams. Thus compositional reasoning about shared variable concurrency, apart from the underlying data structures, involves reasoning about a discrete total well-founded ordering, which provides a very simple time-structure that is sufficient for this purpose. In the context of mechanically supported program verification it is of interest to note that the first-order logic of discrete total well-founded orderings is decidable. Moreover, it should be observed that we have only a qualitative notion of time, introduced in order to model the interleaving of parallel processes; as such it should be distinguished from the notion of real time as studied in, e.g., [10]. Formally, we define the (typed) assertion language for describing and reasoning about time-diagrams as follows. We assume given the standard type of the integers, denoted by int, and the type of booleans, denoted by bool. Furthermore we assume given the type of points in time, denoted by time. As introduced in the previous section, the set of program variables is given by Pvar. For each x ∈ Pvar we have that x is either an integer or a boolean variable. We distinguish a set Avar ⊆ Pvar of boolean variables.
Variables of Avar, with typical element a, will also be called action variables, since they will be used to indicate whether a given process is active or not. We assume that action variables do not occur in statements. The set of logical variables is denoted by Lvar (which is supposed to be disjoint from Pvar). A logical variable can be of any of the above given types int, bool, or time. In the sequel we will use the symbols t, … both for denoting time variables and time-instances (i.e., the elements of a given time-domain T).
Definition 2. We present the following main cases of a logical expression l:

l ::= time | z | x(l) | t₁ ≤ t₂ | x₁(l) = x₂(l) | …

with z ∈ Lvar, x, x₁, x₂ ∈ Pvar, and t₁, t₂ of type time. In the above definition, time is a constant of type time which is intended to denote the current time instant. The intended meaning of a logical expression x(l), where it is implicitly assumed that l is of type time, is the value of the program variable x at the time-instant denoted by l. The precedence relation on time is denoted by ≤. More complex logical expressions can be constructed using the standard vocabulary of the integers and booleans.
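To illustrate the intended meaning of these expressions, the following sketch evaluates them over a finite time-diagram represented as a list of states (dictionaries indexed by the time-instances 0, 1, …, max). The tuple encoding of expressions is an assumption made purely for illustration:

```python
def eval_expr(expr, env, diagram):
    """Evaluate the main cases of a logical expression over a finite
    time-diagram: 'time' denotes the last instant of the diagram, x(l)
    the value of program variable x at instant l, and t1 <= t2 the
    precedence of time-instances.  Illustrative encoding only."""
    op = expr[0]
    if op == 'time':
        return len(diagram) - 1              # max(d), the current instant
    if op == 'var':                          # a logical variable z
        return env[expr[1]]
    if op == 'at':                           # x(l): x at instant l
        _, x, l = expr
        return diagram[eval_expr(l, env, diagram)][x]
    if op == 'leq':                          # t1 <= t2
        return eval_expr(expr[1], env, diagram) <= eval_expr(expr[2], env, diagram)
    if op == 'eq':                           # x1(l) = x2(l) and the like
        return eval_expr(expr[1], env, diagram) == eval_expr(expr[2], env, diagram)
    raise ValueError(op)
```

For example, over the diagram [{x:0}, {x:1}, {x:1}], the expression x(time) evaluates to 1, the value of x at the last instant.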
Definition 3. Next we define the syntax of an assertion p.
p ::= l | ¬p | p ∧ q | ∃z. p | ∃a. p

where l is of type bool, z ∈ Lvar, and a ∈ Avar. Assertions are constructed from boolean logical expressions by means of the logical operations of negation, conjunction, and (existential) quantification over logical variables and action variables. Note that we do not allow quantification over variables of Pvar ∖ Avar, that is, the variables which may occur in statements. In order to describe formally the semantics of the assertion language we need the following definitions.

Definition 4. Let Val denote the set of all possible values. The set of states Σ, with typical element σ, is given by Pvar → Val (assuming that a state maps integer variables to integers and boolean variables to booleans).

Definition 5. Given a discrete well-founded total ordering (T, ≤), a time-domain T for short, a time-diagram d is an element of D = T →fd Σ, where T →fd Σ denotes the set of partial functions from T to Σ whose domain is nonempty and downward-closed, i.e., if d(t) is defined and t′ ≤ t then d(t′) is also defined.

While a state assigns values to the program variables, as usual, a time-diagram describes the state-changes in time. The domain of a diagram d is denoted by dom(d). If d is finite, we denote the last time instant of d by max(d). Although computations can be both finite and infinite, we only need to check the finite computations when verifying partial correctness and safety properties: for any program, if there is an infinite computation which is invalid, there is also an invalid finite computation. Therefore, if considering all the finite computations leads to validity, then considering all the infinite computations does so too. Thus the definition of max(d) is unambiguous, as we restrict ourselves to the evaluation of assertions w.r.t. finite time-diagrams. Semantically, assertions are evaluated with respect to a (time) diagram d ∈ D = T →fd Σ and a logical environment e ∈ Lvar → Val. Formally we have the following truth-definition.

Definition 6.
Let σ =v σ′, where v ⊆ Pvar, if for all x ∈ v we have that σ(x) = σ′(x). This notion is lifted to diagrams as follows: d =v d′ if dom(d) = dom(d′) and, in case both d(t) and d′(t) are defined, d(t) =v d′(t), for every t. The value of a logical expression l in a logical environment e and a diagram d, denoted by ⟦l⟧(e)(d), is defined by a straightforward induction on l; for example, ⟦time⟧(e)(d) = max(d) and ⟦x(l)⟧(e)(d) = d(⟦l⟧(e)(d))(x). The truth-value of an assertion p in a logical environment e and a diagram d, denoted by ⟦p⟧(e)(d) (or sometimes also by e, d ⊨ p), is defined by induction on p. We give the following cases:
- For z ∈ Lvar of type int or bool, we define ⟦∃z. p⟧(e)(d) if there exists a value v of the corresponding type such that ⟦p⟧(e{v/z})(d).
- For z ∈ Lvar of type time, we define ⟦∃z. p⟧(e)(d) if there exists a t ∈ dom(d) such that ⟦p⟧(e{t/z})(d).
- For a ∈ Avar, we define ⟦∃a. p⟧(e)(d) if there exists a d′ such that d =v d′, for v = Pvar ∖ {a}, and ⟦p⟧(e)(d′). Observe that this in fact introduces quantification over sequences of time-instances, i.e., a second-order feature.

Note that quantification over time is thus restricted to the domain of the given diagram.

Definition 7. A logical environment e and a time-diagram d are defined to be consistent if e maps every time variable to an element of the domain of d. An assertion p is valid if for any discrete well-founded total ordering (T, ≤), we have ⟦p⟧(e)(d) for any consistent e and d.

For notational convenience we introduce the next-time operator ○l, where l is an expression of type time, and the strict precedence relation <. Note that the next-time operator (like all the other standard temporal operators) and the strict precedence relation can be expressed.³ In order to describe logically progress in time we introduce the following substitution operation.

Definition 8. Given an assertion p and a time variable t, the assertion p[t/time] denotes the result of (the usual) replacement of (occurrences of) time in p by t and, additionally, the replacement of every subformula ∃t′. q (∀t′. q) by the bounded quantification ∃t′(t′ ≤ t ∧ q) (∀t′(t′ ≤ t → q)). (Formulas of the form ∃t′(t′ ≤ t ∧ q) and ∀t′(t′ ≤ t → q) will also be denoted by ∃t′ ≤ t. q and ∀t′ ≤ t. q, respectively.)

For example, given an assertion p, the passing of one time-unit can be described by ∃t(p[t/time] ∧ time = ○t). Observe that, due to the introduction of bounded quantification, the assertion p[t/time] thus refers to the time interval determined by t, which by the substitution operation is initialized to the 'old' value of time.
This is formalized by the following substitution lemma.
Lemma 3.1
Let d↾t, with t a time-instance, denote the time-diagram d restricted to the set of time-instances preceding (and including) t. For any consistent logical environment e and time-diagram d, and assertion p, we have

e, d ⊨ p[t/time]  iff  e, d↾e(t) ⊨ p.

In [16] a compositional interleaving semantics of statements is presented which, given a time-domain (T, ≤), assigns to every statement S a meaning function M[[S]] mapping every time diagram to a set of time diagrams
³ E.g., ⊕l = l̃ with l̃ > l satisfying ∀l′: l′ > l → l̃ ≤ l′.
(the semantics M is a straightforward reformulation of the semantics introduced in [4]). The intuition is that d′ ∈ M[[S]](d) if d′ is an extension of d which consists of an interleaved terminating execution of S. The semantics M uses a fixed action variable a to indicate the state-changes induced by the process itself. We can then define the truth of a partial correctness specification, denoted by ⊨ {p} S {q}, formally in terms of the semantics M: ⊨ {p} S {q} if whenever [[p]](e)(d) evaluates to true and d′ ∈ M[[S]](d), then [[q]](e)(d′) evaluates to true as well. A proof system for partial correctness w.r.t. this semantics is presented in [2].

While this style of semantics describes the initial/final state transformation, we aim at an appropriate characterization of reactive processes, i.e., possibly non-terminating programs, which require observation of all intermediate states. Therefore we define a semantics M* which contains all time diagrams that are a prefix of a (possibly infinite) computation. For example, the semantics of an assignment b: x := e contains all those time-diagrams which consist of a waiting period possibly followed by the actual assignment.

Definition 9. Generalized correctness specifications (invariant specifications) are of the form I : {p} S {q}. We have ⊨ I : {p} S {q} if, for all time diagrams d, whenever p holds in d and d′ ∈ M*[[S]](d), then (i) I holds in d′, and (ii) if d′ ∈ M[[S]](d), then q holds in d′.

Note that the condition d′ ∈ M[[S]](d) states that d′ is an extension of d which describes a terminating computation of S. We could have avoided this reference to our M semantics by introducing a fin predicate as termination flag of a process. The invariant I has to hold both in the initial state and in the final state of a terminated computation, since d is trivially a prefix of d, and if d′ ∈ M[[S]](d), then d′ ∈ M*[[S]](d) holds as well.
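The evaluation clauses of Section 3 can be mirrored directly in code. The following Python sketch assumes a discrete time domain of natural numbers and represents a diagram as a dictionary from instants to states (states map variable names to values); the representation and helper names are our own illustration, not the paper's.

```python
# A time diagram over the discrete time domain (N, <=): a map from an
# initial segment of time-instances to states.  All names are ours.
def ev_time(e, d):
    """[[time]](e)(d) = max(d): the maximal instant of the diagram."""
    return max(d)

def ev_var(x, l, e, d):
    """[[x(l)]](e)(d) = d([[l]](e)(d))(x): the value of x at instant l."""
    return d[l(e, d)][x]

def restrict(d, t):
    """d restricted to the instants up to and including t."""
    return {u: s for u, s in d.items() if u <= t}

d = {0: {'x': 0}, 1: {'x': 0}, 2: {'x': 5}}     # x becomes 5 at time 2
e = {'t': 1}                                     # a logical environment
assert ev_time(e, d) == 2
assert ev_var('x', lambda e, d: e['t'], e, d) == 0   # x(t), with e(t) = 1
assert restrict(d, 1) == {0: {'x': 0}, 1: {'x': 0}}
```

The restriction function corresponds to the diagram d↾t used in the substitution lemma.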
4 The Proof System

In this section we present a proof system for deriving partial correctness specifications as introduced in the previous section. Within the proof system we frequently use the predicate wait_b(t, t′), denoting that a process waits between t and t′ to pass the guard b without succeeding. "Failed" tests of the guard result in stuttering steps, i.e., all variables (denoted by x) keep their values.

wait_b(t, t′) ≝ ∀t̃: t ≤ t̃ < t′: ((a(t̃) → ¬b(t̃)) ∧ (a(t̃) → x(⊕t̃) = x(t̃)))

The actual assignment at the moment t is characterized by
exec(b: x := e, y)(t) ≝ a(t) ∧ b(t) ∧ time = ⊕t ∧ y(time) = y(t) ∧ x(time) = e(t)

Here y(time) = y(t) (where y is a sequence of variables different from x) denotes the conjunction ⋀_i y_i(time) = y_i(t) (y = y_1, …, y_n). A guarded assignment b: x := e is characterized w.r.t. a precondition p by the following axiom. Note that we abbreviate x := x by skip.

Assignment axiom: Let y be a sequence of variables different from x.

I : {p} b: x := e {q}

where

q ≝ ∃t. ((p ∧ I)[t/time] ∧ ∃t′. wait_b(t, t′) ∧ exec(b: x := e, y)(t′))

and

I ≝ ∃t. p[t/time] ∧ (∃t′. wait_b(t, t′) ∧ (t′ = time ∨ exec(b: x := e, y)(t′)))

Due to the substitution of time by t in p, the (quantified) time variable t in the postcondition refers to the value of time before the execution of b: x := e. The idling period, which represents possible interleavings of parallel processes (and stuttering steps), is given by the values of t and t′. At time t′ the condition b evaluates to true, execution of b: x := e takes place, and it takes one time-unit. While q coincides with the well-known characterization of the strongest postcondition of a standard assignment statement x := e in sequential programming ([8]), I can be interpreted as a strongest invariant w.r.t. p and b: x := e, as used in completeness proofs for distributed message passing systems, e.g., in [25, 14]. The rules for the other statements are as usual.
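To make the wait and exec predicates concrete, here is a minimal Python sketch over a dictionary representation of diagrams, assuming discrete integer time so that the next instant of t is t + 1. The representation, function names, and the idea of passing the guard as a function are our own simplifications.

```python
# Evaluating wait_b(t, t') and a simplified exec on a concrete diagram.
# Each instant maps variables (and the action variable 'a') to values.
def wait(d, b, xs, t, t1):
    """wait_b(t, t'): at every process step (a true) strictly before t',
    the guard b fails and all variables in xs are left unchanged."""
    return all((not d[u]['a']) or
               (not b(d[u]) and all(d[u + 1][x] == d[u][x] for x in xs))
               for u in range(t, t1))

def exec_assign(d, b, x, e, t1):
    """exec(b: x := e)(t'): at t' the guard holds, the process acts, and
    at the next instant x holds the value of e evaluated at t'."""
    return d[t1]['a'] and b(d[t1]) and d[t1 + 1][x] == e(d[t1])

# Two environment instants, then the process executes x := x + 1 at 2.
d = {0: {'a': False, 'x': 1}, 1: {'a': False, 'x': 1},
     2: {'a': True,  'x': 1}, 3: {'a': True,  'x': 2}}
b = lambda s: True                              # the guard 'true'
assert wait(d, b, ['x'], 0, 2)                  # pure waiting before 2
assert exec_assign(d, b, 'x', lambda s: s['x'] + 1, 2)
```

This mirrors the shape of the assignment axiom's postcondition: a waiting interval [t, t′) followed by the one-time-unit execution at t′.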
Sequential rule:
I : {p} S1 {r},   I : {r} S2 {q}
I : {p} S1; S2 {q}
Choice rule:

I : {p} b_i: skip; S_i {q},   i = 1, …, n
I : {p} []^n_{i=1} b_i → S_i {q}
Iteration rule:
I : {p} []^n_{i=1} b_i → S_i {p},   I : {p} (⋀_i ¬b_i): skip {q}
I : {p} *[]^n_{i=1} b_i → S_i {q}

Parallel composition is described as follows.
Parallel-rule:
I1 : {p1} S1 {q1},   I2 : {p2} S2 {q2}
I : {p1 ∧ p2 ∧ time = t0} S1 ∥ S2 {∃a1, a2, t1, t2 (q1′ ∧ q2′ ∧ fin ∧ act)}

where q_i′ denotes the formula q_i[a_i, t_i / a, time], for i = 1, 2; fin denotes the conjunction of the formulas time = max(t1, t2) and ∀t: t_i ≤ t ≤ time → ¬a_i(t), i = 1, 2; and act denotes the formula ∀t: t0 ≤ t ≤ time → (¬(a1(t) ∧ a2(t)) ∧ (a(t) ↔ (a1(t) ∨ a2(t)))). Furthermore, I denotes the formula ∃a1, a2. I1[a1/a] ∧ I2[a2/a] ∧ act.

The quantified action variables a1 and a2 in the postcondition of the conclusion of the above rule are introduced to distinguish the computation steps of S1 and S2, respectively. The execution times of S1 and S2 are given by the time variables t1 and t2, respectively. The initial time of the execution of the parallel composition of S1 and S2 is given by the time variable t0, which is initialized to the value of time in the precondition. The assertion ∀t: t0 ≤ t ≤ time: ¬(a1(t) ∧ a2(t)) ∧ (a(t) ↔ (a1(t) ∨ a2(t))) expresses that the execution of S1 ∥ S2 consists of an interleaving of S1 and S2.

For actual reasoning about correctness formulae, the following adaptation rules are used. An invariance axiom expresses that read-only variables are not changed:

Invariance axiom: Let x be a read-only variable of S.

∀t: t0 ≤ t ≤ time: a(t) → x(t) = x(⊕t) : {t0 = time} S {true}

We have a similar rule to express that local variables (in the sense that any environment has at most read access to them) are not changed outside of a process:

Locality axiom: Let x be a local variable of S.

∀t: t0 ≤ t ≤ time: ¬a(t) → x(t) = x(⊕t) : {t0 = time} S {true}
Consequence rule:
I : {p} S {q},   I → I′,   p′ → p,   q → q′
I′ : {p′} S {q′}
Invariance introduction:
I : {p} S {q}
I : {p} S {q ∧ I}

Reasoning about a statement under the assumption that it cannot be interleaved, i.e., about a closed system, can be axiomatized simply by the following rule.
Non-Interleaving rule:
I : {p} S {q}
I ∧ ∀t. a(t) : {p} [S] {q ∧ ∀t. a(t)}

The additional information in the conclusion of the above rule expresses that S is active at all times. Moreover, we have the elimination rule and the conjunction rule.
In [16] we prove soundness (every derivable correctness specification is valid) and completeness (every valid correctness specification is derivable) of the proof system w.r.t. the compositional semantics M*. The completeness proof follows the lines of the general pattern introduced by Cook in [7]. It is based on the expressibility in the assertion language of the strongest postcondition and of the strongest invariant.
Definition 10. For a given statement S and precondition p, the strongest postcondition, denoted by SP(p, S), is defined by

{d | there exists d0 s.t. d0 ⊨ p and d ∈ M[[S]](d0)},

and the strongest invariant, denoted by SInv(p, S), is defined by

{d | there exists d0 s.t. d0 ⊨ p and d ∈ M*[[S]](d0)}

(we assume that p does not contain free logical variables, so reference to a logical environment is omitted). It is worthwhile to remark that we can express both the strongest postcondition and the strongest invariant in the assertion language directly; that is, we do not need the usual coding techniques (see [20]). This constitutes the main advantage of our axiomatization based on time diagrams.
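On a finite toy semantics, the two definitions are literal set comprehensions. The following Python sketch uses an invented one-step "semantics" purely for illustration; all names and the tuple representation of diagrams are our own.

```python
# SP and SInv as direct set comprehensions.  M plays the role of the
# terminating semantics, M_star of the prefix-closed semantics M*.
def sp(p, M, diagrams):
    """SP(p, S) = { d | exists d0 with d0 |= p and d in M[[S]](d0) }."""
    return {d for d0 in diagrams if p(d0) for d in M(d0)}

def sinv(p, M_star, diagrams):
    """SInv(p, S): the same shape, over the prefix semantics M*."""
    return {d for d0 in diagrams if p(d0) for d in M_star(d0)}

# Toy 'semantics': from a diagram (a tuple), a step appends 'done'.
M      = lambda d: {d + ('done',)}           # terminating extensions
M_star = lambda d: {d, d + ('done',)}        # all prefixes thereof
ds = {(0,), (1,)}
assert sp(lambda d: d[0] == 1, M, ds) == {(1, 'done')}
assert sinv(lambda d: d[0] == 1, M_star, ds) == {(1,), (1, 'done')}
```

The point of the remark above is that, in the paper's assertion language, these sets are definable directly, without Gödel-style coding.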
5 Example: Proving a Mutual Exclusion Property

This proof style is suited to proving safety properties, e.g., mutual exclusion of concurrent processes. As an example of this type of algorithm we prove the mutual exclusion property for Dekker's well-known algorithm. The algorithm consists of two symmetrical processes P1 and P2 that use boolean variables req_i to establish that P_i requests access to its critical section and a variable turn to record which process may be in its critical section.

P_i : [ true → ⟨noncritical_i⟩;
        req_i := true;
        [ req_j → [ turn = j → req_i := false;
                    [ turn = j → skip ];
                    req_i := true ] ];
        cflag_i := true; ⟨critical_i⟩; cflag_i := false;
        req_i := false; turn := j ]
for i, j ∈ {1, 2}, i ≠ j. We introduce local booleans cflag_i to indicate when P_i is in its critical section. These processes do not terminate, so the postcondition of their parallel composition will turn out to be false. The fact that the processes never reach their critical regions simultaneously is expressed by the invariant ¬(cflag1 ∧ cflag2). When we start executing the program we assume that no process is in its critical region and that none of them has requested access yet. Furthermore, we regard the program in isolation, i.e., we assume that no other process exists which changes any variable occurring in our program. The correctness formula to be proved is:

¬(cflag1 ∧ cflag2) : {¬req1 ∧ ¬req2 ∧ ¬cflag1 ∧ ¬cflag2} [P1 ∥ P2] {false}.
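Although the paper's point is a deductive proof, the safety property itself can be checked on a finite-state rendering of the algorithm by exhaustively exploring all interleavings. The encoding below is our own simplification: program counters number the statements, each transition is one atomic step, and a failed turn = j test returns to the outer req_j guard; none of this notation comes from the paper.

```python
from collections import deque

def step(state, i):
    """One atomic step of process i (0 or 1).  A state is
    (pc, req, cflag, turn); pc, req, cflag are pairs indexed by process."""
    pc, req, cflag, turn = state
    pc, req, cflag = list(pc), list(req), list(cflag)
    j = 1 - i
    p = pc[i]
    if p == 0:   pc[i] = 1                       # <noncritical_i>
    elif p == 1: req[i] = True; pc[i] = 2        # req_i := true
    elif p == 2: pc[i] = 3 if req[j] else 6      # inner guard req_j
    elif p == 3:                                 # guard turn = j
        if turn == j: req[i] = False; pc[i] = 4
        else: pc[i] = 2
    elif p == 4: pc[i] = 4 if turn == j else 5   # busy-wait [turn = j -> skip]
    elif p == 5: req[i] = True; pc[i] = 2        # re-raise request
    elif p == 6: cflag[i] = True; pc[i] = 7      # cflag_i := true
    elif p == 7: pc[i] = 8                       # <critical_i>
    elif p == 8: cflag[i] = False; pc[i] = 9     # cflag_i := false
    elif p == 9: req[i] = False; pc[i] = 10      # req_i := false
    else: turn = j; pc[i] = 0                    # turn := j, repeat
    return (tuple(pc), tuple(req), tuple(cflag), turn)

init = ((0, 0), (False, False), (False, False), 0)
seen, todo = {init}, deque([init])
while todo:                                      # explore all interleavings
    s = todo.popleft()
    assert not (s[2][0] and s[2][1]), "mutual exclusion violated"
    for i in (0, 1):
        t = step(s, i)
        if t not in seen:
            seen.add(t)
            todo.append(t)
```

The search visits every reachable global state and the assertion never fires, mirroring the invariant ¬(cflag1 ∧ cflag2) that the rest of this section establishes deductively.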
5.1 Local Proof

We first consider the processes in isolation, constructing a specification from the proof system, and in a second step examine their parallel composition. We assume that the statements critical_i and noncritical_i do not refer to the variables req1, req2, turn, cflag1 and cflag2. Since the processes are symmetrical, we can restrict our local proof w.l.o.g. to P1. P1 essentially is an infinitely often executed loop construct. The following formulae are used frequently in this proof; we define the following abbreviations:

rely1 ≝ ∀t ≤ time: ¬a(t) → (req1(t) = req1(⊕t) ∧ cflag1(t) = cflag1(⊕t))

rely1 formulates the assumption that req1 and cflag1 are local variables of P1.

guar1 ≝ cflag1(time) → ((∃t, t′: ¬cflag1(t) ∧ req1(t) ∧ ¬req2(t) ∧ wait(t, t′) ∧ a(t′) ∧ cflag1(⊕t′)) ∧ ∀t̃: t′ ≤ t̃ ≤ time: a(t̃) → (req1(t̃) = req1(⊕t̃) ∧ cflag1(t̃) = cflag1(⊕t̃)))

As we will see, the guar1 predicate excludes all possible computations that could violate the mutual exclusion property.

inv1 ≝ ∀t ≤ time: a(t) → req2(t) = req2(⊕t)

Since req2 is a read-only variable of P1, we can guarantee that it is not changed by P1. Define the invariant I1 ≝ (rely1 → guar1) ∧ inv1, claiming that as long as the environment does not change req1, P1 guarantees that the critical section is entered only if ¬cflag1 ∧ req1 ∧ ¬req2 held before. Furthermore, inv1 states that P1 does not change the value of the read-only variable req2. We want to prove that

I1 : {¬req1 ∧ ¬cflag1} P1 {false}

Applying the iteration rule, the "desired" postcondition of the loop, false, is obtained trivially since the outer loop's guard is identically true.
We have to check each assignment statement with respect to this invariant. First of all,

I1 : {(rely1 → (¬cflag1(time) ∧ ¬req1(time))) ∧ I1} noncritical1 {rely1 → (¬cflag1(time) ∧ ¬req1(time))}

is established mainly using the invariance axiom, since neither cflag1 nor req1 occurs in noncritical1, and the consequence rule. Similarly, we can prove

I1 : {(rely1 → (¬cflag1(time) ∧ ¬req1(time))) ∧ I1} req1 := true {rely1 → (¬cflag1(time) ∧ req1(time))}

with the assignment axiom and the consequence rule. We omit here the details of the derivation of the correctness formula for the inner loop, abbreviated by inloop1:

inloop1 ≝ [ req2 → [ turn = 2 → req1 := false; [ turn = 2 → skip ]; req1 := true ] ],

which satisfies the correctness formula

I1 : {(rely1 → (req1(time) ∧ ¬cflag1(time))) ∧ I1} inloop1 {∃t, t′: (((∀t̃ ≤ t: rely1[t̃/time]) → req1(t) ∧ ¬cflag1(t)) ∧ wait(t, t′) ∧ a(t′) ∧ ¬req2(t′) ∧ time = ⊕t′ ∧ req1(time) = req1(t′) ∧ cflag1(time) = cflag1(t′))}

using rely1 → (req1(time) ∧ ¬cflag1(time)) as loop invariant. The postcondition above implies rely1 → (¬cflag1(time) ∧ req1(time) ∧ ¬req2(time)). The remaining assignments and critical1 are treated similarly to the assignment and noncritical1 above; we obtain formulae for these with the common invariant I1 and pre- and postconditions such that the postcondition of a statement can be adapted to the precondition of the following statement, e.g., by the invariance introduction rule. Then we apply the sequential composition rule and derive the following formula for the outer loop body P1′:

I1 : {rely1 → (¬cflag1(time) ∧ ¬req1(time))} P1′ {rely1 → (¬cflag1(time) ∧ ¬req1(time))}

Hence another application of the consequence rule and the iteration rule, as indicated before, leads to the desired correctness formula for P1.
5.2 Parallel Composition

Given the two local proofs of P1 and P2, we clearly see that the intended precondition is the conjunction of the local preconditions, and that the postcondition is false. Our main focus here is the mutual exclusion property of the critical sections of the processes, formalized by the requirement ∀t: ¬(cflag1(t) ∧ cflag2(t)). First we observe that cflag_i is local to P_i; hence the locality axiom allows us to establish loc_i ∧ I_i as an invariant of P_i, where loc_i ≝ ∀t ≤ time: ¬a_i(t) → cflag_i(t) = cflag_i(⊕t). Now the parallel composition rule leads to the following invariant for the cooperating processes:

I1 ∧ I2 ∧ ∀t: (¬(a1(t) ∧ a2(t)) ∧ (a(t) ↔ (a1(t) ∨ a2(t)))) ∧ loc1 ∧ loc2

Since we are only interested in the cooperation of P1 and P2, we restrict ourselves to the case where no outside interference has to be regarded: all process variables (including those of the critical sections) are changed only within these two processes. Thus the application of the non-interleaving rule together with the invariant above implies

guar1 ∧ guar2 ∧ inv1 ∧ inv2 ∧ ∀t: (¬(a1(t) ∧ a2(t)) ∧ (a1(t) ∨ a2(t))),

since inv_j ∧ loc_j implies rely_i under this assumption, for i ≠ j, i, j ∈ {1, 2}. Next we assume that cflag1(time) ∧ cflag2(time) holds during the execution and show that this assumption together with the formula above yields false, thus proving the requested property. Now cflag1(time) ∧ cflag2(time) and the formula above imply the following expression:

∃t1: req1(t1) ∧ ¬req2(t1) ∧ ∀t: t1 ≤ t ≤ time: req1(t) = req1(⊕t)
∧ ∃t3: req2(t3) ∧ ¬req1(t3) ∧ ∀t: t3 ≤ t ≤ time: req2(t) = req2(⊕t),
and using that T is a total order,

- t1 ≠ t3, since t1 = t3 contradicts req1(t1) ∧ ¬req1(t3);
- t1 ≤ t3 is impossible, as the above implies req1(t1) ∧ ∀t: t1 ≤ t ≤ time: req1(t), hence req1(t3), contradicting ¬req1(t3);
- t3 ≤ t1 is impossible for a symmetrical reason,

we obtain a contradiction. Thus ¬(cflag1(time) ∧ cflag2(time)) is a conclusion from our invariant, and we have succeeded in proving the mutual exclusion property by establishing the formula

¬(cflag1 ∧ cflag2) : {¬req1 ∧ ¬req2 ∧ ¬cflag1 ∧ ¬cflag2} [P1 ∥ P2] {false}.
6 Embedding the rely and guarantee formalism

In the Rely/Guarantee formalism [11, 17, 22] a specification is split up into four parts. There are two assumptions on the environment: a precondition pre characterizing the initial state, and a rely condition on state pairs that characterizes a relation any transition from the environment is supposed to satisfy. These assumptions describe conditions under which the program is used. The expected behavior of the program when used under these conditions consists of a postcondition post on the final state of the program in case it terminates, and a guarantee predicate guar which characterizes a relation any transition performed by the program itself should satisfy. Formally, P sat (pre, rely, guar, post) denotes that program P satisfies the specification quadruple if for all computations σ of P: whenever σ starts in a state which satisfies pre, and any environment transition in σ satisfies rely, then any component transition in σ satisfies guar and, if σ terminates, its final state satisfies post.

Now we can embed the R/G formalism into our (generalized) system in the following way. First note that the pre- and postcondition of the R/G formalism correspond to a restricted kind of pre- and postcondition in our system; namely, we only have to 'time' them: pre(time) and post(time) denote the formulas obtained from pre and post by replacing every occurrence of a program variable x by x(time). Using the action variable a to indicate the environmental steps by requiring ¬a, and the steps of the process itself by a, the rely and guar part of an R/G specification then corresponds to the following invariant I:

(∀t: t0 ≤ t: ¬a(t) → rely(x(t), x(⊕t))) → (∀t: t0 ≤ t: a(t) → guar(x(t), x(⊕t)))

Hence we can express an R/G formula within our system as follows:

P sat (pre, rely, guar, post) ⇔ I : {pre(time) ∧ time = t0} P {post(time)}
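As a sanity check of the quadruple's reading, one can test it against a single recorded computation. The encoding below ('env'/'prog' tags on transitions and a plain implication-style reading) is our own simplification; finer R/G formulations discharge obligations only after the first rely violation.

```python
# One computation: a list of (actor, state) steps; the first entry is
# the initial state, 'env' marks environment transitions, 'prog' the
# program's own.  All names are illustrative, not the paper's.
def sat(comp, pre, rely, guar, post, terminated=True):
    states = [s for (_, s) in comp]
    env_ok = all(rely(states[k - 1], s)
                 for k, (actor, s) in enumerate(comp)
                 if k > 0 and actor == 'env')
    if not (pre(states[0]) and env_ok):
        return True               # an assumption fails: nothing to prove
    prog_ok = all(guar(states[k - 1], s)
                  for k, (actor, s) in enumerate(comp)
                  if k > 0 and actor == 'prog')
    return prog_ok and (post(states[-1]) if terminated else True)

comp = [('init', 0), ('prog', 1), ('env', 1), ('prog', 2)]
nondec = lambda a, b: b >= a      # rely = guar = 'state never decreases'
assert sat(comp, lambda s: s == 0, nondec, nondec, lambda s: s == 2)
```

The two `all(...)` calls correspond to the two bounded quantifications in the invariant I above: one over environment steps (¬a), one over program steps (a).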
7 Final Remarks

In the case of distributed communication, for any given fully abstract model basically four different compositional logics exist, which are intertranslatable [26, 14]: the sat-style, pre/post-style, I/pre/post-style and Assumption/Commitment logics. For shared variable concurrency only one such logic is known, that of Jones [11], based on the Rely/Guarantee (R/G) paradigm. There are several problems with the resulting proof methods; e.g., correctness proofs for mutual exclusion algorithms turn out to be rather difficult to give. By introducing the concept of time diagrams we were able to give such a compositional logic for real-time shared variable concurrency. In the present paper we extended a Hoare-style formalism [2] to a compositional Hoare-style I/pre/post logic for reactive systems communicating through shared variables. We additionally embedded the R/G formalism within this logic as another step towards a similarly mutually intertranslatable system of proof styles for shared variable concurrency, as needed for the compositional verification of reactive systems.
Acknowledgements The authors would like to thank Qiwen Xu for his helpful comments on various versions of this manuscript.
References

1. P. Aczel. On an inference rule for parallel composition. Unpublished note, 1993.
2. F.S. de Boer, U. Hannemann and W.-P. de Roever. A compositional proof system for shared variable concurrency. To appear at FME '97, LNCS, 1997.
3. H. Barringer, R. Kuiper, and A. Pnueli. Now you may compose temporal logic specifications. In 16th ACM Symposium on Theory of Computing, pages 51-63, 1984.
4. F.S. de Boer, J.N. Kok, C. Palamidessi, and J.J.M.M. Rutten. The failure of failures: Towards a paradigm for asynchronous communication. In Proceedings of Concur '91, Lecture Notes in Computer Science, Vol. 527, pages 111-126, 1991.
5. S. Brookes. A fully abstract semantics of a shared variable parallel language. In Proceedings 8th Annual IEEE Symposium on Logic in Computer Science, IEEE Computer Society Press, pages 98-109, 1993.
6. S.D. Brookes, C.A.R. Hoare and A.W. Roscoe. A theory of communicating sequential processes. JACM 31(7), pages 560-599, 1984.
7. S.A. Cook. Soundness and completeness of an axiom system for program verification. SIAM J. on Computing 7, pages 70-90, 1978.
8. R.W. Floyd. Assigning meanings to programs. Mathematical Aspects of Computer Science XIX, American Mathematical Society, 1967.
9. E.C.R. Hehner, C.A.R. Hoare. A more complete model of communicating processes. TCS 26, 1983.
10. J. Hooman. Specification and compositional verification of real-time systems. Lecture Notes in Computer Science, Vol. 558, 1992.
11. C.B. Jones. Development methods for computer programs including a notion of interference. PhD thesis, Oxford University Computing Laboratory, 1981.
12. L. Lamport. Verification and specification of concurrent programs. In A Decade of Concurrency (eds. J.W. de Bakker, W.-P. de Roever and G. Rozenberg), Lecture Notes in Computer Science, Vol. 803, 1993.
13. J. Misra and K.M. Chandy. Proofs of networks of processes. IEEE Transactions on Software Engineering, 7(7):417-426, 1981.
14. P. Pandya. Compositional verification of distributed programs. PhD thesis, University of Bombay, 1988.
15. P. Pandya and M. Joseph. P-A logic - a compositional proof system for distributed programs. Distributed Computing, Vol. 5, pages 37-54, 1991.
16. W.-P. de Roever, F.S. de Boer, U. Hannemann, J. Hooman, Y. Lakhnech, P. Pandya, M. Poel, H. Schepers, Q. Xu and J. Zwiers. State-Based Proof Theory of Concurrency: from Noncompositional to Compositional Methods. To appear, 1998.
17. K. Stølen. Development of parallel programs on shared data-structures. PhD thesis, Computer Science Department, Manchester University, 1990.
18. N. Soundararajan. Axiomatic semantics of communicating sequential processes. TOPLAS, 6:647-662, 1984.
19. N. Soundararajan. A proof technique for parallel programs. Theoretical Computer Science, Vol. 31, pages 13-29, 1984.
20. J.V. Tucker and J.I. Zucker. Program correctness over abstract data types, with error-state semantics. CWI Monograph Series, Vol. 6, Centre for Mathematics and Computer Science/North-Holland, 1988.
21. S. Owicki and D. Gries. An axiomatic proof technique for parallel programs. Acta Informatica, 6:319-340, 1976.
22. Q. Xu. A theory of state-based parallel programming. PhD thesis, Oxford University Computing Laboratory, 1992.
23. Q. Xu, W.-P. de Roever and J. He. The rely-guarantee method for verifying shared variable concurrent programs. Formal Aspects of Computing, 1997 (to appear).
24. C.C. Zhou and C.A.R. Hoare. Partial correctness of CSP. In Proc. IEEE Int. Conf. on Distributed Computing Systems, pages 1-12, 1981.
25. J. Zwiers, W.-P. de Roever and P. van Emde Boas. Compositionality and concurrent networks: soundness and completeness of a proof system. Technical Report 57, University of Nijmegen, The Netherlands, 1984.
26. J. Zwiers. Compositionality, Concurrency, and Partial Correctness. Lecture Notes in Computer Science, Vol. 321, Springer-Verlag, 1989.
A Simple Characterization of Stuttering Bisimulation

Kedar S. Namjoshi*
Department of Computer Sciences, The University of Texas at Austin, U.S.A.
Abstract. Showing equivalence of two systems at different levels of abstraction often entails mapping a single step in one system to a sequence of steps in the other, where the relevant state information does not change until the last step. In [BCG 88, dNV 90], bisimulations that take into account such "stuttering" are formulated. These definitions are, however, difficult to use in proofs of bisimulation, as they often require one to exhibit a finite, but unbounded, sequence of transitions to match a single transition, thus introducing a large number of proof obligations. We present an alternative formulation of bisimulation under stuttering, in terms of a ranking function over a well-founded set. It has the desirable property, shared with strong bisimulation [Mil 90], that it requires matching single transitions only, which considerably reduces the number of proof obligations. This makes proofs of bisimulation short, and easier to demonstrate and understand. We show that the new formulation is equivalent to the original one, and illustrate its use with non-trivial examples that have infinite state spaces and exhibit unbounded stuttering.
1 Introduction

Showing equivalence between two systems at different levels of abstraction may entail mapping a single step in one system to a sequence of steps in the other, which is defined with a greater amount of detail. For instance, a compiler may transform the single assignment statement "x := x × 10 + 2" to several low-level instructions. When proving correctness of the compiler, the single assignment statement step is matched with a sequence of low-level steps, in which the value of x remains unchanged until the final step. If the program state is defined by the values of program variables, then the intermediate steps introduce a finite repetition of the same state, a phenomenon called "stuttering" by Lamport [La 80]. Stuttering arises in various contexts, especially as a result of operations that hide information, or refine actions to a finer grain of atomicity. In [BCG 88, dNV 90], bisimulations that take into account such "stuttering" are defined. It is shown in [BCG 88] that states related by a stuttering bisimulation
* This work was supported in part by SRC Contract 96-DP-388. The author can be reached at [email protected].
satisfy the same formulas of the powerful branching temporal logic CTL [EH 82] that do not use the next-time operator, X. Although these definitions are well suited to showing the relationship with CTL, they are difficult to use in proofs of bisimulation, as they often require one to exhibit a finite, but unbounded, sequence of transitions to match a single transition, thus introducing a large number of proof obligations. The main contribution of this paper is a simple alternative formulation, called well-founded bisimulation, because it is based on the reduction of a rank function over a well-founded set. The new formulation has the pleasant property that, like strong bisimulation [Mil 90], it can be checked by considering single transitions only. This substantially reduces the number of proof obligations, which is highly desirable in applications to infinite state systems such as communication protocols with unbounded channels or parameterized protocols, where checks of candidate relations are often performed by hand or with the assistance of a theorem prover. We demonstrate the use of the new formulation with some non-trivial examples that have infinite state spaces and exhibit unbounded stuttering. The use of rank functions and well-founded sets is inspired by their use in replacing operational arguments for termination of do-od loops with a proof rule that is checked for a single generic iteration (cf. [AO 91]). To the best of our knowledge, this is the first use of such concepts in a bisimulation definition. It seems possible that the ideas in this paper are applicable to other forms of bisimulation under stuttering, such as weak bisimulation [Mil 90] and branching bisimulation [GW 89]. We have chosen to focus on stuttering bisimulation because of its close connection to CTL. The paper is structured as follows: Section 2 contains the definition of stuttering bisimulation from [BCG 88], and the definition of well-founded bisimulation.
The equivalence of the two formulations is shown in Section 3. Applications of the well-founded bisimulation proof rule to the alternating bit protocol and token ring protocols are presented in Section 4, together with a new quotient construction for stuttering bisimulation equivalences. The paper concludes with a discussion of related work and future directions.
2 Preliminaries

Notation: Function application is denoted by a ".", i.e., for a function f : A → B and an element a ∈ A, f.a is the value of f at a. Quantified expressions are written in the format (Q x : r.x : p.x), where Q is the quantifier (one of ∀, ∃, min, max), x is the "dummy", r.x is an expression indicating the range of x, and p.x is the expression being quantified over. For example, in this notation, ∀x. r(x) ⇒ p(x) is written as (∀x : r.x : p.x), and ∃x. r(x) ∧ p(x) is written as (∃x : r.x : p.x).
Definition (Transition System)
A Transition System (TS) is a structure (S, →, L, I, AP), where S is a set of states, → ⊆ S × S is the transition relation, AP is the set of atomic propositions, L : S → P(AP) is the labelling function that maps each state to the subset of atomic propositions that hold at that state, and I is the set of initial states. We write s → t instead of (s, t) ∈ →. We only consider transition systems with denumerable branching, i.e., where for every state s, |{t | s → t}| is at most ω.
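The structure above translates almost literally into code. The following Python sketch (class and method names are our own) is the representation used in the later illustrations.

```python
class TS:
    """A transition system (S, ->, L, I, AP); a minimal sketch."""
    def __init__(self, states, trans, label, init, ap):
        self.states = set(states)   # S
        self.trans  = set(trans)    # -> as a set of pairs (s, t)
        self.label  = dict(label)   # L : S -> subset of AP
        self.init   = set(init)     # I
        self.ap     = set(ap)       # AP

    def succ(self, s):
        """{ t | s -> t }: the successors of s."""
        return {t for (u, t) in self.trans if u == s}

# a two-state system: 'a' (no labels) steps to 'b', which loops, holds P
A = TS({'a', 'b'}, {('a', 'b'), ('b', 'b')},
       {'a': set(), 'b': {'P'}}, {'a'}, {'P'})
assert A.succ('a') == {'b'}
```

Finite branching here is a special case of the denumerable branching required above.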
Definition (Stuttering Bisimulation) (cf. [BCG 88]¹)
Let A = (S, →, L, I, AP) be a TS. A relation B ⊆ S × S is a stuttering bisimulation on A iff B is symmetric and, for every s, t such that (s, t) ∈ B,
1. L.s = L.t,
2. (∀σ : fp.(s, σ) : (∃δ : fp.(t, δ) : match.B.(σ, δ))),
where fp.(s, σ) is true iff σ is a path starting at s which is either infinite, or whose last state has no successors w.r.t. →, and match.B.(σ, δ) is true iff σ and δ can be divided into an equal number of non-empty, finite segments such that any pair of states from segments with the same index is in the relation B. The formal definition of match is given in the appendix. States s and t are stuttering bisimilar iff there is a stuttering bisimulation relation B for which (s, t) ∈ B.
Examples:

[Figure: three structures L, M, and N over the propositions P and Q, each with states labelled a, b, c.]
¹ [BCG 88] defines "stuttering equivalence" for finite-state, total transition systems as the limit of a converging sequence of equivalences. For finite-state systems, these are just the Knaster-Tarski approximations to the greatest solution of the symmetric version of this definition.
States a and c are not stuttering bisimilar in structures L and M, but they are in structure N. Indeed, L, c ⊨ AF.P, but L, a ⊭ AF.P. Structure M shows that stuttering bisimulation distinguishes between deadlock (state c) and divergence (state a): M, c ⊭ EX.true, but M, a ⊨ EX.true². The dotted lines show a stuttering bisimulation on structure N.
Our alternative formulation is based on a simple idea from program semantics: we define a mapping from states to a well-founded set, and require, roughly, that the mapping decrease with each stuttering step. Thus, each stuttering segment is forced to be of finite length, which makes it possible to construct matching fullpaths from related states.
Definition (Well-Founded Bisimulation)
Let A = (S, →, L, I, AP) be a TS. Let rank : S × S × S → W be a total function, where (W, ≺) is well-founded³. A relation B ⊆ S × S is a well-founded bisimulation on A w.r.t. rank iff B is symmetric and, for every s, t such that (s, t) ∈ B,
1. L.s = L.t,
2. (∀u : s → u :
   (a) (∃v : t → v : (u, v) ∈ B), or
   (b) ((u, t) ∈ B ∧ rank.(u, u, t) ≺ rank.(s, s, t)), or
   (c) ((u, t) ∉ B ∧ (∃v : t → v : (s, v) ∈ B ∧ rank.(u, s, v) ≺ rank.(u, s, t)))).
Notice that if W is a singleton, then clauses (b) and (c) are not applicable, so B is a strong bisimulation.
The intuition behind this definition is that when (s, t) ∈ B and s → u, either there is a matching transition from t (clause (2a)); or (u, t) ∈ B (clause (2b)), in which case the rank decreases, allowing (2b) to be applied only a finite number of times; or (u, t) ∉ B, in which case (by clause (2c)) there must be a successor v of t such that (s, v) ∈ B. As the rank decreases at each application of (2c), clause (2c) can be applied only a finite number of times. Hence, eventually, a state related to u by B is reached. Theorem 1 (soundness) is proved along these lines.
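Because clauses (1) and (2a)-(2c) mention only single transitions, checking a candidate relation on a finite system is a plain loop over pairs and successors. The sketch below is our own illustration: the dictionary representation, the example system, and the natural-number rank (standing in for an arbitrary well-founded set) are not from the paper.

```python
def is_wf_bisimulation(succ, label, B, rank):
    """Check the single-transition clauses (1), (2a)-(2c).  succ maps a
    state to its successor set, label to its atomic propositions, B is a
    set of state pairs, and rank(u, s, t) yields a natural number."""
    if any((t, s) not in B for (s, t) in B):
        return False                              # B must be symmetric
    for (s, t) in B:
        if label[s] != label[t]:                  # clause (1)
            return False
        for u in succ[s]:
            a = any((u, v) in B for v in succ[t])                      # (2a)
            b = (u, t) in B and rank(u, u, t) < rank(s, s, t)          # (2b)
            c = (u, t) not in B and any(
                (s, v) in B and rank(u, s, v) < rank(u, s, t)
                for v in succ[t])                                      # (2c)
            if not (a or b or c):
                return False
    return True

# s0 -> s1 -> s2 stutters twice before its P-state s2; t0 -> t2 does not.
succ  = {'s0': {'s1'}, 's1': {'s2'}, 's2': set(), 't0': {'t2'}, 't2': set()}
label = {'s0': {'Q'}, 's1': {'Q'}, 't0': {'Q'}, 's2': {'P'}, 't2': {'P'}}
B = {('s0', 't0'), ('s1', 't0'), ('s2', 't2'),
     ('t0', 's0'), ('t0', 's1'), ('t2', 's2')}
rem = {'s0': 2, 's1': 1, 's2': 0, 't0': 1, 't2': 0}   # steps left to P
rank = lambda u, s, t: rem[s] + rem[t]                # decreases on stutters
assert is_wf_bisimulation(succ, label, B, rank)
```

The stuttering step s0 → s1 is accepted by clause (2b), and the mirrored pair (t0, s0) is accepted by clause (2c); no unbounded path matching is ever needed, which is exactly the reduction in proof obligations claimed above.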
3 Equivalence of the two formulations

The equivalence of the two formulations is laid out in the following theorems.

² The [dNV 90] formulation of stuttering bisimulation considers states a and c of M to be bisimilar. The difference between our formulations is only in the treatment of deadlock vs. divergence in non-total structures.
³ (W, ≺) is well-founded iff there is no infinite subset {a.i | i ∈ N} of W that is a strictly decreasing chain, i.e., where for all i ∈ N, a.(i + 1) ≺ a.i.
Theorem 1 (Soundness). Any well-founded bisimulation on a TS is a stuttering bisimulation.
Proof. Let B be a well-founded bisimulation on a TS A, w.r.t. a function rank
and a well-founded structure (W, ≺). Let (s, t) be an arbitrary pair in B. Then L.s = L.t, by clause (1) of the well-founded bisimulation definition. We show that if σ is a fullpath starting at s, then there is a fullpath δ starting at t such that match.B.(σ, δ) holds. In the following, we use the symbol ';' for concatenation of finite paths, and '⊙' for concatenation with removal of the duplicate state. For example, aa; ab = aaab, and aa ⊙ ab = aab.

We construct δ inductively. For the base case, δ.0 = t. Inductively assume that after i steps, i ≥ 0, δ has been constructed to the point where it matches a prefix α of σ such that the end states of α and δ mark the beginning of the ith segments. Let u be the last state of α and v be the last state of δ. By the inductive hypothesis, (u, v) ∈ B.

If σ ends at u, then u has no successor states. Let γ be any fullpath starting at v. Since u has no successors, a simple induction using (2b) shows that for every state x in γ, (x, u) is in B. Each application of (2b) strictly decreases rank along γ, hence γ must be finite. The fullpath δ ⊙ γ is a finite fullpath matching the finite fullpath σ. If σ does not end at u, let w be the successor of u in σ. As (u, v) ∈ B:

(i) If (2a) holds, there is a successor x of v such that (w, x) ∈ B. Let w and x mark the beginning of a new segment. Extend δ to δ; x, which matches α; w. The induction step is proved.

(ii) Otherwise, if (2a) does not hold but (2b) does, then (w, v) ∈ B. Let β be the longest prefix of the suffix of σ starting at u such that for every state a in β, (a, v) ∈ B, and only (2b) holds for (a, v) w.r.t. a → b for every successive pair of states a, b in β. β has at least one pair, as u; w is a prefix of β. β cannot be infinite, as by (2b), for each successive pair a, b in β, rank.(b, b, v) ≺ rank.(a, a, v), so the rank decreases strictly in the well-founded set. Let y be the last state of β. If σ terminates at y, the argument given earlier applies. Otherwise, y has a successor y′ in σ, but as β is maximal, either (2a) or (2c) must apply for (y, v) ∈ B w.r.t. y → y′. (2c) cannot apply, as then there is a successor x of v such that (y, x) ∈ B, which contradicts the properties of β. Hence (2a) must apply. Let x be the successor of v such that (y′, x) ∈ B. Let y′ and x mark the beginning of a new segment, and extend δ to δ; x, which matches (α ⊙ β); y′.

(iii) If (2c) is the only clause that holds of (u, v) w.r.t. u → w, let γ be a finite path, maximal w.r.t. prefix ordering, such that γ starts at v, and for every successive pair of states a, b in γ, (u, a) ∈ B, only (2c) is applicable w.r.t. u → w, and b is the successor of a given by the application of (2c).
Such a maximal nite path exists as, otherwise, there is an in nite path satisfying the conditions above. By (2c), for successive states a; b in , rank :(w; u; b) rank :(w; u; a); so there is an in nite strictly decreasing chain in (W; ), which contradicts the well-foundedness of (W; ). Let x be the last state in . Then (u; x) 2 B, and as is maximal, either (2a) or (2b) holds of (u; x) w.r.t. u ! w. So x 6= v. (2b) cannot hold, as then (w; x) is in B; but then (2a) would hold for the predecessor of x in . Hence (2a) holds; so x has a successor z for which (w; z) 2 B. Let w and z mark the beginning of a new segment, and extend to ( ); z, which matches ; w. The induction step is shown in either case. The inductive argument shows that successively longer pre xes of have successively longer matching nite paths, which are totally ordered by pre x order. Hence, if is in nite, the limit of these matching paths is an in nite path from t which matches using the partitioning into nite non-empty segments constructed in the proof. ut It is also desirable to have completeness : that for every stuttering bisimulation, there is a rank function over a well-founded set which gives rise to a well-founded bisimulation. Theorem 2 (Completeness). For any stuttering bisimulation B on a TS A, there is a well-founded structure (W; ) and corresponding function rank such that B is a well-founded bisimulation on A w.r.t. rank. ut Let A = (S; !; L; I; AP ). The well-founded set W is de ned as the product W0 W1 of two well-founded sets, with the new ordering being lexicographic order. The de nitions of the well-founded sets W0 and W1, and associated functions rank 0 and rank 1 are given below. Informally, rank 0:(a; b) measures the height of a nite-depth computation tree rooted at a, whose states are related to b but not to any successor of b. rank 1:(a; b; c) measures the shortest nite path from c that matches b and ends in a state related to the successor a of b. 
Definition of (W0, ≺0) and rank0. For a pair (s, t) of states of A, construct a tree, tree.(s, t), by the following (possibly non-effective) procedure, which is based on clause (2b) of the definition of well-founded bisimulation:
1. The tree is empty if the pair (s, t) is not in B. Otherwise,
2. s is the root of the tree. The following invariant holds of the construction: for any node y of the current tree, (y, t) ∈ B, and if y is not a leaf node, then every child z of y in the tree is a successor of y in A, and there is no successor v of t in A such that (z, v) ∈ B.
3. For a leaf node y, and any successor z of y in A: if (z, t) ∈ B, but there is no successor v of t in A such that (z, v) ∈ B, then add z as a child of y in the tree. If no such successor exists for y, then terminate the branch at y. Repeat step 3 for every leaf node on an unterminated branch.
Lemma 3. tree.(s, t) is well-founded.
Proof. Suppose to the contrary that there is an infinite branch π, which is therefore a fullpath, starting at s. Let u be the successor of s on π, and let π′ be the fullpath that is the suffix of π starting at u. By construction of the tree, for every state x on π′, (x, t) ∈ B, and for every successor v of t, (x, v) ∉ B. However, as (u, t) ∈ B, there must be a fullpath δ starting at t for which match.B.(π′, δ) holds. Let w be the successor of t on δ. From the definition of match, for some x on π′, (x, w) ∈ B. This is a contradiction. Hence, every branch of the tree must be of finite length. □
Since tree.(s, t) is well-founded, it can be assigned an ordinal height using the standard bottom-up assignment technique for well-founded trees: assign the empty tree height 0, and any non-empty tree T the ordinal sup.{height.S + 1 | S ⊏ T}, where S ⊏ T holds iff S is a strict subtree of T. Let rank0.(s, t) equal the height of tree.(s, t). As trees with countable branching need only countable ordinals as heights, let W0 be the set of countable ordinals, ordered by the membership order ∈.
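For a finite-branching system, the (possibly non-effective) procedure above becomes an ordinary recursive computation. The following sketch (illustrative Python, not from the paper; the names `height`, `succ` and the set-of-pairs representation of B are our own) computes the height of tree.(s, t), i.e. rank0.(s, t), directly from clause (2b):

```python
# Illustrative sketch (assumed names, not the paper's code): rank0 for a
# finite-branching TS.  succ maps a state to its list of successors;
# B is a set of related pairs.  Termination relies on Lemma 3: when B is a
# stuttering bisimulation, every branch of tree.(s, t) is finite.
def height(s, t, succ, B):
    """Height of tree.(s, t): 0 for the empty tree, else 1 + max child height."""
    if (s, t) not in B:
        return 0  # case 1 of the construction: the empty tree
    # children of s (clause (2b)): successors z related to t itself,
    # but to no successor v of t
    children = [z for z in succ[s]
                if (z, t) in B and not any((z, v) in B for v in succ[t])]
    if not children:
        return 1  # a single node: its only strict subtree is the empty tree
    return 1 + max(height(z, t, succ, B) for z in children)
```

For instance, with succ = {'s': ['u'], 'u': [], 't': []} and B = {('s', 't'), ('u', 't')}, the node u is a child of s in tree.(s, t), and the subtree rooted at u has strictly smaller height, as Lemma 4 requires.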
Lemma 4. If tree.(s, t) is non-empty, and u is a child of s in the tree, then rank0.(u, t) ≺0 rank0.(s, t).
Proof. From the construction, tree.(u, t) is the subtree of tree.(s, t) rooted at node u; hence its height is strictly smaller. □
Definition of (W1, ≺1) and rank1. Let W1 = ℕ, the set of natural numbers, and let ≺1 be the usual order < on ℕ. The definition of rank1 is as follows. For a tuple (u, s, t) of states of A:
1. If (s, t) ∈ B, s → u, (u, t) ∉ B, and for every successor v of t, (u, v) ∉ B, then rank1.(u, s, t) is the length of the shortest initial segment that matches s, among all matching fullpaths s; σ and δ, where σ starts at u and δ starts at t. Formally⁴,
rank1.(u, s, t) = (min σ, δ, α, β : fp.(t, δ) ∧ fp.(u, σ) ∧ α, β ∈ INC ∧ corr.B.(((s; σ), α), (δ, β)) : |seg.0.(δ, β)|)
As (s, t) ∈ B and s → u, there exist matching fullpaths s; σ and δ, with σ starting at u and δ starting at t. As (u, t) ∉ B, and no successor of t matches u, under any partition β of any fullpath δ that matches a fullpath s; σ, the initial segment seg.0.(δ, β) matches s, and must contain at least two states: t and some successor of t. Thus, rank1.(u, s, t) is defined, and is at least 2.
⁴ The appendix has precise definitions of INC and corr.
2. Otherwise, rank1.(u, s, t) = 0.
Theorem 2 (Completeness, restated). For any stuttering bisimulation B on a TS A, there is a well-founded structure (W, ≺) and corresponding function rank such that B is a well-founded bisimulation on A w.r.t. rank.
Proof. Let W = W0 × W1. The ordering ≺ on W is the lexicographic ordering on W0 × W1, i.e., (a, b) ≺ (c, d) ≡ (a ≺0 c) ∨ (a = c ∧ b ≺1 d). Define rank.(u, s, t) = (rank0.(u, t), rank1.(u, s, t)). W is well-founded, and rank is a total function. We have to show that B is a well-founded bisimulation w.r.t. rank. Let (s, t) ∈ B.
1. L.s = L.t, from the definition of stuttering bisimulation.
2. Let u be any successor of s. If there is no successor v of t such that (u, v) ∈ B, consider the following cases:
- (u, t) ∈ B: as no successor of t is related to u by B, u is a child of s in tree.(s, t), and by Lemma 4, rank0.(u, t) ≺0 rank0.(s, t). Hence, rank.(u, u, t) ≺ rank.(s, s, t), and clause (2b) holds.
- (u, t) ∉ B: as no successor of t is related to u by B, rank1.(u, s, t) is non-zero. Let the fullpath δ starting at t and the partition β "witness" the value of rank1.(u, s, t). Let v be the successor of t in the initial segment seg.0.(δ, β); this successor exists, as the length of the segment is at least 2. Since v lies in a segment matching s, (s, v) ∈ B. rank1.(u, s, v) is at most rank1.(u, s, t) − 1, so rank1.(u, s, v) ≺1 rank1.(u, s, t). As no successor of t is related by B to u, (u, v) ∉ B, so rank0.(u, v) = 0; as (u, t) ∉ B, rank0.(u, t) = 0. Since rank is defined by lexicographic ordering, rank.(u, s, v) ≺ rank.(u, s, t), and clause (2c) holds.
Hence, one of (2a), (2b) or (2c) holds for (s, t) ∈ B w.r.t. s → u. □
For a transition system that is finite-branching (every state has finitely many successor states), tree.(s, t) for any s, t is a finite, finitely-branching tree, so its height is a natural number. Hence, W0 = ℕ.
Proposition 5. For a finite-branching transition system, W = ℕ × ℕ. □
Theorem 6 (Main). Let A = (S, →, L, I, AP) be a transition system. A relation B on A is a stuttering bisimulation iff B is a well-founded bisimulation w.r.t. some rank function.
Proof. The claim follows immediately from Theorems 1 and 2. □
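On a finite transition system, Theorem 6 turns into a mechanically checkable proof rule: every pair of B and every single transition is examined against clauses (1) and (2a)-(2c). A minimal sketch (illustrative Python; the dictionary representation of `succ` and `label`, and the use of Python's ordinary `<` on numbers or tuples as the well-founded order, are our own assumptions):

```python
# Illustrative sketch (assumed names): one-step check of the well-founded
# bisimulation conditions on a finite TS.  B is a set of pairs, succ maps a
# state to its successors, label maps a state to its labelling, and
# rank(u, s, t) returns a value in a well-founded order (here: naturals,
# or tuples of naturals compared lexicographically).
def is_wf_bisimulation(B, succ, label, rank):
    for (s, t) in B:
        if label[s] != label[t]:
            return False                                      # clause (1)
        for u in succ[s]:
            ok = (any((u, v) in B for v in succ[t])           # (2a)
                  or ((u, t) in B
                      and rank(u, u, t) < rank(s, s, t))      # (2b)
                  or any((s, v) in B
                         and rank(u, s, v) < rank(u, s, t)
                         for v in succ[t]))                   # (2c)
            if not ok:
                return False
    return True
```

As a usage example, relating the one-step path a → b with the stuttering path c → c2 → d (a, c, c2 equally labelled, b, d equally labelled) succeeds with a rank that counts the remaining stutter steps on each side, whereas relating differently labelled states fails at clause (1).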
For simplicity, the definitions are structured so that a bisimulation is a symmetric relation. The main theorem holds for bisimulations that are not symmetric, but the definition of rank has to be modified slightly, to take the direction of matching (by B or by B⁻¹) into account. Details will appear in the full paper.
4 Applications
The definition of a well-founded bisimulation is, by Theorem 6, in itself a simple proof rule for determining if a relation is indeed a bisimulation up to stuttering. In this section, we look at several applications of this proof rule. We outline the proofs of well-founded bisimulation for the alternating bit protocol from [Mil 90], and for a class of token ring protocols studied in [EN 95]. We also present a new quotient construction for a well-founded bisimulation that is an equivalence. In all of these applications, the construction of the appropriate well-founded set and ranking function is quite straightforward. We believe that this is the case in other applications of stuttering bisimulation as well.
4.1 The Alternating Bit Protocol
A version of the alternating bit protocol is given in [Mil 90], which we follow closely. The protocol has four entities: Sender and Replier processes, and message (Trans) and acknowledgement (Ack) channels. Messages and acknowledgements are tagged with bits 0 and 1 alternately. For simplicity, message contents are ignored; both channels are sequences of bits. For a channel c, let order.c represent the sequence resulting from removing duplicates from c, and let count.c be the vector of the numbers of duplicated bits. Vectors are compared component-wise if they have the same length. For example, order.(0^3; 1^2) = 0; 1, count.(0^3; 1^2) = (3, 2), and count.(1^5) = (5).
The bisimulation B relates only those states where the order of each channel is of length at most two; hence count vectors have length at most two. Let (s, t) ∈ B iff in s and t, the local states of the sender and replier processes are identical, and the order of messages in both channels is the same. Note that the number of duplicated messages is abstracted away. Let χ.s = (count.(Trans.s), count.(Ack.s)), and let rank.(u, s, t) be (χ.s, χ.t). The operations of the protocol are sending a bit or receiving a bit on either channel, and duplicating or deleting a bit on either channel. It is straightforward to verify that B is a well-founded bisimulation. The rank function is used, for instance, at a receive action in s from a channel with contents a^l; b, while the same channel in the corresponding state t has contents a^m; b^n (n > 1). The receive action at s results in a state u with channel content a^l, while the same action at t results in a state v with channel content a^m; b^(n−1). u and v are not related, but v is related to s, and rank.(u, s, v) < rank.(u, s, t) (cf. clause (2c)). The example exhibits unbounded stuttering.
With the original formulations of stuttering bisimulation, one would have to construct a computation of length n from state t to match the receive action from state s. This introduces n proof obligations, and complicates the proof. In contrast, with the new formulation, one need consider only a single transition from t.
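The channel abstractions order and count can be sketched as follows (illustrative Python, not from the paper; channels are modelled as lists of bits, and `itertools.groupby` yields exactly the runs of duplicated bits):

```python
# Illustrative sketch (assumed representation): a channel is a list of bits.
from itertools import groupby

def order(channel):
    """Duplicate-free contents, e.g. order of 0^3;1^2 is 0;1."""
    return [bit for bit, _ in groupby(channel)]

def count(channel):
    """Vector of duplicate counts, e.g. count of 0^3;1^2 is (3, 2)."""
    return tuple(len(list(run)) for _, run in groupby(channel))
```

Two states related by B have channels with the same order; the count vectors, compared component-wise, supply the strictly decreasing measure used at duplicate-consuming steps.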
4.2 Simple Token Ring Protocols
In [EN 95] (cf. [BCG 89]), stuttering bisimulation is used to show that for token rings of similar processes, a small cutoff size ring is equivalent to one of any larger size. [EN 95] shows that the computation trees of process 0 in rings of size 2 and of size n, n ≥ 2, are stuttering bisimilar. It follows that a property over process 0 is true of all sizes of rings iff it is true of the ring of size 2. From symmetry arguments (cf. [ES 93,CFJ 93]), a property holds of all processes iff it holds for process 0. The proof given in the paper uses the [BCG 88] definition and is quite lengthy; we indicate here how to use well-founded bisimulation.
Each process alternates between blocking receive and send token transfer actions, with a finite number of local steps in between. For an n-process system with state space S_n, define χ_n : S_n → ℕ² as the function given by χ_n.s = (i, j), where, in state s, if process m has the token, then i = (n − m) mod n is the distance of the token from process 0, and j is the sum over processes of the maximum number of steps of each process from its local state to the first token transfer action. The tuples are ordered lexicographically. Let the rank function be rank.(u, s, t) = (χ_m.s, χ_n.t), where s and t are states in instances with m and n processes respectively. Let the relation B be defined by (s, t) ∈ B iff the local state of process 0 is identical in s and t. It is straightforward to verify that B is a well-founded bisimulation w.r.t. rank. The rank function is used in the situation where the token is received by process 0 by a move from state s to state u, but the reception action is not enabled for process 0 in a state t related to s by B. In this case, some move of a process other than 0 is enabled at t, and results in a state v that reduces χ_n, and hence the rank, either by a transfer of the token to the next process, or by reducing the number of steps to the first token transfer action. The next state v is related to s by B (cf. clause (2c) of the definition).
4.3 Quotient Structures
For a bisimulation B on a TS A that is an equivalence relation, a quotient structure A/B (read as A "mod" B) can be defined, where the states are equivalence classes (w.r.t. B) of states of A, and the new transition relation is derived from the transition relation of A. Quotient structures are usually much smaller than the original; a bisimulation with finitely many classes induces a finite quotient, as is the case in the examples given in the previous sections.
Let A = (S, →, L, I, AP) be a TS, and B be a well-founded bisimulation on A, w.r.t. a rank function ρ, that is an equivalence relation on S. The equivalence class of a state s is denoted by [s]. Define A/B as the TS (S′, ⇒, L′, I′, AP) given by:
- S′ = {[s] | s ∈ S}.
- The transition relation ⇒ is given by: for C, D ∈ S′, C ⇒ D iff either
  1. C ≠ D, and (∃s, t : s ∈ C ∧ t ∈ D : s → t), or
  2. C = D, and (∀s : s ∈ C : (∃t : t ∈ C : s → t)).
  The distinction between the two cases is made in order to prevent spurious self-loops in the quotient, arising from stuttering steps in the original.
- The labelling function is given by L′.C = L.s, for some s in C (states in an equivalence class have the same label).
- The set of initial states, I′, equals {[s] | s ∈ I}.
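The case distinction in the definition of ⇒ can be sketched as follows (illustrative Python, not the paper's construction; `cls` stands for the map from a state to a canonical name of its B-class, and all other names are our own):

```python
# Illustrative sketch (assumed names): quotient of a finite TS by an
# equivalence, with the case distinction that avoids spurious self-loops.
def quotient(states, succ, label, cls):
    """cls maps each state to (a canonical representative of) its B-class."""
    classes = {cls[s] for s in states}
    members = {C: [s for s in states if cls[s] == C] for C in classes}
    trans = set()
    for C in classes:
        for D in classes:
            if C != D:
                # case 1: some member of C moves into D
                if any(cls[t] == D for s in members[C] for t in succ[s]):
                    trans.add((C, D))
            else:
                # case 2: a self-loop only if EVERY member of C can stay in C
                if all(any(cls[t] == C for t in succ[s]) for s in members[C]):
                    trans.add((C, C))
    lab = {C: label[members[C][0]] for C in classes}  # class members agree
    return classes, trans, lab
```

Note how a class whose members merely stutter for a while before leaving (case 2 failing for some member) gets no self-loop, even though individual members have internal moves.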
Theorem 7. A is stuttering bisimilar to A/B.
Proof. Form the disjoint union of the TS's A and A/B. The bisimulation on this structure relates states of A and A/B as follows: (a, b) ∈ R iff [a] = b ∨ [b] = a.
Let sw : S′ → S (read "state witness") be a partial function, defined at C only when C ⇒ C does not hold. When defined, v = sw.C is such that v ∈ C, but no successor of v w.r.t. → is in C; such a v exists by the definition of ⇒. Let ew : S′² → S² (read "edge witness") be a partial function, defined at (D, C) iff C ⇒ D. When defined, (v, u) = ew.(D, C) is such that u ∈ C, v ∈ D, and u → v. Let rank be a function into W ∪ {⊥} (⊥ is a new element unrelated to any element of W), defined by: if u, s ∈ S and sw.C is defined, then rank.(u, s, C) = ρ.(u, s, sw.C); if D, C ∈ S′ and s ∈ S, then rank.(D, C, s) = ρ.(ew.(D, C), s), if ew.(D, C) is defined; otherwise, rank.(a, b, c) = ⊥.
Let (a, b) ∈ R. From the definition of R, a and b have the same label.
- a ∈ S: for clarity, we rename (a, b) to (s, C). By the definition of R, C = [s]. Let s → u. If [s] ⇒ [u], then there is a successor D = [u] of C such that (u, D) ∈ R, and clause (2a) holds. If the edge from [s] to [u] is absent, then [s] must equal [u], and sw.C is defined. Let x = sw.C. As (s, x) ∈ B and (u, x) ∈ B, but x has no successors to match u, clause (2b) holds for B, i.e., ρ.(u, u, x) ≺ ρ.(s, s, x). By definition of rank, rank.(u, u, C) ≺ rank.(s, s, C), so (2b) holds for R.
- a ∈ S′: for clarity, we rename (a, b) to (C, s). Let C ⇒ D, and let (y, x) = ew.(D, C). As x → y and (x, s) ∈ B, there are three cases to consider:
  1. There is a successor u of s such that (y, u) ∈ B. Then [y] = [u], so (D, u) ∈ R, and (2a) holds.
  2. (y, s) ∈ B. Then [y] = [x], so C = D. As C ⇒ D and s ∈ C, s has a successor u such that u ∈ C; hence (D, u) is in R and (2a) holds.
  3. (y, s) ∉ B, and there exists u such that s → u, (x, u) ∈ B, and ρ.(y, x, u) ≺ ρ.(y, x, s). Hence (C, u) ∈ R, and rank.(D, C, u) ≺ rank.(D, C, s). So clause (2c) holds. □
5 Related Work and Conclusions
Other formulations of bisimulation under stuttering have been proposed; however, they too involve reasoning about finite, but unbounded, sequences of transitions. Examples include branching bisimulation [GW 89], divergence-sensitive stuttering [dNV 90], and weak bisimulation [Mil 90]. We believe that it is possible to characterize branching bisimulation in a manner similar to our characterization of stuttering bisimulation, given the close connection between the two that is pointed out in [dNV 90]. An interesting question is whether a similar characterization can be shown for weak bisimulation [Mil 90].
Many proof rules for temporal properties are based on well-foundedness arguments, especially those for termination of programs under fairness constraints (cf. [GFMdR 83,Fr 86,AO 91]). Vardi [Va 87], and Klarlund and Kozen [KK 91], develop such proof rules for very general types of linear temporal properties. Our use of well-foundedness arguments for defining a bisimulation appears to be new, and, we believe, of intrinsic mathematical interest. The motivation in each of these instances is the same: to replace reasoning about unbounded or infinite paths with reasoning about single transitions. Earlier definitions of stuttering bisimulation are difficult to apply to large problems, essentially because of the difficulty of reasoning about unbounded stuttering paths. Our new characterization, which replaces such reasoning with reasoning about single steps, makes proofs of equivalence under stuttering easier to demonstrate and understand. In the example applications, it was quite straightforward to determine an appropriate well-founded set and rank function. Indeed, rank functions are implicit in proofs that use the earlier formulations. As the examples demonstrate, using rank functions explicitly leads to proofs that are shorter, and which can be carried out with assistance from a theorem prover.
Acknowledgements. Thanks to Prof. E. Allen Emerson, Peter Manolios, Jun Sawada, Robert Sumners, and Richard Trefler for carefully reading an earlier draft of this paper. Peter Manolios helped to strengthen some of the theorems and simplify the proofs. The comments from the referees helped to improve the presentation.
References
[AO 91] Apt, K. R., Olderog, E.-R. Verification of Sequential and Concurrent Programs, Springer-Verlag, 1991.
[BCG 88] Browne, M. C., Clarke, E. M., Grumberg, O. Characterizing Finite Kripke Structures in Propositional Temporal Logic, Theor. Comp. Sci., vol. 59, pp. 115-131, 1988.
[BCG 89] Browne, M. C., Clarke, E. M., Grumberg, O. Reasoning about Networks with Many Identical Finite State Processes, Information and Computation, vol. 81, no. 1, pp. 13-31, April 1989.
[CFJ 93] Clarke, E. M., Filkorn, T., Jha, S. Exploiting Symmetry in Temporal Logic Model Checking, 5th CAV, Springer-Verlag LNCS 697.
[EH 82] Emerson, E. A., Halpern, J. Y. "Sometimes" and "Not Never" Revisited: On Branching versus Linear Time Temporal Logic, in POPL, 1982.
[EN 95] Emerson, E. A., Namjoshi, K. S. Reasoning about Rings, in POPL, 1995.
[ES 93] Emerson, E. A., Sistla, A. P. Symmetry and Model Checking, 5th CAV, Springer-Verlag LNCS 697.
[Fr 86] Francez, N. Fairness, Springer-Verlag, 1986.
[GW 89] van Glabbeek, R. J., Weijland, W. P. Branching time and abstraction in bisimulation semantics, in Information Processing 89, Elsevier Science Publishers, North-Holland, 1989.
[GFMdR 83] Grumberg, O., Francez, N., Makowsky, J., de Roever, W.-P. A proof rule for fair termination, in Information and Control, 1983.
[KK 91] Klarlund, N., Kozen, D. Rabin measures and their applications to fairness and automata theory, in LICS, 1991.
[La 80] Lamport, L. "Sometimes" is Sometimes "Not Never", in POPL, 1980.
[Mil 90] Milner, R. Communication and Concurrency, Prentice-Hall International Series in Computer Science, edited by C. A. R. Hoare.
[dNV 90] de Nicola, R., Vaandrager, F. Three logics for branching bisimulation, in LICS, 1990. Full version in Journal of the ACM, 42(2):458-487, 1995.
[Va 87] Vardi, M. Verification of Concurrent Programs - The Automata-Theoretic Framework, in LICS, 1987. Full version in Annals of Pure and Applied Logic, 51:79-98, 1991.
6 Appendix: Definition of match
Let INC be the set of strictly increasing sequences of natural numbers starting at 0. Precisely, INC = {α | α : ℕ → ℕ ∧ α.0 = 0 ∧ (∀i : i ∈ ℕ : α.i < α.(i + 1))}.
Let σ be a path, and α a member of INC. For i ∈ ℕ, let intv.i.(σ, α) = [α.i, min.(α.(i + 1), length.σ)). The ith segment of σ w.r.t. α, seg.i.(σ, α), is the sequence of states of σ with indices in intv.i.(σ, α).
Let σ and δ, under partitions α and β respectively, correspond w.r.t. B iff they are subdivided into the same number of segments, and any pair of states in segments with the same index are related by B. Precisely, corr.B.((σ, α), (δ, β)) ≡ (∀i : i ∈ ℕ : (intv.i.(σ, α) ≠ ∅ ≡ intv.i.(δ, β) ≠ ∅) ∧ (∀m, n : m ∈ intv.i.(σ, α) ∧ n ∈ intv.i.(δ, β) : (σ.m, δ.n) ∈ B)).
Paths σ and δ match iff there exist partitions that make them correspond. Precisely, match.B.(σ, δ) ≡ (∃α, β : α, β ∈ INC : corr.B.((σ, α), (δ, β))).
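For finite paths, the existential over partitions in match can be decided by a small memoized search: cut both paths into the same number of non-empty segments, both exhausted simultaneously, and check that every cross pair within a segment is in B. A sketch (illustrative Python, not from the paper; the function name and the representation of paths as lists of states are our own):

```python
# Illustrative sketch (assumed names): match.B.(sigma, delta) for FINITE
# paths, deciding the existence of corresponding partitions directly.
from functools import lru_cache

def match(sigma, delta, B):
    m, n = len(sigma), len(delta)

    @lru_cache(maxsize=None)
    def from_(i, j):
        # try every way to end the current pair of segments
        for a in range(i + 1, m + 1):
            for b in range(j + 1, n + 1):
                seg_ok = all((sigma[x], delta[y]) in B
                             for x in range(i, a) for y in range(j, b))
                if seg_ok and ((a == m and b == n) or
                               (a < m and b < n and from_(a, b))):
                    return True
        return False

    return from_(0, 0)
```

The guard `(a < m and b < n)` enforces that both paths run out of states at the same segment boundary, mirroring the requirement that intv.i is empty for σ exactly when it is empty for δ.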
General Refinement for High Level Petri Nets

Raymond Devillers¹, Hanna Klaudel² and Robert-C. Riemann³

¹ Université Libre de Bruxelles, Belgium, [email protected]
² Université Paris XII, IUT de Fontainebleau, France, [email protected]
³ Université Paris-Sud, France and Universität Hildesheim, Germany, [email protected]
Abstract. The algebra of M-nets, a class of high level labelled Petri nets, was introduced in the Petri Box Calculus in order to cope with the size problem of low level nets, especially when they are applied as a semantical domain for parallel programming languages. A general, unrestricted refinement operator, intended to represent the procedure call mechanism for concurrent calls, is introduced into the M-net calculus. Its coherence with the low level refinements is exhibited, together with its main properties.
1 Introduction
While the algebra of Petri boxes ([2, 1, 9, 7, 8, 10]) has been introduced with the aim of modelling the semantics of concurrent programming languages (and succeeded in doing so to some extent, e.g. [6]), in practical situations (and in particular when dealing with large value domains for program variables), this generally leads to huge (possibly infinite) nets, well defined mathematically but difficult to represent graphically and thus to grasp intuitively. In order to cope with this problem, higher level models have been introduced ([17, 18, 11]), and in particular a fruitful class of so-called M-nets [3, 4, 5], which nicely unfold into low level boxes and thus allow large (possibly infinite) systems to be represented in a clear and compact way.
The same operations should be defined on the M-net level as on the low level one, and in particular a refinement (meta-)operation. A first step in this direction has been presented in [11], where the definition of the refinement for M-nets assumed some restrictions, however, on the interface of the refined transitions and on the entry/exit interface of the refining nets; this unfortunately leads to difficulties when one wants to take concurrent procedure calls into account. In [19, 13] a further attempt is made to use a more general refinement operator for M-nets, both papers aiming at defining an M-net semantics for a parallel programming language with procedures; a refinement is then necessary in order to distinguish between concurrent instances of the same procedure. The approach defined in those papers is not fully satisfactory, however, since it does not commute with the unfolding operation and, furthermore, it hides several steps in the construction, while not being completely general.
A next step has then been presented in [12]: a more general refinement mechanism is there defined for a slightly extended M-net model, but it still needs some restrictions to commute with the unfolding operation, and is thus not fully satisfactory. In particular, these restrictions may lead to difficulties when applying successive refinements. The present paper aims at overcoming these difficulties and weaknesses.
2 The M-net Model
Let Val be a fixed but suitably large⁴ set of values and Var be a suitably large⁵ set of variables. The set of all well-formed predicates built from the sets Val, Var and a suitable set of operators is denoted by Pr. We assume the existence of a fixed set A of action symbols, also called actions for short. Each action symbol A ∈ A is assumed to have an arity ar(A), which is a natural number describing the number of its parameters. A construct A(θ1, …, θar(A)), where A is an action symbol and ∀j ∈ {1, …, ar(A)} : θj ∈ Var ∪ Val, is a parameterised action. The set of all parameterised actions is denoted by PA. A parameterised action A(θ1, …, θar(A)) is called elementary if ∀j ∈ {1, …, ar(A)} : θj ∈ Val. The set of all elementary parameterised actions will be denoted by EA. We also assume the existence of a fixed but suitably large⁵ set X of hierarchical actions. The latter will be the key to refinements, and thus to any hierarchical presentation of a system, since they represent a kind of 'hole' to be later replaced by some corresponding (M-)net. Finally, we shall also use a set of structured annotations, built from the value and variable sets, which will denote nonempty sets of values. Their exact syntax will be specified later; at this point let us just notice that they include the sets Val and Var, a value v representing in that case the singleton set {v}, and a variable x representing the singleton set {v} when the value of x is v.
The main difference between M-nets and predicate/transition or coloured nets [14, 16] is that M-nets carry additional information in their place and transition inscriptions to support composition operations. In M-nets, besides the usual annotations on places (set of allowed tokens), arcs (multiset of structured annotations) and transitions (occurrence condition), we have an additional label on places denoting their status (entry, exit or internal) and an additional label on transitions, denoting the communication or hierarchical interface.
Definition 1 (M-nets). An M-net is a triple (S, T, ι), where S is a set of places, T is a set of transitions with S ∩ T = ∅, and ι is an inscription function with domain S ∪ (S × T) ∪ (T × S) ∪ T such that:
- For every place s ∈ S, ι(s) is a pair λ(s).α(s), where λ(s) ∈ {e, i, x} is called the label of s, and α(s), the type of s, is a nonempty set of values.
- For every transition t ∈ T, ι(t) is a triple var(t).λ(t).γ(t), where var(t), the variables of t, is a finite set of variables from Var; λ(t), the label of t, is a finite multiset of parameterised actions (t will then be called a communication transition), or a hierarchical action symbol (t will then be called a hierarchical transition); and γ(t), the guard of t, is a finite set of predicates from Pr. The variables occurring either in λ(t) or in γ(t) are assumed to belong to var(t).
- For every arc (s, t) ∈ (S × T), ι((s, t)) is a multiset of structured annotations (analogously for arcs (t, s) ∈ (T × S)); each structured annotation represents some nonempty set⁶ of values absorbed or produced by the transition on the place. ι((s, t)) will generally be abbreviated as ι(s, t); again, the variables occurring in ι(s, t) are assumed to belong to var(t).
⁴ In particular, this means that Val includes all the structured values which will be constructed through the refinement operation (see the definition of place types in section 5).
⁵ In order to be able to rename them whenever necessary, to avoid name clashes.
Each type α(s) delimits the set of tokens allowed on place s, and λ(s) describes the status (entry e, internal i or exit x) of s. The label of a transition t can either be a multiset of parameterised actions expressing the synchronisation capabilities of t, or a hierarchical action symbol informing about a possible future refinement of t. For simplicity, in figures we will omit brackets around arc inscriptions, as well as arcs with empty inscriptions. Figure 1 shows three M-nets, which will be used as a running example. We intend to refine N1 into transition t1 and N2 into t2.

Fig. 1. An M-net N with two hierarchical transitions t1 and t2, and two refining M-nets N1 and N2.
Given a transition t ∈ T, the part of N which consists of the transition t and all its incident arcs is called the area of t: area(t) = (S × {t}) ∪ {t} ∪ ({t} × S). Note that the areas of different transitions are always disjoint, and that var(t) comprises all the variables occurring in the inscriptions of area(t). A binding for t is a function σ : var(t) → Val.
⁶ Notice that it will never represent a multiset of values: the multiset aspect is coped with by the fact that ι(s, t) is itself a multiset of structured annotations, and by the fact that two distinct structured annotations may have common values in their represented sets.
If ξ is an entity depending on the variables of var(t), we shall denote by ξ[σ] the evaluation of this entity under the binding σ; in general, this will be obtained by replacing⁷ in ξ each variable a ∈ var(t) occurring in it (if any) by its value σ(a). For instance, λ(t)[σ] ∈ M_f(EA), and γ(t)[σ] ∈ M_f({true, false}) (after the evaluation of the terms). The guard γ(t) plays the rôle of an occurrence condition, in the sense that t may occur under a binding σ only if γ(t) is true for σ, i.e., if all⁸ terms in γ(t)[σ] evaluate to true.
The arc inscriptions specify the token flow. An empty arc inscription means that no tokens may ever flow along that arc, i.e., there exists no effective connection along it. A binding σ of t will be said to be enabling if γ(t)[σ] ⊆ {true}, i.e., if it satisfies the guard, and if moreover, for every s ∈ S, the values of ι(s, t)[σ] and ι(t, s)[σ] belong to α(s), i.e., the flow of tokens respects the place types. We shall assume that there is always at least one enabling binding for each transition (otherwise, it may be dropped). The hierarchical transition t1 in the M-net N of our running example in figure 1 has a single enabling binding σ1 = (a = 1), while t2 is enabled for σ2 = (a = 1, b = 1) and σ3 = (a = 1, b = 2); the (silent) communication transition t3 is enabled by σ4 = (a = 1). In N2, the transition labelled {A(c)} has the two bindings β1 = (c = 5) and β2 = (c = 6), the transition labelled {B(c)} the bindings β3 = (c = 5) and β4 = (c = 6), and finally the transition labelled {D(d)} is enabled by β5 = (d = 7).
A marking of an M-net (S, T, ι) is a mapping M : S → M_f(Val) which associates to each place s ∈ S a multiset of values from α(s). In particular, we shall distinguish the entry marking, where M(s) = α(s) if λ(s) = e and the empty (multi-)set otherwise, and the exit marking, where M(s) = α(s) if λ(s) = x and the empty (multi-)set otherwise. For an M-net N = (S, T, ι), we will denote the set of entry (respectively, exit) places of N by °N (respectively, N°); S \ (°N ∪ N°) is the set of internal places of N. The transition rule specifies the circumstances under which a marking M′ is reachable from a marking M. The effect of an occurrence of t is to remove all tokens used for the enabling binding σ of t from the input places, and to add tokens according to σ to its output places.
De nition2. A transition t is enabled for an enabling binding at a marking M1 if there is a marking M such that s S : M1 (s) = (s; t)[] + M (s). The occurrence of t at M1 under then leads to a marking M2 , such that s S : M2 (s) = M (s) + (t; s)[]. 8
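Once the arc inscriptions have been evaluated under the chosen binding, the transition rule of Definition 2 is plain multiset arithmetic. The following is a rough illustration in Python (a toy encoding of our own, not the paper's formalism; the names `pre`, `post`, `enabled` and `fire` are ours), with markings represented as one multiset of values per place:

```python
from collections import Counter

def enabled(marking, pre):
    """t is enabled at `marking` if every place holds the tokens
    required by the evaluated input arc inscription iota(s,t)[sigma]."""
    return all(marking[s][v] >= n for s, need in pre.items()
               for v, n in need.items())

def fire(marking, pre, post):
    """Compute M2 with M2(s) = M1(s) - iota(s,t)[sigma] + iota(t,s)[sigma]."""
    assert enabled(marking, pre)
    m2 = {s: Counter(c) for s, c in marking.items()}
    for s, need in pre.items():
        m2[s] -= need            # remove the consumed tokens
    for s, prod in post.items():
        m2[s] += prod            # add the produced tokens
    return m2

# A two-place example: under the chosen binding, t moves the value 1
# from place s1 to place s2.
M = {"s1": Counter({1: 1}), "s2": Counter()}
pre = {"s1": Counter({1: 1})}
post = {"s2": Counter({1: 1})}
M2 = fire(M, pre, post)
```

After the firing, s1 is empty and s2 holds one token of value 1, so t is no longer enabled, as expected from the definition.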
⁶ As a consequence, the semantics of an M-net is not modified if we rename locally (i.e., independently) the variables in each area. Without loss of generality, it will thus always be possible to assume that if t ≠ t′, then var(t) and var(t′) are disjoint.
⁷ The evaluation rule will be slightly more complex for structured annotations; this will be clarified in Definition 6.
⁸ In other words, the set of predicates could be replaced by their mere conjunction; this is not done directly here for technical reasons; moreover, it could happen that the conjunction is not among the allowed operators.
As usual, two (marked) M-nets N and N′ are called isomorphic if there are (marking-preserving, label-preserving and arc-inscription-preserving, up to local renamings) bijections between their places and transitions.
3 Unfolding of an M-net

The unfolding operation associates a labelled low level net (see e.g. [2]) U(N) with every M-net N, as well as a marking U(M) of U(N) with every marking M of N.

Definition 3. Let N = (S,T,ι) be an M-net; then U(N) = (U(S), U(T), W, λ) is defined as follows:
- U(S) = { s_v | s ∈ S and v ∈ α(s) }, and for each s_v ∈ U(S): λ(s_v) = λ(s);
- U(T) = { t_σ | t ∈ T and σ is an enabling binding of t }, and for each t_σ ∈ U(T): λ(t_σ) = λ(t)[σ];
- W(s_v, t_σ) = Σ_{x ∈ ι(s,t)} ι(s,t)(x) · x[σ](v), and analogously for W(t_σ, s_v).
Let M be any marking of N. U(M) is defined as follows: for every place s_v ∈ U(S), U(M)(s_v) = M(s)(v). Thus, each elementary place s_v ∈ U(S) contains the value v as many times as this value occurs in the marking M(s). The unfoldings for N and N₂ of the running example are given in figure 2.
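As a toy illustration of Definition 3 (with a hypothetical encoding of our own: place types as finite value sets, guards as Python predicates over bindings), the unfolding enumerates one low-level place per (place, value) pair and one low-level transition per enabling binding:

```python
from itertools import product

alpha = {"s": {0, 1}}                      # place types alpha(s)
variables = {"t": {"a": {0, 1}}}           # var(t) with finite value ranges
guard = {"t": lambda b: b["a"] == 1}       # gamma(t), evaluated under binding b

def enabling_bindings(t):
    """Enumerate the bindings of var(t) that satisfy the guard of t."""
    names = sorted(variables[t])
    for vals in product(*(sorted(variables[t][n]) for n in names)):
        b = dict(zip(names, vals))
        if guard[t](b):
            yield b

# U(S): one place s_v per value v in alpha(s);
# U(T): one copy t_sigma per enabling binding sigma of t.
unfolded_places = {(s, v) for s in alpha for v in alpha[s]}
unfolded_transitions = [("t", b) for b in enabling_bindings("t")]
```

Here the place s of type {0,1} unfolds into the two places s_0 and s_1, while t, whose guard forces a = 1, unfolds into the single copy t_{(a=1)}.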
[Figure omitted: the low level nets U(N) and U(N₂).]

Fig. 2. Unfoldings U(N) and U(N₂).
4 Low Level Refinement

The refinement N[X_i ← N_i | i ∈ I] means 'N where all X_i-labelled transitions are refined into (i.e., replaced by a copy of) N_i, for each i in the indexing set I'. In order to ease the understanding of the next sections, and to exhibit the differences as well as the similarities between the low and high level approaches, we shall first shortly recall how this operation is introduced in the low level theory. Its definition is slightly technically complex, due to the great generality that is allowed. Indeed, refinements are easier to define when the refining nets N_i have a single entry and a single exit place [11], or when there is a single transition to refine, without side loop [15]. However, here we want to allow any kind of configuration: any number of refined transitions (possibly infinitely many), any connectivity network, any arc weighting, any number of entry/exit places (possibly continuously infinitely many, due to the cardinality explosion phenomenon [2,1]). The definition uses a labelled tree device which nicely generalises the kind of multiplicative Cartesian cross product (pre/post places of the transitions to be refined with entry/exit places of the refining net) commonly used in the literature as the interface places [15]. This setting has not been chosen just for the purpose of treating the general case, but also to easily obtain the main properties of the refinement operator. In this respect, it has been successfully reused in [9,7,8,11,10].
Definition 4. If X is a variable name and 𝒳 = {X_i | i ∈ I} is a family of (distinct) such names, let us define T^X = {t ∈ T | λ(t) = X} and T^𝒳 = ⋃_{X∈𝒳} T^X. Let Σ = (S,T,W,λ) and Σ_i = (S_i,T_i,W_i,λ_i) (for each i ∈ I) be labelled nets. Σ[X_i ← Σ_i | i ∈ I] is defined as the labelled net Σ̃ = (S̃, T̃, W̃, λ̃) with:

- T̃ = (T \ T^𝒳) ∪ ⋃_{i∈I} T^i, where T^i = { t.t_i | t ∈ T^{X_i} and t_i ∈ T_i };
- S̃ = ⋃_{i∈I} S^i ∪ ⋃_{s∈S} S^s, where S^i = { t.s_i | t ∈ T^{X_i} and s_i ∈ Σ_i* }, and S^s is the set of all the labelled trees of the following form: the root is labelled by s, and the arcs are labelled by a transition and a direction; for each i ∈ I and for each (if any) t ∈ s• with a label of the form X_i, there is an arc labelled t going (down) to (a node labelled by) some (arbitrarily chosen) entry place e_t of Σ_i, and for each (if any) t′ ∈ •s with a label of the form X_i, there is an arc labelled t′ coming (up) from (a node labelled by) some (arbitrarily chosen) exit place x_{t′} of Σ_i;
- W̃(t̃, s̃) =
    W(t, s) if t̃ = t ∈ (T \ T^𝒳) and s̃ ∈ S^s;
    W(t, s) · W_i(t_i, x_i) if t̃ = t.t_i ∈ T^i and an arc labelled t from x_i occurs in s̃ ∈ S^s;
    W_i(t_i, s_i) if t̃ = t.t_i ∈ T^i and s̃ = t.s_i ∈ S^i;
    0 otherwise;
  and W̃(s̃, t̃) is defined symmetrically;
- λ̃(t̃) = λ(t) if t̃ = t ∈ (T \ T^𝒳), and λ̃(t̃) = λ_i(t_i) if t̃ = t.t_i with λ(t) = X_i and t_i ∈ T_i;
- λ̃(s̃) = λ(s) if s̃ ∈ S^s, and λ̃(s̃) = i (internal) otherwise.

A tree in S^s may be represented by a set of sequences {s; t.e_t; …; t′.x_{t′}; …} describing the root and all the children labels together with the corresponding arc labels.
The definition is illustrated by figure 3, and it may be checked that the behaviour of the refined net indeed corresponds to what should be expected.
[Figure omitted: a net with transitions labelled X, Y, V, Z, a refining net, and the resulting refined net with interface places such as {s₂; t₂.e₁; t₂.x₁}.]

Fig. 3. An example of the low level refinement.
5 General Refinement

Here we want to extend the low level definition to the M-net framework, without restricting the kind of inscriptions allowed by this model. In order to grasp the difficulty of the problem, let us consider the example exhibited at the end of [11] and shown⁹ in figure 4: the refinement of the M-net fragment given by the first net N, when X is replaced 'naively' by the M-net N′, would look like the third fragment; but while in the first one the two a variables were the same (they occur in the surrounding of the same transition), this is no longer the case in the third one, since variables only have a local meaning and may be changed independently around each transition; hence they may be fixed independently by enabling bindings. It is thus necessary to transport the identity of the variables, or at least their bindings, from the entry of the refined copy to the exit.
[Figure omitted: the fragment of N, the refining net N′, and the tentative fragment of N[X ← N′].]
Fig. 4. Illustration of the difficulty of the general case.

The intuition behind the general refinement operator is thus the following: a hierarchical transition t (labelled by X) of an M-net N has a set of enabling
⁹ In order to abbreviate the annotations in the figures, we shall often omit the guard γ(t) when it is empty, and the variable set var(t) when it may easily be reconstructed from the annotations in the area of t.
bindings, i.e., possibly more than a single one. Each enabling binding σ for t can be understood as a 'mode' under which the refining M-net N′ is executed. Once (if ever) the refining M-net has reached its exit marking, the execution of N is supposed to be continued in the state (marking) corresponding to the result of the occurrence of t under the considered mode σ. Therefore two problems have to be solved: first, the decision for the mode under which the refining M-net should be executed has to be taken; second, the chosen binding has to be 'memorised' in order to produce the corresponding marking after N_i has reached its final state. To guarantee the commutativity of the refinement operation with the unfolding, a labelled tree device similar to the one used for low level nets above will be used here. The main difference with the low-level case, and with the previous attempts to define refinements at the M-net level [12], will be that, in our context, the place types (and consequently the evaluations of the structured annotations in arc inscriptions) will be sets of labelled trees, but the interface places themselves will remain unstructured. Like above, if X ∈ 𝒳 is a hierarchical action symbol and 𝒳_I = {X_i | i ∈ I} is a set of such actions, T^X = {t ∈ T | λ(t) = X} is the set of all X-labelled hierarchical transitions, and T^{𝒳_I} = ⋃_{X∈𝒳_I} T^X is the set of all hierarchical transitions with a label from 𝒳_I. Like for the definition of refinements for low level nets, the places of the net N[X_i ← N_i | i ∈ I] will be of two kinds: the interface places and the copied internal places. Each place s ∈ S of the M-net N will also be a place of the refined M-net, with the same label as before. The only difference is that its type will be a set of labelled trees constructed from the old value set and the entry/exit interface of the refining nets.
The new type α̃(s) of s is the set of all the (isomorphism classes of) labelled trees of the following form: the root is labelled by a value v ∈ α(s); the arcs are labelled by enabling bindings σ_t of transitions t ∈ T^{𝒳_I} ∩ (•s ∪ s•) and a direction (up or down). More precisely: for each i ∈ I, for each (if any) t ∈ s• with a label of the form X_i and for each enabling binding σ_t of t such that v ∈ ι(s,t)[σ_t], there is an arc labelled σ_t going down to a node labelled by some arbitrarily chosen pair (e,w) where e ∈ °N_i and w ∈ α_i(e); and for each i ∈ I, each (if any) t′ ∈ •s with a label of the form X_i and for each enabling binding σ_{t′} of t′ such that v ∈ ι(t′,s)[σ_{t′}], there is an arc labelled σ_{t′} going up from a node labelled by some arbitrarily chosen pair (x,w′) where x ∈ N_i° and w′ ∈ α_i(x). Copied internal places of N_i form the set S^i of all the pairs t.s_i where t is a transition of N labelled by X_i (λ(t) = X_i) and s_i ∈ N_i* is an internal place of the refining net N_i. The label of such a place will always be internal. The type of t.s_i will be the set α̃(t.s_i) of all the pairs σ_t.v, where σ_t is an enabling binding of t and v ∈ α_i(s_i) is any value allowed on s_i. We give the types for the interface places s₁, s₂, s₃, s₄ and the internal place t₂.p
in N[X_i ← N_i | i ∈ {1,2}] to illustrate the definitions for the values allowed on interface places and internal places.
[Display omitted: the types α̃(s₁), α̃(s₂), α̃(s₃) and α̃(s₄) are finite sets of labelled trees, built from root values of N and pairs such as (x₁,4), (e₂,5), (e₂,6), (e₃,7) at the sons; the type of the internal place is α̃(t₂.p) = {σ₂.5, σ₂.6, σ₃.5, σ₃.6}.]
Like for the definition of refinements for low level nets, the transitions of Ñ = N[X_i ← N_i | i ∈ I] will also be of two kinds: the untouched transitions t ∈ T \ T^{𝒳_I}, with the same inscription as before (ι(t) is the same in N and Ñ), and the copied transitions t.t_i, where λ(t) = X_i ∈ 𝒳_I and t_i ∈ T_i. The set of those copied transitions is denoted by T^i. As for the inscription of the latter, we shall assume (without loss of generality) that var(t) ∩ var_i(t_i) = ∅; then var(t.t_i) = var(t) ∪ var_i(t_i), λ(t.t_i) = λ_i(t_i) and γ(t.t_i) = γ(t) ∪ γ_i(t_i). In order to understand the rationale of the structured annotations occurring in the inscription of the arcs of the refined M-net, let us consider the following example depicted in figure 5.
[Figure omitted: a fragment of N with transitions t′ and t (labelled X), a refining net N′ with entry places e₁, e₂ and transition t″, and the corresponding fragment of N[X ← N′] with structured arc annotations such as (a,s).!, (b,s).(c,e₁), (b,s).(c,e₂) and !.(d,s′).]
Fig.5. Illustration of the structured annotations. Transition t0 is untouched but place s now has a type composed out of labelled trees, and the occurrence of t0 must produce one instance of each tree with a root labelled by a value produced through a in N ; this will be represented by the notation (a; s):!, where the `hole' symbol ! means that there is no constraint on the son labels. Transition t:t00, on the other hand, must absorb from s one instance of each tree with a root labelled by a value absorbed through b in N and a son corresponding to the selected mode (determined by the values of the variables from t) labelled by a value absorbed through c in e1 by t00: this will be represented by the notation (b; s):(c; e1); it will also absorb trees of the same shape, but with the son labelled by a value absorbed through c in e2 by t00, which
will be represented, similarly, by the notation (b,s).(c,e₂); notice that the fact that t″ absorbs one token from each of the two entry places in N′ is replaced, in the refined net, by the fact that t.t″ absorbs from s the tokens of two structured annotations, as if the whole of N′ were replaced by a single entry place; this is due to the fact that the new place s gathers all the tree values produced by the new t′. Finally, transition t.t″ must also produce in place t.s′ one instance of each value σ.v, where v is any value produced through d in N′ and σ is any enabling binding for t (i.e., a mode determined by the values of the variables from t): this will be represented by the notation !.(d,s′), where the 'hole' symbol ! means that there is no special constraint on the first part of the value.

Definition 5 (General Refinement). Let N = (S,T,ι) and N_i = (S_i,T_i,ι_i), where i ∈ I, be M-nets. N[X_i ← N_i | i ∈ I] is defined as the M-net Ñ = (S̃, T̃, ι̃), where S̃ = S ∪ ⋃_{i∈I} S^i, T̃ = (T \ T^{𝒳_I}) ∪ ⋃_{i∈I} T^i, and

- ι̃(s̃) = λ(s̃).α̃(s̃) if s̃ ∈ S, and ι̃(s̃) = i.α̃(s̃) if s̃ = t.s_i ∈ S^i;
- ι̃(t̃) = ι(t) if t̃ = t ∈ (T \ T^{𝒳_I}), and ι̃(t̃) = var(t) ∪ var_i(t_i) . λ_i(t_i) . γ(t) ∪ γ_i(t_i) if t̃ = t.t_i with λ(t) = X_i and t_i ∈ T_i;
- ι̃(s̃, t̃) =
    Σ_{a ∈ ι(s,t)} ι(s,t)(a) · (a,s).! if t̃ = t ∈ (T \ T^{𝒳_I}) and s̃ = s ∈ S;
    Σ_{a ∈ ι(s,t)} Σ_{e_i ∈ °N_i} Σ_{b ∈ ι_i(e_i,t_i)} ι(s,t)(a) · ι_i(e_i,t_i)(b) · (a,s).(b,e_i) if t̃ = t.t_i and s̃ = s ∈ S;
    Σ_{b ∈ ι_i(s_i,t_i)} ι_i(s_i,t_i)(b) · !.(b,s_i) if t̃ = t.t_i and s̃ = t.s_i ∈ S^i;
    ∅ otherwise;
  and analogously for the arcs (t̃, s̃).

We apply the general refinement to our running example of figure 1; the resulting M-net is depicted in figure 6. The previously given place types are omitted.
[Figure omitted: the refined M-net, with structured arc annotations such as (a,s₁).(b₁,e₁), (a,s₃).(c,e₂), !.(c,p) and (b,s₄).(c,x₂).]
Fig. 6. General refinement applied to N, N₁, and N₂: N[X_i ← N_i | i ∈ {1,2}].

We still have to specify the evaluation of the structured annotations under a binding; this will lead to a set (possibly infinite) of values (labelled trees, possibly
reduced to their root). In order to do that, let us first notice that, since for a hierarchical transition t of N and a transition t_i of the refining M-net N_i we have var(t.t_i) = var(t) ∪ var(t_i), each binding of t.t_i is the union σ ∪ σ_i of a binding σ for t and a binding σ_i for t_i, while the bindings of an untouched transition t′ are the same as in N.

Definition 6. The evaluation of a structured annotation in the inscription ι̃(s̃,t̃) or ι̃(t̃,s̃) of an arc in the refined net for a binding σ̃ = σ or σ̃ = σ ∪ σ_i (as specified above) is defined by:

- (a,s).![σ̃] = { τ ∈ α̃(s) | the root of τ belongs to a[σ̃] }, if t̃ = t ∈ (T \ T^{𝒳_I}) and s̃ = s ∈ S;
- (a,s).(b,s′)[σ̃] = { τ ∈ α̃(s) | the root of τ belongs to a[σ̃] and the son corresponding to the arc (down or up, depending on the inscribed arc) labelled σ has a label (s′,w) with w ∈ b[σ̃] }, if t̃ = t.t_i with t ∈ T^{𝒳_I} and s̃ = s ∈ S;
- !.(b,s_i)[σ̃] = { σ.v ∈ α̃(s̃) | v ∈ b[σ̃] }, if t̃ = t.t_i with t ∈ T^{𝒳_I} and s̃ = t.s_i ∈ S^i.

Then it is not hard to see that the enabling bindings of t′ in Ñ are the enabling bindings of t′ in N, and that the enabling bindings of t.t_i are the unions of an enabling binding for t and an enabling binding for t_i. The evaluation defined for structured annotations will be illustrated by the occurrences of the transitions t₁.u₁ and t₂.u₂ in the refined M-net from figure 6. Consider the place s₁ with its initial marking, i.e., s₁ contains a token for each value of its type, and the transition t₁.u₁. It can fire for every binding composed of an enabling binding for t₁ and an enabling binding for u₁. There is a unique binding σ₁ = (a=1) for t₁ and there are two enabling bindings for
u₁, namely ρ′₁ = (b₁=4, b₂=4) and ρ′₂ = (b₁=4, b₂=3). Take for instance σ̃ = σ₁ ∪ ρ′₂ (but notice that the other combination might also be considered). The evaluation {(a,s₁).(b₁,e₁), (a,s₁).(b₂,e₁)}[σ̃] gives us the set α̃(s₁); hence the occurrence of t₁.u₁ under this marking is possible. The evaluation {(a,s₃).(b₁,x₁)}[σ̃] yields the entire type α̃(s₃), since the root of each tree value in α̃(s₃) belongs to a[σ₁] and the son corresponding to the arc σ₁ belongs to b₁[ρ′₂]. Hence, the occurrence of t₁.u₁ puts each tree value from α̃(s₃) on s₃. Now we try to fire transition t₂.u₂. It is enabled by bindings composed of σ₂ or σ₃ for t₂, and ρ₁ = (c=5) or ρ₂ = (c=6) for u₂. One might expect that t₂.u₂ can be fired twice under the given marking, since u₂ is enabled twice under the initial marking of N₂ (by ρ₁ and ρ₂). Let us take σ̃ = σ₂ ∪ ρ₁. The evaluation
{(a,s₃).(c,e₂)}[σ̃] yields the three tree values of α̃(s₃) whose son along the arc σ₂ carries the value 5 on e₂; these are taken from s₃ when firing t₂.u₂, while !.(c,p)[σ̃] generates σ₂.5 on t₂.p. Notice that the resulting marking (after firing t₂.u₂ under the binding σ₂ ∪ ρ₁) no longer allows the execution of t₂.u₂ under the binding σ₃ ∪ ρ₂ (nor of t₂.u₄ under mode σ₃), since the evaluation of the arcs adjacent to (and hence the enabling of) t₂.u₂ (respectively, t₂.u₄) is defined with respect to the type of the adjacent place, i.e., with respect to α̃(s₃), and not only with respect to the present marking of
the place. The execution of t₂.u₂ (respectively, t₂.u₄) under mode σ₃ would require tokens (labelled trees) which have already been removed from s₃ by the occurrence of a transition from N₂ under mode σ₂. This is the means to transport the chosen mode for the hierarchical transition through the refining M-net, and to ensure that once one transition of the refining net has chosen a mode, the decision is valid for the entire net, even if there are transitions which are concurrent (as in our example t₂.u₂ and t₂.u₄) and hence independent. We might now fire t₂.u₂ under the binding σ̃ = σ₂ ∪ ρ₂. The evaluation of the incoming
arc yields the three tree values of α̃(s₃) whose son along the arc σ₂ carries the value 6 on e₂; these three tokens are still in s₃. The execution of t₂.u₂ under this binding then yields σ₂.6 on t₂.p.
6 Some Properties of the General Refinement

The basic property of the general refinement is its commutativity with the unfolding operation, stated in the following theorem and illustrated in figure 7.
Theorem 7 (Commutativity). Let N = (S,T,ι) and N_i = (S_i,T_i,ι_i), where i ∈ I, be M-nets; then, up to isomorphism,

U(N[X_i ← N_i | i ∈ I]) = U(N)[X_i ← U(N_i) | i ∈ I].

[Figure omitted: the commutative diagram relating M-nets and Petri Boxes through refinement and unfolding, with N_l = U(N[X_i ← N_i]) and N_r = U(N)[X_i ← U(N_i)].]
Fig. 7. Illustration of the commutativity.

Proof. In N_l as in N_r there are two kinds of transitions and two kinds of places: those coming from the net N to be refined and those coming from the refining nets N_i; we shall exhibit a one-to-one correspondence between the members of each category. Let t″ be a transition of N which is not in T^{𝒳_I}, and σ″ one of its enabling bindings; let t be a transition of N with label X_i, and σ one of its enabling bindings; let t′ be a transition of N_i, and σ′ one of its enabling bindings; let s be a place of N, v one of its values, σ_j an enabling binding of a transition t_j to be refined absorbing the value v from s, and σ_k an enabling binding of a transition t_k to be refined producing the value v in s; let e_j be some entry place of the net refining t_j and v_j one of its values; let x_k be some exit place of the net refining t_k and v_k one of its values; finally, let s′ be an internal place of N_i and v′ one of its values. The one-to-one correspondence between the elements constituting both sides of the equation is schematised in the following table:

in N_l = U(N[X_i ← N_i]) | in N_r = U(N)[X_i ← U(N_i)]
t″_{σ″} | t″_{σ″}
(t.t′)_{σ∪σ′} | t_σ.t′_{σ′}
s with a tree value rooted at v, down-arcs σ_j to (e_j,v_j) and up-arcs σ_k from (x_k,v_k) | the tree place rooted at s_v, with down-arcs (t_j)_{σ_j} to (e_j)_{v_j} and up-arcs (t_k)_{σ_k} from (x_k)_{v_k}
(t.s′)_{σ.v′} | t_σ.(s′_{v′})

The mapping between the arcs follows immediately from the fact that the arc weights are directly driven by the (corresponding) name structures, so that an arc in U(N[X_i ← N_i]) and the corresponding arc in U(N)[X_i ← U(N_i)] (between the related place and transition) have the same weight. □
We also have a general property about successive refinements, similar to the one already obtained in the low level domain. Since the variable sets of two successive refinements are not necessarily disjoint, we shall separate the second set into a common part and a disjoint part. We then have the following general expansion law for refinements, which allows us to reduce any succession of simultaneous refinements to a single refinement (but whose refining components may themselves be refinements).
Theorem 8 (Expansion law). Let N, N_i, N′_j, and N″_k, where i ∈ I, j ∈ J, and k ∈ K, be M-nets. If J ⊆ I, I ∩ K = ∅ and {Y_k | k ∈ K} ∩ {X_i | i ∈ I} = ∅, then up to isomorphism

N[X_i ← N_i | i ∈ I][X_j ← N′_j, Y_k ← N″_k | j ∈ J, k ∈ K] = N[X_i ← N_i[X_j ← N′_j, Y_k ← N″_k | j ∈ J, k ∈ K], Y_k ← N″_k | i ∈ I, k ∈ K].

Proof. Let us denote by N_l the left hand side and by N_r the right hand side of the equation. The two M-nets N_l and N_r are the same modulo parenthesising and the dropping of some redundancy. The bijection between the two nets can be obtained through a transformation of the net N_l into the net N_r (or vice versa) by rewriting the identity of transitions and places, the types of the places and the inscriptions of arcs according to the parenthesising. Consider the example depicted in figure 8, exhibiting the various types of configurations. The correspondence table may be constructed from the example as in the previous proof. □
[Figure omitted: the nets N, N₁, N₂, and the two refined nets N[X ← N₁][Y ← N₂] and N[X ← N₁[Y ← N₂], Y ← N₂], with their structured arc annotations.]

Fig. 8. Illustration of the expansion law.
7 Conclusion

We have provided the M-net domain with the same algebraic structure as the low level Petri Box one, by introducing a general simultaneous refinement operator; the coherence of the corresponding structure has been exhibited through the unfolding operation, and the properties are inherited from the low level ones. We have thus established a fully general and coherent framework in which the ideas of [19,13] can be satisfactorily developed. To our knowledge, no other high level framework possesses an equally general refinement satisfying the desired algebraic properties. The basic ideas of this paper are most likely applicable to other high level Petri net models, although the formalisation is given here only for the M-net Calculus.

Acknowledgments
This work was performed while the first author visited the UPVM's Equipe d'Informatique Fondamentale in spring 1997: our thanks thus go to E. Pelz and the Université Paris Val de Marne for the invitation. We thank as well the anonymous referees for their careful reading and helpful comments.
References

1. E. Best, R. Devillers, and J. Esparza. General Refinement and Recursion for the Box Calculus. STACS'93. Springer, LNCS Vol. 665, 130-140 (1993).
2. E. Best, R. Devillers, and J.G. Hall. The Box Calculus: a New Causal Algebra with Multilabel Communication. Advances in Petri Nets 92. Springer, LNCS Vol. 609, 21-69 (1992).
3. E. Best, H. Fleischhack, W. Fraczak, R.P. Hopkins, H. Klaudel, and E. Pelz. A Class of Composable High Level Petri Nets. Application and Theory of Petri Nets 1995. Springer, LNCS Vol. 935, 103-120 (1995).
4. E. Best, H. Fleischhack, W. Fraczak, R.P. Hopkins, H. Klaudel, and E. Pelz. An M-Net Semantics of B(PN)². Structures in Concurrency Theory: STRICT'95 Proceedings. Springer, 85-100 (1995).
5. E. Best, W. Fraczak, R.P. Hopkins, H. Klaudel, and E. Pelz. M-nets: an Algebra of High Level Petri Nets, with an Application to the Semantics of Concurrent Programming Languages. To appear in Acta Informatica.
6. E. Best and R.P. Hopkins. B(PN)² - a Basic Petri Net Programming Notation. Proceedings of PARLE'93. Springer, LNCS Vol. 694, 379-390 (1993).
7. E. Best and M. Koutny. A Refined View of the Box Algebra. Application and Theory of Petri Nets 1995. Springer, LNCS Vol. 935, 1-20 (1995).
8. E. Best and M. Koutny. Solving Recursive Net Equations. Automata, Languages and Programming 1995. Springer, LNCS Vol. 944, 605-623 (1995).
9. R. Devillers. The Synchronisation Operator Revisited for the Petri Box Calculus. Technical Report LIT-290, Université Libre de Bruxelles (1994).
10. R. Devillers. S-Invariant Analysis of General Recursive Petri Boxes. Acta Informatica, Vol. 32, 313-345 (1995).
11. R. Devillers and H. Klaudel. Refinement and Recursion in a High Level Petri Box Calculus. Structures in Concurrency Theory: STRICT'95 Proceedings. Springer, 144-159 (1995).
12. R. Devillers, H. Klaudel, and R.-C. Riemann. General Refinement in the M-net Calculus. Technical Report LIT-357, Université Libre de Bruxelles (1997).
13. H. Fleischhack and B. Grahlmann. A Petri Net Semantics for B(PN)² with Procedures. Parallel and Distributed Software Engineering, Boston, MA (1997).
14. H. Genrich. Predicate/Transition Nets. In Petri Nets: Central Models and their Properties, Advances in Petri Nets 1986, Part I. Springer, LNCS Vol. 254, 207-247 (1987).
15. R.J. van Glabbeek and U. Goltz. Refinement of Actions in Causality Based Models. Stepwise Refinement of Distributed Systems. Springer, LNCS Vol. 430, 267-300 (1989).
16. K. Jensen. Coloured Petri Nets. Basic Concepts, Analysis Methods and Practical Use. EATCS Monographs on Theoretical Computer Science, Vol. 1. Springer (1992).
17. H. Klaudel. Modèles algébriques, basés sur les réseaux de Petri, pour la sémantique des langages de programmation concurrents. PhD Thesis, Université Paris XI Orsay (1995).
18. H. Klaudel and E. Pelz. Communication as Unification in the Petri Box Calculus. Fundamentals of Computation Theory. Springer, LNCS Vol. 965, 303-312 (1995).
19. J. Lilius and E. Pelz. An M-net Semantics for B(PN)² with Procedures. In ISCIS XI, Vol. I, 365-374, Antalya, November 1996. Middle East Technical University.
Polynomial-time Many-one Reductions for Petri Nets

Catherine Dufourd and Alain Finkel

LSV, CNRS URA 2236, ENS de Cachan, 61 av. du Pdt. Wilson, 94235 Cachan Cedex, France. E-mail: {Catherine.DUFOURD, [email protected]}
Abstract. We apply to Petri net theory the technique of polynomial-time many-one reductions. We study the boundedness, reachability, deadlock and liveness problems and some of their variations. We derive three main results. Firstly, we highlight the power of expression of reachability, which can polynomially give evidence of unboundedness. Secondly, we prove that reachability and deadlock are polynomial-time equivalent; this improves the known recursive reduction and it complements the result of Cheng et al. [4]. Moreover, we show the polynomial equivalence of liveness and t-liveness. Hence, we regroup the problems in three main classes: boundedness, reachability and liveness. Finally, we give an upper space-bound on the boundedness problem for post self-modifying nets: 2^{O(size(N)² log size(N))}. This improves a decidability result of Valk [18].

Key words: Petri net theory; complexity theory; program verification; equivalences.
1 Introduction

The boundedness, reachability, deadlock, t-liveness and liveness problems are among the main problems studied in Petri nets. Solving these problems requires huge space and time resources. For boundedness, Lipton [13] proved that a lower space-bound is 2^{c·√|N|}, improved to 2^{c·|N|} by Bouziane [2] (where c is some constant and |N| is the size of the input net); Rackoff [17] proved that an upper space-bound for this problem is 2^{O(|N| log |N|)}. For reachability, decidability has been proved by Mayr [14] and Kosaraju [12]; Cardoza, Lipton, Mayr and Meyer [3,15] established that this problem is Expspace-hard. However, until now, it is not known whether the reachability, deadlock and liveness problems are primitive recursive or not. In this paper, our aim is to compare these problems, to regroup similar problems into classes and to order these classes. We use polynomial-time many-one reductions [9]. The idea is to take one instance of a problem A and to polynomially transform it into one instance of another problem B; the problem B is seen as an oracle used to solve the problem A. In the literature, we often find two other kinds of reductions: polynomial-time Turing reductions, which allow one to consult the oracle not only once, but a
{ Boundedness is polynomially reducible to reachability, { Reachability and deadlock are polynomially equivalent, { Liveness and t-liveness are polynomially equivalent. For instance, we show that a Petri net N is unbounded if and only if a marking MN is reachable in the net Nb which is polynomially constructed from N. Let us note that our second theorem strengthens a recent result of Cheng, Esparza and Palsberg [4] who showed that reachability is polynomially reducible to deadlock. Secondly, we establish a strong relation between Petri nets and Post SelfModifying nets (PSM-nets) on the boundedness problem. Post self-modifying nets, de ned by Valk [18], are extended Petri nets in which a transition may add a \dynamic number" of tokens (which is an ane function, with a speci c form, of the current marking) in its output places. Valk has proven that the boundedness problem is decidable for post self-modifying nets. Here, we improve his decidability result by giving 2O(jN j log jN j) as an upper space-bound. Moreover, this upper bound is not so far from the lower bound (2jN j ). 2
There are four advantages in grouping problems together. Firstly, even if we still do not know the exact complexity of reachability and deadlock, it is instructive to know that they have the same complexity, modulo a polynomial transformation. Secondly, when we know that seven problems are polynomially equivalent, as for the ones of the reachability class, we may focus our attention on only one of these problems, to produce a good implementation of an algorithm solving it; this unique program may be used for solving the six other problems. Thirdly, the obtained results confirm our intuition about the hardness of problems in Petri nets. Basically we obtain the following order:

Boundedness ≤ Reachability ≡ Deadlock ≤ Liveness

Fourthly, we obtain a new complexity result using the equivalence between boundedness for Petri nets and boundedness for post self-modifying nets. In the next section, we give the basic definitions of Petri nets and polynomial-time reductions; then we make an overview of the known many-one polynomial-time reductions. In section 3, we reduce boundedness to reachability. In section 4, we prove that reachability is polynomially equivalent to deadlock; moreover, both are polynomially equivalent to reachability and deadlock for normalized Petri nets (for which valuations over arcs and the initial marking are upper-bounded by 1). In section 5, we show that liveness is polynomially equivalent to t-liveness. In section 6, we prove that boundedness for Petri nets and boundedness for post self-modifying nets are polynomially equivalent; we deduce from there the upper bound on the boundedness problem for PSM-nets. We conclude in section 7.
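The shape of a polynomial-time many-one reduction can be stated in a few lines (a generic sketch of our own; `transform` and `oracle_B` are placeholders for the polynomial transformation and the oracle, not constructions from the paper):

```python
# A many-one reduction consults the oracle for B exactly once, on a
# single transformed instance, and returns its answer unchanged.
def many_one_reduce(instance_A, transform, oracle_B):
    return oracle_B(transform(instance_A))

# Toy example: deciding "is n odd?" with an oracle for "is n even?"
# via the (trivially polynomial) transform n -> n + 1.
is_even = lambda n: n % 2 == 0
is_odd = lambda n: many_one_reduce(n, lambda m: m + 1, is_even)
```

Contrast this with a Turing reduction, which may query `oracle_B` polynomially many times and post-process the answers arbitrarily.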
2 Petri nets and polynomial-time reductions
Let ℕ be the set of nonnegative integers and let ℕ^k (k ≥ 1) be the set of k-dimensional column vectors of elements in ℕ. For X ∈ ℕ^k, X(i) (1 ≤ i ≤ k) is the ith component of X. For X, Y ∈ ℕ^k, we have X < Y iff the two following conditions hold: (a) X(i) ≤ Y(i) (1 ≤ i ≤ k) and (b) ∃j, 1 ≤ j ≤ k, such that X(j) < Y(j). Let Σ be a finite alphabet; Σ* is the set of all finite words (or sequences) over Σ. We denote by |S| the cardinality of a finite set S, and by |N| the size of a Petri net N.
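The strict componentwise order just defined is easy to state in code; the following is a small illustrative helper of ours, not the paper's notation made executable:

```python
def strictly_less(x, y):
    """X < Y iff X(i) <= Y(i) for every i and X(j) < Y(j) for some j."""
    assert len(x) == len(y)
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

# (1,2) < (1,3): componentwise <= with at least one strict inequality
print(strictly_less([1, 2], [1, 3]))  # True
print(strictly_less([1, 2], [1, 2]))  # False: no strictly smaller component
```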
2.1 Petri nets, properties and complexity
A Petri net is a 4-tuple N = ⟨P, T, F, M0⟩ where P is a finite set of places, T is a finite set of transitions with P ∩ T = ∅, F : (P × T) ∪ (T × P) → ℕ is a flow function and M0 ∈ ℕ^{|P|} is the initial marking. A Petri net is normalized, or ordinary, if F is a function into {0, 1} and M0 belongs to {0, 1}^{|P|}. A transition t is firable from a marking M ∈ ℕ^{|P|}, written M →t, if for every place p we have F(p, t) ≤ M(p). Firing t from M leads to a new marking M', written M →t M', defined as follows: for every place p, we have M'(p) = M(p) − F(p, t) + F(t, p). A marking M' is reachable from M, written M →* M', if there exists a sequence σ ∈ T* such that M →σ M'. A marking is dead if no transition is firable from it. The reachability set of N, denoted RS(N), contains all the markings reachable from M0. A Petri net is unbounded if its reachability set is infinite. A transition t is quasi-live from M if it is firable from a marking M' with M →* M'. A transition t ∈ T is live if it is quasi-live from any marking in RS(N). A Petri net is live if all its transitions are live.

Definition 1. Given a Petri net N = ⟨P, T, F, M0⟩, t ∈ T and M ∈ ℕ^{|P|}:
– The Boundedness Problem (BP) is to determine whether N is bounded or not.
– The Reachability Problem (RP) is to determine whether M ∈ RS(N) or not.
– The Deadlock Problem (DP) is to determine whether RS(N) contains a dead marking or not.
– The t-Liveness Problem (t-LP) is to determine whether the transition t is live or not.
– The Liveness Problem (LP) is to determine whether N is live or not.

These problems have been widely studied. They are all decidable [11,8,12,14,7], but intractable in practice. A lower space-bound for the RP and the BP is 2^{c·√|N|} [13]. Reachability is Expspace-hard [3,15], but we do not yet know whether the RP is primitive recursive or not. There exists a family of bounded Petri nets such that every net N of the family has a reachability set of non-primitive recursive size in |N| [10]. An upper space-bound for deciding the BP is 2^{O(|N| log |N|)} [17]. This bound comes from the following theorem:

Theorem 2.
[11,17] A Petri net N = ⟨P, T, F, M0⟩ is unbounded if and only if there exist two sequences σ1, σ2 ∈ T* such that M0 →σ1 M1 →σ2 M2 with M1 < M2. Moreover, the net is unbounded if and only if there exists such an execution of length less than a double exponential in the size of N.

If we talk about complexity, we need to determine what the size of a Petri net is. The representation we have chosen is slightly different from the one in [20] commonly used. Let V be the greatest integer found over the flow function and the initial marking. We propose to encode the flow function of a Petri net with two matrices of size (|P| × |T|) whose entries take Θ(log V) bits: one matrix for input arcs and another for output arcs. A Petri net is encoded with a sequence of bits giving the number of places, the number of transitions, the size of V, the flow function and finally the initial marking. The total size belongs to:

Θ(log |P| + log |T| + log log V + 2·|P|·|T|·log V + |P|·log V) = Θ(|P|·|T|·log V)
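As an illustration of the definitions above, the following sketch (our own, not code from the paper) stores a net exactly as described — two |P| × |T| matrices for input and output arcs, plus the initial marking — and implements the firing rule M'(p) = M(p) − F(p, t) + F(t, p):

```python
from typing import List

class PetriNet:
    def __init__(self, pre: List[List[int]], post: List[List[int]], m0: List[int]):
        self.pre = pre    # pre[p][t] = F(p, t): tokens consumed from place p by t
        self.post = post  # post[p][t] = F(t, p): tokens produced into place p by t
        self.m0 = m0      # initial marking M0

    def enabled(self, m: List[int], t: int) -> bool:
        # t is firable from m iff F(p, t) <= M(p) for every place p
        return all(self.pre[p][t] <= m[p] for p in range(len(m)))

    def fire(self, m: List[int], t: int) -> List[int]:
        # M'(p) = M(p) - F(p, t) + F(t, p)
        assert self.enabled(m, t)
        return [m[p] - self.pre[p][t] + self.post[p][t] for p in range(len(m))]

# Example: one transition moving a token from the first place to the second.
net = PetriNet(pre=[[1], [0]], post=[[0], [1]], m0=[1, 0])
print(net.fire(net.m0, 0))  # [0, 1]
```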
2.2 Known polynomial-time reductions for Petri nets

Reductions [9] are used to compare different problems for which, most of the time, no efficient algorithm is known. We manipulate decision problems, i.e. problems requiring Yes or No as output. We ask questions of the kind: "Does Petri net N possess the property P or not?". The net given in input is called the instance of problem P. Most of the time, instances of our problems are Petri nets, but it may happen that we need to specify a marking (as for the RP) or a transition (as for the t-LP). We denote by I_P the set of instances associated to problem P. We say that P is many-one polynomial-time reducible to Q, written P ≤_poly Q, if we can exhibit a polynomial-time computable function f such that I_P ∈ P ⟺ f(I_P) ∈ Q. We say many-one because the function f is not necessarily a bijection. Sometimes, we have to take the complement of usual problems: for instance, we talk about the reduction from reachability to not-liveness and not to liveness.
[Figure: diagram omitted. Boxes of equivalent problems: {Boundedness, Boundedness-norm}, {Reachability, Reachability-norm, Sub-RP, Zero-RP, SPZero-RP}, {t-Liveness, t-Liveness-norm}, {Deadlock}, {Liveness}, {Boundedness-PSMN}; arcs labeled "Not" denote reductions to complements.]
Fig. 1. Summary of known polynomial-time many-one reductions.
We give in the current section an overview of known many-one polynomial-time reductions, focusing on the BP, RP, DP, LP and t-LP and some of their variations. Fig. 1 summarizes the relations between the problems with a diagram. All problems put in a same box are equivalent. An arrow from one box to another indicates the existence of a reduction from the first class to the other. An arc labeled with "Not" refers to a reduction to the complement of a problem. The boundedness problem for post self-modifying nets is written Boundedness-PSMN (the definition of PSM-nets is recalled in section 6).
Normalization: The normalization proposed in [6] is performed in quadratic time and preserves boundedness, reachability and t-liveness. We add the suffix -norm to designate the classical problems over normalized, or ordinary, Petri nets. We have: BP equivalent to BP-norm, RP equivalent to RP-norm and t-LP equivalent to t-LP-norm.
Reachability: Many polynomial reductions concerning reachability properties were given by Hack [8], [16]. Hack pointed out three problems equivalent to the RP. The Submarking Reachability Problem (Sub-RP) over ⟨N, M'⟩, where M' is a marking over a subset P' ⊆ P, is to decide whether there exists a reachable marking M such that for all p ∈ P', M(p) = M'(p). The Zero-Reachability Problem (Zero-RP) over ⟨N⟩ is to decide whether there exists a reachable marking in which all the places are empty. The Single-Place Zero-Reachability Problem (SPZero-RP) over ⟨N, p⟩ is to decide whether there exists a reachable marking for which place p is empty. Cheng et al. [4] showed that reachability is polynomially reducible to deadlock.
Liveness: Reachability is polynomially reducible to not-liveness [16]. The converse reduction is known to be recursive, but no polynomial reduction is known. More recently, Cheng et al. [4] showed that the deadlock problem is polynomially reducible to not-liveness. But, as for the RP, the converse is not known. Liveness appears to be a very expressive property. Hack [8] mentions a reduction from t-LP to LP performed in almost linear time.
3 From unboundedness to reachability

Let us compare the current state of knowledge about boundedness and reachability. Firstly, about complexity, we know an upper space-bound for solving boundedness [17], but we still do not know whether reachability is primitive recursive or not. Moreover, this last question remains one of the hardest open questions in Petri net theory. Secondly, we know that if we increase the power of Petri nets a little, then reachability becomes undecidable while boundedness seems more resistant. An illustrative example is the class of post self-modifying nets, for which boundedness is decidable but not reachability (see section 6). Reachability seems to be a stronger property than boundedness because BP is in Expspace and RP is Expspace-hard; in the current section, we explicitly give the reduction from BP to RP. The converse direction, reachability to unboundedness, is probably false; otherwise we would obtain a surprising upper bound on the complexity of solving reachability.

[Figure: diagram omitted, showing the four levels of control, the two copies N1 and N2 of N, their summing-places p̄1 and p̄2, and the permission-places.]
Fig. 2. Reduction from boundedness to reachability.
Theorem 3. Unboundedness is polynomially reducible to reachability.

Proof: Let N = ⟨P, T, F, M0⟩ be a Petri net. Recall that N is unbounded if and only if there exists an execution of N, M0 →σ1 M' →σ2 M'', such that M' < M'' [11]. The difference Md = M'' − M' is a nonnegative vector with at least one strictly positive component. Our strategy is to look for such a marking Md. But we want to detect Md through reachability, by asking whether a specific marking is reachable, and this implies that we need to characterize Md in a standard way. Let us suppose that we add a summing-place that contains at any step the sum over all the places (a summing-place can easily be implemented in a Petri net by adding to each transition an arc labeled with the total effect of the transition). The marking Md is certainly greater than or equal to the marking with 0 in all the places except 1 in the summing-place. We use this characterization for the final question of the reduction. Let us explain our reduction with the help of Fig. 2. We build N̂ = ⟨P̂, T̂, F̂, M̂0⟩ as follows:
– Make two copies of N, N1 = ⟨P1, T1, F1, M01⟩ and N2 = ⟨P2, T2, F2, M02⟩, with M0 = M01 = M02;
– Add two summing-places. At first, p̄1 contains the sum over all the places of N1 and p̄2 contains the sum over all the places of N2;
– Each transition t ∈ T2 is duplicated, leading to a new transition t' in T2 (note that N2 is now no longer an exact copy of N);
– Merge N1 and N2 over the pairs (t1, t2) where t1 ∈ N1 and t2 ∈ N2 are copies of the same original transition in N;
– Add four levels of control, which are activated successively during an execution. Levels are managed with permission-places labeled explicitly in the picture. Control is first given at level 1 and moves as follows: level 1 → level 2 → level 3 → level 4. The dashed arcs link a permission-place to the set of transitions that it allows to be fired: Level 1 allows the two nets N1 and N2 to fire the original transitions together; Level 2 allows only N2 to continue to fire the original transitions while N1 and its summing-place are frozen; Level 3 allows the simultaneous emptying of two associated places (p1, p2), where p1 ∈ P1 ∪ {p̄1} and p2 ∈ P2 ∪ {p̄2} is its corresponding place; Level 4 allows emptying the places of N2 ∪ {p̄2} only.

Correctness: N is unbounded if and only if Mr = (0, 0, 0, 1, 0 ⋯ 0, 0, 0 ⋯ 0, 1) is reachable in N̂. The first four positions in Mr are related to the four levels. The last position in Mr is related to summing-place p̄2. The other positions, all equal to 0, are related to the remaining places of N1 and N2. Note that Mr is a marking at level 4 (Mr(4) = 1). By construction, in N̂, at any time M' in N1 and M'' in N2 are two markings appearing along an execution of N. The only way to empty correctly P1 and p̄1 and to keep at least one token in p̄2 is to have M' < M''; this happens if and only if N is unbounded. Finally, level 4 allows cleaning up the remaining places in order to reach exactly Mr when N is unbounded.
Complexity: The net N̂ contains O(|P|) places and O(|P| + |T|) transitions. The greatest value in N̂ is O(|P| · V), because of the summing-places (recall that V is the greatest value of N). The total size is thus O(|P| · (|P| + |T|) · log(|P| · V)) and the construction is linear in this size. We conclude that the time-complexity of the reduction is O(log |P| · |N|²), and this concludes the proof.
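The summing-place trick used in the proof can be made concrete: each transition t gets an arc to (or from) the new place, weighted with its total effect Σ_p (F(t, p) − F(p, t)), so that the new place always holds the token sum over the original places. A hypothetical sketch of ours, using the same matrix representation as before:

```python
def add_summing_place(pre, post, m0):
    """Extend a net with one extra place that always holds the sum of tokens
    over all original places (illustrative helper, not the paper's exact
    construction). Each transition t gets an arc to or from the new place,
    weighted with its total effect sum_p (F(t,p) - F(p,t))."""
    n_places = len(pre)
    n_trans = len(pre[0]) if pre else 0
    new_pre = [row[:] for row in pre] + [[0] * n_trans]
    new_post = [row[:] for row in post] + [[0] * n_trans]
    for t in range(n_trans):
        effect = sum(post[p][t] - pre[p][t] for p in range(n_places))
        if effect >= 0:
            new_post[n_places][t] = effect   # t adds tokens overall: output arc
        else:
            new_pre[n_places][t] = -effect   # t removes tokens overall: input arc
    return new_pre, new_post, m0 + [sum(m0)]  # summing place starts at sum(M0)

# A transition consuming 1 token and producing 2 has total effect +1.
pre2, post2, m2 = add_summing_place([[1], [0]], [[0], [2]], [1, 0])
print(m2)  # [1, 0, 1]
```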
4 Polynomial equivalence of reachability and deadlock

Reachability and deadlock are decidable and thus recursively equivalent [4]. In the current section, we prove that reachability, deadlock, reachability for normalized Petri nets and deadlock for normalized Petri nets are polynomially equivalent. Recall that a Petri net is normalized if the flow function returns an integer in {0, 1} and the initial marking belongs to {0, 1}^{|P|}. The reachability set, however, may be infinite, and thus normalized Petri nets should not be confused with 1-safe nets, for which any reachable marking contains only 0 or 1 as values. Normalization provides a simpler representation of Petri nets; in this sense, it is interesting to notice that the study of the RP or the DP may be restricted to this class modulo a polynomial transformation. Our proofs use some known results, but we explain in detail the main reduction "from deadlock to reachability".

Proposition 4. Reachability, deadlock, reachability for normalized PN and deadlock for normalized PN are polynomial-time equivalent.

Proof: We prove that RP ≤_poly RP-norm ≤_poly DP-norm ≤_poly DP ≤_poly RP.
→ The first reduction, from RP to RP-norm, holds by the normalization in [6], which is performed in quadratic time and preserves reachability. To make an efficient normalization, the main idea is to use the binary strings encoding the integers appearing in F and M0, instead of using their values.
→ The second reduction, from RP-norm to DP-norm, holds by the reduction of Cheng et al. [4]. The main idea of the reduction is the following: let the original net run with dummy self-loop transitions. At any time, the current marking can be tested. The expected marking (which is part of the input) is subtracted from the current marking. If the current marking was the expected one, the dummy transitions are not firable anymore and this leads to a deadlock. However, to preserve the normalization, we need to perform a pre-normalization over the expected marking.
→ The third reduction, from DP-norm to DP, is trivial.
→ We explain in detail the fourth reduction, from DP to RP. A natural Turing reduction would be to list all the partial dead markings and to ask, for each of them, whether it is reachable or not. However, there exists an exponential number of dead markings and this strategy is not polynomial.
Construction, from DP to RP. Let N = ⟨P, T, F, M0⟩ be a Petri net. A deadlock Md in N is a reachable marking allowing no transition to be fired. This means that for every transition t, there exists a place p such that Md(p) < F(p, t). It is not necessary to describe the marking Md over all the places; a subset of places is sufficient. The main idea is to guess a partial marking, to validate it as a good candidate for a deadlock, to let the original net run and finally to compare the guessed Md with the current marking M of the original net. For that, Md is subtracted from M (token by token). If the markings are the same, 0 is reachable in the places chosen for M and Md. Fig. 3 gives the general skeleton of the reduction. We construct a net N̂ with 4 levels of control. Each level controls a specific subnet, isolated in a box. However, two boxes may have common transitions, and this is illustrated with non-oriented dashed arcs. Control is given first at level 1 and moves as follows: level 1 → level 2 → level 3 → level 4. We explain in detail the four levels.

At level 1, a subset of places P' ⊆ P is chosen, and a marking Md is guessed over P'. In Fig. 3, the guessed marking appears in the central box. If the place p is chosen, then a place Yes_p is marked; otherwise, a place No_p is marked.
[Figure: diagram omitted, showing the four levels, the choose/guess and verify modules, the places Yes_p, No_p and p_sat, the guessed marking Md, and a copy of N.]
Fig. 3. Reduction from deadlock to reachability.

For each original place p ∈ P', the aimed Md(p) is stored into a place labeled p'. Fig. 4 gives the details of the implementation for place p. An Md(p) cannot be greater than V, where V is the greatest valuation of the original net. To guess Md(p), i.e. the content of p', we use a complementary place, labeled Cp'. Places of kind p' are initialized with 0, and complementary places with V. At any time, the sum over a place and its complementary place is the constant V.
[Figure: diagram omitted.]
Fig. 4. From deadlock to reachability: to choose p and to guess Md(p) into p'.
At level 2, the net verifies that Md is a good candidate: Md must underevaluate, for every transition, the number of tokens required by at least one input place. If the condition holds, then the place p_sat is marked. To confirm that Md is a good candidate, we verify the following boolean equation:

⋀_{t∈T} ⋁_{p∈•t} [(Md(p) < F(p, t)) ∧ (p ∈ P')]

The condition Md(p) < F(p, t) is easily implemented using the complementary places: in fact, if p', i.e. Md(p), contains fewer than F(p, t) tokens, then its complementary place contains at least V − F(p, t) + 1 tokens. The condition p ∈ P' is verified by using the Yes_p places. We illustrate the construction in Fig. 5, where we focus on transition t1, which has here as input places p1 and an arbitrary pi. If the guessed marking is dead for t1, then a place "Dead for t1" is marked. The same implementation is done for all the transitions. Note that we use reflexive arcs, because places of kind Yes_p or Cp' may be used for more than one original transition. When Md is recognized as dead for all transitions, then p_sat may be marked (once here, but this is only a choice of construction).
[Figure: diagram omitted.]
Fig. 5. From deadlock to reachability: to verify Md.
At level 3, the net emulates the behavior of N. A copy of N is included in the current construction, with a permission-token at level 3.
At level 4, the net stops the emulation and tests whether Md and the current marking M in the copy of N coincide. For that, the Yes_p places are used to debit the chosen places simultaneously in M and Md. The other, non-chosen places of M are emptied using the No_p places. The remaining, non-relevant places of the construction are emptied without condition.
Correctness: N reaches a dead marking if and only if Mr = (1, 0, 0, 0, 1, 0 ⋯ 0) is reachable in N̂, where the first position of Mr refers to p_sat and the fifth one refers to level 4. It is evident that if a dead marking is reachable in N, then it is possible to choose it as a good candidate and to finally reach Mr. In the other direction, if no dead marking is reachable in N, then there are two cases: either p_sat is not marked; or p_sat is marked, but this means that the guessed marking is not reachable, and the current marking in the copy of N and Md will never coincide.
Complexity: The net N̂ finally contains O(|P| + |T|) places and O(|P| · |T|) transitions (because of the module which verifies Md). The greatest value in N̂ is V. The total size is thus in O(|N|²), and the construction is linear in this size, hence quadratic overall. This concludes the proof.
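The boolean condition verified at level 2 — for every transition, at least one chosen input place is underevaluated by Md — can be stated directly in code. This is an illustrative helper of ours (argument names are not the paper's):

```python
from typing import Dict, List, Set

def is_dead_candidate(md: Dict[int, int], chosen: Set[int],
                      pre: List[List[int]]) -> bool:
    """Check AND_{t in T} OR_{p in pre-set of t} [Md(p) < F(p,t) and p in P'].
    md maps chosen place indices to their guessed values; pre[p][t] = F(p, t).
    A transition with no input places is always firable, so the any() over an
    empty pre-set correctly yields False."""
    n_places = len(pre)
    n_trans = len(pre[0]) if pre else 0
    return all(
        any(p in chosen and md[p] < pre[p][t]
            for p in range(n_places) if pre[p][t] > 0)
        for t in range(n_trans)
    )

# Two transitions: t0 needs 1 token in p0, t1 needs 2 tokens in p1.
print(is_dead_candidate({0: 0, 1: 1}, {0, 1}, [[1, 0], [0, 2]]))  # True
```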
5 Polynomial equivalence of liveness and t-liveness

There exists a polynomial reduction from reachability to not-liveness [16], using the variant of the RP which asks whether a place p may be emptied. A similar reduction exists from deadlock to not-liveness [4]. The converse reductions, from not-LP to RP and from not-LP to DP, are not known. Hack [8] gave a reduction from t-liveness to liveness. In the current section we show the other direction of the reduction, from liveness to t-liveness, making the two problems many-one polynomially equivalent. Note that we do not have this equivalence for the subclass of bounded free-choice nets, where t-liveness is NP-complete while liveness is polynomial [5].
Theorem 5. Liveness is polynomially reducible to t-liveness.

Proof: Let N = ⟨P, T, F, M0⟩ be a Petri net. The construction of N̂ is as follows: (1) Add a place p_t in output of every transition t ∈ T; (2) Add a transition t_test having as input places the set of places {p_t | t ∈ T}. All the original transitions are live if and only if t_test is quasi-live from any reachable marking in N̂. In N̂ we add |T| places and O(|T|) transitions. The total size of the net is O((|T| + |P|) · |T|), which is quadratic in |N|. The total time is linear in this size and thus polynomial.
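The construction of Theorem 5 is simple enough to sketch on the matrix representation used earlier (our own illustrative code; by convention here, t_test occupies the last column):

```python
def liveness_to_t_liveness(pre, post, m0):
    """Sketch of the Theorem 5 construction: add one place p_t fed by each
    original transition t, and one transition t_test consuming one token
    from every p_t. All original transitions are live iff t_test is
    quasi-live from every reachable marking of the new net."""
    n_places = len(pre)
    n_trans = len(pre[0]) if pre else 0
    # Existing places get a fresh zero column for t_test.
    new_pre = [row + [0] for row in pre]
    new_post = [row + [0] for row in post]
    for t in range(n_trans):
        pre_row = [0] * (n_trans + 1)
        post_row = [0] * (n_trans + 1)
        post_row[t] = 1        # t puts a token into its place p_t
        pre_row[n_trans] = 1   # t_test consumes one token from p_t
        new_pre.append(pre_row)
        new_post.append(post_row)
    return new_pre, new_post, m0 + [0] * n_trans  # p_t places start empty
```

On a one-place self-loop net (pre = [[1]], post = [[1]], m0 = [1]) this yields one extra place and the extra t_test column, as the assertions below check.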
6 An upper bound on solving boundedness for Post Self-Modifying nets

Post Self-Modifying nets (PSM-nets), defined by Valk [18,19], are more powerful than Petri nets. In this model, transitions have extended arcs and/or classical arcs. Extended arcs occur only in output of transitions. Let us suppose that there exists an extended arc from t to place p2 labeled with 21·p1 + 4·p3. Firing t from M leads to a new marking M' such that M'(p2) = M(p2) + 21·M(p1) + 4·M(p3). Thus, the next marking depends closely on the current one, and this is why one uses the qualifier "self-modifying".
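The extended firing rule can be sketched as follows (our own illustration, not the paper's code; E[t][p][q] stands for the multiplicative coefficient E(t, p, q) by which the current content of place q contributes to output place p):

```python
def psm_fire(m, t, pre, post, E):
    """Firing rule of a post self-modifying net: besides the ordinary effect
    M(p) - F(p,t) + F(t,p), place p additionally receives
    sum_q E[t][p][q] * M(q) tokens, computed on the marking before firing."""
    n_places = len(m)
    assert all(pre[p][t] <= m[p] for p in range(n_places)), "t not firable"
    return [m[p] - pre[p][t] + post[p][t]
            + sum(E[t][p][q] * m[q] for q in range(n_places))
            for p in range(n_places)]

# The example from the text: places p1..p6 (indices 0..5); t consumes from p4,
# produces into p5, and has extended arcs 21*p1 + 4*p3 into p2 and 7*p1 into p6.
pre = [[0], [0], [0], [1], [0], [0]]
post = [[0], [0], [0], [0], [1], [0]]
E = [[[0] * 6 for _ in range(6)]]
E[0][1][0] = 21  # p2 receives 21 * M(p1)
E[0][1][2] = 4   # p2 receives  4 * M(p3)
E[0][5][0] = 7   # p6 receives  7 * M(p1)
print(psm_fire([2, 0, 1, 1, 0, 0], 0, pre, post, E))  # [2, 46, 1, 0, 1, 14]
```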
A PSM-net is a 5-tuple ⟨P, T, F, M0, E⟩. The first four components are the same as in Petri nets, and the fifth one, the component E, is a function (T × P × P) → ℕ which returns a multiplicative coefficient, given a transition, an output place and a place to be consulted. In our example, we have E(t, p2, p1) = 21. Although PSM-nets are more expressive, boundedness is still decidable, and this is what makes this model attractive. The proof [18] is similar to the original one for Petri nets. However, reachability is undecidable. Let us define a lower bound on the size of a PSM-net. Let V be the greatest integer found over F, M0 and E. We encode the flow functions with matrices, as for Petri nets. The size of a PSM-net belongs to Ω(|P| · |T| · log V). In the current section, we give an upper bound on solving boundedness for PSM-nets. We prove that we have a polynomial-time equivalence between boundedness for Petri nets and boundedness for post self-modifying nets. The non-trivial direction of the reduction, from BP-PSMN to BP, requires quadratic time. As boundedness for Petri nets is decidable in space 2^{O(|N| log |N|)}, we obtain 2^{O(|N| log |N|)} as an upper space-bound for BP-PSMN. The main idea is to build a net N̂ that emulates the behavior of N but computes the number of tokens output by extended arcs in a weak sense. This means that, in the best case, the computation will be the right one but, in any other case, the computation will under-evaluate the number of tokens to be produced. Any marking reachable in N̂ is, in some sense, covered by a marking reachable in N, and this implies that N is unbounded if and only if N̂ is unbounded.

Theorem 6. Boundedness for PSM-nets is decidable in space 2^{O(|N| log |N|)}.

Proof: Let N = ⟨P, T, F, E, M0⟩ be a post self-modifying net. We reduce BP-PSMN to BP; the time complexity is O(|N|²), leading to the theorem above.
To construct N̂ = ⟨P̂, T̂, F̂, M̂0⟩, we decompose the effect of any original transition for the weak computation of the tokens to be produced in output. Every transition is replaced by a subnet, as illustrated on an example in Fig. 6. For that, we need to associate to every original place p a place reservoir-p, initialized with 0. We ensure mutual exclusion between the |T| subnets, so that as long as a current decomposition is not over, it is impossible to emulate another original transition. In Fig. 6, transition t has p4 as input place, p5 as classical output place and p2, p6 as "extended" output places. The arc to p2 is labeled with 21·p1 + 4·p3 and the arc to p6 with 7·p1. This implies that firing t from M has as consequence the addition of 21·M(p1) + 4·M(p3) tokens in p2 and 7·M(p1) tokens in p6. The emulation of t is performed in four steps:

Start t: the decomposition begins with the update of the input places (here p4) and the classical output places (here p5). Control is given to the next step.

Update by p1: the weak computations of 21·M(p1) and 7·M(p1) take place here. As long as desired, t_{p1} debits p1 of 1 token, crediting at the same time its reservoir place of 1 token, p2 of 21 tokens and p6 of 7 tokens. If the process ends when p1 is empty, then p2 and p6 have received the exact number of tokens; otherwise they have received fewer tokens than aimed.
[Figure: diagram omitted, showing the subnet replacing t, with the reservoir places for p1 and p3 and the successive steps: start t, update by p1, update by p3, restore altered places, next transition.]
Fig. 6. Reduction from boundedness-PSMN to boundedness: weak firing of t.

Update by p3: the weak computation of the multiplicative coefficient for p3 takes place here. The value 4·M(p3) is computed in a weak sense, debiting p3 while keeping a trace in reservoir-p3 at the same time.
Restore altered places: we now have to restore the original contents of places p1 and p3. As long as desired, the contents of the reservoirs are put back into the original places. If the process continues until the reservoirs are empty, then p1 and p3 are restored; otherwise, they receive fewer tokens than aimed. Note that in this last case we have nevertheless not lost any tokens, because the remaining ones are in the reservoir places. Control is given to the next transition to be emulated.
When all the steps are fully processed, we find in places p6 and p2 the right number of tokens, and we leave p1 and p3 unaltered. At any time, and this is the interesting point, if we "merge" any pair of a place p and its reservoir by taking their sum, we find a marking which is covered by a marking reachable in the original PSM-net. Moreover, when the decompositions are performed completely, we find a marking reachable in the original PSM-net. These two facts are sufficient to make the reduction correct. Note that the construction needs to be adapted a bit for other cases, such as reflexive extended arcs.
Correctness: The original net N is unbounded if and only if the built net N̂ is unbounded. If N is unbounded, then N̂ is unbounded because there is always a way to emulate the original net correctly. If N is bounded then either N̂ fully performs the decomposition steps and produces as many tokens as N produces at any step, or it produces fewer tokens.
Complexity: The original places, the reservoirs and the mechanism which restores the places are common to all the decompositions of original transitions. Each decomposition of an original transition requires O(|P|) places and transitions in the worst case. The whole net N̂ contains O(|P| + (|T| · |P|)) places and O(|T| + (|T| · |P|)) transitions. The greatest value in N̂ is V. The total size is thus O((|T| · |P|) · log V) and the construction is linear in this size, thus O(|N|²), and this concludes the proof.
7 Conclusion
[Figure: diagram omitted. Boxes of equivalent problems: {Boundedness, Boundedness-norm, Boundedness-PSMN}, {Reachability, Reachability-norm, Sub-RP, Zero-RP, SPZero-RP, Deadlock, Deadlock-norm}, {t-Liveness, t-Liveness-norm, Liveness}; arcs labeled "Not" denote reductions to complements.]
Fig. 7. Summary of polynomial-time many-one reductions.

In this paper we were interested in ordering the Petri net problems boundedness, reachability, deadlock, liveness and t-liveness through their complexity. Fig. 7 summarizes the contribution of our work. The main results are the following. We give an illustration of the expressive power of reachability by reducing to it the not-boundedness and the deadlock problems. Reachability is a very vulnerable property in terms of decidability and often becomes undecidable as soon as the power of Petri nets is increased; an example of an extended model for which the RP is undecidable is the class of Petri nets allowing Reset arcs [1] (a Reset arc clears a place as a consequence of a firing). We put in the same class the reachability and the deadlock problems; these problems were known to be recursively equivalent, and thus our comparison is more precise. We give 2^{O(|N| log |N|)} as an upper bound on the space-complexity of boundedness for post self-modifying nets, and this bound is not far from the one for Petri nets, even though PSM-nets are strictly more powerful than Petri nets.
Acknowledgments. Thanks to the anonymous referees for their useful remarks.
References

1. T. Araki and T. Kasami. Some decision problems related to the reachability problem for Petri nets. TCS, 3(1):85-104, 1977.
2. Z. Bouziane. Algorithmes primitifs récursifs et problèmes Expspace-complets pour les réseaux de Petri cycliques. PhD thesis, LSV, École Normale Supérieure de Cachan, France, November 1996.
3. E. Cardoza, R. Lipton, and A. Meyer. Exponential space complete problems for Petri nets and commutative semigroups. In Proc. of the 8th Annual ACM Symposium on Theory of Computing, pages 50-54, May 1976.
4. A. Cheng, J. Esparza, and J. Palsberg. Complexity results for 1-safe nets. TCS, 147:117-136, 1995.
5. J. Desel and J. Esparza. Free Choice Petri Nets. Cambridge University Press, 1995.
6. C. Dufourd and A. Finkel. A polynomial λ-bisimilar normalization for Petri nets. Technical report, LIFAC, ENS de Cachan, July 1996. Presented at AFL'96, Salgótarján, Hungary, 1996.
7. J. Esparza and M. Nielsen. Decidability issues on Petri nets - a survey. Bulletin of the EATCS, 52:254-262, 1994.
8. M. Hack. Decidability questions for Petri Nets. PhD thesis, M.I.T., 1976.
9. J.E. Hopcroft and J.D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 1979.
10. M. Jantzen. Complexity of Place/Transition nets. In Petri Nets: Central Models and Their Properties, volume 254 of LNCS, pages 413-434. Springer-Verlag, 1986.
11. R.M. Karp and R.E. Miller. Parallel program schemata. Journal of Computer and System Sciences, 3:146-195, 1969.
12. R. Kosaraju. Decidability of reachability in vector addition systems. In Proc. of the 14th Annual ACM Symposium on Theory of Computing, San Francisco, pages 267-281, May 1982.
13. R.J. Lipton. The reachability problem requires exponential space. Technical Report 62, Yale University, Department of Computer Science, January 1976.
14. E.W. Mayr. An algorithm for the general Petri net reachability problem. SIAM Journal on Computing, 13(3):441-460, 1984.
15. E.W. Mayr and R. Meyer. The complexity of the word problem for commutative semigroups and polynomial ideals. Advances in Mathematics, 46:305-329, 1982.
16. J.L. Peterson. Petri Net Theory and the Modeling of Systems. Prentice Hall, 1981.
17. C. Rackoff. The covering and boundedness problems for vector addition systems. TCS, 6(2), 1978.
18. R. Valk. Self-modifying nets, a natural extension of Petri nets. In Proc. of ICALP'78, volume 62 of LNCS, pages 464-476. Springer-Verlag, September 1978.
19. R. Valk. Generalizations of Petri nets. In Proc. of the 10th Symposium on Mathematical Foundations of Computer Science, volume 118 of LNCS, pages 140-155. Springer-Verlag, 1981.
20. R. Valk and G. Vidal-Naquet. Petri nets and regular languages. Journal of Computer and System Sciences, 23(3):299-325, 1981.
Computing Reachability Properties Hidden in Finite Net Unfoldings

Burkhard Graves
Universität Hildesheim, Institut für Informatik, Marienburger Platz 22, D-31141 Hildesheim, Germany
Fax: +49 5121 860475, email: [email protected]
(August 1997)
Abstract. It is commonly known that every reachable marking of a finite-state Petri net system is represented in its finite unfolding according to McMillan. The reachability of markings from each other is also represented in the finite unfolding, but it is almost unknown that this information can be hidden very deeply. This paper presents an efficient method for gaining this information, which is of course of great importance for potential modelcheckers working on finite unfoldings. All results presented in this paper also hold for a recently proposed optimized unfolding method.
1 Introduction and Motivation

A major drawback of interleaving semantics, and of modelcheckers based upon them, is the so-called state explosion problem. One approach among others to cope with this problem is the use of partial order semantics instead of interleaving semantics [8]. Like many other works done in this area, this paper uses finite 1-safe Petri net systems to represent finite-state concurrent systems. Partial order semantics describes the behaviour of a net system by the set of its processes [3] or by its maximal branching process [5], also called the maximal unfolding of the system, which can be seen as the union of all processes of the given system. However, if a system can exhibit an infinite behaviour, then at least one of its processes, and consequently its maximal unfolding, is infinite and therefore unsuitable for verification purposes. McMillan proposed in [8] an elegant algorithm for the computation of a finite initial part of the maximal branching process, called the finite unfolding, in which every reachable marking of the system is represented. This work was refined and optimized by Esparza, Römer and Vogler in [7]; the finite unfolding calculated by their method is never bigger and often much smaller (in terms of orders of magnitude) than McMillan's finite unfolding, while still representing every reachable marking. However, in this paper we neglect the difference between these two unfolding methods. All results hold for both unfolding methods, and the systems serving as examples have been chosen in such a way that both unfolding methods yield the same finite unfolding (up to isomorphism). As already mentioned above, every reachable marking of a given system is represented in its finite unfolding. However,
[Fig. 1. A finite 1-safe net system (places p1–p5, transitions t1–t4); diagram not reproduced.]

[Fig. 2. Its finite unfolding β_f (conditions b1–b9, events e1–e6; the corresponding event of the cut-off event e4 is e4' = ⊥); diagram not reproduced.]
the reachability of markings from each other is deeply embedded in the finite unfolding. Consider for example the system and its finite unfolding displayed in Fig. 1 and Fig. 2. The reachable marking {p2, p3} is represented by a process contained in the finite unfolding corresponding to the configuration {e1, e3, e5} (describing the occurrence sequence t3 t2 t3). Obviously, the deadlock marking {p1, p5} is reachable from {p2, p3} (by the occurrence sequence t1 t2 t1 t4, for example), but how can this information be gained from the finite unfolding? Now, imagine all processes describing a run of the system to a given marking. In general, only a few of these processes are totally contained in the finite unfolding, but some initial part of every such process always is. In our example, there are infinitely many processes describing a run of the system into the deadlock marking {p1, p5}, but only two of them are totally contained in the finite unfolding. One corresponds to the configuration {e2}, the other one corresponds to the configuration {e1, e3, e4, e6}. Each of the remaining processes is prefixed by a process corresponding to the configuration {e1, e3, e4, e5}. Since {e1, e3, e5} is a subset of {e1, e3, e4, e5}, it can be concluded that {p1, p5} is reachable from {p2, p3}. This example demonstrates that a specific classification of the configurations contained in the finite unfolding is of great importance for potential model checkers. Three types of configurations can be distinguished w.r.t. a given marking: configurations of the first type correspond to processes describing runs to the given marking, configurations of the second type correspond to processes which cannot be extended to processes describing runs to the given marking, and configurations of the third type are those which are of neither type.
Since this classification is a disjoint partitioning of all configurations, knowledge of two classes yields the third one. The configurations of the first type can be calculated easily; two different algorithms for this task can be found in [6]. However, the classification of the remaining configurations is a problem. Esparza tries to solve this problem in [6] by introducing a `shift' operator working on configurations. Configurations of the third type should be calculated by repeated applications of this operator to configurations of the first type. Unfortunately, this does not work in some cases, as we will see in Sect. 4. One could say that the finite unfolding as defined by McMillan or by Esparza/Römer/Vogler is `too small', because the problem can be fixed by creating a sufficiently large finite unfolding which does not contain the special cases mentioned above. But this `brute force' method would significantly slow down potential model checking algorithms, e.g. the one proposed in [6]. This paper presents another solution, namely a modification of the `shift' operator such that it works, on the McMillan unfolding as well as on the Esparza/Römer/Vogler unfolding, as the old operator was supposed to.
2 Basic Definitions

The following is a series of definitions, notions and theorems (without proofs) in very brief form. More details can be found in the referenced literature.

Petri Nets. A triple N = (S, T, F) is a net if S ∩ T = ∅ and F ⊆ (S × T) ∪ (T × S). The elements of S are called places, the elements of T transitions. Places and transitions are generically called nodes. N is finite if |S ∪ T| ∈ ℕ. We identify the flow relation F with its characteristic function on the set (S × T) ∪ (T × S). The preset of a node x, denoted by •x, is the set {y ∈ S ∪ T | F(y, x) = 1}. The postset of x, denoted by x•, is the set {y ∈ S ∪ T | F(x, y) = 1}. Presets and postsets are generalized to sets of nodes X ⊆ S ∪ T in the following way: •X = ∪_{x∈X} •x, X• = ∪_{x∈X} x• (notice •∅ = ∅• = ∅). A marking M of a net (S, T, F) is a mapping S → ℕ. A 4-tuple Σ = (S, T, F, M0) is a net system if (S, T, F) is a net and M0 is a marking of (S, T, F); M0 is called the initial marking of Σ. Σ is finite if the underlying net is finite. A marking M enables a transition t if ∀s ∈ S : M(s) ≥ F(s, t). A marking enabling no transition is a deadlock marking. If a transition t is enabled at M, then it can occur, and its occurrence leads to a new marking M', denoted by M [t⟩ M', such that ∀s ∈ S : M'(s) = M(s) − F(s, t) + F(t, s). A sequence of transitions σ = t1 … tn (n ∈ ℕ) is an occurrence sequence if there exist markings M0, …, Mn such that M0 [t1⟩ M1 [t2⟩ … [tn⟩ Mn. Mn is the marking reached from M0 by the occurrence of σ, denoted by M0 [σ⟩ Mn. M' is reachable from M if there exists an occurrence sequence σ such that M [σ⟩ M'. The set of all markings which can be reached from M is denoted by [M⟩. A marking M of a net (S, T, F) is 1-safe if ∀s ∈ S : M(s) ≤ 1. We identify 1-safe markings with the set of places s such that M(s) = 1. A system is 1-safe if all its reachable markings are 1-safe. Figure 1 shows a finite 1-safe system; its initial marking is {p1, p4}.

Branching Processes.
A branching process of a system is a special kind of net, called an occurrence net, together with a certain homomorphism showing that this net can be interpreted as an unfolding of the system containing information about both concurrency and conflicts. In order to avoid confusion arising from the fact that the semantics of a (marked) net is again a (labelled) net, different names are used for the nodes of the net system and for those of the occurrence net which describes the system's semantics: the places of occurrence nets are called conditions, and their transitions are called events. We quickly review the main definitions and results of [5], where the notion `branching process' was first introduced: Let (S, T, F) be a net. The transitive closure of F, denoted by <, is called the causal relation. The symbol ≤ denotes the reflexive and transitive closure of F. Min(N) equals {x ∈ S ∪ T | ¬∃y ∈ S ∪ T : y < x}. For x ∈ S ∪ T and X ⊆ S ∪ T, we say x < X if ∃y ∈ X : x < y (analogously for ≤, > and ≥). Two nodes x1, x2 ∈ S ∪ T are in conflict, denoted by x1 # x2, if ∃t1, t2 ∈ T, t1 ≠ t2, •t1 ∩ •t2 ≠ ∅ : t1 ≤ x1 ∧ t2 ≤ x2. A node x ∈ S ∪ T is in self-conflict if x # x. We say x1 co x2 if neither x1 ≤ x2 nor x2 ≤ x1 nor x1 # x2 holds. An occurrence net is a net N = (B, E, F) such that
(i) ∀b ∈ B : |•b| ≤ 1,
(ii) ¬∃x ∈ B ∪ E : x < x,
(iii) ¬∃e ∈ E : e # e,
(iv) ∀x ∈ B ∪ E : |{y ∈ B ∪ E | y < x}| ∈ ℕ.
If moreover |b•| ≤ 1 holds for every b ∈ B, then N is called a causal net. Let N1 = (S1, T1, F1) and N2 = (S2, T2, F2) be two nets. A homomorphism from N1 to N2 is a mapping h : S1 ∪ T1 → S2 ∪ T2 with h(S1) ⊆ S2 and h(T1) ⊆ T2 such that for every t ∈ T1 the restriction of h to •t is a bijection between •t and •h(t), and analogously for t• and h(t)•. A branching process of a net system Σ = (N, M0) is a pair β = (N', p) where N' = (B, E, F) is an occurrence net and p is a homomorphism from N' to N such that the restriction of p to Min(N') is a bijection between Min(N') and M0 and ∀e1, e2 ∈ E : (•e1 = •e2 ∧ p(e1) = p(e2)) ⇒ e1 = e2. If N' is a causal net, then β is a process of Σ. Let β1 = (N1, p1) and β2 = (N2, p2) be two branching processes of a net system. A homomorphism from β1 to β2 is a homomorphism h from N1 to N2 such that p2 ∘ h = p1 and the restriction of h to Min(N1) is a bijection between Min(N1) and Min(N2). β1 and β2 are isomorphic if there is a bijective homomorphism from β1 to β2. Intuitively, two isomorphic branching processes differ only in the names of their conditions and events. It is shown in [5] that a net system has a unique maximal branching process up to isomorphism.
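To make the token game of this section concrete, here is a small Python sketch of the firing-rule definitions (enabling, occurrence, reachable markings), restricted to the 1-safe case where markings are identified with sets of places. The two-transition net used as data is hypothetical (it is not the system of Fig. 1), and all names (`PRE`, `POST`, `enabled`, `fire`, `reachable`) are ad hoc illustrations, not part of the paper.

```python
# Sketch of the 1-safe token game. Markings are sets of places, which is
# exactly the identification of 1-safe markings made above.

PRE  = {"t1": {"p1"}, "t2": {"p2"}}   # preset  •t of each transition
POST = {"t1": {"p2"}, "t2": {"p1"}}   # postset t•  of each transition

def enabled(marking, t):
    # M enables t iff every place of •t carries a token.
    return PRE[t] <= marking

def fire(marking, t):
    # M [t> M' with M' = (M \ •t) ∪ t•  (1-safe case).
    assert enabled(marking, t)
    return (marking - PRE[t]) | POST[t]

def reachable(m0):
    # [M0>: all markings reachable from M0, by exhaustive search.
    seen, todo = {frozenset(m0)}, [frozenset(m0)]
    while todo:
        m = todo.pop()
        for t in PRE:
            if enabled(m, t):
                m2 = frozenset(fire(m, t))
                if m2 not in seen:
                    seen.add(m2)
                    todo.append(m2)
    return seen
```

For the hypothetical two-place cycle above, [{"p1"}⟩ consists of the two markings {p1} and {p2}, and neither is a deadlock marking.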
We call it the maximal unfolding of the system and denote it by β_m = (B_m, E_m, F_m, p_m). β1 is a prefix of β2 if N1 is a subnet of N2 and, moreover, there exists an injective homomorphism from β1 to β2. Figure 2 shows a prefix of the maximal unfolding β_m of the finite 1-safe system displayed in Fig. 1. It should be clear that a repeated continuation with four events and five conditions labelled and `arranged' like e3, …, e6 and b5, …, b9, respectively, yields the maximal unfolding¹.

Configurations, Cuts and more. A configuration of an occurrence net N = (B, E, F) is a causally closed, conflict-free set of events C ⊆ E, which means ∀e, e' ∈ E : (e ≤ e' ∧ e' ∈ C) ⇒ e ∈ C and ∀e, e' ∈ C : ¬(e # e'). Given e ∈ E, the set [e] = {e' ∈ E | e' ≤ e} is a configuration, called the local configuration of e. A set of conditions B' ⊆ B is a co-set if its elements are pairwise in the co relation. A co-set maximal w.r.t. set inclusion is a cut. A marking M of a

¹ For example: p(e7) = t2, •e7 = {b7, b9}, e7• = {b10, b11}, p(b10) = p3, p(b11) = p4, etc.
system Σ is represented in a branching process β = (N, p) of Σ if β contains a cut c such that, for each place s of Σ, c contains exactly M(s) conditions b with p(b) = s. Every marking represented in a branching process is reachable, and every reachable marking is represented in the maximal unfolding of the net system. Finite configurations and cuts are tightly related: Let C be a finite configuration of a branching process β = (N, p). Then Cut(C) = (Min(N) ∪ C•) \ •C is a cut representing the marking Mark(C) = p(Cut(C)). Two configurations C1 and C2 of a branching process correspond to each other if Mark(C1) = Mark(C2). A pair (C1, C2) of corresponding configurations is called a cc-pair. Let β = (B, E, F, p) be a branching process of a net system Σ = (N, M0) and let c be a cut of β. The set {x ∈ B ∪ E | x ≥ c ∧ ∀y ∈ c : ¬(x # y)} is denoted by ↑c. Identifying F and p with their restrictions to ↑c, ⇑c = (B ∩ ↑c, E ∩ ↑c, F, p) is a branching process of (N, p(c)); moreover, if β = β_m then ⇑c is the maximal branching process of (N, p(c)). It follows that ⇑Cut(C1) and ⇑Cut(C2) are isomorphic, provided (C1, C2) is a cc-pair; in this case we denote the (unique) isomorphism from ⇑Cut(C1) to ⇑Cut(C2) by I_(C1,C2).

McMillan's Finite Unfolding. Here we only present McMillan's unfolding method. The refined method of Esparza, Römer and Vogler is more complicated; interested readers are referred to [7]. As already mentioned, the differences between these two unfolding methods are not relevant for this paper. Let β = (B, E, F, p) be a branching process of a net system Σ. We say that β is complete if every reachable marking of Σ is represented in β and, moreover, β contains an event labelled by t if a transition t can occur in Σ. The maximal unfolding of a net system is always complete. Since a finite 1-safe net system has only finitely many reachable markings, its maximal unfolding contains at least one complete finite prefix.
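The relation Cut(C) = (Min(N) ∪ C•) \ •C together with Mark(C) = p(Cut(C)) can be sketched directly in Python. The tiny occurrence net below is hypothetical (it is not the unfolding of Fig. 2), and the dictionary-based encoding of presets, postsets and the labelling p is an assumption made for illustration only.

```python
# Sketch of Cut(C) and Mark(C) for a finite configuration C of an
# occurrence net, following the definitions above.

E_PRE  = {"e1": {"b1"}, "e2": {"b2"}}   # •e: preset conditions of each event
E_POST = {"e1": {"b3"}, "e2": {"b4"}}   # e•: postset conditions of each event
MIN    = {"b1", "b2"}                   # Min(N)
LABEL  = {"b1": "p1", "b2": "p4", "b3": "p2", "b4": "p5"}  # p on conditions

def cut(config):
    # Cut(C) = (Min(N) ∪ C•) \ •C
    post = set().union(*(E_POST[e] for e in config))
    pre = set().union(*(E_PRE[e] for e in config))
    return (MIN | post) - pre

def mark(config):
    # Mark(C) = p(Cut(C)), a 1-safe marking, i.e. a set of places
    return {LABEL[b] for b in cut(config)}
```

In particular, cut(∅) = Min(N), so Mark(∅) is the initial marking, as required of a branching process.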
An event e ∈ E is a cut-off event if there exists a local configuration [e'] such that Mark([e']) = Mark([e]) and |[e']| < |[e]|. To achieve that e is a cut-off event if Mark([e]) = M0, a new `virtual' event ⊥ is introduced and [⊥] is defined as the empty configuration: Mark([⊥]) = M0 = Mark([e]) and |[⊥]| = 0 < |[e]|. Given a cut-off event e, there may exist several e' such that Mark([e']) = Mark([e]) and |[e']| < |[e]|. We assume in the sequel that for each cut-off event e one of these e' is fixed, call it the corresponding event of e, and denote it by e'. Moreover, we assume without loss of generality that e' is not a cut-off event. Let E_f ⊆ E_m be defined by: e ∈ E_f iff no event e' < e is a cut-off event. The (unique) prefix of β_m having E_f as set of events is called McMillan's finite unfolding and denoted by β_f = (B_f, E_f, F_f, p_f). In [6], β_f is proved to be always complete. O denotes the set of cut-off events of β_f. ℱ_f denotes the set of all configurations of β_f. D denotes the set of all configurations of β_f maximal w.r.t. set inclusion. The set of all configurations contained in the maximal unfolding β_m is denoted by ℱ_m. Figure 2 shows the finite unfolding β_f of the finite 1-safe system displayed in Fig. 1. e4 is the only cut-off event, e4' = ⊥ is its corresponding event. Note that indeed Mark([e4]) = Mark([⊥]) = {p1, p4}. D contains three maximal configurations: D1 = {e2}, D2 = {e1, e3, e4, e6} and D3 = {e1, e3, e4, e5}.
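The cut-off criterion just defined admits a short sketch: given the local configurations of a prefix and a function computing Mark([e]), events are examined in order of increasing |[e]| and compared against previously seen markings, with the virtual event ⊥ seeding the table. This is an illustrative reading of the criterion, not the unfolding algorithm of [8]; the event and marking names in the comments are hypothetical, and all identifiers are ad hoc.

```python
# Sketch of McMillan's cut-off criterion: e is a cut-off event if some [e']
# (possibly [⊥], the empty configuration) has the same marking and a
# strictly smaller size.

def cutoffs(local_configs, mark, m0):
    # local_configs: event -> its local configuration [e] (a set of events)
    # mark:          configuration -> Mark(configuration)
    # m0:            the initial marking, i.e. Mark([⊥])
    seen = {frozenset(m0): 0}          # Mark([⊥]) = M0 with |[⊥]| = 0
    result = {}
    # examine events in order of increasing |[e]|
    for e, C in sorted(local_configs.items(), key=lambda ec: len(ec[1])):
        m = frozenset(mark(C))
        if m in seen and seen[m] < len(C):
            result[e] = True           # e is a cut-off event
        else:
            seen[m] = min(seen.get(m, len(C)), len(C))
            result[e] = False
    return result
```

On a toy prefix where some [e] reproduces the initial marking, that event is reported as a cut-off via the virtual event ⊥, mirroring the treatment of e4 in Fig. 2.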
3 Mutual Reachability of Markings

Remark. To simplify subsequent definitions, we assume a finite 1-safe net system Σ = (S, T, F, M0) together with its (possibly infinite) maximal unfolding β_m = (B_m, E_m, F_m, p_m) and its finite unfolding β_f = (B_f, E_f, F_f, p_f) to be given throughout the rest of this paper. The preset and the postset of a (set of) node(s) and the operators ↑ and ⇑ always refer to the maximal unfolding β_m.
It is beyond the scope of this paper to explain the ideas of [6], where a model checking algorithm is introduced which should be able to check formulas of a simple branching time logic. However, the following is similar to what can be found in [6], but since we are only interested in recognizing the mutual reachability of markings, we are able to simplify some definitions.

Definition 1. Let C be a configuration and let 𝒞, 𝒞1 and 𝒞2 be sets of configurations. We say C ⊑ 𝒞 iff ∃C' ∈ 𝒞 : C ⊆ C', and 𝒞1 ⊑ 𝒞2 iff ∀C ∈ 𝒞1 : C ⊑ 𝒞2. The restriction onto the finite unfolding β_f is denoted by ∇C = C ∩ E_f and ∇𝒞 = {∇C | C ∈ 𝒞}. The set of the maximal elements contained in 𝒞 w.r.t. set inclusion is denoted by max(𝒞) = {C ∈ 𝒞 | ¬∃C' ∈ 𝒞 : C ⊂ C'}. Notice that max(𝒞) may equal the empty set if |𝒞| ∉ ℕ.

The following lemma is needed for a proof later on; observe that it does not hold without the max operator.

Lemma 2. Let 𝒞1 and 𝒞2 be sets of configurations. Then max(𝒞1) = max(𝒞2) iff max(𝒞1) ⊑ max(𝒞2) ∧ max(𝒞2) ⊑ max(𝒞1).
Definition 3. Let M be a marking. We define
Sat_m(M) = {C ∈ ℱ_m | Mark(C) = M},
Sat_f(M) = {C ∈ ℱ_f | Mark(C) = M},
Last(M) = max(Sat_f(M)).

In terms of Sect. 1, the set Sat_m(M) contains configurations of the first type, which correspond to processes describing runs to the marking M. The same holds for Sat_f(M), but w.r.t. the finite unfolding β_f. Last(M) can be seen as a `compact representation' of Sat_f(M), because every configuration contained in Sat_f(M) is (at least) a subset of a configuration contained in Last(M). Due to its compactness, the set Last(M) can be calculated easily (provided that D is known, but this set can be calculated in advance by the unfolding mechanism); two different algorithms for this task can be found in [6]. In our first example, Last({p1, p5}) equals {{e2}, {e1, e3, e4, e6}}. But as we have seen, the knowledge of Last({p1, p5}) is not enough to detect that {p1, p5} is reachable from {p2, p3}. The following proposition shows that we are interested in max(∇Sat_m(M)), the compact representation of ∇Sat_m(M).

Proposition 4. Let M1 and M2 be two markings. Then
M2 ∈ [M1⟩ iff max(∇Sat_m(M1)) ⊑ max(∇Sat_m(M2)).

Unfortunately, the sets Sat_m(M1) and Sat_m(M2) may be infinite. Fortunately, the following section shows that max(∇Sat_m(M)) is equal to the maximum of a set which can be calculated by finitely many applications of a `shift' operator to the finite set Last(M).
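The order ⊑ of Definition 1, the max operator and the reachability test of Proposition 4 can be sketched as follows. Configurations are frozensets of event names; the helper names (`below`, `set_below`, `maxconfigs`) are ad hoc, and the concrete sets in the usage note are taken from the running example of Sect. 1.

```python
# Sketch of Definition 1 and of the reachability test of Proposition 4.

def below(C, Cs):
    # C ⊑ 𝒞  iff  ∃C' ∈ 𝒞 : C ⊆ C'
    return any(C <= C2 for C2 in Cs)

def set_below(C1s, C2s):
    # 𝒞1 ⊑ 𝒞2  iff  ∀C ∈ 𝒞1 : C ⊑ 𝒞2
    return all(below(C, C2s) for C in C1s)

def maxconfigs(Cs):
    # max(𝒞): the ⊆-maximal elements of 𝒞
    return {C for C in Cs if not any(C < C2 for C2 in Cs)}

# Proposition 4 then reads: M2 ∈ [M1> iff
#   set_below(maxconfigs(sat1), maxconfigs(sat2))
# where sat1, sat2 stand for the restrictions ∇Sat_m(M1), ∇Sat_m(M2).
```

With the configurations of the running example ({e1, e3, e5} for {p2, p3}, and {e2}, {e1, e3, e4, e6}, {e1, e3, e4, e5} for {p1, p5}), the test succeeds in one direction only: {p1, p5} is reachable from {p2, p3}, but not vice versa, since {p1, p5} is a deadlock.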
4 The Shift Operators

In the following, we present two shift operators which are generalizations of (slightly modified²) operators originally defined in [6]. The first operator shifts a configuration w.r.t. a cc-pair, the second operator shifts a set of configurations w.r.t. a set of cc-pairs. By choosing these cc-pairs, the operators may be tuned. If they are tuned in such a way that they correspond to the ones defined in [6], we obtain a problem, as we will show. Hence we will propose a somewhat different tuning and we will show that, as a result, the problem disappears. Let (C1, C2) be a cc-pair. As in [6], the branching process ⇑Cut(C2) can be thought of as ⇑Cut(C1) `shifted forward' (or `shifted backwards' if |C2| < |C1|). Accordingly, if C1 is a subset of some configuration C of β_m, then C \ C1 is a configuration of ⇑Cut(C1), I_(C1,C2)(C \ C1) is a configuration of ⇑Cut(C2), and C2 ∪ I_(C1,C2)(C \ C1) is again a configuration of β_m, which can be thought of as C `shifted forward' w.r.t. the cc-pair (C1, C2). The following is a formal definition of this operation.

Definition 5. Let Δ ⊆ ℱ_m × ℱ_m be a set of cc-pairs and (C1, C2) ∈ Δ. The basic shift operator is defined by
S_(C1,C2) : {C ∈ ℱ_m | C1 ⊆ C} → {C ∈ ℱ_m | C2 ⊆ C},
           C ↦ C2 ∪ I_(C1,C2)(C \ C1).

The complex shift operator is defined by

S_Δ : 2^{ℱ_m} → 2^{ℱ_m},
      𝒞 ↦ 𝒞 ∪ {S_(C1,C2)(C) | (C1, C2) ∈ Δ ∧ C1 ⊆ C ∈ 𝒞}.

The least fixpoint of S_Δ containing 𝒞 is given by

S_Δ:𝒞 = ∪_{n≥0} S_Δⁿ(𝒞).
² Merely for technical reasons, in order to obtain more algebraic properties.
Lemma 6. Let C ∈ ℱ_m and let (C1, C2) be a cc-pair with C1 ⊆ C.
(i) Mark(C) = Mark(S_(C1,C2)(C));
(ii) |C1| < |C2| ⟺ |S_(C1,C2)(C)| > |C|; and
(iii) S_(C1,C2) is bijective (S_(C1,C2)⁻¹ = S_(C2,C1)) and monotonic w.r.t. ⊆.
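Under the assumption that the isomorphism I_(C1,C2) is available as an explicit mapping on event names, the basic shift operator of Definition 5 is a one-liner. The sketch below replays the elementary shift S_e4({e2}) = [e4] ∪ I_e4({e2}) of the running example; the dictionary encoding of I is an illustrative assumption, and the function name `shift` is ad hoc.

```python
# Sketch of the basic shift operator of Definition 5:
#   S_(C1,C2)(C) = C2 ∪ I_(C1,C2)(C \ C1),  defined for C1 ⊆ C.

def shift(C, C1, C2, iso):
    # iso: the isomorphism I_(C1,C2), given as a dict on event names
    assert C1 <= C, "the shift is only defined on configurations containing C1"
    return C2 | {iso[e] for e in C - C1}

# Elementary shift S_e4 of the running example: the cc-pair is
# ([⊥], [e4]) = (∅, {e1, e3, e4}), and I maps e2 to e6.
```

Applying `shift({"e2"}, set(), {"e1", "e3", "e4"}, {"e2": "e6"})` reproduces the configuration {e1, e3, e4, e6} computed in the example below, in accordance with Lemma 6 (i) and (ii).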
For convenience and in accordance with [6], we fix the following

Notation 7. We abbreviate S_Δ(𝒞) by S(𝒞) if Δ = {([e'], [e]) | e ∈ O}. Moreover, if (C1, C2) = ([e'], [e]) holds for some e ∈ O, we abbreviate I_([e'],[e])(C) by I_e(C) and S_([e'],[e])(C) by S_e(C); the latter is called an elementary shift.

In Fig. 2 we have, for example, S_e4({e2}) = [e4] ∪ I_e4({e2}) = {e1, e3, e4} ∪ {e6}. The following is a first step towards the aim formulated at the end of Sect. 3.

Theorem 8. max(∇Sat_m(M)) = max(∇S:Sat_f(M)) = max(∇S:Last(M)).

Proof. The first equality follows directly from Sat_m(M) = S:Sat_f(M), which is proven in [6]. The second equality is proven indirectly in [6], but due to a problem in that proof, we give a direct one:

Last(M) = max(Sat_f(M))
⇒ {definition of ⊑ and max}
Sat_f(M) ⊑ Last(M)
⇒ {monotonicity of the basic shift operators, Lemma 6 (iii)}
S:Sat_f(M) ⊑ S:Last(M)
⇒ {set theory}
∇S:Sat_f(M) ⊑ ∇S:Last(M)
⇒ {definition of max}
max(∇S:Sat_f(M)) ⊑ max(∇S:Last(M))

Together with max(∇S:Last(M)) ⊑ max(∇S:Sat_f(M)), which follows directly from Last(M) ⊆ Sat_f(M), Lemma 2 yields the requested equality. □
Of course, there are still some problems: the calculation of S:Last(M) requires infinitely many applications of S, S:Last(M) itself may be infinite and, in particular, potential model checkers have only the finite unfolding β_f at their disposal. Therefore, they can only work with finite versions of the shift operators.

Definition 9. Let Δ be a set of cc-pairs, (C1, C2) ∈ Δ, 𝒞 ⊆ ℱ_f and C ∈ 𝒞. We identify the finite versions of the shift operators by overlining:
S̄_(C1,C2)(C) = ∇S_(C1,C2)(C),
S̄_Δ(𝒞) = ∇S_Δ(𝒞) and
S̄_Δ:𝒞 = ∪_{n≥0} S̄_Δⁿ(𝒞).

Observe that S̄_Δ(𝒞) = 𝒞 ∪ {S̄_(C1,C2)(C) | (C1, C2) ∈ Δ ∧ C1 ⊆ C ∈ 𝒞} and that the calculation of S̄_(C1,C2)(C) does not require the maximal branching process β_m; the knowledge of the pairs (e', I_(C1,C2)(e')) ∈ E_f² for every (C1, C2) ∈ Δ is sufficient (an algorithm for calculating these pairs can be found in [6]). Moreover, since S̄ⁿ(𝒞) is a set of configurations of the finite unfolding for every n, and the finite unfolding contains only finitely many configurations, there exists some k ∈ ℕ such that S̄ᵏ(𝒞) = S̄ᵏ⁺¹(𝒞). We then have S̄:𝒞 = ∪_{0≤n≤k} S̄ⁿ(𝒞). Now, consider for every 𝒞 ⊆ ℱ_f the equation (cf. [6] p. 177)

∇S:𝒞 =? S̄:𝒞.  (1)

If this were true, then ∇S:Last(M) would equal S̄:Last(M), which would solve all our remaining problems, because the set S̄:Last(M) is finite and its calculation requires only a finite number of applications of S̄. Unfortunately, (1) is not true; Fig. 3 gives a counterexample.
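The finite fixpoint computation S̄_Δ:𝒞 described above can be sketched as follows. Shifts are passed as (C1, C2, iso) triples with iso a partial mapping on event names; events with no image inside the finite unfolding are silently dropped, which models the restriction ∇ and is precisely the forgetfulness that tricky events exploit. The data in the usage note is synthetic, and all names are ad hoc illustrations.

```python
# Sketch of S̄_Δ:𝒞 : apply the finite complex shift repeatedly until a
# fixpoint S̄ᵏ(𝒞) = S̄ᵏ⁺¹(𝒞) is reached, which must happen since the finite
# unfolding contains only finitely many configurations.

def finite_shift_fixpoint(Cs, pairs, Ef):
    # Cs:    a set of configurations (frozensets of event names)
    # pairs: Δ as a list of (C1, C2, iso) triples
    # Ef:    the events of the finite unfolding
    Cs = set(Cs)
    while True:
        new = set(Cs)
        for C in Cs:
            for C1, C2, iso in pairs:
                if C1 <= C:
                    # shift C, dropping events without an image (∇)
                    shifted = C2 | {iso[e] for e in C - C1 if e in iso}
                    new.add(frozenset(shifted & Ef))
        if new == Cs:
            return Cs
        Cs = new
```

For a single synthetic shift ({a}, {b}, {c ↦ d}) on Ef = {a, b, c, d}, the fixpoint of {{a, c}} is {{a, c}, {b, d}}.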
[Fig. 3. A finite 1-safe net system and its finite unfolding β_f (diagram not reproduced; the corresponding events are e4' = e2 and e6' = ⊥).]

Consider the (always) reachable deadlock marking {p7}. Since Last({p7}) equals {{e2, e5, e8}}, we have³

S_e6(S_e4({e2, e5, e8})) = S_e6({e1, e4, e5', e8'}) = {e1, e3, e6, e7, e9, e10, e8''};

therefore {e1, e3, e6, e7, e9, e10, e8''} ∈ S:{{e2, e5, e8}} and {e1, e3, e6, e7, e9, e10} ∈ ∇S:{{e2, e5, e8}}. Notice that e5 was shifted out of β_f and then returned (as e10). However, {e1, e3, e6, e7, e9, e10} ∉ S̄:Last({p7}), because once the event e5 is shifted out of the finite unfolding, the finite version S̄ of the shift function S forgets about it. The problem is caused by the event e4. This event belongs to a class of events which is characterized in the next definition.

³ The (double) primed events are not contained in the finite unfolding, because they lie after some cut-off event. But it should be clear which events are meant in the maximal branching process, e.g. •e5' = {b14}.
Definition 10 (Tricky events). Let e ∈ O and let (C1, C2) be a cc-pair. e tricks (C1, C2) iff e ∈ ↑Cut(C1) ∧ (I_(C1,C2)(•e))• ∩ ↑Cut(C2) ∩ E_f ≠ ∅.

Tricky events are recognized when calculating the pairs (e', I_(C1,C2)(e')) ∈ E_f² for a cc-pair (C1, C2). In our example, e4 is recognized as tricky (it tricks (∅, [e6])) because e4 ∈ ↑Cut(∅) ∧ (I_(∅,[e6])(•e4))• ∩ ↑Cut([e6]) ∩ E_f = {e10} ≠ ∅. Equation (1) is only true for finite unfoldings containing no cut-off events e1 and e2 such that e1 tricks ([e2'], [e2]). This can be achieved in a `brute force' way by enlarging the finite unfolding in an appropriate way, e.g. by introducing the additional requirement that for every cut-off event e its corresponding event e' must be an element of its local configuration (e' ∈ [e]).
But this method would significantly slow down potential model checking algorithms, e.g. the one proposed in [6]. Keeping all events which fall out of the finite unfolding during repeated applications of S̄ `in mind' has a similar additional space and time complexity and is therefore out of the question. The approach we follow is to handle tricky events by combining sequences of shifts disturbed by them into `clean' single shifts. In our example, instead of shifting the configuration {e2, e5, e8} two times, first w.r.t. e4 and then w.r.t. e6 (losing the event e5 or, respectively, e10), we circumvent the tricky event e4 by shifting {e2, e5, e8} w.r.t. ({e2}, {e1, e3, e6, e7, e9}) in order to get the desired configuration {e1, e3, e6, e7, e9, e10}. Therefore, by adding (as few as possible) appropriate cc-pairs (V', V) to the set {([e'], [e]) | e ∈ O}, we now construct a set Θ of cc-pairs taking all tricky events into account such that max(∇S:Last(M)) = max(S̄_Θ:Last(M)) holds. This solves our remaining problems, because the set S̄_Θ:Last(M) is finite and its calculation requires only a finite number of applications of S̄_Θ. The construction of Θ
does not establish ∇S:Last(M) = S̄_Θ:Last(M) (it can be checked later on that Fig. 3 is a counterexample), nor max(∇S:𝒞) = max(S̄_Θ:𝒞) for an arbitrary 𝒞 ⊆ ℱ_f. A counterexample concerning the second equation is omitted due to its complexity; interested readers should contact the author. In general, each event e tricking a cc-pair (V1', V1) is taken into account by introducing a cc-pair (V2', V2), whereby V2 can be easily calculated by shifting V1' ∪ [e] w.r.t. (V1', V1). Observe that V2 = S_(V1',V1)(V1' ∪ [e]) contains only the cut-off events which are already contained in V1. The calculation of V2' is a little bit harder, because two unpleasant things can happen. First, it would be nice to calculate V2' by shifting V1' ∪ [e] backwards w.r.t. e. This would work in our example, because S_e4⁻¹(∅ ∪ [e4]) = {e2}. Unfortunately, if there is another event ê tricking ([e'], [e]), it can happen that S_e⁻¹(V1' ∪ [e]) ⊈ E_f. This would mean that S̄_e⁻¹(V1' ∪ [e]) does not exist, or in other words, that ∇S_e⁻¹(V1' ∪ [e]) is probably not a corresponding configuration of V2⁴. Therefore, it can happen that
⁴ Remember, this is necessary for performing a finite shift of a configuration.
the calculation of V2' depends on the existence of another cc-pair (V3', V3), already contained in Θ, which takes the tricky event ê into account. This shows that the elements of Θ must be calculated in a certain order. An example concerning this `first unpleasant thing' can be found in App. A. Second, suppose there is some `appropriate' cc-pair (V3', V3) and let

(V2', V2) = (S_(V3',V3)⁻¹(V1' ∪ [e]), S_(V1',V1)(V1' ∪ [e])).
In this case, it can happen that V2' contains some cut-off event e'. Let b' ∈ e'• and b ∈ Cut(V2) with p(b) = p(b'). If there is some e'' ∈ b• ∩ ↑Cut(V2) ∩ E_f, then we are again confronted with the problem that there is no e''' ∈ ↑Cut(V2') ∩ E_f with I_(V2',V2)(e''') = e''. To solve this problem, we shift V1' ∪ [e] backwards (starting w.r.t. some appropriate (V3', V3) ∈ Θ) until it contains no cut-off events anymore⁵. Notice that this sequence of reverse shifts may require a sequence of appropriate cc-pairs already contained in Θ. We call Θ shift complete w.r.t. V1' ∪ [e] if such cc-pairs exist. For simplification reasons we fix the following notation.

Notation 11. Let C ∈ ℱ_m and let a = (V1', V1) … (Vn', Vn) be a sequence of cc-pairs (n ∈ ℕ; a = ε if n = 0). We define S_a(C) = X_n and S_a⁻¹(C) = Y_n with X_0 = Y_0 = C and, for 0 ≤ i < n,
X_{i+1} = S_(V'_{i+1}, V_{i+1})(X_i) if X_i is defined, and undefined otherwise;
Y_{i+1} = S_(V'_{n−i}, V_{n−i})⁻¹(Y_i) if Y_i is defined, and undefined otherwise.
Definition 12. Let Θ be a set of cc-pairs. We call Θ shift complete with respect to a configuration C ∈ ℱ_f if C ∩ O = ∅ or if there exist some cc-pairs (W1', W1), …, (Wn', Wn) ∈ Θ (n ∈ ℕ \ {0}) such that

∀i, 1 ≤ i ≤ n : S_{(Wi', Wi) … (Wn', Wn)}⁻¹(C) ⊆ E_f and S_{(W1', W1) … (Wn', Wn)}⁻¹(C) ∩ O = ∅.
In this case, we denote the sequence (W1', W1) … (Wn', Wn) by Θ_C. If Θ is shift complete w.r.t. all C ∈ ℱ_f, then we call Θ shift complete. Observe that Θ_C does not have to be unique. Now, as mentioned one page before, by iteratively adding appropriate cc-pairs to the set {([e'], [e]) | e ∈ O}, we construct a shift complete set Θ. The construction cannot be done in one step, because one tricky event can prevent the consideration of another tricky event (remember the `unpleasant things').
⁵ An example concerning this `second unpleasant thing' is omitted due to its complexity; interested readers should contact the author.
Definition 13. The set Θ is defined by Θ = ∪_i X_i with X_0 = {([e'], [e]) | e ∈ O} and

X_{i+1} = X_i ∪ {(V2', V2) | ∃e ∈ O ∃(V1', V1) ∈ X_i : e tricks (V1', V1)
  ∧ X_i is shift complete w.r.t. V1' ∪ [e]
  ∧ V2 = S_(V1',V1)(V1' ∪ [e])
  ∧ V2' = S_D⁻¹(V1' ∪ [e]) with D = (X_i)_{V1' ∪ [e]}}.

Since Θ ⊆ ℱ_f × ℱ_f and ℱ_f is finite, there exists some k ∈ ℕ such that Θ = ∪_{i≤k} X_i.

Proposition 14. Θ is shift complete.

Proof. Assume Θ not to be shift complete w.r.t. some C0 ∈ ℱ_f. This means that there is no sequence Θ_{C0}, which implies that there is some C1, obtained by shifting C0 backwards as often as possible without falling out of β_f, such that

∀(V1', V1) ∈ Θ, V1 ⊆ C1 ∃e ∈ O : e tricks (V1', V1) ∧ (I_(V1',V1)(•e))• ∩ C1 ≠ ∅.  (2)

If (V1', V1) is one of these cc-pairs and e1 is the corresponding e, then Θ is not shift complete w.r.t. V1' ∪ [e1] (otherwise the tricky event e1 would have been taken into account due to the construction of Θ) and |V1' ∪ [e1]| < |C0|. This means that there is no sequence Θ_{V1' ∪ [e1]}, which implies that there is some C2, obtained by shifting V1' ∪ [e1] backwards as often as possible without falling out of β_f, such that (2) holds again. If (V2', V2) is one of these cc-pairs and e2 is the corresponding e, then Θ is not shift complete w.r.t. V2' ∪ [e2] and |V2' ∪ [e2]| < |V1' ∪ [e1]|. This argumentation can be repeated ad infinitum. Since there cannot exist an infinite sequence of configurations of decreasing size, the assumption must be false. □
A repeated application of the following lemma shows that each non-elementary shift corresponds to a sequence of elementary shifts.

Lemma 15. As in Definition 13, let (V2', V2) ∈ X_{i+1} with V2' = S_D⁻¹(V1' ∪ [e]) and V2 = S_(V1',V1)(V1' ∪ [e]) (i ∈ ℕ, D = (X_i)_{V1' ∪ [e]}), and let C ∈ ℱ_m. Then

S_(V2',V2)(C) = S_(V1',V1)(S_D(C)).

Finally we have

Theorem 16. max(∇S:Last(M)) = max(S̄_Θ:Last(M)).

Proof. Lemma 15 already yields max(S̄_Θ:Last(M)) ⊑ max(∇S:Last(M)). Let lc(𝒞) = {C ∈ 𝒞 | ¬∃C' ∈ 𝒞 : |C'| < |C|} denote the set of configurations of lowest cardinality contained in some 𝒞 ⊆ ℱ_m, and let C1 ∈ max(∇S:Last(M)). Since Θ is shift complete, we can compute (by a finite amount of reverse shifts) C1' with C1' ∩ O = ∅ and Mark(C1') = Mark(C1). Suppose C1' ⋢ Last(M) (otherwise we are done). Since M ∈ [Mark(C1')⟩, 𝒞 = lc{C ∈ ℱ_m | Mark(C) = M ∧ C1' ⊆ C} ≠ ∅. Let C2 ∈ ∇𝒞. Now we compute C2' with C2' ∩ O = ∅ and Mark(C2') = Mark(C2). If again C2' ⋢ Last(M), we iterate the above procedure until we get some Cn' (n > 2) with Cn' ⊑ Last(M); this will happen sooner or later, because |C* \ Ci'| with C* ∈ lc{C ∈ ℱ_m | Mark(C) = M ∧ Ci' ⊆ C} gets smaller in each iteration (i = 1, 2, …). By inverting all finite shifts that have been done, we can re-compute C1, which shows max(∇S:Last(M)) ⊑ max(S̄_Θ:Last(M)). □
5 Summary, Conclusion and Outlook

This paper can be seen as a continuation of the work done in [6]. Its main contribution is a generalization and correction of the shift operators presented there. The necessity of this generalization is shown by uncovering several subtleties of the finite unfolding (e.g. the existence of tricky events). Finally, it is shown that with the new operators, computing the reachability properties hidden in the finite unfolding of a given system is an easy job (Prop. 4, Theorems 8 and 16). A properly working shift operator does not come for free. But since the construction of the shift complete set Θ is not very expensive⁶ and has to be done only once and, in particular, before the potential start of a model checker working on the finite unfolding, it costs nearly nothing compared with the potentially numerous and expensive model checker runs. On the contrary, by summarizing sequences of shifts into single shifts, there may be chances for significant reductions of the time complexity of potential model checkers, e.g. the one proposed in [6]. This should be examined in future work. The author is working on an extended version of this paper containing the proofs and examples left out here; it will be published soon.

Acknowledgments. I would like to thank Eike Best and Javier Esparza for reading an early draft version of this paper. Special thanks go to Michaela Huhn; she gave me a major hint for the proof of the shift completeness of Θ.
⁶ A lot of examples have been checked; in general, even complex unfoldings contain only a few tricky events. Exact complexity investigations seem to be very difficult and require further study.
A An Example Concerning the `First Unpleasant Thing'

[Fig. 4 diagram not reproduced; the corresponding events of its cut-off events are e7' = e4, e11' = e8 and e16' = e6.]
Fig. 4. A finite 1-safe net system and its finite unfolding β_f.

e11 is a tricky event; it tricks ([e16'], [e16]). Shifting [e16'] ∪ [e11] forward w.r.t. e16 yields a configuration whose marking differs from Mark(∇S_e11⁻¹([e16'] ∪ [e11])); as in Sect. 4, an event is shifted out of the finite unfolding, so ∇S_e11⁻¹([e16'] ∪ [e11]) is not a corresponding configuration of S_e16([e16'] ∪ [e11]). This is caused by another tricky event, namely e7 tricking ([e11'], [e11]), which can be taken into account by

(S_e7⁻¹([e11'] ∪ [e7]), S_e11([e11'] ∪ [e7])) = ({e4, e10}, {e1, e5, e11, e2}).

Observe that, in accordance with Lemma 15, the single shift S_({e4,e10}, {e1,e5,e11,e2}) agrees with the composition S_e11 ∘ S_e7 on the maximal unfolding, whereas the composed finite shifts S̄_e11 ∘ S̄_e7 lose an event. With ({e4, e10}, {e1, e5, e11, e2}) ∈ Θ, the first component of the cc-pair taking the tricky event e11 into account can be calculated via S̄_({e4,e10}, {e1,e5,e11,e2})⁻¹([e16'] ∪ [e11]).
References

1. Bernardinello, L., De Cindio, F.: A survey of basic net models and modular net classes. In: G. Rozenberg (ed.), Advances in Petri Nets 1992, Lecture Notes in Computer Science 609, Springer, Berlin (1992) 304-351.
2. Best, E., Devillers, R.: Sequential and concurrent behaviour in Petri net theory. Theoretical Computer Science 55(1) (1987) 299-323.
3. Best, E., Fernandez, C.: Nonsequential processes - a Petri net view. EATCS Monographs on Theoretical Computer Science 13 (1988).
4. Clarke, E.M., Emerson, E.A., Sistla, A.P.: Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Transactions on Programming Languages and Systems 8(2) (1986) 244-263.
5. Engelfriet, J.: Branching processes of Petri nets. Acta Informatica 28 (1991) 575-591.
6. Esparza, J.: Model checking using net unfoldings. Science of Computer Programming 23 (1994) 151-195.
7. Esparza, J., Römer, S., Vogler, W.: An improvement of McMillan's unfolding algorithm. In: T. Margaria, B. Steffen (eds.), Proceedings of TACAS'96, LNCS 1055 (1996) 87-106.
8. McMillan, K.L.: Using unfoldings to avoid the state explosion problem in the verification of asynchronous circuits. Proceedings of the 4th Workshop on Computer Aided Verification, Montreal (1992) 164-174.
9. Petri, C.A.: Kommunikation mit Automaten. Schriften des Institutes für Instrumentelle Mathematik, Bonn (1962).
10. Queille, J.P., Sifakis, J.: Specification and verification of concurrent systems in CESAR. Proceedings of the 5th International Symposium on Programming, LNCS 137 (1981) 337-351.
Author Index

Arora, A. 169
Arvind, V. 235
de Boer, F.S. 267
Cirino, K. 37
Clarke, Jr., E.M. 54
Devillers, R. 297
Dey, T.K. 6
Dufourd, C. 312
Finkel, A. 312
Goebel, R. 138
Gordon, A.D. 74
Graves, B. 327
Gupta, G. 123
Hankin, P.D. 74
Hannemann, U. 267
Kapur, D. 103
Kfoury, A.J. 57
Klaudel, H. 297
Koebler, J. 235
Kulkarni, S.S. 169
Lassen, S.B. 74
Mahajan, S. 22
Marchiori, M. 88
Mohalik, S. 153
Muthukrishnan, S. 37
Namjoshi, K.S. 284
Narayanaswamy, N.S. 37
Pontelli, E. 123
Ramachandran, V. 1
Ramanujam, R. 153
Ramesh, H. 37
Ramos, E.A. 22
Ranjan, D. 123
Riemann, R.-C. 297
de Roever, W.-P. 267
Roy, A. 6
Seth, A. 200
Shah, N.R. 6
Subrahmanyam, K.V. 22
Subramaniam, M. 103
Sudan, M. 184
Vardi, M.Y. 250
Vinodchandran, N.V. 220
You, J.-H. 138
Yuan, L.-Y. 138
Lecture Notes in Computer Science
For information about Vols. 1-1272 please contact your bookseller or Springer-Verlag
Vol. 1273: P. Antsaklis, W. Kohn, A. Nerode, S. Sastry (Eds.), Hybrid Systems IV. X, 405 pages. 1997.
Vol. 1274: T. Masuda, Y. Masunaga, M. Tsukamoto (Eds.), Worldwide Computing and Its Applications. Proceedings, 1997. XVI, 443 pages. 1997.
Vol. 1275: E.L. Gunter, A. Felty (Eds.), Theorem Proving in Higher Order Logics. Proceedings, 1997. VIII, 339 pages. 1997.
Vol. 1276: T. Jiang, D.T. Lee (Eds.), Computing and Combinatorics. Proceedings, 1997. XI, 522 pages. 1997.
Vol. 1292: H. Glaser, P. Hartel, H. Kuchen (Eds.), Programming Languages: Implementations, Logics, and Programs. Proceedings, 1997. XI, 425 pages. 1997.
Vol. 1293: C. Nicholas, D. Wood (Eds.), Principles of Document Processing. Proceedings, 1996. XI, 195 pages. 1997.
Vol. 1294: B.S. Kaliski Jr. (Ed.), Advances in Cryptology - CRYPTO '97. Proceedings, 1997. XII, 539 pages. 1997.
Vol. 1277: V. Malyshkin (Ed.), Parallel Computing Technologies. Proceedings, 1997. XII, 455 pages. 1997.
Vol. 1295: I. Privara, P. Ružička (Eds.), Mathematical Foundations of Computer Science 1997. Proceedings, 1997. X, 519 pages. 1997.
Vol. 1278: R. Hofestädt, T. Lengauer, M. Löffler, D. Schomburg (Eds.), Bioinformatics. Proceedings, 1996. XI, 222 pages. 1997.
Vol. 1296: G. Sommer, K. Daniilidis, J. Pauli (Eds.), Computer Analysis of Images and Patterns. Proceedings, 1997. XIII, 737 pages. 1997.
Vol. 1279: B.S. Chlebus, L. Czaja (Eds.), Fundamentals of Computation Theory. Proceedings, 1997. XI, 475 pages. 1997.
Vol. 1280: X. Liu, P. Cohen, M. Berthold (Eds.), Advances in Intelligent Data Analysis. Proceedings, 1997. XII, 621 pages. 1997.
Vol. 1297: N. Lavrač, S. Džeroski (Eds.), Inductive Logic Programming. Proceedings, 1997. VIII, 309 pages. 1997. (Subseries LNAI).
Vol. 1281: M. Abadi, T. Ito (Eds.), Theoretical Aspects of Computer Software. Proceedings, 1997. XI, 639 pages. 1997.
Vol. 1299: M.T. Pazienza (Ed.), Information Extraction. Proceedings, 1997. IX, 213 pages. 1997. (Subseries LNAI).
Vol. 1282: D. Garlan, D. Le Métayer (Eds.), Coordination Languages and Models. Proceedings, 1997. X, 435 pages. 1997.
Vol. 1300: C. Lengauer, M. Griebl, S. Gorlatch (Eds.), Euro-Par'97 Parallel Processing. Proceedings, 1997. XXX, 1379 pages. 1997.
Vol. 1283: M. Müller-Olm, Modular Compiler Verification. XV, 250 pages. 1997.
Vol. 1301: M. Jazayeri, H. Schauer (Eds.), Software Engineering - ESEC/FSE'97. Proceedings, 1997. XIII, 532 pages. 1997.
Vol. 1284: R. Burkard, G. Woeginger (Eds.), Algorithms - ESA '97. Proceedings, 1997. XI, 515 pages. 1997.
Vol. 1285: X. Yao, J.-H. Kim, T. Furuhashi (Eds.), Simulated Evolution and Learning. Proceedings, 1996. VIII, 231 pages. 1997. (Subseries LNAI).
Vol. 1286: C. Zhang, D. Lukose (Eds.), Multi-Agent Systems. Proceedings, 1996. VII, 195 pages. 1997. (Subseries LNAI).
Vol. 1287: T. Kropf (Ed.), Formal Hardware Verification. XII, 367 pages. 1997.
Vol. 1288: M. Schneider, Spatial Data Types for Database Systems. XIII, 275 pages. 1997.
Vol. 1289: G. Gottlob, A. Leitsch, D. Mundici (Eds.), Computational Logic and Proof Theory. Proceedings, 1997. VIII, 348 pages. 1997.
Vol. 1298: M. Hanus, J. Heering, K. Meinke (Eds.), Algebraic and Logic Programming. Proceedings, 1997. X, 286 pages. 1997.
Vol. 1302: P. Van Hentenryck (Ed.), Static Analysis. Proceedings, 1997. X, 413 pages. 1997.
Vol. 1303: G. Brewka, C. Habel, B. Nebel (Eds.), KI-97: Advances in Artificial Intelligence. Proceedings, 1997. XI, 413 pages. 1997. (Subseries LNAI).
Vol. 1304: W. Luk, P.Y.K. Cheung, M. Glesner (Eds.), Field-Programmable Logic and Applications. Proceedings, 1997. XI, 503 pages. 1997.
Vol. 1305: D. Corne, J.L. Shapiro (Eds.), Evolutionary Computing. Proceedings, 1997. X, 307 pages. 1997.
Vol. 1306: C. Leung (Ed.), Visual Information Systems. X, 274 pages. 1997.
Vol. 1307: R. Kompe, Prosody in Speech Understanding Systems. XIX, 357 pages. 1997. (Subseries LNAI).
Vol. 1290: E. Moggi, G. Rosolini (Eds.), Category Theory and Computer Science. Proceedings, 1997. VII, 313 pages. 1997.
Vol. 1308: A. Hameurlain, A.M. Tjoa (Eds.), Database and Expert Systems Applications. Proceedings, 1997. XVII, 688 pages. 1997.
Vol. 1291: D.G. Feitelson, L. Rudolph (Eds.), Job Scheduling Strategies for Parallel Processing. Proceedings, 1997. VII, 299 pages. 1997.
Vol. 1309: R. Steinmetz, L.C. Wolf (Eds.), Interactive Distributed Multimedia Systems and Telecommunication Services. Proceedings, 1997. XIII, 466 pages. 1997.
Vol. 1310: A. Del Bimbo (Ed.), Image Analysis and Processing. Proceedings, 1997. Volume I. XXII, 722 pages. 1997.
Vol. 1331: D.W. Embley, R.C. Goldstein (Eds.), Conceptual Modeling - ER '97. Proceedings, 1997. XV, 479 pages. 1997.
Vol. 1311: A. Del Bimbo (Ed.), Image Analysis and Processing. Proceedings, 1997. Volume II. XXII, 794 pages. 1997.
Vol. 1332: M. Bubak, J. Dongarra, J. Waśniewski (Eds.), Recent Advances in Parallel Virtual Machine and Message Passing Interface. Proceedings, 1997. XV, 518 pages. 1997.
Vol. 1312: A. Geppert, M. Berndtsson (Eds.), Rules in Database Systems. Proceedings, 1997. VII, 214 pages. 1997.
Vol. 1313: J. Fitzgerald, C.B. Jones, P. Lucas (Eds.), FME '97: Industrial Applications and Strengthened Foundations of Formal Methods. Proceedings, 1997. XIII, 685 pages. 1997.
Vol. 1333: F. Pichler, R. Moreno-Díaz (Eds.), Computer Aided Systems Theory - EUROCAST'97. Proceedings, 1997. XII, 626 pages. 1997.
Vol. 1334: Y. Han, T. Okamoto, S. Qing (Eds.), Information and Communications Security. Proceedings, 1997. X, 484 pages. 1997.
Vol. 1314: S. Muggleton (Ed.), Inductive Logic Programming. Proceedings, 1996. VIII, 397 pages. 1997. (Subseries LNAI).
Vol. 1335: R.H. Möhring (Ed.), Graph-Theoretic Concepts in Computer Science. Proceedings, 1997. X, 376 pages. 1997.
Vol. 1315: G. Sommer, J.J. Koenderink (Eds.), Algebraic Frames for the Perception-Action Cycle. Proceedings, 1997. VIII, 395 pages. 1997.
Vol. 1336: C. Polychronopoulos, K. Joe, K. Araki, M. Amamiya (Eds.), High Performance Computing. Proceedings, 1997. XII, 416 pages. 1997.
Vol. 1316: M. Li, A. Maruoka (Eds.), Algorithmic Learning Theory. Proceedings, 1997. XI, 461 pages. 1997. (Subseries LNAI).
Vol. 1337: C. Freksa, M. Jantzen, R. Valk (Eds.), Foundations of Computer Science. XII, 515 pages. 1997.
Vol. 1317: M. Leman (Ed.), Music, Gestalt, and Computing. IX, 524 pages. 1997. (Subseries LNAI).
Vol. 1318: R. Hirschfeld (Ed.), Financial Cryptography. Proceedings, 1997. XI, 409 pages. 1997.
Vol. 1319: E. Plaza, R. Benjamins (Eds.), Knowledge Acquisition, Modeling and Management. Proceedings, 1997. XI, 389 pages. 1997. (Subseries LNAI).
Vol. 1320: M. Mavronicolas, P. Tsigas (Eds.), Distributed Algorithms. Proceedings, 1997. X, 333 pages. 1997.
Vol. 1321: M. Lenzerini (Ed.), AI*IA 97: Advances in Artificial Intelligence. Proceedings, 1997. XII, 459 pages. 1997. (Subseries LNAI).
Vol. 1322: H. Hußmann, Formal Foundations for Software Engineering Methods. X, 286 pages. 1997.
Vol. 1323: E. Costa, A. Cardoso (Eds.), Progress in Artificial Intelligence. Proceedings, 1997. XIV, 393 pages. 1997. (Subseries LNAI).
Vol. 1324: C. Peters, C. Thanos (Eds.), Research and Advanced Technology for Digital Libraries. Proceedings, 1997. X, 423 pages. 1997.
Vol. 1325: Z.W. Raś, A. Skowron (Eds.), Foundations of Intelligent Systems. Proceedings, 1997. XI, 630 pages. 1997. (Subseries LNAI).
Vol. 1326: C. Nicholas, J. Mayfield (Eds.), Intelligent Hypertext. XIV, 182 pages. 1997.
Vol. 1327: W. Gerstner, A. Germond, M. Hasler, J.-D. Nicoud (Eds.), Artificial Neural Networks - ICANN '97. Proceedings, 1997. XIX, 1274 pages. 1997.
Vol. 1328: C. Retoré (Ed.), Logical Aspects of Computational Linguistics. Proceedings, 1996. VIII, 435 pages. 1997. (Subseries LNAI).
Vol. 1329: S.C. Hirtle, A.U. Frank (Eds.), Spatial Information Theory. Proceedings, 1997. XIV, 511 pages. 1997.
Vol. 1330: G. Smolka (Ed.), Principles and Practice of Constraint Programming - CP 97. Proceedings, 1997. XII, 563 pages. 1997.
Vol. 1338: F. Plášil, K.G. Jeffery (Eds.), SOFSEM'97: Theory and Practice of Informatics. Proceedings, 1997. XIV, 571 pages. 1997.
Vol. 1339: N.A. Murshed, F. Bortolozzi (Eds.), Advances in Document Image Analysis. Proceedings, 1997. IX, 345 pages. 1997.
Vol. 1340: M. van Kreveld, J. Nievergelt, T. Roos, P. Widmayer (Eds.), Algorithmic Foundations of Geographic Information Systems. XIV, 287 pages. 1997.
Vol. 1341: F. Bry, R. Ramakrishnan, K. Ramamohanarao (Eds.), Deductive and Object-Oriented Databases. Proceedings, 1997. XIV, 430 pages. 1997.
Vol. 1342: A. Sattar (Ed.), Advanced Topics in Artificial Intelligence. Proceedings, 1997. XVII, 516 pages. 1997. (Subseries LNAI).
Vol. 1343: Y. Ishikawa, R.R. Oldehoeft, J.V.W. Reynders, M. Tholburn (Eds.), Scientific Computing in Object-Oriented Parallel Environments. Proceedings, 1997. XI, 295 pages. 1997.
Vol. 1344: C. Ausnit-Hood, K.A. Johnson, R.G. Pettit IV, S.B. Opdahl (Eds.), Ada 95 - Quality and Style. XV, 292 pages. 1997.
Vol. 1345: R.K. Shyamasundar, K. Ueda (Eds.), Advances in Computing Science - ASIAN'97. Proceedings, 1997. XIII, 387 pages. 1997.
Vol. 1346: S. Ramesh, G. Sivakumar (Eds.), Foundations of Software Technology and Theoretical Computer Science. Proceedings, 1997. XI, 343 pages. 1997.
Vol. 1347: E. Ahronovitz, C. Fiorio (Eds.), Discrete Geometry for Computer Imagery. Proceedings, 1997. X, 255 pages. 1997.
Vol. 1349: M. Johnson (Ed.), Algebraic Methodology and Software Technology. Proceedings, 1997. X, 594 pages. 1997.
Vol. 1350: H.W. Leong, H. Imai, S. Jain (Eds.), Algorithms and Computation. Proceedings, 1997. XV, 426 pages. 1997.