Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis, and J. van Leeuwen
2378
Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Tokyo
Sophie Tison (Ed.)
Rewriting Techniques and Applications 13th International Conference, RTA 2002 Copenhagen, Denmark, July 22-24, 2002 Proceedings
Series Editors Gerhard Goos, Karlsruhe University, Germany Juris Hartmanis, Cornell University, NY, USA Jan van Leeuwen, Utrecht University, The Netherlands
Volume Editors
Sophie Tison
LIFL, Bâtiment M3
Université de Lille 1, Cité Scientifique
59655 Villeneuve d'Ascq cedex, France
E-mail:
[email protected]
Cataloging-in-Publication Data applied for Die Deutsche Bibliothek - CIP-Einheitsaufnahme Rewriting techniques and applications : 13th international conference ; proceedings / RTA 2002, Copenhagen, Denmark, July 22 - 24, 2002. Sophie Tison (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Hong Kong ; London ; Milan ; Paris ; Tokyo : Springer, 2002 (Lecture notes in computer science ; Vol. 2378) ISBN 3-540-43916-1
CR Subject Classification (1998): F.4, F.3.2, D.3, I.2.2-3, I.1 ISSN 0302-9743 ISBN 3-540-43916-1 Springer-Verlag Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law. Springer-Verlag Berlin Heidelberg New York a member of BertelsmannSpringer Science+Business Media GmbH http://www.springer.de © Springer-Verlag Berlin Heidelberg 2002 Printed in Germany Typesetting: Camera-ready by author, data conversion by PTP-Berlin, Stefan Sossna e.K. Printed on acid-free paper SPIN: 10870449 06/3142 543210
Preface
This volume contains the proceedings of the 13th International Conference on Rewriting Techniques and Applications (RTA 2002), which was held July 22-24, 2002 in Copenhagen as part of the 3rd Federated Logic Conference (FLoC 2002). RTA is the major international forum for the presentation of research on all aspects of rewriting. Previous RTA conferences took place in Dijon (1985), Bordeaux (1987), Chapel Hill (1989), Como (1991), Montreal (1993), Kaiserslautern (1995), Rutgers (1996), Sitges (1997), Tsukuba (1998), Trento (1999), Norwich (2000), and Utrecht (2001). A total of 20 regular papers, 2 application papers, and 4 system descriptions were selected for presentation from 49 submissions from Argentina (1), Brazil (2/3), Czech Republic (1), France (13), Germany (8), Israel (1/3), Italy (1 5/6), Japan (6), The Netherlands (2), Poland (1), Portugal (1 1/3), Rumania (1), Spain (4), UK (5/6), Uruguay (1/2), USA (5 1/2), and Venezuela (1); fractional counts correspond to papers co-authored across countries. The program committee awarded the best paper prize to Paul-André Melliès for his paper Residual Theory Revisited. This paper presents an elegant and subtle generalization of Jean-Jacques Lévy's residual theory. I am especially grateful to the invited speakers Franz Baader, John Mitchell, and Natarajan Shankar for accepting our invitation to present their insights into their research areas. Many people helped to make RTA 2002 a success. I thank the members of the program committee for their demanding work in the evaluation process. I also thank the numerous external referees for reviewing the submissions and maintaining the high standards of the RTA conference. It is a great pleasure to thank Thomas Arts, who was in charge of the local organization. I would also like to express my gratitude to Anne-Cécile Caron for her assistance in many of my tasks as program chair. Finally, I thank the FLoC 2002 Organizing Committee and the Local Organization Committee for all their hard work.
May 2002
Sophie Tison
Conference Organization
Program Chair
Sophie Tison (University of Lille)

Conference Chair
Thomas Arts (Ericsson, Computer Science Laboratory)
Program Committee
Andrea Corradini (University of Pisa)
Daniel J. Dougherty (Wesleyan University)
Jürgen Giesl (RWTH Aachen)
Bernhard Gramlich (Vienna University of Technology)
Thérèse Hardin (University of Paris VI)
Christopher Lynch (Clarkson University)
Jerzy Marcinkowski (University of Wrocław)
Aart Middeldorp (University of Tsukuba)
Joachim Niehren (University of Saarland)
Femke van Raamsdonk (Free University Amsterdam)
Albert Rubio (Technical University of Catalonia)
Sophie Tison (University of Lille)
Ralf Treinen (University of Paris XI)
Steering Committee
Franz Baader (Dresden)
Leo Bachmair (Stony Brook)
Hélène Kirchner (Nancy)
Pierre Lescanne (Lyon, publicity chair)
Jose Meseguer (Menlo Park)
Aart Middeldorp (Tsukuba, chair)
List of External Referees Yves André, Jürgen Avenhaus, Steffen van Bakel, Sebastian Bala, Paolo Baldan, Inge Bethke, Stefano Bistarelli, Roel Bloo, Manuel Bodirsky, Benedikt Bollig, Eduardo Bonelli, Sander Bruggink, Roberto Bruni, Venanzio Capretta, Anne-Cécile Caron, Witold Charatonik, Horatiu Cirstea, René David, Roberto Di Cosmo, Gilles Dowek, Irène Durand, Joost Engelfriet, Katrin Erk, Maribel Fernández, Jean-Christophe Filliâtre, Andrea Formisano, Fabio Gadducci, Silvia Ghilezan, Isabelle Gnaedig, Guillem Godoy, Philippe de Groote, Stefano Guerrini, Rolf Hennicker, Miki Hermann, Thomas Hillenbrand,
Dieter Hofbauer, Patricia Johann, Gueorgui Jojgov, Wolfram Kahl, Łukasz Kaiser, Zurab Khasidashvili, Alexander Koller, Pierre Lescanne, Martin Leucker, Jean-Jacques Lévy, Jordi Levy, James B. Lipton, Sébastien Limet, Daniel Loeb, Salvador Lucas, Denis Lugiez, Luc Maranget, Massimo Marchiori, Ralph Matthes, Paul-André Melliès, Oskar Miś, Markus Mohnen, Benjamin Monate, César A. Muñoz, Robert Nieuwenhuis, Thomas Noll, Enno Ohlebusch, Paweł Olszta, Vincent van Oostrom, Dmitrii V. Pasechnik, Jorge Sousa Pinto, Detlef Plump, Jaco van de Pol, Tim Priesnitz, Pierre Réty,
Alexandre Riazanov, Christophe Ringeissen, Andreas Rossberg, Michael Rusinowitch, Cesar Sanchez, Christelle Scharff, Manfred Schmidt-Schauß, Aleksy Schubert, Klaus U. Schulz, Jan Schwinghammer, Helmut Seidl, Géraud Sénizergues, Jürgen Stuber, Zhendong Su, Taro Suzuki, Gabriele Taentzer, Jean-Marc Talbot, Ashish Tiwari, Hélène Touzet, Yoshihito Toyama, Tomasz Truderung, Xavier Urbain, Paweł Urzyczyn, Sándor Vágvölgyi, Laurent Vigneron, Mateu Villaret, Christian Vogt, Johannes Waldmann, Freek Wiedijk, Herbert Wiklicky, Claus-Peter Wirth, Hans Zantema, Wieslaw Zielonka
Table of Contents
Invited Talks
Combining Shostak Theories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Natarajan Shankar, Harald Rueß
Multiset Rewriting and Security Protocol Analysis . . . . . . . . . . . . . . . . . . . . . 19
John C. Mitchell
Engineering of Logics for the Content-Based Representation of Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Franz Baader
Regular Papers, Application Papers
Axiomatic Rewriting Theory VI: Residual Theory Revisited . . . . . . . . . . . . . 24
Paul-André Melliès
Static Analysis of Modularity of β-Reduction in the Hyperbalanced λ-Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Richard Kennaway, Zurab Khasidashvili, Adolfo Piperno
Exceptions in the Rewriting Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Germain Faure, Claude Kirchner
Deriving Focused Lattice Calculi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Georg Struth
Layered Transducing Term Rewriting System and Its Recognizability Preserving Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Hiroyuki Seki, Toshinori Takai, Youhei Fujinaka, Yuichi Kaji
Decidability and Closure Properties of Equational Tree Languages . . . . . . . 114
Hitoshi Ohsaki, Toshinori Takai
Regular Sets of Descendants by Some Rewrite Strategies . . . . . . . . . . . . . . . . 129
Pierre Réty, Julie Vuotto
Rewrite Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
Johannes Waldmann
An Extensional Böhm Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
Paula Severi, Fer-Jan de Vries
A Weak Calculus with Explicit Operators for Pattern Matching and Substitution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
Julien Forest
Tradeoffs in the Intensional Representation of Lambda Terms . . . . . . . . . . . . 192
Chuck Liang, Gopalan Nadathur
Improving Symbolic Model Checking by Rewriting Temporal Logic Formulae . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
David Déharbe, Anamaria Martins Moreira, Christophe Ringeissen
Conditions for Efficiency Improvement by Tree Transducer Composition . . 222
Janis Voigtländer
Rewriting Strategies for Instruction Selection . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Martin Bravenboer, Eelco Visser
Probabilistic Rewrite Strategies. Applications to ELAN . . . . . . . . . . . . . . . . . 252
Olivier Bournez, Claude Kirchner
Loops of Superexponential Lengths in One-Rule String Rewriting . . . . . . . . 267
Alfons Geser
Recursive Derivational Length Bounds for Confluent Term Rewrite Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
Elias Tahhan-Bittar
Termination of (Canonical) Context-Sensitive Rewriting . . . . . . . . . . . . . . . . 296
Salvador Lucas
Atomic Set Constraints with Projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
Witold Charatonik, Jean-Marc Talbot
Currying Second-Order Unification Problems . . . . . . . . . . . . . . . . . . . . . . . . . 326
Jordi Levy, Mateu Villaret
A Decidable Variant of Higher Order Matching . . . . . . . . . . . . . . . . . . . . . . . . 340
Dan Dougherty, Tomasz Wierzbicki
Combining Decision Procedures for Positive Theories Sharing Constructors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
Franz Baader, Cesare Tinelli
System Descriptions
JITty: A Rewriter with Strategy Annotations . . . . . . . . . . . . . . . . . . . . . . . . . 367
Jaco van de Pol
Autowrite: A Tool for Checking Properties of Term Rewriting Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
Irène Durand
TTSLI: An Implementation of Tree-Tuple Synchronized Languages . . . . . . 376
Benoit Lecland, Pierre Réty
in² : A Graphical Interpreter for Interaction Nets . . . . . . . . . . . . . . . . . . . . . . 380
Sylvain Lippi
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
Combining Shostak Theories

Natarajan Shankar and Harald Rueß
SRI International Computer Science Laboratory
Menlo Park CA 94025 USA
{shankar,ruess}@csl.sri.com
http://www.csl.sri.com/{~shankar,~ruess}
Phone: +1 (650) 859-5272 Fax: +1 (650) 859-2844
Abstract. Ground decision procedures for combinations of theories are used in many systems for automated deduction. There are two basic paradigms for combining decision procedures. The Nelson–Oppen method combines decision procedures for disjoint theories by exchanging equality information on the shared variables. In Shostak’s method, the combination of the theory of pure equality with canonizable and solvable theories is decided through an extension of congruence closure that yields a canonizer for the combined theory. Shostak’s original presentation, and others that followed it, contained serious errors which were corrected for the basic procedure by the present authors. Shostak also claimed that it was possible to combine canonizers and solvers for disjoint theories. This claim is easily verifiable for canonizers, but is unsubstantiated for the case of solvers. We show how our earlier procedure can be extended to combine multiple disjoint canonizable, solvable theories within the Shostak framework.
1 Introduction
Consider the sequent

2 ∗ car(x) − 3 ∗ cdr(x) = f(cdr(x)) ⊢ f(cons(4 ∗ car(x) − 2 ∗ f(cdr(x)), y)) = f(cons(6 ∗ cdr(x), y)).
This work was funded by NSF Grant CCR-0082560, DARPA/AFRL Contract F33615-00-C-3043, and NASA Contract NAS1-00079. During a phone conversation with the first author on 2nd April 2001, Rob Shostak suggested that the problem of combining Shostak solvers could be solved through variable abstraction. His suggestion is the key inspiration for the combination of Shostak theories presented here. We thank Clark Barrett, Sam Owre, and Ashish Tiwari for their meticulous reading of earlier drafts. We also thank Harald Ganzinger for pointing out certain limitations of our original definition of solvability with respect to σ-models. The first author is grateful to the program committees and program chairs of the FME, LICS, and RTA conferences at FLoC 2002 for their kind invitation.
S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 1–18, 2002. c Springer-Verlag Berlin Heidelberg 2002
It involves symbols from three different theories. The symbol f is uninterpreted, the operations ∗ and − are from the theory of linear arithmetic, and the pairing and projection operations cons, car , and cdr , are from the theory of lists. There are two basic methods for building combined decision procedures for disjoint theories, i.e., theories that share no function symbols. Nelson and Oppen [NO79] gave a method for combining decision procedures through the use of variable abstraction for replacing subterms with variables, and the exchange of equality information on the shared variables. Thus, with respect to the example above, decision procedures for pure equality, linear arithmetic, and the theory of lists can be composed into a decision procedure for the combined theory. The other combination method, due to Shostak, yields a decision procedure for the combination of canonizable and solvable theories, based on the congruence closure procedure. Shostak’s original algorithm and proof were seriously flawed. His algorithm is neither terminating nor complete (even when terminating). These flaws went unnoticed for a long time even though the method was widely used, implemented, and studied [CLS96,BDL96,Bjø99]. In earlier work [RS01], we described a correct algorithm for the basic combination of a single canonizable, solvable theory with the theory of equality over uninterpreted terms. That correctness proof has been mechanically verified using PVS [FS02]. The generality of the basic combination rests on Shostak’s claim that it is possible to combine solvers and canonizers from disjoint theories into a single canonizer and solver. This claim is easily verifiable for canonizers, but fails for the case of solvers. In this paper, we extend our earlier decision procedure to the combination of uninterpreted equality with multiple canonizable, solvable theories. The decision procedure does not require the combination of solvers. 
We present proofs for the termination, soundness, and completeness of our procedure.
2 Preliminaries
We introduce some of the basic terminology needed to understand Shostak-style decision procedures. Fixing a countable set of variables X and a set of function symbols F, a term is either a variable x from X or an n-ary function symbol f from F applied to n terms, as in f(a1, . . . , an). Equations between terms are represented as a = b. Let vars(a), vars(a = b), and vars(T) represent the sets of variables in a, a = b, and the set of equalities T, respectively. We are interested in deciding the validity of sequents of the form T ⊢ c = d, where c and d are terms, and T is a set of equalities such that vars(c = d) ⊆ vars(T). The condition vars(c = d) ⊆ vars(T) is there for technical reasons. It can always be satisfied by padding T with reflexivity assertions x = x for any variables x in vars(c = d) − vars(T). We write subterms(a) for the set of subterms of a, which includes a itself. The semantics for a term a, written as M[[a]]ρ, is given relative to an interpretation M over a domain D and an assignment ρ. For an n-ary function f, the interpretation M(f) of f in M is a map from Dⁿ to D. For an uninterpreted
n-ary function symbol f, the interpretation M(f) may be any map from Dⁿ to D, whereas only restricted interpretations might be suitable for an interpreted function symbol like the arithmetic + operation. An assignment ρ is a map from variables in X to values in D. We define M[[a]]ρ to return a value in D by means of the following equations.

M[[x]]ρ = ρ(x)
M[[f(a1, . . . , an)]]ρ = M(f)(M[[a1]]ρ, . . . , M[[an]]ρ)

We say that M, ρ |= a = b iff M[[a]]ρ = M[[b]]ρ, and M |= a = b iff M, ρ |= a = b for all assignments ρ. We write M, ρ |= S when ∀a, b : a = b ∈ S ⇒ M, ρ |= a = b, and M, ρ |= (T ⊢ a = b) when (M, ρ |= T) ⇒ (M, ρ |= a = b). A sequent T ⊢ c = d is valid, written as |= (T ⊢ c = d), when M, ρ |= (T ⊢ c = d) for all M and ρ.

There is a simple pattern underlying the class of decision procedures studied here. Let ψ be the state of the decision procedure as given by a set of formulas.[1] Let τ be a family of state transformations so that we write ψ →τ ψ′ if ψ′ is the result of applying a transformation in τ to ψ, where vars(ψ) ⊆ vars(ψ′) (variable preservation). An assignment ρ′ is said to extend ρ over vars(ψ′) − vars(ψ) when it agrees with ρ on all variables except those in vars(ψ′) − vars(ψ), for vars(ψ) ⊆ vars(ψ′). We say that ψ′ preserves ψ if vars(ψ) ⊆ vars(ψ′) and for all interpretations M and assignments ρ, M, ρ |= ψ holds iff there exists an assignment ρ′ extending ρ such that M, ρ′ |= ψ′.[2] When preservation is restricted to a limited class of interpretations ι, we say that ψ′ ι-preserves ψ. Note that the preserves relation is transitive. When the operation τ is deterministic, τ(ψ) represents the result of the transformation, and we call τ a conservative operation to indicate that τ(ψ) preserves ψ for all ψ. Correspondingly, τ is said to be ι-conservative when τ(ψ) ι-preserves ψ. Let τⁿ represent the n-fold iteration of τ; then τⁿ is a conservative operation.
The composition τ2 ◦ τ1 of conservative operations τ1 and τ2 is also a conservative operation. The operation τ∗(ψ) is defined as τⁱ(ψ) for the least i such that τⁱ⁺¹(ψ) = τⁱ(ψ). The existence of such a bound i must be demonstrated for the termination of τ∗. If τ is conservative, so is τ∗. If τ is a conservative operation, it is sound and complete in the sense that for a formula φ with vars(φ) ⊆ vars(ψ), |= (ψ ⊢ φ) iff |= (τ(ψ) ⊢ φ). This is clear since τ is a conservative operation and vars(φ) ⊆ vars(ψ).

[1] In our case, the state is actually represented by a list whose elements are sets of equalities. We abuse notation by viewing such a state as the set of equalities corresponding to the union of the sets of equalities contained in it.
[2] In general, one could allow the interpretation M to be extended to M′ in the transformation from ψ to ψ′ to allow for the introduction of new function symbols, e.g., skolem functions. This abstract design pattern then also covers skolemization in addition to methods like prenexing, clausification, resolution, variable abstraction, and Knuth-Bendix completion.
If τ∗(ψ) returns a state ψ′ such that |= (ψ′ ⊢ ⊥), where ⊥ is an unsatisfiable formula, then ψ and ψ′ are both clearly unsatisfiable. Otherwise, if ψ′ is canonical, as explained below, |= (ψ ⊢ φ) can be decided by computing a canonical form ψ′[[φ]] for φ with respect to ψ′.
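The semantics M[[a]]ρ defined above is directly executable. The following is a minimal illustrative sketch (ours, not part of the paper), assuming terms are encoded as nested tuples (f, a1, ..., an) with variables as strings, an interpretation M as a map from function symbols to Python functions, and an assignment ρ as a map from variables to values:

```python
def evaluate(M, rho, term):
    """Compute M[[term]]rho: variables are looked up in rho,
    applications are evaluated recursively."""
    if isinstance(term, str):          # M[[x]]rho = rho(x)
        return rho[term]
    f, *args = term                    # M[[f(a1,...,an)]]rho = M(f)(M[[a1]]rho, ...)
    return M[f](*(evaluate(M, rho, a) for a in args))

# Example over the integers, interpreting '+' as addition and 'f' as squaring:
M = {'+': lambda a, b: a + b, 'f': lambda a: a * a}
rho = {'x': 3, 'y': 4}
print(evaluate(M, rho, ('+', ('f', 'x'), 'y')))   # f(x) + y = 9 + 4 = 13
```

The two defining equations for M[[a]]ρ correspond line-for-line to the two branches of evaluate.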
3 Congruence Closure
In this section, we present a warm-up exercise for deciding equality over terms where all function symbols are uninterpreted, i.e., the interpretation of these operations is unconstrained. This means that a sequent T ⊢ c = d is valid, i.e., |= (T ⊢ c = d), iff for all interpretations M and assignments ρ, the satisfaction relation M, ρ |= (T ⊢ c = d) holds. Whenever we write f(a1, . . . , an), the function symbol f is uninterpreted, and f(a1, . . . , an) is then said to be uninterpreted. Later on, we will extend the procedure to allow interpreted function symbols from disjoint Shostak theories such as linear arithmetic and lists. The congruence closure procedure sets up the template for the extended procedure in Section 5. The congruence closure decision procedure for pure equality has been studied by Kozen [Koz77], Shostak [Sho78], Nelson and Oppen [NO80], Downey, Sethi, and Tarjan [DST80], and, more recently, by Kapur [Kap97]. We present the congruence closure algorithm in a Shostak style, i.e., as an online algorithm for computing and using canonical forms by successively processing the input equations from the set T. For ease of presentation, we make use of variable abstraction in the style of the abstract congruence closure technique due to Bachmair, Tiwari, and Vigneron [BTV02]. Terms of the form f(a1, . . . , an) are variable-abstracted into the form f(x1, . . . , xn), where the variables x1, . . . , xn abstract the terms a1, . . . , an, respectively. The procedure shown here can be seen as a specific strategy for applying the abstract congruence closure rules. In Section 5, we make essential use of variable abstraction in the Nelson–Oppen style, where it is not merely a presentation device. Let T = {a1 = b1, . . . , an = bn} for n ≥ 0, so that T is empty when n = 0. Let x and y be metavariables that range over variables. The state of the algorithm consists of a solution state S and the input equalities T.
The solution state S will be maintained as the pair (SV ; SU ), where (l1 ; l2 ; . . . ; ln ) represents a list with n elements and semi-colon is an associative separator for list elements. The set SU then contains equalities of the form x = f (x1 , . . . , xn ) for an n-ary uninterpreted function f , and the set SV contains equalities of the form x = y between variables. We blur the distinction between the equality a = b and the singleton set {a = b}. Syntactic identity is written as a ≡ b as opposed to semantic equality a = b. A set of equalities R is functional if b ≡ c whenever a = b ∈ R and a = c ∈ R, for any a, b, and c. If R is functional, it can be used as a lookup table for obtaining the right-hand side entry corresponding to a left-hand side expression. Thus R(a) = b if a = b ∈ R, and otherwise, R(a) = a. The domain of R, dom(R)
Combining Shostak Theories
5
is defined as {a | a = b ∈ R for some b}. When R is not necessarily functional, we use R({a}) to represent the set {b | a = b ∈ R ∨ b ≡ a}, which is the image of {a} with respect to the reflexive closure of R. The inverse of R, written as R⁻¹, is the set {b = a | a = b ∈ R}. A functional set R of equalities can be applied as in R[a].

R[x] = R(x)
R[f(a1, . . . , an)] = R(f(R[a1], . . . , R[an]))
R[{a1 = b1, . . . , an = bn}] = {R[a1] = R[b1], . . . , R[an] = R[bn]}

In typical usage, R will be a solution set where the left-hand sides are all variables, so that R[a] is just the result of applying R as a substitution to a. When SV is functional, then S given by (SV ; SU) can also be used to compute the canonical form S[[a]] of a term a with respect to S. Hilbert's epsilon operator is used in the form of the when operator: F(x) when x : P(x) is an abbreviation for F(εx : P(x)), if ∃x : P(x).

S[[x]] = SV(x)
S[[f(a1, . . . , an)]] = SV(x), when x : x = f(S[[a1]], . . . , S[[an]]) ∈ SU
S[[f(a1, . . . , an)]] = f(S[[a1]], . . . , S[[an]]), otherwise.

The set SV of variable equalities will be maintained so that vars(SV) ∪ vars(SU) = dom(SV). The set SV partitions the variables in dom(SV) into equivalence classes. Two variables x and y are said to be in the same equivalence class with respect to SV if SV(x) ≡ SV(y). If R and R′ are solution sets and R′ is functional, then R ▹ R′ = {a = R′[b] | a = b ∈ R}, and R ◦ R′ = R′ ∪ (R ▹ R′). The set SV is maintained in idempotent form so that SV ◦ SV = SV. Note that SU need not be functional since it can, for example, simultaneously contain the equations x = f(y), x = f(z), and x = g(y). We assume a strict total ordering x ≺ y on variables. The operation orient(x = y) returns {x = y} if x ≺ y, and returns {y = x}, otherwise. The solution state S is said to be congruence-closed if SU({x}) ∩ SU({y}) = ∅ whenever SV(x) ≢ SV(y).
A solution set S is canonical if S is congruence-closed, SV is functional and idempotent, and SU is normalized, i.e., SU ▹ SV = SU. In order to determine if |= (T ⊢ c = d), we check if S′[[c]] ≡ S′[[d]] for S′ = process(S; T), where S = (SV ; SU), SV = id_T, id_T = {x = x | x ∈ vars(T)}, and SU = ∅. The congruence closure procedure process is defined in Figure 1.

Explanation. We explain the congruence closure procedure using the validity of the sequent f(f(f(x))) = x, x = f(f(x)) ⊢ f(x) = x as an example. Its validity will be verified by constructing a solution state S′ equal to process((SV ; SU); T) for T = {f(f(f(x))) = x, x = f(f(x))}, SV = id_T, SU = ∅, and checking S′[[f(x)]] ≡ S′[[x]]. Note that id_T is {x = x}. In processing f(f(f(x))) = x with respect to S, the canonization step, S[[f(f(f(x))) = x]]
process(S; ∅) = S
process(S; {a = b} ∪ T) = process(S′; T), where S′ = close∗(merge(abstract∗(S; S[[a = b]]))).

close(S) = merge(S; SV(x) = SV(y)), when x, y : SV(x) ≢ SV(y), SU({x}) ∩ SU({y}) ≠ ∅
close(S) = S, otherwise.

merge(S; x = x) = S
merge(S; x = y) = (SV′ ; SU′), where x ≢ y, R = orient(x = y), SV′ = SV ◦ R, SU′ = SU ▹ R.
abstract(S; x = y) = (S; x = y)
abstract(S; a = b) = (S′; a′ = b′), when S′, a′, b′, x, x1, . . . , xn :
f(x1, . . . , xn) ∈ subterms(a = b), x ∉ vars(S; a = b),
R = {x = f(x1, . . . , xn)}, S′ = (SV ∪ {x = x}; SU ∪ R),
a′ = R⁻¹[a], b′ = R⁻¹[b].

Fig. 1. Congruence closure
yields f(f(f(x))) = x, unchanged. Next, the variable abstraction step computes abstract∗(S; f(f(f(x))) = x). First f(x) is abstracted to v1, yielding the state {x = x, v1 = v1}; {v1 = f(x)}; {f(f(v1)) = x}. Variable abstraction eventually terminates, renaming f(v1) to v2 and f(v2) to v3, so that S is {x = x, v1 = v1, v2 = v2, v3 = v3}; {v1 = f(x), v2 = f(v1), v3 = f(v2)}. The variable-abstracted input equality is then v3 = x. Let orient(v3 = x) return v3 = x. Next, merge(S; v3 = x) yields the solution state {x = x, v1 = v1, v2 = v2, v3 = x}; {v1 = f(x), v2 = f(v1), v3 = f(v2)}. The congruence closure step close∗(S) leaves S unchanged since there are no variables that are merged in SU and not in SV. The next input equality x = f(f(x)) is canonized as x = v2, which can be oriented as v2 = x and merged with S to yield the new value {x = x, v1 = v1, v2 = x, v3 = x}; {v1 = f(x), v2 = f(v1), v3 = f(x)} for S. The congruence closure step close∗(S) now detects that v1 and v3 are merged in SU but not in SV and generates the equality v1 = v3. This equality is merged to yield the new value of S as {x = x, v1 = x, v2 = x, v3 = x}; {v1 = f(x), v2 = f(x), v3 = f(x)}, which is congruence-closed. With respect to this final value of the solution state S, it can be checked that S[[f(x)]] ≡ x ≡ S[[x]].

Invariants. The Shostak-style congruence closure algorithm makes heavy use of canonical forms, and this requires some key invariants to be preserved on the solution state S. If vars(SV) ∪ vars(SU) ⊆
dom(SV), then vars(SV′) ∪ vars(SU′) ⊆ dom(SV′), when S′ is either abstract(S; a = b) or close(S). If S is canonical and a′ = S[[a]], then SV[a′] = a′. If SU ▹ SV = SU, SV[a] = a, and SV[b] = b, then SU′ ▹ SV′ = SU′, where (S′; a′ = b′) is abstract(S; a = b). Similarly, if SU ▹ SV = SU, SV(x) ≡ x, and SV(y) ≡ y, then SU′ ▹ SV′ = SU′ for S′ = merge(S; x = y). If SV is functional and idempotent, then so is SV′, where S′ is either of abstract(S; a = b) or close(S). If S′ = close∗(S), then S′ is congruence-closed; and if, in addition, SV is functional and idempotent and SU is normalized, then S′ is canonical.

Variations. In the merge operation, if SU′ is computed as R[SU] instead of SU ▹ R, this would preserve the invariant that SU⁻¹ is always functional and SV[SU] = SU. If this is the case, the canonizer can be simplified to just return SU⁻¹(f(S[[a1]], . . . , S[[an]])).

Termination. The procedure process(S; T) terminates after each equality in T has been asserted into S. The operation abstract∗ terminates because each recursive call decreases the number of occurrences of function applications in the given equality a = b by at least one. The operation close∗ terminates because each invocation of the merge operation merges two distinct equivalence classes of variables in SV. The process operation terminates because the number of input equations in T decreases with each recursive call. Therefore the computation of process(S; T) terminates, returning a canonical solution set S′.

Soundness and Completeness. We need to show that |= (T ⊢ c = d) ⟺ S′[[c]] ≡ S′[[d]] for S′ = process((id_T ; ∅); T) and vars(c = d) ⊆ vars(T). We do this by showing that S′ preserves ((id_T ; ∅); T), and hence |= (T ⊢ c = d) ⟺ |= (S′ ⊢ c = d), and |= (S′ ⊢ c = d) ⟺ S′[[c]] ≡ S′[[d]]. We can easily establish that if process(S; T) = S′, then S′ preserves (S; T). If a′ = b′ is obtained from a = b by applying equality replacements from S, then (S; a′ = b′) preserves (S; a = b). In particular, |= (S ⊢ S[[c]] = c) holds. The following claims can then be easily verified.

1. (S; S[[a = b]]) preserves (S; a = b).
2. abstract(S; a = b) preserves (S; a = b).
3. merge(S; a = b) preserves (S; a = b).
4. close(S) preserves S.
The only remaining step is to show that if S′ is canonical, then |= (S′ ⊢ c = d) ⟺ S′[[c]] ≡ S′[[d]] for vars(c = d) ⊆ vars(S′). Since we know that |= (S′ ⊢ S′[[c]] = c) and |= (S′ ⊢ S′[[d]] = d), |= (S′ ⊢ c = d) follows from S′[[c]] ≡ S′[[d]]. For the only-if direction, we show that if S′[[c]] ≢ S′[[d]], then there is an interpretation MS′ and an assignment ρS′ such that MS′, ρS′ |= S′ but MS′, ρS′ ⊭ c = d. A canonical term (in S′) is a term a such that S′[[a]] ≡ a. The domain DS′ is taken to be the set of canonical terms built from the function symbols F and variables from vars(S′). We constrain MS′ so that MS′(f)(a1, . . . , an) = SV′(x) when there is an x such that x = f(a1, . . . , an) ∈ SU′, and f(a1, . . . , an), otherwise. Let ρS′ map x in vars(S′) to SV′(x); the mappings for the variables outside vars(S′) are irrelevant. It is easy to see that MS′[[c]]ρS′ = S′[[c]] by induction on
the structure of c. In particular, when S′ is canonical, MS′(f)(x1, . . . , xn) = x for x = f(x1, . . . , xn) ∈ SU′, so that one can easily verify that MS′, ρS′ |= S′. Hence, if S′[[c]] ≢ S′[[d]], then ⊭ (S′ ⊢ c = d).
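The congruence closure procedure of Section 3 can be rendered as a short runnable sketch. This is our simplified illustration, not the paper's algorithm verbatim: SV is kept as a link structure traversed by find rather than as an idempotent functional map, merges are oriented by variable name instead of a fixed ordering ≺, the initial canonization step S[[a = b]] is omitted (abstraction followed by close subsumes it), and close rescans SU to a fixpoint:

```python
from itertools import count

fresh = count(1)   # supply of abstraction variables v1, v2, ...

def find(SV, x):
    """Follow SV links to the representative of x's equivalence class."""
    while SV[x] != x:
        x = SV[x]
    return x

def abstract(SV, SU, term):
    """Variable-abstract a term bottom-up; returns a variable naming it."""
    if isinstance(term, str):
        SV.setdefault(term, term)
        return find(SV, term)
    f, *args = term
    xs = tuple(abstract(SV, SU, a) for a in args)
    v = 'v%d' % next(fresh)
    SV[v] = v
    SU.append((v, (f,) + xs))              # v = f(x1, ..., xn)
    return v

def close(SV, SU):
    """Merge variables naming congruent SU entries until a fixpoint."""
    changed = True
    while changed:
        changed = False
        seen = {}
        for x, rhs in SU:
            key = (rhs[0],) + tuple(find(SV, a) for a in rhs[1:])
            x = find(SV, x)
            y = find(SV, seen.setdefault(key, x))
            if x != y:
                SV[max(x, y)] = min(x, y)  # orient merges by name
                changed = True

def canon(SV, SU, term):
    """Canonical form of a term with respect to the closed state."""
    if isinstance(term, str):
        return find(SV, term)
    f, *args = term
    key = (f,) + tuple(canon(SV, SU, a) for a in args)
    for x, rhs in SU:
        if (rhs[0],) + tuple(find(SV, a) for a in rhs[1:]) == key:
            return find(SV, x)
    return key

def process(eqs):
    """Assert the input equalities one by one, closing after each merge."""
    SV, SU = {}, []
    for a, b in eqs:
        x, y = find(SV, abstract(SV, SU, a)), find(SV, abstract(SV, SU, b))
        if x != y:
            SV[max(x, y)] = min(x, y)
        close(SV, SU)
    return SV, SU

# The worked example: f(f(f(x))) = x, x = f(f(x))  entails  f(x) = x
SV, SU = process([(('f', ('f', ('f', 'x'))), 'x'),
                  ('x', ('f', ('f', 'x')))])
assert canon(SV, SU, ('f', 'x')) == canon(SV, SU, 'x')
```

On the worked example, the state evolves just as in the Explanation above: v1, v2, v3 abstract the nested applications of f, and close merges all of them with x.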
4 Shostak Theories
A Shostak theory [Sho84] is a theory that is canonizable and solvable. We assume a collection of Shostak theories θ1, . . . , θN. In this section, we give a decision procedure for a single Shostak theory θi, but with i as a parameter. This background material is adapted from Shankar [Sha01]. Satisfiability M, ρ |= a = b is with respect to i-models M. The equality a = b is i-valid, i.e., |=i a = b, if for all i-models M and assignments ρ, M[[a]]ρ = M[[b]]ρ. Similarly, a = b is i-unsatisfiable, i.e., |=i a ≠ b, when for all i-models M and assignments ρ, M[[a]]ρ ≠ M[[b]]ρ. An i-term a is a term whose function symbols all belong to θi and vars(a) ⊆ X ∪ Xi. A canonizable theory θi admits a computable operation σi on terms such that |=i a = b iff σi(a) ≡ σi(b), for i-terms a and b. An i-term a is canonical if σi(a) ≡ a. Additionally, vars(σi(a)) ⊆ vars(a), and every subterm of σi(a) must be canonical. For example, a canonizer for the theory θA of linear arithmetic can be defined to convert expressions into an ordered sum-of-monomials form. Then, σA(y + x + x) ≡ 2 ∗ x + y ≡ σA(x + y + x). A solvable theory admits a procedure solve_i on equalities such that solve_i(Y)(a = b), for a set of variables Y with vars(a = b) ⊆ Y, returns a solved form for a = b, as explained below. solve_i(Y)(a = b) might contain fresh variables that do not appear in Y. A functional solution set R is in i-solved form if it is of the form {x1 = t1, . . . , xn = tn}, where for each j, 1 ≤ j ≤ n, tj is a canonical i-term, σi(tj) ≡ tj, and vars(tj) ∩ dom(R) = ∅ unless tj ≡ xj. The i-solved form solve_i(Y)(a = b) is either ⊥i, when |=i a ≠ b, or is a solution set of equalities which is the union of sets R1 and R2. The set R1 is the solved form {x1 = t1, . . . , xn = tn} with xj ∈ vars(a = b) for 1 ≤ j ≤ n, and for any i-model M and assignment ρ, we have that M, ρ |= a = b iff there is a ρ′ extending ρ over vars(solve_i(Y)(a = b)) − Y such that M, ρ′ |= xj = tj, for 1 ≤ j ≤ n.
The set R2 is just {x = x | x ∈ vars(R1) − Y} and is included in order to preserve variables. In other words, solvei(Y)(a = b) i-preserves a = b. For example, a solver for linear arithmetic can be constructed to isolate a variable on one side of the equality through scaling and cancellation. We assume that the fresh variables generated by solvei are from the set Xi. We take vars(⊥i) to be X ∪ Xi so as to maintain variable preservation, and indeed ⊥i could be represented as just ⊥ were it not for this condition. We now describe a decision procedure for sequents of the form T ⊢ c = d in a single Shostak theory with canonizer σi and solver solvei. Here the solution state S is just a functional solution set of equalities in i-solved form. Given a solution set S, we define S⟨a⟩i as σi(S[a]). The composition of solution sets is defined so that S ◦i ⊥i = ⊥i ◦i S = ⊥i and S ◦i R = R ∪ {a = R⟨b⟩i | a = b ∈ S}. Note
Combining Shostak Theories
9
that solved forms are idempotent with respect to composition, so that S ◦i S = S. The solved form solveclosei(idT; T) is obtained by processing the equations in T to build up a solution set S. An equation a = b is first canonized with respect to S as S⟨a⟩i = S⟨b⟩i and then solved to yield the solution R. If R is ⊥i, then T is i-unsatisfiable and we return the solution state with Si = ⊥i as the result. Otherwise, the composition S ◦i R is computed and used to similarly process the remaining formulas in T.

solveclosei(S; ∅) = S
solveclosei(⊥i; T) = ⊥i
solveclosei(S; {a = b} ∪ T) = solveclosei(S′; T),
  where S′ = S ◦i solvei(vars(S))(S⟨a⟩i = S⟨b⟩i)

To check i-validity, ⊨i (T ⊢ c = d), it is sufficient to check that either solveclosei(idT; T) = ⊥i or S′⟨c⟩i ≡ S′⟨d⟩i, where S′ = solveclosei(idT; T). Soundness and Completeness. As with the congruence closure procedure, each step in solveclosei is i-conservative. Hence solveclosei is sound and complete: if S′ = solveclosei(S; T), then for every i-model M and assignment ρ, M, ρ ⊨ S ∪ T iff there is a ρ′ extending ρ over the variables in vars(S′) − vars(S) such that M, ρ′ ⊨ S′. If σi(S′[a]) ≡ σi(S′[b]), then M, ρ ⊨ a = S′[a] = σi(S′[a]) = σi(S′[b]) = S′[b] = b, and hence M, ρ ⊨ a = b. Otherwise, when σi(S′[a]) ≢ σi(S′[b]), we know by the condition on σi that there is an i-model M′ and an assignment ρ′ such that M′[[S′[a]]]ρ′ ≠ M′[[S′[b]]]ρ′. The solved form S′ divides the variables into independent variables x such that S′(x) = x, and dependent variables y where y ≠ S′(y) and the variables in vars(S′(y)) are all independent. We can therefore extend ρ′ to an assignment ρ′′ where the dependent variables y are mapped to M′[[S′(y)]]ρ′. Clearly, M′, ρ′′ ⊨ S′, M′, ρ′′ ⊨ a = S′[a], and M′, ρ′′ ⊨ b = S′[b]. Since S′ i-preserves (idT; T), M′, ρ′′ ⊨ T but M′, ρ′′ ⊭ a = b, and hence T ⊢ a = b is not i-valid, so the procedure is complete.
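A solver in the style of solveA, isolating a variable through scaling and cancellation as described earlier, might be sketched as follows. The dict representation of linear forms (variables mapped to coefficients, with the constant under the key 1) and the 'bot' token standing in for ⊥ are assumptions of this sketch:

```python
from fractions import Fraction

def solve_arith(lhs, rhs):
    """Solve a linear equation for one variable, sketching solve_A.

    lhs and rhs map variable names to integer coefficients, with an
    optional constant under the key 1. Returns None for a trivial
    equation, 'bot' (standing in for ⊥) when unsatisfiable, and
    otherwise a pair (x, solution) with the isolated variable and its
    solution in the same representation.
    """
    diff = dict(lhs)
    for v, n in rhs.items():           # move everything to one side
        diff[v] = diff.get(v, 0) - n
    diff = {v: n for v, n in diff.items() if n != 0}
    vars_ = [v for v in diff if v != 1]
    if not vars_:
        return None if not diff else 'bot'   # 0 = 0 versus 0 = c, c != 0
    x = min(vars_)                     # pick a variable to isolate
    cx = diff.pop(x)
    # cx * x + rest = 0, hence x = -(rest) / cx
    return x, {v: Fraction(-n, cx) for v, n in diff.items()}
```

For instance, solving 5 + u = w + 3 for u (with u and w standing in for the abstracted car/cdr subterms of Shostak's example) yields u = w − 2.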
The correctness argument is thus similar to that of Section 3, but for the case of a single Shostak theory considered here, there is no need to construct a canonical term model, since ⊨i a = σi(a), and σi(a) ≡ σi(b) iff ⊨i a = b. Canonical term model. The situation is different when we wish to combine Shostak theories. It is important to resolve potential semantic incompatibilities between two Shostak theories. With respect to some fixed notion of i-validity for θi and j-validity for θj with i ≠ j, a formula A in the union of θi and θj may be satisfiable in an i-interpretation of only a specific finite cardinality for which there might be no corresponding satisfying j-interpretation for the formula. Such an incompatibility can arise even when a theory θi is extended with uninterpreted function symbols. For example, if φ is a formula with variables x and y that is satisfiable only in a two-element model M where ρ(x) ≠ ρ(y), then the set of formulas Γ, where Γ = {φ, f(x) = x, f(u) = y, f(y) = x}, additionally requires ρ(x) ≠ ρ(u) and ρ(y) ≠ ρ(u). Hence, a model for Γ must have at least
10
N. Shankar and H. Rueß
three elements, so that Γ is unsatisfiable. However, there is no way to detect this kind of unsatisfiability purely through the use of solving and canonization. We introduce a canonical term model as a way around such semantic incompatibilities. The set of canonical i-terms a such that σi(a) ≡ a yields a domain for a term model Mi where Mi(f)(a1, . . . , an) = σi(f(a1, . . . , an)). If Mi is (isomorphic to) an i-model, then we say that the theory θi is composable. Note that the solve operation is conservative with respect to the model Mi as well, since Mi is taken as an i-model. Given the usual interpretation of disjunction, a notion of validity is said to be convex when ⊨ (T ⊢ c1 = d1 ∨ . . . ∨ cn = dn) implies ⊨ (T ⊢ ck = dk) for some k, 1 ≤ k ≤ n. If a theory θi is composable, then i-validity is convex. Recall that ⊨i (T ⊢ c1 = d1 ∨ . . . ∨ cn = dn) iff ⊨i (S ⊢ c1 = d1 ∨ . . . ∨ cn = dn) for S = solveclosei(idT; T). If S = ⊥i, then ⊨i (T ⊢ ck = dk), for 1 ≤ k ≤ n. If S ≠ ⊥i, then since S i-preserves T, ⊨i (S ⊢ c1 = d1 ∨ . . . ∨ cn = dn), but suppose (for a contradiction) that ⊭i (S ⊢ ck = dk) for each k, 1 ≤ k ≤ n. An assignment ρS can be constructed so that for independent (i.e., where S(x) = x) variables x ∈ vars(S), ρS(x) = x, and for dependent variables y ∈ vars(S), ρS(y) = Mi[[S(y)]]ρS. Since, for S ≠ ⊥i, ⊭i (S ⊢ ck = dk), we have Mi, ρS ⊨ S and Mi, ρS ⊭ ck = dk. Hence Mi, ρS ⊭ (S ⊢ ck = dk), for 1 ≤ k ≤ n. This yields Mi, ρS ⊭ (T ⊢ c1 = d1 ∨ . . . ∨ cn = dn), contradicting the assumption.
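The solveclosei recursion defined above can be sketched generically in Python. Here canon, solve, and compose are assumed stand-ins for S⟨·⟩i, solvei, and ◦i respectively, not the paper's concrete operations; the token 'bot' stands in for ⊥i:

```python
def solveclose(canon, solve, eqns):
    """Process a list of equations into a solved form S.

    canon(S, t) plays the role of S<t>_i (canonize t after applying S's
    solutions); solve(S, a, b) returns a solved set of equations, {} for
    a trivial equation, or 'bot' when unsatisfiable. compose implements
    S o_i R = R union {x = R<t>_i | x = t in S}.
    """
    def compose(S, R):
        if R == 'bot':
            return 'bot'
        S2 = {x: canon(R, t) for x, t in S.items()}  # push R into S's rhs
        S2.update(R)
        return S2
    S = {}
    for a, b in eqns:
        if S == 'bot':
            return 'bot'
        R = solve(S, canon(S, a), canon(S, b))
        S = compose(S, R)
    return S
```

For instance, instantiated with the deliberately trivial theory where canon(S, t) is S.get(t, t) and solve returns {a: b} for distinct variables, processing x = y and then y = z yields the solved form {x = z, y = z}.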
5
Combining Shostak Theories
We now examine the combination of the theory of equality over uninterpreted function symbols with several disjoint Shostak theories. Examples of interpreted operations from Shostak theories include + and − from the theory of linear arithmetic, select and update from the theory of arrays, and cons, car, and cdr from the theory of lists. The basic Shostak combination algorithm covers the union of equality over uninterpreted function symbols and a single canonizable and solvable equational theory [Sho84,CLS96,RS01]. Shostak [Sho84] had claimed that the basic combination algorithm was sufficient because canonizers and solvers for disjoint theories could be combined into a single canonizer and solver for their union. This claim is incorrect.3 We present a combined decision procedure for multiple Shostak theories that overcomes the difficulty of combining solvers. Two theories θ1 and θ2 are said to be disjoint if they have no function symbols in common. A typical subgoal in a proof can involve interpreted symbols from several theories. Let σi be the canonizer for θi. A term f(a1, . . . , an) is said to be in θi if f is in θi, even though some ai might contain function symbols outside θi. In processing terms from the union of pairwise disjoint theories θ1, . . . , θN, it is quite easy to combine the canonizers so that each theory treats terms in the other theories as variables. Since σi is only applicable to i-terms, we first
The difficulty with combining Shostak solvers was observed by Jeremy Levitt [Lev99].
have to extend the canonizer σi to treat terms in θj, for j ≠ i, as variables. We treat uninterpreted function symbols as belonging to a special theory θ0 where σ0(a) = a for a ∈ θ0. The extended operation σ′i is defined below.

σ′i(a) = R[σi(a′)], when a′, R : a′ is an i-term, R is functional, dom(R) ⊆ vars(a′), R(x) ∈ θj, for x ∈ dom(R) and some j ≠ i, and R[a′] ≡ a

Note that the when condition in the above definition can always be satisfied. The combined canonizer σ can then be defined as

σ(x) = x
σ(f(a1, . . . , an)) = σ′i(f(σ(a1), . . . , σ(an))), when i : f is in θi.

This canonizer is, however, not used in the remainder of the paper. We now discuss the difficulty of combining the solvers solve1 and solve2 for θ1 and θ2, respectively, into a single solver. The example uses the theory θA of linear arithmetic and the theory θL of the pairing and projection operations cons, car, cdr, where, somewhat nonsensically, the projection operations also apply to numerical expressions. Shostak illustrated the combination using the example 5 + car(x + 2) = cdr(x + 1) + 3. Since the top-level operation on the left-hand side is +, we can treat car(x + 2) and cdr(x + 1) as variables and use solveA. This might yield a partially solved equation of the form car(x + 2) = cdr(x + 1) − 2. Now, since the top-level operation on the left-hand side is from the theory of lists, we use solveL to obtain x + 2 = cons(cdr(x + 1) − 2, u) with a fresh variable u. We once again apply solveA to obtain x = cons(cdr(x + 1) − 2, u) − 2. This is, however, not in solved form: the left-hand side variable occurs in an interpreted context in its solution. There is no way to prevent this from happening as long as each solver treats terms from another theory as variables. Therefore, the union of Shostak theories is not necessarily a Shostak theory. The problem of combining disjoint Shostak theories actually has a very simple solution: there is no need to combine solvers.
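The extension σ′i, which abstracts maximal alien subterms away before canonizing and then restores them, can be sketched in Python. The tuple term representation and the canon_i and in_theory_i interfaces are assumptions of this sketch:

```python
import itertools

def extend_canonizer(canon_i, in_theory_i):
    """Lift a canonizer for pure i-terms to mixed terms, sketching sigma'_i.

    Maximal alien subterms (head symbol outside theta_i) are replaced by
    placeholder variables (playing the role of the functional
    substitution R in the definition), canon_i is applied to the
    resulting i-term, and the aliens are restored. Terms are variable
    strings or ('f', arg, ...) tuples.
    """
    def extended(a):
        fresh = (f'_v{n}' for n in itertools.count())
        to_alien = {}   # placeholder -> alien subterm (the inverse of R)
        cache = {}      # alien subterm -> placeholder, reused per occurrence

        def abstract(t):
            if isinstance(t, str):
                return t
            if in_theory_i(t[0]):
                return (t[0],) + tuple(abstract(s) for s in t[1:])
            if t not in cache:           # alien: abstract it away, once
                x = next(fresh)
                cache[t] = x
                to_alien[x] = t
            return cache[t]

        def restore(t):
            if isinstance(t, str):
                return to_alien.get(t, t)
            return (t[0],) + tuple(restore(s) for s in t[1:])

        return restore(canon_i(abstract(a)))
    return extended
```

Instantiated with a toy canonizer that sorts the arguments of +, the mixed terms a + car(x) and car(x) + a then canonize identically, with car(x) treated as a variable during canonization.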
Since the theories are disjoint, the canonizer can tolerate multiple solutions for the same variable as long as there is at most one solution from any individual theory. This can be illustrated on the same example: 5 + car (x + 2) = cdr (x + 1) + 3. By variable abstraction, we obtain the equation v3 = v6 , where v1 = x + 2, v2 = car (v1 ), v3 = v2 + 5, v4 = x + 1, v5 = cdr (v4 ), v6 = v5 + 3. We can separate these equations out into the respective theories so that S is (SV ; SU ; SA ; SL ), where SV contains the variable equalities in canonical form, SU is as in congruence closure but is always ∅ since there are no uninterpreted operations in this example, and SA and SL are the solution sets for θA and θL , respectively. We then get SV = {x = x, v1 = v1 , v2 =
v2, v3 = v6, v4 = v4, v5 = v5, v6 = v6}, SA = {v1 = x + 2, v3 = v2 + 5, v4 = x + 1, v6 = v5 + 3}, and SL = {v2 = car(v1), v5 = cdr(v4)}. Since v3 and v6 are merged in SV, but not in SA, we solve the equality between SA(v3) and SA(v6), i.e., solveA(v2 + 5 = v5 + 3), to get v2 = v5 − 2. This result is composed with SA to get {v1 = x + 2, v3 = v5 + 3, v4 = x + 1, v6 = v5 + 3, v2 = v5 − 2} for SA. There are no new variable equalities to be propagated out of either SA, SL, or SV. Notice that v2 and v5 both have different solved forms in SA and SL. This is tolerated since the solutions are from disjoint theories and the canonizer can pick a solution that is appropriate to the context. For example, when canonizing a term of the form f(x) for f ∈ θi, it is clear that the only relevant solution for x is the one from Si. We can now check whether the resulting solution state verifies the original equation 5 + car(x + 2) = cdr(x + 1) + 3. In canonizing f(a1, . . . , an), we return SV(y) whenever the term f(Si(S[[a1]]), . . . , Si(S[[an]])) being canonized is such that y = f(Si(S[[a1]]), . . . , Si(S[[an]])) ∈ Si for f ∈ θi. Thus x + 2 canonizes to v1 using SA, and car(v1) canonizes to v2 using SL. The resulting term 5 + v2, using the solution for v2 from SA, simplifies to v5 + 3, which returns the canonical form v6 by using SA. On the right-hand side, x + 1 is equivalent to v4 in SA, and cdr(v4) simplifies to v5 using SL. The right-hand side therefore simplifies to v5 + 3, which is canonized to v6 using SA. The canonized left-hand and right-hand sides are identical. We present a formal description of the procedure used informally in the above example. We show how process from Section 3 can be extended to combine the union of disjoint solvable, canonizable, composable theories. We assume that there are N disjoint theories θ1, . . . , θN. Each theory θi is equipped with a canonizer σi and solver solvei for i-terms.
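The inverse-lookup step of the canonization just illustrated, returning the canonical variable for a right-hand side already recorded in some Si, might be sketched as follows. The dict representations and the theory_of and canonizers interfaces are assumptions of this sketch, and the instantiation of Si's solutions into arguments (the Si(S[[ak]]) step) is elided:

```python
def global_canonizer(SV, sols, theory_of, canonizers):
    """Sketch of the global canonizer S[[.]] over split solution sets.

    SV maps each variable to its canonical representative. sols[i] is
    an inverse lookup for theory i, mapping a canonized right-hand side
    to the variable solved by it, so that known subterms canonize back
    to their canonical variable. Terms are variable strings or
    ('f', arg, ...) tuples.
    """
    def canon(t):
        if isinstance(t, str):
            return SV.get(t, t)
        i = theory_of(t[0])
        # Canonize the arguments, then the term itself, within theory i.
        u = canonizers[i]((t[0],) + tuple(canon(s) for s in t[1:]))
        if u in sols[i]:
            x = sols[i][u]        # some equation x = u is already in S_i
            return SV.get(x, x)
        return u
    return canon
```

For instance, with v2 = car(v1) recorded in the list-theory solution set, canonizing car(v1) returns the canonical variable v2 rather than the term itself.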
If we let I represent the interval [1, N], then an I-model is a model M that is an i-model for each i ∈ I. We will ensure that each inference step is conservative with respect to I-models, i.e., I-conservative. We represent the uninterpreted part of S as S0 instead of SU. The solution state S of the algorithm now consists of a list of sets of equations (SV; S0; S1; . . . ; SN). Here SV is a set of variable equations of the form x = y, and S0 is the set of equations of the form x = f(x1, . . . , xn) where f is uninterpreted. Each Si is in i-solved form and is the solution set for θi. Terms now contain a mixture of function symbols that are uninterpreted or are interpreted in one of the theories θi. A solution state S is confluent if for all x, y ∈ dom(SV) and i, 0 ≤ i ≤ N: SV(x) ≡ SV(y) ⟺ Si({x}) ∩ Si({y}) ≠ ∅. A solution state S is canonical if it is confluent; SV is functional and idempotent, i.e., SV ◦ SV = SV; the uninterpreted solution set S0 is normalized, i.e., S0 SV = S0; and each Si, for i > 0, is functional, idempotent, i.e., Si ◦i Si = Si, normalized, i.e., Si SV = Si, and in i-solved form. The canonization of expressions with respect to a canonical solution state S is defined as follows.

S[[x]] = SV(x)
S[[f(a1, . . . , an)]] = SV(x), when i, x :
abstract(S; x = y) = (S; x = y)
abstract(S; a = b) = (S′; a′ = b′), when S′, c, i, x :
  c ∈ max(⟨a = b⟩i), x ∉ vars(S ∪ {a = b}),
  S′V = SV ∪ {x = x}, S′i = Si ∪ {x = c}, S′j = Sj, for j ≠ i,
  a′ = {c = x}[a], b′ = {c = x}[b].
Fig. 2. Variable abstraction step for multiple Shostak theories
i ≥ 0, f ∈ θi , x = σi (f (Si (S[[a1 ]]), . . . , Si (S[[an ]]))) ∈ Si
S[[f(a1, . . . , an)]] = σi(f(Si(S[[a1]]), . . . , Si(S[[an]]))), when i : f ∈ θi, i ≥ 0.

Since variables are used to communicate between the different theories, the canonical variable x in SV is returned when the term being canonized is known to be equivalent to an expression a such that y = a is in Si, where x ≡ SV(y). The definition of the above global canonizer is one of the key contributions of this paper. This definition can be applied to the example above of computing S[[5 + car(x + 2)]]. Variable Abstraction. The variable abstraction procedure abstract(S; a = b) is shown in Figure 2. If a is an i-term such that a ∉ X, then a is said to be a pure i-term. Let ⟨a = b⟩i represent the set of subterms of a = b that are pure i-terms. The set max(M) of maximal terms in M is defined to be {a ∈ M | a ≡ b ∨ a ∉ b, for any b ∈ M}. In a single variable abstraction step, abstract(S; a = b) picks a maximal pure i-subterm c from the canonized input equality a = b, and replaces it with a fresh variable x from X while adding x = c to Si. By abstracting a maximal pure i-term, we ensure that Si remains in i-solved form. Explanation. The procedure in Figure 3 is similar to that of Figure 1. Equations from the input set T are processed into the solution state S of the form (SV; S0; . . . ; SN). Initially, S must be canonical. In processing the input equation a = b into S, we take steps to systematically restore the canonicity of S. The first step is to compute the canonical form S[[a = b]] of a = b with respect to S. It is easy to see that (S; S[[a = b]]) I-preserves (S; a = b). The result a′ = b′ of the canonization step is then variable abstracted as abstract∗(S; a′ = b′) (shown in Figure 2) so that in each step, a maximal pure i-subterm c of a′ = b′ is replaced by a fresh variable x, and the equality x = c is added to Si. This is also easily seen to be an I-conservative step.
The equality x = y resulting from the variable abstraction of a = b is then merged into SV and S0 . This can destroy confluence since there may be variables w and z such
process(S; ∅) = S
process(S; T) = S, when i : Si = ⊥i
process(S; {a = b} ∪ T) = process(S′; T),
  where S′ = close∗(mergeV(abstract∗(S; S[[a = b]]))).

close(S) = S, when i : Si = ⊥i
close(S) = S′, when S′, i, x, y : x, y ∈ dom(SV),
  (i > 0, SV(x) ≡ SV(y), Si(x) ≢ Si(y), and S′ = mergei(S; x = y)) or
  (i ≥ 0, SV(x) ≢ SV(y), Si({x}) ∩ Si({y}) ≠ ∅, and S′ = mergeV(S; SV(x) = SV(y)))
close(S) = normalize(S), otherwise.

normalize(S) = (SV; S0; S1 SV; . . . ; SN SV).

mergei(S; x = y) = S′, where i > 0,
  S′i = Si ◦i solvei(vars(Si))(Si(x) = Si(y)),
  S′j = Sj, for j ≠ i, S′V = SV.

mergeV(S; x = x) = S
mergeV(S; x = y) = (SV ◦ R; S0 R; S1; . . . ; SN), where R = orient(x = y).

Fig. 3. Combining Multiple Shostak Theories
that w and z are merged in SV (i.e., SV(w) ≡ SV(z)) but unmerged in some Si (i.e., Si({w}) ∩ Si({z}) = ∅), or vice versa.4 The number of variables in dom(SV) remains fixed during the computation of close∗(S). Confluence is restored by close∗(S), which finds a pair of variables that are merged in some Si but not in SV and merges them in SV, or that are merged in SV but not in some Si and merges them in Si. Each such merge step is also I-conservative. When this process terminates, S is once again canonical. The solution sets Si are normalized with respect to SV in order to ensure that the entries are in the normalized form for lookup during canonization. Invariants. As with congruence closure, several key invariants are needed to ensure that the solution state S is maintained in canonical form whenever it is given as the argument to process. If S is canonical and a and b are canonical with respect to S, then for (S′; a′ = b′) = abstract(S; a = b), S′ is canonical, and a′ and b′ are canonical with respect to S′. The state abstract(S; a = b) I-preserves (S; a = b). A solution state is said to be well-formed if SV is functional and idempotent, S0 is normalized, and each Si is functional, idempotent, and in
For i > 0, Si is maintained in i-solved form and hence, Si ({x}) = {x, Si (x)}.
solved form. Note that if S is well-formed, confluent, and each Si is normalized, then it is canonical. When S is well-formed, and S′ = mergeV(S; x = y) or S′ = mergei(S; x = y), then S′ is well-formed and I-preserves (S; x = y). If S is well-formed and congruence-closed, and S′ = normalize(S), then S′ is well-formed and each S′i is normalized. If S′ = normalize(S), then each S′i is in solved form because if x replaces y on the right-hand side of a solution set Si, then Si(y) ≡ y, since Si is in i-solved form. By congruence closure, we already have that Si(x) ≡ Si(y) ≡ y. Therefore, the uniform replacement of y by x ensures that S′i(x) ≡ x, thus leaving S′ in solved form. If S′ = close∗(S), where S is well-formed, then S′ is canonical. Variations. As with congruence closure, once S is confluent, it is safe to strengthen the normalization step to replace each Si by SV[Si]. This renders S0⁻¹ functional, but Si⁻¹ may still be non-functional for i > 0, since it might contain left-hand side variables that are local. However, if Ŝi is taken to be Si restricted to dom(SV), then Ŝi⁻¹ with the strengthened normalization is functional and can be used in canonization. The solutions for local variables can be safely discarded in an actual implementation. The canonization and variable abstraction steps can be combined within a single recursion. Termination. The operations S[[a = b]] and abstract∗(S; a = b) are easily seen to be terminating. The operation close∗(S) also terminates because the sum of the number of equivalence classes of variables in dom(SV) with respect to each of the solution sets SV, S0, S1, . . . , SN, decreases with each merge operation. Soundness and Completeness. We have already seen that each of the steps: canonization, variable abstraction, composition, merging, and normalization, is I-conservative. It therefore follows that if S′ = process(S; T), then S′ I-preserves S.
Hence, if S′[[c]] ≡ S′[[d]], then clearly ⊨I (S′ ⊢ c = d), and hence ⊨I (S; T ⊢ c = d). The completeness argument requires the demonstration that if S′[[c]] ≢ S′[[d]], then ⊭I (S′ ⊢ c = d) when S′ is canonical. This is done by means of a construction of MS′ and ρS′ such that MS′, ρS′ ⊨ S′ but MS′, ρS′ ⊭ c = d. The domain D consists of canonical terms e such that S′[[e]] = e. As with congruence closure, MS′ is defined so that MS′(f)(e1, . . . , en) = S′[[f(e1, . . . , en)]]. The assignment ρS′ is defined so that ρS′(x) = S′V(x). By induction on c, we have that MS′[[c]]ρS′ = S′[[c]]. We can also easily check that MS′, ρS′ ⊨ S′. It is also the case that MS′ is an I-model, since MS′ is isomorphic to Mi for each i, 1 ≤ i ≤ N. This can be demonstrated by constructing a bijective map µi between D and the domain Di corresponding to Mi. Let Pi be the set of pure i-terms in D, and let γ be a bijection between D − Pi and X such that γ(x) = x if S′i(x) = x for x ∈ dom(S′V). Define µi so that µi(x) = S′i(x) for x ∈ dom(S′V) and S′V(x) = x, µi(y) = y for y ∈ Xi, µi(f(a1, . . . , an)) = f(µi(a1), . . . , µi(an)) for f ∈ θi, and µi(a) = γ(a), otherwise. It can then be verified that for an i-term a, µi(MS′[[a]]ρ) = Mi[[a]]ρi, where ρi(x) = µi(ρ(x)). This concludes the proof of completeness.
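The confluence-restoring close∗ loop described above can be sketched as follows. This is a deliberately simplified sketch: the dict representations are assumed, and the solving and composition inside mergei are collapsed into directly equating solutions:

```python
def close(SV, theories):
    """Restore confluence between SV and the per-theory solution sets.

    SV maps each variable to its representative; each element of
    theories maps variables to their solved terms. Pairs of variables
    merged in one structure but not the other are merged in both,
    mirroring the close* loop (the solve/compose steps of merge_i are
    elided and replaced by directly equating the solutions).
    """
    changed = True
    while changed:
        changed = False
        for Si in theories:
            xs = [x for x in Si if x in SV]
            for a in xs:
                for b in xs:
                    merged_v = SV[a] == SV[b]
                    merged_i = Si[a] == Si[b]
                    if merged_v and not merged_i:
                        Si[b] = Si[a]        # merge in S_i (simplified)
                        changed = True
                    elif merged_i and not merged_v:
                        # merge in SV: redirect b's class to a's
                        old, new = SV[b], SV[a]
                        for v in SV:
                            if SV[v] == old:
                                SV[v] = new
                        changed = True
    return SV, theories
```

For instance, if x and y have the same solution in some Si but distinct representatives in SV, the loop merges their SV classes, after which no further merges apply and the state is confluent.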
Convexity revisited. As in Section 4, the term model construction of MS′ once again establishes that I-validity is convex. In other words, ⊨I (T ⊢ c1 = d1 ∨ . . . ∨ cn = dn) iff ⊨I (T ⊢ ck = dk) for some k, 1 ≤ k ≤ n.
6
Conclusions
Ground decision procedures for equality are crucial for discharging the myriad proof obligations that arise in numerous applications of automated reasoning. These goals typically contain operations from a combination of theories, including uninterpreted symbols. Shostak’s basic method deals only with the combination of a single canonizable, solvable theory with equality over uninterpreted function symbols. Indeed, in all previous work based on Shostak’s method, only the basic combination is considered. Though Shostak asserted that the basic combination was adequate to cover the more general case of multiple Shostak theories, this claim has turned out to be unsubstantiated. We have given here the first Shostak-style combination method for the general case of multiple Shostak theories. The algorithm is quite simple and is supported by straightforward arguments for termination, soundness, and completeness. Shostak’s combination method, as we have described it, is clearly an instance of a Nelson–Oppen combination [NO79] since it involves the exchange of equalities between variables through the solution set SV . The added advantage of a Shostak combination is that it combines the canonizers of the individual theories into a global canonizer. The definition of such a canonizer for multiple Shostak theories is the key contribution of this paper. The technique of achieving confluence across the different solution sets is unique to our method. Confluence is needed for obtaining useful canonical forms, and is therefore not essential in a general Nelson–Oppen combination. The global canonizer S[[a]] can be applied to input formulas to discharge queries and simplify input formulas. The reduction to canonical form with respect to the given equalities helps keep the size of the term universe small, and makes the algorithm more efficient than a black box Nelson–Oppen combination. 
The decision algorithm for a Shostak theory given in Section 4 fits the requirements for a black box procedure that can be used within a Nelson–Oppen combination. The Nelson–Oppen combination of Shostak theories with other decision procedures has been studied by Tiwari [Tiw00], Barrett, Dill, and Stump [BDS02], and Ganzinger [Gan02], but none of these methods includes a general canonization procedure as is required for a Shostak combination. Variable abstraction is also used in the combination unification procedure of Baader and Schulz [BS96], which addresses a similar problem to that of combining Shostak solvers. In our case, there is no need to ensure that solutions are compatible across distinct theories. Furthermore, variable dependencies can be cyclic across theories so that it is possible to have y ∈ vars(Si(x)) and x ∈ vars(Sj(y)) for i ≠ j. Our algorithm can be easily and usefully adapted
for combining unification and matching algorithms with constraint solving in Shostak theories. Insights derived from the Nelson–Oppen combination method have been crucial in the design of our algorithm and its proof. Our presentation here is different from that of our previous algorithm for the basic Shostak combination [RS01] in the use of variable abstraction and the theory-wise separation of solution sets. Our proof of the basic algorithm additionally demonstrated the existence of proof objects in a sound and complete proof system. This can easily be replicated for the general algorithm studied here. The soundness and completeness proofs given here are for composable theories and avoid the use of σ-models. Our Shostak-style algorithm fits modularly within the Nelson–Oppen framework. It can be employed within a Nelson–Oppen combination (as suggested by Rushby [CLS96]) in which there are other decision procedures that generate equalities between variables. It is also possible to combine it with decision procedures that are not disjoint, as for example with linear arithmetic inequalities. Here, the existence of a canonizer with respect to equality is useful for representing inequality information in a canonical form. A variant of the procedure described here is implemented in ICS [FORS01] in exactly such a combination.
References

[BDL96] Clark Barrett, David Dill, and Jeremy Levitt. Validity checking for combinations of theories with equality. In Mandayam Srivas and Albert Camilleri, editors, Formal Methods in Computer-Aided Design (FMCAD '96), volume 1166 of Lecture Notes in Computer Science, pages 187–201, Palo Alto, CA, November 1996. Springer-Verlag.
[BDS02] Clark W. Barrett, David L. Dill, and Aaron Stump. A generalization of Shostak's method for combining decision procedures. In A. Armando, editor, Frontiers of Combining Systems, 4th International Workshop, FroCos 2002, number 2309 in Lecture Notes in Artificial Intelligence, pages 132–146, Berlin, Germany, April 2002. Springer-Verlag.
[Bjø99] Nikolaj Bjørner. Integrating Decision Procedures for Temporal Verification. PhD thesis, Stanford University, 1999.
[BS96] F. Baader and K. Schulz. Unification in the union of disjoint equational theories: Combining decision procedures. J. Symbolic Computation, 21:211–243, 1996.
[BTV02] Leo Bachmair, Ashish Tiwari, and Laurent Vigneron. Abstract congruence closure. Journal of Automated Reasoning, 2002. To appear.
[CLS96] David Cyrluk, Patrick Lincoln, and N. Shankar. On Shostak's decision procedure for combinations of theories. In M. A. McRobbie and J. K. Slaney, editors, Automated Deduction—CADE-13, volume 1104 of Lecture Notes in Artificial Intelligence, pages 463–477, New Brunswick, NJ, July/August 1996. Springer-Verlag.
[DST80] P. J. Downey, R. Sethi, and R. E. Tarjan. Variations on the common subexpressions problem. Journal of the ACM, 27(4):758–771, 1980.
[FORS01] J.-C. Filliâtre, S. Owre, H. Rueß, and N. Shankar. ICS: Integrated Canonization and Solving. In G. Berry, H. Comon, and A. Finkel, editors, Computer-Aided Verification, CAV 2001, volume 2102 of Lecture Notes in Computer Science, pages 246–249, Paris, France, July 2001. Springer-Verlag.
[FS02] Jonathan Ford and Natarajan Shankar. Formal verification of a combination decision procedure. In A. Voronkov, editor, Proceedings of CADE-19, Berlin, Germany, 2002. Springer-Verlag.
[Gan02] Harald Ganzinger. Shostak light. In A. Voronkov, editor, Proceedings of CADE-19, Berlin, Germany, 2002. Springer-Verlag.
[Kap97] Deepak Kapur. Shostak's congruence closure as completion. In H. Comon, editor, International Conference on Rewriting Techniques and Applications, RTA '97, number 1232 in Lecture Notes in Computer Science, pages 23–37, Berlin, 1997. Springer-Verlag.
[Koz77] Dexter Kozen. Complexity of finitely presented algebras. In Conference Record of the Ninth Annual ACM Symposium on Theory of Computing, pages 164–177, Boulder, Colorado, 2–4 May 1977.
[Lev99] Jeremy R. Levitt. Formal Verification Techniques for Digital Systems. PhD thesis, Stanford University, 1999.
[NO79] G. Nelson and D. C. Oppen. Simplification by cooperating decision procedures. ACM Transactions on Programming Languages and Systems, 1(2):245–257, 1979.
[NO80] G. Nelson and D. C. Oppen. Fast decision procedures based on congruence closure. Journal of the ACM, 27(2):356–364, 1980.
[RS01] Harald Rueß and Natarajan Shankar. Deconstructing Shostak. In 16th Annual IEEE Symposium on Logic in Computer Science, pages 19–28, Boston, MA, July 2001. IEEE Computer Society.
[Sha01] Natarajan Shankar. Using decision procedures with a higher-order logic. In Theorem Proving in Higher Order Logics: 14th International Conference, TPHOLs 2001, volume 2152 of Lecture Notes in Computer Science, pages 5–26, Edinburgh, Scotland, September 2001. Springer-Verlag. Available at ftp://ftp.csl.sri.com/pub/users/shankar/tphols2001.ps.gz.
[Sho78] R. Shostak. An algorithm for reasoning about equality. Comm. ACM, 21:583–585, July 1978.
[Sho84] Robert E. Shostak. Deciding combinations of theories. Journal of the ACM, 31(1):1–12, January 1984.
[Tiw00] Ashish Tiwari. Decision Procedures in Automated Deduction. PhD thesis, State University of New York at Stony Brook, 2000.
Multiset Rewriting and Security Protocol Analysis John C. Mitchell Stanford University, Stanford, CA 94305 http://www.stanford.edu/~jcm
Abstract. The Dolev-Yao model of security protocol analysis may be formalized using a notation based on multi-set rewriting with existential quantification. This presentation describes the multiset rewriting approach to security protocol analysis, algorithmic upper and lower bounds on specific forms of protocol analysis, and some of the ways this model is useful for formalizing subtle properties of specific protocols.
Background Many methods for security protocol analysis are based on essentially the same underlying model. This model, which includes both protocol steps and steps by a malicious attacker, appears to have developed from positions taken by Needham and Schroeder [NS78] and a model presented by Dolev and Yao [DY83]; it is often called the Dolev-Yao model. The Dolev-Yao model is used in theorem proving [Pau97], model-checking methods [Low96,Mea96a,MMS97,Ros95, Sch96], constraint solving [CCM01,MS01], and symbolic search tools [KMM94, Mea96b,DMR00]. Strand spaces [FHG98] and Spi-calculus [AG99] also use versions of the Dolev-Yao model. Historically, however, there has not been a standard presentation of the Dolev-Yao model independent of the many tools in which it is used. The work described in this invited talk, carried out over several years in collaboration with Iliano Cervesato, Nancy Durgin, Patrick Lincoln, and Andre Scedrov, began with an attempt to formulate the Dolev-Yao model in a general way [Mit98,DM99]. There were several goals in doing so: – Identify the basic assumptions of the model, independent of limitations associated with specific tools, such as finite-state limitations, – Make it easier for additional researchers to study security protocols using additional tools, – Study the power and limitations of the model itself. The Dolev-Yao model seems most easily characterized by a symbolic computation system. Messages are composed of indivisible values, representable by
Partially supported by DoD MURI “Semantic Consistency in Information Exchange,” ONR Grant N00014-97-1-0505.
S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 19–22, 2002. c Springer-Verlag Berlin Heidelberg 2002
20
J.C. Mitchell
constants of some vocabulary, and encryption is modeled as a non-invertible function that hides all information about the plaintext. The initial state of a system is represented by a set of private keys and other information available only to honest participants in the protocol, and public and compromised information known to the protocol adversary. The protocol adversary has the following capabilities: – overhear every message, – prevent a message from reaching the intended recipient, – decompose a message comprised of several fields into parts and decrypt any field for which the intruder has already obtained the encryption key, – remember all parts of all messages that have been intercepted or overheard, – encrypt using encryption keys known to the attacker, – generate messages and send them to any other protocol participant. The messages generated by the intruder may be composed of any information supplied to the intruder initially (such as public keys of protocol participants) and data obtained by overhearing or intercepting messages. In this model, the intruder is not allowed to guess encryption keys or other values that would normally be produced by randomized procedures. Many aspects of this model are easily represented by standard rewrite rules. The initial state of a system can be characterized by a set of formulas containing the “facts” known to the honest participants and the adversary. Representing each possible step in the execution of a protocol or attack by a rewrite rule, we obtain the set of possible traces and reachable states. Two subtleties are linearity and the generation of new values. Linearity, in the sense of linear logic [Gir87], not the way the term is used in rewrite systems, arises because protocol participants change state. If we represent the fact that Alice is in an initial state with key k by the atomic formula A0 (k), then this formula should be removed from the state of the system after Alice leaves her initial state.
Furthermore, if the system is in a state with two participants in identical states, this is different from a state in which there is only one such participant. Therefore, we represent the state of a system by a multiset of formulas rather than a set. Many security protocols involve steps that generate new values. In actual implementations, the new values are generally produced by some form of random number generator or pseudo-random generator. For example, a challengeresponse protocol begins with one participant, say Alice, generating a random challenge. Alice sends the challenge to Bob, possibly encrypted, and Bob computes and sends a response. If Alice’s challenge is fresh, meaning that it is a new value not known initially and not used previously in the protocol, then Alice may be sure that the response she receives is the result of computation by Bob. If Alice were to issue an old challenge, then some attacker could impersonate Bob by sending a saved copy of Bob’s old response. In our multiset rewriting framework, new values are generated as a result of instantiating existentially quantified variables with fresh constants. This is a form of a standard existential instantiation rule, applied to symbolic computation.
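The two subtleties above, multiset states and fresh values generated by existential instantiation, can be sketched in a few lines of Python. The fact encoding and the rule shape here are our own illustration, not the syntax of any of the cited tools:

```python
import itertools
from collections import Counter

fresh = itertools.count()                    # supply of fresh constants

def gensym(prefix):
    """Instantiate an existentially quantified variable with a fresh constant."""
    return f"{prefix}{next(fresh)}"

def apply_rule(state, consumed, produced):
    """Fire a multiset rewrite rule: remove the `consumed` facts,
    add the `produced` facts.  Returns None if the rule is not enabled."""
    new = Counter(state)
    for fact in consumed:
        if new[fact] == 0:
            return None
        new[fact] -= 1
    new.update(produced)
    return +new                              # drop facts with count zero

# Two participants in the *same* initial state: a multiset, not a set.
state = Counter({('A0', 'k'): 2})

# Rule  A0(k) -> exists x. A1(k, x), Net(x):  Alice leaves her initial
# state and emits a fresh nonce x on the network.
x = gensym('n')                              # existential instantiation
state = apply_rule(state, [('A0', 'k')], [('A1', 'k', x), ('Net', x)])
```

Because the state is a multiset, one copy of A0(k) remains after the step; with a set of formulas, firing the rule for one participant would wrongly erase the other identical participant.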
Multiset Rewriting and Security Protocol Analysis
The multiset rewriting approach to security protocol analysis consists of several modeling conventions using multisets of formulas and rewrite rules that may contain existential quantification on the right-hand side of a rule. The modeling conventions include ways of separating the knowledge of one participant from the knowledge of another, separating private knowledge from public knowledge, and separating the state of a participant from a network message, all based on conventions for naming the predicates used in atomic formulas. There are also conventions for separating an initialization phase, in which initial public and private information is generated, from a key distribution and role-assignment phase and a protocol execution phase. This talk will present three aspects of the framework:
– basic concepts in protocol analysis and their formulation using multiset rewriting,
– algorithmic upper and lower bounds on specific forms of protocol analysis,
– some of the ways the model is useful for formalizing subtle properties of specific protocols.
Our algorithmic results include undecidability for a class of protocols that only allow each participant a finite number of steps and impose a bound on the length of each message, exponential-time completeness when the number of new values (generated by existential instantiation) is bounded, and NP-completeness when the number of protocol roles used in any run is bounded. These results have been announced previously [DLMS99,CDL+99] and are further elaborated in a submitted journal article available from the author's web site. Two directions for current and future work are the development of logics for reasoning about protocols and the formalization of specific protocol properties. A logic described in [DMP01] uses a trace-based semantics of protocol runs, inspired by work on the multiset-rewriting framework.
In ongoing work on contract-signing protocols, we have found it essential to specify clearly how each agent may behave and to use adaptations of game-theoretic concepts to characterize the desired correctness conditions. Put as simply as possible, the essential property of any formalism for studying security protocol analysis is that it must define precisely the set of actions available to any attacker and the ways that each protocol participant may respond to any possible message. It is not adequate to simply characterize the actions that take place when no attacker is present.
References

[AG99] M. Abadi and A. Gordon. A calculus for cryptographic protocols: the spi calculus. Information and Computation, 148(1):1–70, 1999.
[CCM01] H. Comon, V. Cortier, and J. Mitchell. Tree automata with one memory, set constraints and ping-pong protocols. In Proc. 28th Int. Coll. Automata, Languages, and Programming (ICALP 2001), pages 682–693. Lecture Notes in Computer Science, vol. 2076, Springer, July 2001.
[CDL+99] I. Cervesato, N. Durgin, P. Lincoln, J. Mitchell, and A. Scedrov. A meta-notation for protocol analysis. In P. Syverson, editor, 12th IEEE Computer Security Foundations Workshop. IEEE Computer Society Press, 1999.
[DLMS99] N. Durgin, P. Lincoln, J. Mitchell, and A. Scedrov. Undecidability of bounded security protocols. In Proc. Workshop on Formal Methods in Security Protocols, Trento, Italy, 1999.
[DM99] N. Durgin and J. Mitchell. Analysis of security protocols. In Calculational System Design, Series F: Computer and Systems Sciences, Vol. 173. IOS Press, 1999.
[DMP01] N. Durgin, J. Mitchell, and D. Pavlovic. A compositional logic for protocol correctness. In Proceedings of the 14th IEEE Computer Security Foundations Workshop. IEEE Computer Society Press, 2001.
[DMR00] G. Denker, J. Millen, and H. Ruess. The CAPSL integrated protocol environment. Technical Report SRI-CSL-2000-02, Computer Science Laboratory, SRI International, October 2000.
[DY83] D. Dolev and A. Yao. On the security of public-key protocols. IEEE Transactions on Information Theory, 29(2), 1983.
[FHG98] F. Javier Thayer Fábrega, Jonathan C. Herzog, and Joshua D. Guttman. Strand spaces: Why is a security protocol correct? In Proceedings of the 1998 IEEE Symposium on Security and Privacy, pages 160–171, Oakland, CA, May 1998. IEEE Computer Society Press.
[Gir87] J.-Y. Girard. Linear logic. Theoretical Computer Science, 50:1–102, 1987.
[KMM94] R. Kemmerer, C. Meadows, and J. Millen. Three systems for cryptographic protocol analysis. Journal of Cryptology, 7(2):79–130, 1994.
[Low96] G. Lowe. Breaking and fixing the Needham-Schroeder public-key protocol using CSP and FDR. In 2nd International Workshop on Tools and Algorithms for the Construction and Analysis of Systems. Springer-Verlag, 1996.
[Mea96a] C. Meadows. Analyzing the Needham-Schroeder public-key protocol: a comparison of two approaches. In Proc. European Symposium on Research in Computer Security. Springer-Verlag, 1996.
[Mea96b] C. Meadows. The NRL protocol analyzer: an overview. Journal of Logic Programming, 26(2):113–131, 1996.
[Mit98] J.C. Mitchell. Analysis of security protocols. Slides for invited talk at CAV '98, available at http://www.stanford.edu/~jcm, July 1998.
[MMS97] J.C. Mitchell, M. Mitchell, and U. Stern. Automated analysis of cryptographic protocols using Murϕ. In Proc. IEEE Symposium on Security and Privacy, pages 141–151, 1997.
[MS01] J. Millen and V. Shmatikov. Constraint solving for bounded-process cryptographic protocol analysis. In 8th ACM Conference on Computer and Communications Security, pages 166–175, November 2001.
[NS78] R.M. Needham and M.D. Schroeder. Using encryption for authentication in large networks of computers. Communications of the ACM, 21(12):993–999, 1978.
[Pau97] L.C. Paulson. Proving properties of security protocols by induction. In 10th IEEE Computer Security Foundations Workshop, pages 70–83, 1997.
[Ros95] A.W. Roscoe. Modelling and verifying key-exchange protocols using CSP and FDR. In 8th IEEE Computer Security Foundations Workshop, pages 98–107. IEEE Computer Society Press, 1995.
[Sch96] S. Schneider. Security properties and CSP. In IEEE Symposium on Security and Privacy, 1996.
Engineering of Logics for the Content-Based Representation of Information

Franz Baader
TU Dresden, Germany
[email protected]
Abstract. Storage and transfer of information, as well as interfaces for accessing this information, have undergone a remarkable evolution. Nevertheless, information systems are still not "intelligent" in the sense that they "understand" the information they store, manipulate, and present to their users. A case in point is the world wide web and the search engines that allow users to access the vast amount of information available there. Web pages are mostly written for human consumption, and the mark-up provides only rendering information for textual and graphical content. Search engines are usually based on keyword search and often return a huge number of answers, many of which are completely irrelevant, whereas some of the more interesting answers are not found. In contrast, the vision of a "semantic web" aims for machine-understandable web resources, whose content can then be comprehended and processed both by automated tools, such as search engines, and by human users. The content-based representation of information requires representation formalisms with a well-defined formal semantics, since otherwise there cannot be a common understanding of the represented information. This semantics can elegantly be provided by a translation into an appropriate logic, or by the use of a logic-based formalism in the first place. The logical approach has the additional advantage that logical inferences can then be used to reason about the represented information, thus detecting inconsistencies and computing implicit information. However, in this setting there is a fundamental tradeoff between the expressivity of the representation formalism on the one hand, and the efficiency of reasoning with this formalism on the other. This motivates the "engineering of logics", i.e., the design of logical formalisms that are tailored to specific representation tasks. This also encompasses the formal investigation of the relevant inference problems, the development of appropriate inference procedures, and their implementation, optimization, and empirical evaluation. Another important topic in this context is the combination of logics and their inference procedures, since a given application may require the use of more than one specialized logic. The talk will illustrate this approach with the example of so-called Description Logics and their application as ontology languages for the semantic web.
S. Tison (Ed.): RTA 2002, LNCS 2378, p. 23, 2002. © Springer-Verlag Berlin Heidelberg 2002
Axiomatic Rewriting Theory VI: Residual Theory Revisited

Paul-André Melliès
Equipe PPS, CNRS, Université Paris 7
Abstract. Residual theory is the algebraic theory of confluence for the λ-calculus and, more generally, for conflict-free rewriting systems (= systems without critical pairs). The theory took its modern shape in Lévy's PhD thesis, after Church, Rosser, and Curry's seminal steps. There, Lévy introduces a permutation equivalence between rewriting paths, and establishes that among all confluence diagrams P −→ N ←− Q completing a span P ←− M −→ Q, there exists a minimum one, modulo permutation equivalence. Categorically, this minimum diagram is called a pushout. In this article, we extend Lévy's residual theory in order to enscope "border-line" rewriting systems, which admit critical pairs but enjoy a strong Church-Rosser property (= existence of pushouts). Typical examples are the associativity rule and the positive braid rewriting systems. Finally, we show that the resulting theory reformulates and clarifies Lévy's optimality theory for the λ-calculus, and its so-called "extraction procedure".
1 Introduction
Knuth and Bendix [KB] define a critical pair as a pair of rewriting rules which induce overlapping redexes. The definition applies with little variation to word, term, or higher-order term rewriting systems [KB,K2,Ni,MN]. The syntactical definition captures the idea that a critical pair is a "singularity" of the system, and a risk of non-confluence. Indeed, the definition is mainly motivated by Knuth and Bendix's theorem, which states that a (word, term, higher-order) rewriting system is locally confluent, see [K2], iff all its critical pairs are convergent (= confluent). For instance, the two rules:

A −→ B        A −→ C        (1)
form a critical pair, because they induce overlapping redexes A −→ B and A −→ C. The critical pair is not convergent, and thus the term rewriting system is not locally confluent.

S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 24–50, 2002. © Springer-Verlag Berlin Heidelberg 2002
On the other hand, the two rules L and R expressing the logical connective "parallel or",

L : por(true, x) −→ true        R : por(x, true) −→ true        (2)

also form a critical pair, because they induce two overlapping redexes l and r from the term por(true, true) to true:

por(true, true) −→l true        por(true, true) −→r true        (3)
But this critical pair (2) is convergent, and thus the rewriting system is locally confluent. It is even confluent; this is proved by applying Newman's lemma, or better, by observing that the system is an instance of van Oostrom and van Raamsdonk's weakly orthogonal rewriting systems [OR]. Does this mean that the pair (1) is more "critical" than the pair (2)? Well, from the strict point of view of confluence, yes: the pair (1) is "more" critical than the pair (2). But on the other hand, the standardization theory developed in [HL,Bo,M2] provides excellent reasons to consider the pair (2) as "critical" as (1). We explain this briefly. Suppose that one extends the term rewriting system (2) with the rule

Ω −→ true

The extended rewriting system is still confluent. Let us count the number of head rewriting paths from por(Ω, Ω) to the "value" term V = true. These head rewriting paths are characterized in [M3] for a class of rewriting systems enscoping all (possibly conflicting, possibly higher-order) term rewriting systems. The characterization shows that there exists at most one head rewriting path in a conflict-free (that is, critical-pair free and left-linear) term rewriting system; think of the λ-calculus. In contrast, one counts two head rewriting paths in the system (2):

por(Ω, Ω) −→Ω por(true, Ω) −→L true
por(Ω, Ω) −→Ω por(Ω, true) −→R true        (4)

Each path mirrors the causal cascade leading to the rule L or R which computes the "parallel or" operation to true. By this point, we hope to have convinced the reader that the pair (2) is not "critical" for syntactical reasons only (as in Knuth and Bendix's definition) but also for structural reasons related to the causality of computations. In [M1,M2,M3], we analyze the causality of computations by purely diagrammatic methods. Rewriting systems are replaced by their reduction graphs, which we equip with 2-dimensional tiles, in order to reflect redex permutations.
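The path counts just claimed can be checked mechanically. Below is a minimal Python sketch of the system (2) extended with Ω −→ true; the term encoding (O for Ω, T for true) and rule names are ours:

```python
O, T = 'O', 'T'                     # Ω and true

def successors(t):
    """All one-step rewrites under  Ω -> true,
    por(true, x) -> true (rule L),  por(x, true) -> true (rule R)."""
    if t == O:
        yield ('Omega', T)
    if isinstance(t, tuple):
        _, a, b = t
        if a == T:
            yield ('L', T)
        if b == T:
            yield ('R', T)
        for name, a2 in successors(a):
            yield (name, ('por', a2, b))
        for name, b2 in successors(b):
            yield (name, ('por', a, b2))

def paths(t):
    """All complete rewrite sequences from t to the value true."""
    if t == T:
        yield []
        return
    for name, t2 in successors(t):
        for rest in paths(t2):
            yield [name] + rest

all_paths = list(paths(('por', O, O)))
```

Of the six complete paths from por(Ω, Ω), exactly two have length 2; these are the two head rewriting paths (4), one ending with rule L and one with rule R.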
Typically, the reduction graph of por(Ω, Ω) in the extended system (2) is equipped with three tiles: a square permuting the two Ω-steps,

por(Ω, Ω) −→Ω por(true, Ω) −→Ω por(true, true)  ≡  por(Ω, Ω) −→Ω por(Ω, true) −→Ω por(true, true)

and two triangles in which an L-step or an R-step erases the remaining Ω-redex:

por(true, Ω) −→Ω por(true, true) −→L true  ≡  por(true, Ω) −→L true
por(Ω, true) −→Ω por(true, true) −→R true  ≡  por(Ω, true) −→R true
That way, generic properties of rewriting systems (seen as 2-dimensional graphs) may be established. For instance, the standardization theorem [CF,Lé1,HL,Bo] states that there exists a unique standard (= outside-in) path in each "homotopy" equivalence class. It is proved diagrammatically in [GLM,M1,M2] from fairly general axioms on redex permutations in rewriting systems. It is worth noting here that tiling diagram (3) in the 2-dimensional graph of por(Ω, Ω) would break the standardization theorem, since the two standard paths (4) would then appear in the same "homotopy" class. This dictates keeping diagram (3) untiled in the 2-dimensional graph of por(Ω, Ω), and understanding it as a 2-dimensional hole:

por(true, true) −→l true        por(true, true) −→r true

At this point, each path of (4) appears to be the canonical (that is, standard) representative of one homotopy class from por(Ω, Ω) to true. Reformulated this way, rewriting theory looks simpler, and closer to algebraic topology. This is certainly promising.¹ But let us come back to our main point. On one hand, Knuth and Bendix's definition of a critical pair is purely syntactic; on the other hand, it captures subtle diagrammatic and causal singularities, like the critical pair (2).

¹ A few readers will find example (2) funny. Indeed, tiling the critical pair (3) defines a 2-dimensional graph which happens to enjoy universal confluence. But this is pure chance. In particular, this universality property does not extend to weakly orthogonal rewriting systems.
However, and this is the starting point of the article, we claim that in one case at least, Knuth and Bendix's syntactic definition of critical pair is too crude. Consider the term rewriting system with a unique binary symbol, the tensor product ⊗, and a unique rewriting rule, the associativity rule below:

x ⊗ (y ⊗ z) −→α (x ⊗ y) ⊗ z        (5)
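As a quick sanity check of rule (5), here is a Python sketch (the pair encoding of ⊗ is ours): repeatedly applying the rule drives any term to a fully left-nested normal form.

```python
# A term is a variable (string) or a pair (l, r) standing for l ⊗ r.
def step(t):
    """Apply  x ⊗ (y ⊗ z) -> (x ⊗ y) ⊗ z  once, outermost-leftmost;
    return None when t is a normal form."""
    if isinstance(t, tuple):
        l, r = t
        if isinstance(r, tuple):            # redex at the root
            return ((l, r[0]), r[1])
        s = step(l)
        if s is not None:
            return (s, r)
    return None

def normalize(t):
    while (s := step(t)) is not None:
        t = s
    return t

def is_left_comb(t):
    """Normal forms: a variable, or V ⊗ x with V a normal form and x a variable."""
    return isinstance(t, str) or (isinstance(t[1], str) and is_left_comb(t[0]))

nf = normalize(('w', (('x', 'y'), ('z', 'u'))))   # w ⊗ ((x ⊗ y) ⊗ (z ⊗ u))
```

The strategy chosen here (contract the root whenever the right argument is a tensor, otherwise descend to the left) is one normalizing strategy among many; by confluence, every strategy reaches the same normal form.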
This rewriting system and various extensions have been studied thoroughly by category theorists, in order to establish so-called coherence theorems [Mc]. We recall that the rewriting system (5) is confluent and strongly normalizing, and that its normal forms are the variables, or the terms V ⊗ x obtained by "appending" a variable x to a normal form V. Here, we consider the local confluence diagrams induced by (5). They are of two kinds, depending on whether the co-initial redexes u and v overlap or not. First, the rewriting rule being right-linear (= it does not duplicate), two non-overlapping redexes u and v induce a square diagram, completed by the obvious residual redexes u′ and v′:

M −→u P −→v′ N        M −→v Q −→u′ N        (6)
Then, two redexes u and v overlap in one case only:

w ⊗ (x ⊗ (y ⊗ z)) −→w⊗α w ⊗ ((x ⊗ y) ⊗ z)
w ⊗ (x ⊗ (y ⊗ z)) −→α (w ⊗ x) ⊗ (y ⊗ z)        (7)

The local confluence diagram below is famous in the categorical literature: it is called Mac Lane's pentagon.

w ⊗ (x ⊗ (y ⊗ z)) −→w⊗α w ⊗ ((x ⊗ y) ⊗ z) −→α (w ⊗ (x ⊗ y)) ⊗ z −→α⊗z ((w ⊗ x) ⊗ y) ⊗ z
w ⊗ (x ⊗ (y ⊗ z)) −→α (w ⊗ x) ⊗ (y ⊗ z) −→α ((w ⊗ x) ⊗ y) ⊗ z        (8)
Mac Lane introduced it in order to axiomatize the concept of monoidal category. We write ≡ for the "homotopy" equivalence relation on paths obtained by tiling every confluence diagram (6) and (8). By strong normalization and Newman's lemma [Ne,K2], every two sequences f, g of associativity rules from a term M to its normal form V are equal modulo the homotopy equivalence ≡. This is roughly the content of Mac Lane's coherence theorem for monoidal categories; see [M5] for further details.
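The pentagon is, in fact, the whole reduction graph of w ⊗ (x ⊗ (y ⊗ z)): an exhaustive search (a Python sketch, with ⊗ encoded as pairs) finds exactly two maximal rewrite paths, of lengths 2 and 3, meeting at the normal form.

```python
def successors(t):
    """All one-step rewrites of t under  x ⊗ (y ⊗ z) -> (x ⊗ y) ⊗ z."""
    out = []
    if isinstance(t, tuple):
        l, r = t
        if isinstance(r, tuple):                      # redex at the root
            out.append(((l, r[0]), r[1]))
        out += [(l2, r) for l2 in successors(l)]      # redexes on the left
        out += [(l, r2) for r2 in successors(r)]      # redexes on the right
    return out

def maximal_paths(t):
    """All rewrite sequences from t to a normal form, as lists of terms."""
    nxt = successors(t)
    if not nxt:
        return [[t]]
    return [[t] + p for s in nxt for p in maximal_paths(s)]

sides = maximal_paths(('w', ('x', ('y', 'z'))))       # w ⊗ (x ⊗ (y ⊗ z))
```

The two paths are the two legs of diagram (8): the short one contracts α twice through (w ⊗ x) ⊗ (y ⊗ z), while the long one passes through w ⊗ ((x ⊗ y) ⊗ z).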
At this point, we stress that the rewriting system (5) and the homotopy equivalence ≡ verify a remarkable property, which goes far beyond confluence. We call this property universal confluence. It states that, for every two coinitial rewriting paths f : M −→ P and g : M −→ Q, there exist two paths f′ : Q −→ N and g′ : P −→ N such that:

– the rewriting paths f ; g′ : M −→ N and g ; f′ : M −→ N are equal modulo ≡:

M −→f P −→g′ N  ≡  M −→g Q −→f′ N

– for every pair of paths f″ : Q −→ O and g″ : P −→ O such that f ; g″ ≡ g ; f″, there exists a rewriting path h : N −→ O, unique modulo homotopy, such that f″ ≡ f′ ; h and g″ ≡ g′ ; h.
The property is established in Dehornoy's theory of Garside groups, using word reversing techniques [D1,DL]. Categorically speaking, the property states that the category of terms and rewriting paths modulo homotopy ≡ enjoys the existence of pushout diagrams. Quite interestingly, the same universal confluence property holds in the λ-calculus, and more generally in any conflict-free rewriting system. There, the homotopy equivalence relation ≡ is obtained by tiling all confluence diagrams generated by Lévy's residual theory. In the λ-calculus, the equivalence relation ≡ is best illustrated by three paradigmatic confluence diagrams:
((λx.x)a)((λy.y)b) −→ a((λy.y)b) −→ ab  ≡  ((λx.x)a)((λy.y)b) −→ ((λx.x)a)b −→ ab        (9)

and

(λx.a)((λy.y)b) −→ a  ≡  (λx.a)((λy.y)b) −→ (λx.a)b −→ a

and

(λx.xx)((λy.y)a) −→u ((λy.y)a)((λy.y)a) −→v1 a((λy.y)a) −→v2 aa  ≡  (λx.xx)((λy.y)a) −→v (λx.xx)a −→ aa        (10)
So, the associativity rule (5) on one hand, and all conflict-free rewriting systems on the other hand, enjoy the same remarkable universal confluence property. However, Knuth and Bendix's definition considers that the associativity rule (5) forms a critical pair (with itself) because the two redexes (7) overlap. Obviously, this syntactic reason does not take into account Mac Lane's pentagon (8) and its striking similarity to, say, diagram (10). So, one feels that Lévy's residual theory should be extended to incorporate example (5). The main task of the article is precisely to formulate this extended theory. One technical difficulty on the way is that, despite what we have just suggested, Mac Lane's pentagon is not exactly similar to diagram (10). Let us explain why. In diagram (10), the path v1 ; v2 develops the set of residuals of the β-redex v after the β-redex u. In particular, the β-redex v2 is the residual after v1 of a β-redex

w : ((λy.y)a)((λy.y)a) −→ ((λy.y)a)a

itself a residual of the β-redex v after u. This enables Lévy to deduce all local confluence diagrams from his theory of redexes, residuals and developments. On the contrary, in diagram (8) the path

w ⊗ ((x ⊗ y) ⊗ z) −→α (w ⊗ (x ⊗ y)) ⊗ z −→α⊗z ((w ⊗ x) ⊗ y) ⊗ z        (11)

is not generated by the set of residuals of the redex

w ⊗ (x ⊗ (y ⊗ z)) −→α (w ⊗ x) ⊗ (y ⊗ z)        (12)

after the redex

w ⊗ (x ⊗ (y ⊗ z)) −→w⊗α w ⊗ ((x ⊗ y) ⊗ z)        (13)
In particular, the redex

(w ⊗ (x ⊗ y)) ⊗ z −→α⊗z ((w ⊗ x) ⊗ y) ⊗ z

is not the residual of any redex after the redex

w ⊗ ((x ⊗ y) ⊗ z) −→α (w ⊗ (x ⊗ y)) ⊗ z

In fact, after a short thought, one must admit that the path (11) as a whole is the residual of the redex (12) after the redex (13). Admitting that paths may be residuals of redexes is not a small change in Lévy's residual theory. If one sees redexes as the computational atoms of rewriting theory, then these rewriting paths, deduced as residuals of redexes, appear as its molecules. We shall call them treks, to carry the intuition that in most concrete case studies, these rewriting paths implement a unique elementary task, whatever the number of rewriting steps necessary to complete it. A few years ago, the author considered such "treks" and proved the (universal) confluence of a positive presentation of the Artin braid group [A]. However, the proof [M4,KOV1] was too involved to generalize properly to other rewriting systems. Later discussions with Patrick Dehornoy, Jan Willem Klop and Vincent van Oostrom helped the author to develop the theory exposed here. So, the theory will remain nice and tractable, in sharp contrast to [M4]. The trick is to avoid defining the residuals of a trek after a trek (∗), because this is too difficult to formalize in most rewriting systems. Instead, we limit ourselves to defining the residuals of a trek after a redex, then deduce (∗) from a few coherence properties. This makes all the difference!

Synopsis: we expose the classical theory of redexes and residuals in section 2, and revisit the theory in section 3. For comparison's sake, we wrote sections 2 and 3 in a very similar fashion. We illustrate the theory on associativity and Mac Lane's pentagon in section 4; on braids and Garside's theorem in section 5; and on Lévy's optimality theory for the λ-calculus in section 6. Finally, we conclude and indicate further research directions.
2 Classical Residual Theory (Lévy)
In this section, we recall the residual theory developed by J.-J. Lévy in his PhD thesis [Lé1]; see also [HL,M1]. Consider a graph G, that is, a set V of terms and a set E of redexes, together with two functions ∂0, ∂1 : E −→ V indicating the source and target of each redex. We write u : M −→ M′ when u is a redex with ∂0u = M and ∂1u = M′. A path is a sequence

f = (M1, u1, M2, ..., Mm, um, Mm+1)        (14)

where the Mi are terms and, for every 1 ≤ i ≤ m, ui : Mi −→ Mi+1.

Remark: every term M induces a path (M), also noted idM, called the identity path of M. We overload the arrow notation, and write f : M1 −→ Mm+1 in a situation like (14). Moreover, we write f ; g : M −→ N for the composite of two paths f : M −→ P and g : P −→ N.

In any graph G, we write R(M) for the set of redexes outgoing from the term M.
Definition 1 (residual relation). A residual relation over a graph G is a relation /u for every redex u : M −→ N, relating redexes in R(M) to redexes in R(N). The residual relation /− is extended to paths, by declaring that the relation u /f v holds when
– either f = idM and u = v are redexes outgoing from M,
– or f = u1 ; · · · ; un and there exist redexes w2, ..., wn such that u /u1 w2 /u2 ... /un−1 wn /un v.

Notation: given a redex u and a path f, we write u /f for the set { v | u /f v }.

Every residual relation is supposed to verify four properties, formulated in this section. The first two properties are quite elementary:
1. self-destruction: for every redex u, the set u /u is empty,
2. finiteness: for every two coinitial redexes u and v, the set u /v is finite.

Definition 2 (multi-redex). A multi-redex is a pair (M, U) consisting of:
– a term M,
– a finite set U of redexes outgoing from M.

Remark: every redex u : M −→ N may be seen alternatively as the multi-redex (M, {u}). Notation: we say that a redex u is an element of a multi-redex (M, U) when u ∈ U.

The residual relation /f is extended to multi-redexes by declaring that (M, U) /f (N, V) when:
– f : M −→ N,
– V is the set of residuals of elements of U after f, that is: V = { v | ∃u ∈ U, u /f v }.

Definition 3 (development). A rewriting path

M = M1 −→u1 M2 −→u2 · · · −→un−1 Mn −→un Mn+1

develops a multi-redex (M, U) when
– every redex ui is an element of (M, U) /u1;···;ui−1,
– (M, U) /f (Mn+1, ∅).

Notation: we write U ⊢M f when the rewriting path f develops the multi-redex (M, U). We like this notation because it captures particularly well the idea that, in some sense, the path f "realizes" the multi-redex (M, U).

Definition 4 (partial development). A path f : M −→ P develops a multi-redex (M, U) partially when there exists a path g : P −→ N such that U ⊢M f ; g.

The last two properties required of a residual relation /− appear in [GLM,M1]. They axiomatize the finite development lemma [Hyl,K1,Ba,M1,O4] and the cube lemma [Lé1,HL,Bo,Ba,P,OV].

3. finite developments: for every multi-redex (M, U), there does not exist any infinite sequence

M1 −→u1 M2 −→u2 ... Mn −→un Mn+1 ...

of redexes such that, for every index i, the path

M1 −→u1 M2 −→u2 ... Mi−1 −→ui−1 Mi
develops the multi-redex (M, U) partially.

4. permutation: for every two coinitial redexes u : M −→ P and v : M −→ Q, there exist two paths hu and hv such that:
• u /v ⊢Q hu and v /u ⊢P hv,
• hu and hv are cofinal and induce the same residual relation /u;hv = /v;hu.

Diagrammatically:

M −→u P −→hv N        M −→v Q −→hu N        (15)
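The extension of the one-step relation /u to paths in Definition 1 is simply an iterated relational image, which a few lines of Python make concrete. The labels and the residual table below are a toy example of our own invention (a redex u whose contraction duplicates a coinitial redex v):

```python
def residuals_along(u, path, res):
    """u /f for a path f = [u1, ..., un]: iterate the one-step
    residual map `res`, which sends (redex, contracted redex)
    to the set of residuals."""
    current = {u}
    for w in path:
        current = {x for v in current for x in res(v, w)}
    return current

# Toy one-step residual relation: contracting u duplicates v into v1, v2.
TABLE = {('v', 'u'): {'v1', 'v2'}}

def res(v, w):
    if v == w:
        return set()              # self-destruction: u / u is empty
    return TABLE.get((v, w), set())
```

Note that the empty path behaves as the identity, matching the first clause of Definition 1 (u /idM u).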
In his PhD thesis, Lévy introduces a permutation equivalence between rewriting paths. The idea is to tile all diagrams (15) and consider the induced "homotopy equivalence".

Definition 5 (Lévy). We write f ≡1 g when the paths f and g factor as

f = M −→f1 M′ −→u P −→hv N′ −→f2 N
g = M −→f1 M′ −→v Q −→hu N′ −→f2 N

where
– u and v are coinitial redexes,
– u /v ⊢Q hu and v /u ⊢P hv,
– hu and hv induce the same residual relation /u;hv = /v;hu.

Definition 6 (Lévy). The permutation or homotopy equivalence relation ≡ is the least equivalence relation (reflexive, transitive, symmetric) on rewriting paths containing the relation ≡1.

Lévy proves his two main theorems about residual systems in [Lé1,HL]; see also [Ba,K1,P,St,M1]:

Theorem 1 (pushouts). For every two coinitial rewriting paths f : M −→ P and g : M −→ Q, there exist two paths f′ : Q −→ N and g′ : P −→ N such that:
– the rewriting paths f ; g′ : M −→ N and g ; f′ : M −→ N are equal modulo ≡:

M −→f P −→g′ N  ≡  M −→g Q −→f′ N

– for every pair of paths f″ : Q −→ O and g″ : P −→ O such that f ; g″ ≡ g ; f″, there exists a rewriting path h : N −→ O, unique modulo homotopy, such that f″ ≡ f′ ; h and g″ ≡ g′ ; h.
Theorem 2 (epis). Every path f : M −→ P enjoys the left-simplification rule

f ; g ≡ f ; h  ⇒  g ≡ h

for every two paths g, h : P −→ N.

Theorem 1 states that the category G/≡ of terms and rewriting paths modulo homotopy has pushouts, while Theorem 2 states that every morphism of the category is an epimorphism; see [Mc] for a nice exposition of the categorical concepts.
3 Classical Theory Revisited: Treks and Residuals

The residual theory we develop here repeats the pattern of section 2, except that redexes are replaced by treks in the definition of residuals. We also change the notation /u into [[u]] to help the reader put the two theories in contrast.

In any graph G, we write R(M) for the set of redexes outgoing from the term M.

Definition 7 (trek graph). A trek graph R = (G, T, ⊳) is a graph G equipped with, for every term M,
– a set T(M) of treks,
– a generation relation ⊳ between treks of T(M) and redexes of R(M),
– a function u ↦ ū assigning a trek ū ∈ T(M) to every redex u ∈ R(M), such that ū ⊳ u.

Notation: we say alternatively that the trek t ∈ T(M) generates the redex u ∈ R(M), or that the redex u ∈ R(M) starts the trek t ∈ T(M), when t ⊳ u.

Remark: every graph defines a trek graph, by setting T(M) = R(M), with ⊳ and u ↦ ū the identity.

Definition 8 (residual relation). A residual relation over a trek graph R is a relation [[u]] for every redex u : M −→ N, relating treks of T(M) to treks of T(N). Just like in section 2, the residual relation [[u]] is extended from redexes to paths, by declaring that the relation s [[f]] t holds when
– either f = idM and s = t are treks of T(M),
– or f = u1 ; · · · ; un and there exist treks r2, ..., rn such that s [[u1]] r2 [[u2]] ... [[un−1]] rn [[un]] t.
Notation: given a trek s ∈ T(M) and a path f : M −→ N, we write s [[f]] = { t ∈ T(N) | s [[f]] t }.

Just like in section 2, four properties are required of every residual relation. The first two properties appear below.
1. self-destruction: for every redex u : M −→ N, the set ū [[u]] is empty,
2. finiteness: for every trek t ∈ T(M) and redex u ∈ R(M), the set t [[u]] is finite.

Definition 9 (multi-trek). A multi-trek is a pair (M, S) consisting of:
– a term M,
– a finite subset S of T(M).

Remark: every redex u : M −→ N may be seen alternatively as the multi-trek (M, {ū}).

The residual relation [[f]] is extended from treks to multi-treks by declaring that (M, S) [[f]] (N, T) when:
– f : M −→ N,
– T = { t ∈ T(N) | ∃s ∈ S, s [[f]] t } = ∪s∈S s [[f]].

Remark: the definition above is only possible thanks to property 2 (finiteness).

Definition 10 (development). A rewriting path

M = M1 −→u1 M2 −→u2 · · · −→un−1 Mn −→un Mn+1

develops a multi-trek (M, S) when
– every redex ui starts one of the treks of (M, S) [[u1 ; · · · ; ui−1]],
– (M, S) [[f]] (Mn+1, ∅).

Notation: as in section 2, we write S ⊢M f when the rewriting path f develops the multi-trek (M, S).

Definition 11 (partial development). A path f : M −→ P develops a multi-trek (M, S) partially when there exists a path g : P −→ N such that S ⊢M f ; g.
The two properties below repeat the axioms of finite developments and permutation formulated in section 2.

3. finite developments: for every multi-trek (M, S), there does not exist any infinite sequence

M1 −→u1 M2 −→u2 ... Mn −→un Mn+1 ...

of redexes such that, for every index i, the path

M1 −→u1 M2 −→u2 ... Mi−1 −→ui−1 Mi

develops the multi-trek (M, S) partially.

4. permutation: for every two coinitial redexes u : M −→ P and v : M −→ Q, there exist two paths hu and hv such that:
• ū [[v]] ⊢Q hu and v̄ [[u]] ⊢P hv,
• hu and hv are cofinal and induce the same residual relation [[u ; hv]] = [[v ; hu]].

Diagrammatically:

M −→u P −→hv N        M −→v Q −→hu N        (16)
Here, we adapt Lévy's permutation equivalence of section 2 to the new setting. Again, the idea is to tile all diagrams (16) and consider the resulting "homotopy equivalence" on paths.

Definition 12 (tile). We write f ≡1 g when the paths f and g factor as

f = M −→f1 M′ −→u P −→hv N′ −→f2 N
g = M −→f1 M′ −→v Q −→hu N′ −→f2 N

where
– u and v are coinitial redexes,
– ū [[v]] ⊢Q hu and v̄ [[u]] ⊢P hv,
– hu and hv induce the same residual relation [[u ; hv]] = [[v ; hu]].

Definition 13 (homotopy). The permutation or homotopy equivalence relation ≡ is the least equivalence relation (reflexive, transitive, symmetric) on rewriting paths containing the relation ≡1.

The axiomatization extends Lévy's theorems 1 and 2 to "border-line" systems like associativity or braids.
Axiomatic Rewriting Theory VI: Residual Theory Revisited
37
Theorem 3 (pushouts). The category G/≡ of terms and rewriting paths modulo homotopy equivalence has pushouts.
Theorem 4 (epis). Every morphism of G/≡ is an epimorphism.
Proof. The proof follows the pattern of [Lé1,HL,M1]. The first step is to prove that two paths f and g are equal modulo homotopy when they develop the same multi-trek. This is done by a simple induction on the depth of the multi-trek, thanks to the finite developments property 3. Then, one applies the usual commutation techniques, see also [St], except that multi-redexes are now replaced by multi-treks.
4 Illustration 1: The Associativity Rule and Mac Lane's Pentagon
Here, we apply the theory of section 3 to the rewriting system (5) introduced in section 1, with the unique rewriting rule
x ⊗ (y ⊗ z) −α→ (x ⊗ y) ⊗ z        (17)
First, let us define what we mean by a trek in that case. Given n ≥ 1 variables y1, ..., yn, we define the normal form [y1, ..., yn] by induction on n:
[y1] = y1        [y1, ..., yn] = [y1, ..., yn−1] ⊗ yn
A trek t of T(M) is defined as a pattern
x ⊗ [y1, ..., yn]        (18)
for n ≥ 2, together with its occurrence o in the term M. So, a trek t ∈ T(M) is a pair (o, n) where the occurrence o is a sequence ℓ1 · · · ℓk of L's and R's (for left and right) identifying the position of the "root" tensor product of the trek in the term.
Notation: We say that the trek t has a head: the tensor product x ⊗ [y1, ..., yn]; and a spine: the n − 1 tensor products inside [y1, ..., yn]. We will construct our residual theory in such a way that t is "realized" or "developed" by the path
f : x ⊗ [y1, ..., yn] −→ [x, y1, ..., yn]
In our theory, every trek t ∈ T(M) generates a unique redex u ∈ R(M), defined as the redex with the same occurrence as t. This redex rewrites
x ⊗ [y1, ..., yn] −→ (x ⊗ [y1, ..., yn−1]) ⊗ yn
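As an aside, the normal forms [y1, ..., yn] and the redex generated by a trek are easy to model concretely. The following sketch is our own illustration, not the paper's formalism: pairs model the tensor ⊗, and the helper names are hypothetical.

```python
def nf(ys):
    """Left-combed normal form: [y1] = y1, [y1,...,yn] = [y1,...,y_{n-1}] ⊗ yn."""
    t = ys[0]
    for y in ys[1:]:
        t = (t, y)          # (a, b) models a ⊗ b
    return t

def step(t):
    """The redex generated by the trek x ⊗ [y1,...,yn]: one associativity
    step x ⊗ (y ⊗ z) → (x ⊗ y) ⊗ z at the root, rewriting
    x ⊗ [y1,...,yn] to (x ⊗ [y1,...,y_{n-1}]) ⊗ yn."""
    x, (y, z) = t
    return ((x, y), z)

t = ("x", nf(["a", "b", "c"]))                    # x ⊗ [a, b, c]
assert step(t) == (("x", nf(["a", "b"])), "c")    # (x ⊗ [a, b]) ⊗ c
```

Iterating `step` on the successive left subterms develops the whole trek down to the normal form [x, y1, ..., yn].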
The left-hand side of the rewrite rule (17) is the particular case n = 2 of pattern (18). Thus, every redex u ∈ R(M) defines a trek ū ∈ T(M) of the same occurrence in M, satisfying ū ⊢ u. This defines a trek graph (G, T, ⊢). Now, we define the residual structure [[−]] on it. Consider a redex u : M −→ N and two treks t ∈ T(M) and t′ ∈ T(N). The redex u induces a one-to-one function ϕ between the occurrences of M not overlapping the left-hand side of u, and the occurrences of N not overlapping the right-hand side of u. The two treks t and t′ verify t[[u]]t′ precisely when:
– t = (o, n) and t′ = (ϕ(o), n), when the redex u and the trek t do not overlap,
– t = (o, n) and t′ = (o · L, n − 1), when n ≥ 3 and the redex u overlaps with both the head and the spine of t; observe that t ⊢ u in that case,
– t = (o · R, n) and t′ = (o, n), when the redex u overlaps with the head (and not the spine) of t,
– t = (o, n) and t′ = (o, n + 1), when the redex u overlaps with the spine (and not the head) of t.
It remains to check that [[−]] verifies the four properties required of residual systems. Property 1 (self-destruction) is obvious. Here, property 2 (finiteness) may be strengthened to the property that a trek has at most one residual. Accordingly, we only need to establish property 3 (finite developments) on singleton multi-treks. In any case, property 3 follows from strong normalization of the rewriting system. It remains to prove property 4 (permutation) for every pair of coinitial redexes u : M −→ P and v : M −→ Q. We proceed by case analysis.
Suppose first that u and v do not overlap, and let the redex u′ be the residual of u after v, and the redex v′ be the residual of v after u, in the usual residual theory (for redexes). Then, obviously, in our extended residual theory, the trek ū′ is the residual of the trek ū after v, and the trek v̄′ is the residual of the trek v̄ after u. Moreover, {ū′} ⊩Q u′ and {v̄′} ⊩P v′. Thus, diagram (6) is an instance of (16).
For lack of space, we will only sketch how one proves that [[u; v′]] = [[v; u′]]. The proof is not very difficult, quite pleasant to carry out in fact, but requires considering a few cases, based on whether u or v overlaps with the head and spine of a given trek t ∈ T(M).
Now, suppose that u and v do overlap. We may suppose wlog that v has occurrence o, that u has occurrence o · R, and that the pattern of the critical pair is (7):
u : w ⊗ (x ⊗ (y ⊗ z)) −w⊗α→ w ⊗ ((x ⊗ y) ⊗ z)
v : w ⊗ (x ⊗ (y ⊗ z)) −α→ (w ⊗ x) ⊗ (y ⊗ z)
On one hand, the redex v overlaps with the head of the trek ū = (o · R, 2), which thus has as residual the trek tu = (o, 2):
tu : (w ⊗ x) ⊗ [y, z]
developed by the redex u′:
u′ : (w ⊗ x) ⊗ [y, z] −α→ [w, x, y, z]
On the other hand, the redex u overlaps with the spine of the trek v̄ = (o, 2), which thus has as residual the trek tv = (o, 3):
tv : w ⊗ [x, y, z]
developed by the path hv:
hv : w ⊗ [x, y, z] −α→ (w ⊗ [x, y]) ⊗ z −α⊗z→ [w, x, y, z]
So, Mac Lane's pentagon (8) is an instance of diagram (16). It remains to prove that [[u; hv]] = [[v; u′]]. Just as in the case of the non-overlapping redexes u and v, the proof proceeds by a case analysis, which we omit here for lack of space. Since the square and pentagon diagrams (6) and (8) induce the same homotopy equivalence ≡ as our residual theory, we conclude from theorems 3 and 4 that:
Theorem 5 (universal confluence). The rewriting paths generated by (17) and quotiented by the homotopy diagrams (6) and (8) define a category of epimorphisms with pushouts.
Remark: all λ-calculi with explicit substitutions like [ACCL] go back to Pierre-Louis Curien's Categorical Combinatory Logic [C], whose main rewriting engine is the associativity rule. We are currently experimenting with a theory of treks for a λ-calculus with explicit substitutions, extending what we have done here, and inducing the first universal confluence theorem for a calculus of that kind.
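As a sanity check of the pentagon above, both of its sides can be replayed on terms modelled as nested pairs. The helper below and its occurrence strings are our own illustration, not the paper's formalism:

```python
def alpha_at(t, occ):
    """Apply the rule x ⊗ (y ⊗ z) → (x ⊗ y) ⊗ z at the occurrence occ,
    a string of L/R steps from the root (pairs model ⊗)."""
    if occ == "":
        x, (y, z) = t
        return ((x, y), z)
    left, right = t
    if occ[0] == "L":
        return (alpha_at(left, occ[1:]), right)
    return (left, alpha_at(right, occ[1:]))

M = ("w", ("x", ("y", "z")))                      # w ⊗ (x ⊗ (y ⊗ z))
# short side of the pentagon: v at the root, then the residual redex u'
short = alpha_at(alpha_at(M, ""), "")
# long side: u = w ⊗ α at occurrence R, then h_v = α followed by α ⊗ z
long = alpha_at(alpha_at(alpha_at(M, "R"), ""), "L")
assert short == long == ((("w", "x"), "y"), "z")  # both reach [w, x, y, z]
```

Both sides of the pentagon end at the normal form [w, x, y, z], as the residual theory predicts.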
5 Illustration 2: Braids and Garside's Theorem
We will be very concise here, for lack of space. For further information, the reader is advised to read the crisp course notes [KOV1], where a nice exposition of [M4] appears, along with other material. The braid group B was introduced by Artin [A]. The group is presented by the alphabet Σ = {σi | i ∈ Z} and the relations
σi σi+1 σi = σi+1 σi σi+1        for every i ∈ Z        (19)
σi σj = σj σi        when |i − j| ≥ 2        (20)
In order to solve the word problem in B, Garside [Ga] considered the monoid B+ of positive braids, with the same presentation. He established that every two elements of B+ have a least common multiple. The theorem may be restated as the universal confluence property of the rewriting system with elements b of B+ as terms, arrows M −σi→ Mσi as redexes, and commutation diagrams (19) and (20) as tiles. Each σi is interpreted diagrammatically as the operation of permuting the strand i over the strand i + 1 in a braid diagram.
In particular, equation (19) mirrors a topological equality between the corresponding braid diagrams.
In our residual system, a trek t of M is a pair (I, J) of subsets of Z such that
– I ∩ J is empty,
– I ∪ J is a finite interval [i, j] with smaller bound i ∈ I and larger bound j ∈ J.
Intuitively, a trek (I, J) is intended to "permute" all strands of J over all strands of I, without altering the order of strands internal to I and J. (The treks (I, J) define a well-known class of generators for braids, with remarkable permutation properties; another interesting class of generators appears in [BKL].) So, we declare that a trek (I, J) ∈ T(M) generates a redex σi ∈ R(M) when i ∈ I and j ∈ J. Observe that every redex σi ∈ R(M) defines the trek σ̄i = ({i}, {i + 1}) ∈ T(M), which verifies σ̄i ⊢ σi. This defines a trek graph. The residual relation [[−]] is defined on it as follows. Let (I, J) ∈ T(M) and σk ∈ R(M). We write σ̇k for the permutation (k, k + 1) of Z; and i, j for the two integers i ∈ I and j ∈ J such that I ∪ J = [i, j]. Then,
1. (I, J)[[σk]](I, J) when k is not an element of [i − 1, j],
2. (I, J)[[σk]](σ̇k(I), σ̇k(J)) when
– either k ∈ [i + 1, j − 2],
– or k = i and i + 1 ∈ I,
– or k = j − 1 and j − 1 ∈ J,
3. (I, J)[[σi−1]](I + {i − 1}, J),
4. (I, J)[[σi]](σ̇i(I), J − {i + 1}) when i + 1 ∈ J and J is not a singleton,
5. (I, J)[[σj−1]](I − {j − 1}, σ̇j−1(J)) when j − 1 ∈ I and I is not a singleton,
6. (I, J)[[σj]](I, J + {j + 1}).
The definition verifies the four properties of residual systems: self-destruction is immediate, while finiteness may be strengthened to the property that a trek has at most one residual, just as for the associativity rule in section 4. Finite developments is easily established too, by weighting treks. The property of permutation is more interesting. Suppose that σi and σj are redexes from M. Two cases occur. Either |j − i| ≥ 2, and then σ̄i[[σj]]σ̄i
and σ̄j[[σi]]σ̄j induce the commutative diagram
M −σi→ Mσi −σj→ Mσiσj        M −σj→ Mσj −σi→ Mσjσi = Mσiσj        (21)
Or j = i + 1, and then
σ̄i[[σi+1]]({i}, {i + 1, i + 2})        and        σ̄i+1[[σi]]({i, i + 1}, {i + 2})
induce the diagram
M −σi→ Mσi −σi+1→ Mσiσi+1 −σi→ Mσiσi+1σi
M −σi+1→ Mσi+1 −σi→ Mσi+1σi −σi+1→ Mσi+1σiσi+1 = Mσiσi+1σi        (22)
which mirror exactly equations (19) and (20). It remains to prove that the two paths f and g along the confluence diagrams (21) and (22) define the same residual relation [[f]] = [[g]]. This is easy to establish for diagram (21), and an inch more tedious for diagram (22). We skip the proof here for lack of space.
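Although we omit the proof, the six residual cases above are straightforward to transcribe and experiment with. The following sketch is our own illustration (the function name and the set representation are ours); it checks self-destruction and the two residuals used in diagram (22):

```python
def residual(I, J, k):
    """Residual set of the braid trek (I, J) after the redex sigma_k,
    following cases 1-6 above; the empty set models self-destruction."""
    I, J = frozenset(I), frozenset(J)
    i, j = min(I), max(J)
    swap = lambda S: frozenset(k + 1 if x == k else k if x == k + 1 else x
                               for x in S)
    if k < i - 1 or k > j:                                   # case 1
        return {(I, J)}
    if (i + 1 <= k <= j - 2) or (k == i and i + 1 in I) \
            or (k == j - 1 and j - 1 in J):                  # case 2
        return {(swap(I), swap(J))}
    if k == i - 1:                                           # case 3
        return {(I | {i - 1}, J)}
    if k == i and i + 1 in J and len(J) > 1:                 # case 4
        return {(swap(I), J - {i + 1})}
    if k == j - 1 and j - 1 in I and len(I) > 1:             # case 5
        return {(I - {j - 1}, swap(J))}
    if k == j:                                               # case 6
        return {(I, J | {j + 1})}
    return set()   # e.g. the trek of sigma_i destroyed by sigma_i itself

# self-destruction: the trek ({i}, {i+1}) of sigma_i has no residual after sigma_i
assert residual({3}, {4}, 3) == set()
# the two residuals used in diagram (22), with i = 3
assert residual({3}, {4}, 4) == {(frozenset({3}), frozenset({4, 5}))}
assert residual({4}, {5}, 3) == {(frozenset({3, 4}), frozenset({5}))}
```

The transcription also makes it easy to confirm that a trek has at most one residual, the strengthened finiteness property noted above.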
6 Illustration 3: Lévy's Optimality and Extraction Theory
In his PhD thesis, Lévy [Lé1,Lé2] develops a theory of optimality for the λ-calculus. After defining families of β-redexes "of a common nature" which may be shared by an implementation, Lévy calls optimal any implementation which computes one such family at each step. The moving part of the story is that no optimal implementation existed at the time. Lévy formulated its principle, and the implementation was designed twelve years later [Lam]. In [Lé1,Lé2], families are defined in three different ways, which turn out to be equivalent: zigzag, extraction, labelling. Here, we focus on zigzag and extraction, and forget labelling for lack of space.
Definition 14 (Lévy). A β-redex with history is a pair (f, u) where f : M −→ P is a rewriting path and u : P −→ P′ is a β-redex. For notation's sake, we feel free to identify the β-redex with history (f, u) and the path M −f→ P −u→ P′.
Definition 15 (residual). Let h : P −→ Q be a rewriting path, and let M −f→ P −u→ P′ and M −g→ Q −v→ Q′ be two β-redexes with history. We write (f, u) /h (g, v) when
u /h v        and        g ≡ f; h
In that case, we say that (f, u) is an ancestor of (g, v). The zig-zag relation is the least equivalence relation between redexes with history containing this ancestor relation.
Definition 16 (family). A family is an equivalence class of the equivalence relation between β-redexes with history.
One unexpected point about families is the lack of a common ancestor property. We illustrate this with an example, adapted from [Lé2]. Consider the three λ-terms
P = (λx.xa)(λy.y)        Q = (λy.y)a        R = a
and the redex with history v:
v : P −u→ Q −v→ R        (23)
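The two steps u and v of example (23) can be replayed with a minimal β-reducer. The sketch below is our own illustration (naive substitution suffices here, since no variable capture can occur):

```python
def subst(t, x, s):
    """Capture-naive substitution t[s/x] on terms
    ('var', x) | ('lam', x, body) | ('app', fun, arg)."""
    if t[0] == "var":
        return s if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, s))
    return ("app", subst(t[1], x, s), subst(t[2], x, s))

def beta(t):
    """Contract the β-redex at the root: (λx.A)B → A[B/x]."""
    _, (_, x, body), arg = t
    return subst(body, x, arg)

I = ("lam", "y", ("var", "y"))                                      # λy.y
P = ("app", ("lam", "x", ("app", ("var", "x"), ("var", "a"))), I)   # (λx.xa)(λy.y)
Q = beta(P)   # u : P → Q = (λy.y)a
R = beta(Q)   # v : Q → R = a
assert Q == ("app", I, ("var", "a"))
assert R == ("var", "a")
```

Note how the step u creates the redex contracted by v: this creation is exactly what defeats the naive common ancestor property discussed next.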
The two redexes with history v1 and v2:
v1 : ΔP −Δ→ PP −u1→ QP −v1→ RP
v2 : ΔP −Δ→ PP −u2→ PQ −v2→ PR
are in the same family, but do not have an ancestor v0 in common. Intuitively, this common ancestor should be v. But v computes the β-redex u, which is not a subcomputation of Δ; u1 and not a subcomputation of Δ; u2. This means that the theory has to be slightly refined. Lévy [Lé2] introduces a non-deterministic extraction procedure ✄ acting on redexes with history.
At this point, it is useful to recall that the set R(M) of β-redexes from a λ-term M may be ordered by the prefix ordering ≤M on occurrences. We also introduce a refinement of Lévy's homotopy relation on paths [GLM,M1,M2].
Definition 17 (reversible homotopy). We write f ≃1 g when f and g factor as
f = M −f1→ M′ −u→ P −v′→ N′ −f2→ N
g = M −f1→ M′ −v→ Q −u′→ N′ −f2→ N
where
– u and v are disjoint β-redexes, that is, incomparable wrt. ≤M,
– u′ is the unique residual of u after v,
– v′ is the unique residual of v after u.
The reversible homotopy equivalence relation ≃ is the reflexive transitive closure of ≃1. So, considering ≃ amounts to tiling all diagrams of the form (9) in the reduction graph of the λ-calculus.
Definition 18 (Lévy). The extraction relation ✄ is the union of the four relations below.
1. M −f→ P −u→ P′ −v′→ N ✄ M −f→ P −v→ Q when v′ ≤N u and v /u v′,
2. M −f→ P −u→ P′ −h→ Q′ −v′→ R′ ✄ M −f→ P −g→ Q −v→ R when u and g; v compute in "disjoint" occurrences, and h; v′ is the path residual of g; v after u. In that case, u /g u′ /v u″ and u; h; v′ ≃ g; v; u″ induce the diagram below.
M −f→ C[A, B] −g→ C[A, B′] −v→ C[A, B″] −u″→ C[A′, B″]
M −f→ C[A, B] −u→ C[A′, B] −h→ C[A′, B′] −v′→ C[A′, B″]
3. M −f→ P −u→ P′ −h→ Q′ −v′→ R′ ✄ M −f→ P −g→ Q −v→ R when the path g; v computes inside the body A of the β-redex u : C[(λx.A)B] −→ C[A[B/x]], and the path h; v′ is the path residual of g; v after u. In that case, u /g u′ /v u″ induces the diagram below:
M −f→ C[(λx.A)B] −g→ C[(λx.A′)B] −v→ C[(λx.A″)B] −u″→ C[A″[B/x]]
M −f→ C[(λx.A)B] −u→ C[A[B/x]] −h→ C[A′[B/x]] −v′→ C[A″[B/x]]
4. M −f→ P −u→ P′ −h→ Q′ −v′→ R′ ✄ M −f→ P −g→ Q −v→ R when u : C[(λx.A)B] −→ C[A[B/x]] and the path g; v computes inside the argument B of u, and the path h; v′ computes just as g; v inside one copy of the argument B of A[B/x]. In the diagram, we write Al (the subscript l stands for linear) for the λ-term A where all instances of x are substituted by the λ-term B, except one instance which is substituted by a fresh variable l.
C[(λx.A)B] −g→ C[(λx.A)B′] −v→ C[(λx.A)B″]
C[(λx.A)B] −u→ C[Al[B/l]] −h→ C[Al[B′/l]] −v′→ C[Al[B″/l]]
Lévy shows that the procedure ✄ is strongly normalizing and confluent. Suppose moreover that (f, u) and (g, v) are two redexes with history whose paths f and g are left-standard, that is, standard wrt. the leftmost-outermost order on β-redexes, see [CF,Lé1,K1,Ba,GLM,M2]. Then, Lévy shows that (f, u) and (g, v) are in the same family iff they have the same ✄-normal-form. This result provides a pertinent canonical form theorem for Lévy families. Typically, stepping back to example (23), the redex with history v is not the common ancestor, but the common ✄-normal-form of the two redexes with history v1 and v2.
We reformulate this part of Lévy optimality theory using the language of treks. The set P(M) of paths outgoing from a λ-term M may be ordered as follows:
M −f→ P ≤ M −g→ Q ⇐⇒ ∃h : P → Q, f; h ≃ g
The order (P(M), ≤) is a local lattice: every two bounded elements f and g have a least upper bound f ∨ g and a greatest lower bound f ∧ g. We recall that two paths f, g ∈ P(M) are bounded in (P(M), ≤) when there exists a path h ∈ P(M) such that f ≤ h and g ≤ h. (In fact, the order (P(M), ≤) is a bit more than a local lattice, because the lub and glb of f and g do not depend on the choice of the slice h.)
Definition 19 (coprime). A coprime f ∈ P(M) is a path such that, for every two bounded paths g, h ∈ P(M),
f ≤ g ∨ h  ⇒  f ≤ g or f ≤ h
So, a path f is coprime when it cannot be decomposed into more elementary computations f ≃ f1 ∨ · · · ∨ fk. Standard coprimes enjoy an important "enclave" or "localization" property. Here, by standard, we mean standard wrt. the prefix ordering on β-redexes, see [GLM,M2]. Let v ∈ R(M) be the first β-redex computed by a coprime f = v; g. The localization property tells us that if the β-redex v ∈ R(M) occurs inside the body (resp. the argument) of a β-redex u ∈ R(M), then the whole standard coprime f computes inside the body (resp. the argument) of the β-redex u. This property leads to a new theory of residuals in the λ-calculus.
Definition 20 (β-trek). A β-trek t ∈ T(M) is a non-empty standard coprime of (P(M), ≤), considered modulo ≃.
Definition 21 (trek graph). A β-trek t ∈ T(M) generates a β-redex u ∈ R(M) when u ≤ t. Also, every β-redex u ∈ R(M) defines the singleton β-trek ū = {u} ∈ T(M), which verifies ū ⊢ u. This defines a trek graph.
The residual relation t[[u]]t′ induced by a β-redex u : M −→ N on treks t ∈ T(M) and t′ ∈ T(N) is defined by induction on the length of t.
Definition 22 (residual).
1. when the trek t generates the β-redex u, the β-trek t′ is defined as the unique β-trek such that t ≃ u; t′,
2. when t generates a β-redex v occurring inside the body of u, then t′ is the residual of t after u, modulo ≃,
3. when t generates a β-redex v occurring inside the argument of u : C[(λx.A)B] −→ C[A[B/x]], then t′ computes just as t inside one copy of B in A[B/x],
4. when t generates a β-redex v not occurring inside the β-redex u, this β-redex v has a unique residual v′ after u, and t has the residual s after v; the β-trek t′ is any path v′; s′ where s′ is a residual of s after the development of u/v.
The definition is fine because a β-trek always has residuals of smaller length.
The four properties of residual systems are established as follows. The properties of self-destruction and finiteness are immediate, while finite developments follows from Lévy's finite family developments theorem [Lé1,O3]. The last property of permutation follows from elementary combinatorial arguments, which we do not expose here for lack of space.
At this point, two β-treks t1, t2 ∈ T(M) may have the same β-trek t3 ∈ T(N) as residual after a β-redex u : M −→ N. Typically, in our example, the β-treks v and v1 have the β-trek
PP −u1→ QP −v1→ RP
as residual after the β-redex ΔP −→ PP. This motivates limiting oneself to normal β-treks.
Definition 23. A trek is normal when it is a ✄-normal-form.
It appears that normal β-treks define a sub-residual system of the β-trek system defined above, in the sense that normal β-treks have only normal β-treks as residuals, and that every β-trek ū is normal. The main property of normal β-treks is that, for every normal β-trek t ∈ T(N) and morphism u : M −→ N, there exists a unique normal β-trek s ∈ T(M) such that t is a residual of s after u. This normal β-trek is simply the ✄-normal-form of u; t. So in a sense, there is no "creation" in the residual theory of normal treks. A canonical ancestor theorem for normal treks follows, "replacing" the lack of such a theorem for β-redexes.
Theorem 6 (canonical ancestor). Every family is the set of residuals of a normal trek t, modulo homotopy equivalence of the redexes with history.
Remark: in contrast to sections 4 and 5, where a redex may have a non-trivial trek as residual, β-redexes have only β-redexes as residuals. This hides the concept of β-trek very much, and explains why the theory of β-treks was not formulated earlier.
Since Lévy's early steps, the optimality theory has been extended to term rewriting systems [Ma], to interaction systems [AL], and to higher-order systems [O3]. For instance, Asperti and Laneve gave several examples of higher-order rewriting systems where the "zig-zag" notion of Lévy family does not match the (more correct) "labelling" and "extraction" definitions.
We conjecture that our language of treks and residuals repairs this, and provides the relevant “residual theory” to discuss optimality in higher-order rewriting systems.
7 Conclusion
In this article, we revisit L´evy’s residual theory, in order to enscope “border-line” rewriting systems admitting “innocuous” critical pairs, like the associativity rule,
or the positive braid rewriting systems. We show in addition that the resulting residual theory clarifies Lévy's theory of optimality.
In future work, we would like to express an effective syntactic criterion (∗) indicating when critical pairs are "innocuous" in a given rewriting system, or equivalently, when the language of treks applies. The criterion would refine Knuth and Bendix's definition of critical pair, and formulate it in a more "combinatorial" style. A nice line of research initiated by Gérard Huet [Hu] will certainly guide us. Its main task is to devise conditions on critical pairs ensuring confluence. As far as we know, three such syntactical conditions have already been expressed for term rewriting systems:
1. Gérard Huet [Hu] devises a liberal criterion on critical pairs, inducing confluence of left- and right-linear term rewriting systems. His class of strongly closed rewriting systems is limited by the right-linearity (= non-duplication) condition on rewriting rules.
2. In the same article, Gérard Huet introduces a critical pair condition which avoids the right-linearity condition. Since then, the condition has been generalized by Yoshihito Toyama [To] and Vincent van Oostrom [O4].
3. Bernhard Gramlich [Gr] introduces a third criterion on critical pairs, which he calls a parallel critical pair condition. As in Huet-Toyama-van Oostrom's work, the condition implies confluence of left-linear term rewriting systems.
In his article, Bernhard Gramlich shows that the three conditions 1, 2 and 3 are not comparable with one another. For instance, the associativity rule (5) is an instance of 1 but not of 2 and 3. This line of research has not only produced interesting syntactic conditions. Two abstract conditions have also been exhibited:
1. Vincent van Oostrom [O1,O4] and Femke van Raamsdonk [R,OR] introduce weakly orthogonal rewriting systems and establish a confluence theorem for them.
2. Vincent van Oostrom and collaborators [O2,BOK,KOV2] establish a diagrammatic confluence theorem for a wide class of rewriting systems with decreasing diagrams.
Validity of these conditions, either syntactical or abstract, may be decided on any finite rewriting system. We are looking for a criterion (∗) just as effective.
There also remains to unify the approaches developed here and in Patrick Dehornoy's work on Garside groups [D1]. At this stage, one main difference is the choice of "local" strong normalization property. Instead of our finite development hypothesis, Dehornoy requires strong normalization of each homotopy equivalence class of paths. This choice enables a simpler axiomatics, closer to
Newman’s spirit, which enjoys an effective criterion [D2 ]. But it limits the scope of the theory. Typically, the pure λ-calculus is rejected because every path (λx.a)(∆∆) −→ · · · −→ (λx.a)(∆∆) −→ a lies in the homotopy equivalence class of (λx.a)(∆∆) −→ a where ∆ is the duplicating combinator ∆ = λy.yy. Acknowledgements. To Jan Willem Klop and Vincent van Oostrom for introducing me to braids rewriting and Garside’s theorem; to Patrick Dehornoy for explaining me his universal confluence theorems on monoids, and defying me to express them inside rewriting theory.
References
[A] E. Artin. Theory of braids. Annals of Mathematics, 48(1):101–126, 1947.
[ACCL] M. Abadi, L. Cardelli, P.-L. Curien, J.-J. Lévy. Explicit substitutions. Journal of Functional Programming, 1(4):375–416, 1991.
[AL] A. Asperti, C. Laneve. Interaction Systems I: the theory of optimal reductions. Mathematical Structures in Computer Science, 4(4):457–504, 1995.
[Ba] H. Barendregt. The Lambda Calculus: Its Syntax and Semantics. North-Holland, 1985.
[BKL] J. Birman, K.H. Ko and S.J. Lee. A new approach to the word problem in the braid groups. Advances in Math., 139(2):322–353, 1998.
[Bo] G. Boudol. Computational semantics of term rewriting systems. In Maurice Nivat and John C. Reynolds (eds), Algebraic Methods in Semantics. Cambridge University Press, 1985.
[BOK] M. Bezem, V. van Oostrom and J.W. Klop. Diagram techniques for confluence. Information and Computation, 141(2):172–204, 1998.
[C] P.-L. Curien. An abstract framework for environment machines. Theoretical Computer Science, 82(2):389–402, 1991.
[CF] H.B. Curry, R. Feys. Combinatory Logic I. North-Holland, 1958.
[D1] P. Dehornoy. On completeness of word reversing. Discrete Math., 225:93–119, 2000.
[D2] P. Dehornoy. Complete positive group presentations. Preprint, Université de Caen, 2001.
[DL] P. Dehornoy, Y. Lafont. Homology of gaussian groups. Preprint, Université de Caen, 2001.
[Ga] F.A. Garside. The braid group and other groups. Quarterly Journal of Mathematics, 20:235–254, 1969.
[Gr] B. Gramlich. Confluence without termination via parallel critical pairs. Proc. 21st Int. Coll. on Trees in Algebra and Programming (CAAP'96), ed. H. Kirchner, LNCS 1059, pp. 211–225, 1996.
[GLM] G. Gonthier, J.-J. Lévy, P.-A. Melliès. An abstract standardization theorem. Proceedings of the 7th Annual IEEE Symposium on Logic in Computer Science, Santa Cruz, 1992.
[Hin] J.R. Hindley. An abstract form of the Church-Rosser theorem. Journal of Symbolic Logic, vol 34, 1969.
[Hu] G. Huet. Confluent reductions: abstract properties and applications to term rewriting systems. Journal of the ACM, 27(4):797–821, 1980.
[HL] G. Huet, J.-J. Lévy. Computations in orthogonal rewriting systems. In J.-L. Lassez and G.D. Plotkin, editors, Computational Logic; Essays in Honor of Alan Robinson, pages 394–443. MIT Press, 1991.
[Hyl] J.M.E. Hyland. A simple proof of the Church-Rosser theorem. Manuscript, 1973.
[K1] J.W. Klop. Combinatory Reduction Systems. Volume 127 of Mathematical Centre Tracts, CWI, Amsterdam, 1980. PhD thesis.
[K2] J.W. Klop. Term rewriting systems. In S. Abramsky, D. Gabbay, T. Maibaum, eds, Handbook of Logic in Computer Science, vol. 2, pp. 2–117. Clarendon Press, Oxford, 1992.
[KOV1] J.W. Klop, V. van Oostrom and R. de Vrijer. Course notes on braids. University of Utrecht and CWI. Draft, 45 pages, July 1998.
[KOV2] J.W. Klop, V. van Oostrom and R. de Vrijer. A geometric proof of confluence by decreasing diagrams. Journal of Logic and Computation, 10(3):437–460, June 2000.
[KB] D.E. Knuth, P.B. Bendix. Simple word problems in universal algebras. In J. Leech (ed), Computational Problems in Abstract Algebra. Pergamon Press, Oxford, 1970. Reprinted in Automation of Reasoning 2, Springer Verlag, 1983.
[Lam] J. Lamping. An algorithm for optimal lambda-calculus reductions. In Proceedings of Principles of Programming Languages, 1990.
[Lé1] J.-J. Lévy. Réductions correctes et optimales dans le λ-calcul. Thèse de Doctorat d'Etat, Université Paris VII, 1978.
[Lé2] J.-J. Lévy. Optimal reductions in the lambda-calculus. In J.P. Seldin and J.R. Hindley, editors, To H.B. Curry, Essays on Combinatory Logic, Lambda Calculus, and Formalisms, pp. 159–191. Academic Press, 1980.
[Ma] L. Maranget. Optimal derivations in weak lambda-calculi and in orthogonal term rewriting systems. Proceedings POPL'91.
[Mc] S. Mac Lane. Categories for the Working Mathematician. Springer Verlag, 1971.
[M1] P.-A. Melliès. Description abstraite des systèmes de réécriture. Thèse de l'Université Paris VII, December 1996.
[M2] P.-A. Melliès. Axiomatic Rewriting Theory I: A diagrammatic standardization theorem. 2001. Submitted.
[M3] P.-A. Melliès. Axiomatic Rewriting Theory IV: A stability theorem in rewriting theory. Proceedings of the 14th Annual Symposium on Logic in Computer Science, Indianapolis, 1998.
[M4] P.-A. Melliès. Braids as a conflict-free rewriting system. Manuscript, Amsterdam, 1995.
[M5] P.-A. Melliès. Mac Lane's coherence theorem expressed as a word problem. Manuscript, Paris, 2001.
[MN] R. Mayr, T. Nipkow. Higher-order rewrite systems and their confluence. Theoretical Computer Science, 192:3–29, 1998.
[Ne] M.H.A. Newman. On theories with a combinatorial definition of "equivalence". Annals of Math., 43(2):223–243, 1942.
[Ni] T. Nipkow. Higher-order critical pairs. In Proceedings of Logic in Computer Science, 1991.
[O1] V. van Oostrom. Confluence for Abstract and Higher-Order Rewriting. PhD thesis, Vrije Universiteit, Amsterdam, 1994.
[O2] V. van Oostrom. Confluence by decreasing diagrams. Theoretical Computer Science, 121:259–280, 1994.
[O3] V. van Oostrom. Higher-order families. In Rewriting Techniques and Applications 96, Lecture Notes in Computer Science 1103, Springer Verlag, 1996.
[O4] V. van Oostrom. Finite family developments. In Rewriting Techniques and Applications 97, Lecture Notes in Computer Science 1232, Springer Verlag, 1997.
[O5] V. van Oostrom. Developing developments. Theoretical Computer Science, 175(1):159–181, 1997.
[OR] V. van Oostrom and F. van Raamsdonk. Weak orthogonality implies confluence: the higher-order case. In Logical Foundations of Computer Science 94, Lecture Notes in Computer Science 813, Springer Verlag, 1994.
[OV] V. van Oostrom and R. de Vrijer. Equivalence of reductions, and strategies. Chapter 8 of Term Rewriting Systems, a book by the TeReSe group. To appear this summer 2002.
[P] G.D. Plotkin. Church-Rosser categories. Unpublished manuscript, 1978.
[R] F. van Raamsdonk. Confluence and Normalisation for Higher-Order Rewriting. PhD thesis, Vrije Universiteit van Amsterdam, 1996.
[St] E.W. Stark. Concurrent transition systems. Theoretical Computer Science 64, May 1989.
[To] Y. Toyama. Commutativity of term rewriting systems. In K. Fuchi and L. Kott (eds), Programming of Future Generation Computer, vol II, pp. 393–407. North-Holland, 1988.
Static Analysis of Modularity of β-Reduction in the Hyperbalanced λ-Calculus
Richard Kennaway¹, Zurab Khasidashvili², and Adolfo Piperno³
¹ School of Information Systems, University of East Anglia, Norwich NR4 7TJ, England. [email protected]
² Department of Mathematics and Computer Science, Bar-Ilan University, Ramat-Gan 52900, Israel. [email protected]
³ Dipartimento di Informatica, Università di Roma "La Sapienza", Via Salaria 113, 00198 Roma, Italy. [email protected]
Abstract. We investigate the degree of parallelism (or modularity) in the hyperbalanced λ-calculus, λH , a subcalculus of λ-calculus containing all simply typable terms (up to a restricted η-expansion). In technical terms, we study the family relation on redexes in λH , and the contribution relation on redex-families, and show that the latter is a forest (as a partial order). This means that hyperbalanced λ-terms allow for maximal possible parallelism in computation. To prove our results, we use and further refine, for the case of hyperbalanced terms, some well known results concerning paths, which allow for static analysis of many fundamental properties of β-reduction.
S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 51–65, 2002. Springer-Verlag Berlin Heidelberg 2002

1 Introduction

Barendregt et al [7] have shown that there does not exist an optimal recursive one-step β-reduction strategy that, given a λ-term, would construct its shortest normalizing reduction path. To achieve optimality, Lévy [20, 21] proposed to consider multi-step normalizing strategies, where each multi-step contracts simultaneously a set of redexes of the same family. Lamping [18] and Kathail [9] proved that such multi-steps can indeed be implemented by single (sharing-) graph reduction steps. Since then, there have been many interesting discoveries both in the theory and practice of optimal implementation, and we refer the reader to the recent book on the subject by Asperti and Guerrini [4].
Lévy [20, 21] introduced the family concept in the λ-calculus in three different but equivalent ways: via a suitable notion of labelling that records the history of a reduction, via an extraction procedure that eliminates from a reduction all steps that do not 'contribute' to the creation of the last contracted redex, and by zig-zag, which is an equivalence closure of the residual relation. Asperti and Laneve [1] and Asperti et al [2] described a fourth way to characterize families via a concept of path, allowing in particular future redexes to be detected statically.
Here we continue our investigation of the hyperbalanced λ-calculus, λH [15, 16]; in particular, we study the degree of parallelism that λH can offer, by investigating the family relation in this restricted setting. The hyperbalanced λ-terms have a nice structure that makes it easy to analyze their normalization behavior statically. All simply typable terms are hyperbalanced up to a restricted η-expansion, and moreover all strongly normalizable terms can be made hyperbalanced by means of β-reduction and η-expansion [16]. We will see that the family relation (and the derived contribution relation) in the λH-calculus has several attractive properties that the family relation in the λ-calculus in general does not possess. In particular, in hyperbalanced terms, well balanced paths [1] (wbp) are determined uniquely by their initial and ending nodes, and all wbp are legal. Recall that the paths that define families are legal. From this we immediately obtain that the number of families in any hyperbalanced term is finite. We claim (we are not reporting the proof here) that this is at most quadratic in the size of the term, and that such a bound is tight. These results imply strong normalization of λH, and of the simply typed λ-calculus. Further, we prove a virtual Church-Rosser property for λH, allowing us to prove that redexes of the same family are residuals of a unique virtual redex. Finally, we prove that extraction normal forms are chains and that the contribution relation is a forest. This shows that λH allows for the maximal possible parallelism in computation. In our proofs, we employ substitution paths [15]. They allow for simple proofs of the fact that, in the λH-calculus, wbp = legal paths = virtual redexes. This equivalence of different definitions of future redexes shows that it is largely a matter of convenience (and taste) whether one decides to choose wbp or substitution paths in the proofs of the above properties of the contribution relation.
We find substitution paths simpler because the residual concept for them is based on a concept of descendant, while for wbp the ancestor concept is more complicated (the ancestor of a single edge may be a complex path, and this makes the definition of the residual concept for wbp harder). Moreover, substitution paths allow for a simple characterization of the total number of families, as well as of the number of needed families, in hyperbalanced terms (see Corollary 14).

Notation and Terminology. We mainly follow [5, 6] for the basic concepts and [21, 4] for advanced concepts related to the family relation, but with some deviations. All missing definitions can be found there. We write M →u N if N is obtained from M by contracting a β-redex u in M. For any terms N and M, N ⊆ M denotes that N is a subterm occurrence in M. We use w, v to denote terms formed by application, as well as redexes. For any reductions P and Q, P + Q denotes the concatenation of P and Q, P ⊔ Q denotes P + Q/P, where Q/P is the residual of Q under P, ≈L denotes Lévy- or strong-equivalence, and ≃ denotes the family relation. Due to lack of space, some proofs have been omitted from the text; they appear in [10]. These proofs combine induction with case analysis, and are rather routine anyway.
2 The λH-Calculus

The set Λ of λ-terms is generated by the grammar

    Λ ::= x | (λx.Λ) | (Λ1 Λ2),
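To make the grammar concrete, here is a minimal, hypothetical sketch in Python (not from the paper): an AST for the grammar above, together with capture-avoiding substitution and one-step leftmost-outermost β-reduction. All names (`Var`, `Lam`, `App`, `step`) are our own.

```python
from dataclasses import dataclass

# Terms of the grammar  Λ ::= x | (λx.Λ) | (Λ1 Λ2)
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    var: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

_fresh = [0]
def fresh(base):
    _fresh[0] += 1
    return f"{base}_{_fresh[0]}"

def free_vars(t):
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.var}
    return free_vars(t.fun) | free_vars(t.arg)

def subst(t, x, s):
    # capture-avoiding substitution t[s/x]
    if isinstance(t, Var):
        return s if t.name == x else t
    if isinstance(t, App):
        return App(subst(t.fun, x, s), subst(t.arg, x, s))
    if t.var == x:
        return t
    if t.var in free_vars(s):            # rename the binder to avoid capture
        y = fresh(t.var)
        return Lam(y, subst(subst(t.body, t.var, Var(y)), x, s))
    return Lam(t.var, subst(t.body, x, s))

def step(t):
    """Contract the leftmost-outermost β-redex; return None if t is normal."""
    if isinstance(t, App):
        if isinstance(t.fun, Lam):
            return subst(t.fun.body, t.fun.var, t.arg)
        f = step(t.fun)
        if f is not None:
            return App(f, t.arg)
        a = step(t.arg)
        if a is not None:
            return App(t.fun, a)
        return None
    if isinstance(t, Lam):
        b = step(t.body)
        return None if b is None else Lam(t.var, b)
    return None

# (λx.x x)(λy.y) → (λy.y)(λy.y) → λy.y
I = Lam("y", Var("y"))
t = App(Lam("x", App(Var("x"), Var("x"))), I)
t1 = step(t)
t2 = step(t1)
assert t1 == App(I, I)
assert t2 == I
```

The frozen dataclasses give structural equality for free, which is convenient for comparing reducts in the examples that follow.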
Static Analysis of Modularity of β-Reduction
where x belongs to a countable set Var of term variables. Terms are considered modulo renaming of bound variables, and the variable convention [5] is assumed; namely, bound variables are always chosen to be different from free ones.
Fig. 1. Structure of λ-terms: abstraction terms correspond to subtrees rooted in a shadowed box, application terms correspond to subtrees rooted in a dashed box, while head-variable terms correspond to subtrees rooted in a rounded box. Arrow decoration is shown up to rank 2. vr i=virtual redex of rank i (see below).
We can see from the example in Figure 1 that λ-terms can also be defined by the following grammar, where terms of the shape Qi,j, λx1...xn.S, and x or Ri will be called application terms, abstraction terms, and head-variable terms, respectively:

    Λ    ::= S | λx1...xn.S              (n > 0)
    S    ::= x | Ri (i > 0) | Qi,j (i, j > 0)
    Ri   ::= x Λ1 ... Λi
    Qi,j ::= (λx1...xj.S) Λ1 ... Λi                    (1)
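The classification of grammar (1) can be sketched operationally. The following Python fragment (our own tuple encoding, not the paper's) decides whether a node is an abstraction, application, or head-variable term, and for an application term Qi,j returns the pair (i, j).

```python
# Terms are tuples: ("var", x) | ("lam", x, body) | ("app", f, a).

def spine(t):
    """Unwind nested applications: return (head, [arg1, ..., argi])."""
    args = []
    while isinstance(t, tuple) and t[0] == "app":
        args.append(t[2])
        t = t[1]
    return t, list(reversed(args))

def classify(t):
    if isinstance(t, tuple) and t[0] == "lam":
        return ("abstraction", None)
    head, args = spine(t)
    if isinstance(head, tuple) and head[0] == "lam":
        # Q_{i,j}: i arguments, j initial abstractions of the operator
        j, body = 0, head
        while isinstance(body, tuple) and body[0] == "lam":
            j += 1
            body = body[2]
        return ("application", (len(args), j))
    # x Λ1 ... Λi: a head-variable term R_i (or a bare variable when i = 0)
    return ("head-variable", len(args))

# ((λx.λy.x) a) b  is an application term Q_{2,2}
t = ("app", ("app", ("lam", "x", ("lam", "y", ("var", "x"))), ("var", "a")), ("var", "b"))
assert classify(t) == ("application", (2, 2))
assert classify(("lam", "x", ("var", "x"))) == ("abstraction", None)
assert classify(("app", ("var", "x"), ("var", "a"))) == ("head-variable", 1)
```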
Notation 1 We will often need to write the application operators explicitly in application and head-variable terms. For example, the application and head-variable terms above are written as @(...@(λx1...λxj.L, L1),..., Li) and @(...@(z, N1),..., Nj), respectively. The patterns of these terms are the contexts obtained from them by removing the body and the arguments, and the head and the arguments, respectively.

Consider the set of λ-terms whose generic element is such that every application subterm occurring in it has as many binding variables as arguments.
Definition 2 (i) (Balanced terms) A λ-term is called balanced iff, in every application subterm Qi,j in it, one has i = j.
(ii) An application subterm N ≡ (λx1...xn.P)N1...Nn of a balanced term M is semi-hyperbalanced iff, whenever N1 has k (≥ 0) initial abstractions, every free occurrence of x1 in P has exactly k arguments.
(iii) (Hyperbalanced terms) Define the set Λ̂ of hyperbalanced terms as the greatest set of balanced terms such that
- Λ̂ is closed with respect to β-reduction;
- for any M ∈ Λ̂, every application subterm in M is semi-hyperbalanced.

For example, the term Ω = (λx.xx)(λx.xx) is a (non-normalizable!) balanced term whose reducts all remain balanced but which is not hyperbalanced.
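Definition 2(i) is easy to check mechanically. A sketch, under our own tuple encoding of terms: a term is balanced iff every application subterm whose operator is an abstraction has exactly as many initial abstractions as arguments. (Hyperbalancedness, being a greatest fixed point over all reducts, is not decided by this local check; the Ω example below is balanced but not hyperbalanced.)

```python
# Terms are tuples: ("var", x) | ("lam", x, body) | ("app", f, a).

def spine(t):
    args = []
    while t[0] == "app":
        args.append(t[2])
        t = t[1]
    return t, list(reversed(args))

def n_lams(t):
    n = 0
    while t[0] == "lam":
        n += 1
        t = t[2]
    return n

def balanced(t):
    if t[0] == "var":
        return True
    if t[0] == "lam":
        return balanced(t[2])
    head, args = spine(t)
    # a Q_{i,j} subterm: operator is an abstraction; require i = j
    if head[0] == "lam" and n_lams(head) != len(args):
        return False
    body = head
    while body[0] == "lam":
        body = body[2]
    return balanced(body) and all(balanced(a) for a in args)

# Ω = (λx.x x)(λx.x x) is balanced (i = j = 1) but not hyperbalanced.
xx = ("app", ("var", "x"), ("var", "x"))
omega = ("app", ("lam", "x", xx), ("lam", "x", xx))
assert balanced(omega)
# (λx.λy.x) a  is Q_{1,2}, hence unbalanced.
assert not balanced(("app", ("lam", "x", ("lam", "y", ("var", "x"))), ("var", "a")))
```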
Definition 3 Let M →u M′ be a β-reduction step, where u ≡ (λx.P)Q. As pointed out in [19], a new β-redex v can be created in M′ in three substantially different ways:
(1) upwards: when P is an abstraction and it becomes the operator of v;
(2) downwards: when P ≢ x and Q becomes the operator of v;
(3) upwards: when u ≡ (λx.x)Q (i.e., P ≡ x), (λx.x)QQ′ ⊆ M, and v = QQ′.
The third type of creation is not possible in hyperbalanced terms. Now let w be the application subterm corresponding to u, written w = aps(u), and let w′ = aps(v). Then, correspondingly, w′ will be called a created application subterm of type 1 or 2. For any other (non-created) application subterm w* in M′, its corresponding redex u* is a u-residual of a redex u″ in M, and in this case we will say that w* is a residual of w″ = aps(u″).
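The two creation types that survive in hyperbalanced terms can be observed on tiny examples. A toy Python illustration (our own encoding, not the paper's machinery; variables are chosen distinct, so naive substitution suffices here):

```python
# Terms are tuples: ("var", x) | ("lam", x, body) | ("app", f, a).

def subst(t, x, s):
    # naive substitution; safe because all binders below are distinct
    if t[0] == "var":
        return s if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, s))
    return ("app", subst(t[1], x, s), subst(t[2], x, s))

def step(t):
    """One leftmost-outermost β-step; None if t is normal."""
    if t[0] == "app":
        if t[1][0] == "lam":
            return subst(t[1][2], t[1][1], t[2])
        for i in (1, 2):
            r = step(t[i])
            if r is not None:
                return ("app", r, t[2]) if i == 1 else ("app", t[1], r)
    if t[0] == "lam":
        r = step(t[2])
        return None if r is None else ("lam", t[1], r)
    return None

def is_redex(t):
    return t[0] == "app" and t[1][0] == "lam"

# Type 1 (upwards): in ((λx.λy.y) a) b, contracting the inner redex turns
# its contractum λy.y into the operator of a new redex (λy.y) b.
t1 = ("app", ("app", ("lam", "x", ("lam", "y", ("var", "y"))), ("var", "a")), ("var", "b"))
assert not is_redex(t1)        # the outer application is not yet a redex
assert is_redex(step(t1))      # after one step it is: (λy.y) b

# Type 2 (downwards): in (λx.x a)(λy.y), the argument λy.y is substituted
# for x, creating the redex (λy.y) a.
t2 = ("app", ("lam", "x", ("app", ("var", "x"), ("var", "a"))), ("lam", "y", ("var", "y")))
assert is_redex(step(t2))      # (λy.y) a
```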
3 Virtual Redexes in the λH-Calculus

In this section, we shall show that redex families in the λH-calculus are determined uniquely by 'matching pairs' of @- and λ-nodes. To do this, we give an inductive definition of matching pairs, which we call virtual redexes, via substitution paths [15]. Then we show (using substitution paths) that every virtual redex uniquely determines a well balanced path (wbp) connecting the corresponding @- and λ-occurrences, and that all wbp are legal. We conclude by using the 1-1 correspondence between legal paths and families obtained in [1].

Definition 4 (Well balanced paths [1]) Let M be a λ-term.
(1) Any edge connecting a @-occurrence with its left son, ?, whose 'type' ? may be λ, @ or v (variable), is a well balanced path (wbp for short) of type @-?.
(2) Let ψ be a wbp of type @-v whose ending variable is bound by a λ-node λ′, and let φ be a wbp of type @-λ coming into λ′. Then ψ(φ)^r ρ is a wbp, where ρ is an edge outgoing from the second argument of the initial node of φ. The type of ψ(φ)^r ρ depends on the node connected to ρ.
(3) Let ψ be a wbp of type @-@ ending in a node @′, and let φ be of type @-λ leading from @′ to some λ-node λ′. Then ψφρ is a wbp, where ρ is the edge that outgoes λ′ towards its body. The type of ψφρ depends on the connection of the edge ρ.

Lemma 5 If @′φλ′ is a wbp of type @-λ, in a λ-term, connecting a @-occurrence @′ with a λ-occurrence λ′, then there is no other wbp of type @-λ that starts at @′ and properly extends φ.
Proof. Immediate from Definition 4. □
Definition 6 (Arrows, ranks, degrees) Let M ∈ Λ̂.
Step 1: For any application subterm w ≡ (λy1...yl.N)M1...Ml in M, draw arrows connecting every Mi with all free occurrences of yi in N (if any), i = 1,...,l. The rank of all such arrows is 1, and the arrows are relative to w: any arrow coming out of Mi is called an i-th arrow relative to w. If w = aps(u), arrows coming out of M1 are also called relative to u. The degree of any arrow ι coming out of Mi is deg(ι) = i.
Further, for any k ≥ 2, do (until applicable):
Step k: For any head-variable subterm zN1...Nm such that z has received an arrow κ of rank k−1 from an abstraction L ≡ λz1...zm.P in the (k−1)-th step¹, draw arrows connecting every Nj with all free occurrences of zj in P, j = 1,...,m. The rank of any such arrow is k, and the degree of any arrow ι coming out of Nj is deg(ι) = deg(κ) + j. Call ι a successor, or a j-th successor, of κ, and call κ the predecessor of ι. If κ is relative to an application subterm w ⊆ M (respectively, to a β-redex u), then so is ι.

An arrow ι from a subterm N ⊆ M ∈ Λ̂ to an occurrence x ∈ M is written ι : N ↦ x, and rank(ι) denotes its rank. We have shown in [15] that rank(ι) determines the minimal number of superdevelopments [22] needed to perform the substitution corresponding to ι. Similarly, we will see that deg(ι) is the number of β-steps needed to perform the substitution corresponding to ι. See Figure 1 for a graphical example.
Definition 7 (Virtual redexes) Let λ′ be a λ-occurrence in M ∈ Λ̂. Assume that M is marked with arrows (according to Definition 6). We define the matching application @′ for λ′, the degree of the virtual redex (@′, λ′), and the arrows that (@′, λ′) contracts, as follows:
(1) Let λ′ occur as the λ-occurrence of λxi in an application subterm

    w = @(...@(λx1...λxm.S, T1),..., Tm).

Then the matching application @′ for λ′ is the i-th from the right occurrence of @ in the pattern of w. Further, (@′, λ′) is called the i-th virtual redex relative to w, and the degree of (@′, λ′) is deg(@′, λ′) = i − 1. Finally, we say that (@′, λ′) contracts all arrows coming out of Ti.
(2) Let λ′ occur as the λ-occurrence of λxi in an abstraction subterm

    L = λx1...λxm.S.

For any arrow ι : L ↦ z in M, the matching application @′ for λ′, relative to ι, is the i-th from the right occurrence of @ in the pattern of the head-variable subterm R = @(...@(z, T1),..., Tm) corresponding to z. (Thus λ′ has as many matching applications as the number of arrows coming out of L.) Further, (@′, λ′) is called the i-th virtual redex relative to ι, (@′, λ′) is relative to the same application subterm as ι, and the degree of (@′, λ′) is deg(@′, λ′) = deg(ι) + i − 1. Finally, we say that (@′, λ′) contracts all arrows coming out of Ti (such arrows are i-th successors of ι).

¹ We observe that such a step cannot in general be successfully carried out in the unrestricted λ-calculus, where the initial abstractions of L are not required to be exactly m.
The degree of a virtual redex is the number of steps needed to create the 'corresponding' redex. This will be made precise in Section 5. A virtual redex is illustrated in Figure 1.

Convention: Clearly, a virtual redex v with degree 0 constitutes the pattern of a unique β-redex, and in the sequel we will identify v with the latter redex. The set of virtual redexes in a term M ∈ Λ̂ will be denoted by vr(M). We will again use v, u and w to denote virtual redexes.
Lemma 8 Let φ be a wbp of type @-λ in M ∈ Λ̂, connecting an occurrence @′ with a λ-occurrence λ′. Then (@′, λ′) is a virtual redex.

Proof. (See [10] for a detailed proof) By induction on the length of φ. □

Lemma 9 Let M ∈ Λ̂ and u = (@′, λ′) ∈ vr(M), and let u contract an arrow L ↦ x. Then L is the second argument of @′ and λ′ binds x.

Proof. The lemma is an immediate consequence of Definitions 6 and 7. □

Lemma 10 Let v′ = (@′, λ′) be a virtual redex in M ∈ Λ̂. Then there is a wbp of type @-λ that connects @′ with λ′.

Proof. (See [10] for a detailed proof) By induction on n = deg(v′). □

Along the proof of Lemma 8, we have proved the following characterization of wbp in the λH-calculus:

Lemma 11 Let φ be a wbp of type @-λ connecting the occurrences @′ and λ′ in M ∈ Λ̂.
(1) If the left son of @′ is a λ-occurrence, then it coincides with λ′ and φ = @′λ′.
(2) If the left son of @′ is a @-occurrence, @″, then φ = @′@″ψλ″λ′, where @″ψλ″ is a wbp and λ′ is a son of λ″.
(3) If the left son of @′ is a variable occurrence, x, then φ = @′xλ″(ψ)^r@″λ′, where @″ψλ″ is a wbp and λ′ is the right son of @″.
From this characterization of wbp we conclude that wbp in hyperbalanced terms do not contain cycles, and that all wbp are legal paths. (So there is no need to recall the definition of cycles or legal paths from [1].) Furthermore, we can prove that virtual redexes determine their corresponding wbp uniquely:

Theorem 12 Let φ and ψ be wbp connecting the same occurrences in M ∈ Λ̂. Then φ = ψ.

Proof. Suppose on the contrary that φ and ψ are two different wbp connecting the same occurrences, @′ and λ′, in M, and assume that φ is a shortest such path. If the left son of @′ in φ is a λ, then φ = ψ = @′λ′ by Lemma 5. If the left son of @′ is a @-occurrence @″, then by Lemma 11, φ = @′@″φ″λ″λ′ and ψ = @′@″ψ″λ″λ′, where @″φ″λ″ and @″ψ″λ″ are wbp and λ′ is a son of λ″. But φ″ = ψ″ by the minimality of φ, implying φ = ψ. The case when the left son of @′ is a variable is similar to the previous case. □

Now, from the theorem of Asperti and Laneve [1] stating that there is a 1-1 correspondence between the families in a λ-term and wbp of type @-λ, we get the following theorem:
Theorem 13 Let M ∈ Λ̂. Then there is a 1-1 correspondence between families relative to M and virtual redexes in M.

Let the arity of an arrow ι : λx1...xn.L ↦ x be n. Then, by Theorem 13 and the definition of virtual redexes (Definition 7), any arrow of arity n in M ∈ Λ̂ determines n families relative to M. Further, recall from [15] that the garbage-collected form M° of a term M ∈ Λ̂ can be constructed from M statically. Now we can state and prove the following corollaries:
Corollary 14 (1) The number of families in any hyperbalanced λ-term M coincides with the sum of the arities of application subterms and arrows in M. Thus this number is finite, and it can be determined statically.
(2) The number of multi-steps in the normalizing leftmost-outermost complete family-reduction starting from M coincides with the sum of the arities of application subterms and arrows in the garbage-collected form M° of M; thus it can be determined statically.

Proof. (1) By Definition 7, all virtual redexes in M can be found statically, and it remains to apply Theorem 13.
(2) By [15, Theorem 4.7], the garbage-collected form M° of M can be found statically. Since the set of garbage-collected terms is closed under reduction [15], any reduction starting from M° is needed.² Hence the length of a complete family-reduction starting from M° coincides with the number of families relative to M°, i.e., the number of needed families in M, and it remains to observe that the leftmost-outermost normalizing complete family-reductions starting from M and M° have the same length. □

Corollary 15 (1) The λH-calculus is strongly normalizing wrt β-reduction.
(2) The simply typed λ-calculus is strongly normalizing wrt β-reduction.

Proof. (1) Exactly as in [3]: immediate from Corollary 14.1 and the Generalized Finite Developments theorem [21, Theorem 5.1], stating that any reduction that contracts redexes in finitely many families is terminating.
(2) Exactly as in [15], by combining (1) with the following two facts: (a) any simply typable λ-term has a hyperbalanced η-expansion; and (b) contraction of η-redexes can be postponed after contraction of β-redexes in any βη-reduction (since η-steps do not create β-redexes) [5, Corollaries 15.1.5-6]. □

² Closure of the set of garbage-free terms (called good terms in [15]) under reduction implies that M° ∈ Λ̂: indeed, M° is balanced, since it is obtained from M by replacing a number of components with fresh variables; and every application subterm in M° is semi-hyperbalanced, since there are no arrows in M° coming out of the replaced subterms.

4 Virtual Church-Rosser Theorem

Let us recall the concept of descendant from [11], which allows the tracing of subterms along β-reductions. In a β-step M →u M′, where u ≡ (λx.L)N, the contracted redex u does not have a residual, but it has a descendant: the descendant of u, as well as of its operator λx.L and its body L, is the contractum of u. The descendants of free occurrences of x in L are the substituted occurrences of N. The descendants of other subterms of M
are intuitively clear (see [14] for a more precise definition). This definition extends to arbitrary β-reductions by transitivity.

We recall from [21, 5] that any (finite) strongly equivalent reductions P : M ↠ N and Q : M ↠ N are H-equivalent: for any redex w ⊆ M, w/P = w/Q. Further, it follows from the results in [11] that P ≈L Q also implies that P and Q are strictly-equivalent: for any subterm L of M, the descendants of L under P and under Q are the same subterms of N. We will refer to these two properties as HCR and SCR, respectively. Our aim in this section is to generalize HCR to virtual redexes. The result will be used in Section 6 to prove a new characterization of families via virtual redexes. We first prove an analog of (a weaker form of) HCR for residuals of arrows. The following definition of residuals of arrows is equivalent to the one in [15], but differs in that the current definition is inductive, and the residuals of arrows are defined as arrows; so, unlike [15], there is no need to prove this by a tedious case analysis. Pictures illustrating the residual concept for arrows can be found in [10] (see also [15]).

Definition 16 (Residuals of arrows)³ Let M →u M′, let u ≡ (λy.P′)P, and let ι : N ↦ x be an arrow in M ∈ Λ̂. We define the u-residuals of ι in M′ by induction on n = rank(ι).
(1) (n = 1) Then ι is the j-th arrow relative to an application subterm w (for some 1 ≤ j ≤ arity(w)).
(a) If w = aps(u) and j = 1 (i.e., if u contracts ι), then ι does not have u-residuals.
(b) If w = aps(u) and j > 1, then u creates a unique application subterm w′ of type 1, and the (j−1)-th arrow ι′ : N′ ↦ x′ relative to w′ such that x′ is a u-descendant of x is the unique u-residual of ι.
(c) If w ≠ aps(u), then for any u-residual w′ ⊆ M′ of w, any j-th arrow ι′ : N′ ↦ x′ relative to w′ such that x′ is a u-descendant of x is a u-residual of ι.
(2) (n > 1) Let ι be, say, the m-th successor of κ : L ↦ z.
(a) If n = 2 and u contracts κ (and thus L = P and z is a free occurrence of y in P′), then the unique u-residual of ι is the arrow ι′ : N′ ↦ x′ in M′ such that: ι′ is relative to the application subterm w′ of type 2 created by contraction of κ; and x′ is the unique u-descendant of x that is in the copy of L that substitutes z.
(b) If u does not contract κ and κ does not have a u-residual, then ι does not have a u-residual either.
(c) Otherwise, for any u-residual κ′ of κ, any m-th successor of κ′ in M′ such that x′ is a u-descendant of x is a u-residual of ι.

Definition 17 (Residuals of virtual redexes) Let M ∈ Λ̂, let M →u M′, and let v ∈ vr(M) be relative to an application subterm w. We define residuals of v as virtual redexes in M′ by induction on n = deg(v).
- (n = 0) Then v is a redex, and the residuals of the redex v are declared to be the residuals of the virtual redex v.
- (n > 0) We distinguish two subcases:
(a) Let v be the i-th virtual redex relative to w (for some i > 1). If w = aps(u), then u creates a unique application subterm w′ of type 1, and arity(w′) = arity(w) − 1. Then the (i−1)-th virtual redex relative to w′ is the only u-residual of v. Otherwise, any u-residual w* of w is an application subterm such that arity(w*) = arity(w), and the i-th virtual redex relative to w* is a u-residual of v (that is, v has as many u-residuals as w).
(b) Let v be the i-th virtual redex relative to an arrow ι : L ↦ z relative to w. If rank(ι) = 1 and u contracts ι, then ι collapses, and the u-descendant L′ of L that substitutes z becomes the operator of a created application subterm w′ of type 2 in M′. (In this case we say that w′ is created by contracting ι.) In this case, the i-th virtual redex relative to w′ is the only u-residual of v. If ι does not collapse and does not have a u-residual, then v does not have a u-residual either. Otherwise, for any u-residual ι′ of ι, the i-th virtual redex relative to ι′ is a u-residual of v.

³ [15], as well as [10], contains figures illustrating the definition.

Remark 18 One can easily verify (by induction on rank(ι)) that, in the notation of Definition 16, if ι is relative to aps(u) ⊆ M and u does not contract ι, then ι has exactly one u-residual in M′. From this and Definition 17 it is immediate that any virtual redex in M relative to aps(u) and different from u has exactly one u-residual in M′.
Notation 19 We will use the same notation for residuals of arrows and virtual redexes as for residuals of redexes. For example, ι/P will denote the residuals of ι after the reduction P.

Lemma 20 Let u1, u2 ∈ M ∈ Λ̂, and let ι be an arrow in M. Then the residuals of ι under u1 ⊔ u2 and u2 ⊔ u1 are the same arrows.

Proof. (See [10] for a detailed proof) Induction on rank(ι), using Definitions 3 and 16. □

Lemma 21 Let u1, u2 ∈ M ∈ Λ̂, and let v ∈ vr(M). Then v/(u1 ⊔ u2) = v/(u2 ⊔ u1).

Proof. (See [10] for a detailed proof) Case analysis, according to Definition 17. □

Theorem 22 (Virtual Church-Rosser, VCR) Let M ∈ Λ̂, let w ∈ vr(M), and let P : M ↠ N and Q : M ↠ N be Lévy-equivalent. Then w/P = w/Q.

Proof. Lemma 21 implies the theorem exactly as in the case when w is a redex; see e.g. [17]. □
5 Extraction Normal Forms and Chains

In this section we show that if Pv is an extraction normal form in the λH-calculus, then P is a chain, that is, every step of P except the last creates the next contracted redex. We show that the converse is also true. We will use this characterization of extraction normal forms to study the contribution relation in the λH-calculus. Actually, we also use this characterization to show that redexes of the same family are residuals of a unique virtual redex, but we expect that this result can be proved for the λ-calculus in general, where extraction normal forms need not be chains.

Lemma 23 Let M ∈ Λ̂, let M →u M′, and let v = aps(u).
(1) Let ι′ be a u-residual of an arrow ι relative to an application subterm w in M ∈ Λ̂. If w = v, then deg(ι′) = deg(ι) − 1; otherwise, deg(ι′) = deg(ι).
(2) Any arrow in M′ is a residual of an arrow in M.

Proof. (See [10] for a detailed proof) (1) By induction on n = rank(ι).
(2) By induction on the rank n = rank(ι′) of an arrow ι′ : N′ ↦ x′ in M′, it is shown that ι : N ↦ x is an arrow in M whose u-residual is ι′, where x is the only u-ancestor of x′ and N is the only u-ancestor of N′ that is an abstraction (but is not the operator of a redex).⁴ □

Using Lemma 23, we now prove its analog for virtual redexes.
Lemma 24 Let M ∈ Λ̂, let M →u M′, and let v′ ∈ vr(M′) be a u-residual of a virtual redex v ∈ vr(M) relative to an application subterm w. If w = aps(u), then deg(v′) = deg(v) − 1; otherwise, deg(v′) = deg(v).

Proof. If deg(v) = 0, then w = aps(u), both v and v′ are redexes, and deg(v′) = 0 = deg(v) by Definition 7. So let n = deg(v) > 0. We distinguish two subcases:
(a) Let v be the i-th virtual redex relative to w (i > 1 since n > 0). If w = aps(u), then v has exactly one residual, v′, which is the (i−1)-th virtual redex relative to the created application subterm w′ of type 1; hence deg(v′) = i − 2 = deg(v) − 1 by Definition 7. And if w ≠ aps(u), then any residual v′ of v is the i-th virtual redex relative to some u-residual w* of w (if any); hence deg(v′) = i − 1 = deg(v) by Definition 7.
(b) Now let v be the i-th virtual redex relative to an arrow ι relative to w. Then deg(v) = deg(ι) + i − 1 by Definition 7. If deg(ι) = 1 and u contracts ι (in this case, w = aps(u)), then v′ is the i-th virtual redex relative to the application subterm w′ of type 2 created by the contraction of ι; thus deg(v′) = i − 1 = deg(v) − 1. Otherwise, v′ is the i-th virtual redex relative to a u-residual ι′ of ι, and deg(v′) = deg(ι′) + i − 1 by Definition 7. Now it remains to apply Lemma 23. □
Lemma 25 Let M ∈ Λ̂, let M →u M′, and let v′ ∈ vr(M′). Then v′ is a u-residual of a unique virtual redex in M.

Proof. By Definition 3, we need to consider the following four cases:
(1) Let v′ be the i-th virtual redex relative to a residual w′ ⊆ M′ of an application subterm w ⊆ M (hence w ≠ aps(u)). Then, by Definition 17, v′ is the u-residual of (only) the i-th virtual redex relative to w.
(2) Let v′ be the i-th virtual redex relative to a created application subterm w′ ⊆ M′ of type 1. Then, by Definition 17, v′ is the u-residual of (only) the (i+1)-th virtual redex relative to aps(u) (in this case, w = aps(u)).
(3) Let v′ be the i-th virtual redex relative to the application subterm w′ ⊆ M′ of type 2 created by contraction of an arrow ι. Then, by Definition 17, v′ is the u-residual of (only) the i-th virtual redex relative to ι (in this case, w = aps(u)).
(4) Let v′ be the i-th virtual redex relative to an arrow ι′. Then, by Lemma 23, ι′ is a u-residual of an arrow ι in M, and by Definition 17, v′ is the u-residual of (only) the i-th virtual redex relative to ι. □

Definition 26 We call a reduction P a chain if every step in P except the last creates the next contracted redex.

⁴ The only two different abstractions in M that may have the same descendant abstraction in M′ are the operator λy.P′ and the body P′ of the contracted redex u (when P′ is an abstraction), and it can be shown that there is no arrow ι′ : N′ ↦ x′ in M′ such that N′ is the u-descendant of P′.
Definition 27 Let w ∈ vr(M), let P : M ↠ N, and let v ∈ N be a P-residual of w. Then we call v, or rather Pv, a w-redex.

The following two corollaries are now immediate from Lemma 24 and Lemma 25, respectively:
Corollary 28 Let v0 be a virtual redex in M0 ∈ Λ̂ with deg(v0) = n. Then there is a reduction P : M0 →u0 M1 →u1 ... →u_{n−1} Mn and a redex un ⊆ Mn such that v0 has a unique residual vi in Mi (i = 0,...,n), deg(vi) = n − i, vi is relative to aps(ui) (hence un = vn and Pun is a v0-redex), and P + un is a chain.

Corollary 29 Let P : M ↠ N, where M ∈ Λ̂, and let v ⊆ N. Then there is a unique virtual redex w ∈ vr(M) such that Pv is a w-redex.
We will now give a characterization of extraction normal forms in the λH-calculus. We refer to [21, 4] for the definition of the extraction relation and extraction normal forms in the λ-calculus. We denote by enf(M) the set of extraction normal forms relative to M.

Lemma 30 Let P : M ↠ N be a chain, where M is any (i.e., not necessarily hyperbalanced) term. Then P ∈ enf(M).

Proof. It is trivial to check that none of the four types of extraction steps in Definition 4.7 of [21] can be applied to a chain P, which means that P is in extraction normal form. □

The converse is not true in the λ-calculus in general, but it is true in the λH-calculus (Lemma 34). We need two simple lemmas first.
Lemma 31 (See [10]) Let v1 be in the argument of a redex u = (λx.L)N ⊆ M ∈ Λ̂, let v1′ ∈ v1/u, and let v1′ create a redex v2′. Then v1 creates a redex v2 such that v2′ ∈ v2/u′, where u′ = u/v1:

             v1          v2
        M ------> · ------> ·
        |         |         |
        u         u′        |
        ↓         ↓         ↓
        · ------> · ------> ·
             v1′         v2′

Definition 32 Let N0 ⊆ M0. We call a reduction P : M0 →u0 M1 →u1 ... internal to N0 if every ui is in the unique descendant Ni ⊆ Mi of N0 along P.

Lemma 33 Let M ∈ Λ̂ and let P : M ↠ N be a chain. Then P is internal to the application subterm corresponding to the first redex contracted in P.

Proof. Immediate from the fact that if M →u M′ creates a redex v ⊆ M′ (of type 1 or 2), then the application subterm corresponding to v is inside the descendant of the application subterm corresponding to u. □

Now we can prove the converse of Lemma 30.

Lemma 34 Let P be a reduction in the λH-calculus in extraction normal form. Then P is a chain.
Proof. Suppose on the contrary that there is an extraction normal form P which is not a chain, and assume P is a shortest such reduction. Then P has the form

    P : M0 →u0 M1 →u1 M2 ↠ Mn,

where u1 is a u0-residual of some redex v0 ⊆ M0. Let w0 = aps(v0), and let

    P′ : M1 →u1 M2 ↠ Mn.

By Definition 4.7.1 of [21], n > 2. Since P is standard, there are three possible relative positions of u0 and v0.
(1) The redexes u0 and v0 are disjoint, and v0 is to the right of u0. Then w0 is disjoint from u0, and by Lemma 33, P′ is internal to the residual w1 of w0. Hence there is a reduction Q : M0 ↠ N internal to w0 such that P′ = Q/u0, and by Definition 4.7.2 of [21], u0 can be extracted from P, a contradiction.
(2) The redex v0 is in the body of u0. By Lemma 31, there exist redexes v1,...,v_{n−2} such that the following diagram commutes:

              v0         v1                     v_{n−2}
        M0 ------> N1 ------> N2  · · ·  N_{n−2} ------> N_{n−1}
        |          |          |                          |
        u0         u1′        u2′                        u′_{n−1}
        ↓          ↓          ↓                          ↓
        M1 ------> M2 ------> M3  · · ·  M_{n−1} ------> Mn
              u1         u2                     u_{n−1}

where u_j′ = u0/(v0 + ... + v_{j−1}) (j = 1,...,n−1) and u_{i+1} = v_i/u_i′ (i = 0,...,n−2, taking u0′ = u0), and Q : v0 + ... + v_{n−2} is a chain. By Lemma 33, Q is internal to w0, hence internal to the body of u0, and we have P′ = Q/u0. Hence by Definition 4.7.3 of [21], u0 can be extracted from P, a contradiction.
(3) The redex v0 is in the argument S of u0. Further, u1 is in a substituted copy S′ of S. Then the application subterm w0 has a residual w1 ⊆ M1 which is again an application subterm (since M0, M1 ∈ Λ̂), and by Lemma 33, P′ is internal to w1 (and thus is internal to S′). Hence by Definition 4.7.4 of [21], u0 can be extracted from P, a contradiction. □
Corollary 35 In the λH -calculus, a reduction P is a chain iff it is in extraction normal form.
6 From Zig-Zag Back to Residuals

We show that redexes with co-initial histories are in the same family iff they are residuals of the same virtual redex in the initial term. This is actually a new characterization of families in the λH-calculus. (Asperti and Laneve [1] only showed how the path corresponding to a family can be traced back along the history of the canonical representative of the family.) We will make use of the virtual Church-Rosser property, which in particular implies that residuals of virtual redexes under reductions are invariant under Lévy equivalence. We expect that such a characterization can also be given for the λ-calculus in general.

Lemma 36 (See [10]) Let M ∈ Λ̂, and let Pv, P′v′ ∈ enf(M) be v*-redexes for some v* ∈ vr(M). Then Pv = P′v′.
Remark 37 Using Lemma 36, we can actually give another proof of the 1-1 correspondence between families and virtual redexes in the λH-calculus (Theorem 13). That proof does not use the properties of wbp from [1]. Recall from [21] that every family relative to M has exactly one member in extraction normal form, and the latter are different for different families. Hence it is enough to prove that there is a 1-1 correspondence between enf(M) and vr(M). By Corollary 29, there is a function f : enf(M) → vr(M) assigning to any Pv ∈ enf(M) the (unique) virtual redex v* ∈ vr(M) such that Pv is a v*-redex. By Corollary 28, f is a bijection. It remains to use Lemma 36.

Now we strengthen the above theorem to yield the promised characterization of families via residuals of virtual redexes.

Theorem 38 Let M ∈ Λ̂. Redexes with histories Pv and Qu relative to M are in the same family iff there is a virtual redex w ∈ vr(M) such that both Pv and Qu are w-redexes.

Proof. (⇒) Let Pv ≃ Qu. Since ≃ is the equivalence closure of the copy relation, it is enough to consider the case when Qu is a copy of Pv, i.e., Q ≈L P + P′ for some P′ and u ∈ v/P′. By Corollary 29, there is a unique w ∈ vr(M) such that Qu is a w-redex. By VCR (Theorem 22), u ∈ w/(P + P′), and by Lemma 25, v ∈ w/P.
(⇐) Let Pv and Qu be w-redexes. Further, let P′v′ and Q′u′ be the extraction normal forms of Pv and Qu, respectively. By (⇒), P′v′ and Q′u′ are w-redexes, and P′v′ = Q′u′ by Lemma 36. Hence Pv ≃ Qu by Theorem 4.9 of [21]. □
7 Creation and Contribution Relations
In this section we show that, for any M ∈ Λ̂, the contribution relation ↪ on the set FAM(M) of families relative to M is a forest (as a partial order). That is, the down-set {φ′ | φ′ ↪ φ} of any family φ ∈ FAM(M), wrt ↪, is a chain (i.e., is a total order).

Definition 39 (Contribution relation) Let M ∈ Λ̂ and let v, v′ ∈ vr(M). We define the creation relation < on vr(M) as the union of the following relations <₁ and <₂:
(1) v <₁ v′ iff, for some i, either v is the i-th virtual redex relative to an application subterm w ⊆ M with arity(w) > i and v′ is the (i+1)-th virtual redex relative to w; or v is the i-th virtual redex relative to an arrow ι and v′ is the (i+1)-th virtual redex relative to ι.
(2) v <₂ v′ iff v′ is the 1st virtual redex relative to an arrow that v contracts.
If v < v′, then we say that v is a creator of v′. The contribution relation ⊑ on vr(M) is the transitive reflexive closure of <. Similarly, if v ⊑ v′, then we say that v is a contributor of v′.

Lemma 40 (See [10]) Every virtual redex u ∈ M ∈ Λ̂ with a positive degree has a unique creator v ∈ vr(M), and deg(u) = deg(v) + 1.

Lemma 41 (See [10]) Let M →u M′, let v1, v2 ∈ vr(M) be relative to the application subterm w = aps(u), and let v1′ and v2′ be the unique (by Remark 18) u-residuals of v1 and v2, respectively. Then v1 <₁ v2 (resp. v1 <₂ v2) iff v1′ <₁ v2′ (resp. v1′ <₂ v2′).

Lemma 42 (See [10]) Let M ∈ Λ̂, let v1*, v2* ∈ vr(M), and let P1v1 and P2v2 be respectively v1*- and v2*-redexes in extraction normal form. Then v1* <₁ v2* (resp. v1* <₂ v2*) iff P2 = P1 + v1 and v2 is a redex of type 1 (resp. type 2) created by v1.
64
R. Kennaway, Z. Khasidashvili, and A. Piperno
Corollary 43. A reduction P : M₀ −→ᵘ⁰ M₁ −→ᵘ¹ · · · Mₙ is a chain iff there is a covering chain v₀ < v₁ < · · · < vₙ₋₁ of virtual redexes in M₀ such that uᵢ is a vᵢ-redex, i = 0, ..., n − 1.
Let FAM (P) denote the set of families (relative to the initial term of P) whose member redexes are contracted in P. The family relation induces the contribution relation on families as follows:
Definition 44 ([8]). Let φ₁, φ₂ be families relative to the same term M ∈ Λᵇ. Then φ₁ ↪ φ₂ (read: φ₁ contributes to, or is a contributor of, φ₂) iff for any redex Pv ∈ φ₂, P contracts at least one member of φ₁.
It follows easily from the definition of extraction in [21] that if a reduction Q extracts to Q′, then FAM(Q′) ⊆ FAM(Q). Hence φ₁ ↪ φ₂ iff the history of the extraction normal form of φ₂ contracts a member of φ₁. Hence we have immediately from Corollary 35 and Corollary 43 the following theorem.
Theorem 45. (1) Let M ∈ Λᵇ, let v₁, v₂ ∈ vr(M), and let φ₁ and φ₂ be the corresponding families relative to M. Then v₁ ⊑ v₂ iff φ₁ ↪ φ₂. (2) Further, for any M ∈ Λᵇ, (FAM(M), ↪) is a forest.
8 Conclusion

Theorem 45 implies that all application subterms (rather, their 'skeletons') can be computed independently, in the sense of [13], and the results can be 'combined' to yield the normal form (the combination of P and Q should yield the final term of their least upper bound P ⊔ Q). Note here that nesting of application subterms is not an obstacle. At present, we do not know what constitutes the skeleton of an application subterm. The idea is that application subterms whose relative substitution paths are isomorphic must have the same skeletons. But the above theorem shows that independent computation is possible⁵. Note that, once we compute a skeleton, we can reuse the result every time we meet the same skeleton in a hyperbalanced term. In other words, we can define 'big beta steps' (say, by induction on the length of the longest relative path) that normalize skeletons in one big step. This would make computation of hyperbalanced terms efficient.
References

1. Asperti, A., Laneve, C. Paths, computations and labels in the lambda calculus. Theoretical Computer Science 142(2):277-297, 1995.
2. Asperti, A., Danos, V., Laneve, C., Regnier, L. Paths in the lambda-calculus. In Proceedings of the Symposium on Logic in Computer Science (LICS), IEEE Computer Society Press, 1994.
3. Asperti, A., Mairson, H.G. Parallel beta reduction is not elementary recursive. In Proc. of ACM Symposium on Principles of Programming Languages (POPL), 1998.

⁵ For Recursive Program Schemes, whose contribution relation is again a forest, independent computation of redexes is very easy to implement [12].
Static Analysis of Modularity of β-Reduction
65
4. Asperti, A., Guerrini, S. The Optimal Implementation of Functional Programming Languages. Cambridge Tracts in Theoretical Computer Science, Cambridge University Press, 1998.
5. Barendregt, H. The Lambda Calculus. Its Syntax and Semantics (revised edition). North-Holland, 1984.
6. Barendregt, H. Lambda Calculi with Types. In Handbook of Logic in Computer Science, Vol. 2, Oxford University Press, 1992.
7. Barendregt, H.P., Bergstra, J.A., Klop, J.W., Volken, H. Some notes on lambda-reduction. In: Degrees, Reductions, and Representability in the Lambda Calculus. Preprint no. 22, University of Utrecht, Department of Mathematics, 1976, 13-53.
8. Glauert, J.R.W., Khasidashvili, Z. Relative normalization in deterministic residual structures. In: Proc. of the 19th International Colloquium on Trees in Algebra and Programming, CAAP'96, Springer LNCS, vol. 1059, H. Kirchner, ed., 1996, p. 180-195.
9. Kathail, V. Optimal Interpreters for Lambda-calculus Based Functional Languages. Ph.D. Thesis, MIT, 1990.
10. Kennaway, R., Khasidashvili, Z., Piperno, A. Static Analysis of Modularity of β-reduction in the Hyperbalanced λ-calculus (full version), available at http://www.dsi.uniroma1.it/~piperno/.
11. Khasidashvili, Z. β-reductions and β-developments of λ-terms with the least number of steps. In: Proc. of the International Conference on Computer Logic, COLOG'88, Springer LNCS, 417:105-111, 1990.
12. Khasidashvili, Z. Optimal normalization in orthogonal term rewriting systems. In: Proc. of the 5th International Conference on Rewriting Techniques and Applications, RTA'93, Springer LNCS, 690:243-258, 1993.
13. Khasidashvili, Z., Glauert, J.R.W. The geometry of orthogonal reduction spaces. In: Proc. of the 24th International Colloquium on Automata, Languages, and Programming, ICALP'97, Springer LNCS, vol. 1256, P. Degano, R. Gorrieri, and A. Marchetti-Spaccamela, eds., 1997, p. 649-659.
14. Khasidashvili, Z., Ogawa, M., van Oostrom, V. Perpetuality and uniform normalization in orthogonal rewrite systems. Information and Computation, vol. 164, p. 118-151, 2001.
15. Khasidashvili, Z., Piperno, A. Normalization of typable terms by superdevelopments. In: Proc. of the Annual Conference of the European Association for Computer Science Logic, CSL'98, Springer LNCS, vol. 1584, 1999, p. 260-282.
16. Khasidashvili, Z., Piperno, A. A syntactical analysis of normalization. Journal of Logic and Computation, 10(3):381-410, 2000.
17. Klop, J.W. Combinatory Reduction Systems. PhD thesis, Mathematisch Centrum, Amsterdam, 1980.
18. Lamping, J. An algorithm for optimal lambda calculus reduction. In: Proc. 17th ACM Symposium on Principles of Programming Languages (POPL), 1990, p. 16-30.
19. Lévy, J.-J. An algebraic interpretation of the λβK-calculus and a labelled λ-calculus. Theoretical Computer Science, 2:97-114, 1976.
20. Lévy, J.-J. Réductions correctes et optimales dans le λ-calcul. PhD thesis, Université Paris 7, 1978.
21. Lévy, J.-J. Optimal reductions in the lambda-calculus. In: J.P. Seldin and J.R. Hindley, eds., To H. B. Curry: Essays on Combinatory Logic, Lambda-calculus and Formalism, Academic Press, 1980, 159-192.
22. van Raamsdonk, F. Confluence and superdevelopments. In: Proc. of the 5th International Conference on Rewriting Techniques and Applications, RTA'93, Springer LNCS, 690:168-182, 1993.
Exceptions in the Rewriting Calculus

Germain Faure¹ and Claude Kirchner²

¹ ENS Lyon & LORIA, 46 allée d'Italie, 69364 Lyon Cedex 07, France
[email protected]
² LORIA & INRIA, 615 rue du Jardin Botanique, 54602 Villers-lès-Nancy, France
[email protected]
http://www.loria.fr/˜ckirchne
Abstract. In the context of the rewriting calculus, we introduce and study an exception mechanism that allows us to express rewriting strategies in a simple way and that is therefore also useful for expressing theorem-proving tactics. The proposed exception mechanism is expressed in a confluent calculus, which gives us the ability to express simply the semantics of the first tactical and to describe in full detail the expression of conditional rewriting.
The rewriting calculus, also called ρ-calculus, makes all the basic ingredients of rewriting explicit objects, in particular the notions of rule formation (abstraction), rule application, and result. The original design of the calculus (fully described in [CK01,Cir00]) has nice properties and encompasses in a uniform setting the λ-calculus and first-order rewriting. Thanks to its matching power, i.e. the ability to discriminate directly between patterns, and thanks to the first-class status of rewrite rules, it also has the capability to represent in a simple and natural way object calculi [CKL01a,DK00]. Moreover, thanks to its abstraction mechanism, it allows the design of extremely powerful type systems generalizing the λ-cube [CKL01b]. In the ρ-calculus, a rewrite rule is represented as a ρ-term (a term of the ρ-calculus), and the application of a rule, for example f (x, y) → x, at the root of a ρ-term, for example f (a, b), is also represented by a ρ-term, e.g. [f (x, y) → x](f (a, b)). Applying a rewrite rule l → r at the root of a term t consists first in computing the matching between l and t, and secondly in applying the obtained substitution(s) to the term r. Since matching may fail, we can get an empty set of results. When matching is unitary, i.e. results in only one substitution, the evaluation of the rule application results in a singleton. So, in general, the evaluation of every application is represented as a set, thus giving first-class status to the results of a rule application. For simplicity, we consider here only syntactic matching, and in this case there is either zero or one substitution matching l against t. For example, f (x, y) matches f (a, b) using the substitution σ = {x → a, y → b}. Therefore [f (x, y) → x](f (a, b)) reduces by the evaluation rule of the ρ-calculus into the singleton {σx}, i.e. {a}. Since there is no substitution matching f (x, y) on g(a, b), the evaluation of [f (x, y) → x](g(a, b)) results in the empty set.
This is denoted [f (x, y) → x](g(a, b)) −→→ρ ∅.

S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 66–82, 2002.
© Springer-Verlag Berlin Heidelberg 2002
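This matching-then-substitution reading of rule application can be sketched concretely. The following Python model is an illustration only (the term encoding and function names are our own assumptions, not the paper's notation): variables are lowercase strings, f (t1, ..., tn) is the tuple ('f', t1, ..., tn), and the empty set of results plays the role of ∅.

```python
# Illustrative model of rho-calculus rule application: [l -> r](t) evaluates
# to the singleton {r with the matching substitution applied} when l matches
# t syntactically, and to the empty set when matching fails.

def match(pat, term, subst=None):
    """Syntactic matching: return a substitution dict, or None on failure."""
    if subst is None:
        subst = {}
    if isinstance(pat, str):                       # pattern variable
        if pat in subst and subst[pat] != term:
            return None                            # conflicting bindings
        subst[pat] = term
        return subst
    if not isinstance(term, tuple) or pat[0] != term[0] or len(pat) != len(term):
        return None                                # symbol or arity clash
    for p, t in zip(pat[1:], term[1:]):
        if match(p, t, subst) is None:
            return None
    return subst

def substitute(term, subst):
    """Apply a substitution to a term."""
    if isinstance(term, str):
        return subst.get(term, term)
    return (term[0],) + tuple(substitute(a, subst) for a in term[1:])

def apply_rule(lhs, rhs, term):
    """[lhs -> rhs](term): the empty set of results models matching failure."""
    s = match(lhs, term)
    return set() if s is None else {substitute(rhs, s)}

a, b = ('a',), ('b',)
print(apply_rule(('f', 'x', 'y'), 'x', ('f', a, b)))   # {('a',)}
print(apply_rule(('f', 'x', 'y'), 'x', ('g', a, b)))   # set()
```

The two calls mirror the examples in the text: application to f (a, b) yields {a}, while application to g(a, b) yields the empty set.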
Since sets are ρ-terms, we can go further and represent rewriting systems. For example, the set consisting of the two rules f (x, y) → x and f (x, b) → b is a ρ-term denoted {f (x, y) → x, f (x, b) → b}. The application of such a rewrite system at the root of the term f (a, b) is denoted by the ρ-term [{f (x, y) → x, f (x, b) → b}](f (a, b)). To evaluate such a term, we need to distribute the elements of the set, and thus the previous term is evaluated to {[f (x, y) → x](f (a, b)), [f (x, b) → b](f (a, b))} and then to {a, b}. This shows how non-determinism is simply expressed in the rewriting calculus. We can also see from these examples that in the ρ-calculus the failure of matching the left-hand side of a rewrite rule on a term is revealed by the evaluation of the application to the empty set. Therefore, the empty set can be used as a signal that a rule does not match a term. This is very useful information, but unfortunately it could be erased later by the evaluation process. For example, assuming a and b to be constants, we have [x → a]([a → a](b)) −→→*ρ [x → a](∅). Since clearly x matches the ρ-term ∅, the latter term can be reduced to {a}. This shows that in the ρ-calculus it can happen that a sub-term of a given term leads to ∅ because of a matching failure while the final result is not the empty set. A calculus is called strict, or equivalently has the strictness property, when it allows for a strict propagation of failure. This means that if there is a matching failure at one step of a reduction, the only possible final result of the reduction is failure, represented here by the empty set. One of the basic but already elaborate goals we want to reach using the ρ-calculus is to use its matching ability to naturally and simply express reduction and normalization strategies. Indeed, the first simple question to answer is: is there a ρ-term representing a given derivation in a rewrite theory?
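The distribution of a rule set over an argument, and the collection of the non-failing results, can be sketched in the same illustrative Python model as before (our own toy encoding, not the paper's notation):

```python
# Toy model of applying a set of rules: [{l1 -> r1, ..., ln -> rn}](t)
# distributes over the rules and unions the resulting singletons, so that
# {f(x,y) -> x, f(x,b) -> b} applied to f(a,b) yields {a, b}.

def match(pat, term, subst=None):
    if subst is None:
        subst = {}
    if isinstance(pat, str):                       # pattern variable
        if pat in subst and subst[pat] != term:
            return None
        subst[pat] = term
        return subst
    if not isinstance(term, tuple) or pat[0] != term[0] or len(pat) != len(term):
        return None
    for p, t in zip(pat[1:], term[1:]):
        if match(p, t, subst) is None:
            return None
    return subst

def substitute(term, subst):
    if isinstance(term, str):
        return subst.get(term, term)
    return (term[0],) + tuple(substitute(a, subst) for a in term[1:])

def apply_rule_set(rules, term):
    """Distribute the rule set over term, keeping non-failing results."""
    out = set()
    for lhs, rhs in rules:
        s = match(lhs, term)
        if s is not None:
            out.add(substitute(rhs, s))
    return out

a, b = ('a',), ('b',)
rules = [(('f', 'x', 'y'), 'x'), (('f', 'x', b), b)]
print(apply_rule_set(rules, ('f', a, b)))   # both rules match: {a, b}
```

This exhibits the non-determinism described above: both rules match f (a, b), so the evaluation collects both results.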
And the answer has been shown to be positive [CK01], leading in particular to the use of ρ-terms as proof terms of rewrite derivations [Ngu01]. Going further, we want to represent rewrite strategies, and in particular normalization strategies. Therefore we want to answer the question: given a rewrite theory R, is there a ρ-term ξR such that, for any term u, if u normalizes to the term v in the rewrite theory R, then [ξR](u) ρ-reduces to (a set containing) the term v? Since the ρ-calculus embeds the λ-calculus, any computable function, such as normalization, is expressible in this formalism. We want to make use of the matching power and the non-determinism of the ρ-calculus to bring an increased ease in the expression of such functions, together with their expression in a uniform formalism combining standard rewrite techniques and higher-order behaviors. In trying to use the specificities of the calculus to solve the previous questions, we encountered several difficulties, the main one being the ambivalent use of the empty set. For example, when applying to the term b the rewrite rule a → ∅ that rewrites the constant a into the empty set, the evaluation rule of the calculus returns ∅ because the matching against b fails: [a → ∅](b) −→→ρ ∅. But it is also possible to explicitly rewrite an object into the empty set, as in [a → ∅](a) −→→*ρ ∅, and in this case the result is also the empty set because the rewrite rule explicitly introduces it. It then becomes clear that one should avoid the ambivalent use of the empty set and introduce an explicit distinction between failure and the empty set of
results. In fact, by making this distinction, we add to the matching power of the rewriting calculus a catching power, which allows us to describe, in a rewriting style, exception mechanisms, and therefore to express easily the convenient and elaborate strategies needed when computing or proving. Because of its fundamental role, we mainly focus on the first strategy combinator used in ELAN [BKK+98], which is also called if then in LCF-like logical frameworks. We thus propose a confluent calculus (using a call-by-value evaluation mechanism) in which the first and a simple but complete exception mechanism can be expressed. For that purpose, we introduce a new constant ⊥ and an exception handler operator exn. In this new version of the ρ-calculus, as in most exception mechanisms, an exception can either be raised by the user (⊥ is put explicitly in the initial term, as for example in [x → ⊥](a)) or be caused by a "run-time error" (for example a matching failure, as in [f (x, y) → x](g(a, b))). An exception can either be uncaught (⊥ propagates all over the term, and the term is reduced to {⊥} when using a strict evaluation strategy) or be stopped by exn and then caught thanks to matching. Thus, the calculus allows us to: (i) catch any matching failure (and therefore to be able to detect that a term is in normal form), as in [exn(⊥) → c](exn([f (x, y) → x](g(a, b)))); (ii) ignore a raised exception, as for example in the term [x → c](exn(M)); (iii) switch the evaluation mechanism according to a possible failure in the evaluation of a term (think again of the application to normal terms). For example, a program consisting of a division x/y (encoded in the ρ-calculus by (x /ρ y)), a program P, and a possible switch of the evaluation to a program Perr if the Div By Zero exception is raised during the computation of x/y, can be expressed by the term [first(exn(⊥) → Perr, x → P)](exn(x /ρ y)).
Our contributions are therefore the construction, study, and use of a rewriting calculus having an explicit exception mechanism. In the next section, we motivate and build the calculus. In Section 2, we present a call-by-value strategy that makes the calculus confluent. We finally show in Section 3 that elaborate rewriting strategies can be described and evaluated; in particular, we show that the first strategy combinator can be naturally expressed.
1 The ρε-Calculus
We introduce here a new ρ-calculus: the ρε-calculus. After the construction of the ρ1st-calculus (the ρ-calculus doped with the first operator, as described in [CK01]), it is quite natural to ask whether the first operator can be simply expressed in the ρ-calculus. To understand the problem, we ask the question: instead of enriching the ρ-calculus with the first operator and its specific evaluation rules, what first-class ingredients must we add to the ρ-calculus to express the first? The first problem is to obtain a non-ambivalent meaning for the empty set. The second problem is to obtain both the strictness and the matching failure test.
1.1 How Can the First Be Simply Expressed in the ρ-Calculus?
Since the ρ-calculus embeds the λ-calculus, any computable function, like the first, can be encoded with it. Using the "deep" and "shallow" terminology promoted in [BGG+92], what we are looking for is to simply express the first while avoiding the use of a deep encoding that would not fully use the matching capability of the ρ-calculus. Indeed, a shallow encoding encodes variables of the source calculus by variables of the target one, whereas a deep encoding typically encodes variables by constants. What we call in this paper a simple encoding is a shallow encoding using in a direct way the matching capabilities (i.e. the variable abstraction management) of the framework. The original ρ-calculus has been built in such a way that the empty set represents both the empty set of terms and the result of the failure of a rule application. For example, we have
[a → ∅](a) −→→*ρ ∅, where ∅ represents the empty set of terms, and [a → ∅](b) −→→ρ ∅, where ∅ encodes the failure in the matching between a and b. So, in this version of the ρ-calculus, we are not able to determine whether the result of a rule application is a failure. Nevertheless, this test is useful if we want to define the first in the ρ-calculus, since the first is defined by:

First:   [first(s1, ..., sn)](t) ❀ {uk↓}
         if [si](t) −→→*ρ ∅ for 1 ≤ i ≤ k − 1, and [sk](t) −→→*ρ uk↓ ≠ ∅ with FVar(uk↓) = ∅

First′:  [first(s1, ..., sn)](t) ❀ ∅
         if [si](t) −→→*ρ ∅ for 1 ≤ i ≤ n

where uk↓ denotes the normal form of uk for the evaluation rules of the ρ-calculus (denoted −→→*ρ) and FVar(t) denotes the free variables of t.

Remark 1. The side condition in First indicates that uk must not contain any free variable and must be in normal form. It seems to be restrictive at first sight, but it allows for a valid definition of the first that preserves the confluence of the ρ-calculus.

Despite serious efforts to find it, the first seems not to be simply expressible in the ρ-calculus, and even if it were, there would be a serious drawback in not distinguishing the empty set from rule application failure. Indeed, [first(a → ∅, b → c)](a) −→→*ρ ∅ and [first(a → ∅, b → c)](c) −→→*ρ ∅ for two very different reasons.
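The intended operational behavior of the first, try the arguments in order and commit to the first one whose application does not fail, can be sketched as follows. This is an illustrative model with strategies as Python functions and the empty set playing the role of ∅; the names are our own assumptions:

```python
# Toy model of the `first` combinator: a strategy maps a term to a set of
# results, and the empty set represents failure (the role of ∅ above).

def first(strategies, t):
    """Return the result of the first strategy that does not fail on t;
    if all strategies fail, the overall result is failure (empty set)."""
    for s in strategies:
        res = s(t)
        if res:                  # non-empty result: commit to this strategy
            return res
    return set()

fail = lambda t: set()           # a strategy whose matching always fails
const_c = lambda t: {'c'}        # a strategy rewriting anything to c

print(first([fail, const_c], 'a'))   # {'c'}
print(first([fail, fail], 'a'))      # set()
```

Note that this model cannot distinguish a strategy that explicitly returns the empty set from one that fails, which is precisely the ambivalence the paper sets out to remove.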
1.2 A First Approach to Enriching the ρ-Calculus
To distinguish the empty set from rule application failure, we introduce a new symbol ⊥ (which we assume not to belong to the signature of our calculus) to denote the latter. As a consequence, we have to adapt accordingly the definitions of the operators of the calculus. We denote by Sol(l ≪ t) the set of solutions obtained when syntactically matching l against t. The central idea of the main rule of the calculus (Fire) is that the application of a rewrite rule l → r at the root position of a term t consists in computing the solution of the matching equation (l ≪ t) and applying the substitution returned by the function Sol(l ≪ t) to the term r. When there is no solution for the matching equation (l ≪ t), the special constant ⊥ is used and the result is the set {⊥}. We obtain the rule described in Figure 1.
Fire:   [l → r](t) ❀ {⊥} if l ≪ t has no solution
                    ❀ {σr} where σ ∈ Sol(l ≪ t)

Fig. 1. The Fire rule in the extended ρ-calculus
Since the symbol ⊥ represents failure, we want it to be propagated. Therefore, we have to add the rules of the third part of Figure 2 (at this stage of the construction, the propagation of the failure is not strict). In the proposed calculus, the semantics of {⊥} is the failure term, so we want it to be a normal-form term. Moreover, we cannot have {⊥} −→→* ∅, thanks to the side condition n > 0 in the FlatBot rule; we cannot have {⊥} ←→* ⊥; and ⊥ can never be a result since it is not a set (except, of course, if the initial term is ⊥). To define the first, we need to express an evaluation scheme of the form: if res = ⊥ then evaluate c. It can be expressed at this step of the construction by the term [⊥ → c](res): if res = ⊥, then [⊥ → c](res) can be rewritten into {c}. So, it seems necessary to have a new function symbol (which we assume not to belong to the signature of our calculus) to do a more precise matching, distinguishing the propagation of the failure from its catching.

Example 1. Using the first operator (which is discussed in Section 3.1), we can express the evaluation scheme: if M evaluates to the failure term then evaluate the term Pfail, else evaluate the term Pnor, thanks for example to the term [first(⊥ → Pfail, x → Pnor)](M). But this has a serious drawback: if M leads to {⊥} then, using the rules for the ⊥ propagation (see the third part of Figure 2), the following reduction may happen: [first(⊥ → Pfail, x → Pnor)](M) −→→*ρ [first(⊥ → Pfail, x → Pnor)](⊥) −→→AppBotR {⊥}.
So, we need to improve the proposed calculus to obtain a confluent and strict calculus which allows failure catching and rules out reductions like f (∅, ⊥) −→→ρ ∅ and f (∅, ⊥) −→→ρ {⊥}.

1.3 An Adapted Version of the Calculus: The ρε-Calculus
About the meaning of the empty set. What we propose in this new extension is to give a single meaning to the empty set: it represents only the empty set of terms. Consequently, the application of a term to the empty set should lead to failure (i.e. [v](∅) −→→*ρ {⊥}), and the application of the empty set to a term should lead to failure too (i.e. [∅](v) −→→*ρ {⊥}). On the other hand, provided all the ti are in normal form, a term like f (t1, ..., ∅, ..., tn) will be considered as a normal form and not rewritable to {⊥}. This is in particular useful to preserve the first-class status of the empty set. Moreover, we would like a ρ-term of the form u → ∅ to be in normal form, since it allows us to express a void function. Up to the last rule (Exn), these design choices lead to the rewriting-calculus evaluation rules described in Figure 2. Let us now explain the design choices leading to the last rule.

An extension of the calculus to deal with exceptions. In particular motivated by our goal to define the first operator, we need to express failure catching. Moreover, if we want to control the strictness of failure propagation, we need to encapsulate the failure symbol by a new operator to allow for reductions of the following form: [operator(⊥) → u](operator(⊥)) −→→ρ {u}. This new operator is denoted exn and its evaluation rule is defined by:

Exn:   exn(t) ❀ {t} if {t}↓ ≠ {⊥} and FVar(t↓) = ∅

Here, we can make the same comments on the side conditions of the Exn rule as those of Remark 1. Moreover, thanks to a simple matching, we have [exn(⊥) → u](exn(⊥)) −→→ρ {u}.

Remark 2. If the evaluation of a term M leads to {⊥}, then exn(M) −→→*ρ exn({⊥}). It is then natural to allow for the reduction exn({⊥}) −→→ρ {exn(⊥)}, since otherwise we prevent an evaluation switched by a possible failure in the evaluation of a term (see Example 2, in which the OpSet rule is essential). So, as can be seen in Figure 2, in the ρε-calculus we extend the OpSet rule to all ε-functional symbols (i.e. to all h ∈ Fε = F ∪ {exn, ⊥}, where F is the signature of the calculus).

Remark 3. We could have replaced the Exn rule by a rule rewriting exn(t) into {t↓} instead of {t}. But doing so would keep us away from a small-step semantics, since the intrinsic operations at the object level would in this case be executed at the meta level, and the operator at the object level would not contain all the evaluation information anymore.
Fire:       [l → r](t) ❀ {⊥} if l ≪ t has no solution
                       ❀ {σr} where σ ∈ Sol(l ≪ t)
Congr:      [f(t1, ..., tn)](f(u1, ..., un)) ❀ {f([t1](u1), ..., [tn](un))}
CongrFail:  [f(t1, ..., tn)](g(u1, ..., un)) ❀ {⊥}
Distrib:    [{u1, ..., un}](v) ❀ {[u1](v), ..., [un](v)} if n > 0
                               ❀ {⊥} if n = 0
Batch:      [v]({u1, ..., un}) ❀ {[v](u1), ..., [v](un)} if n > 0
                               ❀ {⊥} if n = 0
SwitchR:    u → {v1, ..., vn} ❀ {u → v1, ..., u → vn} if n > 0
OpSet:      h(v1, ..., {u1, ..., un}, ..., vm) ❀ {h(v1, ..., ui, ..., vm)}1≤i≤n if n > 0 and h ∈ Fε
Flat:       {u1, ..., {v1, ..., vm}, ..., un} ❀ {u1, ..., v1, ..., vm, ..., un}
AppBotR:    [v](⊥) ❀ {⊥}
AppBotL:    [⊥](v) ❀ {⊥}
AbsBotR:    v → ⊥ ❀ {⊥}
OpBot:      f(t1, ..., tk, ⊥, tk+1, ..., tn) ❀ {⊥}
FlatBot:    {t1, ..., ⊥, ..., tn} ❀ {t1, ..., tn} if n > 0
Exn:        exn(t) ❀ {t} if {t}↓ ≠ {⊥} and FVar(t↓) = ∅

Fig. 2. EVALε: the evaluation rules of the ρε-calculus
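Two of the ⊥-handling conventions of EVALε can be modelled directly. The following Python sketch uses our own flat representation (a sentinel string for ⊥ and plain Python sets); it is an illustration of the side conditions, not an implementation of the calculus:

```python
# Sketch of two failure-handling rules of EVAL_epsilon over Python sets.
BOT = "⊥"   # the failure constant

def app_bot(u, v):
    """AppBotL / AppBotR: an application with ⊥ on either side fails."""
    if u == BOT or v == BOT:
        return {BOT}
    return {("app", u, v)}      # otherwise just build the application term

def flat_bot(elems):
    """FlatBot: {t1,...,⊥,...,tn} -> {t1,...,tn} when n > 0.  The side
    condition n > 0 means the singleton {⊥} stays a normal form."""
    rest = {t for t in elems if t != BOT}
    return rest if rest else {BOT}

print(app_bot("v", BOT))            # the application fails: {'⊥'}
print(flat_bot({BOT, "a", "b"}))    # ⊥ is dropped next to other elements
print(flat_bot({BOT}))              # {'⊥'} is kept as a normal form
```

The `flat_bot` helper makes explicit why {⊥} cannot collapse to ∅: the ⊥ element is removed only when other elements remain.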
Definition of the ρε-calculus. The previous design decisions lead us to define the ρε-terms and the ε-first-order terms in the following way.

ρε-term:             t ::= x | f(t, ..., t) | t → t | [t](t) | {t, ..., t} | ⊥ | exn(t)

where x ∈ X and f ∈ F. In the ρ-calculus, only first-order terms are allowed in the left-hand side of an abstraction. In spite of this restriction, we obtain an expressive calculus for which confluence and strictness are obtained under a large class of evaluation strategies [CK01,Cir00]. Introducing ⊥ and exn, we would like to define an extension of the ρ-calculus that keeps these properties. Therefore we extend the terms which we will abstract on (i.e. rewrite-rule left-hand sides) to be as follows:

ε-first-order term:  t ::= x | f(t, ..., t) | exn(⊥)
Definition 1. For F a set of function symbols and X a countable set of variables, we define the ρε -calculus by:
– the subset ρε(F, X) of ρε-terms whose sub-terms of the form l → r are such that l is an ε-first-order term;
– the higher-order application of substitutions to terms;
– the empty theory with the classical syntactic matching algorithm;
– the evaluation rules described in Figure 2;
– an evaluation strategy S that fixes the way to apply the evaluation rules.

Fact 1. The set ρε(F, X) is stable by EVALε.

Proof. This is a simple consequence of the preservation of the set ρε(F, X) by the evaluation mechanism. This explains in particular why EVALε does not contain the SwitchL rule of the ρ-calculus.

Example 2. If we want to express the evaluation scheme presented in Example 1, the problem exposed in that example is solved thanks to the exn operator. In fact, in the ρε-term [first(exn(⊥) → Pfail, x → Pnor)](exn(M)) the propagation of failure is stopped thanks to the exn operator.
1. If M −→→*ρε {⊥}, we have the following reduction:
   [first(exn(⊥) → Pfail, x → Pnor)](exn(M))
   −→→*ρε [first(exn(⊥) → Pfail, x → Pnor)](exn({⊥}))
   −→→OpSet [first(exn(⊥) → Pfail, x → Pnor)]({exn(⊥)})
   −→→*ρε {Pfail}
   So, in this case, the initial term leads to Pfail, which is the wanted behavior.
2. If M −→→*ρε {M↓} where {M↓} ≠ {⊥}, we obtain:
   [first(exn(⊥) → Pfail, x → Pnor)](exn(M))
   −→→*ρε [first(exn(⊥) → Pfail, x → Pnor)](exn(M↓))
   −→→*ρε {Pnor}
   In this case also, we have the wanted result: Pnor.

After the introduction of this new version of the ρ-calculus, we are going to make precise some of its properties and its expressiveness.
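The OpSet step used in the reduction above, pushing a set out of an argument position of an ε-symbol, can be sketched concretely (toy term encoding and helper name are our own assumptions):

```python
# Sketch of OpSet for an epsilon-symbol h:
#   h(v1,...,{u1,...,un},...,vm)  ->  { h(v1,...,ui,...,vm) | 1 <= i <= n }
BOT = "⊥"

def op_set(h, args, i):
    """Lift the set sitting at argument position i of h outward."""
    return {(h,) + tuple(args[:i]) + (u,) + tuple(args[i + 1:])
            for u in args[i]}

# exn({⊥}) -> {exn(⊥)}: the failure is stopped under exn and can now be
# caught by matching against the pattern exn(⊥).
print(op_set("exn", ({BOT},), 0))   # {('exn', '⊥')}
```

This is exactly the step that turns exn({⊥}) into {exn(⊥)} in case 1 of Example 2, making the failure observable by the exn(⊥) pattern.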
2 On the Confluence of the ρε-Calculus. The ρεv-Calculus

2.1 The Non-Confluence of the ρε-Calculus
In [Cir00,CK01], it is shown that the ρ-calculus is confluent under a large class of evaluation strategies. The same thing holds here. At first, it may seem easy to prove, but in fact there are two problems that need a particular treatment: the Exn rule and the multiple ways to obtain failure. Let us begin by showing typical examples of confluence failure. Non-confluence is inherited from the confluence failure of the ρ-calculus (described in [Cir00,CK01]) and from specific critical overlaps due to its enrichment, on which we focus now. First, if we apply a rule l → r to a term t (i.e. apply the Fire rule to [l → r](t)) with t Exn-reducible, we can have some non-confluent reductions.
Example 3 (Under-evaluated terms).
[a → b](exn(a)) −→→Fire {⊥}, whereas [a → b](exn(a)) −→→Exn [a → b](a) −→→Fire {b}.
In the ρ-calculus, knowing if a term can be reduced to failure (represented in the ρ-calculus by ∅) is quite easy, since we only have to guarantee that no failure is possible and that ∅ is not a sub-term of t. But in the ρε-calculus, knowing if a term can be reduced to {⊥} seems not to be so easy since, for example, such a term must not contain sub-terms like [u](v) where u or v can be reduced to ∅ (see rules Distrib, Batch). In fact, if this condition is not verified, we have confluence problems like the critical overlap of Example 4.

Example 4 (Operand equal to ∅).
[x → b](∅) −→→Fire {b}, whereas [x → b](∅) −→→Batch {⊥}.
In this example, the problem is indeed more complicated to solve than it seems, since we need a test to insure that a term cannot be reduced to ∅. More details about this problem can be found in [Fau01]. In the ρε-calculus, if we do not specify any strategy, we do not have a strict propagation of ⊥, as shown in the following example.

Example 5 (Non-strict propagation of failure).
[x → c](⊥) −→→Fire {c}, whereas [x → c](⊥) −→→AppBotR {⊥}.
So, in addition to the difficulty of obtaining, thanks to a suitable strategy, a confluent calculus, we are looking for a strategy which also gives a strict calculus.

The exn(⊥) role. The exn(⊥) pattern allows rule application failure to be caught via matching, as in [exn(⊥) → c](exn([f (x, y) → x](g(a, b)))). Of course, nothing in the ρε-calculus prevents instantiating a variable by the exn(⊥) pattern. This is quite natural, since it allows propagating failure suspicion or ignoring a raised exception, as in [x → c](exn(M)).
2.2 How to Obtain a Confluent Calculus? The ρεv-Calculus
As seen in the last section, the ρε-calculus is neither confluent nor strict. To obtain a confluent calculus, the evaluation mechanism described in EVALε should be tamed by a suitable strategy. This could be described in two ways: either by an independent strategy or, and this is the choice made in [Fau01], by technical conditions directly added to the Fire and Exn rules. Here, we do not describe these conditional rules and we refer to [Fau01] for the detailed approach. We define the ρεv-calculus as the ρε-calculus but using a call-by-value evaluation strategy. Intuitively, in the ρε-calculus, a value is a normal-form term containing no free variable and no set (except the empty set as the argument of a function):

value:  v ::= c | f(v, ..., v) | f(v, ..., ∅, ..., v) | v → v | exn(⊥)

(where c ∈ F0 and f ∈ ∪n≥1 Fn). We can define the ρεv-calculus either by using the classical syntactic matching algorithm or by adding a new rule to it. Actually, although it is not necessary for confluence and strictness, it is judicious that during matching no term can be instantiated by ∅. So, as one can see in Figure 3 (where ∧ is supposed to be an associative and commutative operator), we add the EmptyInstantiation rule so as to avoid this instantiation.
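The value grammar above can be rendered as a predicate. The sketch below uses our own term encoding (variables as lowercase strings, f (v1, ..., vn) as ('f', v1, ..., vn), rules as ('rule', l, r), exn(⊥) as ('exn', '⊥'), and ∅ as a frozenset); it is an illustration of the grammar, not the paper's machinery:

```python
# Sketch of the rho_epsilon_v value grammar as a recognizer.
EMPTY = frozenset()   # the empty set ∅, allowed only as a function argument

def is_value(t):
    if t == ("exn", "⊥"):
        return True                                   # the exn(⊥) value
    if isinstance(t, tuple) and t and t[0] == "rule":
        return is_value(t[1]) and is_value(t[2])      # v -> v
    if isinstance(t, tuple) and t:                    # f(v,...,v), ∅ args ok
        return all(a == EMPTY or is_value(a) for a in t[1:])
    return False                                      # variables, sets, ⊥

print(is_value(("a",)))                    # True: a constant c
print(is_value("x"))                       # False: a free variable
print(is_value(("f", EMPTY, ("a",))))      # True: f(∅, a)
print(is_value(("rule", ("a",), ("b",))))  # True: a -> b
```

Since ⊥ on its own is not a value, the call-by-value restriction of Fire and Exn to values is what yields strictness in Theorem 1.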
Fig. 3. The call-by-value syntactic matching
In the ρεv-calculus, the Fire and Exn rules can only be applied to values. Assuming that l, r are ρε-terms and v is a value, we define Firev and Exnv by:

Firev:   [l → r](v) ❀ {⊥} if l ≪ v has no solution
                     ❀ {σr} where σ ∈ Sol(l ≪ v)

Exnv:    exn(v) ❀ v

So, now we can define more precisely the ρεv-calculus:
G. Faure and C. Kirchner
Definition 2. The ρεv -calculus is defined as the ρε -calculus except that we replace the F ire and Exn rules by the F irev and Exnv rules and that Sol(l v) is computed by the matching algorithm of Figure 3. In [Fau01], it is shown that this calculus is confluent and strict (since ⊥ is not a value): Theorem 1. The ρεv -calculus is a confluent and strict calculus. This result is fundamental since the confluence is really necessary in our approach to express the first. Thus, in the following section, we show that the ρεv -calculus allows us to express the first operator and consequently that we obtain a confluent and strict calculus in which the first can be expressed.
3
Expressiveness of the ρεv -Calculus
3.1
The First Operator in the ρεv -Calculus
Originally, the ρε -calculus was introduced to obtain a calculus in which the first can be simply expressed. All over the construction of the ρε -calculus, we aim at eliminating all arguments that help us to think that the first can not be expressed in it. We are going to show that we can indeed find a shallow encoding (see Section 1.1) of it in the ρεv -calculus. So, we define a term to express it, afterwards we prove that the given definition is valid and finally we give some examples. As seen below, the first is an operator whose role is to select between its arguments the first one that, applied to a given ρ-term, does not evaluate to {⊥}. This is something quite difficult to express in the ρ-calculus. We propose to represent the term [first(t1 , . . . , tn )](r) by a set of n terms, denoted {u1 , . . . , un }, with the property that every term ui can be reduced to {⊥} except perhaps one (say ul ). In other words, the initial set is reduced to a singleton, containing the normal form of ul , by successively reducing each ui (i = l) to {⊥} and then using the F latBot rule. To express the first in the ρε -calculus, we are going to use its two main trumps: the matching and the failure catching. Each ui is expressed according to the i − 1 first terms: If all uj (j < i) are reduced to {⊥}, then ui leads to [ti ](r) else ui leads to {⊥} by a rule application failure. So, we define each ui by u1 u2 u3 u4 .. .
= = = =
[t1 ](x) exn(⊥) → [t2 ](x) exn(u1 ) exn(⊥) → [exn(⊥) → [t3 ](x)](exn(u2 )) exn(u1 ) exn(⊥) → [exn(⊥) → [exn(⊥) → [t4 ](x)](exn(u3 ))](exn(u2 )) exn(u1 )
un = [exn(⊥) → [exn(⊥) → [. . . → [tn ](x)](exn(un−1 )) . . .](exn(u2 ))](exn(u1 ))
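As an operational intuition only (not the ρ-calculus encoding itself), the intended semantics of first — try each alternative on the argument and keep the first result that is not a failure — can be sketched in Python; here FAIL plays the role of {⊥}, and the alternatives are ordinary functions (all names are our own illustrative choices):

```python
# Hypothetical sketch of the intended semantics of first, not the rho-calculus term.
FAIL = object()  # plays the role of the failure result {bottom}

def first(*alternatives):
    """Return a function that applies each alternative to the argument
    and yields the first result different from FAIL (or FAIL if all fail)."""
    def apply_to(r):
        for t in alternatives:
            result = t(r)
            if result is not FAIL:
                return result
        return FAIL
    return apply_to

# Alternatives mimicking the later examples: a rule matching only 'b', and a constant rule.
match_b = lambda y: "matched-b" if y == "b" else FAIL
const_c = lambda y: "c"

assert first(match_b, const_c)("b") == "matched-b"
assert first(match_b, const_c)("a") == "c"
assert first(match_b)("a") is FAIL
```

The ρε-calculus encoding above achieves the same selection without any sequential control structure, by letting all ui reduce in parallel and collapse via FlatBot.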
Exceptions in the Rewriting Calculus
77
To get the right definition, one should replace, in the expression of ui, all uj (j < i) by their own definitions, but such expanded formulae would be rather tedious to read. We define MyFirst by:
MyFirst(t1, ..., tn) = x → {u1, ..., un}

By first(t1, ..., tn) we denote precisely the first operator as defined in Section 1.1 by the two First rules, and by MyFirst(t1, ..., tn) the operator defined purely in the ρε-calculus just above.

It must be emphasized that this definition is not valid in a non-confluent calculus. In fact, if MyFirst is used in such a calculus, it may happen that there exists i0 such that ui0 can be evaluated to a term vi0 ≠ {⊥} by one reduction and to {⊥} by another. In this case, one reduction used to evaluate ui0 may lead to {⊥}, while in one of the uk (k > i0) a different reduction of the same term ui0 leads to vi0 ≠ {⊥}, so that uk is reduced as if ui0 did not evaluate to {⊥}, whereas here it does. We then obtain a set of terms which does not necessarily have the right properties. For example, it can happen that more than one of these terms cannot be reduced to {⊥}. Thus, in this section, we work in a confluent calculus: the ρεv-calculus (although we could work in any confluent ρε-calculus).

Let us show the validity of the definition of MyFirst.

Lemma 1. In the ρεv-calculus,
– if for all i, [ti](r) −→→∗ρεv {⊥}, then [MyFirst(t1, ..., tn)](r) −→→∗ρεv {⊥} and [first(t1, ..., tn)](r) −→→∗ρεv {⊥};
– if there exists l such that for all i ≤ l − 1, [ti](r) −→→∗ρεv {⊥} and [tl](r) −→→∗ρεv {vl}↓ ≠ {⊥} where FVar(vl) = ∅, then [MyFirst(t1, ..., tn)](r) −→→∗ρεv {vl}↓ and [first(t1, ..., tn)](r) −→→∗ρεv {vl}↓.

Proof. See [Fau01].

If there exists l such that for all i ≤ l − 1, [ti](r) −→→∗ρεv {⊥} and [tl](r) −→→∗ρεv {vl}↓ ≠ {⊥} with FVar(vl) ≠ ∅, then, since a term with free variables is not a value, we cannot "really evaluate" [MyFirst(t1, ..., tn)](r), and so the evaluation is suspended (as it is for the first). So, w.r.t. terms with free variables, this result is also coherent. The following example is an illustration of this fact:

Example 6.
x → [MyFirst(y → [x](y), y → c)](b)
−→→∗ρεv x → {[x](b), [exn(⊥) → [y → c](b)](exn([x](b)))}
−→→∗ρεv {x → [x](b), x → [exn(⊥) → c](exn([x](b)))}

In the ρεv-calculus this result is in normal form, because we cannot continue the evaluation: the variable x needs to be instantiated. Let us note that this is not a drawback, since we have the same behavior if we consider the first defined by the First rules given in Section 1.1.

Thanks to the above lemma, we get our main expressiveness result (see Section 1.1 for the definition of simply expressible):

Theorem 2. The first is simply expressible in the ρεv-calculus.
78
G. Faure and C. Kirchner
Now, we will denote by first(t1, ..., tn) the term MyFirst(t1, ..., tn). The above theorem is significant, since the question of the expressiveness of the first in the ρ-calculus has been open since the introduction of the ρ-calculus. Moreover, this result (a confluent calculus in which the first can be expressed) is proved here for the first time, since the confluence of the ρ1st-calculus (the ρ-calculus doped with the first) is still an open question. Last but not least, this calculus also allows a simple but complete exception scheme, as illustrated below.

In the next example, we give two reductions of the same term, illustrating the confluence of the proposed calculus.

Example 7. A first reduction:
[x → [first(y → [y](x), y → c)](b)](a)
= [x → {[y → [y](x)](b), [exn(⊥) → [y → c](b)](exn([y → [y](x)](b)))}](a)
−→→∗ρεv [x → {[b](x), [exn(⊥) → [y → c](b)](exn([b](x)))}](a)
−→→∗ρεv {[b](a), [exn(⊥) → [y → c](b)](exn([b](a)))}
−→→∗ρεv {{⊥}, [y → c](b)}
−→→∗ρεv {c}

Another possible reduction could be:
[x → [first(y → [y](x), y → c)](b)](a)
−→→∗ρεv [first(y → [y](a), y → c)](b)
= {[y → [y](a)](b), [exn(⊥) → [y → c](b)](exn([y → [y](a)](b)))}
−→→∗ρεv {⊥, {c}}
−→→∗ρεv {c}

In Example 2, we saw that using the first we can encode an evaluation scheme switched by the result of the evaluation of one term (failure or not). Let us illustrate in this example the given definition of the first.

Example 8.
[first(exn(⊥) → Pfail, x → Pnor)](exn(M))
= {[exn(⊥) → Pfail](exn(M)), [exn(⊥) → [x → Pnor](exn(M))](exn([exn(⊥) → Pfail](exn(M))))}

The evaluation strategy (call-by-value) forces us to begin with the evaluation of M. We distinguish two cases:

1. if M −→→∗ρεv {⊥}, we have the following reduction:
{[exn(⊥) → Pfail](exn(M)), [exn(⊥) → [x → Pnor](exn(M))](exn([exn(⊥) → Pfail](exn(M))))}
−→→∗ρεv {{Pfail}, {⊥}}
−→→∗ρεv {Pfail}
So, in this case, the initial term leads to Pfail, which is the wanted result.

2. if M −→→∗ρεv {M↓} where {M↓} ≠ {⊥}, we obtain:
{[exn(⊥) → Pfail](exn(M)), [exn(⊥) → [x → Pnor](exn(M))](exn([exn(⊥) → Pfail](exn(M))))}
−→→∗ρεv {{⊥}, [x → Pnor](exn(M))}
−→→∗ρεv {[x → Pnor](M)}
−→→∗ρεv {Pnor}

In this last case also, we have the wanted result: Pnor.

3.2 Examples
In Example 2, we saw that the ρεv-calculus allows us to switch the evaluation mechanism according to a possible failure in the evaluation of a term. In the following example, we see that we can switch the evaluation according to failure catching as precise as desired.

Example 9. Let us consider the evaluation scheme:
if the two arguments of f are evaluated to the failure then evaluate Pfail1&2
else if the first argument of f is evaluated to the failure then evaluate Pfail1
else if the second argument of f is evaluated to the failure then evaluate Pfail2
else evaluate Pnor

It can be expressed in the ρεv-calculus by the following term (one should replace first by its given definition):

[first(f(exn(⊥), exn(⊥)) → Pfail1&2,
       f(exn(⊥), x) → Pfail1,
       f(x, exn(⊥)) → Pfail2,
       f(x, y) → Pnor)](f(exn(M), exn(N)))

So, we can switch the evaluation scheme according to the possible evaluation failure of the two arguments of a function. It is clear that a similar term can be constructed for every function.

The next example shows how to express in the ρεv-calculus the exception mechanism of ML [Mil78,WL96], when no named exception is considered.

Example 10. The following ML program:
try x/y; P1 with Div_By_Zero -> P2
is expressed in the ρεv-calculus as:
[first(exn(⊥) → P2, x → P1)](exn(x /ρ y))

where /ρ is an encoding of the division x/y in the rewriting calculus. Similarly, since we can ignore a raised exception (see Section 2.1), we can express the ML program
try P1 with _ -> P2
by the ρ-term [x → P2](exn(P1)).

Example 11. One of the strong interests of the first is that it allows for a full and explicit encoding of normalization strategies. For instance, thanks to the first, an innermost normalization operator can be built [Cir00,CK01] as follows:
– A term traversal operator for all operators fi in the signature:
Φ(r) = first(f1(r, id, ..., id), ..., f1(id, ..., id, r),
             ...,
             fm(r, id, ..., id), ..., fm(id, ..., id, r))

Using Φ we get for example: [Φ(a → c)](f(a, b)) −→→ρε {f(c, b)} and [Φ(a → b)](c) −→→ρε {⊥}.
– The BottomUp operator
Oncebu(r) = [Θ](Hbu(r)) with Hbu(r) = f → (x → [first(Φ(f), r)](x))

where Θ is Turing's fixed-point combinator expressed in the ρ-calculus: Θ = [A](A) with A = x → (y → [y]([[x](x)](y))).
For instance we have: [Oncebu(a → b)](f(a, g(a))) −→→ρε {f(b, g(a))}.
– A repetition operator
repeat∗(r) = [Θ](J(r)) with J(r) = f → (x → [first(r; f, id)](x))

This can be used as in [repeat∗({a → b, b → c})](a) −→→ρε {c}.
– Finally we get the innermost normalization operator:
im(r) = repeat∗(Oncebu(r))

It allows us to reduce f(a, g(a)) to its innermost normal form according to the rewriting system R = {a → b, f(x, g(x)) → x}. For instance, [im(R)](f(a, g(a))) −→→ρεv {b}. Note that we do not need to assume confluence of R: considering the rewrite system R′ = {a → b, a → c, f(x, x) → x}, we have [im(R′)](f(a, a)) −→→ρεv {b, f(c, b), f(b, c), c}.
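As a side remark, Turing's fixed-point combinator Θ = [A](A) with A = x → (y → [y]([[x](x)](y))) has a direct analogue in any higher-order language. Under call-by-value the self-application must be eta-expanded to avoid divergence, as in this hedged Python sketch (the names and the factorial example are our own):

```python
# Turing's fixed-point combinator, eta-expanded for a call-by-value language.
# A = lambda x: lambda y: y(x(x)(y)) would loop forever under eager evaluation,
# so the self-application x(x)(y) is delayed behind a lambda.
A = lambda x: lambda y: y(lambda v: x(x)(y)(v))
Theta = A(A)

# Theta(F) is a fixed point of F, so recursion needs no named self-reference.
fact = Theta(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))
assert fact(5) == 120
```

This is the same delaying trick that, in the ρ-calculus definition of Oncebu and repeat∗, lets Θ drive the repeated application of a rule without built-in recursion.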
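To make the behavior of im concrete, here is a hedged Python sketch of innermost normalization for ground terms over the example system R = {a → b, f(x, g(x)) → x}; the tuple representation of terms and the rule format are our own choices, not the ρ-calculus encoding:

```python
# Ground terms as nested tuples: ('f', ('a',), ('g', ('a',))) stands for f(a, g(a)).
# Rules are functions: term -> rewritten term, or None when there is no root match.

def rule_a_to_b(t):
    return ('b',) if t == ('a',) else None

def rule_f_x_gx(t):
    # f(x, g(x)) -> x : matches if the second argument is g applied to the first.
    if len(t) == 3 and t[0] == 'f' and t[2] == ('g', t[1]):
        return t[1]
    return None

RULES = [rule_a_to_b, rule_f_x_gx]

def innermost(t):
    """Normalize subterms first, then apply rules at the root until none applies."""
    t = (t[0],) + tuple(innermost(s) for s in t[1:])  # normalize arguments (innermost)
    for rule in RULES:
        reduct = rule(t)
        if reduct is not None:
            return innermost(reduct)  # re-normalize after a root step
    return t

# f(a, g(a)): the arguments normalize to b and g(b), then f(x, g(x)) -> x yields b.
assert innermost(('f', ('a',), ('g', ('a',)))) == ('b',)
```

This mirrors the reduction [im(R)](f(a, g(a))) −→→ρεv {b}, except that the ρ-calculus version returns a set of results and thus also handles non-confluent systems like R′.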
Expressing a normalization strategy becomes explicit and reasonably easy, since in the ρ-calculus terms, rules, rule applications and therefore strategies are considered at the object level. So, it is quite natural to ask whether we can express how to apply these rules. Since the first is expressible in the ρεv-calculus, we can express all the operators above and we obtain as a corollary:

Corollary 1. The operators im (and om) describing innermost (resp. outermost) normalization can be expressed in the ρεv-calculus. Moreover, given a rewriting theory and two first-order ground terms t and t↓ (in the sense of rewriting) such that t is normalized to t↓ w.r.t. the set of rewrite rules R, the term [im(R)](t) is ρεv-reduced to a set containing the term t↓.

Note that in this situation, a rewrite system is innermost confluent if and only if [im(R)](t) is a singleton. Notice also that, following [CK01], normalized rewriting [DO90] using a conditional rewrite rule of the form l → r if c can be simply expressed as the ρ-term l → [True → r]([im(R)](c)), and therefore that conditional rewriting is simply expressible in the ρεv-calculus.
4 Conclusion and Further Work
By introducing an exception mechanism into the rewriting calculus, we have solved the problem of simply expressing the first operator in a confluent calculus. Consequently, this solves the open problem of the doped calculus confluence. Motivated by this expressiveness question, we have indeed elaborated a powerful variation of the rewriting calculus which is useful, first, for theorem proving, by providing general proof search strategies in a semantically well-founded language, and secondly, as a useful programming paradigm for rule-based languages. In particular, its main advantage is to bring the exception paradigm uniformly to the level of rewriting. This could be extended in particular by allowing for named exceptions. Another challenging question is now to express, maybe in the ρεv-calculus, strategy operators like the "don't care" one used for example in the ELAN language.

Acknowledgement. We thank Daniel Hirschkoff for his strong interest and support in this work, and Horatiu Cirstea, Thérèse Hardin and Luigi Liquori for many stimulating discussions on the rewriting calculus and its applications and for their constructive remarks on the ideas developed here. We also thank the referees for their comments and suggestions.
References

[BGG+92] R. Boulton, A. Gordon, M. Gordon, J. Harrison, J. Herbert, and J. Van Tassel. Experience with embedding hardware description languages in HOL. In V. Stavridou, T. F. Melham, and R. T. Boute, editors, Proceedings of the IFIP TC10/WG 10.2 International Conference on Theorem Provers in Circuit Design: Theory, Practice and Experience, volume A-10 of IFIP Transactions, pages 129–156, Nijmegen, The Netherlands, June 1992. North-Holland/Elsevier.
[BKK+98] P. Borovanský, C. Kirchner, H. Kirchner, P.-E. Moreau, and C. Ringeissen. An overview of ELAN. In C. Kirchner and H. Kirchner, editors, Proceedings of the Second International Workshop on Rewriting Logic and Applications, volume 15 of Electronic Notes in Theoretical Computer Science, http://www.elsevier.nl/locate/entcs/volume15.html, Pont-à-Mousson, France, September 1998. Report LORIA 98-R-316.
[Cir00] H. Cirstea. Calcul de réécriture : fondements et applications. Thèse de Doctorat d'Université, Université Henri Poincaré – Nancy 1, France, October 2000.
[CK01] H. Cirstea and C. Kirchner. The rewriting calculus — Part I and II. Logic Journal of the Interest Group in Pure and Applied Logics, 9(3):427–498, May 2001.
[CKL01a] H. Cirstea, C. Kirchner, and L. Liquori. The Matching Power. In V. van Oostrom, editor, Rewriting Techniques and Applications, volume 2051 of Lecture Notes in Computer Science, Utrecht, The Netherlands, May 2001. Springer-Verlag.
[CKL01b] H. Cirstea, C. Kirchner, and L. Liquori. The Rho Cube. In F. Honsell, editor, Foundations of Software Science and Computation Structures, volume 2030 of Lecture Notes in Computer Science, pages 166–180, Genova, Italy, April 2001.
[DK00] H. Dubois and H. Kirchner. Objects, rules and strategies in ELAN. In Proceedings of the Second AMAST Workshop on Algebraic Methods in Language Processing, Iowa City, Iowa, USA, May 2000.
[DO90] N. Dershowitz and M. Okada. A rationale for conditional equational programming. Theoretical Computer Science, 75:111–138, 1990.
[Fau01] G. Faure. Étude des propriétés du calcul de réécriture : du ρ∅-calcul au ρε-calcul. Rapport, LORIA and ENS-Lyon, September 2001. http://www.loria.fr/~ckirchne/=rho/rho.html.
[Mil78] R. Milner. A theory of type polymorphism in programming. Journal of Computer and System Sciences, 17:348–375, 1978.
[Ngu01] Q.-H. Nguyen. Certifying term rewriting proofs in ELAN. In M. van den Brand and R. Verma, editors, Proceedings of RULE'01, volume 59 of Electronic Notes in Theoretical Computer Science. Elsevier Science Publishers B.V. (North-Holland), September 2001. Available at http://www.elsevier.com/locate/entcs/volume59.html.
[WL96] P. Weis and X. Leroy. Le langage Caml. InterEditions, 1996.
Deriving Focused Lattice Calculi

Georg Struth

Institut für Informatik, Universität Augsburg
Universitätsstr. 14, D-86135 Augsburg, Germany
Tel: +49-821-598-3109, Fax: +49-821-598-2274
[email protected]
Abstract. We derive rewrite-based ordered resolution calculi for semilattices, distributive lattices and boolean lattices. Using ordered resolution as a metaprocedure, theory axioms are first transformed into independent bases. Focused inference rules are then extracted from inference patterns in refutations. The derivation is guided by mathematical and procedural background knowledge, in particular by ordered chaining calculi for quasiorderings (forgetting the lattice structure), by ordered resolution (forgetting the clause structure) and by Knuth-Bendix completion for non-symmetric transitive relations (forgetting both structures). Conversely, all three calculi are derived and proven complete in a transparent and generic way as special cases of the lattice calculi.
1 Introduction
We propose focused ordered resolution calculi for semilattices, distributive lattices and boolean lattices as theories of order. These calculi are relevant to many interesting applications, including set-theoretic and fixed-point reasoning, program analysis and construction, substructural logics and type systems. Focusing means integrating mathematical and procedural knowledge. Here, it is achieved via domain-specific inference rules, rewriting techniques and syntactic orderings on terms, atoms and clauses. The inference rules are specific superposition rules for lattice theory. They are constrained to manipulations with maximal terms in maximal atoms. Since lattices are quite complex for automated reasoning, focusing may not only drastically improve the proof search in comparison with an axiomatic approach; it seems even indispensable. But it is also difficult. Only very few focused ordered resolution calculi are known so far. One reason is that existing methods require guessing the inference rules and justifying them a posteriori in rather involved completeness proofs (cf. for instance [17,15]). We therefore use an alternative method to derive the inference rules and prove refutational completeness by faithfulness of the derivation [12]. This derivation method uses ordered resolution as a metaprocedure. It has two steps. First, a specification is closed under ordered resolution with on-the-fly redundancy elimination. The resulting resolution basis has an independence property similar to set of support: no inference among its members is needed in a refutation. Second, the patterns arising in refutational inferences between non-theory clauses and the resolution basis are transformed into inference rules that entirely replace the resolution basis. Up to refutational completeness of the metaprocedure, the derivation method is purely constructive. It allows a fine-grained analysis of proofs and supports variations and approximations.

S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 83–97, 2002. © Springer-Verlag Berlin Heidelberg 2002

The concision and efficiency of focused calculi depend on the quality of the mathematical and procedural background knowledge that is integrated. Here, this is achieved mainly as follows. First, via specific axioms: for instance, via a lattice-theoretic variant of a cut rule as a surprising operational characterization of distributivity [9]. Second, via extensions of related procedures: ordered chaining for quasiorderings [3,12], ordered resolution calculi as solutions to lattice-theoretic uniform word problems [11] and Knuth-Bendix completion for quasiorderings [13]. On the one hand, ordered resolution is (via the cut rule) a critical pair computation for quasiorderings extended to distributive lattices. On the other hand, ordered chaining rules are critical pair computations extended to clauses. Extending both to lattices and clauses motivates an ordered resolution calculus with ordered chaining rules for lattices that includes ordered resolution at the lattice level. Third, via syntactic orderings: these constrain inferences, for instance to the respective critical pair computations. We extend and combine those of the background procedures. We use in particular multisets, the natural data structure for lattice-theoretic resolution, to build term orderings.

Briefly, our main results are the following. We propose refutationally complete ordered chaining calculi for finite semilattices, distributive lattices and boolean lattices. These calculi yield decision procedures. We also argue how to lift the calculi to semi-decision procedures for infinite structures.
The lattice calculi comprise an ordered chaining calculus for quasiorderings, a Knuth-Bendix completion procedure for quasiorderings and ordered resolution calculi as special cases. Their formal derivation and proof of refutational completeness is therefore uniform and generic. As a peculiarity, we derive propositional ordered resolution (with redundancy elimination) formally as a rewrite-based solution to the uniform word problem for distributive lattices. This yields a constructive refutational completeness proof that uses ordered resolution reflexively as a metaprocedure. Besides this, our results further demonstrate the power and applicability of the derivation method with a non-trivial example.

Our work uses some long and technical arguments. We defer them to an extended version [14]. Here, we only give the main definitions and illustrate the main ideas of the derivation. We also restrict our attention to finite structures.

The remainder is organized as follows. Section 2 recalls some ordered resolution basics. Section 3 introduces some lattice theory and motivates our choice of the resolution basis. Section 4 introduces the syntactic orderings for our lattice calculi, Section 5 the calculi themselves. Section 6 discusses the computation of the resolution basis. Section 7 sketches the derivation of the inference rules. Section 8 shows an example and discusses several extensions and specializations of the calculi. Section 9 contains a conclusion.
2 Ordered Resolution and Redundancy
We first recall some well-known results about ordered resolution and redundancy elimination. Consider [2] for further reference. Let TΣ(X) be a set of terms with signature Σ and variables in X, and let P be a set of predicates. The set A of atoms consists of all expressions p(t1, ..., tn), where p is an n-ary predicate and t1, ..., tn are terms. A clause is an expression {|φ1, ..., φm|} −→ {|ψ1, ..., ψn|}. Its antecedent {|φ1, ..., φm|} and succedent {|ψ1, ..., ψn|} are finite multisets of atoms. Antecedents are schematically denoted by Γ, succedents by ∆. Brackets will usually be omitted. The above clause represents the closed universal formula (∀x1 ... xk)(¬φ1 ∨ ⋯ ∨ ¬φm ∨ ψ1 ∨ ⋯ ∨ ψn). A Horn clause contains at most one atom in its succedent. We deliberately write ∆ for −→ ∆, ¬Γ for Γ −→, and Γ, φ for Γ ∪ φ.

We consider calculi with inference rules constrained by syntactic orderings. A term ordering is a well-founded total ordering on ground terms. An atom ordering is a well-founded total ordering on ground atoms. For non-ground expressions e1 and e2 and a term or atom ordering ≺, we define e1 ≺ e2 iff e1σ ≺ e2σ for all ground substitutions σ. Consequently, e1 ⊀ e2 if e2σ ≼ e1σ for some ground instances e1σ and e2σ. An atom φ is maximal with respect to a multiset Γ of atoms if φ ⊀ ψ for all ψ ∈ Γ. It is strictly maximal with respect to Γ if φ ⋠ ψ for all ψ ∈ Γ. The non-ground orderings are still well-founded, but need no longer be total. While atom orderings suffice to constrain the inferences of the ordered resolution calculus, clause orderings are needed for redundancy elimination. To extend atom orderings to clauses, we measure clauses as multisets of their atoms and use the multiset extension of the atom ordering. To disambiguate occurrences of atoms in antecedents and succedents, we assign to those in the antecedent a greater weight than to those in the succedent. See Section 4 for more details.
The clause ordering then inherits totality and well-foundedness from the atom ordering. Again, the non-ground extension need not be total. In unambiguous situations we denote all orderings by ≺.

Definition 1 (Ordered Resolution Calculus). Let ≺ be an atom ordering. The ordered resolution calculus OR consists of the following deduction inference rules. The ordered resolution rule

  Γ −→ ∆, φ    Γ′, ψ −→ ∆′
  ————————————————————————  (Res)
  Γσ, Γ′σ −→ ∆σ, ∆′σ

where σ is a most general unifier of φ and ψ, φσ is strictly maximal with respect to Γσ, ∆σ and maximal with respect to Γ′σ, ∆′σ. The ordered factoring rule

  Γ −→ ∆, φ, ψ
  ————————————  (Fact)
  Γσ −→ ∆σ, φσ

where σ is a most general unifier of φ and ψ and φσ is strictly maximal with respect to Γσ and maximal with respect to ∆σ.
In all inference rules, side formulas are the parts of clauses denoted by capital Greek letters. Atoms occurring explicitly in the premises are called minor formulas, those in the conclusion principal formulas. Let S be a clause set and ≺ a clause ordering. A clause C is ≺-redundant or simply redundant in S, if C is a semantic consequence of instances from S which are all smaller than C with respect to ≺. Closing S under OR-inferences and eliminating redundant clauses on the fly transforms S into a resolution basis rb(S). The transformation need not terminate, but we have refutational completeness: All fair OR-strategies derive the empty clause within finitely many steps, if S is inconsistent. We call a proof to the empty clause an OR-refutation. Proposition 1. S is inconsistent iff rb(S) contains the empty clause. A resolution basis B is a special basis. By definition it satisfies the independence property that all conclusions of primary B-inferences, that is OR-inferences with both premises from B, are redundant. However, it need not be unique. Proposition 2. Let B be a consistent resolution basis and T a clause set. If B∪T is inconsistent, then some OR-refutation contains no primary B-inferences. In [2], variants of proposition 1 and proposition 2 have been shown for a stronger notion of redundancy. For deriving the lattice calculi, the weak notion suffices. But the strong notion can of course always be used. By proposition 2, resolution bases allow ordered resolution strategies similar to set of support. The construction of a resolution basis will constitute the first step of our derivation of focused lattice calculi. OR will be used as a metaprocedure in the derivation. Its properties, like refutational completeness and avoidance of primary theory inferences are essential ingredients.
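As an illustration (our own sketch, not the paper's formalism), ground ordered resolution over propositional atoms with a fixed total atom ordering can be implemented directly; maximality checks reduce to comparisons under the ordering, and naive saturation decides inconsistency of small ground clause sets:

```python
# Ground ordered resolution sketch. A clause is (antecedent, succedent) as frozensets
# of atoms (strings); the atom ordering is alphabetical, ignoring polarity weights.

def resolvents(c1, c2):
    """Ordered resolution: resolve on an atom maximal in both parent clauses."""
    (g1, d1), (g2, d2) = c1, c2
    out = set()
    for phi in d1 & g2:  # phi positive in c1, negative in c2
        if all(phi >= psi for psi in g1 | d1) and all(phi >= psi for psi in g2 | d2):
            out.add((g1 | (g2 - {phi}), (d1 - {phi}) | d2))
    return out

def saturate(clauses):
    """Close under ordered resolution; return True iff the empty clause is derived."""
    clauses = {(frozenset(g), frozenset(d)) for g, d in clauses}
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                for g, d in resolvents(c1, c2):
                    r = (frozenset(g), frozenset(d))
                    if r not in clauses:
                        new.add(r)
        if (frozenset(), frozenset()) in clauses | new:
            return True
        if not new:
            return False
        clauses |= new

# {p}, {p -> q}, {not q} is inconsistent; {p} alone is not.
assert saturate([(set(), {'p'}), ({'p'}, {'q'}), ({'q'}, set())]) is True
assert saturate([(set(), {'p'})]) is False
```

Note how the ordering already focuses the search: with q above p, the clause p −→ q can only be resolved on q first, exactly as the constraints in Definition 1 prescribe.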
3 Lattices
Here we are not concerned with arbitrary signatures and predicates. Let Σ = {∨, ∧} and P = {≤}. ∨ and ∧ are operation symbols for the lattice join and meet operations; ≤ is a binary predicate symbol denoting a quasiordering. A quasiordered set (or quoset) is a set A endowed with a reflexive transitive relation ≤, which satisfies the clausal axioms

x ≤ x,   (ref)
x ≤ y, y ≤ z −→ x ≤ z.   (trans)

A join semilattice is a quoset A closed under least upper bounds or joins for all pairs of elements. Formally, x ≤ z and y ≤ z iff x ∨ y ≤ z for all x, y, z ∈ A. A meet semilattice is defined dually as a quoset closed under greatest lower bounds or meets for all pairs of elements. The dual of a statement about lattices is obtained by interchanging joins and meets and taking the converse of the ordering. A lattice is both a join and a meet semilattice. It is distributive if x ∧ (y ∨ z) ≤ (x ∧ y) ∨ (x ∧ z) holds for all x, y, z ∈ A, or its dual, and therewith both. The inequality x ∧ (y ∨ z) ≥ (x ∧ y) ∨ (x ∧ z) and its dual hold in every lattice. In a quoset, joins and meets are unique up to the congruence ∼ = (≤ ∩ ≥). Semantically, ≤/∼ is a partial ordering, hence an antisymmetric quasiordering (x ≤ y ∧ y ≤ x =⇒ x = y). Operationally, antisymmetry just splits equalities into inequalities; we can therefore disregard it. Joins and meets are associative, commutative, idempotent (x ∨ x = x = x ∧ x) and monotonic operations in the associated partial ordering. We will henceforth consider all inequalities modulo AC (associativity and commutativity) and normalize with respect to I (idempotence). See [4] for further information on lattices.

In [10] we derive the resolution bases directly from natural specifications. Here we choose a technically simpler way that uses more domain-specific knowledge. The lattice axioms allow us to transform every inequality between lattice terms into a set of simpler expressions.

Lemma 1. Let S be a meet semilattice, L a distributive lattice and G a set of generators (that is, a set of constants disjoint from Σ).
(i) An inequality s ≤ t1 ∧ ... ∧ tn holds in S iff s ≤ ti holds for all 1 ≤ i ≤ n.
(ii) For all s, t ∈ TΣ∪G, an inequality s ≤ t holds in L iff there is a set of inequalities of the form s1 ∧ ... ∧ sm ≤ t1 ∨ ... ∨ tn that hold in L, where the si and ti are generators occurring in s and t.

A reduced join (meet) semilattice inequality is a join (meet) semilattice inequality whose left-hand (right-hand) side is a generator. A reduced lattice inequality is a lattice inequality whose left-hand side is a meet and whose right-hand side is a join of generators. Reduced semilattice and lattice inequalities are lattice-theoretic analogs of Horn clauses and clauses. In particular, the clausal arrow is a quasiordering. We usually write more compactly s1 ... sm ≤ t1 ... tn instead of s1 ∧ ... ∧ sm ≤ t1 ∨ ... ∨ tn. By a lattice-theoretic variant of the Tseitin transformation [16], there is a linear reduction of distributive lattice terms.
We speak of reduced clausal theories when all inequalities in clauses are reduced. For the remainder of this text we assume that all lattice and semilattice inequalities are reduced. We will also restrict our inference rules to reduced terms. But then the axioms of the join and meet semilattice, and also the distributivity axiom, can no longer be used, since they do not yield reduced clauses. However, one cannot completely dispense with their effect in the initial specification. Here, background knowledge comes into play to find the appropriate initial specification for reduced clausal theories. Following [9] we use the cut rule

x1 ≤ y1z, x2z ≤ y2 −→ x1x2 ≤ y1y2   (cut)

as an alternative characterization of distributivity. Written as an inference rule, this Horn clause is a lattice-theoretic variant of resolution. It combines the effect of transitivity, distributivity and monotonicity of join and meet. See [11] for a discussion and a derivation. In the context of Knuth-Bendix completion for quasiorderings, it can be restricted by ordering constraints and used as a critical pair computation to solve the uniform word problem for distributive lattices [11].
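To see concretely why (cut) is a lattice-theoretic resolution rule, represent a reduced inequality s1...sm ≤ t1...tn as a pair of generator sets; then (cut) on a generator z is literally propositional resolution on z. A hedged Python sketch with our own encoding:

```python
# A reduced inequality s1...sm <= t1...tn is encoded as (frozenset of si, frozenset of ti).

def cut(ineq1, ineq2, z):
    """Apply (cut) on generator z: from x1 <= y1,z and x2,z <= y2
    derive x1x2 <= y1y2 -- exactly resolution on the 'literal' z."""
    (x1, y1), (x2, y2) = ineq1, ineq2
    assert z in y1 and z in x2, "z must occur on the right of ineq1 and the left of ineq2"
    return (x1 | (x2 - {z}), (y1 - {z}) | y2)

# Example: from a <= b,z and c,z <= d derive a,c <= b,d.
i1 = (frozenset({'a'}), frozenset({'b', 'z'}))
i2 = (frozenset({'c', 'z'}), frozenset({'d'}))
assert cut(i1, i2, 'z') == (frozenset({'a', 'c'}), frozenset({'b', 'd'}))
```

Reading left-hand generators as negative literals and right-hand generators as positive ones, the conclusion is exactly the resolvent of the two "clauses" on z.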
There are similar cut rules (not involving distributivity),

x1 ≤ y1z, z ≤ y2 −→ x1 ≤ y1y2,   (jcut)
x1 ≤ z, x2z ≤ y2 −→ x1x2 ≤ y2,   (mcut)

for the join and meet semilattice. These are used for solving the respective semilattice word problems. (cut), (jcut) and (mcut) work on reduced inequalities; the expression z which is cut out is necessarily a generator. Some lattice theory shows that these cut rules, together with the Horn clauses

x ≤ y −→ x ≤ yz,   (jr)
x ≤ z −→ xy ≤ z,   (ml)
constitute our desired initial specifications.

Lemma 2. The following sets axiomatize their reduced clausal theories up to normalization with I and modulo AC:
(i) J = {(ref), (trans), (jr), (jcut)} for join semilattices,
(ii) M = {(ref), (trans), (ml), (mcut)} for meet semilattices,
(iii) D = {(ref), (trans), (jr), (ml), (jcut), (mcut), (cut)} for distributive lattices.

See [14] for proofs. In implementations, normalization with respect to I, and also with respect to 0 and 1 when these constants are present, must be explicit. We use the canonical equational system modulo AC

T = {x ∧ x → x, x ∨ x → x, x ∨ 1 → 1, x ∧ 1 → x, x ∨ 0 → x, x ∧ 0 → 0}.

The orientation of the rewrite rules is compatible with the term orderings from section 2. T induces reduced inequalities modulo ACI01. AC and I01 are, however, handled differently: AC by a compatible ordering (cf. section 4), not by rewrite rules (the laws are not orientable); I01 by the normalizing rewrite system T, not by a compatible ordering (which does not exist). In the non-ground case, our calculi use ACI(01)-unification or -unifiability (when working with constraints), which can be very prolific [1]. In the ground case, much cheaper operations, like string matching, suffice for comparing terms in the inference rules. We denote the normal form of an expression e (a term, an atom, a clause) with respect to T by (e)↓T. In the ground case we may sort joins and meets of generators after each normalization. All consequences of a reduced clause set and J, M and D are again reduced. However, consequences of a reduced clause set in T-normal form may not be in T-normal form.

At the end of this section we again consider background knowledge to motivate the lattice calculi. As already mentioned, (cut) is used to solve the uniform word problem for distributive lattices. In this context, all non-theory clauses are atoms. We, however, want to admit arbitrary non-theory clauses.
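A ground normalizer for T is straightforward when joins and meets of generators are kept flat; idempotence then amounts to deduplication, and the 0/1 laws to absorption. A hedged Python sketch of T-normal forms under our own flat representation (sorting also realizes comparison modulo AC, as suggested for the ground case):

```python
# Flat ground terms: a join or meet of generators is a list of atoms, where the
# atoms are generator names plus the constants '0' and '1'.

def t_normalize(kind, args):
    """Normalize a flat join ('join') or meet ('meet') with respect to T."""
    args = sorted(set(args))           # idempotence (I) plus sorting modulo AC
    if kind == 'join':
        if '1' in args:
            return ['1']               # x v 1 -> 1
        args = [a for a in args if a != '0'] or ['0']   # x v 0 -> x
    else:  # meet
        if '0' in args:
            return ['0']               # x ^ 0 -> 0
        args = [a for a in args if a != '1'] or ['1']   # x ^ 1 -> x
    return args

assert t_normalize('join', ['b', 'a', 'a', '0']) == ['a', 'b']
assert t_normalize('meet', ['a', '0', 'b']) == ['0']
assert t_normalize('join', ['a', '1']) == ['1']
assert t_normalize('meet', ['a', '1', '1']) == ['a']
```

After this normalization, equality of ground joins or meets modulo ACI01 is plain list equality, which is why cheap string matching suffices in the ground inference rules.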
This is analogous to ordered chaining calculi for quasiorderings, where a Knuth-Bendix completion procedure for quasiorderings, operating entirely on positive atoms, is extended to clauses via a (ground) positive chaining rule:

  Γ −→ ∆, r < s    Γ′ −→ ∆′, s < t
  ————————————————————————  (Ch+)
  Γ, Γ′ −→ ∆, ∆′, r < t

Thereby s is the maximal term in the minor formulas, and the minor formulas are strictly maximal with respect to Γ, Γ′, ∆ and ∆′. The rule is a clausal extension of a critical pair computation. A chaining rule for distributive lattices can then analogously be motivated as an extension of (cut) to the lattice level:

  Γ −→ ∆, s1 < t1x    Γ′ −→ ∆′, s2x < t2
  ——————————————————————————
  Γ, Γ′ −→ ∆, ∆′, s1s2 < t1t2

But which other inference rules are needed, and what are the ordering constraints? These questions require deeper considerations. They will be answered with the help of the derivation method. Our third way of including background information is the construction of the specific syntactic orderings. In the following section, we will base them again on those of ordered resolution and ordered chaining. In section 6 we will show that J, M and D are resolution bases for these orderings and therefore independent sets. They will then be internalized into focused inference rules in section 7.
4
The Syntactic Orderings
The computation of a resolution basis, its termination, and the procedural behaviour of our calculi crucially depend on the syntactic term, atom and clause orderings. In section 3 we developed our initial axiomatizations with regard to the ordered chaining calculi for quasiorderings and ordered resolution in lattice theory. Here we build syntactic orderings for lattices that refine those of the background procedures and turn (jcut) and (cut) into lattice-theoretic variants of ordered Horn resolution and resolution. As in section 2, we first define a term ordering and then extend it to atoms and clauses. For the term ordering, let ≺ be the multiset extension of some total ordering on the set of generators. We assign minimal weight to 0 and 1, if present. ≺ is trivially well-founded if the set of generators is finite or denumerably infinite. We measure both joins and meets of generators by their multisets. This clearly suffices for the terms occurring in reduced lattice inequalities. By construction, ≺ is well-founded and compatible with AC: terms that are equal modulo AC are assigned the same measure. ≺ is natural for resolution, since multisets are a natural data structure for clauses. This choice of ≺ will force the calculus under construction to become resolution-like.¹ ≺ also appears when resolution is modeled as a critical pair computation for distributive lattices. We now consider the atom ordering. Let B be the two-element boolean algebra with ordering
¹ In [10], we alternatively force tableau calculi for distributive lattices with a term ordering emphasizing the subterm ordering.
G. Struth
(ground) atom φ ∈ A occurring in C. Hereby tν(φ) (respectively tµ(φ)) denotes the maximal (minimal) term with respect to ≺ in φ. p(φ) = 1 (p(φ) = 0) if φ occurs in Γ (in ∆). s(φ) = 1 (s(φ) = 0) if φ = s < t and t ≺ s (s ≺ t). The (ground) atom ordering ≺2 ⊆ A × A is defined by φ ≺2 ψ iff µC(φ) ≺1 µC(ψ) for φ, ψ ∈ A. Hence ≺2 is embedded in ≺1 via the atom measure. The ordering ≺1 is total and well-founded by construction; via the embedding, ≺2 inherits these properties. The definition of the atom measure follows those of the ordered chaining calculi for quasiorderings and of Knuth-Bendix completion for quasiorderings. This forces the calculus to become a chaining calculus at the clausal level and a completion procedure at the lattice level. The polarity p is not needed for comparing atoms, but it is crucial for the extension to clauses, to integrate redundancy elimination. All these orderings are extended to the non-ground level and the clause level according to section 2. In particular, the polarity p assigns greater weight to an occurrence of a term in the antecedent than to an occurrence in the succedent. In unambiguous situations we denote all orderings by ≺.
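The term ordering of this section rests on the multiset extension of a base ordering on generators. The following sketch uses the standard Dershowitz-Manna multiset extension, with plain string comparison standing in for "some total ordering on the set of generators" (an assumption of ours, not the paper's choice):

```python
from collections import Counter

# Dershowitz-Manna multiset extension of a total base ordering, sketched
# for the term ordering: a join or meet of generators is measured by
# the multiset of its generators.

def multiset_greater(m: Counter, n: Counter) -> bool:
    """True iff m is strictly greater than n in the multiset extension
    of the base ordering on generator names."""
    if m == n:
        return False
    surplus_n = n - m   # occurrences where n exceeds m
    surplus_m = m - n   # occurrences where m exceeds n
    # every surplus occurrence in n must be dominated by a strictly
    # larger generator occurring in m's surplus
    return all(any(x > y for x in surplus_m) for y in surplus_n)
```

For a total base ordering this characterization coincides with the usual definition by replacing one occurrence with finitely many smaller ones; for example, the meet abb is greater than ab, and a single c dominates ab when c is the largest generator.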
5
The Lattice Chaining Calculi
We now restrict our attention to finite lattices. Then all non-theory clauses are ground, since existential quantification can be replaced by a finite disjunction and universal quantification by a finite conjunction. The non-ground clauses in J, M and D are completely internalized into the focused inference rules. Our entire calculi are therefore ground. The extension to the non-ground case is discussed in section 8. We introduce indexed brackets to abbreviate the presentation of the calculi. A pair of brackets [.] in a clause denotes alternatively the clause with the brackets removed (keeping their content) and the clause with the brackets together with their content deleted. The clause Γ −→ ∆, [r]s ≤ t, for instance, denotes alternatively the clauses Γ −→ ∆, rs ≤ t and Γ −→ ∆, s ≤ t. In inference rules, brackets with the same index are synchronized. The leftmost of the following clause pairs, for instance, denotes alternatively the two others.

Γ −→ ∆, [r]i s ≤ t
Γ′, [u]i v ≤ w −→ ∆′
Γ −→ ∆, rs ≤ t
Γ′, uv ≤ w −→ ∆′
Γ −→ ∆, s ≤ t
Γ′, v ≤ w −→ ∆′
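The indexed-bracket convention can be made concrete with a small sketch. The string encoding of clause schemas used here (segments like "[r]i" inside an ASCII rendering of a clause) is our own assumption, chosen only to show the synchronization of equal indices:

```python
import re
from itertools import product

# A sketch of the indexed-bracket convention: per index, either the
# bracketed content is kept (brackets dropped) or deleted, at all
# occurrences of that index simultaneously.

def expand(schema):
    """All clauses denoted by a schema with indexed brackets."""
    indices = sorted(set(re.findall(r"\[[^]]*\](\w)", schema)))
    clauses = []
    for keep in product((True, False), repeat=len(indices)):
        choice = dict(zip(indices, keep))
        clauses.append(re.sub(
            r"\[([^]]*)\](\w)",
            lambda m: m.group(1) if choice[m.group(2)] else "",
            schema))
    return clauses
```

On the schema of the example above, `expand("G --> D, [r]is <= t")` yields the two clauses with rs and with s; schemas sharing an index expand in lockstep.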
Definition 2 (Distributive Lattice Chaining). Let ≺ be the atom and clause ordering of section 4. Let all clauses be reduced. The ordered chaining calculus for finite distributive lattices DC consists of the deductive inference rules and the redundancy elimination rules of OR² and the following inference rules.

Γ, t ≤ t −→ ∆
───────────
Γ −→ ∆
(Ref)
² Section 2 only defines a semantic notion of redundancy. Every set of inference rules implementing this notion is admitted.
Γ, r ∧ s ≤ t −→ ∆
───────────────
Γ, r ≤ t −→ ∆
(ML)
Γ, r ≤ s ∨ t −→ ∆
───────────────
Γ, r ≤ s −→ ∆
(JR)
Here the minor formula is maximal wrt. Γ and strictly maximal wrt. ∆.

Γ −→ ∆, s1 ≤ [t1]j x    Γ′ −→ ∆′, [s2]m x ≤ t2
──────────────────────────────────────
(Γ, Γ′ −→ ∆, ∆′, s1 [s2]m ≤ [t1]j t2) ↓T
(Cut+)
Here the terms containing x are strictly maximal in the minor formulas. The minor formulas are strictly maximal wrt. the side formulas in the premises.

Γ −→ ∆, s1 ≤ [t1]j x    Γ′, s1 [s2]m ≤ [t1]j t2 −→ ∆′
──────────────────────────────────────────
(Γ, Γ′, [u]m x ≤ t2 −→ ∆, ∆′) ↓T
(Cut-)
Here the terms containing s1 are strictly maximal in the minor formulas. In the first premise, the minor formula is strictly maximal wrt. the side formulas; in the second premise, the minor formula is maximal wrt. the side formulas. Moreover, [u]m x = t2 mod AC, and u = s2, or else u = s1 if s2 is absent in the minor formula.

Γ −→ ∆, s ≤ [t1]m x, s ≤ [t1]m t2
─────────────────────────────
(Γ, [s]j x ≤ t2 −→ ∆, s ≤ [t1]m t2) ↓T
(DF)
Here x is a generator, and either t1 is strictly maximal in the minor formulas, or s is strictly maximal in the minor formulas and s can be set to 1 in the antecedent of the conclusion. The leftmost minor formula is strictly maximal wrt. the side formulas and the rightmost minor formula. There are a dual (Cut-) and a dual (DF) rule, obtained by inverting the ordering, exchanging joins and meets, and exchanging the brackets indexed by j with those indexed by m. The calculus is meant modulo AC at the lattice level. (Ref) stands for reflexivity, (JR) and (ML) for join right and meet left, in analogy to the sequent calculus. (Cut+) and (Cut-) stand for positive and negative cut, (DF) for distributivity factoring.

Definition 3. Under the conditions of definition 2, the calculus DC specializes to the following calculi. (i)
An ordered chaining calculus for join semilattices JC, removing the inference rule (ML) and discarding in the inference rules of DC the contents of all brackets indexed with m.
(ii) An ordered chaining calculus for meet semilattices MC, removing the inference rule (JR) and discarding in the inference rules of DC the contents of all brackets indexed with j.
(iii) An ordered chaining calculus for quasiorderings QC, removing the inference rules (JR) and (ML) and discarding in the inference rules of DC the contents of all brackets indexed with j and m.
(iv) An ordered chaining calculus for transitive relations TC, removing the inference rule (Ref) from QC.
Definition 4 (Lattice Ordered Resolution [11]). Under the conditions of definition 2, the deduction inference rules of the ordered Horn resolution calculus HOR and the ordered resolution calculus OR of [11]³ as rewrite-based solutions to the uniform word problem of semilattices and distributive lattices are restrictions of MC and DC to non-theory clauses consisting of positive atoms.

In this case, the only applicable rule is (Cut+). It computes a lattice-theoretic variant of a Gröbner basis from the given presentation (the set of positive atoms). The query inequalities, which are in question as consequences of the presentation and the axioms, can then be shown to be redundant by a search method (cf. [11]). An alternative solution to the word problem transforms the query into a set of negative atoms and then uses DC. We will see in section 8 that the resolution calculi are indeed decision procedures.

Definition 5 (Knuth-Bendix Completion [13]). Under the conditions of definition 2, the deduction inference rules of the Knuth-Bendix completion procedures for quasiorderings and non-symmetric transitive relations (without term structure) are restrictions either of QC and TC to non-theory clauses consisting of positive atoms, or of OR, all lattice terms being generators.

Soundness and completeness of DC are the subject of sections 6 and 7; the other calculi are dealt with in section 8. The unordered variant of DC is an instance of theory resolution. We will derive the DC-rules as an ordered variant thereof. DC is more focused than mere reasoning with resolution bases.
6
Construction of the Resolution Bases
We now discuss the first step in the derivation of the chaining calculi: the construction of the resolution basis. A complete formal treatment can be found in [14]. We close the sets J, M and D under ordered resolution and redundancy elimination; use of lattice duality avoids repetitions. It turns out that J, M and D are already the resolution bases.

Proposition 3. Let ≺ be the atom ordering of section 4 and OR the ordered resolution calculus of section 2. J, M and D are resolution bases for the reduced clausal theories of join semilattices, meet semilattices and distributive lattices. That is, M = rb(M), J = rb(J) and D = rb(D). We always implicitly normalize with respect to I.

In the proof we first restrict our attention to M. The result for J then follows by duality; the result for D requires only a slight extension of the combination of J and M. We consider all possible ordered resolution inferences between members of M and show that the conclusions of these inferences are redundant, that is, they are implied by smaller instances of members of M. We restrict ourselves to
³ The ordering constraints of these calculi are weaker than those of ordered resolution given in this text in section 2.
a representative case, namely the following inference between two instances of (mcut), where y is maximal and s2 ≺ x.

s2 ≤ x, xy ≤ t −→i s2 y ≤ t
s1 ≤ y, s2 y ≤ t −→d s1 s2 ≤ t
s1 ≤ y, s2 ≤ x, xy ≤ t −→ s1 s2 ≤ t

We add the indices i and d to denote that the Horn clauses increase or decrease with respect to ≺ from left to right. We also put the maximal atoms that are cut out into boxes. The conclusion is redundant: it semantically follows from the smaller instances

s1 ≤ y, xy ≤ t −→ s1 x ≤ t,
s2 ≤ x, s1 x ≤ t −→ s1 s2 ≤ t
of (mcut). The analysis of the remaining cases is similar. It shows that J, M and D are indeed resolution bases. Propositions 2 and 3 immediately imply the following fact, which is essential for the arguments in the following section.

Corollary 1. For every inconsistent reduced clause set containing J, M or D there exists a refutation without primary theory inferences in OR.

By corollary 1, working with the resolution bases is already a strong improvement over plain axiomatic reasoning. We will see in the following section that the ordered chaining calculi yield even more restrictive proofs.
7
Deriving the Chaining Rules
We now turn to the second step in the derivation: the extraction of the inference rules of DC from OR-derivations with D. Formal proofs of all statements can be found in [14]; here we can only sketch them. Our main assumptions are refutational completeness of OC (theorem 1) and the fact that our ordering constraints rule out primary theory inferences (corollary 1). The main idea is to consider all possible interactions between non-theory clauses and the elements of D in a refutation. Disregarding the ordering constraints, the first (Cut-) inference rule of DC can be derived, for instance, as

Γ, s1 s2 ≤ t1 t2 −→ ∆    s1 ≤ t1 x, s2 x ≤ t2 −→ s1 s2 ≤ t1 t2
──────────────────────────────────────────────
Γ, s1 ≤ t1 x, s2 x ≤ t2 −→ ∆    Γ′ −→ ∆′, s1 ≤ t1 x
────────────────────────────────────
Γ, Γ′, s2 x ≤ t2 −→ ∆, ∆′
from two non-theory clauses and an instance of (cut). Here, the atoms of (cut) that are cut out are in boxes. In (Cut-), the effect of (cut) is completely internalized. The derivation of (Cut+) is similar; there, both atoms in the antecedent of the instance of (cut) are cut out. (DF) arises from a resolution step followed by a factoring step. (JR) and (ML) arise from resolution steps with (jr) and (ml). The restrictions in DC modeled by brackets arise from derivations with (jcut),
(mcut) or (trans). If it is possible to partition all refutations into these macro inferences, then the respective rules from D can be completely discarded in favor of the focused inference rules. On closer inspection, however, this partition into patterns is not straightforward. First, it must respect the ordering constraints of ordered resolution. Second, there are certain legal proof steps in refutations that may violate the pattern construction. We first briefly argue that these unwanted proof steps can be avoided. We then show how the partition of arbitrary OR-refutations into patterns corresponding to the DC-rules is obtained.

To understand the obstacles to the macro inferences, consider again the above derivation of (Cut-). First, Γ′ −→ ∆′, s1 ≤ t1 x may be another instance of (cut). We call this situation a secondary theory inference. Second, ∆′ may contain an atom bigger than s1 ≤ t1 x. Then the derivation of (Cut-) does not continue as above, but with a "bad" resolution step cutting out this bigger atom. We call this situation a blocking inference. In a blocking inference, only the first step uses the theory axiom. In DC, secondary theory inferences can occur in connection with (cut), (jcut), (mcut) and (trans). An OR-derivation is regular when it contains neither primary nor secondary theory inferences nor blocking inferences. We first present a technical lemma showing that we can always assume that all instances of (cut), (jcut), (mcut) and (trans) are decreasing from left to right in a refutation, that is, that they can be indexed with d.

Lemma 3. For every two-step proof in an OR-refutation that cuts out two atoms of (cut), (jcut), (mcut) or (trans) there is another OR proof from the same non-theory premises to the same conclusion, in which (another instance of) (cut), (jcut), (mcut) or (trans) decreases from left to right.

The proof is based on a local proof transformation with the clauses in D, which can be completely hidden inside a macro inference.
We henceforth assume that all refutations are of this particular shape. This makes proofs much easier.

Lemma 4. For every inconsistent clause set there exists a regular OR-refutation (possibly violating the ordering constraints).

The proof is based on a proof transformation. In case of blocking inferences, the bad second inference must be permuted down towards the root of the proof tree, while the needed second inference is permuted up in a local way. In case of secondary theory inferences, we show that the two instances of (cut) can be replaced by a single instance and that the refutation can then be continued with only a few well-behaved modifications. It then remains to show that the absence of blocking inferences and the absence of secondary theory inferences are compatible. We are now prepared for our main theorem.

Theorem 1. Let all clauses be reduced. The ground ordered chaining calculus DC is refutationally complete for finite distributive lattices: for every reduced ground clause set that is inconsistent in the first-order theory of finite distributive lattices there exists a refutation in DC.

Again, we only discuss some representative cases. We may assume that the refutation is regular and that all instances of (cut), (jcut), (mcut) or (trans) are indexed with
d. We consider the (Res) and (Fact) steps of non-theory clauses with D. Terms at which the chaining takes place are sometimes put into boxes. First, the resolution inference of a non-theory clause Γ, s1 s2 ≤ t −→d ∆ with a ground instance of (ml) is

s1 ≤ t −→i s1 s2 ≤ t    Γ, s1 s2 ≤ t −→d ∆
──────────────────────────────────
Γ, s1 ≤ t −→ ∆

where the minor formula is maximal with respect to the side formulas in the second premise. Internalization of (ml) yields (ML). Second, consider the following resolution inference of a non-theory clause. Let t1 be maximal in the following instance of (cut) and assume that there is a non-theory clause Γ −→ ∆, s1 ≤ t1 x in which the atom s1 ≤ t1 x is strictly maximal. Then there is a possible resolution inference

Γ −→ ∆, s1 ≤ t1 x    s1 ≤ t1 x, s2 x ≤ t2 −→ s1 s2 ≤ t1 t2
───────────────────────────────────────────
Γ, s2 x ≤ t2 −→ ∆, s1 s2 ≤ t1 t2
In the conclusion, t1 may occur more than once, but only in ∆. Assuming a regular refutation, it can either be continued as

Γ, s2 x ≤ t2 −→ ∆, s1 s2 ≤ t1 t2    Γ′, s1 s2 ≤ t1 t2 −→ ∆′
─────────────────────────────────────────
Γ, Γ′, s2 x ≤ t2 −→ ∆, ∆′
which yields the first of the (Cut-) rules. Or ∆ may be equal to ∆′, s1 s2 ≤ t1 t2. Then the refutation can be continued by ordered factoring as

Γ, s2 x ≤ t2 −→ ∆′, s1 s2 ≤ t1 t2, s1 s2 ≤ t1 t2
─────────────────────────────────────
Γ, s2 x ≤ t2 −→ ∆′, s1 s2 ≤ t1 t2

which yields the first of the (DF) rules. The remaining cases are similar. Soundness of DC follows immediately from the completeness proof, since all rules have been derived from OR with rules from D.

Corollary 2. The semilattice and chaining calculi in definition 3 and the resolution calculi in definition 4 are refutationally complete.
8
Discussion
Let us first consider an example that shows DC at work; see [14] for more examples. Many of them can be solved without (Cut+), (Cut-) or (DF): the effect of distributivity is sufficiently handled by the transformation to reduced lattice inequalities. Here we show that complements in distributive lattices are uniquely defined. We assume two complements b and c of an element a and show that they are identical. The corresponding reduced clauses are

ab ≤ 0,
ac ≤ 0,
1 ≤ ab,
1 ≤ ac
b ≤ c, c ≤ b −→
96
G. Struth
Let a ≻ b ≻ c ≻ 1 ≻ 0. We then obtain c ≤ b from 1 ≤ ab and ac ≤ 0, as well as b ≤ c from 1 ≤ ac and ab ≤ 0, by (Cut+). The empty clause follows immediately from these two inequalities by resolution with b ≤ c, c ≤ b −→. We now show that we can go beyond refutational completeness.

Corollary 3. DC decides the elementary theory of finite distributive lattices.

This holds since finitely presented distributive lattices are finite and no DC-inference introduces a new variable or constant. Thus only finitely many inferences have irredundant conclusions. A resolution basis is constructed in finitely many steps; it contains the empty clause iff the initial clause set was inconsistent. The result immediately specializes to the semilattice and chaining calculi in definition 3, the resolution calculi in definition 4 and the Knuth-Bendix procedure in definition 5. We now discuss some extensions of DC.

Corollary 4. DC is refutationally complete for the reduced clausal theory of finite boolean lattices.

Boolean lattices are complemented distributive lattices with 1 and 0. Complements can be eliminated in reduced inequalities, just like negation in clauses: ab̄ ≤ c, for instance, is equivalent to a ≤ bc, and a ≤ b̄c to ab ≤ c, if b̄ denotes the complement of b. Thus DC suffices for the boolean case. For infinite lattices, quantifiers can no longer be eliminated and first-order variables appear. Our calculi then become semi-decision procedures, since the corresponding first-order theories are undecidable. This extension requires a signature including free Skolem functions. Since these functions are non-monotonic, they are harmless. Our calculi can then easily be lifted; only idempotence can no longer be used implicitly as a simplification (cf. [14]). Homomorphisms can also be treated, by pushing them down through lattice terms. With monotonic functions, further complications with certain critical pairs appear that possibly cannot be handled within first-order logic. See [12,13] for a deeper discussion.
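The complement-uniqueness fact derived above can also be checked by brute force on a concrete finite distributive lattice. The following sketch (our own illustration, not part of the calculus) uses the powerset of a three-element set, a boolean and hence distributive lattice, with join as union and meet as intersection:

```python
from itertools import chain, combinations

# Brute-force illustration: complements in a distributive lattice are
# unique. We check this on the powerset of {0, 1, 2} with join = union,
# meet = intersection, 0 = {} and 1 = {0, 1, 2}.

TOP = frozenset({0, 1, 2})
elements = [frozenset(s) for s in chain.from_iterable(
    combinations(TOP, r) for r in range(len(TOP) + 1))]

def complements(a):
    """All x satisfying the reduced clauses ax <= 0 and 1 <= a v x."""
    return [x for x in elements if not (a & x) and (a | x) == TOP]

# every element has exactly one complement
assert all(len(complements(a)) == 1 for a in elements)
```

In a non-distributive lattice such as the diamond M3 this check would fail, which is why distributivity (here via (Cut+)) is essential to the derivation.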
Due to ACI-unification, the (Cut-) rules may become very prolific in the non-ground case. One improvement is unification constraints that use unifiability tests. With more structure, for instance in set theory, with lattice operations interpreted as union and intersection and the ordering as set inclusion, one may fortunately dispense with the (Cut-) rules completely: a ⊆ b fails iff some singleton subset of a has empty intersection with b (cf. [7]). Thus every negative inequality may be transformed into positive ones, and (Cut-) never applies.
9
Conclusion
We have derived focused ordered resolution calculi for lattices. Our method supports the integration of specific syntactic orderings as well as semantic and procedural knowledge in a smooth and fine-grained way. We consider this an important feature of the derivation method. We have also applied our method to the derivation of tableau calculi as lattice-theoretic decision procedures [10],
using a different syntactic ordering that encodes the subformula property. In particular, this allows us to consider the cut rule from an algebraic point of view. Our focused lattice calculi are a basis for many interesting applications. We plan in particular a consideration of (fragments of) set theory [7] and of certain extensions of semilattices and distributive lattices [8,5,6] for the construction and analysis of hardware and software systems. Proof support for these endeavors is very challenging.
References

1. F. Baader and W. Büttner. Unification in commutative idempotent monoids. Theoretical Computer Science, 56:345–352, 1988.
2. L. Bachmair and H. Ganzinger. Rewrite-based equational theorem proving with selection and simplification. Journal of Logic and Computation, 4(3):217–247, 1994.
3. L. Bachmair and H. Ganzinger. Rewrite techniques for transitive relations. In Ninth Annual IEEE Symposium on Logic in Computer Science, pages 384–393. IEEE Computer Society Press, 1994.
4. G. Birkhoff. Lattice Theory, volume 25 of Colloquium Publications. American Mathematical Society, 1984. Reprint.
5. J. H. Conway. Regular Algebras and Finite State Machines. Chapman and Hall, 1971.
6. P. J. Freyd and A. Scedrov. Categories, Allegories. North-Holland, 1990.
7. L. Hines. Str+ve⊆: The Str+ve-based Subset Prover. In M. E. Stickel, editor, 10th International Conference on Automated Deduction, volume 449 of LNAI, pages 193–206. Springer-Verlag, 1990.
8. D. Kozen. Kleene algebra with tests. Transactions on Programming Languages and Systems, 19(3):427–443, 1997.
9. P. Lorenzen. Algebraische und logistische Untersuchungen über freie Verbände. The Journal of Symbolic Logic, 16(2):81–106, 1951.
10. G. Struth. Canonical Transformations in Algebra, Universal Algebra and Logic. PhD thesis, Institut für Informatik, Universität des Saarlandes, 1998.
11. G. Struth. An algebra of resolution. In L. Bachmair, editor, Rewriting Techniques and Applications, 11th International Conference, volume 1833 of LNCS, pages 214–228. Springer-Verlag, 2000.
12. G. Struth. Deriving focused calculi for transitive relations. In A. Middeldorp, editor, Rewriting Techniques and Applications, 12th International Conference, volume 2051 of LNCS, pages 291–305. Springer-Verlag, 2001.
13. G. Struth. Knuth-Bendix completion for non-symmetric transitive relations. In M. van den Brand and R. Verma, editors, Second International Workshop on Rule-Based Programming (RULE2001), volume 59 of Electronic Notes in Theoretical Computer Science. Elsevier Science Publishers, 2001.
14. G. Struth. Deriving focused lattice calculi. Technical Report 2002-7, Institut für Informatik, Universität Augsburg, 2002.
15. J. Stuber. Superposition Theorem Proving for Commutative Algebraic Theories. PhD thesis, Institut für Informatik, Universität des Saarlandes, 1999.
16. G. S. Tseitin. On the complexity of derivations in propositional calculus. In J. Siekmann and G. Wrightson, editors, Automation of Reasoning: Classical Papers on Computational Logic, pages 466–483. Springer-Verlag, 1983. Reprint.
17. U. Waldmann. Cancellative Abelian Monoids in Refutational Theorem Proving. PhD thesis, Institut für Informatik, Universität des Saarlandes, 1997.
Layered Transducing Term Rewriting System and Its Recognizability Preserving Property

Hiroyuki Seki¹, Toshinori Takai², Youhei Fujinaka¹, and Yuichi Kaji¹

¹ Nara Institute of Science and Technology, Takayama 8916-5, Ikoma 630-0101, Japan
{seki,youhei-f,kaji}@is.aist-nara.ac.jp
² National Institute of Advanced Industrial Science and Technology, Nakoji 3-11-46, Amagasaki 661-0974, Japan
[email protected]
Abstract. A term rewriting system which effectively preserves recognizability (EPR-TRS) has good mathematical properties. In this paper, a new subclass of TRSs, layered transducing TRSs (LT-TRSs), is defined and its recognizability preserving property is discussed. The class of LT-TRSs contains some EPR-TRSs, e.g., {f(x) → f(g(x))}, which do not belong to any of the known decidable subclasses of EPR-TRSs. The bottom-up linear tree transducer, a well-known computation model in tree language theory, is a special case of LT-TRS. We present a sufficient condition for an LT-TRS to be an EPR-TRS. Some properties of LT-TRSs, including reachability, are also shown to be decidable.
1
Introduction
A tree automaton is a natural extension of a finite-state automaton on strings. A set of ground terms (tree language) T is recognizable if there exists a tree automaton which accepts T. Tree automata inherit good mathematical properties from finite-state automata. For example, the class of recognizable sets is closed under the boolean operations (union, intersection and complementation), and decision problems such as emptiness and membership are decidable for a recognizable set. Let L(A) denote the language accepted by a tree automaton A. For a TRS R and a tree language T, define (→∗R)(T) = {t | ∃s ∈ T s.t. s →∗R t}. A TRS R effectively preserves recognizability (abbreviated as EPR) if for any tree automaton A, (→∗R)(L(A)) is also recognizable and a tree automaton A∗ such that L(A∗) = (→∗R)(L(A)) can be effectively constructed. Due to the above-mentioned properties of recognizable sets, some important problems, e.g., reachability, joinability and local confluence, are decidable for EPR-TRSs [8,9]. Furthermore, with additional conditions, the strong normalization property, neededness and unifiability become decidable for EPR-TRSs [4,12,15]. Another interesting application of EPR-TRSs is in automatic termination proving [11].

S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 98–113, 2002. © Springer-Verlag Berlin Heidelberg 2002

The problem of deciding whether a given TRS is EPR is undecidable [7], and decidable subclasses of EPR-TRSs have been proposed in a series of works [14,3,10,9,12,15]. These subclasses put a rather strong constraint on the syntax of
the right-hand side of a rewrite rule. For example, the right-hand side of a rewrite rule in a linear semi-monadic TRS (L-SM-TRS) [3] is either a variable or f(t1, t2, ..., tn) where each ti (1 ≤ i ≤ n) is either a variable or a ground term. Linear generalized semi-monadic TRSs (L-GSM-TRS) [9] and right-linear finite path-overlapping TRSs (RL-FPO-TRS) [15] weaken this constraint, but some simple EPR-TRSs such as {f(x) → f(g(x))} still do not belong to any of the known decidable subclasses of EPR-TRSs. To show that a given TRS R is EPR, one should construct, for a given tree automaton A, a tree automaton A∗ such that L(A∗) = (→∗R)(L(A)). The above-mentioned restrictions on the right-hand side of a rewrite rule are sufficient conditions for a procedure of automata construction to halt.

In this paper, a new subclass of TRSs, layered transducing TRSs (LT-TRSs), is defined and its recognizability preserving property is discussed. Intuitively, an LT-TRS is a TRS such that certain unary function symbols are specified as markers and a marker moves from leaf to root in each rewrite step. A bottom-up linear tree transducer [6], which is a well-known computation model in tree language theory, can be considered as a special case of LT-TRS. We propose a procedure which, for a given tree automaton A and an LT-TRS R, constructs a tree automaton A∗ such that L(A∗) = (→∗R)(L(A)). The procedure introduces a state [z, q], the product of a state z already belonging to A∗ and a marker q, and constructs a transition rule which is the product of a transition rule already in A∗ and a rewrite rule in R. However, an LT-TRS is not always EPR, and the above procedure does not always halt. We present a sufficient condition for the procedure to halt. The subclass of LT-TRSs which satisfy the sufficient condition is still incomparable with any of the known decidable subclasses of EPR-TRSs. In particular, the class contains some EPR-TRSs, such as {f(x) → f(g(x))} mentioned above.
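The running example R = {f(x) → f(g(x))} can be explored concretely with a bounded enumeration of descendants. The term encoding below (nested tuples over a constant "a") is our own assumption for illustration; the enumeration makes visible that every descendant of f(a) has the shape f(gⁿ(a)), a recognizable set.

```python
# A bounded enumeration of descendants under R = {f(x) -> f(g(x))},
# the EPR-TRS mentioned above. Terms are nested tuples ("f", t) and
# ("g", t) over the constant "a".

def rewrite_once(t):
    """All terms reachable from t by one application of f(x) -> f(g(x))."""
    if not isinstance(t, tuple):
        return []
    head, arg = t
    results = [("f", ("g", arg))] if head == "f" else []
    results += [(head, s) for s in rewrite_once(arg)]  # rewrite below root
    return results

def descendants(t, steps):
    """All terms reachable from t in at most `steps` rewrite steps."""
    seen, frontier = {t}, {t}
    for _ in range(steps):
        frontier = {s for u in frontier for s in rewrite_once(u)} - seen
        seen |= frontier
    return seen
```

A tree automaton for the full (infinite) descendant set needs only a loop on the marker-like symbol g, which is exactly the kind of state the paper's construction introduces.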
Finally, some properties including reachability of LT-TRSs are shown to be decidable. The rest of the paper is organized as follows. After providing preliminary definitions in section 2, LT-TRS is defined in section 3. A procedure for automata construction is presented and the partial correctness of the procedure is proved in section 4. Sufficient conditions for the construction procedure to halt are presented in section 5. Also some properties including reachability are shown to be decidable for LT-TRS in section 5.
2

Preliminaries

2.1 Term Rewriting Systems
We use the usual notions for terms, substitutions, etc. (see [1] for details). Let Σ be a signature and V an enumerable set of variables. An element of Σ is called a function symbol, and the arity of f ∈ Σ is denoted by a(f). A function symbol c with a(c) = 0 is called a constant. The set of terms, defined in the usual way, is denoted by T(Σ, V). The set of variables occurring in t is denoted by Var(t). A term t is ground if Var(t) = ∅. The set of ground terms is denoted
by T(Σ). A ground term in T(Σ) is also called a Σ-term. A term is linear if no variable occurs more than once in the term. A substitution σ is a mapping from V to T(Σ, V), written as σ = {x1 → t1, ..., xn → tn} where ti with 1 ≤ i ≤ n is the term substituted for the variable xi. The term obtained by applying a substitution σ to a term t is written as tσ; tσ is called an instance of t. A position in a term t is defined as a sequence of positive integers as usual, and the set of all positions in a term t is denoted by Pos(t). The subterm of t at a position o is denoted by t|o. If a term t is obtained from a term t′ by replacing the subterm of t′ at position o1 ∈ Pos(t′) with a term t1, then we write t = t′[t1]o1. A rewrite rule over a signature Σ is an ordered pair of terms in T(Σ, V), written as l → r. A term rewriting system (TRS) over Σ is a finite set of rewrite rules over Σ. For terms t, t′ and a TRS R, we write t →R t′ if there exist a position o ∈ Pos(t), a substitution σ and a rewrite rule l → r ∈ R such that t|o = lσ and t′ = t[rσ]o. Define →∗R to be the reflexive and transitive closure of →R; the transitive closure of →R is denoted by →+R. The subscript R of →R is omitted if R is clear from the context. A redex (in R) is an instance of l for some l → r ∈ R. A normal form (in R) is a term which has no redex as a subterm. Let NFR denote the set of all ground normal forms in R. A rewrite rule l → r is left-linear (resp. right-linear) if l (resp. r) is linear. A rewrite rule is linear if it is both left-linear and right-linear. A TRS R is left-linear (resp. right-linear, linear) if every rule in R is left-linear (resp. right-linear, linear). Notions such as reachability, joinability, confluence and local confluence are defined in the usual way.
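The notions of substitution, instance and matching above can be sketched in a few lines. The term encoding (tuples with the function symbol first, constants as bare strings, variables as strings starting with "x", "y" or "z") is a hypothetical convention of ours:

```python
# A sketch of matching and instantiation for the definitions above.
# Terms are tuples (f, t1, ..., tn), constants are bare strings, and
# variable names are strings starting with "x", "y" or "z".
# Substitutions are dicts from variables to terms.

def is_var(t):
    return isinstance(t, str) and t[:1] in ("x", "y", "z")

def match(pattern, term, sigma=None):
    """A substitution sigma with pattern*sigma == term, or None."""
    sigma = dict(sigma or {})
    if is_var(pattern):
        if pattern in sigma and sigma[pattern] != term:
            return None            # same variable bound to two terms
        sigma[pattern] = term
        return sigma
    if isinstance(pattern, tuple) and isinstance(term, tuple) \
            and pattern[0] == term[0] and len(pattern) == len(term):
        for p, t in zip(pattern[1:], term[1:]):
            sigma = match(p, t, sigma)
            if sigma is None:
                return None
        return sigma
    return sigma if pattern == term else None

def instantiate(sigma, t):
    """The instance t*sigma."""
    if is_var(t):
        return sigma.get(t, t)
    if isinstance(t, tuple):
        return (t[0],) + tuple(instantiate(sigma, s) for s in t[1:])
    return t
```

A one-step rewrite t →R t′ then amounts to finding a position o with match(l, t|o) ≠ None and replacing t|o by the instance of r; for left-linear rules, the consistency check on repeated variables never fires.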
2.2 Tree Automata
A tree automaton (TA) [6] is defined by a 4-tuple A = (Σ, P, ∆, Pfinal) where Σ is a signature, P is a finite set of states, Pfinal ⊆ P is a set of final states, and ∆ is a finite set of transition rules of the form f(p1, ..., pn) → p where f ∈ Σ, a(f) = n, and p1, ..., pn, p ∈ P, or of the form p′ → p where p′, p ∈ P. A rule of the former form is called a non-ε-rule and a rule of the latter form is called an ε-rule. Consider the set of ground terms T(Σ ∪ P), where we define a(p) = 0 for p ∈ P. A transition of a TA can be regarded as a rewrite relation on T(Σ ∪ P) by regarding the transition rules in ∆ as rewrite rules on T(Σ ∪ P). For terms t and t′ in T(Σ ∪ P), we write t ⇒A t′ if and only if t →∆ t′. The reflexive and transitive closure of ⇒A is denoted by ⇒∗A. If t ⇒A t1 ⇒A t2 ⇒A ··· ⇒A tk = t′, we write t ⇒ᵏA t′, and k is called the length of the transition sequence. For a TA A and t ∈ T(Σ), if t ⇒∗A pf for a final state pf ∈ Pfinal, then we say t is accepted by A. The set of ground terms accepted by A is denoted by L(A). A set T of ground terms is recognizable if there is a TA A such that T = L(A). A state p is reachable if there exists a Σ-term t such that t ⇒∗A p. It is well known that for any TA A we can construct a TA A′ such that L(A′) = L(A) and A′ contains only reachable states and contains no ε-rule [6]. Recognizable sets inherit some useful properties of regular (string) languages.
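A bottom-up run of a TA without ε-rules can be sketched directly from the definition. The dict-based encoding of ∆ and the concrete example automaton below are our own illustration; the example accepts exactly the terms f(gⁿ(a)), i.e. the descendants of f(a) under {f(x) → f(g(x))}:

```python
from itertools import product

# A bottom-up run of a tree automaton without ε-rules. Transition rules
# are kept in a dict mapping (f, p1, ..., pn) to a set of successor
# states; terms are tuples (f, t1, ..., tn), constants being 1-tuples.

def states(delta, t):
    """The set of states p with t =>* p."""
    head, args = t[0], t[1:]
    reached = set()
    # combine every choice of states for the subterms with a rule for head
    for ps in product(*(states(delta, a) for a in args)):
        reached |= delta.get((head,) + ps, set())
    return reached

def accepts(delta, final, t):
    return bool(states(delta, t) & final)

# example automaton accepting f(g^n(a)) for n >= 0
delta = {("a",): {"p0"}, ("g", "p0"): {"p0"}, ("f", "p0"): {"pf"}}
final = {"pf"}
```

Because the automaton may be nondeterministic, each subterm evaluates to a set of states; for a constant the product over zero argument positions yields exactly one (empty) state tuple, so the base case needs no special handling.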
Layered Transducing Term Rewriting System
101
Lemma 1. [6] The class of recognizable sets is effectively closed under union, intersection and complementation. For a recognizable set T, the following problems are decidable. (1) Does a given ground term t belong to T? (2) Is T empty?
2.3 TRS Which Preserves Recognizability
For a TRS R and a set T of ground terms, define (→∗R)(T) = {t | ∃s ∈ T s.t. s →∗R t}. A TRS R is said to effectively preserve recognizability if, for any tree automaton A, the set (→∗R)(L(A)) is also recognizable and we can effectively construct a tree automaton which accepts (→∗R)(L(A)). In this paper, the class of TRSs which effectively preserve recognizability is written as EPR-TRS. Theorem 1. If a TRS R belongs to EPR-TRS, then the reachability relation and the joinability relation for R are decidable [8]. It is also decidable whether R is locally confluent or not [9]. Unfortunately it is undecidable whether a given TRS belongs to EPR-TRS or not [7]. Therefore decidable subclasses of EPR-TRS have been proposed, for example, ground TRS by Brainerd [2], right-linear monadic TRS (RL-M-TRS) by Salomaa [14], linear semi-monadic TRS (L-SM-TRS) by Coquidé et al. [3], linear generalized semi-monadic TRS (L-GSM-TRS) by Gyenizse and Vágvölgyi [9], and right-linear finite path overlapping TRS (RL-FPO-TRS) by Takai et al. [15]. Theorem 2. RL-M-TRS ⊂ RL-FPO-TRS ⊂ EPR-TRS and ground TRS ⊂ L-SM-TRS ⊂ L-GSM-TRS ⊂ RL-FPO-TRS. All inclusions are proper. Réty [13] defined a subclass of TRSs and showed that each TRS in the class effectively preserves recognizability for the subclass C of tree languages, each of which is a set {tσ | t is a linear term and σ is a substitution such that xσ is a constructor term for each x ∈ Var(t)} (abbreviated as C-EPR). For example, R3 of Example 5 in Section 4 is not an EPR-TRS but it is C-EPR.
3 Layered Transducing TRS
A new class of TRSs named layered transducing TRSs (LT-TRSs) is proposed in this section. Definition 1. Let Σ = F ∪ Q be a signature where F ∩ Q = ∅. Suppose that for each q ∈ Q, a(q) = 1. A function symbol in Q is called a marker. A layered transducing TRS (LT-TRS) is a linear TRS over Σ in which each rewrite rule has one of the following forms: (i) f(q1(x1), ..., qn(xn)) → q(t), or (ii) q1(x1) → q′(t′)
H. Seki et al.
where f ∈ F, qi (1 ≤ i ≤ n), q, q1, q′ ∈ Q, x1, ..., xn are distinct variables, t ∈ T (F, {x1, ..., xn}), t′ ∈ T (F, {x1}), and t and t′ are linear terms. Example 1. Let g ∈ F with a(g) = 1 and q ∈ Q. R1 = {q(x) → q(g(x))} is an LT-TRS. Note that R1 is an EPR-TRS but is not an FPO-TRS since q(g(x)) properly sticks out of q(x) and hence the sticking-out graph of R1 has a self-looping edge of weight one [15]. Example 2. Let f, g, h ∈ F, q1, q2, q ∈ Q. R2 = {f(q1(x1), q2(x2)) → q(g(h(x2), x1)), q1(x1) → q(h(x1))} is an LT-TRS. In this paper, we use a, b, c to denote constants, f, g, h to denote non-marker symbols, q, q′, q1, q2, ... to denote markers, p, p′, p1, p2, ... to denote states and s, t, t1, t2, ... to denote terms.
4 Construction of Tree Automata
In this section, we present a procedure which takes an LT-TRS R and a tree automaton (TA) A as input and constructs a TA A∗ such that L(A∗) = (→∗R)(L(A)) if the procedure halts. Let A = (Σ, P, ∆, Pfinal) be a TA. By the definition of (→∗R)(L(A)), if t ⊢∗A p and t →∗R s then s ⊢∗A∗ p should hold. To satisfy this property, the proposed procedure starts with A0 = A and constructs a series of TAs A0, A1, .... We define A∗ as the limit of this chain of TAs. For example, let f(p1, p2) → p ∈ ∆ and f(q1(x1), q2(x2)) → q(g(h(x2), x1)) ∈ R and assume that t = f(q1(t1), q2(t2)) ⊢∗A f(q1(p′1), q2(p′2)) ⊢∗A f(p1, p2) ⊢A p.
(4.1)
Note that f(q1(t1), q2(t2)) →R q(g(h(t2), t1)) (= t′), and hence A∗ is required to satisfy q(g(h(t2), t1)) ⊢∗A∗ p. The procedure constructs a 'product' rule of f(p1, p2) → p and f(q1(x1), q2(x2)) → q(g(h(x2), x1)) and some auxiliary rules so that A∗ can simulate the transition sequence (4.1) when A∗ reads q(g(h(t2), t1)). More precisely, new states [p, q] and h(p′2) are introduced and the rules g(h(p′2), p′1) → [p, q], h(p′2) → h(p′2)
(4.2)
are constructed. The following transition rule is also added so that s ⊢∗A∗ [p, q] if and only if q(s) ⊢∗A∗ p: q([p, q]) → p.
(4.3)
When A∗ reads q(g(h(t2), t1)), we can see by (4.1) that t′ = q(g(h(t2), t1)) ⊢∗A q(g(h(p′2), p′1)).
(4.4)
A∗ guesses that in a term t such that t →R t′, the markers q1 and q2 were placed above the subterms t1 and t2, respectively, and A∗ behaves as if it reads q1 and
Fig. 1. Illustration of automata construction.
q2 at p1 and p2 . That is, A∗ simulates the transition f (p1 , p2 ) A p by rules (4.2). Also see Fig. 1. t ∗A q(g(h(p2 ), p1 )) A q(g(h(p2 ), p1 )) A q([p, q]) A p The last transition is by (4.3); A∗ encounters the marker q at the state [p, q], which means that the guess was correct, and A∗ changes its state to p by forgetting the guess q. The construction of new rules and states is repeated until Ai saturates. Hence, states with more than one nesting such as [[[p, q1 ], q2 ], q3 ] may be defined in general and the procedure does not always halt as illustrated in Example 5. There are two approaches to ensuring that the automata construction always halts. One of them is to redefine the procedure so that it generates a TA A∗ such that L(A∗ ) ⊇ (→∗R )(L(A)) and always halts [5]. The other one is to provide a sufficient condition on a TRS and/or a TA so that the procedure halts for any TRS R and TA A which satisfy the condition [15]. This paper takes the latter approach. A major difference between [15] and this paper is that we use a ‘product’ state [z, q], which in effect is the folding of infinitely many states and may avoid infinite state construction as illustrated in Example 4. Below we will present the construction procedure. For simplicity, if we write a signature Σ as Σ = F ∩ Q, then we implicitly assume that F ∩ Q = ∅ and Q is a set of markers. Procedure 1 Suppose Σ = F ∪ Q (Σ ∩ Q = ∅) and ∀q ∈ Q: a(q) = 1. Input: a tree automaton A = (Σ, P, ∆, Pfinal ) such that every state in P is reachable and no ε-rule is in ∆, an LT-TRS R over Σ. Output: a tree automaton A∗ s.t. L(A∗ ) = (→∗R )(L(A)). Step1. Let i := 0, A0 = (Σ, Z0 , ∆0 , Pfinal ) := A. In Step2–4, this procedure constructs A1 , A2 , · · · by adding new states and transition rules to A0 . Step2. Let i := i + 1, Ai = (Σ, Zi , ∆i , Pfinal ) := Ai−1 . Step3. Do the following (S) and (T) until Ai does not change. (S) if (S1) f (q1 (x1 ), . . . , qn (xn )) → q(t) ∈ R, f (z1 , . . . 
, zn ) → z ∈ ∆i−1 (f ∈ F), qj (zj ) → zj ∈ ∆i−1 (1 ≤ j ≤ n)
104
H. Seki et al.
or (S2) q1 (x1 ) → q(t) ∈ R, q1 (z1 ) → z ∈ ∆i−1 then if t = xl and [z, q] = zl then add to Zi [z, q]; add to ∆i zl → [z, q]; else let t = g(t1 , . . . , tm ) add to Zi t1 ρ, . . . , tm ρ, [z, q] add to ∆i g(t1 ρ, . . . , tm ρ) → [z, q]; where ρ = {xj → zj | 1 ≤ j ≤ m}1 ; ADDREC(tj , ρ) (1 ≤ j ≤ m). (T) ∀[z, q] ∈ Zi , ∀q ∈ Q add to ∆i q([z, q]) → z. Step4. If Ai−1 = Ai , then let A∗ := Ai and output A∗ , else go to Step2.
Procedure 2 (ADDREC). This procedure takes a term t ∈ T (F, V) and a substitution ρ: V → Zi−1 as input, and adds new states and transition rules to Ai so that tσ ⊢∗Ai tρ holds for every substitution σ = {xj → sj ∈ T (Σ) | sj ⊢∗Ai xjρ, 1 ≤ j ≤ m} (see Lemma 4).
ADDREC(t, ρ) =
if t = x then return;
else let t = h(t1, ..., tn);
add h(t1ρ, ..., tnρ) → tρ to ∆i;
add t1ρ, ..., tnρ to Zi;
call ADDREC(tj, ρ) (1 ≤ j ≤ n).
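To make the saturation loop concrete, here is a small executable sketch restricted to the simplest case: LT-TRSs whose rules all have shape (ii) q1(x) → q(g(x)) with g a unary non-marker symbol, i.e., the shape of R1 in Example 1. This is an illustrative reduction of Procedure 1, not a full implementation; the product state [z, q] is encoded as the Python pair (z, q), and like Procedure 1 itself the loop need not terminate in general.

```python
def saturate_unary(delta, trs):
    """Procedure 1 restricted to rewrite rules of shape q1(x) -> q(g(x)).
    `delta`: set of transitions (sym, source_state, target_state), with
             source_state None for constant rules c -> p;
    `trs`:   set of triples (q1, q, g) encoding q1(x) -> q(g(x)).
    Returns the saturated transition set (if the loop terminates)."""
    changed = True
    while changed:
        changed = False
        for q1, q, g in trs:
            # step (S2): combine q1(x) -> q(g(x)) with q1(z') -> z in delta
            for sym, src, dst in list(delta):
                if sym != q1:
                    continue
                new_rules = {
                    (g, src, (dst, q)),     # g(z')     -> [z, q]
                    (q, (dst, q), dst),     # q([z, q]) -> z    (step (T))
                }
                if not new_rules <= delta:
                    delta |= new_rules
                    changed = True
    return delta

# R1 = {q(x) -> q(g(x))} with the automaton c -> pc, q(pc) -> pf:
delta = {("c", None, "pc"), ("q", "pc", "pf")}
result = saturate_unary(delta, {("q", "q", "g")})
# The self-looping rule g([pf, q]) -> [pf, q] is produced, as Example 4 below derives:
assert ("g", ("pf", "q"), ("pf", "q")) in result
```

The fixpoint is reached after two rounds here because the second round only reproduces the pair state (pf, q), which is exactly the folding effect of the product state discussed above.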
Example 3. Let A = (Σ, P, ∆, Pfinal) be a TA where Σ = F ∪ Q, F = {f, g, h, c}, Q = {q1, q2, q}, P = {p1, p′1, p2, pc, pf}, Pfinal = {pf} and ∆ = {c → pc, q1(pc) → p′1, q1(p′1) → p1, q2(pc) → p2, f(p1, p2) → pf}. It can easily be verified that L(A) = {f(q1(q1(c)), q2(c))}. We apply Procedure 1 to A and the LT-TRS R2 of Example 2. For i = 1 of the procedure, f(q1(x1), q2(x2)) → q(g(h(x2), x1)) ∈ R2, f(p1, p2) → pf ∈ ∆0 = ∆, q1(p′1) → p1 ∈ ∆0 and q2(pc) → p2 ∈ ∆0 satisfy condition (S1), and the rules g(h(x2), x1)ρ = g(h(pc), p′1) → [pf, q] and q([pf, q]) → pf are added to ∆1, where ρ = {x1 → p′1, x2 → pc}. Also a rule h(pc) → h(pc) is constructed by ADDREC(h(x2), ρ). Next, q1(x1) → q(h(x1)) ∈ R2 and q1(p′1) → p1 ∈ ∆0 satisfy condition (S2) and a rule h(p′1) → [p1, q] is constructed. The transition rules constructed in Procedure 1 are listed in Table 1. Since no rule is added when i = 2, the procedure halts and we obtain A∗ = A1 as the output. We can verify that L(A∗) = (→∗R2)(L(A)).
¹ For simplicity, for a state z ∈ Zi, we identify the term z with the state z.
Table 1. The transition rules constructed by Procedure 1.

        (S)                         (T)
  A1    g(h(pc), p′1) → [pf, q]     q([pf, q]) → pf
        h(pc) → h(pc)
        h(p′1) → [p1, q]            q([p1, q]) → p1
        h(pc) → [p′1, q]            q([p′1, q]) → p′1
Example 4. Let A = (Σ, P, ∆, Pfinal), Σ = F ∪ Q, F = {c, g}, Q = {q}, P = {pc, pf}, Pfinal = {pf} and ∆ = {c → pc, q(pc) → pf}. Clearly, L(A) = {q(c)}. If we apply Procedure 1 to A and R1 of Example 1, then for i = 1 of the procedure, q(x) → q(g(x)) ∈ R1 and q(pc) → pf ∈ ∆0 are considered and g(pc) → [pf, q] and q([pf, q]) → pf are added. For i = 2, q(x) → q(g(x)) ∈ R1 and q([pf, q]) → pf ∈ ∆1 are considered and g([pf, q]) → [pf, q] is added. Since no rule is added when i = 3, the procedure halts with A∗ = A2. Clearly, L(A∗) = {q(g^n(c)) | n ≥ 0}. Note that the transition rule g([pf, q]) → [pf, q] simulates infinitely many rewrite steps caused by the rule q(x) → q(g(x)). In the methods proposed in [15] and [5] (without approximation), infinitely many states such as g^n(pc) and q(g^n(pc)) (n ≥ 0) are introduced to simulate each rewrite step q(g^{n−1}(c)) →R1 q(g^n(c)) by a different transition g(g^{n−1}(pc)) → g^n(pc), and thus the construction does not halt. Example 5. Let A = (Σ, P, ∆, Pfinal), Σ = F ∪ Q, F = {c, f}, Q = {q}, P = Pfinal = {p}, ∆ = {c → p, f(p) → p, q(p) → p} and R3 = {f(q(x)) → q(f(x))}. R3 is an LT-TRS. Assume that Procedure 1 is executed for A and R3. Since f(q(x)) → q(f(x)) ∈ R3, f(p) → p ∈ ∆0 and q(p) → p ∈ ∆0, the rules f(p) → [p, q] and q([p, q]) → p are added to ∆1. When i = 2, the procedure considers f(q(x)) → q(f(x)) ∈ R3, f(p) → [p, q] ∈ ∆1 and q(p) → p ∈ ∆0 and it adds f(p) → [[p, q], q] and q([[p, q], q]) → [p, q] to ∆2. The procedure repeats a similar construction and does not halt. Note that R3 is not an EPR-TRS since for the recognizable set T1 = {(fq)^n(c) | n ≥ 0}, (→∗R3)(T1) ∩ NFR3 = {q^n(f^n(c)) | n ≥ 0} is not recognizable. In the following, we will prove the soundness (t ⊢∗Ai p ∈ P ⇒ ∃s: s ⊢∗A p and s →∗R t) and completeness (s ⊢∗A p and s →∗R t ⇒ ∃i ≥ 0: t ⊢∗Ai p) of Procedure 1. By the definition of Procedure 1, it is not difficult to prove the following three lemmas.
Note that by Procedure 1, if z ∈ Zi then either z ∈ P, z is of the form [z′, q] for some z′ ∈ Zi−1 and q ∈ Q, or z is of the form tρ for some term t and substitution ρ: V → Zi−1. Lemma 2. If z1 → z ∈ ∆i (i ≥ 1), then z is of the form [z′, q] and z1 is either in P or of the form [z′1, q1].
Lemma 3. Let A = (Σ, P, ∆, Pfinal) be a TA. Assume that every state p ∈ P is reachable. If Procedure 1 is executed for A and an LT-TRS R, then every state z ∈ Zi constructed during the execution of Procedure 1 is reachable.

Lemma 4. For a transition rule g(t1ρ, ..., tmρ) → [z, q] constructed in (S) of Procedure 1 and a Σ-term s, s ⊢∗Ai g(t1ρ, ..., tmρ) ⊢Ai [z, q] if and only if there exists a substitution σ: V → T (Σ) such that s = g(t1, ..., tm)σ and for each x ∈ Var(g(t1, ..., tm)), xσ ⊢∗Ai xρ.

For example, consider the rule g(h(pc), p′1) → [pf, q] constructed in Example 3. For a Σ-term s, s ⊢∗A1 g(h(pc), p′1) ⊢A1 [pf, q] if and only if s can be written as g(h(s2), s1) where s1 and s2 are Σ-terms such that s1 ⊢∗A1 p′1 and s2 ⊢∗A1 pc.

Lemma 5. (Soundness) Let t ∈ T (Σ), p ∈ P, i ≥ 1 and [z, q] ∈ Zi. (A) t ⊢∗Ai p ⇒ there exists a Σ-term s such that s ⊢∗Ai−1 p and s →∗R t. (B) t ⊢∗Ai [z, q] ⇒ there exists a Σ-term s such that s ⊢∗Ai−1 z and s →∗R q(t).

Proof. We prove (A) and (B) simultaneously by induction on the length of the transition sequences t ⊢∗Ai p and t ⊢∗Ai [z, q]. We describe only the proof of (B) due to space limitation. Assume t ⊢∗Ai [z, q] and perform the following case analysis (i)–(iii) according to the rule applied in the last step of the transition sequence t ⊢∗Ai [z, q]. (i) Assume t ⊢∗Ai z′ ⊢Ai [z, q] (z′ ∈ Zi). The rule z′ → [z, q] is constructed in (S). Assume that condition (S1) holds. (The case that (S2) holds can be treated in a similar way.) Then there exist f(q1(x1), ..., qn(xn)) → q(xl) ∈ R with 1 ≤ l ≤ n, f(z1, ..., zn) → z ∈ ∆i−1 (f ∈ F) and qj(z′j) → zj ∈ ∆i−1 (1 ≤ j ≤ n), where z′ = z′l. By Lemma 3, for each 1 ≤ j ≤ n there exists a Σ-term sj such that sj ⊢∗Ai−1 z′j. We further consider two subcases, namely (i-a) z′ ∈ P and (i-b) z′ = [z′′, q′] (z′′ ∈ Zi, q′ ∈ Q), by Lemma 2. (i-a) By the inductive hypothesis (A), there exists a Σ-term s′ such that s′ ⊢∗Ai−1 z′ and s′ →∗R t. Let s = f(q1(s1), ..., ql(s′), ..., qn(sn)). Then s ⊢∗Ai−1 f(q1(z′1), ..., ql(z′), ..., qn(z′n)) ⊢∗Ai−1 f(z1, ..., zn) ⊢Ai−1 z and s →R q(s′) →∗R q(t). (i-b) There exists a Σ-term s′ such that s′ ⊢∗Ai−1 z′′ and s′ →∗R q′(t) by the inductive hypothesis (B). Since ql([z′′, q′]) → zl ∈ ∆i−1 by the assumption, [z′′, q′] = [zl, ql]. Let s = f(q1(s1), ..., s′, ..., qn(sn)). Then s ⊢∗Ai−1 f(q1(z′1), ..., zl, ..., qn(z′n)) ⊢∗Ai−1 f(z1, ..., zn) ⊢Ai−1 z and s →∗R f(q1(s1), ..., ql(t), ..., qn(sn)) →R q(t). In either case (i-a) or (i-b), s ⊢∗Ai−1 z and s →∗R q(t), and thus the lemma holds. (ii) Assume t = q′(t′) ⊢∗Ai q′(z′) ⊢Ai [z, q] (q′ ∈ Q, z′ ∈ Zi). Since the rule q′(z′) → [z, q] applied in the last step of the transition sequence is constructed in (T), z′ = [[z, q], q′] ∈ Zi. This implies [z, q] ∈ Zi−1 and q([z, q]) → z ∈ ∆i−1. By the inductive hypothesis (B) applied to t′ ⊢∗Ai [[z, q], q′], there exists a Σ-term s′ such that s′ ⊢∗Ai−1 [z, q] and s′ →∗R q′(t′) (= t). Let s = q(s′); then s ⊢∗Ai−1 q([z, q]) ⊢Ai−1 z and s →∗R q(q′(t′)) = q(t). Thus, the lemma holds.
(iii) Assume t = g(t1, ..., tm) ⊢kAi g(z1, ..., zm) ⊢Ai [z, q] (g ∈ F, z1, ..., zm ∈ Zi). This is the most difficult case. The rule g(z1, ..., zm) → [z, q] ∈ ∆i applied in the last step is constructed in (S). Assume that the rule is constructed by (S1). (The lemma can be shown in a similar way when the rule is constructed by (S2).) There exist

f(q1(x1), ..., qn(xn)) → q(g(ξ1, ..., ξm)) ∈ R (g ∈ F, ξj ∈ T (F, {x1, ..., xn})),  (4.5)
f(z1, ..., zn) → z ∈ ∆i−1,  (4.6)
qj(z′j) → zj ∈ ∆i−1 (1 ≤ j ≤ n).  (4.7)

By Lemma 4, the assumed transition sequence can be written as

t = g(t1, ..., tm) = g(ξ1, ..., ξm)σ ⊢∗Ai g(ξ1ρ, ..., ξmρ) = g(z1, ..., zm) ⊢Ai [z, q]  (4.8)

where

σ = {xj → wj | 1 ≤ j ≤ n}, ρ = {xj → z′j | 1 ≤ j ≤ n},  (4.9)
wj ⊢kjAi z′j with kj ≤ k (1 ≤ j ≤ n).  (4.10)

Note that for j (1 ≤ j ≤ n) such that xj ∉ Var(g(ξ1, ..., ξm)), we use Lemma 3 to guarantee the existence of the term wj. There are two cases to consider. (iii-a) If z′j ∈ P then by (4.10) and the inductive hypothesis (A), there exists a Σ-term uj such that uj ⊢∗Ai−1 z′j and uj →∗R wj. Let sj = qj(uj); then sj ⊢∗Ai−1 qj(z′j) ⊢Ai−1 zj by (4.7) and sj →∗R qj(wj). (iii-b) If z′j = [zj, qj] then by the inductive hypothesis (B) applied to wj ⊢kjAi [zj, qj], there exists a Σ-term sj such that sj ⊢∗Ai−1 zj and sj →∗R qj(wj). In either case (iii-a) or (iii-b), sj ⊢∗Ai−1 zj and sj →∗R qj(wj). Let s = f(s1, ..., sn). Then s ⊢∗Ai−1 f(z1, ..., zn) ⊢Ai−1 z by (4.6), and s →∗R f(q1(w1), ..., qn(wn)) →R q(g(ξ1, ..., ξm))σ = q(t) by (4.5), (4.8) and (4.9). Therefore, the lemma holds.

Lemma 6. (Completeness) s →∗R t and s ⊢∗A0 p ∈ P ⇒ there exists an integer i ≥ 0 such that t ⊢∗Ai p.

Proof. Assume that s →∗R t and s ⊢∗A0 p. The lemma is shown by induction on the number of rewrite steps in s →∗R t. If s = t then the lemma holds trivially. Assume s →∗R t′ →R t. By the inductive hypothesis, there exists i ≥ 0 such that t′ ⊢∗Ai p. For the rewrite step t′ →R t, the applied rewrite rule has the form of either (i) f(q1(x1), ..., qn(xn)) → q(u), or (ii) q1(x1) → q(u). In either case (i) or (ii), t can be written as t = t′[q(uσ)]o where o is the rewrite position and σ is a substitution. Assume u = g(u1, ..., um) and
uσ = g(t1, ..., tm). (The case when u is a variable is easier and omitted.) Consider case (i). Since t′ ⊢∗Ai p and t′|o is an instance of f(q1(x1), ..., qn(xn)), there exists a transition rule f(z1, ..., zn) → z0 ∈ ∆i and t′ ⊢∗Ai p can be written as t′ = t′[f(q1(x1σ), ..., qn(xnσ))]o ⊢∗Ai t′[f(q1(z′1), ..., qn(z′n))]o ⊢∗Ai t′[f(z1, ..., zn)]o ⊢Ai t′[z0]o ⊢∗Ai p, where f(q1(z′1), ..., qn(z′n)) ⊢∗Ai f(z1, ..., zn) includes no ε-transition. Since qj(z′j) → zj ∈ ∆i (1 ≤ j ≤ n), (S1) is satisfied and a transition rule g(u1ρ, ..., umρ) → [z0, q] is constructed where ρ = {xj → z′j | 1 ≤ j ≤ n}. Since xjσ ⊢∗Ai z′j, by Lemma 4, g(t1, ..., tm) ⊢∗Ai+1 [z0, q]. Case (ii) can be treated in a similar way to case (i): (S2) is satisfied and transition rules which enable g(t1, ..., tm) ⊢∗Ai+1 [z0, q] are constructed. In either case, q([z0, q]) → z0 is also added in (T), and we can see that t = t′[q(g(t1, ..., tm))]o ⊢∗Ai+1 t′[q([z0, q])]o ⊢Ai+1 t′[z0]o ⊢∗Ai p, and the lemma holds.
Lemma 7. (Partial Correctness) Let A = (Σ, P, ∆, Pfinal) be a TA and R be an LT-TRS. Assume that for input A and R, Procedure 1 constructs a series of tree automata A0, A1, A2, .... For any term t ∈ T (Σ) and state p ∈ P, there exists a term s ∈ T (Σ) such that s ⊢∗A p and s →∗R t if and only if there exists i ≥ 0 such that t ⊢∗Ai p. Proof. (⇒) By Lemma 6. (⇐) By induction on i and Lemma 5(A).
Lemma 8. If Procedure 1 halts for a TA A and an LT-TRS R, then L(A∗) = (→∗R)(L(A)) holds for the output A∗ of the procedure. Proof. (⊆) Assume t ∈ L(A∗). Since A∗ = Ai for some i ≥ 0, there exists a final state pf such that t ⊢∗Ai pf. By Lemma 7, there exists a Σ-term s such that s ⊢∗A pf and s →∗R t. Therefore, t ∈ (→∗R)(L(A)). (⊇) L(A∗) ⊇ (→∗R)(L(A)) can be shown in a similar way.
5 Recognizability Preserving Property
In this section, two sufficient conditions for Procedure 1 to halt are proposed. One condition is that the sets of non-marker function symbols occurring in the left-hand sides and the right-hand sides of rewrite rules are disjoint. The other condition is one which in effect restricts the class of recognizable sets. An LT-TRS R which satisfies the former condition effectively preserves recognizability. Although the latter condition does not directly give a subclass of LT-TRSs which are EPR, we can show that some properties of LT-TRSs are decidable using the latter condition.
5.1 I/O-Separated LT-TRS
An LT-TRS R is I/O-separated if R satisfies the following condition. Condition 1 For a signature Σ = F ∪ Q, F is further divided as F = FI ∪ FO , FI ∩ FO = ∅. A function symbol in FI (respectively, FO ) is called an input symbol (respectively, output symbol). For each rewrite rule f (q1 (x1 ), . . . , qn (xn )) → q(t), or q1 (x1 ) → q(t) in R, f ∈ FI and t ∈ T (FO , {x1 , . . . , xn }).
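Condition 1 can be checked mechanically: an LT-TRS admits an FI/FO split exactly when the non-marker symbols of the left-hand sides are disjoint from those of the right-hand sides (the disjointness condition stated at the start of this section). A small illustrative sketch, again encoding terms as nested tuples and variables as strings:

```python
def nonmarker_symbols(term, Q):
    """Non-marker function symbols occurring in `term` (variables are strings)."""
    if isinstance(term, str):
        return set()
    head = set() if term[0] in Q else {term[0]}
    return head.union(*(nonmarker_symbols(a, Q) for a in term[1:]))

def io_separable(rules, Q):
    """True iff lhs and rhs non-marker symbols are disjoint,
    i.e. the FI/FO partition of Condition 1 exists."""
    lhs = set().union(*(nonmarker_symbols(l, Q) for l, _ in rules))
    rhs = set().union(*(nonmarker_symbols(r, Q) for _, r in rules))
    return lhs.isdisjoint(rhs)

Q = {"q", "q1", "q2"}
R1 = [(("q", "x"), ("q", ("g", "x")))]           # Example 1: I/O-separated
R3 = [(("f", ("q", "x")), ("q", ("f", "x")))]    # Example 5: f occurs on both sides
print(io_separable(R1, Q), io_separable(R3, Q))  # True False
```

When the check succeeds, taking FI as the left-hand-side symbols (plus any unused symbols) and FO as the rest yields a witnessing partition.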
Lemma 9. Let R be an I/O-separated LT-TRS over Σ = FI ∪ FO ∪ Q. If Procedure 1 is executed for a TA A and R, then it always halts.

Proof. Consider the TAs Ai = (Σ, Zi, ∆i, Pfinal) constructed in Procedure 1 for a TA A and an I/O-separated LT-TRS R. We can easily prove, by induction on i, the claim that if a state [z, q] is added to Zi in (S) then z ∈ P. Now we show the lemma. Every rule g(···) → z constructed in (S) or ADDREC satisfies g ∈ FO, and hence the number of transition rules f(z1, ..., zn) → z ∈ ∆i−1 (f ∈ FI) which satisfy condition (S1) is finite. The number of transition rules qj(z′j) → zj ∈ ∆i−1 satisfying (S1) or (S2) is also finite since z′j ∈ P or z′j = [zj, qj] with zj ∈ P by the above claim. Hence, Procedure 1 always halts.

R1 in Example 1 and R2 in Example 2 are both I/O-separated LT-TRSs.

Theorem 3. Every I/O-separated LT-TRS effectively preserves recognizability.

A bottom-up tree transducer [6] is a well-known computational model in the theory of tree languages. For a linear bottom-up tree transducer M, if we consider the set of states of M as the set of markers, M corresponds to an I/O-separated LT-TRS. Hence, the following known property of tree transducers is obtained as a corollary.

Corollary 1. [6] Every linear bottom-up tree transducer effectively preserves recognizability.
5.2 Marker-Bounded Sets
Let Σ′ ⊆ Σ be a subset of function symbols. Consider the tree representation of a term t and let depth_Σ′(t) denote the maximum number of occurrences of function symbols in Σ′ on a single path from the root to a leaf in t. That is, depth_Σ′(t) is defined as:
depth_Σ′(g(t1, ..., tn)) = max{depth_Σ′(ti) | 1 ≤ i ≤ n} + 1 if g ∈ Σ′,
depth_Σ′(g(t1, ..., tn)) = max{depth_Σ′(ti) | 1 ≤ i ≤ n} if g ∉ Σ′.
For example, for Σ = {f, g, h, c} and Σ′ = {f, g}, we have depth_Σ′(f(g(c), g(h(g(c))))) = 3. For a signature Σ = F ∪ Q, a set T ⊆ T (Σ) is marker-bounded if the following condition holds: Condition 2 There exists k ≥ 0 such that depth_Q(t) ≤ k for each t ∈ T.
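The depth measure is directly computable; the following sketch (nested-tuple term encoding, as in the earlier sketches) reproduces the example above with Σ′ = {f, g}:

```python
def depth(term, special):
    """depth_{Sigma'}(t): the maximum number of symbols from `special`
    occurring on a single root-to-leaf path of the term."""
    sym, *args = term
    below = max((depth(a, special) for a in args), default=0)
    return below + 1 if sym in special else below

t = ("f", ("g", ("c",)), ("g", ("h", ("g", ("c",)))))
print(depth(t, {"f", "g"}))   # 3
```

Condition 2 then asks for a single bound k with depth_Q(t) ≤ k uniformly over all t ∈ T.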
Lemma 10. Let R be an LT-TRS over Σ = F ∪ Q. If t →∗R t′ then depth_Q(t) ≤ depth_Q(t′). If R contains no rewrite rule whose left-hand side is a constant, then depth_Q(t) = depth_Q(t′). Proof. By the form of the rewrite rules of an LT-TRS.
Definition 2. (deg) For each state z ∈ Zi, let deg(z) denote the number of nestings in z, defined as follows: deg(p) = 0 (p ∈ P), deg([z, q]) = deg(z) + 1 (z ∈ Zi, q ∈ Q), and deg(f(t1, ..., tn)) = max{deg(tj) | 1 ≤ j ≤ n}, where max ∅ = 0 for the empty set ∅. By definition, deg(f(t1, ..., tn)) = max{deg([z, q]) | [z, q] occurs in f(t1, ..., tn)}. The following lemma is not difficult to prove. Lemma 11. (i) For q(z1) → z ∈ ∆i (q ∈ Q), deg(z1) ≤ deg(z) + 1. (ii) For f(z1, ..., zn) → z ∈ ∆i (f ∈ F), deg(zj) ≤ deg(z) (1 ≤ j ≤ n). (iii) For z1 → z ∈ ∆i, deg(z1) ≤ deg(z).
Let A = (Σ, P, ∆, Pfinal) be a TA. A state p ∈ P is useful if there exist a Σ-term t, a position o ∈ Pos(t) and a final state pf ∈ Pfinal such that t ⊢∗A t[p]o ⊢∗A pf. It is known that for a given TA A, we can construct a TA A′ which satisfies L(A′) = L(A), has no ε-rule and has only useful states.

Lemma 12. Procedure 1 always halts for a TA A and an LT-TRS R which satisfy the following conditions. (1) L(A) is marker-bounded. (2) R contains no rewrite rule whose left-hand side is a constant.

Proof. Let A and R be a TA and an LT-TRS which satisfy the conditions of the lemma. Without loss of generality, assume that A = (Σ, P, ∆, Pfinal) has no ε-rule and has only useful states. The proof is by contradiction. Assume that Procedure 1 does not halt for A and R. For an arbitrary constant l, the number of states z0 with deg(z0) < l and the number of rules which contain only such states are both finite. Since Procedure 1 does not halt, for every l there exist an integer i ≥ 0 and a state z0 ∈ Zi with deg(z0) ≥ l. In particular, let k be the constant in Condition 2 for L(A); then there exist an integer i ≥ 0 and a state z0 ∈ Zi with deg(z0) = k′ ≥ k + 1. Note that k′ ≤ i by the definition of deg(z0) and the
construction in Procedure 1. By Lemma 3, there exists a Σ-term s0 such that s0 ⊢∗Ai z0. Since deg(z0) ≥ 1, z0 can be written as z0 = ···[z1, q1]··· (= ξ, including the case that z0 = [z1, q1]) and deg(z0) = deg([z1, q1]). The state z0 = ξ is introduced in (S) of Procedure 1 or in ADDREC when the loop counter of the procedure is i′ ≤ i. Hence there exist f(q11(x1), ..., q1n(xn)) → q1(t) ∈ R, f(z11, ..., z1n) → z1 ∈ ∆i′−1 and q1j(z′1j) → z1j ∈ ∆i′−1 (1 ≤ j ≤ n), or there exist q11(x1) → q1(t) ∈ R and q11(z′11) → z1 ∈ ∆i′−1. Also, z0 = [z1, q1] or z0 = ξ = t′ρ for some subterm t′ of t, where ρ = {xj → z′1j | 1 ≤ j ≤ n}. If z0 = [z1, q1] then clearly deg(z1) = deg(z0) − 1 = k′ − 1. Suppose z0 = t′ρ. Since the constructed rule is z′1l → [z1, q1] or g(t1ρ, ..., tnρ) → [z1, q1] where t = g(t1, ..., tn), we have deg(z0) ≤ deg([z1, q1]) by Lemma 11. Hence, deg(z1) ≥ deg(z0) − 1 = k′ − 1. In either case, deg(z1) ≥ k′ − 1. Since s0 ⊢∗Ai z0 = ξ and ξ is a subterm of tρ, there exists a Σ-term t0 such that t0 ⊢∗Ai [z1, q1] and s0 is a subterm of t0. By Lemma 5(B), there exists a Σ-term s1 such that s1 →∗R q1(t0) and s1 ⊢∗Ai−1 z1. Summarizing, s1 ⊢∗Ai−1 z1, s1 →∗R q1(t0), deg(z1) ≥ k′ − 1, and s0 is a subterm of t0. Repeating the above argument, we can see that there exist states zj ∈ Zi−j (1 ≤ j ≤ i), markers qj ∈ Q (1 ≤ j ≤ i), and Σ-terms sj (0 ≤ j ≤ i), tj (0 ≤ j ≤ i − 1) such that

sj ⊢∗Ai−j zj, sj →∗R qj(tj−1), deg(zj) ≥ k′ − j, and sj−1 is a subterm of tj−1 (1 ≤ j ≤ i).  (5.1)
Since sj →∗R qj(tj−1) (1 ≤ j ≤ i), depth_Q(sj) = depth_Q(tj−1) + 1 by Lemma 10. Since sj is a subterm of tj (0 ≤ j ≤ i − 1), depth_Q(tj) ≥ depth_Q(sj). Hence, depth_Q(si) ≥ depth_Q(s0) + i ≥ i ≥ k′. Since zi is useful, there exist a Σ-term t′, a position o ∈ Pos(t′) and a final state pf ∈ Pfinal such that t′ ⊢∗A t′[zi]o ⊢∗A pf.
(5.2)
Let t = t′[si]o; then t ⊢∗A t′[zi]o ⊢∗A pf ∈ Pfinal by (5.1) and (5.2). Thus, t ∈ L(A) holds. Furthermore, depth_Q(t) ≥ depth_Q(si) ≥ k′ ≥ k + 1. This contradicts Condition 2. Therefore, Procedure 1 halts. Theorem 4. For any TA A and LT-TRS R which satisfy conditions (1) and (2) of Lemma 12, (→∗R)(L(A)) is recognizable. Proof. By Lemmas 8 and 12.
Note that R1, R2, R3 in the examples satisfy condition (2) of Lemma 12. As mentioned in Section 2.3, a TRS in Réty's subclass [13] is C-EPR. Réty's subclass and the subclass of LT-TRSs satisfying condition (2) of Lemma 12 are incomparable. In fact, any non-left-linear TRS in Réty's subclass is not an LT-TRS. On the other hand, {f(q(x)) → q(f(f(x)))} is an LT-TRS satisfying condition (2) but does not belong to Réty's subclass. Also the class of marker-bounded sets and C are incomparable. For example, consider R3 in Example 5. L(A) = {f(q^n(c)) | n ≥ 0} does not satisfy condition (1), but R3 belongs to Réty's subclass and L(A) belongs to C.
Corollary 2. For a finite set T of ground terms and an LT-TRS R which contains no rewrite rule whose left-hand side is a constant, (→∗R)(T) is recognizable. Corollary 3. For an LT-TRS R which contains no rewrite rule whose left-hand side is a constant, reachability and joinability are decidable. Proof. The reachability problem is to decide, for a given TRS R and Σ-terms s and t, whether s →∗R t holds or not. Obviously s →∗R t if and only if t ∈ (→∗R)({s}), and the latter condition is decidable by Lemma 1 and Corollary 2. Decidability of joinability follows by noting that ∃w: s →∗R w and t →∗R w if and only if (→∗R)({s}) ∩ (→∗R)({t}) ≠ ∅.
6 Conclusion
In this paper, a new subclass of TRSs called LT-TRSs is defined and a sufficient condition for an LT-TRS to effectively preserve recognizability is provided. The subclass of LT-TRSs satisfying the condition contains simple EPR-TRSs which do not belong to any of the known decidable subclasses of EPR-TRSs. Extending the proposed class is future work. It is not difficult to extend Procedure 1 so that a ground term can occur at an argument position of the left-hand side of a rewrite rule. For example, assume that f(z1, z2) → z ∈ ∆i−1 and f(q1(x1), d(c)) → q(g(x1)) (c, d, f, g ∈ F, q1, q ∈ Q) is a rewrite rule. If q1(z′1) → z1 ∈ ∆i−1 and d(c) ⊢∗Ai−1 z2, then construct a transition rule g(z′1) → [z, q]. Acknowledgments. The authors would like to thank Dr. Hitoshi Ohsaki for invaluable discussions. They would also like to thank the anonymous referees for their careful reading of the paper and useful comments.
References
1. Baader, F. and Nipkow, T.: Term Rewriting and All That, Cambridge University Press, 1998.
2. Brainerd, W.S.: "Tree generating regular systems," Information and Control, 14, pp.217–231, 1969.
3. Coquidé, J.L., Dauchet, M., Gilleron, R. and Vágvölgyi, S.: "Bottom-up tree pushdown automata: classification and connection with rewrite systems," Theoretical Computer Science, 127, pp.69–98, 1994.
4. Durand, I. and Middeldorp, A.: "Decidable call by need computations in term rewriting (extended abstract)," Proc. of CADE-14, North Queensland, Australia, LNAI 1249, pp.4–18, 1997.
5. Genet, T.: "Decidable approximations of sets of descendants and sets of normal forms," Proc. of RTA98, Tsukuba, Japan, LNCS 1379, pp.151–165, 1998.
6. Gécseg, F. and Steinby, M.: Tree Automata, Akadémiai Kiadó, 1984.
7. Gilleron, R.: "Decision problems for term rewriting systems and recognizable tree languages," Proc. of STACS'91, Hamburg, Germany, LNCS 480, pp.148–159, 1991.
8. Gilleron, R. and Tison, S.: "Regular tree languages and rewrite systems," Fundamenta Informaticae, 24, pp.157–175, 1995.
9. Gyenizse, P. and Vágvölgyi, S.: "Linear generalized semi-monadic rewrite systems effectively preserve recognizability," Theoretical Computer Science, 194, pp.87–122, 1998.
10. Jacquemard, F.: "Decidable approximations of term rewriting systems," Proc. of RTA96, New Brunswick, NJ, LNCS 1103, pp.362–376, 1996.
11. Middeldorp, A.: "Approximating dependency graphs using tree automata techniques," Proc. of Int'l Joint Conf. on Automated Reasoning, LNAI 2083, pp.593–610, 2001.
12. Nagaya, T. and Toyama, Y.: "Decidability for left-linear growing term rewriting systems," Proc. of RTA99, Trento, Italy, LNCS 1631, pp.256–270, 1999.
13. Réty, P.: "Regular sets of descendants for constructor-based rewrite systems," Proc. of LPAR'99, Tbilisi, Georgia, LNCS 1705, pp.148–160, 1999.
14. Salomaa, K.: "Deterministic tree pushdown automata and monadic tree rewriting systems," J. Comput. System Sci., 37, pp.367–394, 1988.
15. Takai, T., Kaji, Y. and Seki, H.: "Right-linear finite path overlapping term rewriting systems effectively preserve recognizability," Proc. of RTA2000, Norwich, U.K., LNCS 1833, pp.246–260, 2000.
Decidability and Closure Properties of Equational Tree Languages Hitoshi Ohsaki and Toshinori Takai National Institute of Advanced Industrial Science and Technology Nakoji 3–11–46, Amagasaki 661–0974, Japan {ohsaki, takai}@ni.aist.go.jp
Abstract. Equational tree automata provide a powerful tree language framework that facilitates recognizing congruence closures of tree languages. In this paper we show that the emptiness problem for AC-tree automata and the intersection-emptiness problem for regular AC-tree automata, each of which was open in our previous work [20], are decidable, by a straightforward reduction to the reachability problem for ground AC-term rewriting. The newly obtained results generalize the decidability of the so-called reachable property problem of Mayr and Rusinowitch [17]. We then discuss complexity issues of AC-tree automata. Moreover, in order to solve some other questions about regular A- and AC-tree automata, we recall the basic connection between word languages and tree languages.
1 Introduction
Tree automata theory has been studied with much attention as the basis of computational models dealing with terms. Recently, tree automata techniques have been applied in a particular domain for automatically solving security problems of cryptographic protocols (network protocols with some cryptographic primitives) [7,10,19]. This is a natural consequence of the background of tree automata, in which complexity issues, especially on decidability, have been investigated with considerable effort. In fact, applications based on tree automata theory, e.g. for program and system verification [9,22], depend deeply upon decidability results and closure properties. Regular tree languages, the counterpart of regular word languages, are known to be well-behaved, in the sense that they are closed under boolean operations and their decision problems are computable in many cases. Moreover, in term rewriting there are some useful subclasses in which rewriting closures preserve regularity [24,25]. On the other hand, questions about non-regular tree languages are considered to be troublesome. For instance, because term algebras modulo equational theories are not handled by regular tree automata (e.g., the AC-congruence closure of a regular tree language is not regular [3]), the standard tree automata technique cannot be applied for modeling cryptographic primitives like Diffie-Hellman exponentiation and one-time pads [23]. The difficulty of system verification allowing AC-operators is also found in another algebraic approach [18]. S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 114–128, 2002. © Springer-Verlag Berlin Heidelberg 2002
Decidability and Closure Properties of Equational Tree Languages
115
Over the last decade the tree automata framework has been generalized in several directions, such as tree automata with constraints (e.g. [1,13]), tree set automata [14] and timed tree automata [12]. Each of these extensions covers a class wider than the regular tree languages, while keeping some of the decidability results and closure properties. Equational tree automata, proposed in our previous paper [20], are a recent extension with which congruence closures of recognizable tree languages become recognizable. We also proved that in certain useful cases, recognizable equational tree languages are closed under union and intersection. However, several important questions remained open in that work. The goal of this paper is to answer these open questions. More precisely, we show the following results: (1) decidability of the emptiness problem for AC-tree automata, (2) decidability of the intersection-emptiness, subset and universality problems for regular A- and regular AC-tree automata, and (3) closure properties, under intersection and complement, of A-regular and AC-regular tree languages. Moreover, we show the computability of equational rewrite descendants. This result gives rise to the decidability of the reachable property problem for ground AC-term rewriting.
The rest of the paper is organized as follows. In the remainder of this section we fix our terminology and notation on term rewriting. In Section 2 we recall equational tree automata, and we give a positive answer to problem (1), by generalizing the decidability result of Mayr and Rusinowitch [16]. Using this result we show the computability of ground AC-rewrite descendants. In Section 3 we discuss the complexity of some AC-tree automata problems. Problems (2) and (3) for A-regular tree languages are (negatively) solved in Section 4. We then partially solve the same questions for the AC-regular case in Section 5.
Finally we conclude the paper by summarizing decidability results and closure properties of A-, C- and AC-tree languages.
We assume familiarity with the basics of term rewriting. An equation over the signature F (a finite set of function symbols f, each with arity arity(f)) and a set V of variables is a pair (s, t) of terms s, t ∈ T(F, V), written s ≈ t. An equation l ≈ r is called linear if neither l nor r contains multiple occurrences of the same variable. We say l ≈ r is variable-preserving if the multiset of occurrences of each variable x in l is the same as the multiset of occurrences of x in r. A ground equation is an equation whose left- and right-hand sides do not contain any variable. An equational system (ES for short) E is a set of equations. Let FA, FC be two sets of binary function symbols in F; their intersection is denoted by FAC. The ES consisting of the associativity axioms f(f(x, y), z) ≈ f(x, f(y, z)) for all f ∈ FA is denoted by A(FA). The ES of commutativity axioms f(x, y) ≈ f(y, x) for all f ∈ FC is C(FC). We write AC(FAC) for the union of A(FAC) and C(FAC). When it is unnecessary to be explicit, we simply write A, C and AC. The binary relation →E induced by an ES E is defined as follows: s →E t if s = C[lσ] and t = C[rσ] for some equation l ≈ r ∈ E, context C ∈ C(F, V) and substitution σ. The symmetric closure of →E is denoted by ↔E.
116
H. Ohsaki and T. Takai
In the paper we do not assume r ≈ l ∈ E whenever l ≈ r ∈ E, so in general →E ≠ ↔E. The equivalence relation induced by →E (i.e., the reflexive-transitive closure of ↔E) is written ∼E. A term rewriting system (TRS) is an ES whose equations l ≈ r are called rewrite rules and are written l → r. An equational TRS (ETRS for short) R/E is a combination of a TRS R and an ES E over the same signature F. To emphasize the signature of an ETRS, we may write the pair (F, R/E) of its two components. An ETRS is called ground if the TRS consists of ground rewrite rules; note that E is not necessarily ground. We write s →R/E t if there exist terms s′ and t′ such that s ∼E s′ →R t′ ∼E t. As special cases, if E = A (E = C, E = AC, respectively), R/E is called an A-TRS (C-TRS, AC-TRS). Given an ETRS R/E, a term t reachable from a term s with respect to R/E, i.e. s →∗R/E t, is called a descendant of s. For a set L of terms, the set of descendants of terms in L, i.e. { t | ∃s ∈ L. s →∗R/E t }, is denoted by (→∗R/E)[L].
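The congruence ∼AC used throughout can be decided by normalizing terms: flatten nested applications of an AC symbol (associativity) and sort the flattened argument list (commutativity). A minimal Python sketch under our own term representation (pairs of a symbol and a subterm list; the symbol names are illustrative, not from the paper):

```python
AC_SYMBOLS = {"f"}  # assumed set F_AC of AC function symbols (illustrative)

def canon(t):
    """Canonical form modulo AC: terms are pairs (symbol, [subterms]).
    Nested applications of an AC symbol are flattened (associativity)
    and the flattened argument list is sorted (commutativity)."""
    sym, args = t
    args = [canon(a) for a in args]
    if sym in AC_SYMBOLS:
        flat = []
        for a in args:
            # absorb nested applications of the same AC symbol
            flat.extend(a[1] if a[0] == sym else [a])
        args = sorted(flat, key=repr)
    return (sym, args)

def ac_equal(s, t):
    """Decide s ~AC t by comparing canonical forms."""
    return canon(s) == canon(t)
```

For example, f(a, f(b, c)) ∼AC f(f(c, b), a), since both flatten and sort to the same argument list.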
2
AC-Tree Automata and Emptiness Problem
A tree automaton (TA for short) A = (F, Q, Qfin, ∆) consists of a signature F, a finite set Q of states (special constants with F ∩ Q = ∅), a set Qfin (⊆ Q) of final states and a finite set ∆ of transition rules, each in one of the following forms:

f(p1, . . . , pn) → q    or    f(p1, . . . , pn) → f(q1, . . . , qn)

for some f ∈ F with arity(f) = n and p1, . . . , pn, q, q1, . . . , qn ∈ Q. In the latter form, the root function symbols of the left- and right-hand sides must be the same. An equational tree automaton (ETA for short) A/E is the combination of a TA A and an ES E over the same signature F. We often write the 5-tuple (F, Q, Qfin, ∆, E) for A/E. An ETA A/E is called regular if ∆ consists only of transition rules of the shape f(p1, . . . , pn) → q. We say A/E is an A-TA (associative tree automaton) if E = A. An ETA A/E with E = C is called a C-TA (commutative tree automaton). Likewise, if E = AC, it is called an AC-TA. We write s →A/E t if there exist a transition rule l → r ∈ ∆ and a context C ∈ C(F ∪ Q) such that s ∼E C[l] and t ∼E C[r]. The relation →A/E on T(F ∪ Q) is called the move relation of A/E. The transitive closure and the reflexive-transitive closure of →A/E are denoted by →+A/E and →∗A/E; for a TA A, we simply write →A, →+A and →∗A instead. A state q is called accessible with A/E if there exists a term t ∈ T(F) such that t →∗A/E q. A term t is accepted by A/E if t ∈ T(F) and t →∗A/E q for some q ∈ Qfin. The set L(A/E) consists of the ground terms accepted by A/E. A tree language L over F is a subset of T(F). We can easily observe that the membership problem for an ETA A/E is decidable if E is length-preserving, i.e., if E is variable-preserving and for every l ≈ r ∈ E, the number of function symbols occurring in l equals the number of function symbols in r. We write E(L) for {t | ∃s ∈ L. s ∼E t}; the set E(L) is called the E-congruence closure of L. A tree language L is E-recognizable if there exists A/E such that
L = L(A/E). In certain useful cases, E-recognizable tree languages are closed under union and intersection:

Lemma 1. If E is a variable-preserving ES, the union of E-recognizable tree languages L1 and L2 is E-recognizable. Moreover, if E = A (or E = AC), the intersection of L1 and L2 is E-recognizable.

A tree language L is called E-regular if L is E-recognizable by some regular ETA A/E. If E = ∅, the tree language L is called regular. Every recognizable tree language is regular, i.e. every TA can be transformed into a regular TA with the same expressive power. However, E-recognizable tree languages are not E-regular in general. In the previous paper [20] we studied decidability questions for equational tree automata. In particular we focused on emptiness problems, i.e. the question of whether L(A/E) = ∅. In case E is linear, this question for regular E-tree automata can be reduced to the same question for regular tree automata by the following commutation lemma:

Lemma 2. Every regular ETA A/E with E linear satisfies E(L(A)) = L(A/E).

Since E(L(A)) = ∅ if and only if L(A) = ∅, the following property is a direct consequence of Lemma 2.

Corollary 1 ([20]). The emptiness problem for regular E-tree automata with E linear is decidable. Thus the same problem for regular A-, C- and AC-tree automata is decidable.

In contrast, the non-regular A-case is undecidable (Corollary 1, [20]). The non-regular AC-case remained open in the previous paper. In the following, we give a positive answer to this open question. First of all, we recall a recent result on the reachability problem for ground AC-TRSs, due to Mayr and Rusinowitch [16].

Proposition 1. The reachability problem for ground AC-TRSs, i.e. the question of whether s →∗R/AC t for an arbitrary ground AC-TRS R/AC and terms s, t, is decidable.

Next we state the following property.

Lemma 3. Given two ground AC-TRSs (F, R/AC) and (G, S/AC) with FAC = GAC. If F0 ∩ G0 = ∅, i.e.
F and G do not share any constant symbol, then for all terms s, t ∈ T(F ∪ G), s →∗(R∪S)/AC t if and only if s →∗R/AC s′ and s′ →∗S/AC t for some s′.

Proof. Since the "if" part is trivial, it suffices to show the "only if" part. Suppose u →S/AC v →R/AC w. Then u ∼AC C1[l1] and v ∼AC C1[r1] for some l1 → r1 ∈ S, and v ∼AC C2[l2] and w ∼AC C2[r2] for some l2 → r2 ∈ R. By assumption, r1 is a term in T((F ∪ G) − F0) and l2 is a term in T((F ∪ G) − G0). This yields a context D such that D[r1, l2] ∼AC C1[r1] and D[r1, l2] ∼AC C2[l2], and thus u ∼AC D[l1, l2] →S D[r1, l2] →R D[r1, r2] ∼AC w. Hence also u ∼AC D[l1, l2] →R D[l1, r2] →S D[r1, r2] ∼AC w, i.e. the two steps can be swapped. This implies →∗(R∪S)/AC ⊆ →∗R/AC · →∗S/AC.
In the previous lemma, by viewing an AC-TA (F, Q, Qfin, ∆, AC) as a ground AC-TRS (F ∪ Q, ∆/AC), we can extend Proposition 1 as follows.

Lemma 4. Given a ground AC-TRS (F, R/AC) and tree languages L1, L2 over F. If L1 and L2 are AC-recognizable tree languages, it is decidable whether there exist some s in L1 and t in L2 such that s →∗R/AC t.

Proof. Suppose L1 = L(A/AC) and L2 = L(B/AC), where A = (F, P, Pfin, ∆1) and B = (F, Q, Qfin, ∆2) with P ∩ Q = ∅. We assume without loss of generality that Pfin = {p} and Qfin = {q} for single states p, q. Then

∃s ∈ L1, ∃t ∈ L2. s →∗R/AC t    (A)

if and only if

∃u, v ∈ T(F). u →∗A/AC p ∧ v →∗B/AC q ∧ u →∗R/AC v.    (B)

Moreover, using Lemma 3 twice, (B) holds if and only if

p →∗(R∪A−1∪B)/AC q.    (C)

Due to Proposition 1, (C) is decidable, and so is (A).
Given a term t, a predicate P and a binary relation → on terms, the question of whether there exists some s in (→∗)[{t}] with P(s) is called the reachable property problem. Here P(s) can be replaced by the membership test s ∈ [[P]]. Mayr and Rusinowitch showed in [17] that if P is a so-called state formula, this problem is decidable within their framework of process rewrite systems. In the term rewriting setting, this corresponds to the following result:

Proposition 2. The reachable property problem is decidable for ground AC-TRSs if [[P]] is a regular tree language.

Due to Lemma 4 we know that (→∗R/AC)[L1] ∩ L2 = ∅ is a decidable question, provided R/AC is a ground AC-TRS and L1, L2 are AC-recognizable. Therefore the above proposition is generalized by our equational tree automata framework. One should notice that the decidability also holds if L1 and/or L2 are regular. Note that regular tree languages are not AC-recognizable in general: suppose FAC = {f} and L = {f(a, b)}. The tree language L is regular, but it is not AC-recognizable (and not AC-regular), because every AC-TA A/AC accepting f(a, b) also accepts f(b, a). On the other hand, as easy instances of Lemma 4, we obtain positive answers to our open questions:

Corollary 2. The emptiness problem for AC-TA is decidable.

Proof. Let A/AC be an AC-TA over the signature F. In Lemma 4, take R = ∅, L1 = L(A/AC) and L2 = T(F); then L1 ≠ ∅ if and only if there exist s ∈ L1 and t ∈ L2 such that s →∗R/AC t.

Corollary 3. The intersection-emptiness problem for regular AC-TA is decidable.

Proof. Along the same lines as the previous case, take R = ∅, L1 = L(A/AC) and L2 = L(B/AC), where A/AC and B/AC are regular AC-TA.
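The underlying acceptance relation t →∗A q of a plain regular TA (E = ∅), which these reductions ultimately rely on, can be computed bottom-up. A Python sketch, under our own encoding of a regular transition rule f(p1, . . . , pn) → q as the triple (f, (p1, . . . , pn), q); it also illustrates the example above, since a TA with L(A) = {f(a, b)} accepts f(a, b) but not f(b, a):

```python
def states(t, delta):
    """All states q with t ->*_A q for a regular TA.  A term is a pair
    (symbol, [subterms]); delta is a set of triples (f, (p1,...,pn), q)."""
    sym, args = t
    arg_states = [states(a, delta) for a in args]
    return {q for f, ps, q in delta
            if f == sym and len(ps) == len(args)
            and all(p in s for p, s in zip(ps, arg_states))}

def accepts(t, delta, final):
    """t is accepted iff some final state is reachable from t."""
    return bool(states(t, delta) & final)
```

For instance, with delta = {("a", (), "qa"), ("b", (), "qb"), ("f", ("qa", "qb"), "q")} and final state "q", the term f(a, b) is accepted while f(b, a) is not; an AC-TA with the same rules would accept both.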
Input: A: TA (F, Q, Qfin, ∆); FAC: set of AC-symbols.
Output: Qacc: set of accessible states.
Step 1. Let ∆′ := { l → r ∈ ∆ | root(l) ∈ FAC }, Q0 := ∅, Qacc := ∅, k := 0.
Step 2. Compute Q′ := { q ∈ Q | ∃t ∈ T(F ∪ Qk). t →∗∆−∆′ q }.
Step 3. Compute Qk+1 := Q′ ∪ ⋃f∈FAC aux({ l → r ∈ ∆ | root(l) = f }, f, Q′, Q). Here aux is a call to the auxiliary procedure.
Step 4. If Qk+1 ≠ Qk then increase k by 1 and go to Step 2; otherwise, return Qacc := Qk.
Fig. 1. Procedure: state symbols accessible with an AC-TA
Input: R: ground TRS; f: binary function symbol; Q1, Q2: sets of constants with Q1 ⊆ Q2.
Output: P: set of state symbols { q ∈ Q2 | ∃t ∈ T({f} ∪ Q1). t →∗R/AC(f) q }.
Step 1. Select a fresh constant c. Let S := R ∪ { c → q | q ∈ Q1 } ∪ { c → f(c, c) }.
Step 2. Compute P := { q ∈ Q2 | c →∗S/AC(f) q }.
Step 3. Return P.
Fig. 2. Auxiliary procedure
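With a single AC symbol f, a term in T({f} ∪ Q1 ∪ {c}) is, modulo AC, just a non-empty multiset of constants, so the ground rewriting in Step 2 is multiset rewriting, i.e. a Petri-net reachability instance. The bounded breadth-first search below is only a semi-decision sketch of this view (the actual procedure requires a full Petri-net reachability test, e.g. Mayr's algorithm [15]); the encoding of each ground rule as a pair of Counters is our own:

```python
from collections import Counter

def key(m):
    """Hashable canonical form of a multiset."""
    return tuple(sorted(m.elements()))

def step(m, rules):
    """All multisets reachable from m by one rule application,
    where each rule is a pair (lhs, rhs) of Counters."""
    return [m - lhs + rhs
            for lhs, rhs in rules
            if all(m[x] >= n for x, n in lhs.items())]

def reachable(start, rules, bound):
    """Bounded BFS over multisets of total size <= bound."""
    seen = {key(start)}
    frontier = [start]
    while frontier:
        nxt = []
        for m in frontier:
            for m2 in step(m, rules):
                if sum(m2.values()) <= bound and key(m2) not in seen:
                    seen.add(key(m2))
                    nxt.append(m2)
        frontier = nxt
    return seen
```

For example, taking Q1 = {p} with the rules c → p, c → f(c, c) (token duplication) and a ground rule f(p, p) → q, we get {c} → {c, c} → {p, c} → {p, p} → {q}, so the state q is reported reachable from c.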
An important remark on the above results is that Mayr and Rusinowitch did not actually prove in [16] the general case of Proposition 1, i.e. there is no published proof of the decidability of the reachability problem for ground AC-TRSs with arbitrarily many AC-symbols. In other words, as far as the above proofs depend upon Proposition 1, the newly obtained results (e.g. Lemma 4) are guaranteed only for the single-AC case. So in the next section, by presenting an original procedure, we generalize our results (Corollaries 2 and 3) and estimate the complexity of the emptiness questions.
3
Complexity of AC-Tree Language Problems
Let us introduce a special notation for AC-tree languages. Suppose f is a function symbol in FAC. An f-block of a term t is a non-empty maximal context in t consisting only of f. For notational convenience, if t = C′[C[t1, . . . , tn]] such
that C is an f-block, we write C′[Cf[[t1, . . . , tn]]]. For instance, g(f(a, f(b, c))) is represented as g(Cf[[a, b, c]]).
First we state the basic property of the procedure in Fig. 2. The procedures are presented in the style of Garey and Johnson [5].

Lemma 5. At Step 2 in Fig. 2, for every q ∈ Q2, if c →∗S/AC(f) q, then there exists a term t ∈ T(Q1 ∪ {f}) such that c →∗S/AC(f) t →∗R/AC(f) q.

Proof. Let R1 = {c → f(c, c)} and R2 = {c → q | q ∈ Q1}. Since c does not appear in the right-hand sides of the rewrite rules of the ground TRS R ∪ R2, we can prove that →(R∪R2)/AC(f) · →R1/AC(f) ⊆ →R1/AC(f) · →(R∪R2)/AC(f). For the same reason, →R/AC(f) · →R2/AC(f) ⊆ →R2/AC(f) · →R/AC(f). Thus we obtain a rewrite sequence c →∗R1/AC(f) t1 →∗R2/AC(f) t2 →∗R/AC(f) q. Since the rewrite rules in R are ground and do not contain c, the number of occurrences of c does not change in t2 →∗R/AC(f) q. This implies |t2|c = 0, because q ≠ c. Similarly, since R1 ∪ R2 is ground and consists of function symbols in Q1 ∪ {f, c}, we have t2 ∈ T(Q1 ∪ {f} ∪ {c}). Hence t2 ∈ T(Q1 ∪ {f}).

Next we show soundness of the main procedure.

Lemma 6. For every AC-TA A/AC, a state q is accessible with A/AC if and only if q is in Qacc.

Proof. Let A = (F, Q, Qfin, ∆). By induction on the loop counter k of the procedure in Fig. 1, we show that q ∈ Qk for some k if and only if q is accessible with A/AC. First we prove the "if" part. The base case is trivial, because Q0 = ∅. For the induction step, suppose q ∈ Qk. We distinguish three cases: (1) If q ∈ Qk′ with k′ < k, the lemma holds by the induction hypothesis. (2) If q ∉ Qk′ for any k′ < k but q ∈ Q′ at Step 2, the lemma also holds by the induction hypothesis and the definition of Step 2. (3) If q ∉ Qk′ for any k′ < k but q ∈ Qk at Step 3, then c →∗S/AC(f) q for some f ∈ FAC. By Lemma 5, we have a term t ∈ T(Q′ ∪ {f}) such that t →∗R/AC(f) q, where R = {l → r ∈ ∆ | root(l) = f}.
From the previous cases, since Q′ at Step 2 is a set of accessible states, so is q (as t′ →∗A/AC t for some t′ ∈ T(F)). For the "only if" part, we take a state q ∈ Q such that t →∗A/AC q for some t ∈ T(F). By induction on the number of blocks in t, we show that q is in Qk for some k. If there is no block in t, the lemma holds trivially with k = 0. For the induction step, we assume without loss of generality that t = C′[s1, . . . , si−1, Cf[[t1, . . . , tn]], si+1, . . . , sm], where C′ ∈ C(F − FAC); C′ is possibly the empty context. By assumption, t →∗A/AC C′[p1, . . . , pm] →∗A q and Cf[[t1, . . . , tn]] →∗A/AC Cf[[q1, . . . , qn]] →∗A/AC pi for some p1, . . . , pm, q1, . . . , qn ∈ Q. By the induction hypothesis, qj ∈ Qkj for some kj (1 ≤ j ≤ n). Moreover, pj ∈ Qk′j for some k′j (1 ≤ j ≤ m, j ≠ i). By the definition of the procedure in Fig. 1, we obtain pi ∈ Qkmax+1, where kmax = max{kj | 1 ≤ j ≤ n}. Hence, letting k′max = max({k′j | 1 ≤ j ≤ m and j ≠ i} ∪ {kmax + 1}), we have q ∈ Qk′max or q ∈ Q′ at the (k′max + 1)-th loop.

Theorem 1. The emptiness problem for AC-TA with arbitrarily many AC-symbols is decidable.
The procedure in Fig. 1 needs at most |Q|² × |FAC| calls to the auxiliary procedure, and each call amounts to a reachability test on a Petri net instance. The reachability problem for Petri nets is EXPSPACE-hard [15]. In contrast, the emptiness problem for regular AC-TA can be solved in linear time, due to Lemma 2. By a similar observation, the complexity of the membership problem can also be determined.

Corollary 4. The membership problem for regular AC-TA is NP-complete.

Proof. We first show that the problem is in NP. Let A/AC be a regular AC-TA. By Lemma 2, a term t is accepted by A/AC if and only if there exists a term t′ accepted by A with t′ ∼AC t. Thus the following procedure is sound: (1) guess a term t′ such that t′ ∼AC t; (2) check whether or not t′ ∈ L(A). Since the membership problem for regular TA can be solved in linear time, the above algorithm runs in nondeterministic polynomial time. On the other hand, the membership problem for commutative context-free grammars, which is known to be NP-hard [4], reduces to this problem by taking F = {f} ∪ Σ with FAC = {f}, where Σ is the alphabet of the grammar. Hence the membership problem for regular AC-TA is NP-complete.

Moreover, because AC-recognizable tree languages are closed under intersection (Lemma 1), we can extend the other decidability result.

Theorem 2. The intersection-emptiness problem for regular AC-TA with arbitrarily many AC-symbols is decidable.
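The guess-and-check argument in the proof of Corollary 4 can be made concrete by a (deterministically exponential) brute force: enumerate every term AC-equivalent to t and run the underlying TA on each; the nondeterministic algorithm instead guesses a single such variant. A Python sketch under our own term/rule encoding (a term is a pair (symbol, [subterms]); a rule f(p1, . . . , pn) → q is the triple (f, (p1, . . . , pn), q)):

```python
from itertools import permutations, product

AC_SYMBOLS = {"f"}  # assumed AC symbols (illustrative)

def flatten(sym, t):
    """Leaves of the maximal sym-block rooted at t."""
    if t[0] != sym:
        return [t]
    return [u for a in t[1] for u in flatten(sym, a)]

def brackets(sym, ts):
    """All binary re-associations of the argument list ts under sym."""
    if len(ts) == 1:
        return ts
    return [(sym, [l, r])
            for i in range(1, len(ts))
            for l in brackets(sym, ts[:i])
            for r in brackets(sym, ts[i:])]

def variants(t):
    """All terms AC-equivalent to t (exponentially many in general)."""
    sym, args = t
    if sym not in AC_SYMBOLS:
        return [(sym, list(vs)) for vs in product(*map(variants, args))]
    out = []
    for perm in permutations(flatten(sym, t)):
        for leaves in product(*map(variants, perm)):
            out.extend(brackets(sym, list(leaves)))
    return out

def states(t, delta):
    """All states reachable from t in the underlying regular TA."""
    ss = [states(a, delta) for a in t[1]]
    return {q for f, ps, q in delta
            if f == t[0] and len(ps) == len(ss)
            and all(p in s for p, s in zip(ps, ss))}

def ac_member(t, delta, final):
    """t is in L(A/AC) iff some AC-variant of t is accepted by A."""
    return any(states(v, delta) & final for v in variants(t))
```

With the TA recognizing {f(a, b)}, this test also accepts f(b, a), matching the earlier remark that an AC-TA accepting f(a, b) necessarily accepts f(b, a).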
4
Yet Negative Results on A-Tree Languages
In this section we discuss decidability questions for A-tree languages. In the previous paper we showed that the emptiness problem for A-TA is undecidable in general. Using this result together with the proof argument of Lemma 4, we can prove the following undecidability result.

Proposition 3 ([3]). The reachability problem for ground A-TRSs is undecidable.

Proof. We use the following fact (Corollary 1, [20]): the emptiness problem is undecidable for A-TA. We take an A-TA A/A, assuming without loss of generality that A = (F, Q, {q}, ∆1), as in the proof of Lemma 4. Define the TA B = (F, {p}, {p}, ∆2), where ∆2 = {f(p, . . . , p) → p | f ∈ F}. Obviously, L(B/A) = T(F). Similarly to the proof of Lemma 3, we obtain →A/A · →B−1/A ⊆ →B−1/A · →A/A. This implies that p →∗(A∪B−1)/A q if and only if p →∗B−1/A t →∗A/A q for some t. By the definitions of A and B, t ∈ T(F ∪ {p}) and t ∈ T(F ∪ Q), so t, if it exists, is in T(F). Since it is undecidable whether t →∗A/A q for some t ∈ T(F), it is undecidable whether p →∗(A∪B−1)/A q.
Other problems, such as the subset and universality problems, are also undecidable for A-TA. However, the decidability of these two problems for regular A-TA remained open in [20]. In the following, we settle them by reduction to the corresponding word problems. A grammar G is a 4-tuple (Σ, Q, q0, ∆), whose components are the alphabet Σ, a finite set Q of state symbols with Σ ∩ Q = ∅, an initial state q0 (∈ Q) and a finite set ∆ of string rewrite rules (called production rules) of the form l → r with l, r ∈ (Σ ∪ Q)+. A grammar is called context-free if l ∈ Q for all rules l → r ∈ ∆. A regular grammar is a context-free grammar whose right-hand sides of rules are in (Σ ∪ Q)Σ∗. A word generated by G is an element w ∈ Σ∗ such that q0 →∗G w. Note that G does not generate the empty word ε. The language generated by G, denoted L(G), is the set of generated words. A relationship between word languages and tree languages can be found in the literature, e.g. in [2,6]. The study of leaf languages is one example:

Proposition 4 ([6]). The following two statements hold:
1. For every CFG G there exists a regular TA A such that L(G) = leaf(L(A)).
2. For every regular TA A there exists a CFG G such that leaf(L(A)) = L(G).

The leaf-word leaf(t) of a term t over the signature F is a word over the alphabet F0 (the set of constants in F), defined inductively as follows: leaf(t) = t if t is a constant; leaf(t) = leaf(t1) · · · leaf(tn) if t = f(t1, . . . , tn) and arity(f) ≥ 1. We use the same notation leaf for the extension to tree languages, i.e. for a tree language L over F, leaf(L) = {leaf(t) | t ∈ L}. We say leaf(L) is the leaf-language of L. One should notice that the following statement is not true: if leaf(L) is context-free then L is a regular tree language. For instance, consider G1 = ({a, b}, {α, β, γ}, γ, ∆1), where

∆1: γ → aβ    γ → bα
    α → a    α → aγ    α → bαα
    β → b    β → bγ    β → aββ
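The language generated by G1 can be enumerated for small word lengths by breadth-first search over sentential forms; since every symbol of a form derives at least one terminal, forms longer than the length bound can be pruned, and every word produced has equally many a's and b's. A Python sketch (the dictionary encoding of G1 is our own; leftmost expansion suffices for a CFG):

```python
from collections import deque

RULES = {  # the production rules of G1
    "γ": ["aβ", "bα"],
    "α": ["a", "aγ", "bαα"],
    "β": ["b", "bγ", "aββ"],
}

def generate(start, max_len):
    """Terminal words of length <= max_len derivable from `start`.
    Productions never shorten a sentential form, so forms longer than
    max_len can never yield a word within the bound and are pruned."""
    words, seen, queue = set(), {start}, deque([start])
    while queue:
        s = queue.popleft()
        nts = [i for i, ch in enumerate(s) if ch in RULES]
        if not nts:
            words.add(s)
            continue
        i = nts[0]  # expand the leftmost nonterminal
        for rhs in RULES[s[i]]:
            t = s[:i] + rhs + s[i + 1:]
            if len(t) <= max_len and t not in seen:
                seen.add(t)
                queue.append(t)
    return words
```

For example, generate("γ", 4) yields exactly the eight non-empty balanced words of length at most 4.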
By induction on the length of words, one can prove that for every non-empty word w over the alphabet {a, b}: α →∗G1 w if and only if |w|a = |w|b + 1; β →∗G1 w if and only if |w|b = |w|a + 1; and γ →∗G1 w if and only if |w|a = |w|b. This implies L(G1) = {w | |w|a = |w|b}, i.e. G1 generates the set of words w in which the number of occurrences of a equals the number of occurrences of b. On the other hand, consider a tree language over a signature F with F0 = {a, b}. If F contains at least one function symbol f with arity(f) ≥ 2, the tree language L = {t | |t|a = |t|b} is not a regular tree language, by the pumping lemma. Now we discuss the relationship between word languages and equational tree languages. The following properties are obtained by easy observations about the leaf operation.
Lemma 7. Given an arbitrary signature F, for all terms s, t ∈ T(F) and every tree language L over F:
1. leaf(s) = leaf(t) if s ∼A t; the converse holds if F = F0 ∪ {f} and FA = {f};
2. leaf(L) = leaf(A(L)).
Suppose F = F0 ∪ {f} and FA = {f}. If L1, L2 are tree languages over F, the following statements hold:
3. leaf(L1 ∪ L2) = leaf(L1) ∪ leaf(L2);
4. leaf(A(L1) ∩ A(L2)) = leaf(L1) ∩ leaf(L2);
5. A(L1) ⊆ A(L2) if and only if leaf(L1) ⊆ leaf(L2).

Proof. For property 1, observe that leaf(C[f(f(t1, t2), t3)]) = w1 leaf(t1) leaf(t2) leaf(t3) w2 = leaf(C[f(t1, f(t2, t3))]) for some w1, w2 ∈ F0∗; this is proved by structural induction on C, and implies that s ∼A t entails leaf(s) = leaf(t). For the converse, define the mapping tree as follows: tree(w) = a if w = a ∈ F0; tree(w) = f(a, tree(w′)) if w = a w′ for some a ∈ F0 and w′ ∈ F0+. It then suffices to show that tree(leaf(u)) ∼A u for all u ∈ T(F), by structural induction on u. Properties 2 and 3 are easy; note that union (∪) distributes over set comprehension, i.e. {leaf(t) | t ∈ L1 ∪ L2} = {leaf(t) | t ∈ L1} ∪ {leaf(t) | t ∈ L2}. Next we show property 4. Suppose w ∈ leaf(A(L1) ∩ A(L2)). By definition, there exists t ∈ A(L1) ∩ A(L2) such that w = leaf(t). Moreover, leaf(t) ∈ leaf(A(L1)) and leaf(t) ∈ leaf(A(L2)), so w ∈ leaf(A(L1)) ∩ leaf(A(L2)), and by property 2, w ∈ leaf(L1) ∩ leaf(L2). For the reverse inclusion, suppose w ∈ leaf(A(L1)) ∩ leaf(A(L2)); note that leaf(Li) = leaf(A(Li)). Then there exist t1 ∈ A(L1) and t2 ∈ A(L2) such that w = leaf(t1) = leaf(t2). By property 1, t1 ∼A t2, and thus t1 ∈ A(L2) (and t2 ∈ A(L1) as well). Hence t1 ∈ A(L1) ∩ A(L2), which implies w ∈ leaf(A(L1) ∩ A(L2)). Finally we show property 5. The "only if" part is trivial. For the converse, suppose t ∈ A(L1).
Then leaf(t) ∈ leaf(L1) (as leaf(t) ∈ leaf(A(L1))), and thus leaf(t) ∈ leaf(L2). By property 1, there exists s ∈ L2 such that s ∼A t. Hence t ∈ A(L2).

One should note that leaf(L1 ∩ L2) ≠ leaf(L1) ∩ leaf(L2) in general. For example, suppose L1 = {f(f(a, b), c)} and L2 = {f(a, f(b, c))}, with F = {f, a, b, c} and FA = {f}. Then leaf(L1 ∩ L2) = ∅, but leaf(L1) ∩ leaf(L2) = {abc}. Furthermore, leaf(L1) ⊆ leaf(L2) does not imply L1 ⊆ L2. We now show an extension of Proposition 4 (1), in which the maximality of tree languages refines the relationship to word languages.

Definition 1. A tree language L over F is maximal for a word language W if for all terms t ∈ T(F), leaf(t) ∈ W if and only if t ∈ L. An ETA A/E is maximal for W if L(A/E) is maximal for W.

Lemma 8. Given a CFG G over the alphabet Σ, there exists an associated regular A-TA AG/A that is maximal for L(G).
Proof. Let G = (Σ, Q, q0, ∆), and suppose ∆ is in Chomsky normal form. We take F = Σ ∪ {f} and FA = {f}. Define the regular A-TA AG/A by AG = (F, Q, {q0}, ∆′), where ∆′ = {f(q1, q2) → q | q → q1q2 ∈ ∆} ∪ {a → q | q → a ∈ ∆}. It is easy to show that for every non-empty word w ∈ Σ+, q0 →∗G w if and only if tree(w) →∗AG/A q0, where tree is the mapping defined in the previous proof. For maximality we use Lemma 7 (1) and the property leaf(tree(w)) = w for every w ∈ Σ+. Then, for every term t ∈ T(Σ ∪ {f}) and word w ∈ Σ+, leaf(t) = w if and only if t ∼A tree(w). Thus, if leaf(t) ∈ L(G), then t ∼A tree(leaf(t)) and tree(leaf(t)) →∗AG/A q0. Hence t ∈ L(AG/A).

Consider the CFG G2 = ({a, b}, {α, α′, β, β′, γ, δ, ξ}, γ, ∆2), where

∆2: γ → δβ    γ → ξα
    α → a    α → δγ    α → ξα′    α′ → αα
    β → b    β → ξγ    β → δβ′    β′ → ββ
    δ → a    ξ → b
Note that G2 is in Chomsky normal form, and L(G2) = {w | |w|a = |w|b}, because G2 is essentially the same as G1. Thus the associated A-TA AG2/A, over the signature F = {f, a, b} with FA = {f}, recognizes a tree language maximal for L(G2), and hence for L(G1). Next we extend Proposition 4 (2) in the same manner.

Lemma 9. Given a regular A-TA A/A over the signature F, there exists an associated CFG GA/A such that L(GA/A) = leaf(L(A/A)).

Proof. By Proposition 4 (2), for the language leaf(L(A)) there exists a CFG G such that L(G) = leaf(L(A)). On the other hand, by Lemma 7 (2), leaf(L(A)) = leaf(A(L(A))). Since L(A/A) = A(L(A)) by Lemma 2, we obtain leaf(L(A)) = leaf(L(A/A)). Letting GA/A = G, we have L(GA/A) = leaf(L(A/A)).

In this case, A/A is not always maximal. For instance, take an A-TA A/A with L(A/A) = {a} over the signature {f, a}. Obviously there exists a CFG GA/A such that L(GA/A) = {a}. But A/A is not maximal for L(GA/A), because f(a) is not in L(A/A).

Corollary 5. The following two statements hold:
1. For every CFG G there exists a regular A-TA A/A such that L(A/A) is a maximal tree language for L(G).
2. For every regular A-TA A/A there exists a CFG G such that leaf(L(A/A)) = L(G). In case the signature is F = F0 ∪ {f} with FA = {f}, L(A/A) is a maximal tree language for L(G).

Proof. An immediate consequence of Lemmata 8 and 9, except for the case F = F0 ∪ {f} and FA = {f} in the second statement. In that particular case, tree(leaf(s)) ∼A s for every s ∈ T(F). Thus, if leaf(t) ∈ L(G), there exists s ∈ L(A/A) such that leaf(s) = leaf(t). Moreover, s ∼A t by Lemma 7 (1). Hence t ∈ L(A/A).

Thus A-regular tree languages inherit the negative results for context-free languages.
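The mappings leaf and tree from the proofs of Lemma 7 and Lemma 8 can be written down directly. A sketch under the assumptions of this section: constants are single characters and f is the single binary A-symbol (the representation is our own):

```python
def leaf(t):
    """Leaf-word of a term (symbol, [subterms]): the left-to-right
    concatenation of its constants."""
    sym, args = t
    return sym if not args else "".join(leaf(a) for a in args)

def tree(w):
    """Right comb over the binary symbol f: tree(a1 a2 ... an) =
    f(a1, f(a2, ..., an)), as in the proof of Lemma 7."""
    if len(w) == 1:
        return (w, [])
    return ("f", [(w[0], []), tree(w[1:])])
```

As the lemma states, leaf(tree(w)) = w for every non-empty word w, and A-equivalent terms such as f(f(a, b), c) and f(a, f(b, c)) share the leaf-word "abc".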
Theorem 3. A-regular tree languages are not closed under intersection or complement.

Proof. Given CFGs G1 and G2 over the alphabet Σ, by Lemma 8 there are regular A-TA A/A and B/A over F, where F = Σ ∪ {f} and FA = {f}, such that L(G1) = leaf(L(A/A)) and L(G2) = leaf(L(B/A)). Suppose, towards a contradiction, that A-regular tree languages are closed under intersection. Then there exists a regular A-TA C/A over F such that L(C/A) = L(A/A) ∩ L(B/A). Moreover, by Lemma 9, there exists a CFG G3 such that L(G3) = leaf(L(C/A)). From Lemma 7 (2) and (4), we obtain L(G3) = leaf(L(A/A)) ∩ leaf(L(B/A)) = L(G1) ∩ L(G2). This contradicts the fact that context-free languages are not closed under intersection. Hence A-regular tree languages are not closed under intersection. Non-closure under complement follows from the previous fact together with closure under union and De Morgan's laws.

Theorem 4. The following questions are undecidable for A-regular tree languages, where A/A and B/A are regular A-TA over the signature F:
– L(A/A) ⊆ L(B/A)? (subset)
– L(A/A) = T(F)? (universality)
– L(A/A) ∩ L(B/A) = ∅? (intersection-emptiness)
Proof. Given CFGs G1 and G2 over the alphabet Σ, by Lemma 8 there exist regular A-TA A/A and B/A over F such that L(G1) = leaf(L(A/A)) and L(G2) = leaf(L(B/A)), where F = Σ ∪ {f} and FA = {f}. By Lemma 7 (2) and (5), L(A/A) ⊆ L(B/A) if and only if leaf(L(A/A)) ⊆ leaf(L(B/A)), because L(A/A) = A(L(A)) and L(B/A) = A(L(B)). Hence the question whether L(A/A) ⊆ L(B/A) is undecidable, because the question whether L(G1) ⊆ L(G2) is undecidable [8]. Similarly, using Lemma 7 we can prove that
– L(A/A) = T(F) if and only if leaf(L(A/A)) = leaf(T(F)),
– L(A/A) ∩ L(B/A) = ∅ if and only if leaf(L(A/A)) ∩ leaf(L(B/A)) = ∅.
Hence the other two questions are also undecidable, because the universality and intersection-emptiness problems for context-free languages are undecidable.
Additionally, the equivalence problem for regular A-TA is undecidable. Take two regular A-TA A/A, B/A. The subset problem L(A/A) ⊆ L(B/A) can be expressed as L(A/A) ∪ L(B/A) = L(B/A). By Lemma 2, we have L(A/A) ∪ L(B/A) = A(L(A)) ∪ A(L(B)) = A(L(A) ∪ L(B)). The A-congruence closure of the regular tree language L(A) ∪ L(B) is A-regular, i.e. there is a regular A-TA C/A such that L(C/A) = A(L(A) ∪ L(B)). This implies that L(A/A) ⊆ L(B/A) if and only if L(C/A) = L(B/A). Since the former question is undecidable, so is the equivalence question L(C/A) = L(B/A).
[Fig. 3 (table, not recoverable in full): decidability of the emptiness, subset, universality and intersection-emptiness problems, and closure under union, intersection and complement, for regular and non-regular C-, A- and AC-tree automata.]
Fig. 3. Some decidability results and closure properties
5
Concluding Remarks
In this paper we have shown decidability results and closure properties of A- and AC-tree languages. The new decidability results (Theorems 1, 2 and 4) and closure properties (Theorem 3) answer the questions left open in our previous paper [20]. Using the following result of Parikh (Theorem 2 in [21]; the same result also appears in [4]), we can give a partial answer to the question of whether AC-regular tree languages are closed under intersection and complement.

Lemma 10. Permutation closures of context-free languages are closed under the boolean operations (union, intersection and complement).

Corollary 6. The class of AC-regular tree languages over a signature F = {f} ∪ F0 with FAC = {f} is closed under the boolean operations.

Proof. We show closure under union and intersection below. Let A/AC, B/AC be regular AC-TA over F. By Proposition 4 (2), there exist context-free grammars G1 and G2 such that leaf(L(A)) = L(G1) and leaf(L(B)) = L(G2). By Lemma 2, we have leaf(L(A/AC)) = leaf(AC(L(A))). Moreover, by the assumption
of F, leaf(AC(L(A))) = perm(leaf(L(A))). Thus leaf(L(A/AC)) = perm(L(G1)), and the same holds for B/AC. By Lemma 10, for each operation ◦ ∈ {∪, ∩} there exists a context-free grammar G◦ such that perm(L(G◦)) = perm(L(G1)) ◦ perm(L(G2)). By Proposition 4 (1), we obtain a TA C◦ such that leaf(L(C◦)) = L(G◦), and thus leaf(L(C◦/AC)) = perm(L(G◦)). Therefore L(C◦/AC) = L(A/AC) ◦ L(B/AC) over F = {f} ∪ F0 with FAC = {f}.
By the above corollary together with the decidability of the emptiness problem for AC-TA (Corollary 2), the subset and universality problems are decidable in this particular case. On the other hand, from a recent study of multiset grammars [11], we know that regular and non-regular AC-tree automata correspond to multiset context-free grammars and multiset monotone grammars, respectively; more precisely, relationships as in Corollary 5 hold for them. This implies that the expressive power of non-regular AC-tree automata strictly subsumes the regular case, which answers another open question of our previous paper. We summarize the decidability results and closure properties of C-, A- and AC-tree languages in Fig. 3. New results are indicated by dotted squares. All the results for C-tree languages are easily obtained, because C-TA are essentially the same as regular TA (Lemma 3, [20]). In the figure, a check mark (✓) means "positive" and a cross (×) means "negative"; a question mark (?) means "open". If the same result holds in both the regular and non-regular cases, it is represented by a single mark in a large column. A positive result proved only in a special case, e.g. F = {f} ∪ F0 with FAC = {f}, is denoted by a check mark in round brackets.

Acknowledgements. The authors would like to thank Ralf Treinen for his continuous help. We also thank the three anonymous referees for their comments and criticism.
References
1. B. Bogaert and S. Tison: Equality and Disequality Constraints on Direct Subterms in Tree Automata, Proc. of 9th STACS, Cachan (France), LNCS 577, pp. 161–171, 1992.
2. H. Comon, M. Dauchet, R. Gilleron, F. Jacquemard, D. Lugiez, S. Tison and M. Tommasi: Tree Automata Techniques and Applications, draft, 1999. Available at http://www.grappa.univ-lille3.fr/tata/
3. A. Deruyver and R. Gilleron: The Reachability Problem for Ground TRS and Some Extensions, Proc. of 14th CAAP, Barcelona (Spain), LNCS 351, pp. 227–243, 1989.
4. J. Esparza: Petri Nets, Commutative Context-Free Grammars, and Basic Parallel Processes, Fundamenta Informaticae 31(1), pp. 13–25, 1997.
5. M.R. Garey and D.S. Johnson: Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman & Company, New York, 1979.
6. F. Gécseg and M. Steinby: Tree Automata, Akadémiai Kiadó, Budapest, 1984.
7. T. Genet and F. Klay: Rewriting for Cryptographic Protocol Verification, Proc. of 17th CADE, Pittsburgh (PA), LNCS 1831, pp. 271–290, 2000.
H. Ohsaki and T. Takai
8. J.E. Hopcroft and J.D. Ullman: Introduction to Automata Theory, Languages, and Computation, Addison-Wesley Publishing Company, 1979.
9. H. Hosoya, J. Vouillon and B.C. Pierce: Regular Expression Types for XML, Proc. of 5th ICFP, Montreal (Canada), SIGPLAN Notices 35(9), pp. 11–22, 2000.
10. Y. Kaji, T. Fujiwara and T. Kasami: Solving a Unification Problem under Constrained Substitutions Using Tree Automata, JSC 23(1), pp. 79–117, 1997.
11. M. Kudlek and V. Mitrana: Normal Forms of Grammars, Finite Automata, Abstract Families, and Closure Properties of Macrosets, Multiset Processing, LNCS 2235, pp. 135–146, 2001.
12. S. La Torre and M. Napoli: Timed Tree Automata with an Application to Temporal Logic, Acta Informatica 38(2), pp. 89–116, 2001.
13. D. Lugiez: A Good Class of Tree Automata and Application to Inductive Theorem Proving, Proc. of 25th ICALP, Aalborg (Denmark), LNCS 1443, pp. 409–420, 1998.
14. D. Lugiez and J.L. Moysset: Tree Automata Help One to Solve Equational Formulae in AC-Theories, JSC 18(4), pp. 297–318, 1994.
15. E.W. Mayr: An Algorithm for the General Petri Net Reachability Problem, SIAM J. Comput. 13(3), pp. 441–460, 1984.
16. R. Mayr and M. Rusinowitch: Reachability is Decidable for Ground AC Rewrite Systems, Proc. of 3rd INFINITY, Aalborg (Denmark), 1998. Draft available from http://www.informatik.uni-freiburg.de/~mayrri/ac.ps
17. R. Mayr and M. Rusinowitch: Process Rewrite Systems, Information and Computation 156, pp. 264–286, 1999.
18. J. Millen and V. Shmatikov: Constraint Solving for Bounded-Process Cryptographic Protocol Analysis, Proc. of 8th CCS, Philadelphia (PA), pp. 166–175, 2001.
19. D. Monniaux: Abstracting Cryptographic Protocols with Tree Automata, Proc. of 6th SAS, Venice (Italy), LNCS 1694, pp. 149–163, 1999.
20. H. Ohsaki: Beyond Regularity: Equational Tree Automata for Associative and Commutative Theories, Proc. of 15th CSL, Paris (France), LNCS 2142, pp. 539–553, 2001.
21. R.J. Parikh: On Context-Free Languages, JACM 13(4), pp. 570–581, 1966.
22. X. Rival and J. Goubault-Larrecq: Experiments with Finite Tree Automata in Coq, Proc. of 14th TPHOLs, Edinburgh (Scotland), LNCS 2152, pp. 362–377, 2001.
23. B. Schneier: Applied Cryptography: Protocols, Algorithms, and Source Code in C, Second Edition, John Wiley & Sons, 1996.
24. H. Seki, T. Takai, Y. Fujinaka and Y. Kaji: Layered Transducing Term Rewriting System and Its Recognizability Preserving Property, Proc. of 13th RTA, Copenhagen (Denmark), 2002. To appear in LNCS.
25. T. Takai, Y. Kaji and H. Seki: Right-Linear Finite-Path Overlapping Term Rewriting Systems Effectively Preserve Recognizability, Proc. of 11th RTA, Norwich (UK), LNCS 1833, pp. 246–260, 2000.
Regular Sets of Descendants by Some Rewrite Strategies Pierre Réty and Julie Vuotto LIFO - Université d'Orléans, B.P. 6759, 45067 Orléans cedex 2, France {rety, vuotto}@lifo.univ-orleans.fr http://www.univ-orleans.fr/SCIENCES/LIFO/Members/rety/
Abstract. For a constructor-based rewrite system R, a regular set of ground terms E, and assuming some additional restrictions, we build a finite tree automaton that recognizes the descendants of E, i.e. the terms issued from E by rewriting, according to innermost, innermost-leftmost, and outermost strategies.
1 Introduction
Tree automata have already been applied to many areas of computer science, and in particular to rewrite techniques [2]. In comparison with more sophisticated refinements [4,10,9], finite tree automata are obviously less expressive, but they have plenty of good properties and lead to much simpler algorithms from a practical point of view. Because of potential applications to automated deduction and program validation (reachability, program testing), the problem of expressing by a finite tree automaton the transitive closure of a regular set E of ground terms with respect to a set of equations, as well as the related problem of expressing the set of descendants R*(E) of E with respect to a rewrite system R, have already been investigated [1,5,14,3,11,15]¹. Except [11,15], all those papers assume that the right-hand sides (both sides when dealing with sets of equations) of rewrite rules are shallow², up to slight differences. On the other hand, P. Réty's work [12] does not always preserve recognizability (E is not arbitrary), but allows rewrite rules forbidden by the other papers³. Reduction strategies in rewriting and programming have drawn increasing attention in recent years; they matter both from a theoretical point of view, when the computation result is not unique, and from a practical point of view, for termination and efficiency. For a strategy st, expressing by a finite tree automaton the st-descendants R*_st(E) of E can help to study st: in particular, it allows one to decide st-reachability, since t1 →*_st t2 ⟺ t2 ∈ R*_st({t1}), and
¹ [11] computes sets of normalizable terms, which amounts to computing sets of descendants by orienting the rewrite rules in the opposite direction. ² Shallow means that every variable appears at depth at most one. ³ Like f(s(x)) → s(x).
S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 129–143, 2002. c Springer-Verlag Berlin Heidelberg 2002
st-joinability, since t1 ↓st t2 ⟺ R*_st({t1}) ∩ R*_st({t2}) ≠ ∅. More generally, it can help with the static analysis of rewrite programs and, by extension, of functional programs. This paper is an extension of [12] that takes some strategies into account. As far as we know, the problem of expressing sets of descendants according to some strategies had not been addressed before. We build finite tree automata that express the sets of descendants of E with respect to a constructor-based rewrite system, according to the innermost, innermost-leftmost, and outermost strategies, assuming:
1. E is the set of ground constructor-instances (also called data-instances) of a given linear term.
2. Every rewrite rule is linear (both sides).
3. In right-hand sides, there are no nested defined-functions, and the arguments of defined-functions are either variables or ground terms.
For the innermost-leftmost strategy, we temporarily assume in addition that right-hand sides contain at most one defined-function. For the outermost strategy, we also assume this extra restriction, and moreover that R has no critical pairs. It is shown in [12] that if any restriction among 1, 2, 3 is not satisfied, then the set of descendants is not regular (even if a strategy among innermost, innermost-leftmost, outermost is respected). Concerning restriction 2, only non-right-linearity causes non-regularity; however, to deal with strategies, we need the regularity of the set of irreducible terms, hence left-linearity. The paper is organized as follows. Section 2 introduces preliminary notions. The reader familiar with term rewriting and tree automata may skip Subsection 2.1, but not the following ones, which present more specific notions. Section 3 (resp. 4, 5) gives the computation of innermost (resp. innermost-leftmost, outermost) descendants. Missing proofs are available in the full version; see Réty's web page.
2 Preliminaries
2.1 Usual Notions: Term Rewriting and Tree Automata
Let C be a finite set of constructors and F be a finite set of defined-function symbols (functions for short). For c ∈ C ∪ F, ar(c) is the arity of c. Terms are denoted by the letters s, t. A data-term is a ground term (i.e. without variables) that contains only constructors. TC is the set of data-terms, TC∪F is the set of ground terms. For a term t, Var(t) is the set of variables appearing in t, Pos(t) is the set of positions of t, P̄os(t) is the set of non-variable positions of t, and PosF(t) is the set of defined-function positions of t. t is linear if each variable of t appears only once in t. For p ∈ Pos(t), t|p is the subterm of t at position p, t(p) is the top symbol of t|p, and t[t']p denotes the replacement of the subterm at position p by t'. For positions p, p', p ≥ p' means that p is located below p', i.e. p = p'.v for some position v, whereas p ‖ p' means that p and p' are incomparable, i.e. ¬(p ≥ p') ∧ ¬(p' ≥ p). The term t contains nested functions if there exist p, p' ∈ PosF(t) s.t. p > p'. The domain dom(θ) of a substitution θ is the set of variables x s.t. xθ ≠ x.
A rewrite rule is an oriented pair of terms, written l → r. A rewrite system R is a finite set of rewrite rules. lhs stands for left-hand side, rhs for right-hand side. R is constructor-based if every lhs l of R is of the form l = f(t1, …, tn) where f ∈ F and t1, …, tn do not contain any functions. The rewrite relation →R is defined as follows: t →R t' if there exist p ∈ Pos(t), a rule l → r ∈ R, and a substitution θ s.t. t|p = lθ and t' = t[rθ]p (also denoted by t →[p,l→r,θ] t'). →*R denotes the reflexive-transitive closure of →R. t is irreducible if ¬(∃t' | t →R t'). t →[p] t' is innermost (resp. leftmost, outermost) if for every v > p (resp. every v occurring strictly on the left of p, every v < p), t|v is irreducible. The narrowing relation ❀R is defined as follows: t ❀R t' if there exist p ∈ P̄os(t), a rule l → r ∈ R, and a substitution θ s.t. t|p θ = lθ and t' = (tθ)[rθ]p (also denoted by t ❀[p,l→r,θ] t'). A (bottom-up) finite tree automaton is a quadruple A = (C ∪ F, Q, Qf, ∆) where Qf ⊆ Q are sets of states and ∆ is a set of transitions of the form c(q1, …, qn) → q where c ∈ C ∪ F and q1, …, qn, q ∈ Q, or of the form q1 → q (empty transition). Sets of states are denoted by the letters Q, S, D, and states by q, s, d. →∆ (also denoted →A) is the rewrite relation induced by ∆. A ground term t is recognized by A into q if t →*∆ q. L(A) is the set of terms recognized by A into any state of Qf. A derivation t →*∆ q where q ∈ Qf is called a successful run on t. The states of Qf are called final states. A is deterministic if whenever t →*∆ q and t →*∆ q', we have q = q'. A Q-substitution σ is a substitution s.t. ∀x ∈ dom(σ), xσ ∈ Q. A set E of ground terms is regular if there exists a finite automaton A s.t. E = L(A).
2.2 Nesting Automata
Intuitively, "the automaton A discriminates position p into state q" means that along every successful run on t ∈ L(A), t|p (and only this subterm) is recognized into q. This property allows us to modify the behavior of A below position p without modifying the other positions, by replacing A|p by another automaton A'. See [12] for missing proofs.
Definition 2.1. The automaton A = (C ∪ F, Q, Qf, ∆) discriminates the position p into the state q if
- L(A) ≠ ∅,
- ∀t ∈ L(A), p ∈ Pos(t),
- and for each successful derivation t →*∆ t[q']p' →*∆ qf where qf ∈ Qf, we have q' →*∆ q (i.e. by empty transitions) if p' = p, and not (q' →*∆ q) otherwise.
In this case we define the automaton A|p = (C ∪ F, Q, {q}, ∆).
Lemma 2.2. L(A|p) = {t|p | t ∈ L(A)}.
Definition 2.3. Let A = (C ∪ F, Q, Qf, ∆) be an automaton that discriminates the position p into the state q, and let A' = (C ∪ F, Q', Q'f, ∆') s.t. Q ∩ Q' = ∅. We define A[A']p = (C ∪ F, Q ∪ Q', Qf, ∆\{l → r | l → r ∈ ∆ ∧ r = q} ∪ ∆' ∪ {q'f → q | q'f ∈ Q'f}).
Lemma 2.4. L(A[A']p) = {t[t']p | t ∈ L(A), t' ∈ L(A')}, and A[A']p still discriminates p into q. Moreover, if A discriminates another position p' s.t. ¬(p' ≥ p) into the state q', then A[A']p still discriminates p' into q'.
Lemma 2.5. Let A, B be automata, and let A ∩ B be the classical automaton used to recognize the intersection, whose states are pairs of states of A and B. If A discriminates p into qA, B discriminates p into qB, and L(A) ∩ L(B) ≠ ∅, then A ∩ B discriminates p into (qA, qB).
Proof. Let t ∈ L(A ∩ B).
- Since t ∈ L(A), p ∈ Pos(t).
- For any successful run on t, t →*∆A∩B t[(q'A, q'B)]p' →* (qfA, qfB).
- If p' = p then, from the discrimination of A and B, q'A →*∆ qA and q'B →*∆ qB.
- If p' ≠ p then, from the discrimination of A and B, not(q'A →*∆ qA) and not(q'B →*∆ qB).
2.3 Particular Automata
Let us define the initial automaton, i.e. the automaton that recognizes the data-instances of a given linear term t.
Definition 2.6. We define the automaton Adata that recognizes the set of data-terms TC: Adata = (C, Qdata, Qdataf, ∆data) where Qdata = Qdataf = {qdata} and ∆data = {c(qdata, …, qdata) → qdata | c ∈ C}.
Given a linear term t, we define the automaton Atθ that recognizes the data-instances of t: Atθ = (C ∪ F, Qtθ, Qtθf, ∆tθ) where
Qtθ = {q^p | p ∈ P̄os(t)} ∪ {qdata},
Qtθf = {q^ε} ({qdata} if t is a variable),
∆tθ = {t(p)(s1, …, sn) → q^p | p ∈ P̄os(t), si = qdata if t|p.i is a variable, si = q^p.i otherwise} ∪ ∆data.
Note that Atθ discriminates each position p ∈ P̄os(t) into q^p. On the other hand, Atθ is not deterministic as soon as there is p ∈ P̄os(t) s.t. t|p is a constructor-term: indeed, for any data-instance t|p θ, we have both t|p θ →*[∆tθ] q^p and t|p θ →*[∆tθ] qdata.
Let us now define an automaton that recognizes the terms irreducible at position p.
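The construction of Atθ in Definition 2.6 can be sketched concretely. The encoding is ours, not from the paper: terms are nested tuples, variables are plain strings, and the state q^p is named "q@p":

```python
def build_A_t(term, constructors):
    """Transition table (symbol, child states) -> set of states for the
    automaton A_t recognizing the data-instances of a linear term."""
    delta = {}
    def add(key, q):
        delta.setdefault(key, set()).add(q)
    def walk(t, p):
        kids = []
        for i, a in enumerate(t[1:], start=1):
            if isinstance(a, str):                    # variable -> qdata
                kids.append("qdata")
            else:
                kids.append("q@" + ".".join(map(str, p + (i,))))
                walk(a, p + (i,))
        add((t[0], tuple(kids)), "q@" + (".".join(map(str, p)) or "eps"))
    walk(term, ())
    for c, ar in constructors:                        # the Adata part
        add((c, ("qdata",) * ar), "qdata")
    return delta

# t = f(s(f(s(y)))) from Example 2, with constructors a/0 and s/1
delta = build_A_t(("f", ("s", ("f", ("s", "y")))), [("a", 0), ("s", 1)])
assert delta[("f", ("q@1",))] == {"q@eps"}
# non-determinism: s(qdata) goes both to q@1.1.1 and to qdata
assert delta[("s", ("qdata",))] == {"q@1.1.1", "qdata"}
```

The last assertion shows the non-determinism mentioned after Definition 2.6: the constructor subterm s(y) at position 1.1.1 and the data-terms overlap on the same key.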
Definition 2.7. Let IRRp(R) = {s ∈ TC∪F | p ∈ Pos(s) and s|p is irreducible}.
To prove the regularity of IRRp(R), we need some more definitions.
Definition 2.8. Let RED(R) be the language of reducible terms: RED(R) = {s | ∃p' ∈ Pos(s), s →[p',l→r,σ] s'}.
Lemma 2.9. [6] If R is left-linear, RED(R) is a regular language.
Lemma 2.10. IRRε(R) is the complement of RED(R). Therefore, IRRε(R) is a regular language.
Thanks to an automaton that recognizes IRRε(R), we can now build an automaton that recognizes IRRp(R).
Theorem 2.11. Let t be a term, and p ∈ Pos(t). IRRp(R) is a regular language and is recognized by an automaton that discriminates every position p' ∈ Pos(t) s.t. ¬(p' > p).
Proof. Let A' = (C ∪ F, Q', Q'f, ∆') be an automaton that recognizes IRRε(R). Let p = p1.….pk with p1, …, pk ∈ IN − {0} and ∀i, pi ≤ Max_{f∈F∪C}(ar(f)). We define Airr = (C ∪ F, Qirr, Qirrf, ∆irr), where Qirr contains, besides the states of A', a state qany recognizing arbitrary terms and a state q^w for each position w < p; the transitions of ∆irr follow the path p1.….pk down to p, send the siblings that leave the path into qany, and run A' on the subterm at position p (let qrec denote the state reached there). Airr indeed recognizes IRRp(R), because s|p is reducible iff there exists a position u s.t. u ≥ p and s →[u] s'; qany recognizes any term, and q^w recognizes s|w for w < p. In ∆irr, a transition s(…) → q^w is allowed only if ar(s) is large enough to ensure that p ∈ Pos(s): for example, if p = 1.2.1 and s(…) → q^1, then s should have an arity ≥ 2. Obviously, Airr discriminates p into qrec (into the final state of A' if p = ε), and each position p' ∈ Pos(t) s.t. ¬(p' ≥ p) into the corresponding qany-state (q^p' if p' < p).
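Independently of the automaton construction, membership in IRRp(R) is a simple matching check for a concrete term. A naive sketch (terms as nested tuples, variables as strings; helper names are ours):

```python
def subterms_at_or_below(t, p=()):
    """All (position, subterm) pairs of a ground term."""
    yield p, t
    for i, a in enumerate(t[1:], start=1):
        yield from subterms_at_or_below(a, p + (i,))

def matches(pattern, t):
    """Left-linear pattern matching; variables are strings."""
    if isinstance(pattern, str):
        return True
    return (not isinstance(t, str) and pattern[0] == t[0]
            and len(pattern) == len(t)
            and all(matches(pl, tl) for pl, tl in zip(pattern[1:], t[1:])))

def irreducible_at(t, p, rules):
    """t in IRR_p(R): position p exists in t and t|p is irreducible."""
    sub = t
    for i in p:
        if isinstance(sub, str) or i > len(sub) - 1:
            return False
        sub = sub[i]
    return not any(matches(l, s) for _, s in subterms_at_or_below(sub)
                   for l, _ in rules)

# R = {f(s(x)) -> s(x)} from Example 2
R = [(("f", ("s", "x")), ("s", "x"))]
t = ("f", ("s", ("s", ("a",))))
assert not irreducible_at(t, (), R)   # reducible at the root
assert irreducible_at(t, (1,), R)     # t|1 = s(s(a)) is irreducible
```

For infinite languages this check is exactly what the automaton of Theorem 2.11 internalizes.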
2.4 Descendants
t' is a descendant of t if t →*R t'. t' is a normal-form of t if t →*R t' and t' is irreducible. If E is a set of ground terms, R*(E) denotes the set of descendants of elements of E. IRR(R) denotes the set of irreducible ground terms. R*_in(E) (resp. R*_ileft(E), R*_out(E)) denotes the set of descendants of E according to an innermost (resp. innermost-leftmost, outermost) strategy.
Definition 2.12. t →+_[p,rhs's] t' means that t' is obtained by rewriting t at position p, plus possibly at positions coming from the rhs's. Formally, there exist intermediate terms t1, …, tn and sets of positions P(t), P(t1), …, P(tn) s.t. t = t0 →[p0,l0→r0] t1 →[p1,l1→r1] … →[pn−1,ln−1→rn−1] tn →[pn,ln→rn] tn+1 = t', where
- p0 = p and P(t) = {p},
- ∀j, pj ∈ P(tj),
- ∀j, P(tj+1) = P(tj)\{p' | p' ≥ pj} ∪ {pj.w | w ∈ PosF(rj)}.
Remark: P(tj) contains only function positions. Since there are no nested functions in rhs's, p', p'' ∈ P(tj) implies p' ‖ p''.
Definition 2.13. Given a language E and a position p, we define R*_p(E) as follows: R*_p(E) = E ∪ {t' | ∃t ∈ E, t →+_[p,rhs's] t'}.
Example 1. R = {f(x) → s(x), g(x) → s(h(x)), h(x) → f(x)}. For E = {f(h(g(a)))}, R*_1(E) = E ∪ {f(f(g(a)))} ∪ {f(s(g(a)))}.
An insight into the algorithm underlying the following result will be given in the sequel by Example 2, and a formal description is in the full version. The resulting automata differ from the starting one only at positions below p, and in the general case are built by nesting automata.
Theorem 2.14. [12] Let R be a rewrite system satisfying the restrictions given in the introduction. If E is recognized by an automaton that discriminates position p into some state q, and possibly p' into q' for some positions p' ∈ Pos(t) s.t. ¬(p' ≥ p) and some states q', then so is R*_p(E).
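Definitions 2.12–2.13 and Example 1 can be replayed by exhaustive search. This naive sketch over nested-tuple terms is ours (it is not the automaton construction of Theorem 2.14); it tracks the allowed position sets P(tj) explicitly:

```python
def match(pat, t, s=None):
    """Match a linear pattern against a ground term; variables are strings."""
    s = {} if s is None else s
    if isinstance(pat, str):
        s[pat] = t
        return s
    if isinstance(t, str) or pat[0] != t[0] or len(pat) != len(t):
        return None
    for pl, tl in zip(pat[1:], t[1:]):
        if match(pl, tl, s) is None:
            return None
    return s

def apply_sub(t, s):
    return s[t] if isinstance(t, str) else (t[0],) + tuple(apply_sub(a, s) for a in t[1:])

def subterm(t, p):
    for i in p:
        t = t[i]
    return t

def replace(t, p, new):
    return new if not p else t[:p[0]] + (replace(t[p[0]], p[1:], new),) + t[p[0]+1:]

def fun_positions(t, funs, p=()):
    """PosF(t) for a term possibly containing variables."""
    if isinstance(t, str):
        return []
    here = [p] if t[0] in funs else []
    return here + [q for i, a in enumerate(t[1:], 1)
                   for q in fun_positions(a, funs, p + (i,))]

def R_p_star(t, p, rules, funs):
    """R*_p({t}) of Definition 2.13, by exhaustive search."""
    seen, todo = {t}, [(t, frozenset([p]))]
    visited = set(todo)
    while todo:
        s, P = todo.pop()
        for pos in P:
            for l, r in rules:
                sub = match(l, subterm(s, pos))
                if sub is None:
                    continue
                s2 = replace(s, pos, apply_sub(r, sub))
                # P(t_{j+1}) = P(t_j) \ {q >= pos}  U  {pos.w | w in PosF(r)}
                P2 = frozenset({q for q in P if q[:len(pos)] != pos}
                               | {pos + w for w in fun_positions(r, funs)})
                seen.add(s2)
                if (s2, P2) not in visited:
                    visited.add((s2, P2))
                    todo.append((s2, P2))
    return seen

# Example 1: R = {f(x)->s(x), g(x)->s(h(x)), h(x)->f(x)}, E = {f(h(g(a)))}
R = [(("f", "x"), ("s", "x")),
     (("g", "x"), ("s", ("h", "x"))),
     (("h", "x"), ("f", "x"))]
t = ("f", ("h", ("g", ("a",))))
expected = {t, ("f", ("f", ("g", ("a",)))), ("f", ("s", ("g", ("a",))))}
assert R_p_star(t, (1,), R, {"f", "g", "h"}) == expected
```

The search rewrites at position 1 (h → f), then at the rhs function position (f → s), after which P becomes empty, reproducing exactly the three terms of Example 1.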
2.5 Positions
Given a term t, we define:
Definition 2.15. Let p ∈ Pos(t). Succ(p) is the set of nearest function positions strictly below p: Succ(p) = {p' ∈ PosF(t) | p' > p and ∀q ∈ Pos(t), (p < q < p' ⇒ q ∉ PosF(t))}.
Definition 2.16. Let p, p' ∈ Pos(t). p ✁ p' means that p occurs strictly on the left of p', i.e. p = u.i.v, p' = u.i'.v', where i, i' ∈ IN and i < i'.
3 Innermost Descendants: R*_in(E)
Example 2. Let a, s be constructors and f be a function, s.t. a is a constant and s, f are unary symbols. Let t = f(s(f(s(y)))) and Atθ be the automaton that recognizes the language E = f(s(f(s(s*(a))))) of the data-instances of t. Atθ can be summarized by writing, above each subterm of E, the state that recognizes it: q^ε above f(…), q^1 above the outer s(…), q^1.1 above the inner f(…), q^1.1.1 above the inner s(…), and qdata above s*(a),
which means that ∆tθ = {a → qdata, s(qdata) → qdata, s(qdata) → q^1.1.1, f(q^1.1.1) → q^1.1, s(q^1.1) → q^1, f(q^1) → q^ε}, where q^ε is the accepting state. Consider now the rewrite system R = {f(s(x)) → s(x)}. Obviously, R*_in(E) = E ∪ f(s(s(s*(a)))) ∪ s(s(s*(a))). We can make two remarks:
- When rewriting E, some instances of rhs's of rewrite rules are introduced by rewrite steps. So, to build an automaton that recognizes R*_in(E), we need to recognize the instances of rhs's into some states, without making any confusion between the various potential instances of the same rhs.
- When the starting term has nested functions, according to the innermost strategy we first have to rewrite the innermost function positions. Note that here we can rewrite E at positions ε and 1.1. According to the previous remark, we start from position 1.1.
Now, we calculate R*_1.1(E).
(1) f(s(f(s(s*(a))))) →[1.1, x/s*(a)] f(s(s(s*(a))))
The language that instantiates the rewrite-rule variable x is s*(a) (recognized
into qdata). Therefore, we encode the first version of the rhs s(x) by adding a state d_qdata and the transition s(qdata) → d_qdata.
We can simulate the rewrite step by adding transitions again; this step is called saturation in the following. Consider (1) again. Since f(s(x)) is the rule lhs, and f(s(qdata)) →*∆tθ q^1.1, we add the transition d_qdata → q^1.1 so that the instance of the rhs by qdata is also recognized into q^1.1, i.e. s(s*(a)) →* q^1.1. So, R*_1.1(E) = E ∪ f(s(s(s*(a)))) is recognized by the automaton.
Now, rewriting terms of R*_1.1(E) at position ε is allowed only if position 1.1 is normalized. Consider E' = R*_1.1(E) ∩ IRR_1.1(R), where IRR_1.1(R) is the set of ground terms irreducible at position 1.1 w.r.t. the TRS R. Thus E' = f(s(s(s*(a)))), and let us calculate R*(E').
Let A' = (C ∪ F, Q', {q^ε}, ∆') be an automaton that recognizes the language E', where ∆' = {a → qdata, s(qdata) → qdata, s(qdata) → q^1.1, s(q^1.1) → q^1, f(q^1) → q^ε}, q^ε being the accepting state.
(2) f(s(s(s*(a)))) →[ε, x/s(s*(a))] s(s(s*(a)))
The language that instantiates x is s(s*(a)) (recognized into q^1.1). Therefore, we encode a second version of the rhs s(x) by adding a state d_q^1.1 and the transition s(q^1.1) → d_q^1.1.
By saturation, since f(s(x)) is the rule lhs and f(s(q^1.1)) →* q^ε, we add the transition d_q^1.1 → q^ε so that s(s(s*(a))) →* q^ε. So, R*(E') = E' ∪ s(s(s*(a))) is recognized by the automaton. E' contains only terms normalized at position 1.1, which is not required by the innermost strategy when no rewrite step is applied at position ε. Therefore, R*_in(E) = R*(E') ∪ R*_1.1(E) = R*(R*_1.1(E) ∩ IRR_1.1(R)) ∪ R*_1.1(E).
Remark: In the previous example, the starting term has nested functions. When this is not the case, every rewrite step is innermost, because rhs's have no nested functions either. In general, t may have more than two function positions. To generalize, we need the following notion.
Definition 3.1. Given a language L and a position p, R*_in,p(L) is the set of innermost descendants of L over the TRS R, reducing only positions below (or equal to) p, i.e.
R*_in,p(L) = {s' | ∃s ∈ L, s →*[u1,…,un] s' by an innermost strategy, ∀i (ui ≥ p)}.
For a language L, let L|p = {s|p | s ∈ L, p ∈ Pos(s)}.
Lemma 3.2. Let R be a rewrite system satisfying the restrictions given in the introduction, and E be the data-instances of a given linear term t. Let p ∈ PosF(t), and let L be a language s.t. L|p = E|p, recognized by an automaton A that discriminates every position p' ∈ PosF(t) s.t. p' ≥ p. Then:
R*_in,p(L) = R*_p(L) if Succ(p) = ∅.
Otherwise, let Succ(p) = {p1, …, pn}, and in this case
R*_in,p(L) = R*_p[R*_in,p1(…(R*_in,pn(L))…) ∩_{pi∈Succ(p)} IRR_pi(R)] ∪ R*_in,p1(…(R*_in,pn(L))…),
and R*_in,p(L) is recognized by an automaton A' s.t. if p' ∈ Pos(t), ¬(p' > p), and A discriminates p' into q', then A' also discriminates p' into q'.
Proof. By noetherian induction on (PosF(t), >).
- If Succ(p) = ∅, then ∀s ∈ L, ∀p' ∈ Pos(s), (p' > p ⇒ s(p') ∈ C). And since rhs's have no nested functions, R*_p(L) = R*_in,p(L). We get A' by Theorem 2.14.
- Let Succ(p) = {p1, …, pn}. Let us define:
R*_in,>p(L) = {s' | ∃s ∈ L, s →*[u1,…,un] s' by an innermost strategy, ∀i (ui > p)}.
Let s ∈ L. Either a rewrite step is applied at position p, and the strategy is innermost only if we first normalize s below position p by an innermost derivation, or no rewrite step is applied at position p. And since no defined-function occurs along any branch between p and pi:
R*_in,p(L) = R*_p[R*_in,>p(L) ∩_{pi∈Succ(p)} IRR_pi(R)] ∪ R*_in,>p(L).
Now, note that ∀i, j ∈ [1..n], (i ≠ j ⇒ pi ‖ pj). Moreover, rewrite steps at incomparable positions can be commuted. Then obviously:
R*_in,>p(L) = R*_in,p1(…(R*_in,pn(L))…).
L is recognized by an automaton A that discriminates every p' ∈ PosF(t) s.t. p' ≥ p. For each i, pi > p, so A discriminates every p' ∈ PosF(t) s.t. p' ≥ pi. By the induction hypothesis, R*_in,pn(L) is recognized by an automaton An that still discriminates p and every position p' s.t. p' ≥ pi, i = 1, …, n−1; R*_in,pn−1(R*_in,pn(L)) is recognized by an automaton An−1 that still discriminates p and every position p' s.t. p' ≥ pi, i = 1, …, n−2; and R*_in,p1(…(R*_in,pn(L))…) is recognized by an automaton A1 that still discriminates p. By Theorem 2.11, IRR_pi(R) is recognized by an automaton that discriminates every position p' ∈ PosF(t) s.t. ¬(p' > pi), hence in particular p. By Lemma 2.5, ∩_{pi∈Succ(p)} IRR_pi(R) is recognized by an automaton that discriminates p. Therefore R*_in,p1(…(R*_in,pn(L))…) ∩_{pi∈Succ(p)} IRR_pi(R) is recognized by an automaton that discriminates p, and from Theorem 2.14, so is R*_p[R*_in,p1(…(R*_in,pn(L))…) ∩_{pi∈Succ(p)} IRR_pi(R)]. Moreover, the discrimination of positions p' s.t. ¬(p' > p) is preserved. Finally, by union, we obtain an automaton that discriminates p and preserves the discrimination of positions p' s.t. ¬(p' > p).
Theorem 3.3. Let R be a rewrite system satisfying the restrictions given in the introduction, and E be the data-instances of a given linear term t. Then
R*_in(E) = R*_in,ε(E) if t(ε) ∈ F,
R*_in(E) = R*_in,p1(…(R*_in,pn(E))…) otherwise, with Succ(ε) = {p1, …, pn},
and R*_in(E) is effectively recognized by an automaton.
Proof. We have two cases:
- If t(ε) ∈ F, obviously R*_in(E) = R*_in,ε(E).
- If t(ε) ∉ F, then ∀i, j ∈ [1..n], (i ≠ j ⇒ pi ‖ pj), and rewrite steps at incomparable positions can be commuted. Then R*_in(E) = R*_in,p1(…(R*_in,pn(E))…).
The automaton comes from Definition 2.6 and from applying Lemma 3.2 (several times in the second case).
Example 3. Let E be the data-instances of t = f(g(x), h(g(y))) and R = {f(x, y) → y, h(x) → s(x), g(x) → x}. The symbol ∗ stands for the data-terms that instantiate t. Since t(ε) ∈ F, we calculate R*_in,ε(E), where E = f(g(∗), h(g(∗))).
R*_in,ε(E) = R*_ε[R*_in,1(R*_in,2(E)) ∩ IRR_1(R) ∩ IRR_2(R)] ∪ R*_in,1(R*_in,2(E)).
We have to compute R*_in,2(E).
Succ(2) = {2.1}.
So, R*_in,2(E) = R*_2[R*_in,2.1(E) ∩ IRR_2.1(R)] ∪ R*_in,2.1(E), where R*_in,2.1(E) = E ∪ f(g(∗), h(∗)).
R*_in,2(E) = R*_2[f(g(∗), h(∗))] ∪ R*_in,2.1(E) = f(g(∗), h(∗)) ∪ f(g(∗), s(∗)) ∪ R*_in,2.1(E) (denoted by E1).
Now, we can compute R*_in,1(R*_in,2(E)). Succ(1) = ∅.
So, R*_in,1(E1) = R*_1(E1) = E1 ∪ f(∗, h(∗)) ∪ f(∗, s(∗)) ∪ f(∗, h(g(∗))) (denoted by E2).
R*_in,ε(E) = R*_ε[E2 ∩ IRR_1(R) ∩ IRR_2(R)] ∪ E2 = R*_ε[f(∗, s(∗))] ∪ E2 = f(∗, s(∗)) ∪ s(∗) ∪ E2.
Finally, we obtain R*_in(E) = E ∪ f(g(∗), h(∗)) ∪ f(g(∗), s(∗)) ∪ f(∗, h(∗)) ∪ f(∗, s(∗)) ∪ f(∗, h(g(∗))) ∪ s(∗).
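Example 3 can be sanity-checked on the single data-instance ∗ = a by brute-force innermost rewriting. The following naive enumeration (our own encoding, not the automaton construction) yields exactly the seven sets of the example instantiated at ∗ = a:

```python
R = [(("f", "x", "y"), "y"), (("h", "x"), ("s", "x")), (("g", "x"), "x")]

def match(pat, t, s=None):
    s = {} if s is None else s
    if isinstance(pat, str):
        s[pat] = t               # patterns are linear, no conflict check needed
        return s
    if isinstance(t, str) or pat[0] != t[0] or len(pat) != len(t):
        return None
    for pl, tl in zip(pat[1:], t[1:]):
        if match(pl, tl, s) is None:
            return None
    return s

def apply_sub(t, s):
    return s[t] if isinstance(t, str) else (t[0],) + tuple(apply_sub(a, s) for a in t[1:])

def subterm(t, p):
    for i in p:
        t = t[i]
    return t

def replace(t, p, new):
    return new if not p else t[:p[0]] + (replace(t[p[0]], p[1:], new),) + t[p[0]+1:]

def redexes(t, p=()):
    out = [(p, rule) for rule in R if match(rule[0], t) is not None]
    for i, a in enumerate(t[1:], 1):
        out += redexes(a, p + (i,))
    return out

def innermost_descendants(t):
    seen, todo = {t}, [t]
    while todo:
        s = todo.pop()
        rs = redexes(s)
        for p, (l, r) in rs:
            # innermost step: no redex strictly below p
            if any(len(q) > len(p) and q[:len(p)] == p for q, _ in rs):
                continue
            s2 = replace(s, p, apply_sub(r, match(l, subterm(s, p))))
            if s2 not in seen:
                seen.add(s2)
                todo.append(s2)
    return seen

# t = f(g(a), h(g(a))), the instance of Example 3 with * = a
a = ("a",)
t = ("f", ("g", a), ("h", ("g", a)))
expected = {t,
            ("f", a, ("h", ("g", a))), ("f", ("g", a), ("h", a)),
            ("f", a, ("h", a)), ("f", ("g", a), ("s", a)),
            ("f", a, ("s", a)), ("s", a)}
assert innermost_descendants(t) == expected
```

Note how f(∗, h(g(∗))) appears among the descendants while the root is never fired before its arguments are normalized, matching the role of the IRR intersections in the formula.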
4 Innermost-Leftmost Descendants: R*_ileft(E)
Definition 4.1. Given a language L and a position p, let us define:
R*_ileft,p(L) = {t' | ∃t ∈ L, t →*[u1,…,un] t' by an innermost-leftmost strategy, and ∀i (ui ≥ p)}.
Lemma 4.2. Let R be a rewrite system satisfying the restrictions given in the introduction, and E be the data-instances of a given linear term t. Let p ∈ PosF(t), and let L be a language s.t. L|p = E|p, recognized by an automaton A that discriminates every position p' ∈ PosF(t) s.t. p' ≥ p. Then:
R*_ileft,p(L) = R*_p(L) if Succ(p) = ∅.
Otherwise, let Succ(p) = {p1, …, pn} s.t. p1 ✁ … ✁ pn, and in this case
R*_ileft,p(L) = R*_p[R*_ileft,pn(…(R*_ileft,p1(L) ∩ IRR_p1(R))…) ∩ IRR_pn(R)] ∪ R*_ileft,p1(L) ∪ R*_ileft,p2(R*_ileft,p1(L) ∩ IRR_p1(R)) ∪ … ∪ R*_ileft,pn(…(R*_ileft,p1(L) ∩ IRR_p1(R))…),
and R*_ileft,p(L) is recognized by an automaton A' s.t. if p' ∈ Pos(t), ¬(p' > p), and A discriminates p' into q', then A' also discriminates p' into q'.
Theorem 4.3. Let E be the data-instances of a linear term t, and let R be a TRS satisfying the restrictions given in the introduction. Then
R*_ileft(E) = R*_ileft,ε(E) if ε ∈ PosF(t),
R*_ileft(E) = R*_ileft,p1(E) ∪ … ∪ R*_ileft,pn(…(R*_ileft,p1(E) ∩ IRR_p1)…) otherwise, with Succ(ε) = {p1, …, pn} s.t. p1 ✁ … ✁ pn,
and R*_ileft(E) is effectively recognized by an automaton.
Proof. We have two cases:
- If ε ∈ PosF(t), obviously R*_ileft(E) = R*_ileft,ε(E).
- If ε ∉ PosF(t), then for all i, j s.t. pi ✁ pj, the innermost-leftmost descendants at position pj can be computed after having normalized those at position pi. Then obviously, R*_ileft(E) = R*_ileft,p1(E) ∪ … ∪ R*_ileft,pn(…(R*_ileft,p1(E) ∩ IRR_p1)…).
The automaton comes from Definition 2.6 and from applying Lemma 4.2.
Example 4. Let E be the data-instances of t = f(g(x), h(y)) and R = {f(x, y) → s(f(x, y)), h(x) → s(x), g(x) → x}. The symbol ∗ stands for the data-terms that instantiate t. Since t(ε) ∈ F, we calculate R*_ileft,ε(E), where E = f(g(∗), h(∗)).
R*_ileft,ε(E) = R*_ε[R*_ileft,2(R*_ileft,1(E) ∩ IRR_1(R)) ∩ IRR_2(R)] ∪ R*_ileft,1(E) ∪ R*_ileft,2(R*_ileft,1(E) ∩ IRR_1(R)).
We have to compute R*_ileft,1(E). So, R*_ileft,1(E) = R*_1(E) = E ∪ f(∗, h(∗)).
Now, we can compute R*_ileft,2(R*_ileft,1(E) ∩ IRR_1(R)).
R*_ileft,2(f(∗, h(∗))) = f(∗, s(∗)) ∪ f(∗, h(∗)). So, R*_ε(f(∗, s(∗))) = s*(f(∗, s(∗))).
Finally, we obtain R*_ileft(E) = s*(f(∗, s(∗))) ∪ E ∪ f(∗, h(∗)).
Remark: The current computation of R*_p does not take any strategy into account. Since rhs's do not have nested defined-functions, R*_p is automatically innermost, but not leftmost. This is why we assume in addition that rhs's have at most one defined-function. However, it is possible to modify the computation of R*_p to take the leftmost strategy into account (see [13]); then Theorem 4.3 still holds even if right-hand sides contain several defined-functions.
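The strategy of Example 4 can be replayed by a naive interpreter that, at each step, fires the leftmost among the innermost redexes (innermost redexes are pairwise incomparable, so "leftmost" is the lexicographic minimum of their positions). Since R is non-terminating here, we bound the number of steps; the code is an illustration of ours, not the paper's automaton:

```python
R = [(("f", "x", "y"), ("s", ("f", "x", "y"))),
     (("h", "x"), ("s", "x")), (("g", "x"), "x")]

def match(pat, t, s=None):
    s = {} if s is None else s
    if isinstance(pat, str):
        s[pat] = t
        return s
    if isinstance(t, str) or pat[0] != t[0] or len(pat) != len(t):
        return None
    for pl, tl in zip(pat[1:], t[1:]):
        if match(pl, tl, s) is None:
            return None
    return s

def apply_sub(t, s):
    return s[t] if isinstance(t, str) else (t[0],) + tuple(apply_sub(a, s) for a in t[1:])

def subterm(t, p):
    for i in p:
        t = t[i]
    return t

def replace(t, p, new):
    return new if not p else t[:p[0]] + (replace(t[p[0]], p[1:], new),) + t[p[0]+1:]

def redexes(t, p=()):
    out = [(p, rule) for rule in R if match(rule[0], t) is not None]
    for i, a in enumerate(t[1:], 1):
        out += redexes(a, p + (i,))
    return out

def ileft_derivation(t, steps):
    """The (unique) innermost-leftmost derivation, cut after `steps` steps."""
    out = [t]
    for _ in range(steps):
        rs = redexes(t)
        inner = [p for p, _ in rs
                 if not any(len(q) > len(p) and q[:len(p)] == p for q, _ in rs)]
        if not inner:
            break
        p = min(inner)                      # leftmost among innermost
        l, r = next(rule for q, rule in rs if q == p)
        t = replace(t, p, apply_sub(r, match(l, subterm(t, p))))
        out.append(t)
    return out

# instance * = a of Example 4: t = f(g(a), h(a))
a = ("a",)
out = ileft_derivation(("f", ("g", a), ("h", a)), 4)
# g(a) -> a first, then h(a) -> s(a), then the root loops via f -> s(f(...))
assert out[2] == ("f", a, ("s", a))
assert out[3] == ("s", ("f", a, ("s", a)))
```

The terms of the derivation are exactly E, f(∗, h(∗)), and the tower s*(f(∗, s(∗))) of the example, instantiated at ∗ = a.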
5 Outermost Descendants: R*_out(E)
Example 5. Consider two constructors s, a, and let R = {f(s(x), s(y)) → s(f(x, y)), g(s(x)) → s(g(x))}. Let t = f(g(x), g(y)); hence the starting language is E = f(g(s*(a)), g(s*(a))). Obviously the descendants of E are R*(E) = s*(f(s*(g(s*(a))), s*(g(s*(a))))), whereas the outermost descendants are
R*_out(E) = s*(f(s*(g(s*(a))), s?(g(s*(a))))) ∪ s*(f(s?(g(s*(a))), s*(g(s*(a))))),
where s? means zero or one occurrence of s.
Surprisingly, to compute R*_out(E) we first reduce the innermost positions 1 and 2, by computing R*_2(R*_1(E)) = f(s*(g(s*(a))), s*(g(s*(a)))). Now we reduce f (position ε in E); however, we keep only the descendants s.t. f cannot be reduced any more. Let R! denote this local normalization. We get:
R![R*_2(R*_1(E))] = s*(f(s*(g(s*(a))), g(s*(a)))) ∪ s*(f(g(s*(a)), s*(g(s*(a))))),
which is a subset of R*_out(E). Some outermost descendants are missing because the normalization of f is too harsh. To define the right one, we introduce an outermost abstract rewriting that selects exactly the outermost descendants
among the elements of R*(E). It is composed of an outermost narrowing step, a subterm decomposition, and an approximation. The subterm decomposition and the approximation are just for making abstract rewriting terminate. The following definitions are illustrated by Example 6.
Notations: ❀ denotes the narrowing relation. Given a TRS R, t ❀[p] means that there exists a narrowing step at position p issued from t. Because of the restrictions, every considered term is linear: variable names do not matter, and for readability we will often replace variables by the anonymous variable ⊥.
Definition 5.1. Let T2 be the set of terms (with variables) s.t. t(ε) ∈ F and every branch of t contains at most two nested defined-functions. Given a TRS R, let us define on T2 a binary relation ≈ to approximate terms. Let t, t' ∈ T2, and for t let Succ(ε) = {p1, …, pn}. For each i ∈ {1, …, n}, let ti = t[⊥]pj for all j ≠ i, and let t0 = t[⊥]pj for all j. Then t ≈ t' iff
1. t' = t ∧ (1), where (1) = (t ❀[ε] ∨ Succ(ε) = ∅),
2. or t' = t0 ∧ ¬(1) ∧ ¬(t0 ❀[ε]),
3. or ((t' = t1 ∧ ¬(t1 ❀[ε])) ∨ … ∨ (t' = tn ∧ ¬(tn ❀[ε]))) ∧ ¬(1) ∧ t0 ❀[ε].
Remark: For a given t, several t' may exist in case 3, and there is at least one, because t1 ❀[ε] ∧ … ∧ tn ❀[ε] implies t ❀[ε] (thanks to the constructor discipline), i.e. condition (1) would be true, contradicting ¬(1).
Comments: In case 1, t can be narrowed at position ε, or t does not contain nested defined-functions; therefore a step t ❀[ε] (if any) is outermost: we keep t and will try to narrow it at ε. Case 2 means that t cannot be narrowed at ε, even if inner positions are first narrowed; therefore any narrowing step at an inner position is outermost. Note that a term obtained by narrowing steps at inner positions is an instance of t0. Thus the descendants of tθ that are instances of t0 are outermost descendants: we keep t0 instead of t. Case 3 means that t cannot be narrowed at ε, but could be narrowed at ε if inner positions were first modified (narrowed). Suppose ¬(ti ❀[ε]).
Then every derivation t ❀∗[pj1,...,pjl] s s.t. ∀k, pjk ≠ pi is outermost, s is an instance of ti, and s ̸❀[ε]: we keep ti instead of t, and will try to narrow it at pi.
Definition 5.2 (outermost abstract rewriting). Let t, t′ ∈ T2. t ↦[p,l→r] t′ if there exists a term s s.t. t ❀[p,l→r,θ] s, and
– p = ε ∧ ∃u ∈ PosF(r) | s|u ≈ t′,
– or t ̸❀[ε] and s ≈ t′.
Example 6. Let R = {h(c(x, y)) → c(h(x), i(y)), i(s(x)) → p(i(x))} where c, s, p are constructors. Since h(h(⊥)) ̸❀[ε] and h(h(⊥)) ❀[1] h(c(h(⊥), i(⊥))) ≈case 1 itself, we get h(h(⊥)) ↦ h(c(h(⊥), i(⊥))). Since h(c(h(⊥), i(⊥))) ❀[ε] c(h(h(⊥)), i(i(⊥))), and for the subterms h(h(⊥)) ≈case 3 itself (n = 1, hence t1 = t) and i(i(⊥)) ≈case 3 itself, we get h(c(h(⊥), i(⊥))) ↦ h(h(⊥)) and h(c(h(⊥), i(⊥))) ↦ i(i(⊥)). Since i(i(⊥)) ̸❀[ε] and i(i(⊥)) ❀[1] i(p(i(⊥))) ≈case 2 i(p(⊥)), we get i(i(⊥)) ↦ i(p(⊥)). Note that i(p(⊥)) does not narrow.
Regular Sets of Descendants by Some Rewrite Strategies
141
Lemma 5.3. Let t ∈ T2, σ be a data-substitution, and tσ →[p,l→r] t′ be a rewrite step. If p = ε let u ∈ PosF(r), otherwise let u = ε. Then:
tσ →[p,l→r] t′ is outermost ⇐⇒ ∃s, s′ ∈ T2 | t ≈ s ∧ s ↦?[p,l→r] s′ ∧ t′|u is an instance of s′.
The previous lemma does not extend to several steps if rhs's may contain several defined functions (resp. if R has critical pairs), due to the decomposition step included in abstract rewriting, which may cause a confusion between different subterms (resp. between terms obtained by applying different rewrite rules).
Example 7. R = {i(x, y) → c(h(x), h(y)), h(x) → x, g(s(x)) → s(g(x))}. Let t = i(s(g(x)), g(y)). Then t ≈ t ↦[ε] h(s(g(⊥))). The step i(s(g(s∗(a))), g(s∗(a))) →[ε] t′ = c(h(s(g(s∗(a)))), h(g(s∗(a)))) is outermost. Now t′ →[2.1] t′′ is not outermost, whereas t′′|2 = h(s(g(s∗(a)))) is an instance of h(s(g(⊥))).
Definition 5.4. Let E be the data-instances of a given linear term t, and p ∈ PosF(t). Assume Succ(p) ≠ ∅. Let us define (t|p)abs ∈ T2 by: if Succ(Succ(p)) = ∅ (i.e. t|p ∈ T2), let (t|p)abs = t|p; otherwise (t|p)abs = (t[⊥]pj, ∀j)|p, where Succ(Succ(p)) = {p1, . . . , pn}. Note that t|p is an instance of (t|p)abs.
Let R∗abs,p(E) be the set of descendants of {s ∈ T2 | (t|p)abs ≈ s} by outermost abstract rewriting, and let L(R∗abs,p(E)) be the set of ground instances of elements of R∗abs,p(E).
Example 8. Consider Example 5 again, and let p = ε. Then tabs = t, t ≈ t1 = f(g(⊥), ⊥), and t ≈ t2 = f(⊥, g(⊥)). Now:
t1 = f(g(⊥), ⊥) ↦[1] f(s(g(⊥)), ⊥) ↦[ε] f(g(⊥), ⊥) = t1
t2 = f(⊥, g(⊥)) ↦[2] f(⊥, s(g(⊥))) ↦[ε] f(⊥, g(⊥)) = t2
Therefore
R∗abs,ε(E) = {t2, f(⊥, s(g(⊥))), t1, f(s(g(⊥)), ⊥)}
Now, if Rε! reduces f and keeps the descendants whose subterms headed by f are instances of elements of R∗abs,ε(E), then even the missing descendants (when s∗ is exactly s) are obtained, and Rε![R2∗(R1∗(E))] = R∗out(E).
Lemma 5.5. R∗abs,p(E) is finite. Then the set L(R∗abs,p(E)) of ground instances of elements of R∗abs,p(E) is effectively recognized by a tree automaton.
Definition 5.6. Let p ∈ PosF(t) s.t. Succ(p) ≠ ∅. t →![p,rhs s] t′ means that t →∗[p,rhs s] t′ (as in Definition 2.12, except that zero steps are allowed), provided in addition that ∀q ∈ P(t′), t′|q ∈ L(R∗abs,p(E)). Given a language L, let Rp!(L) = {t′ | ∃t ∈ L, t →![p,rhs s] t′}.
Lemma 5.7. If L is recognized by an automaton A that discriminates p, then Rp!(L) is recognized by an automaton A′ s.t. if p′ > p and A discriminates p′ into q′, then A′ also discriminates p′ into q′.
Definition 5.8. t0 →[p0,l0→r0] t1 → . . . tn →[pn,ln→rn] tn+1 = t′ is outermost under a position p if ∀i, pi ≥ p ∧ ∀qi (p < qi < pi ⟹ ti ̸→[qi]).
Given a language L, R∗out,p(L) denotes the descendants of L outermost under p, using the TRS R, i.e. R∗out,p(L) = {t′ | ∃s ∈ L, s →∗ t′ by a derivation outermost under p}.
Lemma 5.9. Let R be a rewrite system satisfying the restrictions given in the introduction, and E be the data-instances of a given linear term t. Let p ∈ PosF(t), and L be a language s.t. L|p = E|p, and that is recognized by an automaton A that discriminates every position p′ ∈ PosF(t) | p′ ≥ p. Then,
R∗out,p(L) = Rp∗(L) if Succ(p) = ∅.
Otherwise, let Succ(p) = {p1, . . . , pn}, and in this case
R∗out,p(L) = Rp![R∗out,p1(. . . (R∗out,pn(L)) . . .)]
and R∗out,p(L) is recognized by an automaton A′ s.t. if p′ ∈ Pos(t), p′ > p, and A discriminates p′ into q′, then A′ also discriminates p′ into q′.
Theorem 5.10. Let R be a rewrite system satisfying the restrictions given in the introduction, and E be the data-instances of a given linear term t. Then
R∗out(E) = R∗out,ε(E) if t(ε) ∈ F,
R∗out(E) = R∗out,p1(. . . (R∗out,pn(E)) . . .) otherwise, with Succ(ε) = {p1, . . . , pn},
and R∗out(E) is effectively recognized by an automaton.

6 Conclusion
When computing descendants, taking some strategies into account is thus possible, keeping the same class of tree languages (the regular ones) and assuming the same restrictions (plus left-linearity). Does this also hold for other ways of computing descendants?
Rewrite Games
Johannes Waldmann
Institut für Informatik, Universität Leipzig
Augustusplatz 10, D-04109 Leipzig, Germany
[email protected]
Abstract. For a terminating rewrite system R, and a ground term t1, two players alternate in doing R-reductions t1 →R t2 →R t3 →R . . . That is, player 1 chooses the redex in t1, t3, . . ., and player 2 chooses the redex in t2, t4, . . . The player who cannot move (because tn is a normal form) loses. In this note, we propose some challenging problems related to certain rewrite games. In particular, we re-formulate an open problem from combinatorial game theory (do all finite octal games have an ultimately periodic Sprague-Grundy sequence?) as a question about rationality of some tree languages. We propose to attack this question by methods from set constraint systems, and show some cases where this works directly. Finally we present rewrite games from combinatory logic, and their relation to algebraic tree languages.
1 Introduction
This paper presents some fascinating open problems from the theory of combinatorial two-person games. On the one hand, methods from string and term rewriting (including finite automata and set constraints) just wait to be applied there; while on the other hand, this application might stimulate some interesting developments in rewriting itself.
For a terminating rewrite system R, and a ground term t1, two players alternate in doing R-reductions t1 →R t2 →R t3 →R . . . That is, player 1 chooses the redex in t1, t3, . . ., and player 2 chooses the redex in t2, t4, . . . The player who cannot move (because tn is a normal form) loses.
In this note, we propose some challenging problems related to certain rewrite games. In particular, we re-formulate an open problem from combinatorial game theory (do all finite octal games have an ultimately periodic Sprague-Grundy sequence?) as a question about rationality of some tree languages. We show how octal games can be described by word rewriting systems, so the natural generalization is to look at games in arbitrary term rewriting systems. Then, ultimate periodicity of a Sprague-Grundy sequence is translated into the Sprague-Grundy classes being rational, i.e. recognized by finite automata.
S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 144–158, 2002.
© Springer-Verlag Berlin Heidelberg 2002
We propose to attack the transformed problem, using set constraint methods that would have to be extended to cope with rewrite projections. For now, we show that this approach works in the simple case of subtraction games. We then give some examples of rewrite games in Combinatory Logic, and find that under certain restrictions, their Sprague-Grundy classes are algebraic (a. k. a. context-free) tree languages. Finally, we briefly review related work and emphasize directions for future research in rewrite games.
2 Notation and Preliminaries
We assume the reader is familiar with term rewriting (see for instance [BN98]) and tree languages (see [CDG+97]). To honour the “french style”, we use the words rational for regular or recognizable, and algebraic instead of context-free. Remember that an algebraic tree language ⊆ Term(Σ) is produced by a grammar that is in fact a rewrite system in an extended signature Σ ∪ Γ with nonterminals Γ = {N1, . . . , Nk}, and monadic rewrite rules Ni(x1, . . . , xn) → t, where x1, . . . , xn are distinct variables, and t ∈ Term(Σ ∪ Γ ∪ {x1, . . . , xn}). An algebraic language is called rational iff all non-terminals are nullary.
For the convenience of the reader, we now present, in some detail, standard notions and results from combinatorial game theory. For a compact introduction, see [Guy98c,Guy98a]; the definitive references are [Con76,BCG83]. A complete guide to the literature is [Fra00].

2.1 Impartial Games
Combinatorial game theory studies finite deterministic two-person zero-sum games. This excludes chance, and hidden information. We also disallow ties. A game is normal if the player who cannot move, loses (and the player who made the last move, wins). A game is impartial if in each position, the set of possible moves does not depend on which player moves next. We may describe a game “intensionally” by giving its ruleset, and a starting position. On the other hand, we often use an “extensional” view of a game as the (directed acyclic) graph containing all reachable positions, connected according to the rules of the game. We then even forget about the concrete positions, and just look at the unlabelled digraph: An impartial game G is a set of impartial games, called the options of G. (From now on, we just write “the game G”.) The simplest game is ∅, which has no options. This game ∅ is also called 0 (zero). In a combinatorial game, for each game position, exactly one of the players has a winning strategy. For impartial games, we define unary predicates N, P by N(G) :⇐⇒ ∃g ∈ G : P(g),
P (G) : ⇐⇒ ∀g ∈ G : N (g)
saying that in G, the Next (resp. Previous) player has a winning strategy.
2.2 Take-and-Break Games
A position in a take-and-break game is a multi-set of positive numbers, and a move, in general, is to pick one number, decrease it, and then split it into several parts. As usual, the object is to get the last move. Each sequence M = (M1, M2, . . .) of sets Mr ⊆ N encodes the rules for a particular such game: It is allowed to decrease a number by r, and split the remainder into s positive summands, iff s ∈ Mr. We only consider games where the sequence M is eventually constant ∅.
For example, the game officers is characterized by M = ({1, 2}, ∅, . . .), so the valid moves are to subtract one, and split the result into two or one (but not zero) positive numbers. That is, we are not allowed to pick a number “1” (since this would leave no positive number). An example how this game might be played is {8} →1 {3, 4} →2 {1, 1, 4} →1 {1, 1, 3} →2 {1, 1, 1, 1}; at each step, the number that was reduced (and possibly split) is the one that changes. In the given line of play, Player 1 loses. Actually, she could have won, starting {8} →1 {2, 5}.
If the sets Mr are finite, they can be encoded by dr := Σs∈Mr 2^s, and the ruleset of the game is given by (d1, d2, . . .). Adding a leading zero and a decimal point, while suppressing trailing zeroes, officers is conveniently described by 0·6.
An octal game is a take-and-break game where numbers are split into at most two parts. That is, for all r, we have max Mr ≤ 2. Therefore all code digits satisfy 0 ≤ dr < 8, and this explains the name. For example, officers is an octal game.
A subtraction game is a take-and-break game where numbers are reduced, but never split. For all code digits, dr ∈ {0, 3}. The game is then characterized by its subtraction set {r : dr = 3}. For instance, the game 0·03033 has subtraction set S = {2, 4, 5}, and a possible line of play is 14 →1 9 →2 5 →1 1. Here, Player 2 lost, but he has a better move: He can win the game if he starts 14 →1 9 →2 7.

2.3 Basics of Sprague-Grundy Theory
In the disjoint sum G + H of two games, a valid move is either a move in G or a move in H, leaving the other summand untouched. Formally, G + H := {g + H : g ∈ G} ∪ {G + h : h ∈ H} Again, the object is to get the last move globally. Of course + is associative and commutative and 0 + G = G = G + 0. Then the relation ≈ on games given by G ≈ H : ⇐⇒ P (G+H) is an equivalence relation. It is stable w.r.t. disjoint sum: ∀G, H, K : G ≈ H ⇒ G + K ≈ H + K. (Note that the relation G ∼ H : ⇐⇒ P (G) = P (H) is not stable.)
We define a sequence of games (∗0, ∗1, ∗2, ∗3, . . .), called nimbers, by ∗0 := 0, ∗(n + 1) := {∗0, ∗1, . . . , ∗n}. The nimber ∗n corresponds to a heap of n beans, from which we may remove any positive number of beans. The Sprague-Grundy theorem [Spr35] asserts that for each game G, there is exactly one n such that G ≈ ∗n. We call n the Sprague-Grundy value of G, and write n = sg(G). We have sg(G) = 0 ⇐⇒ P(G) (previous player wins).
The essential tool in the proof of the theorem, and in the computation of Sprague-Grundy values, is the mex rule: We have {∗n1, . . . , ∗nk} ≈ ∗n, where n = mex{n1, . . . , nk}. Here, the mex (minimum excludant) of a finite set M ⊂ N is defined by mex M = min(N \ M).
For all a, b ∈ N, we have ∗a + ∗b ≈ ∗(a ⊕ b), where ⊕ denotes binary addition that disregards carries. For example, ∗1 + (∗2 + ∗3) ≈ ∗1 + ∗1 ≈ ∗0.
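These definitions can be executed literally. The following sketch (plain Python; the representation of an impartial game as the frozenset of its options is ours, not from the text) checks the mex rule and nim-addition on small nimbers.

```python
from functools import lru_cache

# An impartial game is the frozenset of its options.
stars = [frozenset()]                  # *0 = 0, the game with no options
for n in range(8):
    stars.append(frozenset(stars))     # *(n+1) = {*0, ..., *n}

@lru_cache(maxsize=None)
def plus(g, h):
    """Disjoint sum: move in one summand, leave the other untouched."""
    return frozenset({plus(x, h) for x in g} | {plus(g, y) for y in h})

def mex(vs):
    n = 0
    while n in vs:
        n += 1
    return n

@lru_cache(maxsize=None)
def sg(g):
    """Sprague-Grundy value via the mex rule."""
    return mex({sg(x) for x in g})

assert sg(stars[5]) == 5
for a in range(5):
    for b in range(5):
        assert sg(plus(stars[a], stars[b])) == a ^ b   # *a + *b ≈ *(a XOR b)
assert sg(plus(stars[1], plus(stars[2], stars[3]))) == 0
print("mex rule and nim-addition confirmed for small nimbers")
```

The last assertion is exactly the example ∗1 + (∗2 + ∗3) ≈ ∗0 above.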
2.4 Sprague-Grundy Sequences
The Sprague-Grundy sequence of a take-and-break game is the sequence (sg(0), sg(1), . . .). For example, the Sprague-Grundy sequence of the game 0·77 (called Kayles) starts (0, 1, 2, 3, 1, 4, 3, 2, 1, 4, 2, 6, . . .).
It is easy to see that for each subtraction game, its Sprague-Grundy sequence is bounded and ultimately periodic: Due to the mex rule, the values are bounded by |S|, where S ⊂ N is the subtraction set; and the period length is at most |S|^max S, since the continuation of the sequence is determined once a slice of length max S is given. For example, the subtraction set S = {7, 64, 89, 96} produces a period of length 5,756,171, after a pre-period of length 1061. It is open whether there are families Sn of subtraction sets for which the period length indeed grows exponentially w.r.t. max Sn [AB95,Fla97].
For some octal games it is known that their Sprague-Grundy sequence is ultimately periodic. (To continue the example, the Kayles sequence reaches a period of length 12, after a pre-period of length 71.) Whether this is true in general is a famous open problem:
Conjecture 1 ([Con76,BCG83,Guy98b]). For each octal game, the Sprague-Grundy sequence is bounded and ultimately periodic.
There are small games with truly astronomic periods (e.g. 0·16 has period length 149459), but the games 0·6, 0·06, 0·14, 0·36, 0·64, 0·74, 0·76, 0·004, 0·005, 0·006, 0·007 and others remain unsolved today, despite computing some of their sequences up to length 10^10.
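The mex rule makes such sequences easy to compute by machine. The following sketch (plain Python; the function names are ours) computes the Sprague-Grundy sequence of the subtraction game 0·03033 with S = {2, 4, 5} from Sect. 2.2 and exhibits the promised boundedness and ultimate periodicity; for this game the period is 7, with empty pre-period.

```python
def mex(values):
    """Minimum excludant: the least natural number not in `values`."""
    n = 0
    while n in values:
        n += 1
    return n

def grundy_sequence(subtraction_set, limit):
    """Sprague-Grundy values sg(0), ..., sg(limit-1) of a subtraction game."""
    sg = []
    for n in range(limit):
        options = {sg[n - r] for r in subtraction_set if n - r >= 0}
        sg.append(mex(options))
    return sg

sg = grundy_sequence({2, 4, 5}, 100)
print(sg[:15])   # [0, 0, 1, 1, 2, 2, 3, 0, 0, 1, 1, 2, 2, 3, 0]
assert all(sg[n] == sg[n + 7] for n in range(93))   # period 7
```

In particular sg(7) = sg(14) = 0, which confirms that from heap 14 the second player wins, e.g. by answering 14 →1 9 with 9 →2 7.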
3 Rewrite Games
Definition 2 (Rewrite Game). For a terminating rewrite system R, and a ground term t1, two players alternate in doing R-reductions t1 →R t2 →R t3 →R . . . That is, player 1 chooses the redex in t1, t3, . . ., and player 2 chooses the redex in t2, t4, . . . The player who cannot move (because tn is a normal form) loses.
By definition, this gives a normal impartial game, so the Sprague-Grundy theory is applicable, and each ground term t ∈ Term(Σ) can be assigned its Sprague-Grundy value sgR(t). Of course, such functions sgR can be arbitrarily complex, since R can encode any computation. On the other hand it is interesting to look for such R whose sgR behaves nicely.

3.1 Simulating Octal Games
Proposition 3. Each octal game can be simulated by a rewrite game.
Proof. A game position {h1, h2, . . . , hk} is encoded as a word w = D U^h1 D U^h2 . . . D U^hk D. A move “decrease a number by r, and split the remainder into s positive numbers” is encoded, for s = 0, 1, 2, by the rewrite rules
s = 0 : {D U^r D → D},  s = 1 : {D U^r U → D U},  s = 2 : {U U^r U → U D U}.
Note that we have to guarantee the non-emptiness of the resulting heaps, so we cannot simply write U^r → D to remove r beans. Also, we use the convention that to the left of each U string, there is a D letter.
For some octal digits it is possible to give a more compact system. We encode dr = 3 = 2^1 + 2^0 by U^r → ε, dr = 6 = 2^2 + 2^1 by U^{r+1} → DU, and dr = 7 = 2^2 + 2^1 + 2^0 by U^r → D. A subtraction game with subtraction set S is represented by {U^r → ε : r ∈ S}. The unsolved games 0·6 (called officers) resp. 0·007 (called James Bond, or treblecross) are encoded by the rewrite systems
R0·6 = {U U → D U},  R0·007 = {U U U → D}.
The line of play in officers presented in the introduction now reads
D U^8 →1 D U^3 D U^4 →2 D U D U D U^4 →1 D U D U D D U^3 →2 D U D U D D U D U,
where in each step the chosen redex is the occurrence of UU that becomes the new DU.
We remark that the encoding of the game positions (as strings) works for all take-and-break games, but the encoding of the moves (as rewrite rules) needs the octality restriction. There seems to be no easy way to simulate the splitting of a number into three (or more) summands. Even “tricky” representations (binary strings, trees) do not seem to help.
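The encoding of Proposition 3 can be checked mechanically for officers. The sketch below (plain Python; names ours) plays the rewrite game {UU → DU} directly on words, computes values by the mex rule of Sect. 2.3, and compares them with the heap-wise computation for the octal game 0·6.

```python
from functools import lru_cache

def mex(values):
    n = 0
    while n in values:
        n += 1
    return n

@lru_cache(maxsize=None)
def sg_word(w):
    """Grundy value of the rewrite game {UU -> DU} on the word w."""
    options = {sg_word(w[:i] + "DU" + w[i + 2:])
               for i in range(len(w) - 1) if w[i:i + 2] == "UU"}
    return mex(options)

@lru_cache(maxsize=None)
def sg_heap(n):
    """Grundy value of a heap of n beans in officers (code 0.6)."""
    options = set()
    if n >= 2:                    # subtract one, keep a single heap
        options.add(sg_heap(n - 1))
    for a in range(1, n - 1):     # subtract one, split into two heaps
        options.add(sg_heap(a) ^ sg_heap(n - 1 - a))
    return mex(options)

for n in range(1, 11):
    assert sg_word("D" + "U" * n + "D") == sg_heap(n)
print([sg_heap(n) for n in range(1, 9)])   # [0, 1, 2, 0, 1, 2, 3, 1]
```

Since sg_heap(8) = 1 but sg_heap(2) ⊕ sg_heap(5) = 0, the move {8} →1 {2, 5} from the introduction is indeed the winning one.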
3.2 Periodicity and Rationality
What is the correct translation of “ultimate periodicity of a Sprague-Grundy sequence” into the language of rewrite games? Note that a Sprague-Grundy sequence is in fact a mapping from sizes of heaps, that is, strings over a one-letter alphabet, to Sprague-Grundy values. For rewrite games, we need to extend this, since we deal not with heaps, but with trees, over arbitrary signatures. We suggest therefore to look at the number, and complexity, of Sprague-Grundy classes of rewrite games.
Definition 4. The Sprague-Grundy class SGR(n) of the rewrite game of R is SGR(n) = {t : sgR(t) = n}. Note P = SGR(0) and N = ∪n>0 SGR(n).
Theorem 5. The Sprague-Grundy sequence of a take-and-break game is eventually periodic iff it has only finitely many non-empty Sprague-Grundy classes, all of which are rational languages (using the above encoding).
Proof. Let l := the length of the pre-period plus the length of the period of the Sprague-Grundy sequence, and m := the maximal Sprague-Grundy value (which must exist: the sequence is eventually periodic, thus bounded). To show that each SGR(n) is rational, a finite automaton has to be constructed that reads a word w = D U^h1 D U^h2 . . . D U^hk D and checks whether sg(h1) ⊕ sg(h2) ⊕ . . . = n. While reading U^hi, the automaton needs at most l states to compute y := sg(hi). It also needs to remember what it read before, namely x := sg(h1) ⊕ . . . ⊕ sg(hi−1), and this can be done with at most m states. In all, the machine needs l × m states. The computation of x ⊕ y can be done without additional states since the range of x and y is bounded (for y, by the precondition of the theorem; for x, because of a ⊕ b ≤ a + b ≤ 2m).
On the other hand, if a Sprague-Grundy class SGR(n) is rational, then so is SGR(n) ∩ DU∗D. This is (essentially) a one-letter rational language, and thus a sum of linear sets {D U^{ak+b} D : k ≥ 0}.
Now the period of the Sprague-Grundy sequence is bounded by the least common multiple of all the differences a of these (finitely many) sets. The class SG(0) (= P ) already contains all information on the other classes: Theorem 6. For all take-and-break games, using the above encoding: If the Sprague-Grundy class SG(0) is rational, then there are only finitely many nonempty SG(n), and all of them are rational. Proof. We write Cn for SG(n), and we let ∂L/∂w = {u : w · u ∈ L} denote the left quotient of the language L by the word w. Let Cn be a non-empty Sprague-Grundy class, and v ∈ Cn . Since vDw ≈ v + w, we have vDw ∈ C0 ⇐⇒ w ∈ Cn . So Cn = ∂C0 /∂vD. Therefore rationality of C0 implies that of Cn .
Assume now that there are infinitely many non-empty Sprague-Grundy classes. By definition, these classes are mutually disjoint, so C0 would have infinitely many distinct left quotients. But since C0 is rational, it can only have a finite number of distinct left quotients, a contradiction.
The Conjecture 1 on octal games naturally leads to the following
Question 7. What rewrite games R have finitely many Sprague-Grundy classes, all of which are rational tree languages?
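For a concrete instance of the periodicity behind Conjecture 1, the Kayles sequence from Sect. 2.4 can be recomputed directly. The sketch below (plain Python; names ours) implements the take-and-break moves of 0·77 and confirms both the stated prefix and the known period 12 after the pre-period 71.

```python
def mex(values):
    n = 0
    while n in values:
        n += 1
    return n

def kayles(limit):
    """Sprague-Grundy values of 0.77 (Kayles): remove 1 or 2 beans,
    optionally splitting the rest into two non-empty heaps."""
    sg = [0]
    for n in range(1, limit):
        options = set()
        for r in (1, 2):
            if n == r:                   # take the whole heap (s = 0)
                options.add(0)
            if n - r >= 1:               # leave a single heap (s = 1)
                options.add(sg[n - r])
            for a in range(1, n - r):    # split into two heaps (s = 2)
                options.add(sg[a] ^ sg[n - r - a])
        sg.append(mex(options))
    return sg

sg = kayles(300)
print(sg[:12])   # [0, 1, 2, 3, 1, 4, 3, 2, 1, 4, 2, 6]
assert all(sg[n] == sg[n + 12] for n in range(71, 280))
```

By Theorem 5, this periodicity is equivalent to the rationality (and finiteness in number) of the Sprague-Grundy classes of the Kayles word game.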
4 Sprague-Grundy Classes and Set Constraints
Now we show that for any rewrite game, the Sprague-Grundy classes are solutions of a generalized set constraint system. We let R−1(L) := {s : s →R t for some t ∈ L} denote the set of one-step predecessors of the set L w.r.t. the rewrite relation R.
Proposition 8. Writing Cn for SGR(n), we have, for each n ≥ 0,
Cn = R−1(C0) ∩ R−1(C1) ∩ . . . ∩ R−1(Cn−1) \ R−1(Cn),
where, for n = 0, the empty intersection is understood as Term(Σ).
Proof. By the mex rule, we have sgR(t) = n iff t has successors of Sprague-Grundy values 0, 1, . . . , n − 1, but not n.
Remember that by Theorem 6, to “solve” an octal game it would be enough to determine C0. (This special relation between C0 and the other classes does not hold for all rewrite games, since it is an effect of our encoding of octal games.) In other words, C0 alone is as difficult as all the equations together. On the other hand, C0 solves the (seemingly simple) equation C0 = {D, U}∗ \ R−1(C0). For the James Bond game, corresponding to R = {U U U → D}, it is even open whether C0 ∩ DU∗D is finite.
For subtraction games, the constraint system can be solved easily. We only need to consider words from U∗ (i.e. single heaps), since the corresponding rewrite rules cannot break D boundaries. But then the alphabet is unary, and R−1(Cn) is just ∪r∈S U^r · Cn. Now the equations from Proposition 8 form a system of positive set constraints. It is known [AW92] that such a system, if solvable, admits rational solutions. In our case, the solution is clearly unique, and therefore must be rational. (This again shows that subtraction games have ultimately periodic Sprague-Grundy sequences.) Note that we did effectively consider a string rewrite system with rewriting only at one end of the strings. Using results for set constraint systems with projection [CP94], this idea can be extended to top-rewriting games (only top reductions are allowed) for rules l → r, where both l and r contain at most one variable.
(Details will be explained in a forthcoming technical report.) This method (of tree set automata [GTT99]) would need to be extended to work for (certain) games with rewriting allowed anywhere, for instance, string
rewriting systems encoding octal games. The known huge period lengths seem to imply that this is not straightforward, but the absence (so far) of aperiodic octal games gives hope. Astronomic periods are perhaps not that surprising, in light of the known complexities of set constraints (NEXPTIME-completeness, if projections are involved).
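For a subtraction game, the equations of Proposition 8 can at least be verified pointwise on an initial segment of heap sizes, since there R−1 simply adds r ∈ S beans to a heap. A minimal sketch (plain Python; names ours), for S = {2, 4, 5}:

```python
def mex(values):
    n = 0
    while n in values:
        n += 1
    return n

S = {2, 4, 5}
N = 200

# Sprague-Grundy values of single heaps U^n for n < N.
sg = []
for n in range(N):
    sg.append(mex({sg[n - r] for r in S if n - r >= 0}))

# The classes C_v, restricted to heaps below N.
classes = {}
for n, v in enumerate(sg):
    classes.setdefault(v, set()).add(n)

def pred(c):
    """R^{-1}(C): one-step predecessors, i.e. heaps with r more beans."""
    return {n + r for n in c for r in S if n + r < N}

# Proposition 8: C_v = pred(C_0) ∩ ... ∩ pred(C_{v-1}) \ pred(C_v),
# the empty intersection being the whole universe.
for v, c in classes.items():
    rhs = set(range(N))
    for k in range(v):
        rhs &= pred(classes[k])
    rhs -= pred(classes[v])
    assert c == rhs
print(sorted(classes))   # [0, 1, 2, 3]: the values are bounded by |S|
```

The unique solution here is the ultimately periodic pattern found in Sect. 2.4, in accordance with the rationality guaranteed by [AW92].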
5 Some Rewrite Game Examples

5.1 Linear, Inverse Monadic, Non-overlapping Systems
The Combinator I has the rewrite rule I x → x (application is written by juxtaposition and associates to the left). This game is trivial: the number of moves is exactly one less than the number of I’s in the start term (regardless of the reduction strategy), so there are only two Sprague-Grundy classes C0, C1, as follows from these propositions:
Proposition 9. If a rewrite system R is linear and has no critical pairs, then all reduction chains from t1 to t2 have common length.
Note that we use linear in the strong sense: a rule l → r is linear iff each variable occurs exactly twice: once in l, once in r.
Proposition 10. If a game has the property that for all positions p, all lines of play from p to a final position p′ have common length, then there are just two Sprague-Grundy classes, corresponding to values 0 and 1.
Proof. The class C0 contains all positions that have an even distance to a stopping position, and C1 contains all positions with odd distance.
Therefore, in the I game, the classes are given by the rational grammars
C0 → I ∪ C0 C1 ∪ C1 C0,  C1 → C0 C0 ∪ C1 C1,
producing the sets of all terms with an odd (C0) resp. even (C1) number of I’s.
The combinator F is the rewrite system F x y → y x. Its normal forms are given by F1 := F, Fn+1 = F Fn. Since the preconditions of Proposition 9 hold, we can apply Proposition 10. So the game has exactly two Sprague-Grundy classes. We will prove that they are not rational, but algebraic tree languages. The construction generalizes to similar systems.
Proposition 11. For each term t, dist(t) = size(t) − size(nf(t)), where size(t) is the number of F in t, and dist(t) denotes the length of a reduction chain from t to its normal form nf(t).
Proof. In each reduction step, the size decreases by one.
Proposition 12. nf(Fa Fb) = if a > b then Fa+1−b else Fb+2−a.
Proof. By induction on a + b:
1. a = 1: F1 Fb = F Fb = Fb+1 = Fb+2−a.
2. a > 1: Fa Fb = (F Fa−1) Fb → Fb Fa−1.
a) b > a − 1 (⇐⇒ a ≤ b): Fb Fa−1 →∗ Fb+1−(a−1) = Fb+2−a.
b) b ≤ a − 1 (⇐⇒ a > b): Fb Fa−1 →∗ Fa−1+2−b = Fa+1−b.
Proposition 13. dist(Fa Fb) ≡ (if a > b then 1 else 0) (mod 2).
Proof. If a > b, then dist(Fa Fb) ≡ size(Fa Fb) − size(nf(Fa Fb)) ≡ (a + b) − (a + 1 − b) ≡ 1 (mod 2). If a ≤ b, then dist(Fa Fb) ≡ size(Fa Fb) − size(nf(Fa Fb)) ≡ (a + b) − (b + 2 − a) ≡ 0 (mod 2).
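Propositions 11 to 13 can be tested by running the F system. The sketch below (plain Python; the encoding of terms as nested pairs is ours) normalizes Fa Fb for small a, b, checking the normal form and the parity of the reduction length.

```python
# Terms over {F, application}: the atom "F" or a pair (left, right).
F = "F"

def app(x, y):
    return (x, y)

def spine(n):
    """F_n: F_1 = F, F_{n+1} = F F_n (these are the normal forms)."""
    t = F
    for _ in range(n - 1):
        t = app(F, t)
    return t

def step(t):
    """One rewrite step (F x) y -> y x, leftmost; None if t is normal."""
    if isinstance(t, tuple):
        l, r = t
        if isinstance(l, tuple) and l[0] == F:
            return app(r, l[1])
        s = step(l)
        if s is not None:
            return app(s, r)
        s = step(r)
        if s is not None:
            return app(l, s)
    return None

def normalize(t):
    """Normal form and reduction length; by Proposition 9 the length
    is the same for every strategy, so it equals dist(t)."""
    n = 0
    while True:
        s = step(t)
        if s is None:
            return t, n
        t, n = s, n + 1

for a in range(1, 6):
    for b in range(1, 6):
        nf, dist = normalize(app(spine(a), spine(b)))
        assert nf == spine(a + 1 - b if a > b else b + 2 - a)  # Prop. 12
        assert dist % 2 == (1 if a > b else 0)                 # Prop. 13
print("Propositions 12 and 13 confirmed for 1 <= a, b <= 5")
```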
Proposition 14. Neither of C0 , C1 is rational.
Proof. Consider the intersection of C0 with the rational language {Fa Fb : a, b ≥ 1}. By Proposition 13, this is {Fa Fb : a ≤ b}, which is not rational. Since C1 = Term(Σ) \ C0, the class C1 is not rational either.
Proposition 15. Both C0 and C1 are algebraic.
Proof. We use an enriched version of the inverted rewrite rule, y x → (F x) y, to generate the set of all even-step (resp. odd-step) predecessors of normal forms. The number of reductions is counted in a “distributed” manner, as follows.
We replace the signature Σ = {F, •} by Σ′ = {F, •0, •1}. The morphism
expo : Term(Σ′) → {0, 1} : F ↦ 0, x •k y ↦ (expo(x) + expo(y) + k mod 2)
adds up the exponents modulo 2, and
drop : Term(Σ′) → Term(Σ) : F ↦ F, x •k y ↦ x • y
is the (alphabetic) morphism that just removes the exponents. Now we construct the language
H = {t ∈ Term(Σ′) : dist(drop(t)) ≡ expo(t) (mod 2)}.
Each t ∈ H contains, in its exponents, information about the parity of the length of the reduction of drop(t) to normal form. H is given by the algebraic grammar
H → F ∪ F •0 H,  x •i y → (F •1 y) •i x.
The languages C0, C1 are algebraic since Ci = drop(H ∩ expo−1(i)). Note that expo−1(i) is rational since it is an inverse image of a morphism to a finite set; then we use closure of algebraic languages w.r.t. intersection with rational ones, and finally closure of algebraic languages under (alphabetic) morphisms.
This construction easily generalizes:
Proposition 16. If the rewrite system R is linear, non-overlapping, and inverse monadic (i.e. R−1 is monadic), then its rewrite game has just two Sprague-Grundy classes, both of which are algebraic tree languages.
Note that the rewrite systems for octal games with all digits dr = 7, e.g. U^r → D, are linear and inverse monadic as well. However they are overlapping (e.g. for r = 2, there is the critical pair DU ← U U U → U D), and indeed non-confluent. Still we know that in some cases, their Sprague-Grundy sequence is eventually periodic (equivalently, the Sprague-Grundy classes are rational): R0·07 = {U^2 → D} has period length 34, R0·77 = {U → D, U^2 → D} has period 12, but periodicity is open for R0·007 = {U^3 → D}. So the question is, how can we lift the “non-overlapping” restriction in Proposition 16? It seems that the number and complexity of the Sprague-Grundy classes somehow measure the degree of “inconfluence” of the rewrite system.

5.2 Some More Combinators
Finally we mention two games that are unsolved, in the sense that we don't know a better way of computing Sprague-Grundy values than to expand the complete game tree.

The combinator K is the system K x y → x. For the Sprague-Grundy function sg = sg_K of this game, we have

Proposition 17. sg((K t1) t2) = sg(t1) ⊕ (1 + sg(t2)).

Proof. Instead of computing Sprague-Grundy values by the mex rule, we give a "strategic" proof. We write ∗n for any term t with sg(t) = n. We show that (K ∗m) ∗n is equivalent to the game ∗m + ∗(n + 1), by giving a winning strategy for the second player in the sum (K ∗m) ∗n + ∗m + ∗(n + 1). If the first player moves in ∗m, then the second player mirrors his move in the other copy of ∗m, and wins by induction. If the first player moves in ∗n, to some ∗n′, then the other one moves from ∗(n + 1) to ∗(n′ + 1), and vice versa. If the first player moves from ∗(n + 1) to some ∗0, then the second player moves from (K ∗m) ∗n to ∗m. The resulting game is ∗m + ∗m ≈ 0, a second-player win.

Proposition 18. sg((K K t1) t2) = (1 + sg(t1)) ⊕ sg(t2).

Proof. By Proposition 17, it is enough to show (K K ∗m) ∗n ≈ (K ∗n) ∗m. Moves in ∗m or ∗n can be mirrored, and a move (K K ∗m) ∗n → K ∗n will be
154
J. Waldmann
answered by (K ∗n) ∗m → ∗n, and vice versa. Of course K ∗n ≈ ∗n, since no reduction can use the outer K.
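The mex rule can be run literally on the finite game trees of K-terms, which gives a mechanical check of the formulas above on small terms. A sketch (Python; terms are encoded as nested pairs, the linear reading (K t1) t2 of the paper's tree diagrams is assumed, and the helper names are mine):

```python
from functools import lru_cache
from itertools import count

K = 'K'

def moves(t):
    """One-step rewrites of an applicative K-term under (K x) y -> x."""
    if t == K:
        return
    f, a = t
    if isinstance(f, tuple) and f[0] == K:
        yield f[1]                   # contract the root redex (K x) y -> x
    for f2 in moves(f):
        yield (f2, a)                # rewrite inside the function part
    for a2 in moves(a):
        yield (f, a2)                # rewrite inside the argument

def mex(s):
    return next(n for n in count() if n not in s)

@lru_cache(maxsize=None)
def sg(t):
    """Sprague-Grundy value by expanding the complete game tree."""
    return mex({sg(u) for u in moves(t)})

def terms(d):
    """All K-terms of application depth at most d."""
    yield K
    if d > 0:
        for f in terms(d - 1):
            for a in terms(d - 1):
                yield (f, a)

# Proposition 17: sg((K t1) t2) = sg(t1) XOR (1 + sg(t2))
for t1 in terms(2):
    for t2 in terms(2):
        assert sg(((K, t1), t2)) == sg(t1) ^ (1 + sg(t2))
```

The same machinery confirms Proposition 18 on these terms; the reductions are short, but the game trees already branch considerably.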
All numbers do occur as Sprague-Grundy values:

Proposition 19. There is a sequence c_0, c_1, ... of K terms with sg(c_n) = n.

Proof. By Proposition 17, we may take c_0 = K, c_{n+1} = K K c_n.

Proposition 20. sg(K (K K) t_1 t_1) = 0.

Proof. We show that this game is a win for the second player. Moves in t_1 will be mirrored in the other copy of t_1; the only other move goes → K K t_1, which will be answered by K K t_1 → K ≈ 0.

The K game seems rather difficult, although reductions are short. The difficulty comes from the fact that sg is not a morphism:

Proposition 21. There is a context C[] and terms t, t′ with sg(t) = sg(t′), such that sg(C[t]) ≠ sg(C[t′]).

Proof. Take C[] = (K [] ∗n) ∗n and t = K, t′ = K K. We have sg(K) = sg(K K) = 0 since both are normal forms. By Proposition 18, sg(C[t]) = sg((K K ∗n) ∗n) = (1 + n) ⊕ n ≠ 0, and sg(C[t′]) = sg(K (K K) ∗n ∗n) = 0 by Proposition 20.

The combinator D has the reduction rule D x y → x (x y).

Definition 22. Normal forms are right spines: D_0 = D, D_{n+1} = D D_n.

The game is difficult to analyze since it may take tower-of-exponentials time. This can be seen from:

Lemma 23. The normal form of D_a D_b is D_{2^a + b}, and each reduction to that form takes at least 2^a − 1 steps.

So terms may get huge, but they contain a lot of duplicated subtrees.
If these are on the right spine, their Sprague-Grundy values cancel:

Proposition 24. For all D terms t_1, t_2, sg(t_1 (t_1 t_2)) = sg(t_2).

Proof. We show that t_1 (t_1 t_2) ≈ t_2. Any move inside t_i will just be mirrored. If the inner expression t_1 t_2 is a redex, then t_1 = D t_3. Reduction leads to t_1 (t_3 (t_3 t_2)); but then the root is a redex as well, so we now have in fact (D t_3) (t_3 (t_3 t_2)), and the second player moves to t_3 (t_3 (t_3 (t_3 t_2))). By induction (twice), this is equivalent to t_2.
Proposition 25. sg((D t_1) t_2) = (1 + sg(t_1)) ⊕ sg(t_2).
Proof. Similar to Proposition 17.
Proposition 26. There is a sequence c_0, c_1, ... of D terms with sg(c_n) = n.

Proof. Take c_0 = D, c_{n+1} = (D c_n) D, and apply Proposition 25.
Proposition 24 does not extend easily to duplicated subtrees elsewhere, since again the Sprague-Grundy function is not a morphism. For both K and D, the rewrite system is non-linear (deleting resp. duplicating). Remember that we solved the games for I and F, with the rules being linear. Now the rewrite systems associated to octal games are linear as well (since they are word rewrite systems), but linearity alone does not imply boundedness of Sprague-Grundy values — there is even a word rewrite game that disproves it (see the forthcoming technical report).
6 Discussion
Summary. In this paper, we have introduced the field of (impartial) rewrite games, by giving definitions, examples, a few basic results, and some open questions. Further, we discussed how rewrite games might help to solve a long-standing open problem on combinatorial games. To carry out this plan, we would need to handle set constraints with rewrite projections.

Outlook. While the present paper dealt with impartial games (where both players always have identical options), game theory starts to get really interesting with partizan games (where option sets might differ). Most "natural" games (Chess, Go, ...) belong to this class. It is straightforward to define partizan rewrite games: each of the two players P1, P2 has his own rewrite system R1, R2, and they construct a reduction t1 →R1 t2 →R2 t3 →R1 ... There should be relations to modularity properties of rewrite systems.

Related work. Games do occur in various branches of logics and computer science. Two-person games are naturally related to formulas in predicate logic, because the sequence of alternating moves "me–you–me–you–..." corresponds to a sequence of quantifiers ∃∀∃∀... (There exists a move such that for all your moves there exists an answer move such that ...) This is applied in Ehrenfeucht-Fraïssé games to characterize expressiveness of certain logics (see, e.g., Chapters 2 and 3 in [EF99]). In turn, this leads to characterizations of complexity classes (for instance, QBF (quantified Boolean formula) is PSPACE-complete). Another application of games in logic is the proof of Rabin's theorem (decidability of existential monadic second-order logic on trees) via the forgetful determinacy theorem [GH82], which relates an accepting computation of an ω-automaton to a winning strategy in a game.
In complexity theory, one uses Pebble Games to investigate time/space tradeoffs. (These are one-person games.) Then there are the more recent developments of Arthur-Merlin games [BM88] and Interactive Proof systems [GMR89]. They characterize complexity classes, and provide means to validate knowledge without actually revealing it. These games typically have a pre-defined (short) number of moves, and one is interested in probabilities of success. This aspect differs from combinatorial games (and rewrite games). A lot of "natural" combinatorial games (parametrized variants of Chess, Go, etc.) are PSPACE-complete by reduction from QBF [GJ79]. On the other hand, there are complexity results on one-player games (puzzles like Sokoban or Atomix [Hol01]) that typically use a reduction from the Finite Automata Intersection Emptiness problem. This again is connected to the complexity of solving certain set constraints, which are EXPTIME- or NEXPTIME-hard (depending on the admitted operators). We are not aware of previous work on rewrite games as defined here. Vágvölgyi [Vág98] deals with the Ground Tree Transducer Game, which is a kind of rewrite game (since a GTT is a rewrite system), but the emphasis is on the Ehrenfeucht-Fraïssé method (one player wants to construct t1, t2 that are related by the GTT, the opponent wants to spoil this), while in our definition the game is normal impartial (both players have identical options, and both want to get the last move). In the combinatorial game theory literature, there is one topic that suggests a special form of rewrite games: Stromquist and Ullman [SU93] investigate sequential compounds of games, and Guy [Guy98b] suggests investigating compounds in arbitrary posets. This idea is again mentioned by Albert and Nowakowski [AN01] when they discuss how to generalize the game of End-Nim onto trees.
In our language, this would correspond to a rewrite game that is played at, or near to, the leaves of a tree, gradually pruning it in the process. Note that this could be seen as ground rewriting, which is, in a sense, complementary to the analysis of top rewrite games mentioned in the present paper. Relations between games and formal languages have been studied for the case of Peg Solitaire by Ravikumar [Rav97] (when played on strings, and strips of fixed width, winning positions form a rational language) and for its impartial two-player variant by Eppstein and Moore [ME00].

Conclusion. It is not at all surprising that combinatorial games and term rewriting are closely connected in both directions: on the one hand, a lot of games have "natural" presentations as rewrite games; on the other hand, game trees (and their canonical forms) just are terms (and Conway's "one line proofs" [Con76] are rewrite sequences). I think this is a fascinating area of research where much more waits to be discovered. You are invited to play the rewrite games from this paper against my computer program here: http://theopc.informatik.uni-leipzig.de/~joe/trs/.
Acknowledgements. I am grateful to Ingo Althöfer, Achim Flammenkamp, Dieter Hofbauer, Jörg Rothe, and Georg Snatzke, for explanations and discussions on games, rewriting, and complexity, as well as to Mirko Rahn, Christian Vogt, and the anonymous referees for careful reading and suggesting improvements of the presentation of this paper.
References

[AB95] Ingo Althöfer and Jörg Bültermann. Superlinear period lengths in some subtraction games. Theoretical Computer Science, 148:111–119, 1995.
[AN01] Michael H. Albert and Richard J. Nowakowski. The game of End-Nim. Electronic Journal of Combinatorics, 8(2), 2001.
[AW92] Alexander Aiken and Edward L. Wimmers. Solving systems of set constraints. In Seventh Annual IEEE Symposium on Logic in Computer Science, pages 329–340, 1992.
[BCG83] Elwyn R. Berlekamp, John H. Conway, and Richard K. Guy. Winning Ways for your Mathematical Plays. Academic Press, 1983. (A K Peters, 2001).
[BM88] L. Babai and S. Moran. Arthur-Merlin games: A randomized proof system, and a hierarchy of complexity classes. Journal of Computer and System Sciences, 36(2):254–276, 1988.
[BN98] Franz Baader and Tobias Nipkow. Term Rewriting and All That. Cambridge University Press, 1998.
[CDG+97] Hubert Comon, Max Dauchet, Rémi Gilleron, Denis Lugiez, Sophie Tison, and Marc Tommasi. Tree Automata Techniques and Applications. http://www.grappa.univ-lille3.fr/tata/, 1997.
[Con76] John H. Conway. On Numbers and Games. Academic Press, 1976. (A K Peters, 2001).
[CP94] Witold Charatonik and Leszek Pacholski. Set constraints with projections are in NEXPTIME. In IEEE Symposium on Foundations of Computer Science, pages 642–653, 1994.
[EF99] Heinz-Dieter Ebbinghaus and Jörg Flum. Finite Model Theory. Springer, 1999.
[Fla97] Achim Flammenkamp. Lange Perioden in Subtraktions-Spielen. Hans-Jacobs Verlag, Lage, Germany, 1997. http://wwwhomes.uni-bielefeld.de/achim/.
[Fra00] Aviezri Fraenkel. Combinatorial games: Selected bibliography with a succinct gourmet introduction. http://www.combinatorics.org/Surveys/index.html, 1994–2000. Electronic Journal of Combinatorics, Dynamic Surveys.
[GH82] Y. Gurevich and L. Harrington. Trees, automata, and games. In Proc. Symp. Theory of Computing, pages 60–65. ACM Press, 1982.
[GJ79] M. R. Garey and D. S. Johnson. Computers and Intractability. W. H. Freeman, 1979.
[GMR89] S. Goldwasser, S. Micali, and C. Rackoff. The knowledge complexity of interactive proof systems. SIAM Journal on Computing, 18(1):186–208, February 1989.
[GTT99] Rémi Gilleron, Sophie Tison, and Marc Tommasi. Set constraints and automata. Information and Computation, 149(1):1–41, 1999.
[Guy98a] Richard K. Guy. Impartial Games, pages 61–78. In Nowakowski [Now98], 1998.
[Guy98b] Richard K. Guy. Unsolved problems in combinatorial games, pages 475–492. In Nowakowski [Now98], 1998.
[Guy98c] Richard K. Guy. What is a game, pages 43–60. In Nowakowski [Now98], 1998.
[Hol01] Markus Holzer. Assembling molecules in Atomix is hard. Technical Report TUM-I0101, Technische Universität München, Institut für Informatik, 2001.
[ME00] Cristopher Moore and David Eppstein. One-dimensional peg solitaire, and duotaire. In MSRI Workshop on Combinatorial Games, to appear, 2000. http://www.santafe.edu/~moore/pubs/peg.html.
[Now98] Richard J. Nowakowski, editor. Games of No Chance. Cambridge University Press, 1998.
[Rav97] B. Ravikumar. Peg-solitaire, string rewriting systems and finite automata. In International Symposium on Algorithms and Computation, Singapore, 1997. http://homepage.cs.uri.edu/faculty/ravikumar/index.html.
[Spr35] Richard Sprague. Über mathematische Kampfspiele. Tohoku Math. J., 41:438–444, 1935.
[SU93] Walter Stromquist and Daniel Ullman. Sequential compounds of combinatorial games. Theoretical Computer Science, 119:311–321, 1993.
[Vág98] S. Vágvölgyi. The ground tree transducer game. Fundamenta Informaticae, (34):175–201, 1998.
An Extensional Böhm Model

Paula Severi (1) and Fer-Jan de Vries (2)

(1) Dipartimento di Informatica, Università di Torino, Corso Svizzera 185, 10149 Torino, Italy - [email protected]
(2) Department of Mathematics and Computer Science, University of Leicester, University Road, Leicester, LE1 7RH, UK - [email protected]
Abstract. We show the existence of an infinitary confluent and normalising extension of the finite extensional lambda calculus with beta and eta. Besides infinite beta reductions also infinite eta reductions are possible in this extension, and terms without head normal form can be reduced to bottom. As corollaries we obtain a simple, syntax-based construction of an extensional Böhm model of the finite lambda calculus; and a simple, syntax-based proof that two lambda terms have the same semantics in this model if and only if they have the same eta-Böhm tree, if and only if they are observationally equivalent wrt beta normal forms. The confluence proof reduces confluence of beta, bottom and eta via infinitary commutation and postponement arguments to confluence of beta and bottom and confluence of eta. We give counterexamples against confluence of similar extensions based on the identification of the terms without weak head normal form and the terms without top normal form (rootactive terms) respectively.
1 Introduction
In this paper we present a confluent infinitary extension λ^h∞_β⊥η of the extensional lambda calculus λβη. In earlier work confluent infinitary extensions of the lambda calculus λβ without the eta rule have been studied. Typically, confluence of such infinitary extensions cannot be obtained unless we add a form of bottom rule that identifies computationally insignificant terms with some added symbol ⊥. Different choices of computationally insignificant terms may lead to different confluent and normalising extensions. The main three choices, the set of terms without head normal form, the set of terms without weak head normal form and the set of terms without top normal form (rootactive terms), result in three different confluent and normalising calculi in which the normal forms are respectively known as Böhm trees, Lévy-Longo trees and Berarducci trees. In contrast, here for extensional lambda calculus we have no choice: only identification of all terms without head normal form with ⊥ results in the confluent, normalising calculus λ^h∞_β⊥η. The normal forms of this calculus are known as eta-Böhm trees [2]. As corollaries of confluence and normalisation we obtain a simple,
Partially supported by IST-2001-322222 MIKADO; IST-2001-33477 DART.
S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 159–173, 2002. c Springer-Verlag Berlin Heidelberg 2002
160
P. Severi and F.-J. de Vries
syntax-based construction of an extensional Böhm model Bη of the finite lambda calculus; plus a new and simple syntax-based proof that two lambda terms have the same semantics in this model if and only if they have the same eta-Böhm tree, if and only if they are observationally equivalent wrt beta normal forms. Hence this extensional Böhm model Bη equates more terms than Barendregt's Böhm model B in [3]. It induces the same equality relation as Park's model D*∞ of [16,6,17].

The eta rule has not been considered before in infinitary lambda calculus, mainly because of a counterexample [12,11] showing that arbitrary transfinite βη-reductions cannot be compressed into reductions of at most ω length, as in the case without eta, since infinite β-reduction can create an eta redex.¹ The confluence proof in [12] heavily depended on compression. The recent approach of [11] uses transfinite induction and postponement of ⊥-reduction over β-reduction. In this paper we will follow the postponement proof technique. Roughly speaking, we will show that any transfinite mixed β⊥η-reduction factors into a β⊥-reduction followed by an η-reduction. Confluence of β⊥η-reduction then follows from confluence of β⊥-reduction, commutation of β-reduction and ⊥-reduction with η-reduction, and confluence of η-reduction. The commutation of η-reduction and ⊥-reduction requires care: in general only outermost (or maximal [3]) ⊥-reduction commutes with η-reduction, which can be easily overlooked, already in finite lambda calculus.

The calculus λ^h∞_β⊥η is normalising: given a term, one first reduces it via a leftmost outermost β⊥-reduction to its Böhm tree, and then via a leftmost outermost η-reduction to its eta-Böhm tree. This two-step process cannot really be improved upon. We will show that at most ω + ω steps are needed to compute the eta-Böhm tree of a lambda term. Using the notation of the counterexample in the footnote against ω-compression, the term z(λx.(Ex)x)(λx.(Ex)x)(λx.(Ex)x) ...
is an example of a term that needs at least ω + ω steps to reduce to its eta-Böhm tree zy^ω y^ω y^ω .... This contrasts with the three infinitary extensions of λβ, where the reduction to normal form needs at most ω steps.
2 Infinite Lambda Calculus
Infinite lambda calculus houses, besides the usual finite terms and reductions, also infinite terms and infinite, converging reductions. It incorporates in a natural way the open-ended view on computation that, as the computation of a program proceeds, more and more information is read from the input, and more and more of the output is produced. We will now recall some notions and facts of infinite lambda calculus presented in [12,11]. We assume familiarity with basic notions and notations from [3].
Consider a term E with the property that Ex →*β y(Ex) (e.g. the term E = Ω_{λzw.y(zw)} in the notation of Definition 1) and the term y^ω = y(y(y(...))). Then λx.(Ex)x →*β λx.(y(Ex))x →*β λx.y(y(Ex))x →→^ω_β λx.y^ω x →η y^ω. This example is related to the example of 10.1.22 in [3].
Let Λ⊥ be the set of finite λ-terms given by the inductive grammar:

M ::= ⊥ | x | M M | λx.M

Let u be any finite sequence of 0, 1 and 2's. The subterm M|u of a term M ∈ Λ⊥ at occurrence u (if there is one) is defined by induction as usual:

M|ε = M    (λx.M)|0u = M|u    (M N)|1u = M|u    (M N)|2u = N|u
We will define three length-related measures for occurrences: length_h(u) is the number of 2's in u, length_w(u) is the number of 0's and 2's in u, and finally length_t(u) is the number of 0's, 1's and 2's in u. The depth at which a subterm N in M occurs can now be measured by the length of the occurrence u of N in M. This leads to three different metrics d_x on Λ⊥ for x ∈ {h, w, t}: d_x(M, N) = 0 if M = N, and d_x(M, N) = 2^{-length_x(u)} otherwise, where u is a common occurrence of minimal length such that M|u ≠ N|u. Now in the spirit of Arnold and Nivat [1] we define the sets Λ^h∞_⊥, Λ^w∞_⊥ and Λ^t∞_⊥ as the metric completions of the set of finite lambda terms Λ⊥ over the respective metrics d_h, d_w and d_t. The indices h, w and t stand for head normal form, weak head normal form and top normal form respectively. It may be illustrative to draw pictures and to think of terms as trees: draw the edges corresponding to the counted occurrences vertically and all other edges horizontally. Then trees in the three metric completions don't have infinite horizontal branches. In the case of Λ^h∞_⊥ this implies, when paths of branches are coded by sequences of 0, 1 and 2's, that its trees are characterised by the fact that their branches don't have infinite "tails" consisting of 0 and 1's only.

Definition 1. Some abbreviations for useful finite and infinite λ-terms:

I = λx.x      K∞ = λx.λx. ...          Ω = (λx.xx)λx.xx
1 = λxy.xy    M^ω = M(M(M ...))        Ω_M = (λx.M(xx))λx.M(xx)
K = λxy.x     ^ωM = ((... M)M)M        Ω_η = λx0.(λx1.(...)x1)x0
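The three metrics admit a direct recursive formulation on finite terms. The sketch below (Python; the term encoding and function names are mine, and α-conversion is ignored) shows a pair of terms that is at distance 1 in the h-metric yet close in the w- and t-metrics, the same phenomenon that puts K∞ in Λ^w∞_⊥ but not in Λ^h∞_⊥:

```python
COUNTED = {'h': {2}, 'w': {0, 2}, 't': {0, 1, 2}}

def dist(m, n, x):
    """d_x(M, N) = 2^(-length_x(u)) for a shortest differing occurrence u,
    where length_x counts only the edge labels listed in COUNTED[x]."""
    if m == n:
        return 0.0
    if m[0] == n[0] == 'lam':                 # edge label 0
        w = 0.5 if 0 in COUNTED[x] else 1.0
        return w * dist(m[1], n[1], x)
    if m[0] == n[0] == 'app':                 # edges 1 (function), 2 (argument)
        w1 = 0.5 if 1 in COUNTED[x] else 1.0
        w2 = 0.5 if 2 in COUNTED[x] else 1.0
        return max(w1 * dist(m[1], n[1], x), w2 * dist(m[2], n[2], x))
    return 1.0                                # the terms differ at the root

def lams(k, body):
    """k nested abstractions around body."""
    for _ in range(k):
        body = ('lam', body)
    return body

a, b = lams(3, ('var', 'y')), lams(3, ('var', 'z'))
print(dist(a, b, 'h'), dist(a, b, 'w'), dist(a, b, 't'))   # -> 1.0 0.125 0.125
```

A difference hidden under abstractions stays at h-distance 1 no matter how deep it lies, which is exactly why the sequence of finite prefixes of K∞ is Cauchy for d_w but not for d_h.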
The inclusions Λ^h∞_⊥ ⊂ Λ^w∞_⊥ ⊂ Λ^t∞_⊥ are strict. For instance, x^ω ∈ Λ^h∞_⊥, K∞ ∈ Λ^w∞_⊥ − Λ^h∞_⊥ and ^ωx ∈ Λ^t∞_⊥ − Λ^w∞_⊥. As shown in [12,11] the sets Λ^h∞_⊥, Λ^w∞_⊥ and Λ^t∞_⊥ are the three minimal infinitary extensions of the finite lambda calculus λβ containing respectively the Böhm trees, Lévy-Longo trees and the Berarducci trees, and each is closed under its respective notion of convergent reduction, to be defined next. The set Λ^h∞_⊥ will function as the underlying set of finite and infinite lambda terms for our extensional infinitary extension λ^h∞_β⊥η. Many notions of finite lambda calculus apply and/or extend more or less straightforwardly to the infinitary setting. The main idea, which goes back to Dershowitz e.a. in [8], is that reduction sequences can be of any transfinite ordinal length α:
M0 → M1 → M2 → ... Mω → Mω+1 → ... Mω+ω → Mω+ω+1 → ... Mα

This makes sense if the limit terms Mω, Mω+ω, ... in such a sequence are all equal to the corresponding Cauchy limits lim_{β→λ} Mβ in the underlying metric
space, for any limit ordinal λ ≤ α. If this is the case, the reduction M0 →^α Mα is called Cauchy converging. We need the stronger concept of a strongly converging reduction that in addition satisfies that the depth of the reduced redexes goes to infinity at each limit term: lim_{β→λ} d_β = ∞ for each limit ordinal λ ≤ α, where d_β is the depth in Mβ of the reduced redex in Mβ → Mβ+1. We will denote strongly converging reduction by →→. Any finite reduction M0 →* Mn is strongly converging.

Finally we have to introduce the basic reduction relations of λ^h∞_β⊥η. Besides the familiar beta and eta rules it contains one of the bottom rules ⊥h (or ⊥ for short). We define three bottom rules ⊥x where x ∈ {h, w, t} as follows:

M → ⊥, provided M[⊥ := Ω] ∈ Ux    (⊥x)

We usually omit the subscript x and we write ⊥ instead of ⊥x. Outermost bottom reduction, denoted as →⊥out, is the restriction of bottom reduction to outermost redexes. The sets Ux are sets of ⊥-free finite and infinite lambda terms, defined as follows:

1. The (for this paper most important) set Uh is the set of ⊥-free terms in Λ^h∞_⊥ that don't have (a finite β-reduction to) a head normal form, where a term is a head normal form if it is of the form λx1 ... λxn.yM1 ... Mm.
2. The set Uw is the set of ⊥-free terms in Λ^w∞_⊥ that don't have (a finite β-reduction to) a weak head normal form, where a term is a weak head normal form if it is either a term of the form yM1 ... Mn or an abstraction λx.M.
3. The set Ut is the set of ⊥-free terms in Λ^t∞_⊥ that don't have (a finite β-reduction to) a top normal form (rootstable form), where a term is a top normal form if it is either an abstraction λx.M, or an application M N where M cannot β-reduce (in a finite number of steps) to an abstraction.

Some examples: The term Ω does not have a top normal form and therefore belongs to all three sets. The term Ωx is a top normal form but has no (weak) head normal form. The term λx.Ω is a weak head normal form, but has no head normal form. Hence all inclusions in Uh ⊃ Uw ⊃ Ut are strict.

The extensional infinite lambda calculus that is the main object of study in this paper is the calculus λ^h∞_β⊥η = (Λ^h∞_⊥, →β⊥η). We will also briefly mention λ^x∞_β⊥η = (Λ^x∞_⊥, →β⊥η) for x ∈ {w, t}. For any of these infinite lambda calculi λ^x∞_ρ we say that

- a term M in λ^x∞_ρ is in ρ-normal form if there is no N in λ^x∞_ρ such that M →ρ N.
- λ^x∞_ρ is infinitary confluent (or just confluent for short) if (Λ^x∞, →→ρ) satisfies the diamond property, i.e. ρ←← ∘ →→ρ ⊆ →→ρ ∘ ρ←←.
- λ^x∞_ρ is (weakly) normalising if for all M ∈ Λ^x∞ there exists an N in ρ-normal form such that M →→ρ N.
- Let α be an ordinal.
We say that λ^x∞_ρ is α-compressible if for all M, N such that M →→ρ N there exists a reduction from M to N of length at most α.
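The classification of the example terms Ω, Ωx and λx.Ω above can be checked mechanically for these particular terms, because their head reductions cycle. The sketch below (Python, de Bruijn encoding; all helper names are mine, and the fuel-bounded search returns None when it cannot decide) reproduces exactly the separation used above:

```python
def shift(t, d, c=0):
    """Shift free de Bruijn indices >= c by d."""
    tag = t[0]
    if tag == 'var':
        return ('var', t[1] + d) if t[1] >= c else t
    if tag == 'lam':
        return ('lam', shift(t[1], d, c + 1))
    return ('app', shift(t[1], d, c), shift(t[2], d, c))

def subst(t, s, j=0):
    """Substitute s for index j in t (standard de Bruijn beta step)."""
    tag = t[0]
    if tag == 'var':
        if t[1] == j:
            return shift(s, j)
        return ('var', t[1] - 1) if t[1] > j else t
    if tag == 'lam':
        return ('lam', subst(t[1], s, j + 1))
    return ('app', subst(t[1], s, j), subst(t[2], s, j))

def whnf_step(t):
    """Contract the head redex, or None if t is a weak head normal form."""
    if t[0] != 'app':
        return None                  # abstraction or variable
    spine, h = [], t
    while h[0] == 'app':
        spine.append(h[2])
        h = h[1]
    if h[0] != 'lam':
        return None                  # variable-headed spine
    r = subst(h[1], spine.pop())
    while spine:
        r = ('app', r, spine.pop())
    return r

def has_whnf(t, fuel=100):
    """True/False when provable within the fuel bound, None otherwise."""
    seen = set()
    while fuel > 0:
        if t in seen:
            return False             # head reduction cycles: no whnf
        seen.add(t)
        nxt = whnf_step(t)
        if nxt is None:
            return True
        t, fuel = nxt, fuel - 1
    return None

def has_hnf(t, fuel=100):
    seen = set()
    while fuel > 0:
        while t[0] == 'lam':         # a hnf may start with abstractions
            t = t[1]
        if t in seen:
            return False
        seen.add(t)
        nxt = whnf_step(t)
        if nxt is None:
            return True              # variable-headed body: hnf reached
        t, fuel = nxt, fuel - 1
    return None

V0 = ('var', 0)
DELTA = ('lam', ('app', V0, V0))
OMEGA = ('app', DELTA, DELTA)        # Ω reduces to itself in one step
print(has_whnf(OMEGA), has_hnf(OMEGA))                     # False False
print(has_whnf(('lam', OMEGA)), has_hnf(('lam', OMEGA)))   # True False
print(has_whnf(('app', OMEGA, ('var', 0))))                # False: Ωx has no whnf
```

Ω reaches itself again after one head step, so the cycle check is sound here; in general the absence of a (weak) head normal form is of course undecidable, and the fuel bound only gives evidence. Since Ω never reduces to an abstraction, Ωx is nevertheless a top normal form, witnessing Uh ⊃ Uw ⊃ Ut.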
Without the bottom rule there is no chance of proving confluence. Berarducci's counterexample [5] is very short: Ω ←*β Ω_I →→β I^ω. Crucial properties of those three infinite lambda calculi λ^h∞_β⊥, λ^w∞_β⊥ and λ^t∞_β⊥ are:

Theorem 1. Confluence, normalisation and compression of β⊥ [12,11]. The infinite lambda calculi λ^h∞_β⊥, λ^w∞_β⊥ and λ^t∞_β⊥ are confluent, normalising and ω-compressible.

Theorem 2. Postponement of β over ⊥ [11]. If M →→β⊥ N then there exists Q such that M →→β Q →→⊥ N.
3 Two Non-confluent Extensional Infinite Lambda Calculi
Before we give the proof of confluence of λ^h∞_β⊥η, we will show that the two related extensional infinite lambda calculi λ^w∞_β⊥η and λ^t∞_β⊥η are not confluent. In fact what we note is that already the finite calculi λ^w_β⊥η and λ^t_β⊥η are not confluent for finite reductions. For this we use the term Ωη ∈ Λ^w∞_⊥ ⊂ Λ^t∞_⊥. Similar to Ω, which β-reduces to itself in only one step, this term η-reduces to itself in only one step. The body of the outermost abstraction in Ωη has no weak (top) normal form. The span ⊥ ←⊥ Ω_I ←←η Ω_1 →→β Ωη →⊥ λx.⊥ can only be joined if λx.⊥ →⊥ ⊥, which does not hold for the ⊥w-rule and the ⊥t-rule. Hence this is a counterexample to confluence of both λ^w∞_β⊥η and λ^t∞_β⊥η.

Remark also that there is a critical pair between the eta rule and each bottom rule: λx.⊥ ←⊥ λx.⊥x →η ⊥. The ⊥-step follows from the fact that the term ⊥x[⊥ := Ω] = Ωx has no weak head normal form. This pair can be completed only if λx.⊥ →⊥ ⊥, which is true for the ⊥h-rule, but not for the ⊥w-rule. This gives an alternative counterexample against confluence of λ^w∞_β⊥η.
4 The Confluent Extensional Infinite Lambda Calculus λ^h∞_β⊥η
In this section we will prove the confluence and normalisation of λ^h∞_β⊥η. Thereto we will first prove some useful properties of the infinitary eta calculus λ^h∞_η, secondly the commutation of eta and beta and the commutation of eta and outermost bottom, and thirdly postponement of eta over beta and bottom.

4.1 The Infinitary Eta Calculus λ^h∞_η
The set Λ^h∞_⊥ has the very pleasant property that any Cauchy-converging η-reduction sequence in Λ^h∞_⊥ is h-strongly convergent. This property is not shared by the other two infinitary extensions Λ^w∞_⊥ and Λ^t∞_⊥. For instance there exists
a Cauchy-converging and non-h-strongly converging η-reduction sequence from the term Ωη ∈ Λ^w∞_⊥ ⊂ Λ^t∞_⊥.

Definition 2. For M ∈ Λ^h∞_⊥ define |M|n as the number of nodes at h-depth n.

The number |M|n of nodes of a term M at h-depth n decreases if and only if we contract an η-redex in M at h-depth n. Hence:

Lemma 1. Any Cauchy-converging η-reduction starting from a term in Λ^h∞_⊥ is h-strongly convergent.

Proof. Suppose by contradiction that we have some transfinite Cauchy-converging η-reduction sequence M0 →η M1 →η ... that reduces infinitely often at h-depth n. Then infinitely many of the inequalities in the next sequence are strict:

|M0|n ≥ |M1|n ≥ |M2|n ≥ ...

But infinite strictly decreasing sequences of natural numbers don't exist. Hence the limit of the h-depth of the contracted redexes in this sequence goes to infinity at each limit ordinal ≤ α. Therefore any η-reduction sequence starting from a term in Λ^h∞_⊥ is h-strongly converging.

Lemma 2. The infinitary lambda calculus λ^h∞_η is ω-compressible.

Proof. Let M →→η N be a reduction sequence of length γ. We will prove that there exists a sequence from M to N of length at most ω. The proof proceeds by transfinite induction on γ. The argument of the limit case is standard, see [11]. If γ is a successor ordinal, it is sufficient to prove that a sequence of length ω + 1 can be compressed into one of length ω. Without loss of generality, we may suppose that we have a strongly converging η-reduction sequence of length ω + 1 from M0 to Mω of the form

M0 →*η λx.Mk x →η λx.Mk+1 x →η λx.Mk+2 x →η ... λx.Mω x →η Mω,

where λx.Mi x →η Mi for each i, and Mk →η Mk+1 →η Mk+2 →η ... has limit Mω. Then, working to the right onwards from λx.Mk x, the corresponding projection squares can all be constructed. This results in a reduction of length ω starting at M0 and after k + 1 steps continuing as Mk →η Mk+1 →η .... The limit of this converging reduction is clearly Mω.
Then working to the right onwards from λx.Mk x the dotted squares can be all constructed. This results in a reduction of length ω starting at M0 and after k + 1 steps continuing as Mk →η Mk+1 →η . . . . The limit of this converging reduction is clearly Mω . In the following lemma, as well as later for other reduction relations, we indicate m with a superscript that the redex contracted in M →η N is at depth m. m
n
Lemma 3. Let M0 be a term in Λh∞ ⊥ . If M0 →η M1 and M0 →η M2 then one of the following two cases holds: M0
m η
n η
M2
m η
/ M1
M0
n η
n η
/ M3
M2
n η
/ M1 M3
An Extensional B¨ ohm Model
165
Proof. Note that the h-depth of an η-redex in a term does not change when we contract an η-redex elsewhere in the term. is confluent. Theorem 3. The infinitary lambda calculus λh∞ η Proof. Let two coinitial η-reductions be given. By compression (Lemma 2) we may assume that their length is at most ω. By simultaneous induction on their length we show that for any two coinitial η-reductions can be joined with a so called tiling diagram construction [11] in which all horizontal and vertical reductions are strongly converging. Suppose we have two η-reductions of length ω: M0,0 → →η M0,ω and M0,0 → →η Mω,0 . We will not present the whole induction, but comment on the main cases. The successor-successor case of the induction is in fact the previous lemma. The successor-limit case: this follows if we can construct the following tiling = diagram in which the notation M →η N expresses that in the reduction from M to N in which at most one reduction step has been performed: at h-depth m. M0,0 m0 η
M1,0
n0 η
/ M0,1
m0 /= η n0 /= η
/ M1,1
n1 η
/ M0,2
m0 /= η n1 /0 η
M0,ω m0 /= η
M1,ω
/ M1,2
By induction hypothesis this diagram can be constructed but for the right most edge. The bottom reduction inherits the (strong) convergence property from the top reduction. As we are dealing with η-reduction it is easy to see that there is at most one residual of the redex contracted in M0,0 →η M1,0 in M0,ω . Contraction of this set of residuals gives the reduction of the rightmost edge: it will be at most one step. The limit-limit case: Using the induction hypothesis we can construct diagrams for pair of coinitial sequences of respective lengths (n, m) where either n < ω and m ≤ ω, or m < ω and n ≤ ω. By the general tiling diagram theorem in [11] or in this particular instance simply the uniform nature of the strong convergence of all vertical and horizontal reductions it follows that their limits have to be the same: Mω,ω .
M0,0 m0 η
M1,0 m1 η
M2,0
Mω,0
n0 η
/ M0,1
m0 /= η n0 /= η
/ M1,1
m1 /= η n0 /= η
n0 /= η
n1 η
/ M0,2
m0 /= η n1 /0 η
/ M1,2
m1 /= η
/ M2,1
n1 /=
/ Mω,1
n1 /=
η
η
M0,ω m0 /= η
M1,ω
m1 /= η
/ M2,2
M2,ω
/ Mω,2
Mω,ω
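The finite heart of the tiling construction, namely Lemma 3's case analysis for a single pair of η-steps, can be checked mechanically on finite terms. A sketch in a de Bruijn encoding (the helper names are mine): nested η-redexes contract to the same term, and disjoint ones close the square:

```python
def shift(t, d, c=0):
    """Shift free de Bruijn indices >= c by d."""
    tag = t[0]
    if tag == 'var':
        return ('var', t[1] + d) if t[1] >= c else t
    if tag == 'lam':
        return ('lam', shift(t[1], d, c + 1))
    return ('app', shift(t[1], d, c), shift(t[2], d, c))

def free_in(t, i):
    tag = t[0]
    if tag == 'var':
        return t[1] == i
    if tag == 'lam':
        return free_in(t[1], i + 1)
    return free_in(t[1], i) or free_in(t[2], i)

def eta_reducts(t):
    """All one-step η-contractions  λ.(M 0) -> M  of a de Bruijn term."""
    out = []
    tag = t[0]
    if tag == 'lam':
        b = t[1]
        if b[0] == 'app' and b[2] == ('var', 0) and not free_in(b[1], 0):
            out.append(shift(b[1], -1))
        out.extend(('lam', r) for r in eta_reducts(b))
    elif tag == 'app':
        out.extend(('app', r, t[2]) for r in eta_reducts(t[1]))
        out.extend(('app', t[1], r) for r in eta_reducts(t[2]))
    return out

I = ('lam', ('var', 0))
E = ('lam', ('app', I, ('var', 0)))   # λy.(I y): an η-redex contracting to I

# nested case of Lemma 3: λx.(λy.(I y)) x has two η-redexes,
# and contracting either yields the same term λy.(I y):
T = ('lam', ('app', E, ('var', 0)))
assert set(eta_reducts(T)) == {E}

# disjoint case: the two contractions commute to a common term:
P = ('app', E, E)
r1, r2 = ('app', I, E), ('app', E, I)
assert set(eta_reducts(P)) == {r1, r2}
assert ('app', I, I) in eta_reducts(r1) and ('app', I, I) in eta_reducts(r2)
```

Only the single base tile is finite, of course; the transfinite part of the argument lives in the strong-convergence bookkeeping above.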
Theorem 4. The infinitary lambda calculus λ^h∞_η is normalising.
Proof. Let M0 be given. We construct a reduction M0 →η M1 →η M2 ... recursively: given Mn, we construct Mn+1 by contracting the leftmost η-redex of smallest h-depth. This gives us a possibly infinite reduction which by Lemma 1 can only be strongly converging. Its last term is an η-normal form. This is trivial in case of a finite reduction; in case of an infinite reduction, there is a standard reductio ad absurdum argument: Suppose there is an η-redex in the limit term; then this η-redex was already present at some finite stage in the reduction and apparently not reduced in the remaining reduction. But the reduction strategy is such that no redex can get overlooked. Contradiction.

4.2 Commutation
In this section we prove that the reduction →→η commutes with both →→β and →→⊥. As each of these reduction relations on its own is ω-compressible, we can assume that the length of the reductions is at most ω.

Theorem 5. [Commutation of →→β and →→η] If M0,0 →→β M0,γ and M0,0 →→η Mδ,0 then there exists Mδ,γ such that M0,γ →→η Mδ,γ and Mδ,0 →→β Mδ,γ.

Proof. A double induction on the lengths of the two sequences, respectively γ and δ. The critical cases in this proof are: (1, 1), (1, ω), (ω, 1) and (ω, ω).

(γ, δ) = (1, 1): This case is a careful analysis of cases based on the relative positions of the β-redex and the η-redex. Suppose M0 can do both a β-reduction and an η-reduction, at h-depth n and m respectively. The possible situations are:
1. The redexes do not interfere with each other:
   a) The redexes are not nested, i.e. M0,0 = C[(λx.M)N, (λy.P y)].
   b) The β-redex is inside the η-redex, that is, M0,0 is of the form C1[λx.C2[(λy.M)N]x].
   c) The η-redex is inside the body of the abstraction, that is, M0,0 is of the form C1[(λx.C2[λy.M y])N].
2. The η-redex is inside the argument of the application, that is, M0,0 is of the form C1[(λx.M)C2[λy.N y]].
3. The β-redex and the η-redex overlap, that is, M0,0 is either C[(λx.M x)N] or C[λx.(λy.M)x].
These possibilities result in three different diagrams (the labels on the arrows refer to the h-depth of the contracted redex): in the first, the η-step at h-depth m and the β-step at h-depth n commute, closing the square M0,0 →η M0,1, M0,0 →β M1,0, M1,0 →η M1,1, M0,1 →β M1,1; in the second, the β-step at h-depth n is followed by a (possibly empty or duplicated) η-reduction M1,0 →→η M1,1 at h-depth ≥ m − 1; in the third, the overlapping η- and β-steps, both at h-depth n, yield the same term. [Diagrams omitted.]
It is important to note that the h-depth of the eventual residual of the β-redex remains the same after contraction of the η-redex.
An Extensional Böhm Model
167
(γ, δ) = (1, ω): This case is simple. The construction of the commutation diagram comes down to an infinite horizontal chain of base case diagrams. Either the β-step gets cancelled against one of the η-steps or not. This implies that from a certain point onwards the vertical edges are either all equalities or all β-steps.

(γ, δ) = (ω, 1): This case is somewhat involved. Using the previous case we construct, for any natural number n, the horizontal reduction Mn,0 →→η Mn,1. This way we get the vertical reduction on the right, M0,1 →β M1,1 →β M2,1 →β . . ., which is strongly converging because its reduction steps take place at the same depth as the corresponding steps on the left. Hence it has a limit, say Mω,1. In fact each horizontal η-reduction is the complete development of a set Vn of (occurrences of) η-redexes in Mn,0. The missing η-reduction at the bottom can now be filled in: the complete development of the set of (occurrences of) η-redexes ⋃k≥0 ⋂m≥k Vm gives us precisely a strongly convergent reduction Mω,0 →→η Mω,1. [Diagram omitted.]

(γ, δ) = (ω, ω): Repeated application of the previous two cases allows us to construct the tiling diagram, with horizontal η-reductions Mi,0 →→η Mi,1 →→η Mi,2 . . . Mi,ω and vertical β-reductions, for 0 ≤ i ≤ ω. The vertical β-reductions are all strongly converging in a uniform way: at most some β-steps can get cancelled, but the h-depth of the remaining β-residuals remains unaltered. The horizontal η-reduction sequences cannot be anything but strongly converging, by Lemma 1. Using the tiling diagram theorem of [12], or the uniform (nature of the) strong convergence of all vertical reductions, we get that the bottom reduction and the rightmost reduction both end in the same limit term. [Diagram omitted.]

Commutation of →→η with →→⊥ does not hold, as shown by the counterexample Ω η← λx.Ωx →⊥ λx.⊥. □
However, if we restrict ourselves to leftmost outermost ⊥-reduction we can prove commutation of →→η with →→⊥. First a lemma saying that ⊥-redexes are preserved under η-reduction.

Lemma 4. Let M ∈ Λh∞⊥ be a ⊥-free term and M →→η N.
1. If M is a β-head normal form then so is N.
2. If N is a β-head normal form then M has a β-head normal form.
3. M ∈ Uh if and only if N ∈ Uh.

Proof. 1. Standard inductive argument on the length of the η-reduction.
2. Without loss of generality we may assume that an η-reduction ending in a β-head normal form is finite. By induction on the length n of this reduction we will show that there is a reduction of linear β-redexes from the initial
P. Severi and F.-J. de Vries
term to a β-hnf, where a β-redex (λx.P)Q is linear if the bound variable x occurs free in P at most once. The base case, when n = 0, is trivial. Induction step: suppose M →η P →nη N and N is a β-hnf. By the induction hypothesis there exists P′ in β-head normal form such that P →∗β P′ by contraction of linear β-redexes. By Lemma 5 (its proof depends neither on this result nor on the next theorem) we can postpone the η-step from M to P over the linear β-reduction to P′, so that we get
M →∗β M′ →=η P′, where the β-reduction contracts linear β-redexes. An easy case analysis shows that either M′ is in β-head normal form or reduces to one by contracting a linear β-redex.
3. Suppose that N β-reduces to a head normal form. Then M β-reduces to a head normal form by postponement of η-reduction over β-reduction (Lemma 5) and the previous part. The converse is proved similarly, using commutation instead of postponement.

Theorem 6. Commutation of →→η with →→⊥out. If M0,0 →→⊥out M0,γ and M0,0 →→η Mδ,0 then there exists Mδ,γ such that M0,γ →→η Mδ,γ and Mδ,0 →→⊥out Mδ,γ.

Proof. The induction proof proceeds as the previous proof of Theorem 5. We skip all cases but:
(γ, δ) = (1, 1): A careful analysis of cases based on the relative positions of the ⊥out-redex (denoted by U below) and the η-redex leads to two basic situations:
– The ⊥-redex is inside the η-redex: M0,0 ≡ C1[λx.C2[U]x].
– The η-redex is inside the ⊥-redex: M0,0 ≡ C1[C2[λx.M x]] ≡ C1[U].
These two cases result in the following two diagrams:
[Diagrams omitted: in the first, the η-step at h-depth m and the ⊥out-step at h-depth n commute, closing the square M0 →η M1, M0 →⊥out M2, M1 →⊥out M3, M2 →η M3; in the second, the η-redex lies inside the subterm replaced by the ⊥out-step, so M0 →η M1 and both M0 and M1 ⊥out-reduce to the same term M2.]
Note that the h-depth of the ⊥out-redex and that of its eventual residual after contraction of the η-redex are the same. The last case follows from Lemma 4.

4.3 Postponement
We will prove that η-reduction can be postponed in mixed β⊥η-reductions, and hence also in mixed βη-reductions. We first need two preparatory lemmas.

Lemma 5. Let γ, δ ≤ ω. If M0,0 →→η M0,γ →→β Mδ,γ, then there exists an Mδ,0 such that M0,0 →→β Mδ,0 →→η Mδ,γ. If M0,γ →→β Mδ,γ is finite, then M0,0 →→β Mδ,0 will be finite as well.
Note: in (λx.(λy.M)x)N →η (λy.M)N →β M[x := N] the resulting reduction after postponement of the η-reduction requires two β-reduction steps.

Proof. The proof is again a double induction on the lengths of the two reductions. In the proof we try to reconstruct a tiling diagram with top edge M0,0 →→η M0,γ, right edge M0,γ →→β Mδ,γ, left edge M0,0 →→β Mδ,0 and bottom edge Mδ,0 →→η Mδ,γ.
The proof by induction is an interesting variation on the proof of commutation of β and η. We skip all cases but one:
(γ, δ) = (ω, 1): Suppose the contracted redex in the final β-reduction M0,ω →β M1,ω has h-depth n. Then, because M0,0 →→η M0,ω is strongly converging, there is a k such that for i ≥ k the depth of all remaining η-redexes in M0,k →→η M0,ω is at least n.
[Diagram omitted: M0,0 →∗η M0,k →→η M0,ω on top; β-steps at h-depth n down to M1,0, M1,k and M1,ω, the left square holding by the induction hypothesis; on the bottom, M1,0 →→η M1,k with η-steps at h-depth > n, followed by M1,k →→η M1,ω.]
Hence all M0,i for i ≥ k contain a β-redex at the same position as the one contracted in M0,ω. Let M1,i be the result of contracting that β-redex in M0,i. Since the η-reduction at M0,i takes place at depth at least n, we get case 1 (a and c) and case 2 of the proof of Theorem 5, and hence M1,i →→η M1,i+1. By Lemma 1 this reduction sequence is strongly converging and its limit coincides with M1,ω. An appeal to Theorem 5 completes the proof of this case.

The next lemma is proved in a similar way.

Lemma 6. If M0,0 →→η M0,γ →→⊥ Mδ,γ, then there exists an Mδ,0 such that M0,0 →→⊥ Mδ,0 →→η Mδ,γ. If M0,γ →→⊥ Mδ,γ is finite, then M0,0 →→⊥ Mδ,0 will be finite as well.

Proof. The proof is a double induction. We only do the case (δ, γ) = (1, 1), and skip the rest. By case analysis and using Lemma 4 we obtain two diagrams: in the first, the η-step at h-depth m and the ⊥-step at h-depth n commute, closing the square M0,0 →η M0,1, M0,0 →⊥ M1,0, M0,1 →⊥ M1,1, M1,0 →η M1,1; in the second, the η-step is erased and both sides ⊥-reduce at h-depth n to the same term M1,1. [Diagrams omitted.]
Combining the previous two lemmas with Theorem 2 we get:
Corollary 1. If M0,0 →→η M0,γ →→β⊥ Mδ,γ, then there exists an Mδ,0 such that M0,0 →→β⊥ Mδ,0 →→η Mδ,γ. If M0,γ →→β⊥ Mδ,γ is finite, then M0,0 →→β⊥ Mδ,0 will be finite as well.

Theorem 7. Postponement of →→η over →→β⊥. If M →→β⊥η N, then there exists an L such that M →→β⊥ L →→η N.

Proof. The proof is by (a genuine!) transfinite induction on the number of subsequences of the form M1 →→η M2 →→β M3 in M →→β⊥η N. The base case is trivial; the successor case follows directly from Corollary 1. We only show the limit case for the limit ω. The proof for arbitrary limits is similar. Consider:
M ≡ M1,0 →→η M1,1 →→β⊥ M2,1 →→η M2,2 →→β⊥ M3,2 . . . . . . Mω,ω ≡ N
Using the induction hypothesis we construct, row by row, a tiling diagram with horizontal rows Mi,0 →→η Mi,1 →→η . . . →→η Mi,i, vertical β⊥-reductions Mi,j →→β⊥ Mi+1,j, and bottom row Mω,0 →→η Mω,1 →→η Mω,2 →→η . . . →→η Mω,ω. [Diagram omitted.]
Because the diagonal is strongly converging and the horizontal η-reductions do not change the depth of the vertical β⊥-reductions, all vertical reductions are strongly convergent as well and have limits. By the induction hypothesis they are connected by η-reduction steps. By Lemma 1 the combined reduction at the bottom row is strongly converging. By the uniform nature of the strong convergence of all vertical reductions, the limits of the bottom row reduction and of the diagonal are the same.

Corollary 2. Let M, N ∈ Λh∞⊥.
1. If M →→η N, then nfβ⊥(M) →→η nfβ⊥(N).
2. If M →→β⊥η N, then nfβ⊥(M) →→η nfβ⊥(N).

Proof. To prove the first item we postpone ⊥ over β (Theorem 2). Next, because the ⊥-reduction ends in a β⊥-normal form, we can remove all non-outermost ⊥-steps. Then we can construct a diagram using the two commutation theorems. The term N2 is in β⊥-normal form, and hence by the unique β⊥-normal form property (Theorem 1) we get N2 = nfβ⊥(N).
[Diagram omitted: M →→η N on top; on the left M →→β M1 →→⊥out nfβ⊥(M); by Theorem 5, N →→β N1 with M1 →→η N1, and by Theorem 6, N1 →→⊥out N2 with nfβ⊥(M) →→η N2.]
For the proof of the second item we build a diagram using postponement of η-reduction over β⊥-reduction (Theorem 7) and the previous part: M →→β⊥η N is first rearranged into M →→β⊥ P →→η N; then nfβ⊥(M) = nfβ⊥(P) by the unique β⊥-normal form property (Theorem 1), and nfβ⊥(P) →→η nfβ⊥(N) by the first item. [Diagram omitted.]
4.4 Confluence and Normalisation
Finally we prove that λh∞ β⊥η is confluent, normalising and ω + ω-compressible. Theorem 8. Confluence of β⊥η. The extensional infinite lambda calculus λh∞ β⊥η is confluent. Proof.
[Diagram omitted: given M0 →→β⊥η M1 and M0 →→β⊥η M2, Corollary 2 yields nfβ⊥(M0) →→η nfβ⊥(M1) and nfβ⊥(M0) →→η nfβ⊥(M2), and by confluence of η (Theorem 3) these η-reductions can be joined in a common term M3.]
Our proof technique using postponement has the flavour of the confluence proof for finite λβη of Curry and Feys [7], and may therefore seem to be related to the confluence proof of the finite λβ⊥η of 15.2.15(ii) in [3]. The latter proof makes use of η-normal forms, whereas we use the auxiliary notion of β⊥-normal forms. Note, however, that η-normal forms don't work, as can be seen by applying the proof technique of [3] to the coinitial reductions Ω η← λx.Ωx →⊥ λx.⊥. In its compact form our proof does not directly restrict to a confluence proof for the finite λβ⊥η. But it is not hard to distill from the previous proofs a proof for the finite setting that makes use of ⊥-normal forms instead of β⊥-normal forms. It may be of interest to see whether other proofs of confluence for the finite λβ⊥η can be generalised to the infinitary setting, for instance the older proof of Barendregt, Bergstra, Klop and Volken [4] and the more recent proof by van Oostrom [15].
Theorem 9. Normalisation and compression of β⊥η. The extensional infinite lambda calculus λh∞β⊥η is normalising and ω + ω-compressible.

Proof. By Theorems 1, 3 and 2, the calculi λh∞β⊥ and λh∞η are confluent, normalising and ω-compressible. The β⊥η-normal form of a term can be obtained by first computing the β⊥-normal form and then the η-normal form. To prove that λh∞β⊥η is ω + ω-compressible we use postponement of η over β and ⊥. Finally, recall the last nine lines of the introduction.
5 Eta-Böhm Trees and the Extensional Böhm Model
As a corollary of the confluence and normalisation results we see that each term in λh∞β⊥η has a unique β⊥η-normal form, its eta-Böhm tree. Hence we can construct an extensional Böhm model Bη for both λβη and λh∞β⊥η almost for free. In the notation of [3] we define the triple Bη = (Bη, ·, [[ ]]) as follows:
1. Bη is the set of β⊥η-normal forms of terms in Λh∞⊥.
2. M · N = nfβ⊥η(M N) for all M, N in Bη.
3. [[M]]ρ = nfβ⊥η(Mρ), where M is in Bη and Mρ is the result of simultaneously substituting ρ(x) for each free variable x of M.
The unique normal form property of λh∞β⊥η implies well-definedness of this definition. The ease of this construction contrasts with the usual construction, based on elaborate continuity arguments in [3], of the standard Böhm model for λβ. An informal definition of eta-Böhm trees can be found in [2]. It is also possible to give a corecursive definition. In early approaches to eta-Böhm trees, sets of finite β⊥η-approximants have been used; see for example [10,6].

Theorem 10. The following statements are equivalent for terms in λh∞β⊥η:
1. M, N have the same eta-Böhm tree.
2. M and N are observationally equivalent wrt finite βη-normal forms [13], i.e. C[M] has a finite βη-normal form if and only if C[N] has a finite βη-normal form, for all C.
3. M, N have the same interpretation in Park's model D∗∞ [16,6,17] for the extensional lambda calculus.
(1 ⇔ 2) has been proved by Hyland in [10] and (2 ⇔ 3) has been proved in [6]. Note that the direction (1 ⇒ 2) is in fact a corollary of the unique normal form property of λh∞β⊥η; see [9] for an argument in a similar situation where approximants cannot be used.
6 Future Research
We are currently working on another infinitary extension of λβη that has the infinite-eta-Böhm trees as its normal forms. How to build an extension that captures the Nakajima trees [14,18,3] is still a challenge.
Acknowledgements. The first author thanks the Department of Mathematics and Computer Science of Leicester University for their hospitality. Both authors are grateful to Mariangiola Dezani-Ciancaglini and Vincent van Oostrom for discussions and Mario Severi for providing computer facilities during the preparation of this document.
References
1. A. Arnold and M. Nivat. The metric space of infinite trees. Algebraic and topological properties. Fundamenta Informaticae, 4:445–476, 1980.
2. S. van Bakel, F. Barbanera, M. Dezani-Ciancaglini, and F.-J. de Vries. Intersection types for λ-trees. Theoretical Computer Science, 272(1-2):3–40, 2002.
3. H. P. Barendregt. The Lambda Calculus: Its Syntax and Semantics. North-Holland Publishing Co., Amsterdam, revised edition, 1984.
4. H. P. Barendregt, J. Bergstra, J. W. Klop, and H. Volken. Degrees, reductions and representability in the lambda calculus. Technical report, Department of Mathematics, Utrecht University, 1976.
5. A. Berarducci. Infinite λ-calculus and non-sensible models. In Logic and Algebra (Pontignano, 1994), pages 339–377. Dekker, New York, 1996.
6. M. Coppo, M. Dezani-Ciancaglini, and M. Zacchi. Type theories, normal forms, and D∞-lambda-models. Information and Computation, 72(2):85–116, 1987.
7. H. B. Curry and R. Feys. Combinatory Logic, volume I. North-Holland Publishing Co., Amsterdam, 1958.
8. N. Dershowitz, S. Kaplan, and D. A. Plaisted. Rewrite, rewrite, rewrite, rewrite, rewrite, . . . . Theoretical Computer Science, 83(1):71–96, 1991.
9. M. Dezani-Ciancaglini, P. Severi, and F.-J. de Vries. Infinitary lambda calculus and discrimination of Berarducci trees. Theoretical Computer Science, 200X. Draft available at www.mcs.le.ac.uk/~ferjan.
10. J. M. E. Hyland. A survey of some useful partial order relations on terms of the lambda calculus. In C. Böhm, editor, Lambda Calculus and Computer Science Theory, volume 37 of LNCS, pages 83–93. Springer-Verlag, 1975.
11. J. R. Kennaway and F.-J. de Vries. Infinitary rewriting. In TeReSe, editor, Term Rewriting Systems, Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, 200X. Draft available at www.mcs.le.ac.uk/~ferjan.
12. J. R. Kennaway, J. W. Klop, R. Sleep, and F.-J. de Vries. Infinitary lambda calculus. Theoretical Computer Science, 175(1):93–125, 1997.
13. J. H. Morris, Jr. Lambda calculus models of programming languages. PhD thesis, M.I.T., 1968.
14. R. Nakajima. Infinite normal forms for the λ-calculus. In C. Böhm, editor, Lambda Calculus and Computer Science Theory, volume 37 of LNCS, pages 62–82. Springer-Verlag, 1975.
15. V. van Oostrom. Developing developments. Theoretical Computer Science, 175(1):159–181, 1997.
16. D. Park. The Y-combinator in Scott's lambda-calculus models (revised version). Technical report, Department of Computer Science, University of Warwick, 1976. Copy available at www.dcs.qmul.ac.uk/~ae/papers/others/ycslcm/.
17. G. D. Plotkin. Set-theoretical and other elementary models of the λ-calculus. Theoretical Computer Science, 121:351–409, 1993.
18. C. P. Wadsworth. The relation between computational and denotational properties for Scott's D∞-models of the lambda-calculus. SIAM J. Comput., 5(3):488–521, 1976.
A Weak Calculus with Explicit Operators for Pattern Matching and Substitution

Julien Forest

Laboratoire de Recherche en Informatique (CNRS UMR 8623), Bât. 490, Université Paris-Sud, 91405 Orsay CEDEX, France.
[email protected].
Abstract. In this paper we propose a weak lambda calculus called λPw having explicit operators for pattern matching and substitution. This formalism is able to specify functions defined by cases via pattern matching constructors, as done by most modern functional programming languages such as OCaml. We show the main properties enjoyed by λPw, namely subject reduction, confluence and strong normalization.
S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 174–191, 2002. © Springer-Verlag Berlin Heidelberg 2002

1 Introduction

In this paper we propose a Weak Lambda Calculus with Pattern Matching and Explicit Substitution called λPw. The calculus λPw is inspired by calculi of explicit substitutions [ACCL91, BBLRD96, BR95, Kes96] and by calculi of patterns [KPT96, CK99, CK]. The weak nature of λPw allows us to denote variables by names without requiring α-conversion to implement the notion of reduction correctly. The theoretical study of functional programming has been enriched by the introduction of typed λ-calculi, explicit substitutions [ACCL91, BBLRD96, BR95, Kes96] and pattern matching [KPT96, CK99, CK]. These three notions are the main ingredients of the formalism we propose in this paper to model typed functional languages with function definition by cases. In the early thirties, Church proposed the λ-calculus as a general theory of functions and logic. Typed versions of the λ-calculus were then defined by Curry and Church; they became the standard theoretical tool for defining and implementing typed functional programming languages. There is however an important gap between the λ-calculus and modern functional languages:
– On the one hand, the operation of substitution is not incorporated at the language level but is left as a meta-operation.
– On the other hand, most popular functional languages (resp. proof assistants) allow the definition of functions (resp. proofs) by cases via pattern-matching mechanisms, while the λ-calculus does not incorporate these constructs at all.
The first problem is solved by incorporating the so-called explicit substitutions into the language used to implement functional programming. To do this, one simply adds
a new construction to denote substitution and new reduction rules to describe the interaction between substitution and the other constructors of the language. Many calculi with explicit substitutions [ACCL91, BBLRD96, BR95, Kes96] have been proposed in the literature, and the operational and logical properties of these calculi have been extensively studied [Les94, CHL96, Kes96]. The second problem is solved by allowing abstraction not only with respect to variables but also with respect to patterns. Thus, the form of the arguments of a given function can be specified in a very precise way; for instance, a term of the form λ⟨x, y⟩.M specifies that the expected argument is a pair. In the early nineties, Kesner, Tannen and Puel [KPT96] proposed a calculus of pattern matching as a tool for the theoretical study of pattern matching à la ML. In 1999, Kesner and Cerrito [CK99, CK] refined the ideas in [KPT96] and defined the calculus TPCES as a formalism with explicit pattern matching and explicit substitution. Other languages with explicit pattern matching, such as for example the ρ-calculus [CK98, CKL01], were recently proposed in the literature to model other programming paradigms. The calculus presented in this paper, called λPw, is a calculus with explicit pattern matching and explicit substitutions. This calculus is not designed as a user-level language but as the output calculus of a pattern matching compilation algorithm. Such an algorithm is supposed to take a pattern matching function definition and to return an equivalent one where all the ambiguities between overlapping patterns have disappeared and where incomplete pattern definitions have been detected and completed. Such a hypothesis is not really restrictive, since all the functional languages with pattern matching features ([Obj]) apply such an algorithm before evaluating programs.
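To illustrate the first idea, explicit substitutions, here is a toy weak λ-calculus with named substitutions (this is not the λPw calculus of this paper; all encodings below are our own, and substitutions are simply association lists):

```python
# A term is ("var", x) | ("app", f, a) | ("lam", x, body) | ("clos", term, subst).
# A substitution is a list of (name, term) pairs; the identity 'id' is [].

def step(t):
    """One head step of a weak calculus with explicit substitutions.
    Returns the reduced term, or None if t is a weak head normal form."""
    tag = t[0]
    if tag == "app":
        f, a = t[1], t[2]
        if f[0] == "clos" and f[1][0] == "lam":   # beta: (lam x.M)[s] N -> M[(x/N).s]
            _, (_, x, body), s = f
            return ("clos", body, [(x, a)] + s)
        if f[0] == "lam":                          # start rule: wrap in the identity closure
            return ("app", ("clos", f, []), a)
        nf = step(f)
        return None if nf is None else ("app", nf, a)
    if tag == "clos":
        body, s = t[1], t[2]
        if body[0] == "var":                       # substitution lookup rules
            for x, m in s:
                if x == body[1]:
                    return m
            return body                            # x[id] -> x
        if body[0] == "app":                       # propagate s into applications
            return ("app", ("clos", body[1], s), ("clos", body[2], s))
        # weak calculus: substitutions never cross a lambda,
        # so a closure over a lambda is a value
    return None

def eval_whnf(t):
    while (n := step(t)) is not None:
        t = n
    return t
```

Note how the substitution generated by a β-step is an ordinary piece of syntax that is pushed through the term by further rewrite steps, instead of being performed at once at the meta-level; this is the design point the paragraph above describes.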
The calculus λPw is based on [CK99, CK], but has the following new features:
– λPw is a weak calculus of explicit substitutions, that is, functions are lazily evaluated. To implement this correctly, substitutions are not allowed to cross lambda constructors (so that α-conversion is no longer needed to achieve correct reduction of terms) and composition of substitutions is incorporated into the substitution language in order to guarantee confluence. The syntax of λPw is based on the weak σ-calculus with names [CHL96], in contrast to TPCES, which is a strong calculus based on the substitution formalism with names called x [BR95, Blo95].
– In contrast to TPCES, which treats "ordinary" substitutions explicitly but the so-called "sum" substitutions implicitly¹, λPw treats all the substitutions as explicit. This choice makes the formalism (typing rules and typing reduction rules) and the proofs more involved than those in [CK99], but results in a complete and self-contained formalism which is able to describe different implementations of functional languages with pattern matching.
The formalism that we present in this paper enjoys all the classical properties of typed λ-calculi:
– It is confluent on all terms.
– It has the subject reduction property.
– It is strongly normalizing on all well-typed terms.

¹ In fact, the first version presented in [CK99] treats sum substitutions explicitly, but the revised and corrected version in [CK] only keeps ordinary substitution as an explicit operation by moving sum substitution to the meta-level.
The paper is organized as follows. We first give in Section 2 a formal definition of λPw together with some basic properties, such as preservation of free variables by reduction and confluence on all ground terms. We then introduce in Section 3 a typing system for λPw and show that λPw enjoys the subject reduction property. We finally show strong normalization of well-typed λPw-terms in Section 4, before the conclusion given in Section 5.
2 Definition of λPw

We first define the raw expressions of the calculus λPw by giving three different sets denoting respectively raw terms, raw substitutions and raw sum terms. The notion of raw expression is refined by first defining the set of free variables of any raw expression, which allows us to define (well-formed) expressions such as terms, substitutions and sum terms. Reduction rules of λPw are given in Figure 1. These rules are shown to preserve free variables of expressions. We fix two distinct infinite sets of variables: the set of usual variables, noted x, y, z, . . ., which are used to denote ordinary terms, and the set of sum variables, noted ξ, ψ, . . ., which are used to denote disjunction. We also fix two constants L and R and we use the notation K to denote indistinctly one or the other. We will also use the notation T to denote indistinctly L, R or a sum variable. Types of λPw are given by the following grammar:

(Types) A ::= ι          base type
            | A × A      product type
            | A + A      sum type
            | A → A      functional type
Patterns of λPw are given by the following grammar:

(Pattern) P ::= _           wildcard
              | x           variable
              | ⟨P, P⟩      pair
              | @(P, P)     contraction
              | (P |ξ P)    sum
The notations _, x and ⟨P, Q⟩ are standard, while the notation @(P, Q) is similar (indeed more general) to the as constructor of OCaml [Obj]. The pattern (P |ξ Q) is used to specify two different structures P and Q (of types A and B) corresponding to a pattern of sum type A + B. The sum variable ξ appearing in a pattern (P |ξ Q) is used to propagate the result of any matching w.r.t. the pattern (P |ξ Q) all along the term where this variable occurs. Raw substitutions of λPw are given by the following grammar:

(Raw Substitution) s ::= id               identity
                       | (x/M).s          cons usual variable
                       | (ξ^{P:A}/K).s    cons sum variable
                       | s ∘ s            concatenation
In order to mark which branch of a sum pattern has been chosen we use a special syntax that we call sum terms. A sum term is either a constant (there is one for each possible choice), a sum variable (no choice has been made), or a substituted sum variable (the full evaluation has not been made).

(Raw Sum Terms) Ξ ::= ξ        sum variable
                    | L        left constant
                    | R        right constant
                    | ξ[s]     sum substitution
We are now able to introduce λPw-terms. The main difference between the λ-calculus and pattern calculi is that the notation λx.M is generalized to λP.M, where P is a pattern as given by the previous grammar. Thus, λPw-terms are given by the following grammar:

(Raw Terms) M ::= x              usual variable
                | (M N)          application
                | ⟨M, M⟩         pair
                | inl_B(M)       left injection
                | inr_A(M)       right injection
                | [M |ξ M]       case
                | [M |^s_Ξ M]    frozen case
                | λP : A.M       abstraction
                | M[s]           closure
All along the paper we may sometimes omit types from expressions in order to simplify the notation, but expressions are supposed to be as defined by this grammar. A case constructor of the form [M |ξ N] is used to specify two different terms M and N corresponding respectively to two different patterns P and Q of a sum pattern (P |ξ Q) appearing somewhere in the program. The communication between the case constructor [M |ξ N] and its corresponding sum pattern (P |ξ Q) is achieved via the sum variable ξ. The introduction of the frozen case constructor is purely technical; the idea is to prevent reduction of the sub-term M (resp. N) inside a case constructor of the form [M |ξ N] where a left (resp. right) choice has already been made.

Example 1. A simple λPw-term is λ(x |ξ y) : A + B.[λy : B.⟨x, y⟩ |ξ λx : A.⟨x, y⟩]. For a more interesting example, let us suppose that we have encoded the recursive type nat as a sum type, and that (0 |ξ S m) is a pattern of type nat representing either 0 or a positive natural number of the form S m. We refer the reader to [CK99] for more examples and more details about the encoding of recursive types in the formalism of pattern calculi. Indeed, the following OCaml [Obj] term:

match n with
| 0 -> 0
| (S m) -> m

is given by the λPw-term λ(0 |ξ (S m)) : nat.[0 |ξ m].

Definition 21 (Raw expression). A raw expression is either a raw term, a raw substitution or a raw sum term.
As usually done in the λ-calculus, we will work modulo α-conversion. This notion must be defined with care, since bound variables are not only all the variables appearing in complex patterns but also all the variables bound by substitutions.

Definition 22 (Binding Variables). The set of binding variables of a pattern (resp. a substitution) is defined as:

BVar(_) = ∅
BVar(x) = {x}
BVar(⟨P, Q⟩) = BVar(P) ∪ BVar(Q)
BVar((P |ξ Q)) = BVar(P) ∪ BVar(Q) ∪ {ξ}
BVar(@(P, Q)) = BVar(P) ∪ BVar(Q)

BVar(id) = ∅
BVar((x/M).s) = BVar(s) ∪ {x}
BVar((ξ^P/K).s) = BVar(s) ∪ BVar(P) ∪ {ξ}
BVar(s ∘ t) = BVar(s) ∪ BVar(t)
We have for example BVar((x |ξ y)) = {ξ, x, y} and BVar((x/M).(ξ^y/L).id) = {x, y, ξ}.

Definition 23 (Free Variables). The set of free variables of an expression e is given by:

FV(x) = {x}
FV(inl(M)) = FV(inr(M)) = FV(M)
FV(M N) = FV(⟨M, N⟩) = FV(M) ∪ FV(N)
FV(λP.M) = FV(M) \ BVar(P)
FV(M[s]) = (FV(M) \ BVar(s)) ∪ FV(s)
FV([M |ξ N]) = FV(M) ∪ FV(N) ∪ {ξ}
FV([M |^s_Ξ N]) = ((FV(M) ∪ FV(N)) \ BVar(s)) ∪ FV(s) ∪ FV(Ξ)
FV(id) = FV(K) = ∅
FV((x/M).s) = FV(M) ∪ FV(s)
FV((ξ^P/K).s) = FV(s)
FV(s ∘ t) = (FV(s) \ BVar(t)) ∪ FV(t)
FV(ξ) = {ξ}
FV(ξ[s]) = ({ξ} \ BVar(s)) ∪ FV(s)

Thus, for example, FV(λ(x |ξ y).[x |ψ t]) = {ψ, t} and FV([x |^{(ξ^y/L).id}_{ξ[(x/t).id]} y]) = {x, t, ξ}. We define the set of free sum variables (FSV) of a raw expression e as the set of sum variables of e which are in FV(e).

Definition 24 (Bound variables). The bound variables of a raw expression e are those variables appearing in e but not free in e.

We are now ready to define α-conversion on λPw-expressions as simply the renaming of bound variables. Thus, for example, λ(x |ξ x).x and λ(y |ψ y).y are α-equivalent, but neither λ(x |ξ x).y and λ(y |ξ y).y nor λ(x |ξ x).x and λ(x |ξ y).x are α-equivalent. We are now ready to introduce the reduction rules, which are given in Figure 1. The Pattern Matching rules implement pattern matching. The Propagation of Substitutions, Substitutions and Variables and Constants, and Substitutions and Composition rules are a natural extension of those of the σ-calculus.
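The BVar and FV clauses can be transcribed directly. The sketch below (our own tagged-tuple encoding of a fragment of the syntax: patterns, variables, applications, abstractions and case terms) reproduces the examples above:

```python
def bvar(p):
    """Binding variables of a pattern (cf. Definition 22)."""
    tag = p[0]
    if tag == "wild":
        return set()
    if tag == "pvar":
        return {p[1]}
    if tag in ("ppair", "pat"):               # pair and contraction patterns
        return bvar(p[1]) | bvar(p[2])
    if tag == "psum":                         # ("psum", P, xi, Q)
        return bvar(p[1]) | bvar(p[3]) | {p[2]}
    raise ValueError("unknown pattern")

def fv(t):
    """Free variables of a core term (variables, applications,
    pattern abstractions and case constructors only)."""
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "app":
        return fv(t[1]) | fv(t[2])
    if tag == "abs":                          # FV(lambda P.M) = FV(M) \ BVar(P)
        return fv(t[2]) - bvar(t[1])
    if tag == "case":                         # FV([M |xi N]) = FV(M) u FV(N) u {xi}
        return fv(t[1]) | fv(t[3]) | {t[2]}
    raise ValueError("unknown term")
```

For instance, fv applied to the encoding of λ(x |ξ y).[x |ψ t] yields {ψ, t}, in agreement with the example above.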
The Case rules explain the mechanism used to distribute a substitution s with respect to a case term [M |ξ N], which consists of the following steps:
– We first transform the case term into a frozen case term using the rule (Freeze).
– We then treat the part ξ[s] of the obtained frozen case term until a result (i.e. the variable ξ, or one of the constants L or R) is obtained.
– We can then distribute the substitution in the frozen case term using one of the rules (Left), (Right) or (Xi).
The reduction system generated by the rules Abs id, Abs pair, Abs contr, Abs left, Abs right, Abs var and Abs wild is used to implement the pattern matching operation and is denoted by −→P. All the other rules generate the reduction system used to implement the behaviour of substitution, denoted by −→es. The reduction relation −→λPw is generated by −→es ∪ −→P. To simplify the notation we may simply write −→ for −→λPw in the rest of the paper.

Example 2. We show one way to propagate the substitution s = (x/M3).(ξ^{P:A}/L).id inside the term M = [M1 |ξ M2].
– First of all we reduce the case term into a frozen case term: M[s] −→Freeze [M1 |^s_{ξ[s]} M2].
– We then evaluate the part ξ[s]: ξ[s] −→Sub sum var4 ξ[(ξ^{P:A}/L).id] −→Sub sum var2 L.
– Thus we have [M1 |^s_{ξ[s]} M2] −→+ [M1 |^s_L M2], and, applying the rule (Left), we obtain M[s] −→+ M1[s].

Remark 1. Let s and s′ be raw substitutions such that s −→ s′; then BVar(s) = BVar(s′).

However, the reduction system in Figure 1 is not really correct, in the sense that −→λPw does not preserve free variables. This is shown by the following example:

Example 3. Let M be a term such that FV(M) = ∅ and let U = (λ(x |ξ y).x inr(M)). Then U −→∗λPw x[(ξ^x/R).id] −→λPw x, and FV(U) = ∅ but FV(x) = {x}.

In order to avoid this problem we restrict the set of raw expressions so as to guarantee that no new free variable appears along reduction sequences.
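The evaluation of ξ[s] in Example 2 (rules Sub sum var1–var4) amounts to scanning the substitution for a binding of ξ. A sketch, with substitutions encoded as lists of tagged bindings (the encoding is ours):

```python
def eval_sum_var(xi, s):
    """Evaluate xi[s] by the rules (Sub sum var1-4).
    s is a list of bindings: ("term", x, M) for (x/M) or
    ("sum", psi, P, K) for (psi^P/K)."""
    for b in s:
        if b[0] == "sum" and b[1] == xi:  # (Sub sum var2): xi[(xi^P/K).s] -> K
            return b[3]
        # (Sub sum var3/var4): skip bindings for other usual or sum variables
    return xi                             # (Sub sum var1): xi[id] -> xi
```

On the substitution of Example 2 this returns L, as in the reduction shown there.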
The notion of acceptable expression, or simply expression, is obtained via the introduction of the following concepts:

Definition 25 (Localized Free Variables). Given a sum variable ξ, a sum constant K and a raw expression e, we define the set of localized free variables of e w.r.t. ξ and K, written FV^K_ξ(e), as the subset of FV(e) defined exactly as FV(e) except for the following cases:

FV^L_ξ([M |ξ N]) = FV^L_ξ(M) ∪ {ξ}
FV^R_ξ([M |ξ N]) = FV^R_ξ(N) ∪ {ξ}
FV^L_ξ([M |^s_ξ N]) = FV^L_ξ(M[s]) ∪ {ξ}
FV^R_ξ([M |^s_ξ N]) = FV^R_ξ(N[s]) ∪ {ξ}
FV^L_ξ([M |^s_{ξ[t]} N]) = FV^L_ξ(M[s]) ∪ {ξ}    if ξ ∉ BVar(t)
FV^R_ξ([M |^s_{ξ[t]} N]) = FV^R_ξ(N[s]) ∪ {ξ}    if ξ ∉ BVar(t)
Start Rule
(λP.M) N −→ (λP.M)[id] N   (Abs id)

Pattern Matching
(λ⟨P1, P2⟩.M)[s] ⟨N1, N2⟩ −→ ((λP1.λP2.M)[s] N1) N2   (Abs pair)
(λ@(P1, P2).M)[s] N −→ ((λP1.λP2.M)[s] N) N   (Abs contr)
(λ(P1 |ξ P2).M)[s] inl(N) −→ (λP1.M)[(ξ^{P2}/L).s] N   (Abs left)
(λ(P1 |ξ P2).M)[s] inr(N) −→ (λP2.M)[(ξ^{P1}/R).s] N   (Abs right)
(λx.M)[s] N −→ M[(x/N).s]   (Abs var)
(λ_.M)[s] N −→ M[s]   (Abs wild)

Case
[M |ξ N][s] −→ [M |^s_{ξ[s]} N]   (Freeze)
[M |^s_L N] −→ M[s]   (Left)
[M |^s_R N] −→ N[s]   (Right)
[M |^s_ξ N] −→ [M[s] |ξ N[s]]   (Xi)

Propagation of Substitutions
(M N)[s] −→ M[s] N[s]   (Sub app)
inl(M)[s] −→ inl(M[s])   (Sub left)
inr(M)[s] −→ inr(M[s])   (Sub right)
⟨M1, M2⟩[s] −→ ⟨M1[s], M2[s]⟩   (Sub pair)

Substitutions, Variables and Constants
x[id] −→ x   (Sub var1)
x[(x/N).s] −→ N   (Sub var2)
y[(x/N).s] −→ y[s]  if y ≠ x   (Sub var3)
x[(ξ^P/K).s] −→ x[s]   (Sub var4)
ξ[id] −→ ξ   (Sub sum var1)
ξ[(ξ^P/K).s] −→ K   (Sub sum var2)
ξ[(ψ^P/K).s] −→ ξ[s]  if ξ ≠ ψ   (Sub sum var3)
ξ[(x/M).s] −→ ξ[s]   (Sub sum var4)

Substitutions and Composition
M[s][t] −→ M[s ◦ t]   (Sub clos)
(s ◦ t) ◦ u −→ s ◦ (t ◦ u)   (Sub ass env)
((x/M).s) ◦ t −→ (x/M[t]).(s ◦ t)   (Sub concat1)
((ξ^P/K).s) ◦ t −→ (ξ^P/K).(s ◦ t)   (Sub concat2)
id ◦ s −→ s   (Sub id)

Fig. 1. Reduction Rules for λPw
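To make the pattern-matching subsystem −→P concrete, here is a hedged Python sketch of the substitution accumulated by the rules (Abs pair), (Abs contr), (Abs left), (Abs right), (Abs var) and (Abs wild); the tuple encodings of patterns and arguments are our own, not the paper's syntax.

```python
# Sketch of the matching performed by -->P: matching an argument against a
# pattern extends the pending substitution. Encodings are illustrative only.

def match(pattern, arg, subst):
    """Return subst extended by matching arg against pattern."""
    tag = pattern[0]
    if tag == 'wild':                            # (Abs wild): argument discarded
        return subst
    if tag == 'var':                             # (Abs var): bind x to the argument
        return [('var', pattern[1], arg)] + subst
    if tag == 'contr':                           # (Abs contr): argument duplicated
        return match(pattern[2], arg, match(pattern[1], arg, subst))
    if tag == 'pair':                            # (Abs pair): match componentwise
        n1, n2 = arg[1], arg[2]                  # arg assumed ('pair', N1, N2)
        return match(pattern[2], n2, match(pattern[1], n1, subst))
    if tag == 'sum':                             # (Abs left) / (Abs right)
        p1, xi, p2 = pattern[1], pattern[2], pattern[3]
        if arg[0] == 'inl':                      # record xi -> L, match left pattern
            return match(p1, arg[1], [('sum', xi, 'L')] + subst)
        return match(p2, arg[1], [('sum', xi, 'R')] + subst)
    raise ValueError('unknown pattern')
```

Matching inl(M) against (x |ξ y), for instance, records both the binding of x to M and the binding of ξ to L, just as (Abs left) does.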
Intuitively, F V^L_ξ(e) (resp. F V^R_ξ(e)) contains all the free variables of e except those that are on the right (resp. left) part of the sum terms rooted by the sum variable ξ. Thus, for example, F V^L_ξ(λ(x |ξ y).[x |ξ t]) = ∅, F V^L_ξ(λy.[x |ξ t]) = {x}, and F V^K_ξ(x[(ξ^x/L).id]) = {x}.
We are finally ready to define the notion of acceptable expression, which rules out the creation of new free variables illustrated above. Definition 26 (Acceptable Expression) A raw expression e is said to be acceptable (or simply called an expression) iff Acc(e), where Acc(_) is the least congruence on expressions such that every variable and every constant is acceptable and the following requirements hold:

Acc(ψ[s]) if ψ ∉ BVar(Q) for all (ξ^Q/K) ∈ s
Acc(M[s]) if F V^K_ξ(M) ∩ BVar(Q) = ∅ for all (ξ^Q/K) ∈ s
Acc(s ◦ t) if F V^K_ξ(s) ∩ BVar(Q) = ∅ for all (ξ^Q/K) ∈ t
Acc(λP.M) if (F V^R_ξ(M) ∩ BVar(Q1)) ∪ (F V^L_ξ(M) ∩ BVar(Q2)) = ∅ for all (Q1 |ξ Q2) ∈ P
Acc([M |^s_ξ N]) if Acc(M[s]) and Acc(N[s])
Acc([M |^s_{ξ[t]} N]) if Acc(M[s]) and Acc(N[s]) and Acc(ξ[t]), when no (ξ^Q/K) ∈ t
Acc([M |^s_{ξ[t]} N]) if Acc(M[s]) and Acc(ξ[t]), when (ξ^Q/L) ∈ t
Acc([M |^s_L N]) if Acc(M[s])
Acc([M |^s_{ξ[t]} N]) if Acc(N[s]) and Acc(ξ[t]), when (ξ^Q/R) ∈ t
Acc([M |^s_R N]) if Acc(N[s])
Thus, for example, the term λ(x |ξ y).[x |ξ t] is acceptable while λ(x |ξ y).[x |ψ t] is not. Indeed, F V^R_ξ([x |ψ t]) = {x, t, ξ} and BVar(x) = {x}, so by the definition of Acc() for abstractions, λ(x |ξ y).[x |ψ t] is not acceptable. The reader may also remark that the terms U and x[(ξ^x/L).id] given in Example 3 are not acceptable either. We now have to show that this notion of acceptable expression indeed prevents the creation of new free variables along reduction sequences, that is, that reduction preserves acceptable expressions and free variables. Lemma 21 If e is an expression and e −→λPw e′, then: – e′ is an expression; – F V(e′) ⊆ F V(e). We are now ready to state confluence of λPw, which guarantees that normal forms are unique. Theorem 22 (Confluence for λPw) λPw is confluent. Proof. The proof technique used to show the confluence property is the same as in [CHL96]. For lack of space we cannot give all the details of this proof here, but we refer the interested reader to [For02] for the full technical development. The scheme of the proof is the following. 1. We first show strong normalization and confluence of the system −→es. This allows us to work with es-normal forms of λPw-expressions. 2. We then define a reduction system −→aux on es-normal forms and show that −→aux is confluent. 3. We finally conclude by Hardin's Interpretation Lemma [CHL96], which relates confluence of −→λPw to confluence of −→aux.
3 A Typing System for λPw Since the λ-calculus is strictly contained in λPw, λPw is not strongly normalizing. In order to obtain strong normalization of λPw we define a typing system which is capable of associating types to terms, sum terms and substitutions in a given environment. While typing systems have already been studied for calculi with explicit substitutions [DG01] and also for calculi with patterns [KPT96, CK], no formalism in the literature exists to correctly type explicit choice sum terms. The typing system that we present in this section is shown to have the subject reduction property, that is, typing is preserved under reduction. The notion of acceptable expression (Definition 26) is essential to guarantee this property. We now restrict our attention to a special kind of patterns called acceptable. For that, we say that a pattern P is linear if and only if every variable occurs at most once in P. We define a type environment to be a pair Φ; Γ such that Φ is a sum environment, defined as a set of pairs of the form ξ : K, and Γ is a pattern environment, defined as a set of typed patterns, which are pairs of the form P : A. We say that a type environment Φ; Γ is linear if every variable occurs at most once in Φ; Γ. Definition 31 (Acceptable Patterns and Environments) The set of acceptable patterns of type A, denoted AP(A), is defined to be the smallest set of linear patterns verifying the following properties: – _ ∈ AP(A) for any type A; – x ∈ AP(A) for any variable x; – @(P, Q) ∈ AP(A) if P ∈ AP(A) and Q ∈ AP(A); – ⟨P, Q⟩ ∈ AP(B × C) if P ∈ AP(B) and Q ∈ AP(C); – (P |ξ Q) ∈ AP(B + C) if P ∈ AP(B) and Q ∈ AP(C). The role of the notion of "acceptable patterns" is to prevent the (wildcard) typing rule (corresponding to (weakening) in logic) from introducing meaningless pattern expressions. This notion extends naturally to environments by defining Φ; Γ to be acceptable if and only if each pattern appearing in Φ; Γ is acceptable.
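Both linearity and membership in AP(A) are directly computable. The following Python sketch (our own encoding: types are tagged tuples, patterns as tagged tuples) checks the two conditions of Definition 31:

```python
# Illustrative check of linearity and of the acceptable-pattern sets AP(A):
# wildcards and variables are acceptable at any type, pair and sum patterns
# must be matched against product and sum types respectively.

def variables(p):
    """All variables occurring in pattern p, with repetitions."""
    tag = p[0]
    if tag == 'var':
        return [p[1]]
    if tag in ('pair', 'contr'):
        return variables(p[1]) + variables(p[2])
    if tag == 'sum':
        return variables(p[1]) + variables(p[3])
    return []                                    # wildcard

def is_linear(p):
    vs = variables(p)
    return len(vs) == len(set(vs))

def acceptable(p, ty):
    """Is pattern p in AP(ty)?  ty is e.g. 'iota', ('prod', A, B), ('sum', A, B)."""
    if not is_linear(p):
        return False
    tag = p[0]
    if tag in ('wild', 'var'):                   # acceptable at any type
        return True
    if tag == 'contr':                           # @(P, Q): both at the same type
        return acceptable(p[1], ty) and acceptable(p[2], ty)
    if tag == 'pair':
        return ty[0] == 'prod' and acceptable(p[1], ty[1]) and acceptable(p[2], ty[2])
    if tag == 'sum':
        return ty[0] == 'sum' and acceptable(p[1], ty[1]) and acceptable(p[3], ty[2])
    return False
```

On the example discussed next in the text, the pattern ⟨x, y⟩ is linear yet rejected at a sum type, while it is accepted at a product type.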
Thus, for example, the pattern ⟨x, y⟩ is linear but is not in AP(A + B), since a pair pattern is not compatible with a sum type. We now introduce the typing rules for terms and substitutions (resp. for sum terms) in Figure 2 (resp. Figure 3), assuming that all the patterns and type environments appearing in these rules are acceptable. In the rules (Case1) and (Frozencase1), ξ is a fresh sum variable and (P |ξ Q) is linear. In the rule (Wildcard), Φ; P : A, Γ has to be linear. In the rule (Proj1), all the xi's are distinct usual variables. In the rules (Proj2) and (Nproj), all the ξi's are distinct sum variables. In the rule (Nproj), ξ does not appear in Γ. In the rule (App), we require that N does not contain free sum variables. In the rules (Sub term), (Frozencase1) and (Frozencase2) we require that s does not contain free sum variables. In the rule (Sub cons1) we require that M does not contain free sum variables. In the rules (Sub cons1) and (Sub cons2), all contexts have to be linear. We say that the term M (resp. the substitution s, the sum term Ξ) has type A (resp. co-environment Φ′; Γ′, sum type T) in a type environment Φ; Γ if and only if there is a type derivation ending with Φ; Γ ⊢ M : A (resp. Φ; Γ ⊢ s ▷ Φ′; Γ′, Φ; Γ ⊢ Ξ ❀ T). We say that the term M (resp. the substitution s, the sum term Ξ) is well-typed in Φ; Γ if and only if there is a type A such that M has type A in Φ; Γ (resp. there is an environment Φ′; Γ′ such that s has co-environment Φ′; Γ′ in Φ; Γ, there is a sum type T such that Ξ has sum type T in Φ; Γ). We will make an abuse of notation
Fig. 2. Typing Rules for Terms and Substitutions
by writing Φ; Γ ⊢ M : A (resp. Φ; Γ ⊢ s ▷ Φ′; Γ′, Φ; Γ ⊢ Ξ ❀ T) to indicate that M (resp. s, Ξ) has type A in Φ; Γ.
(Proj2)   ξ1 : K1, . . . , ξm : Km; Γ ⊢ ξj ❀ Kj
(Nproj)   ξ1 : K1, . . . , ξm : Km; Γ ⊢ ξ ❀ ξ   if ∀i, ξ ≠ ξi
(L)       Φ; Γ ⊢ L ❀ L
(R)       Φ; Γ ⊢ R ❀ R
(Sub sum) from Φ; Γ ⊢ s ▷ Φ′; Γ′ and Φ′; Γ′ ⊢ ξ ❀ T, conclude Φ; Γ ⊢ ξ[s] ❀ T

Fig. 3. Typing Rules for Sum Terms
First of all, we remark that for any substitution s, the co-environment of s contains its typing environment. This observation can be formalized as follows: Remark 2. If Ψ; ∆ ⊢ s ▷ Φ; Γ then there exist Ψ′ and ∆′ such that Φ = ΨΨ′ and Γ = ∆∆′. Also note that for any well-typed expression e in Φ; Γ, all its free variables appear in Φ; Γ. An important property used in the subject reduction proof (Theorem 34) states that if an expression e is well-typed in a typing environment Φ; Γ, then it is also well-typed in any "reasonable" typing environment containing Φ; Γ. Lemma 31 (Weakening for Environments) – If Φ; Γ ⊢ M : A, then for all acceptable Ψ; ∆ such that BVar(Ψ; ∆) ∩ (BVar(Φ; Γ) ∪ FSV(M)) = ∅, we have ΨΦ; Γ∆ ⊢ M : A. – If Φ; Γ ⊢ s ▷ Φ′; Γ′, then for all acceptable Ψ; ∆ such that BVar(Ψ; ∆) ∩ (BVar(Φ; Γ) ∪ BVar(s) ∪ FSV(s)) = ∅, we have ΨΦ; ∆Γ ⊢ s ▷ ΨΦ′; ∆Γ′. – If Φ; Γ ⊢ Ξ ❀ K, then for all acceptable Ψ; ∆ such that BVar(Ψ; ∆) ∩ (BVar(Φ; Γ) ∪ FSV(Ξ)) = ∅, we have ΨΦ; ∆Γ ⊢ Ξ ❀ K. Proof. We prove the three statements by induction on (|Ψ| + |∆|, h), where |∆| is the number of patterns appearing in ∆, |Ψ| is the number of sum variables appearing in Ψ, and h is the height of the considered proof. The proof of subject reduction relies strongly on the possibility of deconstructing patterns into simpler ones via the Dec(_) operation, which is defined as follows: Definition 32 Given a typed pattern P : A, we define its deconstruction as follows:

Dec(_ : A) = _ : A
Dec(x : A) = x : A
Dec((P1 |ξ P2) : A1 + A2) = (P1 |ξ P2) : A1 + A2
Dec(⟨P1, P2⟩ : A1 × A2) = Dec(P1 : A1), Dec(P2 : A2)
Dec(@(P1, P2) : A) = Dec(P1 : A), Dec(P2 : A)
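Definition 32 can be transcribed almost verbatim. The following Python sketch (our illustrative encoding, reusing tagged tuples for patterns and types) flattens pair and contraction patterns while leaving wildcards, variables and sum patterns untouched:

```python
# Sketch of the deconstruction operation Dec(_) on typed patterns.
# A typed pattern is a pair (pattern, type); the result is a flat list.

def dec(pattern, ty):
    """Dec(P : A) as a flat list of typed patterns."""
    tag = pattern[0]
    if tag == 'pair':                    # Dec(<P1, P2> : A1 x A2)
        return dec(pattern[1], ty[1]) + dec(pattern[2], ty[2])
    if tag == 'contr':                   # Dec(@(P1, P2) : A): both parts at type A
        return dec(pattern[1], ty) + dec(pattern[2], ty)
    return [(pattern, ty)]               # wildcard, variable, sum pattern: unchanged

def dec_env(gamma):
    """Extend Dec to a pattern environment P1 : A1, ..., Pn : An."""
    return [d for (p, a) in gamma for d in dec(p, a)]
```

For instance, deconstructing ⟨x, ⟨y, _⟩⟩ : A × (B × C) yields the flat environment x : A, y : B, _ : C.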
This notion extends naturally to a pattern environment Γ = P1 : A1, . . . , Pn : An by defining Dec(Γ) as Dec(P1 : A1), . . . , Dec(Pn : An). The typing system enjoys the property that any well-typed expression in a typing environment is also well-typed in the deconstructed environment. Lemma 32 If e is well-typed in an environment Φ; Γ, then e is also well-typed in Φ; Dec(Γ). Another property of typing derivations needed in the proof of subject reduction states that for every well-typed expression e we can choose a "canonical" typing derivation ending with a particular typing rule associated to e. Thus, for example, if Φ; Γ ⊢ ⟨M1, M2⟩ : A × B then there is a proof of this sequent ending with the rule (×Right). We refer the interested reader to [For02] for further details. The deconstruction operation given in Definition 32 can be used to simplify pair and contraction patterns, but there is no operation to simplify sum patterns. There is, however, a property of typing derivations which allows us to decompose sum patterns appearing on the left-hand side of sequents as follows: Lemma 33 Let K and K′ be L or R. Then: – If Φ; (P |ξ Q) : A + B, Γ ⊢ M : C then ξ : K, Φ; P : A, Q : B, Γ ⊢ M : C. – If Φ; (P |ξ Q) : A + B, Γ ⊢ s ▷ Φ′; (P |ξ Q) : A + B, Γ′ then ξ : K, Φ; P : A, Q : B, Γ ⊢ s ▷ ξ : K, Φ′; P : A, Q : B, Γ′. – If Φ; (P |ξ Q) : A + B, Γ ⊢ Ξ ❀ K′ then ξ : K, Φ; P : A, Q : B, Γ ⊢ Ξ ❀ K′. – If Φ; (P |ξ Q) : A + B, Γ ⊢ Ξ ❀ ψ then ξ : K, Φ; P : A, Q : B, Γ ⊢ Ξ ❀ ψ. We can now state the following result: Theorem 34 (Subject reduction) – If Φ; Γ ⊢ M : A and M −→ M′ then Φ; Γ ⊢ M′ : A. – If Φ; Γ ⊢ s ▷ Φ′; Γ′ and s −→ s′ then Φ; Γ ⊢ s′ ▷ Φ′; Γ′. – If Φ; Γ ⊢ Ξ ❀ T (with T ∈ {L, R, ξ}) and Ξ −→ Ξ′ then Φ; Γ ⊢ Ξ′ ❀ T. Proof. By induction on expressions, using Lemmas 33, 32 and 31.
4 Strong Normalization for λPw This section is devoted to the proof of strong normalization of well-typed λPw-terms. This proof is an adaptation of the one proposed by Ritter [Rit94] for a restricted version of λσ with de Bruijn indices, where substitutions can only cross the leftmost outermost lambda. The scheme of our proof can be summarized as follows: – We first define a calculus modulo an equational theory, denoted λPw/≡. – We then define the notion of reducible term and reducible substitution for λPw/≡-expressions, and show that any reducible term (resp. substitution) is strongly normalizing.
– We show that the term M[s] and the substitution t ◦ s are reducible for any reducible substitution s, well-typed term M and well-typed substitution t. – We use the previous point to show that any well-typed λPw/≡-expression is reducible and thus strongly normalizing. – Finally, we deduce strong normalization for λPw-expressions from strong normalization for λPw/≡-expressions. We start the proof by introducing the notion of void substitutions, which are special substitutions that do not really change the pattern environment part of their typing environment (i.e. which do not bind new usual variables). Definition 41 The set VS of void substitutions is defined to be the smallest set of substitutions stable by concatenation such that: – id ∈ VS; – (ξ^{P:A}/K).s ∈ VS if and only if s ∈ VS and BVar(P) = ∅. Void substitutions enjoy the following properties: Remark 3. – Let s be a void substitution such that s −→ s′; then s′ is a void substitution. – Any void substitution is strongly normalizing. Lemma 41 Let s be a substitution such that Ψ; ∆ ⊢ s ▷ Φ; ∅. Then: – ∆ = ∅; – s is a void substitution. Proof. The first point holds by Remark 2 and the second one by contradiction.

4.1 Definition of λPw/≡
We can now define the notion of λPw/≡-reduction modulo an equational theory. Definition 42 The congruence ≡ on λPw-terms is defined by:

(M N)[s] =Sub app M[s] N[s]
(s ◦ t) ◦ u =Sub ass env s ◦ (t ◦ u)
M[s][t] =Sub clos M[s ◦ t]

We will consider the reduction system λPw/≡, where a −→λPw/≡ b if and only if there exist a′, b′ such that a ≡ a′ −→R b′ ≡ b, where R = λPw \ {Sub app, Sub ass env, Sub clos}. This definition can also be interpreted as a reduction on equivalence classes, i.e., [a] −→λPw/≡ [b] if and only if a′ −→R b′ for some a′ ∈ [a] and b′ ∈ [b]. We may use both interpretations indistinctly according to the context. We remark that, by subject reduction (Theorem 34), the notion of well-typed λPw/≡-terms is well-defined.
4.2 Strong Normalization for λPw/≡
We are now able to introduce the notion of reducible expressions, which makes use of the following concepts. Definition 43 (Neutral terms and substitutions) – A term is neutral if and only if it is of none of the forms (λx.N)[s], inr_A(M), inl_B(M), ⟨M1, M2⟩. – A substitution s is neutral if and only if it is not of the form (x/M).t. Notation 42 Let M be a term. – P¹_{A×B}(M) denotes the term (λ⟨x, _⟩ : A × B.x)[id] M, where x is a fresh variable. – P²_{A×B}(M) denotes the term (λ⟨_, x⟩ : A × B.x)[id] M, where x is a fresh variable. – S_{A+B}(M) denotes the term (λ(x |ξ y) : A + B.[⟨x, w2⟩ |ξ ⟨w1, y⟩])[id] M, where x, y, w1, w2 and ξ are fresh variables.
Definition 44 (Reducible terms and substitutions) The set ⟦A⟧_{Φ;Γ} of reducible terms of type A in an environment Φ; Γ is defined by induction on types as follows:

⟦ι⟧_{Φ;Γ} =def {M | Φ; Γ ⊢ M : ι and M is strongly normalizing}
⟦A → B⟧_{Φ;Γ} =def {M | Φ; Γ ⊢ M : A → B and ∀N ∈ ⟦A⟧_{Φ;Γ∆}, (M N) ∈ ⟦B⟧_{Φ;Γ∆}}, where ∆ satisfies the condition of point 1 of Lemma 31
⟦A × B⟧_{Φ;Γ} =def {M | Φ; Γ ⊢ M : A × B, P¹_{A×B}(M) ∈ ⟦A⟧_{Φ;Γ} and P²_{A×B}(M) ∈ ⟦B⟧_{Φ;Γ}}
⟦A + B⟧_{Φ;Γ} =def {M | Φ; Γ ⊢ M : A + B and, for all fresh variables w1, w2, S_{A+B}(M) ∈ ⟦A × B⟧_{Φ;w1:A,w2:B,Γ}}

The set of reducible substitutions for an environment Φ; Γ in an environment Ψ; ∆ is defined as follows: ⟦Φ; Γ⟧_{Ψ;∆} =def {s | Ψ; ∆ ⊢ s ▷ Φ; Γ and ∀(x : A) ∈ Γ, x[s] ∈ ⟦A⟧_{Ψ;∆}}. Reducible terms enjoy the following expected properties: Lemma 43 For every type C the following statements hold: 1. If M ∈ ⟦C⟧_{Φ;Γ}, then M is strongly normalizing. 2. If Φ; Γ ⊢ (x M1 . . . Mn) : C and M1, . . . , Mn are strongly normalizing, then (x M1 . . . Mn) ∈ ⟦C⟧_{Φ;Γ}. 3. If M ∈ ⟦C⟧_{Φ;Γ} and M −→ M′, then M′ ∈ ⟦C⟧_{Φ;Γ}. 4. If M is a neutral term of type C and all its one-step reducts are reducible, then M is reducible. Proof. By induction on the type C.
Lemma 44 Let M be a term in ⟦A⟧_{Φ;Γ}. For every acceptable environment Ψ; ∆ satisfying the conditions of point 1 of Lemma 31, M is in ⟦A⟧_{ΦΨ;Γ∆}. Proof. By induction on the type A. We can now deduce from Lemma 43 (point 2) the following property: Corollary 1. All variables are reducible. As for terms, reducible substitutions also enjoy the following expected properties: Lemma 45 1. If s ∈ ⟦Φ; Γ⟧_{Ψ;∆}, then s is strongly normalizing. 2. If s ∈ ⟦Φ; Γ⟧_{Ψ;∆} and s −→ s′, then s′ ∈ ⟦Φ; Γ⟧_{Ψ;∆}. 3. If s is a neutral substitution such that Ψ; ∆ ⊢ s ▷ Φ; Γ and all its one-step reducts are reducible, then s ∈ ⟦Φ; Γ⟧_{Ψ;∆}. Proof. We prove the properties by cases on Γ. In the case Γ = ∅, the proof is done by remarking that s is void. In the case Γ ≠ ∅, the proof is done by using Lemma 43. Lemma 46 Let s be a substitution in ⟦Φ; Γ⟧_{Ψ;∆}. For any acceptable environment Φ′; Γ′ satisfying the conditions of point 2 of Lemma 31, s is in ⟦Φ′Φ; Γ′Γ⟧_{Φ′Ψ;Γ′∆}. Proof. By Lemmas 31 and 44. Since id is neutral, well-typed and has no reducts, we can conclude the following by Lemma 45. Corollary 2. Let Φ; Γ be a valid environment. Then id ∈ ⟦Φ; Γ⟧_{Φ;Γ}. We are now ready to prove the main statement of this section, which allows us to show that any well-typed expression is reducible. Theorem 47 Let Ψ; ∆ and Φ; Γ be valid environments and let s be a substitution in ⟦Φ; Γ⟧_{Ψ;∆}. – For every substitution t such that Φ; Γ ⊢ t ▷ Φ′; Γ′, the substitution t ◦ s is in ⟦Φ′; Γ′⟧_{Ψ;∆}. – For every term M such that Φ; Γ ⊢ M : A, the term M[s] is in ⟦A⟧_{Ψ;∆}. Proof. The proof is by induction on the structure of the substitution or term. It uses some technical lemmas which we cannot present here for lack of space, but which are fully detailed in [For02]. Intuitively, these lemmas state how to deduce reducibility for a given expression from the reducibility of its subexpressions. The equivalence relation is used, for example, to deduce reducibility for M[s][t] from reducibility of M[s ◦ t].
By Theorem 47 and Corollary 2, the term M[id] and the substitution t ◦ id are reducible, and thus by Lemmas 43 and 45 they turn out to be strongly normalizing, so that the following result holds. Theorem 48 (λPw/≡ strong normalization) Any well-typed λPw/≡-expression is λPw/≡ strongly normalizing.
4.3 Strong Normalization for λPw
In order to conclude with the main theorem of this section, we use the following standard property [FKP99]. Lemma 49 Let A = ⟨O, R1 ∪ R2⟩ be an abstract reduction system such that: – R2 is strongly normalizing; – there exist a reduction system S = ⟨O′, R′⟩ and a translation T from O to O′ such that: • a −→R1 b implies T(a) −→R′ T(b), • a −→R2 b implies T(a) = T(b). Then every term a ∈ O such that T(a) is R′-strongly normalizing is (R1 ∪ R2)-strongly normalizing. Theorem 410 (λPw strong normalization) Any well-typed λPw-expression is λPw strongly normalizing. Proof. By Theorem 48 and Lemma 49, where R′ = −→λPw/≡, R2 is the reduction system generated by the three rules Sub app, Sub ass env and Sub clos, R1 is the reduction system generated by the remaining rules, and T is the canonical projection from λPw-expressions to λPw/≡-expressions.
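The shape of Lemma 49 can be checked on a toy abstract reduction system (this example is ours, not from the paper): objects are pairs (n, m), R1 decrements n, R2 decrements m, and the translation T drops the second component, so R1-steps become strict predecessor steps on naturals and R2-steps are collapsed by T.

```python
# Toy illustration of the projection technique behind Lemma 49.
# R1: (n, m) -> (n-1, m);  R2: (n, m) -> (n, m-1);  T(n, m) = n.

def steps(state):
    """All one-step reducts of state, labelled by the subsystem used."""
    n, m = state
    out = []
    if n > 0:
        out.append(('R1', (n - 1, m)))
    if m > 0:
        out.append(('R2', (n, m - 1)))
    return out

def T(state):
    return state[0]

def longest_reduction(state):
    """Length of the longest (R1 u R2)-reduction; finite here, so SN holds."""
    return max((1 + longest_reduction(s) for _, s in steps(state)), default=0)

# Check the two premises of the lemma on a few states.
for n in range(3):
    for m in range(3):
        for label, succ in steps((n, m)):
            if label == 'R1':
                assert T(succ) == T((n, m)) - 1   # a -->R1 b implies T(a) -->R' T(b)
            else:
                assert T(succ) == T((n, m))       # a -->R2 b implies T(a) = T(b)
```

Since R2 alone is strongly normalizing and T(a) = n is R′-strongly normalizing, the lemma predicts strong normalization of R1 ∪ R2, which the exhaustive search confirms on small instances.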
5 Conclusion In this paper we have presented a weak calculus λPw with explicit operations for pattern matching and substitution. Our formalism is inspired by [KPT96] and [CK]. In contrast to [KPT96], which treats substitution and pattern matching as meta-level operations, we have incorporated them into the syntax of the language by introducing appropriate reduction and typing rules. In contrast to [CK], we have eliminated the use of α-conversion (as our calculus is weak), and we have considered a more powerful system of substitutions featuring composition. However, in our opinion, the major progress w.r.t. the calculus TPCES presented in [CK] is that λPw incorporates sum replacement as an explicit operation, whereas TPCES uses meta-level substitutions for sum variables. This step was one of the main goals of the formalism we propose in this work. We have shown that the calculus λPw enjoys all the classical properties of typed λ-calculi: it is confluent on all terms, it has the subject reduction property, and it is strongly normalizing on all well-typed terms. In the future, we would like to extend λPw with algebraic data types, in order to cover more realistic functional programming languages, and with a more general syntax for binding structures, such as Klop's CRSs [Klo80], in order to cover not only functional programming but also other programming paradigms. We would also like to incorporate into our formalism some ideas of the ρ-calculus [CK98], which deals with explicit pattern matching in a rewriting formalism. In particular, the λPw-calculus implements a fixed pattern-matching algorithm, in contrast to the ρ-calculus, which can be parametrized with different matching algorithms.
Last, but not least, we are studying different evaluation strategies for λPw, namely lazy and eager evaluators, which represent concrete implementations of functional languages via the more theoretical notion of reduction system proposed in this paper. We expect this future work to provide a better explanation of the interaction between the operations of pattern matching and substitution. Acknowledgement. The author would like to thank his advisor Delia Kesner for the time spent helping him and for all the important remarks made on this article.
References

[ACCL91] Martín Abadi, Luca Cardelli, Pierre-Louis Curien, and Jean-Jacques Lévy. Explicit substitutions. Journal of Functional Programming, 1(4):375–416, 1991.
[BBLRD96] Zine-El-Abidine Benaissa, Daniel Briaud, Pierre Lescanne, and Jocelyne Rouyer-Degli. λυ, a calculus of explicit substitutions which preserves strong normalisation. Journal of Functional Programming, 6(5):699–722, 1996.
[Blo95] Roel Bloo. Preservation of strong normalization for explicit substitutions. Technical Report TR95-08, TUE Computing Science Reports, Eindhoven University of Technology, 1995.
[BR95] Roel Bloo and Kristoffer Rose. Preservation of strong normalization in named lambda calculi with explicit substitution and garbage collection. In Computing Science in the Netherlands, pages 62–72. Netherlands Computer Science Research Foundation, 1995.
[CHL96] Pierre-Louis Curien, Thérèse Hardin, and Jean-Jacques Lévy. Confluence properties of weak and strong calculi of explicit substitutions. Journal of the ACM, 43(2):362–397, March 1996.
[CK] Serenella Cerrito and Delia Kesner. Pattern matching as cut elimination. ftp://ftp.lri.fr/LRI/articles/kesner/pm-as-ce.ps.gz.
[CK98] Horatiu Cirstea and Claude Kirchner. ρ-calculus, the rewriting calculus. In Fifth International Workshop on Constraints in Computational Logics, 1998.
[CK99] Serenella Cerrito and Delia Kesner. Pattern matching as cut elimination. In Giuseppe Longo, editor, Fourteenth Annual IEEE Symposium on Logic in Computer Science (LICS), pages 98–108. IEEE Computer Society Press, July 1999.
[CKL01] Horatiu Cirstea, Claude Kirchner, and Luigi Liquori. Matching power. In Aart Middeldorp, editor, Proceedings of RTA 2001, Lecture Notes in Computer Science, Utrecht (The Netherlands), May 2001. Springer-Verlag.
[DG01] René David and Bruno Guillaume. A λ-calculus with explicit weakening and explicit substitution. Mathematical Structures in Computer Science, 11, 2001.
[FKP99] Maria C. F. Ferreira, Delia Kesner, and Laurence Puel. Lambda-calculi with explicit substitutions preserving strong normalization. Applicable Algebra in Engineering, Communication and Computing, 9(4):333–371, 1999.
[For02] Julien Forest. A calculus of pattern matching and explicit substitution (draft). Technical Report LRI-1313, 2002. http://www.lri.fr/~forest/lpw.ps.gz.
[Kes96] Delia Kesner. Confluence properties of extensional and non-extensional λ-calculi with explicit substitutions. In Harald Ganzinger, editor, Seventh International Conference on Rewriting Techniques and Applications, volume 1103 of Lecture Notes in Computer Science, pages 184–199. Springer-Verlag, July 1996.
[Klo80] Jan-Willem Klop. Combinatory Reduction Systems, volume 127 of Mathematical Centre Tracts. CWI, Amsterdam, 1980. PhD Thesis.
[KPT96] Delia Kesner, Laurence Puel, and Val Tannen. A typed pattern calculus. Information and Computation, 124(1):32–61, 1996.
[Les94] Pierre Lescanne. From λσ to λυ, a journey through calculi of explicit substitutions. In Annual ACM Symposium on Principles of Programming Languages (POPL), pages 60–69, Portland, Oregon, 1994.
[Obj] The Objective Caml language. http://caml.inria.fr/.
[Rit94] E. Ritter. Normalisation for typed lambda calculi with explicit substitution. Lecture Notes in Computer Science, 832:295–??, 1994.
Tradeoffs in the Intensional Representation of Lambda Terms

Chuck Liang¹ and Gopalan Nadathur²

¹ Department of Computer Science, Hofstra University, Hempstead, NY 11550. [email protected], Fax: 516-463-5790
² Department of Computer Science and Engineering, University of Minnesota, 4-192 EE/CS Building, 200 Union Street SE, Minneapolis, MN 55455. [email protected], Fax: 612-625-0572
Abstract. Higher-order representations of objects such as programs, specifications and proofs are important to many metaprogramming and symbolic computation tasks. Systems that support such representations often depend on the implementation of an intensional view of the terms of suitable typed lambda calculi. Refined lambda calculus notations have been proposed that can be used in realizing such implementations. There are, however, choices in the actual deployment of such notations whose practical consequences are not well understood. Towards addressing this lacuna, the impact of three specific ideas is examined: the de Bruijn representation of bound variables, the explicit encoding of substitutions in terms and the annotation of terms to indicate their independence on external abstractions. Qualitative assessments are complemented by experiments over actual computations. The empirical study is based on λProlog programs executed using suitable variants of a low level, abstract machine based implementation of this language.
1 Introduction
This paper concerns the representation of lambda terms in the implementation of programming languages and systems in which it is necessary to examine the structures of such terms during execution. The best known uses of lambda terms of this kind appear within higher-order metalanguages [16,21], logical frameworks [9,19] and proof development systems [5,7,20]. Within these systems and formalisms, the terms of a chosen lambda calculus are used as data structures, with abstraction in these terms being used to encode binding notions in objects such as formulas, programs and proofs, and the attendant β-reduction operation capturing important substitution computations. Although the intensional uses of lambda terms have often pertained to this kind of "higher-order" approach to abstract syntax, they are not restricted to this domain alone. Recent research on the compilation of functional programming languages has, for example, advocated the preservation of types in internal representations [23]. Typed intermediate languages that utilize this idea [22] naturally call for structural operations on lambda terms during computation. S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 192–206, 2002. © Springer-Verlag Berlin Heidelberg 2002
The traditional use of lambda terms as a means for computing permits each such term to be compiled into a form whose only discernible relationship to the original term is that they both reduce to the same value. Such a translation is, of course, not acceptable when lambda terms are used as data structures. Instead, a representation must be found that provides a rapid access at runtime to the form of a term and that also facilitates comparisons between terms based on this structure. More specifically, the relevant comparison operations usually ignore bound variable names and equality modulo α-conversion must therefore be easy to recognize. Further, comparisons of terms must factor in the β-conversion rule and, to support this, an efficient implementation of β-contraction must be provided. An essential component of β-contraction is a substitution operation over terms. Building in a fine-grained control over this operation has been thought to be useful. This control can be realized in principle by introducing a new category of terms that embody terms with ‘suspended’ substitutions. The detailed description of such an encoding is, however, a little complicated because the propagation of substitutions and the contraction of redexes inside the context of abstractions have, in general, to be considered when comparing terms. The representational issues outlined above have been examined in the past and approaches to dealing with them have also been described. A well-known solution to the problem of identifying two lambda terms that differ only in the names chosen for bound variables is, for instance, to transform them into a nameless form using a scheme due to de Bruijn [4]. Similarly, several new notations for the lambda calculus have been described in recent years that have the purpose of making substitutions explicit (e.g., see [1,3,10,18]). However, the actual manner in which all these devices should be deployed in a practical context is far from clear. 
In particular, there are tradeoffs involved with different choices and determining the precise way in which to make them requires experimentation with an actual system: the operations on lambda terms that impact performance are ones that arise dynamically and they are notoriously difficult to predict from the usual static descriptions of computations. This paper seeks to illuminate this empirical question. The vehicle for its investigation is the Teyjus implementation of λProlog [17]. λProlog is a logic programming language that employs lambda terms as a representational device and that, in addition to the usual operations on such terms, uses higher-order unification as a means for probing their structures. The Teyjus system, therefore, implements intensional manipulations over lambda terms. This system isolates several choices in term representation, permitting them to be varied and their impact to be quantified. We employ this concrete setting to understand three different issues: the value of explicit substitutions, the benefits and drawbacks of the de Bruijn notation and the relevance of an annotation scheme that determines the dependence of (sub)terms on external abstractions. Using a mixture of qualitative characterizations and experiments, we conclude that: 1. Explicit substitutions are useful so long as they provide the ability to combine β-contraction substitutions.
2. The potential disadvantage of the de Bruijn representation, namely the need to renumber indices during β-contraction, is not significant in practice. 3. Dependency annotations can improve performance, but their effect is less pronounced when combined with the ability to merge environments. The rest of this paper is structured as follows. In the next section, we describe a notation for lambda terms that underlies the Teyjus implementation. This notation embodies an explicit representation of substitutions but one that can, with suitable control strategies, be used to realize substitutions either eagerly or lazily. In Section 3, we outline the structure of λProlog computations, categorize these into conceptually distinct classes and describe specific programs in each class that we use to make measurements. The following three sections discuss, in turn, the differences in lambda term representation of present interest and provide the results of our experiments. We conclude the paper with an indication of other questions that need to be examined empirically.
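For readers unfamiliar with the renumbering mentioned in conclusion 2, the following standard Python sketch of substitution for nameless terms (indices counted from 1, as in this paper; the code is illustrative, not Teyjus code) shows exactly where index adjustment arises during β-contraction:

```python
# De Bruijn terms: ('ix', i), ('app', t1, t2), ('lam', body), indices from 1.

def shift(t, d, cutoff=0):
    """Add d to every index in t that points above `cutoff` enclosing binders."""
    tag = t[0]
    if tag == 'ix':
        return ('ix', t[1] + d) if t[1] > cutoff else t
    if tag == 'app':
        return ('app', shift(t[1], d, cutoff), shift(t[2], d, cutoff))
    return ('lam', shift(t[1], d, cutoff + 1))      # crossing a binder

def subst(t, j, s, depth=0):
    """Replace index j (adjusted under binders) in t by s, renumbering as needed."""
    tag = t[0]
    if tag == 'ix':
        i = t[1]
        if i == j + depth:
            return shift(s, depth)                  # renumber s for its new context
        return ('ix', i - 1) if i > j + depth else t  # one binder disappeared
    if tag == 'app':
        return ('app', subst(t[1], j, s, depth), subst(t[2], j, s, depth))
    return ('lam', subst(t[1], j, s, depth + 1))

def beta(redex):
    """Contract ('app', ('lam', body), arg)."""
    (_, (_, body), arg) = redex
    return subst(body, 1, arg)
```

Every call to shift in subst is a renumbering step; explicit-substitution notations such as the suspension notation of Section 2 delay and merge these adjustments instead of performing them eagerly.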
2 A Notation for Lambda Terms
We use an explicit substitution notation for lambda terms in this paper that builds on the de Bruijn method for eliminating bound variable names. A notation of this kind conceptually encompasses two categories of expressions, one corresponding to terms and the other corresponding to environments that encode substitutions to be performed over terms. In a notation such as the λσ-calculus [1] that uses exactly these two categories, an operation must be performed on an environment expression each time it is percolated inside an abstraction, towards modifying the de Bruijn indices in the terms whose substitution it represents. The notation that we have designed for use in the implementation of λProlog instead allows these adjustments to be carried out in one swoop when a substitution is actually effected, rather than in an iterated manner. To support this possibility, this notation includes a third category of expressions called environment terms that encode terms together with the 'abstraction context' they come from. Our notation additionally incorporates a system for annotating terms to indicate whether or not they contain externally bound variables. Formally, the syntax of terms, environments and environment terms of our annotated suspension notation is given by the following rules:

T  ::= C | V | #I | (T T)A | (λA T) | [[T, N, N, E]]A
E  ::= nil | ET :: E
ET ::= @N | (T, N)
A  ::= o | c
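To make the syntactic categories just given concrete, they might be transcribed as a datatype along the following lines. This Python sketch is our own illustration and does not reflect the actual low-level encoding used in Teyjus; all constructor names are invented.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Annotation A: 'c' for terms known to contain no externally bound
# variables, 'o' otherwise (or when this cannot be determined).

@dataclass
class Const:             # C: constants
    name: str

@dataclass
class Var:               # V: instantiatable variables
    name: str

@dataclass
class Index:             # #I: de Bruijn index, a positive number
    i: int

@dataclass
class App:               # (T T)_A: annotated application
    fn: object
    arg: object
    annot: str

@dataclass
class Abs:               # (lambda_A T): annotated abstraction
    body: object
    annot: str

# Environment elements ET: ('dummy', n) encodes @n, ('sub', t, n) encodes (t, n)
EnvItem = Tuple

@dataclass
class Susp:              # [[T, N, N, E]]_A: a suspended substitution
    skel: object         # the term being substituted into
    ol: int              # number of abstractions the term used to sit under
    nl: int              # number of abstractions it now sits under
    env: List[EnvItem]
    annot: str
```

A suspension such as [[#1, 1, 0, (c, 0) :: nil]]o would then be built as `Susp(Index(1), 1, 0, [('sub', Const('c'), 0)], 'o')`.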
In these rules, C represents constants, V represents instantiatable variables (i.e., variables that can be substituted for by terms), I is the category of positive numbers, and N is the category of natural numbers. Terms correspond to lambda terms. In keeping with the de Bruijn scheme, #i represents a variable bound by the ith abstraction looking back from the occurrence. An expression
Tradeoffs in the Intensional Representation of Lambda Terms
195
of the form [[t, ol, nl, e]]o or [[t, ol, nl, e]]c, referred to as a suspension, is a new kind of term that encodes a term with a 'suspended' substitution: intuitively, such an expression represents the term t with its first ol variables being substituted for in a way determined by e and its remaining bound variables being renumbered to reflect the fact that t used to appear within ol abstractions but now appears within nl of them. Conceptually, the elements of an environment are either substitution terms generated by a contraction or dummy substitutions corresponding to abstractions that persist in an outer context. However, renumbering of indices may have to be done during substitution, and to encode this the environment elements are annotated by a relevant abstraction level.

To be deemed well-formed, suspensions must satisfy certain constraints that have a natural basis in our informal understanding of their content: in an expression of the form [[t, i, j, e]]o or [[t, i, j, e]]c, the 'length' of the environment e must be equal to i, for each element of the form @l of e it must be the case that l < j, and for each element of the form (t′, l) of e it must be the case that l ≤ j.

A final point to note about the syntax of our expressions is that all non-atomic terms are annotated with either c or o. The former annotation indicates that the term in question does not contain any variables bound by external abstractions; the latter is used either when this is not true or when enough information is not available to determine that it is.

The expressions in our notation are complemented by a collection of rewrite rules that simulate β-contractions. These rules are presented in Figure 1. The symbols v and u that are used for annotations in these rules are schema variables that can be instantiated by either c or o. We also use the notation e[i] to denote the ith element of the environment.
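The well-formedness constraints on suspensions described above are easy to check mechanically. The following Python fragment is a sketch of such a check, using an assumed tuple encoding of environment elements in which ('dummy', l) stands for @l and ('sub', t, l) stands for (t, l); the encoding is ours, not that of the implementation.

```python
def well_formed_susp(ol, nl, env):
    """Check the constraints on a suspension [[t, ol, nl, env]]: the
    environment must have length ol, each dummy element @l must satisfy
    l < nl, and each substitution element (t', l) must satisfy l <= nl."""
    if len(env) != ol:
        return False
    for item in env:
        if item[0] == 'dummy':       # @l
            if item[1] >= nl:
                return False
        else:                        # ('sub', t, l)
            if item[2] > nl:
                return False
    return True
```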
Of the rules presented, the ones labelled (βs) and (βs′) generate the substitutions corresponding to the β-contraction rule on de Bruijn terms, and the rules (r1)-(r12), referred to as the reading rules, serve to actually carry out these substitutions. The rule (r2) pertaining to 'reading' an instantiatable variable is based on a particular interpretation of such variables: substitutions that are made for them must not contain de Bruijn indices that are captured by external abstractions. This is a common understanding of such variables, but not the only one. For example, treating these variables as essentially first-order ones whose instantiation can contain free de Bruijn indices provides the basis for lifting higher-order unification to an explicit substitution notation [8]. We comment briefly on this possibility at the end of the paper but do not treat it in any detail here.

The correctness of some of our reading rules, in particular the rules (r8)-(r10), depends on consistency in the use of annotations. Our assumption is that these are correctly applied at the outset; thus, the static (compilation) process that creates initial internal representations of terms is assumed to apply the annotation c only to those terms that do not have unbound de Bruijn indices in them. It can then be seen that the rewrite rules preserve consistency while also attempting to retain the content in annotations [15].

The ultimate utility of our notation is dependent on its ability to simulate reduction in the lambda calculus. That it is capable of doing this can be seen
196

C. Liang and G. Nadathur

(βs)   ((λu t1) t2)v → [[t1, 1, 0, (t2, 0) :: nil]]v

(βs′)  ((λu [[t1, ol + 1, nl + 1, @nl :: e]]o) t2)v → [[t1, ol + 1, nl, (t2, nl) :: e]]v

(r1)   [[c, ol, nl, e]]u → c, provided c is a constant
(r2)   [[x, ol, nl, e]]u → x, provided x is an instantiatable variable
(r3)   [[#i, ol, nl, e]]u → #j, provided i > ol and j = i − ol + nl
(r4)   [[#i, ol, nl, e]]u → #j, provided i ≤ ol, e[i] = @l, and j = nl − l
(r5)   [[#i, ol, nl, e]]u → [[t, 0, j, nil]]u, provided i ≤ ol, e[i] = (t, l), and j = nl − l
(r6)   [[(t1 t2)u, ol, nl, e]]v → ([[t1, ol, nl, e]]v [[t2, ol, nl, e]]v)v
(r7)   [[(λu t), ol, nl, e]]v → (λv [[t, ol + 1, nl + 1, @nl :: e]]o)
(r8)   [[(t1 t2)c, ol, nl, e]]u → (t1 t2)c
(r9)   [[(λc t), ol, nl, e]]u → (λc t)
(r10)  [[[[t, ol, nl, e]]c, ol′, nl′, e′]]u → [[t, ol, nl, e]]c
(r11)  [[[[t, ol, nl, e]]o, 0, nl′, nil]]o → [[t, ol, nl + nl′, e]]o
(r12)  [[t, 0, 0, nil]]u → t

Fig. 1. Rule schemata for rewriting annotated terms
in two steps [15]. First, underlying every consistently annotated term in the suspension notation is intended to be a de Bruijn term that is obtained by 'calculating out' the suspended substitutions. The reading rules can be shown to possess properties that support this interpretation: every sequence of rewritings using these rules terminates, and any two such sequences that start at the same term ultimately produce annotated forms of the same de Bruijn term. It can then be shown that the de Bruijn term t β-reduces to s if and only if any t′ that is a consistently annotated version of t can be rewritten using our rules to a term s′ that is itself a consistently annotated version of s.

The (βs′) rule is redundant in our collection if our sole purpose is to simulate β-contraction; indeed, omitting this rule yields a calculus that is similar to those in [3] and [10]. However, the (βs′) rule is the only one in our system that permits substitutions arising from different contractions to be combined into one environment and, thereby, to be carried out in the same walk over the structure of the term being substituted into. The rule (r11) is also redundant, but it serves a similar useful purpose in that it allows a reduction to be combined with a renumbering walk after a term has been substituted into a new (abstraction) context. In fact, the main uses of rules (r11) and (r12) arise right after a use of rule (r5), and a reduction procedure based on our rules can actually roll these distinct rule applications into one. Rather than eliminating the effects of rules (βs′) and
(r11), it is possible to replace them with more general ones that are capable of merging the environments in any term of the form [[[[t, ol1, nl1, e1]]u, ol2, nl2, e2]]v to produce an equivalent term of the form [[t, ol, nl, e]]v; such a collection of rules is, in fact, presented in [18]. However, embedding this larger collection of rules in an actual procedure can be difficult, and an underlying assumption here is that the rules (βs′) and (r11) provide an efficient way to capture most of their useful effects. A final observation is that the rules (r8)-(r10) are redundant in a sense analogous to the (βs′) rule. However, using these rules can have practical benefits, as we discuss in Section 6.
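As an illustration of how the contraction and index-reading rules operate, the following Python sketch implements (βs) together with (r3)-(r5), applying (r12) where it immediately follows (r5). The tuple encoding and function names are our own and annotations are omitted for brevity; this is not the Teyjus encoding.

```python
# Tuple encoding (annotations omitted):
#   ('const', c)  ('idx', i)  ('app', t1, t2)  ('lam', t)
#   ('susp', t, ol, nl, env) with env items ('dummy', l) or ('sub', t, l)

def beta_s(redex):
    # (beta_s): ((lam t1) t2) -> [[t1, 1, 0, (t2, 0) :: nil]]
    _, (_, t1), t2 = redex
    return ('susp', t1, 1, 0, [('sub', t2, 0)])

def read_index(susp):
    # (r3)-(r5): reading a de Bruijn index under a suspension
    _, (_, i), ol, nl, env = susp
    if i > ol:                              # (r3): index reaches past env
        return ('idx', i - ol + nl)
    item = env[i - 1]                       # e[i], 1-indexed as in the paper
    if item[0] == 'dummy':                  # (r4): e[i] = @l
        return ('idx', nl - item[1])
    t, l = item[1], item[2]                 # (r5): e[i] = (t, l)
    j = nl - l
    return t if j == 0 else ('susp', t, 0, j, [])   # (r12) applied when j = 0
```

For instance, contracting the redex ((λ #1) c) with `beta_s` and then reading the index yields c, while reading #2 under the same suspension yields #1, reflecting the decrement of an index over the vanished abstraction.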
3  Computations in λProlog
The language λProlog is one that, from the perspective of this paper, extends Prolog in three important ways. First, it replaces first-order terms—the data structures of a logic programming language—with the terms of a typed lambda calculus. These lambda terms are complemented by a notion of equality given by the α-, β- and η-conversion rules. Second, λProlog uses a unification operation that builds in the extended notion of equality accompanying lambda terms. Finally, the language incorporates two new kinds of goals that provide, respectively, for scoping over names and over clauses defining predicates.

The new features of λProlog endow it with useful metalanguage capabilities, based essentially on the fact that the abstraction construct present in lambda terms provides a versatile mechanism for capturing the binding notions that appear in a variety of syntactic constructs. Suppose, for instance, that we are interested in representing a formula of the form ∀x P(x), where P(x) represents a possibly complex formula in which x appears free. Using lambda terms, this formula may be encoded as (all (λx P′(x))), where all is a constant chosen to encode the predicative force of the universal quantifier and P′(x) represents P(x). This kind of explicit treatment of quantifier scope has several advantages. Identity of formulas under variable renaming is implicit in the encoding; the representations of ∀x P(x) and ∀y P(y) are, for example, equivalent lambda terms. Quantifier instantiation can be realized uniformly through application. Thus, the instantiation of the quantifier in (all Q), where Q itself is a complex structure possibly containing quantifiers, by the term t can be realized simply by writing down the expression (Q t); β-contraction realizes the actual substitution with all the renamings necessary for logical correctness.
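The uniformity of quantifier instantiation through application can be mimicked in any language with first-class functions. In the following Python sketch—our own illustration, not λProlog code—the abstraction under all is a host-language function, and instantiation is simply function application, with the host language supplying the capture-free substitution that β-contraction provides.

```python
# Encode forall x. p(x, c) as ('all', Q), where Q plays the role of
# the abstraction (lam x. P'(x)).

def instantiate(formula, t):
    tag, q = formula
    assert tag == 'all'
    return q(t)          # application performs the substitution

f = ('all', lambda x: ('p', x, 'c'))      # forall x. p(x, c)
```

Nested quantifiers are handled with no extra machinery: instantiating ('all', λx ('all', λy ('q', x, y))) twice substitutes both variables correctly.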
Expressions with instantiatable variables can be used in conjunction with the enhanced unification notion to analyze formulas in logically sophisticated ways. Finally, many computations on formulas are based on recursion over their structure. The usual Horn clauses of a logic programming language already provide for a recursion over first-order structure. The new scoping mechanisms of λProlog complement this to realize also a recursion over binding structure in an elegant and logically justifiable way. The capabilities of λProlog that we have outlined above have been exploited for a variety of purposes in the past, ranging from building theorem provers and encoding proofs to implementing experimental programming languages. A detailed discussion of these applications is beyond the scope of this paper. For
our present purposes it suffices to note that all of them depend on the ability to perform conversion operations on lambda terms, to compare their structures and to decompose them in logical ways. Thus, λProlog provides a rich programming framework—one not unlike others that utilize lambda terms intensionally—for studying the impact on performance of choices in the representation of these terms. In the following sections, we use this framework in quantifying some of these differences. We identify below a taxonomy of λProlog computations that is based roughly on the 'level' of the higher-order features used, and we use this to describe a collection of testing programs to be employed in our empirical study. The test suite is available from the Teyjus site at http://teyjus.cs.umn.edu/.

First-Order Programs. This category exercises no higher-order features and should therefore be impervious to differences in the way (lambda) terms are treated. We include two programs from this category:
– [quicksort] A standard Prolog implementation of the familiar sorting routine.
– [pubkey] An implementation of a public key security protocol described in [6]. This program uses the new scoping devices in λProlog.

First-Order Like Unification with Reduction. Genuine lambda terms may appear in programs in this category, but these figure mainly in reduction computations, most unification problems being first-order in nature. Matching the representation of a lambda term with the pattern (all Q), for instance, yields a unification problem of this kind, and the subsequent application (Q t) generates a reduction. The two programs included in this category are the following:
– [hnorm] A head normalization routine used to reduce a collection of randomly generated lambda terms.
– [church] A program that involves arithmetic computations with Church numerals and associated combinators.

Lλ Style Programs. Computation in this class proceeds by first dispensing with all abstractions in lambda terms using new constants, then carrying out a first-order style analysis over the remaining structure, and eventually abstracting out the new constants. As an idiom, this is a popular one amongst λProlog users and it has also been adopted in other related systems such as Elf and Isabelle. Programs in this class use a controlled form of reduction—the argument of a redex is always a constant—and a restricted form of higher-order unification [12]. We test the following programs in this category:
– [typeinf] A program that infers type schemes for ML-like programs.
– [compiler] A compiler for a small imperative language [11].

Unrestricted Programs. Programs in this class make essential use of (general) higher-order unification. As such, they tend to involve significant backtracking, and they also encompass β-reduction where the arguments of redexes are complex terms. The programs tested in this category are the following:
– [hilbert] A λProlog encoding of Hilbert's Tenth Problem [13].
– [funtrans] A transformer of functional programs [14].
The Teyjus system used in our experiments is an abstract machine and compiler based implementation of λProlog. It uses a low-level encoding of the terms in the annotated suspension notation. The use of the rewrite rules in Figure 1 within Teyjus is confined to a procedure that transforms terms to head normal forms, upon which all comparison operations are based. This procedure can be modified to realize, and hence to study, the effect of different choices in lambda term representation. It is this ability that we utilize in the following sections.
4  The Value of Explicit Substitutions
Reflecting an explicit substitution notation into the representation of lambda terms provides the basis for a lazy strategy in effecting reduction substitutions. There are two potential benefits to such laziness.

First, actual substitutions may be delayed to a point where it becomes evident that it is unnecessary to perform them. Thus, consider the task of determining if the two terms ((λ λ λ (#3 #2 s)) (λ #1)) and ((λ λ λ (#3 #1 t)) (λ #1)) are identical, assuming that s and t are some unspecified but complicated terms. We can conclude that they are not by observing that they reduce respectively to λ λ (#2 s′) and λ λ (#1 t′), where s′ and t′ are terms that result from s and t by appropriate substitutions. Note that, in making this determination, it is not necessary to explicitly calculate the results of the substitutions over the terms s and t. The annotated suspension notation and a suitable head-normalization procedure [15] provide the basis for such an approach.

Second, such laziness makes it possible to combine substitution walks that arise from contracting different β-redexes. Thus, suppose that we wish to instantiate the two quantifiers in the formula represented by (all (λx (all (λy P)))), where P represents an unspecified formula, with the terms t1 and t2. Such an instantiation is realized through two reductions, eventually requiring t2 and t1 to be substituted for the first and second free variables in P and the indices of all other free variables to be decremented by two. All these substitutions involve a walk over the same structure—the structure of P—and it would be profitable if they could all be done together. To combine walks in this manner, it is necessary to temporarily suspend substitutions generated by β-contractions. In a situation in which all the redexes are available in a single term, this kind of delaying of substitution can be built into the reduction procedure through ad hoc devices.
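The combined effect just described—substituting t2 and t1 for the first two free variables of P while decrementing all other free indices by two—can be rendered as a single traversal. The following Python sketch (our own tuple encoding, without the suspension machinery) makes the point directly.

```python
# Terms: ('idx', i), ('const', c), ('app', a, b), ('lam', t).

def shift(term, k, cutoff=0):
    """Renumber the free indices of a substituted term upwards by k."""
    tag = term[0]
    if tag == 'idx':
        return ('idx', term[1] + k) if term[1] > cutoff else term
    if tag == 'app':
        return ('app', shift(term[1], k, cutoff), shift(term[2], k, cutoff))
    if tag == 'lam':
        return ('lam', shift(term[1], k, cutoff + 1))
    return term  # constants

def subst2(term, t1, t2, depth=0):
    """In one walk: substitute t2 for the first free variable, t1 for the
    second, and decrement every other free index by 2."""
    tag = term[0]
    if tag == 'idx':
        i = term[1]
        if i == depth + 1:
            return shift(t2, depth)
        if i == depth + 2:
            return shift(t1, depth)
        return ('idx', i - 2) if i > depth + 2 else term
    if tag == 'app':
        return ('app', subst2(term[1], t1, t2, depth),
                       subst2(term[2], t1, t2, depth))
    if tag == 'lam':
        return ('lam', subst2(term[1], t1, t2, depth + 1))
    return term  # constants
```

A merged environment in the suspension notation arranges for exactly this single-traversal behaviour, but incrementally and across separate reduction steps.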
However, in the case being considered, the two quantifier instantiations are ones that can only be considered incrementally and, further, intervening structure needs to be processed before the abstraction giving rise to the second redex is encountered. The structure that engenders sharing is therefore not all available within a single call to a reduction procedure and an explicit encoding of substitution over P seems to be necessary for realizing this benefit. Towards understanding how important these factors are in practice, different strategies were experimented with in the head normalization routine in Teyjus. Three variations were tried. In one case, each time a βs rule was used, the effects of the resulting substitution were immediately calculated out, mimicking the customary eager approach. The second version used lazy substitution but did not use the (βs ) rule in the rewriting process. This version gives us the benefits
                 Eager                  Lazy without Merging     Lazy with Merging
Program     Running     Reading Rule   Running     Reading Rule   Running     Reading Rule
            Time        Applications   Time        Applications   Time        Applications
quicksort   0.25 secs          0       0.25 secs          0       0.25 secs          0
pubkey      0.34 secs          0       0.34 secs          0       0.33 secs          0
church      0.36 secs     243461       0.39 secs     367314       0.27 secs      73200
hnorm       0.75 secs     266970       0.83 secs     439121       0.66 secs      89020
typeinf    14.70 secs   10291085      14.81 secs   17582708       9.58 secs    2177884
compiler    3.53 secs    2496431       3.82 secs    3391088       2.26 secs     318703
hilbert     0.48 secs     296027       0.36 secs      58894       0.34 secs       5489
funtrans    2.20 secs      44803       2.22 secs      60116       2.19 secs      24290

Fig. 2. Comparison of Reduction Substitution Strategies
of delayed substitution without the added advantage of combining substitution walks. The last variation used the full repertoire of rewrite rules, once again in a demand-driven mode for calculating substitutions, thereby attempting to draw on all the benefits of explicit substitutions. The overall framework in all three cases was provided by a graph-based reduction routine with an iterative control that uses a stack as an auxiliary store. Such a procedure writes the result of each rewriting step based on the rules in Figure 1 to the heap. An alternative procedure, not utilized here but applicable equally to all three variants tested and hence likely to produce similar timing behaviour, would embed most of the information written to the heap within a recursion stack.

Figure 2 tabulates the running time and the number of applications of the rewrite rules that percolate the effects of substitutions over terms, for each variant over the suite of programs described in Section 3. All the tests were conducted on a 400MHz UltraSparc, and each figure in the table represents the average of five runs. The data in the table indicates a clear preference for a lazy approach to substitution combined with merging of substitutions in the cases where lambda terms are used intrinsically in computation. In the cases where the computation predominantly involves reduction or Lλ style processing, the time improvements range from 12% to over 35%. The measured time is over all computations, including backchaining over logic programming clauses. The difference attributable to the substitution strategy is pinpointed more dramatically and accurately by the counts of reading rule applications, which are directly responsible for the timing differences and which are unaffected by the specifics of term representation and reduction procedure implementation.
These figures indicate a reduction of structure traversal to between a third and an eighth in the cases mentioned, through the use of lazy substitution with merging. A complementary observation is that while the number of β-redexes contracted remains unchanged, between 50% and 90% of these contractions are realized via the (βs′) rule.

We note that behavior degrades in some cases when lazy substitution is used without merging. There are two explanations for this. First, in this mode, terms of the form [[t, ol, nl, e]]v where t is itself a suspension are often encountered. In a demand-driven approach the two suspensions have to be worked inwards in tandem, and this has a noticeable overhead. Second, backtracking, a possibility in a logic programming like setting, has an impact. Backtracking causes a revocation of the changes that have been made beyond the choice point that computation is retracted to. By carrying out substitution work that is deterministic late, a source of redundancy is introduced. This effect is evident in the increase in reading rule applications under a lazy substitution strategy without merging.

The data tabulated above pertain to the situation when annotations are used. Without these, the differences are more dramatic, as we discuss later. The conclusion from our observations thus seems to be that explicit substitutions are important for performance provided they are accompanied by merging.
5  The Treatment of Bound Variables
The common approaches to representing bound variables can be separated into two categories: those that use explicit names or, roughly equivalently, pointers to cells corresponding to binding occurrences, and those that use de Bruijn indices. In comparing these approaches, it is important to distinguish two operations to which bound variables are relevant. One of these is that of checking the identity of two terms up to α-convertibility. The second is that of making substitutions generated by β-contractions into terms, in which case the treatment of bound variables is relevant to the way in which illegal capture is avoided.

The de Bruijn representation is evidently superior from the perspective of checking the identity of terms up to α-convertibility. For example, consider matching the two terms λy1 . . . λyn (yi t1 . . . tm) and λz1 . . . λzn (zi s1 . . . sm). The heads of these terms that are embedded under the abstractions are bound in both cases by the same abstraction. Thus, the matching problem can be translated into one over the arguments of this term. A prelude to this transformation at a formal level under a name based scheme is, however, a 'normalization' of bound variable names. This step is avoided under the de Bruijn scheme.

A further consideration of the above example indicates a more significant advantage of the de Bruijn notation. Under a name based scheme, the transformation step must produce the following set of pairs of terms to be matched: {⟨λy1 . . . λyn t1, λz1 . . . λzn s1⟩, . . . , ⟨λy1 . . . λyn tm, λz1 . . . λzn sm⟩}. The abstractions at the front of each of the terms are necessary: they provide the context in which the bound variables in the arguments are to be interpreted in the course of matching them. Constructing these new terms at run time is computationally costly and also a bit too complex to accommodate in a low-level, abstract machine based system such as Teyjus.
Under the de Bruijn scheme, this context is implicitly present in the numbering of bound variables, obviating the explicit attachment of the abstractions. From the perspective of carrying out substitutions, in contrast, the de Bruijn scheme has no real benefit and may, in fact, even incur an overhead. The important observation here is that the renaming that may be needed in the substitution process in a name based scheme has a counterpart in the form of renumbering relative to the de Bruijn notation. To understand the nature of the needed
mechanism, we may consider the reduction of the term λx ((λy λz y x) (λw x)), whose de Bruijn representation is λ ((λ λ #2 #3) (λ #2)). This term reduces to λx λz ((λw x) x), a term whose de Bruijn representation is λ λ ((λ #3) #2). Comparing the two de Bruijn terms, we notice the following. When substituting the term (λ #2) inside an abstraction, the index representing the locally free variable occurrence, i.e., 2, has to be incremented by 1 to avoid its inadvertent capture. Further, indices for bound variable occurrences within the scope of an abstraction that disappears on account of a β-contraction may have to be changed; here the index 3 corresponding to the variable occurrence x in the scope of the abstraction that is eliminated must be decremented by 1. The substitution operation that is used in formalizing β-contraction under the de Bruijn scheme must account for both effects.

At a detailed level, there is a difference in the renaming and renumbering devices needed in name-based and nameless representations. Given a β-redex of the form ((λx λy t1) t2) whose de Bruijn version is a term of the form ((λ λ t̂1) t̂2), the renaming in the first case is effected over the 'body', i.e., λy t1, and in the second case over the argument, i.e., t̂2.¹ One advantage of the name-based representation is that the renaming may be avoided altogether if there is no name clash. However, determining this requires either a traversal of the term being substituted, or an explicit record of the variables that are free in it. An interesting alternative, described, for instance, in [2], is to always perform a renaming and, more significantly, to fold this into the same structure traversal as that realizing the β-contraction substitution.

The above discussion indicates that the additional cost relative to substitution that is attendant on the de Bruijn notation is bounded by the effort expended in renumbering substituted terms.

A first sense of this cost can thus be obtained by measuring the proportion of substitutions that actually lead to nontrivial renumbering of compound terms. Cases of this kind can be identified as those in which rule (r5) is used where the skeletal term is non-atomic and where an immediate simplification by one of the rules (r8)-(r10) or (r12) is not possible. Figure 3 tabulates the data gathered towards this end for the programs in our test suite that use reductions.

An interesting observation is that no renumbering is actually involved in the case of Lλ style programming. The reason for this is not hard to see—the only reductions performed are those corresponding to eliminating the binding with a new constant. Thus, for a significant set of computations carried out in λProlog and related languages, renumbering is a non-issue. In the other cases, some renumbering can occur, but adopting a merging based approach to substitution can reduce this considerably. This phenomenon is also understandable; substituting a term in after more enclosing abstractions have disappeared due to contractions leaves fewer reasons to renumber.

The cases where a nontrivial renumbering needs to be done do not necessarily constitute an extra cost. In general, when a term is substituted in, it is necessary also to examine its structure and possibly reduce it to (weak) head normal form.
A first sense of this cost can thus be obtained by measuring the proportion of substitutions that actually lead to nontrivial renumbering of compound terms. Cases of this kind can be identified as those in which rule (r5) is used where the skeletal term is non atomic and where an immediate simplification by one of the rules (r8)-(r10) or (r12) is not possible. Figure 3 tabulates the data gathered towards this end for the programs in our test suite that use reductions. An interesting observation is that no renumbering is actually involved in the case of Lλ style programming. The reason for this is not hard to see—the only reductions performed are those corresponding to eliminating the binding with a new constant. Thus, for a significant set of computations carried out in λProlog and related languages, renumbering is a non-issue. In the other cases, some renumbering can occur but adopting a merging based approach to substitution can reduce this considerably. This phenomenon is also understandable; substituting a term in after more enclosing abstractions have disappeared due to contractions leaves fewer reasons to renumber. The cases where a nontrivial renumbering needs to be done do not necessarily constitute an extra cost. In general, when a term is substituted in, it is necessary also to examine its structure and possibly reduce it to (weak) head normal form. 1
In the de Bruijn scheme, some bound variables in tˆ1 may also have to be renumbered, but this can be done efficiently at the same time that tˆ2 is substituted into the term.
                 Eager                          Lazy with Merging
Program      Total          Renumbering     Total          Renumbering
             Substitutions  Substitutions   Substitutions  Substitutions
church        34999           3267           34723             12
hnorm         25497              0           32003              0
typeinf      777832              0          722291              0
compiler     154967              0          100519              0
hilbert        3840           1500            1539            411
funtrans       8587            146            7374             19

Fig. 3. Renumbering with the de Bruijn Representation
Now, the necessary renumbering can be incorporated into the same walk as the one that carries out this introspection. This structure is realized by choosing to percolate the substitution inwards first in a term of the form [[t, 0, nl, nil]]v, using rule (r11) to facilitate the necessary merging in the case that t is itself a suspension. The main drawback of this approach, in contrast to the scheme in [2] for instance, is that it can lead to a loss in sharing in reduction if the same term, t, has to be substituted, and reduced, in more than one place. An indication of the loss in sharing can be obtained from the differences in the number of (βs) and (βs′) reductions under the two strategies in those cases where renumbering is an issue. Our measurements show that, under an 'umbrella' regime of delayed substitution with merging, these numbers were identical in all the relevant cases. Thus, there is no loss in sharing from combining the reduction and renumbering walks and, consequently, no real renumbering overhead relative to our test suite.

Our conclusion, then, is that the de Bruijn treatment of bound variables is the preferred one in practice. It is obviously superior to name based schemes relative to comparing terms modulo α-conversion and, in fact, representations of the latter kind are not serious contenders from this perspective in low-level implementations. The de Bruijn scheme has the drawback of a renumbering overhead in realizing β-contraction. However, our experiments show that this overhead is either negligible or nonexistent in a large number of cases.
6  The Relevance of Annotations
Annotations that are included with terms have the potential for being useful in two different ways. First, they can lead to substitution walks being avoided when it is known that the substitutions will not affect the term. Second, by allowing a suspension to be simplified directly to its skeleton term, they can lead to a preservation of sharing of structure and, hence, of reduction work, in a graph based implementation. Both effects are present in the rules (r8)-(r10) in Figure 1, the only ones to actually use annotations. There is, of course, a cost associated with maintaining annotations, as manifest in the structure of all the other rules. However, this cost can be considerably reduced with proper care. In the Teyjus implementation, for example, an otherwise unused low-end bit in the
tag word corresponding to each term stores this information. The setting of this bit is generally folded into the setting of the entire tag word, and a single test on the bit that ignores the structure of the term suffices to utilize the information in the annotation.

                 Eager                        Lazy with Merging
Program      Without        With            Without        With
             Annotations    Annotations     Annotations    Annotations
quicksort      0.26           0.25            0.25           0.25
pubkey         0.33           0.34            0.31           0.33
church         1.90           0.36            0.29           0.27
hnorm          0.77           0.75            0.68           0.66
typeinf       48.56          14.70            9.56           9.58
compiler      19.51           3.53            2.29           2.26
hilbert        0.60           0.48            0.34           0.34
funtrans       2.20           2.20            2.26           2.19

Fig. 4. The Effect of Annotations
Experiments were conducted to quantify the benefits of annotations. The different versions of the head normalization routine described in Section 4 were modified to ignore annotations. The chosen programs were then executed on a 400MHz UltraSparc using the various versions of the Teyjus system. Figure 4 tabulates the running time in seconds for four of these versions; we have omitted the case of delayed substitutions without merging since the results are similar to those of the version with eager substitutions. The indication from these data is that annotations can make a significant difference in a situation where substitution is performed eagerly—for example, the running time is reduced by about 70% and 80% in the case of the two Lλ style programs—but they have little effect when lazy substitution with merging is used.

Interpreting these observations, it appears that annotations can lead to the recognition of a large amount of unnecessary substitution work. In the situation where substitution walks can be merged, however, these can also be combined with walks for reducing the term. Since reduction has to be performed in any case, the redundancy when annotations are not used is minimal. Consistent with the observations from the previous section, the data for the case of lazy substitutions also indicates little benefit from shared reduction.
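The tag-bit arrangement described at the start of this section can be pictured schematically as follows. The field layout in this Python sketch is invented for illustration; it is not the actual Teyjus tag format.

```python
ANNOT_BIT = 0x1   # hypothetical layout: annotation in the low-order bit

def make_tag(kind, closed):
    """Fold the c/o annotation into the tag word as it is being set."""
    return (kind << 1) | (ANNOT_BIT if closed else 0)

def annotation(tag):
    # A single bit test suffices; the term's structure is never inspected.
    return 'c' if tag & ANNOT_BIT else 'o'

def kind(tag):
    return tag >> 1
```

Because the annotation is set together with the rest of the tag word and read with one mask operation, maintaining it adds essentially no per-term cost.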
7  Conclusion
We have examined the tradeoffs surrounding the use of three ideas in the machine encoding of lambda terms. Our study indicates that a notation that supports a delayed application of substitution as well as a merging of substitution walks can be used to significant practical effect. Within this context the de Bruijn scheme
for treating bound variables has definite advantages with little attendant costs. Finally, the benefits of annotations appear to be marginal at best when reduction and substitution are performed in tandem. Our observations have been made relative to a graph-based approach to representing terms. It is further possible to use hash-consing in the low-level encoding as has been done in the FLINT system that also employs the annotated suspension notation [22]. While we have not experimented with this technique explicitly, we believe that its use will be neutral to the other choices considered here. The work reported here can be extended in several ways. One further question to consider is the difference between destructive and non-destructive realizations of reduction. There are ‘obvious’ advantages to a destructive version in a deterministic setting that become less clear with a language like λProlog that permits backtracking. This matter can be examined experimentally. Along a different direction, the implementation of higher-order unification needs to be considered more carefully. By changing the interpretation of instantiatable or meta variables, it is possible to lift higher-order unification to an explicit substitution notation. Doing so has the benefit of making the application of substitutions to meta variables very efficient. However, there are also costs: a more general mechanism for combining substitutions is needed and context information must be retained dynamically to translate metavariables prior to the presentation of output. There is a tradeoff here that can, once again, be assessed empirically. A final question concerns a more refined set of benchmarks, one that makes finer distinctions in the categories of computations on lambda terms. Such a refinement may reveal further factors that can influence the tradeoffs in representations. Acknowledgements. This work was supported in part by NSF Grant CCR0096322. 
Useful presentation suggestions were received from the referees, not all of which could be explicitly utilized due to space limitations.
References

1. M. Abadi, L. Cardelli, P.-L. Curien, and J.-J. Lévy. Explicit substitutions. Journal of Functional Programming, 1(4):375–416, 1991.
2. L. Aiello and G. Prini. An efficient interpreter for the lambda-calculus. The Journal of Computer and System Sciences, 23:383–425, 1981.
3. Z. Benaissa, D. Briaud, P. Lescanne, and J. Rouyer-Degli. λυ, a calculus of explicit substitutions which preserves strong normalization. Journal of Functional Programming, 6(5):699–722, 1996.
4. N. de Bruijn. Lambda calculus notation with nameless dummies, a tool for automatic formula manipulation, with application to the Church-Rosser Theorem. Indag. Math., 34(5):381–392, 1972.
5. R. L. Constable et al. Implementing Mathematics with the Nuprl Proof Development System. Prentice-Hall, Englewood Cliffs, New Jersey, 1986.
6. G. Delzanno. Specifying and debugging security protocols via hereditary Harrop formulas and λProlog – a case-study. In Fifth International Symposium on Functional and Logic Programming. Springer Verlag LNCS vol. 2024, 2001.
7. G. Dowek, A. Felty, H. Herbelin, G. Huet, C. Murthy, C. Parent, C. Paulin-Mohring, and B. Werner. The Coq proof assistant user's guide. Rapport Techniques 154, INRIA, Rocquencourt, France, 1993.
8. G. Dowek, T. Hardin, and C. Kirchner. Higher-order unification via explicit substitutions. Information and Computation, 157:183–235, 2000.
9. R. Harper, F. Honsell, and G. Plotkin. A framework for defining logics. Journal of the Association for Computing Machinery, 40(1):143–184, January 1993.
10. F. Kamareddine and A. Ríos. Extending the λ-calculus with explicit substitution which preserves strong normalization into a confluent calculus on open terms. Journal of Functional Programming, 7(4):395–420, 1997.
11. C. Liang. Compiler construction in higher order logic programming. In 4th International Symposium on Practical Aspects of Declarative Languages, pages 47–63. Springer Verlag LNCS No. 2257, 2002.
12. D. Miller. A logic programming language with λ-abstraction, function variables, and simple unification. Journal of Logic and Computation, 1(4):497–536, 1991.
13. D. Miller. Unification under a mixed prefix. Journal of Symbolic Computation, pages 321–358, 1992.
14. M. Mottl. Automating functional program transformation. MSc Thesis, Division of Informatics, University of Edinburgh, September 2000.
15. G. Nadathur. A fine-grained notation for lambda terms and its use in intensional operations. Journal of Functional and Logic Programming, 1999(2), March 1999.
16. G. Nadathur and D. Miller. An overview of λProlog. In K. A. Bowen and R. A. Kowalski, editors, Fifth International Logic Programming Conference, pages 810–827. MIT Press, August 1988.
17. G. Nadathur and D. J. Mitchell. System description: Teyjus – a compiler and abstract machine based implementation of λProlog. In Automated Deduction – CADE-16, pages 287–291. Springer-Verlag LNAI no. 1632, 1999.
18. G. Nadathur and D. S. Wilson. A notation for lambda terms: A generalization of environments. Theoretical Computer Science, 198(1-2):49–98, 1998.
19. C. Paulin-Mohring. Inductive definitions in the system Coq: Rules and properties. In Proceedings of the International Conference on Typed Lambda Calculi and Applications, pages 328–345. Springer-Verlag LNCS 664, 1993.
20. L. C. Paulson. Isabelle: A Generic Theorem Prover, volume 828 of Lecture Notes in Computer Science. Springer Verlag, 1994.
21. F. Pfenning and C. Schürmann. System description: Twelf – a meta-logical framework for deductive systems. In Proceedings of the 16th International Conference on Automated Deduction, pages 202–206. Springer-Verlag LNAI 1632, 1999.
22. Z. Shao. Implementing typed intermediate language. In Proc. 1998 ACM SIGPLAN International Conference on Functional Programming (ICFP'98), pages 313–323. ACM Press, September 1998.
23. D. Tarditi, G. Morrisett, P. Cheng, C. Stone, R. Harper, and P. Lee. TIL: A type-directed optimizing compiler for ML. In 1996 SIGPLAN Conference on Programming Language Design and Implementation, pages 181–192. ACM Press, 1996.
Improving Symbolic Model Checking by Rewriting Temporal Logic Formulae

David Déharbe¹, Anamaria Martins Moreira¹, and Christophe Ringeissen²

¹ Universidade Federal do Rio Grande do Norte — UFRN, Natal, RN, Brazil
{david,anamaria}@dimap.ufrn.br
² LORIA-INRIA, Nancy, France
[email protected]
Abstract. A factor in the complexity of conventional algorithms for model checking Computation Tree Logic (CTL) is the size of the formulae and, more precisely, the number of fixpoint operators. This paper addresses the following questions: given a CTL formula f, is there an equivalent formula with fewer fixpoint operators, and how can term rewriting techniques be used to find it? Moreover, for some sublogics of CTL, e.g. the sublogic NF-CTL (no fixpoint computation tree logic), more efficient verification procedures are available. This paper also addresses the problem of testing whether or not an expression belongs to NF-CTL, thereby providing support in the choice of the most efficient among the available verification algorithms. In this direction, we propose a rewrite system modulo AC, and discuss its implementation in ELAN, showing how this rewriting process can be plugged into a formal verification tool.
1 Introduction
The term model checking is used to name formal verification techniques that show that a given property, expressed as a temporal logic formula or as an automaton, is satisfied by a given structure, such as a transition-graph, that represents a system under analysis [15,4,12]. One of the most successful approaches is symbolic model checking [13], a decision procedure for verifying that a Computation Tree Logic (CTL) formula is true in a Kripke structure. In practice, due to the state space explosion problem, symbolic model checking must often be combined with other techniques. Today, research is being carried out with the goal of potentially increasing the size of the systems that can be verified through model checking. It is well-known that the size of the structure and of the properties are the two factors in the worst-case complexity function of symbolic model checking. We can identify three possible ways to attack this problem: reducing
Partially supported by projects CNPq–INRIA (FERUS) and CNPq–NSF (Formal Verification of Systems of Industrial Complexity).
S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 207–221, 2002.
© Springer-Verlag Berlin Heidelberg 2002
the size of the model [5] (e.g. with abstraction, symmetry, decomposition), developing more efficient verification algorithms in which the size of the model and of the formula have a smaller impact on the verification activity [10,2], and reducing the size of the formulae (with a rewrite system) [6]. The research presented in this paper contributes to the last two lines of action. The main contribution of this paper is a rewrite system to test whether a given CTL formula belongs to NF-CTL, a sublogic of CTL that can be verified much more efficiently than with conventional symbolic model checking algorithms [6]. This rewrite system has been implemented in the ELAN platform [3]. Further, we initiate the study of the problem of, given a CTL formula, finding an equivalent formula with fewer temporal operators. The integration of these techniques should result in more efficient formal verification tools. The remainder of this paper is structured as follows: Section 2 gives the required notions of symbolic model checking. Section 3 presents NF-CTL and the associated optimized verification algorithms. In Section 4, we present the main results of our study of the rewriting properties of CTL and NF-CTL. The issues involved in plugging the developed rewrite systems into symbolic model checking are discussed in Section 5. Finally, Section 6 concludes and presents future work.
2 CTL Symbolic Model Checking
In this section, we provide the definitions of the main notions involved in symbolic model checking: the state-transition graph model of Kripke structures, and the syntax and semantics of the temporal logic CTL. Moreover, we sketch the symbolic model checking algorithms.

2.1 Kripke Structures
Let P be a finite set of boolean propositions. A Kripke structure over P is a quadruple M = (S, T, I, L) where:
– S is a finite set of states.
– T ⊆ S × S is a transition relation, such that ∀s ∈ S, ∃s′ ∈ S, (s, s′) ∈ T.
– I ⊆ S is the set of initial states.
– L : S → 2^P is a labeling function. L is injective and associates with each state the set of boolean propositions true in that state.
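The structure just defined, together with the one-step image operations used later in this section, can be sketched in a few lines. This is an illustrative encoding of our own (explicit Python sets stand in for the characteristic functions; a real symbolic model checker would use BDDs), not the paper's implementation.

```python
# Explicit-set sketch of a Kripke structure with the Forward and Backward
# image operations. State sets play the role of characteristic functions.

class Kripke:
    def __init__(self, S, T, I, L):
        # T must be total: every state needs at least one successor.
        assert all(any(s == a for (a, b) in T) for s in S)
        self.S, self.T, self.I, self.L = S, T, I, L

    def forward(self, X):
        """Forward(X): states reachable from X in one transition."""
        return {b for (a, b) in self.T if a in X}

    def backward(self, X):
        """Backward(X): states that reach X in one transition."""
        return {a for (a, b) in self.T if b in X}

# A two-state structure where proposition p labels s0 only.
m = Kripke(S={"s0", "s1"}, T={("s0", "s1"), ("s1", "s1")},
           I={"s0"}, L={"s0": {"p"}, "s1": set()})
assert m.forward({"s0"}) == {"s1"}
assert m.backward({"s1"}) == {"s0", "s1"}
```

On characteristic-function representations, both images reduce to a relational product with the transition relation; the set comprehensions above compute the same thing pointwise.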
A path π in the Kripke structure M is an infinite sequence of states s1, s2, . . . such that ∀i ≥ 1, (si, si+1) ∈ T. π(i) is the ith state of π. States and transitions of a Kripke structure can be identified with their characteristic functions. The characteristic function of a state s is given by the conjunction of all propositions in the label of s, and of the negation of the propositions that are not in the label of s. The characteristic function of a transition t = (s1, s2) is the conjunction of the functions for s1 and s2. These definitions are extended to sets of states and sets of transitions. The characteristic functions play an important role in symbolic model checking algorithms, in order to represent and compute the forward image
(the set of states that can be reached in one transition from a given set of states S′ ⊆ S, denoted Forward(S′)), and the backward image (the set of states that reach a given set of states in one transition, denoted Backward(S′)).

2.2 Computation Tree Logic
Given a set of atomic propositions P, the set of CTL formulae over P is given by the following definition.

Definition 1. Given a finite set of boolean propositions P, CTL(P) is the smallest set such that (1) P ∪ {0, 1} ⊆ CTL(P) and (2) if f and g are in CTL(P), then f ∗ g, f ⊕ g, ¬f, f + g, f ⇒ g, and EXf, EGf, E[fUg], AXf, AGf, AFf, EFf, A[fUg], A[fRg], E[fRg] are in CTL(P).

Informally, each operator is the combination of a path quantifier (E, for some path, or A, for all paths) and a state quantifier (X, in the next state, F, for some state, G, for all states, U, until, R, release). Formally, the following definition gives the semantics of CTL formulae in terms of a given Kripke structure.

Definition 2. Given a Kripke structure M = (S, T, I, L) over a finite set of atomic propositions P, the validity of a formula f ∈ CTL(P) at a state s of M, denoted by M, s |= f, is inductively defined as follows, where g and h are in CTL(P):
1. M, s |= p iff p ∈ L(s).
2. M, s |= ¬g iff M, s ̸|= g.
3. M, s |= g ∗ h iff M, s |= g and M, s |= h.
4. M, s ̸|= 0 and M, s |= 1.
5. M, s |= EXg iff there exists a state s′ of M such that (s, s′) ∈ T and M, s′ |= g. I.e. s has a successor where g is valid.
6. M, s |= EGg iff there exists a path π of M such that π(1) = s and ∀i ≥ 1, M, π(i) |= g. I.e. s is at the start of a path where g holds globally.
7. M, s |= E[gUh] iff there exists a path π of M such that π(1) = s and ∃i ≥ 1, M, π(i) |= h ∧ ∀j, i > j ≥ 1, M, π(j) |= g. I.e. s is at the start of a path where h holds eventually and g holds until h becomes valid.

A formula f is said to be valid in M, denoted by M |= f, if it is valid for all initial states: M |= f iff ∀s ∈ I, M, s |= f.

The remaining temporal and boolean operators can be seen as notational shortcuts. The semantics of the boolean operators is the usual one¹. The remaining temporal operators obey the following set of axioms:

    E = {  AXf = ¬EX¬f                                EG0 = 0
           AGf = ¬EF¬f                                E[fU0] = 0
           AFf = ¬EG¬f                                EX0 = 0
           EFf = E[1Uf]
           A[fUg] = ¬E[¬gU(¬f ∗ ¬g)] ∗ ¬EG¬g
           A[gRf] = ¬E[¬gU¬f]
           E[gRf] = ¬A[¬gU¬f]  }

¹ +, ⊕, ∗ correspond respectively to or, xor, and.
210
D. D´eharbe, A.M. Moreira, and C. Ringeissen
In the definition of E, the three axioms on the right-hand side are consequences of Definition 2 and are not standard: they show how expressions with "canonical" temporal operators can be further simplified. In our setting, we consider the equational theory opCTL defined as the union of E and an equational theory NEW involving two new operators:

    NEW = {  AGE⇒(f, g) = AG(f ⇒ E[gRf])
             AGA⇒(f, g) = AG(f ⇒ A[gRf])  }

McMillan's algorithm SMC_ctl, known as symbolic model checking, consists in verifying whether a CTL formula f holds in a boolean-based representation of a Kripke structure. SMC_ctl recurses over the syntactic structure of the formula, returning the characteristic function of the set of states where f is valid². f is considered valid in the structure when it is valid in all its initial states. The following set of equations shows the rules that direct McMillan's symbolic model checking:
1. SMC_ctl(p) = p if p ∈ P
2. SMC_ctl(¬f) = ¬SMC_ctl(f)
3. SMC_ctl(f ∗ g) = SMC_ctl(f) ∧ SMC_ctl(g)
4. SMC_ctl(EXf) = Backward(SMC_ctl(f))
5. SMC_ctl(EGf) = gfpK. SMC_ctl(f) ∧ Backward(K)
6. SMC_ctl(E[fUg]) = lfpK. SMC_ctl(g) ∨ (SMC_ctl(f) ∧ Backward(K))
In these equations, gfp and lfp denote the greatest and least fixpoints respectively, applied to predicate transformers for the lattice of sets of states. In practice, the computational resources used in symbolic model checking depend on the nesting and number of iterations that are required to compute these fixpoints. In the rest of this paper, we will refer to CTL operators EG and E[U] as fixpoint operators.
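The recursion of rules 1–6 can be rendered over explicit state sets. The sketch below is our own encoding (nested tuples for formulae, sets for characteristic functions); it is not the symbolic (BDD-based) algorithm of the paper, but it computes the same fixpoints.

```python
# Explicit-state rendition of McMillan's recursion (rules 1-6 above).
# Formulae are nested tuples, e.g. ("EG", ("prop", "p")).

class M:
    def __init__(self, S, T, I, L):
        self.S, self.T, self.I, self.L = S, T, I, L
    def backward(self, X):
        # States with a successor in X (the preimage used by rule 4).
        return {s for (s, t) in self.T if t in X}

def smc(m, f):
    op = f[0]
    if op == "true":
        return set(m.S)
    if op == "prop":
        return {s for s in m.S if f[1] in m.L[s]}
    if op == "not":
        return set(m.S) - smc(m, f[1])
    if op == "and":
        return smc(m, f[1]) & smc(m, f[2])
    if op == "EX":
        return m.backward(smc(m, f[1]))
    if op == "EG":                        # rule 5: greatest fixpoint
        sat, k = smc(m, f[1]), set(m.S)
        while True:
            nxt = sat & m.backward(k)
            if nxt == k:
                return k
            k = nxt
    if op == "EU":                        # rule 6: least fixpoint
        sf, sg, k = smc(m, f[1]), smc(m, f[2]), set()
        while True:
            nxt = sg | (sf & m.backward(k))
            if nxt == k:
                return k
            k = nxt
    raise ValueError(op)

m = M(S={"s0", "s1"}, T={("s0", "s1"), ("s1", "s1"), ("s1", "s0")},
      I={"s0"}, L={"s0": {"p"}, "s1": set()})
p = ("prop", "p")
assert smc(m, ("EG", p)) == set()                    # no path stays inside p
assert smc(m, ("EU", ("true",), p)) == {"s0", "s1"}  # EF p via E[1 U p]
```

Each iteration of the EG and EU loops performs one Backward call, which is why the number of fixpoint iterations dominates the cost discussed below.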
3 No Fixpoint Computation Tree Logic
No fixpoint computation tree logic (NF-CTL) is a sublogic of CTL for which there are decision procedures, with false negatives, that are based on induction properties of the logic and that do not require these fixpoint computations.

3.1 Step Temporal Logic
Initially, we define Step Temporal Logic (STL), a temporal logic that does not contain fixpoint operators. The syntax of STL is given by the following definition.

Definition 3. Given a set of propositions P, STL(P) is the smallest set such that P ⊆ STL(P) and, if f and g are in STL(P), then f ∗ g, f ⊕ g, ¬f, f + g, f ⇒ g, EXf, AXf are in STL(P).

² Actually, the characteristic function may also be valid for valuations of propositions that do not correspond to states of the structure, the so-called non-reachable states.
Therefore, STL is the subset of CTL containing all formulae with no fixpoint operators. The semantics of STL is the same as that of CTL, and the algorithm SMC_stl for symbolic model checking of STL consists of the parts of the algorithm SMC_ctl handling the boolean and the EX operators (items 1 through 4 of the definition of SMC_ctl).

3.2 No Fixpoint Computation Tree Logic
NF-CTL is also a subset of CTL, but additionally contains some patterns of properties including fixpoint operators. However, NF-CTL formulae may be model checked without costly fixpoint computations [6]. The following definition gives the syntax of NF-CTL.

Definition 4. Given a set of propositions P, NFCTL(P) is the smallest set such that
– STL(P) ⊆ NFCTL(P),
– if f and g are in STL(P), then A[fRg], E[fRg], AGE⇒(f, g), AGA⇒(f, g) are in NFCTL(P),
– if f and g are in NFCTL(P), then f ∗ g, f ⊕ g, ¬f, f + g, f ⇒ g are in NFCTL(P).

Although at first glance NF-CTL seems to have a restricted expressive power, important classes of CTL formulae can be expressed in this subset: more than 30% of the properties found in the examples distributed with NuSMV [1] are in NF-CTL. The following formula patterns are all in NF-CTL: AGf = A[0Rf] (always), AG(f⇒AGf) = AG(f⇒A[0Rf]) (always from), A[gRf] (release), AG(Req⇒A[AckRReq]) (handshaking), and AFf = ¬EG¬f = ¬E[0R¬f] (eventually).

Concerning verification optimization, Table 1 presents sufficient conditions for universal NF-CTL formulae to be true of a Kripke structure [6]. For instance, if Forward(f ∧ ¬g) ⊆ f holds, then we can conclude that the NF-CTL formula AG(f⇒A[gRf]) is valid. If, additionally, I |= f holds, then A[gRf] is also valid. When I ̸|= f, A[gRf] does not hold. Otherwise, we cannot conclude. While standard model checking verification of these formulae requires one (in the case of A[gRf]) or two (in the case of AG(f⇒A[gRf])) fixpoint computations, checking these conditions requires at most a single computation of the Backward function. Note that each fixpoint computation requires O(|S|) calls to Backward (|S| is the number of states of the Kripke structure). Table 2 gives sufficient conditions for existential NF-CTL formulae to hold in a Kripke structure [6]. Again, the interesting part of this table lies in the second and fourth lines.
If it can be shown that f ∧ ¬g ⊆ Backward(f), we can conclude that the NF-CTL formula AG(f⇒E[gRf]) is valid. Furthermore, if we have I |= f, then E[gRf] is also valid. Again, standard model checking verification of these formulae requires one or two fixpoint computations, while checking the above requirements needs no fixpoint computation at all.
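The table lookups can be sketched directly over explicit state sets; the encoding below (set inclusion for I |= f and for the side conditions, with None standing for the inconclusive "?" entries) is ours, not the paper's implementation.

```python
# Sufficient-condition tests for NF-CTL formulae, mirroring Tables 1 and 2.
# T is the transition relation as a set of pairs; I, f, g are state sets.

def forward(T, X):
    return {t for (s, t) in T if s in X}

def backward(T, X):
    return {s for (s, t) in T if t in X}

def universal(T, I, f, g):
    """Table 1: (AG(f=>A[gRf]), A[gRf]); True/False = valid/invalid, None = '?'."""
    i = I <= f                        # I |= f
    c = forward(T, f - g) <= f        # the sufficient condition
    return (True if c else None,
            False if not i else (True if c else None))

def existential(T, I, f, g):
    """Table 2: the same reading for (AG(f=>E[gRf]), E[gRf])."""
    i = I <= f
    c = (f - g) <= backward(T, f)
    return (True if c else None,
            False if not i else (True if c else None))

T = {("s0", "s1"), ("s1", "s1")}
assert universal(T, {"s0"}, {"s0", "s1"}, set()) == (True, True)
```

A single forward or backward image suffices here, against the O(|S|) image computations of a fixpoint iteration.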
3.3 Algorithms
The improved symbolic model checking algorithm is given in Figure 1. It tests whether the formula to be verified is in NF-CTL, through the predicate NFCTL_R. In case f is not in NF-CTL, the classical algorithm is invoked. In case f does belong to NF-CTL, the algorithm SMC_nfctl tests the sufficient conditions. However, f is not always expressed as one of the NF-CTL template formulae, and it would be operationally interesting to get an equivalent formula in one of the desired formats. This equivalent formula is denoted f↓R and its computation is achieved by rewriting as explained in Section 5, using the results developed in Section 4.
function SMC(f : CTL) : boolean
    if NFCTL_R(f) then return SMC_nfctl(f↓R)
    else return SMC_ctl(f)
Fig. 1. Improved symbolic model checking
The implementation of the predicate NFCTLR involves the detailed study of the rewriting properties of the logics CTL and NF-CTL, which is developed in Section 4. The algorithm SMC nfctl is given in Figure 2 and simply tests the different sufficient conditions enumerated in Tables 1 and 2.
4 Rewriting of Temporal Logic Formulae
In this section we are interested in defining a notion of normal forms for CTL formulae that would ease the problem of finding an equivalent NF-CTL formula. Our aim is to reach these normal forms using an appropriate term rewrite system.

4.1 Rewriting of Boolean Formulae
Since Boolean formulae are CTL formulae, we already need a rewrite system to normalize Boolean formulae. On the one hand, it is known that Boolean Algebra admits no convergent term rewrite system [17]. On the other hand, it is known that Boolean Rings admit a term rewrite system which is confluent and terminating modulo the Associativity-Commutativity of conjunction (denoted by ∗) and exclusive or (denoted by ⊕). The Associativity-Commutativity equational theory for a binary operator f is defined as follows:

    AC(f) = {  f(x, f(y, z)) = f(f(x, y), z)
               f(x, y) = f(y, x)  }
Table 1. Sufficient conditions for universal NF-CTL formulae

    I ⊆ f   Forward(f ∧ ¬g) ⊆ f   AG(f⇒A[gRf])   A[gRf]
      0               0                 ?            0
      0               1                 1            0
      1               0                 ?            ?
      1               1                 1            1

Table 2. Sufficient conditions for existential NF-CTL formulae

    I ⊆ f   f ∧ ¬g ⊆ Backward(f)   AG(f⇒E[gRf])   E[gRf]
      0               0                 ?            0
      0               1                 1            0
      1               0                 ?            ?
      1               1                 1            1
Fig. 2. Algorithm for SMC nfctl
In the rest of the paper, we consider AC = AC(⊕) ∪ AC(∗). We assume the reader familiar with standard notations in the field of equational reasoning. Our notations are standard in rewriting theory [7]. For each set of rules R presented in this paper (involving possibly AC operators), we consider the rewrite relation −→R defined as follows: s −→R t if there exist a rule l → r ∈ R, a position ω in the term s, and a substitution σ of variables in l such that (1) the subterm of s at position ω, denoted by s|ω, is equal modulo AC to the σ-instance of l, and (2) t is the term obtained from s by replacing s|ω with the σ-instance of r. When the rewrite rule l → r, the position ω, and the substitution σ are known or necessary, we may indicate them as a superscript of −→R. The rewrite relation −→R corresponds to the classical rewrite relation modulo AC denoted by −→R,AC in [11], where the syntactic matching used in standard (syntactic) rewriting is replaced by AC-matching. We refer to [11] for standard definitions of confluence and termination modulo an equational theory.

Definition 5. Given a set of propositions P, Bool(P) is the smallest set such that (1) P ∪ {0, 1} ⊆ Bool(P), and (2) if f and g are in Bool(P), then f ∗ g, f ⊕ g, ¬f, f + g, f ⇒ g are in Bool(P).

The equational presentation for Boolean Rings below is the classical one, where three axioms define the standard boolean operators ¬ (not), ⇒ (implication), and + (or) in terms of the operators ⊕, ∗, 0, 1.

    BR = {  x ∗ x = x                           x ⊕ 0 = x
            x ∗ 1 = x                           ¬x = x ⊕ 1
            x ∗ 0 = 0                           x ⇒ y = x ⊕ (x ∗ y) ⊕ 1
            x ∗ (y ⊕ z) = (x ∗ y) ⊕ (x ∗ z)     x + y = (x ∗ y) ⊕ x ⊕ y
            x ⊕ x = 0  }
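Normal forms for BR→ are exclusive-or sums of conjunctions of propositions (Zhegalkin polynomials). The following sketch computes them in a set-of-monomials representation of our own choosing; it is not the paper's implementation, only an illustration of the normalization the axioms perform.

```python
# BR-normal forms as sets of monomials: each monomial is a frozenset of
# propositions (the empty monomial is the constant 1). Xor is symmetric
# difference (x ⊕ x = 0); conjunction distributes and merges monomials
# (x ∗ x = x), cancelling duplicate monomials in pairs.

def var(p):
    return {frozenset([p])}

ZERO, ONE = set(), {frozenset()}

def xor(a, b):
    return a ^ b                        # x ⊕ x = 0

def mul(a, b):
    out = set()
    for m in a:
        for n in b:
            out ^= {m | n}              # x ∗ x = x; equal monomials cancel
    return out

def neg(a):
    return xor(a, ONE)                  # ¬x = x ⊕ 1

def imp(a, b):
    return xor(xor(a, mul(a, b)), ONE)  # x ⇒ y = x ⊕ x∗y ⊕ 1

def por(a, b):
    return xor(xor(mul(a, b), a), b)    # x + y = x∗y ⊕ x ⊕ y

p, q = var("p"), var("q")
assert por(p, p) == p                   # idempotence of + falls out
assert imp(p, p) == ONE                 # p ⇒ p normalizes to 1
```

Two formulae are equal modulo BR ∪ AC exactly when they produce the same monomial set, which is the decision procedure of Corollary 1 in miniature.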
From now on, we assume that ∗ has a greater priority than ⊕, and so we omit parentheses in boolean expressions. Let BR→ be the rewrite system obtained from the left-to-right orientation of the axioms in BR.

Proposition 1. BR→ is confluent and terminating modulo AC.

Corollary 1. AC-rewriting wrt. BR→ provides a decision algorithm for equality modulo the equational theory of Boolean Rings defined by BR ∪ AC. Every formula in Bool(P) can be uniquely normalized wrt. BR→ to a formula involving only the boolean operators 0, 1, ⊕, ∗.

4.2 Rewriting of CTL Formulae
The set of rules for Boolean Rings can be enriched with rules defining temporal operators. The resulting term rewrite system allows us to reduce formulae in CTL(P ). Let opCTL→ be the rewrite system obtained from the left-to-right orientation of axioms in opCTL (cf Section 2.2). Proposition 2. opCTL→ ∪ BR→ is confluent and terminating modulo AC.
Proof. It is confluent since the left-hand sides of opCTL→ do not contain AC operators, and there are no critical pairs in opCTL→. Moreover, any peak between a rule in BR→ and a rule in opCTL→ commutes, because the left-hand sides in opCTL→ are built over a signature which is different from the signature of BR→. Termination can be proved by using the fully syntactic AC-RPO defined in [16], with a precedence such that ⇒ ≻ + ≻ ¬ ≻ ∗ ≻ ⊕ ≻ 1 ≻ 0, AG ≻ EF, E[R] ≻ A[U], CTL operators are greater than boolean operators, and CTL operators occurring in left-hand sides of opCTL→ are greater than the operators in {EX, EG, E[U]}. ✷

Let CTL denote the equational theory opCTL ∪ BR ∪ AC.

Corollary 2. AC-rewriting wrt. opCTL→ ∪ BR→ provides a decision algorithm for equality modulo CTL. Every formula in CTL(P) can be uniquely normalized wrt. opCTL→ ∪ BR→ to a formula involving only the temporal operators EX, EG, E[U] and the boolean operators 0, 1, ⊕, ∗.

4.3 NF-CTL Membership
The improved symbolic model checking function designed for NF-CTL formulae requires an algorithm for finding an equivalent formula that belongs to NF-CTL. The equivalent formula will be the normal form computed by rewriting. We have seen that the class of NF-CTL formulae is defined using the operators EX, A[R], E[R], AGE⇒(,), AGA⇒(,). Consequently, we would like to retrieve these operators in normal forms. Fortunately, we can translate the axioms in E in such a way that temporal operators are expressed using EX, A[R], E[R]. This leads to the following equational theory, where the equalities are obtained from E (cf. Section 2.2) thanks to the principle of replacement of equals for equals (using Boolean axioms):

    D = {  AXf = ¬EX¬f                EGf = E[0Rf]
           E[fUg] = ¬A[¬fR¬g]         AGf = A[0Rf]
           A[fUg] = ¬E[¬fR¬g]         E[0R0] = 0
           EFf = ¬A[0R¬f]             A[fR1] = 1
           AFf = ¬E[0R¬f]             EX0 = 0  }

Let BRR = BR→ ∪ D→.

Proposition 3. BRR is confluent and terminating modulo AC.

Proof. Similar to the proof of Proposition 2.
✷

We have not yet considered the case of the operators AGE⇒(,) and AGA⇒(,). Like the other operators, AGE⇒(,) (resp. AGA⇒(,)) can be expressed with A[R] and E[R]:
    AGE⇒(f, g) = AG(f ⇒ E[gRf]) = A[0R(1 ⊕ f ⊕ f ∗ E[gRf])]
    AGA⇒(f, g) = AG(f ⇒ A[gRf]) = A[0R(1 ⊕ f ⊕ f ∗ A[gRf])]

Since we want to have occurrences of AGE⇒(,) and AGA⇒(,) in normal forms, we would now like to orient these two equations from right to left:

    A[0R(1 ⊕ f ⊕ f ∗ E[gRf])] → AGE⇒(f, g)    (r)
    A[0R(1 ⊕ f ⊕ f ∗ A[gRf])] → AGA⇒(f, g)    (r′)

which means that some expressions of the form A[0R?] will still be reduced. But considering the previous rules is too naive if we want to obtain a confluent (modulo AC) term rewrite system. Indeed, we need to take care of the shape of the BRR-normal form of f: it can be 0, 1, a negative (of the form u ⊕ 1), or a positive.

Definition 6. A BRR-normal form f is negative if there exists a term u such that f =AC u ⊕ 1. A BRR-normal form is positive if it is neither negative, nor 0, nor 1.

In the sequel, we consider four conditional variants of r and r′, where the different cases are expressed by conditions. The rewrite rule r leads to the rewrite system ARE = {r1, . . . , r4}, and similarly r′ leads to ARA = {r1′, . . . , r4′}:

(r1)  A[0R(x ⊕ y ∗ E[gRf])] → AGE⇒(f, g)
      if x ⊕ y ∗ E[gRf] is in BRR-normal form,
         x ⊕ y ∗ E[gRf] =AC (1 ⊕ f ⊕ f ∗ E[gRf])↓BRR, and
         f is positive

(r2)  A[0R(x ⊕ E[gR(f ⊕ 1)])] → AGE⇒(f ⊕ 1, g)
      if x ⊕ E[gR(f ⊕ 1)] is in BRR-normal form,
         x ⊕ E[gR(f ⊕ 1)] =AC (f ⊕ E[gR(f ⊕ 1)] ⊕ f ∗ E[gR(f ⊕ 1)])↓BRR, and
         f ⊕ 1 is negative

(r3)  A[0R E[gR1]] → AGE⇒(1, g)
(r4)  AGE⇒(0, g) → A[0R1]

Analogously, ARA can be defined from ARE by replacing occurrences of E[R] (resp. AGE⇒(,)) with A[R] (resp. AGA⇒(,)). For the conditional rewrite systems ARE and ARA, it is not obvious that the conditions of the first and second rules can be satisfied, and hence that the related rewrite relation is non-empty. This is clarified by the following lemma.

Lemma 1. Let f be a positive BRR-normal form. For each of the following terms:
1. r1−(f, g) := (1 ⊕ f ⊕ f ∗ E[gRf])↓BRR
2. r2+(f, g) := (f ⊕ E[gR(f ⊕ 1)] ⊕ f ∗ E[gR(f ⊕ 1)])↓BRR
there exist two terms x, y such that
1. r1−(f, g) =AC x ⊕ y ∗ E[gRf] and r1−(f, g) is negative;
2. r2+(f, g) =AC x ⊕ E[gR(f ⊕ 1)] and r2+(f, g) is positive.

Note that a similar lemma can be stated for ARA. Let us now consider the following rewrite system R = BRR ∪ ARE ∪ ARA. The rewrite relation generated by R mimics the equational theory CTL, as stated below.

Proposition 4. For any f, g ∈ CTL(P), we have f =CTL g ⇐⇒ f ←→∗_{R∪AC} g.

At this point, a natural question arises: is R a confluent rewrite system? And if not, is it possible to turn R into an equivalent confluent rewrite system? In order to answer this question, we first consider a subsystem of R.

Lemma 2. BRR ∪ ARE is confluent modulo AC.
Proof. Assume there exists a peak t1 ←−^{ω,r1}_{ARE} A[0Rt] −→^{ω,r2}_{ARE} t2. Lemma 1 implies that, on the one hand, t is positive and, on the other hand, t is negative, which is impossible. Due to the use of conditions, there are no other critical pairs in ARE, and no critical pairs between BRR and ARE, since the binary operator A[R] does not occur in the left-hand sides of rules in BRR. ✷

The rewrite system ARA is not quite like ARE. Consider the first rule in ARA, say r1′. Its left-hand side contains two nested occurrences of A[R], and there is a critical pair of r1′ on itself. This may be problematic for the confluence of BRR ∪ ARA when reducing complicated CTL formulae with many nested A[R] operators. But it is no longer a problem if we consider simple CTL formulae, for instance NF-CTL formulae. To make precise what we call simple and complex CTL formulae, we stratify formulae in CTL with respect to the nesting of the operators A[R] and E[R]. The following definition is motivated by the fact that formulae in NF-CTL cannot have more than two nested release operators.

Definition 7. Let f ∈ CTL(P). The depth of releases in f, denoted by dr(f), is inductively defined as follows:
– dr(0) = dr(1) = 0 and dr(p) = 0 if p ∈ P,
– dr(AXf) = dr(EXf) = dr(¬f) = dr(f),
– dr(f ⊕ g) = dr(f ∗ g) = max(dr(f), dr(g)),
– dr(EFf) = dr(AFf) = dr(EGf) = dr(AGf) = 1 + dr(f),
– dr(E[fUg]) = dr(A[fUg]) = 1 + max(dr(f), dr(g)),
– dr(E[fRg]) = dr(A[fRg]) = 1 + max(dr(f), dr(g)),
– dr(AGE⇒(f, g)) = dr(AGA⇒(f, g)) = 2 + max(dr(f), dr(g)).
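Definition 7 is a straightforward structural recursion. A sketch over a tuple encoding of formulae (the encoding and operator names are ours, not the paper's):

```python
# The depth-of-releases measure dr of Definition 7, over formulae encoded
# as nested tuples, e.g. ("AG", ("EF", ("prop", "p"))).

def dr(f):
    op = f[0]
    if op in ("0", "1", "prop"):
        return 0
    if op in ("AX", "EX", "not"):
        return dr(f[1])
    if op in ("xor", "and"):
        return max(dr(f[1]), dr(f[2]))
    if op in ("EF", "AF", "EG", "AG"):
        return 1 + dr(f[1])
    if op in ("EU", "AU", "ER", "AR"):
        return 1 + max(dr(f[1]), dr(f[2]))
    if op in ("AGEimp", "AGAimp"):       # AGE=>, AGA=> count two releases
        return 2 + max(dr(f[1]), dr(f[2]))
    raise ValueError(op)

p = ("prop", "p")
assert dr(("AG", p)) == 1
assert dr(("AGAimp", p, p)) == 2
```

The NF-CTL membership test below relies precisely on this bound: normal forms with dr greater than 2 cannot be NF-CTL formulae.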
The normalization wrt. BRR does not increase the depth of releases. Given any f ∈ CTL(P), let f↓NEW be the normal form of f wrt. the confluent and terminating rewrite system NEW→, and let f̂ be the term (f↓NEW)↓BRR.

Proposition 5. For any f ∈ CTL(P), dr(f̂) ≤ dr(f).
Let CTL_n(P) = {f ∈ CTL(P) | dr(f) ≤ n}. In the rest of this section, we concentrate on CTL_2(P) since it includes NFCTL(P). By considering only formulae in CTL_2(P), ARA becomes confluent, and there are no critical pairs between rules in ARA and rules in BRR ∪ ARE. Moreover, the termination of R can easily be proved using a lexicographic combination.
Proposition 6. R is confluent and terminating modulo AC over formulae in CTL_2(P).

Proposition 7. For any f ∈ NFCTL(P), dr(f) ≤ 2 and f↓R ∈ NFCTL(P).

We are now ready to state our main result.

Proposition 8. The following problem: "Given a formula f in CTL(P), is there a formula g in NFCTL(P) such that f =CTL g?" is decidable using AC-rewriting wrt. R.

Proof. We prove the following statement: for each f in CTL(P), there exists a formula g in NFCTL(P) such that f =CTL g if and only if dr(f̂) ≤ 2 and f̂↓R ∈ NFCTL(P).
– (⇒) f =CTL g implies f̂ =AC ĝ. According to Proposition 7, we have dr(g) ≤ 2 and g↓R ∈ NFCTL(P). Since dr(ĝ) ≤ dr(g) (by Proposition 5) and ĝ↓R =AC g↓R, we get dr(ĝ) ≤ 2 and ĝ↓R ∈ NFCTL(P). Since f̂ =AC ĝ, we can conclude that dr(f̂) ≤ 2 and f̂↓R ∈ NFCTL(P).
– (⇐) If dr(f̂) ≤ 2 and f̂↓R ∈ NFCTL(P), then we can choose g = f̂↓R since f =CTL f̂↓R. ✷

The proof of Proposition 8 motivates the definition of the following predicate, used in the improved symbolic model checking function (see Figure 1):

    NFCTL_R(f) ⇔ (dr(f̂) ≤ 2 ∧ f̂↓R ∈ NFCTL(P))

4.4 Reducing the Number of Temporal Operators in CTL Formulæ
An interesting research direction now consists in combining the rewrite system R with other simplification rules that are valid in all Kripke structures. The following simplification rules aim at reducing the number of temporal operators occurring in a CTL formula. We can distinguish two sets of rules. The first set consists of rules built over temporal operators and boolean operators, whilst the second set is made of rules involving only temporal operators. Neither set of rules increases what we call the depth of releases.

S1 =
  AGf ∗ AGg → AG(f ∗ g)
  EFf + EFg → EF(f + g)
  EG((g + EGf ) ∗ (f + EGg)) → EGf + EGg
  AF((g ∗ AFf ) + (f ∗ AFg)) → AFf ∗ AFg
  A[(f ∗ f ′)U(A[f Ug] ∗ g′ + A[f ′Ug′] ∗ g)] → A[f Ug] ∗ A[f ′Ug′]
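As a mechanical check that each of these rules strictly reduces the number of temporal operators (the measure used for termination in Proposition 9 below), here is a small Python sketch; the tuple encoding of the five S1 rules is our own reconstruction and an assumption, not the paper's notation:

```python
# Sketch (our own encoding) checking that every rule of S1 strictly
# decreases the number of temporal-operator occurrences.

TEMPORAL = {"AG", "EF", "EG", "AF", "EU", "AU", "ER", "AR"}

def temporal_ops(t):
    """Count temporal-operator occurrences in a tuple-encoded formula."""
    if isinstance(t, str):                     # a variable such as "f" or "g"
        return 0
    head, *args = t
    return (1 if head in TEMPORAL else 0) + sum(temporal_ops(a) for a in args)

AG = lambda f: ("AG", f); EF = lambda f: ("EF", f)
EG = lambda f: ("EG", f); AF = lambda f: ("AF", f)
AU = lambda f, g: ("AU", f, g)
AND = lambda f, g: ("and", f, g); OR = lambda f, g: ("or", f, g)
f, g, f1, g1 = "f", "g", "f'", "g'"

S1 = [  # (left-hand side, right-hand side) of the five rules
    (AND(AG(f), AG(g)), AG(AND(f, g))),
    (OR(EF(f), EF(g)), EF(OR(f, g))),
    (EG(AND(OR(g, EG(f)), OR(f, EG(g)))), OR(EG(f), EG(g))),
    (AF(OR(AND(g, AF(f)), AND(f, AF(g)))), AND(AF(f), AF(g))),
    (AU(AND(f, f1), OR(AND(AU(f, g), g1), AND(AU(f1, g1), g))),
     AND(AU(f, g), AU(f1, g1))),
]

for lhs, rhs in S1:
    assert temporal_ops(lhs) > temporal_ops(rhs)
print("all S1 rules decrease the temporal-operator count")
```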
S2 =
  AGAGf → AGf                 AFAGAFf → AGAFf
  EFEFf → EFf                 EGEFEGf → EFEGf
  AGEFAGEFf → EFAGEFf         AGAFAGf → AFAGf
  EFAGEFAGf → AGEFAGf         EFEGEFf → EGEFf
  EGEGf → EGf                 A[f UA[f Ug]] → A[f Ug]
  AFAFf → AFf                 AFEGAGAFf → EGAGAFf
  AFEGAFf → EGAFf             EGAFEFEGf → AFEFEGf
  EGAFEGf → AFEGf             AGAFEFAGEFf → AFEFAGEFf
  AGEGf → AGf                 EFEGAGEFAGf → EGAGEFAGf
  EFAFf → EFf
One can remark that rules in S2 strictly decrease the depth of releases when they are applied at the top position of terms.

Proposition 9. S1 ∪ S2 is terminating.

Proof. Each rule of S1 ∪ S2 strictly decreases the number of temporal operators. ✷

We have used the daTac theorem prover [18] to check that S1 and S2 are confluent and terminating modulo AC. The last four rules of S2 are critical pairs computed by daTac. However, S1 ∪ S2 is not confluent, since (AGAGp) ∗ (AGq) has two different S1 ∪ S2 -normal forms, namely AG((AGp) ∗ q) and AG(p ∗ q). Even if S1 ∪ S2 is not confluent, it is clearly interesting to perform rewriting wrt. S1 ∪ S2 and to check after each rewrite step whether or not the resulting term t satisfies NFCTLR (t). If this is the case, we will be able to apply the improved model checking function.
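The non-confluence on (AGAGp) ∗ (AGq) can be replayed with a toy rewriter. This Python sketch (our own tuple encoding; `norm_outermost` does not re-examine a node after its children are rewritten, which suffices for this example) applies the two relevant rules — AGf ∗ AGg → AG(f ∗ g) from S1 and AGAGf → AGf from S2 — under two strategies and reaches the two different normal forms:

```python
# Toy sketch (our own encoding) of rewriting with two of the rules:
#   ag_and: AG f * AG g -> AG (f * g)   (from S1)
#   agag:   AG AG f -> AG f             (from S2)

def agag(t):
    if t[0] == "AG" and not isinstance(t[1], str) and t[1][0] == "AG":
        return t[1]
    return None

def ag_and(t):
    if (t[0] == "and"
            and not isinstance(t[1], str) and t[1][0] == "AG"
            and not isinstance(t[2], str) and t[2][0] == "AG"):
        return ("AG", ("and", t[1][1], t[2][1]))
    return None

def norm_innermost(t, rules):
    """Normalize children first, then rewrite the node itself."""
    if isinstance(t, str):
        return t
    t = (t[0],) + tuple(norm_innermost(a, rules) for a in t[1:])
    for rule in rules:
        s = rule(t)
        if s is not None:
            return norm_innermost(s, rules)
    return t

def norm_outermost(t, rules):
    """Rewrite at the root as long as possible, then descend (no re-check)."""
    if isinstance(t, str):
        return t
    progress = True
    while progress:
        progress = False
        for rule in rules:
            s = rule(t)
            if s is not None:
                t, progress = s, True
                break
    return (t[0],) + tuple(norm_outermost(a, rules) for a in t[1:])

start = ("and", ("AG", ("AG", "p")), ("AG", "q"))
inner = norm_innermost(start, [agag, ag_and])   # collapses AGAG first: AG(p * q)
outer = norm_outermost(start, [ag_and, agag])   # merges the AGs first: AG((AG p) * q)
print(inner)
print(outer)
```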
5 A Rule-Based Implementation
The rewrite system R described above has been implemented using the ELAN [3] rule-based system, which provides an environment for specifying and executing rewrite programs in a language based on rewrite rules guided by strategies, expressed in a strategy language. Obviously, the key feature we use from ELAN is AC-rewriting. Indeed, an ELAN program can be evaluated by two execution tools (an interpreter and a compiler), both supporting AC-rewriting. The implementation of the rules in BRR is completely straightforward. We then declare the rules in ARE ∪ ARA after the rules in BRR. The order of rule declaration is significant in ELAN, since it implicitly gives the order of rule application. This means that terms involved in conditions of ARE ∪ ARA will be implicitly BRR-normalized. Therefore, ELAN rules corresponding to rules in ARE ∪ ARA can be easily implemented without an explicit call to BRR-normalization. For instance, the first rule of ARE will be declared in ELAN as the last default rule for A[0R?]:

[] AR[0 R x xor y*E[g R f]] => AGEimp(f,g)
   if x xor y*E[g R f] == 1 xor f xor f*E[g R f]
Note that terms involved in the condition are automatically BRR-normalized, and the equality test == corresponds to AC-equality. The boolean function implementing NF-CTL membership is also defined by a set of ELAN rules (as is the function implementing the depth of releases), like the following one:

[] NF-CTL(E[f1 R f2]) => t
   if STL(f1) and STL(f2)
We first tested our rewrite program with the ELAN interpreter on a set of CTL formulae [8]. We were also able to compile our rewrite program with the ELAN compiler. It first generates the related C code, and then a stand-alone executable. Communication between this executable and the symbolic model checker is possible via Unix pipes. Another, more elegant solution would be to link the generated C code of the rewrite program with the C code implementing symbolic model checking. In this direction, we envision using the TOM [14] pattern-matching compiler, which is part of the ongoing effort to develop an ELAN toolkit. TOM can be viewed as a pre-processor for classical imperative languages such as C or Java. It allows us to combine rule-based declarations and C code, and to declare functions by pattern matching, as in functional programming. An AC-extension of TOM would provide us with a way to compile the rewrite rules in R and to link the resulting C code with, for instance, a C library for manipulating Binary Decision Diagrams.
6 Conclusion
We have presented an algorithm for testing whether a CTL formula is equivalent to a formula in a subclass of CTL for which symbolic model checking can be improved thanks to fewer fixpoint computations. Our algorithm is based on the normalization of formulae with respect to a rewrite system. This normalization is followed by a membership test, to check whether the normal form belongs to a particular subclass of CTL. In the future, we plan to generalize our approach by studying how a rewrite system such as R can be used to consider larger subclasses of CTL with more nested temporal operators, like the one introduced in [6]. Another interesting open problem consists in combining our rewrite system R with simplification rules dedicated to the elimination of superfluous temporal operators. A set of simplification rules has already been identified [9], but it is not confluent, and so we still have to discover a good rewriting strategy. At the implementation level, our long-term project is to combine rewriting techniques for simplifying CTL formulae with symbolic model checking of the simplified CTL formulae. This perspective opens interesting implementation problems. It is a typical application where we want to use rewriting techniques (and compilation of rewriting) in the area of symbolic model checking, where efficient implementations are usually written in imperative programming languages.
References

1. NuSMV home page. http://nusmv.irst.itc.it, accessed on Apr. 23, 2002.
2. A. Biere, A. Cimatti, E. M. Clarke, and Y. Zhu. Symbolic model checking without BDDs. In Tools and Algorithms for Construction and Analysis of Systems, pages 193–207, 1999.
3. P. Borovanský, C. Kirchner, H. Kirchner, P.-E. Moreau, and C. Ringeissen. An Overview of ELAN. In C. Kirchner and H. Kirchner, editors, Proc. Second Intl. Workshop on Rewriting Logic and its Applications, Electronic Notes in Theoretical Computer Science, Pont-à-Mousson (France), Sept. 1998. Elsevier.
4. E. Clarke and E. A. Emerson. Design and synthesis of synchronization skeletons for branching time temporal logic. In Logics of Programs: Workshop, volume 131 of LNCS, pages 52–71. Springer-Verlag, 1981.
5. E. Clarke, O. Grumberg, and D. Peled. Model Checking. MIT Press.
6. D. Déharbe and A. M. Moreira. Symbolic model checking with fewer fixpoint computations. In World Congress on Formal Methods and their Application (FM'99), volume 1708 of LNCS, pages 272–288, 1999.
7. N. Dershowitz and J.-P. Jouannaud. Rewrite systems. In Handbook of Theoretical Computer Science, chapter 15. Elsevier Science Publishers B.V., 1990.
8. M. B. Dwyer, G. S. Avrunin, and J. C. Corbett. Patterns in property specifications for finite-state verification. Technical Report UM-CS-1998-035, 1998.
9. S. Graf. Logique du temps arborescent pour la spécification et preuve de programmes. PhD thesis, Institut National Polytechnique de Grenoble, France, 1984.
10. H. Iwashita, T. Nakata, and F. Hirose. CTL model checking based on forward state traversal. In ICCAD'96, page 82, 1996.
11. J.-P. Jouannaud and H. Kirchner. Completion of a set of rules modulo a set of equations. SIAM Journal on Computing, 15(4):1155–1194, 1986.
12. R. P. Kurshan. Computer-Aided Verification of Coordinating Processes. Princeton University Press, 1995.
13. K. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993.
14. P.-E. Moreau, C. Ringeissen, and M. Vittek. A pattern-matching compiler. In D. Parigot and M. van den Brand, editors, Proceedings of the 1st International Workshop on Language Descriptions, Tools and Applications, volume 44 of Electronic Notes in Theoretical Computer Science, Genova, Apr. 2001.
15. J.-P. Queille and J. Sifakis. Specification and verification of concurrent systems in CESAR. In Proc. 5th International Symposium on Programming, volume 137 of Lecture Notes in Computer Science, pages 244–263. Springer-Verlag, 1981.
16. A. Rubio. A Fully Syntactic AC-RPO. In P. Narendran and M. Rusinowitch, editors, Rewriting Techniques and Applications, 10th International Conference, RTA-99, LNCS 1631, pages 133–147, Trento, Italy, July 2–4, 1999. Springer-Verlag.
17. R. Socher-Ambrosius. Boolean Algebra Admits No Convergent Term Rewriting System. In R. V. Book, editor, Rewriting Techniques and Applications, 4th International Conference, RTA-91, LNCS 488, pages 264–274, Como, Italy, Apr. 10–12, 1991. Springer-Verlag.
18. L. Vigneron. Automated Deduction Techniques for Studying Rough Algebras. Fundamenta Informaticae, 33(1):85–103, Feb. 1998.
Conditions for Efficiency Improvement by Tree Transducer Composition

Janis Voigtländer

Department of Computer Science, Dresden University of Technology, 01062 Dresden, Germany.
[email protected]
Abstract. We study the question of efficiency improvement or deterioration for a semantic-preserving program transformation technique based on macro tree transducer composition. By annotating functional programs to reflect the internal property “computation time” explicitly in the computed output, and by manipulating such annotations, we formally prove syntactic conditions under which the composed program is guaranteed to be more efficient than the original program, with respect to call-by-need reduction to normal form. The developed criteria can be checked automatically, and thus are suitable for integration into an optimizing functional compiler.
1 Introduction
Lazy functional languages are well suited for a modular programming style, where a task is solved by combining solutions of subproblems. Unfortunately, modular programs often lack efficiency compared to other — often less understandable — programs that solve the same tasks. These inefficiencies are caused, e.g., by the production and consumption of structured intermediate results such as lists or trees. As an example, consider the following definitions in Haskell:

data Nat = S Nat | Z

exp :: Nat → Nat → Nat
exp (S x) y = exp x (exp x y)
exp Z y = S y
div :: Nat → Nat
div (S x) = div' x
div Z = Z

div' :: Nat → Nat
div' (S x) = S (div x)
div' Z = Z
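For a quick sanity check, these functions can be transcribed to Python, with plain integers n standing for the Peano numerals S^n Z (the transcription is ours, not part of the paper):

```python
# Integers n model the Peano numerals S^n Z.

def exp(x, y):            # exp (S x) y = exp x (exp x y); exp Z y = S y
    return y + 1 if x == 0 else exp(x - 1, exp(x - 1, y))

def div(x):               # div (S x) = div' x; div Z = Z
    return 0 if x == 0 else div2(x - 1)

def div2(x):              # div' (S x) = S (div x); div' Z = Z
    return 0 if x == 0 else 1 + div(x - 1)

for n in range(6):
    for m in range(6):
        assert exp(n, m) == 2 ** n + m     # exp (S^n Z) (S^m Z) = S^(2^n + m) Z
    assert div(n) == n // 2                # div (S^n Z) = S^(n div 2) Z
print("exp, div, div' behave as claimed")
```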
The function exp — computing exp (S^n Z) (S^m Z) = S^(2^n + m) Z — is defined using an accumulating parameter y, which will also be called a context parameter henceforth. The functions div and div' — computing div (S^n Z) = S^(n div 2) Z — are defined by mutual recursion. If we sequentially compose exp and div — by computing for some value t of type Nat the expression e = (div (exp t Z)) — an intermediate data structure is created by exp and consumed by the div- and div'-functions. A standard technique for eliminating intermediate results is classical deforestation [13], an algorithmic instance of the unfold/fold technique [1]. However, classical deforestation does not succeed in optimizing e, due to its
Research supported by the DFG under grant KU 1290/2-1.
S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 222–236, 2002. c Springer-Verlag Berlin Heidelberg 2002
well-known problem of not reaching accumulating parameters [2]. Also, classical deforestation was only proved to be non-deteriorating for linear programs or for call-by-name evaluation without sharing [11]. Kühnemann [7,8] tackled the problem of eliminating intermediate data structures in accumulating parameters by using composition results from the theory of tree transducers [5]. The functional programs considered are extended schemes of primitive recursion — allowing mutual recursion and nesting of terms in context parameter positions — so-called macro tree transducers (for short mtts). Already Engelfriet and Vogler [4] showed that the sequential composition of two mtts can be realized by a single mtt, if one of the original mtts is defined without using context parameters. For the above example, e would thus be transformed to e' = (f t (div Z) (div' Z)), with new functions f and g constructed as¹:

f (S x) y1 y2 = f x (f x y1 y2) (g x y1 y2)
f Z y1 y2 = y2
g (S x) y1 y2 = g x (f x y1 y2) (g x y1 y2)
g Z y1 y2 = S y1
The transformed program avoids the creation of an intermediate result and its eventual consumption, with obvious benefits for the efficiency. In particular, e' needs fewer lazy evaluation steps for reduction to normal form than e. In general, the number of call-by-need reduction steps performed by the transformed program might also increase compared to the original one. If, for example, we consider the expression (exp (div t) Z), tree transducer composition can again eliminate the intermediate result. However, in this case the transformed program will — except for very small inputs — perform more call-by-need reduction steps than the original program. Clearly, in order to use tree transducer composition as a program transformation technique in an optimizing compiler, we should be able to discriminate programs for which the transformation degrades efficiency from those for which the transformation is indeed beneficial. In this paper we develop such criteria that can be checked automatically by a compiler. These criteria are sufficient to classify the various examples of the composition techniques from [4] given in [9], where the performance improvement achieved by tree transducer composition for particular programs was assured by ad hoc reasoning or by experiments. Our results improve on formal efficiency considerations by Kühnemann [8] for linear programs and by Höff [6] also for nonlinear ones. For example, our criteria can detect that the replacement of e from the introductory example by e' is safe with respect to efficiency, which was not captured by previous results. Since in the case that the first involved mtt has no context parameters the tree transducer composition technique is equivalent to classical deforestation (cf. [8]), our results also establish call-by-need performance improvements through classical deforestation for some nonlinear programs. The remainder of this paper is organized as follows. In Sect.
2 we define basic notations and concepts of macro tree transducer units. Section 3 recalls
¹ The basic idea is to construct definitions for f and g such that for every t = (S^n Z) and t1 = (S^m Z):
f t (div t1) (div' t1) = div (exp t t1)
g t (div t1) (div' t1) = div' (exp t t1) .
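Under the integer encoding of Peano numerals used before, the constructed f and g and the two equations of this footnote can be checked directly; the following Python sketch (ours, with exp, div, div' restated) merely illustrates the claimed relationship:

```python
def exp(x, y):  return y + 1 if x == 0 else exp(x - 1, exp(x - 1, y))
def div(x):     return 0 if x == 0 else div2(x - 1)   # div' is called div2 here
def div2(x):    return 0 if x == 0 else 1 + div(x - 1)

def f(x, y1, y2):
    # f (S x) y1 y2 = f x (f x y1 y2) (g x y1 y2); f Z y1 y2 = y2
    return y2 if x == 0 else f(x - 1, f(x - 1, y1, y2), g(x - 1, y1, y2))

def g(x, y1, y2):
    # g (S x) y1 y2 = g x (f x y1 y2) (g x y1 y2); g Z y1 y2 = S y1
    return 1 + y1 if x == 0 else g(x - 1, f(x - 1, y1, y2), g(x - 1, y1, y2))

for t in range(5):
    for t1 in range(5):
        assert f(t, div(t1), div2(t1)) == div(exp(t, t1))
        assert g(t, div(t1), div2(t1)) == div2(exp(t, t1))
print("f and g satisfy the two footnote equations")
```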
program transformation by tree transducer composition. In Sect. 4 we develop our formal efficiency analysis and give the main theorems, with application to tree transducer composition and classical deforestation. Finally, Sect. 5 contains future research topics.
2 Preliminaries
We denote by IN the set of natural numbers including 0. For n ∈ IN, we denote by [n] the set {1, . . . , n} ⊆ IN, and by Xn the finite set {x1 , . . . , xn } of variables; analogously for Yn and Zn . We denote simultaneous substitution of v1 , . . . , vn for u1 , . . . , un in v by v[u1 , . . . , un ← v1 , . . . , vn ], but will also use an alternative notation similar to set comprehensions, e.g., v[ui ← vi | i ∈ [n]]. We write substitutions left-associative. A ranked alphabet is a pair (Σ, rankΣ ), where Σ is a finite, nonempty set of symbols and rankΣ assigns to every σ ∈ Σ a natural number k, which will also be given by writing σ (k) . For every k ∈ IN, we define Σ (k) = {σ ∈ Σ | rankΣ (σ) = k}. For every set A disjoint from Σ, we define the set TΣ (A) of trees over Σ indexed by A as the smallest set T such that (i) A ⊆ T and (ii) if σ ∈ Σ (k) and t1 , . . . , tk ∈ T , then also (σ t1 · · · tk ) ∈ T . If readability allows, outer brackets of trees will be omitted. We denote TΣ (∅) by TΣ . For every set C ⊆ Σ ∪ A, we denote the number of occurrences of symbols from C in a tree t by |t|C . For a singleton set C = {c}, we denote |t|{c} by |t|c . We consider left-linear rewrite systems [3] with rewrite rules of the form lhs = rhs — with lhs and rhs being trees over ranked alphabets indexed by a set of variables, which will usually not be mentioned explicitly — where the right-hand side rhs contains only variables that also occur in lhs. A rewrite system R induces a binary nondeterministic reduction relation ⇒R describing left-to-right application of a rule from R in some context. We use the well-known concepts of confluence, termination and normal form. For a confluent and terminating reduction relation ⇒R , we denote the unique normal form of a tree t with respect to ⇒R by nf (⇒R , t). While ⇒R does not fix a reduction strategy, our efficiency analysis is concerned with lazy functional programming languages. 
Hence, we consider call-by-need reduction steps [14] (leftmost-outermost reduction with sharing) and denote by cbn(R, t) the number of such steps required to reach the normal form nf (⇒R , t). We model functional programs by macro tree transducer units (for short mtt units). Firstly, we describe the possible shapes of right-hand sides of their rules.

Definition 1. Let Q and ∆ be ranked alphabets and k, r ∈ IN. The set RHS(Q, ∆, k, r) of right-hand sides over Q and ∆, with k recursion variables and r context variables, is the smallest set RHS ⊆ T∆∪Q (Xk ∪ Yr ) such that (i) Yr ⊆ RHS, (ii) for every δ ∈ ∆(n) and φ1 , . . . , φn ∈ RHS: (δ φ1 · · · φn ) ∈ RHS, and (iii) for every q ∈ Q(n+1) with n ∈ IN, xi ∈ Xk and φ1 , . . . , φn ∈ RHS: (q xi φ1 · · · φn ) ∈ RHS.

Definition 2. An mtt unit M is a tuple (Q, Σ, ∆, R) with a ranked alphabet Q of states, where Q(0) = ∅, a ranked alphabet Σ of input symbols, a ranked
alphabet ∆ of output symbols, where Q ∩ (Σ ∪ ∆) = ∅, and a set R of rules, such that R contains for every k, r ∈ IN, σ ∈ Σ (k) and q ∈ Q(r+1) , exactly one rule of the form: q (σ x1 · · · xk ) y1 · · · yr = rhsM,q,σ , with rhsM,q,σ ∈ RHS(Q, ∆, k, r).

Of course, the actual variable names used in rules R of an mtt unit are not fixed to come from Xk and Yr for some k, r ∈ IN; consistent renaming is allowed. The semantics of an mtt unit is given by the reduction relation induced by its rules. We give a rather artificial example of three mtt units that will be used to illustrate important phenomena in Sect. 4. For more practical examples see [9], where it was demonstrated that also typical functions on polymorphic data types and some higher-order functions like map can be viewed as mtt units by choosing appropriate function and constructor symbols.

Example 1. Let Σ = Ω = {S(1) , Z(0) } and ∆ = {δ(2) , γ(1) , α(0) }. Then M1 = ({q(2) }, Σ, ∆, R1 ), M1' = ({q'(3) }, Σ, ∆, R1' ) and M2 = ({p1(1) , p2(1) }, ∆, Ω, R2 ) are mtt units, where R1 , R1' and R2 contain rules as follows:

R1 :  q (S x1) y1 = γ (q x1 (δ (q x1 y1) y1))
      q Z y1 = y1
R1' : q' (S x1) y1 y2 = δ (q' x1 y2 (γ y1)) (γ (q' x1 y1 y2))
      q' Z y1 y2 = y1
R2 :  p1 (δ x1 x2) = p2 x1          p2 (δ x1 x2) = p1 x2
      p1 (γ x1) = S (p2 x1)         p2 (γ x1) = p1 x1
      p1 α = Z                      p2 α = Z .
Definition 3. An mtt unit M = (Q, Σ, ∆, R) is:
– a top-down tree transducer unit (for short tdtt unit), if Q = Q(1)
– recursion-linear, if for every q ∈ Q, σ ∈ Σ (k) , i ∈ [k]: |rhsM,q,σ |xi ≤ 1
– context-linear, if for every q ∈ Q(r+1) , σ ∈ Σ, h ∈ [r]: |rhsM,q,σ |yh ≤ 1
– linear, if it is recursion-linear and context-linear
– recursion-nondeleting, if for every q ∈ Q, σ ∈ Σ (k) and i ∈ [k]: |rhsM,q,σ |xi ≥ 1
– context-nondeleting, if for every q ∈ Q(r+1) , σ ∈ Σ and h ∈ [r]: |rhsM,q,σ |yh ≥ 1
– basic, if the right-hand sides of its rules do not contain nested calls, i.e., subtrees of the form (q xi · · · (q' xi · · ·) · · ·)
– atmost, if it is recursion-linear, and it is context-linear or basic
– atleast, if it is recursion-nondeleting, and it is context-nondeleting or basic.

3 Tree Transducer Composition
In this section we recall the semantic-preserving composition of two mtt units into a single mtt unit (the meaning of “semantic-preserving” will be made precise in Lemma 1 below). Constructions for performing such a composition in the cases that the first or the second mtt unit is a tdtt unit were given already in [4]. Here, we present a single transformation that captures both cases.
226
J. Voigtl¨ ander
Construction 1. Let M1 = (Q, Σ, ∆, R1 ) and M2 = (P, ∆, Ω, R2 ) be mtt units, such that one of the two is a tdtt unit, and Q ∩ P = ∅. Let m ∈ IN be the number of elements of P , and fix some ordering on P , such that P = {p1 , . . . , pm }. The composed mtt unit will have the set of states F = {qp(r∗m+s+1) | q ∈ Q(r+1) , p ∈ P (s+1) }. We use two rewrite systems:

Pre, which contains for every h ∈ [r] with Q(r+1) ≠ ∅ and every p ∈ P (1) , the rewrite rule: p yh = yh,p . Here the yh(0) and the yh,p(0) are treated as ordinary symbols, rather than as variables of the rewrite system.

Comp, which contains for every q ∈ Q(r+1) and p ∈ P (s+1) , the rewrite rule: p (q x y1 · · · yr ) z1 · · · zs = qp x (p1 y1 ) · · · (pm y1 ) · · · (p1 yr ) · · · (pm yr ) z1 · · · zs . Here x, y1 , . . . , yr and z1 , . . . , zs are considered as variables of the rewrite system. Since M1 or M2 is a tdtt unit, the ys and the zs will never both be present. We abbreviate the right-hand side of the above rewrite rule as ζq,p .
Since the ranked alphabets P, ∆, {y1 , y2 , . . .} and Q are pairwise disjoint, there are no critical pairs [3] in R2 ∪ Pre ∪ Comp. Hence, the reduction relation ⇒R2 ∪Pre∪Comp is confluent. It is also terminating, because for every rule the first arguments of calls to states of P in the right-hand side are proper subtrees of the first argument of the call on the left-hand side. Now, we can construct the mtt unit M1 M2 = (F, Σ, Ω, R1 R2 ) with rules as follows. For every q ∈ Q(r+1) , σ ∈ Σ (k) , rule q (σ x1 · · · xk ) y1 · · · yr = rhsM1,q,σ in R1 , and p ∈ P (s+1) , R1 R2 contains the rule: qp (σ x1 · · · xk ) y1,p1 · · · yr,pm z1 · · · zs = nf (⇒R2 ∪Pre∪Comp , p rhsM1,q,σ z1 · · · zs )
Then, M1 M2 implements the sequential composition of M1 and M2 , in the sense of the following lemma.

Lemma 1. For every rewrite rule (p (q x y1 · · · yr ) z1 · · · zs = ζq,p ) ∈ Comp, t ∈ TΣ , t1 , . . . , tr ∈ T∆ and t1' , . . . , ts' ∈ TΩ :
1. nf (⇒R1 ∪R2 , p (q t y1 · · · yr ) z1 · · · zs ) = nf (⇒R1 R2 , ζq,p [x ← t])
2. nf (⇒R1 ∪R2 , p (q t t1 · · · tr ) t1' · · · ts' ) = nf (⇒R1 R2 ∪R2 , ζq,p [x ← t][y1 , . . . , yr , z1 , . . . , zs ← t1 , . . . , tr , t1' , . . . , ts' ]) .

Proof. Both assertions follow from statement (II)(a)(i) of Lemma A.12 in [12] for φ = (q x1 y1 · · · yr ), respectively for φ = (q x1 t1 · · · tr ).

Example 2. We compose the mtt units M1 and M2 from Example 1, yielding the mtt unit M1 M2 = ({qp1(3) , qp2(3) }, Σ, Ω, R1 R2 ) with the following set of rules:

qp1 (S x1) y1,p1 y1,p2 = S (qp2 x1 (qp2 x1 y1,p1 y1,p2) y1,p1)
qp1 Z y1,p1 y1,p2 = y1,p1
qp2 (S x1) y1,p1 y1,p2 = qp1 x1 (qp2 x1 y1,p1 y1,p2) y1,p1
qp2 Z y1,p1 y1,p2 = y1,p2 .
Note that here we have Comp = {p1 (q x y1 ) = ζq,p1 , p2 (q x y1 ) = ζq,p2 }, where ζq,p1 = qp1 x (p1 y1 ) (p2 y1 ) and ζq,p2 = qp2 x (p1 y1 ) (p2 y1 ).
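On the concrete units of Example 1, Lemma 1 can be tested by running both sides. The Python sketch below is our own encoding (Δ-trees as nested tuples, numerals S^n Z over Σ and Ω as plain integers) of the rules of M1, M2 and the composed unit of Example 2, comparing the two-pass run p_i (q t t1) with the composed run:

```python
# Delta-trees are nested tuples ("delta", s, t), ("gamma", s), ("alpha",);
# numerals S^n Z over Sigma/Omega are plain integers n.

def q(n, y1):                    # M1: q (S x) y1 = gamma (q x (delta (q x y1) y1))
    if n == 0:
        return y1                # q Z y1 = y1
    return ("gamma", q(n - 1, ("delta", q(n - 1, y1), y1)))

def p1(t):                       # M2: p1 (delta a b) = p2 a; p1 (gamma a) = S (p2 a); p1 alpha = Z
    if t == ("alpha",):
        return 0
    return p2(t[1]) if t[0] == "delta" else 1 + p2(t[1])

def p2(t):                       # M2: p2 (delta a b) = p1 b; p2 (gamma a) = p1 a; p2 alpha = Z
    if t == ("alpha",):
        return 0
    return p1(t[2]) if t[0] == "delta" else p1(t[1])

def qp1(n, y1p1, y1p2):          # composed unit, state q_p1 (rules of Example 2)
    if n == 0:
        return y1p1
    return 1 + qp2(n - 1, qp2(n - 1, y1p1, y1p2), y1p1)

def qp2(n, y1p1, y1p2):          # composed unit, state q_p2
    if n == 0:
        return y1p2
    return qp1(n - 1, qp2(n - 1, y1p1, y1p2), y1p1)

t1 = ("alpha",)
for n in range(7):
    assert p1(q(n, t1)) == qp1(n, p1(t1), p2(t1))   # Lemma 1 on this instance
    assert p2(q(n, t1)) == qp2(n, p1(t1), p2(t1))
print("composed run agrees with the two-pass run")
```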
Another example of the application of Construction 1 can be found in the introduction, where f = expdiv and g = expdiv' . An optimizing compiler can take advantage of the composition construction by detecting appropriate places in the program where the rewrite rules from Comp can be applied. While the soundness of these rewritings is guaranteed by Lemma 1, it remains to be shown under which conditions Construction 1 actually leads to performance improvements over the original program (using mtt units M1 and M2 ) by the composed program (additionally containing the mtt unit M1 M2 , and with rewritings from Comp having been applied). From the form of rewrite rules in Comp it is obvious that intermediate data structures produced by M1 in the original program have been removed in the composed program. Thus, no memory cells for this intermediate result have to be allocated in the heap and later be deallocated by the garbage collector. This beneficial effect, however, might be rendered useless if the composed program needs to perform more reduction steps than the original one. Hence, we would like to establish conditions under which we have for every q ∈ Q(r+1) , p ∈ P (s+1) , t ∈ TΣ , t1 , . . . , tr ∈ T∆ and t1' , . . . , ts' ∈ TΩ :

cbn(R1 R2 ∪ R2 , ζq,p [x ← t][y1 , . . . , yr , z1 , . . . , zs ← t1 , . . . , tr , t1' , . . . , ts' ]) ≤ cbn(R1 ∪ R2 , p (q t t1 · · · tr ) t1' · · · ts' ) .
4 Efficiency Analysis
We want to formally relate the efficiency of a composed program obtained by Construction 1 to the efficiency of the original program. As computation time is an intensional property — i.e., it cannot be extracted from the result of a computation — such a study cannot directly use the algebraic methods developed to reason about extensional properties, e.g., to establish the correctness of the composition construction in Lemma 1. A well-known strategy in this situation is to externalize the internal property, e.g., by transforming or annotating programs. Rosendahl [10] produces for every first-order functional program a step-counting version that — when called with the same arguments — returns the number of call-by-value reduction steps performed by the original program. Sands [11] developed a call-by-name improvement theory and used it to prove the correctness of unfold/fold transformations [1], by annotating functional programs with a special identity function that represents a single “tick” of computation time, and by stating a set of laws that can be used to derive statements about the relative efficiency of program expressions. This use of a special symbol to indicate performed reduction steps is similar to what we will be doing in Lemmata 2 and 3 in Sect. 4.2. Note, however, that our transformation technique goes beyond unfold/fold steps if M1 is not a tdtt unit; hence Sands' results can neither be used to prove Lemma 1, nor to establish criteria under which Construction 1 improves the efficiency of programs.

4.1 Annotating Programs
In the following, let M1 = (Q, Σ, ∆, R1 ) and M2 = (P, ∆, Ω, R2 ) be two fixed mtt units, one of which is a tdtt unit, with , ◦, •, 0 ∉ Σ ∪ ∆ ∪ Ω ∪ Q ∪ P . We
use the notions and naming conventions from Construction 1. Based on M1 and M2 , we define — by annotating and adding rules — several new mtt units, the relevance of which will only become clear later.

Definition 4. An mtt unit M1left→right has components (Q, Σ', ∆', R1left→right ), with Σ' and ∆' obtained by adding to Σ and ∆ those symbols from {(1) , ◦(1) , •(1) , 0(1) } that are mentioned in left and in right, respectively, and with R1left→right containing rules as given in the table below. Several mtt units M2left→right = (P, ∆', Ω', R2left→right ) are introduced analogously.

R1→ :
  q (σ x1 · · · xk ) y1 · · · yr = (rhsM1,q,σ )            ∀q ∈ Q(r+1) , σ ∈ Σ (k)
R2→• :
  p (δ x1 · · · xk ) z1 · · · zs = • (rhsM2,p,δ )          ∀p ∈ P (s+1) , δ ∈ ∆(k)
  p ( x1 ) z1 · · · zs = • (p x1 z1 · · · zs )             ∀p ∈ P (s+1)
R2→• :
  p (δ x1 · · · xk ) z1 · · · zs = • (rhsM2,p,δ )          ∀p ∈ P (s+1) , δ ∈ ∆(k)
R2→◦ :
  p (δ x1 · · · xk ) z1 · · · zs = rhsM2,p,δ               ∀p ∈ P (s+1) , δ ∈ ∆(k)
  p ( x1 ) z1 · · · zs = ◦ (p x1 z1 · · · zs )             ∀p ∈ P (s+1)
R2→ :
  p (δ x1 · · · xk ) z1 · · · zs = 0 (rhsM2,p,δ )          ∀p ∈ P (s+1) , δ ∈ ∆(k)
R2◦•→◦• :
  p (δ x1 · · · xk ) z1 · · · zs = rhsM2,p,δ               ∀p ∈ P (s+1) , δ ∈ ∆(k)
  p (◦ x1 ) z1 · · · zs = ◦ (p x1 z1 · · · zs )            ∀p ∈ P (s+1)
  p (• x1 ) z1 · · · zs = • (p x1 z1 · · · zs )            ∀p ∈ P (s+1)
R1→◦•,0 :
  q (σ x1 · · · xk ) y1 · · · yr = ◦ (rhsM1,q,σ [(δ · · ·) ← • (δ · · ·) | δ ∈ ∆])   ∀q ∈ Q(r+1) , σ ∈ Σ (k)
R1→◦•,1 :
  q (σ x1 · · · xk ) y1 · · · yr = rhsM1,q,σ [(δ · · ·) ← • (δ · · ·) | δ ∈ ∆][(q' · · ·) ← ◦ (q' · · ·) | q' ∈ Q]   ∀q ∈ Q(r+1) , σ ∈ Σ (k)
R2→◦• :
  p (δ x1 · · · xk ) z1 · · · zs = • (rhsM2,p,δ )          ∀p ∈ P (s+1) , δ ∈ ∆(k)
  p ( x1 ) z1 · · · zs = ◦ (p x1 z1 · · · zs )             ∀p ∈ P (s+1)

Note that if M1 fulfills one of the restrictions introduced in Definition 3, then so do all the introduced M1left→right ; analogously for M2 .

4.2 Ticking of Original Program
Note that M1→ from Definition 4 outputs a -symbol in every rule application. If none of these symbols is afterwards duplicated, then the number of -symbols in the output produced by M1→ is exactly the number of performed call-by-need reduction steps. This fact is used in the following lemma to determine the efficiency of the sequential composition of M1 and M2 . For the rest of the paper, we fix some q ∈ Q(r+1) , p ∈ P (s+1) , t ∈ TΣ , t1 , . . . , tr ∈ T∆ and t1' , . . . , ts' ∈ TΩ , and the substitution κ = [y1 , . . . , yr , z1 , . . . , zs ← t1 , . . . , tr , t1' , . . . , ts' ].

Lemma 2. If M1 is context-linear or basic, and M2 is atmost, then:

cbn(R1 ∪ R2 , p (q t t1 · · · tr ) t1' · · · ts' ) = |nf (⇒R1→ ∪R2→• , p (q t t1 · · · tr ) t1' · · · ts' )|•
Proof. In every application of a rule from R1→ (corresponding to an R1 -step), one -symbol is produced in the intermediate result. There is no other way how -symbols can be introduced, and since M1→ is (i) context-linear or (ii) basic —
i.e., no -symbols will appear in context parameters of states from Q in case (ii), or those parameters cannot be copied in case (i) — none of these -symbols will be duplicated. Only those -symbols are reached and reproduced as •-symbols by R2→• that correspond to an R1 -step being forced by call-by-need reduction of (p (q t t1 · · · tr ) t1' · · · ts' ). Since M2→• is recursion-linear, every -symbol will be reproduced as a •-symbol at most once. Since M2→• is context-linear or basic, those •-symbols and also the •-symbols produced by M2→• at ∆-symbols — corresponding to the R2 -steps during the call-by-need reduction — will not be duplicated. Hence, the resulting number of •-symbols is equal to the number of steps of R1 and R2 in the original program.

Example 3. Recall the mtt units M1' and M2 from Example 1, and M1'→ and M2→• as obtained from them according to Definition 4. Note that M1' is basic and M2 is atmost, hence the preconditions of Lemma 2 are satisfied. With t = S Z and t1 = t2 = α, we have, e.g., the following reduction:

p2 (q' t t1 t2)
  ⇒R1'→  p2 ( (δ (q' Z t2 (γ t1)) (γ (q' Z t1 t2))))     (mark step of R1')
  ⇒R2→•  • (p2 (δ (q' Z t2 (γ t1)) (γ (q' Z t1 t2))))    (count marked step)
  ⇒R1'→  • (p2 (δ ( t2) (γ (q' Z t1 t2))))               (will not be counted)
  ⇒R2→•  • (• (p1 (γ (q' Z t1 t2))))                     (count step of R2)
  ⇒R2→•  • (• (• (S (p2 (q' Z t1 t2)))))                 (count step of R2)
  ⇒R1'→  • (• (• (S (p2 ( t1)))))                        (mark step of R1')
  ⇒R2→•  • (• (• (S (• (p2 t1)))))                       (count marked step)
  ⇒R2→•  • (• (• (S (• (• Z)))))                         (count step of R2)
Indeed, we have cbn(R1' ∪ R2 , p2 (q' t t1 t2 )) = | • (• (• (S (• (• Z)))))|• = 5. Note that a step of R1' that is not forced by call-by-need evaluation is not counted in the final output. If we refrain from counting the steps of M1 and settle for the approximation of only counting the steps of M2 , we can drop part of the restrictions in Lemma 2:

Lemma 3. If M2 is context-linear or basic, then:

cbn(R1 ∪ R2 , p (q t t1 · · · tr ) t1' · · · ts' ) > |nf (⇒R1 ∪R2→• , p (q t t1 · · · tr ) t1' · · · ts' )|•

Proof. Every step of R2→• (corresponding to an R2 -step) produces one •-symbol. Since M2→• is context-linear or basic, no such symbol is duplicated. Hence, the number of •-symbols in nf (⇒R1 ∪R2→• , p (q t t1 · · · tr ) t1' · · · ts' ) is equal to the number of R2 -steps during call-by-need reduction of (p (q t t1 · · · tr ) t1' · · · ts' ). Since during this reduction at least one step must be performed by R1 , the inequality follows.

4.3 Ticking of Composed Program
Assume that Construction 1 composes M1 and M2 to M1 M2 = (F, Σ, Ω, R1 R2 ). Since M1→ or M2→◦ is a tdtt unit, Construction 1 is also applicable to these mtt units, yielding M1→ M2→◦ = (F, Σ, Ω ∪ {◦(1) }, R1→ R2→◦ ).
J. Voigtländer
Example 4. For M1→ and M2→◦ as obtained from the mtt units in Example 1, Construction 1 yields the mtt unit M1→ M2→◦ with set of rules:

qp1 (S x1) y1,p1 y1,p2 = ◦ (S (qp2 x1 (qp2 x1 y1,p1 y1,p2) y1,p1))
qp1 Z y1,p1 y1,p2 = ◦ y1,p1
qp2 (S x1) y1,p1 y1,p2 = ◦ (qp1 x1 (qp2 x1 y1,p1 y1,p2) y1,p1)
qp2 Z y1,p1 y1,p2 = ◦ y1,p2 .
In the previous example, the rules obtained by the composition construction for M1→ and M2→◦ correspond to the rules obtained in Example 2 by composing M1 and M2, except that every right-hand side has an additional ◦-symbol on top. This is due to the ✷-symbols on top of all right-hand sides of R1→ and due to the added rules p1 (✷ x1) = ◦ (p1 x1) and p2 (✷ x1) = ◦ (p2 x1) in R2→◦. The following lemma establishes that this observation holds in general.

Lemma 4. For every f ∈ F and σ ∈ Σ: rhsM1→ M2→◦,f,σ = ◦ (rhsM1 M2,f,σ).

Proof. Straightforward by noting that the definitions of Pre and Comp only depend on Q and P, and that for every q ∈ Q, p ∈ P(s+1) and σ ∈ Σ:
nf(⇒R2→◦ ∪Pre∪Comp, p (✷ (rhsM1,q,σ)) z1 · · · zs)
= nf(⇒R2→◦ ∪Pre∪Comp, ◦ (p rhsM1,q,σ z1 · · · zs))
= ◦ (nf(⇒R2 ∪Pre∪Comp, p rhsM1,q,σ z1 · · · zs))
Hence, M1→ M2→◦ can be used to approximate the number of reduction steps of M1 M2, as exploited in the proof of the following lemma, which estimates the efficiency of the composed program without actually performing Construction 1.

Lemma 5.
cbn(R1 R2 ∪ R2, ζq,p [x ← t]κ) ≤ |nf(⇒R2→, (nf(⇒R1→ ∪R2→◦, p (q t y1 · · · yr) z1 · · · zs)) κ)|{◦,✷}

Proof. By Lemma 4, we know that for every f ∈ F and σ ∈ Σ, the following holds: rhsM1→ M2→◦,f,σ = ◦ (rhsM1 M2,f,σ). Since all rules simply have an additional symbol on top, we get the following equivalence:
cbn(R1 R2 ∪ R2, ζq,p [x ← t]κ) = cbn(R1→ R2→◦ ∪ R2→, ζq,p [x ← t]κ).
During call-by-need reduction of ζq,p [x ← t]κ with ⇒R1→ R2→◦ ∪R2→, every step of R1→ R2→◦ produces a ◦-symbol, while every step of R2→ produces a ✷-symbol. Hence, the overall number of steps is ≤ |nf(⇒R1→ R2→◦ ∪R2→, ζq,p [x ← t]κ)|{◦,✷}. By confluence considerations, we can obtain the same normal form by first reducing ζq,p [x ← t] to its normal form with respect to ⇒R1→ R2→◦, then performing the substitution κ and further reducing to normal form with ⇒R2→. This gives us the equivalent expression |nf(⇒R2→, (nf(⇒R1→ R2→◦, ζq,p [x ← t])) κ)|{◦,✷}, which by Lemma 1 from Sect. 3 is equal to:
|nf(⇒R2→, (nf(⇒R1→ ∪R2→◦, p (q t y1 · · · yr) z1 · · · zs)) κ)|{◦,✷}.

4.4 Combining Approximations
With Lemmata 2 and 3 we have two different ways to estimate the efficiency of the original program, while Lemma 5 approximates the efficiency of the composed program. In each case, this is done by annotating the rules of M1 and
Conditions for Efficiency Improvement by Tree Transducer Composition
M2 appropriately and counting certain symbols in the produced output. In the following, we will combine these results to determine the relative efficiency of the composed vs. the original program. Using Lemmata 2 and 5, we obtain the first of our main theorems:

Theorem 1. If M1 is context-linear or basic, and M2 is atmost, then:
cbn(R1 R2 ∪ R2, ζq,p [x ← t]κ) − cbn(R1 ∪ R2, p (q t t1 · · · tr) t1 · · · ts) ≤ 0

Proof. By Lemmata 5 and 2, we have:
cbn(R1 R2 ∪ R2, ζq,p [x ← t]κ) − cbn(R1 ∪ R2, p (q t t1 · · · tr) t1 · · · ts)
≤ |χ|{◦,✷} − |nf(⇒R1→ ∪R2→•, p (q t t1 · · · tr) t1 · · · ts)|•,
where χ = nf(⇒R2→, (nf(⇒R1→ ∪R2→◦, p (q t y1 · · · yr) z1 · · · zs)) κ). It is clear that:
|nf(⇒R1→ ∪R2→•, p (q t t1 · · · tr) t1 · · · ts)|•
= |nf(⇒R2→•, (nf(⇒R1→ ∪R2→•, p (q t y1 · · · yr) z1 · · · zs)) κ)|•.
Since we have t1, . . . , tr ∈ T∆, the outer reduction with ⇒R2→• will only apply rules at symbols from ∆. Hence, the previous expression is equivalent to |χ'|{•,✷}, where χ' = nf(⇒R2→, (nf(⇒R1→ ∪R2→•, p (q t y1 · · · yr) z1 · · · zs)) κ). By comparing the definitions of R2→• and R2→◦, it should be obvious that |χ|{◦,✷} − |χ'|{•,✷} = |χ|◦ − |χ'|• ≤ 0.

Note that by further considering the value of |χ|◦ − |χ'|• in the previous proof, we could obtain the more precise statement that under the preconditions of Theorem 1 the composed program saves at least as many reduction steps as were performed in the original program by states of M2 on the part of the intermediate result produced by rules of M1. Also, note that Theorem 1 generalizes the efficiency statements about the composed program compared to the original program in Corollary 21 and Theorem 23 of [8], where M1 and M2 were required to be linear, and where in the case that M1 is not a tdtt unit M2 was restricted to have only one state.
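The measure cbn counts call-by-need reduction steps, under which a duplicated argument is shared and therefore reduced at most once. A small Python model (our illustration, not the paper's formal reduction semantics) makes the effect of this sharing on the step count concrete: thunks memoize their result, so a syntactically duplicated recursive call costs only one expansion.

```python
steps = 0

def thunk(f):
    """Memoize f: evaluate on the first force, reuse the result afterwards."""
    cell = []
    def force():
        if not cell:
            cell.append(f())
        return cell[0]
    return force

def exp2(n):
    """Compute 2^n via a duplicated but shared recursive call."""
    global steps
    steps += 1                       # one "rule application"
    if n == 0:
        return 1
    rec = thunk(lambda: exp2(n - 1))
    return rec() + rec()             # call-by-need: exp2(n-1) runs once

result = exp2(3)
print(result, steps)                 # → 8 4
```

With call-by-need, exp2(3) costs 4 expansions; a call-by-name reading of the same definition would cost 2^4 − 1 = 15.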
In order to relax the a priori restrictions imposed by Theorem 1 on the involved mtt units, we can use Lemma 3 (instead of Lemma 2) as starting point:

Lemma 6. If M2 is context-linear or basic, then for every λ ∈ {0, 1}:
cbn(R1 R2 ∪ R2, ζq,p [x ← t]κ) − cbn(R1 ∪ R2, p (q t t1 · · · tr) t1 · · · ts) ≤ λ + |χ|◦ − |χ|•,
where χ = nf(⇒R2◦•→◦•, p (nf(⇒R1→◦•,λ, q t t1 · · · tr)) t1 · · · ts).

Proof. By Lemmata 5 and 3, we have:
cbn(R1 R2 ∪ R2, ζq,p [x ← t]κ) − cbn(R1 ∪ R2, p (q t t1 · · · tr) t1 · · · ts)
≤ |χ1|{◦,✷} − |nf(⇒R1 ∪R2→•, p (q t t1 · · · tr) t1 · · · ts)|•,
where χ1 = nf(⇒R2→, (nf(⇒R1→ ∪R2→◦, p (q t y1 · · · yr) z1 · · · zs)) κ). Replacing R2→◦ by R2→◦• in the expression defining χ1 does not change the number of occurrences of ◦ and ✷, because only additional •-symbols will appear in the output; hence |χ1|{◦,✷} = |χ2|{◦,✷}, where
χ2 = nf(⇒R2→, (nf(⇒R1→ ∪R2→◦•, p (q t y1 · · · yr) z1 · · · zs)) κ).
Also, it is clear that:
|nf(⇒R1 ∪R2→•, p (q t t1 · · · tr) t1 · · · ts)|•
= |nf(⇒R1→ ∪R2→◦•, p (q t t1 · · · tr) t1 · · · ts)|•
= |nf(⇒R2→◦•, (nf(⇒R1→ ∪R2→◦•, p (q t y1 · · · yr) z1 · · · zs)) κ)|•.
Since we have t1, . . . , tr ∈ T∆, the outer reduction with ⇒R2→◦• will only apply rules at symbols from ∆, producing •-symbols. Hence, the previous expression is equivalent to |χ2|{•,✷}, and thus the right-hand side of the above inequality is equal to |χ2|{◦,✷} − |χ2|{•,✷}, respectively to |χ3|◦ − |χ3|•, where
χ3 = nf(⇒R2, (nf(⇒R2→◦•, p (nf(⇒R1→, q t y1 · · · yr)) z1 · · · zs)) κ).
The rules from R2→◦• replace every ✷-symbol by one ◦-symbol, and produce a •-symbol for every consumed symbol from the intermediate ranked alphabet ∆. The same effect can be achieved by firstly directly producing ◦-symbols instead of ✷-symbols, and secondly marking every produced intermediate symbol from ∆ with a •-symbol and then just reproducing those. Hence, χ3 is equal to:
nf(⇒R2, (nf(⇒R2◦•→◦•, p (nf(⇒R1→◦•,0, q t y1 · · · yr)) z1 · · · zs)) κ).
Since, furthermore, the t1, . . . , tr ∈ T∆ do not contain ◦- or •-symbols, and the rules of R2 are contained in R2◦•→◦•, the previous expression is by confluence considerations equal to:
χ4 = nf(⇒R2◦•→◦•, p (nf(⇒R1→◦•,0, q t t1 · · · tr)) t1 · · · ts).
In nf(⇒R1→◦•,0, q t t1 · · · tr), the rules from R1→◦•,0 produce one ◦-symbol whenever a state is applied at some input symbol. Alternatively, we could produce one ◦-symbol atop every state call. Hence, χ4 is also equal to:
χ5 = nf(⇒R2◦•→◦•, p (nf(⇒R1→◦•,1, ◦ (q t t1 · · · tr))) t1 · · · ts).
In the case λ = 0, the lemma now follows from |χ4|◦ − |χ4|• = 0 + |χ|◦ − |χ|•, while in the case λ = 1, the lemma follows from |χ5|◦ − |χ5|• = 1 + |χ|◦ − |χ|• (by normal form and definition of | · |◦).

Lemma 6 is promising, because it estimates the efficiency improvement or deterioration of the composed program over the original one, without the need of actually performing the composition construction. But, in contrast to the above Theorem 1, the approximation is still input-dependent, while we would like to obtain a statement about all runs of the original and the composed program. Since we are interested in the difference in the number of occurrences of ◦- and •-symbols, we can manipulate the right-hand sides of R1→◦•,λ, as long as we do not decrease the value of |χ|◦ − |χ|• in Lemma 6. A trivial way to do this is by removing • (◦ · · ·)- and ◦ (• · · ·)-contexts, because the rules in R2◦•→◦• would just reproduce those. If ◦- and •-symbols do not occur together, certain conditions have to be fulfilled to allow "bringing them closer" without decreasing the overall value of |χ|◦ − |χ|•. Such conditions are established in the following definition and lemma, and are then used to prove our second main theorem.

Definition 5. The rewrite system Elim contains the following rewrite rules (with variables u, u1, u2, . . .):
1. • (◦ u) = u
2. ◦ (• u) = u
3. if M2 is atmost, then for every δ ∈ ∆(n) and i ∈ [n]:
• (δ u1 · · · un) = (δ u1 · · · (• ui) · · · un)
4. if M2 is atleast, then for every δ ∈ ∆(n) and i ∈ [n]:
(δ u1 · · · (• ui) · · · un) = • (δ u1 · · · un)
5. if M1 is context-nondeleting and M2 is atleast, then for every q ∈ Q(n+1) and i ∈ [n]:
(q u u1 · · · (• ui) · · · un) = • (q u u1 · · · un)

Lemma 7. Let M = (Q, Σ, ∆ ∪ {◦(1), •(1)}, R) be an mtt unit, where Q, Σ and ∆ are the ranked alphabets of M1 (and R will typically contain annotated versions of the rules from M1, which however is not a technical precondition of this lemma). Assume that M is context-nondeleting, if M1 is context-nondeleting. If we rewrite one right-hand side in R with one rewrite step of Elim, yielding an mtt unit M' with set of rules R', then |χ|◦ − |χ|• ≤ |χ'|◦ − |χ'|•, where:
χ = nf(⇒R2◦•→◦•, p (nf(⇒R, q t t1 · · · tr)) t1 · · · ts)
χ' = nf(⇒R2◦•→◦•, p (nf(⇒R', q t t1 · · · tr)) t1 · · · ts).

Proof sketch. The lemma can be proved by structural induction on t ∈ TΣ, using a nested induction on the structures of right-hand sides from R and R2◦•→◦•, respectively, and using, for every p ∈ P(s'+1), n ∈ IN, i ∈ [n], τ, τ1, . . . , τn ∈ T∆∪{◦(1),•(1)}, t' ∈ TΣ, and δ ∈ ∆(n) or q ∈ Q(n+1), one of the following properties (depending on the rule from Elim that was applied):

1. |χ1|◦ − |χ1|• ≤ |χ1'|◦ − |χ1'|•
2. |χ2|◦ − |χ2|• ≤ |χ2'|◦ − |χ2'|•
3. |χ3|◦ − |χ3|• ≤ |χ3'|◦ − |χ3'|•, if M2 is atmost
4. |χ4|◦ − |χ4|• ≤ |χ4'|◦ − |χ4'|•, if M2 is atleast
5. |χ5|◦ − |χ5|• ≤ |χ5'|◦ − |χ5'|•, if M1 is context-nondeleting and M2 is atleast,

where:
χ1 = nf(⇒R2◦•→◦•, p (• (◦ τ)) z1 · · · zs')
χ2 = nf(⇒R2◦•→◦•, p (◦ (• τ)) z1 · · · zs')
χ1' = χ2' = nf(⇒R2◦•→◦•, p τ z1 · · · zs')
χ3 = χ4' = nf(⇒R2◦•→◦•, p (• (δ τ1 · · · τn)) z1 · · · zs')
χ3' = χ4 = nf(⇒R2◦•→◦•, p (δ τ1 · · · (• τi) · · · τn) z1 · · · zs')
χ5 = nf(⇒R2◦•→◦•, p (nf(⇒R, q t' τ1 · · · (• τi) · · · τn)) z1 · · · zs')
χ5' = nf(⇒R2◦•→◦•, p (nf(⇒R, • (q t' τ1 · · · τn))) z1 · · · zs').

Properties 1 and 2 are immediate from χ1 = • (◦ χ1') and χ2 = ◦ (• χ2'). For j ∈ {3, 4, 5} it can easily be seen that |χj|◦ = |χj'|◦. Hence, to validate properties 3–5, it remains to prove that under the appropriate restrictions: |χ3|• ≤ |χ3'|•, |χ4|• ≤ |χ4'|•, respectively, |χ5|• ≤ |χ5'|•. These inequations can be established by using the rule p (• x1) z1 · · · zs' = • (p x1 z1 · · · zs') in R2◦•→◦•, and the following two facts²:

Fact 1: if M2 is atmost, τ ∈ T∆∪{◦(1),•(1)}, and τ' is obtained by inserting at most one •-symbol into τ, then:
|nf(⇒R2◦•→◦•, p τ' z1 · · · zs')|• ≤ 1 + |nf(⇒R2◦•→◦•, p τ z1 · · · zs')|•.
² For property 5 we additionally use the observation that for context-nondeleting M, nf(⇒R, q t τ1 · · · τn) is obtained from nf(⇒R, q t τ1 · · · (• τi) · · · τn) by removing at least one •-symbol.
Fact 2: if M2 is atleast, τ ∈ T∆∪{◦(1),•(1)}, and τ' is obtained by removing at least one •-symbol from τ, then:
1 + |nf(⇒R2◦•→◦•, p τ' z1 · · · zs')|• ≤ |nf(⇒R2◦•→◦•, p τ z1 · · · zs')|•.

Theorem 2. If M2 is context-linear or basic, and there exists λ ∈ {0, 1} such that the rules of R1→◦•,λ can be rewritten with (finitely many applications of) Elim until no ◦-symbols remain, then:
cbn(R1 R2 ∪ R2, ζq,p [x ← t]κ) − cbn(R1 ∪ R2, p (q t t1 · · · tr) t1 · · · ts) ≤ λ

Proof. By the preconditions we know that R1→◦•,λ can be rewritten with Elim to some R∗ containing no ◦-symbols. By Lemma 6 and repeated applications of Lemma 7, we then have:
cbn(R1 R2 ∪ R2, ζq,p [x ← t]κ) − cbn(R1 ∪ R2, p (q t t1 · · · tr) t1 · · · ts) ≤ λ + |χ'|◦ − |χ'|•,
where χ' = nf(⇒R2◦•→◦•, p (nf(⇒R∗, q t t1 · · · tr)) t1 · · · ts). Since R∗ produces no ◦-symbols, no such symbols can occur in χ', i.e. |χ'|◦ = 0. Hence, the theorem follows from |χ'|• ∈ IN.

Actually, we could obtain a more precise statement by further considering the value of |χ'|•. Note that since Theorem 2 is based on the approximation from Lemma 3, which disregarded the steps performed by M1 in the original program, we actually obtain that the composed program performs at most as many reduction steps as were performed in the original program by states of M2, plus λ.

4.5 Application
In Theorems 1 and 2 we have given criteria under which the composed program obtained from Construction 1 is at least as efficient as the original program. Note that these sufficient conditions for non-deterioration can be checked on the original program, hence an optimizing compiler will perform the composition construction only if it has ensured beforehand that this is indeed beneficial. As an example, consider e = (div (exp t Z)) from the introduction. Since the mtt unit defining exp is context-linear and the tdtt unit defining div and div' is atmost, Theorem 1 is sufficient to automatically decide that a replacement of e by e' is safe with respect to efficiency (for every possible input t). Note that also Theorem 2 is constructive, in the sense that we can algorithmically decide whether there exists a λ ∈ {0, 1} such that the rules of a given R1→◦•,λ can be rewritten with Elim until no ◦-symbols remain, because the right-hand sides of mtt rules are finite trees and none of the rewrite rules in Elim introduces new symbols.

Example 5. Recall the mtt units M1 and M2 from Example 1 (composed in Example 2). Since M2 is context-linear, Theorem 2 is applicable as follows. Consider the rules in R1→◦•,1 (i.e., λ = 1):

q (S x1) y1 = • (γ (◦ (q x1 (• (δ (◦ (q x1 y1)) y1)))))
q Z y1 = y1

Since M2 is atmost, Elim contains the rewrite rules of points 1–3 in Definition 5 from the previous subsection. Hence, we can rewrite as follows:
rhsM1→◦•,1,q,S
⇒Elim(3) γ (• (◦ (q x1 (• (δ (◦ (q x1 y1)) y1)))))
⇒Elim(1) γ (q x1 (• (δ (◦ (q x1 y1)) y1)))
⇒Elim(3) γ (q x1 (δ (• (◦ (q x1 y1))) y1))
⇒Elim(1) γ (q x1 (δ (q x1 y1) y1)).

From Theorem 2 it now follows that for every t ∈ TΣ and t1 ∈ T∆:
cbn(R1 R2 ∪ R2, qp1 t (p1 t1) (p2 t1)) − cbn(R1 ∪ R2, p1 (q t t1)) ≤ 1
cbn(R1 R2 ∪ R2, qp2 t (p1 t1) (p2 t1)) − cbn(R1 ∪ R2, p2 (q t t1)) ≤ 1.

4.6 Results about Classical Deforestation
Kühnemann [8] compares tree transducer composition with classical deforestation [13]. From his Lemma 20 it follows that if M1 is a tdtt unit, then our composition construction essentially yields the same result as classical deforestation with implicit let-expressions. Hence, Theorems 1 and 2 can be used to establish conditions under which classical deforestation for lazy languages is guaranteed to improve efficiency even for nonlinear programs:

Corollary 1. In the following cases, classical deforestation leads to a program at least as efficient as the original program³:
1. M1 is a tdtt unit and M2 is an atmost mtt unit.
2. M1 is a tdtt unit, every rule of which has a right-hand side of the form (δ · · ·) for some δ ∈ ∆, and M2 is a context-linear or basic mtt unit.

Proof. The proposition for case 1 follows from Theorem 1. In case 2 the proposition follows from Theorem 2 with λ = 0, applying the Elim-rule 2 from Definition 5.
5 Future Work
The presented efficiency analysis gives sufficient conditions for when call-by-need reduction of the composed program to normal form does not need more steps than for the original expression, independently of the input. In lazy functional programs, however, such expressions might also occur in a context where their reduction to normal form is not necessary. Hence, the analysis should be made context-independent. We conjecture that the efficiency statement from Theorem 1 remains valid also for partial call-by-need reductions, as does the statement of Theorem 2, if the Elim-rules under points 4 and 5 of Definition 5 are abandoned. Voigtländer and Kühnemann [12] presented a new composition technique that generalizes the transformation considered here, by handling also cases where both involved mtt units use accumulating parameters. Our formal efficiency analysis method is also applicable to the extended transformation and then successfully classifies all examples from [7,8,9,12], but due to space constraints we could not elaborate on this more general setting in the present paper.
³ By the remarks below the proofs of Theorems 1 and 2, we can even show that the resulting program performs strictly fewer reduction steps than the original program.
Acknowledgment. I would like to thank Armin Kühnemann and the anonymous referees for helpful comments and suggestions.
References

1. R.M. Burstall and J. Darlington. A transformation system for developing recursive programs. J. ACM, 24:44–67, 1977.
2. W.N. Chin. Safe fusion of functional expressions II: Further improvements. J. Funct. Prog., 4:515–555, 1994.
3. N. Dershowitz and J.P. Jouannaud. Rewrite systems. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B, chapter 6, pages 243–320. Elsevier Science Publishers B.V., 1990.
4. J. Engelfriet and H. Vogler. Macro tree transducers. J. Comput. Syst. Sci., 31:71–145, 1985.
5. Z. Fülöp and H. Vogler. Syntax-Directed Semantics—Formal Models Based on Tree Transducers. Monographs in Theoretical Computer Science. Springer-Verlag, 1998.
6. M. Höff. Vergleich von Verfahren zur Elimination von Zwischenergebnissen bei funktionalen Programmen. Master thesis, Dresden University of Technology, 1999.
7. A. Kühnemann. Benefits of tree transducers for optimizing functional programs. In Foundations of Software Technology & Theoretical Computer Science, Chennai, India, Proceedings, volume 1530 of LNCS, pages 146–157. Springer-Verlag, 1998.
8. A. Kühnemann. Comparison of deforestation techniques for functional programs and for tree transducers. In Functional and Logic Programming, Tsukuba, Japan, Proceedings, volume 1722 of LNCS, pages 114–130. Springer-Verlag, 1999.
9. A. Kühnemann and J. Voigtländer. Tree transducer composition as deforestation method for functional programs. Technical Report TUD-FI01-07, Dresden University of Technology, 2001.
10. M. Rosendahl. Automatic complexity analysis. In Functional Programming Languages and Computer Architecture, London, England, Proceedings, pages 144–156. ACM Press, 1989.
11. D. Sands. Total correctness by local improvement in the transformation of functional programs. ACM Trans. on Prog. Lang. and Systems, 18:175–234, 1996.
12. J. Voigtländer and A. Kühnemann. Composition of functions with accumulating parameters. Technical Report TUD-FI01-08, Dresden University of Technology, 2001. http://wwwtcs.inf.tu-dresden.de/~voigt/TUD-FI01-08.ps.gz.
13. P. Wadler. Deforestation: Transforming programs to eliminate trees. Theoret. Comput. Sci., 73:231–248, 1990.
14. C.P. Wadsworth. Semantics and Pragmatics of the Lambda Calculus. PhD thesis, Oxford University, 1971.
Rewriting Strategies for Instruction Selection

Martin Bravenboer and Eelco Visser

Institute of Information and Computing Sciences, Universiteit Utrecht,
P.O. Box 80089, 3508 TB Utrecht, The Netherlands.
http://www.cs.uu.nl/~visser,
[email protected],
[email protected]
Abstract. Instruction selection (mapping IR trees to machine instructions) can be expressed by means of rewrite rules. Typically, such sets of rewrite rules are highly ambiguous. Therefore, standard rewriting engines based on fixed, exhaustive strategies are not appropriate for the execution of instruction selection. Code generator generators use special purpose implementations employing dynamic programming. In this paper we show how rewriting strategies for instruction selection can be encoded concisely in Stratego, a language for program transformation based on the paradigm of programmable rewriting strategies. This embedding obviates the need for a language dedicated to code generation, and makes it easy to combine code generation with other optimizations.
1 Introduction
S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 237–251, 2002.
© Springer-Verlag Berlin Heidelberg 2002

Code generation is the phase in the compilation of high-level programs in which an intermediate representation (IR) of a program is translated to a list of machine instructions. Code generation is usually divided into instruction selection and register allocation. During instruction selection the nodes of IR expression trees are associated with machine instructions. The process of instruction selection can be formulated as a rewriting problem. First the IR tree is rewritten to an instruction tree, in which parts (tiles) of the original tree are replaced by instruction identifiers. This instruction tree is then flattened into a list of instructions (code emission) during which (temporary) registers are introduced to hold intermediate values. Typically, the set of rewrite rules for rewriting the IR tree is highly ambiguous, i.e., there are many instruction sequences that implement the computation specified by the tree, some of which are more efficient than others. The instruction sequence obtained depends on the rewriting strategy used to apply the rules. Standard rewriting engines such as ASF+SDF [4] provide a fixed strategy for exhaustively applying rewrite rules, e.g., innermost normalization. This is appropriate for many applications, e.g., algebraic simplifications on expression trees, but not for obtaining the most efficient code sequence for an instruction selection problem. For the construction of compiler back-ends, code generator generators such as TWIG [1], BEG [5], and BURG [8] use tree pattern matching [6] and dynamic programming [2] to compute all possible matches in parallel and then choose the
rewrite sequences with the lowest cost. These systems use the same basic strategy, although there are some differences between them, mainly in the choice of pattern match algorithm and the amount of precomputation done at generator generation time. Although some other tree manipulation operations can be formulated in these systems, they are essentially specialized for code generation. Other compilation tasks need to be defined in a different language. In this paper we show how rewriting strategies for instruction selection can be encoded concisely in Stratego, a language for program transformation based on the paradigm of programmable rewriting strategies [10, 11]. The explicit specification of the rewriting strategy obviates the need for a special purpose language for the specification of instruction selection, allows the programmer to switch to a different strategy, and makes it easier to combine code generation with other optimizations such as algebraic simplification, peephole optimization, and inlining. In Section 2 we illustrate the instruction selection problem by means of a small intermediate representation and a RISC-like instruction set, define instruction selection rules, and discuss covering a tree with rules. In Section 3 we explore several strategies for applying the RISC selection rules, including global backtracking to generate all possibilities, innermost rewriting, and greedy top-down (maximal munch). It turns out that no dynamic programming is needed in the case of simple instruction sets. In Section 4 we turn to the problem of code generation for complex instruction set (CISC) machines, which is illustrated with a small instruction set and corresponding rewrite rules. In Section 5 we introduce a specification of a rewriting strategy based on dynamic programming in the style of [1] to compute the cheapest rewrite sequence given costs associated with rewrite rules.
The generic specification of dynamic programming is very concise (it fits in half a page) and provides a nice illustration of the use of scoped dynamic rewrite rules [9].
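To give a flavour of the dynamic-programming idea used by such code generator generators, here is our own Python sketch (not the Stratego specification discussed later): compute, bottom-up and memoized, the cheapest cover of each IR subtree. The rule names echo the RISC instructions of Section 2, but the unit costs are invented for illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def best_cost(t):
    """Cheapest cover of IR tree t (nested tuples), computed bottom-up."""
    head = t[0]
    if head in ("TEMP", "REG"):
        return 0                                   # leaves cost nothing
    if head == "CONST":
        return 1                                   # LOADC
    if head == "BINOP":                            # only PLUS in this sketch
        _, _, e1, e2 = t
        cost = 1 + best_cost(e1) + best_cost(e2)   # ADD
        if e2[0] == "CONST":
            cost = min(cost, 1 + best_cost(e1))    # ADDI folds the constant
        return cost
    if head == "MEM":
        cost = 1 + best_cost(t[1])                 # LOAD
        inner = t[1]
        if inner[0] == "BINOP" and inner[3][0] == "CONST":
            cost = min(cost, 1 + best_cost(inner[2]))  # LOADI folds two nodes
        return cost
    raise ValueError(head)

t = ("MEM", ("BINOP", "PLUS", ("TEMP", "a"), ("CONST", 8)))
print(best_cost(t))  # → 1: LOADI covers MEM, BINOP and CONST in one tile
```

Memoization (here via lru_cache over hashable tuple terms) is what makes the exhaustive comparison of alternative tilings linear in the tree size.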
2 Instruction Selection
In this section we describe instruction selection as a rewriting problem. First we describe a simple intermediate representation and a subset of an instruction set for a RISC-like machine. Then we describe the problem of code generation for this RISC machine as a set of rewrite rules. The result of rewriting an intermediate representation tree with these rules is a tile tree, which is turned into a list of instructions by code emission, a transformation which flattens a tree. In Section 4 we will consider code generation for a CISC-like instruction set.

2.1 Intermediate Representation
An intermediate representation (IR) is a low level, but target independent representation for programs. Constructors of IR represent basic computational operations such as adding two values, fetching a value from memory, moving a
module IR
imports Operators
signature
  sorts Exp Stm
  constructors
    BINOP : BinOp * Exp * Exp -> Exp // arithmetic operation
    CONST : Int -> Exp               // integer constant
    REG   : String -> Exp            // machine register
    TEMP  : String -> Exp            // symbolic (temporary) register
    MEM   : Exp -> Exp               // dereference memory address
    MOVE  : Exp * Exp -> Stm         // move between memory a/o registers

Fig. 1. Signature of an intermediate representation
value from one place to another, or jumping to another instruction. These operations are often more atomic than actual machine instructions, i.e., a machine instruction involves several IR constructs. This makes it possible to translate intermediate representations to different architectures. Figure 1 gives the signature of a simple intermediate representation format. We have omitted constructors for forming whole programs; for the purpose of instruction selection we are only interested in expressions and simple statements. Figure 2 shows an example IR expression tree in textual and corresponding graphical form. The expression denotes moving a value from memory at the address at offset 8 from the sum of the B and C registers to memory at offset 4 from the FP register.

2.2 Instruction Set
Code generation consists of mapping IR trees to sequences of machine instructions. The complexity of this task depends on the complexity of the instruction set of the target architecture. Reduced Instruction Set (RISC) architectures provide simple instructions that perform a single action, while Complex Instruction Set (CISC) architectures provide complex instructions that perform many actions. For example loading and storing values from and to memory is expressed MOVE MEM
MEM
BINOP PLUS
BINOP
REG
CONST
PLUS
BINOP
CONST
"FP"
4
PLUS
TEMP
TEMP
"B"
"C"
MOVE(MEM(BINOP(PLUS,REG("FP"),CONST(4))), MEM(BINOP(PLUS,BINOP(PLUS,TEMP("B"),TEMP("C")),CONST(8)))) Fig. 2. Example IR expression tree
8
module RISC
signature
  sorts Instr Reg
  constructors
    TEMP   : String -> Reg            // temporary register
    REG    : String -> Reg            // machine register
    R      : Reg -> Reg               // result register
    ADD    : Reg * Reg * Reg -> Instr // add registers
    ADDI   : Reg * Reg * Int -> Instr // add register and immediate
    LOADC  : Reg * Int -> Instr       // load constant
    LOAD   : Reg * Reg -> Instr       // load memory into register
    LOADI  : Reg * Reg * Int -> Instr // load memory into register
    MOVER  : Reg * Reg -> Instr       // move register to register
    STORE  : Reg * Reg -> Instr       // store register in memory
    STOREI : Reg * Reg * Int -> Instr // store register in memory

Fig. 3. Constructors for a subset of the instructions of a RISC machine.
by separate instructions on RISC machines, while CISC instruction sets allow many instructions to fetch their operands from memory and store their results back to memory. We will first consider code generation for the small RISC-like instruction set of Figure 3. The instructions consist of arithmetic operations working on registers and storing their result in a register (only addition for the example), load and store operations for moving values from and to memory, and an operation for loading constants and one for moving values between registers. Several instructions have an 'immediate' variant in which one of the operands is a constant. In load and store operations these immediate values indicate an offset from the address register. For example, LOADI(REG("a"), REG("b"), 8) means loading the value at offset 8 from the address in register b. The sequence of instructions in Figure 4 implements the example IR tree in Figure 2.

2.3 Selecting Instructions with Rewrite Rules
The process of instruction selection can be divided into two phases. First, determine for each tree node which instruction to use. Then linearize the tree into a sequence of instructions (code emission). The connection between machine instructions and intermediate representation can be defined in terms of rewrite rules. Figure 5 defines rules mapping IR tree patterns to RISC instructions. To translate a complete IR tree to machine instructions we now have to find an

[ADD(TEMP("b"),TEMP("B"),TEMP("C")),
 LOADI(TEMP("a"),TEMP("b"),8),
 STOREI(TEMP("a"),REG("FP"),4)]

ADD b B C
LOADI a b 8
STOREI a FP 4

Fig. 4. An instruction sequence in abstract and concrete syntax.
module IR-to-RISC-Rules
imports IR RISC
rules
  Select-ADD    : BINOP(PLUS, e1, e2) -> ADD(R(TEMP(<new>)), e1, e2)
  Select-ADDI   : BINOP(PLUS, e, CONST(i)) -> ADDI(R(TEMP(<new>)), e, i)
  Select-LOADC  : CONST(i) -> LOADC(R(TEMP(<new>)), i)
  Select-MOVER  : MOVE(r@, e) -> MOVER(r, e)
  Select-STORE  : MOVE(MEM(e1), e2) -> STORE(e2, e1)
  Select-STOREI : MOVE(MEM(BINOP(PLUS, e1, CONST(i))), e2) -> STOREI(e2, e1, i)
  Select-LOAD   : MEM(e) -> LOAD(R(TEMP(<new>)), e)
  Select-LOADI  : MEM(BINOP(PLUS, e, CONST(i))) -> LOADI(R(TEMP(<new>)), e, i)

Fig. 5. Instruction selection rules mapping IR trees to instructions.
application of rules to nodes of the tree such that each node of the tree is covered by some pattern, except for 'leaf' nodes such as REG, TEMP and constants. Figure 6 shows a possible covering for the example IR tree of Figure 2 together with the resulting instruction tree. In an expression tree, results from subexpressions are passed implicitly up in the tree. When breaking a tree into a sequence of instructions this is no longer the case; intermediate values should be stored into registers or in memory. We will assume here that they can be stored in registers. In the rewrite rules in Figure 5 passing of intermediate values is achieved by generating a new temporary register at the position of the destination register to hold the result of the instruction. For example, rule Select-ADD generates the term ADD(R(TEMP(<new>)), e1, e2). The R(_) constructor indicates that this argument contains the result of the instruction. The <new> primitive generates a new unique name.

2.4 Code Emission
After covering the IR tree with instructions, the program is still in tree shape. Code emission linearizes the tree into a list of instructions. For example, for the tiling in Figure 6 we get the code sequence of Figure 4. A specification of a code emission strategy based on the generic tree flattening strategy postordercollect is given in the technical report version of this paper.
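Since the specification itself is only given in the technical report, a rough Python approximation of such a post-order code emission may help (the term representation and helper names are ours): instructions are emitted bottom-up, and each instruction's result register R(t) stands in for its subtree in the parent instruction.

```python
def emit(node, out):
    """Post-order flattening: append child instructions first, then this
    one; the value a node passes to its parent is its result register."""
    head = node[0]
    if head in ("TEMP", "REG"):
        return node
    if head == "R":                      # result-register wrapper: unwrap
        return node[1]
    flat, result = [], None
    for arg in node[1:]:
        if isinstance(arg, tuple):
            value = emit(arg, out)
            if arg[0] == "R":
                result = value           # this instruction's destination
            flat.append(value)
        else:
            flat.append(arg)             # immediate operand
    out.append((head,) + tuple(flat))
    return result if result is not None else flat[0]

# The instruction tree of Fig. 6, with temporaries renamed a, b as in Fig. 4:
tree = ("STOREI",
        ("LOADI", ("R", ("TEMP", "a")),
         ("ADD", ("R", ("TEMP", "b")), ("TEMP", "B"), ("TEMP", "C")), 8),
        ("REG", "FP"), 4)
out = []
emit(tree, out)
for instr in out:
    print(instr)   # ADD, then LOADI, then STOREI — the order of Fig. 4
```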
M. Bravenboer and E. Visser
The tiled example IR tree is

  MOVE(MEM(BINOP(PLUS, REG("FP"), CONST(4))),
       MEM(BINOP(PLUS, BINOP(PLUS, TEMP("B"), TEMP("C")), CONST(8))))

and the corresponding instruction tree is

  STOREI(LOADI(R(TEMP("x_0")),
               ADD(R(TEMP("y_0")), TEMP("B"), TEMP("C")), 8),
         REG("FP"), 4)

Fig. 6. Optimum tiling of the example tree and the corresponding instruction tree with temporary intermediate result registers.

The nine possible tilings are

  STORE(LOAD(a,ADD(b,ADD(c,B,C),LOADC(d,8))),ADD(e,FP,LOADC(f,4)))
  STORE(LOAD(a,ADD(b,ADD(c,B,C),LOADC(d,8))),ADDI(g,FP,4))
  STORE(LOAD(a,ADDI(h,ADD(i,B,C),8)),ADD(j,FP,LOADC(k,4)))
  STORE(LOAD(a,ADDI(h,ADD(i,B,C),8)),ADDI(l,FP,4))
  STORE(LOADI(m,ADD(n,B,C),8),ADD(o,FP,LOADC(p,4)))
  STORE(LOADI(m,ADD(n,B,C),8),ADDI(q,FP,4))
  STOREI(LOAD(r,ADD(s,ADD(t,B,C),LOADC(u,8))),FP,4)
  STOREI(LOAD(r,ADDI(v,ADD(w,B,C),8)),FP,4)
  STOREI(LOADI(x,ADD(y,B,C),8),FP,4)

and three of them pretty-printed as code sequences (destination register first):

  ADD c B C; LOADC d 8; ADD b c d; LOAD a b; LOADC f 4; ADD e FP f; STORE a e
  ADD n B C; LOADI m n 8; LOADC p 4; ADD o FP p; STORE m o
  ADD y B C; LOADI x y 8; STOREI x FP 4

Fig. 7. All possible instruction selections for the example IR tree and three pretty-printed code sequences.
3 Strategies for Instruction Selection
In the previous section we have seen that rewrite rules can be used for expressing the mapping from intermediate representation trees to lists of machine instructions. However, the set of rewrite rules in Figure 5 is highly ambiguous, i.e., many rewritings are possible. For example, Figure 7 shows all 9 possible tilings for the example expression in Figure 2, together with three of the code sequences pretty-printed. This kind of ambiguity is typical of instruction selection rules. Therefore, the instruction sequence finally obtained depends on the rewriting strategy used to order the application of the selection rules. In this section we examine several strategies and their applicability to instruction selection.

3.1 Intermezzo: Rewriting Strategies
Thus far we have considered algebraic signatures and standard rewrite rules. In a normal rewriting system, a term over a signature is normalized with respect to the set of rewrite rules. In Stratego the rewriting strategy is not fixed, but can be specified in a language of strategy combinators. A strategy is a term transformation that may fail. Rewrite rules are basic strategies. Examples of basic combinators are sequential composition, non-deterministic choice, deterministic choice, and a number of generic term traversal operators. Using these combinators, a wide variety of rewriting strategies can be defined. The Stratego library provides a large number of generic strategies built from these combinators. In the rest of this paper we define several compound strategies. The strategy elements in these definitions are explained when needed. For a full account we refer to the literature; good starting points are [10, 9, 11].

3.2 Exhaustive Application
The traditional way to interpret a set of rewrite rules is by applying them exhaustively to a subject term. The following strategy takes this approach:

  innermost-tilings =
    innermost(Select-ADD + Select-ADDI + Select-STORE + Select-LOAD
              + Select-STOREI + Select-LOADC + Select-LOADI + Select-MOVER)
The local non-deterministic choice operator (s1 + s2) combines two strategies s1 and s2 by trying either of them. If that fails, the other is tried. Once one of the branches has succeeded the choice is committed, i.e., there is no backtracking to this choice point if the continuation fails. The innermost strategy takes a transformation, e.g., a choice of rules, and applies it exhaustively to a subject term, starting with the inner terms. This is expressed generically as

  innermost(s) = all(innermost(s)); try(s; innermost(s))
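As a toy model of this definition, the following Python sketch encodes terms as tuples and a strategy as a function that returns the rewritten term or None on failure. This is our own encoding for illustration, not Stratego's implementation.

```python
def innermost(s, t):
    """innermost(s) = all(innermost(s)); try(s; innermost(s))"""
    if isinstance(t, tuple):                       # all(innermost(s)): normalize subterms
        t = (t[0],) + tuple(innermost(s, a) for a in t[1:])
    r = s(t)                                       # try(s; ...): apply s if possible
    return innermost(s, r) if r is not None else t

# Example rule (ours): BINOP(PLUS, CONST(i), CONST(j)) -> CONST(i+j)
def fold_plus(t):
    if (isinstance(t, tuple) and t[0] == "BINOP" and t[1] == "PLUS"
            and t[2][0] == "CONST" and t[3][0] == "CONST"):
        return ("CONST", t[2][1] + t[3][1])
    return None   # rule does not apply: the strategy fails

term = ("BINOP", "PLUS", ("CONST", 1), ("BINOP", "PLUS", ("CONST", 2), ("CONST", 3)))
print(innermost(fold_plus, term))   # ('CONST', 6)
```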
The generic traversal combinator all(s) applies the transformation s to each direct subterm of a constructor application. Thus, <all(s)> C(t1,...,tn) denotes C(<s>t1,...,<s>tn). The first part of the strategy normalizes all direct subterms of a node. After that (; is sequential composition), the strategy tries to apply the transformation s. If that succeeds, the result, i.e., the right-hand side of the applied rule, is again normalized. The combinator try(s) is defined as try(s) = (s <+ id). That is, try to apply s, but if that fails do nothing, i.e., apply the identity strategy id. Thus, if s fails the subject term is in normal form. The innermost strategy is not adequate for instruction selection, however. It produces the worst result, i.e., the first result in Figure 7. This is caused by the fact that reducible subexpressions of a pattern are reduced before the (more complex) pattern itself can be applied. Furthermore, exhaustive application is overkill: instruction selection rules do not need it, since each node needs to be rewritten at most once.

3.3 All Rewritings
Another approach is to generate all possible results. This is achieved by the following strategy:

  all-tilings =
    bagof(topdown(try(Select-ADD ++ Select-ADDI ++ Select-STORE
                      ++ Select-STOREI ++ Select-LOADC ++ Select-LOAD
                      ++ Select-LOADI ++ Select-MOVER)))
In contrast to the + operator used above, the global non-deterministic choice operator (s1 ++ s2) is a non-committing choice: after one of the branches has successfully terminated, if the continuation fails, the strategy backtracks to the other branch. Formally, (s1 ++ s2); s3 is equal to (s1; s3) ++ (s2; s3), which does not hold for +. This property is used by the bagof(s) operator to produce the list of all possible results by forcing a backtrack to the last choice point. The topdown strategy, defined as

  topdown(s) = s; all(topdown(s))

traverses a tree in pre-order, applying a transformation s to a node before transforming its subterms using the generic traversal operator all. The list of all results in Figure 7 was produced using this strategy. After generating the list of all possible results we could define a filter that selects the best solution. As a specification this is interesting; as an implementation it is not feasible, due to the combinatorial explosion: for each node, all possible rewrites of its subnodes need to be considered.
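The enumeration behavior of bagof can be imitated with Python generators (a sketch under our own term encoding; the two toy rules are ours, not the selection rules): every node contributes its identity plus each applicable rewrite, and choices at independent subterms multiply, which is exactly the combinatorial explosion just mentioned.

```python
from itertools import product

def topdown_all(rules, t):
    """Yield every result of: at each node, optionally apply one rule,
    then recurse into the (possibly rewritten) node's subterms."""
    choices = [t] + [r(t) for r in rules if r(t) is not None]
    for c in choices:
        if isinstance(c, tuple):
            # all results for each subterm, combined in every way
            for args in product(*(topdown_all(rules, a) for a in c[1:])):
                yield (c[0],) + args
        else:
            yield c

# Two overlapping toy rules on PLUS nodes:
swap = lambda t: ("PLUS", t[2], t[1]) if isinstance(t, tuple) and t[0] == "PLUS" else None
zero = lambda t: t[1] if isinstance(t, tuple) and t[0] == "PLUS" and t[2] == 0 else None

results = set(topdown_all([swap, zero], ("PLUS", "a", 0)))
print(len(results))   # 3 distinct results: the term itself, the swap, and 'a'
```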
3.4 Maximal Munch
If each rule corresponds to a machine instruction, the best solution is usually the shortest sequence of instructions. In other words, we want to select maximal tilings of the tree. The following strategy takes this approach:
  maximal-munch =
    topdown(try((Select-ADDI <+ Select-ADD) + Select-LOADC
                + (Select-STOREI <+ Select-STORE)
                + (Select-LOADI <+ Select-LOAD) + Select-MOVER))
Starting at the root of the expression tree, the strategy tries to apply one of the selection rules. Instead of combining all rules with the non-deterministic choice operator +, some pairs of rules are combined using the deterministic choice operator (s1 <+ s2), which always tries its left argument first. In this way the maximal-munch strategy gives priority to patterns that take the largest bite, e.g., preferring Select-ADDI over Select-ADD. In fact, this simple strategy, combining pre-order top-down traversal with rule priority, is adequate for simple RISC-like instruction sets.
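A hand-rolled Python sketch of the same idea (our tuple encoding; only the two MEM rules are shown, and a fresh-name generator stands in for <new>): at each node, try the larger pattern before the smaller one, then descend.

```python
import itertools

fresh = (f"t_{i}" for i in itertools.count())   # stands in for <new>

def select(t):
    if not isinstance(t, tuple):
        return t
    # larger tile first: MEM(BINOP(PLUS, e, CONST(i))) -> LOADI(R(TEMP(new)), e, i)
    if t[0] == "MEM" and isinstance(t[1], tuple) and t[1][0] == "BINOP" \
            and t[1][1] == "PLUS" and isinstance(t[1][3], tuple) \
            and t[1][3][0] == "CONST":
        t = ("LOADI", ("R", ("TEMP", next(fresh))), t[1][2], t[1][3][1])
    elif t[0] == "MEM":                 # smaller tile: MEM(e) -> LOAD(R(TEMP(new)), e)
        t = ("LOAD", ("R", ("TEMP", next(fresh))), t[1])
    return (t[0],) + tuple(select(a) for a in t[1:])   # descend, as topdown does

ir = ("MEM", ("BINOP", "PLUS", ("TEMP", "B"), ("CONST", 8)))
print(select(ir))   # ('LOADI', ('R', ('TEMP', 't_0')), ('TEMP', 'B'), 8)
```

Because LOADI is tried before LOAD, the addressing computation is folded into a single instruction rather than being selected node by node.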
3.5 Pattern Matching
As can be seen from the examples above, programmable strategies allow rewrite rules to be combined into different strategies. This flexibility in combining rules does not preclude an efficient implementation of pattern matching. When rules are combined in a choice, as in the examples above, they are not tried one by one. Instead, a pattern-match optimization compiles the rule selection into a matching automaton that folds left-hand sides with the same root constructor into a single condition (and recursively for subpatterns).
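A minimal illustration of this folding, in Python rather than in the generated matching automaton (rule names are those of Figure 5; the guard encoding is ours): grouping left-hand sides by root constructor turns rule selection into a single dictionary lookup, after which only the subpattern conditions are checked.

```python
# Rules indexed by the root constructor of their left-hand side.
RULES = {
    "BINOP": [("Select-ADDI", lambda t: t[1] == "PLUS" and t[3][0] == "CONST"),
              ("Select-ADD",  lambda t: t[1] == "PLUS")],
    "CONST": [("Select-LOADC", lambda t: True)],
}

def candidate_rules(t):
    # one lookup replaces trying every rule in sequence
    return [name for name, guard in RULES.get(t[0], []) if guard(t)]

term = ("BINOP", "PLUS", ("TEMP", "a"), ("CONST", 1))
print(candidate_rules(term))   # ['Select-ADDI', 'Select-ADD']
```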
4 Complex Instruction Sets
In RISC instruction sets, complex memory addressing modes are restricted to load and store operations. In complex instruction set (CISC) machines, many instructions can use the full range of addressing modes to read their operands directly from memory, using base and index registers and constant offsets from these. Some machines also have heterogeneous register sets, such that not all instructions can work with all registers, requiring values to be moved between registers. For such machines maximal munch is not adequate. Code generators in production compilers use a dynamic programming approach to compute the code sequence with the least cost.

  module CISC
  signature
    sorts Reg Addr Instr
    constructors
      TEMP  : String -> Reg
      REG   : String -> Reg
      NONE  : Reg
      R     : Reg -> Reg
      IMM   : Int -> Addr
      REGA  : Reg -> Addr
      ADDR  : Reg * Reg * Int * Int -> Addr
      ADD   : Addr * Addr * Addr -> Instr
      MOVEM : Addr * Addr -> Instr

Fig. 8. Constructors for a subset of the instructions of a CISC machine

  MOVEM 4[FP] 8[B,C]    # C1=1 C2=1 C3=0 C4=0 C5=0 C6=0 C7=0  cost=1

  ADD r B C             # C1=1 C2=1 C3=0 C4=0 C5=0 C6=0 C7=2  cost=2
  MOVEM 4[FP] 8[r]

  ADD n FP $4           # C1=1 C2=1 C3=1 C4=1 C5=3 C6=3 C7=4  cost=6
  ADD e B C
  ADD l e $8
  MOVEM [n] [l]

Fig. 9. Lowest cost selections based on different cost assignments

To illustrate rewriting with dynamic programming we define the small CISC-like instruction set in Figure 8. In contrast to the RISC instruction set, there are no special load and store operations. Instead each instruction can use all addressing modes and thus load and store values to memory. The ADDR(r1,r2,scale,off) addressing mode indicates a value in memory at address r1 + (r2*scale) + off, where r1 is a base register, r2 an index register, and scale and off constants. Figure 10 defines instruction selection rules for this instruction set from the intermediate representation of Figure 1. Since each instruction can use all addressing modes, it is not desirable to define a separate rule for each combination of instruction and addressing mode. Instead, addressing modes are selected separately. To ensure that tiles connect, each rule indicates the mode of its tile and the expected modes of the leaves of the tile using the constructor l(mode, tile). Again, this rule set is ambiguous. To indicate which rewrite should be used, costs are assigned to rewrite rules, corresponding to the operational cost of the selected instruction. Figure 11 assigns symbolic costs C1 to C7 to several of the patterns of Figure 10. The goal now becomes to select the tree with the lowest overall cost. Figure 9 shows the lowest cost selections for several different assignments to these symbolic costs. This illustrates that instruction selection can no longer be achieved by a fixed strategy.
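The effect shown in Figure 9 can be reproduced with a back-of-the-envelope computation. In the following Python sketch each candidate selection is represented by the bag of symbolic costs it incurs (derived by hand from Figures 9-11; this is our illustration, not the paper's strategy); which candidate is cheapest changes with the cost assignment.

```python
# Each candidate code sequence, as the multiset of symbolic costs it uses:
# C1 per ADD, C2 per MOVEM, C4 per base-only address, C6 per base+offset,
# C7 per base+index+offset.
CANDIDATES = {
    "MOVEM 4[FP] 8[B,C]":          ["C2", "C6", "C7"],
    "ADD r B C; MOVEM 4[FP] 8[r]": ["C1", "C2", "C6", "C6"],
    "three ADDs; MOVEM [n] [l]":   ["C1", "C1", "C1", "C2", "C4", "C4"],
}

def best(assignment):
    cost = lambda name: sum(assignment[c] for c in CANDIDATES[name])
    return min(CANDIDATES, key=cost)

zeroed = dict(C1=1, C2=1, C3=0, C4=0, C5=0, C6=0, C7=0)
print(best(zeroed))                                           # 'MOVEM 4[FP] 8[B,C]'
print(best({**zeroed, "C7": 2}))                              # 'ADD r B C; MOVEM 4[FP] 8[r]'
print(best(dict(C1=1, C2=1, C3=1, C4=1, C5=3, C6=3, C7=4)))   # 'three ADDs; MOVEM [n] [l]'
```

The three assignments reproduce the three winners of Figure 9 (costs 1, 2, and 6, respectively).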
5 Dynamic Programming
Maximal munch does not work, since chain rules such as Select-LOAD can be applied indefinitely. A straightforward solution would be to generate all solutions, compute their costs, and take the cheapest one. However, in Section 3 we already saw that this leads to a combinatorial explosion in which subresults are computed many times. In the case of the rules of Figure 10 the situation is worse, since rules Select-REGA and Select-LOAD together lead to infinitely many possible rewrites. The dynamic programming approach of [2, 1] is similar to the all-tilings strategy of Section 3, but instead of computing all possible rewrites in sequence, we first compute all possible matches in one bottom-up pass over the tree, tabulating the cheapest match for each node. This information can then be used to
  module IR-to-CISC-Rules
  imports CISC IR Dynamic-Programming
  signature
    constructors
      a   : Mode
      reg : Mode
      imm : Mode
      s   : Mode
  rules
    Select-MOVEM : MOVE(e1, e2) -> l(s, MOVEM(l(a, e1), l(a, e2)))
    Select-ADDM  : MOVE(e1, BINOP(PLUS, e2, e3)) ->
                   l(s, ADD(l(a, e1), l(a, e2), l(a, e3)))
    Select-ADD   : BINOP(PLUS, e1, e2) ->
                   l(reg, ADD(R(REGA(TEMP(<new>))), l(a, e1), l(a, e2)))
    Select-imm   : CONST(i) -> l(imm, i)
    Select-reg   : REG(r) -> l(reg, REG(r))
    Select-reg'  : TEMP(r) -> l(reg, TEMP(r))
    Select-ABS   : MEM(i) -> l(a, ADDR(NONE, NONE, 1, l(imm, i)))
    Select-BASE  : MEM(r) -> l(a, ADDR(l(reg, r), NONE, 1, 0))
    Select-BASEIMM : MEM(BINOP(PLUS, e1, e2)) ->
                     l(a, ADDR(l(reg, e1), NONE, 1, l(imm, e2)))
    Select-BASEINDEX : MEM(BINOP(PLUS, e1, e2)) ->
                       l(a, ADDR(l(reg, e1), l(reg, e2), 1, 0))
    Select-BASEINDEXIMM : MEM(BINOP(PLUS, BINOP(PLUS, e1, e2), i)) ->
                          l(a, ADDR(l(reg, e1), l(reg, e2), 1, l(imm, i)))
    Select-IMM   : r -> l(a, IMM(l(imm, r)))
    Select-REGA  : r -> l(a, REGA(l(reg, r)))
    Select-LOAD  : x -> l(reg, MOVEM(R(REGA(TEMP(<new>))), l(a, x)))

Fig. 10. Instruction selection rules mapping IR trees to CISC instructions.

  module CISC-Costs
  imports CISC
  rules
    CiscCost : ADD(x, y, z)        -> C1
    CiscCost : MOVEM(x, y)         -> C2
    CiscCost : ADDR(NONE, y, 1, 0) -> C3 where <not(NONE)> y
    CiscCost : ADDR(x, NONE, 1, 0) -> C4 where <not(NONE)> x
    CiscCost : ADDR(NONE, y, 1, i) -> C5 where <not(NONE)> y; <not(0)> i
    CiscCost : ADDR(x, NONE, 1, i) -> C6 where <not(NONE)> x; <not(0)> i
    CiscCost : ADDR(x, y, 1, i)    -> C7 where <not(NONE)> x; <not(NONE)> y

Fig. 11. Instruction costs
perform the optimal rewrite sequence. Figure 13 presents the specification of a generic dynamic programming strategy. Figure 12 shows an instantiation of this strategy with the rules and costs of Figures 10 and 11, and a default cost of 0.

5.1 Optimum Tiling
The dynamic programming strategy is implemented by optimum-tiling, which is parameterized with a cost computation cost, a set of normal rules rs, and a set of chain rules is. The strategy consists of a bottom-up traversal followed by a top-down traversal. During the bottom-up traversal, the strategy match-tile finds out which rules are applicable and which one is the cheapest. This is expressed by defining a dynamic rewrite rule BestTile for each subtree t, which rewrites t tagged with a mode m, i.e., l(m,t), to the corresponding instruction tree. These dynamic rules are then applied in a top-down traversal over the tree. In other words, this corresponds to maximal-munch where the rule to be used is determined dynamically instead of statically and can be different at each node.

5.2 Match Tile
The strategy match-tile first creates a pair of the subject term and the result of applying the rules to the subject term. If one of the rules succeeds, the match is registered by register-match. Then fail forces backtracking to another rule, until no more rules match. After applying the normal rules, the closure of the chain rules is computed by repeating the same procedure until no more chain rules apply. The repetition is necessary since one chain rule application can enable another one. This process terminates since register-match only succeeds if the cost of the match is lower than the cost of all previous matches.

5.3 Register Match
Register-match is the core of the dynamic programming strategy. It gets a pair of a subject tree and the result of applying a rule to it, which should be a tile tagged with a mode. ComputeCost returns the cumulative cost of the tile and the costs of the trees at the leaves of the tile. If this cost is less than the BestCost so far for the tree in this mode, the match is registered by generating new dynamic rules BestCost and BestTile, rewriting the l(mode, tree) term to the computed cost and to the given tile, respectively. This overrides any previously generated rules for l(mode, tree). Thus, the best result per mode is computed for each node in a bottom-up fashion; suboptimal results are discarded. Dynamic rewrite rules [9] are just like ordinary rules, except that they inherit the values of any of their variables that are bound in their definition context. Thus the definition of the rule BestCost inherits from its context the values of the variables mode, tree, and cost. A rule specific to these values is generated, which may coexist with other BestCost rules, except when the values of mode and tree are the same; in that case the old rule is overridden.
  module IR-to-CISC
  imports lib IR-to-CISC-Rules CISC-Costs Dynamic-Programming
  strategies
    cisc-select =
      optimum-tiling(CiscCost <+ !0, cisc-tiles, cisc-injections, !s)
    cisc-tiles =
      Select-ADDM ++ Select-MOVEM ++ Select-ADD ++ Select-imm
      ++ Select-reg ++ Select-reg' ++ Select-ABS ++ Select-BASE
      ++ Select-BASEINDEX ++ Select-BASEIMM ++ Select-BASEINDEXIMM
    cisc-injections =
      Select-LOAD + Select-IMM + Select-REGA

Fig. 12. Instantiation of the dynamic programming algorithm

  module Dynamic-Programming
  imports lib dynamic-rules
  signature
    constructors
      l : Mode * a -> a
  strategies
    optimum-tiling(cost, rs, is, rootmode) =
      {| BestTile, BestCost
       : bottomup(match-tile(cost, rs, is))
       ; !l(<rootmode>, <id>)
       ; topdown(try(BestTile))
       |}

    match-tile(cost, rs, is) =
      try(test(!(<id>, <rs>); register-match(cost); fail));
      repeat(test(!(<id>, <is>); register-match(cost)))

    register-match(cost) =
      ?(tree, l(mode, tile));
      where( <ComputeCost(cost)> tile => cost
           ; (cost, l(mode, tree))
           ; rules( BestCost : l(mode, tree) -> cost
                    BestTile : l(mode, tree) -> tile ) )

    ComputeCost(cost) =
      where(cost => tile-cost);
      collect(?l(_,_)); foldr(!tile-cost, add, BestCost)

Fig. 13. Dynamic programming algorithm for computing an optimum tiling.
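The tabulation underlying Figure 13 can be mimicked in Python. The sketch below illustrates the idea only: a toy rule set with two modes (reg and imm), unit costs, and a single chain rule loading an immediate into a register; it is not a transcription of the CISC rules.

```python
INF = float("inf")

def best(t, mode):
    """Return (cost, tile): cheapest rewrite of t in the given mode,
    computed in one bottom-up pass over the tree (the BestCost/BestTile table)."""
    table = {}

    def fill(t):
        if isinstance(t, tuple):
            for a in t[1:]:
                fill(a)                                          # subterms first
        cand = {"reg": (INF, None), "imm": (INF, None)}
        if isinstance(t, str):                                   # a register name
            cand["reg"] = (0, ("REG", t))
        if isinstance(t, tuple) and t[0] == "CONST":
            cand["imm"] = (0, ("IMM", t[1]))                     # constant as immediate
        if isinstance(t, tuple) and t[0] == "PLUS":
            (c1, t1), (c2, t2) = table[t[1]]["reg"], table[t[2]]["reg"]
            cand["reg"] = (1 + c1 + c2, ("ADD", t1, t2))         # ADD: cost 1
            ci, ti = table[t[2]]["imm"]
            if 1 + c1 + ci < cand["reg"][0]:
                cand["reg"] = (1 + c1 + ci, ("ADDI", t1, ti))    # ADDI: cost 1
        # chain rule: an immediate can always be loaded into a register
        c, tile = cand["imm"]
        if c + 1 < cand["reg"][0]:
            cand["reg"] = (c + 1, ("LOADC", tile))               # LOADC: cost 1
        table[t] = cand                                          # keep only the best per mode
    fill(t)
    return table[t][mode]

term = ("PLUS", "a", ("CONST", 5))
print(best(term, "reg"))   # (1, ('ADDI', ('REG', 'a'), ('IMM', 5)))
```

Because only the cheapest (cost, tile) per mode survives at each node, the pass is linear in the tree size; there is no enumeration of complete tilings.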
5.4 Compute Cost
The cost of a match is determined by adding the cost of the tile itself, as provided by the parameter strategy cost, and the BestCosts of the leaves of the tile. The leaves are obtained by collecting all outermost subterms matching l(_,_) from the tile. For example, the tile ADDR(l(reg,e1), NONE, 1, l(imm,e2)) has cost C6 and leaves [l(reg,e1), l(imm,e2)]. Thus the fold over this list produces <add> (<BestCost> l(reg,e1), <add> (<BestCost> l(imm,e2), C6)). If BestCost fails for some leaf of a tile, this indicates that no tile was found for the corresponding tree with the mode needed by this tile.
6 Discussion
The instruction selection techniques in this paper were developed as part of a project building a complete Tiger compiler [3] using rewriting techniques. The generic strategies are part of the Stratego Standard Library, which contains many more generic strategies. The examples in this paper are written in Stratego and have been tested with release 0.7 of the Stratego Compiler¹. The maximal munch algorithm was directly inspired by [3]. However, our separation of selection rules and code emission makes the rules independent of the strategy to be used. The dynamic programming algorithm is based on the ideas of [2, 1]. These ideas have been used in several systems, including TWIG [1], BEG [5], and BURG [8]. The main differences between these systems are the pattern matching algorithm used and the amount of computation done at generator generation time. BURG [8] is based on bottom-up rewriting (BURS) theory and performs all cost computations at generator generation time, whereas our implementation does all cost computations at code generation time. Furthermore, BURG uses bottom-up pattern matching [6], an efficient matching algorithm that does not reexamine subterms. This is achieved by normalizing patterns to shallow patterns and introducing new modes. We have not yet performed comparative benchmarks, but our implementation will undoubtedly be much slower than the highly optimized BURG code generators. For a list of IR trees of 9200 nodes, the maximal-munch strategy took 0.03 seconds to produce a (RISC) instruction tree of 11,100 nodes (2100 instructions), while the optimum-tiling dynamic programming strategy took 1.96 seconds to produce a (CISC) instruction tree of 15,000 nodes (1000 instructions). We expect to improve the performance considerably by applying specialization and fusion in the style of [7], where an optimization for specialization of the generic innermost strategy is presented. A possible further optimization is the normalization of rule left-hand sides in order to achieve better pattern matching.
7 Conclusion
In this paper we have shown the complete code of several instruction selection algorithms. Creating a complete code generator only requires more rules, not more implementation. This is clear evidence of the expressivity of the specification of program transformations with rewrite rules and rewriting strategies. Instead of hardwiring the strategy in the rewrite engine, the programmer can define the most appropriate strategy for the task at hand. For example, one could start with a simple maximal-munch strategy for a code generator and only switch to dynamic programming when it is really needed. Furthermore, programmable strategies enable the combination of several transformations, each using different strategies. Much of the expressivity of strategies is due to the ability to capture generic schemas with generic traversal operators. Finally, the incorporation of dynamic programming techniques provides new opportunities for the specification of other kinds of transformations, and poses new challenges for the optimization of strategies. In particular, it would be interesting to generalize tabulation techniques from code generators to more general transformation strategies.

¹ http://www.stratego-language.org
References

1. A. V. Aho, M. Ganapathi, and S. W. K. Tjiang. Code generation using tree pattern matching and dynamic programming. ACM Transactions on Programming Languages and Systems, 11(4):491-516, October 1989.
2. A. V. Aho and S. C. Johnson. Optimal code generation for expression trees. Journal of the ACM, 23(3):488-501, 1976.
3. A. W. Appel. Modern Compiler Implementation in ML. Cambridge University Press, 1998.
4. A. van Deursen, J. Heering, and P. Klint, editors. Language Prototyping. An Algebraic Specification Approach, volume 5 of AMAST Series in Computing. World Scientific, Singapore, September 1996.
5. H. Emmelmann, F.-W. Schroer, and R. Landwehr. BEG - a generator for efficient back ends. In ACM SIGPLAN 1989 Conference on Programming Language Design and Implementation (PLDI'89), pages 227-237. ACM, July 1989.
6. C. M. Hoffmann and M. J. O'Donnell. Pattern matching in trees. Journal of the ACM, 29(1):68-95, January 1982.
7. P. Johann and E. Visser. Fusing logic and control with local transformations: An example optimization. In B. Gramlich and S. Lucas, editors, Workshop on Reduction Strategies in Rewriting and Programming (WRS'01), volume 57 of Electronic Notes in Theoretical Computer Science, Utrecht, The Netherlands, May 2001. Elsevier Science Publishers.
8. T. A. Proebsting. BURS automata generation. ACM Transactions on Programming Languages and Systems, 17(3):461-486, May 1995.
9. E. Visser. Scoped dynamic rewrite rules. In M. van den Brand and R. Verma, editors, Rule Based Programming (RULE'01), volume 59/4 of Electronic Notes in Theoretical Computer Science. Elsevier Science Publishers, September 2001.
10. E. Visser. Stratego: A language for program transformation based on rewriting strategies. System description of Stratego 0.5. In A. Middeldorp, editor, Rewriting Techniques and Applications (RTA'01), volume 2051 of Lecture Notes in Computer Science, pages 357-361. Springer-Verlag, May 2001.
11. E. Visser, Z.-e.-A. Benaissa, and A. Tolmach. Building program optimizers with rewriting strategies. In Proceedings of the Third ACM SIGPLAN International Conference on Functional Programming (ICFP'98), pages 13-26. ACM Press, September 1998.
Probabilistic Rewrite Strategies. Applications to ELAN

Olivier Bournez and Claude Kirchner

LORIA & INRIA, 615 rue du Jardin Botanique, BP 101,
54602 Villers-lès-Nancy Cedex, Nancy, France
{Olivier.Bournez,Claude.Kirchner}@loria.fr
Abstract. Recently, rule-based languages have focused on the use of rewriting as a modeling tool, which results in making specifications executable. To extend the modeling capabilities of rule-based languages, we explore the possibility of making rule applications subject to probabilistic choices. We propose an extension of the ELAN strategy language to deal with randomized systems. We argue through several examples that we indeed propose a natural setting to model systems with randomized choices. This leads us to interesting new problems, and we address the generalization of the usual concepts of abstract reduction systems to randomized systems.
1 Introduction
Term rewriting has been developed over the last thirty years, leading to a deep and solid corpus of knowledge about the rewrite relation induced by a set of rewrite rules. More recently, rule-based languages have focused on the use of rewriting as a modeling tool, which results in making the resulting specification executable in a very efficient way [11]. Such languages highlight the fundamental role of rewrite strategies, both for computation and for deduction. In the ELAN language, the notions of rule and strategy are both first-class, and this is backed up by the concept of the rewriting calculus [5]. In this framework, which generalizes the lambda-calculus, the basic notions are those of rewrite rule and of rule application. To extend the modeling capabilities of the calculus, we explore in this work the possibility of making rule application subject to probabilistic choices. Since the probabilistic firing of a rewrite rule can be seen as a specific kind of strategy, we introduce in the first part of this work a new notion of strategy. From a practical point of view, we added to ELAN the notion of a probabilistic choice strategy, permitting us to fire a given set of rules under some probability. It is in particular a fundamental design choice to manage probabilities at the level of strategies instead of at the level of the rules themselves, where the flexibility and the semantics would have been less clear. This leads us to interesting new capabilities and problems that we address in this paper.

S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 252-266, 2002. © Springer-Verlag Berlin Heidelberg 2002
The combination of the concept of explicit rule application with that of probabilistic choice has many applications: probabilistic databases [21], probabilistic agents [6], genetic algorithms [9], randomized algorithms [15], randomized proof procedures [14], etc. The notion of probabilistic strategy is therefore quite useful as a modeling tool. What we address in the second part of this paper is a first step in the understanding of such strategies in the rewriting context. Indeed, we focus on the study of extensions of abstract reduction systems under probabilistic choice. For example, we can define a notion of almost-sure termination as the property that the set of infinite derivations of an abstract reduction system has null measure. A typical situation is the rewrite system a → a, a → b, where the rewrite strategy consists in applying each rule with probability 0.5. This system is terminating with probability one, but clearly not terminating. Similarly, probabilistic notions of confluence and of other usual notions of classical abstract reduction systems can be defined. The design of tools to check these properties at the level of a rewrite relation is a challenge and will strengthen the use of the concept as a specification tool. This paper aims at laying a first stone in these directions. We presented first ideas about such an approach to controlling rule firing, together with a first prototype, in [1]. A different point of view has been developed recently to extend the Constraint Handling Rules framework with probabilistic capabilities put on the rewrite rules themselves [8,16,18,17,19]. These are, to the best of our knowledge, the only attempts to formalize probabilistic transitions using rule-based languages. In Section 2 we introduce, through some simple examples, the strategy operator that we added to the ELAN strategy language to deal with probabilistic systems.
In Section 3, we argue through several examples that this is indeed a natural setting for specifying systems with probabilistic choices. In Section 4, we discuss the operational semantics of the proposed strategy operator through a discussion of the way this operator is implemented in the ELAN prototype. This leads us to important questions about the generalization of usual rewriting concepts such as confluence and termination, which we discuss for probabilistic abstract reduction systems in Section 5.
2 A General Probabilistic Rewrite Strategy Operator
Let us first consider the simple example of euro coin flipping¹. It could be modeled using the following rules:

  [h] x => head
  [t] x => tail

¹ See http://www.inference.phy.cam.ac.uk/mackay/abstracts/euro.html for a summary of the recent attention devoted to the potential bias of the euro coin (thanks to one of the referees).
O. Bournez and C. Kirchner
where h and t are the names, also called labels, of the rules, head and tail are just constant function symbols, and x is a variable. In order to express the equiprobability of the two sides of a euro coin, we write

  equi => PC(h:0.5, t:0.5)

which means that the two rules fire with the same probability, and that the application of the strategy equi to flip (denoted (equi flip)) reduces either to tail or to head with the same probability 0.5. If euro coins were biased, we would write something like:

  bias => PC(h:0.4, t:0.6)

Then, we would have with probability 0.4 the derivation

  (bias flip) => (PC(h:0.4, t:0.6) flip) => (h flip) => head

and, with probability 0.6, the derivation

  (bias flip) => (PC(h:0.4, t:0.6) flip) => (t flip) => tail

But a strategy operator that makes choices with fixed probabilities is not sufficient to model all games, since probabilities often also depend on parameters. Suppose, for example, that in addition to bias problems there were no harmonization between the countries of the euro zone. Then the bias would depend on the country, and we would need a function probaH from countries to numbers in the interval [0, 1] to represent the bias. It is natural to model this latter function as a strategy itself, mapping (normalizing) terms representing countries to real numbers in the interval [0, 1]. This could be done in ELAN using something like:

  [probaH] france  => 0.49
  [probaH] belgium => 0.50
  ...

The bias operator would then simply be described by:

  bias => PC(h:probaH, t:1 - probaH)

and we would write (bias luxembourg) to simulate Luxembourg euro flipping. Hence, we actually introduce the following strategy operator: for strategies p1, ..., pn mapping terms (whose sort is unimportant here) to terms of the sort of real numbers such that on any term t, Σi (pi t) = 1, and for strategies s1, ..., sn whose application is subject to this probability law, PC(s1:p1, ..., sn:pn) is the strategy that consists, when applied to some term t, in choosing an i ∈ {1, ..., n} with probability (pi t), and then returning (si t), the application of strategy si to the term t. Before formally describing this strategy combinator, let us play with more examples.
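An executable reading of this operator can be sketched in Python (our encoding, not ELAN's): a strategy is a function on terms, each pi maps a term to a probability, and PC draws a branch according to the cumulative weights.

```python
import random

def PC(*branches):
    """PC((s1, p1), ..., (sn, pn)): on term t, choose branch i with
    probability (pi t), then return (si t)."""
    def apply(t, rng=random):
        weights = [p(t) for _, p in branches]
        assert abs(sum(weights) - 1.0) < 1e-9, "probabilities must sum to 1"
        x, acc = rng.random(), 0.0
        for (s, _), w in zip(branches, weights):
            acc += w
            if x < acc:
                return s(t)
        return branches[-1][0](t)   # guard against floating-point rounding
    return apply

h  = (lambda t: "head", lambda t: 0.4)
tl = (lambda t: "tail", lambda t: 0.6)
bias = PC(h, tl)

print(bias("flip"))   # 'head' with probability 0.4, 'tail' with probability 0.6
```

Because the pi are themselves functions of the term, the same mechanism covers both fixed probabilities (the euro coin) and run-time varying ones (the probaH example).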
3 Examples
We now develop several examples to argue that the PC operator is indeed a natural setting for modeling probabilistic systems. The first example requires probabilistic choices with fixed probabilities. The second example requires probabilistic choices with run-time varying probabilities. As a third example, we take a simple algorithm from the book [15] to exemplify how easy it is to transform any randomized algorithm into an ELAN program using the previous operator.

Example 1. Suppose we want to model the following game between two players. Initially, the two players each have M euros (M ≥ 2). An unbiased coin is flipped. If it falls on head, player 1 wins one euro from player 2; otherwise player 2 wins two euros from player 1. The game stops when one of the two players is ruined. This can be modeled as follows:

  [h] game(M1,M2) => game(M1+1,M2-1) if M2>0
  [h] game(M1,0)  => player-1-wins
  [t] game(M1,M2) => game(M1-2,M2+2) if M1>1
  [t] game(0,M2)  => player-2-wins
  [t] game(1,M2)  => player-2-wins
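These rules can be transcribed into a small Monte-Carlo simulation (our own Python transcription, with the unbiased coin inlined as a 0.5/0.5 choice). Every run reaches a normal form, illustrating the almost-sure termination discussed below.

```python
import random

def game(m1, m2, rng):
    while True:
        if rng.random() < 0.5:          # rule [h]: player 1 wins one euro
            m1, m2 = m1 + 1, m2 - 1
            if m2 == 0:
                return "player-1-wins"
        else:                           # rule [t]: player 2 wins two euros
            if m1 <= 1:
                return "player-2-wins"
            m1, m2 = m1 - 2, m2 + 2

rng = random.Random(42)
outcomes = [game(4, 4, rng) for _ in range(10000)]
print(outcomes.count("player-1-wins") / len(outcomes))  # every run terminates
```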
The strategy modeling the repetition of coin flipping is:

play => repeat(PC(h:0.5, t:0.5))

Considering the derivations of (play game(M,M)), we get a model of our game. This is clearly not a terminating system, since there are infinite derivations starting from (play game(M,M)). However, according to probability theory, the probability of such an infinite derivation is null: with probability 1, a derivation terminates. So we can say, as we will develop in Section 5, that this system is almost-surely terminating (but not terminating).

Of course we must also have a ball-picking example:

Example 2. Consider an urn with N green balls and two yellow balls. We pick a ball until we get a yellow one. We lose if we pick all the green balls.

[pickgreen]  urn(N) => urn(N-1) if N>1
[pickgreen]  urn(1) => loose
[pickyellow] urn(N) => win
Transition probabilities now depend on time, i.e. on terms. If balls are picked equiprobably, then the probability of picking a green ball should be N/(N + 2). So we write:
O. Bournez and C. Kirchner
play => repeat(PC(pickgreen: p, pickyellow: 1-p))

where p is itself a strategy:

[p] urn(N) => N / (N+2)

We could add many other examples to argue that this setting is quite natural for prototyping probabilistic games, random walks, or randomized algorithms. To argue that randomized algorithms can easily be turned into an ELAN program, we take the first example of the book [15]. This is a sorting algorithm with expected time O(2nHn) = O(n log n), where Hn is the nth Harmonic number [15].

Example 3. We first need to choose an element uniformly in a list L. This is obtained as the application of the probabilistic strategy any to L with:

[thisone]   y.L => y
[otherone]  y.L => L
[pthisone]  L => 1/length(L)
[potherone] L => (length(L)-1)/length(L)
any => repeat*(PC(thisone: pthisone, otherone: potherone))

And now, we sort as follows:

[] sort(L)   => sort(subset-lower(L,y)).y.sort(subset-greater(L,y))
                where y := (any L)
[] sort(nil) => nil

[] subset-lower(z.L,y) => z.subset-lower(L,y) if z < y
[] subset-lower(z.L,y) => subset-lower(L,y)   if z >= y
[] subset-lower(nil,y) => nil

[] subset-greater(z.L,y) => z.subset-greater(L,y) if z > y
[] subset-greater(z.L,y) => subset-greater(L,y)   if z <= y
[] subset-greater(nil,y) => nil
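The ELAN program above is the familiar randomized quicksort. A Python transcription could read as follows (a sketch assuming distinct list elements, since duplicates of the pivot are kept in neither subset):

```python
import random

def rsort(lst):
    """Randomized quicksort mirroring the rules above: the strategy `any`
    picks a pivot y uniformly; subset-lower keeps z < y and subset-greater
    keeps z > y."""
    if not lst:                            # sort(nil) => nil
        return []
    y = random.choice(lst)                 # y := (any L), uniform pivot
    lower = [z for z in lst if z < y]      # subset-lower(L,y)
    greater = [z for z in lst if z > y]    # subset-greater(L,y)
    return rsort(lower) + [y] + rsort(greater)
```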
4 Operational Semantics
The previous PC operator is available in the ELAN system, and the previous examples can be checked in the current prototype. We now present the operational semantics of the probabilistic choice strategy. It is easy to derive from the following discussion of its implementation, using the background semantics of the ELAN language [3,5]. Actually, the PC operators are implemented in ELAN itself, using the classical trick of imperative programming to simulate any probabilistic distribution
using a uniform one [13]. Concretely, we just added to ELAN a built-in operator uniform-random-generator() that returns a real number in the interval [0, 1] with uniform distribution. Now, when the pi are strategies (possibly constant) that, on any term t, evaluate to a positive (or null) real number such that Σ_{i=1}^n (pi t) = 1, the application of the strategy PC(s1:p1, ..., sn:pn) to a term t is defined as follows:

[] (PC(s1:p1, ..., sn:pn) t) => (choose(y, s1:(p1 t), ..., sn:(pn t)) t)
   where y := uniform-random-generator()

with, for reals ξi ∈ [0, 1] (that result from the evaluation of pi on t) such that Σ_{i=1}^n ξi = 1:

[] choose(y, τ1:ξ1, ..., τn:ξn) => τ1 if y ≤ ξ1
[] choose(y, τ1:ξ1, ..., τn:ξn) => τ2 if ξ1 < y ≤ ξ1 + ξ2
...
[] choose(y, τ1:ξ1, ..., τn:ξn) => τn if ξ1 + ξ2 + ... + ξn−1 < y

This ELAN code simulates a non-uniform distribution using a uniform one. For example, we have the derivations

(bias france) => (PC(h:probaH, t:1 - probaH) france)
             => (choose(0.7, h:(probaH france), t:((1 - probaH) france)) france)
             => (choose(0.7, h:0.49, t:0.51) france)
             => (t france)
             => tail

and

(bias france) => (PC(h:probaH, t:1 - probaH) france)
             => (choose(0.25, h:(probaH france), t:((1 - probaH) france)) france)
             => (choose(0.25, h:0.49, t:0.51) france)
             => (h france)
             => head
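The trick of simulating a non-uniform choice from a uniform sample is the classical inverse-transform method. A Python sketch of the two ELAN rules above (illustrative names, not the actual implementation):

```python
import random

def choose(y, weighted):
    """Mirror the ELAN choose rules: return the first tau_i whose cumulative
    weight xi_1 + ... + xi_i reaches the uniform sample y."""
    acc = 0.0
    for tau, xi in weighted:
        acc += xi
        if y <= acc:
            return tau
    return weighted[-1][0]  # guard against rounding error in the cumulative sum

def pc_apply(pairs, t):
    """(PC(s1:p1, ..., sn:pn) t): evaluate the p_i on t, draw y uniformly,
    select a strategy with choose, and apply it to t."""
    y = random.random()  # plays the role of uniform-random-generator()
    s = choose(y, [(s, p(t)) for s, p in pairs])
    return s(t)
```

With y = 0.7 and weights h:0.49, t:0.51, choose returns the t branch, exactly as in the first derivation above.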
5 Probabilistic Abstract Reduction Systems
Dealing with probabilistic systems gives rise to interesting new problems. For example, what are the generalizations of the classical notions of rewriting, such as confluence and termination? Which results still hold, and which can be extended
to that setting? These are the kinds of problems that we address in this section by studying the generalization of the classical results about abstract reduction systems to probabilistic ones. In a first step, we are not dealing with rewriting (a fortiori not with conditional rewriting, nor with rewriting controlled by strategies) but with abstract reduction systems. However, we claim that this study can give some clues to understanding what probabilistic rewriting could be. Observe also that we took the approach of starting from results about classical abstract reduction systems and studying whether they are still valid in this setting. This is a priori different from considering probabilistic abstract reduction systems by themselves.

We first come back to the classical setting (see for example [2,12]). An abstract reduction system (ARS) is a pair A = (A, →) consisting of a set A and a binary relation → ⊆ A×A. A derivation is a finite or infinite sequence π = π0 → π1 → ... → πn with (πi, πi+1) ∈ → for all i. →^n denotes the n-step composition of →: →^0 is the identity, and →^{n+1} = →^n ∘ →. →^{≤n} denotes the at-most-n-step composition: →^{≤n} = ∪_{0≤i≤n} →^i. The reflexive transitive closure of a relation → is denoted by →* and its symmetric closure by ↔. The reduction relation → is said to be locally confluent (LC) if b ← a → c implies ∃d, b →* d ∧ c →* d. It is said to be confluent (C) if b ←* a →* c implies ∃d, b →* d ∧ c →* d. The following results are well known.

Proposition 1 (Classical Setting [2,12]). The following are equivalent:
1. → is confluent (C)
2. →* is locally confluent (LC)
3. →* is confluent (C)
4. the relation is semi-confluent: b ← a →* c ⇒ ∃d, b →* d ←* c
5. the relation is Church-Rosser: a ↔* b ⇒ ∃c, a →* c ←* b
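On a finite ARS, these confluence notions can be checked by brute force. A small Python sketch (illustrative, not from the paper):

```python
def reachable(rel, a):
    """All b with a ->* b: reflexive-transitive closure from a."""
    seen, stack = {a}, [a]
    while stack:
        x = stack.pop()
        for s, t in rel:
            if s == x and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def locally_confluent(states, rel):
    """Check b <- a -> c implies some d with b ->* d and c ->* d."""
    for a in states:
        succs = [t for s, t in rel if s == a]
        for b in succs:
            for c in succs:
                if not (reachable(rel, b) & reachable(rel, c)):
                    return False
    return True
```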
Let A be an ARS. An element a ∈ A is said to be in normal form if there is no b with a → b. We say that a has a normal form (hnf) if there is a b in normal form with a →* b. Two elements a, b ∈ A are said to be convertible if a ↔* b. The reduction relation →, or A, is normalizing (N) if every a ∈ A has a normal form. It is terminating (T) if there is no infinite chain π0 → π1 → ... → πn → .... It has the unique normal form property (UN) if for all a, b ∈ A, if a and b are convertible and in normal form, then a and b are identical. It has the normal form property (NF) if for all a, b ∈ A, if a is in normal form and convertible with b, then b →* a. It is inductive (Ind) if for every (possibly infinite) reduction sequence π0 → π1 → ... → πn → ... there is an a with πi →* a for all i. It is increasing (Inc) if there is a map f from A to N such that a → b implies f(a) < f(b). It is finitely branching (FB) if for all a, the set of b with a → b is finite. The following relations are well known.
Proposition 2 (Classical Setting [2,12]).
1. confluent (C) ⇒ normal form property (NF) ⇒ unique normal form property (UN)
2. terminating (T) and locally confluent (LC) ⇒ confluent (C) (Newman's Lemma)
3. unique normal form property (UN) and normalizing (N) ⇒ confluent (C)
4. unique normal form property (UN) and normalizing (N) ⇒ inductive (Ind)
5. inductive (Ind) and increasing (Inc) ⇒ terminating (T)
6. locally confluent (LC) and normalizing (N) and increasing (Inc) ⇒ terminating (T)

We can now switch to the probabilistic case. The idea behind the probabilistic setting is to consider reductions with probabilities. Let us first go back to school [10,7,20]: a σ-algebra on a set Ω is a set of subsets of Ω which contains the empty set and is stable by countable union and complementation. In particular, the set of all subsets is a natural σ-algebra for any countable set. A measurable space (Ω, σ) is a set with a σ-algebra on it. If (Ω, σ) and (Ω′, σ′) are measurable spaces, a function f : Ω → Ω′ is measurable if for all W in σ′, f⁻¹(W) ∈ σ. A probability is a function P from a σ-algebra to [0, 1] which is countably additive and such that P(Ω) = 1. A triplet (Ω, σ, P) is called a probability space. A random variable is a measurable function on some probability space. Given two probabilities P and P′ on σ and σ′ respectively, one can consider P ⊕ P′, the product measure defined on the σ-algebra σ×σ′, characterized by P ⊕ P′(A×A′) = P(A)P(A′). Given A, B ∈ σ, when P(B) > 0, the conditional probability of A given B is by definition P(A|B) = P(A ∩ B)/P(B). A stochastic sequence on a set S is a family (Xi)_{i∈N} of random variables defined on some fixed probability space (Ω, σ, P) with values in S. It is said to be Markovian if its conditional distribution function satisfies the so-called Markov property, that is, for all n,

P(Xn = s | X0 = π0, X1 = π1, ..., X_{n−1} = π_{n−1}) = P(Xn = s | X_{n−1} = π_{n−1}),

and homogeneous if furthermore this probability is independent of n. We are now ready to define probabilistic abstract reduction systems (PARS). The idea is that a PARS is given by some set A and a function [s ❀ t] that gives, for all s, t in A, the probability of going from s to t. Formally:

Definition 1 (PARS). A probabilistic abstract reduction system (PARS) is a pair A = (A, [· ❀ ·]) consisting of a countable set A and a function [· ❀ ·] from A×A to [0, 1] such that, for all s, Σ_{t∈A} [s ❀ t] is 0 or 1.

Note that we allow Σ_{t∈A} [s ❀ t] to be 0. This happens exactly for terminal states: a state a ∈ A is terminal if there is no b with [a ❀ b] > 0. For non-terminal states, Σ_{t∈A} [s ❀ t] is necessarily 1.
A derivation of A is then a finite or infinite homogeneous Markovian stochastic sequence whose transition probabilities are given by [· ❀ ·]. Formally:

Definition 2 (Derivations). A derivation π of A is a homogeneous Markovian stochastic sequence π = (πi)_{i∈N} on S ∪ {⊥} such that, for all i,

P(π_{i+1} = t | πi = s) = [s ❀ t]  if s ≠ ⊥ is non-terminal,
                        = 1        if t = ⊥ and s ≠ ⊥ is terminal,
                        = 1        if t = ⊥ and s = ⊥,
                        = 0        otherwise.

PARS correspond to the extension of ARS with probabilities on derivations. Indeed, to a finitely branching² ARS A = (A, →), we can associate a PARS A′ = (A, [· ❀ ·]) where, for all x, y ∈ A, [x ❀ y] is 0 if (x, y) ∉ →, and [x ❀ y] = 1/card({t | (x, t) ∈ →}) otherwise. Derivations of the ARS A and of the PARS A′ are in correspondence: a non-terminating chain π0 → π1 → ... → πn → ... of A corresponds to the derivation π0 π1 ... πn ... of A′ (observe that we then have πi ∈ A for all i). A terminating chain π0 → π1 → ... → πn of A corresponds to the derivation π0 π1 ... πn ⊥ ⊥ ... of A′. Clearly, several PARS can correspond to a given ARS. Conversely, to a PARS A = (A, [· ❀ ·]) corresponds a unique ARS A′ = (A, →) obtained by forgetting probabilities: the relation → ⊆ A×A of A′ is defined by s → t iff [s ❀ t] > 0. This unique ARS, or the corresponding relation →, will be called the projection of A.

The matrix (p_{t,s}) = (P(πi = t | π_{i−1} = s)) on S ∪ {⊥} is what is called a stochastic matrix (even when S is an infinite set) [4]. It has the nice property that its columns sum to 1. Actually our setting is the one of Homogeneous Markov Chain (HMC) theory [4]. The main difference is that we will focus on the generalization of classical rewriting notions and not on the usual Markov chain problematics: see [4] for a presentation of HMC theory.

Definition 3 (Relations ❀, ❀^n, ❀^{≤n}, ❀*). Let s ∈ A be a state, and π = π0 π1 ... πn ... a derivation with π0 = s. We write
1. s ❀ t iff π1 = t,
2. s ❀^n t iff πn = t,
3. s ❀^{≤n} t iff there is some i ≤ n with πi = t,
4. s ❀* t iff there is some i ∈ N with πi = t.

² To an unrestricted (not necessarily finitely branching) ARS, associate similarly a PARS A′ by putting probabilities 1/2, 1/4, 1/8, ... on the outgoing transitions of a non-terminal s when the set {t | (s, t) ∈ →} is not finite.
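The derivations of Definition 2 can be sampled directly. A minimal Python sketch (illustrative, not from the paper), where None plays the role of the absorbing state ⊥:

```python
import random

def sample_derivation(pars, s, steps):
    """Sample pi_0 ... pi_steps for the PARS `pars` (a dict (s, t) -> prob):
    non-terminal states move according to [s ~> t]; terminal states and
    the bottom state move to None with probability 1 (Definition 2)."""
    path = [s]
    for _ in range(steps):
        if s is not None:
            succ = [(t, p) for (x, t), p in pars.items() if x == s and p > 0]
            if succ:
                s = random.choices([t for t, _ in succ],
                                   weights=[p for _, p in succ])[0]
            else:
                s = None  # terminal state: jump to bottom
        path.append(s)
    return path
```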
For s, t ∈ A, we write [s ❀ t] (respectively [s ❀^n t], [s ❀^{≤n} t], [s ❀* t]) for the probability that s ❀ t (resp. s ❀^n t, s ❀^{≤n} t, s ❀* t) holds on a derivation π = π0 π1 ... πn ... given that π0 = s.

Derivations have the nice property of being preserved by shifting: a stopping time with respect to a stochastic sequence (Xn)_{n∈N} is a random variable τ taking its values in N ∪ {∞} such that, for all integers m ≥ 0, the event τ = m can be expressed in terms of X0, X1, ..., Xm [4]. A typical example is the hitting time of some state s ∈ S, defined as T = inf{i | Xi = s}. Another typical example is a constant time, which is a particular stopping time [4].

Observation 1 (Strong Markov Property). Let τ be a stopping time with respect to a derivation π = π0 π1 ... πn ... of A. The derivation after τ (also called the τ-shift of π) is by definition the sequence (π′i)_{i∈N} defined by π′i = π_{τ+i}. Then, for any state s ∈ S, given that π_τ = s (hence we suppose τ ≠ ∞), the derivation after τ is a derivation of A.

Proof. This is a restatement of the so-called "Strong Markov Property" for HMC: see [4] for example.

Clearly, the transition function [· ❀ ·] of the shifted derivation corresponds to the function [· ❀ ·] of A. From the (strong) homogeneous Markov properties, we can also easily prove:

Proposition 3. For all s, t ∈ A and n ≥ 1,
1. [s ❀^1 t] = [s ❀ t]
2. [s ❀^n t] = Σ_{u∈A} [s ❀^{n−1} u][u ❀ t]
3. [s ❀* t] = lim_{n→∞} [s ❀^{≤n} t]

Example 4. Let A be the PARS defined by A = {a, b} and
1. [a ❀ a] = 1/2
2. [a ❀ b] = 1/2
Then:
1. [a ❀^n a] = 1/2^n
2. [a ❀^n b] = 1/2^n
3. [a ❀^{≤n} a] = 1
4. [a ❀^{≤n} b] = 1 − 1/2^n
5. [a ❀* a] = 1
6. [a ❀* b] = 1
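The values in Example 4 follow from the recurrence of Proposition 3(2), which is easy to evaluate mechanically (illustrative code, not from the paper):

```python
def p_n(pars, s, t, n):
    """[s ~>^n t] computed with the recurrence
    [s ~>^n t] = sum_u [s ~>^{n-1} u] [u ~> t] of Proposition 3."""
    if n == 0:
        return 1.0 if s == t else 0.0
    states = {x for pair in pars for x in pair}
    return sum(p_n(pars, s, u, n - 1) * pars.get((u, t), 0.0) for u in states)

# Example 4: [a ~>^3 a] = [a ~>^3 b] = 1/2^3 = 0.125
pars = {("a", "a"): 0.5, ("a", "b"): 0.5}
```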
Recall that in the classical setting an ARS A is locally confluent (LC) if b ← a → c implies ∃d, b →* d ∧ c →* d. We will then say that a PARS A is probabilistically locally confluent (pLC) if P(∃d, b ❀* d ∧ c ❀* d) > 0 whenever a ❀ b and a ❀ c by two independently chosen derivations. Of course, such a definition must be understood as follows: a PARS A is probabilistically locally confluent if, for all a, b, c ∈ A, if a ❀ b by some derivation
π, and a ❀ c by some independently chosen derivation π′, then the probability that there is a d ∈ A such that simultaneously b ❀* d by derivation π after stopping time τ = 1 and c ❀* d by derivation π′ after stopping time τ′ = 1 is positive. The probability measure considered here is the product measure P ⊕ P′, since we consider two independently randomly chosen derivations π and π′. Observe that, by the Strong Markov Property, this is equivalent to saying that, for all b, c ∈ A with [a ❀ b] > 0 and [a ❀ c] > 0 for some a ∈ A, the probability that, for two independently randomly chosen derivations π and π′ starting from b and c respectively, there is a d ∈ A such that simultaneously b ❀* d by derivation π and c ❀* d by derivation π′, is positive.

In the same vein, we will say that ❀ is almost-surely locally confluent (PLC) if P(∃d, b ❀* d ∧ c ❀* d) = 1 whenever a ❀ b and a ❀ c by two independently chosen derivations. We let the reader guess what probabilistically confluent (pC) and almost-surely confluent (PC) are. Proposition 1 has the following equivalent:

Proposition 4. The following are equivalent:
1. ❀ is almost-surely confluent (PC) (resp. probabilistically confluent (pC))
2. ❀* is almost-surely locally confluent (PLC) (resp. probabilistically locally confluent (pLC))
3. ❀* is almost-surely confluent (PC) (resp. probabilistically confluent (pC))
4. a ❀ b, a ❀* c implies there is a d with b ❀* d, c ❀* d with probability one (resp. positive probability).

Recall that a ∈ A is in normal form if there is no b with [a ❀ b] > 0. We say that a ∈ A probabilistically has a normal form (phnf) if P(∃b in normal form, a ❀* b) > 0. We say that a ∈ A almost-surely has a normal form (Phnf) if P(∃b in normal form, a ❀* b) = 1. A relation is probabilistically normalizing (pN) if every a ∈ A probabilistically has a normal form (phnf). A relation is almost-surely normalizing (PN) if every a ∈ A almost-surely has a normal form (Phnf).
A relation is almost-surely terminating (PT) if every derivation π0, π1, ..., πn, ... is terminal (ultimately equal to ⊥) with probability 1. It is probabilistically terminating (pT) if this latter probability is positive. Two states a, b ∈ A are said to be convertible if a and b are convertible in the classical sense for the projection of A, that is, for the classical binary relation defined by [· ❀ ·] > 0. A relation has the unique normal form property (UN) if for all a, b ∈ A, if a and b are convertible and in normal form, then a and b are identical. A relation has the probabilistic normal form property (pNF) if P(b ❀* a) > 0 whenever a is in normal form and convertible with b. A relation has the almost-sure normal form property (PNF) if P(b ❀* a) = 1 whenever a is in normal form and convertible with b.
We are now ready to study the properties of the above notions. We will say that a PARS is locally confluent (LC) (respectively confluent (C), normalizing (N), terminating (T), has the unique normal form property (UN), inductive (Ind), increasing (Inc), finitely branching (FB)) if its projection is. It is noticeable that the projection operator commutes with finite iteration and (reflexive) transitive iteration. In other words, probabilistic joinability and joinability correspond. Formally:

Proposition 5. Let A = (A, [· ❀ ·]) be a PARS and A′ = (A, →) its projection: s → t iff [s ❀ t] > 0. Then, for all s, t ∈ A,
1. [s ❀^n t] > 0 iff s →^n t
2. [s ❀* t] > 0 iff s →* t

Proof. The first assertion is proved by induction over n. It is clear for n = 0 and n = 1. For n > 1, we have s →^n t iff there is a u with s →^{n−1} u and u → t, iff there is a u with [s ❀^{n−1} u] > 0 and [u ❀ t] > 0, iff [s ❀^n t] = Σ_{u∈A} [s ❀^{n−1} u][u ❀ t] > 0. For the second assertion, we have s →* t iff there is an n with s →^n t, iff there is an n with [s ❀^n t] > 0, iff [s ❀* t] > 0. Indeed, if there is such an n, then [s ❀* t] is at least [s ❀^n t]. For the other direction, we clearly have, by sigma-additivity of probabilities, [s ❀* t] ≤ Σ_{n∈N} [s ❀^n t].

We then claim:

Proposition 6.
1. almost-surely locally confluent (PLC) ⇒ probabilistically locally confluent (pLC) ⇔ locally confluent (LC).
2. The reverse implication is false in general.

Proof. Only the equivalence between probabilistically locally confluent (pLC) and locally confluent (LC) is not trivial. Clearly P(∃d, a ❀* d ∧ b ❀* d) > 0 iff ∃d, P(a ❀* d ∧ b ❀* d) > 0 iff ∃d, P(a ❀* d) > 0 ∧ P(b ❀* d) > 0. By the previous proposition, this happens iff the projection is locally confluent (LC).

In the same vein, we can prove:

Proposition 7.
1. almost-surely confluent (PC) ⇒ probabilistically confluent (pC) ⇔ confluent (C)
2. almost-surely has a normal form (Phnf) ⇒ probabilistically has a normal form (phnf) ⇔ has a normal form (hnf)
3. almost-surely normalizing (PN) ⇒ probabilistically normalizing (pN) ⇔ normalizing (N)
4. almost-sure normal form property (PNF) ⇒ probabilistic normal form property (pNF) ⇔ normal form property (NF)
5. terminating (T) ⇒ almost-surely terminating (PT) ⇔ almost-surely normalizing (PN) ⇒ probabilistically terminating (pT)
6. The reverse implications are false in general.

We then get the following picture:
Proposition 8. (Diagram summarizing the implications between the probabilistic and classical properties established in Propositions 6 and 7.)
The previous implications can all be derived by adapting the proofs from classical abstract reduction systems to probabilistic ones. Observe that they were obtained by considering the classical rewriting setting and testing whether it generalizes to this context. That means, first, that we are not at all exhaustive: some reverse implications, or other implications, may hold. Second, the classical theory of HMC can a priori also be used to derive other facts, or relations between our notions.
6 Conclusion
In this paper we presented an extension of the ELAN system for the prototyping and modeling of systems with probabilistic evolutions. We proposed to extend the system by adding a new "probabilistic choice" operator to the strategy language of the system. We argued through simple classical examples that this approach, which puts probabilities at the level of strategies, i.e., at the level of the control of rewriting, is very natural. This is in contrast with similar works where probabilities are in some sense hard-coded in the rules [8]. Furthermore, this approach has the advantage that the problem of probability normalization (probabilities must be normalized to sum to 1) in the above-mentioned work is avoided in a nice manner: it is handled at the level of the definition of the operators, which are coded in the language itself.

Dealing with probabilistic systems gives rise to many very interesting questions about the generalization of classical rewriting notions. We put down the first stones in that direction by defining several notions and stating generalizations of classical results for abstract reduction systems. Our results mainly say that the notions defined by considering joinability with positive probability correspond to the classical notions obtained by putting away probabilities, and that the notions defined by considering almost-sure joinability are stronger. These are of course only the first steps in the direction of fully understanding probabilistic rewrite systems. Actually, when dealing with rewrite systems, the objects of interest are: (1) abstract rewrite systems, (2) the rewrite relation, which should be stable by context and substitution, (3) the rewrite logic, (4) the rewrite calculus. The generalization of all these concepts and of their relations to probabilistic systems/theories gives rise to many potentially interesting problems. We touched only on (1), and on this topic we think that some other results could also be derived from Homogeneous Markov Chain theory.
References

1. D. Abdemouche. Stratégies probabilistes en ELAN et applications. Mémoire de DEA en informatique, Université Henri Poincaré – Nancy 1, July 2001.
2. F. Baader and T. Nipkow. Term Rewriting and All That. Cambridge University Press, 1998.
3. P. Borovanský, C. Kirchner, H. Kirchner, and C. Ringeissen. Rewriting with strategies in ELAN: a functional semantics. International Journal of Foundations of Computer Science, 12(1):69–98, February 2001.
4. P. Brémaud. Markov Chains. Springer, 1991.
5. H. Cirstea and C. Kirchner. The rewriting calculus — Part I and II. Logic Journal of the Interest Group in Pure and Applied Logics, 9(3):427–498, May 2001.
6. A. Dutech, O. Buffet, and F. Charpillet. Multi-agent systems by incremental gradient reinforcement learning. In 17th International Joint Conference on Artificial Intelligence, Seattle, WA, USA, volume 2, pages 833–838, August 2001.
7. W. Feller. An Introduction to Probability Theory and Its Applications, volumes 1 and 2. Wiley, 1968.
8. T. Frühwirth, A. Di Pierro, and H. Wiklicky. Toward probabilistic constraint handling rules. In S. Abdennadher and T. Frühwirth, editors, Proceedings of the Third Workshop on Rule-Based Constraint Reasoning and Programming (RCoRP'01), Paphos, Cyprus, December 2001. Under the auspices of the International Conferences in Constraint Programming and Logic Programming.
9. D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, Massachusetts, 1989.
10. G. Grimmett. Probability Theory. Cambridge University Press, 1993.
11. H. Kirchner and P.-E. Moreau. Promoting rewriting to a programming language: a compiler for non-deterministic rewrite programs in associative-commutative theories. Journal of Functional Programming, 11(2):207–251, 2001.
12. J. W. Klop. Term rewriting systems. In S. Abramsky, D. M. Gabbay, and T. S. E. Maibaum, editors, Handbook of Logic in Computer Science, volume 2, chapter 1, pages 1–117. Oxford University Press, Oxford, 1992.
13. D. E. Knuth. The Art of Computer Programming, volume 2. Addison-Wesley, second edition, 1981.
14. P. Lincoln, J. Mitchell, and A. Scedrov. Stochastic interaction and linear logic. In J.-Y. Girard, Y. Lafont, and L. Regnier, editors, Advances in Linear Logic, pages 147–166. Volume 222, London Mathematical Society Lecture Notes, Cambridge University Press, 1995.
15. R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
16. A. Di Pierro and H. Wiklicky. An operational semantics for probabilistic concurrent constraint programming. In Proceedings of the 1998 International Conference on Computer Languages, pages 174–183. IEEE Computer Society Press, 1998.
17. A. Di Pierro and H. Wiklicky. A Markov model for probabilistic concurrent constraint programming. In APPIA-GULP-PRODE'98, Joint Conference on Declarative Programming, 1998.
18. A. Di Pierro and H. Wiklicky. Probabilistic concurrent constraint programming: towards a fully abstract model. In MFCS: Symposium on Mathematical Foundations of Computer Science, 1998.
19. A. Di Pierro and H. Wiklicky. Concurrent constraint programming: towards probabilistic abstract interpretation. In 2nd International ACM SIGPLAN Conference on Principles and Practice of Declarative Programming (PPDP'00), 2000.
20. W. Rudin. Real and Complex Analysis, 3rd edition. McGraw-Hill, 1987.
21. V. Subrahmanian. Probabilistic databases and logic programming. In International Conference on Logic Programming, volume 2237 of Lecture Notes in Computer Science, page 10. Springer-Verlag, 2001.
Loops of Superexponential Lengths in One-Rule String Rewriting

Alfons Geser

ICASE, Mail Stop 132C, NASA Langley Research Center, Hampton, VA 23681.
[email protected]
Abstract. Loops are the most frequent cause of non-termination in string rewriting. In the general case, non-terminating, non-looping string rewriting systems exist, and the uniform termination problem is undecidable. For rewriting with only one string rewriting rule, it is unknown whether non-terminating, non-looping systems exist and whether uniform termination is decidable. If, in the one-rule case, non-termination is equivalent to the existence of loops, as McNaughton conjectures, then a decision procedure for the existence of loops also solves the uniform termination problem. As the existence of loops of bounded lengths is decidable, the question is raised how long shortest loops may be. We show that string rewriting rules exist whose shortest loops have superexponential lengths in the size of the rule.

Keywords: string rewriting, semi-Thue system, uniform termination, termination, loop, one-rule, single-rule
1 Introduction
Uniform termination, i.e., the non-existence of an infinite reduction sequence, is an undecidable property of string rewriting systems (SRSs) [8], even if they comprise only three rules [13]. It is open whether uniform termination is decidable for SRSs with less than three rules. An SRS admits a loop if there is a reduction of the form u →+ sut. Every looping SRS is non-terminating. The converse does not hold, even for two-rule SRSs [6]. McNaughton [15] conjectures that every one-rule non-terminating SRS admits a loop. If McNaughton’s conjecture holds, and if the existence of loops is decidable for one-rule SRSs, then the uniform termination of one-rule SRSs is decidable. Existence of loops of bounded length is decidable [6]. This immediately raises the question whether there is an algorithm that outputs upper bounds of lengths of shortest loops. On this account it is most interesting how long shortest loops can be.
This work was supported by the National Aeronautics and Space Administration under NASA Contract No. NAS1-97046 while the author was in residence at ICASE.
S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 267–280, 2002. c Springer-Verlag Berlin Heidelberg 2002
The purpose of this note is to prove that there are one-rule SRSs that admit loops of superexponential lengths in the size of the rule, but no shorter loops. This is in harsh contrast to the common belief that loops are simple. Specifically, we prove the following result.

Theorem 1. For all p ≥ 2q, q ≥ 1, r ≥ 2, the string rewriting rule R = {10^p → 0^p 1^r 0^q} admits loops of length 1 + Σ_{i=0}^{ℓ−1} r^i, where ℓ = ⌈p/q⌉, but no shorter loops.

The fact that these SRSs loop is known [19, Case 6.2.2] [10, Theorem 10.1, Case 2.2]; the contribution of this paper is a sharp lower bound for the length of the loop. Theorem 1 follows immediately from Lemmas 5 and 16, which we will prove below. By choosing q = 1 and keeping p ≥ 2 fixed, we get a family of rules where the shortest length of loops is polynomial in r with degree p − 1. By choosing q = 1 and keeping r ≥ 2 fixed, we get shortest loop lengths exponential in p with base r. By choosing q = 1 and r = p, the minimal loop length is greater than p^{p−1}. This shows the claimed superexponential growth. The paper is organized as follows. We show in Section 3 that each R has a loop; in Section 4 that the length of the loop is as claimed; in Section 5 that the start string of a shortest loop has a special shape; and in Section 6 that these strings initiate no shorter loops.
2 Preliminaries
We assume that the reader is familiar with termination of string rewriting. SRSs are also called semi-Thue systems. For an introduction to string rewriting see Book and Otto [1] or Jantzen [9]. The study of termination in one-rule string rewriting was initiated by Kurth in his thesis [11]. Further work includes McNaughton [14,15,16], Senizergues [19], Kobayashi et al. [20,10], and Zantema and Geser [6,21,4,5]. Since SRSs can be encoded as term rewriting systems where letters are unary function symbols, the results on termination of term rewriting [3] apply. An SRS R is a set of string rewriting rules, i.e., pairs of strings denoted u → v. The reduction step relation, also denoted by →, is defined by sut → svt for all strings s, t and string rewriting rules u → v. Here st denotes the concatenation of strings s and t. A loop is a reduction of the form t →+ utv where u, v are strings. An SRS R is said to admit a loop if a loop t →+ utv exists. The string t is called a prefix, and u a suffix, of tu. Any string utv is said to contain t as a factor. The set of overlaps of a string u with a string v is defined by

OVL(u, v) = {w ∈ Σ⁺ | u = u′w, v = wv′, u′v′ ≠ ε, u′, v′ ∈ Σ*}.
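The overlap set, which is used again in Section 5, is easy to compute. A Python sketch (illustrative, not from the paper):

```python
def ovl(u, v):
    """OVL(u, v): non-empty strings w with u = u'w and v = wv', excluding
    the degenerate case u' = v' = epsilon (i.e. w = u = v)."""
    return {w for w in (u[i:] for i in range(len(u)))
            if v.startswith(w) and not (w == u and w == v)}
```

For instance, ovl("100", "110") is empty, which is the fact OVL(10^p, 1^r 0^q) = ∅ exploited for p = 2, q = 1, r = 2 later in the paper.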
3 The Rule Admits a Loop
Throughout this paper we assume strings over the two-letter alphabet {0, 1}, and we speak about one-rule SRSs R = {10^p → 0^p 1^r 0^q}, for some p ≥ 2q, q ≥ 1, r ≥ 2. In this section we show that each R has a loop.

Definition 1. Let strings t_i, i ≥ 0, be defined recursively by t_0 = 1 and t_{i+1} = 0^q t_i^r.

The following lemma is crucial for the proof that R admits loops.

Lemma 1. t_k^m 0^p →* 0^{p−q} t_{k+1}^m 0^q holds for all k, m ≥ 0.

Proof. By induction on (k, m) ordered lexicographically. The case m = 0 is trivial, so assume m > 0.

Case 1: k = 0. Then

t_0^m 0^p = t_0^{m−1} 1 0^p → t_0^{m−1} 0^p 1^r 0^q →* 0^{p−q} t_1^{m−1} 0^q 1^r 0^q = 0^{p−q} t_1^m 0^q,

by definition of t_0, the inductive hypothesis for (k, m − 1), and the definition of t_1, respectively.

Case 2: k > 0. Then

t_k^m 0^p = t_k^{m−1} 0^q t_{k−1}^r 0^p →* t_k^{m−1} 0^q 0^{p−q} t_k^r 0^q →* 0^{p−q} t_{k+1}^{m−1} 0^q t_k^r 0^q = 0^{p−q} t_{k+1}^m 0^q,

by definition of t_k, the inductive hypothesis for (k − 1, r), the inductive hypothesis for (k, m − 1), and the definition of t_{k+1}, respectively.

By definition t_k is a factor of t_{k+1}. Now if t_k 0^p is a factor of t_{k+1} 0^q then we have a loop. To this end k has to be large enough.

Example 1. Let p = 2, q = 1, r = 2. Then R = {100 → 00110}. We have t_0 = 1, t_1 = 0 t_0 t_0 = 011, t_2 = 0 t_1 t_1 = 0011011, t_3 = 0 t_2 t_2 = 000110110011011, and so forth. The string t_0 0^p = 100 is not a factor of t_1 0^q = 0110. Neither is t_1 0^p = 01100 a factor of t_2 0^q = 00110110. However t_3 0^q = 0001101100110110 contains t_2 0^p = 001101100 as a factor.

The problem is traced back to finding a factor 0^p within t_k. The following property of the t_i is the key to the solution.

Lemma 2. For all k ≥ 0, the following hold:
1. 0^{qk} is a prefix of t_k.
2. 0^{qk+1} is not a factor of t_k.

Proof. Straightforward induction on k.

If k is chosen large enough then 0^p fits into 0^{qk}. Let ⌈x⌉ denote the least integer i such that i ≥ x.

Lemma 3. Let ℓ = ⌈p/q⌉. Then t_ℓ 0^p →* 0^{p−q} t_{ℓ+1} 0^q is a loop.
A. Geser
Proof. By Lemma 1 for m = 1 we get a reduction

t_ℓ 0^p →* 0^{p-q} t_{ℓ+1} 0^q.   (1)

Recall that ℓ = ⌈p/q⌉, whence ℓq ≥ p. The following analysis shows that Reduction (1) indeed forms a loop, i.e., that its left-hand side t_ℓ 0^p is a factor of its right-hand side, 0^{p-q} t_{ℓ+1} 0^q. For some string w we get

0^{p-q} t_{ℓ+1} 0^q = 0^{p-q} 0^q t_ℓ^r 0^q = 0^{p-q} 0^q t_ℓ^{r-2} t_ℓ t_ℓ 0^q = 0^{p-q} 0^q t_ℓ^{r-2} t_ℓ 0^{qℓ} w 0^q = 0^{p-q} 0^q t_ℓ^{r-2} t_ℓ 0^p 0^{qℓ-p} w 0^q,

by definition of t_{ℓ+1}, the premise r ≥ 2, Lemma 2, and the property ℓq ≥ p, respectively. The occurrence of t_ℓ 0^p is now visible in the last string. □
4  The Loop Has Superexponential Length
Let us start by calculating the lengths of the reductions of Lemma 1.

Lemma 4. For all k, m ≥ 0, the reduction t_k^m 0^p →* 0^{p-q} t_{k+1}^m 0^q constructed in Lemma 1 has length m·r^k.
Proof. Check that 0·r^k = 0, (m+1)·r^0 = 1 + m·r^0, and (m+1)·r^{k+1} = r·r^k + m·r^{k+1}. □

From Lemma 4 we immediately get:

Proposition 1. The length of the reduction in Lemma 3 is r^ℓ.

The length of the loop in Lemma 3 is not yet minimal. A refinement leads to the following shorter loop. We will prove its minimality in the subsequent sections.

Lemma 5. Let ℓ = ⌈p/q⌉. Then there is a loop 10^{(p-q)(ℓ+1)+q} →^n 0^p 1^{r-1} 0^{(p-q)ℓ} t_ℓ 0^q of length n = 1 + Σ_{i=0}^{ℓ-1} r^i.

Proof. By Lemmas 1 and 4 we have the reduction of length n,

10^{(p-q)(ℓ+1)+q} → 0^p 1^{r-1} t_0 0^{(p-q)ℓ+q}
→^{r^0} 0^p 1^{r-1} 0^{p-q} t_1 0^{(p-q)(ℓ-1)+q}
→^{r^1} ···
0^p 1^{r-1} 0^{(p-q)(ℓ-1)} t_{ℓ-1} 0^{(p-q)+q}
→^{r^{ℓ-1}} 0^p 1^{r-2} 1 0^{(p-q)ℓ} t_ℓ 0^q.

Now t_ℓ has a prefix 0^{qℓ} by Lemma 2, and so a prefix 0^p by definition of ℓ. Together with the preceding factor 1 0^{(p-q)ℓ} this forms a reoccurrence of the initial string, 10^{(p-q)(ℓ+1)+q} = 1 0^{(p-q)ℓ} 0^p, as a factor in the final string. So the presented reduction is indeed a loop. □
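The loop of Lemma 5 can be replayed concretely. In the sketch below (our own code, not the paper's), a rightmost reduction strategy happens to follow the derivation displayed in the proof for the parameter sets checked, and after n = 1 + Σ_{i=0}^{ℓ-1} r^i steps the start string reappears as a factor:

```python
def rightmost_step(s, p, q, r):
    """One application of 1 0^p -> 0^p 1^r 0^q at the rightmost redex (None if irreducible)."""
    lhs = "1" + "0" * p
    rhs = "0" * p + "1" * r + "0" * q
    i = s.rfind(lhs)
    return None if i < 0 else s[:i] + rhs + s[i + len(lhs):]

def loop_reoccurs(p, q, r):
    """Run the reduction of Lemma 5 for exactly n steps and check the reoccurrence."""
    l = -(-p // q)                              # l = ceil(p/q)
    n = 1 + sum(r ** i for i in range(l))       # loop length claimed by Lemma 5
    s0 = "1" + "0" * ((p - q) * (l + 1) + q)    # start string 1 0^{(p-q)(l+1)+q}
    s = s0
    for _ in range(n):
        s = rightmost_step(s, p, q, r)
    return s0 in s                              # the start string reoccurs as a factor

assert loop_reoccurs(2, 1, 2) and loop_reoccurs(2, 1, 3)
```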
5  How Shortest Loops Start
To prove that there are no loops shorter than those of Lemma 5, we first restrict the set of strings that may initiate shortest loops. To this end we employ the fact that the existence of loops is characterized by the existence of looping forward closures. Forward closures [12,2] are restricted reductions. The following characterization of forward closures by Hermann is convenient.

Definition 2 (Forward Closure [12,2,7]). The set of forward closures of an SRS R is the least set FC(R) of R-reductions such that
fc1. if (l → r) ∈ R then (l → r) ∈ FC(R);
fc2. if (s_1 →+ t_1 x) ∈ FC(R) and (x l_2 → r_2) ∈ R such that x ≠ ε then (s_1 l_2 →+ t_1 x l_2 → t_1 r_2) ∈ FC(R);
fc3. if (s_1 →+ t_1 l_2 t_1′) ∈ FC(R) and (l_2 → r_2) ∈ R then (s_1 →+ t_1 l_2 t_1′ → t_1 r_2 t_1′) ∈ FC(R).

We call a forward closure of the form s →+ u s v a looping forward closure.

Theorem 2 ([6]). An SRS admits a loop if and only if it has a looping forward closure. Moreover, if there is a loop of length n then there is a looping forward closure of length at most n.

Lemma 6. Every forward closure of R has the form 10^{(p-q)k+q} →* w 1^r 0^q for some k ≥ 1 and some string w.

Proof. By induction on the definition of forward closure. Case fc1 is trivial. For Case fc2, let s_1 = 10^{(p-q)k+q}, t_1 x = w 1^r 0^q, x l_2 = 10^p, r_2 = 0^p 1^r 0^q. Observe that x must be x = 10^q. This implies t_1 = w 1^{r-1}, l_2 = 0^{p-q}, and we get s_1 l_2 = 10^{(p-q)k+q} 0^{p-q} = 10^{(p-q)(k+1)+q} →* w 1^{r-1} 0^p 1^r 0^q = t_1 r_2 as the composed forward closure. It has the claimed form. Case fc3: Let s_1 = 10^{(p-q)k+q}, t_1 l_2 t_1′ = w 1^r 0^q, l_2 = 10^p, r_2 = 0^p 1^r 0^q. By p > q, l_2 cannot be a factor of 1^r 0^q. Nor can it left overlap with it: OVL(l_2, 1^r 0^q) = ∅. Therefore t_1′ is longer than 1^r 0^q. In other words, a string w′ exists such that t_1′ = w′ 1^r 0^q. Hence w = t_1 l_2 w′, and the composed forward closure is s_1 = 10^{(p-q)k+q} →* t_1 r_2 w′ 1^r 0^q = t_1 r_2 t_1′, which has the claimed form. □
Next we show that a forward closure can only issue an infinite reduction, and so a loop, if its left-hand side is large enough.

Lemma 7. 0^{(p-q)k+q} t_k 0^q is irreducible for all k ≤ ⌈p/q⌉.

Proof. Suppose that 0^{(p-q)k+q} t_k 0^q is reducible. Then t_k is reducible; but then t_{k-1} contains a factor 0^p; by Lemma 2 then q(k − 1) ≥ p; so k ≥ 1 + ⌈p/q⌉. □
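Lemma 7 is easy to sanity-check on a small instance. A quick sketch (helper names ours), using the strings t_i of Definition 1 for p = 2, q = 1, r = 2, where ⌈p/q⌉ = 2:

```python
def t(i, q=1, r=2):
    """t_0 = 1, t_{i+1} = 0^q t_i^r (Definition 1)."""
    s = "1"
    for _ in range(i):
        s = "0" * q + s * r
    return s

def irreducible(s, p=2):
    """A string is irreducible iff it contains no occurrence of the left-hand side 1 0^p."""
    return "1" + "0" * p not in s

p, q = 2, 1
for k in range(3):   # k <= ceil(p/q) = 2: irreducible, as Lemma 7 states
    assert irreducible("0" * ((p - q) * k + q) + t(k) + "0" * q)
# the bound is sharp: for k = 1 + ceil(p/q) = 3 the string becomes reducible
assert not irreducible("0" * ((p - q) * 3 + q) + t(3) + "0" * q)
```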
Theorem 3 ([18]). Let R be non-overlapping and let s be an arbitrary string. Then s has an infinite reduction if and only if all reductions starting from s can be prolonged infinitely.

Lemma 8. If 10^{(p-q)k+q} issues an infinite reduction then k ≥ 1 + ⌈p/q⌉.

Proof. First we observe that R is non-overlapping, i.e., its left-hand side 10^p has no overlap with itself: OVL(10^p, 10^p) = ∅. Now there is a reduction s = 10^{(p-q)k+q} →* 0^{(p-q)k} t_k 0^q = s′, by Lemma 1 applied k times for m = 1. If k ≤ ⌈p/q⌉ then this reduction cannot be prolonged, as its final string s′ is irreducible by Lemma 7. By Theorem 3, therefore, s issues no infinite reduction for k ≤ ⌈p/q⌉. □
6  Shorter Loops Do Not Exist
We still have to prove that the strings 10^{k(p-q)+q}, k ≥ 1 + ℓ = 1 + ⌈p/q⌉, initiate no loops shorter than 1 + Σ_{i=0}^{ℓ-1} r^i. First let us switch from strings s ∈ {0, 1}* to their tuple representation T(s) ∈ N*. Here N denotes the set of non-negative integers. Please note that our notion of tuple representation differs from the literature [17,11].

Definition 3. Let k ∈ N, and let x_i, y_i ∈ N for all 0 ≤ i ≤ k. A string s ∈ {0, 1}* of the form

s = 0^{x_0(p-q)+y_0 q} 1 0^{x_1(p-q)+y_1 q} ... 1 0^{x_k(p-q)+y_k q}

is said to have the tuple representation

T(s) = (x_0, ..., x_k; y_0, ..., y_k)

if for all 0 ≤ i ≤ k, y_i ≤ ℓ − 1, and x_i > 0 implies y_i > 0.

The guard y_i ≤ ℓ − 1, which is equivalent to y_i q < p, ensures that the x_i and y_i are uniquely determined by x_i(p − q) + y_i q. Some strings over the alphabet {0, 1} may have no tuple representation, e.g., 0 has no tuple representation if p = 4, q = 2. For our purposes, however, it is reassuring to know that 10^{k(p-q)+q} has a tuple representation for any k, and that certain rewrite steps preserve the existence of tuple representations. We will conveniently speak about rewrite steps at position m:

Definition 4. Let s have a tuple representation T(s) = (x_0, ..., x_k; y_0, ..., y_k), and let 0 ≤ m ≤ k − 1. Then s →_m s′ if x_{m+1} > 0 (and so y_{m+1} > 0) and

s′ = ... 1 0^{x_m(p-q)+y_m q} 0^p 1^r 0^q 0^{(x_{m+1}-1)(p-q)+(y_{m+1}-1)q} 1 ...

Proposition 2. Let s have a tuple representation. Then s → s′ if and only if s →_m s′ for some 0 ≤ m ≤ k − 1.
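Definition 3 is effectively computable: each maximal run of zeros is decomposed as x(p − q) + yq under the guard. The sketch below (our own encoding choices, not the paper's) returns T(s), or None when no tuple representation exists:

```python
def tuple_repr(s, p, q):
    """T(s) of Definition 3, or None if s has no tuple representation."""
    l = -(-p // q)                      # l = ceil(p/q); the guard is y <= l - 1
    xs, ys = [], []
    for block in s.split("1"):          # runs of zeros between the 1s
        e = len(block)
        for y in range(min(l - 1, e // q) + 1):
            rest = e - y * q
            if rest % (p - q) == 0:
                x = rest // (p - q)
                if x == 0 or y > 0:     # guard: x > 0 implies y > 0
                    break
        else:
            return None                 # no admissible (x, y) for this block
        xs.append(x)
        ys.append(y)
    return xs, ys

# p = 5, q = 2: the string 1 0^{k(p-q)+q} has a representation for every k ...
assert tuple_repr("1" + "0" * (4 * 3 + 2), 5, 2) == ([0, 4], [0, 1])
# ... while the string of Example 2 below, 0^15 1 0^2, has none
assert tuple_repr("0" * 15 + "1" + "00", 5, 2) is None
```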
Definition 5. Let T(s) = (x_0, ..., x_k; y_0, ..., y_k) and let 0 ≤ m ≤ k − 1. Then a rewrite step s →_m s′ is called ordinary if y_m < ℓ − 1. Else the step is called extraordinary. A reduction is called ordinary if every step is ordinary. An extraordinary reduction has at least one extraordinary step.

Ordinary rewrite steps preserve the existence of tuple representations:

Proposition 3. Let T(s) = (x_0, ..., x_k; y_0, ..., y_k), let 0 ≤ m ≤ k − 1, and let s →_m s′ be ordinary. Then s′ has the tuple representation

T(s′) = (x_0, ..., x_{m-1}, x_m + 1, 0, ..., 0, x_{m+1} − 1, x_{m+2}, ..., x_k;
         y_0, ..., y_{m-1}, y_m + 1, 0, ..., 0, y_{m+1}, y_{m+2}, ..., y_k),

where each of the two runs of zeros has length r − 1.
In contrast, extraordinary steps may create strings that have no tuple representation.

Example 2. Let s have the tuple representation T(s) = (2, 1; ℓ − 1, 1), and let m = 0. Then we have s = 0^{2(p-q)+(ℓ-1)q} 1 0^p →_m 0^{3(p-q)+ℓq} 1^r 0^q = 0^{4(p-q)+(ℓ+1)q-p} 1^r 0^q. For p = 5, q = 2 we get ℓ = 3 and (ℓ + 1)q − p = 3, which has no representation as an integer multiple of q. So the string 0^{4(p-q)+(ℓ+1)q-p} 1^r 0^q has no tuple representation.

Our goal is to demonstrate that, in any reduction starting from 10^{k(p-q)+q}, the first extraordinary step takes place only as late as the completion of the loop. We are now going to construct two functions h, h′ that estimate the length of the shortest ordinary reduction to the next string that has the factor 10^{(p-q)ℓ+q}, and the length of the shortest extraordinary reduction, respectively. These two functions will be based on the following auxiliary functions g_k.

Definition 6. The functions g_k : N^{k+1} → N, k ∈ N, are defined by

g_k(x_0, ..., x_k) = (1/(r−1)) (r^{x_1+···+x_k} + r^{x_2+···+x_k} + ··· + r^{x_k} − k).
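Definition 6 yields integers because each r^s − 1 is divisible by r − 1. The following sketch (names ours) computes g_k and spot-checks, on one instance each, two of the bookkeeping identities stated as Proposition 4 below:

```python
def g(xs, r):
    """g_k(x_0, ..., x_k) = (r^{x_1+..+x_k} + ... + r^{x_k} - k) / (r - 1); x_0 is ignored."""
    k = len(xs) - 1
    total = sum(r ** sum(xs[i:]) for i in range(1, k + 1)) - k
    assert total % (r - 1) == 0          # each summand r^s - 1 is divisible by r - 1
    return total // (r - 1)

r = 3
assert g([2, 1], r) == 1                 # (3^1 - 1) / 2
assert g([2, 1, 0], r) == g([2, 1], r)   # Item 1: a trailing zero can be dropped
# Item 3 with i = 0: insert r - 1 = 2 zeros and move one unit left; the value drops by one
assert g([0, 1, 1], r) == g([1, 0, 0, 0, 1], r) + 1
```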
g_k does not depend on its first argument. This is intentional. The following derived properties will be useful below.

Proposition 4. For all k ≥ 1 and all x_0, ..., x_k the following hold:
1. g_k(x_0, ..., x_k) = g_{k-1}(x_0, ..., x_{k-1}) if x_k = 0;
2. g_k(x_0, ..., x_k) ≥ g_{k-1}(x_1, ..., x_k);
3. g_k(x_0, ..., x_k) = g_{k+r-1}(x_0, ..., x_i + 1, 0, ..., 0, x_{i+1} − 1, ..., x_k) + 1 for all 0 ≤ i ≤ k − 1 such that x_{i+1} ≥ 1, where the inserted run of zeros has length r − 1;
4. g_k(x_0, ..., x_k) ≥ g_k(x_0 + 1, x_1, ..., x_{i-1}, x_i − 1, x_{i+1}, ..., x_k) for all 1 ≤ i ≤ k such that x_i ≥ 1;
5. g_k is monotone in each argument.

The next lemma is the workhorse of this section. It states that a reduction step decreases by at most one the minimal value of all those g terms which have the same sum of arguments.

Lemma 9. Let s → s′ be an ordinary step where T(s) = (x_0, ..., x_k; y_0, ..., y_k) and T(s′) = (x′_0, ..., x′_{k′}; y′_0, ..., y′_{k′}). Then for every 1 ≤ i′ ≤ j′ ≤ k′, z′_{j′} ≤ x′_{j′} there exist 1 ≤ i ≤ j ≤ k, z_j ≤ x_j such that

g_{j′-i′}(x′_{i′}, ..., x′_{j′-1}, z′_{j′}) ≥ g_{j-i}(x_i, ..., x_{j-1}, z_j) − 1,
x′_{i′} + ··· + x′_{j′-1} + z′_{j′} = x_i + ··· + x_{j-1} + z_j.

Proof. Let 0 ≤ m ≤ k − 1 be determined by the rewrite step s →_m s′. Then by Proposition 3 we have k′ = k + r − 1, x′_0 = x_0, ..., x′_{m-1} = x_{m-1}, x′_m = x_m + 1, x′_{m+1} = ··· = x′_{m+r-1} = 0, x′_{m+r} = x_{m+1} − 1, x′_{m+r+1} = x_{m+2}, ..., x′_{k′} = x_k. And we have y′_0 = y_0, ..., y′_{m-1} = y_{m-1}, y′_m = y_m + 1, y′_{m+1} = ··· = y′_{m+r-1} = 0, y′_{m+r} = y_{m+1}, y′_{m+r+1} = y_{m+2}, ..., y′_{k′} = y_k. Let 1 ≤ i′ ≤ j′ ≤ k′ and z′_{j′} ≤ x′_{j′}. The proof is done by case analysis on i′ and j′.

Case 1: 1 ≤ i′ ≤ j′ ≤ m. If j′ ≠ m or z′_{j′} ≠ x_m + 1 then choose i = i′, j = j′, z_j = z′_{j′}. In this case we get

g_{j′-i′}(x′_{i′}, ..., x′_{j′-1}, z′_{j′}) = g_{j-i}(x_i, ..., x_{j-1}, z_j).   (2)

Else choose i = i′, j = m + 1, z_j = 1. Here we use x_{m+1} > 0 to establish z_j ≤ x_j. We get

g_{j′-i′}(x′_{i′}, ..., x′_{j′-1}, z′_{j′}) = g_{m-i}(x_i, ..., x_{m-1}, x_m + 1)
= g_{m+r-i}(x_i, ..., x_{m-1}, x_m + 1, 0, ..., 0)   [run of r zeros]
= g_{m+1-i}(x_i, ..., x_m, 1) − 1 = g_{j-i}(x_i, ..., x_{j-1}, z_j) − 1,

by Items 1 and 3 of Proposition 4.

Case 2: m + 1 ≤ i′ ≤ j′ ≤ m + r − 1. Then x′_{i′} + ··· + x′_{j′-1} + z′_{j′} = 0. Choose any 1 ≤ i ≤ k, and let j = i and z_j = 0. Then g_{j′-i′}(x′_{i′}, ..., x′_{j′-1}, z′_{j′}) = 0 = g_0(0).

Case 3: m + r ≤ i′ ≤ j′ ≤ k′. If i′ ≠ m + r or j′ = i′ then choose i = i′ − r + 1, j = j′ − r + 1, z_j = z′_{j′}. We get (2). Else we have i′ = m + r and j′ > i′, and so x′_{i′} = x_{m+1} − 1. In this case let i = m + 1. If z′_{j′} > 0 then let j and z_j be defined by j = j′ − r + 1 and z_j = z′_{j′} − 1; else let i ≤ j < j′ − r + 1 be the greatest number
such that x_j > 0, and let z_j = x_j − 1. By x_i > 0, j and z_j are well-defined. Thus we get

g_{j′-i′}(x′_{i′}, ..., x′_{j′-1}, z′_{j′}) = g_{j′-r+1-i}(x_i − 1, x_{i+1}, ..., x_{j′-r}, z′_{j′}) = g_{j-i}(x_i − 1, x_{i+1}, ..., x_{j-1}, z_j + 1) ≥ g_{j-i}(x_i, x_{i+1}, ..., x_{j-1}, z_j),

by Items 1 and 4 of Proposition 4.

Case 4: 1 ≤ i′ ≤ m and m + r ≤ j′ ≤ k′. Choose i = i′, j = j′ − r + 1, z_j = z′_{j′}. We get

g_{j′-i′}(x′_{i′}, ..., x′_{j′-1}, z′_{j′}) = g_{j-i+r-1}(x_i, ..., x_m + 1, 0, ..., 0, x_{m+1} − 1, ..., x_{j-1}, z_j)   [run of r − 1 zeros]
= g_{j-i}(x_i, ..., x_m, x_{m+1}, ..., x_{j-1}, z_j) − 1,

by Item 3 of Proposition 4. Note that if j′ = m + r then x′_{j′} = x_{m+1} − 1 = x_j − 1. So one can always choose z_j = z′_{j′}, yielding z_j = z′_{j′} ≤ x′_{j′} ≤ x_j as required.

Case 5: 1 ≤ i′ ≤ m < j′ ≤ m + r − 1. This case reduces to Case 1 by the identity g_{j′-i′}(x′_{i′}, ..., x′_{j′-1}, z′_{j′}) = g_{m-i′}(x′_{i′}, ..., x′_m), due to Item 1 of Proposition 4.

Case 6: m + 1 ≤ i′ < m + r ≤ j′ ≤ k′. This case reduces to Case 3 by the inequality g_{j′-i′}(x′_{i′}, ..., x′_{j′-1}, z′_{j′}) ≥ g_{j′-m-r}(x′_{m+r}, ..., x′_{j′-1}, z′_{j′}), due to Item 2 of Proposition 4.

These are all cases. In each case it is easy to show that x′_{i′} + ··· + x′_{j′-1} + z′_{j′} = x_i + ··· + x_{j-1} + z_j. This finishes the proof. □

Definition 7. Let T(s) = (x_0, ..., x_k; y_0, ..., y_k) and let x_1 + ··· + x_k ≥ ℓ. Then h(s) ∈ N is defined by

h(s) = min{ g_{j-i}(x_i, ..., x_{j-1}, z_j) | 1 ≤ i ≤ j ≤ k, z_j ≤ x_j, x_i + ··· + x_{j-1} + z_j = ℓ }.

Well-definedness of h(s) follows immediately from the fact that the minimum is taken over a finite, non-empty set.

Lemma 10. Let s → s′ be an ordinary step where T(s) = (x_0, ..., x_k; y_0, ..., y_k) and T(s′) = (x′_0, ..., x′_{k′}; y′_0, ..., y′_{k′}). If x′_1 + ··· + x′_{k′} ≥ ℓ then x_1 + ··· + x_k ≥ ℓ and h(s) ≤ h(s′) + 1.
A. Geser
Proof. The condition x1 + · · · + xk ≥ ensures that h(s ) is defined. If s →m s for 1 ≤ m ≤ k − 1 then x1 + · · · + xk = x1 + · · · + xk ≥ by Proposition 3. Else s →m s for m = 0 and then (x1 − 1) + · · · + xk = x1 + · · · + xk ≥ . So x1 + · · · + xk ≥ whence h(s) is defined. By definition of h(s ), there is 1 ≤ i ≤ j ≤ k , zj ≤ xj such that both h(s ) = gj −i (xi , . . . , xj −1 , zj ) and xi +· · ·+xj −1 +zj = . Hence by Lemma 9, there is 1 ≤ i ≤ j ≤ k, zj ≤ xj such that h(s ) ≥ gj−i (xi , . . . , xj−1 , zj ) − 1 and xi + · · · + xj−1 + zj = . So h(s) ≤ gj−i (xi , . . . , xj−1 , zj ) ≤ h(s ) + 1. Lemma 11. Let T (s) = (x0 , . . . , xk ; y0 , . . . , yk ). If s →n u10(p−q)+q v for some strings u, v is an ordinary reduction then x1 + · · · + xk ≥ and h(s) ≤ n. Proof. By induction on n. The base case n = 0 is proven by h(s) ≤ g0 () = 0. For the inductive step let s → s →n−1 u10(p−q)+q v, let x1 + · · · + xk ≥ , and let h(s ) ≤ n − 1. Thus x1 + · · · + xk ≥ and h(s) ≤ n by Lemma 10. With Lemma 11 we have a criterion for ordinary reductions. For extraordinary reductions we pursue a similar line of reasoning. We start with a lemma akin to Lemma 9. If i = j then for convenience let gj−i (yi , xi+1 , . . . , xj−1 , zj ) = g0 (zj ) and let yi + xi+1 + · · · + xj−1 + zj = zj . Note that we require zj ≤ yj if i = j and zj ≤ xj else. Lemma 12. Let s → s be an ordinary step where T (s) = (x0 , . . . , xk ; y0 , . . . , yk ) and T (s ) = (x0 , . . . , xk ; y0 , . . . , yk ). Then for every 1 ≤ i < j ≤ k , zj ≤ xj and for every 1 ≤ i = j ≤ k , zj ≤ yj there exist 1 ≤ i < j ≤ k, zj ≤ xj or 1 ≤ i = j ≤ k, zj ≤ yj such that gj −i (yi , xi +1 , . . . , xj −1 , zj ) ≥ gj−i (yi , xi+1 , . . . , xj−1 , zj ) − 1, yi + xi +1 + · · · + xj −1 + zj = yi + xi+1 + · · · + xj−1 + zj . Proof. Let 0 ≤ m ≤ k − 1 be determined by the rewrite step s →m s . Case 1: 1 ≤ i = j ≤ k , zj ≤ yj . If j = m or zj = zm + 1 then g0 (zj ) = 0 = g0 (zj ). Else choose j = m + 1, zj = 1. 
Here we use ym+1 > 0 to establish zj ≤ yj . We get g0 (zj ) = gr (ym + 1, 0, . . . , 0) = g1 (ym , 1) − 1 = gj−i (yi , zj ) − 1 r
by Items 1 and 3 of Proposition 4. Case 2: 1 ≤ i < j ≤ k , zj ≤ xj . Case 2.1: m + 1 ≤ i < j ≤ m + r − 1. Then yi + xi +1 + · · · + xj −1 + zj = 0. Choose any 1 ≤ i ≤ k, and let j = i and zj = 0. Then gj −i (yi , xi +1 , . . . , xj −1 , zj ) = 0 = g0 (0) .
Case 2.2: i′ = m + r. Choose i = i′ − r + 1, j = j′ − r + 1, z_j = z′_{j′}. We get

g_{j′-i′}(y′_{i′}, x′_{i′+1}, ..., x′_{j′-1}, z′_{j′}) = g_{j-i}(y_i, x_{i+1}, ..., x_{j-1}, z_j),
y′_{i′} + x′_{i′+1} + ··· + x′_{j′-1} + z′_{j′} = y_i + x_{i+1} + ··· + x_{j-1} + z_j.

Case 2.3: 1 ≤ i′ ≤ m, or i′ = m + r and m + r − 1 ≤ j′ ≤ k′. We carry over the proof of Lemma 9, observing the facts i′ ≠ j′, i ≠ j, and y′_{i′} − x′_{i′} = y_i − x_i. Then we may conclude from the proof of Lemma 9 that

g_{j′-i′}(y′_{i′}, x′_{i′+1}, ..., x′_{j′-1}, z′_{j′}) = g_{j′-i′}(x′_{i′}, ..., x′_{j′-1}, z′_{j′}) ≥ g_{j-i}(x_i, ..., x_{j-1}, z_j) − 1 = g_{j-i}(y_i, x_{i+1}, ..., x_{j-1}, z_j) − 1

and

y′_{i′} + x′_{i′+1} + ··· + x′_{j′-1} + z′_{j′} = (y′_{i′} − x′_{i′}) + x′_{i′} + ··· + x′_{j′-1} + z′_{j′} = (y_i − x_i) + x_i + ··· + x_{j-1} + z_j = y_i + x_{i+1} + ··· + x_{j-1} + z_j.

This finishes the proof. □

Lemma 13. Let s → s′ be an ordinary step where T(s) = (x_0, ..., x_k; y_0, ..., y_k) and T(s′) = (x′_0, ..., x′_{k′}; y′_0, ..., y′_{k′}). If y′_{i′} + x′_{i′+1} + ··· + x′_{k′} ≥ ℓ for some 1 ≤ i′ ≤ k′ then y_i + x_{i+1} + ··· + x_k ≥ ℓ for some 1 ≤ i ≤ k.

Proof. The claim immediately follows from the following claim: for every 1 ≤ i′ ≤ k′ there is 1 ≤ i ≤ k such that

∆ = y_i + x_{i+1} + ··· + x_k − (y′_{i′} + x′_{i′+1} + ··· + x′_{k′}) ≥ 0.

The proof is done by case analysis on i′.

Case 1: 1 ≤ i′ ≤ m − 1. Choose i = i′. Then

∆ = x_m + x_{m+1} − (x′_m + 0 + ··· + 0 + x′_{m+r})
= x_m + x_{m+1} − (x_m + 1 + x_{m+1} − 1) = 0.

Case 2: i′ = m. Again choose i = i′. Then

∆ = y_m + x_{m+1} − (y′_m + 0 + ··· + 0 + x′_{m+r}) = y_m + x_{m+1} − (y_m + 1 + x_{m+1} − 1) = 0.

Case 3: m + 1 ≤ i′ ≤ m + r − 1. Choose i = m + 1. Then ∆ = x_{m+1} − x′_{m+r} = 1 ≥ 0.

Case 4: m + r ≤ i′ ≤ k′. Choose i = i′ − r + 1. Then obviously ∆ = 0.

This finishes the proof. □
Definition 8. Let T(s) = (x_0, ..., x_k; y_0, ..., y_k) and let y_i + x_{i+1} + ··· + x_k ≥ ℓ for some 1 ≤ i ≤ k. Then h′(s) ∈ N is defined by

h′(s) = min{ g_{j-i}(y_i, x_{i+1}, ..., x_{j-1}, z_j) | 1 ≤ i < j ≤ k, z_j ≤ x_j, y_i + x_{i+1} + ··· + x_{j-1} + z_j = ℓ }.

Because the minimum is taken over a finite, non-empty set, h′(s) is well-defined. Using Lemmas 12 and 13, one can prove the following in the same way as Lemma 10:

Lemma 14. Let s → s′ be an ordinary step where T(s) = (x_0, ..., x_k; y_0, ..., y_k) and T(s′) = (x′_0, ..., x′_{k′}; y′_0, ..., y′_{k′}). If y′_{i′} + x′_{i′+1} + ··· + x′_{k′} ≥ ℓ for some 1 ≤ i′ ≤ k′ then y_i + x_{i+1} + ··· + x_k ≥ ℓ for some 1 ≤ i ≤ k, and h′(s) ≤ h′(s′) + 1.

Thus we get a lemma like Lemma 11:

Lemma 15. Let T(s) = (x_0, ..., x_k; y_0, ..., y_k). If there is an extraordinary reduction s →^n t then y_i + x_{i+1} + ··· + x_k ≥ ℓ for some 1 ≤ i ≤ k, and moreover h′(s) ≤ n.

Proof. Let the extraordinary reduction s →^n t be shortest, i.e., s →^{n-1} s′ is an ordinary reduction and only the last step s′ → t is extraordinary. The inductive base n = 1 is proven by h′(s) ≤ g_1(ℓ − 1, 1) = 1. For the inductive step s → s′′ →^{n-2} s′ → t, let y′′_{i′′} + x′′_{i′′+1} + ··· + x′′_{k′′} ≥ ℓ for some 1 ≤ i′′ ≤ k′′, and let h′(s′′) ≤ n − 1. Thus y_i + x_{i+1} + ··· + x_k ≥ ℓ for some 1 ≤ i ≤ k and h′(s) ≤ n, by Lemma 14. □

Now let us prove that the length of the loop in Lemma 5 is minimal.

Lemma 16. R admits no loops of length less than 1 + Σ_{i=0}^{ℓ-1} r^i, for ℓ = ⌈p/q⌉.

Proof. Suppose that s →+ u s v is a loop of minimal length. By Theorem 2 we may assume that s →+ u s v is a forward closure. By Lemma 6, s = 10^{(p-q)k+q} for some k ≥ 1. By Lemma 8, k ≥ 1 + ℓ. Since the loop is a forward closure, all zeroes in s must be consumed during the reduction. By OVL(10^p, 10^p) = ∅ we may assume that the steps are rearranged such that s →^k (0^p 1^{r-1})^k 1 0^q = s′ →^n u s v.

Case 1: s′ →^n u s v is ordinary. Then we get h(s′) ≤ n by Lemma 11, since 10^{(p-q)ℓ+q} is a prefix of s. We compute h(s′) as follows:

h(s′) = g_{(r-1)(ℓ-1)}(1, 0, ..., 0, 1, 0, ..., 0, 1, ..., 0, ..., 0, 1)
      = (1/(r−1)) ((r−1)r^{ℓ-1} + (r−1)r^{ℓ-2} + ··· + (r−1)r^1 − (r−1)(ℓ−1))
      = Σ_{i=0}^{ℓ-1} r^i − ℓ,

where each of the ℓ − 1 runs of zeros in the argument list has length r − 2.
So the total length of the reduction is k + n ≥ k + h(s′) = k + Σ_{i=0}^{ℓ-1} r^i − ℓ ≥ 1 + Σ_{i=0}^{ℓ-1} r^i.

Case 2: s′ →^n u s v is extraordinary. Then we get h′(s′) ≤ n by Lemma 15. It turns out that h′(s′) = h(s′). So, no matter whether the reduction s′ →^n u s v is ordinary or not, we get k + n ≥ 1 + Σ_{i=0}^{ℓ-1} r^i. This finishes the proof. □

Acknowledgements. I am grateful to Robert McNaughton, David Musser, and Paliath Narendran for their encouragement to pursue this research, and to Hans Zantema and Hanne Gottliebsen for reading the manuscript.
References

1. Ronald Book and Friedrich Otto. String-Rewriting Systems. Texts and Monographs in Computer Science. Springer, New York, 1993.
2. Nachum Dershowitz. Termination of linear rewriting systems. In Proc. 8th Int. Coll. Automata, Languages and Programming, LNCS 115, pages 448–458. Springer, 1981.
3. Nachum Dershowitz. Termination of rewriting. J. Symb. Comput., 3(1&2):69–115, Feb./April 1987. Corrigendum: 4(3), Dec. 1987, 409–410.
4. Alfons Geser. Note on normalizing, non-terminating one-rule string rewriting systems. Theoret. Comput. Sci., 243:489–498, 2000.
5. Alfons Geser. Decidability of termination of grid string rewriting rules. SIAM J. Comput., 2002. In print.
6. Alfons Geser and Hans Zantema. Non-looping string rewriting. Theoret. Informatics Appl., 33(3):279–301, 1999.
7. Miki Hermann. Divergence des systèmes de réécriture et schématisation des ensembles infinis de termes. Habilitation, Université de Nancy, France, March 1994.
8. Gérard Huet and Dallas S. Lankford. On the uniform halting problem for term rewriting systems. Technical Report 283, INRIA, Rocquencourt, FR, March 1978.
9. Matthias Jantzen. Confluent String Rewriting, volume 14 of EATCS Monographs on Theoretical Computer Science. Springer, Berlin, 1988.
10. Yuji Kobayashi, Masashi Katsura, and Kayoko Shikishima-Tsuji. Termination and derivational complexity of confluent one-rule string rewriting systems. Theoret. Comput. Sci., 262(1/2):583–632, 2001.
11. Winfried Kurth. Termination und Konfluenz von Semi-Thue-Systemen mit nur einer Regel. Dissertation, Technische Universität Clausthal, Germany, 1990.
12. Dallas S. Lankford and D. R. Musser. A finite termination criterion. Technical report, Information Sciences Institute, Univ. of Southern California, Marina del Rey, CA, 1978.
13. Yuri Matiyasevitch and Géraud Sénizergues. Decision problems for semi-Thue systems with a few rules. In Proc. 11th IEEE Symp. Logic in Computer Science, pages 523–531, New Brunswick, NJ, July 1996. IEEE Computer Society Press.
14. Robert McNaughton. The uniform halting problem for one-rule semi-Thue systems. Technical Report 94-18, Dept. of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, August 1994. See also "Correction to 'The Uniform Halting Problem for One-rule Semi-Thue Systems'", unpublished paper, August 1996.
15. Robert McNaughton. Well-behaved derivations in one-rule semi-Thue systems. Technical Report 95-15, Dept. of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, November 1995. See also "Correction by the author to 'Well-behaved derivations in one-rule semi-Thue systems'", unpublished paper, July 1996.
16. Robert McNaughton. Semi-Thue systems with an inhibitor. J. Automated Reasoning, 26:409–431, 1997.
17. Paliath Narendran and Friedrich Otto. The problems of cyclic equality and conjugacy for finite complete rewriting systems. Theoret. Comput. Sci., 47:27–38, 1986.
18. Michael J. O'Donnell. Computing in Systems Described by Equations. LNCS 58. Springer, 1977.
19. Géraud Sénizergues. On the termination problem for one-rule semi-Thue systems. In Proc. 7th Int. Conf. Rewriting Techniques and Applications (RTA-96), LNCS 1103, pages 302–316. Springer, 1996.
20. Kayoko Shikishima-Tsuji, Masashi Katsura, and Yuji Kobayashi. On termination of confluent one-rule string rewriting systems. Inform. Process. Lett., 61(2):91–96, 1997.
21. Hans Zantema and Alfons Geser. A complete characterization of termination of 0^p 1^q → 1^r 0^s. Applicable Algebra in Engineering, Communication, and Computing, 11(1):1–25, 2000.
Recursive Derivational Length Bounds for Confluent Term Rewrite Systems

Research Paper

Elias Tahhan-Bittar

Dpto. de Matemáticas Puras y Aplicadas, Univ. Simón Bolívar, P.O. Box 89000, Caracas 1080-A, Venezuela. [email protected]
Abstract. Let F be a signature and R a term rewrite system on ground terms of F. We define the concepts of a context-free potential redex in a term and of bounded confluent terms. We bound recursively the lengths of derivations of a bounded confluent term t by a function of the lengths of derivations of the context-free potential redexes of this term. We define the concept of an inner redex, and we apply the recursive bounds that we obtained to prove that, whenever R is a confluent overlay term rewrite system, the derivational length bound for arbitrary terms is an iteration of the derivational length bound for inner redexes.
1  Introduction
Given a finite term rewrite system R, a natural question beyond termination is how to bound the length of derivations of a term t. This question has been addressed for term rewrite systems whose termination is proved by recursive path orderings; see for example [2], [5], and [11]. Another way to address the bounding problem is to bound the lengths of derivations of a term recursively. For instance, given a confluent terminating term rewrite system and terms u_0, ..., u_m with respective normal forms ↓(u_0), ..., ↓(u_m), how can one bound the lengths of the derivations of a term f(u_0, ..., u_m) by the lengths of the derivations of u_0, ..., u_m and of f(↓(u_0), ..., ↓(u_m))? In [3] we studied this approach for strict constructor orthogonal rewrite systems and applied it to study the lengths of derivations of primitive recursive terms. In the present paper we continue the work developed in [3]. We define the notion of context-free potential redexes and elaborate a new proof, using techniques of [7], to extend the scope of the recursive bounds given in [3] to confluent term rewrite systems. O'Donnell proved in [10] that innermost terminating orthogonal TRSs are terminating. Later Gramlich extended this result to confluent overlay TRSs in [8]. We explore in this work a related question: how to bound the length of derivations of terms in overlay confluent TRSs as a function of the lengths of derivations of inner redexes.

S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 281–295, 2002.
© Springer-Verlag Berlin Heidelberg 2002
2  Preliminaries
Here we review the definitions and notations required throughout the article; wherever possible, we adopt the definitions and notations proposed in [4] or [1].

Signature, term, ground term. A signature F is a set of function symbols. The set of those elements of F whose arity is k is denoted F_k. The elements of F_0 are also referred to as constants. Ω is a symbol which is not in F and which we call a hole. The set T(F, {Ω}) of terms is the least set (with respect to the inclusion order) satisfying the following rules: i) Ω ∈ T(F, {Ω}); ii) F_0 ⊆ T(F, {Ω}); and iii) if u_0, ..., u_k is a sequence of terms in T(F, {Ω}) and f ∈ F_{k+1} then f(u_0, ..., u_k) ∈ T(F, {Ω}). The subset T(F) of T(F, {Ω}) of ground terms is the least set satisfying rules ii) and iii). Note that T(F) is non-empty if and only if F_0 is non-empty. Throughout this article, the equality relation on terms is syntactic identity.

Position, parallel positions. N denotes the natural numbers and N* is the set of strings of natural numbers. The empty string is denoted by ε. If p, q ∈ N*, then p·q is the concatenation of p and q. If A ⊆ N* and i ∈ N, the set {i·a : a ∈ A} is denoted by i·A. Elements of N* are ordered by the prefix ordering ≤, that is, q ≤ p if and only if there exists w ∈ N* such that p = q·w; in this case we define p − q := w. Subsets of N* enable us to define the notion of the position of a node in a term. Given a term t, the subset Pos(t) of N* is defined by

Pos(t) = {ε} if t ∈ F_0 ∪ {Ω},
Pos(f(t_0, ..., t_{k-1})) = {ε} ∪ ⋃_{0≤i≤k-1} i·Pos(t_i).
A subset P ⊆ Pos(t) is a set of parallel positions if the elements of P are pairwise incomparable with respect to ≤. Given a position p and a set Q of parallel positions, we write p < Q, p ≤ Q, Q < p, and Q ≤ p if there is q ∈ Q such that, respectively, p < q, p ≤ q, q < p, and q ≤ p.

Subterm at a position, context of a set of positions. For p ∈ Pos(t), the subterm of t whose root is at position p in t, denoted by t|_p, is defined by:

t|_ε = t,
f(t_0, ..., t_{k-1})|_{i·p} = t_i|_p, for 0 ≤ i ≤ k − 1.

If f ∈ F_k and u_1, ..., u_k are terms in T(F, {Ω}), we call u_1, ..., u_k the immediate subterms of the term f(u_1, ..., u_k). Given a term t and a sequence P = p_1, ..., p_m of parallel positions of t, we denote by t|_P the sequence t|_{p_1}, ..., t|_{p_m}. For p ∈ Pos(t), the context of p in t, denoted by t|^p, is defined by:

t|^ε = Ω,
f(t_0, ..., t_{k-1})|^{i·p} = f(t_0, ..., t_{i-1}, t_i|^p, t_{i+1}, ..., t_{k-1}), for 0 ≤ i ≤ k − 1.
This notion of the context of a position can be extended to that of a set of parallel positions. Suppose that t ∈ T(F, {Ω}) and {p_1, ..., p_m} is a set of parallel positions of t; then t|^{p_1,...,p_m} is defined as follows:

t|^∅ := t   and   t|^{p_1,...,p_m} := (t|^{p_1})|^{p_2,...,p_m}.
Subterm replacement. Suppose that t, s ∈ T(F, {Ω}) and p ∈ Pos(t); then t[s]_p is defined by:

t[s]_ε = s,
f(t_0, ..., t_{k-1})[s]_{i·p} = f(t_0, ..., t_{i-1}, t_i[s]_p, t_{i+1}, ..., t_{k-1}), where 0 ≤ i ≤ k − 1.

t[s]_p is the result of replacing t|_p by s in t. Parallel replacement is defined as follows: if t, s_1, ..., s_m ∈ T(F, {Ω}) and p_1, ..., p_m are parallel positions in Pos(t), then

t[s_1, ..., s_m]_{p_1,...,p_m} = (t[s_1]_{p_1})[s_2, ..., s_m]_{p_2,...,p_m}.

Given a set of parallel positions P = {p_1, ..., p_m} in Pos(t) and a term s, the parallel replacement t[s, ..., s]_{p_1,...,p_m} is denoted by t[s]_P. If t, s_1, ..., s_m ∈ T(F, {Ω}) and if P_1, ..., P_m is a partition of a set of parallel positions in Pos(t), then we use the notation

t[s_1, ..., s_m]_{P_1,...,P_m} := (t[s_1]_{P_1})[s_2, ..., s_m]_{P_2,...,P_m}.

The above definitions give rise to the following identities:

t|^{p_1,...,p_m} = t[Ω]_{p_1,...,p_m}   and   t|^{p_1,...,p_m}[t|_{p_1}, ..., t|_{p_m}]_{p_1,...,p_m} = t.
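These definitions translate directly into code. In the sketch below (our own encoding, not the paper's: a term is a tuple whose head is the function symbol), we check the final identity, that plugging the subterms back into the context restores the term:

```python
HOLE = ('Omega',)

def subterm(t, p):
    """t|_p: follow the position string p into t."""
    return t if not p else subterm(t[p[0] + 1], p[1:])

def replace(t, p, s):
    """t[s]_p: replace the subterm of t at position p by s."""
    if not p:
        return s
    i = p[0]
    return t[:i + 1] + (replace(t[i + 1], p[1:], s),) + t[i + 2:]

t = ('f', ('g', ('a',), ('b',)), ('c',))
p = (0, 1)                                       # the position of ('b',) in t
assert subterm(t, p) == ('b',)
context = replace(t, p, HOLE)                    # t|^p, realized as t[Omega]_p
assert context == ('f', ('g', ('a',), ('Omega',)), ('c',))
assert replace(context, p, subterm(t, p)) == t   # t|^p [ t|_p ]_p = t
```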
Rewrite rule, rewrite system. Given a signature F and a set V of variable symbols with arity zero such that F ∩ V = ∅, we consider terms of T(F ∪ V). Given t ∈ T(F ∪ V), we denote by VPos(t) the set of positions p ∈ Pos(t) such that t|_p ∈ V. The set of variables occurring in t, denoted by var(t), is defined by var(t) := {t|_p : p ∈ VPos(t)}. A rewrite rule on T(F ∪ V), denoted by l → r, is a pair (l, r) of terms of T(F ∪ V) such that l ∉ V and var(r) ⊆ var(l). Given t ∈ T(F ∪ V) and x ∈ var(t), we denote by Pos(t, x) the set of positions p ∈ VPos(t) such that t|_p = x. Two rewrite rules l → r and l′ → r′ are equivalent if they are equal up to renaming of variables, i.e., l[Ω]_{VPos(l)} = l′[Ω]_{VPos(l′)}, r[Ω]_{VPos(r)} = r′[Ω]_{VPos(r′)}, and there is a one-to-one function φ from var(l) onto var(l′) such that for each x ∈ var(l), Pos(l, x) = Pos(l′, φ(x)) and Pos(r, x) = Pos(r′, φ(x)). Throughout this article we identify equivalent rewrite rules. A set of rewrite rules on T(F ∪ V) is called a term rewrite system (TRS). A rewrite rule l → r is left-linear if for each variable x ∈ var(l) the set Pos(l, x) is a singleton. A TRS R on T(F ∪ V) is left-linear if each rewrite rule in R is left-linear.
The rewrite relation on T (F ). Given a rewrite rule ρ = (l → r) on T (F ∪V ), and var(l) := {x1 , . . . , xm }. Let t1 , t2 ∈ T (F ) and p ∈ P os(t1 ), the relation t1 →ρ,p t2 holds if, and only if, there are terms sx1 , . . . , sxm ∈ T (F ) such that the following conditions hold : (i) t1 |p = t2 |p , (ii) t1 |p = l[sx1 , . . . , sxm ]P os(l,x1 ),...,P os(l,xm ) and (iii) t2 |p = r[sx1 , . . . , sxm ]P os(r,x1 ),...,P os(r,xm ) . Given a rewrite system R, a term t1 ∈ T (F ) rewrites, or reduces, to a term t2 ∈ T (F ), t1 →R t2 , if there is a rewrite rule ρ ∈ R and a position p ∈ P os(t1 ) such that t1 →ρ,p t2 . The transitive closure of the rewrite relation →R is denoted + ∗ by →+ R , the reflexive closure of →R is denoted by →R , the symmetric relation ∗ ∗ of →R is denoted by R←. When there is no ambiguity we omit the mention of R, for instance we write →∗ instead of →∗R . We notice that we consider only reductions of ground terms. Let R be a rewrite system. A rewrite sequence, or derivation, is given by a sequence t0 , . . . , tj of terms, a sequence of rules ρ1 , . . . , ρj and a sequence of positions p1 , . . . , pj such that t0 →ρ1 ,p1 t1 →ρ2 ,p2 · · · →ρj−1 ,pj−1 tj−1 →ρj ,pj tj . We abbreviate by (t0 , σ) or by t0 →σ tj , where σ is the sequence (ρ1 , p1 ), . . . , (ρj , pj ). Further, when we don’t give any detail about the signature and the rewrite system; we suppose given a signature F and a rewrite system, R, on T (F ). Termination, confluence. Let R be a rewrite system. An infinite rewrite sequence or infinite derivation starting from a term t0 , is an infinite sequence t0 , . . . , tj , . . . such that t0 → t1 → · · · → tj−1 → tj → tj+1 · · ·. A term t0 is strongly normalizing, abbreviated by (SN ), if there is no infinite R-derivation starting from t0 , A rewrite system R terminates or is strongly normalizing, also abbreviated by (SN ), if there is no infinite R-derivation. A rewrite system R is confluent if ∗← ◦ →∗ ⊂→∗ ◦ ∗←, (i.e. 
if u ∗← t →∗ v, then there is w such that u →∗ w ∗← v). A rewrite system R is locally confluent if ← ◦ → ⊂ →∗ ◦ ∗←. Let R be a rewrite system. A term t is a redex if it is reducible at the root, i.e. there is a rule ρ and a term t′ such that t →ρ,ε t′. A term t is irreducible if there is no u such that t → u. An irreducible term u is a normal form of a term t if t →∗ u. If R is a confluent rewrite system, any term which admits a normal form has a unique normal form; this normal form is denoted by ↓(t). Orthogonal, Overlay Rewrite Systems. A TRS R on T (F ) is overlay if for each redex t ∈ T (F ), for each rewrite rule l → r such that t →l→r,ε t′ and for each position p such that ε < p < V P os(l), the subterm t|p is not a redex. A TRS R on T (F ) is orthogonal if it is overlay, left-linear, and for each redex t ∈ T (F ) there is only one rule l → r such that t →l→r,ε t′. We notice that these
Recursive Derivational Length Bounds for Confluent Term Rewrite Systems
definitions are equivalent to the usual definitions given, for example, in [8], and we consider that they are more suitable for this article. A TRS R on T (F ) is said to be a constructor rewrite system if its signature F can be partitioned into two sets C and D called, respectively, the set of constructors and the set of defined function symbols, such that, for each rule l → r in R, the term l is of the form d(u0 , . . . , uk ) where d is a defined function symbol and u0 , . . . , uk belong to T (C ∪ V ). Remark that any constructor TRS is an overlay term rewrite system. A constructor TRS is strict if every irreducible ground term belongs to T (C).
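The rewrite relation, normal forms, and constructor systems defined above are easy to make concrete. The following sketch (the tuple encoding, the '?'-prefix for rule variables, and the sample addition TRS are illustrative assumptions, not the paper's notation) normalizes a ground term by repeatedly contracting the first redex found, trying the root before the arguments:

```python
# Minimal ground rewriting sketch. Terms are nested tuples ('f', t1, ..., tk);
# constants are 1-tuples; rule variables are strings starting with '?'.

def match(pattern, term, subst):
    """Try to extend subst so that pattern instantiated by subst equals term."""
    if isinstance(pattern, str) and pattern.startswith('?'):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if not isinstance(term, tuple) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p_arg, t_arg in zip(pattern[1:], term[1:]):
        subst = match(p_arg, t_arg, subst)
        if subst is None:
            return None
    return subst

def instantiate(rhs, subst):
    if isinstance(rhs, str) and rhs.startswith('?'):
        return subst[rhs]
    return (rhs[0],) + tuple(instantiate(a, subst) for a in rhs[1:])

def step(term, rules):
    """One rewrite step: try the root first, then arguments left to right."""
    for lhs, rhs in rules:
        s = match(lhs, term, {})
        if s is not None:
            return instantiate(rhs, s)
    for i, arg in enumerate(term[1:], start=1):
        reduct = step(arg, rules)
        if reduct is not None:
            return term[:i] + (reduct,) + term[i + 1:]
    return None

def normal_form(term, rules):
    """Iterated rewriting; terminates only on strongly normalizing terms."""
    while True:
        nxt = step(term, rules)
        if nxt is None:
            return term
        term = nxt

# A strict constructor TRS: addition over the constructors 0 and s.
R = [(('add', ('0',), '?y'), '?y'),
     (('add', ('s', '?x'), '?y'), ('s', ('add', '?x', '?y')))]
two = ('s', ('s', ('0',)))
print(normal_form(('add', two, two), R))  # the term s(s(s(s(0))))
```

Since the sample TRS is confluent and terminating, the normal form computed this way is the unique ↓(t) of the text, whatever reduction strategy `step` happens to follow.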
3 Potential Redexes, Context-Free Potential Redexes

3.1 Ancestors and Potential Redexes
We define an ancestor relation which generalizes the notion of residuals given in [7]:
Definition 1. Given a rewrite rule ρ = (l → r) on T (F ∪ V ), with var(l) := {x1 , . . . , xm }, given a reduction t1 →ρ,p t2 and positions q ∈ P os(t2 ) and p′ ∈ P os(t1 ); we say that p′ is an ancestor of q by the reduction t1 →ρ,p t2 , written p′ is a (t1 , (ρ, p))-ancestor of q, if
p′ = q if q ∈ P os(t2 |p̄ ) \ {p},
p′ = p if q ∈ p · (P os(r) \ V P os(r)),
p′ = p · pj · w if q = p · qj · w, where qj ∈ P os(r, xj ) and pj ∈ P os(l, xj ).
The ancestor relation with respect to a derivation is the reflexive transitive closure of the ancestor relation by a reduction defined above. When t →σ t′, p ∈ P os(t) and q ∈ P os(t′), we write that p is a (t, σ)-ancestor of q to indicate that p is an ancestor of q by the derivation (t, σ). The set of (t, σ)-ancestors of a position q is denoted by anc(t,σ)(q). We notice that if R is a left-linear TRS, then the (t, σ)-ancestor relation is actually a function, i.e. given t →σ t′ and q ∈ P os(t′), the set anc(t,σ)(q) has a unique element, which we also denote by anc(t,σ)(q). This is not the case for non-left-linear TRSs; moreover, anc(t,σ)(q) may have comparable elements, for instance:
Example 1. Given the TRS R := {c(a, x) → d(b, x), d(x, x) → e(x)}. Given the derivation t0 = c(a, b) →δ1 t1 = d(b, b) →δ2 t2 = e(b), then anc(t1,δ2)(0) = {0, 1} and anc(t0,δ1·δ2)(0) = {ε, 1}.
The ancestor relation satisfies the following comparability property:
Lemma 1. Given a derivation t →σ t′ and q, q′ ∈ P os(t′) with q ≤ q′; if p is a (t, σ)-ancestor of q, then p ≤ anc(t,σ)(q′), and if p′ is a (t, σ)-ancestor of q′, then anc(t,σ)(q) ≤ p′.
Proof. By induction on the length of the derivation σ. Since the ancestor relation is the reflexive transitive closure of the ancestor relation by a reduction, it is sufficient to prove the lemma for one reduction. Suppose that ρ = (l → r) and that t →ρ,p̄ t′. We proceed by cases:
– if p̄ ≮ anc(t,σ)(q), then there is a position p such that anc(t,σ)(q) = {p} and p ≤ p′ for each position p′ ∈ anc(t,σ)(q′);
– if p̄ < anc(t,σ)(q), then q = p̄ · qj · v and q′ = p̄ · qj · v′ for some qj ∈ V P os(r) and v ≤ v′. Therefore, p̄ · pj · v ∈ anc(t,σ)(q) if and only if p̄ · pj · v′ ∈ anc(t,σ)(q′).
A term t is a potential redex if there is a redex t′ such that t →∗ t′. Given a term t and p ∈ P os(t), we say that t|p is a potential redex of t (PR), or that p is a potential redex position of t (PRP), if t|p is a potential redex. The ancestor relation backward preserves potential redex positions; indeed:
Lemma 2. Given a derivation t →σ t′, if t′|q is a potential redex and p is a (t, σ)-ancestor of q, then t|p is a potential redex.
Proof. As in the proof of Lemma 1 it is sufficient to prove the lemma for one reduction. Suppose that ρ = (l → r) and that t →ρ,p′ t′. Given p a (t, (ρ, p′))-ancestor of q, we proceed by case:
– if p and p′ are parallel, then q = p and t|q = t′|q ;
– if p = p′, we have already that t|p is a redex;
– if p < p′, then p = q and t|q →ρ,(p′−q) t′|q ;
– if p′ < p, then t|p = t′|q .
Given a derivation t →σ t′ and positions p ∈ P os(t), q ∈ P os(t′). We say that the subterm t′|q σ-stems from the subterm t|p if there is a position p′ such that p′ is a (t, σ)-ancestor of q and p ≤ p′; we say also that the position q (t, σ)-stems from the position p. The stemming relation satisfies right-monotonicity:
Lemma 3. Given a derivation t →σ t′ and positions p ∈ P os(t), q, q′ ∈ P os(t′). If q (t, σ)-stems from p and q < q′, then q′ (t, σ)-stems from p.
Proof. Since q (t, σ)-stems from p there is p′ ∈ P os(t) such that p ≤ p′ and p′ is a (t, σ)-ancestor of q. By Lemma 1, and since q < q′, there is p′′ ∈ P os(t) such that p′ ≤ p′′ and p′′ is a (t, σ)-ancestor of q′.
Definition 2. Given a derivation t →σ t′, p ∈ P os(t) and q ∈ P os(t′). We say that t′|q is an outermost potential redex (OPR) of t′ (t, σ)-stemming from t|p if q is a minimal potential redex position (t, σ)-stemming from p.
Lemma 4. Given a derivation t →σ t′ and p ∈ P os(t); if t′|q is an OPR (t, σ)-stemming from t|p and r < q is such that t′|r is a potential redex, then there is s < p such that s is a (t, σ)-ancestor of r.
Proof. Since t′|q (t, σ)-stems from t|p there is a position p′ such that p ≤ p′ and such that p′ is a (t, σ)-ancestor of q. By Lemma 1 there is s such that s ≤ p′ and such that s is a (t, σ)-ancestor of r. By hypothesis t′|r is a potential redex and t′|q is an OPR (t, σ)-stemming from t|p ; therefore, p ≰ s and hence s < p.
Lemma 5. Given a left-linear TRS R. Given a derivation t →σ t′ →τ t′′ and p ∈ P os(t). If t′|q is an OPR (t, σ)-stemming from t|p and t′′|r is an OPR (t′, τ )-stemming from t′|q , then t′′|r is an OPR (t, σ · τ )-stemming from t|p .
Proof. Let s be a potential redex position of t′′ such that s < r. By the previous lemma and left-linearity, q′ := anc(t′,τ)(s) < q. By Lemma 2, q′ is a potential redex position and, by the previous lemma and left-linearity again, p′ := anc(t,σ)(q′) < p. Hence, by left-linearity, p′ = anc(t,σ·τ)(s).
The following example shows that left-linearity is a necessary hypothesis in the previous lemma.
Example 2. Given the (overlay) TRS R := {a(x) → b(s(x), x), d → e, d → s(e), b(x, x) → c(x), e → f, s(x) → x}. Given the derivation t →σ t′ →τ t′′ where
t = a(d) → b(s(d), d) → b(s(e), d) → t′ = b(s(e), s(e)) → t′′ = c(s(e)).
Then t′|00 is an OPR (t, σ)-stemming from t|0 and t′′|00 is an OPR (t′, τ )-stemming from t′|00 , but t′′|0 is the OPR (t, σ · τ )-stemming from t|0 .
Given a derivation t →σ t′ and a set of parallel positions P ⊆ P os(t). A position q ∈ P os(t′) (t, σ)-stems from P if there is some p ∈ P such that the position q (t, σ)-stems from p. We abbreviate the sentence outermost potential redex positions by OPRP. General TRSs satisfy a weaker version of Lemma 5:
Lemma 6. Given a TRS R. Given a derivation t →σ t′ →τ t′′ and P ⊆ P os(t) a set of parallel positions. Given Q ⊆ P os(t′) the set of OPRP (t, σ)-stemming from P and R the set of OPRP (t′, τ )-stemming from Q, then R is the set of OPRP (t, σ · τ )-stemming from P .
Proof. By Lemma 3, it is clear that each element of R (t, σ · τ )-stems from P . Given r ∈ P os(t′′) a potential redex position (t, σ · τ )-stemming from P , there are positions p ∈ P os(t) and q ∈ P os(t′) such that P ≤ p, p is a (t, σ)-ancestor of q, and q is a (t′, τ )-ancestor of r. By Lemma 2, the positions p and q are potential redex positions. Since Q is the set of OPRP (t, σ)-stemming from P , necessarily Q ≤ q, and since R is the set of OPRP (t′, τ )-stemming from Q, necessarily R ≤ r. Therefore, R is the set of OPRP (t, σ · τ )-stemming from P .

3.2 Context-Free Potential Redexes
A term t is independent with respect to a context C[Ω]p if for any rewrite rule ρ = (l → r) and position q ≤ p such that C[t]p →ρ,q t′, there is q1 ∈ V P os(l) such that q · q1 ≤ p; roughly speaking, each rewrite of C[t]p occurs either in the context C[Ω]p or in t. Given a term t and a position p ∈ P os(t) such that t|p is independent with respect to t[Ω]p , the subterm t|p is called an independent subterm of t.
Definition 3. A potential redex t is context-free with respect to a context C[Ω]p – or a C[Ω]p -free potential redex – if for any derivation C[t]p →σ t′, the outermost potential redexes of t′ σ-stemming from t are independent subterms of t′. A potential redex t is context-free if it is context-free with respect to any context. Given a term t and p ∈ P os(t), we say that the subterm t|p is context-free in t if it is context-free with respect to t[Ω]p .
Given a term t and a set of parallel positions P ⊆ P os(t). The set of subterms t|P is context-free in t if for any derivation t →σ t′ and for any outermost potential redex position q (t, σ)-stemming from P , the subterm t′|q is an independent subterm of t′. Outermost potential redexes which are subterms of a term t are context-free subterms of t. In fact, if P is the set of outermost potential redex positions in t, then the context t[Ω]P is “invariant”, i.e. if t →σ t′ →(ρ,p) t′′ is a derivation then P ≤ p.
Example 3. Given the overlay TRS defined in Example 2 and the term t = a(d). The subterm t|0 is a context-free redex in t. In fact we have:
Proposition 1. For any term rewrite system R the following statements are equivalent:
1. R is overlay.
2. Every potential redex is independent w.r.t. any context.
3. Every potential redex is context-free.
4. Every redex is independent w.r.t. any context.
Proof. 1 implies 2. Suppose that R is overlay. Given a term t and a redex u such that t is a non-independent subterm of u. Let l → r be a rule such that u →l→r,ε v and such that there is a position p satisfying ε < p < V P os(l) and t = u|p . If (t, (ρi , pi )i ) is a derivation then (u, (ρi , p · pi )i ) is a derivation and, since R is overlay, for each i, V P os(l) ≤ p · pi ; therefore pi ≠ ε, and so t is not a potential redex.
2 implies 3 and 3 implies 4. By the definitions.
4 implies 1. Suppose that every redex is independent. Given a redex t and a rule l → r such that t →l→r,ε u; given p ∈ P os(t) \ {ε} such that t|p is a redex, necessarily V P os(l) ≤ p; hence R is overlay.
Corollary 1. Given a strict constructor overlay TRS R, the context-free potential redexes are the terms d(u0 , . . . , un ) where d is a defined function symbol.
The following is a key proposition which allows us to give recursive bounds on the lengths of derivations:
Proposition 2. Given a term t and a set of parallel potential redex positions P ⊆ P os(t) such that the set of subterms t|P is context-free in t. Let t →σ t′ be a derivation. Let Q ⊆ P os(t′) be the set of OPRP (t, σ)-stemming from P ; then t′|Q is context-free in t′.
Proof. Given a derivation t →σ t′ →τ t′′. Let R ⊆ P os(t′′) be the set of OPRP (t′, τ )-stemming from Q. By Lemma 6, R is the set of OPRP (t, σ · τ )-stemming from P and hence, for each r ∈ R, the subterm t′′|r is independent in t′′. Therefore, the set of subterms t′|Q is context-free in t′.
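Proposition 1 ties the (semantic) context-freeness of potential redexes to the overlay property, which can be tested purely syntactically: a TRS is overlay exactly when no left-hand side unifies with a proper non-variable subterm of a (variable-renamed) left-hand side — the standard definition from [8] that, as noted in Section 2, is equivalent to the one used here. A sketch of that test follows; the tuple encoding and '?'-variables are assumptions of the sketch, not the paper's notation.

```python
# Syntactic overlay test via Robinson unification with occurs check.
# Terms: tuples ('f', args...); variables: strings starting with '?'.

def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s):
    """Most general unifier extension of s, or None if none exists."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return unify(b, a, s)
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

def rename(t, tag):
    """Rename variables apart before unifying two left-hand sides."""
    if is_var(t):
        return t + tag
    return (t[0],) + tuple(rename(a, tag) for a in t[1:])

def proper_nonvar_subterms(t):
    for arg in t[1:]:
        if not is_var(arg):
            yield arg
            yield from proper_nonvar_subterms(arg)

def is_overlay(rules):
    lhss = [l for l, _ in rules]
    for l1 in lhss:
        for sub in proper_nonvar_subterms(l1):
            for l2 in lhss:
                if unify(sub, rename(l2, "'"), {}) is not None:
                    return False   # a below-root overlap exists
    return True

# The TRS of Example 2 is overlay (all overlaps are at the root) ...
R2 = [(('a', '?x'), ('b', ('s', '?x'), '?x')),
      (('d',), ('e',)), (('d',), ('s', ('e',))),
      (('b', '?x', '?x'), ('c', '?x')),
      (('e',), ('f',)), (('s', '?x'), '?x')]
# ... while this one is not: g(a) overlaps f(g(x)) below the root.
R3 = [(('f', ('g', '?x')), '?x'), (('g', ('a',)), ('b',))]
print(is_overlay(R2), is_overlay(R3))  # True False
```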
4 Recursive Bounded Confluence Theorem
The derivational height of a term t, denoted by λ(t), is the supremum of the lengths of derivations of t. Given a finite set X, we denote by |X| the cardinality of X. The multiplicity of a rewrite rule ρ = (l → r), denoted by m(ρ), is defined by
m(ρ) = max{ |P os(r, x)| / |P os(l, x)| : x ∈ var(l) } .
The multiplicity of a term t is given by:
m(t) = sup{ m(ρ) ∈ Q : ∃u, u′ ∈ T (F ), ∃ρ ∈ R, ∃p ∈ P os(u), t →∗ u →ρ,p u′ } .
A term t is Bounded Confluent (BC) if λ(t) and m(t) are finite and if t has a unique normal form. Given a term t and a set of parallel positions P , if the terms of t|P are BC, we say that t|P is BC. Given P := p1 , . . . , pm a sequence of parallel positions of a term t. If t|P is BC we use the following notations: ↓(t|P ) is the sequence ↓(t|p1 ), . . . , ↓(t|pm ); m(t|P ) := max{m(t|p1 ), . . . , m(t|pm )}; λ(t|P ) := λ(t|p1 ) + · · · + λ(t|pm ); and ↓P (t) := t[↓(t|P )]P .
Remark 1. We notice that if u is a BC potential redex and u →∗ u′, if Q is the set of OPRP of u′ then u′|Q is BC, λ(u′|Q ) = λ(u′) and ↓(u) = ↓Q (u′).
Given a set of parallel positions P , a derivation of t, σ = (ρj , qj )0≤j≤k , is said to be a derivation of t|P if for each j, P ≤ qj . We notice that in this case, for each p ∈ P , the subsequence {(ρj , qj ) : 0 ≤ j ≤ k and p ≤ qj } is a derivation of t|p . The techniques used in the next lemma are of the same vein as those developed in [7] and [9].
Lemma 7. Given a term t and a sequence P of parallel potential redex positions such that t|P is BC and context-free in t and ↓P (t) is BC. Let t →(ρ,p) t′ be a reduction and Q be the set of OPRP (t, (ρ, p))-stemming from P . Then t′|Q and ↓Q (t′) are BC. Moreover, one of the following cases occurs:
– If P ≤ p then λ(t′|Q ) < λ(t|P ) and λ(↓Q (t′)) = λ(↓P (t)).
– If P ≰ p then λ(t′|Q ) ≤ m(ρ) · λ(t|P ) and λ(↓Q (t′)) < λ(↓P (t)).
Proof. Since t|P is independent in t, the reduction occurs either in t|P or in the context t[Ω]P . We study each case.
Reduction of a context-free potential redex: P ≤ p.
There is p̄ ∈ P such that p̄ ≤ p. Let Q′ be the set of OPRP of t′|p̄ . By Remark 1, and since t|p̄ is BC, we know that t′|p̄·Q′ is BC and ↓(t|p̄ ) = ↓Q′ (t′|p̄ ). Therefore, since Q = (P \ {p̄}) ∪ p̄ · Q′, we conclude that t′|Q is BC and ↓Q (t′) = ↓P (t) is BC. Hence, λ(↓Q (t′)) = λ(↓P (t)). Since for each derivation σ of t′|Q , (ρ, p) · σ is a derivation of t|P , we have λ(t′|Q ) < λ(t|P ).
Reduction of a context: P ≰ p. Since t|P is independent in t we have t[Ω]P →(ρ,p) t′[Ω]Q . Therefore, for each q ∈ Q, if p̄ is a (t, (ρ, p))-ancestor of q then p̄ ∈ P and t|p̄ = t′|q . Hence, t′|Q is BC and λ(t′|Q ) ≤ m(ρ) · λ(t|P ). Moreover, since ↓P (t) →(ρ,p) ↓Q (t′), the term ↓Q (t′) is BC and λ(↓Q (t′)) < λ(↓P (t)).
Given a set of derivations ∆, a derivation σ ∈ ∆ is called a worst derivation in ∆ if the length of σ is greater than or equal to the length of any derivation belonging to ∆. For instance, given a term t: (i) if t is BC then the length of a worst derivation of t is λ(t), and (ii) if P is a set of parallel positions such that t|P is BC, then the length of a worst derivation of t|P is λ(t|P ). If t|P and ↓P (t) are BC, the notation
t —[w.d. of t|P , length λ(t|P )]→ ↓P (t) —[w.d., length λ(↓P (t))]→ ↓(↓P (t))
means that first we perform a worst derivation of t|P , whose length is λ(t|P ), and then we perform a worst derivation of ↓P (t), whose length is λ(↓P (t)). Using this notation, we can rephrase partially the previous lemma as follows:
Lemma 8. Under the same hypotheses as Lemma 7, one of the following cases occurs:
Reduction of a context-free potential redex: P ≤ p. Then
t —[w.d. of t|P ]→ ↓P (t) —[w.d.]→ ↓(↓P (t))
t′ —[w.d. of t′|Q ]→ ↓Q (t′) —[w.d.]→ ↓(↓Q (t′))
where t →(ρ,p) t′, ↓P (t) = ↓Q (t′) and ↓(↓P (t)) = ↓(↓Q (t′)), with λ(t′|Q ) < λ(t|P ) and λ(↓Q (t′)) = λ(↓P (t)).
Reduction of a context: P ≰ p. Then
t —[w.d. of t|P ]→ ↓P (t) —[w.d.]→ ↓(↓P (t))
t′ —[w.d. of t′|Q ]→ ↓Q (t′) —[w.d.]→ ↓(↓Q (t′))
where t →(ρ,p) t′, ↓P (t) →(ρ,p) ↓Q (t′) and ↓(↓P (t)) = ↓(↓Q (t′)), with λ(t′|Q ) ≤ m(ρ) · λ(t|P ) and λ(↓Q (t′)) < λ(↓P (t)).
Proposition 3. Given a term t and a set of parallel potential redex positions P ⊆ P os(t) such that the set of subterms t|P is BC and context-free in t. Let t →σ t′ be a derivation. Let Q ⊆ P os(t′) be the set of OPRP (t, σ)-stemming from P . Then t′|Q is BC and context-free in t′, and ↓Q (t′) is BC. Also, the following statements are satisfied:
1. There is a subsequence σ1 of σ such that ↓P (t) →σ1 ↓Q (t′).
2. For each q ∈ Q there is a position p ∈ P and a derivation (t|p , σq ) such that t|p →σq u and t′|q is an outermost potential redex of u.
Proof. Let t be a term and P a set of positions satisfying the hypotheses of the proposition. Given a derivation t →σ t′ →(ρ,q̄) t′′. Let Q ⊆ P os(t′) be the set of OPRP (t, σ)-stemming from P . Let R ⊆ P os(t′′) be the set of OPRP (t, σ · (ρ, q̄))-stemming from P . Suppose that Q satisfies the conclusions of the proposition and let us prove that R satisfies them also. By Lemma 6, R is the set of OPRP (t′, (ρ, q̄))-stemming from Q. Hence, by Proposition 2 and Lemma 7 we know that t′′|R is BC and context-free in t′′, and ↓R (t′′) is BC. We prove the statements of the proposition by case.
If Q ≤ q̄. Then, as seen in the proof of Lemma 7, we have ↓Q (t′) = ↓R (t′′) and hence ↓P (t) →σ1 ↓Q (t′) = ↓R (t′′). So statement 1 is satisfied in this case. Let q ∈ Q be such that q ≤ q̄. For any r ∈ R such that r and q are incomparable we have t′|r = t′′|r ; hence statement 2 is satisfied for t′′|r . If r ∈ R and q ≤ r, then, by hypothesis, there is a position p ∈ P and a derivation (t|p , σq ) such that t|p →σq u and t′|q is an OPR of u. But, as seen in the proof of Lemma 7, we have t′|q →(ρ,q̄−q) t′′|q and t′′|r is an OPR of t′′|q . Therefore, if t′|q = u|s then t|p →σq u →(ρ,s·(q̄−q)) v and t′′|r is an OPR of v.
If Q ≰ q̄. By hypothesis there is a subsequence σ1 of σ such that ↓P (t) →σ1 ↓Q (t′). But, as seen in the proof of Lemma 7, we have ↓Q (t′) →(ρ,q̄) ↓R (t′′). So statement 1 is satisfied in this case. As seen in the proof of Lemma 7, for each r ∈ R there is q ∈ Q such that t′|q = t′′|r , and hence statement 2 is satisfied in this case.
Remark 2. Given a sequence of pairs of natural numbers (mi , ni )0≤i≤k . If there is a constant c such that for each i either mi+1 < mi and ni+1 = ni , or mi+1 ≤ c · mi and ni+1 < ni , then the inequality k ≤ n0 + c^n0 · m0 holds.
Theorem 1 (Recursive bounded confluence). Given a term t and a sequence P of parallel positions of t such that t|P is BC, context-free in t, and ↓P (t) is BC. Then t is BC, and the following inequalities hold:
λ(t) ≤ λ(↓P (t)) + m(↓P (t))^λ(↓P (t)) × λ(t|P )
and
m(t) ≤ max{m(t|P ), m(↓P (t))} .
Proof. Let t be a term and P a set of positions satisfying the hypotheses of the theorem. Given a derivation t →σ t′ →(ρ,q̄) t′′. Let Q ⊆ P os(t′) be the set of OPRP (t, σ)-stemming from P . Let R ⊆ P os(t′′) be the set of OPRP (t, σ · (ρ, q̄))-stemming from P . By Lemmas 6 and 7 and by Proposition 3 we conclude that:
– If q ≤ q̄ for some q ∈ Q, then λ(t′′|R ) < λ(t′|Q ) and λ(↓Q (t′)) = λ(↓R (t′′)). Moreover, there is a position p ∈ P and a derivation (t|p , σq ) such that t|p →σq u →(ρ,s·(q̄−q)) v; hence m(ρ) ≤ m(t|P ).
– If q ≰ q̄ for every q ∈ Q, then λ(t′′|R ) ≤ m(ρ) · λ(t′|Q ). Moreover, ↓Q (t′) →(ρ,q̄) ↓R (t′′); hence m(ρ) ≤ m(↓P (t)) and λ(↓R (t′′)) < λ(↓Q (t′)).
Using Remark 2 we conclude that the inequalities of the theorem hold.
Example 4. Given the TRS
R = {a → b, b → e, e → f, c(x, 0) → d(x, x), c(x, s(y)) → c(d(x, x), y)} .
We write n instead of s(s(. . . (s(0)))), where s is repeated n times. Given the term t := c(a, n), the subterm t|0 = a is a BC and context-free potential redex in t. We have λ(a) = 3, ↓(a) = f, λ(c(f, n)) = n + 1, m(c(f, n)) = 2 and λ(t) = (n + 1) + 2^(n+1) · 3. Hence the theorem gives the sharpest possible bound.
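Example 4 is small enough that the derivational heights can be checked mechanically. The following brute-force sketch (an illustration, not part of the paper's development; the tuple encoding is an assumption) enumerates all one-step reducts, memoizes the height λ(t) of a worst derivation, and confirms that the bound (n+1) + 2^(n+1)·3 is attained for small n:

```python
from functools import lru_cache

# Brute-force derivational height for Example 4's TRS:
#   a -> b, b -> e, e -> f, c(x,0) -> d(x,x), c(x,s(y)) -> c(d(x,x),y).
# Terms are nested tuples; plain strings '?x', '?y' are rule variables.
RULES = [(('a',), ('b',)), (('b',), ('e',)), (('e',), ('f',)),
         (('c', '?x', ('0',)), ('d', '?x', '?x')),
         (('c', '?x', ('s', '?y')), ('c', ('d', '?x', '?x'), '?y'))]

def match(p, t, s):
    if isinstance(p, str):                       # a rule variable
        if p in s:
            return s if s[p] == t else None
        return {**s, p: t}
    if not isinstance(t, tuple) or p[0] != t[0] or len(p) != len(t):
        return None
    for pa, ta in zip(p[1:], t[1:]):
        s = match(pa, ta, s)
        if s is None:
            return None
    return s

def inst(r, s):
    if isinstance(r, str):
        return s[r]
    return (r[0],) + tuple(inst(a, s) for a in r[1:])

def reducts(t):
    """All one-step reducts of t (every rule, every position)."""
    for l, r in RULES:
        s = match(l, t, {})
        if s is not None:
            yield inst(r, s)
    for i, arg in enumerate(t[1:], start=1):
        for new in reducts(arg):
            yield t[:i] + (new,) + t[i + 1:]

@lru_cache(maxsize=None)
def lam(t):
    """Derivational height: length of a worst (longest) derivation."""
    return max((1 + lam(u) for u in reducts(t)), default=0)

def num(n):  # the term s(s(...s(0)...)) with n successors
    t = ('0',)
    for _ in range(n):
        t = ('s', t)
    return t

for n in range(3):
    print(n, lam(('c', ('a',), num(n))), (n + 1) + 2 ** (n + 1) * 3)
```

For n = 0, 1, 2 both columns agree (7, 14, 27): the worst derivation first pumps the rules that duplicate x before reducing any copy of a, exactly as the theorem's bound predicts.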
5 Complexity of Bounded Confluent Overlay TRS
The recursive BC theorem allows us to conclude:
Corollary 2. Given an overlay TRS. Let t = f(u0 , . . . , uk ) be a term such that u0 , . . . , uk are BC and ↓i (t) := f(↓(u0 ), . . . , ↓(uk )) is BC. Then t is BC and the following inequalities hold:
λ(t) ≤ λ(↓i (t)) + m(↓i (t))^λ(↓i (t)) × (λ(u0 ) + · · · + λ(uk ))
and
m(t) ≤ max{m(↓i (t)), m(u0 ), . . . , m(uk )} .
Proof. By Proposition 1 every potential redex is context-free. Given
P := {j · q : 0 ≤ j ≤ k and q is a position of an OPR of uj } ,
the equality ↓i (t) = ↓P (t) holds and we can apply Theorem 1.
This corollary has been proved by other means and applied in [3] to obtain bounds for the lengths of derivations of primitive recursive terms by primitive recursive rewriting schemes. We study further in this section bounds for the derivational length of bounded confluent overlay TRSs. A term is an inner redex if every proper subterm is irreducible. An innermost derivation is a derivation where only inner redex subterms are reduced at each step. A TRS is innermost terminating if every innermost derivation has a finite length. A TRS is finitely branching if for every redex t there is a finite number of rules l → r such that t →(l→r,ε) t′. Given a finitely branching innermost terminating TRS, we denote by λi (t) the length of a longest innermost derivation of t. We know by a result of [8], which extends a result of [10], that any locally confluent overlay innermost-terminating TRS is terminating. The previous corollary allows us to prove a weaker version of Gramlich's theorem:
Proposition 4. Every locally confluent overlay finitely branching innermost terminating TRS is bounded confluent.
Proof. Let R be a finitely branching, locally confluent, overlay, innermost terminating TRS. The size of a term t, denoted by |t|, is the cardinality of P os(t). We prove by induction on (λi (t), |t|), lexicographically ordered, that each term is BC. Suppose that for every term u such that (λi (u), |u|) < (λi (t), |t|), the term u is BC. If t is an inner redex then for every rule l → r, if t →(l→r,ε) t′ we have λi (t′) < λi (t), so t′ is BC; therefore, since R is finitely branching, λ(t) and m(t) are finite, and since R is locally confluent, t has a unique normal form (by Newman's lemma). If t is not an inner redex, then λi (↓i (t)) < λi (t) and, for each subterm ul , we have λi (ul ) ≤ λi (t); therefore, ↓i (t) is BC and each subterm uk is BC, so by Corollary 2 the term t is BC.
A uniformly bounded confluent (UBC) TRS is a bounded confluent TRS such that there are functions µ : N → N and ϕ : N → N such that for every term t we have m(t) ≤ µ(|t|) and λ(t) ≤ ϕ(|t|). The functions µ and ϕ are called respectively the multiplicity bound function and the derivational-length bound function. The inner redexes derivational-length bound function is a function ϕi : N → N such that for every inner redex t we have λ(t) ≤ ϕi (|t|). A function φ : N → N is a size function if for each reduction t →(ρ,p) t′ we have |t′| ≤ φ(|t|). We notice that if R is UBC then φ(n) = n · µ(n) is a size function. Given a function f : N → N, we denote: (i) f^(0) = id, where id is the identity function on N; (ii) f^(n+1) = f ◦ f^(n) , where ◦ is the composition of functions; and (iii) the iteration operator It is defined by It(f)(x) = f^(x) (x). We stress that f^n denotes the n-th power (exponential) and not the iterate f^(n) . A function f : N → N is expansive if f(n) + f(m) ≤ f(n + m) for each n, m ∈ N. We notice that if f and g are expansive, then f + g, f · g, f ◦ g and It(f) are expansive; moreover, for each n ∈ N, n ≤ f(n). The symbol S will denote the successor function.
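The iteration operator and the bound function of the next proposition can be experimented with directly. In the sketch below the multiplicity bound µ, the inner-redex bound ϕi and the size function φ are illustrative assumptions (they do not come from a concrete TRS); even so, the construction already exhibits the non-elementary growth of ϕ:

```python
# The operator It(f)(x) = f^(x)(x) and the bound function
#   phi = It(Gamma o S o It(phi_size))
# of the next proposition, with assumed bound functions.

def iterate(f, n, x):
    """The n-fold iterate f^(n)(x) -- not the power f(x)**n."""
    for _ in range(n):
        x = f(x)
    return x

def It(f):
    return lambda x: iterate(f, x, x)

def compose(*fs):
    def g(x):
        for f in reversed(fs):
            x = f(x)
        return x
    return g

S = lambda x: x + 1              # successor
print(It(S)(5))                  # S^(5)(5) = 10

# Assumed bounds for some hypothetical UBC overlay TRS:
mu = lambda n: 2                 # multiplicity bound m(t) <= 2
phi_i = lambda n: n              # inner-redex derivational-length bound
size = lambda n: n * mu(n)       # size function: |t'| <= |t| * mu(|t|)

Gamma = lambda x: phi_i(x) + x * mu(x) ** phi_i(x)
phi = It(compose(Gamma, S, It(size)))

print(phi(0), phi(1))            # 0 27
# phi grows non-elementarily: even phi(2) is astronomically large here.
```

The `It(S)(x) = 2x` sanity check illustrates how a single application of `It` already jumps one growth class; nesting it twice, as `phi` does, is what makes the derivational-length bound of the proposition so fast-growing.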
Proposition 5. Let R be a uniformly bounded confluent overlay TRS. Let µ be an expansive multiplicity bound function, let ϕi be an expansive inner redexes derivational-length bound function, and let Γ : N → N be the function defined by
Γ(x) := ϕi (x) + x · (µ(x))^ϕi (x) .
Let φ be an expansive size function. Then the function
ϕ := It(Γ ◦ S ◦ It(φ))
is an expansive derivational-length bound function.
Proof. The function ϕ is expansive, since it is built from expansive functions using operations which preserve expansivity. We prove by lexicographic induction on (λ(t), |t|) that for each term t the inequality λ(t) ≤ ϕ(|t|) holds.
– If t is an irreducible term then λ(t) = 0 ≤ ϕ(|t|).
– If t is an inner redex then λ(t) ≤ ϕi (|t|) ≤ ϕ(|t|).
– We use the notation of Corollary 2. If t = f(u0 , . . . , uk ) is a reducible term but is not an inner redex, then λ(↓i (t)) < λ(t) and, for each subterm ul , we have λ(ul ) ≤ λ(t); therefore we can apply the inductive hypothesis. By Corollary 2, and since ↓i (t) is an inner redex, we have:
λ(t) ≤ λ(↓i (t)) + (m(↓i (t)))^λ(↓i (t)) · (λ(u0 ) + · · · + λ(uk ))
     ≤ ϕi (|↓i (t)|) + µ(|↓i (t)|)^ϕi (|↓i (t)|) · (ϕ(|u0 |) + · · · + ϕ(|uk |)) .
For each immediate subterm uj of t, we have by the induction hypothesis that
|↓(uj )| ≤ φ^(λ(uj )) (|uj |) ≤ It(φ)(ϕ(|uj |)) .
Let n := |u0 | + · · · + |uk |, so that |t| = 1 + n. Applying the expansivity properties,
|↓i (t)| ≤ 1 + (It(φ) ◦ ϕ)(|u0 |) + · · · + (It(φ) ◦ ϕ)(|uk |) ≤ 1 + (It(φ) ◦ ϕ)(n) = (S ◦ It(φ) ◦ ϕ)(n) .
On the other hand:
λ(u0 ) + · · · + λ(uk ) ≤ ϕ(|u0 |) + · · · + ϕ(|uk |) ≤ ϕ(n) ≤ (S ◦ It(φ) ◦ ϕ)(n) .
Thus:
λ(t) ≤ (Γ ◦ S ◦ It(φ) ◦ ϕ)(n)
     = (Γ ◦ S ◦ It(φ))((Γ ◦ S ◦ It(φ))^(n) (n))   (by definition of ϕ)
     = (Γ ◦ S ◦ It(φ))^(n+1) (n)
     ≤ (Γ ◦ S ◦ It(φ))^(n+1) (n + 1)
     = ϕ(n + 1)
     = ϕ(|t|) .
6 Conclusion
We studied how to bound recursively the lengths of derivations of bounded confluent terms for any TRS. We saw how this result implies a known theorem relating innermost termination and strong termination for finitely branching, locally confluent, overlay TRSs. We think that the analysis given in this paper could bring some insight into the study of strategy-defined derivations, for instance the perpetual derivations developed by Nederpelt, Khasidashvili, and others more recently.
References
1. F. Baader, T. Nipkow. Term Rewriting and All That. Cambridge University Press (1998).
2. G. Bonfante, A. Cichon, J-Y. Marion, H. Touzet. Complexity classes and rewrite systems with polynomial interpretation. Lecture Notes in Computer Science 1584 (1999) 372–384.
3. E.A. Cichon, E. Tahhan. Strictly Orthogonal Left-Linear Rewrite Systems and Primitive Recursion. Annals of Pure and Applied Logic 108 (2001) 79–102.
4. N. Dershowitz, J.P. Jouannaud. Rewrite Systems. Handbook of Theoretical Computer Science, Volume B, Elsevier, Amsterdam (1990) 243–320.
5. D. Hofbauer. Termination proofs by multiset path orderings imply primitive recursive derivation lengths. Theoretical Computer Science 105(1) (1992) 129–140.
6. G. Huet. Confluent reductions: Abstract properties and applications to term rewriting systems. J. Assoc. Comput. Mach. 27(4) (1980) 797–821.
7. G. Huet, J.-J. Lévy. Computations in orthogonal rewriting systems, I and II. Computational Logic: Essays in Honor of Alan Robinson, Ed. J.-L. Lassez and G. Plotkin, MIT Press (1991) 395–443.
8. B. Gramlich. Abstract Relations between Restricted Termination and Confluence Properties of Rewrite Systems. Fundamenta Informaticae 24 (1995) 3–23.
9. Z. Khasidashvili. Perpetuality and Strong Normalization in Orthogonal Term Rewriting Systems. Lecture Notes in Computer Science 775 (1994) 163–174.
10. M.J. O'Donnell. Computing in Systems Described by Equations. Lecture Notes in Computer Science 58 (1977).
11. A. Weiermann. Termination proofs for term rewriting systems with lexicographic path orderings imply multiply recursive derivation lengths. Theoretical Computer Science 139 (1995) 355–362.
Termination of (Canonical) Context-Sensitive Rewriting

Salvador Lucas
DSIC, Universidad Politécnica de Valencia
Camino de Vera s/n, E-46022 Valencia, Spain
[email protected]
Abstract. Context-sensitive rewriting (CSR) is a restriction of rewriting which forbids reductions on selected arguments of functions. A replacement map discriminates, for each symbol of the signature, the argument positions on which replacements are allowed. If the replacement restrictions are less restrictive than those expressed by the so-called canonical replacement map, then CSR can be used for computing (infinite) normal forms of terms. Termination of such canonical CSR is desirable when using CSR for these purposes. Existing transformations for proving termination of CSR fulfill a number of new properties when used for proving termination of canonical CSR. Keywords: (infinitary) normalization, term rewriting, termination.
1 Introduction
A replacement map is a mapping µ : F → P(N) satisfying µ(f) ⊆ {1, . . . , k}, for each k-ary symbol f of the signature F [Luc98]. We use them to discriminate the argument positions on which replacements are allowed. In this way, we obtain a restriction of rewriting which we call context-sensitive rewriting (CSR [Luc98]). Terminating TRSs are µ-terminating (i.e., no term initiates an infinite sequence of CSR under µ). However, CSR can achieve termination by pruning (all) infinite rewrite sequences. Several methods have been developed to formally prove µ-termination of TRSs [BLR02,FR99,GM99,GM02,Luc96,SX98,Zan97].
Example 1. Consider the following TRS R:
sel(0,x:y) → x
sel(s(x),y:z) → sel(x,z)
from(x) → x:from(s(x))
first(0,x) → []
first(s(x),y:z) → y:first(x,z)
Let µ(s) = µ(:) = µ(from) = {1} and µ(sel) = µ(first) = {1, 2}. The µ-termination of R can be proved by using Zantema's techniques [Zan97].
Up to now, termination of CSR has been studied disregarding other computational issues such as achieving some kind of completeness in computations. This
Work partially supported by CICYT TIC2001-2705-C03-01, Acciones Integradas HI 2000-0161, HA 2001-0059, HU 2001-0019, and Generalitat Valenciana GV01-424.
S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 296–310, 2002. c Springer-Verlag Berlin Heidelberg 2002
problem has been studied in [Luc98]. Since computing (infinite) normal forms is the essential task of most rewriting-based engines (interpreters of rewriting-based (lazy) programming languages, theorem provers, etc.), the application of CSR in practice requires addressing these issues. In Section 3, we define canonical CSR as a specialization of CSR that only uses replacement maps which are less restrictive than the canonical replacement map µcan_R of the TRS R. This is the least replacement map ensuring that all non-variable subterms of every left-hand side of the rules of R are replacing [Luc98]. Section 4 discusses the use of canonical CSR for computing normal forms and infinite normal forms of terms (with left-linear, possibly non-terminating TRSs). Termination of canonical CSR becomes essential for these tasks. Thus, for the purpose of computing (infinite) normal forms, proving termination of canonical CSR can be prioritized over proofs of termination: a proof of termination of canonical CSR can be used to achieve normalizations (with possibly non-terminating, left-linear TRSs) and also to arbitrarily approximate infinite normal forms. In the remainder of the paper we try to analyze whether proving termination of canonical CSR for a terminating TRS is much more difficult than proving its termination. In Section 5, we review the main (transformational) techniques for proving termination of CSR and prove a couple of new interesting properties. Most of them appear when these techniques are applied to proving termination of canonical CSR. We prove that all transformations introduced by Giesl and Middeldorp [GM99,GM02] are complete for proving termination of canonical CSR. We also prove that the contractive transformation of [Luc96] is complete for proving termination of canonical CSR if the replacing variables of the right-hand sides of each rule of the TRS are also replacing in the left-hand side (conservativeness [Luc96]).
We show that, in both cases, Zantema's transformation [Zan97] and Ferreira and Ribeiro's transformation [FR99] remain incomplete. In Section 6, we also analyze the (relative) behavior of the transformations when simple termination, rather than termination, is considered. Simple termination covers the use of the most usual automatizable orderings for proving termination of rewriting [Der87]. We obtain a new hierarchy of the transformations which is helpful for guiding their practical use. We prove that the transformations for proving termination of CSR preserve (simple) termination. Thus, proving termination of canonical CSR does not seem to be essentially more difficult than proving termination. Section 7 provides first experimental evidence supporting this claim.
2 Preliminaries
Throughout the paper, X denotes a countable set of variables and F denotes a signature, i.e., a set of function symbols {f, g, . . .}, each having a fixed arity given by a mapping ar : F → N. A TRS is a pair R = (F, R) where R is a set of rewrite rules. Given TRSs R = (F, R) and R′ = (F′, R′), we let R ∪ R′ be the TRS (F ∪ F′, R ∪ R′). L(R) denotes the set of lhs's of R. Given a signature F, we consider the left-linear TRS Emb(F) = (F, {f(x1, . . . , xk) → xi | f ∈ F, i ∈ {1, . . . , ar(f)}}). A TRS R is simply terminating if R ∪ Emb(F) is terminating.
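To make the definition concrete, the following small Python sketch (our own illustration, not from the paper) generates the rules of Emb(F) from an arity assignment. We encode variables as strings and applications as (symbol, args) pairs; the variable names x1, . . . , xk are generated.

```python
def emb(arities):
    """Emb(F): all projection rules f(x1,...,xk) -> xi for f in F, 1 <= i <= ar(f)."""
    return [((f, tuple(f'x{j}' for j in range(1, n + 1))), f'x{i}')
            for f, n in arities.items()
            for i in range(1, n + 1)]
```

Constants (arity 0) contribute no rules, so Emb({f/2, a/0}) has exactly the two projection rules for f.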
S. Lucas

3 (Canonical) Context-Sensitive Rewriting
A mapping µ : F → P(N) is a replacement map (or F-map) if ∀f ∈ F, µ(f) ⊆ {1, . . . , ar(f)} [Luc98]. The ordering ⊑ on MF, the set of all F-maps, is: µ ⊑ µ′ if for all f ∈ F, µ(f) ⊆ µ′(f). Thus, µ ⊑ µ′ means that µ considers fewer positions than µ′ (for reduction), i.e., µ is more restrictive than µ′. According to ⊑, µ⊥ (resp. µ⊤) given by µ⊥(f) = ∅ (resp. µ⊤(f) = {1, . . . , ar(f)}) for all f ∈ F, is the minimum (resp. maximum) of MF. Given a TRS R = (F, R), we write MR rather than MF. The set of µ-replacing positions Posµ(t) of t ∈ T(F, X) is: Posµ(t) = {Λ}, if t ∈ X, and Posµ(t) = {Λ} ∪ ⋃_{i∈µ(root(t))} i.Posµ(t|i), if t ∉ X. The set of replacing variables Varµ(t) of t is Varµ(t) = {x ∈ Var(t) | Posx(t) ∩ Posµ(t) ≠ ∅}. In context-sensitive rewriting (CSR [Luc98]), we (only) rewrite replacing redexes: t µ-rewrites to s, written t →µ s, if t →R s at position p and p ∈ Posµ(t). The →µ-normal forms are called µ-normal forms. Note that, except for the trivial case µ = µ⊤, the µ-normal forms strictly include all normal forms of R. A TRS R is µ-terminating if →µ is terminating. The canonical replacement map µcan_R is the most restrictive replacement map ensuring that the non-variable subterms of the left-hand sides of the rules of R are replacing. Note that µcan_R is easily obtained from R: ∀f ∈ F, i ∈ {1, . . . , ar(f)}, i ∈ µcan_R(f)
iff
∃l ∈ L(R), p ∈ PosF (l), (root(l|p ) = f ∧ p.i ∈ PosF (l))
where PosF(l) is the set of positions of non-variable subterms of l. Let CMR = {µ ∈ MR | µcan_R ⊑ µ} be the set of replacement maps which are less (or equally) restrictive than µcan_R. We say that →µ is a canonical context-sensitive rewrite relation if µ ∈ CMR.
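Both µcan_R and Posµ are directly computable. The Python sketch below (our own encoding, not from the paper: variables are strings, applications are (symbol, args) pairs, positions are tuples of argument indices with Λ = the empty tuple) derives the canonical replacement map from the left-hand sides and enumerates the µ-replacing positions of a term.

```python
def canonical_map(lhss, arities):
    """mu_can: the least (most restrictive) replacement map making every
    non-variable subterm of a left-hand side replacing."""
    mu = {f: set() for f in arities}
    def visit(t):
        if isinstance(t, str):          # variable: nothing to record
            return
        f, args = t
        for i, s in enumerate(args, start=1):
            if not isinstance(s, str):  # p.i is a non-variable position of l
                mu[f].add(i)
            visit(s)
    for l in lhss:
        visit(l)
    return mu

def replacing_positions(t, mu):
    """Pos^mu(t): the mu-replacing positions of t (Lambda is the empty tuple)."""
    if isinstance(t, str):
        return {()}
    f, args = t
    pos = {()}
    for i in mu.get(f, set()):          # descend only into replacing arguments
        pos |= {(i,) + p for p in replacing_positions(args[i - 1], mu)}
    return pos
```

For instance, with the lhs's first(0,x) and first(s(x),y:z), the canonical map gives first both arguments while s and : get none, so only the root and the two immediate arguments of first(s(x),y:z) are replacing.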
4 Termination and Normalization Properties
The most important semantic property of canonical CSR is that, for left-linear TRSs, µ-normal forms are head-normal forms¹ [Luc98]. We define a normalization procedure normµ which is based on performing a layered µ-normalization of terms (see Figure 1). In Figure 1, the maximal replacing context MRCµ(t) of t is the maximal prefix of t whose positions are µ-replacing in t. Also, µ-norm is a mapping from sets of terms to sets of µ-normal forms of them. The normalization of a term proceeds by first µ-normalizing it and then recursively normalizing (in this way) the maximal non-replacing subterms of the obtained µ-normal forms. Theorem 1 (Correctness). Let R = (F, R) be a left-linear TRS, µ ∈ CMR, and t ∈ T(F, X). If s ∈ normµ({t}), then s is a normal form of t. This procedure is not ensured to be correct or terminating if µ ∉ CMR. Example 2. Consider R and µ as in Example 1. Let t = first(5,from(0)) (numbers n are a shorthand for s^n(0)) and µ′(f) = µ(f) for every symbol f
¹ A head-normal form is a term which does not rewrite to a redex.
Termination of (Canonical) Context-Sensitive Rewriting
Procedure normµ(T)
  T := µ-norm(T)
  for each t ∈ T
    let t = C[t1, . . . , tn], where C[ ] = MRCµ(t)
    for i := 1, . . . , n do Si := normµ({ti})
    Tt := C[S1, . . . , Sn]
  return ⋃_{t∈T} Tt
end procedure normµ

Given a context C[ ] and sets of terms T1, . . . , Tn, the notation C[T1, . . . , Tn] denotes the set {C[t1, . . . , tn] | t1 ∈ T1, . . . , tn ∈ Tn}.

Fig. 1. Algorithm for normalization via µ-normalization
except µ′(first) = {1}. Note that µ′ ∉ CMR, but R is still µ′-terminating. We have

normµ′({t}) = first(5,normµ′({from(0)}))
            = first(5,0:normµ′({from(1)}))
            = first(5,0:1:normµ′({from(2)}))
            ...

Thus, normµ′({t}) does not terminate and there is no output. Even if normµ′ terminates, the outcome can be incorrect. Let s = first(1,first(1,[0,1,2])) ([0,1,2] is a shorthand for 0:1:2:[]). Now:

normµ′({s}) = first(1,normµ′({first(1,[0,1,2])}))
            = first(1,0:normµ′({first(0,[1,2])}))
            = {first(1,[0])}

The outcome is not a normal form: first(1,[0]) →∗ [0]. In the following theorem we say that µ-norm is complete for µ-normalization if for all T ⊆ T(F, X) and t ∈ T, the µ-normal forms of t are in µ-norm(T). Theorem 2 (Completeness). Let R = (F, R) be a left-linear TRS, µ ∈ CMR, and t ∈ T(F, X). Assume that µ-norm is complete for µ-normalization. If s is a normal form of t, then s ∈ normµ({t}). Correctness and completeness of normµ can easily be achieved if canonical µ-termination of the TRS R can be ensured: in this case, we let µ-norm(T) be the set of all µ-normal forms of each t ∈ T. Example 3. Consider R and µ as in Example 1. Now we can successfully apply the 'normalization via µ-normalization' procedure to t in Example 2:

normµ({t}) = 0:normµ({first(4,from(1))})
           = 0:1:normµ({first(3,from(2))})
           ...
= 0:1:2:3:4:normµ ({first(0,from(5))}) = {[0,1,2,3,4]}
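Context-sensitive rewriting itself is easy to prototype. The sketch below is a minimal illustration under our own assumptions (a naive leftmost-outermost strategy, variables as strings, applications as (symbol, args) tuples; it does not claim to reproduce the paper's machinery, and it assumes →µ terminates on its input). Reduction is restricted to µ-replacing positions, so µ-normalizing first(1,from(0)) terminates even though unrestricted rewriting of from(0) diverges.

```python
def match(pat, t, sub):
    """Syntactic matching of pattern pat against term t, extending sub."""
    if isinstance(pat, str):                      # variable pattern
        if pat in sub:
            return sub[pat] == t
        sub[pat] = t
        return True
    if isinstance(t, str) or pat[0] != t[0] or len(pat[1]) != len(t[1]):
        return False
    return all(match(p, s, sub) for p, s in zip(pat[1], t[1]))

def subst(t, sub):
    if isinstance(t, str):
        return sub[t]
    return (t[0], tuple(subst(a, sub) for a in t[1]))

def csr_step(t, rules, mu):
    """One mu-rewrite step, outermost first, descending only into replacing
    arguments; returns the reduct or None if t is a mu-normal form."""
    if not isinstance(t, str):
        for l, r in rules:
            sub = {}
            if match(l, t, sub):
                return subst(r, sub)
        f, args = t
        for i in sorted(mu.get(f, ())):           # only replacing arguments
            s = csr_step(args[i - 1], rules, mu)
            if s is not None:
                return (f, args[:i - 1] + (s,) + args[i:])
    return None

def mu_normalize(t, rules, mu):
    while True:
        s = csr_step(t, rules, mu)
        if s is None:
            return t
        t = s
```

With µ(:) = {1}, the tail of a list is never reduced, so the µ-normal form of first(1,from(0)) is 0:first(0,from(1)) — a µ-normal form that is not a normal form, exactly the situation the layered procedure normµ of Figure 1 then handles by recursing into the non-replacing subterm.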
Lazy programming languages admit giving infinite values as the meaning of expressions. Infinite values are limits of converging infinite sequences of partially defined values which are more and more defined. In this setting, the following 'termination-like' property has been envisaged: Top-termination: A TRS is top-terminating if no infinitary reduction sequence performs infinitely many rewrites at the topmost position Λ [DKP91]. Top-termination is, in general, an undecidable property. We can prove top-termination of a left-linear TRS R by proving its canonical µ-termination. Theorem 3. Let R be a left-linear TRS and µ ∈ CMR. If R is µ-terminating, then R is top-terminating. A TRS is infinitary normalizing if every (finite) term t admits a strongly convergent sequence (i.e., a sequence that, ultimately², reduces deeper and deeper redexes) starting from t and ending in a (possibly infinite) normal form (i.e., a term without redexes). For left-linear TRSs, top-termination is a sufficient condition for infinitary normalization. This fact easily follows from³ [DKP91]. Still, there are infinitary normalizing TRSs which are not top-terminating. Example 4. Consider the TRS R:

f(a) → f(f(a))
f(a) → a

This TRS is infinitary normalizing but not top-terminating:

f(a) → f(f(a)) → f(a) → · · ·

Hence, from Theorem 3, we conclude that µ-termination criteria (see Section 5 below) can also be used for proving infinitary normalization. Infinitary normalizing TRSs can be thought of as playing the role of normalizing TRSs in the infinitary setting, i.e., ensuring that every term has a meaning (its, possibly infinite, normal form). Our normalization via µ-normalization technique easily generalizes to infinitary normalization (see [Luc02] for details).
5 Termination of CSR by Transformation
The µ-termination of a TRS R = (F, R) is usually proved by demonstrating termination of a transformed TRS RµΘ = (FµΘ, RµΘ) obtained from R and µ ∈ MF by using a transformation Θ [FR99,GM99,GM02,Luc96,SX98,Zan97]. Definition 1. A transformation Θ is 1. correct (regarding µ-termination) w.r.t. M ⊆ MF if, for all µ ∈ M, termination of RµΘ implies µ-termination of R.
² We only consider sequences of length at most ω.
³ Note that the notion of infinitary normal form used in [DKP91] (a term t such that t′ = t whenever t → t′) differs from the usual one.
2. complete (regarding µ-termination) w.r.t. M ⊆ MF if, for all µ ∈ M, µ-termination of R implies the termination of RµΘ.

If M = MF in Definition 1, we just say that Θ is correct (or complete). Obviously, complete transformations Θ preserve termination of R, i.e., if R is terminating, then RµΘ is terminating. The simplest (and trivial) correct transformation for proving µ-termination is the identity: RµID = R (terminating TRSs are µ-terminating for every replacement map µ). We review the main (non-trivial) correct transformations for proving termination of CSR and prove a number of new properties (e.g., completeness) which appear when they are used for proving termination of canonical CSR.

5.1 The Contractive Transformation
With the contractive transformation [Luc96], the non-µ-replacing arguments of all symbols in F are removed and a new, µ-contracted signature FµL is obtained (possibly reducing the arity of symbols). The function τµ : T(F, X) → T(FµL, X) drops the non-replacing immediate subterms of a term t ∈ T(F, X) and constructs a 'µ-contracted' term by joining the (also transformed) replacing arguments below the corresponding operator of FµL. A TRS R = (F, R) is µ-contracted into RµL = (FµL, {τµ(l) → τµ(r) | l → r ∈ R}). Example 5. Consider the TRS R (borrowed from [GM02]):

nats → adx(zeros)
zeros → 0:zeros
adx(x:y) → incr(x:adx(y))
incr(x:y) → s(x):incr(y)
and µ(:) = µ(incr) = µ(adx) = µ(s) = {1}. Then RµL (we use the same symbols with possibly decreased arities):

nats → adx(zeros)
zeros → :(0)
adx(:(x)) → incr(:(x))
incr(:(x)) → :(s(x))
is terminating (use an rpo with precedence nats > adx, zeros > incr > :, 0, s). Hence, R is µ-terminating. The contractive transformation only works well with µ-conservative TRSs, i.e., those satisfying Varµ(r) ⊆ Varµ(l) for all rules l → r of R [Luc96]; otherwise, extra variables appear in the right-hand side of some rule of RµL, which then becomes non-terminating. Let

CoCMR = {µ ∈ CMR | for all rules l → r in R, Varµ(r) ⊆ Varµ(l)}

That is: CoCMR contains the replacement maps µ ∈ CMR that make R µ-conservative. Since µ⊤ ∈ CoCMR for every TRS R, CoCMR is not empty. We prove that the contractive transformation becomes complete w.r.t. CoCMR. Theorem 4 (Completeness). Let R be a left-linear TRS and µ ∈ CoCMR. If R is µ-terminating, then RµL is terminating. Even though µ-conservativeness can be difficult to achieve, there still are interesting examples fulfilling this requirement (see [BLR02,Luc02]). Theorem 4 does not hold if µ ∉ CMR (even for µ-conservative TRSs).
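The contraction τµ of Section 5.1 can be sketched in a few lines of Python (our own encoding is assumed: variables as strings, applications as (symbol, args) tuples; symbol names are kept, arities shrink implicitly):

```python
def contract(t, mu):
    """tau_mu: recursively keep only the mu-replacing immediate subterms."""
    if isinstance(t, str):
        return t
    f, args = t
    kept = tuple(contract(args[i - 1], mu) for i in sorted(mu.get(f, ())))
    return (f, kept)
```

Applied to the rule incr(x:y) → s(x):incr(y) of Example 5 with µ(:) = µ(incr) = µ(s) = {1}, this yields incr(:(x)) → :(s(x)), as in RµL above.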
Example 6. Consider the terminating (hence µ-terminating) TRS R [GM02]: f(b,x) → f(c,x)
together with µ(f) = {2}. Note that R is µ-conservative but µ ∉ CMR; RµL:

f(x) → f(x)
is not terminating. Left-linearity cannot be dropped in Theorem 4. Example 7. Consider the TRS R: f(x,x) → f(a,b)
with µ(f) = {1}. Since R is terminating, R is µ-terminating. However, RµL : f(x) → f(a)
is not terminating. Note that µ ∈ CoCMR. An obvious corollary of Theorem 4 is that the µ-contractive transformation preserves termination of left-linear TRSs R whenever µ ∈ CoCMR. Examples 6 and 7 show the need for these restrictions.

5.2 Zantema's Transformation
Zantema's transformation marks the non-replacing arguments of function symbols (disregarding their positions within the term) [Zan97]. Given R = (F, R) and µ ∈ MF, RµZ = (F ∪ F′ ∪ {activate}, RµZ), where RµZ consists of two parts. The first part results from R by replacing every function symbol f occurring in a left- or right-hand side with f′ (a fresh function symbol of the same arity as f which, then, is included in F′) if it occurs in a non-replacing argument of the function symbol directly above it. These new function symbols are used to block further reductions at this position. In addition, if a variable x occurs in a non-replacing position in the lhs l of a rewrite rule l → r, then all occurrences of x in r are replaced by activate(x). Here, activate is a new unary function symbol which is used to activate blocked function symbols again. The second part of RµZ consists of rewrite rules that are needed for blocking and unblocking function symbols:

f(x1, . . . , xk) → f′(x1, . . . , xk)
activate(f′(x1, . . . , xk)) → f(x1, . . . , xk)
for every⁴ f ∈ F, together with the rule activate(x) → x. Zantema's transformation can succeed when the contractive transformation fails. Example 8. Let R and µ be as in Example 1. Since R is not µ-conservative, RµL cannot be used for proving µ-termination of R. However, RµZ :
⁴ Actually, the rules activate(f′(x1, . . . , xk)) → f(x1, . . . , xk) can be avoided if no symbol activate appears in the rules of the first part of the transformation.
first(0,x) → []
first(s(x),y:z) → y:first'(x,activate(z))
sel(0,x:y) → x
sel(s(x),y:z) → sel(x,activate(z))
from(x) → x:from'(s(x))
from(x) → from'(x)
activate(from'(x)) → from(x)
first(x,y) → first'(x,y)
activate(first'(x,y)) → first(x,y)
activate(x) → x
is terminating (use an rpo based on sel > activate ≈ first > from, :, first', [] and from > :, from', s, giving sel the usual lexicographic status). Zantema's transformation remains incomplete for proving canonical µ-termination even with left-linear, µ-conservative TRSs. Example 9. Consider R and µ as in Example 5. Note that µ ∈ CoCMR. Remember that RµL is (simply) terminating. However, RµZ :
incr(x:y) → s(x):incr’(activate(y)) adx(x:y) → incr(x:adx’(activate(y))) zeros → zeros’ activate(incr’(x)) → incr(x) activate(adx’(x)) → adx(x)
is not terminating: adx(zeros) → adx(0:zeros’) → incr(0:adx’(activate(zeros’))) → incr(0:adx’(zeros)) → s(0):incr’(activate(adx’(zeros))) → s(0):incr’(adx(zeros)) → · · ·
Zantema’s transformation preserves termination of TRSs. Theorem 5. Let R be a TRS and µ ∈ MR . If R is terminating, then RµZ is terminating. In [FR99], Ferreira and Ribeiro propose a variant of Zantema’s transformation which has been proved strictly more powerful than Zantema’s one (see [GM02]). Again, RµF R has two parts. The first part results from the first part of RµZ by marking all function symbols (except activate) which occur below an already marked symbol. Therefore, all function symbols of non-replacing subterms are marked. The second part consists of the rule activate(x) → x plus the rules: f (x1 , . . . , xk ) → f (x1 , . . . , xk ) activate(f (x1 , . . . , xk )) → f ([x1 ]f1 , . . . , [xk ]fk )
for every f ∈ F for which f′ appears in the first part of RµFR, where [t]fi = activate(t) if i ∈ µ(f) and [t]fi = t otherwise. They also include rules activate(f(x1, . . . , xk)) → f([x1]f1, . . . , [xk]fk) for k-ary symbols f where f′ does not appear in the first part of RµFR. However, Giesl and Middeldorp have recently shown that these rules are not necessary for obtaining a correct transformation [GM02]. Note that R and µ of Example 5 also show incompleteness of Ferreira and Ribeiro's transformation RµFR.
5.3 Giesl and Middeldorp's Transformations
Giesl and Middeldorp introduced a transformation that explicitly marks the replacing positions of a term (by using a new symbol active). Given a TRS R = (F, R) and µ ∈ MF, the TRS RµGM = (FµGM, RµGM) consists of FµGM = F ∪ {active, mark} and the rules RµGM given by (for all l → r ∈ R and f ∈ F):

active(l) → mark(r)
mark(f(x1, . . . , xk)) → active(f([x1]f, . . . , [xk]f))
active(x) → x

where [xi]f = mark(xi) if i ∈ µ(f) and [xi]f = xi otherwise [GM99]. Let M = (FµGM, {mark(f(x1, . . . , xk)) → active(f([x1]f, . . . , [xk]f)) | f ∈ F}); note that M is a confluent and terminating TRS [GM99]. Giesl and Middeldorp showed that, in general, this transformation is not complete. We prove the completeness of this transformation for proving termination of canonical CSR when dealing with left-linear TRSs. Theorem 6 (Completeness). Let R be a left-linear TRS and µ ∈ CMR. If R is µ-terminating, then RµGM is terminating. Example 1 of [GM99] shows that this result does not hold if µ ∉ CMR. Giesl and Middeldorp also proposed two refinements of this transformation that are intended to provide better results in practice (while still being incomplete). 1. The first modification (RµmGM), see Definition 3 of [GM02], consists in replacing the single symbol active by fresh symbols fa for every f ∈ F. Thus, FµmGM = F ∪ {fa | f ∈ F} ∪ {mark}. The pattern active(f(· · · )) is also replaced everywhere by fa(· · · ). The rule active(x) → x is expanded into all rules of the form fa(x1, . . . , xk) → f(x1, . . . , xk) and, hence, removed. Now, let M′ = (FµmGM, {mark(f(x1, . . . , xk)) → fa([x1]f, . . . , [xk]f) | f ∈ F}). 2. The second modification (RµnGM) consists of normalizing the rhs's of the rules fa(l1, . . . , lk) → mark(r) in RµmGM (where f(l1, . . . , lk) → r ∈ R) w.r.t. M′, see Section 7 of [GM02]. Since RµGM, RµmGM and RµnGM are equivalent regarding termination [GM02], they are also complete under the conditions of Theorem 6.
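The rule schemas of RµGM are mechanical enough to generate automatically. A hedged Python sketch follows (our own term encoding — strings for variables, (symbol, args) tuples for applications — and our own helper name; it implements only the basic RµGM, not the mGM/nGM refinements):

```python
def gm_transform(rules, mu, arities):
    """Build R^mu_GM: active(l) -> mark(r) for each rule, one mark-rule per
    symbol (marking only mu-replacing arguments), plus active(x) -> x."""
    out = []
    for l, r in rules:
        out.append((('active', (l,)), ('mark', (r,))))
    for f, n in arities.items():
        xs = tuple(f'x{i}' for i in range(1, n + 1))
        marked = tuple(('mark', (x,)) if i in mu.get(f, ()) else x
                       for i, x in enumerate(xs, start=1))
        out.append((('mark', ((f, xs),)), ('active', ((f, marked),))))
    out.append((('active', ('x',)), 'x'))
    return out
```

Run on the TRS of Example 11 (f(f(x)) → f(g(h(x))), h(x) → f(x)) with µ(f) = µ(g) = µ(h) = ∅, this produces exactly the six rules of RµGM displayed in that example.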
Also, RµGM and RµmGM are equivalent regarding simple termination. Theorem 7. Let R be a TRS and µ ∈ MR . Then, RµGM is simply terminating if and only if RµmGM is simply terminating. Theorem 7 does not hold for RµnGM . Example 10. Consider the following non-simply terminating TRS R: f(0,1,x) → c(f(x,x,x))
Let µ(f) = {1, 2} and µ(c) = ∅. Then, with RµmGM :
fa(0,1,x) → mark(c(f(x,x,x)))
mark(f(x,y,z)) → fa(mark(x),mark(y),z)
mark(0) → 0a
mark(1) → 1a
mark(c(x)) → ca(x)
fa(x,y,z) → f(x,y,z)
0a → 0
1a → 1
ca(x) → c(x)
we have the following infinite derivation (in RµmGM ∪ Emb(FµmGM)):
fa(0,1,f(0,1,x)) →RµmGM mark(c(f(f(0,1,x),f(0,1,x),f(0,1,x))))
→+Emb(FµmGM) mark(f(0,1,f(0,1,x))) →RµmGM fa(mark(0),mark(1),f(0,1,x))
→+Emb(FµmGM) fa(0,1,f(0,1,x)) → · · ·

Thus, RµmGM is not simply terminating. However, RµnGM :
fa(0,1,x) → ca(f(x,x,x))
mark(f(x,y,z)) → fa(mark(x),mark(y),z)
mark(0) → 0a
mark(1) → 1a
mark(c(x)) → ca(x)
fa(x,y,z) → f(x,y,z)
0a → 0
1a → 1
ca(x) → c(x)
is rpo-terminating (use the precedence mark > fa, 0a, 1a > ca > f, 0, 1 > c). Thus, RµmGM (or RµGM) and RµnGM are not equivalent regarding simple termination. However, we have the following. Theorem 8. Let R be a TRS and µ ∈ MR. If RµGM is simply terminating, then RµnGM is simply terminating. Unfortunately, the transformations also have some (practical) drawbacks. Theorem 9. Let R be a left-linear TRS and µ ∈ CMR. If R is not simply terminating, then RµGM and RµmGM are not simply terminating. The theorem does not hold if µ ∉ CMR. Example 11. Consider the following TRS:

f(f(x)) → f(g(h(x)))
h(x) → f(x)
This TRS is not simply terminating: f(f(x)) → f(g(h(x))) →Emb(F) f(h(x)) → f(f(x)) → · · ·
Let µ(f) = µ(g) = µ(h) = ∅. Note that µ ∉ CMR. Now, RµGM:

active(f(f(x))) → mark(f(g(h(x))))
active(h(x)) → mark(f(x))
mark(f(x)) → active(f(x))
mark(g(x)) → active(g(x))
mark(h(x)) → active(h(x))
active(x) → x
is simply terminating; e.g., the reduction sequence that corresponds to the previous 'dangerous' sequence is no longer possible in RµGM ∪ Emb(FµGM). Theorem 9 does not hold for RµnGM (see Example 10; note that µ ∈ CMR). Still, RµnGM can also be non-simply terminating if R is not simply terminating.
Example 12. Consider the non-simply terminating TRS R:

f(f(x)) → f(g(f(x)))
together with µ(f) = {1} and µ(g) = ∅. Note that µ ∈ CMR. Then, RµnGM :

fa(f(x)) → fa(ga(f(x)))
mark(f(x)) → fa(mark(x))
mark(g(x)) → ga(x)
fa(x) → f(x)
ga(x) → g(x)
is not simply terminating:

fa(f(x)) →RµnGM fa(ga(f(x))) →Emb(FµnGM) fa(f(x)) → · · ·
As one can notice by re-examining the examples of Giesl and Middeldorp's papers [GM99,GM02], the success of their transformations usually depends on using Arts and Giesl's dependency pairs method [AG00]. Our results in this section clarify this situation by showing that, in fact, simplification orderings (alone) should not be used to orient the rules of RµGM or RµmGM in proofs of canonical µ-termination. Rather, some auxiliary technique must additionally be used to make them suitable. This is critically true when a proof of canonical µ-termination of a non-terminating (hence non-simply terminating) TRS is attempted. Automatic tools for proving termination that do not provide adequate preprocessing (based on, e.g., applying argument filtering transformations, exploiting modularity, using dependency pairs, etc.) should not be used together with RµGM or RµmGM. For instance, when using the CiME 2.0 system⁵ for proving termination of RµGM or RµmGM, the use of dependency pairs should be activated beforehand.
6 Comparing the Transformations
Figure 2 shows how the different transformations for proving termination of CSR compare. The leftmost diagram of Figure 2 is obtained from that of [GM02] plus Theorem 5. An arrow from Θ to Θ′ means that whenever RµΘ is terminating, then RµΘ′ is also terminating (for µ selected from the subset of replacement maps written below the diagram). Dashed arrows (as well as minimal paths containing them) require left-linearity of R. *GM stands for the three transformations introduced in the previous section (which are equivalent regarding termination of the transformed TRS); C stands for Giesl and Middeldorp's complete transformation (not described here, see [GM99,GM02]). Comparisons regarding simple termination are also interesting from a more practical point of view. Apart from Theorems 7 and 8, we have the following. Theorem 10. Let R be a TRS and µ ∈ MR. If R is simply terminating, then RµZ, RµFR, and RµGM are simply terminating. We conjecture that simple termination of RµZ implies that of RµFR (the opposite is not true). Simple termination can be achieved after applying some transformations.
⁵ Available at http://cime.lri.fr
[Figure 2 contains three arrow diagrams (over MR, CMR, and CoCMR, respectively) relating the transformations ID, L, Z, FR, *GM, and C.]

Fig. 2. Comparing transformations regarding termination
Example 13. Consider R and µ as in Example 12. Recall that R (and RµGM , RµmGM , RµnGM ) is not simply terminating. However, RµL : f(f(x)) → f(g)
is rpo-terminating (use precedence f > g). Also, RµZ : f(f(x)) → f(g(f’(x))) f(x) → f’(x)
activate(f’(x)) → f(x) activate(x) → x
is rpo-terminating (use the precedence activate > f > g, f’). RµF R also is rpo-terminating. Simple termination of RµnGM does not imply simple termination of RµZ or RµF R . Example 14. Consider R and µ as in Example 5. Note that RµZ (and RµF R ) is not terminating (see Example 9). However, RµnGM : natsa adxa (x:y) zerosa incra (x:y) mark(nats) mark(adx(x)) mark(zeros) mark(x:y) mark(incr(x))
→ → → → → → → → →
adxa (zerosa ) incra (mark(x):a adx(y)) 0a :a zeros sa (mark(x)):a incr(y) natsa adxa (mark(x)) zerosa mark(x):a y incra (mark(x))
mark(0) mark(s(x)) natsa adxa (x) zerosa x:a y incra (x) 0a sa (x)
→ → → → → → → → →
0a sa (mark(x)) nats adx(x) zeros x:y incr(x) 0 s(x)
is simply terminating. For replacement maps µ ∈ CMR , an immediate consequence of Theorems 9 and 10 is the equivalence of simple termination of R and RµGM for left-linear TRSs R. With conservativeness, we have: Theorem 11. Let R be a left-linear TRS and µ ∈ CoCMR . If RµZ , RµF R , or RµnGM are simply terminating, then RµL is simply terminating. Figure 3 shows the hierarchy of transformations regarding simple termination.
[Figure 3 contains three arrow diagrams (over MR, CMR, and CoCMR, respectively) relating the transformations ID, GM, mGM, nGM, Z, FR, and L.]

Fig. 3. Comparing transformations regarding simple termination
7 Experiments on Proving Termination of Canonical CSR
Consider a left-linear TRS R for which we want to establish some interesting termination property. We claim that, for the purpose of computing (infinite) normal forms, proving termination of canonical CSR could be prioritized over proofs of termination: a proof of termination of canonical CSR can be used to achieve normalizations (even with non-terminating TRSs) and also to arbitrarily approximate infinite normal forms (see Section 4). Of course, if (it is clear that) R is not terminating, then only the proof of termination of canonical CSR should be attempted. Our results suggest that proving µ-termination of a terminating TRS R is not much harder than directly proving termination of R (e.g., simple termination is preserved by the transformations, see Theorems 5 and 10). Moreover, proving termination of canonical CSR can be easier than attempting a direct proof of termination (see Examples 12 and 13). Table 1 shows the results of a preliminary experiment aimed at providing some experimental evidence supporting our claim. We have selected two rather different collections of termination examples: Dershowitz's 33 examples of termination [Der95] (first group of four examples) and Arts and Giesl's collection of examples for termination using dependency pairs [AG01] (most of them non-simply terminating; see the second group of four examples). In our experiments, given a TRS R, we systematically try to prove termination of RµcanΘ for Θ ∈ {ID, L, Z, FR, GM, mGM, nGM}, where µcan = µcan_R (note that proving termination of RµcanID is just proving termination of R). We have used mu-term 1.0 for obtaining the transformed⁶ TRSs. We have attempted the proofs of termination using CiME 2.0 (mu-term 1.0 has also been used for producing the appropriate format) in both its standard version (Std, where polynomial orderings are used for proving simple termination of the TRS) and with the use of dependency pairs activated (DP, thus dealing with non-simply terminating TRSs). In this way, we proceeded as a (typical) user of automatic software tools having no expertise in proofs of termination. A positive answer for a given experiment is reported in our figures by indicating the time (in seconds) used by CiME to find the proof. CiME's negative answers are reported with 'N'. Finally, '?' means that the search for a proof was interrupted without getting any answer (usually after several hours, often days, of execution). Other marks such as 'NC' or '=ID' (in column L), or '=Z' (in column FR) mean that the TRS R is not µcan_R-conservative ('NC'), that RµcanL = R ('=ID'), or that RµcanFR = RµcanZ ('=Z'). In Table 1, we have selected a representative sample of the (still growing) complete experiment⁷. We can see that, whenever a TRS R is proved terminating, µcan_R-termination of R can also be proved using some (non-trivial) transformation. This usually entails some overhead in the proof, but we have also obtained improvements (e.g., with Dutch Flag and RµcanL, or Division v.1 and RµcanZ). In several cases we can prove (with CiME) µcan_R-termination even when we cannot prove termination of the (terminating) TRS (e.g., with Non Simp. or Diff.). On the other hand, hard problems like Hydra could not be handled in any way (under the current conditions of the experiment). We believe that these results are quite encouraging regarding our hypothesis.

⁶ mu-term 1.0 is available from http://www.dsic.upv.es/users/elp/slucas/muterm.

Table 1. Experiments on termination vs. µcan_R-termination with CiME 2.0 (each entry is Std/DP)

Ref.  Example       ID          L          Z           FR         GM      mGM     nGM
5.    Non Simp.     N/0.06      0.03/0.00  0.04/0.00   0.08/0.05  N/4153  N/0.14  N/0.16
7.    Dutch Flag    0.09/0.06   0.05/0.02  0.22/0.12   ?/0.20     ?/0.33  ?/0.2   ?/0.17
8.    Diff.         N/N         0.02/0.00  3.14/1.43   ?/?        ?/0.70  ?/0.59  ?/0.47
33.   Hydra         N/N         NC/NC      N/?         ?/?        N/N     N/N     N/N
3.1.  Division v.1  N/2152      NC/NC      ?/1.11      ?/?        ?/?     ?/?     ?/?
3.5.  Remainder     N/2.65      NC/NC      ?/9.08      =Z/=Z      ?/?     ?/?     ?/?
3.7.  Logarithm     105.0/17.9  =ID/=ID    6469/117.6  =Z/=Z      ?/?     ?/?     ?/?
3.10. Min. sort     N/?         NC/NC      N/?         =Z/=Z      N/N     N/?     N/?
8 Conclusions
We have proven that, for left-linear TRSs R, Giesl and Middeldorp's transformations are complete for proving termination of canonical CSR; if conservativeness is additionally fulfilled, then the contractive transformation is also complete. We have compared the different transformations regarding simple termination. Although undecidable, simple termination covers the use of the most usual orderings for proving termination of rewriting automatically (e.g., rpo, kbo, and polynomial orderings [Der87]). Giesl and Middeldorp noticed that, even though their transformations subsume the others (see the leftmost diagram of Figure 2), the latter
⁷ See http://www.dsic.upv.es/users/elp/slucas/experiments for more details.
'may still be useful for the purpose of automation' [GM99]. This is true, indeed, and Figure 3 provides a new (quite different, compared to Figure 2) hierarchy of the transformations which is helpful for guiding their practical use in combination with simplification orderings. Their interaction with other techniques for automatically proving termination (e.g., DP-simple termination [AG00,AG01]) should be further investigated. We argued that termination of canonical CSR is a computational property which can be more interesting to analyze than standard termination. We have partially supported this claim both with theoretical and experimental results. Acknowledgements. I thank Cristina Borralleras and Albert Rubio for many interesting discussions about the topics of this paper. I also thank Jürgen Giesl and the anonymous referees for their useful remarks.
References

[AG00] T. Arts and J. Giesl. Termination of Term Rewriting Using Dependency Pairs. Theoretical Computer Science, 236:133-178, 2000.
[AG01] T. Arts and J. Giesl. A collection of examples for termination of term rewriting using dependency pairs. TR AIB-2001-09, RWTH Aachen, 2001.
[BLR02] C. Borralleras, S. Lucas, and A. Rubio. Recursive Path Orderings can be Context-Sensitive. Proc. of CADE'02, Springer LNAI, to appear, 2002.
[Der87] N. Dershowitz. Termination of rewriting. JSC, 3:69-115, 1987.
[Der95] N. Dershowitz. 33 Examples of Termination. LNCS 909:16-26, Springer-Verlag, Berlin, 1995.
[DKP91] N. Dershowitz, S. Kaplan, and D. Plaisted. Rewrite, rewrite, rewrite, rewrite, rewrite. Theoretical Computer Science, 83:71-96, 1991.
[FR99] M.C.F. Ferreira and A.L. Ribeiro. Context-Sensitive AC-Rewriting. Proc. of RTA'99, LNCS 1631:286-300, Springer-Verlag, Berlin, 1999.
[GM99] J. Giesl and A. Middeldorp. Transforming Context-Sensitive Rewrite Systems. Proc. of RTA'99, LNCS 1631:271-285, Springer-Verlag, Berlin, 1999.
[GM02] J. Giesl and A. Middeldorp. Transformation Techniques for Context-Sensitive Rewrite Systems. Technical Report AIB-2002-02, RWTH Aachen, 2002.
[Luc96] S. Lucas. Termination of context-sensitive rewriting by rewriting. Proc. of ICALP'96, LNCS 1099:122-133, Springer-Verlag, Berlin, 1996.
[Luc98] S. Lucas. Context-sensitive computations in functional and functional logic programs. Journal of Functional and Logic Programming, 1998(1):1-61, January 1998.
[Luc02] S. Lucas. Context-sensitive rewriting strategies. Information and Computation, to appear.
[SX98] J. Steinbach and H. Xi. Freezing – Termination Proofs for Classical, Context-Sensitive and Innermost Rewriting. Institut für Informatik, T.U. München, January 1998.
[Zan97] H. Zantema. Termination of Context-Sensitive Rewriting. Proc. of RTA'97, LNCS 1232:172-186, Springer-Verlag, Berlin, 1997.
Atomic Set Constraints with Projection

Witold Charatonik (1,2) and Jean-Marc Talbot (3)

1 Max-Planck Institut für Informatik, Saarbrücken, Germany
2 University of Wroclaw, Poland
3 Laboratoire d'Informatique Fondamentale de Lille, Lille, France
Abstract. We investigate a class of set constraints defined as atomic set constraints augmented with projection. This class subsumes some already studied classes such as atomic set constraints with left-hand side projection and INES constraints. All these classes enjoy the nice property that satisfiability can be tested in cubic time. This is in contrast to several other classes of set constraints, such as definite set constraints and positive set constraints, for which satisfiability ranges from DEXPTIME-complete to NEXPTIME-complete. However, these latter classes allow set operators such as intersection or union, which is not the case for the class studied here. One might thus expect that satisfiability remains polynomial for atomic set constraints with projection. Unfortunately, we show that the satisfiability problem for this class is no longer polynomial, but CoNP-hard. Furthermore, we devise a PSPACE algorithm to solve this satisfiability problem.
1 Introduction

Set constraints are first-order formulas interpreted over sets of trees. They are defined as inclusions between set expressions. These set expressions are built up from variables, a ranked signature (constants and function symbols) and various set operators. According to the set operators used, different classes of set constraints are defined. The main domain of application of set constraints is program analysis, often called set-based analysis [15,13,2]. Set constraints also play a role in type checking [24] (see [1,16,20] for surveys).
The satisfiability problem for different classes of set constraints has been extensively studied and its complexity often turns out to be quite high; in most cases (definite set constraints [15], set constraints with intersection [8], positive and/or negative set constraints [12,4,6], positive and negative set constraints with projection [7]) satisfiability is DEXPTIME-hard [8] or even NEXPTIME-hard [5] for some of them. Nonetheless, for some other classes it turned out that the satisfiability problem is tractable, more precisely solvable in cubic time. Atomic constraints are such a class: for these simple constraints, no set operator is allowed. A cubic time algorithm for satisfiability of these constraints is given in [13]. Motivated by program analysis, Müller, Niehren and Podelski introduced in [19] the class of INES constraints (Inclusions over Non-Empty Sets of trees). They correspond syntactically to atomic set constraints, but the empty set is no longer an admissible value to instantiate variables. They showed that satisfiability for INES constraints can be tested in cubic time. In the same paper, the authors also investigate atomic set constraints extended with the possibility to force the denotation of
S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 311–325, 2002.
© Springer-Verlag Berlin Heidelberg 2002
some variables (but maybe not all of them) to be non-empty, and show that satisfiability remains cubic.
In [13], another extension of atomic set constraints has been studied. There the syntax of constraints was extended to allow the use of projection on the left-hand side of inclusions. It has been shown that satisfiability remains cubic also in this case.
A natural question is then: "Can these two extensions (atomic set constraints with non-emptiness constraints and projection on the left-hand side of inclusions) be combined to obtain a new class for which satisfiability could be tested in cubic time?". We depict in the table below known results from previous works about the complexity of the satisfiability problem for extensions of the class of atomic set constraints. At first sight the answer seems to be "yes"; in the conclusion of an extended version of [19], available from Podelski's web page, the authors explicitly say that it should be straightforward. Surprisingly, in Section 3 we show that even in the "simple" case of projections on the left-hand side of inclusions, in presence of non-emptiness constraints, the satisfiability problem becomes CoNP-hard.

                             | no left-hand side projection | with left-hand side projection
no emptiness constraints     | O(n³)                        | O(n³) [13]
with emptiness constraints   | O(n³) [19]                   | ?
On one hand, projection is a quite intricate operator which can usually be treated easily on the left-hand side of inclusions ([8,21]) by a simple elimination method.¹ However, it leads to more difficult processing on the right-hand side: several methods have been proposed for classes of set constraints that do not admit projection at all [3,4,6,12,19,23] or admit it only on the left-hand side of inclusions [5,8,13,15,21], but only a few concern projections on the right-hand side [7,9,11]. On the other hand, for the latter classes adding the projection operation does not increase the complexity of the satisfiability problem. So, one may wonder whether projections could be freely added to atomic set constraints without modifying the cubic time complexity for satisfiability. Again our result from Section 3 implies a negative answer: there exists a simple encoding of non-emptiness constraints using projections on the right-hand side of inclusions, provided that the signature is rich enough.
Our second contribution is an algorithm for solving atomic set constraints with projections. It covers both extensions: non-emptiness constraints and projection on both sides of inclusions. We show that it works in PSPACE, while the only previously known algorithm covering this class of constraints (devised for the general set constraints with projections [7]) worked in NEXPTIME. There is still a gap between the CoNP lower bound and the PSPACE upper bound; we leave open the question of the exact complexity of the problem.
This work is a continuation of the research project initiated by Aiken, Kozen, Vardi, and Wimmers in [3] to characterize the complexity of the satisfiability problem for set constraints depending on the syntax used. Given the NEXPTIME lower bound in the general case, it is natural to ask under which assumptions faster algorithms may be found. In [3] the authors give a detailed analysis of the complexity of subclasses of positive set
¹ Known methods for elimination of left-hand side projections either require boolean operations on set expressions or are correct only wrt. least solutions. Therefore they do not apply to the class in question (cf. Examples 1 and 2).
constraints which are obtained by restricting the ranked alphabet of constructor symbols (for example, positive set constraints over unary trees have a DEXPTIME-complete satisfiability problem). This classification, however, does not say much about applications in program analysis, where we usually have to deal with symbols of high arity. The approach based on the choice of the set operators used seems to be more appropriate here. For example, the complement operation is almost never used in set-based analysis. Contrary to this, projection (on the left-hand side of inclusion) is used in most applications; it is already present in the early paper by Reynolds [21]. The restriction to non-empty sets is often used as an optimization to speed up the solving process. Due to lack of space, some proofs are omitted; they can be found in [10].
2 Atomic Set Constraints with Projection
2.1 Definitions
We consider a fixed signature Σ, that is, a set of function symbols equipped with arities. Constants are symbols from Σ with arity 0. We use a, b, c to denote constants and f, g, h to denote symbols of arity at least one. We assume moreover that Σ contains at least one constant and one symbol of arity greater than or equal to 2.² We denote by Σ>0 the set of all symbols from Σ that are not constants. We also use a countable set V = {x, y, z, ...} of variables ranging over sets of trees over the signature Σ.
Trees: Let Π be the free monoid generated by the empty path ε and concatenation over IN+, the set of strictly positive natural numbers. Words from Π are called paths. Given two paths π, π′ we write π.π′ for the concatenation of π and π′. Let τ be a partial mapping from Π to Σ. The mapping τ is said to be (i) non-empty if τ is defined for some π, (ii) prefix-closed if for any π and i such that τ is defined for π.i, τ is defined for π as well and, (iii) arity-consistent if for all π, if τ is defined for π and the arity of τ(π) is n, then τ is defined for π.1, ..., π.n and undefined for all π.j with n < j.
A tree τ is a partial mapping from Π to Σ which is non-empty, prefix-closed and arity-consistent. A tree τ is finite if τ is defined for only finitely many paths. We denote by TΣω (resp. TΣ) the set of all trees (resp. of all finite trees) generated over Σ. For a tree τ and a path π, we write τ[π] for the subtree of τ rooted at position π. If τ is not defined for π then τ[π] is also undefined.
Set Constraints: Let Σ−1 be the set of unary function symbols, called projections, defined as {f_i^{-1} | f ∈ Σ>0 and i ranges over 1...arity(f)}. We call set expressions the terms that are built from V, Σ and Σ−1. We use the letters s, t, u, v (possibly with subscripts) to denote set expressions.
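As an illustration of the definitions above, here is a small sketch (our own, not from the paper; the nested-tuple representation of trees is an assumption chosen for readability) of finite trees as partial mappings from paths to symbols:

```python
# Finite trees over a signature: a constant is a string, a compound tree is a
# tuple ("f", child_1, ..., child_n). Paths are tuples of positive integers.

def paths(tree, prefix=()):
    """Enumerate the partial mapping pi -> symbol defined by a finite tree.
    The result is non-empty, prefix-closed and arity-consistent by construction."""
    if isinstance(tree, str):
        yield prefix, tree
    else:
        yield prefix, tree[0]
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def subtree(tree, pi):
    """tau[pi]: the subtree rooted at path pi, or None where tau is undefined."""
    for i in pi:
        if isinstance(tree, str) or i > len(tree) - 1:
            return None
        tree = tree[i]
    return tree

tau = ("f", "a", ("f", "a", "a"))        # the finite tree f(a, f(a, a))
assert dict(paths(tau))[(2, 1)] == "a"   # symbol at path 2.1
assert subtree(tau, (2,)) == ("f", "a", "a")
```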
² This assumption will be used only for the lower-bound proof; the algorithm from Section 4 works also in the unary case. On the other hand, in the unary case without emptiness constraints, for example, all projections can be easily eliminated and the problem becomes polynomial.
Definition 1. An atomic set constraint with projection ϕ is given by the following grammar (where s, t are set expressions and x is a variable):

ϕ ::= s ⊆ t | x ≠ ∅ | ϕ ∧ ϕ

In the definition above, ∧ stands for logical conjunction. However, we will often identify such a conjunction with the set of all its conjuncts.
The semantics of our constraints is given by the structure of sets of trees P(TΣω) or by the structure of sets of finite trees P(TΣ): the domain of P(TΣω) (resp. of P(TΣ)) is the set of all subsets of TΣω (resp. of TΣ), and function and projection symbols are interpreted as follows: for sets S, S1, ..., Sn of trees (or of finite trees),

f(S1, ..., Sn) = {f(τ1, ..., τn) | τ1 ∈ S1, ..., τn ∈ Sn}
f_i^{-1}(S) = {τi | f(τ1, ..., τn) ∈ S}

Finally, both P(TΣω) and P(TΣ) interpret ⊆ as set inclusion and x ≠ ∅ as non-emptiness. A valuation is a mapping that associates a set of trees with each variable from V. It can then be canonically extended to all set expressions. We say that a valuation σ satisfies an inclusion s ⊆ t iff σ(s) is a subset of σ(t), and satisfies a non-emptiness constraint x ≠ ∅ iff σ(x) is a non-empty set. We say that σ satisfies a constraint ϕ if it satisfies each conjunct from ϕ. The satisfiability problem is to determine for a given constraint ϕ whether there exists a valuation σ that satisfies ϕ. In this case, we say that σ is a solution of ϕ. When ϕ admits a solution, we say that ϕ is satisfiable; otherwise it is said to be unsatisfiable.
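For finite sets of finite trees, the interpretation of function application and projection displayed above can be sketched directly (a hedged illustration in Python; trees as nested tuples, as before, which is our own representation choice):

```python
from itertools import product

def apply_f(f, *sets):
    """f(S1, ..., Sn) = { f(t1, ..., tn) | t1 in S1, ..., tn in Sn }"""
    return {(f, *ts) for ts in product(*sets)}

def proj(f, i, S):
    """f_i^{-1}(S) = { t_i | f(t1, ..., tn) in S }"""
    return {t[i] for t in S if isinstance(t, tuple) and t[0] == f}

S1, S2 = {"a"}, {"a", ("f", "a", "a")}
assert apply_f("f", S1, S2) == {("f", "a", "a"), ("f", "a", ("f", "a", "a"))}
assert proj("f", 1, apply_f("f", S1, S2)) == {"a"}
assert proj("g", 1, S2) == set()     # no g-rooted trees: empty projection
```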
2.2 Expressiveness
Non-emptiness constraints x ≠ ∅ in Definition 1 may be seen as redundant, as we can encode them using projection provided that the signature Σ contains a binary function symbol and a constant, as follows:
Example 1. The constraint a ⊆ f_1^{-1}(f(a, x)) implies that the denotation of the variable x must be non-empty in any solution of this constraint.
Atomic set constraints with projection generalize several classes of set constraints that have already been studied and for which a cubic time satisfiability algorithm has been devised.
✷ The atomic set constraints and atomic set constraints with left-hand side projections correspond to restrictions of the syntax from Definition 1: in both of them, non-emptiness constraints x ≠ ∅ are not allowed. Moreover, in the first one, set expressions s and t from Definition 1 do not contain projections, whereas for the second one only s may contain projections. Satisfiable constraints from these two classes always have a least solution. This property does not hold when projections are added, which proves that atomic set constraints with projection are strictly more expressive.
Example 2. Looking again at the constraint from Example 1, it is easy to find two minimal solutions σ, σ′ such that respectively σ(x) = {a} and σ′(x) = {f(a, a)}, which are not comparable. Hence, there is no least solution for this constraint.
✷ INES constraints have been introduced in [19]: syntactically defined as atomic set constraints, their main feature is that they are interpreted over non-empty sets of trees. As for atomic set constraints, a cubic time algorithm to solve satisfiability has been proposed in [19]. We can encode an INES constraint ϕ in the constraints from Definition 1 by simply adding to ϕ the non-emptiness constraint x ≠ ∅ for all variables x occurring in ϕ. INES constraints always admit a greatest solution. However, this is not true for atomic constraints with projection, as exemplified below. This proves that our class is strictly more expressive than the class of INES constraints.
Example 3. The constraint f(x, x′) ⊆ a implies that the denotation of either x or x′ is empty. Thus, one can find two incomparable maximal solutions σ, σ′ satisfying respectively σ(x) = ∅, σ(x′) = TΣ and σ′(x) = TΣ, σ′(x′) = ∅. It should be noticed that this constraint is simply not satisfiable over INES.
✷ Beyond INES, in [19] the authors also consider atomic set constraints enhanced with non-emptiness constraints x ≠ ∅ and give a cubic time satisfiability algorithm. It is proved in [19] that every satisfiable constraint ϕ in this class has a solution where each non-empty variable x contains either only trees starting with one particular function symbol f (if ϕ contains an inclusion x ⊆ f(...)) or trees starting with all function symbols from Σ (if ϕ does not contain such an inclusion). This is no longer the case for atomic set constraints with projection. For example the constraint f_i^{-1}(x) ⊆ y ∧ y ⊆ a ∧ y ⊆ b implies that the variable x may contain trees starting with any function symbol from Σ except f. This proves that atomic set constraints with projection are strictly more expressive than atomic set constraints with non-emptiness constraints.
Our final example in this section shows some of the ideas behind our lower-bound proof (cf. the use of markers in the next section).
Note that the projection here occurs only on the left-hand side of inclusions.
Example 4. Consider the constraint

x ≠ ∅   ∧   x ⊆ f(a, z)       ∧   z ⊆ a
        ∧   f_2^{-1}(x) ⊆ v   ∧   v ⊆ f(a, a).

It is not satisfiable because the first two columns imply that the intersection of z and v must be non-empty, while the third one implies that this intersection is empty.
3 Complexity Lower Bound

We show in this section that the satisfiability problem for atomic set constraints with projection is CoNP-hard. We prove this statement by presenting a polynomial reduction from the unsatisfiability problem for propositional formulas in conjunctive normal form to the satisfiability problem of these set constraints. We recall that CoNP is the class of problems whose complements are in NP, and that a problem is CoNP-hard (respectively CoNP-complete) if its complement is NP-hard (respectively NP-complete).
We consider propositional variables ranged over by u. A literal l is either a propositional variable u or its negation ¬u. A clause is a disjunctive formula l1 ∨ ... ∨ ln where
li is a literal. We say that a clause C is non-trivial if the disjunction contains at least one literal and each variable u appears at most once in C, either positively as u or negatively as ¬u. A CNF formula is a finite conjunction of non-trivial clauses.
The problem CNF-UNSAT is as follows:
– INPUT:
  • u1, ..., un: a non-empty sequence of propositional variables.
  • C1 ∧ ... ∧ Cp: a CNF formula written over u1, ..., un.
– ANSWER: Yes, if C1 ∧ ... ∧ Cp is not satisfiable. No, otherwise.
It is well-known that satisfiability of CNF formulas is NP-complete, hence CNF-UNSAT is CoNP-complete.
Before presenting formally how CNF-UNSAT can be polynomially reduced to the satisfiability problem of atomic set constraints with projection, let us sketch the main ideas of this reduction. Some of these ideas have been used before in [17,18] to prove hardness for satisfiability or entailment problems for constraints over trees or sets of trees. We describe these ideas first in terms of trees and then turn, when needed, to sets of trees.
The first point is that Boolean valuations for a sequence of propositional variables u1, ..., un can be viewed as paths of length n in a complete binary tree, assuming that 1 stands for true and 2 for false. For instance, for the sequence u1, u2, u3, the path 1.2.1 encodes the valuation associating true with u1 and u3 and false with u2.
The second idea is to notice that valuations satisfying a clause C do not need to be depicted as a full binary tree: a dag representation is sufficient. For instance, the clauses C1 = u1 ∨ u3 and C2 = ¬u1 ∨ ¬u2 can be represented as follows:
[Figure: dag representations of C1 and C2 built from binary f-nodes; the leaves reached by satisfying valuations are labeled {M1} (for C1) and {M2} (for C2).]
We label leaves of the dags with markers distinguishing the valuations (paths) satisfying the clause from the valuations that do not. For instance, in the representations of C1 and of C2, we used {M1} and {M2} respectively to point out the valuations satisfying each clause, and ∅ for the others. Now, if we unfold these two dags to full binary trees and overlap one with the other, we obtain a tree whose leaves, read from left to right, are labeled

{M1}, {M1}, {M1, M2}, {M1, M2}, {M1, M2}, {M2}, {M1, M2}, {M2}
where each leaf corresponds to a valuation (described by the path leading to this leaf) and is labeled with the set of markers identifying the clauses satisfied by this valuation. For example, the node identified by the path 1.2.1 is labeled with {M1, M2}, which indicates that both clauses C1 and C2 are satisfied by this valuation. In the general case we will have that the formula in question is satisfiable if and only if this corresponding tree contains a node labeled with all markers.
Now, let us come to a "set" point of view. The encoding of markers and their use to produce an unsatisfiable set constraint in case of a satisfiable formula is the novelty of this reduction. Let us give some intuitions about it. We will use p + 1 markers, one for each clause and one "tester". Each marker represents a set of trees. These sets are constructed in such a way that the intersection of any p of them is non-empty while the intersection of all p + 1 is empty. A node in the overlapped tree above labeled with several markers represents the intersection of the sets represented by all these markers. Now if the formula C1 ∧ ... ∧ Cp is satisfiable then the overlapped tree contains a node marked with {M1, ..., Mp}; hence the set represented by this node intersected with the set represented by the tester is empty, which together with a non-emptiness constraint makes the corresponding set constraint unsatisfiable.
The crucial part is the definition of the markers. It heavily depends on the use of projections and is a generalization of Example 4, where the two lines define respectively the marker and the tester in the case of p = 1. The rough idea is that a constraint of the form f_2^{-1}(x) ⊆ x′ is very similar to a constraint of the form x ⊆ f(_, x′). The important difference is that in the second case the variable x may contain only trees whose root is labeled with the symbol f, while in the first case it may contain trees starting with a different symbol.
On the picture below we use dashed lines for the constraints of the first form and solid lines for the constraints of the second form. For p = 2, the markers {M1}, {M2} can be depicted as the following two trees on the left:
[Figure: three trees built from f-nodes and a-leaves. The tree for {M1} has root f with left child a; its right child is constrained through a dashed f_2^{-1} edge. The tree for {M2} constrains, through a dashed f_2^{-1} edge, the node at path 2 to be an f with children a and a. The right-most tree depicts the overlapped marker {M1, M2}: the totally defined tree f(a, f(a, a)).]
Sets of trees represented by the markers are partially described by means of projections. The main part of the markers is the right-most path, that is, paths of the form 2*. For the markers {Mi} we force the node at path 2^{i-1} to be an f: for {M1}, this is the root ε, and for {M2} the node at path 2. The rest of this path is defined via the projection f_2^{-1}. Roughly speaking, the meaning of {M1} is a non-empty set of trees satisfying the following requirement: the root of the tree is labeled with f and the position 1 with a. Additionally, if the position 2 is labeled with f then the position 2.2 is labeled with a. Now, concerning {M2}, its meaning is a non-empty set of trees satisfying the following requirement: if the root is labeled with f then the positions 2, 2.1 and 2.2 are labeled respectively with f, a and a.
Now, the marker {M1, M2} produced by overlapping trees from the clauses is described above as the last tree on the right. Note that this tree is totally defined: there is no
more projection f_2^{-1} over it. This property implies that the intersection with the set represented by the tester is empty. The meaning of {M1, M2} must satisfy both requirements of {M1} and {M2}; the only set satisfying them is the singleton {f(a, f(a, a))}.
The last point of the encoding is the tester. The left-hand part of the picture below is used for overlapping with the whole binary tree; the right-hand part is the actual tester.
[Figure: the tester {MT}. The left-hand part is a binary tree of f-nodes with subtree T; the right-hand part constrains, through dashed f_2^{-1} edges, the nodes at paths 2.2, 2.2.1 and 2.2.2 to be labeled f, a and a respectively.]
The tester {MT} represents any non-empty set of trees that satisfies the following requirement: if the positions ε and 2 of the tree are labeled with f, then the positions 2.2, 2.2.1 and 2.2.2 are labeled respectively by f, a and a. Note that the intersection of the set represented by {MT} with any of {M1}, {M2} is non-empty, but the intersection of {MT} with {M1, M2} is empty. Indeed, the set {f(a, a)} satisfies the requirements of both {M1} and {MT}, whereas the set {a} satisfies the requirements of both {M2} and {MT}. But no non-empty set can satisfy the requirements of {M1}, {M2} (that is, of {M1, M2}) and {MT} at the same time.
We now show formally that we can build in polynomial time, from any instance of the CNF-UNSAT problem, a set constraint with projection that is satisfiable iff the answer to the instance of CNF-UNSAT is Yes. We consider a signature Σ containing a symbol f which is at least binary and another function symbol. For simplicity, we will assume f to be binary and that the other function symbol is a constant a. We consider an instance of CNF-UNSAT given by {u1, ..., un, C1 ∧ ... ∧ Cp} (for short, {ū, C̄}) and define its encoding Enc(ū, C̄). We use variables x, x^i_j, y^i_j, w_j with 1 ≤ i ≤ p, 1 ≤ j ≤ n + 1 and z^i_k, v^i with 1 ≤ i, k ≤ p. The variables x^i_j, y^i_j are used to encode Boolean valuations (for the ith clause and the jth variable), the variables w_j to encode the top part of the tester, and the variables z^i_k, v^i to encode respectively the markers for clauses and the marker of the tester. We let

Enc(ū, C̄) = x ≠ ∅ ∧ x ⊆ w_1 ∧ Enc_Test(ū) ∧ Enc_MTest(C̄)
             ∧ ⋀_{1≤i≤p} x ⊆ x^i_1 ∧ ⋀_{1≤i≤p} Enc_Cl(i, ū, C̄) ∧ ⋀_{1≤i≤p} Enc_Mark(i, C̄)

Enc_Test(ū) = ⋀_{1≤j≤n} w_j ⊆ f(w_{j+1}, w_{j+1}) ∧ w_{n+1} ⊆ v^1

Enc_MTest(C̄) = ⋀_{1≤i≤p} f_2^{-1}(v^i) ⊆ v^{i+1} ∧ v^{p+1} ⊆ f(a, a)

Enc_Mark(i, C̄) = ⋀_{1≤k≤p, k≠i} f_2^{-1}(z^i_k) ⊆ z^i_{k+1} ∧ z^i_i ⊆ f(a, z^i_{i+1}) ∧ z^i_{p+1} ⊆ a

Enc_Cl(i, ū, C̄) = ⋀_{1≤j≤n} y^i_j ⊆ f(y^i_{j+1}, y^i_{j+1}) ∧ y^i_{n+1} ⊆ z^i_1 ∧ ⋀_{1≤j≤n} Enc_LC(i, j, ū, C̄)

Enc_LC(i, j, ū, C̄) = x^i_j ⊆ f(y^i_{j+1}, x^i_{j+1})   if u_j occurs in C_i
                      x^i_j ⊆ f(x^i_{j+1}, y^i_{j+1})   if ¬u_j occurs in C_i
                      x^i_j ⊆ f(x^i_{j+1}, x^i_{j+1})   if neither u_j nor ¬u_j occurs in C_i

Proposition 1. A CNF formula C1 ∧ ... ∧ Cp written over variables u1, ..., un is satisfiable if and only if the corresponding set constraint Enc({u1, ..., un, C1, ..., Cp}) is not satisfiable.

Theorem 1. Let Σ be a signature containing at least two function symbols, one of these symbols having an arity greater than or equal to two. Then the satisfiability problem for atomic set constraints with projection is CoNP-hard, both over sets of trees P(TΣω) and over sets of finite trees P(TΣ).

Proof. Obvious from Proposition 1, noticing that the size of the constraint Enc({ū, C̄}) is polynomial in the size of {ū, C̄}, the input of CNF-UNSAT.

Remark 1. Consider an introduction of projections of constant symbols to the syntax of set constraints with the semantics a^{-1}(S) = TΣ if a ∈ S and a^{-1}(S) = ∅ otherwise. Then the satisfiability problem becomes DEXPTIME-hard by a simple reduction from the emptiness of intersection of deterministic top-down tree automata.
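Following the definitions above, the encoding can be sketched as a generator of textual constraints (our own rendering and ad-hoc variable naming, e.g. "z1_2" for z^1_2; a clause is a dict mapping a variable index j to True for u_j and False for ¬u_j):

```python
def encode(n, clauses):
    """Emit Enc(u, C) as a list of constraint strings, for n variables."""
    p = len(clauses)
    out = ["x != emptyset", "x <= w1"]
    # Enc_Test
    out += [f"w{j} <= f(w{j+1}, w{j+1})" for j in range(1, n + 1)]
    out += [f"w{n+1} <= v1"]
    # Enc_MTest
    out += [f"f2^-1(v{i}) <= v{i+1}" for i in range(1, p + 1)]
    out += [f"v{p+1} <= f(a, a)"]
    for i in range(1, p + 1):
        out += [f"x <= x{i}_1"]
        # Enc_Mark(i)
        out += [f"f2^-1(z{i}_{k}) <= z{i}_{k+1}"
                for k in range(1, p + 1) if k != i]
        out += [f"z{i}_{i} <= f(a, z{i}_{i+1})", f"z{i}_{p+1} <= a"]
        # Enc_Cl(i)
        out += [f"y{i}_{j} <= f(y{i}_{j+1}, y{i}_{j+1})" for j in range(1, n + 1)]
        out += [f"y{i}_{n+1} <= z{i}_1"]
        for j in range(1, n + 1):                   # Enc_LC(i, j)
            if clauses[i - 1].get(j) is True:       # u_j occurs in C_i
                out += [f"x{i}_{j} <= f(y{i}_{j+1}, x{i}_{j+1})"]
            elif clauses[i - 1].get(j) is False:    # ~u_j occurs in C_i
                out += [f"x{i}_{j} <= f(x{i}_{j+1}, y{i}_{j+1})"]
            else:                                   # u_j does not occur in C_i
                out += [f"x{i}_{j} <= f(x{i}_{j+1}, x{i}_{j+1})"]
    return out

# C1 = u1 v u3, C2 = ~u1 v ~u2 over u1, u2, u3
enc = encode(3, [{1: True, 3: True}, {1: False, 2: False}])
assert "x1_1 <= f(y1_2, x1_2)" in enc and "v3 <= f(a, a)" in enc
```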
4 A Satisfiability Algorithm
4.1 Preliminaries
To handle constraints more easily we will assume them to be flat.

Definition 2. A constraint ϕ is said to be flat if it is generated by the following grammar (x, y, x1, ..., xn being variables):

ϕ ::= x ⊆ y | x = f(x1, ..., xn) | x = f_i^{-1}(y) | x ≠ ∅ | ϕ ∧ ϕ | ⊥

where equality x = t is interpreted as x ⊆ t ∧ t ⊆ x and ⊥ represents the value False.

As soon as ⊥ occurs in a constraint ϕ, ϕ is not satisfiable. Considering flat constraints is not a loss of generality: for any constraint ϕ one can compute, in linear time in the size of ϕ, a flat constraint ϕ′ equivalent to ϕ. Given a constraint ϕ, we write Σϕ for the set of all function symbols f from Σ occurring in ϕ, either as a function symbol f or within a projection f_i^{-1}. The set Σϕ>0 is the set Σϕ restricted to symbols that are not constants.

4.2 Annotated Paths – Path Constraints

We introduce a new kind of constraints, called path constraints, defined over annotated paths.
Annotated Paths: An annotated path κ is an element of the free monoid generated by the empty path ε and concatenation over {⟨f, i⟩ | f ∈ Σϕ>0 and 1 ≤ i ≤ arity(f)}. Given a tree τ and an annotated path κ, we define the subtree τ[κ] of τ rooted at κ as (i) τ if κ = ε, as (ii) τi[κ′] if κ = ⟨f, i⟩.κ′ and τ = f(τ1, ..., τn), and (iii) undefined otherwise.
It should be noticed that whereas paths express only positions in a tree, annotated paths also express conditions on the labeling of the tree; for the trees τ = f(a, b) and τ′ = g(a, b), both τ[1] and τ′[1] are defined and equal to a. But for the annotated path ⟨f, 1⟩, the tree τ[⟨f, 1⟩] is defined and equal to a, whereas τ′[⟨f, 1⟩] is undefined, as the root of τ′ is a symbol different from f.
Given a set of trees S, we write S[κ] for the set {τ[κ] | τ ∈ S}. It is always defined, although it can be empty. For an annotated path κ, we define its restriction over IN+ (denoted res(κ)) as the path π obtained from κ by removing the annotating function symbols.
Path Constraints: A path constraint ψ is written as x[κ] ⊆ y. Its semantics is as follows: a valuation σ satisfies x[κ] ⊆ y iff σ(x)[κ] ⊆ σ(y). Note that if κ = ⟨f, i⟩ is an annotated path of length 1 then x[κ] ⊆ y is just a different notation for f_i^{-1}(x) ⊆ y.
We define a binary relation ⊢ between flat constraints and path constraints inductively as follows:
– ϕ ⊢ x[ε] ⊆ y if x ⊆ y ∈ ϕ.
– ϕ ⊢ x[⟨f, i⟩] ⊆ xi if x = f(x1, ..., xn) ∈ ϕ or f_i^{-1}(x) = xi ∈ ϕ.
– ϕ ⊢ x[κ.κ′] ⊆ y if there exists z such that ϕ ⊢ x[κ] ⊆ z and ϕ ⊢ z[κ′] ⊆ y.

Proposition 2. If ϕ ⊢ x[κ] ⊆ y then for any solution σ of ϕ, σ(x)[κ] ⊆ σ(y).
Now considering the constraint ϕ2 = ϕ1 ∧ x = f (x1 , x2 ), one can deduce that for any solution σ of ϕ2 , both σ (x)[f, 1] and σ (x)[f, 1.g, 1] is not empty. The annotated paths f, 1 and f, 1.g, 1 become unavoidable for x in the constraint ϕ2 . We define UAP(x, ϕ) the set of unavoidable annotated paths for the variable x and the constraint ϕ as the least set satisfying : – ∈ UAP(x, ϕ) – if κ ∈ UAP(x, ϕ), ϕ x[κ] ⊆ y and y = f (¯ y ) then κ.f, i ∈ UAP(x, ϕ) for all 1 ≤ i ≤ arity(f ). Proposition 3. For all solutions σ for ϕ, for all variables x, for all annotated paths κ from UAP(x, ϕ), if σ(x) is non-empty then for all trees τ in σ(x), τ [κ] is defined. Proposition 4. For all variables x in ϕ, if there exists an unavoidable annotated path κ in UAP(x, ϕ) such that ϕ x[κ] ⊆ y, ϕ x[κ] ⊆ z and for two different function symbols f, g, the two constraints y = f (¯ y ), z = g(¯ z ) belong to ϕ then for any solution σ for ϕ, σ(x) is empty.
(I1) x ⊆ x ;  x ⊆ y, y ⊆ z → x ⊆ z
(I2) x ≠ ∅, x = f(x̄), y = f(ȳ), x ⊆ y → x̄ ⊆ ȳ
(I3) x ≠ ∅, x = f(x̄), y = f_i^{-1}(z), x ⊆ z → x_i ⊆ y
(I4) x = f_i^{-1}(y), y ⊆ z, z = f(z̄) → x ⊆ z_i
(I5) x = f_i^{-1}(x′), y = f_i^{-1}(y′), x ⊆ y → x′ ⊆ y′

(N1) x̄ ≠ ∅, x = f(x̄) → x ≠ ∅
(N2) x ≠ ∅, x = f_i^{-1}(x′) → x′ ≠ ∅
(N3) x ≠ ∅ → y ≠ ∅  if there exists κ in UAP(x, ϕ) such that ϕ ⊢ x[κ] ⊆ y

(C1) x ≠ ∅, y = f(ȳ), z = g(z̄) → ⊥  if there exists κ in UAP(x, ϕ) such that ϕ ⊢ x[κ] ⊆ y, ϕ ⊢ x[κ] ⊆ z
(C2) x ⊆ y, y = g(ȳ), f_i^{-1}(x) = x′, x′ ≠ ∅ → ⊥

Fig. 1. The axioms Ax
4.3 A Satisfiability Test

The Axioms Ax: We give in Figure 1 a set of axioms Ax on which our satisfiability test is based. To shorten the presentation, we write x̄ for a sequence x1, ..., xk of variables; the length k is most of the time fixed by the context. For instance, writing f(x̄) fixes k to be the arity of f. Similarly, we write x̄ ⊆ ȳ to denote the constraints x1 ⊆ y1, ..., xk ⊆ yk, and x̄ ≠ ∅ to denote the constraints x1 ≠ ∅, ..., xk ≠ ∅.
Let us discuss the axioms from Ax: the axioms (I1-5) as well as (N1-2) and (C2) are immediate. The axiom (N3) is less obvious and subsumes other axioms that we may think of; it is more general than x ≠ ∅, x ⊆ y → y ≠ ∅, as ε is always unavoidable and x ⊆ y implies ϕ ⊢ x[ε] ⊆ y. It also makes redundant the axiom x ≠ ∅, x = f(x̄) → x̄ ≠ ∅, as x = f(x̄) implies ϕ ⊢ x[⟨f, i⟩] ⊆ x_i for all 1 ≤ i ≤ arity(f). Furthermore, from the following constraint, using the axiom (N3),

x ≠ ∅ ∧ x ⊆ y ∧ y = f(y1, y2) ∧ g_2^{-1}(y1) = y′ ∧ x ⊆ z ∧ f_1^{-1}(z) = z′ ∧ z′ = g(z1, z2)

one can deduce y′ ≠ ∅, z1 ≠ ∅, z2 ≠ ∅, as the paths ⟨f, 1⟩.⟨g, 1⟩ and ⟨f, 1⟩.⟨g, 2⟩ are unavoidable for x.
Additionally, when the structure of sets of finite trees P(TΣ) is considered, a new axiom (OC) addressing the occur-check problem is required.

(OC) x ≠ ∅ → ⊥  if UAP(x, ϕ) is infinite.
This axiom is pretty unusual: traditionally [19,8], occur-check is stated as the existence of a non-trivial path from a non-empty variable to itself. So, one may expect an axiom such as:

(OC1) x ≠ ∅ → ⊥  if ϕ ⊢ x[κ] ⊆ x for an annotated path κ ≠ ε.

Unfortunately, this axiom is too strong, as ⊥ could be deduced from the constraint x ≠ ∅ ∧ f_1^{-1}(x) = y ∧ y ⊆ x, and obviously this constraint is satisfiable over sets of finite trees P(TΣ): for example, the valuation assigning {a, f(a)} to x and {a} to y is a solution. One may try to make the axiom (OC1) weaker by reasoning not about
annotated paths, but only about unavoidable annotated paths. We would have an axiom such as:

(OC2) x ≠ ∅ → ⊥  if ϕ ⊢ x[κ] ⊆ x for κ ∈ UAP(x, ϕ) and κ ≠ ε.

Unfortunately, the axiom (OC2) is too weak; consider the constraint

x ≠ ∅ ∧ x ⊆ y1 ∧ h_1^{-1}(y1) = y2 ∧ y2 = h(y1)
      ∧ x ⊆ z1 ∧ z1 = h(z2) ∧ h_1^{-1}(z2) = z1

One can see that this constraint admits no solution over sets of finite trees; nevertheless, ⊥ cannot be deduced using the axiom (OC2), as there is no non-trivial unavoidable annotated path κ leading from a variable to itself (intuitively, there is an unavoidable path from the intersection y1 ∩ z1 to itself). So, we simply require for the occur-check that the set UAP(x, ϕ) be infinite, and we will prove below that this statement is correct.

Theorem 2. The axioms Ax are valid for P(TΣω). The axioms Ax + (OC) are valid for P(TΣ).

Proof. By routine check for the axioms (I1-5), (N1-2) and (C2); using Proposition 3 for the axiom (N3) and Proposition 4 for the axiom (C1). Finally, for the axiom (OC), in the case of sets of finite trees P(TΣ), let us show that for any variable x, if x ≠ ∅ is in ϕ and UAP(x, ϕ) is infinite, then ϕ admits no solution. Only finitely many function symbols from Σ occur in ϕ, and thus their arities are bounded. As UAP(x, ϕ) is infinite, by König's Lemma we can extract from it an infinite sequence of annotated paths κ0, κ1, κ2, ... such that each κi is a prefix of κi+1. So, by Proposition 3, for any solution σ, for any tree τ in σ(x) and for any κi, τ[κi] is defined. Obviously, this requirement cannot be satisfied by any finite tree. So, ϕ is not satisfiable over P(TΣ).
We denote by Sat(ϕ) the least set of constraints containing ϕ and closed under those inference rules, that is, if the premises of an inference rule belong to Sat(ϕ) and its side condition is satisfied, then the conclusion belongs to Sat(ϕ). We moreover require that the variables of Sat(ϕ) are exactly those of ϕ; this is guaranteed by applying the axiom x ⊆ x from Ax only to variables from ϕ. Actually, the axioms Ax + (OC) can be viewed as an operator Sat over constraints that, given a constraint ϕ, computes all consequences of ϕ under these axioms. Moreover, for ϕ viewed as a collection of basic constraints, the set UAP(x, ϕ) is monotonic in the sense that ϕ ⊆ ϕ′ implies UAP(x, ϕ) ⊆ UAP(x, ϕ′). Thus, Sat is a monotonic operator over constraints: for two constraints ϕ, ϕ′, if ϕ ⊆ ϕ′ then Sat(ϕ) ⊆ Sat(ϕ′). So, over the lattice with ϕ as least element, the operator Sat admits a least fixpoint that we denote Satω(ϕ).

Theorem 3. If ⊥ ∈ Satω(ϕ) then ϕ is unsatisfiable.

Proof. Immediate from Theorem 2.
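To make the saturation concrete, here is a minimal sketch (ours, not the paper's) of such a monotonic closure operator. It uses a toy rule set — reflexivity and transitivity of ⊆, with inclusions encoded as pairs — where the paper's full rules (I1-5), (N1-3), (C1-2), (OC) with their side conditions would slot in the same way.

```python
def saturate(constraints, variables):
    """Least set containing `constraints`, closed under the toy rules;
    an inclusion x <= y is encoded as the pair (x, y)."""
    sat = set(constraints)
    sat |= {(x, x) for x in variables}   # x <= x, only for variables of phi
    changed = True
    while changed:                       # iterate up to the least fixpoint
        changed = False
        for (x, y) in list(sat):         # transitivity: x<=y, y<=z ==> x<=z
            for (y2, z) in list(sat):
                if y == y2 and (x, z) not in sat:
                    sat.add((x, z))
                    changed = True
    return sat

phi = {("x", "y"), ("y", "z")}
print(sorted(saturate(phi, {"x", "y", "z"})))
```

As in the text, the operator is monotonic: saturating a subset of ϕ yields a subset of Satω(ϕ).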
Atomic Set Constraints with Projection
323
We now prove the converse of Theorem 3: if ⊥ does not belong to Satω(ϕ) then ϕ admits a solution. We build from Satω(ϕ) a valuation σS defined for any variable x as follows: (i) σS(x) = ∅ if x ≠ ∅ ∉ Satω(ϕ), and (ii) otherwise, σS(x) is the set of all trees τ satisfying, for all f, π, κ, y s.t. π = res(κ), τ[κ] is defined and Satω(ϕ) ⊢ x[κ] ⊆ y:
– if y = f(ȳ) ∈ Satω(ϕ) then τ(π) = f, and
– if z = f_i^{-1}(y) ∈ Satω(ϕ), y ≠ ∅ ∈ Satω(ϕ) and z ≠ ∅ ∉ Satω(ϕ) then τ(π) ≠ f.

Theorem 4. If ⊥ ∉ Satω(ϕ) then σS is a solution for Satω(ϕ) and thus for ϕ.

4.4 A PSPACE Algorithm

In this section we devise a PSPACE algorithm deciding whether a constraint ϕ is unsatisfiable, that is, according to Theorems 3 and 4, deciding whether ⊥ belongs to Satω(ϕ). Presenting and proving the correctness of our algorithm requires some auxiliary propositions.

Proposition 5. For any constraint ϕ of size |ϕ| with n variables,
– for any i, the size of Sat^i(ϕ) is upper-bounded by |ϕ|² + 2|ϕ| + 1;
– there exists an integer p such that p ≤ n² + n + 1 and Sat^p(ϕ) = Satω(ϕ).

Testing that an annotated path κ is unavoidable for a variable x and a constraint ϕ is polynomial:

Proposition 6. Let ϕ be a constraint and κ an annotated path. Deciding whether κ ∈ UAP(x, ϕ) can be done in time polynomial in |ϕ| and |κ|, where |ϕ| is the size of ϕ and |κ| the length of κ.

Moreover, we can show that if the premises and side-conditions for the axioms (N3), (C1) hold for some annotated paths, then they hold for annotated paths whose length is at most exponential in the number of variables in ϕ.

Proposition 7. Let ϕ be a constraint and n the number of variables occurring in ϕ.
– for all κ in UAP(x, ϕ) such that ϕ ⊢ x[κ] ⊆ y, there exists κ′ in UAP(x, ϕ) such that ϕ ⊢ x[κ′] ⊆ y and |κ′| ≤ 2^n.
– for all κ in UAP(x, ϕ) such that ϕ ⊢ x[κ] ⊆ y and ϕ ⊢ x[κ] ⊆ z with y = f(ȳ) and z = g(z̄), there exist κ′ in UAP(x, ϕ), two variables y′, z′ and two different function symbols f′, g′ such that ϕ ⊢ x[κ′] ⊆ y′ and ϕ ⊢ x[κ′] ⊆ z′ with y′ = f′(ȳ′) and z′ = g′(z̄′) and |κ′| ≤ 2^{n²}.

Finally, we show that UAP(x, ϕ) is infinite as soon as it contains a sufficiently long path, and the length at which infiniteness can be tested is singly exponential in the number of variables occurring in ϕ.

Proposition 8. Let ϕ be a constraint and n the number of variables occurring in ϕ. If UAP(x, ϕ) contains a path κ of length 2^n + 1 then UAP(x, ϕ) is infinite.

Proposition 9. Testing the side-conditions for the inference rules (N3), (C1) and (OC) is in PSPACE.
Proof. We present a non-deterministic polynomial-space algorithm to test the side-condition for the rule (N3); the algorithms for (C1) and (OC) are similar. This algorithm is based on the result of Proposition 7, that is, if there exists an unavoidable path κ for x in ϕ leading to y, then there is an unavoidable path κ′ of length at most 2^n for x in ϕ leading to y.

V := {x}; Vn := ∅; c := 1
While y ∉ V and c ≤ 2^n do
  Guess f ∈ Σ and i ≤ arity(f) s.t. u = f(v̄) ∈ ϕ for some u ∈ V
  Vn := {w | ϕ ⊢ z[f, i] ⊆ w and z ∈ V}
  c := c + 1; V := Vn
Endwhile
Return (y ∈ V)

Obviously, this algorithm uses only polynomial space in n: the cardinality of the sets V and Vn is bounded by n, and the counter c can be coded on log(2^n) ≤ n + 1 bits. This non-deterministic algorithm can be turned into a deterministic one using Savitch's theorem [22] (NSPACE(n) ⊆ DSPACE(n²)).

Proposition 10. For any constraint ϕ, Sat^{i+1}(ϕ) can be computed from Sat^i(ϕ) with an algorithm running in space polynomial in |ϕ|² + 2|ϕ| + 1.

Proof. The computation is done as follows: we start from a constraint S equal to Sat^i(ϕ) and iterate the following steps until S is unchanged:
1. we saturate S by the axioms (I1-5), (N1-2) and (C2) in time polynomial in the size of Sat^i(ϕ) (upper-bounded by |ϕ|² + 2|ϕ| + 1; Proposition 5);
2. for any variable x and any constraint y ≠ ∅, we test whether the side-condition of (N3) holds for UAP(x, Sat^i(ϕ)) in polynomial space in the size of Sat^i(ϕ) (Proposition 9); if it holds, we add y ≠ ∅ to S;
3. for any variable x such that x ≠ ∅ ∈ S and any pair of constraints y = f(ȳ), z = g(z̄) from ϕ, we test whether the side-condition of (C1) holds for UAP(x, Sat^i(ϕ)) in polynomial space in the size of Sat^i(ϕ) (Proposition 9); if it holds, we add ⊥ to S;
4.
for the interpretation over sets of finite trees: for any variable x such that x ≠ ∅ ∈ S, we test whether the side-condition of (OC) holds for UAP(x, Sat^i(ϕ)) in polynomial space in the size of Sat^i(ϕ) (Proposition 9); if it holds, we add ⊥ to S.

The constraint Sat^{i+1}(ϕ) is the constraint S obtained at the end of the iteration process. As the size of Sat^{i+1}(ϕ) is bounded by |ϕ|² + 2|ϕ| + 1, the latter also provides an upper bound on the number of possible iterations.

Theorem 5. Satisfiability for atomic set constraints with projection is in PSPACE.

Proof. Immediate, combining the results of Propositions 5 and 10.
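The guessing loop in the proof of Proposition 9 can be re-read deterministically by exploring every guess at once; the sketch below (our encoding, not the paper's) trades the polynomial-space bound for exponential time — Savitch's construction would recover the space bound. The tables `defs` and `steps` are hypothetical encodings of ϕ: `defs[u]` lists the pairs (f, args) with u = f(args) in ϕ, and `steps[(z, f, i)]` lists the w with ϕ ⊢ z[f, i] ⊆ w.

```python
def reaches(x, y, n_vars, defs, steps):
    """Deterministic exploration of the guessing loop: does some chain of
    [f, i] steps lead from x to y within the 2^n bound of Proposition 7?"""
    frontier = {frozenset([x])}              # V := {x}
    for _ in range(2 ** n_vars):             # c <= 2^n
        if any(y in V for V in frontier):
            return True
        next_frontier = set()
        for V in frontier:
            # all guesses of f and i <= arity(f) with u = f(...) in phi, u in V
            guesses = {(f, i) for u in V for (f, args) in defs.get(u, [])
                       for i in range(1, len(args) + 1)}
            for f, i in guesses:
                # Vn := {w | phi |- z[f, i] <= w and z in V}
                vn = frozenset(w for z in V for w in steps.get((z, f, i), []))
                if vn:
                    next_frontier.add(vn)
        frontier = next_frontier
    return any(y in V for V in frontier)

# toy constraint: x = f(v) occurs in phi, and phi |- x[f, 1] <= y
defs = {"x": [("f", ["v"])]}
steps = {("x", "f", 1): ["y"]}
print(reaches("x", "y", 2, defs, steps), reaches("x", "z", 2, defs, steps))
```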
References

1. A. Aiken. Set Constraints: Results, Applications, and Future Directions. In Proc. of the 2nd International Workshop on Principles and Practice of Constraint Programming, LNCS 874, pages 326–335, 1994.
2. A. Aiken. Introduction to Set Constraint-Based Program Analysis. Science of Computer Programming, 35(2):79–111, 1999.
3. A. Aiken, D. Kozen, M. Vardi and E. L. Wimmers. The complexity of set constraints. In 1993 Conference on Computer Science Logic, LNCS 832, pages 1–17, 1993.
4. A. Aiken, D. Kozen and E. Wimmers. Decidability of Systems of Set Constraints with Negative Constraints. Information and Computation, 122(1):30–44, October 1995.
5. L. Bachmair, H. Ganzinger and U. Waldmann. Set constraints are the monadic class. In Proc. of the 8th IEEE Symp. on Logic in Computer Science, pages 75–83, 1993.
6. W. Charatonik and L. Pacholski. Negative Set Constraints with Equality. In Proc. of the 9th IEEE Symp. on Logic in Computer Science, pages 128–136, 1994.
7. W. Charatonik and L. Pacholski. Set Constraints with Projections are in NEXPTIME. In Proc. of the 35th Symp. on Foundations of Computer Science, pages 642–653, 1994.
8. W. Charatonik and A. Podelski. Set Constraints with Intersection. In Proc. of the 12th IEEE Symp. on Logic in Computer Science, pages 362–372, 1997.
9. W. Charatonik and A. Podelski. Co-definite Set Constraints. In Proc. of the 9th International Conference on Rewriting Techniques and Applications, LNCS 1379, pages 211–225, 1998.
10. W. Charatonik and J.-M. Talbot. Atomic Set Constraints with Projection. Research Report MPI-I-2002-2-008, Max-Planck-Institut für Informatik, 2002.
11. P. Devienne, J.-M. Talbot and S. Tison. Solving classes of set constraints with tree automata. In Proc. of the 3rd International Conference on Principles and Practice of Constraint Programming, LNCS 1330, pages 62–76, 1997.
12. R. Gilleron, S. Tison and M. Tommasi. Solving Systems of Set Constraints with Negated Subset Relationships. In Proc. of the 34th Symp. on Foundations of Computer Science, pages 372–380, 1993.
13. N. Heintze. Set Based Program Analysis. PhD thesis, Carnegie Mellon University, 1992.
14. N. Heintze and J. Jaffar.
A Finite Presentation Theorem for Approximating Logic Programs. In Proc. of the 17th Symp. on Principles of Programming Languages, pages 197–209, 1990.
15. N. Heintze and J. Jaffar. A Decision Procedure for a Class of Herbrand Set Constraints. In Proc. of the 5th IEEE Symp. on Logic in Computer Science, pages 42–51, 1990.
16. N. Heintze and J. Jaffar. Set constraints and set-based analysis. In Proc. of the Workshop on Principles and Practice of Constraint Programming, LNCS 874, pages 281–298, 1994.
17. F. Henglein and J. Rehof. The complexity of subtype entailment for simple types. In Proc. of the 12th IEEE Symp. on Logic in Computer Science, pages 362–372, 1997.
18. J. Niehren, M. Müller and J.-M. Talbot. Entailment of Atomic Set Constraints is PSPACE-Complete. In Proc. of the 14th IEEE Symp. on Logic in Computer Science, pages 285–294, 1999.
19. M. Müller, J. Niehren and A. Podelski. Inclusion Constraints over Non-Empty Sets of Trees. In Proc. of the 7th International Joint Conference CAAP/FASE (TAPSOFT'97), LNCS 1214, pages 345–356, 1997.
20. L. Pacholski and A. Podelski. Set Constraints: A Pearl in Research on Constraints. In Proc. of the 3rd International Conference on Principles and Practice of Constraint Programming, LNCS 1330, pages 549–561, 1997. Tutorial.
21. J. C. Reynolds. Automatic computation of data set definitions. Information Processing, 68:456–461, 1969.
22. W. Savitch. Relationships between nondeterministic and deterministic tape complexities. Journal of Computer and System Sciences, 4(2):177–192, 1970.
23. K. Stefansson. Systems of Set Constraints with Negative Constraints are NEXPTIME-Complete. In Proc. of the 9th IEEE Symp. on Logic in Computer Science, pages 137–141, 1994.
24. T. E. Uribe. Sorted Unification using Set Constraints. In Proc. of the 11th International Conference on Automated Deduction, LNAI 607, pages 163–177, 1992.
Currying Second-Order Unification Problems

Jordi Levy¹ and Mateu Villaret²
¹ IIIA, CSIC, Campus de la UAB, Barcelona, Spain. http://www.iiia.csic.es/~levy
² IMA, UdG, Campus de Montilivi, Girona, Spain. http://www.ima.udg.es/~villaret
Abstract. The Curry form of a term, like f(a, b), allows us to write it, using just a single binary function symbol, as @(@(f, a), b). Using this technique we prove that the signature is not relevant in second-order unification, and conclude that one binary symbol is enough. By currying variable applications, like X(a), as @(X, a), we can transform second-order terms into first-order terms, but we have to add beta-reduction as a theory. This is roughly what is done in explicit unification. We prove that by currying only constant applications we can reduce second-order unification to second-order unification with just one binary function symbol. Both problems are already known to be undecidable, but applying the same idea to context unification, for which decidability is still unknown, we reduce the problem to context unification with just one binary function symbol. We also discuss the difficulties of applying the same ideas to third- or higher-order unification.
1 Introduction
The Curry form of a term, like f (a, b), allows us to write it, using just a single binary symbol, as @(@(f, a), b), where @ denotes the explicit application. This helps to solve unification problems. In first-order logic, this transformation reduces a unification problem to a new unification problem containing a single binary symbol. The size of the new problem [and of the unifier] is similar to the size of the original problem [and of the original unifier]. So, from the point of view of complexity there is not a significant difference, but in practical implementations this allows representing terms as binary trees, and contexts as subterms, and has been used in term indexing data structures [GNN01]. In second-order logic the transformation is not so obvious. We can currify constant symbol applications and second-order variable applications, obtaining a first-order term. For instance, for f (X(a), Y ), where X is a second-order variable, we obtain @(@(f, @(X, a)), Y ), where both X and Y are now first-order variables. However, solvability of unification problems is not preserved by such transformation, unless we consider some form of first-order unification modulo
This research has been partially supported by the CICYT Research Projects DENOC (BFM2000-1054-C02), LOGFAC, and TIC2001-2392-C03-01
S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 326–339, 2002. © Springer-Verlag Berlin Heidelberg 2002
β-reduction for solving the new problem. For instance, the second-order unification problem F(G(a), b) =? g(a) is solvable, whereas its first-order Curry form @(@(F, @(G, a)), b) =? @(g, a) is unsolvable. Moreover, the right-hand side of the β-equation (λx . t1) t2 = t1[t2/x] is a meta-term, unless we make substitution explicit [ACCL98]. Roughly speaking, this is what is done in the so-called explicit unification [DHK00,BM00]. Here, we propose to currify function symbol applications, but not variable applications. Therefore, the new problem we get is also a second-order unification problem. For instance, for F(G(a), b) =? g(a), we get F(G(a), b) =? @(g, a), which is also solvable. In this case, we do not reduce the order of the unification problem, but we reduce the number of function symbols to just one: the application symbol @. It can be argued that this reduction is useless, since second-order unification [Gol81] was already known to be undecidable for just one binary function symbol [Far91], although applying the reduction to the results of [LV00] proves that second-order unification is undecidable for one binary function symbol and one second-order variable occurring four times. Moreover, the same reduction is applicable to context unification [Com98], for which decidability is still unknown [Com98,LV01,SS96,Lev96,SS98,SSS98,SSS99], and it allows concentrating the efforts on a very simple signature. We also think that currying could help to simplify the signature used in higher-order matching, and this could help to prove its decidability (or undecidability). If we currify function applications in a second-order [or context] unification problem, it is easy to prove that, if the original problem is solvable, then its Curry form is also solvable: we can currify the unifier of the original problem to obtain a unifier of its Curry form.
However, the converse is not true and, in general, solvability is not preserved by currying, as the following examples prove.

Example 1. The following context unification problem

g(F(G(a)), F(a), G(a)) =? g(f(a, b), H(a, b), H(X, a))
is unsolvable. However, its Curry form

@(@(@(g, F(G(a))), F(a)), G(a)) =? @(@(@(g, @(@(f, a), b)), H(a, b)), H(X, a))
is solvable and has the following unifier:

σ(F) = λx . @(x, b)
σ(G) = λx . @(f, x)
σ(H) = λx . λy . @(x, y)
σ(X) = f

Similarly, the following second-order unification problem

g(F(G(a)), F(G(a′)), F(a), F(a′), G(a), G(a′)) =? g(f(a, b), f(a′, b), H(a, b), H(a′, b), H(X, a), H(X, a′))
is also unsolvable, whereas its Curry form is solvable.
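This claimed unifier can be checked mechanically. The sketch below (our encoding, not the paper's) represents curried terms as nested tuples ("@", left, right), variable applications like F(G(a)) as ("F", ("G", "a")), and bindings as (parameters, body) pairs; it applies σ to both sides of the curried equation of Example 1:

```python
SIGMA = {
    "F": (("x",), ("@", "x", "b")),        # sigma(F) = lambda x . @(x, b)
    "G": (("x",), ("@", "f", "x")),        # sigma(G) = lambda x . @(f, x)
    "H": (("x", "y"), ("@", "x", "y")),    # sigma(H) = lambda x,y . @(x, y)
    "X": ((), "f"),                        # sigma(X) = f
}

def apply(sigma, t):
    """Apply a substitution, plugging arguments in for bound parameters."""
    if isinstance(t, str):
        if t in sigma and not sigma[t][0]:   # 0-ary variable
            return sigma[t][1]
        return t
    head, *args = t
    args = [apply(sigma, a) for a in args]
    if head in sigma:                        # F(t1,...,tn) with F bound
        params, body = sigma[head]
        rho = {x: ((), a) for x, a in zip(params, args)}
        return apply(rho, body)              # substitute x1..xn in the body
    return (head, *args)

lhs = ("@", ("@", ("@", "g", ("F", ("G", "a"))), ("F", "a")), ("G", "a"))
rhs = ("@", ("@", ("@", "g", ("@", ("@", "f", "a"), "b")),
       ("H", "a", "b")), ("H", "X", "a"))
print(apply(SIGMA, lhs) == apply(SIGMA, rhs))   # -> True
```

Both sides reduce to the same tree, @(@(@(g, @(@(f, a), b)), @(a, b)), @(f, a)), confirming that the Curry form is solvable.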
[Figure 1 shows, side by side, the trees of σ(@(@(@(g, F(G(a))), F(a)), G(a))) and σ(@(@(@(g, @(@(f, a), b)), H(a, b)), H(X, b))) as nested applications of @, with a cut over a left chain of "@" marked.]
Fig. 1. Graphic representation of the unifier of curried context unification problem of Example 1.
In the previous example, σ(F), σ(G), σ(H) and σ(X) are not “well-typed”, i.e. they are not the Curry form of any well-typed term. For instance, σ(F) = λx . @(x, b) is the Curry form of λx . x(b), but this term is third-order typed (and F is a second-order typed variable), and σ(G) = λx . @(f, x) is the Curry form of λx . f(x), but f has two arguments. This prevents us from reconstructing a unifier for the original unification problem from the unifier we get for its Curry form. We can also see that the original unification problems contain variables that “touch”. For instance, F touches G in F(G(a)), and H touches X in H(X, a). We will prove, for second-order and for context unification, that, if no variable touches any other variable, then solvability of the problems is preserved in both directions by our partial currying. It is easy to reduce second-order and context unification problems to problems satisfying this property. Therefore, we conclude that second-order and context unification can both be reduced to the partial Curry form, where only a binary function symbol @ is used. It is well known that word unification [Mak77] is a special case of context unification. Plandowski [Pla99] proves that if σ is a most general unifier of a word unification problem t =? u, then any substring of σ(t) “is over a cut”, i.e. there exists an occurrence of the substring in σ(t) that is not completely inside the instance of a variable. Something similar can be proved for second-order and for context unification. The pathology of Example 1 is due to the existence of a cut over a left chain of @ ended by a constant. For instance, in the example, the left chain @(@(f, . . .), . . .) is “cut” by F(G(. . .)), i.e. one piece is inside σ(F) and another inside σ(G) (see Figure 1). If variables “do not touch”, this situation is avoided, and satisfiability is preserved.
Our main result could be proved using a version of Plandowski’s theorem for second-order unification, but the proof would be longer than the one we present in this paper.
This paper proceeds as follows. In Section 2 we introduce some standard definitions and results about second-order and context unification. Most of our results hold for second-order and for context unification, and sometimes we do not make the distinction explicit. In Section 3 we define the partial Curry forms, where only function symbol applications are made explicit. In Section 4 we define a labeling on Curry forms that is used to characterize “well-typed” terms, i.e. terms that are the Curry form of some well-built term. In Section 5 we prove our main result: second-order and context unification can be reduced to a simplified form where only a single binary function symbol and constants are used. We conclude in Section 6 with a discussion of the difficulties of extending these results to higher order.
2 Preliminary Definitions
A second-order signature Σ is a finite disjoint union of finite sets of symbols Σ = ⋃_{n≥0} Σ_n, where symbols f ∈ Σ_n are said to be n-ary, noted arity(f) = n. We distinguish between constant symbols, when arity(a) = 0, and function symbols, when arity(f) > 0. Similarly, we define the set of variables X = ⋃_{n≥0} X_n, and distinguish between first-order variables (the set X_0) and second-order variables (the rest of the variables, ⋃_{n≥1} X_n). We use lambda bindings and the usual notion of bound and free variables. For simplicity, we assume that bound and free variables have distinct names, and use lower-case letters for bound variables and upper-case letters for free variables. The set of terms T(Σ, X) is defined as in [Gol81]. A first-order term is either a constant symbol a ∈ Σ_0, a first-order variable X ∈ X_0, or has the form f(t1, . . . , tn) or X(t1, . . . , tn), where f ∈ Σ_n, X ∈ X_n, and the ti's are first-order terms. A second-order term has the form λx1 . . . λxn . t, where t is a first-order term, n ≥ 1, and the xi's are (bound) first-order variables. In other words, we assume that any term is written in βη-long normal form, and we do not consider constants of third or higher order. Therefore, λ-abstractions always appear in the head of the term. The arity of a term is the number of λ-abstractions it has in the head. Therefore, first-order terms are the terms of arity zero.

A second-order unification problem is a pair of first-order terms, noted t =? u. Notice that all variables of t =? u occur free. A second-order substitution σ is a set of (variable, term) pairs, like [X1 → t1] . . . [Xn → tn], where Xi and ti have the same arity. Therefore, instances of second-order variables contain λ-abstractions. The application of a substitution σ to a first-order term t is defined recursively as follows:

σ(f(t1, . . . , tn)) = f(σ(t1), . . . , σ(tn))
σ(X) = t                                    if the pair X → t is in σ
σ(X) = X                                    otherwise
σ(F(t1, . . . , tn)) = ρ(u)                 if the pair F → λx1 · · · λxn . u is in σ,
                                            where ρ = [x1 → σ(t1)] . . . [xn → σ(tn)]
σ(F(t1, . . . , tn)) = F(σ(t1), . . . , σ(tn))   otherwise
Notice that in the previous definition we avoid defining instances of second-order terms, and therefore all problems related to variable capture. A substitution σ is said to be a unifier (or solution) of a second-order unification problem t =? u if σ(t) = σ(u). The definition of most general unifier is standard. A context unification problem is also a pair of first-order terms t =? u over a second-order signature. In the setting of context unification, second-order variables are called context variables. A context substitution is a second-order substitution [X1 → t1] . . . [Xn → tn] where, for every second-order term ti = λx1 · · · λxn . u, every xj occurs exactly once in u. Then, a context unifier of a context unification problem t =? u is a context substitution σ satisfying σ(t) = σ(u). Notice that second-order and context unification problems have the same presentation, and any solvable context unification problem is also solvable viewed as a second-order unification problem. Sometimes, context unification is defined restricting context variables to be unary. Here we consider n-ary variables, and use bound variables to denote the “holes” of the context (instead of the box used by other authors). If nothing is said, the signature of a problem is given by the set of constants that it contains and a denumerably infinite set of variables for every arity. For technical reasons we also assume that the signature contains at least a binary function symbol and a constant (which can be added if the problem does not contain any). The following is a basic property of most general second-order [and context] unifiers that will be required in some proofs. It ensures that the signature does not play an important role w.r.t. the decidability of the problem.

Property 1. Let t =? u be a second-order or a context unification problem, and σ be a most general unifier. Then, for any variable X, σ(X) does not contain constants not occurring in the problem t =? u.
Proof: Suppose that a most general unifier σ introduces a constant f not occurring in the problem. Then we can replace this constant everywhere by a fresh variable F of the same arity and get another unifier that is more general than σ (we can instantiate F by λx1 . . . λxn . f(x1, . . . , xn), but not vice versa). This contradicts the fact that σ is most general.
3 Currying Terms
Definition 1. Given a signature Σ = ⋃_{n≥0} Σ_n, the curried signature Σ^c = ⋃_{n≥0} Σ^c_n is defined by

Σ^c_0 = ⋃_{n≥0} Σ_n
Σ^c_2 = {@}
Σ^c_n = ∅ for n ≠ 0, 2
The currying function C : T(Σ, X) → T(Σ^c, X) is defined recursively as follows:

C(a) = a
C(x) = x
C(f(t1, . . . , tn)) = @(· · · @(f, C(t1)) · · ·, C(tn))
C(F(t1, . . . , tn)) = F(C(t1), . . . , C(tn))
C(λx . t) = λx . C(t)

for any constant a ∈ Σ_0, bound variable x, function symbol f ∈ Σ_n, and variable F ∈ X_n. The currying function is injective, but it is not onto, as the following definition suggests.

Definition 2. Given a term t ∈ T(Σ^c, X), we say that it is well-typed (w.r.t. Σ) if C^{-1}(t) is defined, i.e. if there exists a term u ∈ T(Σ, X) such that C(u) = t.

Lemma 1. If the second-order [context] unification problem t =? u over Σ is solvable, then the second-order [context] unification problem C(t) =? C(u) over Σ^c is also solvable.

Proof: Let σ be a unifier of t =? u; then it is easy to prove that the substitution σC defined as σC(F) = C(σ(F)) is a unifier of C(t) =? C(u).

In fact, we have proved a stronger result: given a unifier σ of t =? u, we can find a unifier σC of C(t) =? C(u) that satisfies the commutativity property C(σ(t)) = σC(C(t)). This commutativity property is represented by the following diagram:
[Diagram: t =? u  —C→  C(t) =? C(u); below, σ is mapped to σC, and σ(t)  —C→  σC(C(t)).]
Unfortunately, as shown in Example 1, the converse is not true. Given a unifier of C(t) =? C(u) it is not always possible to obtain a unifier of t =? u. In the next section, we describe sufficient conditions ensuring that the inverse construction is possible.
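The currying function C above, restricted to λ-free terms, can be sketched as follows (the tuple encoding and the upper-case-head convention for variable applications are ours):

```python
def curry(t):
    """Partial Curry form C(t): constant applications become left-nested
    @'s; variable applications (upper-case heads, the paper's convention)
    keep their arguments, and only the arguments themselves are curried."""
    if isinstance(t, str):
        return t                        # C(a) = a, C(x) = x
    head, *args = t
    if head[0].isupper():               # F(t1,...,tn) -> F(C(t1),...,C(tn))
        return (head, *[curry(a) for a in args])
    out = head                          # f(t1,...,tn) -> @(...@(f, C(t1))..., C(tn))
    for a in args:
        out = ("@", out, curry(a))
    return out

# C(f(a, b)) = @(@(f, a), b)
print(curry(("f", "a", "b")))
```

For instance, for f(X(a), Y) the sketch yields @(@(f, X(a)), Y): the variable applications X(a) and Y stay as they are, which is exactly what keeps the result a second-order problem rather than a first-order one.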
4
Labeling Terms
The first step towards a sufficient condition ensuring that the currying function preserves satisfiability is to characterize well-typed curried terms. This is done by labeling application symbols @ with the “arity” of their left argument, and using a “hat” to mark the roots of right arguments. If left arguments always have positive arity, and right arguments always have arity zero, then the term is well-typed.
Definition 3. Given a signature Σ = ⋃_{n≥0} Σ_n, the labeled signature Σ^L = ⋃_{n≥0} Σ^L_n is defined by:

Σ^L_0 = ⋃_{n≥0} Σ_n
Σ^L_2 = {@^l, @̂^l | l ∈ {. . . , −1, 0, 1, . . .}}
Σ^L_n = ∅ for n ≠ 0, 2
The labeling functions L, L̂ : T(Σ^c, X) → T(Σ^L, X) are defined by the following rules:
1. If the left child of an @ is an n-ary symbol f ∈ Σ_n, then it has label l = arity(f) − 1 = n − 1.
2. If the left child of an @ is a variable X ∈ X, or a bound variable, then it has label −1, regardless of the arity of the variable.
3. If the left child of an @ is another @ with label n, then it has label n − 1.
In the case of L̂ we also use the following rule:
4. If an @ is the right child of another @, or it is the child of a variable, or it is the root of the term, then, apart from the label, it also has a hat.

Example 2. The L̂-labeling of the term σ(C(t)), used in Example 1 and shown in Figure 1, is as follows:

@̂^0(@^1(@^2(g, @̂^0(@^1(f, a), b)), @̂^{−1}(a, b)), @̂^1(f, a))
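The label computation of rules 1-3 (ignoring hats) can be sketched as follows; combined with Lemma 2 below it yields a check that rejects terms whose labeling contains a negative label. The encoding is ours: `arity` maps the symbols of Σ to their arities, and any name absent from it (a variable or bound variable) contributes value 0, so both 0-ary constants and variables make their parent @ get label −1, as in rules 1 and 2.

```python
def label_value(t, arity, labels):
    """Value a subterm contributes to its parent @'s label, collecting
    every @'s label into `labels` on the way (rules 1-3, hats ignored)."""
    if isinstance(t, str):
        return arity.get(t, 0)           # n-ary symbol -> n; variable -> 0
    head = t[0]
    if head != "@":                      # variable application F(...): recurse
        for a in t[1:]:
            label_value(a, arity, labels)
        return 0                         # a variable head counts as 0 (rule 2)
    _, left, right = t
    lab = label_value(left, arity, labels) - 1   # rules 1-3 in one step
    labels.append(lab)
    label_value(right, arity, labels)
    return lab

def no_negative_labels(t, arity):
    """The negative-label half of the Lemma 2 check (hats not modelled)."""
    labels = []
    label_value(t, arity, labels)
    return all(l >= 0 for l in labels)

# C(f(a, b)) = @(@(f, a), b): inner @ gets label 1, outer gets 0
print(no_negative_labels(("@", ("@", "f", "a"), "b"), {"f": 2, "a": 0, "b": 0}))
# sigma(F)'s body @(x, b): left child is the bound variable x -> label -1
print(no_negative_labels(("@", "x", "b"), {"b": 0}))
```

The second call rejects σ(F) = λx . @(x, b) from Example 1, matching the observation that it is not the Curry form of any well-typed term.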
Notice that labels can be negative numbers. These negative labels do not appear in labelings of “well-typed” terms. Based on these labels, it is easy to characterize well-typed terms.

Lemma 2. A term t ∈ T(Σ^c, X) is well-typed if, and only if, L̂(t) does not contain application symbols with negative labels (@^{−n}, for n > 0) or with a hat and a non-zero label (@̂^n with n ≠ 0).
Proof: The only-if implication is obvious. For the if implication, assume that the labeling L̂(t) does not contain @^{−n}, with n > 0, or @̂^n, with n ≠ 0. Then, any @ symbol is in a sequence of the form

@^0(@^1(· · · @^{n−1}(f, t_1) · · ·, t_{n−1}), t_n)

where the node @^0 is a right child of another @, or the child of a variable F, or the root of the term. We can prove that this is the currying of f(C^{−1}(t_1), . . . , C^{−1}(t_n)), which is a well-constructed term, because f has n arguments and arity n.
5 When Variables Do Not Touch
In this section, we try to find sufficient conditions ensuring that, when we have a unifier for C(t) =? C(u), we can find a unifier for t =? u. The strategy to prove this result is summarized in the following diagram:

[Diagram: t =? u  —C→  C(t) =? C(u)  —L̂→  L̂(C(t)) =? L̂(C(u)); below, σ ⇐ σC ⇒ σL̂, and σ(t)  ←C^{−1}—  σC(C(t))  —L̂→  σL̂(L̂(C(t))).]
We will find a condition that makes the right square commute (Lemma 5). Then we will prove that when the right square commutes, then the left one also commutes (Lemma 6). This second commutativity property ensures that the currying transformation preserves satisfiability. The sufficient condition we have found is based on the following definition. Definition 4. Given a term t ∈ T (Σ, X ), we say that two variables X, Y ∈ X touch, if t contains a subterm of the form X(t1 , . . . , Y (u1 , . . . , um ), . . . , tn ). In the context unification problem of Example 1, the variable F touches G, and the variable H touches X. For technical reasons, before proving that the first square commutes (Lemma 5) we prove the same result using a variant of the labeling function where hats are not considered (notice that in Lemma 3 L’s have no hats).
Lemma 3. If the variables of t =? u do not touch, and σC is a most general unifier of C(t) =? C(u), then the substitution σL defined by

σL(F) = L(σC(F))

is a most general unifier of L(C(t)) =? L(C(u)), and satisfies

σL(L(C(t))) = L(σC(C(t)))

Proof: First, we prove that σL(L(C(t))) = L(σC(C(t))). As σC and σL only differ in the introduction of labels, both terms have the same form, except for the labels. Therefore, we only have to compare the labels of the corresponding @'s in both terms. There are two cases:
– If the occurrence of the @ is outside the instance of any variable, then this @ already occurs in C(t), and it is on a left chain of the form @^0(@^1(· · · @^{n−1}(f, . . .) · · ·, . . .), . . .), where the f and all the @'s in between already occur in C(t) (they have not been introduced by an instantiation either). Thus, the @ gets the same label in σL(L(C(t))) as in L(σC(C(t))), because this label only depends on the left descendants, and they have not been introduced by σC or σL.
– If the @ is inside the instance of a variable F, we have to prove that it gets the same label in σL(L(F(t1, . . . , tn))) = σL(F)(σL(L(t1)), . . . , σL(L(tn))) as in L(σC(F(t1, . . . , tn))). In the first case we label σC(F) before instantiating (so we have bound variables in the place of the arguments), whereas in the second case we label σC(F) after instantiating (so we already have the arguments ti). As we will see, in both cases the labels we get are the same. The root of one of the arguments ti can be a left descendant of the @, and its label will then depend on that argument. However, if variables do not touch, the head of any argument ti of F is a constant, and the head of C(ti) is either a 0-ary constant a or an @ with label 0. Therefore, the labels of the ancestors of the argument inside σC(F) will be the same if we replace the argument by a bound variable, and the label of the corresponding @ inside σL(F) will be the same.
[Figure: on the left, σ(F) = λx . . . . with the bound variable x at a leaf below a chain of @'s; on the right, L(σ(F(t))) with the @^0-headed instance of the argument in the place of x; the labels @^k, . . . , @^{n−1}, f along the chain are unchanged.]
Proof: As C(t) and C(u) are trivially well-typed, by Lemma 2, L(C(t)) and L(C(u)) will not contain @’s with negative labels. Let σL be the most gen? eral unifier of L(C(t)) = L(C(u)) given by Lemma 3. Now, by Property 1, as σL is a most general unifier, for any variable F , σL (F ) will not contain @’s with negative labels, either. We can conclude then that the head of any argument ti of F can not be a left child of an @. As far as the heads of σL (ti ) have zero label or are 0-ary constants, this situation would introduce a negative label in some @ inside σL (F ). ? Lemma 5. If the variables of t = u do not touch, and σC is a most general ? unifier of C(t) = C(u), then the substitution σL defined by
σL(F) = L(σC(F))
is a most general unifier of L(C(t)) =? L(C(u)), and satisfies
σL(L(C(t))) = L(σC(C(t))).
Proof: We already know that both terms have the same form and the same labels, thus we only have to prove that they have the same hats. Again, there are two cases:
336
J. Levy and M. Villaret
– If the occurrence of the @ is outside the instance of any variable, then the only situation we have to consider is the following. If the @ has as father a variable F in C(t), and after instantiation it becomes a left child of an @ inside σL(F), then it could lose the hat. However, if variables do not touch, this situation is not possible because, by Lemma 4, arguments ti of F never occur as left child of an @ in σL(F(t1, …, tn)).
– If the occurrence of the @ is inside the instance of a variable F, then we have to prove that whether the @ has a hat does not depend on the arguments of F. This is obvious because this fact does not depend on the descendants of the @. As in Lemma 3, this allows us to replace arguments by bound variables and get a unifier σL for our problem.
Using the argument of Lemma 3, we conclude that σL is a most general unifier of L(C(t)) =? L(C(u)).
Lemma 6. If the variables of t =? u do not touch, and σC is a most general unifier of C(t) =? C(u), then there exists a most general unifier σ of t =? u that satisfies C(σ(t)) = σC(C(t)).
Proof: Let σL be the most general unifier of L(C(t)) =? L(C(u)) given by Lemma 5. As C(t) and C(u) are well-typed, by Lemma 2, they do not contain negative labels nor hats over non-zero labeled @'s. Then, by Property 1, σL does not introduce such labels or hats. Therefore, as σL(F) is defined as the labeling of σC(F), using again Lemma 2, σC(F) will be well-typed, and we can define: σ(F) = C⁻¹(σC(F)).
Theorem 1. Decidability of second-order [context] unification can be NP-reduced to decidability of second-order [context] unification with just one binary function symbol and constants.
Proof: By Lemmas 1 and 6, we know that, when variables do not touch, satisfiability of second-order and context unification problems is preserved by currying. Now, we will prove that we can NP-reduce solvability of second-order and context unification to solvability of the corresponding problems without touching variables. For second-order unification the reduction is as follows. For every n-ary variable F, we conjecture one of the following possibilities:
– Project: F → λx1 … λxn. xi, for some i ∈ {1, …, n}.
– Instantiate: F → λx1 … λxn. f(F1(x1, …, xn), …, Fm(x1, …, xn)), for some constant f ∈ Σm occurring in the original unification problem, F1, …, Fm being fresh free variables.
Obviously, this reduction can be performed in polynomial nondeterministic time. Since the new problem is an instance of the original one, if the new problem is solvable, so is the original one. If the original problem is solvable, and σ is a most general unifier, then, for every variable F, let σ(F) = λx1 … λxn. t be written in normal form. Taking t as a tree, descend from the root to the leftmost leaf, discarding free variables, until you get a constant f (this must be a constant occurring in the problem, by Property 1), a 0-ary variable, or a bound variable xi. It is easy to prove that the instantiation F → λx1 … λxn. xi, if we find a bound variable xi; F → λx1 … λxn. a for some fixed constant a, if we find a 0-ary variable; or F → λx1 … λxn. f(F1(x1, …, xn), …, Fm(x1, …, xn)), if we find a constant f, results in a solvable problem. In fact, the solution of the new problem is σ composed with a substitution that projects as G → λx1 … λxn. x1 the free variables that we have discarded during the traversal, and as X → a the 0-ary variables that we have found.
For context unification the reduction is as follows. For every n-ary variable F, we conjecture one of the following possibilities:
– Project: F → λx. x, if it is unary.
– Instantiate: F → λx1 … λxn. f(F1(xτ(1), …, xτ(r1)), …, Fm(xτ(rm−1+1), …, xτ(n))), for some constant f ∈ Σm occurring in the original unification problem, some permutation τ, and F1, …, Fm being fresh free variables.
As in the second-order case, it can be proved that this nondeterministic reduction preserves satisfiability. However, in this case we have to assume that the original signature (the problem) contains at least a binary function symbol and a 0-ary constant.
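To make the currying transformation concrete, here is a minimal sketch (the tuple representation is ours, not the paper's): a first-order term f(t1, …, tn) over the original signature becomes the left chain of explicit applications @(… @(@(f, t1), t2) …, tn).

```python
# Currying sketch: a first-order term is (head, [args]) or a bare constant
# name; the curried form uses only the single binary symbol '@'.

def curry(term):
    head, args = term if isinstance(term, tuple) else (term, [])
    out = head
    for a in args:
        # left chain: apply the accumulated operator to the next argument
        out = ('@', out, curry(a))
    return out
```

For example, curry(('f', ['a', 'b'])) yields ('@', ('@', 'f', 'a'), 'b'), i.e., @(@(f, a), b).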
6 Conclusions and Further Work
Currying terms is a standard technique in functional programming and has been used in practical applications of automated deduction. It is also used in higher-order unification via explicit substitutions or explicit unification. However, in these cases not only applications, but also lambda abstractions are made explicit, and unification is performed modulo the explicit substitution rules. Here, we propose a partial currying transformation for second-order unification, where the "order" of the unification problem is not reduced, as in explicit unification, but the signature is simplified. The transformation is not trivial, and we prove that, to preserve solvability of the problems, we need to ensure that "variables do not touch". The reduction also works for context unification. This allows us to concentrate on a simpler signature containing constant symbols and just one binary function symbol: the explicit application symbol @. Decidability of higher-order matching is still an open question. Proving that higher-order matching can be curried, i.e., that we can simplify the signature,
could contribute to proving its decidability or undecidability. The extension of our technique to third and higher orders is proposed as further work. The first difficulty we find in trying to apply our transformation to third- or higher-order matching problems is that we must deal with instances of variables that are not connected. For instance, the matching problem
f(F(λx.g(x), a), F(λx.g′(x), a′)) =? f(f(g(h(a)), a), f(g′(h(a′)), a′))
is solved by the substitution F → λxλy. f(x(h(y)), y), where the instance of F is split into two pieces, f and h. In such situations we have to guarantee that these pieces do not touch, to avoid that these "cuts" (in the sense of Plandowski [Pla99]) cut a left chain of @'s.
Acknowledgments. We thank Robert Nieuwenhuis for suggesting the use of currying in second-order unification problems. We also thank all the anonymous referees for their helpful comments.
References
[ACCL98] M. Abadi, L. Cardelli, P.-L. Curien, and J.-J. Lévy. Explicit substitutions. Journal of Functional Programming, 1(4):375–416, 1998.
[BM00] Nikolaj Bjørner and César Muñoz. Absolute explicit unification. In Proceedings of the 11th Int. Conf. on Rewriting Techniques and Applications (RTA'00), volume 1833 of LNCS, pages 31–46, Norwich, UK, 2000.
[Com98] Hubert Comon. Completion of rewrite systems with membership constraints. Journal of Symbolic Computation, 25(4):397–453, 1998.
[DHK00] G. Dowek, T. Hardin, and C. Kirchner. Higher-order unification via explicit substitutions. Information and Computation, 157:183–235, 2000.
[Far91] W. M. Farmer. Simple second-order languages for which unification is undecidable. Theoretical Computer Science, 87:173–214, 1991.
[GNN01] Harald Ganzinger, Robert Nieuwenhuis, and Pilar Nivela. Context trees. In Proceedings of the First Int. Joint Conf. on Automated Reasoning (IJCAR 2001), volume 2083 of LNCS, pages 242–256, Siena, Italy, 2001.
[Gol81] W. D. Goldfarb. The undecidability of the second-order unification problem. Theoretical Computer Science, 13:225–230, 1981.
[Lev96] Jordi Levy. Linear second-order unification. In Proceedings of the 7th Int. Conf. on Rewriting Techniques and Applications (RTA'96), volume 1103 of LNCS, pages 332–346, New Brunswick, New Jersey, 1996.
[LV00] Jordi Levy and Margus Veanes. On the undecidability of second-order unification. Information and Computation, 159:125–150, 2000.
[LV01] Jordi Levy and Mateu Villaret. Context unification and traversal equations. In Proceedings of the 12th Int. Conf. on Rewriting Techniques and Applications (RTA'01), volume 2051 of LNCS, pages 167–184, Utrecht, The Netherlands, 2001.
[Mak77] G. S. Makanin. The problem of solvability of equations in a free semigroup. Math. USSR Sbornik, 32(2):129–198, 1977.
[Pla99] Wojciech Plandowski. Satisfiability of word equations with constants is in PSPACE. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science (FOCS'99), pages 495–500, New York, NY, USA, 1999.
[SS96] Manfred Schmidt-Schauß. An algorithm for distributive unification. In Proceedings of the 7th Int. Conf. on Rewriting Techniques and Applications (RTA'96), volume 1103 of LNCS, pages 287–301, New Jersey, USA, 1996.
[SS98] Manfred Schmidt-Schauß. A decision algorithm for distributive unification. Theoretical Computer Science, 208:111–148, 1998.
[SSS98] Manfred Schmidt-Schauß and Klaus U. Schulz. On the exponent of periodicity of minimal solutions of context equations. In Proceedings of the 9th Int. Conf. on Rewriting Techniques and Applications (RTA'98), volume 1379 of LNCS, pages 61–75, Tsukuba, Japan, 1998.
[SSS99] Manfred Schmidt-Schauß and Klaus U. Schulz. Solvability of context equations with two context variables is decidable. In Proceedings of the 16th Int. Conf. on Automated Deduction (CADE-16), LNAI, pages 67–81, 1999.
A Decidable Variant of Higher Order Matching
Dan Dougherty (Wesleyan University, USA) and Tomasz Wierzbicki (University of Wroclaw, Poland)
Abstract. A lambda term is k-duplicating if every occurrence of a lambda abstractor binds at most k variable occurrences. We prove that the problem of higher order matching where solutions are required to be k-duplicating (but with no constraints on the problem instance itself) is decidable. We also show that the problem of higher order matching in the affine lambda calculus (where both the problem instance and the solutions are constrained to be 1-duplicating) is in NP, generalizing de Groote’s result for the linear lambda calculus [4].
1 Introduction
A lambda term is called linear if every bound variable occurs in it exactly once. In [4] de Groote defined linear higher order matching as the problem of deciding if there are linear solutions of a matching problem M =? N, where both terms M and N are also linear, and showed that this problem is NP-complete. This relatively low complexity suggests that this is a quite restricted version of matching. Although de Groote's definition is well justified by the applications he is concerned with, it seems interesting to consider another version, which may also be called linear, but which consists in deciding if there are linear solutions of an arbitrary rather than linear matching problem (incidentally, Levy [5] defined an analogous notion of linear second order unification in this way, by insisting on linearity of solutions, but allowing arbitrary terms in problems). This new linearity condition is much less restrictive than that of de Groote, since the new problem inherits the full complexity of β-reduction, and the best known lower bound for general matching also holds for it: the problem is non-elementary (see 1.1). In the present paper we show decidability of this new version of matching. In fact we show that a rather more general problem is decidable. Say that a lambda term is k-duplicating if every bound variable occurs in it at most k times (Definition 2). We prove that, for any fixed k, a complete set of k-duplicating solutions of an arbitrary matching problem is always finite, and the problem of finding such a set is decidable. Finally we strengthen de Groote's result slightly in the following way. Say that a term is affine if it is 1-duplicating (an affine function may, like a linear one, use its argument once, but it may also forget it). Using tools developed in
Supported by KBN grant 8T 11C 04319
S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 340–351, 2002. © Springer-Verlag Berlin Heidelberg 2002
the paper we are able to replace "linear" with "affine" in de Groote's theorem and prove that the problem of deciding if an affine matching problem has affine solutions is in NP (and thus is NP-complete). All of our results apply to matching under β-equality as well as βη-equality. The k-duplicating matching is especially appealing in the λβ calculus, since it turns out to be a quite general decidable approximation of β-matching, which has recently been shown to be undecidable [6]. On the other hand the decidability of k-duplicating matching in the βη-calculus sheds no light on the status of the full problem in this setting, which is still open. There have been several other attempts to define restricted variants of matching for which decidability could be proved. The most successful include: bounding the order of solutions by 3 [3], 4 [9,10,1], or (in a special case) by 5 [12], and restricting the signature to first order (atomic) constants [8] (requiring at the same time that both sides of an instance are of base type). On the other hand an approach similar to ours has been applied to the problem of higher order unification [11].
1.1 Lower Bounds
To put our results in perspective it may be useful to recall that all of the variants of matching thus far studied in the full lambda calculus are non-elementary, since a straightforward reduction from the problem of checking β-convertibility [14] can be applied to them as well. We shall briefly recall this argument here. It is well known that the problem "given a closed pure term M of type o² → o, is it true that M →*βη λxy.x?" is non-elementary [13,7]. Since it is required that both terms M and N in the matching problem M =? N should be in normal form, we cannot just write the equation M =? λxy.x. However we may easily reduce this problem to matching in the following way: replace any redex (λxi.Pi)Qi in M with a term fi(λxi.Pi)Qi, obtaining this way a term M′ which does not contain redexes, where the fi are "fresh" variables of appropriate types. Write the equations fi =? λxy.xy together with M′ =? λxy.x. If one insists on having a single equation instead of a set {Mj =? Nj}, j = 1, …, n, it suffices to transform it to λz.zM1…Mn =? λz.zN1…Nn. The only solution of the set {fi =? λxy.xy}i is the substitution θ = [fi/λxy.xy]i. Since M′θ →*β M, the whole problem has a (unique) solution θ if and only if M →*β λxy.x, which turns out to be non-elementary. The solution θ is linear. Of course M is not linear. In the case of a nonempty signature it is sometimes required that both sides of a matching equation are of base type o. Suppose that the signature contains two constants 0 and 1 of type o. Then a reduction from the problem of checking whether M →*βη λxy.x to matching proceeds as follows: write the term M 1 0, "hide" all β-redexes using fresh variables fi, obtaining a term M′, then write the equation M′ =? 1 together with the equations fi(λx.x)0 =? 0 and fi(λx.x)1 =? 1. Again the only solution of the last pair of equations is the substitution [fi/λxy.xy]. Recall that a matching problem is atomic if all right hand sides of equations are single constants of type o.
Atomic matching is decidable [8]. Thus pure as well as atomic matching with linear solutions is non-elementary. Hence pure and atomic k-duplicating matching, for any k ≥ 1, as well as general matching, is also non-elementary.
2 Preliminaries
Let T be the set of simple types built over one base type o and given by the following grammar: T ::= o | T → T. We assume that → associates to the right, and omit parentheses whenever possible. Small Greek letters σ, τ, ρ, etc., denote arbitrary types. The notions of size and order of types are defined as usual:
|o| = 1
|σ → τ| = 1 + |σ| + |τ|
order(o) = 1
order(σ → τ) = max(order(σ) + 1, order(τ))
Let X, type(·) be a set of variables together with a function type : X → T, which assigns a unique type to each variable x ∈ X. We assume that there are infinitely many variables of each type. Similarly let Σ, type(·) be a collection of constants. The set Σ is sometimes called a signature. This set may be empty, finite or infinite (we say that the lambda calculus is pure if Σ = ∅, and atomic if type(c) = o for all c ∈ Σ). We use small Latin letters x, y, z, etc., to denote arbitrary variables, and a, c, f, etc., to denote arbitrary constants. Let Xσ = {x ∈ X | type(x) = σ} and Σσ = {c ∈ Σ | type(c) = σ} be the sets of, respectively, variables and constants of type σ. For all types σ ∈ T we simultaneously define the sets Λσ of lambda terms of type σ to be the smallest sets satisfying the following conditions:
1. Xσ ⊆ Λσ,
2. Σσ ⊆ Λσ,
3. if x ∈ Xσ and M ∈ Λτ, then λx.M ∈ Λσ→τ,
4. if M ∈ Λσ→τ and N ∈ Λσ, then MN ∈ Λτ.
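The size and order of types defined above are straightforward to compute mechanically; a minimal sketch (the tuple encoding of types — 'o' for the base type, ('->', σ, τ) for arrow types — is our own, not the paper's):

```python
# Size and order of simple types: |o| = 1, |s -> t| = 1 + |s| + |t|;
# order(o) = 1, order(s -> t) = max(order(s) + 1, order(t)).

def size(ty):
    return 1 if ty == 'o' else 1 + size(ty[1]) + size(ty[2])

def order(ty):
    return 1 if ty == 'o' else max(order(ty[1]) + 1, order(ty[2]))
```

For instance, (o → o) → o is third order, while o → o → o is still second order, since only the left-hand side of an arrow raises the order.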
We use capital letters M, N, P, Q, etc., to denote arbitrary terms. If M ∈ Λσ, we sometimes write type(M) = σ or M : σ. The size of a term is defined inductively:
|x| = 1
|c| = 1
|MN| = 1 + |M| + |N|
|λx.M| = 1 + |M|
An address is a sequence p ∈ {0, 1}*. The empty sequence is denoted by ε. The subterm M|p of a term M at address p is defined as follows:
M|ε = M
(λx.M)|0p = M|p
(M1M2)|0p = M1|p
(M1M2)|1p = M2|p
We say that an address p is valid in a term M if M|p is well defined. Note that (M|p)|q = M|pq, where pq is the concatenation of the sequences p and q. We say
that a term N is a subterm of M, and write N ⊆ M, if there exists an address p such that M|p = N. The result of the substitution of a term N at address p in a term M, provided that p is valid in M and type(M|p) = type(N), is the term M[p←N] obtained by replacing the subterm at address p in M with N. Similarly we define a simultaneous substitution M[pi←Ni], i = 1, …, k, provided there are no two addresses pi and pj such that pi is a prefix of pj. The set Occ(x, M) of free occurrences of a variable x in a term M is the set of all addresses in M at which x occurs as a free variable. The set FV(M) of free variables of a term M is the set of those variables which have at least one free occurrence in M. A term M is closed if FV(M) = ∅. The substitution M[x/N] of a term N for a variable x in a term M, provided that type(x) = type(N), is the result of substituting the term N at all free occurrences of the variable x in M. Similarly we define the simultaneous substitution M[xi/Ni], i = 1, …, n. As usual we consider the β- and η-reduction rules:
(λx.M)N ▷β M[x/N]
λy.My ▷η M
for any variables x, y and terms M, N such that y ∉ FV(M). A binary relation R on Λσ is monotonic if for all terms M, N, P, Q and every variable x such that M R N and P R Q, we have (MP) R (NQ) and (λx.P) R (λx.Q). The one step β-reduction →β is the monotonic closure of ▷β, the one step η-reduction →η is the monotonic closure of ▷η, the (many step) β-reduction →*β is the monotonic, reflexive and transitive closure of ▷β, and the (many step) βη-reduction →*βη is the monotonic, reflexive and transitive closure of ▷β ∪ ▷η. The two relations →*β and →*βη lead to two slightly different lambda calculi. A term M is a β-redex if it is of the form (λx.M1)M2; a term is in β-normal form if it contains no β-redex. A term M is in η̄-normal form (also called η-long form) if for every address p in M such that M|p is not an abstraction and type(M|p) ≠ o, we have p = q0 and M|q is an application. βη̄-normal forms are unique modulo βη-conversion and are closed under substitutions and β-reduction. A β-reduction sequence is a sequence of terms (Mi), i = 0, …, n, such that Mi →β Mi+1 for i = 0, …, n − 1. We say that this reduction sequence starts at M0, leads to Mn, and has length n. We say that a term M is an instance of N, and write M ≥ N (or say that N is a generalization of M, and write N ≤ M), if there exists a substitution θ such that M = Nθ. The relation ≤ is a preorder on terms (i.e., it is reflexive and transitive). Even though ≤ is not an order (it is not antisymmetric), the notions of the smallest and greatest elements, lower and upper bounds, and suprema and infima of sets of terms with respect to ≤ are well defined. Unlike for order relations, these elements are usually not unique (but they are unique modulo renaming of free variables).
Lemma 1 (basic properties of the instance relation).
1. Every nonempty set P ⊆ Λσ of terms has an infimum, inf P. In particular, the set of the smallest elements in Λσ coincides with the set of variables Xσ.
2. If a nonempty set P possesses an upper bound, then it has a supremum, sup P.
3. If M1 ≤ M2 and p is a valid address in M1, then p is a valid address in M2, and M1|p ≤ M2|p.
4. If there exists a substitution θ such that M1θ = M2 and N1θ = N2, and p is a valid address in M1, then p is a valid address in M2, and M1[p←N1] ≤ M2[p←N2].
5. If there exists a substitution θ such that M1θ = M2, N1θ = N2, and x ∉ Dom(θ), then M1N1 ≤ M2N2, and λx.M1 ≤ λx.M2.
6. If M1 ≤ M2 (with the additional proviso that there exists a substitution θ such that M1θ = M2 and every x ∈ Dom(θ) occurs at most once in M1), and p is a valid address in M2 but not in M1, then M1 ≤ M2[p←N] for any term N.
7. If FV(M) = ∅ and M ≤ N, then M = N.
8. If M ≤ N, and M|p is a β-redex, then N|p is a β-redex.
9. If M ≤ N, then |M| ≤ |N|.
10. If M = sup{Ni}, i = 1, …, k, where for each i there exists a substitution θi such that Niθi = M and every x ∈ Dom(θi) occurs at most once in Ni, then |M| ≤ 1 − k + |N1| + ⋯ + |Nk|.
We define a similar instance relation for substitutions: we say that a substitution θ1 is at least as general as θ2, and write θ1 ≤ θ2, if there exists a substitution ρ such that θ2 = θ1ρ (i.e., if xθ1 ≤ xθ2 for every variable x ∈ X).
2.1 Higher-Order Matching
Higher-order βη- (resp. β-) matching is the following problem: "Given two terms M and N of the same type, in βη̄- (resp. β-) normal form, where N is closed, usually written in the form of an equation M =? N. Is there a substitution θ for the free variables in M, where all terms xθ for x ∈ FV(M) are in βη̄- (resp. β-) normal form, such that Mθ →*β N?" We say that θ is a solution of the instance M =? N. There is a subtle difference between these two variants of matching [6]. In particular it is known that β-matching is undecidable, while decidability of βη-matching is still open. The results of the present paper apply to both variants. A set S of substitutions is a complete set of solutions of a matching problem M =? N if for every θ ∈ S and every θ′ ≥ θ, the substitution θ′ is a solution of M =? N, and for every solution θ′ of M =? N there exists a substitution θ ∈ S such that θ ≤ θ′. It is not the case that if a matching problem M =? N has a solution then it has a closed solution, since, depending on the signature Σ, the types of some of the free variables of M may not be inhabited. This motivates the following definition and lemma.
Definition 1. A term is almost-closed if its free variables (if any) are all of base type. A substitution θ is almost-closed if each θ(x) is an almost-closed term.
Lemma 2. If a matching problem has a solution then it has an almost-closed solution.
Proof. If θ is a solution to M =? N and θ(x) has a variable v : τ free then we may choose some base-type variable z and replace v by the term λy1 . . . yn .z : τ ; since v does not occur in N this is obviously still a solution.
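As an aside, the addresses and subterm replacement used throughout Section 2 can be sketched mechanically (the tuple representation of terms — ('var', x), ('lam', x, body), ('app', f, a) — is ours, not the paper's):

```python
# Subterm access M|p and replacement M[p <- N] by address, where an address
# is a string over {'0','1'}: '0' descends into a lambda body or the operator
# of an application, '1' into the argument.

def subterm(term, p):
    if p == '':
        return term
    if term[0] == 'lam':
        assert p[0] == '0', "lambda has only child 0"
        return subterm(term[2], p[1:])
    assert term[0] == 'app'
    return subterm(term[1 + int(p[0])], p[1:])

def replace(term, p, new):
    if p == '':
        return new
    if term[0] == 'lam':
        return ('lam', term[1], replace(term[2], p[1:], new))
    if p[0] == '0':
        return ('app', replace(term[1], p[1:], new), term[2])
    return ('app', term[1], replace(term[2], p[1:], new))
```

Note that subterm(subterm(t, p), q) equals subterm(t, p + q), matching the identity (M|p)|q = M|pq stated above.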
3 Decidability of k-Duplicating Matching
Definition 2. A lambda term M is k-duplicating if for every subterm N ⊆ M such that N = λx.P, there are at most k free occurrences of x in P, i.e., |Occ(x, P)| ≤ k.
In this section we show the decidability of k-duplicating matching, that is, matching with no restriction on the problem instance but with the constraint that solutions must be k-duplicating. It will suffice to show that we can, given a matching problem, effectively generate a finite set of terms which includes all the potential k-duplicating solutions to the problem. The essential insight is that for each k there are in fact only finitely many pure k-duplicating normal forms at each type (independently of the matching problem). When the given matching problem contains constants there is a slight complication, since we must allow for the fact that these constants may occur in solutions. But Lemma 3 below shows that we can bound the number of occurrences of constants in potential solutions by inspecting the problem.
Definition 3. Let N be a normal-form term over the signature Σ and let c ∈ Σ. Then #(c, N) is the number of occurrences of c in N.
Lemma 3. Let M =? N be a matching problem and let c be some constant. For both β- and βη-matching: if the matching problem has a solution, it has an almost-closed solution in which each solution term X satisfies #(c, X) ≤ #(c, N).
Proof. For simplicity of notation suppose there is only one free variable x in M. Let X be a solution, so that M[x/X] = N. By Lemma 2 we may assume that X is almost-closed. Let us define the notion of a c-occurrence in X being relevant to the solution, as follows. For each occurrence of c inside of X choose some fresh constant of the same type as c: let us call these new ones c1, …, cm. Build X′ by replacing the given occurrences of c by the ci in X, and let N′ be the normal form of M[x/X′]. Obviously N′ will differ from N just in that some of the c occurrences in N will now be certain ci.
The original c occurrences in X whose corresponding ci appear in N′ are declared to be the relevant ones. Now: obviously no more than #(c, N) such ci will occur in N′, so there are at most this many relevant occurrences in X. If in X we replace the non-relevant c-occurrences by an arbitrary term of the right type then we still have a solution. If we replace each non-relevant occurrence by an almost-closed term, say λy1…yn.z where z is some base-type variable, the lemma follows.
Definition 4. Let σ ≺ τ mean that order(σ) is less than order(τ). Let ≺* be the multiset extension of ≺.
Definition 5. Given M = λx1…λxn.B, let ⟨M⟩ be the multiset of types formed by including type τ with multiplicity m if τ is not a base type and in B there are m occurrences of free variables or constants of type τ. Then write M ≺* N if ⟨M⟩ ≺* ⟨N⟩.
Note that in computing ⟨M⟩ we consider, in B, constants and the xi as well as variables which are not among the xi. There should be no ambiguity in using ≺* to refer to both the relation on type-multisets and the relation on terms. The relations ≺* are well-founded [2].
Lemma 4. Let M ≡ λx1…λxn.hM1⋯Mp (where the atom h is either a constant, one of the xi, or a free variable). Then Mj ≺* M for all j.
Proof. Let the type of M be σ1 → ⋯ → σn → σ; let the type of Mi be µi, for 1 ≤ i ≤ p; so that the type of h is µ = µ1 → ⋯ → µp → σ. Of course if h has base type then p = 0 and the lemma is vacuously true. To verify the claim in the non-trivial case: let µj be δ1 → ⋯ → δd → o. In comparing ⟨Mj⟩ with ⟨M⟩ we have lost one occurrence of µ, the type of h, and possibly added occurrences of the types δ1, …, δd, due to the fact that Mj may have bound variables of these types (free in the matrix of Mj but not free in M). But each δk is of lower order than µ. This establishes the lemma.
The preceding is a general fact about all terms, k-duplicating or not. But the hypothesis of k-duplication and bounds on the occurrences of constants permit us to bound ⟨M⟩.
Proposition 1. Let w be a function assigning to each constant c in Σ a nonnegative integer w(c). For each σ and k we can effectively write down finitely many terms M comprising all of the almost-closed β-normal forms of type σ which are k-duplicating and satisfy #(c, M) ≤ w(c).
Proof. Let M : σ1 → ⋯ → σn → o satisfy the hypotheses. Recall that base-type variables do not figure in computing ⟨M⟩. Since M is almost-closed we may observe that ⟨M⟩ is bounded by the multiset consisting of
– k copies of each σi, and
– for each constant c of non-base type τ, #(c, M) copies of τ.
Together with the fact that ≺* is well-founded, this means that we may effectively bound the depth of all the M satisfying the hypotheses. This establishes the proposition.
So we can solve the general k-duplicating matching problem by an exhaustive search:
Theorem 1. The problems of k-duplicating β-matching and βη-matching are each decidable.
Proof. Let a problem M =? N be given. For each free variable x : σ of M we may, by Proposition 1, compute the set of almost-closed k-duplicating β-normal forms X of type σ satisfying #(c, X) ≤ #(c, N) for each c. If we are considering βη-matching we filter out those terms which are not in η̄-normal form. By Lemma 3 this is a complete search procedure for candidates for a solution.
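The k-duplicating condition of Definition 2 is straightforward to check mechanically; a sketch, using an ad hoc tuple representation of terms (('var', x), ('lam', x, body), ('app', f, a); ours, not the paper's) and assuming all bound-variable names are distinct:

```python
# Check Definition 2: every abstraction lambda x. P in the term has at most
# k free occurrences of x in P.

def occurrences(x, term):
    # number of free occurrences of x in term
    if term[0] == 'var':
        return 1 if term[1] == x else 0
    if term[0] == 'lam':
        return 0 if term[1] == x else occurrences(x, term[2])
    return occurrences(x, term[1]) + occurrences(x, term[2])

def is_k_duplicating(term, k):
    if term[0] == 'var':
        return True
    if term[0] == 'lam':
        return occurrences(term[1], term[2]) <= k and is_k_duplicating(term[2], k)
    return is_k_duplicating(term[1], k) and is_k_duplicating(term[2], k)
```

For example, λx.xx is 2-duplicating but not affine (1-duplicating), while λx.y is even 0-duplicating.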
4 Complexity of Matching in the Affine Lambda Calculus
Definition 6. A lambda term M is linear (resp. affine, a λI-term) if for every subterm N ⊆ M such that N = λx.P, there is exactly one (resp. at most one, at least one) free occurrence of x in P, i.e., |Occ(x, P)| = 1 (resp. |Occ(x, P)| ≤ 1, |Occ(x, P)| ≥ 1).
Note that the notion of affinity coincides with 1-duplication from our previous investigations, and that a term is linear exactly when it is an affine λI-term. As opposed to λI-terms, arbitrary terms are sometimes called λK-terms. The sets of linear, affine and λI-terms are closed under β-reduction (as opposed to the sets of k-duplicating terms, for k ≥ 2, which are not). Thus it is possible to investigate theories of lambda conversion for such terms, and to consider the ordinary problem of higher order matching in these calculi, instead of modifying the formulation of the problem itself. Such variants of matching are less general than the variant considered above, since we put restrictions not only on the form of solutions, but also on the instances of the problem. One might expect that the complexity of such problems should be lower than that of k-duplicating matching. Indeed, de Groote [4] showed that matching in the linear lambda calculus is NP-complete. In this section we generalize his proof to the case of the affine lambda calculus. Consider the following potential function:
Φ(M) = Σ { |type(λx.N1)| : N ⊆ M, N = (λx.N1)N2 }.
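A sketch of the potential function Φ on explicitly typed terms (the representation — abstraction nodes ('lam', x, type_of_abstraction, body), types 'o' / ('->', σ, τ) — is ours, not the paper's):

```python
# Phi(M) sums |type(lambda x. N1)| over all redex subterms (lambda x. N1) N2
# of M; each 'lam' node carries the type of the whole abstraction.

def tsize(ty):
    # |o| = 1, |s -> t| = 1 + |s| + |t|
    return 1 if ty == 'o' else 1 + tsize(ty[1]) + tsize(ty[2])

def phi(t):
    if t[0] == 'var':
        return 0
    if t[0] == 'lam':                     # ('lam', x, abs_type, body)
        return phi(t[3])
    f, a = t[1], t[2]                     # ('app', fun, arg)
    redex = tsize(f[2]) if f[0] == 'lam' else 0
    return redex + phi(f) + phi(a)
```

On the affine redex (λx.x)y with λx.x : o → o, Φ is |o → o| = 3, while Φ of the contractum y is 0, illustrating the strict decrease claimed by Lemma 5 below.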
Lemma 5 (de Groote). If M is affine and M →β N, then Φ(M) > Φ(N).
Proof. Suppose that M is reduced to N at address p, that is, M|p = (λx.P)Q and N = M[p←P[x/Q]]. In a reduction step (λx.P)Q ▷β P[x/Q] all redexes occurring in P and Q are left intact, and since x occurs in P at most once, at most one new redex may be created inside P as a result of the substitution. This new redex is created if x occurs at an active position (as an operator in an application) in a subterm of the form xP′ and Q is a lambda abstraction. Moreover the reduct P[x/Q] may also occur in an active position, and if it turns out to be a lambda abstraction, it will be a part of another new redex. Let type(λx.P) = σ → τ. Closing the redex (λx.P)Q decreases the potential by |type(λx.P)|, and adds to it at most |type(x)| = |σ| plus |type(P[x/Q])| = |τ|. Thus Φ(N) ≤ Φ(M) − |type(λx.P)| + |type(x)| + |type(P[x/Q])| = Φ(M) − |σ → τ| + |σ| + |τ| = Φ(M) − 1.
Definition 7. We say that a β-reduction sequence (Mi), i = 0, …, m, is variable-only-forgetting if for every i = 0, …, m − 1 and p ∈ {0, 1}*, if Mi|p = (λx.P)Q ▷β Mi+1|p and x ∉ FV(P), then Q is a variable.
Since if x ∉ FV(P), then |(λx.P)y| = 3 + |P| = 3 + |P[x/y]|, and if x ∈ FV(P) and x occurs n > 0 times in P, then |P[x/Q]| = |P| + n(|Q| − 1) ≥ |P| + |Q| − 1, so |(λx.P)Q| = |P| + |Q| + 2 ≤ |P[x/Q]| + 3, we have:
348
D. Dougherty and T. Wierzbicki
Lemma 6. If (M_i)_{i=0}^m is variable-only-forgetting, then |M_0| − |M_m| ≤ 3m.

The next two lemmas state that every β-reduction sequence containing only affine terms can be replaced with an equivalent (in some sense) variable-only-forgetting sequence. The key idea is to replace any subterm that is forgotten during reduction with a fresh free variable (which is allowed to be forgotten).

Lemma 7. For any β-reduction sequence (M_i)_{i=0}^m containing only affine terms and any term N ≤ M_m (with the additional proviso that there exists a substitution θ such that Nθ = M_m and every variable from Dom(θ) occurs at most once in N), there exists a variable-only-forgetting reduction sequence (N_j)_{j=0}^n leading to N, where N_0 ≤ M_0 and n ≤ m.

Proof. Throughout the proof, when stating that M ≤ N we will tacitly assume that there exists a substitution θ such that Mθ = N and every variable from Dom(θ) occurs in M at most once. The proof is by induction on the length m of the reduction sequence (M_i)_{i=0}^m. For m = 0 the conclusion is obvious. Suppose that such a sequence (N_j)_{j=1}^n exists for an arbitrary sequence (M_i)_{i=1}^m of length m (for convenience we shifted the indexes i and j by one), and consider a sequence (M_i)_{i=0}^m of length m + 1. Since M_0 →β M_1, suppose that M_0 is reduced to M_1 at address p, that is, M_0|_p = (λx.P)Q →β P[x/Q] = M_1|_p. By the induction assumption N_1 ≤ M_1. If p is not a valid address in N_1, then N_1 ≤ M_0 by Lemma 1.6, and (N_j)_{j=1}^n is the desired reduction sequence. Otherwise we need to find a new term N_0 as shown in the following diagram:

M_0 →β M_1 →β* M_m
 ≥       ≥       ≥
N_0 →β N_1 →β* N_n
in such a way that the reduction step N_0 →β N_1 is variable-only-forgetting. Fix a fresh free variable y (which does not occur anywhere else in the considered terms). Since N_1 ≤ M_1, by Lemma 1.3 we have

N_1|_p ≤ M_1|_p = P[x/Q].    (1)
The term M_1 is affine, so |Occ(x, P)| ≤ 1. If |Occ(x, P)| = 0, i.e., x ∉ FV(P), then P[x/Q] = P, thus N_1|_p ≤ P. Hence, by Lemma 1.5 we have P′ = (λx.(N_1|_p))y ≤ (λx.P)Q, where y is fresh. Let N_0 = N_1[p ← P′]. By Lemma 1.4 we get N_0 ≤ M_1[p ← (λx.P)Q] = M_0. Since N_1|_p does not contain free occurrences of x, P′ →β N_1|_p, thus N_0 →β N_1. Moreover, (N_j)_{j=0}^n is a desired variable-only-forgetting sequence leading to N and not longer than (M_i)_{i=0}^m. Now suppose that |Occ(x, P)| = 1, and let Occ(x, P) = {q}. If q is not a valid address in N_1|_p, then, by Lemma 1.6, we have N_1|_p ≤ (M_1|_p)[q ← x] = P. From (1) and Lemma 1.5 we have P′ = (λx.(N_1|_p))y ≤ (λx.P)Q, where y is fresh. So let, as before, N_0 = N_1[p ← P′]. By Lemma 1.4 we get N_0 ≤ M_1[p ← (λx.P)Q] = M_0. Since N_1|_p does not contain free occurrences of x, P′ →β N_1|_p, thus N_0 →β N_1. Moreover, (N_j)_{j=0}^n is a desired variable-only-forgetting sequence leading to N and not longer than (M_i)_{i=0}^m. The last case to be considered is when q is a valid address in N_1|_p. From (1) and Lemma 1.3 we have N_1|_{pq} ≤ (P[x/Q])|_q = Q.
A Decidable Variant of Higher Order Matching
349
By Lemma 1.4 we get (N_1|_p)[q ← x] ≤ (P[x/Q])[q ← x] = P, so by Lemma 1.5 we have P′ = (λx.((N_1|_p)[q ← x]))(N_1|_{pq}) ≤ (λx.P)Q. Let N_0 = N_1[p ← P′]. Since N_1 ≤ M_1, by Lemma 1.4 we have N_0 ≤ M_1[p ← (λx.P)Q] = M_0. Moreover, P′ →β (N_1|_p)[q ← N_1|_{pq}] = N_1|_p, thus N_0 →β N_1. The reduction sequence (N_j)_{j=0}^n is not longer than (M_i)_{i=0}^m and, since x occurs in (N_1|_p)[q ← x], it is also variable-only-forgetting.

Lemma 8. If M →β* M′ and FV(M′) = ∅, then N →β* M′ for every N ≥ M.
Proof. It is sufficient to show that if (M_i)_{i=0}^m is a reduction sequence such that FV(M_m) = ∅, then for any N_0 ≥ M_0 there exists a reduction sequence (N_i)_{i=0}^m such that N_m = M_m. The proof is by induction on m. For m = 0 we have N_0 ≥ M_0, so by Lemma 1.7 we get N_0 = M_0, and (N_i)_{i=0}^0 is the desired sequence. Now suppose that for any sequence (M_i)_{i=1}^m and any N_1 ≥ M_1 such a sequence (N_i)_{i=1}^m exists (for convenience, as in the previous lemma, we shifted the index i by one), and consider a sequence (M_i)_{i=0}^m and a term N_0 ≥ M_0. We need to find a term N_1, as shown in the following diagram:

M_0 →β M_1 →β* M_m
 ≤       ≤
N_0 →β N_1
Since M_0 →β M_1, there exists an address p such that M_0|_p →β M_1|_p. Since N_0 ≥ M_0, by Lemma 1.8 the subterm N_0|_p is a β-redex. Suppose N_0|_p = (λx.P)Q. Let N_1 be the term obtained by contracting the redex at address p, i.e., N_1 = N_0[p ← P[x/Q]]. By Lemma 1.3 we have (λx.P)Q ≥ M_0|_p. Thus P[x/Q] ≥ M_1|_p, so N_1 ≥ M_1. By the induction hypothesis there exists a reduction sequence (N_i)_{i=1}^m leading to M_m. Then (N_i)_{i=0}^m is the desired reduction sequence leading to M_m and starting from N_0.

Using Lemmas 7 and 8 we are able to prove the following:

Proposition 2. For any matching problem M =? N in the affine lambda calculus there always exists a finite complete set of solutions S such that if [x/N_x]_{x∈FV(M)} ∈ S, then

|N_x| ≤ |N| + 3 Σ_{x∈FV(M)} |type(x)| · |Occ(x, M)|,    (2)
for any x ∈ FV(M). (In particular, S may be empty if M =? N has no solution.)

Proof. Let θ be an arbitrary solution of M =? N. It may be too big to satisfy (2), since it may contain huge terms that are forgotten during the reduction of Mθ to N. Suppose (M_i)_{i=0}^m is a reduction sequence starting from Mθ and leading to N. By Lemma 7 there exists a variable-only-forgetting sequence (N_j)_{j=0}^n such that N_0 ≤ M_0, N_n = M_m = N, and n ≤ m. Now N_0 contains only the necessary fragments of Mθ, so let

N_x = sup({N_0|_p | p ∈ Occ(x, M) and p valid in N_0} ∪ {x}).
Since N_0 ≤ Mθ, we have N_0|_p ≤ (Mθ)|_p = xθ by Lemma 1.3, for any p ∈ Occ(x, M). Thus the above set possesses an upper bound xθ, and by Lemma 1.2 its supremum exists, so N_x is well-defined. Let θ′ = [x/N_x]_{x∈FV(M)}. We immediately have θ′ ≤ θ. Moreover, for any substitution θ″ ≥ θ′, since Mθ″ ≥ Mθ′ ≥ N_0 and N is closed, by Lemma 8 there exists a reduction sequence starting from Mθ″ and leading to N. Thus θ″ is a solution of M =? N. On the other hand, by Lemma 6 we have |N_0| − |N| ≤ 3n, while by Lemma 5 we have n ≤ Φ(N_0) − Φ(N). Note that Φ(N) = 0, since N is in normal form. Combining the last two inequalities we get |N_0| ≤ |N| + 3n ≤ |N| + 3Φ(N_0). By Lemmas 1.9 and 1.10 we have |N_x| ≤ |N_0|. On the other hand, it is easy to see that, since N_0 ≤ Mθ′, we have Φ(N_0) ≤ Φ(Mθ′). Moreover, since M and the terms N_x are in normal form, all redexes in Mθ′ are of the form N_x P, created as a result of substituting the terms N_x for the variables from FV(M). Since |type(N_x)| = |type(x)|, we get (2). We have shown that for any solution θ of M =? N there exists a substitution θ′ ≤ θ satisfying (2) such that any θ″ ≥ θ′ is a solution of M =? N. Let S be the set of such substitutions θ′ for all solutions θ of M =? N. The set S is finite, since the size of θ′ is bounded.

Theorem 2. The problem of checking whether an instance of a matching problem in the affine lambda calculus has a solution is NP-complete.

Proof. The inequality (2) gives an upper bound on the size of a solution which depends only on the terms M and N, and is a polynomial function of the length of any reasonable encoding of the problem M =? N as a word over a finite alphabet. Thus it suffices to guess terms N_x and check whether θ = [x/N_x]_{x∈FV(M)} is a solution. Since a normal form of an affine term may be found in a polynomial number of steps, the problem of checking whether an instance M =? N has a solution is in NP.
On the other hand, the simplest proof of NP-hardness of second order matching [1] also works in the affine case, and de Groote's proof of NP-hardness of matching in the linear lambda calculus can be applied here as well. Again, as in the proof of Theorem 1, if βη-matching is considered, then when guessing a solution we omit terms that are not in η̄-normal form, so the theorem is valid for the β- as well as the βη-lambda calculus.
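The NP membership argument, guess the terms N_x and then normalize and compare, can be sketched as follows. This is our own untyped encoding (types dropped for brevity); substitution assumes the Barendregt convention that all bound names are distinct and the guessed terms are closed, so no variable capture can occur.

```python
# Untyped lambda terms: ('var', x), ('lam', x, body), ('app', f, a).
def subst(t, x, s):
    """t[x := s]; assumes all bound names distinct and s closed,
    so no variable capture can occur."""
    if t[0] == 'var':
        return s if t[1] == x else t
    if t[0] == 'lam':
        return ('lam', t[1], subst(t[2], x, s))
    return ('app', subst(t[1], x, s), subst(t[2], x, s))

def normalize(t):
    """Leftmost (normal-order) beta-normal form; terminates on simply
    typed, in particular affine, terms."""
    if t[0] == 'var':
        return t
    if t[0] == 'lam':
        return ('lam', t[1], normalize(t[2]))
    f = normalize(t[1])
    if f[0] == 'lam':
        return normalize(subst(f[2], f[1], t[2]))
    return ('app', f, normalize(t[2]))

def is_solution(m, theta, n):
    """Verify a guessed substitution theta for M =? N by instantiating,
    normalizing, and comparing syntactically (alpha-renaming ignored)."""
    for x, n_x in theta.items():
        m = subst(m, x, n_x)
    return normalize(m) == n
```

For example, θ = [F/λz.z] solves F a =? a, while no normalization step can turn F a into a different variable b.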
References

1. Hubert Comon, Yan Jurski, Higher-Order Matching and Tree Automata, Proc. 11th Int'l Workshop Computer Science Logic, CSL'97, Mogens Nielsen, Wolfgang Thomas, eds., Aarhus, Denmark, August 1997, LNCS 1414, Springer-Verlag, 1998, 157–176.
2. Nachum Dershowitz, Zohar Manna, Proving termination with multiset orderings, Communications of the ACM, 6 (22), 465–476, 1979.
3. Gilles Dowek, Third order matching is decidable, Proc. 7th IEEE Symp. Logic in Computer Science, LICS'92, IEEE Press, 1992, 2–10; also in Annals of Pure and Applied Logic, 69, 1994, 135–155.
4. Philippe de Groote, Linear Higher-Order Matching Is NP-Complete, Proc. 11th Int'l Conf. Rewriting Techniques and Applications, RTA 2000, Leo Bachmair, ed., Norwich, UK, July 10–12, 2000, LNCS 1833, Springer-Verlag, 2000, 127–140.
5. Jordi Levy, Linear Second Order Unification, Proc. 7th Int'l Conf. Rewriting Techniques and Applications, RTA'96, H. Ganzinger, ed., New Brunswick, NJ, 1996, LNCS 1103, Springer-Verlag, 1996, 332–346.
6. Ralph Loader, Higher Order β Matching is Undecidable, October 2001, manuscript.
7. Harry G. Mairson, A Simple Proof of a Theorem of Statman, Theoretical Computer Science, 103, 1992, 213–226.
8. Vincent Padovani, Decidability of All Minimal Models, Proc. 3rd Int'l Workshop Types for Proofs and Programs, TYPES'95, Stefano Berardi, Mario Coppo, eds., Torino, Italy, 1995, LNCS 1158, Springer-Verlag, 1996, 201–215.
9. Vincent Padovani, On equivalence classes of interpolation equations, Proc. Int'l Conf. Typed Lambda Calculi and Applications, TLCA'95, M. Dezani-Ciancaglini, G. Plotkin, eds., LNCS 902, Springer-Verlag, 1995, 335–349.
10. Vincent Padovani, Decidability of fourth-order matching, Mathematical Structures in Computer Science, 3 (10), 2000, 361–372.
11. Manfred Schmidt-Schauß, Klaus U. Schulz, Decidability of bounded higher order unification, Technical Report Frank-report-15, Institut für Informatik, J. W. Goethe-Universität, Frankfurt am Main, 2001.
12. Aleksy Schubert, Linear interpolation for the higher order matching problem, Proc. 7th Int'l Joint Conf. Theory and Practice of Software Development, TAPSOFT'97, M. Bidoit, M. Dauchet, eds., LNCS 1214, Springer-Verlag, 1997.
13. Richard Statman, The Typed λ-Calculus is Not Elementary Recursive, Theoretical Computer Science, 15, 1981, 73–81.
14. Sergei Vorobyov, The "Hardest" Natural Decidable Theory, Proc. 12th Annual IEEE Symp. Logic in Computer Science, LICS'97, IEEE Press, 1997, 294–305.
Combining Decision Procedures for Positive Theories Sharing Constructors

Franz Baader¹ and Cesare Tinelli²

¹ TU Dresden, Germany, [email protected]
² University of Iowa, USA, [email protected]
Abstract. This paper addresses the following combination problem: given two equational theories E1 and E2 whose positive theories are decidable, how can one obtain a decision procedure for the positive theory of E1 ∪ E2 ? For theories over disjoint signatures, this problem was solved by Baader and Schulz in 1995. This paper is a first step towards extending this result to the case of theories sharing constructors. Since there is a close connection between positive theories and unification problems, this also extends to the non-disjoint case the work on combining decision procedures for unification modulo equational theories.
1 Introduction
Built-in decision procedures for certain types of theories (like equational theories) can greatly speed up the performance of theorem provers. In many applications, however, the theories actually encountered are combinations of theories for which dedicated decision procedures are available. Thus, one must find ways to combine the decision procedures for the single theories into one for their combination. In the context of equational theories over disjoint signatures, this combination problem has been thoroughly investigated in the following three instances:¹ the word problem, the validity problem for universally quantified formulae, and the unification problem. For the word problem, i.e., the problem whether a single (universally quantified) equation s ≡ t follows from the equational theory, the first solution to the combination problem was given by Pigozzi [9] in 1974. The problem of combining decision procedures for universally quantified formulae, i.e., arbitrary Boolean combinations of equations that are universally quantified, was solved by Nelson and Oppen [8] in 1979. Work on combining unification algorithms also started in the seventies with Stickel's investigation [12] of unification of terms containing several associative-commutative and free symbols. The first general result on how to combine decision procedures for unification was published by Baader and Schulz [1] in 1992. It turned out that decision procedures for unification (with constants) are not sufficient to allow for a combination result. Instead, one needs decision procedures for unification with linear constant restrictions in the theories to be combined. In 1995, Baader and Schulz
¹ Some of the work mentioned below can also handle more general theories. To simplify the presentation, we restrict our attention in this paper to the equational case.
S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 352–366, 2002. © Springer-Verlag Berlin Heidelberg 2002
[2] described a version of their combination procedure that applies to positive theories, i.e., positive Boolean combinations of equations with an arbitrary quantifier prefix. They also showed [3] that the decidability of the positive theory is equivalent to the decidability of unification with linear constant restrictions. Since then, the main open problem in the area was how to extend these results to the combination of theories having symbols in common. In general, the existence of shared symbols may lead to undecidability results for the union theory (see, e.g., [6,5] for some examples). This means that a controlled form of sharing of symbols is necessary. For the word problem and for universally quantified formulae, a suitable notion of shared constructors has proved useful. In [5], Pigozzi's combination result for the word problem was extended to theories all of whose shared symbols are constructors. A similar extension of the Nelson-Oppen combination procedure can be found in [13]. In a similar vein, we show in this paper that the combination results in [2] for positive theories (and thus for unification) can be extended to theories sharing constructors. We do that by extending the combination procedure in [2] with an extra step that deals with shared symbols and proving that the extended procedure is sound and complete. Since this extra step is not finitary, the new procedure in general yields only a semi-decision procedure for the combined theory. Under some additional assumptions on the equational theory of the shared symbols, the procedure can, however, be turned into a decision procedure. Although the combination procedure described here differs from the one in [2] by just one extra step, proving its correctness is considerably more challenging, due to the non-disjointness of the theories. A major contribution of this work is a novel algebraic construction of the free algebra of the combined theory.
As in the disjoint case [2], this construction is vital for the correctness proof of the procedure, and we believe that it will also prove helpful in future research on non-disjoint combination. The paper is organized as follows. Section 2 contains some formal preliminaries. Section 3 defines our notion of constructors and presents some of their properties, which will be used later to prove the correctness of the combination procedure. Section 4 describes our extension of the Baader-Schulz procedure to component theories sharing constructors. It then introduces a straightforward condition on the component theories under which the semi-decision procedure obtained this way can in fact be used to decide the positive consequences of their union. Finally, it proves that the general procedure is sound and complete. We conclude the paper with a comparison to related work and suggestions for further research. Space constraints prevent us from providing all the proofs of the results in the paper. The missing proofs can be found in [4].
2 Preliminaries
In this paper we will use standard notions from universal algebra such as formula, sentence, algebra, subalgebra, generators, reduct, entailment, model, homomorphism and so on. Notable differences are reported in the following.
We consider only first-order theories (with equality) over a functional signature. A signature Σ is a set of function symbols, each with an associated arity, an integer n ≥ 0. A constant symbol is a function symbol of zero arity. We use the letters Σ, Ω, ∆ to denote signatures. Throughout the paper, we fix a countably-infinite set V of variables, disjoint from any signature Σ. For any X ⊆ V, T(Σ, X) denotes the set of Σ-terms over X, i.e., first-order terms with variables in X and function symbols in Σ. Formulae in the signature Σ are defined as usual. We use ≡ to denote the equality symbol. We also use the standard notion of substitution, with the usual postfix notation. We call a substitution a renaming iff it is a bijection of V onto itself. We say that a subset T of T(Σ, V) is closed under renaming iff tσ ∈ T for all terms t ∈ T and renamings σ. If A is a set, we denote by A* the set of all finite tuples made of elements of A. If a and b are two tuples, we denote by a, b the tuple obtained as the concatenation of a and b. If ϕ is a term or a formula, we denote by Var(ϕ) the set of ϕ's free variables. We will often write ϕ(v) to indicate, as usual, that v is a tuple of variables with no repetitions and all elements of Var(ϕ) occur in v. A formula is positive iff it is in prenex normal form and its matrix is obtained from atomic formulae using only conjunctions and disjunctions. A formula is existential iff it has the form ∃u. ϕ(u, v) where ϕ(u, v) is a quantifier-free formula. If A is an algebra of signature Ω, we denote by A the universe of A and by AΣ the reduct of A to a given subsignature Σ of Ω. If ϕ(v) is an Ω-formula and α is a valuation of v into A, we write (A, α) |= ϕ(v) iff ϕ(v) is satisfied by the interpretation (A, α). Equivalently, where a = α(v), we may also write A |= ϕ(a). If t(v) is an Ω-term, we denote by [[t]]A α the interpretation of t in A under the valuation α of v.
Similarly, if T is a set of terms, we denote by [[T]]A α the set {[[t]]A α | t ∈ T}. A theory of signature Ω, or an Ω-theory, is any set of Ω-sentences, i.e., closed Ω-formulae. An algebra A is a model of a theory T, or models T, iff each sentence in T is satisfied by the interpretation (A, α) where α is the empty valuation. Let T be an Ω-theory. We denote by Mod(T) the class of all Ω-algebras that model T. The theory T is satisfiable if it has a model, and trivial if it has only trivial models, i.e., models of cardinality 1. For all sentences ϕ (of any signature), we say as usual that T entails ϕ, or that ϕ is valid in T, and write T |= ϕ, iff T ∪ {¬ϕ} is unsatisfiable. We call the (existential) positive theory of T the set of all (existential) positive sentences in the signature of T that are entailed by T. An equational theory is a set of (universally quantified) equations. If E is an equational theory of signature Ω and Σ is an arbitrary signature, we denote by EΣ the set of all (universally quantified) Σ-equations entailed by E. When Σ ⊆ Ω we call EΣ the Σ-restriction of E. For all Ω-terms s(v), t(v), we write s =E t and say that s and t are equivalent in E iff E |= ∀v. s ≡ t. We will later appeal to the two basic model theory results below about subalgebras (see [7] among others).
Lemma 1. Let B be a Σ-algebra and A a subalgebra of B. For all quantifier-free formulae ϕ(v1, . . . , vn) and individuals a1, . . . , an ∈ A, A |= ϕ(a1, . . . , an) iff B |= ϕ(a1, . . . , an).

Lemma 2. For all equational theories E, Mod(E) is closed under subalgebras.

Similarly to [2], our procedure's correctness proof will be based on free algebras. Instead of the usual definition of free algebras, we will rely on the following characterization [7].

Proposition 3. Let E be a Σ-theory and A a Σ-algebra. Then, A is free in E over some set X iff the following holds:
1. A is a model of E generated by X;
2. for all s, t ∈ T(Σ, V) and injections α of Var(s ≡ t) into X, if (A, α) |= s ≡ t then s =E t.

When A is free in E over X we will also say that A is a free model of E (with basis X). We will implicitly rely on the well-known fact that every non-trivial equational theory E admits a free model with a countably infinite basis, namely the quotient term algebra T(Σ, V)/=E. We will also use the following two results from [2] about free models and positive formulae.

Lemma 4. Let B be an Ω-algebra free (in some theory E) over a countably infinite set X. For all positive Ω-formulae ϕ(v1, v2, . . . , v2m−1, v2m) the following are equivalent:
1. B |= ∀v1∃v2 · · · ∀v2m−1∃v2m. ϕ(v1, v2, . . . , v2m−1, v2m);
2. there exist tuples x1, . . . , xm ∈ X* and b1, . . . , bm ∈ B* and finite subsets Z1, . . . , Zm of X such that
a) B |= ϕ(x1, b1, . . . , xm, bm),
b) all components of x1, . . . , xm are distinct,
c) for all n ∈ {1, . . . , m}, all components of bn are generated by Zn in B,
d) for all n ∈ {1, . . . , m − 1}, no components of xn+1 are in Z1 ∪ · · · ∪ Zn.

Lemma 5. For every equational theory E having a countable signature and a free model A with a countably infinite basis, the positive theory of E coincides with the set of positive sentences true in A.
In this paper, we will deal with combined equational theories, that is, theories of the form E1 ∪ E2, where E1 and E2 are two component equational theories of (possibly non-disjoint) signatures Σ1 and Σ2, respectively. Where Σ := Σ1 ∩ Σ2, we call shared symbols the elements of Σ and shared terms the elements of T(Σ, V). Notice that, when Σ1 and Σ2 are disjoint, the only shared terms are the variables. Most combination procedures, including the one described in this paper, work with (Σ1 ∪ Σ2)-formulae by first "purifying" them into a set of Σ1-formulae and a set of Σ2-formulae. There is a standard purification procedure that, when Σ1 and Σ2 are disjoint, can convert any set S of equations of signature Σ1 ∪ Σ2 into a set S′ of pure equations (that is, each of signature Σ1 or Σ2) such that S is satisfiable in a (Σ1 ∪ Σ2)-algebra A iff S′ is satisfiable in A. As we show in [5], a similar procedure also exists for the case in which Σ1 and Σ2 are not disjoint.
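A purification pass of this kind can be sketched as follows. This is our own simplified version for the disjoint-signature case only (the shared-symbol variant from [5] needs extra care): terms are tuples headed by a function symbol, variables are strings, and each maximal alien subterm is abstracted by a fresh variable with a defining equation.

```python
import itertools

def purify_term(t, sig, fresh, pending):
    """Return a sig-pure copy of t; each maximal alien subterm (top symbol
    outside sig) is replaced by a fresh variable, and the defining pair
    (variable, alien subterm) is queued on pending."""
    if isinstance(t, str):                 # variables are pure in any signature
        return t
    if t[0] in sig:
        return (t[0],) + tuple(purify_term(a, sig, fresh, pending) for a in t[1:])
    v = next(fresh)
    pending.append((v, t))
    return v

def purify(eqs, sig1, sig2):
    """Split equations over the disjoint signatures sig1, sig2 into a
    (sig1-pure, sig2-pure) pair of equation lists."""
    fresh = ('_v%d' % i for i in itertools.count())
    pure1, pure2 = [], []
    work = list(eqs)
    while work:
        l, r = work.pop()
        head = l if not isinstance(l, str) else r   # pick a non-variable side
        in_sig1 = isinstance(head, str) or head[0] in sig1
        sig, out = (sig1, pure1) if in_sig1 else (sig2, pure2)
        pending = []
        out.append((purify_term(l, sig, fresh, pending),
                    purify_term(r, sig, fresh, pending)))
        work.extend(pending)
    return pure1, pure2
```

For example, f(g(x)) ≡ y splits into the Σ1-equation f(v) ≡ y and the Σ2-equation v ≡ g(x), with v fresh.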
3 Theories with Constructors
The main requirement for our generalization of the combination procedure described in [2] to apply is that the symbols shared by the two theories are constructors as defined in [5,13]. For the rest of the section, let E be a non-trivial equational theory of signature Ω. Also, let Σ be a subsignature of Ω.

Definition 6 (Constructors). The signature Σ is a set of constructors for E iff for every free model A of E with a countably infinite basis X, AΣ is a free model of EΣ with a basis Y including X.

It is usually non-trivial to show that a signature Σ is a set of constructors for a given theory E by using just the definition above. Instead, it is usually more helpful to use a syntactic characterization of constructors, given in terms of certain subsets of T(Ω, V). Before we can give this characterization, we need a little more notation. Given a subset G of T(Ω, V), we denote by T(Σ, G) the set of Σ-terms over the "variables" G. More precisely, every member of T(Σ, G) is obtained from a term s ∈ T(Σ, V) by replacing the variables of s with terms from G. To express this construction, we will denote any such term by s(r) where r is a tuple collecting the terms of G that replace the variables of s. Note that G ⊆ T(Σ, G) and that T(Σ, V) ⊆ T(Σ, G) whenever V ⊆ G.

Definition 7 (Σ-base). A subset G of T(Ω, V) is a Σ-base of E iff
1. V ⊆ G;
2. for all t ∈ T(Ω, V), there is an s(r) ∈ T(Σ, G) such that t =E s(r);
3. for all s1(r1), s2(r2) ∈ T(Σ, G), s1(r1) =E s2(r2) iff s1(v1) =E s2(v2), where v1 and v2 are tuples of fresh variables abstracting the terms of r1, r2 so that two terms in r1, r2 are abstracted by the same variable iff they are equivalent in E.

We say that E admits a Σ-base if some G ⊆ T(Ω, V) is a Σ-base of E.

Theorem 8 (Characterization of constructors). The signature Σ is a set of constructors for E iff E admits a Σ-base.

A proof of this theorem and of the following corollary can be found in [5].

Corollary 9.
Where A is a free model of E with a countably-infinite basis X, let α be an arbitrary bijection of V onto X. If G is a Σ-base of E, then AΣ is free in EΣ over the superset [[G]]A α of X.

In the following, we will assume that the theories we consider admit Σ-bases closed under renaming. This assumption is needed for technical reasons: it is used in the long version of this paper in the proof of a lemma (Lemma 4.18 in [4]; omitted here) needed to establish the soundness of the combination procedure described later. We do not know whether this assumption involves a loss of generality, but it is not clear how to avoid it, and it seems to be satisfied by all "sensible" examples of theories admitting constructors. Also note that
the same technical assumption was needed in our work on combining decision procedures for the word problem [5]. It is shown in [5] that, under the right conditions, constructors and the property of having Σ-bases closed under renaming are modular with respect to the union of theories.

Proposition 10. For i = 1, 2 let Ei be a non-trivial equational Σi-theory. If Σ := Σ1 ∩ Σ2 is a set of constructors for E1 and for E2 and E1Σ = E2Σ, then Σ is a set of constructors for E1 ∪ E2. If both E1 and E2 admit a Σ-base closed under renaming, then E1 ∪ E2 also admits a Σ-base closed under renaming.

A useful consequence of Proposition 10 for us will be the following.

Proposition 11. Let E be an Ω-theory and let E′ be the empty ∆-theory for some signature ∆ disjoint from Ω. If Σ ⊆ Ω is a set of constructors for E, then it is a set of constructors for E ∪ E′. Furthermore, if E admits a Σ-base closed under renaming, then so does E ∪ E′.
4 Combining Decision Procedures
In this section, we generalize the Baader-Schulz procedure [2] for combining decision procedures for the validity of positive formulae in equational theories from theories over disjoint signatures to theories sharing constructors. More precisely, we will consider two theories E1 and E2 that satisfy the following assumptions for i = 1, 2, which we fix for the rest of the section:

– Ei is a non-trivial equational theory of some countable signature Σi;
– Σ := Σ1 ∩ Σ2 is a set of constructors for Ei, and Ei admits a Σ-base closed under renaming;
– E1Σ = E2Σ.

Let E := E1 ∪ E2. Under the assumptions above, EΣ = E1Σ = E2Σ (see [5]). In the following, then, we will use EΣ to refer indifferently to E1Σ or E2Σ. The combination procedure will use two kinds of substitutions that we call, after [13], identifications and Σ-instantiations. Given a set of variables U, an identification of U is a substitution defined by partitioning U, selecting a representative for each block in the partition, and mapping each element of U to the representative of its block. A Σ-instantiation of U is a substitution that maps some elements of U to non-variable Σ-terms and the other elements to themselves. For convenience, we will assume that the variables occurring in the terms introduced by a Σ-instantiation are always fresh.
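For a finite U there are only finitely many identifications, since they correspond to the set partitions of U. A small sketch of the enumeration (our own code; representatives are chosen as the first element of each block):

```python
def identifications(variables):
    """Yield every identification of the given variables: each set
    partition induces a substitution mapping every variable to the
    representative (here: the first element) of its block."""
    xs = list(variables)

    def partitions(rest):
        if not rest:
            yield []
            return
        first, tail = rest[0], rest[1:]
        for p in partitions(tail):
            for i in range(len(p)):            # put first into an existing block
                yield p[:i] + [[first] + p[i]] + p[i + 1:]
            yield [[first]] + p                # or open a new block

    for p in partitions(xs):
        yield {x: block[0] for block in p for x in block}
```

For three variables this yields the five substitutions corresponding to the five partitions of a three-element set.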
4.1 The Combination Procedure
The procedure takes as input a positive existential (Σ1 ∪ Σ2 )-formula ∃w. ϕ(w) and outputs, non-deterministically, a pair of sentences: a positive Σ1 -sentence and a positive Σ2 -sentence. It consists of the following steps.
1. Convert into DNF. Convert the input's matrix ϕ into the disjunctive normal form ψ1 ∨ · · · ∨ ψn and choose a disjunct ψj.
2. Convert into Separate Form. Let S be the set obtained by purifying, as mentioned in Section 2, the set of all the equations in ψj. For i = 1, 2, let ϕi(v, ui) be the conjunction of all Σi-equations in S,² with v listing the variables in Var(ϕ1) ∩ Var(ϕ2) and ui listing the remaining variables of ϕi.
3. Instantiate Shared Variables. Choose a Σ-instantiation ρ of Var(v) = Var(ϕ1) ∩ Var(ϕ2).
4. Identify Shared Variables. Choose an identification ξ of Var(ϕ1ρ) ∩ Var(ϕ2ρ) = Var(vρ). For i = 1, 2, let ϕi′ := ϕiρξ.
5. Partition Shared Variables. Group the elements of Vs := Var(vρξ) = Var(ϕ1′) ∩ Var(ϕ2′) into tuples v1, . . . , v2m, with 2 ≤ 2m ≤ |Vs| + 1, so that each element of Vs occurs exactly once in v1, . . . , v2m.³
6. Generate Output Pair. Output the pair of sentences

( ∃v1∀v2 · · · ∃v2m−1∀v2m∃u1. ϕ1′ , ∀v1∃v2 · · · ∀v2m−1∃v2m∃u2. ϕ2′ ).

Ignoring inessential differences and our restriction to functional signatures, this combination procedure differs from Baader and Schulz's only in the presence of Step 3. Note, however, that for component theories with disjoint signatures (the case considered in [2]) Step 3 is vacuous, because Σ is empty. In that case the procedure above reduces to that in [2]. Correspondingly, our requirements on the two component theories also reduce to those in [2], which simply ask that E1 and E2 be non-trivial. In fact, when Σ is empty it is always a set of constructors for Ei (i = 1, 2), with T(Σi, V) being a Σ-base closed under renaming. Moreover, E1Σ = E2Σ because they both coincide with the theory {v ≡ v | v ∈ V}. As will be shown in Section 4.3, our combination procedure is sound and complete in the following sense.

Theorem 12 (Soundness and Completeness). For all possible input sentences ∃w. ϕ(w) of the combination procedure, E1 ∪ E2 |= ∃w.
ϕ(w) iff there is a possible output (γ1, γ2) such that E1 |= γ1 and E2 |= γ2.

Unlike the procedure in [2], the combination procedure above does not necessarily yield a decision procedure. The reason is that the non-determinism in Step 3 of the procedure is not finitary, since in general there are infinitely many possible Σ-instantiations to choose from. One viable, albeit strong, restriction for obtaining a decision procedure is described in the next subsection.
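By contrast, the other choice points are finitely branching. For instance, the groupings of Step 5 can be enumerated by assigning each shared variable to one of 2m slots, for every admissible even 2m. This is a sketch under our own conventions (duplicate quantifier structures are not pruned):

```python
from itertools import product

def variable_groupings(shared_vars):
    """Enumerate Step 5: all distributions of the shared variables into
    tuples v_1, ..., v_2m with 2 <= 2m <= len(shared_vars) + 1, each
    variable landing in exactly one tuple (tuples may be empty)."""
    vs = list(shared_vars)
    for two_m in range(2, len(vs) + 2, 2):      # even tuple counts only
        for slots in product(range(two_m), repeat=len(vs)):
            groups = [[] for _ in range(two_m)]
            for v, s in zip(vs, slots):
                groups[s].append(v)
            yield groups
```

With a single shared variable only 2m = 2 is admissible, giving the two groupings ([x], []) and ([], [x]).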
4.2 Decidability Results
In order to turn the combination procedure above into a decision procedure, we require that the equivalence relation defined by the theory EΣ = E1Σ = E2Σ be bounded in the sense described below.
² Where Σ-equations are considered arbitrarily as either Σ1- or Σ2-equations.
³ Note that some of the subtuples vi may be empty.
Definition 13. Let E be an equational Ω-theory. We say that equivalence in E is finitary modulo renaming iff there is a finite subset R of T(Ω, V) such that for all s ∈ T(Ω, V) there is a term t ∈ R and a renaming σ such that s =E tσ. We call R a set of E-representatives.

When Ω in the above definition is empty, equivalence in E is trivially finitary, with any singleton set of variables being a set of E-representatives. A non-trivial example is provided at the end of this section. If EΣ is finitary modulo renaming, then it is easy to see that it suffices to consider only finitely many instantiations in Step 3 of the procedure, which leads to the following decidability result.

Proposition 14. Assume that Σ, E1, E2 satisfy the assumptions stated at the beginning of Section 4, and that equivalence in EΣ is finitary modulo renaming. If the positive theories of E1 and of E2 are both decidable, then the positive existential theory of E1 ∪ E2 is also decidable.

Using a Skolemization argument together with Proposition 11, the result above can be extended from positive existential input sentences to arbitrary positive input sentences. The main idea is to Skolemize the universal quantifiers of the input sentence and then expand the signature of one of the theories, E2 say, to the newly introduced Skolem symbols. Proposition 11 and the combination result in [2] for the disjoint case imply that the pair E1, E2′, where E2′ is the conservative extension of E2 to the expanded signature, satisfies the assumptions of Proposition 14.

Theorem 15. Assume that E1, E2 satisfy the assumptions of Proposition 14. If the positive theories of E1 and of E2 are both decidable, then the positive theory of E := E1 ∪ E2 is also decidable.

The following example describes one theory satisfying all the requirements on the component theories imposed by Theorem 15.

Example 16.
Consider the signature Ω := {0, s, +} and, for some n > 1, the equational theory En axiomatized by the identities

   x + (y + z) ≡ (x + y) + z,   x + y ≡ y + x,   x + s(y) ≡ s(x + y),
   x + 0 ≡ x,   s^n(x) ≡ x,

where, as usual, s^n(x) stands for the n-fold application of s to x. We show in [4] that, for En and the subsignature Σ := {0, s} of Ω, all the assumptions of Theorem 15 on the component theories are satisfied.

4.3 Soundness and Completeness of the Procedure
The soundness and completeness proof for the disjoint case in [2] relies on an explicit construction of the free model of E = E1 ∪ E2 as an amalgamated product of the free models of the component theories. A direct adaptation of the free amalgamation construction of [2] to the non-disjoint case has so far proven elusive. An important technical contribution of the present work is to provide
360
F. Baader and C. Tinelli
[Figure 1: a diagram of the fusion F of A1 and A2, showing the sets X1, X'1, Y1, Z1, Z1,1, Z1,2 on the A1 side, the sets X2 = X'2, Y2, Z2, Z2,1, Z2,2 on the A2 = F side, and the bijections h1, h2, h3 between them.]

Fig. 1. The Fusion F of A1 and A2.
an alternative way to obtain an appropriate amalgamated free model in the case of theories sharing constructors. We obtain this model for the union theory E indirectly, by first building a simpler sort of amalgamated model as a fusion (defined below) of the free models of the two component theories. Contrary to Baader and Schulz's free amalgamated product, the fusion model we construct is not free in E. However, it has a subalgebra that is free in E, and that subalgebra will serve as the free amalgamated model of E.

Definition 17 (Fusion [5,13]). An (Ω1 ∪ Ω2)-algebra F is a fusion of an Ω1-algebra A1 and an Ω2-algebra A2 iff F|Ω1 is Ω1-isomorphic to A1 and F|Ω2 is Ω2-isomorphic to A2.

It is shown in [13] that two algebras A1 and A2 have fusions exactly when they are isomorphic over their shared signature, and that every fusion of a model of a theory T1 with a model of a theory T2 is a model of the theory T1 ∪ T2. In the following, we will construct a model of E = E1 ∪ E2 as a fusion of free models of the theories E1 and E2 fixed earlier, whose shared signature Σ was a set of constructors for both. We start by fixing, for i = 1, 2,

– a free model Ai of Ei with a countably infinite basis Xi,
– a bijective valuation αi of V onto Xi,
– a Σ-base Gi of Ei closed under renaming, and
– the set Yi := [[Gi]]^{Ai}_{αi}.
We know from Corollary 9 that Xi ⊆ Yi and Ai|Σ is free in E^Σ = E1^Σ = E2^Σ over Yi. Observe that Ai is countably infinite, given our assumption that Xi is countably infinite and Σi is countable. As a consequence, Yi is countably infinite as well.
Now let Zi,2 := Yi \ Xi for i = 1, 2, and let {Z1,1, Z1} be a partition of X1 such that Z1 is countably infinite and |Z1,1| = |Z2,2|.⁴ Similarly, let {Z2,1, Z2} be a partition of X2 such that |Z2,1| = |Z1,2| and Z2 is countably infinite (see Figure 1). Then consider three arbitrary bijections

   h1 : Z1,2 → Z2,1,   h2 : Z1 → Z2,   h3 : Z1,1 → Z2,2,

as shown in Figure 1. Observing that {Zi,1, Zi, Zi,2} is a partition of Yi for i = 1, 2, it is immediate that h1 ∪ h2 ∪ h3 is a well-defined bijection of Y1 onto Y2. Since Ai|Σ is free in E^Σ over Yi for i = 1, 2, we have that h1 ∪ h2 ∪ h3 extends uniquely to a Σ-isomorphism h of A1|Σ onto A2|Σ. The isomorphism h induces a fusion of A1 and A2 whose main properties are listed in the following lemma, taken from [5].

Lemma 18. There is a fusion F of A1 and A2 having the same universe as A2 and such that

1. h is a Σ1-isomorphism of A1 onto F|Σ1;
2. the identity map of A2 is a Σ2-isomorphism of A2 onto F|Σ2;
3. F|Σi is free in Ei over X'i := Z2,j ∪ Z2 for i, j = 1, 2, i ≠ j;
4. F|Σ is free in E^Σ over Y2 = Z2,1 ∪ Z2 ∪ Z2,2;
5. Y2 = [[G2]]^{F|Σ2}_{α2} = [[G1]]^{F|Σ1}_{h∘α1}.
We will now consider the theory E = E1 ∪ E2 again, together with the algebras F, F1, F2 and A where:

– F is the fusion of A1 and A2 from Lemma 18;
– Fi := F|Σi for i = 1, 2;⁵
– A is the subalgebra of F generated by Z2.

Both F and A are models of E. In fact, F is a model of E = E1 ∪ E2 for being a fusion of a model of E1 and a model of E2, whereas A is a model of E by Lemma 2. We prove in [4] that A is in fact a free model of E. To do that we use the following sets of terms, which will come in handy later as well.

Definition 19 (G^∞_1, G^∞_2, G^∞). Let G^∞ := G^∞_1 ∪ G^∞_2 where, for i = 1, 2, G^∞_i := ⋃_{n≥0} G^n_i and {G^n_i | n ≥ 0} is the family of sets defined as follows:

   G^0_i := V,
   G^{n+1}_i := G^n_i ∪ { r(r1, ..., rm) | r(v1, ..., vm) ∈ Gi \ V,
                          rj ∈ G^n_k with k ≠ i, for all j = 1, ..., m,
                          rj ≠E v for all v ∈ V,
                          rj ≠E rj' for all distinct j, j' = 1, ..., m }.

As proved in [5], the sets G^∞_1, G^∞_2, G^∞ satisfy the following two properties.
Lemma 20. Let i ∈ {1, 2}. For any bijection α of V onto Z2 the following holds:

⁴ This is possible because Z2,2 is countable (possibly finite).
⁵ These algebras are defined just for notational convenience.
[Figure 2: a diagram of the families {[[G^n_i]] | n ≥ 0} (i = 1, 2) growing from Z2 into Z2,1 and Z2,2, together with the corresponding difference sets {C^n_i | n ≥ 0} partitioning Y; X'1 and X'2 mark the generator sets of F1 and F2.]

Fig. 2. The families {[[G^n_i]] | n ≥ 0} and {C^n_i | n ≥ 0}.
1. [[G^∞_i \ V]]^F_α ⊆ Z2,i;
2. for all t1, t2 ∈ G^∞_i \ V, if [[t1]]^F_α = [[t2]]^F_α then t1 =E t2.
Proposition 21. The set G^∞ is a Σ-base of E = E1 ∪ E2.

Note that this proposition entails by Theorem 8 that Σ is a set of constructors for E. Using these two properties (and Proposition 3) we can show the following.

Proposition 22. A is free in E over Z2.

Corollary 23. For every bijection α of V onto Z2, A|Σ is free in E^Σ over Y := [[G^∞]]^A_α, and Y ⊆ Y2.

For the rest of the section, let us fix a bijection α of V onto Z2 and the corresponding set Y := [[G^∞]]^A_α.

To prove the completeness of the combination procedure we will need two families {C^n_1 | n ≥ 0} and {C^n_2 | n ≥ 0} of sets partitioning the set Y above. To build these families we use the denotations in A of the sets G^n_1 and G^n_2 introduced in Definition 19. More precisely, for i = 1, 2, we consider the family {[[G^n_i]]^A_α | n ≥ 0} of subsets of Y. Since A is the subalgebra of F generated by Z2 and α is a valuation of V into Z2, it is easy to see that [[G^n_i]]^A_α = [[G^n_i]]^F_α for all n ≥ 0. Therefore, we will write just [[G^n_i]] in place of either [[G^n_i]]^A_α or [[G^n_i]]^F_α.
Observe that [[G^0_1]] = [[G^0_2]] = Z2 and [[G^n_i]] ⊆ [[G^{n+1}_i]] for all n ≥ 0 and i = 1, 2. Given that [[G^n_i]] \ Z2 ⊆ [[G^n_i \ V]]^A_α, we can conclude by Lemma 20 that [[G^n_i]] \ Z2 ⊆ Z2,i.⁶ By Corollary 23 we have that

   ⋃_{n≥0} ([[G^n_1]] ∪ [[G^n_2]]) = [[⋃_{n≥0} (G^n_1 ∪ G^n_2)]] = [[G^∞_1 ∪ G^∞_2]] = [[G^∞]] = Y.
Now consider the family of sets {C^n_i | n ≥ 0}, depicted in Figure 2 along with {[[G^n_i]] | n ≥ 0} and defined as follows:

   C^0_i := [[G^0_i]]   and   C^{n+1}_i := [[G^{n+1}_i]] \ [[G^n_i]]   for all n ≥ 0.

First note that ⋃_{n≥0} (C^n_1 ∪ C^n_2) = ⋃_{n≥0} ([[G^n_1]] ∪ [[G^n_2]]) = Y. Then note that, for all n ≥ 0 and i = 1, 2, the elements of C^n_i are individuals of the algebras F1 and F2 (which have the same universe). By Lemma 20, C^n_1 ⊆ [[G^n_1]] ⊆ Z2,1 ∪ Z2 = X'2; in other words, every element of C^n_1 is a generator of F2. Similarly, C^n_2 ⊆ [[G^n_2]] ⊆ Z2,2 ∪ Z2 = X'1, that is, every element of C^n_2 is a generator of F1. In addition, we have the following.

Lemma 24. For all distinct m, n ≥ 0 and distinct i, j ∈ {1, 2},

1. C^m_i ∩ C^n_i = ∅ and
2. C^{n+1}_i is Σi-generated by [[G^n_j]] in Fi.

Now, Theorem 12 is an easy consequence of the following proposition.

Proposition 25. For i = 1, 2, let ϕi(v, ui) be a conjunction of Σi-equations where v lists the elements of Var(ϕ1) ∩ Var(ϕ2) and ui lists the elements of Var(ϕi) not in v. The following are equivalent:

1. There is a Σ-instantiation ρ of v, an identification ξ of Var(vρ) and a grouping v1, ..., v2m of Var(vρξ), with each element of Var(vρξ) occurring exactly once in v1, ..., v2m, such that
   A1 |= ∃v1 ∀v2 ⋯ ∃v2m−1 ∀v2m ∃u1 . (ϕ1ρξ)   and
   A2 |= ∀v1 ∃v2 ⋯ ∀v2m−1 ∃v2m ∃u2 . (ϕ2ρξ).
2. A |= ∃v ∃u1 ∃u2 . (ϕ1 ∧ ϕ2).

Proof. The proof of (1 ⇒ 2) is similar to the corresponding proof in [2], although it requires some additional technical lemmas (see [4] for details). We concentrate here on the proof of (2 ⇒ 1).

Assume that A |= ∃v, u1, u2 . (ϕ1(v, u1) ∧ ϕ2(v, u2)). Let α be the bijection of V onto Z2 and Y the subset of Y2 that we fixed after Corollary 23. Since the reduct A|Σ of A is Σ-generated by Y by the same corollary, there is a Σ-instantiation ρ of v, an identification ξ of Var(vρ), and an injective valuation β
⁶ This entails that [[G^m_1]] \ Z2 is disjoint from [[G^n_2]] \ Z2 for all m, n > 0.
of v into Y such that, for ϕ'i := ϕiρξ (i = 1, 2) and v' listing the variables of vρξ, we have (A, β) |= ∃u1, u2 . (ϕ'1(v', u1) ∧ ϕ'2(v', u2)). From this, recalling that A is (Σ1 ∪ Σ2)-generated by Z2 by construction and that Y is included in Y2, we can conclude that there is a tuple a of pairwise distinct elements of Y2, all (Σ1 ∪ Σ2)-generated by Z2, such that A |= ∃u1, u2 . ϕ'1(a, u1) ∧ ϕ'2(a, u2). Since A is a subalgebra of F and ϕ'1 ∧ ϕ'2 is quantifier-free, it follows by Lemma 1 that F |= ∃u1, u2 . ϕ'1(a, u1) ∧ ϕ'2(a, u2) as well. Given that each ϕ'i is a Σi-formula and u1 and u2 are disjoint, we then have that

   F1 |= ∃u1 . ϕ'1(a, u1)   and   F2 |= ∃u2 . ϕ'2(a, u2).      (1)
We construct a partition of the elements of a that will induce a grouping of v' having the properties listed in Point 1 of the proposition. For that, we will use the families {C^n_1 | n ≥ 0} and {C^n_2 | n ≥ 0} defined before Lemma 24. First, let a1 be a tuple collecting the components of a that are in C^0_1 ∪ C^1_1. Then, for all n > 1, let an be a tuple collecting the components of a that are in C^n_1. Finally, for all n > 0, let bn be a tuple collecting the components of a that are in C^n_2.⁷

Since a is a (finite) tuple in Y* and Y = ⋃_{n≥0} (C^n_1 ∪ C^n_2) as observed earlier, there is a smallest m > 0 such that every component of a is in ⋃_{n=0}^{m} (C^n_1 ∪ C^n_2). Let n ∈ {0, ..., m − 1}. By Lemma 24(2), b_{n+1} is Σ2-generated by [[G^n_1]] in F2. Let Z_{n+1} be any finite subset of [[G^n_1]] that generates b_{n+1}. Now recall that F2 is free over the countably infinite set X'2. We prove that a1, ..., am, b1, ..., bm, and Z1, ..., Zm satisfy Lemma 4(2).

To start with, we have that an ∈ (X'2)* for all n ∈ {1, ..., m} because C^n_1 ⊆ [[G^n_1]] ⊆ X'2 by construction of C^n_1. From Lemma 24(1) it follows that the tuples an and an' are pairwise disjoint for all distinct n, n' ∈ {1, ..., m}, which means that all components of a1, ..., am are distinct. Now let n ∈ {1, ..., m − 1}. Observe that the set Z1 ∪ ⋯ ∪ Zn is included in [[G^{n−1}_1]] = C^0_1 ∪ ⋯ ∪ C^{n−1}_1, whereas every component of a_{n+1} belongs to C^{n+1}_1. It follows that no components of a_{n+1} are in Z1 ∪ ⋯ ∪ Zn.

Finally, where f is the bijection that maps, in order, the components of a to those of v', let v1, v2, ..., v2m−1, v2m be the rearrangement of v' corresponding to a1, b1, ..., am, bm according to f. From (1) above we know that F2 |= ∃u2 . ϕ'2(a1, b1, ..., am, bm, u2). By Lemma 4 we can then conclude that F2 |= ∀v1 ∃v2 ⋯ ∀v2m−1 ∃v2m ∃u2 . ϕ'2. Almost symmetrically, we can prove F1 |= ∃v1 ∀v2 ⋯ ∃v2m−1 ∀v2m ∃u1 . ϕ'1. The claim then follows from the fact that Fi is Σi-isomorphic to Ai for i = 1, 2 by Lemma 18. □
⁷ Each tuple above is meant to have no repeated components, and may be empty.
5 Related Research
From a technical point of view, this work strongly depends on previous research on combining decision procedures for unification in the disjoint case and on research on combining decision procedures for the word problem in the non-disjoint case. The combination procedure as well as the proof of correctness are modeled on the corresponding procedure and proof in [2]. The only extension to the procedure is Step 3, which takes care of the shared symbols. In the proof, one of the main obstacles to overcome was to find an amalgamation construction that worked in the non-disjoint case.

Several of the hard technical results used in the proof depend on results from our previous work on combining decision procedures for the word problem [5]. The definition of the sets Gi, which are vital for proving that the constructed algebra A is indeed free, is also borrowed from there. It should be noted, however, that this definition can also be seen as a generalization to the non-disjoint case of a syntactic amalgamation construction originally due to Schmidt-Schauß [11]. As already mentioned in the introduction, the notion of constructors used here is taken from [5,13].

The only other work on combination methods for unification in the non-disjoint case is due to Domenjoud, Ringeissen and Klay [6]. The main differences from our work are that (i) their notion of constructors is considerably more restrictive than ours; and (ii) they combine algorithms computing complete sets of unifiers, and so their method cannot be used to combine decision procedures. On the other hand, Domenjoud, Ringeissen and Klay do not impose the strong restriction that the component theories be finitary modulo renaming, which we need for our decidability result. However, it was recently discovered [10] that termination of the combination algorithm in [6] is actually not guaranteed with the conditions given in that paper.
6 Conclusion
We have extended the Baader-Schulz combination procedure [2] for positive theories to the case of component theories over non-disjoint signatures. The main contribution of this paper is the formulation of appropriate restrictions under which this procedure is sound and complete, and the proof of soundness and completeness itself. This proof depends on a novel construction of the free model of the combined theory, which is not just a straightforward extension of the free amalgamation construction used in [2] in the disjoint case.

Regarding the generality of our restriction to theories sharing constructors, we believe that the notion of constructors is as general as one can get, a conviction that is supported by the work on combining decision procedures for the word problem and for universal theories [5,13].

Unfortunately, our combination procedure yields only a semi-decision procedure since it incorporates an infinitary step. The restriction to equational theories that are finitary modulo renaming overcomes this problem, but it is probably too strong to be useful in applications. Thus, the main thrust of further research
will be to remove or at least relax this restriction. We believe that the overall framework introduced in this paper and the proof of soundness and completeness of the semi-decision procedure (or at least the tools used in this proof) will help us obtain more interesting decidability results in the near future. One direction to follow could be to try to impose additional algorithmic requirements on the theories to be combined or on the constructor theory, and exploit those requirements to transform the infinitary step into a series of finitary ones. For this, the work in [6], which assumes algorithms computing complete sets of unifiers for the component theories, could be a starting point. Since the combination algorithm presented there has turned out to be non-terminating [10], that work needs to be reconsidered anyway. Another direction for extending the results presented here is to withdraw the restriction to functional signatures. As a matter of fact, the combination results in [2] apply not just to equational theories, but to arbitrary atomic theories, i.e., theories over signatures also containing relation symbols and axiomatized by a set of (universally quantified) atomic formulae. Since the algebraic apparatus employed in the present paper (in particular, free algebras) is also available in this more general case (in the form of free structures), it should be easy to generalize our results to atomic theories.
References

1. F. Baader and K.U. Schulz. Unification in the union of disjoint equational theories: Combining decision procedures. In Proc. CADE-11, Springer LNCS 607, 1992.
2. F. Baader and K.U. Schulz. Combination of constraint solving techniques: An algebraic point of view. In Proc. RTA'95, Springer LNCS 914, 1995.
3. F. Baader and K.U. Schulz. Unification in the union of disjoint equational theories: Combining decision procedures. J. Symbolic Computation 21, 1996.
4. F. Baader and C. Tinelli. Combining decision procedures for positive theories sharing constructors. Technical report no. 02-02, Dept. of Computer Science, University of Iowa, 2002 (available at http://cs.uiowa.edu/~tinelli/papers.html).
5. F. Baader and C. Tinelli. Deciding the word problem in the union of equational theories. Information and Computation, 2002. To appear (preprint available at http://cs.uiowa.edu/~tinelli/papers.html).
6. E. Domenjoud, F. Klay, and Ch. Ringeissen. Combination techniques for non-disjoint equational theories. In Proc. CADE-12, Springer LNCS 814, 1994.
7. W. Hodges. Model Theory. Cambridge University Press, 1993.
8. G. Nelson and D.C. Oppen. Simplification by cooperating decision procedures. ACM Trans. on Programming Languages and Systems 1, 1979.
9. D. Pigozzi. The join of equational theories. Colloquium Mathematicum 30, 1974.
10. Ch. Ringeissen. Personal communication, 2001.
11. M. Schmidt-Schauß. Unification in a combination of arbitrary disjoint equational theories. J. Symbolic Computation 8, 1989.
12. M. E. Stickel. A unification algorithm for associative commutative functions. J. ACM 28, 1981.
13. C. Tinelli and Ch. Ringeissen. Unions of non-disjoint theories and combinations of satisfiability procedures. Theoretical Computer Science, 2002. To appear (preprint available at http://cs.uiowa.edu/~tinelli/papers.html).
JITty: A Rewriter with Strategy Annotations

Jaco van de Pol

Centrum voor Wiskunde en Informatica
P.O.-box 90.079, 1090 GB Amsterdam, The Netherlands
1 Introduction
We demonstrate JITty, a simple rewrite implementation with strategy annotations, along the lines of the Just-In-Time rewrite strategy, explained and justified in [4]. Our tool has the following distinguishing features:

– It provides the flexibility of user-defined strategy annotations, which specify the order of normalizing arguments and applying rewrite rules.
– Strategy annotations are checked for correctness, and it is guaranteed that all produced results are normal forms w.r.t. the underlying TRS.
– The tool is "light-weight", with compact but fast code.
– A TRS is interpreted, rather than compiled, so the tool has a short start-up time and is portable to many platforms.

We shortly review strategy annotations in Section 2. JITty is available via http://www.cwi.nl/~vdpol/jitty/ together with a small demonstrator (Section 3) that can be used to experiment with strategy annotations. The rewrite engine has also been integrated in the µCRL tool set [1]. Although performing a rewrite step takes more time in JITty than in the standard compiling rewriter of the µCRL toolset, the former is often preferred, owing to avoidance of compilation time and better normalization properties of the just-in-time strategy. In Section 4 we emphasize certain requirements on the rewriter imposed by the µCRL toolset. This leads to an unconventional application programmer's interface, which is described in Section 5.
2 User Defined and Predefined Strategy Annotations
A strategy annotation for a function symbol f is a list of integers and rule labels, where the integers refer to the arguments of f (1 ≤ i ≤ arity(f)) and the rule labels to rewrite rules for f, i.e., rules whose left-hand sides have top symbol f. For instance, the annotation f : [1, α, β, 2] means that a term with top symbol f should be normalized by first normalizing its first argument, then trying rules α and β; if both fail, the second argument is normalized.

To normalize correctly, a strategy annotation must be full and in-time. It is full if all arguments and rules for f are mentioned. It is in-time if arguments are mentioned before rules that need them. A rule needs an argument if either the argument starts with a function symbol, or the argument is a non-linear variable.

S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 367–370, 2002. © Springer-Verlag Berlin Heidelberg 2002
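The normalization discipline just described can be prototyped in a few lines. The sketch below is purely illustrative, not JITty's implementation; it assumes a toy representation in which terms are (symbol, argument-list) pairs, variables are strings, rules are (label, lhs, rhs) triples, and a strategy maps a symbol to its annotation:

```python
# Illustrative sketch of normalization under strategy annotations
# (toy representation; NOT JITty's code).

def match(pat, term, env):
    if isinstance(pat, str):                       # variable; non-linear check
        if pat in env:
            return env if env[pat] == term else None
        new = dict(env)
        new[pat] = term
        return new
    if isinstance(term, str) or pat[0] != term[0] or len(pat[1]) != len(term[1]):
        return None
    for p, t in zip(pat[1], term[1]):
        env = match(p, t, env)
        if env is None:
            return None
    return env

def subst(term, env):
    if isinstance(term, str):
        return env[term]
    return (term[0], [subst(a, env) for a in term[1]])

def normalize(term, rules, strategy):
    if isinstance(term, str):
        return term
    f, args = term[0], list(term[1])
    default = list(range(1, len(args) + 1)) + \
              [lab for lab, lhs, _ in rules if lhs[0] == f]    # innermost
    for entry in strategy.get(f, default):
        if isinstance(entry, int):                 # normalize that argument
            args[entry - 1] = normalize(args[entry - 1], rules, strategy)
        else:                                      # try the labelled rule
            _, lhs, rhs = next(r for r in rules if r[0] == entry)
            env = match(lhs, (f, args), {})
            if env is not None:                    # redex found: contract it
                return normalize(subst(rhs, env), rules, strategy)
    return (f, args)
```

With the boolean rules of Figure 1 and the annotation and : [2, a1, a2, 1], this sketch reduces and(loop, F) to F without ever evaluating the diverging first argument, whereas the innermost fallback would not terminate on that term.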
signature
    T(0) F(0) and(2) or(2) loop(0)
rules
    a1([x], and(x,T), x)    o1([x], or(T,x), T)
    a2([x], and(x,F), F)    o2([x], or(F,x), x)
    l([], loop, loop)
default justintime
strategies
    and([2,a1,a2,1])
end
rewrite( and(loop,F) )
rewrite( or(T,loop) )
rewrite( or(and(loop,F),or(T,loop)) )
stop

Fig. 1. Boolean example of a demonstrator file.
It has been proved in [4] that if a normal form is computed under a strategy annotation satisfying the above restrictions, then the result is a normal form of the original TRS without strategies. Therefore JITty checks these criteria.

The following strategies are predefined: leftmost innermost, which first normalizes all arguments and subsequently tries all rewrite rules; and just-in-time, which also normalizes its arguments from left to right, but tries to apply rewrite rules as soon as their needed arguments have been evaluated.

JITty's strategy annotations are similar to OBJ's annotations (e.g. [3]). The annotations of JITty are more refined, because rules can be mentioned individually, but less sophisticated, because laziness annotations are not supported. See [4] for other rule-based systems with user-controlled strategies (ELAN, Maude, OBJ-family, Stratego).

Although, strictly speaking, evaluation can be done when strategy annotations are not in-time or full, an interesting optimization can be applied if they are. In particular, for in-time annotations we have that all subterms of a normal form are in normal form. Hence, in α : f(g(x)) → h(x), with f : [1, α], it is guaranteed that the argument of h will be normal, so it will not be traversed. Therefore, JITty currently only supports "correct" annotations.
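The full and in-time criteria themselves admit a compact sketch. The code below is an illustration of the definitions above, not JITty's checker; it assumes rules are (label, lhs, rhs) triples over (symbol, argument-list) terms with string variables:

```python
# Sketch of the "full" and "in-time" checks on one symbol's annotation.
# A rule needs argument i if the i-th argument of its left-hand side
# starts with a function symbol, or is a variable occurring more than
# once in the left-hand side. (Illustrative only.)

def lhs_vars(t):
    return [t] if isinstance(t, str) else [v for a in t[1] for v in lhs_vars(a)]

def needed_args(lhs):
    occurrences = lhs_vars(lhs)
    return {i for i, a in enumerate(lhs[1], start=1)
            if not isinstance(a, str) or occurrences.count(a) > 1}

def check_annotation(arity, rules_for_f, annotation):
    """Return (full, in_time) for one function symbol's annotation."""
    ints = {e for e in annotation if isinstance(e, int)}
    labels = {e for e in annotation if not isinstance(e, int)}
    full = ints == set(range(1, arity + 1)) and \
           labels == {lab for lab, _, _ in rules_for_f}
    seen, in_time = set(), True
    for entry in annotation:
        if isinstance(entry, int):
            seen.add(entry)
        else:
            lhs = next(l for lab, l, _ in rules_for_f if lab == entry)
            in_time = in_time and needed_args(lhs) <= seen
    return full, in_time
```

For the and rules of Figure 1 the annotation [2, a1, a2, 1] checks out as both full and in-time, while [1, a1, a2, 2] is full but not in-time, since a1 and a2 both need the second argument.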
3 Simple Demonstrator
We provide a simple demonstrator, which reads a file containing a signature, a number of rules, a default strategy, a number of user defined strategy annotations, and a number of commands. It has a fixed structure, as shown in Figure 1. The signature consists of a number of function symbols with their arity. A rule consists of a label, a list of variables, a left hand side and a right hand side. The default strategy should be either innermost or justintime. The default strategy can be overwritten for each function symbol, by an annotation, being a mixed list of integers and rule labels.
The commands are of the form rewrite(term), to start rewriting. The three examples in Figure 1 are carefully chosen to terminate. Other examples may loop for ever. After replacing justintime by innermost, the first term will terminate, but the last two will not. The website shows more examples, such as the following rule for division, which terminates for closed x and y by virtue of the just-in-time strategy: div(x,y) → if(lt(x,y),O,S(div(minus(x,y),y))).
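Why the just-in-time strategy finds a normal form for div can be seen by transcribing its evaluation order directly: with an annotation (names here hypothetical) such as if : [1, iftrue, iffalse, 2, 3], the guard lt(x,y) is normalized and an if-rule fired before the recursive call in the third argument is ever touched. A purely illustrative transcription over Python integers standing for closed naturals:

```python
# Transcription of div under the just-in-time evaluation order
# (illustrative; closed naturals modelled as Python ints).

def div(x, y):
    # annotation entry 1: normalize the guard lt(x, y) first
    if x < y:
        return 0            # rule if(T, a, b) -> a: argument 3 is never built
    # rule if(F, a, b) -> b: only now normalize S(div(minus(x, y), y))
    return 1 + div(x - y, y)
```

Under an innermost strategy the corresponding rewrite rule would unfold the recursive call unconditionally and never terminate.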
4 Embedding in the µCRL Toolset
A µCRL specification consists of an equational data theory and a process part. The µCRL toolset [1] contains, among other things, an automated theorem prover for the equational theory and a simulator for the process part, which serves as the front end of visualization and model checking tools. Both tools depend heavily on term rewriting in order to decide the equational theory. Therefore we need that the strategy annotations always yield normal forms of the TRS.

The simulator has to normalize the guards of transition rules (i.e., terms with state variables). The same guard is rewritten in many states. For efficiency reasons, JITty maintains a current environment, which is a normalized substitution. Given a global environment σ, rewriting a term t now means computing the normal form of tσ, assuming that the substitution σ is normalized. This can be exploited in the implementation: t has to be traversed only once, and xσ (for variables x in t) is not traversed at all, because it is supposed to be in normal form. The user can modify the current environment by assigning a term to a variable, provided this term is in normal form. To resolve name conflicts, the user can enter and leave blocks. Entering a block doesn't change the global environment, but leaving a block restores the previous environment.

The theorem prover is based on binary decision diagrams (BDDs) with equations instead of proposition symbols. BDDs are nothing but highly shared if-then-else trees. An important optimization, crucial when rewriting BDDs, is that the rewriter can be put in "hash mode". In this case, each computed result is stored in a look-up table, so each sub-computation is performed only once.
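The effect of hash mode can be pictured with a small memoizing normalizer. The sketch below is an illustration of the idea only, not JITty's implementation; terms are nested tuples (hashable, mimicking maximally shared ATerms) and step is a user-supplied function that performs one rewrite step at the root or returns None:

```python
# Illustration of "hash mode": memoize normal forms so each distinct
# (shared) subterm is normalized only once. NOT JITty's code.

class HashModeRewriter:
    def __init__(self, step):
        self.step = step              # one root rewrite step, or None
        self.table = {}               # the look-up table of hash mode

    def normalize(self, term):
        if term in self.table:
            return self.table[term]   # sub-computation already done
        if isinstance(term, tuple):   # normalize arguments first
            term = (term[0],) + tuple(self.normalize(a) for a in term[1:])
        reduced = self.step(term)
        nf = term if reduced is None else self.normalize(reduced)
        self.table[term] = nf
        return nf

    def flush(self):                  # analogue of JIT_flush()
        self.table.clear()

# a toy step function: fold an addition whose arguments are evaluated
def plus_step(t):
    if isinstance(t, tuple) and t[0] == 'plus' \
            and all(isinstance(a, int) for a in t[1:]):
        return t[1] + t[2]
    return None
```

On a term with repeated subterms, e.g. plus(plus(1,2), plus(1,2)), the second occurrence is answered from the table, which is exactly what makes rewriting heavily shared BDDs feasible.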
5 Application Programmer's Interface
JITty is implemented in the programming language C, and relies on the ATerm library [2]. This library is supposed to have an efficient term implementation. It guarantees that terms are always stored in maximally shared form. Moreover, the µCRL toolset uses ATerms as well, so at term level there is no translation between µCRL and JITty. Therefore, ATerms also show up in the Application Programmer's Interface of JITty, shown in Figure 2. A complete C program using the basic functionality is shown in Figure 3. JIT_init is used to initialize (or reset) the rewriter. At initialization, the following information is needed: lists of function symbols, rewrite rules and strategy annotations, an indication of the default strategy (currently one of the constants INNERMOST or JUSTINTIME) and an indication whether hash tables should be used
370
J. van de Pol
#include "aterm2.h"
#define INNERMOST 1
#define JUSTINTIME 2

void JIT_init(ATermList funs, ATermList rules, ATermList strategy,
              int default_strat, char withhash);
ATerm JIT_normalize(ATerm t);
void JIT_flush(void);
void JIT_assign(Symbol v, ATerm t);
void JIT_enter(void);
void JIT_leave(void);
void JIT_clear(void);
int JIT_level(void);

Fig. 2. JITty – Application Programmer's Interface

#include "jitty.h"
int main(int argc, char* argv[])
{
  ATinitialize(argc,argv);                      /* initialize ATerms   */
  JIT_init(ATparse("[f(1),g(2),a(0),b(0)]"),    /* signature           */
           ATparse("[frule([x],f(x),g(x,b)),"   /* rule for f          */
                   " grule([x],g(x,x),f(a))]"), /* rule for g          */
           ATparse("[f([frule,1])]"),           /* strategy annotation */
           INNERMOST,                           /* default strategy    */
           0);                                  /* without hashing     */
  ATprintf("%t\n",JIT_normalize(ATparse("f(b)")));
}

Fig. 3. Example program of using JITty
(0 = no, 1 = yes). JIT_normalize(t) returns the normal form of t in the current environment. JIT_flush() is used to clear the hash table, in case it becomes too memory consuming. The other functions are used to manipulate the global environment, as explained in Section 4. JIT_enter() and JIT_leave() can be used to enter or leave a new block. JIT_clear() undoes all bindings in the current block. JIT_assign(v,t) assigns term t to variable v (represented as Symbol from the ATerm library). Finally, JIT_level() returns the current level.
References

1. S. Blom, W. Fokkink, J. Groote, I. van Langevelde, B. Lisser, and J. van de Pol. µCRL: A toolset for analysing algebraic specifications. In Proceedings of CAV 2001, LNCS 2102, pages 250–254, 2001. See also http://www.cwi.nl/~mcrl/.
2. M. van den Brand, H. de Jong, P. Klint, and P.A. Olivier. Efficient Annotated Terms. Software – Practice & Experience, 30:259–291, 2000.
3. J. Goguen, T. Winkler, J. Meseguer, K. Futatsugi, and J.-P. Jouannaud. Introducing OBJ. In J. Goguen and G. Malcolm, editors, Software Engineering with OBJ: algebraic specification in action. Kluwer, 2000.
4. J. van de Pol. Just-in-time: On strategy annotations. In B. Gramlich and S. Lucas, editors, Electronic Notes in TCS, volume 57, 2001. (Proc. of WRS 2001, Utrecht).
Autowrite: A Tool for Checking Properties of Term Rewriting Systems

Irène Durand

LaBRI, Université de Bordeaux I
33405 Talence, France
[email protected]
1 Introduction
Huet and Lévy [6] showed that for the class of orthogonal term rewriting systems (TRSs) every term not in normal form contains a needed redex (i.e., a redex contracted in every normalizing rewrite sequence) and that repeated contraction of needed redexes results in a normal form if it exists. However, neededness is in general undecidable. In order to obtain a decidable approximation to neededness, Huet and Lévy introduced the subclass of strongly sequential TRSs and showed that strong sequentiality is a decidable property of orthogonal TRSs. Several authors [2,3,7,9,10,11,12] proposed decidable extensions of the class of strongly sequential TRSs. Since Comon's [1,2] and Jacquemard's [7] work, all the corresponding decidability proofs have been expressed using tree automata techniques: typically, a property is satisfied if and only if some associated automaton recognizes the empty language. A uniform framework for the study of sequentiality is described in [3], where classes of TRSs are parameterized by approximation mappings. Nowadays the best known approximation for which sequentiality is decidable is the growing approximation studied in [10].

We assume that the reader is familiar with the basics of term rewriting [8]. Familiarity with the theory of sequentiality ([1,3,4,5,6,7,10,11]) is helpful.

Autowrite is an experimental tool written in Common Lisp for checking properties of TRSs. It was initially designed to check sequentiality properties of TRSs. For this purpose, it implements the tree automata constructions used in [7,3,4,10] and many useful operations on terms, TRSs and tree automata (unfortunately not all yet integrated into the graphical interface). A graphical interface (still under construction) is written using FreeCLIM, the free implementation of the CLIM specification. Using this interface, one can check sequentiality of the different approximations of a given system.
Many other functionalities are available directly from the Common Lisp interactive environment and could easily be integrated into the graphical interface. The Autowrite tool was used to check sequentiality for most of the examples presented in [5]. Our term rewriting systems consist of left-linear rewrite rules l → r that satisfy root(l) ∉ V and Var(r) ⊆ Var(l). The following example is adapted from [9].

S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 371–375, 2002. © Springer-Verlag Berlin Heidelberg 2002
R1 = { f(a, g(x0, a)) → a,
       f(b, g(a, x0)) → a,
       f(x0, a) → x0,
       g(b, b) → a }
If the condition Var(r) ⊆ Var(l) is not imposed, we speak of extended TRSs (eTRSs). Such TRSs arise naturally when we approximate TRSs, as explained below.

Let R be an eTRS over a signature F. The set of ground normal forms of R is denoted by NF(R). Let R• be the eTRS R ∪ {• → •} over the extended signature F• = F ∪ {•}. We say that redex ∆ in C[∆] ∈ T(F) is R-needed if there is no term t ∈ NF(R) such that C[•] →*R t. Finally, we say that R is sequential if every reducible term in T(F) contains an R-needed redex.

Let R and S be eTRSs over the same signature. We say that S approximates R if →*R ⊆ →*S and NF(R) = NF(S). An approximation mapping is a mapping α from TRSs to eTRSs with the property that α(R) approximates R, for every TRS R. Given a TRS R, we say that R is α-sequential if α(R) is sequential.

Next we define the approximation mappings s, nv, and gr. Let R be a TRS. The strong approximation s(R) is obtained from R by replacing the right-hand side of every rewrite rule by a variable that does not occur in the corresponding left-hand side. The nv approximation nv(R) is obtained from R by replacing the variables in the right-hand sides of the rewrite rules by pairwise distinct variables that do not occur in the corresponding left-hand sides. The growing approximation gr(R) is obtained from R by renaming the variables in the right-hand sides that occur at a depth greater than 1 in the corresponding left-hand side.

Given a TRS R and α ∈ {s, nv, gr}, it is decidable whether α(R) is sequential. However, the decision procedures are very complex (exponential for s, doubly exponential for nv and gr [4]), so it is impossible to show by hand that a particular system is sequential, even for very small systems. However, showing that a system is not sequential can be done more easily, by exhibiting a term with no needed redex. This latter property of a term can be proved in polynomial time.
For R1, we obtain the following approximated TRSs:

s(R1) = { f(a, g(x0, a)) → y0,
          f(b, g(a, x0)) → y0,
          f(x0, a)       → y0,
          g(b, b)        → y0 }

nv(R1) = { f(a, g(x0, a)) → a,
           f(b, g(a, x0)) → a,
           f(x0, a)       → y0,
           g(b, b)        → a }

gr(R1) = R1

The next example (R2) is adapted from [11].

R2 = { f(g(a, x0), a)    → a,
       f(g(x0, a), k(a)) → a,
       f(k(c), x0)       → a,
       g(k(a), k(a))     → h(k(a)),
       k(k(a))           → a,
       h(x0)             → k(x0) }

nv(R2) = { f(g(a, x0), a)    → a,
           f(g(x0, a), k(a)) → a,
           f(k(c), x0)       → a,
           g(k(a), k(a))     → h(k(a)),
           k(k(a))           → a,
           h(x0)             → k(y0) }

gr(R2) = R2
Autowrite: A Tool for Checking Properties of Term Rewriting Systems
The last example (R3) is adapted from Berry’s example (the first three rules).

R3 = { f(x0, a, b) → h(x0),
       f(b, x0, a) → h(x0),
       f(a, b, x0) → h(x0),
       h(k(x0))    → g(x0, x0),
       g(a, a)     → g(a, a),
       g(a, b)     → a,
       g(b, a)     → b }

gr(R3) = { f(x0, a, b) → h(x0),
           f(b, x0, a) → h(x0),
           f(a, b, x0) → h(x0),
           h(k(x0))    → g(y0, y0),
           g(a, a)     → g(a, a),
           g(a, b)     → a,
           g(b, a)     → b }
Autowrite computes α(R) for any approximation α in {s, nv, gr}.
2 Automata
Here are the different kinds of automata that Autowrite can build:

1. Given a set of ground terms L: an automaton AL such that L(AL) = L.
2. Given a set of linear terms L: an automaton AL such that L(AL) = {σ(t) | t ∈ L and σ is a ground substitution}.
3. Given the set of left-hand sides of a left-linear TRS R: an automaton ANF(R) such that L(ANF(R)) = NF(R).
4. Given any regular tree language L recognized by an automaton AL and a left-linear growing eTRS R: a deterministic automaton CR,L (as described in [10]) such that L(CR,L) = (←∗R)(L).
5. Given a left-linear growing eTRS R: an automaton DR such that L(DR) = ∅ if and only if R is sequential [4].

Autowrite may check the following properties of tree automata:

1. Given an automaton A: does L(A) = ∅ hold? If not, it may exhibit ground terms belonging to the language.
2. Given a ground term t and an automaton A: does t ∈ L(A) hold?

To decide whether a TRS R is α-sequential, Autowrite computes α(R), then builds the automaton Dα(R), and finally checks whether L(Dα(R)) = ∅. If so, it concludes that R is α-sequential; otherwise it exhibits a ground term of L(Dα(R)), which is a term with no α(R)-needed redex. Note that to build Dα(R), Autowrite must first compute Cα(R) = Cα(R),NF(R). For Cα(R), both Jacquemard’s algorithm for growing-linear systems and Nagaya and Toyama’s algorithm for linear-growing systems have been implemented. For Dα(R), we have implemented the algorithm presented in [4]. In fact, these three algorithms have been adapted to compute directly automata with only accessible states. This complicates the code but considerably reduces the size of the constructed automata.
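The automaton ANF(R) recognizes exactly the ground normal forms. For a left-linear TRS the same language has a direct description: a ground term is a normal form iff no left-hand side matches any of its subterms. The following Python sketch illustrates this characterization (the term encoding is ours, not Autowrite's; variables are strings, compound terms tuples):

```python
def matches(pattern, t):
    """Does the linear pattern match ground term t at the root?
    Variables are strings; compound terms are (symbol, arg1, ..., argn)."""
    if isinstance(pattern, str):     # a variable matches anything
        return True
    return (t[0] == pattern[0] and len(t) == len(pattern)
            and all(matches(p, s) for p, s in zip(pattern[1:], t[1:])))

def is_normal_form(t, lhss):
    """t is a normal form iff no lhs matches t or any subterm of t."""
    if any(matches(l, t) for l in lhss):
        return False
    return all(is_normal_form(arg, lhss) for arg in t[1:])

# Left-hand sides of R1 above.
lhss = [("f", ("a",), ("g", "x0", ("a",))),
        ("f", ("b",), ("g", ("a",), "x0")),
        ("f", "x0", ("a",)),
        ("g", ("b",), ("b",))]
print(is_normal_form(("g", ("b",), ("a",)), lhss))   # True
print(is_normal_form(("f", ("b",), ("a",)), lhss))   # False: f(x0, a) matches
```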
3 Experimental Results
The graphical version of Autowrite can be found at http://dept-info.labri.u-bordeaux.fr/˜idurand/autowrite
Autowrite has been used to check α-sequentiality for many linear TRSs. It can also check many other properties of TRSs and perform computations on tree automata; although these are not yet accessible from the graphical interface, they can be called from Lisp. However, with regard to sequentiality, one cannot hope to use Autowrite for large TRSs because the size of the constructed automaton Dα(R) is doubly exponential in the size of R, as shown in [4]. With Autowrite, we were able to check sequentiality for many examples found in the literature. In the following table we present a few sequentiality results for some TRSs R with different approximations. We give the number of states (st) and rules (rl) of the automata Cα(R) and Dα(R) built to decide α-sequentiality. If the TRS is not α-sequential, we give the first term with no α(R)-needed redex computed by Autowrite.

        α    Time (s)   Cα(R) st/rl   Dα(R) st/rl   Sequentiality result
R1      s    13         20 / 412      107 / 2181    f(g(b, b), g(g(b, b), g(b, b)))
        nv   101        25 / 742      243 / 7463    f(f(a, a), g(f(a, a), f(a, a)))
        gr   98         15 / 182      114 / 16477   Sequential
R2      s    19         24 / 573      109 / 2460    f(g(h(a), h(a)), h(a))
        nv   69         33 / 1329     400 / 4391    f(g(h(a), h(a)), h(h(a)))
        gr   145        19 / 292      117 / 17469   Sequential
R3      s    3          12 / 435      15 / 179      f(g(b, a), g(b, a), g(b, a))
        nv   16         14 / 857      27 / 716      f(f(b, b, a), f(b, b, a), f(b, b, a))
        gr   263        11 / 293      30 / 8187     Sequential
References

1. H. Comon. Sequentiality, second-order monadic logic and tree automata. In Proc. 10th LICS, pages 508–517, 1995.
2. H. Comon. Sequentiality, monadic second-order logic and tree automata. Information and Computation, 1999. Full version of [1].
3. I. Durand and A. Middeldorp. Decidable call by need computations in term rewriting (extended abstract). In Proc. 14th CADE, volume 1249 of LNAI, pages 4–18, 1997.
4. Irène Durand and Aart Middeldorp. On the complexity of deciding call-by-need. Technical Report 1196-98, LaBRI, 1998.
5. Irène Durand and Aart Middeldorp. On the modularity of deciding call-by-need. In Foundations of Software Science and Computation Structures, volume 2030 of Lecture Notes in Computer Science, pages 199–213, Genova, 2001. Springer-Verlag.
6. G. Huet and J.-J. Lévy. Computations in orthogonal rewriting systems, I and II. In Computational Logic, Essays in Honor of Alan Robinson, pages 396–443. The MIT Press, 1991. Original version: Report 359, Inria, 1979.
7. F. Jacquemard. Decidable approximations of term rewriting systems. In Proc. 7th RTA, volume 1103 of LNCS, pages 362–376, 1996.
8. J.W. Klop. Term rewriting systems. In Handbook of Logic in Computer Science, Vol. 2, pages 1–116. Oxford University Press, 1992.
9. J.W. Klop and A. Middeldorp. Sequentiality in orthogonal term rewriting systems. Journal of Symbolic Computation, 12:161–195, 1991.
10. Takashi Nagaya and Yoshihito Toyama. Decidability for left-linear growing term rewriting systems. In Proc. 10th RTA, volume 1631 of LNCS, 1999.
11. M. Oyamaguchi. NV-sequentiality: A decidable condition for call-by-need computations in term rewriting systems. SIAM Journal on Computing, 22:114–135, 1993.
12. Y. Toyama. Strong sequentiality of left-linear overlapping term rewriting systems. In Proc. 7th LICS, pages 274–284, 1992.
TTSLI: An Implementation of Tree-Tuple Synchronized Languages

Benoit Lecland and Pierre Réty

LIFO - Université d’Orléans
B.P. 6759, 45067 Orléans cedex 2, France
{lecland,rety}@lifo.univ-orleans.fr
http://www.univ-orleans.fr/SCIENCES/LIFO/Members/{lecland,rety}
1 Introduction
Tree-tuple synchronized languages were first introduced by means of Tree-Tuple Synchronized Grammars (TTSGs) [3], and have recently been reformulated by means of (so-called) Constraint Systems (CSs), which made it possible to prove more properties [2,1]. A number of applications to rewriting and to concurrency have been presented (see [5] for a survey). TTSLI is an implementation of constraint systems, together with the main operations. It is written in Java, and is available on Lecland’s web page.
2 Constraint Systems for Tuple Synchronized Languages
We just give examples to recall what a CS is. Formal definitions are given in [2].

Example 1. Over the signature Σ = {f\2, b\0}, let Lid = {(t, t) | t ∈ TΣ} be the set of pairs of identical terms. Lid can be defined by the following grammar, given in the form of a constraint system:

Xid ⊇ (b, b)
Xid ⊇ (f(1_1, 2_1), f(1_2, 2_2)) (Xid, Xid)

where 1_1, 2_1, 1_2, 2_2 abbreviate pairs (for readability). For example, 2_1 means (2, 1), which denotes the first component of the second argument (the second Xid). Note that since 1_1 and 1_2 come from the same Xid, they represent two identical terms; in other words they are linked (synchronized), whereas for example 1_1 and 2_1 are independent.

Example 2. Now consider the slightly different constraint system:

Xsym ⊇ (b, b)
Xsym ⊇ (f(1_1, 2_1), f(2_2, 1_2)) (Xsym, Xsym)

We get the set Lsym = {(t, tsym) | tsym is the symmetric tree of t}.

S. Tison (Ed.): RTA 2002, LNCS 2378, pp. 376–379, 2002. © Springer-Verlag Berlin Heidelberg 2002
TTSLI: An Implementation of Tree-Tuple Synchronized Languages
377
Example 3. Over the signature Σ = {s\1, b\0}, let Ldble = {(s^n(b), s^2n(b)) | n ∈ ℕ}. It can be defined by the constraint system:

Xdble ⊇ (b, b)
Xdble ⊇ (s(1_1), s(s(1_2))) Xdble

Constraint systems are closed under union, projection, and (assuming a restriction) join. Membership and emptiness are decidable. They are not closed under intersection, except for a subclass [4]. The implementation also includes an extension of membership: language matching.
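TTSLI itself is written in Java, but the bottom-up semantics of a CS is easy to sketch: a constraint X ⊇ (t1, ..., tn)(X1, ..., Xk) builds a new tuple from one tuple of each argument non-terminal. A Python sketch with an illustrative encoding of our own, where ('REF', i, j) stands for i_j, component j of argument i:

```python
from itertools import product

def subst(t, args):
    """Replace every placeholder ('REF', i, j) in template term t by
    component j of the i-th argument tuple (both 1-indexed)."""
    if t[0] == "REF":
        return args[t[1] - 1][t[2] - 1]
    return (t[0],) + tuple(subst(a, args) for a in t[1:])

def generate(cs, axiom, rounds):
    """Naive bottom-up fixed-point iteration, 'rounds' times.
    cs maps each non-terminal to a list of (template, arguments) pairs."""
    lang = {x: set() for x in cs}
    for _ in range(rounds):
        for x, constraints in cs.items():
            for tmpl, arg_nts in constraints:
                # snapshot the argument languages before adding new tuples
                for args in product(*[list(lang[nt]) for nt in arg_nts]):
                    lang[x].add(tuple(subst(t, args) for t in tmpl))
    return lang[axiom]

# Xdble >= (b, b)   and   Xdble >= (s(1_1), s(s(1_2))) Xdble
cs = {"Xdble": [
    ((("b",), ("b",)), []),
    ((("s", ("REF", 1, 1)), ("s", ("s", ("REF", 1, 2)))), ["Xdble"]),
]}
L = generate(cs, "Xdble", 4)
# L contains (b, b), (s(b), s(s(b))), (s^2(b), s^4(b)), ...
```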
3 Implementation

3.1 Language Matching
Example 4. Consider again the CS Csym given in Example 2, which generates Lsym = {(t, tsym)}. We want to find a CS that generates the triple language U (U for Unknown) such that:

(t1, t2, t3) ∈ U ⟺ (f(t1, b), f(t2, t3)) ∈ Lsym

This language matching problem is denoted by P = (f(1_1, b), f(1_2, 1_3))U,Xsym, which means that if we replace 1_1, 1_2, 1_3 by the components of U, we should get the language Lsym (whose axiom is Xsym). Obviously, U = {(t, b, tsym) | t ∈ TΣ}, and it can be generated by the CS {U ⊇ (1_1, b, 1_2) Xsym} ∪ Csym. Note that the language matching problem is an extension of the membership problem: for example, if (t, t′) is a pair of ground terms¹,

(t, t′) ∈ Lsym ⟺ (t, t′)U,Xsym = true

where U is a 0-tuple language. The proof of the algorithm has been written and gives a new result about CSs: closure under language matching.

Algorithm. Starting from the initial problem P, the algorithm works in several steps. A step generates one constraint and a number of language matching subproblems to be solved. When no more subproblems are left, the algorithm stops. The size of each subproblem is not greater than the size of the initial problem, so there are only finitely many subproblems (up to variable renaming). Thus, provided the algorithm takes care not to solve the same subproblem twice, it terminates.
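The termination argument above is the standard worklist-with-memoization pattern. A generic Python sketch (the step function is a placeholder of ours standing in for the actual constraint-generating step):

```python
def solve(initial, step):
    """Process subproblems with a worklist; the 'seen' set ensures each
    subproblem is solved at most once.  Since subproblems are bounded in
    size, 'seen' cannot grow forever, so the loop terminates."""
    seen, worklist, constraints = set(), [initial], []
    while worklist:
        p = worklist.pop()
        if p in seen:
            continue
        seen.add(p)
        constraint, subproblems = step(p)
        constraints.append(constraint)
        worklist.extend(subproblems)
    return constraints

# Toy step: problem n yields constraint "c_n" and subproblem n // 2.
toy = lambda n: (f"c_{n}", [n // 2] if n > 0 else [])
print(solve(8, toy))   # ['c_8', 'c_4', 'c_2', 'c_1', 'c_0']
```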
3.2 Projection
Let X be the axiom of a constraint system C that recognizes a language of n-tuples, and let {x1, ..., xp} ⊆ {1, ..., n}. To obtain the projection of C on components x1, ..., xp, one can add to C the constraint Xproj ⊇ (1_x1, ..., 1_xp)(X) and take Xproj as the axiom.

¹ Recall that CSs express only ground terms.
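In any concrete encoding, this naive projection is a one-constraint operation. A Python sketch over a hypothetical CS representation of ours, where a constraint is a (template, argument-non-terminals) pair and ('REF', 1, x) plays the role of 1_x:

```python
def project_naive(cs, axiom, components):
    """Add Xproj >= (1_x1, ..., 1_xp)(axiom) and make Xproj the axiom.
    'cs' maps non-terminals to lists of (template, arguments) constraints."""
    new = dict(cs)
    new["Xproj"] = [(tuple(("REF", 1, x) for x in components), [axiom])]
    return new, "Xproj"

# Project a CS with axiom X on components 1 and 3 (as in Example 5 below;
# the constraints of X and X1 are elided here).
cs = {"X": [], "X1": []}
projected, axiom = project_naive(cs, "X", [1, 3])
print(projected["Xproj"])   # [((('REF', 1, 1), ('REF', 1, 3)), ['X'])]
```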
Example 5. Let C be the following CS:

X  ⊇ (s(1_1), p(1_2), f(1_3)) X1
X1 ⊇ (s(1_1), p(1_2), f(1_3)) X
X  ⊇ (a, b, c)

C generates the language L = {(s^2n(a), p^2n(b), f^2n(c)) | n ∈ ℕ}. The projection on components 1 and 3, i.e. the language L′ = {(s^2n(a), f^2n(c)) | n ∈ ℕ}, is generated by the CS C′ = {Xproj ⊇ (1_1, 1_3) X} ∪ C. However, a better CS for generating L′ obviously is:

X  ⊇ (s(1_1), f(1_2)) X1
X1 ⊇ (s(1_1), f(1_2)) X
X  ⊇ (a, c)

The following algorithm returns a CS that generates only the useful intermediate components.

Algorithm. Let X be the axiom of a constraint system C that recognizes a language L of n-tuples, and let E = {x1, ..., xp} ⊆ {1, ..., n}. Let ΠX,E(C) denote a CS that generates the projection of L on components x1, ..., xp. Let πX,E(C) be the operation that, for each constraint X ⊇ (t1, ..., tn)(X1, ..., Xk) ∈ C, generates X ⊇ (t_x1, ..., t_xp)(X1, ..., Xk); it thus removes useless components (but not useless non-terminals). ΠX,E(C) is computed by a recursive function:

– C′ = ∅
– D = πX,E(C)
– for each constraint X ⊇ (t1, ..., tp)(X1, ..., Xk) ∈ D, for each i ∈ {1, ..., k}, let C′ = C′ ∪ ΠXi,Ei(C), where Ei = {x | i_x occurs in (t1, ..., tp)}
– return C′ ∪ D

At the end, we clean the resulting CS by removing useless non-terminals (if any).
3.3 Join
Join is natural join, as in databases. The general algorithm is given in [1]. Given a language S of l-tuples and a language T of n-tuples, and given i ∈ {1, ..., l} and j ∈ {1, ..., n}, the i,j-join of S and T is an (l + n − 1)-tuple language, denoted S ✶i,j T and defined by:

{(s1, ..., sl, t1, ..., t_{j−1}, t_{j+1}, ..., tn) | (s1, ..., sl) ∈ S ∧ (t1, ..., tn) ∈ T ∧ si = tj}

S ✶ T stands for S ✶l,1 T.

Example 6. Let L1 = {(s^n(a), s^2n(a)) | n ∈ ℕ} and L2 = {(s^3n(a), s^n(a)) | n ∈ ℕ}. The join of L1 and L2, respectively on components 2 and 1, is

L1 ✶2,1 L2 = {(s^3n(a), s^6n(a), s^2n(a)) | n ∈ ℕ}
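On finite (enumerated) languages the i,j-join is a direct transcription of the definition. A Python sketch of ours, checked on Example 6 truncated to a bound:

```python
def join(S, T, i, j):
    """i,j-join of S (l-tuples) and T (n-tuples) given as finite sets:
    keep the pairs with s_i = t_j and drop component j of the T-tuple."""
    return {s + t[:j - 1] + t[j:]
            for s in S for t in T if s[i - 1] == t[j - 1]}

def s_pow(n):
    """Build the term s^n(a) as nested tuples."""
    t = ("a",)
    for _ in range(n):
        t = ("s", t)
    return t

L1 = {(s_pow(n), s_pow(2 * n)) for n in range(10)}   # (s^n(a), s^2n(a))
L2 = {(s_pow(3 * n), s_pow(n)) for n in range(10)}   # (s^3n(a), s^n(a))
J = join(L1, L2, 2, 1)
# J = {(s^3n(a), s^6n(a), s^2n(a))} for every n small enough to appear
print((s_pow(3), s_pow(6), s_pow(2)) in J)   # True
```

On infinite languages the real algorithm of course works on the constraint systems themselves rather than on enumerations; this sketch only checks the definition.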
3.4 Other Operations
Checking emptiness for a CS is very similar to checking emptiness for a regular tree language defined by a regular grammar; see [2]. Union is also implemented, as well as a membership test that is faster than running the language matching test on a ground tuple.
4 Experiments
We have run an application to rewriting as described in [1]: given regular tree languages L1, L2 and a binary tree relation R defined by a CS,

R(L1) ∩ L2 = ∅ ⟺ Π2(L1 ✶ R) ✶ L2 = ∅

Let R be the relation defined by the single rewrite rule f(x, y, z, u) → f(u, z, x, y), and let L1 = (s*(a), s*(b), s*(b), s*(b)). To show that R³(L1) ∩ L1 = ∅, where R³ is rewriting in three steps, we check that

Π2(L1 ✶ R³) ✶ L1 = Π2(Π2(Π2(L1 ✶ R) ✶ R) ✶ R) ✶ L1 = ∅

We have successfully run this example on a Sun Ultra 5 computer in 50 seconds.
5 Conclusion

Thanks to this implementation, we also hope to run the other known applications of synchronized languages, as well as future ones.
References

1. V. Gouranton, P. Réty, and H. Seidl. Synchronized Tree Languages Revisited and New Applications. Research Report 2000-16, LIFO, 2000. http://www.univ-orleans.fr/SCIENCES/LIFO/Members/rety/publications.html.
2. V. Gouranton, P. Réty, and H. Seidl. Synchronized Tree Languages Revisited and New Applications. In Proceedings of FoSSaCS, volume 2030 of LNCS. Springer-Verlag, 2001.
3. S. Limet and P. Réty. E-Unification by Means of Tree Tuple Synchronized Grammars. In Proceedings of the 6th Colloquium on Trees in Algebra and Programming, volume 1214 of LNCS, pages 429–440. Springer-Verlag, 1997. Full version in DMTCS (http://dmtcs.loria.fr/), volume 1, pages 69–98, 1997.
4. S. Limet, P. Réty, and H. Seidl. Weakly Regular Relations and Applications. In Proceedings of the 12th Conference on Rewriting Techniques and Applications, Utrecht (The Netherlands), volume 2051 of LNCS. Springer-Verlag, 2001.
5. P. Réty. Langages synchronisés d’arbres et applications. Habilitation thesis (in French), LIFO, Université d’Orléans, June 2001.
in2: A Graphical Interpreter for Interaction Nets

Sylvain Lippi

Institut de Mathématiques de Luminy, Marseille, France
[email protected]
in2 can be considered an attractive and didactic tool for approaching the interaction net paradigm. But it is also a C implementation of the core of a real programming language, featuring a user-friendly graphical syntax and an efficient, garbage-collector-free execution.
1 Introduction
Interaction nets, introduced by Yves Lafont [3], are a generalization of the proof nets of Jean-Yves Girard [1]. They were originally presented as a model of computation, but we focus here on the programming language viewpoint. Indeed, we present a concrete implementation of an interpreter for interaction nets: in2. We first briefly recall the main principles of interaction nets and give an example of an interaction system. Then we give a short description of in2 and show how it can be used to execute a program written with interaction nets.
2 The Interaction Net Paradigm
We present the interaction net paradigm in a practical way by introducing a simple example: the interaction combinators. For a more detailed presentation, see [3].

2.1 Graphical Syntax
– The basic ingredient is a symbol with its arity. Our example system has three symbols: δ and γ of arity 2, and ε of arity 0.
– Occurrences of symbols are called cells. Cells have one principal port, and their number of auxiliary ports is given by the arity of their corresponding symbol. (Figure: the γ- and δ-cells drawn with principal port 0 and auxiliary ports 1 and 2; the ε-cell with its single port 0.)
γ- and δ-cells have three ports: one principal port (0) and two auxiliary ports (1 and 2). ε-cells have only one (principal) port. For esthetic reasons, cells with no auxiliary port are pictured with a circle; the important point is to distinguish the principal port of a cell.
Convention. Auxiliary ports are not interchangeable. For instance, one can number them from 1 to n, keeping 0 for the principal port. In practice, the ports will always be implicitly numbered in clockwise order.
– A net is a graph built with cells and free ports. Ports (principal, auxiliary, or free) are connected pairwise by wires. An example of a net built with the combinators can be found in Figure 1.
Fig. 1. A net and its concrete representation.
– A rule is a pair of nets (left member, right member) with the same number of free ports. The important restriction is that the left member must be built with two cells connected by their principal ports; such a net is called a cut.¹ There is at most one rule for each pair of symbols. Figure 2 gives the rules for the combinators.
Fig. 2. Interaction rules for the combinators
¹ If the two cells of the left member share the same symbol, there is also a symmetry condition on the right member (see [4]).
2.2 Execution
Once a set of symbols and rules has been fixed, we can apply one of those rules to a net, obtaining another net, and so on until we reach an irreducible one. The reduction relation is denoted by ⟶ and its reflexive and transitive closure by ⟶∗. Here is an example of reduction where a net is duplicated by a δ-cell:
(Figure: a δ-cell cut against a tree of γ- and ε-cells duplicates it step by step, ending with two disjoint copies of the tree.)
There may be several cuts in a net, but the order in which they are eliminated does not matter: all reduction strategies lead to the same irreducible net and have the same length (see [4]). The reason is that two instances of a cut are necessarily disjoint, so we can apply the corresponding rules independently.
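This order-independence can be seen on a toy fragment: the ε rules of Figure 2 erase a net cell by cell. A Python sketch of ours, restricted to ε cutting against a tree-shaped net (real in2 of course handles arbitrary nets and rules):

```python
def erase(cell):
    """Simulate ε propagating through a tree-shaped net.  A cell is
    (symbol, child, ..., child); cutting ε against a cell with n
    auxiliary ports creates n new cuts, one per child.  Returns the
    number of interaction steps, which is independent of the order in
    which pending cuts are popped."""
    cuts = [cell]          # pending cuts: ε against each of these cells
    steps = 0
    while cuts:
        c = cuts.pop()     # any strategy works: pop(0) gives the same count
        cuts.extend(c[1:]) # ε moves on to every auxiliary port
        steps += 1
    return steps

net = ("γ", ("δ", ("ε",), ("ε",)), ("ε",))
print(erase(net))   # 5 steps: one interaction per cell in the net
```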
3 Implementation
The interpreter can be freely downloaded at http://iml.univ-mrs.fr/˜lippi. It is written in C in order to obtain an efficient execution and to benefit from multiple program libraries: flex and bison for parsing, the GNU library for memory allocation, and the GIMP Tool Kit [7] for the graphical layout. Nevertheless, the implemented algorithms are widely independent of the programming language.

Let us have a look at the “concrete” representation of a net. Surprisingly, it is closer to a tree structure than to a graph. Basically, the main difference between the graphical representation and the concrete one is that links are oriented in the second case: all connections start from an auxiliary port. The most important data types are the following:

– A cell of arity n is composed of a pointer into the table of symbols and an array of n addresses.
– A port address (or simply address) is composed of a pointer to a cell and a port number n ≥ 0.
– A cut is a pair of addresses.²

Let us remark that the cell and address data types are mutually recursive. Moreover, a cell of arity n, i.e. with n + 1 ports, contains only n addresses; this is due to the fact that no links start from a principal port. Cuts are stored separately in a stack. Instead of defining another data type for free ports, we simply introduce an interface cell ι that collects all the free ports of a net. Figure 1 presents a net and its concrete representation. The same ideas are used for the concrete representation of a rule: we consider the net obtained by connecting the corresponding free ports of the left and the right members.

² In fact, we could simply use a pair of pointers to cells, since the cells of a cut are always connected through their principal ports; this is how cuts are really implemented.
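The C data layout described above translates almost literally into other languages. A Python sketch of ours, not the actual in2 code:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(eq=False)
class Cell:
    """A cell: its symbol plus one address per auxiliary port.  There is
    no field for the principal port: no link starts from it."""
    symbol: str
    aux: List[Optional["Address"]]

@dataclass(eq=False)
class Address:
    """A port address: a cell together with a port number (0 is the
    principal port)."""
    cell: Cell
    port: int

Cut = Tuple[Address, Address]   # in practice both ports are principal

def interface(n_free_ports: int) -> Cell:
    """The ι cell that collects all the free ports of a net."""
    return Cell("ι", [None] * n_free_ports)

# A net with one free port wired to the principal port of an ε-cell.
iface = interface(1)
eps = Cell("ε", [])
iface.aux[0] = Address(eps, 0)
print(iface.aux[0].cell.symbol)   # ε
```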
4 User Manual
An interaction net program (for short, interaction program) is completely described by the following data:

– symbol: the symbols with their respective arities;
– rule: the interaction rules (this is the core of the program);
– net: the net from which the reduction starts.

Similarly to Prolog programs, the difficult part to write is the set of rules, not the starting net to be reduced (or the set of goals to be eliminated). An interaction program is written in a textual syntax. Let us begin with the syntax of a net; it is defined by the following grammar:

NET   ::= empty | CUT NET
CUT   ::= TERM -- TERM
TERM  ::= variable | symbol LIST
LIST  ::= empty | ( LIST1 )
LIST1 ::= TERM | TERM , LIST1
The simplest idea is to start from scratch and introduce cells and connections between the ports (auxiliary, principal, and free ports), using variables to name the ports:

– Take the symbol δ (of arity 2), for example. A δ-cell with the principal port named x and the auxiliary ports respectively named y and z is denoted by x -- δ(y,z) or δ(y,z) -- x.
– A connection between two ports named x and y is denoted by x -- y or y -- x.

Definition 1. Variables that occur only once correspond to free ports and are called free variables. The other variables appear twice; they correspond to connections and are called bound variables.

This syntax is quite simple, but it requires a large number of variable names: one for each port. So there is a syntactic shortcut to lighten the notation: sentences of the form x -- T, where x is a bound variable and T a term, can be removed by replacing the other occurrence of x by T. By repeating this process we obtain a condensed syntax (Lafont’s syntax) where bound variables are needed only for connections between two auxiliary ports. With a bit of training, one can write a net directly in the condensed syntax: putting principal ports down and auxiliary ports up, we obtain trees with connections between leaves. Nevertheless, the user can write a net with the “naïve” syntax.
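Going the other way, expanding a condensed term back into naive sentences, is a simple flattening: each cell inside a term gets a fresh variable naming its principal port. A Python sketch of ours (terms as tuples, variables as strings, fresh names v0, v1, ...):

```python
import itertools

def expand(lhs, rhs):
    """Expand the condensed sentence 'lhs -- rhs' into naive sentences.
    Terms are tuples (symbol, args...); variables are strings."""
    out = []
    counter = itertools.count()

    def flatten(t):
        """Emit one 'v -- sym(...)' sentence per cell inside t and
        return the variable naming t's root (principal) port."""
        if isinstance(t, str):
            return t
        args = [flatten(a) for a in t[1:]]
        v = f"v{next(counter)}"
        body = f"{t[0]}({', '.join(args)})" if args else t[0]
        out.append(f"{v} -- {body}")
        return v

    out.append(f"{flatten(lhs)} -- {flatten(rhs)}")
    return out

print(expand("x", ("γ", ("δ", "a", "b"), "c")))
# ['v0 -- δ(a, b)', 'v1 -- γ(v0, c)', 'x -- v1']
```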
The syntax of a rule is about the same: the left member is described by two cells separated by >< and the right member is written like a net. The corresponding free ports of the two members must share the same names. Rules are separated by commas. The δγ rule can be written

δ(x,y) >< γ(r,s)
x -- γ(a,c)
y -- γ(b,d)
r -- δ(a,b)
s -- δ(c,d)

or, in condensed syntax,

δ(γ(a,c),γ(b,d)) >< γ(δ(a,b),δ(c,d))

Figure 3 is a screenshot of in2 launched on the combinators system. One window shows the “source code” of the program written in the (textual) condensed syntax; the user can specify a display color for each symbol. Two other windows are displayed by the interpreter: the net window shows the net currently being reduced, and the next rule window shows the rule that will be applied if the user presses the reduce button.
Fig. 3. A screenshot of in2
References

1. J.-Y. Girard. Linear logic. Theoretical Computer Science, 50(1):1–102, 1987.
2. Y. Lafont. Interaction nets. In Proceedings of the 17th Annual ACM Symposium on Principles of Programming Languages, Orlando (Fla., USA), pages 95–108, 1990.
3. Y. Lafont. From proof-nets to interaction nets. In Advances in Linear Logic, London Mathematical Society Lecture Note Series 222. Cambridge University Press, 1995.
4. Y. Lafont. Interaction combinators. Information and Computation, 137(1):69–101, 1997.
5. Sylvain Lippi. Interaction nets as a programming language. Technical report, Institut de Mathématiques de Luminy, 2001.
6. Sylvain Lippi. Encoding left reduction in the lambda-calculus with interaction nets. To appear in Mathematical Structures in Computer Science, 2002.
7. Peter Mattis and the GTK+ team. The GIMP Toolkit. http://www.gtk.org, 1998.
Author Index

Baader, Franz 23, 352
Bournez, Olivier 252
Bravenboer, Martin 237
Charatonik, Witold 311
Déharbe, David 207
Dougherty, Dan 340
Durand, Irène 371
Faure, Germain 66
Forest, Julien 174
Fujinaka, Youhei 98
Geser, Alfons 267
Kaji, Yuichi 98
Kennaway, Richard 51
Khasidashvili, Zurab 51
Kirchner, Claude 66, 252
Lecland, Benoit 376
Levy, Jordi 326
Liang, Chuck 192
Lippi, Sylvain 380
Lucas, Salvador 296
Melliès, Paul-André 24
Mitchell, John C. 19
Moreira, Anamaria Martins 207
Nadathur, Gopalan 192
Ohsaki, Hitoshi 114
Piperno, Adolfo 51
Pol, Jaco van de 367
Réty, Pierre 129, 376
Ringeissen, Christophe 207
Rueß, Harald 1
Seki, Hiroyuki 98
Severi, Paula 159
Shankar, Natarajan 1
Struth, Georg 83
Tahhan-Bittar, Elias 281
Takai, Toshinori 98, 114
Talbot, Jean-Marc 311
Tinelli, Cesare 352
Villaret, Mateu 326
Visser, Eelco 237
Voigtländer, Janis 222
Vries, Fer-Jan de 159
Vuotto, Julie 129
Waldmann, Johannes 144
Wierzbicki, Tomasz 340