Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Alfred Kobsa University of California, Irvine, CA, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen University of Dortmund, Germany Madhu Sudan Massachusetts Institute of Technology, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Gerhard Weikum Max-Planck Institute of Computer Science, Saarbruecken, Germany
4941
Marino Miculan Ivan Scagnetto Furio Honsell (Eds.)
Types for Proofs and Programs International Conference, TYPES 2007 Cividale del Friuli, Italy, May 2-5, 2007 Revised Selected Papers
13
Volume Editors Marino Miculan Ivan Scagnetto Furio Honsell Università degli Studi di Udine Dipartimento di Matematica e Informatica Via delle Scienze 206, 33100 Udine, Italy E-mail: {miculan, scagnett, honsell}@dimi.uniud.it
Library of Congress Control Number: 2008926731 CR Subject Classification (1998): F.3.1, F.4.1, D.3.3, I.2.3 LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues ISSN ISBN-10 ISBN-13
0302-9743 3-540-68084-5 Springer Berlin Heidelberg New York 978-3-540-68084-0 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. Springer is a part of Springer Science+Business Media springer.com © Springer-Verlag Berlin Heidelberg 2008 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12267088 06/3180 543210
Preface
These proceedings contain a selection of refereed papers presented at or related to the Annual Workshop of the TYPES project (EU coordination action 510996), which was held during May 2–5, 2007 in Cividale del Friuli (Udine), Italy. The topic of this workshop was formal reasoning and computer programming based on type theory: languages and computerized tools for reasoning, and applications in several domains such as analysis of programming languages, certified software, formalization of mathematics and mathematics education. The workshop was attended by more than 100 researchers and included more than 40 presentations. We also had the pleasure of three invited lectures, from Fr´ed´eric Blanqui (INRIA, Protheo team), Peter Sewell (University of Cambridge) and Amy Felty (University of Ottawa). From 22 submitted papers, 13 were selected after a reviewing process. Each submitted paper was reviewed by three referees; the final decisions were made by the editors. This workshop is the last of a series of meetings of the TYPES working group funded by the European Union (IST project 29001, ESPRIT Working Group 21900, ESPRIT BRA 6435). The proceedings of these workshops were published in the Lecture Notes in Computer Science series: TYPES TYPES TYPES TYPES TYPES TYPES TYPES TYPES TYPES TYPES TYPES
1993 1994 1995 1996 1998 1999 2000 2002 2003 2004 2006
Nijmegen, The Netherlands, LNCS 806 B˚ astad, Sweden, LNCS 996 Turin, Italy, LNCS 1158 Aussois, France, LNCS 1512 Kloster Irsee, Germany, LNCS 1657 L¨okeborg, Sweden, LNCS 1956 Durham, UK, LNCS 2277 Berg en Dal, The Netherlands, LNCS 2646 Turin, Italy, LNCS 3085 Jouy-en-Josas, France, LNCS 3839 Nottingham, UK, LNCS 4502
ESPRIT BRA 6453 was a continuation of ESPRIT Action 3245, Logical Frameworks: Design, Implementation and Experiments. Proceedings for annual meetings under that action were published by Cambridge University Press in the books Logical Frameworks and Logical Environments. TYPES 2007 was made possible by the contribution of many people. We thank all the participants to the workshops, and all the authors who submitted papers for consideration for these proceedings. We would like to also thank the referees for their effort in preparing careful reviews. Finally we acknowledge the support of the University of Udine in the organization of the meeting. January 2008
Marino Miculan Ivan Scagnetto Furio Honsell
VI
Preface
Referees Abel, Andreas Alessi, Fabio Asperti, Andrea Benton, Nick Bertot, Yves Bove, Ana Brady, Edwin Brauner, Paul Callaghan, Paul Castran, Pierre Coquand, Thierry Crosilla, Laura D’Agostino, Giovanna Damiani, Ferruccio Di Gianantonio, Pietro Filinski, Andrzej Gabbay, Murdoch J. Gambino, Nicola Geuvers, Herman Ghani, Neil Gregoire, Benjamin Hasuo, Ichiro Herbelin, Hugo Honsell, Furio Hyland, Martin Hyvernat, Pierre Jacobs, Bart Kamareddine, Fairouz Kikuchi, Kentaro Kirchner, Claude Klein, Gerwin
Levy, Paul B. Luo, Yong Luo, Zhaohui Mackie, Ian Marra, Vincenzo McBride, Conor Miculan, Marino Miller, Dale Moggi, Eugenio Momigliano, Alberto Nordstr¨ om, Bengt Norell, Ulf Omodeo, Eugenio Ornaghi, Mario Paulin-Mohring, Christine Peyton-Jones, Simon Pichardie, David Pitts, Andy Pollack, Randy Power, John Rabe, Florian Rubio, Albert Scagnetto, Ivan Schwichtenberg, Helmut Soloviev, Sergei Urban, Christian Veldman, Wim Vene, Varmo Wenzel, Markus Werner, Benjamin Zacchiroli, Stefano
Table of Contents
Algorithmic Equality in Heyting Arithmetic Modulo . . . . . . . . . . . . . . . . . . Lisa Allali
1
CoqJVM: An Executable Specification of the Java Virtual Machine Using Dependent Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Robert Atkey
18
Dependently Sorted Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jo˜ ao Filipe Belo
33
Finiteness in a Minimalist Foundation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Francesco Ciraulo and Giovanni Sambin
51
A Declarative Language for the Coq Proof Assistant . . . . . . . . . . . . . . . . . . Pierre Corbineau
69
Characterising Strongly Normalising Intuitionistic Sequent Terms . . . . . . J. Esp´ırito Santo, S. Ghilezan, and J. Iveti´c
85
Intuitionistic vs. Classical Tautologies, Quantitative Comparison . . . . . . . Antoine Genitrini, Jakub Kozik, and Marek Zaionc
100
In the Search of a Naive Type Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Agnieszka Kozubek and Pawel Urzyczyn
110
Verification of the Redecoration Algorithm for Triangular Matrices . . . . . Ralph Matthes and Martin Strecker
125
A Logic for Parametric Polymorphism with Effects . . . . . . . . . . . . . . . . . . . Rasmus Ejlers Møgelberg and Alex Simpson
142
Working with Mathematical Structures in Type Theory . . . . . . . . . . . . . . . Claudio Sacerdoti Coen and Enrico Tassi
157
On Normalization by Evaluation for Object Calculi . . . . . . . . . . . . . . . . . . J. Schwinghammer
173
Attributive Types for Proof Erasure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hongwei Xi
188
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
203
Algorithmic Equality in Heyting Arithmetic Modulo Lisa Allali LogiCal - École polytechnique - Région Ile de France www.lix.polytechnique.fr/Labo/Lisa.Allali/
1
Introduction
Deduction Modulo is a formalism that aims at distinguish reasoning from computation in proofs. A theory modulo is formed with a set of axioms and a congruence defined by rewrite rules: the reasoning part of the theory is given by the axioms, the computational part by the congruence. In deduction modulo, we can in particular build theories without any axiom, called purely computational theories. What is interesting in building such theories - purely defined by a set of rewrite rules - is the possibility, in some cases to simplify the proofs (typically equality between two closed terms), and also the algorithmic aspect of these proofs. The motivation of building a purely computational presentation of Heyting Arithmetic takes root in La science et l’hypothèse by Henri Poincaré [8] where the author asks: should the proposition 2 + 2 = 4 be proved or just verified ? A good way to verify such propositions is to use the formalism of deduction modulo and rewrite rules. In this perspective, Gilles Dowek and Benjamin Werner have built a purely computational presentation of Heyting Arithmetic[4]. Yet, this presentation didn’t take advantage of the decidability of equality in Arithmetic. In their system, equality was defined by rewrite rules that followed Leibniz’s principle. This is the essential aspect that is changed in the work we present in this paper. The starting point of this work is a remark of Helmut Schwichtenberg, following the development that have been done in minlog [6], about how a set of rewrite rules could be (or not) enough to decide equality in Heyting Arithmetic expressed as a purely computational theory. We answer positively to that question with a new purely computational presentation of Heyting Arithmetic HA−→ such as: – HA−→ is an extension of the usual axiomatic presentation of Heyting Arithmetic HA: Leibniz’s proposition is not defining equality anymore, but is a consequence of the rewrite rules of the system – this extension is conservative over HA – the congruence of HA−→ is decidable – HA−→ has cut elimination property. This work opens new ways to consider equality of inductive types in general, not anymore with Leibniz’s axiom as it is the case in Coq for instance, but building specific rewrite rules for each type we would be interested in. M. Miculan, I. Scagnetto, and F. Honsell (Eds.): TYPES 2007, LNCS 4941, pp. 1–17, 2008. c Springer-Verlag Berlin Heidelberg 2008
2
L. Allali
2
Definitions
2.1
Deduction Modulo
Modern type theories feature a rule said conversion rule which allows to identify propositions which are equal modulo beta-equivalence. It is often presented as follows: Γ t:T
Γ T : T ype Γ t : T
Γ T : T ype
T ≡ T
where T ≡ T is read T is convertible to T . This convertibility is not checked by logical rules but by computation with the rule β. The idea of natural deduction modulo is to use this computation of convertibility inside natural deduction. For instance, the axiom and ⇒ elimination rules are the following: Γ ≡ B
Ax if A ∈ Γ and A ≡ B
Γ ≡ A Γ ≡ C ⇒e if C ≡ A ⇒ B Γ ≡ B The other rules of natural deduction modulo are build the same way upon natural deduction[3]. The convertibility ≡ is not fixed but depends on the theory. It can be any congruence defined by the reflexive, symmetric and transitive closure of a rewrite system. 2.2
Theories in Natural Deduction Modulo
Definition 1 (Axiomatic theory) An axiomatic theory is a set of axioms. Definition 2 (Modulo theory) A modulo theory T is a set of axioms and a congruence defined as the reflexive, transitive and symmetric closure of a set of rewrite rules. The rewrite rules are either from terms to terms either from atomic propositions to propositions. Quantifiers may bind variables, thus these rewrite systems are Combinatory Reduction Systems [7]. Notation: Γ T A means the proposition A is provable in the theory T under the hypothesis Γ . Definition 3 (Purely computational theory) A purely computational theory is a modulo theory where the set of axioms is empty.
Algorithmic Equality in Heyting Arithmetic Modulo
2.3
3
Relations between Theories
We want to go from Heyting Arithmetic theory, which is an axiomatic theory, to reach a purely computational theory, that has the same expressiveness. We need the following definitions to be able to compare, step by step, each theory we build to the previous one. Definition 4 (Equivalence between two theories) Let T and T be two theories formed on the same language L. The theories T and T are equivalent if and only if for any proposition P of L, T P if and only if T P . Definition 5 (Extension) Let T and T be two theories formed respectively on the languages L and L with L ⊆ L . Theory T is an extension of T if and only if for all proposition P of L, if T P then T P . Definition 6 (Conservative extension) Let T and T be two theories respectively formed on languages L and L with L ⊆ L . T is a conservative extension of T if and only if for any proposition P of L, T P if and only if T P . 2.4
Models
In this section, we introduce the material we need to build models for intuitionist deduction modulo. We give the necessary definitions and state the main theorems. The interested reader can refer to [2] and [5] for further explanations. Pseudo Heyting algebra as model for modulo intuitionist logic Definition 7 (Pseudo Heyting algebra) ˜ , ˜ ∀, ˜ ∃, ˜ ⇒ ˜, ∨ ˜ , ⊥, Let B be a set and ≤ a relation on B. A structure B, ≤, ∧ ˜ is a Pseudo Heyting algebra if – ≤ is a reflexive and transitive relation (not necessarily antisymmetric)1 ˜ is a minimum of B for ≤ – ⊥ ˜ is a maximum of B for ≤ – ˜ y is a lower bound of x and y (where x and y are in B) – x∧ ˜ y is a upper bound of x and y (where x and y are in B) – x∨ ˜ and ∃ ˜ ( infinite lower and upper bounds) are functions from ℘(B) to B – ∀ such that: ˜ ≤ x (where x is in B and a is in ℘(B)), - x ∈ a ⇒ ∀a ˜ (where x and c are in B and a is in ℘(B)), - (∀x ∈ a c ≤ x) ⇒ c ≤ ∀a 1
When this relation is more over antisymmetric we get a Heyting Algebra.
4
L. Allali
˜ (where x is in B and a is in ℘(B)), - x ∈ a ⇒ x ≤ ∃a ˜ ≤ c (where x and c are in B and a is in ℘(B)). - (∀x ∈ a x ≤ c) ⇒ ∃a ˜ y ≤ z (where x, y and z are in B). – x ≤ y ⇒z ˜ ⇔ x∧ Definition 8 (Ordered pseudo Heyting algebra) An ordered pseudo Heyting algebra is a pseudo Heyting algebra together with a relation on B such that – – – –
is an order relation, ˜ ≤ b and b b then ˜ ≤ b , ˜ ˜ is a minimal element for , is a maximal element for and ⊥ ˜ ˜ ˜ ˜ ∧, ∨, ∀, ∃ are monotonous, ⇒ ˜ is left anti-monotonous and right monotonous.
Definition 9 (Complete ordered pseudo Heyting algebra) An ordered pseudo Heyting algebra is said to be complete if every subset of B has a greatest lower bound for . Definition 10 (Modulo intuitionistic model) Let L be a language. An Intuitionist model M of L is : – – – –
a set M , an ordered and complete pseudo Heyting algebra B, for each function symbol f of arity n a function fˆ from M n to M , for each predicate symbol P of arity n a function Pˆ from M n to B.
Definition 11 (Denotation) Let M be a model, A be a proposition and φ be an assignment. We define Aφ as follows: xφ ⊥φ φ f (t1 , ..., tn )φ P (t1 , ..., tn )φ
= φ(x) ˜ =⊥ ˜ = = fˆ(t1 φ , ..., tn φ ) = Pˆ (t1 φ , ..., tn φ )
A ∧ Bφ A ∨ Bφ A ⇒ Bφ ∀x Aφ ∃x Aφ
˜ Bφ = Aφ ∧ ˜ |Bφ = Aφ ∨ = Aφ ⇒B ˜ φ ˜ = ∀{A φ,x:=v | v ∈ M } ˜ = ∃{A φ,x:=v | v ∈ M }
Definition 12 (Models for purely computational theory) A model of a purely computational theory whose rewrite rules are R1 −→ R1 , . . ., Rn −→ Rn is such that for each assignment φ, Ri φ = Ri φ for i ∈ {1, . . . , n}. The concept of model is useful when trying to find relations between theories as it is shown by the two following theorems: Theorem 1 (Completeness Theorem) Let T be a theory. If for every model M such as M |= T we have M |= A then T A.2 2
M |= reads as M is a model for.
Algorithmic Equality in Heyting Arithmetic Modulo
5
Theorem 2 (Correctness Theorem) If T A then, for every model M, if M |= T then M |= A. Definition 13 (Super-consistency) A theory T , ≡ in deduction modulo is super-consistent if, for each ordered and complete pseudo Heyting algebra B, there exists a B-model of this theory. The main property of a super-consistent theory is to bear a model valuated in the Candidates Algebra and thus to normalize [3]. Theorem 3 (Normalization) If a theory T , ≡ is super-consistent, then each proof in T , ≡ is strongly normalizable.
3
Different Presentations of Heyting Arithmetic - From Axioms to Rewrite Rules
3.1
The Axiomatic Presentation of Heyting Arithmetic
The language of arithmetic is formed with the constant 0, the unary functional symbol S, the binary functional symbols + and × and the binary predicate symbol =. The axioms are structured in four groups. Definition 14 (HA) 1. The axioms of equality Ref lexivity Leibniz axiom scheme ∀x (x = x) ∀x ∀y (x = y ⇒ (P (x) ⇔ P (y))) 2. The axioms 3 and 4 of Peano ∀x ∀y (S(x) = S(y) ⇒ x = y)
a
∀x (0 = S(x) ⇒ ⊥)
3. The induction scheme (P {x := 0} ∧ ∀y (P {x := y} ⇒ P {x := S(y)})) ⇒ ∀n (P {x := n}) 4. The axioms of addition and multiplication. ∀y (0 + y = y) ∀x ∀y (S(x) + y = S(x + y)) ∀y (0 × y = 0) ∀x ∀y (S(x) × y = x × y + y) a
We chose to formulate here Leibniz’s axiom with an equivalence symbol. Note that ∀x ∀y (x = y ⇒ (P (x) ⇒ P (y))) would have been enough but the equivalence form simplifies the proof of Proposition 5 (equivalence beetween this theory and HAR ).
6
L. Allali
The steps to go from an axiomatic presentation of Heyting Arithmetic HA to a purely computational one HA−→ We shall introduce four successive theories to reach the final purely computational theory we aim at: HAR , HAN , HAK and HA−→ . We will prove that each of them is equivalent to or is a conservative extension of HA. The main novelty is the step from HA to HAR with new rewrite rules to compute equality instead of the Leibniz’s scheme. The three other theories are traced on the work done in [4], especially for the treatment of the induction scheme, but the rules are different so that the proofs need to be adapted. 3.2
HAR , a Theory Equivalent to HA
The theory HAR keeps an axiom scheme for induction, but orients the axioms of addition and multiplication as rewrite rules. It also introduces four rules for rewriting atomic propositions of the form t = u. As we shall see, these rules replace the axioms of equality (reflexivity and Leibniz’s scheme) and the axioms 3 and 4 of Peano. Definition 15 (HAR ) 1. The induction scheme (P {x := 0} ∧ ∀y (P {x := y} ⇒ P {x := S(y)})) ⇒ ∀n P {x := n} 2. The rewrite rules 0 = 0 −→ 0 = S(x) −→ ⊥ S(x) = 0 −→ ⊥ S(x) = S(y) −→ x = y
0 + y −→ y S(x) + y −→ S(x + y) 0 × y −→ 0 S(x) × y −→ x × y + y
Proposition 1. The propositions ∀x (x = x), ∀x ∀y (x = y ⇒ y = x), and ∀x ∀y ∀z (x = y ⇒ y = z ⇒ x = z) are provable in HAR . Proof. Reflexivity is proved by induction on x. This requires to prove the proposition 0 = 0 and ∀y (y = y ⇒ S(y) = S(y)). The first proposition reduces to and the second to ∀y (y = y ⇒ y = y) that is obviously provable. Symmetry is proved by two nested inductions, the first on x and the second on y. Transitivity is proved by three nested inductions on x, y and then z. Notice that all these proofs can be written inside the system itself using the induction axiom scheme. 2 Proposition 2. The propositions ∀x ∀y ∀z (x = y ⇒ x + z = y + z) and ∀x ∀y ∀z (x = y ⇒ z + x = z + y) are provable in HAR . Proof. Both propositions are proved inside the system by two nested inductions on x and y. 2 Proposition 3. The propositions ∀x ∀y ∀z (x = y ⇒ x × z = y × z) and ∀x ∀y ∀z (x = y ⇒ z × x = z × y) are provable in HAR .
Algorithmic Equality in Heyting Arithmetic Modulo
7
Proof. Both propositions are proved inside the system by two nested inductions on x and y. But this requires to prove first the propositions ∀x x × 0 = 0, ∀y ∀x (y × S(x) = y × x + y) and ∀x ∀y (x × y = y × x) that are again proved with the induction axiom scheme. 2 Proposition 4. For each term t, the proposition ∀a ∀b (a = b ⇒ t{y := a} = t{y := b}) is provable in HAR . Proof. By induction on the structure of t, using Proposition 1, 2, 3.
2
Proposition 5. Each instance of Leibniz’ scheme ∀x ∀y (x = y ⇒ (P (x) ⇔ P (y))) is provable in HAR . Proof. By induction on the structure of P using Proposition 4 for the atomic case. 2 Proposition 6 (Equivalence between HA and HAR ). The theory HAR is equivalent to HA, i.e. for any closed propositions A in the language of HA, A is provable in HA if and only if A is provable in HAR Proof ⇒ We check that each axiom of HA is provable in HAR and we conclude with an induction over the proof structure. – the proposition ∀x (x = x) is provable in HAR by Proposition 1. – Leibniz’ scheme is provable in HAR by Proposition 5. – the axioms 3 and 4 of Peano rewrite to easily provable propositions x = y ⇒ x = y and ⊥ ⇒ ⊥. – The induction scheme is an axiom of HAR . – The axioms of addition and multiplication rewrite to propositions that are consequences of the reflexivity of equality. ⇐ The induction axiom scheme is the same in HAR than in HA. The rest of HAR is a rewrite system defining a congruence ≡. We prove that for every propositions A and B, if A ≡ B, there exists a proof of A ⇔ B in HA. To do so, we prove by induction on the structure of A that if A −→ B in HAR then there exists an proof of A ⇔ B in HA. 3.3
HAN , a Conservative Extension of HAR
We add the predicate N to the language. We modify the induction scheme axiom adding this predicate and two axioms for the predicate N that are the axioms 1 and 2 of Peano.
8
L. Allali
Definition 16 (HAN ) 1. The induction scheme ∀n (N (n) ⇒ (P {x := 0} ∧ ∀y (N (y) ⇒ P {x := y} ⇒ P {x := S(y)})) ⇒ P {x := n}) 2. The axioms 1 and 2 of Peano N (0) 3. The (1) (2) (3) (4)
rewrite rules 0 = 0 −→ 0 = S(x) −→ ⊥ S(x) = 0 −→ ⊥ S(x) = S(y) −→ x = y
∀x (N (x) ⇒ N (S(x))) (5) (6) (7) (8)
0 + y −→ y S(x) + y −→ S(x + y) 0 × y −→ 0 S(x) × y −→ x × y + y
Translation | . | from HAR to HAN |(t = u)| = (t = u) |A ∨ B| = |A| ∨ |B| | | = |A ⇒ B| = |A| ⇒ |B| |⊥| = ⊥ |∀x A| = ∀x (N (x) ⇒ |A|) |A ∧ B| = |A| ∧ |B| |∃x A| = ∃x (N (x) ∧ |A|) We want to prove that HAN is an extension of HAR . The difficulty stands in the N (t) added by the translation. We first prove a few properties on this N predicate: Proposition 7. HAN ∀x ∀y (N (y) ⇒ N (x) ⇒ N (x + y)) Proof. We first introduce N (y) in the context, then we use the induction scheme axiom on x. 2 Proposition 8. HAN ∀x ∀y (N (x) ⇒ N (y) ⇒ N (x × y)) Proof. We first introduce N (y) in the context, then we use the induction scheme axiom on x. The proof uses Proposition 7. 2 → → Proposition 9. N (− z )3 HAN N (t) for all t where F V (t) = − z Proof. By structural induction on t.
2
Then we prove the following proposition which is the key lemma of our proof. → Proposition 10. For each proposition A and vector − z where F V (A) is included → − → − in z , if Γ HAR A then |Γ |, N ( z ) HAN |A|. Proof. By induction on the size of the proof tree. Most of the cases are trivial, except those concerning introduction and elimination of the quantifier. For those one, we use Proposition 9. 2 3
→ − N (− z ) is the notation for N (z1 ), ..., N (zn ) with → z = {z1 , ..., zn }.
Algorithmic Equality in Heyting Arithmetic Modulo
9
Proposition 11 (HAN is an extension of HAR ). Let A be a closed proposition of HAR . If A is provable in HAR then |A| is provable in HAN . → Proof. We use Proposition 10, and remove all the N (− z ) appearing in the context by using the ∃ elimination rule as follows: Ax Π HAN N (0) → − ∃i N (z0 ), N ( z ) HAN |A| HAN ∃x N (x) ∃e → − N ( z ) HAN |A| 2 To prove that the extension is conservative with respect to the translation | . |, we introduce another translation ∗ from HAN to HAR where every occurrence of the N predicate is replaced by . Then we prove some properties about this translation to finally be able to prove the conservativity. Translation * from HAN (t = u)∗ = (t = u) ∗ = (A ∧ B)∗ = A∗ ∧ B ∗ (∀x A)∗ = ∀x (A∗ ) (∃x A)∗ = ∃x (A∗ )
to HAR N (x)∗ = ⊥∗ = ⊥ (A ∨ B)∗ = A∗ ∨ B ∗ (A ⇒ B)∗ = A∗ ⇒ B ∗
Proposition 12. Let A be a closed proposition of HAN . If A is provable in HAN then A∗ is provable in HAR . Proof. By induction on the size of the proof, using the fact that the rewrite rules are the same and that the induction scheme axiom of HAR is the exact translation by ∗ of the induction scheme axiom of HAN . 2 Corollary 1. Let A be a closed proposition of HAR . If |A| is provable in HAN then |A|∗ is provable in HAR . Proposition 13. Let A be a closed proposition of HAR . A and |A|∗ are equivalent in HAR . Proof. By structural induction on A.
2
Proposition 14 (Conservativity with respect to the translation | . |). Let A be a closed proposition of HAR . If |A| is provable in HAN then A is provable in HAR Proof. By Corollary 1, we know that if |A| is provable in HAN then |A|∗ is provable in HAR , and as A and |A|∗ are equivalent in HAR (Proposition 13), we 2 can conclude that if |A| is provable in HAN then A is provable in HAR .
10
L. Allali
3.4
HAK , a Conservative Extension of HAN
We sort our theory with the two sorts ι and κ, as follows: 0:ι S : ι, ι + : ι, ι, ι × : ι, ι, ι = : ι, ι N : ι. We add a symbol ∈ : ι, κ. For all propositions P of HAN , where F V (P ) = z, y1 , ..., yn , we add a new function symbol fz,y1 ,...,yn,P : ι, . . . , ι, κ. n times
The elements of sort κ are classes of integers. We build these classes with a comprehension axiom scheme restricted to the propositions of HAN following an idea going back to Takeuti. Finally we modify the induction axiom. We keep the previous rewrite rules. Definition 17 (HAK ) 1. The comprehension scheme ∀x∀y1 ...∀yn (x ∈ fz,y1 ,...,yn,P (y1 , . . . , yn ) ⇔ P {z := x})
a
2. The induction scheme ∀n(N (n) ⇔ ∀k(0 ∈ k ⇒ ∀y(N (y) ⇒ y ∈ k ⇒ S(y) ∈ k) ⇒ n ∈ k)) 3. The (1) (2) (3) (4) a
rewrite rules 0 = 0 −→ 0 = S(x) −→ ⊥ S(x) = 0 −→ ⊥ S(x) = S(y) −→ x = y
(5) (6) (7) (8)
0 + y −→ y S(x) + y −→ S(x + y) 0 × y −→ 0 S(x) × y −→ x × y + y
Remark: by construction of the new function symbols of the form fz,y1 ,...,yn ,P , there is no occurrence of the ∈ symbol in proposition P .
Proposition 15. Let A be a closed proposition of HAN . A is provable in HAN if and only if A is provable in HAK . Proof ⇒ The rewrite rules are the same. We check that each axiom of HAN is provable in HAK . ⇐ We begin with an arbitrary model MN of HAN . We show we can extend this model to a model MK of HAK without changing the denotation for the proposition of HAK . As A is a theorem of HAK , MK validates A. As the denotation of a proposition of HAK is the same in MK than in MN , MN also validates A. As A is valid in all model of HAN , we conclude A is a theorem of HAN . Thus all the theorems of HAK are theorems of HAN .
Algorithmic Equality in Heyting Arithmetic Modulo
11
We need the following definition to build such a model: Definition 18 (Definable function in HAN ) Let M be a model of HAN . A function γ from M to B is definable if there exists a proposition P in HAN language with F V (P ) = {x, y1 , ..., yn } and an assignment Φ from all b1 , ..., bn of M to y1 , ..., yn such as: γ(a) = P Φ,x:=a Let us now show how we build a model MK from a model MN without changing the denotation for the proposition of HAK . Let MN be a model of HAN . Let MN be the domain of MN and B its Heyting algebra. Extension from MN to MK : Let Mι = MN . Let Mκ be the set of the definable functions from Mι to B. The domain MK of MK is made of the sets Mκ and Mι . The variables of class are interpreted in Mκ , the other variables are interpreted in Mι . All the symbols of HAN have the same denotation in MK and in MN . Let us see how we interpret the symbols of HAK that do not appear in HAN : – We interpret the function symbol fz,y1 ,...,yn,P as the function mapping b1 , ..., bn (elements of Mι ) to the element of Mκ : a → P x:=a,y1 :=b1 ,...,yn:=bn – We interpret the ∈ symbol by the following application: x ∈ E = Ex Let us prove that MK is a model of HAK proving that the two axioms of HAK are valid in MK . • ∀x∀y1 ...∀yn (x ∈ fz,y1 ,...,yn,P (y1 , . . . , yn ) ⇔ P {z := x}) We need to show ˜ ∀x∀y1 ...∀yn (x ∈ fz,y1 ,...,yn,P (y1 , . . . , yn ) ⇔ P {z := x}) ≥ which lead to prove that for each a, b1 , ..., bn of Mι ˜ x ∈ fz,y1 ,...,yn,P (y1 , . . . , yn ) ⇔ P {z := x}x:=a,y1 :=b1 ,...,yn:=bn ≥ Let Φ be the assignment {x := a, y1 := b1 , ..., yn := bn }. We now must prove x ∈ fz,y1 ,...,yn,P (y1 , . . . , yn )Φ = P {z := x}Φ Let us focus on the first part of this equality: x ∈ fz,y1 ,...,yn,P (y1 , . . . , yn )Φ = fz,y1 ,...,yn,P (y1 , . . . , yn )Φ xΦ We have: fz,y1 ,...,yn,P (y1 , . . . , yn )Φ = fz,y1 ,...,yn,P Φ (y1 Φ , . . . , yn Φ ) = fz,y1 ,...,yn,P Φ (b1 , . . . , bn ) By the interpretation we have given to function symbols, we have: fz,y1 ,...,yn,P Φ (b1 , . . . , bn ) is the definable function γ of Mκ associated to P with the assignment Φ that associates b1 , ..., bn to y1 , ..., yn .
12
L. Allali
Thus: (fz,y1 ,...,yn,P Φ (b1 , . . . , bn ))xΦ = γ a And by definition of the definable functions: γ a = P Φ ,z:=a As x is not free in P , we can add x := a to Φ : we get Φ, because the values assigned to y1 , ..., yn by Φ and Φ are the same. We have γ a = P Φ,z:=a Let us now look at the second part of the equality: P {z := x}Φ P {z := x}Φ = P Φ,z:=xΦ = P Φ,z:=a We finally have: for each interpretation Φ x ∈ fz,y1 ,...,yn,P (y1 , . . . , yn )Φ = P {z := x}Φ We can conclude ˜ ∀x∀y1 ...∀yn (x ∈ fz,y1 ,...,yn,P (y1 , . . . , yn ) ⇔ P {z := x}) ≥ • We proceed in the same way to prove that ∀n (N (n) ⇔ ∀f (0 ∈ f ⇒ ∀y (N (y) ⇒ y ∈ f ⇒ S(y) ∈ f ) ⇒ n ∈ f ))
4
HA−→, a Purely Computational Presentation of Heyting Arithmetic
In the previous section, all the axioms of the theory HAK were in equivalent form (i.e in the form of A ⇔ B for some propositions A and B). Following [2] we can transform an axiom in equivalent form into a rewrite rule without changing the expressiveness of the theory: the same theorems can be proved. In this section, we change the axioms of HAK into rewrite rules to obtain a purely computational theory. Definition 19 (HA−→ ) (1) x ∈ fz,y1 ,...,yn,P (y1 , . . . , yn ) −→ P {z := x} (2) (3) (4) (5) (6)
N (n) −→ ∀k (0 ∈ k ⇒ ∀y (N (y) ⇒ y ∈ k ⇒ S(y) ∈ k) ⇒ n ∈ k) 0 = 0 −→ 0 = S(x) −→ ⊥ S(x) = 0 −→ ⊥ S(x) = S(y) −→ x = y
(7) 0 + y −→ y (8) S(x) + y −→ S(x + y) (9) 0 × y −→ 0 (10) S(x) × y −→ x × y + y
Algorithmic Equality in Heyting Arithmetic Modulo
13
Proposition 16. Let A be a closed proposition of HAK . A is provable in HAK if and only if A is provable in HA−→ . Proof. To go from HAK to HA−→ , we have replaced two axioms in a form of equivalence by two rewrite rules that make each part of these equivalences congruent. It is trivial to prove that any proposition proved in HAK by using these two axioms can be proved in HA−→ , using the new rewrite rules. Conversely, any proposition proved in HA−→ can be prove in HAK using the transitivity of ⇔. 2
5
Properties of HA−→
5.1
HA−→ Is a Conservative Extension of HA
HA−→ is equivalent to HAK . HAK is a conservative extension of HAN . HAN is a conservative extension of HAR with respect to the translation | . |. HAR is equivalent to HA. Thus, HA−→ is a conservative extension of HA. 5.2
Decidability of the Congruence Defined by the HA−→ Rewrite System
The rewrite system of HA−→ is not terminating, due to the rule (2). We change the orientation of this rule to obtain the rewrite system R. As the congruence is defined by the reflexive, symmetric and transitive closure of the rewrite rules, the congruence defined by R is the same as the congruence of HA−→ . Thus in order to prove the decidability of the congruence of HA−→ , we prove the termination and confluence of R. (1) x ∈ fz,y1 ,...,yn,P (y1 , . . . , yn ) −→ P {z := x} (2) ∀k (0 ∈ k ⇒ ∀y (N (y) ⇒ y ∈ k ⇒ S(y) ∈ k) ⇒ n ∈ k) −→ N (n) (3) (4) (5) (6)
0 = 0 −→ 0 = S(x) −→ ⊥ S(x) = 0 −→ ⊥ S(x) = S(y) −→ x = y
(7) 0 + y −→ y (8) S(x) + y −→ S(x + y) (9 0 × y −→ 0 (10) S(x) × y −→ x × y + y
The R rewrite system Remarks on the R rewrite system – This system is not a first order rewrite system due to the second rule that contains binders. Notice that as k is bounded in rule (2), one can not rewrite rule (1) in rule (2).
14
L. Allali
– The first rule is a rule scheme (it is also the case in HA−→ ): there is an infinity of rewrite rules following this scheme, as many as propositions P we can write in HAN . Example: Let us take the proposition z = y. For this proposition the symbol function fz,y,z=y has been added to the language. The instance of the rule 1 following the scheme for this proposition is: x ∈ fz,y,z=y (y) −→ x = y. The substitution P {z := x} only appear in the rule scheme but it doesn’t appear in any instance of it. Proposition 17. The R rewrite system is terminating. Proof. We establish the following well founded order on N × N × N . The first component is the number of occurrences of the symbol ∈ appearing in a proposition. This component makes decrease rule 1: by construction of the comprehension scheme, the symbol ∈ doesn’t appear in P . The value decreases obviously in rule 2 also. The value does not change for the other rules where the symbol ∈ doesn’t appear. For the second component we define a measure function w on terms and propositions: this function is first defined on terms using the following equations w(x) = w(0) = 2 w(S(t)) = 2 + w(t)
w(t + u) = 1 + w(t) + w(u) w(t × u) = 2 + (w(t) × w(u))
We can easily prove that for any term t, w(t) 2. Then we propagate this measure on propositions as follows: w( ) = 0 w(t = u) = w(t) + w(u) w(A ∨ B) = w(A) + w(B) w(A ∧ B) = w(A) + w(B)
w(⊥) = 0 w(t ∈ k) = w(t) w(A ⇒ B) = w(A) + w(B) w(∀x A) = w(∃x A) = w(A)
This measure obviously decreases rule 3,4,5 and 6. Few simple calculi are enough to prove that the value is decreasing for rule (7), (9) and (10), knowing that for any term t, w(t) 2. Yet, the measure does not change for rule (8). We introduce finally a last measure w for rule 8: The measure is defined on terms using the following equations: w (x) = w (0) = 2 w (S(t)) = 2 + w (t)
w (t + u) = 1 + 2 × w (t) + w (u) w (t × u) = 2 + (w (t) × w (u))
The propagation on propositions is the same as for w. This measure decreases for rule (8). Proposition 18. The R rewrite system is confluent.
2
Algorithmic Equality in Heyting Arithmetic Modulo
15
Proof. There is no critical pair in the system, so the system is locally confluent[7]. As it is terminating, we can conclude that the system is confluent. 2 Proposition 19 The congruence defined by the R rewrite system is decidable. Proof. As the rewrite system is terminating and confluent, there exists a normal form for propositions and terms in our system. Two propositions or terms are congruent if and only if they have the same normal form. As the system has strong normalization property, the congruence is decidable. 2 5.3
Cut Elimination Property
Proposition 20. HA−→ has cut elimination property. Using [5] we prove that HA−→ is super-consistent: >From an ordered and complete pseudo Heyting algebra B, we will build a Bmodel M of HA−→ such that for each interpretation Φ, if A −→ A is a rule defining the congruence in our theory then AΦ = A Φ . ˜ , ˜ ∀, ˜ ∃, ˜ ⇒, ˜, ∨ ˜ , ⊥, Proof. Let B = B, ≤, ∧ ˜ We build M as follows: – The domain of M is Mι = N and Mκ = B N . – The interpretation of the function symbol 0 is the 0N of the integers. S, + and × are interpreted as expected as the successor function, the addition and multiplication in N. ˜ and . ˜ – ⊥ and are interpreted respectively by ⊥ – We interpret the membership and all the function symbols of sort κ as in the previous proof of conservativity of HAK : the interpretation of ∈ that for each n and f associates f (n). The interpretation of a symbol of sort κ is a function receiving an assignment for the n free variables in the proposition associated to f , and returns a function from N to B. – The interpretation of equality , =, ˜ is defined by the infinite following array, ˜ on the diagonal, ⊥ ˜ on the rest of the array. witch is = ˜ 0 1 2 ˜ ⊥ ˜ ⊥ ˜ 0 ˜ ˜ ˜ 1 ⊥ ⊥ ˜ ⊥ ˜ ˜ 2 ⊥ .. .. .. .. . . . .
... ... ... ... .. .
– Interpretation of the predicate N : This is the most technical construction. Indeed, this predicate appears recursively in the rewrite rule: N (n) −→ ∀k (0 ∈ k ⇒ ∀y (N (y) ⇒ y ∈ k ⇒ S(y) ∈ k) ⇒ n ∈ k)
16
L. Allali
Let us keep in mind we are looking for a certain function F from N to B to interpret N such as for all a in N: N (n)n:=a = ∀k (0 ∈ k ⇒ ∀y (N (y) ⇒ y ∈ k ⇒ S(y) ∈ k) ⇒ n ∈ k))n:=a i.e. F = a → ∀k (0 ∈ k ⇒ ∀y (N (y) ⇒ y ∈ k ⇒ S(y) ∈ k) ⇒ n ∈ k))n:=a For each function f from N to B, we build a model Mf where N is interpreted by f , the other symbols are interpreted as defined previously. Let Φ be the function form B N to B N , mapping f to the function M
f a → ∀k (0 ∈ k ⇒ ∀y (N (y) ⇒ y ∈ k ⇒ S(y) ∈ k) ⇒ n ∈ k))n:=a
We are interested in the function F such as Φ(F ) = F . Does such a fixpoint exists ? The order on B N defined by f g if for each x, f (x) g(x) is a complete order and the function Φ is monotonous as the occurrence of N is positive in ∀k (0 ∈ k ⇒ ∀y (N (y) ⇒ y ∈ k ⇒ S(y) ∈ k) ⇒ n ∈ k)) Thus we can apply the Knaster-Tarski theorem and deduce there exists a fixed point F of the function Φ. Let us interpret the N predicate by this fixed point F (ie choosing the model MF ). By construction, N (n)MF = ∀k (0 ∈ k ⇒ ∀y (N (y) ⇒ y ∈ k ⇒ S(y) ∈ k) ⇒ n ∈ k))MF MF is a B-model of HA−→ . We conclude by Definition 13 that HA−→ is super-consistent and thus, by Proposition 3, all proofs in HA−→ strongly normalize. 2
6
Discussion
One can ask if this system is really efficient in practice: in one hand, the proof of x = y are shorter, in the other hand the proof of ∀x∀y (x = y ⇒ P (x) ⇒ P (y)) is longer. There is no theoretical answer to that question, it is only by making tests that we would see how the size of proof terms would change. A good indication is that the way we manage to “simulate” an application of Leibniz principle with our rewrite rules (the way it is shown in [1]) is linear in the size of the proposition.
7
Conclusion
We have reached a presentation of Heyting Arithmetic without any axiom, simply defined by a rewrite rule system. A cornerstone of this presentation is that it makes use of the decidability of the equality in Heyting Arithmetic, indeed the equality is defined as a decision procedure, rather than as Leibniz’s proposition which becomes a consequence of the congruence of the system.
Algorithmic Equality in Heyting Arithmetic Modulo
17
Acknowledgments I would like to thank Gilles Dowek for all his constructive advice, Arnaud Spiwack for the help he gave me during the writing of this paper, and the anonymous referees who provided useful comments that contributed to the correctness of the paper.
References 1. Allali, L.: Memoire de DEA, http://www.lix.polytechnique.fr/Labo/Lisa.Allali/rapport_MPRI.pdf 2. Dowek, G., Hardin, T., Kirchner, C.: Theorem proving modulo. Journal of Automated Reasoning 31, 32–72 (2003) 3. Dowek, G., Werner, B.: Proof normalization modulo. The Journal of Symbolic Logic 68(4), 1289–1316 (2003) 4. Dowek, G., Werner, B.: Arithmetic as a theory modulo. In: Giesl, J. (ed.) RTA 2005. LNCS, vol. 3467, pp. 423–437. Springer, Heidelberg (2005) 5. Dowek, G.: Truth values algebras and normalization. In: Altenkirch, T., McBride, C. (eds.) TYPES 2006. LNCS, vol. 4502, Springer, Heidelberg (2007) 6. Schwichtenberg, H.: Proofs as programs. Proof theory: a selection of papers from the Leeds Proof Theory Programme 1990. Cambridge University Press, Cambridge (1992) 7. van Oostrom, V., van Raamsdonk, F.: Weak Orthogonality Implies Confluence: The High-Order Case. Technical Report: ISRL-94-5 (December 1994) 8. Poincarè, H.: La Science et l’hypothèse, 1902, Flammarion (1968) 9. Dowek, G.: La part du calcul. Mèmoire d’Habilitation à Diriger des Recherches, Universitè Paris 7 (1999) 10. The Coq Development Team. Manuel de Rèfèrence de Coq V8.0. LogiCal Project (2004-2006), http://coq.inria.fr/doc/main.html
CoqJVM: An Executable Specification of the Java Virtual Machine Using Dependent Types Robert Atkey LFCS, School of Informatics, University of Edinburgh Mayfield Rd, Edinburgh EH9 3JZ, UK
[email protected]
Abstract. We describe an executable specification of the Java Virtual Machine (JVM) within the Coq proof assistant. The principal features of the development are that it is executable, meaning that it can be tested against a real JVM to gain confidence in the correctness of the specification; and that it has been written with heavy use of dependent types, this is both to structure the model in a useful way, and to constrain the model to prevent spurious partiality. We describe the structure of the formalisation and the way in which we have used dependent types.
1
Introduction
Large scale formalisations of programming languages and systems in mechanised theorem provers have recently become popular [4,5,6,9]. In this paper, we describe a formalisation of the Java virtual machine (JVM) [8] in the Coq proof assistant [11]. The principal features of this formalisation are that it is executable, meaning that a purely functional JVM can be extracted from the Coq development and – with some O’Caml glue code – executed on real Java bytecode output from the Java compiler; and that it is structured using dependent types. The motivation for this development is to act as a basis for certified consumerside Proof-Carrying Code (PCC) [12]. We aim to prove the soundness of program logics and correctness of proof checkers against the model, and extract the proof checkers to produce certified stand-alone tools. For this application, the model should faithfully model a realistic JVM. For the intended application of PCC, this is essential in order to minimise and understand the unavoidable semantic gap between the model and reality. PCC is intended as a secure defence against hostile code; the semantic gap is the point that could potentially be exploited by an attacker. To establish and test of our model we have designed it to be executable so that it can be tested against a real JVM. Further, we have structured the design of the model using Coq’s module system, keeping the component parts abstract with respect to proofs about the model. This is intended to broaden the applicability of proofs performed against the model and to prevent “over-fitting” to the specific implementation. In order to structure the model we have made heavy use of Coq’s feature of dependent types to state and maintain invariants about the internal data M. Miculan, I. Scagnetto, and F. Honsell (Eds.): TYPES 2007, LNCS 4941, pp. 18–32, 2008. c Springer-Verlag Berlin Heidelberg 2008
CoqJVM: An Executable Specification of the JVM Using Dependent Types
19
structures. We have used dependent types as a local structuring mechanism to state the properties of functions that operate on data structures and to pass information between them. In some cases this is forced upon us in order to prove to Coq that our recursive definitions for class loading and searching the class hierarchy are always terminating, but they allow us to tightly constrain the behaviour of the model, reducing spurious partiality that would arise in a more loosely typed implementation. To demonstrate the removal of spurious partiality, we consider the implementation of the invokevirtual function. To execute this function, we must resolve the method reference within the instruction; find the object in the heap; search for the method implementation starting from the object’s class in the class pool and then, if found invoke this method. In a naive executable implementation, we would have to deal with the potential failure of some of these operations. For example, the finding of the class for a given object in the heap. We know from the way the JVM is constructed that this can never happen, but we still have to do something in the implementation, even if it just returns an error. Further, every proof that is conducted against this implementation must prove over again that this case cannot happen. We remove this spurious partiality from the model by making use of dependent types to maintain invariants about the state of the JVM. These invariants are then available to all proofs concerning the model. Our belief is that this will make large-scale proofs using the model easier to perform, and we have some initial evidence that this is the case, but detailed research of this claim is still required. There are still cases when the model should return an error. We have attempted to restrict these to when the code being executed is not type safe. A basic design decision is to construct the model so that if the JVM’s bytecode verifier would accept the code, then the model should not go wrong. Overview. In the next section we give an overview of our formalisation, detailing the high-level approach and the module design that we have adopted. We describe the modelling of the class pool and its operations in Section 3. In Section 4 we describe the formalisation of object heaps and static fields, again using dependent types. In Section 5 we describe our modelling of the execution of single instructions. The extraction to O’Caml is discussed in Section 6. We discuss related work in Section 7 and conclude with notes for future work in Section 8. The Formalisation. Due to reasons of space, this paper can only offer a highlevel overview of the main points of the formalisation. For more information the reader is referred to the formalisation itself, which is downloadable from http://homepages.inf.ed.ac.uk/ratkey/coqjvm/.
2
High-Level Structure of the Formalisation
The large-scale structure of the formalisation is organised using Coq’s module facilities. We use this for two reasons: to abstract the model over the implementation of several basic types such as 32-bit integers and primitive class, method
20
R. Atkey
and field names; and also to provide an extra-logical reason to believe that we are modelling a platonic JVM, rather than fitting to our implementation. The interface for the basic types is contained within the signature BASICS. We assume a type Int32.t with some arithmetic operations. This is instantiated with O’Caml’s int32 type after extraction. We also require types to represent the class, field and method names. Since these types are used as the domains of finite maps throughout the formalisation, we stipulate that they must also have an ordering suitable for use with Coq’s implementation of balanced binary trees. We keep the constituent parts of the formalisation abstract from each other by use of the module system. This has the advantage of reducing complexity in each part and keeping the development manageable. It is also an attempt to keep proofs about the model from “over-fitting” to the concrete implementation used. For example, we only expose an abstract datatype for the set of loaded classes and some axiomatised operations on it. The intention is that any implementation of class pools will conform to this specification, and so proofs against it will have wider applicability than just the implementation we have coded1 . Thirdly, the use of modules makes the extracted code safer to use from O’Caml. Many of the datatypes we use in the formalisation have been refined by invariants expressed using dependent types. Since O’Caml does not have dependent types they are thrown away during the extraction process. By using type abstraction we can be sure that we are maintaining the invariants correctly. The main module is Execution, which has the following functor signature: Module Execution (B : BASICS) (C : CLASSDATATYPES with Module B := B) (CP : CLASSPOOL with Module B := B with Module C := C) (RDT : CERTRUNTIMETYPESwith Module B := B with Module C := C with Module CP := CP ) The module types mentioned in the signature correspond to the major components of our formalisation. The signature BASICS we have already mentioned. CLASSDATATYPES contains the basic datatypes for classes, methods, instructions and the like. CLASSPOOL is the interface to the set of loaded classes and the dynamic loading facilities, described in Section 3. CERTRUNTIMETYPES is the interface to the object heap and the static fields, described in Section 4. Note the use of sharing constraints in the functor signature. These are required since the components that fulfil each signature are themselves functors. We need sharing constraints to state that all their functor arguments must be equal. The heavy use of sharing constraints exposed two bugs in Coq’s implementation, one to do with type checking code using sharing and one in extraction.
1
This technique is also used in Bicolano: http://mobius.inria.fr/bicolano
CoqJVM: An Executable Specification of the JVM Using Dependent Types
3
21
Class Pools and Dynamic Loading
Throughout execution the JVM maintains a collection of classes that have been loaded into memory: the class pool. In this section we describe how we have modelled the class pool; how new classes are loaded; how functions that search the class pool are written; and how the class pool is used by the rest of the formalisation. 3.1
The Class Pool
Essentially, the class pool is nothing but a finite map from fully qualified class names to data structures containing information such as the class’s superclass name and its methods. We store both proper classes and interfaces in the same structures, and we differentiate between them by a boolean field classInterface, which is true when the structure represents an interface, and false otherwise. The map from class names to class structures directly mimics the data structure that would be used in a real implementation of a JVM. In order to construct an executable model within Coq though, the basic data structure is not enough; we have to refine the basic underlying data structure with some invariants. The motivation for adding invariants to the class pool data structure was originally to enable the writing of functions that search over the class hierarchy. Each class data structure is a record type, with a field classSuperClass : option className indicating the name of that class’s superclass, if any. Searches over the class hierarchy operate by following the superclass links between classes. In a real JVM implementation it is known that, due to the invariants maintained by the class loading procedures, if a class points to a superclass, then that superclass will exist, and that java.lang.Object is the top of the hierarchy. Therefore, every search upwards through the inheritance tree will always terminate. When writing these functions in Coq we must convince Coq that the function actually does terminate, i.e. we must have a proof that the class hierarchy is well founded so that we can write functions that recurse on this fact. To this end, we package the basic data structure for the class pool, a finite map from class names to class structures, with two invariants: classpool : Classpool.t classpoolInvariant : ∀nm c. lookup classpool nm = Some c → classInvariant classpool nm c classpoolObject : ∃c. lookup classpool java.lang.Object = Some c ∧ classSuperClass c = None ∧ classInterfaces c = [ ] ∧ classInterface c = false We call the type of these records certClasspool. The type Classpool.t represents the underlying finite map. The function lookup looks up a class name in a given class pool, returning an option class result. There are two invariants that we maintain on class pools, the first covers every class in classpool , we describe this
22
R. Atkey
below. The second states that there is an entry for java.lang.Object and that it has no superclass and no superinterfaces, and is a proper class. The invariants that every class must satisfy are given by the predicate classInvariant : Classpool.t → className → class → Prop classInvariant classpool nm c ≡ className c = nm ∧ (classSuperClass c = None → nm = java.lang.Object) ∧ classInvariantAux classpool c which states that each class structure must be recorded under a matching name; only java.lang.Object has no superclass; and that further invariants hold: classInvariantAux classpool c. This varies depending on whether c is a proper class or an interface. In the case of a proper class, we require that two properties hold: that all the class’s superclasses are present in classpool and likewise for all its superinterfaces. A proof that all a class’s superclasses are present is recorded as a term of the following inductive predicate: goodSuperClass classpool : option className → Prop gscTop : goodSuperClass classpool None gscStep : ∀cSuper nmSuper. lookup classpool nmSuper = Some cSuper → classInterface cSuper = false → goodSuperClass classpool (classSuperClass cSuper ) → goodSuperClass classpool (Some nmSuper ) For a class record c, knowing goodSuperClass classpool (classSuperClass c), means that we know that all the superclasses of c, if there are any, are contained within classpool , finishing with a class that has no superclass. We also know that all the class structures in this chain are proper classes. By the other invariants of class pools, we know that the top class must be java.lang.Object. There is also a similar predicate goodInterfaceList classes interfaceList that states that the tree of interfaces starting from the given list is well founded. For classInvariantAux classpool c, when c is an interface we again require that all the superinterfaces are present, but we also insist that the superclass must be java.lang.Object, to match the requirements in the JVM specification. These predicates may be used to write functions that search the class hierarchy that are accepted by Coq by the technique of recursion on an ad-hoc predicate [3]. Unfortunately, they are not suitable for proving properties of these functions. We describe in Section 3.3 how we use an equivalent formulation to prove properties of these functions. 3.2
Dynamic Class Loading
New classes are loaded into the JVM as a result of the resolution of references to entities such as classes, methods and fields. For example, if the code for a method contains an invokevirtual instruction that calls a method int C.m(int), then
CoqJVM: An Executable Specification of the JVM Using Dependent Types
23
the class name C must be resolved to a real class, and then the method int C.m(int) must be resolved to an actual method in the class resolved by C, or one of its superclasses. The process of resolving the reference C may involve searching for an implementation on disk and loading it into the class pool. The JVM specification distinguishes between the loading and resolution of classes. It also takes care to maintain information on the different class loaders used to load classes, in order to prevent spoofing attacks on the JVM’s type safety [7]. In our formalisation we only consider a single class loader, the bootstrap class loader. With this simplification, we can unfold the resolution and loading procedures described in the JVM specification to the following steps: 1. If the class C exists in the class pool as c then return c. 2. Otherwise, search for an implementation of C. If not found, return error. If found, check it for well-formedness and call it pc (preclass). 3. Recursively resolve the reference to the superclass, returning sc. 4. Check that sc is a proper class, and that C is not the name of sc or any of its superclasses. 5. Recursively resolve the references to superinterfaces. 6. Convert pc to a class c and add it to the class pool. Return c. The well-formedness check in step 2 checks that the class we have found has a superclass reference and that it has a matching name to the one we are looking for. In the case of interfaces, it checks that the superclass is always java.lang.Object. Since the formalisation only deals with a structured version of the .class data loaded from disk, we do not formalise any of the low-level binary well-formedness checks prescribed by the JVM specification. We separate class implementations (preclasses pc) from loaded classes c because preclasses contain information that is discarded by the loading procedure, such as the proof representations required for PCC. If we start from a class pool that contains nothing but an implementation of java.lang.Object satisfying the properties in the previous section, and add classes by the above procedure then it is evident that we maintain the invariants. We now describe how we have formalised this procedure as a Coq function. The first decision to be made is how to represent the implementations of classes “on disk”. Following Liu and Moore’s ACL2 model [9], we do this by modelling them as a finite map from class names to preclasses. The intention is that this represents a file system mapping pathnames to files containing implementations. We use the O’Caml glue code described in Section 6 to load the actual files from disk before any execution, parse them and put them in this structure. With this, the implementation of the procedure above can look into this collection of preclasses in order to find class implementations and check them for well-formedness. However, a problem arises due to the recursive nature of the loading and resolution procedure. We must be able to prove that the resolution of a single class reference will always terminate; we must not spend forever trying to resolve an infinite chain of superclasses or interfaces. To solve this problem we define a predicate wfRemovals preclasses that states that preclasses may
24
R. Atkey
be reduced to empty by removing elements one by one. With some work, one can prove ∀preclasses. wfRemovals preclasses. Using this, we define a function loadAndResolveAux which has type loadAndResolveAux (target : className) (preclasses : Preclasspool.t) (PI : wfRemovals preclasses) (classes : certClasspool) : {LOAD classes, preclasses ⇒ classes & c : class | lookup classes target = Some c}. This function takes the target class name to resolve, a preclasses to search for implementations, a PI argument stating that preclasses can be reduced to empty by removing elements, and the current class pool classes. The return type is an instance of the following type with two constructors: loadType (A : Set) (P : certClasspool → A → Prop) (classes : certClasspool) (preclasses : Preclasspool.t) : Set loadOk : ∀classes a. preserveOldClasses classes classes → onlyAddFromPreclasses classes classes preclasses → P classes a → loadType A P classes preclasses loadFail : ∀classes . preserveOldClasses classes classes → onlyAddFromPreclasses classes classes preclasses → exn → loadType A P classes preclasses and {LOAD classes, preclasses ⇒ classes & a : A | Q} is notation for loadType A (λclasses a. Q) classes preclasses. Hence, in the type of loadAndResolveAux, there are two possibilities: either a class structure is returned, along with a proof that this class is in the new class pool classes . Or a new classpool classes is returned, along with an error of type exn. These errors represent exceptions like java.lang.ClassFormatError that are turned into real Java exceptions by the code executing individual instructions. The two common parts of the constructors, the predicates preserveOldClasses and onlyAddFromPreclasses relate the new class pool classes to the old class pool classes and to preclasses. The predicate preserveOldClasses classes classes states that any classes that were in classes must also be in classes . The predicate onlyAddFromPreclasses classes classes preclasses states that any new classes in classes that are not in classes must have been loaded from preclasses. These two properties are used to establish properties of the class pool as it evolves during the execution of the virtual machine. In particular, we use them to show that the invariants of previously loaded classes are not violated by loading new classes, and to allow the inheritance of known facts about preclasses to the class pool. This is intended to be used in consumer-side PCC for pre-checking the proofs in a collection of class implementations before execution begins. We do not have space to go into the implementation of loadAndResolveAux. We have written the function in a heavily dependently typed style, making use of a “proof passing style”. We describe this style in the next section.
CoqJVM: An Executable Specification of the JVM Using Dependent Types
3.3
25
Writing and Proving Functions That Search the Class Pool
Given a representation of class pools and functions that add classes to it, we also need functions that query the class pool. To execute JVM instructions, we need ways to determine when one class inherits from another; to search for virtual methods and to search for fields and methods during resolution. The basic structure of these operations is to start at some point in the class hierarchy and then follow the superclass and superinterface links upwards until we find what we are looking for. We introduced goodSuperClass as a way to show that the hierarchy is well founded. While this definition is suitable for defining recursive functions over the superclass hierarchy, it is not suitable for proving properties of such functions. In the induction principle on goodSuperClass generated for a property P by Coq’s Scheme keyword, the inductive step has P (classSuperClass cSuper ) g as a hypothesis where g has type goodSuperClass classes (classSuperClass cSuper ). The variable cSuper denotes the implementation of the superclass of the current class that goodSuperClass guarantees the existence of. However, during the execution of a recursive function over the class hierarchy we will have looked up the same class, but under a different Coq name cSuper . We know from the other invariants maintained within certClasspool that cSuper and cSuper are equal, because they are stored under the same class name, but our attempts at rewriting the hypotheses with this fact were defeated by type equality issues. Although it may be possible to find a way to rewrite the hypotheses in such a way that allows us to apply the induction hypothesis, we took an easier route and defined an equivalent predicate, superClassChain: superClassChain : certClasspool → option className → Prop := sccTop : ∀classes. superClassChain classes None sccStep : ∀classes cSuper nmSuper. lookup classes nmSuper = Some cSuper → classInterface cSuper = false → (∀cSuper . lookup classes nmSuper = Some cSuper → superClassChain classes (classSuperClass cSuper )) → superClassChain classes (Some nmSuper). The difference here is that the step to the next class in the chain is abstracted over the implementation of that class, removing the problem described above. We retain the original goodSuperClass predicate because it is easier to prove while adding classes to the classpool. These two are easily proved equivalent. We can now define functions that are structurally recursive on the superclass hierarchy by recursing on the structure of the superClassChain predicate. To define such functions we must prove two inversion lemmas. These have the types inv1 : ∀classes nm c optNm. optNm = Some nm → superClassChain classes optNm → ¬(lookup (classpool classes) nm = None)
26
R. Atkey fix search (classes : certClasspool) (supernameOpt : option className) (scc : superClassChain classes supernameOpt ) := match optionInformative supernameOpt with | inleft (exist supername snmEq) ⇒ let notNotThere := inv1 snmEq scc in match lookupInformative classes supername with | inleft(exist superC superCExists ) ⇒ let scc := inv2 superCExists snmEq scc in (* examine superC here, use * search classes (classSuperClass superC ) scc for * recursive calls *) | inright notThere ⇒ match notNotThere notThere with end end | inright ⇒ (* search failed code here *) end Fig. 1. Skeleton search function
inv2 : ∀classes nm c optNm.optNm = Some nm → superClassChain classes optNm → lookup (classpool classes) nm = c → superClassChain classes (classSuperClass c). A skeleton search function is shown in Figure 1. In addition to inv1 and inv2, we use functions optionInformative : ∀A o. {a : A | o = Some a} + {o = None} and lookupInformative : ∀classes nm. {c | lookup classes nm = Some c} + {lookup classes nm = None}. The search function operates by first determining whether there is a superclass to be looked up. If so, optionInformative returns a proof object for supernameOpt = Some supername. This is passed to inv1 to obtain a proof that supername does not not exist in classes. The function then must look up supername for itself: since Coq does not allow the construction of members of Set by the examination of members of Prop, a proof can only tell a function that its work will not be fruitless, not do its work for it. To dispose of the impossible case when the superclass is discovered not to exist, we combine notNotThere and notThere to get a proof of False which is eliminated with an empty match (recall that ¬A is represented as A → False in Coq). Otherwise, we use inv2 to establish superClassChain for the rest of the hierarchy, and proceed upwards. 3.4
Interface to the Rest of the Formalisation
The implementation of the type certClasspool is kept hidden from the rest of the formalisation. To state facts about a class pool, external clients must make do with a predicate classLoaded : certClasspool → className → class → Prop. To know classLoaded classes nm c is to know that c has been loaded under the name nm in classes. The resolution procedures for classes, methods and fields are
CoqJVM: An Executable Specification of the JVM Using Dependent Types
27
also exposed, using the loadType type described above. Each of these functions returns witnesses attesting to the fact that, when they return a class, method or field, that entity exists and matches the specification requested. This module also provides two other services: assignability (or subtype) checking, and virtual method lookup. These both work by scanning the class pool using the technique described in the previous subsection.
4
The Object Heap and Static Fields
The two other major data structures maintained by the JVM are the object heap and the static fields. In this section we describe their concrete implementations and the dependently typed interface they expose to the rest of the formalisation. 4.1
Object Heaps
As with the class pool, the object heap is essentially nothing but a finite map, this time from object references to structures representing objects. Object references are represented as natural numbers using Coq’s positive type. As above, we apply extra invariants to this basic data structure to constrain it to more closely conform to object heaps that actually arise during JVM execution. We build object heaps in two stages. First, we take a standard finite map data structure and re-package it as a heap. Heaps are an abstract datatype with the following operations, abstracted over a type obj of entities stored in the heap. We have operations lookup : heap → addr → option obj to look up items in the heap; update : heap → addr → obj → option t to update the heap, but only of existing entries; and new : heap → obj → heap × addr to create new entries. Given a heap datatype with these operations and the obvious axioms, we build object heaps tailored to the needs of the JVM. The first thing to fix is the representation of objects themselves. We regard objects as pairs of a class name and a finite map from fields to values. As with class pools we require several invariants to hold about the representation of each object. Each of the class names mentioned in an object heap actually exist in some class pool. Thus, the type of an object heap depends on some class pool: certHeap classes. Second, we require that each of the fields in each object is well-typed. We are helped here by the fact that JVM field descriptors contain their type. The type of the operation that looks up a field in an object is heapLookupField classes (heap : certHeap classes) (a : addr) (fldCls : className) (fldNm : fieldName) (fldTy : javaType) : {v | objectFieldValue classes heap a fldCls fldNm fldTy v} + {¬objectExists classes heap a}. This operation looks up an object at address a, and a field within that object. If the object exists then either the actual value of that field or a default value for the field (based on fldTy) is returned, along with a proof that this is the value of that field in the object at a in heap. Otherwise, a proof that the object does not
28
R. Atkey
exist is returned. The predicates objectFieldValue and objectExists record facts about the heap that can be reasoned about by external clients, in a similar way to the classLoaded predicate for class pools. A possible invariant that we do not maintain is that each field present in an object is actually present in that object’s class and vice versa. We choose not maintain this invariant because it simplifies development at this stage of the construction of the model. As described above and as we will elaborate in Section 5, we use dependent types to reduce spurious partiality in the model and to make the model more usable for proving properties. At the moment, it is useful to know that a field’s value is well-typed; when proving a property of the model that relies on type safety we do not have to keep around an additional invariant stating that all fields are well-typed. Also, it is useful to know that an object’s class exists so that the implementation of the invokevirtual instruction can use this information to find the class implementation and search for methods. If the class does not exist there is no sensible action to take other than to just go wrong, which introduces spurious partiality into the model. However, there is an obvious action to take if a field does not exist – return a default value. The interface that object heaps present to the rest of the formalisation is constructed in the same style as that for class pools. We present an abstract type certHeap classes, along with operations such as heapLookupField above. Operations heapUpdateField and heapNew update fields and create new objects respectively. All these operations are dependently typed so that they can be used in a proof-passing style within the implementations of the bytecode instructions. Since the type of object heaps depends on the current class pool to state its invariants, we have to update the invariants’ proofs when new classes are added to the class pool. We use the preserveOldClasses predicate from Section 3.2: preserveCertHeap : ∀classesA classesB. certHeap classesA → preserveOldClasses classesA classesB → certHeap classesB Since every operation that alters the class pool produces a proof object of type preserveOldClasses, this can be passed into the above function to produce a matching object heap. 4.2
Static Fields
The static fields arc modelled in exactly the same way as the fields of a single object in the heap. The rest of the model is presented with a dependently typed interface that maintains the invariant that each field’s value is welltyped according to the field’s type. The type of well-typed static field stores is fieldStore classes heap.
5
Modelling Execution
All of the modules described above are arguments to the Execution functor, whose signature was in Section 2. We now describe the implementation of this module.
CoqJVM: An Executable Specification of the JVM Using Dependent Types
5.1
29
The Virtual Machine State
The state of the virtual machine is modelled as a record with the fields stateFrameStack stateClasses stateObjectHeap stateStaticFields
: list frame : certClasspool : certHeap stateClasses : fieldStore stateClasses stateObjectHeap.
States contain the three major data structures for the class pool, object and static fields that we have covered above. The additional field records the current frame stack of the virtual machine. Individual stack frames have the fields: frameOpStack : list rtVal framePC : nat frameClass : class
frameLVars : list (option rtVal) frameCode : code
The type rtVal is used to represent run-time values manipulated by the virtual machine such as integers and references. There are entries for the current operand stack and the local variables. The use of option types in the local variables is due to the presence of values that occupy multiple words of memory. Values of types such as int only occupy a single 32-bit word of memory on the real hardware, but long and double values occupy 64-bits. When stored in the local variables, the second half of a 64-bit value is represented using None. The rest of the fields in frame are as follows: the framePC field records the current program counter; frameCode records the code being executed, this consists of a list of instructions and the exception handler tables; frameClass is the class structure for the code being executed, this is used to look up items in the class’s constant pool. 5.2
Instruction Execution
The main function of the formalisation is exec : state → execResult. This executes a single bytecode instruction within the machine. If any exceptions are thrown then the catching and handling or the termination of the machine are all handled before exec returns. The type execResult sets out the possible results of a single step of execution: execResult : Set cont : state → execResult stop : state → option rtVal → execResult stopExn : state → addr → execResult wrong : execResult. An execution step can either request continuation to the next step; stop, possibly with a value; stop with a reference to an uncaught exception; or go wrong. The basic operation of exec is simple. The current instruction is found by unpacking the current stack frame from the state and looking up by framePC . Each instruction is implemented by a different function. The non-object-oriented
30
R. Atkey
implementations are relatively straightforward; the object-oriented instructions more complex. They interact with the class pool, object heap and static field data structures described in earlier sections. The dependently typed interfaces are used to ensure that we maintain the invariants of each data structure, and that we only go wrong when the executed code is not type safe.
6
Extraction to O’Caml
The Coq development consists of roughly 5275 lines of specification and 2288 lines of proof, as measured by the coqwc tool. The proof component primarily comprises glue lemmas to allow the coding of proof-passing dependently typed functions. After extraction this becomes 16454 lines of O’Caml code, with a .mli file of 17132 lines. The expansion can be explained partially by the inclusion of some elements of Coq’s standard library, but mainly by the repetition of module interfaces several times. This appears to be due to an internal limitation of Coq in the way it represents module signatures with sharing constraints. To turn this extracted code into a simulator for the JVM we have written around 700 lines of O’Caml glue code. The bulk of this is code to translate from the representation of .class files used by the library that loads and parses them to the representation required by the extracted Coq code. The action of the O’Caml code is simply to generate a suitable preclasses by scanning the classpath for .class files, construct an initial state for a nominated static method, and then iterate the exec function until the machine halts. The JVM so produced is capable of running bytecode produced by the javac compiler, albeit very slowly. We have not yet implemented arrays or strings, so the range of examples is limited, but we have used it to test the dynamic loading and virtual method lookup and invocation, discovering several bugs in the model.
7
Related Work
We know of two other large-scale executable specifications of the JVM constructed within theorem provers. Liu and Moore [9] describe an executable JVM model, called M6, implemented using ACL2’s purely functional subset of Common Lisp. M6 is particularly complete: in common with the work described here it simulates dynamic class loading and interfaces, it also goes beyond our work in simulating class initialisation and instruction-level interleaving concurrency (though see comments on concurrency in the next section). Liu and Moore describe two applications of their model. They use the model to prove the invariant that if a class is loaded then all its superclasses and interfaces must also be loaded. Unlike in our model, where this invariant is built in, this proof is an optional extra for M6. Our motivation for maintaining this invariant is to prove that searches through the class hierarchy terminate. In M6 this is proved by first establishing that there will always be a finite number of classes loaded, and so there will be a finite number of superclasses to any class; then recursion
CoqJVM: An Executable Specification of the JVM Using Dependent Types
31
proceeds on this finite list. Note that, unlike our method, this does not guarantee that, at the point of searching through the hierarchy, all the classes will exist. Liu and Moore also directly prove some correctness properties of concurrent Java programs. Another executable JVM specification is that of Barthe et al [2]. This is an executable specification of JavaCard within Coq. With respect to our model, they do not need to implement dynamic class loading since all references are resolved before runtime in the JavaCard environment. They also prove the soundness of a bytecode verifier with respect to their model. Klein and Nipkow [4] have formalised a Java-like language, Jinja, complete with virtual machine, compiler and bytecode verifier, in Isabelle/HOL. They have proved that the compilation from the high-level language to bytecode is semantics preserving. The language they consider is not exactly Java, and their model simplifies some aspects such as the lack of dynamic loading and interfaces. The formalisation is executable via extraction to SML. Another large-scale, and very complete formalisation is that of St¨ark et al [13] in terms of abstract state machines. This formalisation is executable, but proofs about the model have not been mechanised. The Mobius project contains a formalisation of the JVM called Bicolano2 . This formalisation uses Coq’s module system to abstract away from the representation of the machine’s data structures in a similar way to ours, and has been used to prove the soundness of the Mobius Base Logic. It does not model dynamic class loading. Other large-scale formalisations of programming languages include Leroy’s formalisation of a subset of C and PowerPC machine code for the purposes of a certified compiler [6]. This is a very large example of a useful program being extracted from a formal development. Another large mechanisation effort is Lee et al ’s Twelf formalisation of an intermediate language for Standard ML [5].
8
Conclusions
We have presented a formal, executable model of a subset of the Java Virtual Machine structured using a combination of dependent types and Coq’s module system. We believe that this use of dependent types as a structuring mechanism is the first application of such a strategy to a large program. The model is incomplete at time of writing. In the immediate future we intend to add arrays and strings to the model in order to extend the range of real Java programs that may be executed. There are several extensions that require further research. Modelling I/O behaviour of JVM programs would be a useful feature. We speculate that a suitable way to do this would be to write the formalisation using a monad. The monad would be left abstract but axiomatised in Coq in order to prove properties, but be implemented by actual I/O in O’Caml. Even more difficult is the implementation of concurrency. Liu and Moore’s ACL2 model simulates concurrency by interleaving, but this does not capture all the possible behaviours allowed by the Java Memory Model [10]. There has been recent work on formalising the Java Memory Model in Isabelle/HOL [1], but it 2
http://mobius.inria.fr/bicolano
32
R. Atkey
is difficult to see how this could be made into an executable model. A suitable approach may be to attempt to only model data-race free programs, for which the Java Memory Model guarantees the validity of the interleaving semantics. Acknowledgement. This work was funded by the ReQueST grant (EP/C537068) from the Engineering and Physical Sciences Research Council.
References 1. Aspinall, D., Sevc´ık, J.: Formalising Java’s Data Race Free Guarantee. In: Schneider, K., Brandt, J. (eds.) TPHOLs 2007. LNCS, vol. 4732, pp. 22–37. Springer, Heidelberg (2007) 2. Barthe, G., Dufay, G., Jakubiec, L., Serpette, B., de Sousa, S.M.: A Formal Executable Semantics of the JavaCard Platform. In: Sands, D. (ed.) ESOP 2001. LNCS, vol. 2028, pp. 302–319. Springer, Heidelberg (2001) 3. Bertot, Y., Cast´eran, P.: Interactive Theorem Proving and Program Development: Coq’Art: The Calculus of Inductive Constructions. Springer, Heidelberg (2004) 4. Klein, G., Nipkow, T.: A machine-checked model for a Java-like language, virtual machine and compiler. ACM Transactions on Programming Languages and Systems 28(4), 619–695 (2006) 5. Lee, D.K., Crary, K., Harper, R.: Towards a Mechanized Metatheory of Standard ML. In: POPL 2007: Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pp. 173–184. ACM Press, New York (2007) 6. Leroy, X.: Formal certification of a compiler back-end or: programming a compiler with a proof assistant. In: POPL 2006: Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pp. 42– 54. ACM Press, New York (2006) 7. Liang, S., Bracha, G.: Dynamic class loading in the Java virtual machine. In: OOPSLA 1998: Proceedings of the 13th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, pp. 36–44. ACM Press, New York (1998) 8. Lindholm, T., Yellin, F.: The Java Virtual Machine Specification, 2nd edn. Addison-Wesley, Reading (1999) 9. Liu, H., Moore, J.S.: Executable JVM model for analytical reasoning: A study. Sci. Comput. Program. 57(3), 253–274 (2005) 10. Manson, J., Pugh, W., Adve, S.V.: The Java memory model. In: POPL 2005: Proceedings of the 32nd ACM SIGPLAN-SIGACT symposium on Principles of Programming Languages, pp. 378–391. ACM Press, New York (2005) 11. The Coq development team. The Coq proof assistant reference manual. LogiCal Project, Version 8.0 (2004) 12. Necula, G.C.: Proof-carrying code. In: Proceedings of POPL 1997 (January 1997) 13. St¨ ark, R., Schmid, J., B¨ orger, E.: Java and the Java Virtual Machine – Definition, Verification, Validation. Springer, Heidelberg (2001)
Dependently Sorted Logic Jo˜ ao Filipe Belo The University of Manchester, School of Computer Science Oxford Road, Manchester, M13 9PL, UK
[email protected]
Abstract. We propose syntax and semantics for systems of intuitionistic and classical first order dependently sorted logic, with and without equality, retaining type dependency, but otherwise abstracting, from systems for dependent type theory, and which can be seen as generalised systems of multisorted logic. These are presented as extensions to Gentzen systems for first order logic in which the logic is developed relative to a context of variable declarations over a theory of dependent sorts. A generalised notion of Kripke structure provides the semantics for the intuitionistic systems.
1
Introduction
Dependently sorted logic may be described as the generalisation of multisorted first order logic to a logic with dependent sorts, i.e., as the generalisation of multisorted first order logic in which the languages have been extended to include dependent sorts. This extension is nevertheless minimal in the sense that sort dependency is the only extra structure assumed. We may thus describe the systems we introduce here as systems for multisorted first order logic generalised with dependent sorts. Alternatively, we may describe these systems as abstractions of logic enriched type theories [1] in which the only structure retained from the type theory is type dependency. The overall structure of these systems is thus that of a system of predicates and proofs over a type theory, or a theory of sorts [2]. The idea of minimally extending multisorted logic with dependent sorts has been addressed several times already [3,4,5,6] with seemingly different motivations. A system for intuitionistic dependently sorted logic without equality and an abstract notion of theory of dependent sorts called a type setup is proposed by Aczel in [3], starting the work we present in this paper. This is further developed in [4] with set theoretical semantics and completeness, but for classical logic instead. In [5], Makkai introduces a dependently sorted logic to formulate category theoretical notions, in which the use of equality is restricted, consequently avoiding function symbols. The paper [6] by Rabe is rather close in goal to ours but the development is somewhat different, deals only with classical logic, and imposes conditions on the syntax which, as said there, are rather restrictive, and which we don’t need. M. Miculan, I. Scagnetto, and F. Honsell (Eds.): TYPES 2007, LNCS 4941, pp. 33–50, 2008. c Springer-Verlag Berlin Heidelberg 2008
34
J.F. Belo
One motivation for developing dependently sorted logic as presented here is to fit dependent sorts in systems for software specification founded on multisorted logic, like CASL [7]. We want to try to apply these systems to generic programming following related work [8,9] in dependent type theory. In this paper we propose syntax and semantics for systems of intuitionistic and classical dependently sorted logic, with and without equality, staying as close as possible to traditional presentations of multisorted logic. One may then be concerned with important properties of multisorted logic, like interpolation, still holding in these systems. In a subsequent paper we intend to show that Craig interpolation indeed holds for these systems. This paper has two parts. In the first part we are concerned with theories of sorts. These are syntactical theories, so it begins with the notions of expression and substitution, after which the notion of signature is presented. Signatures are the simplest of the theories of sorts due to the absence of equality. This is followed by the notion of generalised algebraic theory, introduced by Cartmell [10,11,12], which adds equality both on terms and on sorts. Then set theoretical semantics of generalised algebraic theories is given and a completeness result is proved. The second part presents the notion of a dependently sorted first order theory and its complete semantics, both classical and intuitionistic.
2
Theories of Dependent Sorts
Roughly, a theory of dependent sorts is one which allows individual variables to occur in the expression of sorts. The actual sort, in the multisorted sense, depends on the value of the variables. It’s very much like the truth value of a predicate depending on the value of the variables occurring in it. Consider, for instance, the expression vec(n) for the sort of vectors of length n [13]. Sort dependency is the only extra structure, with regard to sorts in the multisorted sense, that we assume in dependently sorted logic. We ignore further structure, like dependent products, common in dependent type theories. This section presents two theories of dependent sorts, namely signatures and generalised algebraic theories. The distinguishing feature among them is equality, which is absent in signatures. The main observation about the presentation of these theories is that the inductive definition of sorts and terms must be simultaneous by the nature of dependent sorts. 2.1
Expressions
We shall now give a definition of a notion of expression, which we do mainly to simplify the definition of substitution on the objects we shall introduce later. We give it in very much the standard way where the expressions are built from variables by a series of symbol applications. It should nevertheless be noted that for now we don’t assign arities to the symbols, and hence that we don’t impose any arity constraint on the formation of expressions. It should also be noted that we intend a notion of expression modulo renaming of bound variables, or α-conversion.
Dependently Sorted Logic
35
For certain inductively defined sets we need an objective or formal notion of derivation according to the clauses of the inductive definition. These sets shall be defined indirectly through the inductive definition of the derivations instead. For this we use the notation Π1 . . . Πk {R}, Q to abbreviate a clause “If Π1 , . . . Πk are derivations, then (Π1 , . . . Πk , Q) is a derivation, provided that R.” We may omit R, which we then consider to hold. The Πi we call the premises of the derivation and Q the conclusion. When writing down such a clause we shall in fact, of each Πi , show only its conclusion. We say that “Q is derivable,” write Q, when Q is the conclusion of some derivation. The sets indirectly being defined are the sets of those Q which are derivable. For uniformity sake, hereafter we shall use this notation in every inductive definition, even when we don’t need the formal derivations. For us a sequence is always a possibly empty finite sequence, the empty sequence being denoted by the letter . In a situation where a sequence is specified, say E1 , . . . En , the sequence may be denoted simply by the unsubscripted letter, say E. Definition 1. We assume fixed an infinite set U of symbols and an infinite set V of variables and inductively define the expressions by the following clauses. Variable, compound, variable binding v
{v ∈ V }
E1
. . . En {H ∈ U } H(E)
A ψ {H ∈ U and v ∈ V } (Hv : A)ψ
We may use infix notation E1 HE2 when denoting expressions of the form H(E1 , E2 ) and also omit the parenthesis when denoting expressions of the form H(). Moreover, in this subsection we let the letters, possibly subscripted or primed, – H denote an arbitrary symbol, – u, v, w, x, y denote arbitrary variables, – A, C, D, E, ψ denote arbitrary expressions, except otherwise indicated. Definition 2. An occurrence of a variable v in an expression E is free ( bound) in E according to the following induction on the structure of E: (1) the occurrence of v in v is free; (2) a free (bound) occurrence of v in Ei is free (bound) in H(E1 , . . . En ); (3) a free (bound) occurrence of v in A is free (bound) in (Hv : A)ψ; (4) a free (bound) occurrence of v in ψ is free (bound) in (Hu : A)ψ if v is distinct from u, otherwise is bound. Definition 3. An occurrence of a variable v in an expression E is said to be binding in E according to the following induction on the structure of E: (1) the occurrence of v in v is not binding; (2) a binding occurrence of v in Ei is binding in H(E1 , . . . En ); (3) the occurrence of v in (Hv : A)ψ is binding; (4) a binding occurrence of v in ψ is binding in (Hu : A)ψ.
36
J.F. Belo
Definition 4. A variable v is said to be free in an expression E if v has a free occurrence in E and v is said bound in E if v has a binding occurrence in E. The set FV(E) is the set of free variables in E. The set FV(E1 , . . . En ) is the union of the set of free variables of each Ei . Definition 5. A substitution is a pair of sequences D1 , . . . Dm /y1 , . . . ym , for any m ≥ 0, such that the variables in y are distinct. Definition 6. Let the pair D1 , . . . Dm /y1 , . . . ym be a substitution. The simultaneous substitution E[D/y] of D for y in E is defined by induction on the structure of E by equations: v[D/y] = Di v[D/y] = v
if v = yi for some i, if v = yi for all i,
H(E1 , . . . En )[D/y] = H(E1 [D/y], . . . En [D/y]) (Hv : A)ψ[D/y] = (Hv : A)(ψ[D /y ])
if v ∈ FV(D),
where D /y is D1 , . . . Di−1 , Di+1 , . . . Dm /y1 , . . . yi−1 , yi+1 , . . . ym D/y
if v = yi for some i, if v = yi for all i.
Each equation must be considered under the condition that the simultaneous substitutions on its right hand side are defined. Proposition 1 ((Strict) Substitution lemma). Let C1 , . . . Cl /x1 , . . . xl and D1 , . . . Dm /y1 , . . . ym be substitutions. Then E[C/x][D/y] = E[D/y][C[D/y]/x], provided the simultaneous substitutions are defined and none of the variables in x is in y or occurs free in D. As said, we intend a notion of expression modulo renaming of bound variables. For that we define syntactic equality which relates those expressions differing only in bound variables. Definition 7. Syntactic equality ≡ of expressions is inductively defined by the clauses: Reflexivity, congruence, bound variable renaming v≡v
E1 ≡ E1 . . . En ≡ En H(E) ≡ H(E )
ψ[w/v] ≡ ψ [w/v ] (Hv : A)ψ ≡ (Hv : A)ψ
where w is some variable occurring neither in ψ nor in ψ . Proposition 2. Syntactic equality is an equivalence relation.
Dependently Sorted Logic
37
Proposition 3. Let E and E be expressions such that E ≡ E . Then E[D/y] ≡ E [D/y], for any substitution D1 , . . . Dm /y1 , . . . ym for which the simultaneous substitutions are defined. Proposition 4. For any expression E and substitution D1 , . . . Dm /y1 , . . . ym there exists an expression E such that E ≡ E and E [D/y] is defined. Proposition 4 implies that simultaneous substitution on equivalence classes of expressions is totally defined. Thus, hereafter the expressions are taken modulo syntactic equality, or renaming of bound variables. 2.2
Signatures
The simplest way to set up a theory of sorts with sort dependency is, perhaps, through the notion of signature we present in this section. The fundamental notions in this presentation are those of context, of sort, and of term. The set up will be such that (1) every sort and term shall explicitly contain a context declaring the variables from which it may be formed: no variable may be used other than those declared in that context; and (2) every term shall explicitly contain an expression designating its sort. Definition 8. A variable declaration is a pair v : A, where v is a variable and A is an expression. Signatures enjoy the relevant properties of the generalised algebraic theories we present in the next section. In fact they are a particular kind of generalised algebraic theory. We thus choose to develop signatures only to the point where we can give some examples involving dependent sorts. As metavariables, we now let the letters, possibly subscripted or primed, – – – –
f , g, h, F , G denote arbitrary symbols, u, v, w denote arbitrary variables, p, q, r, s, t, A, B, C, D denote arbitrary expressions, Γ , Δ, E denote arbitrary sequences of variable declarations,
except otherwise indicated. Also, unless stated otherwise, we assume the sequences of variables in order declared in Γ and Δ to be x1 , . . . xm and y1 , . . . yn , respectively. Definition 9. We start with the following definitions. 1. A sort constructor declaration is a pair (Δ) F . 2. A term constructor declaration is a triple f :: Δ → D. Sort and term constructor declarations simply assign arities to symbols: the sequence of variable declarations assigned to a symbol determines the number
38
J.F. Belo
of terms, and their sort, to which that symbol may be applied, and the target expression of a term constructor declaration determines furthermore the sort of that application. Note nevertheless that, at this point, symbols are assigned arbitrary sequences of variable declarations, i.e., the expressions assigned to the variables in those sequences don’t necessarily designate sorts. Nor for that matter does the target expression of a term constructor declaration designate a sort, as it should. These two constraints must be further imposed by stating that, in a proper collection of declarations, in what we shall call a signature, the sequences of variable declarations assigned to the symbols must actually be contexts and, roughly, the target expressions of term constructor declarations must be sorts. Definition 10. A signature is a set Σ of sort and term constructor declarations such that 1. there are no two declarations for the same symbol in Σ, 2. if (Δ) F ∈ Σ, then Δ, and 3. if f :: Δ → D ∈ Σ, then (Δ) D, according to the following clauses for the simultaneous inductive definition of contexts Γ , sorts (Γ ) A, and terms (Γ ) t : A over Σ, where (Γ ) t : Δ abbreviates the list of premises Γ
(Γ ) t1 : B1
...
(Γ ) tn : Bn [t1 , . . . tn−1 /y1 , . . . yn−1 ]
Δ
for B1 , . . . Bn the sequence of sorts of the variables in order declared in Δ. Note that since Δ may be the empty sequence, the premise Γ is necessary to disallow an arbitrary sequence of variables as the context of the conclusion. empty context, context extension
(Γ ) A {v ∈ Γ } Γ, v : A
sort constructor application (Γ ) t : Δ {(Δ) F ∈ Σ} (Γ ) F (t) declared variable, term constructor application Γ {v : A ∈ Γ } (Γ ) v : A
(Γ ) t : Δ (Δ) D {f :: Δ → D ∈ Σ} (Γ ) f (t) : D[t/y]
We may omit the pair of parenthesis when denoting an application of a constructor symbol to the empty sequence. We usually omit the context, and the arrow, in the declaration of sort and term constructors when the context is empty. Example 1. The canonical example of a signature is perhaps that for a theory of categories, where the set of morphisms between two objects is designated by a dependent sort:
Dependently Sorted Logic
39
Sort constructors () Obj (x : Obj, y : Obj) Arr Term constructors ◦ :: (x : Obj, y : Obj, z : Obj, g : Arr(x, y), f : Arr(y, z)) → Arr(x, z) id :: (x : Obj) → Arr(x, x) Example 2. A more interesting example deals with indexed families of categories: Sort constructors () V (i : V ) Obj (i : V, x : Obj(i), y : Obj(i)) Arr Term constructors ◦ :: (i : V, x : Obj(i), . . . f : Arr(i, y, z)) → Arr(i, x, z) id :: (i : V, x : Obj(i)) → Arr(i, x, x) Example 3. Another example is that of a signature for a theory of stacks, where the sort of a stack includes the number of elements in it: Sort constructors () N at () T (len : N at) Stk Term constructors 0 :: N at suc :: N at → N at empty :: Stk(0) push :: (len : N at, s : Stk(len), e : T ) → Stk(suc(len)) top :: (len : N at, s : Stk(suc(len))) → T This disallows, for instance, the application of top to the empty stack. Before we move on to the next subject we note, regarding the definition of signature, that dropping the premise Δ from the sort constructor application clause, and the premises Δ and (Δ) D from the term constructor application clause, enables the following to be a signature. Sort constructors (x : G(c)) G Term constructors c :: G(c) Note the “circularity” in the declarations. The theory can still be developed in this case, without much change, although the proofs are not so straightforward since the language no longer has a proper simultaneous inductive definition.
40
2.3
J.F. Belo
Generalised Algebraic Theories
The notion of generalised algebraic theory extends that of signature with the expression of equality both on terms and on sorts. It was introduced by Cartmell [10] as “a generalisation of the usual notion of a many-sorted algebraic or equational theory.” Our presentation uses slightly different notation and terminology in a style closer to that of the traditional presentations of multisorted languages. The key observation here is that an inferred equality may enable a new application of a constructor symbol, thus the contexts, the sorts, and the terms must be formed simultaneously with the inference of equality. Definition 11 1. An algebraic equation axiom is a triple (Γ ) r = s : A. 2. A sort equation axiom is a triple (Γ ) A = B. Definition 12. A generalised algebraic theory is a set Σ of sort constructor declarations, term constructor declarations, algebraic equation axioms, and sort equation axioms such that 1. 2. 3. 4. 5.
there are no two declarations for the same symbol in Σ, if (Δ) F ∈ Σ, then Δ, if f :: Δ → D ∈ Σ, then (Δ) D, if (Γ ) s = t : A ∈ Σ, then (Γ ) s : A and (Γ ) t : A, if (Γ ) A = B ∈ Σ, then (Γ ) A and (Γ ) B,
according to the following clauses for the simultaneous inductive definition of contexts Γ , sorts (Γ ) A, terms (Γ ) t : A, algebraic equations (Γ ) r = s : A, and sort equations (Γ ) A = B over Σ. For brevity we omit the signature clauses for the contexts, sorts, and terms. sort replacement on terms (Γ ) r : A (Γ ) A = B (Γ ) r : B algebraic equation axiom (Γ ) r : A (Γ ) s : A {(Γ ) r = s : A ∈ Σ} (Γ ) r = s : A reflexivity, symmetry, transitivity (Γ ) t : A (Γ ) t = t : A
(Γ ) r = s : A (Γ ) s = r : A
(Γ ) r = s : A (Γ ) s = t : A (Γ ) r = t : A
substitution on algebraic equations (Γ ) r = s : Δ (Δ) p = q : D (Γ ) p[r/y] = q[s/y] : D[r/y]
Dependently Sorted Logic
41
sort replacement on algebraic equations (Γ ) r = s : A (Γ ) A = B (Γ ) r = s : B sort equation axiom (Γ ) A (Γ ) B {(Γ ) A = B ∈ Σ} (Γ ) A = B reflexivity, symmetry, transitivity of sort equations (Γ ) A (Γ ) A = A
(Γ ) A = B (Γ ) B = A
(Γ ) A = B (Γ ) B = C (Γ ) A = C
substitution on sort equations (Γ ) r = s : Δ (Δ) C = D (Γ ) C[r/y] = D[s/y] We next display a series of fundamental properties of these derivations, writing (Γ ) t : Δ as an abbreviation for Γ, (Γ ) t1 : B1 , . . . (Γ ) tn : Bn [t1 , . . . tn−1 /y1 , . . . yn−1 ], and Δ, for Δ the sequence y1 : B1 , . . . yn : Bn of variable declarations. Proposition 5. Let Σ be a generalised algebraic theory. i. If u occurs in p in (Δ) p : D, then (Δ) u : Bi for some i. ii. If u occurs in D in (Δ) D, then (Δ) u : Bi for some i. Proposition 6. Suppose that (Γ ) t : Δ. i. If (Δ) p : D, then (Γ ) p[t/y] : D[t/y], ii. If (Δ) D, then (Γ ) D[t/y]. Proposition 7 i. ii. iii. iv. v.
If If If If If
(Γ ) A, then Γ . Γ and v : A ∈ Γ , then (Γ ) A. (Γ ) r = s : A, then (Γ ) r : A and (Γ ) s : A. (Γ ) A = B, then (Γ ) A and (Γ ) B. (Γ ) r : A, then (Γ ) A.
Proposition 8. If (Γ ) r : A and (Γ ) r : B, then (Γ ) A = B. Proof. The proof is by a double induction on the height of the derivations. Take a derivation of (Γ ) r : A and a derivation of (Γ ) r : B. They must be a variable, a term constructor application, or a sort replacement on terms. For every possible combination of these it can be checked that (Γ ) A = B indeed holds either by reflexivity or by the hypothesis of induction and transitivity.
42
3
J.F. Belo
Dependently Sorted Algebras
We now turn to the set theoretical semantics for generalised algebraic theories which we prove sound and complete, although in a weak sense. The semantics is based on the idea that a sort is interpreted by a family of sets indexed by the interpretation of its context, and that a term is interpreted by a function on the interpretation of its context to the interpretation of its sort, but in a way which respects the indexing of that sort. The condition that such an interpretation respects the interpretation of substitution as composition defines the notion of structure for a generalised algebraic theory. If furthermore the equality axioms are satisfied by the interpretation, then the structure is called a dependently sorted algebra. Thus, let Σ be a generalised algebraic theory. Definition 13. A structure M for Σ is a triple of interpreting functions ·M on contexts, sorts, and terms such that 1. Γ M is a set ()M = {∅} Γ, v : AM = {(e, e ) | e ∈ Γ M and e ∈ (Γ ) AM (e)} 2. (Γ ) AM is a family of sets indexed by Γ M (Γ ) D[t/y]M (e) = (Δ) DM ((Γ ) t : ΔM (e)) 3. (Γ ) r : AM : (e ∈ Γ M ) → (Γ ) AM (e) (Γ ) xi : Ai M (e) = ei (Γ ) p[t/y] : D[s/y]M (e) = (Δ) p : DM ((Γ ) t : ΔM (e)) where xi : Ai is the ith variable declaration in Γ and (Γ ) t : ΔM (e) abbreviates ((Γ ) t1 : B1 M (e), . . . (Γ ) tn : Bn [t1 , . . . tn−1 /y1 , . . . yn−1 ]M (e)), B1 , . . . Bn being the sequence of sorts of the variables in order declared in Δ. One can also define a structure to be an assignment of a family of sets to each sort constructor and of a function to each term constructor such that the above equations, slightly changed and taken in general as inductively defining a partial assignment on the contexts, sorts, and terms, are indeed total. Nevertheless, the above definition captures the properties needed for what follows. Definition 14. An algebraic equation (Γ ) r = s : A is valid in a structure M for Σ, written M |=Σ (Γ ) r = s : A, if (Γ ) r : AM = (Γ ) s : AM . Similarly for a sort equation. The structure M is called a (dependently sorted) algebra for Σ if every algebraic and sort equation in Σ is valid in M . Proposition 9 (Soundness) i. For any algebraic equation (Γ ) r = s : A over Σ, (Γ ) r = s : A only if M |=Σ (Γ ) r = s : A for any algebra M for Σ.
Dependently Sorted Logic
43
ii. For any sort equation (Γ ) A = B over Σ, (Γ ) A = B only if M |=Σ (Γ ) A = B for any algebra M for Σ. Proof By induction on the height of the derivation of (Γ ) r = s : A and (Γ ) A = B, using properties 2 and 3 of the definition of structure for Σ in the case of a substitution on sort equations and substitution on algebraic equations, respectively. 3.1
Completeness
By completeness we understand the converse of propositions 9.i and 9.ii. We shall show that the converse of 9.ii does not hold, so we aim solely at the converse of 9.i, which we call weak completeness. The method we use is the standard one of building a term algebra, and then showing that it only validates a given algebraic equation if that equation is derivable. Thus, let Σ be a generalised algebraic theory. We shall prove the following proposition. Proposition 10 (Weak Completeness). For any equation (Γ ) r = s : A, it holds that (Γ ) r = s : A if M |=Σ (Γ ) r = s : A for any algebra M . For simplicity of notation, we restrict the proof to the case where Γ is the empty context, and thus build the term algebra by collecting closed terms. For the general case one would collect terms in the context Γ instead. Definition 15. We say a sort A is closed if the sort has the empty context, and similarly for terms. We’ll often leave out the empty context when referring to closed sorts and closed terms. Recall proposition 8 for the next definition. Proposition 11. Let the binary relation on the closed terms over Σ be defined by r : A s : B if r = s : B, for closed terms r : A and s : B. Then, i. is an equivalence relation, ii. c : Γ e : Γ implies r : A[c/x] r : A[e/x], for (Γ ) A and r : A[e/x], and iii. c : Γ e : Γ implies t[c/x] : A[c/x] t[e/x] : A[e/x], for (Γ ) t : A. Definition 16. The canonical structure for Σ is the triple of assignments · defined on the contexts Γ , sorts (Γ ) A, and terms (Γ ) t : A by an induction on the number of declarations in Γ , 1. () = {∅}, 2. Γ , y : B = {(e, e ) | e ∈ Γ and e ∈ (Γ ) B(e)}, 3. (Γ ) A is a family of sets indexed by Γ (Γ ) A([e] ) = {[r : A[e/x]] | r : A[e/x]}, 4. (Γ ) t : A : ([e] ∈ Γ ) → (Γ ) A([e] )
44
J.F. Belo
(Γ ) t : A([e] ) = [t[e/x] : A[e/x]] , where x is the sequence of variables in order declared in Γ . Proposition 12. The canonical structure for Σ is a structure for Σ. Proof. The proof is by induction on the number of variables declared in Γ , checking the conditions in the definition of structure. It essentially follows from the properties of substitution. Proposition 13. The canonical structure for Σ is an algebra for Σ. Proof. It needs only to be checked that, whenever (Γ ) r = s : A ∈ Σ, (Γ ) r : A([e] ) = (Γ ) s : A([e] ), for all [e] ∈ Γ , and that whenever (Γ ) A = B ∈ Σ, (Γ ) A([e] ) = (Γ ) B([e] ), for all [e] ∈ Γ . Proposition 14. Let Σ be a generalised algebraic theory, and let M be the canonical structure for Σ. Then, M |=Σ r = s : A only if r = s : A, for closed terms r : A and s : A. Proof. This is an immediate consequence of the definition of . Again, we don’t claim that M |=Σ A = B only if A = B, for closed sorts A and B. Here a problem arises as follows. Suppose Σ is the generalised algebraic theory {() A, () B, (x : A) A = B, (x : B) A = B}. Then A = B, but M |=Σ A = B for every algebra M for Σ. This is because in every algebra either A and B are interpreted by the empty set, and then their interpretations are equal, or must otherwise be interpreted by equal sets as imposed by the axioms. Thus we have the following. Proposition 15. There is a generalised algebraic theory over which an equality on sorts is not derivable but is nevertheless satisfied in every algebra. Thus, for the stronger completeness result we need a more general category, one which allows a more intensional notion of equality between the objects interpreting the sorts.
4
Systems of Dependently Sorted Logic
As said in the introduction, we consider a logic system to be a system of predicates and proofs fitted over a theory of sorts. The purpose of this section is to make precise what we mean by this for the particular case of dependently sorted logic. We shall introduce systems for both classical and intuitionistic first order logic. The inference rules are essentially those of the systems G1c and G1i in [14], except that:
Dependently Sorted Logic
45
1. the sequents in our systems include a context, which is a way of dealing with empty sorts as the availability of variables of given sorts is implicitly an existence assumption, see [15], page 811, and 2. a substitution rule is included with an algebraic equality as a side condition, so that a term in the conclusion of a derivation may be replaced by another algebraically equal to it. We use the letters, possibly subscripted or primed, ζ, ρ, φ, ψ to denote arbitrary expressions, and Θ, P , Φ to denote arbitrary multisets of expressions. Definition 17 1. A predicate symbol declaration is a pair R ⊂ (Δ). 2. A logic system is a generalised algebraic theory Σ together with a set of predicate symbol declarations R ⊂ (Δ) such that Σ Δ. Let Σ be a logic system. We shall use the same letter to denote a logic system and its theory of sorts. Definition 18. The formulas over Σ are inductively defined according to the following clauses. atomic, truth, and falsity (Γ ) R(t)
{R ⊂ (Δ) ∈ Σ and (Γ ) t : Δ}
(Γ )
{ Γ }
(Γ ) ⊥
{ Γ }
conjunction, disjunction, and implication (Γ ) ζ0 (Γ ) ζ1 (Γ ) ζ0 ∧ ζ1
(Γ ) ζ0 (Γ ) ζ1 (Γ ) ζ0 ∨ ζ1
(Γ ) ζ0 (Γ ) ζ1 (Γ ) ζ0 → ζ1
universal and existential quantification (Γ, x : A) ψ (Γ ) (∃x : A)ψ
(Γ, x : A) ψ (Γ ) (∀x : A)ψ
Proposition 16 1. If u occurs free in φ in (Δ) φ, then (Δ) u : Bi for some i, where B1 , . . . Bn is 2. If (Γ ) t : Δ and (Δ) φ, then (Γ ) φ[t/y]. Definition 19. A sentence is a formula with the empty context. We usually omit the context when denoting a sentence. A sequent is a triple (Γ ) Φ ⇒ P such that (Γ ) Φ and (Γ ) P are finite multisets of formulas. The multiset (Γ ) Φ is called the antecedent and (Γ ) P the succedent. A sequent is called intuitionistic if the succedent has at most one formula. Definition 20. Sequents are derived over a logic system according to the following clauses, called the inference rules for classical logic.
46
J.F. Belo
logical axiom, ⊥ elimination, and introduction (Γ ) ζ ⇒ ζ
(Γ ) ⊥ ⇒
(Γ ) ⇒
weakening left and right, contraction left and right (Γ ) Φ ⇒ P (Γ ) ζ, Φ ⇒ P
(Γ ) Φ ⇒ P (Γ ) Φ ⇒ P, ζ
(Γ ) ζ, ζ, Φ ⇒ P (Γ ) ζ, Φ ⇒ P
(Γ ) Φ ⇒ P, ζ, ζ (Γ ) Φ ⇒ P, ζ
substitution and cut (Δ) Ψ ⇒ Θ { (Γ ) r = s : Δ} (Γ ) Ψ [r/y] ⇒ Θ[s/y]
(Γ ) Φ0 ⇒ P0 , ζ (Γ ) ζ, Φ1 ⇒ P1 (Γ ) Φ0 , Φ1 ⇒ P0 , P1
∧ elimination and introduction (Γ ) ζi , Φ ⇒ P (Γ ) ζ0 ∧ ζ1 , Φ ⇒ P
(Γ ) Φ ⇒ P, ζ0 (Γ ) Φ ⇒ P, ζ1 (Γ ) Φ ⇒ P, ζ0 ∧ ζ1
∨ elimination and introduction (Γ ) ζ0 , Φ ⇒ P (Γ ) ζ1 , Φ ⇒ P (Γ ) ζ0 ∨ ζ1 , Φ ⇒ P
(Γ ) Φ ⇒ P, ζi (Γ ) Φ ⇒ P, ζ0 ∨ ζ1
→ elimination and introduction (Γ ) Φ ⇒ P, ζ0 (Γ ) ζ1 , Φ ⇒ P (Γ ) ζ0 → ζ1 , Φ ⇒ P
(Γ ) ζ0 , Φ ⇒ P, ζ1 (Γ ) Φ ⇒ P, ζ0 → ζ1
∃ elimination and introduction (Γ, u : A) ψ[u/v], Φ ⇒ P (Γ ) (∃v : A)ψ, Φ ⇒ P
(Γ ) Φ ⇒ P, ψ[t/v] { (Γ ) t : A} (Γ ) Φ ⇒ P, (∃v : A)ψ
∀ elimination and introduction (Γ ) ψ[t/v], Φ ⇒ P { (Γ ) t : A} (Γ ) (∀v : A)ψ, Φ ⇒ P
(Γ, u : A) Φ ⇒ P, ψ[u/v] (Γ ) Φ ⇒ P, (∀v : A)ψ
Definition 21. The inference rules for intuitionistic logic are the same as those for classical logic except that the sequents must be intuitionistic and the → elimination clause is replaced by: (Γ ) Φ0 ⇒ ζ (Γ ) ζ, Φ1 ⇒ P . (Γ ) Φ0 , Φ1 ⇒ P Definition 22. If a sequent (Γ ) Φ ⇒ P is derivable over Σ using the inference rules of classical logic, then we write Σ (Γ ) Φ ⇒ P . If the sequent is intuitionistic and the derivation uses the inference rules of intuitionistic logic, then we write instead iΣ (Γ ) Φ ⇒ P . We omit the subscript Σ if no confusion arises. Definition 23. A first order theory over Σ is a set of sentences over Σ, called the first order axioms of the theory. A sentence ζ is said to be derivable from the first order axioms in S, written S ζ, if () Φ ⇒ ζ such that the sentences in Φ are all first order axioms in S. We again use a superscript i in case the derivation is intuitionistic.
Dependently Sorted Logic
5
47
Structures for Dependently Sorted Logic
Having presented the systems for intuitionistic and classical logic in the previous section, we now proceed with their semantics. The striking aspect of the following development is, perhaps, that it is only a slight generalisation of that for multisorted logic. We present complete semantics for both classical and intuitionistic logic. The definition of classical structure is straightforward: given a structure for the theory of sorts only the declared predicate symbols remain to be interpreted. Definition 24. Let Σ be a logic system. A classical structure M for Σ is a structure for Σ together with an assignment of a subset RM of ΔM to each predicate symbol declaration R ⊂ (Δ). The interpretation of arbitrary formulas is then given by appropriate subsets of the interpretation of their contexts and is defined as follows. Definition 25. The satisfaction relation |=Σ on M between formulas (Γ ) ζ and elements e of Γ M is defined by induction on the structure of (Γ ) ζ as follows. M, e |= (Γ ) R(t) if (Γ ) t : ΔM (e) ∈ RM , M, e |= (Γ ) , M, e |= (Γ ) ⊥, M, e |= (Γ ) ζ0 ∧ ζ1 if M, e |= (Γ ) ζ0 and M, e |= (Γ ) ζ1 , M, e |= (Γ ) ζ0 ∨ ζ1 if M, e |= (Γ ) ζ0 or M, e |= (Γ ) ζ1 , M, e |= (Γ ) (∃x : A)ψ if M, (e, e ) |= (Γ, x : A) ψ for some (e, e ) ∈ Γ, x : AM , 7. M, e |= (Γ ) ζ0 → ζ1 if M, e |= (Γ ) ζ0 implies M, e |= (Γ ) ζ1 , 8. M, e |= (Γ ) (∀x : A)ψ if M, (e, e ) |= (Γ, x : A) ψ for all e ∈ (Γ ) AM (e).
1. 2. 3. 4. 5. 6.
Proposition 17 (Substitution lemma). For every formula (Δ) φ, terms (Γ ) t : Δ, and e ∈ Γ M , M, e |= (Γ ) φ[t/y] if and only if M, (Γ ) t : ΔM (e) |= (Δ) φ. Proof. By induction on the structure of (Δ) φ. Definition 26. A formula (Γ ) ζ is valid in M , written M |=Σ (Γ ) ζ, if M, e |=Σ (Γ ) ζ for all e ∈ Γ M . The classical structure M is a classical model of a first order theory S over Σ, written M |=Σ S, if M is an algebra for Σ and every first order axiom in S is valid in M . The formula is a consequence of S, written S |=Σ (Γ ) ζ, if it is valid in every model of S. Again we omit the subscript Σ if no confusion arises. We claim the completeness of the above semantics for classical logic. Proposition 18 (Completeness). For any first order theory S, for any sentence ζ, S ζ if and only if S |= ζ.
Proof. The "only if" direction of the claim, soundness, is proved by induction on the derivation of ζ from the first order axioms in S. The classical approach to the other direction is to show that if ζ is not derivable from the first order axioms in S, then S has a model that does not satisfy ζ. The traditional proof, which proceeds by first extending the theory to a complete Henkin one (a theory such that, for any sentence, either it or its negation is derivable, and such that there is a constant witnessing every derivable existential) and then by defining a classical structure interpreting the predicate symbols in the canonical algebra through derivability, like that in [16] for the case of enumerable languages or the more general one in [17], carries over to our case. Such a proof, carried over to dependently sorted languages without equality, may be found in [4].

We now proceed to the semantics for the intuitionistic systems. We define a notion of Kripke structure composed of classical structures, generalised in the sense that between the nodes we allow arbitrary classical structure morphisms. We should note that this generalisation is not essential to our results, as the standard definition, which only considers inclusions between the nodes, should suffice.

Definition 27. Let M and N be classical structures. A classical structure morphism from M to N, also called an extension of M, is a family h of functions indexed by the sorts over Σ such that

1. for every sort (Δ) D, all d ∈ ⟦Δ⟧M, and all d′ ∈ ⟦(Δ) D⟧M(d), h_{(Δ)D}(d′) ∈ ⟦(Δ) D⟧N(hΔ(d));
2. for every term (Δ) p : A and all d ∈ ⟦Δ⟧M, ⟦(Δ) p : A⟧N(hΔ(d)) = h_{(Δ)A}(⟦(Δ) p : A⟧M(d));
3. for every R ⊂ (Δ) ∈ Σ and all d ∈ ⟦Δ⟧M, d ∈ RM only if hΔ(d) ∈ RN;

where hΔ is the induced assignment on contexts:

  h_() = id_{∅}
  h_{Γ,v:A}(e, e′) = (hΓ(e), h_{(Γ)A}(e′))

for all e ∈ ⟦Γ⟧M and e′ ∈ ⟦(Γ) A⟧M(e).

Definition 28. A (generalised) Kripke structure for Σ is any category of classical structures for Σ and classical structure morphisms between them.

The extension of a formula is defined in the usual way through forcing, except that we now have to consider arbitrary structure morphisms on each node for implications and universal quantifications.
Definition 29. The forcing relation ⊩Σ on a Kripke structure K between classical structures M in K, formulas (Γ) ζ, and elements e of ⟦Γ⟧M is defined by induction on the structure of (Γ) ζ as follows.

1. M, e ⊩ (Γ) R(t) if ⟦(Γ) t : Δ⟧M(e) ∈ RM,
2. M, e ⊩ (Γ) ⊤,
3. M, e ⊮ (Γ) ⊥,
4. M, e ⊩ (Γ) ζ0 ∧ ζ1 if M, e ⊩ (Γ) ζ0 and M, e ⊩ (Γ) ζ1,
5. M, e ⊩ (Γ) ζ0 ∨ ζ1 if M, e ⊩ (Γ) ζ0 or M, e ⊩ (Γ) ζ1,
6. M, e ⊩ (Γ) (∃x : A)ψ if M, (e, e′) ⊩ (Γ, x : A) ψ for some (e, e′) ∈ ⟦Γ, x : A⟧M,
7. M, e ⊩ (Γ) ζ0 → ζ1 if N, hΓ(e) ⊩ (Γ) ζ0 implies N, hΓ(e) ⊩ (Γ) ζ1 for all classical structures N and classical structure morphisms h : M → N in K,
8. M, e ⊩ (Γ) (∀x : A)ψ if N, (hΓ(e), e′) ⊩ (Γ, x : A) ψ for all (hΓ(e), e′) ∈ ⟦Γ, x : A⟧N, classical structures N, and classical structure morphisms h : M → N in K.
Definition 30. A formula (Γ) ζ is valid at a classical structure M in K, written M ⊩Σ (Γ) ζ, if M, e ⊩ (Γ) ζ for all e ∈ ⟦Γ⟧M. The formula is valid in K, written K ⊩Σ (Γ) ζ, if M ⊩ (Γ) ζ for all classical structures M in K. The Kripke structure K is a Kripke model of a first order theory S, written K ⊩ S, if every first order axiom in S is valid in K. The formula is a Kripke consequence of S, written S ⊩Σ (Γ) ζ, if it is valid in every Kripke model of S.

Proposition 19 (Substitution lemma). Let (Δ) φ be a formula, let M be a classical structure in K, and let e ∈ ⟦Γ⟧M. Suppose that (Γ) t : Δ. Then M, e ⊩ (Γ) φ[t/y] if and only if M, ⟦(Γ) t : Δ⟧M(e) ⊩ (Δ) φ.

Proof. By induction on the structure of (Δ) φ.

Proposition 20 (Completeness). For any first order theory S and sentence ζ,

  S ⊢i ζ if and only if S ⊩ ζ.

Proof. The "only if" part of the claim is easy. For the other, one assumes a sentence ζ not derivable intuitionistically from the first order axioms in S and proceeds to build a Kripke model that does not force it. Roughly, the traditional proof considers all possible saturations of S (extensions of the theory such that, for every derivable disjunction, at least one of its disjuncts is derivable, and such that there is a constant witnessing every derivable existential) and then takes as a model the category composed of the canonical structures of each of those saturations and of all classical structure morphisms between them. This can be carried over to our semantics. One can show that the model thus constructed does not force ζ from the fact that ζ is not derivable in at least one of the saturations.
Acknowledgments This work was carried out under the supervision of Professor Peter Aczel. Also, many thanks to the referees for their generous and helpful comments.
References

1. Gambino, N., Aczel, P.: The generalised type-theoretic interpretation of constructive set theory (preprint 2005)
2. Jacobs, B.: Categorical Logic and Type Theory. Studies in Logic and the Foundations of Mathematics, vol. 141. North-Holland, Amsterdam (1999)
3. Aczel, P.: Predicate logic with dependent sorts or types. Unpublished (2004)
4. Belo, J.F.: Dependently typed predicate logic. Master's thesis, University of Manchester (2004)
5. Makkai, M.: First order logic with dependent sorts, with applications to category theory. Unpublished (1995)
6. Rabe, F.: First-order logic with dependent types. In: Furbach, U., Shankar, N. (eds.) IJCAR 2006. LNCS (LNAI), vol. 4130, pp. 377–391. Springer, Heidelberg (2006)
7. Mosses, P.D. (ed.): CASL Reference Manual. LNCS, vol. 2960. Springer, Heidelberg (2004)
8. Benke, M., Dybjer, P., Jansson, P.: Universes for generic programs and proofs in dependent type theory. Nordic J. of Computing 10(4), 265–289 (2003)
9. Pfeifer, H., Rueß, H.: Polytypic proof construction. In: Bertot, Y., Dowek, G., Hirschowitz, A., Paulin, C., Théry, L. (eds.) TPHOLs 1999. LNCS, vol. 1690, pp. 55–72. Springer, Heidelberg (1999)
10. Cartmell, J.: Generalized algebraic theories and contextual categories. PhD thesis, Univ. Oxford (1978)
11. Cartmell, J.: Generalized algebraic theories and contextual categories. Ann. Pure Appl. Logic 32, 209–243 (1986)
12. Pitts, A.M.: Categorical logic. In: Abramsky, S., Gabbay, D.M., Maibaum, T.S.E. (eds.) Handbook of Logic in Computer Science. Algebraic and Logical Structures, vol. 5, ch. 2. Oxford University Press, Oxford (2000)
13. Hofmann, M.: Syntax and semantics of dependent types. In: Pitts, A.M., Dybjer, P. (eds.) Semantics and Logics of Computation, vol. 14, pp. 79–130. Cambridge University Press, Cambridge (1997)
14. Troelstra, A.S., Schwichtenberg, H.: Basic Proof Theory. Cambridge University Press, Cambridge (2000)
15. Johnstone, P.T.: Sketches of an Elephant: A Topos Theory Compendium, vol. 2. Oxford University Press, Oxford (2002)
16. Johnstone, P.T.: Notes on Logic and Set Theory. Cambridge University Press, Cambridge (1987)
17. Shoenfield, J.R.: Mathematical Logic. Association for Symbolic Logic (1967)
Finiteness in a Minimalist Foundation

Francesco Ciraulo¹ and Giovanni Sambin²

¹ Università di Palermo, Dipartimento di Matematica ed Applicazioni, Via Archirafi 34, 90123 Palermo, Italy
[email protected], http://www.math.unipa.it/~ciraulo
² Università di Padova, Dipartimento di Matematica Pura ed Applicata, Via Trieste 63, 35121 Padova, Italy
[email protected], http://www.math.unipd.it/~sambin/
Abstract. We analyze the concepts of finite set and finite subset from the perspective of a minimalist foundational theory which has recently been introduced by Maria Emilia Maietti and the second author. The main feature of that theory and, as a consequence, of our approach is compatibility with other foundational theories such as Zermelo-Fraenkel set theory, Martin-Löf's intuitionistic Type Theory, topos theory, Aczel's CZF, Coquand's Calculus of Constructions. This compatibility forces our arguments to be constructive in a strong sense: no use is made of powerful principles such as the axiom of choice, the power-set axiom, the law of the excluded middle.

Keywords: minimalist foundation, finite sets, finite subsets, type theory, constructive mathematics.
1 Introduction
The behaviour of a mathematical object and the properties it possesses are influenced by the foundational assumptions one accepts. That is true also for the apparently clear concepts of finite set and finite subset of a given set. For this reason, it seems interesting to know a stock of properties about finiteness which are true in all foundational theories (or, at least, in the most used ones). Maria Emilia Maietti and the second author have recently proposed (see [5]) a foundational theory which is "minimalist" in the sense that it can be seen as the common core of some of the most used foundations, namely, Zermelo-Fraenkel set theory, topos theory, Martin-Löf's Type Theory, Aczel's CZF, Coquand's Calculus of Constructions. A peculiarity of this minimalist foundation is that it is based on two levels of abstraction: an extensional theory to develop mathematics in more or less the usual informal way (see [4]) and an underlying intensional type theory called "minimal Type Theory" ("mTT" from now on) in which mathematics is formalized (see [5]). Therefore, our task of speaking about finiteness independently of foundations acquires a more precise form: to study finiteness from the perspective of this minimalist foundation, and hence
eventually in terms of mTT. Accomplishing this task is the aim of the present paper. For the reasons explained above, the definitions and results in the present paper are constructive in a strong sense: no use is made of powerful principles such as the axiom of choice, the power-set axiom, the principle of excluded middle. In fact, each of these principles breaks compatibility with at least one of the above foundational theories. The present work can be seen as a sequel to [9], because all the definitions and properties stated there, even if originally intended for Martin-Löf's theory, remain valid when viewed from the point of view of mTT, since they do not need any application of the axiom of choice. For the same reason, a large part of [6] and [8] can be read as an explanation of the formal system mTT. For all those notions which are used but not explained in this paper, we refer to [5] and [6].
2 Minimal Type Theory: A Brief Introduction
The type theory mTT can be formalized as a variant of Martin-Löf's theory (see [6] and [8]); thus we feel free to use all the standard notation developed for Type Theory, mainly the set constructors Σ and Π. The main difference between the two systems is that mTT identifies each proposition with a particular set (namely, the set of all its proofs), but not conversely, as Martin-Löf's theory does instead. This implies that the usual identification between logical constants and set constructors can no longer be performed. In other words, in mTT every logical constant needs an independent definition; for example, the always false proposition, written ⊥, has to be kept distinct from N(0) (the set with no elements; see below) simply because N(0) is not a proposition. As a consequence, the axiom of choice is no longer provable in mTT.¹ To see this, let us briefly explain the difference between (Σ x ∈ A)B(x) (disjoint union) and (∃ x ∈ A)B(x) (existential quantifier) in mTT. Both are sets, but only the latter is a proposition. Their formation and introduction rules are formally the same, but their elimination rules, namely

  [z ∈ (Σ x ∈ A)B(x)]                        [x ∈ A, y ∈ B(x)]
          ⋮                                          ⋮
     C(z) set       d ∈ (Σ x ∈ A)B(x)       m(x, y) ∈ C(< x, y >)
  ─────────────────────────────────────────────────────────────
                     ElΣ(d, m) ∈ C(d)                              (1)

                                             [x ∈ A, y ∈ B(x)]
                                                     ⋮
     C prop         d ∈ (∃ x ∈ A)B(x)       m(x, y) ∈ C
  ─────────────────────────────────────────────────────────────
                     El∃(d, m) ∈ C           (∃-elimination)       (2)
¹ Note that the absence of the axiom of choice is necessary to keep compatibility with topos theory (see [5] for more details).
differ because the proposition C in the ∃-elimination rule cannot depend on a proof of (∃ x ∈ A)B(x). This apparently small limitation is enough to make the axiom of choice non-deducible. Here by "the axiom of choice" we mean the following proposition:

  (∀x ∈ A)(∃y ∈ B(x))C(x, y) → (∃f ∈ (Π x ∈ A)B(x))(∀x ∈ A)C(x, f(x))        (3)

(where f(x) stands for Ap(f, x), the element of B(x) which is obtained by applying the function f to the input x in A). On the contrary, the set

  (Π x ∈ A)(Σ y ∈ B(x))C(x, y) → (Σ f ∈ (Π x ∈ A)B(x))(Π x ∈ A)C(x, f(x))        (4)

can be proved to be inhabited. The reason lies in the fact that the second (or right) projection can be defined with respect to Σ, but not with respect to ∃. Provided that c is an element of (Σ x ∈ A)B(x) (respectively (∃x ∈ A)B(x)), the first (or left) projection, written p(c), is the element ElΣ(c, m) (respectively El∃(c, m)) obtained by elimination with A in place of C and x in place of m(x, y); of course, p(< a, b >) = a by equality. The second projection is obtained, in the case of Σ, by taking m(x, y) to be y; this forces C(x) to be B(x). Hence, this technique cannot be used in the case of ∃. Summing up, from a proof c ∈ (∃x ∈ A)B(x) we are able to construct an element p(c) ∈ A which, at a metalinguistic level, can be seen to satisfy B; nevertheless, we are not able to construct a proof of B(p(c)) within the system mTT. This fact is intimately related to the fact that, even if the axiom of choice is non-deducible within the system, it in fact holds on a metalinguistic level as long as the pure system mTT is considered; this happens because of our constructive interpretation of quantifiers. Of course, not all the extensions of mTT (e.g. topos theory) share this property and hence we cannot expect to prove the axiom of choice within our system. By the way, note that the usual logical rule of ∃-elimination can be obtained from the above one by suppressing all proof terms; so:
                                        [x ∈ A, B(x) true]
                                                ⋮
  C prop      (∃ x ∈ A)B(x) true      C true
  ──────────────────────────────────────────────────────
            C true        (logical ∃-elimination)         (5)
This rule says that if we want to infer C (which does not depend on x ∈ A) from (∃x ∈ A)B(x), then we can assume to have an arbitrary x ∈ A and a proof of B(x); of course, that does not mean we are using first and second projection. We take the occasion to warn the reader that we will often use a =S b, or simply a = b, instead of the proposition Id(S, a, b), provided that S is a set; this proposition, however, has to be kept distinct from the judgement a = b ∈ S. Provided that A set, B set, f ∈ A → B and a ∈ A we often write f (a) instead of Ap(f, a).
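The contrast can be replayed in Coq, where sigT has both projections (like Σ in mTT) while exists lives in Prop and has none (like ∃). The following sketch, with names of our choosing rather than anything from the paper, inhabits the analogue of (4):

Definition sig_choice
  {A : Type} {B : A -> Type} {C : forall a : A, B a -> Prop}
  (H : forall x : A, {y : B x & C x y}) :
  {f : forall x : A, B x & forall x : A, C x (f x)} :=
  existT _ (fun x => projT1 (H x)) (fun x => projT2 (H x)).

Replacing the strong sums by exists breaks the term: there is no projT1 for a Prop-level witness, which is exactly the failure of the second projection described above.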
3 A Constructive Concept of Finiteness
In the framework of mTT, like in other constructive approaches, a collection of objects is called a set when, roughly speaking, we have rules to construct such objects; we reserve the word "element" for an object of a set. It is common practice to distinguish intensional sets from extensional sets (also called setoids), which are (intensional) sets endowed with an equivalence relation. Even if the definitions in the present paper are formulated with regard to sets, they can easily be extended to setoids: it is enough to replace the propositional equality by the equivalence relation of the setoid. Thus the natural framework in which to set the following results should be the extensional level of the minimal type theory (see [4]). An example of (intensional) set is N, the set of formal natural numbers.

N-formation
  ─────────
    N set        (6)

N-introduction
                     n ∈ N
  0 ∈ N          ────────────
                  s(n) ∈ N        (7)

N-elimination
  [z ∈ N]                  [x ∈ N, y ∈ C(x)]
     ⋮                            ⋮
  C(z) set    c ∈ N    d ∈ C(0)    e(x, y) ∈ C(s(x))
  ──────────────────────────────────────────────────
                 R(c, d, e) ∈ C(c)        (8)
The program R (for "recursion") performs the following steps. Firstly, it brings c to its canonical form, which will be either 0 or s(n) for some n ∈ N. In the first case it returns d ∈ C(0) (or, better, the canonical element produced by d); in the second case it evaluates R(n, d, e) and then it computes e(n, R(n, d, e)).

N-equality
  [x ∈ N, y ∈ C(x)]
         ⋮
  d ∈ C(0)    e(x, y) ∈ C(s(x))
  ─────────────────────────────
     R(0, d, e) = d ∈ C(0)

  [x ∈ N, y ∈ C(x)]
         ⋮
  n ∈ N    d ∈ C(0)    e(x, y) ∈ C(s(x))
  ────────────────────────────────────────────
  R(s(n), d, e) = e(n, R(n, d, e)) ∈ C(s(n))        (9)
As usual, we write s^n(0) (n an informal natural number) for the canonical element of N which is obtained from 0 by n applications of s. Thus s^n(0) is a shorthand for the formal expression which represents the informal natural number n. When no confusion arises, we will use the symbol n instead of the formal natural number s^n(0). Note that, provided that n and m are two different informal natural numbers, surely the proposition Id(N, s^n(0), s^m(0)) cannot be proved within the system, as is clear from an easy metalinguistic investigation.
Nevertheless, the proposition ¬Id(N, s^n(0), s^m(0)) is not deducible either, unless the first universe (also called the set of small sets) is defined (actually, the boolean universe defined in [4] is enough). Once the above rules are given, one can define addition in the usual recursive way: let the value of e(x, y) be s(y); then the element R(b, a, e) is what is called a + b. Moreover, one can define a ≤ b as (∃c ∈ N)(a + c = b), where x = y is the proposition Id(N, x, y). Of course, the standard product and a limited subtraction, like all other recursive functions, can be defined in the standard way.
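For illustration only (a sketch with names of our choosing, not notation from the paper), this recursive definition of addition can be replayed in Coq, where nat_rec plays the role of the recursor R:

Definition add (a b : nat) : nat :=
  nat_rec (fun _ => nat) a (fun _ y => S y) b.

(* The equality rules (9) become computations:
   add a 0 reduces to a, and add a (S n) to S (add a n). *)

Definition le' (a b : nat) : Prop :=
  exists c : nat, add a c = b.

(* le' renders "a <= b iff there is c with a + c = b". *)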
Another example is the definition of N(k), the standard set with k elements.

N(k)-formation
    k ∈ N            k = k′ ∈ N
  ──────────       ───────────────
   N(k) set         N(k) = N(k′)        (10)

N(k)-introduction
  n ∈ N    n < k true          n = m ∈ N    n < k true
  ────────────────────       ──────────────────────────
       nk ∈ N(k)                   nk = mk ∈ N(k)        (11)

These rules introduce the k canonical elements of the set N(k), namely 0k, (s(0))k, . . . , (s^{k−1}(0))k, which, for the sake of brevity, we write 0k, 1k, . . . , (k − 1)k.

N(k)-elimination
  [z ∈ N(k)]             [n ∈ N, n < k true]
      ⋮                          ⋮
  C(z) set    c ∈ N(k)    cn ∈ C(nk)
  ──────────────────────────────────────
   Rk(c, c0, . . . , ck−1) ∈ C(c)        (12)

where Rk is the function that brings c to its canonical form, which will be a certain nk for some n < k, and hence picks the corresponding cn.

N(k)-equality
  [z ∈ N(k)]             [x ∈ N, x < k true]
      ⋮                          ⋮
  C(z) set    n ∈ N    n < k true    cx ∈ C(xk)
  ──────────────────────────────────────────────
   Rk(nk, c0, . . . , ck−1) = cn ∈ C(nk)        (13)
Note that, for n ∈ N, it is possible to prove by induction the proposition (n < k) → (n = 0) ∨ (n = 1) ∨ . . . ∨ (n = (k − 1)), provided that k ∈ N is fixed. Thus, for x ∈ N(k), it is possible to prove the proposition (x = 0k) ∨ (x = 1k) ∨ . . . ∨ (x = (k − 1)k). This implies that every quantification over N(k) can be replaced by a finite conjunction or disjunction. More precisely, a proposition of the form (∀x ∈ N(k))P(x) is equivalent to P(0k) & P(1k) & . . . & P((k − 1)k), while (∃x ∈ N(k))P(x) is the same as P(0k) ∨ . . . ∨ P((k − 1)k). Even if the axiom of choice is not deducible within the system mTT, nevertheless it holds with respect to the sets of the form N(k), in the sense of the following proposition.
Proposition 1. Let k ∈ N and S(x) set [x ∈ N(k)]; then the proposition

  (∀x ∈ N(k))(∃a ∈ S(x))P(x, a) → (∃f ∈ T)(∀x ∈ N(k))P(x, f(x))        (14)

is deducible, where T is (Π x ∈ N(k))S(x).

Proof. Let Q(x) be (∃a ∈ S(x))P(x, a); then (∀x ∈ N(k))Q(x) is equivalent to Q(0k) & . . . & Q((k − 1)k). Thus, we can replace (∀x ∈ N(k))Q(x) with the k assumptions (∃a ∈ S(nk))P(nk, a), n = 0, . . . , k − 1. By ∃-elimination k times, we can assume P(0k, a0), . . . , P((k − 1)k, ak−1), where each ai is an element of S(ik), i = 0, . . . , k − 1. By N(k)-elimination, we can construct a family R(x, a0, . . . , ak−1) ∈ S(x) and then a function f ∈ (Π x ∈ N(k))S(x), where f is λx.R(x, a0, . . . , ak−1), such that P(x, f(x)) holds for all x ∈ N(k). Thus the proposition (∃f ∈ (Π x ∈ N(k))S(x))(∀x ∈ N(k))P(x, f(x)) can be inferred from P(nk, an), n = 0, . . . , k − 1, and then, since it does not depend on any an, directly from (∃a ∈ S(x))P(x, a), x ∈ N(k).

A classical definition says that a set is finite if it is not infinite, where it is infinite if there exists a one-to-one correspondence between it and one of its proper subsets. An alternative way is to consider the sets of the form N(k) as prototypes of the finite sets and, hence, to call a set finite if it is in a bijective correspondence with N(k) for some k ∈ N. That is just the definition given by Brouwer in [3] and then by Troelstra and van Dalen in [10]. Of course, several other notions are possible (see Section 5; see also [11]). For example, following [3], we could say that a set is (numerically) bounded if it cannot have a subset of cardinality n, for some natural number n. Alternatively, following [10], we could say that a set is finitely indexed or finitely enumerable or listable if there exists a surjective function from some N(k) onto it. From a classical point of view, that is, in the framework of Zermelo-Fraenkel set theory with choice, the above definitions turn out to be all equivalent; the same does not happen in other foundations (see [11] for counterexamples in intuitionistic mathematics). So we have to make a choice; of course, we look for the simplest, most natural and effective one. What we do is to adopt the following (see "finitely indexed" in [10]). Provided that A set, B set and f ∈ A → B, we write f(A) = B for the proposition (∀b ∈ B)(∃a ∈ A)Id(B, b, Ap(f, a)); in other words, f(A) = B true is the judgement "f is surjective".

Definition 1 (finite set). Let S be a set; S is said to be finite if the proposition (∃k ∈ N)(∃f ∈ N(k) → S)(f(N(k)) = S), which we shortly denote by Fin(S), is true.

Proposition 2. If I is a finite set and (∃g ∈ I → S)(g(I) = S) is true, then S is finite.

Proof. The proof is quite obvious; however, we give a sketch of it in order to show that it can be carried out within mTT. By ∃-elimination (twice) on Fin(I), we can assume k ∈ N, f ∈ N(k) → I and f(N(k)) = I. Again by ∃-elimination, we can assume g ∈ I → S and g(I) = S. The function λx.g(f(x)) ∈ N(k) → S is surjective and thus Fin(S) is true, regardless of the particular k, f and g.
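As an illustration only (nothing below is part of the paper's formal system, and all Coq names are chosen here), Definition 1 and Proposition 2 can be transcribed in Coq, with the sig type {i : nat | i < k} standing in for N(k):

Definition Nk (k : nat) := {i : nat | i < k}.

(* Definition 1: a set is finite when some map from an Nk covers it. *)
Definition finite (S : Type) : Prop :=
  exists (k : nat) (f : Nk k -> S),
    forall s : S, exists i : Nk k, f i = s.

(* Proposition 2: the image of a finite set under a surjection is
   finite; the proof mirrors the two ∃-eliminations in the text. *)
Lemma finite_image (A B : Type) (g : A -> B) :
  (forall b : B, exists a : A, g a = b) ->
  finite A -> finite B.
Proof.
  intros gsur [k [f fsur]].
  exists k, (fun i => g (f i)).
  intros b.
  destruct (gsur b) as [a Ha].
  destruct (fsur a) as [i Hi].
  exists i.
  rewrite Hi, Ha.
  reflexivity.
Qed.

Destructing the Prop-level existentials is legitimate here because the goal finite B is itself a proposition, matching the restriction on ∃-elimination discussed in Section 2.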
It is also possible to give the notion of unary set, i.e. a set with at most one element. Trivially, every unary set is finite too.

Definition 2 (unary set). Let S be a set; we say that S is unary if the proposition (∃k ∈ N)(k ≤ s(0) & (∃f ∈ N(k) → S)(f(N(k)) = S)) is true.

Given a set I and a set-indexed family of sets S(i) set [i ∈ I], it is possible to construct their indexed sum (or disjoint union), written (Σ i ∈ I)S(i). Its canonical elements are couples of the kind < i, a > with i ∈ I and a ∈ S(i). The following lemma and the subsequent proposition say that finite sets have the expected behavior with respect to indexed sums.

Lemma 1. Let k ∈ N and n(x) ∈ N [x ∈ N(k)]; then (Σ x ∈ N(k))N(n(x)) is finite.

Proof. Let m = n(0k) + n(1k) + . . . + n((k − 1)k) ∈ N and consider the function f ∈ N(m) → (Σ x ∈ N(k))N(n(x)) defined by the following m conditions:

  0m              −→  < 0k , 0n(0k) >
  1m              −→  < 0k , 1n(0k) >
  . . .
  (n(0k) − 1)m    −→  < 0k , (n(0k) − 1)n(0k) >
  (n(0k))m        −→  < 1k , 0n(1k) >
  . . .
  (m − 1)m        −→  < (k − 1)k , (n((k − 1)k) − 1)n((k−1)k) >        (15)
The idea is simple: we perform k stages. Firstly, we enumerate the n(0k) elements of N(n(0k)), then the n(1k) elements of N(n(1k)), and so on till we reach the last element in N(n((k − 1)k)).

Proposition 3. Let A(i) set [i ∈ I] be a finite set-indexed family of finite sets, that is, let I and each of the A(i) be finite. Then (Σ i ∈ I)A(i) is finite.

Proof. By ∃-elimination on Fin(I), we can assume k ∈ N, f ∈ N(k) → I and f(N(k)) = I. Firstly, let Q(i) prop [i ∈ I] be an arbitrary propositional function over I. From f ∈ N(k) → I we can infer (∀i ∈ I)Q(i) → (∀x ∈ N(k))Q(f(x)) true. Also, from f(N(k)) = I we can infer (∀x ∈ N(k))Q(f(x)) → (∀i ∈ I)Q(i). Thus (∀i ∈ I)Q(i) is equivalent to (∀x ∈ N(k))Q(f(x)), provided that the assumptions at the very beginning of the proof hold. Now let Q(i) ≡ Fin(A(i)) ≡ (∃n ∈ N)(∃g ∈ N(n) → A(i))(g(N(n)) = A(i)). Thus (∀i ∈ I)Fin(A(i)) is equivalent to (∀x ∈ N(k))Fin(A(f(x))). Hence, by Proposition 1 applied twice, we can infer the existence of n ∈ N(k) → N and g ∈ (Π x ∈ N(k))(N(n(x)) → A(f(x))) such that g(x)(N(n(x))) = A(f(x)), that is, g(x) is surjective, for all x ∈ N(k).
Let h ∈ (Σ x ∈ N(k))N(n(x)) → (Σ i ∈ I)A(i) be the function defined by h ≡ λz. < f(p(z)), g(p(z))(q(z)) >, that is, h(< x, y >) = < f(x), g(x)(y) >. The function h is surjective and the thesis follows by the previous lemma and Proposition 2.

As a corollary, one gets that the cartesian product A × B of two finite sets is finite too. Besides the Σ operator, another common constructor for sets is the so-called dependent (or cartesian) product, written Π, which includes the set of functions between two sets as a special case. The canonical elements of (Π i ∈ I)S(i) are functions of the kind λx.f(x) with x ∈ I and f(x) ∈ S(x). The behavior of Π with respect to finiteness is described in the following lemma and proposition.

Lemma 2. Let k ∈ N and n(x) ∈ N [x ∈ N(k)]; then (Π x ∈ N(k))N(n(x)) is finite.

Proof. Let m = n(0k) · n(1k) · . . . · n((k − 1)k) ∈ N and consider the function f ∈ N(m) → (Π x ∈ N(k))N(n(x)) defined by the following m conditions (we suppress indexes):

  0            −→  ( 0 ↦ 0,         1 ↦ 0,         . . . ,  k−1 ↦ 0 )
  1            −→  ( 0 ↦ 0,         1 ↦ 0,         . . . ,  k−1 ↦ 1 )
  . . .
  m/n(0) − 1   −→  ( 0 ↦ 0,         1 ↦ n(1) − 1,  . . . ,  k−1 ↦ n(k−1) − 1 )
  m/n(0)       −→  ( 0 ↦ 1,         1 ↦ 0,         . . . ,  k−1 ↦ 0 )
  . . .
  m − 1        −→  ( 0 ↦ n(0) − 1,  1 ↦ n(1) − 1,  . . . ,  k−1 ↦ n(k−1) − 1 )        (16)

Proposition 4. Let k ∈ N and A(x) set [x ∈ N(k)] be a family of finite sets indexed by N(k). Then (Π x ∈ N(k))A(x) is finite.

Proof. As in the proof of the previous proposition, from (∀x ∈ N(k))Fin(A(x)), we can construct n ∈ N(k) → N and g ∈ (Π x ∈ N(k))(N(n(x)) → A(x)) such that g(x)(N(n(x))) = A(x), that is, g(x) is surjective, for all x ∈ N(k). Let h ∈ (Π x ∈ N(k))N(n(x)) → (Π x ∈ N(k))A(x) be the function defined by λz.(λx.g(x)(z(x))), that is, h(λx.f(x)) = λx.g(x)(f(x)). Note that it is surjective and apply the previous lemma and Proposition 2.

As a corollary, the set of functions N(k) → S is finite provided that k ∈ N and S is a finite set.
Note that we cannot generalize the previous proposition to the case of an arbitrary finite set I in place of N(k). That is so because proving finiteness of (Π i ∈ I)A(i) would need the construction of a partial inverse of the surjective function giving finiteness of I (think of the special case I → A). The point is that such a construction of a partial inverse cannot be performed if the axiom of choice is missing. Hence, finiteness of A → B does not follow from finiteness of A and B. Incidentally, note that if both the axiom of choice and the first universe (or, simply, the boolean universe of [4]) are adopted, then a finite set becomes exactly a set that can be put in a bijective correspondence with some N(k) (see Brouwer's definition of finite set in [3]). In fact, if S is finite, then there exists an onto map f : N(k) → S; thus, by choice, we can construct a partial inverse, say g, whose image is in a one-to-one correspondence with S. Now, since equality in N(k) is decidable (thanks to the existence of either the first or the boolean universe), we can count the elements of g(S); let n be this number. Then it is possible to construct a bijection between N(n) and S.
4 Finite Subsets
Before turning our attention to finite subsets, we have to introduce the notion of subset we are going to use. Following [9], a subset of a given set S is represented by a first order (that is, with variables ranging only over sets) propositional function with at most one free variable over that set. A propositional function over S is of the kind U(x) prop [x ∈ S]; thus U(x) is a proposition provided x ∈ S. We write U, or also {x ∈ S : U(x)}, when we think of it as a subset of S. We write U ⊆ S to express that U is a subset of S. The membership relation between an element a ∈ S and a subset U ⊆ S is written a εS U (or a ε U when no confusion arises) and is defined as the proposition Id(S, a, a) & U(a), where U(x) is a propositional function which represents U. Note that a ε U is a proposition provided that a ∈ S and U ⊆ S; that is,

  (a εS U) prop [a ∈ S, U(x) prop [x ∈ S]].        (17)

Hence, a ε U is not a judgement, but only a proposition; moreover, a ε U is true exactly when U(a) is true and a ∈ S. Thus, from (a ε U) true we can derive the judgement a ∈ S; note that the proposition Id(S, a, a) is introduced just to keep track of the element a, since U(a) could lose the information about it (see [9] for further explanations). Given two subsets, say U and V, i.e. two propositional functions over S, we say that U is included in V when the proposition (∀x ∈ S)(x ε U → x ε V), written U ⊆ V, is true. Of course, U = V is the proposition (U ⊆ V) & (V ⊆ U); hence, equality between subsets is extensional; in other words, a subset is a class of equivalent propositional functions. Important examples of subsets are: the empty subset, written ∅, which corresponds to the propositional function ⊥ prop [x ∈ S], where ⊥ is the false proposition; the total subset, denoted simply by S, which corresponds to the always true propositional function ⊤ prop [x ∈ S];
the singletons {a} for a ∈ S, i.e. the propositional functions x = a (or, better, Id(S, x, a) prop [x ∈ S]).²

Finally, operations on subsets are defined by reflecting the corresponding connectives (of intuitionistic logic). For example, provided that U and V are represented by U(x) and V(x) respectively, U ∩ V is represented by the propositional function U(x) & V(x); in other words, x ε U ∩ V if and only if (x ε U) & (x ε V). Infinitary operations are also available, such as the union of a set-indexed family of subsets (more details can be found in [9]). Note that an operation corresponding to implication is also definable; in particular, given a subset U represented by U(x), we denote by −U the subset represented by ¬U(x) ≡ U(x) → ⊥. We write PS for the collection of all subsets of the set S; it is surely not a set in the framework of mTT: assuming the power-set axiom breaks compatibility with predicative foundations, e.g. Martin-Löf's type theory (for more details on this, see [5]). To be precise, PS is an extensional collection, namely the quotient over logical equivalence of the collection of all propositional functions over S.

Subsets of a set S can be identified with images under functions with S as their codomain. In fact, let U(x) prop [x ∈ S]; then it is also U(x) set [x ∈ S]; thus we can construct (Σ x ∈ S)U(x) and the function λx.p(x) : (Σ x ∈ S)U(x) → S, where p is the first projection; that is, provided that a ∈ S and π ∈ U(a), we map each < a, π > to a. Thus an element a ∈ S is in the image of λx.p(x) if and only if there exists a proof π of U(a). Vice versa, provided that I set and f : I → S, the propositional function (∃i ∈ I)(x = f(i)) prop [x ∈ S] defines a subset of S which is exactly the image of I under f. Following the same pattern as for sets, we give the following.

Definition 3 (finite (unary) subset). A subset K of a set S is finite (unary) if it is the image of a function f : N(k) → S, for some k ∈ N (k = 0, 1 in the unary case); that is, K can be represented by the propositional function (in the free variable x): (∃i ∈ N(k))(x = f(i)). The collection of all finite (unary) subsets of S is denoted by Pω S (P1 S, respectively).

It follows directly from the definition that every unary subset is finite too; so we can think of P1 S as included in Pω S. Trivially, S (in the sense of the total subset) belongs to Pω S (P1 S) if and only if S is finite (unary) as a set. The above definition is just the same as in [9] and coincides with the notion of finitely indexed as given in [10].
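A Coq transcription may make the setup concrete (an illustrative sketch; subset, together with all the other names, is ours, and Prop plays the role of the type of propositions):

Definition subset (S : Type) := S -> Prop.

Definition incl  {S} (U V : subset S) : Prop := forall x : S, U x -> V x.
Definition union {S} (U V : subset S) : subset S := fun x => U x \/ V x.
Definition inter {S} (U V : subset S) : subset S := fun x => U x /\ V x.
Definition compl {S} (U : subset S) : subset S := fun x => ~ U x.

Definition empty {S : Type} : subset S := fun _ => False.
Definition total {S : Type} : subset S := fun _ => True.
Definition sing  {S : Type} (a : S) : subset S := fun x => x = a.

(* Definition 3: a finite subset is the image of a map from N(k),
   here rendered as {i : nat | i < k}. *)
Definition finite_sub {S : Type} (U : subset S) : Prop :=
  exists (k : nat) (f : {i : nat | i < k} -> S),
    forall x : S, U x <-> exists i, f i = x.

Equality of subsets is then pointwise logical equivalence of the underlying predicates, matching the extensional reading of U = V above.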
Until the end of this section, we will prove some basic properties about finite (and unary) subsets. First of all, we give a natural characterization in terms of finite sequences of elements, i.e. lists. So we need to introduce the set constructor List, which is defined by the following rules.

List-formation
      S set
  ─────────────
   List(S) set        (18)

² Note that, since the symbol S can represent both a set and the total subset of that set, U ⊆ S can denote both a judgement and a proposition. Which case occurs will be clear from the context.
List-introduction
                        l ∈ List(S)    a ∈ S
  nil ∈ List(S)       ───────────────────────
                       cons(l, a) ∈ List(S)        (19)

that is, lists are recursively constructed starting from the empty list nil and adding elements of S one at a time (cons can be thought of as the function that attaches an element at the end of a given list).

List-elimination
  [x ∈ List(S)]                  [x ∈ List(S), y ∈ S, z ∈ A(x)]
       ⋮                                      ⋮
  A(x) set    l ∈ List(S)    a ∈ A(nil)    f(x, y, z) ∈ A(cons(x, y))
  ─────────────────────────────────────────────────────────────────
                  LR(a, f, l) ∈ A(l)        (20)

that is, if we have an element in A(nil) and, every time we know an element in A(x), we can construct (by means of the function f) an element of A(cons(x, y)), then we are able to construct an element in A(l) for every list l. In other words, we have a function LR (for "list recursion") that for every list l returns a value in A(l) depending on the method f and the starting value a ∈ A(nil). An important consequence of this last rule is that we can use induction when proving a certain property about lists. Remember that every proposition is also a set (an element is just a proof, a verification); then the elimination rule yields:

  [x ∈ List(S)]              [x ∈ List(S), y ∈ S, P(x) true]
       ⋮                                  ⋮
  P(x) prop    l ∈ List(S)    P(nil) true    P(cons(x, y)) true
  ────────────────────────────────────────────────────────────
            P(l) true        (induction)        (21)

Finally, we have two equality rules, which we state without writing again all the hypotheses:

  LR(a, f, nil) = a        LR(a, f, cons(l, b)) = f(l, b, LR(a, f, l)).        (22)

These conditions can be read, of course, as a recursive definition of the function LR. In order to continue with our simultaneous treatment of finite and unary subsets, we need to define the set of sequences of length at most one, written List1(S). The rules for it are obtained as a slight modification of those for List(S) and hence we do not write them down in all detail. We give only the introduction rules, as an example:

                          a ∈ S
  nil ∈ List1(S)       ───────────────────────────
                        cons(nil, a) ∈ List1(S)        (23)

Even if not formally right, we can conceive of List1(S) as included in List(S) in order to avoid boring distinctions.
Sometimes we write [ ] instead of nil, [a] instead of cons(nil, a), [a, b] instead of cons(cons(nil, a), b), and so on. List(S) can be endowed with a binary operation, called concatenation and written ∗, which is recursively defined by the following clauses:

  l ∗ nil =def l
  l ∗ cons(m, a) =def cons(l ∗ m, a)        (24)

where l, m are in List(S) and a ∈ S. Finally, we would like to define a function dec (for "de-construct") from List(S) to PS; of course, formally we will have

  (dec(l)(x) prop [x ∈ S]) [l ∈ List(S)]        (25)

because we want dec(l) to be a subset of S (that is, a propositional function over S) for any l ∈ List(S). Let us define dec recursively as follows³:

  dec(nil)(x) ≡ ⊥(x)
  dec(cons(l, a))(x) ≡ dec(l)(x) ∨ Id(S, x, a)        (26)

³ We write ⊥(x) to emphasize that we look at ⊥ as a propositional function over S.

Proposition 5 (a characterization of finite (unary) subsets). Let S be a set and K ⊆ S. K is finite (unary) if and only if there exists l ∈ List(S) (l ∈ List1(S)) such that K is equal to dec(l) in PS.

Proof. Suppose K is finite (unary); then, by ∃-elimination, we can assume to know a number k (k = 0, 1) and a function f from N(k) to S, such that K = f(N(k)). Now consider the list lf = [f(0k), . . . , f((k − 1)k)] and remember that (∃i ∈ N(k))(x = f(i)) is equivalent to (x = f(0k)) ∨ (x = f(1k)) ∨ . . . ∨ (x = f((k − 1)k)). In other words, x ε f(N(k)) and x ε dec(lf) are equivalent. Vice versa, by ∃-elimination again, we can assume to have a list, say l = [a0, . . . , ak−1], whose length is k and such that dec(l) = K. Then we can define a surjection fl from N(k) to S by prescribing the k conditions: fl(nk) =def an, for n = 0, . . . , k − 1. Now it is easy to realize the equivalence between x ε fl(N(k)) and x ε dec(l).

The previous proposition can be stated informally by saying that Pω S = (dec(List(S)), ↔) and P1 S = (dec(List1(S)), ↔). In other words, Pω S and P1 S are set-indexed extensional families. It is also possible to define Pω S as a setoid, that is, a quotient set. Indeed, let ∼ be the relation over List(S) defined by l1 ∼ l2 if dec(l1) ↔ dec(l2). One sees at once that ∼ is an equivalence relation; hence we can consider the setoid (List(S), ∼). So, Pω S can be identified with (List(S), ∼); a similar argument holds for P1 S and (List1(S), ∼). The general idea is that a finite subset is obtained from a list by forgetting (that is, by abstracting from) the order and multiplicity with which items appear in it.

The fact that Pω S and P1 S are set-indexed families allows us to treat them almost like sets.
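The function dec also has a direct Coq analogue (a sketch with names of our choosing; note that Coq's cons extends a list at the head rather than at the tail, which is immaterial since dec forgets order anyway):

Require Import List Setoid.
Import ListNotations.

Fixpoint dec {S : Type} (l : list S) : S -> Prop :=
  match l with
  | []      => fun _ => False
  | a :: l' => fun x => dec l' x \/ x = a
  end.

(* Concatenation corresponds to union of the listed subsets, which is
   the content of Proposition 6(iii) below. *)
Lemma dec_app {S : Type} (l m : list S) (x : S) :
  dec (l ++ m) x <-> dec l x \/ dec m x.
Proof.
  induction l as [| a l IH]; simpl.
  - tauto.
  - rewrite IH. tauto.
Qed.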
First of all, we can quantify over them; in fact, every quantification intended over finite subsets of S can be given a constructive meaning by quantifying over the set List(S) and then using the function dec. In particular, an expression like "(∀K ∈ Pω S)(. . . K . . .)" is a shorthand for "(∀l ∈ List(S))(. . . dec(l) . . .)"; similarly for ∃. Of course, a proposition over Pω S, say P(K), has to be thought of as a proposition over List(S) of the kind Q(l) such that Q(l1) ↔ Q(l2) is true if dec(l1) ↔ dec(l2) is true. Moreover, it is possible to use Pω S to construct new setoids. For example, List(Pω S) can be defined as the setoid (List(List(S)), ≈), where {l0, . . . , ln−1} ≈ {k0, . . . , km−1} is Id(N, n, m) & (∀i ∈ N)((i < n) → (dec(li) ↔ dec(ki))). However, if one is interested in constructing new objects based on Pω S, then it is more convenient to define Pω S (and P1 S similarly) by adding to the rules of List(S) the following ones (and by modifying the elimination rule in order to take the new equality into account; see [7]):

exchange
     a ∈ S    b ∈ S    l ∈ Pω S
  ────────────────────────────────────────────────
  cons(cons(l, a), b) = cons(cons(l, b), a) ∈ Pω S        (27)

contraction
     a ∈ S    l ∈ Pω S
  ─────────────────────────────────────────
  cons(cons(l, a), a) = cons(l, a) ∈ Pω S        (28)

It is easy to show that these two rules are enough to force two canonical elements to be equal when they are formed by the same items, regardless of order and repetitions. Thus, for a and b in S, we can infer that [a, b, b, a, b] and [a, b] are equal elements of Pω S. Of course, this does not mean that the equality in Pω S is decidable; for example, we can infer [a] = [b] ∈ Pω S only if a = b ∈ S. In other words, the equality in Pω S is decidable if and only if that of S is. The usual way of dealing with finite subsets can be reconstructed by means of suitable definitions and derived rules. As an example, let us consider the notion of membership. The idea is that an element a ∈ S belongs to l ∈ Pω S if the assumption a ∈ S has been used in the construction of l. However, by exchange and contraction, one may assume that a is the last item in l. So one can put

  a ε l ≡ (∃m ∈ Pω S)(l = cons(m, a)).        (29)
If Pω is seen as a constructor, then it is possible to construct Pω Pω S and so on. In [12] a proof can be found of the fact that Pω S is finite, provided that S is finite. However, we prefer to keep our original definition and look at Pω S as an extensional set-indexed collection of subsets of S. The main reason for that is that we are thus allowed to apply to finite subsets all the operations of PS, even if Pω S is not closed under them. Proposition 6 (basic properties of finite and unary subsets). Let Pω S and P1 S be the collections of all finite and unary subsets of a set S; then: i) ∅ belongs to P1 S; ii) {a} belongs to P1 S for any a ∈ S; iii) K ∪ L ∈ Pω S, for all K and L belonging to Pω S.
Proof. For i) and ii) consider the lists nil and cons(nil, a), respectively. With regard to iii), take the concatenation of two lists corresponding to K and L and note that dec(l ∗ m) = dec(l) ∨ dec(m); in other words, the concatenation of two lists corresponds to the union of the corresponding finite subsets.

Proposition 7 (induction principle for finite subsets). Let P(K) be a predicate over Pω S such that: 1. P(∅) holds; 2. P(L) implies P(L ∪ {a}), for any a ∈ S and L in Pω S; then P(K) holds for every K in Pω S.

Proof. Note that the hypotheses 1 and 2 can be rewritten as P(dec(nil)) and (∀l ∈ List(S))(∀a ∈ S)(P(dec(l)) → P(dec(cons(l, a)))), while the thesis is (∀l ∈ List(S))P(dec(l)). Thus the statement is just a reformulation of the induction rule for lists with respect to the proposition Q(l) ≡ P(dec(l)).

Proposition 8. Let S be a set and K ⊆ S be finite (possibly unary). Then it is decidable whether K is empty or inhabited.

Proof. We prove (K = ∅) ∨ (∃a ∈ S)(a ε K) by induction on Pω S. If K = ∅, then we are done. Now suppose the statement is true for K and consider the subset K ∪ {a}, for a ∈ S; of course, a ε K ∪ {a} and the proof is complete.

Proposition 6 says that (Pω S, ∪, ∅) is the sup-semilattice generated by the singletons. In general, the intersection of two finite (unary) subsets cannot be proved to be finite (unary) too. This phenomenon corresponds to the fact that we cannot find the common elements between two given lists unless the equality relation in the underlying set S is decidable.⁴

Proposition 9. Let S be a set. The following are equivalent:

1. the equality in S is decidable;
2. {a} ∩ {b} is finite, for all a and b in S;
3. K ∩ L is finite for all finite K and L;
4. {a} ∪ −{a} = S for all a ∈ S;
5. K ∪ −K = S for all finite K.
Proof. 1 ⇒ 2. If a = b holds, then {a} ∩ {b} is equal to {a}, which is finite; instead, if a ≠ b, then a cannot belong to {b} and {a} ∩ {b} is empty, hence finite.

2 ⇒ 3. Assume K = {a0, . . . , an−1} and L = {b0, . . . , bm−1}; then K = ∪i<n {ai}, while L = ∪j<m {bj}. Thus K ∩ L = ∪i,j ({ai} ∩ {bj}), by distributivity of ∩ with respect to ∪; thus it is finite, since it is the union of finitely many (namely n · m) finite subsets.
⁴ Of course, the equality in S is decidable if the proposition (∀a ∈ S)(∀b ∈ S)(Id(S, a, b) ∨ ¬Id(S, a, b)) is true.
3 ⇒ 4. Let a, b ∈ S; then {a} and {b} are finite; thus {a} ∩ {b} is finite and it is decidable whether it is empty or inhabited. If the former holds, then b cannot belong to {a}, thus b ε −{a}. Instead, if the latter holds, then the proposition (∃c ∈ S)(c ε {a} ∩ {b}) yields a = b; so b ε {a}. Since b was arbitrary, {a} ∪ −{a} is the whole S.

4 ⇒ 5. If K = {a0, . . . , an−1} = ∪i<n {ai}, then −K = ∩i<n −{ai}. By distributivity of ∪ with respect to ∩, K ∪ −K can be seen as the intersection of n subsets of the kind K ∪ −{ai}. Each of them is a union of n + 1 subsets and contains {ai} ∪ −{ai}, which is S by hypothesis; hence K ∪ −K = S.

5 ⇒ 1. Let a and b be two arbitrary elements of S. Since {b} is finite, then {b} ∪ −{b} = S; thus a belongs to it. In other words, (a = b) ∨ (a ≠ b) holds.

Thus, provided that the equality of S is decidable, Pω S is closed under intersection. On the contrary, an arbitrary subset of a finite (unary) subset is not forced to be finite (unary) too, even in the case of a decidable equality. For let P be an arbitrary proposition and read P & Id(N(1), x, 01) as a propositional function over the finite set N(1). If the subset {x ∈ N(1) : P & (x = 01)} were finite, then we could decide whether it is empty or inhabited: in the first case ¬P should hold, otherwise P should; in other words, we could prove the law of excluded middle. Moreover, note that the above argument holds even if the existence of the first universe is assumed (thus the equality of N(1) is decidable). Thus we have proved the following.

Proposition 10. In the framework of mTT, the statement that every subset of a finite (sub)set is also finite is equivalent to the full law of the excluded middle.

We conclude the present section with a property about finite subsets which was used both in [2] and [12] to prove constructive versions of Tychonoff's theorem.

Proposition 11. Let S be a set and K, V, W subsets of S. If K ⊆ V ∪ W and K is finite (unary), then there exist V0 ⊆ V and W0 ⊆ W, both finite (unary), such that K = V0 ∪ W0.

The above proposition looks intuitively clear: take V0 and W0 to be K ∩ V and K ∩ W respectively. But formally we cannot follow this road, because a part of a finite subset is not finite in general (previous proposition).

Proof. Let us start from the unary case. We can effectively decide if K is the empty subset or a singleton. In the first case take V0 = ∅ = W0. Otherwise we have K = {a} for some a and, moreover, a ε V ∪ W; so either a ε V or a ε W. In the first case take V0 = {a} and W0 = ∅; W0 = {a} and V0 = ∅, otherwise.

In order to prove the statement in the finite case, we use induction. If K = ∅ we can take V0 = ∅ = W0. Now assume the theorem to be true for K and prove it for K ∪ {a}. So, let K ∪ {a} ⊆ V ∪ W. Then K ⊆ V ∪ W and by inductive hypothesis there exist V0 ⊆ V and W0 ⊆ W, both finite, satisfying K = V0 ∪ W0; hence K ∪ {a} = V0 ∪ W0 ∪ {a}. On the other hand, we know that a ε V ∪ W: so, if a ε V, then we can take V0 ∪ {a} and W0, while if
a ε W, then we take V0 and W0 ∪ {a} (if a belongs to both of them, then both choices are good).

A remark on the last part of the previous proof could be useful. Even if we proceed by cases starting from a ε V ∪ W, this does not mean we are assuming that we can decide whether a ε V or a ε W; what we are doing is just an application of the logical rule called "elimination of disjunction". Thus, the effective construction of V0 and W0 strongly depends on the degree of constructiveness of the hypothesis K ⊆ V ∪ W. The previous proposition, combined with the fact that we are always able to decide whether a finite subset is inhabited or not, yields the following corollary (see [12]).

Corollary 1. Let P(x) and Q(x) be two propositional functions over a set S and let K ⊆ S be a finite subset such that for every x ε K either P(x) or Q(x) holds. Then either P(x) holds for every x ε K or there exists some x ε K such that Q(x) holds.

In fact, from K ⊆ P ∪ Q we can infer, as in the previous proposition, K = P0 ∪ Q0, and then decide whether Q0 is empty or inhabited.
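The decision invoked here (Proposition 8) is easy to replay in Coq for subsets presented by lists (a sketch with names of our choosing; dec is the transcription given earlier, repeated to keep the fragment self-contained):

Require Import List.

Fixpoint dec {S : Type} (l : list S) : S -> Prop :=
  match l with
  | nil       => fun _ => False
  | cons a l' => fun x => dec l' x \/ x = a
  end.

(* A listed subset is empty or inhabited, decided by the list itself. *)
Lemma dec_empty_or_inhabited {S : Type} (l : list S) :
  (forall x : S, ~ dec l x) \/ (exists x : S, dec l x).
Proof.
  destruct l as [| a l'].
  - left. intros x H. exact H.
  - right. exists a. simpl. right. reflexivity.
Qed.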
5 Some Other Notions of Finiteness
The notion of finite (sub)set we have adopted throughout the present paper looks like the most natural one and, in fact, it is used by many authors (see [2] and [12]), including the present ones (see [1]). On the other hand, this notion lacks some desired properties, such as closure under intersection. Hence, one can look for other definitions which enjoy the desired properties. Here we give a brief list of possible alternative notions, each accompanied by a brief report about its properties and disadvantages. For each of the following notions about subsets, a corresponding definition for sets can be obtained by identifying each set with its total subset.

Definition 4 (sub-finite; see [10]). U ⊆ S is sub-finite if U ⊆ K is true for some K which is finite according to Definition 3.

The collection of all sub-finite subsets is closed both under (arbitrary) intersections and finite unions; on the other hand, it is not set-indexed, in general. Moreover, the computational content carried by a sub-finite subset is very poor; for instance, it is not possible to decide its emptiness.

Definition 5 (bounded; see [3]). U ⊆ S is bounded if ∃k ∈ N such that

  (∀f ∈ N(k) → S)( f(N(k)) ⊆ U → (∃i, j ∈ N(k))(i ≠ j & f(i) = f(j)) )        (30)

is true; that is, there cannot exist an injective map from N(k) into U (i.e. U has fewer than k elements).
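For comparison with the earlier sketches (again with names of our choosing, and {i : nat | i < k} for N(k)), Definition 5 can be transcribed in Coq as follows; the negative character commented on below is visible in the shape of the statement:

Definition bounded {S : Type} (U : S -> Prop) : Prop :=
  exists k : nat,
    forall f : {i : nat | i < k} -> S,
      (forall i, U (f i)) ->
      exists i j : {i : nat | i < k},
        proj1_sig i <> proj1_sig j /\ f i = f j.

(* Distinctness is stated on the underlying numbers (proj1_sig),
   i.e. on the indices of N(k), not on the proof components. *)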
Contrary to the case of finite subsets, which are always represented by propositional functions of the form (∃i ∈ N(k))(x = f(i)), for some k ∈ N and f ∈ N(k) → S, it appears quite difficult to characterize the propositional functions corresponding to bounded subsets. Also, answering the question whether the collection of all bounded subsets is set-indexed or not seems a hard task. This is surely due to the negative, indirect character of this definition.

Proposition 12. Let S be a set and U, V ⊆ S; then:

1. if U is finite, then U is bounded;
2. if U ⊆ V and V is bounded, then U is bounded;
3. if U is sub-finite, then U is bounded.

Proof. If U is finite, then there exists a number k such that U has at most k elements; so U has fewer than k + 1 elements and hence it is bounded. If V is bounded, then there exists a number k such that no f from N(k) to V can be injective. Consider an arbitrary function f from N(k) to U; of course, it can be seen as a map that takes its values in V, hence it cannot be injective, and U is bounded. If U is sub-finite, then there exists K ⊆ S such that K is finite and U ⊆ K. By item 1, K is bounded; by item 2, U is bounded.

Item 1 in the previous proposition says that every finite (sub)set is bounded. On the contrary, it cannot be formally proved that a bounded (sub)set is finite: classical logic seems necessary.

Finally, an interesting generalization of the notion of finite subset is the following one, which was proposed to us by Silvio Valentini.

Definition 6 (semi-finite). U ⊆ S is semi-finite if

  (x ε U) ↔ ∨j∈J (&i∈I(j) x = aji),        (31)

where aji ∈ S and both the set J and each I(j), j ∈ J, are of the form N(k).

Of course, the aji in the definition above have to be thought of as a map from (Σ j ∈ J)I(j) to S. The collection of all semi-finite subsets is closed under finite intersections and unions; moreover, provided that the equality in S is decidable, semi-finiteness collapses to finiteness. Note that a semi-finite subset can be seen, by distributivity, as the intersection of a certain finite family of finite subsets. In other words, as Pω S is the ∪-semi-lattice generated by singletons, so the collection of all semi-finite subsets is the lattice generated by them (with respect to intersection and union). Note also that semi-finite subsets form a family indexed by the set List(List(S)). However, it is no longer decidable if a semi-finite subset is empty or not; in other words, with respect to this definition Proposition 8 (and hence Proposition 11 and Corollary 1) fails.

Acknowledgments. The authors thank Maria Emilia Maietti for a lot of essential suggestions she gave them during long and dense discussions.
References

1. Ciraulo, F., Sambin, G.: Finitary Formal Topologies and Stone's Representation Theorem. Theoretical Computer Science (to appear)
2. Coquand, T.: An Intuitionistic Proof of Tychonoff Theorem. Journal of Symbolic Logic 57(1), 28–32 (1992)
3. van Dalen, D. (ed.): Brouwer's Cambridge Lectures on Intuitionism. Cambridge University Press, Cambridge (1981)
4. Maietti, M.E.: Quotients over Minimal Type Theory. In: Cooper, S.B., Löwe, B., Sorbi, A. (eds.) CiE 2007. LNCS, vol. 4497. Springer, Heidelberg (2007)
5. Maietti, M.E., Sambin, G.: Toward a Minimalist Foundation for Constructive Mathematics. In: Crosilla, L., Schuster, P. (eds.) From Sets and Types to Topology and Analysis: Towards Practicable Foundations for Constructive Mathematics. Oxford Logic Guides, vol. 48. Oxford University Press (2005)
6. Martin-Löf, P.: Intuitionistic Type Theory. Notes by G. Sambin of a series of lectures given in Padua, June 1980. Bibliopolis, Naples (1984)
7. Negri, S., Valentini, S.: Tychonoff's Theorem in the Framework of Formal Topologies. Journal of Symbolic Logic 62(4), 1315–1332 (1997)
8. Nordström, B., Petersson, K., Smith, J.: Programming in Martin-Löf's Type Theory. Clarendon Press, Oxford (1990)
9. Sambin, G., Valentini, S.: Building up a Toolbox for Martin-Löf Type Theory: Subset Theory. In: Sambin, G., Smith, J. (eds.) Twenty-Five Years of Constructive Type Theory. Proceedings of a Congress Held in Venice, October 1995, pp. 221–240. Oxford University Press, Oxford (1998)
10. Troelstra, A.S., van Dalen, D.: Constructivism in Mathematics: An Introduction, vol. 1. North-Holland, Amsterdam (1988)
11. Veldman, W.: Some Intuitionistic Variations on the Notion of a Finite Set of Natural Numbers. In: de Swart, H.C.M., Bergman, L.J.M. (eds.) Perspectives on Negation. Essays in honour of Johan J. de Iongh on his 80th birthday, pp. 177–202. Tilburg Univ. Press, Tilburg (1995)
12. Vickers, S.J.: Compactness in Locales and in Formal Topology. In: Banaschewski, B., Coquand, T., Sambin, G. (eds.) Papers presented at the 2nd Workshop on Formal Topology (2WFTop 2002), Venice, Italy, April 4–6, 2002. Annals of Pure and Applied Logic, vol. 137, pp. 413–438 (2006)
A Declarative Language for the Coq Proof Assistant

Pierre Corbineau⋆

Institute for Computing and Information Science, Radboud University Nijmegen, Postbus 9010, 6500 GL Nijmegen, The Netherlands
[email protected]

Abstract. This paper presents a new proof language for the Coq proof assistant. This language uses the declarative style. It aims at providing a simple, natural and robust alternative to the existing Ltac tactic language. We give the syntax of our language, an informal description of its commands and its operational semantics. We explain how this language can be used to implement formal proof sketches. Finally, we present some extra features we wish to implement in the future.
1 Introduction

1.1 Motivations
An interactive proof assistant can be described as a state machine that is guided by the user from the 'statement φ to be proved' state to the 'QED' state. The system ensures that the state transitions (also known as proof steps in this context) are sound. The user's guidance is required because automated theorem proving in any reasonable logic is undecidable in theory and difficult in practice. This guidance can be provided either through some kind of text input or using a graphical interface and a pointing device. In this paper, we will focus on the former method. The ML language developed for the LCF theorem prover [7] was a seminal work in this domain. The ML language was a full-blown programming language with specific functions called tactics to modify the proof state. The tool itself consisted of an interpreter for the ML language. Thus a formal proof was merely a computer program. With this in mind, think of a person reading somebody else's formal proof, or even one of his/her own proofs a couple of months after having written it. Similarly to what happens with source code, this person will have a lot of trouble understanding what is going on with the proof unless he/she has a very good memory or the proof is thoroughly documented. Of course, running the proof through the prover and looking at the output might help a little. This illustrates a major inconvenience which still affects many popular proof languages used nowadays: they lack readability. Most proofs written are actually
⋆ This work was partially funded by NWO Bricks/Focus Project 642.000.501.
write-only, or rather write- and execute-only, since what the user is interested in when re-running the proof is not really the input, but rather the output of the proof assistant, i.e. the sequence of proof states from the statement of the theorem to 'QED'. The idea behind declarative style proofs is to actually base the proof language on this sequence of proof states. This is indeed the very feature that makes the distinction between procedural proof languages (like ML tactics in LCF) and declarative proof languages. On the one hand, procedural languages emphasize proof methods (application of theorems, rewriting, proof by induction...), at the expense of a loss of precision on intermediate proof states: the intermediate states depend on the implementation of tactics instead of a formal semantics. On the other hand, declarative languages emphasize proof states but are less precise about the logical justification of the gap between one state and the next one.

1.2 Related Work
The first proof assistant to implement a declarative style proof language was the Mizar system, whose modern versions date back to the early 1990’s. The Mizar system is a batch proof assistant: it compiles whole files and writes error messages in the body of the input text, so it is not exactly interactive, but its proof language has been an inspiration for all later designs [15]. Another important source in this subject is Lamport’s How to write a proof [10] which takes the angle of the mathematician and provides a very simple system for proof notations, aimed at making proof verification as simple as possible. John Harrison has been the first to develop a declarative proof language for an interactive proof assistant: the HOL88 theorem prover [8]. Donald Syme has developed the DECLARE [12,13] batch proof assistant for higher-order logic with declarative proofs in mind from the start. He also describes CAPW (Computer Aided Proof Writing) as a means of overcoming the verbosity of declarative proofs. The first interactive proof assistant for which a declarative language has been widely adopted is Isabelle [11], with the Isar (Intelligible Semi-Automated Reasoning) language [14], designed by Markus Wenzel. Freek Wiedijk also developed a light declarative solution [16] for John Harrison’s own prover HOL Light [9]. For the Coq proof assistant [2], Mariusz Giero and Freek Wiedijk have built a set of tactics called the MMode [6] to provide an experimental mathematical mode which give a declarative flavor to Coq. Recently, Claudio Sacerdoti-Coen added a declarative language to the Matita proof assistant [1]. 1.3
1.3 A New Language for the Coq Proof Assistant
The Coq proof assistant is a Type Theory-based interactive proof assistant developed at INRIA. It has a strong user base both in the field of software and
hardware verification and in the field of formalized mathematics. It also has the reputation of being a tool whose procedural proof language, Ltac, has a very steep learning curve, at both the beginner and the advanced level. Coq has been evolving quite extensively during the last decade, and this evolution has made it necessary to regularly update existing proofs to maintain compatibility with the most recent versions of the tool. Coq also has a documentation generation tool to do hyper-linked rendering of files containing proofs, but most proofs are written in a style that makes them hard to understand (even with syntax highlighting) unless you can actually run them, as was said earlier for other procedural proof languages.

Building on previous experience from Mariusz Giero, we have built a stable mainstream declarative proof language for Coq. This language was built to have the following characteristics:

– readable. The designed language should use clear English words to make proof reading a feasible exercise.
– natural. We want the language to use a structure similar to the ones used in textbook mathematics (e.g. for case analysis), not a bare sequence of meaningless commands.
– maintainable. The new language should make it easy to upgrade the prover itself: behavior changes should only affect the proof locally.
– stand-alone. The proof script should contain enough explicit information to be able to retrace the proof path without running Coq.

The Mizar language has been an important source of inspiration in this work, but additional considerations had to be taken into account because the Calculus of Inductive Constructions (CIC) is much richer than Mizar’s (essentially) first-order Set Theory. One of the main issues is that of proof genericity: Coq proofs use a lot of inductive objects for lots of different applications (logical connectives, records, natural numbers, algebraic data-types, inductive relations...). Rather than enforcing the use of the most common inductive definitions, we want to be as generic as possible in the support we give for reasoning with these objects. Finally, we want to give extended support for proofs by induction by allowing multi-level induction proofs, using a very natural syntax to specify the different cases in the proof. The implementation was part of the official release 8.1 of Coq.
1.4 Outline
We first describe some core features of our language, such as forward and backward steps, justifications, and partial conclusions. Then we give a formal syntax and a quick reference of the commands of our language, as well as an operational semantics. We go on to explain how our language is indeed an implementation of the formal proof sketches concept [17], and we define the notion of well-formed proof. We finally give some perspectives for future work.
2 Informal Description

2.1 An Introductory Example
To give a sample of the declarative language, we provide here the proof of a simple lemma about Peano numbers: the double function is defined by double x = x + x, and the div2 function by:

  div2 0 = 0
  div2 1 = 0
  div2 (S (S x)) = S (div2 x)

The natural numbers are defined by means of an inductive type nat with two constructors 0 and S (successor function). The lemma states that div2 is the left inverse of double. We first give a proof of the lemma using the usual tactic language:

  Lemma double_div2: forall n, div2 (double n) = n.
  intro n.
  induction n.
  reflexivity.
  unfold double in *|-*.
  simpl.
  rewrite <- plus_n_Sm.
  rewrite IHn.
  reflexivity.
  Qed.

Now, we give the same proof using the new declarative language:

  Lemma double_div2: forall n, div2 (double n) = n.
  proof.
  let n:nat.
  per induction on n.
  suppose it is 0.
  reconsider thesis as (0=0).
  thus thesis.
  suppose it is (S m) and Hrec:thesis for m.
  have (div2 (double (S m)) = div2 (S (S (double m)))).
  ~= (S (div2 (double m))).
  thus ~= (S m) by Hrec.
  end induction.
  end proof.
  Qed.

The proof consists of a simple induction on the natural number n. The first case is done by conversion (computation) to 0 = 0 and the second case by computation
and by rewriting with the induction hypothesis. Of course, you could have guessed that by simply reading the declarative proof.
2.2 Forward and Backward Proofs
The notions of declarative and procedural proofs are often confused with the notions of forward and backward proof. We believe these two notions are mostly orthogonal: the distinction between declarative and procedural style is that declarative proofs mention the intermediate proof states explicitly, while procedural proofs explain what method is used to go to the next state without mentioning that state; the distinction between forward and backward proof is merely a matter of whether the proof is built bottom-up, by putting together smaller proofs, or top-down, by cutting a big proof obligation into smaller ones.

A possible reason for the confusion between declarative style and backward proofs is that most declarative languages rely on the core command

  have h : φ justification

which introduces a new hypothesis h that asserts φ, while using justification to check that it holds. This kind of command is the essence of forward proofs: it builds a new object (a proof of φ) from objects that already exist and are somehow mentioned in the justification. In order to show that backward steps can also be used in a declarative script, our language contains a command

  suffices H1 and ... and Hn to show G justification

which acts as the dual of the have construction: it allows the current statement to be proved to be replaced by sufficient conditions with stronger statements (as explained by the justification). For example, you can use this command to generalize your thesis before starting a proof by induction.
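As a small illustration of the two kinds of step (a sketch of our own; n and m are assumed to be natural numbers in the local context, and le_n_Sn and le_n_S are the Coq standard library lemmas ∀n, n ≤ S n and ∀n m, n ≤ m → S n ≤ S m):

  have H:(n <= S n) by le_n_Sn.
  suffices (n <= m) to show (S n <= S m) by le_n_S.

The have step moves forward, adding the new hypothesis H to the context; the suffices step moves backward, replacing the thesis S n <= S m by the sufficient condition n <= m.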
2.3 Justifications
When using a command to deduce a new statement from existing ones, or to prove that some statements suffice to prove a part of the conclusion, you need the proof assistant to fill in the gap. The justification is a hint telling the proof assistant which proof objects are to be used in the process, and how they shall be used. In our language, justifications are of the form

  by π1, ..., πn using t

The πk are proof objects, which can be hypothesis names (variables) but also more complex terms, like the application of a general result H : (∀x : nat, P x) to a specific object n : nat to get a proof (H n) : P n. The expression t is an optional tactic expression that will be used to prove the validity of the step.
The meaning of the πk objects is that only they and their dependencies (if H is specified and H has type P x, then x is also implicitly added) will be in the local context in which the new statement is to be proved. If they are omitted, then the new statement should be either a tautology or provable without any local hypothesis by means of the provided tactic. If the user types by *, all local hypotheses can be used. The use of by * should be transient, because it makes the proof more brittle: the justification depends too much on what happened before. If specified, the tactic t will then be applied to the modified context and, if necessary, the remaining subgoals will be treated with the assumption tactic, which looks for a hypothesis convertible with the conclusion. If using t is omitted, then a default automation tactic is used. This tactic is a combination of auto, congruence [4] and firstorder [3].

When writing a sequence of deduction steps, it often happens that a statement is only used in the next step. In that case the statement might be anonymous. By using the then keyword instead of have, this anonymous statement will be added to the list of proof objects to be used in that particular justification.

If a justification fails, the proof assistant issues a warning

  Warning: insufficient justification.

rather than an error. This allows the user to write proofs from the outside in, by filling the gaps, rather than linearly from start to end. In the CoqIDE interface, this warning is emphasized by coloring the corresponding command with an orange background. This way, a user reading a proof script will immediately identify where work still needs to be done.
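For instance (again a sketch under the same assumptions, with le_S the standard library lemma ∀n m, n ≤ m → n ≤ S m), a chained deduction could read:

  have (n <= S n) by le_n_Sn.
  then (n <= S (S n)) by le_S.

The anonymous statement n <= S n established by the first step is implicitly added, together with le_S, to the proof objects used in the justification of the second step.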
2.4 Partial Conclusion
A (partial) conclusion step is a deduction step whose statement is a structural subformula of the current thesis, and which can hence be counted as proven and removed from the thesis. For example, A is a partial conclusion for A ∧ B; it is also a partial conclusion for A ∨ B. In the latter case, choosing to prove A implies a choice in the proof we want to make: we prove A rather than B.

In our language, the user can type the command

  thus G justification

to provide a partial conclusion G proved using justification. If G is not proved, the usual warning is issued, but if G is not a sub-formula of the conclusion then an error occurs: the user is trying to prove the wrong formula.

More precisely, the notion of partial conclusion is a consequence of the encoding of logical connectives as inductive types in Coq. We look for partial conclusions in the possible sub-terms of proof terms based on inductive type constructors. Figure 1 gives the rules for the refinement of a conclusion G using a proof u of V, possibly assuming Ū as sufficient conditions. The result is a pair of a new conclusion G′ and a proof of G′ → G.

  Fig. 1. The partial conclusion mechanism: the rules (match, list, constructor, gather) compute, from a conclusion G and a proof u : Ū ⊢ V, a substitution σ and the refined conclusion G \ (u : Ū ⊢ V) = Σ x̄ : T̄, together with the coercion λy : Σ x̄ : T̄. match y with x̄ => σz end.

If Γ is a list of typed variables and σ a substitution, then Γσ is formed by applying σ to all elements of Γ in the following way:

– if σx ≠ x then x : T is discarded;
– if σx = x then x : Tσ is added to Γσ.

The notation Σ x̄ : T̄ stands for a (possibly dependent) tuple built using the standard binary type constructors (and, prod, ex, sig, sigT). Since a given problem may have several solutions, a simple search strategy has been adopted, and the constructor rule has been restricted to non-recursive types in order to keep the search space finite.

  thesis                               instruction                                        remaining thesis
  A ∧ B                                thus B                                             A
  (A ∧ B) ∧ C                          thus B                                             A ∧ C
  A ∨ B                                thus B                                             •
  (A ∧ B) ∨ (C ∧ D)                    thus C                                             D
  ∃x : nat, P x                        take (2 : nat)                                     P 2
  ∃x : nat, P x                        thus P 2                                           •
  ∃x : nat, ∃y : nat, (P y ∧ R x y)    thus P 2                                           ∃x : nat, R x 2
  A ∧ B                                suffices to have x : nat such that P x to show B   A ∧ ∃x : nat, P x

  Fig. 2. Examples of partial conclusions
In Fig. 2, we give various examples of uses of the partial conclusion mechanism. The special constant • stands for a conclusion where everything is proved. When using P 2 as a partial conclusion for ∃x : nat, P x, even though a placeholder for nat should remain, this placeholder is filled by 2 because of typing constraints. Please note that instantiating an existential quantifier with a specific witness is an instance of this operation.

In a former version of this language, we kept a so-called split thesis (i.e. several conclusions) instead of assembling them again. We decided to abandon this feature because it added much confusion without significantly extending the
functionalities. In the case of the suffices construction, the corresponding part of the conclusion is removed and replaced by the sufficient conditions. Using this mechanism helps the user build the proof piece by piece, using partial conclusions to simplify the thesis and thus keep track of his/her progress.
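As an illustration of partial conclusions on an existential thesis (a sketch of our own, reusing the double function of the introductory example, and relying on the fact that double 2 is convertible to 4):

  Lemma double_two: exists n:nat, double n = 4.
  proof.
  take (2:nat).
  reconsider thesis as (4=4).
  thus thesis.
  end proof.
  Qed.

After take (2:nat), the remaining thesis is double 2 = 4; reconsider replaces it by the convertible statement 4 = 4, which thus closes.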
3 Syntax and Semantics

3.1 Syntax
Figure 3 gives the complete formal syntax of the declarative language. The unbound non-terminals are id for identifiers, num for natural numbers, term and type for terms and types of the Calculus of Inductive Constructions; pattern refers to a pattern for matching against inductive objects. Patterns may be nested and contain the as keyword and the wildcard, but no disjunctive pattern is allowed.

  instruction ::= proof
    | assume statement [and statement]* [and (we have)-clause]?
    | (let,be)-clause
    | (given)-clause
    | (consider)-clause from term
    | [have|then|thus|hence] statement justification
    | thus? [~=|=~] [id:]? term justification
    | suffices [(to have)-clause|statement [and statement]*] to show statement justification
    | [claim|focus on] statement
    | take term
    | define id [var[,var]*]? as term
    | reconsider [id|thesis[[num]]?] as type
    | per [cases|induction] on term
    | per cases of type justification
    | suppose [id[,id]* and]? it is pattern [such that statement [and statement]* [and (we have)-clause]?]?
    | end [proof|claim|focus|cases|induction]
    | escape
    | return

  (α,β)-clause  ::= α var[,var]* [β such that statement [and statement]* [and (α,β)-clause]?]?
  statement     ::= [id:]? type | thesis | thesis for id
  var           ::= id[:type]?
  justification ::= [by [*|term[,term]*]]? [using tactic]?

  Fig. 3. Syntax for the declarative language
3.2 Commands Description
proof. ... end proof.
This is the outermost block of any declarative proof. If several subgoals existed when the proof command occurred, only the first one is proved in the declarative proof. If the proof is not complete when encountering end proof, then the proof is closed all the same, but with a warning, and Qed or Defined to save the proof will fail.

have h:φ justification.
then h:φ justification.
This command adds a new hypothesis h of type φ to the context. If the justification fails, a warning is issued but the hypothesis is still added to the context. The then variant adds the previous fact to the list of objects used in the justification.

thus h:φ justification.
hence h:φ justification.
These commands behave respectively like have and then, but the proof of φ is used as a partial conclusion. This can end the proof or remove part of the proof obligations. These commands fail if φ is not a sub-formula of the thesis.

claim h:φ. ... end claim.
This block contains a proof of φ, which will be named h after end claim. If the subproof is not complete when encountering end claim, then the subproof is still closed, but with a warning, and Qed or Defined to save the proof later will fail.

focus on φ. ... end focus.
This block is similar to the claim block, except that it leads to a partial conclusion. In a way, focus is to claim what thus is to have. This comes in handy when the thesis is a conjunction and one of the conjuncts is an implication or a universal quantification: the focus block allows local hypotheses to be used.

(thus) ~= t justification.
(thus) =~ t justification.
These commands can only be used if the last step was an equality l = r. t should be a term of the same type as l and r. If ~= is used, then the justification will be used to prove r = t and the new statement will be l = t. Otherwise, the justification will be used to prove t = l and the new statement will be t = r. When present, the thus keyword triggers a conclusion step.

                       simple    with previous step    opens sub-proof    iterated equality
  intermediate step    have      then                  claim              ~= / =~
  conclusive step      thus      hence                 focus on           thus ~= / thus =~

  Fig. 4. Synthetic classification of forward steps

suffices H:Φ to show Ψ justification.
This command allows a part of the thesis to be replaced by a sufficient condition, e.g. to strengthen it before
starting a proof by induction. In the thesis, Ψ is then replaced by Φ. The justification should prove Ψ using Φ.

assume G:Ψ ... and we have x such that H:Φ.
let x be such that H:Φ.
These commands are two different flavors for the introduction of hypotheses. They expect the thesis to be a product (implication or universal quantification) of the shape Πx1 : T1 ... Πxn : Tn. G. They expect the Ti to be convertible with the provided hypothesis statements. This command is well-formed only if the missing types can be inferred.

given x such that H:Φ.
consider x such that H:Φ from G.
given is similar to let, except that this command works up to elimination of tuples and dependent tuples, such as conjunctions and existential quantifiers. Here the thesis could be (∃x.Φ′) → Ψ with Φ′ convertible to Φ. The consider command takes an explicit object G to destruct instead of using an introduction rule.

define f (x : T) as t.
This command allows objects to be defined locally. If parameters are given, a function (λ-abstraction) is defined.

reconsider thesis as T.
reconsider H as T.
These commands allow the statement of the thesis or of a hypothesis to be replaced with a convertible one, and fail if the provided statement is not convertible.

take t.
This command allows a partial conclusion to be made using an explicit proof object. This is especially useful when proving an existential statement: it allows the existential witness to be specified.

per cases on t.
per cases of F justification.
suppose x:H. ... suppose x′:H′. ... end cases.
This introduces a proof per cases, on a disjunctive proof object t or on a proof of the statement F derived from justification. The per cases command must immediately be followed by a suppose command, which introduces the first case. Further suppose commands or end cases can be typed even if the previous case is not complete. In that case a warning is issued. If t occurs in the thesis, you should use suppose it is instead of suppose.

per [cases|induction] on t.
suppose it is patt and x:H. ... suppose it is patt′ and x′:H′. ... end [cases|induction].
This introduces a proof per dependent cases or by induction. When doing the proof, t is substituted with patt in the thesis. patt must be a pattern for a value of the same type as t. It may contain arbitrary sub-patterns and as statements to bind names to sub-patterns.
These name aliases are necessary to apply the induction hypothesis at multiple levels. If you are doing a proof by induction, you may use the thesis for construction in the suppose it is command to refer to an induction hypothesis. You may also write induction hypotheses explicitly.

escape. ... return.
This block allows one to escape the declarative mode back to the tactic mode. When encountering the return instruction, this subproof should be closed, or else a warning is issued.
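To illustrate case analysis (a sketch of our own, using the standard library boolean negation negb), a proof per dependent cases on a boolean could read:

  Lemma negb_invol: forall b:bool, negb (negb b) = b.
  proof.
  let b:bool.
  per cases on b.
  suppose it is true.
  thus thesis.
  suppose it is false.
  thus thesis.
  end cases.
  end proof.
  Qed.

In each branch, the thesis reduces by computation to a trivial equality, which the default automation closes.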
3.3 Operational Semantics
The purpose of this section is to give precise details about what happens to the proof state when you type a proof command. The proof state consists of a stack that contains open proofs and markers to count open sub-proofs; each subproof is either a judgement Γ ⊢ G, where Γ is a list of types (or propositions) indexed by names and G is a type (or proposition), or the symbol •, which stands for a closed subproof. A proof script S consists of a concatenation of instructions.

The rules are given as a big-step semantics: S ⇒T S′ means that, when proving theorem T, we reach state S′ when executing the script S. Thus any script allowing the empty stack [] to be reached is a complete proof of the theorem T. For clarity, we only give here the essential rules; the remaining rules can be found in Appendix A.

  If T = {Γ ⊢ G}, then
    proof. ⇒T (Γ ⊢ G); []

  If S ⇒T (Γ ⊢ •); [], then
    S end proof. ⇒T []

  If S ⇒T (Γ ⊢ G); S′, j ⊩ Γ ⊢ T and G ≠ •, then
    S have (x : T) j. ⇒T (Γ; x : T ⊢ G); S′

  If S ⇒T (Γ ⊢ G); S′, j ⊩ Γ ⊢ T and G ≠ •, then
    S thus (x : T) j. ⇒T (Γ; x : T ⊢ G \ (x : T)); S′

The expression j ⊩ Γ ⊢ T means that the justification j is sufficient to solve the problem Γ ⊢ T. If it is not, the command issues a warning. The ≡ relation is the conversion (βδιζ-equivalence) relation of the Calculus of Inductive Constructions (see [2]). We write L ⊴ R whenever the context L can be obtained by decomposing tuples in the context R. The \ operator is defined in Fig. 1, rule gather. We use the traditional λ notation for abstractions and Π for dependent products (either implication or universal quantification, depending on the context). The distinction between casesd and casesnd is used to prevent the mixing of suppose with suppose it is. For simplicity, Appendix A omits the coverage condition for case analysis as well as the semantics of escape and return.
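As a small worked trace of these rules (our own illustration), consider proving T = {⊢ A → A}:

  proof.                                        ⇒T (⊢ A → A); []
  proof. let x:A.                               ⇒T (x : A ⊢ A); []
  proof. let x:A. thus (h:A) by x.              ⇒T (x : A; h : A ⊢ •); []
  proof. let x:A. thus (h:A) by x. end proof.   ⇒T []

The thus step removes the whole thesis, since A \ (h : A) is the fully proved conclusion •, and end proof then empties the stack.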
4 Proof Editing

4.1 Well-Formedness
If we drop the justification (j ⊩ Γ ⊢ T) and completeness (G ≠ •) conditions in our formal semantics, we get a notion of well-formed proofs. These proofs, when run in Coq, are accepted with warnings but cannot be saved, since the proof tree contains gaps. This does not prevent the user from going further with the proof, since the user is still able to use the result of the previous step. The smallest well-formed proof is:

  proof.
  end proof.

Introduction steps such as assume have additional well-formedness requirements: the introduced hypotheses must match the available ones. The given construction allows a looser correspondence. The reconsider statements have to give a convertible type. For proofs by induction, well-formedness requires the patterns to be of the correct type, and induction hypotheses to be built from the correct sub-objects in the pattern.
4.2 Formal Proof Sketches
We claim that well-formed but incomplete proofs in our language play the role of formal proof sketches: they ensure that hypotheses correspond to the current statement and that objects referred to exist and have a correct type. When avoiding the by * construction, justifications are preserved when adding extra commands inside the proof. In this sense, our language supports incremental proof development. The only thing that the user might have trouble doing when turning a textbook proof into a proof sketch in our language is ensuring that first-order objects are introduced before a statement refers to them, since a textbook proof might not be topologically organized. The user will then be able to add new lines within blocks (mostly forward steps).
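For instance (a sketch of our own, with a hypothetical predicate P and a hypothetical lemma main_step), the following proof is well-formed but incomplete; the two insufficiently justified steps are flagged with warnings and can be refined later:

  Lemma sketch: forall n:nat, P n.
  proof.
  let n:nat.
  have H:(P 0).                 (* warning: insufficient justification *)
  thus (P n) by H, main_step.   (* warning: insufficient justification *)
  end proof.

The script can be executed, displayed and extended, even though Qed would fail at this point.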
5 Conclusion

5.1 Further Work
Arbitrary relation composition. The first extension that is needed for our language is support for iterated relations other than equality. This is possible as soon as a generalized transitivity lemma of the form ∀x y z, x R1 y → y R2 z → x R3 z is available.

Better automation. There is a need for more precise and powerful automation for the default justification method, to be able to give better predictions of when a deduction step will be accepted. A specific need would be an extension of equality reasoning to arbitrary equivalence relations (setoids, PERs...).
Multiple induction. The support for induction is already quite powerful (support for deep patterns with multiple as bindings), but more can be done if we start considering multiple induction. It might be feasible to detect the induction scheme used (double induction, lexicographic induction...) and to build the corresponding proof on-the-fly.

Translation of procedural proofs. The declarative language offers a stable format for the preservation of old proofs over time. Since many Coq proofs in procedural style already exist, it will be necessary to translate them to this new format. The translation can be done in two ways: by generating a declarative proof either from the proof tree, or from the proof term. The latter will be more fine-grained but might miss some aspects of the original procedural proof. The former looks more difficult to implement.
5.2 Conclusion
The new declarative language is now widely distributed, though not yet widely used, and we hope that this paper will help new users discover our language. As a beginning, Jean-Marc Notin (INRIA Futurs) has translated a part of the Coq Standard Library to the declarative language. The implementation is quite stable, and the automation, although not very predictable, offers a reasonable compromise between speed and power.

We really hope that this language will be a useful medium to make proof assistants more popular, especially in the mathematical community and among undergraduate students. We believe that our language provides a helpful implementation of the formal proof sketch concept; this means it could be a language of choice for turning textbook proofs into formal proofs. It could also become a tool of choice for education. In the context of collaborative proof repositories such as [5], our language, together with other declarative languages, will fill the gap between the narrow proof assistant community and the general public: we aim at presenting big formal proofs to the public.
References

1. Coen, C.S.: Automatic generation of declarative scripts. CHAT: Connecting Humans and Type Checkers (December 2006)
2. The Coq Development Team: The Coq Proof Assistant Reference Manual, Version 8.1 (February 2007)
3. Corbineau, P.: First-order reasoning in the calculus of inductive constructions. In: Berardi, S., Coppo, M., Damiani, F. (eds.) TYPES 2003. LNCS, vol. 3085, pp. 162–177. Springer, Heidelberg (2004)
4. Corbineau, P.: Deciding equality in the constructor theory. In: Altenkirch, T., McBride, C. (eds.) TYPES 2006. LNCS, vol. 4502, pp. 78–92. Springer, Heidelberg (2007)
5. Corbineau, P., Kaliszyk, C.: Cooperative repositories for formal proofs. In: Kauers, M., Kerber, M., Miner, R., Windsteiger, W. (eds.) MKM/CALCULEMUS 2007. LNCS (LNAI), vol. 4573, pp. 221–234. Springer, Heidelberg (2007)
6. Giero, M., Wiedijk, F.: MMode, a Mizar mode for the proof assistant Coq. Technical report, ICIS, Radboud Universiteit Nijmegen (2004)
7. Gordon, M., Milner, R., Wadsworth, C.: Edinburgh LCF. LNCS, vol. 78. Springer, Heidelberg (1979)
8. Harrison, J.: A Mizar mode for HOL. In: von Wright, J., Harrison, J., Grundy, J. (eds.) TPHOLs 1996. LNCS, vol. 1125, pp. 203–220. Springer, Heidelberg (1996)
9. Harrison, J.: The HOL Light manual, Version 2.20 (2006)
10. Lamport, L.: How to write a proof. American Mathematical Monthly 102(7), 600–608 (1995)
11. Paulson, L.: Isabelle. LNCS, vol. 828. Springer, Heidelberg (1994)
12. Syme, D.: DECLARE: A prototype declarative proof system for higher order logic. Technical report, University of Cambridge (1997)
13. Syme, D.: Three tactic theorem proving. In: Bertot, Y., Dowek, G., Hirschowitz, A., Paulin, C., Théry, L. (eds.) TPHOLs 1999. LNCS, vol. 1690, pp. 203–220. Springer, Heidelberg (1999)
14. Wenzel, M.: Isar - a generic interpretative approach to readable formal proof documents. In: Bertot, Y., Dowek, G., Hirschowitz, A., Paulin, C., Théry, L. (eds.) TPHOLs 1999. LNCS, vol. 1690, pp. 167–184. Springer, Heidelberg (1999)
15. Wenzel, M., Wiedijk, F.: A comparison of Mizar and Isar. Journal of Automated Reasoning 29(3-4), 389–411 (2002)
16. Wiedijk, F.: Mizar light for HOL Light. In: Boulton, R.J., Jackson, P.B. (eds.) TPHOLs 2001. LNCS, vol. 2152, pp. 378–394. Springer, Heidelberg (2001)
17. Wiedijk, F.: Formal proof sketches. In: Berardi, S., Coppo, M., Damiani, F. (eds.) TYPES 2003. LNCS, vol. 3085, pp. 378–393. Springer, Heidelberg (2004)
A Operational Semantics (Continued)

If S ⇒T (Γ; l : T′ ⊢ G); S′, j, l ⊩ Γ ⊢ T and G ≠ •, then
  S then (x : T) j. ⇒T (Γ; l : T′; x : T ⊢ G); S′

If S ⇒T (Γ; l : T′ ⊢ G); S′, j, l ⊩ Γ ⊢ T and G ≠ •, then
  S hence (x : T) j. ⇒T (Γ; l : T′; x : T ⊢ G \ (x : T)); S′

If S ⇒T (Γ ⊢ G); S′ with the last statement of Γ an equality l = r, j ⊩ Γ ⊢ r = u and G ≠ •, then
  S ~= u j. ⇒T (Γ; e : l = u ⊢ G); S′

If S ⇒T (Γ ⊢ G); S′ with the last statement of Γ an equality l = r, j ⊩ Γ ⊢ u = l and G ≠ •, then
  S =~ u j. ⇒T (Γ; e : u = r ⊢ G); S′

If S ⇒T (Γ ⊢ G); S′ with the last statement of Γ an equality l = r, j ⊩ Γ ⊢ r = u and G ≠ •, then
  S thus ~= u j. ⇒T (Γ; e : l = u ⊢ G \ (l = u)); S′

If S ⇒T (Γ ⊢ G); S′ with the last statement of Γ an equality l = r, j ⊩ Γ ⊢ u = l and G ≠ •, then
  S thus =~ u j. ⇒T (Γ; e : u = r ⊢ G \ (u = r)); S′

If S ⇒T (Γ ⊢ G); S′ and G ≠ •, then
  S claim (x : T). ⇒T (Γ ⊢ T); claim; (Γ; x : T ⊢ G); S′

If S ⇒T (Γ ⊢ G); S′ and G ≠ •, then
  S focus on (x : T). ⇒T (Γ ⊢ T); focus; (Γ; x : T ⊢ G \ (x : T)); S′

If S ⇒T (Γ′ ⊢ •); claim; (Γ ⊢ G); S′, then
  S end claim. ⇒T (Γ ⊢ G); S′

If S ⇒T (Γ′ ⊢ •); focus; (Γ ⊢ G); S′, then
  S end focus. ⇒T (Γ ⊢ G); S′

If S ⇒T (Γ ⊢ G); S′, Γ ⊢ t : T and G ≠ •, then
  S take t. ⇒T (Γ ⊢ G \ (t : T)); S′

If S ⇒T (Γ ⊢ G); S′, Γ; x1 : T1, ..., xn : Tn ⊢ t : T and G ≠ •, then
  S define f (x1 : T1) ... (xn : Tn) as t. ⇒T (Γ; f := λx1 : T1 ... λxn : Tn. t ⊢ G); S′

If S ⇒T (Γ ⊢ Πx1 : T1 ... Πxn : Tn. G); S′ and (T1 ... Tn) ≡ (T1′ ... Tn′), then
  S assume/let (x1 : T1′) ... (xn : Tn′). ⇒T (Γ; x1 : T1′; ...; xn : Tn′ ⊢ G); S′

If S ⇒T (Γ ⊢ Πx1 : T1 ... Πxn : Tn. G); S′ and (T1′ ... Tm′) ⊴ (T1 ... Tn), then
  S given (x1 : T1′) ... (xm : Tm′). ⇒T (Γ; x1 : T1′; ...; xm : Tm′ ⊢ G); S′

If S ⇒T (Γ ⊢ G); S′, Γ ⊢ t : T, (T1 ... Tn) ⊴ (T) and G ≠ •, then
  S consider (x1 : T1) ... (xn : Tn) from t. ⇒T (Γ; x1 : T1; ...; xn : Tn ⊢ G); S′

If S ⇒T (Γ; x : T ⊢ G); S′, T ≡ T′ and G ≠ •, then
  S reconsider x as T′. ⇒T (Γ; x : T′ ⊢ G); S′

If S ⇒T (Γ ⊢ T); S′ and T ≡ T′, then
  S reconsider thesis as T′. ⇒T (Γ ⊢ T′); S′

If S ⇒T (Γ ⊢ G); S′, j ⊩ Γ; x1 : T1; ...; xn : Tn ⊢ T and G ≠ •, then
  S suffices (x1 : T1) ... (xn : Tn) to show T j. ⇒T (Γ ⊢ G \ (T1; ...; Tn ⊢ T)); S′

If S ⇒T (Γ ⊢ G); S′ and j ⊩ Γ ⊢ t : T, then
  S per cases of T j. ⇒T casesnd(t : T); (Γ; x : T ⊢ G); S′

If S ⇒T (Γ ⊢ G); S′ and Γ ⊢ t : T, then
  S per cases on t. ⇒T cases(t : T); (Γ; x : T ⊢ G); S′

If S ⇒T cases(t : T); (Γ ⊢ G); S′, then
  S suppose (x1 : T1) ... (xn : Tn). ⇒T (Γ; x1 : T1; ...; xn : Tn ⊢ G); casesnd(t : T); (Γ ⊢ G); S′

If S ⇒T (Γ′ ⊢ •); casesnd(t : T); (Γ ⊢ G); S′, then
  S suppose (x1 : T1) ... (xn : Tn). ⇒T (Γ; x1 : T1; ...; xn : Tn ⊢ G); casesnd(t : T); (Γ ⊢ G); S′

If S ⇒T cases(t : T); (Γ ⊢ G); S′, then
  S suppose it is p and (x1 : T1) ... (xn : Tn). ⇒T (Γ; x1 : T1; ...; xn : Tn ⊢ G[p/t]); casesd(t : T); (Γ ⊢ G); S′

If S ⇒T (Γ′ ⊢ •); casesd(t : T); (Γ ⊢ G); S′, then
  S suppose it is p and (x1 : T1) ... (xn : Tn). ⇒T (Γ; x1 : T1; ...; xn : Tn ⊢ G[p/t]); casesd(t : T); (Γ ⊢ G); S′

If S ⇒T (Γ′ ⊢ •); casesd/nd(t : T); (Γ ⊢ G); S′, then
  S end cases. ⇒T (Γ ⊢ •); S′

If S ⇒T (Γ ⊢ G); S′ and Γ ⊢ t : T, then
  S per induction on t. ⇒T induction(t : T); (Γ; x : T ⊢ G); S′

If S ⇒T induction(t : T); (Γ ⊢ G); S′, then
  S suppose it is p and (x1 : T1) ... (xn : Tn). ⇒T (Γ; x1 : T1; ...; xn : Tn ⊢ G[p/t]); induction(t : T); (Γ ⊢ G); S′

If S ⇒T (Γ′ ⊢ •); induction(t : T); (Γ ⊢ G); S′, then
  S suppose it is p and (x1 : T1) ... (xn : Tn). ⇒T (Γ; x1 : T1; ...; xn : Tn ⊢ G[p/t]); induction(t : T); (Γ ⊢ G); S′

If S ⇒T (Γ′ ⊢ •); induction(t : T); (Γ ⊢ G); S′, then
  S end induction. ⇒T (Γ ⊢ •); S′
Characterising Strongly Normalising Intuitionistic Sequent Terms

J. Espírito Santo¹, S. Ghilezan², and J. Ivetić²

¹ Mathematics Department, University of Minho, Portugal
[email protected]
² Faculty of Engineering, University of Novi Sad, Serbia
[email protected], [email protected]
Abstract. This paper gives a characterisation, via intersection types, of the strongly normalising terms of an intuitionistic sequent calculus (where LJ easily embeds). The soundness of the typing system is reduced to that of a well-known typing system with intersection types for the ordinary λ-calculus. The completeness of the typing system is obtained from subject expansion at root position. This paper's sequent term calculus smoothly integrates the λ-terms with generalised application or explicit substitution. Strong normalisability of these terms as sequent terms characterises their typeability in certain "natural" typing systems with intersection types. The latter are in the natural deduction format, like systems previously studied by Matthes and Lengrand et al., except that they do not contain any extra, exceptional rules for typing generalised applications or substitution.
Introduction

The recent interest in the Curry-Howard correspondence for sequent calculus [9,2,5,8,6] made it clear that the computational content of sequent derivations and cut-elimination can be expressed through an extension of the λ-calculus, where the construction that interprets cut subsumes both explicit substitution and an enlarged concept of application, exhibiting the features of "multiarity" and "generality" [8]. The sequent calculus acts relative to such a calculus of sequent terms as a typing system, and the ensuing notion of typeability is sufficient, but not necessary, for strong normalisability. This situation is well known in the context of the ordinary λ-calculus, where simple-typeability is sufficient, but not necessary, for strong β-normalisability. A way of getting a characterisation of strongly normalising λ-terms is to extend the typing system with intersection types. For this reason, intersection type assignment systems were introduced into the λ-calculus in the late 1970s by Coppo and Dezani [3], Pottinger [15] and Sallé [18]. Intersection types completely characterise strong normalisation in the λ-calculus (see [1]).

In this paper we seek a characterisation of strongly normalising sequent terms via intersection types. We first introduce, following [6], an extension of the λ-calculus named λGtz (after Gentzen), corresponding to a sequent calculus for intuitionistic implicational logic, equipped with reduction rules for cut-elimination.
The typing system is from the beginning equipped with intersection types, following [4]. The correctness of the typing system is obtained by a reduction to the correctness of the system D [12]. The completeness of the typing system is obtained as a corollary to subject expansion at root position.

A recent topic of research is the use of intersection types for the characterisation of strong normalisability in extensions of the λ-calculus with generalised applications or explicit substitutions [14,13,11]. A common symptom of these works is the need to add to the typing system some extra, exceptional rules for typing generalised applications or substitutions. This somehow breaks the harmony observed in the ordinary λ-calculus between typeability induced by intersection types and strong β-normalisability. One may wonder whether, in the extended scenario with generalised applications or explicit substitutions, the blame for the slight mismatch is on some insufficiency of the intersection types technique, or on some insufficiency of the reduction relations, causing too many terms to be terminating. It turns out that, because of its expressive power, λGtz is a good tool to analyse this question. A simple analysis of our main characterisation result shows that strong normalisability as sequent terms (i.e. inside λGtz) of λ-terms with generalised applications or explicit substitutions characterises their typeability in certain "natural" typing systems with intersection types. The latter are in the natural deduction format, like systems previously studied in [14,13], except that they do not contain any extra, exceptional rules for typing generalised applications or substitution. So one is led to compare the behavior under reduction of λ-terms with generalised applications or explicit substitutions inside λGtz and inside their native systems ΛJ [10] or λx [17]. We conclude that the problem in ΛJ is that we cannot form explicit substitutions, and in λx that we cannot compose substitutions.

The paper is organized as follows. Section 1 presents the syntax of the untyped λGtz-calculus. Section 2 introduces an intersection type system λGtz∩. Strong normalisation is proved in Section 3, and the characterisation of strong normalisation is given in Section 4. In Section 5, the relation between the λGtz-calculus and calculi with generalised applications and explicit substitutions is discussed. Finally, Section 6 concludes the paper.
1 Syntax of λGtz
The abstract syntax of λGtz is given by:

  (Terms)    t, u, v ::= x | λx.t | tk
  (Contexts) k ::= x̂.t | u :: k

where x ranges over a denumerable set of term variables. Terms are either variables, abstractions or cuts tk. A context is either a selection or a context cons(tructor). Terms and contexts are together referred to as the expressions and will be ranged over by E. In λx.t and x̂.t, t is the scope of the binders λx and x̂, respectively. Free variables in the λGtz-calculus are those bound neither by the abstraction nor by the selection operator, and Barendregt's convention is applied in both cases. In order to avoid parentheses, we let the scope of binders extend to the right as much as possible.

According to the form of k, a cut may be an explicit substitution t(x̂.v) or a multiary generalised application t(u1 :: · · · :: um :: x̂.v) (m ≥ 1). In the last case, if m = 1, we get a generalised application t(u :: x̂.v); if v = x, we get a multiary application t[u1, · · · , um] (think of x̂.x as the empty list of arguments); a combination of the constraints m = 1 and v = x brings cuts to the form of an ordinary application.

The reduction rules of λGtz are as follows:

  (β) (λx.t)(u :: k) → u(x̂.tk)
  (π) (tk)k′ → t(k@k′)
  (σ) t(x̂.v) → v[x := t]
  (μ) x̂.xk → k, if x ∉ k

where t[x := u] (or k[x := u]) denotes meta-substitution, and k@k′ is defined by (u :: k)@k′ = u :: (k@k′) and (x̂.v)@k′ = x̂.vk′.

The rules β, π, and σ reduce cuts to the trivial form y(u1 :: · · · :: um :: x̂.v), for some m ≥ 1, which represents a sequence of left introductions. Rule β generates a substitution, and rule σ executes a substitution at the meta-level. Rule π generalises the permutative conversion of the λ-calculus with generalised applications. Rule μ has a structural character, and either performs a trivial substitution, in the reduction t(x̂.xk) → tk, or minimizes the use of the generality feature, in the reduction t(u1 :: · · · :: um :: x̂.xk) → t(u1 :: · · · :: um :: k).

The βπσ-normal forms of λGtz are:

  (Terms)    tnf, unf, vnf ::= x | λx.tnf | x(unf :: knf)
  (Contexts) knf ::= x̂.tnf | tnf :: knf

λGtz is a flexible system for representing logical derivations in the sequent calculus format and studying cut-elimination. The inference rules of LJ (axiom, right introduction, left introduction, and cut) are represented by the constructions x, λx.t, y(u :: x̂.v), and t(x̂.v), respectively. The βπσ-normal forms correspond to the multiary cut-free sequent terms of [19]. See [6] for more on λGtz.
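As a worked illustration of these rules (our own example; u is a variable, and x, y do not occur in u), the λGtz counterpart of the ordinary β-redex (λx.x)u reduces as follows:

  (λx.x)(u :: ŷ.y) →β u(x̂.x(ŷ.y)) →σ u(x̂.x) →σ u

The β-step turns the application into an explicit substitution, and the two σ-steps execute the substitutions at the meta-level.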
2 Intersection Types for λGtz
Definition 1. The set of types Types, ranged over by A, B, C, ..., A1, ..., is inductively defined as follows:

  A, B ::= p | A → B | A ∩ B

where p ranges over a denumerable set of type atoms.
Definition 2
(i) The pre-order ≤ over the set of types is the smallest relation that satisfies the following properties:

1. A ≤ A
2. A ∩ B ≤ A and A ∩ B ≤ B
3. (A → B) ∩ (A → C) ≤ A → (B ∩ C)
4. A ≤ B and B ≤ C implies A ≤ C
5. A ≤ B and A ≤ C implies A ≤ B ∩ C
6. A′ ≤ A and B ≤ B′ implies A → B ≤ A′ → B′
(ii) Two types are equivalent, A ∼ B, if and only if A ≤ B and B ≤ A.

In this paper, we consider types modulo the equivalence relation.

Remark 3. The equivalence (A → B) ∩ (A → C) ∼ A → (B ∩ C), or more generally ∩(∩Ak → Bi) ∼ ∩Ak → ∩Bi, follows from the given set of rules, and will be used in the sequel.

Definition 4
(i) A basic type assignment is a declaration of the form x : A, where x is a term variable and A is a type.
(ii) A basis Γ is a set of basic type assignments, where all term variables are different.
(iii) There are two kinds of type assignment:
– Γ ⊢ t : A for typing terms;
– Γ; B ⊢ k : A for typing contexts.

The following typing system for λGtz is named λGtz∩. In Ax, →L, and Cut, ∩Ai = A1 ∩ · · · ∩ An, for some n ≥ 1.

  (Ax)  ─────────────────────  (j ∈ {1, · · · , n})
        Γ, x : ∩Ai ⊢ x : Aj

        Γ, x : A ⊢ t : B
  (→R)  ─────────────────
        Γ ⊢ λx.t : A → B

        Γ ⊢ u : Ai, ∀i ∈ {1, · · · , n}    Γ; B ⊢ k : C
  (→L)  ───────────────────────────────────────────────
        Γ; ∩Ai → B ⊢ u :: k : C

        Γ ⊢ t : Ai, ∀i ∈ {1, · · · , n}    Γ; ∩Ai ⊢ k : B
  (Cut) ─────────────────────────────────────────────────
        Γ ⊢ tk : B

        Γ, x : A ⊢ v : B
  (Sel) ─────────────────
        Γ; A ⊢ x̂.v : B
By taking n = 1 in Ax, →L, and Cut, we get the typing rules of [6] for assigning simple types. Notice that in this typing system there are no separate rules for the right introduction of intersections. The management of intersection is built into the other rules.
Proposition 5 (Admissible rule (∩L))
(i) If Γ, x : Ai ⊢ t : B, for some i, then Γ, x : ∩Ai ⊢ t : B.
(ii) If Γ, x : Ai; C ⊢ k : B, for some i, then Γ, x : ∩Ai; C ⊢ k : B.

Proof. By mutual induction on the derivation.
Proposition 6 (Basis expansion)
(i) Γ ⊢ t : A ⇔ Γ, x : B ⊢ t : A and x ∉ Fv(t).
(ii) Γ; C ⊢ k : A ⇔ Γ, x : B; C ⊢ k : A and x ∉ Fv(k).

Definition 7
Γ1 ∩ Γ2 = {x : A | x : A ∈ Γ1 & x ∉ Γ2} ∪ {x : A | x : A ∈ Γ2 & x ∉ Γ1} ∪ {x : A ∩ B | x : A ∈ Γ1 & x : B ∈ Γ2}.

Proposition 8 (Bases intersection)
(i) Γ1 ⊢ t : A ⇒ Γ1 ∩ Γ2 ⊢ t : A.
(ii) Γ1; B ⊢ k : A ⇒ Γ1 ∩ Γ2; B ⊢ k : A.

Proposition 9 (Generation lemma, GL)
(i) Γ ⊢ x : A iff x : ∩Ai ∈ Γ and A ≡ Ai, for some i.
(ii) Γ ⊢ λx.t : A iff A ≡ B → C and Γ, x : B ⊢ t : C.
(iii) Γ; A ⊢ x̂.t : B iff Γ, x : A ⊢ t : B.
(iv) Γ ⊢ tk : A iff there is a type B ≡ ∩Bi such that Γ ⊢ t : Bi for all i, and Γ; ∩Bi ⊢ k : A.
(v) Γ; D ⊢ t :: k : C iff D ≡ ∩Ai → B, Γ ⊢ t : Ai for all i, and Γ; B ⊢ k : C.
Proof. The proof is straightforward since all rules are syntax-directed.
Lemma 10 (Substitution and append lemma)
(i) If Γ, x : ∩Ai ⊢ t : B and Γ ⊢ u : Ai, for each i, then Γ ⊢ t[x := u] : B.
(ii) If Γ, x : ∩Ai; C ⊢ k : B and Γ ⊢ u : Ai, for each i, then Γ; C ⊢ k[x := u] : B.
(iii) If Γ; B ⊢ k : Ci, ∀i, and Γ; ∩Ci ⊢ k′ : A, then Γ; B ⊢ k@k′ : A.

Proof. (i) and (ii) are proved by simultaneous induction on t and k; (iii) is proved by induction on k.
Theorem 11 (Subject Reduction). If Γ ⊢ t : A and t → t′, then Γ ⊢ t′ : A.

Proof. The proof employs the previous lemma. It is omitted for lack of space.
Example 12. In the λ-calculus, the term λx.xx has the type (A ∩ (A → B)) → B. The corresponding term in the λGtz-calculus is λx.x(x :: ŷ.y). Although it is a normal form, this term is not typeable in the simply typed λGtz-calculus. It is typeable in λGtz∩ in the following way:

  1. x : A ∩ (A → B), y : B ⊢ y : B              (Ax)
  2. x : A ∩ (A → B); B ⊢ ŷ.y : B                (Sel, 1)
  3. x : A ∩ (A → B) ⊢ x : A                     (Ax)
  4. x : A ∩ (A → B); A → B ⊢ x :: ŷ.y : B       (→L, 3, 2)
  5. x : A ∩ (A → B) ⊢ x : A → B                 (Ax)
  6. x : A ∩ (A → B) ⊢ x(x :: ŷ.y) : B           (Cut, 5, 4)
  7. ⊢ λx.x(x :: ŷ.y) : (A ∩ (A → B)) → B        (→R, 6)
3 Typeability ⇒ SN
In order to prove strong normalisation for the λGtz∩ system, we connect it with the well-known system D for the λ-calculus via an appropriate mapping, and then use the strong normalisation theorem for λ-terms typeable in system D. λ-terms are given by

  M, N, P ::= x | λx.M | M N

and equipped with

  (β)  (λx.M)N → M[x := N]
  (π1) (λx.M)N P → (λx.M P)N
  (π2) M((λx.P)N) → (λx.M P)N

without clash of free and bound variables (Barendregt's convention). We let π = π1 ∪ π2.

Proposition 13. If a λ-term M is β-SN, then M is βπ-SN.
Proof. This is Theorem 2 in [7].

The following typing system for λ is named D in [12].

  (Ax)  Γ, x : A ⊢ x : A

        Γ, x : A ⊢ M : B
  (→I)  ─────────────────
        Γ ⊢ λx.M : A → B

        Γ ⊢ M : A → B    Γ ⊢ N : A
  (→E)  ───────────────────────────
        Γ ⊢ M N : B

        Γ ⊢ M : A    Γ ⊢ M : B
  (∩I)  ───────────────────────
        Γ ⊢ M : A ∩ B

        Γ ⊢ M : A1 ∩ A2
  (∩E)  ────────────────
        Γ ⊢ M : Ai
Lemma 14. The following rules are admissible in D:

         Γ ⊢ M : A    Γ ⊆ Γ′             Γ ⊢ N : A    Γ, x : A ⊢ M : B
  (Weak) ────────────────────    (Subst) ─────────────────────────────
         Γ′ ⊢ M : A                      Γ ⊢ M[x := N] : B
Proposition 15 (SN). If a λ-term M is typeable in D, then M is β-SN.
Proof. A result from [16], [12].
We define a mapping F from λGtz to λ. The idea is as follows: if F(t) = M, F(ui) = Ni and F(v) = P, then t(u1 :: u2 :: x̂.v), say, is mapped to (λx.P)(M N1 N2). Formally, a mapping F : λGtz-Terms → λ-Terms is defined simultaneously with an auxiliary mapping F′ : λ-Terms × λGtz-Contexts → λ-Terms, as follows:

  F(x) = x
  F(λx.t) = λx.F(t)
  F(tk) = F′(F(t), k)
  F′(N, x̂.t) = (λx.F(t))N
  F′(N, u :: k) = F′(N F(u), k)

Proposition 16. If λGtz∩ proves Γ ⊢ t : A, then D proves Γ ⊢ F(t) : A.

Proof. The proposition is proved together with the claim: if λGtz∩ proves Γ; A ⊢ k : B and D proves Γ ⊢ N : A, then D proves Γ ⊢ F′(N, k) : B. The proof is by simultaneous induction on the derivations Π1 and Π2 of Γ ⊢ t : A and Γ; A ⊢ k : B, respectively, with cases according to the last typing rule used. The case (Ax) is obtained by the corresponding Ax in D together with ∩E. The case (→R) is easy, because D has the corresponding typing rule.

Case (Cut). Π1 has the shape

  Π11i : Γ ⊢ t : Ai, ∀i    Π12 : Γ; ∩Ai ⊢ k : B
  ────────────────────────────────────────────── (Cut)
  Γ ⊢ tk : B

By IH(Π11i), D proves Γ ⊢ F(t) : Ai. By repeated application of ∩I, D proves Γ ⊢ F(t) : ∩Ai. By IH(Π12), D proves Γ ⊢ F′(F(t), k) : B. This is what we want, since F′(F(t), k) = F(tk).

Case (Sel). Π2 has the shape

  Π21 : Γ, x : A ⊢ t : B
  ─────────────────────── (Sel)
  Γ; A ⊢ x̂.t : B

Suppose D proves Γ ⊢ N : A. Then in D one has
  1. Γ, x : A ⊢ F(t) : B            (IH)
  2. Γ ⊢ λx.F(t) : A → B            (→I, 1)
  3. Γ ⊢ (λx.F(t))N : B             (→E, 2, with Γ ⊢ N : A)

This is what we want, since F′(N, x̂.t) = (λx.F(t))N.

Case (→L). Π2 has the shape

  Π21i : Γ ⊢ u : Ai, ∀i    Π22 : Γ; B ⊢ k : C
  ──────────────────────────────────────────── (→L)
  Γ; ∩Ai → B ⊢ u :: k : C

Suppose D proves Γ ⊢ N : ∩Ai → B. By IH(Π21i), D proves Γ ⊢ F(u) : Ai, ∀i; therefore, by repeated application of ∩I, D proves Γ ⊢ F(u) : ∩Ai. Then in D, by →E, Γ ⊢ N F(u) : B. Hence, by IH(Π22), D proves Γ ⊢ F′(N F(u), k) : C. This is what we want, since F′(N F(u), k) = F′(N, u :: k).
Proposition 17. For all t ∈ λGtz, if F(t) is βπ-SN, then t is βπσμ-SN.

Proof. This is a consequence of the following properties of F: (i) if t →βπ u in λGtz, then F(t) →+π F(u) in λ; (ii) if t →σμ u in λGtz, then F(t) →β F(u) in λ.

Theorem 18 (Typeability ⇒ SN). If a λGtz-term t is typeable in λGtz∩, then t is βπσμ-SN.

Proof. Suppose t is typeable in λGtz∩. Then, by Proposition 16, F(t) is typeable in D. So, by Proposition 15, F(t) is β-SN. Hence, by Proposition 13, F(t) is βπ-SN. Finally, by Proposition 17, t is βπσμ-SN.
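To see property (i) of the proof of Proposition 17 at work, consider a β-step on a cut with a one-element context (a worked instance of our own):

  F((λx.t)(u :: ŷ.v)) = (λy.F(v))((λx.F(t)) F(u)) →π2 (λx.(λy.F(v)) F(t)) F(u) = F(u(x̂.t(ŷ.v)))

so a β-step of λGtz is simulated by a π-step in λ, which is exactly why Proposition 13 is needed.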
4 SN ⇒ Typeability

4.1 Typeability of Normal Forms
Proposition 19. The βπσ-normal forms of the λGtz-calculus are typeable in the λGtz∩ system. Hence so are the βπσμ-normal forms.

Proof. By simultaneous induction on the structure of βπσ-normal terms and contexts.

– Basic case: every variable is typeable.
– λx.tnf is typeable. By IH, tnf is typeable, so Γ ⊢ tnf : B. We examine two cases:
Case 1. If x : A ∈ Γ, then Γ = Γ′, x : A, and we can assign the following type to λx.tnf:

  Γ′, x : A ⊢ tnf : B
  ────────────────────── (→R)
  Γ′ ⊢ λx.tnf : A → B

Case 2. If x : A ∉ Γ, then by Proposition 6 we get Γ, x : A ⊢ tnf : B, thus concluding

  Γ, x : A ⊢ tnf : B
  ────────────────────── (→R)
  Γ ⊢ λx.tnf : A → B

– x̂.tnf is typeable. The proof is very similar to the previous one.
– tnf :: knf is typeable. By IH, tnf and knf are typeable, i.e. Γ1 ⊢ tnf : A and Γ2; B ⊢ knf : C. Then, by Proposition 8, we get Γ1 ∩ Γ2 ⊢ tnf : A and Γ1 ∩ Γ2; B ⊢ knf : C, so we assign the following type to tnf :: knf:

  Γ1 ∩ Γ2 ⊢ tnf : A    Γ1 ∩ Γ2; B ⊢ knf : C
  ────────────────────────────────────────── (→L)
  Γ1 ∩ Γ2; A → B ⊢ tnf :: knf : C

– x(tnf :: knf) is typeable. By IH and the previous case, the context tnf :: knf is typeable, i.e. Γ; A → B ⊢ tnf :: knf : C. We examine three cases:

Case 1. If x : A → B ∈ Γ, then:

  Γ ⊢ x : A → B  (Ax)    Γ; A → B ⊢ tnf :: knf : C
  ────────────────────────────────────────────────── (Cut)
  Γ ⊢ x(tnf :: knf) : C

Case 2. If x : D ∈ Γ, then Γ = Γ′, x : D, and we can expand the basis of x : A → B ⊢ x : A → B to Γ′, x : D ∩ (A → B) ⊢ x : A → B using Propositions 5 and 6. Also, by Proposition 5, we can write Γ′, x : D ∩ (A → B); A → B ⊢ tnf :: knf : C. Now, the corresponding type assignment is:

  Γ′, x : D ∩ (A → B) ⊢ x : A → B    Γ′, x : D ∩ (A → B); A → B ⊢ tnf :: knf : C
  ──────────────────────────────────────────────────────────────────────────────── (Cut)
  Γ′, x : D ∩ (A → B) ⊢ x(tnf :: knf) : C

Case 3. If x is not declared at all, by Proposition 6 we get Γ, x : A → B; A → B ⊢ tnf :: knf : C from Γ; A → B ⊢ tnf :: knf : C, and then conclude:

  Γ, x : A → B ⊢ x : A → B  (Ax)    Γ, x : A → B; A → B ⊢ tnf :: knf : C
  ──────────────────────────────────────────────────────────────────────── (Cut)
  Γ, x : A → B ⊢ x(tnf :: knf) : C
4.2 Subject Expansion at Root Position
Lemma 20. If Γ ⊢ u(x̂.tk) : A and x ∉ Fv(u) ∪ Fv(k), then Γ ⊢ (λx.t)(u :: k) : A.

Proof. Γ ⊢ u(x̂.tk) : A implies, by GL(iv), that there is a type B ≡ ∩Bi such that Γ ⊢ u : Bi, for all i, and Γ; ∩Bi ⊢ x̂.(tk) : A. Further, this implies, by GL(iii), that Γ, x : ∩Bi ⊢ tk : A, so there is a C ≡ ∩Cj such that Γ, x : ∩Bi ⊢ t : Cj for all j and Γ, x : ∩Bi; ∩Cj ⊢ k : A. By assumption, the variable x is not free in k, so using Proposition 6 we can write the latter sequent as Γ; ∩Cj ⊢ k : A. Now, because of the equivalence ∩(∩Bi → Cj) ∼ ∩Bi → ∩Cj, we have:

  1. Γ, x : ∩Bi ⊢ t : Cj, ∀j
  2. Γ ⊢ λx.t : ∩Bi → Cj, ∀j      (→R, 1)
  3. Γ ⊢ u : Bi, ∀i
  4. Γ; ∩Cj ⊢ k : A
  5. Γ; ∩Bi → ∩Cj ⊢ u :: k : A    (→L, 3, 4)
  6. Γ ⊢ (λx.t)(u :: k) : A       (Cut, 2, 5)
Lemma 21 (Inverse substitution lemma)
(i) Let Γ ⊢ v[x := t] : A, and let t be typeable. Then there is a basis Γ′ and a type B ≡ ∩Bi such that Γ′, x : ∩Bi ⊢ v : A and, for all i, Γ′ ⊢ t : Bi.
(ii) Let Γ; C ⊢ k[x := t] : A, and let t be typeable. Then there is a basis Γ′ and a type B ≡ ∩Bi such that Γ′, x : ∩Bi; C ⊢ k : A and, for all i, Γ′ ⊢ t : Bi.

Proof. By simultaneous induction on the structure of the term v and the context k.
Lemma 22 (Inverse append lemma). If Γ; B ⊢ k@k′ : A, then there is a type C ≡ ∩Ci such that Γ; B ⊢ k : Ci, ∀i, and Γ; ∩Ci ⊢ k′ : A.

Proof. By induction on the structure of k.

– Basic case: k ≡ x̂.v. In this case k@k′ = (x̂.v)@k′ = x̂.vk′. From Γ; B ⊢ x̂.vk′ : A, by GL(iii), we have that Γ, x : B ⊢ vk′ : A. Then, by GL(iv), there is a C ≡ ∩Ci such that Γ, x : B ⊢ v : Ci, ∀i, and Γ, x : B; ∩Ci ⊢ k′ : A. From the first sequent we get Γ; B ⊢ x̂.v : Ci, ∀i. From the second one, considering that x is not free in k′, we get Γ; ∩Ci ⊢ k′ : A.

– k ≡ u :: k″. In this case, k@k′ = (u :: k″)@k′ = u :: (k″@k′). From Γ; B ⊢ u :: (k″@k′) : A, by GL(v), B ≡ ∩Ci → D, Γ; D ⊢ k″@k′ : A and Γ ⊢ u : Ci, for all i. From the first sequent, by IH, we get some E ≡ ∩Ej such that Γ; D ⊢ k″ : Ej, ∀j, and Γ; ∩Ej ⊢ k′ : A. Finally, for each j,

  Γ ⊢ u : Ci, ∀i    Γ; D ⊢ k″ : Ej
  ───────────────────────────────── (→L)
  Γ; ∩Ci → D (≡ B) ⊢ u :: k″ : Ej

so the proof is complete.
Proposition 23 (Subject expansion at root position). If t → t′, where t is the contracted redex, and t′ is typeable in λGtz∩, then t is typeable in λGtz∩.

Proof. We examine four different cases, according to the applied reduction.

– (β): This follows directly from Lemma 20.

– (σ): We should show that typeability of t′ ≡ v[x := u] leads to typeability of t ≡ u(x̂.v). Assume that Γ ⊢ v[x := u] : A. By Lemma 21, there are a Γ′ and a B ≡ ∩Bi such that Γ′ ⊢ u : Bi, ∀i, and Γ′, x : ∩Bi ⊢ v : A. Now

  1. Γ′, x : ∩Bi ⊢ v : A
  2. Γ′; ∩Bi ⊢ x̂.v : A    (Sel, 1)
  3. Γ′ ⊢ u : Bi, ∀i
  4. Γ′ ⊢ u(x̂.v) : A      (Cut, 3, 2)

– (π): We should show that typeability of t(k@k′) implies typeability of (tk)k′. Γ ⊢ t(k@k′) : A, by GL(iv), yields that there is B ≡ ∩Bi such that Γ ⊢ t : Bi, ∀i, and Γ; ∩Bi ⊢ k@k′ : A. By applying Lemma 22 to the latter sequent, we get Γ; ∩Bi ⊢ k : Cj, ∀j, and Γ; ∩Cj ⊢ k′ : A, for some type C ≡ ∩Cj. Now, for each j,

  Γ ⊢ t : Bi, ∀i    Γ; ∩Bi ⊢ k : Cj
  ────────────────────────────────── (Cut)
  Γ ⊢ tk : Cj

So Γ ⊢ tk : Cj, ∀j. We obtain Γ ⊢ (tk)k′ : A with a further application of (Cut).

– (μ): It should be shown that typeability of k implies typeability of x̂.xk. Assume Γ; B ⊢ k : A. Since x ∉ k, we can suppose that x ∉ Γ, and by using Proposition 6 write Γ, x : B; B ⊢ k : A. Now

  1. Γ, x : B ⊢ x : B
  2. Γ, x : B; B ⊢ k : A
  3. Γ, x : B ⊢ xk : A    (Cut, 1, 2)
  4. Γ; B ⊢ x̂.xk : A      (Sel, 3)
Theorem 24 (SN ⇒ typeability). All strongly normalising (βσπ-SN) expressions are typeable in the λGtz∩ system.

Proof. The proof is by induction over the length of the longest reduction path out of a strongly normalising expression E, with a subinduction on the size of E. If E is a βσπ-normal form, then E is typeable by Proposition 19. If E is itself a redex, let E′ be the expression obtained by contracting the redex E. Then E′ is strongly normalising, and by IH it is typeable. Hence E is typeable, by Proposition 23. Next, suppose that E is neither a redex nor a normal form. Then E is of one of the following forms: λx.u, x(u :: k), u :: k, or x̂.u (in each case with
u or k not βπσ-normal). Each of the above u and k is typeable by IH, as they are subexpressions of E. It is then easy to build the typing of E, as in the proof of Proposition 19.
Corollary 25. A term is strongly normalising if and only if it is typeable in λGtz∩.

Proof. By Theorems 18 and 24.
5 Generalised Applications and Explicit Substitutions
We consider two extensions of the λ-calculus: the ΛJ-calculus, where application M(N, x.P) is generalised [10]; and the λx-calculus, where substitution M⟨x := N⟩ is explicit [17]. Intersection types have been used to characterise the strongly normalising terms of both the ΛJ-calculus [14] and the λx-calculus [13]. Both in [14] and in [13], the "natural" typing rules for generalised application or substitution had to be supplemented with extra rules (the rule app2 in [14]; the rules drop or K-Cut in [13]) in order to ensure that every strongly normalising term is typeable. Indeed, examples of terms are given whose reduction in ΛJ or λx always terminates, but which would not be typeable had the extra rules not been added to the typing system. The examples in ΛJ [14] and λx [13] are

  t0 := (λx.x(x, w.w))(λz.z(z, w.w), y.y′), with y ≠ y′,
  t1 := y′⟨y := xx⟩⟨x := λz.zz⟩,

respectively. Two questions are raised by these facts: first, why do the "natural" rules fail to capture the strongly normalising terms; second, how to characterise in terms of reduction the terms that receive a type under the "natural" typing rules. We now prove that λGtz and λGtz∩ are useful for giving an answer to these questions.

Definition 26. Let t be a λGtz-term.
1. t is a λJ-term if every cut occurring in t is of the form t(u :: x̂.v).
2. t is a λx-term if every cut occurring in t has one of the forms t(u :: x̂.x) or t(x̂.v).

We adopt the terminology "λJ-term" (instead of "ΛJ-term") for the sake of uniformity. We may write t(u, x.v) instead of t(u :: x̂.v). Let t(u) abbreviate t(u :: x̂.x), and let v⟨x := t⟩ denote t(x̂.v). An inductive characterisation is:

  (λJ-terms) t, u, v ::= x | λx.t | t(u, x.v)
  (λx-terms) t, u, v ::= x | λx.t | t(u) | v⟨x := t⟩
Definition 27
1. λJ∩ is the typing system consisting of the rules Ax, →R and the following rule, where ∩Ak = A1 ∩ · · · ∩ An and ∩Bi = B1 ∩ · · · ∩ Bm, for some n, m ≥ 1:

  Γ ⊢ t : ∩Ak → Bi, ∀i ∈ {1,· · ·, m}    Γ ⊢ u : Ak, ∀k ∈ {1,· · ·, n}    Γ, x : ∩Bi ⊢ v : C
  ─────────────────────────────────────────────────────────────────────────────────────────── (Gen.Elim)
  Γ ⊢ t(u, x.v) : C

2. λx∩ is the typing system consisting of the rules Ax, →R and the following rules, where ∩Ak = A1 ∩ · · · ∩ An, for some n ≥ 1:

  Γ ⊢ t : ∩Ak → B    Γ ⊢ u : Ak, ∀k ∈ {1,· · ·, n}
  ───────────────────────────────────────────────── (Elim)
  Γ ⊢ t(u) : B

  Γ ⊢ t : Ak, ∀k ∈ {1,· · ·, n}    Γ, x : ∩Ak ⊢ v : B
  ──────────────────────────────────────────────────── (Subst)
  Γ ⊢ v⟨x := t⟩ : B

If n = m = 1 in (Gen.Elim), we obtain the usual rule for assigning simple types to generalised applications. If n = 1 in (Elim) or (Subst), we obtain the usual rule for assigning simple types to applications or substitutions.

λJ∩ is a "natural" system for typing λJ-terms, in two senses. First, the rules in λJ∩ follow the natural deduction format. Notice that we retained in λJ∩ only the rules of λGtz∩ that act on the RHS formula of sequents, and replaced the other rules of λGtz∩ by an elimination rule. Second, λJ∩ has just one rule for typing generalised applications, contrary to [14]. Similarly, λx∩ is a "natural" system for typing λx-terms. Again, we retained in λx∩ only the rules of λGtz∩ that act on the RHS formula of sequents, and replaced the other rules of λGtz∩ by an elimination rule and a substitution rule. In addition, no extra cut or substitution rules are needed, contrary to [13].

The following is an addendum to GL.

Proposition 28. In λGtz∩ one has:
1. Γ ⊢ t(u, x.v) : C iff there are A1, . . . , An, B1, . . . , Bm such that Γ ⊢ t : ∩Ak → Bi, for all i; Γ ⊢ u : Ak, for all k; and Γ, x : ∩Bi ⊢ v : C.
2. Γ ⊢ t(u) : B iff there are A1, . . . , An such that Γ ⊢ t : ∩Ak → B and Γ ⊢ u : Ak, for all k.
3. Γ ⊢ v⟨x := t⟩ : B iff there are A1, . . . , An such that Γ ⊢ t : Ai, for all i, and Γ, x : ∩Ai ⊢ v : B.

Proof. We just sketch the proof of statement 1. The "only if" implication follows by successive application of GL. As to the "if" implication, let A1, . . . , An, B1, . . . , Bm be such that Γ ⊢ t : ∩Ak → Bi, ∀i, Γ ⊢ u : Ak, ∀k, and Γ, x : ∩Bi ⊢ v : C. Here we use ∩Ak → ∩Bi ∼ ∩(∩Ak → Bi). Recall that t(u :: x̂.v) is denoted by t(u, x.v).

  1. Γ, x : ∩Bi ⊢ v : C
  2. Γ; ∩Bi ⊢ x̂.v : C               (Sel, 1)
  3. Γ ⊢ u : Ak, ∀k
  4. Γ; ∩Ak → ∩Bi ⊢ u :: x̂.v : C    (→L, 3, 2)
  5. Γ ⊢ t : ∩Ak → Bi, ∀i
  6. Γ ⊢ t(u :: x̂.v) : C            (Cut, 5, 4)
Proposition 29
1. Let t be a λJ-term. λGtz∩ derives Γ ⊢ t : A iff λJ∩ derives Γ ⊢ t : A.
2. Let t be a λx-term. λGtz∩ derives Γ ⊢ t : A iff λx∩ derives Γ ⊢ t : A.

Proof. The "if" implications are proved by induction on Γ ⊢ t : A in λJ∩ or λx∩, using the fact that Gen.Elim, Elim, and Subst are derived rules of λGtz∩ (which is clear from the proof of Proposition 28). The "only if" implications are proved by induction on t, and rely on GL and its addendum (Proposition 28).
So we get a characterisation of typeability of t in the "natural" systems λJ∩ or λx∩ in terms of strong normalisability of t as a sequent term:

Corollary 30
1. Let t be a λJ-term. t is βπσμ-SN iff t is typeable in λJ∩.
2. Let t be a λx-term. t is βπσμ-SN iff t is typeable in λx∩.

In addition, the "natural" systems λJ∩ and λx∩ do capture the strongly normalising terms, the point being what we mean by "strongly normalising". Going back to the examples t0 and t1 from the beginning of this section: although t0 and t1 are strongly normalising in ΛJ and λx, respectively, they are not so in λGtz. Indeed, after one β-reduction step, t0 becomes (λz.z(z, w.w))(x̂.((x(x, w.w))(ŷ.y′))), which, by abbreviation, is y′⟨y := x(x)⟩⟨x := λz.z(z)⟩, that is, t1! After one σ-reduction step, t1 becomes the clearly non-terminating y′⟨y := (λz.z(z))(λz.z(z))⟩. So, in this sense, it is correct that the natural typing systems λJ∩ and λx∩ (as well as the typing systems of [14] and [13] without the extra rules app2, drop, and K-Cut) fail to give a type to t0 and t1, because these terms are, after all, non-terminating.

Why were these terms terminating in their native reduction systems? In ΛJ, t0 becomes y′ after one step of β-reduction, because the two substitutions of t1 cannot be formed and hence are immediately executed. In λx, the execution of the outer substitution in t1 is blocked, because λx has no composition of substitutions.
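Spelled out, the reduction just described is:

  t0 →β y′⟨y := x(x)⟩⟨x := λz.z(z)⟩ = t1 →σ y′⟨y := (λz.z(z))(λz.z(z))⟩

and (λz.z(z))(λz.z(z)) reduces to a term containing a copy of itself, hence is non-terminating, mirroring the usual Ω = (λz.zz)(λz.zz) of the λ-calculus.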
6
Conclusion
This paper gives a characterisation, via intersection types, of the strongly normalising intuitionistic sequent terms. This expands the range of application of the intersection types technique. One of the points of extending the Curry-Howard correspondence to sequent calculus is that such exercise will shed light on issues like reduction, strong normalisability, or typeability in the original systems in natural deduction format. In this paper this promise is fulfilled, because the characterisation of strong normalisability in the sequent calculus proves useful for analysing recent applications of intersection types in natural deduction system containing generalised applications or explicit substitutions. This analysis confirms that there is a delicate equilibrium between clean typing systems and expressive reduction systems.
Characterising Strongly Normalising Intuitionistic Sequent Terms
99
References 1. Amadio, R., Curien, P.-L.: Domains and Lambda-Calculi. Cambridge Tracts in Theoretical Computer Science, vol. 46. Cambridge University Press, Cambridge (1998) 2. Barendregt, H., Ghilezan, S.: Lambda terms for natural deduction, sequent calculus and cut elimination. J. Funct. Program. 10(1), 121–134 (2000) 3. Coppo, M., Dezani-Ciancaglini, M.: A new type-assignment for lambda terms. Archiv f¨ ur Mathematische Logik 19, 139–156 (1978) 4. Dougherty, D., Ghilezan, S., Lescanne, P.: Characterizing strong normalization in the Curien-Herbelin symmetric lambda calculus: extending the Coppo-Dezani heritage. Theoretical Computer Science (to appear, 2007) 5. Esp´ırito Santo, J.: Revisiting the correspondence between cut-elimination and normalisation. In: Welzl, E., Montanari, U., Rolim, J.D.P. (eds.) ICALP 2000. LNCS, vol. 1853, pp. 600–611. Springer, Heidelberg (2000) 6. Esp´ırito Santo, J.: Completing Herbelin’s programme. In: Della Rocca, S.R. (ed.) TLCA 2007. LNCS, vol. 4583, pp. 118–132. Springer, Heidelberg (2007) 7. Esp´ırito Santo, J.: Delayed substitutions. In: Baader, F. (ed.) RTA 2007. LNCS, vol. 4533, pp. 169–183. Springer, Heidelberg (2007) 8. Esp´ırito Santo, J., Pinto, L.: Permutative conversions in intuitionistic multiary sequent calculi with cuts. In: Hofmann, M.O. (ed.) TLCA 2003. LNCS, vol. 2701, pp. 286–300. Springer, Heidelberg (2003) 9. Herbelin, H.: A lambda calculus structure isomorphic to Gentzen-style sequent calculus structure. In: Pacholski, L., Tiuryn, J. (eds.) CSL 1994. LNCS, vol. 933, pp. 61–75. Springer, Heidelberg (1995) 10. Joachimski, F., Matthes, R.: Standardization and confluence for ΛJ. In: Bachmair, L. (ed.) RTA 2000. LNCS, vol. 1833, pp. 141–155. Springer, Heidelberg (2000) 11. Kikuchi, K.: Simple proofs of characterizing strong normalization for explicit substitution calculi. In: Baader, F. (ed.) RTA 2007. LNCS, vol. 4533, pp. 257–272. Springer, Heidelberg (2007) 12. Krivine, J.L.: Lambda-calcul, types et mod`eles, Masson, Paris (1990) 13. Lengrand, S., Lescanne, P., Dougherty, D., Dezani-Ciancaglini, M., van Bakel, S.: Intersection types for explicit substitutions. Inf. Comput. 189(1), 17–42 (2004) 14. Matthes, R.: Characterizing strongly normalizing terms of a λ-calculus with generalized applications via intersection types. In: Rolin, J., et al. (eds.) ICALP Workshops 2000, pp. 339–354. Carleton Scientific (2000) 15. Pottinger, G.: A type assignment for the strongly normalizable λ-terms. In: Seldin, J.P., Hindley, J.R. (eds.) To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, pp. 561–577. Academic Press, London (1980) 16. Ronchi, S., Rocca, D.: Principal type scheme and unification for intersection type discipline. Theor. Comput. Sci. 59, 181–209 (1988) 17. Rose, K.: Explicit substitutions: Tutorial & survey. Technical Report LS-96-3, BRICS (1996) 18. Sall´e, P.: Une extension de la th´eorie des types en lambda-calcul. In: Ausiello, G., B¨ ohm, C. (eds.) ICALP 1978. LNCS, vol. 62, pp. 398–410. Springer, Heidelberg (1978) 19. Schwichtenberg, H.: Termination of permutative conversions in intuitionistic Gentzen calculi. Theoretical Computer Science 212(1–2), 247–260 (1999)
Intuitionistic vs. Classical Tautologies, Quantitative Comparison Antoine Genitrini1 , Jakub Kozik2 , and Marek Zaionc2 PRiSM, CNRS UMR 8144, Universit´e de Versailles ´ Saint-Quentin en Yvelines, 45 av. des Etats-Unis, 78035 Versailles cedex, France
[email protected] 2 Theoretical Computer Science, Jagiellonian University, Gronostajowa 3, Krak´ ow, Poland [jkozik,zaionc]@tcs.uj.edu.pl 1
Abstract. We consider propositional formulas built on implication. The size of a formula is the number of occurrences of variables in it. We assume that two formulas which differ only in the naming of variables are identical. For every n ∈ N, there is a finite number of different formulas of size n. For every n we consider the proportion between the number of intuitionistic tautologies of size n compared with the number of classical tautologies of size n. We prove that the limit of that fraction is 1 when n tends to infinity1 .
1
Introduction
In the present paper we consider propositional formulas built on implication only. In particular we do not use logical constant ⊥. The size of a formula is the number of occurrences of variables in it. We assume that two formulas which differs only in the naming of variables are identical. For every n ∈ N, there is finite number of different formulas of size n, we denote that number by F (n). Consequently there is also finite number of classical tautologies and of intuitionistic tautologies of that size. These numbers are denoted by Cl(n) and Int(n) respectively. We are going to prove that: Int(n) = 1. lim n→∞ Cl(n) This work is a part of the research in which the asymptotic likelihood of truth is estimated. We refer to Gardy [4] for a survey on probability distribution on Boolean functions induced by random Boolean expressions. For the purely implicational logic of one variable the exact value of the density of truth was computed in the paper [11] of Moczurad, Tyszkiewicz and Zaionc. It is well known that under Curry-Howard isomorphism this result answered the question 1
Research described in this paper was partially supported by POLONIUM grant Quantitative research in logic and functional languages, cooperation between Jagiel´ lonian University of Krakow, L’ Ecole Normale Sup´erieure de Lyon and Universite de Versailles Saint-Quentin, contract number 7087/R07/R08.
M. Miculan, I. Scagnetto, and F. Honsell (Eds.): TYPES 2007, LNCS 4941, pp. 100–109, 2008. c Springer-Verlag Berlin Heidelberg 2008
Intuitionistic vs. Classical Tautologies, Quantitative Comparison
101
of finding the “density” of inhabited types in the set of all types. The classical logic of one variable and the two connectors – implication and negation – was studied in Zaionc [13]. Over the same language, the exact proportion between intuitionistic and classical logics has been determined in Kostrzycka and Zaionc [7]. Some variants involving formulas with other logical connectors have also been considered. The case of and/or connectors received much attention – see Lefmann and Savick´ y [8], Chauvin, Flajolet, Gardy and Gittenberger [1] and Gardy and Woods [5]. Matecki [9] considered the case of the equivalence connector. In the latest paper [6] of Fournier, Gardy, Genitrini and Zaionc, the proportion between intuitionistic and classical logics when the overall number of variables is finite was studied. In this paper, the methods (and moreover the partition of formulas into several classes) developed in [6] are used in the different case, when the number of variables is arbitrary and not fixed. But formally this paper and [6] are incomparable in the sense that each one is not an extension of the other.
2
Basic Facts
2.1
Catalan Numbers
The n-th Catalan number is the number of binary trees with n internal nodes or, equivalently, n + 1 leaves. For our exposition it will be convenient to focus on leaves, therefore we denote by C(n) the (n−1)-th Catalan number. Its (ordinary) generating function is √ 1 − 1 − 4z n . C(n)z = c(z) = 2 n∈N
That function fulfills the following property c(z) = c(z)c(z) + z.
(1)
The radius of convergence of c(z) is and limz→R 41 c(z) = . We have also the following property 1 C(n − 1) lim = . n→∞ C(n) 4 1 4
2.2
1 2
Algebraic Asymptotics
Lemma 1. Let f, g ∈ Z[[z]] be two algebraic generating functions, having (as complex analytic functions) unique dominating singularities in ρ ∈ R+ . Suppose that these functions have Puiseux expansions around ρ of the form 1
1
f (z) = cf + df (z − ρ) 2 + o((z − ρ) 2 ) 1
1
g(z) = cg + dg (z − ρ) 2 + o((z − ρ) 2 ). Then
[z n ]f (z) f (z) = lim . n→∞ [z n ]g(z) z→R ρ− g (z) lim
102
A. Genitrini, J. Kozik, and M. Zaionc
By the singularity analysis for algebraic generating functions (see e.g. Theorem 8.12 from [2]) we obtain that: df [z n ]f (z) = . n→∞ [z n ]g(z) dg lim
(z) On the other hand it can be easily calculated that limz→R ρ− fg (z) = analogous argument can be derived from the Szeg¨ o Lemma (see [12]).
2.3
df dg .
The
Bell Numbers
The n-th Bell number, denoted by B(n), is the number of equivalence relations which can be defined on some fixed set of size n. We use the following property, which can be derived from the asymptotic formula for Bell numbers by Moser and Wyman ([10], see also [3]). e log(n) B(n − 1) ∼ . B(n) n 2.4
Formulas
Implicational formulas can be represented by binary trees, suitably labeled: their internal nodes are labeled by the connector → and their leaves by some variables. By φ we mean the size of formula φ which we define to be the total number of occurrences of propositional variables in the formula (or leaves in the tree representation of the formula). Parentheses (which are sometimes necessary) and the implication sign itself are not included in the size of expressions. Formally, xi = 1 and φ → ψ = φ + ψ . We denote by F (n) the number of implicational formulas of size n. Let T be a formula (tree). It can be decomposed with respect to its right branch. Hence it is of the form A1 → (A2 → (. . . → (Ap → r(T ))) . . .) where r(T ) is a variable. We shall write it as T = A1 , . . . , Ap → r(T ). The formulas Ai are called the premises of T and r(T ), the rightmost leaf of the tree, is called the goal of T . 2.5
Counting up to Names
It is easy to observe, that the number of different formulas of size n is F (n) = B(n)C(n). C(n) corresponds to the shapes of formulas represented by trees, and B(n) to all possible distributions of variables in that shape.
Intuitionistic vs. Classical Tautologies, Quantitative Comparison
3
103
Simple Tautologies
We follow notation from [11], [14] and [6]. We are going to prove a theorem analogous to the main theorem of [6]. Definition 1. G is the set of simple tautologies i.e. expressions that can be written as T = A1 , . . . , Ap → r(T ), where there exists i such that Ai is a variable equal to r(T ). We let (G(n))n∈N be the sequence of the numbers of simple tautologies of size n. It is easy to prove that simple tautologies are indeed intuitionistic tautologies. The asymptotic equivalence of classical and intuitionistic logic is a direct consequence of the following theorem. Theorem 1. Asymptotically, all the classical tautologies are simple. We are going to prove the theorem in three steps. First, we estimate the number of simple tautologies. Then in two steps we show that the number of remaining tautologies is asymptotically negligible. Lemma 2. The fraction of simple tautologies among all formulas of size n is . asymptotically equal to e log(n) n Proof. First, we enumerate all the shapes of trees which can be labelled to be simple tautologies. The set of such trees will be denoted by GT . A tree belongs to GT if and only if it has at least one premise which is a leaf. Let GT (n, l) denote the number of trees of size n, whose l premises are leaves. We define bi variate generating function gt(x, z) = n,l∈N\{0} GT (n, l)z n xl . We use standard unlabeled constructions (see [3]) to obtain the explicit expression for gt(x, z). Clearly, for every tree t from GT there is a premise which is a leaf. The last such premise decomposes uniquely the sequence of premises of t into two sequences. The first consists of arbitrary trees, while the second – of trees which are not leaves. Note that c2 (z) is the generating function for trees which are not leaves. Corresponding constructions on generating function yields: gt(x, z) =
1 1 · xz · · z. 1 − c2 (z) − xz 1 − c2 (z)
(2)
1 In the expression above the term 1−c2 (z)−xz corresponds to the sequence of 2 trees which are either leaves (xz) or not (c (z)). The second term xz corresponds to the last premise that is a leaf. The third term 1−c12 (z) corresponds to the remaining sequence of premises which are not leaves. The last occurrence of z corresponds to the goal. Let us fix l ∈ N \ {0}. Let Gl (n) denote the number of simple tautologies of size n in which l premises are leaves. ¿From the inclusion-exclusion principle we obtain
GT (n, l) · l · B(n − 1) > Gl (n) > GT (n, l) · l · B(n − 1) − GT (n, l)
l(l − 1) B(n − 2). 2
104
A. Genitrini, J. Kozik, and M. Zaionc
The first inequality comes from the fact that in every simple tautology there is at least one premise which is equal to goal. Hence GT (n, l) corresponds to the shape of the tree, l corresponds to the possibilities of choice of the premise, B(n − 1) corresponds to the all possible labeling of variables (n − 1 since one premise is chosen to be equal to the goal). Of course, the formulas in which many premises are equal to the goal are counted more than once. The second inequality comes from subtracting all the formulas in which at least two premises are equal to the goal (again some formulas are subtracted many times). We have
GT (n, l) · l · B(n − 1) >
l∈N\{0}
Gl (n)
l∈N\{0}
l(l − 1) B(n − 2) GT (n, l) · l · B(n − 1) − GT (n, l) 2
Gl (n) >
l∈N\{0}
l∈N\{0}
therefore for every n ∈ N l∈N\{0} GT (n, l) · l · B(n − 1) F (n)
>
l∈N\{0}
Gl (n)
F (n)
(3)
and l∈N\{0}
Gl (n)
F (n)
>
l∈N\{0} (GT (n, l)
· l · B(n − 1) − GT (n, l) l(l−1) 2 B(n − 2)) F (n)
(4) We are going to find a succinct formula for the generating function gt(z) = n z l · GT (n, l). Taking the derivative of the function gt(x, z) with n∈N l∈N\{0} respect to x we obtain the generating function of the sequence GT (n, l) · l · xl−1 z n . gtx (x, z) = n,l∈N\{0}
It remains to substitute 1 for x to obtain the sought generating function. We can write that function explicitly by applying those operations to the explicit formula for gt(x, z). We obtain: gt(z) =
z 1 − c(z)
2 = c(z)2 ,
the last equality results from (1). We encourage the reader to find a direct interpretation in terms of trees of the obtained expression for gt(z). By differentiating gt(x, z) twice with respect to x, substituting 1 for x and multiplying by 12 , we analogously obtain the generating function: gt(z) =
n∈N\{0}
zn
l∈N\{0}
GT (n, l)
l(l − 1) 2
Intuitionistic vs. Classical Tautologies, Quantitative Comparison
105
Hence gt(z) = gtx (1, z) = c(z)(c(z) − z). Both generating functions gt(z) and gt(z) are algebraic generating functions with unique dominating singularities in 14 . Hence by the Lemma 1:
[z n ]gt(z) gt (z) = lim =1 1 − c (z) n→∞ [z n ]c(z) z→R 4 lim
and
3 [z n ]gt(z) gt (z) = lim = lim 1 − c (z) n→∞ [z n ]c(z) 4 z→R 4 Therefore l∈N
GT (n, l) · l · B(n − 1) ([z n ]gt(z)) B(n − 1) e log(n) = ∼ C(n)B(n) C(n) B(n) n
and l∈N
GT (n, l) l(l−1) ([z n ]gt(z)) B(n − 2) 3 2 B(n − 2) = ∼ C(n)B(n) C(n) B(n) 8
Finally from (3) and (4) we obtain Gl (n) G(n) e log(n) = l∈N ∼ F (n) F (n) n
e log(n) n
2
(5)
It remains to estimate the number of tautologies which are not simple. Those have to be found among formulas which are neither simple tautologies nor simple nontautologies (a formula T is simple nontautology if the goal of T does not occur as a goal of any premise of T ). That means that in every such formula T there is at least one premise which is not a variable, and whose goal is equal to the goal of T . First we will show that the number of formulas A1 , . . . , Ak → x in which x is a goal of at least two premises is negligible, the set of such formulas will be denoted by M P and the number of such formulas of size n by M P (n). Lemma 3. The fraction of formulas A1 , . . . , Ak → x in which x is a goal of at ). least two premises among all formulas of size n is o( e log(n) n Proof. Let P (n, l) denotethe number of trees of size n in which l premises are not leaves. Let p(x, z) = n,l∈N P (n, l)z n xl be its generating function. From the equation (1) we know that c(z) =
z 1−
c2 (z)
−z
106
A. Genitrini, J. Kozik, and M. Zaionc
That equation can be interpreted in terms of combinatorial constructions. Every tree is a sequence of premises, followed by the goal. That translates to the ex1 pression c(z) = 1−c(z) · z. Every tree is either a leaf or it consists of two subtrees, therefore we can substitute c2 (z) + z for c(z) in the last equation. We add a formal parameter x for every premise which is not a leaf to obtain the generating function p(x, z): z p(x, z) = 2 1 − xc (z) − z Every formula T ∈ M P has two premises with the goal equal to the goal of T , and those premises are not leaves. Therefore M P (n) ≤ B(n − 2)
P (n, l)
l∈N\{0}
l(l − 1) 2
Note that
zn
n∈N
P (n, l)
l∈N
Hence
l(l − 1) c5 (z) c7 (z) = 12 px (1, z) = = 2 (1 − c(z))2 z2 5
lim
n→∞
c (z) [z n ] (1−c(z)) 2
[z n ]c(z)
7
( c z(z) 7 2 ) = lim = (z) 1− c 4 z→R 4
It follows that
2 B(n − 2) l∈N1 P (n, l) l(l−1) M P (n) 7 e log(n) 2 ≤ ∼ F (n) C(n)B(n) 4 n
and comparing to (5), formulas from M P are negligible.
Finally, we estimate the number of tautologies which have exactly one premise with goal equal to the goal of the whole formula (compare the part about less simple non-tautologies in [6]). Let T be such a formula, and C be that premise. Let D be the first premise of C (C is not a variable), and r(D) be the goal of D (see Figure 3). A necessary condition for the formula T to be a tautology is that either r(D) is a goal of at least one premise of T or D, or r(T ) is a goal of some premises of D. We estimate the number of such formulas. Let LT be the set of formulas T for which both of the following conditions hold: 1. T has exactly one premise C whose goal is r(T ), and that premise is not a variable (this implies that T is not a simple tautology nor a simple nontautology). 2. Let D be the first premise of C. At least one of the following conditions holds: (a) there is a premise of D with goal r(T ), (b) there is a premise of D with goal r(D), (c) r(D) = r(T ) and there is a premise of T with goal r(D).
Intuitionistic vs. Classical Tautologies, Quantitative Comparison
107
→ →
T1
→
Ti−1 →
→ →
D
→
C1 Cq
→
Ti Tp
r(T )
r(T )
Fig. 1. Tautologies from LT
Lemma 4. The of tautologies which are not simple among all formulas fraction . of size n is o e log(n) n Proof. Clearly all the tautologies, which are not simple and not in M P , belong to LT .Let LT (n) denote the number of formulas from LT of size n. Since M P (n) e log(n) is o it is enough to prove the estimation for LT (n). n Let LT T (n, m, l, k) denote the number of trees of size n which have l + 1 premises, and in which the first premise of the m-th premise has exactly k premises. Let LT T (n, l, k) = m∈N LT T (n, m, l, k). Every such tree can be turned into formula from LT by the appropriate assignment of variables, and every formula from LT can be constructed in this way. Therefore LT (n) ≤ (l + k) · LT T (n, l, k) · B(n − 2) + k · LT T (n, l, k) · B(n − 2) l,k∈N
l,k∈N
The first sum corresponds to the situation, where r(D) occurs as a goal in some premises of D or T . The second one – to the the situation, where r(T ) occurs as a goal of some premises of D (these situations are not disjoint). Clearly LT (n) ≤ 2 (l + k) · LT T (n, l, k) · B(n − 2). l,k∈N
Let ltt(x, y, z) =
l,k,n∈N
ltt(x, y, z) =
xl y k z n LT T (n, l, k). We have z 1 1 · · c(z) · · z. 1 − xc(z) 1 − yc(z) 1 − xc(z)
1 The first term 1−xc(z) corresponds to the sequence of premises preceding the z distinguished premise C. The component 1−yc(z) ·c(z) corresponds to the premise
108
A. Genitrini, J. Kozik, and M. Zaionc
C (formal parameter y counts the premises of the subtree corresponding to D). 1 The last component 1−xc(z) · z corresponds to the remaining premises of the main tree and the leaf (goal). Let ltt(x, z) = ltt(x, x, z), then ltt(x, z) = l,k,n∈N x(l+k) z n LT T (n, l, k), and lttx (1, z) = l,k,n∈N z n (l + k)LT T (n, l, k). We denote the last function by ltt(z), LT (n) )n∈N . We it is the generating function for the sequence which majorizes ( 2B(n−2) can write it explicitly as 3c6 (z) . ltt(z) = z2 Then we have 6 6 3c (z) n 3c (z) [z ] z2 z2 18 LT T (n) lim = lim = lim = =9 n n→∞ C(n) n→∞ [z ]c(z) c (z) 2 z→R 41 −
Therefore 2LT T (n)B(n − 2) B(n − 2) LT (n) ≤ ∼ 18 ∼ 18 F (n) B(n)C(n) B(n)
e log(n) n
2
The Theorem 1 is a direct consequence of Lemmas 2,3,4.
4
Discussion
Actually, in this paper much more is proved. The result obtained is not related only to intuitionistic tautologies, but it holds also in any logic which is able to prove simple tautologies. Indeed, all formulas of the form of simple tautologies are tautologies of every reasonable logic with this syntax. Therefore results comparing densities of any logic between minimal and classical one are the same. So the theorem proved may be applied as well to minimal, intuitionistic, and any intermediate logic. It shows, in fact, that a randomly chosen theorem has a proof which is a projection and statistically all true statements are the trivial ones. In the paper only implicational fragment is taken in to consideration. Right now we do not know the analogous result for more complex syntax. But based on our experience we believe that the similar theorems holds for more complex syntaxes, including full propositional logic. At the moment, these are just expectations, but certainly it is worth to look in this direction. Despite of the fact that all discussed problems and methods are solved by mathematical means, the paper, as was suggested by referees may have some philosophical interpretation and impact. However, the paper is purely technical and we are not ready to comment on these philosophical issues. Acknowledgements. We are very grateful to all three anonymous referees who suggested many improvements to the presentation of our paper.
Intuitionistic vs. Classical Tautologies, Quantitative Comparison
109
References 1. Chauvin, B., Flajolet, P., Gardy, D., Gittenberger, B.: And/Or trees revisited. Combinatorics, Probability and Computing 13(4-5), 475–497 (2004) 2. Flajolet, P., Sedgewick, R.: Analytic combinatorics: functional equations, rational and algebraic functions. In: INRIA, vol. 4103 (2001) 3. Flajolet, P., Sedgewick, R.: Analytic combinatorics. Book in preparation (2007), available at: http://algo.inria.fr/flajolet/Publications/books.html 4. Gardy, D.: Random Boolean expressions. In: Colloquium on Computational Logic and Applications. Proceedings in DMTCS, Chambery (France), June 2005, pp. 1–36 (2006) 5. Gardy, D., Woods, A.: And/or tree probabilities of Boolean function. Discrete Mathematics and Theoretical Computer Science, 139–146 (2005) 6. Fournier, H., Gardy, D., Genitrini, A., Zaionc, M.: Classical and intuitionistic logic are asymptotically identical. In: Duparc, J., Henzinger, T.A. (eds.) CSL 2007. LNCS, vol. 4646, pp. 177–193. Springer, Heidelberg (2007) 7. Kostrzycka, Z., Zaionc, M.: Statistics of intuitionistic versus classical logic. Studia Logica 76(3), 307–328 (2004) 8. Lefmann, H., Savick´ y, P.: Some typical properties of large And/Or Boolean formulas. Random Structures and Algorithms 10, 337–351 (1997) 9. Matecki, G.: Asymptotic density for equivalence. Electronic Notes in Theoretical Computer Science 140, 81–91 (2005) 10. Moser, L., Wyman, M.: An asymptotic formula for the Bell numbers, Transactions of the Royal Society of Canada, XLIX (1955) 11. Moczurad, M., Tyszkiewicz, J., Zaionc, M.: Statistical properties of simple types. Mathematical Structures in Computer Science 10(5), 575–594 (2000) 12. Wilf, H.: generating functionology, 3rd edn. A K Peters Publishers (2006) 13. Zaionc, M.: On the asymptotic density of tautologies in logic of implication and negation. Reports on Mathematical Logic 39, 67–87 (2005) 14. Zaionc, M.: Probability distribution for simple tautologies. Theoretical Computer Science 355(2), 243–260 (2006)
In the Search of a Naive Type Theory Agnieszka Kozubek and Pawel Urzyczyn Institute of Informatics, University of Warsaw, Poland {kozubek,urzy}@mimuw.edu.pl
Abstract. This paper consists of two parts. In the first part we argue that an appropriate “naive type theory” should replace naive set theory (as understood in Halmos’ book) in everyday mathematical practice, especially in teaching mathematics to Computer Science students. In the second part we make the first step towards developing such a theory: we discuss a certain pure type system with powerset types. While the system only covers very initial aspects of the intended theory, we believe it can be used as an initial formalism to be further developed. The consistency of this basic system is established by proving strong normalization.
1
Why Not Set Theory?
Set theory is an enormous success in the contemporary mathematics, including the mathematics relevant to Computer Science. Virtually all maths is developed within the framework of set theory, and virtually all books and papers are written under the silent assumption of ZF or ZFC axioms occurring “behind the back”. We sometimes feel as if we actually lived in set theory, as if it was the only true and real world. The set-theoretical background has made its way to education, from the university to the kindergarten level, and what once was a foundational subject on the border of logic and philosophy now has become a part of elementary mathematics. And indeed, set theory deserves its pride. From an extremely modest background—the notion of “being an element” and the idea of equality—it develops complex notions and objects serving the needs of even most demanding researcher. Enjoying the paradise of sets we tend to forget about the price we pay for that. Of course, we must avoid paradoxes, and thus the set formation patterns are severely restricted. We must give up Cantor’s idea of “putting together” any collection of objects, resigning therefore, at least partly, from the very basic intuition that a set of objects can be selected by any criterion at all. Universes vs predicates. In fact, there are two very basic intuitions that are glued together into the notion of a “set”:
Partly supported by the Polish Government Grant 3 T11C 002 27, and by the EU Coordination Action 510996 “Types for Proofs and Programs”.
M. Miculan, I. Scagnetto, and F. Honsell (Eds.): TYPES 2007, LNCS 4941, pp. 110–124, 2008. c Springer-Verlag Berlin Heidelberg 2008
In the Search of a Naive Type Theory
111
– Set as a domain or universe; – Set as a result of selection. We used to treat this identification as natural and obvious. But perhaps only because we were taught to do so. These two ideas are in fact different, and this very confusion is responsible for Russel’s paradox. In addition, ordinary mathematical practice often makes an explicit difference between the two aspects. Mathematicians have been classifying objects according to their domain, kind, sort or type since the antiquity [2,21]. An empty set of numbers and an empty set of apples are intuitively not the same, as well as in most cases we do not need and do not want to treat a function in the same way as its arguments. The difference between domains (types) and predicates is made explicit in type theory. This results in various simplifications. For instance, the difference between operations on universes (product, disjoint sum) and operations on predicates (intersection, set union) becomes immediately apparent and natural. Yet another example is that a union A of a family A of sets is typically of the same “type” as members of A rather than as A itself. In set theory, this argument is not sufficient to reject common student’s misconceptions like e.g. A ⊆ A, because classifying sets (a priori) into types is illegal. Everyday maths vs foundations of mathematics. The purpose of set theory was to give a universal foundation for a consistent mathematics. That happened at the beginning of the 20th century, when consistency of elementary notions was a serious issue, threatened by the danger of antinomies, and when modern formal mathematics was in its infancy. It was then important to ensure as much security as possible. Therefore, all the development had to be done from first principles, and the results of it have little to do with ordinary mathematical practice. For instance, using the Axiom of Foundation one derives in set theory the surrealistic conclusion that all the world is built from curly braces. This foundational tool is now being widely used for a quite different purpose. We use sets as a basic everyday framework for various kinds of mathematics, and we teach set theory to students, beginning at a very elementary level. But that puts us into an awkward situation. On the one hand, we want to use as much common-sense as possible, on the other hand we do not want paradoxes and inconsistency. So if we do not want to cheat, what can we do? One possibility is to hide the problem and pretend that everything is OK: “Emmm. . . We assume that all sets under consideration are subsets of a certain large set.” This is what often happens in elementary and high-school textbooks. But is it really different than saying that the world is placed on the back of a giant turtle? An intelligent student must eventually ask on whose back the turtle is standing. And then all we can say is “Sit up straight!” The other option is to pull the skeleton out of the closet, put all axioms on the table, and pay a heavy overhead by spending a lot of effort on constructing ordered pairs in the style of Kuratowski, integers in the style of von Neumann, and so on. This approach is common at the university level and has been considerably mastered. For half a century, the book [18] by Halmos has been giving guidance to lecturers how to achieve a balance between precision and simplicity.
112
A. Kozubek and P. Urzyczyn
(Contrary to its title, the book is not about naive set theory. It is about axiomatic set theory taught in a “naive” or “common-sense” style.) But even this didactic masterpiece is a certain compromise. The idea vs the implementation. This is because the overhead is unavoidable. Very basic mathematical ideas must be encoded in set theory before they can be used, and a substantial part of student’s attention is paid to the details of the encoding. To a large extent this is a wasted effort and it would be certainly more efficient to concentrate on “top-level” issues. Using an old comparison in a different context, it is like teaching the details of fuel injection in a driving school while we should rather let students practice driving. Getting accustomed to set theoretical “implementation” of mathematics is painful to many students. In set theory the implementation is not “encapsulated” at all and we can smell the fuel in the passenger’s cabin. One of the most fundamental God’s creations is turned into a transitive set of von Neumann’s numbers so we must live with phenomena like 1 ∈ 2 ⊆ 3 ∈ 4 or N = N. We do not really need these phenomena. The actual use of various objects and notions in mathematics is based on their intensional “specification” rather than formal implementation. We still have to ask students to remember the rule a, b = c, d iff a = c ∧ b = d, (*) in addition to the definition a, b = {{a}, {a, b}}. But we must spend time on proving the above equivalence. A doubtful reward is the malicious homework “Prove that (N × N) = N.” We got used to such homeworks so much that we do not notice that they are nonsense. In a typed framework a substantial part of this nonsense simply disappears.
2
Why Type Theory and What Type Theory?
We believe that an appropriate type theory should give a chance to build a framework for “naive” mathematics that would not exhibit many of the drawbacks mentioned above. In particular, it is reasonable to expect that a “naive type theory” can be more adequate than “naive set theory” from our point of view, in that it should – – – – –
be free from both paradoxes and unnecessary artificial formalization; distinguish between domains (universes) and sets understood as predicates; begin with intensional specifications rather than from bare first principles; be closer to the everyday maths and computer science practice; be more appropriate for automatic verification.
We do not want to depart from ordinary mathematical practice, and thus our naive type theory should be adequate for classical reasoning, and extensional with respect to functions and predicates. We find it however methodologically appropriate that these choices are made explicit (introduced by appropriate axioms) rather than implicit (built in the design principles). We also would like to include a Curry-Howard flavour, taking seriously De Bruijn’s slogan [6]:
In the Search of a Naive Type Theory
113
Treating propositions as types is definitely not in the way of thinking of the ordinary mathematician, yet it is very close to what he actually does. The basic idea is of course to separate the two roles played by sets, namely to put apart domains (types) and predicates (selection criteria for objects of a given type). Thus for any type A we need a powerset type P (A), identified with the function space A → ∗, where ∗ is the sort of propositions. That is, we would like to treat “M ∈ {a : A | ϕ(a)}” as syntactic sugar for “ϕ(M )”. Although our principal aim is a “naive” approach, we should be aware of the necessity of a formalization. Firstly, because we still need some justification for consistency, secondly, because it may be desirable that “naive” reasoning can be computer-assisted. We find it quite natural and straightforward to build such a formalization beginning with a certain pure type system, to be later extended with additional constructs and axioms. Related systems. Simple type theory: In Church’s simple type theory [8,21] there are two base types: the type i of individuals and the type b of truth values. Expressions have types and formulas are simply expressions of type b. There is no built-in notion of a proof and formulas are not types. In addition to lambda-abstraction, there is another binding operator that can be used to build expressions, namely the definite description ιx. ϕ(x), meaning “the only object x that satisfies ϕ(x)”. While various forms of definite description are often used in the informal language of mathematics, the construct does not occur in most contemporary logical systems. As argued by William Farmer in a series of papers [11,12,13,14], simple type theory could be efficiently used in mathematical practice and teaching. Also the textbook [2] by P.B. Andrews develops a version of simple type theory as a basis for everyday mathematics. This is very much in line with our way of thinking. We choose a slightly different approach, mostly to avoid the inherently two-valued Boolean logic built in Church’s type theory. Quine’s New Foundations: Quine’s type theory [20,23] is based on an implicit linear hierarchy of universes. Full comprehension is possible at each level, but a set always lives at a higher floor than its elements. The idea of a linear hierarchy is of course convenient from a foundational point of view, but is not very intuitive. Also implementing “ordinary” mathematics requires a similar effort as in the usual set theory. The restriction to stratified constructs does not help either: one encounters difficulties when trying to define functions between objects belonging to different levels of the hierarchy. Constable’s computational naive type theory: We have to admit that the title of Halmos’ book has already been rephrased by R. Constable [9]. But Constable’s idea of a “naive type theory” is quite different than ours. It is inspired by MartinL¨ of’s theory and based on the idea of a setoid type, determined by a domain of objects plus an appropriate notion of equality. (In other words, quotient becomes a basic notion.) For instance, the field Z3 has the same domain as the set of integers Z, but a different equality. And Z6 is defined by taking an “intersection” of Z2 and Z3 . This is very convenient and natural way of dealing with quotient constructions. However (even putting aside the little counterintuitivity of the “contravariant” intersection) we still believe that a “naive” notion of equality
114
A. Kozubek and P. Urzyczyn
should be more strict: two objects should not be considered the same in one context but different in another. Coq and the calculus of inductive constructions: An almost adequate framework for a naive type theory is the Calculus of Constructions extended with inductive types. This is essentially the basic part of the type theory of the Coq proof assistant [5]. The paper [7] describes an attempt to use Coq in teaching rudiments of set theory. But in Coq, if A is a type (A : Set is provable) then the powerset A → Prop of A is a kind (A → Prop : Type is provable). That is, a set and its powerset do not live in the same sort, although they should receive similar treatment. Weyl’s predicative mathematics and Luo’s logic-enriched theories: Zhaohui Luo in [22] considers logic-enriched type theories” where the logical aspect is ” separated by design from the data-type aspect (in particular a separate kind Prf (P ) is used for proofs of any proposition P ). Within that framework one can introduce both predicative and impredicative notion of a set, so that the kind Type is closed under the powerset construction. This approach is used by Adams and Luo [1] to formalize the predicative mathematics of Weyl [25], who long ago made an explicit distinction between “categories” and sets, understood respectively as universes and predicates. Weyl’s theory is strictly predicative, and this certainly departs from our “naive” understanding of sets, but the impredicative version mentioned in [22] is very much consistent with it. In this paper. In the next section we collect a few postulates concerning the possible exposition of a naive type theory. With this exposition we would like to initiate a discussion to help establish a new approach to both teaching and using mathematics in a way that will avoid the set-theoretic “overheads” and remain sufficiently precise and paradox-free. We realize that a naive approach to type theory can result in an inconsistency, as it happened to naive set theory and many other ideas. Therefore we consider it necessary to build the naive approach on top of a rigourous formal system, to be developed in parallel. The relation between the formal language and the naive theory should be similar to the relation between the first-order ZFC formal theory, and Halmos’ book [18]. In the present paper we do not attempt to solve the problem in general but rather to formulate it explicitly and highlight its importance. On the technical side, we only address here one very initial but important problem. A set X and its powerset P (X) should be objects of the same sort, and we also assume that subsets of X should be identified with predicates on X. In the language of pure type systems that leads to the idea of a type assignment of the form X→∗ : ∗, which turns out to imply inconsistency. In Section 4 we show that this inconsistency can be eliminated if the difference between propositions and types is made explicit. More precisely, we prove strong normalization (and thus consistency) of an appropriate PTS. Of course, that is only the first step. We need a much richer consistent system to back up our “practical” exposition of sets, functions, and composite types, as sketched in the previous section. This will most likely require extending our
In the Search of a Naive Type Theory
115
system LNTT by various additional constructs, in particular a general scheme for inductive types, and additional axioms. All this is future work.
3
Informal Exposition
In this section we sketch some basic ideas of how a “naive” informal presentation of basic mathematics could look when set theory is replaced by type theory. As we said, these ideas go far beyond the initial formalism of Section 4. Types. Every object is assigned a type. Types themselves are not objects.1 Certain types are postulated by axioms, and many of these should be special cases of a general scheme for introducing inductive (perhaps also co-inductive) types. In particular, the following should be assumed: – – – – –
A unit type with a single element nil . Product types A × B and co-product types A + B, for any types A, B. The type N of integers. The powerset type P (A), for any type A. Function types A → B, perhaps as a special case of a more general product.
In particular, a powerset P (A) of a type A should form a type and not a kind (i.e. it should be in the same sort as A) so that operations on types can be applied equally to both. Otherwise the classification of compound objects becomes unreasonably complicated: just think of a product A × P (A) × (A → P (A)). Types come together with their constructors, eliminators etc., their properties postulated by axioms. For instance, the equivalence (*) should be an axiom. Equality. In the “ordinary” mathematics two objects are equal iff they are the same object; one can do the same in the typed framework. As in common mathematical practice, equality between sets, functions, etc. should be extensional. In the formal model Leibniz’s equality should probably be an axiom. Sets. A predicate ϕ(x), where x : A, is identified with a subset {x : A | ϕ(x)} of type A. Subsets are assumed to be extensional, i.e., ϕ = ϕ iff ∀x:A. ϕ(x) ↔ ϕ (x). Inclusion is defined as usual by ϕ ⊆ ϕ iff ∀x:A. ϕ(x) → ϕ (x). Set union and intersection as well as the complement −ϕ = {x : A | ¬ϕ(x)} are well-defined operations on sets. Note the difference between operations on sets (like union) and on types (like disjoint sum). An indexed family of sets is given by any 2-argument predicate, so that e.g. we can write the ordinary definition y:Y Ay = {x : X | ∀y:Y. Ay (x)}. Should we need an intersection indexed by elements of a set rather than a type we must explicitly include it in the definition by writing y∈ψ Ay = {x : X | ∀y:Y (ψ(y) → Ay (x))}. 1
At least not yet. We may have to relax this restriction, if we want to deal with e.g. objects of type “semigroup”. This may lead to an infinite hierarchy of universes.
116
A. Kozubek and P. Urzyczyn
At this stage, one can prove standard results about the properties of the algebra of sets. Subsets of a Cartesian product A × A are of course called relations, and we can discuss properties of relations and introduce constructions like transitive closure and so on. Equivalences and quotients. While a definition of an equivalence relation (possibly partial) over a type A presents no difficulty, the notion of a quotient type must be postulated separately. Clearly, for every a : A we could consider a set [a]r = {b : A | b r a}, and form a subset of P (A) consisting of all such sets. However, that would be inconsistent with our main idea: a domain of interpretation is always a type and not a set. Also, there is no actual reason to define the abstract objects, the classes of abstraction, as equivalence sets, as it is done in set theory. There is a difference between abstraction and implementation. For instance, we define rationals from integers, but we do not think of 1/2 as of a set. The quotient type A/r induced by a (partial) equivalence r should be equipped with a canonical (partial) map abstract : A → A/r and (as a form of the axiom of choice) one could also postulate another map select : A/r → A satisfying abstract ◦ select = idA/r . The one-to-one correspondence between the quotient type and the set of equivalence classes should then be proven as a theorem (“the principle of equivalence”). Functions: total or partial? The notion of a function brings the first serious difficulty. In typed systems, once we assert f : A → B and a : A we usually conclude f (a) : B. That means we treat f (a) as a legitimate, well-defined object of type B. Everything works well as long as we can assume that all functions from A to B are total. However, it can happen that a function is defined only on a certain subset A of a given domain. In set theory this is not a problem, because both the type of arguments and the actual domain are simply sets, and we can always take f : A → B rather than f : A → B. In the typed framework, we would like to still say that e.g. λx:R. 1/x maps reals to reals, but the domain of the function is a proper subset of the type R. There are several possible solutions of this problem, see [11,15]. Perhaps the most adequate solution for our needs is to distinguish between the function space A → B and the type A −◦ B of partial functions. Then one assigns a domain predicate dom(f ) to every partial function f , and imposes a restriction on the use of the expression f (a) to the cases when a ∈ dom(f ). This seems to be quite consistent with the ordinary mathematical practice. The old idea of a definite description may turn out useful in this respect. A standard function definition may have the form f (x) = ιy.ϕ(x, y), or equivalently f = λx ιy.ϕ(x, y), and we would postulate an axiom of the form x ∈ dom(λx ιy.ϕ(x, y)) iff ∃!y ϕ(x, y). Extensionality for partial functions would then be stated as f = g ↔ (dom(f ) = dom(g)) ∧ ∀x (x ∈ dom(f ) → f (x) = g(x)). This approach assumes that f (a) is a well-formed expression of the appropriate type only if a ∈ dom(f ), a problem that does not formally occur2 in set theory, 2
But it occurs in practice: e.g. f (x) = y can be understood differently than x, y ∈ f .
In the Search of a Naive Type Theory
117
where f (x) = y is syntactic sugar for x, y ∈ f . In type theory, it is more natural to refrain from entering this level of extensionality, and to assume function application as a primitive. Mathematics is not an exact science. Various identifications are common in mathematical practice. In a strictly typed framework such identifications are unavoidable. For instance, we would like to treat total functions as special cases of partial functions, even if these are of two different types. There is a natural coercion from A → B to A −◦ B, which can, at the meta-level, be treated as identity without creating confusion. Also the difference between types and subsets becomes inconvenient in certain situations. One specific example is when we have an algebra with a domain represented as a type, and we need to consider a subalgebra based on a subset of that domain. Then we would prefer to have the “large” and the “small” domain living in the same sort. To overcome this difficulty, one may have to postulate a selection scheme: for every subset S of type A there exists a type A|S, such that objects of type A|S are in a bijective correspondence with elements of S. This partially brings back the identification of domains and predicates, but in a controlled way.
4
Naive Type Theory as a Pure Type System
The assumption that a set and a powerset should live in the same sort leads naturally to the following idea: consider a pure type system with the usual axiom ∗ : and with the rule (∗, , ∗). This rule makes possible to build products of the form Πx:A. κ, where A : ∗ and κ : , and the product itself is then a type (is assigned the sort ∗). In particular, the function space A → ∗ is a type, and this is exactly the powerset of A. A subset of A is then represented by any abstraction λx:A. ϕ(x), where ϕ(x) is a (dependent) proposition. Unfortunately, this idea is too naive. As pointed out by A.J.C. Hurkens and H. Geuvers, this theory suffers from Girard’s paradox, and thus it is inconsistent. Theorem 1 (Geuvers, Hurkens [17]). Let VNTT (Very Naive Type Theory) be an extension of λP by the additional rule (∗, , ∗). Then every type is inhabited in VNTT (every proposition has a proof ). Proof. The proof is essentially the same as Hurkens’ proof in [19], (cf. the version given in [24, chapter 14]) for the system λU− . There are two essential factors that imply that Russel’s paradox can be implemented in a theory: – A powerset P (x) of a domain x of a sort s lives in the same sort s. – There is enough polymorphism available in s to implement a construction of an inductive object μx:s.P (s). In λU− we have s = and polymorphism on the kind level is directly available. But almost the same can happen in VNTT, for s = ∗. Indeed, the powerset A → ∗ of any type A is a type, and although type polymorphism as such is not
118
A. Kozubek and P. Urzyczyn
present, it sneaks in easily by the back door. Instead of quantifying over types, one can quantify over object variables of type T → ∗, where T is any type. Thus instead of using μt : ∗.P (t) = ∀t(∀u: ∗ ((u → t) → P (u) → t) → t) one takes a : T and then defines μt : ∗.P (t) = ∀x : T → ∗ (∀y : T → ∗ ((ya → xa) → P (ya) → xa) → xa), with essentially the same effect.
It follows that our naive type theory cannot be too naive, and must avoid the danger of Girard’s paradox. The solution is to distinguish between propositions and sets, like in Coq. Define a pure type system LNTT (Less Naive Type Theory) with four sorts ∗t , ∗p , t , p , with axioms (∗t : t ) and (∗p : p ) and with the following rules: (∗t , ∗t , ∗t ), (∗p , ∗p , ∗p ), (∗t , t , t ), (∗t , ∗p , ∗p ), (∗t , p , ∗t ). The first and second rule represent, respectively, the formation of function types, and logical implication; the third rule is for dependent types and the fourth one permits quantification over objects of any type. The last rule is for the powerset. Note that there is no polymorphism involved, as rule (t , ∗t , ∗t ) can be fatal; however impredicativity is still present because of rule (∗t , p , ∗t ). Strong Normalization First note that, as all PTSs with only β-reduction, the system LNTT has the Church-Rosser property and subject reduction property on well-typed terms [16]. Moreover, LNTT is a singly sorted PTS [3], so the uniqueness of types also holds. Definition 2. In a fixed context Γ we use the following terminology. 1. 2. 3. 4. 5. 6. 7.
A A A A A A A
is is is is is is is
a term if and only if there exists B such that Γ A : B or Γ B : A. a kind if and only if Γ A : t . a constructor if and only if there exists B such that Γ A : B : t . a type if and only if Γ A : ∗t . a formula if and only if Γ A : ∗p . an object if and only if there exists B such that Γ A : B : ∗t . a proof if and only if there exists B such that Γ A : B : ∗p .
The classification of terms in LNTT is more complicated than e.g. in the calculus of constructions λC. While in λC there is a simple linear” hierarchy (from ” objects via types/constructors to kinds), in LNTT we also have a separate hierp archy from proofs via formulas to ∗ . The relation between the two hierarchies is not straightforward: in some respects formulas correspond to types, in other to objects. This is why we need two translations in the proof of strong normalization. We use the notation T ermΓ , KindΓ , ConstrΓ , T ypeΓ , P ropΓ , ObjΓ , and P roofΓ for, respectively, terms, kinds, constructors, types, formulas, objects, and proofs of the context Γ . The following lemma explains the various cases.
In the Search of a Naive Type Theory
119
Lemma 3. Assume a fixed context Γ – If A is a term such that Γ A : p then A = ∗p . – If A is a kind then A is of the following form • A = ∗t or • A = Πx : τ.B where τ is a type and B is a kind. – If A is a constructor then • A is a type, or • A is a variable, or • A is of the form λx : τ.κ where τ is a type and κ is a constructor, or • A is of the form κM where M is an object and κ is a constructor. – If A is a type then • A is a type variable, or • A is of the form Πx : τ.σ where τ and σ are types, or • A is of the form Πx : τ.∗p where τ is a type, or • A is of the form κM where M is an object and κ is a constructor. – If A is a formula then • A is a propositional variable, or • A is of the form Πx : ϕ.ψ where ϕ and ψ are formulas, or • A is of the form Πx : τ.ϕ where τ is a type and ϕ is a formula, or • A is of the form M N where M and N are objects. – If A is an object then • A is an object variable, or • A is of the form λx : τ.N where τ is a type and N is an object, or • A is of the form λx : τ.ϕ where τ is a type and ϕ is a formula, or • A is of the form M N where M and N are objects. – If A is a proof then • A is a proof variable, or • A is of the form λx : τ.D where τ is a type and D is a proof, or • A is of the form λx : ϕ.D where ϕ is a formula and D is a proof, or • A is of the form D1 D2 where D1 and D2 are proofs, or • A is of the form DN where D is a proof and N is an object. Lemma 4. If A is a term which is not a proof and B is a subterm of A then B is not a proof. Proof. This is an immediate consequence of Lemma 3.
Note that it follows from Lemma 4 that all formulas of the form Πx : ϕ. ψ, where ϕ and ψ are formulas, are actually implications (can be written as ϕ → ψ) because the proof variable x cannot occur in ψ. The first part of our strong normalization proof applies to all terms but proofs. For a fixed context Γ we define the translation TΓ : T ermΓ − P roofΓ → T erm(λP 2) from terms of LNTT into the system λP 2. Special variables Bool, F orall and Impl will be used in the definition of T . Types for these variables are given by the following context: Γ0 = {Bool : ∗, F orall : Πτ :∗. (τ → Bool) → Bool, Impl : Bool → Bool → Bool}.
120
A. Kozubek and P. Urzyczyn
Definition of the translation TΓ follows: – – – – – – – – – –
TΓ (t ) = ; TΓ (p ) = ∗; TΓ (∗t ) = ∗; TΓ (∗p ) = Bool; TΓ (x) = x, when x is a variable; TΓ (Πx : A.B) = Πx : TΓ (A).TΓ,x:A (B), for products created with the rules (∗t , ∗t , ∗t ), (∗t , p , ∗t ), (∗t , t , t ); TΓ (Πx : τ.ϕ) = F orall TΓ (τ ) (λx : TΓ (τ ).TΓ,x:τ (ϕ)), for products created with the rule (∗t , ∗p , ∗p ); TΓ (Πx : ϕ.ψ) = Impl TΓ (ϕ) TΓ (ψ), for products created with the rule (∗p , ∗p , ∗p ); TΓ (λx : A.B) = λx : TΓ (A).TΓ,x:A (B); TΓ (AB) = TΓ (A) TΓ (B).
For the sake of simplicity we omit the subscript Γ if it is clear which context we are using.3 Note that we cannot apply the translation T to proofs. Formulas of LNTT get translated by TΓ into objects of λP 2. Thus each abstraction of the form λx : ϕ.N would have to be translated into an expression λx : T (ϕ).T (N ). But T (ϕ) is an object so this expression would be ill-formed. The translation T is extended to contexts as follows: – T () = Γ0 , – T (Γ, x : A) = T (Γ ), x : TΓ (A), if A is a kind, a type, or ∗p , – T (Γ, x : A) = T (Γ ), if A is a formula. We now state some technical lemmas which are used in the proof of soundness of the translation T . Definition 5. We say that contexts Γ and Γ are equivalent with respect to the set of variables X = {x1 , . . . , xn } if and only if Γ and Γ are legal contexts and for all x ∈ X we have Γ (x) =β Γ (x). Lemma 6. If Γ and Γ are equivalent with respect to X, and N ∈ T ermΓ is such that F V (N ) ⊆ X and Γ N : A, then Γ N : A where A =β A . In particular, if N ∈ T ermΓ then N ∈ T ermΓ . Proof. Induction with respect to the structure of the derivation Γ N : A.
Lemma 7. If Γ and Γ are equivalent with respect to F V (M ) and M ∈ T ermΓ then TΓ (M ) = TΓ (M ). Proof. Induction with respect to the structure of M .
Lemma 8. If Γ a : A and Γ, x : A B : C for some C and a, B are not proofs then TΓ (B[x := a]) = TΓ,x:A (B)[x := TΓ (a)]. Proof. Induction with respect to the structure of B, using Lemma 7. 3
I.e., when the context is clear from the context ;)
In the Search of a Naive Type Theory
121
Lemma 9. If B and B are not proofs and B →β B then TΓ (B) + β TΓ (B ).
Proof. The proof is by a routine induction with respect to B →β B . If B is a redex then, by Lemma 4, it must be of one of the following forms: (λx : τ.ϕ)N, (λx : τ.M )N, (λx : τ.κ)N , where τ is a type, ϕ is a formula, and M, N are objects. In each of these cases we apply Lemma 8. If B is not a redex, we apply the induction hypothesis, using Lemma 7.
Lemma 10. If B TΓ (B) =β TΓ (B ).
=β
B and B, B are kinds, types or objects then
Proof. By Church-Rosser property there exists a well-typed term C such that B β C and B β C. We have TΓ (B) β TΓ (C) and TΓ (B ) β TΓ (C), by Lemma 9, whence TΓ (B) =β TΓ (B ).
Lemma 11 (Soundness of the translation T ). If Γ A : B and A is not a proof then T (Γ ) TΓ (A) : TΓ (B) in λP 2. Proof. Induction with respect to the structure of the derivation of Γ A : B using Lemmas 7, 8 and 10.
Corollary 12. If M is not a proof then M is strongly normalizing. Proof. Assume that there is an infinite reduction M →β M1 →β M2 →β · · · By + + Lemma 9 then T (M ) + β T (M1 ) β T (M2 ) β · · · But T (M ) is a valid term of λP 2, by Lemma 11, thus it is strongly normalizing. The contradiction shows that also M is strongly normalizing.
To show strong normalization for proofs we use another translation t from LNTT to the calculus of constructions λC. This translation depends on a given context Γ . Observe that the classification of a term A in LNTT does not a priori determine the classification of t(A) in λC. For instance, some types of LNTT are translated to types and some (those which have ∗p as a ”target”) to kinds, cf. Lemma 18. Similarly, some object terms of LNTT are translated as type constructors. Note that we do not define the translation for t and p as it is not needed for soundness. – – – – – – – – –
tΓ (∗t ) = ∗; tΓ (∗p ) = ∗; tΓ (x) = x, if x is a variable; tΓ (Πx : τ.B) = tΓ,x:τ (B), for products constructed using the rule (∗t , t , t ); tΓ (Πx : A.B) = Πx : tΓ (A).tΓ,x:A (B), for all other products; tΓ (λx : τ.κ) = tΓ,x:τ (κ), if κ is a constructor and τ is a type; tΓ (λx : A.B) = λx : tΓ (A).tΓ,x:A (B), for all other abstractions; tΓ (κN ) = tΓ (κ), if κ is a constructor; tΓ (AB) = tΓ (A)tΓ (B), for all other applications.
122
A. Kozubek and P. Urzyczyn
We extend the translation t to contexts by taking t() = and t(Γ, x : A) = t(Γ ), x : tΓ (A). Lemma 13. If Γ and Γ are equivalent with respect to F V (M ) and M is a term in Γ then tΓ (M ) = tΓ (M ). Proof. Induction with respect to the structure of M .
Lemma 14. Assume that Γ, x : A B : C and Γ N : A and N is an object or a proof. – If N is an object and B is a type or a constructor then tΓ (B[x := N ]) = tΓ,x:A (B). – If B is neither a type nor a constructor then tΓ (B[x := N ]) = tΓ,x:A (B)[x := tΓ (N )]. Proof. Induction with respect to the structure of B, using Lemma 7.
Definition 15. A reduction step A →β A is silent if – A = (λx : τ.κ)N →β κ[x := N ] = A , where κ is a constructor and N is an object, or – A = Πx : τ.B →β Πx : τ .B = A , where τ →β τ and B is a kind, or – A = κN →β κN = A , where N →β N and κ is a constructor, or – A = λx : τ.κ →β λx : τ .κ = A , where κ is a constructor, or – A = C[B] →β C[B ] = A , where C[ ] is any context and B →β B is a silent reduction. Lemma 16. If A →β B then tΓ (A) β tΓ (B). In addition, if the reduction A →β B is not silent then tΓ (A) + β tΓ (B). Proof. Induction with respect to A →β B, using Lemma 14 when A is a redex, and Lemma 13 in the other cases.
Corollary 17. If B =β B then tΓ (B) =β tΓ (B ). The following lemma states soundness of the translation tΓ . In particular, item 2 implies that all the rules in the calculus of constructions are needed. Lemma 18. Assume a fixed environment Γ . 1. If M is a proof, an object, or a formula and Γ M : B holds in LNTT then t(Γ ) tΓ (M ) : tΓ (B) in λC. 2. If M is a type or a constructor then t(Γ ) tΓ (M ) : ∗ or t(Γ ) tΓ (M ) : . 3. If M is a kind then t(Γ ) tΓ (M ) : . Proof. Simultaneous induction with respect to the structure of the appropriate derivation, using Lemma 13.
Theorem 19. System LNTT has the strong normalization property. Proof. We already know that all expressions except proofs are strongly normalizing. Arguing as in the proof of Corollary 12, and using Lemma 16, we conclude that almost all steps in an infinite reduction sequence must be silent. Thus it suffices to prove that if D is a proof than there is no infinite silent reduction of D. This goes by induction with respect to the size of D, by cases depending on its shape.
In the Search of a Naive Type Theory
123
No Conclusion The above is by no means a complete proposal of either theoretical or didactic character. It is essentially a collection of questions and partial suggestions of how such a proposal should be eventually designed. These questions are of double nature, and we would like to pursue the two directions. The first one is to find means to talk about basic mathematics without referring to set theory in either a naive (i.e., inconsistent) or axiomatic way, using instead an appropriate typebased language. That should happen in a possibly non-invasive way, keeping as much linguistic compatibility with the “standard” style as possible. The second problem is to give a formal foundation to this informal typebased language. This formalization is to be used for two purposes: to guarantee logical consistency of the naive exposition and to facilitate computer assisted verification and teaching. That requires building a complex system, of which our PTS-style Less Naive Type Theory is just a very basic core. This system must involve various extensions as in [4], perhaps include a hierarchy of sorts, etc. All this is future work.
Acknowledgement Thanks to Herman Geuvers and Christophe Raffalli for helpful discussions. Also thanks to the anonymous referees for their suggestions.
References 1. Adams, R., Luo, Z.: Weyl’s predicative classical mathematics as a logic-enriched type theory. In: Altenkirch, T., McBride, C. (eds.) TYPES 2006. LNCS, vol. 4502, Springer, Heidelberg (2007) 2. Andrews, P.B.: An Introduction to Mathematical Logic and Type Theory: To Truth Through Proof, 2nd edn. Applied Logic Series, vol. 27. Kluwer Academic Publishers, Dordrecht (2002) 3. Barendregt, H.P.: Lambda calculi with types. In: Abramsky, S., Gabbay, D.M., Maibaum, T.S.E. (eds.) Handbook of Logic in Computer Science, vol. II, pp. 117– 309. Oxford University Press, Oxford (1992) 4. Barthe, G.: Extensions of pure type systems. In: Dezani-Ciancaglini and Plotkin [10], pp. 16–31 5. Bertot, Y., Cast´eran, P.: Interactive Theorem Proving and Program Development. Coq’Art: The Calculus of Inductive Constructions. Texts in Theoretical Computer Science. An EATCS Series, Springer, Heidelberg (2004) 6. de Bruijn, N.G.: A survey of the project automath. In: Seldin, J.P., Hindley, J.R. (eds.) To H.B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, pp. 579–606. Academic Press, London (1980) J., Sakowicz, J.: Papuq: A Coq assistant (manuscript, 2007) 7. Chrzaszcz, 8. Church, A.: A formulation of the simple theory of types. Journal of Symbolic Logic 5(2), 56–68 (1940)
124
A. Kozubek and P. Urzyczyn
9. Constable, R.L.: Naive computational type theory. In: Schwichtenberg, H., Steinbruggen, R. (eds.) Proof and System-Reliability, pp. 213–259. Kluwer Academic Press, Dordrecht (2002) 10. Dezani-Ciancaglini, M., Plotkin, G. (eds.): TLCA 1995. LNCS, vol. 902. Springer, Heidelberg (1995) 11. Farmer, W.M.: A partial functions version of Church’s simple theory of types. Journal of Symbolic Logic 55(3), 1269–1291 (1990) 12. Farmer, W.M.: A simple type theory with partial functions and subtypes. Annals of Pure and Applied Logic 64, 211–240 (1993) 13. Farmer, W.M.: A basic extended simple type theory. Technical Report 14, McMaster University (2003) 14. Farmer, W.M.: The seven virtues of simple type theory. Technical Report 18, McMaster University (2003) 15. Farmer, W.M.: Formalizing undefinedness arising in calculus. In: Basin, D., Rusinowitch, M. (eds.) IJCAR 2004. LNCS (LNAI), vol. 3097, pp. 475–489. Springer, Heidelberg (2004) 16. Geuvers, H.: The Church-Rosser property for beta-eta-reduction in typed lambda calculi. In: Logic in Computer Science, pp. 453–460 (1992) 17. Geuvers, H.: Private communication (2006) 18. Halmos, P.R.: Naive Set Theory. Van Nostrand, 1960. Reprinted by Springer, Heidelberg (1998) 19. Hurkens, A.J.C.: A simplification of Girard’s paradox. In: Dezani-Ciancaglini and Plotkin [10], pp. 266–278 20. Jensen, R.B.: On the consistency of a slight(?) modification of Quine’s NF. Synthese 19, 250–263 (1969) 21. Kamareddine, F., Laan, T., Nederpelt, R.: Types in logic and mathematics before 1940. Bulletin of Symbolic Logic 8(2), 185–245 (2002) 22. Luo, Z.: A type-theoretic framework for formal reasoning with different logical foundations. In: Okada, M., Satoh, J. (eds.) ASIAN 2006. LNCS, vol. 4435, pp. 214–222. Springer, Heidelberg (2006) 23. Quine, W.V.: New foundations for mathematical logic. American Mathematical Monthly 44, 70–80 (1937) 24. Sørensen, M.H., Urzyczyn, P.: Lectures on the Curry-Howard Isomorphism. Elsevier, Amsterdam (2006) 25. Weyl, H.: The Continuum. Dover, Mineola, NY (1994)
Verification of the Redecoration Algorithm for Triangular Matrices Ralph Matthes and Martin Strecker C. N. R. S. et Universit´e Paul Sabatier (Toulouse III) Institut de Recherche en Informatique de Toulouse (IRIT) 118 route de Narbonne, F-31062 Toulouse Cedex 9
Abstract. Triangular matrices with a dedicated type for the diagonal elements can be profitably represented by a nested datatype, i. e., a heterogeneous family of inductive datatypes. These families are fully supported since the version 8.1 of the Coq theorem proving environment, released in 2007. Redecoration of triangular matrices has a succinct implementation in this representation, thus giving the challenge of proving it correct. This has been achieved within Coq, using also induction with measures. An axiomatic approach allowed a verification in the Isabelle theorem prover, giving insights about the differences of both systems.
1
Introduction
Nested datatypes [9] may keep certain invariants (see also the illuminating [11]) even without employing a dependently-typed system where types may also depend on objects, thus e. g., maintaining size information in the types. Redecoration for triangular matrices by means of a nested datatype has first been studied in the case of infinite triangles [2]. Its finitary version has been programmed iteratively in the subsequent journal version [3] and through primitive recursion in Mendler-style [1]. In all these cases, no attempt was made to verify properties other than termination. We put forward the example of redecoration of triangular matrices as a prototypical situation, where nested datatypes yield concise and elegant programs that are verifiable. The price to pay is a more complex framework that is needed in order to formulate the programs and a more complex logical apparatus for verifying them. Moreover, as in all formal verification tasks, a major challenge is to develop an appropriate correctness criterion. We have chosen to give a very precise intuition about the algorithm. Even though this might satisfy the experienced programmer, we felt the need for a subsequent verification against a completely different model: a model that is just based on the ordinary type of lists and thus does not impose the aforementioned complex machinery.
With financial support by the European Union FP6-2002-IST-C Coordination Action 510996 “Types for Proofs and Programs”
M. Miculan, I. Scagnetto, and F. Honsell (Eds.): TYPES 2007, LNCS 4941, pp. 125–141, 2008. c Springer-Verlag Berlin Heidelberg 2008
126
R. Matthes and M. Strecker
Since we chose not to use dependent types, the list-based model speaks about a too large datatype, namely “triangles” that may be quite degenerate. Nevertheless, we may fully profit from tool support for lists that is well-developed in interactive theorem provers. In this case study, we deployed the Coq and the Isabelle proof assistants. Coq has a very strong type system that is fully adequate for representing nested datatypes and reasoning about them. The latest release 8.1 of Coq explicitly supports nested datatypes: definition by pattern-matching, induction principles, . . . Even though Isabelle, which is based on simply-typed lambda-calculus with type variables, does not accept nested datatypes as such, it permits to simulate essential aspects of the development in an axiomatic manner. Two critical aspects can be ensured by the use of theorem provers such as the two systems under study: termination of the algorithm (a non-functional specification) and functional correctness with respect to a chosen correctness criterion, in our case the relation with the list-based model. The axiomatic approach that we had to use in Isabelle cannot ensure termination of the algorithms on nested datatypes and cannot justify their induction schemes from first principles. However, the development of the list-based model is entirely derivable from a small logical core. The main challenge of the verification is to find the right lemmas that allow to get the inductive proof of the simulation theorem through. The whole explanations and the semi-formal development in standard mathematical style of the proof in the next three sections have only been possible with the aid of a proof engineering effort in the proof assistants. Simplification and rewriting are tasks that are error-prone for humans and where the tool support is particularly welldeveloped and helpful. In the light of two complete formalizations in two entirely independent systems, it does not seem necessary to reproduce such proof steps in the main part of this article. The curious reader is invited to consult the full proof scripts that are available online [12]. The article is structured as follows: Section 2 introduces the problem and gives the intuitive justification of redecoration for triangular matrices, viewed as a nested datatype. In Section 3, the list-based model is developed, against which the original model is verified in Section 4. Highlights of the formalizations in the proof assistants are presented in Section 5 for Coq and Section 6 for Isabelle. We conclude in Section 7.
2
Triangular Matrices
The “triangular matrices” of the present article are finite square matrices, where the part below the diagonal has been cut off. Equivalently, one may see them as symmetric matrices where the redundant information below the diagonal has been omitted. The elements on the diagonal play a different role than the other elements in many mathematical applications, e. g., one might require that the diagonal elements are invertible (non-zero). This is modeled as follows: A type E of elements outside the diagonal is fixed throughout, and there is a type of diagonal elements that enters all definitions as a parameter. More technically, if A
Verification of the Redecoration Algorithm for Triangular Matrices
127
is the type of diagonal elements, then Tri A shall denote the type of triangular matrices with A’s on the diagonal and E’s outside. Then, Tri becomes a family of types, indexed over all types, hence a type transformation. Moreover, the different Tri A are inductive datatypes that are all defined simultaneously, hence they are an “inductive family of types” or “nested datatype”. We do not consider empty triangles here. So, the smallest element of Tri A contains a single element that thus is a diagonal element and hence taken from the type A. This is materialized by the datatype constructor sg : A → Tri A, that constructs these “singletons”. Here, A is meant to be a type variable, i. e., a variable of type Set, the universe of computational types. The reader may just conceive that sg were given the quantified type ∀A. A → Tri A. The non-singleton case can be visualized like this: AEEEE AEEE AEE AE A
...
The vertical line cuts the triangle into one element of A and a “trapezium”, with an uppermost row solely consisting of E’s. One might now, for the purpose of having an inductive generation process, decompose that trapezium into the uppermost row and the triangle below, but it would be hard to keep the information that both have the same number of columns (unless making use of dependent types). The approach to be followed is the integration of the side diagonal (i. e., the elements just above the diagonal) into the diagonal. In this way, the trapeziums above are in one-to-one correspondence to triangles as follows: EEE AEE AE A
E E E E A
triangle −→ ←− trapezium
E×A
E E×A
...
E E E×A
E E E E×A
...
The trapezium to the left is then considered as the “trapezium view” of the triangle to the right. Vice versa, the triangle to the right is the “triangle view” of the trapezium to the left. Since we are about to define what triangles are, it is now comfortable to refer to the triangle view of the trapezium in the second datatype constructor constr : A → Tri(E × A) → TriA , again with A a type variable. Hence, the non-singleton triangles are conceived to consist of the topmost leftmost element, taken from A, and a triangle with
128
R. Matthes and M. Strecker
diagonal elements taken from E ×A. Abbreviate Trap A := Tri(E ×A). Therefore, constr : ∀A. A → Trap A → Tri A, but this is just a point of view to refer to the trapezium view of the argument that “is” a triangle. From sg and constr, the inductive family Tri is now fully defined as everything that is finitely generated from these two constructors. As is usual with nested datatypes, one cannot understand one specific family member Tri A for some type A in isolation, but the recursive structure will include Tri(E×A), Tri(E×(E×A)), . . . , hence infinitely many family members (with indices of increasing size). Naturally, also the induction principle for Tri cannot speak about one instance Tri A in isolation. The following induction principle is intuitively justified.1 Given a predicate P that speaks about all triangles with all types of diagonal elements, i. e., P : ∀A. Tri A → Prop, where Prop is the universe of propositions, the aim is to assure that P holds universally, i. e., ∀A∀t : Tri A. PA t holds true. (Here and everywhere, we suppress the type information that A is a type variable, and we write the type argument to P as a subscript.) An inductive proof of this universal statement now only requires a proof of the two following statements: – ∀A∀a : A. PA (sg a). – ∀A∀a : A∀r : Trap A. PE×A r → PA (constr a r). The inductive hypothesis PE×A r refers to the instantiation of the predicate P with type argument E × A. Except from this, the principle is no more difficult in nature than the induction principle for (homogeneous) lists. The redecoration algorithm whose verification is the aim of this article can be described as the binding operation of a comonad2 . In other words, we will organize the material in the form of a comonad for the type transformer Tri that might then be called the redecoration comonad. The function top : ∀A. Tri A → A that computes the top left element is programmed as follows: top(sg a) := a ,
top(constr a r) := a .
This is a simple non-recursive instance of definition by pattern-matching. The function top will be the counit of the comonad that we are defining. Redecoration in general is dual to substitution, see [15]. Following this view, redecoration for triangular matrices will be defined as Kleisli coextension operation: Given types A, B and a function f : Tri A → B – the “redecoration rule” – define the function redec f : Tri A → Tri B. In the formulation of [15], Tri A becomes the type of A-decorated structures. So, the redecoration rule f assigns B-decorations to A-decorated structures, and redec f “co-extends” this to an assignment of Bdecorated structures to A-decorated structures. The intuitive idea of redecoration in the case of triangles is to go recursively through the triangle and to replace each diagonal element by the result of 1 2
A formal justification is provided by Coq, see section 5. No category-theoretic knowledge is required to follow the article. The laws are given in our concrete situation, but they are mentioned to be instances of the general comonad notions.
Verification of the Redecoration Algorithm for Triangular Matrices
129
applying f to the sub-triangle that extends to the right and below the diagonal element. redec : ∀A∀B. (Tri A → B) → Tri A → Tri B, redec f (sg a) := sg(f (sg a)) , ) := constr(f t) (rest(redec f t)) . redec f (constr a r t:=
Here, rest(redec f t) is a meta-notation for the element of type Tri(E × B) yet to be defined. However, already at this stage of definition, the first comonad law becomes apparent: top(redec f t) = f t . It remains to define rest(redec f t) : Tri(E × B) for f : Tri A → B and t = constr a r with a : A and r : Tri(E × A). The type of r “is” a triangle, but, from the description of constr, r ought to be seen in trapezium view in order to follow the above intuition of redecoration. Going back to the illustration on page 127, the uppermost row is to be cut off, then the topmost A be replaced by f applied to the remaining triangle, and then redecoration has to be carried on recursively. Finally, the uppermost row has to be recovered. First, define the operation cut that cuts off the top row from the trapezium view of its argument: cut : ∀A. Trap A → Tri A , cut(sg (e, a)) := sg a , cut(constr (e, a) r) := constr a (cut r) . Here, (e, a) : E × A denotes pairing. Note that r is of type Trap(E × A) in the second clause. The definition principle is thus “polymorphic recursion” where the type parameter can change in the recursive calls. Since it is even a definition that exploits that the argument is not an arbitrary Tri A, it goes beyond the iteration schemes proposed in [3].3 Note that, in the recursive equation for cut, no change of view is necessary since the arguments are always seen as trapeziums. We will define rest(redec f t) just from f and r : Trap A where the latter might be called rest t. In “reality”, r is no trapezium so that a recursive call to redec for r will need a dedicated redecoration rule for the trapezium view, to be obtained from the “original” redecoration rule f in (ordinary) triangle view: f : Trap A → E × B , f r := (fst(top r), f (cut r)) , with fst the left/first projection out of a pair. Note that the target type E × B of f is the type parameter of Tri(E × B) = Trap B. The left component of f r 3
Basically, that article only allows to define iterative functions of type ∀A. Tri A → X A for some type transformation X. Through the use of syntactic Kan extensions for X, this can be relaxed somewhat, and an iterative function (called fcut on page 49 of that article) with the more general type ∀A∀B. (B → E × A) → Tri B → Tri A had to be defined before instantiating it to cut by using the identity on E × A as the functional parameter (hence with B := E × A).
130
R. Matthes and M. Strecker
instructs to keep the leftmost element of the uppermost row in the trapezium view. Note that the original definition of the right component of f r in [2] did not use a cut function but just lifted the second projection via a mapping function for Tri. Correct types do not suffice, and verification would have been welcome to exclude such an error. The operation that associates f with f is named lift, so lift f := f and lift : ∀A∀B. (Tri A → B) → Trap A → E × B. The definition of redecoration is finished by setting rest(redec f t) := redec (lift f ) r, whence (without using the abbreviations), the recursive equation for redec becomes redec f (constr a r) = constr (f (constr a r)) (redec (lift f ) r) , which is the equational form of the reduction behaviour established through Mendler recursion in earlier work [1]. A very typical phenomenon of nested datatypes are recursive functions that take an additional functional parameter – here the f – that is modified during the recursion. The major question is now: Did we come up with the right definition? By fairly easy inductive reasoning, using some auxiliary lemmas about cut and lift, the other two comonad laws can be established for the triple consisting of Tri, top and redec:4 – redec top t = t, – redec (g ◦ (redec f )) t = redec g (redec f t) for f, g, t of appropriate types, namely f : Tri A → B, g : Tri B → C, t : Tri A. In general, ◦ denotes functional composition λgλf λx. g(f x), but is written in infix notation. However, these laws and the textual description do not yet confirm a computational intuition that might have been formed through the experience with simpler datatypes such as lists. Therefore, we will set out to relate the behaviour of redec to a function redecL that does not involve nested datatypes but is based on just the ubiquitous datatype of lists.
3
A List-Based Model
We assume the type transformation List, where for any type A, the type of all finite lists with elements taken from A is List A. Although it is also a family of inductive types, List is not a nested datatype since there is no relation between any List A and List B for A = B in the definition of List. Clearly, such relations 4
We also need that redec is extensional in its function argument, see the discussion in the implementation-related sections.
Verification of the Redecoration Algorithm for Triangular Matrices
131
occur with the usual mapping function map : ∀A∀B. (A → B) → List A → List B that maps its function argument over all the elements of the second argument, but this is only after the definition of List. The list-based representation of triangles is now a simply parameterized family of inductive types, defined explicitly by reference to List: TriL A := List(List E × A) . Any element of some TriL A is a finite list of “columns”, and each column consists of the finite list of E elements above the diagonal and the A element on the diagonal. Note that the argument List E × A of List has to be parenthesized to the left, i. e., as (List E) × A. We visualize an element of TriL A as a generalized triangle, with the A’s still in the diagonal, but always the list of E’s above each diagonal element, with the first element the farthest away from the diagonal. An example with 4 columns would be: E E A
E E E AEE AE A
Triangularity is not expressed since, again, we do not want to make use of dependent types by which this could be controlled through the lengths of the E lists. In order to relate statements about Tri and TriL, we define the “list representation” of triangles of type Tri A as elements of type TriL A. Assume we want the representation for some constr a r with r : Trap A, then a recursive call to the representation function would yield an element of TriL(E × A). So, we would have to push out those E’s within the diagonal elements to the E lists. The columnwise operation is thus: shiftToE : ∀A. List E × (E × A) → List E × A , shiftToE(es, (e, a)) := (es + [e], a) , where + is used to denote list concatenation and [e] is the list that consists of just the element e, while the empty list will be denoted by [ ], and the “cons” operation will be denoted by infix “::”. The mapping with shiftToE changes from “triangle view” to “trapezium view” in the list-based representation: shiftToEs : ∀A. TriL (E × A) → TriL A , shiftToEs := map shiftToE . The list representation of triangles is given iteratively as follows: toListRep : ∀A. Tri A → TriL A , toListRep(sg a) := [([ ], a)] , toListRep(constr a r) := ([ ], a) :: shiftToEs (toListRep r) .
132
R. Matthes and M. Strecker
The intention is to define a notion of redecoration also for the list-based representation, i. e., an operation redecL : ∀A∀B. (TriL A → B) → TriL A → TriL B . However, there will be no proper comonad structure since no counit topL : ∀A. TriL A → A can exist: A could be instantiated by an empty type A0 , and TriL A0 would still not be empty since it contains [ ]. As a preparation for the definition of redecL, more operations on columns are introduced that allow to cut off and restore the topmost E element: removeTopE : ∀A. List E × A → List E × A , removeTopE([ ], a) := ([ ], a) , removeTopE(e :: es, a) := (es, a) , singletonTopE : ∀A. List E × A → List E , singletonTopE([ ], a) := [ ] , singletonTopE(e :: es, a) := [e] , appendEs : ∀A. List E → List E × A → List E × A , := (es + es , a) . appendEs es (es , a) For all pairs p : List E ×A, one has appendEs(singletonTopE p)(removeTopE p) = p. The technical problem here is just that the E list can be empty, and so there is the need for the list with at most one element. These operations can be canonically extended to multiple columns. For oneplace functions, this is done via map, for the two-place function appendEs, the generic zipWith function known from the Haskell programming language (see www.haskell.org) comes into play: zipWith : ∀A∀B∀C. (A → B → C) → List A → List B → List C , zipWith f (a :: 1 ) (b :: 2 ) := f a b :: zipWith f 1 2 , zipWith f 1 [ ] := [ ] , := [ ] . zipWith f [ ] 2 The last auxiliary definitions for redecL are: removeTopEs : ∀A. TriL A → TriL A , removeTopEs := map removeTopE , singletonTopEs : ∀A. TriL A → List(List E) , singletonTopEs := map singletonTopE , zipAppendEs : ∀A. List(List E) → TriL A → TriL A , zipAppendEs := zipWith appendEs . The following definition is by wellfounded recursion over the TriL A argument of redecL. Note that removeTopEs does not change the list length of its argument and that therefore, it is just the list length of the TriL argument of redecL that is smaller in the recursive call. redecL : ∀A∀B. (TriL A → B) → TriL A → TriL B , redecL f [ ] := [ ] , redecL f ((es, a) :: r) := (es, f ((es, a) :: r)) :: zipAppendEs (singletonTopEs r) (redecL f (removeTopEs r)) .
Verification of the Redecoration Algorithm for Triangular Matrices
133
In comparison with redec, this definition contains a new trivial case for the empty list, and the redecoration rule f does not need to be adapted to a trapezium view in the recursive call. Thus, f is just a fixed parameter throughout the recursion, hence, also the type parameters stay fixed. Due to the less rigid constraints on the form in TriL, there may be E’s above the leftmost A. This parameter es is taken into account when evaluating the redecoration rule, but still, only the diagonal elements are modified by the algorithm.
4
Verification Against the List-Based Model
Theorem 1 (Simulation). If E is non-empty, then for all types A, B, terms t : Tri A and f : TriL A → B: redecL f (toListRep t) = toListRep(redec (f ◦ toListRep) t) . This is the most natural theorem that relates redec and redecL through toListRep: If there were an operation topL to turn TriL and redecL into a comonad, this theorem would establish for toListRep one of the two properties of a comonad morphism from Tri to TriL. Unfortunately, it does not reduce redec to redecL or vice versa. The former direction seems already to be hampered by the need for a redecoration rule f : TriL A → B, hence with a much wider domain than prescribed for redec. However, this is not so due to the existence of a left inverse fromListRep of toListRep. As we will see in the main theorem at the end of this section, redec f t can be expressed in terms of redecL, toListRep and fromListRep. For the proof of the simulation theorem, one has to replay cut and lift on the list representations: Define remsh : ∀A. List E × (E × A) → List E × A , remsh := removeTopE ◦ shiftToE . Abbreviate TrapL A := TriL(E × A). Define cutL : ∀A. TrapL A → TriL A , cutL := map remsh , where the operational intuition is just to put the argument in trapezium view and then to cut off the top row. This intuition is met thanks to the functor law for map stating preservation of composition, i. e., map (g ◦ f ) t = map g (map f t). Lemma 1. toListRep(cut r) = cutL(toListRep r) for all A and terms r : Trap A. Proof. This is by induction on Trap, hence a section of Tri. The induction principle is as follows: Given P : ∀A. Trap A → Prop, one concludes its universality, i. e., ∀A∀t : Trap A. PA t from the following two clauses: – ∀A∀a : E × A. PA (sg a). – ∀A∀a : E × A∀r : Trap (E × A). PE×A r → PA (constr a r).
134
R. Matthes and M. Strecker
It should be as intuitive as the Tri induction principle. For formal justifications, see the later sections. The inductive step of the lemma will need (for r : TrapL(E × A)) shiftToEs (cutL r) = cutL (shiftToEs r) , that in turn follows from (for r : List E × (E × (E × A))) shiftToE(remsh r) = remsh(shiftToE r) and the above-mentioned functor law for map.
The analogue liftL of lift can only be defined for non-empty E. We will assume some fixed e0 of type E in the sequel. Define liftL : ∀A∀B. (TriL A → B) → TrapL A → E × B , liftL f [ ] := (e0 , f (cutL [ ])) , liftL f ((es, (e, a)) :: r ) := (e, f (cutL r)) . r:=
The following relation between lift and liftL is a consequence of the preceding lemma. Lemma 2. lift (f ◦ toListRep) r = liftL f (toListRep r) for types A, B and terms f : TriL A → B and r : Trap A. The major obstacle on the way to proving the theorem is the following lemma. Lemma 3 (Main Lemma). For any types A, B and terms f : TriL A → B and r : TrapL A, one has with
shiftToEs (redecL (liftL f ) r) = zipAppendEs 1 2 1 := singletonTopEs(shiftToEs r) : List(List E) , 2 := redecL f (removeTopEs(shiftToEs r)) : TriL B .
Note that r : TrapL A, but that r is nevertheless just a (generalized) triangle. Hence liftL f is the right redecoration rule that treats it in trapezium view. Redecoration will nevertheless produce a (generalized) triangle, so the result is finally transformed into trapezium view. On the right-hand side, r is first made into a (generalized) trapezium, then redecoration is done to the result after cutting off the top row, but then the cut off elements are restored, hence the outcome is also a (generalized) trapezium. Note also that, as argued before, by virtue of the functor law for map, the argument to redecL f in 2 is equal to cutL r. Proof. The function redecL can be understood as being defined by recursion over the list length of its argument, and also the proof can be done by induction on the list length of r. See more details in the following specific sections on Coq and Isabelle how it is done more elegantly.
Verification of the Redecoration Algorithm for Triangular Matrices
Theorem 1 follows from the main lemma by induction on Tri for t.
135
We want to define a left inverse to toListRep. The type ∀A. TriL A → Tri A cannot be inhabited since TriL A is never empty while Tri A inherits emptiness from A. Hence, we will only be able to define a function fromListRep : ∀A. A → TriL A → Tri A such that fromListRep a0 (toListRep t) = t for all a0 : A, t : Tri A. Recall that we have fixed an element e0 of type E. The operation on columns is defined as shiftFromE : ∀A. List E × A → List E × (E × A) , shiftFromE([ ], a) := ([ ], (e0 , a)) , shiftFromE(e :: es, a) := (removelast (e :: es), (last (e :: es), a)) , with functions removelast and last of the meaning suggested by their names. It is easy to establish that, for all pairs p : List E × (E × A), one has shiftFromE(shiftToE p) = p . This is extended to an operation on (generalized) triangles: shiftFromEs : ∀A. TriL A → TrapL A , shiftFromEs := map shiftFromE , and shiftFromEs(shiftToEs r) = r for all r : TrapL A follows from the respective result on columns, the two functor laws for map (hence, also map (λx.x) l = l) and extensionality of map in its function argument.5 The definition of fromListRep is by wellfounded recursion over the TriL A argument, and as for redecL, this is justified by the decrease of the list length of this argument in the recursive calls. However, unlike the situation of redecL, we need polymorphic recursion in that the function at type A calls itself at type E × A: := sg a0 , fromListRep a0 [ ] := sg a , fromListRep a0 [(es, a)] fromListRep a0 ((es, a) :: p :: r) := constr a fromListRep (e0 , a0 ) (shiftFromEs(p :: r)) . Lemma 4 (left inverse). For any type A, terms a0 : A and t : Tri A, one has fromListRep a0 (toListRep t) = t. Proof. By a use of the induction principle for Tri, exploiting the fact that any toListRep t is a non-empty list, hence of the form p :: r. Theorem 2 (Main Theorem). If E is non-empty, then for all types A, B, and terms a0 : A, b0 : B, f : Tri A → B and t : Tri A: redec f t = fromListRep b0 redecL (f ◦ (fromListRep a0 )) (toListRep t) . Proof. An immediate consequence of the simulation theorem and the preceding lemma, by using extensionality of redec in its functional argument once more. 5
See the discussion on extensionality in Section 5.
136
5
R. Matthes and M. Strecker
Details on Formal Verification with Coq
In this section, a Coq development of the mathematical contents of the last three sections is discussed. The Coq vernacular file can be found at the web site [12]. We mentioned above that the Coq system [10]6 has a genuine pattern-matching support for nested datatypes like Tri since version 8.1, contributed by Christine Paulin. In version 8.0, there were subtle problems because such datatypes could only be specified through datatype constructors with universally quantified types that had to live in the universe Set as well, hence Set had to be made impredicative by an option to the Coq runtime system. The following remarks concern Coq 8.1 at patch level 3, released in December 2007. The nested datatype (a. k. a. inductive family) Tri is introduced as follows: Inductive Tri (A:Set) : Set := sg : A -> Tri A | constr : A -> Tri (E * A) -> Tri A. Then, the appropriate induction principle is automatically generated, and one can check its type: Check Tri_ind : forall P : forall A : Set, Tri A -> Prop, (forall (A : Set) (a : A), P A (sg a)) -> (forall (A : Set) (a : A) (r : Tri (E * A)), P (E * A) r -> P A (constr a r)) -> forall (A : Set) (t : Tri A), P A t. This is exactly the induction principle of Section 2. However, the induction principle for Trap in Section 4 seems to need a (straightforward) proof via the fix construction for structurally recursive functions/proofs. The definition of redecL by recursion over a measure and reasoning about redecL by “measure induction” uses an experimental feature of Coq 8.1 (one has to load separately the package Recdef), provided by Pierre Courtieu, Julien Forest and Yves Bertot [4,5,6]. Function redecL (A B:Set)(f:TriL A -> B)(t: TriL A) {measure length t} : TriL B := match t with nil => nil | (es,a)::rest => (es,f((es,a)::rest)):: zipAppendEs (singletonTopEs rest) (@redecL A B f (removeTopEs rest)) end. The fact that the length is a measure that decreases in the recursive call has to be proven in order to get Coq to accept this as a definition. Thanks to the explicit form @redecL A B that reveals the Church-style syntax that underlies
6
We will only presuppose concepts and features of Coq that are explained in the Coq textbook [8].
Verification of the Redecoration Algorithm for Triangular Matrices
137
Coq although it is hidden from the user by the mechanism of implicit arguments, measure induction even works with this polymorphic function. Coq automatically generates an induction principle redecL ind, called functional induction, that allows to argue about values of redecL directly along the recursive call scheme of its definition. The induction hypothesis is prepared with the argument removeTopEs rest, and there is no need to redo the justification by means of the decreasing length again. The proof of the main lemma is then an instance of redecL ind, and this is interactively initiated by functional induction (redecL (liftL e0 f) r). In a simpler form, functional induction is used for the analysis of zipWith that, despite being structurally recursive in both list arguments, also profits from being defined by the “Function” command that again prepares the induction hypotheses, and this has already been available in Coq for years now. However, even the current extensions to functional induction in Coq 8.1 patch level 3 do not cover the definition of fromListRep because it combines recursion with decreasing measure with polymorphic recursion. In the development version of Coq, a proposal by Julien Forest works well where the type A and the elements a0 : A and t : TriL A are encapsulated in a record, see [12]. Our solution consists in defining an auxiliary function with an additional parameter n of type nat by ordinary recursion on n and then fixing n to the length of t. This works very well because the list length is just one less in the recursive call and because the proof of Lemma 4 only needs the defining equations of fromListRep immediately preceding that lemma. In the middle of the proof of the second comonad law, redec top t = t, we have to prove redec top t = redec (lift top) t. It was easy to prove ∀r. lift top r = top r before that. It would be easy to conclude if this implied lift top = top, but this typically cannot be done in intensional type theory to which the underlying system of Coq belongs, namely the Calculus of Inductive Constructions. But we do not even need that equality since, in general, redec f t only depends on the values of f (the “extension” of f ) and not its definition (or “intension”). More precisely, one can show by Tri induction on t that redec is “extensional”: ∀f f . (∀t . f t = f t ) → redec f t = redec f t . This property is also needed for the proofs of the third comonad law, the simulation theorem and the main theorem, and the analogous property for map enters the proof of the main lemma, its auxiliary lemmas and the proof that shiftFromEs is a left-inverse of shiftToEs.
6
Details on Formal Verification with Isabelle
We are going to sketch an alternative to the Coq implementation, described in the previous section. This is done within the system Isabelle (more precisely,
138
R. Matthes and M. Strecker
Isabelle 2007 of November 2007), and the script with the theory development is also available from the web site [12]. The type system of Isabelle is less expressive than the type system of Coq: it is a simply typed lambda-calculus with ML-style polymorphism [13]. Type parameters of polymorphic functions need not be supplied explicitly, but can be inferred by the system, and universal quantification over types on the top-level is provided through schematic type variables. The datatype definition mechanism currently implemented in Isabelle is described in more detail in [7]. To be consistent with the Coq formalization, we would like to fix a type constant E by declaring “typedecl E” and define the polymorphic tri datatype as follows: datatype ’a tri
=
sg (’a)
|
constr (’a) ((E * ’a) tri)
As spelled out in Section 2, in the resulting induction principle ∀ P. (∀ a. P (sg a)) −→ (∀ a r. P r −→ P (constr a r)) −→ ∀ t. P t the universally quantified induction predicate P would then be applied both to a (E * ’a) tri and a ’a tri, thus overstraining Isabelle’s type system. Therefore, such a datatype definition is not valid in Isabelle. We circumvent this and related problems by not conceiving sg and constr as constructors of an inductive type, but just as constants declared by consts sg :: ’a ⇒ (’a,’e) tri constr :: ’a ⇒ (’a, ’e) trap ⇒ (’a, ’e) tri As above, (’a, ’e) trap abbreviates (’e * ’a,’e) tri. (And the fixed parameter E is replaced by a second type parameter ’e. For the whole theoretical development, this difference does not play any role, but it facilitates concrete programming examples that are also provided in the Isabelle script.) For carrying out proofs, we have to provide appropriate instances of the induction predicate. In order to obtain the desired computational behaviour, we manually have to add reduction rules, as will be shown in the following. As an example, take the cut function of Section 2. We declare the function cut by: consts cut :: (’a, ’e) trap ⇒ (’a, ’e) tri The primitive-recursive function definition is accomplished by providing the following characteristic equations: axioms
cut_sg [simp]: cut (sg (e,a)) = sg a cut_constr [simp]: cut (constr (e,a) r) = constr a (cut r)
Note that in the second equation, cut is applied to expressions of different types: on the left, to a term of type ’a tri, on the right, to a term (’a,’e) tri. Here,
Verification of the Redecoration Algorithm for Triangular Matrices
139
we exploit an essential difference between a universally quantified variable (as in the induction predicate above), which can only be applied to elements of the same type, and a globally declared constant such as cut, which can be applied to instances of different type. This distinction is reminiscent of the difference, in an ML-style type system, between the term λid : a ⇒ a. λf : nat ⇒ bool ⇒ nat. f (id 0) (id T rue) (which is not well-typed) and let id = (λx : a. x) in λf : nat ⇒ bool ⇒ nat. f (id 0) (id T rue) (which is). Of course, this axiomatization does not provide the guarantees of a genuine primitive recursive definition, such as termination. As mentioned above, for typing reasons, we cannot state a general induction principle. We can, however, exploit the same mechanism as for function definitions and provide instances of the induction principle for proving individual theorems. We illustrate the procedure for the proof of the following (where # is the “cons” operation and snd is the right/second projection out of a pair) lemma toListRep_cons_inv: toListRep t = a # list −→ top t = snd a We notice that the proof can be carried out using the following instance of the induction predicate: (∀a. P1 (sg a)) −→ (∀a r. P1 r −→ P1 (constr a r)) −→ ∀t. P1 t where P1 is defined as λt. (∀ a list. toListRep t = a # list −→
top t = snd a)
The proof of the lemma is now very easy: unfold the definition of P1 and carry out elementary term simplification. Altogether, the proof of Theorem 1 requires four instances of the induction schema. This approach is not difficult, but suffers from the well-known drawbacks of code duplication: it is error-prone and the resulting theories are hard to maintain. This is even more true since, for the proof of Lemma 3, in order to get the induction through, we have to quantify over the function f as well, and this implicitly requires to quantify over its additional type variable B. Since the latter quantification cannot be expressed, we cannot just use the above induction axiom for P1 with the respective new predicate in place of P1 but have to copy its definition to the four occurrences in the induction formula, giving rise to the axiom Tri ind MAIN appl2 in the Isabelle script. Even though it is possible in principle to generate the required induction schemas, the discussion shows that
140
R. Matthes and M. Strecker
the result tends to be artificial and, by excessive code duplication, contrary to good practice. On the good side, the difference between the induction principles for Tri and Trap becomes invisible in this approach while in Coq, the former is provided and the latter has to be defined by structural recursion. In the list-based model, we enjoy the full support from Isabelle for datatypes, here for lists. The proof of the main lemma can just follow the structure of the recursive calls in the definition, expressed in the generated theorem redecL.induct that is a version of the respective “functional induction” scheme in Coq, without dependent types and hence without the need to reference redecL in it. This functionality has been developed by Konrad Slind [14]. Also note that proving extensionality of redec in its function argument becomes a triviality in Isabelle, thanks to its rule expand fun eq that assumes (?f = ?g) = (∀ x. ?f x = ?g x), i. e., functions are equal if and only if they are point-wise equal. Finally, we remark that Isabelle’s type system only allows inhabited types, hence the type parameters only range over nonempty types. Thus, all the elements denoted by a0 , b0 and e0 in our informal description (and present in the Coq development) could have been obtained by the ε operator and therefore do not show up in the Isabelle scripts [12].
7
Conclusions
This article has presented a mathematical formalization of redecoration in triangular matrices by means of a nested datatype. Redecoration provides a comonad structure for this datatype. Moreover, we have established a precise relationship with a model that is only based on lists. For its verification, we have contrasted two formalizations in the proof assistants Coq and Isabelle and discussed their different approaches, in particular recursion and induction that do not just follow the datatype definition. An important difficulty has been the necessity of polymorphic recursion, but this is intrinsic to nested datatypes. We would hope for some Isabelle extension with a full support of nested datatypes, i. e., where induction axioms and equational specifications of recursive functions are generated and justified in the kernel of Isabelle, just as in the existing datatype package. Interesting future work would treat the original infinite triangular matrices of [2,3] or even specify and verify a datatype-generic definition of redec. Acknowledgements. With the help of Stefan Berghofer, we overran the more subtle problems with variables of different kinds in Isabelle. Mamoun Filali has provided valuable suggestions for the elimination of functional induction in the Coq development as an alternative to Julien Forest’s construction that is no longer supported by the current Coq version. The referees’ suggestions helped to substantially strengthen the main theorem.
Verification of the Redecoration Algorithm for Triangular Matrices
141
References 1. Abel, A., Matthes, R.: Fixed points of type constructors and primitive recursion. In: Marcinkowski, J., Tarlecki, A. (eds.) CSL 2004. LNCS, vol. 3210, pp. 190–204. Springer, Heidelberg (2004) 2. Abel, A., Matthes, R., Uustalu, T.: Generalized iteration and coiteration for higherorder nested datatypes. In: Gordon, A.D. (ed.) FOSSACS 2003. LNCS, vol. 2620, pp. 54–68. Springer, Heidelberg (2003) 3. Abel, A., Matthes, R., Uustalu, T.: Iteration and coiteration schemes for higherorder and nested datatypes. Theoretical Computer Science 333, 3–66 (2005) 4. Balaa, A., Bertot, Y.: Fix-point equations for well-founded recursion in type theory. In: Aagaard, M.D., Harrison, J. (eds.) TPHOLs 2000. LNCS, vol. 1869, pp. 1–16. Springer, Heidelberg (2000) 5. Barthe, G., Courtieu, P.: Efficient reasoning about executable specifications in Coq. In: Carre˜ no, V.A., Mu˜ noz, C.A., Tahar, S. (eds.) TPHOLs 2002. LNCS, vol. 2410, pp. 31–46. Springer, Heidelberg (2002) 6. Barthe, G., Forest, J., Pichardie, D., Rusu, V.: Defining and reasoning about recursive functions: A practical tool for the Coq proof assistant. In: Hagiya, M., Wadler, P. (eds.) FLOPS 2006. LNCS, vol. 3945, pp. 114–129. Springer, Heidelberg (2006) 7. Berghofer, S., Wenzel, M.: Inductive datatypes in HOL - lessons learned in formallogic engineering. In: Bertot, Y., Dowek, G., Hirschowitz, A., Paulin, C., Th´ery, L. (eds.) TPHOLs 1999. LNCS, vol. 1690, pp. 19–36. Springer, Heidelberg (1999) 8. Bertot, Y., Cast´eran, P.: Interactive Theorem Proving and Program Development. Coq’Art: The Calculus of Inductive Constructions. Texts in Theoretical Computer Science. Springer, Heidelberg (2004) 9. Bird, R., Meertens, L.: Nested datatypes. In: Jeuring, J. (ed.) MPC 1998. LNCS, vol. 1422, pp. 52–67. Springer, Heidelberg (1998) 10. Coq Development Team: The Coq Proof Assistant Reference Manual Version 8.1. Project LogiCal, INRIA (2006), System available at: http://coq.inria.fr 11. Hinze, R.: Manufacturing datatypes. Journal of Functional Programming 11, 493– 524 (2001) 12. Matthes, R., Strecker, M.: Coq and Isabelle development for Verification of the Redecoration Algorithm for Triangular Matrices (2007), http://www.irit.fr/∼ Ralph.Matthes/CoqIsabelle/TYPES07/ 13. Nipkow, T., Paulson, L.C., Wenzel, M.T.: Isabelle/HOL. LNCS, vol. 2283. Springer, Heidelberg (2002) 14. Slind, K.: Wellfounded schematic definitions. In: McAllester, D. (ed.) CADE 2000. LNCS, vol. 1831, pp. 45–63. Springer, Heidelberg (2000) 15. Uustalu, T., Vene, V.: The dual of substitution is redecoration. In: Hammond, K., Curtis, S. (eds.) Trends in Functional Programming 3, Intellect, Bristol / Portland, OR, pp. 99–110 (2002)
A Logic for Parametric Polymorphism with Effects Rasmus Ejlers Møgelberg and Alex Simpson LFCS, School of Informatics, University of Edinburgh Abstract. We present a logic for reasoning about parametric polymorphism in combination with arbitrary computational effects (nondeterminism, exceptions, continuations, side-effects etc.). As examples of reasoning in the logic, we show how to verify correctness of polymorphic type encodings in the presence of effects.
1
Introduction
Strachey [11] defined a polymorphic program to be parametric if it applies the same uniform algorithm across all of its type instantiations. Parametric polymorphism has proved to be a very useful programming language feature. However, the informal definition of Strachey does not lend itself to providing methods of verifying properties of polymorphic programs. Reynolds [10] addressed this by formulating the mathematical notion of relational parametricity, in which the uniformity in Strachey’s definition is captured by requiring programs to preserve certain relations induced by the type structure. In the context of pure functional polymorphic languages, such as the second-order lambda-calculus, relational parametricity has proven to be a powerful principle for establishing abstraction properties, proving equivalence of programs and inferring useful properties of programs from their types alone [12]. Obtaining a useful and indeed consistent formulation of relational parametricity becomes trickier in the presence of computational effects (nondeterminism, exceptions, side-effects, continuations, etc.). Even the addition of recursion (and hence possible nontermination) to the second-order lambda-calculus causes difficulties. For this special case, Plotkin proposed second-order intuitionistic-linear type theory as a suitable framework for formulating relational parametricity [9]. This framework has since been developed by the first author and colleagues [2], but it does not adapt to general effects. Recently, the authors have developed a more general framework that is appropriate for modelling parametric polymorphism in combination with arbitrary computational effects [4]. The framework is based on a custom-built type theory PE for combining polymorphism and effects, which is strongly influenced by Moggi’s computational metalanguage [6], and Levy’s call-by-push-value calculus [3]. As presented in [4], the type theory is interpreted in relationallyparametric models developed within the context of an intuitionistic set theory as the mathematical meta-theory. While this approach provides an efficient M. Miculan, I. Scagnetto, and F. Honsell (Eds.): TYPES 2007, LNCS 4941, pp. 142–156, 2008. c Springer-Verlag Berlin Heidelberg 2008
A Logic for Parametric Polymorphism with Effects
143
framework for building models, the underlying principles for reasoning about the combination of parametricity and effects are left buried amongst the (considerable) semantic details. The purpose of the present article is to extract the logic for parametricity with effects that is implicit within these models, and to give a self-contained presentation of it. In particular, no understanding of the semantic setting of [4] is required. The logic we present, builds on Plotkin and Abadi’s logic for parametric polymorphism in second-order lambda-calculus [7], and is influenced by the existing refinements of this logic to linear type theory and recursion [9,2]. The logic is built over the type theory PE, presented by the authors in [4]. As in Levy’s callby-push-value (CBPV) calculus [3], the calculus PE has two kinds of types: value types (whose elements are static values) and computation types (whose elements are dynamic effect-producing computations. The type theory allows for polymorphic quantification over value types as well as over computation types. A central result in [4] is that the algebraic operations that cause effects (as in [8]) can be given polymorphic types and satisfy a parametricity principle. For example, in a type theory for polymorphism and nondeterminism, the nondeterministic choice operation has polymorphic type ∀X. X → X → X, where X ranges over all computation types. An essential ingredient in the logic we present is the division of relations into value relations and computation relations. The latter generalise the notion of admissible relations that arise in the theory of parametricity and recursion [2]. To see why such a notion is necessary for the formulation of a consistent theory of parametricity, consider the type ∀X. X → X → X of a binary nondeterministic choice operation, as above. Relational parametricity, states that for all computation types X, Y and all relations R between them, any operation of the above type must preserve R. If R were to range over arbitrary relations, then only the first and second projections would satisfy this condition, and so algebraic operations (such as nondeterministic choice) would not count as parametric. This is why a restricted class of computation relations is needed. Such relations can be thought of as relations that respect the computational structure. This paper makes two main contributions. The first is the formulation of the logic itself, which is given in Section 3. Here, our goal is to present the logic in an intelligible way, and we omit the (straightforward) proofs of the basic properties of the logic. Our second contribution is to use the logic to formalize correctness arguments for the type theory PE. In particular, we verify that our logic for parametricity with effects proves desired universal properties for several polymorphically-defined type contructors, including existential and coinductive computation types. For this, we include as much detail as space permits.
2
A Type Theory for Polymorphism and Effects
This section recalls the type theory PE for polymorphism and effects as defined and motivated in [4]; see also [5] for an application. As mentioned in the introduction, like CBPV [3], PE has two collections of types: value types and
144
R.E. Møgelberg and A. Simpson Γ, x : B | Δ t : C
Γ, x : B | − x : B
Γ | Δ λx : B. t : B → C
Γ |Δ t: B Γ | Δ ΛX. t : ∀X. B
X ∈ FTV(Γ, Δ)
Γ |x: A t: B Γ |x: A x: A
Γ |Δ s: B → C
Γ | Δ s(t) : C Γ | Δ t : ∀X. B Γ | Δ t(A) : B[A/X] Γ |− s: A B
◦
Γ | − λ x : A. t : A B
Γ |Δ t: B Γ | Δ ΛX. t : ∀X. B
X ∈ FTV(Γ, Δ)
Γ |− t: B
Γ |Δ t: A
Γ | Δ s(t) : B Γ | Δ t : ∀X. B Γ | Δ t(A) : B[A/X]
Fig. 1. Typing rules for PE
computation types. We follow Levy’s convention of distinguishing syntactically between the two by underlining computation types as in A, B, C . . .. The calculus PE has polymorphic quantification over both value types and computation types, with type variables denoted X, Y . . . and X, Y . . . respectively. Value types and computation types are defined by the grammar A, B ::= X | A → B | ∀X. A | X | ∀X. A | A B A, B ::= A → B | ∀X. A | X | ∀X. A . Note that the computation types form a subcollection of the value types. One semantic intuition is that value types are sets and computation types are algebras for some computational monad in the sense of Moggi [6]. In such a model, is modelled by the collection of algebra homomorphisms, a set which does not in general carry a natural algebra structure and is thus a value type in PE, and the inclusion of computation types into value types is modelled by the forgetful functor mapping an algebra to its carrier. We refer the interested reader to [4] for a discussion of such models in detail. Typing judgements of PE are of the form Γ | Δ t : A where Γ is an ordinary context of variables, and Δ is a second context called the stoup subject to the following conditions: either Δ is empty or it is of the form Δ = z : B, in which case A is also required to be a computation type. The semantic intuition for the second case is that t denotes an algebra homomorphism from B to A. The typing rules are presented in Figure 1. In them, Γ | − t : A denotes a judgement with empty stoup, and the operation FTV returns the set of free type variables, which is defined in the obvious way. Note the following consequence of the typing rules: if Γ | z : A t : B is well typed, then so is Γ, z : A | − t : B. Terms of PE are identified up to α-equivalence as usual. Although the calculus PE has just a few primitive type constructors, a wide range of derived type constructors, on both value types and computation types, can be encoded using polymorphism.
A Logic for Parametric Polymorphism with Effects
145
1 =def ∀X. X → X A × B =def ∀X. (A → B → X) → X
(X ∈ FTV(A,B))
0 =def ∀X. X A + B =def ∀X. (A → X) → (B → X) → X
(X ∈ FTV(A,B))
∃X. B =def ∀Y. (∀X. (B → Y )) → Y
(Y ∈ FTV(B))
∃X. B =def ∀Y. (∀X. (B → Y )) → Y
(Y ∈ FTV(B))
μX. B =def ∀X. (B → X) → X
(X +ve in B)
νX. B =def ∃X. (X → B) × X
(X +ve in B)
Fig. 2. Definable value types !B =def ∀X. (B → X) → X
(X ∈ FTV(B))
◦
1 =def ∀X. 0 → X A ×◦ B =def ∀X. ((A X) + (B X)) → X
(X ∈ FTV(A,B))
0◦ =def ∀X. X A ⊕ B =def ∀X. (A X) → (B X) → X B· A =def ∀X. (B → A X) → X ◦
(X ∈ FTV(A,B)) (X ∈ FTV(B,A))
∃ X. A =def ∀Y . (∀X. (A Y )) → Y
(Y ∈ FTV(A))
∃◦ X. A =def ∀Y . (∀X. (A Y )) → Y
(Y ∈ FTV(A))
◦
μ X. A =def ∀X. (A X) → X
(X +ve in A)
ν ◦ X. A =def ∃◦ X. (X A)· X
(X +ve in A)
Fig. 3. Definable computation types
Since value types extend second-order lambda-calculus, the polymorphic type encodings known from that case can be used for type encodings on value types in PE. Figure 2 recalls these type encodings and also shows how to encode existential quantification over computation types. Note that the encodings of inductive and coinductive types require a positive polarity of the type variable X. This notion is defined in the standard way, cf. Section 5. Figure 3 describes polymorphic encodings of a number of constructions on computation types. The first of these is the free computation type !B on a value type B. This plays the role of the monad in Moggi’s computational lambdacalculus [6], or more precisely of the F constructor in CBPV [3] (for further details, see [4]). The next constructions are unit, product, initial object and binary coproduct of computation types. Thetype B· A is the B-fold copower of A, and thus can be thought of as a coproduct x∈B A of computation types indexed by a value type. The remaining constructions are existential quantification over value types and computation types, packaged up as computation types, and inductive and coinductive computation types. We remark that the somewhat exotic-looking types appearing in this figure do have applications. For example,
146
R.E. Møgelberg and A. Simpson
in forthcoming work, we shall demonstrate an application to giving a (linear) continuation-passing translation of Levy’s CBPV. In Section 3 below we formulate a logic for reasoning about relational parametricity in PE. The main applications of this logic, in Sections 4 and 5, will be to verify the correctness of a selection of the above type encodings.
3
The Logic
This section presents the first main contribution of the paper, our logic for reasoning about parametricity in PE. As mentioned in the introduction, this logic has been extracted as a formalization of the reasoning principles validated by the relationally-parametric models of PE described in [4]. The purpose of this paper, however, is to give a self-contained presentation of the logic without reference to [4]. The idea is that the logic can be understood independently of its (somewhat convoluted) models. In order to be well-typed, propositions are defined in contexts of relation variables and term variables, denoted Θ and Γ respectively in the meta-notation. As mentioned in the introduction, the logic has two classes of relations: value relations between value types and computation relations between computation types. We use the notations Relv (A, B) and Relc (A, B) for the collections of all value relations between A and B and all computation relations between A and B respectively. The formation rules for propositions are given in Figure 4. In the figure, the notation Rel− (A, B) is used in some rules. In these cases the rule holds for both value relations and computation relations, and so is a shorthand for two rules. Note that we only include connectives and quantifiers from the negative fragment of intuitionistic logic. Although the others could be included in principle, we shall not need them, and so omit them for space reasons. The formation rules for relations are given in Figure 5. Relations are closed under: conjunctions, universal quantification and under implications whose antecendent does not depend on the variables being related. This last restriction is motivated by the models considered in [4]. A similar condition is required on admissible relations in [2]. For the purposes of the present paper, this condition should just be accepted as a syntactic condition that needs to be adhered to when using the logic. Lemma 1. 1. If ρ : Relc (A, B) in some context then also ρ : Relv (A, B) in the same context. 2. If Γ | − f : A → B then Θ ; Γ (x : A, y : B). f (x) = y : Relv (A, B) for any relational context Θ. 3. If Γ | − g : A B then Θ ; Γ (x : A, y : B). g(x) = y : Relc (A, B). We write f and g for the relations of items 2 and 3 in the lemma, and call these relations graphs. We use eqA for the graph of the identity function on A. Since relations ρ are always of the form (x : A, y : B). φ we can use the metanotation ρ(t, u) for φ[t, u/x, y] whenever Γ | − t : A and Γ | − u : B.
A Logic for Parametric Polymorphism with Effects
Γ |− t: A
Γ |− t: A
Γ |− u: A
R : Rel− (A, B) ∈ Γ
Θ ; Γ R(t, u) : Prop
Θ ; Γ t =A u : Prop Θ ; Γ φ : Prop
Γ |− u: B
147
Θ ; Γ ψ : Prop
Θ ; Γ φ ψ : Prop
Θ ; Γ, x : A φ : Prop
∈ {∧, ⊃}
Θ, R : Rel− (A, B) ; Γ φ : Prop
Θ ; Γ φ : Prop
Θ ; Γ ∀R : Rel− (A, B). φ : Prop
Θ ; Γ ∀X. φ : Prop
Θ ; Γ ∀x : A. φ : Prop ( )
Θ ; Γ φ : Prop Θ ; Γ ∀X. φ : Prop
( )
Fig. 4. Typing rules for propositions. Here ( ) is the side condition X ∈ / FTV(Θ, Γ ) and − ranges over {v, c}.
Γ, x : A | − t : C
Γ, x : A | − t : A
Γ, y : B | − u : C
Θ, R : Rel− (A, B) ; Γ (x : A , y : B ). R(t, u) : Relv (A , B )
Θ ; Γ (x : A, y : B). t = u : Relv (A, B) Γ |x: A t: C
Γ, y : B | − u : B
Γ | x : A t : A
Γ |y : B u: C
Γ | y : B u : B
Θ, R : Relc (A, B); Γ (x : A , y : B ). R(t, u) : Relc (A , B )
Θ ; Γ (x : A, y : B). t = u : Relc (A, B) Θ ; Γ, z : C (x : A, y : B). φ : Rel− (A, B)
Θ, R : Rel= (C, C ) ; Γ (x : A, y : B). φ : Rel− (A, B)
Θ ; Γ (x : A, y : B). ∀z : C. φ : Rel− (A, B)
Θ ; Γ (x : A, y : B). ∀R : Rel= (C, C ). φ : Rel− (A, B)
Θ ; Γ (x : A, y : B). φ : Rel− (A, B)
Θ ; Γ (x : A, y : B). φ : Rel− (A, B)
Θ ; Γ (x : A, y : B). ∀X. φ : Rel− (A, B)
Θ ; Γ (x : A, y : B). ∀X. φ : Rel− (A, B)
Θ ; Γ ψ : Prop
Θ ; Γ (x : A, y : B). φ : Rel− (A, B)
Θ ; Γ (x : A, y : B). ψ ⊃ φ : Rel− (A, B)
Fig. 5. Typing rules for relations. In these rules −, = range over {v, c}.
Similarly we can write ρop : Rel− (B, A) for (y : B, x : A). φ. If ρ is a value relation then so is ρop , and likewise for computation relations. Deduction sequents are written on the form Θ ; Γ | Φ ψ where Φ is a finite set of formulas. A deduction sequent is well-formed if Θ ; Γ ψ : Prop and Θ ; Γ φi : Prop for all φi in Φ, and we shall assume well-formedness whenever writing a deduction sequent. The rules for deduction in the logic are presented in Figure 6, to which should be added the rules for β and η equality as in Figure 7, and the usual rules for implication, conjunction, which we have omitted for reasons of space. The rules for equality implement a congruence relation on terms (the congruence rules not explicit in Figure 6 can be derived from the equality elimination rule). An important application of the logic is to prove equalities between terms. For terms Γ | Δ s : A and Γ | Δ t : A, we write Γ |Δ s = t: A
148
R.E. Møgelberg and A. Simpson Θ ; Γ, x : A | Φ ψ Θ ; Γ | Φ ∀x : A. ψ
x∈ / FV(Φ)
Θ ; Γ | Φ ∀x : A. ψ
Θ ; Γ | Φ ψ[t/x]
Θ, R : Rel− (A, B) ; Γ | Φ ψ Θ ; Γ | Φ ∀R : Rel− (A, B). ψ Θ ; Γ | Φ ∀R : Rel− (A, B). ψ
Γ |− t: A
− ∈ {v, c}
Θ ; Γ (x : A, y : B). φ : Rel− (A, B)
Θ ; Γ | Φ ψ[φ[t, u/x, y]/R(t, u)] Θ ; Γ |Φ ψ Θ ; Γ | Φ ∀X. ψ Θ ; Γ |Φ ψ Θ ; Γ | Φ ∀X. ψ
Θ ; Γ | Φ ∀X. ψ
X∈ / FTV(Θ, Γ, Φ)
Θ ; Γ | Φ ψ[A/X] Θ ; Γ | Φ ∀X. ψ
X∈ / FTV(Θ, Γ, Φ)
Θ ; Γ | Φ ψ[A/X]
Θ ; Γ |Φ t = u
Γ |− t: A Θ ; Γ |Φ t = t
Θ ; Γ | Φ φ[t/x]
Θ ; Γ | Φ φ[u/x]
Γ, x : B | − t, u : C
Θ ; Γ, x : B | Φ t = u
Θ ; Γ | Φ λx : B. t = λx : B. u Γ | − t, u : B
Θ ; Γ |Φ t = u
Θ ; Γ | Φ ΛX. t = ΛX. u Γ | x : A t, u : B
Θ ; Γ, x : A | Φ t = u
Θ ; Γ | Φ λ x : A. t = λ◦ x : A. u Θ ; Γ |Φ t = u
Θ ; Γ | Φ ΛX. t = ΛX. u
x ∈ FV(Φ)
X ∈ FTV(Γ, Δ, Φ)
◦
Γ | − t, u : B
− ∈ {v, c}
x ∈ FV(Φ)
X ∈ FTV(Γ, Δ, Φ)
Fig. 6. Deduction rules
(although we shall often omit the type A) as notation for the deduction sequent − ; Γ, Δ t = u. Thus Γ | Δ s = t : A and Γ, Δ | − s = t : A are equivalent. This corresponds to the faithfulness of the forgetful functor from computation types to value types in the semantic models of [4]. A related fact, is that the canonical map of type (A B) → A → B given by the term λf : A B. λx : A. f (x) is injective, which is derivable using the lemma below. Lemma 2. The following extensionality schemas are provable in the logic. ∀f, g : A → B. (∀x : A. f (x) = g(x)) ⊃ f = g ∀f, g : A B. (∀x : A. f (x) = g(x)) ⊃ f = g ∀x, y : (∀X. A). (∀X. x X = y X) ⊃ x = y ∀x, y : (∀X. A). (∀X. x X = y X) ⊃ x = y
A Logic for Parametric Polymorphism with Effects (λx : A. t)(u) = t[u/x] ◦
(λ x : A. t)(u) = t[u/x]
λx : A. t(x) = t ◦
λ x : A. t(x) = t
149
if t : A → B and x ∈ / FV(t) if t : A B and x ∈ / FV(t)
(ΛX. t) A = t[A/X]
ΛY . t Y = t
if t : ∀X. A and Y ∈ / FTV(t)
(ΛX. t) A = t[A/X]
ΛY. t Y = t
if t : ∀X. A and Y ∈ / FTV(t)
Fig. 7. β, η rules for PE Xi [ρ, ρ] = ρi X j [ρ, ρ] = ρj (A → B)[ρ, ρ] = (f : (A → B)(C, C), g : (A → B)(C , C )). ∀x : A(C, C), ∀y : A(C , C ). A[ρ, ρ](x, y) ⊃ B[ρ, ρ](f (x), g(y)) (A B)[ρ, ρ] = (f : (A B)(C, C), g : (A B)(C , C )). ∀x : A(C, C), ∀y : B(C , C ). A[ρ, ρ](x, y) ⊃ B[ρ, ρ](f (x), g(y)) (∀X. A)[ρ, ρ] = (x : ∀X. A[C, C], y : ∀X. A[C, C]). ∀Y, Z. ∀R : Relv (Y, Z). A[ρ, ρ, R](x Y, y Z) (∀X. A)[ρ, ρ] = (x : ∀X. A[C, C], y : ∀X. A[C, C]). ∀Y , Z. ∀R : Relc (Y , Z). A[ρ, ρ, R](x Y , y Z)
Fig. 8. Relational interpretation of types
We now come to the crucial relational interpretation of types, needed to define relational parametricity. Suppose A is a type such that FTV(A) ⊆ {X, X} (using bold font for vectors), and ρ and ρ are vectors of relations of the same lengths as X and X respectively such that Θ ; Γ ρi : Relv (Ci , Ci ) for each i indexing an element of X, and Θ ; Γ ρj : Relc (Cj , Cj ) for each j indexing an element of
X. We define A[ρ, ρ/X, X] : Relv (A[C, C/X, X], A[C , C /X, X]) by structural induction on A as in Figure 8, using the short notation A[ρ, ρ] for A[ρ, ρ/X, X] and A(C, C) for A[C, C/X, X].
Lemma 3. If A is a computation type then A[ρ, ρ] is a computation relation. As our axiom for parametricity we shall take a version of Reynolds’ identity extension schema [10] adapted to our setting. Using the shorthand notation ρ ≡ ρ for ∀x, y. ρ(x, y) ⊃⊂ ρ (x, y) this can be stated as: A[eqB , eqB ] ≡ eqA[B,B]
(1)
where A ranges over all value types such that FTV(A) ⊆ {X, X} and B and B range over all vectors of value types and computation types respectively (open as well as closed). Lemma 4. Identity extension (1) is equivalent to the two parametricity schemas: ∀x : (∀Y. A(B, B, Y )). ∀Y, Z, R : Relv (Y, Z). A[eqB , eqB , R](x(Y ), x(Z)) ∀x : (∀Y . A (B, B, Y )). ∀Y , Z, R : Relc (Y , Z). A [eqB , eqB , R](x(Y ), x(Z)) where A, A range over types with FTV(A) ⊆ {X, X, Y } and FTV(A ) ⊆ {X, X, Y }
150
R.E. Møgelberg and A. Simpson
In the case of parametricity in the second-order lambda-calculus, the equivalence asserted by Lemma 4 is well known. The proof in our setting is similar. In one direction, the parametricity schemas are special cases of the identity extension schema in the case of polymorphic types. The other direction is proved by induction over the structure of types. Lemma 5 (Logical relations). Suppose Γ | Δ t : A is a valid typing judgement with FTV(Γ, Δ, t, A) ⊆ {X, X} and suppose we are given vectors of relations ρ : Relv (C, C ) and ρ : Relc (C, C ). Suppose we are further given si : Bi (C, C) and si : Bi (C , C ) for each xi : Bi in Γ , and if Δ = xn+1 : Bn+1 is non-empty also sn+1 : Bn+1 (C, C) and sn+1 : Bn+1 (C , C ). If Bi [ρ, ρ](si , si ) for all i, then A[ρ, ρ](t[s, C, C/x, X, X], t[s , C , C /x, X, X]) In the sequel, we apply the logic defined in this section to verify properties of PE. In doing so, we call the underlying logic, without the identity extension schema, L; and we write L+P for the logic with the identity extension schema (equivalently parametricity schemas) added.
4
Verifying Polymorphic Type Encodings
In this section, we apply the logic to verify the correctness of a selection of the datatype encodings presented in Section 2. Our style will be arguments in an informal style, including as much detail as space permits, but to ensure that the arguments are always directly formalizable. The value type encodings of Figure 2, can be verified essentially as in Plotkin and Abadi’s logic [7] (see also [1]). Nevertheless, we briefly discuss the the case of coproducts, as it serves to illustrate a subtlety introduced by the stoup. The type A+B supports derived introduction and elimination rules as follows. Γ |− t: A
Γ |− u: B
Γ | − in1 (t) : A + B
Γ | − in2 (u) : A + B
Γ, x : A | Δ u : C
Γ, y : B | Δ u : C
Γ |− t: A + B
Γ | Δ case t of in1 (x). u; in2 (y). u : C
(2)
Here, the left and right inclusions are defined as expected: in1 (t) =def ΛX. λf : A → X. λg : B → X. f (t) in2 (u) =def ΛX. λf : A → X. λg : B → X. g(u) But the definition of the case construction depends on the stoup. If the stoup is empty then case t of in1 (x). u; in2 (y). u =def t C (λx : A. u) (λy : A. u ) but if it is non-empty, say Δ = z : C then case t of in1 (x). u; in2 (y). u is:
A Logic for Parametric Polymorphism with Effects
151
(t (C C) (λx : A. λ◦ z : C . u) (λy : A. λ◦ z : C . u )) z . That this encoding of coproducts enjoys the expected universal property is captured by the equalities in the theorem below. Theorem 1. Suppose u, u are as in the hypothesis of the elimination rule (2) then L proves • If Γ | − t : A then Γ | Δ case in1 (t) of in1 (x). u; in2 (y). u = u[t/x] • If Γ | − s : B then Γ | Δ case in2 (s) of in1 (x). u; in2 (y). u = u [s/y] If Γ | − t : A + B and Γ, z : A + B | Δ u : C then L+P proves Γ | Δ case t of in1 (x). u[in1 (x)/z]; in2 (y). u[in2 (y)/z] = u[t/z] We omit the proof since it follows the usual argument using relational parametricity, cf. [7,1]. Instead, we turn to the constructs on computation types, of Figure 3, whose verification makes essential use of computation relations. Although the type !A, corresponding to Moggi’s T A [6] and Levy’s F A [3], is a particularly important one for effects; we omit the verication of its universal property here, since the argument is given in detail in [4]. There, an informal argument is presented, which is justified in semantic terms. However, every detail of this argument is directly translatable into our logic. Instead, as our first example, we consider the type A · B, which represents a A-fold copower of B. Type theoretically, the universal property requires a natural bijection between terms of type A → (B C) and terms of type (A · B) C. The derived introduction and elimination rules for copowers are Γ |− t: A
Γ |Δ s: B
Γ |Δ t · s: A · B Γ, x : A | y : B u : C
Γ |Δ t: A · B
Γ | Δ let x · y be t in u : C
(3)
(4)
Indeed, we define t · s = ΛX. λf : A → B X. f (t)(s) let x · y be t in u = t C (λx : A. λ◦ y : B. u). Lemma 6. Suppose t, u are as in the hypothesis of the elimination rule (4) and Γ | − f : C C . Then L+P proves Γ | Δ let x · y be t in f (u) = f (let x · y be t in u) Proof. Parametricity for t states − ; Γ, Δ ∀X, Y , R : Relc (X, Y ). ((eqA → eqB R) → R)(t X, t Y )
(5)
152
R.E. Møgelberg and A. Simpson
By definition, − ; Γ (eqA → eqB f )(λx : A. λ◦ y : B. u, λx : A. λ◦ y : B. f (u)). Also, by Lemma 1, f is a computation relation, Thus, applying (5) we get: − ; Γ, Δ f (t C (λx : A. λ◦ y : B. u), t C (λx : A. λ◦ y : B. f (u))) i.e., by definition of the copower let expressions, − ; Γ, Δ f (let x · y be t in u, let x · y be t in f (u)) So, by definition of f , − ; Γ, Δ let x · y be t in f (u) = f (let x · y be t in u) which means that Γ | Δ let x · y be t in f (u) = f (let x · y be t in u)
is provable as desired. Lemma 7. Suppose Γ | Δ t : A · B and x, y are fresh. Then L+P proves Γ | Δ let x · y be t in x · y = t Proof. By extensionality it suffices to prove that if X and f are fresh then Γ, f : A → B X | Δ (let x · y be t in x · y) X f = t X f. Since
Γ, f : A → B X | − λ◦ x : A · B. x X f : A · B X,
by Lemma 6 Γ, f : A → B X | Δ (let x · y be t in x · y) X f = let x · y be t in (x · y X f ) But let x · y be t in (x · y X f ) = t X (λx : A. λ◦ y : B. x · y X f ) = t X f. Theorem 2. If Γ | − t : A, Γ | Δ s : B and Γ, x : A | y : B u : C then L proves Γ | Δ let x · y be t · s in u = u[t, s/x, y] If Γ | z : A · B u : C and Γ | Δ t : A · B then L+P proves Γ | Δ let x · y be t in u[x · y/z] = u[t/z] Proof. The first part follows from β and η equalities, and the second part from Lemma 6 and Lemma 7.
A Logic for Parametric Polymorphism with Effects
153
This theorem formulates the desired universal property for copower types. We consider one other example from Figure 3, existential computation types of the form ∃◦ X. A. The derived introduction and elimination rules are: Γ | Δ t : A[B/X]
(6)
Γ | Δ B, t : ∃◦ X. A Γ |x: A u: B
Γ | Δ t : ∃◦ X. A
Γ | Δ let X, x be t in u : B
X∈ / FTV(B)
(7)
with the relevant term constructors defined by: B, t = ΛY . λf : (∀X. A Y ). f B (t) let X, x be t in u = t B (ΛX. λ◦ x : A. u) Since correctness argument follows very closely that for copower types, we merely state the relevant lemmas and theorem, omitting the proofs. Lemma 8. Suppose t, u are as in the hypothesis of the elimination rule (7) and that Γ | − f : B B . Then L+P proves Γ | Δ let X, x be t in f (u) = f (let X, x be t in u) Lemma 9. Suppose Γ | Δ t : ∃◦ X. A then L+P proves Γ | Δ let X, x be t in X, x = t Theorem 3. Suppose Γ | Δ t : A[B/X] and Γ | x : A u : C then • L proves Γ | Δ let X, x be B, t in u = u[B, t/X, x] • If Γ | z : ∃◦ X. A s : C then L+P proves Γ | Δ let X, x be t in s[X, x/z] = s[t/z]
5
Inductive and Coinductive Types
This final section describes encodings of general inductive and coinductive computation types and verifies the correctness of the latter. To describe the universal properties of these types we need to consider the functorial actions of the type constructors of PE. This is essentially a standard analysis of type structure, adapted to the setting of the two collections of types in PE. We define positive and negative occurrences of type variables in types in the standard way (→ and reverse polarity of the type variables occurring on the left, and all other type constructors preserve polarity). If A is a value type in
154
R.E. Møgelberg and A. Simpson Yi (f , g, h, k) Y j (f , g, h, k) (A → B)(f , g, h, k) (A B)(f , g, h, k) (∀Y. A)(f , g, h, k) (∀Y . A)(f , g, h, k)
= = = = = =
gi kj λl : (A → B). B(f , g, h, k) ◦ l ◦ A(g, f , k, h) λl : (A B). B(f , g, h, k) ◦ l ◦ A(g, f , k, h) λx : ∀Y. A. ΛY. A(f , g, h, k, id Y , id Y )(x Y ) λx : ∀Y . A. ΛY . A(f , g, h, k, id Y , id Y )(x Y )
Fig. 9. The functorial interpretation of types
which the variables X, X occur only negatively and the type variables Y , Y occur only positively, then we can define a term MA : ∀X, X , Y , Y , X, X , Y , Y . (X → X) → (Y → Y ) → (X X) → (Y Y ) → A → A(X , Y , X , Y ) The term MA is defined by structural induction over A simultaneously with the definition of a term NB : ∀X, X , Y , Y , X, X , Y , Y . (X → X) → (Y → Y ) → (X X) → (Y Y ) → B B(X , Y , X , Y ) for any computation type B satisfying the same condition on the occurrences of variables as above. The definition is in Figure 9, in which the simplified notation A(f , g, h, k) is used for MA (or NA whenever A is a computation type) applied to f , g, h, k, alongside evident notation for function composition. Lemma 10. For computation types A the terms MA and NA agree up to inclusion of into →. Moreover, the terms MA define functors since they: • preserve identities: A(id , id , id, id ) = id • preserve compositions: A(f ◦ f , g ◦ g, h ◦ h , k ◦ k) = A(f , g , h , k ) ◦ A(f , g, h, k) Finally we adapt the graph lemma of [7] to our setting. Lemma 11 (Graph Lemma). If f : B → B, g : C → C, h : B B, k : C C then L+P proves A[f op , g, hop , k] ≡ A(f , g, h, k) Suppose A is a computation type whose only free type variable is the computation type variable X which occurs only positively. As a consequence of parametric polymorphism the types μ◦ X. A =def ∀X. (A X) → X ν ◦ X. A =def ∃◦ X. (X A) · X are carriers of initial algebras and final coalgebras respectively for the functor induced by A. Here, we show how the universal property for final coalgebras can be verified in our logic.
A Logic for Parametric Polymorphism with Effects
155
The final coalgebra structure is defined using unfold : ∀X. (X A) → X ν ◦ X. A out : (ν ◦ X. A) A[ν ◦ X. A/X] defined as unfold = ΛX. λf : X A. λ◦ x : (ν ◦ X. A). X, f · x out = λ◦ x : ν ◦ X. A. let X, y be x in (let f · z be y in A(unfold X f )(f (z))) Lemma 12. If Γ | − f : B A[B/X] then L proves Γ | x : B out(unfold B f x) = A(unfold B f )(f (x)) . In diagramatic form Lemma 12 means that B
f
◦ A[B/X]
unfold B f ◦ ν ◦ X. A
A(unfold B f ) ◦ out ◦ A[ν ◦ X. A/X]
commutes, i.e., the term unfold verifies that out is a weak final coalgebra. Lemma 13. Suppose Γ | − h : B B , f : B A[B/X], f : B A[B /X], and that Γ | x : B f (h(x)) = A(h)(f (x)). Then L+P proves Γ | x : B unfold B f x = unfold B f (h(x)) Proof. By the Graph Lemma (Lemma 11) the assumption can be reformulated as h A[h/X])(f, f ). So by parametricity of unfold (h eqν ◦ X. A )(unfold B f, unfold B f ) implying Γ | x : B unfold B f x = unfold B f (h(x)).
Lemma 14. L+P proves unfold ν ◦ X. A out = id ν ◦ X. A . Proof. By Lemma 13, for arbitrary X, f : X A, unfold X f = (unfold ν ◦ X. A out) ◦ (unfold X f ) so Γ, f : X A | x : X X, f · x = unfold ν ◦ X. A out X, x. This implies using Lemma 8 and Lemma 9 that for any y : ν ◦ X. A y = let X, f · x be y in X, f · x = let X, f · x be y in (unfold ν ◦ X. A out X, x) = unfold ν ◦ X. A out (let X, f · x be y in X, x) = unfold ν ◦ X. A out y and so the lemma follows from extensionality.
156
R.E. Møgelberg and A. Simpson
Theorem 4. Suppose f : B ν ◦ X. A and h : B A[B/X] such that A(h) ◦ f = out ◦ h then L+P proves h = unfold B f . Proof. By Lemma 13 and Lemma 14 unfold B f = (unfold ν ◦ X. A out) ◦ h = h .
References
1. Birkedal, L., Møelberg, R.E.: Categorical models of Abadi-Plotkin’s logic for parametricity. Mathematical Structures in Computer Science 15(4), 709–772 (2005) 2. Birkedal, L., Møgelberg, R.E., Petersen, R.L.: Linear Abadi & Plotkin logic. Logical Methods in Computer Science 2 (2006) 3. Levy, P.B.: Call By Push Value, a Functional/Imperative Synthesis. Kluwer, Dordrecht (2004) 4. Møgelberg, R.E., Simpson, A.K.: Relational parametricity for computational effects. In: LICS, pp. 346–355. IEEE Computer Society Press, Los Alamitos (2007) 5. Møgelberg, R.E., Simpson, A.K.: Relational parametricity for control considered as a computational effect. Electr. Notes Theor. Comput. Sci 173, 295–312 (2007) 6. Moggi, E.: Notions of computation and monads. Information and Computation 93, 55–92 (1991) 7. Plotkin, G.D., Abadi, M.: A logic for parametric polymorphism. In: Bezem, M., Groote, J.F. (eds.) TLCA 1993. LNCS, vol. 664, pp. 361–375. Springer, Heidelberg (1993) 8. Plotkin, G.D., Power, J.: Algebraic operations and generic effects. Applied Categorical Structures 11(1), 69–94 (2003) 9. Plotkin, G.D.: Type theory and recursion (extended abstract). In: Proceedings, Eighth Annual IEEE Symposium on Logic in Computer Science, Montreal, Canada, June 19–23, 1993, p. 374. IEEE Computer Society Press, Los Alamitos (1993) 10. Reynolds, J.C.: Types, abstraction, and parametric polymorphism. Information Processing 83, 513–523 (1983) 11. Strachey, C.: Fundamental concepts in programming languages. Lecture Notes, International Summer School in Computer Programming, Copenhagen (August 1967) 12. Wadler, P.: Theorems for free. In: Proceedings 4th International Conference on Functional Programming languages and Computer Architectures (1989)
Working with Mathematical Structures in Type Theory Claudio Sacerdoti Coen and Enrico Tassi Department of Computer Science, University of Bologna Mura Anteo Zamboni, 7 – 40127 Bologna, Italy {sacerdot,tassi}@cs.unibo.it
Abstract. We address the problem of representing mathematical structures in a proof assistant which: 1) is based on a type theory with dependent types, telescopes and a computational version of Leibniz equality; 2) implements coercive subtyping, accepting multiple coherent paths between type families; 3) implements a restricted form of higher order unification and type reconstruction. We show how to exploit the previous quite common features to reduce the “syntactic” gap between pen&paper and formalised algebra. However, to reach our goal we need to propose unification and type reconstruction heuristics that are slightly different from the ones usually implemented. We have implemented them in Matita.
1
Introduction
It is well known that formalising mathematical concepts in type theory is not straightforward, and one of the most used metrics to describe this difficulty is the gap (in lines of text) between the pen&paper proof, and the formalised version. A motivation for that may be that many intuitive concepts widely used in mathematics, like graphs for example, have no simple and handy representation (see for example the complex hypermap construction used to describe planar maps in the four colour theorem [11]). On the contrary, some widely studied fields of mathematics do have a precise and formal description of the objects they study. The most well known one is algebra, where a rigorous hierarchy of structures is defined and investigated. One may expect that formalising algebra in an interactive theorem prover should be smooth, and that the so called De Bruijn factor should be not so high for that particular subject. Many papers in the literature [9] give evidence that this is not the case. In this paper we analyse some of the problems that arise in formalising a hierarchy of algebraic structures and we propose a general mechanism that allows to tighten the distance between the algebraic hierarchy as is conceived by mathematicians and the one that can be effectively implemented in type theory. In particular, we want to be able to formalise the following informal piece of mathematics1 without making more information explicit, expecting the interactive theorem prover to understand it as a mathematician would do. 1
PlanetMath, definition of Ordered Vector Space.
M. Miculan, I. Scagnetto, and F. Honsell (Eds.): TYPES 2007, LNCS 4941, pp. 157–172, 2008. c Springer-Verlag Berlin Heidelberg 2008
158
C. Sacerdoti Coen and E. Tassi
Example 1. Let k be an ordered field. An ordered vector space over k is a vector space V that is also a poset at the same time, such that the following conditions are satisfied 1. for any u, v, w ∈ V , if u ≤ v then u + w ≤ v + w, 2. if 0 ≤ u ∈ V and any 0 < λ ∈ k, then 0 ≤ λu. Here is a property that can be immediately verified: u ≤ v iff λu ≤ λv for any 0 < λ. We choose this running example instead of the most common example about rings[9,16,3] because we believe the latter to be a little deceiving. Indeed, a ring is usually defined as a triple (C,+,∗) such that (C,+) is a group, (C,∗) is a semigroup, and some distributive properties hold. This definition is imprecise or at least not complete, since it does not list the neutral element and the inverse function of the group. Its real meaning is just that a ring is an additive group that is also a multiplicative semigroup (on the same carrier) with some distributive properties. Indeed, the latter way of defining structures is often adopted also by mathematicians when the structures become more complex and embed more operations (e.g. vector spaces, Riesz spaces, integration algebras). Considering again our running example, we want to formalise it using the following syntax2 , and we expect the proof assistant to interpret it as expected: record OrderedVectorSpace : Type := { V:> VectorSpace; (∗ we suppose that V.k is the ordered field ∗) p:> Poset with p.CApo = V.CAvs ; (∗ the two carriers must be the same ∗) add le compat: ∀ u,v,w:V. u ≤ v → u + w ≤ v + w; mul le compat: ∀ u:V.∀ α :k. 0 ≤ u → 0 < α → 0 ≤ α ∗ u }. lemma trivial: ∀ R.∀ u,v:R. (∀ α . 0 < α → α ∗ u ≤ α ∗ v) → u ≤ v.
The first statement declares a record type. A record type is a sort of labelled telescope. A telescope is just a generalised Σ-type. Inhabitants of a telescope of length n are heavily typed n-tuples x1 , . . ., xn T1 ,...,Tn where xi must have type Ti x1 . . .xi−1 . The heavy types are necessary for type reconstruction. Instead, inhabitants of a record type with n fields are not heavily typed n-tuples, but lighter n-tuples x1 , . . ., xn R where R is a reference to the record type declaration, which declares once and for all the types of fields. Thus terms containing inhabitants of records are smaller and require less type-checking time than their equivalents that use telescopes. Beware of the differences between our records — which are implemented, at least as telescopes, in most systems like Coq — and dependently typed records “`a la Betarte/Tasistro/Pollack” [5,4,8]: 1. there is no “dot” constructor to uniformly access by name fields of any record. Thus the names of these projections must be different, as .CApo and .CAvs . 2
The syntax is the one of the Matita proof assistant, which is quite close to the one of Coq. We reserve λ for lambda-abstraction.
Working with Mathematical Structures in Type Theory
159
We suppose that ad-hoc projections .k, .v, etc. are automatically declared by the system. When we write x.v we mean the application of the .v function to x; 2. there is no structural subtyping relation “` a la Betarte/Tasistro” between records; however, ad-hoc coercions “`a la Pollack” can be declared by the user; in particular, we suppose that when a field is declared using “:>”, the relative projection is automatically declared as a coercion by the system; 3. there are no manifest fields “` a la Pollack”; the with notation is usually understood as syntactic sugar for declaring on-the-fly a new record with a manifest field; however, having no manifest fields in our logic, we will need a different explanation for the with type constructor, it will be given in Sec. 2. When lambda-abstractions and dependent products do not type their variable, the type of the variable must be inferred by the system during type reconstruction. Similarly, all mathematical notation (e.g. “∗”) hides the application of one projection to a record (e.g. “ ?.∗ ” where ? is a placeholder for a particular record). The notation “x:R” can also hide a projection R.CA from R to its carrier. All projections are monomorphic, in the sense that different structures have different projections to their carrier. All placeholders in projections must be inferred during type reconstruction. This is not a trivial task: in the expression “α ∗ u ≤ α ∗ w” both sides of the inequation are applications of the scalar product of some vector space R (since u and v have been previously assigned the type R.CA); since their result are compared, the system must deduce that the vector space R must also be a poset, hence an ordered vector space. In the rest of the paper we address the problem of representing mathematical structures in a proof assistant which: 1) is based on a type theory with dependent types, telescopes and a computational version of Leibniz equality; 2) implements coercive subtyping, accepting multiple coherent paths between type families; 3) implements a restricted form of higher order unification and type reconstruction. Lego, Coq, Plastic and Matita are all examples of proof assistants based on such theories. In the next sections we highlight one by one the problems that all these systems face in understanding the syntax of the previous example, proposing solutions that require minimal modifications to the implementation.
2
Dependently Typed Records in Type Theory
The first problem is understanding the with type constructor employed in the example. Pollack and alt. in [8] propose the model for a new type theory having in the syntax primitive dependently typed records, and show how to interpret records in the model. The theory lacks with, but it can be easily added to the syntax (adopting the rules proposed in [16]) and also interpreted in the model. However, no non-prototipical proof assistant currently implements primitive dependently typed records.
160
2.1
C. Sacerdoti Coen and E. Tassi
Ψ and Σ Types
In [16], Randy Pollack shows that dependently typed records with uniform field projections and with can be implemented in a type theory extended with inductive types and the induction-recursion principle [10]. However, inductionrecursion is also not implemented in most proof assistants, and we are looking for a solution in a simpler framework where we only have primitive records (or even simply primitive telescopes or primitive Σ-types), but no inductive types. In the same paper, he also shows how to interpret dependently typed records with and without manifest fields in a simpler type theory having only primitive Σ-types and primitive Ψ -types. A Σ-type (Σ x:T. P x) is inhabited by heavily typed couples w,pT,P where w is an inhabitant of the type T and p is an inhabitant of P w. The heavy type annotation is required for type inference. A Ψ -type (Ψ x:T. p) is inhabited by heavily typed singletons wT,P,p where w is an inhabitant of the type T and p is a function mapping x of type T to a value of type P x. The intuitive idea is that w, p[w]T,P and wT,P,λx:T. p[x] should represent the same couple, where in the first case the value of the second component is opaque, while in the second case it is made manifest (as a function of the first component). However, the two representations actually are different and morally equivalent inhabitants of the two types are not convertible, against intuition. We will see later how it is possible to represent couples typed with manifest fields as convertible couples with opaque fields. We will denote by .1 and .2 the first and second projection of a Σ/Ψ -type. The syntax “Σ x:T. P x with .2 = t[.1] ” can now be understood as syntactic sugar for “Ψ x:T. t[x] ”. The illusion is completed by declaring a coercion from Ψ x:T. p to Σ x:T. P x so that wT,P,p is automatically mapped to w, p wT,P when required. Most common mathematical structure are records with more than two fields. Pollack explains that such a structure can be understood as a sequence of left-associating3 nested heavily typed pairs/singletons. For instance, the record r ≡ nat, list nat, @R of type R := {C : Type; T := list C; app: T → T → T} is represented as4 r0 ≡ (), TypeUnit , λC:Unit. Type r1 ≡ r0 ΣC:Unit. Type , λx:(ΣC:Unit. Type).Type1 , λy:(ΣC:Unit. Type). list y.1 r ≡ r1 , @Ψy:(ΣC:Unit. Type). list y.1 , λx:(Ψy:(ΣC:Unit. Type). list y.1). x.2→x.2→x.2
of type Σ x:(Ψ y:(Σ C: Unit. Type). list y .1). x.2 → x.2 → x.2. However, the deep heavy type annotations are actually useless and make the term extremely large and its type checking inefficient. The interpretation of with also becomes more complex, since the nested Σ/Ψ types must be recursively traversed to compute the new type. 3
4
In the same paper he also proposes to represent a record type with a right-associating sequence of Σ/Φ types, where a Φ type looks like a Ψ type, but makes it first fields manifest. However, in Sect. 5.2.2 he also argues for the left-associating solution. Type1 in the definition of r1 is the second universe in Luo’s ECC [13]. Note that Type has type Type1 .
Working with Mathematical Structures in Type Theory
2.2
161
Weakly Manifest Types
In this paper we drop Σ/Ψ types in favour of primitive records, whose inhabitants do not require heavy type annotations. However, we are back at the problem of manifest fields: every time the user declares a record type with n fields, to follow closely the approach of Pollack the system should declare 2n record types having all possible combinations of manifest/opaque fields. To obtain a new solution for manifest fields we exploit the fact that manifest fields can be declared using with and we also go back to the intuition that records with and without manifest fields should all have the same representation. That is, when x ≡ 3 (x is definitionally equal to 3) and p: P x, the term x, pR should be both an inhabitant of the record R := { n: nat; H: P n} and of the record R with n = 3. Intuitively, the with notation should only add in the context the new “hypothesis” x ≡ 3. However, we want to be able to obtain this effect without extending the type theory with with and without adding at run time new equations to the convertibility check. This is partially achieved by approximating x ≡ 3 with an hypothesis of type x = 3 where “=” is Leibniz polymorphic equality. To summarise, the idea is to represent an inhabitant of R := {n: nat; H: P n} as a couple x, pR and an inhabitant of R with n=3 as a couple c, qR,λc:R. c.n=3 of type Σ c:R. c.n=3. Declaring the first projection of the Σ-type as a coercion, the system is able to map every element of R with n=3 into an element of R. However, the illusion is not complete yet: if c is an inhabitant of R with n=3, c .1. n (that can be written as c.n because .1 is a coercion) is Leibniz-equal to 3 (because of c.2 ), but is not convertible to 3. This is problematic since terms there were well-typed in the system presented by Pollack are here rejected. Several concrete example can already be found in our running example: to type u + w ≤ v + w (in the declaration of add le compat), the carriers p.CApo and V.CAvs must be convertible, whereas they are only Leibniz equal. In principle, it would be possible to avoid the problem by replacing u + w ≤ v + w with [u+w]p.2 ≤[v+w]p.2 where [ ] is the constant corresponding to Leibniz elimination, i.e. [x] w has type Q[M] whenever x has type Q[N] and w has type N=M. However, the insertion of these constants, even if done automatically with a couple of mutual coercions, makes the terms much larger and more difficult to reason about. 2.3
Manifesting Coercions
To overcome the problem, consider c of type R with n=3 and notice that the lack of conversion can be observed only in c .1. n (which is not definitionally equal to 3) and in all fields of c.1 that come after n (for instance, the second field has type P c.1. n in place of P 3). Moreover, the user never needs to write c.1 anywhere, since c.1 is declared as a coercion. Thus we can try to solve the problem by declaring a different coercion such that c .1. n is definitionally equal to 3. In our example, the coercion5 is 5
The name of the coercion is kn R verbatim, R and n are not indexes.
162
C. Sacerdoti Coen and E. Tassi
definition kn R : ∀ M : nat. R with n=M → R := λ m:nat. λ x:(Σc:R. c.n=M). M, [x.1.H]x.2 R
Once knR is declared as a coercion, c.H is interpreted as (knR 3 c).H which has type n P (kn R 3 c).n, which is now definitionally equal to P 3. Note also that (kR 3 c).H is definitionally equal to [c .1. H]c.2 that is definitionally equal to c .1. H when c.2 is a closed term of type c .1. n = 3. When the latter holds, c .1. n is also definitionally equal to 3, and the manifest type information is actually redundant, according to the initial intuition. The converse holds when the system is proof irrelevant, or, with minor modifications, when Leibniz equality is stated on a decidable type [12]. Coming back to our running example, u + w ≤ v + w can now be parsed as the well-typed term CA
po u (V.+) w ((kPoset V.CAvs p).≤) v (V.+) w
Things get a little more complex when with is used to change the value of a field f1 that occurs in the type of a second field f2 that occurs in the type of a third field f3 . Consider the record type declaration R := { f1 : T; f2 : P f1 ; f3 : Q f1 f2 } and the expression R with f1 = M, interpreted as Σ c:R. c. f 1 = M. We must find a coercion from R with f1 = M to R declared as follows definition kfR1 : ∀ M:T. R with f1 = M → R := λ M:T. λ x:(Σc:R. c.f1=M). M, [c.1.f2 ]c.2 , w
for some w that inhabit Q M [c.1.f 2 ]c.2 and that must behave as c .1. f 3 when c .1. f 1 ≡ M. Observe that c .1. f 3 has type Q c.1. f 1 c.1. f 2 , which is definitionally equivalent to Q c.1. f 1 [c .1. f 2 ]reflT c.1.f1 , where refl T c.1.f1 is the canonical proof of c .1. f 1 = c.1. f 1 . Thus, the term w we are looking for is simply [[ c .1. f 3 ]]c.2 which has type Q M [c.1.f2 ]c.2 where [[ ]] is the constant corresponding to computational dependent elimination for Leibniz equality: lemma [[ ]]p : Q x (refl A x) →Q y p. where x : A, y : A, p : x = y, Q : (∀ z. x = z → Type) and [[M]]reflA
x
≡ M.
To avoid handling the first field differently from the following, we can always use [[ ]] in place of [ ] . The following derived typing and reduction rules show that our encoding of with behaves as expected. Phi-Start ∅ valid Phi-Cons
Φ valid R, l1 , . . . , ln free in Φ Ti : Πl1 : T1 . . . . .Πli−1 .Ti−1 .T ype i ∈ {1, . . . , n} Φ, R = l1 : T1 , . . . , ln : Tn : T ype valid
Working with Mathematical Structures in Type Theory
163
Form (R = l1 : T1 , . . . , ln : Tn : T ype) ∈ Φ Γ, l1 : T1 , . . . , li−1 : Ti−1 a : Ti Γ R with li = a : T ype Intro (R = l1 : T1 , . . . , ln : Tn : T ype) ∈ Φ Γ R with li = a : T ype Γ Mk : Tk M1 . . . Mi−1 a Mi+1 . . . Mk−1 k ∈ {1, . . . , i − 1, i + 1, . . . , n} Γ M1 , . . . , Mi−1 , a, Mi+1 , . . . , Mn R , ref lA aR,λr:R.a=a : R with li = a Coerc
(R = l1 : T1 , . . . , ln : Tn : T ype) ∈ Φ Γ R with li = a : T ype Γ c : R with li = a li Γ kR ac:R
Coerc-Red Γ
li kR
(R = l1 : T1 , . . . , ln : Tn : T ype) ∈ Φ a M1 ,. . . , MnR , wR,s M1 ,. . . , Mi−1 , a, [[Mi+1 ]]w ,. . . , [[Mn ]]w R
Proj
(R = l1 : T1 , . . . , ln : Tn : T ype) ∈ Φ Γ R with li = a : T ype Γ c : R with li = a li Γ (kR a c).lj : Tj (kR a c).l1 . . . (kR a c).lj−1
Proj-Red1
(R = l1 : T1 , . . . , ln : Tn : T ype) ∈ Φ li Γ (kR a M1 , . . . , Mn R , wR,s ).lj Mj
Proj-Red2
j
(R = l1 : T1 , . . . , ln : Tn : T ype) ∈ Φ li Γ (kR a c).li a
Proj-Red3 (R = l1 : T1 , . . . , ln : Tn : T ype) ∈ Φ li Γ (kR a M1 , . . . , Mn R , wR,s ).lj [[Mj ]]w
2.4
i<j
Deep with Construction
In order to interpret the with type constructor on “deep” fields, it is sufficient to follow the same schema, changing the coercion to make their sub-records manifest. Formally, when Q := {f: T; l : U} and R := {q: Q; s: S}, we interpret R with q.f = M with Σ c:R. c.q. f = M and we declare the coercion: definition kq.f R : ∀ M:T. R with q.f = M → R := λ M:T. λ x:(Σc:R. c.q.f=M). kqR (kfQ M x.1.1, x.2Q, λq:Q. q.f=M ) (match x with a, lQ , sR , wR, λr:R. r.q.f=M ⇒ a, l Q , sR , [[refl Q a, lQ ]]w R, λr:R. r.q=kf
Q
M a,lQ , wQ, λq:Q. q.f=M )
164
C. Sacerdoti Coen and E. Tassi
Note that the computational rule associated to the computational dependent elimination of Leibniz equality is necessary to type the previous coercion: a, l Q , sR ,[[reflQ a, lQ ]]w R, λr:R. r.q=kf
Q Ma,lQ ,wQ,λq:Q. q.f=M
is well typed since refl Q a, lQ has type a, l Q = a, l Q that is equivalent to a, l Q = kfQ a a, lQ , reflT aQ,λq:Q. q.f=q.f ; thus [[reflQ a, lQ ]]w has type a, l Q = kfQ M a, lQ , wQ, λq:Q. q.f=M . As expected, (kq.f R M c).q.f M for all c of type R with q.f = M. Due to lack of space we omit all other derived typing and reduction rules associated to the deep with construct. 2.5
Nested with Contructions
Finally, from the derived typing and reduction rules it is not evident that a type R with la =M with lb =N can be formed. Surprisingly, this type poses no additional problem. The system simply de-sugars it as Σ d: (Σ c:R. c. l a =M). (klRa M d).lb =N
and, as explained in the next section, automatically declares the composite coercion klRa ,lb := λ M,N,c. klRb N (klRa M c) as a coercion from R with la =M with lb =N to R such that: (klRa ,lb M N c).la M and (klRa ,lb M N c).lb N and (klRa ,lb M N M1 ,. . ., Mn R , wa , wb ).li {{Mi }}wa ,wb
where {{Mi }}wa ,wb is Mi (if i
Signature Strengthening and with Commutation
To conclude our investigation of record types with manifest fields in type theory, we consider a few additional properties, which are signature strengthening and with commutation. An important typing rule for dependently typed records with manifest fields is signature strengthening: a record c of type R must also have type R with f = R.f and the other way around. In our setting R with f = R.f is interpreted as Σ c:R. c. f = c.f and we can couple the coercion kfR from R with f = R.f to R with a dual coercion ιR from R to R with f = R.f such that: ∀ w. kfR (ιR (w)) ≡w, ∀ w.ιR (kfR (w)) = w and the latter Leibniz equality is strengthened to definitional equality when w.2 is a closed term or the system is proof irrelevant. The same can be achieved with minor modificaitons when the equality on the type of the f field is decidable. with commutation is the rule that states the definitional equality of R with f=M with g=N and R with g=N with f=M when both expressions are well-typed. In our interpretation, the two types are not convertible since they are represented by different nestings of Σ-types. Moreover, for any label l that follows f and g in R, the l projection of two canonical inhabitants of the two
Working with Mathematical Structures in Type Theory
165
types built from the same terms are provable equal, but not definitionally equal: in the first case we obtain a term [[[[M]]wf ]]wg for some wf and wg , and in the second case we obtain a term [[[[M]]wg ]]wf . A proof of their equality is simply [[[[reflT M]]wf ]]wg . Definitional equality holds when when wf or wg are canonical terms — in particular when they are closed terms — or if at least one of the two types has a decidable equality. In practice, with commutation can often be avoided declaring a pair of mutual coercions between the two commutated types. 2.7
Remarks on Code Extraction
Algebra has been a remarkable testing area for code extraction, see the constructive proof developed in [9] for example. The encoding of manifest fields presented in the previous sections behave nicely with respect to code extraction. The manifest part of a term is encoded in the equality proof, that is erased during extraction, projections like kfR , are extracted to functions that simply replace one field of the record in input. All occurrences of [[ ]] are also erased.
3
Structures, Inheritance, Shared Carrier and Unification
Following the ideas from the previous section, we can implement with as syntactic sugar: R with f = M is parsed as Σ c:R. c. f = M and our special coercion kfR respecting definitional equality is defined from Σ c:R. c. f=M to R. The scope of the special coercion should be local to the scope of the declaration of a variable of type R with f=M. When with is used in the declaration of one record field, as in our running example, the scope of the coercion extends to the following record fields, and also to the rest of the script. As our running example shows, one important use of with is in multiple inheritance, in order to share multiply imported sub-structures. For instance, an ordered vector space inherits from a partially ordered set and from a vector space, under the hypothesis that the two carriers (singleton structures) are shared. Since sub-typing is implemented by means of coercions, multiple inheritance with sharing induces multiple coercion paths between nodes in the coercion graph (see Fig. 1; dotted lines hide intermediate structures like groups or Riesz spaces). When the system needs to insert a coercion to map an inhabitant of one type to an inhabitant of a super-type, it must choose one path in the graph. In order to avoid random choices that lead to unwanted interpretations and to type errors (in systems having dependent types), coherence of the coercion graph is usually required [14,3]. The graph is coherent when the diagram commutes according to βη-conversion. However in the following we drop η-conversion which is not supported in Coq and Matita. One interesting case of multiple coherent paths in a graph is constituted by coherent paths between two nodes and the arcs between them obtained by composition of the functions forming one path. Indeed, it is not unusual in large formalisation as CoRN [9] to have very deep inheritance graphs and to need to cast inhabitants of very deep super-types to the root type. For instance, the expression ∀ x: R should be understood as ∀ x: k R where k is a coercion from the ordered,
166
C. Sacerdoti Coen and E. Tassi
fk − algebra Q
k NK r k k H k uk k D" ectorSpace OrderedV Algebra SSS m p m SSS p I m p V mm SSS mmm pp SSS m m p S m wp S) vmm P osetQQ VkectorSpace QQQ k j CAvskkkk QQQ h i kk Q k g Q k f QQQ CApo kk e _oukkk_ ` a b c c d ( T ype
@ kRing
Fig. 1. Inheritance graph from the library of Matita
archimedean, complete field of real numbers to its carrier. Without composite coercions, the system needs to introduce a coercion to ordered, archimedean fields, then another one to ordered fields, another one to fields, and then to rings, and so on, generating a very large term and slowing down the type-checker. If coherent DAGs of coercions pose no problem to conversion, they do for unification, although this aspect has been neglected in the literature. In particular, consider again our running example, whose coercion graph is shown in Fig. 1. Suppose that the user writes the following (false) statement: ∀ x. −x ≤ x where −x is just syntactic sugar for −1 ∗ x. The statement will be parsed as ∀ x:?1 . −1 ?4 .∗ x ?5 .≤ x and the type reconstruction engine will produce the following two unification constraints: ?1 ≈?4 .CAvs (since x is passed to ?4 .∗) and ?4 .CAvs ≈?5 .CApo (since −1 ∗ x is passed to ?5 .≤). The first constraint is easily solved, “discovering” that x should be an element of a vector space, or a element of one super-structure of a vector space (since ?4 can still be instantiated with a coercion applied to an element of a super-structure). However, the second constraint is problematic since it asks to unify two applications (?4 .CAvs and ?5 .CApo ) having different heads. When full higher-order unification is employed, the two heads (two projections) are unfolded and unification will eventually find the right unifier. However, unfolding of constants during unification is too expensive in real world implementations, and higher order unification is never implemented in full generality, preferring an incomplete, but deterministic unification strategy. Since expansion of constants is not performed during unification, the constraint to be solved is actually a rigid-rigid pattern with two different heads. To avoid failure, we must exploit the coherence of the coercion graph. Indeed, since the arguments of the coercions are metavariables, they can still be instantiated with any possible path in the graph (applied to a final metavariable representing the structure the path is applied to). For instance, ?4 .CAvs can be instantiated to ?6 .p.CAvs where ?6 is an ordered vector space and the vector space ?4 is obtained from ?6 forgetting the poset structure.
Working with Mathematical Structures in Type Theory
167
Thus the unification problem is reduced to finding two coherent paths in the graph ending with CAvs and CApo . A solutions is given by paths ?6 .V.CApo and ?6 .p.CAvs . Another one by ?7 .r.V.CApo and ?7 .r.p.CAvs where ?7 is an f-algebra. Among all solutions the most general one corresponds to the pullback (in categorical terms) of the two coercions, when it exists. In the example, the pullback is given by V and p. All other solutions (e.g. r .V and r .p) factor trough it. If the pullback does not exist (i.e. there are different solutions admitting antiunifiers), the system can just pick one solution randomly, warning the user about the arbitrary choice. Coercion graphs for algebraic structures usually enjoy the property that there exists a pullback for every pairs of coercions with the same target. Finally, note that the Coq system does not handle composite coercions, since these would lead to multiple paths between the same types. However, long chains of coercions are somehow problematic for proof extraction. According to private communication, an early experiment in auto-packing chains of coercions was attempted, but dropped because of the kind of unification problems just explained. After implementing the described procedure for unification up to coherence in Matita, we were able to implement coercion packing.
4
Type Reconstruction with Unification and Coercions
Syntactic de-sugaring of with expression for a large hierarchy of mathematical structures has been made by hand in Matita, proving the feasibility of the approach. In particular, combining de-sugaring with the unification up to coherence procedure described in the previous paragraph, we are able to write the first part of our running example in Matita. The statement of the trivial lemma, however, is not accepted yet. To fully understand which problem still arises, we need to introduce type reconstruction and coercion synthesis algorithms more formally. 4.1
Type Reconstruction Algorithm
Coq, Lego and Matita use similar algorithms based on [1,2,17] to insert coercions in user provided ill-typed terms to make them well-typed. Coercions can be inserted in three different positions: around arguments expected to be sorts (e.g. when typing bound variables), around application heads (to fix the arity of the head), and around application arguments (to fix their types). In the following presentation the coercion synthesis judgement R
Γ t Γ t : T means that a term t in a well-typed context Γ can be internalised as a well-typed term t of type T ; t is obtained by inserting coercions in t. s ranges over sorts (Prop or Type in CIC). c ranges over declared coercions.
168
C. Sacerdoti Coen and E. Tassi
The rules given in [1,17] are the following: lam
Γ T
R
Γ T : T
R
Γ T ≡ /s
Γ c : T → s
Γ, x : c T b Γ, x : c T b : U R
Γ λx : T.b Γ λx : c T .b : Πx : c T .U prod Γ T
R
Γ T : T
R
Γ T ≡ /s
Γ c : T → s
Γ, x : c T U : s
Γ, x : c T U
R
Γ Πx : T.U Γ Πx : c T .U : s app-head Γ f
R
Γ f : F
Γ c : F → Πx : A.U
Γ F≡ / Πx : B.C R
Γ (c f ) a Γ u : U R
Γ f a Γ u : U app-arg R
R
Γ f Γ f : Πx : B.U Γ a Γ a : A Γ A≡ /B Γ c:A→B R
Γ f a Γ f (c a ) : U [c a /x] All these rule have a negative precondition. If the precondition is positive, then the coercion is not needed and thus not inserted. These rules have been employed in the type reconstruction algorithm of Coq and Matita. The type reconstruction algorithm is obtained from the syntax directed type inference algorithm by adding metavariables [15] in the calculus (standing for missing sub-terms) and by replacing conversion (≡) with unification (≈). We thus extend our judgement with an environment Θ that is a list of metavariable declarations (Γ ?i : T ) or metavariable instantiations. With ?si we state a metavariable that can only be instantiated with a sort (Prop or Type in CIC); with ?ci a metavariable that can only be instantiated with a coercion. R
can now instantiate metavariables performing unification, thus the whole judgement is extended to R
Θ : Γ t Θ : Γ t : T Insertion of coercions interacts badly with open terms. Consider, for instance, the following example and assume a coercion from natural numbers to integers. ∀ P : int → Prop. ∀ Q : nat → Prop. ∀ b. P b ∧Q b.
Here P b is processed before Q b. The rule app-arg is not applied, since the type of b is a metavariable ?i and ?i ≈ int. Then Q b is processed, but now b has
Working with Mathematical Structures in Type Theory
169
type int , int ≈ / nat and there is no coercion from int to nat. The problem here is that a coercion was needed around the first occurrence of b but since its type was flexible app-arg was not triggered. To solve the problem, one important step is the realization that rules that insert coercions and rules that do not are occurrences of the same rule when identities are considered as coercions. In [6,7], Chen proposes an unified set of rules that also employes least upper bounds (lub) in the coercion graph to compute less coerced solutions. Chen’s rule for application adapted with metavariables in place of coercions is the following: I-app Θ:Γ f
R
Θ : Γ f : C R
Θ :Γ a Θ
Θ = Θ , Γ ?ci : C →lub Πx : A.B
: Γ a : A
Θ = Θ , Γ ?cj : A → A
R
Θ : Γ f a Θ : Γ (?ci f ) (?cj a ) : B[(?cj a )/x] Adopting this rule, the problematic example above is accepted: ∀ P : int → Prop. ∀ Q : nat → Prop. ∀ b. P b ∧ Q b is understood as ∀ P : int → Prop. ∀ Q : nat → Prop. ∀ b: ?1 . (?c2 P) (?c3 b) ∧ (?c4 Q) (?c5 b) where ?1 can be instantiated with nat, ?c3 with the coercion from nat to int and all other
coercions with the identity. From this example it is clear that Chen’s rules modified with metavariables are able to type every term generating a large number of constraints that must inefficiently be solved at the very end looking at the coercion graph. Note, however, that rule I-appl in its full generality is not required to accept our running example. We believe this not to be a coincidence. Indeed, most formulae in the algebraic domain are of a particular shape: 1) universal quantifications are either on structure types (e.g. ∀G : Group) or elements of some structure (e.g. ∀g : G to be understood as ∀g : G.CA); 2) functions either take structures in input (e.g. G × G); or they manipulate structure elements whose domain is left implicit (e.g. : M.CA → nat → M.CA for some monoid M ). In particular, all operations in a structure are functions of the second kind. Under this assumption, rule I-appl can be relaxed to rule app-head-arg, which is given below together with the rules for explicitly and implicitly typed universal quantification. lam-explicit Θ:Γ T Θ = Θ , Γ ?sj , Γ ?ci : T →?sj
R
Θ : Γ T : T
R
Θ : Γ, x :?ci T b Θ : Γ b : U
R
Θ : Γ λx : T.b Θ : Γ λx :?ci T .b : Πx :?ci T.U lam-implicit Θ = Θ, Γ ?i :?sj
R
Θ : Γ, x :?i b Θ : Γ, x :?i b : T R
Θ : Γ λx :?.b Θ : Γ λx :?i .b : Πx :?i .T
170
C. Sacerdoti Coen and E. Tassi
prod-explicit Θ:Γ T
R
Θ : Γ T : T
Θ : Γ, x :?ci T U
R
Θ = Θ , Γ ?sj , Γ ?ci : T →?sj
Θ : Γ, x :?ci T U : s R
Θ : Γ Πx : T.U Θ : Γ Πx :?ci T .U : s prod-implicit Θ = Θ, Γ ?sj , Γ ?i :?sj
R
Θ : Γ, x :?i U Θ : Γ, x :?i U : s
Θ : Γ Πx :?.U
R
Θ : Γ Πx :?i .U : s
app-head-arg Θ:Γ f
R
R
Θ : Γ f : F Θ : Γ a Θ : Γ a : A Θ , cf : F → Πx : T → U = lubΠ (Θ , Γ, F ) Θ , ca : A → T = lub(Θ , Γ, A, T ) R
Θ : Γ f a Θ : Γ (cf f ) (ca a ) : U [ca a /x] The auxiliary function lubΠ (Θ, Γ, F ) returns a couple Θ , c : T → Πx : U.V such that in Θ and Γ we have F ≈ T and Πx : U.V is the least upper bound of all solutions in the coercion graph. Note that, according to the restrictions we made, F cannot be a flexible term. Thus the computation of the least upper bound is as in Chen. The auxiliary function lub(Θ, Γ, T, U ) returns a couple Θ , c : T → U such that the type of the coercion c can be unified to T → U in Γ and Θ and the coercion is the least upper bound of the solutions in the coercion graph. The lub function is defined according to the restriction on functions in the algebraic language. Indeed, by hypothesis we must only consider the following two cases corresponding to the two kind of functions in our language: 1. f has type S → T for some structure type S and some type T and a has type ?1 or it has type R for some structure type R. In the first case the lub is the identity coercion and ?1 is unified with S. In the second case the lub is the coercion from R to S in the coercion graph. 2. f has type ?1 .CAR →? and a has type ?2 or it has type ?2 .CAS . In both cases the lub is the identity coercion and the type of a is unified with ?1 .CAR exploiting the coercion graph as explained in Sect. 3. Finally, as expected, our rules are not complete outside the fragment we choose. For instance, assume a coercion from natural numbers to integers and consider the following statement: lemma broken : ∀ f : (∀ A : Type. A →A → Prop). f ?i 3 −2 ∧f ?i −2 3.
Here the type of f is completely specified, and the rule prod-explicit is applied. The term f ?i , which is outside our fragment, has type ?i → (?i → Prop) and it is passed an argument of type nat the first time and an argument of type int the second time. No backtracking free algorithm would be able to type this term.
Working with Mathematical Structures in Type Theory
5
171
Conclusions
In this paper we addressed the problem of representing mathematical structures in a proof assistant based on a type theory with dependent types, telescopes and a computational version of Leibniz equality. We show how to represent dependently typed records with manifest fields in type theory exploiting coercive subtyping and unification up to coherence in coercion graphs. We made a significant advancement with respect to [16] since we do not require induction-recursion to have the with construct. Unification up to coherence seems also a novel approach. We have also identified a significant fragment of algebra for which a backtracking-free coercion-aware type reconstruction algorithm can be efficiently implemented. This latter result requires further investigation (to enlarge the fragment) and a formal proof of completeness.
References 1. Bailey, A.: Coercion synthesis in computer implementations of type-theoretic frameworks. In: Gim´enez, E. (ed.) TYPES 1996. LNCS, vol. 1512, pp. 9–27. Springer, Heidelberg (1998) 2. Bailey, A.: The Machine-Checked Literate Formalisation Of Algebr. In: Type Theory. PhD thesis, University of Manchester (1998) 3. Barthe, G.: Implicit coercions in type systems. In: Berardi, S., Coppo, M. (eds.) TYPES 1995. LNCS, vol. 1158, pp. 1–15. Springer, Heidelberg (1996) 4. Betarte, G., Tasistro, A.: Formalization of systems of algebras using dependent record types and subtyping: An example. In: Proceedings of the 7th. Nordic workshop on Programming Theory, Gothenburg (1995) 5. Betarte, G., Tasistro, A.: Extension of Martin-L¨ of’s type theory with record types and subtyping. In: Twenty-five Years of Constructive Type Theory, Oxford Science Publications (1998) 6. Chen, G.: Subtyping, Type Conversion and Transitivity Elimination. PhD thesis, University Paris 7 (1998) 7. Chen, G.: Coercive subtyping for the calculus of constructions. In: The 30th Annual ACM SIGPLAN - SIGACT Symposium on Principle of Programming Language (POPL) (2003) 8. Coquand, T., Pollack, R., Takeyama, M.: A logical framework with dependently typed records. Fundamenta Informaticae 65(1-2), 113–134 (2005) 9. Cruz-Filipe, L., Geuvers, H., Wiedijk, F.: C-corn, the constructive coq repository at nijmegen. In: Asperti, A., Bancerek, G., Trybulec, A. (eds.) MKM 2004. LNCS, vol. 3119, pp. 88–103. Springer, Heidelberg (2004) 10. Dybjer, P.: A general formulation of simultaneous inductive-recursive definitions in type theory. Journal of Symbolic Logic 65(2) (2000) 11. Gonthier, G.: A computer-checked proof of the four-colour theorem. Available at: http://research.microsoft.com/∼ gonthier/4colproof.pdf 12. Hedberg, M.: Unpublished proof formalized in lego by T. Kleymann and in coq by B. Barras, http://coq.inria.fr/library/Coq.Logic.Eqdep dec.html
172
C. Sacerdoti Coen and E. Tassi
13. Luo, Z.: An Extended Calculus of Constructions. PhD thesis, University of Edinburgh (1990) 14. Luo, Z.: Coercive subtyping. J. Logic and Computation 9(1), 105–130 (1999) 15. Mu˜ noz, C.: A Calculus of Substitutions for Incomplete-Proof Representation in Type Theory, November 1997. PhD thesis, INRIA (1997) 16. Pollack, R.: Dependently typed records in type theory. Formal Aspects of Computing 13, 386–402 (2002) 17. Saibi, A.: Typing algorithm in type theory with inheritance. In: The 24th Annual ACM SIGPLAN - SIGACT Symposium on Principle of Programming Language (POPL) (1997)
On Normalization by Evaluation for Object Calculi J. Schwinghammer Programming Systems Lab, Saarland University, Saarbr¨ ucken, Germany
Abstract. We present a procedure for computing normal forms of terms in Abadi and Cardelli’s functional object calculus. Even when equipped with simple types, terms of this calculus are not terminating in general, and we draw on recent ideas about the normalization by evaluation paradigm for the untyped lambda calculus. Technically, we work in the framework of Shinwell and Pitts’ FM-domain theory, which leads to a normalization procedure for the object calculus that is directly implementable in a language like Fresh O’Caml.
1
Introduction
Normalization by evaluation (NBE), sometimes referred to as reduction-free normalization, is a technique for computing normal forms of terms. It was proposed in [7] as an efficient method for proof normalization, based on the representation of natural deduction proofs as terms of the simply typed lambda calculus (possibly enriched with constants). The underlying principle, once discovered, is rather simple: In a model of the calculus, the denotations a and a of any convertible terms a ↔ a are necessarily identified, so if it is possible to extract a normal form b representing a semantic element d (i.e., such that b = d) then interpretation followed by extraction yields a normal form for any given term. Of course, the trick is to find models that allow for such term extraction. Residualizing models contain (representations of) syntax and provide basic operations on terms, which can be used to construct normal forms inside the model. For instance, a residualizing interpretation of simply typed lambda calculus may be obtained by constructing the full set-theoretic hierarchy over the set of terms, where a term extraction function ↓, along with a term embedding function ↑ from lambda terms to semantic elements, can be defined by mutual induction on the type: letting ↓ι (a) = a and ↑ι (a) = a at base types ι, one considers ↓A⇒B (f ) = lam(x, ↓B (f (↑A (var x)))) ↑
A⇒B
(a) = λ(v ∈ A). ↑ (app(a, ↓ (v)) B
A
(1) (2)
where lam, app and var are used as constructors for lambda terms. Indeed, [7] proves that, as long as the variable x in (1) is chosen ‘fresh’, if ρ is the identity environment then ↓A(aρ ) is a βη-long normal form of the term a : A. If one uses a (functional) programming language as an adequate meta-language to describe the interpretation and extraction of terms, then NBE leads to M. Miculan, I. Scagnetto, and F. Honsell (Eds.): TYPES 2007, LNCS 4941, pp. 173–187, 2008. c Springer-Verlag Berlin Heidelberg 2008
174
J. Schwinghammer
a normalization method that is directly executable, and correct by construction. The method is robust and widely applicable; it has been adapted to various type theories, going well beyond simply typed lambda calculus (for instance, see [4,3,8,7]). An introductory survey, including applications of NBE to partial evaluation of functional programs, can be found in [10]. In this paper, we present a NBE procedure for Abadi and Cardelli’s functional object calculus [1]. Even when equipped with simple types, the terms of this calculus are not in general convertible to normal forms. Recursion is inherent to objects, and arises from the presence of a distinguished ‘self’ identifier that provides access to the host object from within method bodies. (Terms are also not necessarily normalizing in similar foundational calculi for object-oriented languages, e.g. [12], and we expect that our results can be straightforwardly adapted.) Thus, we draw heavily on work about NBE of untyped lambda calculus [11,2]. In hindsight this is not surprising, since indeed it bears many similarities to the self-application model of method invocation in object calculi. To take into account the partiality of the normalization function, the correctness criterion is weakened accordingly [11]: it comprises of soundness (results are normal forms of the respective input), identification (convertible inputs yield the same result), and completeness (the function is defined on every input that is convertible to a normal form), which we also establish for our NBE procedure. Just as in the defining equation (1) for abstractions above, a technical complication arises during the extraction of terms corresponding to the methods of an object in normal form. One has to find variables that are fresh, not simply with respect to some given term, but rather ‘globally’: in general the name of the bound variable must be chosen before the term for the method body can be constructed recursively. Most previous work has addressed this issue by guaranteeing that variables generated in the normalization process are indeed globally unique, for instance, by implementing a name generator using a state monad, or avoided name clashes by adopting de Bruijn levels to identify variables. In contrast, we follow Pitts and construct a residualizing model using nominal sets, which allow for a rigorous yet fairly lightweight treatment of binding constructs, via built-in notions of finite support and freshness [16]. More precisely, our model for the object calculus is a variant of the untyped domain model of [17], where (1) we replace the category of domains by the FM-domain theory of Shinwell and Pitts, and (2) use a continuation semantics in the style of Shinwell and Pitts, and Benton and Leperchey [18,19,6]. This provides a neat solution to the problem of interpreting fresh name generation in the meta-language, and allows for a conceptually clear presentation of NBE and its correctness. While the overall structure of our proof closely follows that of Filinski and Rohde’s [11], proofs of individual properties have a distinct flavour due to the continuation semantics. In particular, the central relation between denotations and syntax is an instance of the relational -lifting of [14,19]. Outline. The next section recalls Abadi and Cardelli’s calculus. Section 3 summarizes the relevant aspects from FM domain theory. The construction of the
On Normalization by Evaluation for Object Calculi a : at x : at
a. : at
a : at
m : mnf
a. := m : at
175
(∀i ∈ [k]) mi : mnf
a : at
a : nf
[i = mi i∈[k] ] : nf
a : nf
ς(x)a : mnf
Fig. 1. Atomic terms and normal forms If a ≡ [i = mi i∈[k] ], j ∈ [k] and mj ≡ ς(xj )aj then Selection: a.j ↔ (xj → a)(aj ) Update: a.j := m ↔ [i = mi i∈[k],i=j , j = m] Fig. 2. Conversion semantics
normalization procedure is given in Section 4, its correctness is established in Section 5 which forms the technical core of the paper. The appendix contains the existence proof omitted in Section 5.
2
Syntax and Conversion Semantics of Object Calculus
Syntax. Fix a set L of labels, ranged over by , and let x, y range over a countably infinite set of variables Var . For k ∈ N, let [k] = {1, . . . , k}. The set obj of object calculus terms is defined by the following grammar: a, b ∈ obj ::= x | [i = mi i∈[k] ] | a. | a. := m m ∈ meth ::= ς(x)a The self binder of methods is the only binding construct (ς(x)a binds x in the method body a). This determines the set fv (a) of free variables of a, and we identify terms up to α-equivalence. Given a (finite) map θ : Var → obj we write capture-avoiding simultaneous substitution as θ(a). The set of atomic (or neutral ) terms and (method) normal forms, respectively, are defined inductively by the inference rules in Figure 1. These will be the output of the nbe procedure (if any). They roughly correspond to the ‘wellformed’ irreducible terms with respect to the usual reduction semantics from [1] i∈[k] (informally called results in [1]). In particular, note that [i = ς(xi )ai ]. and i∈[k] [i = ς(xi )ai ]. := ς(y)b are not normal forms whenever = i for all i ∈ [k] even though both are irreducible. It is not difficult to repair this mismatch by considering a minor variation of the reduction relation; e.g. as in [9]. Conversion. The conversion relation a ↔ b between terms is the least equivalence relation on obj containing the axioms in Figure 2 that is compatible, i.e., it is reflexive, symmetric, transitive, and for all contexts C[−] with a single hole, if a ↔ b then C[a] ↔ C[b].
176
3
J. Schwinghammer
FM Domain Theory
We work in the category FM-Cppo of FM-cppos over Var and equivariant strict continuous functions. To keep this paper self-contained we recall the necessary definitions from [19,18]. In the following we write (x x ) for the transposition that swaps x and x and fixes all other y ∈ Var, and denote by perm = perm(Var ) the group of bijections π : Var → Var generated by all transpositions (i.e., where π(x) = x for all but finitely many x ∈ Var ); id stands for the identity. An action of perm on a set A is an operation · : perm × A → A such that id · a = a and π · (π · a) = (π ◦ π ) · a for all a ∈ A and π, π ∈ perm. An FM-set A is given by an action of perm on A such that every a ∈ A is finitely supported, meaning that there exist finite sets V ⊆ Var such that for all π ∈ perm, ∀(x ∈ V ) π(x) = x ⇒ π · a = a. Each a ∈ A in fact possesses a smallest such set supp(a) supporting a. Every set A gives rise to an FM-set when equipped with the trivial action π · a = a. Syntax as FM-Set. The set of obj-terms may be turned into an FM-set, where the action of perm is defined by structural recursion and the interesting cases are π · x = π(x) and π · ς(x)a = ς(π(x))(π · a). This gives rise to a well-defined action of perm on α-equivalence classes of terms such that the notion of support coincides with that of free variables, i.e., supp(a) = fv (a) for all a ∈ obj. Moreover, α-equivalence itself can be characterised using the action on terms, as the least congruence relation ∼ such that whenever (x y) · a ∼ (x y) · a for some/all y not occurring in a, a and different from x and x , then ς(x)a ∼ ς(x )a [16]. Domain-theoretic Constructions. An FM-poset is an FM-set A equipped with a partial order on A that is compatible with the group action, i.e. where a a implies π · a π · a for all a, a and π. More generally, a subset D ⊆ A of an FM-set A is finitely supported if there exists finite V ⊆ Var such that for all π that fix V pointwise, a ∈ D ⇔ π · a ∈ D. By an FM-cpo we mean an FM-poset A where each finitely supported directed subset D has a least upper bound D. (The use of directed-complete FM-cpos is not essential for our purposes; we could just as well have used chain-complete FM-posets as in [19].) A continuous function f : A → B between FM-cpos is a monotonic function f from A to B such that (1) f is equivariant : f (π · a) = π · (f (a)) for all a and π, and (2) f preserves least upper bounds: f (D) = f (D) for all finitely supported directed sets D ⊆ A. An FM-cppo is an FM-cpo that possesses a least element ⊥, for which necessarily supp(⊥) = ∅ holds. A continuous function f between FM-cppos is strict if f (⊥) = ⊥. The smash product A1 ⊗ A2 and coalesced sum A1 ⊕ A2 of FM-cppos A1 and A2 are given by the corresponding construction on pointed cpos (e.g. [15]) where perm acts by π · a, a = π · a, π · a and π · ιi (a) = ιi (π · a) for i = 1, 2, resp. (We may omit the tags ιi below). If A is a FM-cpo then its lift A⊥ is obtained
On Normalization by Evaluation for Object Calculi
177
by adjoining a new element ⊥ ∈ / A, with π · ⊥ = ⊥ and ⊥ a for all π, a; conversely, for an FM-cppo, A↓ is the FM-cpo obtained by removing ⊥. The function space A → B between FM-cpos consists of those monotonic functions f : A → B that preserve least upper bounds of finitely supported directed subsets of A, and additionally are finitely supported in the sense that there exists a finite set V ⊆ Var such that for all π ∈ perm and a ∈ A, ∀(x ∈ V ) π(x) = x ⇒ π · (f (a)) = f (π · a). If A, B are FM-cppos then the strict function space A B is the subset of functions in A → B that additionally preserve ⊥. When equipped with the action where (π · f )(a) = π · (f (π −1 · a)) for a ∈ A, both sets become FM-c(p)pos. For an FM-cppo A we also consider an FM-cppo of L-labelled records with entries from A: its underlying set is RecL (A) = (L → A↓ ) ⊥ (3) L⊆fin L
For ⊥ = ιL (r) ∈ RecL (A) we write dom(r) = L and use record notation {|1 = a1 , . . . , k = ak |} if L = {1 , . . . , k } and r(i ) = ai for all i ∈ [k]. We shall also write r. for r() and r. := a for the record that maps to a and all other ∈ dom(r) to r. (assuming ∈ dom(r); the expressions denote ⊥ otherwise). The ordering on (3) is given by r r
⇔
r = ⊥ ⇒ dom(r) = dom(r ) ∧ ∀( ∈ dom(r)) r. r .,
and the action of perm by (π · r). = π · (r.) for all ∈ dom(r). In particular, supp(r) = ∈dom(r) supp(r.) so that r is finitely supported. Continuation Monad. For the purposes of giving a denotation to objects in a continuation semantics we denote by A⊥ the FM-cppo A 1⊥ and by A⊥⊥ the FM-cppo (A⊥ )⊥ = (A 1⊥ ) 1⊥ , where 1 is a singleton cppo. We shall write return a for the unit λ(h ∈ A⊥ ). h(a) of the continuation monad, and denote by f ∗ ∈ (A⊥⊥ B ⊥⊥ ) the extension f ∗ (d)(h) = d(λ(a ∈ A). f (a)(h)) of a function f ∈ (A B ⊥⊥ ). The notation let a ⇐ d in e[a], where e may ∗ depend strict continuously on a, stands for (λ(a ∈ A). e[a]) (d). Note that return ∗ and (−) are equivariant and (strict) continuous operations, and that we have: let a ⇐ return a in e[a] = e[a ] let a ⇐ (let a ⇐ d in e [a ]) in e[a] = let a ⇐ d in let a ⇐ e [a ] in e[a]. A Domain Equation for Objects. An account of the self-application inherent in Abadi and Cardelli’s object calculus requires a recursively defined domain. As
178
J. Schwinghammer
outlined in [19], the constructions on FM-cppos are functorial, with a locally FMcontinuous action on the morphisms of FM-Cppo. Essentially, this means that solutions to recursive domain equations can be found by the classical technique using embedding-projection pairs, suitably adapted to FM-cppos by replacing arbitrary directed sets in the construction by finitely supported ones. Similar to the untyped lambda calculus, for the object calculus we will be interested in a model where the space of ‘records of pre-methods’ is a retract of the model. More precisely, given an FM-cppo A, we let FA : FM-Cppoop × FM-Cppo −→ FM-Cppo be the locally FM-continuous functor FA (X, Y ) = A ⊕ RecL (X Y ⊥⊥ )
(4)
and observe that the construction referred to above yields solutions D with i : FA (D, D) ∼ = D that are minimal invariant objects in the sense that the map δ : (D D) → (D D), defined as δ(e) = i ◦ FA (e, e) ◦ i−1 , satisfies id D = lfp(δ). This minimal invariant property will be employed in the existence proof in Section 5 below. For ease of notation we will usually omit the isomorphism i in the following.
4
Normalization Procedure
We will interpret the object calculus in a residualizing model, specified by the following pair of mutually recursive domain equations: O = obj⊥ ⊕ RecL (M)
M = O O⊥⊥
Clearly O can be obtained as the minimal invariant of (4) by choosing A = obj⊥ . Term Constructors. The embedding Var → obj extends to a strict continuous function var ∈ (Var ⊥ obj⊥ ) with empty support, mapping x ∈ Var to x. Similarly, the other ways of constructing obj terms may be viewed as strict continuous functions (with empty support): – – – –
meth : Var ⊥ ⊗ obj⊥ meth⊥ sends x, a = ⊥ to ς(x)a obj : RecL (meth⊥ ) obj⊥ sends {|i = mi i∈[k] |} to [i = mi i∈[k] ] sel : obj⊥ ⊗ L⊥ obj⊥ sends a, = ⊥ to a. upd : obj⊥ ⊗ L⊥ ⊗ meth⊥ obj⊥ sends a, , m = ⊥ to a. := m
If we let tm(a) = i(ι1 (a)) and rec(r) = i(ι2 (r)) then any element ⊥ = d ∈ O may be uniquely written as either d = tm(a) or d = rec r for (uniquely determined) a ∈ obj and ⊥ = r ∈ RecL (M), respectively. Reification and Reflection. The reason for using FM domain theory and the continuation semantics is that it lets us choose fresh variable names: there exists an element fresh in the FM-cppo (Var ⊥ )⊥⊥ that maps h ∈ (V ar⊥ )⊥ to h(x) ∈ 1⊥ , where x is any variable not in supp(h). The choice of x does not matter
On Normalization by Evaluation for Object Calculi
179
xη = return(η(x)) [i = mi i∈[k] ] = let (fi ⇐ mi η | i ∈ [k]) in return(rec( {|i = fi i∈[k] |} )) η
a.η = match aη with tm(b) ⇒ ↑(sel(b, )) | rec(r) ⇒ r.(rec r) a.l := mη = match aη with tm(b) ⇒ let m ⇐ ↓m ∗ (mη ) in ↑(upd(b, , m )) | rec(r) ⇒ let f ⇐ mη in return(rec(r. := f )) ς(x)aη = return(λ(d ∈ O). if d=⊥ then ⊥ else aη[x:=d] ) Fig. 3. Interpretation of obj-terms in O⊥⊥
since the action of perm on 1⊥ is necessarily trivial, hence if x, y ∈ / supp(h) then h(x) = ((x y) · h)(x) = (x y) · (h(y)) = h(y). Now reflection ↑ : obj⊥ O⊥⊥ , ↑(a) = return(tm a), lets us view terms as elements of O. (Conceptually, it would be enough to have reflection for atomic terms only.) Conversely, the (mutually recursive) reification functions ↓ : O (obj⊥ )⊥⊥ and ↓m : M (meth⊥ )⊥⊥ , allow us to read back object calculus terms from semantic elements. They are defined as least fixed points, by the equations ↓(⊥) = ⊥ ↓(tm a) = return(a)
↓(rec r) = let (m ⇐ ↓m (r.) | ∈ dom(r)) in return obj( {| = m ∈dom(r) |} ) ↓m (f ) = let x ⇐ fresh in let a ⇐ ↓∗ (f ∗ (↑(var x))) in return meth(x, a) where ↓m makes use of the function fresh described above. Interpretation and Normalization of Objects. We interpret obj in O. More precisely, given an environment η ∈ Env = Var →O such that η(x) = ⊥ for all x, the denotation of each term a ∈ obj is an element aη ∈ O⊥⊥ . Similarly, the denotation of each method m ∈ meth is an element mη ∈ M⊥⊥ . The notation match d with tm(a) ⇒ e1 [a] | rec(r) ⇒ e2 [r] (where e1 and e2 may depend strict continuously on a and r, resp.) stands for the case construct let v ⇐ d in (λ(a ∈ obj⊥ ). e1 ) ⊕ (λ(r ∈ RecL (M)). e2 ) (v), so that in particular
⎧ ⎪ if d = ⊥ ⎨⊥ match (return d) with tm(a) ⇒ e1 [a] | rec(r) ⇒ e2 [r] = e1 [a] if d = tm a ⎪ ⎩ e2 [r] if d = rec r
180
J. Schwinghammer
The interpretation is given in Figure 3, defined by recursion on obj-terms. It can be verified that −η respects α-equivalence and therefore is also well-defined on α-equivalence classes. Alternatively, −η may be directly defined on equivalence classes by α-structural recursion [16]. Lemma 1 (Substitution). For all a, x = x1 . . . xn , b = b1 . . . bn , d = d1 . . . dn and η, if bi η = return(di ) for all i ∈ [n] then (x → b)(a)η = aη[x:=d] . Proof. By α-structural induction on a, exploiting that the denotation of a only depends on the value of the environment on fv (a) = supp(a) which is similarly proven by α-structural induction using the definition of a → aη . See [16]. Theorem 1 (Model soundness). If a ↔ a then a = a . Proof (sketch). By induction on the derivation of a ↔ a . For instance, we have for all a of the form [i = mi i∈[k] ] and j ∈ [k] with each mi of the form ς(xi )ai : a.j η = match aη with tm(b) ⇒ ↑(sel(b, )) | rec(r) ⇒ r.(rec r) = gj (rec {|i = gi i∈[k] |} )
(for gi = λd.if d=⊥ then ⊥ else ai η[xi :=d] )
= aj η[xj :=rec {|i =gi i∈[k] |} ] = (xj → a)aj η where the second equation follows from aη = return(rec {|i = gi i∈[k] |} ) for gi = λd.if d=⊥ then ⊥ else ai η[xi :=d] , the second is by definition of gj , and the last equation is by Lemma 1. The case for update is similar, the cases for the equivalence and compatibility rules are immediate by induction. Note that return is injective. Thus we may define norm : obj → obj⊥ to be the partial map satisfying norm(a) = b
⇔
↓∗ (aη0 ) = return(b)
(some b ∈ obj)
(5)
and norm(a) = ⊥ otherwise. Here, η0 = λ(x ∈ Var). tm(var x) denotes the environment that maps every variable to the corresponding element of O.
5
Correctness
Following [11], the correctness properties we expect from norm : obj → obj⊥ are split into three parts: Soundness. If the normalization function is defined, then the output is convertible to the input, and in normal form: norm(a) = a ⇒ a : nf ∧ a ↔ a . Identification. The normalization function yields equal results on convertible terms: a ↔ a ⇒ norm(a) = norm(a ).
On Normalization by Evaluation for Object Calculi
181
⊥ obj b ⇔ true tm a obj b ⇔ a : at ∧ a = b rec r obj [i = mi i∈[k] ] ⇔ {i | i ∈ [k]} = dom(r) ∧ ∀(i ∈ [k]) r.i meth mi f meth ς(x)a ⇔ ∀(d ∈ O) ∀(b ∈ obj) d obj b ⇒ f (d) (x := b)(a) d b ⇔ d (obj )⊥⊥ b Fig. 4. Relations obj ⊆ O × obj, meth ⊆ M × meth, and ⊆ O⊥⊥ × obj
Completeness. The normalization function will be defined whenever the input term has a normal form: a ↔ a ∧ a : nf ⇒ norm(a) = ⊥. While identification and completeness are fairly direct consequences of Theorem 1, the proof of soundness requires more work: as explained by Filinski and Rohde, the property is closely related to proofs of adequacy of a denotational semantics with respect to an operational one [11]. Relating Denotations to NBE Results. For an FM-cppo A and relation R ⊆ A × obj we let R⊥ ⊆ (A (obj⊥ )⊥⊥ ) × ctxt and R⊥⊥ ⊆ A⊥⊥ × obj be the relations, resp., defined by: ϕ R⊥ C[−] ⇔ ∀(d ∈ A) ∀(b ∈ obj) d R b ∧ ϕ(d) = ⊥ ⇒ ∃(a : nf) ϕ(d) = return(a) ∧ C[b] ↔ a ⊥ ⊥ ∗ d R⊥⊥ b ⇔ ∀(ϕ ∈ (A obj⊥ ⊥ ))∀(C[−] ∈ ctxt) ϕ R C[−] ∧ ϕ (d) = ⊥ ⇒
∃(a : nf) ϕ∗ (d) = return(a) ∧ C[b] ↔ a
Using this notation, Figure 4 defines a (recursive) relation = (obj )⊥⊥ ⊆ O⊥⊥ × obj. Note that the existence of such a relation is not immediately obvious, due to both positive and negative occurrences of the relation in the clause for meth . The existence proof follows the method of Pitts [15] (exploiting the minimal invariant property of the domain O), and is given as Theorem A.4 in the appendix. Lemma 2 (Lifting). If d obj b then return(d) b. Proof. By definition, since ϕ∗ (return d) = ϕ(d).
Lemma 3 (Fundamental property of ). Suppose η is an environment and θ is a substitution such that η(x) obj θ(x) for all x ∈ fv (a). Then 1. for all a ∈ obj, aη θ(a), and 2. for all m ∈ meth and g ∈ M, if mη = return g then g meth θ(m). Proof. Simultaneously by α-structural induction on a and m, respectively. The cases where a is x or [i = mi i∈[k] ] are easy. If a is a . then some desugaring
182
J. Schwinghammer
of the match expression yields aη = let v ⇐ a η in (f1 ⊕ f2 )(v ) where f1 ∈ (obj⊥ O⊥⊥ ) and f2 ∈ (RecL (M) O⊥⊥ ) are f1 (b) = ↑(sel(b, ))
f2 (r) = r.(rec r)
We must prove that aη θ(a), i.e., that there exists b : nf such that both let v ⇐ aη in ϕ(v) = return(b) and C[θ(a)] ↔ b hold whenever ϕ (obj )⊥ C[−] and let v ⇐ aη in ϕ(v) = ⊥. So let ϕ (obj )⊥ C[−] and suppose let v ⇐ aη in ϕ(v) = ⊥. By part 1 of the induction hypothesis, a η θ(a ), hence for all ϕ (obj )⊥ C [−], let v ⇐ a η in ϕ (v ) = ⊥ ⇒ ∃(b : nf) let v ⇐ a η in ϕ (v ) = return(b ) ∧ C [θ(a )] ↔ b . (6) Thus, instantiating (6) with ϕ (v ) = let v ⇐ (f1 ⊕ f2 )(v) in ϕ(v ) and C [−] = C[[−].], and observing that by associativity let v ⇐ a η in ϕ (v ) = let v ⇐ aη in ϕ(v) = ⊥, we find that ϕ (obj )⊥ C [−] implies ∃(b : nf) let v ⇐ aη in ϕ(v) = return(b) ∧ b ↔ C [θ(a )] = C[θ(a)] from which we may conclude existence of the required normal form b once we have established ϕ (obj )⊥ C [−]. To that end, let d0 obj b0 and assume ϕ (d0 ) = ⊥. In particular, d0 = ⊥, so either d0 = tm a0 for a0 ∈ obj, or d0 = rec r where ⊥ = r ∈ RecL (M). In the first case, a0 : at and a0 ↔ b0 by definition of obj . Hence, tm(a0 .) obj b0 ., and ⊥ = ϕ (d0 ) = let v ⇐ f1 (a0 ) in ϕ(v) = ϕ(tm (a0 .)) combined with ϕ (obj )⊥ C[−] guarantees the existence of b : nf such that ϕ (d0 ) = return(b) and b ↔ C[a0 .] ↔ C[b0 .] ↔ C [b0 ]. In the second case, where d0 = rec r, by the definition of obj b0 must be [i = mi i∈[k] ] where {i | i ∈ [k]} = dom(r) and r.i meth mi for all i. From ⊥ = ϕ (d0 ) we obtain ϕ (d0 ) = let v ⇐ f2 (r) in ϕ(v) = let v ⇐ r.(rec r) in ϕ(v). Writing mi as ς(xi )ai , the assumption d0 obj b0 immediately yields r.(rec r) (xi → b0 )(ai ), so that from the assumption ϕ (obj )⊥ C[−] we obtain ϕ (d0 ) = return(b ) and b ↔ C[(xi → b0 )(ai )]] for some b : nf. Thus also C [b0 ] = C[b0 .i ] ↔ C[(xi → b0 )(ai )] ↔ b . This proves ϕ (obj )⊥ C [−] and concludes this case of the inductive proof. The case where a is a . := m is similar, using ϕ (v ) = let v ⇐ (f1 ⊕ f2 )(v) in ϕ(v) where f1 (b) = let m ⇐ ↓m ∗ mη in ↑(upd(b, , m )) and f2 (r) = let f ⇐ mη in return(rec r. := f ), and the context C [−] = C[[−]. := θ(m)].
On Normalization by Evaluation for Object Calculi
183
Finally, let us consider the case of methods, where m is ς(x)a and we may assume that x ∈ / supp(θ) = x∈dom(θ) {supp(θ(x))} ∪ dom(θ). Then mη denotes return(g) for g(d) = if d = ⊥ then ⊥else aη[x:=d] , and θ(m) = ς(x)θ(a). We must prove that mη meth θ(m), so let d obj b. Observe that for η = η[x := d] and θ = (x → b) ◦ θ we have η (y) obj θ (y) for all y ∈ Var , hence by part 1 of the induction hypothesis (mη )(d) = aη θ (a) = (x → b)(θ(a)). Since this is true for all d obj b we have proved mη meth θ(m).
Lemma 4 (Reification of related elements) 1. For all a ∈ obj, if a : at then tm(a) obj a. 2. For all d ∈ O, b ∈ obj, if d obj b and ↓(d) = ⊥ then ↓(d) = return(a) for some a : nf such that a ↔ b. 3. For all f ∈ M, m ∈ meth, if f meth m and ↓m (f ) = ⊥ then ↓m (f ) = return(m ) for some m : mnf such that m ↔ m. Proof (sketch). Part 1 is immediate from the definition of obj . The proof of the second and third part is by fixed point induction with respect to the admissible predicates P ⊆ O (obj⊥ )⊥⊥ and Q ⊆ M (meth⊥ )⊥⊥ , P = {ϕ | ϕ (obj )⊥ [−]} Q = {ψ | ∀f meth m. ψ(f ) = ⊥ ⇒ ∃(m : mnf) ψ(f ) = return(m ) ∧ m ↔ m} More precisely, defining Φ : (M (meth⊥ )⊥⊥ ) → (O (obj⊥ )⊥⊥ ) by Φ(ψ)(d) = match d with tm(a) ⇒ return(a) | rec(r) ⇒ let (m ⇐ ψ(r.) | ∈dom(r)) in return(obj {| = m ∈dom(r) |} )
and Ψ : (O (obj⊥ )⊥⊥ ) → (M (meth⊥ )⊥⊥ ) by Ψ (ϕ)(f ) = let x ⇐ fresh in let a ⇐ ϕ∗ (f ∗ (↑(var x))) in return(meth(x, a))
we prove ϕ ∈ P ⇒ Ψ (ϕ) ∈ Q and ψ ∈ Q ⇒ Φ(ψ) ∈ P , for then the definition of ↓ = lfp(Φ ◦ Ψ ) ∈ P and ↓m = Φm (↓) = lfp(Ψ ◦ Φ) ∈ Q as least fixed points have the required properties, by definition of (obj )⊥ and Q, respectively. Lemma 5 (Definedness of normal forms). Suppose that for all x ∈ fv (a) ∪ fv (m), η(x) = tm(b) for some b ∈ obj. Then 1. if a : at then aη = return(tm a ) for some a ∈ obj; 2. if a : nf then ↓∗ (aη ) = return(a ) for some a ∈ obj; and 3. if m : mnf then ↓m ∗ (mη ) = return(m ) for some m ∈ meth. Proof. By induction on the derivation of a : at, a : nf and m : mnf, resp.
184
J. Schwinghammer
We can now prove correctness: since tm(var x) obj x by Lemma 4(1) we have aη0 a for η0 = λ(x ∈ Var ). tm(var x), by Lemma 3. Hence, if norm(a) = ⊥ then ↓(a η0 ) = return(a ) for some a : nf such that a ↔ a , by Lemma 4(2), and norm(a) = return(a ). Conversely, if a ↔ a then ↓∗ (aη0 ) = ↓∗ (a η0 ) by Theorem 1. In particular, if a : nf then ↓∗ (aη0 ) = ⊥ by Lemma 5(2), and we have shown: Theorem 2 (Correctness) 1. If norm(a) = ⊥ then norm(a) = return(a ) for some a : nf such that a ↔ a. 2. If a ↔ a then norm(a) = norm(a ). 3. If a ↔ a for some a : nf, then norm(a) = ⊥.
6
Conclusion
We have proved correctness of NBE for Abadi and Cardelli’s (untyped) object calculus, giving a semi-decision procedure for the simplest of the equational theories presented in [1]. Shinwell and Pitts prove that the continuation semantics forms an adequate model of the Fresh O’Caml dialect of SML [19]. In this sense, the normalization result leads to an implementation that is correct by construction. The previous approach of Filinski and Rohde, using a de Bruijn-level naming scheme for the computed normal forms, clearly carries over to object calculus. Correspondingly, our main contribution here is not so much the consideration of the object calculus but working out the details of the construction in the world of nominal sets. We believe that our work provides further evidence to support the point made in [16]: an approach to NBE using nominal sets in the formal development “allows us to retain the essential simplicity of an informal account [. . . ]”, without obscuring the basic ideas by issues of name generation. However, while this is true for the statement of the properties, the use of the continuation monad certainly complicates some of the proofs, as a comparison to [11] shows: Filinski and Rohde prove correctness of NBE for the untyped lambda calculus using standard domain-theoretic methods, and handle name generation by means of ‘wrapper functions’ and Kripke relations to keep track of used names. The proof of our logical relations lemma (Lemma 3) is less straightforward than that of the corresponding property in [11]. Moreover, their constructions can be implemented in ‘conventional’ functional programming languages, not relying on language support for freshness. On the other hand, as observed by one of the reviewers, these facts also indicate that NBE may simply not be a good application for demonstrating the rather more powerful machinery of nominal sets: deconstruction and pattern matching of abstract syntax with binders, which is supported by nominal sets, is not necessary for NBE. (Pattern matching will be implicitly used in an implementation, when comparing two normal forms in (obj⊥ )⊥⊥ for equality; due to the ‘extraction’ of a term b from the continuation semantics in (5), the given definition of norm is computationally not meaningful.) One remaining question is how to capture the more refined equational theories of objects presented in [1] which rely on types. The problem here is that subtyping
On Normalization by Evaluation for Object Calculi
185
is an obstacle to defining a reification map. Another question that we leave open is the generalization from computing normal forms to computing B¨ ohm trees [5], which have a natural analogue in the object calculus. For untyped lambda terms, [11] shows that the domain-theoretic normalization result extends to this infinitary case. It should be interesting to see if a similar generalization is possible within FMdomain theory: for instance, the domain of (lazy) lambda trees used in [11] differs from a correspondingly constructed FM-cpo, in that the FM-cpo cannot contain trees with infinitely many free variables (due to the finite support property). Acknowledgments. I would like thank the reviewers for their valuable comments that helped to improve correctness and readability of the paper.
References 1. Abadi, M., Cardelli, L.: A Theory of Objects. Springer, Heidelberg (1996) 2. Aehlig, K., Joachimski, F.: Operational aspects of untyped normalization by evaluation. Mathematical Structures in Computer Science 14(4), 587–611 (2004) 3. Altenkirch, T., Hofmann, M., Streicher, T.: Reduction-free normalisation for a polymorphic system. In: Proc. LICS 1996, pp. 98–106 (1996) 4. Balat, V., Di Cosmo, R., Fiore, M.P.: Extensional normalisation and type-directed partial evaluation for typed lambda calculus with sums. In: Proc. POPL 2004, pp. 64–76. ACM Press, New York (2004) 5. Barendregt, H.P.: The Lambda Calculus. North-Holland, Amsterdam (1984) 6. Benton, N., Leperchey, B.: Relational reasoning in a nominal semantics for storage. In: Urzyczyn, P. (ed.) TLCA 2005. LNCS, vol. 3461, pp. 86–101. Springer, Heidelberg (2005) 7. Berger, U., Schwichtenberg, H.: An inverse of the evaluation functional for typed λ-calculus. In: Proc. LICS 1991, pp. 203–211. IEEE Computer Society Press, Los Alamitos (1991) 8. Coquand, T., Dybjer, P.: Intuitionistic model constructions and normalization proofs. Mathematical Structures in Computer Science 7(1), 75–94 (1997) 9. de Liguoro, U.: Characterizing convergent terms in object calculi via intersection types. In: Abramsky, S. (ed.) TLCA 2001. LNCS, vol. 2044, pp. 315–328. Springer, Heidelberg (2001) 10. Dybjer, P., Filinski, A.: Normalization and partial evaluation. In: Barthe, G., Dybjer, P., Pinto, L., Saraiva, J. (eds.) APPSEM 2000. LNCS, vol. 2395, pp. 137–192. Springer, Heidelberg (2002) 11. Filinski, A., Rohde, H.K.: Denotational aspects of untyped normalization by evaluation. Theoretical Informatics and Applications 39(3), 423–453 (2005) 12. Fisher, K., Mitchell, J.C.: A delegation-based object calculus with subtyping. In: Reichel, H. (ed.) FCT 1995. LNCS, vol. 965, pp. 42–61. Springer, Heidelberg (1995) 13. Fresh O’Caml. Website (2007), at http://www.fresh-ocaml.org/ 14. Lindley, S., Stark, I.: Reducibility and -lifting for computation types. In: Urzyczyn, P. (ed.) TLCA 2005. LNCS, vol. 3461, pp. 262–277. Springer, Heidelberg (2005) 15. Pitts, A.M.: Relational properties of domains. Information and Computation 127, 66–90 (1996) 16. Pitts, A.M.: Alpha-structural recursion and induction. Journal of the ACM 53, 459–506 (2006) 17. Reus, B., Streicher, T.: Semantics and logic of object calculi. Theoretical Computer Science 316, 191–213 (2004)
186
J. Schwinghammer
18. Shinwell, M.R.: The Fresh Approach: functional programming with names and binders. PhD thesis, University of Cambridge Computer Laboratory (February 2005) 19. Shinwell, M.R., Pitts, A.M.: On a monadic semantics for freshness. Theoretical Computer Science 342, 28–55 (2005)
A
Existence Proof
Let Rel be the set of finitely supported relations R ⊆ O×obj such that {d | d R b} forms an admissible subset of O, for all b ∈ obj. Lemma A.1 (Admissibility preservation) If R ∈ Rel then {d | d R⊥⊥ b} is an admissible subset of O⊥⊥ , for all b ∈ obj. Proof. That ⊥ R⊥⊥ b is immediate from the definition. So let b ∈ obj, D ⊆ O⊥⊥ be a finitely supported directed set such that d R⊥⊥ b holds for all d ∈ D, and let e = D. To prove e R⊥⊥ b let ϕ R⊥ C[−] and suppose ⊥ = ϕ∗ (e) = {ϕ∗ (d) | d ∈ D}. Thus there exists some d ∈ D such that ϕ∗ (d) = ⊥, and for all such d we have that there is a : nf where ϕ∗ (d) = return(a) and C[b] ↔ a by d R⊥⊥ b. (From the discrete ordering on obj it follows that this is the same a for all such d). Thus ϕ∗ (e) = return(a) and the result follows. For R, S ∈ Rel let ΨM (R, S) ⊆ M × meth be ΨM (R, S) = {(f, ς(x)a) | ∀(d ∈ O)∀(b ∈ obj) d R b ⇒ f (d) S ⊥⊥ (x → b)(a)} and Ψ (R, S) ⊆ O × obj be such that (d, b) ∈ Ψ (R, S) holds if and only if 1. d = ⊥, or 2. d = tm(b) and b : at, or 3. d = rec(r), b = [i = mi i∈[k] ] where dom(r) = {i | i ∈ [k]}, and (r.i , mi ) ∈ ΨM (R, S) for all i ∈ [k]. Lemma A.2 (Admissibility) For all R, S ∈ Rel, Ψ (R, S) ∈ Rel. Proof. This follows easily with Lemma A.1.
We note that Rel is a FM-complete lattice with respect to set inclusion, with meets of finitely supported sets given by set-theoretic intersection. Moreover, by Lemma A.2, the symmetrization Ψ § of Ψ , R, S → Ψ § (R, S) = (Ψ (S, R), Ψ (R, S)) is a monotone map on Relop × Rel which has a least (pre-)fixed point (Δ− , Δ+ ) by a variant of the Knaster-Tarski Fixed Point Theorem [18]. Since then also (Δ+ , Δ− ) is a fixed point, one has Δ+ ⊆ Δ− . For the converse, define for e ∈ (O O) and R, S ∈ Rel, e : R ⊂ S ⇔ ∀(d ∈ O)∀(b ∈ obj). (d, b) ∈ R ⇒ (e(d), b) ∈ S, intuitively stating that e maps R-related elements to S-related elements. A consequence of this definition is the following property.
On Normalization by Evaluation for Object Calculi
187
Lemma A.3 Suppose e : R ⊂ S. Then 1. (ϕ, C[−]) ∈ S ⊥ ⇒ (ϕ ◦ e, C[−]) ∈ R⊥ 2. (d, b) ∈ R⊥⊥ ⇒ (λ(h ∈ S ⊥ ). d(h ◦ e), b) ∈ S ⊥⊥ for all ϕ ∈ (O (obj⊥ )⊥⊥ ), C[−] ∈ ctxt, d ∈ O⊥⊥ and b ∈ obj. Proof. Part (1) is verified straightforwardly using the definition of R⊥ and S ⊥ ; part (2) then follows from (1) and the definitions of R⊥⊥ and S ⊥⊥ , resp. Now to show Δ− = Δ+ it suffices to prove that id : Δ− ⊂ Δ+ . Since by the minimal invariant property of O one has id = lfp(δ) this follows by a fixed point induction with respect to the admissible (because Δ+ ∈ Rel) predicate [Δ− , Δ+ ] = {e ∈ (O O) | e : Δ− ⊂ Δ+ }. Clearly ⊥ ∈ [Δ− , Δ+ ]. Moreover, to show e ∈ [Δ− , Δ+ ] ⇒ δ(e) ∈ [Δ− , Δ+ ] it suffices to prove that Ψ satisfies, for all R, S ∈ Rel, e : R ⊂ S ⇒ δ(e) : Ψ (S, R) ⊂ Ψ (R, S),
(7)
for then e ∈ [Δ− , Δ+ ] yields δ(e) : Ψ (Δ+ , Δ− ) ⊂ Ψ (Δ− , Δ+ ). But the latter is just δ(e) : Δ− ⊂ Δ+ by choice of (Δ− , Δ+ ) = Ψ § (Δ− , Δ+ ) ∈ Relop × Rel, establishing that e ∈ [Δ− , Δ+ ] implies δ(e) ∈ [Δ− , Δ+ ]. It remains to prove (7). To this end, assume e : R ⊂ S, and let (d, b) ∈ Ψ (S, R); we show (δ(e)(d), b) ∈ Ψ (R, S). This is clear if δ(e)(d) = ⊥, so assume δ(e)(d) = ⊥. Then, by definition of δ, either d = tm(a) for some a ∈ obj and δ(e)(d) = tm(a), or else d = rec(r) for some ⊥ = r ∈ RecL (M) and δ(e)(r). = δM (e)(r.) for all ∈ dom(r). In the former case, (δ(e)(d), b) ∈ Ψ (R, S) follows directly from the assumption (d, b) ∈ Ψ (S, R), so let us consider the other case. The assumption (rec(r), b) ∈ Ψ (S, R) yields that b is of the form [i = mi i∈[k] ] where dom(r) = {i | i ∈ [k]}, and it remains to show (δM (e)(r.i ), mi ) ∈ ΨM (R, S) for all i ∈ [k]. By the precondition in (7), (e(d ), b ) ∈ S for all (d , b ) ∈ R. Supposing that mi = ς(xi )ai then, since (r.i , mi ) ∈ ΨM (S, R), we have ∀(d ∈ O)∀(b ∈ obj). (d , b ) ∈ R ⇒ (r.i (e(d )), (xi → b )(ai )) ∈ R⊥⊥ .
(8)
Using e : R ⊂ S we may instantiate Lemma A.3(2) by the right-hand side of (8) to obtain for all d ∈ O and all b ∈ obj: (d , b ) ∈ R ⇒ (λ(h ∈ S ⊥ ). r.i (e(d ))(h ◦ e), (xi → b )(ai )) ∈ S ⊥⊥ , which shows the required property (δM (e)(r.i ), mi ) ∈ ΨM (R, S). Hence we have shown (7) and may define obj to equal Δ− = Δ+ , to obtain the following: Theorem A.4 (Existence) There exists a relation obj ∈ Rel such that obj = Ψ (obj , obj ).
Attributive Types for Proof Erasure Hongwei Xi Boston University Abstract. Proof erasure plays an essential role in the paradigm of programming with theorem proving. In this paper, we introduce a form of attributive types that carry an attribute to determine whether expressions assigned such types are eligible for erasure before run-time. We formalize a type system to support this form of attributive types and then establish its soundness. In addition, we outline an extension of the developed type system with dependent types and present some examples to illustrate its use in practice.
1
Introduction
In DML [Xi07,XP99], a restricted form of dependent types are introduced to allow for specification and inference of significantly more accurate type information (when compared to the types in ML) and thus further facilitate effective program error detection and compiler optimization through types. In contrast to the standard full dependent types (as in Martin-L¨ of’s constructive theory [Mar84,NPS90]), types in DML can only depend on indexes drawn from a chosen index language, and type-checking a sufficiently annotated program in DML can be reduced to solving constraints from the chosen index language. This design makes it particularly straightforward to support common realistic programming features such as general recursion and effects (e.g., exceptions and references) in the presence of dependent types. In order to solve constraints in an algorithmically effective manner, certain restrictions need to be imposed on indexes. For instance, constraints on integer indexes in DML are required to be linear,1 and a constraint solver based on the Fourier-Motzkin variable elimination method [DE73] is then employed to solve such constraints. While this is indeed a simple design, it is inherently ad hoc and cannot handle a situation where nonlinear constraints (e.g., ∀n : int. n ∗ n ≥ 0) are involved. Let us now see a simple example that clarifies this point. Let list be a type constructor that takes a type T and an integer I to form a type list(T, I) for lists of length I in which each element is of type T . The two list constructors associated with list are assigned the following types: nil cons
1
: :
∀α. list(α, 0) ∀α.∀ι. ι ≥ 0 ⊃ (α ∗ list(α, ι) → list(α, ι + 1))
This work is partially supported by NSF grants no. CCR-0229480 and no. CCF0702665. More precisely, it is required that constraints on integer indexes in DML be converted into linear integer programming problems.
M. Miculan, I. Scagnetto, and F. Honsell (Eds.): TYPES 2007, LNCS 4941, pp. 188–202, 2008. c Springer-Verlag Berlin Heidelberg 2008
Attributive Types for Proof Erasure
189
which indicate that nil forms a list of length 0 and cons takes an element and a list of length I to form a list of length I + 1. We use α and ι for bound variables ranging over types and integers, respectively. Now assume that the function @ (infix) for appending two lists is given the following type: ∀α.∀ι1 .∀ι2 . list(α, ι1 ) ∗ list(α, ι2 ) → list(α, ι1 + ι2 ) In other words, appending two lists of length I1 and I2 yields a list of length I1 + I2 . Naturally, the function that concatenates a list of length I1 in which each element is a list of length I2 is expected to have the following type: ∀α.∀ι1 .∀ι2 . list(list(α, ι2 ), ι1 ) → list(α, ι1 ∗ ι2 ) Unfortunately, this type is not allowed in DML as accepting nonlinear terms like ι1 ∗ ι2 as type indexes would readily make constraint solving undecidable (or worse, intractable). dataprop MUL (int, | {n:int} MULbas | {m, n, p:int | | {m, n, p:int |
int, int) = // a prop for encoding multiplication (0, n, 0) m >= 0} MULind (m+1, n, p+n) of MUL (m, n, p) m > 0} MULneg (~m, n, ~p) of MUL (m, n, p)
Fig. 1. A dataprop for encoding multiplication on integers
To address this limitation, a fundamentally different design is adopted in ATS (which supersedes DML) to accommodate a paradigm that combines programming with theorem proving [CX05]. With this design, the programmer is given a means to handle nonlinear constraints by constructing explicit proofs attesting to the validity of such constraints (while linear constraints are still handled by an automatic constraint solver). In Figure 1, we declare a prop (i.e., type for proofs) constructor MUL, where the concrete syntax indicates that there are three (proof) value constructors associated with MUL, which are given the following constant props (or c-props for short): MULbas : ∀ι. MUL(0, ι, 0) MULind : ∀ι1 .∀ι2 .∀ι3 . ι1 ≥ 0 ⊃ (MUL(ι1 , ι2 , ι3 ) → MUL(ι1 + 1, ι2 , ι3 + ι2 )) MULneg : ∀ι1 .∀ι2 .∀ι3 . ι1 > 0 ⊃ (MUL(ι1 , ι2 , ι3 ) → MUL(−ι1 , ι2 , −ι3 )) Given integers I1 , I2 , I3 , I1 ∗ I2 = I3 holds if and only if MUL(I1 , I2 , I3 ) can be assigned to a closed (proof) value. In essence, MULbas, MULind and MULneg correspond to the following three equations in an inductive definition of the multiplication function on integers: 0 ∗ n = 0; (m + 1) ∗ n = m ∗ n + n if m >= 0; (−m) ∗ n = −(m ∗ n) if m > 0. In Figure 2, a function concat of the following type for concatenating a list of lists is implemented: ∀α.∀ι1 .∀ι2 . (ι1 ≥ 0 ∧ ι2 ≥ 0) ⊃ (list(list(α, ι2 ), ι1 ) → ∃ι. MUL(ι1 , ι2 , ι) ∗ list(α, ι))
190
H. Xi
// (...) is used to form tuples; e.g., () is the empty tuple // | is used like a comma, which separates proofs from values // {...} means universal quantification // [...] means existential quantification fun concat1 {a:type} {m, n:int | m >= 0; n >= 0} (xxs: list (list (a, n), m)) : [p:int | p >=0 ] (MUL (m, n, p) | list (a, p)) = case+ xxs of | nil () => (MULbas | nil ()) | cons (xs, xss) => let val (pf | res) = concat1 xss in (MULind pf | xs @ res) end Fig. 2. An implementation of the list concatenation function in ATS
When applied to a list of length I1 in which each element is a list of length I2 , this function returns a pair (pf, v), where v is a list of length I and pf (of the prop MUL(I1 , I2 , I)) is what we call a proof (in contrast to a program), which provides a witness to I = I1 ∗ I2 . Please find many more programming examples as such written in ATS [Xi], a language with a highly expressive type system rooted in the framework Applied Type System (ATS) [Xi04]. Proofs can often be large and expensive to construct and should be erased before run-time. Note that we extract nothing from proofs. This style of programing, which we call programming with theorem proving, is rather different from the paradigm of program extraction (from proofs) as is supported in NuPrl [C+ 86] or Coq [PM89,DFH+ 93,BC04]). For instance, a function like concat can be effectful (though it is not) and need not to be terminating (though it is). To support the construction of programs involving effects (e.g., nontermination, exceptions and references, nondeterminism) is probably one of the most crucial issues in the design of ATS. In order to guarantee that proofs in a program can be erased without altering the dynamic semantics of the program, a design is adopted in ATS that completely separates proofs from programs. Generally speaking, props (i.e., types for proofs) are introduced that can only be assigned to proofs, which are verified to be total (i.e., pure and terminating) in the type system of ATS, and programs, even if they are total, are disallowed in the construction of proofs. In short, programs may contain proofs but proofs cannot contain programs. While this design is conceptually simple, it leads to a rather duplicated presentation of various rules (e.g., typing rules and evaluation rules) for proofs and programs [CX05]. More seriously, it also complicates certain cases of proof construction that can be made significantly simpler if total programs are allowed to occur inside proofs. We will present an example in Section 4 to clarify this point. In this paper, we follow the design of program extraction in Coq, where proofs can occur in programs and vice versa. The primary contribution of this paper lies in a novel design for unifying proofs and programs in an effectful programming language (in contrast to a theorem proving system like Coq). The developed formalism for supporting this design is already employed in ATS/Geizella, the
Attributive Types for Proof Erasure
191
current implementation of ATS [Xi]. However, for brevity, we can only present the essential idea behind this design in a simply typed setting and then outline an extension that accommodates dependent types as well as polymorphic types. We organize the rest of the paper as follows. In Section 2, we present a language L0 based on the simply typed lambda-calculus. A form of attribute types are supported in L0 to determine whether expressions assigned such types can be erased at compile-time without affecting the dynamic semantics of a program. We in Section 3 to support both dependent then outline an extension of L0 to L∀,∃ 0 types and polymorphic types. In Section 4, we give a short but realistic example to illustrate a need for unifying proofs with programs. Lastly, we mention some closely related work and conclude. erasure bits b ::= 0 | 1 effect bits t ::= 0 | 1 types τ ::= δ | 1 | τ1 ∗ τ2 | τ1 →t τ2 a-types τˆ ::= (τ )b a-type cores τ ::= δ | 1 | τˆ 1 ∗ τˆ2 | τˆ1 →t τˆ 2 expr. e ::= x | f | c(e) | if(e0 , e1 , e2 ) | | e1 , e2 | fst(e) | snd(e) | lam x. e | fix f. e | app(e1 , e2 ) a-expr. ˆe ::= (e)b a-expr. cores e ::= x | f | c(ˆ e) | if(ˆ e0 , ˆe1 , ˆe2 ) | | ˆ e1 , ˆe2 | fst(ˆ e) | snd(ˆ e) | e | app(ˆ e1 , ˆe2 ) lam x. ˆ e | fix f. ˆ values v ::= x | cc(v) | | v1 , v2 | lam x. e a-values vˆ ::= (v)b a-value cores v ::= x | cc(ˆ v ) | | ˆ v 1 , vˆ2 | lam x. ˆe contexts Γ ::= ∅ | Γ, xf : τˆ substitutions θ ::= [] | θ[x → vˆ] | θ[f → ˆe] Fig. 3. The syntax for L0
2
Formal Development
We present a language L0 based on the simply typed lambda-calculus to formally introduce a form of attributive types. The syntax for L0 is given in Figure 3. We use both b (erasure bits) and t (effect bits) to range over 0 and 1. Given two (erasure) bits b1 and b2 , b1 ⊗ b2 is a bit, which equals 1 if and only if b1 = b2 = 1. Given two (effect) bits t1 and t2 , t1 ⊕ t2 is a bit, which equals 0 if and only if t1 = t2 = 0. So, ⊗ and ⊕ correspond to boolean product and sum, respectively. We use δ, τ , τ and τˆ for base types (e.g., bool for booleans and int for integers), types, a-type (i.e., attributive type) cores and a-types, respectively. Similarly, we use e, e and ˆe for expressions, a-expression cores, a-expressions, respectively. Given τˆ = (τ )b , we write bit(ˆ τ ) for b, core(ˆ τ ) for τ , and (ˆ τ )b0 for b⊗b0 b (τ ) . Similarly, given ˆe = (e) , we write bit(ˆe) for b, core(ˆe) for e, and (ˆe)b0 b⊗b0 for (e) . So both (ˆ τ )b and (ˆe)b are just syntactic sugar. It will soon become clear that an a-type τˆ can only be assigned to an a-expression ˆe such that bit(ˆ τ ) = bit(ˆe), and bit(ˆe) = 0 means that ˆe can be erased from any program
containing ê without altering the dynamic semantics of the program. We say that ê is erasable if bit(ê) = 0. Clearly, an a-expression cannot be erased if its evaluation may generate effects at run-time. To address this issue, we design a type system based on the notion of types with effects [JG91] to track whether evaluating an a-expression may generate effects. Given an a-type core τ̄ = τ̂₁ →^t τ̂₂, a call to a function of a-type (τ̄)^b may generate effects only if t = 1. Note that nonterminating evaluation is the sole kind of effect in L₀, but more can easily be added when L₀ is extended. For instance, in ATS, we also track effects caused by raising exceptions, accessing references, aborting program execution (abnormally), etc.
We use x for a lam-variable and f for a fix-variable, and xf for either an x or an f. We use c for a constant, which is either a constant constructor cc or a constant function cf. The a-expressions in Figure 3 are standard except for the erasure bits they carry.
We now assign a dynamic semantics to L₀. The evaluation contexts in L₀ are defined as follows:

  eval. ctx. E ::= [] | (E)^b | (c(E))^b | (if(E, ê₁, ê₂))^b | (⟨E, ê⟩)^b | (⟨v̂, E⟩)^b |
                   (fst(E))^b | (snd(E))^b | (app(E, ê))^b | (app(v̂, E))^b

The redexes in L₀ and their reducts are defined below.
Definition 1. (Redexes) We define redexes and their reducts as follows.
– ((v̄)^b₁)^b₂ is a redex, and its reduct is (v̄)^{b₁⊗b₂}.
– (if((true)^b₀, ê₁, ê₂))^b is a redex, and its reduct is (ê₁)^b.
– (if((false)^b₀, ê₁, ê₂))^b is a redex, and its reduct is (ê₂)^b.
– (app((lam x. ê)^b₀, v̂))^b is a redex, and its reduct is (ê[x → v̂])^b.
– (fst((⟨v̂₁, v̂₂⟩)^b₀))^b is a redex, and its reduct is (v̂₁)^b.
– (snd((⟨v̂₁, v̂₂⟩)^b₀))^b is a redex, and its reduct is (v̂₂)^b.
– (fix f. ê)^b is a redex, and its reduct is ê[f → (fix f. ê)^b].
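For illustration, here is a small worked reduction sequence of our own (not taken from the paper), showing how erasure bits are combined by ⊗:

  (fst((⟨(true)^1, (false)^0⟩)^1))^1 → ((true)^1)^1 → (true)^{1⊗1} = (true)^1

The first step is an instance of the fst-redex, and the second step collapses the nested erasure bits.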
Given ê₁ and ê₂, we write ê₁ → ê₂ to mean that ê₁ reduces to ê₂, that is, ê₁ = E[ê] and ê₂ = E[ê′] for some redex ê and its reduct ê′. We may also use → for the single-step call-by-value evaluation of expressions, which is completely standard. As usual, we use →⁺ for the transitive closure of → and →* for the transitive and reflexive closure of →.
We use Γ ⊢^t ê : τ̂ for a typing judgment in L₀ that assigns the a-type τ̂ to the a-expression ê, where the bit t indicates whether evaluating ê may generate any effects. The static semantics of L₀ is given by the typing rules in Figure 4, for some of which we provide a brief explanation as follows. The rule (ty-fix-var) indicates that a fix-variable is considered effectful; in the rule (ty-const), we write c : τ̂₁ →^t₀ τ̂₂ to mean that c is assigned the a-type core τ̂₁ →^t₀ τ̂₂, for instance, by some kind of signature; the rule (ty-erase) essentially means that an a-expression ê can be erased if the evaluation of ê is guaranteed to be free of effects; from the rule (ty-if), it is clear that an if-expression must be erasable if its condition is erasable; and the rule (ty-app) indicates that an application is erasable if the function in the application is.
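As a worked instance of (ty-fst) (our example, for some base type δ): suppose Γ ⊢^t ê : ((δ)^0 ∗ (δ)^1)^1 is derivable. Then the side condition yields

  b = bit((δ)^0) ⊗ 1 = 0, so Γ ⊢^t (fst(ê))^0 : ((δ)^0)^1, where ((δ)^0)^1 = (δ)^{0⊗1} = (δ)^0

That is, the projection of the bit-0 component is itself erasable; moreover, the side condition t ≤ b = 0 forces t = 0, so the pair expression ê must be effect-free before the projection can be erased.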
  Γ(xf) = τ̂    b = bit(τ̂)
  --------------------------------------------------- (ty-lam-var)
  Γ ⊢^0 (x)^b : τ̂

  Γ(xf) = τ̂    bit(τ̂) = 1
  --------------------------------------------------- (ty-fix-var)
  Γ ⊢^1 (f)^1 : τ̂

  c : τ̂₁ →^t₀ τ̂₂    Γ ⊢^t ê : τ̂₁    t₀ ⊕ t ≤ b
  --------------------------------------------------- (ty-const)
  Γ ⊢^{t₀⊕t} (c(ê))^b : τ̂₂

  Γ ⊢^t₀ ê₀ : (bool)^b₀    Γ ⊢^t₁ ê₁ : τ̂    Γ ⊢^t₂ ê₂ : τ̂    t₀ ⊕ t₁ ⊕ t₂ ≤ b = bit(τ̂) ⊗ b₀
  --------------------------------------------------- (ty-if)
  Γ ⊢^{t₀⊕t₁⊕t₂} (if(ê₀, ê₁, ê₂))^b : (τ̂)^b₀

  Γ ⊢^t₁ ê₁ : τ̂₁    Γ ⊢^t₂ ê₂ : τ̂₂    t₁ ⊕ t₂ ≤ b
  --------------------------------------------------- (ty-tup)
  Γ ⊢^{t₁⊕t₂} (⟨ê₁, ê₂⟩)^b : (τ̂₁ ∗ τ̂₂)^b

  Γ ⊢^t ê : (τ̂₁ ∗ τ̂₂)^b₀    t ≤ b = bit(τ̂₁) ⊗ b₀
  --------------------------------------------------- (ty-fst)
  Γ ⊢^t (fst(ê))^b : (τ̂₁)^b₀

  Γ ⊢^t ê : (τ̂₁ ∗ τ̂₂)^b₀    t ≤ b = bit(τ̂₂) ⊗ b₀
  --------------------------------------------------- (ty-snd)
  Γ ⊢^t (snd(ê))^b : (τ̂₂)^b₀

  Γ, x : τ̂₁ ⊢^t ê : τ̂₂
  --------------------------------------------------- (ty-lam)
  Γ ⊢^0 (lam x. ê)^b : (τ̂₁ →^t τ̂₂)^b

  Γ, f : τ̂ ⊢^t ê : τ̂    t ≤ b = bit(τ̂)
  --------------------------------------------------- (ty-fix)
  Γ ⊢^t (fix f. ê)^b : τ̂

  Γ ⊢^t₁ ê₁ : (τ̂₁ →^t τ̂₂)^b₀    Γ ⊢^t₂ ê₂ : τ̂₁    t₁ ⊕ t₂ ⊕ t ≤ b = bit(τ̂₂) ⊗ b₀
  --------------------------------------------------- (ty-app)
  Γ ⊢^{t₁⊕t₂⊕t} (app(ê₁, ê₂))^b : (τ̂₂)^b₀

  Γ ⊢^0 ê : τ̂
  --------------------------------------------------- (ty-erase)
  Γ ⊢^0 (ê)^0 : (τ̂)^0

  Fig. 4. The typing rules for L₀
Proposition 1. If Γ ⊢^t v̂ : τ̂ is derivable, then t = 0.
Proof. By an inspection of the typing rules in Figure 4.
Proposition 2. If Γ ⊢^t ê : τ̂ is derivable, then t ≤ bit(ê) = bit(τ̂).
Proof. Note that bit((τ̂)^b) = bit(τ̂) ⊗ b holds for any a-type τ̂ and bit b. The proposition then follows from an inspection of the rules in Figure 4.
Lemma 1. (Canonical Forms) Assume that ∅ ⊢^0 (v̄)^b : (τ̄)^b is derivable.
1. If τ̄ = δ for some base type δ, then v̄ is of the form cc(v̂₀) for some constructor cc associated with δ, that is, cc is given an a-type core of the form τ̂₀ → δ for some τ̂₀.
2. If τ̄ = τ̂₁ ∗ τ̂₂, then v̄ is of the form ⟨v̂₁, v̂₂⟩.
3. If τ̄ = τ̂₁ →^t τ̂₂, then v̄ is of the form lam x. ê.
Proof. By an inspection of the typing rules in Figure 4.
Lemma 2. (Substitution) We have the following.
1. Assume that both Γ ⊢^0 v̂ : τ̂₁ and Γ, x : τ̂₁ ⊢^t ê : τ̂₂ are derivable. Then Γ ⊢^t ê[x → v̂] : τ̂₂ is also derivable.
2. Assume that both Γ ⊢^t₁ ê₁ : τ̂₁ and Γ, f : τ̂₁ ⊢^t₂ ê₂ : τ̂₂ are derivable. Then Γ ⊢^t₂′ ê₂[f → ê₁] : τ̂₂ is derivable for some t₂′ ≤ t₂.
Proof. By structural induction on the derivations of Γ, x : τ̂₁ ⊢^t ê : τ̂₂ and Γ, f : τ̂₁ ⊢^t₂ ê₂ : τ̂₂, respectively.
Theorem 1. (Subject Reduction) Assume that ∅ ⊢^t₁ ê₁ : τ̂ is derivable and ê₁ → ê₂ holds. Then ∅ ⊢^t₂ ê₂ : τ̂ is derivable for some t₂ ≤ t₁.
Proof. By structural induction on the derivation D of ∅ ⊢^t₁ ê₁ : τ̂. Lemma 2 is needed when handling the case where the last applied rule in D is (ty-app) or (ty-fix).
Theorem 2. (Progress) Assume that ∅ ⊢^t ê₁ : τ̂ is derivable. Then either ê₁ is an a-value or ê₁ → ê₂ for some a-expression ê₂.
Proof. The theorem follows from structural induction on the derivation D of ∅ ⊢^t ê₁ : τ̂.
With Theorem 1 and Theorem 2, it is clear that for each well-typed a-expression ê, that is, each ê such that ∅ ⊢^t ê : τ̂ is derivable for some t and τ̂, the evaluation of ê either leads to an a-value or continues forever. So the type soundness of L₀ is established. Of course, we could also prove the type soundness of L₀ by simply ignoring erasure bits. However, we need Theorem 1 and Theorem 2 to prove Theorem 5, a main result in this paper that justifies proof erasure.
Definition 2. (Reducibility) Given an a-expression ê and an a-type τ̂ = (τ̄)^b₀, we say that ê is reducible of τ̂ if ê↓, that is, there is no infinite reduction sequence from ê, and one of the following holds:
1. τ̄ = δ for some base type δ; or
2. τ̄ = τ̂₁ ∗ τ̂₂ for some τ̂₁ and τ̂₂, and for any v̂₁ and v̂₂, ê →* (⟨v̂₁, v̂₂⟩)^b implies that v̂₁ and v̂₂ are reducible of τ̂₁ and τ̂₂, respectively; or
3. τ̄ = τ̂₁ →^0 τ̂₂ for some τ̂₁ and τ̂₂, and for any ê₀, ê →* (lam x. ê₀)^b implies that ê₀[x → v̂] is reducible of τ̂₂ for every v̂ reducible of the a-type τ̂₁; or
4. τ̄ = τ̂₁ →^1 τ̂₂ for some τ̂₁ and τ̂₂.
Given a substitution θ and a context Γ of the same domain, we say that θ is reducible of Γ if θ(xf) is reducible of Γ(xf) for every xf ∈ dom(θ) = dom(Γ).
It should be stressed that ê being reducible of τ̂ does not imply that τ̂ can be assigned to ê. For instance, according to the definition, every value (including every function value) is reducible of (δ)^b₀ for every base type δ. Also, it is clear that erasure bits play no role in the definition of reducibility. More precisely, if ê is reducible of (τ̄)^b for some b, then it is reducible of (τ̄)^b′ for every b′.
Proposition 3. We have the following.
1. If ê₁ is reducible of τ̂ and ê₁ → ê₂, then ê₂ is also reducible of τ̂.
2. If ê′ is reducible of τ̂ whenever ê₁ → ê′ holds, then ê₁ is reducible of τ̂.
3. If ê is reducible of τ̂, then (ê)^b is reducible of (τ̂)^b.
4. If ê₁ and ê₂ are reducible of τ̂₁ and τ̂₂, respectively, then (⟨ê₁, ê₂⟩)^b is reducible of (τ̂₁ ∗ τ̂₂)^b.
5. If ê is reducible of (τ̂₁ ∗ τ̂₂)^b₀, then for b = bit(τ̂₁) ⊗ b₀, (fst(ê))^b is reducible of (τ̂₁)^b₀.
6. If ê is reducible of (τ̂₁ ∗ τ̂₂)^b₀, then for b = bit(τ̂₂) ⊗ b₀, (snd(ê))^b is reducible of (τ̂₂)^b₀.
7. If ê₁ and ê₂ are reducible of (τ̂₁ →^0 τ̂₂)^b₀ and τ̂₁, respectively, then for b = bit(τ̂₂) ⊗ b₀, (app(ê₁, ê₂))^b is reducible of (τ̂₂)^b₀.
Proof. Both (1) and (2) follow from the definition of reducibility immediately. As for (3), it follows by structural induction on τ̂. We now prove (4). Clearly, both ê₁↓ and ê₂↓ hold, so (⟨ê₁, ê₂⟩)^b↓ holds as well. Suppose (⟨ê₁, ê₂⟩)^b →* (⟨v̂₁, v̂₂⟩)^b. Then ê₁ →* v̂₁ and ê₂ →* v̂₂. By (1), v̂₁ and v̂₂ are reducible of τ̂₁ and τ̂₂, respectively. By definition, (⟨ê₁, ê₂⟩)^b is reducible of (τ̂₁ ∗ τ̂₂)^b. We leave out the routine proofs for (5), (6) and (7).
Lemma 3. Assume that Γ ⊢^0 ê : τ̂ is derivable and θ is reducible of Γ. Then ê[θ] is reducible of τ̂.
Proof. We proceed by structural induction on the derivation D of Γ ⊢^0 ê : τ̂.
– Assume that D is of the following form:

  D₁ :: Γ ⊢^0 ê₁ : τ̂₁    D₂ :: Γ ⊢^0 ê₂ : τ̂₂    0 ⊕ 0 ≤ b
  ---------------------------------------------------------- (ty-tup)
  Γ ⊢^0 (⟨ê₁, ê₂⟩)^b : (τ̂₁ ∗ τ̂₂)^b

  where ê = (⟨ê₁, ê₂⟩)^b. By induction hypothesis on D₁ and D₂, we know that ê₁[θ] and ê₂[θ] are reducible of τ̂₁ and τ̂₂, respectively. Hence, ê[θ] = (⟨ê₁[θ], ê₂[θ]⟩)^b is reducible of (τ̂₁ ∗ τ̂₂)^b by Proposition 3 (4).
– Assume that D is of the following form:

  Γ ⊢^0 ê₁ : (τ̂₁ →^0 τ̂₂)^b₀    Γ ⊢^0 ê₂ : τ̂₁    0 ⊕ 0 ⊕ 0 ≤ b = bit(τ̂₂) ⊗ b₀
  ------------------------------------------------------------------------------ (ty-app)
  Γ ⊢^0 (app(ê₁, ê₂))^b : (τ̂₂)^b₀

  where ê = (app(ê₁, ê₂))^b and τ̂ = (τ̂₂)^b₀. Then by induction hypothesis, ê₁[θ] and ê₂[θ] are reducible of (τ̂₁ →^0 τ̂₂)^b₀ and τ̂₁, respectively. By Proposition 3 (7), ê[θ] is reducible of τ̂.
The rest of the cases can be handled similarly (by applying Proposition 3).
  |(ē)^0| = ⟨⟩                      |(ē)^1| = |ē|

  |xf| = xf                         |c(ê)| = c(|ê|)
  |⟨⟩| = ⟨⟩                         |if(ê₀, ê₁, ê₂)| = if(|ê₀|, |ê₁|, |ê₂|)
  |⟨ê₁, ê₂⟩| = ⟨|ê₁|, |ê₂|⟩         |fst(ê)| = fst(|ê|)
  |snd(ê)| = snd(|ê|)               |lam x. ê| = lam x. |ê|
  |fix f. ê| = fix f. |ê|           |app(ê₁, ê₂)| = app(|ê₁|, |ê₂|)

  Fig. 5. The erasure function on a-expression cores and a-expressions
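As a small worked application of the erasure function in Figure 5 (our example): take ê = (⟨ê₁, ê₂⟩)^1 with ê₁ = (x)^0 and ê₂ = (y)^1. Then

  |ê| = ⟨|ê₁|, |ê₂|⟩ = ⟨⟨⟩, y⟩

that is, the bit-0 component is erased to the unit value ⟨⟩, while the bit-1 component survives.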
Theorem 3. (Totality) Assume that ∅ ⊢^0 ê : τ̂ is derivable. Then ê↓ holds.
Proof. By Lemma 3, ê is reducible of τ̂. By the definition of reducibility, ê↓ holds.
When L₀ is extended with dependent types, it becomes a great deal more involved to prove a corresponding version of Theorem 3. The technique for doing so is developed in [Xi02].
A function |·| is defined in Figure 5 that erases a-expressions into expressions. Note that the erasure of an a-expression ê is ⟨⟩ if bit(ê) = 0. As is desired, erasure commutes with substitution.
Proposition 4. We have |ê₂[xf → ê₁]| = |ê₂|[xf → |ê₁|] for any a-expressions ê₁ and ê₂.
Proof. This follows from the definition of the erasure function |·| immediately.
The soundness of erasure is stated and proven as follows.
Theorem 4. (Soundness of Erasure) Assume that ∅ ⊢^t ê : τ̂ is derivable. If ê → ê′, then |ê| →^{0/1} |ê′|, that is, |ê| = |ê′| or |ê| → |ê′|.
Proof. We proceed by structural induction on the derivation D of ∅ ⊢^t ê : τ̂.
– The derivation D is of the following form:

  ∅ ⊢^t₀ ê₀ : (bool)^b₀    ∅ ⊢^t₁ ê₁ : τ̂    ∅ ⊢^t₂ ê₂ : τ̂    t₀ ⊕ t₁ ⊕ t₂ ≤ b = bit(τ̂) ⊗ b₀
  ---------------------------------------------------------------------------------------------- (ty-if)
  ∅ ⊢^{t₀⊕t₁⊕t₂} (if(ê₀, ê₁, ê₂))^b : (τ̂)^b₀
where ê = (if(ê₀, ê₁, ê₂))^b. We have three subcases.
  • ê₀ → ê₀′ for some ê₀′, and ê′ = (if(ê₀′, ê₁, ê₂))^b. If b = 0, we are done since |ê| = |ê′| = ⟨⟩. If b = 1, then |ê| →^{0/1} |ê′| holds since we have |ê₀| →^{0/1} |ê₀′| by induction hypothesis.
  • ê₀ = (true)^b₀. Then ê′ = (ê₁)^b. If b = 0, then we are done since |ê| = |ê′| = ⟨⟩. If b = 1, then b₀ must equal 1 and thus |ê| = if(true, |ê₁|, |ê₂|), which implies |ê| → |ê₁| = |ê′|.
  • ê₀ = (false)^b₀. This case is similar to the previous one.
The rest of the cases can be handled similarly.
The following theorem implies that if the erasure of a well-typed a-expression ê in L₀ evaluates to a value v, then ê evaluates to an a-value whose erasure is v.
Theorem 5. (Completeness of Erasure) Assume that ∅ ⊢^t ê : τ̂ is derivable.
1. If |ê| is a value, then ê →* v̂ for some a-value v̂ such that |v̂| = |ê|.
2. If |ê| → e′, then ê →⁺ ê′ for some ê′ such that |ê′| = e′.
Proof. We first prove (1) by analyzing the structure of ê.
– bit(ê) = 0. Then |ê| = ⟨⟩. By Proposition 2, t = 0, and by Theorem 3, ê↓ holds. So by Theorem 1 and Theorem 2, we have ê →* v̂ for some a-value v̂ of a-type τ̂. Clearly, |v̂| = ⟨⟩ as bit(v̂) ≤ bit(ê) = 0.
– bit(ê) = 1. We present an interesting case, where ê = (⟨ê₁, ê₂⟩)^1. Since |ê| = ⟨|ê₁|, |ê₂|⟩ is a value, both |ê₁| and |ê₂| are values. By induction hypothesis, ê₁ →* v̂₁ and ê₂ →* v̂₂ for some a-values v̂₁ and v̂₂ such that |v̂₁| = |ê₁| and |v̂₂| = |ê₂|. Let v̂ = (⟨v̂₁, v̂₂⟩)^1; then ê →* v̂ and |v̂| = |ê|. All other cases can be handled similarly.
We now prove (2) by structural induction on the derivation D of ∅ ⊢^t ê : τ̂.
– The derivation D is of the following form:

  ∅ ⊢^t₁ ê₁ : τ̂₁    ∅ ⊢^t₂ ê₂ : τ̂₂    t₁ ⊕ t₂ ≤ b
  ------------------------------------------------------ (ty-tup)
  ∅ ⊢^{t₁⊕t₂} (⟨ê₁, ê₂⟩)^b : (τ̂₁ ∗ τ̂₂)^b
where ê = (⟨ê₁, ê₂⟩)^b. Clearly, b = 1 since |ê| cannot be ⟨⟩. There are two subcases.
  • |ê₁| is a value. Then by (1), there exists v̂₁ such that ê₁ →* v̂₁ and |v̂₁| = |ê₁|. Clearly, |ê| → ⟨|ê₁|, e₂′⟩ for some e₂′ such that |ê₂| → e₂′ holds. By induction hypothesis, ê₂ →⁺ ê₂′ for some ê₂′ such that |ê₂′| = e₂′. Let ê′ = (⟨v̂₁, ê₂′⟩)^1; then ê →⁺ ê′ and |ê′| = ⟨|v̂₁|, |ê₂′|⟩ = ⟨|ê₁|, e₂′⟩.
  • |ê₁| is not a value. Then |ê| → ⟨e₁′, |ê₂|⟩ for some e₁′ such that |ê₁| → e₁′. By induction hypothesis, ê₁ →⁺ ê₁′ for some ê₁′ such that |ê₁′| = e₁′. Let ê′ = (⟨ê₁′, ê₂⟩)^1, and we are done.
The rest of the cases can be handled similarly.
By Theorem 4 and Theorem 5, we know that the erasure of a well-typed a-expression ê preserves the dynamic semantics of ê with respect to the erasure function. The attributive types in L₀ are introduced precisely to establish this property.
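Tying the two theorems together on our earlier worked reduction: the a-expression (fst((⟨(true)^1, (false)^0⟩)^1))^1 erases to fst(⟨true, ⟨⟩⟩). The a-expression reduces in two steps to (true)^1, while its erasure reduces in one step to true. This matches Theorem 4 (the bit-collapsing step is erased to zero steps) and Theorem 5 (the value true is the erasure of the a-value (true)^1).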
3
Extension
The type system of L₀, which is based on simple types, is not expressive enough to support an interesting and realistic style of programming with theorem proving. In this section, we mention an extension of L₀ to L₀^{∀,∃} with dependent types as well as polymorphic types. Formalizing such an extension is a common routine in the framework of Applied Type System [Xi04], and we have coined the word predicatization to refer to such a routine. Please see [Xi04,CX05] for details that are not presented here for the sake of brevity.
  Σ ⊢ τ₁ : type_b₁    Σ ⊢ τ₂ : type_b₂
  -------------------------------------- (srt-tup)
  Σ ⊢ τ₁ ∗ τ₂ : type_{b₁⊕b₂}

  Σ ⊢ τ₁ : type_b₁    Σ ⊢ τ₂ : type_b₂    t ≤ b₂
  -------------------------------------- (srt-fun)
  Σ ⊢ τ₁ →^t τ₂ : type_b₂

  Σ ⊢ B : bool    Σ ⊢ τ : type_b
  -------------------------------------- (srt-⊃)
  Σ ⊢ B ⊃ τ : type_b

  Σ ⊢ B : bool    Σ ⊢ τ : type_b
  -------------------------------------- (srt-∧)
  Σ ⊢ B ∧ τ : type_b

  Σ, a : σ ⊢ τ : type_b
  -------------------------------------- (srt-∀)
  Σ ⊢ ∀a : σ. τ : type_b

  Σ, a : σ ⊢ τ : type_b
  -------------------------------------- (srt-∃)
  Σ ⊢ ∃a : σ. τ : type_b

  Fig. 6. Some sorting rules for L₀^{∀,∃}
The language L₀^{∀,∃} consists of a static component (statics) and a dynamic component (dynamics). The statics itself is a simply typed language, and a type in it is referred to as a sort. We assume the existence of the following basic sorts: bool, int, type₀ and type₁, and we may write prop and type for type₀ and type₁, respectively; bool is the sort for truth values, int is the sort for integers, prop is the sort for props, and type is the sort for types. We use a for static variables and s for static terms, and we write Σ ⊢ s : σ to mean that s can be given the sort σ under the context Σ. The additional forms of types in L₀^{∀,∃} (over those in L₀) are given below:

  types τ ::= . . . | δ(s̄) | B ⊃ τ | B ∧ τ | ∀a : σ. τ | ∃a : σ. τ

where s̄ stands for a sequence of static terms. We use B ⊃ T for a guarded type and B ∧ T for an asserting type, where B and T refer to static terms of sorts bool and type, respectively. As an example, the following type is for a function from natural numbers to negative integers:

  ∀a₁ : int. a₁ ≥ 0 ⊃ (int(a₁) → ∃a₂ : int. (a₂ < 0) ∧ int(a₂))

where int(I) is a singleton type for the integer equal to I. The guard a₁ ≥ 0 indicates that the function can only be applied to an integer that is greater than or equal to 0; the assertion a₂ < 0 means that each integer returned by the function is negative. Some of the rules for assigning sorts to static terms are given in Figure 6. In addition, we have the following sorting rule:

  Σ ⊢ τ : type₁
  ---------------
  Σ ⊢ τ : type₀

which allows a type to be used as a prop. It is this simple rule that initiates the study of attributive types.
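For concreteness, a function of the example type above might be declared in ATS roughly as follows; this is a sketch, the function name neg_of_nat is hypothetical, and guards and assertions are rendered as constraints on the quantified static variables:

  fun neg_of_nat {a1:int | a1 >= 0} (x: int a1): [a2:int | a2 < 0] int a2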
Let us use a judgment of the form Σ ⊢ τ : type_b ⇒ τ̂, where bit(τ̂) = b is assumed, to mean that Σ ⊢ τ : type_b is derivable and τ is transformed into τ̂ based on the derivation of Σ ⊢ τ : type_b. Then we can readily turn the rules in Figure 6 into rules for deriving judgments of this form. For instance, the following rule is derived from the rule (srt-fun):

  Σ ⊢ τ₁ : type_b₁ ⇒ τ̂₁    Σ ⊢ τ₂ : type_b₂ ⇒ τ̂₂    t ≤ b₂
  ------------------------------------------------------------
  Σ ⊢ τ₁ →^t τ₂ : type_b₂ ⇒ (τ̂₁ →^t τ̂₂)^b₂

It should be obvious how the other rules in Figure 6 are handled, and we leave out the details. Note that using the sorting rules to attach erasure bits to types is particularly important in practice, as requiring the programmer to do so manually would seem too unwieldy, if not completely impractical.
A static term s in L₀^{∀,∃} is either a static boolean term B of sort bool, or a static integer I of sort int, or a prop P of sort prop, or a type T of sort type. In practice, we allow the programmer to introduce new sorts through datasort declarations, which are rather similar to datatype declarations in ML; a sketch is given below. We assume some primitive functions c_B and c_I when forming static terms of sorts bool and int; for instance, we can form terms such as I₁ + I₂, I₁ − I₂, I₁ ≤ I₂, ¬B, B₁ ∧ B₂, etc. We use B̄ for a sequence of static boolean terms, and Σ; B̄ |= B for a constraint meaning that for any substitution Θ : Σ (that is, Θ(a) can be assigned the sort Σ(a) for every a ∈ dom(Θ) = dom(Σ)), if each static boolean term in B̄[Θ] equals true, then so does B[Θ]. In practice, such a constraint relation is often determined by some automatic decision procedure.
A typing judgment in L₀^{∀,∃} is of the form Σ; B̄; Γ ⊢ ê : τ̂, and we omit the typing rules for L₀^{∀,∃}. The theory developed in Section 2 can be readily carried over to L₀^{∀,∃}. To establish program (or proof) termination more effectively in practice, we employ an approach that allows the programmer to supply termination metrics for automatic termination verification [Xi02]. We will explain some uses of this approach in Section 4.
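For instance, a datasort declaration might look as follows; this is a hypothetical sketch mirroring the datatype syntax of ML, and the sort and constructor names are ours:

  datasort intlist = // a static sort for lists of integers
    | ilnil | ilcons of (int, intlist)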
4
An Example
We now use an example to illustrate how the form of attributive types developed in this paper can be used to support programming with theorem proving. In Figure 7, we declare a type constructor tree that takes a type T and two integers I₁ and I₂ to form the type tree(T, I₁, I₂) for binary trees of height I₁ and size I₂ in which each element is of type T. We use max(h₁, h₂) for the maximum of h₁ and h₂. Clearly, for any binary tree of height I₁ and size I₂, we have I₂ < 2^I₁. To establish this property, we declare a prop constructor POW2 in Figure 7 to encode the power function with base 2: given integers I₁ and I₂, if POW2(I₁, I₂) is inhabited, then 2^I₁ = I₂ holds. We implement a function pow2 and a proof function lemma in Figure 8, which are assigned the following type and prop, respectively:

  pow2  : ∀ι₁. ι₁ ≥ 0 ⊃ (int(ι₁) →^0 ∃ι₂. POW2(ι₁, ι₂) ∗ int(ι₂))
  lemma : ∀α.∀ι₁.∀ι₂. tree(α, ι₁, ι₂) →^0 ∃ι. (ι₂ < ι) ∧ POW2(ι₁, ι)
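For example, a caller might use pow2 as follows (a hypothetical snippet): the integer component is available at run-time, while the proof component only exists at type-checking time and is erased afterwards.

  val (pf | n) = pow2 3 // n evaluates to 8, and pf proves POW2 (3, 8)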
Note that given an integer I, we use int(I) for the singleton type containing the only integer of value I. Clearly, the prop assigned to lemma means that I₂ < 2^I₁ whenever there exists a tree of height I₁ and size I₂.
  datatype tree (type, int, int) =
    | {a:type} E (a, 0, 0)
    | {a:type} {h1,h2,s1,s2:nat}
      B (a, 1+max(h1,h2), 1+s1+s2) of (tree (a, h1, s1), a, tree (a, h2, s2))

  dataprop POW2 (int, int) = // POW2 (p, n) means 2^p = n
    | POW2bas (0, 1)
    | {p,n:nat} POW2ind (p+1, n+n) of POW2 (p, n)

  Fig. 7. A dependent datatype for binary trees and a dataprop for encoding powers of 2

  // [pow2] computes powers of two
  fun pow2 {p:nat} .<p>. (p: int p): [n:nat] (POW2 (p, n) | int n) =
    if p igt 0 then
      let val (pf | n) = pow2 (ipred p) in (POW2ind pf | n iadd n) end
    else begin
      (POW2bas | 1)
    end

  // [prfun] means that a proof function is declared; instead of
  // a type, a prop is assigned to a proof function
  prfun lemma {a:type} {h,s:nat} .<s>.
    (t: tree (a, h, s)): [n: nat | s < n] POW2 (h, n) =
    case+ t of
    | E () => POW2bas ()
    | B (t1, _, t2) => let
        prval pf1 = lemma t1 and pf2 = lemma t2
        val (pf | _) = pow2 (ipred (height t))
        prval () = pow2_inc (pf1, pf) and () = pow2_inc (pf2, pf)
      in
        POW2ind pf
      end

  Fig. 8. A proof construction involving total functions
In the definitions of pow2 and lemma, the functions iadd, ipred, igt and height, and the proof function pow2_inc are assigned the following types and prop:

  iadd     : ∀ι₁.∀ι₂. (int(ι₁) ∗ int(ι₂)) →^0 int(ι₁ + ι₂)
  ipred    : ∀ι. int(ι) →^0 int(ι − 1)
  igt      : ∀ι₁.∀ι₂. (int(ι₁) ∗ int(ι₂)) →^0 bool(ι₁ > ι₂)
  height   : ∀α.∀ι₁.∀ι₂. (ι₁ ≥ 0 ∧ ι₂ ≥ 0) ⊃ (tree(α, ι₁, ι₂) →^0 int(ι₁))
  pow2_inc : ∀ι₁.∀ι₁′.∀ι₂.∀ι₂′. (ι₁ ≥ 0 ∧ ι₂ ≥ 0 ∧ ι₁ ≤ ι₂) ⊃
             ((POW2(ι₁, ι₁′) ∗ POW2(ι₂, ι₂′)) →^0 ((ι₁′ ≤ ι₂′) ∧ 1))
Note that we use bool(B) for the singleton type containing the only boolean of value B. The functions iadd, ipred and igt are primitive functions with the obvious meaning. The function height computes the height of a given tree, and the proof function pow2_inc essentially proves that 2^I₁ ≤ 2^I₂ holds for any natural numbers I₁ and I₂ satisfying I₁ ≤ I₂.
For brevity, we omit the actual code that implements height and pow2_inc; a possible implementation is sketched below.
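One possible implementation is the following sketch (ours, not the original code). It assumes a primitive imax of type ∀ι₁.∀ι₂. (int(ι₁) ∗ int(ι₂)) →^0 int(max(ι₁, ι₂)), and it introduces an auxiliary proof function pow2_pos (our own name) showing that powers of 2 are positive:

  fun height {a:type} {h,s:nat} .<s>. (t: tree (a, h, s)): int h =
    case+ t of
    | E () => 0
    | B (t1, _, t2) => 1 iadd imax (height t1, height t2)

  prfun pow2_pos {p,n:nat} .<p>. (pf: POW2 (p, n)): [n >= 1] void =
    case+ pf of
    | POW2bas () => ()
    | POW2ind pf1 => pow2_pos pf1

  prfun pow2_inc {p1,p2:nat | p1 <= p2} {n1,n2:int} .<p2>.
    (pf1: POW2 (p1, n1), pf2: POW2 (p2, n2)): [n1 <= n2] void =
    case+ (pf1, pf2) of
    | (POW2bas (), _) => pow2_pos pf2 // n1 = 1, and 1 <= n2
    | (POW2ind pf1, POW2ind pf2) => pow2_inc (pf1, pf2)
    // the case (POW2ind _, POW2bas ()) is ruled out by p1 <= p2

Note that pow2_inc proceeds by simultaneous induction on its two proof arguments; the constraint solver discharges the remaining linear arithmetic.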
5
Related Work and Conclusion
In an attempt to advance the type system of ML, Dependent ML (DML) was developed to support a restricted form of dependent types in which type indexes are required to be drawn from a chosen index language [Xi07,XP99]. In DML, type-checking a sufficiently annotated program can be reduced to solving constraints from this chosen index language, which is often handled by a fully automatic but limited decision procedure. In ATS [Xi], a paradigm of programming with theorem proving is introduced [CX05], making it possible for the programmer to handle (difficult) constraints by constructing explicit proofs. Consequently, the need for proof erasure arises. The approach to proof erasure in this paper draws its primary inspiration from the design of program extraction (from proofs) in Coq [PM89,Let03]. In particular, the sorts prop and type here roughly correspond to the kinds Prop and Set in Coq. The notion of proof erasure in this paper is also casually related to a modal type-theoretic study of proof irrelevance [Pfe01]. The approach taken in [Pfe01] is fundamentally different from the one in [PM89], as the former does not support, a priori, a separation between props and types. Instead, whether an object can be classified as a program (i.e., an intensional expression) or a proof depends only on certain conditions on its free variables. We, however, do not make use of such conditions when formulating the typing rules for L₀.
References

[BC04] Bertot, Y., Castéran, P.: Interactive Theorem Proving and Program Development. Coq'Art: The Calculus of Inductive Constructions. Texts in Theoretical Computer Science. An EATCS Series. Springer, Heidelberg (2004)
[C+86] Constable, R.L., et al.: Implementing Mathematics with the NuPrl Proof Development System. Prentice-Hall, Englewood Cliffs, New Jersey (1986)
[CX05] Chen, C., Xi, H.: Combining Programming with Theorem Proving. In: Proceedings of the Tenth ACM SIGPLAN International Conference on Functional Programming, Tallinn, Estonia, September 2005, pp. 66–77 (2005)
[DE73] Dantzig, G.B., Eaves, B.C.: Fourier-Motzkin elimination and its dual. Journal of Combinatorial Theory (A) 14, 288–297 (1973)
[DFH+93] Dowek, G., Felty, A., Herbelin, H., Huet, G., Murthy, C., Parent, C., Paulin-Mohring, C., Werner, B.: The Coq proof assistant user's guide. Rapport Technique 154, INRIA, Rocquencourt, France, Version 5.8 (1993)
[JG91] Jouvelot, P., Gifford, D.K.: Algebraic reconstruction of types and effects. In: Proceedings of 18th ACM SIGPLAN Symposium on Principles of Programming Languages, January 1991, pp. 303–310 (1991)
[Let03] Letouzey, P.: A New Extraction for Coq. In: Geuvers, H., Wiedijk, F. (eds.) TYPES 2002. LNCS, vol. 2646. Springer, Heidelberg (2003)
[Mar84] Martin-Löf, P.: Intuitionistic Type Theory. Bibliopolis, Naples, Italy (1984) ISBN 88-7088-105-9
[NPS90] Nordström, B., Petersson, K., Smith, J.M.: Programming in Martin-Löf's Type Theory. International Series of Monographs on Computer Science, vol. 7. Clarendon Press, Oxford (1990)
[Pfe01] Pfenning, F.: Intensionality, Extensionality and Proof Irrelevance in Modal Type Theory. In: Proceedings of 16th IEEE Symposium on Logic in Computer Science, Boston, June 2001, pp. 221–230 (2001)
[PM89] Paulin-Mohring, C.: Extraction de programmes dans le Calcul des Constructions. Thèse de doctorat, Université de Paris VII, Paris, France (1989)
[Xi] Xi, H.: The ATS Programming Language. Available at: http://www.ats-lang.org/
[Xi02] Xi, H.: Dependent Types for Program Termination Verification. Journal of Higher-Order and Symbolic Computation 15(1), 91–132 (2002)
[Xi04] Xi, H.: Applied Type System (extended abstract). In: Berardi, S., Coppo, M., Damiani, F. (eds.) TYPES 2003. LNCS, vol. 3085, pp. 394–408. Springer, Heidelberg (2004)
[Xi07] Xi, H.: Dependent ML: an approach to practical programming with dependent types. Journal of Functional Programming 17(2), 215–286 (2007)
[XP99] Xi, H., Pfenning, F.: Dependent Types in Practical Programming. In: Proceedings of 26th ACM SIGPLAN Symposium on Principles of Programming Languages, San Antonio, Texas, January 1999, pp. 214–227. ACM Press, New York (1999)
Author Index
Allali, Lisa 1
Atkey, Robert 18
Belo, João Filipe 33
Ciraulo, Francesco 51
Corbineau, Pierre 69
Espírito Santo, J. 85
Genitrini, Antoine 100
Ghilezan, S. 85
Ivetić, J. 85
Kozik, Jakub 100
Kozubek, Agnieszka 110
Matthes, Ralph 125
Møgelberg, Rasmus Ejlers 142
Sacerdoti Coen, Claudio 157
Sambin, Giovanni 51
Schwinghammer, J. 173
Simpson, Alex 142
Strecker, Martin 125
Tassi, Enrico 157
Urzyczyn, Pawel 110
Xi, Hongwei 188
Zaionc, Marek 100