..., so (w.r.t. Q) (γ, E_alt ⇒ E) and (β, E ⇒ E_alt). Thus we prefer rule E. Binary unions always have degree 2, injections always have degree 3. Only ⊕-functions of degree 3 are formed, q.e.d. Conversely, we might generate part of the ⊕-equality by adding general permutative reductions, paying due attention to the SN problem thus arising.
312
D.T. van Daalen
2.7. A possible extension concerning ⊕-functions

We can, however, define an extension of the language by also admitting degree 2 ⊕-functions, i.e. glueing type-valued functions together into a single type-valued function. To this end we put: let α ∈ τ, β ∈ τ, and let φ ∈ α → τ, ψ ∈ β → τ. Then
(4) Plus rule 1: ⊢ φ ⊕ ψ (∈ (α ⊕ β) → τ).
(5) Plus rule 2: B ∈ Π(φ), C ∈ Π(ψ) ⊢ B ⊕ C (∈ Π(φ ⊕ ψ)).
The old plus can be considered as a special case of rule 5, by using

[x : α] γ ⊕ [x : β] γ ▷def [x : α ⊕ β] γ (E or E_alt).
We do not discuss this extension here, because it really complicates the normability problem (see [C.5, VIII.4.6]).

2.8. Elementary properties

As in V.2.7–V.2.9 we can infer some nice properties. First, concerning the degrees:

⊢ A ⇒ A degree correct,
A ▷ B ⇒ degree(A) = degree(B),
A ∈ B ⇒ degree(A) = degree(B) + 1.
Then, concerning contexts, renaming (see [C.5, V.2.9.2]) and weakening ([C.5, V.2.9.3]). Further, the simultaneous and the single substitution theorem ([C.5, V.2.9.4-5]), and correctness of categories ([C.5, V.2.10]): A ∈ B ⊢ B. Analogously to the abstr. and appl. properties in [C.5, V.2.10] and [C.5, V.2.14] (which mutatis mutandis hold as well in AUT-Π) we have properties like
⊢ (φ, A, B) ⇒ (A ∈ α, φ ∈ α → τ, B ∈ ⟨A⟩φ) etc.,
i.e. the "inversion of the correctness rules". An important additional property (to be proved in the next section) is uniqueness of types: A ∈ B, A ∈ C ⇒ B Q C, which in AUT-QE did not hold for A of degree 2, because of type inclusion.
Generalizing Automath by Means of a Lambda-Typed Lambda Calculus*

N.G. de Bruijn
SUMMARY The calculus ∆Λ developed in this paper is claimed to be able to embrace all the essential aspects of Automath, apart from the feature of type inclusion, which will not be considered in this paper. The calculus deals with a correctness notion for lambda-typed lambda formulas (which are presented in the form of what will be called lambda trees). To an Automath book there corresponds a single lambda tree, and the correctness of the book is equivalent to the correctness of the tree. The algorithmic definition of correctness of lambda trees corresponds to an efficient checking algorithm for Automath books.
1. INTRODUCTION 1.1. Automath and lambda calculus We are not going to explain Automath in this paper; for references and a few remarks we refer to Section 6.1. The basic common feature of the languages of the Automath family is lambda-typed lambda calculus. Nevertheless Automath has various aspects of a different nature, of which we mention the context administration and the mechanism of instantiation. Moreover there is the notion of degree, and the rules of the languages, in particular those regarding abstractors, are different for different degrees. But a large part of what can be said about Automath, in particular as far as language theory is concerned, can be said about the bare lambda-typed lambda calculus already. In [de Bruijn 71 (B.2)] it was described how a complete Automath book can be considered as a single lambda calculus formula, and that idea gave rise *Reprinted from: Kueker, D.W., Lopes-Escobar, E.G.K. and Smith, C.H., eds., Mathematical Logic and Theoretical Computer Science, p. 71-92, by courtesy of Marcel Dekker Inc., New York.
to work on language theory ([Nederpelt 73 (C.3)], [van Daalen 80]) about the lambda-typed lambda calculus system called Λ. This system of condensation of an Automath book into a single formula (AUT-SL: single line Automath book) had a disadvantage, however. In order to put the book into the lambda calculus framework it was necessary to first eliminate all definitional lines of the book. Considering the fact that the description of a mathematical subject may involve a large number of definitions, the exponential growth in length we get by eliminating them is prohibitive in practice: it can serve a theoretical purpose only. The kind of lambda-typed lambda calculus to be developed in the present paper may be better in this respect. It makes it possible to keep the full abbreviational power of Automath books within the framework of a lambda calculus. In this framework a number of features of Automath can be explained in a unifying way. Lines, contexts and instantiations all vanish from the scene. They find their natural expression in the lambda calculus, like in AUT-SL, but now without losing the relation with the original Automath book. In particular the way we actually check the correctness of an Automath book is directly related to an efficient way to check the correctness of a lambda formula. Therefore the checking algorithm described in this paper can be expected to become a basis of all checkers of Automath-like languages. The little differences between the various members of the Automath family lead to rather superficial modifications of that basic program. It can be expected that most of these modifications will be felt at the input stage only. The paper is restricted to the Automath languages without type inclusion. The feature of type inclusion (which is used in AUT-68 and AUT-QE) requires modifications in the correctness definition and the checking algorithm. We shall not discuss such modifications here.
1.2. Trees
The paper has another feature, not strongly related to the main theme. That feature is the predominant place given to the description in terms of trees rather than to the one in terms of character strings. Of course, this may be considered as just a matter of taste. Nevertheless it may have an advantage to have a coherent description in terms of trees, in particular for future reference. The author believes that if it ever comes to treating the theory of Automath in an Automath book, the trees may stand a better chance than the character strings.
Generalizing Automath (B.7)
2. LAMBDA TREES
2.1. What to take as fundamental, character strings or trees

Syntax is closely connected to trees. Formulas, and other syntactic structures, are given as strings of characters, but can be represented by means of trees. On the other hand, tree-shaped structures can be coded in the form of strings of characters. One might say that the trees and the character strings are two faces of one and the same subject. The trees are usually closer to the nature of things, the character strings are usually better for communication. Or, to put it in the superficial form of a slogan, the trees are what we mean, the strings are what we say. Discussing syntax we have to choose which one of the two points of view, trees or strings, is to be taken as the point of departure. Usually one seems to prefer the character strings, but we shall take the less traditional view to start from the trees. One can have various reasons for this preference, but here we mention the following two as relevant for the present paper: (i) It seems to be easier to talk about the various points of a tree than about the various "places" in a character string. (ii) The trees make it easier to discuss the matter of bound variables. We shall use the character strings as a kind of shorthand in cases where the trees become inconvenient. This shorthand is quite often easier to write, to print and to read, but the reader should know all the time that the trees are the mathematical structures we really intend to describe. In Section 2.7 we shall display the shorthand rules.

2.2. The infinite binary tree

We start from a set with two elements, l and r (mnemonic for "left" and "right"). W is to be the set of all words over {l, r}, including the empty word ε; in standard notation W = {l, r}*. This set W will be called the infinite binary tree. We consider the mappings

father : (W\{ε}) → W ,
leftson : W → W ,
rightson : W → W .
The father of a word is obtained by omitting its rightmost letter, the leftson is obtained by adding an l on the right, the rightson is obtained by adding an r on the right.
Examples: father(l) = ε, father(r) = ε, father(lrl) = lr; leftson(ε) = l, leftson(lrrl) = lrrll; rightson(ε) = r, rightson(rll) = rllr.
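In code these mappings are one-liners. A minimal sketch in Python, representing words over {l, r} as strings (an encoding of ours, not the paper's):

```python
def father(w):
    """Father (Section 2.2): omit the rightmost letter; undefined for the empty word."""
    if not w:
        raise ValueError("the empty word has no father")
    return w[:-1]

def leftson(w):
    """Add an 'l' on the right."""
    return w + "l"

def rightson(w):
    """Add an 'r' on the right."""
    return w + "r"
```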
In these examples we have followed the usual sloppy way to write words as concatenated sequences of letters, and to make no distinction between a one-letter word and the letter it consists of. We define the binary infix relation < by agreeing that u < v (with u ∈ W, v ∈ W) means that the word u is obtained from the word v by omitting one or more letters on the right. So lrr < lrrrl, and ε < u for all u ∈ W\{ε}. The relation is obviously transitive. As usual, u ≤ v means that either u < v or u = v. And v > u (v ≥ u) will mean the same thing as u < v (u ≤ v).
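Under the string encoding of the previous sketch, the relation < is simply a proper-prefix test. A small sketch (the function name `precedes` is ours):

```python
def precedes(u, v):
    """u < v (Section 2.2): u arises from v by omitting one or more
    letters on the right, i.e. u is a proper prefix of v."""
    return len(u) < len(v) and v.startswith(u)
```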
2.3. Binary trees

We shall consider all binary trees to be finite subtrees of the infinite binary tree. A binary tree is a finite subset V of W with the following properties:
(i) ε ∈ V,
(ii) for all u ∈ V with u ≠ ε we have father(u) ∈ V,
(iii) if u ∈ V then leftson(u) ∈ V if and only if rightson(u) ∈ V.

Elements of V are called points of the binary tree. If u ∈ V and leftson(u) ∉ V, rightson(u) ∉ V, then u is called an end-point of V. The set of all end-points is denoted Ve. The point ε is called the root of V.
There are two popular ways to draw two-dimensional pictures of a binary tree. The way we follow in this paper is to draw sons above their fathers. The other one has the fathers above their sons (such pictures can be called weeping willows). In both cases leftson(u) is drawn to the left of rightson(u), for all u. Readers who prefer to draw weeping willows instead of upright trees will not have any trouble, since for their benefit we shall avoid the use of terms like "up", "down", "above", "below" for describing vertical orientation. The inequality < is neutral in this respect.

2.4. Labels

We consider three different objects outside W. They will be called A, T and τ. Elements of the set W ∪ {A, T, τ} will be called labels. Points with label A or T will be called A-nodes and T-nodes, respectively.
If V is a binary tree then any mapping of V into the set of labels is called a labeling of V. If f is a labeling, and u ∈ V, then f(u) is called the label of u.

2.5. Definition of lambda trees
A lambda tree is a pair (V, lab), where V is a binary tree, and lab is a labeling of V that satisfies the following conditions (i), (ii), (iii):

(i) If u ∈ V\Ve then lab(u) ∈ {A, T}.
(ii) If u ∈ Ve then lab(u) ∈ V ∪ {τ}.
(iii) If u ∈ Ve and lab(u) ∈ V then lab(lab(u)) = T and rightson(lab(u)) ≤ u.
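Conditions 2.3 and 2.5 can be checked mechanically. A sketch in Python, with a labeled tree encoded as a dict from points (strings over 'l', 'r') to labels, and the strings 'A', 'T', 'tau' standing for the labels A, T, τ (the encoding and the function name are ours):

```python
def is_lambda_tree(lab):
    """Check that lab describes a lambda tree: conditions 2.3 (i)-(iii)
    on the set of points and 2.5 (i)-(iii) on the labeling."""
    V = set(lab)
    if "" not in V:                                   # 2.3 (i): the root is present
        return False
    for u in V:
        if u and u[:-1] not in V:                     # 2.3 (ii): fathers are present
            return False
        if (u + "l" in V) != (u + "r" in V):          # 2.3 (iii): sons come in pairs
            return False
    ends = {u for u in V if u + "l" not in V}         # the end-points V_e
    for u in V:
        if u not in ends:
            if lab[u] not in ("A", "T"):              # 2.5 (i)
                return False
        elif lab[u] != "tau":
            t = lab[u]
            if t not in V or lab[t] != "T":           # 2.5 (ii) and first half of (iii)
                return False
            if not u.startswith(t + "r"):             # 2.5 (iii): rightson(lab(u)) <= u
                return False
    return True
```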
2.6. An example

We give an example with 17 points. These points and their labels are specified as follows:

lab(ε) = T , lab(l) = τ , lab(r) = T , lab(rl) = T , lab(rll) = τ , lab(rlr) = A , lab(rlrl) = rl , lab(rlrr) = ε , lab(rr) = A , lab(rrl) = T , lab(rrll) = r , lab(rrlr) = τ , lab(rrr) = T , lab(rrrl) = τ , lab(rrrr) = T , lab(rrrrl) = rrr , lab(rrrrr) = r .

This lambda tree is pictured in Figure 1.
Figure 1, A lambda tree

The picture does not show the names of the points, but it does show their labels as far as they are A, T or τ. In the cases of points u with labels in V we have indicated lab(u) by means of a dotted arrow from u (which is always an
end-point, according to 2.5 (i)) to lab(u) (which is a point on the path from u to the root of the tree). Indeed the arrows always go to points with label T, and at such points the arrows always come from the right, according to 2.5 (iii).
2.7. Representation of a lambda tree as a character string

We begin by taking a set of identifiers to be called dummies. They are no elements of the set of labels. Next in some arbitrary way we attach a dummy to every point of the tree that has label T, and different points get different dummies. In the example of 2.6 we attach x1 to ε, x2 to r, x3 to rl, x4 to rrl, x5 to rrr, x6 to rrrr. We can now also attach dummies to the end-points as far as their label is not τ. To the end-point u (with label lab(u) ∈ V) we attach the same dummy as we attached to lab(u). The point lab(u) with its dummy is called the binding instance of the dummy, the point u with its dummy is called a bound instance. In Figure 2 the dummies are shown. The arrows could be omitted since their information is provided by the dummies: the arrows run from the bound instances of a dummy to its binding instance.
Figure 2, Tree with named dummies

We now produce the character string representation by the following algorithm that attaches character strings to all subtrees:

(i) A subtree consisting of a single point is represented by τ if its label is τ, and by its dummy if it is a bound instance of that dummy.
(ii) A subtree whose root is labeled by A, to which there is attached a left-hand subtree (with character string P) and a right-hand subtree (with character string Q), gets the character string (P) Q.
(iii) A subtree whose root is labeled by T, with dummy xi, say, and with P and Q as under (ii), gets the character string [xi : P] Q.
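The three rules translate into a short recursive procedure. A sketch under our dict encoding, run on the tree of Section 2.6 ('tau' stands for τ; the dummy naming x1, ..., x6 follows the text's assignment):

```python
# The lambda tree of Section 2.6: point -> label.
LAB = {
    "": "T", "l": "tau", "r": "T", "rl": "T", "rll": "tau",
    "rlr": "A", "rlrl": "rl", "rlrr": "", "rr": "A", "rrl": "T",
    "rrll": "r", "rrlr": "tau", "rrr": "T", "rrrl": "tau",
    "rrrr": "T", "rrrrl": "rrr", "rrrrr": "r",
}

def to_string(lab, dummies=None, u=""):
    """Character-string representation (rules (i)-(iii) of Section 2.7)."""
    if dummies is None:  # attach a dummy to every T-node
        dummies = {p: "x%d" % i
                   for i, p in enumerate(sorted(q for q in lab if lab[q] == "T"), 1)}
    if u + "l" not in lab:                 # rule (i): a single point
        return "tau" if lab[u] == "tau" else dummies[lab[u]]
    P = to_string(lab, dummies, u + "l")
    Q = to_string(lab, dummies, u + "r")
    if lab[u] == "A":                      # rule (ii): application
        return "(" + P + ") " + Q
    return "[" + dummies[u] + " : " + P + "] " + Q   # rule (iii): abstraction
```

Applied to LAB this reproduces the character string displayed below for Figure 2.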
Generalizing Automath (B.7)
319
If we apply the algorithm to the tree of Figure 2 we get

[x1 : τ] [x2 : [x3 : τ] (x3) x1] ([x4 : x2] τ) [x5 : τ] [x6 : x5] x2 .
The way back from character string to lambda tree is easy, and we omit its description. 2.8. Remarks
The following remarks might give some background to our definitions and notations. (i)
The notation [x : P] Q is the notation in Automath for the typed lambda abstraction. Here the binding dummy x is declared as being of type P. In untyped lambda calculus one might write [x] Q, but it is usually written with a lambda: λx.Q or λx Q.
(ii) In standard lambda calculus there is the construct called "application", usually written as a concatenation QP. The interpretation is that Q is a function, P a value of the argument, and that QP is the value of the function Q at the point P. The Automath notation puts the argument in front of the function: it has (P) Q instead of QP. The decision to put the "applicator" (P) in front of the function Q is in harmony with the convention to put abstractors (like the [x : P] above) in front of what they operate on. Older Automath publications had {P} Q instead of (P) Q.
(iii) The τ has about the same role that is played in Automath by 'type' and 'prop', the basic expressions of degree 1.
(iv) The labels A and T in the lambda tree are mnemonic for "application" and "typing".
(v) The typing nodes are at the same time lambda nodes. This is different from what we had in [de Bruijn 72b (C.2)], Section 13; there the lambda was a separate node in the right-hand subtree of the node with label T. Taking them together has the effect that the arrows in the lambda tree lead to nodes labeled T instead of λ, and that the provision has to be made that arrows leading to a T-node always arrive from the right (see 2.5 (iii)). In the character string representation this provision means that in the case of [x : P] Q the dummy x does not occur in P.
(vi) The tree of Figure 2 can be presented in namefree form by means of the reference depth system of [de Bruijn 72b (C.2)]. We explain it here: If
there is an arrow from an end-point u to lab(u) then the reference depth of u is the number of v with lab(v) = T, lab(u) ≤ v and rightson(v) ≤ u. We can replace the information contained in lab(u) by the reference depth of u. If that depth is 3, say, then we find lab(u) by proceeding from u to the root of the tree; the point we want is the third T-node we meet, provided that we only count T-nodes we approach from the right. For the tree of Figure 1 this is carried out in Figure 3.
Figure 3, Tree with reference depths

Comparing Figure 3 to Figure 2 we note that the three ones in Figure 3 lead to three different dummies (x3, x2, x5) in Figure 2, and that the two bound instances of x2 have the different reference depths 1 and 3. If we pass from the tree to the character string representation, we can omit the names of the dummies. We can write the namefree form of the example of Figure 2 as

[τ] [[τ] (1) 2] ([1] τ) [τ] [1] 3 .

This simple example demonstrates that the depth reference system was designed for other purposes than for easy reading.
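The depth rule of remark (vi) is a one-line count under our dict encoding. A sketch, run on the tree of Figure 1:

```python
# The lambda tree of Section 2.6 / Figure 1: point -> label.
LAB = {
    "": "T", "l": "tau", "r": "T", "rl": "T", "rll": "tau",
    "rlr": "A", "rlrl": "rl", "rlrr": "", "rr": "A", "rrl": "T",
    "rrll": "r", "rrlr": "tau", "rrr": "T", "rrrl": "tau",
    "rrrr": "T", "rrrrl": "rrr", "rrrrr": "r",
}

def ref_depth(lab, u):
    """Reference depth (2.8 (vi)) of a bound end-point u: the number of
    T-nodes v with lab(u) <= v and rightson(v) <= u, i.e. the T-nodes on
    the path from u to lab(u) that are approached from the right."""
    t = lab[u]
    return sum(1 for v in lab
               if lab[v] == "T" and v.startswith(t) and u.startswith(v + "r"))
```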
3. DEGREE AND TYPE 3.1. Introduction
To every lambda tree we shall assign a non-negative number, to be called its degree. And we shall even assign a degree to every end-point of a lambda tree.
If a lambda tree has degree > 1 we shall define its type, which is again a lambda tree. The degree of the new tree is 1 less than the one of the original one. As a preparation we need the notion of the lexicographical order in a binary tree. Moreover, for the definition of the type of a lambda tree we need the notion of implantation.
3.2. Lexicographical order

In a lambda tree the points are words consisting of l's and r's. We can order them as in a dictionary, starting with the empty word. For the tree of Figure 1 (Section 2.6) the dictionary is:

ε, l, r, rl, rll, rlr, rlrl, rlrr, rr, rrl, rrll, rrlr, rrr, rrrl, rrrr, rrrrl, rrrrr .
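This dictionary order is easy to realize in code, since Python's list comparison already puts a prefix before its extensions; a small sketch (the names are ours):

```python
def lex_key(w):
    """Dictionary order on words over {l, r}: l before r, a prefix
    preceding its extensions (Section 3.2)."""
    return [0 if c == "l" else 1 for c in w]

# A few points of the Figure 1 tree, to be sorted into dictionary order:
points = ["rr", "", "rlrl", "l", "rrrrr", "r", "rl", "rll"]
ordered = sorted(points, key=lex_key)
```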
A word u is said to be lexicographically lower than the word v if u comes before v in the dictionary. Note that if u < v (in the sense of Section 2.2) then u is lexicographically lower than v, but the converse need not be true.

3.3. Ascendants

Let (V, lab) be a lambda tree. If u is an end-point with lab(u) ≠ τ then we shall define the ascendant of u, to be denoted by asc(u); it will be again an end-point. We note that lab(u) is not an end-point (see 2.5 (iii)), and therefore there exist points of V which are lexicographically higher than lab(u) and lower than rightson(lab(u)). The lexicographically highest of these is an end-point of V, and it is this end-point that we take as the definition of asc(u). Let us take Figure 1 as an example. The end-points are, in lexicographic order: l, rll, rlrl, rlrr, rrll, rrlr, rrrl, rrrrl, rrrrr. Of these, l, rll, rrlr and rrrl have no ascendants, but asc(rlrl) = rll, asc(rlrr) = l, asc(rrll) = rlrr, asc(rrrrl) = rrrl, asc(rrrrr) = rlrr.

3.4. Degree of an end-point

If u is an end-point of (V, lab), and lab(u) ≠ τ, then asc(u) is lexicographically lower than u. This is obvious since
(i) asc(u) is lexicographically lower than rightson(lab(u)),
(ii) rightson(lab(u)) ≤ u by Section 2.5, so
(iii) rightson(lab(u)) is either equal to or lexicographically lower than u.
We can now define the degree deg(u) of the end-points one by one, proceeding through the lexicographically ordered sequence of end-points. We define
deg(u) = 1                       if lab(u) = τ ,
deg(u) = 1 + deg(asc(u))         if lab(u) ≠ τ .

This defines deg as a function: deg : Ve → {1, 2, 3, ...} .
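A sketch of asc and deg under our dict encoding of the Figure 1 tree (functools.lru_cache memoizes the recursion, which terminates because asc(u) is lexicographically lower than u):

```python
from functools import lru_cache

# The lambda tree of Section 2.6 / Figure 1: point -> label.
LAB = {
    "": "T", "l": "tau", "r": "T", "rl": "T", "rll": "tau",
    "rlr": "A", "rlrl": "rl", "rlrr": "", "rr": "A", "rrl": "T",
    "rrll": "r", "rrlr": "tau", "rrr": "T", "rrrl": "tau",
    "rrrr": "T", "rrrrl": "rrr", "rrrrr": "r",
}

def lex_key(w):
    """Dictionary order of Section 3.2: l before r, prefix first."""
    return [0 if c == "l" else 1 for c in w]

def asc(u):
    """Ascendant (3.3): the lexicographically highest point strictly
    between lab(u) and rightson(lab(u)); it is always an end-point."""
    v = LAB[u]                       # defined only when lab(u) is a point
    lo, hi = lex_key(v), lex_key(v + "r")
    return max((w for w in LAB if lo < lex_key(w) < hi), key=lex_key)

@lru_cache(maxsize=None)
def deg(u):
    """Degree of an end-point (3.4)."""
    return 1 if LAB[u] == "tau" else 1 + deg(asc(u))
```

The degree of the whole tree (Section 3.5) is then deg of its lexicographically highest point, here deg(rrrrr).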
In the example of Figure 1, we have l, rll, rrlr, rrrl of degree 1; rlrl, rlrr, rrrrl of degree 2; and rrll, rrrrr of degree 3.

3.5. Degree of a lambda tree
As the degree of a lambda tree (V, lab) we define deg(w), where w is the lexicographically highest point of V. This w is a word without l's. Note that a lambda tree can have end-points whose degree exceeds the degree of the tree. We get an example if in the tree of Figure 1 we replace the label of rrrrr by τ: then the tree has degree 1 but its point rrll has degree 3.

3.6. Implantation

Let (V, lab) be a lambda tree, and u be a point of V, not necessarily an end-point. And let S be a set of end-points of V. We assume that the following implantation condition holds: for every w ∈ S and for every v ∈ V with v ≥ u, lab(v) ∈ V, lab(v) < u we have rightson(lab(v)) ≤ w. In this situation we shall describe a new lambda tree obtained by implanting at every point of S a copy of the subtree whose root is u. This new tree (V′, lab′) will be denoted as (V′, lab′) = impl(V, lab, u, S). First we form the subtree at u, to be denoted as sub(u). It is the set of all words p ∈ {l, r}* such that the concatenation up belongs to V. Next we define V′ as

V′ = V ∪ {wp : w ∈ S, p ∈ sub(u)} ,

where wp is the concatenation of w and p. In order to define the labeling lab′ of V′ we divide the set sub(u) into two categories: sub1(u) and sub2(u). The first one, sub1(u), is the set of all p ∈ sub(u) for which both lab(up) ∈ V and lab(up) ≥ u. Such a p has the property that lab(up) = uq with some q ∈ sub(u). We may call these p's points with internal reference. All other points of sub(u) are put into sub2(u). This consists of the p's such that lab(up) is A, T or τ and of all p's for which both lab(up) ∈ V and lab(up) < u. The latter p's may be called points with external reference. We are now ready to describe the labeling lab′ of V′. For all v ∈ V\S we take lab′(v) = lab(v). The other points of V′ can be uniquely written as wp with w ∈ S, p ∈ sub(u). If p ∈ sub2(u) we simply take lab′(wp) = lab(up). If
p ∈ sub1(u), however, the label of the copied point is no longer the same as the original label, but the copy of the original label. To be precise, if q is such that lab(up) = uq, then we take lab′(wp) = wq. Note that if s ∈ S then s belongs to both V and V′, and that lab′(s) can be different from lab(s). It is not hard to show that (V′, lab′) is again a lambda tree. In Figure 4 we show a case of implantation. The lambda tree on the left is (V, lab), the one on the right is (V′, lab′). We have (V′, lab′) = impl(V, lab, rl, {rrr}).
Figure 4, Implantation
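Implantation is a copying pass over the dict encoding. A sketch, run on the Figure 1 tree; the call impl(LAB, "rl", {"rrrrr"}) is exactly the typing implantation of Section 3.8 for this tree (lexicographically highest point rrrrr, lab(rrrrr) = r, leftson(r) = rl):

```python
# The lambda tree of Section 2.6 / Figure 1: point -> label.
LAB = {
    "": "T", "l": "tau", "r": "T", "rl": "T", "rll": "tau",
    "rlr": "A", "rlrl": "rl", "rlrr": "", "rr": "A", "rrl": "T",
    "rrll": "r", "rrlr": "tau", "rrr": "T", "rrrl": "tau",
    "rrrr": "T", "rrrrl": "rrr", "rrrrr": "r",
}

def sub(lab, u):
    """sub(u): the words p with up in V (Section 3.6)."""
    return {w[len(u):] for w in lab if w.startswith(u)}

def impl(lab, u, S):
    """impl(V, lab, u, S): implant a copy of the subtree rooted at u at
    every end-point in S, relabeling internal references (Section 3.6)."""
    V = set(lab)
    new = dict(lab)
    for w in S:
        for p in sub(lab, u):
            t = lab[u + p]
            if t in V and t.startswith(u):      # internal reference: lab(up) = uq
                new[w + p] = w + t[len(u):]     # the copy refers to wq
            else:                               # external reference, or A/T/tau
                new[w + p] = t
    return new

TYP = impl(LAB, "rl", {"rrrrr"})   # the type of the Figure 1 tree
```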
3.7. Implantation and degree

We keep the notation of Section 3.6. Taking some fixed w ∈ S we can consider the wp (with p ∈ sub(u)) as copies of the corresponding up. We now claim that the degree of wp in (V′, lab′) is always equal to the degree of up in (V, lab). This can be proved by induction, letting p run through sub(u) in lexicographical order. If p is an external reference then the statement on the degrees is an easy consequence of the fact that the points of V\S have in (V, lab) the same degree as in (V′, lab′). If p is an internal reference, however, we remark that the ascendants of up in (V, lab) and of wp in (V′, lab′) are corresponding points again, so that they have equal degree by the induction hypothesis. Since deg(up) = deg(asc(up)) + 1 and deg(wp) = deg(asc(wp)) + 1, we also have deg(up) = deg(wp).
3.8. Type of a lambda tree

If a lambda tree (V, lab) has degree > 1 we shall define the type, which is again a lambda tree. Let w be the lexicographically highest point of V (see 3.5). If lab(w) = τ then (V, lab) has degree 1 and its type will not be defined. The only other possibility is that lab(w) ∈ V. Now lab(lab(w)) = T, whence lab(w) has a leftson. We now
define the type of (V, lab), to be denoted typ(V, lab), by implanting the subtree of leftson(lab(w)); here the set S consists of the single point w:

typ(V, lab) = impl(V, lab, leftson(lab(w)), {w}) .
An example of typing is already available in Figure 4: the tree on the right is the type of the one on the left. 3.9. Typing lowers the degree by 1
We shall show that if the degree of (V, lab) exceeds 1, then the degree of typ(V, lab) is one less than the degree of (V, lab). Let again w be as in Sections 3.5 and 3.7. Since deg(w) > 1, w has an ascendant: v = asc(w). Then deg(w) = deg(v) + 1. In the terminology of Section 3.7 we can now state that the lexicographically highest point in typ(V, lab) is the copy of v, and so, by the result of that section, its degree in typ(V, lab) equals the degree of v in (V, lab). So the degree of typ(V, lab), i.e., the degree of its lexicographically highest point, is one less than deg(w), which is the degree of (V, lab).
4. REDUCTIONS 4.1. Beta reduction
We shall not present beta reduction directly. It will be introduced as the result of a set of more primitive reductions: local beta reductions and AT-removals. The reason for this is that the delta reductions of Automath can be considered as local beta reductions, and not as ordinary beta reductions.

4.2. AT-pairs
Let (V, lab) be a lambda tree. An AT-pair is a pair (u, v) where u ∈ V, v ∈ V, lab(u) = A, lab(v) = T, v = rightson(u).

Example: in Figure 1, (rr, rrr) is an AT-pair.

4.3. AT-couples
We mention that whatever we do with AT-pairs can be generalized to AT-couples. We shall not actually use AT-couples, but we give the definition for the sake of completeness. Let n be a positive integer, let u1, u2, ..., un be points of V with ui = rightson(ui-1) for 1 < i ≤ n. Furthermore, whenever 1 ≤ m < n, the number of i with 1 ≤ i ≤ m and lab(ui) = T is less than the number of i with 1 ≤ i ≤ m and lab(ui) = A. And finally the number of i with 1 ≤ i ≤ n and lab(ui) = T is equal to the number of i with 1 ≤ i ≤ n and lab(ui) = A. Now (u1, un) is called an AT-couple. It is easy to see that
lab(u1) = A, lab(un) = T. The situation can be illustrated by replacing the sequence u1, ..., un by a sequence of opening and closing brackets: ui is replaced by an opening or a closing bracket according to lab(ui) = A or lab(ui) = T. The conditions mentioned above mean that the first and the last bracket form a matching pair of brackets, like in

[ [ ] [ [ ] ] ] .
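The bracket picture gives an immediate check; a sketch (the function name is ours), applied to the label sequence lab(u1), ..., lab(un) along a chain of rightsons:

```python
def is_at_couple(labels):
    """AT-couple condition of 4.3 on a label sequence of 'A's and 'T's:
    every proper prefix has more A's than T's (the balance stays
    positive), and the whole sequence has equally many of each."""
    bal = 0
    for i, t in enumerate(labels):
        bal += 1 if t == "A" else -1
        if bal <= 0 and i < len(labels) - 1:
            return False        # the first bracket was closed too early
    return bal == 0
```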
4.4. Local beta reduction

Let (V, lab) be a lambda tree, and let w be an end-point with lab(w) ≠ τ. We assume that the point lab(w) is the rightson of a point u with label A. So (u, lab(w)) is an AT-pair. We can now form the following implantation:

(V′, lab′) = impl(V, lab, leftson(u), {w}) .
The passage from (V, lab) to (V′, lab′) is called local beta reduction at w. We give an example in the language of character strings. Let (V, lab) correspond to

[w : τ] [x : ([z : w] z) [y : [p : τ] τ] (y) y] τ .

We apply local beta reduction to the second one of the two bound occurrences of y. It comes down to replacing that y by [z : w] z (but we have to refresh the dummy z):

[w : τ] [x : ([z : w] z) [y : [p : τ] τ] (y) [q : w] q] τ .

4.5. AT-removal
Let (u, v) be an AT-pair in the lambda tree (V, lab), and assume that there is no w ∈ V such that lab(w) = v. Then we can define a new lambda tree (V′, lab′) that arises by omitting this AT-pair and everything that grows on u and v on the left. A formal definition of V′ is the following one. We omit u and v from V, and furthermore all points which are ≥ ul and all points which are ≥ vl. Next every point of the form urrw is replaced by the corresponding uw. In the latter cases the labels are redefined: if in V we had lab(urrw) = urrz then we take lab′(uw) = uz; if, however, lab(urrw) is not ≥ urr we just take lab′(uw) = lab(urrw). We give an example of AT-removal in the language of character strings. In

((τ) [x : τ] [y : τ] y) τ

there are no bound instances of x, so the pair (τ) [x : τ] can be removed.
The result is ([y : τ] y) τ.
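The formal definition of 4.5 translates as follows; a sketch under our dict encoding, tested on the example just given (the string ((τ) [x : τ] [y : τ] y) τ encodes as the dict below, with the AT-pair at points l and lr):

```python
# ((tau) [x : tau] [y : tau] y) tau  as a labeled tree: point -> label
LAB = {"": "A", "l": "A", "ll": "tau", "lr": "T", "lrl": "tau",
       "lrr": "T", "lrrl": "tau", "lrrr": "lrr", "r": "tau"}

def at_removal(lab, u):
    """AT-removal (4.5) at the AT-pair (u, v), v = rightson(u); assumes
    no point of the tree has label v."""
    v = u + "r"
    new = {}
    for w, t in lab.items():
        if w in (u, v) or w.startswith(u + "l") or w.startswith(v + "l"):
            continue                                   # omitted points
        # points urrw move to uw; labels into the moved subtree move along
        nw = u + w[len(v) + 1:] if w.startswith(v + "r") else w
        nt = u + t[len(v) + 1:] if t.startswith(v + "r") else t
        new[nw] = nt
    return new
```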
4.6. Mini-reductions
We shall use the word mini-reduction for what is either a local beta reduction or an AT-removal.
4.7. Beta reduction Let (u,v) be an AT-pair in the lambda tree (V, lab). Then beta reduction of (V,lab) (with respect to (u,v)) is obtained in two steps: (i) We pass from (V, lab) to impl(V, lab, leftson(u), S) , where S is the set of all w E V with lab(w) = v. (ii) This new lambda tree still has the AT-pair (u,v). To this pair we apply AT-removal. Step (i) can also be described as a sequence of local beta reductions, applied one by one to the w with lab(w) = v. The order in which these w's are taken is irrelevant.
4.8. The Church-Rosser property

In the following, R is a relation on the set of all lambda trees. For example, the relation R can be the one of mini-reduction: if A and B are lambda trees then (A, B) ∈ R expresses that B is obtained from A by mini-reduction. If A and B are lambda trees, we say that B is an R-reduct of A if either B = A or there is a finite sequence A = A0, A1, ..., An = B such that for every i (0 ≤ i < n) we have (Ai, Ai+1) ∈ R. We say that lambda trees C and D are R-equivalent if there is an E which is an R-reduct of both C and D. Simple examples of this are (i) the case C = D, and (ii) the cases where D is an R-reduct of C. We note that this equivalence notion is obviously reflexive and symmetric. If it is also transitive, we say that R has the Church-Rosser property.
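R-reducts and R-equivalence are plain reachability notions and can be sketched for any finitely branching relation; the toy rewrite relation below (swap one occurrence of 'ba' into 'ab') is ours, purely for illustration:

```python
def reducts(R, a):
    """All R-reducts of a (Section 4.8): a itself plus everything
    reachable by finitely many R-steps. R maps a term to the set of
    its one-step reducts."""
    seen, frontier = {a}, [a]
    while frontier:
        x = frontier.pop()
        for y in R(x):
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen

def equivalent(R, c, d):
    """C and D are R-equivalent if some E is an R-reduct of both."""
    return bool(reducts(R, c) & reducts(R, d))

def swap(s):
    """Toy relation: rewrite one 'ba' to 'ab'."""
    return {s[:i] + "ab" + s[i + 2:] for i in range(len(s) - 1)
            if s[i:i + 2] == "ba"}
```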
4.9. Church-Rosser for beta reductions

The famous Church-Rosser theorem states that in untyped lambda calculus the set of all beta reductions has the Church-Rosser property (see [Barendregt 81]). The fact that we have lambda trees with T-nodes does not make it much harder. The left-hand subtrees of the T-nodes do not play an important role in the beta reductions, but nevertheless reductions take place in these subtrees too, so they cannot be ignored. For a treatment that includes the case of lambda trees we refer to [de Bruijn 72b (C.2)].

4.10. Church-Rosser for mini-reductions

The Church-Rosser property for mini-reductions is a simple consequence of the one for beta reductions. Actually two lambda trees C and D are beta equivalent if and only if they are mini-equivalent. This follows from the transitivity of beta equivalence, combined with
(i) If A leads to B by a beta reduction then B is a mini-reduct of A. (This was already noted at the end of Section 4.7.)
(ii) If A leads to B by a mini-reduction then A and B are beta equivalent.
In order to show (ii) we note that if the mini-reduction is local beta reduction with the AT-pair (u, v), then beta reduction with respect to (u, v) can be applied both to A and B, and the results are identical. If the mini-reduction is AT-removal, it is just a case of beta reduction.

4.11. Equivalence

Now that we know that beta equivalence and mini-equivalence are the same, we just use the word equivalence for both.
5. CORRECTNESS

5.1. Introduction

The notion of correctness of a lambda tree is concerned with the type of the P's that occur in subtrees (P) Q. Roughly speaking, we require that either Q, or the type of Q, or the type of the type of Q, ..., is equivalent to something of the form [x : R] S, where R is equivalent to the type of P. The system of all correct lambda trees will be called delta-lambda (or ∆Λ). It is different from the older system Λ (see [Nederpelt 73 (C.3)], [van Daalen 80]) in the following respect. In Λ we always require for the correctness of (P) Q that
both P and Q are correct themselves. In ∆Λ we do not: P should be correct, but in formulating the requirements for Q we may make use of P. For example, in (P) [x : R] S the [x : R] S need not be correct. We may have to apply local beta reduction by means of the pair (P) [x : R], that transforms S into some S′ such that [x : R] S′ is correct. We actually need this feature if we want to interpret an Automath book as a correct lambda tree.

5.2. Subdivided lambda trees

In order to facilitate the formulation of correctness, we introduce a particular kind of lambda trees, where the points are colored red, white and blue. We consider a quadruple (V, lab, p, q), where (V, lab) is a lambda tree, and p, q are non-negative integers. Every u ∈ V is a word of r's and l's, and by nr(u) we denote the number of r's it starts with. So nr(ε) = 0, nr(rr) = 2, nr(rrlr) = 2, etc. The points u with nr(u) ≤ p are called red, those with p < nr(u) ≤ q white, those with nr(u) > q blue. The points ε, r, rr, rrr, ... are called main line points. We shall call (V, lab, p, q) a subdivided lambda tree if (i) and (ii) hold:
(i) The white main line points all have label A.
(ii) Among the red main line points there are no two consecutive labels A, and the last one in the red sequence ε, r, rr, ... has label T.
In other words, the sequence can be partitioned into groups of length 1 and 2, those of length 1 have label T, and those of length 2 consist of two consecutive points with labels A and T, respectively. It is an easy consequence of (i) and (ii) that the set of blue points is nonempty. Note that the conditions are automatically satisfied if p = q = 0. In other words, any lambda tree is a subdivided lambda tree if we color it all blue. In the language of character strings a subdivided lambda tree looks like RWB, where W is a (possibly empty) string (P1) ... (Pk) (where k = q − p), and R is a string with entries either of the form [x : Q] or of the form (P) [x : Q]. The red part R might be called a knowledge frame, the white part W a waiting list. In order to clearly indicate the subdivision we write the character string as

(R, W, B) .
5.3. The definition of correctness

Let Slam3 be the set of all subdivided lambda trees. It can be presented as a set of triples (R, W, B).
Generalizing Automath (B.7)
329
We shall define a subset Corr3 of Slam3. The elements of Corr3 are called the correct elements of Slam3. A lambda tree (V, lab) is called correct if (V, lab, 0, 0) ∈ Corr3. We note that (V, lab, 0, 0) equals (ε, ε, B) if the character string B represents (V, lab). As always, ε stands for the empty string, and we shall use the obvious notations for concatenation of character strings. We start by putting a set of triples (R, W, B) into Corr3, in rule (i); the other rules produce new triples on the basis of old ones.

(i) If (R, ε, τ) ∈ Slam3 then (R, ε, τ) ∈ Corr3.
(ii) If x is a dummy, if (R, W, x) ∈ Slam3, and if (R, W, typ x) ∈ Corr3, then (R, W, x) ∈ Corr3. We have not defined typ x separately in this paper (it would not be a lambda tree but part of a lambda tree). But we can define (R, W, typ x) as the subdivided lambda tree that represents (V', lab', p, q), where (V', lab') = typ(V, lab), and (V, lab, p, q) is represented by (R, W, x).

(iii) If (R, ε, K) ∈ Corr3 and (R, W (K), B) ∈ Corr3 then (R, W, (K) B) ∈ Corr3.

(iv) If (R, ε, U) ∈ Corr3 and (R [x : U], ε, B) ∈ Corr3, then (R, ε, [x : U] B) ∈ Corr3.

(v) If (R, ε, U) ∈ Corr3, (R (K) [x : U], W, B) ∈ Corr3, and if TP(R, K, U) holds, then (R, W (K), [x : U] B) ∈ Corr3. Here TP stands for "type property", and TP(R, K, U) is the statement that if (R, ε, K) and (R, ε, U) represent (V, lab, p, p) and (V', lab', p, p), respectively, then (V', lab') is equivalent to typ(V, lab).

We remark that the conditions about Slam3 in rules (i) and (ii) guarantee that indeed Corr3 is a subset of Slam3. It may seem strange that in rule (i) there is no correctness requirement on R. Therefore we cannot claim that the correctness of (R, W, B) implies the correctness of RWB. Nevertheless it can be shown that if we algorithmically check the correctness of a correct lambda tree (see Section 5.4), we will never enter into cases (R, W, B) where (ε, ε, RWB) is not correct, and the conditions on Slam3 in (i) and (ii) will always be satisfied.

5.4. Algorithmic correctness check
For every ( R ,W,B ) E Slam3 at most one of the rules (i)-(v) can be applied, and, apart from rule (i), these replace the question of the correctness by one or more uniquely defined other questions. If none of the rules can be applied
we conclude to incorrectness. Those "other questions" are all about correctness again, apart from the TP(R, K, U) arising in (v). This provides us with an algorithm for the task of the correctness check for a given lambda tree. We can think of the job as having been split into two parts:

(i) Preparing a type check list. This means that we do not answer the questions about the TP(R, K, U)'s with the various R, K, U turning up, but just put them on a list of jobs that still have to be done. The fact that all degrees are finite (see Section 3.4) guarantees that this job list is made in a finite number of steps.

(ii) Establishing truth or falsity of the various TP(R, K, U)'s.

The work under (i) can already lead to the conclusion that our lambda tree is incorrect. If we forget about syntactic errors that arise if we are presented with a structure that is not a lambda tree at all, this only happens in cases where we get to (R, W, τ) with W ≠ ε, where none of our rules apply.

5.5. Remarks about the type check list

The type check list can be prepared if we systematically apply the rules (i)-(v). In each one of the rules (iii), (iv), (v) there are two subgoals where something has to be shown to belong to Corr3. There are good reasons to tackle these subgoals in the order in which they are mentioned in the rules. This comes down to a lexicographical traversal of the lambda tree we have to investigate. This traversal can occasionally be interrupted by some application of rule (ii), which leads to an excursion in an extended tree.

The type check list prepared by the algorithm hinted at in 5.4 can lead to some duplication of work, by two causes:

(i)
The given lambda tree can have one and the same substructure at various places. This will actually occur quite often if we represent an Automath book as a lambda tree.
(ii) Application of rule (iv) of Section 5.3 leads us into asking questions about typ x that have already been answered before.

The duplications mentioned in (ii) can be avoided to a large extent: see Section 5.8. We mention a shortcut that reduces the work needed to prepare the type check list. It is obtained by splitting rule (ii) of Section 5.3 into (ii') and (ii''):

(ii') is as (ii), but with the restriction W ≠ ε, and
(ii'') if (R, ε, x) ∈ Slam3 (where x is a dummy), then (R, ε, x) ∈ Corr3.
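The recursive reading of rules (i)-(v), together with the job list of Section 5.4, can be sketched in code. This is a minimal sketch of my own: the tuple encoding of character strings, the names, and the treatment of typ x as a simple lookup in the knowledge frame R are all simplifying assumptions, not de Bruijn's definitions. As in step (i) of Section 5.4, TP questions are queued rather than decided.

```python
# Lambda trees as nested tuples:
#   ('tau',)            the end point tau
#   ('var', x)          a dummy x
#   ('app', K, B)       the string (K) B
#   ('abs', x, U, B)    the string [x : U] B
# R is the knowledge frame (entries ('abs', x, U) or ('pair', K, x, U)),
# W the waiting list; TP jobs are collected instead of being answered.

def typ(R, x):
    """Look up the type attached to dummy x in the knowledge frame."""
    for entry in reversed(R):
        if entry[-2] == x:            # entry ends with (..., x, U)
            return entry[-1]
    return None

def check(R, W, B, jobs):
    kind = B[0]
    if kind == 'tau':                                 # rule (i)
        return W == []
    if kind == 'var':                                 # rule (ii), simplified
        U = typ(R, B[1])
        return U is not None and check(R, W, U, jobs)
    if kind == 'app':                                 # rule (iii)
        _, K, B2 = B
        return check(R, [], K, jobs) and check(R, W + [K], B2, jobs)
    if kind == 'abs':
        _, x, U, B2 = B
        if not check(R, [], U, jobs):
            return False
        if W == []:                                   # rule (iv)
            return check(R + [('abs', x, U)], [], B2, jobs)
        K = W[-1]                                     # rule (v)
        jobs.append(('TP', K, U))                     # postponed type check
        return check(R + [('pair', K, x, U)], W[:-1], B2, jobs)
    return False

jobs = []
# (tau) [x : tau] x  -- a redex whose abstractor binds x of type tau
tree = ('app', ('tau',), ('abs', 'x', ('tau',), ('var', 'x')))
print(check([], [], tree, jobs), len(jobs))   # correct; one TP question queued
```

The failure case of Section 5.4 shows up exactly as described: a (R, W, τ) with W ≠ ε makes rule (i) fail, and no other rule applies.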
5.6. Remarks about the type checks

The type checks TP(R, K, U) were introduced in 5.3 (vi). Given R, K, U, we can consider the question to establish by means of an algorithm whether TP(R, K, U) is true or false. The question comes down to establishing whether the (V, lab) of 5.3 (vi) has a type (which is simply a matter of degree) and whether (V', lab') and typ(V, lab) have a common reduct. It is quite easy to design an algorithm that does a tree search of all reducts of (V', lab') and typ(V, lab). If they do have a common reduct, that fact will be established in a finite time. But will "finite" be reasonably small here? And what if they do not have a common reduct? Are we able to establish that negative fact in a finite time, or at least in a reasonable time? And what if the tree search does not terminate?

From a theoretical point of view we can say that our questions about the correctness of a given lambda tree are decidable. For the system Λ this was already shown by R. Nederpelt ([Nederpelt 73 (C.3)], [van Daalen 80]), for Δλ by L.S. van Benthem Jutting (oral communication). It is done in two steps:

(i) Between the notion of "lambda tree" and "correct lambda tree" there is a notion "norm-correct lambda tree". For any given lambda tree it can be established in a finite time whether it is or is not norm-correct. For the notion of norm-correctness we refer to Section 5.9. In [Nederpelt 73 (C.3)] the term "normable" was used instead of "norm-correct".

(ii) For every norm-correct lambda tree we have the strong normalization property: there exists a number N (depending on the tree) such that no sequence of reductions is longer than N.

As to (ii) we note that if we have reduced both (V', lab') and typ(V, lab) to a point where no further reductions are possible, then the question becomes trivial: in that case, having a common reduct just means being equal.
The strong normalization property guarantees that the question whether a given lambda tree is or is not correct is a decidable question.
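The "reduce to normal form and compare" idea can be illustrated in code. The sketch below is my own transplantation to ordinary untyped lambda terms in de Bruijn-index notation, not to de Bruijn's lambda trees, and it terminates only under the strong normalization assumption of 5.6 (ii):

```python
# Terms: ('var', n), ('lam', body), ('app', f, a), with de Bruijn indices.
# Under strong normalization we may simply reduce both terms to normal
# form and compare for syntactic equality.

def shift(t, d, cutoff=0):
    """Raise by d the indices of the free variables of t."""
    if t[0] == 'var':
        return ('var', t[1] + d) if t[1] >= cutoff else t
    if t[0] == 'lam':
        return ('lam', shift(t[1], d, cutoff + 1))
    return ('app', shift(t[1], d, cutoff), shift(t[2], d, cutoff))

def subst(t, n, s):
    """Substitute s for variable n in t."""
    if t[0] == 'var':
        return s if t[1] == n else t
    if t[0] == 'lam':
        return ('lam', subst(t[1], n + 1, shift(s, 1)))
    return ('app', subst(t[1], n, s), subst(t[2], n, s))

def normalize(t):
    """Leftmost beta normalization; terminates on strongly normalizing terms."""
    if t[0] == 'lam':
        return ('lam', normalize(t[1]))
    if t[0] == 'app':
        f = normalize(t[1])
        if f[0] == 'lam':                       # contract the beta redex
            return normalize(shift(subst(f[1], 0, shift(t[2], 1)), -1))
        return ('app', f, normalize(t[2]))
    return t

def common_reduct(a, b):
    return normalize(a) == normalize(b)

I = ('lam', ('var', 0))                          # identity
t1 = ('app', I, ('lam', ('app', I, ('var', 0))))
print(common_reduct(t1, I))                      # both normalize to I → True
```

Without the strong normalization guarantee, normalize may loop forever, which is exactly the worry about non-terminating tree searches raised in 5.6.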
5.7. Practical standpoint

Apart from the cases of very small trees, the matter of decidability of correctness will not be of practical value: the number N mentioned in 5.6 (ii) will usually be prohibitively large. If a tree is incorrect, the finite time it takes to establish that fact may be hopelessly long. It is better to be more modest, and to try to design algorithms with efficient strategies, by means of which we can
show the correctness of the lambda trees we have to deal with in practice. If such algorithms are applied to an incorrect lambda tree, the fact that they have used an unreasonable amount of time without having reached a decision may be considered as an indication that the tree is possibly incorrect. Sometimes we can apply quite easy checks by means of which an incorrect tree can be rejected fast: it might fail to be norm-correct, or might be no lambda tree at all. Or we might run into cases where the type of some (V, lab) is required but where deg(V, lab) = 1.

5.8. Avoiding double work
We can rearrange the definition of correctness in such a way that it leads to an algorithm that gives just a single type check corresponding to each A-node in the lambda tree we have to check the correctness of. If we just follow the algorithm sketched in Section 5.4, the cases where we have to treat (R, W, typ x) will cause double work: what is involved in typ x has been dealt with earlier in the execution of the algorithm. The only thing that deserves to be checked is whether the A-nodes in W match with the λ-nodes that arise from typ x (possibly after one or more further applications of rule (ii)).

Let us divide the waiting list into two consecutive parts. The first part is still called "white", the second part is called "yellow". For the yellow part the work load will be lighter than for the white part. A formal definition of these four-colored lambda trees is easily obtained by slight modification of Section 5.2. We have to consider (V, lab, p, s, q); the points with p < nτ(u) ≤ s are white, those with s < nτ(u) ≤ q yellow. And the yellow main line points are required to have label A, just like the white ones. Let us denote the set of these four-colored lambda trees by Slam4. Its elements will be represented as (R, W, Y, B), just like those of Slam3 were represented by (R, W, B).

We now formulate a new definition of correctness of lambda trees, equivalent to the old one. The difference is that the new definition leads to an algorithm that avoids the duplication we hinted at. It involves both a subset Corr3 of Slam3 and a subset Corr4 of Slam4. The final goal is as before: (V, lab) is called correct if its character string P is such that (ε, ε, P) ∈ Corr3. As rules we take (i), (iii), (iv), (v) as in Section 5.3, but we add new rules (vi)-(xii), where (vii) replaces the discarded rule (ii):

(vi)
If (R, ε, ε, τ) ∈ Slam4 then (R, ε, ε, τ) ∈ Corr4.
(vii)
If x is a dummy, and (R, W, ε, x) ∈ Corr4 then (R, W, x) ∈ Corr3.
(viii) If x is a dummy, if (R, W, Y, x) ∈ Slam4, and if (R, W, Y, typ x) ∈ Corr4, then (R, W, Y, x) ∈ Corr4. The definition of (R, W, Y, typ x) is similar to the one of (R, W, typ x) in 5.3 (ii).

(ix)
If (R, W, Y (K), B) ∈ Corr4 then (R, W, Y, (K) B) ∈ Corr4.
(x)
If (R [x : U], ε, ε, B) ∈ Corr4 then (R, ε, ε, [x : U] B) ∈ Corr4.
(xi)
If (R (K) [x : U], W, ε, B) ∈ Corr4 and TP(R, K, U) holds, then (R, W (K), ε, [x : U] B) ∈ Corr4.
(xii) If (R (K) [x : U], W, Y, B) ∈ Corr4 then (R, W, Y (K), [x : U] B) ∈ Corr4.

At the end of Section 5.5 we mentioned the shortcut rule (ii'). There is a similar shortcut here: it can replace (vi) and (x):

(x') If (R, ε, ε, B) ∈ Slam4 then (R, ε, ε, B) ∈ Corr4.

However, the set of rules without shortcuts may be better for theoretical purposes.

5.9. Weaker notions of correctness
We can weaken the notion of correctness by weakening the requirement about TP(R, K, U) in rule 5.3 (v). If in rule 5.3 (v) we omit the requirement of TP(R, K, U) altogether, we get what we can call semicorrectness. For semicorrect lambda trees we can define a norm corresponding to Nederpelt's norm for the system Λ (see [Nederpelt 73 (C.3)]). A norm is a particular kind of lambda tree: it has no labels A and all end-point labels are τ. To every semicorrect lambda tree we attach such a norm. It can be defined algorithmically if we just follow the list of Section 5.8. First we define the norms of the (R, W, B)'s and (R, W, Y, B)'s:

(i) and (vi): as norms of (R, ε, τ) and (R, ε, ε, τ) we take the lambda tree consisting of just one node, labeled τ.

(iii): norm(R, W, (K) B) = norm(R, W (K), B).

(iv) and (x): as norm of (R, ε, [x : U] B) (or of (R, ε, ε, [x : U] B)) we take the lambda tree with root labeled τ, whose left-hand subtree is norm(R, ε, U) (or norm(R, ε, ε, U)), and whose right-hand subtree is norm(R [x : U], ε, B) (or norm(R [x : U], ε, ε, B)).

(v): norm(R, W (K), [x : U] B) = norm(R (K) [x : U], W, B).

(vii): norm(R, W, x) = norm(R, W, ε, x).

(viii): norm(R, W, Y, x) = norm(R, W, Y, typ x).

(ix): norm(R, W, Y, (K) B) = norm(R, W, Y (K), B).
(xi): norm(R, W (K), ε, [x : U] B) = norm(R (K) [x : U], W, ε, B).

(xii): norm(R, W, Y (K), [x : U] B) = norm(R (K) [x : U], W, Y, B).

The norms of (R, W, B) or (R, W, Y, B) are actually a kind of norm for WB or WYB; the role of R is only to provide the types of the dummies. Finally the norm of a semicorrect lambda tree (V, lab) is defined as the norm of the all-blue lambda tree (V, lab, 0, 0) (which has the form (ε, ε, B)).

We can use a similar algorithm for finding the degree of a lambda tree: we just say in cases (i) and (vi) that the degree is 1, and in cases (ii) and (viii) that the degree is to be increased by 1.

Next we can define the notion of norm-correct lambda trees. We get that notion by replacing in rule 5.3 (v) the condition that (V', lab') is equivalent to typ(V, lab) by the condition that (V', lab') has the same norm as (V, lab). This condition is weaker than TP(R, K, U), and therefore every correct lambda tree is also norm-correct. For norm-correct lambda trees we have the strong normalization property (see Section 5.6).

5.10. Norms for lambda trees which are not necessarily semicorrect

If (V, lab) is a lambda tree which is not semicorrect, that fact is established by the algorithm of Section 5.8 at some moment where we get to (R, W, τ) or (R, W, Y, τ) with W or Y non-empty. For such lambda trees we can nevertheless still define the norm, by the procedure of Section 5.9, if we just extend the action in cases (i) and (vi) by saying that (R, W, τ) and (R, W, Y, τ) have the single-noded tree (labeled by τ) as their norm, also in cases where W or Y are not empty.
6. AUTOMATH BOOKS AS LAMBDA TREES
6.1. Some characteristics of Automath
We shall not explain Automath in detail here: we assume that the reader knows it from other sources (like [de Bruijn 70a (A.2)], [de Bruijn 71 (B.2)], [de Bruijn 73b], [de Bruijn 80 (A.5)], [van Benthem Jutting 77], [van Daalen 80]). In particular, we shall not try to be very precise in defining particular brands of Automath. Nevertheless we indicate a few characteristics, in order to get to the kind of Automath that corresponds to Δλ. For a discussion that compares various forms of Automath in the light of such characteristics we refer to [de Bruijn 74a].

(i)
Automath books are written as sequences of lines: primitive lines, ordinary
lines (= definitional lines), and context lines that describe the contexts of the other lines.

(ii) We have the notion of typing, and that leads to the degrees. In standard Automath the only degrees are 1, 2 and 3, and it seems that for the description of mathematics no serious need for higher degrees ever turned up.

(iii) There are restrictions on abstraction. Contexts may be described as [x1 : A1] ... [xn : An] where the Ai may have degree 1 or 2, but in expressions (also in the Ai's of the contexts) we only admit abstractors [x : A] where A has degree 2.

(iv) In Automath we have instantiation: if the identifier p is the identifier of a line in a context of length n, then the "instantiations" p(E1, ..., En), where the Ei are expressions, can be admitted in other contexts.

(v)
In some of the Automath languages (like AUT-QE) we admit "quasi-expressions": expressions of degree 1 which are not just τ.
(vi) In some of the Automath languages we have type inclusion: if E : [x1 : A1] ... [xn : An] τ then we admit that E is substituted at places where a typing [x1 : A1] ... [xk : Ak] τ (with some k < n) is required.

6.2. Automath without type inclusion

We can take Automath with quasi-expressions but without type inclusion (AUT-QE-NTI). Both AUT-QE-NTI and AUT-68 are sublanguages of AUT-QE: we might say that in AUT-68 type inclusion is prescribed, in AUT-QE it is optional, in AUT-QE-NTI it is forbidden. In [de Bruijn 78c (B.4)] it was pointed out that AUT-QE-NTI can be used as a language for writing mathematics, somewhat lengthier than in AUT-QE. One might say that sacrificing type inclusion has to be paid for by means of a number of extra axioms. But there is a disadvantage to type inclusion: type inclusion makes language theory considerably harder. The rules in AUT-QE-NTI are simple; we just mention that whenever A : B in the context [x : U], and U has degree 2, then [x : U] A : [x : U] B in empty context.

6.3. AUT-LAMBDA
In AUT-QE-NTI we still had restrictions: (i) the degrees are 1, 2 or 3, and (ii) in expressions abstractors [x : U] are allowed only if U has degree 2.
If we give up these restrictions, we get what we can call liberal AUT-QE-NTI. In liberal AUT-QE-NTI the role of instantiation can be taken over completely by abstractors and applicators. In order to make this clear we take a simple example: f := A : B in context [x : U]. According to the liberal abstraction rules we can write a new line F := [x : U] A : [x : U] B in empty context. Next, the instantiation f(E) is equivalent to (E) F, so we can just replace the line f := A : B by the new one in empty context, and abolish the instantiation. Carrying on, we get books without instantiation, all written in empty context. Such books can be considered as having been written in a sublanguage of liberal AUT-QE-NTI; let us call it AUT-LAMBDA. In AUT-LAMBDA there are just two kinds of lines: (i) primitive lines
f := 'prim' : P, and (ii) definitional lines g := Q : R.
(Note: 'prim' was written as PN in other Automath publications.)

6.4. Turning AUT-LAMBDA into Δλ
We shall turn a book in AUT-LAMBDA into a lambda tree by a system that turns correct AUT-LAMBDA into correct lambda trees. It almost works in the opposite direction too, but it turns out that Δλ is a trifle stronger than AUT-LAMBDA. The difference lies in some cases of what was mentioned in Section 5.1, but the difference is so small and unimportant that it seems to be attractive to modify the definition of AUT-LAMBDA a tiny little bit, in order to make the correspondence complete.

The transition is simple. We turn the identifiers of a book in AUT-LAMBDA into dummies. To a line f := 'prim' : P we attach the abstractor [f : P], to a line g := Q : R we attach the applicator-abstractor pair (Q) [g : R]. We do this for each line of the book, and put the abstractors and applicator-abstractor pairs into a single string, and we close it off by τ. So to a book
    f := 'prim' : P
    g := Q : R
    k := V : W
    h := 'prim' : Z

there corresponds the string

    [f : P] (Q) [g : R] (V) [k : W] [h : Z] τ
and this corresponds to a lambda tree.
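The translation of Section 6.4 is mechanical enough to sketch in code. The helper below is hypothetical (the representation of a book as triples is my own, not the paper's):

```python
# A sketch of the Section 6.4 translation: an AUT-LAMBDA book becomes one
# character string, and hence a lambda tree.  A line is represented as
# (identifier, definition_or_None, category); 'prim' lines carry None.

def book_to_string(book):
    parts = []
    for ident, definition, category in book:
        if definition is None:                    # primitive line
            parts.append(f"[{ident} : {category}]")
        else:                                     # definitional line
            parts.append(f"({definition}) [{ident} : {category}]")
    parts.append("tau")                           # close off with tau
    return " ".join(parts)

book = [("f", None, "P"), ("g", "Q", "R"), ("k", "V", "W"), ("h", None, "Z")]
print(book_to_string(book))
# → [f : P] (Q) [g : R] (V) [k : W] [h : Z] tau
```

Running this on the example book above reproduces the displayed string.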
6.5. Checking algorithms

If we start from an AUT-LAMBDA book, transform it into a lambda tree as in Section 6.4, and apply the checking algorithm of Section 5.4, then we have the advantage that the AUT-LAMBDA book is checked line by line. So even if the book is incorrect as a whole, the first k lines can still be correct, and the algorithm can establish that fact. The same thing holds if we take the weaker correctness notions discussed in Section 5.9.

6.6. Type inclusion

If we want to add the feature of type inclusion to AUT-LAMBDA, the transition of a book to a lambda tree can no longer be made in the same way. Moreover we need essential changes in the notion of typing in Δλ.

6.7. A variation of Δλ
We mention a modification of the definition of correctness of a lambda tree, obtained by considering different kinds of A-nodes. Let us divide the set of all A-nodes of a lambda tree into two classes: strong ones and weak ones. We take it as a rule that whenever a part of a tree is copied (like in the definition of typing) the copies of weak nodes are weak again, and the copies of strong nodes are strong. For the weak A-nodes the rules are as in Section 5.3, but for the strong ones we modify rule 5.3 (iii) by not just requiring that (R, ε, K) and (R, W (K), B) are in Corr3, but also (R, ε, B).

In connection to what was said in Section 5.1 we might say that Λ corresponds to the case where all A-nodes are taken to be strong, and that Δλ is the case where all A-nodes are weak. The case mentioned in Section 6.4 lies between these two: if we want to close the gap between AUT-LAMBDA and Δλ we have to make all main line A-nodes weak and all others strong. If we replace weak A-nodes by strong ones, a correct lambda tree may turn into an incorrect one, but it can be expected to become correct again by reductions.
Lambda calculus extended with segments (B.8)
Chapter 1, Sections 1.1 and 1.2 (Introduction)
H. Balsters

1. INTRODUCTION

The λ-calculus is concerned with axiomatizing the mathematical concept of function and the rules governing the application of functions to values of their arguments. In the λ-calculus a function is seen as a rule for calculating values; this is a view which differs from the one held in set theory, where a function is taken to be a set of ordered pairs and is identified with its graph. In axiomatizing the concepts of function and application we define (i) a syntax, consisting of a set of grammar rules, and (ii) inference rules.

The λ-calculus to be described in this section, called λσ, is an extension of the ordinary type free λ-calculus (cf. [Barendregt 84a]) and was originally conceived by N.G. de Bruijn (cf. [de Bruijn 78a]). The main feature of λσ is the incorporation of a new class of terms called segments. These segments were originally devised in order to provide for certain abbreviational facilities in the mathematical language Automath. Automath is a typed λ-calculus in which it is possible to code mathematical texts in such a way that the correctness of each proof written in Automath can be verified mechanically (i.e. by a computer). There is much to say about the Automath system, much more than the topic of this thesis aims to cover. We shall mainly treat λσ as an interesting extension of the λ-calculus in its own right and not pay very much attention to connections with Automath. This thesis will be a rather technical treatise of the syntax and axiomatics of λσ-theory. For an introduction to the Automath project we refer to [de Bruijn 80 (A.5)] and [van Benthem Jutting 81 (B.1)]; the latter paper offers an excellent introduction to a fundamental Automath-language called AUT-68. For a detailed treatise of the language theory of the Automath-languages we refer to [van Daalen 80].

This introduction consists of three sub-sections.
In Section 1.1 we shall give an informal description of the λσ-system and pinpoint major differences with
ordinary type free λ-calculus (for a very complete and up-to-date description of type free λ-calculus we refer to [Barendregt 84a]). Section 1.2 contains an informal description of the λτσ-system (λσ extended with types). The types in λτσ are an extension of the types in Church's theory of simple types (cf. [Church 40]), the extension being that simple types are constructed for segments and segment variables.

1.1. An informal introduction to the λσ-system
In this section we shall give an informal description of a system called λσ. We shall offer some explanation for the motivation behind the system and show in which way λσ is an actual extension of ordinary type free λ-calculus. We start with a simple system called λV.
1.1.1. The system λV

The system λV is the well-known type free λ-calculus as described in [Barendregt 81], although there are some slight deviations in notation. In λV functional abstraction is denoted by λx(...) (i.e. the function that assigns (...) to the variable x, where x may occur in (...)), and functional application is denoted by δAB (i.e. the function B applied to its argument A, where A and B are λV-terms). Note that in λV arguments are written in front of functions, this in contrast with ordinary type free λ-calculus where application of a function B to its argument A is usually written as B(A). The syntax of λV is very simple and is given below.
Definition 1.1.1.

(1) λV-terms are words over the following alphabet:

    v1, v2, v3, ...    variables
    λ                  abstractor
    δ                  applicator

(2) The set of λV-terms is the smallest set X satisfying

    (i) x ∈ X, for every variable x
    (ii) A ∈ X ⟹ λx A ∈ X, for every variable x
    (iii) A, B ∈ X ⟹ δAB ∈ X.
As will be clear, λV-terms are written in prefix notation: each variable has arity 0, each abstractor λx has arity 1 and the applicator δ has arity 2. Each term can be represented as a rooted tree. As an example we consider the term

    δ z λx λy δ y x    (4')

which we write in tree form as

    δ — λx — λy — δ — x
    |             |
    z             y
                       (4'')

The correspondence between terms like (4') and trees like (4'') is one-to-one. It certainly helps to think of λV-terms as such trees, and in particular to see operations on terms as operations on their corresponding trees; especially when long terms are involved it is often useful to consider tree representation of terms.
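The prefix notation of Definition 1.1.1 can be made concrete with a small sketch. The tuple encoding and the function name below are assumptions of mine, not part of the text:

```python
# Lambda V terms as rooted trees, following Definition 1.1.1: variables
# have arity 0, an abstractor lambda_x arity 1, the applicator delta
# arity 2.  Terms are nested tuples; ('app', A, B) stands for delta A B.

def prefix(t):
    """Linearize a term tree into the prefix notation of the text."""
    kind = t[0]
    if kind == 'var':                      # arity 0
        return t[1]
    if kind == 'lam':                      # ('lam', x, body), arity 1
        return f"lambda_{t[1]} " + prefix(t[2])
    if kind == 'app':                      # arity 2
        return "delta " + prefix(t[1]) + " " + prefix(t[2])
    raise ValueError(kind)

# the example term (4'): delta z lambda_x lambda_y delta y x
t4 = ('app', ('var', 'z'),
      ('lam', 'x', ('lam', 'y', ('app', ('var', 'y'), ('var', 'x')))))
print(prefix(t4))   # → delta z lambda_x lambda_y delta y x
```

The one-to-one correspondence between terms and trees is visible here: the nested tuple is the tree, and prefix recovers the linear term.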
1.1.2. Beta reduction

In λ-calculus we have the fundamental notion of application. The application of a function B to an argument A is written as δAB. Apart from functional application we have the notion of functional abstraction. As said before, the intuitive meaning of λx(...) is "the function that assigns (...) to the variable x". This is illustrated in the following example (not a λV-term, by the way)

    δ 3 λx(2 * x + 1) = 2 * 3 + 1

i.e., we substitute the number 3 for the variable x in 2 * x + 1. A formula of the form δ A λx B is called a redex. Substitution of A for the free occurrences of x in B is denoted by Σx(A, B). The transition from δ A λx B to Σx(A, B) is called β-reduction. We now proceed by giving a more formal description of substitution.

We recall that an occurrence of a variable x in a term A is called bound in A if this occurrence of x lies in the scope of some abstractor λx in A; otherwise this occurrence of x is called free in A. Note that a variable can occur both free and bound in the same term; as an example consider the two occurrences of the variable x in the following term written in tree form
    δ — λx — δ — y
    |        |
    x        x
Definition 1.1.2. If A is a term and x is a variable and y is a variable with y ≠ x, then we define Σx(A, B) inductively for terms B by

(1) Σx(A, x) = A
    Σx(A, y) = y

(2) Σx(A, λx C) = λx C

(3) Σx(A, λy C) = λy Σx(A, C), if x does not occur free in C, or y does not occur free in A;
    = λz Σx(A, C'), otherwise, where C' is obtained by renaming of all free occurrences of y in C by some variable z which does not occur free in A, C

(4) Σx(A, δCD) = δ Σx(A, C) Σx(A, D).
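Clauses (1)-(4) translate directly into code. The sketch below uses the tuple representation ('var', x), ('lam', x, C), ('app', C, D), where ('app', C, D) stands for δCD; the fresh-name choice in clause (3) is an assumption of mine:

```python
# Sigma_x(A, B) of Definition 1.1.2: substitute A for the free
# occurrences of x in B, renaming bound variables (clause (3)) where a
# free y of A would otherwise be captured.

def free_vars(t):
    if t[0] == 'var':
        return {t[1]}
    if t[0] == 'lam':
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def fresh(avoid):
    """A variable name not occurring in the given set (my own scheme)."""
    n = 0
    while f"v{n}" in avoid:
        n += 1
    return f"v{n}"

def subst(x, A, B):
    if B[0] == 'var':                              # clause (1)
        return A if B[1] == x else B
    if B[0] == 'app':                              # clause (4)
        return ('app', subst(x, A, B[1]), subst(x, A, B[2]))
    y, C = B[1], B[2]
    if y == x:                                     # clause (2)
        return B
    if x not in free_vars(C) or y not in free_vars(A):
        return ('lam', y, subst(x, A, C))          # clause (3), easy case
    z = fresh(free_vars(A) | free_vars(C))         # clause (3), alpha-renaming
    C1 = subst(y, ('var', z), C)
    return ('lam', z, subst(x, A, C1))

# beta-reducing the redex delta A lambda_x B means forming Sigma_x(A, B):
B = ('lam', 'y', ('app', ('var', 'x'), ('var', 'y')))   # lambda_y delta x y
A = ('var', 'y')                                        # argument with free y
print(subst('x', A, B))   # the bound y is renamed, so A's free y escapes capture
```

The final call exercises the "otherwise" branch of clause (3): without renaming, the free y of A would be bound by λy.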
Most of the four clauses in the definition given above are self-evident, with the possible exception of clause (3). Clause (3) is necessary in order to avoid that free occurrences of the variable y in A get bound by the λy of λy C after substitution, which would otherwise lead to inconsistencies. This renaming of bound variables is known as α-reduction. In our case it is said that λy C α-reduces to λz C'. Usually α-reduction is considered unessential. If α-reduction transforms a term A into A', then A and A' are considered to be equivalent in an informal way. This convention implies that the name of a bound variable is unessential; the "meaning" of a term is considered unaltered after performing an α-reduction on that term. Actually, in the definition of substitution given above, clause (3) does not introduce a proper term but rather an α-equivalence class of terms.

1.1.3. Name-free notation

Renaming of bound variables can sometimes be very cumbersome; proofs involving α-reduction are notoriously tedious. But apart from this we have our own intrinsic reasons to avoid α-reduction. Later on we shall introduce the full λσ-system, an extension of λV. The main feature of λσ is the incorporation of a new class of terms called segments. Segments are discussed in Section 1.1.4. Substitution of segments for their corresponding variables can give rise to a large number of α-reductions, especially when the formulas are long. There is, however, a very simple way to avoid α-reduction. In [de Bruijn 78b], N.G. de Bruijn introduced the concept of nameless dummies; he invented a λ-calculus notation that makes α-reduction superfluous. The idea is that we just write λ instead of λx, λy, ..., and every variable is replaced by a term of the form ξ(n), where n is some positive integer. Each ξ(n) is called a name-free variable and n is called a reference number. The reference number n of a name-free variable ξ(n) determines the λ that binds a specific occurrence of ξ(n) in some term. The procedure is as follows. If the name-free variable ξ(n) occurs in some term t, we first form the tree representation of t. We then descend from ξ(n) towards the root of the tree, and the n-th λ encountered is the λ that binds ξ(n). As an example consider the following name-carrying term in tree representation
    [name-carrying tree display garbled in the source]

The name-free equivalent of this term is

    [display garbled in the source]
Remark. If a reference number n is larger than the number of λ's lying on the path from an occurrence of ξ(n) to the root of the tree in which it occurs, then we can interpret that occurrence as being free.

The use of name-free notation has certain consequences for substitution of λξ-terms (λV-terms written in name-free form), which we now shortly describe. Substitution in a λξ-term t results in the replacement of free occurrences of a certain variable in t by some term u. We could also describe this situation in terms of trees by saying that certain end-points ξ(n) of the tree equivalent t̂ of t have been replaced by some tree û. Consider the following example of such a substitution in a λξ-tree. Let t be the λξ-term

    λ λ δ δ ξ(2) ξ(1) λ λ λ ξ(3)

Its tree t̂ contains a redex, namely

    δ δ ξ(2) ξ(1) λ λ λ ξ(3)

and we can therefore perform a β-reduction on t̂. By β-reducing t̂, the end-point ξ(3) is a candidate for substitution of the sub-tree

    δ ξ(2) ξ(1)

Should we, however, simply replace ξ(3) by this sub-tree, as would have been the case if t̂ had been written in name-carrying form, then this would result in the following tree t̂'

    λ λ λ λ δ ξ(2) ξ(1)

It is immediately clear that the variables ξ(1) and ξ(2) in t̂' refer to completely different λ's than in t̂. This inconsistency is due to the fact that

(i) ξ(1) and ξ(2) are external references in t̂ (i.e., references to λ's to the left of the subterm δ ξ(2) ξ(1));

(ii) after replacement, the variables ξ(1) and ξ(2) in t̂' have two extra λ's on their left.

There is, however, a simple way to resolve this inconsistency: by raising the reference numbers 1 and 2 in ξ(1) and ξ(2) by 2 in t̂', these variables refer to the same λ's that they originally referred to in t̂. This example demonstrates that certain measures have to be taken in order to ensure that external references remain intact when we substitute a λξ-term. In Section 2, where we give a formal definition of substitution of name-free terms, we shall introduce so-called reference mappings, which see to it that reference numbers are suitably updated in order to avoid inconsistencies as described above. We refrain from further discussion of these reference mappings here; they shall be described extensively, both informally and formally, in Section 2.

In the following sections of this chapter we shall first stick to name-carrying notation of formulas. The major reason for this is to point out that name-carrying notation can possibly be maintained in λσ-calculus (λV-calculus extended with segments and segment variables), but we also want to show how awkward things can get in λσ-calculus by employing name-carrying notation. In the case of λV-calculus the name-free notation might seem exaggerated in preciseness, and we can imagine reservations towards this notation as far as readability of formulas is concerned. In the case of λσ-calculus we shall try to
show that the name-free notation has advantages over name-carrying notation, both in preciseness and readability.

1.1.4. Segments and abbreviations

We may consider a variable as an abbreviation of a certain term if this variable can be replaced by that term by means of some suitable β-reduction. For example, consider the following term written in tree form
(tree diagram (5): a term containing a redex whose argument is λx x and whose λz binds the variable z)
By β-reducing (5) we obtain the term
(tree diagram (5′): the term (5) after β-reduction, with z replaced by λx x)
i.e. a term in which the variable z has been replaced by the term λx x and the redex has vanished. If we would have more occurrences of the variable z, each bound by the λz of the redex, then each of these occurrences serves as a kind of abbreviation of the term λx x. In λσ there are, however, still quite different things that we want to abbreviate. One such thing is a so-called δ-string like
(tree diagram (6): the δ-string δ A - δ B - δ C)
If it occurs more than once in a certain term, we may wish to abbreviate it. Yet (6) is not a term, in the sense of a λV-term, but only part of a term; it becomes a λV-term if we place an arbitrary λV-term behind it. Such parts of λV-terms are called segments. Another example of a segment is a so-called λ-string like
(tree diagram (7): the λ-string λx - λy - λz)
In Automath we have many cases where we would like to abbreviate segments. In this respect we mention an interesting Automath language, namely Nederpelt's language Λ (cf. [Nederpelt 73 (C.3)]). The original idea of introducing such a language as Λ stems from N.G. de Bruijn, who devised a language called AUT-SL (from Automath-Single Line) in which Automath texts can be represented as one single formula. The language Λ was devised as a fundamental and simple Automath language which is very well suited for language-theoretical investigation. In typical codings of Automath texts in Λ we encounter very many copies of certain δ-strings and λ-strings, copies which we would like to abbreviate. As a consequence, segments like δ-strings and λ-strings will be treated
H. Balsters
346
as separate independent entities in λσ. In λσ we shall even take a broader approach and allow for segments of a much more general form than δ-strings or λ-strings alone. In the following section we shall give examples of such segments of a more general form.

1.1.5. Segment variables and substitution

Segments are terms with a kind of open end on the extreme right. From now on we shall use the symbol ω to indicate the open end on the right. So
(tree diagram: the δ-string δ A - δ B - δ C - ω)
is a segment, as well as the λ-string

(tree diagram: λx - λy - λz - ω)
As said before, segments are not λV-terms; a segment becomes a λV-term if we replace the ω by an arbitrary λV-term. According to this scheme the following formulas can also be considered as segments:
(tree diagrams: two segments combining δ's and λ's, each ending in ω)
By replacing the ω in both of these formulas by some λV-term we obtain a λV-term (provided, of course, that A and B are λV-terms). In λσ we will go even one step further by allowing recursive nesting of segments, and as a consequence ω's can occur in other branches as well, like in
(tree diagrams: segments with ω's occurring in several branches)
All these occurrences of ω in the foregoing formulas act as a kind of “holes” which, once replaced by a λV-term, yield again a λV-term. All formulas having an ω on the extreme right are called segments in λσ. Along with segments we also add to our system a new kind of variables for which segments can be
substituted. These variables are represented by unary prefix symbols and are denoted, in name-carrying form, by σ, σ′, σ″, ... . An example of a λσ-term containing a segment and a segment variable is
(tree diagram (8): a redex whose argument is the segment λx - λy - λz - ω and whose λσ binds the segment variable σ, followed by x)
This term is in redex form, where the segment variable σ is bound by the λσ of the redex. Performing a β-reduction on this redex results in
(tree diagram (8′): λx - λy - λz - x)
i.e., the prefix symbol σ is replaced by the segment λx λy λz (where the ω has been dropped). In λσ, segment variables can serve as a means to abbreviate segments, just like variables in λV can serve as a means to abbreviate λV-terms. When using segment variables to abbreviate segments we must be careful, though. Consider for example the λσ-term (8). The variable x in that term refers to the abstractor λx hidden inside the segment variable σ, as seen in (8′) where x gets bound by λx after β-reduction of (8). This is an intended feature which we always have to take into account in λσ-calculus. If a segment variable σ occurs in some λσ-term t, then after replacement of σ by the segment s that σ abbreviates in t, it can happen, as most often will be the case, that certain variables occurring in t get captured by abstractors lying on the main branch of the tree representation of s. This is to say that each occurrence of a segment variable σ in a λσ-term t can contain abstractors (hidden inside σ) which will capture certain variables in t after performing a β-reduction in t resulting in the replacement of σ by the segment that σ abbreviates in t. We now wish to discuss a situation in which there are more occurrences of the same segment variable σ in some λσ-term. Consider the following λσ-term in tree representation
(tree diagram (9): a term containing a segment and two occurrences of the segment variable σ)
Performing a β-reduction on this term results in
(tree diagram (9′))
where both instances of σ have been replaced by the segment λx λy. The variables x and y in (9′) are bound by the last two abstractors λx and λy, as indicated by the arrows in (9″) shown below
(tree diagram (9″): (9′) with arrows from x and y to the last two abstractors λx and λy)
Suppose, however, that we would want x and y to be bound by other occurrences of the abstractors λx and λy, as indicated in

(tree diagram (9‴): (9′) with arrows from x and y to other occurrences of λx and λy)
In λσ we want to have the freedom to allow for such deviations in priority of binding power of λ's, which appear when we have more than one occurrence of some segment variable in a λσ-term. One way of doing this is by renaming the abstractors in (9′) in a suitable way. It is clear that the variables x and y are then bound by the first two abstractors λx and λy, just as we intended them to be bound in (9‴). This renaming, however, is done after substitution has taken place; i.e. the renaming takes place after β-reduction of (9) to (9′). What we would like is that it can be seen beforehand (i.e. before β-reduction takes place) how the abstractors inside segments shall be renamed. We would like to have a means of systematically indicating beforehand how this renaming of bound variables shall take place, instead of more or less arbitrarily renaming bound variables in segments after β-reduction. One way of doing this is by replacing the first, respectively the second, occurrence of σ in (9) by σ(x, y), respectively σ(x1, y1). These parameter lists (x, y) and (x1, y1) serve as instructions indicating that the abstractors λx and λy are to be renamed by λx and λy, respectively by λx1 and λy1, in the first, respectively the second, occurrence of σ in (9) (actually only in the second occurrence of σ does a real renaming take place). In general, if a segment has n (n ≥ 0) λ's lying on the main branch of its tree, say λx1, ..., λxn, and σ is a segment variable referring to that segment, then by adding a parameter list (y1, ..., yn) to σ we have an instruction indicating that the n abstractors λx1, ..., λxn are to be renamed by λy1, ..., λyn, in that order. Also the occurrences of the variables x1, ..., xn in the segment which were bound by λx1, ..., λxn are to be renamed by y1, ..., yn. We note that it is important that the parameter list added to a segment variable σ has as its length the number of λ's lying on the main branch of the segment s that σ refers to (this number is called the weight of s). By adding parameter lists to segment variables we have a means to bind occurrences of variables to a λ hidden inside a segment exactly as we desire. There is still one problem, though, that we have to resolve.
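The renaming prescribed by a parameter list can be sketched as follows (a minimal Python sketch; the representation of a segment's main-branch binders and body occurrences as lists of names is our own illustration, not notation from the text):

```python
def rename_segment(binders, body_vars, params):
    """Rename the main-branch binders of a segment according to a
    parameter list, as sigma(y1, ..., yn) prescribes.  binders are the
    names bound by the n lambdas on the main branch (left to right);
    body_vars are the variable occurrences in the segment's body."""
    if len(params) != len(binders):
        # the parameter list must have the weight of the segment as its length
        raise ValueError("parameter list length differs from segment weight")
    ren = dict(zip(binders, params))
    # rename the binders and every occurrence bound by one of them
    return params[:], [ren.get(v, v) for v in body_vars]
```

For the second occurrence of σ in (9), for instance, the instruction σ(x1, y1) renames the binders x, y of the abbreviated segment to x1, y1, together with the occurrences they bind.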
When performing a β-reduction inside a segment we are sometimes dealing with redices which, in
the substitutional process involved, have an effect on the ω on the extreme right of that segment. Consider, for example, the following segment
(tree diagram (10): a segment whose abstractors occur in the order λx, λz, λv, λw, containing the redex δ A λz, with end-point ω)
By β-reducing the redex δ A λz λv λw ω occurring in (10) we are faced with evaluating Σz(A, λv λw ω). By the clauses given in Definition 1.1.1 we know how to “shift” the Σz-operator past the two abstractors λv and λw, but then we arrive at the ω and have to decide how to evaluate Σz(A, ω). We could simply define Σz(A, ω) as ω, but then certain vital information would get lost; a situation which we now explain. Suppose that (10) occurs as a segment in some term t and that (10) is referred to by some segment variable σ(y1, y2, y3, y4) occurring in t. Suppose also that there is an occurrence of the variable y2 in t which refers to the abstractor λz hidden inside σ(y1, y2, y3, y4). By β-reducing (10) and defining Σz(A, ω) as ω, this occurrence of y2 is no longer a candidate for substitution of the term A (which would have been the case prior to this β-reduction of (10)), simply because the abstractor λz (or better: λy2) has vanished. In order to avoid inconsistencies and to keep this candidate-role of substitution for such occurrences of variables y2 intact, we shall define such substitutions of a term A at an end-point ω of a segment by
Σz(A, ω)  =  δ A - λz - ω .
In this way it remains possible to refer to the λz of the original redex in (10), and occurrences of variables which referred indirectly to that lambda by means of a reference to a lambda hidden inside some segment variable remain candidates for substitution of the term A. There is still a problem, though, because the order of the λ's in the reduced segment is different from the order in which they appeared in the original segment. In our example, β-reduction of (10) results in
λx - λv′ - λw′ - δ A - λz - ω        (10′)
where v and w have possibly been replaced by new variables v′ and w′, in case free occurrences of v or w in A would otherwise have been captured. The abstractors in (10) appear in the order λx, λz, λv, λw, and in (10′) the order is λx, λv′, λw′, λz. This difference has consequences when these segments are substituted for some occurrence of a variable σ(y1, y2, y3, y4). Consider, for example, the following two terms in which the segments (10), respectively (10′), occur
(tree diagram (11): a term in which the segment (10) is bound to the segment variable σ(y1, y2, y3, y4), followed by y2)
and
(tree diagram (11′): the same term with the segment (10) replaced by (10′))
These terms β-reduce to
λy1 - δ A′ - λy2 - λy3 - λy4 - y2        (12)
and

λy1 - λy2 - λy3 - δ A′ - λy4 - y2        (12′)
where A′ is obtained from A by renaming all free occurrences of x by y1. In (12) we see that A′ can be substituted for y2 by performing one more β-reduction; this is, however, not the case in (12′). So by changing the order of the λ's in some segment s by performing a β-reduction inside s we can get the situation that occurrences of variables that originally (i.e. prior to this β-reduction of s) referred to a certain λ hidden inside some parameter-listed segment variable afterwards refer to a completely different λ. There is a way, however, in which such inconsistencies can be resolved. By adding an extra parameter, called a segment mapping (or segmap for short), to an ω we can safely β-reduce a segment prior to substitution of that segment. A segmap is a permutation of some interval [1..n] of ℕ (n ≥ 0), and tells us how to restore the original order of the λ's occurring in a segment; i.e. by adding a segmap to the ω on the extreme right of a segment we can determine the order in which the abstractors occurred before β-reduction of the original segment. Instead of writing ω we now write ω(ψ), where ψ is some segmap. In our example we replace the ω on the extreme right of (11′) by ω(ψ), where ψ is a permutation of [1..4] defined by
ψ(1) = 1 ,  ψ(2) = 3 ,  ψ(3) = 4 ,  ψ(4) = 2 .
Let us denote this modification of (11′) by (11″). If we rearrange the order of the parameter list (y1, y2, y3, y4) in accordance with ψ (i.e. the first parameter remains first in the list, the second becomes the third, the third becomes the fourth and
, most importantly, the fourth parameter becomes the second in the list) then we obtain a new parameter list (y1, y3, y4, y2). By replacing the parameter list (y1, y2, y3, y4) in (11″) by this new parameter list (y1, y3, y4, y2) we obtain the following modified version of (11″)

(tree diagram: (11″) with the segment variable σ(y1, y3, y4, y2) and end-point ω(ψ))
which β-reduces to
λy1 - λy3 - λy4 - δ A′ - λy2 - y2        (12″)
and we see that all occurrences of variables in (12) and (12″) refer to the same λ's, just as we wanted. By adding parameter lists and segmaps we can take care of problems concerning references to λ's hidden inside segment variables in a suitable way. We shall now attempt to give a more formal description of substitution of a segment for a segment variable. We shall present this definition in name-carrying form, in order to show that name-carrying notation can be maintained in principle, but that employment of name-free notation provides for a more natural (and certainly more concise) means for dealing with substitution of segments for segment variables.
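The rearrangement of the parameter list by ψ in this example can be sketched as follows (a Python sketch under our own encoding of segmaps as dictionaries; the rearrangement rule is read off from the worked example above):

```python
def rearrange(params, segmap):
    """Rearrange a parameter list according to a segmap psi:
    position i of the new list receives parameter number psi(i)."""
    return [params[segmap[i] - 1] for i in range(1, len(params) + 1)]

# the segmap psi of the example: psi(1)=1, psi(2)=3, psi(3)=4, psi(4)=2
psi = {1: 1, 2: 3, 3: 4, 4: 2}
```

Here rearrange(['y1', 'y2', 'y3', 'y4'], psi) yields ['y1', 'y3', 'y4', 'y2'], the parameter list used in the modified version of (11″).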
Definition 1.1.3. Let Aω(ψ) be a segment with weight n (n ∈ ℕ ∪ {0}), let ψ be a permutation of [1..n] and let B be a term. Substitution of Aω(ψ) for σ(y1, ..., yn) in σ(y1, ..., yn)B is defined by

(i) (ii) (iii)

where id(n) denotes the identity map on [1..n], (y′1, ..., y′n) is the result of rearranging (y1, ..., yn) as indicated by ψ, and A′ is the result of suitable renaming of bound variables in A as indicated by (y′1, ..., y′n). This definition is still rather vague, since we have not defined Σ(y1,...,yn)(Aω(ψ), B), and also because such descriptions as “rearrangement of a parameter list as indicated by a segmap” and “suitable renaming of bound variables in a term as indicated by a parameter list” can hardly be considered as descriptions with formal status. The transition from (ii) to (iii) is also a bit strange,
since it is not clear from (ii) alone how the segmap ψ in (iii) suddenly turns up again. Apparently, this is not a very good definition, since it is too vague; but, as mentioned before, this definition was only intended as an attempt towards a formal definition. A precise formal definition of substitution for segment variables can of course be given, but such a definition would be rather involved. There is a more elegant and shorter way to define substitution for segment variables, namely by employing name-free notation for segments and segment variables. This notation is described in the following section.
1.1.6. Name-free notation for segments and segment variables

There is another way of dealing with references to λ's hidden inside segment variables than attaching parameter lists to segment variables, namely by employing name-free notation. What we shall do is the following. Segment variables are written in name-free form as σ(n, m), where n denotes the reference number of σ (which, like in ξ(n), determines the λ that some specific occurrence of σ(n, m) refers to) and m (m ≥ 0) denotes the number of λ's lying on the main branch of the tree representation of the segment that σ(n, m) intends to abbreviate (the number m is also called the weight of σ(n, m)). The number m in σ(n, m) is to play the role of a parameter list in name-carrying notation; i.e. m indicates that there are m λ's hidden inside σ(n, m). As an example of a term in name-free notation containing a segment and a segment variable, consider the following term written in tree form
(tree diagram: a term containing a segment with three λ's and the internal reference ξ(1), the segment variable σ(1, 3), and the variables ξ(5) and ξ(2))
In this term we see that σ(1, 3) abbreviates a segment with three λ's lying on the main branch of its tree; so when determining the λ that ξ(5) refers to we descend from ξ(5) towards the root of the tree, subtract 3 from 5, subsequently subtract 1 and see that ξ(5) refers to the first λ (from the left) of the tree. The variable ξ(2) refers to the second λ (from the right) hidden inside σ(1, 3); ξ(2) is thus bound by the second λ (from the right) of the segment
(tree diagram: the segment, with its three λ's, the applicator δ with argument ξ(1), and end-point ω).
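The resolution rule just described (while descending towards the root, an ordinary λ counts as one binder and a segment variable σ(·, m) as m hidden binders) can be sketched as follows; the encoding of the root path as a Python list is our own illustration:

```python
def resolve(n, path):
    """Resolve the reference number n of xi(n) against the root path,
    given innermost binder first: 'lam' counts as one lambda, a pair
    ('sigma', m) as the m lambdas hidden inside a segment variable.
    Returns the index on the path of the binding entry (0 = nearest),
    or None if the reference escapes the term (a free variable)."""
    for i, entry in enumerate(path):
        width = 1 if entry == 'lam' else entry[1]
        if n <= width:
            return i
        n -= width
    return None
```

For ξ(5) in the example, the path passes σ(1, 3) and one λ before reaching the outermost λ: resolve(5, [('sigma', 3), 'lam', 'lam']) returns 2, the first λ (from the left) of the tree, while resolve(2, [('sigma', 3)]) returns 0, a λ hidden inside the segment variable.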
By employing name-free notation we get a concise way of denoting segment variables and can do without attaching (potentially long) parameter lists to these variables. There is still one problem, though; a problem which we discussed earlier on in the name-carrying version of λσ-calculus, which dealt with the performance of certain β-reductions inside segments prior to substitution of those
segments for their respective segment variables. By performing a β-reduction inside a segment, the order in which certain λ's originally occurred in that segment can be disturbed and, as we have seen earlier, this can lead to problems when we substitute the reduced segment for certain occurrences of segment variables in a term in which that segment occurs. We solved those problems by adding segmaps to the ω's on the extreme right of the segments involved, and we shall do so again in the name-free version of λσ. We now shortly describe substitution of segments for segment variables, and we shall give this description in an informal manner in terms of trees. The tree representation of a segment has an ω(ψ), where ψ is some segmap, on the extreme right of its main branch. When we substitute a segment we remove the ω(ψ) and put the remaining tree fragment in the place of some occurrence of a segment variable in a λσ-tree. Segment variables occur in λσ-trees as unary nodes, and substitution of segments for segment variables thus gives rise to replacements at unary nodes inside a λσ-tree (which differs completely from λξ-substitutions, where we could only perform replacements at end-nodes of trees). When such a substitution is performed, we again, as in the case of λξ-substitutions, have to be careful and update external references in order to ensure that these references remain intact after substitution. But not only do we have to update external references when we substitute a segment for a corresponding segment variable, we also have to take into account the effect of the segmap ψ attached to the end-point ω of the segment involved, since such a segmap reallocates references to λ's lying on the main branch of the segment which we want to substitute. We now give an example to demonstrate both of these features. Consider the following example of a λσ-tree containing a segment and a segment variable
(tree diagram t̂: a λσ-tree containing a segment with end-point ω(ψ), the segment variable σ(3, 2), and the variables ξ(3) and ξ(1))
where ψ is the permutation of [1..2] defined by ψ(1) = 2 and ψ(2) = 1. This tree, which we shall refer to as t̂, contains a redex, namely
(tree diagram: the redex of t̂)
and we can therefore perform a β-reduction on t̂. By β-reducing t̂, the unary node σ(3, 2) is a candidate for substitution of the sub-tree

(tree diagram: the segment sub-tree of t̂, with end-point ω(ψ))
Should we simply replace σ(3, 2) by the tree fragment

(tree diagram: the segment with its ω(ψ) removed)

then this would result in the following tree t̂′

(tree diagram t̂′)
It is immediately clear that the variables ξ(1) and ξ(3) refer to different λ's than they originally referred to in t̂. The variable ξ(3) is an external reference in t̂ and, as in the case of λξ-substitutions, has to be suitably updated whenever the segment in which ξ(3) occurs is substituted for some segment variable. The variable ξ(1) in t̂ refers to one of the two λ's hidden inside σ(3, 2); it seems to refer to the first λ (from the right) lying on the main branch of the segment involved, but the segmap ψ reallocates this reference to the second λ (from the right). This means that correct β-reduction of t̂ would result in the following tree t̂″
(tree diagram t̂″: the correctly updated tree, in which ξ(3) has become ξ(5) and ξ(1) has become ξ(2))
In Section 2 we shall give a formal definition of substitution of λσ-terms. In this definition we shall use so-called reference mappings, which see to it that reference numbers are suitably updated, as in our example in the transition from t̂ to t̂″. These reference mappings (or refmaps for short) and their interaction with λσ-terms are described extensively in Section 2, and we refrain from further discussion of refmaps here. The employment of name-free notation and segmaps makes it possible to give a formal definition of substitution of segments for segment variables in a very concise way, as we shall see in Section 2. In previous examples describing how substitution of segments for segment variables can take place we have restricted ourselves to rather simple situations. Our formal treatment of such substitutions, however, will take much more involved situations into account. Our formal definition of substitution will take into consideration certain accumulative effects which can occur when segments contain references to other segments, or even λ's which bind segment variables.

1.2. An introduction to the typed system λτσ

In this section we shall give a description of the λσ-system extended with types for terms. The types in λτσ are a generalization of the types described in
Church's theory of simple types (cf. [Church 40]), the extension being that simple types are constructed for segments and that the description is given in name-free notation. The basic ideas for our description are taken from [de Bruijn 78a]. We shall start from a name-carrying calculus without segments, which, basically, is Church's system of simple types, called λτV. We then gradually move on to a system in which operations on types are made more explicit and in which the name-free notation is incorporated. Finally, we shall describe the full λτσ-system by offering, in name-free notation, a typing of segments. The definitions offered in this section will be followed by explanatory remarks.
Definition 1.2.1 (λτV).

(1) Type symbols (T)
The set of type symbols T is the smallest set X such that
(i) e, @ ∈ X ;
(ii) α, β ∈ X\{@} ⇒ (αβ) ∈ X .

(2) Primitive symbols
The set of primitive symbols consists of
(i) variables: xα, yα, zα, ...  (α ∈ T\{@}) ;
(ii) the symbols λ (abstractor) and δ (applicator).

(3) Terms (λτV)
The set of terms λτV is the smallest set X such that
(i) xα ∈ X, for every variable xα ;
(ii) t ∈ X ⇒ λ xα t ∈ X, for every variable xα ;
(iii) u, v ∈ X ⇒ δuv ∈ X .

(4) Types of terms
The function typ on λτV is defined inductively for terms t by
(i) typ(xα) = α ;
(ii) typ(λ xα u) = (αβ) if typ(u) = β ≠ @, and @ otherwise ;
(iii) typ(δuv) = β if typ(u) = α ≠ @ and typ(v) = (αβ), and @ otherwise .

(5) The set of correct terms (λτV)
λτV = { t ∈ λτV | typ(t) ≠ @ } .
Remarks.
(1) e is some ground type; @ is to be interpreted as the type of terms which are “incorrectly” typed.
(2) (αβ) is to be interpreted as the type of those terms which map terms of type α to terms of type β.
(3) If typ(t) = α then α is generally of the form (α1(α2(... (αn αn+1) ...))), where α1, ..., αn+1 are types. Speaking in terms of trees, this means that there are n abstractors λ xα1, ..., λ xαn lying on the main branch of the tree representation t̂ of t (and in that order) that cannot be removed by some β-reduction in t; i.e. for each abstractor λ xαi there is no matching δ (or rather: δ Ai) such that this δλ-pair can be removed by means of a suitable sequence of β-reductions.
Before giving the next definition we introduce some notation concerning sequences. For an elaborate treatment of sequences we refer to Section 2.1. At this stage it is only important to know that a sequence is seen as a function with some interval [1..n] of ℕ (n ≥ 0) as its domain, where n will be the length of the sequence in question.
Notation. Let Σ be some non-empty set (called an alphabet).
- Σ* denotes the set of sequences over Σ (including the empty sequence, denoted by ∅ (the empty set)).
- if c ∈ Σ then (c) denotes the sequence of length 1 consisting of the “symbol” c.
- if F, G ∈ Σ* then F & G denotes the concatenation of the sequences F and G; in particular, if F is a sequence of length n (n ≥ 0) then F = (F(1)) & (F(2)) & ... & (F(n)).
- if F ∈ Σ* then F̄ denotes the reversed sequence of F, i.e. if F = (F(1)) & (F(2)) & ... & (F(n)) then F̄ = (F(n)) & ... & (F(2)) & (F(1)).
In the following definition we offer an alternative version of λτV in which operations on types are made more explicit.
Definition 1.2.2 (λτγV).

(1) Types (Tγ)
The set of types Tγ is the smallest set X such that
(i) @ ∈ X ;
(ii) F ∈ (X\{@})* ⇒ γ(F) ∈ X .

(2) Primitive symbols
The set of primitive symbols consists of
(i) variables: xf, yf, zf, ...  (f ∈ Tγ\{@}) ;
(ii) the symbols λ (abstractor) and δ (applicator).

(3) Terms (λτγV)
The set of terms λτγV is the smallest set X such that
(i) xf ∈ X, for every variable xf ;
(ii) t ∈ X ⇒ λ xf t ∈ X, for every variable xf ;
(iii) u, v ∈ X ⇒ δuv ∈ X .

(4) Types of terms
The function γ-typ on λτγV is defined inductively for terms t by
(i) γ-typ(xf) = f ;
(ii) γ-typ(λ xf u) = γ((f) & G) if γ-typ(u) = γ(G) for some G ∈ (Tγ\{@})*, and @ otherwise ;
(iii) γ-typ(δuv) = γ(G) if γ-typ(u) = f and γ-typ(v) = γ((f) & G) for some f ∈ Tγ\{@} and G ∈ (Tγ\{@})*, and @ otherwise .

(5) The set of correct terms (λτγV)
λτγV = { t ∈ λτγV | γ-typ(t) ≠ @ } .
Remarks.
(1) We note that the symbol γ is of no particular interest in itself, and the reason for introducing it is basically historical in nature. In [de Bruijn 78a] types of λτγ-terms (i.e. non-segments) were called “green” types, whereas types of segments were called “red” types. The symbol γ has been chosen for the construction of the type of a λτγV-term purely for mnemonic reasons. In Definition 1.2.5 (λτσ) we shall construct types of segments, and these types will be of the form ρ(F, G, H). Here the symbol ρ is used in the construction of types of segments, again, purely for mnemonic reasons.
(2) γ(∅) is the analogue of the ground type e in Definition 1.2.1.
(3) γ((f) & G) is the type of those terms which map terms of type f to terms of type γ(G) (cf. clause (4)(ii) above).
(4) In terms of trees, if γ-typ(t) = γ((f1) & ... & (fn)), then this means that there are n abstractors λ xf1, ..., λ xfn lying on the main branch of the tree representation t̂ of t that cannot be removed by means of a suitable sequence of β-reductions in t (cf. comment (3) in the remarks on Definition 1.2.1).
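The γ-typ clauses translate mechanically into the same style of sketch (our own conventions: a γ-type γ(F) is modelled as the tuple F, so γ(∅) is the empty tuple, and None again plays the role of @):

```python
def gtyp(t):
    """gamma-typ of Definition 1.2.2 for terms ('var', f) | ('lam', f, u)
    | ('app', u, v); variable types f are themselves gamma-types (tuples)."""
    if t[0] == 'var':
        return t[1]
    if t[0] == 'lam':
        g = gtyp(t[2])
        # gamma-typ(lambda x_f u) = gamma((f) & G) if gamma-typ(u) = gamma(G)
        return (t[1],) + g if g is not None else None
    f = gtyp(t[1])
    g = gtyp(t[2])
    # gamma-typ(delta u v) = gamma(G) if gamma-typ(u) = f and
    # gamma-typ(v) = gamma((f) & G)
    if f is not None and g is not None and len(g) > 0 and g[0] == f:
        return g[1:]
    return None
```

With e = () for the ground type γ(∅), the identity ('lam', e, ('var', e)) receives the type (e,), i.e. γ((γ(∅))), and applying it to a ground argument gives back e.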
In the following definition we go one step further and introduce a new type-constructor π which takes two arguments, both sequences of types. We recall that γ(F) denotes the type of those terms with n abstractors lying on the main branch of their corresponding trees (we assume that F is a sequence (f1) & ... & (fn) of length n) that cannot be removed by suitable β-reductions. In the case of segments, however, we can also have terms with applicators lying on the main branch of their tree representations which cannot be removed by means of suitable β-reductions. When we write π(F, G), where F and G are sequences of types (f1) & ... & (fn) and (g1) & ... & (gm), respectively, then F denotes the sequence of n non-removable abstractors and G denotes the sequence of m non-removable applicators. We also introduce a product operation “*” between π-types and γ-types with which we can calculate types of terms. We note that terms in the system λτπγV, defined below, are never typed as π-types; π-types in λτπγV are only used as intermediate constructs for calculating the eventual type (a γ-type) of a term. When we calculate the type of a λτπγ-term t we first calculate the type of a beginning part of that term (such a beginning part is a segment and will thus have a π-type as its type); say that this results in the π-type π(F, G). Then we calculate the type of the remaining part of t (which is not a segment and thus has a γ-type as its result type); say that this remaining part of t has type γ(H). The product π(F, G) * γ(H) will result in the eventual type of t. With the interpretation of π(F, G) as the type of a beginning part of a term with F as the sequence of non-removable λ's and G as the sequence of non-removable δ's, Definition 1.2.3 should not be too hard to understand. After this definition we shall give an example of calculating the type of a λτπγV-term.
Definition 1.2.3 (λτπγV).

(1) Quasi-types (Tπ)
The set of quasi-types Tπ is defined as

Tπ = { π(F, G) | F, G ∈ (Tγ\{@})* } .

(2) Products of quasi-types and types (*)
Let F, G and H be elements of (Tγ\{@})*. The product of a quasi-type and a type is defined as follows.

(4) Types of terms
The function πγ-typ on λτπγV is defined inductively for terms t by
(i) πγ-typ(xf) = f ;
(ii) πγ-typ(λ xf u) = π((f), ∅) * πγ-typ(u) ;
(iii) πγ-typ(δuv) = π(∅, (πγ-typ(u))) * πγ-typ(v) .

(5) The set of correct terms (λτπγV)
A simple example of calculating the πγ-type of a λτπγV-term
Consider the following term t

λ xf δ xg λ xg xh

and assume that h = γ(H), where H is some element of (Tγ\{@})*. According to the rules given in Definition 1.2.3, the type of t is calculated as follows

πγ-typ(λ xf δ xg λ xg xh)
= π((f), ∅) * πγ-typ(δ xg λ xg xh)
= π((f), ∅) * π(∅, (g)) * πγ-typ(λ xg xh)
= π((f), ∅) * π(∅, (g)) * π((g), ∅) * πγ-typ(xh)
= π((f), ∅) * π(∅, (g)) * π((g), ∅) * γ(H)
= π((f), ∅) * π(∅, (g)) * γ((g) & H)
= π((f), ∅) * γ(H)
= γ((f) & H)
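The calculation can be mimicked in the same tuple encoding (a sketch; since the source's product clauses are not reproduced here, the two rules used below, cancelling a pending applicator type against the head of a γ-type and prefixing the remaining abstractor types, are reconstructed from this worked example only):

```python
def prod(p, h):
    """Product pi(F, G) * gamma(H), with the quasi-type pi(F, G) modelled
    as the pair (F, G) of tuples and gamma(H) as the tuple H."""
    lams, dels = p
    if h is None:
        return None
    # cancel pending applicator types against leading member types of gamma(H)
    while dels and h and dels[-1] == h[0]:
        dels, h = dels[:-1], h[1:]
    return lams + h if not dels else None

def pityp(t):
    """pi-gamma-typ, following clauses (4)(i)-(iii) of Definition 1.2.3."""
    if t[0] == 'var':
        return t[1]
    if t[0] == 'lam':
        return prod(((t[1],), ()), pityp(t[2]))        # pi((f), 0) * ...
    return prod(((), (pityp(t[1]),)), pityp(t[2]))     # pi(0, (pityp u)) * ...
```

For the example term λ xf δ xg λ xg xh, pityp returns (f,) + H, i.e. γ((f) & H), in agreement with the derivation above.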
and this result is indeed as expected: as mentioned earlier in comment (3) on Definition 1.2.2, γ((f) & H) is to be interpreted as the type of those terms which map terms of type f to terms of type γ(H), and clearly t is a term of that type. Also note that t β-reduces to the term λ xf xγ(H), which, as expected, also has type γ((f) & H). The systems λτV, λτγV and λτπγV are, though different in their respective descriptions, essentially equivalent in the sense that the expressive power of each of these systems is exactly the same. The reason for deviating from the notations and constructs employed in the original system λτV is that we eventually want to give a description of a typing mechanism for λτσ, a simple-typed version of the name-free system λσ. In λτσ we shall construct a completely new kind of types, called ρ-types, for segments. What will be shown is that the employment of π-types, γ-types and the *-operation provides for not only an exact but also a concise description of a typing mechanism for segments and segment variables written in name-free notation. We now proceed by defining a typed version λτξ of the name-free system λξ. Types in λτξ are elements of Tγ. In order to calculate a type of a name-free term in λτξ we introduce the concept of a ξ-context, denoted by τ, which is a sequence of elements of Tγ\{@}.
Definition 1.2.4 (λτξ).

(1) Terms (λτξ)
The set of terms λτξ is the smallest set X satisfying
(i) ξ(n) ∈ X, for every n ∈ ℕ ;
(ii) t ∈ X ⇒ λf t ∈ X, for every f ∈ Tγ\{@} ;
(iii) u, v ∈ X ⇒ δuv ∈ X .

(2) ξ-type contexts (τ)
A ξ-context τ is an element of (Tγ\{@})*. (Note that a type context τ is a function of the form τ : [1..length(τ)] → Tγ\{@}.)

(3) The typing function ξ-typ
Let τ be a ξ-type context. The function ξ-typ is defined inductively for λτξ-terms t by
(4) The set of correct terms
Let τ be a ξ-type context. The set of correct λτξ-terms with respect to τ is

{ t ∈ λτξ | ξ-typ(t, τ) ≠ @ } .

Remarks.
(1) In λτξ we just write λf, λg, λh, ... instead of λ xf, λ xg, λ xh, ... (the names of variables are dropped).
(2) The type of an occurrence of a variable ξ(n) in a λτξ-term t is found as follows. First we form the tree representation t̂ of t; then we descend from that occurrence of ξ(n) in t̂ towards the root of the tree, and the n-th lambda, say λf, is the lambda that binds this occurrence of ξ(n); the type f attached to this lambda is the type of ξ(n). (If the total number of λ's encountered on the root path of this occurrence of ξ(n) is less than n (implying that this occurrence of ξ(n) is free) then the type context will see to it that this occurrence of ξ(n) is suitably typed.)
(3) The correspondence between name-carrying terms in λτπγV and name-free terms in λτξ is as follows. If t is a λτπγV-term not containing free occurrences of variables then we have the following correspondence

πγ-typ(t) = ξ-typ(t̃, ∅)
where t̃ denotes the name-free equivalent of t. If t contains free occurrences of variables then we have the correspondence
where the ξ-context τ is such that it is of sufficient length and sees to it that all free occurrences of variables in t̃ are typed in the same way as they were typed in t. We now move on to the definition of the full λτσ-system by constructing types for segments. In order to do so we introduce a new kind of types, called ρ-types, for segments. A ρ-type has three parameters and is written as ρ(F, G, H), where F, G and H are sequences of γ- and, possibly, ρ-types. The extra parameter H has a purely administrative function; intuitively H is the sequence of all types attached to the λ's, including those hidden inside segment variables, lying on the main branch of the tree representation of the segment in question. The sequences F and G have the same meaning as before in the case
H . Balsters
of the quasi-type σ(F, G), namely the sequence of non-removable λ's and the sequence of non-removable δ's, respectively. We need such an extra parameter H in ρ(F, G, H) in order to determine the type of those variables which refer to a λ hidden in a segment variable, a situation which we now explain. Suppose that we have a λτσ-term t in which we have a segment sω(ψ) and an occurrence of a segment variable σ(n, m) which abbreviates sω(ψ) in t. From σ(n, m) we see that sω(ψ) has m (m ≥ 0) λ's lying on the main branch of its tree representation: these m λ's are hidden inside σ(n, m) and they can be referred to by variables in t occurring to the right of σ(n, m). In order to be able to type those variables which refer to one of the λ's hidden inside σ(n, m) we inspect the third parameter H of the type, say ρ(F, G, H), of sω(ψ). Suppose that the m λ's lying on the main branch of the tree representation of sω(ψ) occur in the order λ_h1, ..., λ_hm; then H shall be the sequence (hm) & (hm-1) & ... & (h1). If a variable in t refers to the i-th (0 ≤ i ≤ m) λ (from the right) hidden inside σ(n, m) then it will be typed by the i-th member h_i of H. Our definition of λτσ will also take into account the reallocational effects that segmaps ψ have on references to λ's lying on the main branch of the segments in question. We now give our definition, which at first sight might be a bit hard to understand. We shall give an example of calculating the type of a λτσ-term which should help clarify the rules stated in Definition 1.2.5. We note that the construct σ(F, G), given below, is the same construct σ(F, G) as in Definition 1.2.3: it is an intermediate construct used for evaluating the product of a number of types in order to evaluate the eventual type of a term (including segments), which is either a γ-type or a ρ-type (but never a σ-type).
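Before turning to the full definition, the variable-typing rule described in Remark (2) above can be read as a small lookup procedure. The following sketch is my own (the function name and encoding are not Balsters'): descending from an occurrence of ξ(n) to the root, one collects the types attached to the λ's passed; if fewer than n λ's are passed, the type context supplies the type.

```python
def type_of_occurrence(path_types, n, context):
    """Type of an occurrence of xi(n) in a name-free term.

    path_types: types attached to the lambdas passed when descending from
                the occurrence towards the root, innermost lambda first.
    context:    the type context tau, with context[0] playing tau(1).
    Returns None (playing the empty type) when the occurrence gets no type.
    """
    if n <= len(path_types):
        # bound occurrence: the n-th lambda on the root path binds xi(n)
        return path_types[n - 1]
    k = n - len(path_types)
    # free occurrence: the type context sees to it that xi(n) is typed
    return context[k - 1] if k <= len(context) else None

print(type_of_occurrence(["g", "f"], 2, []))      # bound by the outer lambda
print(type_of_occurrence(["g", "f"], 3, ["c1"]))  # free, typed by the context
```

For an occurrence lying below λ_g which in turn lies below λ_f, ξ(2) is bound by λ_f, while ξ(3) runs off the tree and is typed by the first member of the context.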
Definition 1.2.5 (λτσ).
(1) Types (T)
The set of types T is the smallest set X satisfying
(i) ∅ ∈ X;
(ii) ∀F ∈ (X\{∅})* : γ(F) ∈ X;
(iii) ∀F, G, H ∈ (X\{∅})* : ρ(F, G, H) ∈ X.
(Note that γ(∅) ∈ X and ρ(∅, ∅, ∅) ∈ X.)
(2) Quasi-types (Tq)
The set of quasi-types Tq is defined as

{σ(F, G) | F, G ∈ (T\{∅})*} .

(3) Products of quasi-types and types (*)
Let F, G, H, I and J be elements of (T\{∅})*. The product of a quasi-type and a type is defined as follows
σ(F, G) * γ(H) =
  γ(F & I) ,  if H = G & I for some I ∈ (T\{∅})* ;
  ∅ ,         otherwise .

σ(F, G) * ρ(H, I, J) =
  ρ(F & K, I, J) ,  if H = G & K for some K ∈ (T\{∅})* ;
  ρ(F, K & I, J) ,  if G = K & H for some K ∈ (T\{∅})* ;
  ∅ ,               otherwise .

(4) Terms (Λτσ)
The set of Λτσ-terms is the smallest set X satisfying
(i) ξ(n) ∈ X, for every n ∈ IN;
(ii) if ψ is a segmap then ω(ψ) ∈ X;
(iii) if u ∈ X and f ∈ T\{∅} then λ_f u ∈ X;
(iv) if u ∈ X then σ(n, m)u ∈ X, for every n ∈ IN and m ∈ IN ∪ {0};
(v) if u, v ∈ X then δuv ∈ X.
(5) Type contexts
A type context is an element of (T\{∅})* .
(6) The typing function (typ)
Let τ be a type context. The function typ is defined inductively for Λτσ-terms t by
(i) typ(ξ(n), τ) = τ(n), if n ∈ dom(τ) and τ(n) is a γ-type; ∅ otherwise.
(iii) typ(λ_f u, τ) = σ((f), ∅) * typ(u, (f) & τ);
(iv) typ(σ(n, m)u, τ) = σ(F, G) * typ(u, H & τ), if n ∈ dom(τ) and τ(n) is a ρ-type of the form ρ(F, G, H), where H is a sequence of length m; ∅ otherwise;
(v) typ(δuv, τ) = σ(∅, (typ(u, τ))) * typ(v, τ).
(7) The set of correct terms
Let τ be a type context. The set of correct Λτσ-terms with respect to τ is

{t ∈ Λτσ | typ(t, τ) ≠ ∅} .

We now give a further explanation of the rules stated in Definition 1.2.5, and we shall do so by means of a non-trivial example in which all of the features for calculating γ- and ρ-types are incorporated. In this example we shall employ the following notation conventions:
f1 * f2 * ... * fn-1 * fn = f1 * (f2 * ... * (fn-2 * (fn-1 * fn)) ...)   (association to the right)

(f1, f2, ..., fn) = (f1) & (f2) & ... & (fn) .
Consider the following term t, given in the original in tree form:

t = λ_f δ(λ_g δξ(2) λ_h λ_i δξ(1) ω(ψ))(λ_j σ(1,3) ξ(1)) ,

where f, g, h, i and j are certain elements of T\{∅} and ψ is a permutation of the interval [1..3] defined by ψ(1) = 2, ψ(2) = 3 and ψ(3) = 1. According to the rules given in Definition 1.2.5, the type of t with respect to the empty context ∅ is calculated, step by step, as follows:

typ(t, ∅) =
= σ((f), ∅) * typ(δ(λ_g δξ(2) λ_h λ_i δξ(1) ω(ψ))(λ_j σ(1,3) ξ(1)), (f)) =
= σ((f), ∅) * σ(∅, (typ(u, (f)))) * typ(λ_j σ(1,3) ξ(1), (f)) ,

where u is the segment λ_g δξ(2) λ_h λ_i δξ(1) ω(ψ). First we evaluate typ(u, (f)):

typ(u, (f)) =
= σ((g), ∅) * σ(∅, (typ(ξ(2), (g, f)))) * typ(λ_h λ_i δξ(1) ω(ψ), (g, f)) =
= σ((g), ∅) * σ(∅, (f)) * σ((h), ∅) * σ((i), ∅) * σ(∅, (i)) * ρ(∅, ∅, (h, g, i)) =

(note that the composition of the sequence (i, h, g, f) with the segmap ψ yields not only a permuted but also a reduced sequence (h, g, i) of (i, h, g, f))

= σ((g), ∅) * σ(∅, (f)) * σ((h), ∅) * σ((i), ∅) * ρ(∅, (i), (h, g, i)) =
= σ((g), ∅) * σ(∅, (f)) * σ((h), ∅) * ρ((i), (i), (h, g, i)) =
= σ((g), ∅) * σ(∅, (f)) * ρ((h, i), (i), (h, g, i)) =
= σ((g), ∅) * ρ((i), (i), (h, g, i)) =

(if f = h, otherwise the product is equal to ∅)

= ρ((g, i), (i), (h, g, i))
and this is indeed as expected: the segment u has two non-removable abstractors (λ_g and λ_i) lying on the main branch of its tree; it has one non-removable applicator with i as the type of its argument; and it has a total number of three abstractors lying on the main branch of its tree, which, due to the reallocational effect of the segmap ψ, are referred to in the order λ_h, λ_g and λ_i (from the right). Now that we have evaluated typ(u, (f)) we can proceed with calculating typ(t, ∅):
typ(t, ∅) =
= σ((f), ∅) * σ(∅, (ρ((g, i), (i), (h, g, i)))) * typ(λ_j σ(1,3) ξ(1), (f)) =
= σ((f), ∅) * σ(∅, (ρ((g, i), (i), (h, g, i)))) * σ((j), ∅) * typ(σ(1,3) ξ(1), (j, f)) =
= σ((f), ∅) * σ(∅, (ρ((g, i), (i), (h, g, i)))) * σ((j), ∅) * σ(F, G) * typ(ξ(1), (h1, h2, h3, j, f)) =

(where j = ρ(F, G, (h1, h2, h3)) for some F, G ∈ (T\{∅})* and h1, h2, h3 ∈ T\{∅} (cf. clause (6)(iv)), otherwise the product is equal to ∅)

= σ((f), ∅) * σ(∅, (ρ((g, i), (i), (h, g, i)))) * σ((j), ∅) * σ(F, G) * h1 =

(if h1 is a γ-type, otherwise the product is equal to ∅)

= σ((f), ∅) * σ(∅, (ρ((g, i), (i), (h, g, i)))) * σ((j), ∅) * γ(F & H1) =

(where h1 = γ(G & H1) for some H1 ∈ (T\{∅})* (cf. clause (3)), otherwise the product is equal to ∅)

= σ((f), ∅) * σ(∅, (ρ((g, i), (i), (h, g, i)))) * γ((j) & F & H1) =
= σ((f), ∅) * γ(F & H1) =

(if j = ρ(F, G, (h1, h2, h3)) = ρ((g, i), (i), (h, g, i)), i.e. if F = (g, i), G = (i), h1 = h, h2 = g, h3 = i, otherwise the product is equal to ∅)

= γ((f) & F & H1) =
= γ((f, g, i) & H1)   (by definition of j)

and this is indeed the expected result: t is a non-segment and therefore its type is a γ-type; if we assume that H = (i) & H1 = (i) & (h1, ..., hn) for certain h1, ..., hn ∈ T\{∅}, then the non-removable abstractors lying on the main branch of the tree representation of t occur in the order λ_f, λ_g, λ_i, λ_h1, ..., λ_hn, since the non-removable abstractors hidden in σ(1,3) are λ_g and λ_i, and the first type i in the sequence (i, h1, ..., hn) is removed because the type of the argument ξ(1) of the last applicator occurring in the segment
is equal to i (remember that the last variable ξ(1) occurring in t has type γ((i, h1, ..., hn)), which means that the first non-removable abstractor of the term that this occurrence of ξ(1) intends to abbreviate would be λ_i, and that this λ_i matches the δξ(1)-part in the segment u). Note also that t β-reduces to the following term
λ_f λ_g δξ(2) λ_h λ_i δξ(1) ξ(2) ,

where we have substituted the segment u for σ(1,3) (the reference number 1 in the last variable ξ(1) in t has been changed to 2 because of the reallocational
effect of the segmap ψ). This new term can be β-reduced once more, resulting in

λ_f λ_g λ_i δξ(1) ξ(3) ,
where we have substituted an updated version of the first occurrence of the variable ξ(2) for the second occurrence of ξ(2) (which was bound by the abstractor λ_h of the redex). The variable ξ(1) in this term has type i, and the variable ξ(3) has type f = h = γ((i, h1, ..., hn)); therefore the type of the whole term is equal to γ((f, g, i) & (h1, ..., hn)), which is the same type as we have calculated for t: an expected result. In general, one would expect the type of a term and its β-reduct to be the same. This property of equality of types for terms and their β-reducts with respect to a certain context is called the closure property. A proof of the closure property for λτσ is given in Chapter 4 of [Balsters 86]. We note that in Chapter 4 we shall also define the product of two quasi-types and furthermore show that this extended version of the *-operation is associative, i.e. (f * g) * h = f * (g * h) for all quasi-types f, g and all quasi-types and types h. Products of quasi-types and the associativity of the *-operation will prove to be useful for facilitating the calculations of types of λτσ-terms.
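The product rules of Definition 1.2.5(3) and the calculation of typ(u, (f)) above can be replayed mechanically. The following Python sketch uses an encoding of my own (tagged tuples for γ-, ρ- and σ-constructs, Python tuples for sequences, None for the empty type ∅, and H = G & K read as "G is an initial segment of H"); it illustrates the rules as reconstructed here, not Balsters' own implementation.

```python
def star(q, t):
    """Product sigma(F, G) * t of a quasi-type q and a type t.

    Encoding (mine, not Balsters'): gamma(F) = ("gamma", F),
    rho(F, G, H) = ("rho", F, G, H), sigma(F, G) = ("sigma", F, G).
    """
    if t is None:
        return None
    _, F, G = q
    if t[0] == "gamma":
        H = t[1]
        if H[:len(G)] == G:                     # H = G & I
            return ("gamma", F + H[len(G):])    # gamma(F & I)
        return None
    H, I, J = t[1], t[2], t[3]                  # t is a rho-type
    if H[:len(G)] == G:                         # H = G & K
        return ("rho", F + H[len(G):], I, J)    # rho(F & K, I, J)
    if len(G) >= len(H) and G[len(G) - len(H):] == H:   # G = K & H
        return ("rho", F, G[:len(G) - len(H)] + I, J)   # rho(F, K & I, J)
    return None

# Replaying typ(u, (f)) for the example segment, with f = h:
i = ("gamma", ())
g = ("gamma", (i,))
f = h = ("gamma", (i, i))

ty = ("rho", (), (), (h, g, i))       # typ(omega(psi), (i, h, g, f))
ty = star(("sigma", (), (i,)), ty)    # delta xi(1)
ty = star(("sigma", (i,), ()), ty)    # lambda_i
ty = star(("sigma", (h,), ()), ty)    # lambda_h
ty = star(("sigma", (), (f,)), ty)    # delta xi(2); needs f = h
ty = star(("sigma", (g,), ()), ty)    # lambda_g
print(ty == ("rho", (g, i), (i,), (h, g, i)))   # True
```

With f ≠ h the same chain yields None, matching the side condition "if f = h, otherwise the product is equal to ∅" in the example.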
[In Chapter 3 of [Balsters 86] a proof of the Church-Rosser property, regarding β-reduction, is given for the system λτσ.]
PART C Theory
A Normal Form Theorem in a λ-Calculus with Types
L.S. van Benthem Jutting
It has long been conjectured that every expression in Automath has a normal form. An unpublished proof of this has been given by L.E. Fleischhacker. Here a proof is presented that in a λ-calculus closely resembling Automath every correct expression has a normal form. The proof proceeds along the lines pointed out by Fleischhacker and uses a norm which is due to Nederpelt. The importance of this theorem is that it makes it possible for us to decide whether two expressions are "equal". In fact, together with the Church-Rosser theorem (see [Curry and Feys 58]) we may deduce that two expressions are "equal" iff they have the same normal form. This helps in proving that correctness of Automath expressions is decidable.
1. DEFINITION OF THE LANGUAGE

We will give here only a very loose definition. A strict definition may be found in [de Bruijn 70a (A.2)]. We discern constants a, b, c, ..., variables x, y, z, ..., the symbol type and various brackets as primitive symbols. For the sake of clarity we will use below also other constants like 1, s and N. Expressions are defined by:
a constant, a variable, the symbol type are expressions; if A and B are expressions, then (A) B and [x : A] B are expressions.

Intuitively expressions may be thought of as denoting objects: (A) B denotes the value of the function B for the argument A; [x : A] B denotes the function associating to every x in the domain A the value B (which may depend upon x). We will call x bound in [x : A] B. We shall discern 3-expressions, 2-expressions and 1-expressions. Intuitively 3-expressions denote "mathematical objects", e.g. the natural number one is denoted by the expression 1, the successor function in the natural numbers may be denoted by s, and the natural number two, being the successor of one, is then denoted by (1) s.
2-expressions denote "classes" to which mathematical objects belong, e.g. the set of natural numbers, denoted by N, or the set of all functions mapping N into N, denoted by [x : N] N. 1-expressions denote "superclasses" to which classes belong, e.g. the superclass of all classes, denoted by type, or the superclass of the classes of mappings of N into some other class, denoted by [x : N] type. Syntactically 1-expressions are those expressions which have type as their last symbol. Now every mathematical object belongs, in our conception, to exactly one class and every class to exactly one superclass. This induces a function γ, called type, mapping 3-expressions into 2-expressions and 2-expressions into 1-expressions. E.g. γ(1) = N, γ(s) = [x : N] N, γ(N) = type etc. It follows that we must discern between the natural number one, with γ(1) ≡ N, and the real number one, denoted by 1*, with γ(1*) = R. It will be clear now that an expression A is either a 1-expression, or A is a 2-expression and then γ(A) is a 1-expression, or A is a 3-expression and then γ(A) is a 2-expression and γ(γ(A)) a 1-expression. The type γ must be thought of as defined on a finite number of constants. It may be extended to a new constant a by defining γ(a) as a certain 2-expression or 1-expression which contains only constants defined before a. In this case a must be thought of as denoting a definite object of the class or superclass denoted by γ(a). We will say that a is a defined constant. On bound variables the type γ is defined, too: in [x : A] B the variable x, which might occur free in B, has type γ(x) ≡ A. Hence A must be a 2-expression or a 1-expression (otherwise the expression [x : A] B is incorrect). On composite expressions γ may be defined recursively. We now give a notation for substitution: the result of substituting the expression A for the variable x in the expression B is denoted by B[x := A]. A definition of substitution we will omit here.
The intuitive meaning of (A) B and [x : A] B leads us to a definition of reduction as follows:
(α) [x : A] B → [y : A] (B[x := y]) if y is not free in B.
(β) (A) [x : B] C → C[x := A].
(η) [x : A] (x) B → B if x is not free in B.
Intuitively the expressions to the right and to the left of → denote the same objects. We extend the relation → to a monotone quasi-order on all expressions, i.e. if A → C and B → D, then (A) B → (C) D and [x : A] B → [x : C] D. Now there are rules according to which it may be decided whether an expression is correct. One of these was mentioned above: in [x : A] B, A should be either a 2-expression or a 1-expression. The main ideas are:
A normal form theorem in a λ-calculus with types (C.1)
(a) A correct expression does not contain free variables or undefined constants.
(b) (A) B is only correct if B denotes a function and A belongs to the domain of that function (i.e. A is not a 1-expression and γ(A) is the domain of B).
(c) The constants should be defined in due order, and for every defined constant a, γ(a) should be correct with respect to the constants defined before.
2. THE NORMAL FORM THEOREM
We say that A is in normal form (in n.f.) if neither A nor any subexpression of A is β- or η-reducible. It follows that if A is in normal form, then

A ≡ [x1 : B1] [x2 : B2] ... [xn : Bn] (D1) ... (Dm) p ,

where n, m are non-negative integers, p denotes a constant, a variable or the symbol type, and B1, ..., Bn, D1, ..., Dm are in n.f. We say that A has a normal form if B in n.f. exists such that A →→ B.

We now introduce the norm τ on expressions as follows:

τ(type) = type ;
τ(a) = τ(γ(a)) for all defined constants a ;
τ(b) = 0 for all undefined constants b ;
τ(x) = τ(A) if x is bound by [x : A] ;
τ(y) = 0 if y is free ;
τ((A) B) = P if τ(A) ≠ 0 and τ(B) = [τ(A)] P for a certain symbol string P, and τ((A) B) = 0 otherwise ;
τ([x : A] B) = [τ(A)] τ(B) if τ(A) ≠ 0 and τ(B) ≠ 0, and τ([x : A] B) = 0 otherwise.
A strong point of this norm is that it is invariant under substitution and reduction:

Theorem 1. If τ(B) ≠ 0 and τ(A) = τ(x) ≠ 0, then τ(B[x := A]) = τ(B).

The proof is easy when substitution is well defined. □
Theorem 2. If τ(A) ≠ 0 and A → B, then τ(B) = τ(A).

We will prove this for β-reduction: Suppose A = (C) [x : D] E and B = E[x := C]. As τ(A) ≠ 0 we know that τ(C) ≠ 0 and τ([x : D] E) = [τ(C)] τ(A). Hence τ([x : D] E) ≠ 0 and it follows that τ([x : D] E) = [τ(D)] τ(E). It follows that τ(D) = τ(C) and τ(E) = τ(A). Moreover, τ(x) = τ(D) because x is bound by [x : D], hence τ(x) = τ(D) = τ(C). Therefore, by Theorem 1, τ(B) = τ(E[x := C]) = τ(E) = τ(A). □

The next theorem is the crucial part in our proof.
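The invariance stated in Theorem 2 can be checked on a small instance. The following sketch is my own encoding (the paper defines τ only on abstract symbol strings; here ("pi", P, Q) plays the role of the string [P]Q, 0 is the failure value, and the constant 1 is spelled "one"):

```python
def norm(e, gamma, env):
    """Nederpelt's norm tau on a toy encoding of expressions.

    e: "type", ("const", c), ("var", x), ("app", A, B) for (A)B,
       or ("abs", x, A, B) for [x:A]B.
    gamma: maps defined constants to their types (expressions).
    env: maps bound variables to the norm of their binding type.
    """
    if e == "type":
        return "type"
    if e[0] == "const":
        return norm(gamma[e[1]], gamma, env) if e[1] in gamma else 0
    if e[0] == "var":
        return env.get(e[1], 0)
    if e[0] == "app":                          # tau((A)B)
        ta = norm(e[1], gamma, env)
        tb = norm(e[2], gamma, env)
        if ta != 0 and tb != 0 and tb != "type" and tb[0] == "pi" and tb[1] == ta:
            return tb[2]
        return 0
    ta = norm(e[2], gamma, env)                # tau([x:A]B)
    tb = norm(e[3], gamma, env | {e[1]: ta})
    return ("pi", ta, tb) if ta != 0 and tb != 0 else 0

# Theorem 2 on the redex (1)[x:N](x)s --> (1)s, with gamma(1) = N,
# gamma(s) = [x:N]N and gamma(N) = type:
gamma = {"N": "type",
         "one": ("const", "N"),
         "s": ("abs", "x", ("const", "N"), ("const", "N"))}
A = ("app", ("const", "one"),
     ("abs", "x", ("const", "N"), ("app", ("var", "x"), ("const", "s"))))
B = ("app", ("const", "one"), ("const", "s"))
print(norm(A, gamma, {}) == norm(B, gamma, {}) == "type")  # True
```

Both the redex and its contractum get the norm type, as Theorem 2 predicts.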
Theorem 3. If A is in n.f. with τ(A) = τ(x) ≠ 0 and B is in n.f. with τ(B) ≠ 0, then C in n.f. exists such that B[x := A] →→ C.

The proof is complicated and proceeds by double induction:
(I) with respect to the length of τ(A),
(II) with respect to the length of B.
The difficulty lies in the case when B = (D) x, because then, by substituting A for x in B, an expression is obtained which is in general not in normal form. □

The next theorem is an easy consequence of Theorem 3.
Theorem 4. If τ(A) ≠ 0, then A has a normal form. □

We now state

Theorem 5. If A is correct, then τ(A) ≠ 0. □

From Theorems 4 and 5 now follows

Theorem 6. If A is correct, then A has a normal form. □
Lambda Calculus Notation with Nameless Dummies, a Tool for Automatic Formula Manipulation, with Application to the Church-Rosser Theorem*
N.G. de Bruijn
ABSTRACT
In ordinary lambda calculus the occurrences of a bound variable are made recognizable by the use of one and the same (otherwise irrelevant) name at all occurrences. This convention is known to cause considerable trouble in cases of substitution. In the present paper a different notational system is developed, where occurrences of variables are indicated by integers giving the "distance" to the binding λ instead of a name attached to that λ. The system is claimed to be efficient for automatic formula manipulation as well as for metalingual discussion. As an example the most essential part of a proof of the Church-Rosser theorem is presented in this namefree calculus.
1. INTRODUCTION
For what lambda calculus is about, we refer to [Barendregt 71], [Church 41] or [Curry and Feys 58], although no specific knowledge will be required for the reading of the present paper. Manipulations in the lambda calculus are often troublesome because of the need for re-naming bound variables. For example, if a free variable in an expression has to be replaced by a second expression, the danger arises that some free variable of the second expression bears the same name as a bound variable in the first one, with the effect that binding is introduced where it is not intended. Another case of re-naming arises if we want to establish the equivalence of two

*Reprinted from: Indagationes Math. 34, 5, p. 381-392, by courtesy of the Koninklijke Nederlandse Akademie van Wetenschappen, Amsterdam.
expressions in those situations where the only difference lies in the names of the bound variables (i.e. when the equivalence is so-called α-equivalence). In particular in machine-manipulated lambda calculus this re-naming activity involves a great deal of labour, both in machine time and in programming effort. It seems to be worth-while to try to get rid of the re-naming, or, rather, to get rid of names altogether. Consider the following three criteria for a good notation:
(i) easy to write and easy to read for the human reader;
(ii) easy to handle in metalingual discussion;
(iii) easy for the computer and for the computer programmer.
The system we shall develop here is claimed to be good for (ii) and good for (iii). It is not claimed to be very good for (i); this means that for computer work we shall want automatic translation from one of the usual systems to our present system at the input stage, and backwards at the output stage. An example showing that our method is adequate for (ii) can be found in Sections 10-12, which present the kernel of a proof for the Church-Rosser theorem. This proof is essentially the one that was given in [Barendregt 71], where it was attributed to P. Martin-Löf (1971). Later private information by Mr. H.P. Barendregt disclosed that the idea is due to W.W. Tait. For a survey of proofs of the Church-Rosser theorem see [Barendregt 71] p. 16-17. An elaborate treatment of the theorem can also be found in [Curry and Feys 58]. What is said about lambda calculus in this paper can be applied directly to other kinds of dummy-binding in mathematics. For example, if we have an expression like the product Π_{k=p}^{q} f(k, m), we can write it as Π(p, q, λk f(k, m)). For any new quantifier we wish to use (like Π here) we have to take a particular symbol that is treated as an element of the alphabet of constants (see Section 3). Application to Automath is explained in Section 13.
2. NOTATION IN METALINGUAL DISCUSSION
If we want to denote a string of symbols by a single ("metalingual") symbol, we have to be very careful, in particular if this procedure is repeated, e.g. if we form mixed strings of lingual and metalingual symbols, represent these by a new symbol, etc. We shall use parentheses ( ) for this purpose. If Φ denotes a string, then Φ is not the string itself. For the string itself we shall use (Φ). We shall say that Φ denotes the string and that (Φ) is the string. Let us consider some
Lambda calculus notation with nameless dummies (C.2)
examples, where the basic lingual symbols are all Latin letters as well as the hyphen. (These examples will show the use of ( ) in nested form, and therefore show that the simple device of using Greek letters on the metalingual level is definitely insufficient.) We shall use the symbol ρ for reversing the order of a string. That is, ρ(pqra) denotes arqp, whence (ρ(pqra)) = arqp. Now let Φ denote the word phi, and let Σ denote the word sigma. Then (Φ)(Σ) is the word phisigma, (Φ)-(Σ) = phi-sigma, (ρ((Σ))) = amgis, and (ρ((ρ((Φ)))(ρ((Σ))))) =
= (ρ(ihpamgis)) = sigmaphi = (Σ)(Φ). The ( )'s of this section are not to be confused with the similar symbols we use in Backus' normal form of a syntax (e.g. in Section 5). In typescript and in handwriting it is convenient to underline a formula instead of putting it in ( )'s. In print, however, underlining, and in particular multi-level underlining, is awkward.
3. NAME-CARRYING EXPRESSIONS
We explain the kind of lambda calculus expressions which we want to turn into namefree expressions. We have a set of "constants" (a, b, c, f, g, ...) and a set of "variables" (s, t, u, v, w, x, ...). And there is the symbol λ that can have any variable as a suffix. Moreover we admit application, of which the following is the interpretation. If Φ denotes a function, and Γ a value of the variable, then (Φ)((Γ)) is the value of the function at (Γ). We shall use a different notation instead: we add a symbol A to the list of constants, and we write A((Φ), (Γ)) instead of (Φ)((Γ)). This puts it on a par with another kind of expression we are going to admit, viz. things like f( , , ), where f is any constant. In the interpretations the latter kind of expression can be very close to what we have just called application, but that does not bother us at the moment. We shall not go into a formal definition of the syntax; the following example (that accidentally does not contain the symbol A at all) will be clear enough. We take the expression

λx a(λt b(x, t, f(λu a(u, t, z), λs w)), w, y) .   (3.1)
4. GETTING RID OF THE NAMES OF VARIABLES
In order to facilitate the discussion, we represent the expression as a planar tree which is easier to read than (3.1) itself.
If in (3.1) we change the names of the bound variables, e.g. x, t, u, s into p, u, s, x, we get an expression that is what is usually called α-equivalent to (3.1):

λp a(λu b(p, u, f(λs a(s, u, z), λx w)), w, y) .
We shall take the simplistic point of view that α-equivalent expressions are the same. Formula (3.1) contains bound variables x, t, u, s and free variables z, w, y. We shall keep a list of letters from which the free variables are to be taken. Let that list be, in this order, z, v, w, y; we draw the points λz, λv, λw, λy under the tree. The variables in the tree (Figure 1) are encircled (unless they occur as a suffix of λ).
Figure 1 (the tree representation of (3.1); each encircled variable carries its reference depth and level, e.g. 1,3 and 2,3 and 7,3 at the occurrences of u, t and z in a(u, t, z)).
For every encircled letter we evaluate two integers which are indicated in the figure, viz. the reference depth and the level. The reference depth of an encircled letter at a certain spot, x say, is the number of λ's we encounter when running down until we meet λx (this λx is counted as one of the encountered λ's). It is agreed that the λz, λv, λw, λy (which do not belong to the tree itself) can also be encountered on our way down, e.g. if we run down to λz we encounter λy, λw, λv, λz.
The level of an encircled variable at a certain spot counts the total number of λ's we encounter when running down the tree until we get to the root (if the root is a λ, like λx here, this one is also included in the count; the loose λz, λv, λw, λy are not counted this time). Let us now erase the variables and the integers indicating the levels; we keep the reference depth. No information is lost: the erased letters and numbers can be easily reconstructed. If we are not interested in the names of the bound variables (and honestly we should not be) we can erase the suffix in λx, λt, λu, λs. In those cases where we are interested in the names of the free variables we have to keep the ordered list z, v, w, y in order to be able to reconstruct our expression. Note that a point of the tree refers to a free variable if and only if the reference depth exceeds the level. Thus the information contained in our name-carrying expression can be presented as

λ a(λ b(2, 1, f(λ a(1, 2, 7), λ 5)), 3, 2)   (4.1)
with the free variable list z, v, w, y. This expression is called namefree. Note that (3.1) can be represented differently if we take a different free variable list. Any sequence of distinct variables may serve as a free variable list provided it contains z, w, y in any order. Conversely, every namefree expression can be decoded into a name-carrying one if we provide a free variable list that is long enough. This determines the name-carrying expression up to name-changing of the bound variables. Instead of providing a finite free variable list we can take an infinite one (with the effect that we need not bother whether the list is long enough). The reference depths refer to a count in the reference list from right to left, corresponding to the fact that λ's are written in front of the formula they act on. Therefore, such infinite variable lists have to be written as ..., x3, x2, x1 instead of the usual left-to-right notation of an infinite sequence.
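The translation just described can be mechanized. The following Python sketch uses an encoding of my own (not the paper's): it walks the tree keeping the list of λ's passed, innermost first, and counts into the reversed free variable list when a variable is free.

```python
def namefree(t, binders, free_list):
    """Convert a name-carrying expression to name-free (NF) form.

    t: ("var", x), ("lam", x, body), or ("fun", f, [args]) for f(a1, ..., ak).
    binders: names of the lambdas passed so far, innermost first.
    free_list: the free variable list in left-to-right order (e.g. z, v, w, y);
    it is consulted right to left, as described above.
    """
    if t[0] == "var":
        x = t[1]
        if x in binders:
            return binders.index(x) + 1          # reference depth
        revlist = list(reversed(free_list))
        return len(binders) + revlist.index(x) + 1
    if t[0] == "lam":
        return ("lam", namefree(t[2], [t[1]] + binders, free_list))
    return ("fun", t[1], [namefree(a, binders, free_list) for a in t[2]])

# The example expression lambda_x a(lambda_t b(x, t, f(lambda_u a(u, t, z),
# lambda_s w)), w, y):
expr = ("lam", "x", ("fun", "a", [
    ("lam", "t", ("fun", "b", [
        ("var", "x"), ("var", "t"),
        ("fun", "f", [("lam", "u", ("fun", "a",
                       [("var", "u"), ("var", "t"), ("var", "z")])),
                      ("lam", "s", ("var", "w"))])])),
    ("var", "w"), ("var", "y")]))

print(namefree(expr, [], ["z", "v", "w", "y"]))
```

In this encoding the result reads λ a(λ b(2, 1, f(λ a(1, 2, 7), λ 5)), 3, 2), the name-free form computed above.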
5. THE SYNTAX FOR NAMEFREE EXPRESSIONS
We present the syntax in Backus' normal form:

<constant> ::= a | b | c | d | ...
<NF expression> ::= <positive integer> | <constant> | <constant> ( <NF expression string> ) | λ <NF expression>
<NF expression string> ::= <NF expression> | <NF expression string> , <NF expression>
In the next sections we shall use, in an informal way, the notion "level" of an integer in an NF expression, in the sense of Section 4. (The "reference depth" of an integer is, of course, the integer itself.)
6. SUBSTITUTION
We shall define the effect of a substitution of a sequence of NF expressions into a single NF expression denoted by Ω. What we intend to describe is the following. Let ..., Σ3, Σ2, Σ1 denote the sequence (in right-to-left notation). (In practice only finitely many Σk's are relevant, whence we need not always give the full infinite sequence.) We attach a free variable list ..., x3, x2, x1 to Ω, and one and the same free variable list ..., y3, y2, y1 to every Σi. That determines name-carrying expressions to be denoted by Ω* and Σi*. Now replace any free xi in (Ω*) by the corresponding (Σi*). Thus we get an expression, to be denoted by Γ*, with possible free variables ..., y3, y2, y1. With respect to this free variable list ..., y3, y2, y1 this Γ* corresponds to the NF expression S(..., (Σ3), (Σ2), (Σ1); (Ω)) we shall define presently. The definition will be recursive with respect to the structure of (Ω); Ω may denote either an NF expression string or an NF expression. We follow the syntactic classification of Section 5.

(i) If (Ω) = (Ω1), (Ω2) then

(S(..., (Σ2), (Σ1); (Ω))) = (Γ1), (Γ2) ,

where Γi denotes (S(..., (Σ2), (Σ1); (Ωi))).

(ii) If (Ω) is a constant then

(S(..., (Σ2), (Σ1); (Ω))) = (Ω) .

(iii) If (Ω) = (y)((Ω1)) (where y denotes a constant and Ω1 an NF expression string) then

(S(..., (Σ2), (Σ1); (Ω))) = (y)((S(..., (Σ2), (Σ1); (Ω1)))) .

(iv) If (Ω) is the positive integer k then

(S(..., (Σ2), (Σ1); (Ω))) = (Σk) .

(v) If (Ω) = λ(Γ) then

(S(..., (Σ2), (Σ1); (Ω))) = λ(S(..., (Δ2), (Δ1), 1; (Γ))) ,

where Δi denotes (S(..., 4, 3, 2; (Σi))).   (6.2)

Note that (Δi) is obtained from (Σi) by adding 1 to every integer in (Σi) that refers to a free variable.
7. THE OPERATORS τh, AND A GLOBAL DESCRIPTION OF SUBSTITUTION

It will be convenient to use the separate notation τh((Σ)) in order to abbreviate

S(..., h+3, h+2, h+1; (Σ)) .

It means adding h (which is a positive integer) to every integer in (Σ) that refers to a free variable. The special case τ1((Σi)) occurs in (6.2). With the aid of this notation we can give a more global description of how (S(..., (Σ3), (Σ2), (Σ1); (Ω))) is obtained: start from Ω, and in each case where an integer t in (Ω) exceeds its level l, we replace that t by (τl((Σt-l))). In automatic formula manipulation it may be a good strategy to refrain from evaluating such τl((Σ))'s, but just to store them as pairs l, (Σ), and go into (full or partial) evaluation only if necessary. The following formulas may come in handy:

τk τl = τk+l ,
(τ0((Σ))) = (Σ) ,
(S(..., (Σ3), (Σ2), (Σ1); (τk((Ω))))) = (S(..., (Σk+3), (Σk+2), (Σk+1); (Ω))) .
The latter formula is a special case of the following result on composite substitution: If

(Ω) = (S(..., (Λ2), (Λ1); (Δ)))

then

S(..., (Σ2), (Σ1); (Ω)) = S(..., (Γ2), (Γ1); (Δ)) ,

where

(Γi) = (S(..., (Σ2), (Σ1); (Λi)))   (i = 1, 2, ...) .
8. BETA REDUCTION

If we have an applicational expression A((Φ), (Γ)) (cf. Section 3), then the interpretation is that (Φ) is a function, (Γ) a value of the variable, and A((Φ), (Γ)) is intended to represent the value of the function (Φ) at the point (Γ). If (Φ) happens to have the form λ(Ω), then the function value can actually be evaluated. Roughly speaking, it comes down to substituting (Γ) in (Ω) for all occurrences of the bound variable corresponding to the λ in front of (Ω). A precise definition in terms of NF expressions is easy to give: If Ω and Γ denote NF expressions, then A(λ(Ω), (Γ)) is an NF expression to which beta reduction can be applied. The effect of the beta reduction is the NF expression

(S(..., 3, 2, 1, (Γ); (Ω))) .   (8.1)

The usual beta reduction for name-carrying expressions is obtained if we use one and the same free variable list for all four expressions λ(Ω), (Γ), A(λ(Ω), (Γ)) and (8.1).
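Sections 6-8 admit a compact executable sketch. The encoding below is mine (not the paper's): NF expressions are positive integers, ("fun", c, [args]) for c(...), and ("lam", body); the infinite sequence ..., Σ2, Σ1 is modelled by a Python function sigma with sigma(k) = Σk.

```python
def subst(sigma, e):
    """The operator S(..., (Sigma2), (Sigma1); (e)) of Section 6."""
    if isinstance(e, int):                       # case (iv): integer k
        return sigma(e)
    if e[0] == "fun":                            # cases (ii) and (iii)
        return ("fun", e[1], [subst(sigma, a) for a in e[2]])
    # case (v): e = lam(body); 1 stays, k+1 refers to tau_1(Sigma_k)
    return ("lam", subst(lambda k: 1 if k == 1 else tau(1, sigma(k - 1)),
                         e[1]))

def tau(h, e):
    """The operator tau_h of Section 7: S(..., h+3, h+2, h+1; (e))."""
    return subst(lambda k: k + h, e)

def beta(body, arg):
    """Beta reduction (8.1) of A(lam(body), arg):
    S(..., 3, 2, 1, (arg); (body))."""
    return subst(lambda k: arg if k == 1 else k - 1, body)

# arg replaces the references to the erased lambda; others shift down:
print(beta(("fun", "f", [1, 2]), ("fun", "c", [])))
# tau_k tau_l = tau_{k+l}:
e = ("lam", ("fun", "f", [1, 2]))
print(tau(2, tau(3, e)) == tau(5, e))   # True
```

Note how the lam case of subst mirrors rule (v): the new sequence sends 1 to 1 and k+1 to τ1(Σk), exactly the (Δ2), (Δ1), 1 of (6.2).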
9. ETA REDUCTION

In terms of name-carrying expressions, η-reduction means the following. If Σ denotes a name-carrying expression that does not contain the variable x, then λx (Σ)(x) (or in our notation λx A((Σ), x)) has the same mathematical interpretation as (Σ) itself. The transfer from λx A((Σ), x) to (Σ) is called η-reduction. We shall define it for NF expressions: For any NF expression (Λ) we define as η-reduction the transition of

λ A((τ1((Λ))), 1) into (Λ) .   (9.1)

If we transform both expressions of (9.1) into name-carrying expressions by means of one and the same free variable list, the transition (9.1) becomes the η-reduction for name-carrying expressions.
10. MULTIPLE BETA REDUCTION

In Section 8 we considered beta reduction of an NF expression. It was reduction of the full expression and not the beta reduction of a subexpression (local beta reduction) which we shall consider presently. In order to be able to indicate where the β-reduction has to be carried out, we introduce a set of constants (applicational symbols) to be used instead of the single symbol A. By the same device we get the possibility of multiple local beta reduction: we indicate a subset of the set of applicational symbols and we carry out beta reduction for all symbols of that subset. Let U be a subset of the set of constants. An NF expression (Σ) is called U-correct if every element of U that occurs in (Σ) is always followed by a string in parentheses with the form (λ(Ω), (Γ)). In other words, each occurrence of each element of U is ready for local beta reduction. To be more precise, one can indicate how the syntax of Section 5 is to be changed in order to get the syntax of the U-correct NF expressions.
We shall now define the operator β_U on the set of U-correct NF expressions recursively:

(i) If (Σ) is a single constant or a positive integer, then β_U((Σ)) = (Σ).

(ii) If (Σ) = (γ)((Σ₁), ..., (Σₖ)), where (γ) is a constant not in U, then β_U((Σ)) = (γ)((β_U((Σ₁))), ..., (β_U((Σₖ)))).

(iii) If (Σ) = λ(Σ₁) then β_U((Σ)) = λ(β_U((Σ₁))).

(iv) If (Σ) = (γ)(λ(Ω), (Γ)) and (γ) ∈ U then (cf. (8.1)) β_U((Σ)) = S(..., 3, 2, 1, (β_U((Γ))); (β_U((Ω)))).
N.G. de Bruijn
Needless to say, the effect of β_U on an expression string (Σ₁), ..., (Σₖ) is to be defined by (β_U((Σ₁))), ..., (β_U((Σₖ))).
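A sketch of β_U in code (the encoding is mine: application nodes ("app", gamma, f, a) carry the name gamma of their applicational constant, U is a set of such names, and references and λ are ("var", n) and ("lam", body) as before):

```python
# Sketch of the operator beta_U of Section 10 on U-correct expressions.
# Encoding (mine): ("var", n), ("lam", body), ("app", gamma, f, a),
# where gamma is the name of the applicational constant at this node.

def shift(t, d, cutoff=0):
    if t[0] == "var":
        return ("var", t[1] + d) if t[1] > cutoff else t
    if t[0] == "lam":
        return ("lam", shift(t[1], d, cutoff + 1))
    return ("app", t[1], shift(t[2], d, cutoff), shift(t[3], d, cutoff))

def subst(t, s, j=1):
    if t[0] == "var":
        if t[1] == j:
            return shift(s, j - 1)
        return ("var", t[1] - 1) if t[1] > j else t
    if t[0] == "lam":
        return ("lam", subst(t[1], s, j + 1))
    return ("app", t[1], subst(t[2], s, j), subst(t[3], s, j))

def beta_U(t, U):
    if t[0] == "var":                                    # case (i)
        return t
    if t[0] == "lam":                                    # case (iii)
        return ("lam", beta_U(t[1], U))
    gamma, f, a = t[1], t[2], t[3]
    if gamma in U:                                       # case (iv)
        assert f[0] == "lam"                             # guaranteed by U-correctness
        return subst(beta_U(f[1], U), beta_U(a, U))
    return ("app", gamma, beta_U(f, U), beta_U(a, U))    # case (ii)

# Two independently labelled redexes; beta_{u} and beta_{v} commute.
t = ("app", "u", ("lam", ("var", 1)),
     ("app", "v", ("lam", ("var", 1)), ("var", 2)))
assert beta_U(beta_U(t, {"u"}), {"v"}) == beta_U(beta_U(t, {"v"}), {"u"}) == ("var", 2)
```

The final assertion previews Theorem 11.2: reductions for different label sets commute.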
11. THEOREMS ON MULTIPLE BETA REDUCTION
Theorem 11.1. If (Ω), (Σ₁), (Σ₂), ... are U-correct, then

(β_U((S(..., (Σ₂), (Σ₁); (Ω))))) = (S(..., (β_U((Σ₂))), (β_U((Σ₁))); (β_U((Ω))))).
Proof. For easier reading we shall drop the signs ) and ( throughout this proof. The proof has to be read twice. The first time we deal with the proof of

β_U S(..., Σ₂, Σ₁; Ω) = S(..., β_U Σ₂, β_U Σ₁; β_U Ω)   (11.1)

in the case that the Σᵢ are integers. (This case is intuitively clear, but it takes little extra trouble to derive it formally.) In the second reading the result of the first reading can be used. We apply induction with respect to the structure of Ω, using the definition of substitution as given in Section 6. (Note that in the first reading the induction hypothesis is used only for cases belonging to the first reading.) The cases (i), (ii), (iv) of Section 6 are very simple, and so is case (iii) if the constant γ is not in U. We concentrate on the two remaining cases, viz. Ω = λΓ and Ω = γ(λΛ, Γ) with γ ∈ U.

If Ω = λΓ we apply (v) of Section 6 twice:

β_U S(..., Σ₂, Σ₁; Ω) = β_U λ S(..., Δ₂, Δ₁, 1; Γ)   (11.2)

S(..., β_U Σ₂, β_U Σ₁; β_U Ω) = λ S(..., Δ₂*, Δ₁*, 1; β_U Γ)   (11.3)

where Δᵢ is given by (6.2), and Δᵢ* = S(..., 3, 2; β_U Σᵢ). By Section 10 (iii) and by the induction hypothesis, the right-hand side of (11.2) equals

λ β_U S(..., Δ₂, Δ₁, 1; Γ) = λ S(..., β_U Δ₂, β_U Δ₁, 1; β_U Γ).   (11.4)

In the first reading of the proof the Σᵢ and Δᵢ are integers, whence β_U Σᵢ = Σᵢ, and therefore Δᵢ* = Δᵢ = β_U Δᵢ. So the right-hand sides of (11.3) and (11.4) are equal, hence the left-hand sides of (11.2) and (11.3) are equal. In the second reading of the proof we may use the theorem for the case that the Σᵢ are integers; hence

β_U Δᵢ = β_U S(..., 3, 2; Σᵢ) = S(..., 3, 2; β_U Σᵢ) = Δᵢ*,
and the right-hand sides of (11.2) and (11.3) are equal.

The second case we have to deal with is Ω = γ(λΛ, Γ) with γ ∈ U. We have to show

β_U S(..., Σ₂, Σ₁; γ(λΛ, Γ)) = S(..., β_U Σ₂, β_U Σ₁; β_U γ(λΛ, Γ)).   (11.5)
The right-hand side equals, by Section 10 (iv),

S(..., β_U Σ₂, β_U Σ₁; S(..., 3, 2, 1, β_U Γ; β_U Λ)).

By the formulas on composite substitution (Section 7) this is

S(..., β_U Σ₂, β_U Σ₁, S(..., β_U Σ₂, β_U Σ₁; β_U Γ); β_U Λ).   (11.6)

The left-hand side of (11.5) equals, according to 6 (iii),

β_U γ(S(..., Σ₂, Σ₁; λΛ), S(..., Σ₂, Σ₁; Γ)).   (11.7)
By 6 (v) we have

S(..., Σ₂, Σ₁; λΛ) = λΦ, where Φ = S(..., Δ₂, Δ₁, 1; Λ), Δᵢ = S(..., 3, 2; Σᵢ).
Applying 10 (iv) we can write for (11.7)

S(..., 2, 1, β_U S(..., Σ₂, Σ₁; Γ); β_U Φ).   (11.8)

By the induction hypothesis we have

β_U Φ = S(..., β_U Δ₂, β_U Δ₁, 1; β_U Λ),

and so we can apply the formula for composite substitution (Section 7) to (11.8); it becomes

S(..., Π₂, Π₁; β_U Λ)   (11.9)

where

Π₁ = S(..., 2, 1, β_U S(..., Σ₂, Σ₁; Γ); 1) = β_U S(..., Σ₂, Σ₁; Γ),

Πᵢ₊₁ = S(..., 2, 1, β_U S(..., Σ₂, Σ₁; Γ); β_U Δᵢ)   (i = 1, 2, ...).

We have to show that (11.9) equals (11.6). By the induction hypothesis we have Π₁ = S(..., β_U Σ₂, β_U Σ₁; β_U Γ), so it remains to show that Πᵢ₊₁ = β_U Σᵢ (i = 1, 2, ...). In the first reading of the proof the Σᵢ are positive integers. Therefore the Δᵢ are integers > 1; it follows that β_U Δᵢ = Δᵢ > 1, whence Πᵢ₊₁ = Δᵢ − 1 = Σᵢ = β_U Σᵢ. In the second reading of the proof we may use the result of the first reading:
β_U Δᵢ = β_U S(..., 3, 2; Σᵢ) = S(..., 3, 2; β_U Σᵢ),

and the formula for Πᵢ₊₁ now results in (cf. Section 7)

Πᵢ₊₁ = S(..., 3, 2, 1; β_U Σᵢ) = β_U Σᵢ.  □
Theorem 11.2. Let U and V be subsets of the set of constants, and let (Σ) be both U-correct and V-correct. Then (β_U((Σ))) is V-correct, (β_V((Σ))) is U-correct, and (β_U((β_V((Σ))))) = (β_V((β_U((Σ))))).
Proof. Again we omit the ('s and )'s. The V-correctness of β_U Σ is easily proved by recursion: use the definition of β_U of Section 10. In 10 (iv) we have to use Theorem 10.1. By the same recursion we shall prove β_U β_V Σ = β_V β_U Σ. The only case where the induction step is non-trivial is the case Σ = γ(λΩ, Γ) with γ ∈ U ∪ V. If γ ∈ U we have by 10 (iv)

β_V β_U Σ = β_V S(..., 3, 2, 1, β_U Γ; β_U Ω).

By Theorem 11.1 this equals

S(..., 3, 2, 1, β_V β_U Γ; β_V β_U Ω).   (11.10)

If γ ∉ U, γ ∈ V we find by 10 (ii), 10 (iii)

β_V β_U Σ = β_V γ(λ β_U Ω, β_U Γ),

and by 10 (iv) this equals (11.10). So γ ∈ U ∪ V implies that β_V β_U Σ equals (11.10). By the induction hypothesis (11.10) is symmetric, whence β_V β_U Σ = β_U β_V Σ.  □
12. THE CHURCH-ROSSER THEOREM FOR BETA REDUCTION

We consider an NF expression Σ with a single constant A that can be used for β-reduction. We label all A's in Σ so that they become all different. Next we take a subset U of the labelled A's, we apply β_U and then remove the labels. This gives an NF expression Σ′. We say that Σ′ is a multiple reduction of Σ, and we write Σ ≥_m Σ′. If U has only one element, and if that element has just one occurrence in Σ, the reduction is called single, and we write Σ ≥_s Σ′. If Σ₁ and Σ₂ satisfy either Σ₁ ≥_s Σ₂ or Σ₂ ≥_s Σ₁, we write Σ₁ ∼ Σ₂. The Church-Rosser theorem for beta reduction says: If Σ₁ ∼ Σ₂ ∼ ... ∼ Σₙ then there are Δ₁, ..., Δₖ and Π₁, ..., Πₕ with

Σ₁ ≥_s Δ₁ ≥_s ... ≥_s Δₖ,   Σₙ ≥_s Π₁ ≥_s ... ≥_s Πₕ,   Δₖ = Πₕ.
This can now be proved as follows. From Theorem 11.2 we easily obtain: if Σ₁ ≥_m Σ₂, Σ₁ ≥_m Σ₃ then there is a Σ₄ with Σ₂ ≥_m Σ₄, Σ₃ ≥_m Σ₄. Moreover it can be shown: If Σ ≥_m Δ then there is a sequence

Σ ≥_s Σ₁ ≥_s Σ₂ ≥_s ... ≥_s Σₘ = Δ.

(Actually, if every element of U occurs at most once in the U-correct expression Σ, then we can arrange the elements of U as u₁, ..., uₘ in such a way that

β_{uₘ} ... β_{u₁} Σ = β_U Σ.)

The Church-Rosser theorem now follows by a trivial reduction argument. The above proof can be easily adapted to lambda calculus with expressions as types (see Section 13).
13. NOTATION IN AUTOMATH
The mathematical language Automath (see [de Bruijn 70a (A.2)]) has lambda calculus with types, and these types are again expressions. That is, instead of λ_x we have things that can be visualized as λ_{x,(Φ)}(Ω), where Φ and Ω denote name-carrying expressions. We may think of x as a variable of the type (Φ). It is clear that we do not want x to have any binding influence on (Φ). In order to achieve this, we create a new lingual constant T (just like we added A to our set of constants in Section 3), and we write

T((Φ), λ_x(Ω))   (13.1)

instead of λ_{x,(Φ)}(Ω). Now (13.1) can be transformed into a namefree expression just like any other name-carrying expression. The actual notation in Automath is different. Instead of (13.1) Automath uses [x : (Φ)](Ω), and for the application A((Φ), (Γ)) Automath uses {(Γ)}(Φ).
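The move from λ_{x,(Φ)}(Ω) to T((Φ), λ_x(Ω)) can be sketched as follows (the encoding and helper names are mine): typed abstractions are first rewritten with the constant T, so that x does not bind inside its own type, after which the usual nameless translation applies.

```python
# Sketch (assumed encoding, mine): a typed abstraction [x : A] B is stored
# as ("abs", "x", A, B); it is rewritten to T(A, lam(B)) so that x has no
# binding influence on its type A, and then made nameless.

def to_nameless(t, env):
    k = t[0]
    if k == "var":
        return ("var", env.index(t[1]) + 1)          # 1-based reference
    if k == "abs":                                   # [x : A] B  ~  T(A, λ B)
        return ("T", to_nameless(t[2], env),
                      ("lam", to_nameless(t[3], [t[1]] + env)))
    if k == "app":
        return ("app", to_nameless(t[1], env), to_nameless(t[2], env))
    return t                                         # constants pass through

# [x : nat] x  becomes  T(nat, λ 1)
term = ("abs", "x", ("const", "nat"), ("var", "x"))
assert to_nameless(term, []) == ("T", ("const", "nat"), ("lam", ("var", 1)))
```

Note how only the body of the abstraction is translated under an extended environment; the type component is translated in the outer environment, exactly the point of introducing T.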
14. ALGORITHMS
An algorithm for turning an NF expression into a name-carrying one can be described on the basis of the recursive definition of substitution in Section 6. Let (Ω) be an NF expression. Take a free variable list ..., x₃, x₂, x₁ consisting of distinct letters which do not belong to our alphabet of constants. Now add these xᵢ to that alphabet, and evaluate

(S(..., (x₃), (x₂), (x₁); (Ω))).
This is a namefree expression; if we proclaim the xᵢ's to be variables again, it becomes an intermediate expression where the free variables have names but the bound variables are nameless. If we want to have names for the bound variables too, we have to modify S slightly. We take an infinite store of letters y₁, y₂, ... (different from the xᵢ's and different from the constants), and we take a modified form of (6.1). Any time we get to apply (6.1) we take a fresh y (i.e. one that has not been used before) and we replace the right-hand side of (6.1) by

λ_y (S(..., (Δ₃), (Δ₂), (Δ₁), y; (Ω))).
It is not very hard either to give algorithms that transform name-carrying expressions into namefree ones. This can be done if a free variable list is given (and then it has to be checked, during the execution of the algorithm, whether this list is adequate), but we can also write an algorithm that produces a free variable list itself. For the case of the first-mentioned possibility we give a brief description of the crucial steps. Let ..., z₃, z₂, z₁ be a free variable list, and let (Ω) be the name-carrying expression we want to transfer into the namefree expression (Ω*). If (Ω) equals one of the z's, then (Ω*) is an integer, viz. the index of that z. If (Ω) is a variable, but not one of the z's, the answer is "free variable list was wrong". If (Ω) = λ_y (Γ) then we transform (Γ) into the nameless expression (Γ*) by means of the free variable list ..., z₃, z₂, z₁, y, and we have (Ω*) = λ(Γ*). The other cases ((i) (Ω) an expression string (Σ₁), (Σ₂), ..., (ii) (Ω) a constant, (iii) (Ω) = (γ)((Σ₁), ...)) are very easy.
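The crucial steps just described can be sketched directly (the tuple encoding is mine: named expressions as ("var", name), ("lam", y, body), ("app", f, a), with other constants passed through):

```python
# Sketch of the Section 14 algorithm: name-carrying -> namefree, given a
# free variable list ..., z3, z2, z1 (stored innermost-first: z1 is zs[0]).

def namefree(omega, zs):
    kind = omega[0]
    if kind == "var":
        if omega[1] in zs:
            return ("index", zs.index(omega[1]) + 1)   # 1-based, as in the paper
        raise ValueError("free variable list was wrong")
    if kind == "lam":                                  # λ_y (Γ)
        _, y, body = omega
        return ("lam", namefree(body, [y] + zs))       # list becomes ..., z1, y
    if kind == "app":
        return ("app", namefree(omega[1], zs), namefree(omega[2], zs))
    return omega                                       # a constant

# λ_x λ_y (x)(y)  ->  λ λ (2)(1)
t = ("lam", "x", ("lam", "y", ("app", ("var", "x"), ("var", "y"))))
assert namefree(t, []) == ("lam", ("lam", ("app", ("index", 2), ("index", 1))))
```

The adequacy check of the free variable list shows up as the raised error: a named variable that is neither bound nor in the list cannot be given a reference.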
Strong Normalization in a Typed Lambda Calculus with Lambda Structured Types

R.P. Nederpelt
CHAPTER I. INTRODUCTION AND SUMMARY

1. Lambda calculus

The lambda notation was originally introduced as a useful notation by Church in two papers developing a system of formal logic [Church 32]. He extended this notation in his calculus of lambda conversion (lambda calculus). This calculus was meant to describe a general class of functions which have the feature that they can be applied to functions of this same class. For historical comment see [Curry and Feys 58, Ch. 0, D and Ch. 3, S1] and [Barendregt 71, Ch. 1, 1.1]. In the latter reference the importance of lambda calculus for the development of recursive functions is mentioned. The calculus has also been brought into relation with the theory of ordinal numbers, predicate calculus and other theories. From the very beginning, lambda calculus was strongly linked to the theory of combinatory logic. We shall later mention some major results achieved concerning lambda calculus.

Right here we stress the contribution of lambda calculus to ordinary mathematics at a purely notational level. The mathematical custom to use the notation f(x), both for the function itself and for the value of this function at an undetermined argument x, obscures the mathematical notion "function". According to Curry and Feys "this defect is especially striking in theories which employ functional operations (functions which admit other functions as arguments)". For an example showing that the usual mathematical function notation is defective not only for understanding, but also in use, see [Curry and Feys 58, Ch. 3, A2].

We shall give an example of the lambda notation. Consider the function which assigns to x the value x + 2. This function is denoted in lambda notation as λx . x + 2. We can apply the function to an argument, say 3. The application of this function to the argument 3 is denoted as (λx . x + 2)3. The result of this application must clearly be 3 + 2.
This suggests that there exists an order between the terms (λx . x + 2)3 and 3 + 2 (the latter term is "closer to the outcome"). The transitive and reflexive relation corresponding to such an order is called a reduction. In the above case it is called a β-reduction, often denoted by ≥_β. Thus we have the relation (λx . x + 2)3 ≥_β 3 + 2. The reduction relation is also monotonous, i.e.: if term S reduces to term T, then λx . S reduces to λx . T, (U)S to (U)T and (S)U to (T)U. So from the relation (λx . x + 2)3 ≥_β 3 + 2 it follows, for example, that λy . ((λx . x + 2)3) ≥_β λy . (3 + 2). The relation compares two terms (viz. (λx . x + 2)3 and 3 + 2); the fact that these terms have the common value 5 in the usual interpretation plays no rôle here. If we do not take 3, but z as argument for the above function, then we obtain (λx . x + 2)z ≥_β z + 2. So lambda calculus makes a clear distinction between the function λx . x + 2 and the value of this function for an undetermined argument: z + 2.

We are used to the fact that the terms λx . x + 2 and λy . y + 2 denote the same function. The two terms are called α-equivalent, and the passage of the one into the other is called α-reduction, often denoted by ≥_α. In this way we also have the relation λx . x + 2 ≥_α λy . y + 2. It is quite a nuisance that this α-reduction, which is simply a renaming of variables, plays a rôle in the lambda notation. One can avoid this by considering α-equivalence classes instead of separate terms. Another nice and practical way out is given by de Bruijn [de Bruijn 72b (C.2)], who completely suppresses the use of names of variables by means of a notational system referring to the positions of a variable in a term. We wish to state that the desire to eliminate variables is one of the things giving rise to combinatory logic. The method used in combinatory logic to obtain this elimination is, however, different from de Bruijn's.

A third reduction, which is commonly used and strongly related to extensionality (see [Barendregt 71, Th. 1.1.17 and Th. 1.1.18]), is called η-reduction. This relation, commonly denoted by ≥_η, is based on the following rule: If x is not free in the term M, then λx . (M)x ≥_η M. An intuitive justification is that, for any argument X, the sides of the relation have comparable values: this value is (λx . (M)x)X for the left-hand side and (M)X for the right-hand side, and (λx . (M)x)X ≥_β (M)X.

A sequence of reductions obtained by successive application of reductions is called a reduction sequence. For each of the reduction relations explained above, the corresponding symmetric and transitive closure is called a conversion relation. One of the first important results in lambda calculus concerns the dependence between conversion and reduction. This is called the Church-Rosser theorem, which states: If
X converts to Y, then there is a Z such that X reduces to Z and Y reduces to Z (see [Curry and Feys 58, Ch. 4]). For interesting historical comments see [Barendregt 71, Th. 1.2.9 and remarks in 1.2.18 plus footnote]. In Appendix II of the latter reference the latest and nicest proof of the Church-Rosser theorem is given (1971 by W.W. Tait and P. Martin-Löf). For a precise description see [Schulte Mönting 73]. In this thesis we shall use the name "Church-Rosser property" for the following statement: If A reduces to B and to C, then there is a D such that B and C reduce to D. This property is equivalent to the Church-Rosser theorem.

2. Normalization and strong normalization

An important issue in lambda calculus is the question of the normalization of terms. This is a termination problem. For example, a β-reduction such as (λx . x)y ≥_β y cannot be continued in a non-trivial manner: there is no reduction for y, except those trivial on account of the reflexivity of α-, β- and η-reduction. In this case (λx . x)y is said to normalize into a normal form y. In lambda calculus, which allows all functions as arguments of functions, such a termination of the reduction is not guaranteed. See Church's nice example: ω₂ω₂ = (λx . xx)(λx . xx). [Note: One writes AB instead of (A)B, and ABC instead of ((A)B)C.] There is a non-trivial β-reduction, by applying the rule (λx . xx)A ≥_β (A)A with A = λx . xx. This produces ω₂ω₂ ≥_β ω₂ω₂. It is clear that the reduction of ω₂ω₂ by repeated use of the above non-trivial β-reduction will never come to an end. There are more and stranger examples of such terms, the reduction of which never terminates. For example: put ω₃ = λx . xxx. Then ω₃ω₃ ≥_β ω₃ω₃ω₃ ≥_β ... . Barendregt even constructed a universal generator with the property that it has a reduction sequence in which all terms of lambda calculus occur as subterms. A term in lambda calculus is called normalizable if there is some reduction sequence which terminates.
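The behaviour of these examples can be observed with a fuel-bounded leftmost β-reducer (a sketch; the de Bruijn-style tuple encoding and all names are mine, and running out of fuel is of course only evidence of non-normalizability, not a proof):

```python
# Minimal sketch: de Bruijn-indexed terms as nested tuples,
# ("var", n), ("lam", body), ("app", f, a), with 1-based indices.

def shift(t, d, c=0):
    if t[0] == "var":
        return ("var", t[1] + d) if t[1] > c else t
    if t[0] == "lam":
        return ("lam", shift(t[1], d, c + 1))
    return ("app", shift(t[1], d, c), shift(t[2], d, c))

def subst(t, s, j=1):
    if t[0] == "var":
        if t[1] == j:
            return shift(s, j - 1)
        return ("var", t[1] - 1) if t[1] > j else t
    if t[0] == "lam":
        return ("lam", subst(t[1], s, j + 1))
    return ("app", subst(t[1], s, j), subst(t[2], s, j))

def leftmost_step(t):
    """One leftmost-outermost beta step; None if t is in normal form."""
    if t[0] == "app":
        if t[1][0] == "lam":
            return subst(t[1][1], t[2])
        r = leftmost_step(t[1])
        if r is not None:
            return ("app", r, t[2])
        r = leftmost_step(t[2])
        return ("app", t[1], r) if r is not None else None
    if t[0] == "lam":
        r = leftmost_step(t[1])
        return ("lam", r) if r is not None else None
    return None

def normalize(t, fuel=1000):
    while fuel:
        r = leftmost_step(t)
        if r is None:
            return t
        t, fuel = r, fuel - 1
    return None          # fuel exhausted: possibly non-normalizable

omega2 = ("lam", ("app", ("var", 1), ("var", 1)))   # λx. xx
W = ("app", omega2, omega2)                         # ω2 ω2: never terminates
K_id = ("app", ("lam", ("lam", ("var", 1))), W)     # (λx. λy. y)(ω2 ω2)

assert normalize(W, fuel=100) is None               # every step reproduces W
assert normalize(K_id) == ("lam", ("var", 1))       # leftmost discards W first
```

The last two assertions exhibit the contrast between normalizable and strongly normalizable terms: the leftmost strategy normalizes (λx . λy . y)(ω₂ω₂) in one step, while reducing the argument first loops forever.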
A term is strongly normalizable if each of its reduction sequences terminates. The last term of a terminating sequence is called a normal form. It is obvious that strong normalization implies normalization. The reverse implication does not hold. For example: put again ω₂ = λx . xx; then (λx . (λy . y))(ω₂ω₂) reduces to λy . y if the function λx . (λy . y) is applied to the argument ω₂ω₂, but it reduces to itself if the function ω₂ is applied to the argument ω₂. Since λy . y is in normal form, (λx . (λy . y))(ω₂ω₂) is normalizable, but not strongly normalizable.

In this example we see a term that normalizes if one application of a function to an argument is assigned priority over another. There is a general theorem in lambda calculus (the standardization theorem, cf. [Curry and Feys 58, Ch.
4, E1]), which states that any normalizable term can be normalized by assigning priority to the "leftmost" application in the term.

The fact that some terms in lambda calculus have non-terminating reduction sequences is related to the feature that one can use all functions as arguments for functions. (Even the function itself can be used as an argument, see the above-mentioned example by Church. This is called self-application.) The same things can happen in programming languages and in the theory of partial recursive functions, where normalization- (or termination-) problems arise too. In lambda calculus the question of the normalizability of terms has been shown to be undecidable.

There are systems in which normalization implies strong normalization. For example, in a restricted lambda calculus (λI-calculus) this implication holds (the so-called second Church-Rosser theorem, see [Curry and Feys 58, Ch. 4, E]), but the proof is not trivial. Prawitz [Prawitz 65] proved normalization for derivations in natural deduction. He also proved strong normalization for these derivations in [Prawitz 71]. Note that in the latter proof he does not use his results from [Prawitz 65], but quite a different proof technique developed by Tait [Tait 67].

An interesting problem concerning normalization is the question of uniqueness of normal forms. If a term A has the property that every terminating reduction sequence leads to the same normal form (but for α-reduction), then A is said to have a unique normal form. We note that the Church-Rosser theorem implies the uniqueness of the normal form if this exists. In this thesis we shall show that, if in a system all terms are normalizable into a unique normal form, then each term is strongly normalizable. This will be proved for a certain lambda calculus called Λ; the method can, however, be applied to more systems, and we suggest this as a field of further investigation.

3. Normalization in systems of typed lambda calculus
In ordinary mathematics one, sometimes tacitly, assumes that each object has a certain type (in our example of a term in lambda notation, λx . x + 2, we assumed that x has a type (e.g. that of the natural numbers) in which addition is possible). In systems of typed lambda calculus one attaches a type to each term. In so doing and in restricting the formation of terms in accordance with the types (see the "applicability condition" explained in Section 1.4) one brings lambda calculus nearer to usual mathematical systems.

We note here that there is a strong correspondence between derivations in systems of natural deduction and terms in systems of typed lambda calculus, as well as between formulae in the one and types in the other: a derivation D
proving a formula F corresponds to a term D′ with type F′. This is called the "formulae-as-types" notion. The latter notion has recently been investigated by various authors in developing a theory of construction and in studying functional interpretations. The first indication in this direction was given in [Curry and Feys 58, p. 312-315]. We further mention Läuchli [Läuchli 70], de Bruijn, who developed and applied this notion with a large variety of types in his mathematical language Automath ([de Bruijn 70a (A.2)]), Howard [Howard 80], Prawitz [Prawitz 71] and Girard [Girard 71].

Normalization problems also arise in systems of typed lambda calculus. Sanchis [Sanchis 67] investigated a lambda calculus with types (essentially Gödel's theory of functionals of finite type) and found all terms in this calculus to be strongly normalizable. Martin-Löf [Martin-Löf 75a] admitted more general types and obtained normalization for his terms. His system is close to the requirements of common mathematics in the sense that usual mathematical notions such as the logical connectives and the recursion operator are incorporated.

In this thesis we shall regard a typed lambda calculus, in which the types themselves have lambda structure. Our typed lambda calculus, which we call Λ, has a large overlap with the mathematical language Automath [de Bruijn 70a (A.2)]. (See the following section for the relation between Automath and our system Λ.) In particular, a single-line version of Automath (AUT-SL, see [de Bruijn 71 (B.2)]) introduced by de Bruijn has led us to the investigations in this thesis. Preliminary work in the direction of AUT-SL can be found in our notes on Lambda Automath ([Nederpelt 71a] and [Nederpelt 71b]), in which some syntactical notions of Automath were unified. In AUT-SL this unification was extended considerably. de Bruijn defined AUT-SL by means of a recursive programme.
Our definition of system Λ (given in Chapter III) follows more orthodox recursive lines. Nevertheless, the resulting systems are the same. In these systems there is no syntactical distinction between terms and types. We therefore use the word expression rather than term or type. There is one basic constant in the system, called τ. To each expression which does not end in τ we shall assign a type in a natural manner. We say that expressions ending in τ have degree 1. Each other expression has some degree n > 1, while the degree of such an expression A is defined to be one more than the degree of the type of A. In this manner we have expressions of any finite degree at our disposal. In Automath and in Martin-Löf's system there is a restriction to the degrees permitted. Both systems have only terms and types of degree 1, 2 or 3.
Our system has in common with Automath that logical connectives, the recursion operator and a basic set of numbers (e.g. natural numbers) are not incorporated. The proofs of normalization results concerning these systems can be formalized in first order arithmetic. Yet it is possible to interpret into these systems mathematical theories containing, for instance, logical connectives and the recursion operator by introducing new primitive equality relations which extend the existing equality relations which correspond to conversion. We shall prove normalization and strong normalization for our system in Chapter III. As mentioned above, we shall introduce a method for deriving strong normalization from normalization together with the uniqueness of normal forms (see Section 1.6).

4. The relation to the mathematical language Automath

Automath (see [de Bruijn 68b] and [de Bruijn 70a (A.2)]) was designed by de Bruijn as a language for mathematics. It has the property that the interpretation of a text written in Automath is correct mathematics if the text is syntactically correct. Many such systems have been developed for logic. For mathematics, Russell and Whitehead's Principia Mathematica was the first successful attempt in the direction of formalization. There have since been many other attempts. However, in the majority of these systems important parts of the mathematical argumentation were not incorporated in the formal system, but were dealt with at a meta-level. For example, in systems based on axioms and inference rules a theorem is true if it can be inferred by successive application of a number of axioms and rules. But one hardly ever says exactly (in terms of the formal language) which axioms and rules were used, and in which order. Moreover, the use of an axiom scheme was usually not substantiated by a formalized indication of the substitution instance employed. Admittedly, there is a gap in the completeness of the formalization in Automath, too.
The gap is that, in the case of "definitionally equal" expressions, there is no indication of how this equality can be established on the basis of the language definition. It is left to algorithms to justify these definitional equalities. The existence of terminating algorithms for this purpose can be proved by means of normalization properties. The question of practical efficiency of such algorithms is, of course, a different one, and is not considered in this thesis. Two expressions in Automath are called definitionally equal if one expression can be transferred into the other by (1) conversions, and (2) the elimination of abbreviations.
A major problem for automatic checking in Automath is whether definitional equality of two expressions is decidable. The latter is clearly the case if each expression is effectively normalizable into a unique normal form. In this respect, see [Kreisel 72]. The main aim of this thesis is to prove the existence and uniqueness of normal forms for Λ. Since Λ does not use an abbreviation system as a syntactical element like Automath does, we may restrict ourselves to conversions. We note that the omission of abbreviations is no severe restriction, since abbreviations are relatively simple operations usually considered to be only notational devices without mathematical content.

The mere typing of lambda calculus expressions does not guarantee the property of normalization. We need more. Automath permits only a restricted class of expressions. In this class only those expressions E are included which obey the so-called applicability condition: for each part of E which has the form of a function F applied to an argument A it is required that (1) F has a domain D, and (2) the type of A is definitionally equal to D. These requirements are natural for a system which is so closely linked to ordinary mathematics. The following examples in lambda notation will make this clear. In the first place, it would be unnatural to supply an expression which is not a function with an argument: one can attach an argument to λx . x + 2, but it looks strange to provide the number 7 with an argument. Secondly, let us assume that x in λx . x + 2 is required to have the natural numbers as type. This defines the domain of the function. Then one may write the application (λx . x + 2)3, since 3 has the same type as x. But it would be quite unnatural to write the application (λx . x + 2)a, where a represents a vector in R³. In AUT-SL and in Λ, expressions have to obey the applicability condition, like in Automath.
This condition is sufficiently strong to guarantee normalizability (even a weaker condition suffices, see Section 1.6). We note that Automath has the property that assignment of a type to an expression of degree 3 is different to that for expressions of degree 2. Expressions of degree 3 have lambda structured types, whereas expressions of degree 2 all have the same type, viz. the expression denoted by the symbol type. (This symbol type is the Automath version of the symbol τ used in our system Λ.) As an illustration we give an example in lambda notation. Suppose that the term λx . x + 2 has Nat as type for x, and type as type for Nat. Then λx . x + 2 has degree 3. In the manner of Automath it has λx . Nat as type. The latter expression, having degree 2, has as type the expression type.
In AUT-SL and in Λ, however, the assignment of types to expressions of any degree ≥ 2 is treated in a uniform manner, comparable to the assignment of types to expressions of degree 3 in Automath. If the term in the above example (λx . x + 2) were treated in the Λ-way, its type would again be λx . Nat, but the type of λx . Nat would be λx . τ. We note that an extension of Automath, called AUT-QE ("Automath with quasi-expressions", see [de Bruijn 73b]), has more expressions of degree 1 than only type; it admits as expressions of degree 1 some of those admitted in AUT-SL and Λ. However, AUT-QE allows a choice to be made for some expressions of degree 2, between essentially different types. Again using the above example as an illustration: in the manner of AUT-QE one may choose either λx . type or type as type of λx . Nat. It is to be noted that the above-mentioned difference between Automath (or AUT-QE) and AUT-SL (or Λ) has the important consequence that neither Automath nor AUT-QE is a subsystem of AUT-SL (or Λ). The results for Λ obtained in this thesis are therefore not immediately transferable either to Automath or to AUT-QE. Normalization for a simpler form of AUT-QE, which does form a subsystem of AUT-SL, was proved by van Benthem Jutting [van Benthem Jutting 71a], using the norm introduced in this thesis (we shall call this norm ρ; cf. Section 1.6). The normalization theorem of this thesis is a generalization of that of [van Benthem Jutting 71a]. Strong normalization for a system resembling Automath was recently studied by R.C. de Vrijer on the basis of Tait's ideas exposed in [Tait 67], and for Automath and AUT-QE by D.T. van Daalen (private communications). The uniqueness of normal form has only been proved with respect to β-reduction. Uniqueness of normal form with respect to β-η-reduction is as yet an open question (see also Section 1.6).
5. Change of notational conventions

In the lambda notation as usually employed, the quantifiers (such as λx) are written to the left of the expressions they operate upon, whereas applications are written to the right. This corresponds to the mathematical notational tradition to write quantifiers (such as ∀, Σ_{r=1}, ...) to the left, and to write the argument of a function f to the right (as in f(a)). In ordinary mathematics these two kinds of operations have nothing in common, but in lambda calculus they are closely related by β- and η-reduction. In a sense, quantification (also called abstraction) and application are inverse operations. Sequences of such operations can be applied in various orders, and it is most convenient to write them all on the same side of an expression, thus
showing clearly in which order the expression has been formed from its constituents. In Automath applications and abstractions are all written to the left. Instead of writing abstractions in the form Ax, Automath writes [z : A ] , in which A stands for the type of the variable x. Applications are indicated by writing the expression in angle brackets; instead of the usual mathematical notation f(a) we write (a) f. For example: the term given in lambda notation as (Ax . x 2 ) 3 reads in Automath as: ( 3 ) [x : N u t ] p h ( x , 2 ) . (Here we assume that x has as type the natural numbers, abbreviated Nut; a minor difference is that Automath uses only prefix notation for operators.) Note that the pair ) [ indicates the possibility of P-reduction. Sometimes, but not always, the pair ] ( indicates the possibility of q-reduction. This notation for abstraction and application renders the use of parentheses ( ) entirely superfluous, since there can be no doubt as to the order in which abstractions and applications appear. The separation dot as used in Ax . x 2 disappears as well. Automath uses the parentheses ( ), but for a different purpose. In AUT-QE, AUT-SL and in the system A which we shall develop in this thesis, these slightly different notational conventions are also adopted.
6. Summary of the contents of this thesis
This thesis contains a chapter on the formal system ∆ (Chapter II) and a chapter on the formal system Λ (Chapter III). In the latter chapter we develop the main results of the thesis. System Λ forms part of system ∆, containing those expressions of ∆ which obey the applicability condition (explained in Section I.4). We shall now discuss the contents of Chapter II. There we define expressions inductively by: x and τ are expressions; [x : A] B and ⟨A⟩ B are expressions if A and B are so (x is a variable). In system ∆ we only include those expressions which are “distinctly bound”, i.e.
(1) which do not contain free variables, and
(2) which have distinct binding variables.
Our preference for bound (also called closed) expressions (expressions without free variables) is noticeable throughout this thesis. We give the following justification for this preference. We believe that in a typed lambda calculus the feature of typing can only be meaningful if every typable expression has an effectively computable type. Since free variables have no traceable type in our
R.P. Nederpelt
system, this implies that only bound expressions are admissible. If in this thesis we deviate from this agreement by considering expressions with free variables, this will be in cases in which it is clear from the context which types belong to these free variables. The consequence of the above agreement is that many expressions under discussion begin with an abstractor chain Q. (An abstractor chain is a string of abstractors; an abstractor has the form [x : A], A being an expression.) The fact that we require all binding variables in an expression in ∆ to be distinct has only practical reasons (cf. Section II.5). We stress that system ∆ is not a typed lambda calculus in the usual sense, since the types have no influence whatsoever on the formation of expressions. The types, which themselves have a lambda structure, will only be treated as formal expressions. It is not until Chapter III, dealing with the restricted system Λ, that the types will play the usual rôle in the formation of expressions. This is due to the applicability condition imposed upon expressions in Λ. We shall formulate the relations α-, β- and η-reduction inside ∆ and we shall prove a number of properties of these reductions in the system ∆ (in Sections II.4, II.5 and II.7, respectively). In Section II.6 we shall consider some reductions related to β-reduction. Our proof of strong normalization in Λ (Section III.3) is based on these reductions. The more important one of these reductions will be called β₁-reduction. We shall explain its characteristic property by reducing the term which we previously used as an example: (λx . x + 2) 3, or, in Automath notation: ⟨3⟩ [x : Nat] plus(x, 2) (cf. the previous section). As for β-reduction, we have the relation ⟨3⟩ [x : Nat] plus(x, 2) ≥β plus(3, 2). But with β₁-reduction, which we denote by ≥β₁, we have: ⟨3⟩ [x : Nat] plus(x, 2) ≥β₁ ⟨3⟩ [x : Nat] plus(3, 2). Here the part ⟨3⟩ [x : Nat] is left intact on the right-hand side.
(Actually β₁-reduction is more complicated; see Section II.6.) The following feature of β₁-reduction is worth noting: application of β-reduction sometimes enables one to eliminate a non-normalizable subterm (in this respect we recall the example (λx . (λy . y)) (ω ω) ≥β λy . y of Section I.2), but with β₁-reduction this is impossible. In Section II.6 we prove the Church-Rosser property for β₁-reductions, using a proof technique of Tait and Martin-Löf. This property implies the uniqueness of normal form for β₁-reductions. The Church-Rosser property for β-reduction can be proved in a similar manner. Unfortunately the Church-Rosser property for β-η-reductions does not hold in our system ∆. The trouble here arises from the typed character of our lambda calculus. We explain this in greater detail in Section II.7. (However, we conjecture the Church-Rosser property for β-η-reductions in Λ; see the end of the present section.)
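To make the contrast concrete, here is a minimal sketch (my own tuple encoding, not the thesis's formalism; α-capture issues are ignored) of a β-step, which erases the pair ⟨3⟩ [x : Nat], next to a β₁-step, which keeps it:

```python
# Terms: ("var", x), ("con", name, [args]), ("abs", x, T, B) for [x : T] B,
# and ("app", A, B) for <A> B.  (Toy encoding; alpha-capture is ignored.)

def subst(t, x, arg):
    """Replace the free occurrences of variable x in t by arg."""
    k = t[0]
    if k == "var":
        return arg if t[1] == x else t
    if k == "con":
        return ("con", t[1], [subst(a, x, arg) for a in t[2]])
    if k == "app":
        return ("app", subst(t[1], x, arg), subst(t[2], x, arg))
    y, T, B = t[1], t[2], t[3]                      # abstractor [y : T] B
    return ("abs", y, subst(T, x, arg), B if y == x else subst(B, x, arg))

def beta(redex):
    """<A> [x : T] B  reduces to  B[x := A]: the lambda-phrase pair vanishes."""
    _, A, (_, x, _T, B) = redex
    return subst(B, x, A)

def beta1(redex):
    """<A> [x : T] B  reduces to  <A> [x : T] B[x := A]: the pair is kept."""
    _, A, (_, x, T, B) = redex
    return ("app", A, ("abs", x, T, subst(B, x, A)))

three = ("con", "3", [])
redex = ("app", three,
         ("abs", "x", ("var", "Nat"),
          ("con", "plus", [("var", "x"), ("con", "2", [])])))
```

Here `beta(redex)` yields `plus(3, 2)`, while `beta1(redex)` yields `⟨3⟩ [x : Nat] plus(3, 2)`, whose body is exactly the β-result; because β₁ never discards the applicator, it can never throw away a non-normalizable argument.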
In Section II.7 we also prove a theorem concerning the “postponement of η-reductions” in a sequence of β- and η-reductions, by means of a method suggested by Barendregt. At the end of Section II.7 we define lambda equivalence for ∆: A and B are lambda equivalent if there is a C such that A and B reduce to C. This lambda equivalence is not necessarily transitive, since the Church-Rosser property for β-η-reductions does not hold in ∆. In Section II.8 we define a formal type-operator called Typ, which assigns a type to an expression not ending in τ. The action of this type-operator is syntactically simple and is in agreement with what we mentioned about the assignment of types in Section I.3. In Section II.8 we also define the degree-function Deg, which is in agreement with our description of degree as given in Section I.3. In our system ∆ we can apply the type-operator Typ a finite number of times. For each expression A in ∆ there is an n ≥ 0 such that Typⁿ A ends in τ, which implies that Typⁿ A has no type. (Here Typⁿ A is obtained by n applications of the type-operator.) This n is the degree of A minus one. We define Typ* A to be Typⁿ A for that particular n. We begin Chapter III with the definition of the formal system Λ (Section III.1). Among the theorems in Section III.1 there is one which states that the type of an expression in Λ again belongs to Λ. In Section III.2 we prove the normalization theorem for Λ. We use a norm μ, which is a partial function on ∆. The norm μ(A) for a certain A in ∆ is itself an expression in ∆.
The norm of A is defined if A obeys a weak form of the applicability condition, which amounts to the following: for each part of an expression E which has the form of a function F applied to an argument A:
(1) F has a domain D, and
(2) the norms of A and D are defined and equal (apart from α-reduction).
Applied to expressions for which the norm is defined (so-called μ-normable expressions), the norm μ has two powerful properties:
(1) if A reduces to B, the norms of A and B are (essentially) equal, and
(2) the norm of an expression is (essentially) the same as the norm of its type.
The norm μ(A) of a μ-normable expression A can be obtained by
(1) replacing non-binding variables by their types, repeating this process until no non-binding variable remains, and
(2) cancelling adjacent pairs ⟨C⟩ [x : D].
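The two steps can be sketched as follows (a toy encoding of my own, writing `("app", C, B)` for ⟨C⟩ B; it ignores α-matters and the definedness conditions, so it should only be fed well-normed inputs):

```python
TAU = ("tau",)

def expand(t, env):
    """Step (1): replace each non-binding variable by its (expanded) type."""
    k = t[0]
    if k == "tau":
        return t
    if k == "var":
        return env[t[1]]
    if k == "abs":                                  # [x : A] B
        A = expand(t[2], env)
        return ("abs", t[1], A, expand(t[3], {**env, t[1]: A}))
    return ("app", expand(t[1], env), expand(t[2], env))

def cancel(t):
    """Step (2): cancel adjacent pairs <C> [x : D], repeatedly."""
    if t[0] == "app":
        B = cancel(t[2])
        if B[0] == "abs":
            return B[3]                             # drop <C> and [x : D], keep the body
        return ("app", cancel(t[1]), B)
    if t[0] == "abs":
        return ("abs", t[1], cancel(t[2]), cancel(t[3]))
    return t

def norm(t):
    return cancel(expand(t, {}))

# <tau> [x : tau] x  reduces to  tau; both sides receive the same norm.
E = ("app", TAU, ("abs", "x", TAU, ("var", "x")))
```

On this example `norm(E)` and the norm of its β-reduct `tau` coincide, illustrating property (1) above.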
We show in Section III.2 that all expressions in Λ are μ-normable. We subsequently show that each μ-normable expression has a normal form for β-reductions. It follows in particular that Λ is normalizable for β-reductions. It now easily follows that Λ is also normalizable for β-η-reductions. Our proofs show that the normal form of A in Λ is effectively (viz. primitively recursively) computable. In Section III.3 we prove strong normalization for Λ. We use the β₁-reduction introduced in Section II.6. We show that expressions in Λ are normalizable for β₁-reductions, using the same methods as in the corresponding proof for β-reductions in Section III.2. By using the Church-Rosser property for β₁-reductions as proved in Section II.6 we obtain the uniqueness of normal form for β₁-reductions. The special features of β₁-reduction enable us to conclude strong normalization in Λ for β₁-reductions from the normalization and the uniqueness of normal form. Strong normalization in Λ for β-reductions is a consequence, as well as strong normalization for β-η-reductions. The uniqueness of normal form in Λ is proved for β-reductions but not for β-η-reductions. Nevertheless, the latter would be a consequence if we could prove the following conjecture:
Conjecture I. In Λ the Church-Rosser property holds for β-η-reductions. □
(The difficulties in proving this arise in the same place where the corresponding statement for ∆ turns out to be false; see Section II.7.)
As to Λ, there is an important conjecture on closure:

Conjecture II. If A is an expression in Λ and if A reduces to B, then B is an expression in Λ. □

In [Nederpelt 72b] we stated this as a theorem, but the proof turned out to be incorrect. The latter conjecture has no influence upon the results in this thesis; it is, however, of importance for the construction of an efficient checking algorithm for expressions in Λ.
CHAPTER II. THE FORMAL SYSTEM ∆

1. Alphabet and syntactical variables
We use the following symbols as our alphabet:
(i) an infinite set of (individual) variables: α, β, γ, α₁, β₁, γ₁, ...;
(ii) a single constant, called the base: τ;
(iii) the improper symbols: [ , ] , ⟨ , ⟩ , : .
As syntactical variables denoting certain well-structured symbol strings (possibly empty) we use small Latin letters a, b, c, ... and Latin capitals A, B, C, ... (primed or subscripted if required). In special definitions, called Notation Rules, we restrict the use of some syntactical variables (and their primed or subscripted variants). For example, we agree upon:

Notation Rule 1.1. As a syntactical variable for arbitrary strings of symbols from the alphabet we use the Latin capital S. Such a string can be empty. The empty string itself is denoted by ∅. □

Notation Rule 1.2. As syntactical variables for individual variables we use the small Latin letters x, y and z. (Instead of “individual variable” we often say “variable”.) □

Hence from now on each use of a syntactical variable S (or S₁, S′, etc.) denotes a string of symbols from the alphabet, and each use of a syntactical variable x (or y, x₁, etc.) denotes an individual variable. It is usual to build strings of symbols from the alphabet and syntactical variables, concatenated. For example, [x : α][y : α] x is such a string. We shall call this kind of string a mixed string. Equality of mixed strings will be expressed in the discussion language by the symbol ≡. For example, if we wish to express that the strings S and [x : α] β are the same, we write S ≡ [x : α] β. The symbol ≢ is the negation of ≡. There are said to be two “occurrences” of α in the string [α : β] α. We shall formalize this notion of occurrence. We define that S″ occurs in S after S′ if there is an S‴ such that S ≡ S′S″S‴. Hence, in the above example: α occurs in [α : β] α after [, and α occurs in [α : β] α after [α : β]. In this manner we can distinguish between occurrences.
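In string terms, the definition says that an occurrence of S″ in S is identified by the prefix S′ standing before it. A direct transcription (illustrative only, with ASCII letters in place of α, β):

```python
def occurrences(s, sub):
    """Each occurrence of sub in s, identified by the prefix before it."""
    n = len(sub)
    return [s[:i] for i in range(len(s) - n + 1) if s[i:i + n] == sub]

def occurs_after(s, prefix, sub):
    """sub occurs in s after prefix iff s == prefix + sub + rest for some rest."""
    return s.startswith(prefix + sub)

# The two occurrences of 'a' in '[a:b]a' are the one after '[' and
# the one after '[a:b]', exactly as in the example in the text.
```

Distinguishing occurrences by their prefix is what allows the later definitions (disjointness, the factor A|(S; B)) to speak about one particular occurrence.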
The following statement is clear: if S₁ occurs in S after S′ and S₂ occurs in S₁ after S″, then S₂ occurs in S after S′S″. Consider the mixed string [x : y] x, in which there are two occurrences of x. If x denotes α, then [x : y] x denotes [α : y] α: both occurrences of x are replaced by α. If, moreover, y denotes β, then [x : y] x denotes [α : β] α. It is, however, also possible that both x and y denote α. Then [x : y] x denotes [α : α] α (see also [Shoenfield 67, p. 7]). Syntactical variables are used in two hardly distinguishable rôles: as abbreviations (“We abbreviate [x : α] β as S”) and as variables (“Let S be a string of the form ...”). It is also good usage to state something in the nature of “Let A ≡ [x : B] C”, meaning: “Assume that A has the form [x : B] C for certain x, B and C” (in this manner one economizes in the use of the existential quantifier). We shall define many specific sets and relations in an inductive manner (see [Shoenfield 67, p. 4]). The proof technique linked with this kind of definition, which amounts to induction on the construction, is often called (somewhat confusingly) induction on the length of proof (or induction on theorems, see [Shoenfield 67, p. 5]). We shall call an application of one rule of an inductive definition a derivation step. If a relation is defined inductively by a number of rules, then the relation is also said to be generated by these rules. When speaking of a transitive (or reflexive, etc.) relation generated by a number of rules, one wishes to express that the rule of transitivity (or reflexivity, etc.) is to be added to that number of rules. If S denotes a certain symbol string, then the length of S is the number of symbols in that string. We denote the length of S by |S|. For example, if S ≡ [α : β] α, then |S| = 6.
2. Expressions
The expressions of our systems are inductively defined as follows (we use the word expression rather than the words term or type):

Definition 2.1.
(1) A variable is an expression.
(2) τ is an expression.
(3) If x is a variable and if A and B are expressions, then [x : A] B is an expression.
(4) If A and B are expressions, then ⟨A⟩ B is an expression. □
Note that this definition gives a unique construction of an expression.
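Definition 2.1 consists of four mutually exclusive clauses, which is what makes the construction of an expression unique. Encoded as a datatype (an illustrative encoding of my own, writing `<A> B` for the applicator-expression):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:            # clause (1): a variable
    name: str

@dataclass(frozen=True)
class Tau:            # clause (2): the base tau
    pass

@dataclass(frozen=True)
class Abs:            # clause (3): [x : A] B
    x: str
    A: object
    B: object

@dataclass(frozen=True)
class App:            # clause (4): <A> B
    A: object
    B: object

def show(e):
    if isinstance(e, Var): return e.name
    if isinstance(e, Tau): return "tau"
    if isinstance(e, Abs): return f"[{e.x} : {show(e.A)}] {show(e.B)}"
    if isinstance(e, App): return f"<{show(e.A)}> {show(e.B)}"
```

Every expression matches exactly one constructor, so a function like `show` (and every proof by induction on the construction) has exactly one applicable case.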
Notation Rule 2.2. As syntactical variables for expressions we use the Latin capitals A, B, C, ..., N. □

Definition 2.3. A symbol string of the form [x : C] is called an abstractor, a symbol string of the form ⟨D⟩ an applicator. A lambda phrase is either an abstractor or an applicator. A (possibly empty) string of abstractors (applicators, λ-phrases) is called an abstractor chain (an applicator chain, a lambda phrase chain). □

Notation Rule 2.4. As a syntactical variable for abstractor chains we use the Latin capital Q, for applicator chains the Latin capital R and for λ-phrase chains the Latin capital P. □

The number of entries in a string forming an abstractor chain Q (an applicator chain R, a lambda phrase chain P) is denoted by ‖Q‖ (‖R‖, ‖P‖ respectively). Hence ‖Q‖ = 0 if Q ≡ ∅, and ‖Q [x : C]‖ = ‖Q‖ + 1. An expression B can be a subexpression of an expression A, denoted B ⊂ A. This relation is inductively defined as follows:
Definition 2.5.
(1) A ⊂ A.
(2) If C ⊂ A or C ⊂ B, then C ⊂ [x : A] B and C ⊂ ⟨A⟩ B. □

Note: if B ⊂ A, then A ≡ S₁BS₂, i.e. a subexpression of an expression A is an expression which forms a connected part of A. Instead of B ⊂ A we sometimes say: A contains B. If B ⊂ A and B ≢ A, we call B a proper subexpression of A.
Theorem 2.6. If F ⊂ E and E ⊂ D, then F ⊂ D.
Proof. Induction on |D|. □
If B ⊂ A, then B occurs in A, but there may evidently be more occurrences of B in A. In the following we wish to be able to distinguish between such occurrences of B in A. We shall indicate the occurrence meant by saying “B ⊂ A after S” if B ⊂ A and B occurs in A after S.

Definition 2.7. Let B occur in A after S₁ and let C occur in A after S₂.
We call these occurrences disjoint if either S₂ ≡ S₁BS′ or S₁ ≡ S₂CS″. □
Theorem 2.8. Let B occur in A after S₁, let C occur in A after S₂, let B ⊂ A and C ⊂ A. Then
(1) B and C occur disjointly in A, or
(2) B ⊂ C, or
(3) C ⊂ B.
Proof. Induction on the length of proof of B ⊂ A. □
Let B ⊂ A after S. We shall inductively define the factor of A with respect to S and B (denoted A | (S; B)) in Definition 2.9. In this definition the occurrence of B meant is precisely described. However, it will often be clear from the context which occurrence of B is meant in case B ⊂ A. In that case the precise indication of this occurrence is superfluous, and instead of A | (S; B) we shall write A|B. Informally we can condense the inductive definition of A|B, under the condition that in each of the following rules the occurrences of B under discussion are “in corresponding places”:
(1) If A ≡ B, then A|B ≡ A.
(2) If A ≡ [x : C] D, then A|B ≡ C|B in case B ⊂ C, and A|B ≡ [x : C] (D|B) in case B ⊂ D.
(3) If A ≡ ⟨C⟩ D, then A|B ≡ C|B in case B ⊂ C, and A|B ≡ D|B in case B ⊂ D.
For a description of a characteristic property of A|B, which justifies its introduction, see the following section (after Th. 3.6). The formal inductive definition of A | (S; B) is the following:

Definition 2.9. Let B ⊂ A after S.
(1) If A ≡ B, then A | (S; B) ≡ A.
(2) Let A ≡ [x : C] D.
If B ⊂ C after S₁ and [x : S₁ ≡ S, then A | (S; B) ≡ C | (S₁; B).
If B ⊂ D after S₂ and [x : C] S₂ ≡ S, then A | (S; B) ≡ [x : C] (D | (S₂; B)).
(3) Let A ≡ ⟨C⟩ D.
If B ⊂ C after S₁ and ⟨S₁ ≡ S, then A | (S; B) ≡ C | (S₁; B).
If B ⊂ D after S₂ and ⟨C⟩ S₂ ≡ S, then A | (S; B) ≡ D | (S₂; B). □
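The informal clauses can be transcribed directly, if an occurrence is pinned down by an explicit path into the expression instead of by the prefix S (my own encoding: `("abs", x, C, D)` for [x : C] D, `("app", C, D)` for ⟨C⟩ D):

```python
def factor(A, path):
    """A | (S; B) with the occurrence of B given by a path of steps:
    'type'/'body' descend into [x : C] D, 'arg'/'body' into <C> D."""
    if not path:                       # rule (1): the occurrence is A itself
        return A
    step, rest = path[0], path[1:]
    if A[0] == "abs":                  # rule (2): keep [x : C] only for B inside D
        if step == "type":
            return factor(A[2], rest)
        return ("abs", A[1], A[2], factor(A[3], rest))
    if A[0] == "app":                  # rule (3): the applicator is never kept
        return factor(A[1] if step == "arg" else A[2], rest)
    raise ValueError("path does not match expression")

# In [x : tau] [y : x] y, the factor at the occurrence of x in the inner
# type position is [x : tau] x -- an abstractor chain Q placed in front of x.
A = ("abs", "x", ("tau",), ("abs", "y", ("var", "x"), ("var", "y")))
```

Note how only the abstractors on the way down survive: the result always has the shape Q B for some abstractor chain Q, which is the property exploited in Section II.3.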
Note: the parentheses ( ) in [x : C] (D | (S₂; B)) belong to the discussion language and are meant to fix the scope of |. Let B ⊂ A after S. It will be clear that B ⊂ A|B, or, a fortiori: A|B ends in B (here, of course, A|B is meant to be A | (S; B)). It is also evident that A|B ≡ QB and (QA)|B ≡ Q(A|B). We state the following theorems:

Theorem 2.10. If C ⊂ B ⊂ A, then (A|B) | C ≡ A|C.
Proof. Induction on |A|. □

Theorem 2.11. If B ⊂ A and A|B ≡ Q₁[x : C] Q₂B, then C ⊂ A.
Proof. Induction on |A|, using Th. 2.6. □

Theorem 2.12. If E ≡ Q₁[x : C] D and B ⊂ C after S, then E|B ≡ Q₁(C|B) (here E|B is E | (Q₁[x : S; B) and C|B is C | (S; B)).
Proof. Induction on ‖Q₁‖. □
Theorem 2.13.
(1) If [x : C] D ⊂ A after S₁ and B ⊂ D after S₂, then A|B ≡ Q₁[x : C] Q₂B (here A|B is A | (S₁[x : C] S₂; B)).
(2) If B ⊂ A after S and A|B ≡ Q₁[x : C] Q₂B, then there is a D such that [x : C] D ⊂ A after S₁, B ⊂ D after S₂ and S ≡ S₁[x : C] S₂ (here A|B is A | (S; B)).
Proof. In both parts of the theorem: induction on |A|. □
We conclude with an inductive definition of the function Tail, which maps expressions to expressions:

Definition 2.14.
(1) Tail(x) ≡ x.
(2) Tail(τ) ≡ τ.
(3) Tail([x : A] B) ≡ Tail(B).
(4) Tail(⟨A⟩ B) ≡ Tail(B). □
Note that Tail(A) can only be a variable or τ. An expression A can always be written (uniquely) as A ≡ P Tail(A) (in which P denotes a λ-phrase chain, see Notation Rule 2.4).
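Definition 2.14 can be run as a simple loop over the λ-phrase chain P (same toy tuple encoding as in the sketches above, my own):

```python
def tail(e):
    """Tail of an expression: strip abstractors [x : A] and applicators <A>."""
    while e[0] in ("abs", "app"):
        e = e[3] if e[0] == "abs" else e[2]   # descend into the body B
    return e                                   # a variable or tau

# Tail([x : tau] <y> z) = z
expr = ("abs", "x", ("tau",), ("app", ("var", "y"), ("var", "z")))
```

The loop terminates because each step strips exactly one λ-phrase, and what remains when none are left is the unique Tail.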
3. Bound expressions
An occurrence of a variable in an expression can be a free, a bound or a binding occurrence. We shall introduce these well-known notions in our system too. An occurrence of a variable in an expression is binding if and only if that occurrence immediately follows an opening bracket [. If D contains an occurrence of x (i.e.: x ⊂ D), then that occurrence of x is either bound (and there is a unique binding occurrence of x which binds that bound occurrence) or free. A formal description is given in the following inductive definition. In this definition we often encounter “corresponding” occurrences of x. For easy understanding we shall not use our formalism concerning occurrences (see Section II.1), but we shall introduce “a certain x” and refer to it as “that x”.

Definition 3.1.
(1) x is free in x.
(2) Let a certain x be free in A or B. Then that x is free in ⟨A⟩ B.
(3) Let a certain x be free in A. Then that x is free in [y : A] B (both if y ≡ x and if y ≢ x).
(4) Let a certain x be free in B. Then that x is free in [y : A] B if y ≢ x, but that x is bound in [x : A] B (by the binding x occurring in [x : A] B after [).
(5) Let a certain x in A be bound by a binding x in A, or let a certain x in B be bound by a binding x in B. Then that x is bound by the corresponding binding x in both ⟨A⟩ B and [y : A] B (also if y ≡ x). □

The binding x occurring in [x : A] B after [ binds precisely the free x’s in B (if any). We shall mainly be interested in expressions in which no variable is free, called bound expressions (in the literature also called closed expressions). In bound expressions the same binding variable can occur in different instances. This cannot, however, give rise to confusion as to the connection between a bound variable and the binding variable by which it is bound.
Yet, for practical reasons, we wish to avoid such expressions. We call a bound expression in which all binding variables are different a distinctly bound expression, and we restrict ourselves to the set of all distinctly bound expressions, which we call ∆. This is no essential restriction: every interesting theory concerning bound expressions can be restricted to distinctly bound expressions. Let x ⊂ D after S and let this x be bound in D. It follows from Def. 3.1 that we have D ≡ S₁[x : E] F S₂ such that [x : E] F ⊂ D, x ⊂ F after S₃, S₁[x : E] S₃ ≡ S and the x occurring in D after S₁[ binds the x occurring in D after S. We shall call [x : E] the binding abstractor of the bound x. From this and Th. 2.13 (1) it follows:

Theorem 3.2. If x ⊂ D ∈ ∆, then D|x ≡ Q₁[x : E] Q₂x and [x : E] is the binding abstractor of the bound x in D. □

It follows from Th. 2.13 (2):
Theorem 3.3. If for each x ⊂ D there are Q₁, A and Q₂ such that D|x ≡ Q₁[x : A] Q₂x, then D is a bound expression. □
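Definition 3.1 and the notion of a distinctly bound expression can be checked mechanically. A sketch under the same toy encoding (mine, not the thesis's formalism); note that, by clause (3), the binder [y : A] does not bind occurrences in its own type A:

```python
def free_vars(e):
    k = e[0]
    if k == "var":
        return {e[1]}
    if k == "tau":
        return set()
    if k == "app":
        return free_vars(e[1]) | free_vars(e[2])
    # [x : A] B: x stays free in A (clause 3), but is bound in B (clause 4)
    return free_vars(e[2]) | (free_vars(e[3]) - {e[1]})

def binding_vars(e):
    """All binding occurrences, with multiplicity, left to right."""
    if e[0] == "abs":
        return [e[1]] + binding_vars(e[2]) + binding_vars(e[3])
    if e[0] == "app":
        return binding_vars(e[1]) + binding_vars(e[2])
    return []

def distinctly_bound(e):
    bs = binding_vars(e)
    return not free_vars(e) and len(bs) == len(set(bs))

good = ("abs", "x", ("tau",), ("abs", "y", ("var", "x"), ("var", "y")))
bad  = ("abs", "x", ("tau",), ("abs", "x", ("tau",), ("var", "x")))
```

Here `good` encodes [x : τ] [y : x] y, which is distinctly bound, while `bad` encodes [x : τ] [x : τ] x, which is bound but not distinctly bound.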
The following theorem expresses in an intricate manner the obvious observation that, in case x ⊂ K ⊂ D ∈ ∆, the x is bound by a binding abstractor either outside or inside K.

Theorem 3.4. If x ⊂ K ⊂ D ∈ ∆, then either
(i) D|K ≡ Q₁[x : A] Q₂K, D|x ≡ Q₁[x : A] Q₂Q′x and K|x ≡ Q′x, or
(ii) K|x ≡ Q₁[x : A] Q₂x and D|x ≡ QQ₁[x : A] Q₂x.
In both cases [x : A] is the binding abstractor of x in D.
Proof. Let D|K ≡ QK; then D|x ≡ (by Th. 2.10) (D|K) | x ≡ (QK) | x ≡ Q(K|x) ≡ QQ′x. Hence, by Th. 3.2, either Q ≡ Q₁[x : A] Q₂, or Q′ ≡ Q₁[x : A] Q₂. □

Theorem 3.5. If Q ⟨A⟩ B ∈ ∆, then QA and QB ∈ ∆; if Q [x : A] B ∈ ∆, then QA ∈ ∆.
Proof. Apply Th. 3.3. □

Theorem 3.6. If A ∈ ∆ and B ⊂ A after S, then A|B ∈ ∆.
Proof. First assume that x ⊂ B. Then by Th. 2.10: (A|B) | x ≡ A|x ≡ Q₁[x : C] Q₂x. Next assume that A|B ≡ QB and x ⊂ Q. Then Q ≡ Q₁[y : D] Q₂ and x ⊂ D. By Th. 2.11: D ⊂ A. So (QB)|x ≡ Q₁(D|x) ≡ (Q₁D) | x ≡ (A|D) | x ≡ A|x ≡ Q₁[x : C] Q₂x. Apply Th. 3.3, and note that the binding variables in A|B must be distinct. □

The above theorem states an essential feature of the factor A|B. If A ∈ ∆ and B ⊂ A, then not necessarily B ∈ ∆. But A|B, which is QB for a certain abstractor chain Q, “closes” B in A by placing in front of B those abstractors which necessarily bind all free variables of B. This, together with our previously uttered wish to restrict our expressions as much as possible to ∆, justifies our introduction in Section II.2 of the factor A|B. We continue with three theorems, related to one another.
Theorem 3.7. If Q [x : A] B ∈ ∆ and not x ⊂ B, then QB ∈ ∆.
Proof. Observe the various places where a variable y (≢ x) can occur in QB. □

Theorem 3.8. If QB ∈ ∆ and Q [x : A] B ∈ ∆, then not x ⊂ B.
Proof. The assumption x ⊂ B leads to a contradiction. □

Theorem 3.9. If QA and QB ∈ ∆, if the binding variables in A and B are distinct and if x does not occur as a binding variable in QA or QB, then Q [x : A] B ∈ ∆.
Proof. Again observe the various places where a variable y (≢ x) can occur in Q [x : A] B. □

In general: if QB and QPD ∈ ∆ and x occurs as a binding variable in P, then not x ⊂ B. We say: P has no binding influence on B. The description of ∆ which we gave so far began with (general) expressions and selected the distinctly bound expressions among these. This method is not so practical for theoretical investigations. In the following two theorems we shall indicate how we can compose expressions in ∆ from expressions in ∆. Or, rather, we shall show how expressions in ∆ can be decomposed into smaller expressions, which also belong to ∆.
Theorem 3.10. (1) r E A . (2) If QA E A and if x does not occur in QA, then Q [ x : A]. E A and
Q [t : A]7 E A .
(3) If QA and Qy ∈ ∆, if x does not occur in QA and if x ≢ y, then Q [x : A] y ∈ ∆.
(4) If QA and QB ∈ ∆ and if the binding variables in A and B are distinct, then Q ⟨A⟩ B ∈ ∆.

Proof. It is trivial that Q [x : A] x, Q [x : A] τ, Q [x : A] y and Q ⟨A⟩ B respectively are again expressions. These are also clearly bound expressions, and moreover distinctly bound by the conditions given in the theorem. □

We may consider the four parts of the previous theorem as derivation rules.
Definition 3.11. We call K ∆-constructible if we can establish that K ∈ ∆ by a (finite) number of applications of the rules in Th. 3.10. □

The proof of the following theorem is technical. Yet it is interesting to see how we can establish ∆-constructibility. For better understanding, we shall express the main lines at the end of the present section.

Theorem 3.12. If K ∈ ∆, then K is ∆-constructible.

Proof. Induction on |K|. If |K| = 1, then K ≡ τ and K is ∆-constructible by rule (1). Assume that |K| > 1, and let all distinctly bound expressions K′ with |K′| < |K| be ∆-constructible (first induction hypothesis). Then K ≡ P₁P₂ ... Pₙ Tail(K) for some n ≥ 1, where each of the Pᵢ is a lambda phrase (i.e. either an abstractor or an applicator). We can now prove the lemma: “For all i with 1 ≤ i ≤ n + 1 it holds that K | (Pᵢ ... Pₙ Tail(K)) is ∆-constructible”, by induction on n + 1 − i.
(1) Let i = n + 1 (i.e. Pᵢ ... Pₙ ≡ ∅). If |K | Tail(K)| < |K|, then the first induction hypothesis leaves nothing to prove. If |K | Tail(K)| = |K|, then K | Tail(K) ≡ K ≡ [x₁ : E₁] ... [xₙ : Eₙ] Tail(K) for n ≥ 1.
(a) Assume that Tail(K) ≡ x. Then for exactly one s: xₛ ≡ x. Abbreviate [x₁ : E₁] ... [xₜ : Eₜ] as Qₜ for 0 ≤ t ≤ n. We distinguish the cases xₙ ≡ x and xₙ ≢ x. If xₙ ≡ x, then K is ∆-constructible by rule (2), since Qₙ₋₁Eₙ ∈ ∆ by Th. 3.5. If xₙ ≢ x, then K is ∆-constructible by rule (3), since Qₙ₋₁Eₙ ∈ ∆ and Qₙ₋₁x ∈ ∆ (the latter by Th. 3.7).
(b) Assume that Tail(K) ≡ τ. Then Qₙ₋₁Eₙ ∈ ∆ by Th. 3.5 and ∆-constructible by induction. By rule (2) we then find that K is ∆-constructible.
(2) Let 1 ≤ i ≤ n and assume that K | (Pⱼ ... Pₙ Tail(K)) is ∆-constructible if i < j ≤ n + 1 (second induction hypothesis). If |K | (Pᵢ ... Pₙ Tail(K))| < |K|, then again the first induction hypothesis leaves nothing to prove. So let |K | (Pᵢ ... Pₙ Tail(K))| = |K|. Then K ≡ K | (Pᵢ ... Pₙ Tail(K)) ≡ [x₁ : E₁] ... [xₜ : Eₜ] Pᵢ ... Pₙ Tail(K).
(a) Let Pᵢ ≡ [xₜ₊₁ : Eₜ₊₁]. Then we have that K | (Pᵢ ... Pₙ Tail(K)) ≡ K | (Pᵢ₊₁ ... Pₙ Tail(K)), which is ∆-constructible by the second induction hypothesis.
(b) Let Pᵢ ≡ ⟨F⟩. Then QₜF ∈ ∆ by Th. 3.5 and ∆-constructible by the second induction hypothesis, and the same holds for QₜPᵢ₊₁ ... Pₙ Tail(K). Hence, by rule (4): Qₜ⟨F⟩Pᵢ₊₁ ... Pₙ Tail(K) ≡ K is ∆-constructible.
From this lemma it follows that K | (P₁ ... Pₙ Tail(K)) ≡ K|K ≡ K is ∆-constructible. (In this proof we did not check the conditions concerning variables in rules (1) to (4). It is easy to see that these are fulfilled in the appropriate places.) □
(1) we first establish that K I Tail(K) is A-constructible: (a) if Tail(K) = z, then find the binding abstractor [z : A] in K which binds K, establish that K I A = QA is A-constructible, and apply
be the abstracrule (2) to obtain Q[z : A]. E A. Let Q{,Qh, ...,Qi tors occurring in K 1 Tail(K) “between” [z : A] and z. Insert these abstractors, starting with Q{ (from left to right), by inserting Q; in Q [z : A] Q{ ... Q:-,z between Q:-l and z (by rule (3)). In this manner we establish that K I Tail(K) is A-constructible. (b) If Tail(K) = 7 , then K I Tail(K) = 7 and we may use rule (1) immediately, or K I Tail(K) = Q [z : A] 7 . In the latter case: establish that QA is A-constructible, and apply rule (2) to establish that K I Tail(K) is also A-constructible.
(2) We established that K | Tail(K) ≡ [x₁ : A₁] ... [xₙ : Aₙ] Tail(K) is ∆-constructible. In K we find applicators ⟨B₁⟩, ..., ⟨Bₗ⟩ “between” the abstractors [xᵢ : Aᵢ]. Insert these Bᵢ, starting with Bₗ and ending with B₁ (from right to left), in the appropriate places, using rule (4). In this manner we establish that K is ∆-constructible. (Note the following: if we establish that K is ∆-constructible, then we use the ∆-constructibility of K|E for all E ⊂ K. We can prove this by induction on the length of proof of K ∈ ∆.)

4. Replacement, renovation and α-reduction
If we replace a certain variable x in all its occurrences (free, bound or binding) in an expression A by a variable y, then we denote the result of this replacement by ((x := y)) A. An inductive definition of simple replacement is the following (induction here is on the length of the expression):

Definition 4.1. For each pair x and y, ((x := y)) is a function from expressions to expressions.
(1) ((x := y)) x ≡ y; ((x := y)) z ≡ z if z ≢ x; ((x := y)) τ ≡ τ.
(2) ((x := y)) [z : A] B ≡ [((x := y)) z : ((x := y)) A] ((x := y)) B.
(3) ((x := y)) ⟨A⟩ B ≡ ⟨((x := y)) A⟩ ((x := y)) B. □
The simple replacement of certain variables by others will be used for making the binding variables in an expression distinct. This we shall call the renovation of the expression. Renovation is in fact nothing but a repeated renaming of variables. Renaming does not affect relevant properties of expressions (under a reasonable interpretation of “variable”; see also what we said concerning α-equivalent terms in Section I.1).
We have maintained names for variables for reasons of tradition and legibility. This is at the expense of the renovation selector (to be introduced in this section) and the so-called α-reduction (our wishes concerning bound expressions in the previous section had nothing to do with names for variables, but with our dislike of the occurrence of free variables; the additional wish to have distinctly bound expressions, however, does concern names for variables, and leads us to introduce renovation). We shall discuss the process of renovation. The mathematical meaning of renovation is not profound. If one dislikes a formalization of an intuitively clear concept, one can continue reading at Def. 4.4 (concerning α-reduction). There is, of course, one precaution which one must take in the renovation process: the relation between bound and binding variables should remain unaffected in a natural way. For example, in [x : y] [x : x] x the binding x occurring after [ binds the bound x occurring after [x : y] [x : ; the binding x occurring after [x : y] [ binds the bound x occurring at the end of the expression. In changing this expression into an expression with distinct binding variables, we might obtain: [x : y] [z : x] z for a certain z ≢ x. Such a variable z as introduced in the latter example by the renovation process plays a special rôle. It has to be chosen with care. At any rate it should be different from all binding variables in the expression under discussion, or, as we shall say: it has to be fresh with respect to that expression. We introduce the renovation selector Fr_V, operating on expressions. In using the renovation selector Fr_V with an expression A we have it preceded by a lambda phrase chain P, giving P Fr_V A. The subscript V denotes a finite set of variables, which can be empty. We shall not specify the variables belonging to V until the following section, where we use Fr_V in the formal definition of substitution.
The expression P Fr_V A can informally be described as being PA′, where
(1) A′ is obtained from A by renovation of A, and
(2) the fresh variables chosen during this renovation do not occur in P or V and are mutually distinct.
The following inductive definition gives a formalization of this concept:
Definition 4.2. Let V be a finite set of variables.
(1) P Fr_V x = Px; P Fr_V τ = Pτ.
(2) P Fr_V ([y : B] C) = P [z : B′] Fr_V (((y := z)) C), with PB′ = P Fr_V B, while z does not occur in PB′ and z ∉ V.
(3) P Fr_V ((B) C) = P (B′) Fr_V C, with PB′ = P Fr_V B. □
Strong normalization in a typed lambda calculus (C.3)
413
From the above definition we see that the renovation of an expression takes place from left to right. For instance, the renovation resulting in P Fr_V ((B) C) first requires the renovation resulting in P Fr_V B, and subsequently the one resulting in P (B′) Fr_V C. This implies that the fresh variables chosen in the renovation process are mutually distinct. Of course uncertainties remain as to the choice of fresh variables. An order in the set of all variables could turn the selector Fr_V into an operator. We shall, however, not push the formalization this far. In the following section we shall describe substitution by the aid of the renovation selector, and we shall in turn use substitution in describing the β-reduction. Our use of the renovation selector is meant to keep an expression distinctly bound after β-reduction. We shall use the renovation selector in typing an expression, as is described in Section II.8, with the same purpose. We usually begin renovation with V = ∅ (here, of course, ∅ denotes the empty set and not the empty string).
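Viewed operationally, Def. 4.2 is a left-to-right traversal that gives every abstractor a fresh binding variable. The following is a minimal sketch under assumed conventions that are ours, not the text's: expressions are nested tuples ('var', x), ('tau',), ('abs', x, B, C) for [x : B] C and ('app', B, C) for (B) C, and fresh variables are drawn from the scheme z0, z1, .... Note that this sketch renames every binder, which is what Def. 4.2 prescribes (the informal example above renamed only the clashing one).

```python
import itertools

def fresh(used):
    """Pick a variable not in `used` and reserve it (mutates `used`)."""
    for i in itertools.count():
        z = f"z{i}"
        if z not in used:
            used.add(z)
            return z

def rename_all(t, x, y):
    """((x := y)) t: replace ALL occurrences of x, bound and binding, by y."""
    if t[0] == "var":
        return ("var", y) if t[1] == x else t
    if t[0] == "tau":
        return t
    if t[0] == "abs":
        _, v, dom, body = t
        return ("abs", y if v == x else v,
                rename_all(dom, x, y), rename_all(body, x, y))
    return ("app", rename_all(t[1], x, y), rename_all(t[2], x, y))

def renovate(t, used):
    """Fr_V t: make all binding variables fresh and mutually distinct,
    working from left to right as in Def. 4.2."""
    if t[0] in ("var", "tau"):
        return t
    if t[0] == "abs":
        _, v, dom, body = t
        dom2 = renovate(dom, used)   # first renovate the domain B
        z = fresh(used)              # then choose a fresh z
        return ("abs", z, dom2, renovate(rename_all(body, v, z), used))
    return ("app", renovate(t[1], used), renovate(t[2], used))
```

Seeding `used` with the variables already present plays the role of the set V.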
Definition 4.3. P Fr A = P Fr_∅ A. □
In fact P Fr A is the concatenation of P and Fr_W A, where W is the set of variables occurring in P. Let P Fr_V A = PA′ be the result of a renovation. If we again write Fr_V A, in the same context, we do not require a new renovation but mean A′. If we wish another renovation in the same context, then we supply Fr_V with primes: P Fr′_V A can be such a new renovation. We shall now define the α-reduction relation. For an informal discussion of α-reduction see Section 1.1. We restrict α-reduction to expressions in Λ:
Definition 4.4. α-reduction, denoted by ≥_α, is the transitive relation generated by:
If A ∈ Λ and if y does not occur in A, then A ≥_α ((x := y)) A. □
The α-reduction is clearly an equivalence relation (reflexivity: take x to be a variable which does not occur in A; symmetry: note that x no longer occurs in ((x := y)) A). If two expressions are related by α-reduction (A ≥_α B), we speak of "the α-reduction A ≥_α B". This is clearly abuse of language, although it cannot give rise to confusion. A renaming of a single variable in a distinctly bound expression is called a single-step α-reduction, denoted as A ≥¹_α B (so A ≥¹_α B if and only if A ∈ Λ and B ≡ ((x := y)) A, where x occurs as a binding variable in A and y does not occur in A).
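A single-step α-reduction is then one guarded renaming. A sketch, re-declaring the hypothetical tuple encoding and helpers so the fragment stands alone (names are ours, not the text's):

```python
def rename_all(t, x, y):
    """((x := y)) t: replace all occurrences of x by y."""
    if t[0] == "var":
        return ("var", y) if t[1] == x else t
    if t[0] == "tau":
        return t
    if t[0] == "abs":
        _, v, dom, body = t
        return ("abs", y if v == x else v,
                rename_all(dom, x, y), rename_all(body, x, y))
    return ("app", rename_all(t[1], x, y), rename_all(t[2], x, y))

def variables(t):
    """All variables occurring in t (bound, binding or free)."""
    if t[0] == "var":
        return {t[1]}
    if t[0] == "tau":
        return set()
    if t[0] == "abs":
        return {t[1]} | variables(t[2]) | variables(t[3])
    return variables(t[1]) | variables(t[2])

def binding_variables(t):
    if t[0] in ("var", "tau"):
        return set()
    if t[0] == "abs":
        return {t[1]} | binding_variables(t[2]) | binding_variables(t[3])
    return binding_variables(t[1]) | binding_variables(t[2])

def alpha_step(t, x, y):
    """A >=1_alpha ((x := y)) A: x must be a binding variable of t,
    and y must not occur in t at all (the side conditions of Def. 4.4)."""
    assert x in binding_variables(t) and y not in variables(t)
    return rename_all(t, x, y)
```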
The following theorems are trivial:
Theorem 4.5. If PA ∈ Λ, then P Fr_V A ∈ Λ; if A ∈ Λ and A ≥_α B, then B ∈ Λ and |A| = |B|. □
Theorem 4.6. Let PA, PB, PP′A and PP′B ∈ Λ. Then PA ≥_α PB if and only if PP′A ≥_α PP′B. □
5. Substitution and β-reduction
Substitution is an operation acting on expressions. We denote "the result of the substitution of A for x in B" by (x := A) B. One can use several definitions of substitution which are equivalent. We shall use the definitions given in Def. 5.1 and Def. 5.2. These definitions of substitution can informally be described as follows: P (x := A)_V B is the expression which we obtain from PB by replacing all free x's in B with renovations of A, in which the fresh variables are chosen in the following manner: they have to be mutually distinct for all renovations of A replacing the free x's, and they have to be distinct from the variables occurring in B, P or V (we do not replace the binding variables in B by fresh ones). Here V denotes a finite set of variables, which can be empty. This careful dealing with fresh variables is necessary to guarantee that an expression with distinct binding variables has again distinct binding variables after β-reduction (to be defined in this section); substitution is an essential part of β-reduction. The following part of this section, as far as Def. 5.5, will formalize the above notion of substitution. As with renovation, our formalization of substitution may be cumbersome to the reader. One may continue with Def. 5.5 without impairing understanding. An inductive definition of P (x := A)_V B is the following (induction is here on the length of B):
Definition 5.1.
(1) P (x := A)_V x = P Fr_V A; P (x := A)_V y = Py if y ≢ x; P (x := A)_V τ = Pτ.
(2) If y ≢ x, then P (x := A)_V [y : B] C = P [y : B′] (x := A)_V C, where PB′ = P (x := A)_W B, W being the union of V and the set of all variables occurring in C. If y ≡ x, then P (x := A)_V [y : B] C = P [y : B′] C, where B′ is obtained as above.
(3) P (x := A)_V (B) C = P (B′) (x := A)_V C, where B′ is obtained as above. □
From the above definition we see that substitution (as renovation) takes place from left to right in an expression: for instance, the substitution resulting in P (x := A)_V (B) C first requires the substitution resulting in P (x := A)_W B, and subsequently the substitution resulting in P (B′) (x := A)_V C. In part (2) of the definition we see the importance of the subscript used in (x := A): in executing the substitution resulting in P (x := A)_W B we must have at our disposal the set of variables occurring in C, in order to be able to choose fresh variables different from the variables in C. The set W contains the latter variables. The set V can be the empty set. When we begin a substitution, V is usually empty.
Definition 5.2. P (x := A) B ≡ P (x := A)_∅ B. □
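Def. 5.1 can also be read as an algorithm: walk B from left to right, replacing each free x by a renovation of A, with all fresh variables drawn from one shared pool so that the copies are mutually distinctly bound. A sketch, with the same hypothetical tuple encoding as before; seeding the pool with every variable in sight subsumes the W = V ∪ vars(C) bookkeeping of part (2):

```python
import itertools

def fresh(used):
    for i in itertools.count():
        z = f"z{i}"
        if z not in used:
            used.add(z)
            return z

def rename_all(t, x, y):                       # ((x := y)) t
    if t[0] == "var":
        return ("var", y) if t[1] == x else t
    if t[0] == "tau":
        return t
    if t[0] == "abs":
        _, v, d, b = t
        return ("abs", y if v == x else v,
                rename_all(d, x, y), rename_all(b, x, y))
    return ("app", rename_all(t[1], x, y), rename_all(t[2], x, y))

def renovate(t, used):                         # Fr_V t (Def. 4.2)
    if t[0] in ("var", "tau"):
        return t
    if t[0] == "abs":
        _, v, d, b = t
        d2 = renovate(d, used)
        z = fresh(used)
        return ("abs", z, d2, renovate(rename_all(b, v, z), used))
    return ("app", renovate(t[1], used), renovate(t[2], used))

def substitute(t, x, a, used):
    """(x := A) t: replace each free x by a fresh renovation of A.
    Binding variables of t itself are NOT renamed (Def. 5.1)."""
    if t[0] == "var":
        return renovate(a, used) if t[1] == x else t
    if t[0] == "tau":
        return t
    if t[0] == "abs":
        _, v, d, b = t
        d2 = substitute(d, x, a, used)
        if v == x:                             # x is rebound: stop here
            return ("abs", v, d2, b)
        return ("abs", v, d2, substitute(b, x, a, used))
    return ("app", substitute(t[1], x, a, used),
                   substitute(t[2], x, a, used))
```

Each occurrence of x receives its own copy of A with distinct binders, as the test below illustrates.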
In the above definitions there are two more or less unusual parts. Usually (x := A) x is defined as A; however, with a view to our wish to keep distinctly bound expressions distinctly bound after some substitution, we deviate from this. Next, one sometimes defines (x := A) [y : B] C, in either of the cases that x ≡ y or x ≢ y, as [z : (x := A) B] (x := A) ((y := z)) C, z being a fresh variable. The latter definition prevents so-called "confusion of variables" (cf. [Curry and Feys 58, Ch. 3D2]; see also the example below), but gives rise to an additional amount of simple substitutions of variables of the form (y := z), which we find cumbersome. In using Def. 5.1 and Def. 5.2 confusion of variables may occur if the use of the substitution operator is not restricted. For example, we have that: [y : A] (x := y) [y : τ] x = [y : A] [y : τ] y, where the final y in the latter expression is influenced by [y : τ], and not by [y : A] as it should be. In general: confusion of variables may arise as a consequence of the substitution resulting in P (x := A)_V B if a free variable y of A (with y ≢ x) occurs as a binding variable in B, and if there is a free x in B within the "scope" of that binding variable y. A sufficient condition for avoiding this is that the free variables of A do not occur as binding variables in B. We use substitution only in the relation β-reduction, defined later in this section. The above condition is there fulfilled. Hence confusion of variables cannot arise in our system. Note that (x := A) operates on free x's, ((x := A)) on all x's, and Fr on all binding variables in an expression. We also define the substitution operator for lambda phrase chains:
Definition 5.3. If (x := A) Pτ = P′τ, then (x := A) P = P′. □
One may interchange the substitution operator and the renovation selector under certain conditions:
Theorem 5.4. Let PA ∈ Λ and P [x : B] D ∈ Λ. Then P Fr (x := A) D = P (x := A) Fr D if no binding variable of D occurs in A.
Proof. Induction on |D|, using Th. 4.5 and the lemma: "((y := z)) (x := A) C = (x := A) ((y := z)) C if y ∉ A (but for renaming)". □
Substitution is used in the more interesting reduction in lambda calculus called β-reduction, which we denote by ≥_β. The interpretation linked with β-reduction is the application of a function to an argument (see also the informal description in Section 1.1). We shall restrict β-reduction to expressions belonging to Λ. This is unusual. One usually conceives of a reduction as a formal relation between expressions in which free variables may occur. It is only our preference for distinctly bound expressions which makes us choose the definitions given below. Note that our β-reduction is not essentially different from the usual concept. The use of the Q in Def. 5.5 is a little obscuring in this respect. We first define single-step β-reduction, denoted by ≥¹_β:
Definition 5.5. Single-step β-reduction is the relation generated by:
(1) If Q (A) [x : B] C ∈ Λ, then Q (A) [x : B] C ≥¹_β Q (x := A) C.
(2) Let Q (A) C and Q (A) D ∈ Λ. If QC ≥¹_β QD, then Q (A) C ≥¹_β Q (A) D.
(3) Let Q [x : A] C and Q [x : B] C ∈ Λ. If QA ≥¹_β QB, then Q [x : A] C ≥¹_β Q [x : B] C.
(4) Let Q (A) C and Q (B) C ∈ Λ. If QA ≥¹_β QB, then Q (A) C ≥¹_β Q (B) C. □
Note: one rule appears to be missing (viz.: Let Q [x : A] C and Q [x : A] D ∈ Λ. If QC ≥¹_β QD, then Q [x : A] C ≥¹_β Q [x : A] D). But this is a derived rule, see Cor. 5.14. Rules (2), (3) and (4) in the above definition are called the monotony rules of single-step β-reduction; we call rule (1) the rule of elementary β-reduction. If K and L are related by a single-step β-reduction, we obtain L from K by replacing a certain subexpression (A) [x : B] C in K by (x := A) C. This is, in terms of interpretation, a single functional application. If K ≥¹_β L, then we have a construction (a "proof") which establishes that relation according to Def. 5.5. Such a construction consists of one derivation step which is an elementary β-reduction (rule (1) of Def. 5.5) followed by a number of derivation steps which are monotony steps (rules (2) to (4)). Note that a single-step β-reduction is achieved from a number of derivation steps. We note that, since Q (A) [x : B] C ∈ Λ, no free variable of A can occur as binding variable in C. This is sufficient, as previously stated, to prevent "confusion of variables".
Definition 5.6. β-reduction is the reflexive and transitive closure of single-step β-reduction. (This means that
(1) if K ≥¹_β L, then K ≥_β L;
(2) K ≥_β K;
(3) if K ≥_β L and L ≥_β M, then K ≥_β M.) □
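For expressions in Λ, the condition noted above — no free variable of A occurs as a binding variable in C — makes plain textual substitution safe, so an elementary β-step can be sketched without the renovation machinery. The tuple encoding and helper names below are our own assumptions, and the leftmost-redex strategy is our choice, not the text's:

```python
def subst_free(t, x, a):
    """(x := A): replace free x's by A textually. Safe when no free variable
    of A is bound in t (the condition imposed for redexes in Lambda).
    Renovation of the copies (Def. 5.1) is omitted in this sketch."""
    if t[0] == "var":
        return a if t[1] == x else t
    if t[0] == "tau":
        return t
    if t[0] == "abs":
        _, v, d, b = t
        return ("abs", v, subst_free(d, x, a),
                b if v == x else subst_free(b, x, a))
    return ("app", subst_free(t[1], x, a), subst_free(t[2], x, a))

def beta_step(t):
    """One elementary beta-step at the leftmost redex (A)[x:B]C, or None."""
    if t[0] in ("var", "tau"):
        return None
    if t[0] == "app":
        a, c = t[1], t[2]
        if c[0] == "abs":                      # (A)[x:B]C  ->  (x := A)C
            _, x, _b, body = c
            return subst_free(body, x, a)
        r = beta_step(a)
        if r is not None:
            return ("app", r, c)
        r = beta_step(c)
        return None if r is None else ("app", a, r)
    _, v, d, b = t                             # abstractor [v:d]b
    r = beta_step(d)
    if r is not None:
        return ("abs", v, r, b)
    r = beta_step(b)
    return None if r is None else ("abs", v, d, r)
```

Iterating `beta_step` until it returns None is one way to search for a normal form, though nothing in this sketch guarantees termination.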
If A and B are related by a single-step β-reduction, we speak of "the single-step β-reduction A ≥¹_β B". As with α-reduction, this is abuse of language. Analogously we speak of "the β-reduction A ≥_β B". If A₀ ≥¹_β A₁, A₁ ≥¹_β A₂, ..., A_{n−1} ≥¹_β A_n, we also write A₀ ≥¹_β A₁ ≥¹_β ... ≥¹_β A_n, or A₀ ≥ⁿ_β A_n. We call this a composite single-step β-reduction, or an n-step β-reduction (a zero-step β-reduction has, of course, the form A ≥⁰_β A). If A₀ ≥_β A₁, ..., A_{n−1} ≥_β A_n, we also write A₀ ≥_β A₁ ≥_β ... ≥_β A_n. It follows from the definition of β-reduction that each β-reduction K ≥_β L can be presented as an n-step β-reduction K ≡ A₀ ≥¹_β ... ≥¹_β A_n ≡ L. This splitting is called a decomposition of a β-reduction. Each of the monotony rules has the form: "If reduction (i) holds, then reduction (ii) holds". It is usual to call reduction (ii) the direct consequence of reduction (i). For example: Q (A) C ≥_β Q (A) D is a direct consequence of QC ≥_β QD. We recall that "the length of proof of K ≥¹_β L" is the total number of derivation steps in the proof of K ≥¹_β L. A proof of K ≥¹_β L begins with an elementary β-reduction Q (A) [x : B] C ≥¹_β Q (x := A) C. In this case we say that Q (A) [x : B] C generates the single-step β-reduction K ≥¹_β L. The following theorem holds:
Theorem 5.7. Let K ∈ Λ. Then Q (A) [x : B] C generates a single-step β-reduction of the form K ≥¹_β L if and only if (A) [x : B] C ⊂ K and Q (A) [x : B] C ≡ K | (A) [x : B] C.
Proof. ⇒: Induction on the length of proof of K ≥¹_β L. ⇐: We state the following lemma: "Let K ≡ Q′M ∈ Λ, and let E ≡ (A) [x : B] C ⊂ M. Then K | E generates a single-step β-reduction K ≡ Q′M ≥¹_β Q′N". We prove this lemma by induction on |M|:
(1) Let M ≡ E. Then K | E ≡ K ≡ Q′ (A) [x : B] C ≥¹_β Q′ (x := A) C.
(2a) Let M ≡ [z : F] G and E ⊂ F. Call K′ ≡ Q′F ∈ Λ; then E ⊂ F, hence, by induction: K′ | E ≡ Q′ (F | E) generates a single-step β-reduction K′ ≡ Q′F ≥¹_β Q′F′. By applying monotony rule (3) it follows that K′ | E also generates Q′M ≡ Q′ [z : F] G ≥¹_β Q′ [z : F′] G. Moreover, K | E ≡ K′ | E.
(2b) Let M ≡ [z : F] G and E ⊂ G. Take K′ ≡ K ≡ Q′ [z : F] G ≡ Q″G; then E ⊂ G, hence, by induction: K′ | E generates a single-step β-reduction Q″G ≥¹_β Q″G′, which can be rewritten as Q′M ≥¹_β Q′ [z : F] G′.
(3a) Let M ≡ (F) G and E ⊂ F. The proof is analogous to the one in case (2a), using monotony rule (4) instead of (3).
(3b) Let M ≡ (F) G and E ⊂ G. Again the proof is analogous to the one in case (2a), now using monotony rule (2) instead of (3).
This proves the lemma. The "if-part" of the theorem follows immediately from the lemma. □
We state the "closure theorem for Λ with respect to single-step β-reduction":
Theorem 5.8. If K ∈ Λ and K ≥¹_β L, then L ∈ Λ.
Proof. Induction on the length of proof of K ≥¹_β L. In the proof we can use Th. 3.2. Note that our definition of substitution for a variable with the aid of the renovation selector is essential. □
Corollary 5.9. If K ∈ Λ and K ≥_β L, then L ∈ Λ. □
One may conceive of the β-reduction relation, not as a relation between expressions, but as a relation between α-equivalence classes (in the α-equivalence class of K we include all K′ such that K ≥_α K′). The following theorem gives a justification for this conception of β-reduction:
Theorem 5.10. Let A ∈ Λ, A ≥¹_β B and A ≥_α C. Then there is a reduction C ≥¹_β D such that B ≥_α D.
Proof. It is sufficient to assume that A ≥¹_α C. Apply induction on the length of proof of A ≥¹_β B. □
In the sequel we shall sometimes refer to the above conception of β-reduction by inserting the words "but for α-reduction" in a statement (for example: "A ≥_β B but for α-reduction" means that there are A′ and B′ such that A ≥_α A′ ≥_β B′ ≥_α B). However, we often omit the words "but for α-reduction".
We shall proceed with a number of theorems in which especially the rôle of the abstractor chain Q in a β-reduction is considered, Q occurring at the beginning of an expression. (The definition of ‖P‖ was given after Notation Rule 2.4.)
Theorem 5.11. If QE ∈ Λ, QE ≥¹_β Q′F and ‖Q‖ = ‖Q′‖, then Q ≡ Q′ or E ≡ F. In the latter case Q ≡ Q₁ [x : K] Q₂, Q′ ≡ Q₁ [x : L] Q₂ and Q₁K ≥¹_β Q₁L.
Proof. Induction on the length of proof of QE ≥¹_β Q′F. There are four possible cases for the last derivation step in the proof of QE ≥¹_β Q′F. In three of these cases the conclusion is: Q ≡ Q′. The fourth case is that the last derivation step in the proof of QE ≥¹_β Q′F is: "Q₁K ≥¹_β Q₁L, so QE ≡ Q₁ [x : K] M ≥¹_β Q₁ [x : L] M ≡ Q′F". Now if ‖Q‖ ≤ ‖Q₁‖, then Q ≡ Q′, and if ‖Q‖ > ‖Q₁‖, then E ≡ F, Q ≡ Q₁ [x : K] Q₂ and Q′ ≡ Q₁ [x : L] Q₂. □
Theorem 5.12. If QE ∈ Λ and QE ≥¹_β K, then K ≡ Q′F′ for certain Q′ and F′ with ‖Q′‖ = ‖Q‖.
Proof. The reduction QE ≥¹_β K must be the conclusion of one of the rules of Def. 5.5. It is easy to see that the statement of the theorem holds good in all these cases. □
If a reduction QC ≥_β Q′D is given, in which ‖Q‖ = ‖Q′‖, one can conceive of an accompanying "reduction" of C to D. The following theorem shows this.
Theorem 5.13. If QC, Q₀C and Q₀D ∈ Λ, QC ≥_β Q′D and ‖Q‖ = ‖Q′‖, then Q₀C ≥_β Q₀D.
Proof. First assume that QC ≥¹_β Q′D. Then by Th. 5.11, Q ≡ Q′ or C ≡ D. In the latter case it is trivial that Q₀C ≥_β Q₀D. So assume Q ≡ Q′. Then induction on the length of proof of QC ≥¹_β Q′D and Th. 5.12 yield Q₀C ≥¹_β Q₀D. The general theorem follows. □
The apparently missing monotony rule, announced immediately after Def. 5.5, is a consequence:
Corollary 5.14. Let QC, QD, Q [x : A] C and Q [x : A] D ∈ Λ. If QC ≥¹_β QD, then Q [x : A] C ≥¹_β Q [x : A] D. □
We defined β-reduction as being transitive and reflexive. We shall now show that β-reduction is also monotonous:
Theorem 5.15. The monotony rules hold for β-reduction, i.e.:
(1) If Q (A) C and Q (A) D ∈ Λ, and QC ≥_β QD, then Q (A) C ≥_β Q (A) D.
(2) If Q [x : A] C and Q [x : B] C ∈ Λ, and QA ≥_β QB, then Q [x : A] C ≥_β Q [x : B] C.
(3) If Q (A) C and Q (B) C ∈ Λ, and QA ≥_β QB, then Q (A) C ≥_β Q (B) C.
Proof. We shall prove rule (1). Since QC ≥_β QD, we know that QC ≥¹_β E₁ ≥¹_β E₂ ≥¹_β ... ≥¹_β E_n ≡ QD. From Th. 5.12, Th. 5.13 and induction on n it follows that there is also a reduction QC ≥¹_β QF₁ ≥¹_β QF₂ ≥¹_β ... ≥¹_β QF_n ≥¹_β QD. Repeated application of the corresponding monotony rule for single-step β-reduction gives: Q (A) C ≥¹_β Q (A) F₁ ≥¹_β ... ≥¹_β Q (A) D, hence Q (A) C ≥_β Q (A) D. The other two monotony rules for β-reduction can be proved analogously. □
One may extend Cor. 5.14 to β-reduction:
Theorem 5.16. Let QC, QD, Q [x : A] C and Q [x : A] D ∈ Λ. If QC ≥_β QD, then Q [x : A] C ≥_β Q [x : A] D. □
Theorem 5.17. If QC, PC and PD ∈ Λ, QC ≥_β Q′D and ‖Q‖ = ‖Q′‖, then PC ≥_β PD.
Proof. Let PC ≡ Q₀C; then Q₀C ≥_β Q₀D follows from Th. 5.13. The theorem is proved by repeated application of monotony rule (2) for β-reduction (see Th. 5.15). □
Note: the converse of this theorem does not hold ("if PC, QC and QD ∈ Λ and PC ≥_β PD, then QC ≥_β QD"). Example (Q ≡ [x : τ], P ≡ [x : τ] (x)): [x : τ] (x) [y : τ] (x) [z : τ] y ≥_β [x : τ] (x) [z : τ] x, but not: [x : τ] [y : τ] (x) [z : τ] y ≥_β [x : τ] [z : τ] x.
Theorem 5.18. If P (A) [x : B] C ∈ Λ, then P (A) [x : B] C ≥¹_β P (x := A) C.
Proof. This is a consequence of the following: (P (A) [x : B] C) | (A) [x : B] C ≡ Q (A) [x : B] C ≥¹_β Q (x := A) C. Apply Th. 5.17. □
Theorem 5.19. If QK, QM and Q′M ∈ Λ, QK ≥_β Q′L and ‖Q‖ = ‖Q′‖, then QM ≥_β Q′M.
Proof. Along the same lines as the proof of Th. 5.16. □
The following theorems are trivial consequences of the preceding.
Theorem 5.20. If Q [y : K] L ∈ Λ and Q [y : K] L ≥_β Q [y : K′] L′, then QK ≥_β QK′. □
Theorem 5.21. If QK ∈ Λ, QK ≥_β Q′K′ and ‖Q‖ = ‖Q′‖, then QK ≥_β QK′ ≥_β Q′K′ and QK ≥_β Q′K ≥_β Q′K′. □
We define the beta equivalence relation as follows:
Definition 5.22. Let A and B ∈ Λ. We call A beta equivalent to B (denoted: A ∼_β B) if there is an expression C such that A ≥_β C and B ≥_β C. □
It is clear that beta equivalence is reflexive and symmetric. The transitivity of beta equivalence will be proved in Th. 7.35, using Th. 6.43 (in the literature the transitive closure of beta equivalence is called beta conversion).
Theorem 5.23. Let QK and QL ∈ Λ. If QK ∼_β QL, there is an N such that QK ≥_β QN and QL ≥_β QN.
Proof. Since QK ∼_β QL: QK ≥_β A and QL ≥_β A. From Th. 5.12 it follows that A ≡ Q′N with ‖Q‖ = ‖Q′‖. Then from Th. 5.21: QK ≥_β QN and QL ≥_β QN. □
From this theorem, together with Th. 5.15, it easily follows that the monotony rules hold for beta equivalence (parts (a), (c) and (d) of the following theorem); part (b) follows from Th. 5.23 and Th. 5.16:
Theorem 5.24.
(a) If QC, QD, Q (A) C, Q (A) D ∈ Λ and QC ∼_β QD, then Q (A) C ∼_β Q (A) D.
(b) If QC, QD, Q [x : A] C, Q [x : A] D ∈ Λ and QC ∼_β QD, then Q [x : A] C ∼_β Q [x : A] D.
(c) If QA, QB, Q (A) C, Q (B) C ∈ Λ and QA ∼_β QB, then Q (A) C ∼_β Q (B) C.
(d) If QA, QB, Q [x : A] C, Q [x : B] C ∈ Λ and QA ∼_β QB, then Q [x : A] C ∼_β Q [x : B] C. □
6. Other β-reductions
In a β-reduction we eliminate a pair of the form (A) [x : B] occurring in an expression, obtaining "copies" of A (to be precise: expressions A′, A″, etc., which are renovations of A) instead of the non-binding x's in that expression. We sometimes wish to retain information concerning "past" β-reductions, as a kind of "scar" in an expression. The easiest way to do this is to maintain the pair (A) [x : B] in an expression after β-reduction. We shall formalize this kind of β-reduction, calling it β₁-reduction. Another β-reduction, called β₂-reduction, will be introduced especially to eliminate the "scars" (A) [x : B] as soon as they are no longer required. We shall show that a β-reduction can be decomposed into a β₁-reduction and a β₂-reduction. We describe in this section β₁- and β₂-reduction as a preparation for Section III.3.
The fact that we wish to keep the pair (A) [x : B] in an expression after β₁-reduction complicates matters, since we wish each sequence of β-reductions to have a corresponding sequence of β₁-reductions. For example, a sequence of β-reductions Q (A) (B) [x : τ] [y : x] y ≥¹_β Q (A) [y : Fr B] y ≥¹_β Q Fr A should have its counterpart in β₁-reductions: Q (A) (B) [x : τ] [y : x] y ≥¹_β₁ Q (A) (B) [x : τ] [y : Fr B] y ≥¹_β₁ Q (A) (B) [x : τ] [y : Fr B] Fr A. Note that in the latter single-step β₁-reduction we have to ignore the scar (B) [x : τ] located between (A) and [y : Fr B] (a property of such a scar (B) [x : τ] is that the x does not occur in the expression following it). However, β₁-reduction permits more. One may ignore the pair (B) [x : τ] in Q (A) (B) [x : τ] [y : x] y, in spite of the fact that x does occur in [y : x] y. This gives the single-step β₁-reduction: Q (A) (B) [x : τ] [y : x] y ≥¹_β₁ Q (A) (B) [x : τ] [y : x] Fr A, by applying [y : x] y to A. This is a real extension of the usual β-reduction concept. We shall call a string like (B) [x : τ] in the above example, which may be located between "function" and "argument" of a β-reduceable expression, a β-chain. Moreover, we shall agree that the relation Q (A) P̄ [y : C] D ≥¹_β₁ Q (A) P̄ [y : C] (y := A) D (in which P̄ is a β-chain) does only hold if y occurs in D. If we did not require this, one could continue the β₁-reduction with the latter expression, thus producing a non-terminating β₁-reduction sequence. The β₂-reduction relation, on the other hand, eliminates an applicator and an abstractor, as in Q (A) P̄ [ȳ : C] E ≥¹_β₂ QP̄E (in which P̄ is again a β-chain); the latter relation only holds, however, if y does not occur in E. We give an inductive definition of β-chain:
Definition 6.1.
(1) If P = ∅, then P is a β-chain.
(2) If P is a β-chain, then (B) P [x : C] is a β-chain.
(3) If P₁ and P₂ are β-chains, then P₁P₂ is a β-chain. □
Example: (A) (B) [x : C] (D) [y : E] [z : F] is a β-chain.
Notation Rule 6.2. We write P̄ to indicate that P is a β-chain. We write [x̄ : B] C to indicate in [x : B] C that x does not occur in C. □
A β-chain P̄ has the property that the number of applicators in P̄ is equal to the number of abstractors in P̄. Moreover, if P̄ ≡ P₁P₂, the number of applicators in P₁ is at least equal to the number of abstractors in P₁. We can also express this by means of a valuation v of lambda phrase chains, defined inductively by
(1) v(∅) = 0,
(2) v((A) P) = v(P) + 1,
(3) v([x : A] P) = v(P) − 1.
Then for a β-chain P̄ it holds that
(i) v(P̄) = 0, and
(ii) if P̄ ≡ P₁P₂, then v(P₁) ≥ 0.
These conditions are also sufficient. The following theorem concerning β-chains can be proved by the aid of the above-mentioned valuation properties:
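Conditions (i) and (ii) give a one-pass test for β-chains: scan the chain left to right, counting +1 for an applicator and −1 for an abstractor; the running count may never become negative and must end at 0. A sketch, encoding a chain simply as a sequence of tags (our own simplification):

```python
def is_beta_chain(chain):
    """chain: sequence of 'app' (an applicator (A)) and 'abs' (an
    abstractor [x:A]). Checks v(P) = 0 and v(P1) >= 0 for every prefix P1."""
    v = 0
    for tag in chain:
        v += 1 if tag == "app" else -1
        if v < 0:          # a prefix with more abstractors than applicators
            return False
    return v == 0
```

On the example above, (A) (B) [x : C] (D) [y : E] [z : F], the running counts are 1, 2, 1, 2, 1, 0, so the test succeeds.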
Theorem 6.3.
(1) If P̄ ≢ ∅, then P̄ ≡ P̄′ (B) [x : C] P̄″.
(2) P ≡ P₁P₂ is a β-chain if and only if P′ ≡ P₁P̄P₂ is a β-chain. □
Note that for each P̄ ≢ ∅ there is a unique decomposition P̄ ≡ P̄₁P̄₂ ... P̄_n with P̄_i ≢ ∅ and P̄_i ≢ P̄′_i P̄″_i for any β-chains P̄′_i ≢ ∅ and P̄″_i ≢ ∅. The following theorem shows that a β-chain has a compact structure:
Theorem 6.4. If (A) P̄ [x : B] C ⊂ P₁F, then either (A) P̄ [x : B] is a part of P₁ (i.e. P₁ ≡ s₁ (A) P̄ [x : B] s₂), or (A) P̄ [x : B] C ⊂ F. □
Proof. The essential part of the theorem is that the following cannot occur: P̄ ≡ P₂P₃, P₁ ≡ P₄ (A) P₂ and F ≡ P₃ [x : B] C ("(A) P̄ [x : B] C occurs partially in P₁, partially in F"). This can be proved by the aid of the valuation properties for β-chains. □
We continue with the definitions of single-step β₁- and β₂-reduction:
Definition 6.5. Single-step β₁-reduction is the relation generated by
(1) If Q (A) P̄ [x : B] C ∈ Λ and x ⊂ C, then Q (A) P̄ [x : B] C ≥¹_β₁ Q (A) P̄ [x : B] (x := A) C, and
(2) the monotony rules (see Def. 5.5 (2), (3) and (4), reading ≥¹_β₁ instead of ≥¹_β). □
[g : B]C E A, then Q ( A )P [z : B]C >b2 QPC, and 0
(2) the monotony rules.
Deflnition 6.8. P2-reduction is the reflexive and transitive closure of singlestep p2-reduction. 0 Note that in Def. 6.7 z may not occur in C. As in the case of @-reduction (see the previous section), we speak of elementary @I- or @2-reduction, n-step @I- or @2-reduction1and a single-step @I- or @2-reduction generated by Q ( A ) P [z : B]C or Q ( A )P [g : B]C respectively. L” or “... K L” is defined as for single-step The “length of proof of K @-reduction. Finally we define PI-equivalence ( K -pl L ) and @2-equivalence ( K L) analogously to @-equivalence (see the previous section).
>b2
>bl
>b, L , then L E A. Induction on the length of proof of K >., L or K >b, L respectively.
Theorem 6.9. If K E A and K Proof. Cf. the proof of Th. 5.8.
L
OT
K
0
Strong normalization in a typed lambda calculus ((3.3)
425
Definition 6.10. We shall write P >p, P’ etc. if P r >p, P’r etc. Theorem 6.11. If Q A E A and Q A with 11Q11 = 11&’11, and either (1) Q
>bl
>&, K
(for Q A
0
>b2 K ) , then K --= Q’A‘
>&, Q’ respectively) and A = A’, or and Q A >b, QA‘ (or Q A >b2 QA’ respectively).
Q’ (or Q
(2) Q I Q‘
Proof. Cf. the proofs of Th. 5.11 and Th. 5.12.
0
Theorem 6.12. The monotony rules hold for PI-reduction and for Pz-reduction.
Proof. Cf. the proof of Th. 5.15.
0
Theorem 6.13. If QC, Q P C and Q P D E A, and QC >pl Q D (or QC >p, QD), then QPC I p , Q P D (or QPC >pa Q P D respectively). 0
Proof. Cf. the proof of Th. 5.17.
The following two theorems deal with the relation between β-reduction on the one hand and β₁- and β₂-reduction on the other.
Theorem 6.14. If K ≥¹_β₁ L, then K ∼_β L.
Proof. Induction on the length of proof of K ≥¹_β₁ L.
I. K ≥¹_β₁ L is Q (A) P̄ [x : B] C ≥¹_β₁ Q (A) P̄ [x : B] (x := A) C. Note that, if P̄ ≢ ∅, then P̄ ≡ P̄′ (B) [z : C] P̄″, and P₁ ≡ P̄′P̄″ is again a β-chain by Th. 6.3. It follows that there is a β-reduction for K and for L. Continuation of this β-reduction process gives: K ≥_β Q (A) [x : B′] C′ ≥_β Q (x := A) C′, and L ≥_β Q (A) [x̄ : B′] (x := A) C′ ≥_β Q (x := A) C′. In the latter reduction one should note that the substitutions (y := D) introduced in the reduction L ≥_β Q (A) [x̄ : B′] (x := A) C′ do not influence A, since y ∉ A. Together with the statement that in this case (y := D) (x := A) E = (x := A) (y := D) E (but for renaming), this results in us obtaining the same C′ (but for renaming) in reducing L as we obtained in reducing K. It follows that K ∼_β L.
II. K ≥¹_β₁ L is Q (A) C ≥¹_β₁ Q (A) D as a direct consequence of QC ≥¹_β₁ QD. Then, by induction: QC ∼_β QD. Hence, by Th. 5.24: Q (A) C ∼_β Q (A) D.
III, IV. The two other monotony cases are proved similarly to II. □
Theorem 6.15. If K ∈ Λ and K ≥¹_β L, then K ≥¹_β₁ M ≥¹_β₂ L or K ≥¹_β₂ L.
Proof. Let K ≥¹_β L be generated by Q (A) [x : B] C ≥¹_β Q (x := A) C. If x ⊂ C, then Q (A) [x : B] C ≥¹_β₁ Q (A) [x̄ : B] (x := A) C ≥¹_β₂ Q (x := A) C.
If x ⊄ C, then K ≥¹_β L is a β₂-reduction. The remainder follows from the fact that the monotony rules for β-, β₁- and β₂-reduction are similar. □
We shall now prove a number of theorems leading to a theorem on the possibility of postponement of β₂-reductions until after β₁-reductions (Th. 6.19).
Theorem 6.16. If K ∈ Λ and K ≥¹_β₂ L ≥¹_β₁ M, then K ≥¹_β₁ L′ ≥ⁿ_β₂ M for a certain n ≥ 1.
M for a
>bl M be generated by the following elementary ,&-reduction: >bl Q ( A )P[z : B ](z := A)C, and K Ibz L by
Proof. Let L Q (A! P[z : B ] C
[i
Q’ (D) PI : El F 2bz Q’PlF. Then ( A )P [z : B]C c L and P1F c L. Now we can distinguish three cases: (1) ( A )P [z : B]C and P1F occur in L in disjoint places, (2) ( A )P [ z : B ] C c PlF, (3) P1F c ( A ) P [ z : B ] C and P1F f ( A )P [Z: B]C. (1) In this case it is clear that the theorem holds for n = 1. (2) We may distinguish (see also Th. 6.4):
(a)
= P 2 ( G )P3 or = P2[z : GI P3 and ( A )P [z : B]C theorem holds for n = 1.
(b)
PI = P2 ( A )P [z : B]P3. Idem.
c
G. The
(c) ( A ) P [z : B]C c F. Idem. These cases (a)-(c) are exhaustive if ( A )P [z : B]C c
F.
(3) (a) Let P1F c A . Now the theorem holds for n is the number of occurrences of z in C plus one.
= P2PlP3. The theorem holds for n = 1. Let P E P2 ( G )P3 or P = P ~ [:zGI P3 and @IFc G. Idem.
(b) Let P (c)
(d) Let P1F
c [z : B ]C. Idem.
These cases (a)-(d) are exhaustive if ( A ) P [z : B ]C.
F c ( A )P [z : B]C and
P1F
f
(In this proof we several times use the lemma: PlqP3, then PI ( G )&[z : H ] P3 is also a P-chain”, which is a conse“If P 0 quence of Th. 6.3.) Theorem 6.17. If K E A, K >pa L
M , then K
>bl
L‘ >pa M .
Proof. Induction on the number of steps of K >pz L, using Th. 6.16.
0
Theorem 6.18. If K ∈ Λ and K ≥_β₂ L ≥ᵖ_β₁ M, then K ≥ᵖ_β₁ L′ ≥_β₂ M.
Proof. Induction on p, using Th. 6.17. □
Theorem 6.19. If K₁ ≥¹ K₂ ≥¹ ... ≥¹ K_n by single-step β₁- and β₂-reductions, the total number of β₁-reductions being p, there is a reduction K₁ ≡ L ≥ᵖ_β₁ M ≥_β₂ N ≡ K_n.
Proof. Combine the successive single-step β₁-reductions in K₁ ≥¹ K₂ ≥¹ ... ≥¹ K_n, and do the same with the successive single-step β₂-reductions: we obtain K₁ ≡ L₁ ≥_β₁ M₁ ≥_β₂ L₂ ≥_β₁ M₂ ≥_β₂ ... ≥_β₂ L_l ≥_β₁ M_l ≥_β₂ K_n. Induction on l yields the proof. □
We shall now prove what we call the Church-Rosser property (CR) for β₁-reduction, which we formulate as follows: if K ≥_β₁ L and K ≥_β₁ M, there is an N such that L ≥_β₁ N and M ≥_β₁ N. We can express this Church-Rosser property in a diagram, as follows:
Figure 1.
From CR for β₁-reduction it easily follows that β₁-equivalence is transitive, hence indeed an equivalence relation (reflexivity and symmetry of ∼_β₁ are trivial). Hence we can also state that ∼_β₁ is the equivalence relation generated by ≥_β₁, which is an alternative formulation for the Church-Rosser theorem for β₁-reduction. In proving CR for β₁-reduction we shall use a technique introduced by W.W. Tait and P. Martin-Löf, given in [Barendregt 71, Appendix II]. We shall discuss the power of this technique in brief. In order to prove CR for a reduction it is natural to begin with single-step reductions "K ≥¹ L" and "K ≥¹ M". In a usual single-step (e.g. β-) reduction one can then find an N such that "L ≥ N" and "M ≥ N", but unfortunately
only one of these last two reductions is necessarily a single-step reduction, and one cannot say in advance which of the two. If one now begins with multiple-step reductions "K ≥ L" and "K ≥ M" and one tries, by the aid of the above, to find an N such that "L ≥ N" and "M ≥ N", then the termination of this attempt is not guaranteed. The following example, drawn in a diagram, suggests what might happen:
Figure 2.
Each rectangle in this diagram represents reductions; three sides of the rectangle are single-step reductions, one side is two-step. The diagram can, however, be continued indefinitely in the place where we draw the dotted lines. Now Tait and Martin-Löf defined a new "single"-step reduction (which we shall call single-step nested reduction to avoid confusion). The latter reduction has the property that with each pair of single-step nested reductions "K ≥* L" and "K ≥* M" there can be found an N such that "L ≥* N" and "M ≥* N", both last-mentioned reductions being single-step nested reductions as well. Moreover, each multiple-step reduction can be decomposed into single-step nested reductions and each single-step nested reduction is a composition of (ordinary) single-step reductions. If one now begins with multiple-step reductions "K ≥ L" and "K ≥ M", one can decompose these reductions into single-step nested reductions and apply the
Strong normalization in a typed lambda calculus (C.3)
above. Then one obtains, for example, a situation as is expressed in the following diagram:
Figure 3

In each of these rectangles the four sides represent single-step nested reductions. Moreover, the nested reductions "L ≥* L′ ≥* N" and "M ≥* M′ ≥* N" can be decomposed into (ordinary) single-step reductions, which combine into "L ≥ N" and "M ≥ N". Thus we obtain CR.
In the following we shall define a single-step nested β₁-reduction, which we shall call single-step γ-reduction. Our γ-reduction is a little more complex than the nested reduction of Tait and Martin-Löf, but it yields essentially the same results. The "nested" character of γ-reduction can be explained as follows. Let a β₁-reduction be generated by Q (A) P̄ [x : B] C, let QA ≥¹γ QA′ and Q P̄ [x : B] C ≥¹γ Q P̄′ [x : B′] C′. Then also Q (A) P̄ [x : B] C ≥¹γ Q (A′) P̄′ [x : B′] (x := A′) C′ (if the suggested β₁-reduction is preceded by single-step nested reductions "inside" A, P̄, B and C, the composite reduction is a single-step nested reduction). The reductions take place in a "nested" order. With the aid of γ-reductions we shall prove CR for β₁-reduction.
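The Tait–Martin-Löf technique described above can be made concrete outside the present typed system. The following sketch (ours, in Python, for the untyped lambda calculus with de Bruijn indices — not Nederpelt's Λ-notation) implements a maximal single-step nested reduction `par` that contracts every redex of a term simultaneously; the names `shift`, `subst`, `beta`, `par` and the tuple encoding are assumptions of this illustration, not the text's.

```python
def shift(t, d, c=0):
    """Add d to every free variable of t with index >= c."""
    tag = t[0]
    if tag == 'var':
        return ('var', t[1] + d) if t[1] >= c else t
    if tag == 'app':
        return ('app', shift(t[1], d, c), shift(t[2], d, c))
    return ('lam', shift(t[1], d, c + 1))

def subst(t, j, s):
    """Replace free variable j in t by s (s must already be shifted)."""
    tag = t[0]
    if tag == 'var':
        return s if t[1] == j else t
    if tag == 'app':
        return ('app', subst(t[1], j, s), subst(t[2], j, s))
    return ('lam', subst(t[1], j + 1, shift(s, 1)))

def beta(body, arg):
    """Contract the redex (lam body) arg."""
    return shift(subst(body, 0, shift(arg, 1)), -1)

def par(t):
    """One 'nested' step: reduce inside all subterms, then fire every redex."""
    tag = t[0]
    if tag == 'var':
        return t
    if tag == 'lam':
        return ('lam', par(t[1]))
    f, a = t[1], t[2]
    if f[0] == 'lam':                 # nested order: inside first, then the redex
        return beta(par(f[1]), par(a))
    return ('app', par(f), par(a))
```

For K ≡ (λx. x x)((λy. y) z), the two ordinary one-step reducts L and M both parallel-reduce to z z in one nested step, so the diamond closes exactly as in Figure 3.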
Definition 6.20. Single-step γ-reduction, denoted by ≥¹γ, is the reflexive relation generated by
(1) If Q (A) P̄ [x : B] C ∈ Λ, x ⊂ C, QA ≥¹γ QA′ and Q P̄ [x : B] C ≥¹γ Q P̄′ [x : B′] C′, then Q (A) P̄ [x : B] C ≥¹γ Q (A′) P̄′ [x : B′] (x := A′) C′.
(2) If Q (A) C and Q (A′) C′ ∈ Λ, QA ≥¹γ QA′ and QC ≥¹γ QC′, then Q (A) C ≥¹γ Q (A′) C′.
(3) If Q [x : A] C and Q [x : A′] C′ ∈ Λ, QA ≥¹γ QA′ and Q [x : A] C ≥¹γ Q [x : A] C′, then Q [x : A] C ≥¹γ Q [x : A′] C′.
(4) If A ∈ Λ, A ≥¹γ B and B ≥α B′, then A ≥¹γ B′. □
We call (2) and (3) the monotony rules for single-step γ-reduction. We also define γ-equivalence (K ∼γ L) analogously to β-equivalence.

Definition 6.21. γ-reduction, denoted by ≥γ, is the transitive closure of single-step γ-reduction. □

We continue with some theorems concerning γ-reduction (it will be clear that Q ≥¹γ Q′ if and only if Qτ ≥¹γ Q′τ).
Theorem 6.22. If QA ∈ Λ and QA ≥¹γ K, then K ≡ Q′A′, where ‖Q‖ = ‖Q′‖, Q ≥¹γ Q′ and QA ≥¹γ QA′.

Proof. Induction on the length of proof of QA ≥¹γ K.
(1) If QA ≥¹γ K by reflexivity, the theorem is trivial.
(2) If QA ≥¹γ K is Q₀ (B) P̄ [x : C] D ≥¹γ Q₀ (B′) P̄′ [x : C′] (x := B′) D′ as a direct consequence of Q₀B ≥¹γ Q₀B′ and Q₀ P̄ [x : C] D ≥¹γ Q₀ P̄′ [x : C′] D′, then ‖Q‖ ≤ ‖Q₀‖, so Q₀ ≡ QQ″, and the theorem follows.
(3) If QA ≥¹γ K is Q₀ (B) C ≥¹γ Q₀ (B′) C′ as a direct consequence of Q₀B ≥¹γ Q₀B′ and Q₀C ≥¹γ Q₀C′, then again ‖Q‖ ≤ ‖Q₀‖ and the theorem follows.
(4) Let QA ≥¹γ K be Q₀ [x : B] C ≥¹γ Q₀ [x : B′] C′ as a direct consequence of Q₀B ≥¹γ Q₀B′ and Q₀ [x : B] C ≥¹γ Q₀ [x : B] C′.
(i) If ‖Q‖ ≤ ‖Q₀‖ or Q ≡ Q₀ [x : B], then the theorem follows.
(ii) If C ≡ [y₁ : B₁] ... [yₙ : Bₙ] A and Q ≡ Q₀ [x : B] [y₁ : B₁] ... [yₙ : Bₙ], then QA ≡ Q₀ [x : B] C ≡ Q₀ [x : B] [y₁ : B₁] ... [yₙ : Bₙ] A ≥¹γ Q₀ [x : B] C′ ≡ Q₀ [x : B] [y₁ : B₁′] ... [yₙ : Bₙ′] A′ (by induction) with QA ≥¹γ QA′ and Q ≥¹γ Q₀ [x : B] [y₁ : B₁′] ... [yₙ : Bₙ′]. It follows that Q ≥¹γ Q₀ [x : B′] [y₁ : B₁′] ... [yₙ : Bₙ′]. Also C′ ≡ [y₁ : B₁′] ... [yₙ : Bₙ′] A′, so Q₀ [x : B′] C′ ≡ Q₀ [x : B′] [y₁ : B₁′] ... [yₙ : Bₙ′] A′. Consequently the theorem holds if we take Q′ ≡ Q₀ [x : B′] [y₁ : B₁′] ... [yₙ : Bₙ′].
(5) Let QA ≥¹γ K be a direct consequence of QA ≥¹γ K′ and K′ ≥α K. Then by induction the theorem holds for QA ≥¹γ K′, and trivially also for QA ≥¹γ K. □
Theorem 6.23. The monotony rules hold for γ-reduction.

Proof. Cf. the proof of Th. 5.15. □
Theorem 6.24. If QC, PC and PD ∈ Λ, and QC ≥¹γ QD, then PC ≥¹γ PD.

Proof. Analogous to the proof of Th. 5.17. □
The following two theorems deal with the relation between β₁- and γ-reduction.
Theorem 6.25. If K ∈ Λ and K ≥¹β₁ L, then K ≥¹γ L.

Proof. Induction on the length of proof of K ≥¹β₁ L. The rule of elementary β₁-reduction is covered by Def. 6.20 (1) (take QA ≥¹γ QA′ to be QA ≥¹γ QA by reflexivity, etc.); the monotony rules for β₁-reduction are covered by the monotony rules for γ-reduction (again using the reflexivity of γ-reduction in appropriate places). □

Theorem 6.26. If K ∈ Λ and K ≥¹γ L, then K ≥β₁ L but for α-reduction.
Proof. Induction on the length of proof of K ≥¹γ L. For example: let K ≥¹γ L be Q (A) P̄ [x : B] C ≥¹γ Q (A′) P̄′ [x : B′] (x := A′) C′ as a direct consequence of x ⊂ C, QA ≥¹γ QA′ and Q P̄ [x : B] C ≥¹γ Q P̄′ [x : B′] C′. By induction the last two reductions can also be obtained by β₁-reductions and α-reductions, and Q (A) P̄ [x : B] C ≥β₁ Q (A) P̄′ [x : B′] C′ ≥β₁ Q (A′) P̄′ [x : B′] C′ ≥β₁ Q (A′) P̄′ [x : B′] (x := A′) C′ (but for α-reduction). In the last β₁-reduction we use the lemma: "If x ⊂ C, if x occurs as a binding variable in Q₁ and if Q₁C ≥β₁ Q₁C′, then x ⊂ C′". □

Theorem 6.27. If K ∈ Λ and K ≥γ L, then L ∈ Λ.

Proof. Follows from Th. 6.26 and Th. 6.9. □
We inductively define similarity of two lambda chains (not necessarily β-chains):

Definition 6.28.
(1) If P₁ ≡ P₂ ≡ ∅, then P₁ and P₂ are similar.
(2) If P₁ and P₂ are similar, then (A) P₁ and (B) P₂ are similar, and [x : A] P₁ and [x : B] P₂ are similar. □

The following theorems are a preparation for Th. 6.37, which expresses CR
for γ-reduction. In order to prove Cor. 6.31 and Th. 6.34 it is convenient to extend the notion of β-chain, as in Def. 6.29: a number of β-chains, connected by abstractors, will be called a β-chain complex. A β-chain complex may be empty.

Definition 6.29. Let P₁, P₂, ..., Pᵢ be (possibly empty) β-chains. Then a lambda chain P₁ [x₁ : A₁] P₂ [x₂ : A₂] ... Pᵢ₋₁ [xᵢ₋₁ : Aᵢ₋₁] Pᵢ is called a β-chain complex. □

We denote a β-chain complex P by P̄. The following statement can be proved by the aid of the valuations: if P̄ is a β-chain complex and P̄ ≡ P̄₁P̄₂, then P̄₂ is a β-chain complex.
Theorem 6.30. If Q P̄ A ∈ Λ and Q P̄ A ≥¹γ K, then K ≡ Q′ P̄′ A′, where ‖Q‖ = ‖Q′‖, Q ≥¹γ Q′, Q P̄ A ≥¹γ Q P̄′ A′ and P̄ and P̄′ are similar.

Proof. If P̄ ≡ Q₁ (including P̄ = ∅), then Th. 6.22 gives the proof. Let P̄ ≢ Q₁. We proceed with induction on the length of proof of Q P̄ A ≥¹γ K. Note that there must be at least one applicator in the lambda chain P̄, on account of our assumption P̄ ≢ Q₁.
(1) Assume that Q P̄ A ≥¹γ K by reflexivity. The proof is now trivial.
(2) Assume that Q P̄ A ≥¹γ K is QQ₁ (C) P̄₁ [y : D] E ≥¹γ QQ₁ (C′) P̄₁′ [y : D′] (y := C′) E′ as a direct consequence of QQ₁C ≥¹γ QQ₁C′ and QQ₁ P̄₁ [y : D] E ≥¹γ QQ₁ P̄₁′ [y : D′] E′. Now it must hold that E ≡ P̄₂ A, while Q₁ P̄₁ [y : D] P̄₂ is a β-chain complex. By induction: E′ ≡ P̄₂′ A′, and P̄₂ and P̄₂′ are similar. The remainder is easy.
(3) Assume that Q P̄ A ≥¹γ K is QQ₁ (C) D ≥¹γ QQ₁ (C′) D′ as a direct consequence of QQ₁C ≥¹γ QQ₁C′ and QQ₁D ≥¹γ QQ₁D′. Now P̄ ≡ Q₁ (C) P̄₂ and D ≡ P̄₂ A. Here P̄₂ is a β-chain complex, hence by induction D′ ≡ P̄₂′ A′, in which P̄₂ and P̄₂′ are similar. The remainder follows easily.
(4) (a) Assume that Q P̄ A ≥¹γ K is QQ₁ [y : C] D ≥¹γ QQ₁ [y : C′] D′ as a direct consequence of QQ₁C ≥¹γ QQ₁C′ and QQ₁ [y : C] D ≥¹γ QQ₁ [y : C] D′. Then P̄ ≡ Q₁ [y : C] P̄₂ and D ≡ P̄₂ A. The completion of the proof is similar to that in the last part of the previous case.
(b) Assume that Q P̄ A ≥¹γ K is Q₀ [y : C] Q₁ P̄₁ A ≥¹γ Q₀ [y : C′] D′ as a direct consequence of Q₀C ≥¹γ Q₀C′ and Q₀ [y : C] Q₁ P̄₁ A ≥¹γ Q₀ [y : C] D′. Then, by induction, D′ ≡ Q₁′ P̄′ A′ with ‖Q₁‖ = ‖Q₁′‖, while P̄ and P̄′ are similar. The remainder follows.
(5) Assume that Q P̄ A ≥¹γ K is a direct consequence of Q P̄ A ≥¹γ K′ and K′ ≥α K. The proof is again by induction. □
Corollary 6.31. If Q P̄ A ∈ Λ and Q P̄ A ≥γ K, then K ≡ Q′ P̄′ A′, where ‖Q‖ = ‖Q′‖, Q ≥γ Q′, Q P̄ A ≥γ Q P̄′ A′, while P̄ and P̄′ are similar. □

The following four theorems are lemmas for Th. 6.36. Th. 6.34 might have been called "the substitution lemma for γ-reduction".

Theorem 6.32.
(1) Let Q P̄ [x : B] C ∈ Λ and Q P̄ [x : B] C ≥¹γ K. Then K ≡ Q′ P̄′ [x : B′] C′ such that Q ≥¹γ Q′, Q P̄ [x : B] C ≥¹γ Q P̄′ [x : B′] C′, ‖Q‖ = ‖Q′‖, while P̄ and P̄′ are similar.
(2) Let Q (A) P̄ [x : B] C ∈ Λ and Q (A) P̄ [x : B] C ≥¹γ K. Then either
(i) K ≡ Q′ (A′) P̄′ [x : B′] C′, or
(ii) K ≡ Q′ (A′) P̄′ [x : B′] (x := A′) C′,
where in both cases Q ≥¹γ Q′, QA ≥¹γ QA′, Q P̄ [x : B] C ≥¹γ Q P̄′ [x : B′] C′ and ‖Q‖ = ‖Q′‖, while P̄ and P̄′ are similar.
(3) Let Q (A) B ∈ Λ and Q (A) B ≥¹γ K. Then either
(i) K ≡ Q′ (A′) B′, where Q ≥¹γ Q′, QA ≥¹γ QA′, QB ≥¹γ QB′ and ‖Q‖ = ‖Q′‖, or
(ii) Q (A) B ≡ Q (A) P̄ [x : C] D, x ⊂ D, K ≡ Q′ (A′) P̄′ [x : C′] (x := A′) D′, Q ≥¹γ Q′, ‖Q‖ = ‖Q′‖, QA ≥¹γ QA′ and Q P̄ [x : C] D ≥¹γ Q P̄′ [x : C′] D′.

Proof. See Cor. 6.31 and the possibilities for single-step γ-reduction. □
Theorem 6.33. If Q P̄ [x : B] C ∈ Λ, Q P̄ [x : B] C ≥γ Q′ P̄′ [x : B′] C′ and x ⊂ C, then x ⊂ C′.

Proof. In a subexpression we can only eliminate free variables by substitution, and substitution can only originate from a β₁-reduction. Note that a β₁-reduction yielding a substitution (x := A) cannot occur in the above. □
Theorem 6.34. Let QA and Q P̄ B ∈ Λ, let no binding variable of P̄ B occur in QA and let x not occur in P̄. Let Q P̄ B ≥¹γ Q P̄′ B′ (where ‖P̄‖ = ‖P̄′‖) and QA ≥¹γ QA′. Then Q P̄ (x := A) B ≥¹γ Q P̄′ (x := A′) B′.
Proof. First consider the case that P̄ ≡ Q₁, so P̄′ ≡ Q₁′. We prove the theorem by induction on |B|. If B ≡ y ≢ x or B ≡ τ, then the theorem is trivial. If B ≡ x, note that QQ₁A ≥¹γ QQ₁A′, so also QQ₁A ≥¹γ QQ₁′A′ by induction on the length of proof of QQ₁x ≥¹γ QQ₁′x and monotony rule (3) for single-step γ-reduction.
(1) Let B ≡ [y : E] F. Then B′ ≡ [y : E′] F′ by Th. 6.32 (1), QQ₁E ≥¹γ QQ₁E′ and QQ₁ [y : E] F ≥¹γ QQ₁ [y : E] F′. So also (by induction) QQ₁ (x := A) E ≥¹γ QQ₁ (x := A′) E′ and QQ₁ [y : E] (x := A) F ≥¹γ QQ₁ [y : E] (x := A′) F′. It easily follows from the latter reduction that QQ₁ [y : (x := A) E] (x := A) F ≥¹γ QQ₁ [y : (x := A) E] (x := A′) F′. It follows that QQ₁ (x := A) B ≥¹γ QQ₁′ (x := A′) B′.
(2) Let B ≡ (E) F. Now by Th. 6.32 (3) either B′ ≡ (E′) F′ with QQ₁E ≥¹γ QQ₁′E′ and QQ₁F ≥¹γ QQ₁′F′, or B ≡ (E) P̄₁ [y : G] H and QQ₁B ≡ QQ₁ (E) P̄₁ [y : G] H ≥¹γ QQ₁′ (E′) P̄₁′ [y : G′] (y := E′) H′ ≡ QQ₁′B′. In the first case the proof is similar to the proof in case (1). In the second case we can follow analogous lines, using the fact that (y := (x := A′) E′) (x := A′) H′ ≡ (x := A′) (y := E′) H′ but for renaming.
Now consider the case that P̄ ≢ Q₁, whence at least one applicator must occur in the chain P̄. We proceed with induction on the length of proof of Q P̄ B ≥¹γ Q P̄′ B′.
(1) Assume that Q P̄ B ≥¹γ Q P̄′ B′ by reflexivity. This case can be proved similarly to the case that P̄ ≡ Q₁.
(2) Assume that Q P̄ B ≥¹γ Q P̄′ B′ is QQ₁ (C) P̄₁ [y : D] E ≥¹γ QQ₁ (C′) P̄₁′ [y : D′] (y := C′) E′ as a direct consequence of QQ₁C ≥¹γ QQ₁C′ and QQ₁ P̄₁ [y : D] E ≥¹γ QQ₁ P̄₁′ [y : D′] E′. Now it must hold that E ≡ P̄₂ B and E′ ≡ P̄₂′ B″, where B′ ≡ (y := C′) B″. Since Q₁ P̄₁ [y : D] P̄₂ is also a β-chain complex, it follows by induction: QQ₁ P̄₁ [y : D] P̄₂ (x := A) B ≥¹γ QQ₁ P̄₁′ [y : D′] P̄₂′ (x := A′) B″, so Q P̄ (x := A) B ≡ QQ₁ (C) P̄₁ [y : D] P̄₂ (x := A) B ≥¹γ QQ₁ (C′) P̄₁′ [y : D′] (y := C′) (P̄₂′ (x := A′) B″) ≡ QQ₁ (C′) P̄₁′ [y : D′] ((y := C′) P̄₂′) (x := A′) (y := C′) B″ ≡ Q P̄′ (x := A′) B′ (here we changed (y := C′) (x := A′) B″ into (x := A′) (y := C′) B″, which is allowed by Th. 6.33 and by the conditions imposed upon the variables).
(3) Assume that Q P̄ B ≥¹γ Q P̄′ B′ is QQ₁ (C) D ≥¹γ QQ₁ (C′) D′ as a direct consequence of QQ₁C ≥¹γ QQ₁C′ and QQ₁D ≥¹γ QQ₁D′. Then P̄ ≡ Q₁ (C) P̄₂ and P̄′ ≡ Q₁ (C′) P̄₂′; D ≡ P̄₂ B and D′ ≡ P̄₂′ B′. By induction: QQ₁ P̄₂ (x := A) B ≥¹γ QQ₁ P̄₂′ (x := A′) B′, so also Q P̄ (x := A) B ≥¹γ Q P̄′ (x := A′) B′.
(4) Assume that Q P̄ B ≥¹γ Q P̄′ B′ is QQ₁ [y : C] D ≥¹γ QQ₁ [y : C′] D′ as a direct consequence of QQ₁C ≥¹γ QQ₁C′ and QQ₁ [y : C] D ≥¹γ QQ₁ [y : C] D′. Then P̄ ≡ Q₁ [y : C] P̄₂ and P̄′ ≡ Q₁ [y : C′] P̄₂′; D ≡ P̄₂ B and D′ ≡ P̄₂′ B′. The remainder of the proof is analogous to that in case (3).
(5) Assume that Q P̄ B ≥¹γ Q P̄′ B′ is a direct consequence of Q P̄ B ≥¹γ Q″ P̄″ B″ and Q″ P̄″ B″ ≥α Q P̄′ B′. Then Q P̄ (x := A) B ≥¹γ Q″ P̄″ (x := A′) B″ by induction; the remainder is easy. □

Theorem 6.35. If QA and Q′A′ ∈ Λ, Q ≥¹γ Q′ and QA ≥¹γ QA′, then QA ≥¹γ Q′A′.
Proof. Induction on ‖Q‖, using monotony rule (3) of single-step γ-reduction. □
Theorem 6.36. If K ∈ Λ, K ≥¹γ L and K ≥¹γ M, there is an N such that L ≥¹γ N and M ≥¹γ N.
Proof. Induction on the length of proof of K ≥¹γ L. We shall use Th. 6.24 and Th. 6.35 several times without saying so.
(1) Let K ≥¹γ L by reflexivity. Take N ≡ M.
(2) Let K ≥¹γ L be Q (A) P̄ [x : B] C ≥¹γ Q (A′) P̄′ [x : B′] (x := A′) C′ as a direct consequence of QA ≥¹γ QA′ and Q P̄ [x : B] C ≥¹γ Q P̄′ [x : B′] C′. Now by Th. 6.32 (2) M can have the form
(i) M ≡ Q″ (A″) P̄″ [x : B″] C″, or
(ii) M ≡ Q″ (A″) P̄″ [x : B″] (x := A″) C″,
where in both cases Q ≥¹γ Q″, QA ≥¹γ QA″, Q P̄ [x : B] C ≥¹γ Q P̄″ [x : B″] C″, ‖Q‖ = ‖Q″‖ and ‖P̄‖ = ‖P̄″‖. By induction and by Th. 6.22 there is an A‴ such that QA′ ≥¹γ QA‴ and QA″ ≥¹γ QA‴. Again by induction and by Th. 6.32 (1) there are P̄‴, B‴ and C‴ such that
Q P̄′ [x : B′] C′ ≥¹γ Q P̄‴ [x : B‴] C‴ and Q P̄″ [x : B″] C″ ≥¹γ Q P̄‴ [x : B‴] C‴. By Th. 6.34: Q P̄′ [x : B′] (x := A′) C′ ≥¹γ Q P̄‴ [x : B‴] (x := A‴) C‴, hence by monotony and Th. 6.35: L ≡ Q (A′) P̄′ [x : B′] (x := A′) C′ ≥¹γ Q″ (A‴) P̄‴ [x : B‴] (x := A‴) C‴. Call the latter expression N. It also holds that Q″ P̄″ [x : B″] C″ ≥¹γ Q″ P̄‴ [x : B‴] C‴, and x ⊂ C‴ by Th. 6.33. So Q″ (A″) P̄″ [x : B″] C″ ≥¹γ N ≡ Q″ (A‴) P̄‴ [x : B‴] (x := A‴) C‴ by an elementary γ-reduction. This completes this part of the proof in case (i). In case (ii) we first establish that Q P̄″ [x : B″] (x := A″) C″ ≥¹γ Q P̄‴ [x : B‴] (x := A‴) C‴ by Th. 6.34, yielding by monotony that M ≥¹γ N.
(3) Let K ≥¹γ L be Q (A) C ≥¹γ Q (A′) C′ as a direct consequence of QA ≥¹γ QA′ and QC ≥¹γ QC′. Now by Th. 6.32 (3) M can have the form
(i) M ≡ Q″ (A″) C″, with Q ≥¹γ Q″, ‖Q‖ = ‖Q″‖, QA ≥¹γ QA″ and QC ≥¹γ QC″, or
(ii) M ≡ Q″ (A″) P̄″ [x : D″] (x := A″) E″, where K ≡ Q (A) P̄ [x : D] E, x ⊂ E, QA ≥¹γ QA″, Q ≥¹γ Q″ and Q P̄ [x : D] E ≥¹γ Q P̄″ [x : D″] E″.
In case (i) we can find by induction A‴ and C‴ such that QA′ ≥¹γ Q″A‴, Q″A″ ≥¹γ Q″A‴, QC′ ≥¹γ Q″C‴ and Q″C″ ≥¹γ Q″C‴, and we can take N ≡ Q″ (A‴) C‴. In case (ii) we are in a position similar to case (i) of (2), with L and M permuted.
(4) Let K ≥¹γ L be Q [x : A] C ≥¹γ Q [x : A′] C′ as a direct consequence of QA ≥¹γ QA′ and Q [x : A] C ≥¹γ Q [x : A] C′. Then M ≡ Q″ [x : A″] C″ by Th. 6.32 (1), where Q ≥¹γ Q″, QA ≥¹γ QA″ and Q [x : A] C ≥¹γ Q [x : A] C″. Take N ≡ Q″ [x : A‴] C‴, where A‴ and C‴ are obtained as in (3), case (i).
(5) Let K ≥¹γ L be a direct consequence of K ≥¹γ L′ and L′ ≥α L. Then by induction we can find an N such that L′ ≥¹γ N and M ≥¹γ N, and also L ≥¹γ N. □
Theorem 6.37 (CR for γ-reduction). If K ∈ Λ, K ≥γ L and K ≥γ M, then L ∼γ M.

Proof. This is a consequence of the previous theorem. □
Theorem 6.38 (CR for β₁-reduction). If K ∈ Λ, K ≥β₁ L and K ≥β₁ M, then L ∼β₁ M but for α-reduction.
Proof. Decompose K ≥β₁ L and K ≥β₁ M, apply Th. 6.25, Th. 6.37 and Th. 6.26: we obtain an N′ such that L ≥β₁ N ≥α N′ and M ≥β₁ N″ ≥α N′. □
Theorem 6.39. If K ∈ Λ, K ≥¹β₁ L and K ≥¹β₂ M, then there is an N such that M ≥¹β₁ N and L ≥ⁿβ₂ N with n ≥ 1.

Proof. Let K ≥¹β₁ L be generated by Q (A) P̄ [x : B] C ≥¹β₁ Q (A) P̄ [x : B] (x := A) C, and K ≥¹β₂ M by Q′ (D) P̄₁ [y : E] F ≥¹β₂ Q′ P̄₁ F. If (D) P̄₁ [y : E] F ⊂ A, we need n β₂-reductions for the n A's in Q (A) P̄ [x : B] (x := A) C. If not, we need only one. The theorem easily follows. □
Theorem 6.40. If K ∈ Λ, K ≥β₁ L and K ≥β₂ M, there is an N such that M ≥β₁ N and L ≥β₂ N.

Proof. Apply Th. 6.39 repeatedly. This can be illustrated by the following diagram:

Figure 4

Here we assume that K ≥β₁ L can be decomposed into K ≥¹β₁ L′ ≥¹β₁ L, and K ≥β₂ M into K ≥¹β₂ M′ ≥¹β₂ M. In the diagram all edges (in the sense usual in graph theory) parallel to the edge from K to L′ represent single-step β₁-reductions, those in the direction of K ≥¹β₂ M′ represent single-step β₂-reductions. □
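The tiling argument of Figure 4 can be checked mechanically on an abstract rewrite system. The sketch below is ours (plain Python, not Nederpelt's syntax), and it simplifies Th. 6.39 to the case where one step closes against exactly one step; it verifies that single-step local commutation of two relations propagates, by tiling, to their multi-step closures on a grid like that of Figure 4.

```python
def steps(rel, x):
    """One-step successors of x under the relation rel (a dict of edge sets)."""
    return rel.get(x, set())

def reach(rel, x):
    """All y with x >=* y: reflexive-transitive closure, by worklist search."""
    seen, todo = {x}, [x]
    while todo:
        y = todo.pop()
        for z in steps(rel, y):
            if z not in seen:
                seen.add(z)
                todo.append(z)
    return seen

def commutes_locally(r1, r2, nodes):
    """Every peak b <-r1- a -r2-> c closes with one r2-step and one r1-step."""
    return all(steps(r2, b) & steps(r1, c)
               for a in nodes
               for b in steps(r1, a)
               for c in steps(r2, a))

def commutes_globally(r1, r2, nodes):
    """Multi-step commutation, the property obtained by tiling the squares."""
    return all(reach(r2, b) & reach(r1, c)
               for a in nodes
               for b in reach(r1, a)
               for c in reach(r2, a))

# The grid of Figure 4: r1 moves right, r2 moves down, on a 3 x 3 grid.
N = 3
nodes = [(i, j) for i in range(N) for j in range(N)]
r1 = {(i, j): {(i, j + 1)} for i in range(N) for j in range(N - 1)}
r2 = {(i, j): {(i + 1, j)} for i in range(N - 1) for j in range(N)}
```

Each local square closes at the opposite grid corner, and repeating this square by square yields the multi-step diagram, which is exactly how Th. 6.40 follows from Th. 6.39.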
Theorem 6.41. If K ∈ Λ, K ≥¹β₂ L and K ≥¹β₂ M, there is an N such that L ≥¹β₂ N or L ≡ N, and M ≥¹β₂ N or M ≡ N.

Proof. Let K ≥¹β₂ L be generated by Q (A) P̄ [x : B] C ≥¹β₂ Q P̄ C, and K ≥¹β₂ M by Q′ (D) P̄₁ [y : E] F ≥¹β₂ Q′ P̄₁ F. If (D) P̄₁ [y : E] F ⊂ A or ⊂ B, then M ≥¹β₂ L; if (A) P̄ [x : B] C ⊂ D or ⊂ E, then L ≥¹β₂ M. If (D) P̄₁ [y : E] F ≡ (A) P̄ [x : B] C, then L ≡ M. In all other cases there is clearly an N such that L ≥¹β₂ N and M ≥¹β₂ N. □
Theorem 6.42 (CR for β₂-reduction). If K ∈ Λ, K ≥β₂ L and K ≥β₂ M, then L ∼β₂ M.

Proof. Apply Th. 6.41 repeatedly. □

Theorem 6.43 (CR for β-reduction). If K ∈ Λ, K ≥β L and K ≥β M, then L ∼β M.

Proof. As that of Th. 6.38. □
Theorem 6.44. If K ∈ Λ, K ≥β L and K ≥β M, then there are N, N″ and N‴ such that L ≥β₁ N″ ≥β₂ N and M ≥β₁ N‴ ≥β₂ N.

Proof. Decompose K ≥β L and K ≥β M, according to Th. 6.19, into K ≥β₁ L′ ≥β₂ L and K ≥β₁ M′ ≥β₂ M respectively. The remainder of the proof is illustrated by the following diagram:

Figure 5

We find N′ from Th. 6.38, N″ and N‴ from Th. 6.40 and finally N from Th. 6.42. □

7. η-reduction, reduction and lambda equivalence
A third reduction in lambda calculus (apart from α- and β-reduction) is called η-reduction and denoted by ≥η. We shall incorporate it in our system.
We first define single-step η-reduction, denoted by ≥¹η:
Definition 7.1. Single-step η-reduction is the relation generated by:
(1) If Q [x : A] (x) B ∈ Λ and x ⊄ B, then Q [x : A] (x) B ≥¹η QB.
(2) Let Q (A) C and Q (A) D ∈ Λ. If QC ≥¹η QD, then Q (A) C ≥¹η Q (A) D.
(3) Let Q [x : A] C and Q [x : B] C ∈ Λ. If QA ≥¹η QB, then Q [x : A] C ≥¹η Q [x : B] C.
(4) Let Q (A) C and Q (B) C ∈ Λ. If QA ≥¹η QB, then Q (A) C ≥¹η Q (B) C. □
Rules (2), (3) and (4) are called the monotony rules of single-step η-reduction; they are similar to those of single-step β-reduction. Rule (1) is called the rule of elementary η-reduction.
Definition 7.2. η-reduction is the reflexive and transitive closure of single-step η-reduction. □

If A and B are related by a (single-step) η-reduction, we speak of "the (single-step) η-reduction A ≥η B". The notions n-step η-reduction and decomposition of an η-reduction are defined analogously to the corresponding notions for β-reduction. If the first derivation step of a single-step η-reduction has the form Q [x : A] (x) B ≥¹η QB, we say that Q [x : A] (x) B generates the single-step η-reduction.
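In classical notation the elementary step of Def. 7.1 is λx. B x ≥η B with x not free in B; Nederpelt writes the applicator in front of the body, [x : A] (x) B ≥η B. A minimal executable sketch, under the same assumptions as before (untyped terms, de Bruijn indices, our own function names — not the text's system):

```python
def shift(t, d, c=0):
    """Add d to every free variable of t with index >= c."""
    tag = t[0]
    if tag == 'var':
        return ('var', t[1] + d) if t[1] >= c else t
    if tag == 'app':
        return ('app', shift(t[1], d, c), shift(t[2], d, c))
    return ('lam', shift(t[1], d, c + 1))

def occurs(t, j):
    """Does variable j occur free in t?"""
    tag = t[0]
    if tag == 'var':
        return t[1] == j
    if tag == 'app':
        return occurs(t[1], j) or occurs(t[2], j)
    return occurs(t[1], j + 1)

def eta_nf(t):
    """Contract eta-redexes  lam.(B 0)  with 0 not free in B, bottom-up."""
    tag = t[0]
    if tag == 'var':
        return t
    if tag == 'app':
        return ('app', eta_nf(t[1]), eta_nf(t[2]))
    b = eta_nf(t[1])
    if b[0] == 'app' and b[2] == ('var', 0) and not occurs(b[1], 0):
        return shift(b[1], -1)   # drop the abstractor; reindex the body
    return ('lam', b)
```

The side condition x ⊄ B in rule (1) is what the `occurs` check enforces: λx. x x is not an η-redex and must be left untouched, while λx. λy. g x y collapses to g.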
Theorem 7.3. Let K ∈ Λ. Then Q [x : A] (x) B generates a single-step η-reduction of the form K ≥¹η L if and only if [x : A] (x) B ⊂ K and Q [x : A] (x) B ≡ K ↾ [x : A] (x) B.

Proof. Similar to the proof of Th. 5.7. □
Theorem 7.4. If K ∈ Λ and K ≥¹η L, then L ∈ Λ.

Proof. Induction on the length of proof of K ≥¹η L. The proof is similar to the proof of Th. 5.8. □
Theorem 7.5. If QE ∈ Λ, QE ≥¹η Q′F and ‖Q‖ = ‖Q′‖, then
(i) Q ≡ Q′ and E ≡ F, or
(ii) Q ≡ Q₁ [x : K] Q₂, Q′ ≡ Q₁ [x : L] Q₂ and Q₁K ≥¹η Q₁L, or
(iii) Q ≡ Q₀ [x : A], E ≡ (x) [y : B] E′, Q′ ≡ Q₀ [y : B], x ⊄ [y : B] E′ and F ≡ E′.

Proof. Induction on the length of proof of QE ≥¹η Q′F. The proof is comparable to the proof of Th. 5.11, except for the case in which QE ≥¹η Q′F is an elementary η-reduction. In this case we have to note the possibility that QE and Q′F are as in (iii). □

Theorem 7.6. If QE ∈ Λ and QE ≥¹η K, then K ≡ Q′F′ for certain Q′ and F′ with ‖Q′‖ ≥ ‖Q‖ − 1.

Proof. Similar to the proof of Th. 5.12. □

Theorem 7.7. If QE ∈ Λ and QE ≥¹η K ≥η QG, then K ≡ QF.
Proof. If Q ≡ Q₁ [x : A] and QE ≥¹η K is Q₁ [x : A] (x) B ≥¹η Q₁B, then the binding variable x of Q has disappeared, and we cannot regain it by η-reduction. Hence by Th. 7.6: K ≡ Q′F and ‖Q′‖ = ‖Q‖, and the case expressed in Th. 7.5 (iii) does not hold. In the derivation steps leading to K ≥η QG the final ones of the first ‖Q‖ abstractors cannot disappear by an elementary η-reduction, for the same reason as above. Assume that Q ≢ Q′. Then by Th. 7.5 (ii): Q ≡ Q₁ [x : K] Q₂, Q′ ≡ Q₁ [x : L] Q₂ and Q₁K ≥¹η Q₁L. It is clear that |L| < |K|. Since the length of an expression cannot increase by η-reduction, it follows that we cannot regain Q from Q′. Hence Q ≡ Q′. □
Proof. Similar to the proof of Th. 5.15, using Th. 7.5. Theorem 7.9. If Q E , P E and P F E A, and Q E 27 Q F , then P E
Proof. Analogous to the proof of Th. 5.17; use Th. 7.7.
0
21 P F . 0
The converse of this theorem holds too.
Given an η-reduction QK ≥η M, it need not follow that M ≡ Q′N′ with ‖Q‖ = ‖Q′‖, since the final abstractors of Q may have been cancelled in η-reductions. For example: let Q ≡ Q′ [x : A] and K ≡ (x) τ; then QK ≥¹η Q′τ, where ‖Q′‖ = ‖Q‖ − 1. This kind of η-reduction plays an important rôle in the following. We shall call them η!-reductions. We shall prove a number of theorems concerning η!-reductions. In Th. 7.14 we shall show that we can postpone η!-reductions until after other η-reductions. Cor. 7.17 will result from our discussions of η!-reductions.
Definition 7.10.
(1) K ≥¹η L is called a single η!-reduction (denoted by K ≥¹η! L) if K ≡ Q [x : A] (x) B and K generates K ≥¹η L (i.e. if K ≥¹η L is an elementary η-reduction). This reduction is called of order p if ‖Q [x : A]‖ = p.
(2) K ≥η L is called a k-fold η!-reduction (denoted K ≥(k)η! L) if there are Kᵢ ≡ Q [x₁ : A₁] ... [xᵢ : Aᵢ] (xᵢ) ... (x₁) B such that K ≡ Kₖ ≥(1)η! Kₖ₋₁ ≥(1)η! ... ≥(1)η! K₀ ≡ QB ≡ L and Kᵢ generates Kᵢ ≥(1)η! Kᵢ₋₁. This reduction K ≥(k)η! L is called of order p if Kₖ ≥(1)η! Kₖ₋₁ is of order p. □
Theorem 7.11. If K ∈ Λ and K ≥(k)η! L ≥¹η N, then either K ≥(k+1)η! N or there is a reduction K ≥¹η M ≥(k)η! N where K ≥¹η M is not an η!-reduction.

Proof. K ≡ Q [x₁ : A₁] ... [xₖ : Aₖ] (xₖ) ... (x₁) B ≥(k)η! QB ≡ L. Consider the possibilities for QB ≡ L ≥¹η N. □
Theorem 7.12. If K ∈ Λ and K ≥(k)η! L ≥¹η N, where K ≥(k)η! L is of order p, then either K ≥(k+1)η! N of order p, or there is a reduction K ≥¹η M ≥(k)η! N where M ≥(k)η! N is of order p and K ≥¹η M is not an η!-reduction.

Proof. Compare with the previous theorem. □
Theorem 7.13. Let K ∈ Λ and K ≥(k)η! L ≥η N, where K ≥(k)η! L is of order p. Then there is a reduction K ≥η L′ ≥(l)η! N, where a decomposition of K ≥η L′ contains no η!-reductions and L′ ≥(l)η! N is of order p.

Proof. Decompose L ≥η N into L ≡ E₁ ≥¹η ... ≥¹η Eᵣ ≡ N. We proceed with induction on r. If r = 1 there is nothing to prove. Let r > 1. Consider the reduction K ≥(k)η! L ≡ E₁ ≥¹η E₂. By the previous theorem we have either K ≥(k+1)η! E₂ or a reduction K ≥¹η L″ ≥(k)η! E₂ where K ≥¹η L″ is not an η!-reduction. Applying the induction hypothesis to K ≥(k+1)η! E₂ ≥η N or L″ ≥(k)η! E₂ ≥η N, we obtain K ≥η L′ ≥(l)η! N, where a decomposition of K ≥η L′ contains no η!-reductions. □
Theorem 7.14. If QK ∈ Λ and QK ≥η L, there is a reduction QK ≥η Q′K′ ≥(k)η! L where ‖Q‖ = ‖Q′‖, Q′K′ ≥(k)η! L is of order ‖Q‖, and where a decomposition of QK ≥η Q′K′ contains no η!-reductions of order ‖Q‖.

Proof. Decompose QK ≥η L into single-step η-reductions QK ≡ L₁ ≥¹η ... ≥¹η Lₛ ≡ L. Let i be the smallest integer such that Lᵢ ≥¹η Lᵢ₊₁ is an η!-reduction of order ‖Q‖. Apply the previous theorem to Lᵢ ≥(1)η! Lᵢ₊₁ ≥η Lₛ ≡ L. We obtain a reduction QK ≥η L′ ≥(k)η! L as desired. The fact that L′ ≡ Q′K′ with ‖Q′‖ = ‖Q‖ follows from Th. 7.5. □
>,,
Theorem 7.15. If Q K E A, Q K Q‘K‘, 11Q11 = 11Q‘11 = p and a decomposition of Q K Q’K’ contains no q!-reductions of order p , there is a reduction Q K 2oQK’ Q’K’ and a reduction Q K Q’K Q’K’.
>‘I
Proof. See Th. 7.5.
0
>,,
>,,
Theorem 7.16. If Q₁K ∈ Λ, Q₂L ∈ Λ, Q₁K ≥η M, Q₂L ≥η M, Q₁ ≡ [x₁ : A₁] ... [xₚ : Aₚ] and Q₂ ≡ [x₁ : B₁] ... [xₚ : Bₚ], there is an N such that Q₁K ≥η Q₁N and Q₂L ≥η Q₂N.

Proof. By the aid of Th. 7.14 and Th. 7.15 we can find reductions Q₁K ≥η Q₁K′ ≥η Q₁′K′ ≥(k)η! M and Q₂L ≥η Q₂L′ ≥η Q₂′L′ ≥(l)η! M, where Q₁′K′ ≥(k)η! M and Q₂′L′ ≥(l)η! M are of order p. Note that Q₁′ ≡ [x₁ : A₁′] ... [xₚ : Aₚ′] and Q₂′ ≡ [x₁ : B₁′] ... [xₚ : Bₚ′]. Now both Q₁′K′ and Q₂′L′ ∈ Λ, so k = l: assume k > l; then M ≡ [x₁ : A₁′] ... [xₚ₋ₖ : Aₚ₋ₖ′] M′ ≡ [x₁ : B₁′] ... [xₚ₋ₗ : Bₚ₋ₗ′] M″; it follows that [xₚ₋ₗ : Bₚ₋ₗ′] occurs in M′, hence also in K′; this contradicts the fact that Q₁′K′ ∈ Λ, since we would find two binding variables xₚ₋ₗ in Q₁′K′. It follows that K′ ≡ L′; we can take N ≡ K′ ≡ L′. □
>,,
Theorem 7.18. If Q D E A, Q D Q’ = [q: A;] ... [xp: A;], then QD
>,,
M and Q L
>’I Q’E, Q E
>,,
M , there is a 0
[q : A11 ...[x, : Ap] and
QE.
Proof. Resulting from Th. 7.14 we can find a reduction Q D
>‘I Q”D’ 2‘:’ Q‘E ‘I.
where Q”D’ 2s) Q’E is of order 11Q11. Now Q” = [ZI: Ayl ... [zp: A:], hence k = 0 (because &I’D’ E A; see the proof of Th. 7.16). Then also QD Q E by 0 Th. 7.15.
>,,
We shall now prove a theorem concerning the so-called "postponement of η-reductions" for Λ. What we want to prove is that every reduction K ≥ M which takes place by means of single-step β- and η-reductions in arbitrary order can be replaced by a reduction K ≥β L ≥η M, in which all β-reductions precede all η-reductions.
It is easy to show that each reduction A ≥¹η B ≥¹β C can be replaced by a reduction A ≥β B′ ≥ʳη C (where r ≥ 0). But this does not suffice to prove the theorem: it is not sure that this process of interchanging η's and β's terminates for a given reduction K ≥ M. In [Curry and Feys 58, Ch. 4, D2] a compound β-reduction is introduced for the purpose of proving the above-mentioned theorem. In our opinion there is an error in their proof (viz., the case that R is MₖN and L is some Mⱼyⱼ for j ≤ k is missing). Nevertheless, their idea can be extended in such a manner that the theorem on the postponement of η-reductions can be proved. We have carried this out by defining a "compound β-reduction" A ≥β̄ B with the property that each reduction A ≥η B ≥β̄ C can be replaced by a reduction A ≥β̄ B′ ≥η C. However, this compound β-reduction looks rather complicated. Barendregt suggested to us another way of proving the theorem (private communication). He proposed a "nested" η-reduction (which we call ϑ-reduction and denote by ≥¹ϑ) with the property that a reduction A ≥¹ϑ B ≥β C can be replaced by a reduction A ≥β B′ ≥¹ϑ C. The nested character of this ϑ-reduction is comparable to that of γ-reduction discussed in the previous section. We prefer the latter way of proving because it is easier to understand.
Definition 7.19. Single-step ϑ-reduction, denoted by ≥¹ϑ, is the reflexive relation generated by
(1) If Q [x : A] (x) B ∈ Λ, x ⊄ B and QB ≥¹ϑ QC, then Q [x : A] (x) B ≥¹ϑ QC.
(2) If Q (A) C ∈ Λ, QA ≥¹ϑ QA′ and QC ≥¹ϑ QC′, then Q (A) C ≥¹ϑ Q (A′) C′.
(3) If Q [x : A] C ∈ Λ, QA ≥¹ϑ QA′ and Q [x : A] C ≥¹ϑ Q [x : A] C′, then Q [x : A] C ≥¹ϑ Q [x : A′] C′. □
We call rule (1) in this definition the rule of elementary single-step ϑ-reduction, rules (2) and (3) the monotony rules for ϑ-reduction. The following two theorems deal with the relation between η- and ϑ-reduction.
Theorem 7.20. If K ∈ Λ and K ≥¹η L, then K ≥¹ϑ L.

Proof. Induction on the length of proof of K ≥¹η L. □

Theorem 7.21. If K ∈ Λ and K ≥¹ϑ L, then K ≥η L.
Proof. Induction on the length of proof of K ≥¹ϑ L. For example, if K ≥¹ϑ L is Q [x : A] (x) B ≥¹ϑ QC, as a direct consequence of QB ≥¹ϑ QC, then by induction QB ≥η QC, and Q [x : A] (x) B ≥¹η QB ≥η QC. □
We shall now prove a number of theorems which are lemmas for the theorem on the postponement of η-reductions (Th. 7.28).
Theorem 7.22. If K ∈ Λ and K ≥¹ϑ L, then L ∈ Λ.

Proof. Follows from Th. 7.21 and Th. 7.4. □
Theorem 7.23. If QE ∈ Λ and QE ≥¹ϑ Q [y : G] H, then QE ≡ Q [x₁ : A₁] (x₁) [x₂ : A₂] (x₂) ... [xₙ : Aₙ] (xₙ) [y : G′] H′, with QG′ ≥¹ϑ QG, Q [y : G′] H′ ≥¹ϑ Q [y : G′] H and xᵢ ⊄ [xᵢ₊₁ : Aᵢ₊₁] ... (xₙ) [y : G′] H′.

Proof. Induction on the length of proof of QE ≥¹ϑ Q [y : G] H. If the latter reduction results from reflexivity, the proof is completed.
(1) Let QE ≥¹ϑ Q [y : G] H be Q′ [x : A] (x) B ≥¹ϑ Q′C, as a direct consequence of Q′B ≥¹ϑ Q′C. If Q′ ≡ QQ″, induction yields the proof. If Q ≡ Q′ [x : A], then C begins with [x : A]. This implies that [x : A] occurs in B (since ϑ-reduction can only omit abstractors and applicators without influencing the remainder of the expression), which is impossible since QE ∈ Λ. So this latter case cannot apply.
(2) Let QE ≥¹ϑ Q [y : G] H be Q′ (A) C ≥¹ϑ Q′ (A′) C′. Then QE ≡ Q [y : G] F.
(3) Let QE ≥¹ϑ Q [y : G] H be Q′ [x : A] C ≥¹ϑ Q′ [x : A′] C′, as a direct consequence of Q′A ≥¹ϑ Q′A′ and Q′ [x : A] C ≥¹ϑ Q′ [x : A] C′. There are the following possibilities: (a) Q ≡ Q′, (b) Q ≡ Q′ [x : A] Q₁ and (c) Q′ ≡ QQ₁ with ‖Q₁‖ > 0. In all three cases the proof is easy. □
Theorem 7.24. Let QA and Q [x : B] C ∈ Λ, QA ≥¹ϑ QA′ and Q [x : B] C ≥¹ϑ Q [x : B] C′. Then Q (x := A) C ≥¹ϑ Q (x := A′) C′.

Proof. Induction on |C|. If C ≡ τ, C ≡ x or C ≡ y ≢ x, the proof is easy.
(1) Let C ≡ [y : E] F. There are two possible cases:
(a) Q [x : B] C ≥¹ϑ Q [x : B] C′ is Q [x : B] [y : E] F ≥¹ϑ Q [x : B] [y : E′] F′, as a direct consequence of Q [x : B] E ≥¹ϑ Q [x : B] E′ and Q [x : B] [y : E] F ≥¹ϑ Q [x : B] [y : E] F′. By induction: Q (x := A) E ≥¹ϑ Q (x := A′) E′, and Q [y : (x := A) E] (x := A) F ≥¹ϑ Q [y : (x := A) E] (x := A′) F′ (the latter because Q [y : (x := A) E] [x : B] F ≥¹ϑ Q [y : (x := A) E] [x : B] F′). Hence Q (x := A) C ≥¹ϑ Q (x := A′) C′.
(b) Q [x : B] C ≥¹ϑ Q [x : B] C′ is Q [x : B] [y : E] (y) G ≥¹ϑ Q [x : B] G′, as a direct consequence of Q [x : B] G ≥¹ϑ Q [x : B] G′. By induction Q (x := A) G ≥¹ϑ Q (x := A′) G′, so Q (x := A) [y : E] (y) G ≡ Q [y : (x := A) E] (y) (x := A) G ≥¹ϑ Q (x := A′) G′.
(2) Let C ≡ (E) F. Then Q [x : B] C ≥¹ϑ Q [x : B] C′ is Q [x : B] (E) F ≥¹ϑ Q [x : B] (E′) F′, as a direct consequence of Q [x : B] E ≥¹ϑ Q [x : B] E′ and Q [x : B] F ≥¹ϑ Q [x : B] F′. The theorem results from the induction. (Note that Q [x : B] C ≥¹ϑ Q [x : B] C′ cannot be Q [x : B] (x) G ≥¹ϑ QG′.) □
Theorem 7.25. Let A ∈ Λ and A ≥¹ϑ B ≥¹β C. Then A ≥β B′ ≥¹ϑ C.

Proof. Induction on the length of proof of A ≥¹ϑ B. If the last derivation step results from reflexivity, nothing remains to be proved.
(1) Let A ≥¹ϑ B be Q [x : D] (x) E ≥¹ϑ QE′ as a direct consequence of QE ≥¹ϑ QE′, and let B ≥¹β C be generated by Q′ (F) [y : G] H ≥¹β Q′ (y := F) H. The following cases may apply:
>b
(a) ( F ) [y : GI H C Q. There is clearly a reduction A >p B’ 2; C.
>b
(b) ( F ) [y : G]H C E’. Let B C be QE’ >b QE”’, then by induction there is a reduction Q E 2 0 QE” 2; QE”’, hence Q [z : D] (z) E 2 p Q [z : D ] (z) E” QE”‘.
>;
(2) Let A 2; B be Q ( D )E 2; Q (D‘) E‘ as a direct consequence of Q D 2; QD’ and Q E 2; QE’, and let B 2; C be generated by Q’ ( F ) [y : GI H Q’(y := F ) H . The following cases may apply:
>b
(a) ( F )[y : GI H C Q. Clearly A >p B‘ 2; C. (b) ( F )[y : G]H = (D’)E’. Then QE’ = Q [y : G]H , so Q E = Q [q: All ( 2 1 ) ... [zCn : A,] (zn)[y : G’]H’, with QG’ 2; QG, Q [y : G‘]H’ 2; Q[y : G’]H and zi [zi+l : Ai+l]... (z,) [y : G’]H’ by Th. 7.23. Then Q ( D )E 2 p Q(y := D)H’. By Th. 7.24: Q ( y := D)H’ 2; Q(y := F)H E C .
<
(c)
( F ) [y : G]H C D‘. Then C = Q (D’”)E’ with QD‘ 2; QD”‘, and by induction QD >p QD“ 2; QD”’, hence Q ( D ) E >p Q (D”)E 2; Q (D”’) E‘ G C.
(d) ( F ) [y : GI H c E’. Then C EZ Q (D’) E”’ with QE’ 2; QE”‘, and by induction Q E >p QE” 2; QE”’, hence Q ( D ) E 2 p Q ( D )El’ 2; Q (D’) E“’ = C.
446
R.P. Nederpelt
(3) Let A 2; B be Q [x : D] E 2; Q [x : D’] E’ as a direct consequence of QD 2; QD’ and Q [x : D] E 2; Q [ x : D] E’, and let again B 2b C be generated by Q’ ( F ) [ y : GI H 2b Q’(y := F ) H . The following cases may apply:
c Q. Clearly A 2 p B’ 2; C. ( F ) [ y : G]H c D‘. Then C = Q [z : D’”]E’
(a) ( F ) [ y : G]H (b)
with QD’ 2b QD”’. By induction: QD 2 p QD” 2; QD”‘, so Q [x : D] E 2 p Q [x : D”]E 2; Q [z : D”‘]E’ (where we require the lemma: Q [z : D] E 2; Q [z : D] E‘, then Q [z : D”]E 2; Q [z : D”]E’).
(c) ( F ) [ y : G]H c E’. Then C = Q [z : D’] E”’. Also: Q [z : D] E’ Q [z : D] E”’, so by induction Q [x : D] E >p Q [x : D] E” > K Q [x: D]E”‘. Hence Q [z : D] E >p Q [z : D] E” 2nQ [z : D‘] E”’ = C.
>b 0
Theorem 7.26. Let A ∈ Λ̄ and A ≥¹κ B ≥β C. Then A ≥β B′ ≥κ C.

Proof. Induction on the number of single-step β-reductions in B ≥β C, using the previous theorem. □

Theorem 7.27. Let A ∈ Λ̄ and let A ≥ C by means of a number of single-step κ- and β-reductions in arbitrary order. Then there is a reduction A ≥β B ≥κ C.

Proof. Induction on the number of single-step κ-reductions in A ≥ C. If this number is zero, the proof is completed. Else, let A ≥ C be A ≥ A′ ≥¹κ B ≥β C. Apply the previous theorem, obtaining A ≥ A′ ≥β B′ ≥κ C, and apply the induction to A ≥ A′ ≥β B′. □

Theorem 7.28. Let A ∈ Λ̄ and let A ≥ C by means of a number of single-step η- and β-reductions in arbitrary order. Then there is a reduction A ≥β B ≥η C.

Proof. Since each η-reduction can be considered as a κ-reduction (Th. 7.20), we can apply the previous theorem, obtaining A ≥β B ≥κ C. But B ≥κ C implies B ≥η C (Th. 7.21), so A ≥β B ≥η C. □
The remainder of this section will concern (general) reduction, defined as a sequence of single-step α-, β- and η-reductions.

Definition 7.29. Single-step reduction (denoted by ≥¹) is the relation obeying: A ≥¹ B if and only if A ≥¹α B, A ≥¹β B or A ≥¹η B. □

Definition 7.30. Reduction (or general reduction, denoted by ≥) is the reflexive and transitive closure of single-step reduction. □
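As an illustration of Defs. 7.29 and 7.30, the two contractions and their closure can be sketched in code. The following Python fragment is only a hedged sketch, not the system's official machinery: it uses a simplified term representation, writes App(A, B) for the applicator-first application (A)B, assumes all binder names are distinct (so substitution needs no renovation and α-steps can be ignored), and contracts leftmost-outermost.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tau: pass                                   # the constant tau
@dataclass(frozen=True)
class Var: name: str
@dataclass(frozen=True)
class Abs: var: str; typ: object; body: object    # abstractor [x : T] B
@dataclass(frozen=True)
class App: arg: object; fun: object               # applicator (A) B

def subst(e, x, a):
    """(x := a)e; binder names are assumed distinct, so no capture arises."""
    if isinstance(e, Var):
        return a if e.name == x else e
    if isinstance(e, Abs):
        return Abs(e.var, subst(e.typ, x, a), subst(e.body, x, a))
    if isinstance(e, App):
        return App(subst(e.arg, x, a), subst(e.fun, x, a))
    return e

def occurs(e, x):
    """Does the variable x occur in e?"""
    if isinstance(e, Var):
        return e.name == x
    if isinstance(e, Abs):
        return occurs(e.typ, x) or occurs(e.body, x)
    if isinstance(e, App):
        return occurs(e.arg, x) or occurs(e.fun, x)
    return False

def step(e):
    """One single-step beta or eta contraction; None if e admits neither."""
    if isinstance(e, App) and isinstance(e.fun, Abs):     # (A)[x:T]B -> (x:=A)B
        return subst(e.fun.body, e.fun.var, e.arg)
    if (isinstance(e, Abs) and isinstance(e.body, App)
            and e.body.arg == Var(e.var)
            and not occurs(e.body.fun, e.var)):           # [x:T](x)B -> B
        return e.body.fun
    if isinstance(e, Abs):                                # monotony rules
        s = step(e.typ)
        if s is not None:
            return Abs(e.var, s, e.body)
        s = step(e.body)
        if s is not None:
            return Abs(e.var, e.typ, s)
    if isinstance(e, App):
        s = step(e.arg)
        if s is not None:
            return App(s, e.fun)
        s = step(e.fun)
        if s is not None:
            return App(e.arg, s)
    return None

def normalize(e):
    """General reduction: iterate single steps as long as one applies."""
    while (n := step(e)) is not None:
        e = n
    return e
```

For example, (a)[x:τ][y:τ](y)x first β-reduces to [y:τ](y)a and then η-reduces to a. On legitimate expressions (Chapter III) the loop is guaranteed to terminate by the normalization theorem; on arbitrary expressions it need not.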
Strong normalization in a typed lambda calculus (C.3)
Theorem 7.31. The monotony rules hold for reduction.

Proof. Use Th. 5.15 and Th. 7.8. □
We shall prove a theorem (Th. 7.33) which expresses that the Q is in a certain sense irrelevant in a reduction QC ≥ QE: it can be replaced by any P such that PC and PE ∈ Λ̄. This corresponds with general usage in lambda calculus to define reduction for expressions which may contain free variables. Our choice to define reductions inside Λ̄ is apparently not in disagreement with that general usage.
Theorem 7.32. If QC ∈ Λ̄ and QC ≥ QE by means of β- and η-reductions, there is a reduction QC ≥β QD ≥η QE.

Proof. There is a reduction QC ≥β K ≥η QE by Th. 7.28. Now by Th. 5.12: K ≡ Q′D with ‖Q‖ = ‖Q′‖. If Q ≡ [x₁:A₁]...[xₚ:Aₚ], then Q′ ≡ [x₁:A₁′]...[xₚ:Aₚ′], so by Th. 7.18: Q′D ≥η Q′E. From Th. 5.21: QC ≥β QD, and from Th. 7.9: QD ≥η QE. □

Theorem 7.33. If QC, PC and PE ∈ Λ̄, and QC ≥ QE, then PC ≥ PE.

Proof. See Th. 7.32, Th. 5.17 and Th. 7.9. □
Reduction is a non-symmetric relation between expressions in Λ̄, which is reflexive and transitive. We shall define lambda equivalence. The definition of beta equivalence was given in Def. 5.22. In Th. 7.35 we shall prove that beta equivalence is indeed an equivalence relation.

Definition 7.34. Let A and B ∈ Λ̄. We call A lambda equivalent to B (denoted: A ~ B) if there is an expression C such that A ≥ C and B ≥ C. □

Theorem 7.35. Beta equivalence is reflexive, symmetric and transitive.

Proof. Reflexivity and symmetry are trivial. Transitivity follows from Th. 6.43 (CR for β-reduction): let A ~β B and B ~β C; then there are D and E such that A ≥β D, B ≥β D, B ≥β E and C ≥β E. Moreover, there is an F such that D ≥β F and E ≥β F (Th. 6.43), so A ≥β F and C ≥β F. Hence A ~β C. □

Unfortunately, there is no similar theorem for lambda equivalence. Of course lambda equivalence is symmetric and reflexive, but not necessarily transitive.
The reason for this is that CR does not hold for (general) reduction. For example, let K ≥ L and K ≥ M, let K ≡ Q[z:A](z)[y:B]C where z does not occur in [y:B]C, let K ≥ L be Q[z:A](z)[y:B]C ≥η Q[y:B]C and let K ≥ M be Q[z:A](z)[y:B]C ≥β Q[z:A](y:=z)C (≡α Q[y:A]C). Now we cannot in general find an N such that L ≥ N and M ≥ N, since we know nothing concerning a relation between A and B.
We note the following. We can embed ordinary lambda calculus into Λ̄, since there is a one-to-one correspondence between expressions from lambda calculus and those expressions in Λ̄ in which only abstractors of the form [x:τ] occur. If we restrict ourselves in Λ̄ to the latter expressions, the example above changes into K ≡ Q[z:τ](z)[y:τ]C, L ≡ Q[y:τ]C and M ≡ Q[y:τ]C. Now there is no problem as regards CR. Indeed, in lambda calculus the Church-Rosser property holds (see [Barendregt 71, Appendix III]). The following theorem expresses that lambda equivalence of QK and QL implies the existence of an N such that QK ≥ QN and QL ≥ QN or, otherwise stated: the abstractor chain Q can remain unaffected.
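The counterexample above can be replayed concretely. The sketch below is hedged: it uses a simplified term representation, the types A and B are stood in for by free names (and τ by the free name "tau"), and α-equivalence is checked by erasing binder names de Bruijn style. It computes the η-reduct and the β-reduct of K ≡ [z:A](z)[y:B]c and compares them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var: name: str
@dataclass(frozen=True)
class Abs: var: str; typ: object; body: object    # [x : T] B
@dataclass(frozen=True)
class App: arg: object; fun: object               # (A) B

def subst(e, x, a):
    """(x := a)e, assuming distinct binder names."""
    if isinstance(e, Var):
        return a if e.name == x else e
    if isinstance(e, Abs):
        return Abs(e.var, subst(e.typ, x, a), subst(e.body, x, a))
    if isinstance(e, App):
        return App(subst(e.arg, x, a), subst(e.fun, x, a))
    return e

def skeleton(e, env=()):
    """Alpha-invariant form: bound variables become indices, free names stay."""
    if isinstance(e, Var):
        return env.index(e.name) if e.name in env else e.name
    if isinstance(e, Abs):
        return ("abs", skeleton(e.typ, env), skeleton(e.body, (e.var,) + env))
    return ("app", skeleton(e.arg, env), skeleton(e.fun, env))

def reducts(a, b):
    """Eta- and beta-reduct of K = [z:a](z)[y:b]c (z does not occur in [y:b]c)."""
    K = Abs("z", Var(a), App(Var("z"), Abs("y", Var(b), Var("c"))))
    eta = K.body.fun                                   # [z:a](z)E >= eta E, E = [y:b]c
    beta = Abs("z", K.typ,
               subst(K.body.fun.body, "y", Var("z")))  # inner (z)[y:b]c >= beta (y:=z)c
    return eta, beta
```

Here reducts("A", "B") yields [y:B]c and [z:A]c, both already normal and distinct unless A and B are related, while reducts("tau", "tau") yields two α-equal expressions, matching the remark that CR is unproblematic when only abstractors of the form [x:τ] occur.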
Theorem 7.36. Let QK and QL ∈ Λ̄. If QK ~ QL, there exists an N such that QK ≥ QN and QL ≥ QN.

Proof. There must be an M: QK ≥ M and QL ≥ M. By postponement of η-reductions we obtain reductions QK ≥β M₁ ≥η M and QL ≥β M₂ ≥η M. Th. 5.12 implies that M₁ ≡ Q₁K′ and M₂ ≡ Q₂L′ with ‖Q‖ = ‖Q₁‖ = ‖Q₂‖. Then, according to Th. 5.21, we also have QK ≥β QK′ ≥α Q₁K′ ≥η M and QL ≥β QL′ ≥α Q₂L′ ≥η M. It is easy to show that Q₁ and Q₂ have the form as required in Th. 7.16, hence there is an N such that Q₁K′ ≥η Q₁N and Q₂L′ ≥η Q₂N. From Th. 7.9 it follows that QK′ ≥η QN and QL′ ≥η QN. So QK ≥ QN and QL ≥ QN. □
The monotony rules also hold for lambda equivalence:

Theorem 7.37.
(a) If QC, QD, Q(A)C, Q(A)D ∈ Λ̄ and QC ~ QD, then Q(A)C ~ Q(A)D.
(b) If QC, QD, Q[x:A]C, Q[x:A]D ∈ Λ̄ and QC ~ QD, then Q[x:A]C ~ Q[x:A]D.
(c) If QA, QB, Q(A)C, Q(B)C ∈ Λ̄ and QA ~ QB, then Q(A)C ~ Q(B)C.
(d) If QA, QB, Q[x:A]C, Q[x:B]C ∈ Λ̄ and QA ~ QB, then Q[x:A]C ~ Q[x:B]C.
Proof. See Th. 7.36 and Th. 7.33. □
Theorem 7.38. If QC, QD, PC and PD ∈ Λ̄ and QC ~ QD, then PC ~ PD.

Proof. See Th. 7.36 and Th. 7.33. □

8. Type and degree
The notions introduced in the preceding sections are from lambda calculus (as reduction, lambda equivalence) or applicable to lambda calculus (factors, bound expressions), since the types played no essential rôle. We shall now look into the typing of an expression in Λ̄. With every A ∈ Λ̄ for which Tail A ≢ τ we define a type, denoted as Typ A, as follows:

Definition 8.1. Let A ∈ Λ̄ and Tail A ≡ x, so A ≡ P₁[x:B]P₂x. Then Typ A ≡ P₁[x:B]P₂ FrB. □
Informally speaking, we may say that B is the type of x in the above expression. Note, however, that we allow Typ to operate only on expressions in Λ̄.

Theorem 8.2. If A ∈ Λ̄ and Tail A ≢ τ, then Typ A ∈ Λ̄.

Proof. Let A ≡ P₁[x:B]P₂x and let A|x ≡ Q₁[x:B]Q₂x. We prove that Typ A is a bound expression. All non-binding variables in P₁[x:B]P₂ are clearly also bound in Typ A. Consider a non-binding variable z ⊂ FrB. There is a corresponding y ⊂ B, and A|y ≡ Q₁Q₃y. So Typ A|z ≡ Q₁[x:B]Q₂Q₃′z, where Q₃′z is a renovation of Q₃y. Case 1: if y was bound in A|y by a binding variable in Q₃, z is bound in Typ A|z by the corresponding binding variable in Q₃′. Case 2: if y was bound in A|y by a binding variable in Q₁, z ≡ y is still bound by the same binding variable in Q₁, since all binding variables of [x:B]Q₂Q₃′ are different from y. So Typ A is bound. Clearly Typ A is also distinctly bound by the renovation of B. □

We define repeated application of Typ inductively as follows:
Definition 8.3. Let A ∈ Λ̄. Then Typ⁰ A ≡ A; if Typⁿ⁻¹ A is defined for n ≥ 1 and if Tail Typⁿ⁻¹ A ≢ τ, then Typⁿ A ≡ Typ(Typⁿ⁻¹ A). □

If A ∈ Λ̄ and Typⁿ A is defined, we call n permissible for A (n = 0 is always permissible for A ∈ Λ̄).
Theorem 8.4. If A ∈ Λ̄ and A ≥α B, then Typⁿ A ≥α Typⁿ B for all n permissible for A and B.

Proof. It is sufficient to prove: if A ≥¹α B and Tail A ≢ τ, then Typ A ≥α Typ B. The latter proof is easy. □
With each expression A in Λ̄ we define a degree, denoted Deg A:

Definition 8.5. (1) If A ∈ Λ̄ and Tail A ≡ τ, then Deg(A) = 1.
(2) If A ∈ Λ̄, Tail A ≡ x and A ≡ P₁[x:B]P₂x, then Deg(A) = Deg(P₁B) + 1. □

Induction on the length of A shows that Deg(A) is well-defined by Def. 8.5. Clearly Deg A = 1 if and only if Tail A ≡ τ. We shall now prove a number of theorems, leading to the theorem: if Tail A ≢ τ, then Deg A = Deg Typ A + 1 (Th. 8.12). We could have taken this property as a definition of Deg. In that case, however, the well-definedness of Deg would have been harder to prove.
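For a feel of how Typ and Deg interact, the applicator-free fragment (an abstractor chain ended by τ or a variable) is enough. The following Python sketch is illustrative only and not the paper's machinery: Chain is a hypothetical representation, binder names are assumed globally distinct, and the renovation Fr is therefore omitted when the tail variable is replaced by a copy of its type.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Chain:
    """[x1:A1]...[xn:An] s, where each type Ai is itself a Chain."""
    absts: Tuple          # tuple of (name, Chain) pairs
    tail: str             # a variable name or "tau"

def typ(a):
    """Typ A (in the spirit of Def. 8.1): replace the tail by its type."""
    assert a.tail != "tau", "Typ is undefined when Tail A is tau"
    for x, t in reversed(a.absts):            # innermost binding of the tail
        if x == a.tail:
            return Chain(a.absts + t.absts, t.tail)
    raise ValueError("tail variable is not bound")

def deg(a):
    """Deg A, computed via the property Deg A = Deg Typ A + 1 (Th. 8.12)."""
    n = 1
    while a.tail != "tau":
        a, n = typ(a), n + 1
    return n

def typstar(a):
    """Typ* A = Typ^(Deg A - 1) A; its tail is tau (Cor. 8.13)."""
    while a.tail != "tau":
        a = typ(a)
    return a
```

For A ≡ [x:τ][y:x]y one gets Typ A tailing in x, Typ² A tailing in τ, and hence Deg A = 3.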
Theorem 8.6. If PC ∈ Λ̄ and P(K)C ∈ Λ̄, then Deg PC = Deg P(K)C.

Proof. Induction on |PC|. □

Corollary 8.7. If A ∈ Λ̄ and A ≡ PC, then Deg A = Deg(A|C). □
Corollary 8.8. If A ∈ Λ̄, Tail A ≡ x and A|x ≡ Q₁[x:B]Q₂x, then Deg A = Deg Q₁B + 1. □
Theorem 8.9. If PC ∈ Λ̄ and P[x:K]C ∈ Λ̄, then Deg PC = Deg P[x:K]C.

Proof. By Th. 3.8: x does not occur in C. The rest of the proof follows from induction on |PC|. □
Theorem 8.10. If PC ∈ Λ̄ and PP′C ∈ Λ̄, then Deg PC = Deg PP′C.

Proof. Induction on ‖P′‖, using Th. 8.6 and Th. 8.9. □

Theorem 8.11. If A ∈ Λ̄ and A ≥α B, then Deg A = Deg B.

Proof. Take A ≥¹α B; induction on |A|. □

Theorem 8.12. If A ∈ Λ̄ and Tail A ≢ τ, then Deg Typ A = Deg A − 1.
Proof. Let Tail A ≡ x and A ≡ P₁[x:C]P₂x, so Typ A ≡ P₁[x:C]P₂ FrC. Then P₁ FrC ∈ Λ̄ and Deg P₁ FrC = Deg P₁C by Th. 4.5 and Th. 8.11. By Th. 8.10: Deg P₁ FrC = Deg Typ A. So Deg A = Deg P₁C + 1 = Deg Typ A + 1. □

Corollary 8.13. If A ∈ Λ̄, then Tail(Typ^(Deg A−1) A) ≡ τ. □
This optimal exponent of Typ with a certain A ∈ Λ̄ is of special importance. We shall introduce an abbreviation:

Definition 8.14. If A ∈ Λ̄, then Typ* A ≡ Typ^(Deg A−1) A. □

We stress that the asterisk replaces an exponent n dependent upon A. Moreover, note that Typ is a partial function on Λ̄, but Typ* is a total function on Λ̄. We proceed with a number of theorems on Typⁿ, Typ* and Deg:
Theorem 8.15. If A ∈ Λ̄, Deg A = 1 and A ≥ B, then Deg B = 1. □

Theorem 8.16. If PC ∈ Λ̄, then for permissible n: Typⁿ PC ≡ PC′; in particular Typ* PC ≡ PC″. □

Theorem 8.17. If PC ∈ Λ̄, PP′C ∈ Λ̄, and for a permissible n: Typⁿ PC ≡ PC′, then n is permissible for PP′C, and Typⁿ PP′C ≥α PP′C′.

Proof. It is sufficient to assume Tail PC ≢ τ and n = 1. Let Tail PC ≡ x and (PC)|x ≡ Q₁[x:B]Q₂x; then (PP′C)|x ≡ Q₁′[x:B]Q₂′x, and [x:B] appears in either P or C. The remainder follows. □

Theorem 8.18. If PC ∈ Λ̄, PP′C ∈ Λ̄ and for a permissible n: Typⁿ PP′C ≡ PP′C′, then n is permissible for PC and Typⁿ PC ≥α PC′.

Proof. Similar to the previous proof. □
CHAPTER III. THE FORMAL SYSTEM Λ

1. Legitimate expressions

The "meaning" of (A)B is the application of function B to argument A. So far this application was unrestricted: any expression could serve as an argument. Besides, it was of no interest whether B really was a function or not. In the formal system Λ, which we shall introduce in this chapter, we only admit the expressions of Λ̄ which obey the applicability condition. (For an informal introduction of the applicability condition: see Section I.4.) We call this kind of expressions legitimate expressions. Since Λ is a part of Λ̄, we again provide expressions with abstractor chains Q, as we did with expressions in Λ̄ (cf. the beginning of Section I.6). We begin with the definitions of function, domain and applicability with respect to an abstractor chain Q:

Definition 1.1. Let QB ∈ Λ̄. We call QB a Q-function if there are x, K and L such that Typ* QB ≥ Q[x:K]L. The expression QK is called a Q-domain of QB. □

Definition 1.2. The expression QB is called Q-applicable to QA if QB is a Q-function with Q-domain QK, Deg QA > 1 and Typ QA ≥ QK. In that case Q(A)B is a legitimate Q-application of QB to QA. □

The formal system Λ is inductively defined by:
Definition 1.3.
(1) τ ∈ Λ.
(2) If QA ∈ Λ and if x does not occur in QA, then Q[x:A]x ∈ Λ and Q[x:A]τ ∈ Λ.
(3) If QA and Qy ∈ Λ, if x does not occur in QA and if x ≢ y, then Q[x:A]y ∈ Λ.
(4) If QA and QB ∈ Λ, if the binding variables in A and B are distinct and if QB is Q-applicable to QA, then Q(A)B ∈ Λ. □

The only difference to the (second) definition of Λ̄ as given by Th. II.3.10 lies in the applicability condition in (4): QB must be Q-applicable to QA, i.e. Typ* QB ≥ Q[y:K]L and Typ QA ≥ QK. These reductions are defined for expressions in Λ̄ (cf. the following Th. 1.4 and Th. II.8.2). Note that the applicability condition does not state that the reductions mentioned concern expressions in Λ only. The applicability condition has the powerful consequence that all expressions in Λ normalize (cf. Section I.2), which we shall prove later in this chapter, whereas in the wider system Λ̄ normalization is not guaranteed.
Theorem 1.4. If A ∈ Λ, then A ∈ Λ̄.

Proof. Induction on the length of proof of A ∈ Λ. □

Restricting ourselves to α- and β-reductions, we can weaken the applicability condition in the sense that we replace ≥ by ~β:
Theorem 1.5. If QA and QB ∈ Λ, Q[y:K]L ∈ Λ̄, Typ* QB ~β Q[y:K]L, Typ QA ~β QK and if the binding variables in A and B are distinct, then Q(A)B ∈ Λ.

Proof. Let Typ* QB ≡ QB′ (Th. II.8.16). Since QB′ ~β Q[y:K]L, there is an M such that QB′ ≥β QM and Q[y:K]L ≥β QM (Th. II.5.12 and Th. II.5.16). From Th. II.5.16 and Th. II.5.20: QM ≡ Q[y:K′]L′ such that QK ≥β QK′. Let Typ QA ≡ QA′. Since QA′ ~β QK, there is a K″ such that QA′ ≥β QK″ and QK ≥β QK″. Hence (Church-Rosser theorem for β-reduction, Th. II.6.43) QK′ ~β QK″, so there is a K‴: QK′ ≥β QK‴ and QK″ ≥β QK‴. Also Q[y:K′]L′ ≥β Q[y:K‴]L′. Resuming: Typ* QB ≥β Q[y:K‴]L′ and Typ QA ≥ QK‴. So Q(A)B ∈ Λ. □

Note that the above theorem does not hold if we use lambda equivalence (~) instead of β-equivalence (~β). Let QA ∈ Λ and Typ QA ≥ QA′. Let QB ∈ Λ for some B. Then Typ* QB ≡ QB′ ~ Q[y:A′](y)B′ for some fresh y, since Q[y:A′](y)B′ ≥η QB′. If the above theorem were to hold with ~ instead of ~β, it would follow that Q(A)B ∈ Λ. Note that A and B are arbitrary. This can clearly not generally be the case. As a counterexample, take Q ≡ [x:τ], A ≡ B ≡ x. Then Q(A)B ≡ [x:τ](x)x, which does not belong to Λ.

We shall prove a number of theorems concerning Λ.
Theorem 1.6. If A ∈ Λ and A ≥α B, then B ∈ Λ. □

As with Λ̄, it holds for Λ that, given K ∈ Λ, only one of the derivation steps in Def. 1.3 can yield K ∈ Λ as a conclusion (unique Λ-constructibility).
Theorem 1.7. If Q(A)B ∈ Λ, then QA and QB ∈ Λ.

Proof. Follows from the unique Λ-constructibility. □
Theorem 1.8. If Q[x:A]B ∈ Λ, then QA ∈ Λ.

Proof. Induction on |B|, using the unique Λ-constructibility. Let B ≡ [y₁:B₁]...[yₖ:Bₖ]Ps, where P ≢ [z:E]P′, and s ≡ τ, s ≡ y ≢ x or s ≡ x.
case 1. P ≡ ∅, k = 0. Then QA ∈ Λ from rule (2) of Def. 1.3 for all possible s.
case 2. P ≡ ∅, k ≥ 1. Then Q[x:A][y₁:B₁]...[yₖ₋₁:Bₖ₋₁]Bₖ ∈ Λ from rule (2) or (3), so QA ∈ Λ by induction.
case 3. P ≡ (E)P′. Then Q[x:A][y₁:B₁]...[yₖ:Bₖ]E ∈ Λ by Th. 1.7, hence QA ∈ Λ by induction. □

Theorem 1.9. If QA ∈ Λ, then Qτ ∈ Λ.

Proof. Induction on |A|. If A ≡ τ, there is nothing to prove. If A ≡ x, then Q ≡ Q₁[y:B] or Q ≡ Q₁[x:B]. In both cases Q₁B ∈ Λ, so also Qτ ∈ Λ. If A ≡ (B)C or A ≡ [x:B]C, then QB ∈ Λ by Th. 1.7 or by Th. 1.8, so by induction Qτ ∈ Λ. □
c A , then AIB E A.
Proof. Induction on IAl. If A = r then the proof is trivial. Let A = (21: All ... [zk : Ak] Ps, where P f [ z : E ] PI. (1) If B
= [xj : Aj] ... [zk: Ah]Ps or B = Ps, then AIB 3 A E A.
( 2 1 : Ail ...[zi-i : Ai-I] (AilB) : All ... [ ~ i - 1 : Ai-11 Ai E A by ( [ x i : All ... [zi-i : Ai-l] Ai)lB and Th. 1.8, so by induction AIB E A.
( 2 ) If B C Ai, then AIB
(3) Let B C Ps, B f Ps. If P = 0 then B = s and AIB = A E A. So assume P = ( K )PI. Distinguish the cases B c K and B c PIS. In both cases we may conclude AIB E A by a similar reasoning as in (2). 0
Corollary 1.11. ZfA E A and x
c A, then Alz E A.
0
Theorem 1.12. If QA and QB ∈ Λ and Q[x:A]B ∈ Λ̄, then Q[x:A]B ∈ Λ.

Proof. Induction on |B|. Let B ≡ [y₁:B₁]...[yₖ:Bₖ]Ps, where P ≢ [z:E]P′.
case 1. P ≡ ∅, k = 0. Then Q[x:A]B ∈ Λ by Def. 1.3 (2) or (3).
case 2. P ≡ ∅, k ≥ 1. Call [y₁:B₁]...[yₖ₋₁:Bₖ₋₁] ≡ Q′.
(1) Assume s ≡ yₖ. Then QQ′Bₖ ∈ Λ by Th. 1.8, and Q[x:A]Q′Bₖ ∈ Λ̄ (Th. II.3.8 and Th. II.3.9), so by induction Q[x:A]Q′Bₖ ∈ Λ, hence Q[x:A]B ∈ Λ.
(2) Assume s ≢ yₖ. Then QQ′Bₖ and QQ′s ∈ Λ (by the unique Λ-constructibility), Q[x:A]Q′Bₖ and Q[x:A]Q′s ∈ Λ̄ (Th. II.3.8 and Th. II.3.9), so by induction Q[x:A]Q′Bₖ and Q[x:A]Q′s ∈ Λ. It follows that Q[x:A]B ∈ Λ.
case 3. P ≡ (E)P′. Call [y₁:B₁]...[yₖ:Bₖ] ≡ Q″. Then QQ″E and QQ″P′s ∈ Λ by Th. 1.7, Typ* QQ″P′s ≡ QQ″F′ ≥ QQ″[z:K]L and Typ QQ″E ≡ QQ″E′ ≥ QQ″K. It follows from Th. II.7.33, Th. II.8.9 and Th. II.8.17 that Typ* Q[x:A]Q″P′s ≡ Q[x:A]Q″F″ ≥ Q[x:A]Q″[z:K]L and Typ Q[x:A]Q″E ≡ Q[x:A]Q″E″ ≥ Q[x:A]Q″K. By Th. II.3.8 and Th. II.3.9, Q[x:A]Q″P′s and Q[x:A]Q″E ∈ Λ̄, so by induction they also belong to Λ, hence Q[x:A]B ∈ Λ. □

Theorem 1.13. If Q[x:A]B ∈ Λ and QB ∈ Λ̄, then QB ∈ Λ.

Proof. Induction on |B|. The proof is similar to the proof of Th. 1.12, with the use of Th. II.3.10 instead of Th. II.3.11. □

We shall use the following theorem as a lemma for the important Th. 1.15.

Theorem 1.14. Let PP′K, PL ∈ Λ, PP′L ∈ Λ̄ and Typ* PP′K ≥α Typ* PP′L. Then PP′L ∈ Λ.

Proof. Induction on ‖PP′‖. If PP′ ≡ ∅, the proof is trivial.
case 1. Assume P ≡ Q(E)P″. Then QP″P′K ∈ Λ, QE ∈ Λ, Typ* QP″P′K ≥ Q[y:M]N and Typ QE ≥ QM. Also: QP″L ∈ Λ and QP″P′L ∈ Λ̄. We now prove that Typ* QP″P′K ≥α Typ* QP″P′L. Let Typ* PP′K ≡ PP′K′ and Typ* PP′L ≡ PP′L′; then by hypothesis Q(E)P″P′K′ ≡ PP′K′ ≥α PP′L′ ≡ Q(E)P″P′L′, so also QP″P′K′ ≥α QP″P′L′ (Th. II.4.6). But Typ* QP″P′K ≡ QP″P′K′ and Typ* QP″P′L ≡ QP″P′L′ by Th. II.8.6 and Th. II.8.18. It follows by induction that QP″P′L ∈ Λ. Also Typ* QP″P′L ≥ Q[y:M]N, so Q(E)P″P′L ≡ PP′L ∈ Λ.
case 2. Assume P ≡ Q and P′ ≡ Q′(E)P″. Then QQ′P″K ∈ Λ, QQ′E ∈ Λ, Typ* QQ′P″K ≥ QQ′[y:M]N and Typ QQ′E ≥ QQ′M. Also: QQ′P″L ∈ Λ̄ and Typ* QQ′P″K ≥α Typ* QQ′P″L (which can be proved as in case 1), so by induction QQ′P″L ∈ Λ. Since Typ* QQ′P″L ≥ QQ′[y:M]N it follows that QQ′(E)P″L ≡ PP′L ∈ Λ.
case 3. Assume P ≡ Q and P′ ≡ Q′. If Q′ ≡ ∅ there is nothing to prove. Let Q′ ≡ [z₁:M₁]...[zₙ:Mₙ] for n ≥ 1. Since QL and QQ′L ∈ Λ̄, zᵢ cannot occur in QL (Th. II.3.8) or in Q[z₁:M₁]...[zᵢ₋₁:Mᵢ₋₁]Mᵢ. It follows from QQ′K ∈ Λ (Th. 1.8) that QM₁, Q[z₁:M₁]M₂, ..., Q[z₁:M₁]...[zₙ₋₁:Mₙ₋₁]Mₙ ∈ Λ. So also Q[z₁:M₁]L, Q[z₁:M₁][z₂:M₂]L, ..., QQ′L ∈ Λ by Th. II.3.11 and Th. 1.12. □

Theorem 1.15. If A ∈ Λ, then Typⁿ A ∈ Λ for all permissible n.

Proof. Let A ≡ P₁[x:B]P₂x; then Typ A ≡ P₁[x:B]P₂ FrB. Since A ∈ Λ: P₁B ∈ Λ (Th. 1.8), so P₁ FrB ∈ Λ (Th. 1.6). Also Typ A ∈ Λ̄ (Th. II.8.2) and Typ* A ≥α Typ*(Typ A). Now, applying Th. 1.14, we obtain Typ A ∈ Λ. The theorem follows directly. □

2. The normalization theorem

In this section we shall prove the normalization theorem: if A ∈ Λ, there is a B in normal form such that A ≥ B (B is said to be in normal form if there are no reductions B ≥¹β B′ or B ≥¹η B′). We do this by the aid of a norm ρ, which is a partial function from expressions in Λ̄ to expressions in Λ̄, and which has the following powerful properties with relation to Λ:

(1) If A ∈ Λ, then ρ(A) is defined.
(2) If A ∈ Λ and A ≥ B, then ρ(A) ≥α ρ(B).
(3) If A ∈ Λ and Deg A > 1, then ρ(A) ≥α ρ(Typ(A)).

Hence this norm is invariant (apart from α-reduction) with respect to reduction and typing. We first define ρ_A for every A ∈ Λ̄. This ρ_A is a partial function from subexpressions of A to expressions. It is rather in contradiction to our philosophy to define the norm with respect to subexpressions, which need not belong to Λ̄. We could have avoided this by giving a definition of the norm in the line of our second definition of Λ̄, only considering norms of expressions in Λ̄. This, however, would have impaired understanding of the following and would have led to laborious descriptions. On the other hand, in this section the context of a subexpression will always be clear, so that no confusion can arise. In the following inductive definition of ρ_A we do not explicitly indicate which occurrence of a subexpression in an expression is meant, since this will be clear from the context.
Definition 2.1. Let A ∈ Λ̄.
(1) If τ ⊂ A, then ρ_A(τ) ≡ τ.
(2) If x ⊂ A, A|x ≡ Q₁[x:B]Q₂x and if ρ_A(B) is defined, then ρ_A(x) ≡ ρ_A(B).
(3) If [x:B]C ⊂ A, and if both ρ_A(B) and ρ_A(C) are defined, then ρ_A([x:B]C) ≡ [x:ρ_A(B)]ρ_A(C).
(4) If (B)C ⊂ A, if both ρ_A(B) and ρ_A(C) are defined and ρ_A(C) ≡ [y:D]E where D ≥α ρ_A(B), then ρ_A((B)C) ≡ E. □
From this definition it can easily be seen that, if ρ_A A is defined for A ∈ Λ̄, then ρ_A A contains no bound variables. The following theorem is obvious:

Theorem 2.2. If A ∈ Λ̄, ρ(A) is defined and A ≥α B, then ρ(B) is defined and ρ(A) ≥α ρ(B). □

The binding variables in ρ_A A will be irrelevant to our purposes. We might as well do without them. Our reason for retaining them is personal taste: we find the property ρ_A(A) ∈ Λ̄ agreeable. In trying to calculate ρ_A(A) for a certain A ∈ Λ̄, we apply the four rules of Def. 2.1; the only event in which this calculation can break down prematurely (before ρ_A(A) is obtained) is when we encounter a subexpression (B)C ⊂ A for which the conditions stated in Def. 2.1 (4) are not fulfilled. These conditions may be considered as a weaker form of the applicability condition (cf. Section I.6, where this is explained in an informal manner): (1) C must have a norm with a functional character: ρ_A C ≡ [y:D]E, and (2) B must have a norm which behaves as an appropriate argument for the "function" ρ_A C: ρ_A B ≥α D. If these conditions are fulfilled, the norm of (B)C is defined as the result of the application of the "function" ρ_A C to the "argument" ρ_A B: ρ_A((B)C) ≡ E; if these conditions are not fulfilled, the norm of (B)C is not defined, and neither is the norm of A. Note that the norm of a bound variable is defined as the norm of its "type": if [x:B] is the binding abstractor of x, then ρ_A(x) ≡ ρ_A(B). The existence of ρ_A A for a certain A ∈ Λ̄ indicates that some weak functional condition is fulfilled. Surprisingly enough, the existence of ρ_A A already guarantees that there is a normal form for A. We shall prove this in Th. 2.17. We are especially interested in normalization properties of expressions in Λ. We note that expressions in Λ have, so to say, a much stronger functional character than is required for the existence of the norm of expressions. Th. 2.7, stating that ρ_A A exists for A ∈ Λ, is not hard to prove.
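The calculation just described is easy to mechanize. The Python sketch below is again an illustration under simplifying assumptions, not the official definition: binder names in norms are dropped (the text notes they are irrelevant), the environment carries ρ of each variable's type, and None plays the role of "undefined". Rule (4) is exactly where the computation can break down.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tau: pass
@dataclass(frozen=True)
class Var: name: str
@dataclass(frozen=True)
class Abs: var: str; typ: object; body: object    # [x : B] C
@dataclass(frozen=True)
class App: arg: object; fun: object               # (B) C

def norm(e, env=None):
    """rho-norm in the spirit of Def. 2.1; returns None when undefined."""
    env = env or {}
    if isinstance(e, Tau):
        return "tau"                               # rule (1)
    if isinstance(e, Var):
        return env.get(e.name)                     # rule (2): rho(x) = rho(B)
    if isinstance(e, Abs):                         # rule (3)
        d = norm(e.typ, env)
        c = norm(e.body, {**env, e.var: d}) if d is not None else None
        return None if d is None or c is None else ("abs", d, c)
    if isinstance(e, App):                         # rule (4): rho C = [y:D]E, D = rho B
        b, c = norm(e.arg, env), norm(e.fun, env)
        if b is None or not isinstance(c, tuple) or c[1] != b:
            return None                            # weak applicability test fails
        return c[2]
```

The counterexample [x:τ](x)x of Section 1 indeed has no norm, whereas β-reduction leaves the norm unchanged: (τ)[x:τ]x and its reduct τ both have norm τ.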
If in the following we speak of the norm of a subexpression B of a certain expression A, it will be clear which A we mean, even if we do not state this explicitly. In such cases we shall write ρ(B) instead of ρ_A(B). If A ∈ Λ̄, B ⊂ A and ρ_A(B) is defined, we call B ρ_A-normable. Here, too, we speak of "ρ-normable B" if it is clear which A (with B ⊂ A ∈ Λ̄) we mean. If Qτ ∈ Λ̄, if Qτ is ρ-normable and ρ(Qτ) ≡ Q′τ, we call Q ρ-normable, and we abbreviate ρ(Q) ≡ Q′.
Theorem 2.3. If A ∈ Λ̄, A is ρ-normable and B ⊂ A, then B is ρ-normable.

Proof. Induction on |A|, with the use of the definition of B ⊂ A (Def. II.2.5). □
Theorem 2.4. If QA ∈ Λ̄ and QA is ρ-normable, then Q and A are ρ-normable and ρ(QA) ≡ (ρQ)ρA; if QA ∈ Λ̄, and if Q and A are ρ-normable, then QA is ρ-normable and ρ(QA) ≡ (ρQ)ρA.

Proof. Induction on ‖Q‖. □
Theorem 2.5. If A ∈ Λ̄, if A is ρ-normable and A ≥ B, then B is ρ-normable and ρA ≥α ρB.

Proof. First assume that A ≥¹ B. We proceed by induction on the length of proof that A ≥¹ B. If A ≥¹α B then the proof is trivial; this case is expressed in Th. 2.2.
(I) (a) A ≥¹ B is Q(C)[x:D]E ≥¹β Q(x:=C)E. Since A is ρ-normable: ρ([x:D]E) ≡ [x:ρD]ρE and ρC ≥α ρD ≡ ρx (see Th. 2.3 and Th. 2.4). Moreover, ρA ≡ ρ(Q(C)[x:D]E) ≡ (ρQ)ρE. We now prove, for this x, C and E:

Lemma. If K ⊂ E, then (x:=C)K is ρ-normable and ρ(x:=C)K ≥α ρK.

Proof of the lemma. Induction on |K|.
(1) (a) If K ≡ x, then (x:=C)K ≡ FrC and ρ(x:=C)K ≡ ρ FrC ≥α ρC ≥α ρx ≡ ρK.
(b) If K ≡ y ≢ x or K ≡ τ, then (x:=C)K ≡ K and ρ(x:=C)K ≡ ρK.
(2) If K ≡ [y:F]G, then ρK ≡ [y:ρF]ρG. Note that y ≢ x. By induction: (x:=C)F and (x:=C)G are ρ-normable, ρ(x:=C)F ≥α ρF and ρ(x:=C)G ≥α ρG. So (x:=C)K is ρ-normable and ρ(x:=C)K ≡ [y:ρ(x:=C)F]ρ(x:=C)G ≥α [y:ρF]ρG ≡ ρK.
(3) If K ≡ (F)G, then ρG ≡ [y:L]H with L ≥α ρF, and ρK ≡ H. By induction: (x:=C)F and (x:=C)G are ρ-normable, ρ(x:=C)F ≥α ρF and ρ(x:=C)G ≥α ρG. It follows that ρ(x:=C)G ≡ [z:L′]H′ with L′ ≥α L ≥α ρF ≥α ρ(x:=C)F, so (x:=C)K is ρ-normable and ρ(x:=C)K ≡ H′ ≥α H ≡ ρK. □

It follows that B is ρ-normable (since E ⊂ E), and ρB ≡ (ρQ)ρ(x:=C)E ≥α (ρQ)ρE ≡ ρA.
(b) A ≥¹ B is Q[x:C](x)D ≥¹η QD. Since A is ρ-normable: QC is ρ-normable, ρx ≡ ρC, ρD ≡ [y:L]H and L ≥α ρx ≡ ρC, so ρA ≡ (ρQ)[x:ρC]H ≥α (ρQ)[y:L]H ≡ (ρQ)ρD ≡ ρQD ≡ ρB.
(II) A ≥¹ B is a direct consequence of a monotony rule. It depends on the monotony rule which of the following three cases applies:
(a) A ≥¹ B is Q(C)E ≥¹ Q(C)F as a direct consequence of QE ≥¹ QF. Since A is ρ-normable: QE is ρ-normable. By induction: QF is ρ-normable, and ρQE ≡ (ρQ)ρE ≥α (ρQ)ρF ≡ ρQF, so ρE ≥α ρF. Moreover, ρE ≡ [y:L]H and ρC ≥α L, so also ρF ≡ [y:L′]H′, where L′ ≥α L and H′ ≥α H. It follows that (C)F is ρ-normable, and ρ(C)F ≡ H′ ≥α H. So ρB ≥α (ρQ)H ≡ ρA.
(b) A ≥¹ B is Q[x:C]E ≥¹ Q[x:D]E as a direct consequence of QC ≥¹ QD. Then ρA ≡ (ρQ)[x:ρC]ρE. By induction: QD is ρ-normable and ρQC ≡ (ρQ)ρC ≥α (ρQ)ρD ≡ ρQD, so ρC ≥α ρD. Hence B is ρ-normable and ρA ≥α (ρQ)[x:ρD]ρE ≡ ρB.
(c) A ≥¹ B is Q(C)E ≥¹ Q(D)E as a direct consequence of QC ≥¹ QD. Then ρE ≡ [x:L]H and L ≥α ρC. By induction: QD is ρ-normable and ρQC ≡ (ρQ)ρC ≥α (ρQ)ρD ≡ ρQD, so ρC ≥α ρD. Hence B is ρ-normable and ρB ≡ (ρQ)H ≡ ρA.
Finally, if A ≥ B is a multiple-step reduction, decompose the reduction and apply the above. □
Theorem 2.6. If A ∈ Λ̄, Deg A > 1 and A is ρ-normable, then Typ A is ρ-normable and ρA ≥α ρ Typ A.

Proof. Let A ≡ P₁[x:B]P₂x; then Typ A ≡ P₁[x:B]P₂ FrB. It is not hard to show that ρ FrB ≥α ρB ≡ ρx. Let P₁[x:B]P₂ ≡ P₁P″. Next prove by induction on ‖P″‖ that P″ FrB is ρ-normable, and ρ(P″ FrB) ≥α ρ(P″x). □

Theorem 2.7. If A ∈ Λ, then A is ρ-normable (i.e. ρ is a total function on Λ).
Proof. Induction on the length of proof of A ∈ Λ.
(1) A ≡ τ: trivial.
(2) A ≡ Q[x:B]x or Q[x:B]τ ∈ Λ as a direct consequence of QB ∈ Λ. Then by induction QB is ρ-normable, hence Q is ρ-normable and ρB ≡ ρx. Hence A is ρ-normable.
(3) A ≡ Q[x:B]y ∈ Λ as a direct consequence of QB ∈ Λ and Qy ∈ Λ. By induction: QB and Qy are ρ-normable, hence Q, B and y are ρ-normable, so A is ρ-normable.
(4) A ≡ Q(B)C ∈ Λ as a direct consequence of QB ∈ Λ, QC ∈ Λ, and the Q-applicability of QC to QB. Then QB and QC are ρ-normable (induction), and so are Q, B and C. The Q-applicability implies that Typ* QC ≥ Q[x:K]L and Typ QB ≥ QK. From Th. 2.5 and Th. 2.6: Typ* QC and Q[x:K]L are ρ-normable, ρQC ≥α ρ Typ* QC ≥α (ρQ)[x:ρK]ρL (so ρC ≥α [x:ρK]ρL) and ρQB ≥α ρ Typ QB ≥α ρQK (so ρB ≥α ρK). Hence (B)C is ρ-normable and so is A. □
Instead of "ρ is total on Λ", we also say: Λ is ρ-normable. From Th. 2.5 we derive:

Theorem 2.8. If A, B ∈ Λ and A ~ B, then ρA ≥α ρB. □

We shall now prove the normalization theorem for ρ-normable expressions.
Definition 2.9. A ∈ Λ̄ is normal (or in normal form) if there are no reductions A ≥¹β B or A ≥¹η B; A is normalizable (or A has a normal form) if there exists a normal C such that A ≥ C (C is called a normal form of A). □

Definition 2.10. A ∈ Λ̄ is β-normal if there is no reduction A ≥¹β B; A is β-normalizable (or A has a β-normal form) if there exists a β-normal C such that A ≥β C (C is called a β-normal form of A). □

Hence A is normal if A admits of neither β- nor η-reductions; A is β-normal if A admits of no β-reductions (except trivial ones).
Theorem 2.11. If A ∈ Λ̄ is normal and A ≥ B, then B is normal. If A ∈ Λ̄ is β-normal and A ≥α B or A ≥η B, then B is β-normal.

Proof. The only non-trivial statement is that B is β-normal if A is β-normal and A ≥¹η B. It can, however, easily be seen that a single-step η-reduction of a β-normal expression cannot introduce the possibility of a single-step β-reduction. □

We restate the following well-known theorem:
Theorem 2.12. If A ∈ Λ̄ is β-normalizable, then A is normalizable.

Proof. As a result of Th. 2.11, η-reductions of A do not cancel the β-normal character. But the possible number of single-step η-reductions applicable to A is finite, since the expression becomes shorter with each step. □

Theorem 2.13. If A ∈ Λ̄ is β-normalizable, the β-normal form of A is unique but for α-reductions.

Proof. Let C and D be β-normal, A ≥β C and A ≥β D. Then, by the Church-Rosser theorem for β-reduction (Th. II.6.43), there is an E such that C ≥β E and D ≥β E. Hence C ≥α D. □

Theorem 2.14. Assume that Q(Aₖ)...(A₁)B is in Λ̄ and ρ-normable. Then |ρB| > Σᵢ₌₁ᵏ |ρAᵢ|.

Proof. Induction on k. If k = 0 the proof is trivial. Let k > 0. Then (A₁)B is ρ-normable, hence ρB ≡ [x:M]N and ρA₁ ≥α M, so |ρA₁| = |M|. Moreover, ρ(A₁)B ≡ N, hence |N| > Σᵢ₌₂ᵏ |ρAᵢ| by induction. It follows that |ρB| > |M| + |N| > Σᵢ₌₁ᵏ |ρAᵢ|. □

Definition 2.15. Assume that A is in Λ̄ and ρ-normable, A ≡ Q(Cₙ)...(C₁)F for some n ≥ 1 and F ≢ (M)N. Then σ(A) = Σᵢ₌₁ⁿ |ρCᵢ|. If A ≡ Qs (with s ≡ x or s ≡ τ), then σ(A) = 0. □
Theorem 2.16. Assume that A is in Λ̄ and ρ-normable, A ≡ Q(Cₙ)...(C₁)F, F ≢ (M)N, and let QCᵢ (for 1 ≤ i ≤ n) and QF be β-normal. Then A is β-normalizable.

Proof. Induction on σ(A).
(1) If σ(A) = 0, then n = 0 and A ≡ QF is in β-normal form.
(2) Let σ(A) > 0. Then n ≥ 1. We proceed by induction on |F|. If F ≡ y, then A is in β-normal form (F ≡ τ cannot occur since A is ρ-normable). So let |F| > 1. Then F ≡ [x:D]E.
(i) Assume that E ≡ (Hₘ)...(H₁)y where y ≢ x and m ≥ 0. Then A ≥ Q(Cₙ)...(C₂)((x:=C₁)Hₘ)...((x:=C₁)H₁)y. Now note that Q(C₁)[x:D]Hᵢ ∈ Λ̄, ρ-normable, and σ(Q(C₁)[x:D]Hᵢ) = |ρC₁| ≤ σ(A). Moreover, |Hᵢ| < |E|, so by induction Q(C₁)[x:D]Hᵢ is β-normalizable. Since QC₁, QD and QHᵢ are β-normal, the β-normalization of Q(C₁)[x:D]Hᵢ must commence with Q(C₁)[x:D]Hᵢ ≥¹β Q(x:=C₁)Hᵢ, so Q(x:=C₁)Hᵢ is β-normalizable; say Q(x:=C₁)Hᵢ ≥ QKᵢ in β-normal form. It follows that A ≥ Q(Cₙ)...(C₂)(Kₘ)...(K₁)y in β-normal form.
(ii) Assume that E ≡ (Hₘ)...(H₁)τ where m ≥ 0. Again, analogously to (i), A is β-normalizable. (Moreover, m must be 0 since A is ρ-normable.)
(iii) Assume that E ≡ (Hₘ)...(H₁)x where m ≥ 0. Then A ≥ A′ ≡ Q(Cₙ)...(C₂)(Kₘ)...(K₁) FrC₁, where we obtain the β-normal QKᵢ as in (i). If now FrC₁ is a variable or if FrC₁ begins with an applicator, we have obtained a β-normal form. If FrC₁ ≡ [y:M]N, then σ(A′) = Σᵢ₌₂ⁿ |ρCᵢ| + Σᵢ₌₁ᵐ |ρKᵢ| < σ(A), since Σᵢ₌₁ᵐ |ρKᵢ| < |ρ FrC₁| (by Th. 2.14 and Th. 2.5) = |ρC₁|. So by the induction hypothesis A′ is β-normalizable, hence A is β-normalizable too; or n = 1 and m = 0, whence A′ is β-normal.
(iv) Assume that E ≡ [y:H₁]H₂. Then A ≥ Q(Cₙ)...(C₂)[y:(x:=C₁)H₁](x:=C₁)H₂ ≥ Q(Cₙ)...(C₂)[y:K₁]K₂, where we again obtain the β-normal Q[y:K₁]K₂ as in (i). Since σ(Q(Cₙ)...(C₂)[y:K₁]K₂) = Σᵢ₌₂ⁿ |ρCᵢ| < σ(A), it follows by induction that A is β-normalizable, or n = 1 and Q[y:K₁]K₂ is β-normal. □
Theorem 2.17 (β-normalization theorem). If A ∈ Λ is μ-normable, then A is β-normalizable.
Proof. Induction on the length of proof of A ∈ Λ.
(1) A ≡ τ: trivial.
(2) A ≡ Q [x : B] x or A ≡ Q [x : B] τ ∈ Λ as a direct consequence of QB ∈ Λ. Then by induction QB is β-normalizable, so QB ≥β Q′B′ in β-normal form, and A ≥β Q′ [x : B′] x or A ≥β Q′ [x : B′] τ in β-normal form.
Strong normalization in a typed lambda calculus (C.3)
463
(3) A ≡ Q [x : B] y ∈ Λ as a direct consequence of QB ∈ Λ and Qy ∈ Λ. Then by induction QB ≥β Q′B′ in β-normal form, so A ≥β Q′ [x : B′] y in β-normal form.
(4) A ≡ Q (B) C ∈ Λ as a direct consequence of QB ∈ Λ and QC ∈ Λ. Then by the induction hypothesis: QB ≥β Q′B′ in β-normal form (with ||Q|| = ||Q′||) and QC ≥β Q″C′ in β-normal form (with ||Q|| = ||Q″||). From Th. 2.13 and induction on ||Q|| it follows that Q″C′ ≥α Q′C′. Also Q (B) C ≥β Q′ (B′) C′, which is in β-normal form if C′ ≢ [x : D] E. So all that is left to prove is that Q′ (B′) [x : D] E is β-normalizable. But this follows from Th. 2.16 and Th. 2.5. □
Theorem 2.18 (normalization theorem for Λ). If A ∈ Λ, then A is β-normalizable and normalizable.
Proof. Follows from Th. 2.17, Th. 2.12 and Th. 2.7. □
In fact we proved that A ∈ Λ is effectively normalizable, since all our proofs are constructive; this implies that the normal form of A ∈ Λ is effectively computable.
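To illustrate what "effectively computable" means here, the following sketch normalizes plain untyped λ-terms by leftmost-outermost (normal-order) reduction. It is an illustration only, not the typed system Λ of the paper, and the substitution is capture-naive (sufficient for the examples shown).

```python
# Illustrative only: a normal-order beta-normalizer for plain untyped
# lambda terms, not for the typed system Lambda of the paper.
# Terms: ('var', x) | ('lam', x, body) | ('app', f, a)

def subst(term, x, s):
    """Capture-naive substitution term[x := s]."""
    tag = term[0]
    if tag == 'var':
        return s if term[1] == x else term
    if tag == 'lam':
        _, y, body = term
        return term if y == x else ('lam', y, subst(body, x, s))
    _, f, a = term
    return ('app', subst(f, x, s), subst(a, x, s))

def step(term):
    """One leftmost-outermost beta-step, or None if term is beta-normal."""
    tag = term[0]
    if tag == 'app':
        f, a = term[1], term[2]
        if f[0] == 'lam':                      # outermost redex first
            return subst(f[2], f[1], a)
        r = step(f)
        if r is not None:
            return ('app', r, a)
        r = step(a)
        return None if r is None else ('app', f, r)
    if tag == 'lam':
        r = step(term[2])
        return None if r is None else ('lam', term[1], r)
    return None                                # a variable is normal

def normalize(term, fuel=1000):
    """Iterate single steps; the fuel bound stands in for a termination proof."""
    for _ in range(fuel):
        nxt = step(term)
        if nxt is None:
            return term
        term = nxt
    raise RuntimeError('no normal form reached within the fuel bound')
```

For instance, with K ≡ λx.λy.x, normalizing (K a) b yields a.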
3. Strong normalization

In the previous section we have proved normalization for Λ. This guarantees that for every A ∈ Λ there is a reduction which leads to a normal form. However, we do not yet know whether an arbitrary sequence of single-step reductions, beginning with A, terminates (in a normal form). We shall prove this in this section. The property that an arbitrary sequence of single-step reductions, beginning with some A, terminates, will be called the property of strong normalization.
In the proof we shall use β₁-reduction and β₂-reduction, introduced in Section II.6. A feature of β₁-reduction is that "scars" of old β-reductions are retained. We shall first prove strong normalization for Λ as to β₁-reduction, and derive strong normalization for Λ as to β-reduction; finally, we shall incorporate η-reductions.
Definition 3.1. A ∈ Λ is β₁-normal (or in β₁-normal form) if there is no B such that A ≥¹β₁ B; A is β₁-normalizable (or A has a β₁-normal form) if there exists a β₁-normal C such that A ≥β₁ C (C is then called a β₁-normal form of A). The concepts β₂-normal, β₂-normal form and β₂-normalizable are defined analogously. □
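A rough illustration of the β₁/β₂ decomposition on plain untyped terms (a sketch under the reading of Section II.6, with capture-naive substitution): a β₁-step substitutes but leaves the applicator-abstractor pair in place as a "scar", and a β₂-step then removes such an ineffective pair; together they give an ordinary β-step.

```python
# Sketch: beta = beta1 followed by beta2, on plain lambda terms.
# Terms: ('var', x) | ('lam', x, body) | ('app', f, a)

def subst(term, x, s):
    """Capture-naive substitution term[x := s]."""
    tag = term[0]
    if tag == 'var':
        return s if term[1] == x else term
    if tag == 'lam':
        _, y, body = term
        return term if y == x else ('lam', y, subst(body, x, s))
    _, f, a = term
    return ('app', subst(f, x, s), subst(a, x, s))

def free_vars(term):
    tag = term[0]
    if tag == 'var':
        return {term[1]}
    if tag == 'lam':
        return free_vars(term[2]) - {term[1]}
    return free_vars(term[1]) | free_vars(term[2])

def beta1_root(term):
    """beta1-step at the root: substitute, but keep the pair as a scar."""
    _, f, a = term
    assert f[0] == 'lam'
    return ('app', ('lam', f[1], subst(f[2], f[1], a)), a)

def beta2_root(term):
    """beta2-step at the root: drop a pair whose variable no longer occurs."""
    _, f, a = term
    assert f[0] == 'lam' and f[1] not in free_vars(f[2])
    return f[2]
```

On (λx.x) a, the β₁-step gives (λx.a) a, a term with an ineffective pair; the β₂-step then yields a, the ordinary β-contractum. Note that a β₁-step never shortens a term (cf. Th. 3.12 below), which is what makes the measures Θ₁ and Θ₂ usable.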
Theorem 3.2. If A ∈ Λ, A is β₁-normal and A ≥α B, then B is β₁-normal. □
Theorem 3.3. If A ∈ Λ is β₁-normalizable, the β₁-normal form is unique but for α-reduction.
Proof. This follows from CR for β₁-reductions (Th. II.6.38). □
Theorem 3.4. If A ∈ Λ, A is μ-normable and A ≥β₁ B, then B is μ-normable and μA ≥α μB.
Proof. First assume that A ≥¹β₁ B. We proceed by induction on the length of proof of A ≥¹β₁ B.
(I) Let A ≡ Q (C) P [x : D] E ≥¹β₁ Q (C) P [x : D] (x := C) E ≡ B. Then P [x : D] E is μ-normable. It is easy to see (induction on ||P||) that μ([x : D] E) ≡ [x : μD] μE, so μC ≥α μD. As in the lemma μ(P [x : D] E) ≡ μ([x : D] E) occurring in the proof of Th. 2.5, we can prove that (x := C) E is μ-normable and that μE ≥α μ((x := C) E). It follows that B is μ-normable and μA ≥α μB.
(II) A ≥¹β₁ B is a direct consequence of a monotony rule. In all three cases the proof is identical to that given in part II of the proof of Th. 2.5.
Finally, if A ≥β₁ B is a multiple-step reduction, decompose the reduction into single-step β₁-reductions and apply the above. □
We shall now prove the β₁-normalization theorem. We do this in quite the same manner in which we proved the β-normalization theorem. In fact, if we had begun by proving the β₁-normalization theorem, the β-normalization theorem would have been a corollary. We have not chosen this order because in the following proof the main lines are obscured by the presence of a number of β-chains P̊_i. In contrast to this, the line of thought in the proof of the β-normalization theorem, given in the previous section, is much more lucid.

Definition 3.5. Let A ∈ Λ, A ≡ P₁ P B, and let P be such that, for each [x : C] for which P ≡ P₂ [x : C] P₃, it holds that x does not occur in P₃ B. Then we call P an ineffective β-chain, and write A ≡ P₁ P̊ B. □
Theorem 3.6. If A ∈ Λ is β₁-normal and B is a subexpression of A, then B has the form
B ≡ P̊₀ [x_1 : A_1] P̊₁ ... [x_n : A_n] P̊_n (B_1) P̊′₁ ... (B_l) P̊′_l s, with s ≡ x_i or s ≡ τ.
Proof. Induction on |B|. □
Theorem 3.7. Let (A_k) P̊_k (A_{k-1}) P̊_{k-1} ... (A_1) P̊₁ B belong to Λ and be μ-normable. Then |μB| > Σ_{i=1}^{k} |μA_i|.
Proof. Analogous to the proof of Th. 2.14. Note again that for μ-normable P̊C it holds: μ(P̊C) ≡ μC. □

Definition 3.8. Let A ∈ Λ and assume that A is μ-normable. Let A ≡ Q P̊_{n+1} (C_n) P̊_n ... (C_1) P̊₁ F, where F ≡ τ, F ≡ x, or F ≡ [y : M] N with y occurring in N. Then σ₁(A) = Σ_{i=1}^{n} |μC_i| if n ≥ 1, and σ₁(A) = 0 in case n = 0 and P̊₁ is nonempty. Moreover, σ₁(A) = 0 if A ≡ Q τ or A ≡ Q x. □
Theorem 3.9. Let A belong to Λ and be μ-normable, let A ≡ Q P̊_{n+1} (C_n) P̊_n ... (C_1) P̊₁ F, where F ≡ τ, F ≡ x, or F ≡ [x : D] E with x occurring in E. Let QC_i, QF and Q P̊_i τ be β₁-normal. Then A is β₁-normalizable.

Proof. Induction on σ₁(A). The proof is analogous to that of Th. 2.16; however, some modifications are required due to the P̊_i. We shall briefly comment on this.
As to (2) (i): E ≡ P̊′_k (H_m) ... P̊′₁ (H_1) P̊′₀ y, and
A ≥ Q P̊_{n+1} (C_n) P̊_n ... (C_2) P̊₂ (C_1) P̊₁ [x : D] P̊″_k (K_m) ... P̊″₁ (K_1) P̊″₀ y,
where the K_i are obtained as in the proof of Th. 2.16 and where the Q P̊″_i τ are the β₁-normal forms of Q (x := C_1) P̊′_i τ. These can be obtained since either
(1) x does not occur in P̊′_i, and then P̊″_i ≡ P̊′_i, or
(2) x occurs in P̊′_i, and then σ₁(Q (C_1) [x : D] P̊′_i τ) = |μC_1| ≤ σ₁(A) and |P̊′_i τ| < |F| (apply the induction on |F|).
Note that (C_1) P̊₁ [x : D] P̊″_k is an ineffective β-chain.
As to (2) (iii): C_1 can be P̊₀ [x : M] N. If x occurs in N, then the proof is similar to that of Th. 2.16. If x does not occur in N, we can take [x : M] as part of an ineffective β-chain (K_1) P̊″₀ P̊₀ [x : M], and look at the structure of N instead of that of C_1. This amounts to looking for the first "effective" abstractor in N. If there is such an abstractor and the obtained expression is not yet in β₁-normal form, induction is applicable as in Th. 2.16. If not, we have already obtained β₁-normal form.
As to (2) (iv): E ≡ P̊₀ [y : H_1] H_2. If y occurs in H_2 (so y occurs in K_2), the proof is obvious. If not, look at K_2 instead of E, as in the previous case. □
Theorem 3.10 (β₁-normalization theorem). If A ∈ Λ is μ-normable, then A is β₁-normalizable.
Proof. Induction on the length of proof of A ∈ Λ, analogously to the proof of Th. 2.17. As to case (4) of this proof, the only case worth mentioning is C′ ≡ P̊ [x : D] E. If x occurs in E, then Th. 3.9 yields the desired result; if x does not occur in E, we have already obtained β₁-normal form. □

Theorem 3.11 (β₁-normalization theorem for Λ). If A ∈ Λ, then A is β₁-normalizable.
Proof. Follows from Th. 2.7 and Th. 3.10. □
In fact we have proved that A ∈ Λ is effectively β₁-normalizable.

Theorem 3.12. Let A ∈ Λ and A ≥β₁ B. Then |A| ≤ |B|.
Proof. Induction on the length of proof of A ≥β₁ B. □
Definition 3.13. Let A ∈ Λ. We write β₁-nf A for the β₁-normal form of A which we obtain from the effective computation as suggested by Th. 3.10 and used in Th. 3.11. □
Note. This β₁-normal form is unique (Th. 3.3).

Definition 3.14. We call K ∈ Λ strongly β-normalizable if there is an upper bound for the length l of reduction sequences K ≡ K₁ ≥¹β K₂ ≥¹β ... ≥¹β K_l. Analogously we define the concepts strong β₁-, β₂- or η-normalizability of K. □
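Strong normalizability is genuinely stronger than normalizability. In the typed system Λ strong normalization will be proved outright (Th. 3.28), but in the untyped λ-calculus a term can be normalizable without being strongly normalizable. The classical counterexample is (λx.y) Ω with Ω ≡ (λx.(x)x)(λx.(x)x): contracting the outer redex yields the normal form y, yet contracting inside Ω forever gives an infinite reduction sequence. A sketch:

```python
# Untyped counterexample: normalizable but not strongly normalizable.
# Terms: ('var', x) | ('lam', x, body) | ('app', f, a)

def subst(term, x, s):
    """Capture-naive substitution term[x := s]."""
    tag = term[0]
    if tag == 'var':
        return s if term[1] == x else term
    if tag == 'lam':
        _, y, body = term
        return term if y == x else ('lam', y, subst(body, x, s))
    _, f, a = term
    return ('app', subst(f, x, s), subst(a, x, s))

def contract_root(term):
    """Contract a beta-redex sitting at the root of the term."""
    _, f, a = term
    assert f[0] == 'lam'
    return subst(f[2], f[1], a)

omega = ('lam', 'x', ('app', ('var', 'x'), ('var', 'x')))
Omega = ('app', omega, omega)                    # the looping term
term = ('app', ('lam', 'x', ('var', 'y')), Omega)

# One outer step reaches the normal form y ...
assert contract_root(term) == ('var', 'y')
# ... but the inner redex reproduces itself: an infinite reduction sequence.
assert contract_root(Omega) == Omega
```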
Theorem 3.15 (strong β₂-normalization theorem for Λ). If A ∈ Λ, then A is strongly β₂-normalizable.
Proof. Induction on |A|. □

Definition 3.16. Let A ∈ Λ, and let A ≡ A₁ ≥¹β₂ A₂ ≥¹β₂ ... ≥¹β₂ A_p be the longest possible sequence of single-step β₂-reductions beginning with A. Then Θ₂(A) = p. □
Note that p ≤ |A|.

Theorem 3.17. If A ∈ Λ and A ≥β₁ B, then Θ₂(β₁-nf A) = Θ₂(β₁-nf B).
Proof. Follows from Th. 3.3. □
Theorem 3.18. If A ≥¹β₁ B, then Θ₂(A) < Θ₂(B).
Proof. Induction on the length of proof of A ≥¹β₁ B. The only interesting case is A ≡ Q (C) P [x : D] E ≥¹β₁ Q (C) P [x : D] (x := C) E ≡ B, where, indeed, we have at least one single-step β₂-reduction more on the right hand side. The rest of the proof is easy. □
Corollary 3.19. If A ∈ Λ, then Θ₂(A) ≤ Θ₂(β₁-nf A). □
Theorem 3.20 (strong β₁-normalization theorem for Λ). If A ∈ Λ, then A is strongly β₁-normalizable.
Proof. Follows from Th. 3.17, Th. 3.18 and Cor. 3.19. □
Definition 3.21. Let A ∈ Λ, and let A ≡ A₁ ≥¹β₁ A₂ ≥¹β₁ ... ≥¹β₁ A_p be the longest possible sequence of single-step β₁-reductions beginning with A. Then Θ₁(A) = p. □
Theorem 3.22. Let A ∈ Λ, let A be in β₁-normal form and let A ≥β₂ B. Then B is also in β₁-normal form.
Proof. If B were not in β₁-normal form, then B ≥¹β₁ C for some C. In that case there would be a reduction A ≥¹β₁ B′ ≥ C according to Th. II.6.17. Contradiction. □
Theorem 3.23. If A ∈ Λ, then there is an upper bound for the length l of reduction sequences A ≡ A₁ ≥¹ A₂ ≥¹ ... ≥¹ A_l, where each A_i ≥¹ A_{i+1} is a single-step β₁- or β₂-reduction.
Proof. Induction on Θ₁(A). If Θ₁(A) = 0, then A is in β₁-normal form. If we can apply β₂-reductions on A such that A ≥β₂ B (in n ≥ 1 steps), then B is also in β₁-normal form (Th. 3.22). The number of possible single-step β₂-reductions applicable is finite (≤ Θ₂(A); cf. Th. 3.15).
So let Θ₁(A) = p > 0, and assume that the theorem holds for all K with Θ₁(K) < p. Let A ≥ D be a reduction sequence consisting of single-step β₁- and β₂-reductions. If no β₁-reductions occur in the reduction sequence, the length of the reduction sequence can be at most Θ₂(A). Else, let A ≥ D be A ≥β₂ B ≥¹β₁ C ≥ D. Then by Th. II.6.17 there is also a reduction sequence A ≥¹β₁ B′ ≥ C ≥ D. Each B″ such that A ≥¹β₁ B″ has by induction (since Θ₁(B″) < Θ₁(A)) an upper bound for the length of reduction sequences B″ ≥¹ ... ≥¹ E in which each single-step reduction is either a β₁- or a β₂-reduction. Let m be the maximum of these upper bounds. Then the length of the reduction sequence B′ ≥ C ≥ D, hence of C ≥ D, cannot be more than m. It follows that the length of any reduction sequence A ≥ D can be at most Θ₂(A) + m + 1. □
Theorem 3.24 (strong β-normalization theorem for Λ). If A ∈ Λ, then A is strongly β-normalizable.
Proof. Each β-reduction sequence of A can be decomposed into single-step β₁- and β₂-reductions by Th. II.6.15. So Th. 3.23 yields the desired result. □
Definition 3.25. Let A ∈ Λ, and let A ≡ A₁ ≥¹β A₂ ≥¹β ... ≥¹β A_p be the longest possible sequence of single-step β-reductions beginning with A. Then Θ(A) = p. □

Theorem 3.26 (strong η-normalization theorem for Λ). If A ∈ Λ, then A is strongly η-normalizable.
Proof. Induction on |A|. □
Definition 3.27. We call K ∈ Λ strongly normalizable if there is an upper bound for the length l of reduction sequences K ≡ K₁ ≥¹ K₂ ≥¹ ... ≥¹ K_l, where each reduction K_i ≥¹ K_{i+1} is a single-step β- or η-reduction. □

Theorem 3.28 (strong normalization theorem for Λ). If A ∈ Λ, then A is strongly normalizable.
Proof. Induction on Θ(A). The proof is similar to that of Th. 3.23. Use Th. 3.26 instead of Th. 3.22, and instead of Th. II.6.17 use the theorem: if K ∈ Λ and K ≥¹η L ≥¹β M, then K ≥¹β L′ ≥ M. The latter theorem is easy to prove, since each reduction A ≥¹η B ≥¹β C can be replaced either by a reduction A ≥β B′ ≥¹η C (with r ≥ 0 β-steps) or by a reduction A ≥¹β B′ ≥η C (see the discussion after Th. 7.18; see also Th. 7.25). □
Big Trees in a λ-Calculus with λ-Expressions as Types*

R.C. de Vrijer
0. Outline
The abstract term system AX studied in this paper is a close relative of the Automath family of languages. In the investigation of normalization and decidability properties of these languages, AX came up as a natural generalization of AUT-QE, the language currently in use for mechanical proof checking at the Automath project in Eindhoven. For introductory reference, see [van Daalen 73 (A.3)].
The introduction, Section 1, is an informal account of the system AX and its relation to other systems. The formal description of AX is given in Sections 2 and 3. In Section 4 the main results are stated, mostly without proof. Section 5 is devoted to proving that the big trees are well founded (BT).
1. Introduction

1.1. Heuristic description
Before describing the main results of the paper we make a few heuristic comments, especially on the generalized type structure involved. Here we use the "formulas-as-types" notion for interpreting mathematical statements and proofs, which originated independently in [de Bruijn 70a (A.2)] and [Howard 80] (the term comes from Howard). Further references are given in 1.4.
1.1.1. Type structure
To illustrate the transition from the type structure of traditional type theory, e.g. the typed λ-calculus exhibited in [Hindley et al. 72], to the types we have here, we consider constructive versions of propositional and predicate logic respectively. If we identify a proposition α with the type of its constructions (or

*Reprinted from: Böhm, C., ed., λ-Calculus and Computer Science Theory, p. 252-271, by courtesy of Springer-Verlag, Heidelberg.
proofs), then the implication α → β will be the type of constructions that map constructions of α to constructions of β. That is, α → β corresponds essentially to the Cartesian power β^α. In predicate logic a construction c of ∀x.P(x) will map any object t from the domain of quantification α to a construction of the proposition P(t). Hence the type of c(t) depends on the choice of t. The notion of power doesn't suffice any longer; we need that of Cartesian product: Π_{x∈α} P(x).
1.1.2. Abstraction and application, two interpretations
Automath exploits the formal similarity between two kinds of abstraction: functional abstraction to form the functionlike construction λx ∈ α.c(x), and the product construction Π_{x∈α} P(x). It is convenient to unify these principles in the notations [x : α] c(x) and [x : α] P(x), respectively. Observe that now functional application in the former case corresponds to specification of a coordinate axis in the latter. Also here we use the same notations: (t) [x : α] c(x) and (t) [x : α] P(x), which reduce to c(t) and P(t), respectively.
Now this uniform syntactical treatment of both kinds of abstraction, very convenient for our purposes, may cause some confusion in interpretation. For example, vis à vis the formula-type analogy it amounts to using the same notation for both the predicate, i.e. "propositional function", λx ∈ α.P(x) and its universal quantification
∀x ∈ α.P(x).

1.1.3. Supertypes, type inclusion
We further introduce the constant type as a "supertype" of types. Then, e.g., [x : α] type will be the supertype of all those types β such that, whenever t is an element of type α (notation: t E α), (t) β is a meaningful type. Hence, carrying on the example from 1.1.2, we have [x : α] P(x) E [x : α] type. Moreover, because of the possibility of interpreting [x : α] P(x) as a proposition ∀x ∈ α.P(x), we require that [x : α] P(x) E type. This motivates the facility in AX (and in AUT-QE) to pass here from [x : α] type to type, known as the principle of type inclusion: [x : α] type ⊂ type (cf. [van Daalen 73 (A.3)], [de Bruijn 70a (A.2)] and 3.5.2 below). In order to clarify this slightly ambiguous situation one could for the product construction introduce the Π's again, and obtain Π [x : α] P(x) E type for the product and [x : α] P(x) E Π [x : α] type for the type-valued function, respectively (cf. [Zucker 77 (A.4)]).
1.1.4. AX-theories Expressions are built up by using the principles of abstraction and application mentioned above, starting from variables, parameters and constants. A
Big trees in a A-calculus (C.4)
471
particular choice of the constants and their (super)type assignments will depend on the interpretation one has in mind. Such a choice is formally fixed by a base (cf. 3.1). Each base determines a specific AX-theory.
In informal mathematics new notions are always introduced in a context, possibly indicated by the presence of certain parameters and assumptions. This observation is reflected in AX by the fact that constants are allowed to depend on parameters. We now illustrate the treatment of constants in AX and the parameter mechanism involved.
Let C₁(α, p) be a type constant, to be interpreted as the proposition ∃x ∈ α.(x) p, where p is supposed to represent a predicate on the type α. Informally introducing C₁(α, p) one might stipulate:
(1) "Let P be a type, and Q be a predicate on P. Then we will consider C₁(P, Q) as a proposition."
In AX the (super)types of parameters are indicated by superscripts, and hence the corresponding axiom reads:
(2) C₁(P^type, Q^{[x : P] type}) E type.
The rule of existence introduction can now be formalized by adding another constant C₂(α, p) together with the axiom
(3) C₂(P^type, Q^{[x : P] type}) E [x : P] [y : (x) Q] C₁(P, Q).
When actually given α E type and p E [x : α] type, the statements C₁(α, p) E type and C₂(α, p) E [x : α] [y : (x) p] C₁(α, p) can now be obtained as instances of (2) and (3), respectively. Moreover, for objects t E α and s E (t) p, application and β-reduction yield: (s) (t) C₂(α, p) E C₁(α, p). For further explanation on the subject of interpretation we refer to the treatment of AUT-QE in [van Daalen 73 (A.3)] and to [van Benthem Jutting 77].
[Note. We can be somewhat more explicit on the relation between the formats of AX and AUT-QE. Axioms like (2) and (3), which in an AX-theory are given by a base, correspond to PN-lines in an AUT-QE book:
      *  P   := —   : type
  P   *  Q   := —   : [x : P] type
  Q   *  C₁  := PN  : type
  Q   *  C₂  := PN  : [x : P] [y : (x) Q] C₁(P, Q)
In this manner an AX-theory (or rather, its base) corresponds to an AUT-QE book in which all constants are introduced as primitive notions. Vice versa, each such AUT-QE book gives rise to an AX-theory. Defined constants are not considered in AX.]
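The instantiation step at the end of 1.1.4 can be written out in two β-steps (a sketch: it uses the application rule (I)(e) of 3.2.1 together with the conditional β-rule, under the hypotheses t E α and s E (t) p):

```latex
(t)\,C_2(\alpha,p) \;E\; (t)\,[x:\alpha]\,[y:(x)\,p]\,C_1(\alpha,p)
   \;\to\; [y:(t)\,p]\,C_1(\alpha,p),
\qquad\text{hence}\qquad
(s)\,(t)\,C_2(\alpha,p) \;E\; (s)\,[y:(t)\,p]\,C_1(\alpha,p)
   \;\to\; C_1(\alpha,p).
```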
1.2. Applicability
Usually, in type theory as in the Automath languages, term application is subjected to the applicability condition: (t) f is a term iff there are types α and β such that t E α and f E [x : α] β. Now in typed λ-calculus this condition is easy to formulate. The type structure and the assignments of types to terms are given in advance, i.e. all of the syntax precedes the generation of theorems. In our case, however, types depend on objects and the type assignments are themselves treated as theorems in AX. Hence here the applicability condition would make derivability interfere with term formation. A common way of dealing with this complication (cf. Automath, Martin-Löf, etc.) is to generate the terms (including the types) simultaneously with the theorems. By contrast we take the approach of allowing unrestricted application in AX, but instead now subjecting the rule of β-reduction
(4) (t) [x : α] c → c [x := t]
to the condition t E α. We can then formulate an applicability condition by referring to derivability in AX and so define the set of legitimate terms. The legitimate fragment AX - 1 of AX is the system one obtains by restricting AX to the language containing only legitimate terms. Hence AX - 1 may be considered as the part of AX that is significant for interpretation. (Though, of course, the illegitimate terms do have a computational interpretation in the term model.) The justification for the above sketched procedure lies in the following result:
(5) AX is a conservative extension of AX - 1.
This property may be regarded as a soundness criterion for the notion of legitimacy as defined above, and hence for AX: if the equality of two significant (read: legitimate) terms can be proved in AX, it can be done using only significant terms. The proof of (5) uses the result on "big trees" described below.

1.3. Decidability, big trees
We now turn to a second desirable property of the systems:
(6) AX and hence AX - 1 are decidable.
Decidability of the typed λ-calculus is an easy corollary of the strong normalization property (SN) and the Church-Rosser property (CR). Every term reduces effectively to its normal form (nf), and two terms are equal iff their nf's are identical. However, although both SN and CR go through for AX, they are not sufficient for the decidability, as we will now explain. In the discussion we make use of an effective function τ, which assigns canonically to every object a type such that t E τ(t). Then, since we have uniqueness of types (cf. 3.3.5):
(7) t E α ⇒ τ(t) = α
(where by CR, = is equivalent to having the same nf). So, in order to see if (4) holds, we must first determine if τ(t) and α have the same nf (by (7)). Then in the process of reducing these terms questions of the form (4) may arise again, and so on. To deal with this problem, we proceed as follows. Let →bt be the improper reduction relation generated by
(i) usual β-reduction,
(ii) applying τ,
(iii) taking proper subterms.
Call the tree of →bt-reduction sequences of a term Σ the "big tree" of Σ. Then we prove instead of SN the stronger property:
(BT)
big trees of terms in AX are well founded.
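The branches of a big tree can be pictured concretely. The sketch below does so for plain untyped λ-terms, using only generators (i) and (iii) of the →bt relation; the type-assigning function τ of (ii) is omitted, since it needs the full machinery of AX. For a strongly normalizing term the recursion terminates and computes the height of this pruned big tree.

```python
# Sketch: the "big tree" of a plain untyped term, pruned to generators
# (i) beta-reduction and (iii) proper subterms; tau is omitted.
# Terms: ('var', x) | ('lam', x, body) | ('app', f, a)

def subst(term, x, s):
    """Capture-naive substitution term[x := s]."""
    tag = term[0]
    if tag == 'var':
        return s if term[1] == x else term
    if tag == 'lam':
        _, y, body = term
        return term if y == x else ('lam', y, subst(body, x, s))
    _, f, a = term
    return ('app', subst(f, x, s), subst(a, x, s))

def reducts(t):
    """All one-step beta-reducts of t."""
    out = []
    if t[0] == 'app':
        f, a = t[1], t[2]
        if f[0] == 'lam':
            out.append(subst(f[2], f[1], a))
        out += [('app', r, a) for r in reducts(f)]
        out += [('app', f, r) for r in reducts(a)]
    elif t[0] == 'lam':
        out += [('lam', t[1], r) for r in reducts(t[2])]
    return out

def proper_subterms(t):
    if t[0] == 'app':
        return [t[1], t[2]] + proper_subterms(t[1]) + proper_subterms(t[2])
    if t[0] == 'lam':
        return [t[2]] + proper_subterms(t[2])
    return []

def big_tree_height(t):
    """Terminates exactly when this pruned big tree is well founded."""
    children = reducts(t) + proper_subterms(t)
    return 1 + max(map(big_tree_height, children)) if children else 0
```

For (λx.x) y the height is 2: one branch steps to the reduct y, another descends into the subterm λx.x and then into its body.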
Together with CR this result easily implies the decidability. Further, as mentioned above, it is also used in the proof of (5). In his thesis Nederpelt [Nederpelt 73 (C.3)] stated as a conjecture for his system Λ the closure property:
Legitimate terms reduce to legitimate terms.
It turns out that B T (for A) implies (8). Further it seems that BT can be proved for A by a method, similar to the one used here. (Note that by contrast (8) for AX is a simple consequence of the formulation of the system and its proof does not require BT.) We feel that, apart from the applications described, BT may have some interest on its own. 1.4. Historical remarks
The first proof of normalization of an Automath system was given in [van Benthem Jutting 71b (C.l)]. Nederpelt (Nederpelt 73 (C.3)] proved strong normalization for his system A. He made two conjectures: the above mentioned closure property for A and CR for the system with 77-reduction. The latter conjecture was proved in [van Daalen 80, (C.5)]. The result is assumed in this paper. Scott [Scott 701 suggested to use the ideas of de Bruijn [de Bruijn 70a (A.2)] for the formalization of an intuitionistic theory of constructions. At about the same time Howard [Howard 801 came up with similar ideas. The line is pursued
R.C. de Vrijer
474
in [Martin-Lof 75aJ. His theory of types is claimed t o be a natural framework for intuitionistic mathematics. The different accents in motivation - Automath more practical, Martin-Lof more philosophical - might be responsible for some of the differences in the investigated systems.
2. The language, expressions In this paragraph we specify the language of a AX-theory. ‘This language is affected by the choice of a base (cf. 3.1). A similarity type (defined below) codes the information, which is relevant for the formation of expressions. Hence for each similarity type s we define the language 12,. 2.1. Alphabet All formal symbols used are from the alphabet consisting of the symbols for variables parameters const ants binary relations
...
2,? 2, I,
P, Q,R, ... C1, Cz, C,, ... and type
=, +, ++, -+, E , (, ), (, ), ,.
and the auxiliary symbols [, 1, Variable symbols will be indexed by types to become (object-) variables, parameter symbols by types and supertypes to become object- and type-parameters, respectively. The set of variables is assumed to be such that whenever needed, we are able to choose uniquely a “new” variable of the desired type, not yet occurring in the context. The enumeration of the constant symbols is meant to show the order in which they can be introduced in a particular interpretation (cf. 1.1.4 and the notions of date and base). In Automath this would be the order in which they appear in a “book” (cf. [ v a n Daalen 73 (A.3)]). 2.2. Similarity type
A similarity type s is a triple (SO,Sl,o), where So and S1 are disjoint sets ’ to (0, l}”,the set of finite of natural numbers and o is a function from SOU 51 (possibly empty) sequences of zeros and ones. Here SOindicates the set of constant symbols used for object-constants, S1 the set of constant symbols used for type-constants and if i E SOU 4 ,then o(i) determines the positions of object- and type-parameters of Ci (cf. 2.3.1 (ii)).
2.3. Expressions The expressions fall into three sorts: objects, types and supertypes. These are simultaneously defined in 2.3.1. In the definition we use already the notion of closed expression, to be defined in 2.3.4. However, it is clear that the definitions could have been given simultaneously.
2.3.1. Definition. Given a similarity type s, the sets of variables, parameters, constants, objects, types and supertypes, building together the set E_s of expressions, are defined by simultaneous induction.
(i) If x is a variable symbol, P a parameter symbol, α a type, β a closed (cf. 2.3.4) type and β* a closed supertype, then x^α is a variable, P^β is an object-parameter and P^{β*} is a type-parameter.
(ii) Let o(i) = δ₁, ..., δ_n and let Σ₁, ..., Σ_n be expressions such that Σ_j is an object if δ_j = 0 and Σ_j is a type if δ_j = 1; then C_i(Σ₁, ..., Σ_n) is an object-constant if i ∈ S₀ and a type-constant if i ∈ S₁.
(iii) Variables, object-parameters and object-constants are atomic objects. Type-parameters and type-constants are atomic types, and type is the only atomic supertype.
(iv) If f and t are objects, α and β are types, α* is a supertype and x^α a variable, then (t) f and [x^α : α] t are objects, (t) β and [x^α : α] β are types, and (t) α* and [x^α : α] α* are supertypes. □
2.3.2. Conventions
As syntactical variables we use Σ, Γ, ... for expressions in general, f, g, t, s, ... for objects, α, β, ... for types and α*, β*, ... for supertypes. The symbols for variables, parameters and constants are used themselves as syntactical variables for their respective categories as well. As long as no confusion arises we will freely add and omit indexes. In particular the superscripts of variables and parameters are suppressed where possible; e.g. we write [x : α] x instead of [x^α : α] x^α.
Vectorial notation is introduced for sequences of expressions; e.g. ᾱ is short for the sequence α₁, ..., α_n, where the number n is either known or not essential. As = is a symbol of the language, we use ≡ for syntactic equality between expressions.
Now follow some more technical and notational definitions concerning expressions.
2.3.3. Complexity, length and date
According to Definition 2.3.1 each expression has a construction, easily seen to be unique, consisting of a finite number of applications of the rules (i) to (iv). The complexity c(Σ) of an expression Σ is the number of steps in its construction. By induction on c(Σ) we define two more measures on Σ: its length l(Σ) and date d(Σ).

l(Σ) = 1 if Σ is either a variable, a parameter or type;
l(C(Σ₁, ..., Σ_n)) = max(l(Σ₁), ..., l(Σ_n)) + 1;
l((t) Γ) = max(l(t), l(Γ)) + 1;
l([x : α] Γ) = max(l(α), l(Γ)) + 1.

d(type) = 0;
d(x^α) = d(α);
d(P^Γ) = d(Γ);
d(C_i(Σ₁, ..., Σ_n)) = max(i, d(Σ₁), ..., d(Σ_n));
d((t) Γ) = max(d(t), d(Γ));
d([x : α] Γ) = max(d(α), d(Γ)).

Notice that d(Σ) is the greatest natural number i such that C_i appears in the construction of Σ.

2.3.4. Free variables, parameters, special variables
By induction on l(Σ) we define the sets FV(Σ) of free variables and Par(Σ) of parameters of Σ.
FV(type) = ∅ ; Par(type) = ∅
FV(x) = {x} ; Par(x) = ∅
FV(P) = ∅ ; Par(P) = {P}
FV(C(Σ₁, ..., Σ_n)) = ∪_{i≤n} FV(Σ_i) ; Par(C(Σ₁, ..., Σ_n)) = ∪_{i≤n} Par(Σ_i)
FV((t) Γ) = FV(t) ∪ FV(Γ) ; Par((t) Γ) = Par(t) ∪ Par(Γ)
FV([x : α] Γ) = FV(α) ∪ (FV(Γ) \ {x}) ; Par([x : α] Γ) = Par(α) ∪ Par(Γ).

The set SV(Σ) of special variables of Σ is defined as
SV(Σ) = ∪_{x^α ∈ FV(Σ)} FV(α).

An expression Σ is called closed if FV(Σ) = ∅. For a sequence Σ₁, ..., Σ_n we introduce the notation F(Σ̄) = ∪_{i≤n} F(Σ_i), where F is FV, SV or Par.
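The measures l and d of 2.3.3 and the set FV of 2.3.4 translate directly into code. The sketch below uses an assumed tuple representation of expressions (variables carry their type as a superscript; by convention a constant with an empty argument list gets length 1):

```python
# Sketch of l (length), d (date) and FV over an assumed representation:
# ('type',) | ('var', x, alpha) | ('par', P, sigma)
# | ('const', i, [args]) | ('appl', t, body) | ('abst', x, alpha, body)

def length(e):
    tag = e[0]
    if tag in ('type', 'var', 'par'):
        return 1
    if tag == 'const':
        return 1 + max((length(a) for a in e[2]), default=0)
    if tag == 'appl':
        return 1 + max(length(e[1]), length(e[2]))
    return 1 + max(length(e[2]), length(e[3]))           # abst

def date(e):
    """Greatest index i such that C_i appears in the construction of e."""
    tag = e[0]
    if tag == 'type':
        return 0
    if tag in ('var', 'par'):
        return date(e[2])                                # d(x^a) = d(a)
    if tag == 'const':
        return max([e[1]] + [date(a) for a in e[2]])
    if tag == 'appl':
        return max(date(e[1]), date(e[2]))
    return max(date(e[2]), date(e[3]))                   # abst

def free_vars(e):
    tag = e[0]
    if tag in ('type', 'par'):
        return set()
    if tag == 'var':
        return {e[1]}
    if tag == 'const':
        return set().union(*[free_vars(a) for a in e[2]]) if e[2] else set()
    if tag == 'appl':
        return free_vars(e[1]) | free_vars(e[2])
    return free_vars(e[2]) | (free_vars(e[3]) - {e[1]})  # abst binds e[1]
```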
2.3.5. Proper subexpressions
The relation ⊐ (contains as a proper subexpression) between expressions is the smallest transitive relation such that C(Σ₁, ..., Σ_n) ⊐ Σ_i; (t) Γ ⊐ t; (t) Γ ⊐ Γ; [x : α] Γ ⊐ α and [x : α] Γ ⊐ Γ.

2.3.6. Simultaneous substitution
Let P₁, ..., P_m and x₁, ..., x_n be sequences of distinct parameters and variables. And let t₁, ..., t_n be a sequence of objects and Σ₁, ..., Σ_m a sequence of expressions, such that Σ_i is of the same sort (i.e. object or type) as P_i. Then the result Σ[P̄, x̄ := Σ̄, t̄] of simultaneous substitution of Σ̄, t̄ for P̄, x̄ is defined by induction on l(Σ). In the definition we abbreviate Γ[P̄, x̄ := Σ̄, t̄] by Γ′.
x_i′ ≡ t_i (1 ≤ i ≤ n) and x′ ≡ x if x ∉ {x₁, ..., x_n};
P_i′ ≡ Σ_i (1 ≤ i ≤ m) and P′ ≡ P if P ∉ {P₁, ..., P_m};
(C(Σ₁, ..., Σ_k))′ ≡ C(Σ₁′, ..., Σ_k′);
type′ ≡ type;
((t) Γ)′ ≡ (t′) Γ′;
([x : α] Γ)′ ≡ [y : α′] (Γ[x := y])′, where y is a new variable.

By Γ̄′ ≡ Γ̄[P̄, x̄ := Σ̄, t̄] we denote the sequence Γ₁′, ..., Γ_k′.
2.3.7. α-equivalence
The relation ≡α of α-equivalence between expressions is the smallest equivalence relation such that, if Σ ≡α Γ, s ≡α t and α ≡α β, then also x^α ≡α x^β, P^Σ ≡α P^Γ, C(Δ₁, ..., Σ, ..., Δ_n) ≡α C(Δ₁, ..., Γ, ..., Δ_n), (t) Σ ≡α (s) Γ, and, if y ∉ FV(Γ), then [x : α] Σ ≡α [y : α] Γ[x := y].
In the sequel we shall simply identify α-equivalent expressions. Formally one might pass to α-equivalence classes, considering an expression as merely a name denoting the class it belongs to, and show that the preceding definitions behave well with respect to ≡α. In some places names of bound variables will be tacitly assumed to be chosen such that no "clashes" arise.
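The identification of α-equivalent expressions can be made algorithmic. The sketch below checks α-equivalence for plain λ-terms (the type superscripts of the paper are omitted) by comparing binder positions instead of bound names, in the style of de Bruijn indices:

```python
# Sketch: alpha-equivalence for plain lambda terms, ignoring the type
# superscripts of the paper. Bound variables are compared by the depth
# of their binder; free variables must match by name.
# Terms: ('var', x) | ('lam', x, body) | ('app', f, a)

def alpha_eq(a, b, env_a=(), env_b=()):
    if a[0] != b[0]:
        return False
    if a[0] == 'var':
        if a[1] in env_a or b[1] in env_b:
            # both must be bound, by binders at the same depth
            return (a[1] in env_a and b[1] in env_b
                    and env_a.index(a[1]) == env_b.index(b[1]))
        return a[1] == b[1]                    # both free: same name
    if a[0] == 'lam':
        # push the binders; the innermost binder sits at index 0
        return alpha_eq(a[2], b[2], (a[1],) + env_a, (b[1],) + env_b)
    return (alpha_eq(a[1], b[1], env_a, env_b)
            and alpha_eq(a[2], b[2], env_a, env_b))
```

alpha_eq identifies λx.x with λy.y but distinguishes λx.x from λx.y.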
2.3.8. Lemma. Let {Q₁, ..., Q_m} ∩ Par(Σ̄, t̄) = {y₁, ..., y_n} ∩ FV(Σ̄, t̄) = ∅. Then
Σ[Q̄, ȳ := Γ̄, s̄][P̄, x̄ := Σ̄, t̄] ≡ Σ[P̄, x̄ := Σ̄, t̄][Q̄, ȳ := Γ̄[P̄, x̄ := Σ̄, t̄], s̄[P̄, x̄ := Σ̄, t̄]]. □
2.4. Formulas, language
Let s be a similarity type. Then the language L_s consists of the formulas: Σ = Γ (equals), Σ ↠ Γ (reduces to), Σ →⁺ Γ (properly reduces to), Σ → Γ (reduces in one step to) and Σ E Γ (has type or has supertype), where Σ and Γ are expressions in E_s. If R is a relation symbol, we write Σ₁, ..., Σ_n R Γ₁, ..., Γ_n (Σ̄ R Γ̄) for the sequence of formulas Σ₁ R Γ₁, ..., Σ_n R Γ_n.
3. AX-theories
3.1. Base
According to what has been said in 1.1.4, the set of axioms and rules of an AX-theory can be divided into two parts.
(i) A set characterizing the underlying system (the same for any AX-theory).
(ii) In addition, the assignments of types and supertypes to the relevant constants, determined by a base (defined below).
The situation may be compared to e.g. predicate logic, where one adds for each particular theory a set of mathematical axioms to the fixed framework of logical axioms and rules.
Now recall the example in 1.1.4. It involves an instance (1) of the general assumption scheme:
(*) "Let P₁ be a Σ₁, let P₂ be a Σ₂, ... and let P_n be a Σ_n. Then ... ."
In such a scheme it is assumed that the Σ_i's are well defined in the given context, which leads to the requirement that Σ_{i+1} should not contain free variables or parameters other than P₁, ..., P_i. This observation motivates the following definition.

3.1.1. Definition. A regular sequence of parameters (rsop) is a sequence P₁^{Σ₁}, ..., P_n^{Σ_n} of distinct parameters, where the Σ_i's are closed types or supertypes, and for 0 ≤ i < n we have Par(Σ_{i+1}) ⊆ {P₁, ..., P_i}. □
We now proceed to the definition of a base. Notice the requirements on dates, motivated by the remark made in 2.1.
Big trees in a λ-calculus (C.4)

3.1.2. Definition. A base Ω is a triple (s, p, τ'), where
(i) s is a similarity type (S0, S1, a).
(ii) p is an effective function from S0 ∪ S1 to rsop's, such that for all i ∈ S0 ∪ S1, if p(i) = P1^Σ1,...,Pn^Σn, then Ci(P1,...,Pn) ∈ E_s and max(d(Σ1),...,d(Σn)) < i.
(iii) τ' is an effective function from S0 to closed types in E_s and from S1 to closed supertypes in E_s, such that for i ∈ S0 ∪ S1, if p(i) = P1,...,Pn, then Par(τ'(i)) ⊆ {P1,...,Pn} and d(τ'(i)) < i. □
[Note. We can now fill in some more details of the correspondence between λΛ-theories, via a base Ω, and AUT-QE books (cf. the note to Section 1.1.4). This may also clarify the rather formal definitions. First, what is called here an rsop is just a context in AUT-QE. The constants Ci would in the AUT-QE book that corresponds to a base Ω be introduced in PN-lines, in the order of the indexes i. Then p(i) gives the context on which Ci is defined and τ'(i) determines the category (a type or a supertype). It could be remarked that Automath books are always finite, whereas for a base in λΛ there is no such restriction. However, for the language theory that makes no difference.]

3.2. Axioms and rules of λΛ[Ω]
3.2.1. Given a base Ω, the λΛ-theory λΛ[Ω] is formulated in the language L_s. The axioms and rules of λΛ[Ω] are the following.

(I) type assignment.
(a) x^α E α; P^Σ E Σ.
(b) Ci(Σ1,...,Σn) E τ'(i)[P1,...,Pn := Σ1,...,Σn], if i ∈ S0 ∪ S1 and p(i) = P1,...,Pn.
(c) α E type (type inclusion).
(d) Σ E Γ ⊢ [x : α]Σ E [x : α]Γ, provided x ∉ SV(Σ).
(e) Σ E Γ ⊢ (t)Σ E (t)Γ.
(II) one step reduction.
β-reduction: t E α ⊢ (t)[x : α]Σ → Σ[x := t].
η-reduction: f E β, β E [x : α]α* ⊢ [x : α](x)f → f, provided x ∉ FV(f);
β E [x : α]α* ⊢ [x : α](x)β → β, provided x ∉ FV(β).
monotonicity rules:
(a) Σ → Γ ⊢ C(Σ1,...,Σ,...,Σn) → C(Σ1,...,Γ,...,Σn).
(b) Σ → Γ ⊢ (t)Σ → (t)Γ; t → s ⊢ (t)Σ → (s)Σ.
(c) Σ → Γ ⊢ [x : α]Σ → [x : α]Γ, provided x ∉ SV(Σ).
(d) α → β ⊢ [x : α]Σ → [y : β]Σ[x := y], provided y ∉ FV(Σ).
(III) proper reduction, reduction and equality.
(a) Σ → Γ ⊢ Σ →⁺ Γ; Σ →⁺ Γ, Γ →⁺ Δ ⊢ Σ →⁺ Δ.
(b) Σ →⁺ Γ ⊢ Σ ↠ Γ; Σ ↠ Σ.
(c) Σ ↠ Γ ⊢ Σ = Γ; Σ = Γ ⊢ Γ = Σ; Σ = Γ, Γ = Δ ⊢ Σ = Δ.
(d) Σ = Γ, Δ E Σ ⊢ Δ E Γ; Σ = Γ, Σ E Δ ⊢ Γ E Δ.
3.2.2. Remarks.
(i) I(c) amounts to the principle of type inclusion (cf. 1.1.3 and 3.5).
(ii) The motivation of the restriction in I(d) is clear from the following example. Suppose one had [x : α]y^C(x) E [x : α]C(x). Then for arbitrary t E α by application and β-reduction y^C(x) E C(t), which is obviously not intended.
(iii) The restriction in II(c) excludes the possibility of both
(t)[x : α]((y^C(x))[z : C(x)]z) → (t)[x : α]y^C(x) → y^C(x) and
(t)[x : α]((y^C(x))[z : C(x)]z) → (y^C(x))[z : C(t)]z,
both in normal form, violating CR. □
In the sequel we assume an arbitrary base Ω to be fixed. By just stating a formula we mean that it is derivable in λΛ[Ω], for convenience further referred to as λΛ. Syntactical variables for expressions are supposed to range over E_s.
3.2.3. Lemma. The monotonicity rules II(a-d) hold also with → replaced by →⁺, ↠ or =. □
3.2.4. Now follows a rather technical definition, auxiliary to the important substitution lemma 3.2.5. Compare also Definition 3.1.1 (rsop's).
Definition. Given an expression Σ, a sequence P1^Σ1,...,Pm^Σm, x1^α1,...,xn^αn := Γ1,...,Γm, t1,...,tn is called a regular substitution sequence (rss) for Σ, if the following conditions are satisfied:
(i) Γi E Σi[P̄ := Γ̄] (1 ≤ i ≤ m).
(ii) ti E αi[P̄, x̄ := Γ̄, t̄] (1 ≤ i ≤ n).
(iii) If Q^Δ ∈ Par(Σ)\{P1,...,Pm}, then Par(Δ) ∩ {P1,...,Pm} = ∅.
(iv) If y^β ∈ FV(Σ)\{x1,...,xn}, then FV(β) ∩ {x1,...,xn} = ∅ and Par(β) ∩ {P1,...,Pm} = ∅.

It is easily verified that the conditions (iii) and (iv) are fulfilled if in particular:
- m = 0 and {x1,...,xn} ∩ SV(Σ) = ∅, or
- Par(Σ) ⊆ {P1,...,Pm} and FV(Σ) ⊆ {x1,...,xn},
and hence if Σ is closed.
3.2.5. Lemma. Let P̄, x̄ := Γ̄, t̄ be both an rss for Σ and for Γ, and let Σ R Γ, where R is →, →⁺, ↠, = or E. Then also Σ[P̄, x̄ := Γ̄, t̄] R Γ[P̄, x̄ := Γ̄, t̄].

Proof. Simultaneous induction on the length of deduction of Σ R Γ. □
3.3. Canonical type assignment, uniqueness of types
The assignment function τ' generates a function τ, which assigns canonically to each object a type and to each type a supertype, such that always Σ E τ(Σ).
3.3.1. Definition. τ(Σ) is defined by induction on l(Σ).
τ(x^α) ≡ α; τ(P^Γ) ≡ Γ;
τ(Ci(Σ1,...,Σn)) ≡ τ'(i)[P̄ := Σ̄] for i ∈ S0 ∪ S1, where p(i) = P1,...,Pn;
τ((t)Γ) ≡ (t)τ(Γ);
τ([x : α]Γ) ≡ [x : α]τ(Γ), where x is chosen such that x ∉ SV(Γ). □
3.3.2. Lemma. Σ E τ(Σ) holds for any object or type Σ.

Proof. Induction on l(Σ). □

3.3.3. Lemma. τ(C(Γ̄))[x̄ := t̄] ≡ τ(C(Γ̄[x̄ := t̄])).

Proof. Immediate by Lemma 2.3.9 and the definition of τ. □
3.3.4. Lemma. Let x̄ := t̄ be an rss for Σ, then τ(Σ[x̄ := t̄]) ≡ τ(Σ)[x̄ := t̄].

Proof. Induction on l(Σ). Use Lemma 3.3.3 in case Σ is a constant. □
3.3.5. Theorem (Uniqueness of types). t E α ⇔ α = τ(t).
Proof. One side is implied by Lemma 3.3.2. For the other side, prove by simultaneous induction on the length of deduction of t E α and t = s, respectively, the two statements t E α ⇒ α = τ(t) and t = s ⇒ τ(t) = τ(s). The proof makes use of the previous lemma. □

3.3.6. Remark. The analogous result for supertypes does not hold (cf. 3.5). However, in λΛ without Rule I(c) one would obtain Theorem 3.3.5 for supertypes as well. □

3.4. Legitimacy
In this section we define the set L of legitimate expressions. Then the legitimate fragment λΛ-l of λΛ is the theory obtained by restricting the axioms and rules of λΛ, to use only expressions from L.
3.4.1. Remark that L depends on the choice of Ω. We might call Ω a legitimate base if {Ci(p(i)) | i ∈ S0 ∪ S1} ∪ {τ'(i) | i ∈ S0 ∪ S1} ⊆ L. □

3.4.2. For the sake of the characterization of the legitimate expressions we now introduce a function τ*, assigning canonically to each expression a supertype.
Definition.
τ*(α*) ≡ α* for supertypes α*;
τ*(α) ≡ τ(α) for types α;
τ*(t) ≡ τ(τ(t)) for objects t. □

Remark. τ* may be compared to Typ* in [Nederpelt 73 (C.3)]. □
3.4.3. Definition. The set L of legitimate expressions is specified by defining, by induction on (d(Σ), c(Σ)) (i.e. ω·d(Σ) + c(Σ), cf. 5.2), what it means for an expression Σ to be legitimate.

x^α ∈ L iff α ∈ L; P^Σ ∈ L iff Σ ∈ L.
Ci(Σ̄) ∈ L iff Σ1,...,Σn, τ'(i) ∈ L and p(i) := Σ̄ is an rss for τ'(i).
(t)Γ ∈ L iff t, Γ ∈ L and for some α, α* we have t E α and τ*(Γ) = [x : α]α*.
[x : α]Γ ∈ L iff α, Γ ∈ L, provided x ∉ SV(Γ). □
3.4.4. Lemma. Let P̄, x̄ := Γ̄, t̄ be an rss for Σ1,...,Σn, respectively, and let ȳ := Σ̄ be an rss for the closed expression Σ. Then also ȳ := Σ̄[P̄, x̄ := Γ̄, t̄] is an rss for Σ.

Proof. Apply Lemma 3.2.5. □
3.4.5. Lemma. Let Σ, Γ1,...,Γm, t1,...,tn ∈ L and let P̄, x̄ := Γ̄, t̄ be an rss for Σ; then Σ[P̄, x̄ := Γ̄, t̄] ∈ L.

Proof. Induction on l(Σ). Use Lemmas 3.2.5 and 3.4.4. □

3.4.6. Theorem (Extended Closure). Let Σ ∈ L and let either Σ → Γ, or Σ ⊐ Γ, or τ(Σ) = Γ. Then also Γ ∈ L. (I.e. Σ ∈ L and Σ →bt Γ ⇒ Γ ∈ L.) □
3.5. Type inclusion, uniqueness of domains
The analogue of the uniqueness of types theorem for supertypes does not hold. E.g. we have both [x : α]β E [x : α]type and [x : α]β E type (cf. 1.1.3). However, one does obtain a weaker result, viz. uniqueness of domains:
α E [x : β]β* and α E [x : γ]γ* ⇒ β = γ.
This property is important as a justification for the above characterization of legitimate expressions. We state here without proof:
3.5.1. Theorem. α E [x : β]β* ⇒ for some supertype Δ*, τ(α) = [x : β]Δ*. □
In order to say something more on the structure of supertypes in λΛ-l, we define the relation ⊑ of type inclusion between supertypes in L.
3.5.2. Definition. First define the relation ⊂ between supertypes in L inductively by
(i) α* ⊂ type for any supertype α*.
(ii) If α* ⊂ β*, then also [x : α]α* ⊂ [x : α]β* and (t)α* ⊂ (t)β*.
Then ⊑ is the smallest transitive relation in L extending = and ⊂. □
3.5.3. Theorem. Let α, β, α*, β* ∈ L. Then
(i) α E α* and α E β* ⇒ α* ⊑ β* or β* ⊑ α*.
(ii) α E α* ⇒ τ(α) ⊑ α*. □
So τ assigns to a legitimate type its minimal legitimate supertype. Note that a supertype in L which is in normal form is always of the form [x1 : α1] ... [xk : αk]type.

4. Decidability and conservativity

4.1. Sequences, trees
We use σ, ρ, ... to range over, finite or infinite, sequences of expressions. We define lh(σ) to be the length of σ if σ is finite, lh(σ) = ∞ if σ is infinite. Σ will also stand for the sequence of length one, consisting of Σ only. If lh(σ) < ∞, then σ, ρ stands for the concatenation of σ and ρ. We define: σ < ρ (ρ extends σ) iff there exists a sequence τ, such that σ, τ = ρ.
4.1.1. Definition. A sequence Σ0, Σ1, ... is called a
(i) reduction sequence of Σ0 iff Σi → Σi+1,
(ii) rs-sequence of Σ0 iff either Σi → Σi+1 or Σi ⊐ Σi+1,
(iii) →bt-sequence of Σ0 iff either Σi → Σi+1 or Σi ⊐ Σi+1 or τ(Σi) = Σi+1. □
4.1.2. Definition. The finite reduction sequences of a term Σ form under the partial order < a tree, the reduction tree of Σ. Analogously we have the rs-tree and the →bt-tree of Σ. The latter is called the big tree of Σ. The set of →bt-sequences of Σ is denoted by S(Σ). Finally, B(Σ) = {Γ | Σ ↠bt Γ}. □

4.1.3. Definition. h(Σ) will be the height of the reduction tree of Σ: h(Σ) = max({lh(σ) | σ is a reduction sequence of Σ}). Analogously, b(Σ) = max({lh(σ) | σ ∈ S(Σ)}) is the height of the big tree of Σ. □
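When the one-step relation is finitely branching and terminating, the heights h and b of Definition 4.1.3 can be computed by a straightforward recursion; the big tree is obtained from the reduction tree simply by enlarging the successor relation. A toy Python sketch (the encoding is ours and only illustrates the definitions, not the calculus itself):

```python
# Height of the tree of finite sequences generated by a successor relation:
# the length of the longest sequence starting at x. 'succ' maps a node to the
# list of its one-step successors and is assumed terminating, so the
# recursion is well founded.

def height(x, succ):
    return 1 + max((height(y, succ) for y in succ(x)), default=0)

# The big tree admits extra kinds of steps (subexpression and tau-steps in
# the paper); its height is the same recursion over the enlarged relation.
def big_height(x, reduce_succ, extra_succ):
    return height(x, lambda y: reduce_succ(y) + extra_succ(y))
```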
4.2. Normal forms, strong normalization
An expression Σ is in normal form (nf) if there does not exist an expression Γ such that Σ → Γ.
An expression Σ is called strongly normalizable if h(Σ) < ∞, i.e., if the reduction tree of Σ is well founded.
4.3. Results
We now state the main results of the paper. The details of proofs are generally omitted. However, Section 5 will be devoted to sketching the proof of BT (Theorem 4.3.2).
4.3.1. Theorem (CR). If Σ = Γ, then there exists an expression Δ, such that Σ ↠ Δ and Γ ↠ Δ. □
A proof shall not be given here. Let it suffice to remark that in λΛ without the rule of η-reduction the property follows easily from the strong normalizability of λΛ. In the present situation, where η-reduction is included, the proof is more complicated. It was proved by van Daalen (cf. 1.4).
4.3.2. Theorem (BT). For every expression Σ, b(Σ) < ∞. I.e. big trees in λΛ are well founded. □
This result implies that every expression is strongly normalizable (SN). Moreover, by CR one obtains that for each Σ there exists a unique nf Γ, such that Σ = Γ. (In contrast to its use in "uniqueness of types", uniqueness is here to be understood with respect to =.) This unique expression will be denoted by nf(Σ).
4.3.3. Corollary. Given an expression Σ, its big tree can be effectively constructed.
Proof. Given the big trees of an object t and a type α, one can decide if t E α; viz. by merely checking if nf(τ(t)) ≡ nf(α). By this observation it is easy to devise an algorithm, which, when applied to an expression Σ, constructs the big tree of Σ, and which can be proved to be correct by induction on b(Σ). □

4.3.4. Corollary. λΛ is decidable. □
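The decision method behind Corollaries 4.3.3-4 rests on SN and CR: every expression has a unique normal form, so an equality is decided by normalizing both sides and comparing, and t E α reduces to comparing nf(τ(t)) with nf(α). The pattern is shown by the following Python sketch on a toy terminating and confluent rewrite system (integer sums); the encoding is ours and stands in for the calculus only by analogy:

```python
# Toy rewrite system: an expression is an int or ("add", e1, e2); the redex
# ("add", m, n) with both arguments ints contracts to m + n.

def step(e):
    # Return a one-step reduct of e, or None if e is in normal form.
    if isinstance(e, int):
        return None
    _, a, b = e
    if isinstance(a, int) and isinstance(b, int):
        return a + b
    ra = step(a)
    if ra is not None:
        return ("add", ra, b)
    rb = step(b)
    return None if rb is None else ("add", a, rb)

def nf(e):
    while (r := step(e)) is not None:   # terminates by strong normalization
        e = r
    return e

def equal(e1, e2):
    # By CR the normal form is unique, so this decides convertibility.
    return nf(e1) == nf(e2)
```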
4.3.5. Let (Σ, Γ) ⊢ Δ R Δ' assert the existence of a deduction of Δ R Δ' in λΛ, in which occur only expressions from B(Σ) ∪ B(Γ).

Lemma (Transitivity). If Σ', Γ' ∈ B(Σ) ∪ B(Γ) and (Σ', Γ') ⊢ Δ = Δ', then (Σ, Γ) ⊢ Δ = Δ'. □
4.3.6. Definition. A new measure n(Γ) is defined by induction on b(Γ): n(Γ) is 1 plus the sum of n(Δ) over all Δ ∈ S'(Γ), where S'(Γ) = {Δ | Γ →bt Δ}. □
4.3.7. Theorem. Let Σ R Γ, where R is =, →, →⁺, ↠, or E. Then (Σ, Γ) ⊢ Σ R Γ.
Proof. Induction on n(Σ) + n(Γ). Let us restrict attention to equalities. If Σ and Γ are both in nf, then by CR, Σ ≡ Γ and we are done. So assume that Σ → Σ'. Then by the induction hypothesis and transitivity, (Σ, Γ) ⊢ Σ' = Γ.
Hence it is enough to show that (Σ, Γ) ⊢ Σ = Σ'. Now distinguish cases as to the last rule applied in a deduction of Σ → Σ'. We treat only one case.
Let Σ ≡ [x : α](x)f → f ≡ Σ' and τ*(f) = [x : α]α*. It must be shown that (Σ, Γ) ⊢ τ*(f) = [x : α]β* for some β* (cf. the rule of η-reduction and Theorem 3.5.1). By CR, τ*(f) and [x : α]α* have a common reduct [y : γ]γ*. Now n(α) + n(γ) < n(Σ) and n(τ*(f)) + n([y : γ]γ*) < n(Σ) imply that (Σ, Γ) ⊢ α = γ and (Σ, Γ) ⊢ τ*(f) = [y : γ]γ*, respectively, and consequently (Σ, Γ) ⊢ τ*(f) = [x : α]γ*. □
4.3.8. Corollary. λΛ is a conservative extension of λΛ-l.

Proof. By Theorem 4.3.7 and the Closure Theorem 3.4.6. □
5. Proof of the big tree theorem
The strategy of the proof of BT (Theorem 4.3.2) will be to define an extension λΛ-p of λΛ, by adding an extra rule of term formation for ordered pairs: if τ(Σ) = Γ, then [Σ, Γ] is an expression. A pair [Σ, Γ] may be considered as just a copy of Σ, the second component Γ being present only for bookkeeping reasons. The reduction relation is extended to include the projections [Σ, Γ] → Σ and [Σ, Γ] → Γ. Strong normalization of expressions in λΛ-p is proved by using a computability argument. Subsequently a map φ is defined, embedding λΛ in λΛ-p such that →bt-sequences in λΛ give rise to longer rs-sequences in λΛ-p. Termination of rs-sequences is an easy corollary of SN. Hence we may conclude that →bt-sequences in λΛ do terminate.
5.1. Introduction of λΛ-p
The base Ω, which was fixed under 3.2.2, is still assumed here. So λΛ-p will be in fact an extension of λΛ[Ω]. The definition of the set E-p of expressions of λΛ-p involves a "forget function" p from expressions of λΛ-p to expressions of λΛ, consistently deleting the second coordinates of pairs. (Hence p acts as the identity on expressions of λΛ.) The next two definitions should be taken as simultaneously defining the set E-p and the function p.
5.1.1. Definition. For the definition of E-p take clauses (i) to (iv) of the inductive definition of E (2.3.1) and add a fifth clause:
(v) If Σ and Γ are in E-p and τ(p(Σ)) = p(Γ) is deducible in λΛ, then [Σ, Γ] is an object if Σ is an object and a type if Σ is a type, respectively. □
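The forget function p of 5.1 simply keeps the first component of every pair and recurses through the remaining structure. A toy Python sketch on a hypothetical AST (the tags and node shapes are our own encoding, not the paper's):

```python
# Forget function: delete the bookkeeping second coordinates of pairs.
# Toy AST: a variable is a string; ("app", f, a) an application;
# ("abs", x, ty, body) an abstraction; ("pair", s, t) a pair [s, t].

def forget(e):
    if isinstance(e, str):
        return e                          # variable: unchanged
    tag = e[0]
    if tag == "pair":
        return forget(e[1])               # keep only the first component
    if tag == "app":
        return ("app", forget(e[1]), forget(e[2]))
    if tag == "abs":
        return ("abs", e[1], forget(e[2]), forget(e[3]))
    raise ValueError(tag)
```

On pair-free expressions forget acts as the identity, matching the remark that p is the identity on expressions of λΛ.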
5.1.3. The definitions, notations and conventions from Section 2.3 are generalized to E-p. In particular,
l([Σ, Γ]) = max(l(Σ), l(Γ)) + 1; d([Σ, Γ]) = max(d(Σ), d(Γ));
Par([Σ, Γ]) = Par(Σ) ∪ Par(Γ); FV([Σ, Γ]) = FV(Σ) ∪ FV(Γ);
[Σ, Γ] ⊐ Σ, [Σ, Γ] ⊐ Γ;
[Σ, Γ][P̄, x̄ := Δ̄, t̄] = [Σ[P̄, x̄ := Δ̄, t̄], Γ[P̄, x̄ := Δ̄, t̄]].
Substitution is only admitted if the substitution result is in E-p again, i.e., if the substitution does not violate the restriction in 5.1.1 (v). A sufficient condition for this requirement is given in 5.1.6 below.

5.1.4. The formulas of λΛ-p are defined as in 2.4.

5.1.5. The axioms and rules of λΛ-p are those of λΛ (cf. 3.2.1) and additionally

(II) projection:
(e) [Σ, Γ] → Σ; [Σ, Γ] → Γ. Σ → Δ ⊢ [Σ, Γ] → [Δ, Γ]; Γ → Δ ⊢ [Σ, Γ] → [Σ, Δ].

Remark that now, by projection, an expression may reduce to an expression of a different sort, i.e. an object to a type and a type to a supertype, respectively. For that reason a few obvious restrictions are to be made in some of the rules. In II(a) and III(d) we require Σ and Γ to be of the same sort. In II(b), t and s have to be both objects; in II(d), α and β have to be both types.

5.1.6. The definitions and results of Sections 3.2 and 3.3 are generalized to λΛ-p. Remark in particular that by Lemma 3.2.5 we obtain: If P̄, x̄ := Γ̄, t̄ is an rss for Σ in E-p, then Σ[P̄, x̄ := Γ̄, t̄] is in E-p again, and hence an admitted substitution. Add to Definition 3.3.1 the clause: τ([Σ, Γ]) ≡ τ(Σ).
5.2. Norms
The proof of SN for λΛ-p is essentially based on the method of proof originated in [Tait 67], and used e.g. in [Prawitz 71, Appendix A] for a system of natural deduction. The key notion of this method, computability (alternative terminologies: convertibility, validity, réductibilité), could be defined by induction on the length of type in [Tait 67] and on the length of the end formula of
a deduction in [Prawitz 71]. Here it is essential that the type of a term and the end formula of a deduction do not change under reduction of the term and the deduction, respectively. In our proof their task will be fulfilled by a norm on expressions γ(Σ). Auxiliary to its definition we first introduce the measure m(Σ).

Note. Pairs of natural numbers are supposed to be ordered lexicographically.

5.2.1. Definition. m(Σ) is defined by induction on (d(Σ), c(Σ)).
m(type) = 0; m(P^Γ) = m(Γ) + 1; m(x^α) = m(α) + 1;
m(Ci(Σ1,...,Σn)) = max(m(Σ1),...,m(Σn)) + m(τ'(i)) + 1;
m((t)Γ) = max(m(t), m(Γ)); m([x : α]Γ) = max(m(α), m(Γ)) and
m([Γ, Δ]) = max(m(Γ), m(Δ)). □
5.2.2. Lemma.
(i) If Σ is an atomic expression (not type), then m(τ(Σ)) < m(Σ).
(ii) For all objects and types Σ, m(τ(Σ)) ≤ m(Σ).
(iii) If Σ ⊒ Γ, then m(Γ) ≤ m(Σ). □
5.2.3. The norm γ(Σ) is going to be a, possibly empty, string of the brackets [ and ]. Let G, H, ... range over such strings. They are well ordered by <: G < H iff the number of brackets in G is less than the number of brackets in H. λ denotes the empty string.
Definition. γ(Σ) is defined by induction on (m(Σ), l(Σ)).
γ(type) = λ; γ(Σ) = γ(τ(Σ)) for other atomic Σ's;
γ([x : α]Γ) = [γ(α)]γ(Γ); γ((t)Γ) = H, where γ(Γ) = [G]H;
γ([Γ, Δ]) = γ(Γ).
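Since only the number of brackets in γ(Σ) matters for the well-ordering <, the norm can be pictured as a natural number. The following Python sketch does this for pure abstraction types over type (our simplification: constants, applications and pairs are ignored):

```python
# Bracket count of the norm gamma: gamma(type) is the empty string and
# gamma([x : a]b) = [gamma(a)] gamma(b), which contributes two brackets
# plus the brackets of a and of b. Toy encoding: "type" or ("abs", a, b).

def norm(ty):
    if ty == "type":
        return 0
    _, a, b = ty
    return 2 + norm(a) + norm(b)
```

The inequality used to justify Definition 5.3.1 is visible here: an argument of type α has norm γ(α), strictly below [γ(α)]γ(Γ), the norm of the abstraction consuming it.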
5.2.4. Lemma. If γ(ti) = γ(αi) (1 ≤ i ≤ n), then γ(Σ[x1^α1,...,xn^αn := t1,...,tn]) = γ(Σ). □
5.2.5. Lemma.
(i) If t E α, then γ(t) = γ(α).
(ii) If Σ = Γ, then γ(Σ) = γ(Γ).

Proof. Prove (i) and (ii) simultaneously by induction on the length of deduction in λΛ-p. Use Lemma 5.2.4. □
5.3. Computability
The notion of computability can now be defined by induction on γ(Σ).

5.3.1. Definition. An expression Σ is computable (comp) if both
(i) Σ is strongly normalizable;
(ii) whenever Σ ↠ [x : α]Γ and t E α and t is comp, then also Γ[x := t] is comp. □

The definition is correct. For if Σ ↠ [x : α]Γ and t E α, then γ(t) = γ(α) < [γ(α)]γ(Γ) = γ(Σ) and γ(Γ[x := t]) = γ(Γ) < γ(Σ).
5.3.2. Lemma.
(i) If Σ ↠ Γ and Σ is comp, then so is Γ.
(ii) Let Σ not have the form [x : α]Γ. Then Σ is comp iff all Σ1 such that Σ → Σ1 are comp.

Proof. Immediate by inspection of the definition. □

5.3.3. Lemma. If Σ1,...,Σn are comp, then so is C(Σ̄).

Proof. Induction on h(Σ1) + ... + h(Σn). □
5.3.4. Lemma. If both Σ and t are comp, then so is (t)Σ.

Proof. Induction on h(Σ) + h(t). Assuming that (t)Σ → Γ, prove that Γ is comp, and apply Lemma 5.3.2 (ii). Distinguish two cases:
(i) Either t → t1 and Γ = (t1)Σ, or Σ → Σ1 and Γ = (t)Σ1. Then Γ is comp by the induction hypothesis.
(ii) Σ = [x : α]Δ and Γ = Δ[x := t]. Then Γ is comp by clause (ii) of the Computability Definition 5.3.1. □
5.3.5. Lemma. If Σ and Γ are comp, then so is [Σ, Γ].

Proof. Again prove by induction on h(Σ) + h(Γ), that [Σ, Γ] → Δ implies that Δ is comp. □
5.3.6. Definition. Σ is called computable under substitution (cus) if for all comp expressions t1,...,tn and variables x1,...,xn such that x̄ := t̄ is an rss for Σ, Σ[x̄ := t̄] is comp. □

5.3.7. Theorem. All expressions Σ of E-p are cus.

Proof. Induction on l(Σ). Let x̄ := t̄ be an arbitrary rss for Σ, such that t1,...,tn are comp. Throughout the proof we abbreviate Δ' = Δ[x̄ := t̄]. The only case which is not immediate by the Lemmas 5.3.3-5 and the induction hypothesis is Σ = [x : α]Γ, Σ' = [x : α']Γ', where α' and Γ' are comp by the induction hypothesis. We check (i) and (ii) of Definition 5.3.1.

(i) Suppose σ = Σ0, Σ1, ... is a nonterminating reduction sequence of Σ'. Distinguish two cases:
(a) There exist finite reduction sequences σ0, [x : α1](x)f0 of Σ', and σ1, α1 of α', and σ2, (x)f0 of Γ', such that σ = σ0, [x : α1](x)f0, f1, ... . Then σ2, (x)f0, (x)f1, ... would be a nonterminating reduction sequence of Γ', contradicting the computability of Γ'.
(b) Case (a) does not apply, i.e., no outer η-reductions are performed in σ. Then σ would induce reduction sequences σ0 of α' and σ1 of Γ', such that either σ0 or σ1 or both are nonterminating, contradicting the fact that α' and Γ' are both comp.

(ii) Suppose Σ' ↠ [x : α1]Γ1. Again distinguish two cases:
(a) α' ↠ α2, Γ' ↠ (x)f, x ∉ FV(f), and so Σ' ↠ [x : α2](x)f → f and f ↠ [x : α1]Γ1. Let t E α1 be comp. Then Γ'[x := t] ↠ Γ1[x := t]. Further ((x)f)[x := t] = (t)f. Now x̄, x := t̄, t is an rss for Γ and Γ'[x := t] = Γ[x̄, x := t̄, t]. Hence by the induction hypothesis Γ'[x := t] is comp and by Lemma 5.3.2 (i) so is Γ1[x := t].
(b) Case (a) does not apply. Then α' ↠ α1 and Γ' ↠ Γ1[x^α1 := x^α']. Hence, if t E α1, also Γ'[x := t] ↠ Γ1[x := t] and repeating the argument in (a) we find that for comp t E α1, Γ1[x := t] is comp. □
5.3.8. Corollary. All expressions of E-p are strongly normalizable. □

5.3.9. Corollary. If Σ is an expression in E-p, then every rs-sequence of Σ terminates.

Proof. Induction on (h(Σ), l(Σ)), observing that if Σ ⊒ Γ, then h(Γ) ≤ h(Σ). □
5.4. Embedding λΛ in λΛ-p
We now define a map φ: E → E-p, such that to each →bt-sequence of an expression Σ in λΛ corresponds a longer rs-sequence of φ(Σ) in λΛ-p. Then Corollary 5.3.9 guarantees the well-foundedness of big trees in λΛ.
5.4.2. Lemma. If Σ ∈ E, then Σ = φ(Σ) (in λΛ-p).

Proof. By induction on l(Σ), check that φ(Σ) ↠ Σ. □

5.4.3. Corollary. If Σ E Γ in λΛ, then φ(Σ) E φ(Γ). □
5.4.4. Lemma. If t E α in λΛ, Σ ∈ E, then φ(Σ)[x^α := φ(t)] ↠ φ(Σ[x := t]).

Proof. Induction on (m(Σ), l(Σ)). We show only three cases.
(i) φ(x)[x := φ(t)] = [x, φ(α)][x := φ(t)] = [φ(t), φ(α)[x := φ(t)]] → φ(t).
(ii) φ(C(Γ̄))[x := φ(t)] = [C(φ(Γ̄))[x := φ(t)], φ(τ(C(Γ̄)))[x := φ(t)]] ↠ [C(φ(Γ̄[x := t])), φ(τ(C(Γ̄))[x := t])] = φ(C(Γ̄)[x := t]). Here we applied the induction hypothesis on Γ1,...,Γn and τ(C(Γ̄)) and we used Lemma 3.3.3.
(iii) φ([y : β]Γ)[x := φ(t)] = [z : φ(β)[x := φ(t)]]φ(Γ)[x := φ(t)][y := z] ↠ [z : φ(β[x := t])]φ(Γ[x := t])[y := z] = φ([y : β]Γ[x := t]). (Apply the induction hypothesis on β and Γ.) □

5.4.5. Lemma. If Σ → Γ in λΛ, then φ(Σ) →⁺ φ(Γ).

Proof. Induction on the length of deduction of Σ → Γ. We show only one case. Let Σ ≡ (t)[x : α]Δ → Δ[x := t] ≡ Γ and t E α (β-reduction). Then
φ(Σ) = (φ(t))[y : φ(α)]φ(Δ)[x := y] → φ(Δ)[x := φ(t)] ↠ φ(Γ) by Lemmas 5.4.3 and 5.4.4. □
5.4.6. Lemma. If Σ ∈ E, then φ(Σ) →⁺ φ(τ(Σ)) (Σ either object or type).

Proof. Induction on l(Σ). Two examples are:
(i) φ(x^α) = [x^α, φ(α)] → φ(α) = φ(τ(x^α));
(ii) φ((t)Γ) ≡ (φ(t))φ(Γ) →⁺ (φ(t))φ(τ(Γ)) = φ(τ((t)Γ)), by the induction hypothesis for Γ. □

5.4.7. Lemma. If Σ ⊐ Γ in λΛ, then φ(Σ) ⊐ φ(Γ) in λΛ-p. □
5.4.8. Corollary. If Σ0,...,Σn is a →bt-sequence in λΛ, then there exists an rs-sequence from φ(Σ0) to φ(Σn) in λΛ-p of equal or greater length.

Proof. Induction on n, using the Lemmas 5.4.5-7. □
5.4.9. Theorem. If Σ ∈ E, then every →bt-sequence of Σ terminates.

Proof. Immediate from the Corollaries 5.3.9 and 5.4.8. □
The Language Theory of Automath
Parts of Chapters II, IV, V - VIII
D.T. van Daalen
II. MISCELLANEA

[Notation and terminology
A. Expressions
Apart from the usual abstraction and application expressions [x : A]B and (A)B, we discuss primitive and defined constant-expressions p(Ā) and d(Ā) - where Ā stands for a string of expressions A1,...,Ak -, pairs (P, A, B) and projections A(1) and A(2), injections i1(A, β) and i2(B, α) and plus-expressions A ⊕ B. The P in (P, A, B) and the β, α in i1(A, β) and i2(B, α) are called type-labels and sometimes omitted. See [van Daalen 73 (A.3)] for the constant expressions and [B.6] for the others.

B. Elementary reductions
Apart from the usual β- and η-reduction we discuss
δ-reduction: d(Ā) >δ D[x̄/Ā] (for defined constants d with definition scheme d(x̄) := D)
π-reduction: (A, B)(1) >π A, (A, B)(2) >π B
σ-reduction: (A(1), A(2)) >σ A
+-reduction: (i1(A))(B ⊕ C) >+ (A)B and (i2(A))(B ⊕ C) >+ (A)C
ε-reduction: ([x : A](i1(x))B) ⊕ ([x : C](i2(x))B) >ε B if x ∉ FV(B).
β-, π- and +-reductions are called introduction-elimination (I.E.) reductions. η-, σ- and ε-reductions are called the corresponding extensional (ext) reductions.
C. Notations -
expressions, modulo a-reduction is a subexpression of B AcB A sub B :A is a proper subexpression of B :one-step reduction - contract one redex per step >1 :syntactic identity of :A
D.T. van Daalen
494
51 > 2 A 1B
-
one-step-reduction - several, disjoint, redices m a y be contracted in one step : generic form of one-step reduction, e.g. >1 or 31 : more-step reduction, the transitive and reflexive closure of > 1 :A and B are confluent, i.e. have a c o m m o n reduct :definitional equality, the equivalence relation generated by >1 . : disjoint
For each of the above reduction related symbols, subscripts m a y indicate what types of elementary reductions are included, e.g. >l,p or 'PJ. D. Properties :Normalization
N SN CR CRi
property
: Strong normalization property : Church-Rosser property B 5 A 2 C =+ B 1 C :Weak Church-Rosser B I C B 1C ij -PP : ij-postponement A 2ij B =+ 3 c A Li C 2j weak ij-pp :weak ij-postponement A rij B
*
3 c ,A~ 2i
+
c 2j
B
D 5i B .
Properties N, SN, CR and CR1 are also prefixed to indicate what types of elementary reductions are included, e.g. /3-N and Pq-CR.]
11.8. An informal analysis of CRI 8.1. In presence of SN, the weak CR-property CR1 is sufficient for CR. Anyhow, for the heuristics of a CR-proof an analysis of CR1 is indispensable. Let i and j indicate kinds of elementary reduction, such as p, q etc. Let C be a n expression, with an i-redex R c C and a j-redex S C C. By contracting R t o R' (resp. S to S') we get C >l,i r (resp. C > l , j A). We want to find out whether r and A have a common reduct C' and if so, by what kind of and by how many contractions, C' can be reached from r and A. In the informal discussion below all possible cases are systematically treated, according to the relative positions of the redices R and S. The first point is of course, that either (a) R and S are disjoint, (b) R = S , (c) R sub S or (d) S sub R. In case (a), the contractions just commute: 8.2.
C=
r z ...R'...s... > l , j ...R' ...S'...
...R...s... C'E
>l,i
C,
As for case (b), if we assume that
(*)
for each definitional constant only one defining axiom is given,
The language theory of Automath, Chapter 11,Section 8 ((2.5)
495
then all elementary reductions are mutually exclusive. 1.e. if R i-contracts to R‘ and R j-contracts to S’ then i and j refer to the same kind of reduction and R’ = S‘. So, under assumption (*), which is indeed fulfilled in the Automath system of abbreviations, in case (b) for a common reduct we can take C‘ zi I’(=
A). Case (c) is discussed in Sec. 8.4 and further. Case (d) can of course be reduced to case (c) by interchanging i and j, R and S. 8.3. About expression variables in schemes for reduction The elementary reductions are formulated in schematic form, i.e. with metavariables for expressions in them. For instance, in the scheme of ,&reduction “ ( A )[x : B]C elementary reduces to C [A]” (in Sec. 3.2.1), the meta-variables A , B , C are the expression variables of the scheme. For each of the schemes, all of its expression variables occur (of course!) at least once in the left-hand side (redex). Let X be an expression variable of a scheme for reductions. We distinguish three cases: (i)
X disappears in the contractum (such as B above).
(ii) X occurs just once in the contractum, possibly there is substituted in X (such as C above). (iii) X is possibly multiplied by substitution (such as A above). For all kinds of reductions, except CT and E , the expression variables occur precisely once in the redex. To these two exceptional cases we refer as the twin reductions (because of the twin occurrences of the meta-variable, e.g. of X in (X(1),X ( 2 ) ) ) . 8.4. Case (c). Let R sub S, S j-contracts to S’. Distinguish the following
cases: (cl) R
cX
for some instance X of a meta-variable of the j-redex.
(c2) not (cl), so R forms an essential part of S (such as [x : B]C in ( A ) [x: B]C ) . Now, unless j refers to a twin reduction and R c X for some instance X of a twin occurrence, in case (cl) the j-redex is not spoilt by the i-contraction. For common reduct C‘ we take the result of simply contracting the modified (by the internal i-contraction) j-redex in r. From A we can reach C’ by icontracting nothing (if X disappears, i.e. case (i), Sec. 8.3), i-contracting one possibly modified (by substitution) occurrence of R (if X occurs once, i.e. case (ii), Sec. 8.3) or i-contracting possibly more disjoint occurrences of R (if X
D.T. van Daalen
496
multiplies, case (iii), Sec. 8.3). So C is disjoint one-step i-reduction). Examples:
>l,i
r
>1j
C’ Z1,i A
<1j
C (where
51,i
(1) j is /3, X “occurs once”, use substitution property I, Sec. 3.8:
c = s = ( A )[ Z : B]R
>l,i
r = ( A )[Z: B]R’
>l,p
C ’ z R ‘ [ A ]
c = s = ( R ) : B]...Z...Z... >l,i r = (R‘)iZ : B]...Z...x... C’ = ...R’ ...R‘ ... <1,j A S‘ = ...R...R... <1,p C .
>l,p
In contrast with this, if j refers to a twin case and R c X for some “twin variable” X , then the j-redex is spoilt by the i-contraction indeed - but can be restored by i-contracting the other twin as well. So, since twin variables occur just once in the contractum (case (ii), Sec. 8.3), for some I”, C’, C >l,i r >l,i I?’ > 1 j C’
=
8.5. Case (c2). R is an essential part of S. Notice that there are two possibilities:
(1) j is a n 1.E.-reduction, i is the corresponding ext-reduction. (2) i is a n 1.E.-reduction, j is an ext-reduction. Case (c21). Here are three cases, r] v. p, 0 v. 7r and E v. +. In the first two cases there is no problem, even if the type-labels are present: l? = A, so we can take C’ = I? too.
(4c <1,7 ( A )[ Z : BI (4c >1,p ( A )c ( Z $ FV(C)) (Q,A, B)(p)< l , u (f‘, (Q,A, B ) ( I )(Q, , A, B)(2))(p) >I,* (Q, A, B)(p) 1
(0.1
(p=1orp=2).
+
The case of E v. is more complicated. First, there is an additional P-reduction needed. Secondly, there are problems with the type-labels. (E+)
R 3 ([x: B1J( i i ( 0~1, ) ) C) CB ( [ z: B2] ( i 2 ( 0~2, ) ) C ) ,
S (ip(A,0 3 ) ) R ,
R’= C ,
S’= ( A )[ Z : BPI (iP(5,D p ) )C , ( p = 1 or p = 2, z $ FV(C)) ,
(ip(A,&))C
<1,E
S >I,+ 5’’
>1,p
(ip(A,Dp[A]))C .
The language theory of Automath, Chapter 11, Section 8 (C.5)
497
So, in this case, r I,+ A >1,p A‘ with I? = A‘ but for the type labels. Hence, without type-labels, C’ = I? = A’ can serve as a common reduct. But with type-labels type-restrictions have to be imposed in order to guarantee that D, [A] and D3 are definitionally equal (and may have a common reduct).
+
8.6. Case (c22) covers p v. 77, 7~ v. u, v. E and p v. E . In the first two cases CR1 holds but for the type-labels. In the third case additional q-contractions are needed (compare with 8.5, E v. +), but in the fourth case CR1 (so CR) simply does not hold at all.
So here, I? = A but for the type-labels. Regarding R v. u,the situation compares with the twincase in 8.4: an additional .rr-reduction is needed.
=
So, r’ I , ~A with I” A but for the type-labels. In order t o keep CR in this case, we must at least require that P and Q are definitionally v. E . Since E is a twin-reduction, an additional equal. Then we come to +-contraction is needed, and two additional 77-steps. But to our relief there are no problems with type-labels.
+
Finally, we give a counterexample for ( p e ) , even without type-labels.
€I+ [z]iz(z). Then The best we can get from R’, @ R2 is R’, @ R’, E [z]il(z) S‘ < S 2 R’, @ R‘,, both are normal but S‘ f R’, @ R’, contradicting CR.
8.7. We resume our results in a table, writing i for >1,i and ī for ≥1,i.
[Table 8.7: for each of the cases below, the table indicates with what kinds of reductions one can complete (i.e. reach a common reduct) when one starts with i...j; entries marked (*) or (**) are qualified by the notes below.]

case
a. redices disjoint
b. redices equal
c. i-redex sub j-redex
c1. i-redex non-essential part
c11. j not twin case
c12. j twin case
c2. i-redex essential part
c21. i-redex in intro form
c22. i-redex in eli-form
d. just like c, with i and j interchanged.
Notes to 8.7 and 8.8: (*) Provided there is one defining axiom for each defined constant. (**) But for the type-labels.
8.8. Alternatively, we can arrange our results in a table, according to the kinds of reduction i, j. We write i⁰ for >⁰1,i, the reflexive closure of >1,i. In the first column below one finds the values of (i, j). In the second column is indicated by what kind of reductions one can complete (i.e. can reach a common reduct) if one starts with i...j.

[Table 8.8: each pair (i, j) is listed with its completion pattern, e.g. j⁰...i⁰; entries marked (*) and (**) are qualified by the notes to 8.7 and 8.8.]
II.9. An informal analysis of postponement
9.1. A discussion, similar to the analysis of CR1 in the preceding section, can be devoted to the question of postponement. Let Γ contain a j-redex R; by contracting R to R' one gets Σ. Let Σ contain an i-redex S; by contracting S one arrives at Δ.
Essential for ij-postponement is that the j-contraction does not create the i-redex S. Of course, for most of the cases for i, j, essentially new i-redices are indeed created by the j-contractions. E.g., how a β-redex is created by a π-contraction:
(A)((Q, [x : B] C, D)(1)) >1,π (A)[x : B] C ,

or by a +-contraction: (i1(A))([x : B] C ⊕ D) >1,+ (A)[x : B] C .
Below we just consider the possibility of ij-PP where i is an I.E.- and j is an ext-reduction, and the possibility of weak δj-PP in general.

9.2. Ext-postponement

9.2.1. Let i refer to an I.E.-reduction and let j refer to an ext-reduction. The
schemes for ext-reduction have a single expression variable as contractum. So R' is an instance of such an expression variable. If (a) R' and S are disjoint (in Σ), or (b) S ⊂ R' and (b1) the expression variable of which R' is an instance occurs once in the j-redex (so, in fact, j must be η-reduction), then the i- and the j-contraction can be interchanged. Example of (b1):

[x : A] (x)B >1,η B >1,i B' <1,η [x : A] (x)B'

(x ∉ FV(B'), because x ∉ FV(B)). If (b2) j refers to a twin reduction (i.e. σ or ε) then two disjoint i-contractions are needed. E.g.

(B(1), B(2)) >1,σ B >1,i B' <1,i <1,i (B'(1), B'(2)) .
9.2.2. If (c) R' ⊂ S [the case R' ≡ S falls under (b)] and (c1) R' is part of an instance of an expression variable of the i-redex, then one can start with >1,i and finish with some disjoint j-contractions; compare case (c1) of the CR1-analysis. Example:

(R)[x : B]...x...x... >1,j (R')[x : B]...x...x... >1,β ...R'...R'... ,
(R)[x : B]...x...x... >1,β ...R...R... ≥1,j ...R'...R'... .
9.2.3. Otherwise, (c2) R' is an essential part of S. Since i is an I.E.-reduction, R' is in introduction form, i.e. inj, abstr, or pair, or it is a plus-expression. Now we assume that (*) such type restrictions are fulfilled, that (1) the result of a σ-contraction is never an inj-, an abstr- or a plus-expression, (2) the result of an ε- or an η-contraction is never an inj-expression or a pair. Then (c2) can only be realized as follows (for brevity we omit type-labels):
(p = 1 or p = 2). E.g. ε creates β:

(A)(([x](i1(x))[y]C) ⊕ ([x](i2(x))[y]C)) >1,ε (A)[y]C >1,β C[A] .
Indeed, in all but the last case, the i-redex is not essentially new: π,π (i.e. >1,π >1,π) can be simulated by π,π̄; η,β by β,β̄; ε,+ by +,β̄ and η,+ by β,+̄. But βε-PP (so (β+)-ε-PP) is false.
9.2.4. We resume the results of this section in a table:
[Table 9.2.4: for each pair i, j and each of the cases (a), (b1), (b2), (c1), (c2), the table shows by what sequence of contractions i...j can be simulated, e.g. η,β by β,β̄; entries marked (*) assume certain type restrictions.]
(*) Assuming certain type restrictions.
9.3. Weak δ-advancement

9.3.1. Since the presence of δ-redices is only dependent on the presence of defined constants, apparently no essentially new δ-redices are created by the other reductions. However, we can only hope for weak δ-advancement (i.e. weak δj-PP for all kinds of reductions j, distinct from δ) in view of the βδ-example:
(d(B⃗))[x : A]...x...x... >1,β ...d(B⃗)...d(B⃗)... >1,δ ...D[B⃗]...d(B⃗)...
where d(y⃗) := D is the defining axiom of d. If we start with >1,δ here, then possibly too many δ-redices are contracted. Actually, the situation compares very well with the situation with the twin reductions w.r.t. CR1.
9.3.2. Let Γ, Σ, Δ, R, R', S be as in 9.1. R is an arbitrary non-δ-redex, S is a δ-redex d(B⃗) (defining axiom as above, say). If (a) R' and S are disjoint in Σ then the contractions can be interchanged:

Γ ≡ ...R...S... >1,j ...R'...S... >1,δ ...R'...S'... ,
Γ ≡ ...R...S... >1,δ ...R...S'... >1,j ...R'...S'... .
If (b) R' ⊂ S, then R' ⊂ Bi for some i, so we can simulate >1,j >1,δ by >1,δ ≥1,j. Example:

d(R) >1,j d(R') >1,δ ...R'...R'... ≤1,j ...R...R... <1,δ d(R) .
If (c) S ⊂ R' then (c1) S is part of an instance of an expression variable of the j-reduction scheme, or (c2) j is β, S ⊂ C[A] (≡ R', where R ≡ (A)[x : B]C). Case (c1) is just like case (a). In case (c2) there are two possibilities:

(c21) d(B⃗) ⊂ C, so d(B⃗)[A] ≡ d(B⃗0) (for some B⃗0), or
(c22) d(B⃗) is part of one of the substituted occurrences of A.
9.3.3. The contractions can again be interchanged in case (c21). Example:

(A)[x : B] d(F⃗) >1,β d(F⃗[A]) >1,δ ...F⃗[A]...F⃗[A]... <1,β (A)[x : B]...F⃗...F⃗... <1,δ (A)[x : B] d(F⃗) .
Case (c22): As in the example above, for some Δ', Σ',

Γ >1,β Σ >1,δ Δ ≥1,δ Δ' <1,β Σ' <1,δ Γ .
9.3.4. Resuming:
II.10. Multiple substitution

10.1. Let D be a set of expressions, Σ an expression, x a variable. Then Γ is a multiple substitution result of Σ with D for x if Γ can be produced from Σ by substituting some A ∈ D for each free occurrence of x in Σ (possibly different A's for different occurrences of x).
The set of such multiple substitution results is denoted Σ[[x/D]] (here, locally, abbreviated to Σ*) or just Σ[[D]], and can be defined inductively, along the lines of ordinary substitution, as follows:
(i)a. A ∈ D ⇒ A ∈ x* ,
(i)b. x ≢ y ⇒ y ∈ y* ,
(ii) |f⃗| = 0 ⇒ f ∈ f* ,
(iii)a. x ≢ y, (∀A ∈ D: y ∉ FV(A)), Γ ∈ Σ* ⇒ λy.Γ ∈ (λy.Σ)* (if necessary rename y) ,
(iii)b. Γi ∈ Σi* for i = 1, ..., |f⃗| ⇒ f(Γ⃗) ∈ (f(Σ⃗))* .
By induction on the length of Σ it can be shown that Σ* is decidable if D is decidable; e.g., if D is finite then Σ* is finite.
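The inductive clauses above can be transcribed directly; the following is a minimal sketch for a toy term language, where the tuple encoding and the name msubst are illustrative assumptions, not notation from the text.

```python
# Multiple substitution Sigma[[x/D]] of 10.1 for a toy term language:
# variables ('var', name), abstractions ('lam', y, body), and constant
# applications ('con', f, (args...)).
from itertools import product

def msubst(sigma, x, D):
    """Return the set of multiple substitution results of sigma with D for x."""
    tag = sigma[0]
    if tag == 'var':
        # (i)a and (i)b: any A in D may replace x; other variables stay put.
        return set(D) if sigma[1] == x else {sigma}
    if tag == 'lam':
        # (iii)a: assumes the bound variable is not x and is free in no A in D.
        y, body = sigma[1], sigma[2]
        return {('lam', y, b) for b in msubst(body, x, D)}
    if tag == 'con':
        f, args = sigma[1], sigma[2]
        if not args:
            return {sigma}  # (ii)
        # (iii)b: choose a result independently for each argument.
        choices = [msubst(a, x, D) for a in args]
        return {('con', f, tuple(c)) for c in product(*choices)}
    raise ValueError('unknown tag: %r' % tag)

# Two free occurrences of x and two candidates in D give 2 * 2 = 4 results,
# illustrating the finiteness remark for finite D.
x, a, b = ('var', 'x'), ('var', 'a'), ('var', 'b')
results = msubst(('con', 'pair', (x, x)), 'x', {a, b})
```

Note that each occurrence of x chooses its replacement independently, which is exactly what distinguishes Σ[[x/D]] from ordinary substitution.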
10.2. Multiple substitution satisfies much the same properties as ordinary substitution. E.g.: if x ≢ y and ∀A ∈ D: y ∉ FV(A), then

Γ*[[y/Σ*]] = (Γ[y/Σ])* , or in full,

Γ[[x/D]] [[y/Σ[[x/D]]]] = Γ[y/Σ] [[x/D]] .
Here = is ordinary set equality. The proof is by induction on Γ, as for ordinary substitution. So, just like with ordinary substitution,

Σ ≥ Γ, Δ ∈ Σ* ⇒ Δ ≥ Δ' , for some Δ' ∈ Γ* .
10.3. If D is a set and A ∈ D ⇒ Σ ≥ A, then, for all Γ' in Γ[[x/D]],

Γ[x/Σ] ≥ Γ' .

So, if ρ(Σ) denotes the set of reducts of Σ, then

Γ' ∈ Γ[[x/ρ(Σ)]] ⇒ Γ[x/Σ] ≥ Γ' .
The concept of multiple substitution will typically be used in the proof of normalisation-like properties (see e.g. Ch. IV and VII, also in this Volume).
II.11. Reduction under substitution in λβ-calculus; Barendregt's lemma

11.1. Introduction. The variable x and the expression Γ will be fixed throughout Section 11. For all reduction relations we have

Σ ≥ Σ', Γ ≥ Γ' ⇒ Σ[x/Γ] ≥ Σ'[x/Γ'] .
Now we consider the converse question: if Σ[x/Γ] ≥ Δ, what can be said about Δ in terms of reducts of Σ and Γ? We concentrate on free λ-calculus here; reduction is just β-reduction. Expressions are variables, application expressions AB and λ-expressions λy.A. We write A B⃗ for (...(A B1) B2 ...) Bk. We write Σ → Δ (relative to x and Γ) if Δ can be produced from Σ by replacing certain occurrences Δ1 ≡ xM⃗1, Δ2 ≡ xM⃗2, ..., Δk ≡ xM⃗k (k ≥ 0) of subexpressions of Σ by reducts Δ1', Δ2', ..., Δk' of Δ1[x/Γ], Δ2[x/Γ], ..., Δk[x/Γ] respectively, not leaving any free occurrences of x unreplaced. Here we prove the reduction-under-substitution lemma:

(*) if Σ[x/Γ] ≥ Δ then, for some Σ', Σ ≥ Σ' → Δ .
Barendregt proved a restricted form of a similar fact for weak combinatory logic, using some underlining technique (see H.P. Barendregt, The undefinability of Church's δ, unpubl. 1972). His proof was extended to λ-calculus by de Boer, a student of de Vrijer [de Boer 75]. Our proof here will be different: we show that the set of Δ such that, for some Σ', Σ ≥ Σ' → Δ, is closed under reduction. Since Σ → Σ[x/Γ], this proves (*). A corollary of (*) is the square brackets lemma, in Sec. 11.5, which is applied in our proof of β-SN in Sec. IV.2.4.4. However, direct proofs of the square brackets lemma are also possible; first there is Lévy's proof [Lévy 75, p. 134] using the standardization theorem and secondly, there is a proof using SN (IV, Sec. 2.4.3). Further interesting applications of property (*) are the non-definability results in [Barendregt et al. 76], [Mitschke 76]. [(*) is included as an exercise in [Barendregt 81].]

11.2. The definition of →

11.2.1. Abbreviate [x/Γ] by * here. Informally speaking, Σ → Δ means
Σ ≡ ...Δ1...Δ2...Δk... ≡ ...xM⃗1...xM⃗2...xM⃗k... , and
Δ ≡ ...Δ1'...Δ2'...Δk'... , with Δi* ≥ Δi' for i = 1, ..., k .
11.2.2. Formally, we can define → inductively, as follows:

(1a) (xM⃗)* ≥ Δ ⇒ xM⃗ → Δ   (M⃗ possibly empty)
(1b) x ≢ y ⇒ y → y
(2) y ≢ x, y ∉ FV(Γ), Σ → Δ ⇒ λy.Σ → λy.Δ (if necessary rename y)
(3) Σ1 → Δ1, Σ2 → Δ2 ⇒ Σ1Σ2 → Δ1Δ2 .
11.3. The following properties of → are easily proved from 11.2.2.

(1) Σ → Σ*
(2) Σ → Δ ⇒ Σ* ≥ Δ
(3a) Σ → y ⇒ (1) Σ ≡ y, y ≢ x, or (2) Σ ≡ xM⃗
(3b) Σ → λy.Δ1 ⇒ (1) Σ ≡ λy.Σ1, Σ1 → Δ1, or (2) Σ ≡ xM⃗
(3c) Σ → Δ1Δ2 ⇒ (1) Σ ≡ Σ1Σ2, Σ1 → Δ1, Σ2 → Δ2, or (2) Σ ≡ xM⃗ .
11.4.1. Substitution lemma for →. If y ∉ FV(Γ), y ≢ x, then

Σ1 → Δ1, Σ2 → Δ2 ⇒ Σ1[y/Σ2] → Δ1[y/Δ2] .

Proof. By induction on the definition of Σ1 → Δ1. E.g. let Σ1 ≡ xM⃗, (xM⃗)* ≥ Δ1. Then Σ1[y/Σ2] ≡ xM⃗[y/Σ2] and

(xM⃗[y/Σ2])* ≡ (xM⃗)*[y/Σ2*] ≥ Δ1[y/Σ2*] ≥ Δ1[y/Δ2]

because of 11.3 (2) above. So Σ1[y/Σ2] → Δ1[y/Δ2]. Or, let Σ1 ≡ Σ11Σ12, Δ1 ≡ Δ11Δ12, Σ11 → Δ11, Σ12 → Δ12. Then Σ11[y/Σ2] → Δ11[y/Δ2] and Σ12[y/Σ2] → Δ12[y/Δ2] (by ind. hyp.), so Σ1[y/Σ2] → Δ1[y/Δ2].
□
11.4.2. Reduction lemma for →. Σ → Δ >1 Δ' ⇒ Σ ≥ Σ' → Δ', for some Σ'.

Proof. By induction on the length of Δ. Or, informally, as follows: Δ must contain a redex, Δ ≡ ...(λy.Δ1)Δ2..., Δ' ≡ ...Δ1[y/Δ2]... . Now there are three cases:

(1) Σ ≡ ...(λy.Σ1)Σ2..., Σ1 → Δ1, Σ2 → Δ2, or
(2) Σ ≡ ...(xM⃗)Σ2..., (xM⃗)* ≥ λy.Δ1, Σ2 → Δ2, or
(3) Σ ≡ ...(xM⃗)..., (xM⃗)* ≥ (...(λy.Δ1)Δ2...) .

We must indicate an appropriate Σ': in case (1) take Σ' ≡ ...Σ1[y/Σ2]... and use 11.4.1; in cases (2) and (3) simply take Σ' ≡ Σ. So, in fact, even Σ ≥β Σ' → Δ'. □

11.4.3. Theorem. Σ → Δ ≥ Δ' ⇒ Σ ≥ Σ' → Δ', for some Σ'.

Proof. By induction on Δ ≥ Δ', using 11.4.2. □
11.5. Corollaries

11.5.1. Reduction-under-substitution lemma (i.e. Property (*), Sec. 11.1):

Σ* ≥ Δ ⇒ Σ ≥ Σ' → Δ . □

11.5.2. Barendregt's lemma ([de Boer 75]): (if x ∉ FV(Σ))

ΣΓ ≥ Δ ⇒ Σx ≥ Σ' → Δ .

Proof. (Σx)* ≡ ΣΓ ≥ Δ, so Σx ≥ Σ' → Δ. □

11.5.3. Square brackets lemma. If Σ* ≥ λy.Δ then either

(1) Σ ≥ λy.Δ0, Δ0* ≥ Δ, or
(2) Σ ≥ xM⃗, (xM⃗)* ≥ λy.Δ . □
Note about terminology: the name square brackets lemma comes from the square brackets which represent abstr in Automath notation. Here the name λ-lemma would be more appropriate. Slightly more general than 11.5.3 is:
11.5.4. "Outer shape lemma". If Σ* ≥ Δ then Σ ≥ Δ0, Δ0* ≥ Δ, with either

(1) the latter reduction (Δ0* ≥ Δ) non-main, or
(2) Δ0 ≡ xM⃗ . □

[Main reduction: when at one of the reduction steps, the expression itself acts as the redex to be contracted.]
11.5.5. Note: The corollaries 11.5.1, 11.5.2 and 11.5.4 do not extend to βη-reduction, but the square brackets lemma does (by η-postponement).
IV. STRONG NORMALIZATION FOR FIRST ORDER PURE TYPED λ-CALCULUS WITH APPLICATION TO AUT-QE

IV.2. Normalization and strong normalization for normable expressions

2.1.1. Here we consider a system M of normable expressions, in which the first order pure typed λ-calculus systems, such as the systems of correct Automath expressions, can be embedded. Each normable expression Σ has a norm μ(Σ). Norms are defined inductively:

(i) τ is a norm.
(ii) if ν1, ν2 are norms then [ν1]ν2 is a norm.

The length l(ν) of a norm ν can be defined to be the number of τ's in ν. Equality of norms is denoted by ≡.
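The two clauses and the length function transcribe directly; a small sketch with an illustrative tuple encoding (TAU for τ, ('arr', v1, v2) for [ν1]ν2), which is an assumption of this example rather than notation from the text:

```python
# Norms of 2.1.1 as nested tuples: TAU is the base norm tau, and
# ('arr', v1, v2) stands for [v1]v2.
TAU = 'tau'

def is_norm(v):
    """Clause (i): tau is a norm; clause (ii): [v1]v2 is a norm."""
    return v == TAU or (isinstance(v, tuple) and len(v) == 3
                        and v[0] == 'arr' and is_norm(v[1]) and is_norm(v[2]))

def length(v):
    """l(v): the number of tau's occurring in v."""
    if v == TAU:
        return 1
    return length(v[1]) + length(v[2])

# [tau]([tau]tau) contains three tau's; both components are strictly shorter,
# which is the stratification exploited by the order <_mu in 2.1.3.
v = ('arr', TAU, ('arr', TAU, TAU))
```

The strict decrease of length on both components is what makes <μ well-founded and suitable for the inductions below.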
2.1.2. The expressions in M are formed from variables, by abstr and appl (and possibly other constants). Abstraction expressions are denoted [x : A]B and application expressions (A)B. By writing μ(Δ) we implicitly intend that Δ ∈ M. Here follow the relevant properties of M and the norm μ:

(1) M is closed under taking subexpressions, i.e. Σ ∈ M, Γ ⊂ Σ ⇒ Γ ∈ M.
(2) Σ ≡ [x : A]B ∈ M ⇒ μ(Σ) ≡ [μ(A)]μ(B), μ(x) ≡ μ(A).
(3) Σ ≡ (A)B ∈ M ⇒ μ(B) ≡ [μ(A)]μ(Σ).
(4) M is closed (and μ is preserved) under substitution: μ(x) ≡ μ(A), B ∈ M ⇒ μ(B[x/A]) ≡ μ(B).
(5) M is closed (and μ is preserved) under reduction: Σ ∈ M, Σ ≥ Γ ⇒ μ(Γ) ≡ μ(Σ).
2.1.3. The norm μ induces a well-founded order <μ on the normable expressions, as follows:

Σ <μ Γ :⇔ l(μ(Σ)) < l(μ(Γ)) .

Then, by the properties above, <μ induces an actual stratification, according to functional complexity: both argument and value of a function precede the function w.r.t. <μ, i.e. if (A)B ∈ M then A <μ B and (A)B <μ B.

Induction on <μ is just called induction on μ. Below we only deal with (strong) normalization for β-reduction. [Using βη-postponement, see II.9,] we can extend to the βη-case. In Sec. IV.4.6 we extend with δ-reductions.
2.2. Normalization for β-reduction: first proof

2.2.1. Heuristics. Assume that Σ ∈ M, Σ is not normal, Σ >1,β Σ'. So

Σ ≡ ...(A)[x : B]C... , Σ' ≡ ...C[A]... .

The redices in Σ' are of several kinds (compare with II.9):

(1) "old" redices, already present in Σ (and there disjoint with (A)[x : B]C).
(2) "modified" redices, i.e. redices R[x/A] ⊂ C[x/A] in Σ' where R ⊂ C in Σ.
(3) "multiplied" redices, i.e. redices inside substituted occurrences of A in C[A].
(4) "newly created" redices (D1[A])[y : D2]D3, where A ≡ [y : D2]D3 and (D1)x ⊂ C or x ≡ C.
(5) "newly created" redices (D1)[y : D2[A]]D3[A], where C ≡ [y : D2]D3.

If A is normal, no redices are multiplied. If μ(A) ≡ τ, no redices of type (4) are created.

2.2.2. First proof of β-normalization (Prawitz 1965). (See [Prawitz 65].) This proof is quite similar to the first proof of δ-N in [van Daalen 80, III.4.3]. Define the order of a redex (A)[x : B]C to be l(μ([x : B]C)). Let Σ ∈ M, let m(Σ) be the maximal order of redices in Σ, and let #(m, Σ) be the number of occurrences of redices of order m in Σ. Our normalization procedure runs as follows: if Σ is not normal then contract an innermost redex (A)[x : B]C of maximal order. And so on. That this procedure terminates follows by induction on (I) m(Σ), (II) #(m, Σ). For, one redex of order m(Σ) disappears and, since we chose an innermost redex of order m(Σ), all the redices of type (2)-(5) above are of order less than m(Σ). Further, the "old" redices were already present in Σ, with the same order. So, either m(Σ) (if #(m, Σ) = 1) or #(m, Σ) (otherwise) properly decreases under the indicated contraction.
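The termination argument just given can be condensed into a single lexicographic measure; the following LaTeX sketch merely restates the induction of 2.2.2, with its notation:

```latex
% Contracting an innermost beta-redex of maximal order in \Sigma strictly
% decreases the pair (m(\Sigma), \#(m(\Sigma),\Sigma)) lexicographically:
\[
  \bigl(m(\Sigma'),\, \#(m(\Sigma'),\Sigma')\bigr)
  \;<_{\mathrm{lex}}\;
  \bigl(m(\Sigma),\, \#(m(\Sigma),\Sigma)\bigr)
  \qquad\text{when }\Sigma >_{1,\beta} \Sigma'\text{ as prescribed.}
\]
% The lexicographic order on pairs of natural numbers is well-founded,
% so the procedure terminates.
```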
2.3. Second proof of β-normalization (Lévy, Jutting). (See [Lévy 74], [van Benthem Jutting 71a].)
2.3.1. Substitution lemma for β-N (Jutting). B ∈ M, μ(A) ≡ μ(x), A normal, B normal ⇒ B[x/A] β-normalizes.
Proof. By induction on (I) μ(A), (II) length of B. If B[A] is not normal, then B contains subexpressions of the form (B1)x and A ≡ [y : A1]A2. Our normalization procedure for B[A] runs as follows: for each of these (B1)x take the maximal subexpression (Bk)...(B1)x ending in it. By ind. hyp. (II), ((Bk−1)...(B1)x)[A] normalizes, to C, say. If C ≡ [y : C1]C2 then, by ind. hyp. (I) applied to Bk[A], C2[y/Bk[A]] normalizes. By normalization of all these maximal subexpressions of the form ((B⃗)x)[A], B[A] can be normalized. □

2.3.2. Corollary. Σ ∈ M ⇒ Σ β-N.

Proof. By induction on the length of Σ. □
2.3.3. The reduction procedure intended above corresponds to the following definition of normal form:

nf([x : A1]A2) ≡ [x : nf(A1)]nf(A2) ,
nf((A1)A2) ≡ if nf(A2) ≡ [y : B1]B2 then nf(B2[y/nf(A1)]) else (nf(A1))nf(A2) .

Lévy speaks about "intérieur d'abord"-reductions. In fact, Lévy's proof of normalization by this procedure does not use a substitution lemma, but instead employs an induction up to ω^ω (for an explanation, see Sec. 2.6.1).
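The nf-clauses can be transcribed for plain λ-terms; a sketch in Python with an illustrative tuple encoding, where standard application app(f, a) plays the role of (a)f, so both parts are normalized before an outer redex is contracted. Substitution here is naive (no renaming), so the examples keep bound names distinct; on non-normalizing terms the recursion need not terminate.

```python
# "Interieur d'abord" normalization in the style of 2.3.3, for untyped terms
# encoded as ('var', x) | ('lam', x, body) | ('app', f, a). Names and the
# encoding are assumptions of this example, not notation from the text.

def subst(t, x, s):
    """Naive substitution t[x/s]; no renaming of bound variables."""
    tag = t[0]
    if tag == 'var':
        return s if t[1] == x else t
    if tag == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, s))
    return ('app', subst(t[1], x, s), subst(t[2], x, s))

def nf(t):
    """Normal form, normalizing subterms before contracting an outer redex."""
    tag = t[0]
    if tag == 'var':
        return t
    if tag == 'lam':
        return ('lam', t[1], nf(t[2]))
    f, a = nf(t[1]), nf(t[2])          # inner parts first
    if f[0] == 'lam':                  # the clause nf(B2[y/nf(A1)])
        return nf(subst(f[2], f[1], a))
    return ('app', f, a)

# (lambda x. lambda y. x) applied to free variables a and b normalizes to a.
K = ('lam', 'x', ('lam', 'y', ('var', 'x')))
t = ('app', ('app', K, ('var', 'a')), ('var', 'b'))
```

On normable (e.g. simply typed) terms this recursion terminates, which is exactly the content of 2.3.1 and 2.3.2.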
2.4. Strong β-normalization (β-SN): first proof

2.4.1. Heuristics for SN. We can formulate the following conditions for β-SN. Σ is β-SN if

(i) the direct subexpressions of Σ are SN, and
(ii) Σ ≡ (A)B, B ≥ [y : C]D ⇒ D[A] SN

(because all first main reducts of Σ are reducts of some D[A]). So, if we have the substitution theorem for β-SN: if B ∈ M, μ(x) ≡ μ(A), then

A SN, B SN ⇒ B[A] SN ,

then we can prove β-SN by mere induction on the length of [normable] expressions.
2.4.2. Heuristics for the substitution theorem. Now let B ∈ M, μ(x) ≡ μ(A), B SN and A SN. Abbreviate Σ[x/A] by Σ*. The question is how to prove the SN-conditions for B*. The crucial case is when B ≡ (B1)B2. The SN-conditions require:

(i) B1* SN, B2* SN, and
(ii) B2* ≥ [y : C]D ⇒ D[B1*] SN.

In the case that the outside square brackets of [y : C]D do not originate from the substitution * but show up as well in reduction sequences of B2, i.e. if

B2 ≥ [y : C0]D0 , C0* ≥ C , D0* ≥ D ,

then

B ≥ (B1)[y : C0]D0 >1 D0[B1] , (D0[B1])* ≡ D0*[B1*] ≥ D[B1*] ,
>I
Do[&]
,
D: [B;]2 D [B;]
which suggests to use induction on the reduction tree of B ( B is SN) in order to establish that D [B;]is SN. Otherwise, the square-brackets lemma (Sec. 11.11, and Sec. 4.3 below) must provide the necessary information. 2.4.3. Alternative proof of the square-brackets lemma The proof in 11.11 works for free X - p-calculus. Here we give a n alternative proof for SN expressions. Abbreviate C [ z / A ]by C*. Square brackets lemma. Let B* 2~ [ y : C ]D and let B be SN. Then either (a)
B 2 [ y : CO]DO,C$ 2 C , D: 2 D , or
(ii) B 2
(?) 2, (?* ) A 2 [ y : C]D.
Proof. By induction on (I) O(B) (the length of the reduction tree of B ) , (11) length of B. Distinguish the cases:
= 2. Then (ii) holds. B = [y : B I ]Bz. Since we have no 17-reduction, (i) holds.
(1) B (2)
(B1)B2. Then B; 2 [ z : E ] F , F [B;]2 [y : C ]D. By ind. hyp. (11) (3) B applied t o Bz, either
(a) Bz 2 [ z : Eo]Fo, F; 2 F or (b) B2 2 (C?) 2, (G*)A 2 [ z : E] F .
In case (a), B 2 (BI)[ z : Eo]FO (Fo [Bl])'
>I
FO[ B I ]and
F; [Bf]2 F [B;]2 [ y : C]D .
The language theory of Automath, Chapter IV, Section 2 (C.5)
511
Hence, ind. hyp. (I) applies to FO[Bl]which gives the desired result for 8. In case (b),
so B satisfies (ii). (4) In the remaining case, B* does not reduce to [y : C] D.
0
2.4.4. First proof of P-SN 2.4.4.1. In agreement with 2.4.1, we start with the substitution theorem
for P-SN:
B E M , p ( z ) = p ( A ) , A S N , B SN
+
B[z/A] SN.
Proof. By a triple induction on (I) p(A), (11) 29(B),(111) length of B. [29(B) is the height of the reduction tree of B ; also 2 9 ~etc.] Abbreviate E[z/A] by C*. We prove the SN-condition for B*. The crucial case is when B E (B1)B2. The direct subexpressions Bf, B; of B' are SN by ind. hyp. (111) (or possibly (11)). So, let B,* 2 [y : C] D. We must prove that D [Bf]is SN. By the squarebrackets lemma applied to B2 (which is SN, hence we can use the alternative proof without any circularities) we have two cases:
DO, (i) B2 2 [Y : CO] (ii) B2 2
2 D , or
(F)z, ( ( F )z)* 2 [y : C]D.
In case (i), B 2 (B1)[y : CO]DO > I DO[ B I ]and (Do [Bl])'= Dt; [Bf]2 D [Bf], so the ind. hyp. (11) applies to DO[ B l ]whence , D [Bf]is SN. In case (ii), we know that D is SN (since Bd is SN and B,* 2 [y : C] D ) . Further, by the properties of p , p ( B ; ) = p ( B 1 ) E p(y), B1 < p z so BT < p A. Hence, 0 we can apply ind. hyp. (I) to Bf and get that D [Bf]is SN. 2.4.4.2. Corollary. C E M
+
C P-SN (as indicated in 2.4.1).
0
2.5. Second proof of β-SN. (Note: Actually, both the second and the third proof of β-SN are incorrect: the substitution theorem is not sufficient here, we rather need a replacement theorem. Since the idea of the proof can be maintained, and since the error will be repaired in VII.4.5, we have not altered the present text.)
2.5.1. Heuristics for the substitution theorem for SN. Let A and B
be SN. As in II.5.3.3, B* (where * stands for [x/A]) is SN if all its reduction sequences contain an SN expression. Let B* >1 C > ... be a reduction sequence of B*. First, if the redex is an old or a modified redex (terminology as in 2.2.1) then the contraction and the substitution commute: for some C0, B >1 C0, C0* ≡ C. So, if we use induction on ϑ(B), we can conclude that C is SN. In fact, the proofs in 2.4.3 and 2.4.4.1 use the similar fact that, in some cases, substitution and main reduction commute. Secondly, reduction sequences of B* can start with contractions inside substituted A's (or inside reducts of such A's). There is only a finite number of such contractions, since A is SN. Finally, if the first redex contracted is a new redex then we have to use properties of the norm. Our alternative proof of the substitution theorem, below, is based on the above ideas and avoids the square brackets lemma.

2.5.2. An additional assumption on M and μ is needed, viz. that M is closed and μ is preserved under "correctly normed" replacement: if Σ ∈ M, and Σ' is formed from Σ by replacing an occurrence of Γ ⊂ Σ with some Γ' ∈ M such that μ(Γ') ≡ μ(Γ), then μ(Σ') ≡ μ(Σ).

2.5.3. Second proof of the substitution theorem for β-SN. Let B ∈ M, μ(x) ≡ μ(A), A and B SN. We must prove that B* is SN. Again, we use a
triple induction on (I) μ(A), (II) ϑ(B), (III) length of B. Let B* ≡ B0 >1 B1 >1 ... >1 Bk >1 Bk+1 > ... (k ≥ 0) be a reduction sequence of B* and let the step from Bk to Bk+1 be the first reduction step not taking place inside (a reduct of) some substituted A. So B* ≡ ...A...A..., Bk ≡ ...A'...A''... ≡ ...(C)[y : D]E..., Bk+1 ≡ ...E[y/C]..., where A ≥ A', A ≥ A''. If ρ(A) is the set of reducts of A (which is finite), then B*, B1, ..., Bk belong to the multiple substitution result B[[x/ρ(A)]] (Sec. II.10) and Bk+1 is the first reduct not in that set. Clearly, k ≤ ϑ(A) · #(x, B), i.e. the length of the reduction tree of A times the number of free occurrences of x in B. We show that Bk+1 is SN. Put R ≡ (C)[y : D]E, and distinguish the following cases:

(i) (C0)[y : D0]E0 ≡ R0 ⊂ B and R ∈ R0[[x/ρ(A)]]. Then the contraction of R commutes with the multiple substitution, viz.
The ind. hyp. (II) applies to B', and (B')* ≥ Bk+1, so Bk+1 is SN.
(ii) (C0)x ≡ R0 ⊂ B, C ∈ C0[[x/ρ(A)]], A ≥ [y : D]E. We apply ind. hyp. (I) twice (in contrast with 2.4.4.1). First, C0 <μ x so C <μ A. By ind. hyp. (III) C0* (so C) is SN, and A is SN, so E is SN. Hence, by ind. hyp. (I), E[y/C] is SN. Secondly, E <μ A so E[C] <μ A. Now take a fresh variable z, with μ(z) ≡ μ(E). Form the expression B' from B by replacing the specific occurrence of R0 by z, and form B'' from Bk by replacing R with z. [The problem with this "proof" has to do with possible free variables of R0, which are bound in B. Their presence makes that not B ≡ B'[z/R0]. Instead we would need literal replacement of z with R0, as used in VII.4.5.] By our assumption 2.5.2, the norms of B and its subexpressions are not affected by this replacement. Clearly, since B ≡ B'[z/R0], B' is SN and ϑ(B') ≤ ϑ(B). Further, B' is shorter than B. So ind. hyp. (III) or ind. hyp. (II) can be applied, giving that (B')* is SN. And by ind. hyp. (I) (this is the second application) (B')*[z/E[y/C]] is SN. Resuming, in case (ii) we have:
B ≡ ...x...(C0)x... , B' ≡ ...x...z... , (B')* ≡ ...A...z... ,
B* ≡ ...A...(C0*)A... , Bk ≡ ...A'...(C)[y : D]E... , B'' ≡ ...A'...z... .
so (B')* ≥ B'', and (B')*[z/E[y/C]] ≡ ...A...E[y/C]... ≥ ...A'...E[y/C]... ≡ B''[z/E[y/C]], whence Bk+1 is SN, q.e.d.
2.5.4. Corollary. Σ ∈ M ⇒ Σ β-SN (as indicated in 2.4.1). □
2.6. Third proof of β-SN.

2.6.1. This new proof is a mere variant of the previous one. However, instead of an iterated substitution a simultaneous substitution is employed. Consequently, we start with a simultaneous substitution theorem for β-SN. The induction used is essentially induction up to ω^ω, instead of the previously used inductions on ω, ω² and ω³.
Explanation: The threefold ω-induction (as used in the above proofs) can be considered as a single transfinite (up to ω³) induction on triples (m, n, k) of natural numbers, ordered lexicographically, i.e. according to their corresponding ordinal ω²·m + ω·n + k. Similarly, the present proof uses a single transfinite induction on finite sequences (mk, mk−1, ..., m0), where mk ≠ 0, for arbitrary k, ordered (I) according
to their length k, (II) lexicographically, i.e. according to their corresponding ordinal

ω^k · mk + ω^(k−1) · mk−1 + ... + ω · m1 + m0 .
2.6.2. Simultaneous substitution theorem for SN. Let B ∈ M, μ(xi) ≡ μ(Ai) ≡ νi for i = 1, ..., k, and let A⃗ and B be SN. Then B[x⃗/A⃗] is SN.
Proof. Abbreviate Σ[x⃗/A⃗] by Σ*. Let ni denote the number of occurrences of xi (for i = 1, ..., k) in the whole reduction tree of B. Define αj to be

∑_{i : l(νi) = j} ni · ϑ(Ai) .
Let m be the maximum of those l(νi) with ni ≠ 0. We use induction on (I) (αm, ..., α0) (ordered as above), (II) ϑ(B), (III) length of B. Let B* >1 C. We shall prove that C is SN. The cases are:

(1) If the redex contracted is old or modified, proceed as in the proof 2.5.3, case (i).
then take a fresh variable z with p ( z ) = p ( x i ) , form an expression B' from B by replacing the specific occurrence of xi by z , and consider the new substitution [?,z/A,A:]. Clearly, C = B' [Z, z / A , A:],and C is SN by ind. hyp. (I). Notice that, in fact, only c q v , ) is affected, viz. decreased by at least 1. (3) If the redex contracted is new (compare proof 2.5.3), case (ii)) then B = ...xi ... (Do)xi ... ~j ..., B* E ... Ai ... (D:) Ai ... Aj ..., Ai z [y : El F and C E ...A; ...F [D:] ... Aj ... . Now form B' by replacing (DO)zi with a new p ( ( D o ) x i ) , and consider the new substitution variable z, p ( z ) [Z, z / i , F [D:]]. Since B B' [ z / ( D o )xi],the replacement removes at least one occurrence of xi from the reduction tree, whereas possibly only occurrences of z (which has shorter norm) are added. So the component cyj with I(ui) = j properly decreases when going from B to B'. Further, as in 5.3.2 case (ii), F [D:] is SN so C E B' [Z, z / & F [D:]]is SN, by ind. hyp. (I). 0
=
2.6.3. Corollary. Substitution theorem for SN (take k = 1 above).
0
The language theory of Automath, Chapter IV, Section 4.6 (C.5) 2.6.4. Corollary. C E M
3
C P - SN.
515 17
IV.4. The normability of AUT-QE
[In [van Daalen 80, 1V.4.1...4.53 (not included here), it is proved that the correct expressions of AUT-QE+, and hence of A U T - Q E and AUT-68, are weakly norm correct (w.n.c.), and hence weakly normable. These weak notions have been introduced to cope with type inclusion: by type inclusion substitution with 2-expressions may alter the norm of expressions. However, the norms of degree 3 subexpressions are unaffected and all the 0-N and 0-SN proofs remain valid.] 4.6. Extension to PvS-SN By the previous section we know that the correct expressions of AUT-QE-+ (and hence of AUT-QE and AUT-68) are p-SN. The definitional axioms are just like in Chapter 111, so we also have 6-SN. Now we extend to P$-SN. 4.6.1. Lemma. If p ( A ) = p ( x ) , B normable, then
A P6-SN, B P6-SN =+ B [ x / A ]P6-SN . Proof. By induction on (I) p(A), (11) 29pb(B), (111) length (B). Combine a single-substitution version of the second proof of 6-SN [van Daalen 8 0 , 111.5.41 0 with the first proof of 0-SN (1Vq2.4.4).
<
[.'/A]
4.6.2. Lemma. Let k be correctness, as in 4.5.2. I f = 3E6, is weakly norm correct and weakly degree correct, Ai is PS-SN ( i = 1,..., 131) then
=+ B [$/A]PS-SN .
[In fact, i n IV.4.1 ...4. 5, the notions of weak norm correctness and weak normability have been introduced in the framework of the weakly degree-correct expressions. I n the weakly degree-correct expressions, the degree restrictions only serve to rule out application with degree 2 expressions - f o r which weaker norm restrictions are imposed than for expressions of degree 3 and higher.] Proof. By induction on k B (as in 4.5.3). Abbreviate [ Z / 4 by *. Some cases are:
= (C) D. By ind. hyp. C*, D* are PS-SN. Now let D* 2 [y : E] F. By 4.5.3, B*, C* and D* have a norm. By 4.4.3, F has a norm and p ( C * ) = p ( y ) . So, by 4.6.1, F [ C * ]is p6-SN, q.e.d.
(1) B
(2) B = d(C?). By ind. hyp. the Cj* are P6-SN. The C?* form a w.n.c. and w.d.c. substitution. So by applying the ind. hyp. to d e f ( d ) , with the new 0 substitution, def (d) [C?] is PG-SN, q.e.d.
D.T. van Daalen
516
4.6.3. Theorem (PS-SN): F A
+
A PG-SN.
Proof. Take the identical substitution 4.6.4.
Corollary. I- A
+
[?/$I
above.
A PvS-SN (by (P6)-postponement, see 11.9).
0
The language theory of Automath, Chapter V (C.5)
517
V. THE E-DEFINITION AND THE CLOSURE PROPERTY FOR PURE REGULAR AUTOMATH LANGUAGES Section 2 of this chapter introduces the E-definition, closely related to the definition (of AUT-QE) in [van Daalen 73 (A.3)], as a framework for defining Automath languages. Section 3 proves the closure property (correct expressions remain correct under reduction) for several versions of the pure (i.e. only p-, 9-and &reduction), regular (i.e. only expressions of degree 1, 2 and 3) languages AUT-68 and AUT-
QE. Section 4 proves, using closure and CR [ C R stands for Church-Rosser property] (thus anticipating the PVR-result of Chapter VI), the equivalence of the E-definition with an algorithmic definition, such as Nederpelt’s definition of A. This gives the decidability of the various systems, and further allows certain simplifications in the E-definition.
V.l. Introduction 1.1. E-definition versus algorithmic definition We distinguish some fundamentally different methods of defining the correct expressions, with typing and equality relation (w.r.t. book and context) [see, e.g. [van Daalen 73 (A.3)]],of an Automath language, or of any other system with generalized type-structure. First, the E-definition, below, introduces E-formulas A E B (expressing the typing relation: A has type B) and Q-formulas A B (for expressing equality: A definitionally equal to B ) . Correctness of expressions (notation: I- A ) and both kinds of formulas is given by a simultaneous inductive definition, without giving a clue how the correctness might be effectively verified. Essentially the same definition method is used in [van Daalen 73 (A.3)], and in [Martin-Lof 75a). Secondly, there is the algorithmic definition which characterizes the correct expressions by giving a verification algorithm for correctness. In this case Q can be defined in terms of reduction ( A B :($ A 1 B ) and E can be defined in terms of Q and the typing function t y p ( A E B :@ typ(A) Q B , forgetting type-inclusion for the moment). The main example of an algorithmic definition is Nederpelt’s definition of A in [Nederpelt 73 (C.3)]. In the third place, we mention de Vrijer’s definition method of AX in [ d e Vrijer 75 (C.d)]. He starts with the simultaneous introduction of the correct Eand @formulas, and after that defines correctness of expressions in terms of E , Q and typ.
D.T. van Daalen
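The contrast between the two definition styles can be made concrete. The following is a minimal sketch (all names hypothetical, and for a simply typed toy calculus rather than an actual Automath language) of an algorithmic definition in the above sense: Q is decided by reduction, here by full β-normalization, which terminates because the toy calculus is strongly normalizing, and E is decided as typ(A) Q B.

```python
# Terms: ('var', x), ('lam', x, ty, body), ('app', f, a);
# types: ('base', b), ('arrow', dom, cod).
# Substitution is naive (no capture avoidance), adequate for the small
# closed examples this sketch is meant for.

def subst(t, x, r):
    if t[0] == 'var':
        return r if t[1] == x else t
    if t[0] == 'lam':
        _, y, ty, body = t
        return t if y == x else ('lam', y, ty, subst(body, x, r))
    _, f, a = t
    return ('app', subst(f, x, r), subst(a, x, r))

def normalize(t):
    if t[0] == 'var':
        return t
    if t[0] == 'lam':
        _, y, ty, body = t
        return ('lam', y, ty, normalize(body))
    _, f, a = t
    f, a = normalize(f), normalize(a)
    if f[0] == 'lam':                      # outside beta-step
        return normalize(subst(f[3], f[1], a))
    return ('app', f, a)

def Q(a, b):
    """Definitional equality decided by reduction: A Q B iff A and B
    have a common reduct (here: the same normal form)."""
    return normalize(a) == normalize(b)

def typ(t, ctx):
    if t[0] == 'var':
        return ctx[t[1]]
    if t[0] == 'lam':
        _, y, ty, body = t
        return ('arrow', ty, typ(body, {**ctx, y: ty}))
    _, f, a = t
    fty = typ(f, ctx)
    assert fty[0] == 'arrow' and fty[1] == typ(a, ctx)
    return fty[2]

def E(a, b, ctx):
    """The typing relation decided algorithmically: A E B iff typ(A) Q B.
    In this toy calculus types contain no redices, so Q on types is =."""
    return typ(a, ctx) == b
```

In this setting decidability is exactly termination of `normalize`, which is the point made about the algorithmic definition in 1.2 below.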
1.2. Some general points on the language theory A priori it is not clear that the various definition methods generate the same structure (of correct expressions, with typing and equality). So one might think that the language theory has two aims, viz. (1) proving the equivalence of the various formulations, and (2) proving that the generated structures satisfy some specific desirable properties (Sec. 1.3).
However, these aims can hardly be separated: properties are first proved for one formulation, then the equivalence is established, and finally the properties are transferred to the other formulation via the equivalence. A simple example of this situation: for the system given by the algorithmic definition, decidability is just a matter of termination of the algorithm, i.e. normalization (as Nederpelt points out in [Nederpelt 73 (C.3)]). So, by the results in Chapter IV, if a system can be proved to be equivalent to the "algorithmic one", it is decidable. As a second illustration, we sketch roughly how the development below is organized. [Terminology: > means 1-step reduction, ≥ is the transitive closure of >; A ↓ B iff there is a C such that A ≥ C ≤ B; ↓* is the transitive closure of ↓; further, for relations R like >, ≥, ↓, ↓*, R⁺ is the restriction of R to the correct expressions.] We work with three systems: I and II are given by an E-definition and III is the algorithmic definition. The three systems essentially just differ as regards their Q-rules. In system I, Q is defined to be the equivalence relation generated by >⁺ (but realize that Q and ⊢ are introduced simultaneously). This is the restricted "technical" version of the E-definition, which we present in Section 2 and take as the starting point for the development in Section 3. In system II, Q is (↓⁺)*, i.e. the transitive closure of ↓⁺. This is the liberal form of the E-definition, which we think is most suitable for practical purposes, as a reference manual, say. In system III, the algorithmic definition, which we give in Section 4, Q is defined to be just ↓⁺. We say that a system satisfies CL if its correct expressions remain correct under reduction, and that it satisfies CR if its correct expressions are CR. Clearly, both I and III are contained in II, since II has the most liberal rules for Q.
Further, if I satisfies CL then I and II are equivalent, as is proved by induction on the definition of correctness in system II. Also by induction on II-correctness it is proved that II and III are equivalent, if III satisfies CR. Now, in Section 3 we prove that I satisfies CL, and in Chapter VI we prove (roughly) CL ⇒ βηδ-CR (for the βδ-case we know CR already). This gives CR for II, so CR for III, so it shows that all three systems are equivalent, and satisfy CL and CR.
The language theory of Automath, Chapter V (C.5)
An approach, alternative to the one sketched above, is given in Chapter VII. There the algorithmic definition serves as a starting point and CL and CR are proved simultaneously, using induction on so-called big trees.
1.3. What are the desirable properties?

As desirable properties for the structures of correct expressions generated, we mention:

(i) substitutivity: correctness of expressions and formulas is preserved under substitution with correct expressions of the right types.

(ii) closure (CL) and preservation of types (PT): correctness of expressions and formulas is preserved under reduction.

(iii) the Church-Rosser property CR, and the weak Church-Rosser theorem: A Q B ⇒ A ↓ B.

(iv) (strong) normalization (S)N and decidability.

(v) properties for Q which show that Q behaves as an equality, such as:
- the left-hand equality rule LQ: A E B, A Q C ⇒ C E B (the right-hand equality rule is included in the definition);
- monotonicity rules: A Q B, C Q D ⇒ (A) C Q (B) D, etc.

(vi) uniqueness properties:
- uniqueness of types UT: A E B, A E C ⇒ B Q C;
- uniqueness of domains UD: [x : A]B Q [x : C]D ⇒ A Q C (and B Q D);
- extended uniqueness of domains EUD: [x : A]B E [x : C]D ⇒ A Q C (and B E D).

Of course, in the presence of type-inclusion (in AUT-QE), only restricted forms of uniqueness of types and of property LQ (see Sec. 1.7) are valid. It depends on the choice of a definition method and on the language defined which of the above properties are basic and which can be derived from these basic ones. Anyhow, SN, βδ-CR and ηδ-CR we know already. The discussion below starts with substitutivity (Sec. 2.9) and ends with βη-CR (Chapter VI) and decidability (Sec. 4, as sketched in 1.2). In between, (ii), (v) and (vi), which turn out to be connected, are considered more or less simultaneously. In fact, first PT, LQ and UD and the property of

(vii) sound applicability SA: (A) [x : B]C correct ⇒ A E B

are proved simultaneously, by a careful induction on degree. Then follows one-step closure CL1, by induction on correctness, and finally CL, by induction on ≥.
1.4. Some points on closure
Apart from the specific role which closure plays in our discussion, it is of course important as a technical property. Compare, e.g., IV.2: the point of the generalization from the correct expressions to the normable expressions lies precisely in the fact that the normable system is "large enough" to prove closure for it in a relatively easy fashion (in contrast with closure for the correct expressions), and small enough to prove (strong) normalization for it, with the help of closure. The normalization properties and CR are nicely preserved under certain forms of taking subsystems. So it is sufficient to prove these properties for some "large" systems: normalization for the normable expressions, βδ- and ηδ-CR for all the expressions, and βηδ-CR under fairly general conditions in Chapter VI. The closure property, however, poses a separate problem for each particular language, because correctness is defined in terms of reduction. Further, we must stick to a particular definition, since in the proof of closure we often apply induction on the definition of correctness. Only after closure has been proved do some important derived rules follow, and can equivalence with the alternative definitions be established. Nevertheless, we try to give a uniform treatment of the various languages here, by splitting up the closure proof into the parts common to all the languages (e.g. substitutivity, CL1 ⇒ CL, etc.) and the part specific to each particular language, i.e. the proof of SA, UD, PT and LQ. The specific part is given quite elaborately for the "worst case", βη-AUT-QE (and its extensions), in Sec. 3.2 and 3.3, and just sketched for the simpler languages, such as βδ-AUT-QE, βη-AUT-68 etc. (Sec. 3.4). In fact, for the simpler languages the specific part simply vanishes, in which case the whole closure proof boils down to the simple closure proofs in [Girard 72] and [Martin-Löf 75a].

1.5. Summary
Section 2 starts with a list of inductive clauses for establishing correctness of expressions and of E- and Q-formulas, relative to a correct book and context. E-definitions for particular languages are specified by indicating (1) a reduction relation (β-reduction, with or without δ and η), (2) possible degree restrictions, (3) a particular set of rules from the list.
In order to avoid confusion we restrict ourselves here to the regular languages (i.e. degrees only 1, 2 and 3), from β-AUT-68 to βηδ-AUT-QE+. Then we prove some simple properties (renaming of contexts, substitutivity, correctness of categories) and give a short discussion of some of the rules. Section 3 deals with the actual proof of closure and the connected properties (i.e. (ii), (v), (vi) and (vii) above) for the whole range of regular languages, as far as these properties are valid (in view of type-inclusion). First, heuristic considerations (Sec. 3.1) point out how the properties are connected, and how the proof might be organized in the more complicated cases (such as βη-AUT-QE). Secondly, the proof is actually carried out for βη-AUT-QE (Sec. 3.2). After that, via an unessential extension result, all the properties are transferred to βηδ-AUT-QE+ (Sec. 3.3). Finally, it is shown that for all the simpler languages (βη-AUT-68, βδ-AUT-QE(+), etc.) easier proofs can be given, which use the more liberal E-definition II (see 1.2) instead of I as a starting point (Sec. 3.4). We claim that the restriction to degrees 1, 2 and 3 in the closure proof of βη-AUT-QE is not essential, and that this proof can easily be adapted for Λ(+), using the results on norm-degree-correctness in VII.2.2. Section 4 contains the details of the equivalence proof sketched in 1.2 above. First it is shown how, essentially, the verification of correctness can be reduced to the verification of equality. typ-functions for the various languages are discussed. Then we present the algorithmic system (like system III above) and an "intermediate" system (like system II). However, the situation is more complicated than sketched above, because the equivalence proofs in 4.3.2 and 4.3.3 are also used for proving the so-called strengthening rule superfluous (see below). Finally some remarks on the actual verification are made (Sec. 4.4).
1.6. Complication 1: the strengthening rule

Of course, if an expression or a formula is correct relative to a book and a context, its constants are in the book and its free variables are in the context. The strengthening rule is connected with the converse question. In systems such as I, II above, which have rules for the transitivity of Q, it is a priori not clear that a correct equality A Q B can be established via expressions containing only variables and constants occurring in A or in B. So it might be possible that a proof of correctness of A, or of A E B, needs correctness of expressions containing variables and constants outside A (and B). Now for the sake of proving η one-step closure we have included a postulate, the strengthening rule, in our definition, which allows one to drop "redundant" variables from the context. This appears to be a nasty rule because it might spoil the nice order on the correct expressions induced by the definition of correctness. See, e.g., Sec. 2.10.3 and 2.14.1.
The proof that the rule is superfluous runs roughly as follows: let ⊢I, ⊢II and ⊢III stand for the correctness predicate in system I (as in 1.2, with strengthening rule), system II (as in 1.2, without strengthening rule), and the algorithmic system III (without strengthening rule), respectively. As in 1.2, ⊢III ⇒ ⊢II (Sec. 4.3.2). By CL for system I (Sec. 3), we have ⊢II ⇒ ⊢I. Since in the algorithmic definition strengthening is provable (as in [Nederpelt 73 (C.3)]), by CR (for I, so for II, so for III, in Chapter VI) we can conclude ⊢I ⇒ ⊢III, which closes the circle (Sec. 4.3.3).
1.7. Complication 2: definitional 2-constants in the presence of type-inclusion

The rule of type-inclusion in AUT-QE allows us to infer A E τ from A E [x : α]τ. This shows how uniqueness of types gets lost in AUT-QE (but only for 2-expressions A). For the restricted form which we can prove instead we refer to Sec. 3.2.6.1. A peculiarity, due to the combination of definitional 2-constants and type-inclusion, is that rule LQ is violated too in AUT-QE. Example: if α E τ, A E [x : α]τ (relative to the empty context, say), then the scheme

d := A E τ (also with empty context)

is correct in AUT-QE. Now d Q A, A E [x : α]τ but not d E [x : α]τ. So, in AUT-QE, definitional 2-constants are not only used as abbreviations but also for cutting down the type of the expressions abbreviated. As a consequence of this, definitional 2-constants in AUT-QE can lead to unessential extensions which are not definitional extensions (Sec. 3.3.2). One might wonder why we do not take a more liberal variant of AUT-QE, which allows d E [x : α]τ as well. In fact, we mention such a variant AUT-QE* somewhere for technical reasons (Sec. 3.3.11), but we do not think that this way of ignoring the typ of a definitional constant is suitable for practical purposes. Part of our motivation runs as follows: we do not want it for definitional 3-constants, where the definition part can stand for a long proof, and the typ represents a short theorem (1.5.2 in [A.6]). So, we do not like it for 2-constants either, for the sake of uniformity.
V.2. On the E-definition

2.1. The book-and-context part of the E-definition

2.1.1. The correctness of books, contexts and expressions is defined simultaneously with the correctness of E-formulas A E B and Q-formulas A Q B. [See e.g. [van Daalen 73 (A.3)].] The symbol ⊢ stands for correctness; the notation for the correctness of contexts (w.r.t. B), of expressions, and of E- and Q-formulas (w.r.t. B and ξ) is respectively

B; ξ ⊢, B; ξ ⊢ A, B; ξ ⊢ A E B and B; ξ ⊢ A Q B.

The symbols E and Q are assumed to bind tighter than ⊢.
2.1.2. For brevity we sometimes write "B; ξ ⊢ A E/Q B" instead of "B; ξ ⊢ A E B respectively B; ξ ⊢ A Q B", and "B; ξ ⊢ A (E/Q B)" instead of "B; ξ ⊢ A respectively B; ξ ⊢ A E/Q B". We recall:

(1) (inhabitable degree condition) an expression α can only act as the typ of a constant in a scheme, or as the typ of a variable in a context, if its degree is 1 or 2;
(2) (compatibility of def and typ) in a scheme ξ * d(x̄) := A E Γ it is required that B; ξ ⊢ A E Γ, where B is the preceding book.

2.2. Some notational conventions

2.2.1. We often assume implicitly a fixed correct book B and a fixed context ξ, correct w.r.t. B. I.e., if B; ξ, η ⊢ then we write

η ⊢ A (E/Q B) for B; ξ, η ⊢ A (E/Q B), and just A E/Q B for B; ξ ⊢ A E/Q B

(so for formulas we omit the ⊢-symbol in this case).

2.2.2. At some places in the definition the degree of expressions is explicitly displayed as a superscript:

⊢ⁱ A (E/Q B) :⇔ ⊢ A (E/Q B) and degree(A) = i.

2.2.3. Formulas like A₁ E A₂ Q A₃ E A₄ are used as abbreviations for A₁ E A₂, A₂ Q A₃ and A₃ E A₄, etc.
2.3. The expression-and-formula part of the definition: expressions

The rules for the correctness of expressions and formulas fall apart into six groups, labeled I to VI. We start with group I (correctness of 1-expressions) and group II (correctness of non-1-expressions).
I. Correctness of 1-expressions.
I.1. τ-rule: ⊢¹ τ.
I.2. Abstraction rule: ⊢² α, x E α ⊢¹ A ⇒ ⊢¹ [x : α]A.
I.3. Application rule: A E α, ⊢¹ B Q [x : α]C ⇒ ⊢¹ (A) B.
I.4. Instantiation rule: if the scheme of d is in B, with context ȳ E ᾱ, and d is a 1-constant, then B̄ E ᾱ[ȳ/B̄] ⇒ ⊢¹ d(B̄).

Notice that the degree of A is indeed 1 if ⊢¹ A is derived by the above rules.

II. Correctness of non-1-expressions.
II. A E B ⇒ ⊢ A.
2.4. The expression-and-formula part: E-formulas

The rules of group III, below, in combination with rule II, also serve as the formation rules for the non-1-expressions. Group IV contains the type modification rules.

III. Formation of non-1-expressions.
III.1. Copy rule: ξ = ..., x E α, ... ⇒ x E α.
III.2. Abstraction rules: if ⊢² α then
III.2.A. x E α ⊢ B E τ ⇒ ⊢ [x : α]B E τ.
III.2.Bⁱ. x E α ⊢ⁱ⁺¹ B E C ⇒ [x : α]B E [x : α]C.
Of the latter there are two versions, III.2.B¹ and III.2.B².
III.3. Application rules: if A E α then
III.3.A. B E [x : α]C ⇒ (A) B E C[x/A].
III.3.B. B E C E [x : α]D ⇒ (A) B E (A) C.
III.4. Instantiation rule: if the scheme of c is in B, with context ȳ E ᾱ, then

B̄ E ᾱ[ȳ/B̄] ⇒ c(B̄) E typ(c)[ȳ/B̄].

Note: Below we shall prove A E B ⇒ ⊢ B (correctness of categories), which is not explicitly required here.
IV. Type modification rules.
IV.1. Type conversion: B E C, C Q D ⇒ B E D.
IV.2. Type-inclusion: B E [x̄ : ᾱ][ū : β̄]τ ⇒ B E [x̄ : ᾱ]τ (where [x̄ : ᾱ] stands for [x₁ : α₁] ... [xₖ : αₖ]).

2.5. The expression-and-formula part: Q-formulas

The rules for the correctness of Q-formulas form group V.
V. Correctness of Q-formulas.
V.1. Reflexivity: ⊢ A ⇒ A Q A.
V.2. Q-propagation: A Q B, ⊢ C, (B > C or C > B) ⇒ A Q C.
Note: This is indeed the most restricted version of Q; see Sec. 1.2.

2.6. The strengthening rule

This is a technical rule, which we use in the proof of η-CL, but which afterwards, i.e. after having proved CL and (with the help of CL) CR, we prove superfluous, as in Sec. 1.6. It is called the strengthening rule because it permits the removal of assumptions from the context. We say that η is a subcontext of ξ, for short η sub ξ, if the sequence of E-formulas of η is a subsequence of the sequence of E-formulas of ξ. So,

η sub ξ ⇒ η sub (ξ, x E α) and (η, x E α) sub (ξ, x E α).

VI. The strengthening rules (VI.1 for expressions, VI.2 for E-formulas). If B; ξ ⊢ A (E/Q B), ξ₀ sub ξ, B; ξ₀ ⊢, and the variables free in A (and B) occur in ξ₀, then B; ξ₀ ⊢ A (E/Q B).
2.7.2. The degree specifications for the regular languages AUT-68, AUT-QE and AUT-QE+ are:
(1) degrees admitted 1, 2 and 3, inhabitable degrees 1 and 2, domain degree 2 and argument degree 3;
(2) value and function degrees as in the following scheme:

                  AUT-68   AUT-QE   AUT-QE+
function degree     3       2,3      1,2,3
value degree       2,3     1,2,3     1,2,3
Languages where all value degrees are also function degrees are said to be +-languages: AUT-QE+ (and AUT-68+, AUT-QE*, to be defined later). Consequently AUT-68 and AUT-QE are non-+-languages.
2.7.3. No matter what rules are chosen, by induction on ⊢ (i.e. on the definition of correctness) it follows that

A E B ⇒ A not of degree 1.

So no application expressions (C) D with degree(C) = 1, and no instantiation expressions c(C̄) where some Cⱼ has degree 1, are formed, and the rules III.4 and III.3.A do not give rise to substitution with 1-expressions (in the categories). Hence, also by induction on ⊢,

(1) A Q B ⇒ (degree(A) = 1 ⇔ degree(B) = 1),
(2) A E B ⇒ (degree(A) = 2 ⇔ degree(B) = 1).
2.7.4. This shows, together with the explicit degree restriction in the rules I.2 and III.2, that the expressions formed and the substitutions involved are weakly degree correct (cf. [van Daalen 80, Ch. IV.4.4.2]). The inhabitable degree restriction guarantees that only expressions of degrees 1, 2 and 3 are formed. So, the specifications of 2.7.2 (1) are fulfilled and

A E B ⇒ degree(A) = degree(B) + 1,
A Q B ⇒ degree(A) = degree(B),

and all the substitutions generated by the rules are degree correct: if Ā is substituted for x̄ then, for all i, degree(Aᵢ) = degree(xᵢ).
2.8. Specification of the languages

2.8.1. The rules

The difference between the definitions of the various regular languages only concerns the rules of abstraction, application and type-inclusion. All the other rules, and also III.2.B² (for abstraction expressions of degree 3) and III.3.A (application), are present in each of the definitions. For the rest the situation is as follows:

                  AUT-68    AUT-QE         AUT-QE+
abstraction       III.2.A   III.2.B¹, I.2  III.2.B¹, I.2
application       -         III.3.B        III.3.B, I.3
type-inclusion    -         IV.2           IV.2

Note: Below it will turn out that (1) III.2.A is a derived rule of AUT-QE and AUT-QE+; (2) III.3.B, I.3 and IV.2 (type-inclusion) are (trivially) derived rules of AUT-68.
So, after all, in AUT-68 all the rules except III.2.B¹ and I.2 are valid; AUT-QE and AUT-QE+ have additionally III.2.B¹ and I.2 and, besides, AUT-QE+ has I.3.

2.8.2. The reduction relation

For definiteness we agree that > in the Q-rule V.2 stands for disjoint one-step reduction >₁ (i.e. contract several disjoint redices in one go). So it satisfies the monotonicity conditions, e.g.

A > A', B > B' ⇒ (A) B > (A') B',

with the important consequence that

Ā > Ā' ⇒ B[Ā] > B[Ā'].
In any case the reduction relation includes β-reduction, but we leave open the presence of η- and δ-reduction. Of course, if no definitional constants are in the book then there is no δ-reduction. We assume that AUT-68 has no definitional 1-constants (because, modulo the elimination of abbreviations, the only 1-expression in AUT-68 is τ). The rules of strengthening will only be present in languages with η-reduction.

2.9. The substitution theorem

2.9.1. For the E-definition (in contrast with the algorithmic definition) it is
easy to show substitutivity: correctness of expressions and formulas is preserved under correct substitutions, i.e. substitution with correct expressions of the right types. For technical reasons we start with a weak form of substitution, compare α-reduction.

2.9.2. Theorem (renaming of contexts): If ξ = x̄ E ᾱ, the variables x̄' are mutually distinct, and A' = A[x̄/x̄'], B' = B[x̄/x̄'], ᾱ' = ᾱ[x̄/x̄'], then (with ξ' := x̄' E ᾱ')

ξ ⊢ A (E/Q B) ⇒ ξ' ⊢ A' (E/Q B')
and the correctness proofs of both sides of the implication sign are equally long. Proof: induction on ⊢. □

2.9.3. An easy corollary of this is the weakening theorem, the converse of strengthening: if ξ₀ sub ξ then

ξ ⊢, ξ₀ ⊢ A (E/Q B) ⇒ ξ ⊢ A (E/Q B).

Proof: induction on ⊢. □
As a corollary of this we can prove that, in a derivation of correctness, the application of strengthening can be postponed to the end of the derivation.
2.9.4. Now we come to the simultaneous substitution theorem: if η = ȳ E β̄ and Ā E β̄[ȳ/Ā], then

η ⊢ C (E/Q D) ⇒ ⊢ C[ȳ/Ā] (E/Q D[ȳ/Ā]).
Proof. By induction on η ⊢ C (E/Q D). We treat just some of the cases, distinguished according to the last rule applied in the derivation. Abbreviate C[ȳ/Ā] to C', etc.

Last rule is III.2.Bⁱ: Assume η ⊢² C₁ and η, x E C₁ ⊢ⁱ⁺¹ C₂ E D₂. By the ind. hyp. and by 2.7.4, ⊢² C₁'. By the copy rule x E C₁' ⊢ x E C₁' (if necessary, i.e. if x occurs in ξ, rename the implicit context to ξ'). Now, by weakening, we can apply the ind. hyp. with the extended substitution [ȳ, x/Ā, x] to η, x E C₁ ⊢ⁱ⁺¹ C₂ E D₂. This gives x E C₁' ⊢ⁱ⁺¹ C₂' E D₂' and, by III.2.Bⁱ, ⊢ [x : C₁']C₂' E [x : C₁']D₂', q.e.d. Possibly one must first rename ξ' back to ξ again.

Last rule is V.2: Assume η ⊢ C₁ Q C₂, η ⊢ C₃, C₂ > C₃. By the ind. hyp. ⊢ C₁' Q C₂' and ⊢ C₃'. Since C₂' > C₃', ⊢ C₁' Q C₃', q.e.d. □
2.9.5. Corollary (single substitution theorem):

A E α, x E α ⊢ B (E/Q C) ⇒ ⊢ B[x/A] (E/Q C[x/A]). □
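As an illustration of what this substitution machinery must take care of, here is a sketch (hypothetical toy syntax, not the definition above) of the single substitution B[x/A] of 2.9.5, using the renaming of 2.9.2 to avoid capturing free variables of A:

```python
import itertools

# Terms: ('var', x), ('abs', x, dom, body) for [x : dom]body,
# and ('app', a, f) for (a) f.
_fresh = itertools.count()

def free_vars(t):
    if t[0] == 'var':
        return {t[1]}
    if t[0] == 'abs':
        _, x, dom, body = t
        return free_vars(dom) | (free_vars(body) - {x})
    _, a, f = t
    return free_vars(a) | free_vars(f)

def subst(t, x, a):
    """Compute t[x/a]; a bound variable is renamed first when it would
    capture a free variable of a (the renaming of 2.9.2)."""
    if t[0] == 'var':
        return a if t[1] == x else t
    if t[0] == 'app':
        _, b, f = t
        return ('app', subst(b, x, a), subst(f, x, a))
    _, y, dom, body = t
    dom = subst(dom, x, a)          # the domain is not in the scope of y
    if y == x:                      # x is shadowed inside the body
        return ('abs', y, dom, body)
    if y in free_vars(a):           # rename y to a fresh variable first
        y2 = y + '_' + str(next(_fresh))
        body, y = subst(body, y, ('var', y2)), y2
    return ('abs', y, dom, subst(body, x, a))
```

For example, substituting y for x in [y : o]x must rename the bound y, so that the substituted y stays free in the result.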
2.10. Some easy properties

2.10.1. On abstraction

In addition to the remark in 2.3, after rule I.4, we can say that the last inference in a proof of ⊢¹ A must be rule VI.1 or one of the rules I. In particular, if ξ ⊢¹ [x : α]A, this can only follow from ξ₀, x E α ⊢¹ A for some ξ₀ with ξ₀ sub ξ (since sub is transitive). So application of VI.1 gives

ξ ⊢ [x : α]A ⇒ ξ, x E α ⊢ A.

2.10.2. Correctness of categories

In the rules of the definition having A E B as their consequence, it is not explicitly required that ⊢ B. For the copy rule this correctness of categories follows from weakening, for III.2.A from the τ-rule, for III.3.A from the single substitution theorem (use induction on ⊢), for III.4 from the simultaneous substitution theorem, etc. So, we have correctness of categories:

A E B ⇒ ⊢ B.
2.10.3. Abstraction again

Assume that ξ₀, x E α ⊢ⁱ A, with A of value degree and degree(α) = 2. If i = 1 then from I.2 we infer ξ₀ ⊢ [x : α]A. If i > 1 then, as above, we can retrace some ξ₁, x E α, ξ₂ ⊢ⁱ A E B, where the transition from ξ₁, x E α, ξ₂ ⊢ A to ξ₀, x E α ⊢ A follows from applications of strengthening. By the weakening theorem we can extend the context to ξ₁, x E α, ξ₂, x' E α, with some new x'. By the substitution theorem we can infer ξ₁, x E α, ξ₂, x' E α ⊢ A[x/x'] E B[x/x']. In case we can apply III.2.B (this depends on the language under consideration) we get ξ₁, x E α, ξ₂ ⊢ [x : α]A E [x : α]B. Otherwise the language is AUT-68, i = 2, B = τ, and application of III.2.A gives ξ₁, x E α, ξ₂ ⊢ [x : α]A E τ. Anyhow, rule II and iterated use of strengthening give ξ₀ ⊢ [x : α]A. Resuming:

(degree(α) = 2, A of value degree, x E α ⊢ A) ⇔ ⊢ [x : α]A.

Note: The results in 2.9 and 2.10 are also valid, and simpler to prove, if η-reduction (and strengthening) is not present.
2.11. On the Q-rules

2.11.1. Clearly Q is the equivalence relation generated by >⁺, i.e. by the restriction of > to the correct expressions. So A Q B means precisely that ⊢ A, ⊢ B, and there are correct C₁, ..., Cₖ such that in the chain

A, C₁, ..., Cₖ, B

each expression is related to its successor by > or by < (where possibly, in view of strengthening, the Cᵢ in between are correct w.r.t. extended contexts).
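On a finite stock of expressions this characterization is directly computable. The sketch below (hypothetical names) joins the endpoints of every one-step reduction whose both sides are correct; the resulting partition is exactly the Q of system I:

```python
def q_classes(correct, steps):
    """Equivalence classes of the relation generated by >+, i.e. by the
    one-step reduction `steps` restricted to the `correct` expressions."""
    parent = {t: t for t in correct}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]   # path halving
            t = parent[t]
        return t

    for a, b in steps:                       # each pair (a, b) records a > b
        if a in parent and b in parent:      # keep only steps between correct terms
            parent[find(a)] = find(b)
    return {t: find(t) for t in correct}
```

So A Q B holds iff A and B land in the same class; a step through an incorrect expression contributes nothing, which is why the zigzag must pass through correct Cᵢ only.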
2.11.2. An alternative rule of Q-propagation is

V.2'. A Q B, ⊢ C, B ↓ C ⇒ A Q C.

If the language definition has this rule, Q becomes (↓⁺)* (Sec. 1.2), i.e. the transitive closure of the restriction of ↓ to the correct expressions. So, no matter what other rules there are in the definition of correctness,

V.2' ⇒ V.2, and CL, V.2 ⇒ V.2'.
2.11.3. An even stronger rule for Q, also including reflexivity, is

V.2''. ⊢ A, ⊢ B, A = B ⇒ A Q B [where = is the (unrestricted) equivalence relation generated by >].

Assuming the (full) CR-theorem, i.e. CR for all, not just the correct, expressions (which is the case if η-reduction is not present), we get

(V.1, V.2') ⇔ V.2''.
2.12. On type-conversion

2.12.1. The Q-formulas (and the Q-rules, see below) can be avoided completely by reformulating IV.1, the type-conversion rule, to

IV.1'. A E B, ⊢ C, (B > C or C > B) ⇒ A E C.

And, corresponding to V.2' rather than to V.2,

IV.1''. A E B, ⊢ C, B ↓ C ⇒ A E C.

As in 2.11.2, IV.1'' ⇒ IV.1' and CL, IV.1' ⇒ IV.1''. Corresponding to V.2'' is the alternative rule

IV.1'''. A E B, B = C, ⊢ C ⇒ A E C [where = is the (unrestricted) equivalence relation generated by >].
2.12.2. The system with Q-formulas, Q-rules V.1 and V.2, and rule IV.1 is indeed a conservative extension of the system without Q but with the corresponding type-conversion rule instead. First we have

IV.1, V.1, V.2 ⇒ IV.1', respectively IV.1, V.1, V.2' ⇒ IV.1'', respectively IV.1, V.1, V.2'' ⇒ IV.1''',

so the Q-system is an extension of the Q-less one. Secondly, the expressions and E-formulas correct in a Q-system are also correct in the corresponding Q-less system.
2.12.3. Notice that in the presence of η, rule IV.1''' (so rule V.2'' too!) is inconsistent, in the sense that it gives rise to anomalies such as self-application. This fact is connected with the failure of the Church-Rosser property for βη-reduction on arbitrary (not necessarily correct) expressions.

Example: If α E τ then ⊢ [x : α]α and ⊢ [y : [x : α]α]α. Further [x : α]α = (by β) [x : α](x)[y : [x : α]α]α = (by η) [y : [x : α]α]α. So, if f E [x : α]α then (f) f E α.
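The two conversion steps of this example can be checked mechanically. The sketch below (toy syntax, hypothetical helper names) performs the outside β-step and the outside η-step on the middle term and confirms that they yield the two distinct abstraction expressions:

```python
# Terms: ('const', c), ('var', x), ('abs', x, dom, body) for [x : dom]body,
# and ('app', a, f) for (a) f.

def free_vars(t):
    if t[0] == 'const':
        return set()
    if t[0] == 'var':
        return {t[1]}
    if t[0] == 'abs':
        _, x, dom, body = t
        return free_vars(dom) | (free_vars(body) - {x})
    _, a, f = t
    return free_vars(a) | free_vars(f)

def subst(t, x, a):          # naive substitution, enough for this closed example
    if t[0] == 'const':
        return t
    if t[0] == 'var':
        return a if t[1] == x else t
    if t[0] == 'abs':
        _, y, dom, body = t
        return ('abs', y, subst(dom, x, a), body if y == x else subst(body, x, a))
    _, b, f = t
    return ('app', subst(b, x, a), subst(f, x, a))

def beta(t):
    """Outside beta-step on (A)[x:d]B, i.e. ('app', A, ('abs', x, d, B))."""
    _, a, lam = t
    _, x, _, body = lam
    return subst(body, x, a)

def eta(t):
    """Outside eta-step on [x:d](x)F with x not free in F."""
    _, x, _, body = t
    _, arg, f = body
    assert arg == ('var', x) and x not in free_vars(f)
    return f

alpha = ('const', 'alpha')
inner = ('abs', 'y', ('abs', 'x', alpha, alpha), alpha)    # [y:[x:a]a]a
mid = ('abs', 'x', alpha, ('app', ('var', 'x'), inner))    # [x:a](x)[y:[x:a]a]a

left = ('abs', 'x', alpha, beta(('app', ('var', 'x'), inner)))  # [x:a]a
right = eta(mid)                                                # [y:[x:a]a]a
```

So the middle term converts (by one β-expansion and one η-reduction) two expressions of different shape, which is the anomaly the text points at.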
2.13. On type-inclusion

2.13.1. Iterated use of the rule of type-inclusion gives

A E [x̄ : ᾱ][ȳ : β̄]τ ⇒ A E [x̄ : ᾱ]τ, so in particular A E [x̄ : ᾱ]τ ⇒ A E τ.

This shows that AUT-68 is a sublanguage of AUT-QE: all the correct books, contexts, expressions and formulas of AUT-68 are also correct in AUT-QE.

Proof. Rule III.2.A, not in the definition of AUT-QE, can be derived from III.2.B¹ and IV.2. For, let x E α ⊢ B E τ. Then ⊢ [x : α]B E [x : α]τ, so ⊢ [x : α]B E τ, q.e.d. □
2.13.2. Conversely, rule IV.2 is (vacuously) a derived rule of AUT-68, because all the correct AUT-68 1-expressions δ-reduce to τ.

2.14. The form of derivations

2.14.1. We called the rules III the formation rules of non-1-expressions. This is because, in a proof of ⊢ⁱ⁺¹ A, we can retrace some ξ ⊢ A E B and ξ₁ ⊢ A E C, such that
(i) the last rule applied in proving ξ₁ ⊢ A E C is the formation rule of A, i.e. one of the rules III,
(ii) the transition from ξ₁ ⊢ A E C to ξ ⊢ A E B is by iterated use of VI.2 and type conversion,
(iii) the transition from ξ ⊢ A E B to ξ₀ ⊢ A is by using VI.2, II, and VI.1.

So, in case there is no type-inclusion applied, e.g. if i > 1, we have (use weakening) ξ₁ ⊢ B Q C. Below we introduce a symbol covering the relation between B and C in case type-inclusion is involved.
2.14.2. The new relation ⊂ can be defined as follows:

(i) ⊢ [x : α]A, x E α ⊢ A ⊂ B ⇒ [x : α]A ⊂ [x : α]B;
(ii) A Q B ⇒ A ⊂ B;
(iii) ⊂ is transitive;
(iv) ⊢¹ α ⇒ α ⊂ τ.
Clearly, ⊂ is a reflexive and transitive relation on the correct expressions, including Q and type-inclusion, which on the non-1-expressions coincides with Q (use 2.10.3). The type modification rules can now be contracted to one rule:

IV. A E B, B ⊂ C ⇒ A E C.

And, for ξ₁, B and C as in 2.14.1, we now have ξ₁ ⊢ C ⊂ B.

2.14.3. So, in a proof of ⊢ [x : α]B E D we can retrace x E α ⊢ B E C with [x : α]C ⊂ D. Similarly, in a proof of (A) B E D we can retrace either (i) B E [x : α]C with C[A] ⊂ D, A E α, or (ii) B E C E [x : α]E with (A) C ⊂ D, A E α. And, in a proof of c(C̄) E D we can retrace some

c(C̄) E typ(c)[C̄] ⊂ D.
2.14.4. Above, we used already

⊢ [x : α]A, x E α ⊢ A Q B ⇒ [x : α]A Q [x : α]B.

The other monotonicity rule

α Q β, ⊢ [x : α]A ⇒ [x : α]A Q [x : β]A

follows by induction on Q, using the substitution theorem. However, we do not know yet

A Q B, C Q D ⇒ (A) C Q (B) D,

and consequently it is a priori not clear that (uniqueness of types for 3-expressions)

⊢³ A E α, A E β ⇒ α Q β.

This (and its weaker counterpart for 2-expressions) will not be proved before the next section (3.2.4, 3.2.6).

2.15. On the application rules

2.15.1. In AUT-68, where no 1-abstraction expressions are formed, the rule III.3.B is vacuously a derived rule, viz. there are no B with B E C E [x : α]D.
Since, in AUT-68 and AUT-QE+,

⊢² [x : α]C ⇒ [x : α]C E [x : α]D (for some D),

we can restrict the rule III.3.A,

A E α, B E [x : α]C ⇒ (A) B E C[A],

to the case where degree(C) = 1.
2.15.2. As an alternative to III.3.B (and to III.3.A if I.3 is present) we mention

III.3.B'. ⊢ (A) C, B E C ⇒ (A) B E (A) C.

The following equivalences hold:

(I.3, III.3.A, III.3.B) ⇔ (I.3, III.3.B'), (III.3.A, III.3.B) ⇔ (III.3.A, III.3.B').

Proof. E.g. that III.3.A is a derived rule in the presence of I.3 and III.3.B'. Let A E α, B E [x : α]C. By I.3 (and III.3.B', if degree(C) = 2), ⊢ (A) [x : α]C. By the single substitution theorem ⊢ C[A]. So by III.3.B' and type-conversion (A) B E C[A]. □
2.15.3. Notice that in the presence of η-reduction rule III.3.A by itself is sufficient, because

η, III.3.A ⇒ III.3.B.

Proof. Assume A E α, B E C E [x : α]D. Then x E α ⊢ x E α, so by III.3.A, x E α ⊢ (x) C E D, and by abstraction ⊢ [x : α](x) C E [x : α]D. By II and type-conversion B E [x : α](x) C (x ∉ FV(C)), so by III.3.A (A) B E (A) C, q.e.d. □
2.16. An E-definition for Λ and Λ+

2.16.1. In order to adapt the E-definition to Λ and Λ+ we must first drop the inhabitable degree condition, and the restriction to α of degree 2 in the abstraction rules I.2 and III.2. The rule of type-inclusion and rule III.2.A must be skipped, but III.2.Bⁱ is permitted for all i. A suitable combination of application rules is I.3 and III.3.B' for Λ+, and III.3.A and III.3.B' for Λ. An alternative for III.3.B' is an extended form of III.3.B:

A E α, B E C₁ E ... E Cₖ E [x : α]D ⇒ (A) B E (A) C₁.
2.16.2. Degree considerations for Λ and Λ+ are indeed more involved than those in 2.7. Of course we can show weak degree correctness, as in 2.7, but we must know more in order to establish degree correctness. See Ch. VII, Sec. 2.2. The various properties proved above, such as substitutivity, correctness of categories, etc., simply go through for the E-versions of Λ and Λ+.
V.3. The actual closure proof

3.1. Heuristics

3.1.1. The first idea which comes to mind about proving closure CL,

CL. ⊢ A, A ≥ B ⇒ ⊢ B,

is simply to prove one-step closure CL1,

CL1. ⊢ A, A > B ⇒ ⊢ B,
by induction on ⊢ A and then use induction on ≥. Among the possible ways of one-step reduction we distinguish the main or "outside" reductions

(β) (A) [x : B]C > C[A],
(η) [x : α](x) A > A, x ∉ FV(A),
(δ) d(Ā) > def(d)[Ā],

and the "inside" reductions, which follow by the monotonicity rules:

(appl) A > A', B > B' ⇒ (A) B > (A') B',
(abstr) α > α', A > A' ⇒ [x : α]A > [x : α']A',
(const) Ā > Ā' ⇒ c(Ā) > c(Ā').
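For β alone, this outside/inside case split can be sketched in Python as an enumeration of single-redex one-step reducts (toy syntax, hypothetical names; the disjoint relation >₁ of 2.8.2 would contract several of these redices at once):

```python
# Terms: ('var', x), ('const', c), ('abs', x, dom, body), ('app', a, f) for (a) f.

def subst(t, x, a):                 # naive substitution, enough for the demo
    if t[0] == 'const':
        return t
    if t[0] == 'var':
        return a if t[1] == x else t
    if t[0] == 'abs':
        _, y, dom, body = t
        return ('abs', y, subst(dom, x, a), body if y == x else subst(body, x, a))
    _, b, f = t
    return ('app', subst(b, x, a), subst(f, x, a))

def one_step(t):
    """All single-redex beta one-step reducts: the outside contraction,
    if any, plus the inside steps given by (appl) and (abstr)."""
    out = []
    if t[0] == 'app':
        _, a, f = t
        if f[0] == 'abs':                              # outside: (A)[x:B]C > C[A]
            out.append(subst(f[3], f[1], a))
        out += [('app', a2, f) for a2 in one_step(a)]  # (appl), argument side
        out += [('app', a, f2) for f2 in one_step(f)]  # (appl), function side
    elif t[0] == 'abs':
        _, x, dom, body = t
        out += [('abs', x, d2, body) for d2 in one_step(dom)]   # (abstr)
        out += [('abs', x, dom, b2) for b2 in one_step(body)]
    return out
```

The closure proof follows exactly this case structure: one case per clause of `one_step`, i.e. one per outside reduction and one per monotonicity rule.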
So we assume that > stands for disjoint one-step reduction. Now consider, e.g., the appl-case, where the correctness of (A) B follows from A E α, B E [x : α]C. Here the induction hypothesis, CL1 applied to A and to B, just tells us that ⊢ A' and ⊢ B' (where A > A', B > B'), which is of course not enough to conclude ⊢ (A') B'. This suggests that we need preservation of types PT,

PT. A E α, ⊢ B, A ≥ B ⇒ B E α,

or at least one-step preservation of types PT1,

PT1. A E α, ⊢ B, A > B ⇒ B E α,

additionally. Similarly with the const-case of one-step reduction.

3.1.2. So the next idea is to combine CL and PT to
CLPT    ⊢ A (E α), A ≥ B ⟹ ⊢ B (E α)

(as the conjunction of the version with and the version without parentheses) and to use the same induction. I.e., first prove
The language theory of Automath, Chapter V (C.5)
CLPT1    ⊢ A (E α), A > B ⟹ ⊢ B (E α)
by induction on correctness and then use induction on ≥. This works fine with all the inside reductions. E.g., consider once more the appl-case: A E α, B E [x : α]C, A > A′, B > B′. Now the induction hypothesis gives us A′ E α, B′ E [x : α]C and (A′)B′ E C[A′]. Since > is disjoint one-step reduction, C[A] > C[A′] so C[A] Q C[A′], so (A′)B′ E C[A], q.e.d. The other cases of inside reductions are treated similarly, using some facts from the previous sections. Then the outside reductions: δ and η do not cause major difficulties either. For δ use the simultaneous substitution theorem and the compatibility of def and typ, for η use the strengthening rule. But there is a problem with β-outside reduction. For, in order to conclude ⊢ C[A] from ⊢ (A)[x : B]C, we seem to need soundness of applicability, SA
SA    ⊢ (A)[x : B]C ⟹ A E B
which would allow us to use the single substitution theorem.
3.1.3. Let us try to find out about SA. So consider the assumptions which can lead to the correctness of (A)[x : B]C. E.g. A E α, [x : B]C Q [x : α]D (resp. [x : B]C E [x : α]D). Then SA amounts to uniqueness of domains, UD
UD    [x : B]C Q [x : α]D ⟹ B Q α

resp. extended uniqueness of domains, EUD

EUD   [x : B]C E [x : α]D ⟹ B Q α
or: A E α, [x : B]C E D E [x : α]E (these are the assumptions of rule III.3.B). As in 2.14.3, for some F, [x : B]C E [x : B]F ⊆ D (and in fact [x : B]F Q D). So, in this case SA seems to require the left-hand equality rule LQ
LQ    A E α, A Q B ⟹ B E α

which would give [x : B]F E [x : α]E and, by EUD, A E B. However, LQ ⟹ PT. So it appears that we cannot do SA separately beforehand (i.e. not if III.3.B is present) and then proceed with CLPT as sketched above.
3.1.4. In order to simplify matters, we first forget about type-inclusion. Then we may hope to be able to prove uniqueness of types, UT

UT    A E α, A E β ⟹ α Q β.
If we assume UT then UD ⟹ EUD and, besides, LQ and PT turn out to be equivalent. This may suggest us to incorporate the proof of SA in the proof of CLPT. But we do not have UT yet. If we try to prove UT by induction on the length of A, we come again into trouble with rule III.3.B. For, let A₁ E α, A₂ E B E [x : α]D, A₂ E C E [x : α]E. The ind. hyp. just gives us B Q C here, but we need more, viz. something like
(*)    ⊢ (A)B, B Q C ⟹ (A)B Q (A)C
(this is one half of the third monotonicity formula of Sec. 2.14.4). Since a proof of (*) requires LQ in turn, UT cannot be isolated either. We might try to combine SA, UT and CLPT, i.e. to prove the necessary instances of SA and UT in the course of the proof of CLPT1. A proof along these lines is indeed possible even if type-inclusion is present, but it has a complicated structure and it cannot easily be extended to languages with higher function degrees, such as Λ and Λ+.
3.1.5. Thus we prefer the alternative approach sketched below, which essentially runs as follows: first prove PT1, UT and LQ by induction on degree, then prove SA and UD, and afterwards prove CL as indicated in 3.1.1. To this end we distinguish degree-i-versions of the various properties
PT1ⁱ    ⊢ⁱ A E α, A > B, ⊢ B ⟹ B E α
LQⁱ     ⊢ⁱ A E α, A Q B ⟹ B E α
UTⁱ     ⊢ⁱ A E α, A E β ⟹ α Q β
(*ⁱ)    ⊢ⁱ B Q C, ⊢ (A)B ⟹ (A)B Q (A)C
UDⁱ     ⊢ⁱ [x : α]A Q [x : β]B ⟹ α Q β
SAⁱ     ⊢ⁱ (A)[x : B]C ⟹ A E B.
First notice that: PT1ⁱ, UTⁱ ⟹ LQⁱ, and that: LQⁱ ⟹ (*ⁱ) and LQⁱ ⟹ UTⁱ⁺¹.
We assume that the language under consideration is a non-+-language (see Sec. 2.7). Then it is relatively easy to show UDᵏ and UTᵏ⁺¹ (ignoring type-inclusion), where k is the lowest value degree. Now let us try to prove PT1ⁱ⁺¹ by
induction on correctness, where we assume PT1ʲ, LQʲ and UTʲ⁺¹ for j ≤ i. An instructive example is the appl-case of inside reduction: A > A′, B > B′, ⊢ⁱ⁺¹ (A)B, ⊢ⁱ⁺¹ (A′)B′. It is no restriction to assume that both (A)B and (A′)B′ originate from the extended application rule of 2.16.1: A E α, A′ E α′, B E C₁ E ... E Cₗ E [x : α]D, B′ E C₁′ E ... E Cₗ′′ E [x : α′]D′ with degree(D) = degree(D′) = k and l = l′. Then by the ind. hyp. we have B′ E C₁, so by UTⁱ⁺¹ C₁ Q C₁′ and by LQⁱ C₁′ E C₂. Then follows C₂ Q C₂′ and C₂′ E C₃ etc. Finally we have [x : α]D Q [x : α′]D′ and by UDᵏ α Q α′, so A′ E α. Hence (A′)B′ E (A′)C₁ Q (A)C₁, so (A′)B′ E (A)C₁, q.e.d. From PT1ⁱ⁺¹ and UTⁱ⁺¹ we get LQⁱ⁺¹, (*ⁱ⁺¹) and UTⁱ⁺². So by induction, we get PT1, LQ, (*) and UT.
3.1.6. It is clear that SAⁱ⁺¹ can be distilled from the proof of PT1ⁱ⁺¹, but it can alternatively be given as follows. First, we have LQⁱ⁺¹, UDⁱ ⟹ UDⁱ⁺¹, so we have UD. Now let ⊢ⁱ⁺¹ (A)[x : B]C. Then (see Sec. 2.15.2) either A E α, [x : B]C E [x : α]D, or [x : B]C E E, ⊢ (A)E. Further [x : B]C E [x : B]F. So by UT and UD we have α Q B, or by (*) we have ⊢ (A)[x : B]F. So from LQ, UD and UT we get SAⁱ ⟹ SAⁱ⁺¹ and by induction SA.
3.2. Closure for βη-AUT-QE
3.2.1. For definiteness we present a rather detailed version of our closure proof here for βη-AUT-QE, i.e. AUT-QE without definitional constants and without δ-reduction. So the admitted degrees are 1, 2 and 3, the value degrees are 1, 2 and 3, the domain degree is 2 and the argument degree is 3. The function degrees are just 2 and 3, so βη-AUT-QE is a non-+-language. So the reasoning of Sec. 3.1.5 is valid, but for additional problems due to the presence of type-inclusion (viz. that UT is not true and that not immediately (PT1 ⟹ LQ) and (UD ⟹ EUD)). These problems are overcome by the introduction of a "canonical type" in Sec. 3.2.4 below. This canonical type also plays a role in the β-case of PT1. Later we include definitional constants and δ-reduction, and application expressions of degree 1, thus extending our result to βηδ-AUT-QE+ (in Section 3.3). A closure proof of βη-AUT-68 can easily be imitated from the proof below and is in fact somewhat easier because there is no type-inclusion.
3.2.2. We specify a set of rules (in shorthand, omitting contexts) for βη-AUT-QE, which according to the properties in 2.10-2.15 are equivalent to the rules indicated previously.

⊢ τ
..., x E α, ... ⊢ x (E α)
x E α ⊢ A (E B) ⟹ ⊢ [x : α]A (E [x : α]B)
A E α, ⊢² B E [x : α]C ⟹ ⊢ (A)B (E C[A])
A E α, B E C E [x : α]D ⟹ ⊢ (A)B (E (A)C)
Ā E ᾱ[Ā], x̄ E ᾱ * p(x̄) E β is a scheme ⟹ ⊢ p(Ā) (E β[Ā])
A E B ⊆ C ⟹ A E C
⊢ A, A > B or B > A, ⊢ B ⟹ A Q B    (where > is ≥₁, i.e. disjoint one-step βη-reduction)
A Q B Q C ⟹ A Q C
A Q B ⟹ A ⊆ B
⊢¹ A ⟹ A ⊆ τ
x E α ⊢ A ⊆ B ⟹ [x : α]A ⊆ [x : α]B
A ⊆ B ⊆ C ⟹ A ⊆ C
strengthening.
3.2.3. On 1-expressions and type-inclusion
3.2.3.1. Since there are no 1-application expressions and no definitional constants, all 1-expressions are of the form [x̄ : ᾱ]τ, with x̄ possibly empty. And, if ⊢¹ [x : α]A, ⊢¹ [x : β]B, [x : α]A > [x : β]B, then α > β, A > B, so α Q β and x E α ⊢ A Q B. So, by induction on Q, we can show

UD¹    ⊢¹ [x : α]A Q [x : β]B ⟹ α Q β    (and x E α ⊢ A Q B).

Then, by induction on ⊆, we get

⊢¹ [x : α]A ⊆ [x : β]B ⟹ α Q β    (and x E α ⊢ A ⊆ B).
3.2.3.2. We introduced UTi, uniqueness of types for expressions of degree i (i > 11,
The language theory of Automath, Chapter V (C.5)
UT~
P A E B A,
*
EC
539
BQC.
For i = 3 this will be proved below, but for i = 2 it is simply false in view of type-inclusion. Now we define
B 0 C : e BcCorCcB. Below we shall prove that the new symbol covers the relationship between B and C whenever A E B and A E C. Clearly on the non-1-expressions 0 is just Q. We have k1[z:a]A0[z:P]B
aQP,
(zEakAOB).
Further 0 satisfies a strengthening rule, and is substitutive: AEQ, z E a k B O C
+
B[A]OC[A].
3.2.3.3. We also want to show
FIBoC
@
forsomeA, A c B a n d A c C .
Proof. + is trivial. So let B I1 A C C. Then A = [t: 71 [y’ : [Z : Z]T, h’= [.?: 711 [y’: C = [.?: 721. (or similar with B and interchanged), with 0 “7 Q T~ Q 72”, Y E 7 F Q f i l , l . SO B c c (or c c B ) .
B~]T,
c
B
3.2.4. The canonical type 3.2.4.1. It is possible, for each A with ki+’ A to indicate a n
QO
such that
is a minimal representative - w.r.t. IT - of the categories of A, i.e. A E QO and: (A E Q + QO C a)
(1)
a0
(2)
FV(a0) c FV(A)
.
We call this (YO the c a n t y p of A (with respect t o a context). The definition of c a n t y p is like the definition of t y p in [van Daalen 80, Sec. IV.3.21 [ t y p is like c a n t y p but with rule (iv) for B of all degrees, and without rule (v)], but slightly modified in order t o stay in the correct fragment, as follows:
= tYP(.)
(i)
CantYPb)
(ii)
CantYP(P(4) = tYP(P”1
(iii)
cantyp([z : o ] B )= [z : a ] c a n t y p ( B ) - w.r.t. to extended context
(iv)
cantyp((A) B ) = (A) c a n t y p ( B ) if degree ( B )= 3
(v)
cantyp((A) B ) = C [ A ]if degree ( B )= 2 and c a n t y p ( B ) = [z : Q]C.
-
540
D.T. va,n Daalen
Clearly, typ(A) 2 cantyp(A) so property (2) above is immediate. Now we prove a lemma corresponding to property (1). Lemma. If LQ' and I-'+' A E a then A E cantyp(A) C a. 3.2.4.2.
Proof. By induction on the length of A. The more interesting cases are (i)
A = [z : crl]Al, z E a1 I- A1 E 0 2 , [z : al]a2 c a. By the ind. hyp., z E 01 I- A1 E cantyp(A1) C C Y ~ ,SO [Z : a l ] A E [Z : 011 c a t y p ( A 1 ) cantyp(A) c [z : a1Jcr2I I a,q.e.d.
(ii) A = (Al) A2, A1 E a1, I-2 A2 E [z : al]C, C[A1] C a. By the ind. hyp., A2 E cantyp(A2) c [z : al]C so cantyp(A2) s [z : ai]C'. Hence cantyp(A) is indeed defined, a1 0 a;, z E a1 I- C' C C, so (Al) A2 E C'[Al] C a, q.e.d. (iii) A = (Al) Az, A1 E 01, F3A2 E B E [z : al]C, (Al) B 0 a. By the ind. hyp. A2 E cantyp(A2) Q B. By Lq* we can use property (*') of Sec. 3.1.5 and 0 get cantyp(A) 9 ( A l )B 9 a , q.e.d. 3.2.4.3. Corollary.
(a)
I-'A E B, A E C =+ B O C (this is, for A of degree 2, the desired property of 0 ) .
(ii) k 2 [x : a ] A E [x : P]B =+ a Q p,
2 E
cr I- A E B (this includes EUD2).
(iii) S A ~ . Proof. (i) LQ1 is vacuously fulfilled, so B 7 cantyp(A) c C, so by 3.2.3.3 0 B 0 C. (ii) and (iii) are immediate. 3.2.5.1. Now that we have introduced cantyp we can use it in the proof of PT. We define the property of preservation of cantyp.
PCT'
!-'A,
A 2 A'
,
I- A' =+ cantyp(A)
cantyp(A')
.
Similarly PCT';; PCT is the conjunction of all the PCTi. We first prove some lemmas for PCT2. 3.2.5.2. Lemma (substitution lemma for cantyp): Let B* stand for B[x/A]. Then z E a , $ E /? I-2 C, k3 A E a cantyp(C)* cantyp(C*) where the cantyp's are taken w.r.t. (z E a,$E $) and ($E p't) resp.
=
Proof. Induction on C. Note that C f z, because degree(z) = 3. Some cases are:
The language theory of Automath, Chapter V (C.5)
54 1
= [z : C1]C2, cantyp(C)* = [z : C;]cantyp(Cz)* (w.r.t. z E a , y ’ p, ~ z E c 1 ) E (by ind. hyp.) [ z : cr]cantyp(Cz) (w.r.t. a E p*,z E c;) -a
(i) C
G
cantyp(C*), q.e.d. (ii) C
=
(Cl)C2, cantyp(C)’
=
D[Cl]*= D*[Cr] where cantyp(C2) = cantyp(C,’), so cantyp(C*)
=
[ z : 710 and, by ind. hyp., [ z : 7*]D*
=
D*[C;] as well, q.e.d.
0
3.2.5.3. Corollary. z E a k2 C, k3A E a
*
cantyp(C)[A] = cantyp(C[A]). 0
3.2.5.4. Corollary (p-PCT;): CantYP(C[Al).
k2 (A) [z : B] C
+-
cantyp((A) [z : B] C)
Proof. By SA2 we have A E B, so even cantyp( (A) [z : B]C) G cantyp(C)[A] = 0 c U t Y P(C [A1). 3.2.5.5. Lemma (Q-PCT:): k2[z : a](.) A , z $! FV(A) =+ cantyp([z : a](.) A)
cantyp(A)
.
Proof. Let cantyp(A) = [y : p]D and let k2 [z : a](.) A be based upon z E a’, A E [y : a’]D’. By 3.2.4.2 [y : p]D c [y : &’ID‘ and z $2 FV([y : BID), so a Q a‘ Q p and cantyp(A) = [z : p]D[y/z] Q [z : a]D[y/z] = cantyp([z : a](z)A). 0 3.2.5.6. Theorem. PCT:.
Proof. Let I-2 A, I- A’, A > A’. For a main reduction use 3.2.5.4 or 3.2.5.5. For inside reductions use induction on the length of A. Some cases are:
=
[z: A’,]A;, A1 > A;, A2 > A;. By ind. hyp. (i) A G [z : AlIA2, A’ cantyp([z : A1]A2) Q c m t y p ( [ z : AlIA’,) = [z : Al) cantyp(A;) Q [z : A:] cantyp(Ak), by the substitution property 3.2.5.3.
= (Al) A2, A’ = (A;)
A;, A1 > A‘,, A2 > A;. Since (Al) A2 is correct, A1 E a1,A2 E cantyp(A2) = [z : PIC C [z : a l ] D . So a1 Q p. Similarly A’, E a’,, A; E cantyp(A;) = [z : p’]C’ C [z : a’,]D’. So a{ Q p’. By the 0 ind. hyp. [z : PIC [z : p’]C’, so C[Al] Q C’[Al] Q C’[A’,], q.e.d.
(ii) A
D.T. van Daalen
542
3.2.5.7. Corollary. (i)
PT?,
(ii) Lq2, 0
(iii) U D ~ .
3.2.6.1. By Lq2 we can apply 3.2.4.2 to expressions of degree 3 now. We get: (i)
F3A E a
+
A E cantyp(A) Q
0.
(ii) UT3 : k 3 A E a , A E p + a O p (i.e. a Q of 0 for A of degree 3).
p) (this is the announced property
(iii) SA3 (e.g. as in 3.1.6). Notice that by UT3 the properties PCT3 and PT3 are equivalent.
3.2.6.2. We introduce CLPTZ: FiA(Ea) , A 2 A’
+ FiA’(Ea)
and similarly CLPT!. Here follow some lemmas for CLPT;.
3.2.6.3. Lemma (PLCLPTB): k3(A) [z : B]C E D
+
C[A] E D.
Proof. Let A E a, [z : B] C E F E [z : a ] G , (A) F Q D , and let z E B I- c E H. [z : B]H F . By SA3 we have A E B and by (*’) (A) [z : B] H Q (A) F . By 0 the substitution theorem for correctness C[A] E H[A] Q D.
3.2.6.4. Lemma (vLCLPT7): F3[z : a](.) A E B , z @ FV(A)
+
A E B.
Proof. cantyp([z : 0](z) A) G [z : a](.) cantyp(A) Q cantyp(A) (by 77-re0 duction), by strengthening I- A, so by 3.2.6.1 A E B.
3.2.6.5. Now we are ready for CLPT. Theorem (CLPT1) : I- A ( E ~ )A, > A’
+ I- A ‘ ( E ~ ) .
Proof. If A > A’ is a main reduction use SA, strengthening, PT2 and the preceding two lemmas. Otherwise use induction on the length of A. (i)
A = [z : al]A1, A’ = [z : a:]Ai, a1 > a:, A1 > A;, 2 E a1 I- A 1 ( ~ a 2 ) , ([z: a1102 c a ) . By ind. hyp. I- a: and z E a; I- A ~ ( E ~ Y ~ ) . So I- [z : a’,]A’, (E[z : ai]a2 Q [x : al]a2 C a ) - read this twice, one time with and one time without the symbols in parentheses -.
The language theory of Automath, Chapter V (C.5)
543
(ii) A = (Al)Az, A’ = (A;)Ai, A1 > A;, AZ > A;, Al E a1, A2 E [z : a l ] C , C[A] c a. By ind. hyp. A: E 011, A; E [x : al]C. So A’ E C[A{]Q C[Al].
(iii) As in (ii), but A2 E B E [z : al]C, ( A l )B C a. By ind. hyp. A’, E q , Ah E B , SO A’ E (A:) B Q ( A * )B. (iv) A = p(B1, ...,B k ) , A’ = p(Bi ,...,B i ) , 2 > I?’, B1 E P I , Bz E Pz[B1],..., Bk E Pk[Bl,...,Bk-11, P [ B ]c a, where y ’ E p’ * p(y’) E P is a scheme. By so ind. hyp. B; E P i , Bi E P2[B1]9 P2[B:],...,Bk E P k [g1Q Pk[g’], 0 p ( B i , ...,Bk) E P [ B { ..., , B;] Q P [ g ] .
3.2.6.6. Corollary. (i)
CLPT,
(ii) LQ, (iii) UD.
0
3.2.6.7. Corollary (Rule V . 2 , See. 2.11): F A , F B , A J. B
+
A 9 B.
0
3.3. Extension to Pr&AUT-QE+ 3.3.1. Now we consider P$-AUT-QE+, i.e. Pq-AUT-QE extended with 1application expressions, with definitional constants and with definitional reduction. The additional rules are 1.3
A E ~ F, I B Q [ z : a ] C +I-’(A)B
(vi’)
A E a[A],2 E 3 * d ( 2 )
:= d($) := D(E E ) is a scheme =+
k d(A)(EE [ A ] )
(cf. Sec. 3.2.2 and Sec. 2.3 respectively). If we try to repeat the previously given proof, we first come in trouble because not all the compound 1-expressions are abstraction expressions anymore. This makes the proof of UD1 from Sec. 3.2.3 fail, though the property itself remains valid. Furthermore there is the problem with definitional 2-constants and typeinclusion (mentioned in Sec. 1.7), which makes Lq2 fail. Below we give an indirect proof instead which runs as follows: first we show (Secs. 3.3.3-3.3.8) that the indicated extension is a so-called unessential extension. Then we use this fact to transfer the desired properties from Pq-AUT-QE t o the new system (Sec. 3.3.9). Finally (in Sec. 3.3.11) we briefly discuss an even larger system than AUT-QE+, which we call AUT-QE*.
D.T. van Daalen
544
3.3.2. Some terminology Consider two systems of correct expressions with typing and equality relation, (k, E, Q ) and (k+,E+, Q+) respectively. (F+, E+, Q+) is an extension of (k,E,Q) if t- =% k+, E + E+ and Q =% Q+, i.e.:
B -I resp. B;E
I- resp.
B;[ I- A
(E/Q
B)
+
B k+ resp. B; [ k+ resp. B;E k+ (E+/Q+ B ) . We further just write t-+ A E/Q B instead of k+ A E+/Q+ B. The “new” system k+ is said to be conservative over the “old” system k if all new facts about old objects are old facts, i.e. if UEO
F A , I- B , k + A E/Q B + t - A E/Q B .
An extension is unessential if no “essentially new” objects are formed, i.e. if all new objects are equal t o old ones. This means that the new system can be translated into the old one by a mapping-, working on expressions, books and contexts, such that
+k+AqA-
t-+A
UE2
B I-+ resp. B;[ I-+ resp. B;[ I-+ A + B- k resp. B-; [- I- resp. B-;[- I- A-
UE3
B;[k+ A
E/Q
B
+
and k A
+
UE1
B-;[- F A -
A=A-
E/Q
B- .
Clearly unessential extensions are conservative. Property U E 3 means that new formulas inply their old counterparts. Unessential extensions also satisfying UE3/, the converse of UE3, UE3’
+ + A , F+B, t - A - E / Q B -
+~+AE/QB
are called definitional extensions. In a definitional extension new formulas are equivalent to old ones. All unessential extensions satisfy the Q-part of UE3’, but for the E-part we need property LQ for the larger system (at least if the smaller system satisfies LQ). For that matter, if the +-system satisfies LQ, we have UE1,UEZ
UE3’
and: UEO,UEl,UE2
+ UE3 ,
3.3.3. The translation Of course, we take Pg-AUT-QE for our smaller system I- and we take Pqb-AUT-QE+ as the extension k+. We are going to prove that k+ is a n unessential (but not a definitional) extension.
545
The language theory of Automath, Chapter V (C.5)
For an expression A we intend its translation A- to be the normal form w.r.t. a certain reduction relation In order to make A- well-defined and in view of UE1, UE2 we require
>-.
(0) 2- normalizes and satisfies CR. (1) 2- just affects the new elements of expressions (1-application parts and definitional constants) and removes them.
(2) 2- is part of the reduction relation of the new system and satisfies CLPT.
<
For contexts 5 E r 3 the context E- is simply 5 E 6- (where the meaning of b- is clear). Similarly schemes for primitive constants * p ( 2 ) E /3 are translated into E- * p ( 5 ) E p-. But schemes for definitional constants have to be omitted in the translation. Before fixing 2- we define ij-reduction 2i, i-reduction of degree j (where i is p, 7, 6 or a combination of these). This is the reduction relation generated from elementary ij-reduction, defined as follows: A elementary iJ-reduces to A' if A elementary i-reduces to A' and degree(A) = j. The corresponding one-step reduction is denoted >:. Notice that for degreecorrect A the degree of A' above is j as well (cf. Sec. 2.7). Now, in view of requirement (1) above, we define 2- to be the reduction relation generated from 2; and 2 6 .
<
3.3.4. Notice that pl-reductions cannot be inside reductions. Strong normalization for p' is easy to prove even without using normability. From [van Daalen 80, Ch. 1111 we recall 6-SN and 6-CR. We can show that P'-CR holds, and that p' commutes with all other reductions (such as p2, 6, $) except 77'. (See 11.8.) So 2- commutes with all kinds of reduction but $, and we have >--SN and >--CR (whence requirement (0)above). Clearly >--normal forms d o not contain defined constants anymore; a simremoves the 1-application parts as well. ple normability argument shows that
>-
3.3.5. A further property we want 2.- to satisfy is CLPT. Since 6-CLPT1 follows from the simultaneous substitution theorem (cf. 2.9.4) we just want to know SA1 -I:
( A ) [z : B] C
+ I-+
AEB
or, equivalently, U D ~ [z : B]
c Q [z : a]D + k+ B Q a .
Here turn up the problems with 1-expressions, announced in 3.3.1. To overcome these we seemingly modify our system:
D.T. van Daalen
546 (1) We exclude ql-reduction. (2) We change our 1-application rule into
1.3’
A E a , F I B red- [ z : a ] C
+ k!+(A)B
where red- is 2- restricted to the correct expressions, i.e. generated by
t-+A, F + A ’ , (A
>b
A’orA
>6
A’) + F + A r e d - A ‘ .
Clearly 1.3 1.3’, so the modification is a restriction. However, after having proved >--CLPT (whence UE1, see Sec. 3.3.6), UE2 and UE3 (Sec. 3.3.7) for the modified version, we shall be able to show that both 1.3 and $-equality: t-+ A, A >: A’, F+A’ + F + A A’ are derived rules. Hence the two versions of F+ are equivalent, and we have the desired properties for the original +-system.
3.3.6.1. For the modified system the property SA’ is clear, so we have the theorem (2- -CLPT): F + A (Ea),A 2- A’ + F+A’ (Ea).
>b
Proof. Since we know b-CLPT, and is just [i.e. identity] on the non-lexpressions we only need to consider A of degree 1. Use, e.g., a double induction, viz.
(1) on O-(A) - i.e. the length of the >--reduction tree of A, (2) on length(A). The only interesting case is when A = (Al) AS, A1 E a , A2 red- [z : a]C. If A1 2- A’, then A1 >6 A‘, so by 6 - CLPT A’, E a. If A2 Ah then by the ind. hyp. and by -CR A’, red-[z : a‘]C’, [z : a ] C red-[z : A’IC’. So A: E a’ and t-+ (A’,) A;. If A2 = [z : As]A4 then A1 E A3 (this is SA1) and t-+Aq[Al]. Since a reduction A 2 A’ starts with an inside or with an outside reduction, we are finished by the first ind. hypothesis.
>
>-
3.3.6.2. Corollary (UEI): F + A
+ F+A 9 A-.
0
3.3.7. Theorem (UE2 and UE3): Consider the system without 77’ and with rule I . 9 . Then
B t-+, resp. B ; [ t-+, resp. B;[F+ A ( E / Q B )
+
B- F, resp. B-; [- I-, resp. B-; [- F A - ( E / QB - ) . Proof. By induction on F+, using >--CLPT.
The interesting rules are :
The language theory of Automath, Chapter V ((2.5)
(i)
547
Appl. rule 1.3‘: let E+ A E a , E+ B red- [z : a]C. By ind. hyp. t- A- E a-. Clearly B- = [z : a-]C- and by ind. hyp. E B - , so 2 E a- I- C - , so I- ((A) B ) - = C-[A-], q.e.d.
(ii) Instantiation rule (vi’): let B contain a scheme y’E fi * d(y’) := D (possibly followed by *d(y’) E C). Let B1 be the book preceding this scheme. By ind. hyp. B;;y’Ep’- I- D-(EC-). Now if B ; < E dEfi[g],then by ind. hyp. B-;,$- E B’- E (p’[l?])[k], so B-;
fi-
(iii) Q-rule: let I-+A B, I-+C, B > C. By ind. hyp. E A- B-, E C.Since 2- commutes with all other reductions, except possibly $ which we have forbidden, we find B- 2 C- so by CL for pq-AUT-QE I- B- C- and I- A- C - , q.e.d. The case that C > B instead is completely similar. 0
3.3.8.1. Now we prove that 1.3 is a derived rule in the modified system. So as[z : a-]C-, whence B- must sume I-+A E a , I - i B [z : a]C.By 3.3.7 E’Bp and k+ a p. Further, by 3.3.6.1, I-+B red- Bbe [z : P]Bl with I- aand by 1.3’ E+ (A) B, q.e.d. 3.3.8.2. Similarly, $-equality is a derived rule. Let E+ A, I-+A’, A >: A’. We can assume that degree(A) = 1. By induction on length(A) we prove that k+ A A’. The interesting case is when A = [z : 0111 (z) A’, z $! FV(A’). As in 3.3.8.1, z E 011 I-+ A’ r e d - [ z : a2]Al with z $! FV(a2). By SA1 z E a1 I-+ a1 q a2 and by strengthening I-+a1 9 012. So E+ A q [z : (~1]A1 q [z : az]Al q A‘, q.e.d. 3.3.8.3. Hence the system with 1.3 and 7’-equality is equivalent to the system with 1.3’ and without ql-equality. So we have SA1, >_--CLPT, UE1, UE2 and UE3 for the original system of ,L?r&AUT-QE+ now. 3.3.9. The proof of CLPT 3.3.9.1. As in 3.2.6.5, we can prove CLPTl from outside-CLPT1, by induction on correctness. Clearly 6-CLPT (and a fortiori 6-outside-CLPT1) is included in >--CLPT, so we just need p- and q-outside-CLPT1. In the next section we infer PT3 and SA from our UE-result, which leaves us to prove the p2- and q2-case of outside-PT1 only. These two cases are dealt with in 3.3.9.3.
3.3.9.2. Consider the properties mentioned in 3.1.5. In this section we distinguish the two versions of a property (viz. for the smaller and the larger system)
D.T. van Daalen
548
by providing the latter with a UTg
+ below. It is clear that +
UTft and UTft,PTi
whence UT,:
LQft
.
PT; and LQ$
The property UD is also preserved in passing to the larger system, and in fact, as in 3.2.3.1, k + [ ~a :] A Q [z:P]B
+ k + a Q P,(xE a F + A Q B ) .
By LQ: we have (*;). SA: we knew already. Now we show SA!+ for i # 1: let (A) [I : B]C. Since i # 1, ((A) [z : B]C)- = (A-) [z : B-]C-, so by UE2, I-' (A-) [z : B-]C- and by SA, F A- E B-. Hence by LQ; again, we have SA!+ for i # 1 as well. 3.3.9.3. In Sec. 3.2.5 we used c a n t y p in proving P- and q-outside-PT?. The same procedure applies in the +-system, but with t y p [see 3.2.41 instead of c a n t y p now. In particular we have
ii')
typ(d(/i))
E typ(d)[Af
for defined constants of degree 2 and 3 now, and (i.1
tYP((A) B ) = (A) tYP(B)
for both B of degree 2 and 3.
As in 3.2.4.2 we get I-:A
Ea
+ F + A E typ(A) C a
and, as in 3.2.5.2, k: A E a , (z E
IY
I-' C)
+ typ(C[A]) G typ(C)[A] .
So, as in 3.2.5.4 and 3.2.5.5, we get I-?(A) [z : B1
c*
whence P-outside-PT:,,,
and
k i [z : a I ( 4 A ,
whence q-outside-PT:,,
tyP((A) [z : B1 c)Q tyP(C[Al)
@ FV(A)
* ~ Y P ( [ Z: a I ( 4 A) 9 typ(A)
.
3.3.10.1. In 3.3.9.2 we have carefully avoided the properties which do not hold in the larger system, in particular LQ2 and (*'). For a counterexample
The language theory of Automath, Chapter V (C.5)
549
let d ( z ) be defined by z E 7 * d ( z ) := [y : 212, with typ(d) = 7. If a E 7 , then d ( a ) Q [y : a]a E [y : a]7,but certainly not d ( a ) E [y : a]7,so not LQ2. If, furthermore, A E a , then I- (A) [y : a]a but not I- (A)d(a), whence not (*'). Consequently, the +-system is not a definitional extension of the old system.
3.3.10.2. Besides, if we stick t o our counterexample, z E d ( a ) I- t E [y : a]a,so z E d ( a ) k (A) z E a, but not z E d ( a ) I- (A) d ( a ) (= t y p ( ( A ) z ) ) . This shows that t y p applied t o 3-expressions can lead us out of the correct expressions (in contrast with the situation in the smaller system), and that not:
F3A
+
A E typ(A)
.
3.3.10.3. In the next section we restore (*) and LQ2 by a further extension of the 1anguage.But first we give a theorem stating some very weak versions of LQ2 to hold in P$-AUT-QE+ instead of LQ'. Recall the symbol 0 from Sec. 3.2.3 and the result (Sec. 3.2.4.3, 3.2.6.1) for Pq-AUT-QE:
I-AEB, I-AEC =sI-BDC. Theorem. Let
~ + A E B I,- + C E D , I - A q C . Then
I - + A E D or I - + C E B . Proof. By UE we get I- A- E B-, I- C- E D - , I- A- Q C-. By LQ for Pq-AUTQE we get I-C- E B- SO I- B- 0 D-, sok+B Q B- 0 D- Q D , i.e. F + B o D , 0 i.e. B c D or D c B, q.e.d. 3.3.11.1. The aforementioned anomalies can partially be removed by properly extending PoG-AUT-QE+ to a language P@-AUT-QE*. In this new system we first replace the application rules by [z : a]C, A E a =+ I- (A) B
(1)
B
(2)
B E C, k (A) C =+ I- (A) B E (A) C
Rule (1) is simply 1.3 (Sec. 2.3) without the restriction to degree 1. Rule (2) is III.3.B' (Sec. 2.15). So, indeed, AUT-QE* extends AUT-QE+.
3.3.11.2. By this modification we gain the property k3A =s I- typ(A)
, so it is a proper extension .
Furthermore, by 0-reduction we get
D.T. van Daalen
550
B E [z : a]C
+
B
[z : a ] ( z B )
, which yields property (*)
for the new system. Our counterexample, however, shows that there are still problems: LQ2 does not hold, so we do not yet have a definitional extension of AUT-QE. Besides, now the new 2-expressions (e.g. ( A )d(a) in the example, which is correct now) do not have a correct t y p l and not even an E-formula.
3.3.11.3. The following theorem shows that the difference between AUT-QE+ and AUT-QE* just lies in the particular role of the definitional 2-constants, and that AUT-QE* is an unessential extension of AUT-QE+ (though it is no definitional extension). Theorem. Let t-* stand for correctness in AUT-QE*, and let A’ be the b2normal form of A . Then I-* A(E/QB ) t-+ A’(E/QB‘)(so I- A-(E/Q El-)).
*
0
Proof. Induction on I-*.
3.3.11.4. A drastic way of combining 2-constants with type-inclusion and still preserve LQ, is to add LQ explicitly to the language definition, or at least something like k2A, C E B , A r Z C
*
AEB.
Adding this rule to P@-AUT-QE+ produces the smallest definitional extension of AUT-QE which includes P@-AUT-QE+, and it gives us AUT-QE* plus all the missing E-formulas. An alternative way of defining this new system (we still call it AUT-QE*) is by ignoring the type-assignment part of definitional 2-schemes, and by defining the t y p of a definitional 2-constant to be the t y p of its definiens (compare the way norms have to be introduced for AUT-QE, [van Daalen 80, Ch. IV.4.41). From the latter definition of this new system it will be clear that our desirable properties (except UT2, of course) can be proved for it by the same methods as used in the closure proof of AUT-QE+.
3.3.12.1. Up till now we have, for definiteness, just compared Pv-AUT-QE with P@AUT-QE+ (and P@-AUT-QE*), i.e. we made the extension in one step and added the definitional constants and the 1-appl-expressions simultaneously. One can as well, of course, consider intermediate languages like Pq-AUT-QE+ and Pqb- AUT-QE. Then one notices that the problems with (*), LQ2 and t y p are exclusively due to the 6 (in particular b 2 ) and not to the in P$-AUT-QE+. Thus Pq-AUTQE+ satisfies LQ and (*), and is a neat definitional extension of PO-AUT-QE, whereas P$-AUT-QE has all the unpleasant features of PT&AUT-QE+. In
+
The language theory of Automath, Chapter V (C.5)
551
fact, PgG-AUT-QE+ is a definitional extension of PgG-AUT-QE, and PgG-AUTQE can only be made into a definitional extension of Pg-AUT-QE (call this new system from now on AUT-QE‘) by adding a rule like in Sec. 3.3.11.4.
3.3.12.2. If one takes AUT-68 instead and adds an application rule: AECI,
[ z : a l c Q B E+ ~ (A)BET
(compare 3.3.11.1, rule (1)) one gets the corresponding +-language (i.e. smallest value degree = smallest function degree), AUT-68+. These systems are easier to handle than AUT-QE: both AUT-68 and AUT-68+ satisfy UT, LQ and (*), even in the presence of definitional constants, and AUT-68+ is a definitional extension of AUT-68. Without definitional constants, AUT-68+ is already contained in AUT-QE, but PgG-AUT-68+ is not contained in Pg6-AUT-QE. It is contained, though, in the system AUT-QE’ of 3.3.12.1. Closure for AUT-68+ can, e.g., be proved by the methods of the next section (see 3.4.5).
3.4. Some easier closure proofs (for simpler languages) 3.4.1. There are various ways of proving closure for simpler languages, such as Pq-AUT-68 or PG-AUT-QE. First, one can take the closure proof of the previous sections and adapt it to the language under consideration. Since g-reduction, type-inclusion and liberal degree specification (in particular for function degree) are responsible for many technical details in the proof, the simpler languages allow some obvious simplifications. E.g. if a language lacks q-reduction we can clearly skip the g-closure part and, besides, we can freely use CR. Or, if a language has more restricted function degrees (AUT-68 vs. AUT-QE, non-+-languages vs. +-languages), we have to push SA, LQ, UD etc. through less degree levels. And, if a language lacks type-inclusion (AUT-68 and Nederpelt’s A), we simply have PT + LQ, and do not need to introduce something like cantyp for this purpose. A second approach is suggested by the fact that our language definition contains some technicalities which are only introduced to make the closure proof (i.e. this kind of closure proof, for a complicated language like Pg-AUT-QE) possible. In particular, I intend the use of the restricted Q-rule V.2 instead of the more liberal V.2’, i.e. the use of the restricted system type I, instead of the liberal system type I1 (see Sec. 1.2). Recall that after having proved closure for I, I and I1 can be proved t o be equivalent, and that, after all, we are more interested in system I1 than in system I. Now it turns out that, for the simpler languages, the modifications in the language definition (and the detour via system I) are superfluous, and that we can give a direct closure proof for a type I1 language definition.
D.T. van Daalen
552
Such direct closure proofs are presented below for all the regular languages which either lack η-reduction or have just function degree 3: β(δ)-AUT-68(+), β(δ)-AUT-QE(+) and βη-AUT-68. A mere sketch is given for βη(δ)-AUT-68+ (for the definition of AUT-68+ see Sec. 3.3.12).
3.4.2. So we give these languages by an E-definition with Q-rule
V.2'    A Q B, B ↓ C, ⊢ C  ⟹  A Q C
which a priori is stronger than V.2 but later turns out to be equivalent. The properties in Secs. 2.9, 2.10, such as the substitution theorem, correctness of categories, and the property: α of domain degree, A of value degree, x E α ⊢ A ⟺ ⊢ [x : α]A, simply go through. As in Sec. 3.1, we essentially just need SA for proving closure. So below we confine ourselves to SA and, in connection with this, UD for the various languages. We start with the η-less languages.
3.4.3.1. Theorem. UD for η-less languages.
Proof. Let [x : α]B Q [x : β]C. Then by CR, [x : α]B ↓ [x : β]C, so α ↓ β and B ↓ C, whence α Q β and x E α ⊢ B Q C. □
3.4.3.2. Corollary. SA¹ for β(δ)-AUT-QE+, SA² for β(δ)-AUT-68+.
Proof. Let A E α, [x : B]C Q [x : α]D. Then B Q α, so A E B. □
3.4.3.3. Let ⊆ be defined as in Sec. 2.14. We need a lemma.
Lemma. ⊢ F ⊆ G, G ≥ [x⃗ : α⃗]C ⟹ F ≥ [x⃗ : β⃗]C with |α⃗| = |β⃗| and α⃗ ↓ β⃗ (i.e. α₁ ↓ β₁, α₂ ↓ β₂, etc.).
Proof. Induction on ⊆. □
3.4.3.4. Corollary. SA² for β(δ)-AUT-QE(+), SA³ for β(δ)-AUT-68(+).
Proof. Let A E α, [x : B]C E [x : α]D. Then [x : B]C E [x : B]F ⊆ [x : α]D. So by the previous lemma B Q α and A E B. □
Now in order to get SA³ for β-AUT-QE(+) we need a lemma again. Notice that the proof of this lemma fails when there are definitional constants.
3.4.3.5. Lemma. ⊢ A E B, A ≥ [x⃗ : α⃗]C, B ≥ [x⃗ : β⃗]D with |α⃗| = |β⃗| ⟹ α⃗ ↓ β⃗.
Proof. Induction on the length of A. The interesting cases are:
(1) A ≡ [x₁ : α₁]A₁, A₁ ≥ [x⃗₂ : α⃗₂]C, ⊢ A₁ E B₁, [x₁ : α₁]B₁ ⊆ B ≥ [x₁ : β₁][x⃗₂ : β⃗₂]D, |α⃗₂| = |β⃗₂|. By 3.4.3.3, α₁ ↓ β₁ and B₁ ≥ [x⃗₂ : β⃗₂']B₁'
The language theory of Automath, Chapter V (C.5)
with β⃗₂' ↓ β⃗₂. By the ind. hyp. (α₁, α⃗₂) ↓ (β₁, β⃗₂), q.e.d.
(2) A ≡ (A₁)A₂, A₁ E τ, A₂ E [z : τ]B₁, so B₁[A₁] ⊆ B ≥ [x⃗ : β⃗]D. By 3.4.3.3 again, B₁[A₁] ≥ [x⃗ : β⃗'']D₁ with β⃗ ↓ β⃗''. Because B₁ has degree 1 and A₁ has degree 3, B₁ ≥ [x⃗ : β⃗₀]D₀ with β⃗₀[A₁] ≥ β⃗''. Similarly, since A₂ has degree 2, if (A₁)A₂ ≥ [x⃗ : α⃗]C then A₂ ≥ [z : τ'][x⃗ : α⃗₀]C₀ with α⃗₀[A₁] ↓ α⃗, C₀[A₁] ≥ C. By the ind. hyp. α⃗₀ ↓ β⃗₀, so α⃗ ↓ α⃗₀[A₁] ↓ β⃗₀[A₁] ≥ β⃗'' and by CR α⃗ ↓ β⃗, q.e.d. □
3.4.3.6. Corollary. SA³ for β-AUT-QE(+).
Proof. Let A E α, [x : B]C E D E [x : α]F. Then [x : B]C E [x : B]G Q D, whence D ≥ [x : B']G' with B ↓ B'. By the lemma B ↓ α, so B Q α and A E B. □
3.4.3.7. So we have SA for β(δ)-AUT-68(+) and β-AUT-QE(+). In order to tackle the βδ-case of AUT-QE we first prove δ-CLPT, which gives us an unessential extension result. Then we can either extend SA directly, or first extend lemma 3.4.3.5 to βδ-AUT-QE+ and proceed as before.
3.4.4.1. Now consider βη-AUT-68. We cannot use CR anymore.
Theorem. UD² for βη-AUT-68.
Proof. All 2-expressions are of the form [x⃗ : α⃗]τ or [x⃗ : α⃗]p(C⃗). So if ⊢² A ≥ [x : β]B, then A ≡ [x : α]A₁ with α ≥ β. By ind. on Q we can prove: if ⊢² A Q [x : β]B then A ≡ [x : α]A₁ with α Q β. This gives UD². □
3.4.4.2. Corollary. SA for βη-AUT-68.
Proof. Immediate. □
3.4.4.3. The same proof works as well for βηδ-AUT-68, as follows.
Lemma. ⊢² A ≥δ [x : α]A₁, ⊢² B, A ↓ B ⟹ B ≥δ [x : β]B₁, α ↓ β.
Proof. Since ≥δ commutes with ≥, [x : α]A₁ ≥ [x : α']A₁' ≤δ E ≤ B. By δ-advancement (Sec. II.9.3), B ≥δ C ≥ [x : α'']A₁'' ≤δ [x : α']A₁'. Here the reduction C ≥ [x : α'']A₁'' does not contain δ-reductions, so C ≡ [x : β]B₁ with β ≥ α'' ≤ α' ≤ α, q.e.d. □
3.4.4.4. By the simultaneous substitution theorem we have δ-CLPT again. Then by induction on Q we can prove:
⊢² F Q [x : β]B ⟹ F ≥δ [x : α]A, α Q β.
This gives us UD², whence SA, as before.
3.4.5. It is possible to extend these results (for βη(δ)-AUT-68) to the corresponding +-languages βη(δ)-AUT-68+, but it is rather complicated. We can use a mixture of the methods in 3.4.4.3 and 3.4.4.4 and the methods in Sec. 3.3. Thus we start with leaving η²-reduction out of consideration, and restricting the appl-rule of degree 2 to: A E α, ⊢ B ≥ [x : β]D, α ≥ β ⟹ ⊢ (A)B. Later on these two restrictions prove to be immaterial. For the restricted system SA² is immediate and β²-closure is guaranteed. Then we need δ-β²-advancement and the fact that δβ²-reduction commutes with ≥, and get
⊢² F Q [x : β]B ⟹ F ≥δβ² [x : α]A, α Q β.
This yields UD² and SA³, and we are finished.
V.4. The equivalence of the E-definition with the algorithmic definition 4.1. Introduction
4.1.1. Since in the E-definition the correctness of expressions and formulas (relative to a correct book and a correct context) was given by an ordinary inductive definition, the correctness relation is a priori just recursively enumerable and not necessarily recursive, i.e. effectively decidable. In this section V.4, though, we prove the decidability and discuss some related topics.
4.1.2. First we give some introductory considerations leading to a sketch of a decision procedure (Secs. 4.1.3-4.1.6). The whole verification process is, in principle, reduced to the verification of Q-formulas, for which the decidability follows from the normalization property N and the Church-Rosser property. We can use normalization freely because we proved N for a very large system in IV.4.5, but βη-CR we do not know yet. Therefore we assume throughout V.4 property CR for the correct expressions; for the proof we refer to Ch. VI. Then (see 4.2.2) we present the actual algorithmic definition, to be adapted for the various languages by a suitable choice of a reduction relation, of a typing function cantyp and of a domain function dom for the computation of domains (Secs. 4.2.3, 4.2.4). The equivalence proof in Sec. 4.3 is organized as sketched in Secs. 1.2 and 1.6, with the following effects:
(1) The strengthening rule can be skipped from the E-definition. (2) The E-systems are decidable. (3) The algorithmic system satisfies the nice properties of the E-system: closure, etc.
The final sections concern the verification of Automath languages in practice. This is a matter completely different from the theoretical decision procedure discussed before. In particular, some remarks are made on suitable reduction strategies for deciding Q-formulas.
4.1.3. Deciding Q and ⊆
No matter whether a system has Q-rule V.2 or Q-rule V.2', there holds
A Q B ⟺ ⊢ A, ⊢ B, A ↓ B.
Proof. ⟹: By induction on Q, using CR. ⟸: This is precisely rule V.2', so either it holds by definition or it follows from CL. □
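Operationally (and outside van Daalen's text), this criterion amounts to: normalize both sides and compare. The sketch below does this for plain untyped λ-terms with de Bruijn indices; all names (Var, Lam, App, normalize, q_equal) are our own illustration, termination is assumed via normability (property N), and by CR the syntactic equality of normal forms stands in for the Q-test.

```python
# Sketch only: deciding a Q-formula by comparing beta-normal forms.
# Assumes the inputs are normable (property N), so `normalize` terminates;
# by CR the normal form is unique, hence syntactic comparison decides Q.
from dataclasses import dataclass

class Term: pass

@dataclass(frozen=True)
class Var(Term):
    k: int                      # de Bruijn index

@dataclass(frozen=True)
class Lam(Term):
    body: Term                  # [x : -]body; domains omitted in this toy

@dataclass(frozen=True)
class App(Term):
    fun: Term
    arg: Term

def shift(t, d, cutoff=0):
    """Shift free indices >= cutoff by d."""
    if isinstance(t, Var):
        return Var(t.k + d) if t.k >= cutoff else t
    if isinstance(t, Lam):
        return Lam(shift(t.body, d, cutoff + 1))
    return App(shift(t.fun, d, cutoff), shift(t.arg, d, cutoff))

def subst(t, s, k=0):
    """Substitute s for index k in t (capture-avoiding)."""
    if isinstance(t, Var):
        if t.k == k:
            return shift(s, k)
        return Var(t.k - 1) if t.k > k else t
    if isinstance(t, Lam):
        return Lam(subst(t.body, s, k + 1))
    return App(subst(t.fun, s, k), subst(t.arg, s, k))

def normalize(t):
    """Normal-order beta-normalization."""
    if isinstance(t, Var):
        return t
    if isinstance(t, Lam):
        return Lam(normalize(t.body))
    f = normalize(t.fun)
    if isinstance(f, Lam):
        return normalize(subst(f.body, t.arg))
    return App(f, normalize(t.arg))

def q_equal(a, b):
    """A Q B for (correct, normable) A and B."""
    return normalize(a) == normalize(b)
```

The alternative strategy mentioned at the end of 4.1.6 — searching both reduction trees for a common reduct instead of computing full normal forms — would replace `normalize` by an interleaved search justified by SN.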
So, by N (as in II.5.4), for correct A and B, A Q B is decidable.
In β(η)-AUT-QE all 1-expressions are of the form [x⃗ : α⃗]τ. We have
A ⊆ τ ⟺ ⊢¹ A, and (Sec. 3.2.3.1)
⊢¹ A ⊆ [x : β]B₁ ⟺ A ≡ [x : α]A₁, α Q β and x E α ⊢ A₁ ⊆ B₁.
So, for correct 1-expressions A and B, A ⊆ B is decidable (use induction on the length of B). Since on non-1-expressions ⊆ is just Q, this is true for A and B of other degrees as well.
Let ⊢ stand for correctness in β(η)-AUT-QE, ⊢⁺ for some larger system, like βη-AUT-QE+ or βη-AUT-QE*, and let ⁻ denote the β¹δ-normal form. By UE (Secs. 3.3.2, 3.3.3) we have
⊢⁺ A ⊆ B ⟺ ⊢⁺ A, ⊢⁺ B, ⊢ A⁻ ⊆ B⁻.
So, in the larger systems, too, A ⊆ B is decidable, for correct A and B.
4.1.4. Deciding E-formulas
In principle, E-formulas A E B, for correct A and B, are going to be decided by the equivalence
A E B ⟺ typ(A) ⊆ B
which reduces the E-formula to a ⊆-formula. However, there is some trouble with typ. First, typ can lead us out of the correct expressions of the language we consider. There are two ways to solve this problem: first, one can introduce for each language a specified modified type-function cantyp (for: canonical type) which does not suffer from this defect. Then we get what we want (as in 3.2.4 for AUT-QE):
A E B ⟺ ⊢ A, ⊢ B, cantyp(A) ⊆ B.
Alternatively, one can use the fact that the new, possibly incorrect expressions created by typ in general are correct in some larger system (e.g. the corresponding +-system). Then one can decide the E-formula in the larger system:
A E B ⟺ ⊢ A, ⊢ B, ⊢⁺ typ(A) ⊆ B
where ⊢⁺ stands for correctness in the larger system. If we make sure that ⊢⁺ cantyp(A) Q typ(A) then, by conservativity, the two approaches are clearly equivalent.
A second difficulty with typ occurs exclusively in AUT-QE' and AUT-QE*. These languages have the rule: ⊢² B, ⊢ C E D, B ≥δ² C ⟹ ⊢ B E D, and for the new category D of B the property typ(B) ⊆ D (even if typ(B) is correct) is not necessarily true anymore. This problem can be solved by taking a type-function which first eliminates all the δ²-constants. For a δ²-constant d we then have cantyp(d(A⃗)) ≡ cantyp(δ²-nf(d(A⃗))).
4.1.5. Deciding correctness of expressions
All correct expressions relative to a correct B and a correct ξ have to be B;ξ-expressions, i.e. the constants have to be in B and the free variables have to be in ξ. The verification of compound expressions can roughly be described as: verify the subexpressions, plus their possible type- and degree-restrictions. E.g. for abstr-expressions use the equivalence
⊢ [x : α]A ⟺ ⊢ α, α of domain degree, x E α ⊢ A, A of value degree.
For the subexpressions in c(B⃗) there are type-restrictions prescribed in the scheme of c, viz. if the context of the scheme is y⃗ E β⃗ then
⊢ c(B⃗) ⟹ B⃗ E β⃗[B⃗] (i.e. B₁ E β₁, B₂ E β₂[B₁] etc.).
To verify the right-hand side, first verify ⊢ B₁. Since ⊢ β₁ (it occurs in B), we can decide B₁ E β₁ as indicated above. Then check ⊢ B₂. Since B₁ E β₁ and y₁ E β₁ ⊢ β₂, we know ⊢ β₂[B₁], so we can tackle the next E-formula, etc.
4.1.6. Verification of application expressions
Now we discuss the type-restriction implied in the correctness of (A)B. We restrict ourselves to AUT-68 and AUT-QE here. Define α to be a domain of B if
(i) B E [x : α]C for some C, or
(ii) B E C E [x : α]D for some C, D.
Then, in view of the formation rules for appl-expressions, we have the equivalence:
⊢ (A)B ⟺ ⊢ A, ⊢ B, B has a domain α, A E α.
The arbitrariness w.r.t. the domain can be somewhat reduced by another property, the uniqueness of domains, viz. if α₁ and α₂ are domains of B then α₁ Q α₂ (which will be proved below, 4.2.4.2). This allows us to modify the equivalence:
⊢ (A)B ⟺ ⊢ A, ⊢ B, B has a domain, and ∀α (B has a domain α ⟹ A E α),
i.e. we need just one domain to check the type-restriction. If one fixes a particular procedure for the computation of some domain of an expression, one can define a domain function dom (specific for each language). E.g. for AUT-68 one might inductively define
δ²-nf(cantyp(B)) ≡ [x : α]C ⟹ dom(B) := α.
Now define an extended reduction relation → [compare τ-reduction in Chapter VII], as follows:
(i) A ≥ B ⟹ A → B.
(ii) A → typ(A).
(iii) → is transitive.
Then, an alternative way to compute a domain of an expression B is to perform a more or less specified search through the →-reduction tree of B until one possibly encounters an abstraction expression, say [x : α]C; if so, this α is some domain of B. Certain restrictions (specific for each language) have to be imposed upon the search in order to guarantee that not too many expressions get a domain in this way.
Just like property N (at least δ²-N) is crucial in the definition of dom above, the well-foundedness (i.e. property SN) of → is needed for the termination of the second procedure. This will indeed be proved below (4.4.11). As a whole, the situation with the two possible ways of finding a domain can very well be compared with the two ways of deciding a Q-formula: either one can compare normal forms (use N), or one can search for a common reduct in the respective reduction trees (use SN).
4.2. The algorithmic definition
4.2.1. Now we give, guided by the considerations in the preceding sections, the algorithmic definition of correctness. Apart from the compatibility condition of def and typ (see below), the book-and-context part of the definition is as usual [see e.g. [van Daalen 73 (A.3)]] and will be omitted. So we just define the correctness of expressions and formulas (new notations ⊢ₐ, Eₐ, Qₐ and ⊆ₐ, with the subscript a for "algorithmic") in terms of reduction, dom and cantyp (Sec. 4.2.2). Later we discuss the choice of cantyp and dom for the various regular languages (4.2.3, 4.2.4).
4.2.2.1. Let B;ξ ⊢ₐ. The conventions for omitting B and ξ in B;ξ ⊢ₐ A are as in V.2.1. Degrees are indicated as superscripts and defined as usual. The compatibility condition reads: def(d) Eₐ typ(d).
4.2.2.2. Formula part of the definition
Let A and B be B;ξ-expressions (so not necessarily correct). We define:
(i) A Qₐ B :⟺ A ↓ B.
(ii) A ⊆ₐ B, if degree(B) = 1 :⟺ β¹δ¹-nf(A) ≡ [x⃗ : α⃗][y⃗ : γ⃗]τ, β¹δ¹-nf(B) ≡ [x⃗ : β⃗]τ, α⃗ Qₐ β⃗ (with the straightforward extension of Qₐ to strings).
(iii) A ⊆ₐ B, if degree(B) ≠ 1 :⟺ A Qₐ B.
(iv) A Eₐ B :⟺ cantyp(A) ⊆ₐ B, with a straightforward extension to strings A⃗ Eₐ B⃗.
4.2.2.3. Expression part of the definition
(i) ⊢ₐ τ.
(ii) ⊢ₐ x :⟺ x occurs in ξ.
(iii) ⊢ₐ c(B₁, ..., Bₘ) :⟺ ⊢ₐ B₁, ..., ⊢ₐ Bₘ, c occurs in B and, if the scheme of c has context y⃗ E β⃗, then B⃗ Eₐ β⃗[B⃗].
(iv) ⊢ₐ [x : α]A :⟺ ξ ⊢ₐ α, α of domain degree, and ξ, x E α ⊢ₐ A, A of value degree.
(v) ⊢ₐ (A)B :⟺ ⊢ₐ A, ⊢ₐ B, B has function degree, A Eₐ dom(B).
4.2.3. The choice of cantyp
4.2.3.1. For our purposes (see 4.1.4) we require that, for correct A, cantyp(A) is correct as well, is a category of A, i.e. A E cantyp(A), and is minimal with respect to ⊆: A E B ⟹ cantyp(A) ⊆ B.
This still leaves us a lot of freedom for our choice of cantyp: e.g., as long as different definitions of cantyp yield definitionally equal results, they are equally good to us. In some languages typ itself meets the requirements mentioned above, viz. βη-AUT-QE+ and Nederpelt's Λ. In most languages, however, typ causes some problems, e.g. there are correct expressions with incorrect typ; then we choose cantyp to be some suitable modification of typ. Below we give a survey of the difficulties with typ, and how these can be solved by cantyp.
4.2.3.2. We start with the languages where the trouble with typ is due to mere degree restrictions.
(1) βη-AUT-68: If ⊢² [x : α]B then its typ is not correct in AUT-68, but is a typical AUT-QE-expression. Then cantyp of this expression has to be τ. Further, typ((A)B), where degree(B) = 3, is incorrect in AUT-68 but correct in AUT-68+ (so, see 3.3.12.2, in AUT-QE). In cantyp((A)B) we have to remove the applicator (A), so we can define cantyp((A)B) ≡ C[A], where cantyp(B) ≡ [x : α]C. This is the same idea as in 3.2.4, but now for B of degree 3.
(2) βη-AUT-QE and βη-AUT-68+: Application of typ to (A)B of degree 2 yields AUT-QE+ expressions. For AUT-68+ the cantyp of these expressions has to be τ. For AUT-QE we remove (A) from cantyp, by β-reduction as in 3.2.4 (and in (1)).
4.2.3.3. Now we add definitional constants. This gives rise to the interference of δ²-constants and type-inclusion, discussed before in 3.3.10-3.3.12.
(3) βηδ-AUT-68: Consider the example of 3.3.10, which is also correct in AUT-68. There occurs an (A)B of degree 3 such that typ((A)B) does not belong to AUT-68 (of course not, as in (1)), does not even belong to AUT-QE and AUT-QE+, but does belong to AUT-68' (3.3.12.1) and AUT-QE' (3.3.11). Again, we must remove the applicator in cantyp, but we cannot be certain anymore that cantyp(B) is an abstr-expression. Therefore we define cantyp((A)B) ≡ C[A], where δ²-nf(cantyp(B)) ≡ [x : α]C.
(4) βηδ-AUT-QE(+): The same expression typ((A)B) of (3) is again incorrect here. Now the applicator is allowed in cantyp, but we need the δ²-reduction in order to remove the effect of the type-inclusion: cantyp((A)B) ≡ (A)(δ²-nf(cantyp(B))).
(5) βηδ-AUT-68+: This language has 2-expressions (A)B (see 3.3.11.2), the typ of which is incorrect in all the languages, and even not normable, e.g. (A)τ. The cantyp of such (A)B must be τ.
(6) βηδ-AUT-QE' and βηδ-AUT-QE*: Here we have the same (A)B of degree 2 as in AUT-68+. Besides, the typ of a degree-2 definitional const-expression (even if typ is correct) need not be a minimal category anymore. Therefore we define cantyp(d(A⃗)) :≡ cantyp(δ²-nf(d(A⃗))). Then for the cantyp of (A)B of degree 2 we can simply take (A)cantyp(B) in AUT-QE', whereas in AUT-QE* we must take C[A], where δ²-nf(cantyp(B)) ≡ [x : α]C.
4.2.3.4. Resuming: we have three types of difficulties, viz.
(i) In AUT-68(+) the only 1-expression is τ, so the typ of 2-expressions can be incorrect. Remedy: define cantyp to be τ.
(ii) In non-+-languages (AUT-68, AUT-QE and AUT-QE') the typ of (A)B of minimal function degree (say: i) is incorrect. Remedy: create an abstr-expression by taking the (βδ)^(i-1)-normal form of cantyp(B) and remove (A) by another β^(i-1)-reduction.
(iii) In languages with δ²-constants and type-inclusion, typ produces incorrect appl-2-expressions (AUT-QE(+)) or appl-1-expressions (AUT-QE' and AUT-QE*). Besides, in AUT-QE' and AUT-QE* the typ of a δ²-const-expression is not necessarily a minimal category. Remedy: remove the δ²-constants after (AUT-QE(+)) or before (AUT-QE' and AUT-QE*) taking cantyp.
4.2.3.5. In view of the arbitrariness of cantyp (4.2.3.1) we need only three different definitions of cantyp: one for the AUT-68 family, one for the restricted AUT-QE languages and AUT-QE+, and one for the liberal AUT-QE branch (AUT-QE' and AUT-QE*). Since the above list of difficulties is exhaustive, for the rest (e.g. for variables and const-expressions) the definitions of cantyp differ only as regards the following clauses:
(1) for AUT-68 and AUT-68+:
(i) degree(B) = 2 ⟹ cantyp(B) := τ.
(ii) degree(B) = 3, β²δ²-nf(cantyp(B)) ≡ [x : α]C ⟹ cantyp((A)B) := C[A].
(2) for AUT-QE and AUT-QE+:
(i) degree(B) = 2, β¹δ¹-nf(cantyp(B)) ≡ [x : α]C ⟹ cantyp((A)B) := C[A].
(ii) degree(B) = 3 ⟹ cantyp((A)B) := (A)(δ²-nf(cantyp(B))).
(3) for AUT-QE' and AUT-QE*:
(i) degree(d) = 2 ⟹ cantyp(d(A⃗)) := cantyp(δ²-nf(d(A⃗))).
(ii) degree(B) = 2, β¹δ¹-nf(cantyp(B)) ≡ [x : α]C ⟹ cantyp((A)B) := C[A].
4.2.3.6. That the proposed definitions of cantyp actually satisfy the requirements of 4.2.3.1 can be proved directly for the E-system, using the results (CLPT, LQ, UE etc.) from Section 3, but will become clear as well in the course of the equivalence proof, below.
4.2.4. The choice of dom
4.2.4.1. We start with a recapitulation of the appl-rules for the various languages. First, the appl-rules of AUT-68 ((1) A E α, B E [x : α]C ⟹ ⊢ (A)B) and of AUT-QE ((2) A E α, B E C E [x : α]D ⟹ ⊢ (A)B) are simply valid in all the languages (though rule (2) is vacuously so in AUT-68(+)). Then there is, additionally, rule 3ⁱ): A E α, ⊢ⁱ B Q [x : α]C ⟹ ⊢ (A)B; this rule with i = minimal value degree is necessary for defining the +-languages AUT-68+ (i = 2), AUT-QE+ and AUT-QE* (i = 1). For languages satisfying LQⁱ, where i is not the minimal value degree, rule 3ⁱ) is a derived rule: indeed, for such i we have [x : α]C E [x : α]D, so by LQⁱ, B E [x : α]D. Hence rule 3³) is anyhow valid; rule 3²) is valid in the AUT-QE languages without δ²-constants, and further in AUT-68+, AUT-QE' and AUT-QE*; and rule 3¹) is valid in AUT-68(+) (vacuously), AUT-QE+ and AUT-QE*. Alternatively formulated: rule 3ⁱ) is always valid, except for rule 3²) in AUT-68 and AUT-QE(+) with δ²-constants, and rule 3¹) in AUT-QE and AUT-QE'.
4.2.4.2. So, for certain languages we must extend the definition of domain from 4.1.6 with the clause: (iii) B Q [x : α]C ⟹ α is a domain of B. The set of domains of an expression is clearly closed under Q:
α₁ a domain of B, α₁ Q α₂ ⟹ α₂ a domain of B.
The converse of this is the announced uniqueness property, which we prove here for the enlarged notion of domain:
α₁ and α₂ both domains of B ⟹ α₁ Q α₂.
Proof. From 3.2.3.2, 3.2.4.3, 3.2.5.7 we recall the properties of βη-AUT-QE:
⊢² [x : α₁]C E/Q [x : α₂]D ⟹ α₁ Q α₂ (this includes UD²).
Now let ⊢³ [x : α₁]C E [x : α₂]D. Then also ⊢³ [x : α₁]C E [x : α₁]F. By UT² we get [x : α₂]D Q [x : α₁]F and by UD²: α₁ Q α₂. So we have EUD³ as well. Further, let ⊢³ [x : α₁]C Q [x : α₂]D. Then also ⊢³ [x : α₁]C E [x : α₁]F and by LQ³ [x : α₂]D E [x : α₁]F. So by EUD³: α₁ Q α₂. This amounts to UD³.
These results can all be extended to the extensions of βη-AUT-QE by translation (e.g. β¹δ-reduction) into βη-AUT-QE, as follows: let ⊢⁺ [x : α₁]C E/Q [x : α₂]D, where ⊢⁺ stands for correctness in the larger system. By UE, ⊢ [x : α₁⁻]C⁻ E/Q [x : α₂⁻]D⁻, correct in βη-AUT-QE, so by one of our (E)UD results: α₁ Q α₁⁻ Q α₂⁻ Q α₂. Of course, in AUT-68(+) these (E)UD results are also valid.
Now we treat the various possibilities for α₁ and α₂ to be a domain of B.
(1) [x : α₁]C Q B Q [x : α₂]D. Use UD.
(2) [x : α₁]C Q B E [x : α₂]D. If necessary, translate (e.g. by δ²-reduction) into a language satisfying LQ: [x : α₁⁻]C⁻ Q B⁻ E [x : α₂⁻]D⁻. Then by LQ we get [x : α₁⁻]C⁻ E [x : α₂⁻]D⁻, and we can use EUD.
(3) [x : α₁]C Q B E D E [x : α₂]F. Use LQ: [x : α₁]C E D E [x : α₂]F. But also [x : α₁]C E [x : α₁]G, and by UT³: [x : α₁]G Q D, so we arrive in case (2) again.
(4) B E [x : α₁]C, B E [x : α₂]D. Then [x : α₁]C Q [x : α₂]D, so α₁ Q α₂.
(5) B E [x : α₁]C, B E D E [x : α₂]F. By UT³: [x : α₁]C Q D, so we are again in case (2).
(6) B E C E [x : α₁]D, B E F E [x : α₂]G. By UT³ we get C Q F. Translate into a language satisfying LQ. This gives C⁻ Q F⁻ E [x : α₂⁻]G⁻ and by LQ C⁻ E [x : α₂⁻]G⁻. It also gives C⁻ E [x : α₁⁻]D⁻, and case (4) applies. □
4.2.4.3. It would be nice if the notion of domain of an expression were preserved under Q: B Q C, α a domain of B ⟹ α a domain of C. This is indeed true for languages satisfying LQ, but not for the others, viz. βηδ-AUT-QE and βηδ-AUT-QE+. By CLPT, there holds
B ≥ C, α a domain of B ⟹ α a domain of C,
i.e. the notion of domain is preserved under ≥. But the converse direction (C ≥ B, in particular with δ²-reduction) fails in βηδ-AUT-QE(+). For all the languages we have
B Q C, α a domain of B ⟹ α a domain of C⁻,
where C⁻ is the δ²-normal form of C.
Proof. By the translation ⁻ we arrive in a language satisfying LQ, so from B⁻ Q C⁻, α a domain of B⁻, we get the desired result. □
As a corollary of this, we get:
B Q C, α a domain of B, C has a domain ⟹ α a domain of C.
4.2.4.4. In view of the above remarks we still have a lot of freedom in defining
a domain function dom which picks some expression from the set of domains. Dom is going to be defined in terms of cantyp and, just like cantyp, in terms of δ²-reduction and (βδ)ⁱ-reduction, where i is the minimal value degree. I.e. by application of cantyp and these reductions we arrive at an expression which we call the domain normal form, dnf. If the dnf is an abstr-expression then we read off the domain dom from it:
dnf(B) ≡ [x : α]C ⟹ dom(B) := α.
Otherwise, dom is simply not defined. The rules for computing dnf are, for the non-+-languages:
(1) AUT-68: dnf(B) := β²δ²-nf(cantyp(B)).
(2) AUT-QE('):
(i) degree(B) = 3 ⟹ dnf(B) := β¹δ¹-nf(cantyp(δ²-nf(cantyp(B)))),
(ii) degree(B) = 2 ⟹ dnf(B) := β¹δ¹-nf(cantyp(B)).
The β² of AUT-68 and the β¹ of AUT-QE(') were only added in order to cover the corresponding +-languages too. Now we can deal with the +-languages by simply adding a rule for B of minimal value degree:
degree(B) = i, i the minimal value degree ⟹ dnf(B) := (βδ)ⁱ-nf(B).
This rule gives us AUT-68+ from AUT-68, AUT-QE+ from AUT-QE and AUT-QE* from AUT-QE'.
4.2.4.5. That dom(B), as defined above, gives us a domain if B has one, and gives us nothing otherwise, can be proved directly, but will also become clear in the course of the equivalence proof.
4.3. The equivalence proof
4.3.1. As announced before, the equivalence of the algorithmic definition with the E-definition will also prove the superfluity of the strengthening rule. To this end we use, along with the algorithmic definition system 111, two distinct versions of the E-definition, system I and system 11. Here, system I is the system of Sec. 2: it has the strengthening rule and it has Q-rule V.2. System 11, however, lacks the strengthening rule and has Q-rule V.2’ instead.
D.T. van Daalen
564
By CL for system I, we have: str., V.2 e (str.,V.2') 3 V.2', so system I1 is clearly included in system I. Below we denote correctness in I, I1 and I11 respectively by I-, I-0 and F a ; hence the inclusion of I1 in I becomes: hl I-. Now the equivalence of the three systems is shown by additionally proving I-, =+ i-0 (Sec. 4.3.2) and I- =%I-, (Sec. 4.3.3).
4.3.2. The I-, + I-0-part 4.3.2.1. We first formulate the theorem, which we want to prove. Theorem. If B Fa resp. B;( I-, resp. B;( I-: A resp. B;( I-:,' A then B i-0 resp. B;( I-h A resp. B;( I-;' A E cantyp(A). So the theorem implies that cantyp is well-defined on the non-1-expressions of the algorithmic definition. The proof of the theorem is by induction on I-, and depends of course on dom and cantyp, i.e. on the language we consider. However, large parts of the proof can be done for all or some of the languages together. 4.3.2.2. Some properties (1) I-oA, FOB,A Q , B +-I-oA Q B. Proof. This is simply rule V.2'. (2) I-OA
* I-oP'S1-nf(A)
0
A.
Proof. By the simultaneous subst. theorem 6-CLPT holds. Further SA' can be proved as in 3.3.6.1-3.3.8.2, or holds vacuously so P'-CL. By PS-CR and 0 06-N the P'Sl-nf is well-defined. (3) Let I-oA, FOB,A C~ B. Then I-oA c B.
Proof. For A of degree 1, by (2) i-0 A q PIG1-nf(A) = [Z : Z]Al C [i? : ~517-1'.I : PIT P'Sl-nf ( B ) Q B so I-0 A c B. If degree(B) # 1 this is (1) again. 0 4
(4) I-oA E cantyp(A), cantyp(A) Ca B , FOB
* I-oA E B.
Proof. Apply (3).
0
(5) The I-0-system satisfies CR.
Proof.
I-0
+ t-
and we assumed CR for I-.
(6) Strengtheningfor 9: E I-0 A q B , 51 sub E , (1 I-o A , El I-o B Proof. By ind. on 9 we get A 1 B so (1 I-0 A 9 B.
0 (1
I-o A 9 B. 0
4.3.2.3. Proof of the theorem, part 1
We only need to give the induction step for those clauses in the definition of ⊢ₐ which differ from the corresponding clauses in the definition of ⊢₀. We start with the easy cases.
(1) The compatibility condition. Let * d(x⃗) := A * d(x⃗) E B be a correct scheme according to the algorithmic definition, i.e. ξ ⊢ₐ A, ξ ⊢ₐ B and A Eₐ B. By the ind. hyp. ξ ⊢₀ A E cantyp(A), ξ ⊢₀ B, so by (4) above ξ ⊢₀ A E B, q.e.d.
(2) Expressions (easy cases).
(i) τ: trivial.
(ii) variables: let ξ ⊢ₐ; then by the ind. hyp. ξ ⊢₀, so for x in ξ, x E typ(x) ≡ cantyp(x).
(iii) const-expressions, except δ²-const-expressions in AUT-QE' and AUT-QE*: let the scheme of c be in B with context y⃗ E β⃗, let ⊢ₐ B₁, ..., ⊢ₐ Bₘ and B⃗ Eₐ β⃗[B⃗]. By the ind. hyp. ⊢₀ B₁ E cantyp(B₁), ⊢₀ B₂ E cantyp(B₂) etc. Further y⃗ E β⃗ ⊢ₐ, so y⃗ E β⃗ ⊢₀, so ⊢₀ β₁, y₁ E β₁ ⊢₀ β₂ etc. So ⊢₀ B₁ E β₁ and by the subst. theorem ⊢₀ β₂[B₁], so ⊢₀ B₂ E β₂[B₁] etc., up to ⊢₀ Bₘ E βₘ[B⃗]. The conclusion is ⊢₀ c(B⃗) (E typ(c)[B⃗] ≡ cantyp(c(B⃗))).
(iv) abstr-expressions: let ξ ⊢ₐ α, α of domain degree, and ξ, x E α ⊢ₐ A, A of value degree. By the ind. hyp. ξ ⊢₀ α and ξ, x E α ⊢₀ A (E cantyp(A)). For A of degree 2 in AUT-68(+) this is ξ, x E α ⊢₀ A E τ, which yields ξ ⊢₀ [x : α]A E τ ≡ cantyp([x : α]A). Otherwise, we get ξ ⊢₀ [x : α]A (E [x : α]cantyp(A) ≡ cantyp([x : α]A)).
4.3.2.4. Some more properties
Before discussing the remaining clauses we prove some more properties of ⊢₀. First something about ⊆. Of course, the β¹δ¹-nf's of 1-expressions are of the form [x⃗ : α⃗]τ. As in 3.3.6-3.3.8 (leave η¹ out of consideration, restrict the appl-1-rule) we can prove, even without using CR:
⊢₀¹ A Q B ⟹ β¹δ¹-nf(A) ≡ [x⃗ : α⃗]τ, β¹δ¹-nf(B) ≡ [x⃗ : β⃗]τ, ⊢₀ α⃗ Q β⃗,
and, by induction on ⊆:
⊢₀¹ A ⊆ B ⟹ β¹δ¹-nf(A) ≡ [x⃗ : α⃗][y⃗ : γ⃗]τ, β¹δ¹-nf(B) ≡ [x⃗ : β⃗]τ, ⊢₀ α⃗ Q β⃗,
so we get:
⊢₀¹ A ⊆ [x : β]B₁ ⟹ β¹δ¹-nf(A) ≡ [x : α]A₁, ⊢₀ α Q β, x E α ⊢ A₁ ⊆ B₁.
Now we prove a lemma.
Lemma. ⊢₀ A (E B) ⟹ ⊢₀ A E cantyp(A) (⊆ B).
Proof. E.g. in AUT-68(+) there is nothing to prove. Anyhow, the cases A ≡ τ, A a variable or A an easy const-expression (i.e. not a δ²-const-expression in AUT-QE' or AUT-QE*) are immediate. For the rest we proceed by induction on
(1) the length of the δ²-reduction tree of A,
(2) the length of A.
Abstraction expressions are easy. If A is a δ²-const-expression in AUT-QE' or AUT-QE*, by δ-CLPT and the first ind. hyp. ⊢₀ δ²-nf(A) E cantyp(δ²-nf(A)) ≡ cantyp(A) (⊆ B). Then by the extra type-modification rule of these languages we get ⊢₀ A E cantyp(A) (⊆ B), q.e.d.
Now let A ≡ (A₁)A₂. We have ⊢₀ A₁ E α, ⊢₀ A₂ E cantyp(A₂) ⊆ [x : α]C. So β¹δ¹-nf(cantyp(A₂)) ≡ [x : α₁]C₁ with α₁ Q α, x E α₁ ⊢ C₁ ⊆ C. We want ⊢₀ A E cantyp(A) ≡ C₁[A₁] (⊆ B). If the formula A E B in the assumption comes directly from C[A₁] ⊆ B, we get C₁[A₁] ⊆ C[A₁] ⊆ B, q.e.d. Otherwise A ≥δ² D, ⊢₀ D E B (i.e. the extra rule of AUT-QE' and AUT-QE* has been used). Then D ≡ (D₁)D₂ with A₁ ≥δ² D₁, A₂ ≥δ² D₂, so ⊢₀ D₁ E α, and ⊢₀ D₂ E cantyp(D₂) Q β¹δ¹-nf(cantyp(D₂)) ≡ [x : α₂]C₂ ⊆ [x : α₁]C₁ (apply one of the ind. hypotheses to D₂), and by the first ind. hyp. ⊢₀ D E cantyp(D) ≡ C₂[D₁] Q C₂[A₁] ⊆ C₁[A₁]. So, by the type-modification rule, ⊢₀ A E C₁[A₁], q.e.d. □
4.3.2.5. Proof of the theorem, part 2
Now we prove the induction step for the two remaining cases.
(1) δ²-const-expressions in AUT-QE' or AUT-QE*. As in 4.3.2.3 (iii) we can get ⊢₀ d(B⃗) from ⊢ₐ d(B⃗). Then by the lemma ⊢₀ d(B⃗) E cantyp(d(B⃗)).
(2) Appl-expressions. Let ⊢ₐ A, ⊢ₐ B, B of function degree, A Eₐ dom(B). By the ind. hyp. ⊢₀ A E cantyp(A) ↓ dom(B), ⊢₀ B (E cantyp(B)). For the computation of cantyp and dom in the various languages see 4.2.3.5 and 4.2.4.4 respectively.
(i) AUT-68(+), ⊢₀³ B: β²δ²-nf(cantyp(B)) ≡ [x : α]C, dom(B) ≡ α. By δ-CLPT ⊢₀ B E [x : α]C and ⊢₀ α, so ⊢₀ A E α and ⊢₀ (A)B E C[A] ≡ cantyp((A)B).
(ii) AUT-68+, ⊢₀² B: β²δ²-nf(B) ≡ [x : α]C. We have SA² (see e.g. 3.4.5), so β²-CL, so ⊢₀ B Q [x : α]C and ⊢₀ (A)B E τ ≡ cantyp((A)B).
(iii) AUT-QE(+), ⊢₀³ B: β¹δ¹-nf(cantyp(δ²-nf(cantyp(B)))) ≡ [x : α]C, dom(B) ≡ α. By δ-CL and the lemma in 4.3.2.4, ⊢₀ B E δ²-nf(cantyp(B)) E [x : α]C, so ⊢₀ (A)B E (A)(δ²-nf(cantyp(B))) ≡ cantyp((A)B).
(iv) AUT-QE' and AUT-QE*, ⊢₀³ B: As (iii), but from ⊢₀ δ²-nf(cantyp(B)) E [x : α]C we infer now ⊢₀ cantyp(B) E [x : α]C, so ⊢₀ (A)B E (A)cantyp(B) ≡ cantyp((A)B).
(v) AUT-QE, ⊢₀² B: Like (i), but decrease the degrees by 1.
(vi) AUT-QE+ and AUT-QE*, ⊢₀¹ B: Like (ii), but decrease the degrees by 1.
This finishes the proof of the theorem in 4.3.2.1. □
4.3.3. The ⊢ ⟹ ⊢ₐ part
4.3.3.1. We formulate our theorem.
Theorem. If B ⊢ resp. B;ξ ⊢ resp. B;ξ ⊢ A, then B ⊢ₐ resp. B;ξ ⊢ₐ resp. B;ξ ⊢ₐ A. Further, if B;ξ ⊢ A E B then A Eₐ B.
The proof will be by induction on ⊢. We just discuss AUT-QE, because with AUT-68 everything is completely similar or somewhat easier.
4.3.3.2. First, we need some properties:
(1) Strengthening holds in the ⊢ₐ-system.
Proof. Notice that the definition of cantyp only refers to the relevant parts of the context, i.e. to assumptions concerning actually occurring free variables, and that the other notions in the definition of correctness do not refer to the context at all. Hence, strengthening can be proved by a simple induction on ⊢ₐ. □
(2) On PCT² (preservation of cantyp): In 3.2.5 we proved βη-outside-PCT² for βη-AUT-QE. However, δ-outside-PCT² is wrong, so for AUT-QE(+) with δ²-constants we can only get restricted PCT²:
If ⊢² A, A ≥ B not using δ²-reduction, then cantyp(A) Q cantyp(B).
In order to prove this, start with ⊢² A E α ⟹ ⊢ A E cantyp(A) ⊆ α (e.g. as in 4.3.2.4). Then, as in 3.2.5, one can prove: ⊢² A, A ≥ B not by δ²-reduction ⟹ cantyp(A) Q cantyp(B). Restricted PCT² gives us restricted LQ² for AUT-QE(+):
If ⊢² A, B E C, A Q B without using δ²-reduction, then A E C.
(3) However, in AUT-QE' and AUT-QE*, full PCT² is still valid and hence LQ² holds (this was already implicitly claimed in 3.3.11.4).
Proof. In AUT-QE′ and AUT-QE* we have δ2-nf(cantyp(A)) ≡ cantyp(δ2-nf(A)). So, let A ≥ B. Then δ2-nf(A) ≥ δ2-nf(B) without using δ2-reduction, so by restricted PCT2 we have cantyp(δ2-nf(A)) Q cantyp(δ2-nf(B)). □
D.T. van Daalen    568
(4) By CR we have: ⊢ A Q B ⇒ A Qₐ B. As in 4.3.2.4 we have, for ⊢₁ A, B: β1δ1-nf(A) ≡ [x⃗ : α⃗][y⃗ : γ⃗]τ, β1δ1-nf(B) ≡ [x⃗ : α⃗′]τ with α⃗ Q α⃗′. So ⊢ B ⊂ C ⇒ B ⊂ₐ C.
4.3.3.3. Proof of the theorem
Note that the ⊢ A E B ⇒ A Eₐ B part of the theorem, for A of degree 2, follows from ⊢₂ A E B ⇒ ⊢ A E cantyp(A) ⊂ B (by 4.3.3.2 (2) and 4.3.3.2 (4)). The proof is by induction on ⊢. We first discuss some of the clauses for the formation of expressions:
(i)
abstr-expressions: let ⊢₂ α, x E α ⊢ A₁ (E B₁). By the ind. hyp. ⊢ₐ α, x E α ⊢ₐ A₁ (A₁ Eₐ B₁, i.e. cantyp(A₁) ⊂ₐ B₁), so ⊢ₐ [x : α]A₁ (cantyp([x : α]A₁) ≡ [x : α]cantyp(A₁) ⊂ [x : α]B₁, so [x : α]A₁ Eₐ [x : α]B₁), q.e.d.
(ii) const-expressions: let y⃗ E β⃗ be the context of the scheme of c, ⊢ B⃗ E β⃗[B⃗]. By the ind. hyp. ⊢ₐ B⃗, B⃗ Eₐ β⃗[B⃗], so ⊢ₐ c(B⃗). If c is not a δ2-constant in AUT-QE′ or AUT-QE*, then cantyp(c(B⃗)) ≡ typ(c)[B⃗], so certainly cantyp(c(B⃗)) ⊂ₐ typ(c)[B⃗], q.e.d. Otherwise use the remark above.
(iii) 2-appl-expressions: let ⊢₃ A E α, ⊢ B E [x : α]C. By the ind. hyp. ⊢ₐ A, ⊢ₐ B, cantyp(A) ↓ α, cantyp(B) ↓ [x : α]C. So β1δ2-nf(cantyp(B)) ≡ [x : α′]C′, dom(B) ≡ α′ ↓ α. By CR cantyp(A) ↓ dom(B), so ⊢ₐ (A)B. Further, by the remark above, (A)B Eₐ C[A], q.e.d.
(iv) 3-appl-expressions: let ⊢₃ A E α, ⊢ B E C E [x : α]D. By the ind. hyp. ⊢ₐ A, cantyp(A) ↓ α, ⊢ₐ B, cantyp(B) ↓ C. By δ2-CLPT, ⊢ δ2-nf(C) E [x : α]D. By the ⊢ₐ ⇒ ⊢₀ part, ⊢₀ B E cantyp(B), so ⊢ B E cantyp(B), so ⊢ cantyp(B), so ⊢ δ2-nf(cantyp(B)). Further, δ2-nf(cantyp(B)) ↓ δ2-nf(C) without using δ2-reduction, so by restricted LQ, ⊢ δ2-nf(cantyp(B)) E [x : α]D and cantyp(δ2-nf(cantyp(B))) ⊂ₐ [x : α]D. I.e. β1δ1-nf(cantyp(δ2-nf(cantyp(B)))) ≡ [x : α′]D′, α ↓ α′ ≡ dom(B). Hence ⊢ₐ (A)B. Further, (A)cantyp(B) ↓ (A)C and (A)(δ2-nf(cantyp(B))) ↓ (A)C, so anyhow cantyp((A)B) ↓ (A)C, q.e.d.
Finally we discuss the type modification rules and the strengthening rule.
(v) Type modification: let ⊢ A E B, B ⊂ C. By the ind. hyp. ⊢ₐ A, A Eₐ B, i.e. cantyp(A) ⊂ₐ B, and by 4.3.3.2 (4) B ⊂ₐ C. Use CR to get A Eₐ C, q.e.d.
(vi) Strengthening: use 4.3.3.2 (1).
This finishes the proof of the theorem ⊢ ⇒ ⊢ₐ, and the proof of the equivalence of the three systems ⊢, ⊢₀, ⊢ₐ. So we do not distinguish between ⊢, ⊢₀ and ⊢ₐ any more, and have
⊢ A (E α) ⇒ ⊢ A (E cantyp(A) ⊂ α)
and
⊢ (A)B ⇒ cantyp(A) ↓ dom(B). □
4.4. The actual verification
4.4.1. Before discussing the actual verification we make some concluding remarks on the formal decidability of the Automath languages. First, on the well-definedness of the decision algorithm suggested by the definition of ⊢ₐ in Sec. 4.2, in particular the well-definedness of cantyp and dom. Cantyp and dom are partial functions, so by well-definedness we understand:
(1) It is decidable whether an expression has a cantyp (or a dom).
(2) If it has one, it is effectively computable.
All this is already implicitly included in the equivalence proof. E.g. the ⊢ₐ ⇒ ⊢₀ part states that cantyp on the correct non-1-expressions delivers a correct expression again. In the course of the decision process, cantyp and dom are required of correct expressions only. E.g. before settling cantyp(A) ⊂ B (in the verification of A E B) we first check ⊢ A, and before settling A E dom(B) (in the verification of (A)B) we first check ⊢ B. The definitions of cantyp and dom just require computation of degrees, and computation of βiδi-normal forms where i is the minimal value degree. Notice that SN in this case, and in fact for all i < 3, can even be proved without using normability.
4.4.2. Our second remark concerns normability. Below we make sure that the normability result of Sec. IV.4.4, as we claimed already several times, actually covers the regular languages, viz. by proving that the system of Sec. IV.4.4 contains our most liberal language AUT-QE*. Let us abbreviate the system of Sec. IV.4.4 by system IV. [This is a system like AUT-QE*, i.e. with application expressions of degree 1, but extended further such that expressions of all degrees are permitted.]
Theorem. System IV contains AUT-QE*.
Proof. This system avoids Q-formulas as indicated in 2.12. For the rest it is like our system ⊢₀, with type-modification rule V.2′ (Sec. 2.11) and without strengthening, but of course with much weaker degree restrictions. The expression formation rules are the familiar rules of AUT-68 and AUT-QE, except perhaps for the appl-rules, which are most similar to the rules in 3.3.11 for the
first version of AUT-QE′. We only consider the 1-appl-expressions. Let (in AUT-QE*) ⊢ A E α, ⊢₁ B Q [x : α]C. By β1δ-reduction we get B ≥ [x : α′]C′ with α Q α′. The substitution theorem and SA′ (and hence β1δ-CL) are as usual valid in system IV, so using induction on AUT-QE*-correctness we get (in system IV) A E α′, ⊢ B ≥ [x : α′]C′, so ⊢ (A)B, q.e.d. □
4.4.3. From our axiomatic introduction in Sec. II.1.3 the actual nature of expressions does not become very clear, viz. that they are just some well-structured symbol-strings. In view of this fact, a verification process for the correctness of expressions must be able to perform the following task: given a correct book and a correct context (mere symbol-strings as well), each symbol-string must, in a finite amount of time, either be recognized as a correct expression (relative to book and context) or be rejected. The verification of such a string can be analyzed in several stages, e.g.:
(1) the bracket structure has to be correct,
(2) the free variables have to occur in the context and the constants have to occur in the book (after this stage the constants in the string can be assigned an arity, and variables and constants get a degree and possibly a typ and a def),
(3) the arity of each constant has to fit the arity of the argument string going with it (only after this stage can we speak of expressions in the sense of Sec. II.1),
(4) degree restrictions (and possibly norm restrictions) must be satisfied,
(5) the type restrictions have to be fulfilled (i.e. of the argument A in (A)B and of the argument string B⃗ in c(B⃗)).
Here it is just stage (1) which represents the context-free part of the verification. The stages (2)-(4) are literally context-dependent, but still trivially recursive. After passing stage (3) an expression is pretyped. From our point of view, stage (5) is the interesting part of the verification. The actually running verification program for Automath languages at Eindhoven University has indeed been organized along these lines (see [Zandleven 73 (E.1)], [van Benthem Jutting 77]). There is a first pass with a "syntax-checker" covering stages (1) and (2). This pass is optional, since there is a next pass with a "translator" covering stages (1)-(4) (but without checking norm-restrictions). And finally there is the "processor", operating on the result of the translator, which covers stage (5).
4.4.4. First we discuss the verification of definitional equalities A ↓ B. We do not want to compute normal forms, but rather design a strategy which, after a few reduction steps in A or B, either results in a common reduct of A and B (if this exists), or enables one to conclude that it does not exist. When confronted with certain A and B during the decision process, we have to answer the following questions:
(1) Shall we do an outside reduction?
(2) If so, on which of the expressions?
The form (or: shape) of A and B (i.e. whether they are abstr- or appl-expressions etc.) plays a crucial role here. E.g. A and B may both be in immune form, i.e. admit no outside reduction at all. In that case either we can immediately decide our definitional equality (if A and B are of different shape, or if A and B are atomic), or we have to split up (or: decompose) the equality into the equalities of the corresponding subexpressions of A and B. But if A and B have different form, not both immune, then an outside reduction is required. The basic construction aim for a decision strategy is of course to minimize, in most of the cases, the total number of reduction steps required for a conclusion: A is equal to B or not. There is of course uncertainty about what happens "in most of the cases", but the intuitive (and possibly questionable) ideas on this subject, underlying the algorithm in the next sections, can be summarized as follows: generally, the definitional equalities arising in the course of the verification and offered to the decision process are true, and a common reduct can be reached in relatively few steps.
4.4.5.
We define new, restricted relations >ₕ, ≥ₕ (h for head reduction) and >ₕ′, ≥ₕ′, which precisely cover:
(1) outside reduction steps,
(2) the reduction steps needed in order to make new outside steps possible.
The relations are given by a simultaneous inductive definition:
(i) B ≥ₕ′ [x : α]C ⇒ (A)B >ₕ′ C[A].
(ii) d(C⃗) >ₕ′ def(d)[C⃗].
(iii) A ≥ₕ (x)D, D ≥ C, x ∉ FV(C) ⇒ [x : α]A >ₕ C.
(iv) A >ₕ′ B ⇒ A >ₕ B.
(v) ≥ₕ′ (resp. ≥ₕ) is the reflexive and transitive closure of >ₕ′ (resp. >ₕ).
I.e. >ₕ′ and ≥ₕ′ are just η-less versions of >ₕ and ≥ₕ. Clearly A >ₕ B ⇒ A ≥ B, and if A >ₕ B (or A >ₕ′ B) then B is a so-called first
main reduct of A.
Remark: This reduction does correspond to the head reduction common in the literature [Barendregt 84a], i.e. to the "first half" of the so-called normal reduction [Curry and Feys 58]. A reduction A ≥ₕ B consists of mere simple head contractions, i.e. (A₁)...(Aₖ)B > (A₁)...(Aₖ)C where B > C is an elementary βδ-reduction, and even only such of these that their reduct eventually becomes a new simple head redex. The unrestricted reduction D ≥ C in clause (iii) is put there on purpose: it is of course possible that internal contractions are needed in order to remove free variables from an expression. The main property of ≥ₕ (or ≥ₕ′, depending on whether η-reduction is present) is: if A ≥ B then A ≥ₕ C ≥ B, where the reduction from C to B consists solely of internal reductions. So if A ≥ B and A, B have different shapes, then A >ₕ A′ ≥ B.
4.4.6. The intuition formulated in 4.4.4 leads us to the idea that a sensible decision process for definitional equalities must search for a common reduct (i.e. an affirmative answer) rather than normalize by means of ≥ₕ (in order to get a negative answer), and that during the reduction process the definitional constants must be saved, i.e. left intact, as much as possible. The strategy presented below (corresponding to what is actually implemented in Eindhoven [Zandleven 73 (E.1)]) can indeed be characterized by the following principles:
(1) decomposition is preferred above main reduction,
(2) β-reduction is preferred above δ-reduction (which is preferred above η-reduction),
(3) reduction of a "younger" definitional constant is preferred above reduction of the "older" one.
For example, if it is to be decided whether (A)B ↓ (C)D, the process first tries decomposition: B ↓ D and A ↓ C. If this succeeds, i.e. B ≥ F ≤ D and A ≥ G ≤ C, then we have a common reduct (G)F. Only after this has failed is an outside reduction attempted on one of the expressions: e.g. (A)B >ₕ E, i.e. B ≥ [x : α]F, E ≡ F[A], and the new question to be decided is E ↓ (C)D. Was no outside reduction possible, then the other expression is tackled: (C)D >ₕ E is tried, possibly resulting in a new question (A)B ↓ E. And, when confronted with the question (A)B ↓ d(C⃗), the process tries to main reduce the appl-expression rather than the other one.
4.4.7. The inductive definition of
>ₕ and ≥ₕ can be read as a recursive algorithm for deciding questions of the form A >ₕ B, ∃B (A >ₕ B), ∃α ∃B (A ≥ₕ [x : α]B) etc. We give our algorithm for deciding ↓ also in the form of an inductive definition. Here are the rules:
(0) Exchange: B ↓ A ⇒: A ↓ B.
(i) Variable, τ: A ≥ₕ x ⇔: A ↓ x, and A ≥ₕ τ ⇔: A ↓ τ.
(ii) Prim: (A ≥ₕ p(C⃗), C⃗ ↓ B⃗) ⇔: A ↓ p(B⃗).
(iii) Appl-appl, decompose: B ↓ D, A ↓ C ⇒: (A)B ↓ (C)D.
(iv) Appl, β-red: (A)B >ₕ C ⇒ (C ↓ D ⇔: (A)B ↓ D).
(v) Def-def, decompose: B⃗ ↓ C⃗ ⇒: d(B⃗) ↓ d(C⃗).
(vi) Def, δ-red: d(B⃗) >ₕ C ⇒ (C ↓ D ⇔: d(B⃗) ↓ D).
(vii) Abstr-abstr, decompose: α ↓ β, A ↓ B ⇔: [x : α]A ↓ [x : β]B.
(viii) Abstr, η-red: [x : α]A >ₕ B ⇒ (B ↓ C ⇔: [x : α]A ↓ C).
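The priority scheme of these clauses can be sketched as a recursive procedure. The Python below is only a loose illustration on ad-hoc tuple terms, not the implemented verifier: δ-constants, η-reduction and the τ/prim clauses are left out, so only the β-fragment of clauses (i), (iii), (iv) and (vii), together with the exchange idea of clause (0), is modelled; the `fuel` counter plays the role of the step-count warning of 4.4.9 (1), and `subst` is capture-naive (enough for closed toy examples).

```python
def subst(term, x, arg):
    """term[x := arg] on tuples ('var',v) | ('app',A,B) for (A)B | ('abs',x,dom,body)."""
    tag = term[0]
    if tag == 'var':
        return arg if term[1] == x else term
    if tag == 'app':
        return ('app', subst(term[1], x, arg), subst(term[2], x, arg))
    _, y, dom, body = term
    if y == x:                          # x is shadowed: substitute in the domain only
        return ('abs', y, subst(dom, x, arg), body)
    return ('abs', y, subst(dom, x, arg), subst(body, x, arg))

def head_step(term):
    """One beta head step (A)B > C[A], reducing B until an abstraction appears."""
    if term[0] != 'app':
        return None
    arg, f = term[1], term[2]
    while f is not None and f[0] != 'abs':
        f = head_step(f)
    return None if f is None else subst(f[3], f[1], arg)

def equal(a, b, fuel=1000):
    """Decide the question a | b, decomposition before outside reduction."""
    if fuel <= 0:
        raise RuntimeError('too many reduction steps')   # warning device, 4.4.9 (1)
    if a == b:
        return True
    if a[0] == 'var' and b[0] == 'var':                  # clause (i): terminal
        return False
    if a[0] == 'abs' and b[0] == 'abs':                  # clause (vii): terminal (UD)
        return equal(a[2], b[2], fuel - 1) and equal(a[3], b[3], fuel - 1)
    if a[0] == 'app' and b[0] == 'app' \
            and equal(a[1], b[1], fuel - 1) and equal(a[2], b[2], fuel - 1):
        return True                                      # clause (iii): decompose first
    ra = head_step(a)                                    # clause (iv): beta head step
    if ra is not None:
        return equal(ra, b, fuel - 1)
    rb = head_step(b)                                    # clause (0): exchange, try b
    if rb is not None:
        return equal(a, rb, fuel - 1)
    return False
```

Note how the order of the tests realizes principle (1) of 4.4.6: for a pair of appl-expressions, decomposition is attempted before any head step is taken.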
The notation B⃗ ↓ C⃗ is used in the ordinary sense, i.e. B₁ ↓ C₁, B₂ ↓ C₂ etc. The clauses (i)-(viii) are given in their order of priority; they have to be tried successively until a clause applies. Clause (0) must only be applied, and of course only once:
(1) if none of the rules (i)-(viii) applies,
(2) if by the exchange a rule of higher priority among (i)-(viii) can be made to apply,
(3) in case the question d(B⃗) ↓ e(C⃗) is presented, where e is a "younger" definitional constant than d.
The clauses containing a bi-implication ((i), (ii), (vii)) are terminal: if application of one of these rules does not lead to an affirmative answer, a negative conclusion about the presented definitional equality can be drawn. This is in contrast with the other clauses, e.g. clause (iii): if not (A ↓ C), so not (A ↓ C and B ↓ D), then it is of course very well possible that rule (iv) produces a common reduct of (A)B and (C)D. Further, a negative conclusion can be drawn if after exchanging still no clause applies at all. If η-reduction is not allowed, then one has to read >ₕ′ and ≥ₕ′ instead of >ₕ and ≥ₕ, and rule (viii) has to be skipped.
4.4.8. It should be clear that the algorithm above, on the correct expressions, indeed corresponds with ↓. The only interesting point is the bi-implication in clause (vii), which makes that clause (viii) never has to be applied to a pair of abstr-expressions. This is justified by our property UD (for correct expressions only) from the previous sections. We also have to show the termination of the algorithm (this shows the decidability of ↓ once more). First, the questions concerning >ₕ and ≥ₕ (e.g. whether A ≥ₕ [x : B₁]B₂ for certain B₁, B₂) are decidable on behalf of SN. Secondly, the procedure sketched above (for deciding A ↓ B) is easily shown to terminate by induction on
(1) ϑ(A) + ϑ(B),
(2) l(A) + l(B),
where ϑ stands for length of reduction tree and l stands for length of expression. Clearly the η-rule (viii) is equivalent to:
A ↓ (x)B, B ≥ C, x ∉ FV(C) ⇒ [x : α]A ↓ B.
By a careful implementation of the handling of bound variables (this falls outside the scope of my thesis) it can be guaranteed that, whenever during actual verification an equality [x : α]A ↓ B is offered to the decision procedure, B does not contain free occurrences of the same variable x. This enables us to modify (viii) into the simpler rule (viii′): A ↓ (x)B ⇒ [x : α]A ↓ B, which completely avoids the nasty internal reductions in the course of an outside η-reduction. The termination of the algorithm is still guaranteed with this new rule; we can even use the same induction as before, because it can be shown that rule (viii′) will never be applied with a B such that B ≥ₕ [y : β]C.
4.4.9. In accordance with our views on the actual verification process, it may be
sensible to provide the decision procedure with a device which gives a warning in the following cases:
(1) If the decision process requires too much time, or rather: too many reduction steps.
(2) If a question d(B⃗) ↓ d(C⃗) or (A)D ↓ (F)G is posed, and not (B⃗ ↓ C⃗), resp. not (D ↓ G and A ↓ F), has been concluded.
The warnings in case (2) can be partly motivated by the idea that most defined constants in an Automath book are "λI-constants" (see [van Daalen 80, III.5.5.3, III.6.3]) and that most functions in an Automath book are λI-functions, where D is a λI-function if: D ≥ [x : α]F ⇒ x ∈ FV(F). The following example shows, however, that this motivation is not quite satisfactory: D ≡ G ≡ [x : α](V)x, A ≡ [y : β]p(y, V), F ≡ [y : β]p(y, y).
4.4.10. Now we discuss the verification of E-formulas. Since the definitions of cantyp in 4.2.3, with their computation of normal forms, are very unpractical, we prefer the alternative approach sketched in 4.1.4. Besides, the latter approach avoids the different definitions of cantyp and is, by uniformity, easier to implement for several languages simultaneously.
As our "universe", the large language which we use to decide our E-formulas, we take AUT-QE*. Let ⊢ denote correctness in AUT-68, AUT-68+ or AUT-QE(+), and let ⊢* stand for correctness in AUT-QE*. One easily proves by induction on A, using LQ, CLPT etc. for ⊢*, the important properties:
(1) ⊢ A ⇒ ⊢* typ(A), unless A is a 2-expression in AUT-68(+), and
(2) ⊢ A E B ⇔ ⊢ A, ⊢ B, ⊢* typ(A) ⊂ B, except, trivially, the degree 2 case of AUT-68(+): ⊢₂ A E B ⇔ ⊢₂ A, ⊢ B, typ(A) Q B.
This justifies the equivalence mentioned in 4.1.4.
The ↓-procedure of Sec. 4.4.7 can be adapted in order to decide ↓ and ⊂ simultaneously, by making some obvious modifications, e.g.:
- Clause (0) becomes: B ↓ / ⊂ / ⊃ A ⇔: A ↓ / ⊃ / ⊂ B (where "B ↓ / ⊂ / ⊃ A" reads "B ↓ A resp. B ⊂ A resp. B ⊃ A", etc.).
- To clause (i) there is added: degree(A) = 1 ⇒ A ⊂ τ.
- Clause (vii) becomes: α ↓ β, A ↓ / ⊂ / ⊃ B ⇔: [x : α]A ↓ / ⊂ / ⊃ [x : β]B,
etc. We do not bother to give a practical algorithm for deciding E in AUT-QE′ and AUT-QE*, because we think that these languages serve a merely theoretical purpose.
4.4.11. Rather than computing domains via the domain normal forms (dnf's) of Sec. 4.2.4.4, we use the alternative approach of 4.1.6 of searching through the →-reduction tree of an expression. Recall that → is generated by (1) ordinary reduction, (2) taking typ.
We promised the following theorem.
Theorem. → is well-founded on the correct expressions.
Proof. As long as we stay inside the correct expressions we can use a double induction, viz. (1) on degree, (2) on ϑ (= length of reduction tree). For, reduction preserves degree and decreases ϑ, and taking typ decreases degree. We must be a bit careful with applying typ to a degree 2 AUT-QE*
expression (such as, e.g., can originate by taking typ of a degree 3 AUT-QE expression), because an incorrect and even non-normable 1-expression might arise. A typical example is (A)τ. However, this does no harm to the well-foundedness, because β1-SN can be proved, without using norms at all, for all degree correct expressions. □
Also, we have another uniqueness result (compare 4.2.4.2).
Theorem. A correct, A → [x : α]C, A → [x : β]D ⇒ α ↓ β.
Proof. For 3-expressions A we even have a kind of CR-result: A ≥ A′ ⇒ typ(A) ↓ typ(A′). Now let degree(A) = 2, and let A ≥ A′. In AUT-68(+) and AUT-QE(+) this gives ⊢* typ(A) ↓ typ(A′), but in AUT-QE* this is not generally true, because typ(A) and typ(A′) need not be correct. Luckily, such incorrect 1-expressions (see the proof of the previous theorem) never reduce to an abstr-expression. So by UD we still get the desired result. □
The internal η-reductions included in → are of course useless during domain computation, where one only wants to reach an abstr-expression. So in an algorithm for domain computation we rather employ a restriction of →, which we name →ₕ and which is generated by head reduction ≥ₕ′ and taking typ. In general, unrestricted search through the →ₕ-reduction tree can be permitted, provided the degree restrictions are respected. However, the 2-expressions of AUT-QE and AUT-QE+ form an exception. Here the search for an abstr-expression has to start with taking typ. Otherwise too many expressions would get a domain, which would give rise to typical AUT-QE* appl-expressions. Besides, unrestricted search can be very unpractical. E.g. in AUT-68(+) one never needs to inspect 1-expressions: if the 2-expressions in the →ₕ-reduction tree fail to produce a domain, going to the 1-expression by taking typ will not help. In general it is no good strategy to start the domain computation with reduction, unless we are obliged to because the expression under consideration is already of minimal value degree.
So, a simple and probably rather practical strategy for AUT-68(+) and AUT-QE(+) may run as follows. Let A be the expression we start with. Take typ until one arrives at an expression of minimal value degree. Then reduce (with ≥ₕ) until one possibly finds a domain. If this does not succeed, A can still have a domain if it is a 3-expression of AUT-QE(+); otherwise A has no domain.
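The simple strategy just described can be sketched as follows. This is a toy illustration only, not the actual program: `typ` and `degree` are caller-supplied functions standing in for the typing and degree computations of the language, `min_degree` for the minimal value degree, only β head steps are performed, and the exceptional AUT-QE(+) continuation of 4.4.12 is omitted.

```python
def subst(term, x, arg):
    """Capture-naive substitution on toy tuple terms (see the sketch in 4.4.7)."""
    tag = term[0]
    if tag == 'var':
        return arg if term[1] == x else term
    if tag == 'app':
        return ('app', subst(term[1], x, arg), subst(term[2], x, arg))
    _, y, dom, body = term
    if y == x:
        return ('abs', y, subst(dom, x, arg), body)
    return ('abs', y, subst(dom, x, arg), subst(body, x, arg))

def head_step(term):
    """One beta head step, or None if the term is immune."""
    if term[0] != 'app':
        return None
    arg, f = term[1], term[2]
    while f is not None and f[0] != 'abs':
        f = head_step(f)
    return None if f is None else subst(f[3], f[1], arg)

def domain(term, typ, degree, min_degree=2):
    # stage 1: climb with typ down to the minimal value degree
    while degree(term) > min_degree:
        term = typ(term)
    # stage 2: head-reduce at this degree, looking for an abstr-expression
    while term is not None:
        if term[0] == 'abs':
            return term[2]              # the domain alpha of [x:alpha]C
        term = head_step(term)
    return None                         # no domain found at this degree

# hypothetical example: a variable f of degree 3 whose typ is [x:alpha]tau
f = ('var', 'f')
typ = {f: ('abs', 'x', ('var', 'alpha'), ('var', 'tau'))}.get
deg = lambda t: 3 if t == f else 2
print(domain(f, typ, deg))   # -> ('var', 'alpha')
```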
4.4.12. In the indicated case, unrestricted search of the →ₕ-reduction tree of typ(A) is required, to be executed as follows: one-step reduce (typ(A) >ₕ B), then take typ, then reduce (with ≥ₕ′). If this does not yield a domain, one-step reduce B once more, etc. The well-foundedness of → guarantees the termination of this procedure.
The language theory of Automath, Chapter VI (C.5)    577
VI. THE βη-CHURCH-ROSSER PROBLEM OF GENERALIZED TYPED λ-CALCULUS

VI.1. Introduction
1.1. The problem with βη-CR in Automath-like languages was first pointed out in [Nederpelt 73 (C.3)]. Let x ∉ FV(β); then
[x : α]C <β [x : α](x)[x : β]C >η [x : β]C,
and the question is whether [x : α]C and [x : β]C have a common reduct, i.e. whether βη-CR1 holds. In untyped λ-calculus this case of CR1 is particularly trivial, because without the type-labels there just remains
λx.C <β λx.(λx.C)x >η λx.C,
and for the common reduct we can simply take λx.C itself. If [x : α](x)[x : β]C is not necessarily correct, a common reduct need not exist, for α and β can be arbitrary expressions. Nederpelt already conjectured that for correct expressions βη-CR (so βη-CR1) does hold. This we shall prove below, making free use of the results of the previous chapter, in particular Sec. 3. So, if ⊢ [x : α](x)[x : β]C, then by SA we know α Q β, so [x : α]C Q [x : β]C; but we know nothing about a common reduct. It is possible that certain versions of the algorithmic definition allow a proof of βη-CR1. But then it is not so easy to infer CR, because we do not yet know CL for the algorithmic system. An alternative to the approach below is presented in the next chapter. There CR and CL are proved simultaneously for an algorithmic system, by induction on so-called big trees.
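The critical pair of 1.1 can be exhibited concretely. The fragment below is only a demonstration on ad-hoc tuple terms, with ('var', v), ('app', A, B) for (A)B and ('abs', x, dom, body) for [x : dom]body; it shows that the η-reduct and the β-reduct of [x : α](x)[x : β]C coincide except in their domains, which is exactly the information the type-labels add over the untyped case.

```python
alpha, beta, C = ('var', 'alpha'), ('var', 'beta'), ('var', 'c')  # x not free in C

# T = [x:alpha](x)[x:beta]C
T = ('abs', 'x', alpha, ('app', ('var', 'x'), ('abs', 'x', beta, C)))

# outside eta-step: [x:alpha](x)[x:beta]C > [x:beta]C
eta_reduct = T[3][2]

# beta-step on the inner redex, under the binder: since x is not free in C,
# (x)[x:beta]C > C, so the whole expression reduces to [x:alpha]C
beta_reduct = ('abs', 'x', alpha, C)

print(eta_reduct)    # ('abs', 'x', ('var', 'beta'), ('var', 'c'))
print(beta_reduct)   # ('abs', 'x', ('var', 'alpha'), ('var', 'c'))
# The bodies coincide; only the domains alpha and beta remain to be reconciled.
assert eta_reduct[3] == beta_reduct[3] and eta_reduct[2] != beta_reduct[2]
```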
1.2. Below we concentrate on βη-reduction and leave δ-reduction out of consideration. It is easy to extend our result to βηδ-CR, since δ commutes with βη-reduction:
B ≤δ A ≥βη C ⇒ B ≥βη D ≤δ C,
and, of course, δ-CR holds. We start (in Sec. 2) with a partial solution of the βη-problem, for η-reduction of degree 2, which works for regular languages only. Then (Sec. 3) we prove full βη-CR.
VI.2. A first result concerning βη-CR for regular languages
2.1. We prove the Church-Rosser property for regular languages with a reduction relation ≥ generated by β-reduction and η2-reduction, i.e. η-reduction of degree 2: degree(A) = 2, x ∉ FV(A) ⇒ [x : α](x)A >η2 A.
The motivation for studying this restricted βη-reduction lies in the fact that the actual verification of mathematics in AUT-QE (in particular, of Jutting's Landau translation, see [van Benthem Jutting 77]) required just this specific type of η-reduction. I.e. the Automath texts offered to the verification program appeared to be correct βδη2-AUT-QE.
2.2. Heuristics
The idea is to proceed in two stages. First we consider a seemingly weaker form of η2-reduction, which is tailor-made to avoid the critical βη-case mentioned in the introduction. For this restricted βη2-reduction we prove CR. Afterwards (Sec. 2.5) it is shown that full βη2-equality is equivalent to the restricted form. This can be compared with the situation in Sec. V.3.3.8, where η-equality turned out to be provable. How to define the restricted form of η-reduction? I.e. under which conditions do we permit the reduction of [x : α](x)A to A? Clearly, we require:
(1) x ∉ FV(A).
Further, we require that A is not of the form [y : β]C, to avoid the critical case. But this is not enough. Consider, e.g., [x : α](x)F, where F ≥ [y : F₁]F₂, x ∉ FV(F). So we require:
(2) ¬(A ≥ [y : β]C),
i.e. A does not reduce to an expression of the form [y : β]C. Thirdly, we want to preserve the substitution lemma
B > B′ ⇒ B[D] > B′[D],
at least for D of degree 3, so we further require:
(3) degree(A) = 2.
This shows why the method works for regular languages only. Condition (2) can now be weakened to
(2′) ¬(A ≥β2 [y : β]C),
or, in the presence of δ-reduction, to: ¬(A ≥β2δ [y : β]C).
(1)
> is the disjoint one-step reduction generated by the elementary reductions:
The language theory of Automath, Chapter VI (C.5) (i) (A) [z : B] C > C[A]. (ii) z 6 FV(A), A 2; [y : PIC, degree(A) = 2 (2) 2 is the transitive closure of
+
579
[z : a] (z) A
> A.
>.
2.4. The proof of CR for the restricted reduction
2.4.1. Substitution lemma I.
(i) A > A′ ⇒ B[A] > B[A′].
(ii) A ≥ A′ ⇒ B[A] ≥ B[A′].
Proof. As usual, by induction on B and ≥, respectively. □
2.4.2. Weak βi-βj-postponement: if i ≠ 3 and A is degree correct, then
A ≥ B ⇒ A ≥βi C ≥βj D ≤βj B.
Proof. If a βj-contraction produces an essentially new βi-redex, then i = 3 or i = j. If i = j there is nothing to prove, so unless i = 3 we have: A >βj >βi B ⇒ A >βi ≥βj C ≤βj B. So, using β-SN, βi-CR and the fact that βi and βj commute, we get the desired property, as in [van Daalen 80, II.7.4]. □
2.4.3. Something about
β2 (for degree correct expressions).
(i) degree(B) = 2, B ≥ [y : C]D ⇒ B ≥β2 [y : C′]D′.
(ii) If degree(B) = 2 and degree(A) = degree(x) = 3, then
B[x/A] ≥β2 [y : C]D ⇒ B ≥β2 [y : C′]D′.
Proof.
(i) Let B ≥ [y : C]D, degree(B) = 2. By βη-postponement and weak β2-β3-postponement we get B ≥β2 F ≥β3 G ≤β3 H ≥η [y : C]D. Then H, G, F are abstraction expressions, q.e.d.
(ii) Use the square brackets lemma (II.11.5, IV.2.4) and the previous property. □
2.4.4. Substitution lemma 11. Ifdegree(A) = degree(z) = 3 and A, B are degree correct then (i) B > B’ (ii) B 2 B’
+ +
B[z/A]
> B’[z/A].
B[z/A] 2 B’[z/A].
580
D.T. vm Daalen
Proof. (i) By induction on B. The crucial case is when B = [y : B1] (y) B2, y # FV(Bz), B2 2; [y : C]D, degree(B) = degree(&) = 2. Of course, y # FV(Bz[A]), degree(&[A]) = 2 and, by 2.4.3 (ii) &[A] 2; [y : C]D. So B[A] E [y : B1[A]] (y) Bz[Az] > &[A] q.e.d. (ii) By induction on 2.
0
2.4.5. Theorem (CR1 for the restricted reduction): if A is degree correct, then
A > B, A > C ⇒ B ↓ C.
Proof. Let A > B, A > C. By induction on A we define a common reduct D of B and C. The crucial cases are:
(i) A ≡ (A₁)[x : A₂]A₃, B ≡ A₃[A₁] (by β-red.), C ≡ (A₁′)[x : A₂′]A₃′ (by monotonicity). Take D ≡ A₃′[A₁′] and use the substitution lemmas.
(ii) A ≡ (A₁)[x : A₂](x)A₃, B ≡ (A₁′)A₃ (by η-red. and monotonicity), C ≡ (A₁)A₃ (by β-red.). Simply take D ≡ B.
(iii) A ≡ [x : A₁](x)A₂, B ≡ A₂ (by η-red.), C ≡ [x : A₁′](x)A₂′ (by monotonicity). Clearly degree(A₂′) = degree(A₂) = 2 and x ∉ FV(A₂′). If A₂′ ≥β2 [y : C₁]C₂ then A₂ ≥ [y : C₁]C₂, so by 2.4.3 (i) A₂ ≥β2 [y : C₁′]C₂′, contradicting the side condition of the η-step A > B. Hence ¬(A₂′ ≥β2 [y : C₁]C₂), so D ≡ A₂′ can serve as the common reduct. □
2.4.6. Corollary. If A is degree correct and normable, then CR(A).
Proof. By induction on the reduction tree of A. □
2.5. The extension to full βη2-reduction
2.5.1. From now on we label the notions referring to the restricted reduction with a subscript o. Thus we write >ₒ, ≥ₒ and ↓ₒ, and by ⊢ₒ we denote correctness in AUT-QE(+) with an equality relation Qₒ generated, e.g., by
⊢ₒ A, ⊢ₒ B, A >ₒ B or B >ₒ A ⇒ A Qₒ B.
By 2.4.6 we have
A Qₒ B ⇒ A ↓ₒ B.
On the other hand, the notations without a subscript have to be interpreted in terms of "full" βη2-reduction. Thus, we write ⊢ for correctness in AUT-QE(+) with equality Q, generated by
⊢ A, ⊢ B, A > B or B > A ⇒ A Q B.
581
+>
2.5.2. First we go through some theory of the o-language (i.e. with ⊢ₒ and Qₒ). The theorems about renaming of contexts and weakening (see V.2.9) are still valid. We have a restricted substitution theorem: if ξ = (ξ₀, ξ₁), ξ₁ ≡ y⃗ E β⃗, all yᵢ in y⃗ have degree 3, and ⊢ₒ B⃗ E β⃗[B⃗], then
ξ ⊢ₒ C (E/Qₒ D) ⇒ ξ₀ ⊢ₒ C[B⃗] (E/Qₒ D[B⃗]).
So we have the single substitution theorem: if degree(y) = 3, then
⊢ₒ B E β, y E β ⊢ₒ C (E/Qₒ D) ⇒ ⊢ₒ C[B] (E/Qₒ D[B]).
Hence, from SAi we can infer βi-CLPT, as usual. Now SA2 works precisely as in the previous chapter (V.3.2.4), so we may assume β2-CL.
2.5.3. The proof that ⊢ ⇒ ⊢ₒ and Q ⇒ Qₒ goes by induction on ⊢. The only interesting case is when ⊢₂ [x : α](x)A, x ∉ FV(A), A ≥β2 [x : A₁]A₂. Then η2-reduction is possible, but restricted reduction is not. So from ⊢ A one gets ⊢ [x : α](x)A Q A, and we would like to show that ⊢ₒ [x : α](x)A Qₒ A holds as well. By the ind. hyp. ⊢ₒ [x : α](x)A and ⊢ₒ A, and by β2-CL A Qₒ [x : A₁]A₂ and [x : α](x)A Qₒ [x : α](x)[x : A₁]A₂ Qₒ [x : α]A₂. By SA2 α Qₒ A₁, so by the substitution theorem [x : α]A₂ Qₒ [x : A₁]A₂, whence [x : α](x)A Qₒ A.
2.5.4. So the o-language is equivalent with the βη2-language, for which the properties CL, PT, SA etc. can be proved as in the previous chapter. Now let A Q B. By the equivalence A Qₒ B, and by CR A ↓ₒ B, so a fortiori we have CR for full βη2-reduction. Extension to the corresponding δ-language is possible as in Sec. V.3.3.

VI.3. A proof of CR for full βη-reduction from closure and strong normalization
3.1. The assumptions
3.1.1. In contrast with the proof in the previous section, the sequel does not presuppose regularity of the language. So, after having proved CL for, e.g., Nederpelt's Λ, the present proof applies to this language. We assume that correctness of expressions and equality formulas is defined relative to a correct book B and a context ξ. The book is fixed throughout this section and omitted in the notation. Below we introduce an extended reduction relation and a correspondingly extended equality. Since we want to employ our usual notations ≥, Q for these
new relations, we write ≥ₒ and Qₒ for the ordinary βη-reduction and the corresponding equality relation, generated e.g. by
ξ ⊢ A, ξ ⊢ B, A ≥ₒ C ≤ₒ B ⇒ ξ ⊢ A Qₒ B.
We use our ordinary shorthand notation, writing ξ ⊢ A for B, ξ ⊢ A, and A Qₒ B for ξ ⊢ A Qₒ B, etc.
= (170,771)
(2) Soundness of equality w.r.t. abstraction,
(3) W.r.t. application,
(a consequence of La, see below). (4) And w.r.t. substitution
(also a consequence of LQ). ( 5 ) Closure: k A, A 2, B =+ I- B .
( 6 ) SA, so (this concerns directly the critical pg-case)
[x : a](z) [y : PIC
+- x E a
a Q, p
.
(7) Strong normalization (with respect to 2,): l- A =+ SN(A). Remark: the properties (3) and (4) depend on La. As we know (see V.3.3.10) LQ fails in AUT-QE(+) with &reduction, but CR for these languages can be proved in two ways:
(1) From CR for AUT-QE*.
(2) By first proving CR for a δ-less version, and then extending the result by using UE [i.e. an unessential extension result].
3.2.1. Heuristics
We saw that in the critical case of βη-reduction the two direct reducts of [x : α](x)[x : β]C are syntactically equal (≡) but for the domains α and β, which are just definitionally equal (Qₒ). Below we define the relation ≈ which precisely covers this kind of syntactic similarity, intermediate between ≡ and Qₒ. It would be straightforward to try and prove a modified CR-property
B ≤ₒ A ≥ₒ C ⇒ B ≥ₒ D ≈ D′ ≤ₒ C
by proving ≈-postponement, i.e.
A ≈ B ≥ₒ C ⇒ A ≥ₒ B′ ≈ C.
However, there is a problem with the latter property if A ≡ [x : α](x)A₁, B ≡ [x : α](x)C, x ∉ FV(C), A₁ ≈ C. For it is possible that x ∈ FV(A₁). So we take a different approach. We define an extended reduction relation > which is disjoint βη-one-step reduction, enriched by the clause
A ≈ B ⇒ A > B (elementary ≈-reduction).
This means that internal contractions in the domains are ignored for the bookkeeping of reduction steps. For the new reduction relation we can simply prove CR1. Further, there holds a certain version of >-SN, which gives us CR.
3.2.2. Structure of the proof

We point out the difference with the approach in Sec. VI.2. There we first restricted our reduction relation, proved CR for the restricted reduction and then extended the result to the original reduction. On the other hand, here we start with proving CR for the extended reduction relation ≥, and afterwards we still must prove CR for ≥₀. In fact we first prove modified uniqueness of >-normal form, i.e. uniqueness with respect to ≈: A Q B, A and B >-normal ⟹ A ≈ B. And then, using the equivalence of Q₀ and Q, uniqueness of >₀-normal form. So we have ≥₀-CR. For a comparison of ≥₀- and ≥-normalisation see Sec. 3.7.1 below.

3.3. Definition of the extended reduction relation

3.3.1. By simultaneous inductive definition we introduce the syntactic similarity ≈, the extended reduction relation ≥, with one-step reduction >, and the extended definitional equality Q, between correct expressions, as follows.
(I) Elementary reductions

(1) (A)[x : B]C > C[A] (β-reduction)
(2) [x : B](x)C > C if x ∉ FV(C) (η-reduction)
(3) A ≈ B ⟹ A > B (≈-reduction)

(II) Monotonicity rules

(1) A > A', B > B' ⟹ (A)B > (A')B'
(2) x ∈ α ⊢ A > A' ⟹ [x : α]A > [x : α]A'
(3) A₁ > A₁', ..., Aₖ > Aₖ' ⟹ c(A₁, ..., Aₖ) > c(A₁', ..., Aₖ')

(III) (1) ≥ is the transitive closure of >
(2) Q is the equivalence generated by >

(IV) (1) A ≈ A
(2) α Q α', x ∈ α ⊢ B ≈ B' ⟹ [x : α]B ≈ [x : α']B'
(3) A ≈ A', B ≈ B' ⟹ (A)B ≈ (A')B'
(4) A₁ ≈ A₁', ..., Aₖ ≈ Aₖ' ⟹ c(A₁, ..., Aₖ) ≈ c(A₁', ..., Aₖ')
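To make rule group (IV) concrete, here is a small illustrative sketch (not part of the original text): expressions are encoded as nested Python tuples ('var', x), ('abs', x, dom, body) for [x : dom]body and ('app', arg, fun) for (arg)fun — all hypothetical choices — and the parameter defeq is a stand-in for the definitional equality Q. Bound-variable renaming is ignored by comparing binder names directly, a simplification of the conventions used here.

```python
def similar(a, b, defeq):
    """Syntactic similarity: identical shape everywhere, except that
    domains of abstractors need only be definitionally equal (IV.2)."""
    if a == b:                       # covers rule IV.1
        return True
    if a[0] != b[0]:
        return False
    if a[0] == 'abs':                # rule IV.2
        _, x, dom_a, body_a = a
        _, y, dom_b, body_b = b
        return x == y and defeq(dom_a, dom_b) and similar(body_a, body_b, defeq)
    if a[0] == 'app':                # rule IV.3
        _, arg_a, fun_a = a
        _, arg_b, fun_b = b
        return similar(arg_a, arg_b, defeq) and similar(fun_a, fun_b, defeq)
    return False                     # distinct variables/constants
```

On the critical pair of 3.2.1, the two reducts [x : α]C and [x : β]C come out similar exactly when defeq(α, β) holds.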
3.3.2. Some remarks concerning the definition

3.3.2.1. It is not necessary to define the above notions simultaneously. For in view of 3.4.3 below, we might as well have taken instead of IV.2:

(IV.2') α Q₀ α', x ∈ α ⊢ B ≈ B' ⟹ [x : α]B ≈ [x : α']B'.

3.3.2.2. Except for the rules I.3 and II.2, the rules of I and II are the ordinary rules for disjoint one-step βη-reduction. Rule I.3 can be considered a strong form of the reflexivity rule A > A. Rule II.2 is one half of the usual monotonicity rule for abstr. expressions. The other half can be derived using IV.1, IV.2 and I.3: if α > α' then α Q α', further A ≈ A, so [x : α]A ≈ [x : α']A, so [x : α]A > [x : α']A.

3.3.2.3. If we had defined > to be the corresponding "nested" one-step reduction we might have been able to prove the diamond property for >. Then we could have avoided the appeal to SN when deriving CR from CR1. [Nested one-step reduction ≥₁ is like disjoint one-step reduction, but additionally redices may be contracted in one step when they occur inside each other; typical clause: A ≥₁ A', C ≥₁ C' ⟹ (A)[x : B]C ≥₁ C'[A'].]
3.4. Some easy properties

3.4.1. By simultaneous induction on Definition 3.3.1, using the soundness of Q₀ w.r.t. expression formation, we get:

if A > A' or A ≥ A' or A Q A' or A ≈ A' then A Q₀ A'.

3.4.2. From 3.3.2.2 it is clear that ≥ satisfies all the monotonicity rules and that

A >₀ B ⟹ A > B,  A ≥₀ B ⟹ A ≥ B,  and  A Q₀ B ⟹ A Q B.
3.4.3. So combining this we have Q₀ ⟺ Q. As a corollary we have the monotonicity rules 3.1.2 (2)-(4) now also for Q. The monotonicity of ≈ is immediate. Further, ≈ is an equivalence relation.

3.5. On ≈-reduction and normalisation

3.5.1. In certain λ-calculus systems (see, e.g. [Curry and Feys 58]) renaming of bound variables is not ignored (as we do here) but formalised in the form of α-reduction:
Then (see our definition of substitution, Sec. II.2.4) it is possible that α-reductions are needed before some β-reduction can be carried out. In such systems, a suitable definition of proper reduction sequence is: a sequence C₁ > C₂ > ... in which only a finite number of α-reductions occur. I.e. a reduction sequence is proper if from a certain Cₙ on, only α-reductions are applied. Similarly, C is normal if only α-reductions of C are possible.
3.5.2. Here we treat the ≈-reductions analogously, as extended α-reduction, and call them improper reductions. Proper reduction sequences are reduction sequences in which only a finite number of such improper reductions occur. An expression is now SN if all its proper reduction sequences terminate, and normal if only improper reductions are possible. So

A normal, A > A' ⟹ A ≈ A'.
3.5.3. In 3.5.1 we mentioned the possibility that α-reductions create new β-redices. For ≈-reductions this is not the case. Let >β (resp. >η) denote the disjoint one-step reduction generated by the rules (I.1) (resp. (I.2)) and (II) of 3.3.1. So, e.g., A >β A' if some β-redices not lying inside a "domain" are contracted. Then we have, indeed, β≈-postponement:

A ≈ B >β C ⟹ A >β B' ≈ C.

However, η≈-postponement fails, because ≈-reductions can create new η-redices (see 3.2.1). Fortunately we have ≈η-postponement instead:

A >η B ≈ C ⟹ A ≈ B' >η C.

3.5.4. Now we can prove SN (in the sense of 3.5.2). Let a proper reduction sequence C₁ > C₂ > ... be given. If no β-step turns up then the sequence terminates, because from some Cₙ on only η-steps are applied, which decrease the length of the expression. Otherwise, for some n, by ≈η-PP

C₁ ≥η Γ >β Cₙ₊₁,

and by βη-PP and β≈-PP

C₁ >β Γ' ≈ Γ'' ≥η Cₙ₊₁.

By SN with respect to ≥β, θβ(C) is defined for correct C, and θβ(C₁) > θβ(Γ'). So by induction on θβ we can prove SN.
3.6. CR for ≥

3.6.1. Substitution lemma I. If ⊢ B[A] and ⊢ B[A'] then

(i) A > A' ⟹ B[A] > B[A'].
(ii) A ≥ A' ⟹ B[A] ≥ B[A'].
(iii) A Q A' ⟹ B[A] Q B[A'].
(iv) A ≈ A' ⟹ B[A] ≈ B[A'].

Proof. All parts can be proved separately by ind. on B, using the monotonicity rules for >, ≥, Q and ≈. □
3.6.2. Substitution lemma II. If ⊢ B[A] and ⊢ B'[A] then

(i) B > B' ⟹ B[A] > B'[A].
(ii) B ≥ B' ⟹ B[A] ≥ B'[A].
(iii) B Q B' ⟹ B[A] Q B'[A].
(iv) B ≈ B' ⟹ B[A] ≈ B'[A].

Proof. By simultaneous induction on the definition of >, ≥, Q and ≈. □

3.6.3. Main lemma (CR1): If A correct, B ≤ A ≥ C then B ↓ C.
Proof. By ind. on A. If A ≈ B then for the common reduct D we can take D ≡ C. Similarly if A ≈ C. In case A ≡ (A₁)A₂, B ≡ (B₁)B₂, C ≡ (C₁)C₂, B₁ ≤ A₁ ≥ C₁, B₂ ≤ A₂ ≥ C₂, then by the ind. hyp. and by monotonicity of ≥ we find a common reduct (D₁)D₂ with B₁ ≥ D₁ ≤ C₁, B₂ ≥ D₂ ≤ C₂. Similarly if A ≡ c(A₁, ..., Aₖ). Further distinguish:

(i) A ≡ (A₁)[x : A₂]A₃, B ≡ (B₁)[x : B₂]B₃, C ≡ A₃[A₁], A₁ > B₁, A₂ Q B₂, A₃ > B₃. By the substitution lemmas above B ≥ B₃[B₁] ≤ A₃[A₁], so take D ≡ B₃[B₁].

(ii) A ≡ (A₁)[x : A₂](x)A₃, B ≡ (B₁)A₃ (by η-red.), C ≡ (A₁)A₃ (by β-red.), x ∉ FV(A₃), A₁ > B₁. Then C > B, and take D ≡ B.

(iii) A ≡ [x : A₁]A₂, B ≡ [x : B₁]B₂, C ≡ [x : C₁]C₂, A₁ Q B₁, A₁ Q C₁, B₂ ≤ A₂ ≥ C₂. By ind. hyp. B₂ ≥ D₂ ≤ C₂, so take e.g. D ≡ [x : B₁]D₂.

(iv) A ≡ [x : A₁](x)A₂, B ≡ [x : B₁](x)B₂, C ≡ A₂ (by η-red.), x ∉ FV(A₂), A₁ Q B₁, A₂ > B₂. It is easy to see that A₂ ≥βη D₂ ≈ B₂. Clearly x ∉ FV(D₂), so B ≥ [x : B₁](x)D₂ > D₂ ≤ A₂ ≡ C. So take D ≡ D₂.

(v) A ≡ [x : A₁](x)[x : A₂]A₃, B ≡ [x : A₁]A₃, C ≡ [x : A₂]A₃, x ∉ FV(A₂). This is the critical case. By assumption (6) from 3.1.2, A₁ Q A₂, so we can take D ≡ B ≈ C. □
3.6.4. Theorem (CR): If A correct then CR(A).

Proof. By SN we can define θ(A), the maximal number of proper reduction steps in reduction sequences of A. Use induction on θ(A). Let B ≤ A ≥ C. The cases A ≈ B and A ≈ C are trivial. Otherwise, for certain proper reducts B₁ and C₁, A > B₁ ≥ B, A > C₁ ≥ C. First apply 3.6.3 to get B₁ ≥ D₁ ≤ C₁. Then apply the ind. hyp. to B₁, C₁ and D₁. □

3.6.5. Corollaries. I. A Q B ⟹ A ↓ B.

II. Similarity of normal forms:

A Q B, A and B normal ⟹ A ≈ B.
3.7. CR for ≥₀

3.7.1. Call an expression o-normal if it is normal with respect to >₀, i.e. if it does not contain β- or η-redices. So, if A is o-normal then no reduction steps A >β B or A >η B are possible. But it might be possible, as long as we do not have CR, that after some ≈-reductions new η-redices are created. So a priori we do not know whether A is normal. But, if A is o-normal and A does not have abstraction form and A ≥ B, then this reduction is an internal, and not a main reduction. E.g.:

A ≡ (A₁)A₂ ⟹ B ≡ (B₁)B₂, A₁ ≥ B₁, A₂ ≥ B₂.

3.7.2. Theorem (uniqueness of o-normal form): Let A and B be o-normal. Then A Q₀ B ⟹ A ≡ B.
Proof. By induction on the sum of the lengths of A and B. Let A Q₀ B, so A Q B, so A ≥ C ≤ B. Distinguish the following cases:

(1) Both A and B are abstr-expressions, [x : A₁]A₂ resp. [x : B₁]B₂. By prop. 3.1.2(2), A₁ Q₀ B₁ and x ∈ A₁ ⊢ A₂ Q₀ B₂. By the ind. hyp. A₁ ≡ B₁, A₂ ≡ B₂, so A ≡ B.

(2) Neither A nor B is an abstr-expression. Then A and B and C have the same form. E.g. if A ≡ (A₁)A₂, then C ≡ (C₁)C₂, so B ≡ (B₁)B₂ with A₁ ≥ C₁ ≤ B₁ and A₂ ≥ C₂ ≤ B₂. So A₁ Q B₁, A₂ Q B₂, hence A₁ Q₀ B₁, A₂ Q₀ B₂, and by the ind. hyp. A₁ ≡ B₁, A₂ ≡ B₂.

(3) A has abstr. form and B has not. Then A ≡ [x : A₁]A₂, A₂ ≥ (x)D₂, x ∉ FV(D₂), A₁ Q D₁, and A ≥ [x : D₁](x)D₂ > D₂ ≥ C ≤ B. By CL, x ∈ D₁ ⊢ (x)D₂ and by 3.1.2(3), x ∈ D₁ ⊢ (x)D₂ Q (x)B. So x ∈ A₁ ⊢ A₂ Q (x)B, and both A₂ and (x)B are o-normal. By the ind. hyp. A₂ ≡ (x)B. Clearly x ∉ FV(B), so A is not o-normal, contradiction. So this case does not occur. □
3.7.3. Corollary (CR):

(i) A correct, A ≥₀ B, A ≥₀ C ⟹ B ≥₀ D ≤₀ C.
(ii) A Q₀ B ⟹ A ≥₀ C ≤₀ B.
3.7.4. Now we can conclude:

A o-normal ⟹ A normal.

For, if A is o-normal and A ≈ B >η C (i.e. A is not normal), then A ≡ ...[x : A₁](x)A₂..., x ∈ FV(A₂), B ≡ ...[x : B₁](x)B₂..., x ∉ FV(B₂), A₁ Q B₁, x ∈ A₁ ⊢ A₂ Q B₂. By CR, B₂ ≥₀ A₂, so FV(A₂) ⊆ FV(B₂), impossible.
VII. THE ALGORITHMIC DEFINITION AND THE THEORY OF NEDERPELT'S Λ: THE BIG TREE THEOREM, CLOSURE AND CHURCH-ROSSER

VII.1. Introduction and summary

1.1. The history of Λ

A further unification of the concepts underlying AUT-68 and AUT-QE led Nederpelt and de Bruijn [Nederpelt 71a], [Nederpelt 71b], [de Bruijn 71 (B.2)], after the construction of an intermediate version Λ-AUT, to the introduction of the language Λ or, as de Bruijn names it, AUT-SL, for single line Automath. First Nederpelt noticed that, via a suitable translation, instantiation, i.e. substitution in constant-expressions c(x₁, ..., xₙ), could be replaced by application, and that, by this translation, δ-reduction reduced to β-reduction. We used this fact for one of our proofs of δ-SN in [van Daalen 80, III.5.4]. However, in order to cover substitution with 2-expressions, as is allowed in Automath languages, the restriction to argument degree 3 and domain degree 2 had to be dropped. This would, in combination with type-inclusion, have given a higher order system, so to avoid normability and normalization problems one had to skip type-inclusion. Then a further streamlining of the definition was attained by dropping the restriction as to inhabitable degree as well, thus allowing expressions of any degree. By the aforementioned translation and the relaxation of the degree restrictions it became possible to dispense completely with constants and schemes: constants could be translated into variables, schemes could be turned into assumptions, and a book could be transformed into a context. Besides, quantification over all free variables was allowed now, so all assumptions x ∈ α from a context could be converted into abstractors [x : α]. Thus, a statement B; ...
The language theory of Automath, Chapter VII (C.5)
1.2. The present treatment

The discussion in the previous chapters — starting from the E-definition (V.2), first proving closure (V.3) and βη-Church-Rosser (VI), and finally proving the equivalence with the algorithmic definition (V.4) — though concentrating on the so-called regular languages AUT-QE and AUT-68, applies to Nederpelt's language as well, which shows that his conjectures were justified. Here we choose an altogether different approach. Below we start with the algorithmic definition of correctness (VII.2). We follow Nederpelt but for his single-line presentation: we fit the system into the book-and-context framework of the previous chapters. Whereas the definition of the constant-less part of the language (Sec. 2.1) can simply take place in the pretyped expressions [in the pretyped expressions there is a typing function, but the typing does not restrict the term formation], it turns out that adding constant-expressions (Sec. 2.2) requires the introduction of degree-norm correct expressions (Sec. 2.2.4). Then both Nederpelt's conjectures are proved directly from the algorithmic definition, using the so-called big tree theorem (BT). This theorem states that, on the correct expressions (and, in fact, on the much larger domain of normable expressions) the partial order ⪰ generated by sub (i.e. taking proper subexpressions), by > and by taking typ is well-founded. So BT is an SN-result for an extended reduction relation and, hence, implies ordinary SN. The big tree theorem was first formulated and proved in [de Vrijer 75 (C.4)] for the regular language AX.

Section 3 below contains the closure proof of Λ without constants, serving as a motivation for BT. Section 4 contains two different proofs of BT, and in Sec. 5 we prove closure and CR for the constant-less part of Λη. In Sec. 6 we give some equivalence proofs: of the systems with and without (definitional) constants, and of the single-line version with the book-and-context presentation. As a result we get the various nice properties for all these systems.
VII.2. The definition of Λ and Λη

2.1. The part without constant expressions

2.1.1. Both Λ and Λη are systems of admissible expressions in the sense of IV.3. [In a system of admissible expressions term formation is in one way or another restricted; examples are the normable expressions, the degree correct expressions, and the correct expressions. See [van Daalen 73 (A.3)].] The correctness of books and contexts is standard, so we just present the part of the definition concerning the correctness of expressions. A simplification compared with e.g. AUT-QE is that no degree restrictions are imposed. If in the definition below > (resp. ≥, resp. ↓) is interpreted in terms of βη-reduction then we get Λη, otherwise just Λ.
The function typ and the degrees are as usual, see e.g. [Nederpelt 73 (C.3)] and [van Daalen 73 (A.3)]. Throughout Sec. 2.1 we follow Nederpelt and do not admit constant-expressions. Later on (Secs. 2.2, 2.3) we show how the language can be extended with the formation of constant-expressions.
2.1.2. By taking typ of a non-constant-expression A the degree is decreased by one, so by successively taking typ one arrives at a 1-expression. This 1-expression is called typ*(A). So:

typ*(A) := A if degree(A) = 1,
typ*(A) := typ*(typ(A)) otherwise.

Now let B be correct and let ξ be correct w.r.t. B. We use the conventional shorthand: ⊢ A instead of B; ξ, η ⊢ A, typ instead of ξ-typ, etc. Of course, as long as we do not form constant-expressions, the presence of the book B is completely irrelevant. Now correctness of non-constant-expressions is defined as follows:

(i) ⊢ τ.
(ii) ⊢ x if x is among the variables in ξ.
(iii) ⊢ [x : α]B if ⊢ α and x ∈ α ⊢ B.
(iv) ⊢ (A)B if ⊢ A, ⊢ B, typ(A) ≥ α, typ*(B) ≥ [x : α]C for some α, C.
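As an illustration only (this sketch, including the tuple encoding and the toy typing function below, is our own assumption, not part of the text), the recursion for typ* and the application condition of clause (iv) can be rendered in Python. The reduction "≥" is crudely replaced by syntactic equality, which is adequate only for expressions already in normal form:

```python
TAU = ('tau',)

def typ_star(expr, typ, degree):
    """typ*: iterate typ until a 1-expression is reached (2.1.2)."""
    while degree(expr) != 1:
        expr = typ(expr)
    return expr

def application_ok(a, b, typ, degree):
    """Application condition for (A)B: typ*(B) must reduce to an
    abstraction [x : alpha]C with typ(A) reducing to alpha; here
    'reduces to' is approximated by syntactic equality."""
    t = typ_star(b, typ, degree)
    if t[0] != 'abs':
        return False
    _, _x, dom, _body = t
    return dom == typ(a)

# Toy environment: every variable has typ tau (degree 2); the typ of an
# abstraction is the abstraction over the typ of its body.
def toy_typ(e):
    if e == TAU:
        raise ValueError('tau (degree 1) has no typ')
    if e[0] == 'var':
        return TAU
    if e[0] == 'abs':
        return ('abs', e[1], e[2], toy_typ(e[3]))
    raise ValueError(e)

def toy_degree(e):
    if e == TAU:
        return 1
    if e[0] == 'var':
        return 2
    if e[0] == 'abs':
        return toy_degree(e[3])
    raise ValueError(e)
```

With B = [y : τ]y one gets typ*(B) = [y : τ]τ, so (x)B satisfies the condition because typ(x) = τ matches the domain.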
2.1.3. So correct expressions are pretyped expressions satisfying the so-called application condition: in appl. expressions (A)B the expression B has a domain (to compute from typ*(B)) corresponding with the typ of A. In the next section, where we also introduce constant-expressions, an additional condition concerning instantiation will be imposed. There are various alternative, equivalent formulations of the application condition possible. E.g. one can replace "typ(A) ≥ α" by "typ(A) ↓ α". In Λ (i.e. without η-reduction) we have CR, so it is even sufficient to require typ(A) Q α and typ*(B) Q [x : α]C, in other words: typ*(B) Q [x : typ(A)]C, where Q is full definitional equality (see V.2.11). Or, anticipating certain results of Sec. 6.2.6, we might restrict the computation of the domain of B by requiring typ*(B) ≥β [x : α]C (compare V.3.3).

2.1.4. Since norms are preserved under taking typ and under reduction (see e.g. [van Daalen 80, IV.3.4]) the correct expressions are strictly normable [strictly normable: the norm restrictions are fully respected; contrary to languages with type inclusion, where in some cases (substitution with 2-expressions) the norm
restrictions may be violated (see also VIII.5.4.1)]. This can be shown by induction on the definition of ⊢. E.g. that (A)B is strictly normable if it is correct: by ind. hyp. A and B are normable, so ρ(A) = ρ(typ(A)) ≡ ρ(α) and ρ(B) = ρ(typ*(B)) = ρ([x : α]C) = [ρ(α)]ρ(C), so (A)B is normable, with ρ((A)B) = ρ(C).

Hence the correct expressions are SN and the system is decidable.
2.2. Introducing constant-expressions; degree-norm correctness

2.2.1. We allowed the presence of a book containing schemes for the constants. Now we can simply introduce constant-expressions by adding the instantiation rule:

(v) If y⃗ ∈ β⃗ * c(y⃗) ∈ γ is a scheme of B, k = |y⃗|, ⊢ B₁, ..., ⊢ Bₖ and typ(B₁) ↓ β₁[B⃗], ..., typ(Bₖ) ↓ βₖ[B⃗], then ⊢ c(B⃗).

That is, in a constant-expression c(B⃗), the arguments Bᵢ have to satisfy the instantiation condition typ(Bᵢ) ↓ βᵢ[B⃗].

However, we have to make sure that typ* is still well-defined, particularly that taking typ still decreases the degree by one. E.g. typ(c(B⃗)) (= typ(c)[B⃗] = γ[B⃗]) and typ(c) (= γ) must have the same degree.

2.2.2. Call a substitution [y⃗/B⃗] degree correct if degree(yᵢ) = degree(Bᵢ) for i = 1, ..., k. Degree correct substitutions preserve the degree: if γ is a y⃗-expression and [y⃗/B⃗] is degree correct then γ[B⃗] and γ have the same degree. So, if we would add the requirement of degree correct substitution to the instantiation condition, then we might be satisfied. But this is not what we want: we rather would like to show that the instantiation condition implies the degree correctness of the substitution involved. This amounts to showing that degrees are preserved under reduction as well. To this end we introduce the concept of degree-norm correctness.

2.2.3. Degree-norms are defined by:
(i) positive integers are degree-norms;
(ii) if ν₁, ν₂ are degree-norms then [ν₁]ν₂ is a degree-norm.

So, just like ordinary norms (IV.2) are built up from τ and square brackets, degree-norms are constructed from 1, 2, ... and square brackets. For degree-norms ν we define the degree-norm ν + 1 as follows:

(i) if ν is an integer then ν + 1 is as usual;
(ii) if ν = [ν₁]ν₂ then ν + 1 := [ν₁](ν₂ + 1), so ([[2]3]2) + 1 = [[2]3]3.
2.2.4. Now we define degree-norm correctness of books, contexts (w.r.t. a book) and expressions (w.r.t. book and context). It is implicitly intended that an expression is degree-norm correct (dnc) iff its degree-norm (dn), w.r.t. book and context, is defined. The definition of the latter runs as follows:

(i) dn(τ) := 1.
(ii) dn(x) := dn(typ(x)) + 1.
(iii) dn([x : α]B) := [dn(α) + 1]dn(B).
(iv) dn((A)B) := ν, if dn(B) ≡ [dn(A)]ν.
(v) dn(c(B⃗)) := dn(typ(c)) + 1, if dn(Bᵢ) = dn(yᵢ) for i = 1, ..., k, where y⃗ ∈ β⃗ * c(y⃗) ∈ γ is the scheme of c.

Here the notational conventions are just like those w.r.t. ordinary norms: we write dn instead of ξ-dn; clause (iii), e.g., would in full read like this:

(iii) ξ-dn([x : α]B) := [(ξ-dn(α)) + 1]((ξ, x ∈ α)-dn(B)).

Further, a context is dnc if all its type parts are so, and a book is dnc if all the contexts and typs of it are dnc.

2.2.5. A degree-norm ν can be translated into an ordinary norm ν* by replacing all occurrences of numbers by τ. Notice that (ν + 1)* ≡ ν*, so dn(A)* ≡ ρ(A). This shows that dnc-ness implies strict normability. Further, degree(A) can also be constructed from dn(A), for dn(A) ends precisely in the degree of A. We call a substitution [y⃗/B⃗] dnc if dn(Bᵢ) = dn(yᵢ), for i = 1, ..., k. Clearly dnc substitutions are degree correct. Degree-norm correctness is preserved under dnc substitutions: if y⃗ ∈ β⃗ ⊢ γ, k = |y⃗|, ⊢ B₁, ..., ⊢ Bₖ, γ dnc and [y⃗/B⃗] dnc, then

dn(γ) = dn(γ[B⃗]).
Proof. By induction on the definition of dn(γ). □

This gives us the following corollaries:

(1) C dnc, degree(C) ≠ 1 ⟹ typ(C) dnc, dn(typ(C)) + 1 = dn(C).
(2) C dnc, C ≥ D ⟹ D dnc, dn(D) = dn(C).
(3) C dnc, degree(C) ≠ 1 ⟹ degree(typ(C)) + 1 = degree(C).
(4) C dnc, C ≥ D ⟹ degree(D) = degree(C).

So typ* is total on the dnc expressions and, since dnc-ness is clearly decidable, typ* is well-defined on all the expressions, in the sense of V.4.4.1.
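The arithmetic of 2.2.3-2.2.5 can be animated in a few lines. In the hypothetical encoding below a degree-norm is an int or a pair (nu1, nu2) standing for [nu1]nu2; plus_one adds 1 at the end, star translates to an ordinary norm (numbers become 'tau'), and end_degree reads off the final number. This is an illustrative sketch only, not the official definition:

```python
def plus_one(nu):
    """nu + 1 of 2.2.3: add 1 at the very end of the degree-norm."""
    if isinstance(nu, int):
        return nu + 1
    nu1, nu2 = nu
    return (nu1, plus_one(nu2))

def star(nu):
    """Translate a degree-norm to an ordinary norm: numbers become 'tau'."""
    if isinstance(nu, int):
        return 'tau'
    nu1, nu2 = nu
    return (star(nu1), star(nu2))

def end_degree(nu):
    """dn(A) ends precisely in the degree of A (2.2.5)."""
    while not isinstance(nu, int):
        nu = nu[1]
    return nu
```

For ν = [[2]3]2, i.e. ((2, 3), 2), plus_one gives [[2]3]3, star is unchanged by plus_one (the claim (ν + 1)* ≡ ν*), and end_degree returns the degree.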
2.2.6. Now we are able to show that correctness implies degree-norm correctness.

Proof. By induction on ⊢. E.g. let ⊢ A, ⊢ B, typ(A) ≥ α, typ*(B) ≥ [x : α]C. By ind. hyp. A and B are dnc (so typ*(B) is indeed defined), so typ(A), α, typ(B), typ(typ(B)), ..., typ*(B) and [x : α]C are dnc as well. Now dn(typ*(B)) = dn([x : α]C) = [dn(α) + 1]dn(C) = [dn(typ(A)) + 1]dn(C) = [dn(A)]dn(C), while dn(typ*(B)) and dn(B) just differ as to their "end number", so dn(B) = [dn(A)]ν for some ν. Hence (A)B is dnc.

Or, let y⃗ ∈ β⃗ * c(y⃗) ∈ γ be a scheme, let ⊢ B₁, ..., ⊢ Bₖ (with k = |y⃗|) and let the Bᵢ satisfy the instantiation condition typ(Bᵢ) ↓ βᵢ[B⃗]. By ind. hyp. the Bᵢ and the βᵢ are dnc. Now dn(B₁) = dn(typ(B₁)) + 1 = dn(β₁) + 1 = dn(y₁), so [y₁/B₁] is a dnc substitution. So dn(B₂) = dn(typ(B₂)) + 1 = dn(β₂[B₁]) + 1 = dn(y₂). So [y₁, y₂/B₁, B₂] is dnc, etc. Hence c(B⃗) is dnc. □

So typ* is also total on the correct expressions, and correctness is well-defined. Further, the above proof shows that the system with constants is strictly normable as well, so (using SN) it is decidable.
2.3. Introducing definitional constants

2.3.1. After the formulation of instantiation and application condition, it will also be clear how the compatibility condition of def and typ for the formation of definitional constant schemes has to read:

typ(def(d)) ↓ typ(d), for definitional constants d.

2.3.2. The scheme of a definitional constant d is defined to be dnc if dn(def(d)) = dn(typ(d)) + 1, and for the corresponding d(B⃗) we define

dn(d(B⃗)) := dn(typ(d)) + 1, provided [y⃗/B⃗] is dnc, where y⃗ ∈ β⃗ is the context of the scheme.

So, still dn(d(B⃗)) ≡ dn(typ(d)) + 1 = dn(typ(d)[B⃗]) + 1 = dn(typ(d(B⃗))) + 1, and degree-norms remain preserved under reduction: dn(d(B⃗)) = dn(typ(d)) + 1 = dn(def(d)) = dn(def(d)[B⃗]). And, by induction on correctness, we can prove that correctness implies degree-norm correctness. E.g. let the scheme of d be correct, then ⊢ def(d), so def(d) dnc, and dn(def(d)) ≡ dn(typ(def(d))) + 1, and ⊢ typ(d), so typ(d) dnc, dn(typ(d)) = dn(typ(def(d))) and dn(def(d)) = dn(typ(d)) + 1, q.e.d.
VII.3. The closure proof for Λ

3.1. What to prove

The decidability of the Automath language is one of the major aims of the language theory. By using an algorithmic definition we got the decidability of Λ and Λη, both with and without constants, directly from normalization (see 2.1.4 and 2.2.6). So one might wonder what else there is to prove. First there are both Nederpelt's conjectures, the Church-Rosser property (CR) for Λη, and the closure property (CL). We define

CR(A) : B ≤ A ≥ C ⟹ B ↓ C,
CL(A) : ⊢ A, A ≥ B ⟹ ⊢ B.

A main lemma for β-CL (and δ-CL) is the substitutivity of correctness: substitution with correct expressions of the right types preserves correctness. Formally:

x ∈ α ⊢ B, ⊢ A, typ(A) ↓ α ⟹ ⊢ B[x/A].

Other properties which play an important role in the proof of CL are sound applicability (SA), preservation of typ (PT), of typ* (P*T) and of domain (PD). We write

SA(A) : A ≡ (B)[x : C]D ⟹ typ(B) ↓ C,
PT(A) : A ≥ B ⟹ typ(A) ↓ typ(B) (degree(A) ≠ 1, degree(B) ≠ 1),
P*T(A) : A ≥ B ⟹ typ*(A) ↓ typ*(B),
PD(A) : A ≡ [x : B]C, A ≥ [x : D]E ⟹ B ↓ D.
The properties PT1, CL1, P*T1 and PD1 are the respective one-step variants of PT, CL, P*T and PD. The above properties are not mere technicalities from the closure proof, but are also meaningful from the point of view of interpretation. E.g. SA is characteristic for the fact that the AUT-languages do not allow "proper inclusion" of types, and PT (resp. P*T) expresses the nice behaviour of typ (resp. typ*) w.r.t. definitional equivalence. Further, these properties serve to establish the correspondence between the present, algorithmic systems and the E-systems, and between the versions with and without constants (see 6.2, 6.3).
3.2. Some simple facts

3.2.1. Throughout this Section VII.3 we just discuss Λ [i.e. without η] without constants. So we may assume CR, and PD(A) (for all A) and SA(A) (for correct A) are immediate. By induction on ⊢ A one also proves easily that ⊢ A implies ⊢ typ(A) (so ⊢ typ(typ(A)), ..., ⊢ typ*(A)). This is not easy any more for a system with constants. This property is called correctness of types.
3.2.2. As with the E-systems (see V.3.1), we prove CL from CL1 by ind. on ≥. For the β-outside case of CL1 we need substitutivity and SA. Previously substitutivity (i.e. the substitution theorem, V.2.9) was easy and SA was rather involved, but here SA is easy and substitutivity is quite complicated. First some properties of substitution, which are valid already for pretyped expressions. Let A be a ξ-expression, let B be a (ξ, x ∈ α, η)-expression. Let C* denote C[x/A]. Then

(1) typ(A) ↓ typ(x) ⟹ typ(B*) ↓ typ(B)*,

i.e., written out in full,

ξ-typ(A) ↓ ξ-typ(x) ⟹ (ξ, η*)-typ(B*) ↓ ((ξ, x ∈ α, η)-typ(B))*,

(2) typ*(A) ↓ typ*(x) ⟹ typ*(B*) ↓ typ*(B)*.
Both facts are proved by ind. on the length of B. Notice that (1) and (2) are valid for each right monotonic, reflexive relation instead of ↓, so e.g. for ≥.
3.2.3. The problem with substitutivity is that the condition typ(A) ↓ α is clearly not sufficient. We would also like to know something about typ*. In fact we have the following theorem (modified substitutivity of correctness, for short SC):

Let x ∈ α, η ⊢ B, let ⊢ A, typ(A) ↓ typ(x) and typ*(A) ↓ typ*(x). Let C* denote C[x/A] again. Then η* ⊢ B*.

Proof. By induction on ⊢ B. E.g. the application case. Let ⊢ B₁, ⊢ B₂, typ(B₁) ≥ β, typ*(B₂) ≥ [y : β]C. By ind. hyp. ⊢ B₁* and ⊢ B₂*. By (1), (2) and CR typ(B₁*) ↓ β* and typ*(B₂*) ↓ [y : β*]C*. So by CR again typ(B₁*) ≥ γ, typ*(B₂*) ≥ [y : γ]D for some γ, D. So ⊢ (B₁*)B₂*. □

3.2.4. Corollary.

x ∈ α ⊢ B, ⊢ A, typ(A) ↓ typ(x), typ*(A) ↓ typ*(x) ⟹ ⊢ B[A].

Another consequence of (1) is PT1(A) for correct A, i.e.

⊢ A, A >₁ B ⟹ typ(A) ↓ typ(B).

Proof. Assume for definiteness that >₁ is disjoint one-step reduction [see the introductory comment to II.8]. The proof is by induction on the length of A. For example:
3.3. Heuristic considerations
3.3.1. At first sight SA, PT1 and correctness of types seem to give a good starting position for proving CL. In a way this is true: we only have to find the right induction and the right induction hypothesis. Let us first try to prove CL1(A) by induction on the length of A, or rather by induction on the relation "being a subexpression of", for short: by induction on subexpressions. We interpret CL1 in terms of disjoint one-step reduction. For the appl. case of inside reduction the ind. hyp. is not strong enough; we additionally need P*T1. So instead we try to prove CL1 and P*T1 together, again by induction on subexpressions. Now everything is all right with the inside reductions, but with outside β1 we still come in trouble: A ≡ (A₁)[x : α]A₂, SA gives typ(A₁) ↓ α, but in view of the previous section we also want typ*(A₁) ↓ typ*(α).
3.3.2. So let us see under what conditions we might prove this typ*-requirement. First notice: if we knew CL already, then we could use PT1 to prove PT (for correct expressions), e.g. by induction on ≥. The induction step runs as follows: let ⊢ A, A ≥ B ≥ C. By CL we get ⊢ B and by ind. hyp. typ(A) ↓ typ(B) ↓ typ(C), whence by CR: typ(A) ↓ typ(C), q.e.d. An alternative proof of PT(A) from CL works by induction on the reduction tree of A (by virtue of SN(A)), for short: by induction on reducts. Viz. let ⊢ A, A ≥ C. If A ≡ C then typ(A) ≡ typ(C). Otherwise for some B, A >₁ B ≥ C. By PT1 typ(A) ↓ typ(B), by CL ⊢ B and by ind. hyp. typ(B) ↓ typ(C), so by CR typ(A) ↓ typ(C).

3.3.3. Further from PT we can prove P*T, or rather:
⊢ A, ⊢ B, A ↓ B ⟹ typ*(A) ↓ typ*(B),

by induction on degree(A) + degree(B), as follows. If degree(A) = 1, then degree(B) = 1 too, so typ*(A) ≡ A ↓ B ≡ typ*(B). Otherwise degree(B) ≠ 1 either, so we can apply PT to A and B. By CR we get typ(A) ↓ typ(B), by correctness of types ⊢ typ(A), ⊢ typ(B), so by the ind. hyp. typ*(A) ↓ typ*(B), q.e.d.

An alternative proof of P*T from CL and PT is by induction on →, the order generated (as in V.4.4.11) by

(1) "being a proper reduct of",
(2) "being the typ of".

So the induction on → includes the induction on reducts mentioned before. That → is indeed well-founded will become clear in the sequel. The proof looks like this. Let ⊢ A, let A ≥ B. By CL ⊢ B and by PT typ(A) ≥ F ≤ typ(B). By correctness of types ⊢ typ(A), ⊢ typ(B), and by the ind. hyp. typ*(A) ↓ typ*(F) ↓ typ*(B), and by CR typ*(A) ↓ typ*(B).
3.3.4. In Section 3.2.2 we announced to prove CL from CL1 by induction on ≥. However, this can be interpreted in two ways:

(1) to prove ⊢ A, A ≥ B ⟹ ⊢ B by induction on A ≥ B, i.e. on the number of reduction steps between A and B;
(2) to prove CL(A) by induction on the reduction tree of A, i.e. by induction on reducts.

Both inductions work, but the second one has an advantage: we just need CL1(A), but can freely use CL(B) in the course of the proof, for each proper reduct B of A!
3.3.5. Now it becomes plausible to try and prove CL(A) directly by an induction on ⪰, the order generated by → (3.3.3) and by sub. In this way we combine the induction on subexpressions (3.3.1, for the "inside" cases of CL1), on reducts (3.3.2, to prove PT), and on → (3.3.3, to prove P*T). In order to make the induction work we need the well-foundedness of ⪰ on the correct expressions, i.e. the so-called big tree theorem BT. Section 3.4 contains the proof of CL as sketched above, assuming BT. Section 4 is devoted to the proof of BT.

3.4. The actual closure proof

3.4.1. Definition of →. → is the reflexive and transitive relation generated by

(1) A → typ(A),
(2) A ≥ B ⟹ A → B. □
3.4.2. Definition of ⪰. ⪰ is the reflexive and transitive relation generated by

(1) B sub A ⟹ A ⪰ B,
(2) A → B ⟹ A ⪰ B. □
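The two definitions above determine, for each expression, a set of immediate ⪰-successors: its proper subexpressions, its one-step reducts, and its typ. The sketch below is our own illustration; the three callbacks are assumptions standing in for sub, one-step reduction and typ of the actual system. It walks this successor structure and counts nodes, which terminates precisely when the big tree is finite, i.e. under BT:

```python
def big_tree_size(expr, subexprs, reducts, typ_of, limit=10_000):
    """Count nodes of the big tree of `expr`: successors are the proper
    subexpressions, the one-step reducts, and typ (when defined).
    Recursion terminates exactly when the big tree is finite; `limit`
    merely guards this sketch against runaway recursion."""
    if limit <= 0:
        raise RecursionError('big tree larger than limit')
    total = 1
    successors = list(subexprs(expr)) + list(reducts(expr))
    t = typ_of(expr)
    if t is not None:            # degree-1 expressions have no typ
        successors.append(t)
    for s in successors:
        total += big_tree_size(s, subexprs, reducts, typ_of, limit - 1)
    return total
```

In a degenerate toy instance where an "expression" n has no subexpressions or reducts and typ(n) = n - 1 down to 1, the big tree of 3 has the three nodes 3 → 2 → 1.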
3.4.3. The big tree of an expression A is the reduction tree of A w.r.t. the extended reduction relation ⪰. Throughout 3.4 we assume the big tree theorem BT, which states that ⪰ is well-founded on the correct expressions (and, hence, that their big trees are finite).

3.4.4. Lemma. Let ⊢ A, CL(A). Then PT(A) (degree(A) ≠ 1).

Proof. As in 3.3.2, e.g. by ind. on reducts, using PT1 and CR. □
3.4.5.1. Define: CL⁺(A) :⟺ (A → B ⟹ ⊢ B).

3.4.5.2. So CL⁺(A) ⟹ CL(A).
3.4.6. Lemma. Let ⊢ A, CL⁺(A). Then P*T(A).

Proof. By BT we can use induction on →. Let A ≥ B. If degree(A) = 1 then degree(B) = 1 too and there is nothing to prove. Otherwise degree(B) ≠ 1 either, so by the previous lemma PT(A), i.e. typ(A) ≥ F ≤ typ(B). By CL⁺ and correctness of types ⊢ typ(A), ⊢ typ(B), and by the ind. hyp. typ*(A) ↓ typ*(F) ↓ typ*(B). Now use CR. □
+
CL(A).
Proof. By BT we can use induction on ⪰. Let ⊢ A, A ≥ B. If A ≡ B then there is nothing to prove. Otherwise A >₁ C ≥ B with C a proper reduct of A. We want ⊢ C. The interesting cases are:
(1) A ≡ (A₁)A₂, C ≡ (C₁)C₂, ⊢ A₁, typ(A₁) ↓ α, ⊢ A₂, typ*(A₂) ≥ [x : α]β, A₁ ≥ C₁, A₂ ≥ C₂. By ind. hyp. ⊢ C₁, ⊢ C₂. By PT₁ typ(A₁) ↓ typ(C₁), so by CR typ(C₁) ↓ α. Now by the ind. hyp. we can assume CL⁺(A₂), so P*T(A₂) and typ*(A₂) ↓ typ*(C₂), and by CR typ*(C₂) ↓ [x : α]β, q.e.d.
(2) A ≡ (A₁)[x : α]A₂, ⊢ A₁, ⊢ [x : α]A₂, typ(A₁) ↓ α. By ind. hyp. we can assume CL⁺(A₁), CL⁺(α), so typ*(A₁) ↓ typ*(α), and by substitutivity (3.2.4) ⊢ A₂[A₁] ≡ C, q.e.d. □

VII.4. Proof of the Big Tree Theorem

4.1. Introduction
For the definition of the extended reduction relations → and ⪰ we refer to Sec. 3.4. Both definitions make use of typ, so → and ⪰ are only defined on pretyped expressions, i.e. expressions with a context. Notice: taking subexpressions often requires extension of the context.
The language theory of Automath, Chapter VII (C.5)
The big tree of an expression A is its reduction tree w.r.t. ⪰, i.e. the branches of the tree are the proper ⪰-reduction sequences of A. We define
BT(A) :⇔ A has no infinite proper ⪰-reduction sequences.
The big tree is finitary, so:
BT(A) ⇔ the big tree of A is finite.
In this Section VII.4 we prove the big tree theorem BT:
(BT) A normable ⇒ BT(A).
So BT states that on the normable expressions ⪰ is well-founded, i.e. that ⪰-SN holds. de Vrijer [de Vrijer 75 (C.4)] introduced ⪰ and big trees, and proved BT for a system of normable expressions containing his language AX. Below we give two different proofs of BT. The first (Sec. 4.5) is modelled after the second proof of β-SN (IV.2.5); the second one (Sec. 4.6) uses an idea from de Vrijer's proof (the "bookkeeping pairs") but further follows the first β-SN proof (IV.2.4.4). Actually both proofs deal with a modification ≥βτ of ⪰ which is somewhat easier to handle and gives rise to even bigger trees (Sec. 4.4.2). For simplicity we start with a system without constants, and take just β-reduction for the ordinary reduction involved in → and ⪰. Later (5.2, 6.2, 6.3) BT will be extended to cover the remaining cases.
4.2. Heuristics 1
After de Vrijer we also call → and ⪰ rt-reduction and rst-reduction respectively, with r for ordinary reduction, s for subexpression, t for type. Similarly we speak about r-reduction (i.e. ordinary ≥), s-reduction (A s-reduces to its subexpressions), t-reduction (A t-reduces to typ(A)) etc., and their combinations. The meaning of rs-SN, st-SN etc. and of ϑrs (the length of the rs-reduction tree of an rs-SN expression) etc. will be clear. We want BT, i.e. rst-SN for the normable expressions. Let us summarize what SN-results we know already:
(1) r-SN. This is ordinary β-SN as proved in IV.2.4 for the normable expressions.
(2) s-SN and t-SN. s-reduction decreases the length of expressions, t-reduction decreases the degree of (pretyped) expressions.
(3) rt-SN. This was proved for correct expressions in V.4.4.11. The same induction ((1) on degree, (2) on ϑr) applies to all degree-norm correct expressions: taking typ decreases the degree, r-reduction preserves the degree.
D.T. va.n Daalen
602
(4) rs-SN. Provable for the normable expressions by induction on (1) ϑr, (2) length of expression. In fact the induction used in the proof of the square brackets lemma SQBR (IV.2.4.3), and in several β-SN proofs as a subordinate induction (IV.2.4.4, IV.2.5.3), is just induction on the rs-reduction tree.
(5) st-SN. Can be proved by induction on the definition of pretyped [i.e. "typable"] expressions.
Clearly these inductions fail for full rst-SN: s-reduction can increase the degree, r-reduction generally increases the length of expression, and taking typ can increase both the length of expression and the length of the r-reduction tree. Besides, on the normable expressions r-reduction does not preserve the degree.
4.3.1. Norm properties
From IV.2.1 we recall some properties of the norm μ and of the normable expressions. We write A <μ B for: μ(A) is shorter than μ(B).
(1) (A)B normable ⇒ (A)B <μ B and A <μ B.
(2) A normable ⇒ μ(typ(A)) ≤ μ(A).
(3) μ(x) = μ(A), B normable ⇒ μ(B[x/A]) ≡ μ(B).
(4) A ≥ B, A normable ⇒ μ(B) = μ(A).
(5) B sub A, A normable ⇒ B normable.
Properties (2), (4), (5) make that the normable expressions are closed under ⪰ and that → preserves the norm.
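As an illustration of these properties, here is a minimal sketch of a Nederpelt-style norm computation (encoding and names are mine, not the paper's; norms are "t" for τ and ("f", ν₁, ν₂) for [ν₁]ν₂):

```python
# Toy terms: ("tau",), ("var", x), ("app", A, B) for (A)B,
# ("abs", x, a, B) for [x:a]B.  nctx maps a variable to its norm mu(x).

def norm(t, nctx):
    """mu(t), or None when t is not normable (a sketch, not the full theory)."""
    if t[0] == "tau":
        return "t"
    if t[0] == "var":
        return nctx[t[1]]
    if t[0] == "abs":
        x, a, b = t[1], t[2], t[3]
        na = norm(a, nctx)
        nb = norm(b, {**nctx, x: na})
        return None if na is None or nb is None else ("f", na, nb)
    # application (A)B: mu(B) must be [mu(A)]v, and then mu((A)B) = v
    na, nb = norm(t[1], nctx), norm(t[2], nctx)
    if na is None or nb is None or nb == "t" or nb[1] != na:
        return None
    return nb[2]

def norm_size(v):
    """length of a norm, for the ordering <mu of 4.3.1."""
    return 1 if v == "t" else 1 + norm_size(v[1]) + norm_size(v[2])
```

For example, [x:τ]x has norm [τ]τ; the application (τ)[x:τ]x has the strictly shorter norm τ (property (1)), and its β-reduct τ has the same norm (property (4)).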
4.3.2. BT-conditions
Similarly to the SN-conditions in IV.2.4.1 we can formulate necessary and sufficient BT-conditions:
(1) BT(x) ⇔ BT(typ(x)).
(2) BT([y : B₁]B₂) ⇔ BT(B₁), BT(B₂).
(3) BT((B₁)B₂) ⇔ BT(B₁), BT(B₂) and (B₂ → [y : β]C ⇒ BT(C[B₁])).
Proof. We just give the ⇐-part of (3). Let BT(B₁), BT(B₂) and B₂ → [y : β]C ⇒ BT(C[B₁]). B₂ is rst-SN, so rt-SN, so we can use ϑrt(B₂). B₁ is rst-SN, so r-SN, so we can use ϑr(B₁). Using induction on ϑr(B₁) + ϑrt(B₂) we prove that all one-step rst-reducts of (B₁)B₂ are BT. Distinguish:
(i) D sub (B₁)B₂, so D sub B₁ or D sub B₂, so BT(D).
(ii) B₂ >₁,β D or D ≡ typ(B₂). We have BT(B₁), BT(D) and D → [y : β]C ⇒ BT(C[B₁]). Apply the ind. hyp. to (B₁)D; this gives BT((B₁)D).
(iii) B₁ >₁,β D. Apply the ind. hyp. to (D)B₂.
(iv) B₂ ≡ [y : β]C. Then by assumption BT(C[B₁]). □
4.3.3. Heuristics 2
If BT(B₂), B₂ → [y : β]C then clearly BT(C). So BT-condition (3) above suggests as a main step in proving BT the substitution theorem for BT: BT(A), μ(x) = μ(A), BT(B) ⇒ BT(B[x/A]). Indeed, if we knew this theorem, we could simply proceed by induction on pretyped expressions and get BT. The similarity with the situation around β-SN suggests us to use SQBR (IV.2.4.3), for → instead of ≥: if B* → [y : β]C then either
(1) B → [y : β₀]C₀ with β₀* → β, C₀* → C, or
(2) B → (B̄)x, ((B̄)x)* → [y : β]C, where * stands for [x/A].
However, the following counterexample shows that this lemma is wrong: take B ≡ (B₁)[z : γ][y : β](z)x, A ≡ [u : μ]..u..u. Then B* → [y : β*[B₁*]]..B₁*..y.., but B → [y : β[B₁]](B₁)z, and ((B₁)z)* → ..B₁*..B₁*.. .
4.4. βτ-reduction
4.4.1. One point which makes SQBR break down for → is that not:
B → C ⇒ B[x/A] → C[x/A].
Example: B ≡ x, C ≡ typ(x), and the only connection between x and A concerns their norms (not their typ's). The other substitution property, A → A′ ⇒ B[A] → B[A′], does not hold either, due to the lack of monotonicity clauses in the definition of →. Example: A → typ(A) but not ...A... → ...typ(A)... .
4.4.2. Now we introduce βτ-reduction by adding these monotonicity rules to the definition of →. What we get is a reduction in the usual sense, in that a one-step reduction consists of replacing a subexpression (redex) by another expression (contractum). The redices are here of two kinds:
(1) β-redices, which contract as usual.
(2) τ-redices: variables x, which contract according to x >τ typ(x).
We use the same terminology as before [see the comment to II.8]: ≥βτ, >₁,βτ, >βτ etc., τ-SN, βτ-SN, ϑβτ etc. Now ≥βτ satisfies the second substitution property (above) indeed, but the first one is still not valid (same counterexample). Just like → and ⪰, ≥βτ is only defined for pretyped expressions. Formally, we ought to speak about "≥βτ w.r.t. context ξ", and the monotonicity rule for abstraction expressions then would read:
if B₁ ≥βτ C₁ w.r.t. ξ and B₂ ≥βτ C₂ w.r.t. (ξ, y ∈ B₁), then [y : B₁]B₂ ≥βτ [y : C₁]C₂ w.r.t. ξ.
4.4.3. We are going to prove PT-SN and then conclude BT from the Theorem. PT-SN(A) =+ BT(A).
Proof. Let @SN(A). Using induction on (1) d p T ( A ) ,( 2 ) length of A we proof 0 that all one-step rst-reducts of A are BT. So A itself is BT. 4.4.4. PT-SN conditions These are quite similar to the BT conditions. The only non-trivial modification concerns the application case.
Proof. As in 4.2.3 but now we use induction on dp,(81)
+6pT(&).
0
Just like st-SN (see 4.2 (5)) we can prove TLet C contain subexpressions A = [z : a] ..2 .., I? = [y : P l y . . . Then A >r A’ = [z : a] ..a.., r >r [y : P ] . . P . . and we want a common 7-reduct of ... A’.. I? ... and ... A ..I” ... . As in 11.8.2 we consider all the possible cases. Generally the reductions simply commute: ... A’ ..r ... > T ... A’ .. I? ... < r ... A ..I?‘ ... . In case the specific z occurs in P or the specific y occurs in a then two .r-steps are needed, e.g. [y : ..z . . ] .. y .. > r [y : ..a . . ] .. y .. >T [y : .. ..I..(.. ..) .. < s < s [y : ..2 . . ] ..(..z..) ... Anyhow the weak diamond property holds for > r , so by T-SN we get T-CR, and uniqueness of T-normal form.
4.4.5. Something on SN. Further we verify T-CR:
4.4.6. This gives an easy way of reaching a PT-normal form: first r-normalize then &normalize. Notice: the norm properties guarantee that Lor preserves the norm of normable expressions. 2 p and r Tdo not commute, but we still can get PT-CR for the normable expressions, as follows. For norms v we define a p-r-normal expression v*:
(1)
T*
7,
The language theory of Automath, Chapter VII (C.5)
605
E [x : v;]v;. (2) ([v1]v*)*
Now we can prove
A normable
+
A >pT (p(A))*
by ind. on the definition of p. This gives Pr-CR and uniqueness of Or-normal form. The procedure above assures the existence, so for normable A we can speak of PT-nf ( A ) . In fact v* is Nederpelt’s original representation of the norm v.
4.5. First proof of Or-SN; a correction to IV.2.5.3 4.5.1. In view of 4.4.4 it seems reasonable to concentrate on the substitution theorem for Pr-SN: A Pr-SN, B Pr-SN, p ( x ) = p ( A ) + B[A]Pr-SN.Just like with --)I SQBR fails for >p7, so we rather let us inspire by the second proof of 0-SN (IV.2.5.3). In fact we also take the occasion to indicate (and repair) a flaw in that proof, concerning the distinction between replacement and substitution. 4.5.2. When defining substitution we have assumed the concept of literary replacement t o be understood. Substitution amounts to replacement with precautions, viz. that no clash of variables takes place, and substitution can also be considered a special case of replacement. Now let us see what went wrong in IV.2.5.3 (and also in IV.2.6.2). Essentially we wanted to replace a specific subexpression A in C by another expression A’, thus producing C’. We had the idea that this replacement of A with A‘ could be performed via substitution for a new “fresh” variable y, such that COF .. y .., C = CO[ y / A ]C’ , = CO[y/A‘].However, this is wrong: possible bound variables of C, which become free in A, can never get the appropriate bindings in CO[ y / A ] . What we need here is literary replacement (LR) of y with A and A’ resp. We is the result of literary replacing all free introduce a new notation: B[z/A]LR occurrences of x in B by A. 4.5.3. Below we follow the general idea of IV.2.5.3, but instead of using a substitution theorem for SN, we use the - stronger! - replacement theorem - as we ought to have done there (and in IV.2.6.2) too. The easiest way is to use “multiple” replacement, i.e. replacement with a set of expressions. Notation: B [ [ z / a l ~ where ~ , a is a set of expressions, is the set of expressions which result from B by (literary) replacing all free x in B by an expression A E a, but possibly different A’s for different occurrences of x (compare multiple substitution, in 11.10).
606
D.T. van Daalen
4.5.4. The monotonicity of >pr makes the replacement property work:
A 2 p r A’
*
B[AILRLPTB[A’]LR
provided A has been put in the appropriate extended context. We make this slightly more explicit. Let A be an occurrence of a subexpression in C. The context of A in C can be defined by induction on the length of C. Intuitively speaking, it consists of all the assumptions z E Q, which one encounters (in the form of abstractors [z : a ] )when scanning C from “left to right” until one arrives at A. The crucial clause in the definition is of course: if E is the context of A in C2 then (z E E l , [ ) is the context of A in [z : El]&. Now the context of A in the replacement property must provide all free variables of A with the same typing as they get when A is inserted in B. E.g. we can take (E, 7 0 ) where E is the context of B and 70 is the intersection (in the sense of context inclusion sub, of V.2.6) of all the 7’s which are the context of a free occurrence of z in B. We define p(A) to be the set of or-reducts of A. Then, again if A has been put in the right context,
*
C E B [ [ E / ~ ( A ) ~ L BR [[z/P(A)]]LR Ipr C
.
The other replacement property B 2 p r C + B* >pT C*,where * stands for [x/A]LR is still not generally valid, but we have a restricted version. Lemma. I f A >pT t y p ( z ) and B I p , C then B* Lp, C*.
4.5.5.
Proof. Ind. on >or. E.g. if B > I , ~C, B = ...z ...z ..., C then B* = ... A ... A ... >pr ...t y p ( x )...A ... 5 C’. Corollary. B* Pr-SN, A >pT t y p ( z )
= ...t y p ( z ) ... x ...,
* B or-SN.
Proof. Use ind. on (1) %r(B*), (2) length of B*. E.g. inspect the Pr-SN conditions.
0
4.5.6. Now we are ready for the Pr-SN proof.
Replacement theorem for Br-SN. Let * denote [ z / p ( A ) l ~ Let ~ .B normable, p(z) = p ( A ) ,A, B PT-SN. Then C E B*
3
C PT-SN
provided A has the right context. Proof. By induction on (I) p(A), (11) 2YpT(B),(111) the “capacity” of the transition from B to C, i.e. the sum of the 19pT’s of the reducts of A inserted in B. Now consider a single reduction step C >l,pr D. We distinguish:
The language theory of Automath, Chapter Vll (C.5)
607
(1) this reduction step concerns an old redex, i.e. a redex already present in B , (2) this step concerns a new redex. The latter are of two kinds: (2a) multiplied redices, i.e. redices inside an inserted reduct of A, (2b) newly composed redices. All T-redices fall under case (1)or (2a) and the P-redices are classified as before, so the only possibility of case (2b) is as follows: B = ... z... ( B 1 ) z..., C = ...A1 ... (C i ) [ y : y ] E..., D E ...A1 ...E[C1]..., where C1 E B f , A >pT A1, A 207 19: TIE. In case (1)and (2a) the replacement and the reduction commute, i.e. B > DO, D E 0:. To be precise, let (Cl) [y : r]Cz be an “old” redex, i.e. (B1)[y : P]B2 c B , C1 E Bf, c2 E B;. Then D = ... c2[C1]... E (...Bz[B1]...) [ ~ / ~ ( A [ & ] ) ] L R , and not simply D E D:. Then we get PT-SN(D) by ind. hyp. I1 (case (1))or I11 (case (2a)). Now we tackle case 2b): create a new variable z and form BO by replacFor . simplicity we put ing the intended (B1)z by z. so B = B o [ z / ( B 1 ) z ] ~ R typ(z) PT-nf ((B1)z), so p ( z ) = p((B1)z) and PT-SN(BO)- by 4.5.5. Then we form Bh E BZ by replacing the remaining free z’s of BOwith the appropriate reducts of A, i.e. the same as used in the formation of C, and finally replace the z of Bh by E[C1]. This gives us D = Bh[z/E[C1]]LR back. Informally: BO = ...z ... E ..., Bh = ... A1 ... z ..., D ...A1 ... E[C1].... Either by ind. hyp. I1 or 111 we get P.r-SN(C1). Further PT-SN(A)soPr-SN([y : y ] E )so PT-SN(E). By normability B1
= p ( A ) , A,
Corollary 2. B normable + B PT-SN (see
B PT-SN
+
B[A] 0
4.4.4).
0
Corollary 3. B normable + BT(B) (as in 4.4.3).
0
4.6. Second proof of PT-SN 4.6.1. Bookkeeping pairs, r-expansion and n-reduction. [This x-reduction is
similar, but not equal, to the 7r-reduction in Chs. II and VIII.] 4.6.1.1. Assume that A l TB , i.e. B results from A by successively replacing variables z by their type typ(z). Alternatively we can work backwards from
D.T. van Daalen
608
7-nf ( A ) ,by successively replacing newly created subexpressions by the original variable. In general it is of course not possible to retrace which subexpressions are newly created, and from which variable they stem, unless we store this information somewhere inside the expression! Following [de Vrijer 75 (C.4)] we use a new pairing operation [...,...I for this kind of bookkeeping. Definitions:
(1) If A , B are expressions then [ A , B1 is an expression. ( 2 ) If A , B are [-expressions then [ A ,B ] is a &expression.
( 3 ) If A , B are normable, p ( A ) = p ( B ) then p( [ A , B1) G p ( A ) . For the rest the definitions of pretyped and normable expressions are unaltered. The notions of subexpressions and substitution are extended in a straightforward way. As a new monotonicity rule, for each kind of reduction, we can have, e.g. A > A’, B > B’ =s [ A ,B ] > [A‘,B’]. 4.6.1.2. Now the alternative way of producing B from A (above) can be described as follows:
(1) First provide all variables x successively with a copy of their type, i.e. replace x by [ x ,t y p ( x ) l and so on.
(2) Then for some of these pairs simply restore the lefthand part, and for the rest pick the righthand part. In the process (1)the 7-expansion of A , T-exp(A), is constructed, i.e. each x of A is replaced by [ x ,T - e x p ( t y p ( z ) ) l . The process ( 2 ) we describe in terms of a projection reduction (n-reduction l T ) . Definitions: (1) The T-exp of pretyped expressions is defined inductively:
(i)
T-exp(x) = [ x ,T - e x p ( t y p ( x ) ) l .
(ii) 7-exp((A) B ) = (7-exp(A)).r-exp(B). (iii) T-exp([z : a ] B )= [ x : . r - e x p ( a ) ] ~ - e x p ( B ) . (iv) .r-exp( [ A , B1)= [T-exp(A),.r-exp(B)l. (2) (i) One step n-reduction >I,= is generated from n-contraction: [ A ,B] >I,* A, [ A , B ] >I,= B by the monotonicity rules. (ii) n-reduction
>T
is the transitive and reflexive closure of
>I,=.
The language theory of Automath, Chapter VII (C.5)
609
4.6.1.3. Remark: Formally we should have defined the 7-expansion of expressions w.r.t. their context, notation [-T-exp(B). The abstr. case of the definition then becomes:
[-T-exp([z : a ] B ) =
=
[z : ( [ - f - e x p ( a ) ) (([, ] z E a)-7-exp(B))
4.6.1.4. The point of this alternative approach of
A
>,
B
r-exp(A) 2, B
=S
.
>,,
making use of
(see 6.2.2)
>,
>,
is that is definitely easier to handle than >, roughly because does not depend on the context, and that 20,-reductions of a n expression can be simulated by &,-reductions of its r-expansion. Our proof below consists of two parts: first we show that Pa-SN implies 07SN, then we prove the SQBR lemma for 20, and Pa-SN.
4.6.2. Pa-SN implies Pr-SN. > I , ~B
4.6.2.1. Lemma. A Proof. Ind. on
*
~ + x p ( A )2, T-exp(B) (in fact 5'-1,,).
>I,~:
(i) r-contraction, A = z, B G t y p ( z ) . Then .r-exp(A) >I,, T-exp(typ(z)) = r-exp(B).
= [z,r-exp(typ(z))l
(ii) Monotonicity, e.g. A = [z : A l ] z , B [z : Bl]z,A1 > I , ~B1: By ind. hyp. r-exp(A1) 2, r-exp(BI), so r-exp(A) = [z : r-exp(A1)] [z, r-exp(A1)l [z : ~ - e x p ( B 1 ) [z, ] T-exp(B1)l = T-exp( B ) . 0
>,
4.6.2.2. Corollary 1. A 2, B Corollary 2. A
>T
B
=S
*
r-exp(A)
r-exp(A) 2, .r-exp(B). >T
0
B (because .r-exp(B) 2, B).
0
4.6.2.3. Lemma. Let A be a (-expression, let B be a (E, z E a, q)-expression. Let I and I' stand for [z/A]and [ x / ~ - e x p ( A )resp. ] Then
7-exp(B)11
>,
T-exp(B')
with r-exp(B') taken w.r.t.
E , #.
Proof. Ind. on the definition of .r-exp(B):
(i)
-r-exp(x)" T-exp(xI).
= [ x ,.r-exp(a)]'I = [r-exp(A),.r-exp(a)l
>, T-exp(A)
=
D.T. van Daalen
610
4.6.2.4. Corollary. Let A be a <-expression, B is a (<,xE a)-expression. 0 Then r-exp(B )[ x / r-exp(A)] 2, .r-exp(B [ x / A ] ) . 4.6.2.5. Corollary. A 5 1 , p B Proof. Ind. on
+
r-exp(A) 51,p 2, r-exp(B).
51,~:
(i) P-contraction, A = ( A l )[ x : Az]A3, B A3[A1],r-exp(A) 51,p r-exp(A3) [x/r-exp(Al)]Zn r-exp(A3[A1])= T-exp(B), by 4.6.2.4. (ii) Monotonicity, e.g. A = [Al,A21, B [BI,& I , A1 s 1 , p B1, A2 5 1 , p B2. BY ind. hyp. r-exp(A) G [r-exp(Al),r-exp(Az)l 5 1 , p 2, [r-exp(Bl),r-exp(Bz)l = r-exp(B). 0
4.6.2.6. Theorem. r-exp(A) Pr-SN
+
A Pr-SN.
Proof. Let r-exp(A) be Prr-SN, use ind. on 19pv(r-exp(A)).If A >1,p B then r-exp(A) 3 1 , p In r-exp(B) (by 4.6.2.5), so by ind. hyp. Pr-SN(B). 0 Similarly, if A > I , ~ B then Pr-SN(B). So A is Pr-SN. 4.6.3. The proof of ,&-SN 4.6.3.1. The normable expressions are closed (and norms are preserved) under Zp T. Further & satisfies both substitution properties (see 4.4.1).Notice that 2, does not satisfy CR but that ,B and rr commute (use nested one step reduction 5 1 ,[compared ~ to disjoint 1-step reduction 51, nested 1-step reduction 51 has the extra clause: A > A', B > B' + ( [ A ,B] > A' and [A,B] > B ' ) ] )and that weak rrP-postponement holds: A >pr B + A >,>p C 5, B. 4.6.3.2. PPSN conditions These are again quite similar to the P-SN conditions. The interesting clauses are: (1) A Prr-SN, B Prr-SN
+
[x : A ] B and [A,B1 Prr-SN.
( 2 ) A Prr-SN, B PT-SN and ( B Pn-SN.
[ x : a]D
+
D[A]Prr-SN)
+
(A)B
So, again, we want the substitution theorem for Prr-SN. 4.6.3.3. Square brackets lemma for 2 p n . Let B be Prr-SN. Let * stand for [x/A].Let B* >pT [y : PIC. Then either
The language theory of Automath, Chapter VII ((7.5)
(1)
611
B I p , [Y:PoICOwith& I p T P, C$ >p, C , or
(2) B
>pn
Z)* 2 [Y PIC. (Bk) (B1)2, ((2)
Proof. AS in IV.2.4.3, by induction on (I) lilp,(B), (11) the length of B. The new case is [Bl,Bz],B* G [Bi,B,*l. Then either B,* >p, [ y : PIC 0 or B,* &, [ y : PIC, and we can apply ind. hyp. I to B1 or B2. Remark: An alternative proof is provided by Barendregt’s lemma, which is still valid for >p, (see 11.11.3.5).
4.6.3.4. Substitution theorem for Pr-SN. Let B be normable, p(x) = p ( A ) , A and B are ,hr-SN. Let * stand for [ x / A ] . Then B* Pr-SN. Proof. As in IV.2.4.4, by ind. on (1) P ( A ) , (11) %r(B), (111) length of B. [ B l ,B21, B* The new case concerns B Pr-SN by ind. hyp. I1 so B* is Pr-SN.
E
[Bi, €I,*]. Both Bf and B; are
4.6.3.5. Corollary. B normable =$ B Pr-SN.
0
0
4.6.3.6. Notice that the r-expansion of normable A is again normable, so A normable T-exp(A) normable. Corollary. A normable 3 A Pr-SN ( b y 6.2.6). 0
*
Corollary. BT.
0
VII.5. Closure and Church-Rosser for A v 5.1. Introduction 5.1.1. Here we consider the constant-less part of Aq, defined as in Sec. 2.12, but with 2 standing for Pq-reduction. It is easy t o derive a strengthening rule (Sec. V.1.6) for such an algorithmic system, so q-CL does not cause major difficulties. The problems with closure for Aq, as compared to A, are rather due t o the fact that CL and CR appear to be heavily interwoven. Namely, a proof of CL (see, e.g., VII.3) seems to make quite essential use of CR, while in turn we seem to need CL in the course of the CR-proof - because Pq-CR holds for correct expressions only. The solution is of course to prove CR and CL (and a number of other properties) simultaneously, by induction on big trees. In Sec. 5.2, below we prove indeed that BT extends to the present situation.
612
D.T. van Daalen
5.1.2. We introduce some notation that enables us to make the structure of the proof more explicit. Here 5 is as in VII.3.4. Definition. If P is a property of expressions then P* and Po' are given by (1) P*(A) :-+ A 5 B
=$
P(B).
(2) Po(A) :-+ (A properly &-reduces to B )
+
0
P(B).
Using this notation, we can express our induction step by
F A , CR;(A),
CLg(A)
+
CR(A), CL(A)
for which, of course, it is sufficient to prove
t- A , CRg(A), CL;(A)
+ C R ~ ( A ) ,C L ~ ( A.)
The properties SA, PD, PT and P*T from 3.1 play again a role in the proof, and further property SC, substitutivity of correctness, here defined by S C ( B ) :-+ .( E a B, t- A,tYP(A) 1tYP(.),tYP*(A) 1tYP*(.) I- B[AI).
*
5.1.3. Now the proof below is organized as follows. First we present some preliminary facts, among which Pq-BT (Sec. 5.2), strengthening and Q-PT (Sec. 5.3). Section 5.4 contains the actual closure proof. First we assume t- A, CR;(A), CL;(A), and prove SA(A) and PD(A) (in Sec. 5.4.1), PTl(A), SC(A) and CR1(A) (in Sec. 5.4.2-5.4.4) respectively by a separate induction on big trees, and by simple induction on length. Then we complete the proof by proving PT(A), P*T(A) and CL1(A) simultaneously, by induction on the big tree of A again. 5.2. Extension of BT t o the Pq-case 5.2.1. A postponement result Let &,, and 2p7,, be the straightforward extensions of Z7 and >p7, as defined in 4.4.2. Mere verification shows that
A pretyped, A
>1,,,>1,~ B 3
A
%,, B
>I,~
whence 777-postponement:
A pretyped, A 2,,7 B
A
&>,,
B.
Combining this with Pq-pp [P~ppostponernent] we get
A pretyped, A >-p7,, B
+
A 2p7>,, B
.
5.2.2. Pqr-SN and Pq-BT In 4.6.3 we proved Pr-SN, which - [induction o n 79p7], as in [ v a n Daalen 80, 11.7.3.51 - together with (Pr)-q-pp and 7-SN gives us Pqr-SN, for normable expressions. Then Pq-BT follows, as in 4.4.3.
The language theory of Automath, Chapter VII (C.5)
613
5.3. Some simple facts 5.3.1. Strengthening If B is a (t,x E a,G E B)-expression, but x $! FV(B) and x $? FV(B), then B is a (t,G E B)-expression as well, and the t y p (if degree(B) # 1) and typ' of B w.r.t. both contexts are syntactically equal (E). So, by induction-on the definition of correctness, we get strengthening: if x E a,G E PI- ( B ) ,x fZ FV(B) (and x fZ FV(B)) then G E @I-( B ) - read this twice, with and without the parts concerning B -. As a corollary we have: x E a I- A, x $! FV(A) + I- A, whence q-outsideCL1: I- [x : a](z)A, 2 $? FV(A) I- A.
*
5.3.2. q-PT and q-P*T For pretyped A there holds A
>9
B
* typ(A) tYP*(A)
>9 >9
typ(B) (if degree(A) # 1) , tYP*(B)
Proof. Induction on the length of A. So, induction on 29 gives A 29 B
* tYP(A) 2 9 tYP(B)
(if degree(A) # 1) 1
tYP*(A) 2s tYP*(B) and, a fortiori, we have y P T and q-P*T A
27
B
* tYP(A) 1tYP(B)
(if degree(A) # 1) ,
5.3.3. From 3.2.1 we recall the property of correctness of types I- A =+ I- typ(A)
and the substitution properties from 3.2.2
5.3.4. Property. Let degree(A) = 1, p(A) = [ul]... [ v ~ ] E Then . A 2 [x1 : a11 ... [xk : ak]C. Proof. Induction on the length of A. E.g. let A = (Al)A2, then p(A2) = [ ~ ( A I )[UI] ] ... [a]&, so by ind. hyp. A2 2 [x : P] [xl : a11 ... [xk : a k ] C and 0 A 2 [zl : a:] ... [xk : a;]C', q.e.d.
D.T. van Daalen
614 Corollary. Degree(A) = 1, p(A)
= [v1]v2 *
Corollary. I-' A, A E [z : a]C,A 2 F
+
A 2 [z : a]C.
0
F 2 [z : p]D.
Proof. If A correct, then A normable, so F normable, with
Corollary. I-l A, A E [z : a]C, A
1 F + F 2 [z : p]D.
5.4. The actual closure proof 5.4.1. Lemma. Let I- A, Cg(A), CL;(A). Then PD(A) and SA(A).
Proof. By induction on the big tree of A. (PD). Let A = [z : A1]A2, A 2 [z : B11B2. If A1 2 B1, A2 2 B2, then certainly A1 1 B1. Otherwise A2 2 (z)[z : The latter expression is correct, satisfies CR* and CL*, so we can use SA and get A1 1 B1, q.e.d. (SA). Let A = ( A l ) [ z : A2]A3. Then I- Al, typ(A1) 2 cp, I- [z : AzlA3, typ*([z : Az]A3) = [z : A2]typ*(A3) >_ [z : cp]C. By correctness of types I- [z : Az]typ*(A3), which also satisfies CR* and CL* so we can apply PD and get 0 A2 1 'p, whence typ(A1) I AS, q.e.d. 5.4.2. Lemma. Let I- A, CG(A), CL;(A). Then P T ~ ( A ) .
Proof. Induction on length(A). q-PT1 we know already (Sec. 5.3.2). For Pkmtside-PTl let A (Al) [z : A2]A3. By 5.4.1 typ(A1) 1 A2 and by the substitution property 5.3.3.1) typ(A) = (Al) [z : AzItyp(A3) > 0 typ(A3)[Al] 1typ(A3[A1]), q.e.d. The other cases are immediate.
=
5.4.3. Lemma. Let z E a, G E F B , C q ( B ) , CL:(B), I- A, typ(A) typ*(A) 1typ*(a). We write * for [z/A]. Then ( S C ( B ) ) a E p'* I- B*.
1 a,
Proof. Induction on length(B). The crucial case is: B = (BI) B2, typ(B1) Icp, typ'(B2) 2 [u : 'p]$. By ind. hyp. I- B1, I- B2. We do not know CR or CL for the substitution results, so we use a trick. Distinguish: (1)
B1
does not end in z, then typ(B1)
= typ(B1)"
2
'p*.
(2) Otherwise, let B1 = ...z ...z and form C1 from B1 by just replacing the final z, C1 = ... z...typ(A). Then C1 1 typ(B1) and by CR, C1 1 9. So t y p ( B f ) 3 Cf 1 v*.
Anyhow, in both cases t y p ( B f ) 2 Further distinguish:
'p'*,
with
'p'
1 'p.
The language theory of Automath, Chapter V I I (C.5)
(1) B2 does not end in z, then typ*(Bz)
615
= typ*(Bz)*2 [u : cp*]$*.
...z ... z, C2 (2) Otherwise form Cz from Bz by replacing its final z, B2 ... z... typ*(A) 1 typ*(Bz). Then, by CR (typ*(Bz)), C2 1 [u : cp]$ and, by 5.3.4 C2 2 [u : cp”]$” with, by PD, cp 1 $,”. Now typ*(B,”)= C,” 2 [u : cp’f*] $”* . So in both cases typ*(B,*)2 [u : cp”*]$”*, with cp 1 cp”. Now use CR(cp), this gives cp’ 1 p f f ,whence cpf* 1 cpf’* and typ(B,*)-1 cpIf*. So I- (€3;) B,”,q.e.d. 0 5.4.4. Lemma. Let I- A, CR;(A), CL;(A). Then CRl(A).
Proof. Again by induction on length. The crucial case is the critical &-case: A [X : All (x)[Z : AzIA3, z @ FV(A2). By 5.4.1 SA((Z) [Z: AzIA3) SO A1 1 Az, 0 [z : A1]A3 1 [z : AzIA3, q.e.d. 5.4.5. Lemma. Let I- A, CR;(A), CL:(A).
Then C L ~ ( A )PT(A) , and P*T(A).
Proof. Induction on the big tree of A. (1) (CL1). Let A > B, we must prove t- B. The 7-outside case we know already. Consider, e.g.: A = (Al) [Z: Az]As, B A3[A1]. By 5.4.1 typ(A1) 1 A2. By P*T - ind. hypothesis - we get typ*(Al) 1 typ*(z) as well, so by 5.4.3 we are done. This is P-outside CL1. Or consider: A E (Al) Az, A1 > B1, A2 > B2, B = (B1)B2, typ(A1) 2 cp, typ*(Az) 2 [u : cp]$. By (e.g.) the ind. hyp. we get I- B1, I- B2, typ(A1) 1 typ(B1) and typ”(A2) 1 typ*(Bz). Now use CR, this gives tYP(B1) 1 cp and tYP*(B2) 11. : cpl$. So, by 5.3.4, typ*(B2) 2 [u : cpf]$’I and by 5.4.1 cp 1 cp‘. Finally CR(cp) yields typ(B1) 1 cp’, so I- (B1)B2, q.e.d. The remaining case of CL1 is trivial.
=
(2) (PT). PT1 we know already. Now let A > I B 2 C. By CL1 I- B and by ind. hYP. PT(B), so by CR(tYP(B))l tYP(A) 5. tYP(C), q.e.d. (3) (P*T). Let degree(A) = 1. Then by PT, if A 2 B , typ(A) 2 F 5 typ(B). By C L ~ ( A(this ) implies CL(A))I- B, so by correctness of types, I- typ(A) and k typ(B). Now apply the ind. hyp.: typ*(A) 1 t y p * ( F ) 1 typ*(B) and 0 use CR: typ*(A) 1 typ*(B), q.e.d. 5.4.6. Theorem.
If k A, then CR(A),CL(A).
Proof. By induction on the big tree of A. The ind. hyp. reads CR;(A), CL;(A), and the preceding lemmas produce CR1(A) and CL1(A). As we noticed before, 0 this yields CR(A) and CL(A).
616
D.T. vaa Daalen
5.4.7. Corollary. If k A, then SA(A), PD(A), PT(A), P*T(A) and SC(A). 0
Note: The separate inductions on big trees in 5.4.1, 5.4.5 and 5.4.6 can of course be compressed into a single induction on big trees. 5.4.8.
VII.6. Various equivalence results 6.1. Introduction In V11.2 we introduced A(r]) with and without (definitional) constants. The results in VII.3-5 are derived for the constant-less system. In this section we extend these results in an indirect way to the remaining systems, by showing that, in a certain sense, they can be embedded in the constant-less version. Sec. 6.2 is devoted to primitive constants only. First we give a translation which eliminates the constant-expressions. Then we explain the relations between (a) the system with constants, (b) its image under the translation, and (c) the constant-less system. Afterwards we easily extend our nice properties (CL, CR, BT) to the system with constants. Sec. 6.3 covers the additional extension with definitional constants. In 6.4 we prove another equivalence: between Nederpelt’s single line presentation with abstractor strings Q and our presentation, with contexts E. In this case too, the correspondence is close enough to show that Nederpelt’s original system satisfies the required properties.
6.2. Eliminating primitive constants 6.2.1. The translation ‘ For the system with constants (for short: c-system) we use the notations A ( V ) ~and Fc. Now we define a translation of the c-system into the system without constants. The translation (notation ‘) is characterized by: (1) it transforms constants p into variables p’, (2) it converts constant-expressions p(A1, ...,A h ) into appl. expressions
(4)... (4)P’,
d*
(3) it eliminates schemes y’ E p($ E y one by one from the book by including an additional assumption p’ E [y’ : $17’ in the context,
The language theory of Automath, Chapter VII (C.5)
617
(4) it commutes with the other formation rules (for expressions, strings and contexts). Thus a statement B ;
6.2.2. Why the indirect approach? Below we use the properties of the constant-less system in our proof of the desired correspondence. Afterwards we can extend these properties to the c-system. The point is that the constant-less system is definitely easier to handle. In particular: the fact that the ty p of a constant-expression is constructed by substitution is a complicating factor, because correctness of types is not immediate any more. E.g. by using this indirect approach we would have been able to introduce constants without using degree-norms [as we did in VII.2.31. 6.2.3. The nature of the correspondence For terminology about extensions we refer to V.3.3.2. However, because we study an algorithmic system now, we replace A E B by typ(A) 1 B and A 9 B by A 1 B. Clearly the c-system is an extension of the system without constants. Because typ and remain the same, it is a conservative extension, too. Of course, it is not an unessential one: primitive constant-expressions do not main reduce at all, so they can never be definitionally equivalent to an expression without const ants. Contrarily, the translation ’ maps expression (and contexts), correct w.r.t. B in the c-system, properly [intended is: the translation is not surjective] into the expressions (and contexts), correct w.r.t. B’:expressions (A)p’ that do not have enough arguments in front, i.e. where 1 21 is smaller than the arity of p have no counterpart in the c-system. For the image of the c-system (w.r.t. a fixed book B) under ’, we introduce the notation I-. 1.e.
B'; ξ' ⊢⁻ A'  (resp. B'; ξ' ⊢⁻ A' ∈ B')   iff   B; ξ ⊢c A  (resp. B; ξ ⊢c A ∈ B) .
Then below it will appear that the expressions (and contexts) correct w.r.t. B' in the constant-less system form a conservative extension of the system ⊢⁻. In the presence of η-reduction, it will be definitional (so unessential) too; see Sec. 6.2.9.
6.2.4. Facts about ’ Notice that ’ is a purely “syntactical” matter, which has nothing to do with correctness: pretyped-ness is sufficient.
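Because ' is purely syntactical, it can be sketched in a few lines of code. The following is a toy illustration under my own encoding of terms as tuples (the names and the encoding are not from the text): constants p become variables p', and p(A₁, ..., Aₖ) becomes the application expression (Aₖ') ... (A₁') p'.

```python
def translate(term):
    """Translation ' of 6.2.1 on a toy term syntax (illustrative encoding):
    ('var', x)            variable
    ('con', p, [A1..Ak])  constant-expression p(A1, ..., Ak)
    ('app', A, B)         appl. expression (A) B -- argument A, function B
    ('abs', x, T, B)      abstraction [x : T] B
    """
    tag = term[0]
    if tag == 'var':
        return term
    if tag == 'con':                       # p(A1,...,Ak) -> (A'k)...(A'1) p'
        _, p, args = term
        body = ('var', p + "'")            # the fresh variable p'
        for a in args:                     # innermost applicator is (A'1)
            body = ('app', translate(a), body)
        return body
    if tag == 'app':
        return ('app', translate(term[1]), translate(term[2]))
    if tag == 'abs':
        return ('abs', term[1], translate(term[2]), translate(term[3]))
    raise ValueError(tag)
```

For instance, translate maps p(a, b) to the spine (b)(a)p', with the first argument innermost, matching the order used in (2) of 6.2.1.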
D.T. van Daalen
As a map from statements B; ξ ⊢ A to statements B'; ξ' ⊢ A' the translation is not one-one, but as a map from B-expressions and B-contexts into B'-expressions and B'-contexts it is one-one indeed. For the (partial) inverse we use the notation ⁰:

(A')⁰ := A .
Clearly, (A[B̅])' = A'[B̅'], so A ≥ B ⟹ A' ≥ B', so A ↓ B ⟹ A' ↓ B'. Further typ(A') ↓ typ(A)' — there are only head-βⁱ contractions involved, where degree(A) = i + 1 (for the definition of head-reduction see V.4.4.5, for i-reduction see V.3.3.3). And typ(A') = φ' for some φ. If there is no η-reduction then we have

(1) A' ≥ B ⟹ A ≥ B⁰ and B⁰' ≡ B, so
(2) A' ≥ B' ⟹ A ≥ B, and
(3) A' ↓ B' ⟹ A ↓ B.
6.2.5. ' and η-reduction With η-reduction, (1) above does not hold any more: ([x : α]p(Aₖ, ..., A₁, x))' ≡ [x : α'](x)(A₁') ... (Aₖ')p' may reduce to (A₁') ... (Aₖ')p'.

Lemma. A' ≥η B' ⟹ A ≥η B.

Proof. Ind. on the length of A. E.g. let A ≡ [x : α]C, so A' ≡ [x : α']C'. If B' ≡ [x : β']D' with α' ≥η β', C' ≥η D', use the ind. hyp. Otherwise C' ≥η (x)B'. The latter expression is ((x)B)' so by ind. hyp. C ≥η (x)B and A ≥η B, q.e.d. □
Now let A' ≥ B'; then by βη-pp: A' ≥β C ≥η B'. This C ≡ C⁰', so C⁰ ≥η B by the lemma, and A ≥ B. This is property (2) above. Property (3) can be proved in the same fashion.

6.2.6. Something about typ* Lemma. ⊢ B' ⟹ (⊢ typ*(B)', typ*(B)' ↓ typ*(B')).
Proof. The translation ' preserves the degree, of course. We use induction on degree(B). The degree 1 case is immediate. Otherwise typ*(B') = typ*(typ(B')) and typ*(B)' = typ*(typ(B))'. By correctness of types ⊢ typ(B'), reducing to typ(B)', and by P*T typ*(B') ↓ typ*(typ(B)'). By CL ⊢ typ(B)' so by ind. hyp. ⊢ typ*(typ(B))' (= typ*(B)'), q.e.d., and typ*(B)' ↓ typ*(typ(B)'). By correctness of types ⊢ typ*(typ(B)') so by CR typ*(B)' ↓ typ*(B'), q.e.d. □

Now that we know CL, CR, PD and SA for Λ(η) we can extend property 5.3.4
to: ⊢ A, ⊢ [x : α]C, A ↓ [x : α]C ⟹ A ↓β [x : β]D, α ↓ β. So, as an alternative application condition, equivalent to the one used in the original application rule:

⊢ A, ⊢ B, typ(A) ≥ α, typ*(B) ≥ [x : α]C ⟹ ⊢ (A)B

we can as well use, e.g.

typ(A) ↓ α, typ*(B) ↓β [x : α]C

or

typ(A) ↓ α, typ*(B) ↓ [x : α]C, ⊢ [x : α]C .
6.2.7. The proof of the correspondence

Theorem. B; ξ ⊢c A ⟺ B'; ξ' ⊢ A'.

Proof. ⟹. By induction on correctness. The formation of the context B' is allowed, due to the liberal degree conventions of Λ(η). Consider, e.g. the appl. rule: let ⊢c A, ⊢c B, typ(A) ↓ α, typ*(B) ↓ [x : α]C. By ind. hyp. ⊢ A', ⊢ B'; further typ(A') ↓ typ(A)' ↓ α' and by the lemma in 6.2.6 ⊢ typ*(B)', typ*(B') ↓ typ*(B)' ↓ [x : α']C'. By CR, typ*(B') ↓ [x : α']C'. By CL, ⊢ [x : α']C', so, by the alternative appl. rule, ⊢ (A')B'. Or consider the instantiation rule: ⊢c B₁, ..., ⊢c Bₖ, y̅ ∈ β̅ * p(y̅) ∈ γ is a scheme in B, |y̅| = k and typ(Bᵢ) ↓ βᵢ[B̅] for i = 1, ..., k. The translated scheme reads p' ∈ [y̅ : β̅']γ'. By ind. hyp. ⊢ B₁', ..., ⊢ Bₖ'. Now typ(B₁') ↓ typ(B₁)' ↓ β₁', typ*(p') = [y₁ : β₁'] ... γ', so ⊢ (B₁')p'. Further typ(B₂') ↓ typ(B₂)' ↓ β₂[B₁]' = β₂'[B₁'] and typ*((B₁')p') = (B₁')typ*(p') ≥ [y₂ : β₂'[B₁']] ... γ', so ⊢ (B₂')(B₁')p'. Etc., up to ⊢ (Bₖ') ... (B₁')p' = p(B̅)', q.e.d.

⟸. Also by induction on correctness. E.g. consider an appl. expression. Either it is ((A)B)' or it is p(B̅)'. First case: if ⊢ (A')B' then ⊢ A' (so ⊢c A), ⊢ B' (so ⊢c B), typ(A)' ↓ typ(A') ↓ α (so typ(A)' ↓ α), typ*(B)' ↓ typ*(B') ↓ [x : α]C (so typ*(B)' ↓ [x : α]C). Hence typ*(B)' ≥ [x : β]D ≡ [x : β₀']D₀' = ([x : β₀]D₀)' with α ↓ β. By CR typ(A)' ↓ β₀', so typ(A) ↓ β₀, and typ*(B) ↓ [x : β₀]D₀, so ⊢c (A)B. Second case: ⊢ (Bₖ') ... (B₁')p', so ⊢c Bₖ, ..., ⊢c B₁. Let y̅ ∈ β̅ * p(y̅) ∈ γ be the scheme of p. Now typ(B₁') ↓ β₁', typ*(p') = [y₁ : β₁'] ... γ' ≥ [y₁ : β₁] ... γ, so typ(B₁)' ↓ β₁', typ(B₁) ↓ β₁. Further typ(B₂') ↓ β₂', and [y₂ : β₂'[B₁']] ... γ ≤ (B₁')typ*(p') = typ*((B₁')p') ↓ [y₂ : β₂] ... γ, so typ(B₂) ↓ β₂[B₁]. Etc., up to typ(Bₖ) ↓ βₖ[B̅] and ⊢c p(B̅), q.e.d. □

6.2.8. The required properties Theorem. The strictly normable constant-expressions [see the comment to 2.1.4] satisfy BT.

Proof. Strictly normable c-expressions transform into strictly normable expressions without constants under the translation '.
And all BT-sequences of c-expressions A transform into subsequences of BT-sequences of A':
(1) typ(A') ↓ typ(A)',
(2) A >₁ B ⟹ A' >₁ B',
(3) A ⊂ B ⟹ A' ⊂ B'.

So by BT for the constant-less version we are done. □
Theorem. Λ(η)c satisfies CR.

Proof. Let ⊢c A, A ≥ B, A ≥ C. By the ⟹-part of the correspondence ⊢ A' and by CR for Λ(η): B' ↓ C', so B ↓ C, q.e.d. □

Theorem. Λ(η)c satisfies CL.

Proof. Let ⊢c A, A ≥ B. Then ⊢ A', A' ≥ B', so by CL ⊢ B'. So ⊢c B. □

Theorem. Λ(η)c satisfies SA, PD, PT, P*T, SC etc.

Proof. Either from CL and CR, or using the correspondence. □
6.2.9. An unessential extension result Now we explain the connection between the ⊢⁻-system and the ordinary ⊢-system of Λ(η) without constants. Recall

⊢⁻ A' ⟺ ⊢c A, i.e. ⊢⁻ A ⟺ ⊢c A⁰ .

The first half of the correspondence result shows ⊢⁻ ⟹ ⊢, i.e. a simple extension result. Now we define a translation ⁻ from the larger into the smaller system, as follows: if y̅ ∈ α̅ * p(y̅) ∈ γ is a scheme in B, |y̅| = k, i < k, then ((Aᵢ) ... (A₁)p')⁻ := [xᵢ₊₁ : αᵢ₊₁[A̅⁻]] ... [xₖ : αₖ[A̅⁻]](xₖ) ... (xᵢ₊₁)(Aᵢ⁻) ... (A₁⁻)p', i.e. we η-expand until p' gets enough arguments in front. For the rest ⁻ acts as identity. Clearly A⁻ ≥η A, A⁻ ≡ (A⁻⁰)'. Viz. ((Aᵢ) ... (A₁)p')⁻⁰ ≡ [xᵢ₊₁ : αᵢ₊₁[A̅]] ... [xₖ : αₖ[A̅]]p(A₁, ..., Aᵢ, xᵢ₊₁, ..., xₖ). The translation ⁻ is a bit intricate, because ((A)B)⁻ is not necessarily (A⁻)B⁻. In general (A⁻)B⁻ ≥β ((A)B)⁻ and B⁻[A⁻] ≥β (B[A])⁻. Further typ(A⁻) ↓η typ(A)⁻, and also typ(A⁻) ↓β typ(A)⁻. Without proof we state that A ↓ B ⟹ A⁻ ↓ B⁻, and that typ*(A⁻) ↓ typ*(A)⁻. From these facts it can be proved that ⊢ A ⟹ ⊢ A⁻, so by the second part of the correspondence ⊢ A ⟹ ⊢⁻ A⁻. In the presence of η-reduction, this is a typical unessential extension result.
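The η-expansion step of the translation ⁻ can be sketched in code. This is an illustrative miniature only: type-labels are omitted, the tuple encoding is mine, and an under-applied spine (Aᵢ)...(A₁)p' receives fresh variables until p' has arity-many arguments in front.

```python
def eta_expand(head, args, arity):
    """Sketch of the translation - of 6.2.9, with type labels omitted:
    ((Ai)...(A1)p')-  :=  [x(i+1)] ... [xk] (xk)...(x(i+1)) (Ai)...(A1) p'
    where k = arity and i = len(args) < k.
    Encoding: ('app', A, B) is (A)B, ('abs', x, B) is [x]B."""
    i = len(args)
    new_vars = ['x%d' % j for j in range(i + 1, arity + 1)]
    body = ('var', head)
    for a in args:                   # (Ai)...(A1) p' : innermost is (A1)
        body = ('app', a, body)
    for x in new_vars:               # apply the new variables on the outside
        body = ('app', ('var', x), body)
    for x in reversed(new_vars):     # abstract them again, [x(i+1)] outermost
        body = ('abs', x, body)
    return body
```

Note that η-contracting all the added abstractor/applicator pairs gives back the original spine, which is the content of A⁻ ≥η A.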
6.3. The case of definitional constants 6.3.1. We have three main possibilities to incorporate definitional constants in our theory. The first one studies the new system (we call it Λ(η)d, with correctness predicate ⊢d, and we also speak about the d-system etc.) independently, as a separate subject; the second one considers it as an extension of Λ(η)c, and the third one embeds it into Λ(η), by extending the translation ' from the previous sections in order to cover definitional constants. Here we actually use the second method, and just mention some points on the third one. But we start by proving the big tree theorem for Λ(η)d, for reasons of completeness and as an indispensable prerequisite for the separate study of the system (method one above).

6.3.2. The big tree theorem for Λ(η)d In 6.2.8 we proved BT for Λ(η)c by means of the embedding ' into Λ(η). It is indeed possible to extend ' to the case of definitional constants, but (see 6.3.3) the translation does not reflect the type-structure sufficiently, which makes this method fail here. So instead we revise the BT-proof of 5.2 (for Λ(η)) and adapt it to the Λ(η)d case, which is relatively easy. First we mention the BT-condition (see 4.3.2):
(5) BT(p(A̅)) ⟸ BT(A₁), ..., BT(Aₖ), BT(typ(p)[A̅]).
(6) BT(d(A̅)) ⟸ BT(A₁), ..., BT(Aₖ), BT(typ(d)[A̅]), BT(def(d)[A̅]).
The βδ-SN conditions are quite analogous, and, as in 4.4.3, we have:

Theorem. βδ-SN(A) ⟹ BT(A). □
This suggests that, in this case as well, the substitution property of βδ-SN is crucial. We choose to adapt the first BT-proof (Sec. 4.5), so we need the replacement theorem (see 4.5.6) instead: Let * denote [x/p(A̅)]LR, let B be normable, μ(x) = μ(p(A̅)), A̅, B βδ-SN. Then:

C ∈ B* ⟹ C βδ-SN.

Proof. As in 4.5.6. We consider a single reduction step C >₁,βδ D. For all β-steps and all δ-steps concerning variables (not constants), βδ-SN(D) can be proved as in 4.5.6. The remaining steps, i.e. δ-steps of constants, can only fall into the categories (1) and (2a), so we get βδ-SN(D) by ind. hyp. II or ind. hyp. III. □

So we have a list of corollaries:

(1) B normable, μ(x) = μ(A), A, B βδ-SN ⟹ B[A] βδ-SN.
(2) B normable, μ(xᵢ) = μ(Aᵢ), Aᵢ βδ-SN (i = 1, ..., k) and B βδ-SN ⟹ B[A̅] βδ-SN.
Proof. The simultaneous substitution can be simulated by iterated single substitution. □

(3) B normable ⟹ B βδ-SN.

Proof. Induction on pretyped expressions. For the new cases use the previous corollary. □

(4) B normable ⟹ B βηδ-SN.

Proof. βη-pp extends to the present case (see 5.2.1); δη-pp we knew already (see II.7.4). This gives (βδ)-η-pp and, by η-SN, βηδ-SN. □

(5) B normable ⟹ βηδ-BT(B). □
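The proof idea of corollary (2) — simultaneous substitution simulated by iterated single substitution — can be sketched as follows. This is a naive illustration under my own term encoding; it ignores variable capture and is sound only under the assumption, made here, that no substituted term Aᵢ mentions any of the variables xⱼ (as when instantiating the parameters of a scheme).

```python
def subst(term, x, a):
    """Single substitution term[x := a] on a toy syntax (no capture check).
    Encoding: ('var', x), ('app', A, B), ('abs', x, B)."""
    tag = term[0]
    if tag == 'var':
        return a if term[1] == x else term
    if tag == 'app':
        return ('app', subst(term[1], x, a), subst(term[2], x, a))
    if tag == 'abs':
        y, body = term[1], term[2]
        return term if y == x else ('abs', y, subst(body, x, a))
    raise ValueError(tag)

def subst_sim(term, pairs):
    """term[x1/A1, ..., xk/Ak], simulated by iterated single substitution;
    correct here because we assume no Ai contains any xj."""
    for x, a in pairs:
        term = subst(term, x, a)
    return term
```

If some Aᵢ did contain an xⱼ, iterated substitution would substitute into it again, so the stated side condition (or a renaming step) is genuinely needed.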
6.3.3. The translation into Λ(η) Here we show how the translation ' can be extended to the d-case. Viz. an expression d(A̅) transforms into (Aₖ') ... (A₁')[x̅ : α̅']D', where x̅ ∈ α̅ * d(x̅) := D * d(x̅) ∈ γ is the scheme of d. This translation behaves nicely w.r.t. reduction: A ≥ B ⟹ A' ≥ B'. But of course it is possible that an expression A' β-reduces to an expression which is not some B'. This is in contrast with the situation with primitive constants, where this could only occur by η-reduction. The best we can get is: A' >₁,β B ⟹ B ≥β C', A ≥₁,βδ C. So, e.g. by ind. on θβ(A'), we get A' ≥β B ⟹ B ≥β C', A ≥ C. For the rest the translation seems to be not too useful, because properties like A' ↓ B' ⟹ A ↓ B (at least where η-reduction is allowed) and typ(A') ↓ typ(A)' are only valid in the correct fragment. Note that typ(A') ≥ typ(A)' is simply wrong here.

6.3.4. Some properties of Λ(η)c Translation of Λ(η)d into Λ(η)c just requires the elimination of abbreviations, which can be done by δ-normalization. In the next sections we show that this actually constitutes a translation, i.e. that it preserves correctness. Here we first give some properties of Λ(η)c which we need in the - rather complicated - proof below. The single substitution result (of Λ(η), and of Λ(η)c too)

⊢ A, typ(A) ↓ α, (x ∈ α, η ⊢ B) ⟹ η[A] ⊢ B[A]

can, by induction on |A̅|, be extended to a simultaneous substitution result

⊢ A̅, typ(Aᵢ) ↓ αᵢ[A̅] for i = 1, ..., |A̅|, (x̅ ∈ α̅ ⊢ B) ⟹ ⊢ B[A̅].

The properties of Sec. 3.2.2 concerning the typ of substitution results can be generalized to (1) the simultaneous substitution case, (2) successive application of typ, resulting in:
typʲ(Aᵢ) ↓ typʲ(xᵢ)[A̅] for i = 1, ..., |A̅| ⟹ typʲ(B)[A̅] ↓ typʲ(B[A̅]) ,

for all relevant j, where typʲ stands for j successive applications of typ. This holds for Λ(η), but also for Λ(η)c and Λ(η)d. Notice that in case B does not end in one of the xᵢ we even have typʲ(B[A̅]) = typʲ(B)[A̅].
6.3.5. The translation into Λ(η)c Our notation for the translation is ⁻. For expressions, ⁻ amounts just to taking the δ-normal form. It is clear how ⁻ acts on strings and contexts. It is intended that the book B⁻ is formed from B by δ-normalizing and by skipping the abbreviational schemes. The translation is of course not 1-1. We recall that B[A̅]⁻ = B⁻[A̅⁻], that d(A̅)⁻ ≡ def(d)⁻[A̅⁻], and that δ-reduction commutes with βη-reduction. The latter implies

A ≥ B ⟹ A⁻ ≥ B⁻  and  A ↓ B ⟹ A⁻ ↓ B⁻.
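The translation ⁻ as δ-normalization can be sketched as follows. This is an illustrative encoding of mine: `defs` maps each definitional constant d to its parameter list and definiens, and — as in a correct book — a scheme may only refer to constants introduced earlier, which is why the recursive unfolding terminates.

```python
def inst(term, env):
    """Simultaneous replacement of free variables according to env
    (no capture handling; the schemes assumed here are closed)."""
    tag = term[0]
    if tag == 'var':
        return env.get(term[1], term)
    if tag == 'app':
        return ('app', inst(term[1], env), inst(term[2], env))
    if tag == 'abs':
        env2 = {k: v for k, v in env.items() if k != term[1]}
        return ('abs', term[1], inst(term[2], env2))
    if tag == 'con':
        return ('con', term[1], [inst(a, env) for a in term[2]])
    raise ValueError(tag)

def delta_nf(term, defs):
    """d(A1,...,Ak) unfolds to def(d)[A1,...,Ak], recursively."""
    tag = term[0]
    if tag == 'var':
        return term
    if tag == 'app':
        return ('app', delta_nf(term[1], defs), delta_nf(term[2], defs))
    if tag == 'abs':
        return ('abs', term[1], delta_nf(term[2], defs))
    if tag == 'con':
        params, body = defs[term[1]]
        env = dict(zip(params, [delta_nf(a, defs) for a in term[2]]))
        return delta_nf(inst(body, env), defs)
    raise ValueError(tag)
```

The identity B[A̅]⁻ = B⁻[A̅⁻] recalled above corresponds to normalizing the arguments first and then instantiating, as the 'con' case does.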
6.3.6. The translation preserves correctness Theorem. B; ξ ⊢d A ⟹ B⁻; ξ⁻ ⊢c typⁱ(A)⁻ and typⁱ(A)⁻ ↓ typⁱ(A⁻), for i = 0, ..., degree(A) − 1 (this includes ⊢c A⁻ itself).
Proof. By induction on ⊢d. Crucial cases are:
(1) The application case: A ≡ (A₁)A₂, ⊢d A₁, ⊢d A₂, typ(A₁) ≥ α, typ*(A₂) ≥ [x : α]C. By the ind. hyp. ⊢c A₁⁻, ⊢c typ(A₁)⁻, typ(A₁)⁻ ↓ typ(A₁⁻), ⊢c typⁱ(A₂)⁻, typⁱ(A₂)⁻ ↓ typⁱ(A₂⁻). Clearly typ(A₁)⁻ ≥ α⁻ so by CR typ(A₁⁻) ↓ α⁻. Similarly, typ*(A₂)⁻ ≥ [x : α⁻]C⁻, and typ*(typⁱ(A₂)⁻) ↓ typ*(typⁱ(A₂⁻)) = typ*(A₂⁻) ↓ typ*(A₂)⁻ (by P*T), so by CR, typ*(typⁱ(A₂)⁻) ↓ [x : α⁻]C⁻. Hence ⊢c typⁱ((A₁)A₂)⁻ (= (A₁⁻)typⁱ(A₂)⁻). See 6.2.6 for the alternative appl. condition. The property typⁱ((A₁)A₂)⁻ ↓ typⁱ(((A₁)A₂)⁻) is trivial.
(2) The definitional constant case: A ≡ d(B̅), ⊢d Bⱼ, typ(Bⱼ) ↓ βⱼ[B̅] for j = 1, ..., |B̅|, where y̅ ∈ β̅ * d(y̅) := D * d(y̅) ∈ γ is the scheme of d. By ind. hyp. ⊢c Bⱼ⁻ and typ(Bⱼ⁻) ↓ typ(Bⱼ)⁻ ↓ βⱼ[B̅]⁻. Also by ind. hyp. y̅ ∈ β̅⁻ ⊢c D⁻, y̅ ∈ β̅⁻ ⊢c γ⁻, y̅ ∈ β̅⁻ ⊢c typ(D)⁻ and typ(D)⁻ ↓ typ(D⁻). So, by the simultaneous subst. property, ⊢c D⁻[B̅⁻] (≡ A⁻), ⊢c γ⁻[B̅⁻] (= typ(A)⁻). We know that γ ↓ typ(D), so γ⁻ ↓ typ(D)⁻, so by CR typ(D⁻) ↓ γ⁻, whence typ(D⁻)[B̅⁻] ↓ γ⁻[B̅⁻] and, again by CR, typ(A⁻) ↓ typ(A)⁻. Now there is left to prove, for i = 2, ..., degree(A) − 1:
(a) ⊢c typⁱ(A)⁻ (= typⁱ⁻¹(γ[B̅])⁻), and
(b) typⁱ(A)⁻ ↓ typⁱ(A⁻), i.e. typⁱ⁻¹(γ[B̅])⁻ ↓ typⁱ(D⁻[B̅⁻]).

The ind. hyp. gives us ⊢c typⁱ⁻¹(γ)⁻, ⊢c typⁱ(D)⁻, typⁱ⁻¹(γ)⁻ ↓ typⁱ⁻¹(γ⁻), typⁱ(D)⁻ ↓ typⁱ(D⁻) for these i, and ⊢c typⁱ(Bⱼ)⁻ (↓ typⁱ(Bⱼ⁻)), for i = 0, ..., degree(Bⱼ) − 1, for j = 1, ..., |B̅|. Now (b) is simple: typⁱ⁻¹(γ[B̅]) ↓ typⁱ⁻¹(γ)[B̅], so typⁱ⁻¹(γ[B̅])⁻ ↓ typⁱ⁻¹(γ)⁻[B̅⁻] ↓ typⁱ⁻¹(γ⁻)[B̅⁻] ↓ typⁱ(D⁻)[B̅⁻] ↓ typⁱ(D⁻[B̅⁻]). Here we use PT and the substitution property of types. By CR we get (b). Property (a) we formulate in the form of a lemma.
Lemma. Let y̅ ∈ β̅ ⊢d γ, ⊢d Bⱼ, for j = 1, ..., |B̅|, with γ and B̅ as above. Then ⊢c typⁱ(γ[B̅])⁻, for i = 0, ..., degree(γ) − 1.

Proof. If γ does not end in some of the yⱼ then typⁱ(γ[B̅])⁻ = typⁱ(γ)⁻[B̅⁻], which is correct by the simultaneous subst. property. This also covers the case i = 0 (which we knew already). For the rest we use induction on the length of γ. The case γ ≡ yⱼ is true by assumption. Further consider the application case: γ ≡ (γ₁)γ₂, ⊢d γ₁, ⊢d γ₂, typ(γ₁) ≥ φ, typ*(γ₂) ≥ [z : φ]E. By ind. hyp. ⊢c γ₁[B̅]⁻, ⊢c typ(γ₁[B̅])⁻, ⊢c typⁱ(γ₂[B̅])⁻ for all i. We have typ(γ₁[B̅]⁻) ↓ typ(γ₁)⁻[B̅⁻] ≥ φ⁻[B̅⁻], so by CR typ(γ₁[B̅]⁻) ↓ φ[B̅]⁻. Similarly typⁱ(γ₂[B̅])⁻ ↓ typⁱ(γ₂)⁻[B̅⁻]. So, by CR and P*T, typ*(typⁱ(γ₂[B̅])⁻) ↓ typ*(γ₂)⁻[B̅⁻] ≥ [z : φ[B̅]⁻]E[B̅]⁻. Again by CR, typ*(typⁱ(γ₂[B̅])⁻) ↓ [z : φ[B̅]⁻]E[B̅]⁻, whence ⊢c typⁱ((γ₁[B̅])γ₂[B̅])⁻, q.e.d. □ The abstr. case is straightforward. This finishes the proof of the lemma.
This finishes the definitional constant case of the theorem. Now the remaining cases of the theorem are straightforward. This finishes the proof of the theorem. □
6.3.7. Is Λ(η)d a definitional extension of Λ(η)c? The above corollary amounts to the unessential extension properties UE2 and UE3 (see V.3.2.2). Of course, we also have ⊢c A ⟹ A ≡ A⁻, and it is tempting to conclude the other half of UE1:

B; ξ ⊢d A ⟹ B; ξ ⊢d A⁻

from the corollary. This is, however, not immediate as yet: we can conclude

B; ξ ⊢d A ⟹ B⁻; ξ⁻ ⊢c A⁻

and we know

(B; ξ)-typ(A⁻) ≥ (B⁻; ξ⁻)-typ(A⁻),

but we hardly know anything about

(B; ξ)-typ*(A⁻) .

Instead, we first prove the substitution theorem for Λ(η)d; this gives correctness of types, as well as δ-CL. The latter implies UE1, which completes our definitional extension result.

6.3.8. Some nice properties of Λ(η)d The corollary in 6.3.6 gives us already some nice results.

Theorem. Λ(η)d satisfies (1) CR, (2) SA and (3) PD.
Proof. (1) Let ⊢d A, B ≤ A ≥ C. Then ⊢c A⁻, B⁻ ≤ A⁻ ≥ C⁻. By CR B⁻ ↓ C⁻, so B ↓ C.

(2) Let ⊢d (A)[x : B]C. Then ⊢c (A⁻)[x : B⁻]C⁻, so typ(A⁻) ↓ B⁻. Further typ(A)⁻ ↓ typ(A⁻), and by CR, typ(A) ↓ B.

(3) Let ⊢d [x : α]A, [x : α]A ≥ [x : β]B. Then ⊢c [x : α⁻]A⁻, [x : α⁻]A⁻ ≥ [x : β⁻]B⁻. By PD α⁻ ↓ β⁻, so α ↓ β. □
Remark. We can also prove some form of PT and P*T. Let ⊢d A, ⊢d B, A ≥ B. Then typ(A) ↓ typ(B) and typ*(A) ↓ typ*(B).

Proof. ⊢c A⁻, ⊢c B⁻, A⁻ ≥ B⁻, so typ(A)⁻ ↓ typ(A⁻) ↓ typ(B⁻) ↓ typ(B)⁻ and by CR typ(A) ↓ typ(B). Similarly for typ*. □
6.3.9. Lemma. Let ⊢d Bᵢ, i = 1, ..., k. Let y̅ ∈ β̅ * c(y̅) ∈ γ be the scheme of c, with |y̅| = k. Let ⊢c c(B̅⁻). Then ⊢d c(B̅).
Theorem. Let ξ ≡ x̅ ∈ α̅ ⊢d B. Let * stand for [x̅/A̅]. Let ⊢d Aᵢ and typ(Aᵢ) ↓ αᵢ* for i = 1, ..., |x̅|. Then ⊢d B*.

Proof. We use induction on ⊢d B. So, by ind. hyp. ⊢d αᵢ* for i = 1, ..., |x̅|. Now typ(A₁) ↓ α₁. So typ(A₁⁻) ↓ typ(A₁)⁻ ↓ α₁⁻ and by CR typ(A₁⁻) ↓ α₁⁻. Similarly typ(A₂⁻) ↓ α₂*⁻ = α₂[A̅⁻], etc., and for all i typ(Aᵢ⁻) ↓ αᵢ[A̅⁻]. Now consider, e.g., the application case: x̅ ∈ α̅ ⊢d (B₁)B₂. By 6.3.6, x̅ ∈ α̅⁻ ⊢c (B₁⁻)B₂⁻ and by the subst. theorem in Λ(η)c, ⊢c (B₁⁻[A̅⁻])B₂⁻[A̅⁻] (≡ (B₁*⁻)B₂*⁻). By ind. hyp. ⊢d B₁*, ⊢d B₂*, so by the first lemma, ⊢d (B₁*)B₂*. Similarly use the second lemma for the constant-expression case. The other cases are immediate. □

6.3.10. The remaining nice properties for Λ(η)d Corollaries of the preceding theorem are
(1) correctness of types,
(2) δ-outside-CL1,
(3) β-outside-CL1 (use SA).
Lemma. Λ(η)d satisfies CL1.

Proof. The η-outside case is mere strengthening. We can use the lemmas in 6.3.9 for the inside cases. Let ⊢d (B₁)B₂, B₁ ≥ C₁, B₂ ≥ C₂. By ind. hyp. ⊢d C₁, ⊢d C₂. By 6.3.6 ⊢c (B₁⁻)B₂⁻, and B₁⁻ ≥ C₁⁻, B₂⁻ ≥ C₂⁻, so ⊢c (C₁⁻)C₂⁻, so ⊢d (C₁)C₂. Similarly for const. expressions. □

Theorem. Λ(η)d satisfies CL.

Proof. As usual, by ind. on ≥. □
6.4. Nederpelt's original formulation 6.4.1. Nederpelt's original definition of Λ [Nederpelt 73 (C.3)] used single-line presentation. I.e. instead of defining correctness of expressions relative to a context, he defined correctness of expressions having an abstractor string [x̅ : α̅] (notation Q) in front. For definiteness we give his rules. We write ⊢N for correctness in his system. But for certain provisions making sure that no confusion of variables occurs, the rules read:
6.4.2. Apart from the use of abstractor strings instead of contexts, there are two other points that make the two approaches not completely parallel. The first point concerns abstraction; our abstraction rule has no counterpart in Nederpelt's system. Nederpelt rather follows a combinatory (in the sense of combinatory logic) way of building expressions. In the language of combinatory logic, rule (2) above is the rule for I_α, the identity in α, and rule (3) is the rule for K_αγ, the constant function on α with outcome γ. Alternatively, rule (3) might be called a rule of weakening (see V.2.9.3).

6.4.3. The second point that requires attention is that an abstractor string can get involved in a reduction (notably an η-step), whereas contexts are of course immune to reduction. First some notation. We write |Q| for the number of abstractors in Q. We write Q ≥ Q' if Q = [x̅ : α̅], Q' = [x̅ : α̅'] and α̅ ≥ α̅' in the obvious sense. Now we have the following lemma: QA ≥ Q'A', |Q| = |Q'| ⟹ A ≥ A'. Proof: If there are no η-steps involving the border line between Q and A, then clearly Q ≥ Q', A ≥ A'. Otherwise Q = Q₁[z : α], α ≥ α', Q₁ ≥ Q₁', A ≥ (z)B with z ∉ FV(B) and Q₁'B ≥ Q₁''[z : β]A'. I.e. QA = Q₁[z : α]A ≥ Q₁'[z : α'](z)B >η Q₁'B ≥ Q₁''[z : β]A'. Now we can, e.g., use ind. on θ(QA) and conclude that B ≥ [z : β]A'. But then A ≥ A', q.e.d.

6.4.4. The equivalence proof Now we are ready for the equivalence proof.

Theorem. Let Q ≡ [x̅ : α̅], ξ ≡ x̅ ∈ α̅. Then ⊢N QA ⟺ ξ ⊢ A.

Proof. First the ⟸-part; we use induction on ⊢. E.g. consider our variable rule: from x̅ ∈ α̅ ⊢ we conclude x̅ ∈ α̅ ⊢ xᵢ. If xᵢ is the most "recent" variable then we must use rule (2). Viz. x̅ ∈ α̅ ⊢ is itself a result from x₁ ∈ α₁, ..., xᵢ₋₁ ∈ αᵢ₋₁ ⊢ αᵢ. By ind. hyp. we get ⊢N [x₁ : α₁] ... [xᵢ₋₁ : αᵢ₋₁]αᵢ. Otherwise we must insert the abstractors in between [xᵢ : αᵢ] and the end of Q by successive applications of rule (3).
Now consider the ⟹-part. The crucial case is the application clause. So let ⊢N QA, ⊢N QB, typ(QA) ≥ Qα, typ*(QB) ≥ Q[x : α]C. By ind. hyp. ⊢ A, ⊢ B. Now typ(QA) ≡ Q typ(A) ≥ Qα, so by the lemma typ(A) ≥ α. Similarly typ*(B) ≥ [x : α]C. So we conclude ⊢ (A)B, q.e.d. □

6.4.5. The nice properties for Nederpelt's system One of the consequences of the theorem is:
⊢N A ⟹ ⊢ A, so the N-system can be considered a part of our system. This gives CR and CL for ⊢N immediately. From this one can get the other properties SA, PD, PT etc. as usual.

6.4.6. Alternative way of embedding Λd into ΛN Resuming the results of the preceding sections: we have constructed an embedding of Λ(η)d (via Λ(η)c and Λ) into ΛN. Here we introduce an alternative way (due to Nederpelt [Nederpelt 71a]) of embedding Λ(η)d directly into ΛN. Our notation for the translation is, again, '. Let a statement B; ξ ⊢d A be given. Primitive schemes y̅ ∈ β̅ * p(y̅) ∈ γ are, as is to be expected, turned into abstractors [p' : [y̅ : β̅']γ']. The context ξ is of course transformed into an abstractor string ξ' ≡ Q. Essential is the translation of definitional constant schemes. A scheme y̅ ∈ β̅ * d(y̅) := D * d(y̅) ∈ γ is translated into an expression "segment" ([y̅ : β̅']D')[d' : [y̅ : β̅']γ']. All constant expressions c(A̅) are now translated into (Aₖ') ... (A₁')c'. So B; ξ ⊢d A is translated into a single expression B'ξ'A', where B' is a string of abstractors and applicators, and ξ' consists solely of abstractors. For expressions the translation is quite similar to the translation ' in 6.2.1. In particular we have (as in 6.2.4) typ(A') ≥β typ(A)'. However, w.r.t. δ-reduction the correspondence is not too close: it is not possible to eliminate occurrences of d' one at a time. So in order to establish A ↓ B ⟹ A' ↓ B' we need a partial δ-normal form again. Anyhow, it is indeed possible to prove B; ξ ⊢d A ⟹ ⊢N B'ξ'A'.
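The passage from our context presentation to a single-line presentation is mechanical; here is a minimal sketch (my own tuple encoding, with ('abs', x, T, B) for [x : T]B):

```python
def single_line(context, term):
    """6.4.1: a statement  x1 : a1, ..., xn : an |- A  becomes the single
    expression  [x1 : a1] ... [xn : an] A  with an abstractor string in front."""
    for x, a in reversed(context):
        term = ('abs', x, a, term)
    return term
```

Iterating from the innermost assumption outward puts x₁ : α₁ as the outermost abstractor, as required.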
The language theory of Automath, Chapter VIII, Section 3 (C.5)
629
VIII. SOME RESULTS ON AUT-Π

VIII.3. A short proof of closure for AUT-Π

3.1. Proving closure for AUT-Π is not very different from proving it for AUT-QE. So we just sketch how to modify the proof in V.3.2. We start with a version without the extensions mentioned in 2.4 and 2.7, but we include all reductions (also δ¹-reduction).

3.2. For the terminology see V.3.1. Let ≥ denote disjoint one-step reduction. [See the comment to II.8.] By the properties in [van Daalen 80, II.7.4.3] [or, alternatively, by weak δ-advancement and induction on dδ(A)] we have

A ≥ B ⟹ δ-nf(A) ≥ δ-nf(B) .

By the substitution theorem we have δ-CLPT. The δ-nf's of 1-expressions are of the form Π([x : α]A) or τ. Reductions of these expressions can only be internal, so by induction on ≥ we get (including what might be called UD¹ here):

⊢¹ Π([x : α]A) q Π([x : β]B) ⟹ α q β and (x ∈ α ⊢ A q B) .
3.3. From this follows SA² (whence β-outside-CL₁²) and β-outside-PT₁². Viz. let A ∈ α, ⊢² [x : B]C ∈ Π([x : α]D), with conclusion ⊢ (A)[x : B]C. Then, for some E, x ∈ B ⊢ C ∈ E and ⊢ Π([x : B]E) q Π([x : α]D). So α q B and x ∈ B ⊢ E q D, whence A ∈ B (i.e. SA²) and x ∈ B ⊢ C ∈ D. So C[A] ∈ D[A] (i.e. β-outside-CLPT₁²).

The proofs of UT² and the inside cases of PT₁² are by ind. on ⊢.
3.4. The strengthening rule gives η-outside-CL₁. Here follows a proof of η-outside-PT₁² different from the proof in V.3.2.5. Viz. let ⊢² [x : α](x)A ∈ γ, x ∉ FV(A). Then [x : α](x)A ∈ Π([x : α]C[y/x]) q γ, where x ∈ α ⊢ A ∈ Π([y : α']C), α' q α. So, as well, x ∈ α ⊢ A ∈ Π([y : α]C). By weakening x ∈ α, y ∈ α ⊢ A ∈ Π([y : α]C) and x ∈ α, y ∈ α ⊢ (y)A ∈ C, so x ∈ α ⊢ [y : α](y)A ∈ Π([y : α]C). Again by weakening x ∈ α ⊢ [x : α](x)A ∈ γ, so by UT² x ∈ α ⊢ γ q Π([y : α]C). Hence x ∈ α ⊢ A ∈ γ and by strengthening A ∈ γ, q.e.d.

3.5. This completes the proof of PT₁². Then PT² and LQ² follow by ind. on ≥ and q respectively. Now we come to CLPT³. For properties like SA³ we need
3.6. To this end we study β²-reduction and, in particular, β²-head-reduction, for short βₕ² (for the definitions see V.3.3.3 and V.4.4.5). We know already β²-outside-CLPT₁ (this is β-outside-CLPT₁²). From this follows β²-CLPT₁ by ind. on ⊢, and β²-CLPT by ind. on ≥. Now we use the fact that 3 is the only argument degree and that, hence, β²-reduction does not create new β²-redices. Compare V.3.3.4, VI.2.4. As a consequence, β²-SN is quite easily provable (for degree-correct expressions) even without using norms: namely, if A β²-SN, B β²-SN, then A[B] β²-SN, by ind. on (1) θβ²(B), (2) length(B). So, as usual, β²-SN by ind. on length (see IV.2.4.1). A fortiori, βₕ²-SN. Besides, βₕ² satisfies CR, so we can speak about βₕ²-nf's. E.g.,

degree(B) = 2, βₕ²-nf(B) = [x : α]C ⟹ βₕ²-nf((A)B) = C[A] .

Clearly βₕ² and δ commute, so βₕ²δ-CR, and βₕ²δ-nf's are defined too.
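Head reduction as used here just keeps contracting the redex at the head; the following is a naive illustrative sketch (my own encoding, with degrees and the δ-steps omitted, so termination is simply assumed for the inputs shown):

```python
def subst(term, x, a):
    """term[x := a], naive (no capture check).
    Encoding: ('var', x), ('app', A, B) for (A)B, ('abs', x, T, B) for [x:T]B."""
    tag = term[0]
    if tag == 'var':
        return a if term[1] == x else term
    if tag == 'app':
        return ('app', subst(term[1], x, a), subst(term[2], x, a))
    if tag == 'abs':
        y, t, body = term[1], term[2], term[3]
        return term if y == x else ('abs', y, subst(t, x, a), subst(body, x, a))
    raise ValueError(tag)

def beta_head_nf(term):
    """Contract head redexes  (A)[x : T]C  >  C[x := A]  until none is left."""
    while term[0] == 'app' and term[2][0] == 'abs':
        arg, fun = term[1], term[2]
        term = subst(fun[3], fun[1], arg)
    return term
```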
3.7. Theorem. ⊢² A q Σ(φ) ⟹ βₕ²δ-nf(A) = Σ(ψ), φ q ψ.

Sketch of proof. Ind. on q. For the induction step we need the following property: ⊢² A, βₕ²δ-nf(A) = Σ(φ), A ≥ C or C ≥ A, ⊢ C ⟹ βₕ²δ-nf(C) = Σ(ψ), φ q ψ. If C ≥ A it is easy: (βₕ²)-i-pp holds here for all kinds of reduction i (see II.9), so βₕ²δ-nf(C) = Σ(ψ), ψ ≥ φ. Otherwise, A ≥ C. Now βₕ²δ commutes with all other kinds of reduction, except η¹ (see II.8). And it even commutes with the latter, except for "outside" domains, where we define the latter to be the αᵢ, βⱼ, etc. in (A̅)[x̅ : α̅](B̅)[y̅ : β̅] ..., with (A̅) possibly empty. But there are no "outside" domains left in Σ(φ). So, in any case, βₕ²δ-nf(C) = Σ(ψ), φ ≥ ψ. In fact, if A ≥βₕ²δ C then φ = ψ. By βₕ²δ-CL we know that both Σ(φ) and Σ(ψ) are correct, so from (φ ≥ ψ or ψ ≥ φ) we can conclude φ q ψ. This proves the wanted property. □

Corollary. Σ(φ) q Σ(ψ) ⟹ φ q ψ. □
3.8. Both the theorem and the corollary can be proved in precisely the same manner for Π and ⊕, yielding the properties in 3.5. Remark: The theorem above is a kind of minimal result for the desired properties. E.g., we can, alternatively, prove a kind of weak CR²-result as in VI.2.4, or prove a similar but stronger theorem in the spirit of V.3.3, V.3.4.
3.9. Now we are able to prove the outside cases of CLPT³. E.g. for +-reduction. Let ⊢ (i₁(A, β))(F ⊕ G) ∈ γ. Then i₁(A, β) ∈ δ, F ⊕ G ∈ Π(φ), φ ∈ δ → τ, (i₁(A, β))φ q γ. And A ∈ α, α ⊕ β q δ, F ∈ α' → γ', G ∈ β' → γ', (α' ⊕ β') → γ' q Π(φ). So [x : α' ⊕ β']γ' q φ, and [x : α' ⊕ β']γ' ∈ δ → τ. So (α' ⊕ β') q δ q (α ⊕ β), whence α q α', β q β'. So (A)F ∈ γ'. Further γ' q (i₁(A, β))[x : α' ⊕ β']γ' q (i₁(A, β))φ q γ, whence (A)F ∈ γ too. Similarly for the other variant of +.

3.10. Then follows full CLPT₁ by ind. on ⊢ and CLPT by ind. on ≥. Besides, we have of course UT and LQ. And we can freely make the language definition somewhat more liberal, as follows. First we can change the q-propagation rule into

A q B, B ↓ C, ⊢ C ⟹ A q C .

Secondly we can add the appl. rule, with i ≥ 1:

A ∈ α, ⊢ⁱ⁺¹ B q [x : α]C ⟹ ⊢ (A)B

and drop the degree restriction in the appl. rule 1 (i.e. rule 1.4).
3.11. Now we shall say something about proving CL for AUT-Π with the extension of Sec. 2.4. Just adding abstr. expressions of degree 1 does not matter at all; we still can get UD¹ without any difficulty. Making the language into a +-language (i.e. adding appl-1-expressions too) causes some trouble with the domains in case η-reduction is present. This can, however, be circumvented as in V.3.3: first leave η¹ out, then prove CL and add η¹ again.

3.12. Finally the extension of Sec. 2.7, i.e. where ⊕-2-expressions are present. If there is also ε²-reduction the situation is essentially more complicated, because β and ε interfere nastily. But without ε² the proofs of 3.3-3.8 just need some modification: (β+)²-SN can be proved as easily as β²-SN, +²-CLPT is not difficult either. Then Theorem 3.7 can be proved for (β+)ₕ²δ-nf's instead.

3.13. Requirements for the pp-results in II.9 were:

(1) The result of outside-σ-reduction is never a ⊕-, an inj- or an abstr-expression.
(2) The result of outside η or ε is never an inj-expression or a pair.
Now we can easily verify them for AUT-Π using the results of this section. First let (φ, A(1), A(2)) >σ A. I.e. degree(A) = 3, A ∈ Σ(φ). If A were an abstr-term, then A ∈ Π(ψ) for some ψ. UT states that Π(ψ) q Σ(φ). Theorem 3.7 states that Π(ψ) ≥ Σ(χ) for some χ. This is impossible. Similarly for inj- or ⊕-expressions. Or let [x : α](x)A >η A. By PT A ∈ Π(φ) for some φ. If A were an inj-expression, then degree(A) = 3, A ∈ (β ⊕ γ) for some β, γ. By UT Π(φ) q (β ⊕ γ). Use the suitable variant of Theorem 3.7 again (Sec. 3.8); this gives a contradiction.
VIII.4. A first SN-result for an extended system

4.1. Introduction The word "extended" in the title of this section refers to the presence of other formation rules than just abstr and appl (and possibly instantiation) and other reduction rules than just β and η (and possibly δ). In the case of AUT-Π we are concerned with the additional presence of:

(1) pairs and projections, with reductions π and σ,
(2) injections and ⊕-terms, with reductions + and ε.
In IV.2.4 we gave some versions of a "simple" (as compared to a proof using computability) proof of β-SN. Then we extended it to βη using βη-pp. Afterwards we included δ as well. Here we stick to the separation of δ from the other reduction rules. Below we first show (4.6) that addition (1) mentioned above does not cause any trouble: the first version of the "simple" proof of β-SN immediately covers the βπ-case. And afterwards, we can include δ and η by a postponement result again. However, the second addition essentially complicates matters. The presence of + makes the first β-SN proof fail here, because the important induction on functional complexity (norm) goes wrong (see Sec. 5.1.2). We add new, so-called permutative reductions (Sec. 4.3.1, (III)) in order to save the idea of the proof (5.1.3). These permutative reductions, in turn, complicate the SN-condition, and a way to keep them manageable consists of adding (in 5.1.5) still another kind of reduction, viz. improper reductions (Sec. 4.3.1, (IV)). Our second β-SN proof of Ch. IV can fairly easily be adapted for the present situation, however. We just have to add improper reductions to make the proof work (see Sec. 5.2). For completeness we also include a proof based on the computability method (Sec. 5.3). However, these three proofs just cover the situation with βπ+-reduction and can, by ext-pp, be extended to βπδη. Alas, we have not been able to handle ε too. We cannot use pp anymore, so we have to include ε from the start of the proof on. And none of our methods can cope with this situation. The problems with ⊕ (or ∨) are well known from proof theory. E.g. Prawitz in [Prawitz 65] first proves normalization for classical propositional logic, where he avoids the problem with ∨ by defining ∨ in terms of "negative" connectives.
4.2. The system AUT-Π₀

4.2.1. For brevity and clarity we study a system of terms with the same "connectives" and reductions as AUT-Π (so the essential problems with SN become clear) but with a simplified type-structure. It can be compared with the normable expressions of Ch. IV. Later (Sec. 5.4) we extend our results to AUT-Π.

4.2.2. Reduced type structure The reduced types or norms (syntactical variables α, β, γ, ν) are inductively given by:

(1) τ is a norm.
(2) if α and β are norms, then so are α ⊗ β, α → β and α ⊕ β.

Note: If we write [α]β instead of α → β it is clear that the norms of Ch. IV form a subset of the present norm system. We write α → β with the purpose to show that our norms form a simple type structure over a single fixed type, τ. This is also true of the norms in Ch. IV. Hence normability results (as in Ch. IV, or as given earlier in [van Benthem Jutting 71b (C.1)], [Nederpelt 73 (C.3)] for certain Automath variants) can alternatively be proved as follows: the generalized systems under consideration are not essentially richer than simple, non-generalized type theory, in the sense that they do provide the same set of terms of free λ-calculus with a type as does a simple, non-generalized system. Compare [Ben-Yelles 81].
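As a tiny illustration of the note, the norms can be encoded as nested tuples over the single base norm τ, with a printer that uses the bracket notation [α]β of Ch. IV (the encoding is mine, for illustration only):

```python
# Norms of 4.2.2: ('tau',) is the base norm; ('tensor', a, b), ('arrow', a, b)
# and ('plus', a, b) stand for a (x) b, a -> b and a (+) b.

TAU = ('tau',)

def is_norm(n):
    """Check clauses (1) and (2) of 4.2.2."""
    if n == TAU:
        return True
    return (isinstance(n, tuple) and len(n) == 3
            and n[0] in ('tensor', 'arrow', 'plus')
            and is_norm(n[1]) and is_norm(n[2]))

def show(n):
    """Print a norm, writing [a]b for a -> b as in Ch. IV."""
    if n == TAU:
        return 'tau'
    a, b = show(n[1]), show(n[2])
    return {'arrow': '[%s]%s' % (a, b),
            'tensor': '(%s (x) %s)' % (a, b),
            'plus': '(%s (+) %s)' % (a, b)}[n[0]]
```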
4.2.3. Terms of AUT-Π₀. All terms (syntactical variables A, B, C, ...) have a norm. The norm of A is denoted μ(A). We also write A ∈ α for μ(A) ≡ α. Terms are constructed according to:
(i) variables x, y, z, ... of any norm,
(ii) x ∈ α, A ∈ α, B ∈ β ⇒ [x : A]B ∈ α → β,
(iii) C ∈ α ⊗ β, A ∈ α, B ∈ β ⇒ (C, A, B) ∈ α ⊗ β,
(iv) A ∈ α, B ∈ β ⇒ i₁(A, B) ∈ α ⊕ β, i₂(A, B) ∈ β ⊕ α,
(v) B ∈ α → β, A ∈ α ⇒ (A)B ∈ β,
(vi) B ∈ α ⊗ β ⇒ B(1) ∈ α, B(2) ∈ β,
(vii′) [x : A]C ∈ α → γ, [y : B]D ∈ β → γ ⇒ ([x : A]C ⊕ [y : B]D) ∈ (α ⊕ β) → γ.
These terms can be compared with the 3-expressions of AUT-Π. However, there are no constants, no instantiation (and no δ), the type structure is simpler, and there are only ⊕-terms of the form [x : A]C ⊕ [y : B]D. Below we also consider a variant AUT-Π₁ which has general ⊕-terms. Instead of rule (vii′) it has rule (vii):
(vii) B ∈ α → γ, C ∈ β → γ ⇒ B ⊕ C ∈ (α ⊕ β) → γ.
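Rules (i)-(vii) amount to a small type discipline over the norms, which can be sketched as a norm-computing checker. The tuple encoding of terms is an assumption of this illustration, not the text's official syntax; type labels are carried along but not themselves checked.

```python
def norm(t):
    """Compute the norm of a term following rules (i)-(vii).
       Terms: ('var', x, a), ('abs', x, a, B) for [x:...]B, ('app', A, B) for (A)B,
       ('pair', A, B), ('inj', j, A, b), ('proj', j, B), ('plus', B, C)."""
    tag = t[0]
    if tag == 'var':                       # (i)
        return t[2]
    if tag == 'abs':                       # (ii)  [x:...]B in a -> b
        return ('->', t[2], norm(t[3]))
    if tag == 'pair':                      # (iii) pair in a (x) b, type label omitted
        return ('tens', norm(t[1]), norm(t[2]))
    if tag == 'inj':                       # (iv)  i1(A) in a (+) b, i2(A) in b (+) a
        _, j, a_t, other = t
        a = norm(a_t)
        return ('plus', a, other) if j == 1 else ('plus', other, a)
    if tag == 'app':                       # (v)   (A)B in b when B in a -> b, A in a
        fb = norm(t[2])
        assert fb[0] == '->' and fb[1] == norm(t[1]), 'ill-normed application'
        return fb[2]
    if tag == 'proj':                      # (vi)  B(1) in a, B(2) in b
        fb = norm(t[2])
        assert fb[0] == 'tens'
        return fb[t[1]]
    if tag == 'plus':                      # (vii) B (+) C in (a (+) b) -> g
        nb, nc = norm(t[1]), norm(t[2])
        assert nb[0] == '->' == nc[0] and nb[2] == nc[2]
        return ('->', ('plus', nb[1], nc[1]), nb[2])
    raise ValueError(tag)

ident = ('abs', 'x', 'tau', ('var', 'x', 'tau'))
assert norm(ident) == ('->', 'tau', 'tau')
assert norm(('app', ('var', 'y', 'tau'), ident)) == 'tau'
```

The assertion in the (vii) branch enforces exactly the restriction discussed in 4.6.2: both summands must share the target norm γ.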
Below, we often omit type-labels in [x : A]B, i₁(A, B), i₂(A, B) and (C, A, B), just writing [x]B, i₁(A), i₂(A) and (A, B).

4.3. The reduction rules

4.3.1. We consider four groups of reduction rules.
(I) The introduction-elimination rules (IE-reductions) β, π and ⊕′ (see 2.6). Rule ⊕′ is particularly appropriate for AUT-Π₀, i.e. in connection with rule (vii′). For AUT-Π₁ we rather use rule ⊕.

(II) The ext-reductions η, σ and ε. Here we use the simple unrestricted version of σ: (C, A(1), A(2)) >σ A.
(III) Permutative reductions (p-reductions).

(→) (A)(B)([x]C ⊕ [y]D) > (B)([x](A)C ⊕ [y](A)D).
(⊗) ((A)([x]C ⊕ [y]D))(1) > (A)([x]C(1) ⊕ [y]D(1)), and similarly for the (2)-projection.
(⊕) D ≡ E ⊕ F ⇒ ((A)([x]B ⊕ [y]C))D > (A)([x](B)D ⊕ [y](C)D).

The general pattern of these rules looks like

O((A)([x]B ⊕ [y]C)) > (A)([x]O(B) ⊕ [y]O(C)),

where O is an operation on expressions, given in one of the following ways: O(B) ≡ (A)B, O(B) ≡ (B)(E ⊕ F), O(B) ≡ B(1) or O(B) ≡ B(2). The norms of these B's are respectively α → β, α ⊕ β and α ⊗ β. That is why the rules are coded (→), (⊕) and (⊗). In case the argument of O allows outside (i.e. ⊕-) reduction, the p-step does not produce a new equality: O((i₁(A))([x]B ⊕ [y]C)) > O(B[A]) ≡ O(B)[A] < (i₁(A))([x]O(B) ⊕ [y]O(C)). Below (6.2), it turns out that, generally, p-equality is generated by σηε-reduction.
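The (→)-rule can be sketched as a rewrite on tuple-encoded, label-free terms. The encoding ('app', A, B) for (A)B is our own, and the sketch assumes (as renaming of bound variables guarantees) that x and y do not occur free in A.

```python
def perm_arrow(t):
    """One (->)-permutative step at the root:
       (A)((B)([x]C (+) [y]D))  >  (B)([x](A)C (+) [y](A)D).
       Terms: ('var', v), ('abs', x, body), ('app', arg, fun), ('oplus', l, r).
       Assumes the bound variables do not occur in A (rename first otherwise)."""
    if t[0] != 'app':
        return None
    _, A, inner = t
    if inner[0] != 'app':
        return None
    _, B, s = inner
    if s[0] != 'oplus':
        return None
    _, abs_c, abs_d = s
    # push the outer application (A)_ under each binder
    push = lambda ab: ('abs', ab[1], ('app', A, ab[2]))
    return ('app', B, ('oplus', push(abs_c), push(abs_d)))

t = ('app', ('var', 'a'),
     ('app', ('var', 'b'),
      ('oplus', ('abs', 'x', ('var', 'x')), ('abs', 'y', ('var', 'y')))))
assert perm_arrow(t) == ('app', ('var', 'b'),
                         ('oplus', ('abs', 'x', ('app', ('var', 'a'), ('var', 'x'))),
                                   ('abs', 'y', ('app', ('var', 'a'), ('var', 'y')))))
```

The (⊗)- and (⊕)-rules follow the same pattern with O(B) ≡ B(j) and O(B) ≡ (B)(E ⊕ F) in place of O(B) ≡ (A)B.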
The above mentioned rules are the standard ones from proof theory. There it is formulated like this: if the conclusion of an ∨-elimination rule forms the major premise of an elimination rule, then the latter rule can be pushed upward through the ∨-elimination rule. E.g. our (→)-rule can be compared with the following proof theoretic reduction:

[derivation figure: from B : α ∨ β and derivations of γ → δ from the assumptions [α] and [β], ∨E yields γ → δ, which →E applies to A : γ to give δ; this is permuted to the derivation in which →E is performed inside both branches and ∨E concludes δ directly]

Both here and in proof theory the p-reductions are primarily introduced for technical reasons. However, as Pottinger [Pottinger 77] points out, there is some intuitive justification for them too. Part of it, that in some cases they do not extend the equality relation, is stated above. It has been suggested to allow other permutative reductions as well ([Pottinger 77], [Leivant 75]). However, in [Zucker 74] it has been shown that this spoils SN.

(IV) Improper reductions (im-reductions).
(im) (A)([x : B]C ⊕ [y : D]E) > C, (A)([x : B]C ⊕ [y : D]E) > E.
Notice that the set of free variables of an expression can be enlarged by performing an im-reduction. If an inside im-reduction takes place inside the scope of some bound variable, the latter variable has to be renamed in order to avoid any confusion. These reductions can be compared with Leivant's [Leivant 75] semi-proper reductions. They degenerate to what Prawitz calls immediate simplifications when x ∉ FV(C), resp. y ∉ FV(E).
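The remark on free variables can be made concrete. In a small sketch with tuple-encoded, label-free terms (our own encoding), the left im-step (A)([x]C ⊕ [y]E) > C frees the occurrences of x in C:

```python
def free_vars(t):
    """Free variables of a label-free term."""
    tag = t[0]
    if tag == 'var':
        return {t[1]}
    if tag == 'abs':                      # ('abs', x, body)
        return free_vars(t[2]) - {t[1]}
    if tag in ('app', 'oplus'):           # ('app', arg, fun), ('oplus', l, r)
        return free_vars(t[1]) | free_vars(t[2])
    raise ValueError(tag)

def im_left(t):
    """(A)([x]C (+) [y]E) >im C  (the left improper reduction)."""
    _, _, s = t
    _, abs_c, _ = s
    return abs_c[2]                       # the body C, with x now free

t = ('app', ('var', 'a'),
     ('oplus', ('abs', 'x', ('var', 'x')), ('abs', 'y', ('var', 'z'))))
assert free_vars(t) == {'a', 'z'}
assert free_vars(im_left(t)) == {'x'}    # x has become free
```

This is exactly why an inside im-step under a binder may force renaming: the freed x could otherwise be captured.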
4.3.2. One-step and many-step reduction. One-step reduction >₁ is, as usual, generated from the main or outside reductions given above by the monotonicity rules. Many-step reduction ≥ then follows by reflexivity and transitivity.

4.3.3. The usual substitution properties are valid, e.g.,

B >₁ B′ ⇒ B[A] >₁ B′[A] and A >₁ A′ ⇒ B[A] ≥ B[A′] etc.
4.4. Closure for AUT-Π₀

4.4.1. First notice that AUT-Π₀ is certainly not closed under η, because of the restrictive rule (vii′). So the proof below is intended for the η-less case.

4.4.2. Due to the simple type structure it is quite easy to show that norms are preserved under substitution and reduction, and hence that AUT-Π₀ is closed under reduction.

4.4.3. Substitution lemma for the norms. x ∈ α, A ∈ α, B ∈ β ⇒ B[x/A] ∈ β (and B[x/A] a term).

Proof. Ind. on length of B. □
4.4.4. Reduction lemma for norms. A ∈ α, A >₁ A′ ⇒ A′ ∈ α (this includes CL1).

Proof. Ind. on the definition of >. For β and ⊕′ use the substitution lemma. E.g. ⊕′: let A ≡ (i₁(A₁))([x]A₂ ⊕ [y]A₃), A ∈ α, A′ ≡ A₂[A₁]. Then, for some α₁, α₂: A₁ ∈ α₁, ([x]A₂ ⊕ [y]A₃) ∈ (α₁ ⊕ α₂) → α, so [x]A₂ ∈ α₁ → α, x ∈ α₁, A₂ ∈ α. So A₂[A₁] ∈ α, q.e.d. Or a permutative reduction: A ≡ ((A₁)([x]A₂ ⊕ [y]A₃))(1), A ∈ α, A′ ≡ (A₁)([x]A₂(1) ⊕ [y]A₃(1)). Then for some β, α₁, α₂: (A₁)([x]A₂ ⊕ [y]A₃) ∈ α ⊗ β, A₁ ∈ α₁ ⊕ α₂, x ∈ α₁, y ∈ α₂, A₂ ∈ α ⊗ β, A₃ ∈ α ⊗ β, so A₂(1) ∈ α, A₃(1) ∈ α and A′ ∈ α. □

4.4.5. Theorem (Closure). A ∈ α, A ≥ A′ (without η) ⇒ A′ ∈ α.

Proof. Ind. on ≥. □
4.5. The system AUT-Π₁

4.5.1. Instead of rule (vii′) it has the rule

(vii) B ∈ α → γ, C ∈ β → γ ⇒ B ⊕ C ∈ (α ⊕ β) → γ,

and it has ⊕ instead of ⊕′. Of course (vii′) is a special case of (vii), so indeed AUT-Π₁ contains AUT-Π₀. We can define a translation φ from AUT-Π₁ to AUT-Π₀ such that φ(A) ≥η A, which shows that AUT-Π₁ is not a very essential extension of AUT-Π₀. The translation is given by ind. on length. The only nontrivial clause is φ(C₁ ⊕ C₂) ≡ [x : Mα](x)φ(C₁) ⊕ [y : Mβ](y)φ(C₂), where C₁ ⊕ C₂ ∈ (α ⊕ β) → γ and Mα, Mβ are suitable fixed expressions of norms α, β, and x, y are chosen of norms α, β such that x ∉ FV(C₁), y ∉ FV(C₂), respectively. On variables, φ acts like the identity. For the rest, φ just commutes with the formation rules. Clearly, φ leaves the norm invariant and is indeed a translation into AUT-Π₀.
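The clause for φ can be sketched as follows. The sketch is untyped, so the type labels Mα, Mβ are omitted, and fresh names stand in for the variables x, y with x ∉ FV(C₁), y ∉ FV(C₂); the tuple encoding is our own.

```python
import itertools

fresh = (f'v{i}' for i in itertools.count())  # supply of fresh variable names

def phi(t):
    """Translate a general (+)-term into the restricted AUT-Pi_0 form
       [x](x)phi(C1) (+) [y](y)phi(C2); all other constructors commute.
       Terms: ('var', v), ('abs', x, body), ('app', arg, fun), ('oplus', l, r)."""
    tag = t[0]
    if tag == 'var':
        return t
    if tag == 'oplus':
        _, c1, c2 = t
        x, y = next(fresh), next(fresh)   # fresh, hence not free in c1, c2
        return ('oplus',
                ('abs', x, ('app', ('var', x), phi(c1))),
                ('abs', y, ('app', ('var', y), phi(c2))))
    if tag == 'abs':
        return ('abs', t[1], phi(t[2]))
    if tag == 'app':
        return ('app', phi(t[1]), phi(t[2]))
    raise ValueError(tag)

r = phi(('oplus', ('var', 'f'), ('var', 'g')))
x = r[1][1]
assert r[1] == ('abs', x, ('app', ('var', x), ('var', 'f')))
```

An η-step on each summand of the result recovers the original term, which is the sense in which φ(A) ≥η A.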
4.5.3. In the sequel we prove SN for some versions (i.e. with and without p-red. etc.) of AUT-Π₀. By the above properties we can easily extend the p- and im-less case to AUT-Π₁: AUT-Π₀ SN (with ⊕′) ⇒ AUT-Π₁ SN (with ⊕).

Proof. Let A be an AUT-Π₁ term. Use ind. on θ(φ(A)). □

But from SN with ⊕ follows SN with ⊕ and ⊕′, because each ⊕′-step can be simulated by a ⊕-step and a β-step, so θβ⊕ decreases under ⊕′-reduction. And, because AUT-Π₁ contains AUT-Π₀, we also get SN for AUT-Π₀ with ⊕ and ⊕′.
4.5.4. The postponement requirements. For AUT-Π₀- and AUT-Π₁-expressions it is quite straightforward to show the requirements (1), (2) of 3.13. E.g. let (A(1), A(2)) > A. Then A ∈ α ⊗ β. So A is not an inj-term, a ⊕-term, or an abstr-term. Etc.

4.6. The first order character of the systems

4.6.1. In [van Daalen 80, IV.1.5] we emphasized the importance of the property

μ((A₁)B) = μ((A₂)B), in particular μ((A₁)[x]B₂) = μ(B₂),

i.e. the functional complexity of (A)B does not depend on the argument A. Alternatively stated: it is of course possible that the different values of B have different types, but apparently there is a strong uniformity in these types, for the functional complexity of all the values is the same. In fact, we defined a system to be first-order if this property was present.
4.6.2. Generally, the introduction of ⊕-types and ⊕-terms might spoil this uniformity: we might be able to define functions completely differently on both parts of their domain. So, by "general" ⊕-functions the first-order property above gets lost. However, in AUT-Π₀, AUT-Π₁ and in AUT-Π the domain of ⊕-functions is explicitly restricted in such a way that the first-order property can be maintained, viz. by requiring:

(1) in AUT-Π₀ that μ(B) = μ(C) when forming [x]B ⊕ [y]C,
(2) in AUT-Π₁ that B ∈ α → γ, C ∈ β → γ when forming B ⊕ C,
(3) in AUT-Π that B ∈ α → γ, C ∈ β → γ when forming B ⊕ C.

As a consequence we still have μ((A₁)B) = μ((A₂)B), and in particular

μ((A)([x]B ⊕ [y]C)) = μ(B) = μ(C).
4.6.3. Now it will be clear that the generalized ⊕-rules of 2.7 would spoil the first-order character. Example: let A ∈ τ, B ∈ τ, C ∈ τ, D ∈ τ; then [x : A]C ∈ A → τ, [x : B]D ∈ B → τ. So [x : A]C ⊕ [x : B]D ∈ (A ⊕ B) → τ. So, if E ∈ A → C, F ∈ B → D, then (E ⊕ F) ∈ Π([x : A]C ⊕ [x : B]D). Clearly the functional complexity of (i₁(G))(E ⊕ F) for G ∈ A and (i₂(H))(E ⊕ F) for H ∈ B can be completely different, viz. that of C and D respectively.

4.6.4. It is possible that a notion of norm (i.e. simplified type) can be defined which is manageable and measures the functional complexity of these general ⊕-terms, but the present norm (and the corresponding SN proof) is certainly not suitable for this situation.

4.6.5. Remark: Strictly speaking, the suggested connection between the typing relation in AUT-Π and the norms in AUT-Π₀ has not yet been accounted for. The preceding statements have to be understood on an intuitive, heuristic level.

4.7. A proof of βπησ-SN

4.7.1. Here we show that the first β-SN proof of Ch. IV straightforwardly carries over to the case of βπ-SN. As our domain of expressions we take, e.g.,
the terms of AUT-Π₁.

4.7.2. SN-conditions for βπ. For non-main-reducing expressions (also called immune forms or IFs) it is sufficient for SN if all their proper subexpressions are SN. Incidentally this is also true for projection expressions (because main π-reduction amounts to picking a certain subexpression). So we have: A SN ⇒ A(1) SN, and the funny property: A(1) SN ⇔ A(2) SN.
We recall the SN condition for appl expressions in this case:

(A)B SN ⇔ A SN, B SN and (B ≥ [x]C ⇒ C[A] SN).

4.7.3. Heuristics: the dead end set of β. So, the substitution theorem for SN is again sufficient for proving SN (see IV.2.4). The crucial case of the substitution theorem for β-SN was where A is SN, B ≡ (B₁)B₂ is SN, B₂[A] ≥ [y]C, but for no C₀, B₂ ≥ [y]C₀. I.e. the reduction to square brackets form depends essentially on the substitutions. Then we used the square brackets lemma: B₂ ≥ (F̄)x, ((F̄)x)[A] ≥ [y]C.

We define the set Ex of these expressions (F̄)x symbolically by a recursion equation Ex = x + (U)Ex, where U stands for the set of all expressions, and it is of course understood that all expressions in Ex are again in AUT-Π₁. The expressions (F̄)x in B₂ can be considered as dead ends when one tries to copy the contractions leading from B₂[A] to [y]C, i.e. when one tries to come "as close as possible" to an abstr expression. We do not bother to make the concept of dead end more precise, or more general, but just give this informal explanation for naming Ex the dead end set w.r.t. x, β-reduction, and abstr expressions.
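Membership in Ex = x + (U)Ex just says that the expression is an application spine ending in the variable x, which a short sketch (tuple encoding assumed as before) makes explicit:

```python
def in_Ex(t, x):
    """Membership in Ex = x + (U)Ex: an application spine (F1)...(Fn)x
       whose head is the variable x.  ('app', arg, fun) encodes (arg)fun."""
    while t[0] == 'app':                  # peel off arguments
        t = t[2]
    return t == ('var', x)

assert in_Ex(('var', 'x'), 'x')
assert in_Ex(('app', ('var', 'a'), ('app', ('var', 'b'), ('var', 'x'))), 'x')
assert not in_Ex(('abs', 'y', ('var', 'x')), 'x')
```

The recursion equation unfolds into exactly this loop: either the term is x itself, or it is (A)B with B again in Ex.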
4.7.4. The dead end set of βπ. When one tries to copy a βπ-reduction sequence of B[A] in B one need not end up with an expression in Ex, but can, e.g., also end in x(1). The following theorem states that F defined by

F = x + F(1) + F(2) + (U)F

is the dead end set w.r.t. x, βπ and immune forms (IFs). Let ≥ stand for ≥βπ, and let * stand for [x/A].

Theorem. If B SN, B* ≥ C, C ∈ IF then B ≥ C₀, C₀* ≥ C with either
(i) C₀* non-main reduces to C, or
(ii) C₀ ∈ F.

Proof. Just like the square brackets lemma (second proof, IV.2.4.3), by ind. on (1) θ(B), (2) l(B). Let B* main-reduce to C (otherwise take B ≡ C₀). Then B ≡ x (and take C₀ ≡ B ∈ F), or B ≡ D(1), B ≡ D(2) or B ≡ (D₁)D₂. E.g. let B ≡ D(1). Then D* ≥ (D₁, D₂), D₁ ≥ C. Apply ind. hyp. (2) to D. In case (i), D ≥ (E₁, E₂), E₁* ≥ D₁, E₂* ≥ D₂, so B ≥ E₁, E₁* ≥ C. Then apply ind. hyp. (1) to E₁. In case (ii), D ≥ E₀, E₀ ∈ F, E₀* ≥ (D₁, D₂), and B ≥ E₀(1), E₀(1) ∈ F, E₀(1)* ≡ E₀*(1) ≥ C, so case (ii) holds for B too. □
Remarks: (1) Similarly we can prove a more general outer-shape lemma (see II.11.5.4) for βπ, where the condition "C ∈ IF" simply has been dropped.
(2) It is probable that such "standardization-like" theorems can also be proved without using SN (as in II.11).
4.7.5. Heuristics: the norms of dead ends. The point of the β-SN proof is:

B ∈ Ex ⇒ l(μ(B)) ≤ l(μ(x))

(where l is the length of the norm). So, if B[A] ≥ [y]C, then l(μ(y)) < l(μ(x)), and we can use ind. on norms in the crucial case of the substitution theorem. We are lucky that the same method works for βπ-reduction too. Namely:

4.7.6. The substitution theorem for βπ-SN.

Theorem. A βπ-SN, B βπ-SN ⇒ B[x/A] βπ-SN.

Proof. Ind. on (1) μ(A), (2) θβπ(B), (3) l(B). Let ≥ be ≥βπ. If B ≡ x then B[A] ≡ A, so SN. If B ∈ IF or B ≡ C(1) or B ≡ C(2) use ind. hyp. (3). If B ≡ (B₁)B₂ proceed as for β-SN, using the norm properties of the dead end set F. □

4.7.7. βπ-SN and βπησ-SN. An immediate corollary of the substitution theorem for βπ-SN is βπ-SN itself. Now we can extend this to βπησ-SN (as in II.7.2.5) using (βπ)-(ησ)-pp, a case of ext-pp (see II.9.2). The requirement for pp is indeed fulfilled (see 4.5.4).

VIII.5. Three proofs of βπ⊕-SN, with application to AUT-Π

5.1. A proof of βπ⊕-SN using p- and im-reductions

5.1.1. Here we show how the preceding SN-proof (based on the first version of the simple β-SN proof in Ch. IV) has to be modified in order to cope with ⊕ (or ⊕′). First we shall see how the norm considerations of that proof do not go through.
5.1.2. The dead end set for βπ⊕. Let ≥ be ≥βπ⊕. The following theorem states that the set G defined by

G = x + G(1) + G(2) + (U)G + (G)(U ⊕ U)

is the dead end set w.r.t. x, βπ⊕ and IFs. Let * stand for [x/A].

Theorem. Let B be SN, B* ≥ C, C ∈ IF, then B ≥ C₀ with either
(1) C₀* non-main reduces to C, or
(2) C₀* ≥ C, C₀ ∈ G.

Proof. As in 4.7.4, by ind. on (i) θ(B), (ii) l(B). □

Similarly, we can prove the corresponding outer shape lemma. The problem is now that the norm of the expressions in G is not related to the norm of x. E.g. consider the typical ⊕-dead end (x)(B ⊕ C).
5.1.3. Improving the dead end set by p-reduction. We restrict our domain of consideration to AUT-Π₀. Instead of rule ⊕ we choose rule ⊕′. Besides, we add permutative reductions. Then a great deal of the "bad guys" among the dead ends, i.e. those whose norm is not related to that of x, can be main reduced by a p-reduction. This will (in the next section) result in an improved dead end set H defined by

H = F + (F)(U ⊕ U)

with F as in 4.7.4.
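Membership in F and in the improved set H can be sketched in the same style as for Ex; the typical ⊕-dead end (x)(B ⊕ C) is in H but not in F:

```python
def in_F(t, x):
    """F = x + F(1) + F(2) + (U)F.
       Terms: ('var', v), ('proj', j, body), ('app', arg, fun), ('oplus', l, r)."""
    if t == ('var', x):
        return True
    if t[0] == 'proj':                    # F(1), F(2)
        return in_F(t[2], x)
    if t[0] == 'app':                     # (U)F: the function part is in F
        return in_F(t[2], x)
    return False

def in_H(t, x):
    """H = F + (F)(U (+) U): additionally, a member of F supplied
       as argument to a (+)-term."""
    if in_F(t, x):
        return True
    return t[0] == 'app' and t[2][0] == 'oplus' and in_F(t[1], x)

# the typical (+)-dead end (x)(B (+) C):
t = ('app', ('var', 'x'), ('oplus', ('var', 'B'), ('var', 'C')))
assert in_H(t, 'x') and not in_F(t, 'x')
```

The extra clause of H is precisely the shape that a p-main step followed by a ⊕′-main step can unblock once the head variable is substituted by an injection.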
5.1.4. Let ≥ be βπ⊕′p-reduction. The direct reducts of a p-main step are of the form (A)([x]O(B) ⊕ [y]O(C)) (see 4.3.1 for the definition of O), so they are never in one of the immune forms (abstr, inj, pair, plus).

Lemma. p-main reduction steps in a reduction to IF can be circumvented.

Proof. The last p-main step in a reduction to IF must be followed by a ⊕′-main step. However, this combination can be replaced by a single internal ⊕′-step. □

Corollaries.
(1) (B)([x]C₁ ⊕ [x]C₂) ≥ D, D ∈ IF ⇒ B ≥ iⱼ(A), Cⱼ[A] ≥ D (j = 1, 2).
(2) (B)C ≥ D, D ∈ IF ⇒ either (i) C ≥ [y]E, E[B] ≥ D, or (ii) B ≥ iⱼ(A), C ≥ ([x]C₁ ⊕ [x]C₂), Cⱼ[A] ≥ D (j = 1, 2).
(3) B(j) ≥ D, D ∈ IF ⇒ B ≥ (C₁, C₂), Cⱼ ≥ D (j = 1, 2).
Proof. Each of these reductions to IF can be replaced by one without p-main steps. □

Part of the two corollaries can be summarized (with O as in 4.3.1) by: if O(B) ≥ D, D ∈ IF, then

B ≥ C, C ∈ IF, O(C) ≥ D.

This gives another lemma.

Lemma. If O((B)([x]C₁ ⊕ [x]C₂)) ≥ D, D ∈ IF, then (B)([x]O(C₁) ⊕ [x]O(C₂)) ≥ D.

Proof. (B)([x]C₁ ⊕ [x]C₂) ≥ E, E ∈ IF, O(E) ≥ D. So B ≥ iⱼ(A), Cⱼ[A] ≥ E. But then (B)([x]O(C₁) ⊕ [x]O(C₂)) ≥ O(Cⱼ[A]) ≥ O(E) ≥ D, q.e.d. □

This proof amounts to: if an expression allows both p- and IE-main reduction then we can insert p-main followed by ⊕′-main before performing the IE-main step. Now we prove the theorem about the improved dead end set H. Let * stand for [x/A].

Theorem. If B SN, B* ≥ C, C ∈ IF, then B ≥ C₀, C₀* ≥ C with
(1) C₀* non-main reduces to C, or
(2) C₀ ∈ H.
Proof. As in 4.7.4, by ind. on (i) θ(B), (ii) l(B). Here θ refers to the current reduction βπ⊕′p. Let B* main reduce to C, B ≢ x. If the first main step can be mimicked in B use ind. hyp. (i). Otherwise, by ind. hyp. (ii), B ≥ O(D), D ∈ H, O(D)* ≥ C. If D ∈ F, then O(D) ∈ H and we are done. Otherwise D ≡ (D₃)([y]D₁ ⊕ [y]D₂), D₃ ∈ F. Then B properly reduces to E ≡ (D₃)([y]O(D₁) ⊕ [y]O(D₂)), E ∈ H, and by the previous lemma E* ≥ C, q.e.d. □

5.1.5. Improving the SN-conditions by im-reduction. The crucial SN-condition for βπ⊕′ (in AUT-Π₀) is: If
(1) A SN, B SN,
(2) B ≥ [x]C ⇒ C[A] SN, and, for j = 1, 2,
(3) B ≥ [x]C₁ ⊕ [x]C₂, A ≥ iⱼ(D) ⇒ Cⱼ[D] SN,
then (A)B SN. Now the p-reductions have improved our dead end set, but the problem is that
they make the SN-conditions quite complicated. E.g. in order to prove that (A)(B)([x]C₁ ⊕ [x]C₂) is SN we need that (A)C₁ is SN; in particular, if C₁ ≥ [y]E we need that E[A] is SN, etc. I.e. the SN-condition of (A)B ceases to be easily expressible in terms of direct subexpressions of reducts of A and B. In order to solve this problem we add im-reduction. But first we show that the dead end set is not changed by this addition.
5.1.6. The dead end set of βπ⊕′p,im. Luckily the dead end set remains H. Let ≥ stand for ≥βπ⊕′p,im. The first lemma of 5.1.4 can be maintained. For let a p-main step be followed by an im-main step. Then we can skip the p-main step and just apply the im-step internally. The next corollaries need an obvious modification, in particular: If (B)([x]C₁ ⊕ [x]C₂) ≥ D, D ∈ IF, then either
(1) B ≥ iⱼ(A), Cⱼ[A] ≥ D (for j = 1 or j = 2), or
(2) Cⱼ ≥ D (for j = 1 or j = 2).
And the property thereafter becomes: If O(B) ≥ D, D ∈ IF, then either
(1) B ≥ C, C ∈ IF, O(C) ≥ D, or
(2) O(B) ≥ (B′)([x]C₁ ⊕ [x]C₂), Cⱼ ≥ D (for j = 1 or 2).
But the second lemma of 5.1.4 remains unchanged. Namely, if an expression allows p-main reduction but also im-main reduction, then we can insert p-main followed by im-main before performing the im-main step.
So the theorem of 5.1.4, that the dead end set is still H, carries over too.

5.1.7. The new SN-conditions. The point of the im-reduction is that the SN-conditions for βπ⊕′p,im are identical with those for βπ⊕′ (see 5.1.5). First we give the SN-conditions of (B)([x]C₁ ⊕ [x]C₂). These are
(1) B SN, C₁ SN and C₂ SN, and
(2) B ≥ iⱼ(A) ⇒ Cⱼ[A] SN (for j = 1 and 2).
Proof. Let the above conditions be fulfilled. Use ind. on (1) θ(B), (2) l(B). The interesting case is when the first main step in a reduction is a p-step. So let B ≥ (B₃)([y]B₁ ⊕ [y]B₂); to prove that (B₃)([y](B₁)C ⊕ [y](B₂)C) is SN, with C ≡ [x]C₁ ⊕ [x]C₂. By ind. hyp. (1) or (2) we just need that B₃ is SN (trivial), that (Bⱼ)C is SN for j = 1, 2 and that (Bⱼ[D])C is SN, where B₃ ≥ iⱼ(D). Since B properly reduces to both Bⱼ and Bⱼ[D] (in case B₃ ≥ iⱼ(D)) we can use ind. hyp. (1) and get what we want. □

Theorem. The SN-conditions for βπ⊕′p,im are identical with those of βπ⊕′ (see 5.1.5).

Proof. Let (A)B fulfil the SN-conditions (1), (2), (3) of 5.1.5. We use ind. on θ(B). The interesting case is when the first main step is p. The case that B ≥ [x]B₁ ⊕ [x]B₂ has been done before, so let B ≥ (B₃)([x]B₁ ⊕ [x]B₂); to prove that (B₃)([x](A)B₁ ⊕ [x](A)B₂) is SN, i.e. that B₃ is SN, that (A)B₁ and (A)B₂ are SN and that (A)B₁[D], (A)B₂[D] are SN whenever B₃ ≥ iⱼ(D) (j = 1 or 2). Now B properly reduces to both Bⱼ and Bⱼ[D] (if B₃ ≥ iⱼ(D)) so we use the ind. hyp. and get what we want. □

In other words: we just need that the direct subexpressions and the IE-main reducts (not all the main reducts) are SN for proving that an expression is SN.

5.1.8. The substitution theorem for SN. Notation: we write μ(A) < (resp. ≤) μ(B) to abbreviate l(μ(A)) < (resp. ≤) l(μ(B)).

Theorem. B SN, A SN, μ(x) = μ(A) ⇒ B[x/A] SN.

Proof. Ind. on (I) μ(A), (II) θ(B), (III) l(B). The crucial case is when B ≡ (B₁)B₂ and B[A] IE-main reduces. If this first main step can be mimicked in B use the second ind. hyp. Otherwise we end up with (B₁′)C or (C)B₂′ with C ∈ H, and B₁ ≥ B₁′ or B₂ ≥ B₂′ ≡ [y]D₁ ⊕ [y]D₂, respectively. If C ∈ F, then μ(B₁′) < μ(C) ≤ μ(x), so a first main reduction of ((B₁′)C)[A] involves a substitution [z/E] with μ(z) ≤ μ(B₁′) < μ(x). And a first main-IE reduction step of ((C)B₂′)[A] must be a ⊕′-step, so it involves a substitution [z/E] with C[A] ≥ iⱼ(E). So in that case too μ(z) = μ(E) < μ(C) ≤ μ(x). Anyhow, if C ∈ F, we can use ind. hyp. (I). Otherwise C ≡ (C₃)([y]C₁ ⊕ [y]C₂),
with C₃ ∈ F. Then a p-step is possible and can be inserted before doing the main IE-step. This p-step can be mimicked in the reduction of B, so we can use ind. hyp. (II). □

5.1.9. SN for AUT-Π₀ and AUT-Π₁. Like before, an immediate corollary is βπ⊕′p,im-SN for AUT-Π₀, so βπ⊕′-SN for AUT-Π₀, whence βπ⊕-SN for AUT-Π₁. Then by pp we can extend the AUT-Π₁ result to βπ⊕ησ-SN. (Not for ε.)
5.1.10. An alternative method. Actually im-reduction can be avoided in this proof. Namely, the effect of p-reductions on the SN-conditions can be expressed by means of certain inductively defined sets. We define a set of expressions B! by

B! = B + (U)([x](B!) ⊕ U) + (U)(U ⊕ [x](B!)),

i.e. B! contains all those expressions that im-reduce to B. Then the SN-conditions for βπ⊕′ become: If
(1) B SN, C SN,
(2) B ≥ B′ ∈ A!, C ≥ C′ ∈ ([y]D)! ⇒ D[A] SN, and
(3) B ≥ B′ ∈ (iⱼ(A))!, C ≥ C′ ∈ ([x]C₁ ⊕ [x]C₂)! ⇒ Cⱼ[A] SN (j = 1, 2),
then (B)C SN.

5.2. A second proof of βπ⊕′-SN, using im-reduction

5.2.1. This proof is based on the second instead of the first β-SN proof of Ch. IV (Sec. IV.2.5, see also VII.4.5). There we did not use the square brackets lemma, and no dead end set, so we can do without p-reduction. Our language is again AUT-Π₀, and ≥ stands for ≥βπ⊕′,im.

5.2.2. Replacement theorem for SN. As explained in VII.4.5, the kernel of this type of proof is a replacement theorem, rather than a substitution theorem, for SN.

Theorem. If B SN, A SN, μ(x) ≤ μ(A), then B[x/A]LR SN.

Proof. By ind. on
(I) μ(A), (II) θ(B), (III) l(B).
We write * for [x/A]LR. Consider a reduction sequence B* >₁ ... >₁ F >₁ G, where the contraction leading from F to G is the first contraction not taking place inside some reduct of one of the inserted occurrences of A. Realize first that the number of those inside-A contractions is finite, because A is SN. Now we prove that G is SN. Distinguish two possibilities:

(a) The step F >₁ G does not essentially depend on the inserted A's and can be mimicked in B, i.e. B >₁ G₀, G₀* ≥ G. In this case we use ind. hyp. (II).

(b) Otherwise some reduct of some inserted A plays a crucial role in the redex contracted. If F > G is a π-step, then, e.g., B ≡ ...x...x(1)..., B* ≡ ...A...A(1)..., F ≡ ...A′...(C₁, C₂)(1)..., G ≡ ...A′...C₁.... Now form B₀ ≡ ...x...y... from B by replacing x(1) by a fresh y, with μ(y) = α₁ (where μ(x) ≡ α₁ ⊗ α₂). And B ≡ B₀[y/x(1)], so B₀ is SN, θ(B₀) ≤ θ(B), l(B₀) < l(B). So by ind. hyp. (II) or (III), B₀* is SN and B₀* ≥ G₀ ≡ ...A′...y... with G ≡ G₀[y/C₁]LR. Here G₀ is SN, C₁ is SN, μ(y) = μ(C₁), l(μ(y)) < l(μ(x)), so G is SN by ind. hyp. (I). If F > G is a β-step argue as in IV.2.5.3 or VII.4.5.6. If F > G is a ⊕′-step, the redex contracted is, e.g., (i₁(D))([y]C₁ ⊕ [y]C₂), reducing to C₁[D]. Now distinguish:

(b1) a reduct of an inserted A is crucial in i₁(D),
(b2) a reduct of an inserted A is crucial in ([y]C₁ ⊕ [y]C₂).

First case (b1). Then B ≡ ...x...(x)C₀..., C₀* ≥ [y]C₁ ⊕ [y]C₂, A ≥ i₁(D). By a norm argument the ⊕-term must be present in B already, so C₀ ≡ [y]E₁ ⊕ [y]E₂, E₁* ≥ C₁, E₂* ≥ C₂. Now form B₀ ≡ ...x...E₁.... This is an im-reduct of B, so it is SN, and by ind. hyp. (II) B₀* is SN, reducing to G₀ ≡ ...A′...C₁..., where G ≡ ...A′...C₁[D].... Clearly G₀ is SN, D is SN and l(μ(D)) < l(μ(x)), so G ≡ G₀[y/D]LR is SN by ind. hyp. (I). In case (b2), argue as in the π-case. Finally, let the redex contracted in F be an im-redex in which A plays a crucial role, i.e. B ≡ ...x...(C₀)x..., A ≥ [y]D₁ ⊕ [y]D₂, C₀* ≥ C, F ≡ ...A′...(C)([y]D₁ ⊕ [y]D₂)..., G ≡ ...A′...D₁.... Form B₀ ≡ ...x...y..., B ≡ B₀[y/(C₀)x]LR; so either by ind. hyp. (II) or (III) B₀* is SN, reducing to G₀ ≡ ...A′...y.... Clearly D₁ is SN, l(μ(D₁)) < l(μ(x)), so by ind. hyp. (I) G ≡ G₀[y/D₁]LR is SN. □
5.2.3. An immediate corollary of this replacement theorem is the ordinary substitution theorem. From this, as before, follows βπ⊕′im-SN for AUT-Π₀. So we get βπ⊕-SN for AUT-Π₁.
5.3. A proof of βπ⊕ησ-SN by computability

5.3.1. In this proof we do not include ησ by a pp-result afterwards, but consider these ext-reductions from the beginning of the proof on. We must consider AUT-Π₁ because AUT-Π₀ is not closed under η. Our definition of computability has been strongly inspired by de Vrijer's definition in [de Vrijer 75 (C.4)]. De Vrijer's definition is phrased in such a manner that the important properties:
(1) computability implies SN,
(2) computability is preserved under reduction,
follow almost immediately. Then, as usual, we prove by ind. on length that expressions are computable under substitution. Notice that we do not include ε.

5.3.2. The definition of computability. We write Cα for the set of computable terms of norm α. The set Cα is defined by induction on the length of α, as follows. Let B ∈ α. Then B ∈ Cα if B SN and the following requirements are fulfilled:
Notice that each clause in the definition of Cα only depends on Cβ's with β shorter than α.
5.3.3. We write C for the set of all computable expressions, the union of all the Cα's. By definition: A ∈ C ⇒ A SN. Each condition in the definition of computability of B has the form: B ≥ C ⇒ P(C), with P some condition on C. So computability is preserved under reduction.
5.3.4. Now we try to express the computability of an expression in terms of the computability of its subexpressions. First a lemma.

Lemma.
(1) [x]C ≥ [x]D ⇒ C ≥ D.
(2) (C, D) ≥ (E, F) ⇒ C ≥ E, D ≥ F.
(3) iⱼ(C) ≥ iⱼ(D) ⇒ C ≥ D (j = 1, 2).
(4) C ⊕ D ≥ E ⊕ F ⇒ C ≥ E, D ≥ F.

Proof. Without main reduction it is trivial. Otherwise it is η or σ. E.g. if (C, D) ≥ (E, F), then C ≥ (E, F)(1) ≥ E, D ≥ (E, F)(2) ≥ F, q.e.d. By the way, property (4) even holds in the presence of ε. □

Lemma (computability conditions).
(0) Variables are in C.
(1) A S N , C E C , D E C =+ ( A , C , D ) E C . (2) A SN, C E C
*
il(C,A ) E C , i2(C,A ) E C .
(3) C E C , D E C =s C @ D E C .
(4)
cEc *
C(1) E
c, C(2) E c .
+
(B)CEC.
(5) B E C , C E C
Proof. (0) is clear. ( l ) , (2), (3) by the previous lemma. (4) as follows: Let C E C , then C SN so C(j) SN. If C(j) 2 [y]D,then C 2 (C1,Cz) with Cj 2 [y]D. Each of the Cj is in C , so [y]Dsatisfies the required condition. Similar if C(j) 2 (01, D z ) ,C(j)2 i l ( D ) etc. Proof of (5): Let B , C E C so B, C SN. Induction on ,u(B). We first check the SN conditions. Let C 2 [y]Dlthen D[B]E C so SN. Or let B 2 i j ( D ) ,C 1 C1@C2, to prove that ( D ) C j is SN. Well, both Cj’s are in C , D E C and we can use the ind. hyp. to prove that ( D ) C j E C (so SN). Further, if ( B ) C 2 [ y ] E(or reduces t o ( E ,F ) etc.), this is only possible after a main step, so either via some D [ B ]with C 2 [y]Dor some ( D )Cj where B 2 i j ( D ) , C 1 C1 @ C2. Those expressions were in C so [ y ] E(and ( E ,F ) etc.) satisfy the required conditions. 0
5.3.5. Computability under substitution. For expressions [y]C such simple computability conditions cannot be given. We define an even stronger notion than computability.

Definition. B is said to be computable under substitution (cus) if

A₁, ..., Aₙ ∈ C, μ(xᵢ) = μ(Aᵢ) for i = 1, ..., n ⇒ B[x̄/Ā] ∈ C.

Some easy properties are
(1) B cus ⇒ B ∈ C (e.g. take n = 0), and
(2) B cus, B ≥ C ⇒ C ∈ C.
Then a lemma.

Lemma. Let μ(C) ≡ α₁ → α₂, and let F ∈ Cα₁ ⇒ (F)C ∈ Cα₂. Then C ∈ C.

Proof. Clearly C is SN. We use ind. on l(α₁). If C ≥ [y]D, F ∈ Cα₁, we must prove D[F] ∈ Cα₂. This holds because (F)C ≥ D[F]. If C ≥ D ⊕ E we must prove that D, E ∈ C. For i₁(F) ∈ Cα₁, (i₁(F))C ∈ C, so (F)D ∈ C. Now use the ind. hyp. Similarly for E. □

5.3.6. Lemma. B cus, C cus ⇒ [y : B]C cus.
Proof. Let C cus, B cus, Ā ∈ C of the right norms. Abbreviate [x̄/Ā] by *. We must prove that [y : B*]C* ∈ C. Well, B* ∈ C, C* ∈ C, so [y : B*]C* is SN. If [y : B*]C* ≥ [y : D]E, F ∈ C of the right norm, then we need that E[F] ∈ C. Because C is cus, C[x̄, y/Ā, F] ∈ C, which expression reduces to E[F], q.e.d. In particular, if C* ≥ (y)(E₁ ⊕ E₂), y ∉ FV(E₁ ⊕ E₂), we have that (F)(E₁ ⊕ E₂) ∈ C, so by the previous lemma E₁ ⊕ E₂ ∈ C, E₁ ∈ C, E₂ ∈ C, q.e.d. □

Theorem. All AUT-Π₁ expressions are cus.

Proof. Variables are cus by definition. Further use induction on length. For the abstr case use the previous lemma. For all the other cases use the lemma in 5.3.4. E.g. to prove that (B)C is cus: let * be as in the previous lemma. By ind. hyp. B* ∈ C, C* ∈ C, so (B*)C* ∈ C. □

Corollaries. (1) All AUT-Π₁ expressions are computable.
(2) All AUT-Π₁ expressions are βπ⊕ησ-SN. □
5.4. Strong normalization for AUT-Π

5.4.1. The normability of AUT-Π. In order to extend our results from AUT-Π₁ to AUT-Π we must first extend our definition of norm (see 4.2.3), and implicitly of normability, as follows:

μ(τ) ≡ τ,
μ(A) ≡ α → β ⇒ μ(Π(A)) ≡ α → β,
μ(A) ≡ α → β ⇒ μ(Σ(A)) ≡ α ⊗ β,
A, B of degree 2 ⇒ μ(A ⊕ B) ≡ μ(A) ⊕ μ(B).

And we must say what the norms of the variables are:

μ(x) := μ(typ(x)).
Our definition of normability here is modelled after the normability definition of AUT-QE (weak normability), in particular as far as the handling of 2-variables is concerned. For details see IV.4.4-IV.4.5. First we define norm inclusion ⊂:

(1) α a norm ⇒ α ⊂ τ,
(2) α ⊂ β ⇒ (γ → α) ⊂ (γ → β).

Then we say that A fits in B (notation A fin B) if

degree(A) = 3 ⇒ μ(A) = μ(B),
degree(A) = 2 ⇒ μ(A) ⊂ μ(B).

Now we define the norm of constant expressions:

Ā fin C̄[Ā] ⇒ μ(c(Ā)) := μ(typ(c)[Ā]),
Ā fin C̄[Ā] ⇒ μ(d(Ā)) := μ(typ(d)[Ā]),

where x̄ ∈ C̄ is the context of the scheme in which c (resp. d) was introduced. We want to show that correct expressions are normable, and of course that whenever A ∈ B, A fits in B. In view of the instantiation rule and the fact that norms can change under substitution (for 2-variables) we prove, as in Ch. IV.4.5, a kind of normability under substitution.

Theorem. If Ā fin B̄[Ā], x̄ ∈ B̄ ⊢ C ∈ D, then C[Ā] fin D[Ā] (note that "fitting in" implies the normability of the expressions involved).

Proof. Ind. on correctness. □

Corollary. ⊢ C ∈ D ⇒ C fin D (so C, D normable). □
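Norm inclusion and the fitting relation are directly executable. A small sketch, with the same tuple encoding of norms and the degree passed in as a parameter (both our own illustrative choices):

```python
def included(a, b):
    """Norm inclusion: a is included in tau for every norm a, and
       a included in b implies (g -> a) included in (g -> b)."""
    if b == 'tau':
        return True
    return (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == '->' and b[0] == '->'
            and a[1] == b[1] and included(a[2], b[2]))

def fits_in(degree, mu_a, mu_b):
    """A fin B: equality of norms at degree 3, inclusion at degree 2."""
    return mu_a == mu_b if degree == 3 else included(mu_a, mu_b)

# tau -> (tau -> tau) is included in tau -> tau (drop the final argument)
assert included(('->', 'tau', ('->', 'tau', 'tau')), ('->', 'tau', 'tau'))
assert fits_in(2, ('->', 'tau', 'tau'), 'tau')
```

This reflects the AUT-QE-style type inclusion: a degree-2 expression may fit in a shorter telescope of its own norm.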
5.4.2. Note: By the above defined concept of normability lots of expressions become normable which are certainly not correct in AUT-Π. E.g. (A)(Π([x : B]C)), with μ(A) ≡ μ(B), and (Σ(B))(1), with μ(B) ≡ β₁ → β₂. This is a consequence of the fact that AUT-Π is handled just like AUT-QE: Π's are (as regards norms) ignored, and Σ's are in some sense identified with pairs.

5.4.3. Extending the SN-result to AUT-Π. Clearly the presence of non-reducing constants such as Σ, Π (for 2-expressions), and τ does not harm the SN-results of the previous sections. We just have to add δ-reduction. The substitution (resp. replacement) theorem for SN can easily be extended because δ-contractions in B[x/A](LR) either take place inside A or can be mimicked in B already. Then we can proceed as in IV.4.6 or directly prove B normable ⇒ B SN, by ind. on
The language theory of Automath, Chapter VIII, Section 6 (C.5)
(1) date(B). [For a definitional constant, date(d) = date(def(d)) + 1. The date of an expression is the maximum of the dates of the definitional constants that occur in it. So induction on date can be considered "induction on definitions".]
(2) l(B). [The length of B.]
The new case is when B ≡ d(C̄). The Ci's are SN by ind. hyp. (2). Further we want that def(d)[C̄] is SN. Well, def(d) is SN by ind. hyp. (1), and def(d)[C̄] = def(d)[C1]...[Cn]. So by iterated use of the substitution theorem we are done. Later we can add η, ηπ by pp.
Alternatively we can extend the SN proof by computability to the present case, viz. by leaving the definition of computability unmodified and proving computability under substitution by induction on (1) date, (2) length. In particular, let A1, ..., An ∈ C be of the right norms, let * stand for [x̄/Ā], and let B1*, ..., Bn* ∈ C. Then we must prove that d(B̄)* ∈ C. The Bi*'s are SN. By ind. hyp. (1) def(d) is computable under substitution, so def(d)(B̄*) ∈ C, so SN. Further, if d(B̄*) ≥ [y]E (or (E)F etc.) then this reduction passes through def(d)[B̄*] (which was in C). So, finally, we have βπσηδ-SN for AUT-Π.
VIII.6. Some additional remarks on AUT-Π

6.1. The connection between AUT-QE and the abstr part of AUT-Π
Here the abstr part of AUT-Π is the part generated by the general rules (2.2.1, 2.2.2) and the specific rules group I (2.3). If it were not for the role of Π and the rule of product formation, this part of AUT-Π would be identical to AUT-QE. In the introduction to this chapter we mentioned already that the rule of type-inclusion is somewhat stronger than the rule of product formation. This means that the obvious translation out of AUT-Π, viz. just skipping the Π's, produces correct AUT-QE, but not all of AUT-QE. Namely, without Π, the rule of product formation becomes

(I)  φ E [x : α] τ ⇒ φ E τ,

which is just a specific instance of the type-inclusion rule

(II)  φ E [ȳ : β̄][x : α] τ ⇒ φ E [ȳ : β̄] τ.
Let us see whether sensible use of (I) can yield something like (II). So let ⊢ φ E [ȳ : β̄][x : α] τ. Then ȳ E β̄ ⊢ (ȳ) φ E [x : α] τ (where (ȳ) consists of the (yi)'s in the reversed order). So by (I) ȳ E β̄ ⊢ (ȳ) φ E τ, and by iterated use of the abstr rule we get ⊢ φ+ E [ȳ : β̄] τ, with φ+ ≡ [ȳ : β̄] (ȳ) φ. Clearly
φ+ ≥η φ,
which indicates that AUT-QE is not a very essential extension of the image of AUT-Π under the translation. Compare [de Bruijn 77], [de Bruijn 78c (B.4)].

6.2. The CR problem caused by ε
In Ch. II we gave a counterexample for βε-CR. Namely, [x]x and [y]i1(y) ⊕ [y]i2(y) are distinct βε-equal normal forms (just two different ways to write the identity on a ⊕-type). This suggests to save CR by adding ε alt (see 2.6):

[x] B[i1(x)] ⊕ [x] B[i2(x)] > [x] B.
However, ε alt and β interfere in a nasty way: [x](... (x) F ...) ⊕ [x](... (x) G ...) <β [x](... (i1(x)) (F ⊕ G) ...) ⊕ [x](... (i2(x)) (F ⊕ G) ...) >ε alt [x](... (x) (F ⊕ G) ...), so this does not help. In principle, CR is not too important for our purpose; we rather need a good decision procedure for definitional equality. Just like (in V.4) we suggested to implement η-equality by the rule

(x) F Q G ⇒ F Q [x] G,
we conjecture here that we would generate full equality (including ε) by adding

(i1(x)) F Q (x) G, (i2(x)) F Q (x) H ⇒ F Q G ⊕ H.

But in order to guarantee the well-foundedness of such an algorithm, we of course need some kind of strong normalization result which applies to the present situation. The general pattern of the counterexample to ε alt-CR reads

[x] O((x) F) ⊕ [x] O((x) G) Q [x] O((x) (F ⊕ G)),

where O is a very general operation on expressions, e.g. O(E) ≡ (D) E in (D)(A)([x]B ⊕ [x]C). This shows that extensional equality generates the equality induced by permutative reductions (Sec. 4.3):

O((A)([x]B ⊕ [x]C)) Q (A)([x] O((x) [x]B) ⊕ [x] O((x) [x]C)) Q (A)([x] O(B) ⊕ [x] O(C)).
6.3. The SN-problem caused by ε
We strongly believe that SN holds for the full AUT-Π reduction (including ε), and that there are just some technical problems which prevent the proofs of
the preceding section from applying to that situation. We briefly sketch why each of the three proofs fails in the presence of ε. The problem with the first proof (5.1) is that the dead end set for, e.g., βε-reduction is not so easy to describe. E.g. [y]((i1(y)) x) F ⊕ [y](i2(y)) F is a typical dead end for βε. Of course βη- or βσ-dead ends are not manageable either, but ησ can be included afterwards, using pp. Then the second proof (5.2). An ε-redex [y](i1(y)) F ⊕ [y](i2(y)) F can be created by substitution [x/A] in two different ways:

(1) from x ⊕ [y](i2(y)) F, with A ≡ [y](i1(y)) F (and similarly with the right hand part);

(2) from [y](i1(y)) F1 ⊕ [y](i2(y)) F2, F1[A] ≡ F, F2[A] ≡ F.

In case (1) we are suggested to replace x ⊕ [y](i1(y)) F by a single variable z, and to introduce a new substitution [z/F]. However, l(ρ(z)) > l(ρ(x)), which does not fit in the proof at all. But we can remove this case by just considering AUT-Π0. Case (2) does not pose a problem: the substitution plus reduction can be simulated by reduction plus substitution, starting from [y](i1(y)) F0 ⊕ [y](i2(y)) F0, where both F1 and F2 can be constructed from F0 by substituting A for some of the free x's. Besides, the second proof is based on replacement. This means that the ε-redex above can also be created from, e.g.,

(3) [y](A) F ⊕ [y](i2(y)) F, with A ≡ i1(y), or its right-hand analogue.

These two expressions do not reduce, unless we switch to a generalized form of ε (which does not solve the problem, though; see below). Finally the computability method (5.3) fails because the property F ∈ C, G ∈ C ⇒ F ⊕ G ∈ C is not so easy anymore. For, let F ≥ [x](i1(x)) [y]D, G ≥ [x](i2(x)) [y]D. Then we just know that A ∈ C ⇒ D[i1(A)] ∈ C, D[i2(A)] ∈ C, but we want that D[A] ∈ C for general A ∈ C. We have tried to adapt the second SN-proof to this situation, viz. by restricting to AUT-Π0, and by introducing a liberal version of ε alt, named ε′:
[y] F[i1(y)] ⊕ G > [y] F,   G ⊕ [y] F[i2(y)] > [y] F.
This can be considered a kind of improper reduction, in the sense that it identifies expressions which in the intuitive interpretation correspond to different objects. A typical way of creating a new ε′-redex is, e.g., from [y]x ⊕ G by the replacement [x/i1(y)], reducing to [y]y. One can indeed mimic this by first reducing to [y]z, and then applying a new replacement, viz. [z/y]. But the norm of this new z is longer than that of the old one.
The Language Theory of Λ∞, a Typed λ-Calculus where Terms are Types
L.S. van Benthem Jutting

1. INTRODUCTION
In the present paper we present the theory of a system of typed λ-calculus Λ∞, which is essentially the system introduced in [Nederpelt 73 (C.3)]. Its characteristic feature is that any term of the system can serve as a type. The main difference between the two systems is that our system only allows for β-reduction, while Nederpelt's system has η-reduction as well. The importance of Λ∞ lies in the fact that it may be considered as basic to the Automath languages. Therefore its theory can also be seen as basic to the theory of Automath [de Bruijn 80 (A.5)], [van Daalen 80]. In our notation we will follow the habits of Automath, that is: for terms u and v, types α and variables x we will denote

λx:α u by [x : α] u  and  u(v) by (v) u.
The system consisting of such terms will be called Λ. The system Λ∞ is the subset of Λ to which a term (u)v belongs only if v is a function, and if the domain of v and the type of u have a common (β-)reduct. Our main theorems will be:
(1) Church-Rosser for Λ. This will be proved along the lines of well-known proofs by Tait and Martin-Löf [Martin-Löf 75a]. (2) Strong normalization for a subsystem of "normable terms" in Λ. Our proof will be along the lines of proofs in [Gandy 80] and in [de Vrijer 87c] for strong normalization in simply typed λ-calculus.
(3) Closure of Λ∞ under (β-)reduction. For this we have a new direct proof, though the theorem has been proved previously in [van Daalen 80], see [C.5].
Moreover, we prove that the terms of Λ∞ are "normable" in the sense intended above; therefore those terms strongly normalize. This, together with correctness of types, implies that Λ∞ is decidable. In our presentation we will use "nameless variables" as suggested in [de Bruijn 72b (C.2)]. That is, our variables will not be "letters from an alphabet" but "references to a binding λ", or rather, because of our notational habits, "references to a binding square brackets pair". In order to grasp the use of nameless variables one should note that terms can be interpreted as trees. Consider e.g. the term:
[x : α] (x) [y : β] (y) x

The corresponding tree is
Figure 1 In this tree the bindings may be indicated by arrows, omitting the names of the variables:
Figure 2
The language theory of Λ∞ (C.6)
And here, again, the arrows may be replaced by numbers, indicating the depth of the binding node to which the arrow points as seen from the node where the arrow starts (only binding nodes, indicated by "○", are counted!):
Figure 3

This last tree can again be represented in a linear form:

[α] (1) [β] (1) 2
Note that the same variable x in the first term (or tree) is represented in the "nameless" term (or tree) once by 1 and once by 2, whereas the same reference 1 in the "nameless" representation once denotes x and once y. Both the name-carrying and the nameless linear representation can be considered as formalizations of the underlying intuitive notion of "tree with arrows". The presentation with nameless variables makes the notion of α-conversion superfluous (and even meaningless). Thereby the definition of operations where a "clash of variables" might arise (e.g. substitution) becomes more definite, and the proofs more formal. The drawbacks of this presentation might be a loss of "readability" of the formulas, and the need for a number of technical lemmas for updating the references involved in certain formula manipulations. In our presentation frequent use will be made of inductive definitions (e.g. the definitions of term, of substitution, of reduction and of Λ∞). Subsequently, proofs are given by induction with respect to these definitions. This should always be understood in the sense of "induction with respect to the number of applications of a clause in the definition", or, in other words, "induction with respect to the derivation tree". This concept is not formalized here.
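The translation from name-carrying to nameless terms described above can be written out as a short program. This is an illustrative sketch, not part of the paper; the tuple encoding and the function name are mine.

```python
# Ad hoc encoding: ("const", c) for constants such as the types a and b;
# ("var", x) for a named variable; ("app", u, v) for (u) v;
# a named abstraction [x : t] b is ("abs", x, t, b);
# a nameless abstraction [t] b is ("abs", t, b); references are ("ref", n).

def nameless(term, binders=()):
    """Replace each variable by the depth of its binding node, counted
    from the variable's position (1 = innermost enclosing binder)."""
    kind = term[0]
    if kind == "var":
        # the innermost binder carrying this name wins (handles shadowing)
        pos = max(i for i, n in enumerate(binders) if n == term[1])
        return ("ref", len(binders) - pos)
    if kind == "app":
        return ("app", nameless(term[1], binders), nameless(term[2], binders))
    if kind == "abs":
        _, name, ty, body = term
        return ("abs", nameless(ty, binders), nameless(body, binders + (name,)))
    return term  # constants are untouched

# The example term [x : a] (x) [y : b] (y) x from the text:
t = ("abs", "x", ("const", "a"),
     ("app", ("var", "x"),
      ("abs", "y", ("const", "b"),
       ("app", ("var", "y"), ("var", "x")))))
```

Applied to t, this yields the linear form [a] (1) [b] (1) 2: the two occurrences of x become 1 and 2, while the reference 1 denotes x in one place and y in the other, exactly as observed above.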
2. PRELIMINARIES AND NOTATIONS

In our theory we will use some notions of intuitive set theory. N will denote the set of natural numbers {0, 1, 2, 3, ...}, N+ the set of positive natural numbers {1, 2, 3, ...}, and N∞ = N ∪ {∞}, the set N extended with infinity. The predecessor function is extended to N∞ by defining ∞ − 1 := ∞. For n ∈ N we define Nn := {k ∈ N+ | k ≤ n}, so N0 = ∅, the empty set. Let A and B be sets. Then A × B denotes the Cartesian product of A and B, that is the set of pairs [a, b] where a ∈ A and b ∈ B; and A → B denotes the set of functions with domain A and values in B. If f ∈ A → B and a ∈ A then (a) f will denote the value of f at a; and if for a ∈ A we have b(a) ∈ B then [a ∈ A] b(a) will denote the corresponding function, that is the set {[a, b(a)] ∈ A × B | a ∈ A}. As a consequence of our notation for the values of a function, our notation for the composition of functions will be a little unusual: if f and g are functions with domains A and B respectively, then

f ∘ g = [x ∈ C] ((x) f) g,  where C = {x ∈ A | (x) f ∈ B}.

So (x)(f ∘ g) = ((x) f) g for x ∈ C. If A is a collection of sets then ∪A denotes the union of A. If A is any set and n ∈ N then A(n) denotes Nn → A, i.e. the set of finite sequences of elements of A with length n. In particular A(0) = {∅}, where ∅ denotes the empty sequence. A* will denote ∪ {A(n) | n ∈ N}, that is the set of all finite sequences of elements of A. If s ∈ A* then L(s) is the length of s; and if s1 ∈ A* and s2 ∈ A* then s1&s2 denotes the concatenation of s1 and s2. In particular, ∅&s = s for s ∈ A*. If a ∈ A we will often confuse a with {[1, a]}, that is the element of A(1) with value a. In particular, if a ∈ A and s ∈ A*, then a&s ∈ A*, and moreover:

(1)(a&s) = a, and (n + 1)(a&s) = (n) s for n ≤ L(s).

Where no confusion is expected we will often omit the symbol "&". For the updating of references we will use the following functions and operations on functions:
For m ∈ N:

φ_m = [n ∈ N+] (n + m).

For m ∈ N:

ϑ_m = [n ∈ N+] τ(m, n),  where  τ(m, n) = n + 1 if n ≤ m;  1 if n = m + 1;  n if n > m + 1.

For m ∈ N and ψ ∈ N+ → N+:

ψ^{(m)} = [n ∈ N+] σ(ψ, m, n),  where  σ(ψ, m, n) = n if n ≤ m;  (n − m) ψ + m if n > m.
It follows that φ_0 = ϑ_0 = [n ∈ N+] n, the identity on N+, and that for ψ ∈ N+ → N+ we have ψ^{(0)} = ψ. Note that φ_m and ϑ_m are injective, and that if ψ is injective then so is ψ^{(m)}. Simple computation shows that the following lemmas hold:
Lemma 2.3. If k ∈ N and ψ1, ψ2 ∈ N+ → N+ then ψ1^{(k)} ∘ ψ2^{(k)} = (ψ1 ∘ ψ2)^{(k)}. □
Lemma 2.4. If k, m ∈ N and n ∈ N+ then

(n) φ_m^{(k)} = n if n ≤ k;  n + m if n > k. □
Lemma 2.5. If k, l, m ∈ N then
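The updating functions reconstructed above can be sketched as executable code. This is an illustrative sketch of my own (function names are mine); it lets Lemma 2.4 be checked pointwise.

```python
def phi(m):
    """phi_m = [n in N+](n + m): shift every reference up by m."""
    return lambda n: n + m

def theta(m):
    """theta_m: n <= m goes to n + 1, m + 1 goes to 1, larger n are fixed."""
    return lambda n: n + 1 if n <= m else (1 if n == m + 1 else n)

def lift(psi, m):
    """psi^(m): references <= m (bound by the last m binders) are kept,
    the references beyond those binders are updated by psi."""
    return lambda n: n if n <= m else psi(n - m) + m
```

With these definitions phi(0) and theta(0) are the identity on N+, and lift(phi(m), k) maps n to n for n <= k and to n + m for n > k, as Lemma 2.4 states.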
3. TERMS, TRANSFORMATION AND SUBSTITUTION

We define the set of terms Λ inductively as follows:

Definition 3.1.
(1) τ ∈ Λ
(2) if n ∈ N+ then n ∈ Λ
(3) if u, v ∈ Λ then (u) v ∈ Λ
(4) if u, v ∈ Λ then [u] v ∈ Λ  □
0
Clearly if u E A then $21 E A. Moreover qU=T
iff u = ~ ,
@=m
iff u = 21 and
- = (vl)v2 $u
iff u = (ul)u2, $ul = v l and $2~2= v2
- = [vl] v2 $u
iff u = [ul]212, $u1= v l and $(l)u2 = v2
( T I ) $=
m
, ~
It follows that for injective $, $JU = $v implies u = v. -
Lemma 3.1. Zf$1,@2 E N + -+ N + , u $1 $2u = $2 0$1u
--
~
*
E
A then
, .
The language theory of Am (C.6)
66 1
Proof. By induction on u.
0
For u, v E A, k E N + we define substitution of u in v at k , denoted by as follows:
xi v
Definition 3.3.
xi
Clearly, again, if u,u E A then v E A. Now we have the following technical lemmas: L e m m a 3.2.
xi v = x r
LPk-lu
29k-1~. __
Proof. By induction on v. L e m m a 3.3.
11 ~7
-
v=
0
tltu
@v.
Proof. By induction on v. L e m m a 3.4. If m
then
0
xi (Pmv = xi-mv . -
(Pm
Proof. By Lemma 3.2 and Lemma 2.1. L e m m a 3.5. If m
+ 1 2 k > 1 then xi &v
-
=(P;-~V.
Proof. By induction on v. Corollary 3.5. Zf m 2 k then
xi -~ pmv = vm-lv.
These lemmas are used to prove the following theorem: T h e o r e m 3.1. Substitution theorem.
Z f m 2 k thenx:
xi w = x kE L + l x1+1W '
Proof. By induction on w.
L.S. van Benthem Jutting
662
The relevant case is when w = 12. If n = k then
and on the other hand
If n = rn
+ 1 then
c:+1
= Ck
x;-k+l'
'w
vnu = 'p.,-?u
-
For other values of n the proof is straightforward.
by Lemma 3.5
. 0
4. REDUCTION
We define on A the relation -+,called one step reduction. Definition 4.1. (1)
(4['wl v
+
CY
2,.
If u --* v then
The relation
-,,on A is the reflexive and transitive closure of +, defined by
Definition 4.2. (1) u+u.
(2) If u + v and v
+
w then u + w
The language theory of A,
on
(C.6)
663
It is easily seen that the relation ++is transitive and monotonic. By induction $v, the following technical lemma is proved: 21 -+ v, respectively $u + -
Lemma 4.1. If u + v then for any $ we have - implies u + v . if 4 is injective then $u - + +v
$21 .
+
$v ; 0
Another technical lemma: Lemma 4.2. If $u -
+v
then for some w we have v = $w - and u + w.
Proof. By induction on $u -
+ v.
0
Finally it is easily shown that if [ u l ] u 2 u 2 * v2.
+
v then v = [vl]v2, u1
+
v l and
5. THE CHURCH-ROSSER THEOREM
We define on A the relation
> called
nested one step reduction.
Definition 5.1. (1) u > u . If u 2 u1 and v 3 v l then (2)
(414v 3 EY1v l
(3) (u) v 3
(211)
vl
(4) [u]v 3 [ U l ] v l .
5
2 denotes the transitive (and - of course - reflexive) closure of easy inductive argument it is seen that u 2 v iff u --H v. The following technical lemma is proved by induction on u 3 v.
>. By
Lemma 5.1. If u 3 v then for any $ $u 3 +v. -
an
5
Now we are able to prove two lemmas on substitution. Lemma 5.2. If u 3 u l then
xt v 3 xi1V.
Proof. By induction on v it is proved that
xt v 3 Ct1v for any k.
0
L.S. van Benthem Jutting
664
Lemma 5.3. Substitution lemma for >. Zf u 3 u1 and v 2 v l then v> vl.
xi
xi1
Proof. By induction on v 3 v l it is proved that Lemma 5.2 and Theorem 3.1 are used.
xi v 3 xi1v l for any k.
Using these lemmas we can prove the diamond property for
0
>.
Lemma 5.4. Diamond lemma for >. If u 2 u l and u 3 u2 then there exists a term v such that u l
> v and u2 3 v.
Proof. By induction on u 3 u1 and u 2 u2,using Lemma 5.3.
0
As a corollary we have:
Theorem 5.1. Church-Rosser theorem for +. If u -M u1 and u + u2 then there exists a term v such that u1 +,v and u2 + v.
0
6. NORMS, NORMING FUNCTIONALS AND MONOTONIC
FUNCTIONALS A term u E A is called normal if u -n v implies u = v. A reduction sequence of u is a finite or infinite sequence uo, ul,212, ... such that uo = u and un-l -+ un for n E lN+. We say that u strongly normalizes if all reduction sequences of u are finite. This is the case, by Konig’s lemma, iff there is a uniform upperbound to the lengths of the reduction sequences of u. We will prove strong normalization for a subset of A, the set of normable terns. Our proof extends proofs in [Gandy 801 and [de Vrijer 87c] for strong normalization in simple type theory. It is based mainly on de Vrijer’s “quick proof”; we refer also to that proof for comments. We define the set F of norms recursively as follows:
DeAnit ion 6.1. (1) lN E F
(2) if a , @E F then a
-@
:= ( a
-+
0)x EV
E F.
It is clear that, for a,@€F , a =par a n p = 0 . The elements of UF will be called norming function&.
0
For any norming
The language theory of A,
(C.6)
665
functional f the norm to which f belongs is denoted by the projection operators:
fT.
Moreover, we define
if f = n, n E N then f' = n , if f = [g,nl, [g,nl E a
- p then f' = g and f'
=n.
Let f be a norming functional, m a natural number. We define the norming functional f m as follows:
+
Definition 6.2.
+ m = n + m.
(1) If f E N , f = n then f
(2) If f E a
- P, f = [g,nl then f + m
Thus for f E a we have f
= [ [ hE a]((h)g
+ m),n + ml.
0
+ m E a and
(f + m)' = 'f + m , ( h )(f + m)' = ( h )f'+ m if a = p
-
y and h E
0.
Note that + extends addition on the natural numbers. For a E F and n E IV we define the norming functional c; E a.
Definition 6.3. (1) :c
(2)
=n
4-7 = "h E PI
C;[.+n,
n1'
Thus c;*
=n ,
( h )(@-7)'
= c;.+~
if h E
,
Note that c;
+ m = c+;,
.
Now let a be a norm. We define a subset ao of a and a relation simultaneous inductive definition.
Definition 6.4. (1) N O = N ;for f,g E
N o , f < g iff 'f < g*
< on ao by a
L.S. van Benthem Jutting
666
We define G := {aola E F } ; the elements of G will be called monotonic functionals. Note that < on N o is the order on the naturals. The following facts are easily proved: If f , g , h E a', f < g and g < h then f < h. Iff,gEoo,mEMthenf+mEaoandiff
< n then c g < c:.
7. STRONG NORMALIZATION We will assign to certain terms u E A a functional in U F , which will be called the norming functional of u. In order to define it we need a sequence 9 E ( u F ) * ;9 may be thought of as an administration of the functionals assigned to the free variables of u. f n ( u , 9 ) will denote the norming functional of u. It may be the case that f n ( u , 9 ) is undefined. This will be denoted by fn(u, 9)= 0.Terms u for which fn(u, 9)# 0 for some 9 E (uG)* will be called nomable.
Definition 7.1. (1) fn(7,O) = 0 .
(fn(v, 9)) fn(w, 9)'
if fn(v, 9)#
,
fn(w, 9)# 0 and dom(fn(w, 9)')= fn(v, 0)t ;
l o
otherwise.
The language theory of A,
((2.6)
667
+ + fn(v, @)* + 1,fn(w,@)* -t
[ [ hE a]fn(w, h&@) h* (4) fn(b1 w,@) =
+ fn(w,ct&@)*l
if fn(w, @) # 0 , fn(v, @)t= a and fn(w,h&@) # 0 for h E a ; otherwise.
0
It will be clear from Lemma 7.5, which will be proved presently, that for normable terms u the number fn(u, @)* is a n upperbound for the lenghts of the reduction sequences of u. Note that if fn((u) [w]v,9)# 0 then fn(u, @)t= fn(w, @)t. Our first lemma expresses that it only depends on the norms of the functionals in @ whether fn(u, @) is defined and, if so, what is the value of fn(u, 0)t.
Lemma 7.1. Zf @1,@2 E ( U F ) * , L(@l) = L(@2) = n and (k)@lt= (k)@2t 5 n then either
for k
fn(u, 91) = fn(u, @2)= 0
,
OT
fn(u, @l)t= fn(u, @2)t
. 0
Proof. By induction on u. The following technical lemma is also proved by induction on u.
Lemma 7.2. If@ E ( U F ) ' , II, E N + + IV+ and II, o @ E ( U F ) * then fn($u, - @)= fn(u,II, o 0).
0
(Note that @ as well as II, is a function, hence II, o @ is a function.) The following important lemma expresses that an upperbound for the lengths of the reduction sequences of C'; w can be calculated from fn(u, @) and fn(w, fn(u, a)&@).
Lemma 7.3. Substitution lemma. Zf fn(u, @)# 0 then fn(Cy v, @) = fn(v, fn(u, a)&@). Proof. By induction on v. The main case is: v = [vl] v2.
+ + fn(C;" v l , @)* + 1,
fn(Cy v,@)= [[hE a]fn(C; w2, h&@) h* fn(C;" w l , @)*
+ fn(C;
v2,
.;&@)*I
L.S. van Benthem Jutting
668
where a = fn(Cy v l , 9)f, while by the induction hypothesis fn(Cy v l , 9)f = fn(v1, fn(u, @)&@)I . Moreover, we have by the induction hypothesis for h E a: fn(C; v2, h&9) = f n ( C y 5 v 2 , h&9) = fn(cplu, - h&Q)&h&@) . = fn(191v2,
Therefore f n ( C t v2, h&9) = fn(v2,&
o
(fn(u, 9)&h&9)) =
= fn(v2, h&fn(u, a)&@) .
It follows that fn(Zy v,9) = fn(v, fn(u, 9)&9).
0
In order to formulate the next lemma we need the concept of a free uariable. Therefore we define for u E A and k E N + the proposition free(u, k), expressing (in the language of Section 1) that the term u contains a reference (or an arrow) to the k-th binding node below u.
Definition 7.2. (1) not free(.r, k) (2) free(n,k) iff n = k (3) free( (v)w, k) iff free(v, k) or free(w, k) (4) free([v] w, k) iff free(v, k) or free(w, k
+ 1).
0
Lemma 7.4. Monotonicity lemma. -
If 0 E (uG)* then fn(u, 9)E (UG) u ( 0 ) .
- If 91, 9 2 E (UG)*, L(@l) = L(92) = n, we have ( 1 ) 91 = ( 1 ) 9 2 ,
(k)91 < (k)9 2 and for 1 5 n, 1 # k
then fn(ul91) < fn(ul9 2 ) OT fn(u, 91)= fn(ul9 2 ) = 0 zffree(u, k) and fn(u, 91) = fn(u, 9 2 ) zf not free(u, k).
Proof. By induction on u. The main case is, again, = [ul]212. Suppose fn(u, 9) # 0. Then by the induction hypothesis fn(u1, 9) E UG. Let a denote fn(u1, 9)f. Then also by the
The language theory of A,
(C.6)
669
induction hypothesis for every g E a we have fn(u2,g&@) E UG. Now let g, h be elements of a such that g < h. Then either fn(u2,g&@) < fn(u2, h&@) or fn(u2,g&@) = fn(u2,h&@), hence fn(u2,g&@) g* + fn(ul,@)* + 1 < fn(u2, h&@) h* fn(u1, @)* 1. It follows that fn(u, @) E UG. Now assume that free(u, k ) . Then for g E a we have:
+ +
+
+
+ + fn(u1, @I)*+ 1
(9)fn(ul @I)'= fn(u2,g&@1) g* and
+ + fn(u1, @2)*+ 1
(9)fn(u, @2)' = fn(u2, g&@2) g* and therefore (9) fn(u, 01)' < (9)fn(u, @2)' . Moreover ,
+
fn(u, @l)* = fn(u1, @ l ) * fn(u2, cg&@l)* and
+
fn(u, @2)*= fn(u1, @2)* fn(u2, cg&@2)* and therefore fn(u, @I)* < fn(u, @2)* . Hence if free(u,k) then f n ( u , @ l ) < fn(u,(P2). It is easily seen that if not free(u, k ) then fn(u, @ l ) = fn(u, 32). 0
Lemma 7.5. Reduction lemma. If Φ ∈ (∪G)* and fn(u, Φ) ≠ ∅ then u → v implies fn(v, Φ) < fn(u, Φ).
Proof. By induction on u → v. The case u = (u1) [u3] u2, v = Σ_1^{u1} u2 is covered by Lemma 7.3. □
As a corollary we have

Theorem 7.1. Strong normalization. If u is normable then u strongly normalizes. If Φ ∈ (∪G)* and fn(u, Φ) ≠ ∅ then fn(u, Φ)* is an upper bound for the lengths of the reduction sequences of u. □
8. CONTEXTS AND TYPES
In Sections 8 and 9 we will define the system Λ∞. In order to do so we must be able to calculate the type of an expression u ∈ Λ. For assigning a type
L.S. van Benthem Jutting
670
to u we need a sequence U E A*. Such a sequence is called a context. It can be considered as administrating the types of the free variables in u.The type of u may be undefined which, again, will be denoted by the symbol “0”.
Definition 8.1. (1) tYP(.,
U )=
In order to express the properties of the typing operator typ, we must extend the transformation operation, the substitution operation and the reduction relation to contexts. As far as transformation is concerned we restrict ourselves (k). to the functions (pm
Definition 8.2. Let U be a context, L ( U ) = n. Then v (4 m U
E A* with L ( ( p g ) U )= n ~
is defined by
The following lemmas are easily seen to hold:
0
Lemma 8.2. If L ( U 1 ) = k then ( p g ’ ( U l & U 2 ) = (cpg)Ul)&U2. ~
~
We prove a technical lemma by induction on u:
0
The language theory of A,
((2.6)
671
Lemma 8.3. If L ( U 0 ) = k , L ( U 1 ) = m and U = UO&Ul&U2 then either typ(&u,
@U)
tYP(&U,
@U) = q&typ(u, UO&U2) .
= typ(u, UO&U2) = 0
or 0
This gives as a consequence:
Corollary 8.3. If L ( U 1 ) = m, then either typ(q,u, - U l & U 2 ) = typ(u, U 2 ) = 0
or tYP(cp,U, U1&U2) = 'PmtYP(U,U 2 ) .
0
Now in order t o investigate the relation between substitution and typing we define substitution in contexts:
Definition 8.3. Let U be a context, L ( U ) = n, and 1 5 k 5 n. Then U E A* with L ( C g U ) = n - 1 is defined by
xi
(1)
C; u =
xg-l ( 1 ) U (1
if 1
+ 1) U
if k 5 1 < n
.
0
We have the following easy lemmas on substitution in contexts:
0
Lemma 8.5. If L ( U 1 ) = k then
( U l & U 2 )=
(xiU 1 )& U 2 .
0
The next lemma describes the relation between substitution and typing:
Lemma 8.6. Substitution lemma for typ. If tYP((Pku, - u ) -n w and q k (k) u -n w then either
xi U ) = tYP(U1 U ) = 0
tYP(x; v,
or typ(x;t v,
xi U )
-H
w0 and
typ(v, U ) -n w0 for some w0 E A
.
L.S. van Benthem Jutting
672
Proof. By induction on u. The main case is u = b. Because k 5 L ( U ) we have U = Ul&U2, where L(U1) = k .
fore typ(C1 u , C;t U ) = typ(cpk-lu, Hence, by Corollary 8.3: t y p ( Z i o, lary 8.3: ~~
cp1 tYP(CZ
21,
There-
( x i Ul)&U2)
where L(Ci U1) = k - 1. V ) = p k - 1 typ(u, V2), so, again by Corol~
C i U ) = 'PktYP(U,U 2 ) = tYP(B'L1, U )
+
w
*
On the other hand
C i tYP(U, U ) = xi y& (k)u = 'Pk-lM u by Corollary 3.5. This gives us
( k )u
5 C i tYP(U, U ) =
+
'w
.
By Lemma 4.2 it follows that w = cp1 w0 and that t y p ( C i u , Zi U ) -M w0 and
Corollary 8.6. If typ(u, V)
-M
xi typ(u, U )
w and u l
-M
-M
w0
0
w then either
typ(CY u , V) = typ(u, ul&V) = 0
or typ(CY u , V ) -M w0 and
zy typ(v, ol&V)
-M
w0 for some w0 E A
Proof. Take k = 1 and U = ul&V in Lemma 8.6.
.
0
Finally, in order t o describe a relation between typing and reduction we define the concept of reduction on contexts.
Deflnition 8.4. Let u and u be terms, U and V contexts. (1) if u --* u then u&U
(2) if U
+V
-.*
then u&U
u&U
.--$
0
u&V.
We have the following lemma:
Lemma 8.7. If U k 5 72 such that
(k)U
-.*
+
V then L ( U ) = L ( V ) = n
> 0 and there is just one
( I c ) V and ( 1 ) U = (1) V for 15 R , 1 # k
.
The language theory of Aw (C.6)
Proof. By induction on U
4
673
V.
0
Moreover, we have
Lemma 8.8. Zf U tYP(’u, U )
+
+
V then either
tYP(% V )
or tYP(.u,
w = tYP(% V ).
Proof. By induction on u.
Corollary 8.8. Zfv tYP(% V&U)
-+
+
w then either tYP(U, w
w
or typ(u, v&U) = typ(u, w&U) .
0
The relation +I between contexts is the reflexive and transitive closure of -+. If u -W v and U -,, V then clearly u&U +I v&V.
9. THE SYSTEM Λ∞
We will define by simultaneous induction the set Γ∞ ⊂ Λ*, which is the set of correct contexts, and the set Λ∞ ⊂ Λ × Λ* (it will turn out that even Λ∞ ⊂ Λ × Γ∞). If [u, U] ∈ Λ∞ then u will be called a correct term on context U. Here correctness should be understood as follows: if (u)v is correct on context U then v "is a function" and moreover typ(u, U) and "the domain of v" have a common reduct. In fact, we have not formalized what it means for v to "be a function" and, if it is, what "the domain of v" is. The requirements described above appear, however, in clause 4 of our definition and, implicitly, also in clause 6. Together with Γ∞ and Λ∞ we will define the sets Γi and Λi for i ∈ N. They are introduced only for the purpose of induction in the proof of Lemma 10.3. If [u, U] ∈ Λi then u will be called i-correct. The systems are connected with the notion of degree in [de Bruijn 72b (C.2)] and [de Bruijn 80 (A.5)] in the sense that any i-correct term will have degree at most i. (The converse, however, does not hold.) In the following discussion it is always assumed that i ∈ N∞. For i = ∞ the definitions and lemmas contain the theory of Λ∞.
L.S. van Benthem Jutting
674
Deflnition 9.1. (0)
ro= ho = 0 .
If i > 0 then ( 1 ) 0 E ri ( 2 ) if [u, Ul E A, then u&U E ri (3) if U E
ri then
17,Ul E Ai
(4) if typ((u)v,U) = 0 , [u,UlE hi, [ v I U 1 E [ v l ]v 2 then [ ( u )v , U l E hi
Ail
typ(u,U)
-w
v l and
-w
(5) if typ([u] v, U ) = 0 and [D,u&U] E hi then [ [ u ]v ,U l E Ai (6) if [typ(u, U ) , U ] E
Ai-1
then [u,UlE A i .
0
Clearly if [ulUl E A, then U E ri and if U1&U2 E ri then U2 E ri. It is also clear (by induction on i) that A i c Ai+l for i E AJand it is easy to check that Am = U {Ai I i E nV}. We have the following technical lemma:
Lemma 9.1. If L ( U 0 ) = k, L(U1) = m, U = UO&Ul&U2 and U1&U2 E then
[g~, &U] hi E
if
[u, VO&U21 E A,
ri
.
Proof. B induction, respectively on [u,UO&U21 E Ai and on [&u, &U1 E A i , where frequent use is made of Lemma 8.3.
0
The lemma has some nice corollaries:
Corollary 9.1.1. Weakening and strengthening lemma. If L ( U 1 ) = m, U1&U2 E ri, then [ ( P ~ UU, l & U 2 ] E hi afl [u, U21 E Ai. Corollary 9.1.2. If U E ri, k 5 L ( U ) then [( ~ k(k)U,Ul E Ai. Corollary 9.1.3.
[n,Ul E A, zf
UE
rooand n 5 L ( U ) .
0 0
The next lemma partially expresses our assertions about correctness of terms.
The language theory of Am (C.6)
675
Lemma 9.2. Soundness of application. Zf [ ( u ) [w]v,Ul E Ai then typ(u, U ) --* w0 and w -* w0 for some w0 E A. Proof. By induction on [ ( u ) [w]v,Vl E Ai.
0
Types of correct terms are, in a sense, preserved under reduction.
Lemma 9.3. Preservation of types. If [u, V l E A,, u v then either -+
tYP(U, V ) = tYP(V, U ) = 0
or typ(u, V ) -* w and typ(v, U ) --* w for some w E A
Proof. By induction on u v. We will consider the case u = ( u l ) [u3] u2, v = C;”’ u2. By the previous lemma typ(u1, V ) -* w0 and 213 -* w0. Now typ(u, U ) = (ul)[u3]typ(u2, u3&U) -+ typ(u2, u3&U) and typ(v, U ) = typ(Cy’ u2, U ) . Apply Corollary 8.6. -+
x;”’
0
The following lemmas are easy to prove. The first contains the converse of clause 6 in Definition 9.1.
Lemma 9.4. Correctness of types. If typ(u, V ) # then [u, Vl E hi if [typ(u, V ) ,U l E hi-I. The second tells us that if an application of a function to an argument is correct, then both the function and the argument are correct.
Lemma 9.5. Correctness of functions and arguments. If [ ( u ) v,Vl E hi then [ u , V l E Ai and [ v , U l E Ai.
0
We prove two lemmas which are, in a sense, converses of Lemma 9.5.
Lemma 9.6. If [ ( u ) v l , U ] E v2, vl E A ~ .
r(.)
Ai,
[v2,U] E hi, v l
-*
w and v2
+
w then
Proof. By induction on [ ( u )v l , V l E A i . We consider the case of clause 4: typ((u) v l , U ) = 0 , [u, Vl E A i , 1.1, Vl E A i , typ(u, V ) -* w0 and v l -* [wO]wl. We know that typ(v1, V ) = 0,hence, by Lemma 9.3, typ(w, V ) = 0 and also typ(v2, U ) = 0.Therefore typ((u) v2, V ) = 0 . Moreover, by the Church-Rosser theorem we have, for some w2 E A:
L.S. van Benthem Jutting
676
w + w 2 and [wO]wl-nw2, hence w2 = [wO*]wl* for some wO* and Therefore typ(u,U) [ ( u )v2, U l E hi.
--H
w0
-*
wO* and 712
++
w
-*
wl*
.
[wO*]wl*, so, by clause 4, 0
Lemma 9.7. If ⌈(u1)v, U⌉ ∈ Λi, ⌈u2, U⌉ ∈ Λi and u1 ↠ u2 then ⌈(u2)v, U⌉ ∈ Λi.

Proof. By induction on ⌈(u1)v, U⌉ ∈ Λi. We consider again the case of clause 4: typ((u1)v, U) = 0, ⌈u1, U⌉ ∈ Λi, ⌈v, U⌉ ∈ Λi, typ(u1, U) ↠ w1 and v ↠ [w1]w2. First we have typ(v, U) = 0, hence typ((u2)v, U) = 0. By Lemma 9.3 we have for some w0: typ(u1, U) ↠ w0 and typ(u2, U) ↠ w0. Hence, by the Church-Rosser theorem: w0 ↠ v1 and w1 ↠ v1 for some v1. Therefore typ(u2, U) ↠ w0 ↠ v1 and v ↠ [w1]w2 ↠ [v1]w2, so, by clause 4, ⌈(u2)v, U⌉ ∈ Λi. □
Finally we state a lemma on correct abstraction:
Lemma 9.8. ⌈[u]v, U⌉ ∈ Λi iff ⌈v, u&U⌉ ∈ Λi.

Proof. By induction, respectively on ⌈[u]v, U⌉ ∈ Λi and on ⌈v, u&U⌉ ∈ Λi. □
10. CLOSURE FOR Λ∞

For the proof that Λ∞ is closed under reduction we need Lemma 10.2, which tells us that correctness is preserved under correct substitution. In order to prove this lemma we give a slightly different definition of Λi, which we will prove to be equivalent to the first definition. Induction on this alternative definition will be used in the proof of Lemma 10.2. We define for i ∈ ℕ the sets Ci and Li by a simultaneous inductive definition as follows:
Definition 10.1.
(0) C0 = L0 = ∅.
If i > 0 then
(1) ∅ ∈ Ci
(2) if ⌈u, U⌉ ∈ Li then u&U ∈ Ci
(3) if U ∈ Ci then ⌈τ, U⌉ ∈ Li
(4) if typ((u)v, U) = 0, ⌈u, U⌉ ∈ Li, ⌈v, U⌉ ∈ Li, typ(u, U) ↠ w1 and v ↠ [w1]w2 (for some w1, w2) then ⌈(u)v, U⌉ ∈ Li
The language theory of Λ∞ (C.6)
(5) if typ([u]v, U) = 0 and ⌈v, u&U⌉ ∈ Li then ⌈[u]v, U⌉ ∈ Li
(6.1) if ⌈typ(n, U), U⌉ ∈ Li-1 then ⌈n, U⌉ ∈ Li (for a variable n)
(6.2) if ⌈typ((u1)u2, U), U⌉ ∈ Li-1 and ⌈u2, U⌉ ∈ Li then ⌈(u1)u2, U⌉ ∈ Li
(6.3) if ⌈typ([u1]u2, U), U⌉ ∈ Li-1 and ⌈u2, u1&U⌉ ∈ Li then ⌈[u1]u2, U⌉ ∈ Li
The clauses 0 to 5 are the same as the corresponding clauses of Definition 9.1, but clause 6 of that definition has been split up into the three clauses 6.1-6.3. We easily verify that Li-1 ⊂ Li and that L∞ = ∪{Li | i ∈ ℕ}. In order to show that Ci = Γi and Li = Λi we first prove the following lemma:
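The shape of this simultaneous inductive definition can be made concrete in a toy setting. The following sketch is our own drastic simplification (only tau and variables survive, and the clauses are invented truncations of the real ones); it is meant solely to show how the Ci and Li recurse into each other and why the index i drops when passing from a term to its type:

```python
# ctx_ok plays the role of the sets C_i of correct contexts,
# pair_ok that of the sets L_i of correct (term, context) pairs.

def ctx_ok(ctx, i):
    # clauses 0-2: C_0 is empty; for i > 0 the empty context is correct,
    # and u&U is correct when the pair (u, U) is.
    if i == 0:
        return False
    if not ctx:
        return True
    return pair_ok(ctx[0], ctx[1:], i)

def pair_ok(u, ctx, i):
    # clause 3: (tau, U) is correct when U is;
    # variable clause: a variable is correct when the pair formed by its
    # type is correct one level lower -- this is where i drops.
    if i == 0:
        return False
    if u == "tau":
        return ctx_ok(ctx, i)
    if isinstance(u, int):                 # a de Bruijn index
        return (1 <= u <= len(ctx) and ctx_ok(ctx, i)
                and pair_ok(ctx[u - 1], ctx[u:], i - 1))
    return False

# The strictness of the stratification L_{i-1} != L_i is visible already:
assert pair_ok(1, ["tau"], 2) and not pair_ok(1, ["tau"], 1)
```

Each extra level of typing (a variable whose type is itself a variable, and so on) costs one unit of the index i, exactly as in the chains Li-1 ⊂ Li above.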
Lemma 10.1. If ⌈typ(u, U), U⌉ ∈ Li-1 then ⌈u, U⌉ ∈ Li.

Proof. By induction on ⌈typ(u, U), U⌉ ∈ Li-1. We consider the case of clause 4: typ(u, U) = (u1)v, typ((u1)v, U) = 0, ⌈u1, U⌉ ∈ Li-1, ⌈v, U⌉ ∈ Li-1, typ(u1, U) ↠ w1, v ↠ [w1]w2. Now either u = n or u = (u1)u2 and typ(u2, U) = v. If u = n then ⌈u, U⌉ ∈ Li by clause 6.1. If u = (u1)u2 and typ(u2, U) = v then we have by the induction hypothesis ⌈u2, U⌉ ∈ Li, and therefore ⌈u, U⌉ ∈ Li by clause 6.2.
As another case we consider clause 6.3: typ(u, U) = [u1]v, ⌈typ([u1]v, U), U⌉ ∈ Li-2 and ⌈v, u1&U⌉ ∈ Li-1. Again we either have u = n or u = [u1]u2 and typ(u2, u1&U) = v. If u = n then again clause 6.1 applies. And if u = [u1]u2 and typ(u2, u1&U) = v then by the induction hypothesis ⌈u2, u1&U⌉ ∈ Li, and therefore ⌈u, U⌉ ∈ Li by clause 6.3. □
Corollary 10.1. Ci = Γi and Li = Λi.

Proof. Li ⊆ Λi is trivial; Λi ⊆ Li is proved by using Lemma 10.1. □
Now we are able to prove the following important substitution lemma.
Lemma 10.2. Substitution lemma for Li. If ⌈v, U⌉ ∈ Li, ⌈u, Σ_k U⌉ ∈ Li, typ(u, Σ_k U) ↠ w and Σ_k ϕ_k (k)U ↠ w (for some w) then ⌈Σ_k v, Σ_k U⌉ ∈ Li, where Σ_k denotes the substitution Σ_k^u of u for the variable k.

Proof. By induction on ⌈v, U⌉ ∈ Li, freely using Corollary 10.1.
We consider some of the clauses:
Clause 3: v = τ. We have to prove that Σ_k U ∈ Ci. If k = 1 this is clear by Lemma 8.4. If k > 1 then U = w&V and Σ_k U = (Σ_k w)&(Σ_k V), also by Lemma 8.4. Now we have ⌈w, V⌉ ∈ Li, hence by the induction hypothesis ⌈Σ_k w, Σ_k V⌉ ∈ Li, and therefore Σ_k U ∈ Ci by clause 2.
Clause 4: v = (v1)v2. We know that typ(v, U) = 0, ⌈v1, U⌉ ∈ Li, ⌈v2, U⌉ ∈ Li, typ(v1, U) ↠ w1 and v2 ↠ [w1]w2.
By Lemma 8.6 we have, for some w0,
Σ_k typ(v1, U) ↠ w0 and typ(Σ_k v1, Σ_k U) ↠ w0. (i)
The induction hypothesis gives us:
⌈Σ_k v1, Σ_k U⌉ ∈ Li and ⌈Σ_k v2, Σ_k U⌉ ∈ Li. (ii)
Also by Lemma 8.6 we see that typ(Σ_k v, Σ_k U) = 0. Now by Lemma 5.3 it follows that Σ_k typ(v1, U) ↠ Σ_k w1, hence by the Church-Rosser theorem
w0 ↠ w and Σ_k w1 ↠ w for some w.
Therefore we have:
typ(Σ_k v1, Σ_k U) ↠ w0 ↠ w and Σ_k v2 ↠ Σ_k [w1]w2 = [Σ_k w1] Σ_{k+1} w2 ↠ [w] Σ_{k+1} w2. (iii)
From (i), (ii) and (iii) we conclude by clause 4 that ⌈Σ_k v, Σ_k U⌉ ∈ Li.
Clause 6.1: v = n. We know that ⌈typ(v, U), U⌉ = ⌈ϕ_n (n)U, U⌉ ∈ Li-1. We discern two cases: n = k and n ≠ k.
Suppose n = k. As L(U) ≥ k we may put U = U1&U2 with L(U1) = k. Then Σ_k U = (Σ_k U1)&U2 by Lemma 8.5 and L(Σ_k U1) = k - 1. Moreover, it can be shown, just as under clause 3, that Σ_k U ∈ Ci. Hence by Corollary 9.1.1 we have ⌈u, U2⌉ ∈ Li and by the same corollary also ⌈Σ_k v, Σ_k U⌉ = ⌈ϕ_{k-1} u, Σ_k U⌉ ∈ Li.
Now suppose n ≠ k. Then Σ_k v equals n (if n < k) or n - 1 (if n > k). Using Lemma 3.4 (for n < k) or Corollary 3.5 (for n > k) we see that typ(Σ_k v, Σ_k U) = Σ_k ϕ_n (n)U. By the induction hypothesis we have ⌈Σ_k ϕ_n (n)U, Σ_k U⌉ ∈ Li-1, and therefore by clause 6.1 ⌈Σ_k v, Σ_k U⌉ ∈ Li.
Clause 6.2: v = (v1)v2. We know that
⌈(v1) typ(v2, U), U⌉ ∈ Li-1 (*)
and that ⌈v2, U⌉ ∈ Li. By the induction hypothesis it follows that
⌈Σ_k (v1) typ(v2, U), Σ_k U⌉ ∈ Li-1 and ⌈Σ_k v2, Σ_k U⌉ ∈ Li. (i)
By Lemma 8.6 it is known that for some w0 ∈ Λ
Σ_k typ(v2, U) ↠ w0 and typ(Σ_k v2, Σ_k U) ↠ w0. (ii)
And by Lemma 9.4 we conclude that
⌈typ(Σ_k v2, Σ_k U), Σ_k U⌉ ∈ Li-1. (iii)
From (i), (ii) and (iii) it follows by Lemma 9.6 that
⌈typ(Σ_k v, Σ_k U), Σ_k U⌉ ∈ Li-1,
and this gives us by clause 6.2:
⌈Σ_k v, Σ_k U⌉ ∈ Li.
We leave the other clauses to the reader. □
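The substitution operators Σ_k manipulated throughout this proof are, in modern terms, de Bruijn substitutions. As a rough, self-contained sketch (our own formulation, with variables encoded as ints, abstraction and application as tagged tuples, and "tau" as the constant; not the paper's exact operators):

```python
def shift(u, d, c=1):
    """Add d to all de Bruijn indices >= cutoff c."""
    if isinstance(u, int):
        return u + d if u >= c else u
    if isinstance(u, tuple):
        tag, a, b = u                     # ("abs", dom, body) / ("app", arg, fun)
        if tag == "abs":
            return (tag, shift(a, d, c), shift(b, d, c + 1))
        return (tag, shift(a, d, c), shift(b, d, c))
    return u                              # "tau"

def subst(u, k, s):
    """Replace index k in u by s (suitably shifted), lowering higher indices."""
    if isinstance(u, int):
        if u == k:
            return shift(s, k - 1)        # s is re-scoped under k-1 binders
        return u - 1 if u > k else u
    if isinstance(u, tuple):
        tag, a, b = u
        if tag == "abs":
            return ("abs", subst(a, k, s), subst(b, k + 1, s))
        return ("app", subst(a, k, s), subst(b, k, s))
    return u

# Beta reduction (s)[dom]body -> subst(body, 1, s); going under a binder
# bumps k, as in the Sigma_{k+1} occurring in the proof of clause 4.
assert subst(1, 1, "tau") == "tau"
assert subst(("abs", "tau", 2), 1, 5) == ("abs", "tau", 6)
```

The bookkeeping in clauses 3 and 6.1 above (splitting the context at position k, shifting by ϕ) corresponds to the cutoff argument c and the shift in the variable case here.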
Corollary 10.2. If ⌈v, v1&V⌉ ∈ Li, ⌈u, V⌉ ∈ Li, typ(u, V) ↠ w and v1 ↠ w then ⌈Σ_1^u v, V⌉ ∈ Li.

Proof. Take k = 1 and U = v1&V in Lemma 10.2. □
Our next lemma implies that for i ∈ ℕ the set Λi is closed under reduction. In order to word it we use the relation ↠ between contexts, which has been defined in Section 8. In order to prove the lemma we assign to every context U the number M(U), which is the sum of the lengths of the terms in U: if L(U) = n then M(U) = L((1)U) + L((2)U) + ... + L((n)U).
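The measure M can be sketched directly (our own encoding of terms, matching the sketch used earlier: ints and "tau" as leaves, tagged tuples as binary nodes; the paper's exact length function L may count differently):

```python
def length(u):
    """Number of nodes of a term: leaves count 1, binary nodes 1 + subterms."""
    if isinstance(u, tuple):
        _, a, b = u
        return 1 + length(a) + length(b)
    return 1

def M(ctx):
    """Sum of the lengths of the terms in a context."""
    return sum(length(t) for t in ctx)

# Pushing a binder's domain into the context strictly decreases M, which is
# what drives the induction in the proof below.
assert M([("abs", "tau", 1), "tau"]) == 4
```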
Lemma 10.3. If i ∈ ℕ, u&U ↠ v&V and ⌈u, U⌉ ∈ Λi then ⌈v, V⌉ ∈ Λi.
Proof. By induction on i. If i = 0 then Λ0 = ∅, so the lemma holds. Suppose i > 0. We prove the following:

Proposition. If u&U → v&V and ⌈u, U⌉ ∈ Λi then ⌈v, V⌉ ∈ Λi.
Proof. By induction on M(u&U). If M(u&U) = 1 then u&U → v&V is impossible, so the proposition holds. Now suppose M(u&U) > 1. As u&U → v&V we have either u → v and U = V, or u = v and U → V.
Suppose u → v and U = V. We inspect the clauses for u → v.
(1) u = (u1)[u3]u2, v = Σ_1^{u1} u2.
By Lemma 9.2 we have typ(u1, U) ↠ w and u3 ↠ w for some w, and by Lemma 9.5 we have ⌈[u3]u2, U⌉ ∈ Λi, so ⌈u2, u3&U⌉ ∈ Λi by Lemma 9.8. Apply Corollary 10.2.
(2) u = (u1)u2, u1 → v1, v = (v1)u2.
By Lemma 9.5 we have ⌈u1, U⌉ ∈ Λi. Moreover u1&U → v1&U and M(u1&U) < M(u&U). Therefore by our induction hypothesis we have ⌈v1, U⌉ ∈ Λi and hence ⌈v, U⌉ ∈ Λi by Lemma 9.7.
(3) u = (u1)u2, u2 → v2, v = (u1)v2.
⌈v, U⌉ ∈ Λi by a similar argument, where Lemma 9.6 is used instead of Lemma 9.7.
(4) u = [u1]u2, u1 → v1, v = [v1]u2.
By Lemma 9.8 we have ⌈u2, u1&U⌉ ∈ Λi. Moreover u2&u1&U → u2&v1&U and M(u2&u1&U) < M(u&U); in fact M(u&U) = M(u2&u1&U) + 2. Therefore our induction hypothesis gives us ⌈u2, v1&U⌉ ∈ Λi and it follows that ⌈v, U⌉ ∈ Λi by Lemma 9.8.
(5) u = [u1]u2, u2 → v2, v = [u1]v2.
⌈v, U⌉ ∈ Λi by a similar argument as under (4).

Now suppose u = v and U → V. We inspect the clauses for ⌈u, U⌉ ∈ Λi.
(3) u = τ.
We have to prove that V ∈ Γi. As U → V it is impossible that U = ∅, so we may put U = u1&U1 and V = v1&V1. As U ∈ Γi we have ⌈u1, U1⌉ ∈ Λi and also M(U) < M(u&U). Therefore we have by our induction hypothesis ⌈v1, V1⌉ ∈ Λi, hence V ∈ Γi.
(4) u = (u1)u2, typ(u, U) = 0, ⌈u1, U⌉ ∈ Λi, ⌈u2, U⌉ ∈ Λi, typ(u1, U) ↠ v1 and u2 ↠ [v1]v2.
By Lemma 8.7 we know typ(u, V) = 0. Moreover, we have u1&U → u1&V and M(u1&U) < M(u&U), so by our induction hypothesis ⌈u1, V⌉ ∈ Λi, and by a similar argument we see that ⌈u2, V⌉ ∈ Λi. Also by Lemma 8.7 it is seen that typ(u1, U) ↠ typ(u1, V), so by the Church-Rosser theorem we have:
v1 ↠ w and typ(u1, V) ↠ w for some w.
It follows that u2 ↠ [v1]v2 ↠ [w]v2, hence ⌈u, V⌉ ∈ Λi by clause 4.
(5) u = [u1]u2, typ(u, U) = 0, ⌈u2, u1&U⌉ ∈ Λi.
We know that u2&u1&U → u2&u1&V and that M(u2&u1&U) < M(u&U). It follows that ⌈u2, u1&V⌉ ∈ Λi, hence ⌈u, V⌉ ∈ Λi by Lemma 9.8.
(6) ⌈typ(u, U), U⌉ ∈ Λi-1.
By Lemma 8.7 we have typ(u, U) ↠ typ(u, V), hence typ(u, U)&U ↠ typ(u, V)&V. Now by our induction hypothesis on i it follows that ⌈typ(u, V), V⌉ ∈ Λi-1 and therefore ⌈u, V⌉ ∈ Λi by clause 6. □
So our proposition is proved, and it follows immediately that u&U ↠ v&V and ⌈u, U⌉ ∈ Λi imply ⌈v, V⌉ ∈ Λi. This proves our lemma. □

As a consequence we have:

Corollary 10.3. Closure for Λi. If i ∈ ℕ, ⌈u, U⌉ ∈ Λi and u ↠ v then ⌈v, U⌉ ∈ Λi. □
Theorem 10.1. Closure for Λ∞. If ⌈u, U⌉ ∈ Λ∞ and u ↠ v then ⌈v, U⌉ ∈ Λ∞. □
11. NORMABILITY FOR Λ∞

In this section we will prove that ⌈u, U⌉ ∈ Λ∞ implies that u is normable. It then follows from Theorem 7.1 that u strongly normalizes. In order to prove that u is normable we will assign to certain sequences U ∈ Λ* a sequence s(U) of norms. If the assignment is not possible then we will write, as before, s(U) = 0.
Definition 11.1.
(1) s(∅) = ∅.
(2) s(u&U) = α&s(U) if s(U) ≠ 0, fn(u, s(U)) ≠ 0 and fn(u, s(U))↑ = α; otherwise s(u&U) = 0.
Proof. By induction on U.
0
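The definition just given has a simple recursive shape. A sketch (our own rendering, with the norm assignment fn of Section 7 left abstract and None standing for the paper's failure value 0):

```python
def s(ctx, fn):
    """Map a context to the sequence of norms of its entries,
    propagating failure as None."""
    if not ctx:
        return []
    rest = s(ctx[1:], fn)
    if rest is None:
        return None
    a = fn(ctx[0], rest)
    return None if a is None else [a] + rest

# With a dummy norm function that fails on the entry "bad":
fn = lambda u, seq: None if u == "bad" else len(seq)
assert s(["a", "b"], fn) == [1, 0]
assert s(["a", "bad"], fn) is None
```

Note that each entry is normed against the norms of the entries after it, mirroring the fact that a context entry may only refer to the part of the context beyond it.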
Our second lemma gives a relation between norms and typing.
Lemma 11.2. If U ∈ Λ*, s(U) ≠ 0 and typ(u, U) ≠ 0 then either
fn(typ(u, U), s(U)) = fn(u, s(U)) = 0
or
fn(typ(u, U), s(U))↑ = fn(u, s(U))↑.

Proof. By induction on u. We consider the case that u = [u1]u2. Then typ(u, U) = [u1] typ(u2, u1&U) and typ(u2, u1&U) ≠ 0. If fn(u1, s(U)) = 0 then fn(typ(u, U), s(U)) = fn(u, s(U)) = 0. Now assume that fn(u1, s(U)) ≠ 0 and put fn(u1, s(U))↑ = α. Then it follows that s(u1&U) = α&s(U) ≠ 0. If fn(typ(u2, u1&U), s(u1&U)) = 0 then also fn(u2, s(u1&U)) = 0 by the induction hypothesis, and therefore fn(typ(u, U), s(U)) = fn(u, s(U)) = 0. So let us assume fn(typ(u2, u1&U), s(u1&U)) ≠ 0. Putting fn(typ(u2, u1&U), s(u1&U))↑ = β we have by the induction hypothesis
fn(u2, s(u1&U))↑ = β and also fn(u2, g&s(U))↑ = β for g ∈ α. Hence fn(typ(u, U), s(U))↑ = fn(u, s(U))↑ = α → β. □
Lemma 11.3. If ⌈u, U⌉ ∈ Λi then s(U) ≠ 0 and fn(u, s(U)) ≠ 0.
Proof. By induction on ⌈u, U⌉ ∈ Λi. We consider clause 3: u = τ. We only have to show that s(U) ≠ 0. If U = ∅ then s(U) = ∅, and if U = v&V then we have ⌈v, V⌉ ∈ Λi, so by the induction hypothesis s(V) ≠ 0 and fn(v, s(V)) ≠ 0 and therefore s(U) ≠ 0.
We will also consider clause 4: u = (u1)u2. We have typ(u, U) = 0, ⌈u1, U⌉ ∈ Λi, ⌈u2, U⌉ ∈ Λi, typ(u1, U) ↠ v1 and u2 ↠ [v1]v2. By the induction hypothesis fn(u1, s(U)) ≠ 0 and fn(u2, s(U)) ≠ 0. Putting fn(u1, s(U))↑ = α we have fn(typ(u1, U), s(U))↑ = α by Lemma 11.2 and fn(v1, s(U))↑ = α by Lemma 7.5. Also by Lemma 7.5 we have fn(u2, s(U))↑ = fn([v1]v2, s(U))↑ = α → β for some β, hence fn(u, s(U)) ≠ 0.
We leave the other cases to the reader. □

As a consequence we have
Theorem 11.1. Strong normalization for Λ∞. If ⌈u, U⌉ ∈ Λ∞ then u strongly normalizes. □
ACKNOWLEDGEMENT
I want to express my gratitude to R. Nederpelt for his encouragement and his careful reading of the original text, where he suggested some improvements and detected a serious error.
PART D Text Examples
Example of a Text written in Automath

N.G. de Bruijn
[Editor's comments. This early text is written in the first full-fledged version of an Automath language, later to become known as AUT-68. It covers some elementary logic and the notions of set, powerset and set inclusion. An introduction to the language AUT-68 can be found in this Volume ([van Benthem Jutting 81 (B.1)]).
First a few remarks on features that are particular to this text and on the way it has been reproduced here.
(1) Early 1968 de Bruijn still used the term sort instead of type. We have
not changed this terminology.
(2) In the original text one finds vertical lines as indicators of the scope of the variables, as in Natural Deduction in the style of Fitch [Fitch 52]. These lines are redundant, although they enhance readability. They have been deleted in this reproduction.
(3) In some places, especially at the opening of each new section, you will find a few lines that have been placed between brackets. These lines are superfluous in the sense that deleting them would affect neither the correctness nor the meaning of the text. They redefine a context that could have been just picked up from the preceding sections. De Bruijn included these lines as reminders, saving the reader the trouble of searching the text for the proper identifiers. Since they definitely contribute to the readability, we have reproduced both the lines and the brackets.
(4) The division of the text in sections with descriptive headers is from the original. So are the comments between the lines.
(5) The text has never been checked on a computer. A few obvious mistakes have been corrected.
We continue with a few short comments on the handling of logic in terms of bool and TRUE. Once this mechanism is understood, it is not difficult to read the plain Automath text. Consult on this subject also the résumés of [D.1] and [A.2] in the Introduction. Section 12 of [van Benthem Jutting 81 (B.1)], entitled 'Logic', contains a more recent text fragment developing some logic. It may be instructive to compare the two texts.
1.1-1.3. The primitive type bool of propositions (here called 'booleans') is introduced, and for each boolean x the connected assertion type TRUE(x). The idea is that a boolean will be true if its assertion type is inhabited.
1.4. CONTR is defined as the type of functions that attach to each boolean v an assertion of v. Such a function could of course be taken as an assertion of a contradiction, and in a pure propositions-as-types setting it would be natural to view the type CONTR itself as the proposition that in a canonical way represents falsity. Here the corresponding proposition (boolean) can be obtained via the nonempty-construction that now follows.
1.9-1.13. To each type ksi corresponds a boolean nonempty(ksi). Its assertion type TRUE(nonempty(ksi)) is inhabited if ksi is. This construction may look a bit artificial: why is not ksi itself taken as the assertion type? The answer is that we already have the uniform construction of the TRUE-types as the assertion types of booleans, and there is no way to make ksi and TRUE(nonempty(ksi)) definitionally equal.
It is noteworthy that in the present text de Bruijn does not always take the trouble to explicitly construct the TRUE-type, or even the bool. A typical example is Section 2. Both the equality axioms and the reasoning on equality are entirely in terms of the types IS(ksi,x,y). The corresponding boolean equal is defined in line 2.3, but never used; the assertion type does not occur at all.
As a matter of fact, the IS-types here already have taken the role of the propositions. A similar observation could be made e.g. on implication (IMPL, defined in line 4.7). In other places the booleans are used in an essential way, though. In particular the type [u : ksi] bool plays an important role: first, in Section 7 on quantification, as the type of predicates on ksi, and then again in Section 13, as the type of sets over ksi.
A final remark seems in order on the kind of logic that is at issue. De Bruijn has always emphasized that Automath is neutral with respect to the logical principles that one wants to accept or reject. This view is reflected in the present text by the manner in which he handles non-constructive principles. Two such principles (or rather, corresponding types) are defined as PARADISE I and II (lines 7.15 and 1.19). Metaphorically speaking, a type ksi has an inhabited PARADISE II if the double-negation law holds for ksi. However, there is no axiom (PN) stating that PARADISE II(ksi) is inhabited for each ksi. In particular, this is not assumed for the TRUE-types, so that we obtain intuitionistic and not
classical logic for the inhabitants of bool. (The names bool and TRUE may be felt to be a bit misleading here.) In Section 12, line 12.1, the type EXCLTHIRD is defined. An inhabitant of EXCLTHIRD would yield that all TRUE-types are in PARADISE II. Then in line 12.7 a non-constructive notion of truth, called VALID, is defined as truth on the assumption of EXCLTHIRD. The upshot is that intuitionistic and classical logic live happily together in the guises of TRUE and VALID.
R.C. de Vrijer]
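The TRUE/VALID split described above can be rendered compactly in a modern system. The following Lean sketch is our own paraphrase, not de Bruijn's text: EXCLTHIRD packages the double-negation law, and VALID b is truth under that extra hypothesis, so classical reasoning lives inside an intuitionistic system without any added axiom.

```lean
def EXCLTHIRD : Prop := ∀ p : Prop, ¬¬p → p
def VALID (b : Prop) : Prop := EXCLTHIRD → b

-- Constructive truth trivially yields non-constructive truth.
theorem valid_of_true {b : Prop} (h : b) : VALID b :=
  fun _ => h

-- Under the excluded-third hypothesis, double negations can be stripped,
-- mirroring the use of PARADISE II in Section 12.
theorem valid_of_dn {b : Prop} (h : ¬¬b) : VALID b :=
  fun excl => excl b h
```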
1. BOOLEANS 1.1 0 1.2 0 1.3 x
@ bool
1.4 1.5 1.6 1.7 1.8 1.9 1.10 1.11 1.12 1.13 1.14 1.15 1.16 1.17 1.18
:= PN
: sort
@ X
.-.- _ _ - - -
@ TRUE
:= PN
: sort
0 0 a b 0 ksi ksi a ksi a
@ CONTR
:= [v:bool]TRUE(v)
: sort
ksi ksi x u x
Q EMPTY
@ a @ b @ then 1 @ ksi @ nonempty @ a Q then 2 @ a @ then 3
.-...-
_____ _____
:= (b)a
..-
_____
:
TRUE(b)
: : : : :
sort
:= P N := [u:ksi]CONTR
: sort
:= PN
.-.- _ _ _ - _ := PN
..- _ _ _ _ _
@ U
Q then 4 @ then 5
:= (x)u := [t:EMPTY (ksi)]
__--_
___-_
then 4 (t) 1.19 ksi
: CONTR : bool
bool ksi TRUE(nonempty) TRUE(nonempty) : ksi
.-..-
@ X
: bool
: ksi : EMPTY(ksi) : CONTR :
EMPTY(EMPTY( ksi))
@ PARADISE I1 := [t:EMPTY(EMPTY(
ksi))]ksi
: sort
2. EQUALITY
0 ksi
Q (ksi Q (x
2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9
Q y x y Q IS Q equal y x Q reflexive y Q ass 1 ass 1 Q symm ass 1 Q z z Q ass 2 ass 2 @ transitive
2.10 2.11 2.12 2.13 2.14 2.15
ksi theta x1 P 1 x2 ass 3
: sort) :
ksi)
.- _ _ _ _ _ := P N
:= nonempty(1S) := PN
.-.-
_____
:= PN _____
...-
_____
:= PN
Q theta
: sort
Q x1
:
@ P 1 Q x2 Q ass 3 @ then 6
:
ksi [t:ksi]theta : ksi : IS(ksi,xl,x2) : IS(theta, (xl)P 1,
Q P2
: [x:ksi]theta
Q ass 4
:
(X2)Pl) 2.16 P 1 2.17 P 2
2.18 ass 4 Q then 7 2.19 P 2
0 ass 4a
2.20 ass 4a Q then 7a
:= PN
IS([x:ksi]theta, Pl,P2) : [x:ksi]IS(theta,
.-.-
: [x:ksi]IS(theta,
(X)Pl,(X)P2) _____
:= PN
:
( 4 P1,W 2 ) IS([x:ksi]theta, Pl,P2)
2a. Ifelse 2a.1 2a.2 2a.3 2a.4 2a.5 2a.6 2a.7 2a.8
0 ksi x a a ass 4b a ass 4c
@ @ @ @ @ @ @ @
(ksi (x a ifelse ass 4b then 7b ass 4c then 7c
: sort) : ksi) :
bool
: ksi : TRUE(a) : IS(ksi,ifelse,x) : IS(ksi,ifelse,x) : TRUE(a)
2b. Equality for two sorts 2b.l 0 2b.2 ksi 2b.3 eta 2b.4 a 2b.5 b 2b.6 b 2b.7 b 2b.8 ass4d 2b.9 eta 2 b . 1 zeta ~ 2b.lla 2b.12 b 2b.13 c 2b.14 ass 4e 2b.15 ass 4f
@ (ksi @ eta @ a @ b @ ISS @ equal1 @ ass 4d @ symmm @ zeta @ a @ b @ c @ ass 4e @ ass 4f @ transitivv
: sort) : sort : ksi :
eta
: sort
bool : ISS(ksi,eta,a,b) : ISS(eta,ksi,b,a) :
: sort :
ksi
: eta : zeta
ISS(ksi,eta,a,b) ISS(eta,zeta,b,c) : ISS(ksi,zeta,a,c)
:
:
C o m m e n t : T h e PN’s in 2.2, 2.6, 2.9 can n o w be replaced respectively b y ISS(ksi, ksi,x,y), symmm(ksi,ksi,x,y); transitivv(ksi,ksi,ksi,x,y,z). 2‘.
2‘.1 2‘.2 2‘.3 2c.4
Embedding 0 ksi eta p
@ @ @ @
(ksi (eta p EMBED
...-
_____
: sort)
_____ _____
: sort)
:=
[t:eta]ISS(ksi,eta,(t)p,t)
: sort
1-
:
[x:eta]ksi
692 2'.5 p 2".6 w
@ w @ image
.- _ _ _ _ -
:
EMBED
:= [x:ksi]nonempty (
EXISTS(eta,[b:eta] equal( (b)P9x11
: [x:ksi]bool
[Note: EXISTS is still to be defined i n line 7.2. So this Section 2' should, as a matter of fact, be placed after Section 7.1
3. PAIRSORT 0 ksi
@ (ksi @ (theta
3.1 3.2 3.3 3.4
theta theta x y
@ @ @ @
3.5 3.6 3.7 3.8
theta u u u
@ u 0 first @ second @ then 8
pairsort x y pair
.-.- _ _ _ _ _ .- _ _ _ _ :=
PN
: sort) : sort)
sort
:= P N
: : : :
ksi theta pairsort
.-.-
:
pairsort
.- _ _ _ _ _ .- _ _ _ _ _ ___--
:= P N := P N := P N
: IS(pairsort,u,
:= P N := P N
: IS(ksi,x,first(pair)) : IS(theta,y,
: ksi :
theta pair(first,second))
3.9 y 3.10 y
@ then 9 @ then 10
second( pair))
4. BOOLEQUAL, IMPLICATION 4.1 4.2 4.3
0 a b
@ a @ b @ c
.- _ _ _ _ _ .- _ _ _ _ _ .- _ _ _ _ _
4.4 4.5
c b
@ then 11 @ d
:= P N
.-
_____
bool bool : pairsort( [t:TRUE (a)]TRUE(b),[s: TRUE(b)]TRUE(a)) : IS(bool,a,b) : IS(bool,a,b)
:
:
@ then 12
4.6
d
4.7 4.8 4.9 4.10
b @ b @ ass 1 @ ass 2 @
IMPL ass 1 ass2 modpon
:= PN
693
:
pairsort([t:TRUE (a)]TRUE(b),[s: TRUE(b)]TRUE(a))
:= [u:TRUE(a)]TRUE(b) : s o r t .- _ _ _ _ _ : TRUE(a) .- - _ _ _ _ : IMPL := (ass 1)ass 2 : TRUE(b)
..-
5. SOME LOGICAL CONSTANTS
5.3
0
5.4
0
@ contradiction := nonempty(C0NTR) @ OBVIOUSLY := IMPL(contradiction, contradiction) @ trivial := nonempty( OBVIOUSLY) @ now 1 := [u:TRUE(
5.5
0
@ now 2
:= then 2
5.6
0
@ now 3
:= [u:CONTR]u
: TRUE(trivia1) : EMPTY(C0NTR)
..-
:
5.1 5.2
0 0
contradiction)]^ (OBVIOUSLY,now 1)
: bool : sort : bool :
OBVIOUSLY
6. NON, A N D 6.1 6.2 6.3
0 b b
@ b @ NON @ non
6.4 6.5
b c
6.6
_____
bool
:= EMPTY(TRUE(b)) := nonempty(N0N)
: sort : bool
@ C
..-
: bool
@ AND
:= pairsort (TRUE(b),
c
@ and
:= nonempty(AND)
:
6.7 6.8
c if
@ if @ then 12a
.-.-
: AND
6.9
if
@ then 12b
_____
TRUE(c))
_____
: sort
bool
:= first (TRUE(b),
TRUE(c) ,if)
: TRUE(b)
:= second(TRUE( b) ,
TRUE(c),if)
: TRUE(c)
6.10 b 6.11 if
@I
6.12 b 6.13 if
Q if @ then 12d
6.14 b 6.15 if
Q if
if @I then 12c
@I
then 12c
.-
_____
:= then 3(NON(b),if)
.-.-
_____ := then 2 (NON(b),if)
.- - _ _ _ _
: TRUE(non(b)) : NON(b) : NON(b) : TRUE(non(b))
:= then 5 (TRUE(b) ,if)
: TRUE(b) : EMPTY(NON(b))
.- _ _ _ _ _ .-
: sort)
7. EXISTS, ALL
(ksi
0
@I
7.1 7.2 7.3 7.4 7.5
ksi P P v ass 1
P @I EXISTS @I v @I ass 1 @I then 13
7.6 7.7 7.8
P ass 2 ass 2
@I
7.9 7.10 7.11 7.12
P P
@I
@I
@I @I
ass 2 then 13a then 13b
ALL @ (v v @ ass 3 ass 3 @I specialize
.-.-
_____ := PN
.- _ _ _ _ _ .- _ _ _ _ _ := P N
..-
_____
:= PN := PN
:= [u:ksi]TRUE((u)P) .- _ _ _ _ _
.- _ _ _ _ _ :=
(v)ass 3
: : : :
[u:ksi]bool sort
ksi TRUE((v)P) : EXISTS : EXISTS : ksi :
TRUE( (then 13a)P)
: sort : ksi) : ALL :
TRUE((v)P)
7.13 P 7.14 P
Q NONEXIST := [u:ksi]NON((u)P) : sort @I WEAKEXIST := EMPTY(N0NEXIST) : s o r t
7.15 ksi
@ PARADISE I := [Q:[u:ksi]bool] [t:WEAKEXIST( Q)] EXISTS(Q) @I a .-.- _ _ _ _ _ .- _ _ _ _ _ @ b @ then 14
7.16 P 7.17 a 7.18 b
: : : :
sort
PARADISE1 WEAKEXIST (P) EXISTSfP’I
8. CONSTANT FUNCTIONS 0 ksi
8.1 8.2
8.3 8.4 8.5 8.6
@ ksi @ theta
theta @ pi pi @ CONSTANT
g a b c
@ 0 @ @
a b c then 15
.- _ _ _ _ ...- _ _ _ - -
: sort : sort
.- _ _ _ - .-
: [t:ksi]theta
:=
[s:ksi][t:ksi]IS(theta, (t)Pi,(S)Pi)
..- _ _ _ _ -
..-.-
_____
____-
:= (a)(b)c
: sort : : : :
ksi ksi CONSTANT IS(theta,(a)pi, (b)Pi)
9. CONDITIONAL BRACING
9.1 9.2
0
@ (ksi
ksi
@ P @ h
P
9.3 9.4 9.5 9.6
h x x h
@ X
9.7 9.8 9.9
x a a
@ a @ then 17 0 then 18
9.10 a
@ then 19
@ sigma @ then 16 @ Q
.- _ _ _ - -
: [t:ksi]bool : [t:ksi][s:TRUE(
..-
(t)P)]boo1 _____
:= TRUE( (x)P) := EXISTS(sigma, (x)h) := [t:ksi]nonemtpy(
then 16(t))
.- _ _ _ - -
1-
:= then 3(then 16,a) := then 13a(sigma, (x)h ,
then 17)
@ a @ b @ then 20
: ksi : sort : sort : [t:ksi]bool : TRUE((x)Q) : then 16 : TRUE((x)P)
:= then 13b(sigma,(x)h,
then 17) 9.11 x 9.12 a 9.13 b
: sort)
..- - _ _ _ _ .-.- _ _ _ - _ := then 13(TRUE((x)P),
: TRUE(
(then 18)(x)h) : TRUE((x)P) : TRUE((a)(x)h)
9.14 b 9.15 b
Q then 21 @ then 22
:= then 2(then 16, then 20) : TRUE((x)Q)
:= then 2(then 16, then 13
(sigma,(x)h,a,b))
:
TRUE((x)Q)
_____
: sort) : [t:ksi]bool)
10. DIRECT BRACING
0 ksi 10.1 P 10.2 Q
Q (ksi Q (P Q Q @ R
.-
.- _ _ _ _ _
.- - _ _ _ _ := [t:ksi] and ((t)P,(t)Q)
: [t:ksi]bool : [t:ksi]bool
11. NAMECHANGING @ NAME @ dash
:= PN := PN
Q ksi Q classin
:= pairsort ([t:ksi]bool,
11.5 ksi 11.6 c
@ C
.-
@ predicof
:= first([t:ksi]bool,
11.7 ksi 11.8 d
@ d @ classof
11.1 11.2 11.3 11.4
0 0 0 ksi
.- - - _ _ _
..-
NAME) _____
NAME,c) _____
11.9 0 11.10 0
@ NAME 2 @ dot
11.11 0 11.12 ksi
.- _ _ _ _ _ @ (ksi @ PREDICATE := pairsort([t:ksi]bool, NAME 2) .-.- _ _ _ _ _ 8 C
:= PN := PN
@ predicup
:= first([t:ksi]bool,
@ d @ predicdown
:= _ _ _ _ _
NAME 2,c) 11.15 ksi 11.16 d
: sort : classin : t:ksi]bool : [t:ksi]bool
:= pair([t:ksi]bool,
NAME,d,dash)
11.13 ksi 11.14 c
: sort
: NAME : sort
: classin(ksi) : sort :
NAME 2
: sort)
: sort : PREDICATE : [t:ksi]bool : [t:ksi]bool
:= pair([t:ksi]bool,
NAME S,d,dot)
: PREDICATE
11.17 0 11.18 ksi 11.19 theta 11.20 P 11.21 a
@ (ksi @ (theta
@ P @ a @ then 25
.- _ _ _ _ _ .- _ - - _ _ .- _ _ _ _ _ .- _ _ _ _ _ 1-
697
: : : :
sort) sort)
:= [s:ksi]((s)P)a
[t:ksi]theta EMPTY(theta) : EMPTY(ksi) : EMPTY(ksi)
@ b
.- _ - - _ _
@ then 26
:= then 25(TRUE(
11.24 ksi
@C
11.25 c
@ then 27
nonempty(ksi)) ,ksi,[s: TRUE(nonempty(ksi))] then 3(s),b) : EMPTY(TRUE( nonempty (ksi))) .- _ _ _ _ _ : EMPTY(TRUE( nonempty (ksi))) := then 25(ksi,TRUE( nonempty(ksi)) ,[s:ksi] then 2(s),c) : EMPTY(ksi)
11.26 0 11.27 b 11.28 x
@ X
..-.-
@ then 28
:= then 3(EMPTY(
11.22 ksi 11.23 b
@ b
_____
_____
: boo1 : TRUE(non( non( b)))
TRUE(non(b)),x)) 11.29 x
@ then 29
:= then 27(NON(b),
11.30 x
@ then 30
:= then 29
: EMPTY(NON(b)) : EMPTY(EMPTY(
11.31 b
Q Y
.-
: EMPTY (EMPTY(
11.32 y 11.33 y
@ then 31 @ then 32
:= y := then 26(NON(b),
11.34 y 11.35 b 11.36 z
@ then 33 @ Z
:= then 2(NON(non(b)) ..- _ _ _ _ _
@ then 34
:= then 5(TRUE(b) ,z)
11.37 z
@ then 35
:= then 33(TRUE(b),
then 28)
_____
TRUE@))) TRUE(b))1
then 31)
then 34)
: EMPTY(NON(b)) : : : :
NON(non(b)) TRUE(non(non( b))) TRUE(b) EMPTY(EMPTY( TRUE@)1)
: TRUE(non(non( b)))
12. EXCLUDED THIRD 12.1 0
@ EXCLTHIRD := [t:bool]PARADISE 11(
12.2 12.3 12.4 12.5 12.6 12.7
@ excl @I a @ if @ then 36 63 (b @ VALID
.- _ - _ _ _ .-.- _ - - _ _ .- _--__
: EXCLTHIRD : bool : EMPTY(NON(a))
:= (a)(if)excl
:
if @ then 37
.- ---__
:
:= [s:EXCLTHIRD]if
: VALID(b)
TRUE(t)) 0 excl b if 0 b
12.8 b 12.9 if
@I
.- _ - - _ _ :=
[s:EXCLTHIRD] TRUE(b)
: sort
TRUE(a)
: bool) : sort
TRUE(b)
Comment: VALID is the notion of truth in non-intuitionistic logic.
12.10 b 12.11 p 12.13 q 12.14 q 12.15 p
.-.-
@ P
@ q @ then 38 @I then 39 @I then 40
_-___
.- ---__ := (q)P := then 35(then 38)
@ P @ q @ then @I then @ then @ then
40a 41 42 42a
: :
TRUE(b) TRUE(non(non(b)))
:= [xEXCLTHIRD]
then 39(s) 12.16 b 12.17 p 12.18 q 12.19 q 12.20 q 12.21 p
: VALID(b) : EXCLTHIRD
: VALID(non(non(b)))
.- ---__ .-.- _ _ _ _ _
:
(q)P then 30(then 40a) := then 36(q,b,then 41) := [s:EXCLTHIRD] then 42(s)
:
:= :=
VALID(non(non(b)))
: EXCLTHIRD
TRUE(non(non(b)) )
: EMPTY(NON(b)) : TRUE(b) : VALID(b)
13. SETS
13.1 13.2 13.3 13.4 13.5 13.6 13.7
0 ksi ksi x s s ksi
(ksi @ set @I x
@I
s EST1 @ esti @ s @I @I
.- --_-_ .:= [x:ksi]bool
..-
_-___
--___
:= TRUE((x)s) := (x)s
.-.-
_-___
: sort)
: sort : ksi : set : sort : bool :
set
13.8 s 13.9 t
@ t @ INCL
..-
13.10 t 13.11 ksi 13.12 s 13.13 ksi 13.14 ksi 13.15 x 13.16 ksi 13.17 ksi 13.18 x 13.19 ass
@ incl
:= nonempty(1NCL) := _ _ _ _ _
@ then 43 @ emptyset QX @ assume @ then 44
:= now 2 := [x:ksi]contradiction := _ _ _ _ _
13.20 ass
Q then 45
:= then 3(CONTR,
13.21 x
@ then 46
:= [t:ESTI(x,emptyset)]
_____
(s @ powerset Q universe @ X
:= [x:set]incl(x,s) := [x:ksi]trivial
..-
_____
14. TRANSITIVITY
14.1 ksi 14.2 14.3 14.4 14.5 14.6 14.7 14.8
s t r
Q (s
@ t @ r
@ ass 1 ass 1 @ ass 2
ass 2 @ x x @ ass 3 ass 3 Q then48
14.10 ass 3 @ then 50
: : :
:
:= then 46
ESTI(x,universe) set ksi ESTI(x,emptyset) TRUE( contradiction)
: CONTR
EMPTY(ESTI( x,emptyset)) : NON(esti( x,emptyset)) :
OF SET-INCLUSION :=
...-.....-
__------_____ _____ ____-
____-
:= _ _ - - -
: set) : set : set : INCL(s,t) : INCL(t,r) : ksi :
TRUE((x)s)
:= then 3(IMPL( (x)s,(x)t) ,
(x)ass 1) 14.9 ass 3 @ then 49 @ refl 14.9” s
:
:= assume
then 45 @ then 47
: sort : boo1 : set) : set(set(ksi)) : set : ksi
..- -_-_-
then 44)
13.22 x
: set
:= ALL( ksi,[u:ksilnonempty
(IMPL((uh(u)t))) @
699
:= (ass 3)then 48 := [u:ksi]
: IMPL((x)s,(x)t) : TRUE((x)t)
then 2(IMPL((u)s, ( 4 s ) J Y :TRUE((u)s)ly) : INCL(s,s) := then 49(t,r,r,ass 2, refl(r),x,then 49) : TRUE((x)r)
N.G. de Bruijn
700 14.11 x
@ then 51
:= [p:TRUE((x)s)]
14.12 x
@ then 52
:= then 2(IMPL( (x)s, (x) r) ,
then 50(p)
: IMPL((x)s,(x)r)
then 51) 14.13 ass 2 @ then 53
:= [t:ksi]then 52(t)
15. INCLUSION INDUCED IN POWERSET 15.1 15.2 15.3 15.4 15.5
ksi s t ass 4 u
@ @ @ @
set) set) INCL(s,t) set : TRUE((u) powerset (s)) : TRUE(incl(u,s)) : : : :
(s
(t ass 4 u @ a
15.6 a 15.7 a
@ then 54 @ then 55
15.8 a
@ then 56
15.9 a
Q then 57
15.10 a
62 then 58
..- a := then then := then then := then then := then
15.11 u
@ then 59
:= [a:TRUE((u)powerset(
3(INCL(u,s), 54) 53(u,s,t, 55,ass 4) 2(INCL(u,t), 56) 57
: INCL(u,s) : INCL(u,t) : TRUE(incl(u,t)) : TRUE( (u)
powerset( t)) s))]then 58
: IMPL(
(u)powerset (s), (u) powerset (t)) 15.12 u
@ then 60
:= IMPL((u)powerset(s),
15.13 u
@ then 61
:= then 2(then 60,
(u)powerset (t)) then 59)
: sort : TRUE(nonempty(
then 60)) 15.14 ass 4 @ then 62
:= [t:set]then 61(t)
: INCL(set,
powerset (s), power set (t ) )
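Sections 13-15 develop sets over ksi as boolean-valued predicates, with incl and powerset, and then prove transitivity of inclusion (then 53) and monotonicity of powerset (then 62). In a modern prover the same development is a few lines; the following Lean sketch is our own rendering, with Prop in place of de Bruijn's bool/TRUE machinery:

```lean
def Set' (ksi : Type) := ksi → Prop
def INCL {ksi : Type} (s t : Set' ksi) : Prop := ∀ x, s x → t x

-- then 53: transitivity of set inclusion.
theorem incl_trans {ksi : Type} {s t r : Set' ksi}
    (h1 : INCL s t) (h2 : INCL t r) : INCL s r :=
  fun x hx => h2 x (h1 x hx)

def powerset {ksi : Type} (s : Set' ksi) : Set' (Set' ksi) :=
  fun u => INCL u s

-- then 62: inclusion induced in the powerset.
theorem powerset_mono {ksi : Type} {s t : Set' ksi} (h : INCL s t) :
    INCL (powerset s) (powerset t) :=
  fun u hu => incl_trans hu h
```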
Checking Landau's “Grundlagen” in the Automath System
Parts of Chapters 0, 1 and 2 (Introduction, Preparation, Translation)
L.S. van Benthem Jutting
CHAPTER 0. INTRODUCTION

0.2. The book translated

At an early stage of the Automath project the need was felt to translate an existing mathematical text into an Automath language: first, in order to acquire experience in the use of such a language, and secondly, to investigate to what extent mathematics could be represented in Automath in a natural way. As a text to be translated, the book “Grundlagen der Analysis” by E. Landau [Landau 30] was chosen. This book seemed a good choice for a number of reasons: it does not presuppose any mathematical theory, and it is written clearly, with much detail and with a rather constant degree of precision. For a short description of the contents of Landau's book see 2.0.
0.3. The language of the translation

The language into which Landau's book has been translated is AUT-QE. A detailed description and a formal definition of this language is given in [van Daalen 73 (A.3)]. I will use the notations introduced there whenever necessary. Where in the following text a concept introduced in [van Daalen 73 (A.3)] is used for the first time, it will be displayed in italics, with a reference to the section in [van Daalen 73 (A.3)] where it occurs.
The language of the translation differs from the definition in [van Daalen 73 (A.3)] in one respect, viz. the division of the text into paragraphs [van Daalen 73 (A.3), 2.16]. By this device the strict rule that all constants [van Daalen 73 (A.3), 2.6, 5.4.1] in an AUT-QE book [van Daalen 73 (A.3), 2.13.1, 5.4.4] should be different is weakened to the more liberal rule that all constants in one paragraph have to differ. Now, in a line [van Daalen 73 (A.3), 2.13, 5.4.4], reference to constants defined in the paragraph containing that line is as usual, while reference to constants defined in other paragraphs is possible by a suitable
reference system. For a more detailed description of the system of paragraphing, see Appendix 2 [not in this Volume]. In contravention of the rules for the shape and use of names in AUT-QE, we will in examples in the following text not restrict ourselves to alpha-numeric symbols, and occasionally we use infix symbols. (Of course, in the actual translation of Landau's book, these deviations from proper AUT-QE do not occur.)
CHAPTER 1. PREPARATION

In this chapter the logic which Landau presupposes is analysed and its representation in AUT-QE is described.

1.0. The presupposed logic
In his “Vorwort für den Lernenden” Landau states: “Ich setze logisches Denken und die deutsche Sprache als bekannt voraus”. Clearly, in the translation AUT-QE should be substituted for “die deutsche Sprache”. The proper interpretation of “logisches Denken” must be inferred from Landau's use of logic in his text. This appears to be a kind of informal second (or higher) order predicate logic with equality. In the following some characteristics of Landau's logic will be discussed, and illustrated by quotations from his text.
(i) Variables have well defined ranges which are not too different from types [van Daalen 73 (A.3), 2.2] in AUT-QE. Cf.:
- On the first page of “Kapitel 1”: “Kleine lateinische Buchstaben bedeuten in diesem Buch, wenn nichts anderes gesagt wird, durchweg natürliche Zahlen”.
- In “Kapitel 2, §5”: “Grosze lateinische Buchstaben bedeuten durchweg, wenn nichts anderes gesagt wird, rationale Zahlen”.
(ii)
Predicates have restricted domains, which again can be interpreted as types in AUT-QE. Cf.:
- “Satz 9: Sind x und y gegeben, so liegt genau einer der Fälle vor: (1) x = y. (2) Es gibt ein u mit x = y + u ...” etc. It is clear that u (being a lower case letter) is a natural number, or u ∈ nat.
- “Definition 28: Eine Menge von rationalen Zahlen heiszt ein Schnitt, wenn ...”.
Here it is apparent that being a “Schnitt” is a predicate on the type of sets of rational numbers.

(iii) When, for a predicate P, it has been shown that a unique x exists for which P holds, then “the x such that P” is an object. Cf.:
- “Satz 4, zugleich Definition 1: Auf genau eine Art läszt sich jedem Zahlenpaar x, y eine natürliche Zahl, x + y genannt, so zuordnen, dasz ... x + y heiszt die Summe von x und y”.
- “Satz 101: Ist X > Y, so hat Y + U = X genau eine Lösung U. Definition 23: Dies U heiszt X − Y”.
(iv) The theory of equivalence classes modulo a given equivalence relation, whereby such classes are considered as new objects, is presupposed by Landau. Cf.:
- The text preceding “Satz 40”: “Auf Grund der Sätze 37 bis 39 zerfallen alle Brüche in Klassen, so dasz x1/x2 ∼ y1/y2 dann und nur dann, wenn x1/x2 und y1/y2 derselben Klasse angehören”.
- “Definition 16: Unter einer rationalen Zahl versteht man die Menge aller einem festen Bruch äquivalenten Brüche (also eine Klasse im Sinne des §1)”.
(v) The concepts “function” and “bijective function” are vaguely described. Cf.:
- “Satz 4” (see (iii) above).
- “Satz 274: Ist x < y, so können die m ≤ x nicht auf die n ≤ y eineindeutig bezogen werden”.
- “Satz 275: Es sei x fest, f(n) für n ≤ x definiert. Dann gibt es genau ein für n ≤ x definiertes g_x(n) mit folgenden Eigenschaften ...”, followed by the “explanation”: “Unter definiert verstehe ich: als komplexe Zahl definiert”. This explanation might be interpreted to indicate the typing of the functions f and g.

(vi)
Landau defines and uses partial functions. Cf.:
- “Definition 14: Das beim Beweise des Satzes 67 konstruierte spezielle u1/u2 heiszt x1/x2 − y1/y2 ...”. Here the construction, and therefore the definition, only applies if x1/x2 > y1/y2.
- “Definition 56: Das Y des Satzes 204 heiszt Ξ/H”. This definition depends upon H ≠ 0.
- “Definition 71”, where Landau states explicitly: “Nicht definiert ist x^n also lediglich für x = 0, n ≤ 0”.
- “Satz 155: Beweis: II) Aus X > Y folgt X = (X − Y) + Y”.
- “Satz 240: Ist y ≠ 0, so ist (x/y) · y = x”.
- “Satz 291: Es sei n > 0 oder x1 ≠ 0, x2 ≠ 0. Dann ist (x1 · x2)^n = x1^n · x2^n”.

In these last three examples we see “generalized implications”: the terms occurring in the consequent are meaningful only if the antecedent is taken to be true. A similar situation will be encountered in (vii).

(vii) Definitions by cases, sometimes of a complicated nature, are used. Cf.:
- “Definition 52:

      Ξ + H = −(|Ξ| + |H|)   wenn Ξ < 0, H < 0;
              |Ξ| − |H|      wenn Ξ > 0, H < 0, |Ξ| > |H|;
              0              wenn Ξ > 0, H < 0, |Ξ| = |H|;
              −(|H| − |Ξ|)   wenn Ξ > 0, H < 0, |Ξ| < |H|;
              H + Ξ          wenn Ξ < 0, H > 0;
              H              wenn Ξ = 0;
              Ξ              wenn H = 0.”

- “Definition 71:

      x^n = ∏_{i=1}^{n} x    wenn n > 0;
            1                wenn x ≠ 0, n = 0;
            1 / x^{|n|}      wenn x ≠ 0, n < 0.”
Notice that in these two definitions, in some of the cases the definiens is not defined when the corresponding condition does not hold (“generalized definition by cases”), and also that, in some cases, there is in the definiens a reference to the definiendum.

(viii) In his text Landau only occasionally mentions predicates and relations; usually he refers to sets. Cf.:
- “Axiom 5: Es sei M eine Menge natürlicher Zahlen mit den Eigenschaften: (1) 1 gehört zu M. (2) Wenn x zu M gehört, so gehört x′ zu M. Dann umfaszt M alle natürlichen Zahlen”.
- “Satz 2: x′ ≠ x. Beweis: M sei die Menge der x, für die dies gilt ...”.
However, in the text preceding “Definition 26”:
- “Da =, >, <, Summe und Produkt den alten Begriffen entsprechen ...”.
(ix) Landau considers (ordered) pairs of objects. In Chapter 2 the components of such pairs remain clearly visible in their names: he does not refer to “the pair x with components x1 and x2”, but only to “the pair x1, x2”. Nevertheless it is clear from his words that he considers such a pair as one object. Cf.:
- “Definition 7: Unter einem Bruch x1/x2 versteht man das Paar der natürlichen Zahlen x1, x2 (in dieser Reihenfolge)”.
- “Definition 8: x1/x2 ∼ y1/y2, wenn x1 y2 = y1 x2”.
In Chapter 5, however, variables for pairs are used. Cf.:
- “Definition 57: Eine komplexe Zahl ist ein Paar reeller Zahlen Ξ1, Ξ2 (in bestimmter Reihenfolge). Wir bezeichnen die komplexe Zahl mit [Ξ1, Ξ2]”.
This definition is immediately followed by
- “Kleine deutsche Buchstaben bedeuten durchweg komplexe Zahlen”.
The two notations are linked in the following way:
- “Definition 60: Ist x = [Ξ1, Ξ2], y = [H1, H2], so ist x + y = [Ξ1 + H1, Ξ2 + H2]”.

(x)
Finally it should be pointed out that some of Landau’s proofs and remarks tend to a kind of intuitive reasoning which is not easily represented in a formal system. A first example of this is the treatment of equality in “Kapitel 1, §1”.
- “Ist x gegeben und y gegeben, so sind entweder x und y dieselbe Zahl; das kann man auch x = y schreiben; oder x und y nicht dieselbe Zahl; das kann man auch x ≠ y schreiben. Hiernach gilt aus rein logischen Gründen: (1) x = x für jedes x. (2) Aus x = y folgt y = x. (3) Aus x = y, y = z folgt x = z”.
Here it seems that Landau derives the properties of equality from reflection on the properties of a mathematical structure. They are not theorems or axioms but intuitively true statements. Substitutivity of equal objects, though used frequently in the proofs of subsequent theorems, is never mentioned. Other examples of proofs with intuitive components may be found where Landau, at a glance, takes in a complex logical situation. Cf.:
- “Satz 16: Aus x ≤ y, y < z oder x < y, y ≤ z folgt x < z. Beweis: Mit dem Gleichheitszeichen in der Voraussetzung klar; sonst durch Satz 15 erledigt”.
- “Satz 20: Aus x + z > y + z bzw. x + z = y + z bzw. x + z < y + z folgt x > y bzw. x = y bzw. x < y. Beweis: Folgt aus Satz 19, da die drei Fälle beide Male sich ausschlieszen und alle Möglichkeiten erschöpfen”.
A somewhat different example, which involves what might be called “metalogic”, is the text preceding “Definition 26”, where it is indicated how a number of theorems might be proved, without actually proving them. I will return to this in 2.1 (viii).
1.2. The representation of logic in AUT-QE

The logic considered by Landau to be “logisches Denken”, as described in the previous section, has been formalized in the first part of the AUT-QE book, called “preliminaries”, which, unlike the other parts, does not correspond to an actual chapter of Landau’s book. A possible way of coding logic in AUT-QE has been described in [van Daalen 73 (A.3), 3, 4]. In addition to this description we stress a few points on the interpretation of AUT-QE lines [van Daalen 73 (A.3), 2.13, 5.4.4]. Adopting the terminology introduced in [Zucker 77 (A.4)] we shall call expressions of the form [x1 : α1] ... [xk : αk] type (with k ≥ 0) (i.e. t-expressions of degree 1) 1t-expressions, and expressions of the form [x1 : α1] ... [xk : αk] prop (again with k ≥ 0) 1p-expressions. Expressions having 1t- and 1p-expressions as their types will be called 2t-expressions and 2p-expressions, respectively. Finally, 3t- and 3p-expressions have 2t- and 2p-expressions as their types. Now a 2t-expression will be used to denote a type (or “class”). If its type is an abstraction expression [van Daalen 73 (A.3), 2.8, 5.4.2] then it denotes
a type of functions. A 2p-expression denotes a proposition or a predicate. A 3t-expression denotes an object (of a certain type) and a 3p-expression a proof (of a certain proposition). The interpretation of an AUT-QE line having a certain shape (EB-line, PN-line or abbreviation line [van Daalen 73 (A.3), 2.13, 5.4.4]) will depend on its category part [van Daalen 73 (A.3), 2.13.1] being a 1t-, 1p-, 2t- or 2p-expression. So we arrive at the following refinement of the scheme in [van Daalen 73 (A.3), 4.5].
Shape of the line, by category part:

EB-line:
  1t-expression: introduces a type variable
  1p-expression: introduces a proposition or predicate variable
  2t-expression: introduces an object variable (of the stated type)
  2p-expression: introduces the stated proposition as an assumption

PN-line:
  1t-expression: introduces a primitive type constant
  1p-expression: introduces a primitive proposition or predicate constant
  2t-expression: introduces a primitive object (of the stated type)
  2p-expression: introduces the stated proposition as an axiom

Abbreviation line:
  1t-expression: defines a type in terms of known concepts
  1p-expression: defines a proposition or predicate in terms of known concepts
  2t-expression: defines an object (of the stated type) in terms of known concepts
  2p-expression: proves the stated proposition as a theorem
In the above scheme it is apparent that, if the category part of a line is a 1p-expression, then the interpretation of that line is an assertion. But also if the category part is a 2t-expression α, the interpretation has an assertional aspect; the line does not only introduce a new name for an object (as a variable, or a primitive or defined constant) but also asserts that this object has the type α.
1.3. Account of the PN-lines

Here I will give a survey of the primitive concepts and axioms (PN-lines) occurring in the preliminary AUT-QE text. A mechanically produced list of
these axioms appears as Appendix 3 [see [D.5], in this Volume]. In this list the PN-lines appear numbered. References in parentheses below will refer to these numbers.

(i) Axioms for contradiction. Contradiction is postulated as a primitive proposition (1), the double negation law as an axiom (2).
(ii) Axioms for equality. Given a type S, equality is introduced as a primitive relation on S (3), with axioms for reflexivity (4) and for substitutivity (5) (i.e. if x = y, and if P is a predicate on S which holds at x, then P holds at y). Moreover, there is an axiom stating extensionality for functions (8). The notion of equality so introduced is called book-equality (cf. [van Daalen 73 (A.3), 3.6]) in contrast to definitional equality of expressions ([van Daalen 73 (A.3), 2.12, 5.5.6]).

(iii) Axioms for individuals. Given a type S, a predicate P on S, and a proof that P holds at a unique x ∈ S, the object ind (for individual) is a primitive object (6), to be interpreted as “the x for which P holds”. An axiom states that ind satisfies P
(7).

(iv) Axioms for subtypes. Given a type S and a predicate P on S, the type OT (for own-type, i.e. the subtype of S associated with P) is a primitive type (9). For u ∈ OT we have a primitive object in(u) ∈ S (10), and an axiom stating that the function [u : OT] in(u) is injective (12). Moreover, there are axioms to the effect that the images under this function are just those x ∈ S for which P holds ((11) and (13)).

(v) Axioms for products (of types). Given types S and T, the type pairtype (the type of pairs (x, y) with x ∈ S and y ∈ T) is introduced as a primitive type (14). For p ∈ pairtype we have the projections first(p) ∈ S and second(p) ∈ T as primitive objects ((16) and (17)), and conversely, for x ∈ S and y ∈ T we have pair(x, y) as a primitive object in pairtype (15). Next there are three axioms stating that pair(first(p), second(p)) = p, first(pair(x, y)) = x and second(pair(x, y)) = y (where = refers to book-equality as introduced in (ii)) ((18), (19) and (20)).

(Note: If a type U containing just two objects is available, and if S is a type, the type of pairs (x, y) with x ∈ S and y ∈ S may be defined alternatively as the function type [x : U] S. In the translation this was done at
the end of Chapter 1, where we took for U the subtype of the naturals ≤ 2. Therefore the pairing axioms as described above were not used in the actual translation.)

(vi) Axioms for sets. Given a type S, the type set (the type of sets of objects in S) is introduced as a primitive type (21), and the element relation as a primitive relation (22). Given a predicate P on S, there is a primitive object setof(P) ∈ set (denoting the set of x ∈ S satisfying P) (23), and there are two axioms to the effect that P holds at x iff x is an element of setof(P) ((24) and (25)). These can be viewed as comprehension axioms for S. (As sets contain only objects of one type, such axioms will not give rise to Russell-type paradoxes.) Finally extensionality for sets is stated as an axiom (26). The axioms for sets permit “higher-order” reasoning in AUT-QE, since quantification over the type set is possible.

1.4. Development of concepts and theorems in Landau’s logic
Here we give a sketch of the development of the logic in [Landau 30] from the axioms described in the previous section. Starting from the axioms for contradiction, the development of classical first order predicate calculus is straightforward. In this development more than usual attention has been paid to mutual exclusion: ¬(A ∧ B), and trichotomy: (A ∨ B ∨ C) ∧ (¬(A ∧ B) ∧ ¬(B ∧ C) ∧ ¬(C ∧ A)), because these concepts are used frequently by Landau in discussing linear order. The properties of equality, e.g. symmetry, transitivity, and substitutivity for functions (i.e. if x = y and f is a function on S then f(x) = f(y)), follow from the axioms for equality. The development of the theory of equivalence classes (cf. 1.0 (iv)) requires the axioms for subtypes and for sets. It turns out here, when translating mathematics into AUT-QE, that Landau goes quite far in considering concepts and statements about those concepts to belong to “logisches Denken”. We had to choose how to describe partial functions in AUT-QE. As an example let us consider the function f on the type rl of the reals, defined for all x ∈ rl for which x ≠ 0, and mapping x to 1/x. There are (at least) four reasonable ways to represent f:

(i) The range of f may be taken to be rl*, the “extended type” of reals, containing, apart from the reals, an object und representing “undefined”. In this case (0)f will be (book-equal to) und, and rl may be defined as OT(rl*, [x : rl*] (x ≠ und)).
(ii) An arbitrary fixed object in rl, 0 say, may replace und. Then (0)f will be taken to be 0.

(iii) f may be considered as a function on OT(rl, [x : rl] x ≠ 0), the subtype of the nonzero reals.

(iv) f may be represented as a function of two variables: an object x ∈ rl and a proof p ∈ x ≠ 0, so

    f ∈ [x : rl] [p : x ≠ 0] rl .

(Then, given an x such that x ≠ 0, i.e. given an x and a proof p that x ≠ 0, we can use (p)(x)f to represent 1/x.)

It is clear that the representations (i) and (ii) have much in common. The representations (iii) and (iv) are also related: in fact, we may construct, by the axioms for subtypes, for given x ∈ rl and p ∈ x ≠ 0 an object out(x, p) ∈ OT(rl, [x : rl] x ≠ 0). Then, if

    f1 ∈ [x : OT(rl, [x : rl] x ≠ 0)] rl ,

then

    [x : rl] [p : x ≠ 0] (out(x, p)) f1 ∈ [x : rl] [p : x ≠ 0] rl .

On the other hand, if

    f2 ∈ [x : rl] [p : x ≠ 0] rl ,

then

    [x : OT(rl, [x : rl] x ≠ 0)] (OTAx(x)) (in(x)) f2 ∈ [x : OT(rl, [x : rl] x ≠ 0)] rl

(for brevity some obvious subexpressions in the formula above have been omitted). After a careful examination of Landau’s language, I have decided that the fourth representation is closest to his intention, and have therefore adopted it. However, this leads to the following difficulty: Let, in our example, x ∈ rl and y ∈ rl be given, such that x = y, and suppose we have proofs p ∈ (x ≠ 0) and q ∈ (y ≠ 0). Now it is not a priori clear in AUT-QE (though it is clear to Landau) that the corresponding values (p)(x)f and (q)(y)f will be equal. That is: it is not guaranteed in the language that the function values for equal arguments will be independent of the proofs p and q. This property of partial functions, which is called irrelevance of proofs, can be proved for all functions which Landau introduces. When discussing arbitrary
partial functions, however, irrelevance of proofs had to be assumed in some places (cf. gite below). For a further discussion we refer to 4.0.1. As a consequence of the chosen representation of partial functions, terms may depend on proofs, and therefore certain propositions are meaningful only if others are true. This gives rise to generalized implications (cf. 1.0 (vi)) and generalized conjunctions, such as:

    “x > 0 ⇒ 1/x > 0”  and  “x > 0 ∧ √x ≠ 2” .

Logical connectives of this kind have been formalized in a separate paragraph of the preliminary AUT-QE text. The definition-by-cases operator ite (short for if-then-else, cf. 1.0 (vii)) can be defined on the basis of the axioms for individuals. As we have seen (1.0 (vii)), Landau admits partial functions in such definitions. For these cases a “generalized” version of the definition-by-cases operator gite (for generalized if-then-else) is required, which has been defined only for partial functions satisfying the irrelevance of proofs condition. All set theoretical concepts used by Landau (cf. 1.0 (viii)) may be defined starting from our axioms for sets. The passages in Landau’s text which use more or less intuitive reasoning (cf. 1.0 (x)) could not very well be translated. In the relevant places straightforward logical proofs were given, which follow Landau’s line of thought as closely as possible.
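For the modern reader, the four representations of a partial function such as 1/x, and the passage between (iii) and (iv), can be sketched in Lean 4, a descendant of the Automath tradition. This is an illustrative analogy only, not AUT-QE; the names rl, zero, toProofArg and toSubtypeArg are ours, and Lean's Subtype plays the rôle of OT:

```lean
-- Abstract stand-ins for the type rl of reals and a distinguished zero
-- (hypothetical names; no arithmetic is needed to make the typing point).
variable (rl : Type) (zero : rl)

-- (i)  extended range: `Option rl` plays the rôle of rl*, with `none` as und
def extendedRange := rl → Option rl

-- (ii) a fixed default object of rl replaces und
def defaultValue := rl → rl

-- (iii) restrict the domain to the subtype of nonzero elements, as OT does
def subtypeDomain := {x : rl // x ≠ zero} → rl

-- (iv) take a proof of x ≠ zero as a second argument, as in the translation
def proofArgument := (x : rl) → x ≠ zero → rl

-- (iv) from (iii): out(x, p) corresponds to subtype introduction ⟨x, p⟩
def toProofArg (f : {x : rl // x ≠ zero} → rl) : (x : rl) → x ≠ zero → rl :=
  fun x p => f ⟨x, p⟩

-- (iii) from (iv): in(x) is the projection x.val, OTAx the proof x.property
def toSubtypeArg (f : (x : rl) → x ≠ zero → rl) : {x : rl // x ≠ zero} → rl :=
  fun x => f x.val x.property
```

Note that in Lean irrelevance of proofs holds definitionally for propositions, whereas in AUT-QE, as described above, it had to be proved for Landau's functions and assumed for arbitrary ones.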
CHAPTER 2. TRANSLATION

In this chapter we discuss the actual translation of Landau’s book, the difficulties encountered and the way they were overcome (or evaded). First, in Section 2.0, we give an abstract of Landau’s book; then, in Section 2.1, a general survey is given of the various reasons to deviate occasionally from Landau’s text. In the following sections we describe the translation of the Chapters 1 to 5 of Landau’s book.

2.0. An abstract of Landau’s book
(i) “Kapitel 1. Natürliche Zahlen”. Peano’s axioms for the natural numbers 1, 2, 3, ... are stated. “+” is defined as the unique operation satisfying x + 1 = x′ and x + y′ = (x + y)′. Properties of “+” (associativity, commutativity) are derived. Order is defined by x > y := ∃u (x = y + u). It is proved to be a linear order relation and its connections with “+” are derived. “Satz 27” states that it is a well-ordering. “·” (multiplication) is defined as the unique operation satisfying x · 1 = x and x · y′ = x · y + x. Properties of “·” (commutativity, associativity) are derived, and also its connections with “+” (distributivity) and with order.

(ii) “Kapitel 2. Brüche”. Fractions (i.e. positive fractions) are defined as pairs of natural numbers. Equivalence of fractions is defined, and proved to be an equivalence relation. Order is defined, it is shown to be preserved by equivalence, and to be an order relation. Properties are derived (e.g. it is shown that neither maximal nor minimal fractions exist, and that the set of fractions is dense in itself). Addition and multiplication are defined, and proved to be consistent with equivalence. Their basic properties and interconnections are derived, and their connections with order are shown. Also subtraction and division are defined. Rationals (i.e. positive rational numbers) are defined as equivalence classes of fractions. Order, addition and multiplication are carried over to the rationals, and their various properties are proved. Finally the natural numbers are embedded, and the order in the rationals is shown to be archimedean.
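Landau's recursive characterizations of "+" and "·" can be sketched in Lean 4 on a from-1 copy of the naturals (the names PNat, add, mul and gtP are ours, not Landau's or the translation's). The equations below are exactly x + 1 = x′, x + y′ = (x + y)′ and x · 1 = x, x · y′ = x · y + x; what "Satz 4" adds is the uniqueness of such an operation, which the structural recursor here gives for free:

```lean
-- Naturals starting at 1, with `succ` for Landau's dash (hypothetical names).
inductive PNat where
  | one  : PNat
  | succ : PNat → PNat

-- x + 1 = x'  and  x + y' = (x + y)'
def add (x : PNat) : PNat → PNat
  | .one    => .succ x
  | .succ y => .succ (add x y)

-- x · 1 = x  and  x · y' = x · y + x
def mul (x : PNat) : PNat → PNat
  | .one    => x
  | .succ y => add (mul x y) x

-- x > y :=  there is a u with x = y + u
def gtP (x y : PNat) : Prop := ∃ u, x = add y u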
(iii) “Kapitel 3. Schnitte”. Cuts in the positive rationals are defined. For these cuts, order, addition (with subtraction), and multiplication (with division) are defined, and again the various properties and interconnections of these concepts are proved. The rationals are embedded, and the set of rationals is proved to be dense in the set of cuts. Finally the existence of irrational numbers is proved, by introducing √2 as an example.

(iv) “Kapitel 4. Reelle Zahlen”. The cuts are now identified with the positive real numbers, and to these the real number 0 and the negative reals are adjoined, in such a way that to every positive real there corresponds a unique negative real. The absolute value of a real number is defined. Order is defined, its properties are derived, and the predicates “rational” and “integral” (“ganz”) are defined on the reals. Now addition and multiplication are defined, and their properties and their
connections with each other, with absolute value and with order are derived. In particular the minus operator (associating to each real its additive inverse) is discussed, as well as subtraction and division. Finally, in the “Dedekindsche Hauptsatz”, Dedekind-completeness of the order in the reals is proved.

(v)
“Kapitel 5. Komplexe Zahlen”. Complex numbers are defined as pairs of reals. Addition, multiplication, subtraction and division, their properties and interconnections are discussed. To each complex number is associated its conjugate, and also (following the definition of the square root of a nonnegative real) its modulus (as a real number). The connections of these two concepts with each other and with the previously introduced operations are derived. For an associative and commutative operator * (which may be interpreted as either “+” or “·”), and for an n-tuple of complex numbers f(1), ..., f(n), Landau denotes

    f(1) * f(2) * ... * f(n)  by  ⊗_{i=1}^{n} f(i) .

This concept is defined as the value at n of the unique function g (with domain {1, 2, ..., n}) for which g(1) = f(1) and g(i′) = g(i) * f(i′) for i < n. The properties of ⊗ are proved; in particular, for a permutation s of {1, 2, ..., n} it is proved that

    ⊗_{i=1}^{n} f(i) = ⊗_{i=1}^{n} f(s(i)) .

The definition of ⊗ is extended to n-tuples f(y), f(y + 1), ..., f(y + n − 1) (where y is an integer), and its properties are carried over to this situation. Σ is defined as the specialization of ⊗ to the operation +, and Π as its specialization to · (multiplication). Some properties of Σ and Π are proved. For a complex number x and an integer n, with x ≠ 0 or n > 0, x^n is defined, and its properties and connections with previously defined concepts are discussed. Finally the reals are embedded in the set of complex numbers, the number i is defined, it is proved that i² = −1, and that each complex number may be uniquely represented as a + bi with a, b real.
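The iteration construct g(1) = f(1), g(i′) = g(i) * f(i′) is a left-nested fold, which can be sketched in Lean 4 (names iterOp, bigSum, bigProd are ours, and we index from 0 where Landau indexes from 1). Associativity and commutativity of * are only needed for theorems such as the permutation property, not for the definition itself:

```lean
-- g(0) = f(0), g(n+1) = g(n) * f(n+1): left-nested iteration of `op` over f.
def iterOp {α : Type} (op : α → α → α) (f : Nat → α) : Nat → α
  | 0     => f 0
  | n + 1 => op (iterOp op f n) (f (n + 1))

-- Σ and Π arise as specializations of the one general construct:
def bigSum  (f : Nat → Nat) : Nat → Nat := iterOp (· + ·) f
def bigProd (f : Nat → Nat) : Nat → Nat := iterOp (· * ·) f

#eval bigSum (fun i => i) 3        -- 0 + 1 + 2 + 3 = 6
#eval bigProd (fun i => i + 1) 3   -- 1 * 2 * 3 * 4 = 24
```

Defining the general ⊗ once and specializing, rather than developing "+" and "·" separately, is precisely the design choice the translation made (see 2.1 (vii)).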
2.1. Deviations from Landau’s text

In our translation, deviations from Landau’s text appear occasionally. They may be classified as follows:
(i)
In some cases a direct translation of Landau’s proofs seems a bit too complicated. We list three reasons for this.
(a) Sometimes it is due to the structure of AUT-QE, which does not quite agree with the proof Landau gives. E.g. in the proof of “Satz 6” Landau applies, for fixed y, induction with respect to x. As x ∈ nat, y ∈ nat is a common context in the translation, it is easier there to apply, for fixed x, induction with respect to y.

(b) Sometimes the reason is that Landau uses a unifying argument. E.g. in the proof of the “Dedekindsche Hauptsatz” there are, at a certain stage, two real numbers Ξ and H, such that Ξ > 0 and Ξ > H. Here Landau needs a rational number z such that Ξ > z > H. Now it has been proved in “Satz 159” that between any two positive reals there is a rational. If H > 0 this may be applied immediately. If H ≤ 0, Landau defines H1 = Ξ/(1 + 1) and again applies “Satz 159”, this time with H1. This argument, however, is complicated, because, to apply “Satz 159”, first 0 < H1 < Ξ has to be proved (which Landau fails to do). And it is superfluous, because every z in the cut Ξ will meet the conditions in this case.

(c) In one instance (the proof of “Satz 27”) Landau has given a complex proof, which may be simplified.

In all these cases I have, in the translation, given a proof which follows Landau’s line of reasoning. However, in some cases I have also given shorter alternative proofs. This means that the deviations are optional in these cases.

(ii)
Some of Landau’s “Sätze” really consist of two or three theorems. E.g. “Satz 16: Aus x ≤ y, y < z oder x < y, y ≤ z folgt x < z”. In such cases the theorem has been split up: “Satz 16a: Aus x ≤ y, y < z folgt x < z”, “Satz 16b: Aus x < y, y ≤ z folgt x < z”.
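The split can be mirrored directly; a Lean 4 sketch on Nat for concreteness (the theorem names echo the translation's, the library lemmas are Lean's own):

```lean
-- "Satz 16a: Aus x ≤ y, y < z folgt x < z"
theorem satz16a {x y z : Nat} (h1 : x ≤ y) (h2 : y < z) : x < z :=
  Nat.lt_of_le_of_lt h1 h2

-- "Satz 16b: Aus x < y, y ≤ z folgt x < z"
theorem satz16b {x y z : Nat} (h1 : x < y) (h2 : y ≤ z) : x < z :=
  Nat.lt_of_lt_of_le h1 h2
```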
(iii)
Very frequently Landau uses without notice a number of more or less trivial corollaries of a theorem he has proved. E.g. besides “Satz 93: (X + Y) + Z = X + (Y + Z)” he uses “X + (Y + Z) = (X + Y) + Z” without quoting “Satz 79”. Sometimes such a practice is explicitly announced, e.g. in the “Vorbemerkung” to “Satz 15”, where it is stated that, with any property derived for <, the corresponding property for > shall be used. In all such cases the corollaries have been formulated and proved after the theorems.
(iv)
Following the translation of the definition of a concept, we often added the specialization to this concept of certain general properties. E.g. after the introduction of +, substitutivity of equality was applied: “If x = y then x + z = y + z and z + x = z + y. If x = y and z = u then x + z = y + u”. This was done in order to make later applications easier.
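In a Lean 4 sketch (again on Nat), such specializations are one-step instances of substitutivity, which is why proving them once pays off in every later application:

```lean
-- "If x = y then x + z = y + z and z + x = z + y":
example (x y z : Nat) (h : x = y) : x + z = y + z := congrArg (· + z) h
example (x y z : Nat) (h : x = y) : z + x = z + y := congrArg (z + ·) h

-- "If x = y and z = u then x + z = y + u":
example (x y z u : Nat) (h1 : x = y) (h2 : z = u) : x + z = y + u := by
  rw [h1, h2]
```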
(v)
In a few proofs of the last three chapters minor changes were made. E.g. in the proof of “Satz 145”, where Landau states: “Aus ξ > η folgt nach Satz 140 bei passendem u: ξ = η + u”, but where, by “Definition 35”, u can be defined explicitly by u := ξ − η. This has been done in the translation, thus avoiding the superfluous existence elimination. Another deviation occurs in the proof of “Satz 284”, where Landau writes a certain chain of equalities. As in the proof the equality

    ((u + 1) − y) + ((x + 1) − (u + 1)) = (x + 1) − y

was needed, the following chain of equations was preferred in the translation:

    ((u + 1) − y) + ((x + 1) − (u + 1)) =
    = ((x + 1) − (u + 1)) + ((u + 1) − y) =
    = (((x + 1) − (u + 1)) + (u + 1)) − y = (x + 1) − y .
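The equality that motivated the change can be checked mechanically; a Lean 4 sketch on Nat's truncated subtraction, with the side conditions under which all the differences involved are genuine:

```lean
-- ((u+1) − y) + ((x+1) − (u+1)) = (x+1) − y, given y ≤ u+1 ≤ x+1.
example (x y u : Nat) (h1 : y ≤ u + 1) (h2 : u + 1 ≤ x + 1) :
    ((u + 1) - y) + ((x + 1) - (u + 1)) = (x + 1) - y := by
  omega
```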
(vi)
As we have seen in 1.0 (viii) Landau formulates Peano’s fifth axiom in terms of sets, and, when applying it, always represents a predicate as a set. In the translation this extra step has been avoided. The induction axiom is indeed introduced for sets, but then immediately a lemma, called induction, which applies to predicates is proved. This lemma has been used systematically in all proofs by induction. Also “Satz 27: In jeder nicht leeren Menge natürlicher Zahlen gibt es eine kleinste” has been reworded and proved in terms of predicates and not of “Mengen”.
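The move from the set-based axiom to the predicate lemma is a one-liner once a set of naturals is read via its membership predicate; a Lean 4 sketch (inductionSet stands in for the translation's set-based axiom, and all names here are illustrative, not the translation's identifiers):

```lean
-- A "Menge natürlicher Zahlen" identified with its membership predicate.
def NatSet := Nat → Prop

-- The induction axiom as stated for sets (guarded by 1 ≤ x, since
-- Landau's naturals start at 1); `axiom` mirrors a PN-line.
axiom inductionSet (M : NatSet) :
  M 1 → (∀ x, M x → M (x + 1)) → ∀ x, 1 ≤ x → M x

-- The derived lemma for predicates: apply the axiom to P itself.
theorem inductionPred (P : Nat → Prop)
    (h1 : P 1) (hs : ∀ x, P x → P (x + 1)) : ∀ x, 1 ≤ x → P x :=
  inductionSet P h1 hs
```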
(vii)
“Intuitive arguments” of Landau were translated in various ways. E.g. “Satz 20: Aus x + z > y + z bzw. x + z = y + z bzw. x + z < y + z folgt x > y bzw. x = y bzw. x < y.
Beweis: Folgt aus Satz 19, da die drei Fälle beide Male sich ausschlieszen und alle Möglichkeiten erschöpfen” (where “Satz 19” asserts the inverse implications). Considering the fact that Landau regards this proof as belonging to “logisches Denken”, I have proved in the preliminaries three “logical” theorems to the effect that: If A ∨ B ∨ C, ¬(D ∧ E), ¬(E ∧ F), ¬(F ∧ D) and A ⇒ D, B ⇒ E, C ⇒ F, then D ⇒ A, E ⇒ B and F ⇒ C. These theorems were used in the translation. A second example: “Satz 17: Aus x ≤ y, y ≤ z folgt x ≤ z. Beweis: Mit zwei Gleichheitszeichen in der Voraussetzung klar; sonst durch Satz 16 erledigt” (“Satz 16” is quoted above under (ii)). Here the AUT-QE text, when translated back into German, might read: “Beweis: Es sei x = y. Dann ist, wenn y = z, auch x = z, also x ≤ z. Wenn aber y < z, so ist x < z nach Satz 16a, also ebenfalls x ≤ z. Nehme jetzt an x < y. Dann folgt aus Satz 16b x < z, also auch in diesem Fall x ≤ z. Deshalb ist jedenfalls x ≤ z”. Another argument which is difficult to translate faithfully occurs in “Kapitel 5, §8”, where sums and products are introduced. Landau uses here a symbol which he intends to represent either “+” or “·”, and in this way simultaneously defines “Σ” and “Π”. In our translation we defined iteration for arbitrary commutative and associative operators, and consequently our concept and the relevant theorems are essentially stronger than Landau’s. This generality is much easier to describe in AUT-QE than a theory which applies only to “+” and “·”.
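The first of the three "logical" theorems can be sketched in Lean 4 (the hypothesis ¬(E ∧ F) is not needed for this particular converse, and the theorem name is ours):

```lean
-- From A ∨ B ∨ C, mutual exclusion of the conclusions, and the forward
-- implications of "Satz 19", the converse D ⇒ A follows; E ⇒ B and
-- F ⇒ C are proved symmetrically.
theorem converse_of_trichotomy {A B C D E F : Prop}
    (tri : A ∨ B ∨ C)
    (hDE : ¬(D ∧ E)) (hFD : ¬(F ∧ D))
    (hBE : B → E) (hCF : C → F) :
    D → A :=
  fun hD =>
    tri.elim id fun bc =>
      bc.elim
        (fun hB => absurd ⟨hD, hBE hB⟩ hDE)
        (fun hC => absurd ⟨hCF hC, hD⟩ hFD)
```

Note that the proof is purely intuitionistic: no excluded middle is needed, only the stated exclusions.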
(viii) Landau uses metatheorems whenever he embeds one structure into another, to show that the properties proved for the old structure “carry over” to the new. As an example I cite his treatment in Chapter 2 of the embedding of the natural numbers into the (positive) rationals.
- “... folgt x > y bzw. x = y bzw. x < y”.
- “Definition 25: Eine rationale Zahl heiszt ganz, wenn unter den Brüchen, deren Gesamtheit sie ist, ein Bruch x/1 vorkommt”.
- “Dies x ist nach Satz 111 eindeutig bestimmt, und umgekehrt entspricht jedem x genau eine ganze Zahl”.
- “Satz 113: Die ganzen Zahlen genügen den fünf Axiomen der natürlichen Zahlen, wenn die Klasse von 1/1 an Stelle von 1 genommen wird, und als Nachfolger der Klasse von x/1 die Klasse von x′/1 angesehen wird”.

Landau adds the following comment: “Da =, >, <, Summe und Produkt (nach Satz 111 und 112) den alten Begriffen entsprechen, haben die ganzen Zahlen alle Eigenschaften, die wir in Kapitel 1 für die natürlichen Zahlen bewiesen haben”. It was difficult to translate this text. The translation requires first a careful analysis of the interpretation of Peano’s axioms in Chapter 1. There are two possibilities: In the first interpretation, the axioms describe fundamental properties of the given system of naturals (nat, 1, suc), which cannot be proved from more primitive properties, and from which all other properties of the system can be derived. In this conception there is an intention to characterize the structure by the axioms. In the second interpretation, the axioms are simply assumptions underlying a certain theory. The theorems of the theory are valid in any structure in which these assumptions hold. In this view, no claim is made that the axioms characterize the system. The difference between these two conceptions can be illustrated by comparing the rôle of the axioms in Euclid’s geometry to the rôle of the axioms for groups in group theory. The interpretation of “Satz 113” and Landau’s comment varies according to the interpretation of the Peano axioms. In the first interpretation the “ganzen rationalen Zahlen” form a structure (nat*, 1*, suc*) which “happens to” have the same fundamental properties as the original structure (nat, 1, suc). Hence, by a suitable metatheorem, we see that the reasoning of Chapter 1 may be repeated for this new structure, extending it to (nat*, 1*, suc*, +*, ·*, <*) and proving the various properties of this extended system. In the second interpretation “Satz 113” just proves that the structure (nat*, 1*, suc*) satisfies the assumptions. After this the theory of Chapter 1 can be applied immediately.
However, there is a further problem (under either interpretation): addition on nat* defined according to the method of Chapter 1 is not (definitionally) the same thing as the restriction (to nat*) of the addition on the rationals, and these two functions must still be proved to be (extensionally) equal. Similar remarks can be made about multiplication and order.
It follows that the relevant text cannot be rendered directly in AUT-QE under either interpretation of Peano’s axioms. There is, therefore, no technical reason to prefer one of these interpretations to the other. Landau’s ideas on the rôle of the axioms are not quite clear from his text. We cite some of his statements:
- In his “Vorwort für den Kenner” he mentions certain laws on the reals which can be “als Axiome postuliert”.
- He thinks it right that the student should learn “auf welchen als Axiomen angenommenen Grundtatsachen sich lückenlos die Analysis aufbaut”.
- Moreover: “In dieser (Vorlesung) gelange ich, von den Peanoschen Axiomen der natürlichen Zahlen ausgehend, bis zur Theorie der reellen Zahlen”.
- In Chapter 1: “Wir nehmen als gegeben an: Eine Menge, d.h. Gesamtheit, von Dingen, natürliche Zahlen genannt, mit den nachher aufzuzählenden Eigenschaften, Axiome genannt”.
- “Von der Menge der natürlichen Zahlen nehmen wir nun an, dasz sie die Eigenschaften hat ...”.
- A relevant passage is also “Satz 113” quoted above.
- Landau never mentions “a system of naturals”, like in group theory one would discuss “a group”, but always “die natürlichen Zahlen”.
Most of the sentences quoted above point to the second interpretation; some of them, however, could be interpreted better or equally well in the first way. Now, as neither technical reasons nor Landau’s text indicated definitely how Peano’s axioms should be interpreted, I decided to interpret them as postulates (PN-lines) rather than assumptions (EB-lines) because it suited my own conception of the naturals. Moreover, this interpretation reduces the context and thereby simplifies verification. The meta-reasoning sketched above has been treated as follows. After the proof of “Satz 113” the proofs of “Satz 1” and “Satz 4” (where addition is introduced) were copied for the “ganzen Zahlen”. However, addition on the “ganzen Zahlen” has been defined as the restriction of addition on the rationals. Then a number of theorems from “Kapitel 1” was proved using “Satz 112”. Order and multiplication were treated in a similar way. These texts have been inserted as a matter of prestige, because we claimed that we were able to say everything Landau says. The insertions were never used, however (cf. (ix) below).
Checking Landau’s “Grundlagen”, Translation (D.2)
In “Kapitel 3, §5” and “Kapitel 5, §10” similar arguments occur, when the rationals are embedded in the reals, and the reals in the complex numbers. These arguments were “translated” just by constructing the relevant isomorphisms. This suffices for all applications.
(ix)
A consequence of the difficulties described in (viii) is a divergence between the translation and Landau’s book with respect to the use of natural numbers in the Chapters 3, 4 and 5. After his comment (following “Satz 113”) that the “ganzen Zahlen” have the same properties as the “natürlichen Zahlen” Landau continues: “Daher werfen wir die natürlichen Zahlen weg, ersetzen sie durch die entsprechenden ganzen Zahlen, und haben fortan (da auch die Brüche überflüssig werden) in bezug auf das Bisherige nur von rationalen Zahlen zu reden”. In the translation I have not followed this course, because, as pointed out, it would have been a cumbersome task to prove the properties of the “natürliche Zahlen” for the “ganze Zahlen”, and also because it would have been inevitable to repeat this procedure with every further extension of the number system. Therefore I have stuck to the “natürliche Zahlen” throughout the translation. Another important deviation from Landau’s text was caused by “Definition 43: Wir erschaffen eine neue, von den positiven Zahlen verschiedene Zahl 0. Wir erschaffen ferner Zahlen, die von den positiven und 0 verschieden sind, negative genannt, derart, dasz wir jedem ξ (d.h. jeder positiven Zahl) eine negative Zahl zuordnen, die wir −ξ nennen”. I doubt whether this creative act may be called a “definition”. Landau considers it a part of “logisches Denken” to form, given sets (or types) α and β, the Cartesian product α × β, as is clear from Chapter 2. It might also be considered “logical” to form the disjoint union α ⊕ β. But Landau does not mention this, he just “creates” 0 and the negative numbers from nothing. Moreover, I do not see a formal difference between the assertion “1 ist eine natürliche Zahl” (which Landau calls an axiom) and the assertion “0 ist eine reelle Zahl” (which he calls a definition). Neither do I see a formal difference between “x′ ≠ 1” and “−ξ ≠ 0”. In my opinion the limits of “logisches Denken” are exceeded here.
In agreement with this criticism I have translated this “definition” by introducing a number of primitive concepts and axioms (PN-lines). The type of real numbers rl is a primitive type. To any cut ξ real numbers p(ξ) and n(ξ) are associated. 0 is a primitive real number. Next there are axioms to the effect that the functions [x : cut] p(x) and [x : cut] n(x) are
injective. Now x E rl has the property pos (or neg) if it is in the range of the first (or the second) of these functions. Then there are axioms stating that, for x E rl, pos(x), neg(x) and x = 0 are mutually exclusive, and that each x E rl has one of these properties. (In fact Landau does not state the latter axiom explicitly.) Starting from these axioms “Kapitel 4” was translated. However, as I thought it unsatisfactory to develop the theory of real and complex numbers using more than Peano’s axioms alone, I have added an alternative AUT-QE version of Chapter 4, called Chapter 4a, where the real numbers are defined as equivalence classes of pairs of cuts, and where all theorems of Landau’s “Kapitel 4” are proved for these alternative reals. The AUT-QE translation of Chapter 5 has been checked relative to the AUT-QE book consisting of the Chapters 1, 2, 3 and 4a.
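The disjoint-union construction preferred here to Landau's act of "creation" can be sketched computationally. This is a hypothetical toy model (the names p, n, pos, neg echo the primitive concepts above, but the encoding as tagged tuples is ours): with tags, the two injections are injective and the three properties are mutually exclusive by construction rather than by axiom.

```python
# Hypothetical sketch: build "rl" as the disjoint union  cut ⊕ {0} ⊕ cut,
# instead of creating 0 and the negatives from nothing.  Cuts are modelled
# here simply as integers; the tag plays the role of the injection.

def p(xi):                 # the positive real associated to the cut xi
    return ("pos", xi)

def n(xi):                 # the negative real associated to the cut xi
    return ("neg", xi)

ZERO = ("zero", None)      # the one new element "0"

def pos(x): return x[0] == "pos"
def neg(x): return x[0] == "neg"

# injectivity of [x : cut] p(x): equal images force equal cuts
assert p(3) != p(4) and p(3) == p(3)

# trichotomy: each element has exactly one of the three properties
for x in (p(3), n(7), ZERO):
    assert [pos(x), neg(x), x == ZERO].count(True) == 1
```

In this encoding the axioms of the PN-line translation become provable facts about the representation, which is exactly the point of the criticism.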
Checking Landau’s “Grundlagen” in the Automath System
Chapter 4 (Conclusions)
L.S. van Benthem Jutting
CHAPTER 4. CONCLUSIONS
In this chapter we discuss some possibilities to represent logic in Automath. We indicate some desirable extensions of AUT-68 and AUT-QE and we discuss some aspects (positive as well as negative) of our translation.
4.0. Formalization of logic in Automath
In this section we shall describe various possibilities to represent systems of natural deduction in AUT-68 ([van Daalen 73 (A.3), 2]), in AUT-QE and in some closely related languages. First we discuss two main decisions which have to be made when choosing between these possibilities. Then we indicate explicitly two possibilities to represent logic.
4.0.0. First order v. higher order
In most Automath languages there are certain restrictions on abstraction. E.g. in AUT-68 as well as in AUT-QE correct abstraction expressions have the form [x : α] A where α is a 2-expression (and hence x, having type α, is a 3-variable, i.e. a variable which is a 3-expression). Such restrictions allow a faithful representation of first order logic (in the sense of excluding higher order formulas and inferences). In AUT-68 as well as in AUT-QE this can be done by representing propositions and predicates as 2-expressions (as described in [van Daalen 73 (A.3), 3]). Then proposition variables and (in AUT-QE) predicate variables will be 2-variables and abstraction (or quantification) with respect to such variables is impossible in the language.
If, in such a setting, we want to discern between proposition variables and predicate variables then it is necessary to have abstraction expressions of degree 1 in the language, i.e. to use AUT-QE (and not AUT-68). In order to represent higher order logic we should require the possibility of abstraction with respect to proposition and predicate variables. Therefore, if we
stick to the abstraction restrictions of AUT-68 or AUT-QE, we should represent propositions and predicates by 3-expressions. We may proceed in two ways: (i) we can associate to each proposition a (primitive) type (which we will call the assertion type of the proposition). Objects of this type will be considered as proofs of the proposition. In other words: we consider the proposition as asserted iff its assertion type contains some object. This possibility will be elaborated in 4.0.2. (ii) we can extend the language to a new language, called AUT-4, by admitting 4-expressions (having 3-expressions as their types, cf. [van Daalen 73 (A.3), 2.3]). Then a proposition (represented by a 3-expression) might be considered as asserted if it contains something (some 4-expression). Thus propositions act as their own assertion types, and the representation of logic is just as described in [van Daalen 73 (A.3), 3.2], but for a shift with respect to degrees. 4.0.1. Relevance of proofs vs. irrelevance of proofs In all representations of logic in Automath languages which have been developed so far, proofs (i.e. names of proofs) appear in the language ([van Daalen 73 (A.3), 3], [de Bruijn 73b], [C.4]). In this respect these representations reflect a constructive conception of logic, in which proofs and objects are treated similarly. In a classical conception of logic, proofs are discussed in the metalanguage only. As a consequence it is impossible in such a conception to discern (in the language) between different proofs of one proposition. This point of view can be roughly represented in Automath by proclaiming, for any given proposition a, all proofs of a to be equal. This deprives these proofs of their identity; their names should be considered only as references to the place in the book where the proposition is asserted. This possibility has been first suggested by de Bruijn.
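De Bruijn's suggestion, proclaiming all proofs of one proposition equal, can be mimicked in a toy model (a hypothetical Python sketch, not an Automath mechanism): equality of proof terms compares only the proposition proved, so a proof name is a mere reference.

```python
from dataclasses import dataclass, field

@dataclass
class Proof:
    prop: str                        # the proposition this term proves
    term: str = field(compare=False) # excluded from equality: a mere reference

# two different derivations of the same proposition count as equal proofs
assert Proof("imp(A,A)", "by_rule_I") == Proof("imp(A,A)", "by_lemma_7")
# proofs of different propositions remain distinct
assert Proof("A", "p") != Proof("B", "q")
```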
If, in a representation of logic in Automath, such an attitude is adopted, we shall say that this representation satisfies irrelevance of proofs (cf. [Zucker 77 (A.4)], and also 1.4). How this irrelevance of proofs is implemented (i.e. in which sense proofs are considered “equal”) will depend both on the language and on the way logic is represented in it (cf. 4.0.3 (i) and (ii)). 4.0.2. A representation of logic in AUT-68 A higher order system of natural deduction can be formalized in AUT-68 as follows. A type of propositions is introduced as a primitive type:
* PROP := PN E type
and to each proposition A its assertion type ⊢(A) is associated:

* A := - E PROP
A * ⊢ := PN E type
(In earlier publications on AUT-68, bool and TRUE were used instead of PROP and ⊢.) If S is a type, an object P E [x : S] PROP has to be interpreted as a predicate. Objects of type [x : S] ⊢((x) P) must then be interpreted as proving that P holds for every x E S. So we want to introduce the proposition ∀(S, P) which has the property that its assertion type contains elements iff the type [x : S] ⊢((x) P) contains elements. This is expressed in the following lines:
* S := - E type
S * P := - E [x : S] PROP
P * ∀ := PN E PROP
P * a := - E S
a * u := - E ⊢(∀(S, P))
u * ∀e := PN E ⊢((a) P)
P * u := - E [x : S] ⊢((x) P)
u * ∀i := PN E ⊢(∀(S, P))
Starting from these primitive concepts and axioms, higher order logic can be developed. An indication of how this can be done is given in Appendix 6 [not in this Volume, but see also [B.f]], where the first three theorems from Landau’s book are derived on the basis of the logic so developed. This logic represents a constructive system of natural deduction. Axioms could be added for extensional equality of functions and extensional equality of propositions (i.e. if a ↔ b then a = b). Classical logic could be represented this way by adding axioms for irrelevance of proofs:
* A := - E PROP
A * u := - E ⊢(A)
u * v := - E ⊢(A)
v * irr.pr. := PN E IS(⊢(A), u, v)
and for the double negation law:
A * u := - E ⊢(¬(¬(A)))
u * d.n.l. := PN E ⊢(A)
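Under the propositions-as-types reading used in this section, an object of type [x : S] ⊢((x) P) is a function assigning to each element a proof of the instance, and that is exactly what the ∀ axioms glue to ∀(S, P). A minimal computational sketch (a hypothetical model, not Automath itself; propositions and proofs are plain Python values):

```python
def forall_i(proofs_for_each):
    # ∀i: from an object of type [x : S] ⊢((x) P), i.e. a function giving
    # a proof of (x) P for every x E S, obtain a proof of ∀(S, P)
    return proofs_for_each

def forall_e(forall_proof, a):
    # ∀e: instantiate a proof of ∀(S, P) at a E S, yielding a proof of (a) P
    return forall_proof(a)

# predicate P(x): "x + x is even"; a "proof" is just a witness record here
proof_of_all = forall_i(lambda x: ("even", x + x))
assert forall_e(proof_of_all, 3) == ("even", 6)
```

The assertion type of ∀(S, P) is inhabited precisely when the function type is, which is the property the two PN-lines above postulate.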
4.0.3. A representation of logic in AUT-QE How logic can be represented in AUT-QE is described in [van Daalen 73 (A.3), 3]. This system, a first order system of natural deduction, has been used in our translation. An indication of the development of logic in it can be found in the
excerpted text in Appendix 7, which covers the proofs of the first three theorems of Landau’s book and the logic used in these proofs. [Appendix 7 is not in this Volume. However, this excerpt is contained in the excerpt for “Satz 27”, [D.5] in this Volume.] The system is a bit ambivalent, because it is classical (containing the double negation law as an axiom) but does not satisfy irrelevance of proofs. There are two obvious ways to implement irrelevance of proofs: (i) by adding an axiom:

* A := - E prop
A * S := - E type
S * t := - E [x : A] S
t * u := - E A
u * v := - E A
v * irr.pr. := PN E IS(S, (u) t, (v) t)
That is: if to every proof of A an object of type S is associated, then this object is independent of the nature of the proof. It has been indicated by J. Zucker that this axiom implies irrelevance of proofs in partial functions as mentioned in 1.4:
* S := - E type
S * T := - E type
T * P := - E [x : S] prop
P * f := - E [x : S] [y : (x) P] T
f * a := - E S
a * b := - E S
b * u := - E IS(S, a, b)
u * v := - E (a) P
v * w := - E (b) P
w * Q := [x : S] [y : (x) P] IS(T, (v)(a) f, (y)(x) f) E [x : S] prop
w * l1 := [y : (a) P] irr.pr.((a) P, T, (a) f, v, y) E (a) Q
w * l2 := ISP(S, Q, a, b, u, l1) E (b) Q
w * l3 := (w) l2 E IS(T, (v)(a) f, (w)(b) f)
(ii) by extending, in the language, the relation of definitional equality, in such a way that two 3p-expressions (cf. 1.2) are definitionally equal iff their types are definitionally equal. This has been done in the language AUT-II (cf. [Zucker 77 (A.4)]), but could be done in a variant of AUT-QE as well. If we want to formalize intuitionistic logic in AUT-QE we should have the absurdity rule (i.e. contradiction implies any proposition) instead of the double
negation law. The logical connectives (apart from implication) and the existential quantifier could be added as primitive constants, and their elimination- and introduction rules as axioms.
4.1. The language In this section we discuss some features of Automath languages, and the value of these features for the formalization of mathematics.
4.1.0. AUT-SYNT Consider the following AUT-QE text, representing the introduction rule for conjunction: a b u v
* a * b * u * v * andi
:=
__ __
E Prop E Prop
..-
__ __
Eb
:=
.....
.-.-
:=
Ea E and(a,b )
(where the dots indicate some proof which is irrelevant for the present discussion). We will call the variables a, b, u, v the parameters of andi. If we want to apply this rule for propositions A and B, we need two proofs p and q of the propositions, thus getting the proof andi(A, B, p, q) E and(A, B). Suppose we are given the proof p, then we can compute mechanically its type (cf. [van Daalen 73 (A.3), 6.4.2.3]) which is (definitionally equal to) the proposition A it proves. A similar observation holds for q and B. Hence we could say that the expression andi(A, B, p, q) contains redundant information. If the “mechanical type” function CAT ([van Daalen 73 (A.3), 6.4.2.3]) were incorporated in the language, we could write, instead of the expression above, andi(CAT(p), CAT(q), p, q), which only contains p and q. We will call the parameters u and v (for which p and q are substituted) the essential parameters of andi, while a and b (for which the redundant expressions A and B are substituted) are called redundant parameters. There are many other examples of expressions with redundant parameters. It is worthwhile to extend the language in such a way that redundant parameters can be avoided, because the expressions which have to be substituted for them might be long. A system of extensions of this kind has been proposed by I. Zandleven. It is called AUT-SYNT since it admits syntactic variables for expressions. Thus we have the languages AUT-68-SYNT, AUT-QE-SYNT etc. For a description of AUT-SYNT we refer to Appendix 9 [[B.5] in this Volume]; a text in AUT-68-SYNT may be found in Appendix 8 [not in this Volume]. Our experiences with translating Landau’s book have been a stimulus for developing AUT-SYNT, and have indicated the way this could be done. As no
verifying program for SYNT languages was available until after the translation was finished, the SYNT-facility could not be used in the translation. This may be considered unfortunate, because the presence of this facility would have simplified both the writing and the reading of our text.
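The effect of a CAT-like facility on redundant parameters can be sketched in a toy model (a hypothetical miniature, not Zandleven's actual design): if proof terms carry the proposition they prove, then andi needs only its essential parameters and synthesizes the redundant ones.

```python
from dataclasses import dataclass

@dataclass
class Proof:
    prop: str   # the proposition this term proves
    term: str   # the (name of the) proof itself

def cat(p):
    # the "mechanical type" function: recover the proposition a proof proves
    return p.prop

def andi(p, q):
    # only the essential parameters p, q are supplied; the redundant
    # parameters a and b are synthesized as CAT(p) and CAT(q)
    a, b = cat(p), cat(q)
    return Proof(f"and({a},{b})", f"andi({a},{b},{p.term},{q.term})")

r = andi(Proof("A", "p"), Proof("B", "q"))
assert r.prop == "and(A,B)" and r.term == "andi(A,B,p,q)"
```

The writer supplies only p and q; the full expression with all four parameters is reconstructed mechanically, which is the saving AUT-SYNT aims at.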
4.1.1. η-reduction in Automath
In AUT-68 and AUT-QE one of the possible ways to establish definitional equality is by η-reduction ([van Daalen 73 (A.3), 6.2.2]): If x is not free in A then [x : α] (x) A →η A. As can be seen in the list in Chapter 3 [[E.t] in this Volume], η-reduction was applied only twice during the verification of our translation. We give the lines which required these η-reductions, together with their relevant contexts. The following lines from the text on propositional logic are presupposed:
* con := PN E prop
* a := - E prop
a * not := [x : a] con E prop
a * u := - E not(not(a))
u * et := PN E a
a * u := - E con
u * cone := et(a, [x : not(a)] u) E a
The first line where η-reduction is required occurs in the text on predicate logic. In this text the following lines appear:
* S := - E type
S * P := - E [x : S] prop
P * all := P E prop
P * non := [x : S] not((x) P) E [x : S] prop
P * u := - E not(all(S, P))
u * v := - E non(non(P))
v * s := - E S
s * t1 := et((s) P, (s) v) E (s) P
v * t2 := ([x : S] t1(x)) u E con
In order to verify that the middle part of this last line is a correct expression, it should be established that CAT([x : S] t1(x)) and DOM(u) are definitionally equal (cf. [van Daalen 73 (A.3), 6.2.4.6]). We have

CAT([x : S] t1(x)) = [x : S] CAT(t1(x)) = [x : S] ((s) P)[s := x] = [x : S] (x) P ,
DOM(u) = DOM(not(all(S, P))) = DOM([x : all(S, P)] con) = all(S, P) = P .

The question is to establish

[x : S] (x) P = P .

This obviously requires η-reduction. The second case in which η-reduction is used occurs in the text on generalized implication:
* a := - E prop
a * b := - E [x : a] prop
b * imp := b E prop
b * u := - E not(a)
u * th2 := [x : a] cone((x) b, (x) u) E imp(a, b)
Here, in order to verify the last line, it is asked whether the category of the middle part definitionally equals the category part, i.e. whether

CAT([x : a] cone((x) b, (x) u)) = imp(a, b) .

Now

CAT([x : a] cone((x) b, (x) u)) = [x : a] CAT(cone((x) b, (x) u)) = [x : a] (x) b ,

imp(a, b) = b ,

and therefore η-reduction must be used for establishing

[x : a] (x) b = b .
It has been observed by v. Daalen that η-reduction might have been avoided in both cases by a slight modification of the definitions: for all (in the first case) and for imp (in the second case). In fact all might have been defined by
P * all := [x : S] (x) P E prop

and imp by

b * imp := [x : a] (x) b E prop
This would have made no difference to the rest of the book, apart from the fact that in some places an extra β-reduction would have been necessary. In fact, if a predicate P is defined explicitly (as opposed to being a predicate variable or a primitive predicate constant) then P = [y : S] m(y), say, and we have, without η-reduction
[x : S] (x) P = [x : S] (x) [y : S] m(y) = [x : S] (m(y))[y := x] = [x : S] m(x) = P .
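The two situations can be replayed on a toy term rewriter (a hypothetical sketch; application is written fun(arg) here rather than Automath's (arg) fun order): when P is a mere variable only the η-step applies, while an explicitly defined P collapses by a β-step alone, as in v. Daalen's observation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Abs:
    var: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

def free_vars(t):
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Abs):
        return free_vars(t.body) - {t.var}
    return free_vars(t.fun) | free_vars(t.arg)

def subst(t, name, s):
    # naive substitution (assumes bound names are distinct from free ones)
    if isinstance(t, Var):
        return s if t.name == name else t
    if isinstance(t, Abs):
        return t if t.var == name else Abs(t.var, subst(t.body, name, s))
    return App(subst(t.fun, name, s), subst(t.arg, name, s))

def eta_step(t):
    # [x] (x) F  →η  F,  provided x is not free in F
    if (isinstance(t, Abs) and isinstance(t.body, App)
            and t.body.arg == Var(t.var)
            and t.var not in free_vars(t.body.fun)):
        return t.body.fun
    return t

def beta_step(t):
    # ((A) [x] B)  →β  B[x := A]
    if isinstance(t, App) and isinstance(t.fun, Abs):
        return subst(t.fun.body, t.fun.var, t.arg)
    return t

# With P a mere variable, only the η-step applies:
P = Var("P")
assert eta_step(Abs("x", App(P, Var("x")))) == P

# With P defined explicitly as [y] m(y), a β-step already suffices:
P_def = Abs("y", App(Var("m"), Var("y")))
inner = beta_step(App(P_def, Var("x")))          # ([y] m(y)) at x  →β  m(x)
assert Abs("x", inner) == Abs("x", App(Var("m"), Var("x")))
```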
We conclude therefore that η-reduction does not add considerably to the expressive power of Automath.

4.1.2. prop v. type
In the stage of exploration of the possibilities to represent logic in AUT-QE, initially a variant of this language was used which did not contain the 1-expression prop. It was therefore impossible to prescribe whether types had to be interpreted as assertion types (containing proofs) or “ordinary” types (containing “ordinary” objects). Contradiction was represented as a primitive type, negation and the double negation law were formalized in terms of this type as follows:
* con := PN E type
* a := - E type
a * not := [x : a] con E type
a * u := - E not(not(a))
u * d.n.l. := PN E a
If in this text a is interpreted as an “ordinary” type, nat say, then expressions of type not(a) (or [x : a] con) could be interpreted as proofs that a is empty (in fact, if we have p E not(a), then for an object x E a we have (x) p to prove contradiction). Hence expressions of type not(not(a)) have to be interpreted as proofs that a is (in a weak sense) nonempty. Given such a proof q we have an object d.n.l.(a, q) E a. Or, in other words: d.n.l. acts as a Hilbert operator, selecting an object from any non-empty type. In particular this induces a form of the axiom of choice. As we did not want the double negation law to have such far-reaching consequences, we extended the language by admitting prop as a basic 1-expression. Thus we obtained the language AUT-QE (as defined in [van Daalen 73 (A.3), 5]), in which it is possible to distinguish between assertion types and ordinary types. The distinction of prop and type not only unlinked the double negation law from the axiom of choice, but also made it possible to implement irrelevance of proofs (cf. 4.0.1, 1.4). This opportunity was not seized in the logic underlying our translation (though this would have been natural). For an explanation we refer to 4.2.1. We may conclude that the distinction between proofs and “ordinary” objects is an essential feature when representing classical logic in Automath. For representing constructive logic the version with only type keeps its value.
4.1.3. Strings and telescopes
In Chapter 2 of his book Landau uses pairs (x1, x2) of natural numbers. He considers such a pair as a single object and yet he describes it by two variables. A faithful translation of this practice could have been given if the concept of a string of expressions had been present in our language. Another use strings of expressions might have is as arguments of partial functions (as described in 1.4). In fact such functions are applied to pairs (a, p) where a is an object of a certain type S, and p a proof that a satisfies some predicate P on S (which describes the range of the function). As a further example we consider the concept of a group, which might be considered as a string (S, op, iv, e, p) where S E type, op E [x : S] [y : S] S, iv E [x : S] S, e E S and p E groupaxioms(S, op, iv, e). We usually want the types of the expressions of such a string to satisfy certain conditions. In the case of the argument (a, p) of a partial function we want a E S, p E (a) P. In other words we want the argument (a, p) to be consistent with the “abstractor part” of the function: x E S; y E (x) P. In the case of groups we want a group (S, op, iv, e, p) to be consistent with

x E type; y E [s : x] [t : x] x; z E [s : x] x; u E x; v E groupaxioms(x, y, z, u).

There is a strong analogy with the case where expressions A1, ..., An are required to be suitable candidates for substitution for the variables x1, ..., xn of a certain context x1 E α1, x2 E α2, ..., xn E αn (cf. [van Daalen 73 (A.3), 2.5]). To describe such conditions on strings we introduce the following terminology. A finite sequence of E-formulas x1 E α1, ..., xn E αn is called a telescope. The string of expressions (a1, ..., an) is said to fit into the telescope x1 E α1, ..., xn E αn if a1 E α1, a2 E α2[x1 := a1], ..., an E αn[x1, ..., xn−1 := a1, ..., an−1]. Extension of the language with constants and variables for strings and defined constants for telescopes has been proposed by de Bruijn. This is especially helpful when formalizing abstract structures such as groups, vector spaces or categories, and has been applied on a large scale by J. Zucker (cf. [Zucker 77 (A.4)]).

4.2. Comments on the Translation
In this section we first give a chronological survey of the different representations of logic which have been tried, and we state the motives for finally choosing AUT-QE as a language for our translation. Furthermore we mention some aspects which are (in our opinion) shortcomings of the translation and we add some positive conclusions which can be drawn from our work.
4.2.0. Choice of the language
In our first attempts to translate Landau’s “Grundlagen” in Automath, we used the language AUT-68. The representation of logic was similar to the one described in 4.0.2 and presented in Appendix 6 [not in this Volume]. Elimination and introduction of ∀ were effected by the axioms ∀e (with parameters S E type, P E [x : S] PROP, a E S, u E ⊢(∀(S, P))) and ∀i (with parameters S E type, P E [x : S] PROP, u E [x : S] ⊢((x) P)). These axioms were used frequently in developing logic, because the logical connectives and the existential quantifier were defined in terms of ∀. On the basis of this logic Chapter 1 of Landau’s book was translated in AUT-68. At that stage of our work we started trying to represent logic in AUT-QE, initially using a variant of that language which did not contain prop. In AUT-QE the axioms ∀i and ∀e were superfluous: if P E [x : S] type (i.e. P represents a predicate on S) then objects of type P can be interpreted as proofs of ∀(S, P). Conversely, given such an object u E P and an object a E S we have (a) u E (a) P (i.e. (a) u proves that P holds at a). As a consequence the text on logic in AUT-QE was considerably shorter than the earlier text in AUT-68. (It was not observed at that time that this was caused essentially by the redundant parameters S and P of both constants ∀e and ∀i.) So AUT-QE seemed to be a much better language, and therefore a fresh start was made with the translation of Landau’s book into that language. In 4.1.2 we have reported that in this system (AUT-QE without prop) the double negation law induces a Hilbert operator. This led us to add prop as a basic 1-expression to our language, thus extending it to proper AUT-QE. At the time we finally fixed the language we did not appreciate the fundamental importance of incorporating a form of irrelevance of proofs. This was due mainly to two reasons:
(i) Partial functions are not frequently used in the first three chapters of Landau’s book, and for those partial functions which are defined there, irrelevance of proofs could be derived. Therefore no need was felt for an axiom.
(ii) As Landau, being a classical mathematician, does not discuss proofs at all, we thought we should try to follow this practice. Consequently we did not want to have an axiom declaring proofs equal.
4.2.1. Shortcomings of the translation
Here I list those features of the translation which I would change if I were to redo the work.
(i) In my opinion the SYNT-facility should be present in any Automath language. It will bring texts in Automath closer to mathematical practice.
The middle parts of many lines in the present Landau translation are unnecessarily complex and tedious (both to the reader and to the writer), because this facility is absent in the language I used.
(ii) I regret that I have not implemented irrelevance of proofs as an axiom. As I see it now, for representing classical reasoning a language should be chosen which even contains irrelevance of proofs by definitional equality (cf. 4.0.3).
(iii) Some of the names I have used lack expressive power. This is partly due to the fact that AUT-QE admits only alphanumeric identifiers, but mainly to my excessive preference for short names.
(iv) I am not content with the translation of Chapter 5.8. This text is overloaded with irrelevant embedding and lifting functions which hamper a clear understanding of the argument. I think it is better to define Σ_{i=1}^n f(i) and Π_{i=1}^n f(i) for functions f defined for all natural numbers (and not just on an initial part of the naturals), although this procedure deviates slightly from Landau’s intentions.
4.2.2. Final remarks
The main positive comment we can make on the translation is that it has been successfully finished (in spite of some inconveniences in the language). An aspect which has not been mentioned so far is the ratio between the length of pieces of AUT-QE text and the length of the corresponding German texts. Our claim at the outset was that this ratio can be kept constant. We give a few data. As pieces of text we have chosen the chapters of Landau’s book, and as a measure of the lengths the number of stored AUT-QE expressions (storing expressions requires storing all subexpressions too) and (rough estimates of) the number of German words (where “x” and “+” were counted as words). We give the following list:
                            Chapter 1  Chapter 2  Chapter 3  Chapter 4  Chapter 5
nr. of expressions              12200      25800      30300      35000      60500
nr. of words                     3200       4900       5300       5500      11000
nr. of expressions / words        3.8        5.3        5.7        6.4        5.5
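The stated ratios can be recomputed from the two rows of data:

```python
expressions = {1: 12200, 2: 25800, 3: 30300, 4: 35000, 5: 60500}
words = {1: 3200, 2: 4900, 3: 5300, 4: 5500, 5: 11000}

# ratio of AUT-QE expressions to German words, rounded to one decimal
ratios = {ch: round(expressions[ch] / words[ch], 1) for ch in expressions}
assert ratios == {1: 3.8, 2: 5.3, 3: 5.7, 4: 6.4, 5: 5.5}
```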
The high ratio in Chapter 4 might be attributed to the complicated definitions by cases in this chapter, while the low ratio in Chapter 1 is possibly caused by the absence of calculations. Another notable aspect of the work is the comparatively small place taken by the preliminaries. It appears that a formal treatment of the logic underlying
mathematics (if we disregard metalogic) is much easier than a formal treatment of mathematics itself. It has not been the purpose of this enterprise to construct a formal system which suits my own fancy and to develop in this system the theory of naturals, reals and complex numbers. I have rather tried to represent in a language which was essentially given beforehand, a wide variety of concepts and ideas as expressed in a book like Landau’s. The success of this undertaking is due to the flexibility of Automath languages, and to the close connection which can be made between these languages and intuitive human reasoning.
A Text Fragment from Zucker’s “Real Analysis”
L.S. van Benthem Jutting and R.C. de Vrijer
The text “Real Analysis” was written by J. Zucker, partly in cooperation with A. Kornaat, in 1975-1976. It contains a formalization of the theory of real numbers, functions, continuity, differentiation, ending with the definition of the exponential function as a power series and the proof that this function is its own derivative. In [Zucker 77 (A.4)] its author reported on the development of this text. It is written in AUT-II, a variant of Automath developed by Zucker, which contains explicit product types, sum types and disjoint sums of types. A verifying program for AUT-II was never finished; hence Zucker’s text has never been checked on a computer. We present here a short fragment of Zucker’s text. The aim is to give an impression of how mathematics, also of a more advanced level, can be written, and has actually been written, in a flexible Automath language like AUT-II. With this goal in mind, we have not selected a piece from the very beginning of the Automath book. Instead, we exhibit a fragment occurring at a point where already a good deal of the subject “Real Analysis” has been developed, viz., from Chapter 10 (entitled “Partial functions of a real variable”): Section 7 (“Differentiation”) and part of Section 8 (“Rules for differentiation”). The consequence of this policy is that the text is not readable without some explanation, since we just drop in in the middle of the story, so to say. Therefore, we start with an informal introduction, written with the aim of providing the background that is needed for an understanding of the formal text. It is not intended as a general introduction to AUT-II. For more information on the background and specific features of this Automath language one should consult [Zucker 77 (A.4)] or [B.6]. It may be noted that the format of the text roughly follows common Automath practice.
In particular, the notations, the way of dealing with contexts, etc., are much as in the description of AUT-68 in [van Benthem Jutting 81 (B.1)]. This article is organized as follows. First we will give an informal exposition of the language and of some syntactical conventions (Section 1), and comment on the particular way in which Zucker deals with the syntax (Section 2). Then we give a short account of some relevant parts of the text Real Analysis that
precede the fragment (Section 3), on the basis of which it is possible to give a synopsis of the fragment itself (Section 4). We conclude our introductory remarks with an overview of some more identifiers that are used but not defined in the fragment (Section 5). Then Section 6 contains the actual text fragment.
1. DESCRIPTION OF THE SYNTAX
The text consists of lines. A book is a sequence of lines. There are three main kinds of lines: (i) AUT-II lines, (ii) paragraph lines, (iii) skip lines. Paragraphs give a global structure to the text, facilitating internal referencing. They are indicated by the paragraph lines. Skip lines are for comments and for formatting purposes. The AUT-II lines contain the real AUT-II text. They can again be subdivided into three kinds:
(a) context lines, (b) defining lines, (c) primitive notion lines. We now briefly discuss each of the kinds of lines mentioned. (i)(a) A context line consists of two parts: a context indicator (which is optional) and a context extension. In order to explain their roles, we discuss contexts. As usual in Automath, a context consists of a sequence [x1 : A1] ... [xn : An], where x1, ..., xn are distinct “variables” and A1, ..., An are expressions (see below). A context indicator is either a variable followed by the symbol @ or just the symbol @. It denotes the context on which the line should be interpreted. If a context [x1 : A1] [x2 : A2] [x3 : A3] [x4 : A4] has been introduced earlier, the context indicator “x3 @” indicates that the present line must be interpreted on context [x1 : A1] [x2 : A2] [x3 : A3], and the context indicator “x2 @” that the context will be [x1 : A1] [x2 : A2]. If the context indicator is just @, the line should be interpreted on an empty context. If it is absent, the context of the line is the context of the previous AUT-II line. The context extension in a context line is a nonempty sequence [y1 : B1] ... [ym : Bm], where again y1, ..., ym are distinct variables and B1, ..., Bm are expressions. We now describe the effect of a context line. Suppose that somewhere in our book appears the context [x1 : A1] [x2 : A2] [x3 : A3] [x4 : A4], and that the
A text fragment from Zucker’s “Real Analysis” (D.4)
context of the previous AUT-II line is [z1 : C1] [z2 : C2]. Then the context that is created by the line

[y1 : B1] [y2 : B2] [y3 : B3]

will be

[z1 : C1] [z2 : C2] [y1 : B1] [y2 : B2] [y3 : B3] ;

the context that is created by the line

@ [y1 : B1] [y2 : B2] [y3 : B3]

will be

[y1 : B1] [y2 : B2] [y3 : B3] ;

the context that is created by the line

x1 @ [y1 : B1] [y2 : B2] [y3 : B3]

will be

[x1 : A1] [y1 : B1] [y2 : B2] [y3 : B3] ;

and the context that is created by the line

x4 @ [y1 : B1] [y2 : B2]

will be

[x1 : A1] [x2 : A2] [x3 : A3] [x4 : A4] [y1 : B1] [y2 : B2] .
(i)(b) A defining line also consists of two parts: a context indicator again, and a definition. The context indicator works exactly the same way as it does in a context line. In particular the context indicator is optional. The definition consists of a definiendum, a definiens and a type. Definiendum and definiens are separated by the symbol :=, and definiens and type by the symbol “:”. So a defining line looks like this:

x @ definiendum := definiens : type
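The way a context indicator selects the context of a line can be sketched in Python. This is our own model, not part of AUT-II: the function `resolve` and the encoding of a context as a list of (variable, expression) pairs are assumptions made for illustration only.

```python
def resolve(indicator, previous, earlier_contexts):
    """Return the context selected by an AUT-II context indicator.

    indicator:        None (absent), "@" (empty context), or "x @".
    previous:         context of the previous AUT-II line.
    earlier_contexts: contexts introduced earlier in the book.
    """
    if indicator is None:            # absent: keep the previous context
        return list(previous)
    if indicator == "@":             # bare @: start from the empty context
        return []
    var = indicator.split("@")[0].strip()
    for ctx in earlier_contexts:     # "x @": the earlier context up to x
        names = [v for v, _ in ctx]
        if var in names:
            return ctx[: names.index(var) + 1]
    raise KeyError(var)

# The example from the text: an earlier context [x1:A1][x2:A2][x3:A3][x4:A4],
# previous context [z1:C1][z2:C2], extension [y1:B1][y2:B2][y3:B3].
earlier = [("x1", "A1"), ("x2", "A2"), ("x3", "A3"), ("x4", "A4")]
prev = [("z1", "C1"), ("z2", "C2")]
ext = [("y1", "B1"), ("y2", "B2"), ("y3", "B3")]

ctx = resolve("x3 @", prev, [earlier]) + ext   # [x1:A1][x2:A2][x3:A3] + extension
```

Each of the four example lines in the text corresponds to one call of `resolve` followed by appending the extension.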
The definiendum is an identifier or a fix symbol. Identifiers are sequences of letters, digits and the symbol “.”. Fix symbols are some special symbols like &, =, >, +, etc., and any sequence of symbols in quotes (e.g. ‘and’). They are a special kind of identifiers, which can be used prefix, postfix or infix. Their specific use is indicated at the place where they are defined. In the fragment which is shown here, no fix symbols are defined, and hence we omit the relevant syntax.
Fix symbols which have been defined in the previous book appear frequently in the text, though. The definiens and the type are expressions, separated by the colon “:”. The type in a defining line is optional, as it is possible to deduce it from the definiens (modulo convertibility). However, in most cases it is explicitly given in the text. We now briefly describe six important shapes of expressions.

Lambda expressions, written as [x : A] b. The square brackets are used for lambda abstraction. In more traditional notation one would write λx : A . b.

Application expressions, written as (a) f, meaning the function f applied to the argument a. In Automath tradition, the argument is written before the function.

Π-expressions (for Cartesian products), written as Π(f), where f denotes a type- (or prop-)valued function. Typically, [x : A] b : Π([x : A] B) holds if on the context extended with [x : A] we have b : B.

Σ-expressions, written as Σ(f), where f denotes a type- (or prop-)valued function. A Σ-expression denotes a type of pairs (a, b), where the type of b may depend on a. Typically, we have (a, b) : Σ([x : A] B(x)) if a : A and b : B(a). Zucker writes pair(a, b) for (a, b), and proj1 and proj2 for the left and right projections. So proj1(pair(a, b)) definitionally equals a and proj2(pair(a, b)) equals b.

Head expressions. If an identifier id has been defined earlier in context [x1 : A1] ... [xn : An], then id(a1, ..., an) is an expression. Below we make a remark about omitting arguments from such expressions.

Fix expressions. If fx is a prefix symbol then fx a will be an expression; if fx is an infix symbol then a fx b will be an expression; and similarly for postfix symbols. Ordinary brackets are used for parsing in this case.

This is not yet a complete list. E.g., we did not mention the construction of disjoint sums of types.
It may be observed that explicit occurrences of the Π- and the Σ-construction are not encountered in the fragment. However, you will see for example Rl → Rl, an infix expression that is definitionally equal to the type Π([x : Rl] Rl), and ∀(P), defined as Π(P), where P is a predicate, i.e. P : Π([x : A] prop) for some type A. As will become apparent later in this introduction, Π- and Σ-expressions, and also disjoint-sum types, do prominently figure in the background of the fragment. (On the use of products and disjoint sums in logic see [Zucker 77 (A.4)].)
(i)(c) A primitive notion line is like a defining line. The difference is that the definiens is not an expression, but PN, for primitive notion. In a primitive notion line a new identifier is declared and its type is given:

x @ identifier := PN : type
Primitive notion lines do not occur in the present fragment.
Abbreviating expressions. 1. If the identifier id has been defined earlier on the context [x1 : A1] ... [xm : Am] [xm+1 : Am+1] ... [xn : An] and if [x1 : A1] ... [xm : Am] is the initial part of the context of the line under consideration, then id(am+1, ..., an) denotes id(x1, ..., xm, am+1, ..., an). So, in particular, just id on the context [x1 : A1] ... [xn : An] denotes the full expression id(x1, ..., xn).
2. For writing lambda expressions there is in some cases another abbreviation device. If in the book a defining line

id := a : A

occurs, with context [x1 : A1] ... [xm : Am] [xm+1 : Am+1] ... [xm+k : Am+k], then [k×] id denotes the expression [xm+1 : Am+1] ... [xm+k : Am+k] id(xm+1, ..., xm+k), that is, the expression obtained by “k times abstracting id”. Cf. [de Bruijn 72a].
3. Zucker exploits the facilities provided by AUT-SYNT. For a description of the AUT-SYNT mechanism see [B.5]. In Section 5 below we will point out some examples. (ii) Paragraph lines mark the paragraph structure of the text. The text is divided into a nested structure of paragraphs, which is largely independent of the context structure. The purpose of the paragraph structure is to provide the possibility of reusing identifiers. Every line of the text, be it an AUT-II line, a paragraph line or a skip line, belongs to a paragraph, the “active paragraph” of that line. Three kinds of paragraph lines are used to indicate the paragraph structure of the text: paragraph opening lines, paragraph reopening lines and paragraph closing lines. They determine the active paragraph of the lines which follow.
Paragraph opening lines have the form +P where P is an identifier, a so called paragraph name. After such a line the active paragraph has name P , until another paragraph line appears. The active paragraph, Q say, of the paragraph opening line itself is called the “mother” of P , and P is called a “daughter” of Q.
Paragraph reopening lines will not be discussed here, since they do not occur in the present text fragment. Paragraph closing lines have the form

−P

where P must be the name of the active paragraph. The lines following this paragraph closing line have as active paragraph the mother of P. The reflexive and transitive closure of the relation mother is called “ancestor”, the reflexive and transitive closure of daughter is “descendant”. Inside paragraphs the definienda must be distinct. Reference to definienda in ancestor paragraphs is (essentially) direct, reference to definienda in non-ancestor paragraphs is by “paragraph indicators”. These are characterized by two enclosing double quotes. We do not describe the referencing technique formally, but give a typical example. Consider the following text fragment:
+P
@ two := 1 + 1 : N
−P
  2 := two“.P” : N

Suppose the first line in the text fragment has active paragraph Q. The paragraph opening line changes the active paragraph to P. The paragraph Q is the mother of P, and hence is an ancestor of P. The identifier two is a definiendum in P. The paragraph closing line changes the active paragraph back to Q. The next line contains in its definiens the identifier two. This identifier is defined in paragraph P, which is not an ancestor of Q. By adjoining to this identifier a paragraph indication we obtain two“.P”, to be read as “the two from paragraph P”. More complicated paragraph indications are possible, but do not occur in the text fragment from “Real Analysis” which is shown. (iii) Skip lines serve for structuring the text by lay-out devices, and for communicating informal reminders, intuitions and considerations to human readers.
They are skipped in machine verification. There are two kinds of skip lines: comment lines and empty lines.
Comment lines have one of the three forms

‘comment’ (arbitrary text)
‘remark’ (arbitrary text)
‘heading’ (arbitrary text)

Comment lines with ‘heading’ are used for dividing the text into chapters and sections. The present fragment occurs in Chapter 10, titled “Partial functions of a real variable”. It contains Section 7 and part of Section 8 from this chapter. Comment lines with ‘comment’ and ‘remark’ refer to the AUT-II lines next to them, ‘comment’ referring forward and ‘remark’ backward. In Zucker’s original manuscript empty lines are indicated as follows: %. They are to be read as instructions to the typist for organizing the lay-out of the AUT-II text. Here these instructions have simply been followed.
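The paragraph mechanism of (ii), including the two“.P” style of reference, can be modelled in Python. This is a sketch of ours, not Zucker's implementation; the class `Book` and its method names are invented for illustration.

```python
class Book:
    """A minimal model of paragraph opening/closing and identifier lookup."""

    def __init__(self):
        self.active = ["ROOT"]      # stack: active paragraph and its ancestors
        self.defs = {}              # (paragraph, name) -> definiens

    def open(self, p):
        self.active.append(p)       # p becomes the active paragraph

    def close(self, p):
        assert self.active[-1] == p # may only close the active paragraph
        self.active.pop()

    def define(self, name, definiens):
        key = (self.active[-1], name)
        assert key not in self.defs # definienda distinct inside a paragraph
        self.defs[key] = definiens

    def lookup(self, name, paragraph=None):
        if paragraph is not None:   # explicit indicator: name".P"
            return self.defs[(paragraph, name)]
        for p in reversed(self.active):  # otherwise search the ancestors
            if (p, name) in self.defs:
                return self.defs[(p, name)]
        raise KeyError(name)

# The example from the text: two is defined inside P, then referenced
# outside P via the paragraph indicator two".P".
b = Book()
b.open("P")
b.define("two", "1 + 1")                 # two := 1 + 1 : N   (inside P)
b.close("P")
b.define("2", b.lookup("two", "P"))      # 2 := two".P" : N   (outside P)
```

Plain `lookup("two")` outside P fails, because P is not an ancestor of the active paragraph; only the indicated reference succeeds.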
2. SYNTACTIC USAGE

Now we discuss, guided by a few specific examples, the manner in which Zucker makes use of the syntax. We start out with the use of paragraphs. The paragraph structure follows to some extent the chapter structure of the text, in such a way that chapters which depend on earlier chapters correspond to paragraphs which are descendants of other paragraphs. Chapter 10, which contains the text fragment presented here, and which has as its subject matter partial functions of a real variable, starts, after some comment, by a paragraph opening line
+PF

This paragraph PF (obviously for “partial functions”) has among its ancestors the following paragraphs:

SYNT containing syntactic constructors derived from the basic AUT-SYNT constructors CAT, DOM, etc.
B containing basic concepts: logic, equality, sets, etc.
L on linear orders.
N on natural numbers.
CL on complete linear orders.
G on (abelian) groups.
F on fields.
ER on extended reals, i.e. Rl ∪ {∞, −∞}.
R on reals.
M on metric spaces.
The list above shows the structure of the text “Real Analysis” up to Chapter 10. In fact Zucker’s “main” paragraphs often coincide with his chapters. Another use that Zucker makes of the paragraph mechanism is storing proofs of theorems in paragraphs that are created especially for this purpose, as these proofs will (most probably) not be referred to, and only the theorems are of interest. A typical example is the paragraph 73 (the third paragraph inside Section 7) in our text fragment. Before the opening line of this paragraph, the context is extended to
[h : pfn] [d : Rl] [u : ¬(d) do(h)] ,

to be interpreted as: let h be a partial function from R to R, let d be a real number and suppose d does not belong to the domain of h. The goal is to prove that d does not belong to the domain of the derivative of h. Now first the paragraph 73 is opened and inside this paragraph the theorem is stated:

th := ¬(d) do(der.fn)

The type of th is omitted as this type is clearly prop. Note that, according to the convention on omitting variables, der.fn should be interpreted as der.fn(h). Then we see a number of lines which together form the proof of the theorem. The last of these lines has the definiendum pf, obviously meaning proof, and as its definiens an object of type th, that is, in the “proofs as objects” interpretation, a proof of the theorem. Then paragraph 73 is closed and outside the paragraph the theorem is restated, now with a suggestive name. In words it would run as follows: if we have a partial function, a real number and a proof that the real number does not belong to the domain of the function, then we may conclude that the number does not belong to the domain of the derivative. The theorem (or rather its proof) now gets the illustrative name not.do.so.not.do.der.fn, suggesting “a not in the domain of f, so a not in the domain of the derived function of f”. The definiens comes from inside the paragraph, and is indicated by a paragraph indication:
not.do.so.not.do.der.fn := pf“.73” : ¬(d) do(der.fn)
After this line the contents of paragraph 73 can be forgotten. A shorter way of proceeding is demonstrated in paragraph 71 (the first paragraph of Section 7). Again, before opening the paragraph a relevant context is built:

[h : pfn] [d : Rl] [e : Rl] [u : (e) der(d, h)] ,
to be interpreted: let h be a partial function, d and e be reals, and suppose that e is the derivative of h at d. Now paragraph 71 is opened, but instead of stating a theorem, only a proof is given, in three lines with definienda l1, l2 and l3 respectively, l2 proving (in the proofs as objects interpretation) that def(derw(d, h)) and l3 that e = jw(derw(d, h)). Then the paragraph is closed and two theorems are extracted, by using the lines of l2 and l3 inside the paragraph. The first is

do.der.fn := l2“.71” : (d) do(der.fn)
The name do.der.fn is mnemonic for “(concerning the) domain of the derived function”. Apparently (d) do(der.fn) definitionally equals def(derw(d, h)). Note again that der.fn should be interpreted as der.fn(h). The second theorem is

va.der.fn := sy(l3“.71”) : (der.fn ; d) = e

Here sy stands for symmetry of equality. Now apparently der.fn ; d is definitionally equal to jw(derw(d, h)).
3. CONTENTS OF THE PRECEDING TEXT

The text fragment is not self-contained. There are, by way of the identifiers that are used, many references to the preceding text. In order to make it accessible nevertheless, we will briefly comment on some of the relevant parts of the AUT-II book preceding Section 7 of Chapter 10. For the description of the underlying logic and of the treatment of the reals we refer to [Zucker 77 (A.4)] in this Volume. Here we will focus on the way partial functions are formalized, on neighbourhoods and nearness predicates and on limits. Occasionally we cite from Zucker’s text and especially from his comments and remarks.
To keep this section readable, we have not tried to include all identifiers that might trouble the reader. Some have been covered already in the previous section. In Section 5, after the synopsis, we list and discuss some other key identifiers from the preceding text that occur in the fragment but have not yet found a place in this introduction.
Partial functions. The type Rl of real numbers is extended by considering the disjoint sum Rw of Rl with the type which has as its only element the object w (to be interpreted as “the undefined object”). The natural injection of Rl into Rw is iw, so if a : Rl then iw(a) : Rw. The image of w in Rw is w_. If b : Rw then def(b) is a proposition which is provably equivalent to b ≠ w_. As a matter of fact, def(w_) is definitionally equal to ⊥ (contradiction), and def(b) is definitionally equal to T (truth) for b other than w_.

Then jw is the mapping from Rw to Rl such that jw(w_) is definitionally equal to 0, and jw(iw(a)) is definitionally equal to a (for any a : Rl). On the other hand, if b : Rw and u : def(b) then it can be proved that iw(jw(b)) = b (but here we do not have definitional equality!). The type pfn is defined as Rl → Rw. If f : pfn then the domain of f is the predicate do on Rl, defined as follows:

@ [f : pfn]
do := [x : Rl] def((x) f) : pred(Rl)
The type pred(Rl), denoting the predicates on the reals, is defined as the product type Π([x : Rl] prop); see [Zucker 77 (A.4)]. If moreover a : Rl then the value of f at a, considered as a real, is denoted by f ; a (where ; is used as an infix symbol).
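The machinery of Rw, iw, jw, def and partial functions can be sketched in Python. The encoding is ours (w is modelled as `None`, iw(a) as a tagged tuple); it illustrates the stated definitional equalities, not the AUT-II formalization itself.

```python
W = None                         # the undefined object w

def iw(a):
    """Injection Rl -> Rw."""
    return ("iw", a)

def defined(b):
    """def(b): b differs from w."""
    return b is not W

def jw(b):
    """jw(w) = 0 and jw(iw(a)) = a, as stated in the text."""
    return 0 if b is W else b[1]

def do(f, x):
    """(x) do(f): the pfn f is defined at x."""
    return defined(f(x))

def value(f, x):
    """f ; x, the value of f at x considered as a real."""
    return jw(f(x))

# A pfn (Rl -> Rw): the reciprocal, undefined at 0.
recip = lambda x: W if x == 0 else iw(1 / x)
```

Note that `jw(W) == 0` mirrors the convention that jw maps w_ to 0, so `value` is total even where the partial function is undefined, just as f ; a is.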
Note that our remarks above, concerning definitional equality of (d) do(der.fn) and def(derw(d, h)), and of der.fn ; d and jw(derw(d, h)) respectively, can now be derived by inspection of the definition of der.fn.
Nearness. The subject of limits is prepared in a section on “nearness”. Zucker begins by taking a point a : Rl and defining the concepts of “neighbourhood” and “punctured neighbourhood” with centre a and (positive) radius p as predicates on the reals:
@ [a : Rl] [p : Rp]
nb := [x : Rl] (mo(x + −a) ≤ proj1(p)) : pred(Rl)
nd := [x : Rl] ((x) nb ∧ x ≠ a) : pred(Rl)
Here Rp is the type of positive reals, that is, the Σ-type Σ([p : Rl] (p > 0)). So proj1(pair(p, u)) definitionally equals p. And mo obviously denotes the modulus. Finally, “−” is a prefix symbol denoting the additive inverse, so x + −a means x − a in usual notation. Let P : pred(Rl). We want to define the propositions “P holds near a” (i.e. in some punctured neighbourhood of a), “P holds at and near a” (i.e. in some neighbourhood of a) and also, for uniformity of treatment, “P holds at a” (i.e. (a) P).

a @ [P : pred(Rl)]
at := (a) P : prop
near.pred := [x : Rp] ∀([y : Rl] ((y) nd(x) → (y) P)) : pred(Rp)
near := ∃(near.pred) : prop
at.near := ∃([x : Rp] ∀([y : Rl] ((y) nb(x) → (y) P))) : prop

We introduce a parameter for a partial function f:

a @ [f : pfn]

Now we define some more propositions:

def.at := at(do(f))
def.near := near(do(f))
def.at.near := at.near(do(f))
These say that f is defined at a, or near a, or at and near a (respectively). Apart from the definitions above, this section contains other definitions and theorems concerning the defined concepts. For our text, and for the concept of limit, the following definition is important.
f @ [b : Rl] [p : Rp]
near.nb := near([x : Rl] ((x) do(f) ∧ (f ; x) nb(b, p))) : prop

The proposition near.nb expresses that near a the values of f are in the p-neighbourhood of b.
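The nearness predicates lend themselves to a small computational sketch. The genuine definitions quantify over all positive radii and all reals; the Python version below (our own crude finite approximation, with hypothetical names `nb`, `nd`, `near`) checks only a finite list of radii and sample points.

```python
def nb(a, p):
    """Neighbourhood with centre a and radius p > 0, as a predicate."""
    return lambda x: abs(x - a) <= p

def nd(a, p):
    """Punctured neighbourhood: the neighbourhood minus the centre."""
    return lambda x: nb(a, p)(x) and x != a

def near(a, P, radii=(1.0, 0.1, 0.01, 0.001)):
    """'P holds near a': some punctured neighbourhood of a lies inside P.

    The existential over Rp and the universal over Rl are approximated by
    a finite list of radii and a finite set of sample points.
    """
    def holds_on(p):
        samples = [a + t * p for t in (-1.0, -0.5, -0.1, 0.1, 0.5, 1.0)]
        return all(P(y) for y in samples)
    return any(holds_on(p) for p in radii)

positive = lambda x: x > 0
```

Positivity holds near 1 (a punctured neighbourhood of radius 0.1 suffices) but not near 0, where every punctured neighbourhood contains negative points.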
Limits. Then follows a section entitled “Limits”. The context [a : Rl] [f : pfn] of Section 3 is kept here. We first define the predicate of “being a limit, at point a, of the partial function f”:
f @
lim := [b : Rl] ∀([ε : Rp] near.nb(b, ε)) : pred(Rl)
Note that the predicate lim can be satisfied only if a punctured neighbourhood of a lies within the domain of f. There are proofs that the limit is unique:

unq.lim := ... : unq(lim)
(where unq is the unicity quantifier). Then limw is defined, being the (unique) limit if the predicate lim holds for some real number, and w_ otherwise:

limw := ... : Rw

satisfying

[u : ∃(lim)]
limw1 := ... : (jw(limw)) lim

and

[u : ∀([x : Rl] ¬((x) lim))]
limw2 := ... : limw = w_
Moreover, going back to the context [a : Rl] [f : pfn] [b : Rl], we have

b @ [u : (b) lim]
lim.so.limw := ... : iw(b) = limw

Finally

f @ [u : def(limw)]
limw.def.so.lim := ... : (jw(limw)) lim
It should by now not be too difficult to understand other properties of limits occurring in the fragment from their names and use.
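The behaviour of limw (the limit when it exists, w otherwise) can be caricatured numerically. This is a finite-precision sketch of ours, not the AUT-II definition: `limw` here accepts the limit when samples of f approaching a from both sides settle down, encoding iw(b) as a tagged tuple and w as `None`.

```python
import math

def limw(f, a, eps=1e-6):
    """Numerical stand-in for limw: ("iw", b) when the samples of f near a
    agree to within eps, and None (playing the role of w) otherwise."""
    hs = [10.0 ** -k for k in range(3, 8)]
    vals = [f(a + h) for h in hs] + [f(a - h) for h in hs]
    b = vals[-1]
    if all(abs(v - b) < eps for v in vals):
        return ("iw", b)
    return None   # w: the limit does not exist

# sin(x)/x has limit 1 at 0; the sign function has no limit at 0.
sinc = lambda x: math.sin(x) / x
sign = lambda x: 1.0 if x > 0 else -1.0
```

The two test cases mirror limw1 and limw2: where the limit predicate is satisfied the result is an iw-value, and where it never holds the result is w.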
4. SYNOPSIS OF THE TEXT FRAGMENT

The text fragment starts with Section 7 of Chapter 10, “Differentiation”, containing the definition and some elementary properties of differentiation. It starts out by defining, in context [a : Rl] [f : pfn] [d : Rl] [u : d ≠ a], the difference quotient dq. Then dq.fn : pfn is defined in context [a : Rl] [f : pfn] as the partial function that assigns to any x for which it is defined the difference quotient dq(a, f, x), or rather iw(dq(a, f, x)), its injection in Rw. (The definition uses def.pfn.g. For a description of this see Section 5 below.) Then the predicate of being the derivative of f at a is defined as the limit of dq.fn at a:
f @
der := lim(dq.fn) : pred(Rl)
On the context [h : pfn] the derivative of h can now be given as a partial function:

h @
der.fn := [x : Rl] derw(x, h) : pfn
where derw(x, h) stands for limw(x, dq.fn(x, h)). These definitions are followed by a few immediate consequences, mainly on the domain of the derivative in relation to the domain of the original function (paragraphs 71-73). Subsequently the context [a : Rl] [f : pfn] is extended with the assumption that f is differentiable at a, with b as the value of the derivative:

f @ [b : Rl] [b.der.a.f : (b) der]
Under this assumption, it is proved in paragraph 74 that f is defined at and near a, and in paragraph 75 that f must be continuous at a. This ends Section 7. Section 8 is devoted to finding the derivatives of certain functions and giving rules for differentiation. We outline the contents of the part of Section 8 that is included in the selection. The derivative of a constant function is computed. In particular, first, in paragraph 81, it is established that the derivative of a constant function at a is 0:
a @ [c : Rl]
th := (0) der(con.fn(c)) : prop
Subsequently, this result is used in paragraph 82 to compute the derivative of the constant function as a function. th
:= der.fn(con.fn(c) = con .fun(0))
: prop
The same pattern is followed for the case of the identity function (paragraphs 83, 84). Finally, paragraph 85 is devoted to computing the derivative of the sum of two partial functions as the sum of the derivatives. That is, on the extended context

b.der.a.f @ [g : pfn] [c : Rl] [c.der.a.g : (c) der(g)]

the following theorem is proved:

th := (b + c) der(sum.fn(f, g)) : prop
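The synopsis above can be mirrored numerically: a difference quotient, a derived function that is undefined where the difference quotients have no limit, and the sum rule of paragraph 85. This Python sketch is our own finite-precision illustration; `dq` and `der_fn` are hypothetical stand-ins for dq and der.fn, with `None` playing the role of w.

```python
def dq(f, a, d):
    """The difference quotient of f between a and d (requires d != a)."""
    return (f(d) - f(a)) / (d - a)

def der_fn(f, a, eps=1e-2):
    """Numerical stand-in for der.fn: the value of the derivative at a when
    the difference quotients settle down, None (i.e. w) otherwise."""
    hs = [10.0 ** -k for k in range(3, 7)]
    vals = [dq(f, a, a + h) for h in hs] + [dq(f, a, a - h) for h in hs]
    b = sum(vals) / len(vals)
    if all(abs(v - b) < eps for v in vals):
        return b
    return None

square = lambda x: x * x
# sum rule (paragraph 85): the derivative of f + g is der f + der g
f_plus_g = lambda x: square(x) + 3.0 * x
```

At a point of non-differentiability, such as the absolute value at 0, the difference quotients from the two sides disagree and `der_fn` returns the undefined object, just as der.fn ; d is undefined there.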
5. OTHER PRECEDING NOTIONS

We conclude this introduction by listing some more notions from the preceding text, mostly in the order in which the corresponding identifiers appear in the fragment. The identifiers given here, in addition to the ones discussed in the previous sections, still do not form a complete list. We expect that the definitions of the missing identifiers can be guessed from their use and from Zucker’s practice of choosing suggestive names.

qu
stands for “quotient”. It requires three arguments: two real numbers and a proof that the second is nonzero.

dif.nz
proves that the difference of two distinct reals is nonzero. In fact this is an example of the use of AUT-SYNT. Zucker derives first in context [a : Rl] [b : Rl] [u : a ≠ b]

pf121 := ... : (a + −b) ≠ 0

Then he defines in context [z1 : synt]

dif.nz := pf121(LASTELT(PREPART(TAIL(≠, CAT(z1)))), LASTELT(TAIL(≠, CAT(z1))), z1)

Now if v : p ≠ q, then dif.nz(v) : (p + −q) ≠ 0.

def.pfn.g
In context [Pr : pred(Rl)] [F : Π([x : Rl] Π([u : (x) Pr] Rl))] we have def.pfn.g : pfn. The partial function def.pfn.g has the same value as F at points where Pr holds and is undefined otherwise. Note that, literally speaking, F is a function of two arguments: a real x and a proof of (x) Pr. However, since in AUT-II we have irrelevance of proofs (see [Zucker 77 (A.4)]), the value of F depends only on the first argument.

do.pfn.g
This identifier and the following substantiate the just mentioned properties of def.pfn.g. First, if the context above is extended with [d : Rl] [u : (d) Pr], then d is in the domain: do.pfn.g : (d) do(def.pfn.g(Pr, F)).
va.pfn.g
And, secondly, the value is as said: va.pfn.g : (def.pfn.g ; d) = (u) (d) F.
not.do.pfn.g
If, on the other hand, the context is extended by [d : Rl] [u : ¬(d) Pr], then not.do.pfn.g : ¬(d) do(def.pfn.g(Pr, F)).
su
is for “substitutivity”: if P is a predicate, a = b and (a) P holds, then so does (b) P. In the definition of su, again, AUT-SYNT is used. First, in context [α : type] [P : Π([x : α] prop)] [a : α] [b : α] [u : a = b] [v : (a) P], we have the axiom

ax34 := PN : (b) P

Then, in context [z1 : synt] [z2 : synt] [z3 : synt], we have

su := ax34(DOM(CAT(z1)), z1, LASTELT(PREPART(TAIL(=, CAT(z2)))), LASTELT(TAIL(=, CAT(z2))), z2, z3)

Now if P : Π([x : α] prop), u : c = d and v : (c) P (where α : type, c : α and d : α), then su(P, u, v) : (d) P.
defd
is the “abstracted version of def”, i.e. the predicate [x : Rw] def(x).
pf.T
is a proof of T (denoting truth and defined by T := ⊥ → ⊥). In fact pf.T := [x : ⊥] x.
ap
is for “application of a function to equal arguments”: if f is a function and a = b, then (a) f = (b) f.
jwab
is the “abstracted version of jw”, i.e. the function [x : Rw] jw(x).

sy
is for symmetry of equality.
tr3
is for transitivity of 3 equalities: if a = b, b = c and c = d, then a = d. (Note that both sy and tr3 use AUT-SYNT.)
never
is for “always not”. Temporal adverbs are used as identifiers for quantifiers. If P is a predicate on Rl, say, then always(P) is Π([x : Rl] (x) P), and never(P) is Π([x : Rl] ¬((x) P)).
‘imp’
is implication between predicates. In fact, if P and Q are predicates on Rl, say, then

P ‘imp’ Q := [x : Rl] ((x) P → (x) Q) ,

and imp(P, Q) is the universal quantification of P ‘imp’ Q, that is

imp(P, Q) := Π([x : Rl] ((x) P → (x) Q)) .

(Zucker uses similar notations for other connectives, e.g.

P ‘and’ Q := [x : Rl] ((x) P ∧ (x) Q) ,
and(P, Q) := Π([x : Rl] ((x) P ∧ (x) Q)) .)
2.neg
is the double negation law.

ex.i
is for “introduction of the existential quantifier” (by producing a witness).

ex.e
is for “elimination of the existential quantifier”: if ∃([x : Rl] (x) P) and ∀([x : Rl] ((x) P → c)), then c.
hf
stands for “half”, the real number 1/2 (with type Rl).

cont
is continuity of f at a, defined in context [a : Rl] [f : pfn] by

cont := (a) do(f) ∧ (f ; a) lim(a, f) .

fn.pl.con
stands for “function plus constant”. If f : pfn and a : Rl then fn.pl.con(f, a) is the function with the same domain as f, and with value (f ; x) + a for x in that domain. Formally, if u : (b) do(f) then

do.fn.pl.con(f, a, b, u) : (b) do(fn.pl.con(f, a))

and

va.fn.pl.con(f, a, b, u) : (fn.pl.con(f, a) ; b) = ((f ; b) + a) .
ii.eq2
if f : pfn and g : pfn, then ii.eq2(f, g) is the predicate expressing pointwise definedness and equality. That is: (x) ii.eq2(f, g) means that x is in the domain of both f and g, and that f and g are equal at x.
ap2
is for “application of a binary function to equal arguments”: if f is a binary function, a = c and b = d, then (b) (a) f = (d) (c) f.

ti
is the product function [x : Rl] [y : Rl] (x × y).
div.ti
(for “division” followed by “times”) proves that, for q ≠ 0, qu(p, q, v) × q = p. Here, again, a notable use is made of AUT-SYNT. First, on context [a : Rl] [b : Rl] [u : b ≠ 0], Zucker derives

pf27 := ... : qu(a, b, u) × b = a .

Then, on context [z1 : synt] [z2 : synt],

div.ti := pf27(z1, LASTELT(PREPART(TAIL(≠, CAT(z2)))), z2) .

It follows that if p : Rl and v : q ≠ 0, then

div.ti(p, v) : qu(p, q, v) × q = p .
triple
is defined by triple(a, b, c) := pair(pair(a, b), c).
elsewhere
if a : Rl and P is a predicate on Rl, then

elsewhere(a, P) := ∀([x : Rl] ((x ≠ a) → (x) P)) .

near.so.near.mod
proves that, for predicates P and Q, if elsewhere(P ‘imp’ Q) and near(P), then near(Q). Note that both “elsewhere” and “near” refer to a : Rl, which is not mentioned explicitly here, but should be taken from the context.
near.eq2
near.eq2(f, g) definitionally equals near(ii.eq2(f, g)). So near.eq2(f, g) means that f and g are both defined near a, and that f and g are pointwise equal near a.
ti.0
stands for “times zero”, and proves a × 0 = 0.

0.pl
similarly, stands for “zero plus”, and proves 0 + a = a.

do2
stands for “the intersection of 2 domains”, i.e. do2(f, g) := do(f) ‘and’ do(g).

co
stands for “convergence of equality”, i.e. if a = c and b = c then a = b.
6. THE TEXT FRAGMENT

‘heading’ 7. DIFFERENTIATION
‘comment’ We return to the context “[a:Rl][f:pfn]”. To begin with, we define the “difference quotient condition” (a predicate on the reals),

‘comment’ and (in an extended context) the “difference quotient”:

[d:Rl][u:d≠a]
dq := qu((f;d)+−(f;a), d+−a, dif.nz(u)) : Rl
‘comment’ Then we define the “difference quotient function, at a, of f” (by the second of the two general methods given in Sec. 2).

f @
dq.term := [x:Rl][y:(x)dq.cond] dq(x,proj1(y))
dq.fn := def.pfn.g(dq.cond,dq.term) : pfn

d @ [u:d≠a][v:(a)do(f)][w:(d)do(f)]
do.dq.fn := do.pfn.g(dq.cond,dq.term,d,pair(u,pair(v,w))) : (d)do(dq.fn)
va.dq.fn := va.pfn.g(dq.cond,dq.term,d,pair(u,pair(v,w))) : (dq.fn;d)=dq(u)
d @ [u:¬(d)dq.cond]
not.do.dq.fn := not.do.pfn.g(dq.cond,dq.term,d,u) : ¬(d)do(dq.fn)
‘comment’ Now we can define a “derivative, at a, of f” (as a predicate on Rl) to be a limit, at a, of the difference quotient function of f at a.

f @
der := lim(dq.fn) : pred(Rl)
‘comment’ The derivative is unique,

unq.der := unq.lim(dq.fn) : unq(der)
‘comment’ and so we can define it as an object of type Rw,

derw := limw(dq.fn) : Rw
‘comment’ satisfying:

[u:∃(der)]
derw1 := limw1(dq.fn,u) : (jw(derw))der
f @ [u:never(der)]
derw2 := limw2(dq.fn,u) : derw=w_

f @ [b:Rl][u:(b)der]
der.so.derw := lim.so.limw(dq.fn,b,u) : iw(b)=derw

f @ [u:def(derw)]
derw.def.so.der := limw.def.so.lim(dq.fn,u) : (jw(derw))der
‘comment’ Now we can define the “derived function” of a partial function h as the partial function which has as its value the derivative of h, at any point where this exists, and which is undefined otherwise.
@ [h:pfn]
h @
der.fn := [x:Rl] derw(x,h) : pfn

[d:Rl][e:Rl][u:(e)der(d,h)]
+71
l1 := der.so.derw(d,h,e,u) : iw(e)=derw(d,h)
l2 := su(defd,l1,pf.T) : def(derw(d,h))
l3 := ap(jwab,l1) : e=jw(derw(d,h))
-71
do.der.fn := l2“.71” : (d)do(der.fn)
va.der.fn := sy(l3“.71”) : (der.fn;d)=e
d @ [u:never(der(d,h))]
+72
l1 := derw2(d,h,u) : derw(d,h)=w_
l2 := w.so.not.def(derw(d,h),l1) : ¬def(derw(d,h))
-72
not.do.der.fn := l2“.72” : ¬(d)do(der.fn)
‘comment’ A corollary of the last result is:

d @ [u:¬(d)do(h)]
+73
th := ¬(d)do(der.fn)

[v:∃(der(d,h))]
l1 := derw1(d,h,v) : (jw(derw(d,h)))der(d,h)
l3 := ... : (d)do(h)

u @
pf := not.do.der.fn([1×]l3) : th
-73
not.do.so.not.do.der.fn := pf“.73” : ¬(d)do(der.fn)
‘comment’ We return again to the context “[a:Rl][f:pfn]”, and assume now that f is differentiable at a; in fact, we assume that we have explicitly given a derivative, at a, of f.

f @ [b:Rl][b.der.a.f:(b)der]
‘remark’ This context (or an extension of it) will be used in most of the rest of this chapter.

‘comment’ The first interesting thing we can say (in this context) is that f is defined at and near a.

+74
th1 := def.at
th2 := def.near
th3 := def.at.near
l1 := lim.so.def.near(dq.fn,b,b.der.a.f) : def.near(dq.fn)

[δ:Rp][u:imp(nd(δ),do(dq.fn))][d:Rl][v:(d)nd(δ)]
l2 := (v)(d)u : (d)do(dq.fn)

[w:¬(d)dq.cond]
l3 := (l2)not.do.dq.fn(d,w)

v @
l4 := 2.neg([1×]l3) : (d)dq.cond
l5 := proj1(proj2(l4)) : (a)do(f)
l6 := proj2(proj2(l4)) : (d)do(f)

u @
l7 := [2×]l6 : imp(nd(δ),do(f))
l8 := ex.i(near.pred(do(f)),δ,l7) : th2
d1 := a+(proj1(δ)×hf)
l9 := in.nd(δ)
l10 := l5(d1,l9)

b.der.a.f @
pf1 := ex.e(l1,[2×]l10) : th1
pf2 := ex.e(l1,[2×]l8) : th2
pf3 := at.and.near.so.at.near(do(f),pf1,pf2) : th3
-74
der.so.def. at := pfl“.74”
: def.at
der.so.def. near := pf2“.74”
: def.near
def.so.def at.near := pf3“.74”
‘comment’ Further, f is continuous at a.
+75 b.der.a.f @
th := cont dif.fn := fn.pl.con(id.fn,-a) prod := prod.fn(dq.fn,diff.fn) new.fn := fn.pl.con(prod,f;a)
:
def.at.near
‘remark’ Then (for (x)do(new.fn)): (new.fn;x)=(((dq.fn;x)x(x+-a))+(f;a)). [editor’s comment: So (new.fn;x)=(f;x). This is proved in l17 below.] p1 := do(f) p2 := ii.eq2(f,new.fn) := der.so.def.at 12 := der.so.def.near
11
: (a)do(f) : near(p1)
[d:Rl][u:d#a] [v:(d)p 11
l3 := do,id.fn(d) l4 := do.fn.pl.con(id.fn,-a,d,l3) 15 := va.fn.pl.con(id.fn,-a,d,l3) 16 := do,dq.fn(d,u,ll ,v) 17 := va.dq.fn(d,u,ll,v) 18 := do.prod.fn(dq.fn,dif.fn,d, 16514) 19 := va.prod.fn(dq.fn,dif.fn,d, 16 914 )
(d)do(id.fn) (d)do(dif.fn) (dif.fn;d)=(d+-a) (d)do(dq.fn) : (dq.fn;d)=dq(d,u)
: : : :
:
: (prod;d)=((dq.fn;d)x
:
(dif.fn;d)) (d)do(new.fn) (new.fn;d)= ((prod;d)+(f;a) ((dq.fn;d)x (dif.fn;d))= (dq(d4)x (d+-a)) (dq(d,u)x(d+-a))= ( (f;d)+- (f;a)) (prod;d)=((f;d)+- (Ca)) ((prod;d)+(f;a))= ( ( ( W + - (f;4 )+(f;a) 1 (((f;d)+-(f;a))+(f;a))= (f;d) (new.fn;d)=(f;d) (d)P2
:
elsewhere(pl‘imp’p2)
do.fn.pl.con(prod,f;a,d,ls) va.fn.pl.con(prod,f;a,d,lg)
:
111 :=
112 :=
ap2(ti,17,15)
:
113 :=
div.ti((f;d)+-(f;a),dif.nz(u))
:
114 :=
tr3(19,112,113) ap(pl.Ri(f;a),ll4)
:
ll0 :=
115 :=
:
:
116 := plmi.pl(f;d,f;a)
:
tr3(111,115,116) 11s := triple(v,llo,sy(ll7))
:
117 :=
119 := [3x]11g b.der.a.f @I 120 := near.so.near.mod(pl,p2, 119912)
(d)do(prod)
: near.eq2(f,new.fn)
127
1im.id.fn lim.fn.pl.con(id.fn,a,l21 ,-a) ~u(lim(dif.fn),pl.mi(a),l22) lim.prod.fn(dq.fn,b, b.der.a.fldif.fn,0,123) := su(lim (prod),ti.a(b) ,124) := lirn.fn.pl.c0n(prod,0,12~,f;a) := su(lim(new.fn),0.pl(f;a),126)
128
:= near.eq2.so.same.lim(fl
:= := 123 := 124 := 121
122
125 126
new.fn,l2o,f;a,l27)
: (a)lim(id.fn) : (a+-a)lim(dif.fn) :
(O)lim(dif.fn)
(b x O)lim(prod) (O)lim(prod) : (O+(f;a))lim(new.fn) : (f;a)lim(new.fn) : :
:
(f;a)lim(f)
:
th
-75
der.so.cont := pf“.75”
: cont
‘heading’ 8. RULES FOR DIFFERENTIATION ‘comment’ We compute the derivative, at the point a, of certain partial functions. (In some cases, we will also give an expression for the derived function of the given partial function.) First, the constant function. a @ [c:Rl] +81
th := (O)der(con.fn(c)) con := con.fn(c) dqf := dq.fn(con) 11 :=
do.con.fn(c,a)
: (a)do(con)
[d:R1][u:d#a] do.con.fn(c,d) := do.dq.fn(con,d,u,ll,12)
12 := 13
(d)do(con) : (d)do(dqf) :
iv.dif := iv(d+-a,dif.nz(u))
17 := tr3(14,15,16)
(dqf;d)= ((c+-c) xiv.dif) : ((c+-c)xiv.dif)= (0x iv.dif) : (Oxiv.dif)=O : (dqf;d)=O
la := pair(13,17)
: (d) ii .eq (dqf,0)
14 := 15
:= ap(ti.Ri(iv.dif),pl.mi(c))
I6 :=
a
@
va.dq.fn(con,d,u,ll,12)
19 :=
@.ti(iv.dif)
[2X]la
pf := elsewhere.con.so.lim(dqf, 0,191
:
: elsewhere.eq(dqf,O)
:
th
-81
der.con.fn := pf“.81”
: (O)der(con.fn(c))
‘comment’ As a corollary we have the derived function of a constant function. @ [c:R1] +82 th := der.fn(con.fn(c))=con.fn(0) con := con.fn(c) derc := der.fn(con)
[d:Rl] (O)der(con) (d)do(derc) : (derc;d)=O : (d)do(con.fn(0)) : (d)ii.eq2(derc, con.fn( 0)) :
12 :=
:
13 14 15
c @
der.con.fn(d,c) da.der.fn(con,d,O,ll) := va.der.fn(con,d,O,ll) := do.con.fn(0,d) := triple(l2,14,13)
11 :=
pf := eq.tot.fn(derc,con.fn(O),[l~]l~): th
-82 der.fn.con.fn := pf “.82”
: der.fn(con.fn(c))=
con.fn(0)
‘comment’ Next, the derivative of the identity function at a, +83
a @
th := (l)der(id.fn) dqf := dq.fn(id.fn) 11 :=
do.id.fn(a)
:
(a)do(id.fn)
[d:Rl][u:d#a]
a @
12 := do.id.fn(d) l3 := do.dq.fn(id.fn,d,u,l1,12) l4 := va.dq.fn(id.fn,d,u,ll,lZ) 15 := ti.iv(dif.nz(u)) 16 := pair(l3,tr(l4,15))
: (d)do(id.fn)
(d)do(dqf) (dqfid)=dq(id.fn,d,u) : dq(id.fn,d,u)=l : (d)ii .eq(dqf, 1)
pf := elsewhere.con.so,lim(dqf, 1 ,x 1~1 ~ )
: th
: :
-83
der.id.fn := pf“.83”
‘comment’ and, again, the derived function.
:
(l)der(id.fn)
+84
@
th := der.fn(id.fn)=con.fn(l) der.fn := der.fn(id.fn)
a @
@
der.id.fn do.der.fn(id.fn,a,l,ll) l3 := va.der.fn(id.fn,a,l,ll) 14 := do.con.fn( 1,a) l5 := triple(12,14,13) 11 :=
:
12 :=
: : : :
pf := eq.tot.fn(der.fn,con.fn(l), [1X 115)
(l)der(id.fn) (a)do(der.fn) (der.fn;a)=l (a)do(con.fn(l)) (a)ii.eq2(der.fn, con.fn( 1))
: th
-84
der.fn.id.fn := pf“.84”
: der.fn(id.fn)=con.fn(l)
’comment’ The derivative of the sum function of two partial functions. (Note: we return to the context “[a:Rl][f:pfn][b:Rl][b.der.a.f:(b)der(f)]” and extend it.) b.der.a.f @ [g:pfn] [c:Rl][c.der.a.g: (c)der(g)] +85
th := (b+c)der(sum.fn(f,g))
‘remark’ The idea in this proof (and other proofs in this section) is to define a partial function which is defined and equal to the difference quotient function under consideration near a, and for which the limit at a can be computed by the rules for limits in Sec. 6. sum dqf dqg dqs sdq
:= := := := :=
sum.fn(f,g) dq.fn(f) dq.fn(g) dq.fn(sum) sum.fn(dqf,dqg)
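The fact proved here — that near a the difference quotient of the sum equals the sum of the difference quotients, so its limit at a is b+c — can be illustrated numerically. The Python sketch below uses our own names and is not the Automath proof:

```python
def dq(fn, a, d):
    """Difference quotient (fn(d) - fn(a)) / (d - a), assuming d != a."""
    return (fn(d) - fn(a)) / (d - a)

f = lambda x: x * x   # derivative at a = 3 is b = 6
g = lambda x: 2 * x   # derivative at a = 3 is c = 2
a, d = 3.0, 3.0 + 1e-7

s = lambda x: f(x) + g(x)
# Near a, the quotient of the sum is the sum of the quotients ...
assert abs(dq(s, a, d) - (dq(f, a, d) + dq(g, a, d))) < 1e-6
# ... so its limit at a is b + c = 8.
assert abs(dq(s, a, d) - 8.0) < 1e-4
```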
‘remark’ We want to prove “near.eq2(dqs,sdq)”, since the limit of the partial function sdq at a can be computed. p1 := do2(f,g) p2 := ii.eq2(dqs,sdq) 11 := 12 :=
13 := 14 :=
der.so.def.at(f,b,b.der.a.f) der.so.def.at(g,c,c.der.a.g) do.sum.fn(f,g,a,ll,l2) va.sum.fn(flg,a,ll,12)
: (a)do(f) : (a)do(g) : (a)do(sum) : (sum;a)=(f;a)+(g;a)
l5 := der.so.def.near(f,b,b.der.a.f) : near(do(f)) 16 := der.so.def.near(g,c,c.der .a.g) : near( do(g) ) 17 := near.both(do(f),do(g),l5,16) : near(p1) [d:Rl][u:d#a][v:(d)pl] projl(v) : (d)do(f) proj2(v) : (d)do(g) : (d)do(sum) 110 := do.sum.fn(f,g,d,l8,19) 111 := va.sum.fn(f,g,d,l&) : (sum;d)=( (f;d) +(g;d) ) : (d)do(dqf) 112 := do.dq.fn(f,d,u,ll,le) 113 := va.dq.fn(fld,u,ll,18) : (dqf;d)=dq(f,d,u) 114 := do.dq.fn(g,d,u712,19) : (d)do(dqg) lI5 := va.dq.fn(g,d,u,l2,19) : (dqg;d)=dq(g,d,u) 116 := do.dq.fn(sum,d,u,l3,110) : (d)do(dqs) 117 := va.dq.fn(sum,d,u,l3,110) : (dqs;d)=dq( sum,d,u) 118 := do.sum.fn(dqfldqg,d,112,1~~) : (d)do(sdq) 119 := va.sum.fn(dqf,dqg,d,l12,114) : (sdq;d)= ((dqf;d)+ (dqg;d)) 18 :=
19 :=
dif.f := ((f;d)+-(f;a)) dif.g := ((g;d)+-(g;a)) dif.sum := (sum;d)+- (sum;a) 120
:= ap2(dif,lll,l4)
: dif.sum=(((f;d)+(g;d))+-
((f;a)+(g;a)))
dif.sum.is.sum.difs(f;d,g;d, f;a,g;a) := tr(120~121)
121 := 122
: dif.sum=(dif.f+dif.g)
q l := dif.nz(u) iv.dif := iv(d+-a,ql)
: (d+-a)#O
123 :=
ap(ti.Ri(iv.dif),lzz)
: dq(sum,d,u)=
124 :=
Ri.dist(dif.f,dif.g,iv.dif)
: ((dif.f+dif.g) xiv.dif)=
((dif.f+dif.g) xiv.dif) (dq(f,d,u)+dq(g,d,u))
c.der.a.g @
129 :=
[3x]lzs
: elsewhere(pl‘imp’p2)
130 :=
near.so.near.mod( p l ,p2, 129’17)
:
near.eql(dqs,sdq)
lim.sum.fn(dqf,b,b.der.a.f, dqg,c,c.der.a.g)
:
(b+c)lim(sdq)
:
th
131 :=
pf := near.eq2.so.same.lim(dqs, ~dq,l3o,b+~,l3i)
-85 der.sum.fn := pf“.85”
: (b+c)der(sum.fn(f,g))
Checking Landau’s “Grundlagen” in the Automath System
Appendices 3 and 4 (The PN-lines; Excerpt for “Satz 27”)

L.S. van Benthem Jutting

APPENDIX 3. THE PN-LINES FROM THE PRELIMINARIES
+L *A A*B B * IMP 1 * CON A * NOT A * WEL A* W 2 W*ET B*EC B * AND *SIGMA SIGMA * P P * ALL P * NON P * SOME
._ .._
-----
:= [X,A]B
..-
PN
:= IMP(C0N) :=NOT(NOT(A))
.-._ ._ .-
---
PN
:=IMP(A,NOT(B))
:= NOT(EC(A,B))
.._ ._
---
---
:= P := [X,SIGMA)NOT((X)P) := NOT(NON(P))
;PROP ;PROP ;PROP ; PROP ;PROP ;PROP ; WEL(A) ;A ;PROP ;PROP ;TYPE ; [X,SIGMA]PROP ;PROP ;[X,SIGMA]PROP ;PROP
+E SIGMA * S S*T 3 T * IS 4 S * REFIS P*S S*T TtSP SP*I 5 I * ISP P * AMONE
P * ONE
.._
-----
.._ ._ ..-.-
PN PN
---
:= :=
.-__ :=
__--..-
..-
PN := [X,SIGMA][Y,SIGMA][U,(X)P][V, (Y)PIIS(X,Y) := AND(AMONE(SIGMA,P), SOME(SIGMA,P))
.._ - - P*O1 .6 Ol*IND Pn 7 Ol*ONEAX .- PN .- - - SIGMA t TAU := TAU t F F * INJECTIVE:= ALL((X,SIGMA]ALL([Y,SIGMA]
..._
___
SIGMA SIGMA PROP IS(S,S) SIGMA SIGMA (S)P
; IS(S,T) ; (T)P ; PROP ;PROP ; ONE(SIGMA,P) ; SIGMA ; (1ND)P ; TYPE
; [X,SIGMA]TAU
IMP(IS(TAU,(X)F,(Y)F),IS(X,Y)) ))
; PROP
8 9 10
11 12
13
14
15 16 17 18 19 20
F*TO TO * IMAGE TAU * F F*G G*I I * FISI P*OT P*O1 OlrIN 01 * INP P * OTAXl P*S S*SP S P t OTAX2 TAU * PAIRTYPE TAU t S S+T T t PAIR TAU * P1 P1 *FIRST P1 *SECOND P 1 * PAIRISl
___
:= ; TAU := SOME([X,SIGMA]IS(TAU,TO,(X)F)) ; PR O P .._ - - ; [X,SIGMA]TAU ._ .- - - ; [X,SIGMA]TAU
._ ._ ._ .._ ..._ .-..-.._ .-
:=
._ ._ .-.-
:=
--PN PN --PN PN PN
___ --PN PN
[X,SIGMA]IS(TAU,(X)F,(X)G) IS( [X,SIGMA]TAU,F,G) TY PE OT SIGMA (WP
INJECTIVE(OT,SIGMA,[X,OT]
IW)) ;SIGMA ; (S)P ; IMAGE(OT,SIGMA,[X, OTlIN(X),S) ; TY PE ;SIGMA ; TAU ; PAIRTYPE ; PAIRTYPE ;SIGMA ; TAU
....._ .._ .-
___
:= := :=
PN PN PN
T * FIRSTISl := T * SECONDISl:=
PN PN
; IS(SIGMA,FIRST(PAIR),S) ; IS(TAU,SECOND(PAIR),T)
._ .._
PN
;TYPE ;SIGMA ; SET ; PR O P ; SET ; SIGMA ; (S)P ;ESTI(S,SETOF(P)) ;ESTI(S,SETOF(P)) ; (S)P ;SET ;SET
. . ~
PN
___
; IS(PAIRTYPE,PAIR(FIFlST,
SECOND),Pl)
-E +*E +ST 21 SI G M A r SET SIGMA * S
s*so
22 23
SO*ESTI P * SETOF P*S S*SP 24 SP*ESTI I S*E 25 E * ESTIE SIGMA t SO SO li TO TO * INCL
:=
._ ...:=
.._ ..-
._ ._ .._ ._
---
___
PN PN
___
--PN
---
PN
-----
:= ALL( [X,SICMA]IMP(ESTI(X,SO),
ESTI(X,TO)))
26 -ST -E -L
TOtI I*J J * ISSETI
:=
.._ ..-
.-.
--PN
;PROP ; INCL(S0,TO) ; INCL(T0,SO) ; IS(SET,SO,TO)
APPENDIX 4. EXCERPT FOR “SATZ 27”

+L *A A*B B * IMP B*A1 Al*I ItMP B*C C*I I +J J + TRIMP + CON A + NOT A * WEL A*A1 A 1 * WELI A +W W+ET AtCl C 1 + CONE
:= :=
---
---
:= [X,A]B :=
._
___
---
:= (A1)I :=
.._
:=
--_
---
___
:= [X,A]((X)I)J
.-
PN
:= I M P ( C 0 N ) := NOT(NOT(A))
._ ._
---
:= [X,NOT(A)](Al)X
._
.._
---
PN
---
:= ET([X,NOT(A)]Cl)
;PROP ;PROP ;P R O P
;A ; IMP(A,B) ;B ;PROP ;IMP(A,B) ; IMP(B,C) ; IMP(A,C) ;P R O P ;P R O P ;P R O P ;A ;WEL(A) ; WEL(A) ;A ; CON ;A
+IMP B+I I +J J: T H 1 BtN N + TH2 BIN N*I 11: T H 3 B*A1 AltN N t TH4 BIN N + TH5 N + TH6 -IMP
._ :=
---
___
; IMP(A,B) ; IMP(NOT(A),B)
:=ET(B,[X,NOT(B)]((TRIMP(CON,I,
._
X))J)X)
;B
._
XI)
; IMP(A,B) ; NOT(B) ; IMP(A,B) ; NOT(A)
--; NOT(A) := TRIMP(CON,B,N,(X,CON]CONE(B, :=
---
---
:= TRIMP(CON,I,N) := _ - -
;A
___ := [X,IMP(A,B)](Al)TH3(N,X) := _ - -
; NOT(B) ;NOT(IMP(A,B)) ; NOT(IMP(A,B))
:= ET([X,NOT(A)](THZ(X))N) := [X,B]([Y,A]X)N
; NOT(B)
:=
;A
BtEC
:=IMP(A,NOT(B))
;PROP
._ ._
; EC(A,B)
+EC BtI I * TH1 BtI I t TH2 -EC BtE EtA1 A1 t E C E l E*B1 B1 t ECEZ B t AND BtA1 A1 t B1 B1 t AND1 B+A1 A1 t A N D E l A1 t ANDEP
._ ._
-----
;A ; NOT(B) ;B :=TH3"-IMP"(NOT(B),WELI(B,Bl),E) ; NOT(A) := NOT(EC(A,B)) ;P R O P := .-;A
:= (A1)E :=
___
._
---
;B
:= TH4"-IMP"(NOT(B),Al,WELI(B,Bl)) ; AND(A,B)
._ ._
---
:= TH5"-IMP"(NOT(B),Al) := ET(B,THG"-IMP"(NOT(B),Al))
; AND(A,B) ;A ;B
+AND BtN N*A1 A1 * T H 3
._
--.
:=
.._
:= E C E l ( E T ( E C , N ) , A l )
; NOT(AND) ;A ; NOT(B)
:= IMP(NOT(A),B)
;PROP
-AND B+OR B*A1 A1 * OR11 B*B1 B 1 * OR12
.-._
--;A :=TH2"-IMP"(NOT(A),B,WELI(Al)) ; OR(A,B) ._ - - ;B := [X,NOT(A)]Bl
; OR(A,B)
+OR B*I I t TH2
._
--; IMP(NOT(B),A) := (X,NOT]ET(B,TH3"L-IMP"(NOT(B), AW)) ; OR(A,B)
-OR Bt Ot Nt Ot Nt
O N ORE2 N ORE1
._ ._
-----
:= (N)O
._
; OR(A,B) ; NOT(A)
;B
--; NOT(B) := ET(TH3"-IMP"(NOT(A),B,N,O)) ; A
+*OR BtN N*M M * TH3
._
---
___
:=
; NOT(A) ; NOT(B)
:=TH4“L-IMP”(NOT(A),B,N,M)
; NOT(OR(A,B))
._ ._ ._
; OR(A,B) ; IMP(A,C)
-OR c * o O*I I*J J * ORAPP
-----
.._ ; IMP(B,C) := TH1“-IMP”(C,I,TRIMP(NOT,B,C, QJ)) ;C
:=
+*OR
O*I I I; TH7 O*I I IT H 8 C*D D*O O*I I*J J * TH9
._ __ - - -
; IMP(A,C)
:= TRIMP(NOT(C),NOT,B,[X,NOT(C)]
._ ._
TH3“L-IMP”(A,C,X,I) ,0)
---
:=TRIMP(NOT(A),B,C,O,I)
._ ._ ._ ._
---------
:=THT(A,D,C,TH8( A,B,D,O,J),I)
; OR(C,B)
; IMP(B,C) ; OR(A,C)
;P R O P ; OR(A,B)
; IMP(A,C) ; IMP(B,D) i OR(C,D)
-OR
* SIGMA SIGMA * P P * ALL
._
---
.-._
---
;T Y P E ; [X,SIGMA]PROP ;PROP
---
; SIGMA
:= P
+ALL P*S S*N N * TH1
._
:= - - _ := [X,ALL(SIGMA,P)]((S)X)N
; NOT((S)P)
; NOT(ALL(SIGMA,P))
-ALL
P * NON P * SOME P*S S*SP S P * SOME1
:= [X,SIGMA]NOT((X)P) := NOT(NON(P))
.-._ :=
---
---
:= TH1“-ALL”(NON(P),S,WELI((S)P,
SP))
; [X,SIGMA]PROP ;PROP
; SIGMA ; (S)P ; SOME(SIGMA,P)
+SOME P*N N * TH5 -SOME
:= _ _ . := WELI(NON(P),N)
; NON(P) ; NOT(SOME(SIGMA,P))
767
P*S s * x X*I
:=
._ ._ ._ ._
___
; SOME(SIGMA,P)
---
; PR O P
---
; [Y,SIGMA]IMP((Y)P,X)
---
; NOT(X)
---
;SIGMA ; NOT((T)P)
+*SOME I*N N*T T*T5 NtT6
.-._ ._
:=TH3'Z-IMP''((T)P,X,N,(T)I) := MP(SOME(SIGMA,P),CON,S,
TH5([Y,SIGMA]T5(Y)))
; CON
-SOME I * SOMEAPP := ET(X,[Y,NOT(X)]T6"-SOME'(Y)) ;X +*SOME P*Q Q*S
S*I
I * TH6
-SOME C * AND3 C*A1 A1 * AND3El A1 * AND3E2 A1 * AND3E3 C*A1 A1 * B1 B 1* C1 C 1* AND31
:= AND(A,AND(B,C))
._
---
; PR O P
; ANDB(A,B,C)
:= ANDEl(AND(B,C),Al) ;A := ANDEl(B,C,ANDEz(AND(B,C),AI)) ;B := ANDEz(B,C,ANDEz(AND(B,C),Al)); C
._ ._ ._ ._
-------
;A ;B ;C
:= ANDI(A,AND(B,C),Al,
ANDI(B,C,Bl,Cl))
; AND3(A,B,C)
+AND3 C*A1 A1 * T H l
._
--AND3E3( Al) ,AND3E1(Al))
-AND3
AND3(A,B,C)
:= AND3I(B,C,A,AND3EZ(Al),
ANDB(B,C,A)
C t EC3 CtE
:=AND3(EC,EC(B,C),EC(C,A))
E t TH1 E * TH 3 E t TH4
:= ANDBEl(EC,EC(B,C),EC(C,A),E) :=AND3E3(EC,EC(B,C),EC(C,A),E) := THl“L-ANDB”(EC,EC(B,C),EC(C,
._
---
+EC3
‘413) -EC3 EtA1 A1 t EC3E12 A1 t EC3E13 EtB1 B1 t EC3E23 B1 t EC3E21
___
:= := ECEl(THl“-EC3”,Al) := ECE2(C,A,TH3“-EC3” ,Al) := _ - := EC3E12(B,C,A,TH4“-EC3”,Bl) := EC3E13(B,C,A,TH4“-EC3”,Bl)
+*EC3 CtE EtF FtG G * TH6
:= := :=
___ ___ _-.
:= ANDBI(EC,EC(B,C),EC(C,A),E,F, G)
-EC3 +E SIGMA t S StT T * IS S * REFIS PtS StT TtSP SPtI I * ISP SIGMA * S StT TtI I t SYMIS T*U UtI I*J J t TRIS
._ ._ ..._ .-
:=
._ ._ :=
.-.._ ._ ._ :=
---
--PN PN __.
-----
___ PN ----___
:= ISP( [X,SIGMA]IS(X,S),S,T,
._ :=
._ ._
REFIS(S),I)
---
___
---
:=ISP( [X,SIGMA]IS(X,U),T,S,J,
SYMIS( I)) ._ _ - UtI ._ --ItJ := TRIS(S,U,T,I,SYMIS(T,U,J)) J * TRIS2 := TtN N t SYMNOTIS := TH3“L-IMP”(IS(T,S),IS(S,T),N, [X,IS(T,S)]SYMIS(T,S,X))
___
+NOTIS U*N N*I I t TH 3 N*I I * TH4
._ .-._
-----
; NOT(IS(S,T))
IS(T,U)
:= ISP( [X,SIGMA]NOT(IS(S,X)),T,U,
._ ,_
NJ)
---
:= THB(SYMIS(U,T,I))
NOT(IS(S,U)) IS(U,T) NOT(IS(S,U))
-NOTIS UtV VrI I*J JtK K * TR3IS v * w W*I I*J J*K K*L L * TRIIS P * AMONE P
* ONE
PtA1 A1 * S S t ONE1 P*O1 01 t IND 01 t ONEAX SIGMA t TAU TAU t F FtS S*T T*I I t ISF TAU * F F*G G*I I*S S t FISE
GtI I t FISI -E +*E +ST
:=
._ :=
.._
___ --__.
---
:=TRIS(S,U,V,TRIS(I,J),K)
._ ._ ._ .._ ._
----------:= TRIS(S,V,W,TR3IS(I,J,K),L)
SIGMA IS(S,T) IS(T,U) IS(U,V) IS(S,V) SIGMA IS(S,T) IS(T,U) IS(U,V) IS(V,W) IS(S,W)
:= [X,SIGMA][Y,SIGMA][U,(X)P][V,
(Y)PIIS(X,Y)
PROP
:= AND(AMONE(SIGMA,P),
._ .._ .-
SOME(SIGMA,P)) ---
---
PROP AMONE(SIGMA,P) SOME(SIGMA,P)
:= ANDI(AMONE(SIGMA,P),
._ ._ ._ .._ .._ := := :=
._ ._
SOME(SIGMA,P),Al ,S)
---
PN PN
---
___ ___
..-
---
:= ISP(SIGMA,[X,SIGMA]IS(TAU,(S)
ONE(SIGMA,P) ONE(SIGMA,P) SIGMA (1ND)P TY PE [X,SIGMA]TAU SIGMA SIGMA IS(S,T)
F,(X)F),S,T,REFIS(TAU,(S)F),I) IS(TAU,(S)F,(T)F) --; [X,SIGMA]TAU --; (X,SIGMA]TAU --; IS([X,SIGMA]TAU,F,G) --: SIGMA := ISP( [X,SIGMA]TAU,[Y,[X,SIGMA] TAU]IS(TAU,(S)F,(S)Y),F,G,
._ ._ ._ ._ ._ ._ ._
.._ ._ .-
REFIS(TAU,(S)F),I)
--PN
; IS(TAU,(S)F,(S)G) ; [X,SIGMA]IS(TAU,(X)F,(X)G) ; IS([X,SIGMA]TAU,F,G)
Checking Landau’s “Grundlagen”, Excerpt for “Satz 27” (D.5) SIGMA * SET SIGMA * S StSO SO t EST1 P * SETOF P*S S*SP S P * EST11 S*E E * ESTIE
77 1
;TYPE ;SIGMA ; SET ; PROP ; SET ;SIGMA ; (S)P ; ESTI(S,SETOF(P)) ; ESTI(S,SETOF(P)) ; (S)P
+EQ +LANDAU +N
* NAT *X X*Y Y * IS Y * NIS x*s S * IN *P P * SOME P * ALL *1 1; SUC *X X*Y Y*I I * AX2 * AX3 * AX4
*S S I CONDl S * COND2
* AX5
._ .._ ._
PN
; TYPE
-----
; NAT
:= IS“E”(NAT,X,Y) := NOT(IS(X,Y))
.._
---
:= ESTI(NAT,X,S)
._
---
:= SOME“L”(NAT,P)
:= ALL“L”(NAT,P) ._ .- PN ..- P N
._
---
:=
..-
._
---
:= ISF(NAT,NAT,SUC,X,Y,I)
._ .._ .-
PN PN
._ ._
---
:= IN(1,S) := ALL( [X,NAT]IMP(IN(X,S),IN((X)
._ .-
SUC,S))) PN
; NAT ; PROP ; PROP ; SET(NAT) ; PROP ; [X,NAT]PROP ; PROP ; PROP ; NAT ; [X,NAT]NAT ; NAT ; NAT ; IS(X,Y) ;I s ( ( x ) s u c , ( Y ) s u c ) ; [X,NAT]NIS((X)SUC,l) ; [X,NAT][Y,NAT][U,IS((X)SUC,
(Y)SUC)I~S(X,Y) ; SET(NAT) ; PROP ; PROP ; [S,SET(NAT)][U,CONDl(S)]
[V,COND2(S)][X,NAT]IN(X,S)
; [X,NAT]PROP
*P PtlP 1P * XSP XSP * x
+I1 x*s XtT1 X*Y Y+YES YES t T2 YES * T3 XtT4
__-
; SET(NAT) ; CONDl(S) ; NAT
---
; IN(Y,S)
:= SETOF(NAT,P) := ESTII(NAT,P,l,lP) :=
._ ._
:= ESTIE(NAT,P,Y,YES)
(Tl)(S)AX5 -I1
:. (Y)P I I
:= ESTII(NAT,P,(Y)SUC,(T2)(Y)XSP) ; IN((Y)SUC,S) := (X) ([Y,NAT][U,IN(Y,S)]T3(Y,U)) ; IN(X,S)
X * INDUCTION := ESTIE(NAT,P,X,T4“-11”) *X := := X*Y := Y*N
___ ___ ___
; (X)P ; NAT ; NAT ; NIS(X,Y)
+21
___
N*I IrTl
:=
N * SATZl
:= TH3“L-IMP” (IS( (X)SUC,(Y)SUC),
:= (I)(Y)(X)AX4
;I s ( ( x ) s u c , ( Y ) s u c ) ; IS(X,Y)
-21
IS(X,Y),N,[U,IS((X)SUC,(Y)SUC)l T1“-21”(U))
; NIS((X)SUC,(Y)SUC)
+23 X * PROPl
* T1
:= OR(IS(X,l),SOME([U,NAT]IS(X,
W)SUC)))
; PR O P
:=ORIl(IS(l,l),SOME([U,NAT]IS(l,
XtT2
; PR O Pl ( 1 ) (U)SUC)),REFIS(NAT,l)) := SOMEI(NAT,[U,NAT]IS((X)SUC,(U) SUC),X,REFIS(NAT,(X)SUC)) ; SOME([U,NAT]IS((X)SUC,(U)SUC)
X*T3
:= ORIZ(IS( (X)SUC,l),SOME([U,NAT]
XcT4
IS( (X)SUC,(U)SUC)),T2) ; PROPl((X)SUC) := INDUCTION([Y,NAT]PROPl(Y),Tl, [Y,NATI[U,PROPl(Y)]T3(Y),X); PROPl(X)
)
-23 X*N N * SAT23
:= ... ; NIS(X,l) := OREZ(IS(X,l),SOME([U,NAT]IS(X, (U)SUC)),T4“-23” ,N) ; SOME([U,NAT]IS(X,(U)SUC))
Y*Z
._ ._
---
: NAT
X*F F * PROPl
:=
___
; [Y,NAT]NAT
+24
F * PROP2 X*A A*B BtPA PA * PB PBtY Y * PROP3 P B * T1 PB * T 2 P B * T3
:= ALL((Y,NAT]IS(((Y)SUC)F,((Y)F)
S W )
:= AND(IS((l)F,(X)SUC),PROPl)
:=
___
:=
..-
:=
.--
:= :=
.--
___
:= IS((Y)A,(Y)B) := ANDEl (IS((1)A ,( X) SUC) ,PROP1(A)
; PR O P ; PR O P ; [Y,NAT]NAT ; [Y,NAT]NAT ; PROPZ(A) ; PROPZ(B) ; NAT ; PR O P
,PA) ; IS((1)A,(X)SUC) := ANDEl(IS((l)B,(X)SUC),PROPl(B) ; IS((1)B,(X)SUC) ,PB) := TRISZ(NAT,(l)A,(l)B,(X)SUC,Tl, ; PROP3( 1) T2)
:=
P*T6
:=ANDE2(IS((l)B,(X)SUC),PROPl(B)
PtT7 PtT8 P*T9
:= (Y)T5
PB * T11 XtAA
X * PROP4 t t
T12 T13
* T14 X*P P*F FtPF PFtG PFtY Y * T15 PF * T16 PF t T17 Y * T18 Y * T19 Y t T20 Y * T21 PF * T22 PF * T23 PF t T24 P t T25
XtBB -24
___
Y*P PtT4 P*T5
Y * T10
773
:= AX2((Y)A,(Y)B,P) :=ANDE2(IS( (l)A,(X)SUC),PROPI(A)
,PA) J’B)
:= (Y)T6
:= TRBIS(NAT,((Y)SUC)A,( (Y)A)SUC,
((Y)B)SUC,( (Y)SUC)B,TI,TQ, SYM1S“E”(NAT,( (Y)SUC) B,((Y)B) SUC,T8)) ; PROP3((Y)SUC) := INDUCTION([Z,NAT]PROPB(Z),T3, [Z,NAT][U,PROPS(Z)]T9(Z,U),Y) ; PROPB(Y) := FISI(NAT,NAT,A,B,[Y,NAT]TlO(Y)) ; IS“E’’( [Y,NAT]NAT,A,B) := [Z,[Y,NAT]NAT][U,[Y,NAT]NAT] [V,PROP2(Z)][W,PROP2(U)]Tll(Z, U,V,W) ; AMONE([Y,NAT]NAT,[Z,[Y,NAT] NAT]PROP2(Z)) := S0ME“L” ([Y,NAT]NAT,[Z,[Y ,NAT] ; PROP NAT]PROP2(Z)) := [X,NAT]REFIS(NAT,((X)SUC)SUC); PROP1(1,SUC) :=ANDI(IS((l)SUC,(l)SUC), PROPl(l,SUC),REFIS(NAT,( 1)SUC) ,TW ; PROP2(1,SUC) := SOMEI([Y,NAT]NAT,[Z,[Y,NAT] NAT]PROP2(1,Z),SUC,T13) PROP4(1) := _ _ _ PROP4(X) := _ _ _ [Y,NAT]NAT ._ - - PROP2(F) [Y,NAT]NAT := [Y,NAT]((Y)F)SUC := --NAT :=REFIS(NAT,(Y)G) IS((Y)G,((Y)F)SUC) := ANDEl(IS((l)F,(X)SUC),PROPl(F) ,PF) ; IS((1)F,(X)SUC) := TRIS(NAT,(l)G,((l)F)SUC,((X) SUC)SUC,TlB(l),AXS((l)F,X) ( SUC,Tl6)) ; IS( (l)G,( (X)SUC)SUC) := ANDE2(IS((l)F,(X)SUC),PROPl(F) ,PF) ;PROPl(F) := (Y)T18 ; IS(((Y)SUC)F,((Y)F)SUC) := TRIS2(NAT,((Y)SUC)F,(Y)G,((Y) F)SUC,T19,T15) := TRIS(NAT,((Y)SUC)G,(((Y)SUC)F) SUC,((Y)G)SUC,T15((Y)SUC), AX2( ((Y)SUC)F,(Y)G,TBO)) := [Y,NAT]T21(Y) := ANDI(IS((l)G,((X)SUC)SUC), PROPl((X)SUC,G),T17,T22) :=SOMEI((Y,NAT1NAT,[Z,(Y,NAT] NAT]PROP2((X)SUC,.Z);G,T23)’ ; PROP4((X)SUC) :=SOMEAPP([Y,NAT]NAT,[Z,[Y,NAT] NAT]PROP2(Z),P,PROP4( (X)SUC), [Z,[Y,NAT]NAT][U,PROP2(Z)] T24(Z,U)) ; PROP4((X)SUC) := INDUCTION([Y,NAT]PROPI(Y),T14, [Y,NAT][U,PROP4(Y)]T25(Y,U),X) ; PROP4(X)
X * SATZ4
:=ONEI([Y,NAT]NAT,[Z,[Y,NAT]NAT]
PROP2“-24”(Z),AA“-24”,BB“-24”); ONE“E”([Y,NAT]NAT,[Z,[Y,NAT] NAT]AND(IS((l)Z,(X)SUC), ALL([Y,NAT]IS(((Y)SUC)Z,((Y)
Z)=JC))))
X * PLUS
:= IND([Y,NAT]NAT,[Z,[Y,NAT]NAT]
Y*PL
:= (Y)PLUS
X t T26
:= ONEAX([Y,NAT]NAT,[Z,v,NAT]
PROP2“-24”(Z),SATZ4)
NAT]PROP2(Z),SATZ4)
; [Y,NAT]NAT ; NAT
; PROP2(PLUS)
-24 X * SATZ4A
:= ANDEl(IS((l)PLUS,(X)SUC),
X * T27
:= ANDE2(IS((l)PLUS,(X)SUC),
PROP1“-24”(PLUS),T26“-24”)
; IS(PL(X,l),(X)SUC)
+*24 PROPl(PLUS),T26)
; PROPl(PLUS)
-24 Y t SATZ4B
:= (Y)T27“-24”
; rs(PL(X,(Y)suC),(PL(x,Y))suc)
:=Tll(l,PLUS(l),SUC,T26(1),T13)
; IS”E”([Y,NAT]NAT,PLUS(l),SUC)
+*24
* T28 -24 X * SATZ4C
:= FISE(NAT,NAT,PLUS(l),SUC,
T28“-24”,X)
; IS(PL(1,X),(X)SUC)
+*24 X t T29
:= T1l((X)SUC,PLUS((X)SUC),
[Y,NAT] ((Y)PLUS)SUC,T26((X) SUC),T23(BB,PLUS,T26))
; IS“E”([Y,NAT]NAT,PLUS((X)SUC),
[Y,NAT] ((Y)PLUS)SUC)
-24 Y * SATZ4D X * SATZ4E
Y t SATZ4F X * SATZ4G
:= FISE(NAT,NAT,PLUS((X)SUC),
[Z,NAT] ((Z)PLUS)SUC,T29“-24”, Y) := SYMIS(NAT,PL(X,l),(X)SUC, SATZ4A) :=SYMIS(NAT,PL(X,(Y)SUC),(PL (X,Y))SUC,SATZ4B) :=SYMIS(NAT,PL(l,X),(X)SUC, SATZ4C)
; IS(PL( (x)suc,Y),(PL(x,Y))suc) ; IS((X)SUC,PL(X,1)) ; Is((PL(x,Y))suc,PL(x,(Y)suc)) ; IS((X)SUC,PL(l,X))
Z*I I * ISPLl I * ISPL2
f25 Zt
PROPl
Y*Tl
:= rs(PL(PL(X,Y),z),PL(x,PL(Y,z)) ) ; PROP
:= TR31S(NAT,PL(PL(X,Y),l),(PL
~
ZtP P*T2 P+T3
~
,
~
~
~
~
~
~
-25 Zt
SAT25
:= INDUCTION([U,NAT]PROPl
"-25"
(U),T1"-25" ,[U,NAT][V,PROP1"-25" (U)]T3"-25"(U,V) ,Z) ; IS(PL(PL(X,Y) ,Z) ,PL(X,PL(Y,Z)
Z 1: ASSPLl
:= SAT25
))
; IS(PL(PL(X,Y),Z),PL(X,PL(Y,Z)
1) +26 Y c PROPl Y*Tl YtT2 Y*T3 YtP PrT4 PtT5 PtT6
:= IS(PL(X,Y),PL(Y,X))
:= SATZIA(Y)
:= SATZ4C(Y) := TRIS2(NAT,PL(l,Y),PL(Y,l),(Y)
; PROP ; IS(PL(Y,l),(Y)SUC) ; IS(PL(l,Y),(Y)SUC)
; PROPl(1,Y) SUC,T2,T1) --; PROPl(X,Y) := TRIS(NAT,(PL(X,Y))SUC,(PL(Y,X) )SUC,PL(Y,(X)SUC),AX2(PL(X,Y), PL(Y,X),P),SATZ4F(Y,X)) ; Is((PL(x,Y))suc,PL(Y,(x)suc)) := SATZ4D ; IS(PL((x)suc,Y),(PL(x,Y))suc) :=TRIS(NAT,PL((X)SUC,Y),(PL(X,Y) )SUC,PL(Y, (X) SUC),T5,T4) ; PROPl((X)SUC,Y)
._
- 26
Y I SAT26
Y t COMPL
,
~
PL(X,PL(Y,l)),SATZ4A(PL(X,Y)), SATZQF,ISPLP( (Y)SUC,PL(Y,l),X. SATZ4E(Y))) : PROPl(1) := - _ _ ; PROPl(2) := AXZ(PL(PL(X,Y),Z),PL(X,PL(Y,Z)) P) ; IS((PL(PL(X,Y),Z))SUC,(PL (X,PL(Y,Z)))SUC) := TR4IS(NAT,PL(PL(X,Y),(Z)SUC), (PL(PL(X,Y),Z))SUC,(PL(X,PL ~ Y , Z ~ ~ ~ S U C , P L ~ X , ~ ~ ~ ~ ~ , ~ ~ ~ ~ ~ ~ ~ , PL(X,PL(Y,(Z)SUC)),SATZ4B(PL( X,Y),Z) ,TZ,SATZIF(X,PL(Y ,Z)), ISPL2((PL(Y,Z))SUC,PL(Y,(Z) SUC),X,SATZIF(Y,Z))) ; PROPI((2)SUC)
:= INDUCTION([Z,NAT]PROP1"-26"
(Z,Y),T3"-26 ,[Z,NAT] [U,PROP1"-26b (Z,Y)]T6"-26"(Z, Y,U),X) :=SAT26
+27 Y * PROP1 XtTl X*TZ Y*P P*T3 P*T4
:=NIS(Y,PL(X,Y)) ; PROP := SYMNOTIS(NAT,(X)SUC,l,(X)AX3); NIS(l,(X)SUC) := TH4"E-NOTIS"(NAT,l,(X)SUC, PL(X,l),Tl,SATZIA) ; PROPl(1) := ; PROPl(Y) NIS( (Y)SUC,(PL(X,Y))SUC) := SATZl(Y,PL(X,Y),P)
-__
:= TH4"E-N0TIS1'(NAT,(Y)SUC,(PL
(X,Y))SUC,PL(X,(Y)SUC),T3, SATZ4B)
PROPl((Y)SUC)
-27 Y * SATZ7
:= INDUCTION([Z,NAT]PROPl"-27"
(Z),T2"-27",[Z,NAT][U,PROPl"-27" ; NIS(Y,PL(X,Y)) (Z)]T4"-27"(Z,U) ,Y) ; PROP Z* DIFFPROP := IS(X,PL(Y,Z))
+29
Y*I Y * I1 Y * 111 Y * ONEl ONEl * U U*Tl UcT2 ONEl * T3 Y*T4 ONEl * T5 Y *T6 Y t TWOl TWOl * THREEl THREEl I U U*DU DU*V V*DV DV * T6A
:= IS(X,Y)
; PROP
.._ ._
;I
; PROP := SOME([U,NAT]DIFFPROP(X,Y,U)) ; PROP := SOME([V,NAT]DIFFPROP(Y,X,V))
----:= TRIS(NAT,PL(U,X),PL(X,U), PL(Y,U),COMPL(U,X), ISPLl(U,ONEl)) := TH3"E-NOTIS"(NAT,X,PL(U,X), PL(Y,U) ,SATZ7(U,X),Tl) := TH5"L-SOME"(NAT,[U,NAT] DIFFPROP(U),[U,NAT]T2(U)) := THl"L-EC"(I,II,[Z,I]T3(Z)) := T3(Y,X,SYMIS(NAT,X,Y,ONEl)) := TH2"L-EC"(III,I,[Z,I]T5(Z)) ._ - - -
DU * T8 THREEl * T9 TWOl * T10
; IS(PL(U,X),PL(Y,U)) ; NIS(X,PL(Y,U))
;NOT(II) ; EC(I.11) ; NOT(II1) ; EC(II1,I) ; I1 ._ - - ; 111 ._ ._ - - ; NAT ._ - - ; DIFFPROP(X,Y,U) ._ ._ - - ; NAT := ___ ; DIFFPROP(Y,X,U) :=TR4IS(NAT,X,PL(Y,U),PL(PL(X,V)
,
DV * T7
: NAT
~
~
,
~
~
~
~
,
~
X),DU,ISPLl(Y,PL(X,V),U,DV), ASSPLl(X,V,U),COMPL(X,PL(V,U)) ) ; IS(X,PL(PL(V>U) 3)) := MP(IS(X,PL(PL(V,U),X)),CON, TSA,SATZ7(PL(V,U),X)) ; CON := SOMEAPP(NAT,[V,NAT]DIFFPROP (Y,X,V),THREEl,CON,[V,NAT] [DV,DIFFPROP(Y,X,V)]T7(V,DV)) ; CON := SOMEAPP(NAT,[U,NAT]DIFFPROP (U),TWOl,CON,[U,NAT] [DU,DIFFPROP(U)]T8(U,DU)) ; CON := [Z,III]TS(Z) ;NOT(III)
~
~
~
Y*T11 Y*A
:= TH1 “L-EC”(II,III,[Z,II]TlO(Z)) := TH6”L-EC3”(I,II,III,T4,T11 ,T6)
; EC(I1,III) ; EC3(1,11,111)
Y * SATZ9B
:= A“-29”
; EC3(IS(X,Y),SOME([U,NAT]
-29
DIFFPROP(X,Y.U)),SOME(
[V,NAT]DIFFPROP(Y,X,V)))
Y * MORE Y * LESS Y * SATZlOB
:= SOME([U,NAT]DIFFPROP(X,Y,U)) ; PR O P := SOME([V,NAT]DIFFPROP(Y,X,V)) ; PR O P := SATZ9B ; EC3(IS(X,Y),MORE(X,Y),
Y*M M * SATZll Y * MOREIS Y * LESSIS Y*M M t SATZl3
:=
_-_
:= M := OR(MORE,IS(X,Y)) :=OR(LESS,IS(X,Y))
._ ._
---
LESS(X,Y)) ; MORE(X,Y) ; LESS(Y,X) ; PR O P ; PR O P ; MOREIS(X,Y)
:= TH9“L-OR”(MORE,IS(X,Y),
LESS(Y,X),IS(Y,X),M,[Z,MORE] Z*I I*M M * ISMOREl
._
SATZll(Z),[Z,IS(X,Y)] SYMIS(NAT,X,Y ,Z))
; LESSIS(Y,X)
--; IS(X,Y) := - _ ; MORE(X,Z) := ISP(NAT,[U,NAT]MORE(U,Z),X,Y,
; MORE(Y,Z) MA ._ I*M ._ - - ; MOREIS(X,Z) M * ISMOREIS1 := ISP(NAT,[U,NAT]MOREIS(U,Z),X, ; MOREIS(Y,Z) Y,M,I) I*M := -__ ; MOREIS(Z,X) M 8 ISMOREIS2 :=ISP(NAT,[U,NAT]MOREIS(Z,U),X, ; MOREIS(Z,Y) Y,M,I) ._ - - Y*I ; IS(X,Y) I 1: MOREIS12 :=ORI2(MORE(X,Y),IS(X,Y),I) ; MOREIS(X,Y) Y*M := ; MORE(X,Y) M * MOREISIl := ORIl(MORE(X,Y),IS(X,Y),M) ; MOREIS(X,Y) z*u ._ - - ; NAT ._ - - U*I ; IS(X,Y) ._ I*J ._ - - ; IS(Z,U) ._ J*M --; MOREIS(X,Z) M * ISMOREIS12:= ISMOREIS2(Z,U,Y,J, ISMOREISl(X,Y,Z,I,M)) ; MOREIS(Y,U) .._ - - YtM ; MORE(X,Y) M * SATZlOG := TH3“L-OR”(LESS(X,Y),IS(X.Y),
___
.-
__
EC3E23(IS(X,Y),MORE(X,Y), LESS(X,Y),SATZlOB,M),
EC3E21(IS(X,Y),MORE(X,Y), LESS(X,Y),SATZlOB,M)) Y * SATZ18
:= SOMEI(NAT,[U,NAT]
Z*M
.-
; NOT(LESSIS(X,Y))
DIFFPROP(PL(X,Y),X,U),Y,
._
REFIS(NAT,PL( X,Y)))
---
; MORE(PL(X,Y),X) ; MORE(X,Y)
+319
___
MtU UtDU DU t T 1
:=
DU t T 2
:= TRJIS( NAT,PL(X,Z),PL(PL(U,Y),
._ ._
; NAT
--;DIFFPROP(U) := TRIS(NAT,X,PL(Y,U),PL(U,Y),DU, COMPL(Y ,U))
Z
DU t T3
~
,
; IS(X,PL(U,Y))
~
~
~
~
,ISPLl (X,PL(U,Y),Z,Tl), ASSPLl(U,Y ,Z) ,COMPL(U,PL(Y ,Z)) ) := SOMEI(NAT,[V,NAT]
, ;~
~
~
~
~
~
~
DIFFPROP(PL(X,Z),PL(Y,Z),V),U, T2)
; MORE(PL(X,Z),PL(Y,Z))
-319 M t SATZ19A
:= SOMEAPP(NAT,[U,NAT]DIFFPROP
(U),M,MORE(PL(X,Z),PL(Y,z)), [U,NAT][V,DIFFPROP(U)] T3“-319”(U,V)) ZtM
:=
___
; MORE(PL(X,Z),PL(Y,Z)) ; MOREIS(X,Y)
MtN N * T4
:=
___
; MORE(X,Y)
+*319
M*I I*T5
:= MOREISIl(PL(X,Z),PL(Y,Z),
.-
SATZ19A(N)) ...
; MOREIS(PL(X,Z) ,PL(Y ,Z)) ; IS(X,Y)
:= MOREISI2(PL(X,Z),PL(Y,Z),
ISPLl(X,Y,Z,I))
; MOREIS(PL(X,Z) ,PL(Y ,Z))
-319 M * SATZ19L
M * SATZ19M
:= ORAPP(MORE(X,Y),IS(X,Y),
MOREIS(PL(X,Z),PL(Y,Z)),M,
[U,MORE(X,Y)]T4”-319” (U),[U,IS (X,Y)]T5“-319”(U))
; MOREIS(PL( X,Z) ,PL(Y ,Z))
:= ISMOREISl2(PL(X,Z),PL(Z,X),
PL(Y,Z),PL(Z,Y),COMPL(X,Z), COMPL(Y,Z),SATZlSL)
; MOREIS(PL(Z,X),PL(Z,Y))
+324 X*N NtU UtI ItTl
:= :=
..-
___
._ ._
---
:=TRIS(NAT,X,(U)SUC,PL( l,U),I,
SATZ4G(U)) I*T2
:= ISMOREl(PL(l,U),X,l,
SYMIS(NAT,X,PL( 1,U),Tl), SATZ18( 1,U)) N*T3
;M O R E ( X , l )
:= SOMEAPP(NAT,(U,NAT]IS(X,(U)
SUC),SATZ3( X,N),MORE(X,l),
[U,NAT][V,IS(X,(U)SUC)]TZ(U,V) )
-324
; MORE(X,l)
,
~
~
~
~
~
X t SATZ24
:= TH2“L-OR”(MORE(X,l),IS(X,l),
X * SATZ24A YtM
:= SATZ13(X,l,SATZ24) :=
[U,NIS(X, l)]T3“-324”(U))
___
; MOREIS(X,l) ; LESSIS(1,X) ; MORE(Y,X)
+325
MtU U*DU DU t T1 DU * T2
___
:=
.._
---
; NAT ; DIFFPROP(Y,X,U)
:= SATZlSM(U,l,X,SATZ24(U)) ; MOREIS(PL(X,U),PL(X,l)) := ISMOREISl(PL(X,U),Y,PL(X,l), SYMIS(NAT,Y,PL(X,U),DU),Tl) ; MOREIS(Y,PL(X,l))
-325 M * SATZ25
:= SOMEAPP(NAT,[U,NAT]DIFFPROP
(Y,X,U),M,MOREIS(Y,PL(X,l)),
[U,NAT][V,DIFFPROP(Y ,X,U)] T2“-325”(U,V))
; MOREIS(Y,PL(X,l))
Y*L L * SATZ25B
:= - - ; LESS(Y,X) := SATZl3(X,PL(Y,l),SATZ25(Y,X,L)
*P P*N
.-._
NrM M t LBPROP
._
NtLB N * MIN PtS
:= ALL([X,NAT]LBPROP“-327”(X)) ; PROP := AND(LB,(N)P) ; PROP
1
._
---
---
; LESSIS(PL(Y,l),X) ; [X,NAT]PROP
: NAT
+327
---
:= IMP((M)P,LESSIS(N,M))
; NAT ; PROP
-327
:=
__-
:=
___
; SOME(P)
+*327
S*N NIT1 ScT2 StL L*Y Y +YP YP * T3 YP t T4 YP * T5 YP * T6 YP t T7 LtT8
:= [X,(N)P]SATZSIA(N)
:= [X,NAT]Tl(X)
; NAT ; LBPROP(1,N)
;LB(1) ; [X,NAT]LB(X) ; NAT : (Y)P := SATZ18(Y,1) ; MORE(PL(Y,l),Y) : NOT(LESSIS(PL(Y,l),Y)) := SATZlOG(PL(Y,l),Y,T3) := TH4“L-IMP”( (Y)P,LESSIS(PL(Y,l) ; NOT(LBPROP(PL(Y,l),Y)) ,Y),YP,W := TH1“L-ALL”(NAT,[X,NAT] ; NOT(LB(PL(Y,l))) LBPROP(PL(Y,l),X),Y,T5) := MP(LB(PL(Y,l)),CON,(PL(Y,l))L, : CON T6) := SOMEAPP(NAT,P,S,CON,[X,NAT] : CON [Y,(X)PlT7(X,Y))
._
:= :=
---
___ ___
StN
._
--.
; NON(NAT,[X,NAT]AND(LB(X),
NtM MtL LtT9
:=
___
; NAT
---
;W M ) ; NOT(AND(LB(M) ,NOT(LB(PL
L t T10 L t T11 N t T12 S t T13 StM MIA At T14 At T15 A * NMP NMP * N NtNP NP t T16 NP * T17
._
:= (M)N
:= ET(LB(PL(M,l)),
NP * T19 NMP * T20 NMP * T21 A * T22 A t T23
(M.1)))))
TH3"L-AND"(LB(M), ; LB(PL(MJ)) NOT(LB(PL(M,l))),Tg,L)) := ISP(NAT,[X,NAT]LB(X),PL(M,l), ; LB((M)SUC) (M)SUC,TlO,SATZ4A(M)) := [X,NAT]INDUCTION([Y,NAT]LB(Y), T2,[Y,NATl[Z,LB(Y)]Tll(Y,Z),X); (X,NAT]LB(X) := [X,NON(NAT,[X,NAT]AND(LB(X), NOT(LB(PL(X,1)))))]T8(T12(X)) ; SOME([X,NAT]AND(LB(X), NOT(LB(PL(X,1))))) ._ ; NAT ._ - - := .-. ; AND(LB(M),NOT(LB(PL(M,l)))) := ANDEi(LB(M),NOT(LB(PL(M,l))), ;L B W A) := ANDEz(LB(M),NOT(LB(PL(M,l))), ; NOT(LB(PL(M,l))) A) := --; NOT((M)P) ._ ._ - - ; NAT ._ - - ; (N)P ; LESSIS(M,N) := MP((N)P,LESSIS(M,N),NP,(N)T14) :=TH3"L-IMP"(IS(M,N),(M)P,NMP, [X,IS(M,N)]ISP(NAT,P,N,M,NP,
NP t T18
NOT(LB(PL(X,1)))))
SYMIS(NAT,M,N,X))) := OREl(LESS(M,N),IS(M,N),T16, T17) := SATZ25B(N,M,T18) := [X,NAT][Y,(X)P]T19(X,Y) := MP(LB(PL(M,l)),CON,T20,TlS) :=ET((M)P,[X,NOT((M)P)]T21(X)) := ANDI(LB( M),(M)P,T14,T22)
; NOT(IS(M,N))
LESS(M,N) LESSIS(PL(M,l),N) LB(PL(M,1)) CON (M)P MIN(M)
-327 S * SATZ27
:= TH6"L-SOME"(NAT,[X,NAT]
AND(LB(X),NOT(LB(PL(X,l)))), [X,NAT]MIN(X),T13"-327",
[X,NAT][Y,AND(LB(X), NOT(LB(PL( X,l))))]T23"-327"(X, Y)) -N -LANDAU -EQ -ST -E -L
; SOME([X,NAT]MIN(P,X))
PART E Verification
A Verifying Program for Automath

I. Zandleven¹
0. SUMMARY
This paper describes the Automath verifier which is being operated [in the beginning of the seventies] at the Technological University at Eindhoven. The description is given in terms of a number of procedures, written in an ALGOL-like language. The contents are:

1. General remarks.
2. The description language.
3. The translator.
4. Some basic notions and procedures.
5. Substitution.
6. Reductions.
7. CAT and DOM.
8. Definitional equality.
9. Correctness of expressions.
10. Correctness of lines.
11. A paragraph system.
12. Final remarks.
For the theoretical background we refer to the papers of Prof. de Bruijn, D. van Daalen and R. Nederpelt: [de Bruijn 70a (A.2)], [de Bruijn 73b], [van Daalen 73 (A.3)] and [Nederpelt 73 (C.3)].
1. GENERAL REMARKS
1.1. The aim of this paper is to give a rough description of how the AUT-68 and AUT-QE verifier is constructed and how it works. Most of the procedures are much simplified for the sake of clarity and so as not to bother the reader with topics like memory organization, error messages etc.

¹The author is employed in the Automath project and is supported by the Netherlands Organization for the Advancement of Pure Science (Z.W.O.).
1.2. The whole verifier is embedded in a conversational system (operating via a terminal) in order to control the amount of work the program might do in certain cases (mostly when an error in the Automath text has been made). The parts of the procedure texts whose execution is (partly) controlled by human intervention are placed between the brackets ?( and )?. Furthermore the user has the opportunity to debug the text on-line.

1.3. Notations

1.3.1. Expressions are denoted by A, B, ..., A1, A2, ... etc.

1.3.2. Syntactical identity is denoted by ≡.
1.3.3. Bound variables in abstraction expressions are denoted by x,y, ...; thus e.g. [z : A] B. 1.3.4. Expression strings are denoted by C, I?, ... . 1.3.5. An expression, occurring in an expression string C is denoted by C with a subscript; thus C = (El, ...,C,), where Ci are the expressions occurring in C ( i = 1,...,n). 1.3.6. Each non-empty string C can be divided into two parts: C+ := the last expression of C C- := the rest of C (which may be empty)
.
Example. If C = A, B, C ( D ,E ) ,F ( G ,H ) then C+ = F ( G ,H ) , C- = A , B , C ( D ,E ) .
1.3.7. The composition of a string is denoted by the parenthesis (( and )) = ((C+, C-)).
e.g.: C
1.3.8. An indicator string [van Daalen 73 ( A . 3 ) , 2.131 is denoted by I , and a context [van Daalen 7 3 (A.3), 2.21 by 0. 1.3.9. Sometimes, in theoretical discussions, the notation of D. van Daalen is used [van Daalen 73 ( A . 3 ) , 5.31.
2. THE DESCRIPTION LANGUAGE
2.1. The language used for the description of the verifying procedures is based upon ALGOL '60.
2.2. Several types (in the sense of ALGOL '60) are added, e.g. expression, definedname, etc.
2.3. A construction case ... of begin ... end is added, to avoid repeated if ... then ... else constructions. The values of the case selector are placed before the entries, as labels.

Examples. The statement

if color = red then paint (river valley)
else if color = white then paint (Christmas)
else if color = blue then paint (moon)
else paint (nothing) ,
may now be written as: case color of begin red: paint (river valley); white: paint (Christmas); blue: paint (moon); otherwise: paint (nothing); end;
Another possibility is: paint (case color of begin red: river valley; white: Christmas; blue: moon; otherwise: nothing; end);
So the case-construction may be used for both statement selection and assignment selection.
2.4. Some non-ALGOL symbols are used, e.g. ⊢, ∅, ..., and sometimes procedure identifiers are defined as infix, e.g. d OLDER THAN b, which would be written OLDERTHAN(d, b) in correct ALGOL.
2.5. Each procedure, whose identifier is written in capitals or non-ALGOL symbols, is explained.
2.6. No use is made of the parameter device: value. If an actual parameter has to be evaluated, this is done once only at the beginning of the procedure. All further calls are calls by reference to a program variable.
3. THE TRANSLATOR
Before Automath texts are presented to the verifier, they are passed through a translator. One may consider this translator as a preprocessor, checking the context-free part of the Automath syntax (parentheses, commas etc.), coding the identifier-paragraph identification (see 11), completing the expressions written in shorthand, etc.
4. SOME BASIC NOTIONS AND PROCEDURES
4.1. Shapes
Most of the procedures must be able to distinguish the different characteristic forms in which expressions appear. For this purpose we introduce the notion shape, which represents the outermost characteristic form of an expression. E.g. the expression

((A) B) C([x : D] E)

has the “application shape”, symbolically denoted by an application expression such as (P) Q or (E₁) E₂.
4.1.1. The shapes, and their symbolism, which are used, are:

shape               symbolism
type                type
prop                prop
variable            variable
bound variable      boundvar
constant shape      d(Σ)
application shape   (A) B
abstraction shape   [x : A] B
4.1.2. When using this symbolism for the shapes, we will permit ourselves to use the sub-elements of it as expressions on which to operate (without explicit declaration of and assignment to the program variables). So we may write, for example: if shape(E) = [x : A] B then domain := A else ... .
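As an illustration only (this tuple encoding is our own assumption, not the verifier's internal coding), a shape discriminator in Python can simply inspect an outermost tag:

```python
# Toy encoding of expressions (an assumption for illustration):
#   ("type",) / ("prop",)          1-expressions
#   ("var", name)                  free variable
#   ("boundvar", n)                bound variable, coded by number
#   ("const", d, args)             constant shape d(Sigma)
#   ("appl", A, B)                 application shape (A) B
#   ("abstr", x, A, B)             abstraction shape [x : A] B

def shape(E):
    # the outermost characteristic form of an expression
    return E[0]

# ((A) B) C(...) has the application shape, whatever its sub-expressions are:
expr = ("appl", ("appl", ("var", "A"), ("var", "B")), ("const", "C", []))
```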
4.2. Primitive procedures
Often, during the verification process of a book B we need the indicator string, the middle expression or the category expression of a certain line of B. Each line in the book B is uniquely indicated by the name introduced in the identifier part of that line (possibly with a paragraph reference, see 11). These names will belong to the ALGOL-type definedname. Because an indicator string may be considered as a string of expressions, we may introduce the
4.2.1. expressionstring procedure INDSTR(d); definedname d;
comment INDSTR becomes the indicator string of the line in which d is defined;
For the middle and category expression procedures:
4.2.2. expression procedure MIDDLE(d); definedname d;
comment MIDDLE becomes the middle expression of the line in which d is defined. Of course this procedure is only allowed for those d which represent an abbreviation;
4.2.3. expression procedure CATEGORY(d); definedname d;
comment CATEGORY becomes the category expression of the line in which d is defined (both for EB lines, PN lines and abbreviations);
The bodies of these procedures cannot be explained without going into details of memory organization, a subject which is beyond the scope of this note.
4.2.4. Another primitive procedure, OLDER THAN, will be explained in 8.3.
5. SUBSTITUTION
5.1. We have introduced two different shapes (and codings) for variables to be able to distinguish properly between all the variables occurring in an expression. By “shape = variable” we code the variables which occur in indicator strings (these variables are sometimes called parameters). By “shape = boundvar” we code the variables which occur in abstractors. Furthermore, in one Automath book, all binding variables (i.e. variables occurring as x in [x : A] ...) get different code-numbers. So the substitution becomes a simple replacement operation. Now there is only one possible way to get a so-called clash of variables, namely in the following example. Suppose we have an expression like

[x : A] (..., (B(x)) [y : C] [x : A] D(y), ...) .

If we want to reduce the expression between the dots (by β-reduction), we will obtain the expression

[x : A] D(B(x))

and we see that the x in D(B(x)) is bound by the wrong abstractor now. It is claimed by the author that by this coding system no clash (conflict, confusion) of variables arises during the verification process of Automath.

5.2. Substitution for free variables
At first we define a procedure SUBST, which will replace free variables (shape = variable) by expressions, as follows:
Let v be a string of free variables (mutually distinct), let Γ be an equally long string of expressions,
let E be an expression. The procedure SUBST constructs a new expression by replacing in E all vᵢ by the corresponding Γᵢ. The procedure (function) identifier SUBST will become the resulting expression: [v₁, ..., vₙ/Γ₁, ..., Γₙ] E (see [van Daalen 73 (A.3)] for this notation). We call this procedure by e.g. SUBST(v, Γ, E).
The string analogue of SUBST(v, Γ, E), STRINGSUBST(v, Γ, Σ), means: replace, in all Σⱼ, all vᵢ by the corresponding Γᵢ.
5.2.1. Procedure text
5.2.1.1. expression procedure SUBST(v, Γ, E); expression E; expressionstring v, Γ;
comment shape(vᵢ) must be variable;
SUBST := case shape(E) of
begin
variable : if ∃i₀(vᵢ₀ ≡ E) then Γᵢ₀ else E;
d(Σ) : d(STRINGSUBST(v, Γ, Σ));
(A) B : (SUBST(v, Γ, A)) SUBST(v, Γ, B);
[x : A] B : [x : SUBST(v, Γ, A)] SUBST(v, Γ, B);
otherwise : E;
end;

5.2.1.2. expressionstring procedure STRINGSUBST(v, Γ, Σ); expressionstring v, Γ, Σ;
comment shape(vᵢ) must be variable; STRINGSUBST is the string analogue of SUBST;
STRINGSUBST := if Σ ≡ ∅ then ∅
else ((STRINGSUBST(v, Γ, Σ⁻), SUBST(v, Γ, Σ⁺)));

5.3. Substitution for bound variables (shape = boundvar)
This is like the substitution for free variables (apart from the fact that only one boundvar at a time is substituted for). Therefore we will only give the procedure heading.
5.3.1. expression procedure BOUNDSUBST(x, A, E); boundvar x; expression A, E;
comment A is either an expression or another boundvar to substitute for x in E;
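A minimal Python sketch of SUBST may make the "simple replacement" point concrete. The tuple encoding is our own assumption (the real procedure operates on the verifier's coded expressions), and no capture check is needed precisely because binding variables carry globally unique code-numbers:

```python
def subst(v, gamma, E):
    """Replace each free variable v[i] in E by the expression gamma[i].
    Toy encoding (assumed): ("var", name), ("boundvar", n),
    ("const", d, args), ("appl", A, B), ("abstr", x, A, B)."""
    tag = E[0]
    if tag == "var":
        for vi, gi in zip(v, gamma):
            if vi == E:
                return gi
        return E
    if tag == "const":
        return ("const", E[1], [subst(v, gamma, s) for s in E[2]])
    if tag == "appl":
        return ("appl", subst(v, gamma, E[1]), subst(v, gamma, E[2]))
    if tag == "abstr":
        # bound variables are coded apart from free ones: no clash can arise
        return ("abstr", E[1], subst(v, gamma, E[2]), subst(v, gamma, E[3]))
    return E
```

The string analogue is simply a map of subst over all expressions of the string, as in STRINGSUBST.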
6. REDUCTIONS

The reductions involved in the verification of correctness of ≐-formulas (cf. 8) are α-, β-, η- and δ-reduction. See also [van Daalen 73 (A.3), 2.12 and 6.2].
6.1. α-reduction
To perform an α-reduction one can easily use the procedure BOUNDSUBST. For an expression [x : A] B, where x is to be replaced by y (say), we have simply to construct

[y : A] BOUNDSUBST(x, y, B)

(y must be new of course).
6.2. β-reduction
The β-reductor is written in the form of a procedure with two parameters, A and B. A typical use of this procedure is e.g.

if E₁ >β E₂ then A := E₂ else ... ,

where >β represents a boolean procedure.
If a β-reduction is applicable to A (so A ≡ (A₁) [x : A₂] A₃) then B becomes [x/A₁] A₃, and the procedure identifier gets the value true. If A has the form (A₁) A₂, where A₂ does not have an abstraction shape, so that no direct β-reduction is possible, then the procedure tries to reduce A₂ with β- and/or δ-reduction so as to obtain the form [x : A₃] A₄. At that point the actual β-reduction can be carried out.

6.2.1. Procedure text
6.2.1.1. boolean procedure A >β B;
expression A, B; comment if A is reducible by β-reduction, then B becomes the β-reduct of A;
begin if shape(A) = (P) Q then
  begin boolean possible;
  possible := shape(Q) = [x : R] T;
  if not possible then
    begin boolean continue; continue := true;
    while continue do
    ?( begin case shape(Q) of
       begin
       (R) S : continue := Q >β U;
       d(Σ) : continue := Q >δ U;
       otherwise : continue := false;
       end;
       if continue then
         begin Q := U; possible := shape(Q) = [x : R] T;
         continue := not possible;
         end;
       end )?;
    end;
  if possible then
    begin B := BOUNDSUBST(x, P, T); >β := true;
    end
  else >β := false;
  end
else >β := false;
end;

6.3. η-reduction
The whole procedure runs under control of the boolean “eta reduction allowed”, which may be set or reset by the user. When reset (eta reduction allowed = false), the verificator can only use α-, β- and δ-reduction. Interestingly enough, in the Automath texts checked so far, η-reduction has almost never been used.
The η-reductor is written in the same form as the β-reductor: A >η B. We have for A the following cases.
(i) A ≡ [x : P] (Q) R.
(a) If Q ≢ x then the procedure first tries to reduce (Q) R.
(b) If Q ≡ x, but x occurs in R, then the procedure first tries to remove x in R by reducing R.
(c) If Q ≡ x and x does not occur in R, then the η-reduct (B) becomes R and >η gets the value true.
(ii) A ≡ [x : P] Q, Q ≡ d(Σ) or Q ≡ [y : R] S. Now the procedure first tries to reduce Q, and afterwards tests if an η-reduction is possible.
In either case if no η-reduction is possible, the procedure identifier >η gets the value false.
There appear two procedures in >η which must still be explained. Firstly there is the procedure ≐, to declare as boolean procedure E₁ ≐ E₂; where E₁ and E₂ are expressions. This procedure investigates whether E₁ and E₂ are definitionally equal, and is described in 8. Secondly there is the procedure OCCURS IN, which searches an expression for occurrences of a specific bound variable. This procedure is defined as follows.

6.3.1. Procedure text for OCCURS IN
6.3.1.1. boolean procedure x OCCURS IN E; boundvar x; expression E;
OCCURS IN := case shape(E) of
begin
boundvar : x ≡ E;
d(Σ) : ∃i x OCCURS IN Σᵢ;
(A) B : x OCCURS IN A or x OCCURS IN B;
[y : A] B : x OCCURS IN A or x OCCURS IN B;
otherwise : false;
end;

6.3.2.1. Procedure text for the η-reductor
6.3.2.1.1. boolean procedure A >η B;
expression A, B; comment if A is reducible by η-reduction then B becomes the η-reduct of A;
if eta reduction allowed then
begin if shape(A) = [x : P] Q then
  case shape(Q) of
  begin
  (R) T : if x ≐ R then
        if not x OCCURS IN T then
          begin >η := true; B := T end
        else ?( if T >δ T₁ and not x OCCURS IN T₁ then
          begin >η := true; B := T₁ end
        else if Q >β Q₁ then >η := [x : P] Q₁ >η B
        else >η := false )?
      else if Q >β Q₁ then >η := [x : P] Q₁ >η B
      else >η := false;
  d(Σ) : if Q >δ Q₁ then >η := [x : P] Q₁ >η B else >η := false;
  [y : R] T : if Q >η Q₁ then >η := [x : P] Q₁ >η B else >η := false;
  otherwise : >η := false;
  end
else >η := false
end
else >η := false;
6.3.2.2. The part between ?( and )? has not yet been implemented. Although such cases are easily constructed (e.g. [x : X] (x) f(x, y), where f(x, y) >δ y), in practice this has never occurred up to now.

6.4. δ-reduction
The δ-reductor is written in the same way as the β- and η-reductor, and tries to perform a single δ-reduction on the presented expression. If the presented expression has shape d(Σ), the procedure takes the middle expression of the line where d is defined (= MIDDLE(d)) and replaces the free variables in it (i.e. the elements of INDSTR(d)) by the expressions of Σ.
6.4.1. Procedure text
6.4.1.1. boolean procedure A >δ B;
expression A, B; comment if A is reducible by δ-reduction then B becomes the δ-reduct of A;
begin if shape(A) = d(Σ) then
  if d represents an abbreviation then
    begin >δ := true; B := SUBST(INDSTR(d), Σ, MIDDLE(d));
    end
  else >δ := false
else >δ := false;
end;
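To make the one-step reductors concrete, here is a hedged Python sketch of a single β-step and a single δ-step over a toy tuple encoding of our own (the book is a plain dict mapping constant names to their indicator strings and middle expressions; none of these names come from the actual implementation):

```python
def boundsubst(x, A, E):
    # replace bound variable number x by A in E (toy tuple encoding)
    tag = E[0]
    if tag == "boundvar":
        return A if E[1] == x else E
    if tag == "const":
        return ("const", E[1], [boundsubst(x, A, s) for s in E[2]])
    if tag == "appl":
        return ("appl", boundsubst(x, A, E[1]), boundsubst(x, A, E[2]))
    if tag == "abstr":
        return ("abstr", E[1], boundsubst(x, A, E[2]), boundsubst(x, A, E[3]))
    return E

def subst_one(p, a, E):
    # replace the free variable p by a (helper for delta_step)
    tag = E[0]
    if tag == "var":
        return a if E == p else E
    if tag == "const":
        return ("const", E[1], [subst_one(p, a, s) for s in E[2]])
    if tag == "appl":
        return ("appl", subst_one(p, a, E[1]), subst_one(p, a, E[2]))
    if tag == "abstr":
        return ("abstr", E[1], subst_one(p, a, E[2]), subst_one(p, a, E[3]))
    return E

def beta_step(E):
    # one beta-step: (A1) [x : A2] A3  becomes  A3 with A1 put in for x
    if E[0] == "appl" and E[2][0] == "abstr":
        A1, (_, x, _A2, A3) = E[1], E[2]
        return True, boundsubst(x, A1, A3)
    return False, E

def delta_step(E, book):
    # one delta-step: unfold the abbreviation d(args) using the book,
    # book: name -> (indicator string, middle expression)
    if E[0] == "const" and E[1] in book:
        params, middle = book[E[1]]
        for p, a in zip(params, E[2]):
            middle = subst_one(p, a, middle)
        return True, middle
    return False, E
```

Each function returns a pair (applied?, reduct), mirroring the boolean procedure identifiers >β and >δ.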
7. CAT AND DOM

As pointed out in [van Daalen 73 (A.3), 6.4], we need two functions, CAT and DOM, to compute mechanically the category (type) and the domain of an expression respectively.
7.1. The “mechanical type” function CAT is defined by induction on the length of the expressions as follows. Let B be a correct book and ξ a correct context.
(i) If ξ = x₁ ∈ α₁, ..., xₙ ∈ αₙ then CAT(xᵢ) := αᵢ.
(ii) If d is an abbreviation constant, defined in a line of B by d := A ∈ B, with indicator string I, then CAT(d(Σ)) := [I/Σ] B.
(iii) CAT((A) B) := if CAT(B) ≡ [x : P] Q then [x/A] Q else (A) CAT(B).
(iv) CAT([x : A] B) := [x : A] CAT(B).
CAT is not defined for variables with shape = boundvar (see 5.1), because in the verification process there is no need for it (9.5). Further CAT is not defined for 1-expressions, of course. It is easy to see that, if the argument for CAT is a correct expression, the outcome will again be correct.
7.2. The procedure text of CAT reflects the given definition completely.
7.2.1. expression procedure CAT(E); expression E;
CAT := case shape(E) of
begin
variable : CATEGORY(E);
d(Σ) : SUBST(INDSTR(d), Σ, CATEGORY(d));
(A) B : if shape(CAT(B)) = [x : P] Q then BOUNDSUBST(x, A, Q) else (A) CAT(B);
[x : A] B : [x : A] CAT(B);
otherwise : undefined;
end;

7.3. The “mechanical domain” function DOM
This procedure has to yield (where possible), for a given expression A, an expression α, such that ⊢ A ∈ [x : α] β or ⊢ A ≐ [x : α] β. For expressions A of the form [x : B] C, the computing of the domain is trivial: DOM(A) = B. If A is a variable, we may compute the domain of the category of A. More difficult is the case where A has the shape d(Σ) or the shape (B) C.
If we try to reduce A, we may end up with a PN (e.g.: d(Σ) ≥δ f(Γ), f := PN). On the other hand, if we take the category of A by computing CAT(A), we may obtain type or [x₁ : α₁] ... [xₙ : αₙ] type. To deal with this problem we use the following strategy. At first CAT(A) is computed, and presented to DOM. (N.B. This is a recursive call, so possibly CAT(CAT(A)) is computed.) If DOM(CAT(A)) does not yield a domain at all, then a δ- or β-reduction on A is carried out (if possible), and the reduct is again presented to DOM. Since only 1-, 2- and 3-expressions are investigated, the whole process can be given by the following tree figure:
Figure 1

7.3.1. Procedure text
7.3.1.1. expression procedure DOM(A); expression A;
case shape(A) of
begin
[x : B] C : DOM := B;
variable : DOM := DOM(CATEGORY(A));
d(Σ), (B) C : begin D := DOM(CAT(A));
    if undefined(D) then
      if A >δ A₁ then DOM := DOM(A₁)
      else if A >β A₁ then DOM := DOM(A₁)
      else DOM := undefined
    else DOM := D
    end;
otherwise : DOM := undefined;
end;
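A hedged sketch of this strategy in Python (our own toy model: cat_of, delta_step and beta_step are passed in as functions so the sketch stays self-contained, and None plays the role of undefined):

```python
def dom(A, cat_of, delta_step, beta_step):
    """Mechanical domain DOM (sketch): the alpha with A in [x : alpha] beta,
    or None when undefined. cat_of computes CAT; the *_step functions
    return (applied?, reduct), as one-step reductors."""
    tag = A[0]
    if tag == "abstr":
        return A[2]                            # [x : B] C  gives  B
    if tag == "var":
        return dom(cat_of(A), cat_of, delta_step, beta_step)
    if tag in ("const", "appl"):
        # first try the category (possibly computing CAT(CAT(A)) recursively)
        D = dom(cat_of(A), cat_of, delta_step, beta_step)
        if D is not None:
            return D
        applied, A1 = delta_step(A)            # then try a delta-step on A
        if applied:
            return dom(A1, cat_of, delta_step, beta_step)
        applied, A1 = beta_step(A)             # then a beta-step on A
        if applied:
            return dom(A1, cat_of, delta_step, beta_step)
    return None
```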
8. DEFINITIONAL EQUALITY

To verify the correctness of a given ≐-formula we will use the Church-Rosser theorem: if A ≐ B then A ≥ C ≤ B for some C (see also [van Daalen 73 (A.3), 6.3.1]).
This definition is the guide for the procedure ≐ which we will introduce here.

8.1. Description of ≐
The type of the procedure is boolean, and the identifier will be written in infix notation, viz. A ≐ B (in the same way as for >β, >δ, etc.).
Roughly speaking, in order to check A ≐ B, the procedure tries to reduce A and/or B until either the two expressions are identical or the decision A ≢ B can be made. It is not always necessary for both complete expressions to be present during the whole reduction process. If, for example, A ≡ d(Σ₁, Σ₂, Σ₃) and B ≡ d(Σ₁, Σ₄, Σ₃) then the procedure needs only parts of both expressions, namely Σ₂ and Σ₄, and will check Σ₂ ≐ Σ₄. So, in general, the procedure uses recursive calls, applied to sub-expressions, following the monotonicity rules described in [van Daalen 73 (A.3), 5.5.6].
Recursive calls are also used for the reduction sequences. Firstly the procedure tries, if necessary, to reduce one of the expressions A and B. Which is reduced is a matter of strategy. If one of the two expressions is reduced, one could continue the equality-check by using an iterative or a recursive method. A recursive method is chosen in order to make the algorithm more readable.

Example. If A ≡ d(Σ) and B ≡ (P) Q then the procedure first tries to reduce B by β-reduction. If this succeeds, and the outcome is B₁, then the definitional equality of A and B follows from that of A and B₁. Otherwise the procedure tries to reduce A to A₁ (say) and checks A₁ ≐ B. If this also fails, then the procedure identifier ≐ gets the value false.
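The recursive scheme can be illustrated by a tiny Python equality checker for nullary constants only (our own drastic simplification of the real procedure: the book maps each constant to its definition-order index and its unfolding, None meaning a primitive notion):

```python
def delta(E, book):
    # unfold a constant; book: name -> (age index, unfolding or None)
    if E[0] == "const" and book[E[1]][1] is not None:
        return True, book[E[1]][1]
    return False, E

def defeq(E1, E2, book):
    """Definitional equality by joint reduction (nullary-constant fragment)."""
    if E1 == E2:
        return True
    if E1[0] == "const" and E2[0] == "const":
        d, b = E1[1], E2[1]
        if d == b and all(defeq(a, c, book) for a, c in zip(E1[2], E2[2])):
            return True      # monotonicity: equal argument strings suffice
        # otherwise reduce the younger constant first (cf. OLDER THAN, 8.3)
        younger = E1 if book[d][0] > book[b][0] else E2
        other = E2 if younger is E1 else E1
        applied, red = delta(younger, book)
        return applied and defeq(red, other, book)
    if E1[0] == "const":
        applied, red = delta(E1, book)
        return applied and defeq(red, E2, book)
    if E2[0] == "const":
        applied, red = delta(E2, book)
        return applied and defeq(E1, red, book)
    return False
```

On the book x := PN, p := x, q := p, the check defeq(p, q) first unfolds the younger constant q to p, after which the two sides are identical.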
8.2. Type inclusion
If we want to verify A ∈ B, we check ⊢ A and ⊢ B, compute CAT(A) and check CAT(A) ≐ B; so CAT(A) is the first parameter and B is the second parameter of the procedure call.
In order to accept type inclusion as well, we add a slight extension to ≐, namely:

[x : A] type ≐ type

will be accepted as correct, but not

type ≐ [x : A] type .
The same holds for prop. So the procedure is no longer symmetrical for 1-expressions. (Notice that calls are sometimes made with reversed order of the arguments of ≐, but as one can see in the procedure text these cases can never refer to 1-expressions.) Now the definition of ≐ is exactly the same as that given in [van Daalen 73 (A.3), 6].
8.3. OLDER THAN
The procedure ≐ needs, in one special case, namely d(Σ) ≐ b(Γ) and d ≢ b, the boolean procedure OLDER THAN, to decide which of d and b must be reduced. It seems a good strategy to start off by reducing the younger of the two, i.e. the constant which was defined the more recently, for in this way we have a chance of reducing it to the other.
8.3.1. boolean procedure d OLDER THAN b; definedname d, b;
comment OLDER THAN := the line in which b is defined appears later in the book than the line in which d is defined;
8.4. Procedure text of ≐
8.4.1. boolean procedure E₁ ≐ E₂; expression E₁, E₂;
≐ := case (shape(E₁), shape(E₂)) of
begin
(type, type) : true;
(type, otherwise) : false;
(prop, prop) : true;
(prop, otherwise) : false;
(variable, variable) : E₁ ≡ E₂;
(variable, d(Σ)) : if E₂ >δ E₂₂ then E₁ ≐ E₂₂ else false;
(variable, (A) B) : if E₂ >β E₂₂ then E₁ ≐ E₂₂ else false;
(variable, [x : A] B) : if E₂ >η E₂₂ then E₁ ≐ E₂₂ else false;
(variable, otherwise) : false;
(boundvar, boundvar) : E₁ ≡ E₂;
(boundvar, otherwise) : consider (variable, shape(E₂));
(d(Σ), b(Γ)) : if d ≡ b then
      if Σ ≐ˢ Γ then true
      else ?( if E₁ >δ E₁₁ then E₁₁ ≐ E₂ else false )?
    else if d OLDER THAN b then
      if E₂ >δ E₂₂ then E₁ ≐ E₂₂ else false
    else if E₁ >δ E₁₁ then E₁₁ ≐ E₂ else false;
(d(Σ), (A) B) : if E₂ >β E₂₂ then E₁ ≐ E₂₂ else
      if E₁ >δ E₁₁ then E₁₁ ≐ E₂ else false;
(d(Σ), [x : A] B) : if E₂ >η E₂₂ then E₁ ≐ E₂₂ else
      if E₁ >δ E₁₁ then E₁₁ ≐ E₂ else false;
(d(Σ), otherwise) : consider reverse (i.e. (shape(E₂), shape(E₁)));
((A) B, (C) D) : if A ≐ C and B ≐ D then true
      else ?( if E₁ >β E₁₁ then E₁₁ ≐ E₂ else
              if E₂ >β E₂₂ then E₁ ≐ E₂₂ else false )?;
((A) B, [x : C] D) : if E₁ >β E₁₁ then E₁₁ ≐ E₂ else
      if E₂ >η E₂₂ then E₁ ≐ E₂₂ else false;
((A) B, otherwise) : consider reverse;
([x : A] B, [y : C] D) : if A ≐ C then B ≐ BOUNDSUBST(y, x, D) else false;
([x : A] B, otherwise) : consider reverse;
end;
8.4.2. boolean procedure Σ₁ ≐ˢ Σ₂; expressionstring Σ₁, Σ₂;
comment ≐ˢ is the string analogue of ≐;
≐ˢ := if Σ₁ ≡ ∅ then Σ₂ ≡ ∅
else Σ₁⁻ ≐ˢ Σ₂⁻ and Σ₁⁺ ≐ Σ₂⁺;
9. CORRECTNESS OF EXPRESSIONS (⊢)
9.1. Correctness of an expression is checked by the boolean procedure “⊢”, operating on an expression (say E) and the indicator string (say I) belonging to E. A procedure call is written like I ⊢ E. Mentioning I is necessary, on account of the free variables in E which must all appear in I. Two non-trivial cases arise:
(1) If shape(E) = (A) B, then the “applicability” (let us say) of B to A has to be checked. This is done by looking at: CAT(A) ≐ DOM(B) (see also [van Daalen 73 (A.3), 6.4.2.3]).
(2) If shape(E) = d(Σ) then:
firstly : all Σᵢ must be correct,
secondly : all Σᵢ must have the correct categories.
In case 2 there is a difficulty. Let us consider the following book:
∅ * α := EB ; type
α * a := EB ; α
a * f := PN ; type
∅ * P := EB ; type
P * b := EB ; P
b * g := f(P, b) ; type

Now: (P, b) ⊢ f(P, b); nevertheless the string of types expected by f is not definitionally equal to the string of given types: the expected categories are (type, α), the given ones (type, P).
We may conclude that after checking the definitional equality of the first two categories, we have to replace, in the category string of (α, a), the variable α by P. This replacement (substitution) is, in a more general way, done by the procedure CORRECTCATS (see also [van Daalen 73 (A.3), 2.5 and 5.4.6]).
9.2. boolean procedure CORRECTCATS(Σ, I); expressionstring Σ, I;
CORRECTCATS := if Σ ≡ ∅ then I ≡ ∅
else CORRECTCATS(Σ⁻, I⁻) and CAT(Σ⁺) ≐ SUBST(I⁻, Σ⁻, CAT(I⁺));
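In Python the same right-to-left recursion might be sketched as follows (toy model of our own; cat_of and defeq stand for the procedures of sections 7 and 8 and are passed in as functions):

```python
def replace(E, target, new):
    # naive sub-expression replacement over a toy tuple encoding
    if E == target:
        return new
    if E[0] == "const":
        return ("const", E[1], [replace(s, target, new) for s in E[2]])
    if E[0] == "appl":
        return ("appl", replace(E[1], target, new), replace(E[2], target, new))
    return E

def correct_cats(sigma, cats, cat_of, defeq):
    """sigma: the argument expressions of d(Sigma); cats: the indicator
    string as (parameter, expected category) pairs. Mirrors CORRECTCATS:
    check the initial segment first, then compare the category of the last
    argument with the expected category in which the earlier parameters
    have been replaced by the earlier arguments."""
    if not sigma:
        return not cats
    if len(sigma) != len(cats):
        return False
    if not correct_cats(sigma[:-1], cats[:-1], cat_of, defeq):
        return False
    expected = cats[-1][1]
    for (param, _), arg in zip(cats[:-1], sigma[:-1]):
        expected = replace(expected, param, arg)   # SUBST(I-, Sigma-, CAT(I+))
    return defeq(cat_of(sigma[-1]), expected)
```

On the book of 9.1 this accepts f(P, b): the category of b is P, and the expected category α is replaced by P before the comparison.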
9.3. boolean procedure I ⊢ E; expressionstring I; expression E;
⊢ := case shape(E) of
begin
type : true;
prop : true;
variable : ∃i(Iᵢ ≡ E);
d(Σ) : I ⊢ˢ Σ and CORRECTCATS(Σ, INDSTR(d));
(A) B : I ⊢ A and I ⊢ B and CAT(A) ≐ DOM(B);
[x : A] B : I ⊢ A and ((I, x)) ⊢ B; (see 9.5)
otherwise : false;
end;
9.4. boolean procedure I ⊢ˢ Σ; expressionstring I, Σ;
comment ⊢ˢ is the string analogue of ⊢;
⊢ˢ := if Σ ≡ ∅ then true else I ⊢ˢ Σ⁻ and I ⊢ Σ⁺;
9.5. A comment on I ⊢ [x : P] Q
In this case, the checker, after checking I ⊢ P, adds a “waste-line” to the book, of the form:

I * waste := EB ; P .

If we denote this new book by B′, then the checker checks the statement B′; ((I, waste)) ⊢ [x/waste] Q. For this reason the correctness of a bound variable will never be asked for, and its CAT or DOM will never be computed. Only in ≐ can the shape boundvar occur.
10. THE CORRECTNESS OF LINES

The checking for correctness of an Automath line is now easy to describe in terms of already defined procedures:
10.1. boolean procedure CORRECT(LINE); Automath line value LINE;
CORRECT := case form of the line is of
begin
I * N := EB ; E : I ⊢ E;
I * N := PN ; E₁ : I ⊢ E₁;
I * N := E₁ ; E₂ : I ⊢ E₁ and I ⊢ E₂ and CAT(E₁) ≐ E₂;
otherwise : false;
end;
11. A PARAGRAPH SYSTEM

As already mentioned in [van Daalen 73 (A.3), Section 2.16], the syntactical definition of AUT-68 (and AUT-QE) forces us to write mutually exclusive names (identifiers) for both variables and constants. This, of course, is very annoying to the writer of Automath. Therefore we have introduced a paragraph system. Each Automath text may be divided into sections, called paragraphs. A paragraph starts with:

+ paragraph name.

and ends with:

- paragraph name.
In a paragraph one may write Automath lines and other paragraphs (sub-paragraphs). Finally the whole book is contained in one big paragraph, so all paragraphs occur nested. Behind the identifier of a given constant one may write a so-called paragraph reference, to indicate in which paragraph this identifier has been defined. An identifier b with paragraph reference to (say) paragraph Pₙ is written in the form: b"P₁ - P₂ - ... - Pₙ", where P₂ is a sub-paragraph of P₁, P₃ is a sub-paragraph of P₂, ..., and Pₙ is the paragraph in which b is actually defined. An identifier, not followed by a paragraph reference, refers to a constant or variable defined in the same paragraph, or, if not found there, in the paragraph which contains that one, and so on.
Example. (a := ... denotes a definition of a; ... (a) ... denotes a reference to a.)

line nr   book                        reference to line nr
          + A.
1         p := ...
          + B.
2         ... (p"A") ...              1
3         ... (p"B") ...              no good reference (p has not been defined in B)
          + C.
4         p := ...
5         ... (p) ...                 4
6         ... (p"A - B - C") ...      4
          - C.
7         ... (p"A") ...              1
          - B.
8         ... (p) ...                 1
          - A.
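The lookup rules can be modelled roughly as follows (our own minimal model, not the translator's actual coding: the book is a list of (paragraph path, name, line number) triples, and an unreferenced name is searched for in the current paragraph and then outward through the enclosing paragraphs):

```python
def resolve(name, current_path, book, reference=None):
    """Line number denoted by `name`, optionally with an explicit
    paragraph reference such as ("A", "B", "C").
    book: list of (paragraph path, name, line nr) triples, in book order."""
    if reference is not None:
        hits = [ln for path, n, ln in book
                if n == name and path == tuple(reference)]
        return hits[-1] if hits else None
    # no reference: search the current paragraph, then the enclosing ones
    for depth in range(len(current_path), -1, -1):
        scope = tuple(current_path[:depth])
        hits = [ln for path, n, ln in book if n == name and path == scope]
        if hits:
            return hits[-1]
    return None

# the two definitions of p from the example: line 1 in A, line 4 in A-B-C
book = [(("A",), "p", 1), (("A", "B", "C"), "p", 4)]
```

On this book, p"B" yields no line (as in line 3 of the example), while a bare p written inside A resolves to line 1.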
12. FINAL REMARKS

12.1. We repeat that the procedures given here form only an outline of the actual verifier. Many more parameters are passed through the procedures to avoid duplication, to control critical passages, to permit communication with the user and so on.
12.2. With regard to efficiency, improvements may be possible. For example, parts of the strategy, implemented in ≐, are more or less arbitrary, although suggested by reflexion and practical work. Experience and research may lead to better strategies. Also the use of the features of [de Bruijn 72b (C.2)] may lead to a more efficient verifier.
12.3. We are pleased to say, in any event, that the verifier has been working satisfactorily up to now.
12.4. An example of a text checked with the described verifier is found in [van Benthem Jutting 73].
Checking Landau’s “Grundlagen” in the Automath System
Parts of Chapter 3 (Verification)

L.S. van Benthem Jutting
CHAPTER 3. VERIFICATION
In this chapter the verification of the AUT-QE text is described. Some features of the program and the possibility of excerpting are discussed.

3.0. Verification of the text
The verification of the AUT-QE translation of Landau’s book was executed on the Burroughs B6700 computer at the Eindhoven University of Technology. The last page of the book was checked in September 1975. The whole book was checked in a final run on October 18, 1973.
The verifying program was conceived by N.G. de Bruijn and implemented by I. Zandleven. For a description of this program we refer to [Zandleven 73 (E.1)]. Zandleven also provided the program with input and output facilities, and extended it with a conversational mode for on-line checking and correcting of texts. The verification took place in three stages:
(i) First the AUT-QE text was fed into the system on a teleprinter. At this stage the main syntactical structure of the text was analyzed. It was checked, for example, that the format of the lines was as it should be, that the bracketing of the expressions was correct, and that no unknown identifiers occurred.
(ii) Secondly the AUT-QE text was coded. At this stage the correct use of the context structure, the validity of variables, the correct use of the shorthand facility [van Daalen 73 (A.3), 2.15] and of the paragraph reference system (cf. Appendix 2 [not in this Volume]) were checked.
(iii) Finally the text was checked with respect to all clauses of the language definition. At this stage the degree [van Daalen 73 (A.3), 2.3] and types of expressions were calculated, and the correctness of application expressions and constant expressions was checked. Vital for this is the verification of the definitional equality of certain types (cf. [van Daalen 73 (A.3), 2.10], [Zandleven 73 (E.1)]).

Runs of the stages (ii) and (iii) generally claimed much of the computer's (virtual) memory capacity (over 600 K bytes were needed for the program together with the coded text). In order to avoid congestion in the multi-programming system it was therefore necessary to have the program executed at night (and off-line). As Automath texts are checked relative to correct books, a mechanical provisional debugging device for off-line checking was implemented, by which lines which were found incorrect could be tentatively repaired. E.g., when the middle part [van Daalen 73 (A.3), 2.13.1] of a line was found incorrect, the debugging device changed it temporarily into PN, thus turning an abbreviation line into a PN-line. The line so “corrected” was then again checked, and, if it was found correct, the lines following could then be checked relative to the “corrected” book. By this device it was not necessary to stop the checking immediately after the first error had been found.

Another feature of the verifying program was added because of the fact that proving expressions to be incorrect (especially proving expressions to be not definitionally equal) is often more difficult and more time-consuming than proving correctness. Therefore during off-line runs a parameter in the program (viz. the number of decision points, to be explained in 3.1) has been limited, and lines were considered provisionally incorrect when this limit was exceeded.

When the later chapters were checked, we reduced the demands on the computer's memory capacity by abridging the book relative to which the text was checked, in the following way: In the chapters which had already been found correct, the proofs of theorems and lemmas were omitted, and the final lines of these proofs (where the theorems and lemmas are asserted) were changed into PN-lines. Each time a chapter was completely checked (relative to the book so abridged) it was abridged in its turn. Texts which are correct relative to the abridged book will be correct with respect to the unabridged book too. On the other hand, as in classical mathematics there is no reference to proofs but only to assertions, it is unlikely that texts which are correct relative to the unabridged book will be rejected relative to the abridged book. In actual fact this did not occur.

When a chapter, after several off-line runs of the program, was found to be “nearly correct”, the final verification of that chapter took place on-line. In such an on-line run the remaining errors could be immediately corrected. Moreover, correct lines could be verified, which had been provisionally rejected because the number of decision points during verification in off-line runs had exceeded the chosen limit. The verification of such complicated lines could be shortened
by directing (in conversational mode) t h e strategy for establishing definitional equality. After all chapters were verified in this way, the integral AUT-QE text (coniplete and unabridged) was checked during a final on-line run, which took 2 hours (real time). Of this time '12 minutes were spent on verification (not including t h e time needed for coding). In a table we list some d a t a on this final run, concerning verification time, number of performed reductions and memory occupied. hapter hapter
                              prel.    ch.1    ch.2    ch.3    ch.4   ch.4a    ch.5    total
                              text                                                    text
  verification time (sec.)   107.3   143.1   301.2   342.4   405.7   813.1   406.9   2519.7
  alpha-reductions             631     752    1077    1455    1644    3393    1533    10485
  beta-reductions              564     832     460     466     414    2749     529     6014
  delta-reductions             596    1111    1318    1873    2724    9290    3151    20063
  eta-reductions                 -       -       -       -       -       -       -        2
  nr. of lines                 886    1068    1603    2181    2779    2690    2226    13433
  nr. of expressions         12155    9388   25792   30327   42067   60450   34959   215138
Since one coded expression occupies 30 bytes (mainly used for references to subexpressions), the total memory required for the coded book is about 6500 K bytes (= 52000 K bits).

3.1. Controlling the strategy of the program
In order to establish definitional equality of two expressions, the verification system tries to find another expression to which both reduce. The choice of efficient reduction steps for this purpose is a matter of strategy ([van Daalen 73 (A.3), 6.1.1]). The programmed strategy is described in [Zandleven 73 (E.1)]. Under this strategy it is possible that intermediate results are obtained which strongly suggest a negative answer to the question of definitional equality, without definitely settling it. Suppose, for example, that a(p) = a(q) has to be established. The program's strategy is to ascertain that the constants a and a are identical and to verify whether p = q. If this is not the case, there is a strong suggestion that a(p) and a(q) are not definitionally equal either, but this is yet uncertain. For example, they are definitionally equal relative to the book
  *  n := PN   E type
  *  x := PN   E n
  *  y := x    E n
  *  p := y    E n
  *  q := x    E n
808
L.S. van Benthem Jutting
It is a matter of strategy how to proceed in such cases. We may either apply delta-reduction (in which case the issue will be eventually settled) or we may try to continue the verification process without using a(p) = a(q). Such a situation is called a decision point. In on-line runs the verification may be controlled here by the human operator. (Actually, in the situation sketched above, information will be supplied, and the question will appear whether delta-reduction should be tried.) In off-line runs delta-reduction will be applied in order to get a definite answer to the question, and it will be checked that the total number of decision points passed during the checking of a line does not exceed the chosen limit (cf. 3.0).
An Implementation of Substitution in a λ-Calculus with Dependent Types

L.S. van Benthem Jutting

1. INTRODUCTION

In this paper we describe an implementation of substitution which makes use of de Bruijn indices [de Bruijn 72b (C.2)] and of structure sharing (cf. [Wadsworth 71], [Boyer & Moore 72], [Curien 86]). This implementation was designed by I. Zandleven about 1972. He used it in the verifying program for Automath, which has been running at Eindhoven for over 10 years. As it was never properly described we have thought it worth while to give a formal description and to prove that the implementation really satisfies its specifications.
The implementation of substitution which we describe does not really carry out substitutions, but implements them by considering pairs consisting of an environment and an expression. The environment in such a pair gives a meaning to the free variables which occur in the expression. Substitution for a free variable (or for more free variables) is implemented by changing the environment. Such an implementation is of interest in situations where (as in Automath) the issue is not to normalize an expression, but to decide whether two expressions are equal, i.e. whether they have a common reduct. In such a situation time as well as space is saved because there is no copying involved in substitution.
We start our description by briefly explaining the structure of the system. We also explain the mechanism of relative addressing, known by now as the system of de Bruijn indices. Our system (and also Zandleven's implementation) makes essential use of this mechanism. We think, however, that similar implementations using absolute addressing might be possible.
Then, in Section 2, we give a formal definition of our system of name-free λ-calculus. We formally define single and multiple substitution, stating theorems about commuting these operations. In Section 3 we discuss the implementation of substitution by structure sharing.
We define environments, operations upon environments and an interpretation function, mapping pairs (A, E), where A is an expression and E an environment, to expressions. We prove that our implementation is sound, i.e. that
the interpretation of (A, E) is really the result of the substitution we meant to implement in constructing E. We carry out this program first for multiple substitution, then for single substitution and finally for a combination of both. Then, in Section 4, we treat the implementation of typing. We define a typing operator in the original system, which associates a type to certain expressions. We give theorems about the connections between typing and substitution. After that we describe in our implementation a corresponding operator which produces a type for the expression denoted by the pair (A, E) (if this expression has a type) and we prove this function to behave as described. We omit proofs, all proofs being by simple induction on the structure of expressions, possibly making use of earlier theorems. Finally we make some concluding remarks in Section 5. In that section we discuss briefly the possibilities of the system and the environment in which it has been used.
1.1. Automath
First we give a short description of the main aspects of Automath which are relevant to our discussion. The Automath system is a proof checking system, which can be used for checking large mathematical proofs. As such it can be used also in checking correctness proofs for designs of computer programs. For an introduction into Automath we refer to [de Bruijn 80 (A.5)] and [van Daalen 73 (A.3)].
Automath is a typed λ-calculus with dependent types. We assume the basic notions of λ-calculus, such as substitution and β-reduction, to be well known. Incorporated in the system is the notion of definition: if A is an expression and if the free variables of A are among x1, ..., xn (where n ≥ 0) then we can define an n-ary constant a such that a(x1, ..., xn) stands for A. Now if B1, ..., Bn are expressions then a(B1, ..., Bn) is an expression, which should be interpreted as the expression obtained from A by simultaneous substitution of Bi for xi (where 1 ≤ i ≤ n). The operation of eliminating a definition, replacing a defined constant by its definiens, is called δ-reduction. In our system it gives rise to the implementation of simultaneous substitution.
It is convenient for our description to forget about the arity of constants by postulating them to have infinite arity. We will therefore consider in the following sections expressions of the form a(B̄) where B̄ is an infinite sequence of expressions.
As we have said, Automath is a typed λ-calculus: terms are typed λ-terms, variables are typed. When the variable x has type A this is denoted by x : A. If A, B and C are expressions, x is a variable, and if for x typed by A the type
An implementation of substitution (E.3)
811
of B is C, then the function λx : A.B has type Πx : A.C (where B and C may depend upon x). We adopt the practice (which is common use in Automath) to denote both expressions λx : A.B and Πx : A.C in the same way: by [x : A]B and [x : A]C. In Section 4 we come back to this point. The value of the function A for the argument B will be denoted by A{B}. (This conflicts with the habit in Automath to put the argument before the function.) Thus we get the following description of Automath expressions:
We assume two disjoint infinite sets C and V to be given:
  C = {a, b, c, ...} is the set of constants,
  V = {x, y, z, ...} is the set of variables.
Now
  if x is a variable then x is an expression,
  if a is a constant and B̄ is a sequence of expressions then a(B̄) is an expression,
  if A and B are expressions, and x is a variable, then [x : A]B is an expression,
  if A and B are expressions then A{B} is an expression.
On these expressions we can define single and multiple substitution. Also α-conversion and β-reduction can be defined, and for constants which have definitions we can define δ-reduction (i.e. application of a definition).
1.2. de Bruijn indices
When defining substitution formally the main problem is to avoid clash of variables. This is usually done by introducing "fresh" variables where needed. In our presentation we avoid clash of variables by using relative addressing (nameless variables or "de Bruijn indices" [de Bruijn 72b (C.2)]; see also [Berkling & Fehr 82]). This concept is important both in Zandleven's actual implementation of the Automath verifier and in our formal treatment of substitution and its implementation by environments. We will now give an informal description of this representation of variables. As an example we choose the expression

  [u : x{z}][v : [w : u]w] z{u} .     (1)

This expression should be considered relative to a "context", that is a sequence of distinct variables, containing all its free variables. In this case the context could be

  x, y, z, ... .     (2)
We represent the expression (1) on its context (2) by a planar tree.

Figure 1

In this tree the nodes labelled ab and ap represent abstraction and application, respectively. The bold unary dots represent the places where variables are bound, and the letters labelling the leaves are the (free and bound) variables. Note that every letter which labels a leaf should be bound by a bold dot, its binder, which is situated either on its path to the root of the tree or in the context. Now we replace every letter at a leaf by a number, which indicates the number of bold (binding) dots which lie on the path to its binder. Or, in other words, we replace every occurrence of a variable by the number of scopes in which it is contained and which are strictly contained in its own scope. Doing so we get the tree of Figure 2. It will be clear that, if we identify α-equivalent expressions, the letters labelling the binders are now irrelevant. Forgetting about these we get the "nameless" expression

  [0{2}][[0]0] 4{1} .

Note that in this representation of (1) the number 0 denotes the different variables x, u and w, while the variable u is represented by 0 as well as by 1. Let us analyze which numbers denote the same free variable in a tree. Suppose n is a number labelling a node in a (nameless) tree and let m be the number of bold dots on the path from this node to the root. It is clear that n denotes
Figure 2
a free variable iff n ≥ m. We will call the number n − m the depth of n in the tree. And we see that two labels denote the same free variable if they have the same non-negative depth. In the example this is the case with the labels 2 and 4 (which both denote z; in fact there are two dots on the path from 2 to z, viz. the dots x and y, and four dots on the path from 4 to z, viz. v, u, x and y). Both labels 2 and 4 have depth 2. It is easy to make algorithms which translate ordinary name-carrying expressions into nameless ones and back (given a certain context and forgetting about α-equivalence).
2. NAME-FREE λ-CALCULUS

2.1. The language
The structure of the expressions and the use of de Bruijn indices have been explained in the previous section. We will now give a formal definition of the language, which is essentially the Automath language defined in Section 1, but uses de Bruijn indices instead of variables. In the following N denotes the set of natural numbers {0, 1, 2, ...}.
Definition. We assume an infinite set C = {a, b, c, ...} of constants to be given. Now the set L = {A, B, C, ...} of expressions is defined by
  if x ∈ N then x is an expression,
  if a ∈ C and B̄ is a sequence of expressions then a(B̄) is an expression,
  if A and B are expressions then ([A]B) is an expression,
  if A and B are expressions then A{B} is an expression.
We will omit parentheses around [A]B when this does not present parsing problems. The natural numbers occurring in an expression will be called references. It has been explained in the introduction how references can be interpreted as nameless (free or bound) variables, and that references having the same non-negative depth can be interpreted as denoting the same free variable. There is, however, no need for defining formally the concept of depth; it will be used only in informal comments.

2.2. Operations

2.2.1. Updating
When we substitute an expression A for the free variables in B with depth n we should add n to the references representing free variables of A in order to preserve their original depth in the result of substitution. The corresponding updating operator is denoted by u^n_0. When we are updating the expression A in this way we do not want the references denoting bound variables to be updated. If, for example, A is the expression [C]D then references in D with depth 0 should remain unchanged, and other references should be updated. Therefore we need operators u^n_k (for every k ∈ N) which increase by n the references in A with depth k or more. This gives u^n_0 as a special case.
Definition.
  u^n_k x = x + n   if x ≥ k
          = x       if x < k
  u^n_k [A]B = [u^n_k A] u^n_{k+1} B
  u^n_k A{B} = u^n_k A {u^n_k B}
  u^n_k a(Ā) = a(u^n_k Ā) .
Remarks. In the last clause of this definition we have used u^n_k Ā, which is meant to denote the sequence of expressions obtained from Ā by applying u^n_k
to each item. It is possible, of course, to define this concept formally, but this would give rise to an extra clause in our definition. For brevity the present notation has been chosen in all definitions. The second clause of the definition is justified by the fact that references with depth k + 1 or more in B correspond to references in [A]B with depth k or more.
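The clauses of the updating operator can be rendered directly as a recursive function. The tuple encoding below is our own illustrative choice, not the paper's notation, and the finite argument lists of constants stand in for the paper's infinite sequences; a sketch:

```python
# Encoding (ours): ('ref', x)           a reference (de Bruijn index)
#                  ('abs', A, B)        [A]B
#                  ('app', A, B)        A{B}
#                  ('const', a, [B...]) a(B0, B1, ...)

def u(n, k, e):
    """u^n_k: increase by n every reference in e with depth k or more."""
    tag = e[0]
    if tag == 'ref':
        return ('ref', e[1] + n) if e[1] >= k else e
    if tag == 'abs':          # second clause: B sits under one more binder
        return ('abs', u(n, k, e[1]), u(n, k + 1, e[2]))
    if tag == 'app':
        return ('app', u(n, k, e[1]), u(n, k, e[2]))
    return ('const', e[1], [u(n, k, b) for b in e[2]])

# Lifting [0] 0{1} by 2 leaves the bound 0 alone and sends the free 1 to 3.
A = ('abs', ('ref', 0), ('app', ('ref', 0), ('ref', 1)))
print(u(2, 0, A))   # ('abs', ('ref', 2), ('app', ('ref', 0), ('ref', 3)))
```

The example illustrates the remark above: inside the body of [A]B the cut-off k is raised by one, so bound references are never touched.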
2.2.2. Single substitution
We now define the operation s^k_C of single substitution. s^k_C A denotes the result of substituting C for the references in A with depth k. The variable denoted by the depth k in A is supposed to disappear from the context. Therefore the references with greater depth are decreased by 1.
Definition.
  s^k_C x = x         if x < k
          = u^k_0 C   if x = k
          = x − 1     if x > k
  s^k_C [A]B = [s^k_C A] s^{k+1}_C B
  s^k_C A{B} = s^k_C A {s^k_C B}
  s^k_C a(Ā) = a(s^k_C Ā) .
Remark. The system defined here, with the operations u^n_k and s^k_C, is similar to the λ-calculus treated in [Curien 86].

2.2.3. Multiple substitution
We define the operation d^k_C̄ of multiple substitution. That we use the symbol "d" to denote this operation suggests that its main use is in applying definitions (or, in Automath jargon, in δ-reduction). d_C̄ A denotes the result of simultaneously substituting the expressions C0, C1, C2, ... for the references in A with depth 0, 1, 2, ... respectively. We do not want the references denoting bound variables in A to be changed. Therefore we need for k ∈ N the operation d^k_C̄ denoting simultaneous substitution for those references in A which have depth k or more.
Definition.
  d^k_C̄ x = x                if x < k
          = u^k_0 C_{x−k}    if x ≥ k
  d^k_C̄ [A]B = [d^k_C̄ A] d^{k+1}_C̄ B
  d^k_C̄ A{B} = d^k_C̄ A {d^k_C̄ B}
  d^k_C̄ a(Ā) = a(d^k_C̄ Ā) .
2.3. Theorems The following theorems treat the properties and relations of the operations defined above. Proofs proceed by easy induction on the structure of the expressions. For some theorems we give an intuitive justification.
Theorem 2.1.
  u^n_k u^m_l A = u^{m+n}_l A           if l ≤ k ≤ l + m ,
  u^n_k u^m_l A = u^m_l u^n_{k−m} A     if k > l + m .

Theorem 2.2.
  s^k_C u^m_l A = u^{m−1}_l A           if l ≤ k < l + m ,
  s^k_C u^m_l A = u^m_l s^{k−m}_C A     if k ≥ l + m .

Theorem 2.3.
  s^k_C s^l_D A = s^l_{s^{k−l}_C D} s^{k+1}_C A     if k ≥ l .

Remark. Theorem 2.3 corresponds to a well-known result on ordinary substitution: S^x_C S^y_D A = S^y_{S^x_C D} S^x_C A provided x ≢ y and x is not free in C.
Theorem 2.4.
  u^m_l d^k_C̄ A = d^{k+m}_C̄ u^m_l A       if k ≥ l ,
  u^m_l d^k_C̄ A = d^k_{u^m_{l−k} C̄} A     if k ≤ l .

Remark. Theorem 2.4 can be understood as follows. Suppose k ≥ l. Then, if we apply u^m_l to an expression A, the references in A with depth ≥ k are increased by m (together with all other references in A with depth ≥ l) and become references with depth ≥ k + m. When we subsequently substitute the expressions C̄ for these references, they are updated with u^{k+m}_0, as may be seen in the definition in 2.2.3.
If, on the other hand, we first substitute C̄ for the references with depth ≥ k, then the substitution will occur at the corresponding places in A and the expressions C̄ are updated with u^k_0. When we afterwards apply u^m_l, all references with depth ≥ l (which now includes all outside references in C̄) are increased by m, which gives the same result.
Now suppose k ≤ l. If we substitute C̄ for the references in A with depth ≥ k then there are no references left originating from A itself with depth ≥ l. Moreover, all expressions in C̄ are now updated by u^k_0. If we subsequently apply u^m_l, this will affect precisely those references in the expressions from C̄ which had depth ≥ l − k before substitution. Therefore we can reverse the order, beginning with updating C̄ and doing the substitution afterwards.
Theorem 2.5a.

Theorem 2.5b.  (if k > l)

Theorem 2.6.
  d^k_C̄ d^k_D̄ A = d^k_{d^0_C̄ D̄} A .

Remark. Theorem 2.6 corresponds to a well-known result on multiple substitution: S^x̄_C̄ S^ȳ_D̄ A = S^ȳ_{S^x̄_C̄ D̄} A provided all free variables in A which are among x̄ are also among ȳ.
3. THE IMPLEMENTATION OF SUBSTITUTION
In this section we describe the implementation of substitution by structure sharing. This implementation is used in the Automath verifying program, and has been designed by I. Zandleven. In order to code substitution without really carrying it out (i.e. without constructing new expressions by copying existing expressions) we use environments.
Expressions can be interpreted with respect to such environments, and the interpretation will be an expression, which we will prove to be the intended result of substitution. By coding substitution in this way the amount of memory space required and the time for carrying out substitutions will decrease, while the time needed for comparison of expressions might increase. We will give in this section mathematical definitions of environments and of the interpretation function and state a theorem that these definitions meet the requirements formulated above. In order to make our description clear we will treat first multiple substitution, then single substitution, and finally the combination of both.
3.1. Multiple substitution
In this section we define environments for multiple substitution, called d-environments. The name suggests that these environments are used when δ-reduction (i.e. application of definitions) is performed. We will also define two operations on these environments.

Definition. A d-environment Δ is a finite sequence of functions Δ1, Δ2, ..., Δk, where Δi : N → L for 1 ≤ i ≤ k and k ∈ N. The number k will be called the length of Δ.

Let us first explain informally what we intend to picture by such a d-environment. Suppose Δ is a d-environment of length 1. If we interpret an expression A with respect to Δ we intend that the free references in A with depth i shall be interpreted to denote the value of the function Δ1 at i. We could picture this in a diagram as follows:

Figure 3

If we want to interpret an expression A on a longer d-environment Δ, of length k say, then we think of the free references of A as pointing into Δk, but an expression Δk(i) should now be interpreted with respect to the d-environment
Δ1, Δ2, ..., Δk−1. This situation is illustrated in the following diagram.

Figure 4

Note that a d-environment might be an empty sequence. The interpretation of an expression A on the empty d-environment is A itself.
3.1.1. Multiple substitution
The operation δ_C̄ (where C̄ is a sequence of expressions) is called multiple substitution (because it codes multiple substitution). Its effect is to extend a d-environment with the sequence C̄.

Definition. Let Δ = Δ1, ..., Δk be a d-environment. Then δ_C̄ Δ = Δ1, ..., Δk, Δk+1 where Δk+1(i) = C_i for i ∈ N.
3.1.2. Cutting
The operation γ cuts the last segment from a (non-empty) d-environment.

Definition. Let Δ = Δ1, ..., Δk be a d-environment, and k ≥ 1. Then γΔ = Δ1, ..., Δk−1.

3.1.3. Interpretation
We now define the interpretation ("on depth n") of an expression A with respect to a d-environment Δ. The definition formalizes our intentions as explained above. The parameter n for the depth is needed for interpreting the bound variables in A. It is intended that the interpretation of A is the result
of a certain multiple substitution for all free variables of A. Because nothing should be substituted for the bound references in A, these references should be unchanged by interpretation. In the definition recursion is used on the length of Δ and on A.

Definition. Let Δ = Δ1, ..., Δk be a d-environment. If k = 0 then I(A, Δ, n) = A. If k ≥ 1 then
  I(x, Δ, n) = x                             if x < n
  I(x, Δ, n) = u^n_0 I(Δk(x − n), γΔ, 0)     if x ≥ n
  I([A]B, Δ, n) = [I(A, Δ, n)] I(B, Δ, n + 1)
  I(A{B}, Δ, n) = I(A, Δ, n){I(B, Δ, n)}
  I(a(Ā), Δ, n) = a(I(Ā, Δ, n)) .

We derive two theorems concerning this interpretation. Theorem 3.2 shows us that the interpretation which we have defined has the property we wanted, i.e. that d-environments indeed code multiple substitution.
Theorem 3.1. If l ≤ n then
  I(u^m_l A, Δ, n + m) = u^m_l I(A, Δ, n) .

Theorem 3.2.
  I(A, δ_C̄ Δ, n) = d^n_{C̄*} A   where C*_i = I(C_i, Δ, 0) .
3.2. Single substitution
Now we define another kind of environments, called s-environments, which code single substitution (as is suggested by the "s" in their name).

Definition. An s-environment is a partial function Σ : N → (L × N) with a finite domain.

Let us explain the intended interpretation of an expression A with respect to an s-environment Σ. A reference in A with non-negative depth k (which represents a free variable in A) is intended to denote a variable (i.e. a reference) if k ∉ dom(Σ). If k ∈ dom(Σ) then this tells us that an expression has been
substituted for k. If Σ(k) is the pair [C, n] then C has been substituted, or rather the interpretation of C with respect to the prepart of the original environment which starts n places before the place k of C. We illustrate this in another diagram.

Figure 5
The need for such a construction can be seen when we consider the expression

  ([A][B]C){D}{E}

where A, B, C, D and E are expressions. We want to consider the interpretation of this expression with respect to a certain s-environment Σ, and to apply β-reduction. Let us draw a diagram of this initial situation.

Figure 6

So we consider the β-redex ([A][B]C){D} on the same environment. It will be clear that its reduct should be represented by the expression [B]C interpreted on the environment Σ extended by D (where D should also be interpreted on Σ). The situation is pictured in Figure 7. Its interpretation is the result of substitution of D (or, rather, the interpretation of D) for the free references of depth 0 in [B]C. Now we can reduce again by applying this expression to E, i.e. by substituting E for the references with depth 0 in C. But E should be interpreted with respect to the original environment Σ, and we can indicate this by putting E into the environment with index 1, which will bring about that references in E will be considered as pointing beyond D. The situation is sketched again in the diagram of Figure 8.
Figure 7

Figure 8

Both D and E are now interpreted with respect to the original environment Σ, while references with depth 0 from C point to E and references with depth 1 point to D.
Now let us look at the intended interpretation of a reference which does not point into the domain of Σ. We have seen that references are shifted when a substitution is made for a smaller reference (cf. Definition 2.2.2, the clause concerning s^k_C x where x > k). This indicates that the interpretation of a reference with depth x should be obtained from x by subtracting the number of elements of dom(Σ) which are below x. Therefore we give the following definition.
Definition. x mod Σ := x − #{y ∈ dom(Σ) | y < x} .

Then the intended interpretation of a reference with depth x not pointing into dom(Σ) will be x mod Σ. On s-environments we define three operations.

3.2.1. Substitution
The operation σ^k_{C,n} (called substitution because it codes (single) substitution) extends the domain of an s-environment to k, and associates to k as value
the pair [C, n]. (As has been explained above, "n" indicates that C should be interpreted with respect to a shorter s-environment.)

Definition. If k ∉ dom(Σ) then
  (σ^k_{C,n} Σ)(k) = [C, n]
  (σ^k_{C,n} Σ)(i) = Σ(i)    if i ∈ dom(Σ) .
3.2.2. Extension
The operation ε is called extension because it codes extension of a context with another free variable. Its effect is that the domain of the s-environment is shifted.

Definition.
  (εΣ)(i + 1) = Σ(i)    if i ∈ dom(Σ) .

3.2.3. Cutting
The operation γn, called cutting, codes the removing of variables from the context. Its effect is that the domain of the s-environment is shifted, and possibly becomes smaller.

Definition.
  (γn Σ)(i) = Σ(i + n)    if i + n ∈ dom(Σ) .
3.2.4. Interpretation
With the help of the operations on s-environments defined above we can describe the interpretation of an expression with respect to an s-environment Σ. As we have seen, the interpretation of a reference with depth x which does not point into dom(Σ) will be x mod Σ, which is x minus the number of elements of dom(Σ) which are smaller than x. First we state some properties of x mod Σ. Clearly the function λx.(x mod Σ) is increasing. Moreover we have:

Theorem 3.3. If x ∉ dom(Σ) then
  x mod Σ = (x + 1) mod Σ − 1 .
The following three theorems show us what the relation is between the three operations defined on Σ and the value of x mod Σ.

Theorem 3.4.
  x mod σ^k_{C,n} Σ = x mod Σ        if x ≤ k ,
  x mod σ^k_{C,n} Σ = x mod Σ − 1    if x > k .

Theorem 3.5.
  x mod εΣ = (x − 1) mod Σ + 1 .

Theorem 3.6.
  x mod γk Σ = (x + k) mod Σ − k mod Σ .
Now we will define the interpretation of an expression A with respect to an s-environment Σ. We use recursion on dom(Σ) and A.

Definition.
  I(x, Σ) = u^{(x+k+1) mod Σ}_0 I(B, γ_{x+k+1} Σ)    if Σ(x) = [B, k]
  I(x, Σ) = x mod Σ                                  if x ∉ dom(Σ)
  I([A]B, Σ) = [I(A, Σ)] I(B, εΣ)
  I(A{B}, Σ) = I(A, Σ){I(B, Σ)}
  I(a(Ā), Σ) = a(I(Ā, Σ)) .
We derive two theorems concerning this interpretation. Theorem 3.8 shows us that the interpretation which we have defined has the property we wanted, i.e. that s-environments indeed code single substitution.

Theorem 3.7.
  I(u^n_k A, ε^k Σ) = u^{n mod Σ}_k I(A, ε^k γn Σ) .

Corollary 3.7.
  I(u^n_0 A, Σ) = u^{n mod Σ}_0 I(A, γn Σ) .
Theorem 3.8. If k ∉ dom(Σ) then
  I(A, σ^k_{C,n} Σ) = s^{k mod Σ}_{C*} I(A, Σ)
where
  C* = u^{n mod γ_{k+1} Σ}_0 I(C, γ_{k+n+1} Σ) .
3.3. Combination of d-environments and s-environments
Now we will combine the techniques described in the previous sections. For doing this we define another kind of environments, which we will call c-environments. Here "c" indicates that these environments combine the possibilities of d-environments and s-environments.

Definition. A c-environment is a partial function Γ on N with a finite domain, and with values in (N → L) ∪ (L × N).

The values in N → L denote segments of a d-environment, the values in L × N denote values of an s-environment. Let us first sketch the intended interpretation of an expression A with respect to such a combined environment Γ. As an example we present the following diagram.
Figure 9

As the diagram suggests, references from A are considered to point either into the first d-environment segment of Γ or before that segment. In order to
picture the intended environments for C, D and E in the situation sketched above we look at the following diagram.
Figure 10

As may be seen in the diagram, the environments for C and D are obtained by counting back 1 place and 6 places, respectively, just as in the case of an s-environment. The expressions in a d-environment segment, such as E, have as their environment that part of Γ which lies to the left of the segment.
For a c-environment Γ we define the place of its first d-environment segment.

Definition.
  a(Γ) = min { l ∈ dom(Γ) | Γ(l) ∈ N → L } .

Remarks. The minimum is considered to be infinite if the set is empty. For the c-environment in the example above a(Γ) = 4.
On c-environments we (re)define the four operations: multiple and single substitution, extension and cutting.

3.3.1. Multiple substitution
The c-environment gets a d-environment segment C̄ at k.

Definition. If l > k for all l ∈ dom(Γ) then
  (δ^k_C̄ Γ)(k) = C̄
  (δ^k_C̄ Γ)(i) = Γ(i)    if i ∈ dom(Γ) .
3.3.2. Single substitution
The domain of the c-environment is extended to k, and its value at k is a pair [C, n]. As earlier the "n" indicates that C should be interpreted with respect to a shorter c-environment.

Definition. If k ∉ dom(Γ) then
  (σ^k_{C,n} Γ)(k) = [C, n]
  (σ^k_{C,n} Γ)(i) = Γ(i)    if i ∈ dom(Γ) .

3.3.3. Extension
The c-environment is extended (as was previously done with s-environments). The domain of the environment is shifted.

Definition.
  (εΓ)(i + 1) = Γ(i)    if i ∈ dom(Γ) .
3.3.4. Cutting
From the c-environment the last n entries are cut.

Definition.
  (γn Γ)(i) = Γ(i + n)    if i + n ∈ dom(Γ) .

3.3.5. Interpretation
The interpretation of a reference with depth x representing a free variable with respect to a c-environment Γ is comparable to its interpretation on an s-environment Σ. Therefore we define again for x ∈ N the possible interpretation x mod Γ as follows.

Definition.
  x mod Γ := x − #{y ∈ dom(Γ) | y < x} .
As before, it is clear that the function λx.(x mod Γ) is increasing. Also the analogous theorems hold.

Theorem 3.9. If x ∉ dom(Γ) then
  x mod Γ = (x + 1) mod Γ − 1 .

Theorem 3.10.
  x mod δ^k_C̄ Γ = x mod Γ        if x ≤ k ,
  x mod δ^k_C̄ Γ = x mod Γ − 1    if x > k .

Theorem 3.11.
  x mod σ^k_{C,n} Γ = x mod Γ        if x ≤ k ,
  x mod σ^k_{C,n} Γ = x mod Γ − 1    if x > k .

Theorem 3.12.
  x mod γk Γ = (x + k) mod Γ − k mod Γ .

Theorem 3.13.
  x mod εΓ = (x − 1) mod Γ + 1 .
Now we will define the interpretation of an expression A with respect to a c-environment Γ. We use induction on dom(Γ) and A.

Definition.
  I(x, Γ) = u^{(x+k+1) mod Γ}_0 I(B, γ_{x+k+1} Γ)                    if x < a(Γ) and Γ(x) = [B, k]
  I(x, Γ) = x mod Γ                                                  if x < a(Γ) and x ∉ dom(Γ)
  I(x, Γ) = u^{a(Γ) mod Γ}_0 I(Γ(a(Γ))(x − a(Γ)), γ_{a(Γ)+1} Γ)      if x ≥ a(Γ)
  I([A]B, Γ) = [I(A, Γ)] I(B, εΓ)
  I(A{B}, Γ) = I(A, Γ){I(B, Γ)}
  I(a(Ā), Γ) = a(I(Ā, Γ)) .

The following theorems hold. Theorems 3.15 and 3.16 show that the implementation is sound, coding single as well as multiple substitution.
Theorem 3.14. If n < a(Γ) then
  I(u^n_k A, ε^k Γ) = u^{n mod Γ}_k I(A, ε^k γn Γ) .

Corollary 3.14. If n < a(Γ) then
  I(u^n_0 A, Γ) = u^{n mod Γ}_0 I(A, γn Γ) .

Theorem 3.15. If l > k for all l ∈ dom(Γ) then
  I(A, δ^k_C̄ Γ) = d^k_{C̄*} A   where C*_i = I(C_i, γ_{k+1} Γ) .

Theorem 3.16. If k ∉ dom(Γ) then
  I(A, σ^k_{C,n} Γ) = s^{k mod Γ}_{C*} I(A, Γ)
where
  C* = u^{n mod γ_{k+1} Γ}_0 I(C, γ_{k+n+1} Γ) .
4. TYPING

In this section we discuss the way in which certain expressions are typed. First we describe the typing of Automath expressions with named variables. Then we will define the typing of expressions in the name-free λ-calculus L defined in Section 2. We will also state some theorems giving the relation between typing, updating and substitution. Finally we will indicate how to find a type for an expression which should be interpreted on a c-environment. And we will state a theorem to the effect that the type found in this way is really the type of the interpreted expression.
We start by discussing types in Automath. Consider the expressions A, B and C and the variable x. We have remarked before that, if for x typed by A the type of B is C, then the function λx : A.B has type Πx : A.C (where B and C may depend upon x). When we apply this function to an argument D (of type A) we get a value in S^x_D C, that is the type C where D is substituted for x. We could also describe this type by saying that it is the co-ordinate indexed by D of the product Πx : A.C. We use, as before, the notation F{D} for the value of the function F at the argument D, and we introduce here locally the notation P⟨D⟩ for that co-ordinate of the product P which is indexed by D. Then we conclude that if F has type P then F{D} has type P⟨D⟩.
Moreover (Xz : A.B) { D } @reduces to Ss B , and also (Hz : A.C) ( D ) preduces to S g C . We see that the product constructor II together with coordinate selection ( ) have the same behaviour with respect to substitution and P-reduction as the functional abstractor X together with application { }. This is the reason for identifying them in our exposition (as is common use in Automath): Hz : A.B and Xz : A.B are both denoted by [z : A]B , and A { B } as well as A ( B )will be denoted by A { B } . Then, if the type of A is C , the type of A { B } will be C { B } . Now, as we are interested here in syntactical issues only, we will not treat the concept of correctness of expressions. We will, however, introduce a typing operator 7, such that for correct expressions A which have a type, it holds that A is of type 7 ( A ) . Let us first consider the proper Automath case, where all constants have a finite arity. We assume that for certain constants a their types are given; i.e. we presuppose a function 0 which associates to certain expressions a(.') a type (i.e. an expression). Here .'is a finite sequence of distinct variables and its length is the arity of a. This sequence should contain all free variables in the type @(a(Z)).We assume also that the notion of simultaneous substitution has been properly defined. If Z is a sequence of distinct variables and l? a sequence of expressions, and if these sequences have the same length, then S; A will denote the result of simultaneous substitution of the expressions for the variables .' in A. The typing operator 7 (which is a partial operator because not all constants have a type) is now defined as follows:
Let A be an expression, and let φ be a function associating a type (an expression) to every free variable of A. Now
if A is a variable x then τ(A, φ) = φ(x),
if A is a(B⃗) and a(x⃗) is in dom(Θ) then τ(A, φ) = S_x⃗^B⃗ Θ(a(x⃗)),
if A is [x : B]C then τ(A, φ) = [x : B] τ(C, φ*) (where φ* is obtained from φ by changing its value at x to B),
if A is B{C} then τ(A, φ) = τ(B, φ){C}.
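The four clauses above can be sketched in Python. This is our own illustration, not code from the paper; the representation (nested tuples) and the names `tau`, `subst`, `phi`, `theta` are ours.

```python
# A sketch of the typing operator tau for named-variable expressions.
# Expressions are nested tuples:
#   ('var', x)           a variable x
#   ('const', a, args)   a constant a with a list of arguments
#   ('abs', x, B, C)     [x : B]C
#   ('app', B, C)        B{C}
# theta maps a constant name to (parameter list, stored type); subst performs
# simultaneous substitution of expressions for named variables.

def subst(expr, env):
    """Simultaneous substitution: env maps variable names to expressions."""
    tag = expr[0]
    if tag == 'var':
        return env.get(expr[1], expr)
    if tag == 'const':
        return ('const', expr[1], [subst(e, env) for e in expr[2]])
    if tag == 'abs':
        _, x, B, C = expr
        inner = {y: e for y, e in env.items() if y != x}  # x is rebound here
        return ('abs', x, subst(B, env), subst(C, inner))
    return ('app', subst(expr[1], env), subst(expr[2], env))

def tau(expr, phi, theta):
    """Type of expr relative to phi (variable -> type) and theta (constants)."""
    tag = expr[0]
    if tag == 'var':                      # tau(x, phi) = phi(x)
        return phi[expr[1]]
    if tag == 'const':                    # substitute the arguments into
        params, ctype = theta[expr[1]]    # the stored type of the constant
        return subst(ctype, dict(zip(params, expr[2])))
    if tag == 'abs':                      # tau([x:B]C) = [x:B] tau(C, phi*)
        _, x, B, C = expr
        phi_star = dict(phi)
        phi_star[x] = B
        return ('abs', x, B, tau(C, phi_star, theta))
    return ('app', tau(expr[1], phi, theta), expr[2])  # tau(B{C}) = tau(B){C}
```

For instance, with `nat = ('const', 'nat', [])`, the identity `[x : nat]x` gets the type `('abs', 'x', nat, nat)`, i.e. `[x : nat]nat`.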
Remark. This typing operator corresponds to the one described in [Nederpelt 73 (C.3)], as far as abstraction and application are concerned. The description of various typing operators for defined expressions a(B⃗) can be found in [van Daalen 80].
An implementation of substitution (E.3)
4.1. Typing in name-free λ-calculus; contexts
Now we discuss typing in the name-free λ-calculus L of Section 2. Expressions will be expressions in L. In particular constants are considered to have infinite arity and sequences of expressions are infinite. For typing such expressions we need types for constants and variables, just as in the case described above. Therefore we assume two functions to be given: a possibly partial function Θ : L → L and a total function φ : N → L.
The latter, which gives us the types of variables, will be called a context. The expressions of such a context, i.e. the values φ(i), are expressions which may contain free variables (i.e. references). These references should be considered as being typed relative to the prepart of φ. More precisely: the type of a reference with depth k in φ(i) must be taken to be φ(i + k + 1).

We define two operations upon contexts.
4.1.1. Extension
For a context φ we denote by ε_A φ the extension of φ by an expression A. The consequence is that references with depth 0 in the new context will have type A, while references with depth i > 0 will have the type which was originally the type of a reference with depth i − 1.
4.1.2. Cutting
If φ is a context then γ_n φ denotes the result of cutting from φ a segment of length n. Hence γ_1 is the left inverse of ε_A.

Definition. (γ_n φ)(i) = φ(i + n).
4.1.3. The typing operator
We now assume the (partial) function Θ : L → L, which gives the types of constants, to be given and fixed. We define for an expression A and a context φ the type of A relative to φ, to be denoted by t(A, φ).
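A hedged Python reconstruction of the typing operator t (the representation and names are ours, and the clauses are inferred from the remark below and from the named-variable operator τ, not copied from the paper):

```python
# Name-free expressions: ('var', n) a reference of depth n, ('abs', B, C)
# for [B]C, ('app', B, C) for B{C}.  'shift' plays the role of the update.

def shift(expr, k, cutoff=0):
    """Add k to every reference of depth >= cutoff (an update operator)."""
    tag = expr[0]
    if tag == 'var':
        return ('var', expr[1] + k) if expr[1] >= cutoff else expr
    if tag == 'abs':
        return ('abs', shift(expr[1], k, cutoff), shift(expr[2], k, cutoff + 1))
    return ('app', shift(expr[1], k, cutoff), shift(expr[2], k, cutoff))

def t(expr, phi):
    """Type of a name-free expression relative to the context phi."""
    tag = expr[0]
    if tag == 'var':
        # phi(n) is typed relative to the prepart of phi, so update it by n+1
        return shift(phi(expr[1]), expr[1] + 1)
    if tag == 'abs':
        # type the body under the extended context
        B, C = expr[1], expr[2]
        return ('abs', B, t(C, lambda i: B if i == 0 else phi(i - 1)))
    return ('app', t(expr[1], phi), expr[2])   # t(B{C}, phi) = t(B, phi){C}
```

For example, in a context where every reference 0 points one cell up, the term [⟨ref 0⟩]⟨ref 0⟩ gets the type [⟨ref 0⟩]⟨ref 1⟩: the bound variable's type, once seen from inside the abstraction, must be shifted past the binder.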
Remark. The update in the first clause of this definition ensures that the type which is defined can be considered relative to the same context as the expression which is typed. The expression φ(i) points to a prepart of φ preceding i, as has been indicated above.

4.2. Theorems
The following theorems hold for the relation between typing, updating and substitution. The theorems have been formulated so as to make the proofs (by induction) straightforward. The interesting properties are contained in the corollaries.
Theorem 4.1.
Corollary 4.1. t(u_n A, φ) = u_n t(A, γ_n φ).
Theorem 4.2. If t(C, φ) = D then
Corollary 4.2. If t(C, φ) = D then t(s_C A, φ) = s_C t(A, ε_D φ).
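The commutation stated in Corollary 4.2 can be checked on a small example. The sketch below is self-contained and uses our own hedged reconstructions of updating, single substitution and the typing operator; it is an illustration of the property, not the paper's code.

```python
# Expressions: ('var', n), ('abs', B, C) for [B]C, ('app', B, C) for B{C}.

def shift(e, k, c=0):
    """Add k to every reference of depth >= c (an update operator)."""
    if e[0] == 'var':
        return ('var', e[1] + k) if e[1] >= c else e
    if e[0] == 'abs':
        return ('abs', shift(e[1], k, c), shift(e[2], k, c + 1))
    return ('app', shift(e[1], k, c), shift(e[2], k, c))

def subst(e, C, k=0):
    """Single substitution s_C: replace depth k by C, lower deeper references."""
    if e[0] == 'var':
        if e[1] == k:
            return shift(C, k)
        return ('var', e[1] - 1) if e[1] > k else e
    if e[0] == 'abs':
        return ('abs', subst(e[1], C, k), subst(e[2], C, k + 1))
    return ('app', subst(e[1], C, k), subst(e[2], C, k))

def extend(A, phi):
    """epsilon_A phi."""
    return lambda i: A if i == 0 else phi(i - 1)

def t(e, phi):
    """Type of a name-free expression relative to the context phi."""
    if e[0] == 'var':
        return shift(phi(e[1]), e[1] + 1)
    if e[0] == 'abs':
        return ('abs', e[1], t(e[2], extend(e[1], phi)))
    return ('app', t(e[1], phi), e[2])

phi = lambda i: ('var', 0)           # a hypothetical context
C = ('var', 3)
D = t(C, phi)                        # the type of C in phi
A = ('abs', ('var', 0), ('var', 0))  # a term typed under epsilon_D phi

lhs = t(subst(A, C), phi)            # type after substituting
rhs = subst(t(A, extend(D, phi)), C) # substitute into the type
assert lhs == rhs                    # typing commutes with s_C
```

Both sides evaluate to `('abs', ('var', 3), ('var', 4))` here, as the corollary predicts.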
Corollary 4.3. If t(c_i, φ) = d_c⃗ u_{i+1} φ(i) for i ∈ N, then t(d_c⃗ A, φ) = d_c⃗ t(A, φ).
Remark. For typing to commute with multiple substitution, it is necessary that this substitution is, in some sense, correct. More precisely: it is needed that the type of the expression which is substituted for a reference of depth k is equal to the type of that reference. We have seen, however, that the type of a reference may itself contain free references. Now the substitution operator indicates substitution for all free variables, and therefore the correctness requirement regards the types of the variables with the substitution carried out. The situation resembles the requirement of "fitting of strings into contexts" in Automath languages (cf. [van Daalen 80] and [van Daalen 73 (A.3)]).

4.3. Typing of expressions in a c-environment
In order to describe typing in a c-environment we need, just as above, a (possibly partial) function Θ : L → L describing the typing of constants. This function is implicit in the following. We also need a description of the types of free variables (i.e. references). In our representation we will not give these types explicitly by a function φ : N → L, as we did above, but we will code the types in such a way that they can be retrieved using the c-environment.

Let us consider an expression A on a c-environment Γ. We picture the situation in the following diagram.
Figure 11

We denote the complement of dom(Γ) by F_Γ. As we have explained in Section 3, references from A to F_Γ denote free references. In the diagram we have indicated that a reference to 1 from A denotes 0, and a reference to 3 denotes 1. References from A to 5 are possible only via a reference into dom(Γ), e.g. by a reference into Γ(4); in this case it will denote a reference 2, etc. Generally a reference to a number x in F_Γ will denote a free reference, and we have written down all these interpretations next to the dots which represent the elements of F_Γ. Looking at these interpretations, we see that they could be considered as an order-preserving map from F_Γ onto N. We will denote this map by φ_Γ and its inverse by ψ_Γ. Clearly ψ_Γ is an order-preserving enumeration of F_Γ.

Now, as references to F_Γ should be interpreted as free variables, we choose to code the types of those free variables as a function with domain F_Γ. We will put the type of the free reference with depth i on the place ψ_Γ(i). But we will not put there the type itself of the free reference, but an expression which codes this type, and which should be interpreted itself relative to the environment Γ. This leads us to the following description.

We suppose a function Ψ : F_Γ → L to be given. We will interpret this function as coding the context φ which gives the types of the variables. That is: we will define the "interpretation" U(Ψ, Γ) of Ψ with respect to Γ, and this interpretation will be φ.
Definition. Let i ∈ N; then

U(Ψ, Γ)(i) = I(Ψ(j), γ_{j+1} Γ), where j = ψ_Γ(i).
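The maps φ_Γ and ψ_Γ are easy to compute once dom(Γ) is known. A small sketch (our own, with dom(Γ) modelled as a finite set of naturals):

```python
# psi enumerates the complement of dom_gamma in increasing order;
# phi is its inverse, giving the position of a free place in that enumeration.

def psi(dom_gamma, i):
    """The i-th element (in increasing order) of the complement of dom_gamma."""
    n = 0
    while True:
        if n not in dom_gamma:
            if i == 0:
                return n
            i -= 1
        n += 1

def phi(dom_gamma, x):
    """Inverse of psi: the position of x among the complement's elements."""
    assert x not in dom_gamma
    return sum(1 for n in range(x) if n not in dom_gamma)
```

With the (hypothetical) domain {0, 2, 4}, the free places 1, 3, 5 are enumerated as 0, 1, 2, matching the interpretations written next to the dots in the diagram above.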
It turns out to be necessary to define what the influence on Ψ is when the environment Γ is cut or extended. This is done in the following definitions. In the case of extension the type of the extra variable should of course be given.
Definition. (ε_A Ψ)(0) = A, and (ε_A Ψ)(i + 1) = Ψ(i) if i ∈ dom(Ψ).

Definition. (γ_n Ψ)(i) = Ψ(i + n) if i + n ∈ dom(Ψ).
For these operations the following theorems hold.
Now we define the type T(A, Γ, Ψ) of the expression A with respect to the c-environment Γ and the function Ψ (which codes the types of the free references in A).
4.4. The theorem on typing
We finish this description by stating the following theorem.
Theorem 4.6. Let Γ be a c-environment and Ψ a function on F_Γ as described above; then we have:
5. CONCLUSIONS
The main conclusion which can be drawn from the description we have given is that the implementation of substitution we described is sound. It codes substitution without ever copying a given expression and as a consequence it is cheap both in execution time and in memory space, as far as pure substitution is concerned. In comparing expressions it might be slower than systems which use copying. We may add that it has been implemented and tried out extensively in the Automath checker, which has been used in Eindhoven since 1974.

The system can be used to implement β-reduction, η-reduction and so-called δ-reduction (i.e. application of definitions). Various strategies for deciding convertibility of typed expressions can be implemented using the system.

Our description and also the implementation have made essential use of the concept of "de Bruijn indices", that is of relative addressing of variables. We think, however, that similar implementations using absolute addressing might be possible.

A remarkable feature of the system is the difference which is made between single and multiple substitution, as single substitution could be considered as a special case of multiple substitution. The main reason for the distinction is that, when multiple substitution is applied, it is never required to consider the terms which are substituted on a "shorter" environment, as in the case described in Section 3.2. Therefore we would burden multiple substitution with superfluous administration if we tried to incorporate single substitution as a special case.

Looking back on the description it can be remarked that it uses the natural numbers for coding various tree-like structures. The advantage of this choice is that formal proofs of our theorems are possible; the disadvantage is that the intuitive background of our definitions is not as clear as might be possible in another presentation. This disadvantage is most obvious in Section 4. We have tried to compensate for it by giving informal explanations for our definitions.
Finally we situate this implementation into Landin's SECD machine (for a reference see e.g. [Glaser et al. 84]) in order to make clear what its status is. In fact our system is mainly a possible implementation of the E-part of that machine. The rest of the Automath checker has not been discussed here and is rather different from Landin's machine, and also from the reduction machine for BRL (see [Berkling & Fehr 82]), CATAR (see [Curien 86]) and other functional programming languages. The reason is that the aims of these languages differ somewhat from those of the Automath checker: the latter aims at deciding convertibility of pairs of typed λ-expressions, preferably without normalizing them, while Landin's machine and similar λ-calculus machines aim at normalization. Nevertheless our implementation of substitution could be used also for normalizing expressions.
ACKNOWLEDGEMENTS

The incentive for writing this paper came from G. Huet and T. Coquand, who asked me (in 1986) how substitution was implemented in the Automath verifier. Of course the paper could not have been written without I. Zandleven who invented (and implemented) the system. I want to thank Rob Wieringa for many valuable comments which helped me in presenting my ideas, for patiently listening to many expositions and for invaluable technical support in preparing the text. Paul Gorissen and Frans Kruseman Aretz spent much effort in reading the manuscript and suggested major improvements in the presentation.
PART F
Related Topics
Set Theory with Type Restrictions
N.G. de Bruijn
1. It has been stated and it has been believed throughout this century that set theory is the basis of all mathematics. Usually (but not always) people think of the Cantor set theory, with some formalization like the one of Zermelo-Fraenkel. It describes a universe of things called sets, and everything discussed in mathematics is somehow interpreted in this universe.

2. It seems, however, that there is a revolt. Some people have begun to dislike the doctrine "everything is a set", just at the moment that educational modernizers have pushed set theory even into kindergarten. It is not unthinkable that this educational innovation is one of the things that make others try to get rid of set theory's grip for absolute power. Mostowski is reputed to have claimed a counterexample by declaring "I am not a set". At the present state of science it seems to be impossible to find out whether this statement is true or false. Anyway, there is no safe ground for saying that everything is a set. Let us try to be more modest and say: "very many things can be coded as sets". For example, Beethoven's 9th symphony can be coded as a set. But the coding is quite arbitrary, and we are not sure that nothing gets lost in the coding. To quote a more mathematical example: Gauss' construction of the regular 17-gon may be interpretable as a set, but again such an interpretation is quite arbitrary and does not seem to be illuminating. An expression like "the intersection of the set of even integers with the set of all constructions of the 17-gon" makes sense only after the codings have been stated.

Sets have become a very important part of our language. Until 1950 many rigorous texts on mathematical analysis were written with little or no use of the language and notation of sets. This has changed considerably, but quite often the change is very superficial. It is superficial as long as it is nothing but a translation from predicates to sets.
One of the reasons for this translation may be that there is a vague opinion that a set is a mathematical object and a predicate is not. Accordingly, it is felt that someone who makes assumptions and proves theorems about predicates is a logician and not a mathematician.
Nevertheless, there still remains a tremendous use for sets in mathematics. Sets are here to stay, and we have to ask what kind of set theory we should adhere to. The question which set theory is the true set theory is not a true question, of course. It is all a matter of taste: relevant things are whether a theory is beautiful, economic, powerful, easy to manipulate, natural, easy to explain, etc. The fact that the Cantor-Zermelo-Fraenkel theory is interesting, correct, rich and deep, does not imply that it is necessarily the tool that should be available for every mathematician's use. It has some disadvantages too. One is that it makes the foundation of mathematics rather hard for the non-specialist. We have the sad situation that late in the 20th century the average ordinary mathematician has rather vague ideas about the foundation of his science. Another unpopular feature in Cantor set theory is the admission of x ∈ x, which seems to be rather far away from possible interpretations.

3. The natural, intuitive way to think of a set is to collect things that belong to a class or type given beforehand. In this way one can try to get theories that stay quite close to their interpretations, that exclude x ∈ x and are yet rich enough for everyday mathematics. Some of these theories may exclude large parts of the interesting, funny paradise of Cantor's set theory which has been explored by so many expert mathematicians. For a survey of various type theories we refer to [Fraenkel et al. 58].

4. In this paper we shall try to make a plea for a kind of type theory where the use of types is very similar to the rôle of types in cases where the objects to be discussed are not sets.

Let us first note that natural languages are confusing when dealing with types. The word "is" is used for too many things. We say "5 is a number", "5 is the sum of 1 and 4", "5 is the sum of two squares". It is only in the first sentence that "is" can refer to a type. We shall use the symbol ε for this: 5 ε number. We shall call such a formula a typing.

5. We think of a type theory where the type of an object is unique. If A ε B then B is completely determined by A. This seems to drift us away from the idea that B is something like a set and that A is a member of B, and we have to be careful not to confuse the typing symbol ε with the membership symbol ∈, although there is a conceptual similarity.
We of course run into circumstances where we want to say that our number 5 is also a complex number: 5 ε complex number. We have to make the distinction between the real number 5 and the complex number 5 in order to maintain that in A ε B the B is completely determined by A. It is a bit awkward; we have to talk a great deal about identification and embedding (but in the Cantor-Zermelo-Fraenkel theory this is not any better). Yet it should be done; let us not forget that most mathematicians would hesitate to identify the real number 5 with the 2 × 2 matrix (5 0; 0 5), and the latter situation is really not very much different from the one with the complex numbers.
6. Let us first explain some other cases where ε plays a rôle. If B is a theorem and if A is a proof for B, then we can write A ε B. The theorem can have several proofs, but a proof proves just one theorem. Another example: let B be a statement of constructibility of some geometrical figure that can be constructed by means of ruler and compass, and let A be a description of one of the constructions. We again write A ε B. There can be several different A's to a given B, but if the construction A is given, there is no doubt about what it constructs. A third example: let A be a computer program and let B be a description of what the execution of the program achieves (i.e., A describes the syntax and B the semantics).

In all these cases the A's and B's may depend on several variables (of certain types), and the results A ε B may be transformed into other results A' ε B' by means of substitution. Moreover, in all these cases there is a possibility to introduce a name for a thing in B if we do not actually have that thing. There are two ways for this.

(i) The thing can be introduced as something primitive and fixed. For example Peano's first axiom says that there is a natural number to be denoted by 1, and nothing is assumed about it at that stage. An example with a different interpretation is that B is a proposition and we say that its truth is assumed, i.e. that B is an axiom. From then on, B plays the same rôle as a theorem: we act as if we have a proof, i.e. we have something ε B that we do not wish to describe. Let us now look at the case of geometrical constructions. We want to express that the possibility to connect the distinct points P and Q by a straight line is a primitive construction, i.e. a construction that cannot be described in terms of simpler constructions. That is, we act as if we have a fixed thing ε B, where B is a statement of constructibility.

(ii) The thing can be introduced as a variable. Its validity is restricted to a piece of text (a "block") that is opened by the introduction of the variable; that is why we call it a block opener. The variable is introduced by stating its type: if its name is x, we write something to the effect of "let x ε B". If B is a type like "number" or "point", then this phrase "let x ε B" sounds quite familiar. If B is a statement, however, we may interpret the phrase "let x ε B" as "assume that B is true". That is, we act as if we have a proof for B. (This is not the same thing as the introduction of B as an axiom: "let x ε B" does not reach beyond the block opened by x, and secondly, we can substitute A for x if we later get any proof A for B.) There is a slightly unfamiliar feature: most mathematicians have not got used to giving names to proofs, and here we give names even to would-be proofs.
7. The parallels between the various interpretations of typings are very strong indeed. The mechanism of substitution is the same for the various interpretations, and actually the various interpretations are happily intermingled. Everything that is said in mathematics is said in a certain context. That context consists of a string of variables (block openers), each one having been introduced as a thing of a certain type. The type of the second variable may depend on the first variable, etc. In such a string some of the variables have to be interpreted as conventional mathematical objects (like numbers, points), others as would-be proofs for assumptions. The linguistic treatment makes no difference as to the interpretations.

8. The above characteristics are the common root of the mathematical languages of the Automath family [de Bruijn 70a (A.2)]. The definitions of these
languages hardly contain anything on logic or on the foundation of mathematics. Notions like "truth", "theorem", "proof", "set", "definition", "and", "implies", "inference rule" are either things that can be explained by means of the language (like any other piece of mathematical material) or else they are only meant to emphasize pieces of text to a reader who likes to have a feeling for motivation. To mention an example: a definition, an abbreviation and a theorem have the same linguistic form. It would not be necessary to distinguish between these three, if it were not for the fact that "readability" has something to do with the relation to conventional modes of expressing mathematics.

The languages of the Automath family have the property that books written in these languages can be checked for syntactic correctness by means of a computer. We emphasize that syntactic correctness guarantees that the interpretations of the text are correct mathematics. Note that various uses of the typing symbol ε can occur in one and the same piece of text, and therefore we can pursue a kind of unification of mathematical theories.

It is not the right place to go into a complete exposition of these languages, but one thing should be made clear; just as they admit to introduce objects of a given type, and to build new objects by means of old ones, it is equally possible to introduce new types (by way of variables or of primitive notions) and to build new types in terms of old. For this purpose we create the extra symbol type and we write things like "number ε type", "let B ε type", etc.

9. Having such type languages available as relatively simple tools, we are induced to base mathematics on a type theory where types can be constructed as abundantly as other mathematical objects, i.e., where types may depend on parameters, are defined under certain assumptions only, and where types can be introduced as variables or as primitive notions.
10. There are various ways to do set theory in such a system. One possibility is that we take a primitive type called SET, and from then on, we write A ε SET for every A which we want to consider as a set. We can write the complete Cantor-Zermelo-Fraenkel theory this way. The relations A ∈ B, A ⊆ B are relations that have a meaning whenever A ε SET and B ε SET. There is not the slightest danger to confuse ε and ∈. The ∈ is a relational symbol just like any other; it does not occur in the language definition.

There is a second, entirely different way, that implements set theory with types, in the sense of the "5 ε number" mentioned before. Now the symbol ∈ means something like ε. If B is a type, and if P is a predicate on B, we form the set S of all A with A ε B for which P(A) is true. So sets in B correspond to predicates on B. We write S ⊆ B, and we define ∈ by saying that A ∈ S means P(A). Quite often we like to consider S as a new temporary universe, i.e. we wish to have A ∈ S in the form of a formula with an ε. To that end we create a type called OWNTYPE(B, P) and a one-to-one mapping of that type onto S. Some of this work can be simplified by special notation we shall not develop here; such notation can be used both for ordinary and for automated reading.

11. In order to work with the predicates mentioned in the previous section, we want some kind of typed lambda calculus. It is roughly this. If B is a type, and if for every x ε B we have a formula of the form A(x) ε C(x), then we want to write

[x : B]A(x) ε [x : B]C(x).
The left-hand side is the function that sends x into A(x), defined for all x ε B; the conventional notation in non-typed lambda calculus is λ_x A(x). The right-hand side is slightly unconventional; in the case that C(x) does not depend on x one may think of the class of all mappings that send B into C.

This kind of lambda calculus is part of the language definition, independently of the mathematical axioms we are going to write in our books. So there is a primitive idea of mapping available before sets are discussed. In particular, predicates are such mappings, so if sets are introduced by means of predicates, they already require the lambda calculus. Later, one can show that the concept of a mapping as a subset in a Cartesian product is equivalent to the notion of mapping provided by the lambda calculus.
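The two ways of introducing sets sketched in Sections 10 and 11 have close analogues in modern type-theoretic proof assistants. The following Lean fragment is our own illustration (the names are ours, and Lean's notation replaces the [x : B] of the text): a function typed by a dependent product, and OWNTYPE(B, P) rendered as Lean's subtype.

```lean
-- Section 11: if for every x ε B we have A(x) ε C(x), then
-- [x : B]A(x) ε [x : B]C(x).  In Lean the product is (x : B) → C x.
def A (x : Nat) : Fin (x + 1) := 0      -- A(x) ε C(x) for each x
example : (x : Nat) → Fin (x + 1) := A  -- the abstraction inhabits the product

-- Section 10: a set S in B given by a predicate P on B ("A ∈ S means P(A)").
-- OWNTYPE(B, P), with its injection into B, corresponds to Lean's subtype.
def P (n : Nat) : Prop := n % 2 = 0
def OwnType : Type := { n : Nat // P n }
def four : OwnType := ⟨4, rfl⟩          -- an element of B plus a proof of P
```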
12. Cantor produced his paradise by means of linguistic constructions. (This created considerable controversies in his time, since he did not specify his language.) Now let us see what we get by linguistic constructions in our typed set theory. Assume we introduce (by means of an axiom) the type N of all natural numbers (and we take a set of axioms like Peano's). Then we have, whether we want it or not, subscribed to N^N, to N^(N^N), etc., since the lambda calculus prescribes that we accept the type of all mappings of N into N, etc. However, it seems (we use the phrase "it seems" since no formal proof has been given thus far) that we cannot form something of the strength of the union

N ∪ N^N ∪ N^(N^N) ∪ ...    (1)

The reason is not that we would not allow ourselves to form the union of a countable number of types. That will be provided, anyway, by an axiom we would not like to live without. The reason is that we are unable to index the sets of the sequence (1) in our language. The indexing we want is N_1 = N, N_2 = N^(N_1), N_3 = N^(N_2), ..., and this is in terms of our metalanguage, since it requires a discussion of something like the length of a formula. This is a little detail Cantor never made any trouble about.

The fact that the union (1) is "inaccessible" does not mean that bigger types are forbidden. After all, we can just start saying "let B be any type (i.e. B ε type)" and we can make assumptions about B that cannot be satisfied by the types N, N^N, ... . The world where we have N, N^N, ..., but where (1) is "inaccessible", is a world most mathematicians will doubtlessly find big enough to live in. For those who want to have a bigger world, where they cannot be troubled by people asking for interpretations, there is a simple way out: they just take a type SET and provide it with Zermelo-Fraenkel axioms. If they want to have the picture complete, they will not find it hard to embed the types N, N^N, ... into a small portion of their paradise.
13. Having discarded the idea that every mathematical object is a set, we should be careful not to fall into the next trap. We might like to say that a mathematical object is either some B with B ε type or an A with A ε B (where B ε type). However, the situation can be more complex than this. Let us consider the notion "group" that occurs in the sentence "let G be a group". What we want to say is something like this: assume we have a type A, that we have in A a set B, that in B we have a multiplication rule, that the multiplication is associative, etc. The object we want to handle can be denoted by a string of identifiers x_1, ..., x_k, where x_1 ε A_1, x_2 ε A_2, ..., x_k ε A_k, but where A_2 may depend on x_1, A_3 on x_1 and x_2, etc. It is not as if the string A_1, ..., A_k were something type-like, and x_1, ..., x_k were something chosen in it. Accordingly, we cannot write "let G be a group" as a single typing "G ε group". We can of course create, by means of a set of axioms, a new type "group", but that is a poor remedy: we cannot afford to adopt axioms for every new notion we like to introduce.
14. In Section 10 we compared two different ways to talk about sets by means of typings. The choice between the two has a more general aspect; viz. the question whether we shall or shall not aim at minimal use of typings. The word "minimal" refers to the number of different uses of the typing symbol. In order to say what we mean, we describe a kind of minimal system that seems to be in the spirit of basing mathematics on Zermelo-Fraenkel set theory. In the first place we use typings ... ε SET (as in Section 10). Secondly we create a type called BOOL, and we use "A ε BOOL" in order to express that A is a proposition. Finally, for every X with X ε BOOL we create a type called TRUE(X), and we use the typing P ε TRUE(X) for expressing that P is a proof for the truth of X. In this minimal system, the use of typings of the form ... ε type is restricted to the above-mentioned three instances right at the beginning of the book of mathematics.

The author thinks that talking mathematics in such a minimal system is not always the natural thing to do. There is much to be said for a more liberal use of typings, where typings of the form ... ε type are used throughout the book. Let us consider the geometrical constructions mentioned in Section 6. It seems natural to use A ε B for saying that A describes a construction and that B says what has been constructed. Let us say that we have created, for every point P, a type CONSTR(P). Hence the statement A ε B has the form A ε CONSTR(P). If we want to phrase this in our minimal system, we get something as follows. The point is a set (P ε SET), and so is some coded form A* of A (A* ε SET). We form a proposition q(P, A*) (so q(P, A*) ε BOOL) that says that A* is a construction for P. Finally we need a proof S for this proposition, whence we write

S ε TRUE(q(P, A*))
for what was A ε CONSTR(P) in the liberal system. In the latter case it is not necessary to provide a proof corresponding to S, since the type of A can be determined by a simple algorithm. This example shows two advantages of liberal use of typings: one is that many unnatural codings can be suppressed, the other one is that a higher degree of automation can be achieved. Yet there are many other advantages, of which we mention two: (i) we are neither forced nor forbidden to introduce the types SET, BOOL and TRUE(X); (ii) there is a possibility that one and the same piece of text gives rise to various pieces of standard mathematics, just by the use of different interpretations.
Formalization of Constructivity in Automath
N.G. de Bruijn
1. INTRODUCTION

There are various systems in which a large part of mathematical activity is formalized. The general effect of the activity of putting mathematics into such a system is what one might call the unification of mathematics: different parts of mathematics which used to be cultivated separately get united, and methods available in one part get an influence in other parts. Very typical for twentieth century mathematics is the unifying force of the concepts of set theory. And today one might say that the language of mathematics is the one of the theory of sets combined with predicate logic, even though one might disagree about the exact foundation one should give to these two.

Not everyone thinks of set theory and logic as being parts of a single formal system. Set theory deals with objects, and logic deals with proofs, and these two are usually considered as of a different nature. Nevertheless, there are possibilities to treat these two different things in a common system in a way that handles analogous situations analogously indeed. A system that goes very far in treating objects and proofs alike is the Automath system (see [de Bruijn 80 (A.5)]).

In Automath there are expressions on three different levels, called degrees. Each expression of degree 3 has a "type" that is of degree 2, and each expression of degree 2 has a type of degree 1. Expressions of degree 1 do not have a type. There are two basic expressions of degree 1, viz. type and prop. The word type should not be confused with the word type used more or less colloquially when saying that each expression of degree 2 or 3 has a type. We denote typing by a colon. If A has B as its type, we write A : B. So we can have

A : B : type    (1)

and also

C : D : prop.    (2)
The interpretation of (1) is that A is the name of an object (like the number 3), and that B is the name of the class from which that object is taken (it might be a symbol for the set of integers). The interpretation of (2) is that C is a name for a proof, and that D somehow represents the statement that is proved by C.

The main profit we have from this way of describing proofs and objects is the matter of substitutivity. If we have described an object depending on a number of parameters, that description can be used under different circumstances by means of substitution: we replace the formal parameters by explicit expressions. The same technique is applicable to theorems: a theorem is intended for many applications, and such applications can be effectuated by substitution. The conditions of the theorem are modified by these substitutions too. If we study the matter more closely, we see that some of the parameters are object-like, and others are proof-like. The substitution machinery is the same for both. All this is effectively implemented in the Automath system.
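The typings (1) and (2) have direct counterparts in modern systems descended from Automath. As a hedged illustration (ours, not Automath syntax), in Lean:

```lean
-- (1): an object A typed by B, where B is itself of degree 1 (a Type)
def A : Nat := 3
#check A           -- A : Nat, and Nat : Type

-- (2): a proof C typed by a proposition D, where D : Prop
theorem C : 2 + 3 = 5 := rfl
#check C           -- C : 2 + 3 = 5, and (2 + 3 = 5) : Prop
```

Here, as in the text, the object-level and the proof-level typings are handled by one and the same mechanism.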
2. ORIENTATION ON GEOMETRICAL CONSTRUCTIONS

On the fringe of mathematics there are mathematical activities which seem to be of a kind that does not fit into the pattern of objects and proofs. One such thing is the matter of geometrical constructions, a subject that goes back to Greek mathematics. A construction is neither an object nor a proof, but constructions are discussed along with geometrical objects, and along with proofs that show that the constructions construct indeed what is claimed to be constructed. Since these geometrical constructions can also admit substitution for formal parameters, there is a case for creating facilities which handle a new kind of things along with objects and proofs. So we can think of a system that handles objects, proofs and geometrical constructions in more or less the same way. If we think of geometrical constructions, there is a peculiarity that may not arise easily with other kinds of constructions: it is the matter of observability. Let us study a particular example in order to stress this point. Let there be given four points A, B, C and D in the plane. We assume that A, B and C are not on a line. Let M be the centre of the circle through A, B and C. We wish to construct the point P that is defined as follows. P is obtained from D by multiplication, with M as the multiplication centre, and multiplication factor 1, 2 or 3. The factor is 1 if D lies inside the circle, 2 if D lies on the circle, and 3 if D lies outside the circle. If we want to carry out the construction of P, we have to know whether we are allowed to observe what the position of D with respect to the circle is. In particular this problem comes up for the practical question what should happen if there is insufficient precision for concluding whether D is inside or outside. If we think of a construction with actual physical means like paper, pencil,
Formalization of constructivity in Automath (F.2)
851
ruler and compass, then the case of D lying exactly on the circle is, of course, undecidable. The above construction problem may seem to be very artificial, but yet its main characteristic turns up in very many geometrical constructions: it is the fact that, at some point of the construction, the result of some observation will decide the further course of the construction. An example where this will happen is the case of geometrical constructions that have to be carried out inside a given finite part of the plane. The naive approach to observability may be formulated as the slogan “truth is observable” (see Section 4). Other possibilities will be sketched in Sections 8-10. A further thing one might like to formalize is selectability: one wants to be able to select an object from a set of objects one has constructed. For example, a construction of the intersection of two circles may produce two points, and we may wish to be able to “take one of them”. In this case such a selection principle is not indispensable: one might describe the effect of the construction of the intersection as giving a labelled “first point” and a labelled “second point”. But there is a stronger reason for implementing a selection principle: so often we have to “take an arbitrary point” at some stage of a construction. It should be noted that in such cases the final result of the entire construction does not depend on the particular point that was taken. In Section 5 we come back to this, in particular to the matter of the difference between “giving” and “taking” arbitrary points. A description of all these features is possible in Automath. We have various options for doing it. The way we present this matter is necessarily arbitrary. It is certainly not the intention of this note to give a particular basis for geometrical construction theory. The only thing that will be attempted is to provide a framework into which such a basis might be placed.
If we formalize a thing like constructability we of course dislike to do it in the style of classical logic. We do not want to consider constructability of a point as a proposition in the ordinary sense. We do not want to admit arguments where we get a contradiction from the assumption that the point P is not constructable, and then conclude the constructability of P. Therefore we want to put constructability (and the same thing might apply to observability and selectability) in a framework of positive logic, where we have no negation at all. In fact we can be even more restrictive, and refrain from introducing the ordinary logical connectives (like ∧, ∨, →) for this logic. The only thing we want to do is to register statements about constructability, observability and selectability (possibly provided with a number of parameters), and to keep them available for later use.
We can provide facilities for such a positive logic in Automath by adding a new expression of degree 1, to be called pprop (the first p stands for “positive”). For this pprop we shall not proclaim any logical axioms, and we shall not introduce the notion of negation. Moreover, we do not feel the need to have abstraction in the world of pprop. That is, if u : pprop we shall not take abstractors [x : u] like we would have in cases with prop or type. Accordingly, in this pprop world we shall not consider application either. That means: we take pprop entirely in the style of PAL (see [de Bruijn 80 (A.5)]). There is a case for doing something similar in the world of type. Let us create a new expression of degree 1, to be called ctype (the ‘c’ stands for ‘construction’, since we intend to use it in the world of constructions). The difference between ctype and type is similar to the difference between pprop and prop. In ctype we intend to be free from all the assumptions that might have been made about type. In particular we shall not necessarily implement set-theoretical notions. And we shall not even introduce the notion of equality. That is, if a : C : ctype and b : C : ctype, then we will not introduce the equality of a and b as a proposition. Moreover, we shall treat ctype entirely in the style of PAL: no application and no abstraction. For a description of Automath versions where various sets of rules apply to various expressions of degree 1, we refer to [de Bruijn 74a]. It has to be admitted that geometry is not the easiest example for the study of constructions. It is not so much the fact that the geometrical universes, like planes and spaces, are uncountable. Nor is the most troublesome thing that in the geometrical plane there is no fixed origin and no fixed direction. The real source of trouble is that there are so many situations where we have to except some of the cases.
If we want to say that points p and q have just one connecting line we have to exclude the case p = q. Such things cause a steady flow of exceptions, which even has distorted the meaning of the word “arbitrary”. In past centuries the word “arbitrary” often had the meaning: “arbitrary, but avoiding some obvious exceptions”, and these exceptions were usually unspecified. If one took an arbitrary point and an arbitrary line then the point should not be so arbitrary as to lie accidentally on the line! A full description of all these exceptions has the tendency to make geometrical construction theory unattractive. Yet there is still another source of irritation: so often we have to split into cases (two circles may have 0, 1 or 2 points of intersection), and these situations might pile up to an entangled mess. Nevertheless we may be grateful to geometry for having confronted us with the notion of constructability. What we have learned from geometry might be applied to other areas. Computer science might be one of them. Observability, as a formal element in geometrical construction theory, was considered by D. Kijne [Kijne 62]. That paper also attempts a formal treatment
of selectability (with selection from finite sets only), and considers “giving arbitrary points” by means of a kind of algebraical adjunction operation.
3. THE BASIS OF FORMAL GEOMETRY
Before we discuss a formal basis for geometrical constructions we have to say what “formal geometry”, or more generally, formal mathematics is. Here we are not concerned about the contents of formal geometry, but just about the spirit in which it is written. We may assume that it is written in an Automath book, using the full power of typed lambda calculus. And that it is written in a setting of logic and set theory, the details of which are still open to discussion. One might or might not take the rules of classical logic (e.g. in the form of the double negation law), and we might differ in taking or not taking a thing like the axiom of choice. Such distinctions hardly influence the spirit in which geometry is presented. They might influence the content, i.e. the set of all provable geometrical statements (but it should be remarked that there are areas of mathematics which are much more susceptible to foundational differences than classical geometry seems to be). Just to give an idea of the spirit, we give a small piece of Hilbert’s axiomatization of geometry. Hilbert starts with: there are things we call points and there are things we call lines (in Hilbert’s system the notion of a line is not presented as a special kind of point set). In Automath we say this by creating primitive types “line” and “point”. These types are undefined, just introduced as primitive notions (PN’s). As a primitive we also have the notion “incidence” of a point and a line. Next we can express axioms like: if two points are different, then there is exactly one line incident to both points. Something should be said about “different”. We take it that our geometry text is written in a mathematics book in which for any two objects a, b of type A there is a proposition that expresses equality of a and b, and that for any proposition we can form the negation. In this way the fact that a and b are different can be expressed in Automath by means of NOT(IS(A, a, b)).
But in order to keep this paper readable we shall just write a # b instead of this. We now give a piece of Automath text that can be considered as the start of a Hilbert-style geometry book (we display our Automath texts in a flag-and-flagpole format: the block openers are written on flags, and the poles indicate their range of validity).
point := PN : type
line := PN : type

[p : point]
  [m : line]
    incident := PN : prop
  [q : point]
    [pr : p # q]
      conn := PN : line
      ax1 := PN : incident(p, conn)
      ax2 := PN : incident(q, conn)
      [m : line]
        [i : incident(p, m)]
          [j : incident(q, m)]
            ax3 := PN : m = conn

So if p, q are points, and m is a line, then incident(p, m) is a proposition; if pr is a proof of p # q then conn(p, q, pr) is the connecting line of p and q. In axioms 1 and 2 we have expressed that this line is incident to p and q; in axiom 3 it is stated that if a line m is incident to both p and q then m is equal to the connecting line. Although the above fragment is still a meagre piece of geometry it is hoped that it shows the spirit of a formalization. We shall refer to such a presentation of geometry as G.
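The same primitive notions and axioms can be sketched in a present-day proof checker. The Lean 4 fragment below is offered only as an illustration of the Automath text above, with Lean's ≠ and = standing in for the book's # and equality; it is not the original formalization.

```lean
axiom point : Type
axiom line  : Type
axiom incident : point → line → Prop
-- for a proof pr of p ≠ q, `conn p q pr` is the connecting line
axiom conn : (p q : point) → p ≠ q → line
axiom ax1 : ∀ (p q : point) (pr : p ≠ q), incident p (conn p q pr)
axiom ax2 : ∀ (p q : point) (pr : p ≠ q), incident q (conn p q pr)
axiom ax3 : ∀ (p q : point) (pr : p ≠ q) (m : line),
    incident p m → incident q m → m = conn p q pr
```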
4. A NAIVE APPROACH TO OBSERVABILITY
What we shall call the naive approach is expressed by the slogan “Truth is Observable”. Let us explain what this means by mentioning two cases. In the first case we use knowledge obtained from geometrical theory G in order to prove that some object we have to construct is already in our possession. We do not bother whether that proof is “constructive” or not: truth is just truth. One might find this a poor example, since within the scope of usual geometrical theories and usual constructions it seems that “non-constructive” proofs can
always be replaced by very constructive ones, but it is easy to imagine fields where the situation is different. In the second case we have a construction that started from a point that was chosen arbitrarily. At some stage of the construction we have a point P and a circle c, and subsequently our course of action depends on whether P lies inside c, outside c or on c. The naive point of view says that on the basis of the theory in G we have exactly one of the three alternatives. We can observe which one of the three occurs, and we act accordingly. In Sections 6 and 7 we offer two different implementations for the naive point of view.
5. TAKING ARBITRARY OBJECTS

Before going on, we have to make it clear that there are two entirely different situations where in traditional geometry it was said that an arbitrary object (like a point) was taken. Let us call these situations D and S (these letters abbreviate “data” and “selection”). If we think of a problem where a teacher requires a pupil to construct something, then D is the case where the data have been chosen arbitrarily by the teacher. On the other hand, S is the case where the pupil, in the course of the construction, selects some point arbitrarily. Quite often the final result does not depend on the particular point that was chosen, but there may be other cases. It may happen that the final result itself has a kind of arbitrariness. An example: given points A, B and C, not on a line, construct a point inside the triangle formed by A, B and C. In the opinion of the pupil, the points taken in situation D are not called “arbitrary”: they are called “given” or possibly “arbitrarily given”. The pupil has no freedom in case D. In the S-case, however, the pupil is completely free, and the teacher has no say in the matter. In a formal presentation like in Automath the difference between D and S is very pronounced. D is effectuated by means of the introduction of a new variable, S is implemented by means of a primitive notion (PN). We shall show this in detail in Sections 6 and 7. There is something about the PN-implementation of the S-situation that might be felt as strange. If we describe a construction by such a PN, then we select exactly the same point if we are requested to do the construction a second time. If the second time we would insist on selecting a point that is actually different from the one chosen the first time, then we have to do this on the basis of some new selection principle, of course.
But if we just want to take a point again, without any restriction as to its being different from or equal to the first one, our PN provides us with the same point we had before. This means that we
get more information than we intended to have. Nevertheless, such information cannot possibly do any harm. What shall we do about this weirdness of the PN-implementation? Shall we invent unpopular remedies in order to cure a completely harmless disease? Let us not prescribe a definite attitude in this, and admit that there are several ways to live with the situation. Either we leave the harmless disease for what it is, or we take one of the remedies. Let us mention two remedies. The first one is to take a notion of time t, and adhere a value of t to every construction step. The arbitrarily selected points will depend on t. If we have to repeat the construction some other day, t has a different value, so nothing is known about the selected point in comparison to the one selected the previous day. As a second remedy we suggest to implement arbitrary selection not by an axiom but by some axiom scheme. The scheme proclaims the right to create as many copies of the axiom as one might wish, each time with a different identifier. We leave it at these scanty remarks. The author’s opinion is that unless we invent a much simpler cure, we’d better learn to live with the harmless disease.
6. FIRST IMPLEMENTATION OF THE NAIVE POINT OF VIEW

We have to express in some way or other that some of our mathematical objects have been constructed. This can be thought of as a property of those objects, but for reasons sketched in Section 2 we prefer to take this property as a pprop rather than as a prop. We shall create, for every type X and for every z of type X, the expression have(X, z) with have(X, z) : pprop. In particular we can abbreviate have(point, z) to havep(z) and have(line, z) to havel(z). (Since we use “have” for points and lines only, one might think of taking just “havep” and “havel” as primitives, without taking “have” for general types.) We now give some Automath text. It is supposed to be added to a book that contains geometrical theory G (see Section 3) already. First we introduce “have”, and abbreviations “havep” and “havel”.
[X : type]
  [z : X]
    have := PN : pprop

[u : point]
  havep := have(point, u) : pprop

[v : line]
  havel := have(line, v) : pprop

Next we display how we take an arbitrary object in the sense of the D-situation of Section 5 (“given objects”). In order to talk about a given point we need two block openers, expressing (i) that u is a point, and (ii) that havep(u) holds; inside that context the point u can be considered as given. We shall now express: if u and v are given points and if u # v then we can construct the line connecting u and v. According to our naive point of view the condition that u and v are different is simply expressed in the terminology of G.
[u : point]
  [ass11 : havep(u)]
    [v : point]
      [ass12 : havep(v)]
        [ass13 : u # v]
          ax11 := PN : havel(conn(u, v, ass13))
Next we describe a case of “taking arbitrary points” in the S-situation of Section 5. We express that if m is a given line then we are able to take a point not on m (we use the identifier “ap” to suggest “arbitrary point”).
[m : line]
  [ass14 : havel(m)]
    ap := PN : point
    ax12 := PN : NOT(incident(ap, m))
    ax13 := PN : havep(ap)
These pieces of text display the form in which the basic constructions are introduced. If we want to describe a more complicated construction, we mention the relevant objects one by one, in the order of the construction, and each time we express that we “have” them. We give a (still very simple) example.
[p : point]
  [ass14 : havep(p)]
    [q : point]
      [ass15 : havep(q)]
        [ass16 : p # q]
          L1 := conn(p, q, ass16) : line
          H1 := ax11(p, ass14, q, ass15, ass16) : havel(L1)
          P1 := ap(L1, H1) : point
          Ni1 := ax12(L1, H1) : NOT(incident(P1, L1))
          H2 := ax13(L1, H1) : havep(P1)

Here L1 is an abbreviation for the line connecting p and q; H1 can be used as a reference for the fact that we actually have that line. P1 is the result of the construction, Ni1 assures us that P1 does not lie on L1, and H2 assures us that we actually have P1. Altogether the text lines with identifiers P1, Ni1, H2 represent the “derived construction” expressing that if p and q are given different points then we can take a point P1 such that p, q, P1 are not on one line. This derived construction can be applied later without referring to how it came about. It can be considered as a kind of “subroutine”. The example of a derived construction we gave here is ridiculously simple, of course. Yet the pattern is the same as in more complicated cases. It shows the old idea of subroutines, which existed in constructive geometry many centuries before it came up in computer programming.
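The derived construction can also be written down as a definition that packages the primitive steps. The Lean 4 sketch below is an assumed rendering of the displays of this section (not the Automath original); the axiom signatures follow the text, with ≠ for # and ¬ for NOT.

```lean
axiom point : Type
axiom line  : Type
axiom incident : point → line → Prop
axiom conn  : (p q : point) → p ≠ q → line
axiom havep : point → Prop
axiom havel : line  → Prop
-- ax11: connecting two given, different points yields a line we "have"
axiom ax11 : ∀ (u : point), havep u → ∀ (v : point), havep v →
             ∀ (h : u ≠ v), havel (conn u v h)
-- ap/ax12/ax13: from a line we "have" we may take a point off it
axiom ap   : (m : line) → havel m → point
axiom ax12 : ∀ (m : line) (h : havel m), ¬ (incident (ap m h) m)
axiom ax13 : ∀ (m : line) (h : havel m), havep (ap m h)

-- the derived construction ("subroutine"): a point off the connecting line
def P1 (p : point) (h1 : havep p) (q : point) (h2 : havep q)
    (h3 : p ≠ q) : point :=
  ap (conn p q h3) (ax11 p h1 q h2 h3)
```

Later applications can then use P1 without referring to how it came about, exactly as the text describes for the Automath subroutine.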
7. SECOND IMPLEMENTATION OF THE NAIVE POINT OF VIEW In the second implementation we take a construction plane which we conceive as being different from the geometrical plane. We might think of the original geometrical plane as abstract, and the construction plane as concrete, consisting
of a piece of paper we can draw on. But, of course, our construction plane is still abstract: it is a mathematical model of a concrete plane. The objects in the construction plane will be called cpoints and clines. In the back of our minds we think of a one-to-one mapping between the two planes: every cpoint has a point as its companion, and every cline has a line as its companion. Yet we shall not express all of this in our mathematical formalism. We shall just talk about a mapping (to be called semp) of cpoints to points and a mapping (to be called seml) of clines to lines. The reason for this reticence lies in the interpretation. If p1 is a point, and if we are able to name a cpoint cp1 that is mapped to p1 in our mapping, then for us this means that we “have” p1. We do not want to say that every point in the geometrical plane is a point we “have” just by being able to express that point mathematically. Therefore we do not want to be able to express the inverse mapping. Related to this reticence is the fact that we do not want to be able to discuss equality of two cpoints. Such equality has to be discussed for the companion points in the geometrical plane. And we do not want to admit as mathematical objects things like “the set of all cpoints” with some prescribed property. We achieve these restrictions by putting “cpoint” and “cline” into ctype, which is a world without equality, without set theory, without quantification. As a consequence we do not have constructability questions in our theory. A statement: “the point P is not constructable with ruler and compass” will not be a proposition in our Automath book. If we would be able to quantify over the construction plane we would be able to express that “there is no cpoint that is mapped onto P” and that would express the non-existence of the construction. Constructability questions belong to the meta-theory.
They express that something “cannot be obtained on the basis of the PN’s displayed thus far”, and we cannot say such things in Automath itself. What we call our second implementation starts with the introduction of cpoint, cline and the mappings semp and seml. The latter abbreviations suggest the word “semantics”: we might say that the geometrical plane forms the semantics of the construction plane. If P is a cpoint then semp(P) is its semantics. Off we go:

cpoint := PN : ctype
cline := PN : ctype

[cp : cpoint]
  semp := PN : point

[cl : cline]
  seml := PN : line
In order to take an arbitrary point in the construction plane, a single block opener “x : cpoint” plays the role of the pair “u : point”, “ass11 : havep(u)” of the first implementation. We show this with the fundamental construction that connects two points:

[x : cpoint]
  [y : cpoint]
    [ass21 : semp(x) # semp(y)]
      cconn := PN : cline
      ax21 := PN : seml(cconn) = conn(semp(x), semp(y), ass21)

The fact that cconn is the line we are looking for, is expressed (in ax21) by means of equality in G. If we have to take an arbitrary point in the S-situation we again get one PN less than in the corresponding case of Section 6. In order to express that we can take a point outside a line, we write:

[cm : cline]
  acp := PN : cpoint
  ax22 := PN : NOT(incident(semp(acp), seml(cm)))
We also show the text corresponding to the one with P1, Ni1, H2 in Section 6:

[p : cpoint]
  [q : cpoint]
    [ass22 : semp(p) # semp(q)]
      CL1 := cconn(p, q, ass22) : cline
      CP1 := acp(CL1) : cpoint
      Ni2 := ax22(CL1) : NOT(incident(semp(CP1), seml(CL1)))
      Ni3 := ... : NOT(incident(semp(CP1), conn(semp(p), semp(q), ass22)))
We have not displayed the proof Ni3. It will depend on applying general axioms about equality, and will make use of Ni2 and ax21. Passages like the one from Ni2 to Ni3 might be superfluous in many cases, since it is practical to keep the discussion as long as possible in the construction plane. To that end we might copy notions from G to the construction plane. The simplest example is
[x : cpoint]
  [y : cline]
    cincident := incident(semp(x), seml(y)) : prop
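In the same illustrative Lean 4 rendering used earlier (an assumed stand-in, not the Automath book), the reticence of the second implementation shows up naturally: semp and seml are one-way functions into the geometrical plane, with no inverse and no equality on cpoints, and copied notions such as cincident are ordinary definitions.

```lean
axiom point  : Type
axiom line   : Type
axiom incident : point → line → Prop
-- the construction plane: separate types, mapped one way into G
axiom cpoint : Type
axiom cline  : Type
axiom semp : cpoint → point   -- the "semantics" of a cpoint
axiom seml : cline  → line

-- a notion copied from G to the construction plane
def cincident (x : cpoint) (y : cline) : Prop :=
  incident (semp x) (seml y)
```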
8. RESTRICTED OBSERVABILITY

In Sections 4, 6, 7 we described the naive point of view, where every truth in the geometrical theory is considered to be “observable”. Observability has its meaning in the process of taking decisions about the course of our constructions. Let us describe two different motives for restricting observability. One is practical, the other one is fundamentalistic. We shall discuss these in Sections 9 and 10, respectively.
9. PRACTICAL RESTRICTIONS ON OBSERVABILITY

The practical point of view is connected to questions of precision. This can be compared to the matter of rounding off errors in numerical analysis. If in a construction two points turn out to be so close together that our construction precision does not guarantee that they are different, then we cannot claim to be able to connect them by a line. And even if the points are different, the line will be ill-defined. Although these practical matters give rise to quite complicated considerations, we cannot say that they are necessarily essentially different from what we did in Sections 6 and 7. One can still go on the basis that truth is observable: the question is just a matter of which propositions we consider the truth of. Instead of claiming the possibility to connect two points p, q if p # q in the geometrical world G, we take a thing like d(p, q) > 1 (distance exceeds unity) as our criterion. Nevertheless we can make things a little livelier than this. Let us start from what we developed in the beginning of Section 7: just the four PN’s that were called cpoint, cline, semp and seml. We now introduce a primitive notion
“obsdif” (“observationally different”) in the construction plane:

[p : cpoint]
  [q : cpoint]
    obsdif := PN : prop

And now instead of introducing the cconn, ax21, etc. of Section 7, we go on like this:

[x : cpoint]
  [y : cpoint]
    [ass31 : obsdif(x, y)]
      cconn1 := PN : cline
      ax31 := PN : semp(x) # semp(y)
      ax32 := PN : seml(cconn1) = conn(semp(x), semp(y), ax31)

Knowledge about obsdif can come from different sources. In the first place we can axiomatize things like: if d(semp(x), semp(y)) > 1 then x and y are observationally different. A second source arises if we axiomatize in the construction plane, in some situations, that if cpoints u and w are observationally different, then the cpoints x and y, derived from u and w in one way or other, are observationally different. A very simple case of this is an axiom stating that obsdif(x, y) implies obsdif(y, x). It will be clear that this subject will become very complicated without being very rewarding. Therefore it seems definitely unattractive.
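A Lean 4 sketch of this restricted version, again purely illustrative: cconn1 is only applicable once observational difference has been established, and ax31 converts that into the geometrical difference needed by conn.

```lean
axiom point  : Type
axiom line   : Type
axiom cpoint : Type
axiom cline  : Type
axiom semp : cpoint → point
axiom seml : cline  → line
axiom conn : (p q : point) → p ≠ q → line

axiom obsdif : cpoint → cpoint → Prop
axiom cconn1 : (x y : cpoint) → obsdif x y → cline
axiom ax31 : ∀ (x y : cpoint), obsdif x y → semp x ≠ semp y
axiom ax32 : ∀ (x y : cpoint) (h : obsdif x y),
    seml (cconn1 x y h) = conn (semp x) (semp y) (ax31 x y h)
-- one of the simple closure rules mentioned in the text: symmetry
axiom obsdif_symm : ∀ x y : cpoint, obsdif x y → obsdif y x
```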
10. FUNDAMENTALISTIC RESTRICTIONS ON OBSERVABILITY

In Section 9 we still had the uncritical acceptance of all truth that can be obtained in the geometrical world. There is a clear reason for restriction. If we have to use geometrical propositions for taking decisions in the world of constructions, it is reasonable to require that we also have a “constructive” way for actually deciding whether such propositions hold or do not hold. We can implement such restrictions in Automath by selecting some “constructive” basis for logic and mathematics, like intuitionistic mathematics, and building our geometry G according to these principles. We might even mix
a constructive kind of mathematics with the ordinary kind, using pprop and ctype for the constructive kind. In particular it seems to be reasonable to take the “obsdif” we had in Section 9 as a pprop rather than as a prop. The latter remark suggests that it might be simpler to shift life entirely to the construction plane, and to forget G altogether. But this is not what we usually want. Let us imagine that we want to describe the theory of Mascheroni constructions (constructions with compass but without ruler). The subject matter concerns both circles and straight lines, the constructions deal with circles only. This difference can be implemented by discussing both circles and straight lines in G, but just “cpoints” and “ccircles” in the construction plane.
11. COMPARISON WITH COMPUTER PROGRAM SEMANTICS

It is very natural to compare the field of geometrical constructions with the one of computer programming. In both cases there is a number of actions that produce one or more objects, and in both cases it is very essential that it is proved that these objects satisfy the problem specification that was given beforehand. In a computer program we usually think of a “state space”; the input is an element of that state space and the output is again such an element. In the case of geometrical constructions one would say that the input is (vaguely speaking) the given figure, and the output is the required figure. Let us admit different spaces for input space and output space, and try to describe at least the specification of a geometrical construction in terms of input and output. As an example we take the following (trivial) construction problem. Given two different points P, Q and a line m. Construct a line q that intersects m, passes through P but not through Q. Let us talk in the style of Section 7, and let us moreover decide to introduce a name R for a cpoint of intersection of q and m (otherwise we would need existential quantification). An element of the input space is a triple (P, Q, m) where P : cpoint, Q : cpoint, m : cline, and where we have semp(P) # semp(Q). An element of the output space is a pair (q, R) where q : cline and R : cpoint. The problem specification is given by the conditions that seml(q) is incident with semp(P) and semp(R) but not with semp(Q). This kind of problem specification is entirely in the style of what is called “relational semantics” in computing science. If we deal with geometrical constructions, the role of “subroutines” is more or less the same as in computer programming. In particular we can say that descriptive geometry consists of a large body of subroutines. In computer programs we can have loops. Sometimes pieces of a program
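The input space, output space and relational specification of this example can be made explicit. The Lean 4 structures below are an assumed formalization (the structure and field names are invented for illustration); the third conjunct records that R is a cpoint of intersection of q and m, as introduced in the problem statement.

```lean
axiom point  : Type
axiom line   : Type
axiom incident : point → line → Prop
axiom cpoint : Type
axiom cline  : Type
axiom semp : cpoint → point
axiom seml : cline  → line

structure Input where
  P : cpoint
  Q : cpoint
  m : cline
  diff : semp P ≠ semp Q   -- the two given points are different

structure Output where
  q : cline
  R : cpoint

-- relational semantics: which outputs are acceptable for a given input
def Spec (i : Input) (o : Output) : Prop :=
  incident (semp i.P) (seml o.q) ∧
  incident (semp o.R) (seml o.q) ∧
  incident (semp o.R) (seml i.m) ∧   -- R is the intersection with m
  ¬ (incident (semp i.Q) (seml o.q))
```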
have to be repeated until some condition is satisfied. The geometrical constructions we discussed in the previous sections have no such loops. This reveals essential restrictions on the class of constructions we can describe in the various systems that were suggested in these sections. An example of a construction with a loop is the following one. Let A, B, C be given points on a given line, B between A and C. It is required to construct a point D on that line, such that C is between B and D, and such that the length of the line segment BD is an integral multiple of the length of the segment AB. This construction requires a loop. Our treatment of geometrical constructions in Sections 3-10 might be called “operational” or anyway “functional”. All the time uniquely determined outputs are obtained step by step, and in the slightly more sophisticated case of the use of subroutines the only thing we actually do is taking sequences of steps together and considering them as a single step. The reason is that the treatment is based on what we shall call the interior approach. In the interior approach we talk in terms of the constructed objects. The constructed objects are treated in the same style as ordinary mathematical objects and (but this is a typical Automath feature) proofs. In our Automath book we discuss the objects, but the action of construction is felt as subject matter of some metalanguage. An entirely different way to deal with constructions is that we consider constructions as objects, seemingly more abstract than the ordinary objects, but nevertheless on the same linguistic level. Let us call this the exterior approach. (The name is suggested by the fact that if we work in the interior approach then the metalinguistic discussion of construction is felt as being something at the outside.) With the exterior approach we can get rid of the limitations of our “functional style” of construction description.
Anyway we can remove the last differences there might be between geometrical construction and computer programming. We might try to start the exterior treatment with the introduction of a primitive notion “construction” like:

construction := PN : ctype

but it has to be more complicated than this. The notion of construction has to depend on the input space and the output space as parameters, and this is not so easy to describe.
The Mathematical Vernacular, A Language for Mathematics with Typed Sets

N.G. de Bruijn
1. INTRODUCTION

1.1. The body of this paper is from an unpublished manuscript (“Formalizing the Mathematical Vernacular”) that was started in 1980, had a more or less finished form in the summer of 1981, and a revision in July 1982. [The Sections 1 to 17 were published for the first time in [de Bruijn 87a (F.3)]. At that occasion the (very essential) Sections 12-17 were revised in order to adapt them to typed set theory, and the Introduction was extended. For this 1994 version the old Sections 18-22 have been revised in order to let them match the revised Sections 12-17.]

1.2. The word “vernacular” means the native language of the people, in contrast to the official, or the literary language (in older days in contrast to the Latin of the church). In combination with the word “mathematical”, the vernacular is taken to mean the very precise mixture of words and formulas used by mathematicians in their better moments, whereas the “official” mathematical language is taken to be some formal system that uses formulas only. We shall use MV as abbreviation for “mathematical vernacular”. This MV obeys rules of grammar which are sometimes different from those of the “natural” languages, and, on the other hand, by no means contained in current formal systems.

1.3. It is quite conceivable that MV, or variations of it, can have an impact on computing science. A thing that comes at once into mind, is the use of MV as an intermediate language in “expert systems”. Another possible use might be formal or informal specification language for computer programs.

1.4. Many people like to think that what really matters in mathematics is a formal system (usually embodying predicate calculus and Zermelo-Fraenkel
set theory), and that everything else is loose informal talk about that system. Yet the current formal systems do not adequately describe how people actually think, and, moreover, do not quite match the goals we have in mathematical education. Therefore it is attractive to try to put a substantial part of the mathematical vernacular into the formal system. One can even try to discard the formal system altogether, making the vernacular so precise that its linguistic rules are sufficiently sound as a basis for mathematics. An attempt to this effect will be made in this paper. We shall try to do more than just define what the formalized vernacular is: much of our effort (certainly in Sections 2, 3, 4) will go into showing its relation to standard mathematical practice.

1.5. Putting some kind of order in such a complex set of habits as the mathematical vernacular really is, will necessarily involve a number of quite arbitrary decisions. The first question is whether one should feel free to start afresh, rather than adopting all pieces of organization that have become more or less customary in the description of mathematics. We have not chosen a system that is based on what many people seem to have learned to be the only reasonable basis of mathematics, viz. classical logic and Zermelo-Fraenkel set theory, with the doctrine that "everything is a set". Instead, we shall develop a system of typed set theory, and we postpone the decision to take or not to take the line of classical logic to a rather late stage.
1.6. The idea to develop MV arose from the wish to have an intermediate stage between ordinary mathematical presentation on the one hand, and fully coded presentation in Automath-like systems on the other hand. One can think of the MV texts being written by a mathematician who fully understands the subject, and the translation into Automath by someone who just knows the languages that are involved. [For general information on Automath the following paper may be adequate: [de Bruijn 80 (A.5)].] Experience with teaching MV was acquired in a course.
1.7. Even a superficial inspection of mathematical literature shows that it is very hard to get anywhere as long as we take the term "mathematical vernacular" so wide as to contain all language mathematicians use for convincing one another. We shall try to isolate a fragment of the language and polish it up so as to turn it into a basis for mathematics. It is this fragment that is called MV.
The rules of MV do not just explain how mathematical sentences have to be formed, but also how they have to be manipulated in order to build new correct material. In particular they will help us to disclose the rules of the game of axioms, definitions, theorems and proofs.
1.8. Roughly speaking, the MV part of a piece of mathematics will be the rigorous part. In order to make a bit clear at this stage what this MV part is, we mention a few things that we do not want to belong to it. Without being very systematic, we mention:

(i) Argumentation in the form of references to previous material, and indications of the kind of reasoning. Typical of what we mean here is: "Replacing x by p in Theorem 25 we find ...".

(ii) Indications for reconstructing pieces of texts that have been omitted. Example: "The second part of the proof can be given by interchanging the roles of x and y".

(iii) References to the syntactical form of presented material, like "the left-hand side of this equation".

(iv) Interpretation in terms of notions that belong to an entirely different area, like the use of geometrical terminology for discussing the graph of a function, in a case where the rigorous part of the text has no geometry at all.

(v) Remarks about the relation between the human writer or reader and the text. Examples: "It is easy to see that ...", "It may help the reader to draw a figure of this situation".

(vi) Commands, like "Show that ...".

(vii) Surveys of what is to be expected in later parts of the text.

(viii) Historical remarks.

It would not be hard to extend this list of non-MV items. Quite often non-MV components and MV-components occur in one and the same sentence. Example: "Obviously we have f(x) > 1 for all x, but that does not help us to prove the lemma". The only MV-part here is "f(x) > 1 for all x".
1.9. In a system where we expect to have our mathematics checked by a machine it will certainly be worth while to take both the MV-part and the argumentations as essential parts of the formal language, as has been done in Automath. But even if that is considered as a sound basis for mathematical communication,
it is questionable whether it can ever replace that communication. It has the disadvantage that it makes sense only for texts that have been elaborated in every silly little detail. For communication this is rather inconvenient. We wish to write in a style in which we omit what we think is trivial. What things can be considered to be trivial depends on the experience the reader is expected to have. Therefore we shall define correctness of MV in such a way that proofs where pieces of the derivation are omitted, can be considered as still correct. A text would become incorrect if we omit definitions of notions that are used in later parts of the book. A proof written in MV may be restricted to showing a sequence of resting points only. The derivation from point to point may be suppressed, or at least be treated quite informally. This seems to come close to the current ideal of mathematical presentation: impeccable statements, connected by suggestive remarks.
1.10. In contrast to what one might expect at first sight, the grammar of the mathematical vernacular is not harder, but very much easier than the one of natural language. We can get away with only three grammatical categories (the sentence, the substantive and the name), because mathematicians can take a point of view that is very different from the one of linguists. The main thing is that mathematical language allows mathematical notions to be defined; it can even define words and sentences. In choosing these new words and sentences we have almost absolute freedom, just like in mathematical notation. We hardly need linguistic rules for the formation of new words and new sentences. It usually pleases us to form them in accordance with natural language traditions, but it is neither necessary nor adequate to set linguistic rules for them.

1.11. The language definition of MV will be presented in two rounds. In the first round we express the general framework of organization of mathematical texts. It is about books and lines, introduction of variables, assumptions, definitions, axioms and theorems. All this is condensed in the rules BR1-BR9 in Sections 9 and 10. In the second round we get the rules about validity. These cover Sections 11-17. These two rounds describe a language for mathematics. It would go too far to call them the foundation of mathematics. The language of mathematics allows us to write mathematical books, and in these books we can axiomatize the rest of what we call the foundation of mathematics. Part of that axiomatic basis might be considered as foundation of mathematics as a whole, but other sets of axioms just serve for particular mathematical theories. The dividing line between the two is traditional, not essential. Part of the axiomatic basis in the book may be of logical nature, and that part will certainly be considered to
belong to the foundation. Most of the validity rules of the second round have been put in that second round since they cannot be expressed in the books. In other words, they cannot be expressed in MV itself. But a very large part of what is called the foundation of mathematics can just be written in the books, more or less to our own taste. As examples we mention here: falsehood, negation, conjunction, disjunction, the law of the excluded middle, existential quantification, the empty set, the axioms for the system of natural numbers, the axiom of choice. One might try to reduce the second round to an absolute minimum and to put as much as possible in the MV books. We have not gone that far, in most cases because it seems to be nicer to keep things together that belong together. In the case of Section 15 (rules for Cartesian products) the reason to keep it in the second round may seem peculiar. It is just because of the fact that if we want to refer to elements (a, b) of the Cartesian product of A and B, we would hate to have to mention A and B as parameters all the time. We would have to, if that section were shifted to the book.
1.12. Let us try to compare MV and Automath. In the first place it must be said that MV has been inspired by the structure of Automath as well as by the tradition of writing in Automath. In that tradition elementhood, i.e. the fact that an object belongs to a set, is expressed by the typing mechanism available in Automath. So in order to say that p is an element of the set S, this is coded as p : S, so S is the type of p. This is in accordance with the tradition in standard mathematical language. If we say that p is a demisemitriangle, one does not think of the set or the class of all demisemitriangles in the first place, but rather thinks of "demisemitriangle" as the type of p. It says what kind of thing p is. In order to keep this situation alive, MV does not take sets as the primitive vehicles for describing elementhood, but substantives (in the above example demisemitriangle is a substantive). It is important to see the difference between substantives and names. Grammatically they play different roles. If we say that a + b is an integer, then "integer" is a substantive and a + b plays the role of the name of an object. Coming to a situation like a ∈ b ∈ c ∈ d, the Automath style does not allow to write this as a chain of typings like a : b : c : d. If b is a set, then let us write b1 for the substantive "element of b". The chain becomes a : b1, b : c1, c : d1. An important difference between Automath and MV is that in Automath typings are unique (up to definitional equivalence), and in MV they are not. MV is adapted to the tradition of ordinary mathematical language in which 5 is a real number and at the same time the same 5 is an integer. One does
not feel a conflict since "integer" is just a special kind of "real number". In Automath it is always a bit troublesome to express that an object belongs to a subtype: The fact that 5 is a positive real number is described in Automath by two consecutive typings. The first one says that 5 is a real number, the second one says that some particular expression u is a proof for the statement that 5 is positive. This is often felt as a burden. A consequence of the way we treat typing by means of substantives in MV is that a typing like "5 : real number" has the nature of a proposition. This is one of the rules of MV (see T1 in Section 12), but is not done in Automath. Another difference between Automath and MV, already mentioned in 1.9, is that Automath has exact proof references inside the formal text, whereas MV either does not have them at all or has them informally in the margin. This provides a serious (but quite clear) task for those who implement MV into Automath. There is another trouble with the implementation. In MV we have quite strong equality rules, more or less corresponding to the standard feeling that "between two equal things there is no difference at all, they are just the same". In Automath it may cause quite some work to show the equality of two expressions whose only difference is that, somewhere inside, the first one has p and the second one has q, and where p is equal to q. One has to bring the equality from the inside to the outside, and that may cause a lot of Automath text. Fortunately the writing of that text can be automated. In our version of MV we have a strong set of equality axioms (in particular EQ10a-EQ10c of Section 13.2) which make all this much easier.

1.13. One might think of direct machine verification of books written in MV, but this will be by no means so "trivial" as in Automath. Checking books in MV may require quite some amount of artificial intelligence.
In the first place MV allows us to omit parts of proofs, at least as long as no definitions are suppressed (see Section 1.9). But even if the steps in an MV book are ridiculously small, a checker may have a hard time, since in MV proof indications are not given in the formal text itself. To make a book in MV better readable, one can provide the text with proof references in the form of hints, so to speak in the margin. In order to make automatic checking of MV books feasible, one has to invent some system to pass those informal hints to the artificially intelligent machine.
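The typing conventions of Section 1.12 — derived substantives like "element of b", and non-unique typings — can be pictured in a small sketch. The representation below is my own illustration, not a syntax the paper prescribes; `el(b)` stands for the derived substantive written b1 in the text.

```python
# Toy model of the MV typings of Section 1.12 (representation is my own
# illustration; MV fixes no such concrete syntax).

def el(set_name):
    """The derived substantive 'element of <set_name>' (b1 in the text)."""
    return "element of " + set_name

# In MV, typings need not be unique: one object may carry several substantives.
typings = {}                      # name -> set of substantives

def declare(name, substantive):
    typings.setdefault(name, set()).add(substantive)

# The chain a ∈ b ∈ c ∈ d becomes three separate typings:
declare("a", el("b"))
declare("b", el("c"))
declare("c", el("d"))

# Non-uniqueness, as for the number 5 in the text:
declare("5", "integer")
declare("5", "real number")

print(typings["5"] == {"integer", "real number"})   # True
```

In an Automath-style model, by contrast, `typings` would map each name to a single type (unique up to definitional equivalence), which is exactly the difference the section describes.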
1.14. The formation rules of MV allow us to form sub-substantives to a given substantive. The relation is denoted by <, like "square < rectangle" in geometry. Once we have the substantive "rectangle", we can form smaller ones, but our rules do not allow us to form bigger ones. The effect is that for every object
in an MV book one can find a "largest" substantive it is typed by. Let us call that one the archetype of the object. Likewise, this largest substantive can also be called the archetype of all the substantives it contains in the sense of <. These archetypes can be in the back of our minds, but they are never mentioned explicitly in the MV book. Moreover, archetypes are nowhere mentioned in the language rules. One advantage of this system of "anonymous archetypes" is that we are never obliged to state the archetypes as a kind of parameters (actually this is what we have to do in Automath). Another advantage is that the MV text we produce can also be appreciated by readers who have bigger archetypes in mind. For example, a book on real numbers where complex numbers are never mentioned, can be used by anyone who started from the complex numbers, and wants to see the reals as special cases. In other words, our MV books can always be embedded into books with bigger archetypes. Since all objects, all substantives and all sets have an archetype, we can refer to our MV as a kind of typed set theory.
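The "largest substantive" of Section 1.14 can be pictured as the root of the forest formed by the sub-substantive relation <. A minimal sketch, with invented example data:

```python
# Sketch of the "archetype" of Section 1.14: the sub-substantive relation <
# only ever produces smaller substantives, so following it upward from any
# substantive ends in a maximal one, its archetype.  (Example data invented.)

parent = {                         # child < parent
    "square": "rectangle",
    "rectangle": "quadrilateral",
    "isosceles triangle": "triangle",
}

def archetype(substantive):
    """Follow < upward until no bigger substantive exists."""
    while substantive in parent:
        substantive = parent[substantive]
    return substantive

print(archetype("square"))        # quadrilateral
print(archetype("triangle"))      # triangle (already maximal)
```

Embedding a book into one with bigger archetypes then amounts to extending `parent` above the current roots, which leaves every `archetype` computation inside the old book consistent with the new, larger one.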
1.15. Our effort in describing a large part of the language in terms of both substantives and sets, instead of sets only, gives some duplication in the language rules that might be considered as superfluous. We of course would like to try to eliminate one of the two, and deal with sets only, or with substantives only. Both can be done, of course, but neither of the two seems to give anything that looks more satisfactory than what we have in our MV. It has some advantage to describe both: it liberates us from the nasty decision to discard either of them. Substantives (like point, number, function) seem to be the things we handle in our natural language, and sets are things we have learned, more or less artificially, to use instead of substantives. Before the advent of New Math (or should we say, before the rise and fall of New Math) talking by means of substantives was general, and set language was only introduced if strictly necessary. This is roughly what we have done in our presentation of the MV rules. Talking and thinking in terms of substantives is so strongly traditional that one might even call it "natural".
1.16. In MV it is not true that "every object is a set". If we introduce a substantive as a variable or as a primitive, then the objects which are typed by that substantive cannot be considered as sets. Elements of a Cartesian product (see Section 15) are not sets.
1.17. Some of the decisions we have to take about MV involve questions about what to put in the language and what in the metalanguage. In particular we have a kind of meta-typing (the “high typing”, see Section 3.6) in the language,
whereas most other systems would have such things in the metalanguage. The high typing is used for saying that something is a substantive or that something is a statement. We note here that "statement" is a synonym for "sentence". We use "statement" in MV, but there would be no harm in replacing it by "proposition". Linguists would probably dislike the use of the word "sentence" for phrases which are not full sentences. The distinction between high typing and low typing corresponds in Automath to the distinction between typing by means of expressions of degree 1 and typing by means of expressions of degree 2.
1.18. It is customary to make the distinction between sets and classes. Roughly speaking, sets are classes over which we allow quantification. Usually we think of the classes which are not sets as those which are just too big to be sets, like the class of all sets. In our MV we allow quantification over every substantive, and substantives directly correspond to sets. Classes over which we cannot quantify are not discussed in MV itself, neither by means of low typing nor by means of high typing. We can discuss them in the metalanguage: the class of all statements, the class of all substantives. In Automath the class of all proofs for a given proposition is treated as a type. There is nothing of that kind in MV.

1.19. Let us devote some attention to the role played by adjectives. An adjective belongs to a substantive, and serves a double purpose: (i) to form a new substantive, and (ii) to form a new sentence. Example: Having the substantive "triangle", we can form the adjective "isosceles". With this one we can form the new substantive "isosceles triangle" as well as the new sentence "... is isosceles". It may be just because of this double usage that mathematicians like to express things by means of adjectives. Many definitions in mathematics are in the form of the introduction of a new adjective. A warning must be given: an adjective belongs to a substantive, but not automatically to the archetype of that substantive.
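The double purpose of an adjective in Section 1.19 can be sketched as a pair of constructors. The names and representation here are illustrative only; MV itself prescribes no such mechanism:

```python
# The double purpose of an adjective (Section 1.19), sketched as a pair of
# constructors.  Names and representation are illustrative, not part of MV.

def make_adjective(word, substantive):
    """An adjective attached to a substantive yields
    (i) a new substantive and (ii) a sentence scheme."""
    new_substantive = word + " " + substantive        # (i)  "isosceles triangle"
    def sentence(name):                               # (ii) "... is isosceles"
        return name + " is " + word
    return new_substantive, sentence

subst, is_isosceles = make_adjective("isosceles", "triangle")
print(subst)                         # isosceles triangle
print(is_isosceles("triangle ABC"))  # triangle ABC is isosceles
```

Note that `make_adjective` takes the base substantive as a parameter: this mirrors the warning above that an adjective belongs to a particular substantive, not automatically to its archetype.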
2. SOME TERMINOLOGY
2.1. Before we start explaining what MV really is, we say something about the terminology to be used in this paper. We of course distinguish between the language MV and texts written in the language. Instead of “texts” we shall speak of “books”. The language MV has to be defined by stating its rules of grammar, or, as we might say, its language definition.
A certain amount of usage of MV in an MV-book will depend on constructions and notions that are introduced in the language definition. This kind of usage will be called primary MV. All the rest of MV that is used somewhere in the book depends on terms and phrases that were chosen previously in that book. Such book material will be called secondary MV. We quote some examples. A phrase like "for all x" will conveniently belong to the language definition, but "log x", "normal subgroup", will be things we prefer to define in some book. There are things on the borderline for which the language design has to choose between introduction in grammar or in book, and that choice might be a matter of taste. Examples of things about which one might hesitate are: use of the equality sign, elementary notions about logic, and the treatment of sets and functions. A mathematics book contains what we called (in Section 1.8) an MV-fragment, and all the rest is non-MV. A part of the non-MV fragment consists of material that speaks about the MV-fragment. One might think of words and phrases like "left-hand side", "equation", "notation", "symbol", "formula", "substitution", "unknowns", "integration", "algebraic", and of the material mentioned in Section 1.8, (i) and (ii). We shall not study this kind of language systematically, but just vaguely refer to it as metalanguage. It should be noted that the borderline between language and metalanguage has shifted over the centuries. There was a time when "set", "function" were definitely metalanguage, and right now there are things on the threshold between language and metalanguage which might shift into the language in the next few decades. As such, one might suspect terms like "proposition", "condition", "proof", "algorithm". Quite often a word occurs in two different languages, with related but different meanings.
This has its reasons: it is always hard to find new words for new notions, and we like to have words with some suggestive power rather than new words that do not mean anything to us. For example, the word "algebra" is used in metalanguage to indicate a branch of mathematics, and in the language itself to denote a special kind of ring. And if we say "this system of three equations has two solutions", then "three" and "two" may be on different sides of the border.

2.2. In this paper we shall develop some new terminology. Some of it will be incorporated in the primary part of the language MV, some of it will be meta-MV. We have to be explicit about this, especially since we shall shift things into MV that are commonly considered metalanguage. In particular words like "substantive", "statement" will become (primary) MV.
2.3. Let us give a survey of the abbreviations for the various languages to
be referred to in this paper. Quite often a sentence does not belong to any of these, simply because the sentence is intended to relate these languages to each other. But for separate words in this kind of discussion it may be possible to state more precisely to what language they belong. In many cases we shall indicate this, and we use the abbreviations OMV, gL, MV, pMV, sMV, mMV, smMV, imMV.

OMV stands for "ordinary mathematical vernacular". This is the language today's mathematicians actually use when they want to be precise but not absolutely formal.
gL stands for "general language". This refers to words and phrases outside mathematics, in our case usually as a kind of meta-OMV. Since we do not suggest an exact definition of OMV, the borderline between OMV and gL will be vague.

MV stands for the "stylized" form of OMV, and is to be described in this paper. This MV is a language that can be defined completely (see Sections 3-17).

pMV stands for primary MV, as explained in Section 2.1. Words and other constructs of pMV appear for the first time in the language definition. Authors of MV books do not have the right to deviate from pMV.

sMV stands for secondary MV, as explained in Section 2.1. Words, symbols and phrases of sMV are chosen by the author of an MV-book.

mMV stands for meta-MV. We shall consider two kinds, smMV and imMV.

smMV stands for syntax-oriented meta-MV. It is the language we use for expressing the rules of MV.

imMV stands for interpretation-oriented meta-MV. It is the language we use for discussion of the relation between MV and its popular variant OMV. We use imMV for explaining how pieces of existing mathematics can be expressed in MV.

Many of the words we introduce in pMV, sMV, smMV will be borrowed from gL, and the usage will be related to their usage in gL. In order to give an impression what distinctions can be made, we give a list of terms which all mean something like "group of words or symbols that express something", and in each case we indicate the possible languages. In several cases the set of possible languages might be extended, and in some cases it is not very
clear what that list should be. What is important for us here is the question which cases are pMV, sMV, imMV, smMV. The matter of what belongs to gL is more vague, of course.

word .......... gL
sentence ...... gL
proposition ... OMV, sMV
statement ..... pMV
substantive ... pMV
formula ....... OMV, smMV
expression .... gL, smMV
phrase ........ gL
theorem ....... OMV, smMV, imMV
assertion ..... OMV, gL, smMV, imMV
assumption .... OMV, gL, smMV, imMV
condition ..... OMV, gL, smMV, imMV
definition .... OMV, gL, smMV, imMV
predicate ..... OMV, smMV, imMV
clause ........ smMV
name .......... gL, smMV, imMV
3. INGREDIENTS OF MV
3.1. We shall point out some of the characteristics of MV, with special emphasis on those aspects which might be considered novelties. We shall comment on the following points: context indication (Section 3.2), mixing natural language and formulas (Section 3.3), grammatical categories (Sections 3.4 and 3.5), typing (Section 3.6).

3.2. The context indication system, as described in Section 4, is little more than a systematic description of what all mathematicians are aware of when they are talking or writing mathematics. It is certainly worth while to give it a predominant place in the description of what mathematics is. In particular it gives insight into what "variables" are. Moreover it opens the way to natural deduction as a basis for mathematical reasoning.

3.3. Mixing natural language and formulas is a very typical aspect of a mathematician's lingo (both OMV and MV). In most cases formulas become part of a sentence as if they were just words or sequences of words, in complete accordance with the grammar of the natural language. When we say "if a ∥ b then
p ∈ V", then "a ∥ b" and "p ∈ V" play grammatically the role of sub-sentences, just like in "if it rains then we get wet". And when we say "b satisfies ..." then b is the subject of the sentence just as if b were a person. But there are also cases where the mathematician's lingo does not follow the rules of the natural language. We mention "for all integers x we have ...", where the x does not play a role that any ordinary word can play in a natural language sentence. And we mention "for all x we have ...", where x does seem to play such a role, but the wrong one. For these little irritations we offer as explanation that our natural languages do not admit anything corresponding to bound variables.

3.4. In natural languages one analyses the structure of a sentence by attaching
grammatical categories to words or word groups. Such categories can be "sentence", "noun", "verb", etc. In our discussion of MV we shall restrict ourselves to a rather small number of categories. We shall only use the following: statement, substantive, name.
There might be a case for the adjective (cf. Section 1.19) as a fourth category, but we ignore it in our presentation of MV. The reason why we do not need to go into the finer shades of grammatical analysis lies in the fact that in the MV-book we can introduce words, symbols and other kinds of phrases by means of definitions (see Section 7.2). As far as the defined things are words or phrases, we usually choose them according to what sounds right in ordinary language, and that is why they seem to ask for linguistic analysis. But such an analysis is unnecessary. The only thing we do with the new words and phrases is to repeat them in other circumstances. As an example we quote a definition: "We say that the vectors p and q are locally independent in the sense of Prlwtzkowsky if ...". Later we just repeat this phrase, with p and q replaced by other names. The fact that the words "in the sense of" have been taken just in this order, does not play a role we consider to be essential for MV. It plays a role in readability, memorizability and possibly in parsability (cf. Section 23). It is like choosing notation: as a function symbol for the hyperbolic cosine we might select "cos hyp" or "cosh" or "csh" but not "ggrrr", since that would not be very suggestive, and certainly not "gg?(rg[?" since that would be asking for trouble with parsing.

3.5. Let us say a few informal things (expressed in gL) on the categories "statement", "substantive", "name" and "adjective". A statement is a group of words or formulas that might play the role of a complete sentence, although it can occur as just a part of some other phrase (the word "phrase" is used here to indicate any sequence of words or formulas that somehow is considered in its entirety at some moment). Example: In the
phrase "if a > b then p is divisible by 5" the parts "a > b" and "p is divisible by 5" are statements, and the whole phrase is a statement. A substantive is a generic term for a class. Examples: "circle", "positive integer with exactly three divisors", "point". A generic term for a class is not the same as a name for that class. The difference is small: it is only the way we use them. If C is the class of all circles, then the phrases "P is a circle" and "P is an element of C" are intended to mean the same thing. A warning: sometimes a phrase has the grammatical form of a substantive without playing that role in a mathematical text. In the phrase "P is the orthocenter of triangle ABC", the word "orthocenter" is not to be considered as a generic name for a class. One should not think that it had first been explained what an orthocenter is, and that later it was proved that a triangle has just one orthocenter, so that finally we can speak of "the orthocenter". No: the phrase "the orthocenter of triangle ABC" can be used by virtue of a previous definition in the book, where it was introduced as a name with the same status as a name like Oc(A, B, C) would have had. Therefore there is no question of parsing it into separate components like "the", "orthocenter", etc. A name is a phrase we consider as a sufficient indication of an object. Without going into the question whether we have or do not have objects in mathematics, we note that our linguistic handling of mathematics seems to treat mathematical names as if they were names of objects. Examples of names are "the center of the unit circle", "the point M", "M", "a + b". As to adjectives we mention that adjectives are always attached directly or indirectly to a substantive. Once we know what a triangle is we can say what it means that a triangle is "isosceles". It can be used in two ways: (i) in statements like "triangle ABC is isosceles", and (ii) in order to form the new substantive "isosceles triangle". This humble role of adjectives does not seem to suffice as a reason for taking them as building blocks in our rudimentary grammar of MV. Nevertheless there is a reason to take them seriously: mathematicians seem to like them so much. They seem to like definitions where a new notion is presented in the form of a new adjective. We shall say more about adjectives in Section 22.14.
3.6. In our version of MV we use typing on two levels: low typing and high typing. Low typing is used to express that some "object" is of a certain "kind", like "p is an integer". In MV we have a preference for writing a colon instead of "is a", so we write "p : integer". This colon is the notation for a kind of relation between "p" (which is grammatically a name) and "integer" (which is grammatically a substantive). In the metalanguage smMV we say that "p : integer" is a (low)
typing. High typing is a thing that in most other systems would be put into the metalanguage rather than into the language itself. We denote it by a double colon. On the right we have either "statement" or "substantive". Examples of high typings are "integer :: substantive" and "x > y :: statement". We can as well say right here that low typings "p : q" will occur only in cases where the high typing "q :: substantive" has been established already. In this connection we mention that one might say in the metalanguage smMV that p is a name, or that p is the name of some q, but we do not express this in MV itself. We mention a question that often turns up among mathematicians: is 3 a number or is 3 the name of a number? We can agree to both alternatives, depending on the language we use. In MV we say "3 : number", but in smMV we say that 3 is a name, and more precisely that 3 is the name of a number. For a moment we consider the word "object". There is the old philosophical question whether mathematical objects exist. Those who believe in the existence are called platonists. One might suspect that all mathematicians are platonists, even those who fiercely deny it. The matter is clear for those who consider it as their job to provide useful communication language for mathematicians: platonism is not right or wrong, platonism is irrelevant. At least it is irrelevant for matters of truth and falsehood of mathematical statements. It may be relevant for mathematical taste, but that is a personal matter anyway. The most important thing to say about platonism is possibly that platonism is dangerous. It may seduce mathematicians into thinking that they can get away with incomplete definitions of objects since these objects exist anyway. And it might give the false suggestion that slightly different definitions of a mathematical object are not harmful since after all they refer to one and the same platonic object.
Another danger of the idea of platonic existence is that many people find it hard to understand the meaning of existence in mathematics. The statement in OMV that "there exists a positive number whose square equals 9" has nothing to do with the platonic existence of the number 3.

We shall give a kind of linguistic interpretation to the word "object". We take it as a word used in smMV. If S is a substantive, and if we have in MV that p : S, then we might say in smMV that "p is the name of an object". Continuing in smMV, one might ask "what kind of object?". The answer to this will be in smMV that p is the name of an S, and in MV itself that p : S (which expresses that p is an S). We of course have not expressed here what the word "object" means, but only how the word is used.
3.7. In the next section we will use the word clause (smMV). It will get its exact description in the language definition (from Section 6 onward), but we may as well say right here that a clause is either a typing or a statement. This
The mathematical vernacular (F.3)
879
cannot serve as a definition of the word "clause", however. We even note that "typing" and "statement" belong to different languages. In order to give a preliminary idea, we say here that a clause will be either a high typing,
    A :: substantive
or
    P :: statement

or something of the form

    P                        (3.7.1)

in situations where

    P :: statement           (3.7.2)
had already been recognized as a valid clause. The interpretation of (3.7.2) is "P is a well-formed statement", and the one of (3.7.1) is "P is true". Note that in (3.7.1) and (3.7.2) P itself can be a low typing like "a : A", where "A :: substantive" has already been recognized as a valid clause. There will be cases where we establish that "a : A" is a well-formed statement, and there will be cases where we establish that "a : A" is true. High typings will be different from low typings in the sense that they cannot be considered as statements. There will be no valid clauses of the form
    (A :: substantive) :: statement
    (P :: statement) :: statement
4. STRUCTURE OF MV BOOKS
4.1. In this section we give a first outline of what a book is. The following terms will all be smMV: "book", "line", "older", "younger", "context", "context item", "declarational", "assumptional", "body of a line", "context length", "empty context", "flag", "flagstaff", "flagstaff form", "flagless form", "block", "block opener", "nested blocks", "sub-block".

A book is a finite partially ordered set of lines. The order relation is called "older than" ("line p is older than line q" and "line q is younger than line p" are synonymous). A line consists of two parts: a context and a body. A context is a finite sequence of context items. There are two kinds of context items: declarational items and assumptional items. The sequence of context
items may be empty, and in that case we speak of the empty context. In general, the number of items in the sequence is called the length of the context; the empty context has length zero. This is all we say here about context items; for further information we refer to Section 6. We refer to Section 7 for a description of the body of a line; for the time being we do not need such a description.

4.2. We sketch the interpretation of the words introduced in Section 4.1 in terms of gL. A book is to be interpreted as any connected piece of mathematics that starts from scratch. Lines are primitive building blocks of books. One aspect of lines is that if we omit the last line of a book then it is still a book, but if we omit just a part of that line then it is no longer a book. Usually we think of a book as a linearly ordered set (i.e., a sequence of lines, and we were thinking that way when using the words "last line" in the previous sentence), where the first line is "the oldest" and the last line is "the youngest", but we need not go so far as to prescribe this linearity. The meaning of old and young is that younger material may make use of older material, but not the other way round. Since every finite partially ordered set can be put into a linear order that is consistent with the partial order, we see that the generalization from linear to partial is a very superficial one. Nevertheless, the presentation in a non-linear form may make a book easier to understand. In particular, if two pieces A and B are logically independent of each other, then this independence would be muffled if, only for the sake of typography, we would proclaim A to be older than B. Saying that a book remains a book if we omit the last line (or in the case of non-linear order, if we omit a line that is not older than any other line in the book), means that it remains a book in the sense of syntactic structure; it need not be an interesting book.
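The remark that every finite partially ordered set can be put into a consistent linear order can be made concrete with a small sketch (hypothetical Python, not part of MV; the representation of lines and of the "older than" relation is our own assumption):

```python
def linearize(lines, older_than):
    """Topologically sort a finite set of lines so that every line
    comes after all lines that are older than it.

    `older_than` maps each line to the set of lines older than it."""
    ordered, placed = [], set()

    def visit(line):
        if line in placed:
            return
        for elder in older_than.get(line, set()):
            visit(elder)           # place all older material first
        placed.add(line)
        ordered.append(line)

    for line in sorted(lines):     # deterministic traversal order
        visit(line)
    return ordered

# Two logically independent pieces A and B both build on one "axiom"
# line; the partial order forces neither A before B nor B before A.
book = {"axiom", "A", "B"}
older = {"A": {"axiom"}, "B": {"axiom"}}
order = linearize(book, older)
```

Any linear order produced this way respects "older than", which is exactly why the generalization from linear to partial books is superficial from a logical point of view.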
An assumptional context item is to be interpreted as an assumption, like "assume p > q". A declarational context item is to be interpreted as the introduction of a variable of a specified type, like "let y be a real number". A context is to be interpreted as a sequence of such items, arranged in the order in which they were introduced. As an example (in OMV) we give a context of length 4: "let n be a positive integer, let S be a subset of the set of real numbers, assume that S has n elements, let s be an element of S". The body of a line is interpreted as a piece of true information we provide in the considered context. As an example (in OMV) we give, with the above context of length 4, "if n > 1 then S contains an element different from s".

4.3. In this section we present examples of the structure of a book. Throughout
Sub-sections 4.3, 4.4, 4.5 we think of a linearly ordered book. The examples are
abbreviated, in the sense that context items are replaced by symbols I1, I2, I3, ..., line bodies by b1, b2, b3, .... Contexts are represented as sequences of items separated by commas, and we write an asterisk between context and body of the line. Now a book can look like this:
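The example figure is not reproduced in this copy; the following hypothetical stand-in (a Python sketch with invented items I1, I2, I3 and bodies b1, ..., b5) shows the kind of flagless book the notation describes, each line printed as its context, an asterisk, and its body:

```python
# A hypothetical flagless book: each line is (context, body),
# a context being a tuple of context-item symbols.
book = [
    ((),             "b1"),
    (("I1", "I2"),   "b2"),
    (("I1",),        "b3"),
    (("I3",),        "b4"),
    (("I1", "I2"),   "b5"),
]

def render(line):
    """Print a line in the notation of Section 4.3:
    items separated by commas, an asterisk, then the body."""
    context, body = line
    if context:
        return ", ".join(context) + " * " + body
    return "* " + body

for line in book:
    print(render(line))
```

Running this prints, for instance, `I1, I2 * b2` for the second line and `* b1` for the line with empty context.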
The contexts in this example look a bit untidy. In a mathematics text the contexts usually do not change from line to line, but are constant over a larger piece of text. And if the context changes, it is either by adding a few context items on the right or by deleting one or more from the right. So contexts grow and shrink on the right in the course of a mathematical discussion. Assumptions that were once introduced are no longer valid from a certain point onwards, and the same thing holds for variables: a variable is born, is alive during some time, and then dies. In OMV it is customary to announce birth of assumptions and of variables, but it is left to the reader to guess (possibly on the basis of the typographical layout, possibly on the basis of "understanding the author's intentions") at what point in the text they are dismissed. For the sake of further discussion we give a typical example:
4.4. The information contained in a book is completely preserved if we write
it in what we call its flagstaff form. In contrast to this, the form presented in Section 4.3 is called the flagless form. In the flagstaff form, the context items are written on flags. The staff of a flag is vertical, and marks the set of lines where the flag's item is a part of the context. The following example, where the second example of Section 4.3 has been put into flagstaff form, speaks for itself.
Needless to say, the way back from flagstaff form to flagless form is immediate. For every one of the bodies b1, ..., b10 we get the context if we assemble the items on the flags carried by the flagstaffs we see on the left of that body. Later we shall use rectangular flags for assumptions and pointed flags for declarations, in order to make a clear distinction between those two kinds of context items. We did not do it here, since the relation between flagless form and flagstaff form is independent of such a distinction. In our formal presentation of MV we handle the flagless form; in examples we may switch to the flags (see Section 18).

4.5. Sometimes we use the word block (smMV) to denote the material to the right of a flagstaff, including the flag itself. So every flag determines a block, and the item on the flag is called a block opener (smMV). As an example we quote that
are blocks of the book of Section 4.4. The block openers are I4 and I6, respectively. The blocks are always nested, that is to say that if two blocks are not disjoint then one of the two is a sub-block of the other one.
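The way flags, staffs and blocks arise from consecutive contexts can be sketched as follows (hypothetical Python; MV itself does not prescribe any such algorithm). A flag is raised when an item extends the previous context on the right, and lowered when the item disappears; since contexts grow and shrink on the right, the resulting blocks nest:

```python
def blocks(contexts):
    """For a flagless book given by the context of each line,
    return the blocks as (item, first_line, last_line) triples."""
    result, open_flags = [], []        # open_flags: (item, first_line)
    for i, ctx in enumerate(list(contexts) + [()]):   # sentinel closes all
        # lower flags that are no longer a prefix of the current context
        while open_flags and (len(open_flags) > len(ctx)
                              or any(open_flags[j][0] != ctx[j]
                                     for j in range(len(open_flags)))):
            item, first = open_flags.pop()
            result.append((item, first, i - 1))
        # raise a flag for each item newly added on the right
        for item in ctx[len(open_flags):]:
            open_flags.append((item, i))
    return result

# contexts of a small book: I2 lives on line 2 only, I3 on line 4,
# while I1's staff spans lines 1 through 4
example = [(), ("I1",), ("I1", "I2"), ("I1",), ("I1", "I3")]
```

Because flags are lowered in last-raised-first order, two blocks are either disjoint or one contains the other, matching the nesting property stated above.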
5. IDENTIFIERS
5.1. In this section the following terms of smMV will be introduced: "identifier", "fresh identifier", "constant", "parametrized constant", "modified parametrized constant", "variable", "dummy", "variables of a context". Note that a word like "variable" is smMV, but that the variables themselves are sMV. Similar things hold for the other notions.

5.2. An identifier is a symbol or a string of symbols to be considered as an atomic piece of text. We might say that an identifier is a symbol, but since we need a very large number of symbols we use strings of symbols instead, taken from a relatively small collection. It is a matter of parsing how to isolate these identifiers in a given piece of text. We shall not go into these parsing questions since they are not very essential here: if we had an unlimited amount of useful identifiers the matter would not have arisen at all. We refer to Section 23 for further remarks. Examples of identifiers in OMV are "x", "2", "the complex number field", "parallelogram". We note that if we describe "the complex number field" as a string of symbols, then we have to consider the empty spaces between the separate words as symbols too. These are produced by key strokes on a typewriter just like the letters, but they do not leave a visible imprint on the paper. Therefore it is better to replace the empty spaces by a visible character that is not used otherwise. One can take the underlining symbol for this, and write "the_complex_number_field". In our examples we shall not do this, however. It is one of the aspects in which the paper remains informal.

5.3. An identifier is called fresh at some specified place of the book if it has
not appeared yet at older places of that book. In MV and in OMV we often need fresh identifiers, but in practice this is taken with a grain of salt. Since the number of short identifiers is rather small, we are inclined to use some of them repeatedly, in different circumstances, with different meanings. We shall not pay attention to this matter and act as if there were an unrestricted amount of easily recognizable symbols.
5.4. In an MV book there are various kinds of identifiers. First there are the pMV symbols that occur in the definition of MV, like

    "substantive", "statement", ":", "::", ":="
Another class of identifiers is the class of variables (the word "variable" is smMV, the variables in the book will be sMV). A variable is an identifier that occurs for the first time in an MV book in a declarational context item (see Section 6.3). Other identifiers are bound variables (also called dummies), for which we refer to Section 20. Finally we have identifiers that are called constants. They are the identifiers whose first occurrence in an MV book is on the left of a symbol ":=". The interpretation is that a constant is the name given to a defined object like "2", "e" (e is the basis of natural logarithms).

5.5. Related to the constants are the parametrized constants, which are not identifiers in the proper sense. A parametrized constant is a kind of finite sequence of symbols in which there occur variables at various places. The notion is relative with respect to a context. A context has a number of variables, i.e., the variables introduced in the declarational items of that context. These variables will be referred to as the "variables of the context". It is essential that each one of the context variables occurs at least once in the parametrized constant. The constants of Section 5.4 can be considered as parametrized constants for the case that there are no declarational items in the context. If x and y are the variables of the context, then the following things may be parametrized constants:
    "f(x, y)",  "x + y",  "the distance from x to y".
A parametrized constant is called fresh (smMV) somewhere in the book if it has not appeared at older places in that book, not even with different variables. Parametrized constants can be used later in the book by repeating them, with the variables replaced by other expressions. We do not say here what kind of expression we have in mind, but just mention as examples
    "f(a + b, 3)",  "(a + b) + 4",  "the distance from P to the center of c".
In smMV such modified repetitions will be called modified parametrized constants. Clearly, these modified constants will generate new parsing problems, but again we lightheartedly neglect these.
The condition that in a parametrized constant all variables of the context occur, is usually taken with a grain of salt. We return to this in Section 21.6.

5.6. Many of the parametrized constants in our examples will have the form b(x1, ..., xn), where x1, ..., xn are all the variables of the context, in the order in which they are introduced in the declarational context items. In these cases we often take the liberty to write just b instead of b(x1, ..., xn) on the left-hand side of the definitional line (and sometimes at other places where it is obvious what the abbreviation b stands for).
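Replacing the variables of a parametrized constant by other expressions, as in the modified parametrized constants above, can be sketched like this (hypothetical Python; the template representation with explicit variable slots is our own assumption, not MV notation):

```python
def instantiate(template, substitution):
    """Replace each variable slot in a parametrized-constant template
    by the expression substituted for it.  A template is a sequence
    whose elements are either literal strings or ("var", name) slots."""
    parts = []
    for piece in template:
        if isinstance(piece, tuple) and piece[0] == "var":
            parts.append(substitution[piece[1]])   # a modified occurrence
        else:
            parts.append(piece)
    return "".join(parts)

# "the distance from x to y", with x and y the variables of the context
dist = ["the distance from ", ("var", "x"), " to ", ("var", "y")]
```

With the substitution x := P, y := the center of c, this reproduces the last of the examples quoted above: "the distance from P to the center of c".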
6. STRUCTURE OF CONTEXT ITEMS
6.1. A context item is a pair, consisting of a clause and a label. For a first orientation on what a clause is, we refer to Section 3.7. The label is either "(asm)" or "(dcl)". See Section 6.4 for the reason why these labels are used. The phrases "context item", "clause" and "label" are smMV; both "(asm)" and "(dcl)" are pMV.

6.2. An assumptional context item has the form "P (asm)". As a first orientation we say that this P is a clause, but that not every clause will be admitted: "P (asm)" will only be allowed in cases where the high typing "P :: statement" can be established in the book, at least in the context formed by the sequence of context items preceding this item "P (asm)". For details we refer to Section 9.
6.3. Declarational context items have one of the following forms:

    x : P (dcl)
    x :: substantive (dcl)
    x :: statement (dcl)

where x is a fresh identifier and P is some expression. As said in Section 5.4, x is called the variable of the context item. Not every P will be admitted here, but only those P for which the high typing "P :: substantive" can be established in the book, in the context formed by the sequence of context items preceding this item "x : P (dcl)". For details we refer to Section 9.

6.4. It is essential that context items are explicitly labeled as being either declarational or assumptional. In the flagstaff form this can be done by using pointed flags for declarations and rectangular flags for assumptions (see Section 18). The reason for the use of labels that distinguish between the two kinds of
items, is the fact that the form of the context item does not always reveal to which one of the two categories it belongs. The following example of a context of length 2 in OMV shows what we mean: “Let p be a quadrilateral, assume that p is a rectangle” (of course no one would say this in one breath, but quite often the various items of a single context are pages apart). The labels “let be” and “assume that” are no luxury, for if we would say “ p is a quadrilateral, p is a rectangle” then it would not have been made clear that in the first item p is introduced and that in the second item p is a thing we already know about.
7. STRUCTURE OF LINE BODIES
7.1. There are four kinds of line bodies:

    (i)   definitional line bodies    (Sections 7.2 and 7.6)
    (ii)  primitive line bodies       (Sections 7.3 and 7.10)
    (iii) assertional line bodies     (Sections 7.4 and 7.11)
    (iv)  axiomatic line bodies       (Sections 7.5 and 7.12)
(all these terms are smMV).

7.2. The interpretation in OMV of lines of the type (i) is that they represent definitions. That word has to be taken in a wide sense, and contains much more than what a text in OMV would label as "Definition". Whenever we select a new symbol to represent a longer expression, usually for the sake of brevity, we essentially have a definition. We consider three kinds of definitions, according to the syntactic category of the things to be defined. There are "name definitions", where a new name is introduced for an "object". Next there are "substantive definitions", where a new substantive is introduced, and finally "statement definitions", where a new phrase is coined to represent a statement. As examples of the three kinds of definitions we quote
"the orthocenter of triangle t is ...", "A rhombus is ...", "We say that the sequence s converges to the real number c if ...".
In MV these three categories will be represented by (7.6.1), (7.6.2), (7.6.3), respectively. For further examples and comments see Sections 7.7-7.9. Many definitions in OMV have the form of the introduction of a new adjective. We shall not put these into MV since they can be circumvented (cf. Sections 3.5 and 22.14).
7.3. The interpretation (in OMV) of lines with bodies of the type (ii) is that they introduce primitive notions. Such lines are rare in mathematics, and have the same status as axioms. Together with the axioms they may form the basis of a theory. As an example we quote from Hilbert's axioms for plane geometry, which state "there are things we call points and things we call lines", where the words "point" and "line" are introduced as new substantives but, in contrast to the substantive definitions of Section 7.6, without explanation in terms of known things. In MV the introductions of these primitives get the form (7.10.2). An example of a primitive of the form (7.10.3) is that, after points and lines have been mentioned, the notion "point A lies on line q" is introduced without explanation. Finally we give an example of what is expressed in (7.10.1). One of Peano's axioms is: "there is a special natural number which we shall denote by the symbol 1". Here the new object is introduced without definition. Instead of defining it we just say of what kind it is.

7.4. The interpretation of lines with bodies of the type (iii) is that assertions are made that follow from previous material. Some of these are called "theorems", others "lemmas", but most of them (in particular the assertions inside proofs) do not get such a stately name. And it is certainly not common practice to apply words like "theorem", "lemma" to cases with high typings like "A :: substantive", "P :: statement", which are likewise admitted here (see Section 7.11). When saying that theorems and lemmas follow from previous material, we have to interpret the habit in OMV to print a proof after the announcement of the theorem instead of before. If we wish to have a similar announcement in MV, we might give a name to the thing stated in the theorem, claiming that it is a statement P, like in (3.7.2). The proof will end with the assertional line body P (see (3.7.1)).
The interpretation of the first line in OMV is "P is a well-formed proposition", and of the last one "P is true".
7.5. Lines with a body of the type (iv) are to be interpreted as axioms. They can be applied in the same way as theorems, but in the case of axioms we do not require that their content follows from previous material.
7.6. In MV, a definitional line body has one of the forms

    P := Q : R                (7.6.1)
    P := Q :: substantive     (7.6.2)
    P := Q :: statement       (7.6.3)
where P stands for a parametrized constant. Note that “substantive” and
"statement" are pMV, as well as the symbols ":=", ":", "::", but that Q is definitely not the pMV symbol "PN".
7.7. We remark that it will be a consequence of our later rules that (7.6.1) appears only in situations where "R :: substantive" is valid. The interpretation of (7.6.1) is that the definition provides a new (possibly short) name P for an object of which the full description is Q. Example (in OMV): "Let S(n) denote the real number exp(1) + ... + exp(n)". In this example S(n) plays the role of P, "exp(1) + ... + exp(n)" the role of Q, and "real number" the role of R. In MV we write it as "S(n) := ... : real number" (we do not attempt right now to write exp(1) + ... + exp(n) in official MV).
7.8. The interpretation of (7.6.2) is that it provides a (usually short) expression P for a (usually longer) description Q that represents a substantive. Example: the role of Q can be played by the substantive "positive integer with exactly two divisors" and the role of P by the new substantive "prime number".

7.9. The interpretation of (7.6.3) is that the definition provides a new (usually short) expression for a (usually longer) statement. Example: "We say that p is orthogonal to q if the inner product of p and q is zero". Here "p is orthogonal to q" plays the role of P, and "the inner product of p and q is zero" plays the role of Q.

7.10. In MV a primitive line body has one of the forms

    P := PN : R                (7.10.1)
    P := PN :: substantive     (7.10.2)
    P := PN :: statement       (7.10.3)
We note that ":=", "PN", ":", "::", "substantive" and "statement" are all pMV ("PN" has been chosen as a mnemonic for the OMV-term "primitive notion"). P stands for a parametrized constant. In the case of (7.10.1) it will be required that "R :: substantive" is valid in the context in which (7.10.1) is written.
7.11. An assertional line body in MV is nothing but a single clause (cf. Section 3.7).
7.12. An axiomatic line body has the form

    c [Axiom]                  (7.12.1)
where c is a clause, and the symbol “[Axiom]” is a pMV term. By virtue of language rules still to be formulated, there are two differences between axiomatic and assertional line bodies. In the first place, the assertional line body
has to "follow" from the previous part of the book, and secondly, in the axiomatic case the clause c has to be restricted to cases for which the high typing "c :: statement" can be established in the book. The latter is similar to the restriction made on assumptional context items (Section 6.2).
7.13. We introduce the notion clause of a line body (smMV). In the cases of Sections 7.10-7.12 the body has just one clause. In an assertional line body that clause is the line body itself. In lines with axiomatic line body "c [Axiom]" the clause of the line is just that c. In lines with bodies (7.10.1), (7.10.2), (7.10.3) the clause of the body is "P : R", "P :: substantive", "P :: statement", respectively. A definitional line body has two clauses. The old clauses of the lines of Section 7.6 are "Q : R", "Q :: substantive" and "Q :: statement", respectively. The new clauses of these lines are "P : R", "P :: substantive" and "P :: statement", respectively.
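The assignment of clauses to line bodies in Section 7.13 can be sketched as follows (hypothetical Python; the tuple encoding of bodies is our own assumption, not MV notation):

```python
def clauses(body):
    """Return the clauses of a line body, as strings.

    Bodies are modeled as tuples:
      ("def",  P, Q, tail)  for  P := Q <tail>   (two clauses: old and new)
      ("prim", P, tail)     for  P := PN <tail>  (one clause)
      ("assert", c)         for an assertional body   (one clause)
      ("axiom", c)          for  c [Axiom]            (one clause)
    where <tail> is ": R", ":: substantive" or ":: statement"."""
    kind = body[0]
    if kind == "def":
        _, p, q, tail = body
        return [q + " " + tail, p + " " + tail]   # old clause, new clause
    if kind == "prim":
        _, p, tail = body
        return [p + " " + tail]
    return [body[1]]            # assertional or axiomatic: the clause is c
```

For the substantive definition of Section 7.8, for instance, the old clause is "positive integer with exactly two divisors :: substantive" and the new clause is "prime number :: substantive".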
8. GENERAL REMARKS ON RULES OF MV

8.1. In Sections 4, 6 and 7 we have explained the structure of books, contexts and lines. The question is now: what contexts and what lines are allowed? It will not be trivial to state a complete set of rules for this. A part of these rules will be felt as rules for language manipulation; these rules will be explained in Sections 9 and 10. Another part (Sections 12-17) will be more like a piece of the foundation of mathematics. However, the rules of MV will not contain all of what is usually called the foundation of mathematics. Once we have reached a certain level, the language is strong enough to allow us to write the rest of the foundation of mathematics in an MV book. It is attractive to put as little as possible in the language definition and as much as possible in the books, but we shall not aim at extremes in this respect.

The state of affairs can be compared to the way a ship is built. The ship is constructed ashore only until the stage that it is just able to float. Then it is launched, and after that, the construction goes on. The reason for this is, of course, that a ship cannot be launched if it is too heavy. In the case of MV the reason is different. The MV ship can be used by many different customers in different ways. After MV is launched, every customer can finish the construction according to his own wishes. After the launching of a ship, two things happen: (i) the construction is completed, and (ii) the ship will be sailing the seas. Here our analogy is less satisfactory. The action of the ship's construction in the water near the shipyard is very different from the action of sailing the seas. In the case of MV these
two actions are alike. To (i) there corresponds the writing of the fundamental portions of the book, and what corresponds to (ii) is writing a (possibly long) book or set of books based on that fundamental chapter. But all the time the action consists of writing books and nothing else.

8.2. As said before, our MV will be modelled after OMV, i.e. the way mathematicians write and speak today, but we cannot just copy OMV. There is no consensus in OMV about how things should be said. We are not in a position to derive all rules of MV by observation of OMV. We have to invent new rules, and that may mean making arbitrary choices. We have to give definitive shape to things which are not properly revealed in OMV. In particular this refers to the fact that this paper tries to interpret OMV as a typed language. One might argue that such an interpretation is not really called for, and that it is about as arbitrary as interpreting OMV as a non-typed language. The most-favored method of coping with life without types is to maintain that "everything is a set". One might try to arrange a typed language in such a way that this set-loving point of view can be obtained by just creating a single type, viz. the type "set". Yet we have not taken the trouble to keep this possibility open in our presentation of MV. Conversely, one might try to code typed material in terms of a non-typed language, but this seems to be very unattractive.
8.3. We first say something about the notion of validity (smMV). The word valid (smMV) means: having been built according to the grammar of MV (and that grammar has still to be disclosed). The rules of that grammar will be production rules, in the sense that they all describe ways to extend a valid book by adding a new line. In the course of the description of how the new line is to be built, we have certain resting-points where certain phrases are discussed as being acceptable ingredients of the line to be added. Important resting-points are clauses (see Section 3.7 and Section 7.13). In a given context there is a set of such clauses which are called valid clauses. The production rules explain how our knowledge about that set can be extended, describing how by means of a number of elements of that set a new one can be constructed.

8.4. Validity is expressed with respect to a book.
As already said in Section 4, a book is a partially ordered set of lines. For any given line we can consider the set of all lines which are older than the given line; this is again a partially ordered set of lines and therefore a book. We shall refer to the given line as “the new line” and to that book as “the set of old lines”. Whether a new line will be called valid, depends on the set of old lines, and not on what happens in other lines. The same remark applies to parts of
new lines, like clauses and contexts. Only for the identifiers, and more generally for the parametrized constants, we have a condition that goes beyond the set of old lines: we have to stipulate that they are all different throughout the whole book. We usually think of a book as having been written line by line, where older lines precede newer lines in time. If this is the case, then the condition for the parametrized constants is that at each moment the parametrized constant introduced in a line (on the left of a sign :=) is different from all the parametrized constants used before. We still have to say how the context for the new line has to be built, and what clauses are valid in that context. This will be said in Sections 9 and 10.

8.5. From Section 9 onwards we shall give a formal definition of the notion of an MV book in flagless form. Except for the syntactic matters referred to in Section 8.6, we shall not make use of what was said in the preceding sections. Those sections were intended to give interpretations, and to help the reader to get an insight into the complex set of definitions we shall display in the next sections.

First a few things about the terminology. The smMV-terms "MV book" and "valid book" are synonymous. We shall not define notions "context" and "clause" as such, but we shall define "valid context with respect to a set of lines", "valid clause with respect to a valid context and a set of lines" (in both cases that set is referred to as the set of "old lines", and in the second case it will be required that the valid context is a valid context with respect to that same set of old lines), "valid book", "line", "body of a line", "clause of the body of a line", "context of a line". The pMV symbols to be used are
    ":", "::", ":=", "substantive", "statement", "PN", "[Axiom]"
and "(dcl)", "(asm)", "*". (The symbols of the second row do not appear in the flagstaff form: their role is taken over by the pointed and rectangular flags and flagstaffs.)

A few general things can be said here about the format of things. A book is a finite (possibly empty) partially ordered set of lines. A line is a pair consisting of a (valid) context and a line body. A (valid) context is a finite (possibly empty) sequence of context items.

8.6. We mention some things that should have been formally discussed in the next sections, but are nevertheless treated very superficially. They are of a syntactical nature. We mention:
(i) substitution,
(ii) variables of a context,
(iii) fresh identifiers and fresh parametrized constants,
(iv) parsing.

We take it that Sections 5 and 23 are sufficiently clear as an indication of how these notions are to be formalized. A complete formalized treatment of them would not quite fit into the general style of this paper.
9. VALID CONTEXTS AND VALID CLAUSES
9.1. Everything that has been said thus far is to be considered as introduction, providing an orientation about what we are going to describe. It also served to build up a feeling for the interpretation. From now on, however, we shall attempt a more complete and more formal description. Many things that have been referred to earlier in vague terms, will now get a more serious treatment. The rules are about books, lines and validity. They will get their content by means of rules BR1-BR9 (BR stands for "basic rule"). We need not say beforehand what these notions mean. These rules BR1 to BR9 are hardly of a logical or a mathematical nature. Or, rather, they describe how to handle logic and mathematics. In order to get to logic and mathematics themselves we have to add a number of rules in Sections 12-17 that describe more ways to produce valid clauses. As to the production of valid contexts and books no rules will be issued beyond these BR1-BR9.

9.2. The symbols c, C, I1, In, P, A, x, x1, xk, X1, Xk that are used in this section for explaining language rules are meta-variables. They are used in smMV in order to denote expressions occurring in an MV book. In the rules BR1-BR7 there is a set S of lines ("the set of old lines"), and "valid" stands for "valid with respect to S".
9.3. BR1. If an old line has context C, and if c is a clause of the body of that line, then C is a valid context, and c is a valid clause in that context. 9.4. BR2. The empty context is valid. 9.5.
BR3. If I1, ..., In is a valid context (if n = 0 we mean the empty context), and if
The mathematical vernacular (F.3)
P :: statement is a valid clause in that context, then I1, ..., In, In+1 is a valid context, where In+1 stands for “P (asm)”. (As already explained in Section 6.3, the additional “(asm)” serves to label it as an assumptional context item; it is not superfluous, since P may have the form of a typing.)
9.6. BR4. If I1, ..., In is a valid context (if n = 0 it is the empty context), and if x is a fresh identifier, then the following contexts of length n + 1

    I1, ..., In, x :: substantive (dcl)
    I1, ..., In, x :: statement (dcl)

are valid contexts. If, moreover, A :: substantive is a valid clause in the context I1, ..., In, then

    I1, ..., In, x : A (dcl)

is a valid context.
9.7. BR5. If I1, ..., In is a valid context, and if one of these n items is x : A (dcl), then x : A is a valid clause in that context. Similarly, if one of the n items is x :: statement (dcl), then x :: statement is a valid clause in the context. If one of the items is P (asm), then P is a valid clause in the context.

9.8. BR6. Let C and C0 be valid contexts, let x1, ..., xk be the variables of the context C0 (this notion was explained informally in Section 5.5), and let c be a valid clause in the context C0. Let X1, ..., Xk be expressions with the property that if we replace x1, ..., xk by X1, ..., Xk, then all context items of C0, with the labels “(dcl)” and “(asm)” deleted, become clauses which are valid in the context C. Then the clause we get if we replace x’s by X’s in c becomes a clause that is valid in the context C.
9.9. BR7. If I1, ..., In is a valid context, and if k < n, then I1, ..., Ik is a valid context. If c is a valid clause in the latter context, then c is a valid clause in the context I1, ..., In.
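The context-handling rules above lend themselves to a small executable model. The following Python sketch is our own illustration, not part of MV: the tuple encodings and function names are invented here. Contexts are lists of items, BR3/BR4 extend them, and BR5 reads the valid clauses back off a context.

```python
# A minimal sketch (our own formalization, not MV itself) of contexts
# in the style of BR2-BR5: a context is a list of items, where a
# declaration is ("dcl", x, typ) and an assumption is ("asm", clause).

def extend_with_assumption(ctx, P):
    """BR3: append the assumptional item 'P (asm)'."""
    return ctx + [("asm", P)]

def extend_with_declaration(ctx, x, typ):
    """BR4: append 'x : typ (dcl)'; x must be a fresh identifier."""
    assert all(item[1] != x for item in ctx if item[0] == "dcl"), \
        "BR4 requires a fresh identifier"
    return ctx + [("dcl", x, typ)]

def valid_clauses(ctx):
    """BR5: every context item yields a clause valid in the context."""
    return [(item[1], ":", item[2]) if item[0] == "dcl" else item[1]
            for item in ctx]

ctx = []                                        # BR2: the empty context
ctx = extend_with_declaration(ctx, "x", "nat")  # x : nat (dcl)
ctx = extend_with_assumption(ctx, ("x", "=", "1"))
print(valid_clauses(ctx))   # [('x', ':', 'nat'), ('x', '=', '1')]
```

BR6 (instantiation of a context) and BR7 (weakening) would be equally mechanical on this representation; we omit them to keep the sketch short.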
10. VALID BOOKS

10.1. The notion of a valid book is obtained by saying that the empty book is valid and by explaining how a valid book can be extended.
10.2. BR8. The empty book is valid.
10.3. BR9. Consider a valid book, and take any set of lines as set of old lines. Let C be a valid context with respect to this set. The following list indicates what line bodies can be used to form, together with the context C, a line that produces a valid book again if it is added to the book, making the new line younger than all the old lines. The line bodies are on the right. In the cases (iii), (iv), (v), (vi), (vii), (viii) we require, as an extra condition, that the clause on the left is valid in the context C with respect to the set of old lines.

    (i)                          P := PN :: statement
    (ii)                         P := PN :: substantive
    (iii)   Q :: statement       P := Q :: statement
    (iv)    Q :: substantive     P := Q :: substantive
    (v)     c :: statement       c [ Axiom ]
    (vi)    c                    c
    (vii)   R :: substantive     P := PN : R
    (viii)  Q : R                P := Q : R

In all cases P stands for some fresh parametrized constant, containing the variables of the context and no others. As the “clause of the line body” we take, in the cases (i) to (viii), respectively,

    (i)     P :: statement
    (ii)    P :: substantive
    (iii)   both P :: statement and Q :: statement
    (iv)    both P :: substantive and Q :: substantive
    (v)     c
    (vi)    c
    (vii)   P : R
    (viii)  both P : R and Q : R.
10.4. We have a comment on case (ii) of BR9. Some people may say it is not customary to use, or to admit the use of, lines like this in any arbitrary context. They might like to admit them in the empty context only. Essentially this comes down to starting a mathematics book with the creation of a number of types, and then off we go. This restricted use of case (ii) has the advantage that it becomes much easier to describe the collection of all types that can occur in a book. Nevertheless we keep this rule (ii) as it stands, i.e. we allow it in any context. We leave it to the user of the language to make or not to make the more restricted use of the rule. It is as with roads: one can build a road that technically admits speeds of 200 mph, the legal authorities may prescribe a speed limit of 100, and the individual user may restrict himself to a maximum of 60. We note that if a substantive is introduced as primitive by means of a line of the type (ii), then this substantive is an archetype (see Section 12.1). And if the line has a context, like the line

    x : A (dcl)  *  P(x) := PN :: substantive,

then the only way to make special instances P(u) and P(v) comparable is to require u = v.
11. COMMON STRUCTURE OF FURTHER RULES

11.1. All further rules are about the validity of clauses, where “validity” is taken with respect to a set of old lines and with respect to a context (which in its turn is assumed to be valid with respect to that set of old lines). In the simplest case such a rule will be of the form

.............................................................................   (11.1)
        P
        ----
        Q
.............................................................................

and will express the following: if P is a valid clause, then Q is a valid clause too. A variation on the scheme (11.1) is

.............................................................................
        P1      P2
        ----
        Q1      Q2      Q3
.............................................................................
which means to express the rule that if P1 and P2 are both valid, then Q1, Q2, and Q3 are valid.
11.2. Some of the rules will be slightly more intricate in the sense that they deal with context extension. This can happen in entries on either side. We take a case where it happens on the left only:

.............................................................................
        P1
        J  *  P2
        P3
        ----
        Q
.............................................................................

(the * is an smMV symbol here). The meaning of this is as follows. We are dealing with a set of old lines (which is not going to be changed in this rule), and a context C. Assume that P1 is a valid clause in the context C, that P2 is a valid clause in the extended context C, J (if C = I1, ..., In, then C, J represents the context I1, ..., In, J), and finally that P3 is a valid clause in the context C. Then Q is a valid clause in the context C. The validity of J as a context item will not be open to doubt in the cases we present. This validity will always follow from the assumptions. In rule T6 there is a case where the role of J will be taken over by two context items (separated by a comma) instead of a single one. As remarked in Section 11.1, a rule like the one above is intended to hold in any context. If I is such a context, this means that the rule also includes the following one:

.............................................................................
        I  *  P1
        I, J  *  P2
        I  *  P3
        ----
        I  *  Q
.............................................................................

11.3. In all our rules, the phrases that were represented above by P, P1, P2, P3, Q, Q1, Q2, Q3, J will be expressions in terms of one or more meta-variables. Actually all symbols that have not been introduced explicitly as pMV are to be considered as meta-variables in this kind of rules. For example, in rule T8* the letters A and B are meta-variables. In applications of that rule, they may be replaced by any pair of expressions.
11.4. Except for rule EQ11, all rules to be presented in the next sections have the form sketched in Sub-sections 11.1 and 11.2.
11.5. We sometimes use the term derived rule (smMV). Derived rules are rules whose validity follows from earlier rules. In other words, what such a rule proclaims to be valid can be shown to be valid already because of the other rules. We shall use a rule number with asterisk if we claim that the rule is a derived rule. The remaining rules are called fundamental rules, although we do not claim our set of fundamental rules to be minimal. In some cases one might be able to write such derived rules as theorems in an MV book, and then it is a matter of taste whether we present them as language rules or as theorems. There are derived rules whose derivation requires induction over the length of the book. As an example we take the observation that a : A can appear in the book only when A :: substantive. Another example is the observation that if A and B are substantives, and (A = B) :: statement, then there is a substantive C such that both A << C and B << C. We shall not actually use this kind of derived rules, but rather consider them as part of the metatheory of MV books. Therefore we shall refer to such rules as metatheory rules.

11.6. In some rules we deal with substitution (smMV). We use [[ / ]] as smMV notation. If x is an identifier, and P and Q are expressions, then [[x/P]] Q denotes the expression we get if every occurrence of x in Q is replaced by the expression P. Since P may also contain x, there may arise new occurrences of x, but these new ones are not to be replaced by P, of course. Example: [[x/g(x, y)]] f(x, y) stands for f(g(x, y), y).
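Because [[x/P]] Q is purely textual, it is easy to model. The sketch below is our own illustration (the tuple term representation is invented here, not part of MV): terms are identifiers or applications, and the inserted copies of P are never re-scanned, exactly as required above.

```python
# A small sketch (ours, not the paper's) of the smMV substitution
# [[x/P]] Q of Section 11.6.  Terms are either identifiers (strings)
# or applications, written as (head, (arg1, arg2, ...)).

def subst(x, P, Q):
    """Replace every occurrence of identifier x in Q by the term P.

    New occurrences of x arising inside the inserted copies of P are
    not replaced again, because we never recurse into P itself.
    """
    if isinstance(Q, str):
        return P if Q == x else Q
    head, args = Q
    return (head, tuple(subst(x, P, a) for a in args))

# The paper's example: [[x/g(x, y)]] f(x, y) stands for f(g(x, y), y).
result = subst("x", ("g", ("x", "y")), ("f", ("x", "y")))
print(result)   # ('f', (('g', ('x', 'y')), 'y'))
```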
12. RULES ABOUT TYPING

12.1. We start with an informal introduction on the relation “<<”, a relation that can hold between substantives. If A and B are substantives then “A << B” is to be interpreted as “every A is a B”. Example: “square << rectangle”, “rectangle << rectangle”. In smMV we say that “A << B” is a clause, and that it intends to express that “A is a sub-substantive of B”. The clause “A << B” can appear in books also in cases where A << B is not true. For example, although “rectangle << square” is not true, it can still be considered as a well-formed statement. Here we speak smMV, but in MV it would be expressed as

    rectangle << square :: statement.

Our rules will have the effect that A << B can only appear in our books if A and B have a common ancestor E, i.e., a substantive E such that A << E and B << E are both true. If there is no such common ancestor, then A << B will not be a statement. Our production rules for valid clauses will never produce
such a thing as “rectangle << complex number :: statement”. One might derive in the metalanguage that “A << B :: statement” is an equivalence relation on the set of all substantives, and that every equivalence class is completely characterized by an “archetype” E, i.e., a substantive E with the property that all A of the class are sub-substantives of E. Similarly, if an “object” x is typed by x : A, then x has an archetype E. These archetypes do not appear in our rules, and neither in our MV books. For a comment on why the rules of MV were designed without explicit archetypes we refer to Section 1.14. Since the formula A << B will not be a statement for arbitrary substantives A and B, we will be unable to state it as an assumption in an MV book. We cannot say in such a book: “let A and B be substantives and assume that A << B”. The fact that we do say such things in some of our rules (like in T4) is quite a different matter. These rules are not written in an MV book. In T4 it means the assumption “A << B is a valid clause”, and this is said in smMV, not in MV. In none of our rules there is a conclusion drawn from the mere fact that A << B :: statement. We refrain from such rules in the philosophy that in such cases we will always have some substantive C with A << C and B << C. Formulas a : A will only appear in our books in situations where A is a substantive. Likewise, A << B will only appear when both A and B are substantives. Therefore we need not add assumptions like A :: substantive on the left in the rules T1-T6. As to the use of the substantive binder S in rule T6 we refer to Section 20.4.
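The common-ancestor condition can be pictured concretely. In the following Python sketch — our own model, with invented names, not part of MV — the true <<-facts of a book form a directed graph; closing them under reflexivity (T7*) and transitivity (T12*) and intersecting ancestor sets decides whether A << B is a statement in the sense of rule T2.

```python
# A small sketch (our own model, not from the paper) of the "common
# ancestor" condition of Section 12.1: A << B is a statement exactly
# when some substantive E satisfies A << E and B << E.

def ancestors(edges, a):
    """All E with a << E, given the recorded facts plus T7*/T12*."""
    seen, todo = {a}, [a]
    while todo:
        x = todo.pop()
        for (lo, hi) in edges:
            if lo == x and hi not in seen:
                seen.add(hi)
                todo.append(hi)
    return seen

def is_statement(edges, a, b):
    """Is 'a << b :: statement' derivable (rule T2)?"""
    return bool(ancestors(edges, a) & ancestors(edges, b))

facts = [("square", "rectangle"), ("rectangle", "quadrilateral")]
print(is_statement(facts, "square", "quadrilateral"))      # True
print(is_statement(facts, "rectangle", "complex number"))  # False
```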
12.2. Rules T1-T13. (See Section 11 for notational conventions.)

.............................................................................
T1      a : A
        ----
        a : A :: statement
.............................................................................
T2      A << C      B << C
        ----
        A << B :: statement
.............................................................................
T3      A << C      B << C      a : B :: statement
        ----
        a : A :: statement
.............................................................................
T4      A << B
        ----
        x : A (dcl)  *  x : B
.............................................................................
T5      x : A (dcl)  *  x : B
        ----
        A << B
.............................................................................

We mention some derived rules. Indications for derivations are: T7*: from BR5 and T5; T8*: from T7* and T2; T9* and T10*: from T3 and T7*; T11*: from T4 and BR5; T12*: from T4, BR7, T11* and T5; T13*: from T6 and T11*.
.............................................................................
T7*     A :: substantive
        ----
        A << A
.............................................................................
T8*     A << B
        ----
        A << B :: statement
.............................................................................
T9*     A << B      a : A :: statement
        ----
        a : B :: statement
.............................................................................
T10*    A << B      a : B :: statement
        ----
        a : A :: statement
.............................................................................
T11*    A << B      a : A
        ----
        a : B
.............................................................................
T12*    A << B      B << C
        ----
        A << C
.............................................................................
T13*    x : A (dcl)  *  P :: statement      y : Sx:A P
        ----
        y : A
        [[x/y]] P
.............................................................................
......................................................................... 13. RULES ABOUT EQUALITY
13.1. We shall consider equality between names of objects, between statements and between substantives. The effect of our rules will be that objects will be comparable by equality (being “comparable by equality” means that their equality is a statement) only if they have the same archetype (cf. Section 12.1), and substantives will be comparable by equality only if they are sub-substantives of a common archetype. Any two statements will always be comparable by equality. As a metatheory rule (which we shall never explicitly use) we mention that if p = q appears in a book, then p and q are either both objects typed by a substantive, or both substantives, or both statements. In the (quite strong) rules EQ10a-10c the symbols p and q stand for phrases that possibly show one or more occurrences of the identifier t.

13.2. We first present the fundamental rules of the form displayed in Section 11.
.............................................................................
EQ1     a : A :: statement      b : A :: statement
        ----
        a = b :: statement
.............................................................................
EQ2     a : A
        ----
        a = a
.............................................................................
EQ3     a : A      a = b
        ----
        b : A
.............................................................................
EQ4     A << C      B << C
        ----
        A = B :: statement
.............................................................................
EQ5     A :: substantive      B :: substantive      A = B
        ----
        A << B
.............................................................................
EQ6     A << B      B << A
        ----
        A = B
.............................................................................
EQ7     P :: statement      Q :: statement
        ----
        P = Q :: statement
.............................................................................
EQ8     P :: statement      Q :: statement      P (asm) * Q      Q (asm) * P
        ----
        P = Q
.............................................................................
EQ9     P :: statement      Q :: statement      P = Q
        ----
        P (asm)  *  Q
        Q (asm)  *  P
.............................................................................
EQ10a   u : A      w : A      u = w      t : A (dcl)  *  p = q
        ----
        [[t/u]] p = [[t/w]] q

        (for notation of substitution see Section 11.6)
.............................................................................
EQ10b   As EQ10a, but with “:: substantive” instead of “: A”.
.............................................................................
EQ10c   As EQ10a, but with “:: statement” instead of “: A”.
.............................................................................
13.3. The following rule EQ11 is not of the general form described in Section 11.

.............................................................................
EQ11    If the set of old lines contains a line of one of the forms

            C  *  P := Q : R
            C  *  P := Q :: substantive
            C  *  P := Q :: statement

        (cf. the first three cases of Section 10.3), then P = Q is a valid
        clause in the context C.
.............................................................................
13.4. We mention the following derived rules, mainly on reflexivity, symmetry and transitivity of equality. Hints for derivation are:

EQ12*: from EQ5 and T11*;
EQ13*: from T9* and EQ1;
EQ14*: from T7* and EQ6;
EQ15*: from EQ5 and EQ6;
EQ16*: from EQ5, T12* and EQ6;
EQ17*: from BR5 and EQ8;
EQ18*: from EQ9 and EQ8;
EQ19*: from EQ9 and BR6;
EQ20*: from EQ9, BR6 and EQ8;
EQ21*: from EQ1, EQ17*, EQ10a, EQ2 and EQ18*;
EQ22*: from EQ1, EQ17*, EQ10a;
EQ23*: from EQ6, T6, T13*, BR6, EQ19*, T5 and EQ6;
EQ24*: from T6, EQ12* and EQ8;
EQ25*: from T5 and EQ6.
.............................................................................
EQ12*   A :: substantive      B :: substantive      a : A      A = B
        ----
        a : B
.............................................................................
EQ13*   a : A :: statement      b : B :: statement      A << B
        ----
        a = b :: statement
.............................................................................
EQ14*   A :: substantive
        ----
        A = A
.............................................................................
EQ15*   A :: substantive      B :: substantive      A = B
        ----
        B = A
.............................................................................
EQ16*   A :: substantive      B :: substantive      C :: substantive      A = B      B = C
        ----
        A = C
.............................................................................
EQ17*   P :: statement
        ----
        P = P
.............................................................................
EQ18*   P :: statement      Q :: statement      P = Q
        ----
        Q = P
.............................................................................
EQ19*   P :: statement      Q :: statement      P = Q      P
        ----
        Q
.............................................................................
EQ20*   P :: statement      Q :: statement      R :: statement      P = Q      Q = R
        ----
        P = R
.............................................................................
EQ21*   a : A      a = b
        ----
        b = a
.............................................................................
EQ22*   a : A      a = b      b = c
        ----
        a = c
.............................................................................
EQ23*   A :: substantive      x : A (dcl) * P :: statement      x : A (dcl) * Q :: statement      x : A (dcl) * P = Q
        ----
        Sx:A P = Sx:A Q
.............................................................................
EQ24*   A :: substantive      x : A (dcl) * P :: statement      x : A (dcl) * Q :: statement      Sx:A P = Sx:A Q
        ----
        x : A (dcl)  *  P = Q
.............................................................................
EQ25*   A :: substantive      B :: substantive      x : A (dcl) * x : B      x : B (dcl) * x : A
        ----
        A = B
.............................................................................
14. RULES ABOUT SETS

14.1. In this section we shall provide rules that take care of the translation of the language of substantives into the language of sets. This translation is not very essential, and whether we prefer sets over substantives is partly a matter of fashion. But one thing is really important for us: we want to be able to speak of the collection of all subsets of a set, and to quantify over that collection. The symbols “-set”, “↑”, “↓” are pMV. We use “A-set” as a substantive formed from the substantive A (like “pointset” is derived from “point”). The notation “A↑” can be pronounced as “the set of all A’s”, and if T is an A-set, then T↓ can be pronounced as the substantive “element of T”.
14.2. Fundamental rules.
.............................................................................
S1      A :: substantive
        ----
        A-set :: substantive
        A↑ : A-set
.............................................................................
S2      A :: substantive      T : A-set
        ----
        T↓ :: substantive
        T↓ << A
        (T↓)↑ = T
.............................................................................
S3      A :: substantive
        ----
        A = (A↑)↓
.............................................................................
S4      A :: substantive      B :: substantive      A << B
        ----
        A-set << B-set
.............................................................................

14.3. In order to get to the ordinary notations about sets we have to introduce some typographical abbreviations (for this notion we refer to Section 21):

    a ∈ T           stands for    a : T↓,
    T1 ⊂ T2         stands for    T1↓ << T2↓,
    {x ∈ T | P(x)}  stands for    (Sx:A P(x))↑, where A = T↓.

14.4. We mention a number of derived rules. Hints for derivation are:

S5*: from S1, S4 and T11*;
S6*: from S4, T2, EQ4, S1, T1, T3;
S7*: from S2, T11*, BR5, EQ20*, EQ24*;
S8*: from BR5, S1, EQ14*, EQ10b, S1, EQ2, EQ10b;
S9*: from S4, S3, S2, EQ14*, EQ10a, S2, EQ15*, EQ16*;
S10*: from S1, EQ14*, EQ10a;
S11*: from S2, S8*, EQ23*, EQ24*.
.............................................................................
S5*     A :: substantive      B :: substantive      A << B
        ----
        A↑ : B-set
.............................................................................
S6*     A :: substantive      B :: substantive      C :: substantive      A << C      B << C
        ----
        A-set << B-set :: statement
        A-set = B-set :: statement
        A↑ : B-set :: statement
.............................................................................
S7*     A :: substantive      T1 : A-set      T2 : A-set      T1 = T2
        ----
        x : A (dcl)  *  (x ∈ T1) = (x ∈ T2)
.............................................................................
S8*     A :: substantive      B :: substantive      A = B
        ----
        A-set = B-set
        A↑ = B↑
.............................................................................
S9*     A :: substantive      B :: substantive      C :: substantive      A << C      B << C      A↑ = B↑
        ----
        A = B
.............................................................................
S10*    A :: substantive      T1 : A-set      T2 : A-set
        ----
        T1 = T2 :: statement
.............................................................................
S11*    A :: substantive      T1 : A-set      T2 : A-set      T1 = T2
        ----
        T1↓ = T2↓
.............................................................................
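To see what the ↑/↓ machinery and the abbreviations of Section 14.3 amount to, here is a deliberately naive Python model. It is our own illustration, not MV: both arrows become (essentially) the identity on a set of instances, so the model illustrates the laws rather than proving anything about MV.

```python
# A toy model (ours, not the paper's) of Section 14: a substantive is
# modelled as the Python set of its instances; A↑ ("the set of all
# A's") and T↓ ("element of T") then both amount to that same set,
# and the abbreviations of 14.3 turn into ordinary set operations.

def up(A):            # A↑ : A-set            (rule S1)
    return frozenset(A)

def down(T):          # T↓ :: substantive     (rule S2)
    return frozenset(T)

def member(a, T):     # a ∈ T  stands for  a : T↓
    return a in down(T)

def comprehension(T, P):
    # {x ∈ T | P(x)}  stands for  (Sx:A P(x))↑  with A = T↓
    return frozenset(x for x in down(T) if P(x))

nat = frozenset(range(10))        # a stand-in substantive
evens = comprehension(up(nat), lambda x: x % 2 == 0)
print(down(up(nat)) == nat)       # S3: A = (A↑)↓       → True
print(member(4, evens))           # → True
print(evens <= up(nat))           # ⊂ as T1↓ << T2↓     → True
```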
15. RULES ABOUT PAIRS

15.1. If we have two substantives, “point” and “line”, say, we want to speak of pairs, the first component of which is a point, the second one a line. Such pairs are called “point-line-pairs”. The symbols “-pair”, “proj1”, “proj2”, “the pair” are pMV.

15.2. Fundamental rules about pairs.
.............................................................................
P1      A :: substantive      B :: substantive
        ----
        A-B-pair :: substantive
.............................................................................
P2      a : A      b : B
        ----
        the pair (a, b) : A-B-pair
.............................................................................
P3      A :: substantive      B :: substantive      u : A-B-pair
        ----
        proj1(u) : A
        proj2(u) : B
        the pair (proj1(u), proj2(u)) = u
.............................................................................
P4      a : A      b : B
        ----
        proj1(the pair (a, b)) = a
        proj2(the pair (a, b)) = b
.............................................................................

15.3. We mention two derived rules. Hints for derivation are: P5*: from P3, T11*, P2, EQ22*, T5; P6*: from P5*, T2, EQ4.

.............................................................................
P5*     A << C      B << D
        ----
        A-B-pair << C-D-pair
.............................................................................
P6*     A << E      C << E      B << F      D << F
        ----
        A-B-pair << C-D-pair :: statement
        A-B-pair = C-D-pair :: statement
.............................................................................
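Rules P2-P4 describe exactly the surjective-pairing laws familiar from programming. A sketch in Python — our own rendering, with names mirroring the pMV symbols:

```python
# A sketch (our reading of Section 15, not MV itself): "the pair" is
# the constructor, proj1/proj2 the projections; P3 and P4 then state
# the usual surjective-pairing equations.

def the_pair(a, b):
    return (a, b)

def proj1(u):
    return u[0]

def proj2(u):
    return u[1]

u = the_pair("point P", "line l")
print(proj1(u))                            # P4: proj1(the pair (a, b)) = a
print(the_pair(proj1(u), proj2(u)) == u)   # P3: u is rebuilt from its parts
```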
16. RULES ABOUT FUNCTIONS

16.1. If A and B are substantives we shall introduce a new substantive “mapping of A’s to B’s”. We write this in MV as “A → B”; the symbol → is pMV. The fact that the same arrow is used for implication (Section 17) will give no confusion. Actually the rules for the two are alike (compare F2 and F3 with L2 and L3). For the value of the function f at the point p we shall write in MV “val(f, p)”, instead of the usual f(p). The notations val and λ are pMV.

16.2. Fundamental rules about functions.
.............................................................................
F1      A :: substantive      B :: substantive
        ----
        A → B :: substantive
.............................................................................
F2      A :: substantive      B :: substantive      f : A → B      p : A
        ----
        val(f, p) : B
.............................................................................
F3      A :: substantive      B :: substantive      x : A (dcl)  *  F : B
        ----
        λx:A F : A → B
.............................................................................
F4      A :: substantive      B :: substantive      x : A (dcl)  *  F : B      y : A
        ----
        val(λx:A F, y) = [[x/y]] F
.............................................................................
F5      A :: substantive      B :: substantive      f : A → B      g : A → B      x : A (dcl)  *  val(f, x) = val(g, x)
        ----
        f = g
.............................................................................
16.3. Here are two derived rules. Hints for derivation are: F6*: from F2, F4, F5; F7*: from T11*, EQ12*, F3, F6*, EQ22*, T5.

.............................................................................
F6*     A :: substantive      B :: substantive      f : A → B
        ----
        λx:A val(f, x) = f
.............................................................................
F7*     A :: substantive      B :: substantive      C :: substantive      D :: substantive      A = C      B = D
        ----
        A → B << C → D
.............................................................................

16.4. Many mathematicians would prefer to express the notion of function by means of a graph in a Cartesian product, which has the advantage of reducing the number of basic rules. On the other hand the function concept seems to be such a natural one, and the way we think of functions is usually so far from Cartesian products, that it is attractive to describe the function concept independently.
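Read through a programmer's eyes, F2-F4 are application, abstraction and beta reduction, and F5/F6* are extensionality. A Python sketch — our own; `lam`, `val` and the domain check are invented for the illustration:

```python
# A sketch (not from the paper) of Section 16: λx:A F becomes a
# Python function carrying its domain A, val(f, p) is application,
# and F4's equation val(λx:A F, y) = [[x/y]] F is beta reduction.

def lam(A, F):
    """λx:A F, keeping the domain A so rule F2's 'p : A' is checked."""
    def f(p):
        assert p in A, "val(f, p) requires p : A (rule F2)"
        return F(p)
    return f

def val(f, p):
    return f(p)

nat = range(10)                      # a stand-in for a substantive A
double = lam(nat, lambda x: x + x)   # λx:nat (x + x)
print(val(double, 3))                # F4: [[x/3]] (x + x) = 6
```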
17. RULES ABOUT LOGIC

17.1. The only things to be presented in this section are the rules for implication and for universal and existential quantification. Treatment of negation, conjunction and disjunction can be postponed to the MV book. For this possibility we refer to Section 18.1. The symbol “→” is pMV; the same arrow was used in Section 16 for the notation of mappings.
17.2. We first present the fundamental rules L1, L2, L3.

.............................................................................
L1      P :: statement      Q :: statement
        ----
        P → Q :: statement
.............................................................................
L2      P :: statement      Q :: statement      P → Q
        ----
        P (asm)  *  Q
.............................................................................
L3      P :: statement      Q :: statement      P (asm)  *  Q
        ----
        P → Q
.............................................................................
L4*: from BR5, L3, L5*: from L2, EQ8, L6*: from L4*, EQlOb.
....
.....................................................................
L4*
P :: statement
I . .
P+P
. . . . . . . . ................................................................. P :: statement Q :: statement P‘Q Q’P
L5*
... L6*
..
P=Q
.................................................................. P :: statement Q :: statement
P‘Q Q“P
P=Q
............................................................................. 17.4. We finally present derived rules on universal quantification. Let P be an expression (possibly containing the identifier z). Then we take “ V z : ~P” as P”. typographical abbreviation (see Section 21) for “A = For this new quantifier the following rules can be derived.
........................................................................ L10*
A :: substantive z : A(dc1) * P :: statement
V z : ~P :: statement
.............................................................................
The mathematical vernacular (F.3)
L11’
A :: substantive z : A(dc1) * P :: statement 2 : A(dc1) * P
911
V x :P ~
............................................................................. L12’
A :: substantive z : A(dc1)
*
P :: statement
vx:Ap
“x/aIl p
a : A
17.5. The reader might have expected a treatment of existential quantification too. This can easily be postponed to the MV book. It can be built upon axioms for a statement exist(A), where A is a substantive. It seems to be nicer to postpone that to the book, since it is of the same nature as the axioms for disjunction in propositional calculus.

18. EXAMPLE OF AN MV BOOK

18.1. Having completed our presentation of the rules of MV, we can now start writing books. In the beginning of an MV book we still have to write a number of fundamentals in the form of primitive notions and axioms. These might have been taken as language rules too, but we would rather leave it to the user of the language to have it his own way. Moreover, the language definition is simplified if we shift to the book whatever we can. Nevertheless there are several things in our language rules that have a form that would enable us to write them in the book. As examples we mention EQ2, EQ3, S3, F1, F2, F5, L1, L2, L3 as far as “fundamental” rules are concerned. In the case of S1, F1, L1 we had serious reasons for not shifting them to the book: they were needed for the formulation of further rules that had to stay in the language definition because of their form. For the others there is no other reason than the wish to keep related material together. The fact that some of them play a role in the derivation of derived rules (like EQ2 is used in the derivation of EQ21*) is not a serious reason. The derived rules do not belong to the definition and theory of MV, and might as well be postponed until after a piece of the book has been written. In Section 19.4 we show that some book material might also have been put in the form of rules of the type of Section 17.
18.2. In the following MV book with pointed and rectangular flags (cf. Section 6.4) we have numbers (1), (2), ... on the left. These do not belong to the book, but serve as labels for our comments in Section 19.
a :: statement> :: statement a then b := a
>
a-b b then a
+
-
b :: statement
b
El and b := PN :: statement or b := PN :: statement
4
a or b [ Axiom ]
P
a and b [ Axiom ] b and a
El
a or b [ Axiom ]
1a [ Axiom ] b [ Axiom ] b and a aorb I I c :: statement
>
a -+ ( b or a ) b -+ ( b or a ) b or a contradiction := PN :: statement no := contradiction :: statement a :: statement>
I
The mathematical vernacular (F.3) not(a) := a + no :: statement no a [ Axiom ] ( ( a -+ no) + a) + a [ Axiom ] -+
(a
-
no)
-+
not(not(a))
a
+a
'al
11-
r
not(6) a ot(a) + a d 6 = not(a b = not(a) I a or not(a)
not(b)) b
-+
-+
1 :: substantive> xist(A) := PN :: statement exist(A)
[ Axiom ]
exist(A)] :: statement
not (Vz:Ano)
not(tl,,Ano)I not exist A
I
exist(A) no Vx:Ano no 2t (not(exist(A))) tist(A) st(A) = not(V,,A no) re is exactly one A := exist(S,,AVb,A (b = a ) ) :: statement iere is exactly one A I ie A := P N : A b =the A
A) Z}A :=fx:A
(x = a ) : A-set
e=
X:> {a}A
{ a , b}A :=fz:A (x = a or x = b) : A-set a E {a,
~IA
Ea,
b E {a,b)A
b ) and ~ not (c = a )
I
A :: substantive B :: substantive
IA
union(A, B , f ) :=fb:B 3,:Ab E
Vd(f, U )
: B-Set
:: substantive>
D
P :: statement >
    selected(A, a, b, P) := Sx:A (((x = a) and P) or ((x = b) and not(P))) :: substantive
    there is exactly one selected(A, a, b, P)
    selection(A, a, b, P) := the selected(A, a, b, P) : A
    if P then (selection(A, a, b, P) = a)
    if not(P) then (selection(A, a, b, P) = b)
natural number := PN :: substantive
nat := natural number :: substantive
N := nat↑ : nat-set
1 := PN : nat
suc(n) := successor of n : nat
not(suc(n) = 1) [ Axiom ]
propagate(S) := ∀n:N ((n ∈ S) → (suc(n) ∈ S)) :: statement
if start(S) and propagate(S) then S = N [ Axiom ]
m divides n := ∃k:nat (k · m = n) :: statement
divisor of n := Sk:nat (k divides n) :: substantive
(divisor of n) << nat
prime number := Sp:nat (not(p = 1) and ∀k:divisor of p (k = 1 or k = p)) :: substantive
prime number << nat
not(p · q : prime number) :: statement
point := PN ::substantive line := PN :: substantive
r
A := P N :: statement
B : oint
A and B := ( b goes thr. A ) and ( b goes thr. B )
pzz>
one
(Sc:line
(c goes thr. A and B)) [ Axiom
1
N.G. de Bruijn
916
(147) (148) (149)
P
b : line A , B, C on 6 := ( b goes thr. A and B ) and (6 goes thr. C) :: statement
3A:point 3B:point 3C:point
not(%:line ( A , B ,
c on d ) ) [ Axiom ]
19. COMMENTS ON THE EXAMPLE OF AN MV BOOK

19.1. In Section 18.2 the text from (1) to (59) represents a piece of propositional logic. It is not complete in the sense that it contains everything one might ever need, but it is sufficiently representative for showing the following things.

(i) Some logic can indeed be developed in the book, so we do not need to put all of it in the language rules.

(ii) The logical statements we derive in the book can be applied later as inference rules, in the same way as mathematical theorems are applied. So there is hardly a borderline between logic and mathematics. A much more prominent borderline is the one between language definition and book material.

(iii) The book starts with minimal propositional logic (lines (1) to (32)): just the rules for introduction and elimination of implication, conjunction and disjunction, without any negation. Next there is the introduction of contradiction and negation (lines (33) to (36)) and the "falsum rule" (37) as a logical axiom. This part of the book ((1) to (37)) might be used as a basis for intuitionistic mathematics. It is not very likely, however, that our system would satisfy all intuitionists. They might dislike some of the things we have preferred to put in the language rules, like the possibility to introduce arbitrary substantives, and the rules S1-S4 (which make it possible to talk about the set of all subsets of a set). The extra axiom (38) takes us into classical logic, with the double negation rule (44) and the rule of the excluded third (59) as results. From there on we are in classical logic. It has to be admitted that if classical logic had been our only goal, the number of primitives and axioms might have been reduced considerably. But reducing the number of axioms, important as it might be for metatheory, is completely irrelevant from the practical point of view: later in the book old axioms and primitives are applied in exactly the same way as old theorems and old defined notions.

(iv) The spirit of the treatment is natural deduction. There is no trace of treating logic by means of truth values. Contrary to popular opinion, it
is hard, if possible at all, to explain logical reasoning by means of truth values, unless one cheats by using a priori knowledge of logical reasoning in order to explain what such reasoning is. But nothing is lost by discarding truth tables. Everything that can be done with truth tables, can be done in natural deduction, usually faster, and usually closer to our actual way of thinking.
19.2. We now comment on some of the details of the MV book of Section 18.2. The book starts with implication. Line (3) introduces an alternative notation for the →. Lines (4) to (8) are meant as a little exercise with the introduction of implication, according to rule L3. Note that we have (6) by the fact that we had "b" in an old line in a smaller context (BR5 and BR7). Now (7) follows by L3. Similarly the little theorem (8) follows from (7).

Lines (9) to (11) apply rule L2 for the elimination of implication, the so-called "modus ponens" rule. Actually it makes the rule available as a book theorem: any further case of modus ponens can be seen as an application of this theorem (11). A similar remark holds for many mathematical theorems. We need not always transform results obtained inside a block into results with a shorter context by means of L3 and L11*. We can just leave them quietly in their context, ready for the application of the powerful rule BR6.

Lines (12) and (13) introduce the conjunction and disjunction as primitives. The introduction rule for the conjunction is given in (17) and (18); note that (18) need not be labeled as an axiom since it can be obtained from (17) by substitution: b for a and a for b. The elimination rule for the conjunction is given by (22) and (23), and here both have to be axioms. The introduction rule for the disjunction is expressed by the two axioms (15) and (20). Elimination of disjunction is achieved by the more complex rule (29). It is the basis for "proof by cases": if we want to prove c and we know that a or b, then it suffices to derive c from a and from b separately.

In (33) the primitive notion "contradiction" is introduced. Thus far there was no question of "falsehood", since all validity rules in the language definition are formulated in a positive way. In an MV book there is no reason to say that a thing is true: a man is not more honest because he says that he is honest.
But we do say sometimes that a thing is false, by saying that it implies a contradiction. This is expressed in (36). In (34) we just abbreviated "contradiction" to "no". This has the same function as the typographical abbreviations (Section 21), but we prefer to restrict the use of typographical abbreviations to cases that cannot be expressed in the book. The same remark applies to the notation ¬a instead of not(a): it can be introduced in a definitional line in the MV book if we wish.
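The negation machinery of (33)-(36) can be mirrored directly in a proof assistant. Here is a minimal sketch in Lean 4; the names `no` and `not'` are my renderings of the book's "no" and "not", not MV syntax.

```lean
-- "no" as an opaque statement (the book's primitive "contradiction"),
-- negation defined as implication into it, as in line (35).
opaque no : Prop

def not' (a : Prop) : Prop := a → no

-- (36) read in Lean terms: saying "a is false" amounts to deriving a → no.
example (a : Prop) (h : a → no) : not' a := h
```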
Classical logic is obtained by adding the double negation law (45): it says that if the negation of a is not true, then a is true. In the present text it is a theorem instead of an axiom, derivable from the two axioms (37) and (38). The first one is the intuitionistic "falsum rule", the second one is a special case of Peirce's law ((a → b) → a) → a. We have taken (38) as an axiom since it gives rise to interesting exercises in natural deduction. Let us assume that we did not have the rules (12) to (32) in our book. Then we can still say that the rule (38) (holding for all statements a) has the effect of a disjunction, viz. the disjunction of b and not(b) (for all b). Line (56) shows that a can be proved by cases, just as if "b or not(b)" had been available.

In (45) we have the double negation law: the negation of the negation of a statement implies that statement itself. If we would have taken the double negation rule (45) as an axiom, we might have derived both (37) and (38) as theorems. In a certain sense (37) and (38) form an orthogonal decomposition of (45). This is to be interpreted with the following notion of orthogonality: statements p and q are called orthogonal if (p → q) → q and (q → p) → p. In a way that means that q and p do not give any information about each other: if we can prove q under the assumption p then we can prove q all by itself, and if we can prove p under the assumption q then we can prove p all by itself. And indeed, without using the book from (1) to (32) it can be shown directly after line (35) that no → a and ((a → no) → a) → a are orthogonal.

A few lines about the derivation of (45). By modus ponens we have

    not(not(a)) (ass), not(a) (ass)  ⊢  no,

so the falsum rule leads to

    not(not(a)) (ass), not(a) (ass)  ⊢  a.

Therefore

    not(not(a)) (ass)  ⊢  (a → no) → a,

so applying (38) with modus ponens we get

    not(not(a)) (ass)  ⊢  a.
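This derivation can be checked mechanically. Here is a sketch in Lean 4, with (37) and (38) taken as hypotheses rather than axioms; the names are mine, not MV syntax.

```lean
-- Derivation of the double negation law (45) from the falsum rule (37) and
-- the Peirce instance (38), following the text step by step.
theorem doubleNeg (no : Prop)
    (falsum : ∀ a : Prop, no → a)                 -- (37)
    (peirce : ∀ a : Prop, ((a → no) → a) → a)     -- (38)
    (a : Prop) : ((a → no) → no) → a := by
  intro nna
  -- discharging not(a) gives (a → no) → a; then (38) plus modus ponens give a
  exact peirce a (fun na => falsum a (nna na))
```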
Line (59) is the so-called rule of the excluded third. Abbreviating "a or not(a)" to c, we get it from (56), with a replaced by c and b by a. Notice that both a and not(a) lead to c, because of (15) and (20). The basic rules BR1-BR7 and the equality rules EQ1-EQ25 soon become second nature to us, and therefore we shall hardly notice their application any more.
19.3. At this point we note that it is hard to give a satisfactory description of the word "proof" as an smMV term. Looking ahead from a line (i) to a line (j) (where i < j) one might say that the text between (i) and (j) is a proof of (j). Others might only count the material between (i) and (j) as far as it is actually used for the derivation of (j). More important is the question whether explanations of the type given in Section 19.2 belong to the proof. If the steps in the MV book are so small that each line requires just a single application of a single rule, then most people would call it a very detailed proof, even if it is not mentioned in the text what rules were applied. We can omit lines here and there such that most people will be able to find intermediate steps themselves by mental exercise. In the case of (24) most people will be able to find the missing link in a split second. In the case of (57) there is a sequence of missing links, and we will feel the need for scrap paper, even if we are experienced in this kind of natural deduction. We can go quite far in this respect. It was pointed out already in Section 1.9 that the rules of MV allow us to omit intermediate steps, even to such an extent that readers may find that the essence of the proof is lacking. This is caused by the fact that validity in MV is defined recursively. Something can be valid because of the existence of a sequence of intermediate steps; it is not required that these steps have actually been written down in the book.

19.4. Some of the logical parts of the text of Section 18.2 give us the same rights as if we had added a number of further rules in Section 17. We shall display them in that form here, labeled with two asterisks instead of one, since they are not derived from the fundamental rules of Sections 9-17 but from material of the particular book of Section 18.
.............................................................................
L13**    P :: statement    Q :: statement    P    Q
         ⊢  P and Q
.............................................................................
L14**    P :: statement    Q :: statement    P and Q
         ⊢  P        ⊢  Q
.............................................................................
L15**    P :: statement    Q :: statement    P
         ⊢  P or Q
.............................................................................
L16**    P :: statement    Q :: statement    Q
         ⊢  P or Q
.............................................................................
L19**    P :: statement    Q :: statement    R :: statement    P → R    Q → R    P or Q
         ⊢  R
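In Lean 4 these five derived rules correspond exactly to the introduction and elimination principles of And and Or; a sketch (the rule names are mine):

```lean
-- The derived rules L13**-L19** as ordinary Lean lemmas.
theorem rule_L13 (P Q : Prop) (hp : P) (hq : Q) : P ∧ Q := ⟨hp, hq⟩
theorem rule_L14a (P Q : Prop) (h : P ∧ Q) : P := h.1
theorem rule_L14b (P Q : Prop) (h : P ∧ Q) : Q := h.2
theorem rule_L15 (P Q : Prop) (hp : P) : P ∨ Q := Or.inl hp
theorem rule_L16 (P Q : Prop) (hq : Q) : P ∨ Q := Or.inr hq
-- L19** is "proof by cases" on a disjunction.
theorem rule_L19 (P Q R : Prop) (hpr : P → R) (hqr : Q → R) (h : P ∨ Q) : R :=
  h.elim hpr hqr
```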
.............................................................................

19.5. In (60)-(97) we show a few things about sets, sufficiently representative for showing how one should go on. Lines (60)-(67) introduce the notion of existence on a negation-free basis. Once we have classical logic, we can do more. In (64)-(80) we show the equivalence of "exist(A)" and "not(∀_{x:A} no)", which is the basis for expressing existential quantifiers in terms of universal ones, and vice versa. We mention that (69) is obtained by application of (67) with c replaced by no; with this value of c assumption (66) is valid because of (68). Next, (70) rests on L3, (69) and, of course, (36). One gets (74) from (63), replacing a by x, and (75) from L2, using (72) and (74). Then (76) follows from L11*, using (75), next (77) from L2 with (71), (76), and (78) from L3 and (77). Finally we get (79) from (78) and (45), and (80) from (70), (79) by virtue of L5*.

In (83) we introduce (as a primitive notion) the definite article "the" in front of a substantive, if that substantive has the uniqueness property assumed in (82). In (85) we assert that if the uniqueness property holds then every A equals "the A". A detailed proof of (85) can be given as follows. Abbreviate K(A) := S_{a:A} ∀_{b:A}(b = a). Then (82) says exist(K(A)). We want to apply (67), replacing A by K(A) and c by "b = the A". To that end we have to satisfy what the condition (66) amounts to in this case, i.e., ∀_{a:K(A)}(b = the A). In order to prove the latter statement, we extend the context by means of a : K(A) (decl). In the extended context we have to show b = the A. In this context we have a : K(A), and therefore ∀_{d:A}(d = a). Using L12* we get both b = a and the A = a, so by EQ21* and EQ22* we infer b = the A. Note that this derivation uses no classical logic. It is entirely negation-free.

In (87) we define the singleton {a}_A as an A-set. In usual untyped set theory the subscript A is superfluous, but here it is not.
Nevertheless, having to write the subscript is a formal duty only, for if the same a also satisfies a : B then {a}_A = {a}_B. We know (by metatheory) that the typings a : A and a : B can hold simultaneously only if A and B are sub-substantives of a common K (which might be the archetype of a), and therefore (90) helps us out. Note that in (87) and (92) the typographical abbreviation τ_{x:A} is used (see (20.7.2)). In the context of (87) we can define the empty set too:

    A :: substantive (decl)  ⊢  emptyset_A := τ_{x:A} no : A-set .
Note that we do not have a universal empty set: every archetype has one of its own. In (90) we wanted to express that if A and C are substantives with C

as the thing denoted by {a_1, . . . , a_n}. In this case n is a variable natural number in the metalanguage smMV. The two ways (i) and (ii) can be connected in the MV book for every single value of n, but not for all n simultaneously. We note that the length of the derivation (i.e. the number of applications of language rules) is more or less proportional to n. In (101) we define the union of an indexed collection of sets.
Lines (102)-(110) present the “if-then-else” selector, which can be used as a basis for definition of functions by cases. We omit indications of the proofs. Note that (108) uses the “the” of (83).
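The "if-then-else" selector just mentioned can be sketched in Lean 4; `selection` and the two equations are my renderings (using classical logic to decide P), not de Bruijn's own primitives:

```lean
-- Definition by cases: selection a b P is a when P holds and b when it fails,
-- mirroring lines (102)-(110).
open Classical in
noncomputable def selection {A : Type} (a b : A) (P : Prop) : A :=
  if P then a else b

theorem selection_pos {A : Type} (a b : A) (P : Prop) (h : P) :
    selection a b P = a := by
  unfold selection; exact if_pos h

theorem selection_neg {A : Type} (a b : A) (P : Prop) (h : ¬P) :
    selection a b P = b := by
  unfold selection; exact if_neg h
```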
19.6. The text from (111) to (135) deals with the natural number system. The Peano axioms are (112), (113), (114), (116), (118), (124). In (127) we have presented the notion of the product of two natural numbers by means of a PN. This can of course be avoided (the product can be defined), but the text would become lengthy, and it is our present purpose to get rapidly to divisibility. In (129) we define a substantive "divisor of n", in (130) it is noted that it is a sub-substantive of "nat", and (131) gives the definition of "prime number". In (135) we form the statement that the product of two primes is not a prime. Note that this contains an example of a typing (p · q : prime number) playing the role of a statement, which is allowed by virtue of (132) since p · q : nat. It would be wrong to claim "not (p · q : prime number)" as a theorem here: a proof would require more information about products than what is expressed in (127).

19.7. The text from (136) to (149) is based on the beginning of Hilbert's axiomatization of geometry. Hilbert starts by saying "We conceive three different systems of things: the things of the first system are called "points", those of the second system are called "lines", those of the third system "planes"." Hilbert does not make any use of his "systems" as systems: what he actually does is just handling the words "point", "line", "plane" as new substantives. Therefore we interpreted his words in MV by taking them as PN's. As to (143) one might hesitate. Does this really require a mathematical definition or is it just one of the linguistic transformations we want to admit anyway? We refer to Section 22 for such matters.
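The divisibility fragment of 19.6 translates readily; a sketch in Lean 4, with the book's "nat" read as Lean's Nat and the names `divides` and `prime` mine:

```lean
-- "m divides n" and "prime number" as in lines (128)-(131).
def divides (m n : Nat) : Prop := ∃ k : Nat, k * m = n

def prime (p : Nat) : Prop :=
  ¬(p = 1) ∧ ∀ k : Nat, divides k p → k = 1 ∨ k = p

-- A divisibility statement is proved by exhibiting the witness k.
example : divides 3 12 := ⟨4, rfl⟩
```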
20. BINDERS

20.1. We have not paid much attention to quantification by means of bound variables and quantifiers. Formal treatment of quantification in lambda calculus is well known, of course. One of the hard things in quantification is the treatment of the names of bound variables, which have to be refreshed occasionally. In this section we do not go into these standard matters of non-typed lambda calculus. Instead, we shall indicate a number of points in which our present proposal of MV differs from more usual ways to treat quantification.

20.2. First we mention that our MV is a typed language, and that, accordingly, the bound variables in the quantifications run over a certain range. The range
can be indicated by a substantive A, using a typing "x : A", or by a set S, and then we use x ∈ S. By 14.3 we can easily pass from one to the other, and so we restrict our discussions to the case "x : A".
20.3. In OMV the "value" of a result of quantification is either a statement or an object. Examples of the first kind are

    ∀_{n:natural number} P(n),        ∃_{n:natural number} P(n),

where P(n) is a statement. Examples of the second kind are

    Σ_{n=1}^{∞} f(n),        ⋃_{n∈S} V(n),        {x ∈ S | P(x)},
where f(n) is a number, V(n) is a set, P(x) a statement.

20.4. In MV, where substantives are taken seriously, we can also admit quantification where the value of the result is a substantive. One of the possibilities is given by the binder S, introduced in rule T6, Section 12.2. Its meaning is shown in the following example. We consider "quadrilateral with the property that its diagonals are perpendicular to each other" as a new substantive. It can be written by means of a quantifier as

    S_{x:quadrilateral}(the diagonals of x are perpendicular) .
20.5. The following example demonstrates a second case where quantification leads to a new substantive. We have the name "square of p" if p is a prime number. We want to despecify the variable p, and get to the substantive "square of a prime", or "prime square". For this we can use the binder "despo" and write

    despo_{p:prime}(the square of p)

(despo is short for "despecified object"). This binder can be considered as a typographical abbreviation (see Section 21.2).
20.6. Many quantifiers can be expressed once we have the functional binder (Church's λ). If A and B are substantives, if F(...x...) is an expression containing x at one or more places, with the property that x : A implies F(...x...) : B, then λ_{x:A} F(...x...) is the function that attaches, for each x, the value F(...x...) to x. Example:
(20.6.1)        λ_{n:positive integer}((n² + n)⁻²) .

Some other quantifiers can be expressed in terms of this one. For example, the sum

(20.6.2)        Σ_{n:positive integer} (n² + n)⁻²

might be written as

(20.6.3)        sum(λ_{n:positive integer}((n² + n)⁻²)) .
This means that the quantification (20.6.2) can be obtained by application of the unary operator "sum" to the function (20.6.1). In contrast to (20.6.2), all the binding in (20.6.3) is in the function, and nothing of it in the operation.

20.7. The binder S of rule T6, Section 12.2, plays a central role similar to that of Church's λ. In Section 20.6 we had the unary operator "sum", acting on an expression quantified by a λ; in the present case we take as examples the unary operators "exist" (from Section 18.2, formula (61)) and "↑" (from Section 14.2). The operator "exist" maps substantives into statements, the "↑" maps substantives into names. If P is an expression containing x, such that for all x : A we have P :: statement, then we can form a new statement ∃_{x:A} P(x) by agreeing that

(20.7.1)        ∃_{x:A} P(x) = exist(S_{x:A} P(x))

and the new name

(20.7.2)        τ_{x:A} P(x) = (S_{x:A} P(x))↑ .

This (20.7.2) is usually written in OMV as {x : A | P(x)}. We get the standard rules for introduction and elimination of the quantifiers ∃_{x:A} and τ_{x:A} from T6 (Section 12.2) in combination with the rules for the unary operator "exist" (Section 18.2, (60)-(67)) and the rules for the unary operator ↑ (Section 14.2).

At this point it should be noted that we have not provided facilities in MV to deal explicitly with predicates. If A is a substantive then we cannot express in MV that something is a predicate over A. Instead, we have to use the rules of Section 14. A predicate is usually seen as a mapping from objects to statements. As an example, we consider the property of a natural number to be > 5. One might suggest λ_{x:nat}(x > 5) that sends any natural number x into the statement x > 5. It seems attractive, but we have not incorporated this into MV. It would not quite fit into our system to attach a type to such a thing, and therefore it would not help us to create arbitrary predicates.

20.8.
The set-forming rules of Section 14 help us out. Instead of discussing a predicate in MV, we discuss the set of all objects satisfying that predicate. So instead of that λ_{x:nat}(x > 5) we talk about the set τ_{x:A} P(x) (cf. (20.7.2)). And instead of taking arbitrary predicates we take an arbitrary A-set, as in formula (88) of Section 18.2.
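The mechanism of Sections 20.6-20.8 — a binder-free unary operator applied to a λ-abstraction — can be sketched in Lean 4 as follows; `sum` (a finite version over 1..N) and `exist` are illustrative stand-ins of mine, not de Bruijn's operators:

```lean
-- A binder-free operator applied to a function: all binding is in the λ.
def sum (N : Nat) (f : Nat → Nat) : Nat :=
  (List.range N).foldl (fun acc n => acc + f (n + 1)) 0

-- (20.6.3)-style usage: the quantification is the operator applied to a λ.
#eval sum 3 (fun n => n * n + n)   -- (1²+1) + (2²+2) + (3²+3) = 20

-- (20.7.1)-style: an existential quantifier as a unary operator on predicates.
def exist {A : Type} (P : A → Prop) : Prop := ∃ x, P x

example : exist (fun n : Nat => n * n = 9) := ⟨3, rfl⟩
```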
20.9. MV does not allow the introduction of new quantifiers in the book. The reason is that the language is not equipped with means for saying "an expression containing x". We have only two basic quantifiers in the language definition, viz. the substantive binder S and the functional binder λ. All other binders have to be expressed in terms of these two in the way indicated in Section 20.6. If we insist on using notations like (20.6.2), (20.7.1), (20.7.2), we have to treat them as typographical abbreviations, to be discussed in Section 21.
21. TYPOGRAPHICAL ABBREVIATIONS

21.1. Of course we like to use OMV notations in MV as much as possible. We can do this in an informal way by using what we shall call typographical abbreviations. We can agree that if we write (20.6.2) in an MV book, this is just an informal abbreviation for (20.6.3). The agreement to use that abbreviation cannot be made in the book itself, but has to be written in the margin somehow. A similar remark applies to (20.7.1) and (20.7.2).

21.2. As a further example we take the "despo" of Section 20.5. If A and B are substantives, and if P(...x...) is an expression containing x such that for all x : A we have P(...x...) : B, then despo_{x:A} P(...x...) can be considered as typographical abbreviation of
21.3. The words "typographical abbreviations" indicate unofficial abbreviations, usually (but not always) invented for the sake of typography. When reading a text that uses such abbreviations we first have to translate them into what they stand for, and only after that translation we are assumed to be able to understand the text as an MV book. Typographical abbreviations are superficial, when compared to the abbreviations we introduce in the MV book itself by means of definitional lines (see
Section 7.6). Definition and theory of the language have to take these definitional lines as essential parts of the language, but never deal with typographical abbreviations. As an example of a typographical abbreviation outside the world of quantifiers we quote the notation {1, . . . , n} for the set of integers from 1 to n. We refer to the discussion in Section 19.5.
21.4. When studying mathematical notation, we discover many other cases of abbreviations that we would prefer to consider as informal. Some examples are:

(i) We write a = b < c = d instead of the conjunction of a = b, b < c, c = d.
(ii) We write a(1), a(2), . . . to denote an infinite sequence.

(iii) We have baroque notations for 17th and 18th century mathematics, on integrals, derivatives, differential equations, etc.

(iv) We have many unwritten conventions by which we omit things that, strictly speaking, would be necessary for parsing. These may be local conventions in a certain area of mathematics. In trigonometry one interprets sin x cos y as the product (sin x)(cos y); the alternative sin(x cos y) occurs in the theory of Bessel functions but not in trigonometry.
21.5. Many formulas in OMV are written in a form that does not fit into a single line. A simple example is the old notation for quotients by means of a horizontal bar. If we think of MV having to be processed by a computer it seems that such notations have to be avoided, but it is not a matter of principle. Formats which do not have the form of a string of characters might be admitted in MV just as well. As a simple example we note that sometimes the value that a function takes at the point n is denoted by using the letter n as a subscript, like in c_n. This should not be confused with the habit of using c_1, c_2, . . . as new identifiers. The difference between the two is of the same type as the difference between the two ways to look at a_1, . . . , a_n, discussed in Section 19.5.

21.6. There will be many cases where one is easily tempted to interpret the rigid rules of MV with a little grain of salt. At least, as long as the texts are intended to be read by human beings only. If we present them to machines, we have to be much more careful. It was already mentioned in Section 5.3 that we often cheat with the rules that require identifiers to be fresh. But it is not necessary to be so rigid in
cases of variables introduced by means of declarations. It suffices to have them different from all previous variables of that same context and different from all previously introduced constants (either by definition or by PN). But the rule that all introduced constants should have different identifiers will often be felt as a burden: in mathematics constants occurring in distinct subjects will often have the same name. When writing for human readers, there does not seem to be any harm. When writing for machines, we should do something like the paragraph system that was used in Automath, where identifiers are always interpreted in the sense given to them in the local paragraph. An old constant with the same name can only be referred to if we add some kind of paragraph indication.

Another case for grains of salt was already mentioned in Section 5.5. The condition that all context variables occur in the name of a defined notion is quite different from habits in OMV. In Automath there is a systematic way to weaken this rule: in parentheses expressions like F(A_1, . . . , A_n) we may just omit the first k entries A_1, . . . , A_k if they are identical to the first k variables of the context. In particular that has the effect that in a definitional line in a context with variables x_1, . . . , x_n the parametrized constant on the left of the sign ":=" can be written as a single identifier (cf. Section 5.6). But it should be noticed that in MV and in OMV not all parametrized constants have the form of such parentheses expressions. That makes it harder to formulate what liberties can be taken in the matter of omission of a number of context variables.
22. GETTING CLOSER TO NATURAL LANGUAGE

22.1. In the examples of Section 18.2 we inserted a small piece of what one might call natural language. On a small scale it showed how mathematics can be described in words and sentences, not just in symbols and formulas. If we want to insert more natural language, or even if we want our MV book to look like an ordinary mathematics book, we can do this on three levels:

(i) the primary MV level (pMV),

(ii) the secondary MV level (sMV),

(iii) the level of typographical abbreviations.

22.2. As to primary MV we can discuss how some of the basic notations on typing, on context and on quantification are to be expressed in terms of words. In several cases it is not quite clear how to do this: there may arise ambiguities,
in particular just those which the MV notations intended to avoid. Right now we do not try to suggest how to solve all these difficulties. We hardly go beyond a first orientation.

22.3. The typing "p : prime number" can of course be pronounced "p is a prime number". A definitional line body "p := Q : prime number" can be pronounced as "denote the prime number Q by p" (in this case Q is an expression and p is an identifier). The declaration "p : prime number" (the case of a pointed flag) is "let p be a prime". The assumption "p : prime number" can be "assume that p is a prime number" (the case of a rectangular flag). The case of "assume..." is essentially different from "let...": in the case of "let..." the p is a new variable, in the case of "assume..." it is a variable or a constant that was introduced earlier in the book (cf. the assumption (89) in Section 18.2). Unfortunately there is not a very strong feeling in English OMV that "let..." is to be restricted to the case of declaration. One might consider replacing the "let..." by something that cannot be misinterpreted, like "take any prime number p".

In OMV there is no clear way to tell where the flagstaffs end. Quite often it is suggested by the typographical layout, mainly by the subdivision of the text into sentences, paragraphs, sub-sections, sections and chapters. There is certainly a need for explicit rules for this. Right now there are just some unwritten conventions. One might express a rule like this: If an assumption is a part of a sentence, then it does not reach beyond that sentence. If it is the first sentence of a paragraph, and not a paragraph of its own, then it does not reach beyond its paragraph. Similarly, if a sequence of assumptions forms the first sentence of a paragraph, and not a whole paragraph, then these assumptions are intended just for this paragraph. The rules for declarations are the same as those for assumptions. As an example we quote
(22.3.1)        If x is a real number then sin x < 2 .
It is considered to be bad manners to refer to x in the next sentence. Actually one may doubt what (22.3.1) means. It can be (i) a block, opened by the declaration "x : real number", (ii) the universal statement ∀_{x:real number} sin x < 2.
Fortunately the two are equivalent by virtue of the language rules of MV. But sometimes (22.3.1) means an implication: just imagine that the sentence (22.3.1) is preceded by “let x be any complex solution of the equation 4 cos x - 1 = 0”. In that case we can consider (22.3.1) as an implication, but also as a block (starting with the assumption “let x be a real number”). Fortunately again, the two possibilities are equivalent because of the language rules of MV.
22.4. Expressing quantification in natural language is reasonably established: "for all points P ...", "for every point P ...", "there exists a point P such that ...", are sufficiently clear, also if the quantifier "for all points P" is shifted to the end of the sentence: a machine should be able to translate them into ∀'s and ∃'s. Nevertheless there is something wrong from the linguistic point of view: the name P does not play a role any ordinary word could ever play in a sentence. Writing "for every point, P say" does not make it any better. Natural language simply does not have anything corresponding to dummy variables! We (and the linguists) just have to learn to live with the strange P in "for every point P".

22.5. In our natural languages it is often possible to express quantification without the use of a dummy. The sentence "all dogs sleep" is equivalent to "for every dog P, P sleeps". This can be done because of the fact that P is the subject of the sentence "P sleeps". In other cases it can be done by means of pronouns. "There is a dog whose master trims its hair every day" will (although it is not very elegant English) mean: "there is a dog d such that the master of d trims d's hair every day". Correct interpretation of such sentences may depend on subtleties (see what happens if "d's hair" is replaced by "his hair"), in particular in cases of more than one quantification in a single sentence.

Special care should be taken with the words we use when applying the rule for elimination of the existence quantifier, mentioned at the end of Section 20.7. One usually says: "We know the existence of an x : A such that P(x). Take such an x". Then one starts deriving a statement Q (which does not contain x). That is, one derives
*
Q
and then the existence of an z : A such that P ( z ) guarantees that Q holds outside the block too. It would be nice to have a standard way of saying these things in a world where it is not customary to indicate the end of a block. A suggestion: open the argument with “We may assume that we have an z with the property P(z)”. The word “assume” stresses the fact that the life of z is short! 22.6. The situation with the substantive binder (see T6 in Section 12) is similar to the situation with logical quantifiers. The example in Section 20.4, viz. “quadrilateral with the property that its diagonals are orthogonal to each other” again shows that names of dummies can sometimes be avoided by means of a pronoun (in this case “its”). Another example is “prime number dividing nr’l where the gerund “dividing n” is derived from the statement “ p divides n”.
N.G. de Bruijn

22.7. It is not hard to get good translations for our formalism describing sets. If A is a substantive, we can pronounce “A↑” as “the set of all A’s”. If S is an A-set then the substantive “S↓” can be pronounced as “element of S”. So s : S↓ is to be pronounced as “s is an element of S”, and this amounts to the same thing as “s ∈ S”.

22.8. Now coming to secondary MV, we of course depend on the special book we prefer to write. If it contains the material of Section 18.2, we first get to the discussion of natural language for negation. If P is a statement, the statement not(P) can be written as “it is not true that P”, and actually such a thing could be written as a book definition like this:
P :: statement (dcl) * it is not true that P := not(P) :: statement .
Nothing much is gained by this, of course (and some people might object that “not true” has too much of a metalinguistic flavor), but it seems to be the only construction that works the same way in all possible cases. In many sentences the negation can be worked into the statement P, usually by putting “does not” in front of the main verb, but that is impossible if P has the form of a quantified statement or of an implication. 22.9. Getting deeper into an MV book, we are no longer dealing with fundamentals, and this may imply that we are mainly sticking to the kind of sentences we have invented ourselves in definitions, especially since it is so easy for us to introduce new terminology in the book. We discuss the example of line (140) in Section 18.2. In natural language we like to have some synonyms available for a phrase like “b goes through A”, and there is no objection against codifying some of these in the M V book. We might insert directly after (140) a definition like
A lies on b := b goes through A :: statement .
And we may introduce a new substantive “point of b” as “point lying on b” (the substantive binder with suppressed dummy, cf. Section 22.5). And from now on we have another way to say that “b goes through A” by means of the typing statement “A is a point of b”.
22.10. Natural language can have productive rules for getting synonyms. The rules we discuss here are connected with possession, with conjunction and with disjunction. Possession (to have or not to have) seems to be overwhelmingly important for human beings, and therefore they have made facilities for expressing it in many different ways, so as to fit in every linguistic situation. In the name “the derivative of g” the “of” suggests that g possesses something, and we rapidly accept that “f is the derivative of g” and “g has f as its derivative” are synonymous. Therefore we feel that in the MV book it is sufficient to define “the derivative of g” and that we can take the other constructions as implicitly defined by it. The phrases for describing possession of course depend on the question whether somebody can possess more than one thing of the kind mentioned, and also on the question whether he is or is not the sole proprietor. For example: “A is a point of b” turns into “b has A as one of its points”, and the existence statement “exist(point of b)” is synonymous to “b has at least one point”. In conjunctive statements synonyms can lead to shorter sentences. The statement “A lies on b and B lies on b” is synonymous to “A and B lie on b”, and similarly “A lies on b and A lies on c” is synonymous to “A lies on b and c”. Still the matter is tricky: “a implies c and b implies c” cannot be contracted to “a and b imply c”. There are similar contractions for disjunction. “A lies on b or on c” is synonymous to “A lies on b or A lies on c”. In general one will go as far as one can with such contractions as long as there is no danger of ambiguities. Try “P is the only point of S and P is the only point of T”, “P is a point of the limit set of S and P is a point of the limit set of T”. Quite often synonym production rules are not safe enough for mathematics. “I hear John and Mary” can be considered to be synonymous to “I hear Mary and John”. But in “the Cartesian product of X and Y” we cannot just interchange the X and the Y. And a rule that “not for all points P we have Q” is synonymous to “there is a point P such that not(Q)” is a thing we want to be able to discuss as a logical rule; we do not want to be forced to accept it for the sake of linguistics.
22.11. From Section 22.10 we may conclude that it is not easy at all to decide about synonym production rules. Forbidding them will make our language inelegant, admitting them makes it unsafe. Obviously the best thing we can do is this: as long as we do not fully understand the synonym production rules, we refuse to consider them as a part of “official” MV. We can put them on the list of the “grains of salt”, if we wish. In many cases we can take the effects of synonym production rules seriously without proclaiming them as rules. An example of this was presented in Section 18, line (143), where a synonym was provided by a book definition. Many things can be developed that way, although it may become monotonous in the long run.

22.12. It requires quite some mathematical experience to understand sentences
involving indefinite articles (“a” and “an”). Quite often such sentences are ambiguous, and their interpretation may depend on whether they are labeled as “definition” or as “theorem”. Example:

(22.12.1) a rhombus is a quadrilateral with property P

has three interpretations, viz.

(22.12.2) rhombus << quadrilateral with property P
(22.12.3) rhombus := quadrilateral with property P
(22.12.4) rhombus = quadrilateral with property P .
If (22.12.1) is labeled with “theorem” or “lemma”, we choose (22.12.2), but if the label is “definition”, we choose (22.12.3). If instead of (22.12.1) we had had “rhombuses are quadrilaterals with property P”, with a theorem-like label, we might have hesitated between (22.12.2) and (22.12.4). If we really want to express (22.12.4) we might prefer “A quadrilateral is a rhombus if and only if it has property P”. We may test our abilities for understanding mathematical sentences by trying cases where some of the words have been replaced by symbols that conceal the meaning of the words. As an example we consider

Definition. An A of a B is a C whose D intersects that B.

This we rapidly interpret as

x : B (dcl) * A of x := S_{y:C} ((the D of y) intersects x) :: substantive .
A second example is

Definition. We say that the A of a B hibernates if it is skew to that B.

This we interpret as

x : B (dcl) * the A of x hibernates := the A of x is skew to x :: statement ,

at least if the terms “the A of x” and “is skew to” have been defined in the book.
The mathematical vernacular (F.3)
There is an objection against the choice of the phrase “the A of x hibernates” in the above definition. It asks for trouble with parsing. Let us be more specific by taking as an example: “We say that the orthocenter of triangle d hibernates if it lies inside d”. Now take two triangles d and e with common orthocenter P. We might argue: if P lies inside d then P hibernates, and therefore P lies inside e. This is obviously false! The thing is that there never has been a definition explaining what it means that a point hibernates. Nevertheless this is suggested by the phrase “We say that the orthocenter of d hibernates”, since “the orthocenter of d” is the name of a point. Therefore it is in vain to appeal to EQ10a (Section 13) in order to say that in this phrase “the orthocenter of d” may be replaced by “the orthocenter of e”. We cannot say that
t : point (dcl) * t hibernates :: statement .
A way to avoid this inconvenience is to define “the orthocenter of triangle d hibernates with respect to d”. Or, still simpler, we define hibernation of a point with respect to a triangle, and then apply it to the orthocenter. We note that the definite article “the” is used in two different ways. In a case like “the orthocenter of d” it originates from a line in the book where in a context “d : triangle (dcl)” we have defined the name “the orthocenter of d”. But a case like “the positive root of f” has to be parsed as a substantive (“positive root of f”) preceded by the “the” of Section 18.2 (83), which requires a proof of the uniqueness statement (82). In the case of “the orthocenter of d” there has been no previous introduction of a substantive “orthocenter of d”, and therefore it cannot be parsed as a substantive preceded by a definite article.

22.13. A line in an MV book can be labeled “theorem” or “lemma” if the line body is a statement (case (vi) of BR9 (Section 10)). That is, if it is considered “important” enough for such a stately name. Otherwise it can just be considered as a stepping stone in a proof, or as a minor remark. Lines with a body of the form (iii), (iv) or (vii) of BR9 can be labeled “definition”, although very often (in particular in case (iii)) we prefer to call them abbreviations. These cases (iii), (iv), (vii) can be called “statement definition”, “substantive definition” and “name definition”, respectively, and each case has its own phraseology in OMV. Very many definitions in OMV are definitions of adjectives; we shall discuss the use of adjectives in Section 22.14.

22.14. In our grammar of MV we have not discussed adjectives thus far, but they are easily incorporated. We should always bear in mind that an adjective is to be defined with respect to a substantive. Let us take the substantive
“triangle” as an example. If

x : triangle (dcl) * P :: statement
is valid, then P (which may be an expression containing x) expresses a property of x. In natural language such a property may be expressed by an adjective. Let us choose the word “blue” for it. Then we can consider the new substantive “blue triangle” and the new statement “x is blue”. This statement can be considered as an abbreviation for “x is a blue triangle”. The substantive can be defined as

(22.14.1) blue triangle := S_{x:triangle} P :: substantive .

The statement “x is blue” can be introduced by

(22.14.2) x : triangle (dcl) * x is blue := (x : blue triangle) :: statement .
We can agree that we express both (22.14.1) and (22.14.2) by the single line

Definition. A triangle x is called blue when P.

A nice way to write this with a new binder “Adj” is as follows:

(22.14.3) blue := Adj_{x:triangle} P .
Working with adjectives has some other nice features. One is that if an adjective like “yellow” is defined with respect to the substantive A, and if B << A, then we can speak of yellow B’s too. And if we have defined both “yellow” and “round” on the substantive A, then we can use the new substantive “yellow round A”, and that is synonymous with “round yellow A”. Misunderstandings can arise if the adjective “round” was not defined with respect to the substantive A but with respect to the substantive “yellow A”.
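The intersection reading of adjectives can be illustrated by a small Python sketch. All names and the example properties below are hypothetical, not from the MV book: substantives are modeled as predicates over a universe, and an adjective defined with respect to a substantive simply refines it.

```python
# Hypothetical sketch: substantives as predicates, adjectives as
# predicates defined with respect to a substantive. "yellow" and
# "round" below are invented example properties.

A = lambda x: isinstance(x, int)        # the substantive A

yellow = lambda x: x % 2 == 0           # "yellow", defined w.r.t. A
round_ = lambda x: x % 3 == 0           # "round",  defined w.r.t. A

def refine(subst, adj):
    """The substantive 'adj subst': subst-things that are also adj."""
    return lambda x: subst(x) and adj(x)

yellow_round_A = refine(refine(A, round_), yellow)
round_yellow_A = refine(refine(A, yellow), round_)

# Both orders pick out the same things, as the text claims:
sample = range(-20, 20)
assert [x for x in sample if yellow_round_A(x)] == \
       [x for x in sample if round_yellow_A(x)]
```

Defining “round” only on the substantive “yellow A” would instead give a predicate whose domain already presupposes yellowness, which is exactly the source of the misunderstandings mentioned above.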
23. REMARKS ON PARSING

The situation about parsing is like this. What we really want to say in a sentence or a formula has the structure of a tree (to be more precise, of a planted planar tree). Such a tree is a finite directed graph where (i) every point has just one incoming edge (except for a single point, the root, which has none), (ii) at each point a linear order of the outgoing edges is given (the order from left to right), (iii) every point can be reached by means of a finite path that starts at the root. Finally we mention that to the points of the tree there may be attached letters or words, i.e., identifiers.
The difficulty is that we want to put such tree-shaped information in a linear form, in order to be expressed in speech or writing, and that we want this linearized form to reveal the original tree structure. Mathematicians have solved this problem centuries ago, coding their trees in linearized form with the aid of sets of nested pairs of parentheses. In natural languages, however, this has never been done. It is quite probable that parsing trouble had its influence on our natural languages before writing was invented, in particular in the shaping of inflexions and conjugations. Quite some effort in learning languages, and in studying their structure, is connected with parsing and with the constructs people invented for the benefit of parsability. In spite of the fact that parsing is an immense problem in the study of natural languages, we can be very casual about it when discussing MV. As long as we are able to create a language it is no serious problem at all. We can just be generous with the use of parentheses or any other means for describing the tree structure in a linear format. Admittedly, we do not want to go all the way: we want our MV to look like natural language as much as possible. This can be achieved to a large extent by sensible choice of the terms and phrases we introduce in our books (in the form of sMV material). Combined with the use of just a few parentheses, this can reduce parsing trouble to a bearable extent. Making a serious study of this would, for the moment, be a waste of time, since the trouble can so easily be eliminated by adding enough parentheses.
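The parenthesis coding of a tree can be sketched in a few lines of Python. The `Node` class and the labels are inventions of this sketch, not MV notation; the point is only that nested parentheses reveal the original tree structure.

```python
# A planted planar tree linearized with nested parentheses, the way
# mathematicians code their trees. Labels here are illustrative only.

class Node:
    def __init__(self, label, children=()):
        self.label = label                 # identifier attached to the point
        self.children = list(children)     # outgoing edges, left to right

def linearize(node):
    """Code the tree in linear form with nested parentheses."""
    if not node.children:
        return node.label
    inner = ", ".join(linearize(c) for c in node.children)
    return f"{node.label}({inner})"

# "the derivative of g, applied to x" as a planted planar tree:
tree = Node("apply", [Node("derivative", [Node("g")]), Node("x")])
print(linearize(tree))   # -> apply(derivative(g), x)
```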
Relational Semantics in an Integrated System

R.M.A. Wieringa

0. ABSTRACT
This paper contains the description of a system for handling semantics of computer programs. The methodology used for the description of semantics is relational semantics: (possibly incomplete) information about programs is represented by binary relations. For the description we use the language Automath, in which logic, mathematics, syntax and semantics are integrated. Moreover, the correctness of texts written in Automath can be checked mechanically by a computer. We consider an ALGOL 60-like programming language. The axiomatic basis of it is kept small, but it is large enough to make the definition of many ALGOL constructs possible. In the basis are included assignment, binary selection, concatenation, block structures and recursive parameterless procedures. For these basic constructs semantics is presented, and some examples are given how new program constructs can be described in terms of these basic ones.
1. INTRODUCTION

We shall present a formalism for the description of syntax and semantics of programs in an ALGOL 60-like programming language (i.e. a block-structured programming language with variables of various kinds, assignment to these variables, binary selection, recursion, etc.). An essential point is that program correctness proofs have to be subjected to an automatic verification system. So we have to deal with

(a) the organisation of the variables, the so-called state space,
(b) the description of the syntax of the language: what kind of programs do we consider,
(c) the description of the semantics of the programs: what information do we state about the programs.
The method we shall use to describe semantics will be relational semantics, with strong emphasis on dealing with incomplete information about the relation between initial and final state. A system for verification of the correctness of programs has to be able to cope with mathematical theories (e.g. number theory) and to keep track of the mathematical interpretation of values of state space variables. In practice, the verification of the correctness of a program appears to be long and tedious, since it consists of very many elementary steps. We feel the need for a mechanical verification. So altogether, we need a language in which various formal systems (e.g. semantics, logic, mathematics) are integrated, and the correctness of what is written in the language should be decidable by a computer. Automath ([de Bruijn 70a (A.2)]) is such a wide-scope language. In an Automath book we can express all primitives we need about logic, mathematics, programming language, semantics, and on the basis of these primitives we can define particular programs and derive truths about their semantics. We use the following notation for some of the essentials of Automath. Typing is denoted by colons (P : Q means P has type Q). Abstraction is written as [x : A] B, denoting the function with domain A and values B (this B may contain x). Application is written as (A) B (i.e. the value of the function B at the point A). We use Q : type for saying that Q is a type, and R : prop for saying that R represents a proposition (if S : R then S is a proof of that proposition). The semantical framework described here is essentially based on various proposals by N.G. de Bruijn [de Bruijn 73d], [de Bruijn 75b]. In the present form it is used by the author of this paper for the development of an operational system intended to be useful for proving correctness of big programs.
2. THE STATE SPACE

Since programs act on variables, we have to pay some attention to these variables and their possible values; in other words, to the state space. Roughly speaking a state is a set of variables, each of a certain type (think e.g. of the types integer, boolean etc. in ALGOL 60) and having a value corresponding to that type. So we introduce the notion datatype, and several datatypes, like

datatype : type
bool : datatype
int : datatype .
Relational semantics in an integrated system (F.4)
For each datatype dt the type of the corresponding values will be denoted by

elts(dt) : type .
Since our programming language has an ALGOL-like block structure, we put our variables on stacks: one for each datatype. For simplicity we do not assume the stacks to have a bottom. The places in each stack are indexed by 0, 1, 2, ...; the 0 refers to the top of the stack. In the stack corresponding to dt the values have type elts(dt). Each pair (dt, i) of a datatype and an index now identifies a program variable: we do not talk about names of variables. So we define (written in Automath)

State := [dt : datatype] [i : nat] elts(dt) : type

(where nat is the type of the naturals). For a visual interpretation see Fig. 1.

Figure 1. A state space

There are several operations on states. Let us fix a state σ. By value(σ, dt, i) we denote the value in σ of the variable (dt, i); it has type elts(dt). Furthermore there are some operations transforming states into states:

(a) adapt(σ, dt, i, v) is the state that is obtained from σ by replacing the value of (dt, i) by a new value v;

(b) extend(σ, dt, v) is the state we get when in σ we push an element with value v on the stack corresponding to dt. So, when σ′ = extend(σ, dt, v) we have value(σ′, dt, 0) = v, value(σ′, dt, i + 1) = value(σ, dt, i), value(σ′, t, i) = value(σ, t, i) when t ≠ dt;

(c) restrict(σ, dt) is the state we get when in σ we remove the top element of the stack corresponding to dt. So when σ′ = restrict(σ, dt) we have value(σ′, dt, i) = value(σ, dt, i + 1), value(σ′, t, i) = value(σ, t, i) when t ≠ dt.
Having defined these operations on states, we can prove properties about them, e.g.

value(extend(σ, dt, v), dt, i + 1) = value(σ, dt, i)
restrict(extend(σ, dt, v), dt) = σ

and write these in our Automath book. In order to deal with nontermination, abortion because of things like “divide by zero”, indexing outside array bounds etc., we add an extra datatype ref (standing for refuser). The variables belonging to ref are quasi-variables, i.e. they do not appear in a program itself but only in its semantics. There are two values connected to refusers: ON and OFF, ON meaning “there is something wrong”. The datatype ref plays an exceptional role in our discussions. In most cases we shall stipulate that datatypes are ≠ ref.
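Assuming we model a state as a dictionary of stacks with index 0 on top (a representation chosen for this sketch, not prescribed by the paper), the operations and the two properties above can be written as:

```python
# Assumed representation: a state is a dict mapping datatype names to
# stacks (Python lists), index 0 = top of the stack.

def value(state, dt, i):
    return state[dt][i]

def adapt(state, dt, i, v):
    new = {t: list(s) for t, s in state.items()}
    new[dt][i] = v
    return new

def extend(state, dt, v):
    new = {t: list(s) for t, s in state.items()}
    new[dt] = [v] + new[dt]          # push v on top (index 0)
    return new

def restrict(state, dt):
    new = {t: list(s) for t, s in state.items()}
    new[dt] = new[dt][1:]            # remove the top element
    return new

sigma = {"bool": [True, False], "int": [7, 3, 5]}

# The two properties stated in the text:
assert value(extend(sigma, "int", 9), "int", 1) == value(sigma, "int", 0)
assert restrict(extend(sigma, "int", 9), "int") == sigma
```

The copy-on-write style (each operation returns a fresh dictionary) mirrors the fact that adapt, extend and restrict are functions from states to states rather than in-place updates.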
3. SYNTAX

What kind of programs do we consider? It is our intention to have a rich class of programs with a set of primitives that is as small as possible. Therefore we do not consider expressions of complex shape in our primitive programs. The following programs are primitive (the word “Program” will be used as the type of all programs).
(1) If dt : datatype, u : dt ≠ ref (i.e. u proves dt ≠ ref), i : nat, v : elts(dt), we have

Const_ass(dt, u, i, v) : Program

corresponding to “x := v” in ALGOL (where x corresponds to (dt, i) and v is a constant of type dt).
(2) If dt : datatype, u : dt ≠ ref, i1 : nat, i2 : nat, we have

Var_ass(dt, u, i1, i2) : Program

corresponding to “x := y” in ALGOL, y being a variable.
(3) If b : nat, π1 : Program, π2 : Program, we have

Binselect(b, π1, π2) : Program

corresponding to “if b then π1 else π2”.
(4) If π1 : Program, π2 : Program, we have

Concat(π1, π2) : Program

corresponding to “π1; π2”.

(5) If dt : datatype, u : dt ≠ ref, π : Program, we have

Block(dt, u, π) : Program

corresponding to “begin dt x; π end” (where dt is one of the types in ALGOL).

(6) If dt : datatype, u : dt ≠ ref, π : Program, we have

Injection(dt, u, π) : Program

In ALGOL there is no construction corresponding to this. It intends the following: Program π acts on a state space. When we want to use π in a situation where that state space has been extended with a variable of datatype dt, π has to act on that extended state space. For formal reasons this program has to get a new name.

(7) If φ : Program → Program (i.e. φ is a function from programs to programs) we have

Recurs(φ) : Program

more or less corresponding to “procedure p; (p) φ”. The idea behind this approach is the following: ALGOL uses in recursive procedures a kind of circular definition: in the specification of procedure p, p itself may appear: p := (p) φ. The essential part of the procedure is φ, the program-to-program function. By the formula p := Recurs(φ) we turn φ into a program. The above list of primitive programs is a reasonable basis for a programming language. We do not state it to be complete; if desirable we can add further primitives later, e.g. primitives about array assignment, and operations on records (as in PASCAL). And users of the system, handling special algorithms requiring special datatypes, can add primitive notions for private use. By means of the seven primitive program constructs given above we can build other program constructs. Once they have been written in our Automath book they are available for later use, just like the primitive ones. We give some examples.
(8) To the boolean assignment “b1 := b2 ∨ b3” in ALGOL (where b1, b2 and b3 are variables) corresponds the statement “if b2 then b1 := true else b1 := b3”. It is written in Automath as follows: if b1 : nat, b2 : nat, b3 : nat, we define

Bool_or_ass(b1, b2, b3) := Binselect(b2, Const_ass(bool, boolnotref, b1, T), Var_ass(bool, boolnotref, b1, b3)) : Program

(where boolnotref states bool ≠ ref, and T : elts(bool) denotes the value true).
(9) The empty statement in ALGOL can be mimicked as follows:

Dummy := Var_ass(bool, boolnotref, 0, 0) : Program

so “b := b” in ALGOL, where b corresponds to (bool, 0).
(10) To the statement “if b1 ∨ b2 then π” corresponds the block “begin boolean b; b := b1 ∨ b2; if b then π else end”. If b1 : nat, b2 : nat, π : Program, we describe it by

Or_cond(b1, b2, π) := Block(bool, boolnotref, Concat(Bool_or_ass(0, b1 + 1, b2 + 1), Binselect(0, Injection(bool, boolnotref, π), Dummy))) : Program

Notice the effect of the introduction of a new boolean. It transforms b1, b2 and π into b1 + 1, b2 + 1 resp. Injection(bool, boolnotref, π).
(11) To the while statement “while b do π” corresponds the recursive procedure “procedure p; if b then begin π; p end else”. If b : nat, π : Program, we describe it by

While(b, π) := Recurs([π1 : Program] Binselect(b, Concat(π, π1), Dummy)) : Program .

We did not yet discuss integers and assignments like “a := b + c”. We can define the integers as sequences of bits 0 and 1 and write programs for addition, multiplication etc. It is a long way to go, but whatever we produce is available forever.
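The seven primitive constructs and the derived examples (8), (9) and (11) can be sketched as a Python AST. The constructor names follow the text; representing the proofs u of dt ≠ ref as plain strings, and φ as a Python function, are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Const_ass:  dt: str; u: str; i: int; v: object   # (1) x := constant

@dataclass
class Var_ass:    dt: str; u: str; i1: int; i2: int    # (2) x := y

@dataclass
class Binselect:  b: int; pi1: object; pi2: object     # (3) if b then pi1 else pi2

@dataclass
class Concat:     pi1: object; pi2: object             # (4) pi1; pi2

@dataclass
class Block:      dt: str; u: str; pi: object          # (5) begin dt x; pi end

@dataclass
class Injection:  dt: str; u: str; pi: object          # (6) lift pi to a larger state space

@dataclass
class Recurs:     phi: Callable                        # (7) phi : Program -> Program

boolnotref = "bool_ne_ref"   # stands in for the proof u of bool != ref
T = True                     # the value true in elts(bool)

def Bool_or_ass(b1, b2, b3):                 # (8): b1 := b2 or b3
    return Binselect(b2,
                     Const_ass("bool", boolnotref, b1, T),
                     Var_ass("bool", boolnotref, b1, b3))

Dummy = Var_ass("bool", boolnotref, 0, 0)    # (9): the empty statement

def While(b, pi):                            # (11): while b do pi
    return Recurs(lambda pi1: Binselect(b, Concat(pi, pi1), Dummy))

# One unfolding of the While body reproduces "if b then (pi; p) else":
body = While(0, Dummy).phi(Dummy)
assert isinstance(body, Binselect) and isinstance(body.pi1, Concat)
```

The point of the sketch is only structural: programs are finite terms built from seven constructors, and Recurs carries a function on such terms rather than a circularly defined term.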
4. SEMANTICS
Semantics as we describe it is closely related to the methodology of denotational semantics, with one of its central ideas the presentation of the meaning of a program as a function from states to states (cf. [Scott & Strachey 71]). We take a different point of view: we do not consider functions from states to states but binary relations over the state space. This is called relational semantics (cf. [Hitchcock & Park 73]). When discussing semantics of a program in a particular situation it is, fortunately, often sufficient to deal with incomplete information. Some parts of the program may have semantic properties which are partly irrelevant for the properties of the program as a whole. Such incomplete information has the form of a binary relation, and can be treated in our system. As an extra advantage we mention that we do not have the slightest trouble with non-deterministic programs. We connect relations to programs by stating that a relation ρ presents information about a program π. In our Automath book we take this notion to be primitive, but we can give the following interpretation from an executional point of view: for every pair σ1 : State, σ2 : State where σ1 and σ2 are initial and final state of some execution of π, the relation ρ holds. Because of the possible incompleteness of the information, the converse (i.e. when ρ holds for σ1 and σ2, π can transform σ1 into σ2) need not be true. In the jargon of Automath, a relation is a function that assigns to every σ1 : State and σ2 : State a proposition. So the type of all relations is

Reln := [σ1 : State] [σ2 : State] prop .

So given ρ : Reln, σ1 : State, σ2 : State, “ρ holds for σ1 and σ2” is expressed by (σ2) (σ1) ρ. Further we write, given π : Program, ρ : Reln, the primitive notion

info(π, ρ) : prop .

The interpretation of info(π, ρ) is the proposition “ρ presents information about π”.
The basic properties embodied in this interpretation are given by the following axioms (where π : Program, ρ1 : Reln, ρ2 : Reln):

info(π, ρ1 ‘and’ ρ2) ‘eqv’ (info(π, ρ1) ‘and’ info(π, ρ2))

(ρ1 ‘imp’ ρ2) ‘imp’ (info(π, ρ1) ‘imp’ info(π, ρ2))

(We use ‘and’, ‘imp’, ‘eqv’ for the connectives ∧, →, ≡ of ordinary propositional calculus.)
The relations ρ we claim by axiom to present information about the seven primitive programs in Section 3 all have a standard form, viz.

[σ1 : State] [σ2 : State] if Some_ref_on(σ1) then σ1 = σ2 else P(σ1, σ2)

where P(σ1, σ2) is a proposition, and

Some_ref_on(σ) := ∃r : nat (value(σ, ref, r) = ON) .

The motivation for this is the following: once a refuser is in ON position (because of things like nontermination, abortion), we do not want to “execute” the rest of the program anymore; in other words: this rest is equivalent to a skip, for which we present the information σ1 = σ2.
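A minimal Python sketch of this standard form (the dictionary representation of states and the sample proposition P are assumptions of this sketch, not the Automath text):

```python
# States as dicts of stacks; refusers live on the "ref" stack.

ON, OFF = "ON", "OFF"

def some_ref_on(state):
    # Some_ref_on(sigma): some refuser variable has the value ON
    return any(v == ON for v in state.get("ref", []))

def standard_form(P):
    """Wrap the essential proposition P(s1, s2) into the standard form:
    if a refuser is ON initially, the program behaves as a skip."""
    def rel(s1, s2):
        if some_ref_on(s1):
            return s1 == s2
        return P(s1, s2)
    return rel

# Essential part of a Const_ass-style program: s2 = adapt(s1, int, 0, 42)
P = lambda s1, s2: s2["int"][0] == 42 and s2["int"][1:] == s1["int"][1:]
rho = standard_form(P)

ok  = {"ref": [OFF], "int": [0, 5]}
bad = {"ref": [ON],  "int": [0, 5]}
assert rho(ok, {"ref": [OFF], "int": [42, 5]})
assert rho(bad, bad) and not rho(bad, {"ref": [ON], "int": [42, 5]})
```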
So to each primitive program π we have a relation ρ in standard form and an axiom stating info(π, ρ). In this paper we do not give the relations in standard form, but only the essential part, i.e. the proposition P(σ1, σ2) in the else part.

(1) To Const_ass(dt, u, i, v) is connected the proposition (playing the role of P(σ1, σ2))

σ2 = adapt(σ1, dt, i, v) .

(2) To Var_ass(dt, u, i1, i2) is connected

σ2 = adapt(σ1, dt, i1, value(σ1, dt, i2)) .

(3) Given ρ1 : Reln, ρ2 : Reln, info(π1, ρ1), info(π2, ρ2), to Binselect(b, π1, π2) is connected

if value(σ1, bool, b) = T then (σ2) (σ1) ρ1 else (σ2) (σ1) ρ2 .

(4) Given ρ1 : Reln, ρ2 : Reln, info(π1, ρ1), info(π2, ρ2), to Concat(π1, π2) is connected

∃σ : State ((σ) (σ1) ρ1 ‘and’ (σ2) (σ) ρ2) .

(5) Given ρ : Reln, info(π, ρ), to Block(dt, u, π) is connected

∃v1 : elts(dt) ∃v2 : elts(dt) ((extend(σ2, dt, v2)) (extend(σ1, dt, v1)) ρ) .
Since σ1 and σ2 are states belonging to the state space outside the block and ρ is a relation between states inside the block, we have to extend σ1 and σ2 with appropriate values when connecting them with ρ. They are extended with v1 and v2, representing the initial and final value of the variable local in the block.
(6) Given ρ : Reln, info(π, ρ), to Injection(dt, u, π) is connected

(restrict(σ2, dt)) (restrict(σ1, dt)) ρ ‘and’ value(σ2, dt, 0) = value(σ1, dt, 0) .

Now ρ acts on a “smaller” state space than the one σ1 and σ2 belong to, so we have to restrict σ1 and σ2. The second part of the ‘and’ states that the value of the added variable does not change.
(7) In order to describe information on the recursive program Recurs(φ), we have to consider a sequence of relations with special properties. Given Seq : nat → Reln, with

∀σ1 : State ∀σ2 : State ((σ2) (σ1) (0) Seq ‘eqv’ if Some_ref_on(σ1) then σ2 = σ1 else value(σ2, ref, nonterm) = ON)

∀k : nat ∀π : Program (info(π, (k) Seq) ‘imp’ info((π) φ, (k + 1) Seq))

to Recurs(φ) is connected

∀n : nat ∃k : nat (k ‘gtr’ n ‘and’ (σ2) (σ1) (k) Seq) .

The interpretation is as follows: We start from a program π0 to which we connect the proposition value(σ2, ref, nonterm) = ON. (π0 can be considered as a non-terminating program.) We now build the programs (π0) φ0 := π0, (π0) φ1 := (π0) φ, (π0) φ2 := ((π0) φ) φ, .... For every k, (k) Seq is a relation that presents information on (π0) φk, by induction: (0) Seq presents information about π0, and for any k and π we have info(π, (k) Seq) ‘imp’ info((π) φ, (k + 1) Seq). The information presented on Recurs(φ) is now the least upper bound of the sequence Seq:

[σ1 : State] [σ2 : State] ∀n : nat ∃k : nat (k ‘gtr’ n ‘and’ (σ2) (σ1) (k) Seq) .
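The approximation idea behind Seq can be illustrated on the while statement of Section 3, here for the toy loop “while x > 0 do x := x - 1” with a single integer as the whole state. (0) Seq carries no termination information, and each step applies one unfolding of the loop body; the Python encoding is an illustration of this sketch, not the Automath development.

```python
def seq(k):
    """(k) Seq as a Python predicate on (initial, final) toy states."""
    if k == 0:
        return lambda s1, s2: False          # stage 0: no terminating run known
    prev = seq(k - 1)
    # one unfolding of: if x > 0 then (x := x - 1; p) else skip
    return lambda s1, s2: (prev(s1 - 1, s2) if s1 > 0 else s2 == s1)

# For initial state s1 the approximants stabilize once k > s1, and the
# stabilized information is "the final state is 0":
assert not seq(3)(5, 0)     # 3 unfoldings are not yet enough
assert seq(6)(5, 0)         # 6 unfoldings pin the final state down
assert all(seq(6)(5, s2) == (s2 == 0) for s2 in range(-2, 8))
```

The least upper bound over k then gives, for every initial state, exactly the information that the loop terminates with final state 0, matching the ∀n ∃k formulation above in spirit.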
Starting from our semantics of the seven primitive programs, we can define relations for higher-level constructs and prove that these relations present information. Especially the while statement deserves some attention, and the
programs that effect the arithmetic operations, such as “a := b + c”. Once all such standard programs have been written in our book, we gradually can start to write more complex programs and to present information about them. This set-up is completely parallel to the situation in mathematics, where we start from very simple primitives, and gradually learn to say everything we want. Much of the work we have to do when writing programs and proving semantics about them is more or less standard. All the time we deal with complex expressions in terms of the operations on states (as given in Section 2). Those can be simplified by application of the rules we have mentioned at the end of Section 2, applying elementary logic and elimination of if-then-else constructs. At this moment we feel the need for a (limited) automatic simplifier. Given a complex expression in terms of extend, adapt, restrict etc., such a simplifier is supposed to deliver a simpler equivalent form of this expression (and, written in Automath, a proof of this equivalence). Occasionally, some human interaction might be helpful.
Computer Program Semantics in Space and Time

N.G. de Bruijn
1. INTRODUCTION

This note can be considered as an addition to [de Bruijn 73d] (see also [Wieringa 80 (F.4)]). We aim at a new treatment of recursive procedures, i.e., a new version of Section 11 of [de Bruijn 73d]. This new treatment also affects the other sections, but there the alterations that have to be made are quite obvious. In [de Bruijn 73d] we used a state space Ω that was extended to a space Ω+ by adding a single element ∞. This element ∞ played a role in the semantics only: it is not something that can be referred to in the programs. Its semantic role is to indicate non-termination. In the present note we follow a system that handles some further information, i.e., something corresponding to runtime. Where the system of [de Bruijn 73d] only distinguished between runtime being finite or infinite, the present system might be able to say exactly what the runtime is in cases where it is finite. For practical applications it might be interesting to develop runtime administration for terminating programs, but this was not the main motivation for this study. The reason was rather of a theoretical nature. The semantic treatment of [de Bruijn 73d] (Sect. 11) turned out to be hard to combine with fixed point semantics. Mr. R. Wieringa, who implemented a large part of [de Bruijn 73d] in Automath, had to introduce a new notion of order between predicates and had to impose slightly awkward monotonicity restrictions in order to establish a correspondence between the recursion semantics of [de Bruijn 73d] and fixed point theory. The system proposed in this note is much easier in this respect. Yet, if we weaken the runtime information by distinguishing between finite and infinite runtime only, the semantics can be expected to be the one of [de Bruijn 73d]. Our present system takes care of runtime by describing a relation between the moment t where the execution of a program starts, and the moment t′ where the execution ends.
Semantical information about a program will have the form of a predicate on (Ω × T) × (Ω × T) (whereas in [de Bruijn 73d] it had the form of a predicate on Ω⁺ × Ω⁺). It is quite reasonable to think of predicates in which
the relation between t and t' is expressed in a form like

t + f(w, w') ≤ t' ≤ t + g(w, w') ,
and it is also quite reasonable to choose the semantics of primitive program constructs in accordance with this form. Nevertheless we shall not set it as a rule that our predicates should necessarily have this form. What we do require, however, is that t ≤ t' is somehow enforced, and we shall realize this by restricting our predicates to the set of all those quadruples (w, t, w', t') for which t ≤ t'.
The interpretation of t and t' is obvious. In a case where t < ∞, t' = ∞ we of course say that the program does not terminate. In cases where t = ∞, t' = ∞ we can say that the program execution never started, since some other program that had to be executed first took infinitely long. In order to be able to say that the sum of the lengths of infinitely many time intervals is infinite, we restrict ourselves to time moments which are either integers or the symbol ∞. Just like in [de Bruijn 73d], where programs never explicitly referred to the element ∞, in our present system the programs will usually not refer to t in any way, except for the program "delay". In this note we shall discuss the relation between t and t' only as far as it is relevant for the treatment of recursion. In fact we take the point of view that no program takes time, except for the "overhead" of a procedure call. This overhead is connected with the fake program we call "delay", which takes time without doing anything else: its semantics may be described by (w = w') ∧ (t' = t + 1). One might say that every case of non-termination is already caused by an infinity of executions of "delay", in spite of the fact that other program components might try to make it worse. Another point of view in which this note differs from [de Bruijn 73d] is a simplification: we have given up the "relativistic" attitude (see Section 3). Part of the philosophy underlying the note [de Bruijn 73d] was that semantical discussion can be kept on a mathematical level without entering into the syntax of a programming language. The correctness of the code in which the program is presented to a computer is a matter we can reduce to the question of the correctness of a compiler. This is not essentially different from the problem of the correctness of the translation of programs in higher order languages. The philosophy that semantics can be built up without entering into syntax, is elaborated in Section 8 of this paper.
Computer program semantics in space and time (F.5)
2. PREDICATES ON THE SET A

We start from a set Ω (which is called "state space"). And we introduce the "time space" T, defined as

T = Z ∪ {∞}

(where Z is the set of integers and ∞ is a new element). In T we have addition and order. The ordinary addition of Z is extended by ∞ + t = t + ∞ = ∞ for all t ∈ T, and the ordinary order of Z is extended by agreeing that k < ∞ for all k ∈ Z. The set A is defined as

A = {(w, t, w', t') ∈ Ω × T × Ω × T : t ≤ t'} .
Given a point (w, t, w', t'), we may refer to the pair (w, t) as "initial", and to the pair (w', t') as "final". We write Pred(A) for the collection of all predicates on A. Particular predicates are

(i) "TRUE", which is identically true on A,

(ii) "FALSE", which is identically false on A,

(iii) "NONTERM" (for non-termination), which has as its value the proposition t' = ∞,

(iv) a predicate which we shall denote by J, given by

J(w, t, w', t') = (t = t') ∧ ((t' < ∞) → (w = w')) .    (1)
It is inconvenient to admit all arbitrary predicates on A, for in cases where t' = ∞ the value of w' has no sensible interpretation, and for our semantics it would be awkward to make the distinction between different final pairs (w', ∞). (Of course it neither makes sense to distinguish between different initial pairs (w, ∞), but there it causes no trouble for our semantic system.) Somehow we want to consider all the (w', ∞)'s as equal, but we do not want to lose the Cartesian product structure of our space. Therefore we shall restrict predicates to being "constant at infinity". We say that a predicate P ∈ Pred(A) is constant at infinity if P(w, t, w', ∞) = P(w, t, w'', ∞) for all w, w', w'' ∈ Ω and all t ∈ T. The set of all P ∈ Pred(A) which are constant at infinity will be called Pred*(A). For every P ∈ Pred(A) we construct a P* ∈ Pred*(A) as follows:
P*(w, t, w', t') = P(w, t, w', t')   if t' < ∞ ,

and

P*(w, t, w', ∞) = ∃ρ∈Ω P(w, t, ρ, ∞)

for all w, w' ∈ Ω, t ∈ T. If P is itself in Pred*(A) already, we obviously have P* = P. This applies in particular to the examples TRUE, FALSE, NONTERM and J, mentioned above. We shall write P ⊂ Q for predicate implication (rather than using the dubious notation P → Q), so P ⊂ Q means

∀(w, t, w', t') ( P(w, t, w', t') → Q(w, t, w', t') ) .
We use the sign = for equivalence of predicates (and we shall often just call it equality), so P = Q means (P ⊂ Q) ∧ (Q ⊂ P). The notation P ⊂ Q helps us to remember that the set of points satisfying P is a subset of the set of points satisfying Q. Quite often we interpret a predicate as an amount of information, and then we have to keep in mind that P ⊂ Q does not mean that Q gives more information than P. It is just the other way around: P gives all the information presented by Q, but possibly more. With the above notation P*, Q* we obviously have

(P ⊂ Q) → (P* ⊂ Q*)
for all P, Q ∈ Pred(A). Quite often we want to describe a predicate by means of some expression E containing w, t, w', t'. We shall use the notation XE in order to denote the predicate P for which P(w, t, w', t') = E for all w, t, w', t'. And if P is obtained from E this way, we write X*E for the P* corresponding to P. As examples we mention

X*((w = w') ∧ (t' = ∞)) = X*(t' = ∞) = X(t' = ∞) = NONTERM ,

X*((w = w') ∧ (t = t')) = X(((w = w') ∧ (t = t')) ∨ (t = t' = ∞)) = X((t = t') ∧ ((t' < ∞) → (w = w'))) = J .
If P and Q are in Pred(A) we define the "boolean matrix product", or "boolean convolution", denoted by P * Q, as follows:

(P * Q)(w, t, w', t') = ∃σ∈Ω ∃s∈T, t ≤ s ≤ t' ( P(w, t, σ, s) ∧ Q(σ, s, w', t') ) .

We note that this convolution is associative:

(P * Q) * R = P * (Q * R) .

It is not hard to show that for all P ∈ Pred(A)

P * J = P* ,    (2)

and therefore (P * Q)* = P * Q * J = P * Q*. In particular, if Q is constant at infinity, then P * Q is constant at infinity. As a special case of (2) we mention

J * J = J* = J .
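These identities can be checked mechanically on a finite model. The sketch below is an illustration only (the two-point state space, the cut-off time slice, and the sample predicate P are arbitrary choices, not part of the text): it represents predicates as Python functions on quadruples, with float('inf') standing for ∞, and verifies P * J = P* (eq. (2)) and J * J = J on all points of the model.

```python
from itertools import product

INF = float('inf')          # stands for the element "oo" of T
OMEGA = [0, 1]              # a tiny state space (arbitrary choice)
TIME = [0, 1, 2, 3, INF]    # a finite slice of T = Z u {oo}

# A: all quadruples (w, t, w2, t2) with t <= t2
A = [q for q in product(OMEGA, TIME, OMEGA, TIME) if q[1] <= q[3]]

def J(w, t, w2, t2):
    return t == t2 and (not t2 < INF or w == w2)        # eq. (1)

def star(P):
    """P |-> P*: make P 'constant at infinity'."""
    def Ps(w, t, w2, t2):
        if t2 < INF:
            return P(w, t, w2, t2)
        return any(P(w, t, r, INF) for r in OMEGA)
    return Ps

def conv(P, Q):
    """Boolean convolution P * Q over an intermediate pair (sw, st)."""
    def PQ(w, t, w2, t2):
        return any(P(w, t, sw, st) and Q(sw, st, w2, t2)
                   for sw in OMEGA for st in TIME if t <= st <= t2)
    return PQ

# a sample predicate: flip the state, consume one time unit
P = lambda w, t, w2, t2: w2 == (w + 1) % 2 and t2 == t + 1

assert all(conv(P, J)(*q) == star(P)(*q) for q in A)    # P * J = P*, eq. (2)
assert all(conv(J, J)(*q) == J(*q) for q in A)          # J * J = J* = J
```

Note that the quantification in conv only ranges over intermediate pairs with t ≤ st ≤ t2, mirroring the restriction of all predicates to the set A.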
3. SEMANTIC INFORMATION
We assume that we have a set called Prog(Ω). The elements of this set are called programs (or "programs on Ω"). And we assume to have a mapping "Totinf" (which stands for "total information"), mapping Prog(Ω) into Pred*(A). The interpretation is that if π is a program, then Totinf(π) is a predicate on A that presents all the semantic information about π. That is to say, in terms of operational interpretation: there is an execution of π leading from initial state (w, t) to final state (w', t') if and only if the quadruple (w, t, w', t') satisfies the predicate Totinf(π). Quite often it happens that for a given program π it is hard to find Totinf(π) and, moreover, this total information is not always important in all its details. We can usually be quite happy with something that is simpler and weaker. That means that we work with some R ∈ Pred*(A) such that

Totinf(π) ⊂ R .    (1)
In most cases we start from the other end. We have some predicate R as our goal, and we want to find a program π such that (1) holds. Then we say that R is a program specification, and that π is a program that satisfies the specification. In [de Bruijn 73d] we tried to avoid the introduction of a thing like Totinf. The idea behind this is that it might be useful to have a semantic system in which people with different ideas about the total information are still able to communicate about things they do agree on, and also that it leaves some freedom to those who implement a programming language. This idea of [de Bruijn 73d], if extended to our present system, means that we work with a predicate W on Prog(Ω) × Pred*(A), with the interpretation that W(π, P) expresses that P(w, t, w', t') is true if (but not necessarily "only if") there is an execution for which (w, t, w', t') presents the initial and final state. The connection with Totinf is

if Totinf(π) ⊂ P then W(π, P)
for all π ∈ Prog(Ω), P ∈ Pred*(A). In the present note we shall not follow this line of [de Bruijn 73d]. The advantage of avoiding Totinf(π) might not compensate the disadvantage that the basic properties of W(π, P) are harder to formulate than those of Totinf(π). There is an analogy in topology, where there is a possibility ("point-free topology") to restrict the discussions to "open sets" as basic objects, without bothering whether these objects are sets (of "points") indeed. The price that has to be paid for this "relativistic" point of view is a complication of the axiomatic structure, a structure that can only be understood with the non-relativistic structure in mind. The interpretation of Totinf(π) makes it hard to attach a meaning to cases where w, t are such that there is not a single pair w', t' such that Totinf(π)(w, t, w', t') is true. Nevertheless we do not exclude these cases by means of a general condition on Totinf(π), partly since it still might turn out to come in handy for special kinds of abortion (cf. Section 10). On the basis of Totinf we can define a transitive relation between programs. If both π and σ are in Prog(Ω) we write

π ≤ σ if and only if Totinf(π) ⊂ Totinf(σ) .

If both π ≤ σ and σ ≤ π we say that π and σ are semantically equivalent.
4. SEMANTICS OF PRIMITIVE PROGRAMS

In this section we consider the programs "skip", "delay", "adlibitum", "nonterm" and a class of programs called "assignments". We ignore other primitive programs like x := x + y, etc. The programs "skip", "delay", "adlibitum" and "nonterm" will hardly ever occur in actual programs, but are just added to the collection of all programs in order to smooth the semantic discussion. The program "skip" does nothing, that is to say that it leaves both w and t untouched, at least as long as t < ∞. Its semantics is

Totinf(skip) = X*((w = w') ∧ (t = t')) .

The program "delay" is just like "skip" in as far as w = w', but it "consumes a unit of time":

Totinf(delay) = X*((w = w') ∧ (t' = t + 1)) .

The program "adlibitum" is a non-deterministic program that instructs the computer to do as it pleases, possibly even to use an infinite amount of time.
(The term "adlibitum" is used in music with the same meaning, although one would not usually allow the performer to play infinitely long.) The semantics is "no information at all", "anything may happen", and is expressed formally by Totinf(adlibitum) = TRUE. The program "nonterm" instructs the computer to go on for ever, and requires nothing about w'. Its semantics is

Totinf(nonterm) = X(t' = ∞) = X*(t' = ∞) = NONTERM .

The trouble with a formal treatment of assignments is this: they contain expressions in some syntactic form, intended to represent elements of Ω, but the language in which they are formulated (the computer programming language) should not be confused with the (so much richer) mathematical language in which we discuss the semantics. Let us say that somehow we have defined a class of expressions (in the programming language) and that to each E of that class we have assigned a mapping g of Ω into Ω. Then something like "x := E" is a program; we note that E may contain a symbol x referring to the initial value w, but no symbols referring to w', t or t'. We shall not say precisely how g is obtained from E; one usually has the "naive" interpretation that the value g(w) is obtained if we just replace x by w in E. But whatever the relation between E and g might be, the semantics is

Totinf(x := E) = X*((w' = g(w)) ∧ (t = t')) .

We have chosen the (unrealistic) point of view that the execution of x := E does not take time. If one wants to attach some time consumption to this program, it is easily administered by adding (in concatenation) a number of executions of "delay".
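The Totinf's of this section can be written down directly in a small finite model. The sketch below is an illustration only (the finite Ω, the clipped increment, and all names are modelling choices, not part of the text); float('inf') stands for ∞, and star is the P ↦ P* operation of Section 2.

```python
INF = float('inf')       # stands for the element "oo" of T
OMEGA = [0, 1, 2]        # arbitrary finite state space, for illustration only

def star(P):
    """The operation P |-> P* of Section 2: identify all final pairs (w', oo)."""
    def Ps(w, t, w2, t2):
        if t2 < INF:
            return P(w, t, w2, t2)
        return any(P(w, t, r, INF) for r in OMEGA)
    return Ps

SKIP      = star(lambda w, t, w2, t2: w == w2 and t == t2)
DELAY     = star(lambda w, t, w2, t2: w == w2 and t2 == t + 1)
NONTERM   = lambda w, t, w2, t2: t2 == INF
ADLIBITUM = lambda w, t, w2, t2: True          # "no information at all"

def assign(g):
    """Totinf(x := E), where g : OMEGA -> OMEGA is the meaning of E."""
    return star(lambda w, t, w2, t2: w2 == g(w) and t == t2)

INCR = assign(lambda w: min(w + 1, 2))   # a sample assignment, clipped to OMEGA

assert SKIP(1, 0, 1, 0) and not SKIP(1, 0, 2, 0)
assert DELAY(1, 3, 1, 4) and not DELAY(1, 3, 1, 3)
assert INCR(0, 5, 1, 5) and not INCR(0, 5, 2, 5)
assert SKIP(0, INF, 2, INF)   # at t' = oo the final state is irrelevant
```

The last assertion shows why the star is applied: at t' = ∞ all final states are identified, so skip, delay and assignments carry no information about w' there.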
5. LOWER PRIMITIVE PROGRAM CONSTRUCTS
One of the simplest lower primitive program constructs is π or σ, if both π and σ are in Prog(Ω). The interpretation is that for every input the computer is free to choose which one of π and σ is to be executed. The semantics is described, if P = Totinf(π), Q = Totinf(σ), R = Totinf(π or σ), by

R(w, t, w', t') = P(w, t, w', t') ∨ Q(w, t, w', t')

for all (w, t, w', t') ∈ A. Since both P and Q are constant at infinity, the same thing holds for R.
Next we consider the well-known construct "π ; σ", called the concatenation of the programs π and σ. The semantics is given by means of the convolution

Totinf(π ; σ) = Totinf(π) * Totinf(σ) .

We note that Totinf(π ; σ) is constant at infinity, since the second factor is constant at infinity. We also note that because of the associativity of the convolution the programs π ; (σ ; τ) and (π ; σ) ; τ are semantically equivalent. With the construct "if E then π else σ" we have a situation similar to the one with the assignment statement we considered in Section 4. Somehow we have defined a class of expressions in the programming language, and to each E of that class we have associated a predicate B on Ω. The expression E may contain a symbol x referring to the input value w, but no symbols referring to w', t or t'. (The usual "naive" point of view is that the program text contains B itself, and that it reads "if B(x) then π else σ".) The intuitive (operational) meaning of "if E then π else σ" is that if B(w) is true then π is to be executed, if B(w) is false then σ is to be executed. The formal semantics is described as follows. Let P, Q, R be the Totinf's of π, σ and "if E then π else σ". Then we have

R(w, t, w', t') = (B(w) ∧ P(w, t, w', t')) ∨ (¬B(w) ∧ Q(w, t, w', t'))

and we note that the right-hand side is equivalent to

(B(w) → P(w, t, w', t')) ∧ (¬B(w) → Q(w, t, w', t')) .

We have to check that R is constant at infinity. This follows directly from the fact that both P and Q are constant at infinity. In the lower primitive program constructs of this section we have not administered any run time for the "overhead" of the constructs: the only time consumption is in the execution of the sub-programs π, σ, ... . It would not be very hard to alter this by adding a number of executions of the program "delay".
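In a finite model the three constructs of this section become one-line combinators on predicates, with the convolution of Section 2 giving concatenation. The sketch below is illustrative only (Ω, the time slice, and all names are my choices, not the paper's).

```python
INF = float('inf')
OMEGA = [0, 1]
TIME = [0, 1, 2, 3, 4, INF]   # a finite slice of T, for illustration

def star(P):
    def Ps(w, t, w2, t2):
        if t2 < INF:
            return P(w, t, w2, t2)
        return any(P(w, t, r, INF) for r in OMEGA)
    return Ps

def conv(P, Q):
    """Boolean convolution P * Q of Section 2."""
    def PQ(w, t, w2, t2):
        return any(P(w, t, sw, st) and Q(sw, st, w2, t2)
                   for sw in OMEGA for st in TIME if t <= st <= t2)
    return PQ

def or_(P, Q):       # Totinf(pi or sigma)
    return lambda w, t, w2, t2: P(w, t, w2, t2) or Q(w, t, w2, t2)

def seq(P, Q):       # Totinf(pi ; sigma) = Totinf(pi) * Totinf(sigma)
    return conv(P, Q)

def ifte(B, P, Q):   # Totinf(if E then pi else sigma)
    return lambda w, t, w2, t2: P(w, t, w2, t2) if B(w) else Q(w, t, w2, t2)

SKIP  = star(lambda w, t, w2, t2: w == w2 and t == t2)
DELAY = star(lambda w, t, w2, t2: w == w2 and t2 == t + 1)

TWO = seq(DELAY, DELAY)               # "delay ; delay" takes exactly two units
assert TWO(0, 1, 0, 3) and not TWO(0, 1, 0, 2)

COND = ifte(lambda w: w == 0, SKIP, DELAY)
assert COND(0, 2, 0, 2) and COND(1, 2, 1, 3) and not COND(1, 2, 1, 2)

EITHER = or_(SKIP, DELAY)             # "skip or delay"
assert EITHER(1, 0, 1, 0) and EITHER(1, 0, 1, 1) and not EITHER(1, 0, 1, 2)
```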
6. HIGHER PRIMITIVE PROGRAM CONSTRUCTS
Let φ be a program-to-program function, i.e., a mapping of Prog(Ω) into itself. As an example we start from some expression E to which there corresponds a predicate B (like in Section 5), and for any π ∈ Prog(Ω) we define φ(π) by

φ(π) = if E then π else skip .
For any program-to-program function φ we have a program to be called RECURS(φ). We may think of a program p that can be described in ALGOL 60 by the procedure declaration and procedure body

procedure p ; φ(p) .

In order to describe the semantics of RECURS(φ) we first define the program-to-program function ψ by

ψ(π) = φ((delay ; π)) .    (1)

Next we introduce the functions ψ^k (k = 0, 1, ...) by iteration: ψ^0(π) = π, ψ^(k+1)(π) = ψ(ψ^k(π)). We apply the ψ^k to the primitive program "adlibitum". Abbreviating

R_k = Totinf(ψ^k(adlibitum))    (2)

we now define the semantics of RECURS(φ) by

Totinf(RECURS(φ)) = R    (3)

where

R(w, t, w', t') = ∀k∈N R_k(w, t, w', t')    (4)

for all w, w' ∈ Ω and t, t' ∈ T. N is the set {1, 2, 3, ...}, but it would do no harm to include k = 0 as well, since R_0 = Totinf(adlibitum) = TRUE. We have to check that R ∈ Pred*(A), i.e., that R is constant at infinity. This is trivial from (4), since each R_k is constant at infinity. In Section 9 we shall comment on the definition (3), and discuss its relation to fixed point semantics. We postpone the discussion since we want to show first that (3) is good enough for a practical case: the "while" statement.
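The definition (1)-(4) can be exercised on a concrete loop. The sketch below is illustrative only (the countdown loop, the finite Ω, and the cut-off of five iterates are my choices, not the paper's): it builds R_k = Totinf(ψ^k(adlibitum)) for the loop "while w > 0 do w := w - 1" and intersects the R_k as in (4). With one "delay" per call, the loop from state w must end in state 0 at time t + w, and R pins down exactly that behaviour.

```python
INF = float('inf')
OMEGA = [0, 1, 2]              # arbitrary finite state space, for illustration
TIME = [0, 1, 2, 3, 4, INF]   # a finite slice of T

def star(P):
    def Ps(w, t, w2, t2):
        if t2 < INF:
            return P(w, t, w2, t2)
        return any(P(w, t, r, INF) for r in OMEGA)
    return Ps

def conv(P, Q):                # boolean convolution of Section 2
    def PQ(w, t, w2, t2):
        return any(P(w, t, sw, st) and Q(sw, st, w2, t2)
                   for sw in OMEGA for st in TIME if t <= st <= t2)
    return PQ

def ifte(B, P, Q):
    return lambda w, t, w2, t2: P(w, t, w2, t2) if B(w) else Q(w, t, w2, t2)

SKIP      = star(lambda w, t, w2, t2: w == w2 and t == t2)
DELAY     = star(lambda w, t, w2, t2: w == w2 and t2 == t + 1)
ADLIBITUM = lambda w, t, w2, t2: True

# the loop "while w > 0 do w := w - 1", one "delay" per procedure call
B   = lambda w: w > 0
TAU = star(lambda w, t, w2, t2: w2 == w - 1 and t == t2)

def psi(P):   # Totinf(psi(pi)) for phi(pi) = if E then (tau ; pi) else skip
    return ifte(B, conv(TAU, conv(DELAY, P)), SKIP)

R_list, Rk = [], ADLIBITUM
for _ in range(5):             # R_1 .. R_5 = Totinf(psi^k(adlibitum))
    Rk = psi(Rk)
    R_list.append(Rk)

R = lambda w, t, w2, t2: all(r(w, t, w2, t2) for r in R_list)   # eq. (4)

assert R(2, 0, 0, 2)        # from w = 2: two iterations, two delays
assert not R(2, 0, 0, 3)    # no other finite running time ...
assert not R(2, 0, 1, 2)    # ... and no other final state
assert R(0, 3, 0, 3)        # B false at once: behaves like skip
```

The early iterates R_1, R_2 are weak (they still contain adlibitum's TRUE), which is why the intersection over all k is taken rather than any single R_k.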
7. THE WHILE STATEMENT

We consider the program that is usually written as

while E do τ ,    (1)

where τ is a program, and E is an expression that plays the same role as in Section 5 in the construct "if E then π else σ". We form a program-to-program function φ by

φ(π) = if E then (τ ; π) else skip    (2)
and we claim that the semantics of RECURS(φ) is able to explain the semantics usually attached to (1). We shall do this both for partial correctness and for total correctness. In both cases B is the predicate on Ω that corresponds to E.
Theorem 1 ("Partial correctness"). Let C be a predicate on Ω and abbreviate

D = X((t = t') ∧ ((B(w) ∧ C(w)) → C(w'))) ,    (3)

F = X((C(w) ∧ (t' < ∞)) → (C(w') ∧ ¬B(w'))) .    (4)

Assume that the semantics of τ satisfies

Totinf(τ) ⊂ D* .    (5)

Then the one of RECURS(φ) satisfies

Totinf(RECURS(φ)) ⊂ F .    (6)
Proof. As in Section 6 we abbreviate

R_k = Totinf(ψ^k(adlibitum)) ,

and we note that

ψ^(k+1)(adlibitum) = if E then (τ ; delay ; ψ^k(adlibitum)) else skip .

By the rules of Section 5 we get R_(k+1) ⊂ W_(k+1) where

W_(k+1) = X((B(w) → H_k(w, t, w', t')) ∧ (¬B(w) → S(w, t, w', t'))) ,    (7)

H_k = X(∃σ,σ'∈Ω ∃s,s'∈T (D*(w, t, σ, s) ∧ (s' = s + 1) ∧ (σ' = σ) ∧ R_k(σ', s', w', t'))) ,    (8)

S = Totinf(skip) = X*((w = w') ∧ (t = t')) .

By (3) we have

D*(w, t, w', t') → (t = t')

and therefore (8) can be simplified to

H_k = X(∃σ∈Ω (D*(w, t, σ, t) ∧ R_k(σ, t + 1, w', t'))) .    (9)

We define a sequence of predicates P_k ∈ Pred(A) by

P_k = X((C(w) ∧ (t' + 1 ≤ k + t) ∧ (t' < ∞)) → (C(w') ∧ ¬B(w'))) .    (10)
We remark that for all k

P_k = P_k* ,    (11)

i.e., that P_k is constant at infinity; we even have

P_k(w, t, w', ∞)    (12)

for all k, w, t, w' because of the subexpression t' < ∞ on the left in (10). We shall show that

R_k ⊂ P_k   (k = 0, 1, 2, ...) .    (13)

Once this has been shown we rapidly get to (6): by Section 6 Totinf(RECURS(φ)) ⊂ R_k for all k, and for all (w, t, w', t') ∈ A we have

(∀k P_k(w, t, w', t')) → F(w, t, w', t') .    (14)

If t' = ∞ this is trivial since the right-hand side is true, if t' < ∞ we can (given w, t, w', t') take k such that t' + 1 ≤ k + t, whence P_k(w, t, w', t') → F(w, t, w', t'). We shall prove (13) by induction. First

R_0 ⊂ P_0    (15)

is trivial: P_0 = TRUE since t' + 1 ≤ t is false on A. Next we take a fixed k ≥ 0, we assume R_k ⊂ P_k and we shall show R_(k+1) ⊂ P_(k+1). It suffices to show

W_(k+1) ⊂ P_(k+1) .    (16)

To this end we fix (w, t, w', t') ∈ A, we assume

R_k ⊂ P_k and W_(k+1)(w, t, w', t')    (17)

and our goal is P_(k+1)(w, t, w', t'). We split in two cases: B(w) and ¬B(w), so according to (17) and (7) we can reach our goal by proving

(B(w) ∧ H_k(w, t, w', t')) → P_(k+1)(w, t, w', t')    (18)

and

(¬B(w) ∧ S(w, t, w', t')) → P_(k+1)(w, t, w', t') .    (19)

In order to show (18) we assume its left-hand side. By (9) and R_k ⊂ P_k we conclude that σ exists such that

B(w) ∧ D*(w, t, σ, t) ∧ P_k(σ, t + 1, w', t') .    (20)

If t' < ∞ we can just replace D* by D, and from (20), (3) and (10) we derive

(C(w) ∧ (t' + 1 ≤ k + t + 1)) → (C(w') ∧ ¬B(w'))    (21)

and that means P_(k+1)(w, t, w', t'). If t' = ∞ we get P_(k+1)(w, t, w', t') from (12). Next we show (19). We assume S(w, t, w', t') ∧ ¬B(w), and have to prove P_(k+1)(w, t, w', t'). If t' = ∞ this is trivial by (12), so we take t' < ∞. We assume C(w) ∧ (t' + 1 ≤ k + 1 + t) ∧ (t' < ∞) and want to show C(w') ∧ ¬B(w'). From S(w, t, w', t') and t' < ∞ we get w = w', so by C(w) and ¬B(w) we have C(w') ∧ ¬B(w'). This finishes the proof of Theorem 1. □
Theorem 2 ("Total correctness"). Let C be a predicate on Ω, and let Q be a mapping of Ω into the set of integers ≥ 0. We abbreviate

D = X(((B(w) ∧ C(w)) → (C(w') ∧ (Q(w') < Q(w)))) ∧ (t = t')) ,    (22)

F = X((C(w) ∧ (t < ∞)) → (C(w') ∧ (t' ≤ t + Q(w)) ∧ ¬B(w'))) .    (23)

Assume that the semantics of τ satisfies

Totinf(τ) ⊂ D* .    (24)

Then the one of RECURS(φ) satisfies

Totinf(RECURS(φ)) ⊂ F .    (25)

Proof. With the new D and F we follow the proof of Theorem 1. We also take new P_k's:

P_k = X((C(w) ∧ (Q(w) < k) ∧ (t < ∞)) → ((t' + 1 ≤ t + k) ∧ C(w') ∧ ¬B(w'))) .    (26)

Since we have altered D, F and P_k, we have to supply new proofs for the details (11), (14), (15), (18), (19) of the proof of Theorem 1. We note that the simplification of (8) to (9) is again valid. We have (11) since

P_k(w, t, w', ∞) ⇔ ¬(C(w) ∧ (Q(w) < k)) ∨ (t = ∞)

and the right-hand side does not depend on w'. In order to show (14) we take any (w, t, w', t') ∈ A, we assume all P_k(w, t, w', t') and C(w) ∧ (t < ∞). Taking k = Q(w) + 1 we infer from P_k(w, t, w', t') that t' + 1 ≤ t + k and C(w') ∧ ¬B(w'), and therefore (t' ≤ t + Q(w)) ∧ C(w') ∧ ¬B(w'). Thus we have proved F(w, t, w', t'). We now show (15): since Q(w) < 0 is false for all w we have from (26)

P_0 = TRUE .

Next we take any (w, t, w', t') ∈ A and any k ∈ {0, 1, ...} and we shall prove (18). We do this by showing that (20) leads to P_(k+1)(w, t, w', t'). Assume (20). Moreover, assume C(w) ∧ (Q(w) < k + 1) ∧ (t < ∞), and try to get (t' ≤ t + k) ∧ C(w') ∧ ¬B(w'). The D*(w, t, σ, t) of (20) equals D(w, t, σ, t) since t < ∞, and therefore implies in the present context C(σ) ∧ (Q(σ) < Q(w)). Hence C(σ) ∧ (Q(σ) < k). The P_k(σ, t + 1, w', t') of (20) now gives (t' ≤ t + k) ∧ C(w') ∧ ¬B(w') as we wanted. This finishes the proof of (18). We finally turn to (19). We assume (with some fixed (w, t, w', t') ∈ A and fixed k) that S(w, t, w', t') ∧ ¬B(w), and moreover C(w) ∧ (Q(w) < k + 1) ∧ (t < ∞). If t' < ∞ then S(w, t, w', t') says that w = w' and t = t', and we get (t' + 1 ≤ t + k + 1) ∧ C(w') ∧ ¬B(w'). This proves P_(k+1)(w, t, w', t'). The case t' = ∞ does not occur, since S(w, t, w', t') would still say t = t', which conflicts with t < ∞. This finishes the proof of (19), and completes the proof of Theorem 2. □
Sometimes we can definitely conclude to non-termination of a while-statement. Rather than stating general theorems we present a single example. The proof will again follow the pattern of the proof of Theorem 1. In this example we take for Ω the set of all integers. The predicate B corresponding to the E in "while E do τ" is given by B(w) = (w > 0). And τ is a program for which we assume Totinf(τ) ⊂ D*, where

D = X((t = t') ∧ (w' = w + 1)) .    (27)

(In ALGOL one can think of the while-statement "while x > 0 do x := x + 1".) We shall derive the semantical statement Totinf(RECURS(φ)) ⊂ F, where

F = X((w > 0) → (t' = ∞)) .    (28)

This corresponds to the intuitively obvious statement that if the initial value of x is positive, then "while x > 0 do x := x + 1" is non-terminating. Again we follow the proof of Theorem 1. We take

P_k = X((w > 0) → (t + k ≤ t' ≤ ∞)) .    (29)

Since D*(w, t, w', t') → (t = t') we can again simplify (8) to (9). Now again it suffices to check (11), (14), (15), (18) and (19). We have (11) since w' does not occur in (29). And (14) is trivial by (29) and (28). Next (15) is obvious since P_0 = TRUE (note that t ≤ t' ≤ ∞ in all points of A). Now we turn to (18). We assume the left-hand side, i.e., H_k(w, t, w', t') and w > 0. Turning to (9) we note that D*(w, t, σ, t) implies (t = ∞) ∨ (σ = w + 1). So by the induction assumption R_k ⊂ P_k we derive from H_k(w, t, w', t') that σ exists such that

((t = ∞) ∨ (σ = w + 1)) ∧ ((σ > 0) → (t + 1 + k ≤ t' ≤ ∞)) .

If w > 0 we deduce (t = ∞) ∨ (σ > 0), so (t = ∞) ∨ (t + 1 + k ≤ t' ≤ ∞), whence t + 1 + k ≤ t' ≤ ∞. Therefore we have proved (18). Finally (19) is trivial since already ¬B(w) implies P_(k+1)(w, t, w', t') by (29).
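The non-termination argument can also be watched numerically. In the sketch below (illustrative only; Ω is truncated to 0..7 so that the few unrollings shown stay inside the model, and all names are my choices), each R_k for the loop "while w > 0 do w := w + 1" started at w = 1 rejects every finishing time t' < t + k, in line with P_k of (29); since Totinf(RECURS(φ)) lies below every R_k, only t' = ∞ survives.

```python
INF = float('inf')
OMEGA = list(range(8))            # 0..7: big enough for four unrollings from w = 1
TIME = list(range(6)) + [INF]

def star(P):
    def Ps(w, t, w2, t2):
        if t2 < INF:
            return P(w, t, w2, t2)
        return any(P(w, t, r, INF) for r in OMEGA)
    return Ps

def conv(P, Q):
    def PQ(w, t, w2, t2):
        return any(P(w, t, sw, st) and Q(sw, st, w2, t2)
                   for sw in OMEGA for st in TIME if t <= st <= t2)
    return PQ

def ifte(B, P, Q):
    return lambda w, t, w2, t2: P(w, t, w2, t2) if B(w) else Q(w, t, w2, t2)

SKIP      = star(lambda w, t, w2, t2: w == w2 and t == t2)
DELAY     = star(lambda w, t, w2, t2: w == w2 and t2 == t + 1)
ADLIBITUM = lambda w, t, w2, t2: True

B   = lambda w: w > 0                  # the divergent loop of (27)
TAU = star(lambda w, t, w2, t2: w2 == w + 1 and t == t2)

def psi(P):
    return ifte(B, conv(TAU, conv(DELAY, P)), SKIP)

R_list, Rk = [], ADLIBITUM
for _ in range(4):                     # R_1 .. R_4
    Rk = psi(Rk)
    R_list.append(Rk)

# from w = 1 every R_k forces t' >= t + k, as in P_k of (29):
for k, r in enumerate(R_list, start=1):
    assert not any(r(1, 0, w2, t2) for w2 in OMEGA for t2 in range(k))
    assert r(1, 0, 0, k)               # ... while t' = k is still allowed by R_k
    assert r(1, 0, 0, INF)             # and t' = oo satisfies every R_k
```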
8. A CONTEMPLATION ON SYNTAX AND SEMANTICS

There is a world of semantics and a world of syntax. We use the word "world" in order to avoid having to be very precise. It means something like "area of attention". Let us call these worlds SEM and SYN. In SEM we talk about sets, relations and mappings in the usual mathematical sense. These mathematical "objects" are discussed in ordinary mathematical language. In SYN we talk about strings of characters, and in particular about special strings which are called "programs". Again we use ordinary mathematical language to discuss these linguistic objects. So both for SEM and for SYN we use mathematical language, but the "objects" are different. Gradually we discover possibilities to link objects in SYN to objects in SEM, but there is always trouble with the metalanguages of SEM and SYN, in particular in those cases where we use one and the same word (like "variable") in different meanings in the two metalanguages. In spite of the formidable amount of knowledge about formal languages it must be said that SYN is a poor man's world, an underdeveloped country. SYN cannot really live without SEM, but SEM can certainly live very comfortably without SYN, just by developing a bit of extra metalanguage. Let us compare the situation of computer programs with a subject that came up about two thousand years earlier: geometrical constructions with ruler and compass. In this geometrical case SEM is the world of geometrical objects and logical discussions about those objects. Since the whole of mathematics is available to SEM, it includes sets and mappings. As an example we mention that there is a mapping that attaches to each pair (P, r), where P is a point and r a line segment, the set circle(P, r), which is the set of all points in our plane which have the distance r to P. Let us call the act of getting the set circle(P, r) from P and r a construction.
In the metalanguage we now describe sequences of such constructions, which lead from a set of objects we are assumed to "have" at the start, to the objects which somehow interest us. We say that this sequence constructs these interesting objects. Parallel to this sequence we have a sequence of actions in our physical world on physical paper with physical ruler, compass and pencil, and for this sequence of physical actions the sequence in SEM is a mathematical model. But we have to emphasize that the world of SEM is bigger than this. We might
study constructions for which no physical realization is available. Now where is SYN in this case? Let us hire people to carry out the geometrical constructions we invented. Assume these people are unable to understand our metalanguage. We have to instruct them very precisely what to do at each step. To that end we invent a system for coding instructions, and these coded instructions are the programs of SYN. (One might say that LOGO is a kind of programming language for at least some geometrical constructions.) The question whether a sequence of commands in SYN corresponds exactly to the sequence we had in SEM's metalanguage, is independent of the question whether we actually execute or can execute these commands physically. Let us now get to computer programs. The historical order seems to be somewhat different from the old geometrical case. Most of it started with programs (which had to be very precisely defined) plus a somewhat informal notion of state space and a possibly even more informal notion of time. At the moment the need for "program correctness proofs" was felt, SYN was much better developed than SEM. It is quite natural that this resulted in various ways to treat program correctness which were mostly SYN-centered. It became SYN with a tiny bit of SEM (like state space and predicates), or SYN with a lot of SEM (like fixed point theory). Even the term "program correctness" itself bears the traces of this. The term suggests syntactical correctness, but means something different: it means that a program is correct with respect to some semantical specification. In the geometrical case one of course feels that the matter of correctness of a geometrical construction (like the question of whether our construction for a regular pentagon really leads to a pentagon that is regular) is a matter of SEM only, and that the question has nothing to do with the way we have coded the construction in SYN.
Yet we can of course raise the question whether the execution of a given coded description of a construction leads to a proper pentagon. But it would be a clear case for "separation of concerns" to split this question into (i) whether the construction is correct, and (ii) whether the coding is correct. There does not seem to be a good reason for tying things to SYN. In ordinary mathematical language we can define anything we need, like sets and mappings, without ever bothering about the kind of notation we use. There is a strong notion of equality in mathematics, with the effect that one and the same object can be described in various syntactically different ways. This is true for "objects" as well as for "actions". So let us try to keep the matter of program semantics away from SYN. In SEM we can express in ordinary mathematical language everything we want for
program correctness, and in a formal checking system (like Automath) we can speak of integrated semantics. In integrated semantics we can describe logic, mathematics, programs and program semantics all in one and the same system. And a compiler would be able to read these mathematically defined programs and to translate them into machine language without ever using the computer languages we usually think of. The total effect of integrated semantics will be simplification. It might also be a satisfactory framework in which other semantical systems can be placed and compared. In SYN-free semantics one can consider programs which are not representable syntactically at all. It might be possible to characterize the representable programs in the set of all programs by means of properties like monotonicity (see Section 9), and show that things like fixed point theory can be developed on the basis of such properties. But why go into all that trouble? In Section 6 we showed an example where an important ingredient of practical programming was treated semantically without any reference to such properties, and it seems likely that we can go quite a distance in this style. In SYN-free semantics it seems to be attractive to identify the notion of a "program" with the notion of the semantic information of that program. Yet there is something to say for the idea of creating a separate set (the set of programs) and to map it into the set of relations by a mapping "Totinf", as we did in Section 3. This policy anyway leaves various possibilities open. In particular we keep the possibility to add equality and equivalence assumptions in the set of programs, and such assumptions might be adjustable to later mappings of the programs in SYN into this set of programs. So we do not require that every element of Prog(Ω) is representable in SYN, and we do not require that two elements of Prog(Ω) are equal if they have the same semantics.
Note that sometimes the notion of equality in SYN might be stronger than the one in Prog(Ω). For example, the repeated concatenations

x := x + 1 ; (x := x + 2 ; x := x - 1)
(x := x + 1 ; x := x + 2) ; x := x - 1

might be considered as equal in SYN, and their counterparts in Prog(Ω) are different but semantically equivalent. On the other hand the programs

x := x + 1 ; x := x + 2
x := x + 3

will be considered to be different in SYN, as well as different in Prog(Ω), but yet semantically equivalent.
For the time being we just leave it open what Prog(Ω) is. Followers of SYNful semantics might like to identify it with the set of all their programs, and their antagonists might like to identify it with the set of all relations, like Pred(A).
9. COMMENTS ON THE SEMANTICS OF RECURSION

In Section 6 we defined the semantics of RECURS (for any program-to-program function φ) by means of formulas (1), (2), (3), (4). In this section we give some arguments for this choice, and we compare it with other possibilities. In the process of evaluating and comparing we shall appeal to more or less intuitive ideas on the structure and execution of computer programs. It should be stipulated that those ideas are certainly not substantiated in all respects by the treatment of program semantics as explained thus far in this paper.

Let us first ignore the ψ of Section 6, and just work with the iterates of φ itself. If p is a program then

φ(p), φ²(p), φ³(p), ...
are programs. In order to facilitate the discussion we assume for a moment that all these programs are deterministic. We take any initial value w, and we ask what happens in the execution. Our intuition says: either the recursive program is non-terminating, or there is a k such that in the execution of φᵏ(p) the p is not executed at all. This also means that the executions of φᵏ(p), φᵏ⁺¹(p), ... are all equal, as far as the initial value w is concerned, and that they are all equal to the executions of φᵏ(v), φᵏ⁺¹(v), ... for any other program v. In the case of non-deterministic programs these things are harder to explain, but the idea remains the same. In the preparation of note [de Bruijn 73d] this idea led to a particular choice of p: p is a program with

Totinf(p) = X(t' = ∞)
which means that every execution of p is non-terminating. Consequently: if any execution of φᵏ(p) actually executes p, then that execution of φᵏ(p) is non-terminating too. So we get the semantics we expect, if we say that (w, t, w', t') satisfies the predicate Totinf(φᵏ(p)) for all large k (or at least for infinitely many k). In the case of termination the p in φᵏ(p) is not executed at all if k is large. In the case of non-termination the p is executed in all these φᵏ(p), and that takes care of the truth of Totinf(φᵏ(p)) at (w, t, w', ∞) for some w'. The objection one might make is the lack of monotonicity in the sequence. Let us discuss monotonicity first in general terms. If φ is a program-to-program
N.G. de Bruijn
function we can "expect" φ to be monotonic in the sense that

π ≤ σ  ⟹  φ(π) ≤ φ(σ)

(with ≤ defined as in Section 3) for all programs π, σ: if we know more about π than about σ, then we know more about φ(π) than about φ(σ). Unfortunately we cannot apply this in the case of the sequence p, φ(p), φ²(p), ... since there is no guarantee that either p ≤ φ(p) or φ(p) ≤ p for the "non-termination" program p. We are so much better off with adlibitum: adlibitum ≥ φ(adlibitum), and if φ is monotonic this leads to

adlibitum ≥ φ(adlibitum) ≥ φ²(adlibitum) ≥ ... .
Unfortunately we do not get the proper semantics this way. The simplest example is the one where φ is the identity: φ(π) = π for all programs π. Then the limit of the sequence with entries Totinf(φᵏ(adlibitum)) is just Totinf(adlibitum). This means that the sequence provides no semantic information at all: it just says that anything may happen, and not that RECURS(φ) is definitely non-terminating. This objection has been overcome in Section 6 of this paper by taking ψ instead of φ. The extra executions of "delay" have the effect that for this particular φ

Totinf(ψᵏ(adlibitum)) = X(t + k ≤ t' ≤ ∞).
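The descent of this sequence and its limit can be checked in a small sketch, purely illustrative: we fix t = 0, restrict t' to a finite window of times plus ∞, and represent each Totinf(ψᵏ(adlibitum)) by its set of possible values of t'.

```python
import math

INF = math.inf
TIMES = list(range(10)) + [INF]          # finite fragment of the time domain plus ∞

def P(k):
    # possible values of t' under X(t + k ≤ t' ≤ ∞), with t fixed at 0
    return {t1 for t1 in TIMES if t1 >= k}

# The sequence is monotonically decreasing ...
assert all(P(k) >= P(k + 1) for k in range(10))

# ... and its limit (the intersection) retains only t' = ∞, i.e. X(t' = ∞).
limit = set.intersection(*(P(k) for k in range(11)))
assert limit == {INF}

# Each single P(k) is already safe (over-)information about the limit.
assert all(P(k) >= limit for k in range(11))
```

The last assertion is the practical advantage of monotonicity discussed below: any one term of the sequence can be used as information about the limit.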
Taking limits for k → ∞ we get X(t' = ∞), and that is what we wanted. Here we use the following definition of the limit of a monotonically decreasing sequence

π₁ ≥ π₂ ≥ π₃ ≥ ... .    (1)

We say that

πₖ → π  (k → ∞)    (2)

if for all (w, t, w', t')

limₖ Pₖ(w, t, w', t') = P(w, t, w', t'),

where Pₖ = Totinf(πₖ), P = Totinf(π). We note that Totinf(π) is uniquely determined by the sequence π₁, π₂, ..., and moreover that

πₖ ≥ π    (3)
for all k. In [de Bruijn 73d] we did not have monotonicity of the sequence of programs, and we had to have "lim sup" instead of "lim". The fact that we have monotonicity in the present semantics has the obvious advantage that the lim of a
monotonic sequence is nicer to deal with than the lim sup of a non-monotonic sequence. If we have (1) and (2), then for any arbitrary k we can use Totinf(πₖ) as information about π, since (3) expresses Totinf(πₖ) ⊇ Totinf(π). This is much simpler than with lim sup, where we can obtain information about the lim sup only if we have information about Totinf(πₖ) for infinitely many k.

Let us now discuss the idea of RECURS(φ) being a fixed point. Intuitive ideas about execution suggest the "fixed point statement"

Totinf(ψ(RECURS(φ))) = Totinf(RECURS(φ)),    (4)
but it is not easy to actually prove this without making restrictive assumptions about the class of constructs we take φ from. Just monotonicity will not do. Monotonicity does suffice for the weaker result

ψ(RECURS(φ)) ≤ RECURS(φ).    (5)

This follows if we apply ψ to both sides of the inequality (cf. (3))

RECURS(φ) ≤ ψᵏ(adlibitum)

and take limits for k → ∞. The equality (4) is easy if we assume that ψ is not just monotonic but also continuous. We take the latter notion in the sense that

lim ψ(πₖ) = ψ(lim πₖ)    (6)
for any sequence satisfying (1). However, it may be quite hard to establish continuity, even in this weak sense, for all ψ's arising from a given set of program constructs. In particular we have to bear in mind that we may wish to apply RECURS to functions φ which in their definition contain applications of RECURS already. If we take our set of program constructs a bit too wide, it is easy to kill continuity even for very simple program functions. We shall present an example that might give an idea of the kind of restrictions we might have to build in.

Let Ω be the set of integers, and imagine that for every k we have a program πₖ with

Totinf(πₖ) = X*(((w' ≥ k) ∧ (t = t')) ∨ (t' = ∞)).

We let σ be described similarly by

Totinf(σ) = X*((w' = 0) ∧ (t = t')).

We define the program-to-program function ϑ by means of concatenation with σ:

ϑ(π) = (π; σ)
for all π. We have π₁ ≥ π₂ ≥ ... . If we call the limit π, it follows from the monotonicity of ϑ that ϑ(π₁) ≥ ϑ(π₂) ≥ ... . Let us put ρ = lim ϑ(πₖ). We hope that ρ and ϑ(π) are semantically equivalent, but unfortunately this is not the case:

Totinf(ρ) = X*(((w' = 0) ∧ (t = t')) ∨ (t' = ∞)),
Totinf(ϑ(π)) = X*(t' = ∞).
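The failure of continuity here can be checked schematically. Since the state space is all the integers it cannot be enumerated, but every predicate involved has the shape (terminating part) ∨ (t' = ∞), and the argument only turns on whether the terminating part is non-empty; the sketch below (an illustration, not the paper's formalism) tracks exactly that one bit.

```python
def pi_terminates(k):
    # π_k has terminating runs: some w' ≥ k always exists among the integers.
    return True

def theta_terminates(p_terminates):
    # ϑ(π) = (π; σ): σ runs only after a terminating run of π, so ϑ(π) has a
    # terminating run (then with w' = 0) exactly when π has one.
    return p_terminates

# The limit of the π_k: a terminating quadruple would need w' ≥ k for every k,
# which is impossible, so the limit has no terminating part ...
lim_pi_terminates = False
# ... hence ϑ(lim π_k) allows only t' = ∞:
assert theta_terminates(lim_pi_terminates) is False
# ... but every ϑ(π_k) keeps the run (w' = 0, t' = t), so the limit of the
# ϑ(π_k) keeps it too: lim ϑ(π_k) ≠ ϑ(lim π_k), and ϑ is not continuous.
assert all(theta_terminates(pi_terminates(k)) for k in range(1000))
```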
So a general proof of (4), which means (6) with πₖ = ψᵏ(adlibitum), has to depend on more knowledge about the sequence than just monotonicity.

Another point that has to be raised is the maximality. Let us assume that Φ is a "predicate transformer" which is such that Totinf(ψ(π)) = Φ(Totinf(π)) for all π. Then if P = Totinf(RECURS(φ)) we can read (4) as

Φ(P) = P.

Now let Q be any other predicate with Φ(Q) = Q. Then just assuming monotonicity we can show that Q ⊂ P, in other words: P is the maximal fixed point of Φ. In order to show Q ⊂ P we remark that

TRUE ⊃ Φ(TRUE) ⊃ Φ²(TRUE) ⊃ ...

and that the limit of the sequence Φᵏ(TRUE) is P. Comparing this with the sequence Q, Φ(Q), Φ²(Q), ... (of which all entries equal Q) we infer from Q ⊂ TRUE, by monotonicity of Φ, that Q ⊂ Φᵏ(TRUE), and therefore Q ⊂ P.

The question arises whether it is really worth the trouble of finding satisfactory restrictions on ψ that guarantee (4). After all, we have shown in Section 7 that we can get to quite practical statements on actual programs without ever going into notions like continuity and fixed points. We did not even have to mention monotonicity! It might be easier to prove (4) under restrictive assumptions like finiteness of state space, or exclusion of non-determinism. But such restrictions do not seem to be attractive for the practical discussion of actual programs. Anyway there is quite a distance between the definition of recursion semantics by means of lim(Totinf(ψᵏ(adlibitum))) and any definition based on the idea of a maximal fixed point. It is a matter of opinion which one of the two ideas is preferable as the definition of the semantics of recursion.
The two kinds of semantics get closer together if we take a more liberal interpretation of non-termination. In this more liberal version a semantical statement that says (with w, t given) that t' = ∞ is a possible effect, has to be interpreted as saying that there is no upper bound to the values of the t' of the possible executions (with initial values w, t). After all, if we are interested in having our programs terminated, a statement that a program might run for a million years does not give us much more comfort than a statement that it might go on for ever. Therefore, it is quite reasonable to identify "unpredictably long" with "infinitely long".

This "liberal version" is related to the following definition. If P ∈ Pred(A) then instead of the P* of Section 2 we define P⁺ by: P⁺(w, t, w', t') = P(w, t, w', t') if t' < ∞, and

P⁺(w, t, w', ∞) = ∀u ∈ T\{∞} ∃v ∈ T, v > u, ∃ω ∈ Ω : P(w, t, ω, v).

If the set of all P⁺ with P ∈ Pred(A) is denoted by Pred⁺(A), we have

Pred⁺(A) ⊂ Pred*(A) ⊂ Pred(A).
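A sketch of the closure P ↦ P⁺ can be written down with predicates as Python functions and time running over the naturals plus ∞. The "for every u there is a v > u" is checked up to a search bound, which is sound for the particular P below (whose finite runtimes are either unbounded or a single value); everything here is an assumed illustration, not the paper's formal system.

```python
import math

INF = math.inf

def plus(P, omega, bound=200):
    def Pplus(w, t, w1, t1):
        if t1 != INF:
            return P(w, t, w1, t1)      # finite quadruples are untouched
        # t' = ∞ is declared possible iff the finite runtimes from (w, t)
        # are unbounded (here: exceed every u below the search bound)
        return all(any(P(w, t, v_w, v) for v_w in omega
                                       for v in range(u + 1, bound))
                   for u in range(bound - 1))
    return Pplus

# From w = 0 every t' ≥ t is a possible runtime (unbounded); from w = 1 only
# t' = t. The closure adds t' = ∞ in the first case but not in the second.
P = lambda w, t, w1, t1: t1 != INF and w1 == w and \
        ((w == 0 and t1 >= t) or (w == 1 and t1 == t))
Pp = plus(P, omega=[0, 1])
assert Pp(0, 0, 0, INF) and not Pp(1, 0, 1, INF)
assert Pp(1, 0, 1, 0)
```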
In deterministic cases there is no difference between P* and P⁺. To be more precise, if w, t are such that there is at most one pair w', t' with t' < ∞ and P(w, t, w', t'), then P⁺(w, t, w', t') = P*(w, t, w', t') for all w', t'. In general, if P describes the semantics of a program in the original version (where P(w, t, w', ∞) means that the program can actually run for ever), then P⁺ is the liberal semantics (where P⁺(w, t, w', ∞) means that there is no upper bound to the runtime). If we take the liberal semantics, then our prospects for proving the fixed point property for recursion become much more promising.

The difference between P* and P⁺, and its being related to having non-determinism and infinite state space, can be connected to König's well-known infinity lemma. In order to make the notation sufficiently clear for further discussion, we explain it in a few words. Let (S, r, f) be a rooted tree. That is, S is a set (the set of "points"), r is a special element of S (called the "root") and f is a mapping of S\{r} into S (f(x) is called the "father" of x). It is assumed that for every x (x ≠ r) there is an integer n such that the n-th iterate fⁿ maps x into r. This n is uniquely determined, and is called the "level" of x. The level of r is zero. If x ∈ S, the set of all y ∈ S\{r} with f(y) = x is called the "offspring" of x, and denoted O(x).

If x ∈ S we denote by IP(x) the proposition that there is an infinite path starting from x, that is a sequence x₀, x₁, x₂, ... with x₀ = x and f(xₙ₊₁) = xₙ for n = 0, 1, 2, ... . And by UL(x) (UL abbreviates "unbounded level") we denote
the proposition that for every natural number m there is a path x₀, ..., xₘ, again with x₀ = x, f(xₙ₊₁) = xₙ for n = 0, ..., m-1. We note that for all x

IP(x) ⟹ UL(x).

Finally we formulate the "finite offspring condition". It says that for every x ∈ S the offspring O(x) is a finite set.

König's lemma expresses: if the finite offspring condition holds, and UL(r) is true, then we have IP(r).

Coming back to semantics, we shall try to explain that IP(r) can be compared with infinite runtime, and UL(r) with unpredictably long runtime, in both cases with initial value r. And the finite offspring condition corresponds to a condition that says that the number of possible outputs is finite, for every given input. This condition is certainly guaranteed if the state space is finite, but also if the program is deterministic.

We describe a typical case of a tree where UL(r) holds but IP(r) does not. Let us call it (S₀, r₀, f₀). The points of S₀ are the pairs (i, j) with integers i, j satisfying either 0 < i ≤ j or i = j = 0. The point (0, 0) is taken as the root. For all other points we define the "father" by
f₀(i, j) = (i - 1, j)   if (0 < i ≤ j) ∧ (i ≠ 1),
f₀(i, j) = (0, 0)       if 1 = i ≤ j.
The tree is depicted in Figure 1. In that figure the arrows run from points to their fathers.
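A finite sketch of this tree makes the phenomenon tangible. Only the branches with j ≤ jmax can be materialised, but the point of the example is already visible: levels are unbounded, yet every individual path is finite.

```python
# Points (i, j) with 0 < i ≤ j plus the root (0, 0); the father of (i, j) is
# (i-1, j) for i > 1 and (0, 0) for i = 1, so offspring runs the other way.
def offspring(p, jmax=50):
    i, j = p
    if p == (0, 0):
        return [(1, j) for j in range(1, jmax + 1)]   # one child per branch length
    return [(i + 1, j)] if i < j else []              # (j, j) is an end-point

def longest_path(p):
    return 1 + max((longest_path(c) for c in offspring(p)), default=0)

# The longest path grows without bound as jmax grows, so UL(r0) holds; still,
# each individual path is finite, so IP(r0) fails. König's lemma does not
# apply, because the root's offspring is infinite in the real tree.
assert longest_path((0, 0)) == 51
```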
Figure 1

Coming back to the general tree (S, r, f), we describe programs for which S is the state space. As a primitive program we take the program "step". Its semantics is described by

Totinf(step) = (t' = t) ∧ (w' ∈ O(w)).
Note that "step" is a non-deterministic program, at least if there exist points w where O(w) has more than one element. If O(w) is empty (such an w is called an end-point), then there is no possible output w' to the input w. If this is considered unacceptable, one might take any arbitrary value of w' as output, like w' = w. That means that in the definition of Totinf(step) we replace w' ∈ O(w) by

w' ∈ O(w) ∨ ((O(w) = ∅) ∧ (w' = w)).

But actually the case of empty O(w) is unimportant since the program "step" will not be used there. Let us now discuss the program (see the beginning of Section 7)

while ε do step,

where ε corresponds to the predicate B given by

B(w) = (O(w) ≠ ∅).

In the notation of Formula (2) of Section 7 this program denotes RECURS(φ), where φ is given by

φ(π) = if ε then (step; π) else skip.
The "intuitive", or, if one prefers, "operational" semantics of this program is the following one. Let w (the input) be any point of the tree. If w' is an end-point such that fᵏ(w') = w for some k ≥ 0 then w' is a possible output (and t' = t + k). If IP(w) holds then there is a non-terminating execution. If IP(w) is false but UL(w) still holds then there exist unpredictably long executions. If UL(w) does not hold then all executions starting with w terminate, and there is an upper bound to their runtime.

Let us now investigate Totinf(RECURS(φ)) as defined by (4) in Section 6. As far as terminating executions are concerned, it produces the same results as the intuitive semantics. For values of w where IP(w) holds, it proclaims the possibility that t' = ∞, as it should. But in points where UL(w) holds but IP(w) does not, the semantics of Section 6 still says that t' = ∞ is possible, and the intuitive semantics says it is not. The difference between these two kinds of semantics vanishes (at least for this program) if we identify "unpredictably long runtime" and "non-termination".

In the tree program described here, it is also easy to illustrate that ψ(RECURS(φ)) and RECURS(φ) can have different semantics in the system of Section 6. We take the special tree (S₀, r₀, f₀). We have UL(r₀), but UL(x) is false for all x with f₀(x) = r₀. From this it can be derived that at (r₀, t, w', ∞) (where w' is irrelevant) RECURS(φ) is true but ψ(RECURS(φ)) is not.
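An operational sketch of "while ε do step" on a small assumed tree shows the terminating behaviour: from input w we repeatedly move, non-deterministically, to a child until an end-point is reached. Each iteration is charged one time unit in this sketch (in the paper the time cost comes from the delays in ψ, while "step" itself has t' = t), so an output at level difference k appears at t' = t + k.

```python
offspring = {0: [1, 2], 1: [3], 2: [], 3: []}   # hypothetical example tree

def executions(w, t):
    if not offspring[w]:                         # O(w) = ∅: ε is false, loop exits
        return {(w, t)}
    return set().union(*(executions(v, t + 1) for v in offspring[w]))

# Possible outcomes from the root at time 0: the two end-points below it.
assert executions(0, 0) == {(2, 1), (3, 2)}
```

On a finite tree like this one every execution terminates; the divergence between the two semantics only appears on infinite trees such as (S₀, r₀, f₀).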
Things look much better in the liberal Pred⁺-semantics. We briefly discuss the changes that have to be made. First, in Section 2 we have to introduce Pred⁺(A) instead of Pred*(A). We have to give up the idea of a J such that always P * J = P⁺, like in Formula (2) of Section 2, and it is not generally true that (P * Q)⁺ = P * Q⁺. (A counterexample: Ω = ℕ,

P(w, t, w', t') = (t' = t),  Q(w, t, w', t') = ((w' = w) ∧ (t' = t + w)).)

In Section 3 we have to introduce Totinf⁺ as a mapping of Prog(Ω) into Pred⁺(A). The changes in Section 4 are trivial. In Section 5 the semantics of the concatenation has to be described by

Totinf(π; σ) = (Totinf(π) * Totinf(σ))⁺.

In Section 6 the semantics of RECURS(φ) can be given by

Totinf(RECURS(φ)) = R⁺,

although the definition of R in (4) will guarantee that there is no difference between R and R⁺ as soon as we have monotonicity. We note that if P₀ ⊇ P₁ ⊇ P₂ ⊇ ... with Pₙ = Pₙ⁺ for all n, and if P = limₙ Pₙ, then P = P⁺. In Theorems 1 and 2 of Section 7 the new semantics makes no difference, since they deal with cases with bounded runtime. In the non-termination example at the end of Section 7 there is no difference either, since the program is deterministic. From the tree program discussed earlier in this section it can be seen that ψ(RECURS(φ)) and RECURS(φ) need not have the same Pred*-semantics in non-deterministic cases with infinite state space. If we turn to Pred⁺-semantics, however, it seems that we only need monotonicity properties in order to show that

Totinf⁺(ψ(RECURS(φ))) = Totinf⁺(RECURS(φ)),

which means that RECURS(φ) can be considered as the maximal fixed point.
10. A COMMENT ON ABORTION
We have not yet described the notion of abortion in our semantical system. Here we first discuss an attempt to treat abortion in a way that seems to be promising at first, and we shall explain why it is not satisfactory. The attempt is this one. If π is a program, and w, t are initial state and time such that there do not exist any w', t' (not even with t' = ∞) such that Totinf(π)(w, t, w', t') holds, then we might try to interpret this as abortion. This means: with the initial w, t the execution of π will have been interrupted at
some point. The semantical system does not disclose at exactly which point further execution is refused by the machine, simply because it never discusses executional details. This point of view seems to be very promising for Dijkstra's guarded command statement

if ε₁ → α₁ □ ... □ εₖ → αₖ fi    (1)

where each εᵢ corresponds to a predicate Bᵢ (like in our discussion of the "if then else" in Section 4). The semantics is: "select at random an i such that Bᵢ(w) holds, and then execute αᵢ; if there is no such i then abort". If abortion is interpreted in the style "no possible w', t'" we indicated above, then the total information of the program (1) is given by

(B₁(w) ∧ P₁(w, t, w', t')) ∨ ... ∨ (Bₖ(w) ∧ Pₖ(w, t, w', t')),    (2)
where Pᵢ = Totinf(αᵢ) (i = 1, ..., k). The simplicity of (2) seems to be a positive point both for Dijkstra's semantics and for the "no possible w', t'" interpretation of abortion.

Unfortunately we have to admit that the "no possible w', t'" interpretation of abortion conflicts with the idea of non-deterministic programs. We show this with a concatenation "a; u" where a is a non-deterministic program and u is a program that sometimes aborts. Let b and c be two different elements of Ω, and let

Totinf(a)(w, t, w', t') = ((w' = b) ∨ (w' = c)) ∧ (t = t'),
Totinf(u)(w, t, w', t') = (w ≠ b) ∧ (w = w') ∧ (t = t').

So u leads to abortion with the initial state b, but is harmless with all other initial states. By our semantic rule on concatenation we have

Totinf(a; u)(w, t, w', t') = (w' = c) ∧ (t' = t)    (3)

so here there is no abortion for any w, t. Therefore (3) does not describe the situation adequately, since we expect that the semantics of "a; u" is: whatever w, t is, we have either abortion or (w' = c) ∧ (t' = t).

A more satisfactory way to incorporate abortion into a semantical system is by means of a special boolean variable. Let us call it ab (for "abortion"). At the start of a program we add the assignment "ab := false" (interpretation: no abortion thus far), and every sub-program u of the program is replaced by "if ¬ab then u else skip". If in the final state we have ab = true then we interpret this by saying that the program execution has been aborted. In addition to this the sub-programs may contain assignments "ab := true" in those cases where we actually want abortion to take place. We might want to
do this in the guarded command statement. Another example is overflow: if we do not wish to handle numbers exceeding m, we might transform "y := 1/p" into "if |1/p| ≤ m then y := 1/p else ab := true". Let us use the word "refuser" for such an extra boolean variable like ab. The essential thing for a refuser r is that sub-programs u are to be remodelled into "if ¬r then u else skip". Refusers can be used for the semantic discussion of forward goto's as well.
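The refuser idea can be sketched as follows; the names and the overflow policy here are assumed for illustration, not taken from the paper. The state carries a boolean ab, every sub-program u is remodelled into "if not ab then u else skip", and an aborting step sets ab instead of leaving the semantics without any possible (w', t').

```python
def wrap(u):
    # "if not ab then u else skip": once ab holds, wrapped sub-programs skip.
    return lambda state: state if state['ab'] else u(state)

def div_step(state):
    # "y := 1/p" guarded by a hypothetical overflow bound m: outside the
    # bound (or for p = 0) we record the abortion in ab rather than get stuck.
    m, p = 100, state['p']
    if p != 0 and abs(1 / p) <= m:
        return {**state, 'y': 1 / p}
    return {**state, 'ab': True}

run = wrap(div_step)
ok = run({'p': 2, 'y': 0, 'ab': False})
bad = run({'p': 0, 'y': 0, 'ab': False})
assert ok['y'] == 0.5 and not ok['ab']
assert bad['ab'] and bad['y'] == 0
assert run(bad) == bad        # the aborted state is simply carried along
```

In this style an aborted execution still produces a final state, so the relational semantics of concatenation needs no special case for abortion.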
Bibliography
References
[A.1] de Bruijn, N.G., Verification of mathematical proofs by a computer, A preparatory study for a project AUTOMATH (formerly unpublished, 1967).
[A.2] de Bruijn, N.G., The mathematical language AUTOMATH, its usage and some of its extensions, in: Laudet, M., Lacombe, D. and Schuetzenberger, M., eds., Symposium on Automatic Demonstration, IRIA, Versailles, 1968 (Berlin, Springer Verlag, 1970), 29-61. (Lecture Notes in Math., 125).
[A.3] van Daalen, D.T., A description of Automath and some aspects of its language theory, in: Braffort, P., ed., Proceedings of the Symposium APLASM (Orsay, 1973), Vol. I. Also in [van Benthem Jutting 77], 48-77.
[A.4] Zucker, J., Formalization of classical mathematics in Automath, in: Colloque International de Logique, Clermont-Ferrand, France, 1975 (Paris, CNRS, 1977), 135-145. (Colloques Internationaux du Centre National de la Recherche Scientifique, 249).
[A.5] de Bruijn, N.G., A survey of the project AUTOMATH, in: Seldin, J.P. and Hindley, J.R., eds., To H.B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism (New York/London, Academic Press, 1980), 579-606.
[A.6] van Daalen, D.T., The language theory of Automath, Ph.D. thesis (Eindhoven University of Technology, 1980), Chapter 1, Sections 1-5 (Introduction).
[A.7] de Bruijn, N.G., Reflections on Automath (Eindhoven University of Technology, 1990).
[A.8] Nederpelt, R.P., Type systems - basic ideas and applications, in: van de Goor, A.J., ed., Proceedings of CSN '90, Computing Science in the Netherlands (Amsterdam, Stichting Mathematisch Centrum, 1990), 367-383.
[B.1] van Benthem Jutting, L.S., Description of AUT-68 (Eindhoven University of Technology, 1981). (Memorandum 1981-12, Dept. of Math.).
[B.2] de Bruijn, N.G., AUT-SL, a single line version of AUTOMATH (Eindhoven University of Technology, 1971). (Notitie 71-22, Dept. of Math.).
[B.3] de Bruijn, N.G., Some extensions of AUTOMATH: the AUT-4 family (Eindhoven University of Technology, 1974). (Internal Report, Dept. of Math.).
[B.4] de Bruijn, N.G., AUT-QE without type inclusion (Eindhoven University of Technology, 1978). (Memorandum 1978-04, Dept. of Math.).
[B.5] van Benthem Jutting, L.S., Checking Landau's "Grundlagen" in the AUTOMATH system, Ph.D. thesis (Eindhoven University of Technology, 1977), Appendix 9 (AUT-SYNT).
[B.6] van Daalen, D.T., The language theory of Automath, Ph.D. thesis (Eindhoven University of Technology, 1980), Chapter VIII, Sections 1 and 2 (AUT-II).
[B.7] de Bruijn, N.G., Generalizing Automath by means of a lambda-typed lambda calculus, in: Kueker, D.W., Lopez-Escobar, E.G.K. and Smith, C.H., eds., Mathematical Logic and Theoretical Computer Science, Proceedings of the Maryland 1984/85 Special Year in Mathematical Logic and Theoretical Computer Science (New York, Marcel Dekker, 1987), 71-92. (Lecture Notes in Pure and Appl. Math., 106).
[B.8] Balsters, H., Lambda calculus extended with segments, Ph.D. thesis (Eindhoven University of Technology, 1986), Chapter 1, Sections 1.1 and 1.2 (Introduction).
[C.1] van Benthem Jutting, L.S., A normal form theorem in a λ-calculus with types, in: Mitt. d. Gesellsch. f. Math. u. Datenverarb. 17, Tagung üb. form. Sprachen u. Programmiersprachen, Oberwolfach, Germany (1971), 27-32.
[C.2] de Bruijn, N.G., Lambda calculus notation with nameless dummies, a tool for automatic formula manipulation, with application to the Church-Rosser theorem, Indagationes Math. 34, 5 (1972), 381-392.
[C.3] Nederpelt, R.P., Strong normalisation in a typed lambda calculus with lambda structured types, Ph.D. thesis (Eindhoven University of Technology, 1973).
[C.4] de Vrijer, R.C., Big trees in a λ-calculus with λ-expressions as types, in: Böhm, C., ed., λ-Calculus and Computer Science Theory (Berlin, Springer Verlag, 1975), 252-271. (Lecture Notes in Comp. Sc., 37). Also: de Vrijer, R.C., Surjective pairing and strong normalization, Ph.D. thesis (University of Amsterdam, 1987), Chapter 5.
[C.5] van Daalen, D.T., The language theory of Automath, Ph.D. thesis (Eindhoven University of Technology, 1980), Parts of Chapters II, IV, V - VIII.
[C.6] van Benthem Jutting, L.S., The language theory of Λ, a typed lambda calculus where terms are types (Eindhoven University of Technology, 1985). (Memorandum 1985-02, Dept. of Math. and Comp. Sc.).
[D.1] de Bruijn, N.G., Example of a text written in Automath (formerly unpublished, 1968).
[D.2] van Benthem Jutting, L.S., Checking Landau's "Grundlagen" in the AUTOMATH system, Ph.D. thesis (Eindhoven University of Technology, 1977), Parts of Chapters 0, 1 and 2 (Introduction, Preparation, Translation).
[D.3] van Benthem Jutting, L.S., Checking Landau's "Grundlagen" in the AUTOMATH system, Ph.D. thesis (Eindhoven University of Technology, 1977), Chapter 4 (Conclusions).
[D.4] van Benthem Jutting, L.S. and de Vrijer, R.C., A text fragment from Zucker's "Real Analysis" (1994).
[D.5] van Benthem Jutting, L.S., Checking Landau's "Grundlagen" in the AUTOMATH system, Ph.D. thesis (Eindhoven University of Technology, 1977), Appendices 3 and 4 (The PN-lines from the preliminaries; Excerpt for "Satz 27").
[E.1] Zandleven, I., A verifying program for Automath, in: Braffort, P., ed., Proceedings of the Symposium APLASM (Orsay, 1973), Vol. I.
[E.2] van Benthem Jutting, L.S., Checking Landau's "Grundlagen" in the AUTOMATH system, Ph.D. thesis (Eindhoven University of Technology, 1977), Parts of Chapter 3 (Verification).
[E.3] van Benthem Jutting, L.S., An implementation of substitution in a λ-calculus with dependent types (Philips Research Laboratories Eindhoven, Eindhoven University of Technology, 1988).
[F.1] de Bruijn, N.G., Set theory with type restrictions, in: Hajnal, A., Rado, R. and Sós, V.T., eds., Infinite and Finite Sets, I, International Colloquium, Keszthely, Hungary, 1973 (1975), 205-214. (Colloquia Math. Soc. János Bolyai, 10).
[F.2] de Bruijn, N.G., Formalization of constructivity in Automath, in: de Doelder, P.J., de Graaf, J. and van Lint, J.H., eds., Papers dedicated to J.J.
Seidel (Eindhoven University of Technology, 1984), 76-101. (EUT-Report 84-WSK-03).
[F.3] de Bruijn, N.G., The Mathematical Vernacular, a language for mathematics with typed sets, in: Dybjer, P. et al., eds., Proceedings of the Workshop on Programming Languages, Marstrand, Sweden 1987. Plus: Formalizing the Mathematical Vernacular (formerly unpublished, 1982), Examples of an MV Book.
[F.4] Wieringa, R.M.A., Relational semantics in an integrated system (Eindhoven University of Technology, 1980). (Internal Report, Dept. of Math.).
[F.5] de Bruijn, N.G., Computer program semantics in space and time (Eindhoven University of Technology, 1983). (Internal Report, Dept. of Math. and Comp. Sc.).
[Abadi et al. 91] Abadi, M., Cardelli, L., Curien, P.-L. and Lévy, J.-J., Explicit substitutions, Functional Programming 1, 4 (1991), 375-416.
[Andrews 71] Andrews, P., Resolution in type theory, Journ. of Symb. Logic 36 (1971), 414-432.
[Backhouse et al. 89] Backhouse, R., Chisholm, P. and Malcolm, G., Do-it-yourself type theory, Formal Aspects of Computing 1 (1989), 19-84.
[Balsters 86] Balsters, H., Lambda calculus extended with segments, Ph.D. thesis (Eindhoven University of Technology, 1986). See also [B.8].
[Barendregt 71] Barendregt, H.P., Some extensional models for combinatory logics and λ-calculi, Ph.D. thesis (Utrecht University, 1971).
[Barendregt 74] Barendregt, H.P., Pairing without conventional restraints, Zeitschr. f. math. Logik u. Grundl. d. Math. 20 (1974), 289-306.
[Barendregt 77] Barendregt, H.P., The type free λ-calculus, in: Barwise, J., ed., Handbook of Mathematical Logic (North Holland Publishing Co., Amsterdam, 1977), 1091-1132. (Studies in Logic and the Foundations of Math., Vol. 90).
[Barendregt 81] Barendregt, H.P., The Lambda Calculus: Its Syntax and Semantics (North Holland Publishing Co., Amsterdam, 1981).
[Barendregt 84a] Barendregt, H.P., The Lambda Calculus: Its Syntax and Semantics, Revised edition (North Holland Publishing Co., Amsterdam, 1984).
[Barendregt 84b] Barendregt, H.P., Introduction to lambda calculus, Nieuw Archief voor Wiskunde 4, 2 (1984), 337-372.
[Barendregt 91] Barendregt, H.P., Introduction to generalized type systems, Journal of Functional Programming 1, 2 (1991), 125-154.
[Barendregt 92] Barendregt, H.P., Lambda calculi with types, in: Abramsky, S., Gabbay, D.M. and Maibaum, T., eds., Handbook of Logic in Computer Science (Oxford, Clarendon Press, 1992), Vol. 2, 117-309.
[Barendregt et al. 76] Barendregt, H.P., Bergstra, J., Klop, J.W. and Volken, H., Representability in lambda algebras, Indagationes Math. 38 (1976), 377-387.
[Barendregt and Hemerik 90] Barendregt, H.P. and Hemerik, C., Types in lambda calculi and programming languages, in: Jones, N., ed., European Symposium on Programming (Berlin, Springer Verlag, 1990), 1-36. (Lecture Notes in Comp. Sci., 432).
[Barendsen 89] Barendsen, E., Representation of logic, data types and recursive functions in typed lambda calculus, Master's thesis (University of Nijmegen, 1989).
[van Benthem Jutting 71a] van Benthem Jutting, L.S., On normal forms in AUTOMATH (Eindhoven University of Technology, 1971). (Notitie 71-24, Dept. of Math.).
[van Benthem Jutting 71b (C.1)] van Benthem Jutting, L.S., A normal form theorem in a λ-calculus with types, in: Mitt. d. Gesellsch. f. Math. u. Datenverarb. 17, Tagung üb. form. Sprachen u. Programmiersprachen, Oberwolfach, Germany (1971), 27-32.
[van Benthem Jutting 73] van Benthem Jutting, L.S., The development of a text in AUT-QE, in: Braffort, P., ed., Proceedings of the Symposium APLASM (Orsay, 1973), Vol. I.
[van Benthem Jutting 76] van Benthem Jutting, L.S., A translation of Landau's "Grundlagen" in AUTOMATH, Vol. 1-5 (Eindhoven University of Technology, 1976).
[van Benthem Jutting 77] van Benthem Jutting, L.S., Checking Landau's "Grundlagen" in the Automath system, Ph.D. thesis (Eindhoven University of Technology, 1977). Published as Mathematical Centre Tracts nr.
83 (Amsterdam, Mathematisch Centrum, 1979). See also [B.5], [D.2], [D.3], [D.5] and [E.2].
[van Benthem Jutting 81 (B.1)] van Benthem Jutting, L.S., Description of AUT-68 (Eindhoven University of Technology, 1981). (Memorandum 1981-12, Dept. of Math.).
[van Benthem Jutting 85 (C.6)] van Benthem Jutting, L.S., The language theory of Λ, a typed lambda calculus where terms are types (Eindhoven University of Technology, 1985). (Memorandum 1985-02, Dept. of Math. and Comp. Sc.).
[van Benthem Jutting 88 (E.3)] van Benthem Jutting, L.S., An implementation of substitution in a λ-calculus with dependent types (Philips Research Laboratories Eindhoven, Eindhoven University of Technology, 1988).
[van Benthem Jutting and Wieringa 79] van Benthem Jutting, L.S. and Wieringa, R.M.A., Representatie van expressies in het verificatieprogramma VERA 1979 (Eindhoven University of Technology, 1980). (Memorandum 1979-15, Dept. of Math.).
[van Benthem Jutting and de Vrijer 94 (D.4)] van Benthem Jutting, L.S. and de Vrijer, R.C., A text fragment from Zucker's "Real Analysis" (1994).
[Ben-Yelles 81] G-stratification is equivalent to F-stratification, Zeitschr. f. Math. Logik u. Grundl. d. Math. 27 (1981), 141-150.
[Berkling and Fehr 82] Berkling, K.J. and Fehr, E., A modification of the λ-calculus as a base for functional programming languages, in: Nielsen, M. and Schmidt, E.M., eds., Automata, Languages and Programming, 9th International Colloquium, Aarhus (Berlin, Springer Verlag, 1982), 35-47. (Lecture Notes in Computer Science, 140).
[Bishop 67] Bishop, E., Foundations of Constructive Analysis (New York, McGraw-Hill, 1967).
[de Boer 75] de Boer, S., De ondefinieerbaarheid van Church' δ-functie in de λ-calculus en Barendregt's lemma, Master's thesis (Eindhoven University of Technology, 1975).
[Boyer and Moore 72] Boyer, R.S. and Moore, J.S., The sharing of structure in theorem-proving programs, Machine Intelligence 7 (Edinburgh, Edinburgh University Press, 1972), 101-113.
[Boyer and Moore 88] Boyer, R.S. and Moore, J.S., A Computational Logic Handbook (Boston, Academic Press, 1988).
[de Bruijn 67 (A.1)] de Bruijn, N.G., Verification of mathematical proofs by a computer, A preparatory study for a project AUTOMATH (formerly unpublished, 1967).
[de Bruijn 68a (D.1)] de Bruijn, N.G., Example of a text written in Automath (formerly unpublished, 1968).
[de Bruijn 68b] de Bruijn, N.G., AUTOMATH, a language for mathematics (Eindhoven University of Technology, 1968). (T.H.-Report 68-WSK-05).
[de Bruijn 70a (A.2)] de Bruijn, N.G., The mathematical language AUTOMATH, its usage and some of its extensions, in: Laudet, M., Lacombe, D. and Schuetzenberger, M., eds., Symposium on Automatic Demonstration, IRIA, Versailles, 1968 (Berlin, Springer Verlag, 1970), 29-61. (Lecture Notes in Math., 125).
[de Bruijn 70b] de Bruijn, N.G., On the use of bound variables in AUTOMATH (Technological University, Eindhoven, 1970). (Notitie 70-9, Dept. of Math.).
[de Bruijn 71 (B.2)] de Bruijn, N.G., AUT-SL, a single line version of AUTOMATH (Eindhoven University of Technology, 1971). (Notitie 71-22, Dept. of Math.).
[de Bruijn 72a] de Bruijn, N.G., Some abbreviations in the input language for AUTOMATH (Eindhoven University of Technology, 1972). (Notitie 72-15, Dept. of Math.).
[de Bruijn 72b (C.2)] de Bruijn, N.G., Lambda calculus notation with nameless dummies, a tool for automatic formula manipulation, with application to the Church-Rosser theorem, Indagationes Math. 34 (1972), 381-392.
[de Bruijn 73a] de Bruijn, N.G., A theory of generalized functions, with applications to Wigner distribution and Weyl correspondence, Nieuw Archief voor Wiskunde 3, 21 (1973), 205-280.
[de Bruijn 73b] de Bruijn, N.G., AUTOMATH, a language for mathematics, A series of lectures at the Séminaire de mathématiques supérieures, Université de Montréal, 1971, Lecture Notes by B. Fawcett (Les Presses de l'Université de Montréal, 1973).
[de Bruijn 73c] de Bruijn, N.G., The AUTOMATH Mathematics Checking Project, in: Braffort, P., ed., Proceedings of the Symposium APLASM (Orsay, 1973), Vol. I. Reprinted in: Kopania, J., ed., Studies in Logic, Grammar and Rhetoric, Humanities, Vol. II (Bialystok, 1983). (Papers of Warsaw University, 40).
[de Bruijn 73d] de Bruijn, N.G., A system for handling syntax and semantics of computer programs in terms of the mathematical language AUTOMATH (Eindhoven, unpublished, 1973).
[de Bruijn 74a] de Bruijn, N.G., A framework for the description of a number of members of the AUTOMATH family (Eindhoven University of Technology, 1974). (Memorandum 1974-08, Dept. of Math.).
[de Bruijn 74b (B.3)] de Bruijn, N.G., Some extensions of AUTOMATH: the AUT-4 family (Eindhoven University of Technology, 1974). (Internal Report, Dept. of Math.).
[de Bruijn 75a (F.1)] de Bruijn, N.G., Set theory with type restrictions, in: Hajnal, A., Rado, R. and Sós, V.T., eds., Infinite and Finite Sets, I, International Colloquium, Keszthely, Hungary, 1973 (1975), 205-214. (Colloquia Math. Soc. János Bolyai, 10).
[de Bruijn 75b] de Bruijn, N.G., The use of the language AUTOMATH for syntax and semantics of programming languages (Eindhoven, unpublished, 1975).
[de Bruijn 76] de Bruijn, N.G., Modifications of the 1968 version of AUTOMATH (Eindhoven University of Technology, 1976). (Memorandum 1976-14, Dept. of Math.).
[de Bruijn 77] de Bruijn, N.G., Some auxiliary operators in AUT-Pi (Eindhoven University of Technology, 1977). (Memorandum 1977-15, Dept. of Math.).
[de Bruijn 78a] de Bruijn, N.G., A namefree lambda calculus with facilities for internal definition of expressions and segments (Eindhoven University of Technology, 1978). (T.H.-Report 78-WSK-03).
[de Bruijn 78b] de Bruijn, N.G., Lambda calculus with namefree formulas involving symbols that represent reference transforming mappings, Indagationes Math. 40 (1978), 348-356.
[de Bruijn 78c (B.4)] de Bruijn, N.G., AUT-QE without type inclusion (Eindhoven University of Technology, 1978). (Memorandum 1978-04, Dept. of Math.).
[de Bruijn 78d] de Bruijn, N.G., A note on weak diamond properties (Eindhoven University of Technology, 1978). (Memorandum 1978-08, Dept. of Math.).
[de Bruijn 79a] de Bruijn, N.G., Wees contextbewust in WOT, Euclides 55 (1979/1980), 7-12.
[de Bruijn 79b] de Bruijn, N.G., Grammatica van WOT, Euclides 55 (1979/1980), 66-72.
[de Bruijn 79c] de Bruijn, N.G., Van alles en nog wat over gebonden variabelen in wiskundige taal, Euclides 55 (1979/1980), 262-268.
[de Bruijn 80 (A.5)] de Bruijn, N.G., A survey of the project AUTOMATH, in: Seldin, J.P. and Hindley, J.R., eds., To H.B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism (New York/London, Academic Press, 1980), 579-606.
[de Bruijn 83 (F.5)] de Bruijn, N.G., Computer program semantics in space and time (Eindhoven University of Technology, 1983). (Internal Report, Dept. of Math. and Comp. Sc.).
[de Bruijn 84 (F.2)] de Bruijn, N.G., Formalization of constructivity in Automath, in: de Doelder, P.J., de Graaf, J. and van Lint, J.H., eds., Papers dedicated to J.J. Seidel (Eindhoven University of Technology, 1984), 76-101. (EUT-Report 84-WSK-03).
[de Bruijn 86] de Bruijn, N.G., Checking mathematics with the aid of a computer, in: Howson, A.G. and Kahane, J.-P., eds., The Influence of Computers and Informatics on Mathematics and its Teaching (Cambridge, Cambridge University Press, 1986), 61-68.
[de Bruijn 87a (F.3)] de Bruijn, N.G., The Mathematical Vernacular, a language for mathematics with typed sets, in: Dybjer, P. et al., eds., Proceedings of the Workshop on Programming Languages, Marstrand, Sweden, 1987. Plus: Formalizing the Mathematical Vernacular (formerly unpublished, 1982), Examples of an MV Book.
[de Bruijn 87b (B.7)] de Bruijn, N.G., Generalizing Automath by means of a lambda-typed lambda calculus, in: Kueker, D.W., Lopez-Escobar, E.G.K. and Smith, C.H., eds., Mathematical Logic and Theoretical Computer Science, Proceedings of the Maryland 1984/85 Special Year in Mathematical Logic and Theoretical Computer Science (New York, Marcel Dekker, 1987), 71-92. (Lecture Notes in Pure and Appl. Math., 106).
[de Bruijn 89] de Bruijn, N.G., Machinale verificatie van redeneringen, in: Lemmens, P.W.H., ed., Bewijzen in de Wiskunde (Amsterdam, Centrum voor Wiskunde en Informatica, 1989), 61-79.
[de Bruijn 90a] de Bruijn, N.G., The use of justification systems for integrated semantics, in: Martin-Löf, P. and Mints, G., eds., Colog-88 (Berlin, Springer Verlag, 1990), 9-24. (Lecture Notes in Comp. Sc., 417).
[ d e Bruijn 906 (A.7’1 de Bruijn, N.G., Reflections on Automath (Eindhoven University of Technology, 1990). [de Bruijn 91a] de Bruijn, N.G., Telescopic mappings in typed lambda calculus, Information and Computation 91 (1991), 189-204. [de Bruijn 91b] de Bruijn, N.G., A plea for weaker frameworks, in: Huet, G. and Plotkin, G., eds., Logical Frameworks, Proceedings of the BRA workshop, Sophia Antipolis, 1990 (Cambridge, Cambridge University Press, 1991), 40-67. [de Bruijn 91c] de Bruijn, N.G., Checking mathematics with computer assistance, Notices American Mathematical Society, 8 , 1 (1991), 8-15. [de Bruijn 921 de Bruijn, N.G., On the role of types i n mathematics (to be published, 1992). [de Bruijn 931 de Bruijn, N.G., Algorithmic definition of lambda-typed lambda calculus, in: Huet, G. and Plotkin, G., eds., Logical Environments (Cambridge, Cambridge University Press, 1993), 131-146. [Bulnes-Rozas 791 Bulnes-Rozas, J.P., GOAL: A goal oriented commend language for interactive proof construction, Ph.D. thesis, Stanford A.I. Lab., Stanford (1979). (Memo AIM-328). [Cardelli and Wegner 851 Cardelli, L. and Wegner, P., On understanding types, data abstraction, and polymorphism, Computing Surveys 17,4 (1985), 471522. [Church 321 Church, A., A set of postulates for the foundation of logic, Ann. of Math. 33 (1932), 346-366 and 34 (1933), 839-864. [Church 361 Church, A., An unsolvable problem of elementary number theory, Amer. Journal of Math. 58 (1936), 345-363. [Church 401 Church, A,, A formulation of the simple theory of types, Journ. of Symb. Logic 5 (1940), 56-68. [Church 411 Church, A., The Calculi of Lambda Conversion (Princeton University Press, 1941). (Annals of Math. Studies, 6). [Constable et al. 861 Constable, R.L.et al., Implementing Mathematics with the NuPRL Proof Development System (Englewood Cliffs, Prentice-Hall, 1986). [Coppo and Dezani 781 Coppo, M. and Dezani-Ciancaglini, M., New type assignment for A-terms, Archiv. Math. Logik 19 (1978), 139-156.
[Coppo et al. 81] Coppo, M., Dezani-Ciancaglini, M. and Venneri, B., Functional characters of solvable terms, Zeitschr. f. Math. Logik u. Grundl. d. Math. 27 (1981), 45-58.
[Coquand 85] Coquand, Th., Une théorie des constructions, Thèse de troisième cycle (Paris, Université Paris VII, 1985).
[Coquand 86] Coquand, Th., An analysis of Girard's paradox, Proceedings of the first Symposium on Logic in Computer Science, Cambridge Mass. (Washington DC, IEEE Computer Society), 227-236.
[Coquand 90] Coquand, Th., Metamathematical investigations of a calculus of constructions, in: Odifreddi, P.G., ed., Logic and Computer Science (London, Academic Press, 1990), 91-122. (APIC series, 31).
[Coquand and Huet 85] Coquand, Th. and Huet, G., Constructions: a higher order proof system for mechanizing mathematics, in: Buchberger, B., ed., Computer Algebra, Proceedings of the European conference EUROCAL '85, Linz (1985), 151-184. (Lecture Notes in Comp. Sc., 203).
[Coquand and Huet 88] Coquand, T. and Huet, G., The calculus of constructions, Information and Computation 76 (1988), 95-120.
[Curien 86] Curien, P.-L., Categorical Combinators, Sequential Algorithms and Functional Programming (London, Pitman, 1986).
[Curry and Feys 58] Curry, H.B. and Feys, R., Combinatory Logic (Amsterdam, North Holland Publishing Co., 1958), Vol. 1.
[Curry et al. 72] Curry, H.B., Hindley, J.R. and Seldin, J.P., Combinatory Logic (Amsterdam, North Holland Publishing Co., 1972), Vol. 2.
[van Daalen 70] van Daalen, D.T., Verzamelingstheorie, de axioma's van Zermelo-Fraenkel (Eindhoven University of Technology, 1970). (Internal Report, Dept. of Math.).
[van Daalen 73 (A.3’11 van Daalen, D.T., A description of Automath and some aspects of its language theory, in: Braffort, P., ed., Proceedings of the Symposium A P L A S M (Orsay, 1973), Vol. I. Also in [van Benthem Jutting 771, 48-77. [van Daalen 801 van Daalen, D.T., The language theory of Automath, Ph.D. thesis (Eindhoven University of Technology, 1980). See also [A.6], (B.6) and [ C. 51.
[van Dalen et al. 78] van Dalen, D., Doets, H.C. and de Swart, H., Sets: Naive, Axiomatic and Applied (Oxford, Pergamon Press, 1978).
[Dowek et al. 91] Dowek, G., Felty, A., Herbelin, H., Huet, G., Paulin-Mohring, Ch. and Werner, B., The Coq proof assistant version 5.6, user's guide (Rocquencourt, INRIA - Lyon, CNRS ENS, 1991).
[Fitch 52] Fitch, F.B., Symbolic Logic, an Introduction (New York, The Ronald Press Co., 1952).
[Fraenkel et al. 58] Fraenkel, A.A., Bar-Hillel, Y. and Lévy, A., Foundations of Set Theory (Amsterdam, North Holland Publishing Co., 1958).
[Frege 1879] Frege, G., Begriffsschrift, eine der arithmetischen nachgebildete Formelsprache des reinen Denkens (Halle, Verlag von Louis Nebert, 1879).
[Frege 1893] Frege, G., Grundgesetze der Arithmetik, begriffsschriftlich abgeleitet (Jena, H. Pohle, 1893, 1903).
[Gandy 80] Gandy, R.O., Proofs of Strong Normalization, in: Seldin, J.P. and Hindley, J.R., eds., To H.B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism (New York/London, Academic Press, 1980), 475-477.
[Gentzen 35] Gentzen, G., Untersuchungen über das logische Schliessen, Math. Zeitschr. 39 (1935), 176-210, 405-431.
[Gentzen 36] Gentzen, G., Die Widerspruchsfreiheit der reinen Zahlentheorie, Math. Annalen 112 (1936), 493-565.
[Geuvers 88] Geuvers, J.H., The interpretation of logic in type systems, Master's thesis (University of Nijmegen, 1988).
[Geuvers 93] Geuvers, J.H., Logics and type systems, Ph.D. thesis (Catholic University of Nijmegen, 1993).
[Geuvers and Nederhof 91] Geuvers, J.H. and Nederhof, M.J., A modular proof of strong normalization for the Calculus of Constructions, Journal of Functional Programming 1, 2 (1991), 155-189.
[Girard 71] Girard, J.-Y., Une extension de l'interprétation de Gödel à l'analyse, et son application à l'élimination des coupures dans l'analyse et la théorie des types, in: Fenstad, J.E., ed., Proceedings of the 2nd Scandinavian Logic Symp. (Amsterdam, North-Holland Publishing Co., 1971), 63-92.
[Girard 72] Girard, J.-Y., Interprétation fonctionnelle et élimination des coupures dans l'arithmétique d'ordre supérieur, Thèse d'État (Paris, Université Paris VII, 1972).
[Girard et al. 89] Girard, J.-Y., Lafont, Y. and Taylor, P., Proofs and Types (Cambridge, Cambridge University Press, 1989).
[Glaser et al. 84] Glaser, H., Hankin, C. and Till, D., Principles of Functional Programming (Englewood Cliffs, Prentice-Hall, 1984).
[Gordon and Melham 93] Gordon, M.J.C. and Melham, T.F., Introduction to HOL, A theorem proving environment for higher order logic (Cambridge, Cambridge University Press, 1993).
[Gordon et al. 79] Gordon, M., Milner, R. and Wadsworth, C., Edinburgh LCF, A mechanised Logic of Computation (Berlin, Springer Verlag, 1979). (Lecture Notes in Comp. Sc., 78).
[Hall 67] Hall Jr., M., Combinatorial Theory (Chichester, Wiley, 1967). (Blaisdell book in pure and applied mathematics).
[Harper et al. 86] Harper, R., MacQueen, D. and Milner, R., Standard ML (Edinburgh University, 1986). (Report ECS-LFCS-86-2).
[Harper et al. 87] Harper, R., Honsell, F. and Plotkin, G., A framework for defining logics, in: Proceedings of the second Symposium on Logic in Computer Science, Ithaca, N.Y. (Washington DC, Computer Society of the IEEE, 1987), 194-204.
[Hitchcock and Park 73] Hitchcock, P. and Park, D., Induction rules and termination proofs, in: Nivat, M., ed., Automata, Languages and Programming (Amsterdam, North-Holland Publishing Co., 1973), 225-252.
[Hilbert and Ackermann 1928] Hilbert, D. and Ackermann, W., Grundzüge der theoretischen Logik (Berlin, Springer Verlag, 1928).
[Hindley 69] Hindley, J.R., The principal type scheme of an object in combinatory logic, Transactions of the American Math. Soc. 146 (1969), 29-60.
[Hindley 79] Hindley, J.R., Combinatory reductions and lambda reductions compared, Zeitschr. f. Math. Logik u. Grundl. d. Math. 23 (1979), 169-180.
[Hindley et al. 72] Hindley, J.R., Lercher, B.
and Seldin, J.P., Introduction to Combinatory Logic (Cambridge, Cambridge University Press, 1972).
[Hindley and Seldin 86] Hindley, J.R. and Seldin, J.P.,
Introduction to Combinators and λ-Calculus (Cambridge, Cambridge University Press, 1986).
[Howard 80] Howard, W.A., The formulae-as-types notion of construction, in: Seldin, J.P. and Hindley, J.R., eds., To H.B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism (New York/London, Academic Press, 1980), 479-490. (Manuscript 1969).
[Jervell 71] Jervell, H.R., A normal form in first order arithmetic, in: Fenstad, J.E., ed., Proceedings of the 2nd Scandinavian Logic Symp. (Amsterdam, North-Holland Publishing Co., 1971), 93-108.
[Kamareddine and Nederpelt 93] Kamareddine, F. and Nederpelt, R., On stepwise explicit substitution, Int. Journal of Found. of Comp. Sc. 4, 3 (1993), 197-240.
[Kijne 62] Kijne, D., Construction geometries and construction fields, in: Algebraical and Topological Foundations of Geometry, Proceedings of a Colloquium held in Utrecht, 1959 (Oxford, Pergamon Press, 1962).
[Kleene 52] Kleene, S.C., Introduction to Metamathematics (New York, Van Nostrand, 1952).
[Klop 80] Klop, J.W., Combinatory reduction systems, Ph.D. thesis (Utrecht University, 1980).
[Klop 92] Klop, J.W., Term rewriting systems, in: Abramsky, S., Gabbay, D.M. and Maibaum, T., eds., Handbook of Logic in Computer Science (Oxford, Clarendon Press, 1992), Vol. 2, 1-116.
[Kneale and Kneale 62] Kneale, W. and Kneale, M., The Development of Logic (Oxford, Clarendon Press, 1962).
[Kreisel 72] Kreisel, G., Five notes on the application of proof theory to computer science (Stanford University, 1972). (Techn. report no. 182).
[Landau 30] Landau, E., Grundlagen der Analysis (First ed.: Leipzig, 1930; Third ed.: New York, Chelsea Publ. Comp., 1960).
[Landin 64] Landin, P.J., The mechanical evaluation of expressions, Computer Journal 6, 4 (1964), 308-320.
[Läuchli 70] Läuchli, H., An abstract notion of realizability for which intuitionistic predicate calculus is complete, in: Kino, A., Myhill, J. and Vesley,
R.E., eds., Intuitionism and Proof Theory, Proc. Summer Conference at Buffalo, 1968 (Amsterdam, North-Holland Publishing Co., 1970).
[Leivant 75] Leivant, D., Strong normalization for arithmetic (variations on a theme of Prawitz), in: Proof Theory Symposium Kiel 1974 (Berlin, Springer Verlag, 1975), 182-197. (Lecture Notes in Math., 500).
[Lévy 74] Lévy, J.J., Réductions sûres dans le lambda-calcul, Thèse 3e cycle (Paris, 1974).
[Lévy 75] Lévy, J.J., An algebraic interpretation of the λβK-calculus and a labelled λ-calculus, in: Böhm, C., ed., λ-Calculus and Computer Science Theory (Berlin, Springer Verlag, 1975), 147-165. (Lecture Notes in Comp. Sc., 37).
[Luo 89] Luo, Z., ECC: An extended Calculus of Constructions, in: Proceedings of the fourth Annual Symposium on Logic in Computer Science, Asilomar, Cal. (Washington DC, IEEE Computer Society Press, 1989), 386-395.
[Magnusson and Nordström 94] Magnusson, L. and Nordström, B., The ALF proof editor and its proof engine, in: Barendregt, H. and Nipkow, T., eds., Types for Proofs and Programs (Berlin, Springer Verlag, 1994), 238-262. (Lecture Notes in Comp. Sc., 806).
[Mann 73] Mann, C.R., The connections between proof theory and category theory, Ph.D. thesis (Oxford, 1973).
[Martin-Löf 71a] Martin-Löf, P., A theory of types (1971). (Manuscript).
[Martin-Löf 71b] Martin-Löf, P., Hauptsatz for the theory of species, in: Fenstad, J.E., ed., Proceedings of the 2nd Scandinavian Logic Symp. (Amsterdam, North-Holland Publishing Co., 1971), 217-233.
[Martin-Löf 75a] Martin-Löf, P., An intuitionistic theory of types: predicative part, in: Rose, H.E. and Shepherdson, J.C., eds., Logic Colloquium '73 (Amsterdam, North-Holland Publishing Co., 1975), 73-118.
[Martin-Löf 75b] Martin-Löf, P., About models for intuitionistic type theory and the notion of definitional equality, in: Kanger, S., ed., Proceedings of the third Scandinavian Logic Symp. (Amsterdam, North Holland Publishing Co., 1975), 81-109.
[Martin-Löf 82] Martin-Löf, P., Constructive mathematics and computer programming, in: Logic, Methodology and Philosophy of Science, VI, 1979 (Amsterdam, North-Holland Publishing Co., 1982), 153-175.
[Martin-Löf 84] Martin-Löf, P., Intuitionistic Type Theory, Studies in Proof Theory (Napoli, Bibliopolis, 1984).
[Mitchell and Plotkin 85] Mitchell, J.C. and Plotkin, G.D., Abstract types have existential type, in: Proceedings of the 12th Annual Symposium on Principles of Programming Languages (New York, ACM, 1985), 37-51.
[Mitschke 76] Mitschke, G., λ-Kalkül, δ-Konversion und axiomatische Rekursionstheorie, Habilit. Schr. (Darmstadt, 1976).
[Mohring 86] Mohring, Ch., Algorithm development in the calculus of constructions, in: Proceedings of the first Symposium on Logic in Computer Science, Cambridge, Mass. (Washington DC, IEEE Computer Society, 1986), 84-91.
[Nederpelt 71a] Nederpelt, R.P., Lambda-Automath (Eindhoven University of Technology, 1971). (Notitie 71-17, Dept. of Math.).
[Nederpelt 71b] Nederpelt, R.P., Lambda-Automath II (Eindhoven University of Technology, 1971). (Notitie 71-25, Dept. of Math.).
[Nederpelt 72a] Nederpelt, R.P., Strong normalisation in a λ-calculus with λ-expressions as types (Eindhoven University of Technology, 1972). (Notitie 72-18, Dept. of Math.).
[Nederpelt 72b] Nederpelt, R.P., The closure theorem in λ-typed λ-calculus (Eindhoven University of Technology, 1972). (Notitie 72-22, Dept. of Math.).
[Nederpelt 73 (C.3)] Nederpelt, R.P., Strong normalisation in a typed lambda calculus with lambda structured types, Ph.D. thesis (Eindhoven University of Technology, 1973).
[Nederpelt 77] Nederpelt, R.P., Presentation of natural deduction, Recueil des Travaux de l'Institut Math., Nouvelle Série, tome 2, 10, Symp. Set Theory, Foundations of Math. (Beograd, 1977), 115-126.
[Nederpelt 80] Nederpelt, R.P., An approach to theorem proving on the basis of a typed lambda-calculus, in: 5th Conference on Automated Deduction (Berlin, Springer Verlag, 1980), 182-194. (Lecture Notes in Comp. Sc., 87).
[Nederpelt 87] Nederpelt, R.P., De Taal van de Wiskunde (Almere, Versluys, 1987).
[Nederpelt 90 (A.8)] Nederpelt, R.P., Type systems - basic ideas and applications, in: van de Goor, A.J., ed., Proceedings of CSN '90, Computing Science in the Netherlands (Amsterdam, Stichting Mathematisch Centrum, 1990), 367-383.
[Nederpelt 92] Nederpelt, R.P., The fine-structure of lambda calculus (Eindhoven University of Technology, 1992). (Computing Science Notes, 92/07).
[Newman 42] Newman, M.H.A., On theories with a combinatorial definition of "equivalence", Ann. of Math. 2, 43 (1942), 223-243.
[Nordström et al. 90] Nordström, B., Petersson, K. and Smith, J.M., Programming in Martin-Löf's Type Theory (Oxford, Oxford University Press, 1990).
[Osswald 73] Osswald, H., Ein syntaktischer Beweis für die Zulässigkeit der Schnittregel im Kalkül von Schütte für die intuitionistische Typenlogik, Manusc. Math. 8 (1973), 243-249.
[Paulin-Mohring 89] Paulin-Mohring, Ch., Extraction des programmes dans le calcul des constructions, Thèse (Paris, Université Paris VII, 1989).
[Paulson 87] Paulson, L.C., Logic and Computation (Cambridge, Cambridge University Press, 1987). (Cambridge Tracts in Theor. Comp. Sc., 2).
[Penning 77] Penning, P., Automath-bewijzen voor tautologieën (Eindhoven, unpublished, 1977).
[Peremans 94] Peremans, W., Ups and Downs of Type theory (Eindhoven University of Technology, 1994). (Computing Science Notes, 94/14).
[Plotkin 80] Plotkin, G., Lambda-definability in the full type hierarchy, in: Seldin, J.P. and Hindley, J.R., eds., To H.B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism (New York/London, Academic Press, 1980), 363-373.
[Pohlers 73] Pohlers, W., Ein starker Normalisationssatz für die intuitionistischen Typen, Manusc. Math. 8 (1973), 371-387.
[Pottinger 77] Pottinger, G., Letter to Prawitz (unpublished, 1977).
[Pottinger 79] Pottinger, G., On analysing relevance constructively, Studia Logica 38 (1979), 171-185.
[Pottinger 80] Pottinger, G., A type assignment to the strongly normalizable terms, in: Seldin, J.P. and Hindley, J.R., eds., To H.B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism (New York/London, Academic Press, 1980), 561-577.
[Prawitz 65] Prawitz, D., Natural Deduction, a Proof-Theoretical Study (Stockholm, Almqvist and Wiksell, 1965).
[Prawitz 71] Prawitz, D., Ideas and results in proof theory, in: Fenstad, J.E., ed., Proceedings of the 2nd Scandinavian Logic Symp. (Amsterdam, North-Holland Publishing Co., 1971), 235-307.
[Reynolds 74] Reynolds, J.C., Towards a theory of type structure, in: Robinet, B., ed., Proceedings of the Colloque sur la Programmation (Berlin, Springer Verlag, 1974), 408-425. (Lecture Notes in Comp. Sc., 19).
[Reynolds 85] Reynolds, J.C., Three approaches to type structure, in: Ehrig, H. et al., eds., Mathematical Foundations of Software Development (Berlin, Springer Verlag, 1985), 97-138. (Lecture Notes in Comp. Sc., 185).
[Sanchis 67] Sanchis, L.E., Functionals defined by recursion, Notre Dame Journal of Formal Logic 8 (1967), 161-174.
[Schulte Monting 73] Schulte Monting, H., Yet another proof of the Church-Rosser theorem (unpublished, 1973).
[Scott 70] Scott, D., Constructive validity, in: Laudet, M., Lacombe, D. and Schuetzenberger, M., eds., Symposium on Automatic Demonstration, IRIA, Versailles, 1968 (Berlin, Springer Verlag, 1970), 237-275. (Lecture Notes in Math., 125).
[Scott 73] Scott, D.S., Models for various type-free calculi, in: Suppes, P. et al., eds., Logic, Methodology and Philosophy of Science, IV, 1971 (Amsterdam, North Holland Publishing Co., 1973), 157-187.
[Scott and Strachey 71] Scott, D. and Strachey, C., Towards a mathematical semantics for computer languages (Oxford, Oxford University, 1971).
[Seldin 75] Seldin, J.P., Review of 'Lambda calculus notation with nameless dummies, a tool for automatic formula manipulation, with application to the Church-Rosser theorem', Journ. of Symb. Logic 40 (1975), 470.
[Seldin 76] Seldin, J.P., A theory of generalized functionality I (unpublished, 1976). See also: Seldin, J.P., Progress report on generalized functionality, Ann. Math. Logic 17 (1979), 29-59.
[Seldin and Hindley 80] Seldin, J.P. and Hindley, J.R., eds., To H.B.
Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism (New York/London, Academic Press, 1980).
[Shoenfield 67] Shoenfield, J.R., Mathematical Logic (Reading MA, Addison Wesley, 1967).
[Smorynski 77] Smorynski, C., The incompleteness theorems, in: Barwise, J., ed., Handbook of Mathematical Logic (Amsterdam, North-Holland Publishing Co., 1977), 821-865. (Studies in Logic and the Foundations of Math., Vol. 90).
[Staples 75] Staples, J., Church-Rosser theorems for replacement systems, in: Crossley, J.N., ed., Algebra and Logic (Berlin, Springer Verlag, 1975), 291-306. (Lecture Notes in Math., 450).
[Staples 77] Staples, J., A lambda calculus with naive substitution (Brisbane, unpublished, 1977).
[Stenlund 72] Stenlund, S., Combinators, λ-terms and Proof Theory (Dordrecht, Reidel, 1972).
[Tait 67] Tait, W.W., Intensional interpretations of functionals of finite type I, Journ. of Symb. Logic 32 (1967), 198-212.
[Troelstra 73] Troelstra, A.S., ed., Metamathematical Investigation of Intuitionistic Arithmetic and Analysis (Berlin, Springer Verlag, 1973). (Lecture Notes in Math., 344).
[Troelstra 77] Troelstra, A.S., Aspects of constructive mathematics, in: Barwise, J., ed., Handbook of Mathematical Logic (Amsterdam, North Holland Publishing Co., 1977), 973-1052. (Studies in Logic and the Foundations of Math., Vol. 90).
[Troelstra and van Dalen 88] Troelstra, A.S. and van Dalen, D., Constructivism in Mathematics (Amsterdam, North-Holland Publishing Co., 1988), Vol. 1.
[Trybulec 90] Trybulec, A., Introduction, Formalized Mathematics 1, 1 (1990), 7-8.
[Turner 79] Turner, D.A., Another algorithm for bracket abstraction, Journ. of Symb. Logic 44 (1979), 267-270.
[Udding 80] Udding, J.T., A Theory of Real Numbers and its Presentation in Automath, Vol. 1-3, Master's thesis (Eindhoven University of Technology, 1980).
[de Vrijer 75 (C.4)] de Vrijer, R.C., Big trees in a λ-calculus with λ-expressions as types, in: Böhm, C., ed., λ-Calculus and Computer Science Theory (Berlin, Springer Verlag, 1975), 202-221. (Lecture Notes in Comp. Sc., 37). Also: de Vrijer, R.C., Surjective pairing and strong normalization, Ph.D. thesis (University of Amsterdam, 1987), Chapter 5.
[de Vrijer 87a] de Vrijer, R.C., Surjective pairing and strong normalization: two themes in lambda calculus, Ph.D. thesis (University of Amsterdam, 1987). See also [de Vrijer 75 (C.4)].
[de Vrijer 87b] de Vrijer, R.C., "Stelling" to his Ph.D. thesis. See [de Vrijer 87a].
[de Vrijer 87c] de Vrijer, R.C., Exactly estimating functionals and strong normalization, Proc. of the Koninklijke Nederlandse Akademie van Wetenschappen, Series A 90, 4 (1987), 479-493.
[Wadsworth 71] Wadsworth, C.P., Semantics and pragmatics of the lambda-calculus, Ph.D. thesis (Oxford, 1971).
[Weyhrauch 77] Weyhrauch, R.W., A users manual for FOL (Stanford, Computer Science Dept., 1977). (Artificial Intelligence Project M235).
[Whitehead and Russell 1910] Whitehead, A.N. and Russell, B., Principia Mathematica (Cambridge, Cambridge University Press, 1910-1913), Vol. 1-3.
[Wieringa 76] Wieringa, R.M.A., Binaire optelling en vermenigvuldiging in AUT-QE (Eindhoven, unpublished, 1976).
[Wieringa 78] Wieringa, R.M.A., Een notatie-systeem voor lambda-calculus met definities, Master's thesis (Eindhoven University of Technology, 1978).
[Wieringa 80 (F.4)] Wieringa, R.M.A., Relational semantics in an integrated system (Eindhoven University of Technology, 1980). (Internal Report, Dept. of Math.).
[Zandleven 73 (E.1)] Zandleven, I., A verifying program for Automath, in: Braffort, P., ed., Proceedings of the Symposium APLASM (Orsay, 1973), Vol. I.
[Zucker 74] Zucker, J., Cut-elimination and normalization, Annals of Math. Logic 7 (1974), 1-112.
[Zucker 77 (A.4)] Zucker, J., Formalization of classical mathematics in Automath, in: Colloque International de Logique, Clermont-Ferrand, France, 1975 (Paris, CNRS, 1977), 135-145. (Colloques Internationaux du Centre National de la Recherche Scientifique, 249).
Indexes
Index of Names

Abadi et al. [1991], 35, 50
Ackermann, W., 4
Aristotle, 3
Augustsson, L., 12
Balsters, H., 8
- [1986], 35, 45, 170, 306, 368
Barendregt, H.P., 11, 229, 230, 235, 239, 243, 248, 376, 391, 399, 443, 503, 505, 611
- [1971], 375, 376, 389-391, 427, 448
- [1981], 327, 503
- [1984a], 339, 340, 572
- [1992], 5, 11, 26, 27, 34, 36, 37, 46
- & Hemerik [1990], 5, 26
- et al. [1976], 503
von Beethoven, L., 841
Ben-Yelles [1981], 187, 633
van Benthem Jutting, L.S., 6, 73, 156, 169, 170, 216, 222, 247, 252, 303, 306, 331, 396, 508, 509, 578
- [1971a], 155, 396, 508
- [1971b], 473, 633
- [1973], 101, 127, 160, 804
- [1976], 157, 159
- [1977], 7, 44, 141, 147, 157, 159, 163, 169, 170, 175, 188-190, 197, 198, 303, 306, 334, 471, 570, 578
- [1981], 339, 687, 688, 733
- [1988], 177
- & Wieringa [1979], 170, 177
Berkling & Fehr [1982], 811, 837
Bernays, P., 192
Beth, E.W., 26, 203
Bishop [1967], 130
de Boer, S., 503
- [1975], 503, 505
Bourbaki, N., 206
Boyer & Moore [1972], 809
- [1988], 12
Braun, W.C.P., 252
van Bree, L.G.F.C., 73
Brouwer, L.E.J., 26, 204, 236
de Bruijn, N.G., 6, 111, 127, 128, 133, 139, 163, 164, 169, 175-177, 187, 194, 216, 230, 231, 235, 238, 240, 242, 252, 306, 339, 345, 390, 393, 394, 473, 590, 687, 688, 722, 729, 783, 805, 809, 811, 813, 836, 838
- [1968b], 8, 15, 74, 141, 150, 394
- [1970a], 8, 128, 141, 150, 169, 283, 285, 286, 334, 371, 387, 393, 394, 469, 470, 783, 844, 938
- [1970b], 277
- [1971], 147, 313, 334, 393, 590
- [1972a], 158, 737
- [1972b], 116, 156, 175, 177, 192, 319, 327, 390, 656, 673, 804, 809, 811
- [1973b], 141, 283, 285, 286, 334, 396, 722, 783
- [1973c], 127, 141
- [1973d], 53, 938, 947, 948, 951, 952, 963, 964
- [1974a], 289, 334, 852
- [1974b], 148, 159, 187
- [1975a], 144, 192
- [1975b], 53, 160, 938
- [1976], 159
- [1977], 159, 289, 306, 652
- [1978a], 33, 156, 160, 177, 339, 355
- [1978b], 177, 342
- [1978c], 159, 306, 335, 652
- [1980], 163, 169, 171, 189, 197, 334, 339, 655, 673, 810, 849, 852, 866
- [1987a], 865
- [1991a], 11, 21, 44
- [1991b], 30, 32
Bulnes, J.P., 166
Bulnes-Rozas [1979], 166, 170
Cantor, G., 203-205, 221, 229, 841, 842, 845, 846, 848
Church, A., 5, 34, 36, 37, 41, 75, 168, 205, 229, 230, 232, 235, 246, 389, 391, 392, 503
- [1932], 5, 389
- [1936], 5
- [1940], 5, 13, 135, 136, 340, 355
- [1941], 375
Constable et al. [1986], 9, 11
Coppo & Dezani [1978], 188
Coppo et al. [1981], 188
Coquand, Th., 12, 235, 837
- [1985], 10
- [1986], 8
- [1990], 10
- & Huet [1985], 10
- & Huet [1988], 10
Curien [1986], 809, 815, 837
Curry, H.B., 5, 128, 175, 195, 230, 232, 235, 248, 250
- & Feys [1958], 8, 150, 168, 175, 195, 371, 375, 376, 389, 391-393, 415, 443, 572, 585
- et al. [1972], 195
van Daalen, D.T., 7, 252, 264, 396, 485, 783
- [1970], 144
- [1973], 127, 141, 155, 163, 172, 177, 289, 301, 303, 469-471, 474, 493, 517, 522, 523, 558, 591, 592, 701, 783, 787, 810, 833
- [1980], 7, 23, 31, 38, 155, 303, 314, 327, 331, 334, 339, 473, 508, 515, 526, 539, 545, 550, 574, 579, 590, 592, 612, 629, 637, 655, 830, 833
van Dantzig, D., 203
Dedekind, R., 713, 714
Dijkstra, E.W., 207
Dowek et al. [1991], 11
Erdős, P., 26, 204, 212
Euclid, 219, 717
Euler, L., 219
Fitch, F.B., 687
- [1952], 52, 687
Fleischhacker, L., 371
Fraenkel, A.A., 50, 220, 226, 841-843, 845-847, 865, 866, 912
- et al. [1958], 842
Frege, G., 3, 228
- [1879], 3
- [1893], 3
Freudenthal, H., 205, 206
Gandy [1980], 41, 655, 664
Geuvers [1993], 11
- & Nederhof [1991], 11
Girard, J.-Y., 8, 111, 188, 200, 230, 235, 393
- [1971], 9, 188, 393
- [1972], 8, 9, 111, 123, 150, 168, 188, 520
Glaser et al. [1984], 837
Gödel, K., 9, 192, 229, 393
Gordon, M.J.C., 12
- & Melham [1993], 12
- et al. [1979], 9, 166
Gorissen, P., 837
Harper et al. [1987], 10
Helmink, L., 11
Heyting, A., 26, 204, 205, 211, 216, 218, 236
Hilbert, D., 4, 52, 93, 94, 101, 142, 152, 163, 203, 205, 216, 217, 219, 728, 730, 733, 853, 865, 887, 911, 922, 936
- & Ackermann [1928], 4
Hindley [1979], 177
- et al. [1972], 469
Hitchcock & Park [1973], 943
Howard, W.A., 8, 111, 128, 235, 248, 393, 469, 473
- [1980], 8, 111, 150, 168, 393, 469, 473
Huet, G., 235, 837
Kamareddine & Nederpelt [1993], 50
Kijne [1962], 852
Klop [1980], 36
- [1992], 36
Kneale & Kneale [1962], 3
Kolmogorov, A.N., 236
Kornaat, A., 7, 138, 156, 158, 170, 252, 303, 306, 733
Kramers, H.A., 203
Kreisel [1972], 395
Kronecker, L., 217, 218
Kruseman Aretz, F.E.J., 837
Läuchli, H., 161, 393
- [1970], 161, 393
Lévy, J.J., 503, 508, 509
- [1974], 508
- [1975], 503
Landau, E., 73, 144, 157, 159, 160, 169, 222, 223, 252, 303, 306, 578, 701-706, 709-711, 713, 715, 723-725, 729-732, 805
- [1930], 44, 45, 157, 159, 701, 709
Landin, P.J., 243, 837
von Leibniz, G.W., 3, 142
Leivant, D., 635
- [1975], 633, 635
Luo, Z., 11
- [1989], 11
Magnusson & Nordström [1994], 12
Marcelis, J.G., 252
Martin-Löf, P., 8, 111, 128, 188, 194, 200, 230, 391, 393, 398, 427-429, 472, 474, 655
- [1971a], 8, 168, 194
- [1975a], 9, 25, 111, 123, 128, 130, 134, 150, 188, 192, 393, 474, 517, 520, 655
- [1984], 9
Milner, R., 9, 166
Mitschke [1976], 503
Mostowski, A., 841
Nederpelt, R.P., 6, 28, 123, 165, 174, 175, 183, 186, 187, 193, 252, 264, 278, 281, 333, 371, 473, 517, 518, 551, 559, 577, 581, 591, 592, 596, 605, 616, 626-628, 655, 683, 783
- [1971a], 275, 393, 590, 628
- [1971b], 28, 393, 590
- [1972b], 400
- [1973], 123, 124, 147, 155, 163, 175, 177, 281, 314, 327, 331, 333, 345, 473, 482, 517, 518, 522, 577, 592, 626, 633, 655, 783, 830
- [1977], 144
- [1987], 23, 52
von Neumann, J., 192
Nordström, B., 12
- et al. [1990], 9
Paulin-Mohring, Ch., 11
- [1989], 11
Paulson [1987], 10
Peano, G., 52, 137, 141, 218, 221, 228, 241, 246, 249, 711, 715, 717, 718, 720, 843, 846, 848, 887, 912, 922
Penning, P., 252
- [1977], 170
Plotkin [1980], 191
Pollack, R., 11
Pottinger, G., 635
- [1977], 305, 635
- [1979], 305
- [1980], 189
Prawitz, D., 111, 392, 393, 508, 632, 635
- [1965], 305, 392, 509, 632
- [1971], 111, 123, 150, 168, 392, 393, 487, 488
Pythagoras, 3
Reynolds, J.C., 9
- [1974], 9
Rosser, J.B., 38, 42
Russell, B., 4, 229
Sanchis, L.E., 393
- [1967], 393
Schuh, F., 202, 203
Schulte Mönting [1973], 391
Scott, D., 9, 26, 100, 161, 167, 169, 188, 193, 194, 216, 473
- [1970], 25, 168, 169, 188, 194, 200, 473
- [1973], 9, 128, 134
- & Strachey [1971], 943
Seidel, J.J., 6
Seldin, J.P., 177, 188, 195, 200, 230
- [1975], 177
- [1976], 25, 188, 195
- & Hindley [1980], 22
Shakespeare, W., 217
Smorynski [1977], 5
Staples, J., 175
- [1977], 175
Tait, W.W., 118, 376, 391, 392, 396, 398, 427-429, 655
- [1967], 392, 396, 487
Tarski, A., 5
Thales, 3
Trybulec [1990], 12
Turing, A.M., 229
Turner [1979], 49, 177
Udding, J.T., 160, 252
- [1980], 160
de Vrijer, R.C., 7, 175, 185, 252, 396, 503, 517, 601, 647, 689
- [1975], 155, 163, 173, 175, 185, 517, 591, 601, 608, 647
- [1987a], 37, 174
- [1987c], 41, 655, 664
Wadsworth [1971], 177, 809
Weyhrauch, R.W., 166
- [1977], 166
Whitehead, A.N., 4, 229
- & Russell [1910], 4
Wieringa, R.M.A., 8, 157, 160, 170, 177, 252, 837, 947
- [1976], 171
- [1978], 160
- [1980], 947
Wittgenstein, L., 151
Zandleven, F., 7, 121, 126, 156, 158, 169, 170, 176, 252, 725, 805, 809, 811, 817, 837
- [1973], 101, 115, 121, 127, 156, 169, 570, 572, 805-807
Zermelo, E., 51, 144, 220, 226, 841-843, 865, 866, 912
Zucker, J., 7, 147, 158, 160, 169, 170, 224, 252, 303, 306, 724, 729, 733, 736, 737, 739-742, 746, 748, 749
- [1974], 635
- [1977], 158, 163, 169-171, 177, 188, 189, 197, 289, 303, 305, 306, 470, 706, 722, 724, 729, 733, 736, 741, 742, 746
Index of Notations

+, 309, 493
α, 116
+', 311
β, 108
⊕, 304
βη-CR, 577
PTVU-SN, 638
βτ, 603
bool, 688
bool, 16, 41, 723
bool, 53
BOUNDSUBST, 790
BT, 38
BT, 600, 601
507
>, 494, 518
>1, 493
>α, 790
>δ, 794
>η, 792
≥1, 494
≥, 494, 518
A sub B, 493
A ↓ B, 494, 519
A ⊂ B, 493
[ti : V], 149
E[z := p], 148
E, 52
∞, 53
P I r 11, 486
+, 599
→btr, 473
:, 599
∼, 583
≡, 493
N, 518
=, 494
cantyp, 539, 558
CAT, 49, 125, 795
CAT, 299, 739
CAT, 31
CATEGORY, 788
CL, 519, 534
CL1, 534
CL', 612
CL&, 612
CLPT, 534
CLPT1, 535
CORRECT, 802
CORRECTCATS, 801
CR, 494
CR', 612
CR;, 612
CR1, 494
cus, 648
≡, 798
⊕', 800
T, 747
⊢, 723, 801
⊢N, 626
⊢*, 801
:, 52
::, 52
Deg, 399, 450
δ, 108, 493
dn, 594
dnc, 594
DOM, 48, 795, 796
DOM, 739
DOM, 31
dom, 561
E, 116
EB, 109
ε, 304, 493
εalt, 311
η, 108
EUD, 519, 536
IF, 638
ij-pp, 494
INDSTR, 787
int, 53
IS, 41
Lq, 519, 536
LR, 605
MIDDLE, 787
F, 39
N, 494
nonempty, 688
OCCURS IN, 792
OFF, 940
OLDER THAN, 798
ω, 346
%(x), 489
ON, 940
P'T, 596
PD, 596
Π, 32, 306, 736
π, 304, 493
PN, 109
PROP, 723
prop, 19, 22, 43, 152, 196, 728
PT, 519, 534, 536, 596
PT1, 534
Q, 116
&-, 518
r, 601
RECURS, 955
ρ, 38, 399, 457
rst, 601
rt, 601
s, 601
sA, 519, 596
sc, 597
Σ, 11, 736
σ, 304, 493
SN, 494
STRINGSUBST, 789
SUBST, 789
SYNT, 739
synt, 31, 299
T, 53
t, 601
τ, 34, 260, 603
τ', 482
Θ, 381, 468
Θ1, 467
Θ2, 466
Totinf, 53
TRUE, 688
TRUE, 16, 41, 53, 87, 723
Typ, 399, 450
Typn, 449
Typ', 399, 452, 482
typ, 184, 186
typ', 593
type, 16, 19, 22, 43, 75, 148, 181, 258, 395, 728
UD, 519, 536
UT, 519, 536
weak ij-pp, 494
Index of Subjects

+-language, 185, 309, 526
+-reduction, 40, 493
+'-reduction, 311
rst-reduction, 601
rt-reduction, 601
r-reduction, 601
s-reduction, 601
t-reduction, 601
1-expression, 258, 371
2-expression, 258, 371
3-expression, 258, 371
4-expression, 722
1p-expression, 706
1t-expression, 706
2p-expression, 706
2t-expression, 706
3p-expression, 706
3t-expression, 706
∀-elimination, 46
⊕-function, 304
⊕-type, 304
ξ-context, 360
ij-postponement, 494
abbreviating expression, 737
abbreviation, 105, 179, 241
abbreviation system, 76, 147
abbreviation-line, 707
abortion, 940, 970
abstract algebra, 139
abstract data type, 28, 245
abstract linear order, 138
abstraction, 28, 148, 178, 231, 396, 470
- expression, 129, 178, 254
function -, 231
functional -, 90
- index, 294
lambda -, 18
abstractor, 398, 403
abstractor chain, 398, 403
abstractor string, 616, 626
absurdity, 236
acceptable, 84
adequacy, 10
adjective, 52, 872, 876, 877
ALF, 12
ALGOL 60, 206, 279, 937
algorithm, 387
algorithmic correctness check, 329
algorithmic definition, 172, 517, 554, 558, 563, 591
all-quantifier, 93
α-conversion, 116
modulo -, 175
- equality, 107
- equivalence, 390
- reduction, 47, 413, 790
alphabet, 401
ancestor, 738
anonymous archetype, 871
Another Logical Framework, 12
applicability condition, 37, 39, 395, 452, 472
application, 28, 148, 178, 231, 377, 396
- condition, 234, 592
- expression, 129, 178, 255, 736
function -, 231
legitimate Q-, 452
- restriction, 19, 107
- rule, 532
self -, 392, 530
applicator, 403
applicator chain, 403
arbitrary, 52, 204, 852, 855
archetype, 205, 871, 898
argument degree, 186, 525
arithmetical typed λ-calculus, 168
artificial intelligence, 870
ascendant, 32, 321
assertion, 875
assertion category, 87
assertion type, 688, 722
assertional line body, 886
assignment, 53, 54
assumption, 153, 240, 844, 875
assumptional context item, 880, 885
assumptional item, 879
AT-couple, 324
AT-pair, 32, 324
AT-removal, 33, 37, 325
attachment, 157
AUT-Δλ, 30, 32
AUT-λ, 36, 37
AUT-Π, 31, 40, 44, 127, 147, 158, 289, 294, 303, 724, 733
AUT-Π line, 734
AUT-Π0, 633
AUT-Π1, 634, 636
AUT-4, 187, 284
AUT-68, 41, 147, 251, 687, 721, 722
AUT-LAMBDA, 335
AUT-QE, 127, 147, 183, 230, 276, 289, 303, 396, 469, 471, 701, 721
AUT-QE-NTI, 289, 294, 335
AUT-SL, 28, 32, 147, 154, 186, 275, 314, 393
AUT-SYNT, 30, 43, 158, 299, 725, 737
AUT-synt, 169
Automath, 6, 230, 393, 394, 869
Automath book, 127
Automath project, 169
Automath verifier, 47, 49
automatic expression simplifier, 946
automatic formula manipulation, 375
automatic theorem proving, 23, 98, 157, 221, 240
axiom, 88, 153, 234, 241, 843, 888
axiom of choice, 94, 152, 205, 216
axiom of reducibility, 4
axiomatic line body, 886
Barendregt's cube, 235
Barendregt's lemma, 505
base, 401, 478, 479
Begriffsschrift, 3
β-chain complex, 432
- conversion, 421
- equality, 107, 108
- equivalent, 421
- normal, 460
- normal form, 460
- normalizable, 460
- reduction, 47, 49, 122, 242, 257, 288, 326, 382, 386, 390, 417, 789
n-step -, 417
β1-
- equivalence, 424
- nf, 466
- normal, 463
- normal form, 463, 466
- normalizable, 463
- normalization theorem, 466
- reduction, 36, 398, 422, 424
n-step -, 425
β2-
- equivalence, 424
- normal, 463
- normal form, 463
- normalizable, 463
- reduction, 36, 422, 424
n-step -, 424
βη-Church-Rosser, 577
βτ-reduction, 603
big tree, 37, 473, 484, 519, 577, 600
big tree theorem, 38, 175, 485, 486, 591, 600, 601
binary predicate, 132
binary sum, 40
binary tree, 316
binary union, 311
binder, 922
binding abstractor, 407
binding influence, 408
binding instance, 318
binding occurrence, 406
binding variable, 176, 233
block, 15, 75, 843, 882
- opener, 76, 145, 882
- mechanism, 14
- structure, 53, 939
body of a line, 880
Bolzano-Weierstrass theorem, 138
book, 28, 29, 74, 108, 144, 253, 474, 872, 879
Automath -, 127
correct -, 172
correct PAL -, 84
empty -, 894
- equality, 16, 19, 22, 41, 112, 154, 159, 219
MV -, 879, 891, 911
nested -, 79
valid -, 891, 894
zero abstraction index -, 294
bookkeeping pairs, 601, 607
bool, 87
bool-style, 153, 213
boolean convolution, 950
borderline between language and metalanguage, 873
bound expression, 406
bound instance, 318
bound occurrence, 406
bound variable, 76, 89, 175, 375, 884
Boyer-Moore Theorem Prover, 12
Brouwer-Heyting-Kolmogorov interpretation, 236
de Bruijn index, 49, 809, 811
but for α-reduction, 418
calculated type, 31
Calculus of Constructions, 9, 10, 235
calculus of lambda conversion, 389
calculus ratiocinator, 3
Cambridge LCF, 9
CAML, 11
canonical type, 28, 39, 48, 539, 555
Cantor's paradise, 205, 846
capacity, 606
Cartesian product, 131, 244, 736
case-construction, 245
category, 15, 48, 75, 76, 128
- of all propositions, 131
- of all types, 130
CC, 10
chain, 403
Characteristica Universalis, 3
chastity belt system, 223
checker, 156
checking, 142, 251, 337
Church's thesis, 5
Church-numeral, 246
Church-Rosser, 326
βη-, 577
weak -, 494
- for Λη, 596
- property, 36, 123, 391, 472, 494
- for β1-reduction, 427
- theorem, 35, 155, 171, 371, 386, 390, 485, 655, 663
Church-typing, 232
clash of variables, 605, 788
class, 372, 872
classical ∃-rule, 133
classical logic, 133, 152, 236, 866
classical mathematics, 133
classical real analysis, 136
clause, 875, 878, 885
clause of a line body, 889
closed expression, 406
closure, 11, 40, 418, 519, 655, 676
- for AUT-Π, 629
- for βη-AUT-QE, 537
- for βη-AUT-QE+, 543
- for Λ, 596
- proofs for simpler languages, 551
- property, 37, 123, 173, 473, 517
- theorem, 155, 483
combinatorics, 138
combinatory logic, 177
comment, 739
comment line, 739
common reduct, 173
compatibility condition, 180
compatibility of def and typ, 523, 595
compatible, 231
complete linear order, 139
complete ordered field, 139
completion, 82
complex, 432
composite β-reduction, 417
compound notion, 76
comprehension, 136
computability, 5, 489, 647
computability under substitution, 648
computable functions, 229
computer program semantics, 843, 863
computer programming, 858
concatenation, 54, 954
condition, 875
confluent, 494
confusion of variables, 415
conjunction, 132
conservative extension, 544
constant, 60, 230, 377, 471, 883
- at infinity, 949
- function, 756
- symbol, 474
constructibility, 51
construction, 470
construction irrelevance, 288
constructive existence, 202
constructive reasoning, 167
constructive validity, 25
constructivist, 87
constructivity, 50
Constructor, 11
content, 253
context, 15, 28, 77, 103, 104, 144, 222, 233, 253, 879
ξ-, 360
assumptional - item, 880, 885
declarational - item, 880, 885
empty -, 128, 880
- extension, 734
- indicator, 75, 734
- item, 879, 885
- line, 128, 265, 734
- mechanism, 14
renaming of -s, 581
valid -, 891, 892
variable of a -, 883, 884
continuity, 44, 748
continuous, 965
continuum problem, 221
contractum, 179
conversion, 28, 390
α-, 116
modulo -, 175
β-, 421
calculus of lambda -, 389
- rule, 234
type -, 308, 530
convertibility, 179
conveyor belt, 209
Coq, 11
correct, 372
- book, 172
- PAL book, 84
- expression, 172
- formula, 172
- term, 34
correctness, 172, 263, 327, 328
algorithmic - check, 329
degree -, 307, 312
degree norm -, 593
mathematical -, 166
- in Λ, 592
- in AUT-QE, 119
- of a Q-formula, 524
- of an E-formula, 524
- of a category, 524, 528
- of an expression, 523, 800
- of a line, 802
partial -, 54, 956
program -, 53
semi -, 333
substitutivity of -, 597
total -, 54, 958
- of strings, 801
- of types, 597
CR for β1-reduction, 427, 436
CR for γ-reduction, 436
CR for full βη-reduction, 581
criteria for a good notation, 376
cube of typed lambda calculi, 11
Curry-Howard isomorphism, 235
Curry-typing, 232
d-system, 620
data bank, 164
data type, 10, 53, 938
daughter, 738
dead end set, 639
decidability, 37, 102, 173, 265, 472, 518, 554, 569
feasible -, 18, 25, 167, 173
formal -, 25, 173
decidable, 485
deciding Q-formulas, 124
decision procedure, 172, 555
declarational context item, 880, 885
declarational item, 879
decomposition, 417, 439, 571
defined constant, 493
defined constant-expression, 493
definiendum, 735
definiens, 735
defining line, 734
definite article, 920
definition, 14, 76, 88, 202, 225, 241, 735, 875, 876
algorithmic -, 172, 517, 554, 558, 563, 591
E-, 39, 172, 517, 554
inductive -, 402
language -, 872
- line, 128
- scheme, 493
unfolding a -, 242
definitional
- constants in Λ, 620
- equality, 19, 22, 28, 83, 90, 108, 133, 154, 219, 257, 281, 394, 494, 796
- extension, 185, 522, 544
- line, 146
- line body, 886
- reduction, 47
degree, 16, 29, 32, 103, 130, 148, 181, 283, 307, 320-322, 324, 393, 450
argument -, 186, 525
- correctness, 307, 312
domain -, 186, 525
function -, 185, 525
higher -, 148
- norm, 593
- norm correct, 591
- norm correctness, 593
value -, 185, 525
δ-equality, 107
- reduction, 39, 50, 154, 219, 256, 493, 793, 794
- string, 345
- lambda, 327
Δ, 36, 40, 397
- constructible, 409
Δλ, 33, 313, 327
denotational semantics, 943
dependent product, 234
derivation rule, 154
derivation step, 402
derivative, 137, 744, 751, 753, 756
derived rule, 897
descendant, 738
despecify, 923
despo, 923, 925
Dialectica interpretation, 9
diamond property, 123
didactics, 251
difference quotient, 744, 750
differentiable, 745
differentiation, 44, 733, 744, 750, 756
direct consequence, 417
disjoint one-step i-reduction, 496
disjoint one-step reduction, 527
disjoint reduction, 494
disjoint sum, 132, 244, 736
disjoint union, 32, 304
disjunction, 28, 305
distinction between replacement and substitution, 605
distinctly bound, 397, 407
domain, 126, 232
- degree, 186, 525
- function, 561
mechanical -, 795
- of a term, 48
preservation of -s, 596
Q-, 453
uniqueness of -s, 126, 174, 519, 535
extended -, 519, 535
double negation axiom, 236
double negation law, 216, 728, 748
dummy, 231, 318, 883, 884
dummy-binding, 376
EB-line, 707
ECC, 11
Edinburgh LCF, 9
E-definition, 39, 172, 517, 554
- for Λ, 533
E-formula, 116, 119, 177, 517, 524
elementary β-reduction, 416
elementary β1-reduction, 424
elementary β2-reduction, 424
elementary η-reduction, 439
elementary κ-reduction, 443
elementary mathematics, 143
elimination of primitive constants, 616
elimination rule, 234
embedding, 41
empty block opener, 109
empty book, 894
empty context, 128, 880
empty line, 739
end-point, 316
environment, 49, 817
ε-reduction, 40, 304, 493
εalt-reduction, 311
equality, 46, 135, 293, 708, 900
α-, 107
β-, 107, 108
book -, 16, 19, 22, 41, 112, 154, 159, 219
definitional -, 19, 22, 28, 83, 90, 108, 133, 154, 219, 257, 281, 394, 494, 796
extended -, 583
δ-, 107
η-, 107, 108
intensional -, 192
left hand - rule, 178, 536
- of booleans, 41
- of elements, 41
- of proofs, 306
- on types, 46
right hand - rule, 178
equivalence, 327
equivalence proof, 554, 563
equivalent, 172
η-equality, 107, 108
- reduction, 39, 43, 47, 48, 288, 382, 390, 726, 791, 792
n-step -, 440
η!-reduction, 441
k-fold -, 441
ηd-system, 620
excerpt, 45, 48
excluded middle, 205
excluded third, 45, 216, 285
existence, 93
existential quantification, 132, 748
existential quantifier, 237
existential type, 246
expert system, 865
explicit substitution, 50
explicit typing, 232
exponential function, 138
expression, 28, 59, 254, 402, 736, 875
1-, 258, 371
2-, 258, 371
3-, 258, 371
4-, 722
1p-, 706
1t-, 706
2p-, 706
2t-, 706
3p-, 706
3t-, 706
abbreviating -, 737
abstraction -, 129, 178, 254
application -, 129, 178, 255, 736
bound -, 406
closed -, 406
correct -, 172
defined constant -, 493
fix -, 736
head -, 254, 736
inhabitable -, 307
lambda -, 736
legal -, 233
legitimate -, 452, 482
name carrying -, 380
NF -, 172, 379
normable -, 507
p-, 20, 130
Π-, 736
plus -, 493
primitive constant -, 493
pseudo -, 233
quasi -, 114, 149, 286
saturated -, 59
Σ-, 736
t-, 20, 130
ext-postponement, 499
ext-reduction, 493, 496
extended definitional equality, 583
extended reduction, 583
extended system, 24
extended typed λ-calculus, 168, 303
extended uniqueness of domains, 519, 535
extension, 172, 544
definitional -, 185, 522, 544
unessential -, 185
extensional reduction, 175, 493
extensionality, 47, 136, 192
extensions of Automath, 98
exterior approach, 864
external reference, 322
factor, 404
failure of &-SN, 652
feasibility, 21, 24
feasible decidability, 18, 25, 167, 173
field, 139
finite product, 32
finite sum, 32
first-order language, 158
first-order system, 114, 637
fix expression, 736
fix symbol, 735
fixed point, 965
fixed point semantics, 54
flag, 891
flagless form, 881
flagstaff, 891
flagstaff form, 52, 881, 891
FOL, 166
formal decidability, 25, 173
formula, 875
correct -, 172
E-, 177
Q-, 177
formulae-as-types, 8, 199, 393, 469
fourth degree identification, 283
fragment
p-, 196
t-, 189
free occurrence, 406
free variable, 76, 118
fresh, 884
fresh identifier, 883
fresh variable, 412
function, 17, 231, 389, 908
- abstraction, 231
- application, 231
- degree, 185, 525
- like, 113
- type, 17
type-valued -, 131
- value, 17, 389
functional abstraction, 90
functional binder, 923
functional interpretation, 111
functional interpretation of logic, 19
fundamental rule, 897
γ-equivalence, 430
γ-type, 357
Gödel's general recursive functions, 5
general language rules, 177
general reduction, 446
generalized
- Cartesian product, 239
- conjunction, 132, 711
- functionality, 25
- if-then-else, 711
- logic, 171
- implication, 131, 197, 704, 711
- product, 17, 40, 234
- sum, 32, 40
- typed λ-calculus, 40, 168
generate, 402, 417, 424, 439
generic form, 494
geometrical construction, 50, 54, 226, 843, 850, 960
gL, 874
GOAL, 167
grains of salt, 927, 931
grammatical category, 876
graph reduction, 177
Grundlagen, 157
Hall-König theorem, 138
hardware verification, 12
head expression, 254, 736
head reduction, 571
heading, 739
Heine-Borel theorem, 138
high typing, 871, 877
higher degree, 148
higher mathematics, 143
higher order language, 158, 187
Higher Order Logic, 12
higher order system, 200
higher order typing, 50
Hilbert operator, 93, 728
Hilbert's ε-operator, 152
Hilbert's axiomatization of geometry, 853, 922
Hilbert's axioms, 887
Hilbert's program, 5
Hilbert's selection operator, 205
HOL, 12
identifier, 15, 58, 75, 76, 735, 883
identity
- function, 758
syntactic -, 175
IE-reduction, 493, 496, 634
if-then-else, 41, 54, 711
illative combinatory logic, 195
immune form, 571, 638
imMV, 874
implantation, 32, 322
implementation, 809
- of substitution, 809, 817
implication, 41, 45, 91, 132, 152, 153, 205, 211, 236
generalized -, 197
implicit typing, 232
impredicativity, 8
improper reduction, 585, 632
improper symbol, 401
improved dead end set, 641
incomplete information, 53, 943
incompleteness theorem, 12
increasing reduction, 36
indicator, 15, 75-77
- string, 79
individual, 45, 136, 708
- variable, 401
induction, 46
- on ρ, 508
- on reducts, 598
- on subexpressions, 598
- on the length of proof, 402
inductive definition, 402
Inductive Definitions, 11
ineffective β-chain, 464
inference rule, 87
infinite binary tree, 315
inhabitability condition, 307
inhabitable, 186, 309
- degree condition, 523
- expression, 307
inhabitant, 17
inhabited, 235
injection, 32, 40, 304, 493
instantiate, 129
instantiation, 149, 180, 193, 222, 308
- condition, 593
integers, 137
integrated, 938
integration, 227
intensional equality, 192
interactive program, 47
interior approach, 864
internal reference, 322
internalization, 218
interpretation, 143
interpretation-oriented meta-MV, 874
introduction of a variable, 240
introduction rule, 234
introduction-elimination reduction, 493
introduction-elimination rule, 634
intuitionism, 205, 236
intuitionistic ∃-rule, 133
intuitionistic logic, 236
intuitionistic reasoning, 41
intuitionistic type theory, 8, 25, 230
inverse mapping theorem, 138
irrelevance of objects, 134
irrelevance of proofs, 20, 29, 43, 132, 133, 169, 710, 722, 724
justification, 240
κ-reduction, 443
König's lemma, 968
kind, 236
knowledge frame, 328
label, 316, 885
labeling, 317
lambda abstraction, 18
Lambda Automath, 393
lambda equivalence, 399, 447
lambda expression, 736
lambda notation, 389
lambda phrase, 403
lambda phrase chain, 403
lambda term, 231
lambda tree, 32, 317
Λ, 40, 186, 314, 393, 397, 452, 473, 590, 591
Λ∞, 41, 655, 673
Index of Subjects
Aq, 591 Aqc, 616 Avdr 620 Ac, 616 Ad, 620 A-Automath, 275 A-Calculus, 5, 229 arithmetical typed -, 168 cube of typed -, 11 extended typed -, 168,303 generalized typed -, 39, 168 A-typed -, 313 name-free -, 156,813 polymorphic -, 9, 193 polymorphic predicate -, 10,167 pure typed -, 168 second order -, 230, 235 typed -, 232,845 untyped -, 230 A-definability, 5 A-definable functions, 5 A-typed lambda calculus, 313 A-SEMIPAL, 146 AV, 340 A+, 34 A+-Church, 34 AX, 38, 175, 185,469, 601 AX-1, 185,472 AX-p, 486 AX-theory, 471,478 Xu, 33,339, 340 ATU, 340, 354 A(-term, 343 language, 143 -, 185 non -, 525 closure proofs for simpler -s, 551 - definition, 872 first order -, 158 general - rules, 177
+
Index of Subjects higher order -, 158, 187 meta -, 4, 143, 208, 873 object -, 4 programming - semantics, 23, 160 regular -, 181, 525 specification -, 865 superimposed -, 97 - theory, 155, 171, 172 type in a programming -, 244 large category, 130, 134 lazy evaluation, 49 LCF, 9, 167 left hand equality, 519 - rule, 178, 535 left projection, 736 legal expression, 233 legitimate expression, 452, 482 legitimate Q-application, 462 legitimate term, 38, 39, 472 LEGO, 11 length, 402 length of proof, 424 let-construction, 28, 244 level of a variable, 379 lexicographical order, 321 library, 251 life without types, 890 limit, 743 line, 28, 108, 128, 253, 734, 879 linear order, 139 list, 243 literary replacement, 513, 605 local P-reduction, 242, 244, 325, 382 local reduction, 32 logic, 268, 688, 706, 721, 909 classical -, 236 combinatory -, 177 functional interpretation of -, 19 generalized -, 171
1015 illative combinatory -, 195 intuitionistic -, 236 minimal predicate -, 166 predicate -, 236 propositional -, 236 Logic for Computable Functions, 9 logical constant, 241 Logical Framework, 10 logical paradox, 4 loops, 863 loss factor, 23, 44, 160 low typing, 877 machine verification, 18 macro-operation, 60 main line, 328 main reduct, 572 main reduction, 506 many-step reduction, 122 Mascheroni constructions, 863 mathematical correctness, 166 Mathematical Vernacular, 23, 27, 52, 161, 211, 224, 865 mathematics produced in Automath, 159 mechanical domain, 795 mechanical type, 31, 125 memory, 48 meta language, 4, 143, 208, 873 meta-MV, 874 meta-typing, 871 metalingual discussion, 376 metric space, 138 micro-operation, 61 mimicking, 159 mini-reduction, 33, 242, 326 minimal logic, 132 minimal predicate logic, 166 mixed string, 401 MIZAR, 12 ML, 9, 167, 230
1016 mMV, 874 mock typing, 150, 289 modified parametrized constant, 883 modulo a-conversion, 175 modulo a-reduction, 493 Modus Ponens, 45,91,214, 236,917 monotonic functional, 666 monotonicity, 390 monotonous, 231 monotony rule, 416, 425, 430, 439, 443,447 more-step reduction, 494 mother, 738 multi-step proof, 152 multiple P-reduction, 382, 384 multiple substitution, 39, 49, 501, 512, 815 MV, 52, 865, 874 - book, 879, 891, 911 name, 52, 253, 266, 376, 875-877 -carrying expression, 380 - clash, 33, 35, 47, 156 - of the proof, 47 named variable, 35 name-free, 35, 379 - A-Calculus, 156, 813 - notation, 34, 231, 342 - variable, 25, 343 nameless dummy, 33, 343, 375 nameless variable, 40, 49, 50, 656, 811 natural deduction, 14, 238, 721, 875, 916 natural language, 927 natural number, 46, 137, 241 nearness, 742 Nederpelt’s norm, 278 negation, 28 nested book, 79 nested one step reduction, 663
Index of Subjects nested reduction, 428 new clause of a line, 889 NF expression, 379 non-+-language, 525 non-termination, 53, 54, 940, 947 nonempty, 41 norm, 34, 37,371, 373,456, 507, 664 -correct, 331, 334 normability, 40 normable, 41, 331, 601 - expression, 507 - term, 664, 666 normal, 460 - reduction, 572 - term, 664 normal form, 82, 155, 173, 277, 281, 371, 391, 456, 460 uniqueness of -s, 173 - theorem, 155, 373 normalizable, 460 normalization, 123, 173, 391 - for 0-reduction, 508 - property, 494 strong -, 29, 36,37, 39-41, 123, 173, 391, 463, 472, 649, 655, 664 - theorem, 463 weak -, 34, 39 norming functional, 41, 664 notation rule, 401 Kuprl, 9, 230 object, 128, 130, 878 - language, 4 - name, 76 observability, 51, 850, 854, 861 occurrence, 401 old, 880 old clause of a line, 889 OMV, 874 one-step 0-reduction, 231
Index of Subjects one-step preservation of types, 534 one-step reduction, 121,493 order, 441 order-completeness, 137 ordered field, 139 osmosis, 202, 223 outer shape lemma, 506 p-expression, 20, 130 p-fragment, 196 p-part, 25 p-reduction, 634 pair, 31, 131,244,304,493, 736,907 pair rule, 310 pairing, 40 PAL, 15, 74,79, 146, 148,852 PAL-FT, 146, 147 paradox, 4,208 paragraph, 701 - closing line, 738 - indicator, 738 - line, 734, 737 - marker, 45 - opening line, 738 - reopening line, 738 - system, 44,48,110,147,156, 802 parameter, 128 parametrized constant, 883 parsing, 876,883,926,934 Part P,25 t-, 25, 191 partial correctness, 54,956 partial function, 137, 709, 724, 733, 742,744 Peano’s axioms, 137,241, 846,887 permissible, 449 permutative reduction, 632,634 phrase, 875 ?r-
1017
n-
-reduction, 40, 304, 493, 607 -type, 358
-expression, 736 -operator, 289 -type, 40 platonism, 203, 217,220,878 plus-expression , 493 pMV, 874 PN, 253 PN-line, 16,43, 146, 707 pointed flag, 882,885 polymorphic A-calculus, 9,193 polymorphic predicate A-calculus, 10, 167 polymorphism, 28, 243 positive logic, 851 postponed substitution, 176 postponement, 39,498 - of 1.)-reduction,399,442 power series, 44, 138 powertype, 21, 136 PPA, 10, 167 predicate, 41,93, 132,135, 196,875 predicate logic, 236 predicativity, 9 preservation - of domain, 596 - of types, 174,310, 519,834 - of C u t Y p , 540 - of typ, 596 - of typ*, 596 primary MV, 873 primitive constant-expression, 493 primitive line body, 886 primitive notion, 17, 76, 104, 109, 241 primitive notion line, 128, 734 primitive program, 952 primitive program construct, 953 principal type scheme, 230,232
1018 procedure, 54 processing, 143 processor, 73, 96, 570 product, 28, 45 - formation, 305 - type, 17, 233, 306, 708 program, 279, 951 - abortion, 54 - correctness, 53 - specification, 951 -to-program function, 954 programming language semantics, 23, 160 programming variable, 937 projection, 32, 304, 493 - function, 244 - of pairs, 40 - rule, 310 proof, 88, 128, 131, 242, 251, 919 - by cases, 966 - checking, 166, 167 - class, 211 - irrelevance, 22, 159 - object, 235 - type, 152, 153 proofs-as-objects, 5, 211 prop-inclusion, 114 prop-style, 153, 213 prop-type, 235 propagation, 525 proper identifier, 76 proper reduction sequence, 122, 585 proper subexpression, 403, 493 proposition, 87, 131, 153, 240, 872, 875 propositional logic, 236 propositional variable, 153 propositions-as-types, 5, 16, 18, 22, 27, 41, 111, 150, 168, 196, 198, 211, 235 pseud+expression, 233
Index of Subjects PTS, 11, 27, 233 PTS-rules, 37 pure system, 24, 517 Pure Type System, 11, 27, 233 pure typed A-calculus, 168 Q-applicable, 452 Q-domain, 452 Q-function, 452 @formula, 116, 120, 177, 517, 524 &propagation, 308, 529 quadruple, 53 quantification, 59. 106, 185, 396, 923 quantifier, 135, 225 quasi-expression, 114, 149, 286 quasi-type, 358 quasi-variable, 940 ramified theory of types, 4 rationals, 137 real analysis, 21, 733 realizer, 196 reals, 137 reasoning, 85, 150 rectangular flag, 882, 885 recursion, 21, 28, 53, 246, 963 recursive algorithm, 572 recursive procedure, 942 redex, 179 reduction, 28, 155, 390, 446 +-, 40, 493 +’-, 311 Q-, 47, 413, 790 single-step -, 414 modulo -, 493 p-, 47, 49, 122, 242, 257, 288, 326, 382, 386, 390, 417, 789 composite -, 417 elementary -, 416 local -, 325 multiple -, 382, 384 n-step -, 417
Index of Subjects one-step -, 231 single-step -, 416 PI--, 36, 398, 422, 424 elementary -, 424 n-step -, 424 CR for -, 427, 436 single-step -, 424 P2-, 37, 423, 425 elementary -, 424 n-step -, 424 single-step -, 424 PT-, 603 CR for ,&-, 581 definitional -, 47 delta -, 219 6--, 39, 50, 154, 219, 256, 493, 793, 794 disjoint -, 496 disjoint one-step i-, 496 disjoint one-step -, 527 E-, 40, 304, 493 Ea1t-r 311 7-, 39, 43, 47, 48, 288, 382, 390, 726, 791, 792 elementary - 439 n-step - 439 restricted - 578 single-step -, 439 ?!-, 441 k-fold - 441 extensional -, 175, 493 ext -, 493, 496 extended -, 583
Y-
-, 436 single-step -, 429 graph - , 177 head -, 571 IE-, 493, 496, 634 improper -, 585, 632 increasing -, 36 CR for
1019 introduction-elimination -, 493 6-, 443 elementary -, 443 single-step -, 443 local -, 32 main-, 506 many-step -, 122 mini-, 33, 242, 326 more-step -, 494 - of order p , 441 nested one-step -, 663 nested -, 428 single-step -, 428 normal -, 572 P-, 634 permutative -, 632, 634 A-, 40, 304, 493, 607 r-, 601 rst-, 601 rt-, 601 9-, 601 - sequence, 122, 173, 390 proper -, 122, 585 U-, 40, 304, 493 single-step -, 446 - step, 173 - strategy, 125, 173, 570 t-, 601 twin -, 495 - under substitution, 39, 503, 505 redundancy, 167 reference, 814 - depth, 176, 378 - mapping, 34, 344, 354 - number, 33, 35, 343 - transforming mapping, 177 reflexivity, 136 refmap, 354 refuser, 940, 972 regular language, 181, 525
1020 regular system, 517 relation, 132 relational semantics, 53, 938 relative addressing, 49 remark, 739 renaming, 48, 375 - of contexts, 581 renovation, 411 reopening of a paragraph, 46 replacement, 411 - property, 606 - theorem for P ~ T - S N , 621 - theorem for PT-SN, 606 restricted q-reduction, 578 pnormable, 399, 458, 460 pA-normable, 458 ptype, 360 right hand equality rule, 178 right projection, 736 root, 316 rule of the excluded third, 41 rules of grammar, 872 runtime, 947 saturated expression, 59 scar, 422 scheme, 193 second order lambda calculus, 230, 235 secondary MV, 873 segmap, 350 segment, 23, 33, 241, 339, 345 - calculus, 44 - mapping, 350 semantically equivalent, 952 semantics, 171, 859, 960 -of computer programs, 51, 54 - of programs, 937 semicorrectness, 333 SEMIPAL! 15, 74, 81, 146 sentence, 52, 872, 875
Index of Subjects sequential composition, 53, 54 set, 41, 45, 136, 240, 709, 871, 904 - extensionality, 136 - theory, 50, 138, 841 - type, 236 shape, 571, 786 shorthand facility, 110 C-expression, 736 o-reduction, 40, 304, 493 C-type, 11, 40, 246 similarity, 431 simple replacement, 411 simple theory of types, 5 simple type, 34 simple type theory, 4 simultaneous substitution, 118 simultaneous substitution theorem, 5 14 single line, 29, 32, 40, 590 single line Automath, 590 single substitution, 49 single-step a-reduction, 413 single-step P-reduction, 416 single-step PI-reduction, 424 single-step ,&-reduction, 424 single-step q-reduction, 439 single-step ?-reduction, 429 single-step &-reduction, 443 single-step nested reduction, 428 single-step proof, 152 single-step reduction, 446 skip line, 734, 738 smMV, 874 sMV, 874 sort, 234 sound applicability, 519, 596 soundness of applicability, 535 specification, 53 - language, 865 square brackets lemma, 503,505,510 square brackets lemma for >or, 610
stack, 245, 939
standardization, 391
- theorem, 503
Stanford LCF, 9
state space, 937
state transformation, 939
statement, 233, 872, 875, 876
strategy, 156
strengthening, 308, 613
- rule, 521, 525
string, 44, 158
strong β-normalizability, 466
strong β-normalization, 509
strong β-normalization theorem, 468
strong β₁-normalizability, 466
strong β₁-normalization theorem, 467
strong β₂-normalizability, 466
strong β₂-normalization theorem, 466
strong η-normalizability, 466
strong η-normalization theorem, 468
strong existence, 134
strong normal form theorem, 155
strong normalizability, 468
strong normalization, 29, 36, 37, 39-41, 123, 173, 391, 463, 472, 649, 655, 664
- property, 494
- theorem, 468
strongly normalize, 123
structure of line bodies, 886
structure sharing, 49, 809, 817
subdivided lambda tree, 328
subexpression, 403, 493
subroutine, 858, 863
substantive, 52, 210, 870, 871, 875-877
- sub-, 870, 897
substitution, 14, 28, 47, 49, 51, 59, 89, 118, 129, 380, 414, 809, 815
- lemma, 41
- lemma for β-SN, 509
  postponed -, 176
- theorem for β-SN, 509
- theorem for βτ-SN, 611
substitutivity, 136, 747
- of correctness, 597
subtype, 41, 45, 708
successor, 137, 241, 246
sugaring, 226
sum, 28
- rule, 310
- type, 304
superimposed language, 97
supertype, 114, 236, 303, 470
suppression mechanism, 167
surjectivity of pairing, 40
syllogism, 3
symmetry, 747
syntactic identity, 175, 493
syntactic similarity, 583
syntactic variable, 401
syntax, 54, 960
syntax checker, 571
syntax-oriented meta-MV, 874
system F, 9, 230, 235
t-expression, 20, 130
t-fragment, 189
t-part, 25, 191
tail, 405
τ-expansion, 607
τ-redex, 603
taxonomy of type systems, 229
teaching, 223, 224
telescope, 11, 21, 44, 139, 158, 224, 729
telescopic mapping, 11
term, 390
termination of verification algorithm, 573
text, 872
theorem, 88, 241, 875
theorem proving, 167
theory of real numbers, 733
Theory of Types, 8
time space, 53
to fit in, 650
total correctness, 54, 958
transitivity, 747
translation, 711
translator, 570, 786
tree, 77, 314
tree of knowledge, 78
truth, 747
truth value, 916
twin case, 496
twin occurrence, 495
twin reduction, 495
type, 15, 32, 76, 128, 130, 210, 232, 258, 395, 449, 708, 735
⊕-, 304
abstract data -, 28, 245
anonymous arche-, 871
arche-, 205, 871, 898
assertion -, 688, 722
- assignment, 179, 478
calculated -, 31
canonical -, 28, 39, 48, 539, 555
category of all -s, 130
- check list, 330
- conversion, 308, 530
correctness of -s, 597
data -, 10, 53, 938
equality on -s, 46
existential -, 246
extended -d λ-calculus, 168, 303
function -, 17
γ-, 357
- in a programming language, 244
- inclusion, 19, 25, 30, 32, 39, 114, 150, 184, 285, 289, 305, 314, 335, 470, 479, 483, 531, 538, 651
intuitionistic - theory, 8, 25, 230
- label, 493, 497
-d lambda calculus, 313, 845
lambda -d lambda calculus, 313
mechanical -, 31, 125
- of a lambda tree, 320, 323
- of pairs, 131
Π-, 40, 358
power-, 21, 136
preservation of -s, 174, 310, 519, 534
  one-step -, 534
principal - scheme, 230, 232
product -, 17, 233, 306, 708
proof -, 152, 153
prop-, 235
quasi-, 358
ramified theory of -s, 4
- reduction, 285
- restriction, 50
ρ-, 360
-d set theory, 871
set-, 236
Σ-, 11, 40, 246
simple theory of -s, 5
simple -, 34
simple - theory, 4
- structure, 469
sub-, 41, 45, 708
sum-, 304
super-, 114, 236, 303, 470
- system, 229
τ-,
taxonomy of - systems, 229
- theory, 5, 229
ultimate -, 19
uniqueness of -s, 11, 19, 102, 174, 178, 306, 312, 472, 519, 535, 869
- valued function, 131, 303
typing, 232, 449, 897
Church-, 232
Curry-, 232
explicit -, 232
- function, 554
high -, 871, 877
higher order -, 50
implicit -, 232
low -, 877
meta -, 871
mock -, 150, 289
- operation, 50
- operator, 260
ultimate -, 19
typographical abbreviation, 224, 905, 910, 921, 925
ultimate type, 19
understanding, 143
unessential extension, 185, 521, 522, 544
unfolding a definition, 242
unification of mathematics, 849
uniqueness
- of domains, 126, 174, 519, 535
  extended -, 519, 535
- of normal forms, 123, 173, 392
- of types, 11, 19, 102, 174, 178, 306, 312, 472, 519, 535, 869
- quantifier, 136
universal generator, 391
universal quantification, 132, 290, 748
universal quantifier, 237
unstability, 174
update function, 34
updating, 814
val, 908
valid, 60, 81, 890
- book, 891, 894
- clause, 891, 892
- context, 891, 892
- inference, 3
validity, 109
valuation, 423
value degree, 185, 525
variable, 230, 377, 402, 843, 875, 883
binding -, 176, 233
bound -, 76, 89, 175, 375, 884
clash of -s, 605, 788
confusion of -s, 415
free -, 26, 118
fresh -, 412
individual -, 401
introduction of a -, 240
level of a -, 379
name-free -, 25, 343
named -, 35
nameless -, 40, 49, 50, 656, 811
- of a context, 884
programming -, 937
propositional -, 153
quasi -, 940
segment -, 33, 346
syntactic -, 401
verification, 23, 155, 556, 569, 805
- algorithm, 572
- of E-formulas, 574
- program, 47-49, 570
verifying program, 170, 783, 805, 809
vernacular, 865
vicious circle, 4
waiting list, 328
weak η-postponement, 494
weak δ-advancement, 500
weak Church-Rosser, 494
weak disjunction, 134
weak existence, 20, 134
weak functional behaviour, 34
weak normalization, 34, 39
weakening, 527, 581
weight, 348, 352
where-construction, 28, 244
while-statement, 54, 945, 955
Wiener Kreis, 203
word, 875
WOT, 161, 211
young, 880
Zermelo-Fraenkel axioms, 51, 846
Zermelo-Fraenkel set theory, 50, 866
zero abstraction index book, 294
Zorn's lemma, 138