we reduce P and Q to two distinct irreducible terms P' and Q', we have generated an interesting lemma P' = Q', which is an equational consequence of the rules considered as equations. It may be possible to give an orientation to this new equality for forming an extended term rewriting system, while preserving the finite termination property. This is the basis of the Knuth-Bendix completion method, which attempts to complete a term rewriting system to a confluent one. This method may be considered a way of compiling a canonical form algorithm from an equational specification. We cannot describe the method fully here. The main ideas are that unresolved critical pairs are kept as new rewrite rules, and that all rules are kept inter-reduced. The procedure may stop with a canonical system, it may fail because termination is impossible to establish, or it may loop. Whenever it does not fail, it gives a semi-decision procedure for the original equational theory, as explained in Huet [66]. More detailed expositions of the method may be found in [84,65,71]. Failure may result from some permutative consequence such as commutativity. The method has been extended in various ways in order to consider rewritings modulo such permutative axioms. For instance, Peterson and Stickel [127] have shown that it was possible to extend the method to complete equational presentations, where one or several functors were assumed to be associative and commutative, using Stickel's associative-commutative unification algorithm [150,43]. This method has been extended by Jouannaud and Kirchner [73]. Various other extensions of the Knuth-Bendix procedure have been proposed, for handling constructors (free functors) [69] and for solving word problems in finitely presented algebras [90]. The Knuth-Bendix completion procedure and its extensions give a general framework to simplification techniques. As an example of a canonical term rewriting system we give distributive lattices. Here ∩ and ∪ are assumed to be associative and commutative. The canonical set consists of the following four rules:

x ∩ (x ∪ y) → x
x ∪ (y ∩ z) → (x ∪ y) ∩ (x ∪ z)
x ∪ x → x
x ∩ x → x
Exercise. Show that the other distributivity law is a consequence of the above rules.

Finally, we show the canonical system for Boolean algebras. Now the connectives ∧ and ⊕ (exclusive or) are assumed to be associative and commutative.

x ∧ 1 → x
x ∧ 0 → 0
x ∧ x → x
x ⊕ 0 → x
x ⊕ x → 0
x ∧ (y ⊕ z) → (x ∧ y) ⊕ (x ∧ z)
This canonical set can be used to decide propositional calculus, using the following translations:
x ∨ y → x ⊕ y ⊕ (x ∧ y)
x ⇒ y → x ⊕ (x ∧ y) ⊕ 1
¬x → x ⊕ 1
The resulting decision method is basically the method of Venn diagrams, as the following example demonstrates. With three propositional letters a, b and c, the proposition

(a ∧ ¬b) ∨ (b ∧ ¬c) ∨ (c ∧ ¬a)

reduces to its canonical form:

a ⊕ b ⊕ c ⊕ (a ∧ b) ⊕ (b ∧ c) ⊕ (c ∧ a)
which can easily be "seen" as a disjoint union of regions in a Venn diagram for a, b and c.
This example also shows that disjunctive normal form is not a canonical form, since the above proposition possesses another d.n.f., or, as Quine puts it, a formula may have distinct minimal sets of prime implicants.
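This decision method is easy to program. The following OCaml sketch (ours, not part of the original notes; all names are our own) normalizes a formula to the exclusive-or normal form above, representing a polynomial over GF(2) as a set of monomials, each monomial being a set of propositional letters; the set representation handles associativity, commutativity and the two idempotence rules for free.

(* Deciding propositional tautologies via exclusive-or normal form. *)
module M = Set.Make (String)        (* a monomial: a conjunction of letters   *)
module P = Set.Make (M)             (* a polynomial: an exclusive-or of monomials *)

type formula =
  | Var of string | Top | Bot
  | Not of formula | And of formula * formula
  | Or of formula * formula | Imp of formula * formula

let xor p q = P.union (P.diff p q) (P.diff q p)     (* x ⊕ x → 0, x ⊕ 0 → x *)
let conj p q =                                      (* distribute ∧ over ⊕  *)
  P.fold (fun m acc ->
      P.fold (fun m' acc -> xor (P.singleton (M.union m m')) acc) q acc)
    p P.empty
let one = P.singleton M.empty                       (* the constant 1 *)

let rec norm = function
  | Var x -> P.singleton (M.singleton x)
  | Top -> one
  | Bot -> P.empty
  | Not a -> xor (norm a) one                       (* ¬x → x ⊕ 1 *)
  | And (a, b) -> conj (norm a) (norm b)
  | Or (a, b) ->                                    (* x ∨ y → x ⊕ y ⊕ x∧y *)
      let p, q = norm a, norm b in xor (xor p q) (conj p q)
  | Imp (a, b) ->                                   (* x ⇒ y → x ⊕ x∧y ⊕ 1 *)
      let p, q = norm a, norm b in xor (xor p (conj p q)) one

let is_tautology f = P.equal (norm f) one

For instance, is_tautology (Or (Var "p", Not (Var "p"))) evaluates to true, while the formula of the example above normalizes to exactly the six monomials listed.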
3.6 Sequential computations
We now consider term rewriting systems with two constraints:

(a) left linearity: for every α → β in R, every variable of α occurs exactly once in α;
(b) non-ambiguity: there are no critical pairs.

As we shall see, these systems are always confluent, even though their termination is not assumed. Functional programming languages, and more generally operational semantics rules, can usually be expressed as such systems of rewrite rules [58]. As a very simple example, consider the system of the two rules DefK: K x y = x and DefS: S x y z = x z (y z) defining the combinators S and K. We shall here define the main notions of computation using rewrite rules. The full theory is given in Huet-Lévy [70].

We call redex in term M an occurrence u ∈ D(M) such that α ≤ M/u for some left-hand side α of a rule in R. We define the reduction relation →R associated with R in the same way as in the preceding section. We shall assume R fixed from now on, and write simply → for reduction. Let M → N at redex occurrence u ∈ D(M), using rule α → β ∈ R. Let now v be any redex in M. We define the set v\u of residuals of v as a set of redexes in N, as follows. If v = u, then v\u = ∅. If v < u or v | u (v and u disjoint), then v\u = {v}. Finally, if v > u, this means, by non-overlapping, that v is below some variable x of α. By linearity, x has a unique occurrence in α, which we shall denote by x as well. That is, v = u·x·w for some w. Now let X be the set of occurrences of variable x in β. We define v\u = {u·y·w | y ∈ X}. Thus redex v may have zero, one or more residuals in N. Intuitively, these residuals are the places where one must reduce in N in order to effect the computation step consisting in reducing at redex v in M. Actually, on the natural dag implementation, all the occurrences of v\u denote the same shared node of the dag representing N. Symmetrically, the same holds of u\v. And as expected we have a local confluence diagram, where the single steps u and v converge using all the steps in v\u (resp. u\v). However, this is not sufficient, since we do not want to require → to be Noetherian. It is easy to notice that all the redexes in v\u are mutually disjoint, and that any residual of some redex is always disjoint from any residual of some other disjoint redex. Thus it is natural to extend the reduction relation → to parallel reduction of a set of mutually disjoint redexes, a relation we shall write ⇒. If M ⇒ N using a set of redexes U, then for every set V of mutually disjoint redexes in M, we define the residuals of V by U as: V\U = {w ∈ v\u | u ∈ U ∧ v ∈ V}. And now we have a strong confluence property: if M ⇒ N1 using U and M ⇒ N2 using V, then N1 ⇒ P using V\U and N2 ⇒ P using U\V, for some common term P,
which extends easily to multi-step derivations A and B, yielding:

The parallel moves theorem. Let A and B be two co-initial derivations. Define A ⊔ B = A; B\A. Then A ⊔ B ≡ B ⊔ A, in the sense that these two derivations are co-final, and preserve residuals.

The categorical viewpoint. The category whose objects are terms, and whose arrows from M to N are parallel derivations, quotiented by the equivalence ≡, admits pushouts.

Corollary. The reduction relation → has the Church-Rosser property.

Beware! The lattice structure given by the parallel moves theorem is on derivations, and not on terms. For instance, if we consider the system R consisting solely of the rules I(x) → x and J(x) → x, the following reductions show that the terms I(J(A)) and J(I(A)) do not possess a g.l.b.:
I(J(I(A))) reduces in one step to each of I(J(A)), I(I(A)) and J(I(A)); both I(J(A)) and J(I(A)) reduce in one step to I(A) and to J(A); I(I(A)) reduces to I(A) by two distinct steps; and I(A) and J(A) both reduce to A. Since I(A) and J(A) are incomparable, the terms I(J(A)) and J(I(A)) possess the lower bounds I(A), J(A) and A, but no greatest one.
Note that this phenomenon may be traced to the existence of two non-equivalent derivations between I(I(A)) and I(A). This shows that the categorical viewpoint is the right one here: we need to talk in terms of arrows, not just relations between terms.

The standardization theorem. It is always possible to compute in an outside-in manner. We do not have the space here to explain in a rigorous manner what outside-in exactly means. We just remark that this may be more complicated than merely reducing the leftmost-outermost redex, i.e. the redex minimum in the total ordering on occurrences defined by the preorder (leftmost-outermost) traversal of the term. The analysis of which redexes are needed, in the sense that they must be reduced in every derivation to normal form, leads to the notion of sequential term rewriting system. A further refinement, strong sequentiality, gives a decidable criterion which may be used to drive efficient interpreters which look for a needed redex in linear time, using a generalization to trees of the Knuth-Morris-Pratt string-matching algorithm [85]. This theory is completely explained in Huet-Lévy [70]. In practice, we obtain easy criteria for strong sequentiality in the particular cases of systems with constructors, and "left" systems such as systems of combinator definitions.
4 Natural deduction and λ-calculus

4.1 Proofs with variables; sequents
We now come back to the general theory of proof structures. We saw earlier that the Hilbert presentation of minimal logic was not very natural, in that the trivial theorem A → A necessitated a complex proof S K K. The problem is that in practice one does not use just proof terms, but deductions of the form

Γ ⊢ A

where Γ is a set of (hypothetical) propositions. Deductions are exactly proof terms with variables. Naming these hypothesis variables and the proof term, we write:

{... [xi : Ai] ...} (i ≤ n)  ⊢  M : A

with V(M) ⊆ {x1, ..., xn}. Such formulas are called sequents. Since this point of view is not very well known, let us emphasize this observation:

Sequents represent proof terms with variables.

Note that so far our notion of proof construction has not changed: Γ ⊢Σ M : A iff ⊢Σ∪Γ M : A, i.e. the hypotheses from Γ are used as supplementary axioms, in the same way that in the very beginning we defined T(Σ, V) as T(Σ ∪ V).
4.2 The deduction theorem
This theorem, fundamental for doing proofs in practice, gives an equivalence between proof terms with variables and functional proof terms:

Γ ∪ {A} ⊢ B  iff  Γ ⊢ A → B

That is, in our notations:

a) Γ ⊢ M : A → B  implies  Γ ∪ {x : A} ⊢ (M x) : B. This direction is immediate, using App, i.e. Modus Ponens.

b) Γ ∪ {x : A} ⊢ M : B  implies  Γ ⊢ [x]M : A → B, where the term [x]M is given by the following algorithm.
Schönfinkel's abstraction algorithm:

[x] x = I  (= S K K)
[x] M = (K M)  if x does not occur in M
[x] (M N) = (S [x]M [x]N)
Note that this algorithm motivates the choice of combinators S and K (and optionally I). Again we stress a basic observation:

Schönfinkel's algorithm is the essence of the proof of the deduction theorem.

Now let us consider the rewriting system R defined by the rules DefK and DefS, optionally supplemented by:

DefI: I x = x

and let us write ▷ for the corresponding reduction relation.

Fact. ([x]M N) ▷* M[x ← N].

We leave the proof of this very important property to the reader. The important point is that the abstraction operation, together with the application operator and the reduction ▷, define a substitution machinery. We shall now use this idea more generally, in order to internalize the deduction theorem in a basic calculus of functionality. That is, we forget the specific combinators S and K, in favor of abstraction seen now as a new term constructor.
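As an illustration (our own OCaml sketch, not part of the original notes), here are the abstraction algorithm and the reduction relation ▷ programmed directly; the final assertion checks the Fact on the instance M = (x x), N = n.

(* Schönfinkel's abstraction and SKI reduction. *)
type term = S | K | I | Var of string | App of term * term

let rec occurs x = function
  | Var y -> x = y
  | App (m, n) -> occurs x m || occurs x n
  | _ -> false

(* [x]M, as in the proof of the deduction theorem: *)
let rec abs x m =
  match m with
  | Var y when y = x -> I                         (* [x]x = I *)
  | _ when not (occurs x m) -> App (K, m)         (* [x]M = K M, x not in M *)
  | App (m1, m2) -> App (App (S, abs x m1), abs x m2)
  | _ -> assert false                             (* unreachable *)

(* One leftmost-outermost step of DefI, DefK, DefS, if any: *)
let rec step = function
  | App (I, x) -> Some x
  | App (App (K, x), _) -> Some x
  | App (App (App (S, x), y), z) -> Some (App (App (x, z), App (y, z)))
  | App (m, n) ->
      (match step m with
       | Some m' -> Some (App (m', n))
       | None -> (match step n with
                  | Some n' -> Some (App (m, n'))
                  | None -> None))
  | _ -> None

let rec normalize t = match step t with None -> t | Some t' -> normalize t'

(* Checking the Fact on an instance: ([x](x x) n) reduces to (n n). *)
let () =
  let n = Var "n" in
  let m = App (Var "x", Var "x") in
  assert (normalize (App (abs "x" m, n)) = App (n, n))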
4.3 λ-calculus
Here we give up Σ-terms in general, in favor of λ-terms constructed by 3 elementary operations:

x       variable
(M N)   application
[x] M   abstraction
Ez] [y]sin(z)+cos(y) The variables x and y are bound variables, that is they are dummies and their name does not matter, as long as there are no clashes. This defines a congruence of renaming of bound variables usually called a-conversion. Another method is to adopt de Bruijn's indexes, where variable names disappear in favor of positive natural numbers [15]. We define recursively the sets An of A-expressions valid in a context of length n > 0 as follows:
Λn ::=  k        (1 ≤ k ≤ n)
     |  (M N)    (M, N ∈ Λn)
     |  [] M     (M ∈ Λn+1)

Thus integer n refers to the variable bound by the n-th abstraction above it. For instance, the expression [] (1 [] (1 2)) corresponds to [x] (x [y] (y x)). This example shows that, although more rigorous from a formal point of view, the de Bruijn naming scheme is not fit for human understanding, and we shall now come back to the more usual concrete notation with variable names.
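The following OCaml sketch (ours; it anticipates the β-reduction rule stated next) implements the conversion to de Bruijn form together with the standard index-shifting substitution, so that the rule ([x]M N) ▷ M[x ← N] becomes a purely mechanical operation on nameless terms.

(* De Bruijn notation: conversion from named terms, and β-reduction
   via the standard shifting and substitution on indexes (1-based). *)
type named = Var of string | App of named * named | Lam of string * named
type db = Idx of int | DApp of db * db | DLam of db

(* env lists the binders in scope, innermost first; an index is the
   position of the variable's binder in env. *)
let rec to_db env = function
  | Var x ->
      let rec pos i = function
        | [] -> failwith ("free variable " ^ x)
        | y :: _ when y = x -> i
        | _ :: ys -> pos (i + 1) ys
      in Idx (pos 1 env)
  | App (m, n) -> DApp (to_db env m, to_db env n)
  | Lam (x, m) -> DLam (to_db (x :: env) m)

(* Shift free indexes of t at or above cutoff c by d: *)
let rec shift d c = function
  | Idx k -> if k >= c then Idx (k + d) else Idx k
  | DApp (m, n) -> DApp (shift d c m, shift d c n)
  | DLam m -> DLam (shift d (c + 1) m)

(* Substitute s for index j in t: *)
let rec subst j s = function
  | Idx k -> if k = j then s else Idx k
  | DApp (m, n) -> DApp (subst j s m, subst j s n)
  | DLam m -> DLam (subst (j + 1) (shift 1 1 s) m)

(* One β-step at the root: ([x]M N) ▷ M[x ← N]. *)
let beta = function
  | DApp (DLam m, n) -> shift (-1) 1 (subst 1 (shift 1 1 n) m)
  | t -> t

let _ = to_db [] (Lam ("x", App (Var "x", Lam ("y", App (Var "y", Var "x")))))
(* yields DLam (DApp (Idx 1, DLam (DApp (Idx 1, Idx 2)))), i.e. [](1 [](1 2)) *)
let _ = beta (DApp (DLam (Idx 1), Idx 3))   (* β on (λ.1) at a free index: Idx 3 *)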
The fact observed above is now decreed as a computation rule, usually called β-reduction. Let ▷ be the smallest relation on λ-expressions compatible with application and abstraction and such that:

([x]M N) ▷ M[x ← N].

We call λ-calculus the λ-notation equipped with the β-reduction computation rule ▷. λ-calculus is the basic calculus of substitution, and β-reduction is the basic computation mechanism of functional programming languages. Here is an example of computation:
([x](x x) [y]y) ▷ ([y]y [y]y) ▷ [y]y

We briefly sketch here the syntactic properties of λ-calculus. Similarly to the theory developed above, the notion of residual can be defined. However, the residuals of a redex may not always be disjoint, and thus the theory of derivations is more complex. However the parallel moves lemma still holds, and thus the Church-Rosser property is also true. Finally, the standardization theorem holds, and here it means that it is possible to compute in a leftmost-outermost fashion. These results, and more details, in particular the precise conditions under which β-reduction simulates combinatory logic calculus, are precisely stated in Barendregt [4]. We finally remark that λ-calculus computations may not always terminate. For instance, with Δ = [u](u u) and ⊥ = (Δ Δ), we get ⊥ ▷ ⊥ ▷ ... A more interesting example is given by
Y = [f] ([u](f (u u)) [u](f (u u)))
since (Y f) ▷* (f (Y f)) shows that Y defines a general fixpoint operator. This shows that (full) λ-calculus is inconsistent with logic. What could (fix ¬) mean? As usual with such paradoxical situations, it is necessary to introduce types in order to stratify the definable notions in a logically meaningful way. Thus, the basic inconsistency of Church's λ-calculus, shown by Rosser, led to Church's theory of types [22]. On the other hand, λ-calculus as a pure computation mechanism is perfectly meaningful, and Strachey prompted Scott to develop the theory of reflexive domains as a model theory for full λ-calculus. But let us first investigate the typed universe.

4.4 Gentzen's system N of natural deduction
The idea of λ-notation proofs underlies Gentzen's natural deduction inference rules [48], where App is called →-elim and Abs is called →-intro. The role of variables is taken by the base sequents:

Axiom : A ⊢ A
together with the structural thinning rule:

Thinning : from Γ ⊢ B, infer Γ ∪ {A} ⊢ B

which expresses that a proof may not use all of the hypotheses. Gentzen's remaining rules give types to proofs according to propositions built as functor terms, each functor corresponding to a propositional connective. The main idea of his system is that inference rules should not be arbitrary,
but should follow the functor structure, explaining in a uniform fashion how to introduce a functor, and how to eliminate it. For instance, minimal logic is obtained with Σ = {→}, and the rules of →-intro and →-elim, that is:

Abs : from Γ ∪ {A} ⊢ B, infer Γ ⊢ A → B

App : from Γ ⊢ A → B and Δ ⊢ A, infer Γ ∪ Δ ⊢ B

Now, the β-reduction of λ-calculus corresponds to cut-elimination, i.e. to proof-simplification. Reducing a redex corresponds to eliminating a detour in the demonstration, using an intermediate lemma. But now we have termination of this normalization process, that is the relation ▷ is Noetherian on valid proofs. This result is usually called strong normalization in proof theory. A full account of this theory is given in Stenlund [149]. Minimal logic can then be extended by adding more functors and corresponding inference rules. For instance, conjunction ∧ is taken into account by the intro rule:
Pair : from Γ ⊢ A and Δ ⊢ B, infer Γ ∪ Δ ⊢ A ∧ B

which, from the types point of view, may be considered as product formation, and by the two elim rules:

Fst : from Γ ⊢ A ∧ B, infer Γ ⊢ A

Snd : from Γ ⊢ A ∧ B, infer Γ ⊢ B
corresponding to the two projection functions. This corresponds to building in a λ-calculus with pairing. Generalizing the notion of redex (cut) to the configuration of a connective intro immediately followed by elim of the same connective, we get new computation rules:

Fst(Pair(x, y)) ▷ x
Snd(Pair(x, y)) ▷ y
and the Noetherian property of ▷ still holds. We shall not develop Gentzen's system further. We just remark:

(a) More connectives, such as disjunction, can be added in a similar fashion. It is also possible to give rules for quantifiers, although we prefer to defer this topic until we consider dependent bindings.

(b) Gentzen originally considered natural deduction systems for meta-mathematical reasons, namely to prove their consistency. He considered another presentation of sequent inference rules, the L system, which possesses the subformula property (i.e. the result type of every operator is formed of subterms of the argument types), and is thus trivially consistent. Strong normalization in this context was the essential technical tool to establish the equivalence of the L and the N systems. Of course, according to Gödel's theorem, this does not establish absolute consistency of the logic, but relativizes it to a carefully identified troublesome point, the proof of termination of some reduction relation. This has the additional advantage of providing a hierarchy of strength of inference systems, classified according to the ordinal necessary to consider for the termination proof.

(c) All this development concerns so-called intuitionistic logic, where operators (inference rules) are deterministic. It is possible to generalize the inference systems to classical logic, using a generalized notion of sequent Γ ⊢ Δ, where the right part Δ is also a set of propositions. It is possible to explain the composition of such non-deterministic operators, which leads to Gentzen's systems NK and LK (Klassical logic!). Remark that the analogue of the unification theorem above gives then precisely Robinson's resolution principle for general clauses [139].

(d) The categorical viewpoint fits these developments nicely. This point of view is completely developed in Szabo [151]. The especially important connections between λ-calculus, natural deduction proofs and cartesian closed categories are investigated in [98,121,87,142,35,68].

Further readings on natural deduction proof theory are Prawitz [130] and Dummett [41]. The connection with recursion theory is developed in Kleene [82] and an algebraic treatment of these matters is given in Rasiowa-Sikorski [133].
4.5 Programming languages, recursion
The design of programming languages such as ALGOL 60 was greatly influenced by λ-calculus. In 1966 Peter Landin wrote a landmark article setting the stage for coherent design of powerful functional languages in the λ-calculus tradition [89]. The core language of his proposal, ISWIM (If you See What I Mean!), meant λ-calculus, with syntactically sugared versions of the β-redex ([x]M N), namely let x = N in M and M where x = N respectively. His language followed the static binding rules of λ-calculus. For instance, after the declarations:

let f x = x + y where y = 1;
let y = 2;

the evaluation (reduction) of expression (f 1) leads to value 2, as expected. Note that in contrast languages such as LISP [107], although bearing some similarity with the λ-notation, implement rather dynamic binding, which in the example above would result in the incorrect result 3. This discrepancy has led to heated debates which we want to avoid here, but we remark that static binding is generally considered safer and leads to more efficient implementations where compilation is consistent with interpretation. However, ISWIM is not completely faithful to λ-calculus in one respect: its implementation does not follow the outside-in normal order of evaluation corresponding to the standardization theorem. Instead it follows the inside-out applicative order of evaluation, demanding the arguments to be evaluated before a procedure is called. In the ALGOL terminology, ISWIM follows call by value instead of call by name.

The development of natural deduction as typed λ-calculus fits the development of an ISWIM-based language with a type discipline. We shall call this language ML, which stands for "metalanguage", in the spirit of LCF's ML [54,53]. For instance, we get a core ML0 by considering minimal logic, with → interpreted as functionality, and further constant functors added for basic types such as triv, bool, int and string. Adding products we get a language ML1 where types reflect an intuitionistic predicate calculus with → and ∧. We may define functions on a pattern argument formed by pairing, such as:

let fst (x, y) = x
and the categorical analogue are the so-called cartesian closed categories (CCCs). Adding sums leads to Bi-CCCs with co-products. The corresponding ML2 primitives are inl, inr, outl, outr and isl, with obvious meaning. So far all computations terminate, since the corresponding reduction relations are Noetherian.
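These fragments transcribe almost verbatim into a modern descendant of ML such as OCaml, used here purely for illustration (the names below are our transliterations, not the original primitives); the sketch shows the static-binding example and possible renderings of the ML1 and ML2 primitives.

(* Static binding: y is the y in scope where f was defined. *)
let y = 1
let f x = x + y            (* corresponds to: let f x = x + y where y = 1 *)
let y = 2                  (* shadows the earlier y *)
let _ = assert (f 1 = 2)   (* 2, not 3: static, not dynamic, binding *)
let _ = ignore y

(* Products (ML1): *)
let fst' (x, _) = x

(* Sums (ML2): OCaml spells the injections inl and inr as the
   constructors of a variant type: *)
type ('a, 'b) sum = Inl of 'a | Inr of 'b
let isl = function Inl _ -> true | Inr _ -> false
let outl = function Inl x -> x | Inr _ -> failwith "outl"
let outr = function Inr y -> y | Inl _ -> failwith "outr"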
However such a programming language is too weak for practical use, since recursion is missing. Adding recursion operators may be done in a stratified manner, as presented in Gödel's system T [51], or in a completely general way in ML3, where we allow a "letrec" construct permitting arbitrary recursive definitions, such as:

letrec fact n = if n = 0 then 1 else n * (fact (n - 1))

But then we lose the termination of computations, since it is possible to write unfounded definitions such as letrec absurd x = absurd x. Furthermore, because ML follows the applicative order of evaluation, we may get looping computations in cases where a λ-calculus normal form exists, such as for let f x = 0 in f (absurd x).
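Both phenomena are easy to reproduce. In the following OCaml sketch (ours), absurd is the unfounded definition above; an explicit lazy suspension simulates the normal-order evaluation under which f (absurd x) would reach the normal form 0.

let rec absurd x = absurd x            (* well-typed, but never terminates *)
let f _ = 0

(* f (absurd 0)   would loop: the argument is evaluated first (call by value) *)

(* Suspending the argument simulates call by name: *)
let f_lazy (_ : int Lazy.t) = 0
let _ = assert (f_lazy (lazy (absurd 0)) = 0)   (* terminates with 0 *)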
4.6 Polymorphism
We have polymorphic operators (inference rules) at the meta level. It seems a good idea to push polymorphism to the object level, for functions defined by the user as λ-expressions. To this end, we introduce bindings for type variables. This idea of type quantification corresponds to allowing proposition quantifiers in our propositional logic. First we allow a universal quantifier in prenex position. That is, with T0 = T(Σ, V), we now introduce type schemas in T1 = T0 ∪ ∀α·T1, α ∈ V. A (type) term in T1 has thus both free and bound variables, and we write FV(M) and BV(M) for the sets of free (respectively bound) variables. We now define generic instantiation. Let τ = ∀α1...αm·τ0 ∈ T1 and τ' = ∀β1...βn·τ0' ∈ T1. We define τ' ≥G τ iff τ0' = σ(τ0) with D(σ) ⊆ {α1, ..., αm} and βi ∉ FV(τ) (1 ≤ i ≤ n). Remark that σ acts on FV whereas ≥G acts on BV. Also note that τ' ≥G τ implies σ(τ') ≥G σ(τ). We now present the Damas-Milner inference system for polymorphic λ-calculus [39]. In what follows, a sequent hypothesis A is assumed to be a list of specifications xi : τi, with τi ∈ T1, and we write FV(A) = ∪i FV(τi).
TAUT : A ⊢ x : σ   (x : σ ∈ A)

INST : from A ⊢ M : σ, infer A ⊢ M : σ'   (σ ≤G σ')

GEN : from A ⊢ M : τ, infer A ⊢ M : ∀α·τ   (α ∉ FV(A))

APP : from A ⊢ M : τ' → τ and A ⊢ N : τ', infer A ⊢ (M N) : τ

ABS : from A ∪ {x : τ'} ⊢ M : τ, infer A ⊢ [x]M : τ' → τ

LET : from A ⊢ M : τ' and A ∪ {x : τ'} ⊢ N : τ, infer A ⊢ let x = M in N : τ
For instance, it is an easy exercise to show that ⊢ let i = [x]x in (i i) : α → α.
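The rules GEN and LET are what make this work: i is generalized at its let binding and instantiated at two different types at its two occurrences. Any ML-family typechecker exhibits this; in OCaml, for instance (a sketch):

(* let-bound identifiers are generalized (GEN), so each occurrence
   may be instantiated (INST) at a different type: *)
let _ = (let i = fun x -> x in i i) 3    (* i i : 'a -> 'a, as in the exercise *)

(* A λ-bound identifier is not generalized: ABS assigns it a plain
   type τ', so (fun i -> i i) is rejected by the Damas-Milner rules. *)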
The above system may be extended without difficulty by other functors such as product, and by other ML constructions such as letrec. Actually every ML compiler contains a typechecker implementing implicitly the above inference system. For instance, with the unary functor list and the following ML primitives: [] : (list α), cons : α × (list α) → (list α) (written infix as a dot), hd : (list α) → α and tl : (list α) → (list α), we may define recursively the map functional as:

letrec map f l = if l = [] then [] else (f (hd l)) · map f (tl l)

and we get as its type:

⊢ map : (α → β) → (list α) → (list β).
Of course the ML compiler does not implement directly the inference system above, which is non-deterministic because of rules INST and GEN. It uses unification instead, and thus computes deterministically a principal type, which is minimum with respect to ≤G.
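In OCaml, where (list α) is written 'a list, the same definition is accepted and unification computes exactly this principal type (a sketch):

let rec map f l =
  if l = [] then []
  else f (List.hd l) :: map f (List.tl l)
(* inferred: val map : ('a -> 'b) -> 'a list -> 'b list *)

let _ = map (fun n -> n + 1) [1; 2; 3]    (* evaluates to [2; 3; 4] *)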
4.7 The limits of ML's polymorphism
Consider the following ML definition:

letrec power n f u = if n = 0 then u else f (power (n - 1) f u)
of type nat → (α → α) → (α → α). This function, which associates to the natural n the polymorphic iterator mapping function f to the n-th power of f, may be considered a coercion operator between ML's internal naturals and Church's representation of naturals in pure λ-calculus [23]. Let us recall briefly this representation. Integer 0 is represented as the projection term [f][u] u. Integer 1 is [f][u] (f u). More generally, n is represented as the functional n̄ iterating a function f to its n-th power:

n̄ = [f][u] (f (f ... (f u) ...))

and the arithmetic operators may be coded respectively as:

n + m = [f][u] (n f (m f u))
n × m = [f] (n (m f))
For instance, with 2̄ = [f][u] (f (f u)), we check that 2̄ × 2̄ converts to its normal form 4̄. We would like to consider a type

NAT = ∀α · (α → α) → (α → α)

and be able to type the operations above as functions of type NAT → NAT → NAT. However the notion of polymorphism found in ML does not support such a type; it allows only the weaker

∀α · ((α → α) → (α → α)) → ((α → α) → (α → α)) → ((α → α) → (α → α))
which is inadequate, since it forces the same generic instantiation of NAT in the two arguments.

Warning. These preliminary notes are very sketchy from now on. A future version will cover the topics below in greater depth.
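For the record, modern descendants of ML can simulate the type NAT through explicitly polymorphic record fields. The following OCaml sketch (ours, with made-up names) types the Church operations at NAT → NAT → NAT and checks that 2̄ × 2̄ = 4̄:

(* NAT = ∀α. (α → α) → (α → α), as a record with a polymorphic field. *)
type nat = { iter : 'a. ('a -> 'a) -> 'a -> 'a }

let zero = { iter = fun _ u -> u }
let succ n = { iter = fun f u -> f (n.iter f u) }

(* Arithmetic, as in Church's encoding: *)
let add n m = { iter = fun f u -> n.iter f (m.iter f u) }
let mul n m = { iter = fun f -> n.iter (m.iter f) }

(* Coercion back to machine integers, for checking: *)
let to_int n = n.iter (fun k -> k + 1) 0

let () =
  let two = succ (succ zero) in
  assert (to_int (mul two two) = 4)    (* 2 × 2 = 4 *)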
4.8 Girard's second-order λ-calculus
The example above suggests using the universal type quantifier inside type formulas. We thus consider a functor alphabet based on one binary → constructor and one quantifier ∀. We shall now consider a λ-calculus with such types, which we shall call second-order λ-calculus, owing to the fact that the type language is now a second-order propositional logic, with propositional variables explicitly quantified. Such a calculus was proposed by J.Y. Girard [49,50], and independently discovered by J. Reynolds [135]. Girard proved the main properties of the calculus:

Girard's theorem. Second-order λ-calculus admits strong normalization.

Corollary. Second-order natural deduction is consistent.

Girard used this last result to show the consistency of analysis. Second-order λ-calculus is a very powerful language. Most usual data structures may be represented as types. Furthermore, it captures a large class of total recursive functions (precisely, all the functions provably total in second-order arithmetic). It may seriously be considered as a candidate for the foundations of powerful programming languages, where recursion is replaced by iteration. But the price we pay by extending polymorphism in this drastic fashion is that the notion of principal type is lost. Type synthesis is possible only in easy cases, and thus in general the programmer has to specify the types of the data. Further discussions on the second-order λ-calculus may be found in [108,46,91,7].
5 Dependent types

5.1 Quantification
So far we have dealt only with types as propositions of some (intuitionistic) propositional logic. We shall now consider stronger logics, where it is possible to have statements depending upon variables that are λ-bound. We shall continue our identification of propositions and types, and thus consider a first-order statement such as ∀x ∈ E · P(x) as a product-forming type Πx∈E P(x). We shall call such types dependent, in that it is now possible to declare a variable of a type which depends on the binding of some previously bound variable. Let us first of all remark that such types are absolutely necessary for practical programming purposes. For instance, a matrix manipulation procedure should have a declaration prefix of the type:

[n : nat] [matrix : array(n)]

where the second type depends on the dimension parameter. PASCAL programmers know that the lack of such declarations in the language is a serious hindrance. We shall not develop first-order notions here, and shall rather jump directly to calculi based on higher-order logic.
5.2 Martin-Löf's Intuitionistic Theory of Types
P. Martin-Löf has been developing for the last 10 years a higher-order intuitionistic logic based on a theory of types, allowing dependent sums and products [104,105,106]. His theory is not explicitly based on λ-calculus, but it is formulated in the spirit of natural deduction, with introduction and elimination rules for the various type constructors. Consistency is inferred from semantic considerations, with a model theory giving an analysis of the normal forms of elements of a type, and of the equality predicate for each type. Martin-Löf's system has been advocated as a good candidate for the description and validation of computer programs, and is an active topic of research by the Göteborg Programming Methodology group [117,119,120]. A particularly ambitious implementation of Martin-Löf's system and extensions is under way at Cornell University, under the direction of R. Constable [25,26,132].
5.3 de Bruijn's AUTOMATH languages
The mathematical language AUTOMATH has been developed and implemented by the Eindhoven group, under the direction of Prof. N.G. de Bruijn [14,16,18]. AUTOMATH is a λ-calculus with types that are themselves λ-expressions. It is based on the natural idea that λ-binding and universal instantiation are similar substitution operations. Thus in AUTOMATH there is only one binding operation, used both for parameter abstraction and product instantiation. The meta-theory of the various languages of the AUTOMATH family is investigated in [113,38,75]. The most notable success of the AUTOMATH effort has been the translation and mechanical validation of Landau's Grundlagen [74].
5.4 A Calculus of Constructions
AUTOMATH established the correct linguistic foundations for higher-order natural deduction. Unfortunately, it did not allow Girard's second-order types, and probably for this reason was never considered under the programming language aspect. Th. Coquand showed that a slight extension of the notation allowed the incorporation of Girard's types to AUTOMATH in a natural manner [27]. Coquand showed by a strong normalization theorem that the formalism is consistent. Experiments with an implementation of the calculus showed that it is well adapted to expressing naturally and concisely mathematical proofs and computer algorithms [29]. Variations on this calculus are under development [30,31].
Conclusion

We have presented in these notes a uniform account of logic and computation theory, based on proof theory notions, and most importantly on the Curry-Howard isomorphism between propositions and types [37,59]. These notes are based on a course given at the Advanced School of Artificial Intelligence, Vignieu, France, in July 1985. An extended version is in preparation.
References

[1] A. Aho, J. Hopcroft, J. Ullman. "The Design and Analysis of Computer Algorithms." Addison-Wesley (1974).
[2] P. B. Andrews. "Resolution in Type Theory." Journal of Symbolic Logic 36,3 (1971) 414-432.
[3] P. B. Andrews, D. A. Miller, E. L. Cohen, F. Pfenning. "Automating higher-order logic." Dept of Math, University Carnegie-Mellon (Jan. 1983).
[4] H. Barendregt. "The Lambda-Calculus: Its Syntax and Semantics." North-Holland (1980).
[5] E. Bishop. "Foundations of Constructive Analysis." McGraw-Hill, New York (1967).
[6] E. Bishop. "Mathematics as a numerical language." Intuitionism and Proof Theory, Eds. J. Myhill, A. Kino and R.E. Vesley, North-Holland, Amsterdam (1970) 53-71.
[7] C. Böhm, A. Berarducci. "Automatic Synthesis of Typed Lambda-Programs on Term Algebras." Unpublished manuscript (June 1984).
[8] R.S. Boyer, J Moore. "The sharing of structure in theorem proving programs." Machine Intelligence 7 (1972) Edinburgh U. Press, 101-116.
[9] R. Boyer, J Moore. "A Lemma Driven Automatic Theorem Prover for Recursive Function Theory." 5th International Joint Conference on Artificial Intelligence (1977) 511-519.
[10] R. Boyer, J Moore. "A Computational Logic." Academic Press (1979).
[11] R. Boyer, J Moore. "A mechanical proof of the unsolvability of the halting problem." Report ICSCA-CMP-28, Institute for Computing Science, University of Texas at Austin (July 1982).
[12] R. Boyer, J Moore. "Proof Checking the RSA Public Key Encryption Algorithm." Report ICSCA-CMP-33, Institute for Computing Science, University of Texas at Austin (Sept. 1982).
[13] R. Boyer, J Moore. "Proof checking theorem proving and program verification." Report ICSCA-CMP-35, Institute for Computing Science, University of Texas at Austin (Jan. 1983).
[14] N.G. de Bruijn. "The mathematical language AUTOMATH, its usage and some of its extensions." Symposium on Automatic Demonstration, IRIA, Versailles, 1968. Printed as Springer-Verlag Lecture Notes in Mathematics 125 (1970) 29-61.
[15] N.G. de Bruijn. "Lambda-Calculus Notation with Nameless Dummies, a Tool for Automatic Formula Manipulation, with Application to the Church-Rosser Theorem." Indag. Math. 34,5 (1972) 381-392.
[16] N.G. de Bruijn. "Automath, a language for mathematics." Les Presses de l'Université de Montréal (1973).
[17] N.G. de Bruijn. "Some extensions of Automath: the AUT-4 family." Internal Automath memo M10 (Jan. 1974).
[18] N.G. de Bruijn. "A survey of the project Automath." In To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, Eds Seldin J. P. and Hindley J. R., Academic Press (1980).
[19] M. Bruynooghe. "The Memory Management of PROLOG implementations." Logic Programming Workshop, Ed. Tarnlund S.A. (July 1980).
[20] L. Cardelli. "ML under UNIX." Bell Laboratories, Murray Hill, New Jersey (1982).
[21] L. Cardelli. "Amber." Bell Laboratories, Murray Hill, New Jersey (1985).
[22] A. Church. "A formulation of the simple theory of types." Journal of Symbolic Logic 5,1 (1940) 56-68.
[23] A. Church. "The Calculi of Lambda-Conversion." Princeton U. Press, Princeton N.J. (1941).
[24] A. Colmerauer, H. Kanoui, R. Pasero, Ph. Roussel. "Un système de communication homme-machine en français." Rapport de recherche, Groupe Intelligence Artificielle, Faculté des Sciences de Luminy, Marseille (1973).
[25] R.L. Constable, J.L. Bates. "Proofs as Programs." Dept. of Computer Science, Cornell University (Feb. 1983).
[26] R.L. Constable, J.L. Bates. "The Nearly Ultimate Pearl." Dept. of Computer Science, Cornell University (Dec. 1983).
[27] Th. Coquand. "Une théorie des constructions." Thèse de troisième cycle, Université Paris VII (Jan. 85).
[28] Th. Coquand, G. Huet. "A Theory of Constructions." Preliminary version, presented at the International Symposium on Semantics of Data Types, Sophia-Antipolis (June 84).
[29] Th. Coquand, G. Huet. "Constructions: A Higher Order Proof System for Mechanizing Mathematics." EUROCAL 85, Linz, Springer-Verlag LNCS 203 (1985).
[30] Th. Coquand, G. Huet. "Concepts Mathématiques et Informatiques Formalisés dans le Calcul des Constructions." Colloque de Logique, Orsay (Juil. 1985).
[31] Th. Coquand, G. Huet. "A Calculus of Constructions." To appear, JCSS (1986).
[32] J. Corbin, M. Bidoit. "A Rehabilitation of Robinson's Unification Algorithm." IFIP 83, Elsevier Science (1983) 909-914.
[33] G. Cousineau, P.L. Curien and M. Mauny. "The Categorical Abstract Machine." In Functional Programming Languages and Computer Architecture, Ed. J. P. Jouannaud, Springer-Verlag LNCS 201 (1985) 50-64.
[34] P.L. Curien. "Combinateurs catégoriques, algorithmes séquentiels et programmation applicative." Thèse de Doctorat d'Etat, Université Paris VII (Dec. 1983).
[35] P. L. Curien. "Categorical Combinatory Logic." ICALP 85, Nafplion, Springer-Verlag LNCS 194 (1985).
[36] P.L. Curien. "Categorical Combinators, Sequential Algorithms and Functional Programming." Pitman (1986).
[37] H. B. Curry, R. Feys. "Combinatory Logic Vol. I." North-Holland, Amsterdam (1958).
[38] D. Van Daalen. "The language theory of Automath." Ph.D. Dissertation, Technological Univ. Eindhoven (1980).
[39] Luis Damas, Robin Milner. "Principal type-schemas for functional programs." Edinburgh University (1982).
[40] P.J. Downey, R. Sethi, R. Tarjan. "Variations on the common subexpression problem." JACM 27,4 (1980) 758-771.
[41] M. Dummett. "Elements of Intuitionism." Clarendon Press, Oxford (1977).
[42] F. Fages. "Formes canoniques dans les algèbres booléennes et application à la démonstration automatique en logique du premier ordre." Thèse de 3ème cycle, Univ. de Paris VI (Juin 1983).
[43] F. Fages. "Associative-Commutative Unification." Submitted for publication (1985).
[44] F. Fages, G. Huet. "Unification and Matching in Equational Theories." CAAP 83, l'Aquila, Italy. In Springer-Verlag LNCS 159 (1983).
[45] P. Flajolet, J.M. Steyaert. "On the Analysis of Tree-Matching Algorithms." In Automata, Languages and Programming, 7th Int. Coll., Lecture Notes in Computer Science 85, Springer-Verlag (1980) 208-219.
[46] S. Fortune, D. Leivant, M. O'Donnell. "The Expressiveness of Simple and Second-Order Type Structures." Journal of the Assoc. for Comp. Mach. 30,1 (Jan. 1983) 151-185.
[47] G. Frege. "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought." (1879). Reprinted in From Frege to Gödel, J. van Heijenoort, Harvard University Press, 1967.
[48] G. Gentzen. "The Collected Papers of Gerhard Gentzen." Ed. M.E. Szabo, North-Holland, Amsterdam (1969).
[49] J.Y. Girard. "Une extension de l'interprétation de Gödel à l'analyse, et son application à l'élimination des coupures dans l'analyse et la théorie des types." Proceedings of the Second Scandinavian Logic Symposium, Ed. J.E. Fenstad, North-Holland (1970) 63-92.
[50] J.Y. Girard. "Interprétation fonctionnelle et élimination des coupures dans l'arithmétique d'ordre supérieur." Thèse d'Etat, Université Paris VII (1972).
[51] K. Gödel. "Über eine bisher noch nicht benützte Erweiterung des finiten Standpunktes." Dialectica 12 (1958).
[52] W. D. Goldfarb. "The Undecidability of the Second-order Unification Problem." Theoretical Computer Science 13 (1981) 225-230.
[53] M. Gordon, R. Milner, C. Wadsworth. "A Metalanguage for Interactive Proof in LCF." Internal Report CSR-16-77, Department of Computer Science, University of Edinburgh (Sept. 1977).
[54] M. J. Gordon, A. J. Milner, C. P. Wadsworth. "Edinburgh LCF." Springer-Verlag LNCS 78 (1979).
[55] W. E. Gould. "A Matching Procedure for Omega Order Logic." Scientific Report 1, AFCRL 66-781, contract AF19 (628)-3250 (1966).
[56] J. Guard. "Automated Logic for Semi-Automated Mathematics." Scientific Report 1, AFCRL (1964).
[57] J. Herbrand. "Recherches sur la théorie de la démonstration." Thèse, U. de Paris (1930). In: Ecrits logiques de Jacques Herbrand, PUF Paris (1968).
[58] C. M. Hoffmann, M. J. O'Donnell. "Programming with Equations." ACM Transactions on Programming Languages and Systems 4,1 (1982) 83-112.
[59] W. A. Howard. "The formulae-as-types notion of construction." Unpublished manuscript (1969). Reprinted in To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, Eds Seldin J. P. and Hindley J. R., Academic Press (1980).
[60] G. Huet. "Constrained Resolution: a Complete Method for Type Theory." Ph.D. Thesis, Jennings Computing Center Report 1117, Case Western Reserve University (1972).
[61] G. Huet. "A Mechanization of Type Theory." Proceedings, 3rd IJCAI, Stanford (Aug. 1973).
[62] G. Huet. "The Undecidability of Unification in Third Order Logic." Information and Control 22 (1973) 257-267.
[63] G. Huet. "A Unification Algorithm for Typed Lambda Calculus." Theoretical Computer Science 1,1 (1975) 27-57.
[64] G. Huet. "Résolution d'équations dans des langages d'ordre 1, 2, ..., ω." Thèse d'Etat, Université Paris VII (1976).
[65] G. Huet. "Confluent Reductions: Abstract Properties and Applications to Term Rewriting Systems." J. Assoc. Comp. Mach. 27,4 (1980) 797-821.
[66] G. Huet. "A Complete Proof of Correctness of the Knuth-Bendix Completion Algorithm." JCSS 23,1 (1981) 11-21.
[67] G. Huet. "Initiation à la Théorie des Catégories." Polycopié de cours de DEA, Université Paris VII (Nov. 1985).
[68] G. Huet. "Cartesian Closed Categories and Lambda-Calculus." Category Theory Seminar, Carnegie-Mellon University (Dec. 1985).
[69] G. Huet, J.M. Hullot. "Proofs by Induction in Equational Theories With Constructors." JACM 25,2 (1982) 239-266.
[70] G. Huet, J.J. Lévy. "Call by Need Computations in Non-Ambiguous Linear Term Rewriting Systems." Rapport Laboria 359, IRIA (Aug. 1979).
[71] G. Huet, D. Oppen. "Equations and Rewrite Rules: a Survey." In Formal Languages: Perspectives and Open Problems, Ed. Book R., Academic Press (1980).
[72] J.M. Hullot. "Compilation de Formes Canoniques dans les Théories Equationnelles." Thèse de 3ème cycle, U. de Paris Sud (Nov. 80).
[73] Jean-Pierre Jouannaud, Hélène Kirchner. "Completion of a set of rules modulo a set of equations." (April 1984).
[74] L.S. Jutting. "A translation of Landau's "Grundlagen" in AUTOMATH." Eindhoven University of Technology, Dept of Mathematics (Oct. 1976).
[75] L.S. van Benthem Jutting. "The language theory of Λ∞, a typed λ-calculus where terms are types." Unpublished manuscript (1984).
[76] G. Kahn, G. Plotkin. "Domaines concrets." Rapport Laboria 336, IRIA (Déc. 1978).
[77] J. Ketonen, J. S. Weening. "The language of an interactive proof checker." Stanford University (1984).
[78] J. Ketonen. "EKL - A Mathematically Oriented Proof Checker." 7th International Conference on Automated Deduction, Napa, California (May 1984). Springer-Verlag LNCS 170.
[79] J. Ketonen. "A mechanical proof of Ramsey theorem." Stanford Univ. (1983).
[80] S.C. Kleene. "Introduction to Meta-mathematics." North-Holland (1952).
[81] S.C. Kleene. "On the interpretation of intuitionistic number theory." J. Symbolic Logic 10 (1945).
[82] S.C. Kleene. "On the interpretation of intuitionistic number theory." J. Symbolic Logic 10 (1945).
[83] J.W. Klop. "Combinatory Reduction Systems." Ph.D. Thesis, Mathematisch Centrum Amsterdam (1980).
[84] D. Knuth, P. Bendix. "Simple word problems in universal algebras." In: Computational Problems in Abstract Algebra, J. Leech Ed., Pergamon (1970) 263-297.
[85] D.E. Knuth, J. Morris, V. Pratt. "Fast Pattern Matching in Strings." SIAM Journal on Computing 6,2 (1977) 323-350.
[86] G. Kreisel. "On the interpretation of nonfinitist proofs, Part I, II." JSL 16 (1952, 1953).
[87] J. Lambek. "From Lambda-calculus to Cartesian Closed Categories." In To H. B. Curry: Essays on Combinatory Logic, Lambda-calculus and Formalism, Eds. J. P. Seldin and J. R. Hindley, Academic Press (1980).
[88] J. Lambek and P. J. Scott. "Aspects of Higher Order Categorical Logic." Contemporary Mathematics 30 (1984) 145-174.
[89] P. J. Landin. "The next 700 programming languages." Comm. ACM 9,3 (1966) 157-166.
[90] Philippe Le Chenadec. "Formes canoniques dans les algèbres finiment présentées." Thèse de 3ème cycle, Univ. d'Orsay (Juin 1983).
[91] D. Leivant. "Polymorphic type inference." 10th ACM Conference on Principles of Programming Languages (1983).
[92] D. Leivant. "Structural semantics for polymorphic data types." 10th ACM Conference on Principles of Programming Languages (1983).
[93] J.J. Lévy. "Réductions correctes et optimales dans le λ-calcul." Thèse d'Etat, U. Paris VII (1978).
[94] S. MacLane. "Categories for the Working Mathematician." Springer-Verlag (1971).
[95] D. MacQueen, G. Plotkin, R. Sethi. "An ideal model for recursive polymorphic types." Proceedings, Principles of Programming Languages Symposium (Jan. 1984) 165-174.
[96] D. B. MacQueen, R. Sethi. "A semantic model of types for applicative languages." ACM Symposium on Lisp and Functional Programming (Aug. 1982).
[97] E.G. Manes. "Algebraic Theories." Springer-Verlag (1976).
[98] C. Mann. "The Connection between Equivalence of Proofs and Cartesian Closed Categories." Proc. London Math. Soc. 31 (1975) 289-310.
[99] A. Martelli, U. Montanari. "Theorem proving with structure sharing and efficient unification." Proc. 5th IJCAI, Boston (1977) p. 543.
[100] A. Martelli, U. Montanari. "An Efficient Unification Algorithm." ACM Trans. on Prog. Lang. and Syst. 4,2 (1982) 258-282.
[101] William A. Martin. "Determining the equivalence of algebraic expressions by hash coding." JACM 18,4 (1971) 549-558.
[102] P. Martin-Löf. "A theory of types." Report 71-3, Dept. of Mathematics, University of Stockholm, Feb. 1971, revised (Oct. 1971).
[103] P. Martin-Löf. "About models for intuitionistic type theories and the notion of definitional equality." Paper read at the Orléans Logic Conference (1972).
[104] P. Martin-Löf. "An intuitionistic Theory of Types: predicative part." Logic Colloquium 73, Eds. H. Rose and J. Shepherdson, North-Holland (1974) 73-118.
[105] P. Martin-Löf. "Constructive Mathematics and Computer Programming." In Logic, Methodology and Philosophy of Science 6 (1980) 153-175, North-Holland.
[106] P. Martin-Löf. "Intuitionistic Type Theory." Studies in Proof Theory, Bibliopolis (1984).
[107] J. McCarthy. "Recursive functions of symbolic expressions and their computation by machine." CACM 3,4 (1960) 184-195.
[108] N. McCracken. "An investigation of a programming language with a polymorphic type structure." Ph.D. Dissertation, Syracuse University (1979).
[109] D.A. Miller. "Proofs in Higher-order Logic." Ph.D. Dissertation, Carnegie-Mellon University (Aug. 1983).
[110] D.A. Miller. "Expansion tree proofs and their conversion to natural deduction proofs." Technical report MS-CIS-84-6, University of Pennsylvania (Feb. 1984).
[111] R. Milner. "A Theory of Type Polymorphism in Programming." Journal of Computer and System Sciences 17 (1978) 348-375.
[112] R. Milner. "A proposal for Standard ML." Report CSR-157-83, Computer Science Dept., University of Edinburgh (1983).
[113] R.P. Nederpelt. "Strong normalization in a typed λ-calculus with λ-structured types." Ph.D. Thesis, Eindhoven University of Technology (1973).
[114] R.P. Nederpelt. "An approach to theorem proving on the basis of a typed λ-calculus." 5th Conference on Automated Deduction, Les Arcs, France. Springer-Verlag LNCS 87 (1980).
[115] G. Nelson, D.C. Oppen. "Fast decision procedures based on congruence closure." JACM 27,2 (1980) 356-364.
[116] M.H.A. Newman. "On Theories with a Combinatorial Definition of "Equivalence"." Annals of Math. 43,2 (1942) 223-243.
[117] B. Nordström. "Programming in Constructive Set Theory: Some Examples." Proceedings of the ACM Conference on Functional Programming Languages and Computer Architecture, Portsmouth, New Hampshire (Oct. 1981) 141-154.
[118] B. Nordström. "Description of a Simple Programming Language." Report 1, Programming Methodology Group, University of Göteborg (Apr. 1984).
[119] B. Nordström, K. Petersson. "Types and Specifications." Information Processing 83, Ed. R. Mason, North-Holland (1983) 915-920.
[120] B. Nordström, J. Smith. "Propositions and Specifications of Programs in Martin-Löf's Type Theory." BIT 24 (1984) 288-301.
[121] A. Obtulowicz. "The Logic of Categories of Partial Functions and its Applications." Dissertationes Mathematicae 241 (1982).
[122] M.S. Paterson, M.N. Wegman. "Linear Unification." J. of Computer and Systems Sciences 16 (1978) 158-167.
[123] L. Paulson. "Recent Developments in LCF: Examples of structural induction." Technical Report No 34, Computer Laboratory, University of Cambridge (Jan. 1983).
[124] L. Paulson. "Tactics and Tacticals in Cambridge LCF." Technical Report No 39, Computer Laboratory, University of Cambridge (July 1983).
[125] L. Paulson. "Verifying the unification algorithm in LCF." Technical Report No 50, Computer Laboratory, University of Cambridge (March 1984).
[126] L. C. Paulson. "Constructing Recursion Operators in Intuitionistic Type Theory." Tech. Report 57, Computer Laboratory, University of Cambridge (Oct. 1984).
[127] G.E. Peterson, M.E. Stickel. "Complete Sets of Reductions for Equational Theories with Complete Unification Algorithms." JACM 28,2 (1981) 233-264.
[128] T. Pietrzykowski, D.C. Jansen. "A complete mechanization of ω-order type theory." Proceedings of ACM Annual Conference (1972).
[129] T. Pietrzykowski. "A Complete Mechanization of Second-Order Type Theory." JACM 20 (1973) 333-364.
[130] D. Prawitz. "Natural Deduction." Almqvist and Wiksell, Stockholm (1965).
[131] D. Prawitz. "Ideas and results in proof theory." Proceedings of the Second Scandinavian Logic Symposium (1971).
[132] PRL staff. "Implementing Mathematics with the NUPRL Proof Development System." Computer Science Department, Cornell University (May 1985).
[133] H. Rasiowa, R. Sikorski. "The Mathematics of Metamathematics." Monografie Matematyczne tom 41, PWN, Polish Scientific Publishers, Warszawa (1963).
[134] J. C. Reynolds. "Definitional Interpreters for Higher Order Programming Languages." Proc. ACM National Conference, Boston (Aug. 72) 717-740.
[135] J. C. Reynolds. "Towards a Theory of Type Structure." Programming Symposium, Paris. Springer-Verlag LNCS 19 (1974) 408-425.
[136] J. C. Reynolds. "Types, abstraction, and parametric polymorphism." IFIP Congress '83, Paris (Sept. 1983).
[137] J. C. Reynolds. "Polymorphism is not set-theoretic." International Symposium on Semantics of Data Types, Sophia-Antipolis (June 1984).
[138] J. C. Reynolds. "Three approaches to type structure." TAPSOFT Advanced Seminar on the Role of Semantics in Software Development, Berlin (March 1985).
[139] J. A. Robinson. "A Machine-Oriented Logic Based on the Resolution Principle." JACM 12 (1965) 32-41.
[140] J. A. Robinson. "Computational Logic: the Unification Computation." Machine Intelligence 6, Eds. B. Meltzer and D. Michie, American Elsevier, New York (1971).
[141] D. Scott. "Constructive validity." Symposium on Automatic Demonstration, Springer-Verlag Lecture Notes in Mathematics 125 (1970).
[142] D. Scott. "Relating Theories of the Lambda-Calculus." In To H. B. Curry: Essays on Combinatory Logic, Lambda-calculus and Formalism, Eds. J. P. Seldin and J. R. Hindley, Academic Press (1980).
[143] J.R. Shoenfield. "Mathematical Logic." Addison-Wesley (1967).
[144] R.E. Shostak. "Deciding Combinations of Theories." JACM 31,1 (1984) 1-12.
[145] J. Smith. "Course-of-values recursion on lists in intuitionistic type theory." Unpublished notes, Göteborg University (Sept. 1981).
[146] J. Smith. "The identification of propositions and types in Martin-Löf's type theory: a programming example." International Conference on Foundations of Computation Theory, Borgholm, Sweden (Aug. 1983) Springer-Verlag LNCS 158.
[147] R. Statman. "Intuitionistic Propositional Logic is Polynomial-space Complete." Theoretical Computer Science 9 (1979) 67-72, North-Holland.
[148] R. Statman. "The typed Lambda-Calculus is not Elementary Recursive." Theoretical Computer Science 9 (1979) 73-81.
[149] S. Stenlund. "Combinators, λ-terms, and proof theory." Reidel (1972).
[150] M.E. Stickel. "A Complete Unification Algorithm for Associative-Commutative Functions." JACM 28,3 (1981) 423-434.
[151] M.E. Szabo. "Algebra of Proofs." North-Holland (1978).
[152] W. Tait. "A non constructive proof of Gentzen's Hauptsatz for second order predicate logic." Bull. Amer. Math. Soc. 72 (1966).
[153] W. Tait. "Intensional interpretations of functionals of finite type I." J. of Symbolic Logic 32 (1967) 198-212.
[154] W. Tait. "A Realizability Interpretation of the Theory of Species." Logic Colloquium, Ed. R. Parikh, Springer-Verlag Lecture Notes 453 (1975).
[155] M. Takahashi. "A proof of cut-elimination theorem in simple type theory." J. Math. Soc. Japan 19 (1967).
[156] G. Takeuti. "On a generalized logic calculus." Japan J. Math. 23 (1953).
[157] G. Takeuti. "Proof theory." Studies in Logic 81, Amsterdam (1975).
[158] R. E. Tarjan. "Efficiency of a good but non linear set union algorithm." JACM 22,2 (1975) 215-225.
[159] R. E. Tarjan, J. van Leeuwen. "Worst-case Analysis of Set Union Algorithms." JACM 31,2 (1984) 245-281.
[160] A. Tarski. "A lattice-theoretical fixpoint theorem and its applications." Pacific J. Math. 5 (1955) 285-309.
[161] D.A. Turner. "Miranda: A non-strict functional language with polymorphic types." In Functional Programming Languages and Computer Architecture, Ed. J. P. Jouannaud, Springer-Verlag LNCS 201 (1985) 1-16.
[162] R. de Vrijer. "Big Trees in a λ-calculus with λ-expressions as types." Conference on λ-calculus and Computer Science Theory, Rome, Springer-Verlag LNCS 37 (1975) 252-271.
[163] D. Warren. "Applied Logic - Its use and implementation as a programming tool." Ph.D. Thesis, University of Edinburgh (1977).
An Introduction to Automated Deduction

Mark E. Stickel
Artificial Intelligence Center
SRI International
Menlo Park, California 94025
Contents

1 Introduction
2 Resolution
  2.1 Elimination of Tautologies
  2.2 Purity
  2.3 Subsumption
  2.4 Set of Support
  2.5 P1 and N1 Resolution
  2.6 Hyperresolution
  2.7 Unit Resolution
  2.8 Unit-Resulting Resolution
  2.9 Input Resolution
  2.10 Prolog
  2.11 Linear Resolution
  2.12 Model Elimination
  2.13 Prolog Technology Theorem Prover
  2.14 Connection-Graph Resolution
  2.15 Nonclausal Resolution
  2.16 Connection Method
  2.17 Theory Resolution
  2.18 Krypton
3 Unification
  3.1 Unification in Equational Theories
  3.2 Commutative Unification
  3.3 Associative Unification
  3.4 Associative-Commutative Unification
  3.5 Many-Sorted Unification
4 Equality Reasoning
  4.1 Equality Axiomatization
  4.2 Demodulation
  4.3 Paramodulation
  4.4 Resolution by Unification and Equality
  4.5 E-Resolution
  4.6 Knuth-Bendix Method
References
1 Introduction
In this chapter, we present an informal introduction to many of the methods currently used in automated deduction. The principal method for theorem proving that we discuss is resolution, but we are also substantially concerned with extending the resolution framework to reason more efficiently about particular theories. The chapters by Gérard Huet and Wolfgang Bibel complement this one. In this chapter, we treat classical logic and classical (if resolution, developed in 1963, can be considered classical!) methods of theorem proving. Huet considers systems that merge the notions of computation and deduction and Bibel extends classical reasoning to nonmonotonic reasoning, metalevel reasoning, and reasoning about uncertainty.
Increasingly many good books present various parts of the material that we informally give here. Following are brief descriptions of some of them and their strengths. Chang and Lee [12] was the first textbook for resolution, paramodulation, and unification, and it is still very useful. Loveland [52] and Bibel [5] are newer texts that are exceptionally strong in the areas of linear refinements of resolution and the connection method, which they have developed, respectively. Wos et al. [88] is written for a wider audience and reflects their practical experience in theorem proving; it is especially strong in the areas of deciding how to formalize problems and to select strategies for their solution. Kowalski [39] emphasizes the important connections between automated deduction and logic programming. Manna and Waldinger [56] (unfortunately, only Volume 1 is available so far) and Gallier [22] are new textbooks in symbolic logic that are oriented toward computer science and automated deduction. Some of the topics in this chapter are too new or specialized to be included in any textbooks, but references are included at the end of the chapter for those who want to learn more.
2 Resolution
One of the most important procedures for automated deduction is resolution [66]. Its application to propositional calculus theorem proving will be examined first. The language of the propositional calculus includes a set of propositional symbols P, Q, R, and the like, the logical connectives ¬, ∨, ∧, ⊃, and ≡, the logical constants true and false, and parentheses.
A formula of the propositional calculus is one of
• The atomic formula or atom P, where P is a propositional symbol
• The negation ¬A, where A is a formula of the propositional calculus
• The disjunction (A ∨ B), where A and B are formulas of the propositional calculus
• The conjunction (A ∧ B), where A and B are formulas of the propositional calculus
• The implication (A ⊃ B), where A and B are formulas of the propositional calculus
• The equivalence (A ≡ B), where A and B are formulas of the propositional calculus.

Since ∨, ∧, and ≡ are associative, they are often treated as n-ary operators for arbitrary n so that, for example, (A ∨ (B ∨ C)) and ((A ∨ B) ∨ C) can both be written as (A ∨ B ∨ C). The ¬, ∨, ∧, ⊃, and ≡ are ordered by declining operator precedence. For convenience, parentheses can be omitted where precedence can be used to determine the correct reading. Thus, for example, ((A ∧ B) ⊃ C) can also be expressed by A ∧ B ⊃ C.

Subformulas can be classified as occurring with positive polarity (positively) or with negative
polarity (negatively). A subformula occurs positively in a formula if it is embedded in an even number of explicit or implicit negations (equivalences and left-hand sides of implications implicitly negate formulas). A subformula occurs negatively in a formula if it is embedded in an odd number of explicit or implicit negations. Thus, for example, A occurs positively in A, A V B , A A B , B D A, and A - B and negatively in -~A, A D B, and A - B. Note that A and B and their subformulas occur both positively and negatively in A ----B.
An interpretation of a formula is an assignment of the truth values true or false to each propositional symbol in the formula. The value of a formula in an interpretation can be computed using the following truth table:
A      B      ¬A     ¬B     (A ∨ B)  (A ∧ B)  (A ⊃ B)  (A ≡ B)
true   true   false  false  true     true     true     true
true   false  false  true   true     false    false    false
false  true   true   false  true     false    true     false
false  false  true   true   false    false    true     true
Given a formula of the propositional calculus and an interpretation of it, the value of the formula in the interpretation can be computed by replacing propositional symbols in the formula by their values in the interpretation and reducing the formula by means of the truth table to either true or false. An interpretation satisfies a formula and is a model of it if the formula is true in the interpretation. A formula is valid if and only if every interpretation is a model, and unsatisfiable if and only if no interpretation is a model.

The process of determining validity of a formula by the truth-table method is exponential in the worst case, requiring determination of the value of the formula in each of 2^n interpretations, where n is the number of propositional symbols appearing in the formula. Although all known algorithms for determining validity are exponential, because propositional validity is a co-NP-complete problem, methods such as resolution generally yield better performance than truth-table evaluation as well as being more readily extended to theorem proving in the first-order predicate calculus.

Resolution is a refutation procedure. Instead of determining the validity of a formula directly, it determines the unsatisfiability of its negation. Thus, the first step in the use of the resolution procedure is to negate the formula to be proved valid. In the case where it is intended to prove a theorem from a set of axioms, i.e., the formula is of the form A1 ∧ ··· ∧ An ⊃ B, where the Ai are axioms and B is the theorem, negating the formula results in formation of the conjunction A1 ∧ ··· ∧ An ∧ ¬B, i.e., only the theorem needs to be negated. For most forms of resolution, the formula must then be transformed into clause form.
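To make the truth-table method concrete, the following minimal Python sketch evaluates a formula in every interpretation. The nested-tuple representation of formulas and all the names (atoms, value, valid) are our own choices for illustration, not anything prescribed by the methods discussed here.

from itertools import product

def atoms(f):
    # Collect the propositional symbols occurring in a formula.
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(a) for a in f[1:]))

def value(f, interp):
    # Evaluate a formula in an interpretation (a dict: symbol -> bool).
    if isinstance(f, str):
        return interp[f]
    op, args = f[0], [value(a, interp) for a in f[1:]]
    if op == 'not':     return not args[0]
    if op == 'or':      return any(args)
    if op == 'and':     return all(args)
    if op == 'implies': return (not args[0]) or args[1]
    if op == 'equiv':   return args[0] == args[1]
    raise ValueError(op)

def valid(f):
    # Truth-table test: exponential in the number of symbols.
    syms = sorted(atoms(f))
    return all(value(f, dict(zip(syms, vs)))
               for vs in product([True, False], repeat=len(syms)))

assert valid(('implies', ('and', 'A', 'B'), 'A'))   # (A ∧ B) ⊃ A is valid
assert not valid(('or', 'A', 'B'))                  # A ∨ B is not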
A literal is either a propositional symbol (e.g., P) or the negation of a propositional symbol (e.g., ¬P). The former are positive literals and the latter are negative literals. A clause is a disjunction L1 ∨ ··· ∨ Ln of literals. The logical constant false is sometimes referred to as the empty clause, because it can be viewed as the disjunction of zero literals. A unit clause is a clause with exactly one literal. More generally, an n-clause for n = 1, 2, 3, ... is a clause with exactly n literals. A clause with at least two literals is called a nonunit clause.

A positive clause is a clause all of whose literals are positive. A negative clause is a clause all of whose literals are negative. (The empty clause can be considered both positive and negative.) A mixed clause is a clause that is neither positive nor negative, i.e., it has at least one positive and at least one negative literal. A Horn clause is a clause with at most one positive literal. A pair of literals is complementary if one is positive and the other is negative and their propositional symbols are the same. A formula is in clause form if it is a conjunction C1 ∧ ··· ∧ Cn of clauses Ci.
Given that conjunction and disjunction are associative, commutative, and idempotent, a formula is often regarded as a set of clauses, with each clause being a set of literals. A formula can be transformed to clause form by application of the following rewrites until the formula cannot be rewritten any further:

(A ≡ B) → ((¬A ∨ B) ∧ (¬B ∨ A))
(A ⊃ B) → (¬A ∨ B)
¬¬A → A
¬(A ∨ B) → (¬A ∧ ¬B)
¬(A ∧ B) → (¬A ∨ ¬B)
(A ∨ (B ∧ C)) → ((A ∨ B) ∧ (A ∨ C))
((B ∧ C) ∨ A) → ((B ∨ A) ∧ (C ∨ A))

Note that the clause form of a formula is not necessarily unique. For example, (P ≡ Q) ∧ (Q ≡ R) ∧ (R ≡ P) is equivalent to both (¬P ∨ Q) ∧ (¬Q ∨ R) ∧ (¬R ∨ P) and (¬Q ∨ P) ∧ (¬R ∨ Q) ∧ (¬P ∨ R).
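A sketch of how these rewrites might be applied mechanically, using the same nested-tuple representation as in the earlier evaluator sketch; the function name cnf and the restriction to binary connectives are our simplifying assumptions:

def cnf(f):
    # Apply the clause-form rewrites until no rewrite applies.
    if isinstance(f, str):
        return f
    op = f[0]
    if op == 'equiv':
        a, b = f[1], f[2]
        return cnf(('and', ('or', ('not', a), b), ('or', ('not', b), a)))
    if op == 'implies':
        a, b = f[1], f[2]
        return cnf(('or', ('not', a), b))
    if op == 'not':
        a = f[1]
        if isinstance(a, str):
            return f                                     # a literal
        if a[0] == 'not':                                # ¬¬A → A
            return cnf(a[1])
        if a[0] == 'or':                                 # ¬(A ∨ B)
            return cnf(('and', ('not', a[1]), ('not', a[2])))
        if a[0] == 'and':                                # ¬(A ∧ B)
            return cnf(('or', ('not', a[1]), ('not', a[2])))
        return cnf(('not', cnf(a)))                      # rewrite inside first
    if op == 'and':
        return ('and', cnf(f[1]), cnf(f[2]))
    if op == 'or':
        a, b = cnf(f[1]), cnf(f[2])
        if isinstance(a, tuple) and a[0] == 'and':       # (B ∧ C) ∨ A
            return ('and', cnf(('or', a[1], b)), cnf(('or', a[2], b)))
        if isinstance(b, tuple) and b[0] == 'and':       # A ∨ (B ∧ C)
            return ('and', cnf(('or', a, b[1])), cnf(('or', a, b[2])))
        return ('or', a, b)
    raise ValueError(op)

print(cnf(('equiv', 'P', 'Q')))
# ('and', ('or', ('not', 'P'), 'Q'), ('or', ('not', 'Q'), 'P'))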
The resolution rule of inference states that the resolvent clause A ∨ B can be derived from the parent clauses P ∨ A and ¬P ∨ B, where P is a propositional symbol and A and B are arbitrary clauses. Thus, false can be obtained by resolving the clauses P and ¬P, Q can be obtained by resolving the clauses P and ¬P ∨ Q, and Q ∨ R can be obtained by resolving the clauses P ∨ R and ¬P ∨ Q. The order of the literals in the clauses is unimportant. For example, P ∨ A denotes any clause that is the disjunction of P and the literals, if any, of A. Clauses that are derived from other clauses are referred to as derived clauses. Other clauses, i.e., those that were given as inputs to the deduction system, are referred to as input clauses. The resolution rule of inference is an extension of the standard modus ponens rule in logic, which permits the derivation of Q from P and P ⊃ Q, whose clause form is ¬P ∨ Q. A set of clauses is unsatisfiable if and only if the empty clause false is derivable from the set of clauses by resolution; that is, resolution is refutation complete (or simply complete). Following is a resolution proof of the unsatisfiability of {P ∨ Q, P ∨ ¬Q, ¬P ∨ Q, ¬P ∨ ¬Q} (note that idempotence of ∨ is used to automatically replace clauses of the form P ∨ P ∨ C by the equivalent P ∨ C):
1. P ∨ Q
2. P ∨ ¬Q
3. ¬P ∨ Q
4. ¬P ∨ ¬Q
5. P          resolve 1 and 2
6. ¬P         resolve 3 and 4
7. false      resolve 5 and 6
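The proof above can be reproduced by a small saturation prover. The following Python sketch represents clauses as sets of literals, with '~' marking negation, so idempotence of ∨ falls out of the set representation; all names are ours:

def resolve(c1, c2):
    # All propositional resolvents of two clauses.
    out = []
    for lit in c1:
        comp = lit[1:] if lit.startswith('~') else '~' + lit
        if comp in c2:
            out.append(frozenset((c1 - {lit}) | (c2 - {comp})))
    return out

def refute(clauses):
    # Saturate under resolution; True iff the empty clause is derived.
    clauses = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                for r in resolve(a, b):
                    if not r:
                        return True      # empty clause: unsatisfiable
                    if r not in clauses:
                        new.add(r)
        if not new:
            return False                 # saturated without deriving false
        clauses |= new

assert refute([{'P', 'Q'}, {'P', '~Q'}, {'~P', 'Q'}, {'~P', '~Q'}])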
The following sections will be concerned with various refinements of resolution. These refinements will be described in terms of resolution applied to the propositional calculus, but they can readily be extended to apply to the first-order predicate calculus. But first consider the general requirements for resolution in the first-order predicate calculus. The first-order predicate calculus includes variables and n-ary predicate and function symbols. Propositional symbols are essentially 0-ary predicate symbols. Constant symbols are 0-ary function symbols. The first-order predicate calculus also adds the quantifiers ∀ and ∃.
A term of the first-order predicate calculus is one of

• A variable symbol
• f(t1,...,tn), where f is an n-ary function symbol and t1,...,tn are terms.

A formula of the first-order predicate calculus is one of

• The atomic formula or atom P(t1,...,tn), where P is an n-ary predicate symbol and t1,...,tn are terms
• The universal quantification (∀xA), where x is a variable and A is a formula of the first-order predicate calculus
• The existential quantification (∃xA), where x is a variable and A is a formula of the first-order predicate calculus
• The negation, disjunction, conjunction, implication, and equivalence of first-order predicate calculus formulas, defined analogously to formulas of the propositional calculus.

The intuitive interpretation of the quantified formulas is that ∀xA means that A is true for every value of x and ∃xA means that A is true for some value of x. ∀xA is equivalent to ¬∃x¬A and ∃xA is equivalent to ¬∀x¬A. The quantifier ∀x or ∃x binds the variable x (and x is bound by the quantifier) in ∀xA or ∃xA. Resolution operates on unquantified formulas, so it is necessary to remove quantifiers from quantified formulas by skolemization. The skolemized formula is unsatisfiable if and only if the original formula is unsatisfiable.
The concept of quantifier force is used to deal with the fact that a universal quantifier behaves like an existential quantifier, and vice versa, if it appears inside a negation. If A is a formula and ∀xB is a subformula of A, then ∀x has universal force in A if ∀xB occurs positively in A and existential force in A if it occurs negatively in A. Similarly, if A is a formula and ∃xB is a subformula of A, then ∃x has existential force in A if ∃xB occurs positively in A and universal force in A if it occurs negatively in A.

Let A be a formula to be tested for unsatisfiability. Assume that A has no unbound variables and that each quantifier binds a different variable. These conditions can be achieved by adding universal quantifiers to the beginning of the formula for unbound variables and renaming variables. Assume further that every quantifier in A is of universal force or existential force, but not both, i.e., no quantifier appears inside an equivalence. If some quantifier appears inside an equivalence B ≡ C, the equivalence must be replaced by an equivalent formula such as (B ⊃ C) ∧ (C ⊃ B).

Let QxB be a subformula of A where Qx is a quantifier of existential force. Let Q1x1A1, ..., QnxnAn (n ≥ 0) be the successively smaller quantified subformulas (each Ai contains Qi+1xi+1Ai+1) of A that contain QxB, where each Qixi is a quantifier of universal force. Then replace QxB in A by the formula B with every occurrence of x replaced by the term c if n = 0 or f(x1,...,xn) if n ≥ 1, where c is a new Skolem constant or f a new Skolem function, i.e., one that does not already appear in the formula. This process is repeated until no quantifiers of existential force remain, at which point all remaining quantifiers can be removed, leaving an unquantified formula.

Skolemization is often described as if it applied only to formulas in prenex form, i.e., those of the form Q1x1 ··· QnxnA where A contains no quantifiers. However, this restriction is unnecessary and has the disadvantage that skolemizing a formula after its conversion to an equivalent formula in prenex form may lead to Skolem functions having more arguments than necessary. For example, to prove that John has a father from the statement that everyone has a father, it is necessary to refute the formula ∀x∃y Father(x,y) ∧ ¬∃z Father(John,z). This can be skolemized to Father(x,f(x)) ∧ ¬Father(John,z). A single-step resolution refutation exists with the substitution of John for x and f(John) for z. The Skolem function f can sometimes be intuitively interpreted as the function of its arguments x1,...,xn that computes the value required for the containing expression to be true. For example, in Father(x,f(x)), f(x) can be thought of as referring to the father of x.
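A minimal sketch of this skolemization process in Python, under two simplifying assumptions of ours: negations have already been pushed inward, so every quantifier has its apparent force, and variables, terms, and formulas are represented as strings and nested tuples. The names skolemize and subst are ours.

def skolemize(f, univ=(), counter=[0]):
    # univ is the tuple of universally quantified variables in scope;
    # each existential variable is replaced by a new Skolem function
    # of those variables (a 0-ary tuple acts as a Skolem constant).
    if isinstance(f, str):
        return f
    if f[0] == 'forall':                         # drop the quantifier
        _, x, body = f
        return skolemize(body, univ + (x,), counter)
    if f[0] == 'exists':
        _, x, body = f
        counter[0] += 1
        sk = ('f%d' % counter[0],) + univ
        return skolemize(subst(body, x, sk), univ, counter)
    return (f[0],) + tuple(skolemize(a, univ, counter) for a in f[1:])

def subst(f, x, t):
    # Replace every occurrence of variable x in f by the term t.
    if f == x:
        return t
    if isinstance(f, tuple):
        return tuple(subst(a, x, t) for a in f)
    return f

# The example from the text, ∀x∃y Father(x,y) ∧ ∀z ¬Father(John,z):
f = ('and', ('forall', 'x', ('exists', 'y', ('Father', 'x', 'y'))),
            ('forall', 'z', ('not', ('Father', 'John', 'z'))))
print(skolemize(f))
# ('and', ('Father', 'x', ('f1', 'x')), ('not', ('Father', 'John', 'z')))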
An expression is called ground if it contains no variables. A set of ground clauses can be regarded as a syntactic variation of clauses of the propositional calculus. A set of clauses S is unsatisfiable if and only if there is an unsatisfiable set of ground clauses S′ such that each clause in S′ is an instance of a clause in S. Note that a single clause in S may require more than one instance in S′ for S′ to be unsatisfiable. For example, the set of clauses S consisting of P(a) ∨ P(b) and ¬P(x) is unsatisfiable, but S′ contains two instances ¬P(a) and ¬P(b) of ¬P(x). When instantiating clauses in S, it is only necessary to consider replacing variables by terms constructible from symbols occurring in S (the Herbrand universe of S), i.e., no new function or constant symbols need be introduced. An exception is that if S contains variables but no constant symbols, then a single constant symbol is added.

Before resolution was developed, some proof procedures successively formed instantiations of S by replacing variables by terms in the Herbrand universe of S in ascending order of term complexity. The resulting sets S′ were then tested for unsatisfiability. This approach is inefficient because the instantiation process is not well directed toward finding the specific instances of variables that lead to the result being unsatisfiable. Resolution is an important inference procedure for two reasons. First, as described above, it is a single inference rule for determining the unsatisfiability of sets of clauses of the propositional calculus. Second, it instantiates variables in a manner that is more directed toward finding an unsatisfiable instantiation.

When resolving two clauses of the first-order predicate calculus, two literals are resolved on and the remaining literals are disjoined to form the resolvent, just as for propositional calculus clauses. However, there are two differences.
First, two clauses of the first-order predicate calculus are standardized apart before being resolved, i.e., variables of one or both of the clauses are renamed so that the two clauses have no variables in common. This is valid because a set of clauses whose unsatisfiability is being determined is considered to be the conjunction of a set of universally quantified clauses, and any pair of conjoined formulas ∀xP(x) ∧ ∀xQ(x) is equivalent to a variable-renamed one ∀xP(x) ∧ ∀yQ(y). The set of clauses consisting of P(a,x) and ¬P(x,b) is unsatisfiable, but P(a,x) and P(x,b) have no common instance, as is required for a resolution operation. After renaming variables, however, P(a,x) and P(y,b) have the common instance P(a,b), and resolution is possible.

Second, resolution finds by unification a most general substitution that makes a pair of literals complementary. This substitution is then applied to the remaining literals in forming the resolvent. For example, if P(a,x) ∨ Q(x) and ¬P(y,b) ∨ R(y) are resolved, the most general substitution that makes P(a,x) and ¬P(y,b) complementary is the substitution of b for x and a for y, and the resolvent is Q(b) ∨ R(a). By finding most general substitutions to make pairs of literals from pairs of clauses complementary, resolution progressively finds instantiations of clauses that might lead to a ground refutation.
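The following Python sketch of unification with the occurs check is one standard way such a most general substitution can be computed. The term representation (variables are lowercase strings; constants and compound terms are tuples headed by their symbol) and all names are our assumptions, not the algorithm of any particular system.

def is_var(t):
    return isinstance(t, str) and t[:1].islower()

def walk(t, sub):
    while is_var(t) and t in sub:
        t = sub[t]
    return t

def occurs(x, t, sub):
    t = walk(t, sub)
    return t == x or (isinstance(t, tuple)
                      and any(occurs(x, a, sub) for a in t[1:]))

def unify(s, t, sub=None):
    # Most general unifier of two terms, with the occurs check.
    sub = {} if sub is None else sub
    s, t = walk(s, sub), walk(t, sub)
    if s == t:
        return sub
    if is_var(s):
        return None if occurs(s, t, sub) else {**sub, s: t}
    if is_var(t):
        return None if occurs(t, s, sub) else {**sub, t: s}
    if isinstance(s, tuple) and isinstance(t, tuple) \
            and s[0] == t[0] and len(s) == len(t):
        for a, b in zip(s[1:], t[1:]):
            sub = unify(a, b, sub)
            if sub is None:
                return None
        return sub
    return None                                  # symbol clash

# P(a,x) and P(y,b): the mgu substitutes b for x and a for y.
print(unify(('P', ('a',), 'x'), ('P', 'y', ('b',))))
# {'y': ('a',), 'x': ('b',)}
print(unify('x', ('f', 'x')))                    # None: occurs check fails

Applying the computed substitution to the remaining literals Q(x) and R(y) then yields the resolvent Q(b) ∨ R(a) of the example above.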
Completeness arguments for resolution for the first-order predicate calculus generally rely on lifting theorems. These show how a resolution refutation of S′, whose clauses are ground instances of clauses in S, can be imitated by a resolution refutation of S. The fact that two or more literals in a clause can be collapsed into a single literal in a ground instance is a complication. For example, the set of clauses consisting of P(x) ∨ P(y) and ¬P(u) ∨ ¬P(v) is unsatisfiable because the ground instances P(a) and ¬P(a) of the clauses are contradictory. However, resolving P(x) ∨ P(y) and ¬P(u) ∨ ¬P(v) only leads to resolvents like P(y) ∨ ¬P(v), P(y) ∨ ¬P(u), and the like. Resolving these resolvents among themselves and with the original clauses also yields no progress toward a refutation. In fact, every resolvent has two literals, and the empty clause can never be derived.

There are two solutions to this difficulty. One is to model resolution for general clauses directly on resolution for ground instances, so that there is a general resolution step corresponding to each step in the refutation of the ground instances. To accomplish this, it is necessary to resolve on possibly more than one literal of each clause simultaneously. For example, the set of clauses consisting of P(x) ∨ P(y) and ¬P(u) ∨ ¬P(v) has ground instances P(a) and ¬P(a) with a single-step resolution refutation. The general resolution operation will then have to find a most general substitution that makes all of P(x), P(y), P(u), and P(v) identical (for example, by substitution of x for y, u, and v). The second solution entails the addition of the factoring operation. The resolution rule for general clauses resolves on only a single pair of literals, as in the case of ground clauses, but the additional factoring operation adds clause instances (factors) that result from instantiating two or more literals of a clause so that they are identical. Thus, P(x) is a factor of P(x) ∨ P(y) and ¬P(u) is a factor of ¬P(u) ∨ ¬P(v). The factors can then be resolved so that they result in the empty clause. When more than a single pair of literals must be collapsed to one in a factor (e.g., two separate pairs of literals must each be collapsed to single literals, or three literals must all be collapsed to a single literal), all the factors can be generated by successively applying the factoring operation to single pairs of literals.

Although resolution is complete, it is not very efficient when measured in terms of the size of the search space for a resolution refutation. Since the development of resolution, many refinements have improved its efficiency. Some, such as elimination of tautologies and subsumption, discard useless or redundant results. Many restrict which pairs of clauses are allowed to resolve with each other. Some of these restrictions, such as set of support, preserve completeness, while others, such as unit resolution, are complete for only some sets of clauses.
2.1 Elimination of Tautologies
A clause that contains both a literal and its negation is a tautology that can, in most resolution procedures, be discarded. The exceptions among the procedures discussed here are model elimination, which uses chains instead of clauses, but may require retaining chains with complementary literals, and some forms of theory resolution. In general, the rationale for being able to discard tautologies is that they can be evaluated as true by truth-functional rules and, thus, cannot contribute to the falsity of a conjunction of clauses.
2.2 Purity
A literal whose complement does not appear in a set of clauses is called pure. Because a pure literal can never be resolved on and thus eliminated, any clause containing a pure literal can never appear in the derivation of the empty clause. Thus, all clauses containing pure literals can be safely deleted.
2.3 Subsumption
A clause C subsumes a clause D if C's literals are a subset of D's literals. If C subsumes D, it requires no more work to derive the empty clause from C than from D; thus, D can be eliminated. Two forms of subsumption can be employed: forward subsumption, the discarding of a newly derived clause that is subsumed by a clause that is already present, i.e., an input clause or a previously derived clause, and backward subsumption, the discarding of clauses already present that are subsumed by a newly derived clause. Normally, when a clause is derived, it should be tested for elimination by forward subsumption before being used to eliminate other clauses by backward subsumption. If these operations are performed in the opposite order, then some clause necessary to a refutation may be continually derived and then eliminated shortly thereafter by backward subsumption without ever being used, because the search strategy may order inference operations partially based on the age of the clause.
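Propositionally, tautology elimination, purity deletion, and subsumption are each a few lines. A minimal sketch, with clauses as sets of literals as before (names ours):

def complement(lit):
    return lit[1:] if lit.startswith('~') else '~' + lit

def is_tautology(clause):
    # A clause containing both L and its negation is a tautology.
    return any(complement(l) in clause for l in clause)

def subsumes(c, d):
    # Propositionally, C subsumes D if C's literals are a subset of D's.
    return c <= d

def remove_pure(clauses):
    # Delete every clause containing a literal whose complement
    # occurs nowhere in the set.
    lits = set().union(*clauses) if clauses else set()
    return [c for c in clauses
            if all(complement(l) in lits for l in c)]

assert is_tautology({'P', '~P'})
assert subsumes({'P'}, {'P', 'Q'})
assert remove_pure([{'P'}, {'~P', 'R'}]) == [{'P'}]   # R is pure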
2.4 Set of Support
Resolution is often used to prove a theorem from a set of axioms that is known to be satisfiable. However, unrestricted resolution does not distinguish between clauses that are created from axioms (axiom clauses) and those created from the [negation of the] theorem (theorem clauses); all are treated alike. A refutation cannot be found from the satisfiable set of axiom clauses alone; a refutation must depend upon the theorem clauses. The set of support restriction was developed to take advantage of this necessary dependency and make resolution more goal-directed.

The set of support restriction [90] is a complete restriction of resolution that requires division of the total set of clauses S into disjoint subsets T and S - T such that S - T is a satisfiable set of clauses. The set of clauses created from the theorem is typically designated as the set of support T; axiom clauses would then comprise S - T. The set of support restriction allows two clauses to be resolved only if at least one of the clauses is supported by T, i.e., is in T or has an ancestor clause in T. This can substantially reduce the size of the search space and makes the procedure more goal-directed, because every derived clause is derived from a theorem clause.
When all the theorem clauses are
designated as the set of support T, this means that the only unallowed resolution operations are those between axiom clauses. Even if a problem is not posed in terms of a satisfiable set of axioms and a theorem so that the theorem clauses can be designated as the set of support, the set of support restriction can still be used. Syntactic criteria can be used to designate a set of support. Any unsatisfiable set of clauses must include at least one positive clause and at least one negative clause. The interpretation that assigns
false
(resp,
true)
to every atom is a model for
a set of clauses that contains no positive (resp., negative) clauses. Thus, the set of all positive clauses or the set of all negative clauses can be designated as the set of support because it is guaranteed that the set of remaining clauses is satisfiable. Note that the set of support restriction is only complete if the set of clauses outside the set of support is satisfiable. The set of clauses {P, -~P, -~Q} cannot be refuted if only ~Q is in the set of support. Therefore, Q cannot be proved from P and -~P, if only the negated theorem is used as the set of support. Logic is sometimes criticized as being unsuitable for artificial-intelligence applications because anything, e.g., Q, can be proved from an inconsistency, e.g., P and -~P. Although it is hard to argue that an inconsistent set of axioms is desirable, the critics claim that large collections of axioms about the real world may inadvertently be inconsistent, and that it would be undesirable to conclude irrelevant statements from an inconsistency in the axioms, The set of support restriction
provides some protection from this problem. Its failure to prove Q from P and ¬P implies that, for a set of support refutation to succeed, the inconsistency must be connected to the theorem via resolution operations and, hence, must in some sense be relevant to the conclusion. It is worth considering whether there is a relationship between the set of support restriction and the logic of entailment or relevant implication.
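A given-clause style loop that enforces the restriction can be sketched as follows: supported clauses are kept in a queue, and a resolution step is attempted only when at least one parent comes from that queue. This is a propositional sketch under our own representation and names, not the procedure of any particular prover.

def sos_refute(axioms, theorems):
    # Set of support restriction: a resolution step is allowed only if
    # at least one parent is supported, i.e., is a theorem clause or
    # descends from one.  Clauses are sets of literals ('P' or '~P').
    def resolvents(c1, c2):
        for lit in c1:
            comp = lit[1:] if lit.startswith('~') else '~' + lit
            if comp in c2:
                yield frozenset((c1 - {lit}) | (c2 - {comp}))
    supported = [frozenset(c) for c in theorems]      # the supported queue
    seen = set(supported) | {frozenset(c) for c in axioms}
    while supported:
        given = supported.pop(0)                      # always a supported clause
        for other in list(seen):
            for r in resolvents(given, other):
                if not r:
                    return True                       # derived the empty clause
                if r not in seen:                     # every resolvent is supported
                    seen.add(r)
                    supported.append(r)
    return False

# Prove Q from ¬P ∨ Q and P: refute with the negated theorem ¬Q as support.
assert sos_refute([{'~P', 'Q'}, {'P'}], [{'~Q'}])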
2.5 P1 and N1 Resolution
P1 resolution [67] is the restriction of resolution that requires that one of the parent clauses to each resolution operation must be a positive clause. P1 resolution can be viewed as an extension of the set of support restriction and is also complete. Using the set of support restriction, it is legitimate to designate the set of all positive clauses as the set of support. Resolution operations between input clauses will then require one parent to be a positive clause, as desired. However, with just the set of support restriction, any derived clause can be resolved with any other clause, and the intended restriction that one of the parent clauses to each resolution operation must be positive will not be obeyed. After each resolution operation, the resulting set of clauses is unsatisfiable provided the initial set of clauses is unsatisfiable. Thus, the set of support restriction (with the set of all positive clauses designated as the set of support) can be applied to each set of clauses resulting after performing a resolution operation, and not just to the initial set of clauses, effectively imposing the desired restriction that one parent clause of each resolution operation be a positive clause. The primary importance of P1 resolution is its relation to hyperresolution.
N1 resolution is the restriction of resolution that requires that one of the parent clauses to each resolution operation must be a negative clause. It is defined analogously to P1 resolution and has similar properties.
2.6 Hyperresolution
Hyperresolution is a more efficient version of P1 resolution. Although ordinary resolution operations take two clauses as arguments, hyperresolution is the first of several operations to be discussed that may require an arbitrary number of arguments. Each hyperresolution operation takes a single mixed or negative clause, termed the nucleus, as one of its arguments and as many positive clauses, termed electrons, as there are negative literals in the nucleus as the other arguments and produces a positive clause result. Each negative literal
of the nucleus is resolved with a literal in one of the electrons. The hyperresolvent consists of all the positive literals of the nucleus disjoined with the unresolved-on literals of the electrons. The completeness of hyperresolution can be used to prove the claim that an unsatisfiable set of Horn clauses never needs to contain more than one negative clause (if a set of Horn clauses has more than one negative clause, then at least one of the negative clauses alone is unsatisfiable with the positive and mixed clauses). Results of hyperresolution operations are always positive clauses. Thus, any negative clause can only be a parent to the empty clause in a hyperresolution operation. But, because the empty clause needs to be derived only once in a refutation, it is unnecessary for more than one negative clause to be used.
Negative hyperresolution is exactly the same as hyperresolution except it is an efficient version of N1 instead of P1 resolution and thus derives negative instead of positive clauses.
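A single propositional hyperresolution step can be sketched as follows, with clauses as sets of literals and '~' for negation; the function name and the convention that one electron is supplied per negative literal of the nucleus are ours:

def hyperresolve(nucleus, electrons):
    # One propositional hyperresolution step: every negative literal of
    # the nucleus is resolved against one of the positive electrons.
    negatives = [l for l in nucleus if l.startswith('~')]
    if len(negatives) != len(electrons):
        return None
    result = {l for l in nucleus if not l.startswith('~')}
    remaining = list(electrons)
    for neg in negatives:
        atom = neg[1:]
        match = next((e for e in remaining if atom in e), None)
        if match is None:
            return None                  # some literal cannot be resolved
        remaining.remove(match)
        result |= match - {atom}         # unresolved-on electron literals
    return result                        # always a positive (or empty) clause

# Nucleus ¬P ∨ ¬Q ∨ R with electrons P ∨ S and Q yields R ∨ S.
print(hyperresolve({'~P', '~Q', 'R'}, [{'P', 'S'}, {'Q'}]))   # {'R', 'S'}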
2.7 Unit Resolution
Unit resolution [11] is the restriction of resolution that requires that at least one of the parent clauses in each resolution operation be a unit clause. This is an appealing restriction when considered from the point of view of implementation and efficiency, because a resolvent always has fewer literals than its longer parent clause. Because the goal in resolution theorem proving is to derive the empty clause, shorter clauses are "closer" to the goal than longer clauses. Thus, unit resolution always appears to be making progress toward the goal. Unit resolution is obviously incomplete, because not every unsatisfiable set of clauses contains a unit clause; most importantly, however, it is complete for sets of Horn clauses. The completeness of unit resolution for Horn clauses is easily shown. P1 resolution is complete for arbitrary sets of clauses and can thus be used to refute sets of Horn clauses. Because in sets of Horn clauses all positive clauses are also unit clauses, it is apparent that every P1 resolution operation is also a unit resolution operation, and thus a P1 resolution refutation is also a unit resolution refutation.
2.8 Unit-Resulting Resolution
Unit-resulting resolution (UR-resolution) [58] is a more efficient version of unit resolution. The unit-resulting resolution operation, like the hyperresolution operation, takes an arbitrary number of arguments. Where hyperresolution operates on a single mixed or negative clause and a set of positive clauses and produces a positive or empty clause as its output, unit-resulting resolution
operates on a single nonunit clause and a set of unit clauses and produces a unit or empty clause as its output. In addition to the ultimate goal of deriving the empty clause, unit resolution can be seen to have as an intermediate goal the derivation of additional unit clauses, for only unit clauses can participate freely in resolution operations. The sole purpose of nonunit clauses is their role in deriving additional unit clauses, because they cannot be resolved with each other. Deriving a unit clause requires a nonunit clause in the initial set of clauses, all but one of whose literals are successively resolved away by either input or derived unit clauses. Unit-resulting resolution implements this process of resolving away by unit resolution all but one of the literals of a nonunit clause more directly. A unit-resulting resolution operation takes as its input n unit clauses and a single n-clause or (n+1)-clause and uses the n unit clauses to resolve away n distinct literals simultaneously in the nonunit clause, resulting in the empty clause or a derived unit clause. This eliminates the need to form and store derived nonunit clauses; they are handled implicitly by the unit-resulting resolution operation.
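In the propositional case, one UR step is a short computation. In the following sketch (representation and names ours), each unit clause removes the complementary literal from the nonunit clause:

def ur_resolve(nonunit, units):
    # One unit-resulting resolution step: n unit clauses resolve away
    # n distinct literals of an n-clause or (n+1)-clause, leaving the
    # empty clause or a unit clause.
    def complement(l):
        return l[1:] if l.startswith('~') else '~' + l
    clause = set(nonunit)
    for unit in units:
        (u,) = tuple(unit)               # each unit clause has one literal
        if complement(u) not in clause:
            return None
        clause.remove(complement(u))
    return clause if len(clause) <= 1 else None

# ¬P ∨ ¬Q ∨ R with units P and Q gives the unit clause R.
print(ur_resolve({'~P', '~Q', 'R'}, [{'P'}, {'Q'}]))          # {'R'}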
2.9 Input Resolution
Input resolution [11] is the restriction of resolution that requires that one of the parent clauses to each resolution operation must be an input clause, i.e., not a derived clause. Input resolution is incomplete. For example, {P ∨ Q, P ∨ ¬Q, ¬P ∨ Q, ¬P ∨ ¬Q} cannot be refuted by input resolution: input resolution can derive P and ¬P (and Q and ¬Q) but cannot resolve them with each other, because neither is an input clause as required. Input resolution, like unit resolution, is complete for sets of Horn clauses. N1 resolution is complete for arbitrary sets of clauses and can thus be used to refute sets of Horn clauses. Because in sets of Horn clauses no clause has more than one positive literal, it is apparent that every N1 resolution operation results in a negative clause, and thus no two derived clauses can be resolved with each other, and an N1 refutation is also an input refutation. This demonstration of the completeness of input resolution for Horn clauses also shows that it is unnecessary to resolve arbitrary pairs of input clauses with each other, because it is sufficient to take only those pairs of input clauses that include a negative clause. More generally, input resolution is compatible with the set of support restriction. Thus, input resolution can be restricted without further loss of completeness so that every derived clause is supported by the set of support T, where T can be selected arbitrarily so long as S - T is satisfiable. For example, because an unsatisfiable set of Horn clauses never needs to contain more than one negative clause,
it is possible to refute sets of Horn clauses using a single negative clause as the set of support. It is interesting that unit and input resolution, despite their substantial operational differences, are both incomplete procedures that are complete for sets of Horn clauses. Actually, unit and input resolution are capable of solving exactly the same class of problems, i.e., if a unit resolution refutation exists, then an input resolution refutation also exists, and vice versa. There is a constructive proof of this fact that can be used to transform one kind of refutation into the other [11].

Input resolution bears a strong resemblance to the problem-reduction method. In the problem-reduction method, the inputs are a set of primitively solvable goals, a set of rules stating that if a set of antecedent goals can be solved then the consequent goal can be solved, and a goal to be solved. Solution of the goal is accomplished by backward chaining. To solve a goal, one asks if it is primitively solvable. If it is not, then rules whose consequent goals are the same as the goal are used and solution of the antecedent goals is attempted. Such problems are easily encoded as input resolution problems. Primitively solvable goals can be represented by positive unit clauses. Rules of the form "if goals P1,...,Pn are solvable then goal Q is solvable" can be represented by the clause Q ∨ ¬P1 ∨ ··· ∨ ¬Pn. The problem goal can be represented by a negative unit clause (a set of problem goals to be simultaneously solved could be represented by a negative nonunit clause). These are all Horn clauses, so input resolution is applicable. With only the negative clause in the set of support, input resolution implements backward chaining.
Ordered input resolution is a further restriction of input resolution. In ordinary input resolution, a supported n-clause can be used in a derivation of the empty clause with the literals resolved away in any order. Even if each literal can be resolved away in only one way, there are n! derivations of the empty clause. This inefficiency is eliminated in ordered input resolution by not treating the disjunction connective ∨ as a commutative operator for which order does not matter and by requiring that literals be resolved away in some fixed order, e.g., strictly left to right.
2.10 Prolog
Prolog [13,39], the currently most widely used logic programming language, is based on ordered input resolution and relies upon input resolution's resemblance to the problem-reduction method. A Prolog program consists of a set of unit assertions P and nonunit assertions Q ← P1,...,Pn. The latter represents the clause Q ∨ ¬P1 ∨ ··· ∨ ¬Pn. Prolog can then be asked to evaluate queries with respect to the assertions. A query is represented by the Prolog clause ← Q1,...,Qm.
The ← connective can be interpreted as the ordinary implication connective except that the arguments are reversed. The literals on the right-hand side are conjoined. If it is converted to ordinary clause form, the literal on the left-hand side of a nonunit assertion will be positive; all the literals on the right-hand side of an assertion or query will be negative. This allows representation of all Horn clauses. Because there is no negation connective in Prolog, every clause has either one (in the case of assertions) or no (in the case of queries) positive literals.

Prolog program execution performs ordered input resolution to refute the query clause using the assertions. The query clause is designated as the single clause in the set of support, and the leftmost literal of a derived clause is always resolved with the leftmost literal of an assertion. When the literal of a derived clause is resolved on, it is removed and, in the case of resolution with a nonunit assertion, the literals on the right-hand side of the assertion will appear in its place, in the same order as they appeared in the assertion. Let ← Q1,...,Qm be the current derived clause. Then resolution with the unit clause Q1 will result in ← Q2,...,Qm, and resolution with the nonunit clause Q1 ← P1,...,Pn will result in ← P1,...,Pn,Q2,...,Qm. As always, derivation of the empty clause completes the refutation.

To facilitate its use as a programming language as well as a deductive system, Prolog is much more precise than most deductive systems about the order in which inference operations are performed. It uses ordered input resolution with left-to-right resolution on literals. The assertions in a Prolog program are also ordered. Assertions that appear earlier in the list of assertions that comprise a Prolog program will be tried before later ones. The control strategy is depth-first search with backtracking on failure. If the current derived clause is ← Q1,...,Qm and Q1 is resolved away by the unit assertion Q1 or nonunit assertion Q1 ← P1,...,Pn, all ways of refuting the derived clause ← Q2,...,Qm or ← P1,...,Pn,Q2,...,Qm are explored before any other method of resolving away Q1 (by a later assertion in the Prolog program) is tried. When Prolog is blocked, i.e., the current derived clause is ← Q1,...,Qm but Q1 cannot be resolved away by any assertion not already tried, the most recent resolution operation is undone and the next alternative is tried.
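For ground (propositional) Horn clauses, the inference rule just described, together with the ordered depth-first strategy, fits in a few lines. The following Python sketch is ours and is meant only to mirror the rule; it omits Prolog's unification and term handling, and a depth bound stands in for Prolog's unbounded search:

def solve(goals, program, depth=25):
    # Ordered input resolution with depth-first search, in the style of
    # Prolog's inference, for ground Horn clauses.  program is a list of
    # (head, body) pairs tried in top-to-bottom order; goals is the list
    # of atoms of the query.  Yields True once per refutation found.
    if not goals:
        yield True                       # the empty clause: success
        return
    if depth == 0:
        return                           # crude bound; real Prolog is unbounded
    first, rest = goals[0], list(goals[1:])
    for head, body in program:
        if head == first:
            # Replace the leftmost goal by the clause body, in order.
            yield from solve(list(body) + rest, program, depth - 1)

# q :- p, r.   p.   r.   ?- q.
program = [('q', ('p', 'r')), ('p', ()), ('r', ())]
print(any(solve(['q'], program)))        # True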
Prolog was analyzed above from the narrow perspective of deduction. For general-purpose theorem proving, Prolog is inadequate mainly because its inference system for Horn clauses omits general disjunction and negation, and its unbounded depth-first search strategy is incomplete. A further problem is that many Prolog systems employ unification without the occurs check; this will be discussed in the section on unification. However, Prolog is much more than a deduction system: it is a programming language with many attractive features. Prolog programs can often be viewed as having logical and procedural interpretations. The logical interpretation has been discussed above. The procedural interpretation considers collections of clauses with the same predicate symbol in the head to comprise a procedure. Execution of the procedure proceeds by matching the procedure-call literal, trying alternative clauses in top-to-bottom order and satisfying subgoals of nonunit clauses in left-to-right order, backtracking to find alternative solutions as required. This supports the notion that algorithms should be viewed as a combination of logic and control components [38,39]. Prolog efficiently implements an important subset of the features, including unification and backtracking, proposed for earlier artificial-intelligence languages such as PLANNER [24] and QA4 [68]. These earlier languages generally had less complete (relative to their specification) and less efficient implementations. Unification is used as a uniform mechanism for composing and decomposing data types, represented as first-order predicate calculus terms. Prolog provides a smooth interface to built-in predicates for arithmetic, input/output, and the like, as well as user-defined predicates with logical interpretation. The cut operation provides additional control capability. Prolog's restriction to sets of Horn clauses has the natural justification that only for Horn clauses are all answers to queries certain to be definite. The inexpressibility of facts such as
P(a) ∨ P(b) in Horn clauses makes it unnecessary to consider whether ∃xP(x) has the answer true, with x being either a or b, but not knowing which. Sets of ground unit clauses can be regarded naturally as containing the same information as a file in a relational database. Virtual relations can be defined by nonunit clauses. Assert and retract operations permit additions or deletions of clauses by a running Prolog program. The greater expressiveness of Prolog makes it a logical generalization of relational databases. Prolog provides a form of negation, though not the standard one, termed negation as failure [49,33], that supports reasoning with the closed-world assumption. The closed-world assumption asserts that, for some predicate, the given instances of the predicate comprise the entire set of instances of the predicate. Failure to prove a formula then implies its negation. This topic is covered more deeply in the chapter by Bibel.
2.11 Linear Resolution
Input resolution and its derivatives (including Prolog) are incomplete. Linear resolution [50,53] is an extension of input resolution that requires that at least one of the parent clauses to each resolution operation must be either an input clause or an ancestor clause of the other parent.
Linear resolution is complete. It can be further restricted while preserving completeness; in particular, linear resolution, like input resolution, is compatible with set of support and ordering restrictions.
2.12 Model Elimination
The model elimination procedure [51,52] is isomorphic (in the propositional case) to a highly restricted form of linear resolution and is complete. It incorporates the set of support restriction, an ordering restriction on literals, and a requirement that earlier clauses in a derivation not subsume later ones. A procedure very similar to model elimination is called SL-resolution [40]. The restriction of model elimination or SL-resolution to Horn clauses is basically ordered input resolution, i.e., the inference system employed by Prolog. This accounts for Prolog's inference system frequently being referred to as SLD-resolution (SL-resolution for definite clauses, where definite clauses are another name for Horn clauses). Model elimination is technically not a form of resolution at all, because it operates on chains instead of clauses. A chain differs from a clause in that its literals are ordered and there are two types of literals: A-literals and B-literals. The ordinary literals used in clauses in resolution will be B-literals in the model elimination procedure. The literal that is resolved on in the model elimination procedure is saved in the result as an A-literal. A-literals are used in instances where, in linear resolution, a clause is resolved with an ancestor clause. There are two inference operations in model elimination: extension and reduction.
Let Qm,...,Q1 be a chain whose last literal Q1 is a B-literal. (The literal indices are written in descending order to facilitate comparison with the Prolog inference rule stated previously: model elimination consistently operates on the rightmost literal of a chain, while Prolog operates on the leftmost literal of its derived clauses.) Let ¬Q1 ∨ P1 ∨ ··· ∨ Pn be an input clause. Then the chain Qm,...,Q2,[Q1],Pi1,...,Pin is the result of applying the model-elimination extension operation. In the derived chain, the literals Qm,...,Q2 are A-literals or B-literals according to their status in the parent chain; Q1 is an A-literal; Pi1,...,Pin are all B-literals, with i1,...,in being some permutation of 1,...,n. (A-literals will be marked by enclosing them in brackets.) Any permutation of P1,...,Pn can be used in the result; it is unnecessary to derive additional chains with different permutations of these literals.

Again, let Qm,...,Q1 be a chain whose last literal Q1 is a B-literal. If Q1 is complementary to some earlier A-literal Qi, then the chain Qm,...,Q2 can be derived by the model-elimination reduction operation. In the derived chain, all the literals are A-literals or B-literals according to their status in the parent chain.

If the clause ¬Q1 ∨ P1 ∨ ··· ∨ Pn used in the extension operation is represented by the Prolog assertion Q1 ← ¬P1,...,¬Pn (this is possible precisely if ¬Q1 is a positive literal and P1,...,Pn are all negative literals, so that Q1,¬P1,...,¬Pn are all atoms) and the permutation n,...,1 is used in forming the result of the extension operation, then the resulting chain's B-literals are exactly the same literals, but in reverse order, as the literals in the result of a Prolog inference operation. Because in Prolog all literals in a derived clause are negative, there can never be a case of an A-literal being followed by a complementary B-literal. Thus, no reduction operations are possible, and retaining the A-literals is unnecessary. Some other aspects of the model elimination procedure need to be mentioned.
Both the extension and reduction operations require the last literal of the chain to be a B-literal, but extension by a unit clause results in a chain with a terminal A-literal. The solution to this difficulty is that terminal A-literals are simply removed from the chain.

Certain chains can be rejected without loss of completeness. If the chain contains (a) an A-literal followed later in the chain by an identical A-literal or B-literal, (b) an A-literal followed later in the chain by a complementary A-literal, or (c) a B-literal followed later in the chain by a complementary B-literal, where the two literals are not separated by an A-literal, then the chain can be rejected. These tests can be performed on the chain before terminal A-literals are removed. Tests (a) and (b), in particular, may reject a chain with terminal A-literals that would be acceptable if the terminal A-literals were removed. The rationale for Test (b) is that if the chain contains an A-literal followed by a complementary A-literal, the second A-literal has a B-literal ancestor in an earlier chain in the derivation; this literal could have been removed by reduction. The rationale for Test (c) is that complementary B-literals unseparated by A-literals must come from the same input clause; this clause must then be a tautology, and it is unnecessary to use tautologous clauses. Test (a) is more difficult to justify, but its effect is to eliminate loops: rejecting chains on the basis of Test (a) precludes the refutation of a literal being a subtask of the refutation of that same literal.

Following is a model elimination proof of the unsatisfiability of {P ∨ Q, P ∨ ¬Q, ¬P ∨ Q, ¬P ∨ ¬Q}:
1. P, Q                  a chain from P ∨ Q
2. P, [Q], P             extend by P ∨ ¬Q
3. P, [Q], [P], ¬Q       extend by ¬P ∨ ¬Q
4. P, [Q], [P]           reduce ¬Q by [Q]
5. P                     delete terminal A-literals [Q], [P]
6. [P], Q                extend by ¬P ∨ Q
7. [P], [Q], ¬P          extend by ¬P ∨ ¬Q
8. [P], [Q]              reduce ¬P by [P]
9. (the empty chain)     delete terminal A-literals [Q], [P]
The model elimination procedure takes its name from the fact that it systematically tries to construct a model for a set of clauses. When all such attempts fail, the set of clauses is determined to be unsatisfiable. As the procedure attempts to construct a model, the A-literal [P] or [¬P] marks the assignment of true or false, respectively, to the atom P in the interpretation. In the above proof, the procedure tries to make each of the literals P and Q of the first chain an A-literal, because at least one of P and Q must be true in any model in order to satisfy the clause P ∨ Q. Each assignment ultimately leads to a contradiction, so the set of clauses is unsatisfiable.
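In the propositional case, the extension and reduction operations can be sketched as follows. Chains are lists of (literal, kind) pairs, and the names extend, reduce_, and contract are ours; the usage lines retrace steps 2 through 5 of the refutation above.

def extend(chain, clause):
    # Model-elimination extension: resolve the chain's last B-literal
    # against a complementary literal of an input clause (a set of
    # literals 'P' / '~P'); the resolved-on literal becomes an A-literal.
    lit, kind = chain[-1]
    comp = lit[1:] if lit.startswith('~') else '~' + lit
    if kind != 'B' or comp not in clause:
        return None
    rest = [(l, 'B') for l in clause if l != comp]   # any permutation is fine
    return contract(chain[:-1] + [(lit, 'A')] + rest)

def reduce_(chain):
    # Model-elimination reduction: drop the last B-literal if it is
    # complementary to an earlier A-literal.
    lit, kind = chain[-1]
    comp = lit[1:] if lit.startswith('~') else '~' + lit
    if kind == 'B' and (comp, 'A') in chain[:-1]:
        return contract(chain[:-1])
    return None

def contract(chain):
    # Delete terminal A-literals.
    while chain and chain[-1][1] == 'A':
        chain = chain[:-1]
    return chain

c = [('P', 'B'), ('Q', 'B')]             # step 1: a chain from P ∨ Q
c = extend(c, {'P', '~Q'})               # step 2: P, [Q], P
c = extend(c, {'~P', '~Q'})              # step 3: P, [Q], [P], ~Q
c = reduce_(c)                           # steps 4-5: P
print(c)                                 # [('P', 'B')]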
2.13 Prolog Technology Theorem Prover
Despite Prolog's logical deficiencies, it is quite interesting from a deduction standpoint because of its very high speed as compared with conventional deduction systems. The objective of a Prolog technology theorem prover (PTTP) [80] is to remedy Prolog's deficiencies while retaining to the fullest extent possible the high performance of well-engineered Prolog systems.

To achieve completeness for non-Horn clauses, an inference system other than Prolog's input resolution must be adopted. However, an arbitrarily chosen complete inference system is unlikely to be as efficiently implementable as Prolog. The fact that Prolog employs input resolution is crucial to its high performance. No Prolog operation acts on two derived clauses at once. The use of input resolution and depth-first search implies there is only one active derived clause at a time, represented on the stack, that is resolved with input clauses that can be compiled. The model elimination procedure is also an input procedure, but it is complete. It can be seen as Prolog-style ordered input resolution plus one additional inference rule, the model-elimination reduction operation. The reduction operation, phrased in terms meaningful to Prolog, states that, if the current goal is complementary to an ancestor goal, then the current goal is treated as if it were solved (nonground goals may have to be unified for the rule to apply). It is a form of reasoning by contradiction. Consider proving C from A ⊃ C, B ⊃ C, and A ∨ B. C has the subgoal A (by A ⊃ C), which has the subgoal ¬B (by A ∨ B), which has the subgoal ¬C (by the contrapositive
95 of B D C). -~C is complementary to the higher goal C, so it can be treated as solved, as thus can -~B, A, and C. The reasoning is: the goal -~C is either true or false; if ~ C is true, then C must be true by the chain of inferences--a contradiction because -,C and C cannot both be true; thus, ~C must be false and C must be true. Note that this reasoning says nothing about the value of the intermediate subgoals A and ~B. Another major concern is the incompleteness of Prolog's unbounded depth-first search strategy. It cannot be replaced by an arbitrary complete search strategy, like breadth-first or meritordered search, without sacrificing performance. If depth-first search were not used, it would be necessary for more than one derived clause to be simultaneously represented and for variables to have more than a single value simultaneously, i.e., different values in different clauses. This implies the need for a more complex and less efficient representation for variable bindings than Prolog's. In addition, depth-first search allows all state information to be kept on the stack with a minimum of memory required. Breadth-first search would need an additional amount of memory that would grow exponentially with increasing depth. Therefore, depth-first search continues to be a good choice of search s t r a t e g y - - b u t for completeness, it must be bounded. That leaves the problem of selecting the depth bound. In an exponential search space, searching with a higher-than-necessary search bound can result in an enormous amount of wasted effort before the solution is found. The cost of searching level n in an exponential search space is generally large compared with the cost of searching earlier levels. This makes it a practical procedure to perform consecutively bounded depth-first search. The depth bound is set successively at 1, 2, 3, and so on, until a solution is found. If a constant branching factor b is assumed, this method results in only a factor of about ~
more inference
operations being performed than breadth-first search to the same depth [82]. The effect is similar to performing breadth-first search. However, instead of retaining the results from earlier levels, these results are recomputed--with the efficiency of Prolog-style variablebinding representation possible for depth-first search only. There are two important optimizations of this iteratively bounded depth-first search procedure that reduce the number of inference operations. The first optimization follows from the observation that, if the depth of the current goal plus the number of pending goals exceeds the d e p t h bound, then no solution within the depth bound can be found from this clause and so another solution should be sought. The second optimization is concerned with (1) recording the minimum value by which the depth of the current goal plus the number of pending goals exceeds the depth bound, and (2)
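A generic sketch of consecutively bounded depth-first search with the cutoff-tracking idea just described; the function names and the toy successor relation in the example are ours, not PTTP's actual implementation:

def consecutively_bounded_search(expand, start, is_goal, max_bound=50):
    # Depth-first search rerun with depth bounds 1, 2, 3, ...; if a
    # level completes with no cutoff, the space is finite and exhausted.
    def dfs(state, depth):
        if is_goal(state):
            return state
        if depth == 0:
            dfs.cutoff = True            # the bound was hit somewhere
            return None
        for succ in expand(state):
            found = dfs(succ, depth - 1)
            if found is not None:
                return found
        return None
    for bound in range(1, max_bound + 1):
        dfs.cutoff = False
        found = dfs(start, bound)
        if found is not None:
            return found
        if not dfs.cutoff:
            return None                  # finite space searched completely
    return None

# Toy example: reach 10 from 0 by steps of +1 or +3.
print(consecutively_bounded_search(lambda n: (n + 1, n + 3),
                                   0, lambda n: n == 10))      # 10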
A Prolog technology theorem prover has several advantages as compared with ordinary theorem provers. It can perform inferences at a very high rate, approaching Prolog's. It is complete and easy to use. Many conventional theorem provers rely upon user selection of features and parameter values that control behavior and limit completeness. A Prolog technology theorem prover, like Prolog, requires little memory and has facilities for procedural attachment (built-in functions) and control (the ordering of literals in clauses and of clauses in the database, the cut operation).
2.14 Connection-Graph Resolution
In principle, ordinary resolution operates on just a set of clauses as its only data structure. Connection-graph resolution [37,3], on the other hand, operates on a derived data structure called a connection graph. A connection graph is a graph containing clauses, with links between complementary pairs of literals. The connection-graph-resolution operation resolves on a link in the connection graph, forms the resolvent, and adds it to the connection graph. Literals in the resolvent acquire their links to other literals in the connection graph by inheritance: they are linked only to those literals to which their parent literals were linked.

An advantage of connection-graph resolution is its explicit representation, by the links, of what resolution operations are possible. This makes retrieval of matching literals easier and encourages graph searching as a method for selecting inference operations or finding proofs. Although the immediate access to matching literals via links is an often cited advantage of connection-graph resolution, it is also possible to achieve efficient access to matching literals by using term indexes [61,23], at least for ordinary resolution. When the inference operations are more difficult to discover and compute, as when theory resolution or unification in equational theories is used, inheriting links may be more efficient than rediscovering and recomputing possible inference operations.
Besides suggesting the encoding of inference operations in a connection-graph data structure, connection-graph resolution is a restriction of resolution, because connection-graph resolution specifies that the link that is resolved on be deleted from the connection graph. The effect of this link deletion is that if literals L and L′ are resolved, then L or a literal descended from L can never again be resolved with L′ or a literal descended from L′. This has the beneficial effect of reducing the size of the search space in much the same way that ordering restrictions in input and linear resolution do. For example, ordinary resolution can discover two refutations of the set of clauses consisting of P, Q, and ¬P ∨ ¬Q:

1. P
2. Q
3. ¬P ∨ ¬Q
4. ¬Q        resolve 1 and 3
5. false     resolve 2 and 4

and

1. P
2. Q
3. ¬P ∨ ¬Q
4. ¬P        resolve 2 and 3
5. false     resolve 1 and 4
Either of these refutations can be discovered by connection-graph resolution, but not both in the same execution of a connection-graph-resolution theorem prover. The only resolution operations possible initially are to resolve on P in clauses 1 and 3 or on Q in clauses 2 and 3. Suppose clauses 1 and 3 are resolved first. Then the link between P in clause 1 and ¬P in clause 3 is deleted, and neither literal will have any links. Thus, if clauses 2 and 3 are later resolved, the resolvent ¬P will have no links (because there were none to inherit) and a refutation cannot be completed. Note that, aside from the initial set of links, links are acquired only by inheritance. Thus, once a literal has no links, the clause containing it can be removed, because the linkless literal cannot be resolved away. This is an extension of the purity rule for ordinary resolution. An occasional characteristic of connection-graph resolution is the dramatic collapse of the connection graph when links are deleted: deletion of a link may make a literal pure in the graph, causing its clause to be deleted, which leads to yet more pure literals. The complexity of connection-graph resolution compared with other restrictions of resolution, and its noncommutative behavior (i.e., inference operations cannot be freely reordered, because inferences can be blocked by absence of links depending on the order in which operations are performed), have made the procedure's completeness a difficult issue, although it can be shown complete under some restrictions [72,4,74].
2.15 Nonclausal Resolution
One of the most widely criticized aspects of resolution theorem proving is its use of clause form. Besides clause form generally being considered difficult to read and not human-oriented, one criticism is that conversion of a formula to clause form may eliminate pragmatically useful information encoded in the choice of logical connectives. For example, ¬P ∨ Q may suggest a case-analysis approach, while the logically equivalent P ⊃ Q may suggest a chaining approach to deduction. The use of clause form may also result in a large number of clauses being needed to represent a formula, as well as in substantial redundancy in the search space.

An example of when conversion to clause form results in a substantial increase in the size of the formula is the conversion of A ≡ B. If A and B are literals, the equivalent clause form is (¬A ∨ B) ∧ (A ∨ ¬B), which has two instances of the atoms of A and B. In the worst case, when A and B are formed using the equivalence connective, conversion of the single formula A ≡ B may result in a number of clauses that is an exponential function of the size of the formula. Another problematical example is the formula (A1 ∧ ··· ∧ Am) ∨ (B1 ∧ ··· ∧ Bn). Even in the simple case when A1,...,Am,B1,...,Bn are literals, if this formula is converted to the m × n clauses A1 ∨ B1, ..., Am ∨ Bn, each Ai occurs n times and each Bj occurs m times, instead of once as in the original formula.

It is possible to extend the resolution rule to nonclausal formulas [60,55,78]. Although ordinary clausal resolution resolves on clauses containing complementary literals, nonclausal resolution resolves on general formulas containing subformulas occurring with opposite polarity. In clausal resolution, the literals resolved on are deleted and the remaining literals disjoined to form the resolvent. In nonclausal resolution, all occurrences of the subformula resolved on are replaced by false (true) in the formula in which it occurs positively (negatively). The resulting formulas are disjoined and simplified by truth-functional reductions such as (A ∨ true) → true and (A ∧ true) → A that eliminate embedded occurrences of true and false and that optionally perform simplifications such as (A ∧ ¬A) → false. More precisely, if A and B are formulas and C is an atom occurring positively in A and negatively in B, then the result of simplifying A(C ← false) ∨ B(C ← true), where X(Y ← Z) denotes the result of replacing every occurrence of Y in X by Z, is a nonclausal resolvent of A and B. It is clear that nonclausal resolution reduces to clausal resolution when the formulas are
restricted to being clauses. In the general case, however, nonclausal resolution has some novel characteristics as compared with clausal resolution. It is possible to derive more than one resolvent from the same pair of formulas, even when resolving on the same atoms, if the atom occurs both positively and negatively in both formulas. Likewise, it is possible to resolve a formula with itself. The elimination of clause form and use of nonclausal resolution has some disadvantages as well as advantages. Most operations on nonclausal formulas are more complex than the corresponding operations on clauses. The result of a nonclausal resolution operation is less predictable than the result of a clausal resolution operation. This is an important point when a theorem-proving system selects what operation to perform next on the basis of the expected result (e.g., how many literals are in a derived clause). Clauses can be easily represented as lists of literals; sublists are appended to form the resolvent. Pointers can be used to share lists of literals between parent and resolvent [6]. With simplification being performed during the formation of a nonclausal resolvent, the appearance of a resolvent may differ substantially from its parents, making structure sharing more difficult. In clausal resolution, every literal in a clause must be resolved on for the clause to participate in a refutation. Thus, if a clause contains a literal that is pure (cannot be resolved with a literal in any other clause), the clause can be deleted. This is not the case for nonclausal resolution; not all atom occurrences are essential in the sense that they must be resolved on to participate in a refutation. For example, (P ∧ Q, ¬Q) is an unsatisfiable set of formulas, one of which contains the pure atom P. Only formulas containing pure atoms that are essential should be deleted for purity reasons. The subsumption operation must also be redefined for nonclausal resolution to take account of such facts as the subsumption of A by A ∧ B as well as the clausal subsumption of A ∨ B by A. The nonclausal resolution procedure gains additional power and complexity from allowing resolution on nonatomic formulas as well as atoms. For example, P ∨ Q and (P ∨ Q) ⊃ R could be resolved to obtain R. This can result in shorter, more natural proofs. However, the extension to nonatomic formulas is difficult in some respects. It may be difficult to recognize complementary formulas. For example, P ∨ Q occurs positively in Q ∨ R ∨ P, ¬P ⊃ Q, and ¬(P ≡ Q). Also, the effect of resolving on nonatomic subformulas can be achieved by multiple resolution operations on atoms. Resolution on atomic constituents of nonatomic formulas that are also resolved on can lead to redundant derivations and inefficiency.
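A concrete, purely illustrative rendering of the ground case of this rule follows; formulas are nested tuples, ('and', p, q), ('or', p, q), ('not', p), atom names are strings, and the Python booleans serve as the truth constants. The function names are the sketch's own assumptions, not taken from [60,55,78].

    def replace(f, atom, val):
        """X(Y <- Z): replace every occurrence of `atom` in `f` by `val`."""
        if f == atom:
            return val
        if isinstance(f, tuple):
            return (f[0],) + tuple(replace(g, atom, val) for g in f[1:])
        return f

    def simplify(f):
        """Truth-functional reductions such as (A or true) -> true."""
        if not isinstance(f, tuple):
            return f
        args = [simplify(g) for g in f[1:]]
        if f[0] == 'not':
            return (not args[0]) if isinstance(args[0], bool) else ('not', args[0])
        a, b = args
        if f[0] == 'and':
            if a is False or b is False: return False
            if a is True: return b
            if b is True: return a
            return ('and', a, b)
        if f[0] == 'or':
            if a is True or b is True: return True
            if a is False: return b
            if b is False: return a
            return ('or', a, b)

    def nc_resolve(A, B, atom):
        """Nonclausal resolvent of A and B on `atom`, assumed to occur
        positively in A and negatively in B:
        simplify(A(atom <- false) or B(atom <- true))."""
        return simplify(('or', replace(A, atom, False), replace(B, atom, True)))

    # resolving (P and Q) with ((not P) or R) on P yields R:
    print(nc_resolve(('and', 'P', 'Q'), ('or', ('not', 'P'), 'R'), 'P'))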
2.16 Connection Method
The connection method [4,5] or generalized matings [1] is not a form of resolution, but has some relationships to connection-graph resolution and nonclausal resolution, among others. Clause form is often referred to as conjunctive normal form (CNF) because it is a conjunction of disjunctions of literals. A dual form, called disjunctive normal form (DNF), is a disjunction of conjunctions of literals. One is easily obtained from the other by rewriting a formula in CNF by

(A ∧ (B ∨ C)) → ((A ∧ B) ∨ (A ∧ C))
((B ∨ C) ∧ A) → ((B ∧ A) ∨ (C ∧ A))

Another way of forming the DNF of a formula in CNF is to enumerate n conjunctions of literals, where n is the product of the number of literals in each clause and each conjunction is composed of one literal from each clause. For example, the CNF formula
(P ∨ Q) ∧ (P ∨ ¬Q) ∧ (¬P ∨ Q) ∧ (¬P ∨ ¬Q)

is equivalent to the unsimplified DNF formula

(P ∧ P ∧ ¬P ∧ ¬P) ∨
(P ∧ P ∧ ¬P ∧ ¬Q) ∨
(P ∧ P ∧ Q ∧ ¬P) ∨
(P ∧ P ∧ Q ∧ ¬Q) ∨
... ∨
(Q ∧ ¬Q ∧ Q ∧ ¬Q)

The interesting thing about this formula is that every conjunction contains a complementary pair of literals. It is clear that this property holds for the DNF of any unsatisfiable formula. If a conjunction did not contain a complementary pair of literals, then that conjunction and, thus, the whole formula could be satisfied. This is the logical basis for the connection method. However, the connection method does not actually form the DNF of a formula. Instead, it does graph searching of the formula, enumerating its paths, where a path consists of one literal from each clause. If every path contains a complementary pair of literals, then the formula is unsatisfiable. Because a single complementary pair of literals often appears in more than one path, it is possible by clever search to avoid explicit enumeration of all
of the paths. A connection graph is often used as an auxiliary data structure in the connection method. The connection method is applicable to formulas that are not in clause form. It is only necessary to refine the definition of a path through the formula. Consider the case of formulas that are in negation normal form (NNF). A formula is in NNF if its only connectives are conjunction, disjunction, and negation, and only atomic formulas are arguments of negation. (The connection method applies to formulas more general than NNF, but NNF is especially convenient to discuss, because the restriction of negation to atomic subformulas means that, for example, a conjunction is really a conjunction, not a disjunction in disguise because it is negated.) A path through a formula that is a single literal consists of that single literal. Any path through one of the disjuncts is a path through a disjunction. Any concatenation of paths through all of the conjuncts is a path through a conjunction. For example, the formula
P ∨ (Q ∧ (R ∨ ¬S))

has the paths (P), (Q, R), and (Q, ¬S).
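The recursive path definition translates directly into code. Below is a small sketch using the same illustrative tuple representation as before, with literals as strings and '-' marking negation:

    from itertools import product

    def paths(f):
        """One literal is its own path; a path through a disjunction is a
        path through either disjunct; a path through a conjunction is a
        concatenation of paths through all of the conjuncts."""
        if isinstance(f, str):
            return [[f]]
        if f[0] == 'or':
            return paths(f[1]) + paths(f[2])
        if f[0] == 'and':
            return [p + q for p, q in product(paths(f[1]), paths(f[2]))]

    def has_complementary_pair(path):
        return any('-' + lit in path for lit in path if not lit.startswith('-'))

    f = ('or', 'P', ('and', 'Q', ('or', 'R', '-S')))
    print(paths(f))                  # [['P'], ['Q', 'R'], ['Q', '-S']]
    print(all(has_complementary_pair(p) for p in paths(f)))  # False: satisfiable

A ground formula is unsatisfiable exactly when the second test returns true; the clever search mentioned earlier avoids materializing the full path list.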
The principal strategic concerns for the connection method applied to ground formulas are the efficient enumeration of pairs of complementary literals and paths, so that not all paths need to be individually checked, and the reduction of formulas to equivalent ones. Many reduction methods are similar to methods used in resolution, such as elimination of tautologies, subsumption, and purity. For formulas that are not ground, there is the additional strategic concern of how many instances of subformulas will be needed for a single substitution to exist such that every path contains a complementary pair of literals. For example, in refuting ¬P(x) ∧ (P(a) ∨ P(b)), two instances ¬P(a) and ¬P(b) of ¬P(x) are required.
Bibel's chapter also includes discussion of the connection method.
2.17 Theory Resolution
Theory resolution [79,81] is a method of incorporating specialized reasoning procedures in a resolution theorem prover so that the reasoning task will be effectively divided into two parts: special cases, such as reasoning about inequalities or about taxonomic information, are handled efficiently
by specialized reasoning procedures, while more general reasoning is handled by resolution. The connection between the two reasoning components is made by having the resolution procedure resolve on sets of literals whose conjunction is determined to be unsatisfiable by the specialized reasoning procedure. The objective of research on theory resolution is the conceptual design of deduction systems that combine deductive specialists within the common framework of a resolution theorem prover. Past criticisms of resolution can often be characterized by their pejorative use of the terms uniform and syntactic. Theory resolution meets these objections head on. In theory resolution, a specialized reasoning procedure may be substituted for ordinary syntactic unification to determine unsatisfiability of sets of literals. Because the implementation of this specialized reasoning procedure is unspecified--to the theorem prover it is a "black box" with prescribed behavior, namely, able to determine unsatisfiability in the theory it implements--the resulting system is nonuniform because reasoning within the theory is performed by the specialized reasoning procedure, while reasoning outside the theory is performed by resolution. Theory resolution can also be regarded as being not wholly syntactic, because the conditions for resolving on a set of literals are no longer based on their being made syntactically identical, but rather on their being unsatisfiable in a theory, and thus resolvability is partly semantic. Reasoning about orderings and other transitive relations is often necessary, but using ordinary resolution for this is quite inefficient. It is possible to derive an infinite number of consequences from (a < b) and ¬(x < y) ∨ ¬(y < z) ∨ (x < z), despite the obvious fact that a refutation based on just these two formulas is impossible. A solution to this problem is to require that use of the transitivity axiom be restricted to occasions when either there are matches for two of its literals (partial theory resolution) or a complete refutation of the ordering part of the clauses can be found (total theory resolution). An important form of reasoning in artificial-intelligence applications, embodied in knowledge-representation systems, is reasoning about taxonomic information and property inheritance. One of the objectives of theory resolution is to be able to take advantage of the efficient reasoning provided by a knowledge representation system by using it as a taxonomy decision procedure in a larger deduction system. For systems like the Krypton knowledge representation system, which comprises terminological and assertional reasoning components, theory resolution provides a means of connecting the different reasoning systems. Any satisfiable set of formulas that is to be incorporated into the inference process can be regarded as a theory. A T-interpretation is an interpretation that satisfies theory T.
For example, in a theory of partial ordering ORD consisting of ¬(x < x) and (x < y) ∧ (y < z) ⊃ (x < z), the predicate < cannot be interpreted so that (a < a) has value true, or so that (a < c) has value false if (a < b) and (b < c) both have value true. In a taxonomic theory TAX including Boy(x) ⊃ Person(x), Boy(John) cannot have value true while Person(John) has value false. A set of clauses S is T-unsatisfiable if and only if no T-interpretation satisfies S. Let C1, ..., Cm (m ≥ 1) be a set of nonempty clauses, let each Ci be decomposed as Ki ∨ Li where Ki is a nonempty clause, and let R1, ..., Rn (n ≥ 0) be unit clauses. Suppose the set of clauses K1, ..., Km, R1, ..., Rn is T-unsatisfiable. Then the clause L1 ∨ ··· ∨ Lm ∨ ¬R1 ∨ ··· ∨ ¬Rn is a theory resolvent using theory T (T-resolvent) of C1, ..., Cm. It is a total theory resolvent if and only if n = 0; otherwise it is partial. K1, ..., Km is called the key of the theory resolution operation. For partial theory resolvents, R1, ..., Rn is a set of conditions for the T-unsatisfiability of the key. The negation ¬R1 ∨ ··· ∨ ¬Rn of the conjunction of the conditions is called the residue of the theory resolution operation. It is a narrow theory resolvent if and only if each Ki is a unit clause; otherwise it is wide. The partial theory resolution procedure permits total as well as partial theory resolution operations. Similarly, the wide theory resolution procedure permits narrow as well as wide theory resolution operations. For example, a set of unit clauses is unsatisfiable in the theory of partial ordering ORD if and only if it contains a chain of inequalities t1 < ··· < tn (n ≥ 2) such that either t1 is the same as tn or ¬(t1 < tn) is also one of the clauses. P is a unary total narrow ORD-resolvent of (a < a) ∨ P. P ∨ Q is a binary total narrow ORD-resolvent of (a < b) ∨ P and (b < a) ∨ Q. P ∨ Q ∨ R ∨ S is a 4-ary total narrow ORD-resolvent of (a < b) ∨ P, (b < c) ∨ Q, (c < d) ∨ R, and ¬(a < d) ∨ S. This can also be derived incrementally through partial narrow ORD-resolution, i.e., by resolving (a < b) ∨ P and (b < c) ∨ Q to obtain (a < c) ∨ P ∨ Q (¬(a < c) is the condition), resolving that with (c < d) ∨ R to obtain (a < d) ∨ P ∨ Q ∨ R, and resolving that with ¬(a < d) ∨ S to obtain P ∨ Q ∨ R ∨ S.

Suppose the taxonomic theory TAX includes a definition for fatherhood Father(x) ≡ [Man(x) ∧ ∃y Child(x, y)]. Then Father(Fred) is a partial wide theory resolvent of Child(Fred, Pat) ∨ Child(Fred, Sandy) and Man(Fred). Also, false is a total wide theory resolvent of Child(Fred, Pat) ∨ Child(Fred, Sandy), Man(Fred), and ¬Father(Fred). In narrow theory resolution, only T-unsatisfiability of sets of literals, not clauses, must be decided. Total and partial narrow theory resolution are both possible. In total narrow theory resolution, the literals resolved on (the key) must be T-unsatisfiable. In partial narrow theory
resolution, the key must be T-unsatisfiable only under some conditions. The negated conditions are used as the residue in the formation of the resolvent. The theory matings procedure is another method of incorporating theories. It is similar to the total narrow theory resolution method in the sense of imposing the same requirements on the decision procedure for T, i.e., determining the T-unsatisfiability of sets of literals, but it does not depend on performing resolution inference operations. The theory matings procedure is an extension of the connection method or generalized matings. The statement that if every path through a formula contains a complementary pair of literals, then the formula is unsatisfiable can be generalized to the statement that if every path through a formula contains a set of literals that is unsatisfiable in the theory T, then the formula is T-unsatisfiable. Theory resolution is a procedure with substantial generality and power. Thus, it is not surprising that many specialized reasoning procedures can be viewed as instances of theory resolution, perhaps with additional constraints governing which theory resolvents can be inferred. For example, unification in equational theories can be viewed as a special case of theory resolution for building in equational theories. Inference rules such as paramodulation, resolution by unification and equality, and E-resolution can also be viewed as instances of theory resolution, differing in whether total or partial theory resolution is used and in their selection of key sets of literals to resolve on.
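For the ordering theory ORD used in the examples above, the "black box" T-unsatisfiability test for ground unit literals amounts to a reachability check. A minimal sketch follows; the names and representation are the sketch's own assumptions:

    def ord_unsat(pos, neg):
        """pos: set of pairs (a, b) for literals a < b;
        neg: set of pairs (a, b) for literals not(a < b).
        True iff there is a chain t1 < ... < tn (n >= 2) with t1 == tn,
        or with not(t1 < tn) among the literals."""
        nodes = {t for edge in pos for t in edge}
        reach = set(pos)
        # transitive closure of the positive inequalities (Floyd-Warshall style)
        for k in nodes:
            for i in nodes:
                for j in nodes:
                    if (i, k) in reach and (k, j) in reach:
                        reach.add((i, j))
        return any((a, a) in reach for a in nodes) or bool(reach & set(neg))

    # the key and condition of the 4-ary total narrow ORD-resolvent above:
    print(ord_unsat({('a', 'b'), ('b', 'c'), ('c', 'd')}, {('a', 'd')}))  # True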
2.18 Krypton
Krypton [7,8] represents an approach to constructing a knowledge representation system that is composed of two parts: a terminological component (the TBox) that can represent and reason about terminological information and an assertional component (the ABox) that can represent and reason about assertional information. It is an interesting example of the application of theory resolution. Krypton's TBox provides a language for defining and reasoning about taxonomic relations. It permits definitions of concepts and roles that are associated with unary and binary predicates in the ABox. A concept can be defined as a primitive concept, a conjunction of concepts, or a concept restricted so that all fillers of a particular role are of a certain concept. A role can be defined to be a primitive role or a composition of roles. For example, the following are valid Krypton TBox definitions:
Grandchild =def (RoleChain Child Child)
Coed =def (ConGeneric Woman Student)
Successful-Grandma =def (VRGeneric Woman Grandchild Doctor)

That is,
Grandchild(x, y) is true if and only if y is a child of a child of x, Coed(x) is true if and only if x is someone that is both a woman and a student, and Successful-Grandma(x) is true if and only if x is a woman all of whose grandchildren (if any) are doctors. Krypton's ABox is a resolution theorem prover [78] that uses predicates that have been given TBox definitions. Taxonomic definitions are not provided as assertions to the ABox, however. Instead, they are taken account of by theory resolution operations that use the theory of the defined concepts and roles. Thus, for example, all of the following inferences are single-step theory-resolution operations performable by the ABox: from Student(John) and ¬Coed(John) it is possible to infer ¬Woman(John); Coed(John) and ¬Woman(John) are directly contradictory; from Successful-Grandma(Marge) and Child(Marge, Hope) it is possible to infer (Child(Hope, x) ⊃ Doctor(x)).
3 Unification
Unification is a bidirectional pattern-matching process, i.e., it is like pattern matching except that values can be assigned to variables in both expressions, not just one. For example, though neither of P(a, x) and P(y, b) is a pattern-matching instance of the other, they are unifiable with most general unifier {x ← b, y ← a}. In general, a substitution is a set of variable assignments. It is convenient to consider only idempotent substitutions [19], where an idempotent substitution is one in which no variable xi that appears in an assignment xi ← ti also occurs inside the term tj for any assignment xj ← tj in the substitution. The standard unification algorithm scans the two expressions to be unified in left-to-right order, looking for the first disagreement or difference between the two expressions. If one of the two subexpressions located at the first disagreement is a variable and the other is an expression not containing that variable, then the assignment of the subexpression to the variable is added to the substitution being constructed, the two expressions are instantiated by the new assignment, and the process continues. If neither of the two expressions is a variable, or if one is a variable and the other subexpression contains that variable, then unification of the two expressions fails. Unification succeeds if no (further) disagreements are found, i.e., the two original expressions have been instantiated to be identical.
The check for whether the variable is contained in the expression that is perhaps to be assigned to it is called the occurs check. The occurs check causes the unification of x and f(x) (or any other term containing x other than x itself) to fail. If the unification were allowed to succeed, it would result in the formation of a circular binding x ← f(x), and the unified expressions would be infinite. A demonstration of the importance of the occurs check for sound inference is the following Prolog program [64] (many Prolog implementations, for the sake of efficiency, do not perform the occurs check):

(1) X < s(X).
(2) 3 < 2 :- s(Y) < Y.
(3) ?- 3 < 2.

Restated in English, the foregoing asserts (1) that every x is less than the successor of x and (2) that if, for every y, the successor of y is less than y, then 3 would be less than 2; it then asks (3) whether 3 is less than 2. Prolog implementations without the occurs check would answer affirmatively, binding X to s(Y) and Y to s(X), thereby creating an infinite term. Moreover, unification without the occurs check may not even terminate, as in the case of unifying the values of X and Y. The unification algorithm either succeeds and returns a single unifying substitution (unifier) or fails and returns none. If it succeeds, the unifier returned is a most general unifier.
A most general unifier is one such that any other unifier is a variant or instance of it. The most general unifier is not necessarily unique, however. For example, both {x ← y} and {y ← x} are most general unifiers of x and y. As given here, unification can be quite inefficient. In the worst case, its behavior is exponential. For example, consider the unification of f(u, h(w,w), w, j(y,y)) and f(g(v,v), v, i(x,x), x). This would result in successive assignments of u ← g(v,v), v ← h(w,w), w ← i(x,x), and x ← j(y,y). The resulting substitution is {x ← j(y,y), w ← i(j(y,y), j(y,y)), v ← h(i(···), i(···)), u ← g(h(···), h(···))}. The algorithm would incrementally construct this substitution, instantiating
the current substitution by each new variable assignment as it is made, and would also create new instances of the original expressions. There is a linear-time unification algorithm by Paterson and Wegman [62] that requires a directed acyclic graph representation for expressions, and there are also efficient unification algorithms by Huet [26] and Martelli and Montanari [57]. The costliest inefficiency of the standard unification algorithm is its need to instantiate the
expressions being unified and the substitution being constructed by the newest variable assignment. This can, as in the example, produce exponential growth in the size of the substitution and the terms being unified. A solution is to use a structure-sharing method [6] during unification. As the two expressions are being scanned for disagreements, if a variable is encountered, its value is looked up in the list of bindings accumulated so far and used in the scanning process (but not substituted into the expression). Variables are also looked up when the occurs check is applied. This process eliminates actual formation of the instantiated terms, though they are implicitly created during the scanning and occurs-check processes. If the process is completed successfully with no uneliminatable disagreements, the result is a set of noncircular variable bindings that may depend on each other. The set of bindings should be converted to an idempotent substitution for it to be used efficiently, e.g., to instantiate the remaining literals of a pair of clauses being resolved. To convert a set of dependent noncircular bindings to an idempotent substitution, topologically sort them so that the binding xi ← ti precedes the binding xj ← tj if xi occurs in tj (an inability to topologically sort the bindings implies an occurs-check violation). Let (x1 ← t1, ..., xn ← tn) be a topologically sorted list of noncircular bindings. Then let θ1 be {x1 ← t1} and θi be θi-1 ∪ {xi ← tiθi-1} (1 < i ≤ n). Each θi is an idempotent substitution, with θn being the final result. A more abstract characterization of substitutions and unification can be found in the chapter by Huet.
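A sketch of this conversion, under the same illustrative term representation as before:

    def occurs_in(v, t):
        return v == t or (isinstance(t, tuple)
                          and any(occurs_in(v, a) for a in t[1:]))

    def apply_subst(theta, t):
        if isinstance(t, str):
            return theta.get(t, t)
        return (t[0],) + tuple(apply_subst(theta, a) for a in t[1:])

    def to_idempotent(bindings):
        """Topologically sort noncircular bindings (x_i <- t_i before
        x_j <- t_j whenever x_i occurs in t_j) and fold them into an
        idempotent substitution: theta_i = theta_{i-1} U {x_i <- t_i theta_{i-1}}."""
        theta, remaining = {}, dict(bindings)
        while remaining:
            ready = [x for x, t in remaining.items()
                     if not any(occurs_in(y, t) for y in remaining)]
            if not ready:
                raise ValueError('circular bindings: occurs-check violation')
            for x in ready:
                theta[x] = apply_subst(theta, remaining.pop(x))
        return theta

    # x <- f(y), y <- a  becomes the idempotent  {'y': 'a', 'x': ('f', 'a')}
    print(to_idempotent({'x': ('f', 'y'), 'y': 'a'}))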
3.1 Unification in Equational Theories
Some equational theories occur in theorem-proving applications often enough and have enough impact on overall performance to merit their incorporation directly into unification algorithms [65]. The most pervasively used properties that have been built into unification algorithms are associativity, commutativity, and their combination. If the equality relation is used to represent associativity and commutativity, then associativity and commutativity of the function f can be expressed as f(f(x,y), z) = f(x, f(y,z)) and f(x,y) = f(y,x). Because of the difficulty of using the equality predicate, whether by axioms or special rules of inference like paramodulation, an alternate formulation has often been used. Let P(x, y, z) denote f(x,y) = z. Then associativity can be represented by the pair of clauses

¬P(x,y,u) ∨ ¬P(u,z,w) ∨ ¬P(y,z,v) ∨ P(x,v,w)
¬P(x,y,u) ∨ ¬P(y,z,v) ∨ ¬P(x,v,w) ∨ P(u,z,w)

and commutativity can be represented by the clause

¬P(x,y,z) ∨ P(y,x,z).
There are several difficulties in specifying associativity and commutativity axiomatically, regardless of which representation is used. One is that there are too many representations for the same expression. For example, the expressions f(a, f(b,c)), f(a, f(c,b)), f(b, f(a,c)), f(f(a,c), b), etc., are all equivalent if f is associative and commutative. These multiple representations for equivalent expressions contribute to excessive search-space sizes. Subsumption will not detect and remove formulas that are associative-commutative variants. In addition, verifying that two expressions are associative-commutative variants often involves lengthy deductions. For example, to derive f(f(c,a), b) from f(a, f(b,c)) requires two uses of the commutativity axiom and one of the associativity axiom. This approach requires three paramodulation steps. Even more steps would be necessary if equality axioms or the nonequality formulation were used instead of paramodulation. Theorem provers will often fail to solve difficult problems that involve functions that are associative or commutative or both, because so much effort must be spent on deductions that should be trivial. Another problem with the axiomatic representation for associativity and commutativity is that the theorem prover will be undiscriminating in what results are derived. If f(a, x) and f(y, b) need to be unified, where f is associative and commutative, an associative-commutative unification algorithm may recognize that a complete set of most general unifiers consists of {x ← b, y ← a} and {x ← f(b,z), y ← f(a,z)}. However, if axioms for associativity and commutativity are used, less general unifiers like {x ← f(b, f(z1,z2)), y ← f(a, f(z1,z2))} and their associative-commutative variants will also be generated, ad infinitum. Building properties like associativity and commutativity into the unification algorithm eliminates these difficulties and the need for associativity and commutativity axioms. If equality axioms or inference rules are needed only to support inference about these properties, then the use of equality axioms or inference rules can be eliminated as well. Despite the complexity of special unification algorithms for equational theories and the fact that they are generally much more time-consuming than ordinary unification, their use generally pays off when trying to prove nontrivial theorems. When compared with formulating properties such as associativity and commutativity as axioms, special unification is advantageous because it does not return any results that are not implicit in the search space using the axioms--it just computes them more directly--and it often will compute a finite complete set of unifiers, while
the axiomatic approach would continue to generate redundant consequences. If the unification problem is intractable (associative-commutative pattern matching is NP-complete [2]--associative-commutative unification is thus at least that difficult), this difficulty will also be reflected in the number of inferences required in trying to prove theorems with the axioms without using special unification. The most serious problem with using special unification algorithms is the possible occurrence of difficult unification tasks early in a search for a proof that effectively block the discovery of a shallow proof elsewhere, because of the resources spent on special unification. In such cases, incomplete or incremental special unification algorithms can be employed. Ideally, an incomplete special unification algorithm will return only some of the simpler unifiers, and an incremental one will return progressively more complex unifiers on successive calls. Actually, the use of axioms for the theory of, say, associativity or commutativity, along with axioms or rules for equality plus ordinary unification, in effect forms a quite inefficient incremental unification algorithm. Many results on special unification can be found in Siekmann [71].
3.2 Commutative Unification
The standard unification algorithm can be easily modified to build in commutativity of functions [73,70]. For example, if f is a commutative function, when unifying the terms f(s1, s2) and f(t1, t2), it is necessary to try to unify the arguments s1 with t1 and s2 with t2 simultaneously, as in the ordinary unification algorithm, and also to try to unify s1 with t2 and s2 with t1 simultaneously. This modification of the ordinary unification algorithm yields an algorithm that is complete for commutative functions. The commutative unification algorithm given here illustrates two properties of special unification algorithms. One is their added complexity and computational requirements compared to ordinary unification. The second is that, depending on the theory that is incorporated into the unification algorithm, it may no longer be the case that there will be only zero or one most general unifiers. For example, if f(x, y) and f(a, b) are unified, where f is commutative, then {x ← a, y ← b} and {x ← b, y ← a} are both most general unifiers; the two together compose the complete set of most general unifiers.
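A sketch of the commutative case, assuming the unify function of the sketch in Section 3; it returns a list, since there may now be several most general unifiers:

    def unify_commutative(s, t, subst=None):
        """Unify f(s1, s2) with f(t1, t2) both ways around; f commutative."""
        (s1, s2), (t1, t2) = s[1:], t[1:]
        unifiers = []
        for u1, u2 in [(t1, t2), (t2, t1)]:     # straight, then swapped
            sub = unify(s1, u1, dict(subst or {}))
            if sub is not None:
                sub = unify(s2, u2, sub)
                if sub is not None:
                    unifiers.append(sub)
        return unifiers

    # f(x, y) vs f(a, b): [{'?x': 'a', '?y': 'b'}, {'?x': 'b', '?y': 'a'}]
    print(unify_commutative(('f', '?x', '?y'), ('f', 'a', 'b')))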
3.3 Associative Unification
Associative unification is more difficult than commutative unification. If the function f is associative (but not commutative), then the terms f(a, x) and f(x, a) have an infinite set of most general unifiers, namely {x ← a}, {x ← f(a,a)}, {x ← f(a, f(a,a))}, and so on. Two other interesting examples, syntactically similar to the first, are the unification of f(a, x) and f(y, b), which has a complete set of most general unifiers consisting of {x ← b, y ← a} and {x ← f(z,b), y ← f(a,z)}, and the unification of f(a, x) and f(x, b), which has no unifiers.
For functions that are associative, it is convenient to drop the distinction between f(x, f(y,z)) and f(f(x,y), z) and represent both by the term f(x,y,z), as if f were an n-ary function for arbitrary n. A complete unification algorithm for associative functions is readily obtained by modifying the standard unification algorithm in the following manner [65,73,47]. Argument lists of two terms headed by the same associative function symbol are scanned in left-to-right order, looking for the first disagreement. If the first disagreement is that one argument list is exhausted before the other, then unification with the current substitution fails. The principal difference from standard unification occurs when the two subexpressions at the first disagreement are arguments of an associative function and one or both of the subexpressions is a variable. Let variable x and term t be such arguments of an associative function f. If t contains x, then unification with the current substitution fails. If t does not contain x, then unification proceeds with the substitution of t for x and also with the substitution of f(t, u) for x, where u is a new variable not occurring elsewhere. If t were also a variable, then it would also be necessary to try the substitution of f(x, v) for t,
where v is a new variable. Consider the unification of f(x, y) and f(a, b, c). The first disagreement is x differing from a. Thus, the assignments x ← a and x ← f(a, u) are tried, leading to the problems of unifying f(a, y) and f(a, u, y) with f(a, b, c), respectively. The first problem is solved by the subsequent assignments y ← f(b, v) and v ← c, and the final unifier is {x ← a, y ← f(b,c)}. The second problem is solved by the subsequent assignments u ← b and y ← c, and the final unifier is {x ← f(a,b), y ← c}. The complete set of unifiers consists of {x ← a, y ← f(b,c)} and {x ← f(a,b), y ← c}.
f(a, x) and f(x, a), there may be an infinite number of unifiers. Even where there is
a finite number of unifiers, as in the case of unifying
f(a,x) and f(x,b) that have no unifiers,
the algorithm fails to terminate, trying to match the expressions with assignments x ← a, x ← f(a, v1), x ← f(a, a, v2), and so on. An alternative approach to associative unification that allows better control separates the tasks of assigning terms to variables and creating new variables. This approach uses an incomplete associative unification algorithm that introduces no new variables. It just assigns to a variable one or more of the arguments in the other argument list. For example, in the case of unifying f(x, y) and f(a, b, c), the algorithm may try to assign to x the terms a, f(a, b), and f(a, b, c). Continuing to unify the expressions with each of these assignments results in the unifiers {x ← a, y ← f(b,c)}, {x ← f(a,b), y ← c}, and failure, respectively. This incomplete unification algorithm is combined with a widening [73] or variable-splitting [76] process that replaces variables by more complex terms. If the variable x is an argument of the associative function symbol f, then the term containing x can be widened by replacing x by the term f(x1, x2) with new variables x1 and x2. A complete associative unification algorithm is obtained by collecting the results of unifying, by the incomplete but terminating associative unification algorithm, one expression with all results of widening the other expression. The widening operation may be applied any number of times. In order to compute the infinite number of unifiers of f(a, x) and f(x, a), an infinite number of widening substitutions must be applied. Note that it is sufficient to create widening substitutions for only one of the two expressions. For example, in unifying f(a, x) and f(y, b), the incomplete unification algorithm returns the substitution {x ← b, y ← a}. Widening f(a, x) with the assignment x ← f(x1, x2) results in the unification of f(a, x1, x2) and f(y, b) with unifier {x ← f(x1, b), y ← f(a, x1)}. The completeness of the above incomplete associative unification algorithm in conjunction with widening only one of the two expressions implies the completeness of the incomplete associative unification algorithm for pattern matching. Makanin [54] proved the decidability of associative unification for the restricted case of terms composed of a single associative function symbol and variables and constants only. However, this algorithm only decided whether a unification problem was solvable--it did not return a unifier, let alone a complete set of unifiers. More recently, Jaffar [34] has developed a minimal and complete unification algorithm for this case. This algorithm computes a minimal complete set of unifiers and, unlike the algorithm described above, is guaranteed to terminate if the complete set is finite. However, this algorithm has not yet been generalized to handle nonvariable, nonconstant arguments, as is required for general use in theorem proving.
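The flattened representation used above is easy to compute; a small sketch with illustrative names:

    def flatten(t, assoc=('f',)):
        """Under associativity, f(x, f(y, z)) and f(f(x, y), z) are both
        represented as the variadic term ('f', x, y, z)."""
        if not isinstance(t, tuple):
            return t
        args = []
        for a in (flatten(arg, assoc) for arg in t[1:]):
            # splice the arguments of a nested application of the same
            # associative symbol into the enclosing argument list
            if t[0] in assoc and isinstance(a, tuple) and a[0] == t[0]:
                args.extend(a[1:])
            else:
                args.append(a)
        return (t[0],) + tuple(args)

    print(flatten(('f', 'a', ('f', 'b', 'c'))))   # ('f', 'a', 'b', 'c')
    print(flatten(('f', ('f', 'a', 'b'), 'c')))   # ('f', 'a', 'b', 'c')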
3.4 Associative-Commutative Unification
It is possible to develop associative-commutative unification algorithms along the lines of the complete but nonterminating associative unification algorithm and the incomplete associative unification algorithm augmented by widening [76]. However, we can do better. In the case of associativity plus commutativity, there is a finite number of unifiers, and it is possible to devise a complete terminating unification algorithm [75,77,48]. Arguments common to the two terms headed by the same associative-commutative function symbol can be canceled in pairs until no arguments appear in both terms. Thus, for example, the problem of unifying f(x, x, y, a, b, c) and f(b, b, b, c, z) can be replaced by the problem of unifying
f(x, x, y, a) and f(b, b, z). The case of unification of terms headed by an associative-commutative function symbol with only variable arguments will be considered first.
For example, consider unifying the terms
f(x, x, y, u) and f(v, v, z), where f is an associative-commutative function. What is required of a substitution for variables u, v, x, y, and z for it to be a unifier? Each variable is assigned either a term not headed by the function symbol f or a term headed by f with some arguments. Consider each distinct term t that is either a variable value not headed by the function symbol f or a variable-value argument of a term headed by f. For a substitution to be a unifier, for every such term t, twice the number of t's in x plus the number of t's in y plus the number of t's in u must equal twice the number of t's in v plus the number of t's in z. Thus, unification of the terms f(x, x, y, u) and f(v, v, z) is related to solution of the linear homogeneous diophantine equation 2x + y + u = 2v + z. In contrast to the usual situation of trying to solve linear homogeneous diophantine equations, associative-commutative unification requires that only nonnegative integral solutions be considered. A negative value for a variable corresponds to assigning a negative number of terms to a variable in the unification problem. Negative values are considered in extensions of this method to abelian-group-theory unification (associativity plus commutativity plus identity plus inverse); the presence of the inverse operation makes it meaningful to consider the assignment of a negative number of terms to a variable. The set of all nonnegative integral solutions to a linear homogeneous equation can be obtained by addition of elements of a finite basis set of solutions. This finite basis set of solutions is obtained by generating all solutions to the equation in ascending order of the value of 2x + y + u (= 2v + z), discarding solutions that are composable from those previously generated, and terminating when
no new noncomposable solutions can be found. It is necessary to discover some bound on the value of the equation such that no new basis solutions will be found with value higher than the bound.
Consider the general problem of finding solutions to the linear homogeneous diophantine equation a1x1 + ··· + amxm = b1y1 + ··· + bnyn. For each i and j with 1 ≤ i ≤ m and 1 ≤ j ≤ n, there is a basis solution with xi = lcm(ai, bj)/ai, yj = lcm(ai, bj)/bj, and all other variables equal to zero. One of these solutions must be subtractable with nonnegative difference from any solution with value greater than max(m, n) × max lcm(ai, bj) (the maximum taken over all i, j), and this is, therefore, a bound on the value of solutions. A lower bound and more effective enumeration method can be found in Huet [27]. The 7 basis solutions for the equation 2x + y + u = 2v + z are given by the table
        x  y  u  v  z    new variable
    1   0  0  1  0  1    z1
    2   0  1  0  0  1    z2
    3   0  0  2  1  0    z3
    4   0  1  1  1  0    z4
    5   0  2  0  1  0    z5
    6   1  0  0  0  2    z6
    7   1  0  0  1  0    z7
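For an equation this small, the basis can be recomputed by brute force. The sketch below enumerates nonzero nonnegative solutions up to a componentwise bound and keeps those not componentwise above another solution; the bound of 2 is adequate here by the lcm argument above, but it is not a general-purpose choice.

    from itertools import product

    def basis_solutions(lhs, rhs, bound=2):
        """Minimal nonzero nonnegative solutions of
        a1*x1 + ... + am*xm = b1*y1 + ... + bn*yn."""
        m = len(lhs)
        sols = []
        for v in product(range(bound + 1), repeat=m + len(rhs)):
            left = sum(a * x for a, x in zip(lhs, v[:m]))
            right = sum(b * y for b, y in zip(rhs, v[m:]))
            if any(v) and left == right:
                sols.append(v)
        # a basis solution is one that no other solution fits under
        return [s for s in sols
                if not any(t != s and all(ti <= si for ti, si in zip(t, s))
                           for t in sols)]

    # 2x + y + u = 2v + z: the 7 rows of the table, as (x, y, u, v, z)
    for s in basis_solutions([2, 1, 1], [2, 1]):
        print(s)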
Thus, any nonnegative integral solution to the equation can be obtained by assigning nonnegative integers to the variables z1, ..., z7 and computing

x = z6 + z7
y = z2 + z4 + 2z5
u = z1 + 2z3 + z4
v = z3 + z4 + z5 + z7
z = z1 + z2 + 2z6

The corresponding substitution {x ← f(z6, z7), y ← f(z2, z4, z5, z5), u ← f(z1, z3, z3, z4), v ← f(z3, z4, z5, z7), z ← f(z1, z2, z6, z6)} is the single most general unifier if f has an identity element.
Associative-commutative unification without identity is slightly more complicated. Because without an identity element it is impossible to assign zero terms to a variable, it is necessary to consider the 2^n combinations of the n basis solutions, restricted to those such that none of the variables x, y, u, v, or z is assigned zero. There are 69 such solutions, including (denoting a solution by the set of its indices) {2, 3, 6}, {1, 2, 3, 6}, and {4, 6} with corresponding unifying substitutions
{x ← z6, y ← z2, u ← f(z3, z3), v ← z3, z ← f(z2, z6, z6)}
{x ← z6, y ← z2, u ← f(z1, z3, z3), v ← z3, z ← f(z1, z2, z6, z6)}
{x ← z6, y ← z4, u ← z4, v ← z4, z ← f(z6, z6)}
This set of 69 unifiers is a minimal complete set of unifiers of f(x, x, y, u) and f(v, v, z). Associative-commutative unification for more general terms is accomplished by first forming
a variable abstraction of the terms. For example, in unifying f(x, x, y, a) and f(b, b, z), variable-only terms f(x, x, y, u) and f(v, v, z) are formed by replacing the distinct nonvariable terms a and b by new variables u and v. The original terms can be obtained from their variable abstraction by applying the substitution {u ← a, v ← b}. The variable-only terms are unified as above. Each unifier of the variable-only terms is then unified with {u ← a, v ← b} [83]. The resulting substitutions are a complete set of unifiers for the original terms. As stated so far, this would seem to entail the unification of each of the 69 unifiers of f(x, x, y, u) and f(v, v, z) with the substitution {u ← a, v ← b}. However, substantially less effort than this is required [75,77]. The generation of the sets of basis solutions can be constrained to take account of the origins of the variables. In particular, the variables u and v of the variable abstraction correspond to the constants a and b in the original terms. Each assignment to a variable x, y, u, v, or z in a unifier of f(x, x, y, u) and f(v, v, z) is either a variable zi or a term f(···). Any unifier that assigns u or v a term of the form f(···) will not be unifiable with the substitution {u ← a, v ← b}. When computing sums of basis solutions, any variable (e.g., u and v) that comes from a nonvariable term in the original problem must be assigned exactly one term. Only 6 unifiers of f(x, x, y, u) and f(v, v, z) are discovered when this restriction is imposed. The number can be reduced to 4 by observing the restriction that the use of basis solution number 4 requires the unification of a and b. The constrained generation of basis sums and unifiers for f(x, x, y, u) and f(v, v, z) yields sums {1, 5, 6}, {1, 2, 5, 6}, {1, 2, 7}, and {1, 2, 6, 7} with corresponding unifiers:
{x ← z6, y ← f(z5, z5), u ← z1, v ← z5, z ← f(z1, z6, z6)}
{x ← z6, y ← f(z2, z5, z5), u ← z1, v ← z5, z ← f(z1, z2, z6, z6)}
{x ← z7, y ← z2, u ← z1, v ← z7, z ← f(z1, z2)}
{x ← f(z6, z7), y ← z2, u ← z1, v ← z7, z ← f(z1, z2, z6, z6)}
Unification of these with the substitution {u ← a, v ← b} yields the complete set of unifiers of f(x, x, y, a) and f(b, b, z):

{y ← f(b, b), z ← f(a, x, x)}
{y ← f(z2, b, b), z ← f(a, z2, x, x)}
{x ← b, z ← f(a, y)}
{x ← f(z6, b), z ← f(a, y, z6, z6)}
Termination of associative-commutative unification was an open question for a long time, but has now been solved [20]. Termination of standard unification is easy to verify because, as each pair of symbols is matched during the scan from left to right for disagreements, either the remaining number of symbols is fewer (when the matched symbols agree) or the number of uninstantiated variables is fewer (when a disagreement is eliminated by assigning a term to a variable). Such a simple termination criterion does not exist in the case of associative-commutative unification, because associative-commutative unification can introduce additional variables. It is necessary to show that the recursive calls on the unification algorithm operate on pairs of terms having less complexity than the original terms. For associative-commutative unification, a complexity measure that can be used to prove termination is the ordered pair (v, r), where v is the number of variables that occur as arguments of two different associative-commutative function symbols and r is the number of distinct nonvariable subterms that appear in the two terms being unified. A unification problem described by (v, r) is less complex than one described by (v', r') if and only if v < v', or v = v' and r < r'.
The associative-commutative unification algorithm presented here can be extended to handle identity and idempotence [48,20]. If an identity element is present, then only the sum of all the basis solutions is necessary, not all 2^n subsets. If the associative-commutative function is idempotent, then the linear homogeneous diophantine equation can be solved over {0, 1} instead of over the integers. The variable-and-constants-only case of abelian-group-theory unification (associativity, commutativity, identity, and inverse) can be handled by a modification of this method that uses the standard solution of the linear diophantine equations in all integers, not just nonnegative ones [46].
3.5 Many-Sorted Unification

Many-sorted unification [84,86] can be used to reason efficiently with sort information. The
universe of discourse is assumed to be divided into objects of different sorts. Constants, functions, and variables may be declared to have particular sorts and subsort relationships may be declared among sorts.
The types of sort information that can be handled by many-sorted unification include assertions of the form

Man(John)--John is a man
Woman(Mary)--Mary is a woman
Man(father(x))--the father of x is a man
Man(x) ⊃ Person(x)--every man is a person
Woman(x) ⊃ Person(x)--every woman is a person.

These assertions are supplanted by sort declarations:

Man, Woman, and Person are sorts
The constant John is of sort Man
The constant Mary is of sort Woman
The function father is of sort Man
The sort Man is a subsort of the sort Person
The sort Woman is a subsort of the sort Person.

Many-sorted unification uses such declarations to restrict the standard unification algorithm. Whenever the unification algorithm eliminates a disagreement between two expressions by assigning a term to a variable, the many-sorted unification restriction checks for conformability of the sorts of the variable and the term. Two cases of many-sorted unification will be distinguished. In the first case, the sort hierarchy is a forest, i.e., a set of trees. No sort C is a subsort of both A and B (unless A is a subsort of B or B is a subsort of A). The second, more general case permits common subsorts and allows sort hierarchies that are graphs. In both cases, a nonvariable term can be assigned to a variable only if the nonvariable term's sort is the same as or is a subsort of the variable's sort. The other situation in which a disagreement can be successfully eliminated is when the disagreement consists of two distinct variables. In the forest sort hierarchy case, if the variables are of the same sort, either can be assigned to the other. If one's sort is a subsort of the other's, the former variable must be assigned to the latter variable. Thus, in unifying variables x and y, one cannot, as in standard unification, uniformly make the assignment x ← y. We must instead make the assignment y ← x if x's sort is a subsort of y's. If neither variable's sort is a subsort of the other's, then unification simply fails. For example, if x is a variable of sort Person and y a variable of sort Man, then John and y are
unifiable with unifier {y ← John}, x and y are unifiable with unifier {x ← y} (but not {y ← x}), and Mary and y are not unifiable. Note that, by use of a technique familiar in logic programming [16], if the sort hierarchy is a forest, many-sorted unification can be simulated by encoding sort information directly in the terms. In this technique, there is a unary function symbol associated with each sort. Sorted terms are embedded in a sequence of such unary function symbols corresponding to the sequence of sorts from the top of the sort hierarchy to the declared sort of the term. Thus, the man John and the woman Mary are represented by the terms person(man(John)) and person(woman(Mary)), respectively. The arbitrary person x and man y are represented by the terms person(x) and person(man(y)), respectively. For example, similarly to above, person(man(John)) and person(man(y)) are unifiable with unifier {y ← John}, person(x) and person(man(y)) are unifiable with unifier {x ← man(y)}, and person(woman(Mary)) and person(man(y)) are not unifiable.
When unifying two variables in the graph sort hierarchy case, if the variables are not of the same sort and neither variable's sort is a subsort of the other's, the variables are still unifiable provided their sorts have one or more subsorts in common. For each common subsort, a new variable of that sort is created and assigned to both variables being unified. It is sufficient to consider maximal common subsorts, e.g., if S1 and S2 are the two common subsorts of the sorts of the variables x and y being unified, but S2 is a subsort of S1, then only one unifier need be formed--with a new variable z of sort S1 being assigned to both x and y. For example, assume the declarations

Animal, Mammal, Lion, Dog, Cat, Fish, Shark, Koi, and Pet are sorts
Mammal, Fish, and Pet are subsorts of Animal
Lion, Dog, and Cat are subsorts of Mammal
Shark and Koi are subsorts of Fish
Dog, Cat, and Koi are subsorts of Pet.

Let x_Fish denote the variable x of sort Fish, and the like. Then x_Fish and y_Pet are unifiable with unifier {x ← u_Koi, y ← u_Koi}, and z_Mammal and y_Pet are unifiable with unifiers {z ← v_Dog, y ← v_Dog} and {z ← w_Cat, y ← w_Cat}.
Many-sorted unification can be very effective, as experiments with "Schubert's steamroller" puzzle indicate [85]. It blocks formation of terms that are nonsense from the standpoint of the sort structure of the problem. The number of clauses and literals in problems is reduced. Clauses
stating sorts of symbols and subsort relationships are eliminated. Because sort-qualifier literals are removed from clauses so that, for example, the clause ¬Fox(x) ∨ ¬Bird(y) ∨ Eats(x, y) is replaced by the unit clause Eats(x_Fox, y_Bird), the remaining clauses tend to be shorter, and there are likely to be more unit clauses. A further advantage is the abstract level of proofs using many-sorted unification. Suppose that foxes and birds are animals and that foxes like to eat birds. That some animal likes to eat some animal can be proved in a single resolution step by unifying the atoms of the assertion Eats(x_Fox, y_Bird) and the negated theorem ¬Eats(u_Animal, v_Animal). The instantiation of the variables of the theorem suggests the answer that all foxes like to eat all birds. Without using many-sorted unification, the assertion ¬Fox(x) ∨ ¬Bird(y) ∨ Eats(x, y) could be resolved with the negated theorem ¬Animal(u) ∨ ¬Animal(v) ∨ ¬Eats(u, v). The resulting clause ¬Fox(x) ∨ ¬Bird(y) ∨ ¬Animal(x) ∨ ¬Animal(y) must then be refuted. This requires instantiation of x by some specific fox and y by some specific bird, e.g., the Skolem constants used in asserting the existence of foxes and birds, and the proof will end up mentioning a specific fox and bird. Worse yet, if there were a large number of assertions specifying that certain things were foxes or birds, there would be a large number of ways of instantiating the clause and thus a large number of proofs that may mention different foxes and birds. There are some important assumptions associated with the use of many-sorted unification. One is the assumption of nonemptiness of the sorts used.
P(x) and ¬P(x) are not contradictory if x's sort is empty. More restrictive in practice is the assumption that terms can be assigned their sorts a priori. For example, suppose Tweety is declared to be of sort Animal, of which sort Bird is a subsort. The absence of the characteristic predicate Bird makes Bird(Tweety) inexpressible. Even if the Bird predicate is included, assuming or even proving the formula Bird(Tweety) has no effect on Tweety's declared sort, which is what is used to restrict the unification algorithm. Thus, there should be in the sort hierarchy only those sorts for which it is unnecessary to assume or prove that some term belongs to a proper subsort of its declared sort. A limitation of this form of many-sorted unification is the lack of polymorphic sort declarations. It is often very useful to declare predicates and functions to have more than one possible set of sorts of arguments and for the sort of a function's value to depend on the sorts of its arguments. More general procedures for reasoning about sorts are being developed [14,15,69].
4 Equality Reasoning
The equality relation is often used in problems to which theorem-proving programs are applied. Because of its widespread use and the difficulties resulting from simply axiomatizing it, much effort has been devoted to developing special rules of inference for the equality relation.
4.1 Equality Axiomatization
The equality relation = is an equivalence relation, i.e., it is reflexive, symmetric, and transitive. These properties are usually given to theorem-proving programs as the following three assertions:

x = x
¬(x = y) ∨ (y = x)
¬(x = y) ∨ ¬(y = z) ∨ (x = z)

However, this is not the only possible expression of these properties. A smaller set of assertions that conveys the same information is

x = x
¬(x = y) ∨ ¬(x = z) ∨ (z = y)

The symmetry property is obtained from these latter two assertions by resolving on x = x and ¬(x = y) to yield ¬(x = z) ∨ (z = x). The standard transitivity axiom can then be obtained by resolving this with the second assertion. The reduced number of assertions may yield a smaller search space with a lower branching factor, though with sometimes longer proofs. In addition to reflexivity, symmetry, and transitivity, the equality relation possesses substitutivity properties, i.e., terms that are equal to each other can be substituted for each other anywhere in a term or formula. These are expressed by two sets of assertions that specify the predicate-substitutivity and functional-substitutivity axioms. For each n-ary predicate P other than =, there are n predicate-substitutivity axioms of the form:

¬(x1 = x) ∨ ¬P(x1, ..., xn) ∨ P(x, x2, ..., xn)
...
¬(xi = x) ∨ ¬P(x1, ..., xn) ∨ P(x1, ..., xi-1, x, xi+1, ..., xn)
...
¬(xn = x) ∨ ¬P(x1, ..., xn) ∨ P(x1, ..., xn-1, x)

For each n-ary function f, there are n functional-substitutivity axioms:
¬(x1 = x) ∨ (f(x1, ..., xn) = f(x, x2, ..., xn))
...
¬(xi = x) ∨ (f(x1, ..., xn) = f(x1, ..., xi-1, x, xi+1, ..., xn))
...
¬(xn = x) ∨ (f(x1, ..., xn) = f(x1, ..., xn-1, x))
The problems with using this axiomatic formulation of the equality relation are the large number of axioms and their generality. In particular, the n predicate-substitutivity axioms for the predicate P are always resolvable with any literal with predicate P. The search space is large and contains many useless and redundant results. It is also very laborious to derive even simple consequences of equality. For example, to derive the obvious fact that f(g(h(a))) = f(g(h(b))) from a = b requires three applications of functional-substitutivity axioms. These problems motivated the development of special rules of inference to be used in addition to resolution. These additional rules of inference have been only partially successful. They have largely succeeded in reducing the length of proofs and deriving obvious results like the above in a natural way, but the rules are still sufficiently general that the problem of large search spaces for problems involving equality is not fully solved.
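The sheer bulk of these axiom sets is easy to appreciate by generating them. A small illustrative generator for the functional-substitutivity axioms follows; the printed clause syntax is only a rendering of the formulas above:

    def functional_substitutivity(f, n):
        """The n functional-substitutivity axioms for an n-ary function f."""
        axioms = []
        for i in range(1, n + 1):
            xs = ['x%d' % j for j in range(1, n + 1)]
            ys = xs[:i - 1] + ['x'] + xs[i:]    # i-th argument replaced by x
            axioms.append('~(x%d = x) | (%s(%s) = %s(%s))'
                          % (i, f, ', '.join(xs), f, ', '.join(ys)))
        return axioms

    for ax in functional_substitutivity('f', 3):
        print(ax)
    # ~(x1 = x) | (f(x1, x2, x3) = f(x, x2, x3))
    # ~(x2 = x) | (f(x1, x2, x3) = f(x1, x, x3))
    # ~(x3 = x) | (f(x1, x2, x3) = f(x1, x2, x))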
4.2 Demodulation
Demodulation [91] or rewriting or reduction [30] is the process of using a set of equalities to replace terms by equal terms. The equalities are oriented and made into reductions λ → ρ that are used to replace instances of the term λ by the corresponding instance of the term ρ. Thus, for example, the term a + 0 can be reduced to a using the reduction x + 0 → x with substitution {x ← a}. The reduction process repeatedly applies reductions to a term until a term that cannot be further reduced is produced. For this process to terminate, the reductions must be oriented by some well-defined complexity measure so that, for every instance of a reduction, the right-hand side is less complex than the left-hand side. For example, associativity of + can be built into the reduction (x + y) + z → x + (y + z) because, by an appropriate complexity measure, terms parenthesized to the right are simpler, and the reduction process terminates; but commutativity cannot be used in a reduction, because the reduction x + y → y + x can be used to rewrite a + b to b + a to a + b infinitely. Demodulation is used in resolution theorem proving to perform rapid equality inferences on derived terms. It also has the beneficial effect of reducing many equivalent terms to the same form (in the ideal case of a complete set of reductions, all equivalent terms to the same form) and
thus reducing the number of variants of equivalent terms appearing in clauses to be stored and facilitating subsumption. It is also useful for performing various programming-like tricks [87,88] such as maintaining lists of possible values of parameters in puzzles and removing individual possibilities by demodulation.
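A minimal demodulator along these lines can be sketched as follows. This is an illustrative Python rendering, not the implementation of any of the cited systems; the term representation (variables as strings, applications as tuples) and the rule set are assumptions for the example. Note that termination depends on the rules being oriented, as discussed above.

    # Sketch of a demodulator. Terms are variables (strings) or tuples
    # (functor, arg1, ..., argn); constants are 0-argument tuples.
    def match(pattern, term, subst):
        # One-way pattern matching: bind pattern variables to subterms.
        if isinstance(pattern, str):                       # a variable
            if pattern in subst:
                return subst if subst[pattern] == term else None
            return dict(subst, **{pattern: term})
        if isinstance(term, tuple) and len(term) == len(pattern) \
           and term[0] == pattern[0]:
            for p, t in zip(pattern[1:], term[1:]):
                subst = match(p, t, subst)
                if subst is None:
                    return None
            return subst
        return None

    def apply_subst(term, subst):
        if isinstance(term, str):
            return subst.get(term, term)
        return (term[0],) + tuple(apply_subst(a, subst) for a in term[1:])

    def reduce_term(term, rules):
        # Rewrite innermost-first until no rule applies (requires the
        # rules to be oriented by a terminating complexity measure).
        if isinstance(term, tuple):
            term = (term[0],) + tuple(reduce_term(a, rules) for a in term[1:])
        for lhs, rhs in rules:
            s = match(lhs, term, {})
            if s is not None:
                return reduce_term(apply_subst(rhs, s), rules)
        return term

    rules = [(("+", "x", ("0",)), "x")]                # x + 0 -> x
    print(reduce_term(("+", ("a",), ("0",)), rules))   # ('a',)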
Narrowing [73,42] is an extension of reduction that uses unification instead of pattern matching. A special case of paramodulation, narrowing is especially useful for constructing a unification algorithm in an equational theory specified by a complete set of reductions [21,32]. Let s and t be a pair of terms to be so unified and H be a symbol not occurring elsewhere. Then if H(s, t) can be transformed to H(s', t') by a sequence of narrowing operations and s' and t' are unifiable by the standard unification algorithm, then s and t are unifiable in the equational theory specified by the complete set of reductions, with a unifier that is the composition of the unifier of s' and t' and the unifiers used in the narrowing steps.
4.3 Paramodulation
Paramodulation [89] is an equality inference rule that performs substitution directly. The paramodulant clause L(... b ...) ∨ C ∨ D can be derived by paramodulation from the clause (a = b) ∨ C or (b = a) ∨ C into the clause L(... a ...) ∨ D, where L(... a ...) denotes the literal L and a particular occurrence of the term a in L, and where C and D are arbitrary clauses. That is, an equality atom can be used to replace one of its arguments by the other in any other literal, with the remaining literals of the two clauses included as part of the derived clause. In the general case, it may be necessary to find a unifying substitution for the term to be replaced and the equality-atom argument. Resolution plus paramodulation is complete provided the equality reflexivity axiom x = x is included [9]. Thus, the paramodulation rule eliminates the need for the equality symmetry, transitivity, and substitutivity axioms. This completeness result applies to unrestricted resolution plus paramodulation. If refinements, such as set of support, are employed, it may be necessary to include functional-reflexivity axioms to preserve completeness. The set of functional-reflexivity axioms consists of, for each n-ary function f, the unit clause f(x1, ..., xn) = f(x1, ..., xn). These are instances of the reflexivity axiom x = x. An illustration of the necessity of the functional-reflexivity axioms when the set of support refinement is used is the refutation of the set of clauses P(x, x), a = b, and ¬P(f(a), f(b)) with P(x, x) designated as the only clause in the set of support. To refute this set, it is necessary to paramodulate from the functional-reflexivity axiom f(x) = f(x) into P(x, x) to obtain
P(f(x), f(x)). Paramodulating from a = b into P(f(x), f(x)) yields P(f(a), f(b)), which can then be resolved with the input clause ¬P(f(a), f(b)).

4.4 Resolution by Unification and Equality
Resolution by unification and equality (RUE) [17,18] adopts a different approach to incorporating equality reasoning into an inference rule. Where paramodulation applies equality substitution, producing a new literal from a literal and an equality literal, resolution by unification and equality derives a set of negative equality literals from a pair of literals. For example, while L(... b ...) ∨ C ∨ D can be derived by paramodulation from (a = b) ∨ C into L(... a ...) ∨ D, resolution by unification and equality performs the complementary operation of deriving the clause ¬(a = b) ∨ E ∨ F from the clauses L(... a ...) ∨ E and ¬L(... b ...) ∨ F. The principle involved is that L(... a ...) and ¬L(... b ...) can both be true only if a is not equal to b. Thus, ¬(a = b), along with the other literals of the clauses containing L and ¬L, can be derived. Of course, performing resolution by unification and equality may result in the formation of resolvents with more than one inequality literal if there is more than a single disagreement in the two literals being matched. For example, ¬(a = c) ∨ ¬(b = d) can be derived from P(f(a, b)) and ¬P(f(c, d)). There are completeness and efficiency issues involved in the selection of which disagreements are used to construct a resolvent by unification and equality. Matching P(f(a, b)) and ¬P(f(c, d)) using the resolution by unification and equality rule must result in ¬(f(a, b) = f(c, d)) for a successful refutation of the set of clauses consisting of P(f(a, b)), ¬P(f(c, d)), and (f(a, b) = f(c, d)). Thus, creating an inequality literal from terms whose function symbols are the same but whose subterms disagree may be necessary for completeness. It is generally more efficient, and often successful in practice, to form inequality literals from the bottommost disagreement set, as in the earlier derivation of ¬(a = c) ∨ ¬(b = d). The
negative reflexive function (NRF) rule is also necessary. It creates from a clause ¬(s = t) ∨ C the clause ¬(s1 = t1) ∨ ... ∨ ¬(sn = tn) ∨ C, where (s1, t1), ..., (sn, tn) is a disagreement set between s and t. For example, if ¬(f(a, b) = f(c, d)) is derived from P(f(a, b)) and ¬P(f(c, d)) by resolution by unification and equality, the lower level disagreement ¬(a = c) ∨ ¬(b = d) can be obtained by applying the negative reflexive function rule to ¬(f(a, b) = f(c, d)).
Appropriate use (i.e., suitable choice of disagreement sets) of the resolution by unification and equality and the negative reflexive function rules together yields a complete procedure for equality reasoning.
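The two disagreement-set choices discussed above can be sketched in a few lines of Python. This is an illustrative sketch only (the term representation is the same hypothetical one used in the demodulation sketch); it is not an implementation of the RUE procedure itself:

    # Compute a disagreement set of two terms: the pairs of unequal
    # subterms whose joint equality would make s and t identical.
    def disagreement(s, t, bottommost=True):
        if s == t:
            return []
        if bottommost and isinstance(s, tuple) and isinstance(t, tuple) \
           and s[0] == t[0] and len(s) == len(t):
            pairs = []
            for a, b in zip(s[1:], t[1:]):
                pairs += disagreement(a, b, True)
            return pairs
        return [(s, t)]                   # treat the whole terms as one pair

    fab = ("f", ("a",), ("b",))
    fcd = ("f", ("c",), ("d",))
    print(disagreement(fab, fcd))
    # [(('a',), ('c',)), (('b',), ('d',))]   -- bottommost: ¬(a = c) ∨ ¬(b = d)
    print(disagreement(fab, fcd, bottommost=False))
    # [((f, a, b), (f, c, d))]               -- topmost: ¬(f(a,b) = f(c,d))

Decomposing the topmost pair one level at a time, as in the bottommost recursion, is exactly what the negative reflexive function rule does to a derived inequality literal.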
4.5 E-Resolution
E-resolution [59] is similar to (but predates) resolution by unification and equality, but is a more complex rule of inference that includes the use of paramodulation. It is a higher level rule than resolution by unification and equality in much the same way that hyperresolution is a higher level rule than resolution. If the literals L and L' can be made complementary by a sequence of paramodulation operations by clauses (s1 = t1) ∨ C1, ..., (sn = tn) ∨ Cn, then D ∨ E ∨ C1 ∨ ... ∨ Cn can be derived from L ∨ D and L' ∨ E by E-resolution, where D, E, C1, ..., Cn are arbitrary clauses. This is both a generalization and a specialization of unification in equational theories. It specializes unification in equational theories by stipulating that paramodulation is used to match the expressions. It generalizes unification in equational theories because nonunit clauses containing equalities can be used and the theory is therefore not equational.
4.6 Knuth-Bendix Method
Let R be the set of reductions λ1 → ρ1, ..., λn → ρn and E be the corresponding equational theory λ1 = ρ1, ..., λn = ρn. Then R is a complete set of reductions for E if and only if it is terminating and confluent. The term t1 can be reduced by R to the term t2 (written t1 → t2) if some subterm u of t1 is an instance of λ (with substitution σ) for some λ → ρ in R and t2 is the result t1(u ← ρσ) of replacing u by the corresponding instance of ρ. R is terminating if and only if there is no infinite sequence of reductions t1 → t2 → .... R is confluent if and only if for every term t, if t →* t1 and t →* t2 (i.e., t can be rewritten by R to t1 and t2 in zero or more steps) there is some term t' such that t1 →* t' and t2 →* t'. If R is a complete set of reductions for E, then t1↓ = t2↓ for every pair of terms t1 and t2 such that t1 =E t2, where t↓ denotes the result of reducing t by R to an irreducible form. Thus, a complete set of reductions for E can be used to solve the word problem for E. The Knuth-Bendix method [36,29,30,10,41,42] provides a test for a set of reductions being
locally confluent. A set of reductions R is locally confluent if and only if for every term t, if t → t1 and t → t2 (i.e., t can be rewritten by R to each of t1 and t2 in one step) there is some term t' such that t1 →* t' and t2 →* t'. Terminating sets of reductions are confluent if and only if they are locally confluent.
Instead of considering all possible terms t that can be reduced by R to terms t1 and t2, the Knuth-Bendix method performs superposition operations that capture the general case of two reductions being simultaneously applicable to a term. Let λi → ρi and λj → ρj be two not necessarily distinct reductions in R with variables renamed so that they have no variables in common. Let u be a nonvariable subterm of λi that is unifiable with λj with most general unifier σ. Then the terms t1 = ρiσ and t2 = λi(u ← ρj)σ (the instantiation by σ of λi with u replaced by ρj) form a critical pair that represents one of the cases of λi → ρi and λj → ρj rewriting some term t (in this case, λiσ) to terms t1 and t2. If for every critical pair (t1, t2), t1↓ = t2↓, R is locally confluent. For example, the set of reductions
(1) f(e, x) → x
(2) f(g(x), x) → e
(3) f(f(x, y), z) → f(x, f(y, z))
(4) f(x, e) → x
(5) f(x, g(x)) → e
(6) g(e) → e
(7) g(g(x)) → x
(8) f(g(x), f(x, y)) → y
(9) f(x, f(g(x), y)) → y
(10) g(f(x, y)) → f(g(y), g(x))

is a terminating and locally confluent (and, hence, complete) set of reductions for free groups, where f is the group multiplication operator, g the group inverse operator, and e the group identity element. Two terms are equal in the theory of free groups if and only if they can be simplified to the same term by this set of reductions. But the Knuth-Bendix method is more than a test for local confluence of sets of reductions. If the set of reductions is not locally confluent, it will generate a critical pair that leads to a counterexample, i.e., a pair of terms equal in the equational theory, but distinct and irreducible. If one of the terms is simpler than the other in a manner consistent with the complexity ordering of the other reductions, then the counterexample can be made into a reduction and added to the current set of reductions being tested for local confluence. Thus, the Knuth-Bendix method can (1) terminate with no additional counterexamples, resulting in a complete set of reductions, (2) terminate with a counterexample that cannot be oriented into a reduction because neither term is simpler than the other, or (3) continue generating reductions forever (thereby constructing an infinite complete set of reductions). An example of Case (1) is that the previously mentioned complete set of reductions for free groups is generable from reductions 1-3 by the Knuth-Bendix method.
An example of Case (2) is the generation of the unorientable equality f(x, y) = f(y, x) from reductions 1-3 plus f(x, x) = e. An example of Case (3) is the generation of an infinite set of reductions

f(x, f(y, f(x, y))) → f(x, y)
f(x, f(y, f(x, f(y, w)))) → f(x, f(y, w))
f(x, f(y, f(z, f(x, f(y, z))))) → f(x, f(y, z))
f(x, f(y, f(z, f(x, f(y, f(z, w)))))) → f(x, f(y, f(z, w)))

from the reductions

f(x, x) → x
f(f(x, y), z) → f(x, f(y, z))

The Knuth-Bendix method, when it applies, is extraordinarily powerful. The complete set of 10 reductions for free groups can be derived by computer from the original 3 with little wasted effort in just a few seconds. The chapter by Huet contains some further discussion of the standard Knuth-Bendix method. One of the most obvious limitations of the standard Knuth-Bendix method is its inability to handle theories with commutativity. Commutativity cannot be handled because the equation f(x, y) = f(y, x) cannot be treated as a reduction without losing the required termination property. Despite examples such as the theory of free groups above that include associativity, the standard Knuth-Bendix method is also somewhat deficient in its handling of associativity. For example, the set of reductions f(x, x) → x and f(f(x, y), z) → f(x, f(y, z)) can be extended only to an infinite complete set of reductions, although the single reduction f(x, x) → x composes a complete set of reductions if f is assumed to be associative and associative pattern matching is used in its application. Such problems have provided motivation for extending the Knuth-Bendix method to handle equational theories that are divided into a set of reductions plus a set of additional equalities [63,28,35,43,44,45]. Functions that are associative and commutative are particularly important. With special handling for such functions, it is possible to derive complete sets of reductions for abelian groups and rings and many other interesting theories [31]. An especially interesting example is that of Boolean algebra when the associative and commutative exclusive-or (⊕) and conjunction connectives are used as the basic set of logical connectives in terms of which formulas are rewritten [25]. The set of reductions
x ≡ y → x ⊕ y ⊕ true
x ⊃ y → (x ∧ y) ⊕ x ⊕ true
x ∨ y → (x ∧ y) ⊕ x ⊕ y
¬x → x ⊕ true
x ⊕ false → x
x ⊕ x → false
x ∧ true → x
x ∧ false → false
can be used to decide the equivalence of two formulas in the propositional calculus by associative-commutative identity checking of the results of reducing the two expressions to their irreducible forms using associative-commutative pattern matching. A formula is valid or unsatisfiable if and only if it reduces to true or false, respectively. Extensions of the technique can be used for theorem proving in the first-order predicate calculus. An approach to handling functions that are associative and commutative in the Knuth-Bendix method is to employ associative-commutative identity checking, pattern matching, and unification in place of standard identity checking, pattern matching, and unification [63]. The immediate difficulty with carrying out this modification is that the reduction f(g(x), x) → e is not directly applicable to the term f(g(a), b, a, c), where f is an associative and commutative function (again treated as an n-ary function for arbitrary n), because f(g(a), a) is not a subterm of f(g(a), b, a, c).
The superposition process is likewise complicated. A solution is to enlarge the set of reductions. In particular, for every reduction λ → ρ where λ is headed by an associative and commutative function f, the reduction f(λ, v) → f(ρ, v) is added, where v is a new variable not occurring in λ → ρ. The embedding f(g(x), x, v) → f(e, v) (f(g(x), x, v) → v after rewriting the right-hand side) can be used to reduce f(g(a), b, a, c) to f(b, c) using the substitution {x ← a, v ← f(b, c)} obtained by associative-commutative pattern matching. The use of associative-commutative identity checking, pattern matching, and unification operations plus the addition of embeddings of reductions permits extension of the Knuth-Bendix method to handle functions that are associative and commutative.
References

[1] Andrews, P.B. Theorem proving via general matings. Journal of the ACM 28, 2 (April 1981), 193-214.
[2] Benanav, D., D. Kapur, and P. Narendran. Complexity of matching problems. Proceedings of the First International Conference on Rewriting Techniques and Applications, Dijon, France, May 1985.
[3] Bläsius, K., N. Eisinger, J. Siekmann, G. Smolka, A. Herold, and C. Walther. The Markgraf Karl Refutation Procedure (Fall 1981). Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, B.C., Canada, August 1981, 511-518.
[4] Bibel, W. On matrices with connections. Journal of the ACM 28, 4 (October 1981), 633-645.
[5] Bibel, W. Automated Theorem Proving. Friedr. Vieweg & Sohn, Braunschweig, West Germany, 1982.
[6] Boyer, R.S. and J.S. Moore. The sharing of structure in theorem-proving programs. In B. Meltzer and D. Michie (eds.), Machine Intelligence 7. Edinburgh University Press, Edinburgh, Scotland, 1972, pp. 101-116.
[7] Brachman, R.J., R.E. Fikes, and H.J. Levesque. Krypton: a functional approach to knowledge representation. IEEE Computer 16, 10 (October 1983), 67-73.
[8] Brachman, R.J., V. Pigman Gilbert, and H.J. Levesque. An essential hybrid reasoning system: knowledge and symbol level accounts of Krypton. Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, California, August 1985, 532-539.
[9] Brand, D. Proving theorems with the modification method. SIAM Journal of Computing (December 1975), 412-430.
[10] Buchberger, B. Basic features and development of the critical-pair/completion procedure. Proceedings of the First International Conference on Rewriting Techniques and Applications, Dijon, France, May 1985, 1-45.
[11] Chang, C.-L. The unit proof and the input proof in theorem proving. Journal of the ACM 17, 4 (October 1970), 698-707.
[12] Chang, C.-L. and R.C.-T. Lee. Symbolic Logic and Mechanical Theorem Proving. Academic Press, New York, New York, 1973.
[13] Clocksin, W.F. and C.S. Mellish. Programming in Prolog. Springer-Verlag, Berlin, West Germany, 1981.
[14] Cohn, A.G. Mechanizing a Particularly Expressive Many Sorted Logic. Ph.D. dissertation, University of Essex, Essex, England, January 1983.
[15] Cohn, A.G. On the solution of Schubert's steamroller in many sorted logic. Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, California, August 1985, 1169-1174.
[16] Dahl, V. Translating Spanish into logic through logic. American Journal of Computational Linguistics 7, 3 (September 1981), 149-164.
[17] Digricoli, V.J. Resolution by unification and equality. Proceedings of the Fourth Workshop on Automated Deduction, Austin, Texas, February 1979.
[18] Digricoli, V.J. The efficacy of RUE resolution: experimental results and heuristic theory. Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, B.C., Canada, August 1981, 539-547.
[19] Eder, E. Properties of substitutions and unifications. Journal of Symbolic Computation 1 (1985).
[20] Fages, F. Associative-commutative unification. Proceedings of the 7th International Conference on Automated Deduction, Napa, California, May 1984. Lecture Notes in Computer Science 170, Springer-Verlag, Berlin, West Germany, pp. 194-208.
[21] Fay, M.J. First-order unification in an equational theory. Proceedings of the 4th Workshop on Automated Deduction, Austin, Texas, February 1979, 161-167.
[22] Gallier, J. Logic for Computer Science. Harper & Row, New York, New York, 1986.
[23] Henschen, L.J. and S.A. Naqvi. An improved filter for literal indexing in resolution systems. Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, B.C., Canada, August 1981, 528-529.
[24] Hewitt, C. Description and theoretical analysis (using schemata) of PLANNER: a language for proving theorems and manipulating models in a robot. Technical Report, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, April 1972.
[25] Hsiang, J. Refutational theorem proving using term-rewriting systems. Artificial Intelligence 25, 3 (1985), 255-300.
[26] Huet, G. Résolution d'équations dans les langages d'ordre 1, 2, ..., ω. Thèse d'état, Spécialité Mathématiques, Université Paris VII, 1976.
[27] Huet, G. An algorithm to generate the basis of solutions to homogeneous diophantine equations. Information Processing Letters 7, 3 (April 1978), 144-147.
[28] Huet, G. Confluent reductions: abstract properties and applications to term rewriting systems. Journal of the ACM 27, 4 (October 1980), 797-821.
[29] Huet, G. A complete proof of correctness of the Knuth-Bendix completion algorithm. Journal of Computer and System Sciences 23 (1981), 11-21.
[30] Huet, G. and D.C. Oppen. Equations and rewrite rules: a survey. Technical Report CSL-111, Computer Science Laboratory, SRI International, Menlo Park, California, January 1980.
[31] Hullot, J.-M. A catalogue of canonical term rewriting systems. Technical Report CSL-113, Computer Science Laboratory, SRI International, Menlo Park, California, April 1980.
[32] Hullot, J.-M. Canonical forms and unification. Proceedings of the 5th International Conference on Automated Deduction, Les Arcs, France, July 1980. Lecture Notes in Computer Science 87, Springer-Verlag, Berlin, West Germany, pp. 318-334.
[33] Jaffar, J., J.-L. Lassez, and J. Lloyd. Completeness of the negation as failure rule. Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, West Germany, August 1983, 500-506.
[34] Jaffar, J. Minimal and complete word unification. Technical Report 51, Department of Computer Science, Monash University, Clayton, Victoria, Australia, March 1985.
[35] Jouannaud, J.-P. and H. Kirchner. Completion of a set of rules modulo a set of equations. Technical Note, Computer Science Laboratory, SRI International, Menlo Park, California, April 1984.
[36] Knuth, D.E. and P.B. Bendix. Simple word problems in universal algebras. In Leech, J. (ed.), Computational Problems in Abstract Algebra, Pergamon Press, 1970, pp. 263-297.
[37] Kowalski, R. A proof procedure using connection graphs. Journal of the ACM 22, 4 (October 1975), 572-595.
[38] Kowalski, R.A. Algorithm = logic + control. Communications of the ACM 22, 7 (July 1979), 424-436.
[39] Kowalski, R. Logic for Problem Solving. Elsevier North-Holland, New York, New York, 1979.
[40] Kowalski, R. and D. Kuehner. Linear resolution with selection function. Artificial Intelligence 2 (1971), 227-260.
[41] Lankford, D.S. Canonical algebraic simplification in computational logic. Technical Report, Department of Mathematics, University of Texas, Austin, Texas, May 1975.
[42] Lankford, D.S. Canonical inference. Report ATP-32, Department of Mathematics and Computer Sciences, University of Texas at Austin, Austin, Texas, December 1975.
[43] Lankford, D.S. and A.M. Ballantyne. Decision procedures for simple equational theories with commutative axioms: complete sets of commutative reductions. Report ATP-35, Department of Mathematics, University of Texas, Austin, Texas, March 1977.
[44] Lankford, D.S. and A.M. Ballantyne. Decision procedures for simple equational theories with permutative axioms: complete sets of permutative reductions. Report ATP-37, Department of Mathematics, University of Texas, Austin, Texas, April 1977.
[45] Lankford, D.S. and A.M. Ballantyne. Decision procedures for simple equational theories with commutative-associative axioms: complete sets of commutative-associative reductions. Report ATP-39, Department of Mathematics, University of Texas, Austin, Texas, August 1977.
[46] Lankford, D., G. Butler, and B. Brady. Abelian group theory unification algorithms for elementary terms. Technical Report, Mathematics Department, Louisiana Tech University, Ruston, Louisiana, 1983.
[47] Livesey, M. and J. Siekmann. Termination and decidability results for string unification. Memo CSM-12, Essex University, Essex, England, 1975.
[48] Livesey, M. and J. Siekmann. Unification of A+C-terms (bags) and A+C+I-terms (sets). Interner Bericht Nr. 5/76, Institut für Informatik I, Universität Karlsruhe, Karlsruhe, West Germany, 1976.
[49] Lloyd, J.W. Foundations of Logic Programming. Springer-Verlag, New York, New York, 1984.
[50] Loveland, D.W. A linear format for resolution. Proceedings of the IRIA Symposium on Automatic Demonstration, Versailles, France, 1968. Lecture Notes in Mathematics 125, Springer-Verlag, Berlin, West Germany, 1970, pp. 147-162.
[51] Loveland, D.W. A simplified format for the model elimination procedure. Journal of the ACM 16, 3 (July 1969), 349-363.
[52] Loveland, D.W. Automated Theorem Proving: A Logical Basis. North-Holland, Amsterdam, the Netherlands, 1978.
[53] Luckham, D. Refinement theorems in resolution theory. Proceedings of the IRIA Symposium on Automatic Demonstration, Versailles, France, 1968. Lecture Notes in Mathematics 125, Springer-Verlag, Berlin, 1970, pp. 163-190.
[54] Makanin, G.S. The problem of solvability of equations in a free semigroup. Soviet Akad. Nauk SSSR 233, 2 (1977).
[55] Manna, Z. and R. Waldinger. A deductive approach to program synthesis. ACM Transactions on Programming Languages and Systems 2, 1 (January 1980), 90-121.
[56] Manna, Z. and R. Waldinger. The Logical Basis for Computer Programming. Addison-Wesley, Reading, Massachusetts, 1985.
[57] Martelli, A. and U. Montanari. An efficient unification algorithm. ACM Transactions on Programming Languages 4, 2 (April 1982), 258-282.
[58] McCharen, J., R. Overbeek, and L. Wos. Complexity and related enhancements for automated theorem-proving programs. Computers and Mathematics with Applications 2 (1976), 1-16.
[59] Morris, J.B. E-resolution: extension of resolution to include the equality relation. Proceedings of the International Joint Conference on Artificial Intelligence, Washington, D.C., May 1969, 287-294.
[60] Murray, N.V. Completely non-clausal theorem proving. Artificial Intelligence 18, 1 (January 1982), 67-85.
[61] Overbeek, R. An implementation of hyperresolution. Computers and Mathematics with Applications 1 (1975), 201-214.
[62] Paterson, M.S. and M.N. Wegman. Linear unification. Journal of Computer and Systems Science 16, 2 (April 1978), 158-167.
[63] Peterson, G.E. and M.E. Stickel. Complete sets of reductions for some equational theories. Journal of the Association for Computing Machinery 28, 2 (April 1981), 233-264.
[64] Plaisted, D.A. The occur-check problem in Prolog. New Generation Computing 2, 4 (1984), 309-322.
[65] Plotkin, G.D. Building-in equational theories. In Meltzer, B. and D. Michie (eds.), Machine Intelligence 7. Edinburgh University Press, Edinburgh, Scotland, 1972, pp. 73-90.
[66] Robinson, J.A. A machine-oriented logic based on the resolution principle. Journal of the ACM 12, 1 (January 1965), 23-41.
[67] Robinson, J.A. Logic: Form and Function. Elsevier North-Holland, New York, New York, 1979.
[68] Rulifson, J.F., J.A. Derksen, and R.J. Waldinger. QA4: a procedural calculus for intuitive reasoning. Technical Note 73, Artificial Intelligence Center, SRI International, Menlo Park, California, November 1972.
[69] Schmidt-Schauss, M. A many-sorted calculus with polymorphic functions based on resolution and paramodulation. Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, California, August 1985, 1162-1168.
[70] Siekmann, J.H. Unification of commutative terms. Interner Bericht Nr. 2/76, Institut für Informatik I, Universität Karlsruhe, Karlsruhe, West Germany, 1976.
[71] Siekmann, J.H. Universal unification. Proceedings of the 7th International Conference on Automated Deduction, Napa, California, May 1984. Lecture Notes in Computer Science 170, Springer-Verlag, Berlin, West Germany, pp. 1-42.
[72] Siekmann, J. and W. Stephan. Completeness and soundness of the connection graph proof procedure. Interner Bericht 7/76, Institut für Informatik I, Universität Karlsruhe, Karlsruhe, West Germany, 1976.
[73] Slagle, J.R. Automated theorem-proving for theories with simplifiers, commutativity, and associativity. Journal of the ACM 21, 4 (October 1974), 622-642.
[74] Smolka, G. Completeness and confluence properties of Kowalski's clause graph calculus. Interner Bericht 31/82, Institut für Informatik I, Universität Karlsruhe, Karlsruhe, West Germany, December 1982.
[75] Stickel, M.E. A complete unification algorithm for associative-commutative functions. Proceedings of the Fourth International Joint Conference on Artificial Intelligence, Tbilisi, Georgia, U.S.S.R., September 1975, 71-76.
[76] Stickel, M.E. Mechanical Theorem Proving and Artificial Intelligence Languages. Ph.D. dissertation, Computer Science Department, Carnegie-Mellon University, Pittsburgh, Pennsylvania, December 1977.
[77] Stickel, M.E. A unification algorithm for associative-commutative functions. Journal of the ACM 28, 3 (July 1981), 423-434.
[78] Stickel, M.E. A nonclausal connection-graph resolution theorem-proving program. Proceedings of the AAAI-82 National Conference on Artificial Intelligence, Pittsburgh, Pennsylvania, August 1982, 229-233.
[79] Stickel, M.E. Theory resolution: building in nonequational theories. Proceedings of the AAAI-83 National Conference on Artificial Intelligence, Washington, D.C., August 1983, 391-397.
[80] Stickel, M.E. A Prolog technology theorem prover. New Generation Computing 2, 4 (1984), 371-383.
[81] Stickel, M.E. Automated deduction by theory resolution. Journal of Automated Reasoning 1, 4 (1985), 333-355.
[82] Stickel, M.E. and W.M. Tyson. An analysis of consecutively bounded depth-first search with applications in automated deduction. Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, California, August 1985, 1073-1075.
[83] van Vaalen, J. An extension of unification to substitutions with an application to automatic theorem proving. Proceedings of the Fourth International Joint Conference on Artificial Intelligence, Tbilisi, Georgia, U.S.S.R., September 1975, 77-82.
[84] Walther, C. A many-sorted calculus based on resolution and paramodulation. Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, West Germany, August 1983, 882-891.
[85] Walther, C. A mechanical solution of Schubert's steamroller by many-sorted resolution. Proceedings of the AAAI-84 National Conference on Artificial Intelligence, Austin, Texas, August 1984, 330-334. Revised version appeared in Artificial Intelligence 26, 2 (May 1985), 217-224.
[86] Walther, C. Unification in many-sorted theories. Proceedings of the 6th European Conference on Artificial Intelligence, Pisa, Italy, September 1984.
[87] Winker, S.K. and L. Wos. Procedure implementation through demodulation and related tricks. Proceedings of the 6th International Conference on Automated Deduction, New York, New York, June 1982. Lecture Notes in Computer Science 138, Springer-Verlag, Berlin, West Germany, pp. 109-131.
[88] Wos, L., R. Overbeek, E. Lusk, and J. Boyle. Automated Reasoning. Prentice-Hall, Englewood Cliffs, New Jersey, 1984.
[89] Wos, L. and G.A. Robinson. Paramodulation and set of support. Proceedings of the IRIA Symposium on Automatic Demonstration, Versailles, France, 1968. Lecture Notes in Mathematics 125, Springer-Verlag, Berlin, 1970, pp. 276-310.
[90] Wos, L., G.A. Robinson, and D.F. Carson. Efficiency and completeness of the set of support strategy in theorem proving. Journal of the ACM 12, 4 (October 1965), 536-541.
[91] Wos, L., G.A. Robinson, D.F. Carson, and L. Shalla. The concept of demodulation in theorem proving. Journal of the ACM 14, 4 (October 1967), 698-709.
Fundamental Mechanisms in Machine Learning and Inductive Inference
Alan W. Biermann
Duke University
Durham, NC 27706
Supported in part by the U.S. Army Research Office under grant DAAG-29-84-K-0072
I. INTRODUCTION

While learning and inductive inference are two distinctively different phenomena, they often appear together, and therefore it is appropriate to study them simultaneously. Learning, for the purposes of this article, will be said to occur when a system self-modifies to improve its own behavior. The scenario is thus that the system operates at a given performance level at one time, experiences events of one kind or another, and self-modifies with purpose to achieve a higher level of performance at a later time.
Inductive inference occurs when a system observes examples (and possibly nonexamples) of a set and constructs a general rule to characterize the set. Thus, as an illustration, such a system might be shown several examples of arches and several objects that are not arches and asked to inductively infer a general rule that will distinguish all arches from other objects. The induced rule is only a guess based upon incomplete information, the known examples and nonexamples. However, if the input information is representative, the guessed rule will be correct or nearly correct. If the rule has shortcomings, additional examples will often result in convergence to a correct form. The phenomenon of inductive inference has been studied under many different names in the literature, including generalization, induction, concept formation, learning, categorization, and theory formation. Most systems that learn use inductive inference as the mechanism for improving behavior. That is, in the process of performing a task, the system infers rules about the domain and uses those rules in later actions to achieve a higher level of performance. This is the kind of learning system that will be studied here. Examples of learning systems that do not use inductive inference are those that improve behavior by simply memorizing facts or by discovering new behaviors using introspective mechanisms. The learning mechanisms to be studied here fall into five different categories: systems which learn (1) finite functions, (2) grammars, (3) programs from traces, (4) LISP programs from input-output pairs, and (5) PROLOG programs from oracle queries. The first type of system learns functions which receive inputs and in a single computational action compute the associated output. The second type can learn a grammar for a language from example strings (or sentences) in the language (and possibly some nonsentences). The third type of system requires that the user lead the machine through a trace of a sample computation; it then infers a program for doing the computation. The fourth and fifth approaches
involve discovering LISP and PROLOG programs that can achieve certain target input-output behaviors. As each of these studies is undertaken, it is important to keep in mind the various measures of a learning machine. One should first notice the nature of the required training information. Are only positive examples of target behavior given, or are both positive and negative examples given? Is the information provided at random from the external world, or can the learning machine ask for any fact it needs? Is the target behavior presented strictly in terms of input-output requirements, or does the training information show how the output is to be obtained from the input? Also, one should notice whether it is possible to specify, for a given learning machine, exactly what set of behaviors it can learn. Finally, what are the levels of error and rates of learning for the machine?
II. LEARNING FINITE FUNCTIONS

A finite function will be defined for the purpose of this study to be any function which sequentially inputs a bounded amount of information and then computes an answer. Later sections in this chapter will study the acquisition of functions or programs which may process an input of unbounded length. While there are many finite function learning machines in the literature, five will be discussed here. Methods will be described for learning (1) linear evaluation functions, (2) signature tables, (3) Boolean conjunctive and disjunctive normal forms, (4) Michalski expressions, and (5) semantic nets.
Learning Linear Evaluation Functions

A linear evaluation function has the form y = c1x1 + c2x2 + ... + cnxn, where y is the computed value, x1, x2, ..., xn are inputs, and c1, c2, ..., cn are variable coefficients. Learning is done by adjusting the coefficients for improved behavior. In many systems, such a linear function is built into a larger system which utilizes the computed value y for evaluating alternative decisions. Thus in a pattern recognition problem (Nilsson [65], Minsky and Papert [69]), the xi's may represent measurements or feature values of an unknown pattern and the pattern will be recognized as belonging to a given class if y is positive. In a game playing situation (Samuel [59]), the xi's represent feature values of the specific position on the board and y is assumed to give a measure of the desirability of that position. Linear evaluation systems have been important in the learning literature because there are learning algorithms with guaranteed convergence to a solution if one exists and because much is known about the class of behaviors that these systems can compute. An example of a learning algorithm for such systems is the following (taken from Minsky and Papert [69]).
START: Choose the constants ci randomly.

TEST: Select an object from the set to be learned (positive information) or from outside the set (negative information), obtain its feature values x1, x2, ..., xn, and compute y = c1x1 + c2x2 + ... + cnxn.
If positive information was selected: if y > 0, go to TEST; if y ≤ 0, go to ADD.
If negative information was selected: if y < 0, go to TEST; if y ≥ 0, go to SUB.

ADD: For each i, set ci = ci + xi. Go to TEST.

SUB: For each i, set ci = ci - xi. Go to TEST.
This algorithm loops without termination, continuously selecting objects and testing its classification rule, which asserts that the object is in the class if y is positive and out of it otherwise. If a particular selected object is correctly classified, the algorithm does nothing except choose another object to test. If the object is not correctly classified, the coefficients are altered in the direction that increases y for positive information and decreases it for negative information. If linear evaluation methods are used in a pattern recognition environment, then the learnable classes are those which are linearly separable in their feature spaces as defined, for example, by Nilsson [65]. Such classes are reasonably well understood and applicable in many domains (Fu [75]). However, many important features in a pattern cannot be recognized by these systems, as has been described by Minsky and Papert [69]. For example, they showed that the well known "perceptron" recognizer, which employs linear decision making, is not capable of distinguishing geometric properties such as "connectedness" and "parity".
Learning Signature Tables

Because of the limitations of linear methods, Samuel [67] developed a decision making scheme based on sequential table lookup as shown in Figure 1. The input values x1, ..., xn are used to obtain output values from the lowest level table; these output values become inputs to the next level, and so forth, until a final function value is returned at the top level. Signature tables are capable of computing nonlinear functions and they are very fast in execution. The class of learnable functions has been characterized by Biermann, Fairfield, and Beres [82], and an optimal though expensive learning algorithm is known. The key insight needed for understanding signature tables comes from constructing a matrix of all the function values for each table in the system. The matrix for a table should have a row for each set of input values that feed that table and a column for each set of input values that do not feed that table. As an illustration, consider the table labeled A in Figure 1. Its associated matrix, shown in Figure 2, has a row for each assignment of values to (x1, x2). These are inputs that "feed" table A. It has a column for each possible vector (x3, x4, x5, x6). We note that this matrix has only two distinct rows; the first and last rows are identical, as are the second and third rows. This means that table A needs only two output values and that the first and last entries must be identical and the second and third entries must be identical. Thus this matrix shows that the entries (0, 1, 1, 0) must be made into the output column of table A. (Actually, (1, 0, 0, 1) would also be satisfactory.) Similarly, all other output columns of all other tables can be derived from their associated matrices, so one has a synthesis methodology for such systems.
Figure 1. A signature table system (diagram not reproduced).

Figure 2. The matrix associated with table A of Figure 1 (diagram not reproduced).
The synthesis methodology begins with a signature table system like the one shown in Figure 1, but with the output values for the tables unknown. A matrix is constructed for each table in the system using the function to be realized, as described above, and the associated table outputs are derived. The resulting signature table system will correctly compute the target function. Samuel, however, was not able to use this learning scheme in the checker playing application because the size of the matrices would be too large and not all entries were known. His method amounted roughly to counting the number D of 0's in a row and the number A of 1's in that row and computing a coefficient C = (A - D) / (A + D). Then rows which had similar C's were given the same output values in the signature tables. Thus, as explained in Biermann et al. [82], Samuel identified rows with similar weights whereas the ideal solution identifies rows with similar or identical profiles. His system thus made errors proportional to the degree of variation of his method from the ideal. An analysis of his methodology and a suggestion for its improvement appears in Biermann et al. [82]. Signature tables have been used successfully in many applications in addition to game playing (see also Truscott [79] and Smith [73]), such as medical decision making (Page [77]) and operating systems (Mamrak and Amer [78]).
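The row-profile construction at the heart of the synthesis methodology is easy to sketch. The following Python fragment is illustrative only: the target function, the choice of which inputs feed "table A", and all names are assumptions of the example.

    # Sketch of the matrix construction: rows are the input pairs feeding
    # a table; rows with identical profiles can share one output value.
    from itertools import product

    def f(x):  # an arbitrary illustrative Boolean function of six inputs
        x1, x2, x3, x4, x5, x6 = x
        return (x1 ^ x2) & (x3 | x4) | (x5 & x6)

    rows = {}
    for x1, x2 in product((0, 1), repeat=2):          # inputs feeding table A
        profile = tuple(f((x1, x2) + rest)
                        for rest in product((0, 1), repeat=4))
        rows[(x1, x2)] = profile

    # Assign one table-A output value per distinct row profile.
    outputs = {}
    for feed, profile in sorted(rows.items()):
        outputs.setdefault(profile, len(outputs))
    table_A = {feed: outputs[rows[feed]] for feed in rows}
    print(table_A)   # {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

For this particular function the matrix has only two distinct rows, so table A needs only two output values, exactly as in the (0, 1, 1, 0) example above.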
Learning Conjunctive and Disjunctive Normal Form Boolean Expressions

Valiant [84] has developed a series of algorithms for constructing normal form Boolean expressions from examples of target behavior. One class that was solved is the set of k-conjunctive normal form expressions, which are made up of products of unions of not more than k input variables (some of which may be negated). Thus y = x1(x2 + x4)(x2 + x̄3) is a 2-conjunctive normal form since no more than two variables appear in any single conjunct. The output y can be computed from the inputs using the usual Boolean conventions so that, for example, (x1, x2, x3, x4) = (1, 1, 1, 1) yields y = 1 and (x1, x2, x3, x4) = (1, 0, 1, 1) yields y = 0. Valiant has given a strategy for learning such expressions from positive examples only (where y = 1). One begins with the k-conjunctive normal form which includes all possible k-conjuncts, and then as each positive example behavior is encountered, those conjuncts which do not cover that example are
deleted. This process will be illustrated in the learning of a 2-conjunctive normal form when there are three possible inputs x1, x2, and x3. The initial expression contains all possible 2-conjuncts. The over-bar notation is used to indicate negation.

y = x1 x̄1 x2 x̄2 x3 x̄3 (x1 + x2)(x1 + x̄2)(x̄1 + x2)(x̄1 + x̄2)(x1 + x3) ...... (x̄2 + x̄3)

Suppose a function is to be learned and the following positive example has been received: y = 1 when (x1, x2, x3) = (1, 1, 0). Then all conjuncts which yield 0 on this input are removed from the initial expression for y. That is, x3, x̄1, x̄2, (x̄1 + x̄2), etc. are removed, leaving the following expression.

y = x1 x2 x̄3 (x1 + x2)(x1 + x̄2)(x̄1 + x2)(x1 + x3) ...... (x2 + x3)

If a second positive example is presented, say y = 1 if (x1, x2, x3) = (0, 0, 0), then the expression would be simplified further.

y = x̄3 (x1 + x̄2)(x̄1 + x2) ...... (x̄2 + x̄3)

Clearly a sequence of such positive examples will quickly lead to a final expression, if one exists, capable of computing the target function. Valiant [84] uses a probabilistic model for selection of examples and defines a function to be learned when the probability of error on positive examples is less than 1/h, where h is an arbitrary value. He has shown that, using his model, the k-conjunctive normal form expressions are learnable with a polynomial number of positive examples and in polynomial time in the parameters h and k. He has also developed similar results on the monotone disjunctive normal form expressions (where negation is not allowed) and on other classes of Boolean expressions.
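The elimination strategy itself is only a few lines of code. The following Python sketch is an illustrative rendering (the clause encoding, with literals as (variable-index, sign) pairs, is an assumption of the example, not Valiant's notation):

    # Sketch of k-CNF learning by elimination: start with every clause of
    # at most k literals, delete those falsified by a positive example.
    from itertools import combinations

    def all_k_clauses(n, k):
        lits = [(i, True) for i in range(n)] + [(i, False) for i in range(n)]
        clauses = set()
        for size in range(1, k + 1):
            for c in combinations(lits, size):
                if len({v for v, _ in c}) == size:   # no repeated variable
                    clauses.add(frozenset(c))
        return clauses

    def satisfied(clause, example):
        return any(example[v] == sign for v, sign in clause)

    def learn(positive_examples, n, k):
        hypothesis = all_k_clauses(n, k)
        for ex in positive_examples:
            hypothesis = {c for c in hypothesis if satisfied(c, ex)}
        return hypothesis

    # Positive examples of y = 1 for inputs (x1, x2, x3), as above:
    h = learn([(1, 1, 0), (0, 0, 0)], n=3, k=2)
    # The surviving clauses are those consistent with both examples,
    # e.g. the negated third variable (x̄3):
    print(frozenset([(2, False)]) in h)   # True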
Learning Michalski Expressions

Michalski has developed a methodology for inducing generalizations from instances of scientific data and thus producing theories from observations. This methodology has been widely applied to medical and agricultural problems with considerable success (Michalski [80]). The methodology begins by coding specific observational data into symbolic form and then performing generalizations on the basic data until one or more theories can be induced. For example, in a particular application a biological cell of a known type was described as follows:

∃ CELL1, B1, B2, ..., B6 [contains (CELL1, B1, B2, ..., B6)] [circ (CELL1) = 8] [pplasm (CELL1) = A] [shape (B1) = ellipse] [texture (B1) = strips] [shape (B2) = circle] etc.

This statement asserts that there is a cell containing objects B1, B2, ..., B6 and it enumerates various properties of the cell and these six objects. Many other cells of the known type were similarly coded, and many cells outside of this class were coded. The task of the system in such problems is to find the properties or combination of properties needed to distinguish members of the known type from other members. It is often easy to find a way to distinguish one class from another. One way is to store all the source data and compare each unknown object with the set of known objects. The primary objection to this strategy is that it does not lead to understanding. It is much more desirable to know in simple terms what differentiates one class from another and then use these defining properties. The goal of the Michalski system is to discover such simple defining properties. Michalski has thus introduced the concept of preference criteria, which enable the user of his system to limit the complexity of the generated theory. The user can set a series of weights which cause the system to bias its generated theories as desired along various complexity measures. The user can thus request that the number of operators, the cost of measuring the features, and other significant complexity factors be minimized. The knowledge base for the system comes from both the given data samples and from rules related to the domain. For example, in a domain dealing with shapes, the system might be told that n-sided figures for any n are called polygons. Such rules are important because they give the system the opportunity to simplify theories and to achieve the preference criteria given by the user.
There are many generalization rules used by the system to build theories. Two will be described here to give the flavor of the approach. One is called the dropping condition rule. It states that if both A and B are observed in members of a type, then perhaps A alone is enough to characterize the type. That is, suppose it is known that all observed basketball players had the characteristics of being both tall and handsome; perhaps only tallness is needed to differentiate basketball players from others. A second example generalization rule is the adding alternative rule, which states that if A is observed in all cases of a type, possibly the type is characterized by the condition (A or B). Thus, again supposing all observed basketball players have been tall, one could propose the hypothesis that all basketball players are either tall or strong. Generalization rules have the properties of increasing the number of cases covered and, when used in combination, decreasing the total complexity of the describing expressions. The task of the Michalski system is to find combinations of such operators which will reduce the original data (as described in the second paragraph of this section) to simple expressions which successfully separate the specified type from all other cases. The program does this with a complex combination of extensive searching and "hill climbing" on the preference criteria. For example, in the cell classification problem given above, the system found five different ways of separating the given type from the other cells. Each defining rule is clearly a tremendous simplification of the original given characteristics of the cells and provides rather helpful observations about the type being considered. The five theories are as follows:

1. ∃ (1) B [texture (B) = shaded] [weight (B) ≥ 3]
2. [circ = even]
3. ∃ (≥ 1) B [shape (B) = boat] [orient (B) = N ∨ NE]
4. ∃ (≥ 1) B [#tails-boat (B) = 1]
5. ∃ (1) S [shape (S) = circle] [#contains (S) = 1]

The first rule states that the cells of the given type differed from other cells in that they all contained an object of shaded texture and weight greater than or equal to 3. The second rule states that the observed cells of that type all had even circumference, and this differentiated them from the other cells. The other
three theories give equally interesting and concise information. The Michalski system thus provides a method for scientific investigators to reduce symbolic data and to search for generalizations which may help to understand it. The investigator first must find a way to encode the problem data into a descriptive form satisfactory for input to the program. Then the background knowledge and observational statements must be coded. Finally, it is necessary to specify the type of description desired and preference criteria to guide the program toward acceptable solutions. The induced generalization may help the scientist to understand his data better and may suggest further avenues for research.
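The dropping condition rule described above can be sketched as a simple search over descriptions. The following Python fragment is illustrative only (the attribute encoding and the basketball data echo the informal example in the text; nothing here is from the Michalski system itself):

    # Sketch of the "dropping condition" generalization rule: a description
    # is a set of attribute conditions; candidates drop one condition and
    # are kept only if they still cover all positives and no negatives.
    def covers(description, example):
        return all(example.get(attr) == val for attr, val in description)

    def drop_condition(description, positives, negatives):
        for cond in set(description):
            candidate = description - {cond}
            if all(covers(candidate, p) for p in positives) and \
               not any(covers(candidate, n) for n in negatives):
                yield candidate

    tall_handsome = frozenset([("tall", True), ("handsome", True)])
    positives = [{"tall": True, "handsome": True}]
    negatives = [{"tall": False, "handsome": True}]
    for g in drop_condition(tall_handsome, positives, negatives):
        print(sorted(g))
    # [('tall', True)] -- "handsome" alone would admit a negative example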
Learning Semantic Network Descriptions

The learned data structure might also be a semantic network instead of symbolic calculus expressions of the type described above. Although many variations on the idea exist (Findler [79]), the usual semantic network represents objects as nodes of a graph and relationships between those objects as directed arcs between the nodes. Minsky [68] and his students extensively explored the concept of the semantic network during the 1960's, and Winston showed how such structures can be synthesized from examples. An illustration of this type of learning is shown in Figure 3, where a representation of an arch is constructed at the top on the basis of one example. An arch is assumed to exist whenever two bricks support a third brick. However, a second example is given to the system showing that the supported object does not need to be a brick; it may be a wedge or perhaps some other object. The third example presents negative information that can be used to derive a necessary condition for an arch: the supporting bricks may not touch. The Winston system works by merging the semantic nets from all positive examples and applying information from negative examples to determine minimum conditions for the concept. This work is important because it is one of the few learning mechanisms ever aimed at the construction of semantic nets. One of the significant results was Winston's discovery of the importance of having negative information in available examples to prevent overgeneralization in the learning process.
Figure 3. Building a semantic net to represent the concept of an arch (diagrams not reproduced; Examples 1 and 2 are arches, Example 3 is not an arch).
III. LEARNING GRAMMARS

Introduction

In contrast to the above learning systems, a learning machine may be required to classify data of unbounded length. Thus a system may receive strings of symbols of arbitrary length and have the task of classifying those strings as being in or not in a specified type. Since grammars are commonly used for classifying strings, it is reasonable to study the problem of inferring or constructing grammars from examples. As an illustration, suppose the set of strings

A
BAA
ABABA
AAA
BBA
ABAA

is known to be selected from a specific class. The question arises as to what general rule may characterize the strings in the class. One could make many hypotheses on the basis of these few examples, but a reasonable guess might be that every string ends in an A. A grammatical inference system would specify its guessed rule for the class by giving a grammar. The grammar for the set of strings ending in A is as follows, where v is a nonterminal symbol and upper case letters are terminals.

v → Av
v → Bv
v → A

If one has a successful grammatical inference system, it can find a grammar that represents the set to be classified and ever after use that grammar to correctly classify strings even if they have not previously been observed. Thus if a system has correctly discovered the above grammar, it could accurately classify such strings as ABAAAABA and ABAAAB as, respectively, in and not in the target set. The learned grammar can be thought of as either a recognizer of strings or a theory of the given data.
The grammatical inference model to be studied here assumes the existence of an information source and an inference machine.
The information source selects a language L from a known class C of
languages and presents examples which may be in L or not in L to the inference machine. At each time t = 1, 2, 3, ..., the information source presents a string which is marked "+" if the string is in L and "-" otherwise. Information sources may be of two kinds: positive information sources, which are organized so that every string in L appears at least once in the sequence, and complete information sources, which produce every possible positive or negative example at least once in the sequence. At each time t = 1, 2, 3, ..., the inference machine uses all information gathered to that time to make a guess at a grammar for the language L. The inference machine knows which class C the unknown language belongs to and must select a grammar for a member of this class. Using this model, there are many possible definitions of learnability, and three will be examined here: finite identification, identification in the limit, and strong approachability. It turns out that the complexity of the learnable grammar varies greatly depending on the definition of learnability used and on whether or not a complete information source with positive and negative examples is available. The next section will show what class of grammar can be learned under the various definitions of learnability, and the last section will give an example of a grammar inference algorithm.
Finite Identification

The first definition of learnability to be examined here is finite identification. With this definition, it is required that after only a finite number of samples from the information source, the inference machine identifies the unknown language correctly and announces that it has done so. Suppose the class C is made up of the three languages L1, L2, and L3, which are enumerated as follows:

L1 = {A}
L2 = {AB, AAB}
L3 = {AB, AAB, AAAB}
This is one of the easiest learning problems imaginable since the inference machine needs only to see a few examples of the given languages to distinguish which is being presented. Thus if the information source presents the example A+, the machine will know that L1 is correct and print the grammar {v → A}. If the information source presents the example AB+, then either L2 or L3 will be correct, but additional information is needed to make the selection. If AAAB+ comes from the information source, it will be possible to select L3. However, suppose the information source presents positive information only and presents the following sequence: AB+, AB+, AB+, AAB+, AB+, AAB+, .... One might suspect that L2 is being presented, but one cannot be sure. It may be that AAAB will appear as the billionth string in the sequence and that L3 is the correct answer. There is no way to prove that L2 is the answer because one can never be sure that AAAB will not appear later. The conclusion is that even this simple class is not learnable using positive information only. There exists a member L2 of the class which cannot be distinguished from other members from positive information only. In this case, the inference machine cannot at any time select L2 and announce that it has correctly identified the unknown. From the example, one can conclude the following: a finite class of finite sets is not in general finitely identifiable from a positive information source. On the other hand, if this class C is to be approached using complete information, both positive and negative, then any member can be finitely identified. Consider the following sequence from such an information source for L2: A-, AB+, B-, AA-, AAB+, ..., AAAB-, .... Since a complete information source will include every possible string somewhere, the key string AAAB- will occur, and when it does, the inference machine will be able to announce (with a proof) that L2 is the correct choice and print the grammar {v → AB, v → AAB}. It is not predictable when the key string will occur, but it is known that it will appear somewhere. A generalization of this argument leads to the result that a finite class of finite sets is finitely identifiable from a complete information source.
If C is the class of all finite sets, the problem of learning is much more difScult. Even with a complete information source, it is not possible to discover which finite set is to be selected at any given point in time because later samples may always produce unpredicted behaviors. Thus one can conclude that the
class of finite sets is not finitely identifiable from a complete (or positive) information source. These results are summarized in the chart given below. An X appears in the entry where learnability was achieved. It is somewhat surprising that despite the simplicity of the problems being examined, only one positive result was obtained. Evidently the definition of learnability is so strict that only the most trivial learning problems can be solved. Another notable observation is that a complete information source is substantially more powerful than a positive-only source. This effect will be seen more dramatically in later sections.
                              Complete Information    Positive Information
The Finite Sets               Not learnable           Not learnable
Finite Class of Finite Sets   X                       Not learnable

Figure 4. Learnability summary for finite identification.
Identification in the Limit

There are numerous examples of learning in nature, such as the learning of natural language by children. Yet our discussion above showed that only the most trivial things can be learned if finite identification is required. In this section, the requirement that the system announce its final answer after a finite amount of time will be removed. Learning by identification in the limit is achieved if the system correctly guesses the right answer at each time after some T0, but T0 is not known. In other words the learning system may guess the same answer for millions of consecutive times without being sure it has the correct answer. If unexpected data appears at any time, the system can modify its guess and hold that theory for an arbitrary length of time. The fact that the system is never required to announce a final
answer greatly increases the number of things that can be learned. The system is required to make a guess at some point that will never be changed again as new information arrives, but it will never be sure it has achieved the final answer. Many types of language can be identified in the limit. Consider, for example, the class C of all finite sets and assume that a positive information source is available. Assume the inference machine uses the strategy of guessing that the unknown language is made up of exactly the strings seen so far. Thus if the strings A+ and B+ have been seen, the guessed grammar would be {v → A, v → B}. It is easy to see that this system will identify each language L in C in the limit because every string in the unknown L must appear at some time. So L will be guessed after all its strings have been observed and the system will never change its guess. However, the system will not know that it has seen every member, so it will not be able to announce that it has a final answer. We conclude that the finite sets can be identified in the limit from positive information only, a result that is much stronger than was possible in the previous section.

However, if the above problem is made slightly more difficult, it is no longer possible to identify in the limit. Let C be the class of all finite languages plus the infinite language L0 that contains all possible strings. There exists an information sequence for L0 which has the property that the inference system will never select L0 and remain with it permanently. The inference system will change its guess repeatedly and without end. The pathological information sequence is designed as follows: a finite set L1 of strings is presented repeatedly until the system selects L1; then additional strings are presented until it selects L0; then those finitely many strings are repeated until the system selects that finite set as its guess, call it L2; then additional strings are presented until it selects L0; and so forth. Such an information sequence forces the inference system to change its guess an infinite number of times and thus violate the definition of identifiability in the limit. A class of languages containing the finite sets and one infinite language is called super finite. The current conclusion is that the super finite classes are not identifiable in the limit from positive information only.
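Both behaviors are easy to reproduce in a few lines. The sketch below (in Python; the representation of the source as a list is an illustrative assumption) implements the guess-the-strings-seen-so-far strategy; on any presentation of a finite language the guesses converge, while a pathological sequence of the kind just described keeps changing the guess forever:

# A minimal sketch of identification in the limit from positive data:
# always guess that the language is exactly the set of strings seen so far.
def guesses(positive_source):
    seen = set()
    for string in positive_source:
        seen.add(string)
        yield frozenset(seen)            # the current guess

# For the finite language {AB, AAB} the guess stabilizes after both
# strings have appeared, though the machine can never announce that.
for guess in guesses(["AB", "AB", "AAB", "AB", "AAB"]):
    print(sorted(guess))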
If complete information is available, many classes of languages are identifiable in the limit. Consider any class of decidable rewriting systems. Such classes C must have these properties:
(1) C must be enumerable. That is, there must be an effective way to list the grammars G1, G2, G3, ... for all the languages in C.

(2) C must be decidable. That is, if Gi is a grammar for a language in C, there must be a way to decide whether Gi generates a given string.
Examples of classes which are decidable rewriting systems are the regular, context-free, and context-sensitive languages. One can show that classes of decidable rewriting systems can be identified in the limit from complete information sources. The inference system simply chooses the first grammar in the enumeration which can generate all the known strings marked "+" and none of the known strings marked "-". Let Gi be the first grammar in the enumeration for the target language to be learned. Such a Gi must appear in the enumeration by the definition of decidable rewriting systems. Since every predecessor of Gi in the enumeration will differ from Gi on some string, that predecessor will be eliminated from consideration when that string appears in the information source. So Gi will be selected after all its predecessors are shown to be inadequate, and the inference system will never again change its guess. Finally, one can prove that the recursively enumerable sets are not identifiable in the limit from complete information. All of these results are proven by Gold [67] and are summarized in Figure 5.
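The identification-by-enumeration strategy can be written down directly. In the sketch below (Python), each "grammar" is represented simply by a decidable membership test listed in a fixed enumeration, which is exactly what the definition of a decidable rewriting system guarantees; the three toy grammars are assumptions for illustration:

# A minimal sketch of Gold-style identification by enumeration.
enumeration = [
    ("G1: nonempty strings of A's", lambda s: s != "" and set(s) <= {"A"}),
    ("G2: strings of even length",  lambda s: len(s) % 2 == 0),
    ("G3: strings starting with B", lambda s: s.startswith("B")),
]

def identify_in_limit(information_source):
    facts = []                            # (string, is_member) pairs so far
    for string, label in information_source:
        facts.append((string, label == "+"))
        # Guess the first grammar consistent with every known fact.
        for name, accepts in enumeration:
            if all(accepts(s) == member for s, member in facts):
                yield name
                break

# The fact BB+ eliminates G1; afterwards the guess G2 is never changed.
for guess in identify_in_limit([("AA", "+"), ("BB", "+"), ("A", "-")]):
    print(guess)                          # -> G1, G2, G2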
                              Complete Information    Positive Information
Recursively Enumerable Sets   Not learnable           Not learnable
Decidable Rewriting Systems   X                       Not learnable
Super Finite Sets             X                       Not learnable
The Finite Sets               X                       X
Finite Class of Finite Sets   X                       X

Figure 5. Learnability summary for identification in the limit.
Strong Approachability

Feldman [72] has given a weaker definition of learnability called strong approachability. This definition requires that

(1) for every string y in the target language there is a time after which every guessed language includes y,

(2) for each grammar which does not generate the target language there is a time after which it will not be selected, and

(3) there is a correct grammar which will be guessed an infinite number of (possibly nonconsecutive) times.
This definition is sufficiently weak that the system could select the wrong grammar most of the time and still be said to have learned. It does, however, include the essential elements of convergence. Feldman showed that the recursively enumerable sets are strongly approachable from positive information. A summary of all three levels of learnability and the associated results appears in Figure 6.
                              Strong            Identifiability   Finite
                              Approachability   in the Limit      Identifiability
Recursively Enumerable Sets   C, P              -                 -
Decidable Rewriting Systems   C, P              C                 -
Super Finite Sets             C, P              C                 -
The Finite Sets               C, P              C, P              -
Finite Class of Finite Sets   C, P              C, P              C

(C marks learnability from a complete information source, P from a positive one.)

Figure 6. Results summary for three levels of learnability.
An Algorithm for Grammatical Inference

Few practical algorithms for grammatical inference have appeared over the past decades. Even though from a theoretical point of view many classes can be identified in the limit, most algorithms are so combinatorial that they cannot ordinarily be used. Two methodologies were developed with some capabilities to deal with finite state languages (Biermann and Feldman [72]) and context-free languages (Wharton [77]). The first of these methods will be described here. Suppose the following set of strings is known to consist of samples from a finite state language and the task is to find its grammar: A, AA, BA, BB, AAA, ABA, ABB, BAA, BBA. Then one can construct the behavior tree shown in Figure 7. Each string is indicated on the tree by a node found by tracing the tree down from the top, taking a left branch for each A and a right branch for each B. Next a finite automaton is built from this tree. The methodology involves selection of an integer k and construction of all subtrees found in Figure 7 of depth k. Choosing k = 1 in this example yields five types of subtrees as indicated in the figure, one corresponding to each of these sets of strings, where Λ stands for the string of length 0:
1. {A}    2. {Λ, A}    3. {A, B}    4. {Λ}    5. {}
These five subtrees then become states for a finite state machine as shown in Figure 8. Then a transition labeled x is placed in the finite state machine from state si to state sj whenever the subtree in Figure 7 corresponding to si has a transition labeled x to the subtree corresponding to sj. The set of all such transitions is shown in Figure 8. The initial state of the automaton corresponds to the top subtree in Figure 7. All states with Λ in their corresponding subtrees are labeled final states. The grammar can be constructed from the automaton by adding, for each transition from si to sj on input x, a rule vi → x vj to the grammar (plus a rule vi → x if sj is final). The resulting grammar in this example is given below.
Figure 7. The behavior graph for the unknown finite state language.

Figure 8. A finite state acceptor for the unknown language.
v1 → A v2     v1 → A      v1 → B v3
v2 → A v2     v2 → A      v2 → B v3
v2 → A v4     v2 → B v5
v3 → A v2     v3 → A      v3 → B v2
v3 → B        v3 → A v4   v3 → B v4

There are clearly redundant rules in this construction, but the current concern is how to build the grammar. The resulting grammar depends on the size of k. If k is large, the inferred language will be small and may even be finite. If k is small, the inferred language will be large. Biermann and Feldman [72] give a method of converging on the correct value of k. It adjusts k to obtain perfect behavior on all "short" strings.
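The whole construction is easy to mechanize. The sketch below (in Python; the function and variable names are ad hoc) computes, for each prefix of the sample, the set of continuations of length at most k that lead to an accepted string; the distinct tail sets become the states, and reading the grammar rules off the transitions proceeds exactly as above. It is a simplified illustration of the idea, not the algorithm of Biermann and Feldman [72] itself:

import itertools

# k-tails inference for the sample of the running example, with k = 1.
sample = {"A", "AA", "BA", "BB", "AAA", "ABA", "ABB", "BAA", "BBA"}
ALPHABET, K = "AB", 1

def tails(prefix):
    """Continuations of length <= K leading from prefix to a sample string."""
    words = ("".join(p) for n in range(K + 1)
             for p in itertools.product(ALPHABET, repeat=n))
    return frozenset(w for w in words if prefix + w in sample)

# Every prefix of a sample string is a node of the behavior tree.
prefixes = {s[:i] for s in sample for i in range(len(s) + 1)}

# Merging tree nodes with equal tail sets may yield a nondeterministic
# machine, which is why the grammar above contains redundant rules.
transitions = {}
for p in prefixes:
    for c in ALPHABET:
        transitions.setdefault((tails(p), c), set()).add(tails(p + c))

states = ({tails(p) for p in prefixes}
          | {q for qs in transitions.values() for q in qs})
finals = {q for q in states if "" in q}
print(len(states), "states;", len(finals), "final states")   # -> 5 states; 2 final states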
Conclusion

Research in grammatical inference has provided a mathematical model of learning and inference behaviors with definitions of learnability, convergence theorems, and many results concerning the learnability or lack thereof of various classes of behavior (see Angluin and Smith [83]). Thus the field has tremendous theoretical importance in that it provides models and tools that can be applied in a variety of situations.
On the other hand, the field has led to few practical methods because of the astronomical computations involved in most of the algorithms. The reason for this high cost is that example strings from a language provide no information about how the recognition computation is done. Thus the inference algorithms are reduced to enumerative methods for finding grammars. It was eventually realized that additional information about how to do computations would be needed if computational mechanisms such as grammars were to be synthesized. Later research thus tended to focus on the synthesis of programs, where the inference environment often provides trace information that substantially aids in the synthesis. The following three sections describe approaches to the inference of programs where trace information can be used in the synthesis.
IV INFERRING PROGRAMS FROM COMPUTATION TRACES
Introduction: The Trainable Turing Machine

In many environments, trace information is available showing how a computation is done. It is not necessary to learn the grammar or program for doing a computation from only input-output behaviors.
The trainable Turing machine described by Biermann [72] provides an example of this. This machine has a
training mode in which the user can push the read-write head up and down the tape and indicate the desired computation by doing examples by hand. Then it has a computing mode in which the system acts like a normal Turing machine and uses a finite-state controller which was automatically synthesized on the basis of the hand examples. One can show that any Turing machine can be synthesized on the basis of such examples and that relatively few examples are needed in many practical situations. However, the computation cost for automatically synthesizing the finite-state controller can be high.
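To fix ideas, the computing mode can be pictured as an ordinary table-driven simulator. The sketch below (Python; the tape representation and names are assumptions made for illustration) runs a finite-state controller of the kind the training mode is meant to produce, i.e. a table mapping (state, symbol read) to (symbol printed, move, next state):

# A minimal sketch of the computing mode of a trainable Turing machine.
BLANK = " "

def run(controller, tape, state=1, pos=0, max_steps=1000):
    cells = dict(enumerate(tape))
    for _ in range(max_steps):
        symbol = cells.get(pos, BLANK)
        if (state, symbol) not in controller:     # no transition: halt
            break
        printed, move, state = controller[(state, symbol)]
        cells[pos] = printed
        pos += 1 if move == "R" else -1
    return "".join(cells.get(i, BLANK)
                   for i in range(min(cells), max(cells) + 1)).strip()

# A toy controller that rewrites every A and B to B, moving right.
toy = {(1, "A"): ("B", "R", 1), (1, "B"): ("B", "R", 1)}
print(run(toy, "ABAB"))                           # -> BBBB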
The Flowchart Generation Methodology

One can see how to automatically generate a Turing machine from example computations by studying an example. Suppose it is desired to sort a sequence of A's and B's on a Turing machine tape so that all the A's precede the B's. The method will be to move the head of the Turing machine right until the first A is found. Then the head will move to the beginning of the tape and place the newly found A. Next it moves right looking for a second A, which is moved left to be adjacent to the first A, and so forth. An
example of this computation appears below, with the Turing machine head position indicated by brackets around the scanned cell (a scanned blank is shown as [ ]).
[B]AA
B[A]A
[B]BA
[ ]BBA
[B]BA
A[B]A
AB[A]
A[B]B
[A]BB
A[B]B
AA[B]
AAB[ ]

The task is to automatically create a Turing machine that will do this calculation and, hopefully, all other "similar" calculations. A notation is needed to represent a single head operation. Triples will be used that give, respectively, the symbol read from the tape, the symbol printed, and the subsequent head movement, left or right. Thus the triple ABR means that an A was read, a B was printed, and the head then moved right. The above twelve steps thus correspond to the following twelve head movements:

BBR  ABL  BBL  (blank)(blank)R  BAR  BBR  ABL  BBL  AAR  BAR  BBR  (halt)

A finite state controller for the Turing machine is needed which will direct these head movements. The construction of the finite-state controller is shown in Figure 9. Initially the Turing machine has only one state and no transitions, as in Figure 9(a). But the first head movement in the computation is BBR, so a transition is added to account for it in (b). This means that if the machine is in state 1 and reads a B, it will print a B, move right, and go to state 1. The second desired movement is ABL, which could also involve a transition from state 1 to state 1. Unfortunately the third step is BBL, which contradicts the first step. It is not possible to be in state 1 and expect a B input to yield a move left because a transition already exists that directs a B input to yield a move right. Therefore the ABL transition must go from state 1 to state 2 (see (c)). The BBL transition can then proceed from state 2 to anywhere. (It is directed to state 1 unless that fails, state 2 unless that fails, etc.) The BBL transition is directed to state 1 as shown in (d). The fourth head movement is (blank)(blank)R, which cannot go to states 1 or 2 because it is followed by a BAR step. Both states 1 and 2 have contradictory actions on a B input, BBR and BBL. So the fourth movement is indicated on a transition from state 1 to state 3 (see (e)). Continuing this series of arguments, the final flow is completed in Figure 9(f). This construction can be automated and is guaranteed to produce a Turing machine capable of executing the given example. The interesting point is that this Turing machine will sort any tape of A's and B's no matter their order or the length of the tape. The basic algorithm is given in Biermann [72] and a greatly refined version appears in Biermann et al. [75]. Once this methodology was discovered, it was applied to numerous problems. Biermann and Krishnaswamy [76] built a trainable desk calculator that was driven by a light pen at a display terminal. Waterman et al. [84] used the idea in the construction of an adaptive programmer's helper for computing systems. Fink and Biermann [86] used the technique to automatically construct dialogue models from
159
(a)
@ BBR
(b) 8BR
('~
BBR
(d) - ~ BBR
SBL
(e) (~1
(h)
~BR
BBL
BBR
BIIL
O A B L Q -~-Rf~
~xx:SC:~
BBR
I~BL
Figure 9.. Constructing a Turing machine controller.
160
human-machine conversations. The procedure appears to be a fundamental mechanism for procedure acquisition which will have continuing importance in the coming years.
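The state-by-state argument carried out above can itself be automated as a small backtracking search. The sketch below (Python) assigns a successor state to each step of the trace, trying existing states first and creating a new one when every choice conflicts; it reproduces the spirit of the construction in Figure 9, though it is a simplified illustration rather than the refined algorithm of Biermann et al. [75], and the controller it finds may number states differently:

# Synthesize a finite-state controller from a trace of
# (read, print, move) triples by depth-first search with backtracking.
def synthesize(trace):
    def extend(trans, state, i, n_states):
        if i == len(trace):
            return trans                        # every step accounted for
        read, printed, move = trace[i]
        key = (state, read)
        if key in trans:
            p, m, nxt = trans[key]
            if (p, m) != (printed, move):
                return None                     # contradiction: backtrack
            return extend(trans, nxt, i + 1, n_states)
        for nxt in range(1, n_states + 2):      # old states first, then a new one
            trans[key] = (printed, move, nxt)
            result = extend(trans, nxt, i + 1, max(n_states, nxt))
            if result is not None:
                return result
            del trans[key]
        return None
    return extend({}, 1, 0, 1)

BLANK = " "
trace = [("B","B","R"), ("A","B","L"), ("B","B","L"), (BLANK,BLANK,"R"),
         ("B","A","R"), ("B","B","R"), ("A","B","L"), ("B","B","L"),
         ("A","A","R"), ("B","A","R"), ("B","B","R")]
for (state, read), action in sorted(synthesize(trace).items()):
    print(state, repr(read), "->", action)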
V CONSTRUCTING LISP PROGRAMS FROM EXAMPLE INPUT-OUTPUT BEHAVIORS

Introduction

During the 1970's, a number of researchers examined the problem of synthesizing LISP code from examples. See, for examples, Biermann [78], Biermann and Smith [79], Hardy [75], Jouannaud and Kodratoff [79], Smith [84], and Summers [77]. Synthesis of LISP from examples is marginally feasible because the structures of the input and output lists yield substantial trace information. One of the synthesis methodologies will be described here: the synthesis of LISP programs from recurrence relations as developed by Summers [77]. Other methodologies are surveyed in Biermann et al. [84] and in Smith [84].
LISP Synthesis from Recurrence Relations

Suppose it is desired to create automatically a LISP program that will convert the input ((A B) (C D) (E F)) to (B D F). That is, the target program is to collect the second elements on a series of lists. The Summers methodology requires the user to display the loop pattern of the target program in a series of input-output examples:

NIL → NIL
((A B)) → (B)
((A B) (C D)) → (B D)
((A B) (C D) (E F)) → (B D F)

The synthesis of this program will be explained following the treatment of Smith [84]. The first step involves writing the outputs in terms of their respective inputs using the LISP car, cdr, and cons functions.

f1(x) = NIL
f2(x) = cons (cadar (x), NIL)
f3(x) = cons (cadar (x), cons (cadadr (x), NIL))
f4(x) = cons (cadar (x), cons (cadadr (x), cons (cadaddr (x), NIL)))
In fact, a program for achieving the observed examples is

F (x) = (cond (p1(x) f1(x))
              (p2(x) f2(x))
              (p3(x) f3(x))
              (p4(x) f4(x)))

where the pi's are predicates which select the correct fi to execute in each case. In fact, Summers gives a simple predicate generating algorithm which finds the pi's.
p1(x) = atom (x)
p2(x) = atom (cdr (x))
p3(x) = atom (cddr (x))
p4(x) = atom (cdddr (x))

Program synthesis then involves finding a way to roll the straight-line code for F into a loop. So the methodology tries to find a recurrence relation which relates each fi to previous fj's where j < i. In this example:
f1(x) = NIL
f2(x) = cons (cadar (x), f1(cdr (x)))
f3(x) = cons (cadar (x), f2(cdr (x)))
f4(x) = cons (cadar (x), f3(cdr (x)))

So the recurrence relation is easily seen to be

fi(x) = cons (cadar (x), fi-1(cdr (x)))

for i = 2, 3, 4. Similarly a recurrence can be found for the pi's:
pi(x) = pi-1(cdr (x))

for i = 2, 3, 4. The induction step then assumes that these recurrence relations hold for all i > 1 and applies the Summers Basic Synthesis Theorem: If p1(x), ..., pk(x) and f1(x), ..., fk(x) are given, and

pk+n(x) = pn(b(x))            for n ≥ 1
fk+n(x) = C (fn(b(x)), x)     for n ≥ 1

where b is a function of car's and cdr's and C is a cons structure that includes fn(b(x)) exactly once, then the function

F (x) = (cond (p1(x) f1(x))
              (p2(x) f2(x))
              ... )

can be computed by the following recursive program:

F (x) = (cond (p1(x) f1(x))
              ...
              (pk(x) fk(x))
              (T C (F(b(x)), x)))
In this example, b = cdr, k = 1, and C (F(b(x)), x) = cons (cadar (x), F (cdr (x))). So the synthesized program is

F (x) = (cond (atom (x) NIL)
              (T cons (cadar (x), F (cdr (x)))))
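For comparison, the synthesized program corresponds to the following Python transcription (an illustration only; Python lists stand in for LISP's cons cells):

# F(x) = (cond (atom (x) NIL) (T cons (cadar (x), F (cdr (x)))))
def F(x):
    if not x:                        # atom(x): the empty list plays NIL's role
        return []
    return [x[0][1]] + F(x[1:])      # cons(cadar(x), F(cdr(x)))

print(F([["A", "B"], ["C", "D"], ["E", "F"]]))    # -> ['B', 'D', 'F']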
In summary, the Summers synthesis methodology begins with a carefully constructed set of examples which illustrate the desired recursive computation. The pi's and fi's are constructed for the given
examples and then each pi and fi is written in terms of pi-k and fi-k. Recurrence relations are then derived and the synthesis theorem is applied to give the final program. This system is able to efficiently generate many useful single-loop programs. Biermann [78] applied the flowchart synthesis methodology described in the previous section to the LISP synthesis problem. This leads to an algorithm for the synthesis of regular LISP programs from examples. The regular LISP programs are analogous to finite state automata and allow arbitrarily complicated flow of control. This system does not require the user to carefully construct examples as was done by Summers, but it also requires more execution time for a synthesis.
VI SYNTHESIZING PROLOG PROGRAMS FROM EXAMPLES

Shapiro [83] has developed a methodology for synthesizing PROLOG programs from examples. A flowchart for the system appears in Figure 10. Its operation will be illustrated by showing how it constructs the program for the function member(X,Y), which yields true if X is a member of list Y and false otherwise. The system functions as follows. The user furnishes example facts illustrated at the left side of Figure 10. These facts include positive information showing desired behavior for the target program and negative information showing undesired behavior. Thus in the member example, a user might supply the facts "member (a,[a,b]) is true" and "member (c,[a,b]) is false". The system at all times maintains a PROLOG program as shown at the right and it continuously compares the current version of the PROLOG program with the known collection of user-supplied facts. If the current program is not satisfactory because of lack of correctness with respect to some fact, the program is modified either by adding new clauses from the generator at the top or by throwing away existing clauses. Normal operation of the system thus involves continuously debugging the existing PROLOG program with respect to the known facts. Three kinds of errors may occur: (1) The program may compute a result which is undesired, an incorrect answer. (2) The program may be unable to compute a desired answer. (3) The program may not terminate.
In the first kind of error, the system simulates the incorrect computation and continuously queries the user and the data base to check that each step is correct. When a PROLOG clause is found that computes an incorrect result from correct premises, that clause is removed from the program as indicated at the bottom of Figure 10. In the second type of error, the system again simulates the computation but this time it will fail because some needed result was not computed. This indicates an additional clause is required and the enumerator at the top is run until the needed clause is found. In the third type of error, the simulator halts after a prespecified limit on computation size has been exceeded and then the system searches for causes of the suspected loop. The system searches for places where the same computation state may be reentered more than once and it may also query the user concerning violations of a well-founded ordering needed for termination. The nontermination will be caused by some clause which computes an undesired result, and the debugging procedure will discover that clause and remove it. Shapiro experimented with many different types of clause generators and associated with each was a class of synthesizable programs. He also developed a scheme for improving efficiency by avoiding the generation of many clauses which are covered by other clauses previously shown to be inadequate. At the time of system invocation, the user is asked to furnish the names of predicates appropriate to the current problem and to indicate which predicates may appear on the right sides of PROLOG clauses. Proceeding with the synthesis of the member program, the user first indicates that "member (_,_)" is an appropriate predicate and that it can appear on the right hand side of rules. The system begins with the empty program and debugs it with respect to given facts. If the user supplies the fact "member (a,[a]) is true", the system will discover its current program is incomplete, an error of type (2) listed above. It will be assumed for the purposes of this treatment that the generator will create the following clauses in the order given.

member (X,Y) ← true
member (X,[X|Z]) ← true
member (X,[Y|Z]) ← member (X,Z)
member (X,Y) ← member (Y,X)
etc.
Figure 10. The Shapiro synthesis algorithm. (The flowchart shows a generator for all possible clauses, the collection of user-supplied facts, a PROLOG interpreter with a monitor comparing the program against the facts, and the two repair actions: get another clause when the program is incomplete, and drop the offending clause when it computes a wrong answer.)
The notation [X|Y] stands for the list with head X and tail Y. The first call to the generator would then yield the current program { member (X,Y) ← true }, which covers the given example. Next suppose the user provides the fact that "member (a,[b]) is false". Here the system would discover a type (1) error and discard the single clause in the current program. This means that member (a,[a]) is no longer handled, causing a type (2) error and another call to the generator:

{ member (X,[X|Z]) ← true }

This program satisfies both known facts. Again the user may supply a fact: "member (a,[b,a]) is true". So another type (2) error results in an additional clause generation and the final program:

{ member (X,[X|Z]) ← true,
  member (X,[Y|Z]) ← member (X,Z) }

Shapiro showed his system to be capable of generating a variety of programs and compared it to various other systems. For example, his system solved the following problem posed by Biermann [78]: construct a program to find the first elements of lists in a list of atoms and lists. Thus the program should be able to input [a,[b],c,[d],[e],f] and compute the result [b,d,e]. Shapiro's system needed 25 facts to solve this problem and constructed the following program after 38 seconds of computing:

{ heads ([ ],[ ]) ← true,
  heads ([[X|Y]|Z],[X|W]) ← heads (Z,W),
  heads ([X|Y],Z) ← atom (X), heads (Y,Z) }

Biermann's regular LISP synthesis system was able to create a solution for this problem using only the single example given above. However, its execution time was approximately one half hour.
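The overall generate-and-debug cycle can be suggested with a toy rendering. In the sketch below (Python), the meanings of the candidate clauses for the member example are hard-coded as functions, which sidesteps a real PROLOG interpreter; the system adds the next clause from the generator when a true fact is not covered, and drops an offending clause when a false fact becomes derivable. This is only an illustration of the flow in Figure 10, not Shapiro's algorithm:

# Toy generate-and-debug loop for the member example. Each clause is
# (name, applies); applies(x, y, rec) may call rec to stand for a
# body literal member(X, Z) resolved against the current program.
GENERATOR = [
    ("member(X,Y) <- true",            lambda x, y, rec: True),
    ("member(X,[X|Z]) <- true",        lambda x, y, rec: bool(y) and y[0] == x),
    ("member(X,[Y|Z]) <- member(X,Z)", lambda x, y, rec: bool(y) and rec(x, y[1:])),
]

def covers(program, x, y):
    rec = lambda a, b: covers(program, a, b)
    return any(applies(x, y, rec) for _, applies in program)

def debug(facts):
    program, stream, known = [], iter(GENERATOR), []
    for fact in facts:
        known.append(fact)
        while True:
            bad = next((f for f in known if covers(program, *f[0]) != f[1]), None)
            if bad is None:
                break                              # consistent with all facts
            (x, y), truth = bad
            if truth:                              # type (2): add a clause
                program.append(next(stream))
            else:                                  # type (1): drop the offender
                rec = lambda a, b: covers(program, a, b)
                for i, (_, applies) in enumerate(program):
                    if applies(x, y, rec):
                        del program[i]
                        break
    return [name for name, _ in program]

facts = [(("a", ["a"]), True), (("a", ["b"]), False), (("a", ["b", "a"]), True)]
print(debug(facts))    # -> the two clauses of the final member program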
VII. CONCLUSION

Computer science has historically required programmers of systems to anticipate every possible behavior that could be desired and to program in advance all the knowledge and mechanisms needed to achieve it. Unfortunately, it has been found that such extensive and explicit programming is expensive
and it still, in many cases, does not achieve the range of behaviors that might be needed. The only alternative is to have the machines program themselves to acquire the knowledge they need to function satisfactorily. This chapter has described many mechanisms for machine learning and provides an introduction to the field. Additional information can be found in the references and in the textbook on learning edited by Michalski et al. [83].

REFERENCES

[1] D. Angluin and C. Smith [1983], "Inductive inference: theory and methods", ACM Computing Surveys, Vol. 15.
[2] A. Biermann and J. Feldman [1972], "On the synthesis of finite-state machines from samples of their behavior", IEEE Trans. on Computers, Vol. C-21.
[3] A. Biermann [1972], "On the inference of Turing machines from sample computations", Artificial Intelligence, Vol. 3.
[4] A. Biermann, R. Baum, and F. Petry [1975], "Speeding up the synthesis of programs from traces", IEEE Trans. on Computers, Vol. C-24.
[5] A. Biermann and R. Krishnaswamy [1976], "Constructing programs from example computations", IEEE Trans. on Software Engineering, Vol. SE-2.
[6] A. Biermann [1978], "The inference of regular LISP programs from examples", IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-8.
[7] A. Biermann and D. Smith [1979], "A production rule mechanism for generating LISP code", IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-9.
[8] A. Biermann, J. Fairfield, and T. Beres [1982], "Signature table systems and learning", IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-12, No. 5.
[9] A. Biermann, G. Guiho, and Y. Kodratoff (Eds.) [1984], Automatic Program Construction Techniques, Macmillan Publishing Co., N.Y.
[10] J. Feldman [1972], "Some decidability results in grammatical inference", Information and Control, Vol. 20.
[11] N. Findler, Ed. [1979], Associative Networks, Academic Press, N.Y.
[12] P. Fink and A. Biermann [1986], "The correction of ill-formed input using history-based expectation with applications to speech understanding", to appear.
[13] K.S. Fu [1975], Syntactic Methods in Pattern Recognition, Academic Press, N.Y.
[14] M. Gold [1967], "Language identification in the limit", Information and Control, Vol. 10.
[15] S. Hardy [1975], "Synthesis of LISP programs from examples", Proc. Fourth International Joint Conf. on Artificial Intelligence.
[16] J.P. Jouannaud and Y. Kodratoff [1979], "Characterization of a class of functions synthesized from examples by a Summers-like method", Proc. Sixth International Joint Conference on Artificial Intelligence.
[17] S. Mamrak and P. Amer [1978], "Estimation of run times using signature table analysis", NBS Special Publication 500-14, Fourteenth Computer Performance Evaluation Users Group, Boston, Mass., Oct., 1978.
[18] R.S. Michalski [1980], "Pattern recognition as rule-guided inductive inference", IEEE Trans. on Pattern Analysis and Machine Intelligence.
[19] R.S. Michalski, J.G. Carbonell, T.M. Mitchell [1983], Machine Learning, Tioga Publishing Company.
[20] M. Minsky, Ed. [1968], Semantic Information Processing, M.I.T. Press, Cambridge, Mass.
[21] M. Minsky and S. Papert [1969], Perceptrons, M.I.T. Press, Cambridge, Mass.
[22] N. Nilsson [1965], Learning Machines, McGraw Hill.
[23] C. Page [1977], "Heuristics for signature table analysis as a pattern recognition technique", IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-7.
[24] A. Samuel [1959], "Some studies in machine learning using the game of checkers", IBM Journal of Research and Development.
[25] A. Samuel [1967], "Some studies in machine learning using the game of checkers, II", IBM Journal of Research and Development.
[26] E.Y. Shapiro [1983], Algorithmic Program Debugging, M.I.T. Press, Cambridge, Mass.
[27] D. Smith [1984], "The synthesis of LISP programs from examples: a survey", in A. Biermann, G. Guiho, Y. Kodratoff (Eds.), Automatic Program Construction Techniques, Macmillan Publishing Co., 1984.
[28] M. Smith [1973], "A learning program which plays partnership dominoes", Communications of the ACM, Vol. 16.
[29] P. Summers [1977], "A methodology for LISP program construction from examples", Journal of the ACM, Vol. 24.
[30] T. Truscott [1979], "The Duke checker program", Journal of Recreational Mathematics, Vol. 12.
[31] L. Valiant [1984], "A theory of the learnable", Communications of the ACM, Vol. 27.
[32] D. Waterman, W. Faught, P. Klahr, S. Rosenschein, and R. Wesson [1984], "Design issues for exemplary programming", in A. Biermann, G. Guiho, Y. Kodratoff (Eds.), Automatic Program Construction Techniques, Macmillan Publishing Co., N.Y. 1984.
[33] R. Wharton [1977], "Grammar enumeration and inference", Information and Control, Vol. 33.
METHODS OF AUTOMATED REASONING
A tutorial

Wolfgang Bibel
Technische Universität München
ABSTRACT

This chapter introduces various aspects and methods of the formalization and automation of the processes involved in performing inferences. It views automated inferencing as a machine-oriented simulation of human reasoning. In this sense classical deductive methods for first-order logic like resolution and the connection method are introduced as a derived form of natural deduction. The wide range of phenomena known as non-monotonic reasoning is represented by a spectrum of technical approaches ranging from the closed-world assumption for data bases to the various forms of circumscription. Meta-reasoning is treated as a particularly important technique for modeling many significant features of reasoning including self-reference. Various techniques of reasoning about uncertainty are presented that have become particularly important in knowledge-based systems applications. Many other methods and techniques (like reasoning with time involved) could only briefly - if at all - be mentioned.
CONTENTS

INTRODUCTION
1. NATURAL AND AUTOMATED DEDUCTION
2. NON-MONOTONIC REASONING
   2.1 A formalism for data bases
   2.2 Negation as failure
   2.3 Circumscription
   2.4 Inferential minimization with circumscription
   2.5 Other approaches to inferential minimization
3. META-REASONING
   3.1 Language and meta-language
   3.2 Application to default reasoning
   3.3 Self-reference
   3.4 Reasoning about knowledge and belief
   3.5 Expressing control on the meta-level
4. REASONING ABOUT UNCERTAINTY
   4.1 Bayesian inference
   4.2 The Dempster-Shafer theory of evidence
   4.3 Fuzzy logic
   4.4 Performance approaches
   4.5 Engineering approaches
5. SUMMARY AND CONCLUSIONS
REFERENCES
INTRODUCTION
No widely accepted definition of intelligence has ever been given that both accounts for our everyday use of this notion and at the same time yields a precise and formal notion. This is probably because intelligence is a fuzzy notion in everyday use, comprising various more precise notions at the same time which until now have not been elaborated. In this situation we have no choice other than continuing to talk about intelligence in an informal and intuitive way.
Taking such an intuitive view it seems that we associate with intelligence at least the following capabilities. A person without any knowledge would never be called intelligent; hence the capability to have a certain amount of knowledge at one's disposal is one fundamental aspect of intelligence. However, a person with all the entries in the Webster at immediate disposal, but this being the only capability, would still not be regarded as truly intelligent, because we also expect from an intelligent being the capability for solving problems in changing environments. For reasons that will be discussed in section 5 of this paper, problem solving is considered a special form of reasoning. With this understanding we thus can say that the capability of reasoning is another fundamental aspect of intelligence.
For instance, the speed with which the aforementioned
capabilities are performed certainly plays an important although complex role in our intuitive understanding. In view of the computer systems that we actually have in mind, we integrate this aspect into the previous two ones as technical issues. An impressive capability for learning of course is part of our understanding of intelligence as well. But learning can be understood as a sort of problem solving and thus is taken already into account by the previous two fundamental aspects. Further an intelligent being must have the capability for communication with others; but we might prefer to combine this capability with our understanding of intelligence only insofar as the inherent problem solving and reasoning processes are concerned. We will ngt continue this analysis any further but rather draw already at this point the conclusion that, under the views taken in our analysis so far, there are essentially two fundamental aspects of intelligence, one is knowledge and the other reasoning. Let us take this as a working thesis claiming that any further aspects somehow can be understood in terms of these two as we have just discussed. The focus of this tutorial is on techniques that lend themselves to an automatic treatment. In this context we prefer to substitute the notion of knowledge by that of a knowledge base and the notion of reasoning by that of inference as a matter of notational convention. In this restricted context our thesis translates into the architectural concept that any intelligent (computer) system may be viewed as basically consisting of two fundamental components, the knowledge base and the inference component. The two are not independent, of course, since inferencing has to take the representational structure of the knowledge base into account and
174
this structure in turn heavily influences the performance of inference. The topic of this paper, then, is a treatment of various important techniques in use for the one of these two components, viz. the inference component. Because of the interdependence just mentioned this will occasionally require some discussion of issues of knowledge representation as well which will be restricted to a m i n i m u m , however, since they are extensively discussed in the contribution by J . P . Delgrande and J. Mylopoulos within this volume.
Hence
our topic is a rather restricted one; none the less it is one of central importance for Artificial Intelligence (AI) as the previous discussion should have demonstrated. To some extent the distinction between knowledge and inference may be questioned, since inference may be regarded as a sort of knowledge as well.
Specifically, the capacity for
inferencing derives from a knowledge about how one can infer new knowledge from previous knowledge. U n d e r this view it might be classified as meta-knowledge that provides information about the relation of different pieces of object-level knowledge.
Never the less the special
way, how this particular kind of knowledge is used in knowledge-based systems, justifies the distinction we made. A confusingly rich variety of methods and techniques for inferencing is known today. It ranges from exact mathematical theorem proving to the speculative conclusions of a stockbroker in one dimension, and from the human forms to sophisticated machine versions in another. An exhaustive treatment is therefore beyond the limits available within this volume.
Yet we
make an attempt to provide the reader with a feel for most of the aspects of inferencing.
At
the same time we try to present the different approaches as far as possible in a uniform way. The paper begins in section 1 with classical first-order reasoning.
In particular, we present
this form of reasoning as a model of h u m a n mathematical reasoning following Gentzen with his calculus NK.
On this basis the question is pursued in some detail how a deduction may
be determined for a given formula.
This leads us to a more technical version of Gentzen's
calculus, from which we then derive the idea for the connection method in Automated Theorem Proving (ATP).
This way of treatment and selection of topics is meant to comple-
ment the chapter by Stickel in the same volume. Section 2 is devoted to non-monotonic reasoning which is characterized by the following two features.
O n the one hand a problem description is always to be seen in the context of addi-
tional common-sense knowledge that is assumed by default.
On the other hand the resulting
complete description is to be understood in some sense in a minimal way.
Our emphasis is on
this latter aspect. There are several techniques for achieving this kind of minimality.
We dis-
cuss the techniques of predicate completion in relational data bases and of negation-by-failure in P R O L O G in some detail.
T h e n we provide an introduction to the circumscription approach
along with an illustration of its use for common-sense reasoning.
Finally, we briefly review a
number of other approaches such as default reasoning and reasoning that tolerates inconsistencies in the knowledge base.
175
Some of these approaches to non-monotonic reasoning demonstrate the need for a distinction of the reasoning on the object level from the one on the meta-level (or meta-meta-level, etc.). This important topic is treated in section 3. In particular, we explain the distinction between these various levels of languages and explain how one can technically amalgamate more than one such level into a single one. This may then directly be applied to non-monotonic reasoning, to expressing control knowledge explicitly in a system (such as PROLOG), and to other important topics. We also present a recent solution that allows one to express self-reference within first-order logic, which has an important application in the reasoning about knowledge and belief. Other approaches to this latter area are briefly discussed as well.

The next major topic is reasoning about uncertainty, in section 4. This is particularly topical since many knowledge-based systems necessarily have to cope with the uncertainty of the available information. The most widely used approach to these phenomena is based on Bayes' theorem; but we also point out a number of problems that have been experienced with this method. One of these problems has initiated the development of a related approach based on the Dempster-Shafer theory of evidence. In addition to that we conclude the presentation of such probabilistic approaches with a discussion of fuzzy logic that has been developed from fuzzy set theory. As a contrast the section concludes with a review of non-probabilistic techniques to deal with uncertainty which might be regarded more in line with AI methodologies. For instance we mention the plausible reasoning technique used in the system Ponderosa, the technique based on the model of endorsement, engineering techniques and others.

In the final section we fill the remaining gaps of other forms of reasoning by briefly addressing some of them. This way we summarize the whole presentation. Its importance for many applications is pointed out. And finally we give a view of how a complex reasoning system comprising all these features might eventually look.
1. NATURAL DEDUCTION

Human beings draw inferences from what they know; that is, they explicate new pieces of knowledge from previous ones. For instance, if the original knowledge consists of the two sentences
K1: Socrates is a man
K2: All men are mortal

then by way of inference anyone would conclude that

K0: Socrates is mortal

although obviously this latter fact is not explicitly given with K1 and K2. This aspect of
human thinking has been observed and studied for more than 2000 years. In illustrating this phenomenon the way we did, we have already taken into account a basic assumption. Namely, we might originally think of inferences being drawn from pieces of knowledge that are not necessarily represented in language form. But our assumption is that an appropriate representation in some language can be used without affecting the nature of this phenomenon.

Formally we may think of human inferencing as of a relation between pieces of knowledge, like between K1 and K2 on one side and K0 on the other in the present example. Logicians usually denote this relation by the symbol ⊨. Since our discussion at this point still is concerned with the cognitive aspect of human inferencing, let us emphasize this aspect by adding the subscript h to this relation, i.e. ⊨h. Thus in our example the relation between the pieces of knowledge may be expressed by

K1, K2 ⊨h K0
On the basis of this analysis of the phenomenon of human inferencing we face the following fundamental problems in view of the topic of this paper.

1. How can we adequately represent knowledge in language form?
2. How can we define ⊨h so that it coincides with our experience?
3. How can we determine whether K ⊨h K' holds for any K and K'?
The first question addresses the relation of language and its meaning (or its semantics). It is a question that has kept many philosophers busy for at least the last hundred years. Today these issues are treated in the areas of natural language semantics and model theory. So this question is by far a non-trivial one. We will meet it again at several occasions later in the paper. In order to avoid its complications for the beginning, we rely on the solution offered by the language of first-order logic with its well-defined semantics until we discuss some of the complications in later parts of the paper.

The second question of course is intimately related to the first one and with it is again decidedly non-trivial. As long as we accept the traditional form of first-order logic, however, logic provides us with a well-defined solution that we denoted by ⊨ before. But we will later have to account for a number of complications as mentioned for the first question.
∧-I:  from A and B, infer A ∧ B
∧-E:  from A ∧ B, infer A;  from A ∧ B, infer B
∨-I:  from A, infer A ∨ B;  from B, infer A ∨ B
∨-E:  from A ∨ B, a derivation of C from assumption [A], and a derivation of C from assumption [B], infer C
∀-I:  from F, infer ∀c F
∀-E:  from ∀x F, infer F{x\t}
∃-I:  from F{x\t}, infer ∃x F
∃-E:  from ∃c F and a derivation of C from assumption [F], infer C
→-I:  from a derivation of B from assumption [A], infer A → B
→-E:  from A and A → B, infer B
¬-I:  from a derivation of F from assumption [A], infer ¬A
¬-E:  from a derivation of F from assumption [¬A], infer A
F-I:  from A and ¬A, infer F

Inferences according to ∀-I and ∃-E are subject to a condition on the variable c.

Figure 1. The inference figure schemes of the calculus NK, in linear notation.
In the present section we will now discuss the third question in some detail on the basis of first-order logic. In other words, we now assume that ⊨h is modeled by the classical entailment relation ⊨ of first-order logic. There knowledge is represented by first-order formulas. ⊨ is defined in terms of truth values in most logic texts. For the purpose of automation we would prefer a more syntactic characterization. In standard logic texts we find many such syntactic characterizations. One is due to G. Gentzen, which we are going to describe in the sequel.

With his calculus NK Gentzen tried to simulate the natural way of a mathematician's reasoning. In particular, he observed that such reasoning starts from assumptions (or from previously established results), and proceeds by a number of well-defined syntactic rules. He preferred to present these rules in a tree-like form as inference figure schemes. Figure 1 shows all the schemes that establish the entire calculus NK.

These schemes are to be understood in the following way. For each logical symbol there is a rule that introduces (I) and one that eliminates (E) it. For instance, consider the first rule ∧-I introducing the conjunction symbol ∧. It says, if there is a derivation for the formula A and one for B, then the derivation may be extended to establish the formula A ∧ B. As we may see there are two different versions of the rule ∧-E which eliminates a conjunction symbol. The same holds for the rule ∨-I. F{x\t} denotes the substitution of x by t in F.
1: ∃a ∀y Pay                  (assumption)
2: ∀y Pay                     (assumption)
   Pab                        (∀-E)
   ∃x Pxb                     (∃-I)
   ∀b ∃x Pxb                  (∀-I)
   ∀b ∃x Pxb                  (∃-E₂)
   ∃a ∀y Pay → ∀b ∃x Pxb      (→-I₁)

Figure 2. A derivation in the calculus NK.
As we mentioned, NK operates with assumptions that are stated at the beginning of a chain of reasoning. Some of the rules allow the transition of the premises to the conclusion only under certain assumptions. In the rules such assumptions are represented as formulas in brackets, like the formulas A and B in the scheme ∨-E. In detail this scheme says: if there is a derivation in NK of the formula A ∨ B, further a derivation of any formula C that is subject to an assumption A made initially in this derivation, and finally a derivation of the same formula C but now subject to an assumption B, then we may infer C, which then is not subject to both assumptions A and B any longer.
Figure 2 shows a derivation in NK that starts with two assumptions. If we interpret Pab, which is short for P(a,b), as "person b earns more than a deutschmarks", then assumption 1 states that there is a lower bound on the salaries in question, while assumption 2 states that a is such a lower bound. Under the assumption 2 the formula ∀b ∃x Pxb (along with its predecessors in the derivation) is derived in an initial branch of the derivation. In the next step of the derivation, however, the dependency on this assumption is eliminated by way of an instance of rule ∃-E. The reference to the particular assumption 2 is established in the figure by way of the index 2 added to the name of the rule. The same happens with assumption 1 in the final step of the derivation in an analogous way, so that the final formula of the derivation is not subject to any assumption.
such that, for all
(assumption 2). then
Pab
Pxb
holds.
T h e n , for all
y , Pay
holds (the step V-E). Since
b
there is a n x such that
y , Pay
holds (assumption 1).
holds; therefore, if b
T h u s there is a n
x , viz.
a
Let
a
be such a n
denotes a n arbitrary object, is such a n object, such that
was arbitrary, our result therefore holds for all objects, i.e. for all Pxb
holds ( ¥ - I and 3-E~).
a
b
This yields our assertion (-,-I1)-'
Without going into any further details we just note that there is a condition on the variable in two of the rules. quantified
b
I n our example derivation, for instance, this condition requires that the all-
in the formula resulting from the V-I inference must not occur in a n y assump-
tion on which this formula depends, i.e. it must not occur in assumption 2 in this case; this
179
provision guarantees that
b
is indeed arbitrary as the text says.
Similarly, the
assumption 1 must not occur in the formula resulting from the 3 - E inference. variables apparently play a different rote t h a n the ones like x distinguish t h e m with our notation that uses
a, b, ...
ers. As a final explanatory remark we m e n t i o n that
and
y
for the ones and
a
from
Since these
in the derivation, we x, y, ...
for the oth-
F denotes the logical constant 'false'.
The set of these rules defines a derivability relation among formulas that usually is denoted by ⊢. For instance, the end formula of the derivation from figure 2 is derivable in this sense, i.e. ⊢ ∃a ∀y Pay → ∀b ∃x Pxb. In general, ⊢ is a binary relation between a set of assumptions and a formula. In the present example the set of assumptions under which this formula is derivable is empty and thus not written explicitly. The formula ∀b ∃x Pxb is derivable only in the context of the assumption 2, thus we have ∀y Pay ⊢ ∀b ∃x Pxb, and so forth.

One can prove that ⊨ and ⊢ both define the same relation, a result which is known as the soundness and completeness theorem for the calculus NK. The →-rules in NK show that A ⊢ B holds iff (i.e. if and only if) ⊢ A → B holds (known as the deduction theorem), for which reason we may restrict our attention to the latter case with no assumptions, which simplifies the discussion of question 3 above.

As we already pointed out it was Gentzen's main intention with NK to introduce a calculus that closely simulates the natural way of human reasoning. In particular, NK contrasts with the so-called Hilbert-type calculi, which define the notion of derivability in a different way. These specify a set of formulas as axioms and allow only one rule of inference, viz. the rule →-E well-known as the modus ponens. For instance, any formulas of the form A → (B → A) or of the form (A → (B → C)) → ((A → B) → (A → C)) would be among the axioms, although their validity does not seem to be obvious in all cases (such as the second one). This indicates why NK appears to be much more natural than calculi of the Hilbert type.

As an aside we mention that Gentzen has provided an intuitionistic version NJ of NK by deleting the rule ¬-E. This shows that technically intuitionistic reasoning is not much different from classical reasoning. The reader may find more on this topic in section 4 of Huet's contribution in this volume.

An additional advantage of NK in comparison with Hilbert-type calculi is the fact that it lends itself to an automation of the computation of ⊢ in an easier way. This becomes more obvious if we proceed to a technical variant of NK that also was developed by Gentzen for the purpose of simplifying his consistency proofs for number theory. This variant is denoted by LK ("logistic calculi") and is known as Gentzen's sequent calculus. It is very easy to transform a derivation in NK into one in LK.
¬Pab ∨ ∃y ¬Pay ∨ Pab ∨ ∃x Pxb      (ax)
∃y ¬Pay ∨ Pab ∨ ∃x Pxb             (∃)
∃y ¬Pay ∨ ∃x Pxb                   (∃)
∀a ∃y ¬Pay ∨ ∃x Pxb                (∀)
∀a ∃y ¬Pay ∨ ∀b ∃x Pxb             (∀)

Figure 3. A derivation in the calculus GS.
For that purpose any formula B in the given derivation that depends on a number of assumptions A1, ..., An is replaced by the sequent A1, ..., An → B. Strictly speaking, the use of the implication sign → takes place on a level different from the one of possible implication signs in the formula B or in the A's, namely on the meta-level. However, since the logical meaning of implication remains the same on any level, no additional rules are required for it. Following Schütte [Sch] we also interpret the comma in such a sequent as a conjunction, drop the redundant elimination rules from NK, restrict the tertium non datur (∨-I) to literals and transform any formula to its negation normal form (no negation signs except in literals and no implication signs). With all these modifications applied to NK we obtain a calculus GS (for Gentzen-Schütte) for first-order logic, defined below via its derivability relation ⊢ (see e.g. [Bi3]). It would be boring to explain this transition in all technical details. Rather we will illustrate it with the example from figure 2 after stating the formal definition.

Definition. Inductive definition of the derivability relation ⊢ in GS for formulas in negation normal form.

(ax) ⊢ G1 ∨ ¬Pt1...tn ∨ G2 ∨ Pt1...tn ∨ G3 ; that is, all formulas of this kind, which are called axioms, are derivable. Here and in the following rules the occurrence of the formulas Gi, i = 1, 2, 3, is optional.

(∧) ⊢ G1 ∨ F1 ∨ G2 and ⊢ G1 ∨ F2 ∨ G2 implies ⊢ G1 ∨ F1 ∧ F2 ∨ G2 ; thereby ∧ is assumed to bind stronger than ∨.

(∀) ⊢ G1 ∨ F ∨ G2 implies ⊢ G1 ∨ ∀cF ∨ G2 , provided that c does not occur in G1 and G2.

(∃) ⊢ G1 ∨ F{x\t} ∨ ∃xF ∨ G2 implies ⊢ G1 ∨ ∃xF ∨ G2 .

The negation normal form of the last formula in the derivation of figure 2 is obtained by the following substitutions. The formula of the form A → B is replaced by ¬A ∨ B; then the negation sign is moved inward by replacing ¬∃ with ∀¬, and then ¬∀ with ∃¬, according to well-known laws in first-order logic. The GS derivation of the resulting formula is shown in figure 3.

Its first formula is an axiom according to (ax); viz. in this instance G1 does not occur, ¬Pt1t2 is ¬Pab, G2 is ∃y ¬Pay, Pt1t2 is Pab, and G3 is the rest. Essentially it expresses the tertium non datur Pab ∨ ¬Pab, which also may be read Pab → Pab. The next two steps introduce an existential assertion instead of the given term. The need for the occurrence of the existential formula before and after such a step arises from the possibility to combine more facts of the sort Pab (e.g. Pbb, Pcb, ...) within a single existential statement ∃x Pxb. The final two steps introduce an all assertion. Note that in both cases the quantified variable does not occur in other parts of the formula, as required by the condition on c in (∀). The example does not illustrate an application of the simple rule (∧) that has two premises that are identical up to the two parts F1 and F2.

Although we skipped many details that are usually discussed in a logic text in such a context, the reader might now be able to carry out derivations in either NK or GS. Of course this is also not the place to formally prove the fact that any formula derivable in GS is derivable in NK, and vice versa. Thus both calculi provide the same derivational power, while they differ in their naturalness and their conciseness. NK is more natural while GS is more concise and thus technically more transparent.

In order to determine whether ⊢ F holds (question 3 above in its simplified form) for any formula F, one would think of starting with F and trying out any of the four rules of GS in a backward way in order to see which one of them might be the last in a derivation. This would yield premises for which we carry out the same process again, and so forth. So under this view we would be interested in the backward direction of these rules only. In this case one might prefer to state these rules in this backward direction from the very beginning. The calculus known as semantic tableaux by Beth [Bet] does exactly this, i.e. its rules include exactly those of GS read in a backward direction with all formulas negated (since proofs are established by contradiction). So the inclusion of the semantic tableaux into this discussion would not add any new aspects.

There is a further well-known result in logic which helps us to simplify our problem even further. This says that a formula is derivable iff its skolemized form is derivable. For any formula we obtain the (positively) skolemized form in the following way. Any ∀-part in the formula (in negation normal form) of the form ∀a F[a] is replaced by F[f(x1,...,xn)]. Thereby it is assumed that f is a function symbol not occurring elsewhere in the formula and that there are exactly n quantifiers ∃xi, i = 1,...,n, that precede ∀a in the formula. For instance, the skolemized form of the end formula in the derivation of figure 3 is ∃y ¬Pay ∨ ∃x Pxb, because both all-quantifiers are not preceded by any existential quantifier, so that the Skolem function has zero arguments, i.e. it is a constant in each case (here denoted by the same letters a and b, which do not occur anywhere else in the resultant formula). Similarly the skolemized form of the formula ∃x ∀a ∃y F[x,a,y] is ∃x ∃y F[x,f(x),y]; and so forth. As a warning we mention that often skolemization is introduced (negatively) in the context of a refutation rather than a proof system, in which case the roles of the all- and existential quantifiers are exchanged.

If we restrict our attention to formulas in skolemized form then all-quantifiers never occur. Consequently, the rule (∀) in GS becomes obsolete and thus can be ignored. In this case we furthermore can apply another well-known result and transform any given formula into its prenex form, again without affecting derivability. For instance, the prenex form of the end formula again from figure 2 after skolemization is ∃y ∃x (¬Pay ∨ Pxb). Any such formula consists of a sequence of existential quantifiers, the prefix, followed by the matrix, a formula part that is purely propositional by nature and has no quantifiers. One may easily see that for the derivation of such a formula it can be assumed that the rules according to (∧) precede all those according to (∃) (in a more general form known as Gentzen's Hauptsatz). How could one determine a derivation of that kind for any given formula in this special form? Since we know that the final part of the derivation must consist of a finite number of applications of rule (∃) for each existential quantifier, we only have to determine these finite numbers along with the term t on the left side of this rule for each of its instances. Assume we would know this; then we could easily determine the first formula in this final part of the derivation, which must then be derivable from axioms in a first part of the entire derivation by applications of the rule (∧) only. Apparently this first part achieves nothing else than establishing that this formula is a propositional tautology. In summary our task consists in determining the number of applications of rule (∃) and the respective terms, and in testing for tautologies.

The simplest solution for this task would be an exhaustive enumeration of the numbers and the possible terms together with an application of the tautology test for each resulting configuration. In the beginnings of ATP such an approach has in fact been pursued, which became known as the British Museum Method. Obviously it is hopelessly inefficient. As the crucial idea of improvement, Prawitz suggested in 1960 to exchange the sequence in the solution for this task, namely to test for tautologies first and determine the numbers and terms by need only. All theorem proving methods today work along this basic idea; they only differ in the particular choice of the tautology testing method.

Let us illustrate this idea with our previous example ∃y ∃x (¬Pay ∨ Pxb). Deletion of the quantifiers yields the matrix ¬Pay ∨ Pxb. In order for this formula to become a tautology, x must be replaced by a and y by b. So this provides the information about the final part of the derivation for this example, which consists of exactly one application of rule (∃) for each existential quantifier, one with the term a, the other with b. The replacement of
variables by terms as just illustrated is determined by a welt-known and fast process called unification for which more details may be found in the chapter by Stickel in this volume.
If
no appropriate replacement would have been found in our example then we would have taken into account a second copy corresponding to two applications of rule (3) for each quantifier
183
and thus would have considered the formula
(~Pay v Pxb) v (~Pay' v Px'b)
to be tested for
tautology; and so forth. To summarize once more, because rute (3) can be applied more than once, and, viewed in the backward direction, each time produces a new copy of the matrix, the tautology test possibly has to account for more than one such copy.
Apparently one would first try one copy, taking
others into consideration if this fails to yield a tautology. appropriate tautology test that includes unification.
We are left with the question for an
The one suggested by GS consists in a
straightforward application of the inverse of rule (A) along with a simple test for (ax) on the resulting formulas, and doing this over and over again.
This is pretty redundant since the
inverse of rule (^) generates two out of one formula which share a great deal of information. One way to avoid this redundancy consists in an extension of the axiom property to any tautology which renders rule (a) completely redundant as well (hence (3) is the only rule that remains after these modifications on GS). v Pxb v ~ Q a
Let us illustrate this with the matrix
that is slightly more complicated than our previous one.
(~Pay ^ Qx)
With one application
of the inverse of rule (A) we would obtain two axioms according to (ax) after application of the same substitution as before. itself.
But now we define (ax') such that this matrix becomes an axiom
In order to give an intuitive idea of the property characterizing (ax') we use a different
representation of this matrix, viz. in real matrix form in a two-dimensional space. For that purpose we represent conjunctive parts top-clown and disjunctive parts left-right.
So our
matrix now reads ~Pay
Pxb
~Q.a
Q×
The columns in such a matrix are called ctauses. A path through the matrix is a set of literads that is obtained by selecting one literal from each clause. eling through the matrix along such a path.
Intuitively one might think of trav-
Our matrix has exactly two paths.
Note that
they correspond exactly to the two axioms obtained after application of the inverse of rule (ax) as described before.
Two literals in a matrix are called a connection if they are contained in a
path and share the same predicate symbol, one negated the other unnegated. exactly two connections as depicted in the following copy.
~Pay
Pxb
~tQa
Our matrix has
184
A set of connections is called
spanning' for
the matrix, if each path through the matrix con-
tains one of them, as is the case in our present example.
W i t h our previous substitution ix\a,
y\b} the literals in each connection become identical upto the negation sign in which case the connections (or the literals) are called
complementary. With
these notions we can now define
(ax'). ( a x ' ) [- F for any formula F for which there is a s p a n n i n g set of complementary connections. T h e r e are powerful algorithms which test for this property which along with unification provide a convenient a n d comparatively efficient solution for our task 3. It takes one (or more) copies of the matrix of the given formula a n d tests for (ax') whereby substitutions are generated by need via unification.
This whole approach is k n o w n as the C o n n e c t i o n Method.
Except for
this brief outline we will not describe it in any further details since there is a more detailed expository overview in [Bi4] for readers ready to taste it in more but still limited details a n d a comprehensive treatment in [Bi3] for the truly committed readers. With the way of development used in the outline above we wanted to emphasize the close relationship of the connection method with the calculi of natural deduction. This is a very important feature for a n interactive theorem proving e n v i r o n m e n t . Namely we might think of a powerful m a c h i n e - o r i e n t e d prover based on the connection method inside the m a c h i n e a n d the proofs (completed or partial) represented on the screen of a workstation in a h u m a n - o r i e n t e d and natural way. We would like to make the reader aware of the fact that for the purpose of explanation we have m a d e a n u m b e r of simplifications in our task that are justified in view of a correct solution.
However these simplifications (like skolemization, prenex form, etc.) do not necessarily
contribute to a more efficient solution.
This is to say that we have to omit the simplifications
if we head towards a really smart solution [Bi3].
U n f o r t u n a t e l y our task becomes then so
complex by its very n a t u r e that a long experience is needed in order to be able to advance to these more challenging topics.
O n the other h a n d there is no other way to advance this field
any further. I n a way the restriction to first-order logic might be regarded already as a very serious one since it seems to exclude a n y h i g h e r - o r d e r features.
For this reason we m e n t i o n that the con-
nection method can be generalized to h i g h e r - o r d e r logic which has b e e n carried out in section V.6.
Because of the computational problems that arise in such a general logical framework a
restriction might nevertheless be desirable.
T h e results outlined in section 3.3 in the present
paper m i g h t be of great interest also in this context. As we said above the test for tautologies distinguishes the various theorem proving methods in use today.
Solar we have discussed those based on the connection method. T h e most popular
ones are those based on resolution, however.
W e witl not consider t h e m at all here since they
are extensively covered in Stickel's chapter in this volume.
Resolution works on the basis of
185
the same simplifications that we used above. So the advances that we just talked about are important issues for resolution as well.
Most likely they will be pursued further in the context
of the connection method because it is more transparent than resolution for such a purpose which is an important point in view of coping with the complexity of the task. Let us, finally mention that all of the special topics discussed in the context of resolution in Sticket's chapter (such as theory resolution) similarly apply to the connection method.
Most of
them have been treated in [Bi3] under this viewpoint.
2. N O N - M O N O ' I K ) N I C RF_,ASONING At the beginning of section 1 we have considered h u m a n inferences as a relation pieces of knowledge.
We have then taken
[% to be the relation
order logic and studied several syntactically defined versions
[- of it.
]~ between
[= as defined for firstIt has been pointed out
in this context that this special choice will have to be reconsidered in a more detailed discussion of what we formulated as question 2. In the present section we enter this discussion. Consider the following two pieces of knowledge:
- IBM produces ~ m p u t e r s , or
P(ibm,cps)
- Daimler-Benz produces cars, or P(d-b,crs)
With no additional knowledge at hand, what would you answer being asked whether IBM produces cars, i.e.
P(ibm,crs) ? No, of course! In other words, it seems that
P(ibm,cps), P(d-b,crs)
]~ ~P(ibm,crs)
holds despite nothing was stated in the premises about IBM with respect to cars.
In fact, in
first-order logic this is an invalid inference. As another example (borrowed from McCarthy), suppose someone is hired to build a bird cage and doesn't put a top on it.
Since anyone knows that birds can fly, no judge in the world
would accept his excuse that it was not mentioned the bird could fly . O n the other hand, if the bird for some reason could indeed not fly and thus money should not be wasted by putting a top on the cage it should have been said so. In other words, it seems that
BIRD(x)
t% FLY(x)
holds which again clearly is not valid in t'irst-order logic. There are many more such cases where it seems that the h u m a n inference relation not coincide with phenomenon.
I% does
l= , the first-order one; but these two might do for a first discussion of this
There are two different ways of approach.
One is to acknowledge the
186
discrepancy and took out for an appropriate logic different from the first--order one.
There is
little doubt that we would have a hard time in bringing so different examples as the two above under a common logical framework that includes first-order logic as discussed in the previous section. The other approach would be to assume hidden pieces of knowledge as additional premises in examples of this sort.
In the first example this piece might be "and that's all which holds" in
the sense that everything that is not explicitly stated to hold is assumed not to hold, a principle known as the
dosed-world-assumption.
assumed not to hold, i.e.
~P(ibm,ers)
P(ibm,crs)
was not stated explicitly hence it is
is one among the pieces of this hidden knowledge.
If
we make it explicit by adding it to the inference above, we obtain a classical first-order inference. P(ibm,cps), P(d-b,crs), ~P(ibm,crs) . . . .
[% ~P(ibm,crs)
Similarly, if we add the hidden knowledge "birds can fly" to the premise in the second example, again the result is a classical first-order inference, viz. modus ponens.
BIRD(x),
BIRD(y)-* FLY(y)
I% FLY(x)
But note the difference; in the first case we have assumed that nothing except the stated facts holds (closed-world assumption) while in the second we added a fact (as common-sense knowledge).
In combination we might be inclined to say that on the one hand there is a body
of common sense knowledge that is tacitly assumed in any appropriate context, like the flying birds, while on the other hand no facts are taken into account except those stated explicitly or assumed as common sense knowledge. At least for these examples, then, the second approach above appears to be much more convincing.
So we learn that in certain cases humans draw inferences which involve tacit
assumptions; they become first-order logic inferences once these assumptions are made explicit. Note that these assumptions are context dependent.
For instance, in the first example the
assumptions of course would not include ~P(ibm,crs)
if P(ibm,crs)
the explicitly stated pieces of knowledge.
would have been among
As a consequence, the conclusions drawn from a set
of pieces of knowledge may change as we add additional information, a feature which is called
non-monotonicity. First-order logic is monotonic in this sense. knowledge K1 , then
If a piece of knowledge
K0 also follows from K1
implies
follows from some
enriched by additional knowledge K2 . In
symbols, K1 [= K0
K0
K1, K2
I= K0
t87
C o m m o n sense reasoning in contrast seems to be non-monotonic as we have just noticed. But our examples also show us that this non-monotonicity occurs on the surface only. If the tacit assumptions all are made explicit monotonicity is retained (since then the addition of a new fact like
P(ibm,crs)
also changes
K1
so that the monotonicity rule does not apply at all).
We will see, however, that it is not quite a trivial problem to handle the distinction between stated and assumed knowledge appropriately so that the formalism simulates usual common sense reasoning.
Before entering the technical details let us summarize the different types of
using non-monotonic reasoning following [Mc3]. 1. Use as a communication convention by which a body of knowledge is tacitly assumed unless explicitly stated otherwise (like in the bird example above). 2. Use as a database or information storage convention by which only knowledge is taken into account (whatever this means in detail) that is explicitly stated or assumed by other conventions (like in the IBM example above). 3. Use as a n~e of conjecture for solving problems in the absence of complete information. For instance, if you want to catch a bird you better assume it can fly in spite of the many exceptions to the rule that birds normally fly. 4. Use as a representation of a policy.
For instance, if a committee meeting has taken place
always on Wednesday, the next meeting will again be Wednesday unless another decision is explicitly made. 5. Use as a very streamlined expression of probabilistic information when numerical probabilities are unobtainable.
For instance, if you see a bird what might be the probability that it can
fly? In order to calculate it one would need a sample space in the first place which usually is not available in such situations.
Moreover, what purpose would it serve in the particular
situation to know that this probability is exactly 97.4%.
Or think of statements like "she is a
young and pretty woman" as another example where the probabilistic treatment appears to be out of place. 6. Use in the form of auto-epistemic reasoning where we reason about our own state of knowledge.
For instance, "I am sure I have no elder brother because if I had one I would
know it" belongs to this type of reasoning. 7. Use in common-sense physics and psychology.
For instance, we anticipate an object to
continue in a straight line if nothing interferes with it. This shows us that we are dealing here with a wide-spread phenomenon with a number of different aspects.
We will begin with the technical treatment of a very restricted case of usage 2
in the list above.
188
2.1.
A formalimm for data bases
O u r first example above has shown us that the p h e n o m e n o n of n o n - m o n o t o n i c reasoning occurs already in the case of a simple data base. structure.
Logically a data base has a very simple
So it might be helpful for the more complicated applications to study the issue first
for this simple case. Data bases are described in a relational language which is a first-order l a n g u a g e with a finite n u m b e r of (at least one) constants and predicate symbols, without function symbols, with equality, a n d with a set of simple types, that is a subset a m o n g the u n a r y predicate symbols. From a logical point of view the notion of a relational data base is defined in a model theoretic way as a triple ( R , I , I C ) where
1. R is a relational language, 2. I is a n interpretation for R
such that the constants in
R
are interpreted as mutually dif-
ferent elements in the domain. 3. IC is a set of formulas of
R , called integrity constraints, such that for each n - a r y predi-
cate symbol
= and from the simple types, IC must contain a formula of the
P distinct from
form qxl...Yxn ( Pxl...x~
"* Plxl^ ... ^ P,x~ )
where the Pi are types, i = l , . . . , n , called the domains of P .
As early as 1969 [Gre] the members of the A T P (Automated T h e o r e m Proving) c o m m u n i t y have preferred to think of data bases in a proof theoretic (rather t h a n the previous model theoretic) way.
U n d e r this view "answering a query m e a n s proving a statement . . . .
Thus
theorem proving is f u n d a m e n t a l for solving data base problems, a fact which is well k n o w n (but not very popular at present)" as I stated in 1976 [Bi2]. More recently the need for a more flexible data base m a n a g e m e n t has b e e n recognized. Attempts into such a direction faced a n u m b e r of problems such as the treatment of disjunctive information, the semantics of null values, the incorporation of more world knowledge, a n d ]ast not least the n o n - m o n o t o n i c i t y , which all seem to be due to limitations of the model theoretic view of data bases. attention it deserves.
For this reason the proof theoretic view has finally received the
It will now be briefly presented following [Re3].
A relational data base is defined in the proof theoretic way as a triple ( R , T , t C ) where R a n d IC
are defined as before while T
is a relational theory defined as follows.
t89
1. T
is a first order theory, i.e. a set of first-order formulas.
2. T
contains the d o m a i n closure axioms
axioms language 3. T
Vx ( x = q v ... v x=c~ )
~ ci = ck , i,k = 1,...,n, i
q,...,c,
a n d the u n i q u e n a m e
are all of the constants in the
R.
cSntains the equality axioms
VX X=X
reflexivity
Vxy (x=y --, y=x)
commutativity
Vxyz (x=y ,,, y=z + x=z)
transitivity
Vxl...x.,yl...y,.
Leibniz' principle of substitution
4. T
(Pxl...xr, ^ xl=yl ^ ... ,,, x==y,, "" Pyl---y=)
contains a set
D
of ground atomic formulas without equality, which might be con-
completion axioms for
sidered as the actual data base, along with the following
any predicate
P different from equality. Vx~...x~ (Pxl..,x~ -~ xl=cnA...^x~=cl~v ... v xl=e~I^.,.AX==C,~) , whereby
(ql,...,q,),
..-, (cr,,...c=)
are all of the tuples such that
P(qt,...,c=)
is in
D
for some i in {i .... , r } .
For instance, let
D
be the set
{P(ibm,cps), P(d-b,crs)}
from our previous example.
Then
there would be only a single completion axiom in this particular case, namely
Vx,xe (Pxix~ -* xi=ibm^x2=cps v xl=d-b^x2=crs) .
It minimizes the extension of the predicate
P , that is it restricts the tuples for which
holds to those stated explicitly in the data base approach is also k n o w n as
predicate completion.
theory obtained for this particular example cannot.
D
However, if we add
P(ibm,crs)
from this new theory while ~P(ibm,crs)
as described above, for which reason this It is easy to see that from the relational
~P(ibm,crs) to
D
P
then
can be derived while P(ibm,crs)
P(ibm,crs)
trivially can be derived
can nomore, because with this update the completion
axiom changes to become
Vxtx2 (PxIxz -* xI=ibm^x2=cps v xl=d-bAxe=crs v xl=ibm^x2=crs) .
T h e completion axioms are c o n t e x t - d e p e n d e n t like the assumptions discussed f u r t h e r above. T h i s way we achieve the n o n - m o n o t o n i c behavior of h u m a n reasoning within first-order logic. As we see, a classical theorem prover would now give the expected answer to a n y query to the data base
D . This remains true if disjunctive i n f o r m a t i o n is present in
occur, or if more complex world knowledge is added.
D , if null values
So this approach settles the kind of
problems that are now u n d e r discussion in the data base community.
O f course, a standard
190
theorem prover would not meet the requirements on efficiency that are standard in data base technology.
Both techniques may be integrated, however, by compiling the prover's steps into
data base techniques without changing the semantics wherever such techniques are applicable. Currently a solution is preferred that interfaces an existing data base system built with conventional technology with a theorem prover, for instance a P R O L O G system.
2.2.
Negation as failure
A conventional data base has a very poor logical structure, so poor even that this structure could be ignored by the data base community for decades.
So the question naturally arises
how the solution achieved by the completion axioms can be extended to more complex knowledge bases, say to P R O L O G programs [GeG] to begin with.
There in addition to the
relational facts as in data bases we have to account for general P R O L O G clauses that take the form of rules.
As we noted in section 1 such rules allow the derivation of facts that were not
stated initially in an explicit way.
This suggests that a generalized closed-world-assumption
takes into account derivable facts rather than stated ones.
So we would say that any fact is
assumed not to hold unless it is derivable from the explicitly stated knowledge.
For the case of
P R O L O G this principle is known as negation as failure which we briefly review now. Recall from Stickel's paper in this volume that P R O L O G clauses are rules of the form H ,- GI^...^Gn
where n > 0 , and the head
H
and the subgoals G1 are atomic formulas.
Further the goal clause is of the same form but has no head (and at least one goal).
This
means that in pure P R O L O G negation cannot directly be processed. Instead it is handled according to the principle just explained.
That is, if a goal or subgoal has the form ~ G
atomic), then the P R O L O G interpreter first attempts to prove
G ; if this fails then
(G
-~G is
established, otherwise it fails. This may be expressed as a P R O L O G program in the following way. ~G *- G,/,fail ~G *- true
We may view this treatment in a different way.
The clauses of a P R O L O G program define
the predicates occurring in the heads; but they do so with the if-halves of the full definition only that would include the only-if-halves as well.
In [Cla] it is shown in detail that
negation-as-failure amounts exactly to the effect that would be achieved if these o n l y - i f halves of the clauses would be added to the program and a theorem prover for full first-order logic would then do the interpretation. tained in [JLL].
A theoretically more comprehensive treatment is con-
Instead of presenting these results here in any detail, we simply illustrate
that the same view can already be taken in our previous data base example. P R O L O G clauses it reads
As a set of
191
?(ibm,cps) ,-P(d-b,crs) *-
Obviously, the same can be expressed equivalently in the following way.
P(xi,x2) ~- xl=ibm, x2=cps P(xi,xa) *- xz=d-b, x2=crs
which in turn is equivalent with the logical formula
P(xa,x2) *- xl=ibm^x2=cps v xI=d-b^x2=crs
As always in P R O L O G the variables are to be interpreted as all-quantified ones. With this in mind a comparison of this formula with the completion axiom (from the definition of a relational data base above) for this particular case shows that this axiom is in fact the only-if-half of this formula.
In other words, the completion axioms achieve exactly the same effect for the
simple case of a relational theory that is achieved by negation-as-failure for the more complicated case of Horn clause logic [She].
With this remark it is now also obvious that negation-
as-failure is non-monotonic since our previous example applies here too. We note a distinction, however, in the way of treatment.
In relational theories we have added
the completion axioms and then carried out a classical proof process.
In P R O L O G there is an
evaluation being extracted from the behavior of the classical proof process.
This evaluation
logically takes place on one level higher than the level of the proof process itself, that is on the meta-level.
We will come back to such a combination of object-level and recta-level proofs in
the sections 2.5 and 3 of this paper. There is yet another way of viewing the negation-as-failure approach, viz. the semantic one. It may be shown that a set of Horn clauses always has a minimal model [Llo], a fact which is not true in general for first-order logic.
The proof process in P R O L O G with negation-as-
failure in fact determines a minimal Herbrand model such as the one in the example above. Thereby minimality means that the domain is minimal -
{ibm, d-b, cps, crs} in the example
above - and that the relations have their minimal extensions the example.
-
{P(ibm,cps), P(d-b,crs)}
in
So from the semantic point of view the underlying closed-world-assumption
principle may be regarded as aiming at minimal models of the given set of formulas that describes the world under consideration. tion.
We wilt come back to this point in the following sec-
t92
2.3.
Circumscription
T h e way we handled the p h e n o m e n o n of n o n - m o n o t o n i c reasoning in the case of data bases a n d P R O L O G programs seems to be completely satisfactory at least for these restricted cases. U n f o r t u n a t e l y , the world is more complex to be modeled adequately in P R O L O G .
At least we
have to extend our language to include the features from first-order logic that are not included in P R O L O G , if not even more.
T h u s the question naturally arises whether the way of h a n -
dling used so far can be generalized to arbitrary formulas in first-order logic.
This turns out
to be more complicated t h a n one would normally expect. M c C a r t h y who worked on this problem for m a n y years if not decades has proposed a technique called
circumscription.
As he beautifully describes in [Mc2] this technique tries to cope
with the problem of c o m m o n sense reasoning of the most general sort. For instance, think of the well-known m i s s i o n a r y - a n d - c a n n i b a l s problem where three missionaries a n d three cannibals are to cross a river with a boat that carries no more t h a n two persons, and to do so in a way that at no time the cannibals o u t n u m b e r the missionaries at a n y side of the river.
The
point is that without common sense a description of that sort could never be understood.
This
is because there are myriads of ways to m i s u n d e r s t a n d the story due to its lack of precision and completeness (why not use the boat as a bridge which might work for a narrow river; or why should there be a solution anyway since the raws might be broken; etc.). Usually h u m a n s do not even think of such unlikely aspects a n d easily capture the essence of the problem for the same reasons that have been identified further above. Namely,
we
immediately associate a package of additional knowledge with such a description like "rivers normally are m u c h broader t h a n a boat" or our "birds normally fly" further above.
However
this extension is performed in a m i n i m a l way, i.e. no objects or properties are assumed that are not normally associated with a scenario as the one u n d e r consideration. offers a technique to simulate such a behavior in a mechanical way.
Circumscription
O n e element in this
technique is the use of sort of a completion axiom like the one in section 2.1.
W e begin by
formally defining this circumscription formula. This definition requires the use of second-order logic which we have not m e n t i o n e d so far. T h e reader should think of first-order logic as before except that function and predicate symbols are no more considered as constants, but m a y be regarded as variables a n d thus m a y also be quantified in the same way as the usual object variables in first-order logic.
Let
A(P,Z)
be such a formula of second order logic in which P occurs as such a predicate variable but is not quantified.
I n fact here a n d in the following we always allow a n y variable to represent a
sequence of variables, i.e.
P1,...,P~
write down the sequences explicitly. variable
P
a n d a n object variable
in t h e present case; but for sake of readability we never Further let x
E(P,x)
both are not quantified.
T h e n the circumscription of
mula
defined by
Circum(A;E;P;Z)
be a formula in which the predicate
(both possibly tuples by our assumption just made) E(P,x) relative to
A(P,Z)
is the for-
193
A(P,Z)
^ Vpz {A(p,z) ^ Vx[E(p,x)-.E(P,x)]-, VxIE(p,x),,E(e,x)]} .
For a better understanding of this formula let us instantiate it for the case of a simple example such as the one from section 2.1. data base, i.e. in it, i.e.
There
A(P,Z)
would be the formula describing the
P(ibm,cps)AP(d-b,crs) , and we would have to circumscribe the predicate
E(P,x)
P
would simply be P(xl,x2) • So the circumscription in this case would be
P(ibm,cps)AP(d-b,crs) A Vp {p(ibm,cps)Ap(d-b,crs)
A Vxlx2[p(xl,x~)-*P(xl,x2)] -* Vx~,x2[p(x~,xa)*-*P(x~,x2)]}
Since p is all-quantified, we may think of any predicate.
For instance, consider
p(xl,xa) -= xl=ibmAx2=cps v xl=d-bax2=crs . The premise
Vx~,×~[p(x~,x~)*P(x~,x~)] in the circumscription formula is obviously true given the assumption
A(P)
in this case.
Therefore, according to the circumscription formula, it is also required that
Vx~,xdp(x~,×~)~P(x~,x~)] holds as well which spelled out is the formula
Vxz,xa[P(x~,x2) ~ x~=ibmAx~=cps v x~=d-bAx~=crs] ,
i.e. the completion axiom from section 2.1.
In other words, we have shown that for the sim-
ple case of our data base example the completion axiom is a logical consequence of the circumscription formula, a result that holds in general for data base as well as for Horn clause problems [Re2,She,Mc3]. This might have given us a feel for the circumscription formula, at least for this special case. It is meant to replace a given set of axioms extension of
P
when
A(P,Z)
by a modified set that minimizes the
Z is allowed to vary in this process of minimization.
world descriptions A(P,Z)
It applies to
of arbitrary form, in fact even one in second-order logic, which is
to say that circumscription is far more general than predicate completion and negation-asfailure as discussed in the previous two sections.
Perhaps it is even too general for most prac-
tical applications for which reason we now present it in a slightly more restricted form of predicate circumscription where E(P,x)
is P ( x ) .
194
For this purpose we also abbreviate any formula of the form
Vx(Px--,Qx) by
case of tuples (aIways keep in mind our assumption), P_~Q abbreviates and
P
stands for P < Q ^ ~ Q < P . T h e n
Circum(A;P;Z)
P ~ Q ; in the
PI_~QI^...APn_gQ~ ,
may be expressed [Lil] by
A(P,Z) A ~Bpz[a(p,z)Ap
which is equivalent with our previous version for this special case as may be easily seen. Actually, the more general case may in fact always be reduced to the present one by introducing an abbreviation of the form Po(x) ~" E(P,x)
into A(P,Z) .
Currently there is no working system of knowledge representation based on circumscription. The major difficulty in designing such a system lies in the fact that the circumscription formula involves a second-order quantifier.
Fortunately it is possible in many cases to reduce the
circumscription formula to one in first-order logic as is shown in [Lil].
At least for these
cases such a system may now easily be realized on the basis of any existing theorem prover . At the end of the previous section we discussed the model theoretic meaning of negation-asfailure and we will now provide the same for circumscription. that
A(P,Z)
is
given.
Then
for
any
two
For that purpose let us assume
models M1,M2
of A(P,Z)
we
write
Mz -~P;z M2 if (i)
the universes of, both models are the same,
(ii) for every (object, function or predicate) constant not in P, Z both models also coincide, (iii) for every predicate in P its extension in ml
is contained in that of M2 .
Then the following result holds [Mc2,MiP,Lil].
Theorem.
M
is a model of Circum(A;P;Z)
iff M
is minimal in the class of models of A
with respect to _Kp;z .
The relation
<~;z is, generally, not a partial ordering; therefore
not necessarily exist.
In fact, there are consistent formulas
is even inconsistent [EMR].
M
as in the theorem must
A such that their circumscription
For important classes of formulas it has been shown, however,
that consistency is always preserved [Li2,MiP,EMR].
These complications indicate why we
regarded this topic as a difficult one at the beginning of the present section. Before we turn our attention now to the application of circumscription to non-monotonic reasoning, we finally mention as an aside that circumscription has a close relationship with the concept of implicit definability, that has been explored for many years in Mathematical Logic
[Do2].
195
2.4.
Inferential minimi:,-~tion with c i r c u n ~ p t i o n
As we said in the previous section, circumscription provides a tool for treating examples like the one with flying birds further above.
We will now illustrate how this works in detail
[Mc3]. For that purpose let us use the following predicates.
Bx
for
xisabird
Ox
for
x is an ostrich
Fx
for
x can fly
ABx
for
x is abnormal
Instead of stating that all birds can fly, as we did at the beginning of the present chapter, we rather express that birds normally can fly to account for the kind of scenarios described at the beginning of the previous section.
So we consider the following formula A(AB;F) :
V:,,(Bx,,.ABx.-.Fx)
,, Vx(Ox-.Bx) ,,, Vx(O,,-..l~x)
Intuitively we would like to have the ostrichs as the only birds which are abnormal w.r.t, flying in the world captured by A , that is Vx(ABx,-,Ox) . Indeed the circumscription formula Circum(A;AB;F) essentially amounts to this equivalence and thus produces the desired result. From this example we see that facts of the sort "normally such and such is the case" are represented as a first-order logic statement which in the premise includes a literal ~ABx that accounts for possible abnormal cases.
Minimizing this predicate by circumscribing it yields the
kind of reasoning observed for humans which might be called inferential minimization, general there may be various ways (or aspects) of being abnormal. using a different predicate
tn
We may treat this by
AB for each aspect, or we may provide the distinction in a func-
tional way with a single predicate AB , such as AB(aspectl(x)) , AB(aspect2(x)) and so forth. In the following scenario, that includes airplanes (P) and dead things (D), we take the first alternative.
Vx(Ox-,Bx) Vx~(Bx^Px) Yx(~ABlx -' ~Fx) Yx(Px ^ ~AB2x -, Fx) Vx(Bx A ~AB~x "~ Fx) Yx(Ox A ~AB4x "* ~Fx) Vx(Bx A Dx n ~ABox--, ~Fx)
196
The circumscription
Circum(A;AB1,...,ABs;F)
does not lead to the intuitively expected con-
clusions: there are no abnormal airplanes, ostrichs, and dead birds; ostriehs and dead birds are abnormal birds; airplanes and the birds that are alive and not ostrichs are the only objects satisfying ABz • The reason is that the goals of minimizing our five abnormality predicates conflict with each other.
For instance, minimizing the extensions
AB~ and
ABz conflicts
with ttie goal of minimizing AB 1 . Prioritized circumscription [Mc3] overcomes this problem.
There one establishes priorities
between different kinds of abnormality, specifically one assigns higher priorities to the abnormality predicates representing exceptions to "more specific" common sense facts, e.g. AB~ > AB 2 , AB~ > ABz in the present example.
AB 4 ,
With the circumscription formula adapted
to this generalization indeed provides a satisfactory solution that leads to the expected solutions.
Also for this case a first-order treatment may be achieved in many important cases
[Lil]. In summary we have seen that circumscription offers a rather general solution to the common sense reasoning.
The approach still seems a bit too complicated in comparison with the sup-
posed human reasoning.
Also there are still open problems of detail. Therefore it seems
worthwhile to have a look at other approaches taken to cope with this phenomenon as we do in the next section.
2.5. Other approaches to inferential mlnirni~,.ation While circumscription generalizes the way taken with the completion axioms from section 2.1, all of the variants discussed briefly in the present section might be regarded as a generalization of the way taken with negation-as-failure from section 2.2. Namely, it has been pointed out in 2.2 that negation-as-failure is in fact a meta-levet principle, and the same holds for all of the following approaches. 2.5.1
Explicit listing of exceptions. The simplest way to deal with rules of the sort "birds fly"
is by including all exceptions explicitly in the form
BIRD x ^ ~OSTRICH x ^ ~PENGUIN x A ....
FLY x
For a large number of exceptions this clearly is an awkward approach although it reduces the problem to classical reasoning without any extra provision. To some extent the behavior can be simulated by taking advantage of the fixed control of a P R O L O G system and listing the clauses appropriately, or by a set-of-support mechanism in a general resolution prover. 2.5.2
Default reasoning. One natural way to deal with problems of the flying-birds sort is to
interpret a rule like "birds fly" more precisely as "if nothing is known to the contrary we may assume that birds fly". The question thus is how we could formalize the "if nothing is known to the contrary" in this phrase.
197
Reiter in IRe1] has proposed to adopt the interpretation "it is consistent to assume that ..." for it which formally may be represented as a default rule in the following way.
BIRDx : M FLYx FLY x
The general form of such a default rule is
A: MBI,...,MB~ C
O f course, it has to be made precise in a formalism what exactly this means and how such a rule can be applied within a theory, in particular how the consistency can be determined. Reiter has carried out this program in all details in form of a default theory. default theories have deficiencies that prevent their use in the intended way.
In general,
These deficien-
cies do not occur if one restricts the theories to normal or semi-normal default rules. default rule is called normal if m=l and Bz-C in the rule above. is of the form
A : MB1 , M~B/B1
A
It is called semi-normal if it
, where B is an atom.
Even in normal default theories there is a serious computational problem since each Dale application requires a deductive test for the derivability of the defaults which incidentally demonstrates the meta-level aspect of this approach which will further be pursued in section 3.2.
In
[Gro] it has been shown that this kind of default reasoning can be reduced to circumscription for which reason we do not discuss it here any further. 2.5.3
Truth maintenance. As we learned at the beginning of the whole section 2, the addition
of facts to a non-monotonic reasoning system may change the conclusions which can be derived.
In practice one would prefer to store a condusion explicitly after its derivation in
order to keep it available for later purposes.
This however raises the problem of truth-
maintenance since after new updates of the knowledge base earlier derived conclusions might not be true any longer.
One of the first systems that deals with this particular problem is
described in [Do1]. 2.5.4
Modal lo#c. The
M
in the default rules may well be interpreted as the modal opera-
tor "is possible" from modal logic.
This is no surprise since modal logic was invented to deal
with exactly this kind of recta-level reasoning about what is derivable on the object-level or not.
In a modal logic approach of such a kind of problem the main issue always is to find the
right axioms capturing the exact meaning of the modal operators.
In our case this attempt,
which follows the first line of approach mentioned at the beginning of this section 2, has run into a number of problems. be found in [Moo].
The latest state of the discussion of this particular approach may
198
2.5.5
HiKher-order predicates. Higher-order logic provides another way of integrating recta-
level expressions into object-level ones.
An approach to default reasoning that takes advantage
of this flexibility but within first-order logic is discussed in section 3.3. 2.5.6
Meta-reasoning.
Instead of integrating the meta-level features into the object language
as in the previous two approaches, one may separate them explicitly from the object language and provide a mechanism that links the two levels together.
This approach has been taken in
[BoK]. We will briefly demonstrate it when we discuss recta-level reasoning in section 3. 2.5.7
Tolerance of inconsistencies. If we have a rule with exceptions then an inconsistency
would arise if no extra care would be taken.
For instance, if we have
B I R D x --, FLY x P E N G U I N x --, B I R D x ^ ~FLY x
then penguins would both fly and not fly.
It seems that humans use exactly this kind of
representation to deal with this sort of problem; especially no one thinks of any exceptions when asked about the characteristics of birds and then lists among other things that birds fly. Yet such inconsistencies apparently do not confuse our logical reasoning. In a formalism the inconsistency could be tolerated by restricting the inference mechanism so that e.g. the two rules above never interfere with each other.
If we think in terms of the con-
nection method as the underlying inference mechanism, then this would simply m e a n that certain connections may never be used in any deduction. Which of the connections are taken out in this way may be determined in advance for a given knowledge base.
We just mention that
this amounts to determining tautology loops in the knowledge base like the one from the literal BIRDx in the first clause to the same in the second clause, and from ~FLYx there back to FLYx.
In this example these two connections together are useless for any reasonable deduc-
tion and thus should never be taken into account. The advantage of this simple proposal, which seems to have been ignored so far, is the resulting efficiency.
Namely, it would even be more efficient to locate an appropriate deduction
than in a usual first-order problem since some connections may simply be ignored thus reducing the search space.
For this reason we feel that this approach might be the most attractive
one at all. But no one has explored this in any detail upto now. In summary, we have seen that there are several viable solutions at hand that may be used for realizing non-monotonic inference, a form of common-sense reasoning.
It is now time to try
the most promising ones in experiments in order to find out their relative merits in practice.
199
3.
META-REASONING
The methods discussed in the previous two sections provided a way of drawing inferences among pieces of knowledge represented in some formal language, mostly first-order logic. These syntactic constructs were meant to model some real world scenarios. Apparently these constructs themselves might be regarded as part of some real world scenario. In fact, often the need arises to reason not only about the real world knowledge of the first sort but also about these formulas. For instance, in the default reasoning approach we met such a situation where the reasoning process involved the question whether some formula was derivable which amounts to reasoning about certain relations among these syntactic constructs. There are many other applications than the one just mentioned.
For instance, one might wish to provide the user of a reasoning
system access to its control. This necessarily requires a language allowing for meta-level expressions. Another application arises in situations where we reason about the knowledge or beliefs of other agents. The present section deals with exactly these kinds of phenomena.
3.1.
Language and met,a-language
The first-order language used so far in this paper for emphasis might be called an object-level language in the present context. What then is a meta-language? It talks about the syntactic entities of the object-level language just as this in turn talks about some other entities. A literal such as
P(ibm,cps)
is an example of such a syntactic entity. How could it be named
in the meta-language? It is our intention to let the recta-language be a first-order language as well. Objects in such a languages are denoted by constants. Hence we would have to denote such a literal by some constant, say
c , in the meta-language. O n the object-level we sometimes preferred to use
mnemotechnic notations such as ibm
rather than
c . The same will be even more useful in
the present context. The most natural way to n a m e a phrase is by quoting it. Hence we include constants of the sort
"P(ibm,cps)" in the alphabet of our recta-language keeping in
mind that this is just for better reading otherwise being a constant just as
c . This way we
may name any first-order formula of the object-language. Once we are in a position to name formulas we may then represent relations among them by predicates. For instance, a predicate, say I N F E R , in the recta-language may denote the relation defined by
l- from section 1. For instance, we may consider the following literal.
INFER("Ms^Vx(Mx-,MTx)","MTs")
It relates two constants that name the formulas considered in section 1 where we used different names for them, viz. K I ^ K 2 and K0, So n a m i n g and talking about formulas has been done before in this paper except that this was done in a more informal way while now we are
200
a i m i n g at a formalism for the same purpose. I n fact, a definition like the one for the system G S i n section 1 is already very close to such a formalized language. M a y we therefore suggest as a n instructive exercise that the reader rewrites this definition in a purely formalized way in first-order logic, or in P R O L O G .
If done correctly with D E R I V E d e n o t i n g the derivability
relation then along with the rule
INFER(x,y)
if
DERIVE(x."-*".y)
we m i g h t successfully r u n the resulting P R O L O G program with the literal
INFER("Ms^Vx(Mx-*MTx)",'MTs")
as a goal clause. I n other words what we do this way is just writing a n interpreter for GS in PROLOG
-
-
one application of the r e c t a - l a n g u a g e .
I n this exercise a question arises that has b e e n made explicit in the rule just given. Namely, we face the need to construct constants with variable components such as and
y
x."-*".y
where
x
ranges over arbitrary formulas. For this purpose we used the infix notation for con-
catenation; alternatively we might have written conc(x,conc("~",conc(y, nil))) in L I S P notation which for first-order logic is simply a t e r m with two vm'iabtes, a n d thus causes no problems at all. Let us summarize this discussion and do so in restriction to P R O L O G in order to focus the attention on a n executable system. T h e r e is the usual language in which we write P R O L O G programs, the object language. T h e P R O L O G system interprets such programs thus establishing a relation between the program, say progr , and the
progr I- goal
goal , formally
.
T h e r e is a second language, the m e t a - l a n g u a g e , which allows to n a m e formulas (programs a n d goals) a n d relate them by predicates, e.g. we m a y say I N F E R ( " p r o g " , " g o a l " ) a n d "goal"
are the names for
prog
and
where
"prog"
goal on the meta-!evet. Even more we m a y write
a n interpreter for a formal system like GS in this m e t a - l a n g u a g e in form of a P R O L O G program, say interpr , a n d r u n Prolog to test the relation
interpr t- I N F E R ( " p m g " , " g o a l " )
If all this is done correctly then we clearly would expect that the one relation holds iff the same is true for the other, which in fact provides the i m p o r t a n t link between the two levels. Formally this link m a y be established by adding the transitions from one to the other as
201
explicit rule.s to the system that this way amalgamates the two languages/systems into a single one.
These two rules are usually called reflection principles and have been used in a number
of systems that were designed along t~hese lines [Wey,BoK,BoW,Gen].
3.2. Application to default reasoning tn section 2.5.2 the meta-level aspect of default reasoning was already pointed out and may now be formalized in the following way [BoK].
FLY(x) if BIRD(x), ~ I N F E R ( ' p m g " , ' ~ F L Y ( " . x , " ) " )
In other words, birds for which the knowledge base represented by the current object-level program denoted by "prog" does not specify anything to the contrary are assumed to fly. Although this is an elegant way of representation experience with its implementation [BOW] has to show whether it is a feasible approach that may compete with the circumscription techniques under development.
3.3.
Self-reference
We learned in the present section 3 that on the meta-level we may name syntactic items from the object-level. So if
Pc
and introduce a predicate
is a literal on the object-level we clearly may name HAS
P
by
"P"
on the meta-level, which we define by
H A S ( c , " P " ) ,-, Pc
informally,
c has property named
"P"
iff Pc holds, a form of what is called comprehen-
sion axiom. That seems to be absolutely natural, and in fact G. Frege has taken this view about a hundred years ago in a slightly different notation. Unfortunately, B. Russel showed early in this century that a formal system that allows these kinds of definitions is inherently inconsistent.
Essentially, he defined
R(x) ,-, ~HAS(x,x)
and applied it to x = " R " , i.e. used
an example that involved self-reference. This problem can be eliminated simply by avoiding self-reference altogether. But this amounts to a serious restriction since this feature is one that we use quite often in our natural way of reasoning; just think of the sentence "what I am just saying is not correct". The solution proposed by Russel consisted in establishing a hierarchy of language levels in what today we call higher-order logic, which however leads to computational and still representational problems [Per]. In section 3.1 we avoided this problem by allowing a less powerful comprehension axiom, viz. the reflection principle. Recently, it has been shown [Fef, Per] that Frege's approach is possible in a consistent way if one takes into account a slight restriction on the comprehension axioms that can be easily
202
tolerated for practice. It seems that this result which we do not develop in detail here might have far-reaching consequences among which are the following. Object- and meta-level may be amalgamated in an even stronger form than shown in 3.1. The computationally relevant parts of modal and higher-order logic might be treated in a purely first-order way which in turn would mean that first-order logic would finally turn out to become the formalism par excellence. Encouraged by this result let us reconsider our flying-birds problem from section 2. R e m e m b e r that at the end of section 2.4 we already indicated doubts whether the circumscription (and other) approaches to non-monotonic reasoning are as efficient as the human one. In particular we feel that a rule like "birds fly" itself is not affected by encountering say a penguin that does not fly. Here an additional rule is added that we think is not "penguins are birds" but rather "penguins are non-flying birds". Let us illustrate this idea with the following example. Px
represents "x is a penguin";
animal"; r
denotes robby. T h e n
animals"
Bx
represents "x is a bird";
Cx represents "x is a cardinal";
by
A("B")
"birds fly" may be represented by
, "cardinals are birds" by
and "tweety is a penguin" by
Ax
represents "x is an
Fx represents "x can fly"; t denotes tweety and
BCC"),
~F("B("P")")
, "birds are
"robby is a cardinal" by
C(r) ,
P(t) . Altogether this scenario is given by the following for-
mula. A("B") A F("B") A B("C") A ~F("B("P")") A C(r) ^ P(t)
Properties that apply to a class also apply to each of its members, expressed by
zcx")
^ X(x) ~ Z(x)
otherwise it must be made explicit (e.g. in the functional way whole("B") ) that the class as a whole is addressed. This property inheritance is allowed in classes with additional restrictions only if the property is not identical with the complement of the restriction, expressed by the formula Z ( " X " ) A Y ( " X ( " . x . " ) " ) ^ " Y " ~ " ~ " . " Z " --* Z(x)
From these three formulas we easily may infer that robby can fly but that tweety cannot, further that both are animals. For instance, the second rule applied with the substitution {ZkA, X\B, Y\~F, xV'P"}
yields
A('P")
which by application of the first rule results in
A(t) . This looks like a very unusual way of representing this kind of knowledge in first-order logic. But we remind the reader that logic provides the form only and is totally open w.r.t, how this form is used to represent concepts. If this representation has no other disadvantages (which we might have overlooked at this point) why not preferring it to more familiar ones. In natural
203
language this kind of representation seems to be quite familiar anyway.
3.4. Re.a~ning about knowledge and belief A sentence like " D e a n doesn't know whether Nixon knows that Dean knows that Nixon knows about the Watergate break-in" demonstrates that reasoning about knowledge and belief is not only familiar in everyday circumstances, but may also be quite complicated. If we, to begin with, restrict it to the simple form "a knows that B" then we may see that knowing is a meta-level concept. So for its representation we may use the two approaches discussed before in the present section 3.
That is we may treat
KNOW
as a predicate on the meta-level or
may allow iterated application of predicates as in K N O W ( a , " G R E E N ( g r a s s ) " ) . From a philosophical point of view all knowing is relative, that is pieces of knowledge are actually beliefs that are true relative to some higher beliefs. We therefore prefer to talk about belief rather than knowledge and adopt the rule
KNOW(a,F)
~ BEL(a,F) ^ T R U E ( F )
but not its inverse. So as another example "Hans believes that Richard-von-Weizs/icker is president of G e r m a n y " would read
B E L ( h a n s , " P R E S ( r - v - w ) " ) . Or "There is someone whom
Hans believes to be president" would either read [NAMES(hans,x,y) ^ BEL(hans,"PRES(".x.")")]
.
3x B E L ( h a n s , " P R E S ( " . x . " ) " )
or 3xy
This illustrates how one may represent
arbitrarily complex statements about someone's beliefs with the first-order language envisaged in the previous section 3.3. Obviously, one may then also use the reasoning mechanism available in first-order logic for drawing correct inferences. Like with any other predicates we have to specify rules that capture our intention with the predicate BEL . For instance, one might think of the rule [Moo]
BEL(a,"A-*B") A B E L ( a , " A " ) -* BEL(a,"B")
This kind of rule may be questioned since
a
might never think of actually performing the
inference. A similar case is known as the problem of logical omniscience occurring in the presence of the rule K N O W ( a , A ) ^ (A-,B) -* KNOW(a,B)
which even more cannot be accepted without some restriction.
A possible solution is discussed
in [Levi. The problems occurring in reasoning about knowledge and belief have extensively been discussed in [Hin]. There a semantics is taken into account that envisages possible worlds. In particular,
a knows
F iff F is true in all the worlds a thinks are possible. This kind of
204
semantics has been formalized with Kripke structures [Kri] but it is not clear how to model the state of knowledge with them. There are serious doubts w.r.t, computational feasibility of these possible worlds approaches. In [Mcl] a functional approach to modelling knowledge and belief has been drafted. There concepts are treated as special functions that are denoted by strings starting with a capital letter. For instance, Know, Pat, Phonenumber, Mike, all denote such concepts. "Mike knows Pat's phonenumber" would read
TRUE Know(Mike,Phonenumber(Pat))
Similarly, "Joe knows whether Mike knows ..." would read
TRUE Know[Joe, Know(Mike, Phonenumber(Pat)) ]
while "Joe knows that ..." would be distinguished by reading
TRUE K[Joe, Know(Mike,Phonenumber(Pat))]
with a different knowing concept denoted by K . For each concept X there is an object x of which X is the concept, formally x = denote(X) . Although this approach seems to work for the examples given in [Mcl], we share the opinion expressed in [Per] that the hope of a satisfactory functional treatment of concepts and modalities, that is associated with this proposal, is unfounded, not mentioning the unintuitiveness of this model. Finally we mention the approach described in [Bi6]. It takes the view that different believing agents are making their inferences in completely separate worlds, which formally means a representation in alphabets with empty intersections. In this sense the phonenumber of Pat in Mike's world of beliefs would be represented as
phonenumber~ike(patmke) indicating that both the function symbol and the constant are taken from Mike's world and are to be regarded different from those say of Mary's world similarly indexed by mary. This basic idea has been generalized in order to correctly reason in a purely first-order setting about what different agents know. That such reasoning is not quite trivial may be seen from such examples like "Mike knows Pat's phonenumber which incidentally is the same as Mary's; does Mike therefore know also Mary's number?" He does not know it, that is, unlike usual first-order reasoning, reasoning on knowledge is opaque and not transparent in the sense that we not simply substitute equals for equals in such sentences unless Mike knows of the equality
205
(and in fact uses it).
3.5. Expressing control on the recta-level
Usually, a theorem prover for first-order logic or a subset thereof is built with a fixed more or less complex control in it. In particular in the context of using logic as a programming language the need naturally arises to adapt the control to the special problem under consideration either automatically or by interaction with the programmer. From what we have learned in the present section 3 it should be obvious that this is typically a recta-level task. It requires a control language on the recta-level talking about what goals to be processed next and unified with what heads of etauses, if we think in terms of P R O L O G . It is straightforward to realize this idea in a practical control language. In [Bil] it has first been discussed how such a control added to a logic program if appropriate would after compilation result in a program as efficient as any corresponding say A L G O L program. A language in use that realizes this kind of approach is I C - P R O L O G [C1M]. For other proposals in the same direction see [GaL,Gal].
4. R E A S O N I N G
ABOUT
UNCERTAINTY"
Recall from section I that the basic issue of this paper is the inferential relation K section 2 we have already considered examples where the knowledge
K
I% K'. In
is not quite certain in
some sense; the knowledge that birds fly is of such a kind. In m a n y cases of that sort wc have some feeling about how certain we actually are. In scientific disciplines such as economics, medicine, geology, etc. this feeling often has even a very solid base provided by extensive statistical material. If such additional information is available then obviously it should be taken into consideration in our way of reasoning perhaps in a more explicit way than the one discussed in section 2 where a m o n g the various uses of non-monotonic reasoning we already mentioned this aspect under point 5. In the present section we are going to discuss some of the possibilities that have been explored for that purpose. We do so in a section separate from section 2 not so much because the quality of the problem is really different, rather because here there is an emphasis on this extra aspect of taking into account the uncertainty in a more quantitative way. Also note the relation with the previous section since this extra aspect clearly is some sort of knowledge on the meta-level although the approaches discussed below mostly do not take this into explicit account. The phenomena associated with this kind of reasoning about uncertainty have been studied under m a n y different points of view which gave rise to a variety of different names for more or less the same topic. Some of these names are fuzzy, approximate, plausible, or vague reasoning, reasoning with, under, or about uncertainty, theory of evidence, or of possibility, to some extent also inconsistency reasoning. A m o n g all these approaches we may distinguish
206
those based on some sort of probability theory ("normative approaches") from those taking a non-probabilistic knowledge-based point of view which aim at modeling h u m a n performance ("performance or positive approaches").
4.1. Baye~an inference Often h u m a n s associate some kind of measure with statements. For instance, an expert investment counselor might associate a degree 0.5 with the rule that advanced age implies low risk tolerance. This might mean that there are statistic informations showing that 60% of the elderly people have low risk tolerance. It may also mean that the expert summarizes his knowledge about the relation of age and risk tolerance with this figure which might then be regarded as a degree of belief in the statement. Whatever is the case let us assume that somehow we are provided with probabilities of this kind along with any kind of statements. Let
E denote any such statement, e.g. "John is of advanced age". So we would consider a
probability value P(E) along with E ; similarly with any other statement H has low risk tolerance". The rule mentioned just before states that
E
example, where E may be regarded as evidence for the hypothesis H statement, we consider some probability P(E-*H)
such as "John
implies
H
in this
to hold. As with any
for this rule as well; in probability theory
one usually writes P ( H I E ) to express exactly the same and calls it the conditional probability of H
relative to E .
Finally, we may consider the probability that
H
and
E both hold,
i.e. P(H^E) , sometimes written shortly P(HE) . A simple probabilistic argument shows us that
P(H^E)
is the probability
P(E) times
P(H[E) , called the theorem of compound probabilities.
P(H^E) = P(E)'P(H[E) For instance if John may be considered to be of advanced age only to a degree of 50% then P(HAE)
would be
0.5"0.6=0.3
in our example. Of course, we may as well consider the
inverse rule "low risk tolerance implies advanced age", that is
H ~ E , and its probability
P ( E [ H ) . As before we obtain P(E^H) = P ( H ) ' P ( E [ H ) Since ^ is commutative, the left sides of these two equations must be equal, hence also their right sides, that is P ( E ) ' P ( H ! E ) = P ( H ) ' P ( E [ H ) , or P ( H I E ) = P(H)'P(E]H)/P(E) This equation is called Bayes' theorem. See any book on probability theory (such as [deF]) for
207
more details. It provides the basis for reasoning about uncertainty in many expert systems such as M Y C I N , P R O S P E C T O R , etc. For this kind of application we have to consider a number of hypotheses
H1,...,H,
, each of which being conditional on
E = E I ^ . . . ^ E ~ . In
practice the hypotheses are selected such that they may be assumed to be mutually exclusive and exhaustive, i.e. in any scenario exactly one of them is assumed to hold. Further, conditional independence hypothesis
in
an
is assumed independent
which means that the pieces of evidence support each way,
expressed
formally
as
P(E1A...AE, t Hi)
=
P ( E t [ H ~ ) ' . . . ' P ( E , [ H i ) . U n d e r these assumptions one may derive a form of this theorem that allows to update the conditional probabilities whenever information becomes available (e.g. by experiments) that some of the evidences in fact hold; this means in other words that we may carry out the kind of reasoning discussed in section 1 for the simple case of rules (or Horn clauses), but at the same time calculate the probabilities for the derived statements. For more details see [DHN]. A useful way of viewing this formalism is an inference net in which the propositions describing pieces of evidence or hypotheses are represented as nodes and the relations among propositions become the links of the network. The probabilities are measures associated with the nodes. The updating of such a measure for one or more nodes upon arrival of new evidence causes a propagation of the change along the links until the net stabilizes again. Disregarding the fact that this approach is quite popular and successful, it has been shown that these assumptions are quite problematic, in fact may even lead to inconsistencies [Gly]. One way of avoiding these problems is described in [Kad]. But there are other problems with the Bayesian approach as discussed so far. They include the difficulty to distinguish uncertainty (about what we know) from ignorance (see next section for an example) as well as the fundamental problem how meaningful such probabilities (the "certainty factors") are in applications and where to take them with some reliability in the first place. Finally, this approach also is restricted to situations where the propositions can be arranged in a hierarchy with inference chains flowing smoothly from rough evidence through to conclusions which often is not the case in practical applications [Qui].
4.2.
T h e I)embmer-Shafer theory of evidence
The Dembster-Shafer theory of evidence [Sha] is a close relative of the Bayesian approach. Both take into account degrees for measuring certainty. As a main difference we note that in the Dembster-Shafer approach the probability distribution is assumed over all subsets of hypotheses rather than over all individual hypotheses as in the Bayesian approach. Suppose we are considering a world with four automobile makers, Nissan (N), Toyota (T), General Motors (G), and Chrysler (C), and want to determine the probability of who might dominate a new market [GoS]. Instead of considering a predicate D O M I N in order to express e.g. the dominance of Nissan and Toyota by
DOMIN({N,T})
we briefly write
{N,T} for
208
the same purpose. The set
D = {N,T,G,C}
is called the frame of discernment in this
approach. As in the Bayes approach it is assumed that the singleton hypotheses are mutually exclusive and exhaustive. Now assume evidence is somehow obtained that the probability of Japanese dominance is .4. In order to update the probabilities so that this new information is incorporated the following mechanism is applied. A basic probability assignment function subsets of D . Initially, m(D)=1.0 the evidence m({N,T})=.4
m
is introduced that allows to assign probabilities to
since no other information was available. After obtaining
the decrease in our ignorance is captured by updating
that its value continues to express exactly the degree of ignorance which is this case. probability
For all other subsets of
D
the value of m
is 0
m(D)
so
1.0 - .4 = .6 in
in the present situation. The
P expressing our degree of belief as in the Bayesian approach may be calculated
from the values of m
by adding the m-values of all subsets. For instance, P({N,T})=.4 and
P(D)=I.0 while this value is 0 for any other subset in this case. The question remains how one performs the updating as with complicated case.
m(D)
just before in a more
This is achieved with Dempster's rule of combination as follows. Suppose a
second evidence is obtained for the present scenario suggesting a dominance by {T,G,C} with a probability of .8, i.e.
m2({T,G,C})=.8
which leaves an ignorance
m2(D)=.2 . For distinc-
tion let us denote the m-function with the previous values by m 1 . What are the new m values on the basis of this additional evidence? The rule is to multiply the previous m-values (ml) for a subset
$1
with the m - v a l u e (m~) obtained from the new evidence in order to
obtain the combined m-value for the intersection
$1 with
S~ . In the present example this
rule gives us the following values.
m({T})=m~({N,TI)'m2({T,G,C})=. 4 +.8=.32 m({N,TI)=ma({N,T})'me(D)=.4".2=.08
m({T,G,C})=ml(D)"m~({T,G,C})=.6".8=.48 m(D)=ml(D)'m+(D)=.6'.2=. 12
For the remaining subsets the m-values continue to be zero. As before the degrees of belief are calculated by adding the m-values for all subsets. So we obtain for example
P({N,T}) = m({N,T}) + m({T}) + m({N}) = .08 + .32 + 0.0 = .4
Recently is has been shown in [Pea] that an appropriate view of the Bayesian theory in fact yields the same kind of flexibility that has been just demonstrated with the Dempster-Shafer theory. So it seems that both approaches are pretty much the same indeed. This includes the fact that both share the same disadvantages indicated at the end of the previous section.
209 I..3. Fuzzy logic Fuzzy logic has emerged form an attempt to develop a logic that models the fuzziness of natural language. This fuzziness is present in m a n y features of natural language, for instance in predicates such as "young", "intelligent", "blonde", or "elderly", "having low risk tolerance" mentioned already in section 4.1 above, but also in quantifiers such as "most", "some", "not very many", etc. While we have seen a way to cope with the fuzziness of predicates in the previous two approaches, there is nothing in them which suggests a way of dealing with such fuzzy quantifiers. Here fuzzy logic offers a single conceptual framework for dealing with those different types of uncertainty. In a sense it subsumes both predicate logic and probability theory. A n attempt to access the huge amount of papers on this topic might Start with [Zal,Za2]. Let us first consider the case of a fuzzy predicate, e.g. YOUNG(john). Fuzzy logic requires such a statement to be transformed in a canonical form which makes explicit the range of fuzziness. Here this range is the age of John within the interval say [0,100]. The canonical form is
YOUNG(john)--,age(john)=YOUNG
which is associated with a membership function
fy0u~0(u)=l-S(u;20,30,40) of a fuzzy set of the range [0,1001.
S
is a fixed continuous
function that is 0 upto 20, then grows to .5 at 30, and saturates at 40 reaching the value t that is kept until 100. For instance, f~ou~0(28) is approximately .7.
By itself it is meant to
express that .7 is the degree of compatibility of 28 with the concept labeled Y O U N G .
The
statement YOUNG(john) converts this meaning to that of expressing the degree of the possibility that John is 28.
In summary, for any such statement containing fuzzy predicates we
have to specify the canonical form of the statement, provide the ranges, and specify the parameters for the S-function which determines the possibility function f . Such possibility functions are then associated with fuzzy quantifiers such as "most" or "more than half". Further it is defined how two such functions are combined to yield one that characterizes a "quantifier" resulting from the application of a togieal inference rule such as the "quantifier" Q
in the following inference.
m o s t students are single m o r e - t h a n - h M f - o f the students are male
Q
students are single and male
Clearly, fuzzy logic indeed provides a coherent formalism for dealing with uncertainty. But the problems mentioned for the previous probabilistic approaches seem to be even more serious here where the probability technique covers even more features. I n this context we might question the membership functions, the rules of combinations as being fixed in a pretty much arbitrary way that does not adequately model the h u m a n way of coping with uncertainty.
210
4.4. P c r f ~
approaches
At the end of section 4.1 we have mentioned some of the difficulties that are encountered in the probabilistic approaches discussed so far. tt is not surprising, then, that a number of attempts have been made to cope with the phenomenon of uncertainty in a non-probabilistic way. These may in fact be regarded as more typical AI approaches since they more closely try to model what seems to be the h u m a n way of coping with uncertainty which in essence clearly is not a probabilistic one. If we ignore quantifiers to begin with, then the situation encountered in the previous sections consists of a number of propositions such as facts, rules or more complex statements. They are supposed to model some reality. For each particular application each of the proposition is either true or false, i.e. there is no such thing as being true with degree .6 . We only do not know for certain which of the two alternatives in fact applies. However often these propositions are not the only information available. In addition there may be some meta-knowledge about the propositions themselves such as experience that rule 1 is more reliable than rule 2 possibly based on statistical information derived from earlier applications. More important, even large knowledge-based systems in use today comprise only a very small fraction of the knowledge that is usually available for a h u m a n expert carrying out the tasks posed for such a system. For speculation let us assume that a system can be built comprising all this knowledge represented in form of propositional statements. Since even then we still would not know for many of these statements whether they are true or not, the only remaining way of deriving some conclusion would be to isolate consistent subsets of knowledge, to draw conclusions from each of them, to compare the results and arbitrarily decide which ones to prefer. One might consider many of the performance approaches as approximations to this general model of calculation. Especially the system Ponderosa [Qui] is based on this paradigm. It generates from the given set of statements maximally consistent sets of statements and separates them from the remaining statements. Although it does take into account measures of belief, there is no automatic selection of the "best" maximally consistent set; rather it provides those measures to the user as a filter and heuristic guide. A n obvious objection to this approach would be the exponential growth of possible subsets to be considered in the search for the consistent ones. This problem is overcome in Ponderosa again by a regress to the measures involved that are used here to restrict the search to result in the (currently) 10 "best" sets without the need to generate the remaining ones. Note that Ponderosa realizes one way of reasoning that tolerates inconsistencies, a topic that we already mentioned in section 2.5.6. While Ponderosa still involves measures as in the probabilistic approaches though with a drastically restricted function, the system S O L O M O N [Coh] is based on the model of endorsement
211
and uses no such measures anymore. As in a bureaucracy potential conclusions have to pass a n u m b e r of tests by meta-rnles that qualify it as positive ("pro"), negative ("con"), or irrelevant. The rules encode judgement, qualify the source and preciseness of the information, and other meta-knowledge of similar kind. When passing the structured net of rules the
test
results are collected in a "ledger-book'. The summing of these results is carried out in a deductive rather than a probalistic way. Only sufficiently endorsed conclusions eventually are allowed to pass the test. None of these systems involves a technique for dealing with fuzzy quantifiers as in fuzzy logic. That does not mean that the probabilistic treatment of such quantifiers provides the only solution to their formalization. For instance, in [BiS] a first-order solution for representing fuzzy quantifiers is outlined. For instance, the fuzzy quantifier "most" in "most students" is expressed as, informally, "all elements in a subset of" the set of students which in terms of cardinality is not very different from the whole set".
4.5. Engineering appm,~c'hes Often the meta-knowledge is not represented explicitly in current systems dealing with uncertainty but rather is encoded implicitly into the control strategy of the system. As we discussed in section 3.5 control is knowledge that naturally is interpreted as meta-knowledge, hence such systems in this sense take an approach like S O L O M O N except that they do not make the control knowledge explicit. Pattern-recognltion systems dealing with huge amounts of noisy information are often built in such a way. As an example we mention the system HEARSAY II (see [BaF]). Often such systems use a special system architecture known as the blackboard model developed during the HEARSAY project.
5. S U M M A R Y A N D C O N C L U S I O N S In this paper we have given an introduction to essentially four types of reasoning, viz. classical, non-monotonic, meta-, and uncertainty reasoning. In retrospect we might now wish to raise a n u m b e r of questions about this selection. First of all one might ask why we have chosen this particular sequence of topics. We admit that there is no absolutely convincing argument which separates this structure of presentation from others. The problem is the close interrelation among all these topics. For instance, meta-reasonlng in first-order logic clearly is classical reasoning and after amalgamation is even totally identical with it on the system level. Similarly, m a n y aspects of uncertainty reasoning can be interpreted the way discussed under the topic of non-monotonic reasoning which in turn may be formalized in terms of classical logic as we have seen in this paper. Yet the focus of interest is sufficiently different in each of these four types of reasoning to justify their separate treatment.
Anyway this separation is in line with common practice
except perhaps for meta-reasoning. The latter often does not enjoy the special attention that
212
we have spent on it. But we feet that its potential may have been underestimated in the past. Its treatment after non-monotonic reasoning rather than along with classical reasoning was done for didactic reasons in order to demonstrate its applicability to the phenomena described in section 2. Next we might ask whether these four comprise all the main types of reasoning. This is certainly not the case since there are many more kinds of reasoning that are sufficiently different from those to justify their presentation. O n e of them is inductive reasoning which however is included in the chapter by A. Biermann in this volume. Another one is reasoning by rewriting as in equality theories. These may be regarded as encoded forms of classical reasoning as we pointed out in section V.4 of [Bi3]. In the present volume they are treated in the chapter by G. Huet and to some extent also in that of M. Stickel. Some other kinds for lack of space will be mentioned only briefly in the following. As we have seen throughout the paper, for all the types discussed so far there was always a technique within the framework of first-order logic that handled it appropriately. This is the case also for those not mentioned so far. Hence their treatment is implicit in what we have presented before, This is to say that first-order logic provides the formalism that is flexible enough to allow the conceptual expression of many more kinds of inference. Of course, the formalism does not reveal by itself how this is to be achieved in detail. One further type of reasoning might be distinguished that occurs in problem solving and planning. Obviously, a problem may be formalized within first-order logic. But it is not obvious how our natural reasoning in such cases could be modeled in this framework. Yet a number of such techniques have indeed been suggested. Among them [Bi8] proposes a direct application of a classical theorem prover (as described here in section 1) that is subject to a certain restriction in its control (in other words some meta-knowledge is built into it - cf. section 3.5). A lot of reasoning is involved in programming and in reasoning about programs. P R O L O G has demonstrated that classical theorem provers as described in section t indeed are extremely useful tools in this context. But there are many more kinds of application in this wide field such as for program synthesis (which is closely related to inductive reasoning), program verification, program analysis. Other logics have been proposed for those purposes such as temporal logic, a kind of modal logic (see section 2.5.3). For good reasons we prefer the classical approach also in this context but for lack of space cannot go into further details of this large topic. In both previous applications of planning and programming time played a certain role, So we might ask how time can in fact be dealt with in a purely descriptive framework such as firstorder logic. Again we only can mention that convincing proposals do exist and have been used successfully in running systems - see [Sho] for a survey; as an example we mention [KoS] where time is captured by events and the periods marked through their occurrence.
213
Speaking of time might bring us next to space, viz. the physical space. Or, more generally, to qualitative physical laws and the reasoning about them [BdK], which again opens a whole new range of aspects; just think of reasoning about the behavior of liquids [Hay]. Before we interrupt this seemingly endless list we finally mention analogical reasoning [Win] which is a kind of meta-reasoning that allows for certain abstractions. As in all previous cases we see the first-order formalism as the appropriate framework for its treatment. In summary, we admit a strong bias towards the attempt to uniformly conceptualize all these different phenomena under the common framework of first-order logic. In order to appreciate this bias it is important to be aware of the fact that the deductive relation
I- as in K ]- K '
is really a relation that can be explored in various ways, not only in the axiomatic one where K
is assumed to be given and
and unknown
K'
K , or partially known
is derived or tested. We may also use it for given K and
K ' , and so forth.
K'
In addition there are vari-
ous ways of structuring like recta-inference as discussed in this paper. Because the variety of kinds of reasoning is so confusingly rich it appears that any other approach could simply not be carried out because of lack of uniformity and simplicity available here. With this latter remark we actually carry the discussion to the point of realizing all these variants in a hopefully single uniform system. There should be no doubt that we are far away from such an artifact. In fact the task seems to be so complex that it is hard to imagine how it might be put together by h u m a n minds. We believe that this is possible only if the system is of such a kind that it allows to assemble the pieces of knowledge in arbitrary order, one after the other. Seven pieces of factual and rule knowledge, one meta-level piece talking about them, then a piece of control knowledge, followed by another 42 pieces of domain knowledge, a rule of judgement, and so forth, just to illustrate what we mean. This requirement singles out a form of knowledge representation that is extremely modular on the one hand, but also reflects the tightly woven net of relationships among aI1 these pieces of knowledge. First-order logic clearly enjoys the modularity needed, but does it also support the connections? O n the surface of the representation it does not indeed. But once the representation is transformed into an internal form, for instance as a dag (directed acyclic graph), then these connections become visible, at least to the system as we showed in section 1.6 of [Bi7]. Since this part may be implemented completely independent from the particular knowledge to be represented we see that the first-order formalism does indeed support both requirements in an ideal way. With respect to the architecture of the knowledge base of such a system it seems that a hierarchical structure for the various parts would be best suitable as we argued in [Bi5]. On the bottom we would have the clusters of domain knowledge. O n top of these would be what we call deductive knowledge that stores preprocessed deductive information, so that costly search has not to be repeated many times. O n the next level we would have meta-level knowledge of judgement. And so forth, until on the top level all is brought together by a
214 central control. There are still so many open problems in important issues of detail for most of the features of reasoning discussed in this paper that it might seem premature to speculate in such a way about a uniform system comprising all these forms of reasoning. But speculation here is meant to play the role of a heuristic that guides our judgement about which of the many problems to attack with higher priority than others. If this paper has contributed to see all these problems in a common context it has fulfilled its purpose. Acknowledgment. The typscript is due to A. Bentrup and W. Fischer. REFERENCES
[BaF] Barr, A.B., Feigenbaum, E.A. (eds.), The Handbook of Artificial Intelligence, 1, W. Kaufmann, Los Altos (1981). [Bet] Beth, E.W., The foundations of mathematics, North-Holland, Amsterdam (1965). [Bil] Bibel, W., Programmieren in der Sprache der Pr/idikatenlogik, Habilitationsarbeit (abgelehnt), Technische Universit/it Mfinchen (1975); shortened version: Pr/idikatives Programmieren, LNCS 33, Springer, Berlin, 274-283 (1975). [Bi2] Bibel, W., A uniform approach to programming, Report No. 7633, Technische Universitht Mfinchen, Abtlg. Mathematik (1976). [Bi3] Bibel, W., Automated theorem proving, Vieweg, Braunschweig (1982). [Bi4] Bibel, W., Matings in matrices, CACM 26, 844-852 (1983). [Bi5] Bibel, W., Knowledge representation from a deductive point of view, Proc. I IFAC Symposium Artificial Intelligence (V. M. Ponomaryov, ed.), Pergamon Press, Oxford, 37-48 (1984). [Bi6] Bibel, W., First-order reasoning about knowledge and belief, Proc. Int. Conf. Artificial Intelligence and robotic control systems (I. Plander, ed.), North-Holland, Amsterdam, 9-16 (t984). [Bi7] Bibel, W., Automated inferencing, J. Symbolic Computation 1, 245-260 (1985). [Bi8] Bibel, W., A deductive solution for plan generation, New Generation Computing 4
(1986). [BoK] Bowen, K.A., Kowalski, R., Amalgamating language and meta-language in logic programming, Logic Programming (K.L. Clark, S.-A. T~irntund, eds.), Academic Press, London, 153-172 (1982). [BOW] Bowen, K.A., Weinberg, T., A recta-level extension of PROLOG, Technical Report, CIS-85-t, Syracuse University (1985).
215
[BdK] Brown, J.S., de K.leer, J., The origin, form, and logic of qualitative physical laws, IJCAI-83 (A. Bundy, ed.), Kaufmann, Los Altos, 1158-1169 (1984). [Bun] Bundy, A., The computer modelling of mathematical reasoning, Academic Press (1983). [Cla] Clark, K.L., Negation as failure, Logic and Data Bases (H. Gallaire et al., eds.), Plenum Press, New York, 293-322 (1978). [C1M] Clark, K.L., McCabe, F.G., The control facilities of IC-PROLOG, Expert systems in the Microelectronic Age (D. Michie, ed.), Edinburgh University Press (1979). [Coh] Cohen, P.R., Heuristic reasoning about uncertainty: an Artificial Intelligence approach, Pitman, Boston (1985). [deF] de Finetti, B., Theory of probability, vol. 1, Wiley, London (1974). [Doll Doyle, J., A truth maintenance system, Artificial Intelligence 12, 231-272 (1979). [Do2] Doyle, J., Circumscription and implicit definability, Non-monotonic Reasoning Workshop, AAAI, 57-67 (1984). [DHN] Duda, R.O., Hart, P.E., Nilsson, N.J., Subjective Bayesian methods for rule-based inference systems, Techn. Note 124, SRI International, AI Center, Menlo Park; also: Proc. NCC, AFIPS Press (1976). [EMR]
Etherton, D.W., Mercer, R.E., Reiter, R., On the adequacy of predicate cir-
cumscription for closed-world reasoning, Proc. Non-monotonic Reasoning Workshop, AAAI, 70-81 (1984). [Fef] Feferman, S., Toward useful type-free theories I, JSL 49, 75-111 (1984). [Gall Gallagher, J., Transforming logic programs by specialising interpreters, Report, Dept. Computer Science, University of Dublin (1984). [GaLl Gallaire, H., Lasserre, C., Meta-level control for logic programming, Logic Programming (K.L. Clark, S.-A. T~rnlund, eds.), Academic Press, London (1982). [GeG] Genesereth, M.R., Ginsberg, M.L., Logic Programming, CACM 28, 933-941 (1985). [Gen] Gentzen, G., Untersuchungen fiber das logische Schliessen, Mathem. Zeitschr. 39, t76-210, 405-431 (1935). [Gly] Glymour, C., Independence assumptions and Bayesian updating, Artificial Intelligence 25, 95-99 (1985). [CoS] Cordon, J., Shortliffe, E.H., The Dempster-Shafer theory of evidence and its relevance to expert systems, Rule-based Expert Systems (B.G. Buchanan, E.H. Shorttiffe, eds.), Addison-Wesley, Readings, ch. 13 (1984). [Gre] Green, C.C., Theorem proving by resolution as a basis for question-answering systems, Machine Intelligence 4, Elsevier, New York, 183 - 205 (1969).
216 [Gro] Grosof, B., Default reasoning as circumscription, Proc. Non-monotonic Reasoning Workshop, AAAI, 115-124 (1984). [Haa] Haas, A.R., A syntactic theory of belief and action, Artificial Intelligence 28 (1986). [Hay] Hayes, P.J., Naive physics I - Ontology for liquids, Formal Theories of the Commonsense World (Hobbs, J.R., Moore, R.C., eds.), Ablex (1984). [Hin] Hintikka, J., Knowledge and belief: An introduction to the logic of the two notions, Cornell University Press (1962). [JLL1 Jaffar, J., Lassez, J.-L., Lloyd, J., Completeness of the negation as failure rule, IJCAI-83 (A. Bundy, ed.), Kaufmann, Los Altos, 500-506 (1983). [Kad] Kadesch, R.R., Subjective inference with multiple evidence, Artificial Intelligence 28
0986). [Kow] Kowalski, R.A., Sergot, M., A logic-based calculus of events, New Generation Computing 4, 67-95 (1986). [Kri] Kripke, S., Semantical analysis of modal logic, Zeitschrift f. Mathem. Logik u. Grundlagen der Mathem. 9, 67-96 (1962). [Lev] Levesque, H., A logic of knowledge and active belief, Proc. AAAI-84 (1984). [Lil] Lifschitz, V., Computing circumscription, Proc. IJCAI-85, Kaufmann, Los Altos, 121127 (1985). [Li2] Lifschitz, V., On the satisfiability of circumscription, Artificial Intelligence 28, 17-27
(1986), [Llo] Lloyd, J.W., Foundations of logic programming, Springer, Berlin (1984). [Mcl] McCarthy, J., First-order theories of individual concepts and propositions, Expert Systems in the Micro-electronic Age (D. Michie, ed.), Edinburgh University Press, 271-287 (1979). [Me2] McCarthy, J., Circumscription - a form of non-monotonic reasoning, Artificial Intelligence 13, 27-39 (t980). [Me3} McCarthy, J., Applications of circumscription to formalizing common sense knowledge, Proc. Non-monotonic Reasoning Workshop, AAAI, 295-324 (1984). [MiP] Minker, J., Perlis, D., Completeness results for circumscription, Artificial Intelligence 28, 29-42 (1986). [Moo] Moore, R.C., Semantical considerations on non-monotonic logic, IJCAI-83 (A. Bundy, ed.), Kaufmann, Los Altos, 272-279 (1983). [Pea] Pearl, J., On evidential reasoning in a hierarchy of hypothesis, Artificial Intelligence 28,
9-16 (1986).
217
[Per] Perlis, D., Languages with self-reference, Artificial Intelligence 25, 301-322 (1985). [Qui] Quinlan, J.R., Internal consistency in plausible reasoning systems, New Generation Computing 3, 157-180 (1985). [Rel] Reiter, R., A logic for default reasoning, Artificial Intelligence 13, 81-132 (1980). [Re2] Reiter, R., Circumscription implies predicate completion (sometimes), Proc. AAAI-82, 418-420 (1982). [Re3] Reiter, R., Towards a logical reconstruction of relational database theory, On Conceptual Modelling:
perspectives from Artificial Intelligence,
databases,
and programming
languages (M.L. Brodie et al., eds.), Springer, Berlin, 191-238 (1983). [Sch] Schtitte, K., Proof theory, Springer, Berlin (1977). [Sha] Shafer, G., A mathematical theory of evidence, Princeton University Press, Princeton (1976).
[She] Shepherdson, J.C., Negation as failure: A comparison of Clark's completed data base and Reiter's closed-world assumption, Report PM-84-01, School of Mathematics, University of Bristol (1984). [Silo] Shoham, Y., Ten requirements for a theory of change, New Generation Computing 5, 467-477 (1985). [Tur] Turner, R., Logics for Artificial Intelligence, E. Horwood, Chichester (1984). [Wey] Weyrauch, R., Prolegomena to a theory of mechanized formal reasoning, Artificial Intelligence 13, 133-197 (1980). [Win] Winston, P.H., Learning and reasoning by analogy, CACM 23, 689-703 (1979) . [Zal] Zadeh, L.A., A computational approach to fuzzy quantifiers in natural languages, Comp. & Maths. with Appls. 9, 149-184 (1983). [Za2] Zadeh, L.A., The role of fuzzy logic in the management of uncertainty in expert systems, Fuzzy Sets and Systems 11, 199-227 (1983).
PART THREE Knowledge Programming
T e r m Rewriting as a Basis for the Design of a Functional and Parallel Programming Language. A case study : the Language FP2
Philippe Jorrand LIFIA Institut National Polytechnique de Grenoble
FOREWORD
The semantic elegance and the mathematical properties of applicative and functional programming languages are now widely recognized as relevant and useful qualities for implementing the large and complex algorithms of the kind encountered in artificial intelligence.
On the other hand, it is also being realized that many of the problems solved by these algorithms, like automated reasoning, and some major application areas related to artificial intelligence, like computer vision, are in fact considered in a distorted and limited way because of the implicit and ever present hypothesis that they have to be solved in a sequential way by a single processing engine. This is one reason why parallelism has become a highly active topic for research in programming languages and methodology. Another reason is that the design of machine architectures with massive parallelism is also becoming a feasable task because of the progress in VLSI technology.
However, the history of languages has put parallelism, communication and synchronization on a separate path from nice and clean applicative and functional programming. One difficulty is then to reconcile these seemingly antagonist styles.
222
It is such a unified framework for both functional and parallel programming which is presented here. It takes the form of a language, called FP2, which is entirely based on the notion of terms for representing the objects of the language, and on the mechanics of term rewriting for representing the operational semantics.
Part of the work on FP2 is carried out in the context of ESPRIT Project 415, where FP2 is the basic tool for designing and implementing a parallel inference machine. This presentation is partly drawn from : "FP2 : the language and its formal definition", a working document written for ESPRIT Project 415. This presentation has the format of an informal language description and it does not contain references inserted in the text. It is followed by a bibliography on topics related to the essential questions raised by the design of such a language.
223
I _ OVERVIEW. The main language styles under active study for new generation programming can be visualised on a triangle. Vertices represent "pure" programming styles (i. e. functional, parallel and logic). The edges represent "mixed" programming styles, where 2 "pure" styles are explicitly present in a single language.
9),,, unct.ionalpro~: :,.ammlno ... /"
FP2 /
",.,,%.
.)/"
¥ Parallel programming
",\.
-
"-, ") Logic programming
FP2 is on the edge joining functional programming and parallel programming. It must be distinguished from other functional languages where data flow is used for taking advantage of possible parallelism during evaluation. In FP2, on the contrary, both functional programming and parallel oro2ramminl are explicitly present and can be independently expressed using specific constructs in the language. Furthermore, FP2 is a typed language allowing polymorphic algebraic type definitions, polymorphic functions and polymorphic communicating processes. Finally, the "declarative" style of FP2 and its semantics give to that language the qualities of both a programming language and a specification language. The semantics of FP2 rely on term algebras and on rewrite systems. This establishes a sound basis for designing and implementing formal verification tools, like full static type checking, static analysis of dynamic behavior (deadlocks, livelocks.... ) and proof of implementation correctness (comparison of a specification in FP2 with an implementation, also in FP2).
224
The main characteristics of FP2 can be summarized as follows "
FP2 is a functional programming language. Values in FP2 are represented by terms and basic function definitions have the form of rewrite rules. Function applications are terms containing defined function names : rewrite rules reduce function applications to terms containing no function application. Functional forms using second order functional operators and function names can be written and named : such higher form for constructing and defining functions has its semantics defined in terms of basic function definitions (i. e. rewrite rules).
FP2 is a parallel programming language. Independant communicating processes can be defined and networks of them can be constructed. A process is able to send and to receive messages to and from its environment. These messages How through ports owned by the process. Messages are values : they are represented by terms, they are built and reduced according to functional programming in FP2. Describing a process requires both describing the possible orders in which its ports may be used (sequentiality, non determinism, simultaneity) and describing sent messages by applying functions to received messages : basic process definitions accomplish all of this within a single formalism, namely rewrite rules. "Process forms" using "process operators" and process names can be written and named : such higher level form for constructing and defining processes denotes, in general, a network of processes (e. g. systolic arrays) and has its semantics defined in terms of basic process definitions (i. e. rewrite rules).
FP2 is a typed programmin~ language. Every term representing a value in FP2 is typed and every function has a domain and a range defined by types. All terms of a given type are thus results of applying functions having their range in that type • on that basis, types in FP2 are defined as term algebras. A type definition introduces the constructor operations for objects of that type ' terms containing only constructors are normal forms for terms containing function applications. Elaborate type structures built by means of "type forms" using "types operators" and types names can be written and named.
225
FP2 is a oolvmorDhic pro~rammin~ language. Type definitions, function definitions and process definitions may be parameterized by types : such definitions are called "polymorphic". In order to guarantee the type correctness of polymorphic definitions and for establishing the proper bindings, a notion of "property" is introduced : a property characterizes a class of types by defining a minimal algebraic structure that all the types of the class must have. Arbitrary properties may be defined. Once a property is defined, it may be used for specifying the class of actual types a formal type parameter of a polymorphic definition may be bound to.
FP2 is a modular programming language. Function, process, type and property definitions can be grouped inside "modules" which can be assembled in a hierarchical manner. Modules may export definitions to ascendent modules and may hide definitions to descendent modules. Modules form the basis for a strict control of visibility within FP2 programs.
226
2 _ TYPES, Values are represented by terms and FP2 operates on values by applying functions to terms. There are two kinds of functions : constructors and operations. Terms containing operation applications should always be reduceable to terms containing only constructors. The reductions corresponding to a given operation are defined by rewrite rules. Terms are typed and every function has a domain and a range, both of which are types. Thus, terms of a given type are all results of applying functions ranging in that type. Formally, types are term algebras and, with the rules defining operations, every term is congruent to a term containing only constructors. The basic form of type definition provides a signature for constructors and, possibly, for operations involving objects of the type. It also provides the rules defining the reductions for these operations. In addition to this reasonnably classical basis for algebraic type definition, FP2 also provides ways of constructing new types from existing types, using cartesian product, sequence and union type building "operators".
2. I _ Basic type definitions, A basic type definition presents a term algebra. It provides : - The name t of the type ; The names, domains and ranges (necessarily t) of the constructors of t ; The names, domains and ranges of operations involving t ; - The names and types of variables used in the rules for operations ; Rewrite rules defining the reduction of terms containing operation applications.
-
-
The left and right members of rules are separated by "==>" signs: Rules should be written in such a way that terms containing operation applications can be reduced to terms containing only constructors : this is an important question which has been studied in a number of places, especially in connection with algebraic data types and with terms rewriting systems. It will not be discussed here.
227
An example is the t y p e "Nat" of Natural integers (assuming that the type Bool of booleans is defined with constructors "true" and "false", and with operations "or", "and" and "not") •
Nat 0 • -) Nat succ • Nat -) Nat oons add, mul " N a t × N a t = ) Nat eq, leq " Nat × Nat -) Bool max • N a t × N a t = ) Nat I " -) Nat • Nat VarS m, n rules add(0,n) ==> n add(succ(m),n) ==> succ(add(m,n)) mul(0,n) ==> 0 mul(succ(m),n) ==> add(mul(m,n),n) leq(0,m) ==> true leq(succ(m),0) ==> false leq(succ(m),succ(n)) ==> leq(m,n) eq(m,n) ==> and(leq(m,n),leq(n,m)) max(re,n) ==> i f leq(m,n) t h e n n e.lsem endif I ==> suce(0) cons
endtype This example should not imply that integers have to be r e p r e s e n t e d as succ(succ(...)) - the usual decimal notation and infix arithmetic operators can also be used. This is also the case for <, ( .... and the boolean operators.
228
As another example, binary trees with natural integers at their leaves can be described by "
type Btree tip : Nat fork : Btree xBtree 0PnS maxt : Btree m : Nat vats u, v : Btree ru~Is maxt(tip(m)) maxt(fork(u,v)) endtvDe cons
The tree pictured as
-) Btree -~ Btree -) Nat
==>
m
==>
max(maxt(u),maxt(v)}
-
.
/
\
'...
f
1"T-)
/, "-.,,.. \ j<
,.U
\',%,.,
d~
would be constructed by • fork(tip(3), fork(fork(tip( I },tip(4)), tip(2)})
A general method for writing rules is that the left members apply each defined operations to disjoint cases of constructors, whereas the right members may have any format including conditional expressions.
229
2.2 _ Type forms. In addition to e l e m e n t a r y types defined by means of basic type definitions, it is possible to define constructed types by means of type forms w h e r e the operands are type names and the operators are type operators. There are three such operators : I _ Cartesian product. If tl,t2,...,t n are types then tl×te×...xtn is also a type, the cartesian product of tl,t2,,_,t n. If Xl,X2,...,X n denote objects of types tl,t2,...,t n respectively, then (xl,x2,...,x n) denotes an object of type t I × tax...× t n. It must be noted that t i ×t2×t 3, (t I ×t~)×t~ and t I ×(taxt 3) are distinct types. 2 _ Sequence. If t is a type, then t* is the type of sequences with elements of type t. If x denotes an object of type t and if s denotes a sequence of type t*, then x.s denotes a sequence of type t*. The notation nil : t* denotes the e m p t y sequence of type t'. It is simply written nil w h e n the type t* is k n o w n from the context. If Xl,X2,...,X n denote objects of type t, then [xl,x2,...,x J
denotes
the
sequence
of
type
t*
constructed
by
x t.(x2.(....(xn.nil)...)). 3 - Union. If tl,ta,...,t n are types then tlit21...It n is the union of types tl,t2,...,t n. There is no special w a y of denoting an object of a union type : w h e n in a context w h e r e an object of type tllt2l...Itn is required, then an object of type t i, or an object of type t 2, or .... or an object of type t n may be provided. The types tllt2lt3, (tllt2)lt 3 and tll(t2lt 3) are all equivalent, and tilt I is the same as t v
230
Type expressions can be used in any context w h e r e a type may be written ; examples of this have already appeared above with functions having cartesian products as their domains. It is also possible to define names standing for type expressions, by means of type declaration, like in • type Snat is Nat* type Ssnat is (Nat*)* For example, a w a y of describing trees of variable arity with natural numbers or booleans at e v e r y node could be • type Vtree is (NatlBool) × Vtree* Given
a 'type
t=tiltaL.It =,
m/>l
and
a
type
t'=t'llt'21_.lt' n, n/>l, where
tt,t2,...,t,n,t'1,t'2,.o.,t' n are not union types, then t is compatible with t' if, for all i, there is a j such that ti=t' j , w h e r e "=" is syntactic equality, modulo type declarations. The notation " t =_ t' " stands for " t is compatible with t' " Thus, given the type declaration • type t is u, both relations t ~_ u and u c_ t hold.
231
5 - FUNCTIONS. Operations may be defined within basic type definitions. But once a type is defined, additional operations on it may be defined separately. The elementary form of operation definition, called "basic operation definition", follows the same general approach as operations defined within types, namely rewrite rules. In addition to basic operation definitions, FP2 also allows the construction of other operations, by means of second order functional forms which apply functional operators to function names. 5. I _ Basic operation definitions...,. A basic operation definition provides : the name, domain and range of the n e w operation ; the names and types of variables used in the rules for that operation ; the rules defining the reductions of terms containing applications of that operation. For example, a new operation on Nat's can be introduced by • oi:)
min vats rules
• N a t × N a t -) Nat m,n • Nat min(m,n) ==> if m,
endoo An operation replacing all natural integers at the leaves of a Btree t by maxt(t) would be • 912
repmax • Btree -) Btree vars t • Btree rules repmax(t) ==> rep(t,maxt(t)) endoo o_~ rep vats rules endop
• Btree x Nat -) Btree m , n • Nat u , v • Btree rep(tip(m),n) ==> tip(n) rep(fork(u,v),n) ==> fork(rep(u,n),rep(v,n))
232
Another, more elaborate example, shows a w a y of programming unification of t e r m s in FP2. Terms are structured objects like t and u below •
t.
/
/"
/
..
,,.. XV
/ ,"~.fz ,.<, /
.,./" x'%L
\,
/
i
"
f~/ ",," \
",,
.
.
-
2
/.
'\ x,
',a4
"\,
/"
,.:..
./'" >' ~'".v al
/
"""x "
~":J'2
v4)
/
~
/
/
//
al
"""\.
'\" a 3
The labels fi r e p r e s e n t b i n a r y function names, a i r e p r e s e n t constant n a m e s and v i r e p r e s e n t variable names. For simplifying the example, the algorithm assumes that each variable n a m e appears at most once in (t,u). The result of unifying t and u is computed by unify(t,u). In the case of t and u above it succeeds and results in a sequence of assignments to variables denoted b y :
[(
i ' ) *"a
t7 ") 2 '4 There are other cases, w h e r e unification results in a failure. The types for terms can be defined as follows :
t y p e Term is Const I Vat I Applic t y p e Const cons a - Nat -) Const endtype Woe Var cons v - Nat -) Var endtype t y p e Applic is Funct × Term x Term type Funct cons f " Nat =) Funct endtype
233
The possible results of unification have the type : type Result is Assign* [ Failure type Assign is Var x Term type Failure cons fail : =) Failure endtvDe While unifying two terms, it will be necessary to combine the results of unifying subterms. This is accomplished by an operation "+" : o_~ + vats rules
: r : u,v : r+fail fail+ u u +v
Result x Result -) Result Result Assign* ==> fail ==> fail ==> append(u,v)
endoo This basic operation definition shows an example of operation overloading : the operation +, which is already defined on Nat's, gets here another definition attached to it. When + is applied, the choice among these definitions is determined by the types of the operands. Definitions leading to possible ambiguities are not permitted. This definition also shows a use of union types : w h e n + is applied on Results, each of its operands may be of type Assign* or of type Failure : the case analysis on operand types is made by the type of variables used in the left members of the rules.
234
Finally, the unification operation is ' o_~
unify vars
ru~s
' T e r m × T e r m -) Result t,u,v,w • Term i, j • Nat c • Const x • Var h • Applic unify(a(i),a(j)) ==> if i=j t h e n nil else fail e n d i f unify(c,x) ==> [(x,c)] unify(c,h) ==> fail unify(x,t) ==> [(x,t)] unify(h,c) ==> fail unify(h,x) ==> [(x,h)] unify((f(i),t,u), (f(j),v,w)) ==> if_. i=j t h e n unify(t,v) + unify(u,w) els___~efail endif
endop 3.2 - Functional forms. FP2 provides functional functional forms.
operators
for
combining
defined
functions
into
Let r, s, t, t I, t 2..... t a be types. T h e r e are eight such operators • I _ Composition. If f and g a r e f u n c t i o n s w i t h f : t - ) r andg:r =) s, t h e n ( g o f ) is a functional f o r m denoting an operation in t --) s. If x is a t e r m of t y p e t, t h e n the r e d u c t i o n of (gof)(x) is the r e d u c t i o n of g(f(x)). 2 _ Condition. If p, f and g are functions with p • t -) Bool, f - t --) r and g • t -) r, t h e n (p=) f ; g) is a functional f o r m denoting an o p e r a t i o n in t -) r. If x is a t e r m of t y p e t, t h e n t h e r e d u c t i o n of (p =) f ; g)(x) is t h e r e d u c t i o n of if p(x) t h e n f(x) else g(x) endif.
235
3 - ~artesian Product construction. If fl, f~...... fn are functions w i t h fi : t =) t i, t h e n (fl, f2 ..... fn) is a functional f o r m denoting an operation in t ~ t I x t 2 x ... x t n. If x is a t e r m of type t, t h e n the reduction of (fl,f2,...~a)(X) is the reduction of (fl (x),f2(x),...Jrn(x)). 4 _ Seauence construction. If fl, fa ..... fn are functions w i t h fj : t -) r, t h e n If j, f~...... fn] is a functional form denoting an operation in t -) r*. If x is a t e r m of type t, t h e n the reduction of Ill,r2, ..., fn](x) is the reduction of [f)(x),f2(x),..., fn(x)]. If n=O, t h e n the reduction of [](x) is the reduction of lnill(x). 5 - Constant. If x is a t e r m of t y p e t, t h e n Ixl is a functional form denoting an operation in r-)t, for any type r. If y is a t e r m of any t y p e r, t h e n the reduction of lx|(y) is the reduction of x . 6 _ Map.
If f is a function with f : t =) r, t h e n o((f) is a functional form denoting an operation in t* -) r*. If x is of type t*, t h e n t h e r e are two cases : (i) if x reduces to nil t h e n 0((f)(x) reduces to nil, and (ii) if x reduces to u.y t h e n the reduction of 0{(f)(x) is the reduction of f(u).0((f)(y). Insert. If f is a function with f : t x t -) t, t h e n l(f) is a functional form denoting an operation in t* -) t. If x is a non e m p t y sequence of t y p e t*, t h e n there are t w o cases : (i) if x reduces to u.nil t h e n / ( f ) ( x ) reduces to u, and (U) if x reduces to u.(v.y) t h e n the reduction of /(f)(x) is the reduction of f(u,l(f)(v.y}). If x is the e m p t y sequence of t y p e t*, t h e n l(f)(x) raises the exception "inserLerror". 7 _
8 - Partial application. If f is a function with f : t1 × t2 × ... × tn -> t, then :f(x1, x2, ..., xn), where each xj is either a term of type tj or a ".", is a functional form denoting an operation in ti × tj × ... × tl -> t, where ti × tj × ... × tl is obtained by keeping in t1 × t2 × ... × tn the tk such that xk is a ".". If yk is a term of type tk such that xk is a ".", then the reduction of :f(x1,x2,...,xn)(ym,...,yp) is the reduction of f(z1,z2,...,zn) where zi = xi if xi is a term and zi = yi otherwise.
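Several of these operators have direct counterparts as Python higher-order functions. The sketch below (names are this sketch's own, not FP2's) shows composition, condition, constant, map and insert; insert follows the right-folding reduction rules given above.

    def compose(g, f):                 # (g o f)(x) reduces like g(f(x))
        return lambda x: g(f(x))

    def condition(p, f, g):            # (p => f ; g)(x) = f(x) if p(x) else g(x)
        return lambda x: f(x) if p(x) else g(x)

    def constant(c):                   # |c|(y) reduces to c for any y
        return lambda _y: c

    def fmap(f):                       # α(f) maps f over a sequence
        return lambda xs: [f(x) for x in xs]

    def insert(f):                     # /(f) folds a non-empty sequence
        def apply(xs):
            if len(xs) == 0:
                raise ValueError("insert_error")   # mirrors the raised exception
            if len(xs) == 1:                       # /(f)(u.nil) reduces to u
                return xs[0]
            return f(xs[0], apply(xs[1:]))         # f(u, /(f)(v.y))
        return apply

For example, with non_empty = lambda xs: len(xs) > 0 playing the role of (not o null), the form sigma defined later in this section corresponds to condition(non_empty, insert(lambda a, b: a + b), constant(0)), and sigma([1, 2, 3]) evaluates to 6.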
The semantics of functional operators are defined by considering that functional forms are second order expressions which can be "evaluated". This is possible in FP2, where basic operation definitions constitute a more elementary form of operation description: evaluating functional forms means producing basic operation definitions. Let F(x) be a term where F is a functional form. The evaluation of F is guided by its syntax: dummy operation names f0, f1, ... are generated, one for each syntactical sub-form in F, where f0 is the operation for F itself. Then, basic operation definitions for f0, f1, ..., with their respective names, domains, ranges, variables and equations can be mechanically produced. In fact, defining one of these functions fi is necessary only when fi corresponds to a map or insert functional operator and in the case of recursive functional forms. Finally, F(x) is replaced by f0(x) which has its reduction defined by the generated basic operation definition. For example, given the following basic operation definition:
op    null : Nat* -> Bool
vars  m : Nat
      s : Nat*
rules
      null(nil) ==> true
      null(m.s) ==> false
endop

the evaluation of ((not o null) => /(add) ; |0|) produces:
op    f0 : Nat* -> Nat
vars  v0 : Nat*
rules
      f0(v0) ==> if not(null(v0)) then f1(v0) else 0 endif
endop
op    f1 : Nat* -> Nat
vars  v0, v1 : Nat
      v2 : Nat*
rules
      f1(v0.nil)     ==> v0
      f1(v0.(v1.v2)) ==> add(v0, f1(v1.v2))
      f1(nil)        ==> ! insert_error
endop

If x is a sequence of type Nat*, then ((not o null) => /(add) ; |0|)(x) reduces to the sum of the elements of x, or to 0 if x is nil. Functional forms can be used in any context where function names may be written. It is also possible to define names standing for functions built by functional forms:

op sigma is ((not o null) => /(add) ; |0|)
op pi    is ((not o null) => /(mul) ; |0|)
op sigpi is (sigma o α(pi))

Recursive operation definitions with functional forms fit quite naturally in that framework. For example, given

op l1 is (leq o (id, |1|))

where id is the identity function on natural numbers, and
op    pred : Nat -> Nat
vars  m : Nat
rules
      pred(0)       ==> 0
      pred(succ(m)) ==> m
endop

op p1 is pred
op p2 is (pred o pred)
Fibonacci numbers can be computed by:

op fib is (l1 => id ; (add o ((fib o p1),(fib o p2))))

This definition produces:

op    fib : Nat -> Nat
vars  v0 : Nat
rules
      fib(v0) ==> if l1(v0) then id(v0) else add(fib(p1(v0)),fib(p2(v0))) endif
endop
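For comparison, a direct Python transcription of the generated definition, under the reading that l1 tests whether its argument is at most 1 and p1, p2 are the first and second predecessor functions:

    def pred(m):
        return 0 if m == 0 else m - 1

    def fib(v0):
        if v0 <= 1:                    # l1(v0)
            return v0                  # id(v0)
        return fib(pred(v0)) + fib(pred(pred(v0)))   # add(fib(p1(v0)), fib(p2(v0)))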
4 - PROCESSES. The elementary component for organizing parallel computations in FP2 is the process. A process has ports through which messages may flow in and out. Messages are values, they are represented by functional terms, they have types and they are built and reduced according to functional programming in FP2. Messages arrive at ports or leave ports along directed connectors having their destinations or their origins attached to these ports. Each connector allows messages of a certain type, which may be a union type. The transportation of one message along one connector is a communication. There is no such notion as the duration of a communication. Describing a process is describing its ability to perform communications along the connectors attached to its ports. In addition to applying functions to received messages for computing sent messages, this also involves sequencing, non determinism and parallelism in the ordering among communications. Formally, a process is a state transition system which can be viewed as a graph: nodes represent states, multiple branching represents non determinism and arcs are labelled by events, where an event is a set of communications occurring in parallel. This graph is in general infinite and every path represents a possible sequence of events: one set after the other of communications occurring in parallel. The basic form of process definition provides a description of the connectors of the process and it makes use of rewrite rules for describing the non deterministic state transition systems: the rules, labelled by (possibly empty) events, rewrite states. In addition to basic process definitions, the language allows definitions of processes built by combining other processes into process forms by means of process operators.

4.1 - Basic process definitions. A basic process definition describes a transition system, with transition rules, where the events are sets of communications along typed connectors. It provides:
- the name N of the process;
- the names and message types of the input, output and internal connectors of N;
- the names and domains of state constructors used in the rules of N;
- the names and types of variables used in the rules of N;
- rules defining the transitions of N.
As an example, let STACK be a process.
It has an input connector I and an output connector O. The communication of a message v, where v is a functional term of type t, along a connector k of message type t is denoted by k(v). For example, if both I and O may communicate Nat's, then I(0) and O(succ(0)) denote communications. An event is composed of a set of communications k1(v1)...kn(vn), where k1,...,kn are n distinct connectors. A term of the form Q(u1,...,um), where Q is a state constructor and where the ui's are functional terms of the correct types for the domain of Q, is called a predicate. A predicate without variables in the ui's is a state. State constructors cannot appear in the ui's. Rules are composed
of three parts: a predicate R(u1,...,um) called the pre-condition, an event k1(v1)...kn(vn) and a predicate S(w1,...,wp) called the post-condition. They have the general format:

R(u1,...,um) : k1(v1)...kn(vn) ==> S(w1,...,wp)

If ki is an internal or output connector, all variables appearing in vi must appear in R(u1,...,um) or in vj such that kj is an input connector. The same must be true for the variables in S(w1,...,wp). Since an event is a set, it may be empty. In that case, the rule is an internal rule, of the form:

R(u1,...,um) ==> S(w1,...,wp)

Furthermore, among the rules of a process, there must be at least one initial rule, without pre-condition and without event, and where the post-condition is a state:

==> S(w1,...,wp)
For example, let the process STACK be an unbounded stack of Nat's. It is initially empty and when a Nat arrives along I, it may be written into STACK. When STACK is not empty, the last arrived Nat may be read from it along O. Writing and reading are mutually exclusive. A basic process definition for this "Last In First Out" STACK may then be:

proc   STACK
in     I : Nat
out    O : Nat
states S : Nat*
vars   e : Nat
       v : Nat*
rules
                ==> S (nil)
S(v)   : I(e)   ==> S (e.v)
S(e.v) : O(e)   ==> S (v)
endproc

Rules in a process N describe a transition system in the following way:

0 - Initially, one of the initial rules in N is chosen. This choice is non deterministic. The post-condition of the chosen rule becomes the current state of N. Then repeat steps 1, 2 and 3.

1 - The current state q of N is matched against the pre-conditions of the rules: a rule with pre-condition r is said to be pre-applicable if there exists a substitution h for the variables of r such that h(r)=q. If there is no pre-applicable rule, the process is terminated.
2 - Let e be the event in the pre-applicable rule and let mj be a message about to be sent across kj, for all kj(vj) in e where kj is an external connector. That rule is said to be applicable if there exists a substitution g for the variables of all vj's such that g(h(vj))=mj.

3 - One of the applicable rules is chosen to be the applied rule. This choice is non deterministic. Let s be the post-condition of this rule. The event g(h(e)) occurs and the term g(h(s)) becomes the current state of N.
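The steps above can be stated as a small schematic Python loop. The conventions are this sketch's own assumptions: a rule is a triple (pre, event, post), substitutions are represented as callables, match(pre, state) returns the substitution h (or None), offer(event, h) returns the substitution g chosen by the environment (or None if the event cannot take place), and occurs reports the event to the environment.

    import random

    def run(initial_states, rules, match, offer, occurs):
        state = random.choice(initial_states)            # step 0: non deterministic
        while True:
            applicable = []
            for pre, event, post in rules:
                h = match(pre, state)                    # step 1: pre-applicable?
                if h is None:
                    continue
                g = offer(event, h)                      # step 2: applicable?
                if g is not None:
                    applicable.append((event, post, h, g))
            if not applicable:
                return state                             # the process is terminated
            event, post, h, g = random.choice(applicable)    # step 3: non deterministic
            occurs(g(h(event)))                          # the event g(h(e)) occurs
            state = g(h(post))                           # g(h(s)) becomes the current state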
This operational view shows how rules express sequencing, non determinism and parallelism among communications: rules are applied one at a time (sequencing), the applied rule is chosen among several applicable rules (non determinism) and several communications occur within a single event (parallelism). It must be noted that internal rules can be used for describing computations. This can be seen in the following example: MAXNAT sends through C the maximum of two previously entered Nat's, the first one entered through A and the second one through B ("-" denotes the null arity for state constructors):

proc   MAXNAT
in     A, B : Nat
out    C : Nat
states X : -
       Y : Nat
       Z : Nat × Nat × Nat × Nat
vars   m, n, p, q : Nat
rules
                              ==> X
X    : A(m)                   ==> Y(m)
Y(m) : B(n)                   ==> Z(m,n,m,n)
Z(m,n,succ(p),succ(q))        ==> Z(m,n,p,q)
Z(m,n,p,0) : C(m)             ==> X
Z(m,n,0,q) : C(n)             ==> X
endproc

In fact, this form of process definition could very well do without the operation definitions of the functional part of FP2. Assuming that the available functions are only constructors, basic process definitions are sufficiently powerful to define any function that can be computed on a Turing machine. However, defined operations make basic process definitions much easier to write and to read. For example, a process sending out the maximum of its two input messages could also be described as follows:

proc   MAX
in     A, B : Nat
out    C : Nat
states X : -
       Y, Z : Nat
vars   m, n : Nat
rules
             ==> X
X    : A(m)  ==> Y(m)
Y(m) : B(n)  ==> Z(max(m,n))
Z(m) : C(m)  ==> X
endproc
Process definitions may also be parameterized. For example, bounded queues of natural numbers of capacity k may be defined as follows:

proc   BQUEUE [k : Nat]
in     W : Nat
out    R : Nat
states Q : Nat* × Nat* × Nat
vars   e : Nat
       t, u, v : Nat*
       n : Nat
rules
                       ==> Q(nil,nil,k)
Q(u,v,succ(n)) : W(e)  ==> Q(e.u,v,n)
Q(u,e.v,n)     : R(e)  ==> Q(u,v,n+1)
Q(e.u,nil,n)           ==> Q(nil,reverse(e.u),n)
endproc

where reverse(s) returns a sequence with the elements of s in the opposite order.
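The rules above are the classic two-list implementation of a FIFO queue; the following Python sketch mirrors them, with u as the write side, v as the read side and free playing the role of the capacity counter n:

    class BoundedQueue:
        def __init__(self, k):
            self.back, self.front, self.free = [], [], k   # u, v, n

        def write(self, e):                    # rule Q(u,v,succ(n)) : W(e)
            if self.free == 0:
                raise RuntimeError("full: the W rule is not applicable")
            self.back.insert(0, e)             # e.u
            self.free -= 1

        def read(self):                        # rule Q(u,e.v,n) : R(e)
            if not self.front:                 # internal rule: reverse the write side
                self.front, self.back = list(reversed(self.back)), []
            if not self.front:
                raise RuntimeError("empty: the R rule is not applicable")
            self.free += 1
            return self.front.pop(0)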
Once such a parameterized process has been defined, it can be instantiated with actual parameters. It is also possible to define names standing for processes:

proc BQUEUE4 is BQUEUE[4]

Every process definition, with or without parameters, can also be considered as the definition of an indexed family of processes, where the indexes are natural numbers. For example, let processes V be variables alternating write and read communications:

proc   V
in     W : Nat
out    R : Nat
states E : -
       F : Nat
vars   v : Nat
rules
             ==> E
E    : W(v)  ==> F(v)
F(v) : R(v)  ==> E
endproc
This definition also defines processes V_1, V_2, etc., with connectors W_1 and R_1, W_2 and R_2, etc. These indexes may also appear as parameters, like in:

proc VNAT [i : Nat] is V_i

Then:

proc V3 is V_3    and    proc V3 is VNAT[3]

are identical definitions and produce a process with connectors W_3 and R_3. That process may in turn be considered as defining an indexed family V3_1, V3_2, etc. A similar indexing facility can also be used within basic process definitions, when it is necessary to describe processes with indexed families of connectors, states, variables, rules or events. For example, a process ONE receiving a Nat into I and sending it out from one of its n output connectors O_i is described by:

proc   ONE [n : Nat]
in     I : Nat
out    {O_i | i=1..n} : Nat
states E : -
       F : Nat
vars   v : Nat
rules
                                  ==> E
E : I(v)                          ==> F(v)
{ F(v) : O_i(v) ==> E | i=1..n }
endproc
Given an instantiation ONE[3], the repetition facility {O_i | i=1..n} : Nat stands for O_1, O_2, O_3 : Nat. Similarly, 3 rules are produced, one for output through each of O_1, O_2, O_3. Another example shows the use of this facility for describing a process ALL which receives a Nat into I and sends it out from all of its n output connectors within the same event:

proc   ALL [n : Nat]
in     I : Nat
out    {O_i | i=1..n} : Nat
states E : -
       F : Nat
vars   v : Nat
rules
                              ==> E
E    : I(v)                   ==> F(v)
F(v) : {O_i(v) | i=1..n}      ==> E
endproc
A process which receives n Nat's sequentially into its input connectors I_i, taken in any order, and then sends out their maximum through O is defined by:

proc   MAXALL [n : Nat]
in     {I_i | i=1..n} : Nat
out    O : Nat
states Q : Nat × Nat*
vars   v, m : Nat
       s : Nat*
rules
                                          ==> Q(n,nil)
{ Q(succ(m),s) : I_i(v) ==> Q(m,v.s) | i=1..n }
Q(0,v.s) : O(/(max)(v.s))                 ==> Q(n,nil)
Q(0,nil) : O(0)                           ==> Q(n,nil)
endproc
Finally, process definitions may also be parameterized by port names, as in the following definition of a "CELL". A CELL performs four communications within a single event. In that event, it inputs natural integer values, while sending out the result of a simple computation performed on previously received values:

proc   CELL [c : Nat] [X0, Y0, X1, Y1 : Port]
in     X0, Y0 : Nat
out    X1, Y1 : Nat
states Q : Nat × Nat
vars   x, y, u, v : Nat
rules
                                        ==> Q(0,0)
Q(x,y) : X0(u) Y0(v) X1(x) Y1(y+c*x)    ==> Q(u,v)
endproc
Given natural integers a, i and j, and identifiers U, V, X and Y, CELL could be instantiated as follows:

proc CELL1 is CELL[a][U, Y_i_j, X_i_j, V]
4.2 - Process forms. FP2 provides process operators for combining defined processes into process forms. The number and the nature of these operators are arbitrary and a given implementation of the language could take any collection of them. The important facts are: (a) all operators are built up on top of a common primitive basis; (b) process forms can all be evaluated into basic process definitions with connectors, state constructors, variables and rules.
4.2.1 - Primitive basis for process operators. A non parameterized basic process definition is a syntactic object of the form:

proc N
connectors k1 : t1
           ...
           kl : tl
states     p1 : J1
           ...
           pm : Jm
vars       v1 : U1
           ...
           vn : Un
rules
           ==> q1
           ...
           ==> qp
r1 : e1    ==> s1
           ...
rq : eq    ==> sq
endproc
where the input, output and internal connectors have been grouped within a single list. Given a connector ki, its sort is given by sort(ki) ∈ {in, out, internal}. Thus, a basic process definition can be viewed as associating a process name N with a tuple <K, P, V, Q, R> where:
- K = { <ki : ti> | i=1..l } represents the connector definitions;
- P = { <pi : Ji> | i=1..m } represents the state constructor definitions;
- V = { <vi : Ui> | i=1..n } represents the variable definitions;
- Q = { qi | i=1..p } represents the initial rules, where the qi's are states;
- R = { <ri : ei ==> si> | i=1..q } represents the transition rules, where the ri's are predicates and the ei's are (possibly empty) events.
Let N1 = <K1,P1,V1,Q1,R1> and N2 = <K2,P2,V2,Q2,R2> be two processes. In the operator definitions below, X1+X2 denotes the union of two sets of definitions; P1*P2, Q1*Q2 and R1*R2 denote the products of state constructor definitions, of initial rules and of transition rules, the product of two transition rules performing both transitions within a single event; I(P) denotes the set of idling rules, one for each state constructor of P, and w(P) the variable definitions these idling rules need.
1 - Interleaved composition: N1 | N2. When an event occurs in the process N1 | N2, it is either an event in N1 while N2 is idle or an event in N2 while N1 is idle:

N1 | N2 = <K,P,V,Q,R> where:
K = K1+K2
P = P1*P2
V = V1+V2+w(P1)+w(P2)
Q = Q1*Q2
R = R1*I(P2) + I(P1)*R2
2 - Synchronous composition: N1 ||| N2. When an event occurs in N1 ||| N2, it is an event in N1 together with an event in N2:

N1 ||| N2 = <K,P,V,Q,R> where:
K = K1+K2
P = P1*P2
V = V1+V2
Q = Q1*Q2
R = R1*R2

3 - Parallel composition: N1 || N2.
When an event occurs in N1 || N2, it is an event in N1 while N2 is idle, or an event in N2 while N1 is idle, or an event in N1 together with an event in N2:

N1 || N2 = <K,P,V,Q,R> where:
K = K1+K2
P = P1*P2
V = V1+V2+w(P1)+w(P2)
Q = Q1*Q2
R = R1*I(P2) + I(P1)*R2 + R1*R2
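The three compositions can be illustrated on finite transition systems with a small Python sketch. The encoding is this sketch's own assumption: a process is given by a set of rules (state, event, next_state), where an event is a frozenset of communications and the empty frozenset is an idle step.

    def interleaved(r1, r2, states1, states2):
        # N1 | N2 : an event of one side while the other stays put
        return {((p, q), e, (p2, q)) for (p, e, p2) in r1 for q in states2} | \
               {((p, q), e, (p, q2)) for (q, e, q2) in r2 for p in states1}

    def synchronous(r1, r2):
        # N1 ||| N2 : an event of N1 together with an event of N2
        return {((p, q), e1 | e2, (p2, q2))
                for (p, e1, p2) in r1 for (q, e2, q2) in r2}

    def parallel(r1, r2, states1, states2):
        # N1 || N2 : union of the interleaved and synchronous behaviours
        return interleaved(r1, r2, states1, states2) | synchronous(r1, r2)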
4 - Uncontrollable choice: N1 ? N2. At initialization, the process N1 ? N2 chooses non deterministically to behave always like N1 while leaving N2 idle, or to behave always like N2 while leaving N1 idle:

N1 ? N2 = <K,P,V,Q,R> where:
K = K1+K2
P = P1+P2
V = V1+V2
Q = Q1+Q2
R = R1+R2
5 - Controllable choice: N1 ! N2. After initialization, the first event to occur in N1 ! N2 may be either an event in N1 while N2 is idle or an event in N2 while N1 is idle. If that event contains input or output communications, the choice may be controlled by the environment of N1 ! N2. After that event has occurred, N1 ! N2 continues to behave like N1 while leaving N2 idle if it was an event in N1, or continues to behave like N2 while leaving N1 idle if it was an event in N2:

N1 ! N2 = <K,P,V,Q,R> where:
K = K1+K2
P = (P1*P2)*P'
V = V1+V2
Q = (Q1*Q2)*Q'
R = (R1*I(P2))*R'1 + (I(P1)*R2)*R'2

given auxiliary state constructors P', initialized by Q', which record whether the first event came from N1 or from N2, and rule sets R'1 and R'2 which restrict the subsequent behaviour to the chosen side.
6 - Connection: N1 + A.B. Let A be an output connector of type t1 and B be an input connector of type t2, both in N1. If there exists a type t with t ⊑ t1 and t ⊑ t2 such that there is no t' ≠ t satisfying these same conditions and t ⊑ t', then A.B is an internal connector of type t in N1 + A.B. The process N1 + A.B behaves like N1 and, in addition, when an event involving both A and B may occur in N1, a new event involving A.B may occur in N1 + A.B, where the message sent from A arrives at B:

N1 + A.B = <K,P,V,Q,R> where:
K = K1+K'
P = P1
V = V1
Q = Q1
R = R1+R'

given:
K' = { <A.B : t> }
R' = { <g(r) : g(e) A.B(g(u)) ==> g(s)> | <r : e A(u) B(v) ==> s> ∈ R1 and g = mgu(u,v) }
with: mgu (u, v) = most general unifier of u and v.
7 - Hiding: N1 - k. If k is a connector of type t in N1, the connectors of N1 - k are all the connectors of N1, except k which is "hidden". If k is an external connector (i.e. input or output connector), no event involving k may occur in N1 - k, since the environment cannot "see" k any longer. If k is an internal connector, events involving k in N1 may still occur but, in N1 - k, they do not mention k any longer:

N1 - k = <K,P,V,Q,R> where:
K = K1 - K'
P = P1
V = V1
Q = Q1
R = R1 - R' + R''

given:
K' = { <k : t> }
R' = { the rules of R1 whose event involves k }
R'' = { if k is internal, the rules of R' with the communications along k removed from their events; empty otherwise }
8 - Trigger: e -> N1. If e is an event, the first event to occur in e -> N1 may occur only together with e. After that, e -> N1 behaves like N1. Thus, e -> N1 is N1 triggered by e:

e -> N1 = <K,P,V,Q,R> where:
K = K1+K'
P = P1*P'
V = V1
Q = Q1*Q'
R = R1*R'

given:
K' = { a set of connector definitions necessary for e }
P' = { <T : ->, <F : -> }
Q' = { T }
R' = { <T : e ==> F>, <F : ==> F> }
9 - Control: e => N1. If e is an event, e => N1 behaves like N1, but every event occurs together with e. Thus, e => N1 is N1 controlled by e:

e => N1 = <K,P,V,Q,R> where:
K = K1+K'
P = P1*P'
V = V1
Q = Q1*Q'
R = R1*R'

given:
K' = { a set of connector definitions necessary for e }
P' = { <T : -> }
Q' = { T }
R' = { <T : e ==> T> }
10 - Time-out: N1.n:N2. If n is a natural integer, then N1.n:N2 behaves like N1 for at most n successive events. If N1 is terminated before n events have occurred in it, then N1.n:N2 is also terminated. If N1 is not terminated at that time, then N1.n:N2 stops behaving like N1 and starts behaving like N2:

N1.n:N2 = <K,P,V,Q,R> where:
K = K1+K2
P = P1*P' + P2
V = V1+V'+V2
Q = Q1*Q'
R = R1*R' + R'' + R2

given a counter state constructor in P' (with its variable definitions V'), initialized to n by Q' and decremented by R' at each event of N1, and rules R'' which start N2 when the counter reaches 0 while N1 is not terminated.
Process forms appear in the context of process definitions:

proc <process name> is <process form>

A process form is an expression in which the operators are process operators and the operands are processes, connectors, events or natural integers. Evaluating a process form results in basic process definitions. In principle, the evaluation of a process form N is guided by its syntax: dummy names n0, n1, ... are generated, one for each syntactical sub-form in N, where n0 is the name corresponding to N itself. Then, basic process definitions for n0, n1, ..., with their respective names, connectors, predicates, variables and rules can be mechanically produced by applying the definitions of the operators. In practice, this evaluation can be greatly optimized and most intermediate basic process definitions (especially those resulting from compositions) can be avoided.
4.2.3 - Examples of process forms. For writing the examples in this section and showing the results of some process form evaluations, the following conventions are used:
- Process forms written "N ++ A.B", where N is a process form and A.B is an internal connector, are expanded to "N + A.B - A - B - A.B".
- A product state constructor built from p and p', with argument lists l and l', is written as a single constructor whose name joins those of p and p' and whose argument list is append (l, l').
1 - Maxima of sequences of Nat's. In this first example, values of type Nat are read in sequentially and they are considered as forming a series of sequences separated by 0's. At the end of each sequence, the maximum of that sequence is sent out. This is achieved by a process SMAX constructed as a network of more elementary processes. The process MAX and the following definitions are used in that construction:

proc   REG
in     W : Nat
out    R : Nat
states V : Nat
vars   r, s : Nat
rules
               ==> V (0)
V (r) : W (s)  ==> V (s)
V (r) : R (r)  ==> V (0)
endproc

type Signal
cons buzz, ring : -> Signal
endtype

proc   BZZZ
in     K : Nat
out    L : Nat
       S : Signal
states P : -
vars   p : Nat
rules
                              ==> P
P : K (0) L (0) S (buzz)      ==> P
P : K (succ(p)) L (succ(p))   ==> P
endproc

proc   GATE
in     M : Nat
       T : Signal
out    N : Nat
states Q : -
vars   q : Nat
rules
                            ==> Q
Q : M (q) N (q) T (buzz)    ==> Q
endproc
Then the process SMAX can be constructed by the following process form:

proc SMAX is ( MAX || REG ++ C.W + R.B - B - R.B ) || ( BZZZ || GATE ++ S.T ) ++ L.A ++ R.M

The resulting basic process definition is remarkably short:
proc   SMAX
in     K : Nat
out    N : Nat
states X_V_P_Q : Nat
       Y_V_P_Q : Nat × Nat
       Z_V_P_Q : Nat × Nat
vars   p, m, r : Nat
rules
                                   ==> X_V_P_Q (0)
X_V_P_Q (r) : K (succ(p))          ==> Y_V_P_Q (succ(p), r)
Y_V_P_Q (m, r)                     ==> Z_V_P_Q (max (m,r), 0)
Z_V_P_Q (m, r)                     ==> X_V_P_Q (m)
X_V_P_Q (r) : K (0) N (r)          ==> Y_V_P_Q (0, 0)
endproc
2 - Construction of a queue. In addition to process operators, process forms may also use conditionals. It is then possible to write recursive definitions, like the following construction of a bounded queue BQ built as a chain of processes of the indexed family V:

proc BQ [k : Nat] is
    if k=1 then V_1 else BQ[k-1] || V_k ++ R_(k-1).W_k endif

The instantiation:

proc BQ4 is BQ[4]

chains four V processes, connecting the R connector of each to the W connector of the next. It is also possible to have an "iterative" description of BQ, using the repetition facility inside a process form:

proc BQ [k : Nat] is || { V_i | i=1..k } ++ { R_i.W_(i+1) | i=1..k-1 }
3 - Systolic arrays. Let A and B be two n×n matrices. Given a series X0,...,Xi,... of vectors with n components, the problem is to compute a series Y1,...,Yi,... of vectors with n components such that

Yi = A Xi + B Xi-1

This computation can be performed by an n×n systolic array of processes. Let SYSTOL be the name of that array. The complete system comprises SYSTOL and four interface processes which prepare the input vectors for SYSTOL and assemble the output vectors for the environment.

(Picture: SYSTOL surrounded by the interface processes INX, OUTX, ZERO and OUTY.)

The Xi vectors arrive into the right of the system and they get out unchanged from the left, while the computed results, the Yi vectors, leave from the bottom. The processes INX, OUTX, ZERO and OUTY are interface processes: INX inserts one vector of 0's after each Xi vector and delays the jth component of the ith vector of that new sequence so that it arrives into SYSTOL "at the same time" (i.e. within the same event) as the first component of vector (i+j-1) of that sequence. Symmetrically, OUTX and OUTY re-establish the synchrony among the components of each Xi vector and of each Yi vector respectively. ZERO repeatedly sends vectors of 0's into the top of SYSTOL. The FP2 description of these interface processes is left to the reader.
Surrounded by its interfaces, the process SYSTOL is an n×n array of orthogonally connected processes of the family MOD, each containing 2 CELL's. MOD[i,j] is positioned at row i, column j of SYSTOL and it is defined by:

proc MOD [i,j : Nat] is
    CELL [b(i,j)] [U, X1_i_j, Y0_i_j, V] || CELL [a(i,j)] [X0_i_j, Z, W, Y1_i_j] ++ V.W ++ Z.U

where a(i,j) and b(i,j) are elements of the matrices A and B respectively. Finally, SYSTOL is constructed as follows:

proc SYSTOL [n : Nat] is
    || { ROW[i,n] | i=1..n } ++ { Y1_i_j.Y0_(i+1)_j | i=1..n-1, j=1..n }

proc ROW [i,n : Nat] is
    || { MOD[i,j] | j=1..n } ++ { X1_i_(j+1).X0_i_j | j=1..n-1 }
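Independently of the systolic organization, what the complete system computes can be stated directly. The following plain (non systolic) Python reference version of Yi = A Xi + B Xi-1 is this sketch's own, not part of FP2:

    def reference(A, B, xs):
        """A, B: n x n matrices as lists of rows; xs: the series X0, X1, ..."""
        n = len(A)
        ys = []
        for prev, x in zip(xs, xs[1:]):            # pairs (Xi-1, Xi) for i >= 1
            ys.append([sum(A[i][j] * x[j] + B[i][j] * prev[j]
                           for j in range(n))
                       for i in range(n)])
        return ys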
5 - POLYMORPHISM. Definitions of types, operations and processes can specify that the defined entities are parameterized by types: such entities are called "polymorphic". Polymorphic definitions use formal type parameters which are introduced by the definition, with their names and with an algebraic characterization of the family of possible corresponding actual types. In FP2, such an algebraic characterization is called a property: properties are defined by means of equations on terms, formal type parameters of polymorphic definitions require properties on their corresponding actual types and the satisfaction of a property by an actual type can be asserted by means of a specialized satisfaction clause.

5.1 - Polymorphic definitions without properties. A polymorphic definition of a type, operation or process provides:
- the name of the polymorphic type, operation or process;
- the description of the formal type parameters;
- the body of the definition, which is a basic type, operation or process definition or a type, functional or process form.

The body can use the formal type parameters, by referring to their names in any context where a type can be written. For example a polymorphic type for pairs of values of the same type is:

type Pair [ t : type ] | Ftype [ t ] is t × t

It reads as follows: "the type Pair [t], such that t is a type satisfying the property Ftype, is the cartesian product t × t". The property Ftype is a predefined property: all types satisfy it, which means that any type can be used for instantiating the polymorphic type Pair:

type Pairnat is Pair [ Nat ]
type Twopairs is Pair [ Pairnat ]
Binary trees with nodes labelled by values of a given type and leaves labelled by values of a possibly different type have the type:

type Tree [ t, u : type ] | Ftype [ t ], Ftype [ u ]
cons leaf : t                                  -> Tree [ t, u ]
     node : u × Tree [ t, u ] × Tree [ t, u ]  -> Tree [ t, u ]
endtype
Trees with pairs of Nat's on nodes and Nat's on leaves have the type:

type Treenat is Tree [ Nat, Pairnat ]

Such a tree can be constructed by:

node ( (3, 4), leaf (1), leaf (2) )

But it is also possible to define:

type Treebool is Tree [ Bool, Bool ]

and to construct:

node ( false, leaf (true), leaf (true) )

Thus, the polymorphic definition of Tree has also introduced operations "leaf" and "node", which are polymorphic. Instances of these operations have also been created: one instance of leaf takes a Nat and returns a Treenat, the other instance of leaf takes a Bool and returns a Treebool. Similarly, two instances of node have been created. The complete names of these functions are qualified by their signatures:

( leaf : Nat                          -> Treenat )
( leaf : Bool                         -> Treebool )
( node : Pairnat × Treenat × Treenat  -> Treenat )
( node : Bool × Treebool × Treebool   -> Treebool )

When constructing "node ( (3, 4), leaf (1), leaf (2) )", the choice among the various instances of node is governed by the types of the arguments. In fact, this term stands for the more explicit construction:

( node : Pairnat × Treenat × Treenat -> Treenat )
    ( (3, 4), ( leaf : Nat -> Treenat ) (1), ( leaf : Nat -> Treenat ) (2) )
Polymorphic operations can also be defined separately:

op    first [ t : type ] | Ftype [ t ] : Pair [ t ] -> t
vars  x, y : t
rules first (x, y) ==> x
endop

An instance of first could be explicitly created and called, like in:

( first : Pair [ Nat ] -> Nat ) (3, 4)

But it is also possible, as above, to omit the signature and to simply write:

first (3, 4)

Finally, there are also polymorphic processes:

proc   PSTACK [ t : type ] | Ftype [ t ]
in     I : t
out    O : t
states S : t*
vars   e : t
       v : t*
rules
               ==> S (nil)
S(v)   : I(e)  ==> S (e.v)
S(e.v) : O(e)  ==> S (v)
endproc

They can be instantiated:

proc STACKNAT is PSTACK [ Nat ]
5.2 - Property definitions and satisfaction clauses. In all the above examples, any actual type can be bound to the formal types, since the only requirement is that it satisfies the property Ftype. This is not always the case. For example, the definition of a generic equality operation on Pair's would require that there also exist an equality operation on the type t of the elements. The property of such types with an equality operation can be defined in FP2, by means of a property definition:

prop  Equality [ t with eq ]
opns  eq : t × t -> Bool
vars  x, y, z : t
eqns  eq (x, y)                            == eq (y, x)
      eq (x, x)                            == true
      eq (x, y) ∧ eq (y, z) ∧ ¬ eq (x, z)  == false
endprop

It can be read as follows: "the property Equality is satisfied by all types like t with an operation like eq : t × t -> Bool iff the terms built with that operation obey the specified equations". (Two terms v and w obey the equation "v == w" iff the reductions of v and w terminate with the same term.) Here, the equations state that eq is symmetric, reflexive and transitive. If "=" is the name of the equality operation on Nat's, the type Nat should now satisfy the property Equality with the operation "=". However, proving that it is indeed the case is, in general, not a feasible task. This is why FP2, for that purpose, relies on assertions in the form of satisfaction clauses:

sat Natequal is Equality [ Nat with = ]

which reads: "Natequal is the name of the satisfaction clause asserting that the property Equality is satisfied by the type Nat with its operation =". Then it becomes possible to define an equality operation on Pair's:

op    same [ t : type ] [ equal : op ] | Equality [ t with equal ] : Pair [ t ] × Pair [ t ] -> Bool
vars  a, b, c, d : t
rules same ( (a, b), (c, d) ) ==> equal (a, c) ∧ equal (b, d)
endop
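In Python terms, "same" corresponds to a function parameterized by the element equality; the following sketch illustrates the idea (it is not FP2 and the names are this sketch's own):

    def same(equal):
        return lambda p, q: equal(p[0], q[0]) and equal(p[1], q[1])

    same_nat = same(lambda a, b: a == b)   # the instance bound to "=" on Nat's
    # same_nat((3, 4), (5, 6)) evaluates to False, as in the reduction below.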
Thus, "same" is a polymorphic operation with signature Pair[ t ] × Pair[ t ] -) Bool, requiring that the formal type t satisfy Equality with the formal operation equal. With that definition, a term like • same ( (3, 4), (5, 6) ) binds t to Nat, since it means ' ( same "Pair [ Nat ] × Pair [ Nat ] =) Bool ) ( (3, 4), (5, 6) ) Given that • and that •
Equality [ t with Equality [ Nat with
equal ] = ]
is required, is satisfied,
this term is correct and the formal operation equal is bound to "=". Finally, the rule ' same ( (3, 4), (5, 6) ) ==> (3=5) ^ (4=6) is applied and the term eventually reduces to false, as expected. Given this equality operation "same" on Pair's, it becomes even possible to say that the type Pair [ t ] satisfies Equality with it, provided that t itself satisfy Equality. This is accomplished by a polymorphic satisfaction clause ' sat Pairequal [ t" typ~ ] [ e q 0_~ ] I Equality [ t with eq ] is Equality [ Pair [ t ] with same ] That satisfaction clause enlarges the polymorphism of the operation same • it becomes applicable to Pair [ Nat ], Pair [ Pair [ Nat ]], Pair [ Pair [ Pair [ Nat ]]], etc.
The last example shows a polymorphic process BIGMAX [ t ], requiring that there be a semi-lattice structure among objects of type t: it inputs n objects of type t within one event and sends out their least upper bound.

prop  Semilattice [ t with eq, leq, lub ]
opns  leq : t × t -> Bool
      lub : t × t -> t
vars  m, n, p : t
eqns  leq (m, m)                                      == true
      leq (m, n) ∧ leq (n, m) ∧ ¬ eq (m, n)           == false
      leq (m, n) ∧ leq (n, p) ∧ ¬ leq (m, p)          == false
      lub (m, n)                                      == lub (n, m)
      leq (m, lub (m, n))                             == true
      leq (m, p) ∧ leq (n, p) ∧ ¬ leq (lub (m, n), p) == false
endprop

proc   BIGMAX [ t : type ] [ eq, le, up : op ] | Semilattice [ t with eq, le, up ] [ n : Nat ]
in     {I_i | i=1..n} : t
out    O : t
states E : -
       F : t*
vars   {v_i | i=1..n} : t
       s : t*
rules
                                ==> E
E : {I_i(v_i) | i=1..n}         ==> F ([v_i | i=1..n])
F (s) : O (/(up) (s))           ==> E
endproc

Given the satisfaction clause:

sat Latnat is Semilattice [ Nat with =, ≤, max ]

BIGMAX can be instantiated to:

proc BMAXNAT [ n : Nat ] is BIGMAX [ Nat ] [ n ]

In that instantiation, the formal operations eq, le and up of BIGMAX get bound to =, ≤ and max respectively.
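BIGMAX's output is /(up) folded over the received values. In Python, for the Latnat instance where the least upper bound on Nat's is max (a sketch under these assumptions):

    from functools import reduce

    def big_lub(lub, values):
        if not values:
            raise ValueError("insert_error")   # /(up) is undefined on nil
        return reduce(lub, values)

    # big_lub(max, [3, 1, 4, 1, 5]) evaluates to 5.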
6 - EXCEPTIONS. In the definition of a function, it is often assumed that it applies to all possible values in its domain type. However, there are cases where the domain of definition should not cover all the domain type. In FP2, it is possible to take care of such situations, which correspond to the notion of partial functions: preconditions on parameters can restrict the domain of definition and raise exceptions. Exception handlers provide means of defining the actions to be taken when an exception is raised.
6.1 - Preconditions. In addition to the "normal" rules which define the reductions of terms containing operation applications, an operation definition may also contain precondition rules.

Normal rules have the format:

left term ==> right term

where, in the left term, the outmost function name is the operation being defined and its subterms are either constructor applications or variables. Furthermore, no two normal rules in an operation definition have unifiable left terms.
Precondition rules have the format:

left term | condition ==> ! exception name

Here, the outmost function name of the left term may also be a constructor. The condition is a term reducing to a boolean value, where the variables also appear in the left term. No two precondition rules in an operation definition have unifiable left terms. The exception name is simply an identifier.
For example, accessing the i-th element of a sequence, where i is a natural number, requires that i be not smaller than 1 and not greater than the length of the sequence. The following polymorphic operation "elem" has its domain of definition restricted accordingly, by means of a precondition rule:

op    elem [ t : type ] | Ftype [ t ] : t* × Nat -> t
vars  s : t*
      e : t
      i : Nat
rules elem ( s, i ) | i < 1 ∨ i > length (s)  ==> ! out_of_range
      elem ( e.s, 1 )                         ==> e
      elem ( e.s, succ (succ (i)) )           ==> elem ( s, succ (i) )
endop
Given the definition of an operation f possibly containing precondition rules, an application f(arg) is interpreted as follows:

1. If f(arg) matches the left term of a precondition rule, the corresponding condition is evaluated. If the result is true, the named exception is raised, where "raising an exception" means returning that exception as value. If the result is false, the precondition rule is ignored.

2. If f(arg) does not match the left term of a precondition rule, or if f(arg) matches the left term of a precondition rule but the condition was false, a normal rule is looked for with f(arg) matching its left term.

3. If f(arg) matches the left term of a normal rule, that rule is applied.

4. If f(arg) does not match the left term of a normal rule, the predefined exception "! axiomatization" is raised. This means that the definition of f is not complete.

As a consequence of this general mechanism for interpreting function applications, every FP2 function f : Targ -> Tres may be viewed as a function f : Targ -> Tres | Exception, where "Exception" is a predefined type: all Exception "values" are built by the constructor ! which takes an identifier as its parameter.
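The four steps can be sketched in Python, with exceptions modelled as ordinary return values. The rule representation is this sketch's own assumption: a precondition rule is (matches, condition, exception_name) and a normal rule is (matches, right).

    class Exc:
        """An exception value, built from an identifier (the ! constructor)."""
        def __init__(self, name):
            self.name = name

    def apply_op(precond_rules, normal_rules, arg):
        for matches, condition, exc_name in precond_rules:
            if matches(arg):
                if condition(arg):              # step 1: raise the exception
                    return Exc(exc_name)
                break                           # condition false: rule ignored
        for matches, right in normal_rules:     # steps 2 and 3
            if matches(arg):
                return right(arg)
        return Exc("axiomatization")            # step 4: definition incomplete

Strictness with respect to exceptions is then the caller's duty: a function receiving an Exc value as a subterm's result returns it unchanged.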
Precondition rules can also be used to restrict the domain of constructors. For example, the type of rational numbers could be defined as follows:

type  Rat
cons  //   : Nat × Nat -> Rat
vars  m, n : Nat
rules m // n | n = 0 ==> ! zero_divide
endtype
In that case, since there are no "normal rules" for rewriting constructor applications, an application p//q where p and q are in normal form either raises ! zero_divide or stays as it is.
6.2 - Exception handlers. Exceptions can only be raised by the reduction of terms. This occurs when a precondition rule is applicable and its condition is true. It is also possible to explicitly raise an exception in the right term of a normal rule, like in:

f (arg) ==> if p (arg) then g (arg) else ! e endif
Raising an exception means returning it as value. As a consequence, a subterm x of a term f( ... x ... ) may turn out to produce an exception value ! e: all functions in FP2 are strict with respect to exceptions, which means that f( ... !e ... ) also has the value ! e. However, it is possible to catch an exception on its way out of a term, by means of an exception handler. There are two situations where exception handlers may catch exceptions:
- When the evaluation of the right term of a normal operation rule produces an exception value; in that case, an exception handler can be attached to the corresponding rule in the definition of the operation.
- When a functional term inside the post-condition of a process rule produces an exception; in that case, an exception handler can be attached to the state constructor of the current state in the definition of the process.
6.2.1 - Exception handling in operations. The general format of a normal operation rule with exception handlers attached is:

left term ==> right term
            when ! e1 then f1
            when ! e2 then f2
            ...
            when ! en then fn
            endwhen

where ! ei is an exception name and fi is a term written with the same conventions as for a right term. When ! ei is obtained as the value of the right term, then fi is taken as a "replacement" right term and evaluated. Of course, the evaluation of fi may in turn raise an exception ! e'i which may be handled in its due place by the rule (in general another rule) getting it as its right term value, etc.
For example, let Seqnat be the type of infinite sequences of natural numbers where only a finite slice of elements indexed from 1 may have a non zero value:

type  Seqnat
cons  infseq : Nat* -> Seqnat
opns  access : Seqnat × Nat -> Nat
vars  s : Nat*
      i : Nat
rules access ( infseq (s), i ) ==> elem (s, i)
                               when ! out_of_range then 0 endwhen
endtype
6.2.2 - Exception handling in processes. In addition to the "normal" rules which define state transitions, a process definition may also contain exception recovery rules. The general format of an exception recovery rule is:

when ! e in Q ==> s

where ! e is an exception name, Q is a state constructor name and s is a state. When a normal rule of the form:

P(f) : event ==> Q(g)

is being applied, the evaluation of g may raise exception ! e. If ! e is not caught by an exception handler of an operation rule before reaching the outer layer of g, then the process is said to be in the exceptional state "! e in Q". If there is no exception recovery rule corresponding to that exceptional state, the process is terminated. If there is one, the process "recovers" by going into state s. In that case, the application of the normal rule "recovered" by the recovery rule is considered as one transition. For example, a process receiving two natural integers p and q and sending out p//q may have to deal with the exception ! zero_divide:

proc   NATRAT
in     M, N : Nat
out    R : Rat
states E : -
       F : Rat
vars   p, q : Nat
       r : Rat
rules
                            ==> E
E : M(p) N(q)               ==> F(p//q)
F(r) : R(r)                 ==> E
when ! zero_divide in F     ==> E
endproc
It must also be noted that exceptions can be produced by the evaluation of sent messages: since no term may be unified with an exception value, the consequence of that situation is that the corresponding rule is not applicable.
7 - MODULES. FP2 allows the definition of a variety of entities:
- Types
- Operations
- Processes
- Properties
- Satisfactions
The purpose of a definition is to associate a name with an entity. The name-entity associations established by a set of definitions are in effect within a region of FP2 text called a module. In addition to the above entities, it is also possible to define modules within modules: this is a means of structuring FP2 programs into a hierarchy of modules. This hierarchy of modules is used as a basis for controlling the extent of the region of FP2 text across which every definition is in effect. The basic format of a module definition is:

module M is <module body> endmodule

where M is the name of the module and the module body is a set of definitions. With modules defined within modules, the basic visibility rules are the same as for classical block structure: all definitions of a module are visible from inner modules, except for redefined names. In addition to that "from inside-out" visibility, a module may export some of its definitions up to its directly enclosing module: such exported definitions are then considered as if they were made in the enclosing module. Thus, the exporting facility brings a controlled "from outside-in" visibility.
For example, the definition of the operation repmax uses an auxiliary function rep. For defining repmax in a module M while keeping rep hidden, it is possible to write:

module M is
  type  Btree
  cons  tip  : Nat -> Btree
        fork : Btree × Btree -> Btree
  ...
  endtype
  module B export repmax is
    op    repmax : Btree -> Btree
    vars  t : Btree
    rules repmax(t) ==> rep(t, max(t))
    endop
    op    rep : Btree × Nat -> Btree
    vars  m, n : Nat
          u, v : Btree
    rules rep(tip(m),n)    ==> tip(n)
          rep(fork(u,v),n) ==> fork(rep(u,n),rep(v,n))
    endop
  endmodule
endmodule
But it is also possible to exercise a control over the basic "inside-out" visibility by explicitly stating what names, which are visible in a module M, become hidden from some of its inner modules:

module M export E1 is
  without D1, ..., Dn
  module N export E1, ..., Ep is
    ...
  endmodule
  module P is
    ...
  endmodule
  without E1
  module Q is
    ...
  endmodule
endmodule

Here, the definition of module N is made at a point where names D1, ..., Dn, which are known in M, become invisible from within N. By combining export and without facilities, FP2 allows a very flexible control over the visibility of definitions. For example, the name E1, which is exported by N, is visible in N, in M and also in the enclosing module of M. It is also visible in P, but it is not visible in Q.
ACKNOWLEDGEMENTS
The design, the formal definition and the implementation of FP2, both as a programming language and as a specification language, are carried out by a research group at LIFIA. The current (temporary?) status of the language is the result of numerous discussions among the members of this group: Philippe Schnoebelen, Sylvie Roge, Juan-Manuel Pereira, Jean-Charles Marty, Annick Marty, Philippe Jorrand, Maria-Blanca Ibanez and Jean-Michel Hufflen. The principles for polymorphism in FP2 are drawn from the work accomplished in another research group at LIFIA, led by Didier Bert, on the design and implementation of LPG "Langage de Programmation Générique". The work on FP2 has also benefited from the support of the French Project C3 ("Concurrence, Communication, Cooperation") of CNRS and from the support of Nixdorf Computer A.G. in Paderborn, FRG, within ESPRIT Project 415 ("Parallel Languages and Architectures for Advanced Information Processing. A VLSI Approach").
BIBLIOGRAPHY
The work on FP2 has heavily relied upon the current state of the art in language design. Much inspiration has come from recently proposed functional languages and from a variety of models for parallelism and communicating processes. A collection of such important sources is listed in the following pages. Past experience of LIFIA in language design has also been of some help and the corresponding reports are inserted in the list.
ARKAXHIU, E. "Un environnement et un langage graphique pour la spécification de processus parallèles communicants." Thèse, LIFIA, Grenoble, 1984.
AUSTRY, D. "Aspects syntaxiques de MEIJE, un calcul pour le parallélisme. Applications." Thèse, LITP, Paris, 1984.
AUSTRY, D. and BOUDOL, G. "Algèbre de processus et synchronisation." Theoretical Computer Science, 1984.
BACKUS, J. W. "Can Programming Be Liberated From The von Neumann Style? A functional style and its algebra of programs." Communications of the ACM, Vol. 21, no. 8, 1978.
BACKUS, J. W. "The algebra of functional programs : function level reasoning, linear equations and extended definitions." Lecture Notes in Computer Science no. 107, 1981.
BACKUS, J. W. "Function Level Programs as Mathematical Objects." Conference on Functional Programming Languages & Computer Architecture, ACM, 1981.
BERT, D. "Spécification algébrique et axiomatique des exceptions." RR IMAG 183, LIFIA, Grenoble, 1980.
BERT, D. "Refinements of Generic Specifications with Algebraic Tools." IFIP Congress, North Holland, 1983.
BERT, D. "Generic Programming : a tool for designing universal operators." RR IMAG 336, LIFIA, Grenoble, 1982.
BERT, D. "Manuel de référence de LPG, Version 1.2." RR IMAG 408, LIFIA, Grenoble, 1983.
BERT, D. and BENSALEM, S. "Algèbre des opérateurs génériques et transformation de programmes en LPG." RR IMAG 488 (LIFIA 14), Grenoble, 1984.
BERT, D. and JACQUET, P. "Some validation problems with parameterized types and generic functions." 3rd International Symposium on Programming, Dunod, Paris, 1978.
BIDOIT, M. "Une méthode de présentation des types abstraits : applications." Thèse, LRI, Orsay, 1981.
BJORNER, D. and JONES, C. B. "The Vienna Development Method : The Meta-Language." Lecture Notes in Computer Science no. 61, 1978.
BJORNER, D. and JONES, C. B. "Formal specification & software development." Prentice Hall International, Englewood Cliffs, New Jersey, 1982.
BOUDOL, G. "Computational semantics of terms rewriting systems." RR 192, INRIA, 1983.
BROOKES, S. D. "A model for communicating sequential processes." Thesis, Carnegie-Mellon University, 1983.
BURSTALL, R. M., MACQUEEN, D. B. and SANNELLA, D. T. "HOPE: an experimental applicative language." CSR-62-80, University of Edinburgh, 1981.
CISNEROS, M. "Programmation parallèle et programmation fonctionnelle : propositions pour un langage." Thèse, LIFIA, Grenoble, 1984.
DERSHOWITZ, N. "Computing with rewrite systems." ATR-83 (8478)-1, Aerospace Corporation, 1983.
GOGUEN, J. A., THATCHER, J. W. and WAGNER, E. G. "An initial algebra approach to the specification, correctness, and implementation of abstract data types." Current Trends in Programming Methodology, Vol. 4, Prentice Hall, Englewood Cliffs, New Jersey, 1978.
GUERREIRO, P. J. V. D. "Sémantique relationnelle des programmes non-déterministes et des processus communicants." Thèse, IMAG, Grenoble, juillet 1981.
GUTTAG, J. V. and HORNING, J. J. "The algebraic specification of abstract data types." Acta Informatica, 1978.
HOARE, C. A. R. "Communicating sequential processes." Communications of the ACM, Vol. 21, no. 8, 1978.
HOARE, C. A. R. "Notes on communicating processes." PRG-33, Oxford University, 1983.
HUFFLEN, J. M. "Notes sur FP et son implantation en LPG." RR IMAG 518 (LIFIA 20), Grenoble, 1985.
JORRAND, Ph. "Specification of communicating processes and process implementation correctness." Lecture Notes in Computer Science no. 137, 1982.
JORRAND, Ph. "FP2 : Functional Parallel Programming based on term substitution." RR IMAG 482 (LIFIA 15), Grenoble, 1984.
MAY, D. "OCCAM." SIGPLAN Notices, Vol. 13, no. 4, 1983.
MILNER, R. "A calculus of communicating systems." Lecture Notes in Computer Science no. 92, 1980.
PEREIRA, J. M. "Processus communicants : un langage formel et ses modèles. Problèmes d'analyse." Thèse, LIFIA, Grenoble, 1984.
SOLER, R. "Une approche de la théorie de D. Scott et application à la sémantique des types abstraits algébriques." Thèse, LIFIA, Grenoble, septembre 1982.
TURNER, D. A. "The semantic elegance of applicative languages." Conference on Functional Programming Languages & Computer Architecture, ACM, 1981.
WILLIAMS, J. H. "On the development of the algebra of functional programs." ACM Transactions on Programming Languages and Systems, Vol. 4, no. 4, 1982.
Concurrent Prolog: A Progress Report
Ehud Shapiro
Department of Computer Science
The Weizmann Institute of Science
Rehovot 76100, Israel
April 1986
Abstract

Concurrent Prolog is a logic programming language designed for concurrent programming and parallel execution. It is a process oriented language, which embodies dataflow synchronization and guarded-command indeterminacy as its basic control mechanisms. The paper outlines the basic concepts and definition of the language, and surveys the major programming techniques that emerged out of three years of its use. The history of the language development, implementation, and applications to date is reviewed. Details of the performance of its compiler and the functionality of Logix, its programming environment and operating system, are provided.
1. Orientation
Logic programming is based on an abstract computation model, derived by Kowalski [28] from Robinson's resolution principle [40]. A logic program is a set of axioms defining relationships between objects. A computation of a logic program is a proof of a goal statement from the axioms. As the proof is constructive, it provides values for goal variables, which constitute the output of the computation. Figure 1.1 shows the relationships between the abstract computation model of logic programming, and two concrete programming languages based on it: Prolog, designed by A. Colmerauer [41], and Concurrent Prolog. It shows that Prolog programs are logic programs augmented with a control mechanism based on sequential search with backtracking; Concurrent Prolog's control is based on guarded-command indeterminacy and dataflow synchronization. The execution model of Prolog is implemented using a stack of goals, which behave like procedure calls.
Abstract model:    Logic Programs
                   (nondeterministic goal reduction; unification)

Language:          Prolog                          Concurrent Prolog

Control:           Goal and clause order define    Commit and read-only operators
                   sequential search and           define guarded-command
                   backtracking                    indeterminacy and dataflow
                                                   synchronization

Implementation:    stack of goals + trail          queue of goals +
                   for backtracking                suspension mechanism

Figure 1.1: Logic programs, Prolog, and Concurrent Prolog
Concurrent Prolog's computation model is implemented using a queue of goals, which behave like processes. Figure 1.2 argues that there is a homomorphism between von Neumann and logic, sequential and concurrent languages. That is, it claims that the relationship between Occam and Concurrent Prolog is similar to the relationship between Pascal and Prolog, and that the relationship between Pascal and Occam is similar to the relationship between Prolog and Concurrent Prolog.¹
2. Logic Programs
A logic program is a set of axioms, or rules, defining relationships between objects. A computation of a logic program is a deduction of consequences of the axioms.
279
Pascal
Prolog
Occam
Concurrent Prolog
sequential sfiack-based procedure call parameter passing if-then-else/cut concurrent
queue-based process activation message passing guarded-command/commit yon Neumann model storage variables (mutable) parameter-passing, assignment, selectors, constructors explicit/static allocation of data/processes iteration
logic programs model logical variables (single assignment) unification
implicit/dynamic allocation of data/processes with garbage collection recursion
F i g u r e 1.2: A homomorphism between von Neumann and logic, sequential and concurrent languages
of the programming language Prolog date back to the early seventies. Earlier attempts were made to use Robinson's resolution principle and unification algorithm [40] as the engine of a logic based computation model [16]. These attempts were frustrated by the inherent inefficiency of general resolution and by the lack of a natural control mechanism which could be applied to it. Kowalski [28] has found that such a control mechanism can be applied to a restricted class of logical theories, namely Horn clause theories. His major insight was that universally quantified axioms of the form A +-- B 1 , B 2 , . . . , B n
n >_ 0
can be read both declaratively, saying that A is true if B1 and B2 and ... and Bn are
280
intersect(X,L1,L2) ~-- member(X,L1), member(X,L2). member(X,list (X,Xs)). member(X,list(Y,Ys)) ~-- member(X,Ys). Program
2.1: A logic program for List intersection
true, and procedurally, saying that to prove the goal A (execute procedure A, solve problem A), one can prove subgoals (execute subprocedures, solve subproblems) B1 and B2 and ... and Bn. Such axioms are called definite-clauses. A logic program is a finite set of definite clauses. Program 2.1 is an example of a logic program for defining list intersection. It assumes that lists such as [1,2,3] are represented by recursive terms such as
list(1,1ist( e, list( S, nil) ) ). Declaratively, its first axiom reads: X is in the intersection of lists L1 and L2 if X is a member of L1 mad X is a member of/;2. Procedurally, it reads: to find an X in the intersection of L1 and/;2 find an X which is a member of L1 and is also a member of L2. The axioms defining member read declaratively: X is a member of the list whose first element is X. X is a member of the list list( Y, Ys) if X is a member of Ys. (Here and in the following we use the convention that names of logical variable begin with an upper-case letter.) The difference between the various logic programming languages, such as sequential Prolog [41], PARLOG [7], Guarded Horn Clauses [65], and Concurrent Prolog [49], lie in the way they deduce consequences from such axioms. However, the deduction mechanism used by all these languages is based on the abstract interpreter for logic programs, shown in Figure 2.1. The notions it uses are explained below. On the face of it, the abstract interpreter seems nothing b u t a simple nondeterministic reduction engine: it has a resolvent, which is a set of goals to reduce; it selects a goal from the resolvent, a unifiable clause from the program, and reduces the goal using t h e clause. What distinguishes this computation model from others is the logical variable, and the unification procedure associated with it. The basic computation step of the interpreter, as well as that of Prolog and Concurrent Prolog, is the unification of a goal with the head of a clause [40]. The unification of two terms involves finding a substitution of values for variables in the terms that make the two terms identical. Thus unification is a simple and powerful form of pattern matching.
Input: A logic program P and a goal G

Output: Gθ, which is an instance of G proved from P, or failure.

Algorithm:
  Initialize the resolvent to be G, the input goal.
  While the resolvent is not empty do
    choose a goal A in the resolvent and a fresh copy of a clause
    A' ← B1,B2,...,Bk, k ≥ 0, in P, such that A and A' are unifiable
    with a substitution θ (exit if such a goal and clause do not exist).
    Remove A from, and add B1,B2,...,Bk to, the resolvent.
    Apply θ to the resolvent and to G.
  If the resolvent is empty then output G, else output failure.

Figure 2.1: An abstract interpreter for logic programs
Single-assignment (assigning a value to a single-assignment variable).
® Parameter passing (binding actual parameters to formal parameters in a procedure or function call). •
Simple testing (testing whether a variable equals some value, or if the values of two variables are the same).
.
Data access (field selectors in Pascal, ear and edr in Lisp).
•
Data construction (new in Pascal, cons in Lisp).
•
Communication (as elaborated below).
The efficient implementation of a logic programming language involves the compilation of the known part of unification, as specified by the program's clause heads to the above mentioned set of more primitive operations [72]. A term is either a variable, e.g. X, a constant, e.g. a and 18, or a compound t e r m f(T1,T~,...,T,~), whose main functor has name f, arity n, and whose argu-
282 T1,T2,...,Tn, are terms. A substitution element is a pair of the form Variable=Term. An (idempotent) substitution is a finite set of substitution elements ( V I = T 1 , V2=T2,..., V,,=T,~) such that V i ¢ V1 if i ~ j, and Vi does not occur in Ti for any i and 3".
ments
The application of a substitution θ to a term S, denoted Sθ, is the term obtained by replacing every occurrence of a variable V by the term T, for every substitution element V=T in θ. Such a term is called an instance of S. For example, applying the substitution {X=3, Xs=list(1,list(3,nil))} to the term member(X,list(X,Xs)) gives the term member(3,list(3,list(1,list(3,nil)))).
A substitution θ unifies terms T1 and T2 if T1θ=T2θ. Two terms are unifiable if they have a unifying substitution. If two terms T1 and T2 are unifiable then there exists a unique substitution θ (up to renaming of variables), called the most general unifier of T1 and T2, with the following property: for any other unifying substitution σ of T1 and T2, T1σ is an instance of T1θ. In the following we use 'unifier' as a shorthand for 'most general unifier'. For example, the unifier of X and a is {X=a}. The unifier of X and Y is {X=Y} (or {Y=X}). The unifier of f(X,X) and f(A,b) is {X=b, A=b}, and the unifier of g(X,X) and g(a,b) does not exist. Considering the example logic program above, the unifier of member(A,list(1,list(3,nil))) and member(X,list(X,Xs)) is {X=1, A=1, Xs=list(3,nil)}.
3. Concurrent Prolog
We first survey some common concepts of concurrent programming, tie them to logic programming, and then introduce Concurrent Prolog.
3.1 Concurrent programming: processes, communication, and synchronization

A concurrent programming language can express concurrent activities, or processes, and communication among them. Processes are abstract entities; they are the generalization of the execution thread of sequential programs. The actions a process can take include inter-process communication, change of state, creation of new processes, and termination. It might seem that a declarative language, based on the logic programming computation model, would be unsuitable for expressing the wide spectrum of actions of concurrent programs. This is not the case. Sequential Prolog shows that, in addition to its declarative reading, a logic program can be read procedurally.
a1) Goal = Process
a2) Conjunctive goal = Network of processes
a3) Shared logical variable = Communication channel = Shared-memory single-assignment variable
a4) Clauses of a logic program = Rules, or instructions, for process behavior

Figure 3.1: Concepts of logic programming and concurrency
Concurrent Prolog shows yet another possible reading of logic programs, namely the process behavior reading, or process reading for short. The insight we would like to convey is that the essential components of concurrent computations (concurrent actions, indeterminate actions, communication, and process creation and termination) are already embodied in the abstract computation model of logic programming, and that they can be uncovered using the process reading. Before introducing the computation model of Concurrent Prolog that embodies these notions, we would like to dwell on the intuitions and metaphors that link the formal, symbolic, computational model with the familiar concepts of concurrent programming, via a sequence of analogies, shown in Figure 3.1. We exemplify them using the Concurrent Prolog program for quicksort, Program 3.1. In the meantime the read-only operator '?' can be ignored, and the commit operator '|' can be read as a conjunction ','. Following Edinburgh Prolog, the term [X|Xs] is a syntactic convention replacing list(X,Xs), and [ ] replaces nil. The list [1,2|Xs] is a shorthand for [1|[2|Xs]], that is, list(1,list(2,Xs)), and [1,2,3] for list(1,list(2,list(3,nil))).

quicksort([X|Xs],Ys) ←
    partition(Xs?,X,Smaller,Larger),
    quicksort(Smaller?,Ss),
    quicksort(Larger?,Ls),
    append(Ss?,[X|Ls?],Ys).
quicksort([ ],[ ]).

partition([Y|In],X,[Y|Smaller],Larger) ←
    X ≥ Y | partition(In?,X,Smaller,Larger).
partition([Y|In],X,Smaller,[Y|Larger]) ←
    X < Y | partition(In?,X,Smaller,Larger).
partition([ ],X,[ ],[ ]).

append([X|Xs],Ys,[X|Zs]) ←
    append(Xs?,Ys,Zs).
append([ ],Xs,Xs).

Program 3.1: A Concurrent Prolog Quicksort program

The clauses for quicksort read: Sorting the list [X|Xs] gives Ys if partitioning Xs with respect to X gives Smaller and Larger, sorting Larger gives Ls, sorting Smaller gives Ss, and appending Ss to [X|Ls] gives Ys. Sorting the empty list gives the empty list. The first clause of partition reads: partitioning a list [Y|In] with respect to X gives [Y|Smaller] and Larger if X ≥ Y and partitioning In with respect to X gives Smaller and Larger.

a1) Goal = Process

A goal p(T1,T2,...,Tn) can be viewed as a process. The arguments of the goal (T1,T2,...,Tn) constitute the data state of the process. The predicate, p/n (name p, arity n), is the program state, which determines the procedure (the set of clauses with the same predicate name and arity) executed by the process. A typical state of a quicksort process might be quicksort([5,55,3,7,19|Xs],Ys).

a2) Conjunctive goal = Network of processes

A network of processes is defined by its constituent processes, and by the way they are interconnected. A conjunctive goal is a set of processes. For example, the body of the recursive clause of quicksort defines a network of four processes: one partition process, two quicksort processes, and one append process. The variables shared between the goals in the conjunction determine an interconnection scheme. This leads to a third analogy.

a3) Shared logical variable = Communication channel = Shared single-assignment variable
A communication channel provides a means by which two or more processes may communicate information. A shared variable is another means for several processes to share or communicate information. A logical variable, shared between two or more goals (processes), can serve both these functions. For example, the variables Smaller and Larger serve as communication channels between partition and the two recursive quicksort processes. Logical variables are single-assignment, since a logical variable can be assigned only once during a computation. Hence, a logical variable is analogous to a communication channel capable of transmitting only one message, or to a shared-memory variable that can receive only one value.
Note that under this single-assignment restriction the distinction between a communication channel and a shared-memory variable vanishes. It is convenient to view shared logical variables sometimes as analogous to communication channels and sometimes as analogous to shared-memory variables. The single-assignment restriction has been proposed as suitable for parallel programming languages independently of logic programming [1]. At first sight it would seem a hindrance to the expressiveness of Concurrent Prolog, but it is not. Multiple communications and cooperative construction of a complex data structure are possible by starting with a single shared logical variable, as explained below.

a4) Clauses of a logic program = Rules, or instructions, for process behavior

The actions of a process can be separated into control actions and data actions. Control actions include termination, iteration, branching, and creation of new processes. These are specified explicitly by logic program clauses. Data actions include communication and various operations on data structures, e.g. single-assignment, inspection, testing, and construction. As in sequential Prolog, data actions are specified implicitly by the arguments of the head and body goals of a clause, and are realized via unification.
3.2 The process reading of logic programs

We show how termination, iteration, branching, state-change, and creation of new processes can be specified by clauses, using the process reading of logic programs.

1) Terminate. A unit clause, i.e. a definite clause with an empty body:

    p(T1,T2,...,Tn).

specifies that a process in a state unifiable with p(T1,T2,...,Tn) can reduce itself to the empty set of processes, and thus terminate. For example, the clause quicksort([ ],[ ]) says that any process which unifies with it, e.g. quicksort([ ],Ys), may terminate. While doing so, this process unifies Ys with [ ], effectively closing its output stream.

2) Change of data and program state. An iterative clause, i.e. a clause with one goal in the body:

    p(T1,T2,...,Tn) ← q(S1,S2,...,Sm).

specifies that a process in a state unifiable with p(T1,T2,...,Tn) can change its state to q(S1,S2,...,Sm). The program state is changed to q/m (i.e. branch),
and the data state to (S1,S2,...,Sm). For example, the recursive clause of append specifies that the process append([1,3,4,7,12|L1],[21,22,25|L2],L3) can change its state to append([3,4,7,12|L1],[21,22,25|L2],Zs). While doing so, it unifies L3 with [1|Zs], effectively sending an element down its output stream. Since append branches back to itself, it is actually an iterative process.

3) Create new processes. A general clause, of the form:

    p(T1,T2,...,Tn) ← Q1,Q2,...,Qm.

specifies that a process in a state unifiable with p(T1,T2,...,Tn) can replace itself with m new processes as specified by Q1,Q2,...,Qm. For example, the recursive clause of quicksort says that a quicksort process whose first argument is a list can replace itself with a network of four processes: one partition process, two quicksort processes, and one append process. It further specifies their interconnection, and initializes the first element in the list forming the second argument of append to be X, the partitioning element.

Note that under this reading an iterative clause can be viewed as specifying that a process can be replaced by another process, rather than change its state. These two views are equivalent.

Recall the abstract interpreter in Figure 2.1. Under the process reading the resolvent, i.e. the current set of goals of the interpreter, is viewed as a network of concurrent processes, where each goal is a process. The basic action a process can take is process reduction: the unification of the process with the head of a clause, and its reduction to (or replacement by) the processes specified by the body of the clause. The actions a process can take depend on its state, that is, on whether its arguments unify with the arguments of the head of a given clause. Concurrency can be achieved by reducing several processes in parallel. This form of parallelism is called And-parallelism. Communication is achieved by the assignment of values to shared variables, caused by the unification that occurs during process reduction. Given a process to reduce, all clauses applicable for its reduction may be tried in parallel. This form of parallelism is called Or-parallelism, and is the source of a process's ability to take indeterminate actions.
3.3 Synchronization using the read-only and commit operators

In contrast to sequential Prolog, in Concurrent Prolog an action taken by a process cannot be undone: once a process has reduced itself using some clause, it is
committed to it. The resulting computational behavior is called committed-choice nondeterminism, don't-care nondeterminism, and sometimes also indeterminacy, to distinguish it from the "don't-know" nondeterminism of the abstract interpreter. This design decision is common to other concurrent logic programming languages, including the original Relational Language [6], PARLOG [7], and GHC [54]. It implies that a process faced with a choice had better make a correct one, lest it doom the entire computation to failure.

The basic strategy taken by Concurrent Prolog to ensure that processes make correct choices of actions is to provide the programmer with a mechanism to delay process reductions until enough information is available so that a correct choice can be made. The two synchronization and control constructs of Concurrent Prolog are the read-only and the commit operators.

The read-only operator (indicated by a question-mark suffix '?') can be applied to logical variables, e.g. X?, thus designating them as read-only. The read-only operator is ignored in the declarative reading of a clause, and can be understood only operationally. Intuitively, a read-only variable cannot be written upon, i.e. be instantiated. It can receive a value only through the instantiation of its corresponding write-enabled variable. A unification that attempts to instantiate a read-only variable suspends until that variable becomes instantiated. For example, the unification of X? with a suspends; that of f(X,Y?) with f(a,Z) succeeds, with unifier {X=a, Z=Y?}. Considering Program 3.1, the unification of quicksort(In?,Out) with both quicksort([ ],[ ]) and quicksort([X|Xs],Ys) suspends, as does the unification of append(L1?,[3,4,5|L2],L3) with the heads of its two clauses. However, as soon as In? gets instantiated to [8|In1], for example, by another partition process which has a write-enabled occurrence of In, the unification of the quicksort goal with the head of the unit clause fails, and with the recursive clause succeeds.

Definition: We assume two distinct sets of variables, write-enabled variables and read-only variables. The read-only operator, ?, is a one-to-one mapping from write-enabled to read-only variables. It is written in postfix notation. For every write-enabled variable X, the variable X? is the read-only variable corresponding to X. ∎

The extension of the read-only operator to terms which are not write-enabled variables is the identity function.

Definition: A substitution θ affects a variable X if it contains a substitution element X=T. A substitution θ is admissible if it does not affect any read-only variable. ∎
Definition: The read-only extension of a substitution θ, denoted θ?, is the result of adding to θ the substitution elements X?=T? for every X=T in θ such that T ≠ X?. ∎

Definition: The read-only unification of two terms T1 and T2 succeeds, with read-only mgu θ?, if T1 and T2 have an admissible mgu θ. It suspends if every mgu of T1 and T2 is not admissible. It fails if T1 and T2 do not unify. ∎

Note that the requirement of admissibility prevents the unification attempt from instantiating read-only variables. However, once the unification is successful, the read-only unifier instantiates read-only variables in accordance with their corresponding write-enabled variables. This definition of read-only unification resolves several ill-defined points in the original description of Concurrent Prolog [49], discussed by Saraswat [42] and Ueda [65], such as order-dependency. It implicitly embodies the suggestion of Ramakrishnan and Silberschatz [39] that a single unification should not be able to "feed itself", that is, simultaneously write on a write-enabled variable and read from its corresponding read-only variable. In particular, it implies that the unification of f(X,X?) with f(a,a) suspends.

The second synchronization and control construct of Concurrent Prolog is the commit operator. A guarded clause is a clause of the form:

    A ← G1,G2,...,Gm | B1,B2,...,Bn        m,n ≥ 0.

The commit operator '|' separates the right-hand side of a rule into a guard and a body. Declaratively, the commit operator is read just like a conjunction: A is true if the G's and the B's are true. Procedurally, the reduction of a process A1 using such a clause suspends until A1 is unifiable with A, and the guard is determined to be true. Thus the guard is another mechanism for preventing or postponing erroneous process actions. As a syntactic convention, if the guard is empty, i.e. m=0, the commit operator is omitted.

The read-only variables in the recursive invocations of quicksort, partition, and append cause them to suspend until it is known whether the input is a list or nil. The non-empty guard in the recursive clauses for partition allows the process to choose correctly on which output stream to place its next input element. It is placed on the first stream if it is smaller than or equal to the partitioning element. It is placed on the second stream if it is larger than the partitioning element.

Concurrent Prolog allows the G's, the goals in the guard, to be calls to general Concurrent Prolog programs. Hence guards can be nested recursively, and testing the applicability of a clause for reduction can be arbitrarily complex. In the following discussion we will restrict our attention to a subset of Concurrent Prolog
called Flat Concurrent Prolog [33]. In Flat Concurrent Prolog the goals in the guards can contain calls to a fixed set of simple test-predicates only. For example, Program 3.1 is a Flat Concurrent Prolog program. In Flat Concurrent Prolog, the reduction of a goal using a guarded clause succeeds if the goal unifies with the clause's head, and its guard test predicates succeed. Flat Concurrent Prolog is both the target language and the implementation language for the Logix system, to be discussed in Section 5. It is a rich enough subset of Concurrent Prolog to be sufficient for most practical purposes. It is simple enough to be amenable to an efficient implementation, resulting in a high-level concurrent programming language which is practical even on conventional uniprocessors.
3.4 An abstract interpreter for Flat Concurrent Prolog

Flat Concurrent Prolog is provided with a fixed set T of test predicates. Typical test predicates include string(X) (which suspends until X is a non-variable, then succeeds if it is a string and fails otherwise), and X < Y (which suspends until X and Y are non-variables, then succeeds if they are integers such that X < Y, and fails otherwise).

Definition: A flat guarded clause is a guarded clause of the form

    A ← G1,G2,...,Gm | B1,B2,...,Bn        m,n ≥ 0.

such that the predicate of Gi is in T, for all i, 1 ≤ i ≤ m. A Flat Concurrent Prolog program is a finite set of flat guarded clauses. ∎
An abstract interpreter for Flat Concurrent Prolog is defined in Figure 3.2. The interpreter again leaves the nondeterministic choices of a goal and a clause unspecified: the scheduling policy, by which goals are added to and removed from the resolvent, and the clause selection policy, which indicates which clause to choose for reduction when several clauses are applicable. Fairness in the scheduling and clause selection policies is further discussed in [44]. For concreteness, we will explain the choices made in Logix.

Logix implements bounded depth-first scheduling. In bounded depth-first scheduling the resolvent is maintained as a queue, and each dequeued goal is allocated a time-slice t. A dequeued goal can be reduced t times before it is returned to the back of the queue. If a goal is reduced using an iterative clause A ← B, then B inherits the remaining time-slice. If it is reduced using a general clause A ← B1,B2,...,Bn, then, by convention, B1 inherits the remaining time-slice, and B2 to Bn are enqueued at the back of the queue. Bounded depth-first scheduling reduces the overhead
Input: A Flat Concurrent Prolog program P and a goal G.
Output: Gθ, if Gθ is an instance of G proved from P, or deadlock otherwise.
Algorithm:
    Initialize the resolvent to be G, the input goal.
    While the resolvent is not empty do
        choose a goal A in the resolvent and a fresh copy of a clause
        A' ← G1,G2,...,Gm | B1,B2,...,Bn in P such that A and A' have a
        read-only unifier θ and the tests (G1,G2,...,Gm)θ succeed
        (exit if such a goal and clause do not exist).
        Remove A from, and add B1,B2,...,Bn to, the resolvent.
        Apply θ to the resolvent and to G.
    If the resolvent is empty then output G, else output deadlock.

Figure 3.2: An abstract interpreter for Flat Concurrent Prolog
of process switching, and allows more effective caching of process arguments in registers. Logix also implements stable clause selection, which means that if a process has several applicable clauses for reduction, the first one (textually) will be chosen. Stability is a property that can be abused by programmers. It is hard to preserve in a distributed implementation [44], and it makes the life of optimizing compilers harder. It is not part of the language definition. In addition, Logix implements a non-busy waiting mechanism, in which a suspended process is associated with the set of read-only variables which caused the suspension of its clause reductions. If any of the variables in that suspension set gets instantiated, the process is activated and enqueued at the back of the queue.

The abstract interpreter models concurrency by interleaving. A truly parallel implementation of the language requires that each process reduction be viewed as an atomic transaction, which reads from and writes to logical variables. A parallel interpreter must ensure that its resulting behavior is serializable, i.e. can be ordered to correspond to some possible behavior of the sequential interpreter. Such an algorithm has been designed [ref distributed] and is currently being implemented on Intel's iPSC at the Weizmann Institute.
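To make the scheduling discussion concrete, here is a toy sketch of bounded depth-first scheduling (this sketch is ours, not the Logix implementation; it ignores suspension, deadlock, and guards, uses a plain list as the queue, and assumes the clause(A,B) program representation of Program 4.6 below):

    bdf([ ],_).
    bdf([G|Q],T) ←                      % dequeue a goal, give it slice T
        slice(G?,T,Q?,Q1),
        bdf(Q1?,T).

    slice(true,_,Q,Q).                  % the goal terminated
    slice(G,0,Q,Q1) ←
        G ≠ true | append(Q,[G],Q1).    % slice exhausted: requeue G
    slice((A,B),T,Q,Q2) ←
        T > 0 |
        append(Q,[B],Q1),               % fork: second goal to the back
        T1 := T - 1,
        slice(A?,T1?,Q1?,Q2).           % first goal inherits the slice
    slice(G,T,Q,Q1) ←
        G ≠ true, G ≠ (_,_), T > 0 |
        clause(G?,B),                   % reduce G using a program clause
        T1 := T - 1,
        slice(B?,T1?,Q?,Q1).            % continue depth-first

Here append is as in Program 3.1.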
4. Concurrent Prolog Programming Techniques
In the past three years of its use, a wide range of Concurrent Prolog programming techniques has been developed. Some are simply known concurrent programming techniques restated in the formalism of logic programming, e.g. divide-and-conquer, monitors, stream-processing, and bounded buffers. Others are novel techniques, which exploit the unique aspects of logic programs, notably the logical variable. Examples include difference-streams, incomplete messages, and the short-circuit technique. Some techniques exploit properties of the read-only variable, e.g. blackboards, constraint-systems, and protected data-structures. Perhaps the most important in the long run are the meta-programming techniques. Using enhanced meta-interpreters, one can implement a wide spectrum of programming environment and operating system functions, such as inspecting and affecting the state of the computation, and detecting distributed termination and deadlock, in a simple and uniform way [45,20]. In the following account of these techniques breadth was preferred over depth. References to deeper treatments of the various subjects are provided.
4.1 Divide-and-conquer: recursion and communication

Divide-and-conquer is a method for solving a problem by dividing it into subproblems, solving them, possibly in parallel, and combining the results. If the subproblems are small enough they are solved directly; otherwise they are solved by applying the divide-and-conquer method recursively. Parallel divide-and-conquer algorithms can be specified easily in both functional and logic languages. Divide-and-conquer becomes more interesting when it involves cooperation, and hence direct communication, among the processes solving the subproblems.

Program 4.1 solves a problem due to Leslie Lamport [30]. The problem is to number the leaves of a tree in ascending order from left to right, by the following recursive algorithm: spawn leaf processes, one per leaf, in such a way that each process has an input channel from the leaf process to its left, and an output channel to the leaf process to its right. The leftmost leaf process is initialized with a number. Each process receives a number from the left, numbers its leaf with it, increments it by one, and sends the result to the right. The problem is shown in order to explore the problems of combining recursion with communication, and is not necessarily a useful parallel algorithm. The program assumes that binary trees are represented using the terms
number(leaf(N),N,N1) ←
    plus(N?,1,N1).
number(tree(L,R),N,N2) ←
    number(L?,N?,N1),
    number(R?,N1?,N2).

Program 4.1: Numbering the leaves of a tree: recursion with general communication
leaf(X) and tree(L,R). For example, tree(leaf(X1),tree(leaf(X2),leaf(X3))) is a tree with three leaves.
Program 4.1 works in parallel on the two subtrees of a tree, until it reaches a leaf, where it spawns a plus process. A plus process suspends until its first two arguments are integers, then unifies the third with their sum. The plus processes, however, cannot operate in parallel. Rather, they are synchronized in such a way that they are activated one at a time, starting from the leftmost node. Program 4.1 passes the communication channels to the leaf processes in a simple and uniform way, via unification. It numbers a leaf by unifying its value with the left channel, even before that channel has transmitted a value.
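As a concrete illustration (a hypothetical run, not from the original text), consider the goal

    number(tree(leaf(X1),tree(leaf(X2),leaf(X3))), 0, N)

The leftmost leaf is numbered immediately (X1=0); its plus process then produces 1, which numbers X2; the next plus produces 2, which numbers X3; and the final plus yields N=3. Each leaf is thus numbered only after the number from its left neighbour arrives.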
4.2 Stream processing

Concurrent Prolog is a single-assignment programming language, in that a logical variable can be assigned a non-variable term only once during a computation. Hence it seems that, as a communication channel, a shared logical variable can transmit at most one message between two processes. This is not quite true. A variable can be assigned a term that contains a message and another variable. This new variable is shared by the processes that shared the original variable. Hence it can serve as a new communication channel, which can in turn be assigned a term that contains an additional message and an additional variable, and so on ad infinitum.

This idea is the basis of stream communication in Concurrent Prolog. In stream communication, the communicating processes, typically one sender and one receiver (also called the stream's producer and consumer), share a variable, say Xs. The sender, who wants to send a sequence of messages m1,m2,m3,..., assigns Xs to [m1|Xs1] in order to send m1, then instantiates Xs1 to [m2|Xs2] to send m2, then assigns Xs2 to [m3|Xs3], and so on. The receiver inspects the read-only variable Xs?, attempting to unify it with [M1|Xs1].
merge([X|Xs],Ys,[X|Zs]) ← merge(Xs?,Ys?,Zs).
merge(Xs,[Y|Ys],[Y|Zs]) ← merge(Xs?,Ys?,Zs).
merge([ ],[ ],[ ]).

Program 4.2: A binary stream merger
When successful, the receiver can process the first message M1, and iterate with Xs1?, waiting for the next message. Exactly the same technique works for one sender and multiple receivers, provided that all receivers have read-only access to the original shared variable. A receiver that spawns a new process can include it in the group of receivers by providing it with a read-only reference to the current stream variable.

Program 3.1 for quicksort demonstrates stream processing. Each partition process has one input stream and two output streams. On each iteration it consumes one element from its input stream, and places it on one of its output streams. When it reaches the end of its input stream it closes its two output streams and terminates. The append process from the same program is a simpler example of a stream processor. It copies its first input stream into its output stream, and when it reaches the end of the first input stream it binds the second input stream to its output stream, and terminates.
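To make this concrete, here is a minimal producer/consumer pair (the predicate names and the example are ours, not from the original text) communicating through a single shared stream:

    integers(N,Max,[N|Ns]) ←
        N < Max | N1 := N + 1, integers(N1?,Max,Ns).
    integers(N,Max,[ ]) ←
        N ≥ Max | true.

    sum([X|Xs],Acc,S) ←
        Acc1 := Acc + X, sum(Xs?,Acc1?,S).
    sum([ ],Acc,Acc).

The conjunction integers(0,5,Ns), sum(Ns?,0,S) runs as two concurrent processes: the consumer suspends on Ns? until the producer extends the stream, and finally unifies S with 10.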
4.3 Stream merging

Streams are the basic communication means between processes in Concurrent Prolog. It is sometimes necessary, or convenient, to allow several processes to communicate with one other process. This is achieved in Concurrent Prolog using a stream merger. A stream merger is not a function, since its output, the merged stream, can be any one of the possible interleavings of its input streams. Hence stream-based functional programming languages incorporate stream mergers as a language primitive. In logic programming, however, a stream merger can be defined directly, as was shown by Clark and Gregory [6]; their definition, adapted to Concurrent Prolog, is shown in Program 4.2.

As a logic program, Program 4.2 defines the relation containing all facts merge(Xs,Ys,Zs) in which the list Zs is an order-preserving interleaving of the elements of the lists Xs and Ys. As a process, merge(Xs?,Ys?,Zs) behaves as follows: If neither Xs nor Ys is instantiated, it suspends, since unification with all
three clauses suspends. If Xs is a list then it can reduce using the first clause, which copies the list element to Zs, its output stream, and iterates with the updated streams. Similarly with Ys and the second clause. If it has reached the end of its input streams it closes its output stream and terminates, as specified by the third clause.
In case both Xs and Ys have elements ready, either the first or the second clause can be used for reduction. The abstract interpreter of Flat Concurrent Prolog, defined in Figure 3.2, does not dictate which one to use. This may lead to an unfortunate situation, in which one clause (say the first) is always chosen, and elements from the second stream never appear in the output stream. A stream merger that allows this is called unfair. There are several techniques to implement fair mergers in Concurrent Prolog; they are discussed in [51,52,67].
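One simple fairness idea can be sketched directly (this sketch relies on the stable, textual-order clause selection described in Section 3.4, and is not necessarily one of the cited solutions): swap the input streams after a reduction that consumes from the first stream, so that neither stream holds priority forever.

    fair_merge([X|Xs],Ys,[X|Zs]) ← fair_merge(Ys?,Xs?,Zs).
    fair_merge(Xs,[Y|Ys],[Y|Zs]) ← fair_merge(Xs?,Ys?,Zs).
    fair_merge([ ],[ ],[ ]).

When both streams have elements ready, the first clause is chosen and the swap gives the other stream priority on the next reduction, so the two streams alternate.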
4.4 Recursive process networks

The recursive structure of Concurrent Prolog, together with the logical variable, makes it a convenient language for specifying recursive process networks. An example is the quicksort program above. Although hard to visualize, the program forms two tree-like networks: a tree of partition processes, which partitions the input list into smaller lists, and a tree of append processes, which concatenates these lists together. Process trees are useful for divide-and-conquer algorithms, and for searching, among other things. Here we show an application to stream merging.

An n-ary stream merger can be obtained by composing n-1 binary stream mergers in a process tree. A program for creating a balanced tree of binary merge operators is shown as Program 4.3. Program 4.3 creates the merge tree layer by layer, using an auxiliary procedure merge_layer. The merge trees defined are static, i.e. the number of streams to be merged should be defined in advance, and cannot be changed easily. In [44] it is shown how to implement multiway dynamic merge trees in Concurrent Prolog, using the concept of 2-3-trees. Ueda and Chikayama [67] and Shapiro and Safra [52] improve this scheme further.

More complex process structures, including rectangular and hexagonal process arrays [50], quad-trees [11], and pyramids, can easily be constructed in Concurrent Prolog. These process structures are found useful in programming systolic algorithms, and in spawning virtual parallel machines [64].
merge_tree(Bottom,Top) ←
    Bottom ≠ [_] |
    merge_layer(Bottom,Bottom1),
    merge_tree(Bottom1?,Top).
merge_tree([Xs],Xs).

merge_layer([Xs,Ys|Bottom],[Zs|Bottom1?]) ←
    merge(Xs?,Ys?,Zs),
    merge_layer(Bottom?,Bottom1).
merge_layer([Xs],[Xs]).
merge_layer([ ],[ ]).

merge(Xs,Ys,Zs) ← See Program 4.2.

Program 4.3: A balanced binary merge tree
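For example (a hypothetical usage, not in the original text), the goal merge_tree([As,Bs,Cs,Ds],Out) spawns one merge_layer that creates two merge processes, for As/Bs and for Cs/Ds, and then a second layer with a single merge process that combines their outputs into Out: three binary mergers for four input streams.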
4.5 Systolic programming: parallelism with locality and pipelining

Systolic algorithms were designed originally by Kung and his colleagues [29] for implementation via special-purpose hardware. However, they are based on two rather general principles:

1. Localize communication.
2. Overlap and balance computation with communication.
The advantages of implementing systolic algorithms on general-purpose parallel computers using a high-level language, compared to implementation in special-purpose hardware, are obvious. The systolic programming approach [50] was conceived in an attempt to apply the systolic approach to general-purpose parallel computers. The specification of systolic algorithms in Concurrent Prolog is rather straightforward. However, to ensure that performance is preserved in the implementation, two aspects of the execution of the program need explicit attention. One is the mapping of processes to processors, which should preserve the locality of the algorithm, using the locality of the architecture. Another is the communication pattern employed by the processes.

In the systolic programming approach [50], the mapping is done using a special notation, Logo-like Turtle programs [36]. Each process, like a turtle in Logo, is associated with a position and a heading. A goal in the body of a clause may have a Turtle program associated with it. When activated, this Turtle program, applied to the position and heading of the parent process, determines the position and
mm([ ],_,[ ]).
mm([X|Xs],Ys,[Z|Zs]) ←
    vm(X,Ys?,Z)@right,
    mm(Xs?,Ys,Zs)@forward.

vm(_,[ ],[ ]).
vm(Xs,[Y|Ys],[Z|Zs]) ←
    ip(Xs?,Y?,Z),
    vm(Xs,Ys?,Zs)@forward.

ip([X|Xs],[Y|Ys],Z) ←
    Z := (X*Y) + Z1,
    ip(Xs?,Ys?,Z1).
ip([ ],[ ],0).

Program 4.4: Matrix multiplication
heading of the new process. Using this notation, complex process structures can be mapped in the desired way. Programming in Concurrent Prolog augmented with Turtle programs as a mapping notation is as easy as mastering a herd of turtles.

Pipelining is the other aspect that requires explicit attention. The performance of many systolic algorithms depends on routing communication in specific patterns. The abstract specification of a systolic algorithm in Concurrent Prolog often does not enforce a communication pattern. However, the tools to do that are in the language. By appropriate transformations, broadcasting can be replaced by pipelining, and specific communication patterns can be enforced [63]. For example, Program 4.4 is a Turtle-annotated Concurrent Prolog program for multiplying two matrices, based on the classic systolic algorithm which pipelines two matrices orthogonally on the rows and columns of a processor array [ref Kung]. It assumes that the two input matrices are represented by a stream of streams of their columns and rows respectively. It produces a stream of streams of the rows of the output matrix. The program operates by spawning a rectangular grid of ip processes for computing the inner products of each row and column. Unlike the original systolic algorithm, this program does not pipeline the streams between ip processes but rather broadcasts them. However, pipelining can be easily achieved by adding two additional streams to each process [50].
4.6 The logical variable

All the programming techniques shown before can be realized in other computation models, with various degrees of success. For example, stream processing can be specified with functional notation [27]. By adding a non-deterministic constructor to functional languages, they can even specify stream mergers [12]. Using simultaneous recursion equations one can specify recursive process networks. In this section we show Concurrent Prolog programming techniques which are unique to logic programming, as they rely on properties of the logical variable. Of course, one can take a functional programming language, extend it with stream constructors, non-deterministic constructors, simultaneous recursion equations, and logical variables, and perhaps achieve these techniques as well. But why approximate logic programming from below, instead of just using it?
4.6.1. Incomplete messages

An incomplete message is a message that contains one or more uninstantiated variables. An incomplete message can be viewed in various ways, including:

• A message that is being sent incrementally.
• A message containing a communication channel as an argument.
• A message containing implicitly the identity of the sender.
• A data structure that is being constructed cooperatively.

The first and second views are taken by stream-processing programs. A stream is just a message being sent incrementally, and each list-cell in the stream is a message containing the stream variable to be used in the subsequent communication. Similarly, the processes for constructing the merge trees communicated via incomplete messages, each containing a stream of streams. However, it is not necessary that the sender of an incomplete message be the one to complete it. It could also be the receiver. Two Concurrent Prolog programming techniques, monitors and bounded buffers [59], operate this way. Monitors also take the third view, that an incomplete message holds implicitly the identity of its sender. This view enables rich communication patterns to be specified without the need for an extra layer of naming conventions and communication protocols, by providing a simple mechanism for replying to a message.
4.6.2. Monitors

Monitors were introduced into conventional concurrent programming languages by Hoare [21], as a technique for structuring the management of shared data. A monitor has some local data, which it maintains, and some procedures, or entries, defined for manipulating and examining the data. A user process that wants to update or inspect the data performs the relevant monitor call.
stack([push(X)|In],S) ←
    stack(In?,[X|S]).
stack([pop(X)|In],[X|S]) ←
    stack(In?,S).
stack([ ],[ ]).

Program 4.5: A stack monitor
The monitor has built-in synchronization mechanisms, which prevent different callers from updating the data simultaneously, and allow the inspection of the data only when it is in an integral state. One of the convenient aspects of monitors is that the process performing a monitor call does not need to identify itself explicitly. Rather, some of the arguments of the monitor call (which syntactically looks similar to a procedure call) serve as the return address for the information provided by the monitor. When the monitor call completes, the caller can inspect these arguments and find there the answer to its query.

Stream-based languages can mimic the concept of a monitor as follows [2]. A designated process, the "monitor" process, maintains the data to be shared. Users of the data have streams connected to the monitor via a merger. "Monitor calls" are simply messages to the monitor, which updates the data and responds to queries according to the message received. The elegance of this scheme is that no special language constructs need be added in order to achieve this behavior: the concepts already available, of processes, streams, and mergers, are sufficient. The awkward aspect of this scheme is routing the response back to the sender.

Fortunately, in Concurrent Prolog incomplete messages allow responses to queries to be routed back to the sender directly, without the need for an explicit naming and routing mechanism. Both the underlying mechanism required to implement incomplete messages and the resulting effect from the user's point of view are similar to conventional monitors, where a process that performs a monitor call finds the answer by inspecting the appropriate argument of the call, after the call is "served". Hence Concurrent Prolog provides the convenience of monitors, while maintaining the elegance of stream-based communication. In contrast to conventional monitors, Concurrent Prolog monitors are not a special language construct, but simply a programming technique for organizing processes and data.

Program 4.5 implements a simple stack monitor. It understands two messages: push(X), on which it changes the stack contents S to [X|S], and pop(X), to which it responds by unifying the top element of the stack with X, and changing the stack contents to the remaining stack. pop(X) is an example of an incomplete message.
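For example (a hypothetical interaction, not in the original text), a client can create the monitor with stack(S?,[ ]) and then instantiate its input stream:

    S = [push(1),push(2),pop(X),pop(Y)|S1]

After the monitor serves the four messages, X=2 and Y=1: the answers arrive through the variables the client sent in its own pop messages, with no explicit reply channel.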
Monitors in Concurrent Prolog are discussed further in [48,49].
4.6.3. Detecting distributed termination: the short-circuit technique

Concurrent Prolog does not contain a sequential-AND construct. Suggestions to include one were resisted for two reasons: first, a desire to keep the number of language constructs down to a minimum; second, the belief that even if eventually such a construct would be needed, introducing it at an early stage would encourage awkward and lazy thinking. Instead of using Concurrent Prolog's dataflow synchronization mechanism, programmers would resort to the familiar sequential construct.²

In retrospect, this decision proved to be very important, both from an educational and an implementation point of view. Concurrent Prolog still does not have sequential-AND, and Logix does not have the necessary underlying machinery to implement it, even if it were desired. The reason is that implementing sequential-AND in Concurrent Prolog on a parallel machine requires solving the problem of distributed termination detection. To run P&Q (assuming that & is the sequential-AND construct) one has to detect that P has terminated in order to proceed to Q. If P spawned many parallel processes that run on different processors, this requires detecting when all of them have terminated, which is a rather difficult problem for an implementation to solve.

On the other hand, there is sometimes a need to detect when a computation terminates. First of all, as a service to the programmer or user who wishes to know whether his program worked properly and terminated, or whether it has some useful or useless processes still running in the background. Second, when interfacing with the external environment there is a need to know whether a certain set of operations, e.g. a transaction, has completed in order to proceed.

This problem can be solved using a very elegant Concurrent Prolog programming technique, called the short-circuit technique, which is due to Takeuchi [58]. The idea is simple: chain the processes in a certain computation using a circuit, where each active process is an open switch on the circuit. When a process terminates, it closes the switch and shortens the circuit. When the entire circuit is shortened, global termination is detected. The technique is implemented using logical variables, as follows: each process is invoked with two variables, Left and Right, where the Left of one process is unified with the Right of another. The leftmost and rightmost processes each have
one end of the chain connected to the manager. The manager instantiates one end of the chain to some constant and waits until the variable at the other end is instantiated to that constant as well. Each process that terminates unifies its Left and Right variables. When all terminate, the entire chain becomes one variable, and the manager sees the constant it sent on one end appearing on the other. An example of using the short-circuit technique is shown below, in Program 4.7.

² Early Prolog-in-Lisp implementations, which provided an easy cop-out to Lisp, had a similar fate. Users of these systems, typically experienced Lisp hackers, would resort to Lisp whenever they were confronted with a difficult programming problem, instead of thinking it through in Prolog. This led some to conclude that Prolog "wasn't for real".
4.7 Meta-programming and partial evaluation

Meta-programs are programs that treat other programs as data. Examples of meta-programs include compilers, assemblers, and debuggers. One of the most important and useful types of meta-programs is the meta-interpreter, sometimes called a meta-circular interpreter, which is an interpreter for a language written in that language.

A meta-interpreter is important from a theoretical point of view, as a measure of the quality of the language design. Designing a language with a simple meta-interpreter is like solving a fixpoint equation: if the language is too complex, its meta-interpreter will be large. If it is too weak, it won't have the necessary data structures to represent its programs and the control structures to simulate them. A language may have several meta-interpreters of different granularities. For logic programs, the most useful meta-interpreter is the one that simulates goal reduction, but relies on the underlying implementation to perform unification. An example of a Flat Concurrent Prolog meta-interpreter at this granularity is shown as Program 4.6. The meta-interpreter assumes that a guardless clause A ← B in the interpreted program is represented using the unit clause clause(A,B). If the body of the clause is empty, then B=true. A guarded clause A ← G|B is represented by clause(A,B) ← G|true. A similar interpreter for full Concurrent Prolog is shown in [48].

The plain meta-interpreter is interesting mostly for a theoretical reason, as it does nothing except simulate the program being executed. However, slight variations on it result in meta-interpreters with very useful functionalities. For example, by extending it with a short circuit, as in Program 4.7, a termination-detecting meta-interpreter is obtained.

Many other important functions can be implemented via enhanced meta-interpreters [45]. In Prolog, they have been used to implement explanation facilities for expert systems [56]. In compiler-based Prolog systems, as well as in Logix, the debugger is based on an enhanced meta-interpreter, and layers of protection
reduce(true).                               % halt
reduce((A,B)) ←
    reduce(A?), reduce(B?).                 % fork
reduce(A) ←
    A ≠ true, A ≠ (_,_) |
    clause(A?,B), reduce(B?).               % reduce

Program 4.6: A plain meta-interpreter for Flat Concurrent Prolog

reduce(A,Done) ←
    reduce1(A,done-Done).
reduce1(true,Done-Done).                    % halt
reduce1((A,B),Left-Right) ←
    reduce1(A?,Left-Middle),
    reduce1(B?,Middle-Right).               % fork
reduce1(A,Left-Right) ←
    A ≠ true, A ≠ (_,_) |
    clause(A?,B), reduce1(B?,Left-Right).   % reduce

Program 4.7: A termination detecting meta-interpreter
and control are defined via meta-interpreters [20]. Such meta-interpreters, including abortable, interruptible, failsafe, and deadlock-detecting meta-interpreters, are shown and explained in [ref Hirsch]. One problem with using such meta-interpreters directly is the execution overhead of the added layer of interpretation, which is unacceptable in many applications. In [45,60] it is shown how partial evaluation, a program-transformation technique, can eliminate the overhead of meta-interpreters. In effect, partial evaluation can turn enhanced meta-interpreters into compilers, which produce as output the input program enhanced with the functionality of the meta-interpreter.
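As a hypothetical illustration of the idea (the example program is ours, not taken from [45,60]), partially evaluating the termination-detecting meta-interpreter of Program 4.7 with respect to the clauses

    p(X) ← q(X), r(X).
    q(a).

yields a program in which the short circuit is threaded through the clauses themselves:

    p(X,Left-Right) ← q(X,Left-Middle), r(X,Middle-Right).
    q(a,Done-Done).

The interpretation layer is gone, yet running the transformed program detects termination exactly as Program 4.7 would.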
4.8 Modular programming and programming-in-the-large

The techniques shown above refer mostly to programming in the small. This does not mean that Concurrent Prolog is not suitable for programming in the large. To the contrary, we found that even using the simple module system developed for bootstrapping Logix, many people could cooperate in its development. We expect
the situation to improve further using the hierarchical module system currently under development. The key idea in these module systems, which are implemented entirely in Concurrent Prolog, is to use Concurrent Prolog message-passing to implement inter-module calls. This means that no additional communication mechanism is needed to support remote procedure calls between modules which reside on different processors.
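A minimal sketch of the idea (hypothetical, and far simpler than the Logix module system): a module can be represented as a process serving a stream of goal messages, so that an inter-module call is an ordinary message whose variables carry the results back to the caller.

    serve([goal(G)|In]) ←
        reduce(G?),        % reduce the goal against this module's clauses,
        serve(In?).        % as in Program 4.6
    serve([ ]).

Several client modules can share such a server through a stream merger, exactly as with the monitors of Section 4.6.2.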
5. The Development of Concurrent Prolog
Concurrent Prolog was conceived and first implemented in November 1982, in an attempt to extend Prolog into a concurrent programming language, and to clean up and generalize the Relational Language of Clark and Gregory [6]. Although one of the goals of the language was to be a superset of sequential Prolog, the proposed design did not seem, on the face of it, to achieve this goal, and hence was termed "A Subset of Concurrent Prolog" [49]. A major strength of that language, which later became known simply as Concurrent Prolog, was that it had a working, usable implementation: an interpreter written in Prolog [49].

Since the concepts of the language were quite radical at the time, it seemed fruitful to try to explore them experimentally, by writing programs in the language, rather than to get involved in premature arguments on language constructs, or to implement the language "for real" before its concepts were explored and understood, or to extend this "language subset" prematurely, before its true limitations were encountered. In this respect the development of Concurrent Prolog deviated from the common practice of research on a new programming language, which typically concentrates on theoretical aspects of the language definition (e.g. CCS [34]), or attempts to construct an efficient implementation of it (e.g. Pascal), but rarely focuses on actual usage of the language through a prototype implementation.

This exploratory activity proved tremendously useful. Novel ways of using logic as a programming language were unveiled [49,55,58], and techniques for incorporating conventional concepts of concurrent programming in logic were developed [48,51]. Most importantly, a large body of working Concurrent Prolog programs that solve a wide range of problems and implement many types of algorithms was gathered. This activity, which continued for a period of about two years, mostly at ICOT and at the Weizmann Institute, resulted in papers on "How to do X in Concurrent Prolog" for numerous X's [5,11,14,17,18,19,46,48,50,51,52,55,57].

A programming language cannot be general-purpose if only a handful of experts can grasp it and use it effectively. To investigate how easy Concurrent
Prolog is to learn, I have taught Concurrent Prolog programming courses at the Weizmann Institute and at the Hebrew University of Jerusalem. Altogether about 90 graduate and 100 undergraduate students in Computer Science have attended these courses. Based on performance in programming assignments and on the quality of the courses' final programming projects, it seems that more than three-quarters of the students became effective Concurrent Prolog programmers.

The accumulated experience suggested that Concurrent Prolog would be an expressive and productive general-purpose programming language, if implemented efficiently. The strength of the language was perceived mostly in systems programming [20,45,48,59] and in the implementation of parallel and distributed algorithms [17,18,46,50]; it also seemed suitable for the implementation of knowledge-programming tools for AI applications [14,19], and as a system-description and simulation language [5,57].

The next step was to try to develop an efficient implementation of the language on a uniprocessor, to serve as a building block for a parallel implementation and as a tool for exploring and testing the applicability of the language further. This proved to be surprisingly difficult. Interpreters for the language developed at the Weizmann Institute exhibited miserable performance [32]. A compiler of Concurrent Prolog on top of Prolog was developed at ICOT [68]. Although the latest version of the compiler reached a speed of more than 10K reductions per second, which is more than a quarter of the speed of the underlying Prolog system on that machine, it did not scale to large applications since it employed busy waiting. In addition to the implementation difficulties, subtle problems and opacities in the definition of the Or-parallel aspect of Concurrent Prolog were uncovered [42,66].

As a result of these difficulties we decided to switch research direction, and concentrate our implementation effort on Flat Concurrent Prolog, the And-parallel subset of Concurrent Prolog. Flat Concurrent Prolog was a "legitimate" subset of Concurrent Prolog for two reasons. First, it has a simple meta-interpreter, shown above as Program 4.6. Second, we discovered that almost all the applications that had previously been written in Concurrent Prolog are either in its Flat subset already, or can be easily hand-converted into it. This demonstrated the utility of having a large body of Concurrent Prolog code. Without it we would not have had the courage to make what seemed to be such a drastic cut in the language.

There was one Concurrent Prolog program that would not translate into Flat Concurrent Prolog easily: an Or-parallel Prolog interpreter. This four-clause program, written by Ken Kahn, and shown as Program 5.1, was simultaneously the final victory of Concurrent Prolog, and its death-blow. It was a victory for the pragmatic expressiveness of Concurrent Prolog, since it showed that without extending the original "Subset of Concurrent Prolog", the language was as expres-
solve([ ]).
solve([A|As]) ←
    clauses(A,Cs),
    resolve(A?,Cs?,As?).

resolve(A,[(A ← Bs)|Cs],As) ←
    append(Bs?,As?,ABs), solve(ABs?) | true.
resolve(A,[C|Cs],As) ←
    resolve(A?,Cs?,As?) | true.

append(Xs,Ys,Zs) ← See Program 3.1.
clauses(A,Cs) ← Cs is the list of clauses in A's procedure.

Program 5.1: Kahn's Or-parallel Prolog interpreter
sive as Prolog: any pure Prolog program can run on a Concurrent Prolog machine (with Or-parallelism for free!), by adding to it the four clauses of Kahn's interpreter. Thus the original design goal of Concurrent Prolog, to have a concurrent programming language that includes Prolog, was actually achieved, though it took more than a year to realize that. It was a death-blow to the implementability of Concurrent Prolog, at least for the time being, since it showed that implementing Concurrent Prolog efficiently is as hard as, and probably harder than, implementing Or-parallel Prolog. As we all know, no one knows how to implement Or-parallel Prolog efficiently, as yet.

Once the switch to Flat Concurrent Prolog was made, in June 1984, implementation work began to progress rapidly. A simple interpreter for the language was implemented in Pascal [33]. An abstract instruction set for Flat Concurrent Prolog, based on the Warren instruction set for unification [72] and the abstract machine embodied in the FCP interpreter, was designed [24], and an initial version of the compiler was written in Flat Concurrent Prolog. In July 1985, the bootstrapping of this compiler-based system was completed.

The system, called Logix [54], is a single-user multi-tasking program development environment. It consists of: a five-pass compiler, including a tokenizer, parser, preprocessor, encoder, and assembler; an interactive shell, which includes a command-line editor, and supports management and inspection of multiple parallel computations; a source-level debugger, based on a meta-interpreter; a module system that supports separate compilation, runtime linking, and a free mixing of interpreted (debuggable) and compiled modules; a tty-controller, which allows multiple parallel processes, including the interactive shell, to interact with the user
in a consistent way; a simple file-server, which interfaces to the Unix file system; and some input, output, profiling, style-checking, and other utilities.

The system is written in Flat Concurrent Prolog. Its source is about 10,000 lines of code long, divided between 45 modules. About half of it is the compiler. The system uses no side-effects or other extra-logical constructs, except in a few well-defined places. In the interface to the physical devices, low-level kernels make the keyboard and screen look like Concurrent Prolog input and output streams of bytes, and the Unix file system look like a Concurrent Prolog monitor that maintains an association table of (FileName,FileContents) pairs. In the multiway stream merger and distributer, which are used heavily by the rest of the system, destructive assignment is used to achieve constant delay [52], compared with the logarithmic delay that can be achieved in pure Concurrent Prolog [51].

The other part of the system, written in C, includes an emulator of the abstract machine, an implementation of the kernels, and a stop-and-copy garbage collector [24]. It is about 6000 lines of code long. When compiled on the VAX, the emulator occupies about 60K bytes, and Logix another 300K bytes.³ When idle, Logix consists of about 750 Concurrent Prolog processes. Logix itself runs as one Unix process.

The compiler compiles about 100 source lines per cpu minute on a VAX 11/750. A run of the compiler on the encoder, which is about 400 lines long, creates about 31,000 temporary Concurrent Prolog processes, and generates about 1.5M bytes of temporary data structures (garbage). During this computation about 90,000 process reductions occur, and 10,000 process suspensions/activations. Overall, the system achieves at present about a fifth to a quarter of the speed of Quintus Prolog [38], which is the fastest commercially available Prolog on the VAX today. The number is obtained by comparing Concurrent Prolog process reductions to Prolog procedure calls for the same logic programs. This indicates that the efficiency of Warren's abstract Prolog machine [72], which is at the basis of Quintus Prolog, and that of our Flat Concurrent Prolog machine are about the same. The gap can be closed by rewriting our emulator in assembly language, as Quintus does. To explain this similarity in performance, recall that although Flat Concurrent Prolog needs to create and maintain processes, which is a bit more expensive than creating stack frames for Prolog procedure calls, it does not support deep backtracking, whereas Prolog does and pays dearly for it.
³ At the moment we use word encoding, rather than byte encoding, for the abstract machine instructions.
6. Efforts at ICOT and Imperial College: GHC and PARLOG
In the meantime ICOT did not stand still. Given their decision to use Concurrent Prolog as the basis for Kernel Language 1 [13], the core programming language of their planned Parallel Inference Machine, they also attempted to implement its Or-parallel aspect. Prototype implementations of three different schemes were constructed, namely shallow-binding [35], deep-binding, and lazy-copying (the scheme we tried at Weizmann) [62]. Shallow binding proved to be the fastest, but did not seem to scale to multiprocessors. Lazy copying was the slowest, so the choice seemed to fall on deep-binding. Unfortunately the implementation scheme was rather complex, and the subtle problems with Concurrent Prolog's Or-parallelism were still unsolved. On the other hand, ICOT did not want to follow the Flat Concurrent Prolog path, since it seemed to take them even further away from Prolog and from the AI applications envisioned for the Parallel Inference Machine.

An elegant solution to these problems was found in Guarded Horn Clauses [65], a novel concurrent logic programming language. The main design choice of GHC was to eliminate multiple Or-parallel environments from Concurrent Prolog. Besides avoiding a major implementation problem, this decision also provided a synchronization rule: if you try to write on the parent environment, then suspend (in Concurrent Prolog a process would allocate a local copy of the variable and continue instead). This rule made the read-only annotation somewhat superfluous. The resulting language exhibits elegance and conciseness, and seems to capture most of Concurrent Prolog's applications and programming techniques, excluding, of course, Kahn's Or-parallel Prolog interpreter. GHC is the current choice of ICOT for Kernel Language 1. Besides solving some of the difficulties in the definition and implementation of Concurrent Prolog, GHC is "Made in Japan", which certainly is not a disadvantage from ICOT's point of view. Recent implementation efforts at ICOT concentrate on Flat GHC, which is the GHC analogue of Flat Concurrent Prolog.

So why didn't we switch to GHC? Long discussions were carried out in our group about this option. Our general conclusion was that even though GHC is a simpler formalism, it is also more fragile, less expressive, and more difficult to extend. We felt it would either break or lose much of its elegance when faced with the problems of implementing a real operating system, which includes a secure kernel, error-handling for user programs, and distributed termination and deadlock detection. Furthermore, it would be less adequate for AI applications, since it has a weaker notion of unification.

Another related research effort is the development of the PARLOG programming language by Clark and Gregory at Imperial College [7]. PARLOG is compiler-oriented, even more than GHC, in a way that seems to render it unsuitable for
meta-programming. Given our commitment to implement the entire programming environment and operating system around the concepts of meta-interpretation and partial evaluation, we cannot use PARLOG. On the performance side, PARLOG and GHC seem quite similar, except that GHC has to make a runtime check that guards do not write on the parent's environment, whereas PARLOG ensures this at compile time, using what is called a safety check [8]. On the expressiveness side, there does not seem to be a great difference between PARLOG and GHC, except for meta-programming. Alternative synchronization constructs to the read-only variable were proposed by Saraswat [43] and by Ramakrishnan and Silberschatz [39].
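The difference in synchronization style between GHC and PARLOG can be seen on the standard stream merger. The sketches below follow our reading of [65] and [7] and are illustrative rather than authoritative. In GHC, the suspension rule does all the work: head unification that would bind a caller's variable suspends, so output bindings are made by explicit unifications in the body, after commitment:

    merge([X|Xs], Ys, Zs) :- true | Zs = [X|Zs1], merge(Xs, Ys, Zs1).
    merge(Xs, [Y|Ys], Zs) :- true | Zs = [Y|Zs1], merge(Xs, Ys, Zs1).
    merge([], Ys, Zs) :- true | Zs = Ys.
    merge(Xs, [], Zs) :- true | Zs = Xs.

In PARLOG the same process carries a mode declaration (? marking input arguments, ^ output arguments), from which the compiler's safety check verifies once and for all that no guard can bind a caller's variable, so outputs may be written directly in the head:

    mode merge(?, ?, ^).
    merge([X|Xs], Ys, [X|Zs]) <- merge(Xs, Ys, Zs).
    merge(Xs, [Y|Ys], [Y|Zs]) <- merge(Xs, Ys, Zs).
    merge([], Ys, Ys).
    merge(Xs, [], Xs).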
7. Current Research Directions
The main focus of our current research at the Weizmann Institute is the implementation of a Concurrent Prolog based general-purpose parallel computer system. Our present implementation vehicle is Intel's iPSC d4/me, a memory-enhanced four-dimensional hypercube, which, incidentally, is isomorphic to a 4 x 4 mesh-connected torus. As a first step, a distributed FCP interpreter is being implemented in C, based on a distributed unification algorithm which guarantees the atomicity of goal reductions [44]. A technique for implementing Concurrent Prolog virtual machines that manage code and process mapping on top of the physical machine has also been developed [64]. Since Logix is self-contained, once the abstract FCP machine runs on a parallel computer, an entire program development environment and operating system will also become available on it. For example, the Logix source-level debugger, as well as other meta-interpreter based tools such as a profiler, would preserve the parallelism of the interpreted program while executing on a parallel computer. With this system a parallel computer could thus be used both as the development machine and as the target machine, which is clearly advantageous over the sequential front-end/parallel back-end machine approach. Since source text, parsed code, and compiled code are all first-class objects in Logix, routines that implement code-management algorithms on the parallel computer could be written in Concurrent Prolog itself [64]. A technique for compiling Concurrent Prolog into Flat Concurrent Prolog has been developed [10]. It involves writing a Concurrent Prolog interpreter in Flat Concurrent Prolog, and then partially evaluating it [15] with respect to the program to be compiled. It avoids the dynamic multiple-environment problem by requiring static output annotations on variables that are to be written upon. An attempt to provide Concurrent Prolog with a precise semantics is also being made, following initial work by Levi and Palamidessi [31] and Saraswat [43].
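The shape of this compilation technique can be suggested by the core of a reduction meta-interpreter, shown here in the style of our FCP programs. The actual interpreter of [10] is considerably more elaborate; in particular, clause/2, which is assumed here to unify its first argument with the head of some program clause, execute that clause's guard, and return its body, is a simplifying assumption of this sketch:

    reduce(true).
    reduce((A, B)) :- reduce(A?), reduce(B?).   % reduce conjuncts in parallel
    reduce(A) :- clause(A?, B) | reduce(B?).    % reduce A using one program clause

Partially evaluating reduce/1 with respect to a fixed source program unfolds this interpretive layer away, leaving a residual Flat Concurrent Prolog program that runs without interpretation overhead.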
Another research direction being pursued is partial evaluation [45], a program transformation and optimization technique which proves very versatile when combined with heavy usage of interpreters and meta-interpreters [20,54], as in Logix. We believe that parallel execution is not a substitute for, but rather is dependent upon, efficient uniprocessor implementation. To that effect a high-performance FCP compiler is being developed. Hand timings indicate an expected performance of about 30K LIPS on a 10MHz 68010. Lastly, Logix itself is still under development. Short-term extensions include a hierarchical module system and a window system. Longer-term research includes extending it to a multiprocessor/multiuser operating system.
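A toy illustration of why meta-interpreters and partial evaluation combine so well: instrument the reduction interpreter sketched above to count reductions (the core of a profiler), and specialize it with respect to the program of interest. The := arithmetic primitive and the residual code shown are assumptions of this sketch:

    % reduce(Goal, N0, N): N is N0 plus the number of reductions performed.
    reduce(true, N, N).
    reduce((A, B), N0, N) :- reduce(A?, N0, N1), reduce(B?, N1?, N).
    reduce(A, N0, N) :- clause(A?, B) | N1 := N0 + 1, reduce(B?, N1?, N).

    % Partial evaluation with respect to append/3 pushes the counter into
    % the residual program, eliminating the interpreter:
    append([], Ys, Ys, N0, N) :- N := N0 + 1.
    append([X|Xs], Ys, [X|Zs], N0, N) :-
        N1 := N0 + 1, append(Xs?, Ys, Zs, N1?, N).

The residual program profiles itself at full compiled speed; this is the sense in which meta-interpreter based tools need not cost a layer of interpretation.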
8. Conclusion
Our research on Concurrent Prolog has demonstrated that a high-level logic programming language can conveniently express a wide range of parallel algorithms. The performance of the Logix system demonstrates that a side-effect free language based on light-weight processes can be practical even on conventional uniprocessors; it thus "debunks the expensive-process-spawn myth". Its functionality and pace of development testify that Concurrent Prolog is a usable and productive systems programming language. We have yet to demonstrate the practicality of Concurrent Prolog for programming parallel computers; our prototyping engine is Intel's iPSC. We find the ultimate and most important question to be: which of the currently proposed approaches will result in a scalable parallel computer system whose generality of applications, ease of use, and cost/performance ratio, in terms of both hardware and software, compete favorably with existing sequential computers? Until such a system is demonstrated, the question of parallel processing cannot be considered solved.
Acknowledgements

The research reported in this survey has been conducted in cooperation with many people at ICOT, the Weizmann Institute, and other places; perhaps too many to recall by name. I am particularly indebted to the hospitality and
stimulating research environment provided by ICOT and its people. The development of Logix was supported by IBM Poughkeepsie, Data Systems Division. Contributors to its development include Avshalom Houri, William Silverman, Jim Crammond, Michael Hirsch, Colin Mierowsky, Shmuel Safra, Steve Taylor, and Marc Rosen. I am grateful to Vijay Saraswat for discussions on read-only unification, and to Steve Taylor and William Silverman for comments on earlier drafts of the paper.
References

[1] W.B. Ackerman, "Data flow languages", IEEE Computer, Vol. 15, No. 2, 1982, pp. 15-25.
[2] Arvind and J.D. Brock, "Streams and managers", in M. Maekawa and L.A. Belady (eds.), Operating Systems Engineering, Springer-Verlag, 1982, pp. 452-465. Lecture Notes in Computer Science, No. 143.
[3] C. Bloch, "Source to source transformations of logic programs", Weizmann Institute Technical Report CS84-22, 1984.
[4] D.L. Bowen, L. Byrd, L.M. Pereira, F.C.N. Pereira and D.H.D. Warren, "PROLOG on the DECSystem-10 user's manual", Technical Report, University of Edinburgh, Department of Artificial Intelligence, October, 1981.
[5] K. Broda and S. Gregory, "PARLOG for discrete event simulation", Proceedings of the 2nd International Logic Programming Conference, Uppsala, 1984, pp. 77-312.
[6] K.L. Clark and S. Gregory, "A relational language for parallel programming", in Proceedings of the ACM Conference on Functional Programming Languages and Computer Architecture, October, 1981.
[7] K.L. Clark and S. Gregory, "PARLOG: Parallel programming in logic", Research Report DOC 84/4, April, 1984.
[8] K.L. Clark and S. Gregory, "Notes on the implementation of PARLOG", Research Report DOC 84/16, October, 1984.
[9] K.L. Clark and S.-A. Tarnlund, "A first-order theory of data and programs", in B. Gilchrist (ed.), Information Processing, Vol. 77, North-Holland, 1977, pp. 939-944.
[10] M. Codish and E. Shapiro, "Compiling Or-parallelism into And-parallelism", Proceedings of the Third International Conference on Logic Programming, Springer LNCS, July, 1986.
[11] S. Edelman and E. Shapiro, "Quadtrees in Concurrent Prolog", Proceedings of the International Conference on Parallel Processing, IEEE Computer Society, August, 1985, pp. 544-551.
[12] D.P. Friedman and D.S. Wise, "An approach to fair applicative multiprogramming", in G. Kahn (ed.), Semantics of Concurrent Computations, Springer-Verlag, 1979. Lecture Notes in Computer Science, No. 70.
[13] K. Furukawa, S. Kunifuji, A. Takeuchi and K. Ueda, "The conceptual specification of the Kernel Language version 1", ICOT Technical Report TR-054, 1985.
[14] K. Furukawa, A. Takeuchi, S. Kunifuji, H. Yasukawa, M. Ohki and K. Ueda, "Mandala: A logic based knowledge programming system", Proceedings of FGCS '84, Tokyo, Japan, 1984, pp. 613-622.
[15] Y. Futamura, "Partial evaluation of computation process - an approach to a compiler-compiler", Systems, Computers, Controls, Vol. 2, No. 5, 1971, pp. 721-728.
[16] C.C. Green, "Theorem proving by resolution as a basis for question answering", in B. Meltzer and D. Michie (eds.), Machine Intelligence, Vol. 4, Edinburgh University Press, Edinburgh, 1969, pp. 183-205.
[17] L. Hellerstein, "A Concurrent Prolog based region finding algorithm", Honors Thesis, Harvard University, Computer Science Department, May, 1984.
[18] L. Hellerstein and E. Shapiro, "Implementing parallel algorithms in Concurrent Prolog: The MAXFLOW experience", Proceedings of the International Symposium on Logic Programming, Atlantic City, New Jersey, February, 1984.
[19] H. Hirakawa, "Chart parsing in Concurrent Prolog", ICOT Technical Report TR-008, 1983.
[20] M. Hirsch, W. Silverman and E. Shapiro, "Layers of protection and control in the Logix system", Weizmann Institute Technical Report CS86-??, 1986.
[21] C.A.R. Hoare, "Monitors: an operating systems structuring concept", Communications of the ACM, Vol. 17, No. 10, 1974, pp. 549-557.
[22] C.A.R. Hoare, Communicating Sequential Processes, Prentice-Hall, 1985.
[23] J.E. Hopcroft and J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, Reading, MA, 1979.
[24] A. Houri, "An abstract machine for Flat Concurrent Prolog", M.Sc. Thesis, Weizmann Institute of Science, 1986.
[25] INMOS Ltd., IMS T424 Transputer Reference Manual, INMOS, 1984.
[26] S.D. Johnson, "Circuits and systems: Implementing communications with streams", Technical Report 116, Indiana University, Computer Science Department, October, 1981.
[27] G. Kahn and D.B. MacQueen, "Coroutines and networks of parallel processes", in B. Gilchrist (ed.), Information Processing, Vol. 77, North-Holland, 1977, pp. 993-998.
[28] R.A. Kowalski, Logic for Problem Solving, Elsevier North-Holland Inc., 1979.
[29] H.T. Kung, "Why systolic architectures?", IEEE Computer, Vol. 15, No. 1, 1982, pp. 37-46.
[30] L. Lamport, "A recursive concurrent algorithm", unpublished note, January, 1982.
[31] G. Levi and C. Palamidessi, "The semantics of the read-only variable", 1985 Symposium on Logic Programming, IEEE Computer Society, July, 1985, pp. 128-137.
[32] J. Levy, "A unification algorithm for Concurrent Prolog", Proceedings of the Second International Logic Programming Conference, Uppsala, 1984, pp. 333-341.
[33] C. Mierowsky, S. Taylor, E. Shapiro, J. Levy and M. Safra, "The design and implementation of Flat Concurrent Prolog", Weizmann Institute Technical Report CS85-09, 1985.
[34] R. Milner, A Calculus of Communicating Systems, Lecture Notes in Computer Science, Vol. 92, Springer-Verlag, 1980.
[35] T. Miyazaki, A. Takeuchi and T. Chikayama, "A sequential implementation of Concurrent Prolog based on the shallow binding scheme", 1985 Symposium on Logic Programming, IEEE Computer Society, 1985, pp. 110-118.
[36] S. Papert, Mindstorms: Children, Computers, and Powerful Ideas, Basic Books, New York, 1980.
[37] F. Pereira, "C-Prolog user's manual", EdCAAD, University of Edinburgh, 1983.
[38] Quintus Prolog Reference Manual, Quintus Computer Systems Inc., 1985.
[39] R. Ramakrishnan and A. Silberschatz, "Annotations for distributed programming in logic", in Conference Record of the Thirteenth Annual ACM Symposium on Principles of Programming Languages, January, 1986.
[40] J.A. Robinson, "A machine oriented logic based on the resolution principle", Journal of the ACM, Vol. 12, January, 1965, pp. 23-41.
[41] P. Roussel, "Prolog: Manuel de référence et d'utilisation", Technical Report, Groupe d'Intelligence Artificielle, Marseille-Luminy, September, 1975.
[42] V.A. Saraswat, "Problems with Concurrent Prolog", Carnegie-Mellon University CSD Technical Report CS-86-100, January, 1986.
[43] V.A. Saraswat, "Partial correctness semantics for CP[↓,|,&]", Proceedings of the Fifth Conference on Foundations of Software Technology and Theoretical Computer Science, New Delhi, 1985, Springer LNCS 206.
[44] M. Safra, S. Taylor and E. Shapiro, "Distributed execution of Flat Concurrent Prolog", to appear as a Weizmann Institute technical report.
[45] S. Safra and E. Shapiro, "Meta-interpreters for real", to appear in Proceedings of IFIP-86.
[46] A. Shafrir and E. Shapiro, "Distributed programming in Concurrent Prolog", Weizmann Institute Technical Report CS83-12, August, 1983.
[47] E. Shapiro, Algorithmic Program Debugging, MIT Press, 1983.
[48] E. Shapiro, "Systems programming in Concurrent Prolog", in D.H.D. Warren and M. van Caneghem (eds.), Logic Programming and its Applications, Ablex, 1986.
[49] E. Shapiro, "A subset of Concurrent Prolog and its interpreter", ICOT Technical Report TR-003, February, 1983.
[50] E. Shapiro, "Systolic programming: A paradigm of parallel processing", Proceedings of FGCS '84, Ohmsha, Tokyo, 1984. Revised as Weizmann Institute Technical Report CS84-16, 1984.
[51] E. Shapiro and C. Mierowsky, "Fair, biased, and self-balancing merge operators: Their specification and implementation in Concurrent Prolog", Journal of New Generation Computing, Vol. 2, No. 3, 1984, pp. 221-240.
[52] E. Shapiro and S. Safra, "Fast multiway merge using destructive operations", Proceedings of the International Conference on Parallel Processing, IEEE Computer Society, August, 1985, pp. 118-122.
[53] S. Safra, S. Taylor and E. Shapiro, "Distributed execution of Flat Concurrent Prolog", to appear as a Weizmann Institute technical report, 1986.
[54] W. Silverman, A. Houri, M. Hirsch and E. Shapiro, "Logix user manual, release 1.1", Weizmann Institute of Science, 1985.
[55] E. Shapiro and A. Takeuchi, "Object-oriented programming in Concurrent Prolog", Journal of New Generation Computing, Vol. 1, No. 1, July, 1983.
[56] L. Sterling and E. Shapiro, The Art of Prolog, MIT Press, 1986.
[57] N. Suzuki, "Experience with specification and verification of complex computer hardware using Concurrent Prolog", in D.H.D. Warren and M. van Caneghem (eds.), Logic Programming and its Applications, Ablex, 1986.
[58] A. Takeuchi, "How to solve it in Concurrent Prolog", unpublished note, 1983.
Prolog", Proceedings of the Logic Programming Workshop '82, Albufeira, Portugal, June, 1983, pp. 171-185. [60] A. Takeuchi and K. Furukawa, '¢Partial evaluation of Prolog programs and its application to meta programming", ICOT Technical Report TR-126, 1985. [61] H. Tamaki, "A distributed unification scheme for systolic logic programs", in Proceedings of the 1985 International Conference on Parallel Processing, pp. 552-559, IEEE, 1985. [62] J. Tanaka, T. Miyazaki and A. Takeuchi, "A sequential implementation of Concurrent Prolog - based on Lazy Copying scheme", The 1st National Conference of Japan Society for Software Science and Technology, 1984. [63] S.Taylor, L.Hellerstein, S.Safra and E.Shapiro "Notes on the Complexity of Systolic Programs", Weizmann Institute Technical Report CS86-??, 1986. [64] S.Taylor, E.Av-Ron and E.Y.Shapiro "Virtual Machines for Process and Code Mapping" Weizmann Institute Technical Report CS86-??, 1986. [65] K. Ueda, "Guarded Horn Clauses", ICOT Technical Report TR-103, 1985. [66] K. Ueda, "Concurrent Prolog re-examined", to appear as ICOT Technical Report. [67] K. Ueda and T. Chikayama, "Efficient stream/array processing in logic programming languages", Proceedings of the International Conference on 5th Generation Computer Systems, ICOT, 1984, pp. 317-326. [68] K. Ueda and T. Chikayama, "Concurrent Prolog compiler on top of Prolog', 1985 Symposium on Logic Programming, IEEE Computer Society, July, 1985, pp. 119-126. [69] M.H. van Emden and R.A. Kowalski, "The semantics of predicate logic as a programming language", Journal of the ACM, Vol. 23, October, 1976, pp. 733-742. [70] O. Viner, "Distributed constraint propagation", Weizmann Institute Technical Report CS84-24, 1984. [71] D.H.D. Warren, "Logic programming and compiler writing", Software-Practice and Experience, Vol. 10, 1980, pp. 97-125. [72] D.H.D. Warren, "An abstract Prolog instruction set", Technical Report 309, Artificial Intelligence Center, SRI International, 1983.