Fundamentals of Artificial Intelligence: An Advanced Course (Lecture Notes in Computer Science)

Lecture Notes in Computer Science Edited by G. Goos and J. Hartmanis 232 Fundamentals of Artificial Intelligence An Adv...

36 downloads 986 Views 17MB Size Report

This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!

Report copyright / DMCA form

DOWNLOAD PDF

Lecture Notes in Computer Science Edited by G. Goos and J. Hartmanis

232 Fundamentals of Artificial Intelligence An Advanced Course

Edited by W. Bibel and Ph. Jorrand

Prof. Dr. J. Stoer Instltut far Angewandte Mathematik und Statistik

87 WQrzburg,Am Hubland

Springer-Verlag Berlin Heidelberg NewYork London Paris Tokyo

Editorial Board

D. Barstow W. Brauer R Brinch Hansen D. Gries D. Luckham C. Moler A. Pnueli G. SeegmiJller J. Stoer N. Wirth Editors

Wolfgang Bibel Institut fLir Informatik, Technische Universit~t Mtinchen Postfach 202420, D-8000 MiJnchen 2 Philippe Jorrand LIFIA-IMAG BP 68, F-38402 St. Martin d'H6res Cedex

CR Subject Classifications (1985): F.4.1, 1.2.3, 1.2.4, 1.2.6, D. 1.3 ISBN 3-540-16782-X Springer-Verlag Berlin Heidelberg New York ISBN 0-387-16782-X Springer-Verlag New York Berlin Heidelberg This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine or similar means, and storage in data banks. Under § 54 of the German Copyright Law where copies are made for other than private use, a fee is payable to "Verwertungsgesellschaft Wort", Munich. © Springer-Verlag Berlin Heidelberg 1986 Printed in Germany Printing and binding: Beltz Offsetdruck, Hemsbach/Bergstr. 2145/3140-543210

PREFACE

T h e expectations in Artificial Intelligence - or Intellectics - have never b e e n as high as they are today. Clearly, they are too high given the current state of the art in this fascinating field. Although we have seen some remarkable systems p e r f o r m i n g extremely well, there is no doubt for those who u n d e r s t a n d how they work that this performance reflects just a b e g i n n i n g in our u n d e r s t a n d i n g of the f u n d a m e n t a l s that might be required in systems to perform in a truly intelligent way. O n e of the basic paradigms in Artificial Intelligence has always been that experiment needs to be complemented with theory, or vice versa. D u r i n g a wave of e x p e r i m e n t i n g throughout the world in AI we feel that some emphasis on the more theoretical side might be appropriate a n d necessary for progress in this u n d e r s t a n d i n g of t h e f u n d a m e n t a l s of AI. This sort of reflection motivated us when we took the initiative in organizing the first Advanced Course in Artificial Intelligence that was held in Vignieu, France, in July 1985. Seven well-known AI researchers were asked to cover basic topics that might be of relevance for the f u n d a m e n t a l s of our field. T h e present volume comprises the elaborated a n d h a r m o n ized versions of their lectures. Most of them have been written in the form of a tutorial, so that the book provides a most valuable guide into the more f u n d a m e n t a l aspects of AI. O n e m i g h t be inclined tO say that intelligence is the capability to acquire, memorize, a n d process knowledge in a way that appropriate accommodation in a c h a n g i n g world is achieved. In any case, the concept of knowledge a n d its representation clearly are a m o n g the f u n d a m e n t a l issues in AI. T h e book begins with this topic discussed in the contribution by Delgrande a n d Mylopoulos in order to give the reader a feel for the variety of aspects that have to be taken into account for the later issues raised in the book. T h e subsequent four articles are dealing with the second focus in this book which is the processing of knowledge.

Obviously,

knowledge c a n n o t be processed unless it is adequately

represented; on the other h a n d , a n appropriate representation is determined by the way of processing needed. This is to say that these two issues are intimately related with each other. It is therefore not accidental that some of the aspects raised in P A R T O N E reoccur in different context here in P A R T T W O (such as in the contribution by Bibel). For some processing of knowledge would be synonymous with computation, for others with inference or deduction. W e see in the remarkable contribution by H u e t that both are just different aspects ~f the same p h e n o m e n o n which is studied in great depth in this contribution. T h e reader m a y find it rewarding to overcome the difficulty of studying such a formal a n d concise text.

JV

One of the most successful tools for processing knowledge represented on a logical level of language is resolution. The chapter written by Stickel gives a thorough and knowledgeable introduction into the most important aspects of deduction by some form of resolution. It may be regarded as the basis for the more advanced forms of reasoning discussed in the subsequent two papers, but also for the programming language P R O L O G . The deductive forms of reasoning captured by resolution in its pure form do by no means exhaust the kinds of knowledge processing known from human experience and studied in Artificial Intelligence. A particularly important one is the kind of reasoning that is involved in inductive inferencing, problem solving (or programming) from examples, and in learning.

It

is covered in the contribution by Biermann that the reader will like also for its style of presentation. There are many more forms of knowledge processing and inferencing than those discussed in the previous three papers. The tutorial by Bibel covers the more important among those that might play a significant role in common-sense reasoning. It takes, however, the position that the basic deductive tools like resolution or the connection method are essential for these forms of inference as well. The third part of fhe book focuses on the more advanced programming tools for implementing the kind of systems that are envisaged with the topics discussed before. Logic programming and functional programming are known as the main styles for AI programming.

Both lend

themselves to a parallel treatment. In this part of the book, Jorrand takes the approach that the semantic elegance and the mathematical properties of functional programming languages can be preserved within a language where computations or inferences can also be described as networks of cooperating parallel processes.

This is shown in the language FP2, where term rewriting forms the basis

for the semantics of both functional and parallel programming. By its nature, P R O L O G lends itself to parallel processing.

To some extent the programmer

might wish to control such parallel processes in P R O L O G programs without compromising P R O L O G ' s elegance as a descriptive language. and his group provides such features.

Concurrent P R O L O G developed by Shapiro

His introduction to this language stages the adequate

finale of the whole book. There is .an obvious lack of more good textbooks in Artificial Intelligence. One reason is that in a rapidly progressing field like AI it is nearly impossible for a researcher to actively contribute to the field's progress and at the same time be able to overlook large portions of this developing area, not mentioning the time needed for working out an appropriate presentation. A volume like the present one certainly cannot be a substitute for a textbook. But nevertheless we feel that it may be regarded as a good compromise for the time being.

In this sense we hope that the book not only refreshes the memories of those who attended the course, b u t serves as a u n i q u e source of valuable i n f o r m a t i o n for graduate classes a n d individuals with interest in the more f u n d a m e n t a l a n d advanced topics of this exciting area of research.

M f i n c h e n a n d Grenoble, April 1986

W. Bibel a n d Ph. J o r r a n d

CONTENTS

P A R T ONE:

KNOWLEDGE REPRESENTATION

Knowledge Representation: Features of Knowledge J . P . Delgrande a n d J. Mylopoulos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

P A R T TWO:

K N O W L E D G E PROCESSING

Deduction a n d C o m p u t a t i o n G. H u e t . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

39

A n Introduction to A u t o m a t e d Deduction M.E. Stickel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

75

F u n d a m e n t a l M e c h a n i s m s in M a c h i n e L e a r n i n g and Inductive Inference A.W. B i e r m a n n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

133

Methods of A u t o m a t e d R e a s o n i n g W. Bibel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

171

P A R T THREE:

KNOWLEDGE PROGRAMMING

T e r m Rewriting as a Basis for the Design of a Functional a n d Parallel P r o g r a m m i n g Language. A case study: the language FP2 Ph. J o r r a n d . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

221

C o n c u r r e n t P R O L O G : A Progress Report E. Shapiro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

277

PART ONE Knowledge Representation

Knowledge Representation: Features of Knowledge*

James P. Delgrande** John Mylopoulos*** D e p a r t m e n t of C o m p u t e r Science, U n i v e r s i t y of Toronto, Canada

1. I n t r o d u c t i o n It is by n o w a clich~ to claim t h a t knowledge representation is a f u n d a m e n t a l research issue in Artificial Intelligence ( M ) u n d e r l y i n g m u c h of the research, and the progress, of the last fifteen years. A n d yet, it is difficul~ to pinpoint exactly w h a t knowledge representation is, does, or promises to do. A thorough s u r v e y of the field by Ron B r a c h m a n a n d Brian Smith [Brachman & Smith 80] points o u t quite clearly the t r e m e n d o u s range in viewpoints a n d methodologies of researchers in knowledge representation. This paper is a f u r t h e r a t t e m p t to look at the field in order to examine the state of t h e a r t and provide some insights into the n a t u r e of the research m e t h o d s and results. The distinctive m a r k of this overview is its viewpoint: t h a t propositions encoded in knowledge bases have a n u m b e r of i m p o r t a n t features, and these features serve, or o u g h t to serve, as a basis for guiding current interest and activity in AI. Accordingly, the paper provides an account of some of the issues that arise in s t u d y i n g knowledge, belief, and conjecture, a n d discusses some of the approaches t h a t have been adopted in formalizing a n d using some of these f e a t u r e s in AI. The account is intended p r i m a r i l y for the c o m p u t e r scientist w i t h little exposure to A1 and Knowledge Representation, and w h o is interested in u n d e r s t a n d i n g some of t h e issues. As such, the paper concentrates on raising issues and sketching possible approaches to solutions. More technical details can be f o u n d in the w o r k referenced t h r o u g h o u t the paper. Naively, and circularly, knowledge representation is concerned w i t h t h e development of suitable notations for representing knowledge. The reason for its importance in AI is t h a t the c u r r e n t paradigm for building "intelligent" s y s t e m s a s s u m e s t h a t such s y s t e m s m u s t have access to domain-specific knowledge and m u s t be capable of using it in performing their intended task (hence t h e t e r m k n o w l e d g e b a s e d sys-

tems). This paradigm is in s h a r p contrast to t h e approaches used in the sixties, w h e n the emphasis was on general-purpose search techniques, The aim of the earlier approaches, w h i c h are termed p o w e r - ~ r i e n t e d , w a s to construct general, domain-independent problem-solving s y s t e m s [Goldstein & Papert 77]. This goal w a s generally f o u n d to be unrealistic for nontrivial t a s k s since, w i t h undirected search, the n u m b e r of alternatives t h a t needs to be explored g r o w s exponentially w i t h t h e size of t h e p r o b l e m to be solved. * Repri~ed from Fundamentals in blan-Machine Communication: Speech, Vision and Natural latnguage, Jean-Paul Halon (Ed.), 1986. wi~h permission from Cambridge University Press, **Current address: Departmen'~of Computing Science,Simon Fraser University, Burnaby, BC. *** Senior fellow, Canadian Institute for Advanced Research.

Current attitudes ~,owards ~intelligen1" system building can be accurately summarized by the slogan "knowledge is power". According to popular wisdom, a knowledge based system includes a knowledge base which can be thought of as a data structure assumed to represent propositions about a domain o f discourse (or world). A knowledge base is constructed in terms of a knowledge r e p r e s e n t a t i o n scheme which, ideally, provides a means for interpreting the data structure with respect to its intended subject matter and for manipulating it in ways which are consistent with its intended meaning. There already exist several surveys of Knowledge Representation which describe the field mostly from the point of view of current practice; these include [Hayes 74]. [McDermott 78], [Barr & Davidson 80] and [Mylopoulos & Levesque 84]. In addition. there have been several fine collections of research papers focusing on Knowledge Representation, such as [McCalla & Cercone 83] and [Brachman and Levesque 85]. The newcomer to the area may also be interested in key papers such as [McCarthy & Hayes 69]. [Minsky 75], [Woods 75], [Hayes 77], [Reiter 78]. [Newell 81], and [Brachman & Levesque 82] which have raised fundamental research issues and have influenced the direction of research. We already noted the difficulty of characterizing Knowledge Representation as a research area in terms of a coherent set of goals or methodologies. In preparation for the discussion, however, we need to adopt at least working definitions for the terms ~knowledge" and "representation'. By knowledge we will mean justified true belief, following traditional philosophical literature. While there are shortcomings to such a working definition, as [Dretske gl] points out, it is adequate for our purpose. By r e p r e s e n t a t i o n we will understand an encoding into a data structure. Intuitively then, knowle~lge r e p r e s e n t a t i o n means the encoding of justified true beliefs into suitable data structures. This though is a little rigid for our purposes. For example we will want to consider also encodings where the information is only thought to be true or maybe even is known to be false or inconsistent. So we will on occasion want to deal with encodings where the information may not be knowledge per se. Preference for one knowledge representation scheme over another depends heavily on the nature of the formal system adopted as a formalization of knowledge. However the preference for one scheme over another depends also on the suitability of the data structures offered, i.e., on how direct the mapping is from the components of the data structures used into their intended interpretations, This paper is concerned primarily with the nature of knowledge and its formalizations, rather than its representation. A companion paper [Kramer & Mylopoulos 85] attempts to examine the knowledge representation issues by surveying organizational structures that have been proposed for knowledge bases. For the purposes of the discussion in the remainder of the paper, a knowledge base KB is a pair where KBo is a collection of statements in the language of some logic L, for example: KBo = {Student(John), Supervisor(John,Mary)} and ~z is the derivability relation in L, i.e.. specifies what can be derived from the axioms, given the rules of inference of L. Then EKB

iff

KBob-zC~

(Adoption of this view implies that knowledge bases are essentially treated here as theories in Mathematical Logic, with KBo playing the role of a set of proper axioms.) Thus the example knowledge base contains

not just the statements in KBo but also others that can be derived from them in L

So KB may contain

statements such as

Student(John) V Professor(Joe) -,-,Student(John) depending on our choice of L. Statements in a knowledge base can be assigned a t r u t h v a l u e (usually either true or false) given a world or domain of discourse. The assignment of t r u t h values to statements is carried out in terms of a semantic function.

A standard method for doing so, due to Alfred Tarski. treats a knowledge base

i n t e r p r e t a t i o n as a 3-tuple where D is the set of individuals in the domain of discourse, and R and F are respectively the relations and functions between individuals that hold in the world, Tarskian semantics assumes the availability of semantic functions that map constant s y m b o l s in L, such as John, onto individuals in D, predicate symbols in L, such as Student, onto relations in R, and function symbols in L onto functions in F. From these functions, the notion of t r u t h in L can be made explicit. So, for example, Student(John) is true (roughly) if John (the object in D) satisfies the property of studenthood. A consequence of Student(John) being true in the interpretation is that (in most logics)

Student(John) V Student(q~lary) will also be true. An interpretation is said to be a m o d e l of a knowledge base if and only if all sentences in the knowledge base come out true in the interpretation, Of course, interpretations are only idealizations of the "real" w o r l d s of students, ships, and bombs with respect to which we interpret a knowledge base. Nevertheless, a formal semantics, Tarskian or other. can be extremely valuable as long as the structure of the interpretation captures our intuitions about the world or domain in question. The remainder of the paper consists of t w o parts. The first discusses the basic nature of knowledge, belief, and hypothesis, and introduces a n u m b e r of important concepts and methods for their study. The second part points out a n u m b e r of features that information in knowledge bases has, such as incompleteness, inconsistency, inaccuracy, and uncertainty, and provides a brief overview of methods that have been used in attempting to deal with these features within a representational framework.

2. On t h e N a t u r e o f K n o w l e d g e The main concern of this section is the relationship between the information contained in a knowledge base, and the state of the w o r l d or domain of discourse which the knowledge base is intended to describe. First w e discuss the commitment t h a t is made with respect to the t r u t h of a statement. While we have restricted ourselves so far to knowledge per so. m a n y s y s t e m s treat weaker notions such as belief or hypothesis. This commitment m a y be called the e p i s t e m i c s t a t u s of a statement, Second we consider the as.c~rtional s t a t u s of a statement, i.e., the confidence in the assertion represented by a statement. For example, a statement m a y be regarded as holding absolutely and w i t h o u t exception, or alternatively as only being u s u a l l y true. Lastly we review semantic theories that have been proposed for assigning meaning to encodings of knowledge.

2.1, K n o w l e d g e , Belief, a n d H y p o t h e s i s Knowledge w a s defined as "true, justified, belief ~. In this section we develop this notion f u r t h e r by exploring the t e r m s "true", "justified", and "belief". The notion of t r u t h can be discharged by a standard Tarskian account. However we a r e still left with the t e r m s "justified" and "belief". Let u s look at "belief" first. Belief can be defined in a surprisingly simple w a y . Given a knowledge-based s y s t e m (or agent) A, A believes a sentence P, j u s t w h e n P appears in A ' s knowledge base (or "language of thought~), Belief then, so construed, consists of literally a n y t h i n g t h a t can be represented. So does this make t h e t e r m vacuous7 Not quite: belief m a y be taken as distinguishing genuine cognitive s y s t e m s f r o m simple processors of information, such as television sets [Dretske 81]. However since belief is w h a t is attributed to cognitive s y s t e m s , it is clear that a general unconstrained "believing" s y s t e m is unacceptable: one w o u l d also w a n t to ensure t h a t beliefs are coherent, consistent, and (in a nutshell) "reasonable". So, for example, given t h a t

Student(John) ( x )[Stud ent( x ) D Hard Worker( x ) ] are believed to be true, one m a y w a n t to require t h a t

HardWorker(John) also be believed to be true

Similarly it seems reasonable to stipulate t h a t it not be the case that

~Student( ]ohn) be believed. Typically then one w o u l d w a n t beliefs, although possibly counterfactual, to have properties similar to knowledge. Logical s y s t e m s of knowledge and belief typically deal with only one of knowledge or belief. For such s y s t e m s s t a n d a r d first-order logic is u s u a l l y a u g m e n t e d with a sentential operator K, where Kc~ m a y be read as "c~ is k n o w n (believed) to be true ". W h e t h e r the informal interpretation of K actually corresponds to knowledge, or instead to belief, though u s u a l l y depends only on whether the axiom KaD~ is present. This axiom has the informal reading "if ~ is believed to be true, then ~ is true", tf the axiom is present, then w h a t e v e r is in the knowledge b a ~ is in fact true. and the notion corresponds to knowledge; otherwise it corresponds to belief, A n y other axioms of t h e s y s t e m apply both to knowledge and belief, The fact t h a t a s y s t e m deals with knowledge (say) rather t h a n belief then has v e r y little effect on the characteristics of the s y s t e m . Given this, the work of Moore on reasoning about knowledge and action ([Moore 80]) and of Fagin and his co-workers on m u l t i - a g e n t reasoning ([Fagin et al 84]) deal with knowledge, while t h a t of Levesque on incomplete knowledge bases ([Levesque 81]) deals w i t h belief. Konolige, in his dissertation research [Konotige 84]. examines both notions f r o m the point of view of a set of agents. Clearly m a n y s y s t e m s of d e f a u l t reasoning are not knowledge-preserving a n d t h u s deal w i t h belief. [Halpern & Moses 85] provides a general introduction to logics of knowledge and belief, while [Hughes & Cresswell 68] is an excellent introduction to Modal Logic.

However m a t t e r s do not end here

For example, if a sentence is belief only, then it is possible that the

sentence m a y later be discovered to not in fact hold. In this case. other beliefs based on the erroneous belief w o u l d have to be re-examined and perhaps modified or retracted.

This leads to the question of

w h i c h beliefs s h o u l d be introduced or held, or, more broadly, h o w one m a y j u s t i f y a belief. Let u s c a h a justified belief t h a t is not k n o w n to be t r u e a h y p o t h e s i s . [Quine & Ullian 78] is a good introduction to issues s u r r o u n d i n g this notion, while [Schefller 81] provides a more thorough exposition. U n d e r this view, it is the established t r u t h of a sentence t h a t separates knowledge per se f r o m hypothesis and belief. The justification of a sentence, on the other hand, separates knowledge and hypothesis f r o m belief. In this latter case, a k n o w n sentence m a y be regarded as being absolutely justified. It w o u l d be going too f a r afield to s u r v e y justification in a n y depth. It is instructive though to consider f o r m s of reasoning t h a t can be used to introduce justified belief into a knowledge base. For purposes of illustration we will m a k e use of the following classical f o r m for deductive reasoning.

(x)[e(x) ~ Q(x)]

(1)

e(a)

(2)

Q(a)

(3)

The inference from (1) and (2) to (3) is of course absolutely justified. However, the schema can also be used as a template for introducing justified belief. Some d e f a u l t logics for example m a y be regarded as a u t o m a t i n g a weaker f o r m of the above deduction. T h u s if one k n o w s t h a t m o s t elephants are grey, and that C l y d e is an elephant, then lacking information to the contrary, one m a y feel justified in concluding t h a t C l y d e is grey. (These considerations are discussed f u r t h e r in the section on nonmonotonicity ) Strictly speaking, such an inference w o u l d introduce a hypothesis t h a t Clyde is grey. Justification w o u l d depend on pragmatic factors, such as the n u m b e r of elephants seen, knowledge of albinoism, etc. However, the schema m a y be employed in quite different w a y s for introducing hypotheses. Consider first situations w h e r e we have instances of (1) and (3). W e can t h e n claim t h a t rule (1). together w i t h conclusion (3) suggests a cause, n a m e l y (2). T h u s for example if we have t h a t "All people w i t h colds have r u n n y noses" and "John has a r u n n y nose", we can propose the hypothesis "John h a s a cold". If we k n e w f u r t h e r t h a t people w i t h colds had elevated temperature, and t h a t John had this s y m p t o m , then our faith in the h y p o t h e s i s that John had a cold w o u l d be strengthened. This type of reasoning is k n o w n as abduc-

tire inference [Pople 73]. A b d u c t i o n provides a m e c h a n i s m for reasoning f r o m effect to possibIe cause. It provides a model of reasoning t h a t has been f o u n d useful in the development of medical diagnosis s y s t e m s (in particular) and expert s y s t e m s (in general). The inferencing components of m a n y production rule s y s tems, as perhaps best exemplified by MYCIN [Shortliffe 76], can be viewed as implementing particular f o r m s o f abductive reasoning. A b d u c t i o n can also be associated w i t h d e f a u l t reasoning. T h u s if we k n e w t h a t people w i t h colds typically had an elevated temperature, then, again, if John h a d this s y m p t o m , we could propose t h a t John h a d a cold. The question of h o w to determine one's f a i t h in such a diagnosis in a non a d hoe fashion is, of course, v e r y ditficuIt. Returning to o u r schema for reasoning, consider the t h i r d alternative w h e r e we have instances of (2) a n d (3) - - for example, a large collection of ravens, all of w h i c h happen to be black. In this case we m i g h t hypothesise the general s t a t e m e n t "All r a v e n s are black".

This is k n o w n as i n d u c t i v e reasoning. An

inductive m e c h a n i s m provides a m e a n s w h e r e b y general conjectures m a y be formed f r o m simple facts or ground atomic f o r m u l a s . However the general problems of j u s t i f y i n g induction and explicating the notion of confirmation are k n o w n f r o m philosophy to be extremely difficult [Goodman 79], [Scheltter 81]. In AI, inductive inference p r o g r a m s typically a s s u m e t h a t the domain of application is governed b y some u n d e r lying g r a m m a r . [Angulin & S m i t h 82] provides a thorough s u r v e y of efforts in this area, while [Shapiro 81] presents a particularly elegant t r e a t m e n t of some of the problems. This b r e a k d o w n into knowledge, hypothesis, and belief gives u s a m e a n s of characterising the e p i s t e m i c s t a t u s of a s t a t e m e n t . If a s t a t e m e n t is considered to be knowledge, t h e n p r e s u m a b l y one w o u l d be unwilling to allow t h a t it can be a n y t h i n g b u t true. T h u s a n y m a t h e m a t i c a l or definitional s t a t e m e n t w o u l d be treated as knowledge. There are certainly other sentences t h o u g h t h a t one w o u l d wish to treat as knowledge. For example whales, which were once regarded as fish,

n o w are recognised as being m a r e -

mats. However, while this d e m o n s t r a t e s t h a t "all whales are m a m m a l s " isn't knowledge per s¢, it w o u l d be a rare knowledge base t h a t d i d n ' t treat it as such. Knowledge then, pragmatically viewed, consists of those sentences t h a t are taken for granted, i.e. t h a t one is unwilling to give up. This suggests that a particular set of f o r m u l a s m a y or m a y not be taken as knowledge, depending on one's viewpoint. For example, m u c h of c u r r e n t a s t r o n o m y is conducted u n d e r the a s s u m p t i o n t h a t the t h e o r y of relativity is true. Yet relativity certainly isn't knowledge as such (since t h e theory, like m o s t of its predecessors, could be incorrect) and so, at a lower level, this t h e o r y itself is subject to experimentation and confirmation.

2.2. A s s e r t i o n a l S t a t u s The previous section dealt w i t h the epistemic s t a t u s of a s t a t e m e n t - - t h a t is, the presumed t r u t h of a sentence. In this section we t u r n to the a s s e r t i o n a l s t a t u s of a (general) sentence, t h a t is, the strength of the claim being made b y a sentence. This notion is best introduced by m e a n s of an example. Consider the statement "Elephants are grey". There are at least three readings:

(1)

"All elephants are grey". W h i l e this seems intuitively reasonable, strictly speaking it is false, since there are, a m o n g other things, albino elephants.

(2)

"Typically elephants are grey". This has the related reading "an elephant is grey with confidence or probability p",

(3)

~ l e p h a n t s are grey. However we acknowledge possible exceptional individuals". In this case t h e intention is t h a t greyness is in some sense associated w i t h elephanthood ( a l t h o u g h it is n o t clear exactly how).

These possibilities lead to three different approaches to specifying t h e m e a n i n g of a term. Consider the first case. The claim t h a t a t e r m m a y be exactly identified w i t h a collection of properties h a s been called the t r a d i t i o n a l t h e o r y o f m e a n i n g [Schwartz 77]. U n d e r this theory, "all elephants have f o u r legs" (if true) w o u l d be a n a l y t i c (i.e. w o u l d be true p u r e l y b y virtue of m e a n i n g a n d independently of collateral information), a n d t h e m e a n i n g of "elephant" could be laid o u t b y specifying enough of these

properties Squares being equilateral rectangles and bachelors being u n m a r r i e d males are, in most accounts, examples of analytic t r u t h s , However, it is certainly not the case t h a t all elephants have f o u r legs, nor is it t h e case t h a t every elephant is grey or has a trunk. In fact it seems that elephants m a y have no commonplace exceptionless properties, and that, barring assertions such as "all elephants are m a m m a l s " and logical t r u t h s , a n y general s t a t e m e n t concerning elephants m a y have exceptions. Clearly a similar a r g u m e n t can be applied to other c o m m o n nouns, s u c h as "lemon", "gold", "water", and so on, T h u s for example a lemon need not be yellow. nor bitter, nor necessarily oblong. Such t e r m s are examples of n a t u r a l k i n d terms. These t e r m s m a y be characterised as being of e x p l a n a t o r y importance, b u t whose n o r m a l distinguishing characteristies are explained b y deep-lying m e c h a n i s m s [ P u t m a n 75]. Hilary P u t n a m , in t h e reference just cited, argues pers u a s i v e l y t h a t n a t u r a l kind t e r m s have no knowable defining conditions, and no non-trivial exceptionless properties. T h u s he takes the position t h a t a s t a t e m e n t such as "elephants are m a m m a l s " m a y be falsified; this seems n o t unreasonable if one considers t h a t "whales are fish" w a s once t h o u g h t to be true. There are of course t e r m s t h a t m a y be precisely defined or specified. For example, a square is defined to be an equilateral rectangle - - a three-sided square m a k e s no sense whatsoever. Also, if we define "uncle" to mean a b r o t h e r - i n - l a w of a parent, t h e n someone w h o fulfills t h e latter conditions cannot fail to be an uncle, and an uncle cannot b u t be a b r o t h e r - i n - l a w of a parent. These definitions clearly do not allow for exceptions. The notions of definitional t e r m s and terminology moreover are k e y in the design of m a n y knowledge representation s y s t e m s , and in particular ~ m a n t i c n e t w o r k f o r m a l i s m s s u c h as KL-ONE [Brachman 79]. So t h e notions of analyticity and the traditional theory of meaning, while inapplicable to n a t u r a l kind terms, are nonetheless necessary for terminology and definition. In the second reading of "elephants are grey", where we have, "typically elephants are grey", a term is identified w i t h a description of a typical member. This is t h e essence of prototype theory [Rosch 78]. In AI, prototype t h e o r y provides the f o u n d a t i o n for m a n y f r a m e - b a s e d reasoning systems. Frame-based reasoning s y s t e m s are c o m m o n l y used for recognizing stereotypical situations. For such applications, where an individual, situation, etc. is identified on t h e basis of a description, prototype theory seems perfectly adequate. T h u s to recognise an elephant, we m i g h t look for a t r u n k , grey colouring, f o u r legs, and so on. If a n y of these features are missing, it doesn't mean t h a t the object isn't an elephant, although it m a y m a k e u s tess certain t h a t it in fact is. For general reasoning s y s t e m s h o w e v e r prototype theory h a s drawbacks. Foremost is the fact t h a t for reasoning with prototypes, one is forced to use a probabilistic or default t h e o r y of reasoning. In contrast, s t a n d a r d first-order logic can be e m p l o y e d for reasoning w i t h analytic (definitional) statements. Also, as is pointed o u t in [Israel & B r a c h m a n 81], one cannot f o r m complex concepts strictly w i t h i n prototype theory. A prototype s y s t e m h a s to be told for example t h a t (the concept) "four-legged elephant" s u b s u m e s both "four-legged" and "elephant". In s u m m a r y t h e n prototype t h e o r y appears too weak to be used as a m e d i u m for the general representation of knowledge. However, it h a s been f o u n d u s e f u l in representing descriptions of n a t u r a l kind terms. The third case a t t e m p t s to m a i n t a i n a general sentence s u c h as "elephants have f o u r legs" while a d m i t ting exceptions to the sentence at the same time, This approach lacks a precise and complete formalisation; h o w e v e r it can be m o t i v a t e d by m e a n s of a naive view of scientific t h e o r y formation.

Consider the

scientific hypothesis t h a t water boils at 273°K. In testing this h y p o t h e s i s b y examining a particular sampte o f water, one verifies not just the s t a t e m e n t in question, b u t also a host of u n d e r l y i n g a s s u m p t i o n s

10

[ P u t n a m 79]. [Quine & Ullian 78]. T h u s a test of the statement "water boils at 273"K" p r e s u m e s t h a t the w a t e r is pure, that atmospheric pressure is 760ram, t h a t the t h e r m o m e t e r is accurate, that the act of measu r e m e n t does not affect the boiling point, etc. The failure of a sample to boil at 273"K t h e n does n o t necessarily f a l s i f y the hypothesis, b u t rather falsifies the conjunction of the hypothesis and the underlying assumptions.

The original conjecture can be maintained by claiming t h a t some a s s u m p t i o n h a s been

falsified, even t h o u g h the particular a s s u m p t i o n m a y not be specified, nor even k n o w n . Similar r e m a r k s apply to four-leggedness and elephanthood: a three-legged instance m a y be discharged b y appealing to some (possibly u n k n o w n ) underlying assumption. This is n o t to s a y though t h a t defining conditions for n a t u r a l kind t e r m s m a y not be hypothesised. For example we m a y entertain the hypothesis t h a t w a t e r is H20. In this case no exceptions are permitted; the radical OH for example is s i m p l y n o t water. On the other hand, the h y p o t h e s i s that water is H20 m a y be used to account for (at least in principle) the notion of boiling point, a n d to account for a n y exceptions.

2.3. S e m a n t i c T h e o r i e s o f K n o w l e d g e A knowledge representation scheme is u s u a l l y intended as a vehicle for conveying meanings about some domain of discourse. To be at all u s e f u l there m u s t be an external account of the w a y particular configurations expressed in t e r m s of the scheme correspond to particular a r r a n g e m e n t s in the domain. T h a t Is, there m u s t be an associated s e m a n t i c t h e o r y . Simply put, this m e a n s t h a t knowledge bases m u s t be about something, and a formal account of this aboutness constitutes a semantic theory. As indicated in the introduction, the s t a n d a r d starting point for semantic theories is Tarskian s e m a n tics, which is relatively straightforward, well understood, and well accepted by now. The question arises though as to h o w this t r e a t m e n t can be extended to deal w i t h knowledge (or belief - - since the points made in this section apply equally to both notions, we will use t h e m interchangeably). The main difficulty in providing a semantic t h e o r y for knowledge is t h a t t h e t r u t h value of s t a t e m e n t s m a y or m a y not be k n o w n , independently of their actual t r u t h v a l u e in the domain of discourse. T h u s for example one m a y not k n o w w h e t h e r it is raining in North Bay at present, although it certainly either is or is not raining there now. Moreover if we allow t h a t knowledge can be explicitly referred to. there arise questions concerning the extent to w h i c h one can have knowledge a b o u t one's o w n knowledge or knowledge about one's ignorance. If we introduce a new monadic operator K for "knows", then these questions concern the s t a t u s of sentences such as KKc~ or K-.Ka. In o u r review, we consider three semantic theories t h a t have been proposed for knowledge and belief. Following [Levesque 84]. we refer to t h e m as the possible w o r l d s , s y n t a c t i c , and s i t u a t i o n a l approaches. For possible worlds semantics, [Hintikka 62] is the seminal work.

W i t h i n AI, [Moore 80] and

[Levesque 84] present formalisations of knowledge or belief based on a possible-worlds semantics. To illustrate this approach, consider the following knowledge base:

Teacher(John) Teacher(Bill) V Teacher(Mary) (x)[Teacher(x) D SchoolErnployee(x)]. This knowledge base m a y be regarded as specifying w h a t is k n o w n about the world. T h u s it constrains the

11

way the wortd is t h o u g h t to be; for example, under the intended interpretation, John is a teacher and at least one of Bill and M a r y are teachers. However it also underconstrains the world. If Lou is an individual, then according to w h a t ' s k n o w n , she m a y or m a y not be a teacher. T h a t is, the actual world m a y be such t h a t Lou teaches, or it m a y be such t h a t she does not. W e can s a y t h e n t h a t there are, according to the knowledge base, possible w o r l d s in w h i c h Lou does teach, and others in w h i c h she does not. On the other hand, there are no possible w o r l d s compatible w i t h the knowledge base in w h i c h John doesn't teach; and in each possible w o r l d compatible w i t h the knowledge base at least one of Bill or M a r y teaches. N o w each s u c h possible world m a y be characterised using a Tarskian f r a m e w o r k . T h u s if Teach(Lou) is true in a possible world, so are -,-,Teach(Lou) and Teach(ixrg)V -Teach(John). So a knowledge base can be characterised semantically as a set of possible worlds. A s y s t e m k n o w s a sentence tx just w h e n a is true in all worlds t h a t are possible according to the s y s t e m ' s knowledge base. T h u s , f r o m our previous e x a m ple. the s y s t e m k n o w s not j u s t that John is a teacher, b u t also t h a t John is a school employee. Depending on how the notion of "possible" is defined, one can stipulate, for example, t h a t if s o m e t h i n g is k n o w n , t h e n it is k n o w n to be k n o w n , and t h a t if something is not k n o w n , t h e n it is k n o w n to be not k n o w n -- that is, w h e t h e r K a implies K K a , or -,Kt~ implies K ~ K a respectively. A drawback to approaches of this type for modelling knowledge is t h a t t h e y i m p l y logical o m n i s c i e n c e ; t h a t is, all logical consequences of beliefs m u s t also be believed

T h u s all valid sentences m u s t be believed. This entails for example t h a t such a s y s -

tem k n o w s the outcome of an optimal s t r a t e g y in chess or the t r u t h of F e r m a t ' s last theorem. F u r t h e r more, if a sentence and its negation are believed, then so m u s t be every sentence. Neither restriction seems particularly realistic. The first is c o m p u t a t i o n a l l y unreasonable and, for the second, m o s t people w o u l d happily a d m i t to the possibility of harbouring inconsistent beliefs, w i t h o u t t h e r e b y believing everything. There are h o w e v e r f o r m u l a t i o n s of possible w o r l d s t h a t do n o t necessarily lead to logical omniscience, notably those presented in [Lewis 73] and [Montague 74]. [Hadley 85] presents a critique of t h e aforementioned approaches to knowledge and sugests a solution to these difficulties based on the Lewis/Montague approach. A n alternative to the possible w o r l d s approach, which m a y be called the s y n t a c t i c a p p r o a c h , is to have the model s t r u c t u r e contain~ or be isomorphic to, an explicit set of sentences. [Moore & Hendrix 79] and [Konolige 84] are both advocates of this approach. Given our example knowledge base then, all t h a t w o u l d be k n o w n w o u l d be the three original sentences.

It w o u l d not necessarily be k n o w n that

SchoolEmployee(John), since this sentence doesn't appear explicitly in the knowledge base. However this 'is not unreasonable: one cannot in general k n o w all consequences of one's betiefs -- this after all is the problem w i t h logicat omniscience. This alternative also h a s some intuitive support. Certainly w h e n people acquire beliefs, t h e y seem to u s u a l l y do so w i t h o u t m a r k e d l y altering their prior set of beliefs. Conceivably belief acquisition, in m o s t instances, consists of little more t h a n adding a belief to an existent set. The approach also avoids the problem of logical omniscience, since e v e r y t h i n g believed is explicitly represented. H o w e v e r this approach seems to m a k e too "fine grain" a distinction w i t h respect to the form of a belief. The sentence

Teacher(Bill) V Teacher(Mary) is in the belief set, and so is believed. H o w e v e r

Teacher(Mary) V Teacher(Bill)

12

is not in the belief set, and so is not believed

Yet this is counterintuitive: a disjunction, a V S , m a y be

i n f o r m a l l y read as "a or 8 (or both) are true ~ - - a reading that is independent of a n y ordering on a and 8. So it w o u l d seem t h a t w h e n e v e r a V 8 is believed then 8 V¢x should be also. In general then a n y knowledge representation scheme u s i n g the syntactic approach m u s t also ( p r e s u m a b l y ) specify w h a t beliefs follow f r o m a given set. A third possibility, presented in [Levesque 84] generalises the notion of a possible world to t h a t of a situation,

The general idea is t h a t while a possible world fixes t h e t r u t h value for all sentences, a situa-

tion m a y s u p p o r t the t r u t h of some sentences, t h e f a l s i t y of others, a n d neither t h e t r u t h nor f a l s i t y of y e t other sentences. Phrased slightly differently, a knowledge base is relevant to (the t r u t h v a l u e of) some sentences and is irrelevant to others. So our example knowledge base supports the t r u t h of John being a teacher and at least one of Bill or M a r y being a teacher. On the other hand it supports neither t h e t r u t h nor falsity of Lou being a teacher. The definition of a "support" relation specifies w h a t beliefs are held, given that others are held. Roughly speaking, the definition extends the standard possible worlds model s t r u c t u r e b y replacing the notion of a possible world, w h e r e the t r u t h value of all sentences is specified, by the notion of a situation, where the t r u t h v a l u e of a sentence m a y or m a y not be specified. The definition of the s u p p o r t relation also ensures that desired relations a m o n g sentences hold. T h u s a situation supports the t r u t h of ¢~V8 if and o n l y if it s u p p o r t s the t r u t h of either c~ or 8. and a situation supports the falsity of aV8 if and only if it s u p p o r t s the f a l s i t y of both ~ and B. T h u s if

Teacher(Bill) V Teacher(Mary) is believed, then so is

Teacher(Mary) V Teacher(Bitl). In fact, in some sense these s t a t e m e n t s m a y be regarded as being the same belief, Unlike the possible worlds approach though, logical omniscience is avoided. In particular, a valid sentence need not be (explicitly) believed, beliefs need not be closed u n d e r implication, and beliefs can be inconsistent w i t h o u t every sentence being believed. T h u s given our example knowledge base it m a y or m a y not be the case t h a t either of

SchoolErnployee(John ) Teacher( I_xru) V ~ Teacher( Lou ) is believed. Finally, unlike the syntactic approach, the semantics of belief are u l t i m a t e l y based on the (Tarskian) conception of t r u t h , rather t h a n on restrictions to a set of sentences. The situational approach also p e r m i t s a distinction between w h a t m a y be caIled e x p l i c i t belief and i m p l i c i t belief. The f o r m e r deals w i t h w h a t an agent actually holds to be t h e case, while t h e latter deals with the w a y t h a t the world w o u l d be, a s s u m i n g t h a t the agent's beliefs are in fact true. In this view then, implicit belief is the "limiting" case of explicit belief. This fits in well w i t h the semantic view, where a possible w o r l d s semantics m a y be regarded as the "limiting" case of a situational semantics, wherein either the t r u t h or f a l s i t y of all (rather t h a n some) sentences is supported.

13

There is a second distinction, separate from semantic theories of knowledge, that m a y be profitably discussed at this point,

This distinction concerns h o w knowledge is to be f o r m u l a t e d in a knowledge

representation scheme. There are at present t w o major approaches. The first extends a logic, typically classical propositional or predicate logic, b y adding t h e sentential operator K mentioned previously, w h e r e a sentence K a m a y be read as "c~ is k n o w n to be true". T h u s the following s t a t e m e n t s "John is a s t u d e n t and M a r y is k n o w n to be a student. '~ "It is k n o w n that the only s t u d e n t s are the k n o w n students," "There is a s t u d e n t apart f r o m the k n o w n students." m i g h t be respectively represented Student(John) A K S t u d e n t ( M a r y ) K (x)[Student(x) ~ KStudent(x)] ( ~ x ) [ S t u d e n t ( x ) A ~KStudent(x)]

The operator B, for "~lelieves", is s o m e t i m e s used instead of K; [Konolige 84] uses [Si]a to mean "agent Si believes c~". [,Levesque 81], [Levesque 84], [Konolige 84], and [Fagin et al 84] are all examples of approaches that extend first-order logic. The second approach is to f o r m u l a t e a theory of knowledge within first-order logic; [McCarthy 79] and [Moore 80] are both examples of this approach. The idea is that one introduces a predicate "Know", and then provides axioms to govern this predicate. T h u s Moore represents the import of "Know" by reducing it to the notions of t r u t h in possible worlds, and of worlds possible according to w h a t is known. His " f u n d a m e n t a l axiom of knowledge" is T ( w I, K n o w ( a , p)) -----(we)[K(a, w I . w 2) D T ( w 2. p)]

w h i c h can be read as "a person a k n o w s the facts p t h a t are true in every world w e t h a t is possible according to w h a t he knows", F u r t h e r a x i o m s of course are required to pin d o w n the predicates K and T: these a x i o m s a m o u n t to encoding expressions in t h e object language (which t a l k s about k n o w n facts) into expressions of first-order logic t h a t talk about possible worlds. So, is there a n y reason to f a v o u r one approach over the other? or, more to the point, is there a n y t h i n g that one b u y s y o u t h a t the other does not? First of all, the second approach has the advantage that it embeds the characteristics of knowledge w i t h i n a w e l l - u n d e r s t o o d formal f r a m e w o r k .

This also m e a n s

that one can take an existing, off-the-shelf theorem prover (say) for deriving sentences f r o m a knowledge base phrased in s u c h terms, W i t h the first approach, inference procedures implementing the s y s t e m m u s t be developed. H o w e v e r this a d v a n t a g e isn't conclusive. W i t h t h e second approach we have, after a l l encoded a language w i t h i n t h e meta-Ianguage, first-order logic, a n d need to express explicitly h o w one m a y reason w i t h knowledge. T h u s for example if we have t h a t if someone k n o w s t h a t p&/ is true t h e n it d o e s n ' t a u t o m a t i c a l l y follow t h a t one k n o w s t h a t p is true. T h u s , one w a y or another, one m u s t state that something like K n o w ( a , "pAq") D Know(a. "p")

14

holds. So it is not clear thai an automatic computational advantage obtains. A potential disadvantage to the second approach is that it posits entities that m a y not be directly useful or applicable to the task of representing knowledge and moreover m a y lead to problems of their own. Thus. taking Moore's work as an example again, possible worlds are recognised as real entities in the language (in the sense that they appear in the range of quantifiable variables), The first approach doesn't make this explicit commitment. However once we allow possible worlds into our language, one is forced to deal with these entities. Questions arise as to how one possible world differs from another, h o w individuals m a y differ across worlds, and how, given an individual in one world, it can be identified in another. For these reasons the first approach, where an existing logic is extended, is generally favoured for reasoning with knowledge.

3. The "Ins", "Uns", a n d "Nons" o f K n o w l e d g e To understand a phenomenon, such as cars, hearts, or knowledge, one needs to s t u d y more than just its textbook definition. In particular, one needs to examine dimensions in terms of which the phenomenon can be characterized, and to study the allowable variations of the phenomena along each dimension. This section s u r v e y s some such dimensions for encodings of knowledge and describes relevant research issues and results

3.1. I n c o m p l e t e n e s s When a query is evaluated with respect to a database to find out for example if John Smith is a student, it is customary to assume that the database contains complete information about students. Thus failure to find information in the database is interpreted as negative information. In this case if John Smith's status was not found in the database, it would be concluded not that his status was u n k n o w n , but that he was not a student. This hidden assumption was pointed out and examined in [Reiter 78] and has been labelled the closed w o r l d a s s u m p t i o n . In general, however, this assumption is not justified and cannot be used. For anything but idealized microworlds, a knowledge base wilt have to be an incomplete account of the domain of discourse. Given this state of affairs, we want to be able to first, express our lack of information, and second, ask questions about it. Before discussing some proposals for dealing with incompleteness, it is instructive to examine some of its sources. The most obvious source is iack of information about the domain of discourse. Thus. an incomplete knowledge base m a y only know two students when in fact there are m a n y more. Moreover, it may be the case that the knowledge base

knows

that there are other, u n k n o w n students. A second important

source of incompleteness has to do w i t h the derivability relation ~-L which defines w h a t can be derived f r o m given facts in the knowledge base. In particular, this relation may be "weak" in the sense that there are statements whose t r u t h would seem to be implicit in the given facts (the set KBo discussed in the introduction), and yet are not in the knowledge base because they are not derivable through ~-z- For example, aknowledge base m a y contain

Student(John)

(4)

Student(Mary)

(5)

and use the e m p t y derivability relation (i.e., there are no inference rules). Such a knowledge base does not

15

contain

Student(John) V Student(Joe) even t h o u g h this is clearly true in e v e r y possible world described b y the knowledge base. It m a y seem to t h e reader t h a t this is a pathological example a n d that. in fact, "reasonable ~ knowIedge bases will a l w a y s have a sufficiently s t r o n g derivability relation to eliminate s u c h examples. It t u r n s out, however, t h a t there are several reasons w h y a derivability relation m a y be weak either b y necessity or design [Lakemeyer 84]. Firstly, weak derivability relations m a y m a k e m u c h smaller d e m a n d s on c o m p u tational resources, and t h u s m a y be desirable f r o m a computational point of view (see [Brachman & Levesque 84] for a discussion of such issues). In addition, GSdeYs incompleteness theorem establishes t h a t there are inherent limits to the completeness of a knowledge base w h e n t h e knowledge representation scheme is sufficiently powerful. Expressing incompleteness involves a n u m b e r of capabilities, including saying that s o m e t h i n g h a s a property w i t h o u t identifying t h e thing w i t h t h a t property, saying t h a t e v e r y t h i n g in a class has a property w i t h o u t s a y i n g w h a t is in t h e class, a n d altowing t h e possibility t h a t t w o nonidentical expressions n a m e t h e s a m e object [Moore 80]. First-order logic provides facilities for handling these situations t h r o u g h the use of logical connectives, quantifiers, and terms. T h u s , easily and trivially, we can state:

(3 x)[reach(x) A PlaeeOfResidence(x, Paris)] (x)[Teacher(x) D Erudite(x)] MorningStar = EveningStar. However difficulties arise in a first-order logic setting when one a t t e m p t s to deal with the closed world a s s u m p t i o n or its converse, the open w o r l d assumption. Suppose for example that we w a n t to state t h a t there is an u n k n o w n s t u d e n t in a knowledge base w h i c h includes s t a t e m e n t s (4) and (5). T h u s :

(3 x)[Student(x) A ",(x=John) A -,(x=Mary)]

(6)

One drawback of this f o r m u l a t i o n is t h a t the length of such f o r m u l a s could be proportional to the size of the knowledge base. A more i m p o r t a n t drawback of (6) comes into the picture if we t r y to use it as a query, asking of the knowledge base w h e t h e r there exists an u n k n o w n student, To express such a q u e r y the user will have to k n o w a/l t h e k n o w n s t u d e n t s . A n alternative, explored in [Levesque 84], is to use the m o d a l operator K where K s m e a n s ~ is k n o w n . Then, stating t h a t there is an u n k n o w n s t u d e n t can be expressed by

(3x)[Student(x) ^ ~KStudent(x)] a n d a similar f o r m u l a t i o n can be used to ask if t h e knowledge base k n o w s all students. Note t h a t this s t a t e m e n t , unlike (4), (5), or (6) is a s t a t e m e n t about the knowledge of the knowledge base (or lack of it) rather t h a n a b o u t the domain of discourse (students), A c o m p l e m e n t a r y approach to Levesque's is proposed in [Moore 80] w h i c h focuses on a knowledge base's knowledge about other agents, rather t h a n on self-knowledge. To s a y t h a t John k n o w s a Frenchspeaking teacher m i g h t be expressed as

16

Know(John. "( 3 x )[Teacher( x ) h Fr enchS peaking( x ) ]") whereas the statement that there is a French-speaking individual that John knows is a teacher might be represented as

( 3 x )[Fr enchS peaking( x ) A Know(John, "Teacher(x)")] As discussed earlier, Moore's w o r k is also distinguished by the fact that it formulates its theory within first-order logic. In both Levesque's and Moore's approaches, possible world semantics serve as the basis for a semantic theory. An alternative approach to those described so far is presented in [Konolige 84] where each agent in a multi-agent environment is assumed to have its o w n set of facts and its own (possibly weak) derivability relation. Thus, stating that John k n o w s that Sue is a teacher is expressed as

[John] Teacher(Sue) and the facts derivable f r o m this are determined by the derivability relation associated with agent John. A similar proposal is outlined in [Bibel 83]. Yet another treatment to incompleteness is described in [Belnap 75] which proposes a four-valued logic where the extra two values can be read as "unknown" and "inconsistent". This approach has been used by [Shapiro & Bechtel 76] in the development of a semantics for a semantic network formalism and by [Vassiliou 80] in accounting for "null values" in databases.

3.2. N o n m o n o t o n i t y If we view a knowledge base as a first-order theory, additional facts invariably lead to additional knowledge. For instance, if we have a knowledge base which is given (4) and (5), and add

Student(Jane) we now know -- in addition to everything that logically follows f r o m (4) and (5) -- formulas such as

Student(John) h Student(Jane) Student(Jane) V Married(Bill) that were not k n o w n previously. More formally, if KB and KB' are knowledge bases and KB = < KBo, ~-z > KB" = < KB0 U ~, ~-L > then KB -- KB'. This property makes first-order and most other "conventional logics" m o n o t o n i c . Unfortunately, monotonicity is not a property of commonsense knowledge and reasoning w i t h respect to such knowledge [Minsky 75]. Indeed, there are m a n y situations where monotonicity leads to problems. Here are some, noted in [Reiter 78]:

17

D e f a u l t a s s i g n m e n t s . Default rules are used to assign values to properties in the absence of specific information. T w o examples are: "Unless y o u k n o w otherwise, a s s u m e t h a t a person's city of residence is Toronto." "Unless y o u k n o w otherwise, a s s u m e t h a t an elephant is grey," K n o w l e d g e Incomlaleteness.

The closed world a s s u m p t i o n discussed in the previous section can be

expressed w i t h s t a t e m e n t s of the f o r m "Unless y o u k n o w otherwise, a s s u m e t h a t an object is not a student." w h i c h a m o u n t s to s a y i n g t h a t all s t u d e n t s are a s s u m e d to be k n o w n .

Default Inheritance,

Consider a prototypical description of birds w h i c h states that birds fly. Of course,

this can be false, either for particular birds ( T w e e t y ) or classes of birds (penguins). It can then be u n d e r stood as a rule of t h e f o r m KUnless y o u k n o w otherwise, a s s u m e t h a t a bird flies." This is a classical example of "default inheritance" used in semantic n e t w o r k s (e.g., [Fahlman 79]) where a "flies" a t t r i b u t e is associated w i t h the concept of bird, and is then inherited b y instances or specializations of the concept if there isn'~ i n f o r m a t i o n to t h e contrary.

So far we have seen the need to introduce a s s u m p t i o n s into the knowledge base while reasoning in order to deal w i t h ignorance (incompleteness) or w i t h knowledge t h a t o n l y provides an approximate account of the world (e.g., prototypical descriptions). Nonmonotonic reasoning is brought about by the introduction of such assumptions. If at some time an a s s u m p t i o n is introduced in the knowledge base, say

~Student(Sue ) because of lack of information, and it is later discovered t h a t Sue is in fact a student, we m u s t remove the a s s u m p t i o n concerning Sue's s t u d e n t s t a t u s , or face the prospect of an inconsistent knowledge base. T h u s in this situation, the addition of facts to the knowledge base leads to some ( f o r m e r ) conclusions no longer being derivable. This is the feature t h a t renders reasoning s y s t e m s t h a t use "unless otherwise" rules n o n monotonic. Versions of nonmonotonic reasoning were used in semantic n e t w o r k and procedural representation languages such as PLANNER [Hewitt 71] before a n y a t t e m p t s were made w i t h i n AI to formalize and s t u d y it. [Reiter 78] a n d s u b s e q u e n t l y [Reiter 80] offered a formalization based on d e f a u l t logics. These are logics which include first-order logic, b u t in addition can have domain-specific inference rules of the f o r m or(x1 . . . . .

x.): Mt3(xl . . . . . y ( x l . . . . . x~)

x.)

These rules can be read i n f o r m a l l y as "If for particular v a l u e s of x l , . . . , x n ,

tx is t r u e and g can be

a s s u m e d consistently, then a s s u m e T , For example.

Person(x): MLives(x,Toronto) Lives(x,Toronto) states t h a t if someone is a person and it can be consistently a s s u m e d t h a t he lives in Toronto (i.e, it cannot

18

be derived f r o m t h e knowledge base that this someone doesn?t live in Toronto) then a s s u m e that he lives in Toronto. Likewise, the closed world a s s u m p t i o n for s t u d e n t s can be approximated b y the inference rule

Person(x): M-Student(x) -,Student(x) W i t h this m a c h i n e r y , a default t h e o r y consists of a set of axioms and a set of default inference rules. Its theorems include those t h a t logically follow f r o m the axioms using not only first-order logic inference rules, b u t also "assumptions" generated by the default inference rules. It proves to be the case t h a t a d e f a u l t theory can have more t h a n one possible set of theorems, depending on the order of application of its d e f a u l t inference rules [Reiter 80], Each of these sets can be viewed as an acceptable set of beliefs that one can entertain w i t h respect to a d e f a u l t theory. A n o t h e r approach to d e f a u l t reasoning, first proposed in [ M c C a r t h y 80], is the notion of circumscription. Intuitively, circumscription can be t h o u g h t of as a rule of conjecture t h a t allows one to j u m p to certain conclusions in the absence of knowledge. This is achieved by stating t h a t all objects t h a t can be s h o w n to have property P, given some set of facts, are in fact the only objects t h a t s a t i s f y P. Consider, for e x a m ple. a blocks world situation w i t h two ( k n o w n ) blocks A and B:

Block(A ) Block(B). One w a y of circumscribing w i t h respect to the predicate Block a m o u n t s to saying that the k n o w n blocks are the o n l y blocks To achieve this we pick the following as t h e circumscription of Block, w r i t t e n

C(Block): C(Block) ~ Block(A)ABlock(B) and s u b s t i t u t e this in the f o r m u l a schema

c( ~ ) ^ (x)[ ~ ( x ) ~ e(x)] ~ ( x ) [ e ( x ) ~ ( x ) ] ( f r o m [McCarthy 80]). This schema can be regarded as stating t h a t the only objects that satisfy P are those t h a t h a v e to, a s s u m i n g the sentence C. C(q~) is the r e s u l t of replacing all occurrences of P (here

Block) in C b y some predicate expression @. T h u s it states that q~ satisfies the conditions satisfied b y P. The second conjunct, (x)[~(x)DP(x)] states t h a t entities satisfying • also s a t i s f y P, while t h e conclusion states t h a t • and P are then equivalent. In our example, if we then pick

• (x) =-- (x=A v x=B) and we s u b s t i t u t e back in the circumscriptive schema and simplify, we end u p w i t h the circumscriptive inference

(Block(A) A Block(B)) ~-c (x)[Btock(x) D (x=A V x=B)]. Note t h a t n o t e v e r y possible choice of a circumscription for Block leads to reasonable conclusions (see, for example, discussion in [Papalaskaris & B u n d y 84]). [Etherington et al 85] s t u d y the power and limitations of circumscription, while [McCarthy 84] provides a more recent account of circumscription, including a more general f o r m u l a t i o n of this i m p o r t a n t m e t h o d of nonmonotonic reasoning.

t9

Yet another approach to nonmonotonic reasoning is described in [McDermott and Doyle $0] and subs e q u e n t l y in [McDermott 82] and [Moore 83]. In the first paper, the a u t h o r s present a logic consisting of s t a n d a r d first-order logic augmented w i t h a sentential operator M whose informal interpretation is "is consistent". T h u s the s t a t e m e n t t h a t a bird is a s s u m e d to fly, given t h a t s u c h an a s s u m p t i o n is consistent, is expressed b y the axiom:

(x)[Bird(x) A MFties(x) D Flies(x)]. If we k n o w that T w e e t y is a bird then, barring information to the contrary, we w o u l d conclude that T w e e t y flies. The s y s t e m is nonmonotonic since additional axioms m a y block previous inferences. For example, if the axioms

(x)[Penguin(x) D ~Flies(x)] Penguin(Tweety) were added, we w o u l d conclude t h a t -~Flies(Tweety). This conclusion blocks the original inference since

MFlies(Tweety) is no longer true. A difficulty w i t h this particular approach is that the notion of consistency provided is quite weak. Specifically, there is no relation between t h e t r u t h values of a sentence p and Nip. This m e a n s t h a t the pair IMp. -,p} m a y not necessarily be inconsistent~ [McDermott 82] examines stronger versions of this logic arrived at by adding a x i o m s to the original. [Moore 83] developes this line of w o r k f u r t h e r by pointing out t h a t there are least two types of d e f a u l t reasoning t h a t w o r k such as M c D e r m o t t and Doyle's addresses, The first, which he calls default reasoning deals w i t h facts concerning the external world. As an example, if most Quebecois are French-speaking and Pierre is f r o m Quebec, then if there is no information to the contrary, one m a y reasonably conclude t h a t Pierre speaks French. The second, called autoep-

isternic reasoning, concerns reasoning a b o u t one's o w n beliefs. A n example is the sentence "I k n o w that I d o n ' t k n o w w h e t h e r it's raining in Paris at present ~. This is nonmonotonic since if I w a s told that the s u n is presently shining in Paris, I wouId w i t h d r a w the previous sentence. Moore addresses this latter type of reasoning b y m e a n s of a propositional logic of belief, m u c h like those presented in the previous section. It is a point of interest then t h a t this particular line of research, beginning w i t h [McDermott and Doyle 80], lead f r o m strictly nonmonotonic concerns, to logics of knowledge and belief as are used to address problems of incompleteness. A second point of interest that s h o u l d be evident f r o m this brief s u r v e y is t h a t there is no clear agreem e n t as to w h a t c o n s t i t u t e s the preferable approach (if indeed there is one) to the problems of n o n m o n o tonicity.

Solutions proposed range f r o m the (circumscriptive) addition of f o r m u l a s , to extending first-

order logic with sentential operators, to adding d e f a u l t rules of inference to first-order logic. The c u r r e n t diversity, interest, and activity of the area is also borne o u t b y a recent w o r k s h o p in this area [AAAI 84].

3.3. I n c o n s i s t e n c y In principle, one of the advantages of "conventional" logics (e.g., s t a n d a r d first-order predicate calculus) is the availability of a notion of consistency w h i c h determines, for example, t h a t the knowledge base

20

Canadian(John) V C ~ a d ian( Mar y ) -~Canadian(]ohn ) ~Canadian(Mary) is inconsistent, i.e., there is no interpretation t h a t is a model for this knowledge base. U n f o r t u n a t e l y , h o w ever, it is a fact of life t h a t large knowledge bases are inherently inconsistent, in the same w a y large prog r a m s are inherently buggy. Moreover, w i t h i n a conventional logic, the inconsistency of a knowledge base has the catastrophic consequence t h a t everything is derivable f r o m the knowledge base. From the point of view of knowledge representation, dealing w i t h inconsistency involves two issues. The first concerns t h e assimilation of inconsistent information, i.e., the ability to include in a knowledge base inconsistent information w i t h o u t rendering the knowledge base useless. The second issue concerns the

accommodation of inconsistent knowledge, i.e., the modification of t h e knowledge base to restore consistency. It must be stressed t h a t both issues are important and should be seen as opposite sides of the same coin. Indeed, a knowledge base should be able to behave like a body of scientific knowledge consisting of observations and general laws. Inconsistencies can exist at a n y time, b u t there are aIso m e c h a n i s m s for rationalising inconsistencies and for introducing new general laws t h a t account for observational knowledge and at the same time eliminate or reduce the inconsistencies. A c o m m o n solution to assimilation employed by m a n y early semantic n e t w o r k f o r m a l i s m s was to constrain the order m which inferences are tried. T h u s , given the knowledge base:

Is a(Pengu~n. Bgrd) Attribute(Bird, Flies) Attribute(Penguin, ~Flies ) Instance of(Opus. Penguin) with the obvious informal interpretation, if we wanted to determine w h e t h e r Opus flies, then attributes associated w i t h penguinhood w o u l d be tried before those of the (superclass) birdhood. T h u s it w o u l d be concluded t h a t Opus doesn't fly. The (inconsistent) assertion that O p u s flies, which could potentially be derived f r o m the fact t h a t Opus is also a bird, s i m p l y isn't inferred. The difficulty w i t h such an approach of course is t h a t its semantics isn't at all clear. Given such a scheme, it is b y no m e a n s obvious j u s t w h a t can or cannot be inferred. The problems of recasting the issue of inheritance of such d e f a u l t properties however have been recently addressed in [Etherington a n d Reiter 83], using Reiter's default logic, and in [Touretsky 84]. W i t h respect to conventional logics, the source of difficulty with inconsistency can be traced to the so-called p a r a d o x e s o f i m p l i c a t i o n , such as

AD(BDA) w h i c h can be paraphrased as "anything implies a true proposition", or

(AA...,A) D B, "a contradiction implies anything". One w a y to eliminate these undesirables is to m o d i f y the axiom set and revise the notion of proof so t h a t a proof of B f r o m hypotheses Ai, A 2, . . . ,A n is w e l l - f o r m e d only if it

21

actually uses each hypothesis in some step. T h u s proofs are w e l l - f o r m e d , according to this proposal, o n l y if each h y p o t h e s i s is redevant to the conclusion of the proof. [Anderson & Belnap 75] provide a thorough s t u d y of such r e l e v a n c e logics. As mentioned earlier, Levesque's f o r m u l a t i o n of a situational semantics for belief uses similar ideas and ends u p w i t h a notion of e n t a i l m e n t t h a t is the same as one of the relevance logics. A novel proposal for treating inconsistency is described in [Borgida & Imielinski 84]. A knowledge base is treated as a collection of viewpoints held by m e m b e r s of a committee, w h e r e each viewpoint includes a consistent collection of facts. Derivability w i t h respect to the knowledge base is t h e n determined by m e a n s of a c o m m i t t e e decision rule. Some examples of alternative derivation rules are: KB ~L P if in each viewpoint V, V ~- p KB b-L p if in at least one v i e w p o i n t V, V ~- p a n d in no v i e w p o i n t V, V ~- -~/~ Note t h a t the first is a v e r y conservative definition of derivability.

Bot~ definitions allow conflicting

viewpoints a m o n g committee m e m b e r s w i t h o u t leading to contradictory knowledge bases. The proposal is s h o w n to be capable of handling a variety of nonmonotonic phenomena, including d e f a u l t rules and database updates. A n o t h e r approach to the problem of assimilation of inconsistent information is the previously-cited f o u r - v a l u e d logic described in [Belnap 75]. In this logic, besides having values for true and false, there are also values for u n k n o w n and inconsistent. So if a s y s t e m was told f r o m one source t h a t Student(John) w a s true, and later informed b y another source t h a t ~Student(John) was true, the s y s t e m could assign "inconsistent ~ as the t r u t h value of the statement. The approach then allows the explicit representation of, and hence ability to reason with, inconsistent information. T u r n i n g to the issue of accommodation, one w a y to resolve the problem, indeed eliminate it altogether, is to treat "suspect" f o r m u l a s (i.e. f o r m u t a s t h a t m i g h t be contradicted) as hypotheses. This point of view is adopted b y [Delgrande 85] where it is a s s u m e d that facts in a knowledge base are of three different kinds: ground atomic f o r m u l a s such as

Student(John) Supervisor(John, Mary) hypothesized general s t a t e m e n t s s u c h as "Elephants are hypothesized to be m a m m a l s " "An uncle is hypothesised to be a brother or h u s b a n d of a sibling of a parent" and a r b i t r a r y sentences p r e s u m e d to be beyond refutation.

Given this assumption, three issues are

explored: first, h o w to generate and m a i n t a i n the consistency a m o n g the hypothesised general statements, given ground atomic f o r m u l a s and other statements; second, h o w to f o r m a l l y prescribe the set of general s t a t e m e n t s t h a t m a y be hypothesised; a n d last, h o w such a hypothesis formation s y s t e m m i g h t interact w i t h a s y s t e m for deductively reasoning w i t h hypothesis and knowledge. The problem of f o r m i n g conjectures and maintaining consistency is treated less as a n inductive inference problem and m o r e as a deductive, consistency-restoration problem. Simplistic criteria are used to f o r m general h y p o t h e s e s on the basis of the ground facts; these criteria h o w e v e r are n o t strong enough to ensure that s t a n d a r d relations hold among hypotheses. However it is s h o w n h o w consistency m a y be deductively restored by the m e a n s of determining the t r u t h v a l u e of knowable b u t u n k n o w n ground

22

instances, and reapplying the simplistic criteria to the expanded set of ground instances. T h u s for example if it was k n o w n that instructors in some university department w i t h their Master's degree could supervise M,Sc. students, we might hypothesise that these groups are equivalent, say

H(x )[HasMSc(x) =- MScSup(x)].

(7)

In another department it m a y t u r n out that it is not inconsistent w i t h w h a t is known that supervisors of M.Sc. s t u d e n t s can also supervise Ph.D. students:

H(x)[MScSup(x) -~ PhDSup(x)].

(8)

However there is, as yet, no reason (i.e. common satisfying individuals) to conjecture the transitive equivalence

H(x)[HasMSc(x) = PhDSul~x)].

(9)

Clearly though if the individuals k n o w n to satisfy (7) were determined to satisfy PhDSup(x), then we w o u l d have reason to hypothesise (9),

If one of these individuals w a s determined to not satisfy

PhDSup(x), then this individual would also falsify (8), and so we would obtain H(x)[PhDSup(x) D MScSup(x)] H(x)[PhDSup(x) D HasMSc(x)] and so in any case consistency would be restored. Another approach to the issue of accommodation due to [Borgida 85] takes the view that general laws are useful and should be available in a knowledge representation framework, along w i t h a mechanism for accommodating exceptions. Consider, for example, a statement such as "Before admission to the hospital, a person m u s t present his hospital insurance number" This statement cannot be treated as a default rule because then it has no force. At the same time. it cannot be treated as a universally quantified constraint because it is obvious that it will be violated in individual cases (e.g., during the admission of a VIP to the hospital) as well as in whole classes of cases (e.g_ in emergency situations where the person being admitted is in no position to w o r r y about his hospital insurance number).

Borgida's proposal treats the introduction of an exception as a composite operation which

includes a modification of the general statement, Thus if John is admitted to the hospital under emergency conditions and it is decided to delay enforcement of the constraint specified above, the constraint is revised to read "Before admission to the hospital, a person m u s t present his hospital insurance n u m b e r or the person is John" Thus at any one time, a general f o r m u l a is thought of as having a given n u m b e r of exceptions which wel*e introduced in the knowledge base after permission was granted. One desirable feature of this mechanism is that reasoning can be done in first-order logic. In a s o m e w h a t different vein, truth rnaintenar~ce systems or reason maintenance systems have been proposed for revising sets of beliefs. In this approach the reasons or justifications for holding a particular belief are recorded along with the belief, If the belief is later found to be false, then the justifications can

23

be examined in an attempt to restore consistency. An early implementation of these ideas is [Doyle 79]. The notion of recording which formulas are relevant in deriving a proof sounds much like the approach taken in relevance logics and it would seem that such w o r k m a y be useful for belief revision systems. This indeed is the case, and recent w o r k by Martins and Shapiro, described in [Martins and Shapiro 84], uses a relevance logic of Anderson and Belnap as the basis of a formal f r a m e w o r k for a belief revision s y s tem.

3.4. I n a c c u r a c y A knowledge base m a y contain information which is assumed to be true when in fact it is not. For instance, we m a y have

Student(John) Supervisor(John, Mary) when actually John isn't a student (he was last year) a n d / o r he is not supervised by Mary (the data entry clerk make a typing error in entering this fact in the knowledge base). We call such a knowledge base inacc u r a t e since it does not provide an accurate description of the world being modelled. Inaccuracy, like incompleteness and inconsistency, has to be treated as a fact of life in any large knowledge base, either because external sources of information may be incorrect, or because unintended coding errors m a y occur in adding the information to the knowledge base, or because the knowledge base isn't updated property. It is important to note that inaccuracy is a different notion f r o m inconsistency. For example, if a knowledge base k n o w s that a person's age is between 0 and 120, and it is asserted that John's age is 134, when in fact it is 34, then the resultant knowledge base will be both inconsistent and inaccurate. But claiming that John's age was 43, when in fact it is 34, leads to an inaccurate but consistent knowledge base. What can be done about inaccuracy? Well, as with other features of knowledge, at the very least we would like to be able to (1)

talk about

it. We would like to be able to state, for instance,

"An entity asserted to be a student in the knowledge base is, in fact, a student in the domain of discourse",

which asserts that the knowledge base has accurate knowledge with respect to studenthood, or (2)

"A person's age as recorded in the knowledge base m a y be off by up to t w o years".

One w a y to achieve this is through the use of a modal operator, such as the operator K discussed earlier. Thus. to assert (1) we can write

(x)[KStudent(x) D

Student(x)]

while (2) can be asserted w i t h

(x)(y)[K(age(x)=y) D Irealage(x)--yf ~ Note that if we f u r t h e r assume that

2].

24

(3)

KaDa

then an inaccurate knowledge base m u s t also be inconsistent. T h u s (3) has the undesirable consequence that accuracy of the knowledge base is legislated, and so cannot be discussed, constrained, or asserted -- in contrast to (1). This point is discussed f u r t h e r in [Levesque 81, pp 2-5]. The careful reader will note t h a t there is nothing about the K operator that is specific to (in)accuracy. This operator s i m p l y allows us to talk about s t a t e m e n t s in the knowledge base, and this capability m a k e s it u s e f u l in the t r e a t m e n t of incompleteness, inaccuracy, and other features of encodings of knowledge. This suggests t h a t other m e c h a n i s m s w h i c h allow one to talk about s t a t e m e n t s in the knowledge base should also be suitable for talking about inaccuracy. This is indeed the case, a n d we'll discuss one such m e c h a n i s m due to [McCarthy 79], w h i c h provides capabilities comparable to those provided by the K operator, b u t in a first-order logical setting. M c C a r t h y ' s point of departure is to treat concepts such as t h a t of John or M a r y as objects in a firstorder theory, t h u s bringing t h e m into the domain of discourse. In order to relate a concept (e.g., John) to the entity denoted b y the concept (e.g.. t h e real person john) M c C a r t h y uses a d e n o t a t i o n f u n c t i o n denot so that

denot(John )=john. A s s u m i n g that s y m b o l s beginning w i t h a capital letter denote concepts, or f u n c t i o n s or predicates over concepts, while s y m b o l s beginning with lower case letters denote entities, or f u n c t i o n s or predicates over entities, in t h e domain of discourse, we can n o w write

(X)[Student(X) D student(denot(X))] to assert t h a t s t u d e n t concepts in the knowledge base denote s t u d e n t s in the domain of discourse, while

(X)(Y)[Age(X)=Y D Irealage(denot(X))-denot(Y) I ~ 2] asserts t h a t the age of a person stored in the knowledge base m a y be inaccurate by up to t w o years. A comparable approach is used in [Konolige 81] to describe the contents of a relational database. Here the role of the knowledge base is played by the relational database and Konolige uses a predicate DB which takes as a r g u m e n t s encodings of s t a t e m e n t s w i t h respect to the database, and r e t u r n s true or false depending on w h e t h e r the s t a t e m e n t is true or false with respect to the database. For example, if f is the encoding of the s t a t e m e n t

(t/SHIPR )[sname(t )=LAFAYETTE V length(t) > 300] then DB(f) is true if and only if e v e r y tuple in the relation SHIPR h a s s h a m e a t t r i b u t e equal to LAFAY~7"TE or its length attribute is greater t h a n 300. Of course, since DB represents t r u t h in t h e database, it has to s a t i s f y a x i o m s s u c h as

(/)[DB(~/) = ~DB(/)]

(f)(g)[DB(f A g) = DB(f) A DB(g)] etc. In addition, a denotation f u n c t i o n comparable to M c C a r t h y ' s is used to talk about the denotations of database terms. It is s h o w n t h a t this m a c h i n e r y is adequate for answering questions about the domain of

25

d i s c o u r ~ , given a database and a set of axioms that describe its semantics, and also for the expression of incompleteness in t h e database.

3.5. R e l a t i v i t y Yet another i m p o r t a n t feature of knowledge and belief is t h a t it is r e l a t i v e to an agent. Different agents have different, possibly inconsistent beliefs about a domain of discourse. Moreover, t h e y have beliefs a b o u t each others' knowledge and belief as well as their own. Consider for example t h e Wise M a n Puzzle, as given in [Konolige 82]: A king wishing to k n o w w h i c h of his three wise m e n is the wisest, paints white dots on each of their foreheads, tells t h e m t h a t at least one spot is white, and a s k s each to determine the colour of his o w n spot. A f t e r a while the wisest announces t h a t his spot is white, reasoning as follows: "Suppose m y spot were black. The second wisest of u s w o u l d then see a black and a w h i t e and w o u l d reason t h a t if his spot were black, the least wise w o u l d see t w o black spots and w o u l d conclude t h a t his spot is white on the basis of the king's assurance. He w o u l d have announced it by now, so m y spot m u s t be white. ') Konolige formalises the problem in a propositional m o d a l logic consisting of the propositional calculus, together w i t h a set Sp of modal operators, where for

SE Sp the

intended meaning of [S]~ is t h a t agent S

k n o w s a. If Pi is the proposition asserting t h a t t h e i ~h wise m a n h a s a w h i t e spot on his forehead, and abbreviates (1).

[Sip V [S]~p. then

~S~p

some of the initial conditions of the puzzle are:

p1Ap2Ap3

(2) [o](plvp2vpj) (3) [o](Irsl]~v2 ^ ~s~]p3 ^ ns21lv~ A lrs2~p3 A [~sj~p~ A ~sj)w ) The first a x i o m states the actual situation. The second states t h a t it is c o m m o n knowledge t h a t someone has a w h i t e spot on his forehead. The third axiom s a y s t h a t it is c o m m o n knowledge t h a t each can see the spots of the others. F u r t h e r axioms are used to f u l l y specify the problem, including a

circumscriptive

axiom stating t h a t S I has sufficient knowledge to solve the problem. F r o m this f o r m u l a t i o n it is proved t h a t $1 k n o w s the colour of his spot. Konolige's approach also altows a certain flexibility in representing nested beliefs and belief s y s t e m s [Konotige 84]. For example, John m a y h a v e one set of beliefs and deduction rules, while M a r y has another. John h o w e v e r also m a y have beliefs about M a r y ' s beliefs and rules of inference. Such belief s y s t e m s and belief s u b s y s t e m s m a y be of v a r y i n g power and capabilities. So if John is reasoning about M a r y ' s beliefs. his reasoning is "filtered )' t h r o u g h M a r y ' s

perceived beliefs

a n d deduction rules.

Fagin a n d his co-workers present a general model for reasoning about this sort of knowledge for a set of agents [Fagin et al 84], Instead of an extension to possible world semantics for an agent's belief as m i g h t be expected, t h e y present a model based on a notion of "knowledge levels ". Each level corresponds to an iteration of t h e ~knows ~ operator, or to a level of recta-knowledge. For example, a s s u m e t h a t level zero, where t h e d o m a i n itself is described, contains j u s t t h e sentence

Student( Bffl )

(10)

The first level gives each agent's knowledge about the domain. So perhaps John k n o w s that (10) is true

26

while M a r y does not:

K lohnStudent (Bill ) ",KM~Student(Bill) A -~Km~rj~Student(Bill). The second level contains each agent's knowledge about the other agent's knowledge about t h e domain. So M a r y m a y k n o w that John k n o w s the t r u t h value of (10), while John m a y not k n o w w h e t h e r M a r y k n o w s w h e t h e r (10) is true or false:

K Ma~y(K TohnStudent(Bill) V K joh~..Student ( BilY) ) ~K john(K-MaryStuderff(Bill) V K Mary-,Studerg(Bill) ). Since an agent's self-knowledge is a s s u m e d to be complete and accurate, we also obtain sentences such as

K johnKjoh~Student( Bitl ), stating t h a t John k n o w s t h a t he k n o w s t h a t Bill is a student. M c C a r t h y ' s proposal for treating knowledge, belief, and related concepts f r o m w i t h i n first-order logic, discussed in the previous section, also applies to reasoning about other agents' beliefs. So, Know(A,P) is a proposition meaning that agent A k n o w s the value of concept P, while

true K n o w ( A . P ) asserts the t r u t h of the proposition. T h u s we can assert that John k n o w s w h e t h e r M a r y k n o w s Bill's telephone n u m b e r by:

true Know(John, Know(Mary, Telephone Bill)). It m a y seem to the reader t h a t knowledge about other agents is s i m p l y a generalisation of selfknowledge. Surprisingly, this is not quite true. Consider an example (paraphrased f r o m [Levesque 81]) where the knowledge base is told that:

( 3 x ) Kn ow( John. "Teacher(x)") which asserts t h a t there is an individual k n o w n b y John to be a teacher. Replacing John b y the agent itself, we have ( 3 x ) K n o w ( K B , "Teacher(x)") w h i c h is either trivially true (if there is, in fact, a teacher in the knowledge base) or meaningless (if there i s n ' t a teacher in t h e knowledge base, t h e n one can't teR the knowledge base t h a t it k n o w s otherwise), The conclusion t h a t can be d r a w n f r o m this example is t h a t there are s t a t e m e n t s about an agent's knowledge w h i c h m a k e sense as long as t h e y are not s t a t e m e n t s of self-knowledge. A n o t h e r w a y in which knowledge a b o u t other agents is not a simple generalisation of self-knowledge is t h a t in some logics it is s u b s t a n t i a l l y m o r e complex to reason a b o u t a n u m b e r of agents, t h a t it is to reason about a single agent. This topic is addressed in [Halpern and Moses 85]. For logics based on $5

27

(roughly, w h e r e one k n o w s about one's o w n knowledge and ignorance) the problem of s h o w i n g whether a f o r m u l a is satisfiable belongs to complexity class NP for a single agent, b u t to the next complexity class PSPACE for m u l t i p l e agents. This m e a n s that it is quite likely that the problem of satisfiability for m u l t i ple agents is s u b s t a n t i a l l y more difficult t h a n for a single agent, S o m e w h a t surprisingly, the problem of satisfiability in the case of m u l t i p l e agents becomes yet more complex if operators for c o m m o n knowledge are added. T h a t is, operators E and C are added, w h e r e Ec~ is read as "everyone k n o w s a" a n d C ~ abbreviates E a A EE~ A -. - and is read as "c~ is c o m m o n knowledge ~. [Halperen and Moses 85] s h o w t h a t for a general set of logics the problem of satisfiability m o v e s f r o m PSPACE to deterministic exponential time EXP w h e n axioms for E and C are added. In s u m m a r y , adoption of a relativist viewpoint m e a n s that it is no longer possible to a s s u m e that every s t a t e m e n t about the domain of discourse is either true or false. Indeed, the notion t h a t there is a unique domain of discourse (~God's point of view", if y o u like) is abandoned in f a v o u r of a subjective reality. It is interesting that ~useful" knowledge bases developed so far ignore relativism and a s s u m e t h a t for a particular application one can construct an objective account b y piecing together personal viewpoints.

3.6. U n c e r t a i n t y The next feature of~knowledge we witt examine is concerned w i t h the degree of confidence an agent has in the t r u t h of a particular fact in its knowledge base. Each fact then has associated c e r t a i n t y i n f o r m a t i o n w h i c h indicates t h e degree of this confidence. The notions of "certainty" and "confidence" however have proved v e r y difficult to formalise and consequently m o s t of t h e m e a s u r e s that have been used for the degree of this confidence have been quantitative (rather t h a n qualitative). The basic idea behind such m e a s u r e s is to provide a function uric f r o m propositions to real n u m b e r s such that

uric(p)

indicates the certainty the KB has in the t r u t h of proposition p. Hence if p is more certain

than q then

unc( p ) >turic(q). A n y approach to providing such a m e a s u r e m u s t address two questions: first, h o w are the measures to be updated in the light of n e w evidence, and second, h o w does one choose among the v a r i o u s possibilities, given the propositions and certainty values? The traditional approach is probability theory which, until recently, has provided the best-developed m a t h e m a t i c a l f r a m e w o r k for dealing w i t h uncertainty. Let H be a finite set of propositions, closed u n d e r negation a n d conjunction, and a s s u m e t h a t ~ and I denote respectively the inconsistent and true propositions in the set II, A probability m e a s u r e P defined over II, intended to represent the c e r t a i n t y (or probability or plausibility or credibility) of a proposition, is a function f r o m H to [0,1] such t h a t

(1)

P(o) = o

(2)

P(I) = 1

(3)

P(pvq) -- P(p)+P(q)

if pAq =

Traditional probability theory h a s been perceived however as having a n u m b e r of difficulties. As we

28

shall see though, there is no general consensus as to the validity of these claims. One s u c h drawback is t h a t it is v e r y difficult in general to establish a P function for a particular set of propositions. A second perceived d r a w b a c k is t h a t the above f o r m u l a t i o n of u n c e r t a i n t y has t h e p r o p e r t y t h a t P(g)+P(~q)

(11)

= I

w h i c h m e a n s that w h a t e v e r u n c e r t a i n t y is missing w i t h respect to a proposition q m u s t be attached to its c o m p l e m e n t --~. It follows t h a t there is no room in this f r a m e w o r k for ignorance in the certainty of a proposition a n d its complement. T h u s , even if we k n o w nothing about John being or n o t being a millionaire, for q ~ J o h n is a millionaire,

we are forced to have (11) hold. A f u n d a m e n t a l issue that needs to be addressed in selecting an u n c e r t a i n t y function for a set of propositions is h o w the u n c e r t a i n t y function should be constrained by propositions t h a t are logically or probabilistically related. A solution to this issue is provided by Bayes's rule: P ( ~ ) X P ( n . n 2. . . . . Pff-II&,E2

.....

where t ' ( H I E 1. . . . .

E.)=

_V(E~. E2 . . . . .

n. In) E.)

E . ) is the conditional probability of H given E 1. . . . . E., i.e.. the probability t h a t H is

t r u e given t h a t E t . . . . . E n are all true. U n f o r t u n a t e l y . this f o r m u l a is h i g h l y impractical to use in a realistic setting because P(E1 . . . . .

E n) a n d P(EI . . . . .

E . IH) are u s u a l l y v e r y difficult to determine; moreover

the f o r m u l a leads to severe combinatorial problems. (Consider for example the n u m b e r of P vatues t h a t w o u l d have to be calculated, somehow, for n = 10 and each E taking two possible values.) In order to overcome these problems, two simplifying a s s u m p t i o n s are u s u a l l y made. Firstly, the events E i are a s s u m e d to be s t a t i s t i c a l l y i n d e p e n d e n t , in which case P(EI .....

E.) = P(E~) x

~n2) x

. ..

xP(E.).

This drastically reduces the n u m b e r of P values that needs to be estimated. U n f o r t u n a t e l y however the a s s u m p t i o n of statistical independence is u s u a l l y false. Secondly, it is a s s u m e d that statistical independence between the Ei continues to hold given H, i.e., P(E~ . . . . .

n . I H ) = P(E~ ~H) x - • - x P ( n . ~n).

A n appealing result of these simplifications is t h a t the conditional certainty in H , given i pieces of evidence, is a linear function of t h e certainty in H given (i--1) pieces of evidence and the certainty in the i t~ piece of evidence: P ( H Jn~ . . . . .

n.) = P(n l n.

. . . , n._~) x [ P ( n . IH) / P(n~)].

This m e t h o d of calculating u n c e r t a i n t y f o r m s the basis of the reasoning component for PROSPECTOR [Duda et al 78] and provides evidence t h a t simplifying a s s u m p t i o n s can sometimes be a positive step t o w a r d s building practical s y s t e m s . M a n y alternatives to the above f o r m u l a t i o n of u n c e r t a i n t y relax condition (3) so t h a t

29

(3") P(p)+P(-p) ~< 1. The Dempster-Shafer theory ([Dempster 68]) proposes such an alternative. Here the basic means for assigning and manipulating degrees of belief is a m a s s f u n c t i o n M which represents a basic probability assignment to all possible propositions. From this basic assignment, the s u p p o r t of a proposition p is defined b y sup(p) = ~'.M(q)

over {qDp}.

Thus the mass oLevery proposition that implies p contributes to the support of p. The p l a u s i b i l i t y of p is defined by:

pzs(v) = 1-s,,p(~p). It is easily shown that

sup(p)~pls(p).

The confidence in a proposition p then is defined as the interval

given by the support and plausibility:

con/(p) = [s"v(e), #~(v)]. Thus the proposition p denoting "John is a millionaire ~ and its negation might be assigned confidence [0, O] if nothing is known about the matter. If, on the other hand, it is certain that John is not a millionaire, then

conf(p) =

[0,1]

while ~onf(-~p) = [1,1].

The Dempster-Shafer theory also indicates how two individual mass functions (engendered perhaps from disparate evidence reports) can be combined to yield a single mass function. [Dubois & Prade 85] examine this rule for combining mass functions in typical situations. They show, for example, that suitably restricted, the rule is equivalent to that used in the MYCIN system [Shortliffe 76] for combining two nonconflicting pieces of evidence. They show also though that in some cases the rule m a y be quite sensitive to small changes in the values of the evidence reports: in some cases a change f r o m a claim that p is impossible (i.e.

conf(p)= [0,1])

to one that p is highly improbable (i.e. perhaps

conf(p)= L0001,

.9999]) can yield

quite different results w h e n combined w i t h other evidence reports. This problem (if indeed it is a problem at a11) however is one common to all such approaches, including traditional probability theory. The theory has also been extended to allow for the calculation of the confidence of togical combinations of propositions [Lowrence 82]. For example, if ~on[(p) =

[o,pz~(q)]

co,~Rq) -

[me(r), 1]

pDq then

Such rules offer an alternative to the Bayes" rule and essentially replace statistical dependence concerns with logical dependence ones among the evidence and the conceivable hypotheses for a given setting. A difficulty with the Dempster-Shafer approach is that, unlike Bayesian analysis, it is not clear how one m a y arrive at a decision about which proposition to hold [Thompson 85].

While support and

30

plausibility v a l u e s m a y be used as bounds on the probability of a statement, there is no accepted mechanism for selecting a m o n g propositions that m a y have overlapping intervals or intervals differing in size. However here, as elsewhere, there is no clear agreement as to w h i c h of these approaches is preferable. or even if recent proposals are tending in the right direction. So while there has been m u c h recent activity and interest in approaches such as Dempster-Shafer, there is still m u c h interest in investigating more traditional avenues. [Cheesman 85] for example presents a forceful and spirited defense of standard, classical probability theory. In fact, he claims that m u c h of the present work in AI, including default and n o n monotonic reasoning s y s t e m s , f u z z y logics, and the inferential apparatus of m a n y expert s y s t e m s , is f o u n d e d on misinterpretations of probability theory. T h a t is, the crucial concept is t h a t of the c o n d i t i o n a l p r o b a b i l i t y of a proposition, w h i c h is the m e a s u r e of an e n t i t y ' s belief in t h a t proposition, given a particular set of evidence. This notion differs s h a r p l y f r o m the c o m m o n l y held f r e q u e n c y d e f i n i t i o n of probability, as being the ratio of the n u m b e r of occurrences in w h i c h an event is t r u e to t h e total n u m b e r of such occurrences, With

respect to uncertainty, Cheesman argues t h a t "extended" approaches, such as those of

Dempster-Shafer or Lowrence, can be handled classically. A n example cited is t h a t of a box t h a t s i m p l y o u t p u t s a string of decimal digits, and where we are asked the probability t h a t the n e x t n u m b e r is 7. W i t h no f u r t h e r information we w o u l d s a y .1. If we s u b s e q u e n t l y examined 1,000,000 digits, and 100,000 were 7 and in no apparent order, we w o u l d still s a y .1. The difference of course is t h a t we w o u l d be m u c h more confident in the second prediction. However to deal w i t h such notions of confidence, extended notions such as significance, plausibility, etc. aren't necessarily required; rather the changed expectation can be captured s i m p l y as the s t a n d a r d deviation or, m o r e generally, w i t h a probability d e n s i t y function. Finally, some AI s y s t e m s , n o t a b l y expert s y s t e m s , e m p l o y techniques for updating certainties t h a t differ significantly f r o m those just described. The major difference is that such s y s t e m s can use and take advantage of large quantities of domain-specific information. A good example is given in [Tsotsos 81]. Here the idea is t h a t certainty information is attached to complex objects, called h y p o t h e , se~. Hypotheses are arranged b y m e a n s of their c o n c e p t u a l a d j a c e n c y .

In updating certainties, a particular hypotheses

will be supported by some hypothesis while others witl conflict w i t h it. T h u s for example, the hypothesis that )'The object u n d e r consideration is a station wagon" s u p p o r t s the hypothesis t h a t "The object under consideration is a car", b u t conflicts w i t h "The object under consideration is a bicycle". C e r t a i n t y values are updated s i m u l t a n e o u s l y using r e l a x a t i o n l a b e l l i n g [Zucker 78]. If a hypothesis has a high certainty value, then it will tend to increase t h e certainty value of those t h a t it supports, and decrease t h e certainty value of those t h a t it conflicts w i t h . To provide for a damped convergence to a solution, techniques f r o m control t h e o r y are also employed. W h i l e this approach provides a feasible m e a n s of dealing w i t h a collecT;ion of certainty values, as mentioned, it does require a pr/ari knowledge of the d o m a i n u n d e r consideration. A related approach is also described in [Khan and Jain 85].

31

3,7, I m p r e c i s i o n Apart from uncertainty in the truth value of a proposition, there is also the issue of the contents of a proposition being imprecise. For instance, asserting that "John was born in 1956" is imprecise in that we are not told exactly when in 1956 John was born. Likewise~ "John is very young" "Most Swedes are blond" "George is bald* are imprecise w i t h respect to the age of John, the proportion of blond Swedes [Prade 85], and the degree of George's baldness. It is important to emphasize that imprecision and uncertainty are orthogonal notions, We m a y be absolutely certain that John is young but only have imprecise information about how young he is. Conversely, we m a y be uncertain about very precise propositions such as "the area of a circle is 7r times the square of its radius". A popular w a y of dealing with imprecision involves the notion of f u z z y sets [Zadeh 75]. These are sets defined by a membership function # which ranges over the full interval [0, 1] instead of being just binary. A proposition of the form ~X is A" (e.g., *George is bald") is thought as describing X's membership in a fuzzy set SA. For example, the fuzzy set

SBALDm a y have a membership function/ZSAZD such that l~BAz~(George) = 0.9 tZBALD(Mary) = 0,05.

In [Zadeh 83], this simple account is extended to show how one can represent the meaning of statements such as "Most Swedes are blond", given a (fuzzy) world which includes information on the hair colour of Swedes, the nature of blondness as a function of hair colour, and the ratio of true instances of "Swedes are blond" which would satisfy the quantifier "Most". Zadeh calls his method t e s t score s e m a n t i c s and argues that it constitutes a generalization of other types of semantics such as the Tarskian and possible worlds semantics discussed earlier. In addition to fuzzy sets, probability functions (or probability distributions) can also be used to represent imprecision. For instance, we can think of "John is very young" as defining a probability function, ~r, for the age of John. Then if SAGe is the set of all age values, It(s) specifies the probability of John's age being s. Presumably, ~r m u s t assign larger values to younger ages and in addition, ~ r ( s ) = 1 for all seSAGg. [Prade 85] provides a thorough account of the use of this machinery for the representation of imprecision. The methods of fuzzy sets have been extended in a number of directions and attempts have been made to apply them in other ways in representing common sense knowledge. Despite these attempts, there doesn't

32

appear strong s u p p o r t for the use of f u z z y se~s in the representation of a n y t h i n g b u t imprecision in measure spaces [Hayes 79]. [Osherson & Smith 81] also is a critique of accepted views of imprecision as they bear on intuitions concerned w i t h the combination of concepts to f o r m complex concepts.

4. Conclusions Clearly there is no single, complete set of features of knowledge. In this paper we have attempted to identify some features t h a t are of interest to researchers in Knowledge Representation and to sketch some of the approaches that have been used to formalize and s t u d y them. There are other features one m a y w a n t to examine. W e k n o w . for example, t h a t i n f o r m a l specifications, including c o m m e n t s attached to a program, graphical sketches of the overall s t r u c t u r e of a s y s t e m , and n a t u r a l language accounts of requirem e n t s for a piece of software, are all t i m e - h o n o u r e d and accepted practices for representing knowledge about a program. W o u l d we have a more p o w e r f u l knowledge representational f r a m e w o r k if it could h a n dle i n f o r m a l i t y ? Likewise, we m a y w a n t to be able to talk about the s i g n i f i c a n c e or i n s i g n i f i c a n c e of an item in a knowledge base, or its r e l e v a n c e or i r r e l e v a n c e to the knowledge base. The reader m a y w a n t to add his o w n list of features of knowledge to w h a t has been presented or mentioned so far. If there is a c o m m o n direction or t h e m e to the w o r k reviewed here, it is the continuing concern w i t h formality. This is indicated by the emphasis on f o r m a l logics, both as a tool for representing knowledge and as a tool for t h e analysis of knowledge. W h i l e the paper its'elf has emphasised f o r m a l approaches, it no~aetheless appears that m a n y researchers are concerned with investigating the f u n d a m e n t a l , foundational properties of knowledge. This certainly is to be expected, given that m a n y of the issues are only n o w beginning to be f u l l y understood and explored in At. It is interesting t h a t in the discussion of m e c h a n i s m s for handling the different features of knowledge, we t u r n e d several times to the same m e c h a n i s m s for help. Nonmonotonic reasoning, modal operators, and the availability of a metalanguage that allows one to treat propositions in the knowledge base as entities within the domain of discourse, are three such mechanisms. It is significant t h a t b y and large there has been little interest in embedding such m e c h a n i s m s in a representational f r a m e w o r k , b u t u n d e r s t a n d a b l y so since these m e c h a n i s m s are still u n d e r development. This paper provides an a d m i t t e d l y brief and subjective overview of some issues concerning the nature of knowledge, There is no claim t h a t either the list of issues or the list of references given for each one is exhaustive. We do hope however, t h a t we have helped the reader with background in C o m p u t e r Science b u t little in Artificial Intelligence appreciate some of the deeper issues t h a t need to be addressed if one is to call the i n f o r m a t i o n handled by his s y s t e m "knowledge" a n d the data s t r u c t u r e s storing this information "knowledge bases".

References

American Association for Artificial Intelligence, Non-Monotonic York, Oct. 1984

Reasoning Worksho2, New Paltz, New

A.R. Anderson and N.D. Belnap Jr., Entailment: The Logic of Relevance and Necessity, VoL I, Princeton U n i v e r s i t y Press, 1975.

33

D. Angluin and C.H. Smith, "A Survey of Inductive Inference: Theory and Methods". Technical Report 250, Department of Computer Science, Yale University, 1982. A. Barr and J. Davidson, "Representation of Knowledge", Stanford Heuristic Programming Project, Memo HPP-80-3, Stanford University, 1980. N.D. Belnap, "A Useful Four-Valued Logic" in Modern Uses of Multiple-Valued Logic, J.M. Dunn and G. Epstein eds., D. Reidel Pub. Co., 1975. W. Bibet, "First-Order Reasoning About Knowledge and Belief", ATP-21-IX-83, Technical University of Munich, 1983. A. Borgida and T. Imielinski, "Decision Making in Committees -- A Framework for Dealing with Inconsistency and Non-Monotonicity", Workshop on Non-Monotonio Reasoning, New Paltz, 1984. A. Borgida, "Language Features For Flexible Handling Of Exceptions In Information Systems", Transactions on Database Systems, to appear. R.3. Brachman. "On the Epistemological Status of Semantic Networks", in Associative Networks: Representation and Use of Knowledge by Computers, N.V. Findler (ed.), Academic Press. 1979, pp 3-50. R.J. Brachman and H.J. Levesque, "Competence in Knowledge Representation" Proc. AAA1-82, Pittsburgh, 1982, pp 189-192. R.J. Brachman and H.J. Levesque, "The Tractability of Subsumption in Frame-Based Description Languages" Proc. AAA1-84, Austin, 1984, pp 34-37. R.J. Brachman and H.J. Levesque (eds.), Readings in Knowledge Representation, Morgan Kaufmann Publishers, Inc., 1985 R.3. Brachman and B.C. Smith (eds.), Special Issue on Knowledge Representation, SIGART Newsletter No. 70, Feb. 1980. P. Cheeseman, "In Defense of Probability", Proc. IJCAI-85, Los Angeles, 1985, pp 1002-1009 J.P. Delgrande, "A Foundational Approach to Conjecture and Knowledge in Knowledge Bases". Ph.D. Thesis. Department of Computer Science, University of Toronto, 1985. A. P. Dempster, "A Generalization Of Bayesian Inference", Journal of the Royal Statistical Society, Vol. 30. pp 205-247. 1968. J. Doyle, "A Truth Maintenance System", Artificial Intelligence 12, 1979, pp 231-272. J. Doyle and P. London, "A Selected Descriptor-Indexed Bibliography to the Literature on Belief Revision". SIGART Newsletter #71, Apr. 1980, pp 7-23. F.I. Dretske. Knowledge and the Flow of Information, Bradford Books, the MIT Press, 1981. D. Dubois and H. Prade. "Combination and Propagation of Uncertainty with Belief Functions", Proc. 1JCA1-85, Los Angeles. 1985, pp 111-113 R.O. Duda, P.E. Hart, N.J. Nilsson, and G.L. Sutherland, "Semantic Network Representations in Rule-Based Inference Systems", in Pattern-Directed Inference Systems, D.A. Waterman and F. Hayes-Roth eds.. Academic Press, 1978. D.W. Etherington, R.E. Mercer, and R. Reiter, "On the Adequacy of Predicate Circumscription for ClosedWorld Reasoning". Computational Intelligence, Vol. 1, No. 1, 1985. pp 11-15. D.W. Etherington and R. Reiter, "On Inheritance Hierarchies with Exceptions", Proc. AAAI-83, 1983, pp 104-108. R. Fagin. J.Y. Halpern, and M.Y, Vardi, "A Model-Theoretic Analysis of Knowledge: Preliminary Report". Proceedings of the Twenty-Fifth IEEE Symposium on Foundations of Computer Science, Florida, 1984.

34

S.E. Fahlman, NETL: A System for Representing and Using Real-World Knowledge, MIT Press, 1979. I. Gotdstein and S. Papert, "Artificial Intelligence, Language, and the Study of Knowledge", Cognitive Science, Vol. 1, No. 1, 1977. N. Goodman, Fact, Fiction and Forecast, 3rd ed., Hackett Publishing Co., 1979. R.F. Hadley, "Two Solutions to Logical Omniscience: A Critique with an Alternative", TR 85-21, School of Computing Science, Simon Fraser University, B.C., 1985 J.Y. Ilalpern and Y.O. Moses, "A Guide to the Modal Logics of Knowledge and Belief: Preliminary Draft". Proc IJCAI-85, Los Angeles, 1985. P. J. Hayes, "Some problems and Non-Problems in Representation Theory', l~oceedings AISB Summer Conference, 1974, pp 63-79. P.J. Hayes. "In Defense of Logic", Proc. 1JCA1-77, Cambridge, 1977. pp 559-565. P. J. Hayes, "The Naive Physics Manifesto", Machine lntelligenoe 9, D. Michie (ed,), Edinburgh University Press, 1979, pp 243-270. C. Hewitt, "PLANNER: A Language for Proving Theorems in Robots", Proceedings IJCA1-71, London, 1971. J. Hintikka, Knowledge and Belief: An Introduction to the Logic of the Two Notions, Cornell University Press, 1962. G.E. Hughes and M.J. Cresswell, An Introduction to Modal Logic, Methuen and Co., 1968. D.J. Israel and R.J. Brachman, "Distinctions and Confusions: A Catalogue Raisonne", Proceedings of the Seventh International Conference on Artificial lntelligenoe, Vancouver, B.C., 1981, pp 252-259. N.A. Khan and R. Jain, "Uncertainty Management in a Distributed Knowledge Based System", Proc. 1JCA1-85, Los Angeles, 1985, pp 318-320 K. Konolige, "A Metalanguage Representation of Relational Databases for Deductive Question-Answering Systems", Proceedings of the Seventh International Conference on Artificial Intelligence, Vancouver, B.C., 1981, pp 496-503. K. Konolige, "Circumscriptive Ignorance", Proc. AAAI-82, Pittsburgh, 1982 K. Konolige, "A Deductive Model of Belief", Ph.D. Thesis, Department of Computer Science. Stanford University, 1984. B. Kramer and J. Mylopoulos. '*Knowledge Representation: Knowledge Organization". to appear. G. Lakemeyer, Internal Memo, Department of Computer Science, University of Toronto, 1984. H.J. Levesque, "A Formal Treatment of Incomplete Knowledge Bases", Ph.D. thesis. Department of Computer Science, University of Toronto, 1981. H.L Levesque, "A Logic of Implicit and Explicit Belief~, Proc. AAA1-84, Austin, 1984. D. Lewis, Counterfactuals, Harvard University Press, 1973. J, D. Lowrence, "Dependency-Graph Models of Evidence Support". COINS technical report 82-26. University of Massachusetts at Amherst, 1982. G. McCalla and N. Cercone (eds.), 1EBB Comtmter (Special lssue on Knowledge Representation) Vol. 16, No. 10, October 1983. J. McCarthy, "First Order Theories of Individual Concepts and Propositions". in Machine Intelligence 9, D. Michie (ed.), Edinburgh University Press, 1979, pp 129-147.

35

J. McCarthy, "Circumscription -- A Form of Non-Monotonic Reasoning", Artificial inteUigence 13, pp 2739, 1980. J. McCarthy, "Applications of Circumscription to Formalizing Common Sense Knowledge", Non-Monotonic Reasoning Workshop, New Paltz, New York. 1984, pp 295-324, J. McCarthy and P.J. Hayes, "Some Philosophical Problems from the Standpoint of Artificial Intelligence", in Machine Intelligence 4, D. Michie and B. Meltzer (eds.), Edinburgh University Press, 1969, pp 463-502. J.P. Martins and S.C. Shapiro, "A Model for Belief Revision", Non-Monotonic Reasoning Workshop, New Paltz, 1984, D. McDermott, "The Last Survey of Representation of Knowledge", Proceedings AISB/GI Conference, 1978, 206-221. D. McDermott, "Monmonotonic Logic II: Nonmonotonic Modal Theories" JACM 29, 1, 1982, pp 33-57 D. McDermott and J. Doyle, "Non-Monotonic Logic I", Artificial Intelligence 13, 1980, pp 41-72 M. Minsky, "A Framework for Representing Knowledge" in The Psychology of Computer Vision, P.H. Winston (ed.), McGraw-Hill, 1975, pp 211-277. R. Montague, Formal Philosophy, Yale University Press, 1974. R.C. Moore, "Reasoning About Knowledge and Action", Technical Note 284, Artificial Intelligence Centre, SRI International, 1980. R.C. Moore, "Semantical Considerations on Nonmonotonic Logic", Proc. IJCA1-83, Karlsruhe, 1983, pp 272-279. R.C. Moore and G. Hendrix, "Computational Models of Beliefs and the Semantics of Belief-Sentences", Technical Note 187, SRI International, Menlo Park, 1979. J. Mylopoulos and H.J. Levesque, "An Overview of Knowledge Representation" in On Conceptual Modelling, M.L. Brodie, J. Mylopoulos, and J.W, Schmidt (eds.), Springer-Verlag, 1984. A. Newell, "The Knowledge Level", A1 Magazine 2(2), 1981, pp 1-20. D.N. Osherson and E.E. Smith, "On the Adequacy of Prototype Theory as a Theory of Concepts", Cognition 9, 1981, pp 35-58. M. Papalaskaris and A. Bundy, "Topics for Circumscription", Non-Monotonic Reasoning Workshop, New Paltz, New York, 1984, pp 355-362. H.E. Pople, "On the Mechanisation of Abductive Logic", Proceedings of the Third International Conference on Artificial Intelligence, Stanford, Ca., 1973, pp 147-152. H. Prade, "A Computational Approach to Approximate and Plausible Reasoning with Applications to Expert Systems", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 7, No. 3, May 1985. H. Putnam, "Is Semantics Possible?" in Mind, Language and Reality: Philosophical Papers Volume H, Cambridge University Press, 1975, pp 215-271. H. Putnam, "The "Corroboration' of Theories", in Mathematics, Matter, and Method: Philosophical Papers Volume 1, 2rid ed., Cambridge University Press, 1979, pp 250-269. W.V.O. Quine and J.S. Ullian, The Web of Belief, 2rid ed., Random House, 1978. R. Reiter, "On Closed World Data Bases", in Logic and Databases, H. Gallaire and J. Minker eds., Plenum Press, 1978. R. Reiter, "A Logic for Default Reasoning", Artificial Intelligence 13, 1980, pp 81-132,

36 E. Rosch, "Principles of Categorisation" in Cognition and Categorisation, E. Rosch and B.B. Lloyds eds., Lawrence Erlbaum Associates, 1978. I. Scheffler. The Anatomy of lnquiry: Philosophical Studies ~n the Theory of Science, Hackett Publishing Co.. 1981. S.P. Schwartz (ed,), Naming, Necessity, and Natural Kinds, Cornell University Press, 1977. E.Y. Shapiro, "Inductive Inference of Theories from Facts", Research Report 192, Department of Computer Science, Yale University, 1981. S. Shapiro and R. Bechtel, "The Logic of Semantic Networks", TR-47, Department of Computer Science, Indiana University, 1976. E.H. Shortliffe, Computer-Based Medical Consultation: MYCIN, American Elsevier, 1976. T.R. Thompson, "Parallel Formulation of Evidential-Reasoning Theories" Proc, 1JCAI-85, Los Angeles, 1985, pp 321-327 D.S. Touretzky, "Implicit Ordering of Defaults in Inheritance Systems" Proc. AAA1-84, Austin, Texas, 1984, pp 322-325. J.K. Tsotsos, "Temporal Event Recognition: An Application to Left Ventriculat Performance", Proc. 1JCA1-

81, Vancouver, 1981, pp 900-905 Y. Vassiliou, "A Formal Treatment of Imperfect Information in Database Management", Ph.D. Thesis, Department of Computer Science, University of Toronto, 1980. W.A. Woods, "What's in a Link: Foundations for Semantic Networks" in Representation and Understand-

ing, D.G. Bobrow and A. Collins eds., Academic Press, 1975. L.A. Zadeh, "Fuzzy Logic and Approximate Reasoning", Synthese 30, 1975, pp 407-428. L.A. Zadeh, "Commonsense Knowledge Representation Based on Fuzzy Logic", 1EEE Computer, Vol. 16, No. I0, October 1983.61-66. S.W. Zucker, "Production Systems with Feedback", in Pattern-Directed Inference Systems, D,A. Waterman and F. Hayes-Roth eds., Academic Press, 1978.

PART TWO

Knowledge Processing

D e d u c t i o n a n d Computation Gdrard Huet t N R I A and C M U

We present in a unified framework the basic syntactic notions of deduction and computation.

1 1.1

T e r m s a n d types General notations

We assume known elementary set theory and algebra. )4 is the set {0, 1, ...} of natural numbers, )/+ the set of positive natural numbers. We shall identify the natural n with the set {0 ..... n - 1}, and thus 0 is also the empty set 0. Every finite set S is isomorphic to n, with n the cardinal of S, denoted n = ISI. If A and B are sets, we write A --+ B, or sometimes B A, for the set of functions with domain A and codomain B.

1.2

Languages, concrete syntax

Let E be a finite alphabet. A string u of length n is a function in n ~ E. The set of all strings over ~ is hE3/ We write ]u[ for the length n of u. We write ui for u ( i - 1), when i < n. The null string, unique element of E 0 is denoted h. The unit string mapping 1 to a E E is denoted 'a'. The concatenation of strings u and v, defined in the usual fashion, is denoted u " v, and when there is no ambiguity we write e.g. 'abc' for ~a' " 'b' " 'c'. When u E E* and a E E, we write u . a for u ~ 'a'. We define an ordering < on E*, called the p r e f i x ordering, by u
¢~ 3 w

v = u'w

If u < v, the residual w is unique, and we write w = v / u . We say that occurrences u and v are disjoint, and we write u l v , iff u and v are unrelated by the partial ordering <_. FinMly we let u < v iff u < v with u ¢ v. The set ~* has the structure of a mono'fd, that is: A88:

- ,)

-

=

-

5dL : h'u

= u

5dR:

= u

u'h

(,,

-

Actually, ~* is the free mono'/d generated by ~E.

Examples. 1. ~] -- 0. We get ~* = 1. 2. ~ -- 1. We get ~.* = ~/. Strings are here natural numbers in unary notation, and concatenation corresponds to addition.

40

3. E = 2 = {0, 1} (the Booleans). The set E* is the set of all binary words. 4. ~] = N+. We call the elements of ~* occurrences. When u = w . m and v = w . n, with m < n, we say that u is Ieft of v, and" write u
Terms:

abstract

syntax

We first define a tree domain as a subset D of )4_~ closed under < and
uED uED

A v
~

vED

=¢~ r E D .

We say t h a t M is a :E-tree iff M E D --* ~2, for some tree domain D. We write D = D ( M ) , and we say t h a t D is the set of occurrences in M . M is said to be finite whenever D is. We shall now use occurrences to designate nodes of a tree, and the subtree starting at t h a t node. If u E D ( M ) , we define the ~-tree M / u as mapping occurrence v to M ( u " v). We say t h a t M / u is the sub-tree of M at occurrence u. tf N is also a El-tree, we define the graft M [u *-- N ] as the ~-tree mapping v ' t o N ( w ) whenever v = u " w with w E D ( N ) , and to M(v) if v E D ( M ) and not u < v. We need one auxiliary notion~ t h a t of width of a tree. If M E E*, we define the (top) width of Mas IIMtl = m a x { n ] ' n ' E DCM)} We shall now consider ~ a graded alphabet, that is given with an arity function a in E --* ~/. We then say t h a t M is a E - t e r m iff M is a E-tree verifying the supplementary consistency condition:

IlM/ul[

Vu e DCM)

= aCMCu))

T h a t is, every subtree of M is of the form F(MI,M2,...,Mn), with n = a(F). We write T ( E ) for the set of ~-terms. If M1,M2,...Mn 6. T(D) and F e E, with a(F) = n, then M = F(MI, M2, ...Mn) is easily defined as a E-term. This gives T(E) the structure of a ~-algebra. Since conversely the decomposition of M is uniquely determined, we call T(~3) the completely free I2-algebra.

Example W i t h E = {+, S, 0}, a ( + ) = 2, a ( S ) = 1, a(0) = 0, the following structure represents a X]-term:

/\ +

0

/\

÷

S

t

8

8

0

0

The following proposition is easy to prove by induction. All occurrences are supposed to be univcrsally quantified in the relevant tree domain.

41

Proposition 1. Embedding:

M [ u ,-- N ] / ( u ~ v) -= N / v

Assoeiativity : M [ u ~-- N] [u ~ v ~-- P] = M [ u *-- N [ v ~-- P ] ] ' Persistence: M [ u ~-- g ] / v C o m m u t a t i v i t y : M [ u ~- N] [v ~-- P] D i e t r i b u t i v i t y : M [ u ~-- N ] / v

= M/v

(ulv)

= M [ v ~- P] [u ~-- N]

= ( M / v ) [ u / v ~- N]

Dominance : M [u ~-- N] Iv ~-- P] -- M [v ~- P]

(u]v)

(v < u) (v < u)

We define the length IM[ of a (finite) term M recursively by: IF(MI,...,M2)] = 1 + ~=IIM~]

1.4

Parsing

It is well-known that the term in the example above can be represented unambiguously as a T-string, for instance in prefix polish notation, that is here: + + OSOSSO. This result is not very interesting: such strings are neither good notations for humans, nor good representations for computers, since the graft operation necessitates unnecessary copying. We shall discuss later good machine representations, using binary graphs. As far as human readibility is concerned, we assume known parsing techniques. This permits to represent terms, on an extended alphabet with parentheses and commas, which is closer to standard mathematical practice. Also, infix notation and indentation permit to keep in the string some of the tree structure more apparent. We shall not make explicit the exact representation grammar, anal allow ourselves to write freely for instance (0 ÷ S(0)) + S(S(O)). Note that we avoid explicit quotes as well, which permits us to mix freely recta-variables with object structures, like in S ( M ) , where M is a recta-variable denoting a ~-term.

1.5

Terms with variables, substltution

The idea is to internalize the notation S ( M ) above as a term S(x) over an extended alphabet containing special symbols of arity 0 called variables. Let V be a denumerable set disjoint from ~. We define the set of terms with variables, T(~., V), in exactly the same way as T(~. U V), extending the arity function so that a(x) =- 0 for every z in V. The only difference between the variables and the constants (symbol of arity 0) is that a constant has an existential import: it denotes a value i n the domain we are modelling with our term language, whereas ~ variable denotes a term. The difference is important only when there are no constants in E, since then T(E) is empty. All of the notions defined for terms extend to terms with variables. We define the set N ( M ) of variables occurring in M as-

V(M)

= {x e V I 3u e D ( M )

M(u) = x}

and we define the number of distinct variables in M as P ( M ) = tV(M)I. We shall now formalize the notion of substitution of terms for variables in a term containing variables. From now on, the sets ~ and V are fixed, aald we use T to denote T(~, V). A subs~itn$ion a is a function in V --~ T, identity almost everywhere. That is, thc set D(a) = {x E V I a(x) ¢ x} is finite. We call it the domain of or. Substitutions are extended to nmrphisms over T by

a(F(M1, ..., M,.,))

--

F(a(M1), ...,a(M~))

42

Bijective substitutions are called pe1"mutntions. When U C_ V, we writc cru for the restriction of substitution a to U. It is easy to show that, for all a, M and U:

V(M) C_ U =~ a(M) = av(M). Alternatively, we can define the replacement M [x ~- N ] as M [ul ~ N ] ... [un ~ N ] where {ul,...,un}

= {~lM(u)

= x} and then

a(M) = M [ x ~ a(x) l x e V(M)] with an obvious notation. We now define the quasi-ordering < of matching in T by:

M <_N

¢~ 3a

Y=a(M)

It is easy to show that if such a a exists, o'V(M) is unique. We shall call it the match of N by M, a n d denote it by N / M . We define M =- N ¢~ M < N A N < M. When M = N , we say that M and N are isomorphic. This is equivalent to say t h a t M = a(N) for some p e r m u t a t i o n a. Note t h a t M - .~/ implies [M[ = ]N[. Finally, we define

M>N

¢~ N < M

A -~M
P r o p o s i t i o n . > is a well-ordering on T. Proof. We show t h a t M > N implies # ( M ) > # ( N ) , with #(M) = IMI - v(M). Let p be any bijection between T x T and V. We define a binary operation f7 in T by:

F(M1, ..., Mn) N F(NI, ..., Nn) = F(M1 n N1, ..., Mn n N,) MAN

= 9~(M,N)

in all other cases.

M N N is uniquely determined from p and, for distinct ~'s, is unique up to - .

Proposition.

M V1N is a g.l.b, of M and N under the m a t c h quasi-ordering.

Let T be the quotient set T~ - , completed with a m a x i m u m element T. From the propositions above we conclude: Theorem.

T is a complete lattice.

C o r o l l a r y . If two terms M and N h a v e an upper bound, i.e. a c o m m o n instance a(M) = a t ( N ) , they have a l.u.b. M (3 N , which is a most general such instance: a = a0; v, a ~ = a~; T. The t e r m M U N is unique modulo -= and may be foumt by the unification algorithm.

Proposition. D(a(M)) = D(M) U

U

{u ^ v l v e D ( a ( M ( u ) ) ) }

{ulMCu)eV} Vu E D(M) M(u) E V ~

a ( M ) / u " v = a(M(u))/v (v e D(a(M(u))))

M(u) e ~ ~

a ( M ) / u = a(M/u)

43

1.6

Graph representations~ dags

We represent trees by binary graphs of adr pairs. An adr consists in one tag bit, and one byte field interpreted either as an address in the graph memory, or as a natural number. In this last case, the natural 0 is reserved for nil, the empty list of trees. Symbols from ~ are coded as positive naturals. If tree M is represented at the graph address adrl and the hst L is represented at address adr2, then the list M . L is represented by the graph node (M . L). Finally, the tree F(L) is represented

by (F • L). This is the standard .way of representing trees and lists in the language LISP. A precise description of the memory allocation implementation of such schemes is beyond the scope of these notes. Terms are of course represented as trees. A global table holds the arity function. There are several possibilities for the representation of variables. They may be represented as symbols. But then the scope structure must be computed by an algorithm, rather than being implicit in the structure. Also a global scanning of the term is necessary to determine its set of variables, and substitution involves copying of the substituted term. For these reasons, variables are often represented rather as integer offsets in stacks of bindings. Such "structure sharing" representations are now standard for PROLOG implementations. A precise account of the various representations schemes for term structures, and of the accompanying algorithms, is out of the scope of these notes. It should be born in mind that the crucial problem is memory utilization: the trade-off between copying and sharing is often the deciding factor for an implementation. Languages with garbage-collected structures, such as LISP, are ideal for programming "quick and dirty" prototypes. But serious implementation efforts should aim at good algorithmic performance on realistic size applications. The crucial algorithms in formula and proof manipulation are matching, unification, substitution and grafting. First-order unification has been speciMly well studied. A linear algorithm is known [122], but in practice quasi-linear algorithms based on congruence classes operations are preferred [99,100]. Furthermore, these algorithms extend without modification to unification of infinite rational terms represented by finite graphs [64]. Implementation methods may be partitioned into two families. Some depend on logical properties (e.g. sharing subterms in dags arising from substitution to a term containing several occurrences of the same variable). Some are purely statistical (e.g. sharing structures globally through hash-coding techniques). Particular applications require a careful analysis of the optimal trade-off between logical and statistical techniques. There is no comprehensive survey on implementation issues. Some partial aspects are described in [8,140,101,99,163,158,115,40,1,32,42,19,45,144,159].

2

Inference

rules

We shall now study inference systems, defined by inference rules. The general form of an hlference rule is: I R P1 P2 ... P,

q

where the Pi's and Q are propositions belonging to some formal language. We shall here regard these propositions as types, and the inference rule as the description of the signature of I R considered as a typed operator. More precisely, I R has arity n, P~ is the type of its f-th argument, and Q is the

44

type of its result. Well-typed terms composed of inference operators are called the proofs defined by the inference system. Let us now examine a few familiar inference systems. 2.1

The trivial homogeneous

case: Arities

A graded alphabet ~ may be considered as the simplest inference systems, where types are reduced to arities. I.e., the set of propositions is 1, and an operator F of arity n is an inference rule F:

00...0 0

(with n zero's in the numerator). A E-proof corresponds to our E-terms above.

2.2

Finite systems of types: Sorts

The next level of inference systems consist in choosing a finite set S of elementary propositions, usually called sorts. For instance, with S = {int, bool}, and E defined by: O: int

S : int --~ int

true: bool

f a l s e : bool

if:

bool, int -+ int

where we use the alternative syntax Pt, ..., P,~ -* Q for an inference rule, the term if(true, O, S(O)) is of sort int, i.e. it is a proof of proposition int. As another example, consider the puzzle "Missionaries and Cannibals". We call configuration any triple (b, m, c 1 E 2 × 4 × 4. The boolean b indicates the position of the boat, m (resp. c) is the number of missionaries (resp. cannibals) on the left bank. The set of states S is the set of legad configurations, that obey the condition P ( m , c ) ==- m = c or m = O or m = 3

There are thus 10 distinct states or sorts. The rules of inference comprise first a constant denoting the starting configuration:

~o: then the transitions carrying p missionaries and q cannibals from left to right: Lm,~,p,q : (O,m,c) -.* (1, m - p , c - q )

(m > p,c > q , P ( m , c ) , P ( m - p , c - q ) , l

< p + q <_ 2)

and finally the transitions Rm,c,p,q, which are inverses of L~,c,p,q. The game consists in finding a proof of (t, 0, 0). This simple example of a finite group of transformations applies to more complex tasks, such a s Rubik's cube. All state transition systems can be described in a similar fashion. Examples of such proofs are parse-trees of regular grammars, where the inference rules signatures correspond to a finite automaton transition graph. Slightly more complicated formalisms allow subsorts, i.e. containment relationships between the sorts, i.e. implications between the elementary propositions. These systems reduce to simple sorts by considering dummy transitions corresponding to the implicit coercions.

45

2.3

Types as terms:

standard

proof trees

We shall here describe our types as terms formed over an Mphabct ~I> of type operators, which we shall call functors. For the moment, we shall assume that we have just one category of such propositions, i.e. the functors have just an arity. The alphabet ~ of inference rules determines the legal proof trees. E x a m p l e . Combinatory logic. We take as functors a set ¢ of constants ¢0, plus a binary operator --% which we shall write in infix notation. We call functionality a term in T(¢). We have three families of rules in ~. In the following, the meta-variables A, B, C denote arbitrary functionalities. The operators of the K and S families are of arity 0, the operators of the App family are binary. KA,B : A ~ (B--* A) SA,I3,C : (A ~ (B ~ C)) --* ((A ~ B) --+ (A --+ C)) AppA,B :

A---*B A B

Here is an example of a proof. Let A and B be 'any functionalities, C = B --+ A, D = A --~ C, E = A--*A,F = A~(C~A),G = D - - * E . The term App D,.~( App a a ( S A,C,A, -~A,C ), K A,B ) has type E, i.e. it gives a proof of the proposition A ~ A. We express formally that proof M proves proposition P in the inference system ~ as: ~ ~ M : P. That is, we think of a theorem as the type of its proof tree. Proof-checking is identified with type-checking. Here this is a simple consistency check; that is, if operator F is declared in ~ as: F : P1 .... ,Pn --* Q and if ~ ~ M~ : P~ for 1 < i < n, then ~ ~ F ( M I , . . . , M , ) : Q. 2.4

Polymorphism:

Rule schemas

This next level of generality consists in authorizing variables in the propositional terms. This is very natural, since it internalizes the meta-variables used to index families of inference rules as propositional variables. The ~ules of inference become thus polymorphic operators, whose types are expressions containing free variables. This is the traditional notion of schematic inference rule from mathematical logic. E x a m p l e . The example from the previous section is more naturally expressed in this poIymorphic formalism. We replace the set ¢0 by a set of variables V, and now we have just 3 rules of inference: K, S and App. The types can be completely dispensed with, since a well typed term possesses~a most general type, called its principal type. For instance, in the example above, the proof A p p ( A p p ( S , K ) , K ) has a principal type A ~ A, with A E V. This term is usually written I -- S K K in combinatory logic, where the concrete syntax convention is to write combinator strings to represent sequences of applications associated to the left. The notion of principM type, first discovered by Hindley in the cornbinatory logic context, and independently by Milner for HL type-checking [111], is actually completely general:

46

T h e o r e m . Let E be any signatur e of polymorphic operators over a functor signature ~. Let M be a legal proof term. Then M possesses a principal type r E T(q~, V). That is, ~ ~ M : r, and for all r * e T(¢, V), E ~ M : r ~ implies r _< rq Proof. This is an easy application of the unification theorem. By now we have developped enough formalism to make sense out of our "types as propositions" paradigm. Actually, the example we have developped above is the fragment of propositional logic known as ~minimal logic". When regarding the functor -~ as (intuitionistic) implication, and A p p as the usual inference rule of Modus ponens, K and S are the two axioms of minimal logic presented as a Hilbert calculus. Combinatory logic is thus the calculus of proofs in minimal logic [37]. Actually combinators don't just have a type, they have a value~ They can be defined with definition equations in terms of application. Using the concrete syntax mentioned above, we get for instance K and S defined by the following equations: DefK : K x y = x 1 ) e I s : 8 x y z = ~ z (y z).

Exereice. Verify that the two equations above, when seen as unification constraints, define the expected principal types for K and S. This point of view of considering equality axiomatizations of the proof structures corresponds to what the proof-theorists call cut elimination. That is, the two equations above can be used as rewrite rules in order to eliminate redundancies corresponding to useless detours in the proofs. We shall develop more completely this point of view of c o m p u t a t i o n as p r o o f n o r m a l i z a t l o n in section 4.4 below. The current formalism of inference rules typed by terms with variables corresponds to proof theory's iatuitionistic sequents, and to automated reasoning's Horn clauses. For instance, a PROLOG [24] interpreter may be seen in this framework as a proof synthesis method. Given an alphabet E of polymorphic inference rules (usually called definite clauses), and a proposition r over functor alphabet ~, it returns a proof term M such that M is a legal E-proof term of principal type r t instance of r: E~M:rf_>r. With a = r t / r , we say that a is a PROLOG a n s w e r to the query r. Of course this explanation is incomplete; we have to explain that PROLOG finds all such instances by a backtrack procedure constructing proofs in a bottom-l~p left-to-right fashion, using operators from E in a specific order (the order in which clauses are deelm'ed); this last requirement leads to incompleteness, since PROLOG may loop with recursively composable operators, whereas a different order might lead to termination of the procedure. Also, PROLOG may be presented several goals together, and they may share certain variables, but this may be explained by a simple extension of the above proof-synthesis explanation. We claim that this explanation of PROLOG is more faithful to reality than the usual one with Horn clauses. In particular, our explanation is completely constructive, and we do not have to explain the processes of conjunctive normalization and Skolemization. N~rthermore, there is no distinction in ¢ between predicate and function symbols, consistently with most PROLOG implementations.

47

2.5

Proof terms with variables, natural deduction.

The example above demonstrated the difficulty of proofs presented in a Hilbert style. The completely trivial theorem V A . A ~ A had a complicated proof using three axioms and two applications of modus ponens. Of course one could consider adding combinator I as an axiom, but this is only begging the question since other trivial natural theorems would present similar difficulties. And of course there is no e~sy way to decide which combinators are well-typed. For instance, Peirce's law: Peiree : ((A ~ B) -* A) -~ A

although a propositional tautology easily checkable by the truth-table method, is not intuitionistically valid. The natural proof of A --* A consists in, given a proof x of A, returning merely x as a proof of A. that is, the natural proof of A -~ A is the (polymorphic) identity algorithm. This method of proof usually procecds through the deduction theorem below. D e d u c t i o n t h e o r e m . Let P be any set of propositions, A and B be two propositions. We have P,A l- B iffP [- A -* B. The deduction theorem holds in any reasonable system of logic. It can be proved easily in minimal logic, by induction on the size of proofs. Unfortunately, the deduction theorem is a meta theorem, i.e. a mathematical theorem of the meta-theory analyzing the proof system, as opposed to the theorems, or well-typed proof terms inside the proof system. We shall see in section 4 that it is easy to internalize deductions as proof terms with variables, called sequents. This point of view will lead to logic presented in natural deduction style, that is to A-calculus formalisms. Before investigating this next level of expressive power, we consider in the next section a particularly important inference system ~, that of equational logic.

3 3.1

R e w r i t i n g inference a n d e q u a t i o n a l logic The classical presentation

Equational logic is classically presented as a restriction of first-order logic, where the only predicate symbol is =, and the only non-logical axioms are universal equalities between terms containing free variables. For instance, the theory of groups is classically presented over the functor alphabet = (*,-1,1} by the equations: Idl: invl:

l*x

= x =

1

A88: (x*y)*z = z*(y,z) and the class of all first ~)rder models of these equations is called the variety of groups. The well known completeness theorem of Birkhoff states that a universally quantified equation between terms over ~ is valid in the variety iff it can be deduced from the axioms using the rotes of substitution and of replacement of equal for equal.

48

3.2

The proof-theoretlc formalization

Here we ignore the abstract notion of model and concentrate on the rules of inference. We assume given a fum:tor alphabet ¢ given with arity function a, in which we distinguish an'atom --* given with arity 2. The substitution inference rule disappears, since it is implicit from the polymorphism of other rules. The replacement of equals for equals is decomposed into elementary steps of term replacement rules: IdA : A --* A Reflexivity A-*B B--,C A ~ C

; :

Transitivity

which specify that the rewriting arrow ~ is a quasi-ordering. Now we must state that --* is compatible with the rest of the C-structure. That is, for every functor F in • - {--*} of drily n and for every i < n we take a congruence rule: A-, B Funeral :' F ( A 1 , . . . , A i - I , A , AI+I,...,An) --* F ( A 1 , . . . , A i - I , B , Ai+I,...,An)

Congruence

If we add the rule of symmetry we get the theory of equMity, where we usually use symbol = instead of -+: A=B Op : • B = A Symmetry The non-logical axioms of the variety are then added as so many constants. For instance, over groups, we obtain a proof of proposition y = z -1 * (x * y) by the term

O p ( F ~ , ~ a , , ~ ( z ~ l ) ; Sat); Ass : y = ~ - ~ , (~ • y)

Exercise: Show a proof of x * 1 = x using the inference system ~ above. The conclusion we may draw from the example above is that, beyond its apparent simplicity, equational reasoning may indeed be quite complicated. The rule of symmetry is specially hard to use since it expresses a commutativity of ---~,harder to visualize than the easier mono~d structure implicit from the rules I d and ";'. It is then natural to ask: 1) Can we eliminate Op 2) More generally, can we normalize equational proofs?

3.3

The categorical viewpoint

This viewpoint gives a prominent ro!e to the mono~d structure of the quasl-ordering --~. Simplifying the presentation, we may present a category as presented by a set of objects Obj, which we shall here confuse with the set of (closed) terms over some functor alphabet ¢, and by a set of arrows (or morphisms) which we shall here confuse with the set of (closed) proofs generated from some inference system ~ containing initially the two rules; IdA : A --* A ; :

A~B

B--~C A ~ C

Identity Composition

Whenever f : A ~ B, we say that arrow f has domain A and codomain B. Furthermore, it is specified that the proofs are quotiented by a congruence = verifying: Idl: Id;f

= f

49

Idr:

A~:

f;Id

= f

(/;9);h = /; (9;h)

So we see that a category is a structure obtained as a hybrid of quasi-ordering and of mondid, to which it reduces in the two degenerate cases (i.e. If : A --* B t < 1 and IObjI = I). Note that we have given the same name to axiom Idt as for the axiomatization of groups above, Mthough here the operator ";" is a E-operator, and not just a C-operator like "*". However the unification theorem allows us to make implicit the type of variable f above, and the overloading of "Idl" may be seen as a reflection pl:inciple. If A and B are two categories, a functor F from A to B associates to every object A of A an object F ( A ) of B , and to every arrow f : A -4 S an arrow f ( f ) : F ( A ) -~ F ( B ) such that the following functorial conditions hold: F(Id) = Id F(f;g) = F(f);F(g) We see a great analogy between the notion of rewriting inference system and the main categorical notions. Actually, the categorical viewpoint is richer in that the functors have sorts themselves (i.e., the categories), and poorer in that they do not yet have arities (i.e. we just have monadic functors so far). In order to build-in arities we shall need products, and a full categorical account of minimal logic is obtained by a further adjunction, namely exponentiation. But we shall defer this explanation until we develop natural deduction in section 4. We have given this elementary development of category theory essentially to justify our terminology. The congruence rule of term formation explains a functoriality condition on the object part, and the functoriaJ~ty condition on the arrow part of the functor expresses the congruence property for rewriting. Substitutivity in rewrite rules is expressed by defining them as natural transformations between the functors denoted by the two sides of the rule. That is, a natural trans#orma~ion r between functors F and G (both from category A to category B) is a mapping associating to every object A of A an arrow rA : F ( A ) -+ G(A) such that r A ; G ( f ) ---- F ( f ) ; r B And if we consider equations rather than simply rewrite rules, the symmetry inference rule is interpreted as the existence of inverses to arrows. Equations are thus defined as natural isomorphisms. Category theory is explained in Mac Lane [94]. The categorical viewpoint for algebra has been developped by Lawvere and others [97]. Its application to proof theory is explained (in a somewhat complicated form) in Szabo [151].

3.4

Confluence and Termination

We come back to the problem of eliminating the symmetry rule. Let now --+ be any binary re]atlon over some set S , --~* be its reflexive-transitive closure, ~-~* be its equivalence closure. We say that --* verifies tile Church-Rosser condition iff X +'+* y ¢~e 3Z

Z -'-+* Z A y "~ * Z

It is easy to show that this condition is equivalent to confluence, i.e. u-+*x

that is, diagrammatically:

A U'-+* y ~-? 3Z

X"'~*Z A y " * * Z

50 U

x

y

\/ Z

When -~ is a confluent relation, normal forms (i.e. terminal elements) are unique whenever they exist, and equality (i.e. +-+*)may be decided by rewriting: That is, deduction may be replaced by computation~ and symmetry is eliminated in all but one instance. For instance~ the following set of 10 rewrite rules defines a confluent term rewriting system for group theory'. 1*~

-'+ :g

X--I * x

"-+ 1

x*l

-+ x

x,x. -1

-*

(x-l) -I

"-'* X

1

1-1-+1 ~,(x-l,y)

~

y

~-l , (x , y) .--, y (x * y ) - I _~ y - 1 ,

x-1

The rewrite relation associated with such a term rewriting system/~ is defined by M --~R N iff there exists a rule a --~ fl in R and an occurrence u in D ( M ) such that M / u = a(c~) for some substitution a, and i V = M [ u ~-- a(fl)]. It is clear that the group axioms are decided by the system R above. Conversely all rewrite rules in R may be shown to be valid equations in group theory (see the exercise above). What is less obvious is to decide the confluence of R. We shall see in the next section that is is easy to show that it is 1ocally c o n f l u e n t , in the sense that: u-+ x A u-'+ y

"¢~ 3 z

x-'+ * z A y--~ * z

that is, diagrammatically:

x

/\ \/

U

Y

Z

51

However, local confluence is not enough to prove confluence, as shown by the following counterexamples:

3.5

The Ncetherian

case: Knuth and Bendix

The problem encountered with the above counter-examples is that the rewriting relation possessed infinite chains. Let us say that relation ---r is Ncetherian iff there is no infinite chain Xl ~ x2 --~ ... (Then, its transitive closure --*+ is a well-founded ordering). We remark that ~ is Ncetherian over S iff every non-empty subset of S admits a minimal element with respect to --*+. Now lct us say that a predicate P over S is --*-hereditary iff VxeS

[Vy

x--~+y :=~ P(y)] =~ P(x).

Now we may state an important induction principle. P r i n c i p l e of N o e t h e r i a n I n d u c t i o n : Let ~ be a Ncetherian relation over 8. Then for every --*-hereditary predicate P we have Vx E S P(z). It is easy to validate this induction principle using the above remark, by considering the set of all x's such that not P(z). And now we may show that local confluence implies confluence for Ncetherian relations. N e w m a n ' s l a m i n a . A Ncetherian relation is confluent iff it is locally confluent. Proof: Ncctherian induction on predidate P defined as:

P(u) = Vz, y

u--o* z A u ~ * y

=~ 3z

z--+*z A y ~ * z

We now explain the Knuth-Bendix decision procedure for the confluence of Ncetherian term rewriting systems. First let us give an algorithm. S u p e r p o s i t i o n a l g o r i t h m . Let a l -+ fll and a2 ~ f12 be two rewrite rules in R, let u E D ( a l ) and M = a~/u be such that M is a non-variable term unifiable with as. Let iV = al (M) --- a2(a2) be a principal instance, with V (N)NV (al) = ~. We say that the ~uperposition of a2 --~ & on ~1 --* fli at u determines the critical pair (P, Q), with P --" al (al) [u <-- a2(f12)] oald Q = al(/~l).

Examples

52

• G(B,x) --~ K(x) superposes on F(x,G(x,A)) --~ H(x) at '2' to give P = F(B,K(A)) and Q = H(B).

• H(H(x)) ---* If(x) superposes on itself at '1' to give P = H(g(y)) and Q = K(H(y)). T h e K n u t h - B e n d l x t h e o r e m . The relation -~/~ is locally confluent iff for every critical pair

we reduce P and Q to two distinct irreducible terms P ' and Q', we have generated an interesting lemma P ' = Q'~ which is an equational consequence of the rnles considered as equations. It may be possible to give an orientation to this new equality for forming an extended term rewriting system, while preserving the finite termination property. This is the basis of the Knuth-Bendix completion method, which attempts to complete a term rewriting system to a confluent one. This method may be considered a way of compiling a canonical form algorithm from an equational specification. We cannot describe the method fully here. The main ideas are that unresolved critical pairs are kept as new rewrite rules, and that all rules are kept inter-reduced. The procedure may stop with a canonical system, it may fail because termination is impossible to establish, or it may loop. Whenever it does not fail, it gives a semi-decision procedure for the original equational theory, as explained in Huet [66]. More detailed expositions of the method may be found in [84,65,71]. Failure may resnlt from some permutative consequence such as eommutativity. The method has been extended in varions ways in order to consider rewritings modulo such permutative axioms. For instance, Peterson and Stickel [127] have shown that it was possible to extend the method to complete equational presentations, where oneor several functors were assumed to be associative and commutative, using Stickel's associative-commutative unification algorithm [150,43]. This method has been extended by Jouannaud and Kirchner [73]. Various other extensions of the Knuth-Bendix procedure have been proposed, for handling constructors (free functors) [69] and for solving word problems in finitely presented algebras [90]. The Knuth-Bendix completion procedure and its extensions give a general framework to simplification techniques. As example of canonical term rewriting system we give distributive lattices. Here n and U are assumed to be associative and commutative. The canonical set consists in the following four rules: xn(zuy)

~x

• u(yn~) ~ [ ~ u y ) n(~uz) xU:~ ---~x

xn=

-~z

E x e r c i s e . Show that the other distributivity law is a consequence of the above rules. Finally, we show the canonical system for Boolean algebras. Now the connectives h and $ (exclusive or) are assumed to be associative and commutative. x A 1 --+x

53

xAO - ~ 0 xAx

"-~ x

x@0-*z z @ x --~0

This canonical set can be used to decide propositional calculus, using the following translations:

xVy

-*x

$

x=~y

~x

$

y •

(xAy)

(xAy)

$ 1

The resulting decision method is basically the method of Venn's diagrams, as the following example demonstrates. With three propositional letters a, b and c~ the proposition

(aA-~b) V (bA-~c) V (cA-~a) reduces to its canonical form: a $

b @ c @ aAb

@ bAc

~9 c A a

which can easily be "seen" as a disjoint union of regions in the following Venn diagram:

This example also shows that disjunctive normal form is n o t a canonical form, since the above proposition possesses another d.n.f.

or, as Quine puts it, a formula may have distinct mlnimal sets of prime impticants.

54

3.6

Sequential computations

We now consider term rewriting systems with two constraints: (a) left linearity: for every ~ -*/~ in R, every variable of ~ occurs exactly once (b) non ambiguity: there are no critical pairs As we shall see, these systems are always confluent, their termination is unnecessary. Functional programming languages, and more generally operational semantics rules can usually be expressed as such systems of rewrite rules [58]. As a very simple example, consider the system of two rules DefK and Defs defining the combinators S and K. We shall here define the main notions of computation using rewrite rules. The full theory is given in Huet-L~vy [70]. We call redex in term M an occurrence u e D(M) such that a _< M / u for some left-hand side a of a rule in R. We define the reduction relation -*R associated with R in the same way as in the preceding section. We shall assume R fixed from now on, and write simply -* for reduction. Let M ~ N at redex occurrence u E D(M), using rule a -* fl E R. Let now v be any redex in M. We define the set v\u of residuals of v as a set of redexes in N defined as follows. If v = u, v\u -- ~. If v < u or vlu , then v\u = {v}. Finally, if v > u, this means, by non-overlapping, that v is below some variable x of a. By linearity, x has a unique occurrence in c~, which we shall denote by x as well. That is, v = u " x ~ w for some w. Now let X be the set of occurrences of variable z in ft. We define u \ u = {u ~ y ~ • [ y e X}. Thus redex v may have zero, one or more residuals in N. Intuitively, these residuals are the places where one must reduce in N in order to effect the computation step consisting in reducing at redex v in M. Actually, on the natural dag implementation all the occurrences of v\u denote the same shared node of the dag representing N. Symmetrically the same holds of u\v. And as expected we have a local confluence diagram, where the single steps u and v confluate using all the steps in v\u (resp. u\v). However, this is not sufficient, since we do not want to require --+ to be Ncetherian. However, it is easy to notice that all the redexes in v\u are mutually disjoint, and that any residual of some redex is always disjoint from any residual of some other disjoint redex. Thus it is natural to extend the reduction relation -~ to parallel reduction of a set of mutually disjoint redexes, a relation we shall write - - ~ . If M - ~ N using set of redexes U, then for every set V of mutually disjoint redexes in M, we define the residuals of V by U as: V \ U = {w E v\u I u E U A v E V}. And now we have a strong confluence property:

which extends easily to 1autti-steps derivations A and B, yielding: T h e p a r a l l e l moves t h e o r e m . Let A and B be two co-initial derivations. Define A U B as B U As in the sense that these two derivations are co-final, a~ld preserve

A; B \ A . Then A U B -

55

residuals. T h e c a t e g o r i c a l v i e w p o i n t . The category whose objects arc terms, and whose arrows from M to N are parallel derivations, quotiented by the equivalence - , admits pushouts. C o r o l l a r y . The reduction relation -+ has the Church-Rosser property. Beware! The lattice structure given by the parMlel moves theorem is on derivations~ and not on terms. For instance, if we consider the system R consisting solely of the rules I(x) ~ x and J(z) ~ x, the following'derivations diagram shows that the terms I(J(A)) and J(I(A)) do not possess a g.l.b.

t(J(I(A)))

I(J(A))

\/\//

J(I(A))

+(A)

I(I(A))

t(A)

\/ A

Note that this phenomenon may be traced to the existence of two non-equivalent derivations between I(I(A)) and I(A). This shows that the categorical viewpoint is the right one here: we need to talk in terms of arrows, just just relations between terms. T h e s t a n d a r d i z a t i o n t h e o r e m . It is always possible to compute in an outside-in manner. We do not have the space here to explain in a rigorous manner what outside-in exactly means. We just remark that this may be more complicated than merely reducing the leftmost-outermost redex, i.e. the redex minimum in the total ordering on occurrences defined by u
56

leads to the notion of sequential term rewriting system. A fllrther refinement, strong sequentiality, gives a decidable criterion which may bc used to drive efficient interpreters which look for a needed redcx in linear time, using a generalization to trecs of the Kmlth-Mol-ris-Pratt string-matching algorithm [85]. This theory is completely explahmd in Huet-L6vy [70]. In practice, we obtain easy criterions for strong sequentiMity in the particular cases of systems with constructors, and ~left" systems such as systems of combinators definitions.

4 4.1

Natural

deduction

and

Proofs with variables;

A-calculus

sequents

We now come back to the general theory of proof structures. We saw earlier that the Hilbert presentation of minimal logic was not very natural, in that the trivial theorem A -~ A necessitated a complex proof S K K. The problem is that in practice one does not use just proof terms, but deductions of the form

r~A where i" is a set of (hypothetic) propositions. Deductions are exactly proof terms wi~h varlables. Naming these hypothesis variables and the proof term, we write:

{...[xi:Ai]...ti<_n}

t- M : A

with V ( M ) C {Xl, ..., x,~}. Such formulas are called sequents. Since this point of view is not very well-known, let us emphasize this constatation:

Sequents represent proof terms wi~h variables. Note that so far our notion of proof construction has not changed: P b-~ M : A iff b-~ur M : A, i.e. the hypotheses from r are used as supplementary axioms, in the same way that in the very beginning we have defined T(~, V) as T(~ tA V). 4.2

The deduction

theorem

This theorem~ fundamental for doing proofs in practice, gives an equivalence between proof terms with variables and functional proof terms:

r u { A } ~- B ~

r ~ A~B

That is, in our notations:

~ r u { z : A } ~- ( M x ) : B This direction is immediate, using App, i.e. Modus Ponens. b) rU{x:A} t- M : B ~ F t- [ x ] M : A - - ~ B where the term [~] M is given by tile following algorithm. a) F b- M : A - - - + B

8chSnfinkel's abstraction algorithm: [ x ] x -- I [z](MN)

(-- S g K) = S [x]M [x]N

57

Note that this algorithm motivates the choice of combinators S an(] K (and optionally I). Again we stress a basic observation: Sch6ntinkel's algorithm is the essence of the proof of the deduction theorem. Now let us consider the rewriting system R defined by the rules DefK and D e f s , optionally supplemented-by: Defi: I x = x and let us write ~> for the corresponding reduction relation. Fact.

( [ x ] M N)

~>* M [ z ~ - - N ] .

We leave the proof of this very important property to the reader. The important point is that the abstraction operation, together with the application operator and the reduction t>, define a substitution machinery. We shall now u.~e this idea more generally, in order to internalize the deduction theorem in a basic calculus of flmctionality. That is, we forget the specific combinators S and K , in favor of abstraction seen now as a new ~erm constructor.

4.3

A calculus.

Here we give up T-terms in general, in favor of A-terms constructed by 3 elementary operations: variable

(M N)

application

[z] M

abstraction

This last case is usually written Ax. M, whence the name A-notation. The A-notation is first a non-ambiguous notation for expressions denoting functions. For instance, the function of two arguments which computes sin of its first argument and adds it to cos of its second is written

Ez] [y]sin(z)+cos(y) The variables x and y are bound variables, that is they are dummies and their name does not matter, as long as there are no clashes. This defines a congruence of renaming of bound variables usually called a-conversion. Another method is to adopt de Bruijn's indexes, where variable names disappear in favor of positive natural numbers [15]. We define recursively the sets An of A-expressions valid in a context of length n > 0 as follows:

k

(1 < k < n)

[ (MN) (M, NeAn) I []M MEA,+I. Thus integer n refers to the variable bound by the n-th abstraction above it. For instance, the expression [] (1 [] (1 2)) corresponds to [x] (x [y] (y x)). This example shows that, although more rigorous from a formal point of view, the de Bruijn naming scheme is not fit for human understanding, and we shall now come back to the more usual concrete notation with variable names.

58 Tim fact observed above is now edicted as a computation rule, usually called #-reduction. Let > be the smallest relation on A-expressions compatible with application and abstraction and such that: ( [ x ] M N) > M [ x +- g ] . We call A-calculus the A-notation equipped with the fl-reduction computation rule >. )~-calculus is the basic calculus of substitution~ and fl-reduction is the basic computation mechanism of fimctional programming languages. Here is an example of computation:

>2 ([~] (~ u) [w] [~3 (~ u)) We briefly sketch here the syntactic properties of ,\-caiculus. Similarly to the theory developped above, the notion of residual can be defined. However, the residuals of a redex may not always be disjoint, and thus the theory of derivations is more complex. However the parallel moves lemma still holds, and thus the Church-Rosser property is also true. Finally, the standardization theorem holds, and here it means that it is possible to compute in a teftmost-outermost fashion. These results, and more details, in particular the precise conditions under which fl-reduction simulates combinatory logic calculus, are precisely stated in Barendregt [4]. We finally remark that A-calculus computations may not always terminate. For instazlce, with A = [u] (u u) and 3_ = (A A), we get 3_ > 3_ > ... A more interesting example is given by

Y = [ f ] ( [ u ] ( f (uu))

[ u ] ( f (uu)))

since (Y f) >* (f (Y f)) shows that Y defines a general fixpoint operator. This shows that (full) A-cMculus is inconsistent with logic. What could ( f i x -,) mean? As usual with such paradoxical situations, it is necessary to introduce types in order to stratify the definable notions in a logically meaningful way. Thus, the basic inconsistency of Church's A-calculus, shown by Rosser, led to Church's theory of types [22]. On the other hand, A-calculus as a pure computation mechanism is perfectly meaningful, and Strachey prompted Scott to develop the theory of reflexive domains as a model theory for full )~-calculus. But let us first investigate the typed universe. 4.4

G e n t z e n ' s s y s t e m N of n a t u r a l d e d u c t i o n

The idea of A-notation proofs underlies Gentzen's natural deduction inference rules [48], where App is called -*-elim and Abs is called -~-intro. The role of variables is taken by the base sequents:

AxiomA

: A ~- A

together with the structural ~1~nning rule:

Thinning

:

F~-B

ru{A} ~-B

which expresses that a proof may not use all of the hypotheses. Gentzen's remaining rules give types to proofs according to propositions built as flmctor terms, each functor corresponding to a propositional connective. The main idea of his system is that inference nlles should not bc arbitrary,

59

but should follow the flmctor structure, in explaining in a unifl)rm fashion how to introduce a functor, and how to eliminate it. For instance, mininal logic is obtained with ~ = {-~}, and the rules of ~ - i n t r o and ---* - e l i m , that is: Abs App

:

:

FU{A} b B F F" A - " + B

P b A-~B

A bA

FUA F B Now, the fl-reduction of A-calculus corresponds to cut-elimination, i.e. to proof-simplification. Reducing a redex corresponds to eliminating a detour in the demonstration, using an intermediate lemma. But now we have termination of this normalization process, that is the relation ~ is Ncetherian on valid proofs. This result is usuMly called strong normalization in proof theory. A full account of this theory is given in Stenlund [149]. Minimal logic can then be extended by adding more functors and corresponding inference rules. For instance, conjunction A is taken into account by the intro rule: Pair

:

PFA AbB PUA b AAB

which, from the types point of view, may be considered as product formation, and by the two elim rules: Fst Snd

FPAAB PbA FbAAB : PFB

:

corresponding to the two projection functions. This corresponds to building-in a ),-calculus with pairing. (~eneralizing the notion of redex (cut) to the configuration of a connective intro, immediately followed by elim of the same connective, we get new computation rules: F~t(P~ir(~,y))

~ x

Snd(Pair(x,y))

~> y

and the Noetherian property of E> still holds. We shall not develop further Gentzen's system. We just remark: (a) More connectives, such as disjunction, can be added in a similar fashion. It is also possible to give rules for quantifiers, although we prefer to differ this topic until we consider dependenL bindings. (b) Gentzen originally considered natural deduction systems for meta-mathematical reasons, namely to prove thcir consistency. He considered ~mother presentation of sequent inference rules, the L system, which possesses the subformula property (i.e. the result type of every operator is formed of subterms of the argument types), and is thus trivially consistent. Strong normalization in this context was the essential technical tool to establish the equivalence of the L and the I'~ systems. Of course, according to Ghdel's theorem, this does not establish absolute consistency of the logic, but relativizes it to a carefully identified troublesome point, the proof of termination of some reduction relation. This has the additional advantage to provide a hierarchy of strength of inference systems, classified according to the ordinal necessary to consider for the termination proof.

60 (e) All this development concerns so called intuitJonistic logic, where operators (inference rules) arc deterministic. It is possible to generalize the inference systems to classical logic, using a generalized notion of sequent I" b- A, where the right part A is also a set of propositions. It is possible to explain the composition of such non-deterministic operators, which leads to Gentzen's systems NK and LK (Klassical logic!). Remark that the analogue of the unification theorem above gives then precisely Robinson's resolution principle for general clauses [139]. (d) The categorical viewpoint fits nicely these developments. This point of view is completely developped in Szabo [151]. The specially important connections between A-calculus, natural deduction proofs and cartesian closed categories are investigated in [98,121,87,142,35,68]. Further readings on natural deduction proof theory are Prawitz [130] and Dummett [41]. The connection with recursion theory is developped in Kleene [82] and an algebraic treatment of these matters is given in Rasiowa-Sikorski [133]. 4.5

Programming

languages, recurslon

The design of programming languages such as ALGOL 60 was greatly influenced by A-calculus. In 1966 Peter Landin wrote a landmark article setting the stage for coherent design of powerful functional languages in the A-calculus tradition [89]. The core language of his proposal, /SWIM (If you see what I mean!) meant A-calculus, with syntactically sugared versions of the fl-redex ( [ z ] M N), namely let x = N in M and M where x = N respectively. His language followed the static binding rules of A-calculus. For instance, after the declarations: let f x

= x+ywherey=l; let y = 2;

the evaluation (reduction) of expression (f 1) leads to value 2, as expected. Note that in contrast laziguages such as LISP [107], although bearing some similarity with the A-notation, implement rather dynamic binding, which would result in the example above in the incorrect result 3. This discrepancy has led to heated debates which we want to avoid here, but we remark that static binding is generally considered safer and leads to more efficient implementations where compilation is consistent with interpretation. However, ISWIM is not completely faithful to A-calculus in one respect: its implementation does not follow the outside-in normal order of evaluation corresponding to the standardization theorem. Instead it follows the inside-out applicative order of evaluation demanding the arguments to be evaluated before a procedure is called. In the ALGOL terminology, ISWIM follows can by value instead of call by name. The development, of natural deduction as typed A-calculus fits the development of an ISWIMbased language with a type discipline. We shall call this language ML , which stands for "metalanguage", in the spirit of LCF's ML [54,53]. For instance, we get a core ML0 by considering minimal logic, with ~ interpreted as functionality, and further constant functors added for basic types such as triv, boot, int and string. Adding products we get a language ML1 where types reflect an intuitionistic predicate calculus with ~ and A. We may define functions on a pattern argument formed by pairing, such as: let fs~(x, y) = z

and the categorical analogue are the so-called cartes/an closed categories (CCCs). Adding sums lead to Bi-CCC's with co-product. The corresponding I~L2 primitives are int, inr, out1, outr and is1, with obvious meaning. So far all computations terminate, since the corresponding reduction relations are Ncetherian.

61

However such a programming language is too weak for practical use, since recursion is missing. Adding recursion operators may be clone in a stratified manner, as presented in Gbdel's system T [51], or in a completely general way in ML~,where we allow a "letrec" construct permitting arbitrary recursive definitions, such as: letrec f a c t n :

i f n = O then l else n * (/fact ( n - l ) )

But then we loose the termination of computations, since it is possible to write un-founded definitions such as letrec absurd x = absurd x. Furthermore, because ML follows the applicative order of evaluation we may get looping computations in cases where a A-calculus normal form exists, such as for let f z = 0 in f (absurd x),

4.6

Polymorphism

We have polymorphic operators (inference rules) at the meta level. It seems a good idea to push polymorphism to the object level, for functions defined by the user as A-expressions. To this en.d, we introduce bindings for type variables. This iclea of type quantification corresponds to allowing proposition quantifiers in our propositional logic. First we allow a universal quantifier in prenex position. That is, with To = T ( ~ , V), we now introduce type schemas in 7'1 = ToUVa T1, a E V. A (type) term in 7'1 has thus both free and bound variables, and we write F V ( M ) and B V ( M ) for the sets of free (respectively bound) variables. We now define generic instaneiation. Let r = V a i . . . a m . r o E TI a n d r s = Vfll...fln'r~ ET1. We definer s_>c riffv~ = a(v0) with D(a) C_ { a l , . . . , a m } and fll ~ FV(T) (1 < i _< n). Remark that _ acts on F V whereas _>(_;acts on B V . Also note ~" >_a ~" :* ~ff') >a ~(r) We now present t h e Damas-Milner inference system for polymorphic A-calculus [39]. In what follows, a sequent hypothesis A is assumed to be a list of specifications ~i : ri, with r i E T1, and we write F V ( A ) = Ui F V ( ~ ) . TAUT INST

: A I" x : t~

AFM:a : A FM : #

GEN

A~-M:r : A ~-M : V a . r

APP

:

a<_c#)

(a~FV(A)

: P--+T A ~-N : r l A F(MN) : r

AU{x:r'} FM : r A ~- [ x ] M : r I - - ~ r A F M : P A U { x : r I} ~ ' N : r : A Fletx=MinN : r ABS

LET

A t-M

(~:aEA)

:

62

For instance, it is all easy exercise to show that ~- let i =

[x]xin (it)

: c~ + c~.

The above system may be extended without difficulty by other flmctors such as product, and by other I~L contructions such as letrec. Actually every ML compiler contains a typechecker implementing implicitly tile above inference system. For instance, with the unary functor list and the following ML primitives: [] : (list a), c o n s : a × (list a) (written infix as a dot), hd : (list a) -~ a and tl : (list a) ---* (list o~), we may define recursively the map functional as: letree map f I = i f t = [3 the. [] else (f (hd I)) • map f (tl l) and we get as its type: ~- m a p :

(a ~ fl) --~ (list a) --* (list fl).

Of course the I~ILcompiler does not implement directly the inference system above, which is non-deterministic because of rules I N S T and G E N . It uses unification instead, and thus computes deterministieally a principal type, which is minimum with respect to -
T h e l i m i t s o f ML ~s p o l y m o r p h l s m

Consider the following ML definition: letree p o w e r n f u = i f n = O t h e n u

else f (po~e~ (~ - 1) f ~)

63

of type nat ~ (a ~ a) ~ (a ~ c~). This ['unction, which associates to natural n the polymorphic iterator mapping function f to the n-th power of f, may be considered a coercion operator between ML ;s internal naturals and Church's representation of naturals in pure A-calculus [23]. Let us recall briefly this representation. Integer 0 is reprcsented as the projection term [ f ] [u] u. Integer 1 is [ f ] [u] ( f u). More generally; n is represented as the functional ~ iterating a function f to its n-th power:

-----[f] [u] (f (f ...(fu)...)) and the arithmetlc operators may be coded respectively as: n+m

-- [ f ] [ u ] ( n f (m f u ) )

n×m

--- [ f ] ( n ( m f ) )

For instance; with 2 -- [ f ] [u] (f (f u)), we check that ~ x ~ converts to its normal form il. We would like to consider a type NAT

= Va~ (a -+ a) ~ (a ~ a)

and be able to type the operations above as functions of type N A T --+ N A T -+ N A T . However the notion of polymorphism found in ML does not support such a type, it allows only the weaker w.

((~ - , ~) -~ (~ -~ ~)) -~ ((~ -~ ~) -~ (~ -~ ~)) -~ ((~ -~ ~) -~ (~ - , ~))

which is inadequate, since it forces the same generic instanciation of N A T in the two arguments. W a r n i n g . These preliminary notes are very sketchy from now on. A future version will cover the topics below in greater depth. 4.8

Girard's

second order A-calculus.

The example above suggests using the universal type quantifier inside type formulas. We thus consider a functor alphabet based on one binary --* constructor and one quantifier V. We shall now consider a A-calculus with such types, which we shall call second-order A-calculus, owing to the fact that the type language is now a second-order propositional logic, with propositional variables explicitly quantified. Such a calculus was proposed by J.Y. Girard [49,50], and independently discovered by J. Reynolds [135]. Girard proved the main properties of the calculus: G i r a r d ' s t h e o r e m . Second-ol-der A-calculus admits strong normalization. C o r o l l a r y . Second-order natural deduction is consistent. Girard used this last resu!t to show the consistency of analysis. Second-order A-calculus is a very powerfnl language. Most usual data structures may be represented as types. Iaarthermore, it captures a large class of total reeursive functions (precisely, all the functions prouvably total in second-order arithmetic). It may seriously be considered as a candidate for the foundations of powerfill programming languages, where recursion is replaced by iteration. But the price we pay by extending polymorphism in this drastic fashion is that the notion of principal type is lost. Type synthesis is possible only in easy cases; and thus in general the programmer has to specify the types of its data. Further discussions on the second-order A-calculus may be found in [1(}8,46,91;7].

64

5 5.1

Dependent types Quantification

So far we have dealt only with types as propositions of some (intuitionistic) propositional logic. We shall now consider stronger logics, where it is possible to have statements depending upon variables that are A-bffund. We shall continue our identification of propositions and types, and thus consider a first-order statement such as Vx E E - P(z) as a product-forming type H~eEP(z )We shall call such types depende_~t, in ttmt it is now possible to declare a variable of a type which depends on the binding of some previously bound variable. Let us first of all remark that such types are absolutely necessary for practical programming purposes. For instance, a matrix manipulation procedure should have a declaration prefix of the type:

[n : nat] [matri~ : array(n)] where the second type depends on the dimension parameter. PASCAL programmers know that the lack of such declarations in the language is a serious hindrance. We shall not develop first-order notions here: and shall rather jump directly to calculi based on higher-order logic. 5.2

Martin-LSf~s Intuitionistic

Theory

of Types

P. Martin-LSf has been developing for the last 10 years a higher-order intuitionist logic based on a theory of types, allowing dependent sums and products [104,105,106]. His theory is not explicitly based on ;~-calculus, but it is formulated in the spirit of natural deduction, with introduction mid elimination rules for the various type constructors. Consistency is infcrred from semantic considerations, with a model theory giving an analysis of the normal forms of elements of a type, and of the equality predicate for each type. Martin-LSf's system has been advocated as a good candidate for the description and validation of computer programs, and is an active topic of research by the GSteborg Programming Methodology group [117,119,120]. A particularly ambitious implementation of Martin-LSf's system and extensions is under way at Cornell University, under the direction of R. Constable [25,26,132]. 5.3

de Bruijn's

AUTOMATH

languages

The mathematical language AUTOMATH has been developed and implemented by the Eindhoven group, under the direction of prof~ N.G. de l~ru~jn [14,t6~18]. AUTOMATH is a h-calculus with types that are themselves ;~-expressions: It is based on the natural idea that .~-binding and universal instaaciation are similar substitution operations. Thus in AUTOMATH there is only one binding operation, used both for parameter abstraction and product instanciation. The meta-theory of the various languages of the AUTOMATH family are investigated in [113,38,75]. The most notable success of the AUTOMATH effort has been the translation and mechanical validation of Landau's Grundlagen [74].

5.4

A Calculus of Constructions.

AUTOMATH established the correct linguistic foundations for higher-order natural deduction. Unfortunately, it did not allow Girard's second-order types, and probably for this reason was never considered under the programming language aspect. Th. Coquand showed that a slight extension of

65 the notation allowed the incorporation of Girard's types to AUTOMATH in a natural m~umer [27]. Coquand showed by a strong normalization thcorem that the formalism is consistent. Experiments with ,~1 implcmentation of the calculus showed that it is welt adapted to expressing naturally and concisely mathematical proofs and computer algorithms [29]. Variations on this calculus are under development [30,31].

Conclusion We have presented in these notes a uniform account of logic and computation theory, based on proof theory notions, and most importantly on the Curry-Howard isomorphism between propositions and types [37,59]. These notes are based on a course given at the Advanced School of Artificial Intelligence, Vigneu, France, in July 1985. An extended version is in preparation.

References [1] A. Aho, J. Hopcroft, J. Ullman. Addison-Wesley (1974).

~'~TheDesign and Analysis of Computer Algorithms."

[2] P. B. Andrews. "Resolution in Type Theory." Journal of Symbolic Logic 36~3 (1971), 414432. [3] P. B. Andrews, D. A. Miller, E. L. Cohen, F. Pfenning. ~Automating higher-order logic." Dept of Math, University Carnegie-Mellon, (Jan. 1983). [4] It. Barendregt. "The Lambda-Calculus: Its Syntax and Semantics ." North-Holland (1980). [5] E. Bishop. "Foundations of Constructive Analysis." McGraw-Hill, New-York (1967). [6] E. Bishop. "Mathematics as a numerical language." Intuitionism and Proof Theory, Eds. J. Myhill, A.Kino and R.E.Vesley, North-Holland, Amsterdam, (1970) 53-71. [7] C. BShm, A. Berarducci. "Automatic Synthesis of Typed Lambda-Programs on Term Algebras." Unpublished manuscript, (June 1984). [8] R.S. Boyer, J Moore. "The sharing of structure in theorem proving programs." Machine Iatelligence 7 (1972) Edinburgh U. Press, 101-116. [9] R. Boycr, J Moore. "A Lemma Driven Automatic Theorem Prover for Recursive Function Theory." 5th International Joint Conference on Artificial Intelligence, (1977) 511-519. [I0] R. Boyer, J Moore. "A Computational Logic." Academic Press (1979). [11] R. Boyer, J Moore. "A mechanical proof of the unsolvability of the halting problem." Report ICSCA-CMP-28, Institute for Computing Science, University of Texas at Austin (July 1982). [12] R. Boyer, J Moore. "Proof Checking the RSA Public Key Encryption Algorithm." Report ICSCA-CMP-33, Institute for Computing Science, University of Texas at Austin (Sept. 1982). [13] R. Boyer, J Moore. "Proof checking theorem proving and program verification." Report ICSCA-CMP-35, institute for Computing Science, University of Texas at Austin (Jan. 1083).

66 [t41 N.G. de Bruijn. "The matlmmatical language AUTOMATH, its usage and some of its extensions." Symposium on Aut.omatic Demonstration, IRIA, Versailles, 1968. Printed as Springer-Vcrlag Lecture Notes in Mathematics 125, (1970) 29-61. [15] N.G. de Bruijn. "Lambda-Calculus Notation with Nameless Dummies, a Tool for Automatic Formula Manipulation, with Application to the Church-Rosser Theorem." Indag. Math. 34,5 (1972), 381-392. [16] N.G. de Bruijn. ¢'Automath a language for mathematics." Les Presses de l'Universit~ de Montreal, (1973). [17] N.G. de Bruijn. "Some extensions of Automath: the AUT-4 family." Internal Automath memo M10 (Jan. 1974). [18] N.G. de Bruijn. "A survey of the project Automath." (i980) in to H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, Eds Seldin J. P. and Hindle$ J. R., Academic Press (1980). [19] M. Bruynooghe. "The Memory Management of PROLOG implementations." Logic Programming Workshop. Ed. Tarnlund S.A (July 1980). [20] L. Cardelli. "MLunder UNIX." Bell Laboratories, Murray Hill, New Jersey (1982). [21] L. Cardelli. "Amber." Bell Laboratories, Murray Hill, New Jersey (1985). [22] A. Church: "A formulation of the simple theory of types." Journal of Symbolic Logic 5,1 (1940) 56-68. [23] A. Church. "The Calculi of Lambda-Conversion." Princeton U. Press, Princeton N.J. (1941). [24] A. Colmerauer, H. Kanoui, R. Pasero, Ph. Ronssel. "Un syst~me de communication hommemachine en francais." Rapport de recherche, Groupe Intelligence Artificielle, Facult~ des Sciences de Luminy, Marseille (1973). [25] R.L. Constable, J.L. Bates. "Proofs as Programs." Dept. of Computer Science, Cornell University. (Feb. 1983). [26] R.L. Constable, J.L. Bates. "The Nearly Ultimate Pearl." Dept. of Computer Science, Cornell University. (Dec. 1983). [27] Th. Coquand. "Une th~.orie des constructions." Th~se de troisi~me cycle, Universlt~ Paris

v n (Jan. 85), [28] Th. Coquand, G. Huet. "A Theory of Constructions." Preliminary version, presented at the International Symposium on Semantics of Data Types, Sophia-Antipolis (June 84). [29] Th. Coquand, G..Huet. "Constructions: A Higher Order Proof System for Mechanizing Mathematics." EUROCAL85, Linz, Springer-Verlag LNCS 203 (1985). [30] Th. Coquand~ G. Huet. "Concepts Mathhmatiques et Informatiques Formalls4s dans le Calcul des Constructions." Colloque de Logique, Orsay (auil. 1985). [31] Th. Coquand, G. Huet. "A Calculus of Constructions." To appear, JCSS (1986).

67 [32] J. Corbin, M. Bidoit. "A Rehabilitation of Robinson's Unification Algorithm." IFIP 83, Elsevier Science (1983) 909-914. [33] G. Cousinean, P.L. Curien and M. Manny. "The Categorical Abstract Machine." In Functional Programming Languages and Computer Architccture, Ed. J. P. Jouannaud, SpringerVerlag LNCS 201 (1985) 50-64. [34] P.L. Curien. "Combinateurs cat6goriques, algorithmes s6quentiels et programmation applicative ." Th~se de Doctorat d'Etat, Universit6 Paris VII (Dec. 1983). [35] P. L. Curien. "Categorical Combinatory Logic." ICALP 85, Nafplion, Springer-Verlag LNCS 194 (1985). [36] P.L. Curien. "Categorical Combinators, Sequential Algorithms and Functional Programming." Pitman (1986). [37] H. B. Curry, R. Feys. "Combinatory Logic Vol. I." North-Holland, Amsterdam {1958). [38] D. Van Daalen. "The language theory of Automath." P h . D . Dissertation, Technological Univ. Eindhoven (1980). [39] Luis Damas, Robin Milner. "Principal type-schemas for functional programs ." Edinburgh University (t982). [40] P.J. Downey, R. Sethi, R. Tarjan. "Variations on the common subexpression problem." JACM 27,4 (1980) 758-771. [41] Dummett. "Elements of Intuitionism." Clarendon Press, Oxford (1977). [42] F. Fages. "Formes canoniques dans les algtbres bool~ennes et application g la d6monstration automatique en logique de premier ordre ." Thtse de 3tree cycle, Univ. de Paris VI (Juin 1983). [43] F. Fages. "Associative-Commutative Unification." Submitted for publication (1985). [44] F. Fages, G. Huet. "Unification and Matching in Equational Theories." CAAP 83, l'Aquila, Italy. In Springer-Verlag LNCS 159 (1983). [45] P. Flajolet, J.M. Steyaert. "On the Analysis of Tree-Matching Algorithms." in Automata, Languages and Progfamnfing 7th hit. Coll., Lecture Notes in Computer Science 85 Springer Verlag (1980) 208-219. [46] S. Fortune, D. Leivant, M. O~Donnetl. "The Expressiveness of Simple and Second-Order Type Structures." Journal of the Assoc. for Comp. Mach., 39,1, (Jan. 1983) 151-185. [47] G. Frege. "Begriffschrift, a formula language, modeled upon that of arithmetic, for pure thought." (1879). Reprinted in From Frege to GSdel, J. van Heijenoort, Harvard University Press, 1967. [48] G. Gentzen. "The Collected Papers of Gerhard Gentzen." Ed. E. Szabo, North-Holland, Amsterdam (1969).

68 [49] J.Y. Girard. "Une extension de Finterprc~tation de GSdel "~ l'analyse, et son application l'~timination des coupures dans l'a~tatyse et la th6orie des types. Proceedings of the Second Scandinavian Logic Symposium, Ed. J.E. Fenstad, North Holland (1970) 63-92. [50] J.Y. Girard. "Interprgtation fonctionnelle et 61imination des coupures dans l'arithm~tique d'ordre sup6rieure.' Th~se d'Etat, Universit~ Paris VII (1972). [51] K. G6del. "Uber eine bisher noch nicht benutze Erweitrung des finiten Standpunktes." Dialectica, 12 (1958).. [52] W. D. Goldfarb. "The Undecidability of the Second-order Unification Problem." Theoretical Computer Science, 13, (1981) 225-230. [53] M. Gordon, R. Milner, C. Wadsworth. "A Metalanguage for Interactive Proof in LCF." Internal Report CSR-16-77, Department of Computer Science, University of Edinburgh (Sept. 1977). [54] M. J. Gordon, A. J. Milner, C. P. Wadsworth. "Edinburgh LCF" Springer-Verlag LNCS 78

(1979). [55] W. E. Gould. "A Matching Procedure for Omega Order Logic." Scientific Report 1, AFCRL 66-781, contract AF19 (628)-3250 (1966). [56] J. Guard. "Automated Logic for Semi-Automated Mathematics." Scientific Report 1, AFCRL (1964). [57] J. Herbrand. "Recherches sur la th6orie de la d~monstration." Th~se, U. de Paris (1930). In: Ecrits Iogiques de J~cques Herbrand, PUF Paris (1968). [58] C. M. Hoffmann, M. J. O'Donnell. "Programming with Equations." ACM Transactions on Programming Languages and Systems, 4~1 (1982) 83-112. [59] W. A. Howard. "The formulm-as-types notion of construction." Unpublished manuscript (1969). Reprinted in to H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, Eds Seldin J. P. and Hindley J. R., Academic Press (1980). [60] G. Huet. "Constrained Resolution: a Complete Method for Type Theory." Ph.D. Thesis, Jennings Computing Center Report 1117, Case Western Reserve University (1972). [61] G. Huet. "A Mechanization of Type Theory." Proceedings, 3rd IJCAI, Stanford (Aug. 1973). [62] G. Huet. "The Undccidability of Unification in Third Order Logic." Information and Control 22 (1973) 257-267. [63] G. Huet. "A Unification Algorithm for Typed Lambda Calculus." Theoretical Computer Science, 1.1 (1975) 27--57. [64] G. Huct. "R~solution d'~quations dans des langages d'ordre 1,2, ... w." Th~se d'Etat, Universit~ Paris VII (1976). [65] G. Huet. "Confluent Reductions: Abstract Pioperties and Applications to Term Rewriting Systems." J. Assoc. Comp. Mach. 27,4 (1980) 797-821.

69 [66] G. Huet. "A Complete Proof of Correctness of the Kml~h-Bendix Complcti~m Algorithm ." JCSS 23,1 (1981) 11-21. [67] G: Huet. "Initiation £ ta Th4orie des Cat6gories." Polycopi4 de cours de DEA, Universit6 Paris VII (Nov. 1985). [68] G. Huet. iCCartesian Cloud Categories and Lambda-Calculus." Category Theory Seminar, Carnegie-Mellon University (Dec. 1985). [69] G. Huet, J.M. Hullot. "Proofs by L~duction in Equational Theories With Constructors." JACM 25,2 (1982) 239-266. [70] G. Huet, J.J. Ldvy "Call by Need Computations in Non-Ambiguous Linear Term Rewriting Systems." Rapport Laboria 359, IRIA (Aug. 1979). [71] G. tibet, D. Oppen. "Equations and Rewrite Rules: a Survey." In Formal Languages: Perspectives and Open Problems, Ed. Book R., Academic Press (1980). [72] J.M. Hullot "Compilation de Formes Canonlques dans les Theories Equationnelles ." Th~se de 3~me cycle, U. de Paris Sud (Nov. 80). [73] Jean Pierre Jouannaud, Hclene Kirchner. "Completion of a set of rules modulo a set of equations." (April 1984). [74] L.S. Jutting. ~A translation of Landau's "Grundlagen" in AUTOMATH." Eindhoven University of Technology, Dept of Mathematics (Oct. 1976). [75] L.S. van Benthem Jutting. "The language theory of Aoo, a typed A-calculus where terms are types." Unpublished manuscript (1984). [76] G. Kahn, G. Plotkin. "Domaines concrets ." Rapport Laboria 336, IRIA (D6c. 1978). [77] J. Ketonen, J. S. Weening. "The language of an interactive proof checker." Stanford University (1984). [78] J. Ketonen. "EKL-A Mathematically Oriented Proof Checker." 7th International Conference on Automated Deduction, Napa, Califonfia (May 1984). Springer-Verlag LNCS 170. [79] J. Ketonen. "A mechanical proof of Ramsey theorem." Stanford Univ. (1983). [80] S.C. Kleene. "Introduction to Meta-mathematics." North Holland (1952). [81] S.C. Kleene. "On the interpretation of intuitionistic number theory." J. Symbolic Logic 31

(1945). [82] S.C. Kleene. "On the interpretation of intuitionistic number theory." J. Symbolic Logic 31 (1945). [83] J.W. Klop. "Combinatory ~eduction Systems." Ph. D. Thesis, Mathematisch Centrum Amsterdam (1980). [84] D. Knuth, P. Bendix. "Simple word problems in universal algebras". In: Computational Problems in Abstract Algebra, J. Leech Ed., Pergamon (1970) 263--297.

70 [85] D.E. Knuth, J. Morris, V. Pratt. '~Fast Pattern MatchiIig in Strings." SIAM JouruM on Computing 6~2 (1977) 323-350. [86] G. Kreisel. "On the interpretation of nonfinitist proofs, Part I, II." JSL 16 (1952, i953). [87] J. Lambek. "From Lambda-calculus to Cartesian Closed Categories." in To H. B. Curry: Essays on Combinatory Logic, Lambda-calculus and Formalism, Eds. J. P. Seldin and J. R. Hindley, Academic Press (1980). [88] J. Lambek and P. J. Scott. "Aspects of Higher Order Categorical Logic." Contemporary Mathematics 30 (1984) 145-174. [89] P. J. Landin. "The next 700 programming languages." Comm. ACM 9,3 (1966) 157-166. [90] Philippe Le Chenadec. "Formes canoniques dans les alg~bres finiment pr~sent~es ." Th~se de 3~me cycle, Univ. d'Orsay (Juin 1983). [91] D. Leivant. "Polymorphic type inference." 10th ACM Conference on Principles of Programruing Languages (1983). [92] D. Leivant. "Structural semantics for potymorphic data types." 10th ACM Conference .on Principles of Programming Languages (1983). [93] J.J. L~vy. "R~ductions correctes et optimales dans le A-calcul," Th~se d'Etat, U. Paris VII (1978). [94] S. MacLane. "Categories for the Working Mathematician." Springer-Verlag (1971). [95] D. MacQueen, G. Plotkin, R. Sethi. "An ideal model for recursive polymorphic types." Proceedings, Principles of Programming Languages Symposium, Jan. 1984, 165-174. [96] D. B. MacQueen, R. Sethi. "A semantic model of types for applicative languages." ACM Symposium on Lisp and Functional Programming (Aug. 1982). [97] E.G. Manes, "Algebraic Theories." Sprh~ger-Verlag (1976). [98] C. Mann. "The Connection between Equivalence of Proofs and Cartesian Closed Categories." Proc. London Math. Soc. 31 (1975) 289-310. [99] A. Martelli, U. Montanari. "Theorem proving with structure sharing and efficient unification." Proc. 5th IJCAI, Boston, (1977) p 543. [100] A. Ma~'telli, U, Montanari. "An Efficient Unification Algorithm ." ACM Trans. on Prog. Lang. and Syst. 4~2 (1982) 258-282. [101] William A. Martin. "Determining the equivalence of algebraic expressions by hash coding." JACM 18,4 (1971) 549-558. [102] P. Martin-LSf. "A theory of types." Report 71-3, Dept. of Mathematics, University of Stockholm, Feb. 1971, revised (Oct. 1971). [103] P. Martin-L6f. "About models for intuitionistie type theories and the notion of definitional equality." Paper read at the OrlSans Logic Conference (1972).

71 [104] P. Martin-Lhf. "An intuitionistic Theory of Types: predicative part." Logic Colloquium 73, Eds. H. Rose and J. Sepherdson, North-Holland~ (1974) 73-118. [105] P. Martin-LhL "Constructive Mathematics and Computer Programming." In Logic~Methodology and Philosophy of Science 6 (1980) 153-175, North-Holland. [106] P. Martin-Lhf. "Intuitionistic Type Theory." Studies in Proof Theory, Bibliopotis (1984). [107] J. Mc Carthy. "Recursive functions of symbolic expressions and their computation by machine." CACM 3,4 "(1960) 184-195. [108] N. McCracken. "An investigation of a programming language with a polymorphic type structure." Ph.D. Dissertation, Syracuse University (1979). [109] D.A. Miller. "Proofs in Higher-order Logic." Ph. D. Dissertation, Carnegie-Mellon University (Aug. 1983). [110] D.A. Miller. "Expansion tree proofs and their conversion to natural deduction proofs." Technical report MS-CIS-84-6, University of Pennsylvania (Feb. 1984). [111] R. Milner. "A Theory of Type Polymorphism in Programming." Journal of Computer and System Sciences 17 (1978) 348-375. [112] R. Mitner. "A proposal for Standard ML ." Report CSR-157-83, Computer Science Dept., University of Edinburgh (1983). [113] R.P. Nederpelt. "Strong normalization in a typed A calculus with ~ structured types." Ph. D. Thesis, Eindhoven University of Technology (1973). [114] R.P. Nederpelt. "An approach to theorem proving on the basis of a typed ;~-calcutus2 5th Conference on Automated Deduction, Les Arcs, France. Springer-Verlag LNCS 87 (1980). [115] G. Nelson, D.C. Oppen. "Fast decision procedures based on congruence closure." JACM 27,2 (1980) 356-364. [116] M.H.A. Newman. "On Theories with a Combinatorial Definition of "Equivalence"." Annals of Math. 43,2 (1942) 223-243. [I17] B. Nordstrhm. "Programming in Constructive Set Theory: Some Examples." Proceedings of the ACM Cor~erence on Functional Programming Languages and Computer Architecture, Portnmuth~ New Hampshire (Oct. I981) 141-154. [t18] B. Nordstrhm. "Description of a Simple Programming Language." Report 1, Programming Methodology Group, University of Goteborg (Apr. 1984). [119] B. Nordstrhm, K. Petersson. ~'Types and Specifications." Information Processing 83, Ed. R~ Mason, North-Holland, (1983) 915-920. [120] B. Nordstrhm, J. Smith. "Propositions and Specifications of Programs in Martin-L6f's Type Theory." BIT 24, (1984) 288-301. [121] A. Obtulowicz. "The Logic of Categories of Partial ~nctions and its Applications." Dissertationes Mathematicae 241 (1982).

72 [1221 M.S. Paterson, M.N. Wegman. ~Linear Unification ." J. of Computer and Systems Sciences 16 (1978) 158-167. [123] L. Paulson. "Recent Developments in LCF : Examples of structural induction." Technical Report No 34, Computer Laboratory, University of Cambridge (Jan. 1983). [124] L. Paulson. "Tactics and Tacticals in Cambridge LCF." Technical Report No 39, Computer Laboratory, University of Cambridge (July 1983). [125] L. Paulson. "Verifying the unification algorithm in LCF." Technical report No 50, Computer Laboratory, University of Cambridge (March 1984). [126] L. C. Paulson. ~'Constructing Recursion Operators in Intuitionistic Type Theory." Tech. Report 57, Computer Laboratory, University of Cambridge (Oct. 1984). [127] G.E. Peterson, M.E. Stickel. "Complete Sets of Reductiol~ for Equational Theories with Complete Unification Algorithms ." JACM 28,2 {1981) 233-264. [128] T. Pietrzykowski, D.C. Jansen. "A complete mechanization of c0-order type theory." Proceedings of ACM Annual Conference (1972). [129] T. Pietrzykowski. "A Complete Mechanization of Second-Order Type Theory." JACM 20 (1973) 333-364. [130] D. Prawitz. "Natural Deduction." Ahnqist and Wiskell, Stockolm (1965). [131] D. Prawitz. "Ideas and results in proof theory." Proceedings of the Second Scandinavian Logic Symposium (1971). [132] PRL staff. "Implementing Mathematics with the NUPRL Proof Development System." Computer Science Department, Cornelt University (May 1985). [133] H. Rasiowa, R. Sikorski "The Mathematics of Metamathematics." Monografie Matematyczne tom 41, PWN, Polish Scientific Publishers, Warszawa (1963). [134] J. C. Reynolds. "Definitional Interpreters for Higher Order Programming Languages." Proc. ACM National Conference, Boston, (Aug. 72) 717-740. [135] J. C. Reynolds. "Towards a Theory of Type Structure." Programming Symposium, Paris. Sprlnge~ Verlag LNCS 19 (t974) 408-425. [136] J. C. l'teynolds. "Types, abstraction, and parametric polymorphism." IFIP Congress'83, Paris (Sept. 1983). [137] J. C. Reynolds. "Potymorphism is not set-theoretic." International Symposium on Semantics of Data Types, Sophia-Antipolis (June 1984). [138] J. C. Reynolds. "Three approaches to type structure." TAPSOFT Advanced Seminar on the Role of Semantics in Softwire Development, Berlin (March 1985). [139] J. A. Robinson. "A Machine-Oriented Logic Based on the Resolution Principle ." JACM 12 (1965) 32-41.

73 [140] J. A. Robinson. "ComputationM Logic: the Unification Computation ." Machine Intelligence 6 Eds B. Meltzer and D.Michie, American Elsevier, New-York (1971). [141] D. Scott. "Constructive validity. ~ Symposium on Automatic Demonstration, Springer-Verlag Lecture Notes in Mathematics, 125 (1970). [142] D. Scott. "Relating Theories of the Lambda-Calculus." in To H. B. Curry: Essays on Combinatory Logic, Lambda-calculus and Formalism, Eds. J. P. Scldin and J. R. Hindley, Academic Press (1980). [143] J.R. Shoenfield. "Mathematical Logic2 Addison-Wesley (1967). [144] R.E. Shostak ~Deciding' Combinations of Theories." JACM 31,1 (1985) 1-12. [145] J. Smith. "Course-of-values recursion on lists in intuitionistic type theory." Unpublished notes, GSteborg University (Sept. 1981). [146] J. Smith. "The identification of propositions and types in Martin-Lofts type theory : a programming example. ~ International Conference on Foundations of Computation Theory, Borghotm, Sweden, (Aug. 1983) Springer-Verlag LNCS 158. [147] It. Statman. "intuitionistic Propositional" Logic is Polynomial-space Complete." Theoretical Computer Science 9 (1979) 67-72, North-Holland. [148] I~. Statman. "The typed Lambda-CaIculus is not Elementary Recursive." Theoretical Computer Science 9 (1979) 73-81. [149] S. Stenlund. "Combinators h-terms, and proof theory." P~eidel (1972). [150] M.E. Stickel ~A Complete Unification Algorithm for Associative-Commutative Functions." JACM 28,3 (1981) 423-434. [151] M.E. Szabo. "Algebra of Proofs." North-Holland (1978). t152] W. Tait. "A non constructive proof of Gentzen's Hauptsatz for second order predicate logic." Bull. Amer. Math. Soc. 72 (1966). [153] W. Tait. "Intensional interpretations of functionals of finite type I." J. of Symbolic Logic 32 (1967) 198-212. I154] W. Tait. '~A Realizability Interpretation of the Theory of Species." Logic Colloquium, Ed. R. Parikh, Springer Verlag Lecture Notes 453 (1975). [155] M. Takahashi. "A proof of cut-elimination theorem in simple type theory." J. Math. Soc. Japan 19 (1967). [156] G. Takeuti. "On a generalized logic calculus." Japan J. Math. 23 (1953). [157] G. Takeuti. "Proof theory." Studie.~ in Logic 81 Amsterdam (1975). [158] R. E. Tarjan. "Efficiency of a good but non linear set union algorithm." JACM 22,2 (1975) 215-225.

74

[159] t~.. E. Tarjan, J. van Leeuwen. "Worst-case Analysis of Set Union Algorithms." JACM 31~2

(19s5) 245-2sl. [160] A. Tarski. "A lattice-theoretical fixpoint theorem and its applications." Pacific J. Math. 5 (1955) 285-309. [161] D.A. ~lrner. "Miranda: A non-strict functional language with polymorphic types." In ~ n c tional Programming Languages and Computer Architecture. Ed. J. P. Jouannaud~ SpringerVerlag LNCS 201 ([985) 1-16. [162] R. de Vrijer "Big Trees in a )t-calculus with )t-expressions as types," Conference on A-calculus and Computer Science Theory, Rome, Springer-Verlag LNCS 37 (1975) 252-271. [163] D. Warren "Applied Logic - Its use and implementation as a programming tool." Ph.D. Thesis~ University of Edinburgh (1977).

A n Introduction to A u t o m a t e d D e d u c t i o n M a r k E. Stickel Artificial Intelligence Center SRI International M e n l o P a r k , C a l i f o r n i a 94025

Contents 1

Introduction

2 Resolution 2.1 Elimination of Tautologies 2.2 Purity 2.3 Subsumption 2.4 Set of Support 2.5 P1 and N1 Resolution 2.6 Hyperresolution 2.7 Unit Resolution 2.8 Unit-Resulting Resolution 2.9 Input Resolution 2.10 Prolog 2.11 Linear Resolution 2.12 Model Elimination 2.13 Prolog Technology Theorem Prover 2.14 Connection-Graph Resolution 2.15 Nonclausal Resolution 2.16 Connection Method 2.17 Theory Resolution 2.18 Krypton Unification 3.1 Unification in Equational Theories 3.2 Commutative Unification 3.3 Associative Unification 3.4 Associative-Commutative Unification 3.5 Many-Sorted Unification 4

Equality Reasoning 4.1 Equality Axiomatization 4.2 Demodulation 4.3 Paramodulation 4.4 Resolution by Unification and Equality 4.5 E-Resolution 4.6 Knuth-Bendix Method References

76

1

Introduction

In this chapter, we present an informal introduction to many of the methods currently used in automated deduction. The principal method for theorem proving that we discuss is resolution, but we are also substantially concerned with extending the resolution framework to reason more efficiently about particular theories. The chapters by G~rard Huet and Wolfgang Bibel complement this one. In this chapter, we treat classical logic and classical (if resolution, developed in 1963, can be considered classical!) methods of theorem proving. Huet considers systems that merge the notions of computation and deduction and Bibel extends classical reasoning to nonmonotonic reasoning, metalevel reasoning, and reasoning about uncertainty.

Increasingly many good books present various parts of the material that we informally give here. Following are brief descriptions of some of them and their strengths. Chang and Lee [12] was the first textbook for resolution, paramodulation, and unification, and it is still very useful. Loveland [52] and Bibel [5] are newer texts that are exceptionally strong in the areas of linear refinements of resolution and the connection method, which they have developed, respectively. Wos et al. [88] is written for a wider audience and reflects their practical experience in theorem proving; it is especially strong in the areas of deciding how to formalize problems and to select strategies for their solution. Kowalski [39] emphasizes the important connections between automated deduction and logic programming. Manna and Waldinger [56] (unfortunately, only Volume 1 is available so far) and Gallier [22] are new textbooks in symbolic logic that are oriented toward computer science and automated deduction. Some of the topics in this chapter are too new or specialized to be included in any textbooks, but references are included at the end of the chapter for those who want to learn more.

2

Resolution

One of the most important procedures for automated deduction is resolution [66]. Its application to propositional calculus theorem proving will be examined first. The language of the propositional calculus includes a set of propositional symbols P, Q, R, and the like, the logical connectives ~ v , / \ , ~ , and ---, the logical constants true and false, and parentheses.

A formula of the propositional calculus is one of

77 ® The atomic formula or atom P, where P is a propositional symbol • The negation -,A, where A is a formula of the propositional calculus • The disjunction (AVB), where A and B are formulas of the propositional calculus • The conjunction (A A B), where A and B are formulas of the propositional calculus • The implication (A D B), where A and B are formulas of the propositional calculus • the equivalence (A =-- B), where A and B are formulas of the propositional calculus. Since V, A, and ~- are associative, they are often treated as n-ary operators for arbitrary n so that, for example, (A V (B V C)) and ((A V B) v C) can both be written as (A V B V C). The 7, V, A, D, and = are ordered by declining operator precedence. For convenience, parentheses can be omitted where precedence can be used to determine the correct reading. Thus, for example, ((A A B) D C) coon also be expressed by A A B D C. Subformulas can be classified as occurring with positive polarity (positively) or with negative

polarity (negatively). A subformula occurs positively in a formula if it is embedded in an even number of explicit or implicit negations (equivalences and left-hand sides of implications implicitly negate formulas). A subformula occurs negatively in a formula if it is embedded in an odd number of explicit or implicit negations. Thus, for example, A occurs positively in A, A V B , A A B , B D A, and A - B and negatively in -~A, A D B, and A - B. Note that A and B and their subformulas occur both positively and negatively in A ----B.

An interpretation of a formula is an assignment of the truth values true or false to each propositional symbol in the formula. The value of a formula in an interpretation can be computed using the following truth table:

A

B

-~A

-~B

(AVB)

(AAB)

(ADB)

(A=B)

true

true

false

false

true

true

true

true

true

false

false

true

true

false

true

false

false

true

true

false

true

false

false

false

false

false

true

true

false

false

true

true

Given a formula of the propositional calculus and an interpretation of it, the value of the formula in the interpretation can be computed by replacing propositional symbols in the formula

78 by their values in the interpretation and reducing the formula by means of the t r u t h table to either true or false. An interpretation satisfes a formula and is a model of it if the formula is true in the interpretation. A formula is valid if and only if every interpretation is a model and unsatisfiable if and only if no interpretation is a model. T h e process of determining validity of a formula by the t r u t h - t a b l e m e t h o d is exponential in the worst case, requiring determination of the value of the formula in each of 2 ~ interpretations, where n is the n u m b e r of propositional symbols appearing in the formula. A l t h o u g h all known algorithms for determining validity are exponential, because propositional validity is an NPcomplete problem, methods such as resolution generally yield better performance t h a n truth-table evaluation as well as being more readily extended to theorem proving in the first-order predicate calculus. Resolution is a refutation procedure. Instead of determining the validity of a formula directly, it determines the unsatisfiability of its negation. Thus, the first step in the use of the resolution procedure is to negate the formula to be proved valid. In the case where it is intended to prove a theorem from a set of axioms, i.e., the formula is of the form A1 A • • • A A , D B, where the A~ are axioms and B is the theorem, t h e n negating the formula results in formation of the conjunction A1 A -.. A An A ~ B , i.e., only the theorem needs to be negated. For most forms of resolution, the formula must then be transformed into clause form.

A literal is either a propositional symbol (e.g., P ) or the negation of a proposition symbol (e.g., -~P).

The former are positive literals and the latter are negative literals. A clause is a

disjunction L1 V . - . V L,~ of literals. The logical constant false is sometimes referred to as the empty clause, because it can be viewed as the disjunction of zero literals. A unit clause is a clause with exactly one literal. More generally, an n-clause for n = 1, 2, 3 , . . . is a clause with exactly n literals. A clause with at least two literals is called a nonunit clause.

A positive clause is a clause all of whose literals are positive. A negative clause is a clause all of whose literals are negative. (The e m p t y clause can be considered b o t h positive and negative.)

A mixed clause is a clause which is neither positive nor negative, i.e., it has at least one positive and at least one negative literal. A Hcrn clause is a clause with at most one positive literal. A pair of literals is complementary if one is positive and the other is negative and their propositional symbols are the same. A formula is in clause form if it is a conjunction C1 A ... A C~ of clauses Ci.

Given that

79

conjunction and disjunction are associative, commutative, and idempotent, a formula is often regarded as a set of clauses, with each clause being a set of literals. A formula can be transformed to clause form by application of the following rewrites until the formula cannot be rewritten any further: (A ~ B) -* ( ( - A V B) A (-~B V A)) (A D B) --* (-~A V B) -~-~A ~ A -~(A V B) --* (-~A A -~B) ~(A A B) --* (-~A V -~B) (A V (B A C)) -+ ( ( A V B ) A ( A V C ) ) ( ( B A C ) V A) -* ((B V A) A (C V A)) Note that the clause form of a formula is not necessarily unique. For example, (P = Q) A (Q ~ R) A (R ~ P) is equivalent to both ( ~ P v Q) A (~Q v R) A (-~R v P) and (~Q v P) A (-~R V Q) A (-~P v R). The resolution rule of inference states that the resolvent clause A V B can be derived from the

parent clauses P V A and -~P v B, where P is a propositional symbol and A and B are arbitrary clauses. Thus, false can be obtained by resolving the clauses P and -~P, Q can be obtained by resolving the clauses P and ~ P v Q, and Q v R can be obtained by resolving the clause P v R and -~P V Q. The order of the literals in the clauses is unimportant. For example, P V A denotes any clause that is the disjunction of P and the literals, if any, of A. Clauses that are derived from other clauses are referred to as derived clauses. Other clauses, i.e., those that were given as inputs to the deduction system, are referred to as input clauses. The resolution rule of inference is an extension of the standard modus ponens rule in logic, which permits the derivation of Q from P and P D Q whose clause form is ~ P v Q. A set of clauses is unsatisfiable if and only if the empty clause false is derivable from the set of clauses by resolution, and resolution is refutation complete or simply complete. Following is a resolution proof of the unsatisfiability of { P V Q, P v -~@,~ P v @, ~ P v -~Q} (note t h a t idempotence of V is used to automatically replace clauses of the form P v P v C by the equivalent P v C):

1. 2. 3. 4.

PvQ Pv-~Q -~PV@ =P V -~Q

B0

5. P 6. -~P

7. false

resolve 1 and 2 resolve 3 and 4 resolve 5 and 6

The following sections will be concerned with various refinements of resolution. These refinements will be described in terms of resolution applied to the propositional calculus, but can readily be extended to apply to the first-order predicate calculus. But first consider the general requirements for the resolution in the first-order predicate calculus. The first-order predicate calculus includes variables and n-ary predicate and function symbols. Propositional symbols are essentially 0-ary predicate symbols. Constant symbols are 0-ary function symbols. The first-order predicate calculus also adds the V and 3 quantifiers.

A term of the first-order predicate calculus is one of • A variable symbol

• f ( Q , . . . , t , ) , where f is an n-ary function symbol and tl . . . . , t , are terms. A formula of the first-order predicate calculus is one of: • The atomic formula or atom P ( t l , . . . , t,), where P is an n-ary predicate symbol and Q , . . . , t , are terms. • The universal quantification (VxA), where x is a variable and A is a formula of the first-order predicate calculus. • The existential quantification (3xA), where x is a variable and A is a formula of the first-order predicate calculus. • The negation, disjunction, conjunction, implicatio~ and equivalence of first-order predicate calculus formulas, defined analogously to formulas of the propositional calculus. The intuitive interpretation of the quantified formulas is that VxA means that A is true for every value of x and 3xA means that A is true for some value of x. VxA is equivalent to -~3x-~A and 3xA is equivalent to -~Vx-~A. The quantifier Yx or 3x binds the variable x (and x is bound by the quantifier) in VxA or 3xA. Resolution operates on unquantified formulas, so it is necessary to remove quantifiers from quantified formulas by skolemization. The skolemized formula is unsatisfiable if and only if the original formula is unsatisfiable.

81

The concept of quantifier force is used to deal with the fact t h a t a universal q u a n t i f e r behaves like an existential quantifier and vice versa if it appears inside a negation. If A is a formula and

VxB is a subformula of A, then Vx has universal force in A if VxB occurs positively in A and existential force in A if it occurs negatively in A. Similarly, if A is a formula and 3xB is a subformula of A, t h e n 3x has existential force in A if 3xB occurs positively in A and universal

force in A if it occurs negatively in A. Let A be a formula to be tested for unsatisfiability. Assume t h a t A has no u n b o u n d variables and t h a t each quantifier binds a different variable. These conditions can be achieved by adding universal quantifiers to the beginning of the formula for u n b o u n d variables and r e n a m i n g variables. Assume further t h a t every quantifier in A is of universal force or existential force, but not both, i.e., no quantifier appears inside an equivalence. If some quantifier appears inside an equivalence B -- C, the equivalence must be replaced by an equivalent formula such as (B D C) A (C D B). Let QxB be a subformula of A where Qx is a quantifier of existential force. Let QlxlA1, ...,

Q , x , A , (n > 0) be the successively smal!er quantified subformulas (each A~ contains Qxi+lA~+l) of A t h a t contain QxB where each Q~x~ is a quantifier of universal force. T h e n replace QxB in A by the formula B w i t h every occurrence of x replaced by the t e r m e if n = 0 or f ( x l , . . . ,x~) if n > 1, where c is a new Skolem constant or f a new Skotem function, i.e., one t h a t does not already appear in the formula. This process is repeated until no quantifiers of existential force remain at which point all remaining quantifiers can be removed leaving an unquantified formula. Skolemization is often described as if it applied only to formulas in prcnex form, i.e., those of the form Qlxl, ..., QnxnA where A contains no quantifiers. However, this restriction is unnecessary and has the disadvantage t h a t skolemizing a formula after its conversion to an equivalent formula in prenex form m a y lead to skolem functions having more arguments t h a n necessary. For example, to prove t h a t J o h n has a father from the s t a t e m e n t t h a t everyone has a father, it is necessary to refute the formula Vx3yFather(x,y) A -~3zFather(John, z). This can be skolemized to Father(x, f(x)) A ~Father(John, z). A single step resolution refutation exists with the substitution of John for x and f(John) for z. T h e Skolem function f can sometimes be intuitively interpreted such t h a t f is the function of its arguments x l , . . . , x , t h a t computes the value required for the containing expression to be true. For example, in Father(z, f ( x ) ) , f(x) can be t h o u g h t of as referring to the father of x. An expression is called ground if it contains no variables.

A set of ground clauses can be

regarded as a syntactic variation of clauses of the propositional calculus. A set of clauses S is unsatisfiable if and only if there is an unsatisfiable set of ground clauses S r such t h a t each clause

82 in S t is an instance of a clause in S. Note that a single clause in S may require more than one instance in S I for S ~ to be unsatisfiable. For example, the set of clauses S consisting of

P(a) V P(b)

and

-~P(x) is unsatisfiable, but

S' contains two instances

-~P(a) and -~P(b) of -~P(x).

When instantiating clauses in S, it is only necessary to consider replacing variables by terms construetible from symbols occurring in S (the

Hcrbrand universe of

S), i.e., no new function

or constant symbols need be introduced. An exception is that if S contains variables but no constant symbols then a single constant symbol is added. Before resolution was developed, some proof procedures successively formed instantiations of S by replacing variables by terms in the Herbrand universe of S in ascending order of term complexity. The resulting sets S f were then tested for unsatisfiability. This approach is inefficient because the instantiation process is not well directed to finding the specific instances of variables that lead to the result being unsatisfiable. Resolution is an important inference procedure for two reasons. First, as described above, it is a single inference rule for determining the unsatisfiability of sets of clauses of the propositionM calculus. Second, it instantiates variables in a manner that is more directed to finding an unsatisfiable instantiation. When resolving two clauses of the first-order predicate calculus, two literals are resolved on and the remaining literals are disjoined to form the resolvent, just as for propositional calculus clauses. However, there are two differences. First, two clauses of the first-order predicate calculus are

standardized apart before

being re-

solved, i.e., variables of one or both of the clauses are renamed so that the two clauses have no variables in common. This is valid because a set of clauses whose unsatisfiability is being determined is considered to be the conjunction of a set of universally quantified clauses, and any pair of conjoined formulas VxP(x)

A VxQ(x)

The set of clauses consisting of

P(a, x)

is equivalent to a variable renamed one VxP(x) and

-~P(x, b) is unsatisfiable, but P(a, x)

and

A VyQ(y).

P(x, b) have

no common instance as is required for a resolution operation. After renaming variables, however,

P(a, x)

and

P(y, b) have

the common instance

Second, resolution finds by

unification a

P(a, b),

and resolution is possible.

most general substitution that makes a pair of lit-

erals complementary. This substitution is then applied to the remaining literals in forming the resolvent. For example, if substitution that makes

P(a, x)

P(a,x)

-~P(y, b) v R(y)

are resolved, the most general

-~P(y,b) complementary is

the substitution of b for x and

v Q(x) and

and

a for y, and the resolvent is Q(b) v

R(a).

By finding most general substitutions to make pairs

of literals from pairs of clauses complementary, resolution progressively finds instantiations of

83 clauses that might lead to a ground refutation. Completeness arguments for resoIution for first-order predicate calculus generally rely on lifting theorems. These show how a resolution refutation of S t whose clauses are ground instances of clauses in S can be imitated by a resolution refutation of S. The fact that two or more Iiterals in a clause can be collapsed into a single literal in a ground instance is a complication. For example, the set of clauses consisting of instances

P(a)

and

P(x) V P(y)

-~P(a) of the

-~P(u) V -~P(v) only leads to

-~P(u) V -~P(v) is

and

unsatisfiable because ground

clauses are contradictory. However, resolving

resolvents like

P(y) V -~P(v), P(y) v -~P(u), and

P(x) V P(y)

and

the like. Resolving

these resolvents among themselves and with the original clauses also yields no progress toward a refutation. In fact, every resolvent has two literals, and the empty clause can never be derived. There are two solutions to this difficulty. One is to model resolution for general clauses directly on resolution for ground instances so that there is a general resolution step corresponding to each step in the refutation of the ground instances. To accomplish this, it is necessary to resolve on possibly more than one literal of each clause simultaneously. For example, the set of clauses consisting of P(x)V P(y) and

-~P(u)V -~P(v) has

ground instances P(a) and

-~P(a) with

a single-

step resolution refutation. The general resolution operation will then have to find a most g e n e r a l substitution that make all of

P(x), P(y), P(u),

P(v)

and

identical (for example, by substitution

of x for y, u, and v). The second solution entails the addition of the

factoring

operation. The resolution rule for

general clauses resolves on only a single pair of literaIs as in the case of ground clause, but the additional factoring operation adds clause instances

(factors) that

or more literals of a clause so that they are identical. Thus,

~P(u)

is a factor of

~P(u) v -~P(v).

P(x)

result from instantiating two is a factor of

P(x) v P(y)

and

The factors can then be resolved so that they result in

the empty clause. When more than a single pair of literals must be collapsed to one in a factor (e.g., two separate pairs of literals must each be collapsed to single literals, or three literaIs must all be collapsed to a single literal), all the factors can be generated by successively applying the factoring operation to single pairs of Iiterals. Although resolution is complete, it is not very efficient when measured in terms of the size of the search space for a resolution refutation. Since the development of resolution, many refinements have improved its efficiency. Some, such as elimination of tautologies and subsumption, discard useless or redundant results. Many restrict which pairs of clauses are allowed to resolve with each other. Some of these restrictions, such as set of support~ preserve completeness, while other, such as unit resolution, are complete for only some sets of clauses.

84

2.1

Elimination

o f Tautologies

A clause that contains both a literal and its negation is a tautology that can, in most resolution procedures, be discarded. The exceptions among the procedures discussed here are model elimination, which uses chains instead of clauses, but may require retaining chains with complementary iiterals, and some forms of theory resolution. In general, the rationale for being able to discard tautologies is that they can be evaluated as true by truth-functional rules and, thus, cannot contribute to the falsity of a conjunction of clauses.

Purity

2.2

A literal whose complement does not appear in a set of clauses is called pure. Because a pure literal can never be resolved on and thus eliminated, any clause containing a pure literal can never appear in the derivation of the empty clause. Thus, all clauses containing pure literals can be safely deleted.

Subsumption

2.3

A clause C subsumes a clause D if C's literals are a subset of D's literals. It requires an equal amount of work or less to derive the empty clause from C as D if C subsumes D. Thus, D can be eliminated if C subsumes D. Two forms of subsumption can be employed: forward subsumption or the discarding of a newly derived clause that is subsumed by a clause that is already present, i.e., an input clause or a previously derived clause, and backward subsumption or the discarding of clauses already present that are subsumed by a newly derived clause. Normally, when a clause is derived, it should be tested for elimination by forward subsumption before being used to eliminate other clauses by backward subsumption. If these operations are performed in the opposite order, then some clause necessary to a refutation may be continually derived and then eliminated shortly thereafter by backward subsumption without ever being used, because the search strategy may order inference operations partially based on the age of the clause.

2.4

Set of Support

Resolution is often used to prove a theorem from a set of axioms that is known to be satisfiable. However, unrestricted resolution does not distinguish between clauses that are created from

ax-

85 iotas

(axiom clauses) and

those created from the [negation of the] theorem

(theorem clauses)--all

are treated alike. A refutation cannot be found from the satisfiable set of axiom clauses alone; a refutation must depend upon the theorem clauses. The set of support restriction was developed to take advantage of this necessary dependency and make resolution more goal-directed. The set of support restriction [90] is a complete restriction of resolution that requires division of the total set of clauses S into disjoint subsets T and S - T such that S - T is a satisfiable set of clauses. The set of clauses created from the theorem is typically designated as the set of support T. Axiom clauses would then comprise S - T. The set of support restriction allows two clauses to be resolved only if at least one of the clauses is

supported by T - - i s

in T or has an ancestor clause in T. This can substantially reduce

the size of the search space and makes the procedure more goal-directed because every derived clause is derived from a theorem clause. An alternative definition of the set of support restriction is that it allows two clauses to be resolved only if it is not the case that both are in S - T.

When all the theorem clauses are

designated as the set of support T, this means that the only unallowed resolution operations are those between axiom clauses. Even if a problem is not posed in terms of a satisfiable set of axioms and a theorem so that the theorem clauses can be designated as the set of support, the set of support restriction can still be used. Syntactic criteria can be used to designate a set of support. Any unsatisfiable set of clauses must include at least one positive clause and at least one negative clause. The interpretation that assigns

false

(resp,

true)

to every atom is a model for

a set of clauses that contains no positive (resp., negative) clauses. Thus, the set of all positive clauses or the set of all negative clauses can be designated as the set of support because it is guaranteed that the set of remaining clauses is satisfiable. Note that the set of support restriction is only complete if the set of clauses outside the set of support is satisfiable. The set of clauses {P, -~P, -~Q} cannot be refuted if only ~Q is in the set of support. Therefore, Q cannot be proved from P and -~P, if only the negated theorem is used as the set of support. Logic is sometimes criticized as being unsuitable for artificial-intelligence applications because anything, e.g., Q, can be proved from an inconsistency, e.g., P and -~P. Although it is hard to argue that an inconsistent set of axioms is desirable, the critics claim that large collections of axioms about the real world may inadvertently be inconsistent, and that it would be undesirable to conclude irrelevant statements from an inconsistency in the axioms, The set of support restriction

86 provides some protection from this problem. Its failure to prove Q from P and -~F implies that the inconsistency must be connected to the theorem via resolution operations and, hence, must in some sense be relevant to the conclusion for a set of support refutation to succeed. It is worth considering whether there is a relationship between the set of support restriction and the logic of entailment or relevant implication.

2.5

P1 and

NI Resolution

P1 resolution [67] is the restriction of resolution that requires that one of the parent clauses to each resolution operation must be a positive clause. P1 resolution can be viewed as an extension of the set of support restriction and is also complete. Using the set of support restriction, it is legitimate to designate the set of all positive clauses as the set of support. Resolution operations between input clauses will then require one parent to be a positive clause as desired. However, with just the set of support restriction, any derived clause can be resolved with any other clause and the intended restriction that one of the parent clauses to each resolution operation must be positive will not be obeyed. After each resolution operation, the resulting set of clauses is unsatisfiable provided the initial set of clauses is unsatisfiable. Thus, the set of support restriction (with the set of all positive clauses designated as the set of support) can be applied to each set of clauses resulting after performing a resolution operation and not just to the initial set of clauses, effectively imposing the desired restriction that one parent clause of each resolution operation be a positive clause. The primary importance of P1 resolution is its relation t o hyperresolution.

N1 resolution is the restriction of resolution that requires that one of the parent clauses to each resolution operation must be a negative clause. It is defined analogously to P1 resolution and has similar properties.

2.6

Hyperresolution

Hyperresolution is a more efficient version of P1 resolution. Although ordinary resolution operations take two clauses as arguments, hyperresolution is the first of several operations to be discussed that may require an arbitrary number of arguments. Each hyperresolution operation takes a single mixed or negative clause, termed the nucleus, as one of its arguments and as many positive clauses, termed electrons, as there are negative literals in the nucleus as the other arguments and produces a positive clause result. Each negative literal

87 of the nucleus is resolved with a literal in one of the electrons. The hyperresolvent consists of all the positive literals of the nucleus disjoined with the unresolved on literals of the electrons. The completeness of hyperresolution can be used to prove the claim that an unsatisfiable set of Horn clauses never needs to contain more than one negative clause (if a set of Horn clauses has more than one negative clause, then at least one of the negative clauses alone is unsatisfiable with the positive and mixed clauses). Results of hyperresolution operations are always positive clauses. Thus, any negative clause can only be a parent to the empty clause in a hyperresolution operation. But, because the empty clause needs to be derived only once in a refutation, it is unnecessary for more than one negative clause to be used.

Negative hyperresolution is exactly the same as hyperresolution except it is an efficient version of N1 instead of P1 resolution and thus derives negative instead of positive clauses.

2.7

Unit Resolution

Unit resolution [11] is the restriction of resolution that requires at least one of the parent clauses in each resolution operation be a unit clause. This is an appealing restriction when considered from the point of view of implementation and efficiency because a resolvent always has fewer Iiterals than its longer parent clause. Because the goal in resolution theorem proving is to derive the empty clause, shorter clauses are "closer" to the goal than longer clauses. Thus, unit resolution always appears to be making progress toward the goal. Unit resolution is obviously incomplete because not every unsatisfiable set of clauses contains a unit clause; however, most importantly, it is complete for sets of Horn clauses. The completeness of unit resolution for Horn clauses is easily shown. PI resolution is complete for arbitrary sets of clauses and can thus be used to refute sets of Horn clauses. Because in sets of Horn clauses all positive clauses are also unit clauses, it is apparent that every PI resolution operation is also a unit resolution operation, and thus a P1 resolution refutation is also a unit resolution refutation.

2.8

Unit-Resulting

Resolution

Unit-resulting resolution (UR-resolution} [58] is a more efficient version of unit resolution. The unit-resulting resolution operation, like the hyperresolution operation, takes an arbitrary number of arguments. Where hyperresolution operates on a single mixed or negative clause and a set of positive clauses and produces a positive or empty clause as its output, unit-resulting resolution

88 operates on a single nonunit clause and a set of unit clauses and produces a unit or empty clause as its output. In addition to the ultimate goal of deriving the empty clause, unit resolution can be seen to have as an intermediate goal the derivation of additional unit clauses, for only unit clauses can participate freely in resolution operations. The sole purpose of nonunit clauses is their role in deriving additional unit clauses, because they cannot be resolved with each other. Deriving a unit clause requires a nonunit clause, all but one of whose literals are successively resolved away by either input or derived unit clauses, in the initial set of clauses. Unit-resulting resolution implements this process of resolving away by unit resolution all but one of the literals of a nonunit clause more directly. A unit-resulting resolution operation takes as its input n unit clauses and a single n-clause or n + 1-clause and uses the n unit clauses to resolve n distinct literals away simultaneously in the nonunit clause, resulting in the empty clause or a derived unit clause. This eliminates the need to form and store derived nonunit clauses; they are handled implicitly by the unit-resolution procedure.

2.9

Input Resolution

Input resolution [11] is the restriction of resolution that requires that one of the parent clauses to each resolution operation must be an input clause, i.e., not a derived clause. Input resolution is incomplete. For example, {P v Q, P v -~Q, -~P v .Q,-~P v -~Q} cannot be refuted by input resolution. Input resolution can derive P and -~P (and Q and ~Q) but cannot resolve them with each other because neither is an input clause as required. Input resolution, like unit resolution, is complete for sets of Horn clauses. N1 resolution is complete for arbitrary sets of clauses and can thus be used to refute sets of Horn clauses. Because in sets of Horn clauses no clause has more than one positive literal, it is apparent that every N1 resolution operation results in a negative clause, and thus no two derived clauses can be resolved with each other and an N1 refutation is also an input refutation. This demonstration of the completeness of input resolution for Horn clauses also shows that it is also unnecessary to resolve arbitrary pairs of input clauses with each other, because it is sufficient to take only those pairs of input clauses that include a negative clause. More generally, input resolution is compatible with the set of support restriction. Thus, input resolution can be restricted without further loss of completeness so that every derived clause is supported by set of support T, where T can be selected arbitrarily so long as S - T is satisfiable. For exampIe, because an unsatisfiable set of Horn clauses never needs to contain more than one negative clause,

89 it is possible to refute sets of Horn clauses using a single negative clause as the set of support. It is interesting that unit and input resolution, despite their substantial operational differences, are both incomplete procedures that are complete for sets of Horn clauses. Actually, unit and input resolution are capable of solving exactly the same class of problems, i.e., if a unit resolution refutation exists, then an input resolution refutation also exists, and vice versa.

There is a

constructive proof of this fact that can be used to transform one kind of refutation into the other [11]. Input resolution bears a strong resemblance to the problem-reduction method. In the problemreduction method, the inputs are a set of primitively solvable goals, a set of rules stating that if a set of antecedent goals can be solved then the consequent goal can be solved, and a goal to be solved. Solution of the goal is accomplished by backward chaining. To solve a goal, one asks if it is primitively solvable. If it is not, then rules whose consequent goals are the same as the goal are used and solution of the antecedent goals is attempted. Such problems axe easily encoded as input resolution problems. Primitively solvable goals can be represented by positive unit clauses. Rules of the form "if goals P x , . . . , P~ are solvable then goal Q is solvable" can be represented by the clause Q v -~P1 v ... v -~Pn. The problem goal can be represented by a negative unit clause (a set of problem goals to be simultaneously solved could be represented by a negative nonunit clause). These are all Horn clauses, so input resolution is applicable. With only the negative clause in the set of support, input resolution implements backward chaining.

Ordered input resolution is a further restriction of input resolution. In ordinary input resolution, a supported n-clause can be used a derivation of the empty clause with the literals resolved away in any order.

Even if each literal can be resolved away in only one way, there are n!

derivations of the empty clause. This inefficiency is eliminated in ordered input resolution by not treating the disjunction connective v as a commutative operator for which order does not matter and by requiring that literals be resolved away in some fixed order, e.g., strictly left to right.

2.10

Prolog

Prolog [13,39], the currently most widely used logic programming language, is based on ordered input resolution and relies upon input resolution's resemblance to the problem reduction method. A Prolog program consists of a set of unit assertions P and nonunit assertions Q *- P1,. •., P~. The latter represents the clause Q v -~P1 v . . - v ~Pn. Prolog can then be asked to evaluate queries with respect to the assertions. A query is represented by the Prolog clause *-- Q 1 , . . . , Q,~. The +-

90 connective can be interpreted as the ordinary implication connective except that the arguments are reversed. The literals on the right-hand side are conjoined. If it is converted to ordinary clause form, the literal on the left-hand side of a nonunit assertion will be positive; all the literals on the right-hand side of an assertion or query will be negative. This allows representation of all Horn clauses. Because there is no negation connective in Prolog, every clause has either one (in the case of assertions) or no (in the case of queries) positive literal. Prolog program execution performs ordered input resolution to refute the query clause using the assertions. The query clause is designated as the single clause in the set of support and the leftmost literal of a derived clause is always resolved with the leftmost literal of an assertion. When the literal of a derived clause is resolved on, it is removed and, in the case of resolution with a nonunit assertions, the literals on the right-hand side of the assertion will appear in its place, in the same order as they appeared in the assertion. Let ~-- Q I , . - - , Q,~ be the current derived clause. Then~ resolution with the unit clause Q1 will result in *- Q ~ , . . . , Q,~ and resolution with the nonunit clause Q1 ~-- P 1 , . . . , P~ will result in •-" P ~ , . . . , P~, Q 2 , . . . , Q,~- As always, derivation of the empty clause completes the refutation. To facilitate its use as a programming language as well as a deductive system, Prolog is much more precise than most deductive systems about the order in which inference operations are performed. It uses ordered input resolution with left to right resolution on literals. The assertions in a Prolog program are also ordered. Assertions that appear earlier in the list of assertions that comprise a ProIog program will be tried before later ones. The control strategy is depth-first search with backtracking on failure. If the current derived clause *- Q 1 , . . . ,Q,~ and Q1 is resolved away by the unit assertion Q1 or nonunit assertion Q1 *P 1 , . - - , P ~ , all ways of refuting the derived clause ~-- Q2 . . . . . Qm or ~ P1 . . . . . P~, Q 2 , - . . , Q m are explored before any other method of resolving away Q1 (by a later assertion in the Prolog program) is tried. When Prolog is blocked, i.e., the current derived clause is *-- Q1,...,Q,~, but Q1 cannot be resolved away by any assertion not already tried, the most recent resolution operation is undone and the next alternative is tried. Prolog was analyzed above from the narrow perspective of deduction. For general-purpose theorem proving, Prolog is inadequate mainly because its inference system for Horn clauses omits general disjunction and negation and its unbounded depth-first search strategy is incomplete. A further problem is that many Prolog systems employ unification without the occurs check. This will be discussed in the section on unification. However, Prolog is much more than a deduction system--it is a programming language with many attractive features. Prolog programs can often

9] be viewed as having logical and procedural interpretations. The logical interpretation has been discussed above. The procedural interpretation considers collections of clauses with the same predicate symbol in the he~id to comprise a procedure. Execution of the procedure proceeds to match the procedure~ca~l literal by trying alternative clauses in top-to-bottom order and satisfying subgoals of nonunit clauses in left-to-right order, backtracking to find alternative solutions as required. It supports the notion that algorithms should be viewed as being a combination of logic and control components [38,39}. Prolog efficiently implements an important subset of the features, including unification and backtracking, proposed for earlier artificial-intelligence languages such as PLANNER [24] and QA4 [68]. These earlier languages generally had less complete (relative to their specification) and less efficient implementations. Unification is used as a uniform mechanism for composing and decomposing data types, represented as first-order predicate calculus terms. Prolog provides a smooth interface to built-in predicates for arithmetic, input/output, and the like, as well as user-defined predicates with logical interpretation. The cut operation provides additional control capability. Prolog's restriction to sets of Horn clauses has the natural justification that only for Horn clauses are all answers to queries certain to be definite. The inexpressibiIity of facts such as

P(a) v P(b) in Horn clauses makes it unnecessary to consider whether 3xP(x) has the answer true, with x being either a or b, but not knowing which. Sets of ground unit clauses can be regarded naturally as containing the same information as a file in a relational database. Virtual relations can be defined by nonunit clauses. Assert and retract operations permit additions or deletions of clauses by a running Prolog program. The greater expressiveness of Prolog makes it a logical generalization of relational databases. Prolog provides a form of negation, though not the standard one, termed negation as failure [49,33] that supports reasoning with the closed-world assumption. The closed-world assumption asserts that for some predicate the given instances of the predicate comprise the entire set of instances of the predicate. Failure to prove a formula then implies its negation. This topic is covered more deeply in the chapter by" Bibel.

2.11

Linear Resolution

Input resolution and its derivatives (including Prolog) are incomplete. Linear resolution [50,53] is an extension of input resolution that requires that at least one of the parent clauses to each resolution operation must be either an input clause or an ancestor clause of the other parent.

92 Linear resolution is complete. Linear resolution can be further restricted while preserving completeness; in particular, linear resolution, like input resolution, is compatible with set of support and ordering restrictions.

2.12

Model

Elimination

The model e l i m i n a t i o n procedure [51,52] is isomorphic (in the propositional case) to a highly restricted form of linear resolution and is complete. It incorporates the set of support restriction, an ordering restriction on literals, and a requirement that earlier clauses in a derivation not subsume later ones. A procedure very similar to model elimination is called SL-resoIution [40]. The restriction of model elimination or SL-resolution to Horn clauses is basically ordered input resolution, i.e., the inference system employed by Prolog. This accounts for Prolog's inference system frequently being referred to as SLD-resolution (SL-resolution for definite clauses, where definite clauses are another name for Horn clauses). Model elimination is technically not a form of resolution at all, because it operates on chains instead of clauses. A chain differs from a clause in that its literals are ordered and there are two types of Iiterals: A-literals and B-literals. The ordinary literals used in clauses in resolution will be B-literals in the model elimination procedure. The literal that is resolved on in the model elimination procedure is saved in the result as an A-literal. A-literals are used in instances where, in linear resolution, a clause is resolved with an ancestor clause. There are two inference operations in model elimination: extension and reduction.

Let

Q m , . - . , Q i be a chain whose last literal Q1 is a B-literal. The literal indices are written in descending order to facilitate comparison with the Prolog inference rule stated previously. Model elimination consistently operates on the rightmost literal of a chain, while Prolog operates on the leftmost literal of its derived clauses. Let -~Q1 v P1 v -.- v P,~ be an input clause. Then the chain Q m , . . . , Q2, [Q1],PII, . . . , P~. is the result of applying the model-elimination extension operation. In the derived chain, literals Qrn,..., Q2 are A-literals or B=literals according to their status in the parent chain; Q1 is an A-literal; P i l , . . . , Pi. are all B-literals with Q , . . . , i , being some permutation of 1 , . . . , n .

(A-literals will be marked by enclosing them in brackets.) Any

permutation of P 1 , . . . , P- can be used in the result; it is unnecessary to derive additional chains with different permutations of these literats. Again, let Q,,~,..., Q1 be a chain whose last literal Q1 is a B-llteral. If Q1 is complementary to some earlier A-llteral Qi, then the chain Q , ~ , . . . , Q2 can be derived by the model-elimination reduction operation. In the derived chain, all the literals are A-literals or B-literats according to

93

their status in the parent chain. If the clause -'Q1 v P1 v . . . v P~ used in the extension operation is represented by the Prolog assertion Q1 *-- -~P1,---, -~P~ (this is possible precisely if -'Q1 is a positive literal and

PI,..., P,~

are all negative literals, so t h a t QI, ~ P I , . . . , ~Pn are all atoms) and the p e r m u t a t i o n n , . . . , 1 is used in forming the result of the extension operation, then the resulting chain's B-literals are exactly the same literals, b u t in reverse order, as the literals in the result of a Prolog inference operation. Because in Prolog all literals in a derived clause are negative, there can never be a case of an A-literal being followed by a c o m p l e m e n t a r y B-literal. Thus, no reduction operations are possible, and retaining the A-literals is unnecessary. Some other aspects of the model elimination procedure need to be mentioned.

Both the

extension and reduction operations require the last literal of the chain to be a B-literal, but extension by a unit clause results in a chain with a terminal A-literal.

The solution to this

difficulty is t h a t t e r m i n a l A-literals are simply removed from the chain. Certain chains can be rejected without loss of completeness.

If the chain contains (a) an

A-literal followed later in the chain by an identical A-literal or B-literal, (b) an A-literal followed later in the chain by a c o m p l e m e n t a r y A-literal, or (c) a B-literal followed later in the chain by a c o m p l e m e n t a r y B-literal, where the two literals are not separated by an A-literal, then the chain can be rejected. These tests can be performed on the chain before terminal A-literals are removed. Tests (a) and (b), in particular, may reject a chain with t e r m i n a l A-literals t h a t would be acceptable if the t e r m i n a l A-literals are removed. T h e rationale for Test (b) is that if the chain contains an A-literal followed by a c o m p l e m e n t a r y A-literal, the second A-literal has a B-literal ancestor in an earlier chain in the derivation. This literal could have been removed by reduction. T h e rationale for Test (c) is t h a t c o m p l e m e n t a r y B-literals unseparated by A-literals must come from the same input clause. This clause must then be a tautology, and it is unnecessary to use tautologous clauses. Test (a) is more difficult to justify, but its effect is to eliminate loops. Rejecting chains on the basis of Test (a) precludes the refutation of a literal being a subtask of the refutation of that same literal. Following is a m o d e l elimination proof of the unsatisfiability of { P V Q, P v -~Q, ~ P v Q, ~ P v

~Q}: 1. P, Q 2. P, [Q], P

a chain from P v Q extend by P V -,Q

94.

a. P, [Q], [Pi,-Q 4. P, [Q], [p] 5. P 6. [P],Q

~,. IF], [Q], ~P 8. [P], [Q] 9.0

extend by ~ P v -~Q reduce -~Q by [Q] delete terminal A-literals [Q], [P] extend by -~P v Q extend by -,P v -.Q reduce -~P by [P] delete terminal A-literals [Q], [P]

The model elimination procedure takes its name from the fact t h a t it systematically tries to construct a model for a set of clauses. When all such attempts fail, the set of clauses is determined to be unsatisfiable. As the procedure attempts to construct a model, the A-literal [P] or [-~P] marks the assignment of of true or false to the atom P in the interpretation, respectively. In the above proof, the procedure tries to make each of the literals P and Q of the first chain an A-literal, because at least one of P and Q must be true in any model in order to satisfy the clause P V Q. Each assignment ultimately lead to a contradiction, so the set of clauses is unsatisfiable.

2.13

Prolog

Technology

Theorem

Prover

Despite Prolog's logical deficiencies, it is quite interesting from a deduction standpoint because of its very high speed as compared with conventional deduction systems. The objective of a Protog

technology theorem prover (PTTP) [80] is to remedy Prolog's deficiencies while retaining to the fullest extent possible the high performance of well-engineered Prolog systems. To achieve completeness for non-Horn clauses, an inference system other than Prolog's input resolution must be adopted. However, an arbitrarily chosen complete inference system is unlikely to be as efficiently implementable as Prolog. The fact that Prolog employs input resolution is crucial to its high performance. No Prolog operation acts on two derived clauses at once. The use of input resolution and depth-first search implies there is only one active derived clause at a time--represented on the s t a c k - - t h a t is resolved with input clauses that can be compiled. The model elimination procedure is also an input procedure, but is complete. It can be seen as Prolog-style ordered input resolution plus one additional inference rule, the model-elimination reduction operation. The reduction operation, phrased in terms meaningful to Prolog, states that, if the current goal is complementary to an ancestor goal, then the current goal is treated as if it were solved (nonground goals may have to be unified for the rule to apply). It is a form of reasoning by contradiction. Consider proving C from A D C, B D C, and A V B. C has the subgoal A (by A D C), which has the subgoal -~B (by A V B), which has the subgoal -~C (by the contrapositive

95 of B D C). -~C is complementary to the higher goal C, so it can be treated as solved, as thus can -~B, A, and C. The reasoning is: the goal -~C is either true or false; if ~ C is true, then C must be true by the chain of inferences--a contradiction because -,C and C cannot both be true; thus, ~C must be false and C must be true. Note that this reasoning says nothing about the value of the intermediate subgoals A and ~B. Another major concern is the incompleteness of Prolog's unbounded depth-first search strategy. It cannot be replaced by an arbitrary complete search strategy, like breadth-first or meritordered search, without sacrificing performance. If depth-first search were not used, it would be necessary for more than one derived clause to be simultaneously represented and for variables to have more than a single value simultaneously, i.e., different values in different clauses. This implies the need for a more complex and less efficient representation for variable bindings than Prolog's. In addition, depth-first search allows all state information to be kept on the stack with a minimum of memory required. Breadth-first search would need an additional amount of memory that would grow exponentially with increasing depth. Therefore, depth-first search continues to be a good choice of search s t r a t e g y - - b u t for completeness, it must be bounded. That leaves the problem of selecting the depth bound. In an exponential search space, searching with a higher-than-necessary search bound can result in an enormous amount of wasted effort before the solution is found. The cost of searching level n in an exponential search space is generally large compared with the cost of searching earlier levels. This makes it a practical procedure to perform consecutively bounded depth-first search. The depth bound is set successively at 1, 2, 3, and so on, until a solution is found. If a constant branching factor b is assumed, this method results in only a factor of about ~

more inference

operations being performed than breadth-first search to the same depth [82]. The effect is similar to performing breadth-first search. However, instead of retaining the results from earlier levels, these results are recomputed--with the efficiency of Prolog-style variablebinding representation possible for depth-first search only. There are two important optimizations of this iteratively bounded depth-first search procedure that reduce the number of inference operations. The first optimization follows from the observation that, if the depth of the current goal plus the number of pending goals exceeds the d e p t h bound, then no solution within the depth bound can be found from this clause and so another solution should be sought. The second optimization is concerned with (1) recording the minimum value by which the depth of the current goal plus the number of pending goals exceeds the depth bound, and (2)

96 using this minimum value to increment the depth bound instead of always incrementing it by 1. This technique can also be used to recognize that the search space is finite and that the search itself can be abandoned--if, upon the completion of searching at some level, no cutoff had ever occurred because the depth of the current goal plus the number of pending goals exceeded the depth bound, then it is clear that searching with a higher depth bound will result in no additional inferences being made. A Prolog technology theorem prover has several advantages as compared to ordinary theorem provers. It can perform inferences at a very high rate approaching Prolog's. It is complete and easy to use. Many conventional theorem provers rely upon user selection of features and parameter values t h a t control behavior and limit completeness. A Prolog technology theorem prover, like Prolog, requires little memory and has facilities for procedural attachment (built-in functions) and control facilities (the ordering of literals in clause and of clauses in the database, the cut operation).

2.14

C o n n e c t l o n - G r a p h Resolution

In principle, ordinary resolution operates on just a set of clauses as its only data structure. Conneetion-graph resolution [37~3], on the other hand, operates on a derived data structure called a eoaneetion graph. A connection graph is a graph containing clauses with links between complementary pairs of literals. The connection-graph-resolution operation resolves on a link in the connection graph, forms the resolvent, and adds it to the connection graph. Literals in the resolvent acquire their links

to other literals in the connection graph by inheritance. They are linked only to those literals to which their parent literals were linked. An advantage of connection-graph resolution is its explicit representation by the links of what resolution operations are possible. This makes retrieval of matching literals easier and encourages graph searching as a method for selecting inference operations or finding proofs. Although the immediate access to matching literals via links is an often cited advantage of connection-graph resolution, it is also possible to achieve efficient excess to matching literals by using term indexes [61,23], at least for ordinary resolution. When the inference operations are more difficult to discover and computer as when theory resolution or unification in equational theories is used, inheriting links may be more efficient than rediscovering and recomputing possible inference operations. Besides suggesting encoding inference operations in a connection~graph data structure, con-

97 nection-graph resolution is a restriction of resolution because connection-graph resolution specifies that the link that is resolved on be deleted from the connection graph. The effect of this link deletion is that if literals L and L t are resolved, then L or a literal descended from L can never again be resolved with L t or a literal descended from L t. This has the beneficial effect of reducing the size of the search space in much the same way that ordering restrictions in input and linear resolution do. For example, ordinary resolution can discover two refutations of the set of clauses consisting of P , Q, and ~ P Y -~Q: 1. 2. 3. 4.

P Q ~ P v ~Q -~Q

5. false

resolve I and 3 resolve 2 and 4

,

and 1. 2. 3. 4.

P Q -~P v -~Q -~P

5. false

resolve 2 and 3 resolve 1 and 4

Either of these refutations can be discovered by connection-graph resolution, but not both in the same execution of a connection-graph-resolution theorem prover. The only resolution operations possible initially are to resolve on P in clauses 1 and 3 or on Q in clauses 2 and 3. Suppose clauses 1 and 3 are resolved first° Then the link between P in clause 1 and -~P in clause 3 is deleted, and neither literal will have any links. Thus, if clauses 2 and 3 are later resolved, the resolvent -~P will have no links (because there were none to inherit) and a refutation cannot be completed. Note that, aside from the initial set of links, links are acquired only by inheritance. Thus, once a literal has no links, the clause containing it can be removed, because the linkless literal cannot be resolved away. This is an extension of the purity rule for ordinary resolution. A sometime characteristic of connectlon-graph resolution is the dramatic collapse of the connection graph when links are deleted. Deletion of a link may make a literal pure in the graph causing its clause to be deleted which leads to yet more pure literals. The complexity of connection-graph resolution compared to other restrictions of resolution and its noncomrnutative behavior (i.e., inference operations cannot be freely reordered, because inferences can be blocked by absence of links depending on the order in which operations are performed) have made the procedure's completeness a difficult issue, although it can be shown complete under some restrictions [72,4,74].

98

2.15

Nonclausal

Resolution

One of the most widely criticized aspects of resolution theorem proving is its use of clause form. Besides generally being considered difficult to read and not human-oriented, one criticism of clause form is that conversion of a formula to clause form may eliminate pragmatically useful information encoded in the choice of logical connectives. For example, -,P y Q may suggest a case-analysis approach, while the logically equivalent P D Q may suggest a chaining approach to deduction. The use of clause form may also result in a large number of clauses being needed to represent a formula, as well as in substantial redundancy in the search space. An example of when conversion to clause formula results in a substantial increase in the size of the formula is the conversion of A -- B. If A and B are literals, the equivalent clause form is (-,A v B) h (A Y --B), which has two instances of the atoms of A and B. In the worst case, when A and B are formed using the equivalence connective, conversion of the single formula A - B may result in a number of clauses that is an exponential function of the size of the formula. Another problematical example is the formula (A1 A . . . A A,~) V (B1 A ... A B,). Even in the simple case when A1,... , A m , B 1 , . . . , B , are literals, if this formula is converted to the m x n clauses A1 V B 1 , . . . , Am V B,, each Ai occurs n times and each Bj occurs m times, instead of once as in the original formula. It is possible to extend the resolution rule to nonclausal formulas [60,55,78]. Although ordinary clausal resolution resolves on clauses containing complementary literals, nonelausal resolution resolves on general formulas containing subformulas occurring with opposite polarity. In clausal resolution, the literals resolved on are deleted and the remaining literals disjoined to form the resolvent. In nonclausal resolution, all occurrences of the subformulas resolved on are replaced by false (true) in the formula in which it occurs positively (negatively). The resulting formulas are disjoined and simplified by truth-functional reductions such as ( A v t r u e ) --~ true and

(A A true) -+ A that eliminate embedded occurrences of true and false and optionally perform simplifications such as (A A =A) --~ false. More precisely, if A and B are formulas and C is an atom occurring positively in A and negatively in B, then the result of simplifying A(C ~- false) v B ( C *-- true), where X ( Y +-- Z) denotes the result of replacing every occurrence of Y in X by Z, is a nonctausal resolvent of A and B. It is clear that nonclausal resolution reduces to clausal resolution when the formulas are

99 restricted to being clauses. In the general case, however, nonclausal resolution has some novel characteristics as compared with clausal resolution. It is possible to derive more than one resolvent from the same pair of formulas, even when resolving on the same atoms, if the atom occurs both positively and negatively in both formulas. Likewise, it is possible to resolve a formula with itself. The elimination of clause form and use of nonclausal resolution has some disadvantages as well as advantages. Most operations on nonclausal formulas are more complex than the corresponding operations on clauses. The result of a nonclausal resolution operation is less predictable than the result of a clausal resolution operation. This is an important point when a theorem-proving system selects what operation to perform next on the basis of the expected result (e.g., how many literals are in a derived clause). Clauses can be easily represented as lists of literals; sublists are appended to form the resolvent. Pointers can be used to share lists of literals between parent and resolvent [6]. With simplification being performed during the formation of a noncIausal resolvent, the appearance of a resolvent may differ substantially from its parents, making structure sharing more difficult. In clausal resolution, every literal in a clause must be resolved on for the clause to participate in a refutation. Thus, if a clause contains a literal that is pure (cannot be resolved with a literal in any other clause), the clause can be deleted. This is not the case for nonclausal resolution; not all atom occurrences are essential in the sense that they must be resolved on to participate in a refutation. For example, ( P / ~ Q,-~Q) is an unsatisfiable set of formulas, one of which contains the pure atom P. Only formulas containing pure atoms that are essential should be deleted for purity reasons. The subsumption operation must also be redefined for nonclausal resolution to take account of such facts as the subsumption of A by A A B as well as the clausal subsumption of A V B by A. The nonclausal resolution procedure gains additional power and complexity from allowing resolution on nonatomic formulas as well as atoms. For example, P v Q and (P v Q) D R could be resolved to obtain R. This can result in shorter, more natural proofs. However, the extension to nonatomic formulas is difficult in some respects. It may be difficult to recognize complementary formulas. For example, P V Q occurs positively in Q v R v P , -~P D Q, and -~(P -- Q). Also, the effect of resolving on nonatomic subformulas can be achieved by multiple resolution operations on atoms. Resolution on atomic constituents of nonatomic formulas that are also resolved on can lead to redundant derivations and inefficiency.

100

2.16

Connection

Method

The connection method [4,5] or generalized rantings [1] is not a form of resolution, but has some relationships to connection-graph resolution and nonclausal resolution, among others. Clause form is often referred to as conjunctive normal form (CNF) because it is a conjunction of disjunctions of literals. A dual form, called disjunctive normal form (DNF), is a disjunction of conjunctions of literals. One is easily obtained from the other by rewriting a formula in CNF by

(A A (B V C)) --* ((A A B) V (A A C)) ((B V C) A A) -~ ((S A A) V (C A A)) Another way of forming the DNF of a formula in CNF is to enumerate n conjunctions of literals, where n is the product of the number of literals in each clause and each conjunction is composed of one literal from each clause. For example, the CNF formula

(P V Q)A (P V ~Q)A (-~P v Q)A (~P V ~Q) is equivalent to the unsimplified DNF formula (P A P A-~P A ~P)V

(P ^ p A ~P ^ -~q)v ( P A P A Q A -~P)V

(P ^ P A 0 ^ ~q)v (q A -,q ^ O A -,q) The interesting thing about this formula is that every conjunction contains a complementary pair of literals. It is clear that this property holds for the DNF of any unsatisfiable formula. If a conjunction did not contain a complementary pair of literals, then that conjunction and, thus, the whole formula could be satisfied. This is the logical basis for the connection method. However~ it does not actually form the DNF of a formula. Instead, it does graph searching of the formula, enumerating its paths, where a path consists of one literal from each clause. If every path contains a complementary pair of literals, then the formula is unsatisfiable. Because a single complementary pair of literals often appears in more than one path, it is possible by clever search to avoid explicit enumeration of all

101

of the paths. A connection graph is often used as an auxiliary data structure in the connection method. The connection method is applicable to formulas that are not in clause form.

It is only

necessary to refine the definition of a path through the formula. Consider the case of formulas that

negation normal form (NNF). A formula is in

are in

NNF if its only connectives are conjunction,

disjunction, and negation, and only atomic formulas are arguments of negation. (The connection method applies to formulas more general than NNF, but NNF is especially convenient to discuss, because the restriction of negation to atomic subformula means that, for example, a conjunction is really a conjunction, not a disjunction in disguise because it is negated.) A path through a formula that is a single literal consists of that single literal. Any path through one of the disjuncts is a path through a disjunction. Any concatenation of paths through all of the conjuncts is a path through a conjunction. For example, the formula

PV

A (R v -,S)

has the paths

(P), (Q,R),

and

(Q,~S).

The principal strategic concerns for the connection method applied to ground formulas are the efficient enumeration of pairs of complementary literals and paths so that not all paths need to be individually checked and the reduction of formulas to equivalent ones. Many reduction methods are similar to methods used in resolution, such as elimination of tautologies, subsumption, and purity. For formulas that are not ground there is the additional strategic concern of how many instances of subformulas will be needed for a single substitution to exist such that every path contains a complementary pair of literals. For example, in refuting instances-~P(a) and

~P(b) of ~P(x)are

~P(x) A (P(a) V P(b)), two

required.

Bibel's chapter also includes discussion of the connection method.

2.17

Theory

Resolution

Theory resolution [79,81] is a method of incorporating specializedreasoning procedures in a resolution theorem prover so that the reasoning task will be effectivelydivided into two parts: special cases, such as reasoning about inequalitiesor about taxonomic information, are handled efficiently

102

by specialized reasoning procedures, while more general reasoning is handled by resolution. The connection between the two reasoning components is made by having the resolution procedure resolve on sets of literals whose conjunction is determined to be unsatisfiable by the specialized reasoning procedure. The objective of research on theory resolution is the conceptual design of deduction systems that combine deductive specialists within the common framework of a resolution theorem prover. Past criticisms of resolution can often be characterized by their pejorative use of the terms

uniform and syntactic. Theory resolution meets these objections head on. In theory resolution, a specialized reasoning procedure may be substituted for ordinary syntactic unification to determine unsatisfiability of sets of literals. Because the implementation of this specialized reasoning procedure is unspecified--to the theorem prover it is a "black box" with prescribed behavior, namely, able to determine unsatisfiability in the theory it implements--the resulting system is nonuniform because reasoning within the theory is performed by the specialized reasoning procedure; reasoning outside the theory is performed by resolution. Theory resolution can also be regarded as being not wholly syntactic~ because the conditions for resolving on a set of literals are no longer based on their being made syntactically identical, but rather on their being unsatisfiable in a theory, and thus resolvability is partly semantic. Reasoning about orderings and other transitive relations is often necessary, but using ordinary resolution for this is quite inefficient. It is possible to derive an infinite number of consequences from (a < b) and -~(x < y) V ~(y < z) v (x < z) despite the obvious fact that a refutation based on just these two formulas is impossible. A solution to this problem is to require that use of the transitivity axiom be restricted to occasions when either there are matches for two of its literals (partial theory resolution) or a complete refutation of the ordering part of the clauses can be found (total theory resolution). An important form of reasoning in artificial-intelligence applications embodied in knowledgerepresentation systems is reasoning about taxonomic information and property inheritance. One of objectives of theory resolution is to be able to take advantage of the efficient reasoning provided by a knowledge representation system by using it as a taxonomy decision procedure in a larger deduction system. For systems like the Krypton knowledge representation system, which comprises terminological and assertional reasoning components, theory resolution provides a theory for connecting different reasoning systems. Any satisfiable set of formulas that is to be incorporated into the inference process can be regarded as a theory. A T-interpretatlon is an interpretation that satisfies theory T.

103

For example, in a theory of partial ordering ORD consisting of -~(x < x) and (x < y) A (y < z) D (x < z), the predicate < cannot be interpreted so that (a < a) has value true or (a < e) has value f a / s e if (a < b) and (b < c) both have value true. In a taxonomic theory T A X including

Boy(x) D Person(x), Boy(John) cannot have value true while Person(John) has value false. A set of clauses S is T-unsatisfiable if and only if no T-interpretation satisfies S. Let Ci . . . . . C,~ (rn > 1) be a set of nonempty clauses, let each Ci be decomposed as K~ v L~ where Ki is a nonempty clause, and let R 1 , . . . , Rn (n _> 0) be unit clauses. Suppose the set of clauses K1,. • •, K,~, R1,. •., Rn is T-unsatisfiable. Then the clause Li V . - - V L,~ V ~R1 V . . . v -~R, is a theory resolvent using theory T (T-resolvent) of C i , . . . , Cm. It is a total theory resolvent if and only if n = 0; otherwise it is partial. K1,... ,Kin is called the key of the theory resolution operation. For partial theory resolvents, R 1 , . . . , Rn is a set of conditions for the T-unsatisfiability of the key. The negation -~R1 V . . . V -~R, of the conjunction of the conditions is called the residue of the theory resolution operation. It is a narrow theory resolvent if and only if each Ki is a unit clause; otherwise it is wide. The partial theory resolution procedure permits total as well as partial theory resolution operations. Similarly, the wide theory resolution procedure permits narrow as well as wide theory resolution operations. For example, a set of unit clauses is unsatisfiable in the theory of partial ordering ORD if and only if it contains a chain of inequalities tl < . . . < t,(n > 2) such that either tl is the same as tn or ~(tl < t , ) is also one of the clauses. P is a unary total narrow ORD-resolvent of (a < a) V P. P V Q is a binary total narrow ORD-resolvent of (a < b) v P and (b < a) V Q. P v Q v R V S is a 4-ary total narrow ORD-resolvent of (a < b) v P , (b < c) v Q, (c < d) v R, and -~(a < d) V S. This can also be derived incrementally through partial narrow ORD-resolution, i.e., by resolving (a < b) V P and (b < e) V Q to obtain (a < e) V P V Q (-~(a < e) is the condition), resolving that with (e < d) V R to obtain (a < d) V P V Q V R, and resolving that with -~(a < d) V S to obtain

PVQVRVS. Suppose the taxonomic theory T A X includes a definition for fatherhood Father(x) -- [Man(x)

/x 3yChild(x,y)]. Then Father(Fred) is a partial wide theory resolvent of Child(Fred, Pat) V Child(Fred, Sandy) and Man(Fred). Also, false is a total wide theory resolvent of Child(Fred, Pat) v Child(Fred, Sandy), Man(Fred), and -~Father(Fred). In narrow theory resolution, only T-unsatisfiability of sets of literals, not clauses, must be decided. Total and partial narrow theory resolution are both possible. In total narrow theory resolution, the literals resolved on (the key) must be T-unsatisfiable. In partial narrow theory

104

resolution, the key must be T-unsatisfiable only under some conditions. The negated conditions are used as the residue in the formation of the resolvent. The theory rantings procedure is another method of incorporating theories that is similar to the total narrow theory resolution method, in the sense of imposing the same requirements on the decision procedure for T, i.e., determining the T-unsatisfiability of sets of literals but does not depend on performing resolution inference operations. The theory matings procedure is an extension of the connection method or generalized matings. The statement that if every path through a formula contains a complementary pair of literals, then the formula is unsatisfiable can be generalized to the statement that if every path through a formula contains a set of literals that is unsatisfiable in the theory T, then the formula is T-unsatisfiable. Theory resolution is a procedure with substantial generality and power. Thus, it is not surprising that many specialized reasoning procedures can be viewed as instances of theory resolution, perhaps with additional constraints governing which theory resolvents can be inferred. For example, unification in equational theories can be viewed as a special case of theory resolution for building in equational theories. Inference rules such as paramodulation, resolution by unification and equality, and E-resolution can also be viewed as instances of theory resolution, differing in whether total or partial theory resolution is used and in their selection of key sets of literals to resolve on.

2.18

Krypton

Krypton [7,8] represents an approach to constructing a knowledge representation system that is composed of two parts: a terminological component (the TBox) that can represent and reason about terminological information and an assertional component (the ABox) that can represent and reason about assertional information. It is an interesting example of the application of theory resolution. Krypton's TBox provides a language for defining and reasoning about taxonomic relations. It permits definitions of concepts and roles that are associated with unary and binary predicates in the ABox. A concept can be defined as a primitive concept, a conjunction of concepts, or a concept restricted so that all fillers of a particular role are of a certain concept. A role can be defined to be a primitive role or a composition of roles. For example, the following are valid Krypton TBox definitions:

105 def

Grandchild =_ (RoleChain Child Child) def

Coed - (ConGenermWoman Student) Successful-Grandma de_f(VRGeneric Woman Grandchild Doctor) That is,

Grandchild(x, y) is true if and only if y is a child of a child of x, Coed(x) is true if

and only if x is someone that is both a woman and a student, and

Successful-Grandma(x) is

true if and only if x is a woman all of whose children (if any) are doctors. Krypton's ABox is a resolution theorem prover [78] that uses predicates that have been given TBox definitions. Taxonomic definitions are not provided as assertions to the ABox however. Instead, they are taken account of by theory resolution operations that use the theory of the defined concepts and roles. Thus, for example, all of the following inferences are single-step theory-resolution operations performable by the ABox: from

Student(John) and -~Coed(John) it is possible to infer

-~Woman(John); Coed(John) and ~Woman(John) are directly contradictory; from SuccessfulGrandma(Marge) and Child(Marge, Hope) it is possible to infer (Child(Hope, x) D Doctor(x)).

3

Unification

Unification is a bidirectional pattern matching process, i.e., it is like pattern matching except values can be assigned to variables in both expressions, not just one. For example, though neither of

P(a, x) and P(y, b) is a pattern-matching instance of the other, they are unifiable with most

general unifier {x ~-- b, y +-- a}. In general, a substitution is a set of variable assignments. It is convenient to consider only

idempotent substitutions [19], where an idempotent substitution is

one in which no variable x~ that appears in an assignment x~ ~-- tl also occurs inside the term tj for any assignment xj ~-- tj in the substitution. The standard unification algorithm scans the two expressions to be unified in left-to-right order, looking for the first

disagreement or difference between the two expressions. If one of the

two subexpressions located at the first disagreement is a variable and the other is an expression not containing that variable, then the assignment of the subexpression to the variable is added to the substitution being constructed, the two expressions are instantiated by the new assignment, and the process continues. If neither of the two expressions is a variable, or if one is a variable and the other subexpression contains that variable, then unification of the two expressions fails. Unification succeeds if no (further) disagreements are found, i.e., the two original expressions have been instantiated to be identical.

106

The check for whether the variable is contained in the expression that is perhaps to be assigned to it is call the occurs cheek. The occurs check causes the unification of x and f ( x )

(or any other

term containing x other than x itself) to fail. If the unification were allowed to succeed, it would result in the formation of a circular binding x +- f ( x ) ,

and the unified expressions would be

infinite. A demonstration of the importance of the occurs check for sound inference is the following Prolog program [64] (many Prolog implementations, for the sake of efficiency~ do not perform the occurs check): (1) X < s(X). (2) 3 < 2 : - s(Y) < Y. (3) ? - 3 < 2. Restated in English~ the foregoing asserts that says (I) every x is less than the successor of x, and that (2) if, for every y, the successor of y is less than y, then 3 would be less than 2; it then asks (3) whether 3 is less than 2. Prolog implementations without the occurs check would answer affirmatively, binding X to s(Y) and Y to s(X), thereby creating an infinite term. Moreover, unification without the occurs check may not even terminate, as in the case of unifying the values of X and Y. The unification algorithm either succeeds and returns a single unifying substitution u n i f i e r or fails and returns none. If it succeeds, the unifier returned is a m o s t g e n e r a l u n i f i e r .

A most

general unifier is one such that any other unifier is a variant or instance of it. The most general unifier is not necessarily unique, however. For example, both {x +- y} and {y +-- x} are most general unifiers of x and y. As given here, unification can be quite inefficient. In the worst case, its behavior is exponential. For example, consider the unification of f ( u , h ( w , w), w , j ( y , y)) and f ( g ( v , v), v, i ( x , z), x). This would result in successive assignments of u +-- g ( v , v ) , v +- h ( w , w ) , w +-- i(z, x), and x +- j ( y , y ) . The resulting substitution is { x g(h(...),

h(..-))}.

+-- j ( y , y ) , w

+-- i ( j ( y , y ) , j ( y , y ) ) , v

+-- h ( i ( . . . ) , i ( . . . ) ) , u

~-

The algorithm would incrementally construct this substitution, instantiating

the current substitution by each new variable assignment as it is made, and would also create new instances of the original expressions. There is a linear-time unification algorithm by Paterson and Wegman [62] that requires a directed acyclic graph representation for expressions, and there are also efficient unification algorithms by Huet [26] and Martelli and Montanari [57]. The costliest inefficiency of the standard unification algorithm is its need to instantiate the

107

expressions being unified and the substitution being constructed by the newest variable assignment. This can, as in the example, produce exponential growth in the size of the substitution and the terms being unified. A solution is to use a structure-sharing method [6] during unification. As the two expression are being scanned for disagreements, if a variable is encountered, its value is looked up in the list of bindings accumulated so far and used in the scanning process (but not substituted into the expression). Variables are also looked up when the occur check is applied. This process eliminates actual formation of the instantiated terms though they are implicitly created during the scanning and occur check processes. If the process is completed successfully with no uneliminatable disagreements, the result is a set of noncircular variable bindings that may depend on each other. The set of bindings should be converted to an idempotent substitution for it to be used efficiently, e.g., to instantiate the remaining literals of a pair of clauses being resolved° To convert a set of dependent noncircular bindings to an idempotent substitution, topologically sort them so that the binding x~ +- ti precedes the binding x1 +- t~. if x~ occurs in tj (an inability to topologically sort the bindings implies an occur check violation). Let (xl ~-- t l , . . . , x• +-- t,~) be a topologically sorted list of noncircular bindings. Then let ~1 be ( x l +-- tl} and 0~ be 0~-1 U {x~ +- t~0~_l) (1 < i < n). Each 0~ is an idempotent substitution with 8, being the final result. A more abstract characterization of substitutions and unification can be found in the chapter by Huet.

3.1

Unification

in Equational

Theories

Some equational theories occur in theorem-proving applications often enough and have enough impact on overall performance to merit their incorporation directly into unification algorithms [65]. The most pervasively used properties that have been built into unification algorithms are associativity, commutativity, and their combination. If the equality relation is used to represent associativity and commutativity, then associativity and commutativity of the function f can be expressed as f ( f ( x , y), z) -- f(x, f(y, z)) and f(x, y) =

f(y,x). Because of the difficulty of using the equality predicate, whether by a x i o m or special rules of inference like paramodulation, an alternate formulation has often been used. Let P(x, y, z) denote f ( x , y ) -= z. Then associativity can be represented by the pair of clauses -~P(x,y,u) v

108

-~P(u,~,~,) v --,P(,,, ~, ~) v e ( = , , , , ~ ) and -~P(~, y,,~) v -~P(y, ~,v) v -~P(=,v,~) v v(,~,~,~,), = d commutativity can be represented by the clause

-~P(x, y, z) v P(y, x, z).

There are several difficulties in specifying associativity and commutativity axiomatically regardless of which representation is used. One is that there are too many representations for the same expression. For example, the expressions

f(a,f(b,e)), f(a,f(c,b)), f(b,f(a,c)), f(f(a,c),b),

etc., are all equivalent if f is

associative and commutative. These multiple representations for equivalent expressions contribute to excessive search-space sizes. Subsumption will not detect and remove formulas that are associative-commutative variants. In addition, verifying that two expressions are associative-commutative variants often involves lengthy deductions. For example, to derive

f(f(e, a), b) from f(a, f(b, c))

requires two uses

of the commutativity axiom and one of the associativity axiom. This approach requires three paramodulation steps. Even more steps would be necessary if equality axioms or the nonequality formulation were used instead of paramodulation. Theorem provers will often fail to solve difficult problems that involve functions that are associative or commutative or both, because so much effort must be spent on deductions that should be trivial. Another problem with the axiomatic representation for associativity and commutativity is that the theorem prover will be undiscriminatingin what results are derived. If f(a,

x)

and

f(y, b) need

to be unified, where f is associative and commutative, an associative-commutative unification algorithm may recognize that a complete set of most general unifiers consists of {x ~-- b, y *- a} and {x ~--

f(b,z),y *-- f(a,z)}.

However, if axioms for associativity and commutativity are

used, less general unifiers like {x ~--

f(b, f(zx,z2)),y *-- f(a, f(z,,z2))}

and their associative-

commutative variants will also be generated, ad infinitum. Building properties like associativity and commutativity into the unification algorithm eliminates these difficulties and the need for associativity and commutativity axioms. If equality axioms or inference rules are needed only to support inference about these properties, then the use of equality axioms or inference rules can be eliminated as well. Despite the complexity of special unification algorithms for equational theories and the fact that they are generally much more time consuming than ordinary unification, their use generally pays off when trying to prove nontrivial theorems. When compared with formulating properties such as associativity and commutativity as axioms, special unification is advantageous because it does not return any results that are not implicit in the search space using the axioms--it just computes them more directly--and they often will compute a finite complete set of unifiers, while

109

the axiomatic approach would continue to generate redundant consequences. If the unification problem is intractable (associative-commutative pattern matching is NPcomplete [2]--associative~commutative unification is thus at least that difficult), this difficulty will also be reflected in the number of inferences required in trying to prove theorems with the axioms without using special unification. The most serious problem with using special unification algorithms is the possible occurrence of difficult unification tasks early in a search for a proof that effectively blocks the discovery of a shallow proof elsewhere, because of the resources spent on special unification. In such cases, incomplete or incremental special unification algorithms can be employed. Ideally, an incomplete special unification algorithm wilt return only some of the simpler unifiers and an incremental one wilt return progressively more complex unifiers on successive calls. Actually, the use of axioms for the theory of, say associativity or commutativity, along with axioms or rules for equality plus ordinary unification in effect form a quite inefficient incremental unification algorithm. Many results on special unification can be found in Siekmann [71].

3.2

Commutative

Unification

The standard unification algorithm can be easily modified to build in commutativity of functions [73,70].

For example, if f is a commutative function, when unifying the terms f(s~,sz) and

f(Q,t2), it is necessary to try to unify the arguments sl with tl and s2 with t2 simultaneously, as in the ordinary unification algorithm, and also to try to unify sl with t2 and s2 with tl simultaneously. This modification of the ordinary unification algorithm yields an algorithm that is complete for commutative functions. The commutative unification algorithm given here illustrates two properties of special unification algorithms. One is their added complexity and computational requirements compared to ordinary unification. The second is that, depending on the theory that is incorporated into the unification algorithm, it may no longer be the case that there will be only zero or one most general unifiers. For example, if

f(x, y) and f(a, b)

are unified, where f is commutative, then

{x ~ a, y *- b} and {x ~ b, y ~ a} are both most general unifiers; the two together compose the complete set of most general unifiers.

110

3.3

Associative

Unification

Associative unification is more difficult than commutative unification. If the function f is associative (but not commutative), then the terms unifiers, namely

f(a, x) and f(x, a) have an infinite set of most general

{x ~-- a), (x e- f(a, a)}, (x ¢-- f(a, f(a, a))}, and so on. Two other interesting

examples, syntactically similar to the first, are the unification of

f(a, x) and f(y, b), which have a

complete set of most general unifiers consisting of (x ~- b, y ~-- a) and {x +and the unification of

f(z, b), y ~-- f(a, z)),

f(a, x) and f(x, b), which have no unifiers.

For functions that are associative, it is convenient to drop the distinction between f(x, and

f(f(x,y),z)

and represent both by the term

f(y, z))

f(x,y,z) as if f were an n-ary function for

arbitrary n. A complete unification algorithm for associative functions is readily obtained by modifying the standard unification algorithm in the following manner [65,73,47]. Argument lists of two terms headed by the same associative function symbol are scanned in left-to-right order, looking for the first disagreement. If the first disagreement is that one argument list is exhausted before the other, then unification with the current substitution fails. The principal difference from standard unification occurs when the two subexpressions at the first disagreement are arguments of an associative function and one or both of the subexpressions is a variable. Let variable z and term t be such arguments of an associative function f. If t contains x, then unification with the current substitution fails. If t does not contain x, then unification proceeds with the substitution of t for x and also with the substitution of f(t,

u) for x, where u is a new variable not occurring elsewhere.

If t were also a variable, then it would also be necessary to try the substitution of

f(x, v) for t,

where v is a new variable. Consider the unification of

f(x, y) and f(a, b, c). The first disagreement is x differing from

a. Thus, the assignments x +- a and

x ~-- f(a, u) are tried, leading to the problems of unifying

f(a, y) and f(a, u, y) with f(a, b, c), respectively. The first problem is solved by the subsequent assignments

y +- f(b,v) and v e- b, and the final unifier is (x ~- a,y +- f(b,c)}. The second

problem is solved by the subsequent assignments u ~-- b and y ~- c, and the final unifier is

{x ~-- f(a, b), y *- c}. The complete set of unifiers consists of {x ~- a, y +- f(b, c)} and (x ~-/(a, b), y ~- e}. Although this algorithm is complete, it does not always terminate, because, as in the case of unifying

f(a, x) and f(x, a), there may be an infinite number of unifiers. Even where there is

a finite number of unifiers, as in the case of unifying

f(a,x) and f(x,b) that have no unifiers,

111

the algorithm fails to terminate, trying to match the expressions with a~signments z ~- a, x ~f ( a , vx), x *- f ( a , a , v2), and so on. An alternative approach to associative unification that allows better control separates the tasks of assigning terms to variables and creating new variables. This approach uses an incomplete associative unification algorithm that introduces no new variables. It just assigns to a variable one or more of the arguments in the other argument list. For example, in the case of unifying f(x, y) and f(a, b, c), the algorithm may try to assign to x the terms a, f(a, b), and f(a, b, e). Continuing to unify the expressions with each of these assignments results in the unifiers {x ~ a, y ~-- f(b, e)},

{x *-- f(a, b), y ~-- c}, and failure, respectively. This incomplete unification algorithm is combined with a widening [73] or variable splitting [76] process that replaces variables by more complex terms. If the variable x is an argument of the associative function symbol f , then the term containing x can be widened by replacing x by the term f(xl,x2) with new variables xl and x2. A complete associative unification algorithm is obtained by collecting the results of unifying, by the incomplete but terminating associative unification algorithm, one expression with all results of widening the other expression. The widening operation may be applied any number of times. In order to compute the infinite number of unifiers of f(a, x) and f(x, a), an infinite number of widening substitutions must be applied. Note that it is sufficient to create widening substitutions for only one of the two expressions. For example, in unifying f(a, x) and f(y, b), the incomplete unification algorithm returns the substitution {x ~-- b, y ~- a). Widening f(a, x) with the assignment x +-- f(xl, x2) results in the unification of f(a, x2, x2) and f(y, b) with unifier {x +- f(x2, b), y ~-- f(a, x2)}. The completeness of the above incomplete associative unification algorithm in conjunction with widening only one of the two expressions implies the completeness of the incomplete associative unification algorithm for pattern matching. Makanin [54] proved the decidability of associative unification for the restricted case of terms composed of a single associative function symbol and variables and constants only. However, this algorithm only decided whether a unification problem was solvable--it did not return a unifier, let alone a complete set of unifiers. More recently, Jaffar [34] has developed a minimal and complete unification algorithm for this case. This algorithm computes a minimal complete set of unifiers and, unlike the algorithm described above, is guaranteed to terminate if the complete set is finite. However, this algorithm has not yet been generalized to handle nonvariabte, nonconstant arguments, as is required for general use in theorem proving.

112

3,4

Associative-Commutative

Unification

It is possible to develop associative-commutative unification algorithms aIong the lines of the complete but nonterminating associative unification algorithm and the incomplete associative unification algorithm augmented by widening [76]. However, we can do better. In the case of associativity plus commutativity, there is a finite number of unifiers, and it is possible to devise a complete terminating unification algorithm [75~77,48]. Arguments common to the two terms headed by the same associative-commutative function symbol can be canceled in pairs until no arguments appear in both terms. Thus, for example, the problem of unifying f(x, x, y, a, b, c) and f(b, b, b, c, z) can be replaced by the problem of unifying

f(x, x, y, a) and f(b, b, z). The case of unification of terms headed by an associative-commutative function symbol with only variable arguments will be considered first.

For example, consider unifying the terms

f ( x , x , y , u ) and f ( v , v , z ) where f is an associative-commutative function. What is required of a substitution for variables u, v, x, y, and z for it to be a unifier? Each variable is assigned either a term not headed by the function symbol f or a term headed by f with some arguments. Consider each distinct term t that is either a variable value not headed by the function symbol f or variable-value argument of a term headed by f . For a substitution to be a unifier, for every such term t, twice the number of t's in x plus the number of t's in y plus the number of t's in u must equal twice the number of t's in v plus the number of t's in z. Thus, unification of the terms

f(x, x, y, u) and f(v, v, z) is related to solution of the linear homogeneous diophantine equation 2x+y+u=2v+z. In contrast to the usual situation of trying to solve linear homogeneous diophantine equations, associative-commutative unification requires that only nonnegative integral solutions be considered. A negative value for a variable corresponds to assigning a negative number of terms to a variable in the unification problem. Negative values are considered in extensions of this method to abelian-group-theory unification (associativity plus commutativity plus identity plus inverse); the presence of the inverse operation makes it meaningful to consider the assignment of a negative number of terms to a variable. The set of all nonnegative integral solutions to a linear homogeneous equation can be obtained by addition of elements of a finite basis set of solutions. This finite basis set of solutions is obtained by generating all solutions to the equation in ascending order of the value of 2x + y + u (= 2v + z), discarding solutions that are composable from those previously generated, and terminating when

113

no new noncomposable solutions can be found. It is necessary to discover some b o u n d on the value of the equation such that no new basis solutions will be found with value higher t h a n the b o u n d .

Consider the general problem of

finding solutions to the linear homogeneous diophantine equation a z x l + " " + a,~x,~ = blyl + For each i and j with 1 < i < m and 1 < j N n, there is a basis solution with

• ".+bny,.

xi = l c m ( a , , b j ) / a i ,

Y1 = l c m ( a i , b i ) / b i ,

and all other variables equal to zero.

One of these

solutions m u s t be subtractable with nonnegative difference from any solution with value greater t h a n m a x ( m , n) x max~,~-lcm(a~, bj) a n d this is, therefore, a b o u n d on the value of solutions. A lower b o u n d and more effective e n u m e r a t i o n m e t h o d can be found in Huet [27]. The 7 basis solutions for the equation 2x + y + u = 2v + z are given by the table

x y u v z

I 2 3 4 5 6 7

0 0 0 0 0 1 1

0 1 0 1 2 0 0

1 0 2 1 0 0 0

0 1 z l 0 1 z 2 1 0 z s 1 0 ~ 1 0 z 5 0 2 4 1 0 ~

Thus, any nonnegative integral solution to the equation can be obtained by assigning nonnegative integers to the variables zl,. • •, z7 and computing x=

z6+z7

y = z2 + z4 + 2z5 u = z l + 2z3 + z4 V = Zz + Z4 + Zh + Z7 z = zl + z2 + 2z6

The corresponding s u b s t i t u t i o n {x ~-- f(z¢~ zT), y ~-- f ( z 2 , z4, zh, z s ) , u ~-- f ( z l , z3, z z , z 4 ) , v ~-f ( z z , z4, zh, zr), z ~

f ( z l , z2, z¢, z6)} is the single most general unifier if f has an identity element.

Associative-commutative unification without identity is slightly more complicated. Because without an identity element it is impossible to assign zero terms to some variable z~, it is necessary to consider the 2" combinations of the n basis solutions, restricted to those such that none of the variables x, y, u, v, or z is assigned zero. There are 69 such solutions, including (denoting a solution by the set of its indices) {2, 3, 6}, {1, 2, 3, 6}, and {4, 6} with corresponding unifying substitutions

1t4

{x ~-- z6, y ~'- z~,u +-- f(z3,z3),v +-- z3, z .e-- f(z~,z6, z~)} {~ +-

~ , y ~- ~2,~ ~ - / ( z l , ~ 3 , ~ ) , v ~- z3,~ +-/(~1,z~,~,~,)}

{~ ~-

z~,y ~ -

~,,~ +- ~,,v ~-

z,,~ ~-/(z~, z,)}

This set of 69 unifiers is a m i n i m a l complete set of unifiers of f ( x , x, y, u) and f(v, v, z). Associative-commutative unification for more general terms is accomplished by first forming

a variable abstraction of the terms. For example, in unifying f ( x , x, y, a) and f(b, b, z), variable only terms f ( x , x, y, u) and f(v, v, z) are formed by replacing the distinct nonvariable terms a and b by new variables u a n d v. The original terms can be obtained from their variable abstraction by applying the s u b s t i t u t i o n {u ~-- a, v ~-- b}. The variable only terms are unified as above. Each unifier of the variable only terms is then unified with {u ~-- a, v ,--- b} [83]. The resulting substitutions are a complete set of unifiers for the original terms. As stated so far~ this would seem to entail the unification of each of the 69 unifiers of

f(x,x,y,u)

and f(v,v,z)

with the substitution {u +-- a,v +- b}.

However, substantially less

effort t h a n this is required [75,77]. The generation of the sets of basis solutions can be constrained to take account of the origins of the variables. In particular, the variables u and v of the variable abstraction correspond to the constants a and b in the original terms. Each assignment to a variable x, y, u, v, or z in a unifier of f ( x , x , y , u )

and f ( v , v , z )

is either a variable z; or a

term f ( . . o). Any unifier that assigns u or v a term of the form f ( . . - ) will not be unifiable with the s u b s t i t u t i o n {u ~- a, v ~ b}. W h e n computing sums of basis solutions, any variable (e.g., u a n d v) t h a t comes from a nonvariable term in the original problem m u s t be assigned exactly one. Only 6 unifiers of f ( x , x, y, u) a n d f ( v , v, z) are discovered when this restriction is imposed. The n u m b e r can be reduced to 4 by observing the restriction that the use of basis-solution n u m b e r 4 requires the unification of a a n d b. The constrained generation of basis sums and unifiers for f ( x , x , y,u) a n d f ( v , v , z) yields sums {1, 5, 6}, {1, 2, 5, 6}, {1, 2, 7}, and {1, 2, 6, 7} with corresponding unifiers:

{x ~-- z6, y ~ f ( z b , z s ) , u ~ z l , v ~-- zb,z ~-- f(zl,z6, z6)} {= ~-

~0,y ~ - / ( z ~ , ~ , ~ ) , ~

{~ ~- ~,y

~- ~,~

~- ~,~

,- ~2,~ ~- ~,,v ~- ~,,~ +-

{~ ~- f(z~,~,),y

~- ~,~

~- ~,,.

~-

f(z~,~,~,~)}

f(~,~)}

~- ~,,z ~- f(~,z~,~,z~)}

Unification of these with the substitution {u ~-- a,v +-- b} yields the complete set of unifiers of

f(x,x,y,a)

and f(b,b,z):

{u ~- f(b,b),~ ~ - / ( a , x , ~ ) }

1t5

Termination of associative-commutative unification was an open question for a long time, but has now been solved [20]. Termination of standard unification is easy to verify because as each pair of symbols is matched during the scan from left to right for disagreements, either the remaining number of symbols is fewer (when the matched symbols agree) or the number of uninstantiated variables is fewer (when a disagreement is eliminated by assigning a term to a variable). Such a simple termination criterion does not exist in the case of associative-commutative unification because associative-commutative unification can introduce additional variables. It is necessary to show that the recursive calls on the unification algorithm operate on pairs of terms having less complexity than the original terms. For associative-commutative unification, a complexity measure that can be used to prove termination is the ordered pair (~, r) where ~ is the number of variables that occur as arguments to two different associative-commutative function symbols and r is the number of distinct nonvariable subterms that appear in the two terms being unified. A unification problem described by (~, r) is less complex than one described by (#, and only i f ~ < ~ t , o r v = #

r')

if

a n d r < r n.

The associative-commutative unification algorithm presented here can be extended to handle identity and idempotence [48,20]. If an identity element is present, then only the sum of all the basis solutions is necessary, not all 2~ subsets. If the associative-commutative function is idempotent, then the linear homogeneous diophantine equation can be solved in 0, 1 instead of over the integers. The variable and constants only case of abelian-group-theory unification (associativity, eommutativity, identity, and inverse) can be handled by a modification of this method that uses the standard solution of the linear diophantine equations in all integers, not just nonnegative ones [46].

3.5

Many-Sorted

Many-sorted unification

Unification [84,86] can be used to reason efficiently with sort information. The

universe of discourse is assumed to be divided into objects of different sorts. Constants, functions, and variables may be declared to have particular sorts and subsort relationships may be declared among sorts.

116

T h e t y p e s of s o r t i n f o r m a t i o n t h a t c a n b e h a n d l e d b y m a n y - s o r t e d unific&tion includes assert i o n s of t h e f o r m

Man(John)--John is a m a n Woman(Mary)--Mary is a w o m a n Man(/ather(x))--the

father of x is a m a n

Man(x) ~ Per~on(x)--every m a n is a p e r s o n Woman(x) D Person(x)--every w o m a n is a person. T h e s e a s s e r t i o n s are s u p p l a n t e d by sort declarations:

Man, Woman, a n d Person are sorts T h e c o n s t a n t John is of sort Man T h e c o n s t a n t Mary is of sort Woman T h e f u n c t i o n father is of sort Man T h e s o r t Man is a s u b s o r t of t h e sort Person T h e s o r t Woman is a s u b s o r t of t h e sort Person. M a n y - s o r t e d u n i f i c a t i o n uses such d e c l a r a t i o n s to r e s t r i c t t h e s t a n d a r d u n i f i c a t i o n algorithm. W h e n e v e r t h e u n i f i c a t i o n a l g o r i t h m eliminates a d i s a g r e e m e n t b e t w e e n two expressions by assigning a t e r m to a variable, t h e m a n y - s o r t e d unification r e s t r i c t i o n cheeks for c o n f o r m a b i l i t y of t h e sorts of t h e v a r i a b l e a n d t h e t e r m . T w o cases of m a n y - s o r t e d unification will be d i s t i n g u i s h e d . In t h e first case, t h e sort h i e r a r c h y is a forest, i.e., a set of trees. No sort C is a s u b s o r t of b o t h A a n d B (unless A is a s u b s o r t of B or B is a s u b s o r t of A). T h e second m o r e general case p e r m i t s c o m m o n s u b s o r t s a n d allows sort h i e r a r c h i e s t h a t axe g r a p h s . I n b o t h cases, a nonvaxiable t e r m c a n b e assigned to a v a r i a b l e only if t h e nonvaxiable t e r m ' s s o r t is t h e s a m e as or is a s u b s o r t of t h e v a r i a b l e ' s sort. T h e o t h e r s i t u a t i o n in w h i c h a d i s a g r e e m e n t c a n b e successfully e l i m i n a t e d is w h e n t h e disa g r e e m e n t consists of two d i s t i n c t variables. In t h e forest sort h i e r a r c h y case, if t h e variables are of t h e s a m e sort, e i t h e r c a n b e assigned to t h e o t h e r . If o n e ' s sort is a s u b s o r t of t h e o t h e r ' s , t h e f o r m e r v a r i a b l e m u s t b e assigned to t h e l a t t e r variable. T h u s , in unifying variables x a n d y, one c a n n o t , as in s t a n d a r d unification, u n i f o r m l y m a k e t h e a s s i g n m e n t x ~- y. We m u s t i n s t e a d m a k e t h e a s s i g n m e n t y *- x if x's sort is a s u b s o r t of y's. If n e i t h e r v a r i a b l e ' s sort is a s u b s o r t of t h e o t h e r ' s , t h e n u n i f i c a t i o n s i m p l y fails. For e x a m p l e , if x is a v a r i a b l e of s o r t Person a n d y a v a r i a b l e of s o r t Man, t h e n John a n d y are

t17

unifiable with unifier {y +- John), x and y are unifiable with unifier {x +- y} (but not {y *- x}), and Mary and y are not unifiable. Note that, by use of a technique familiar in logic p r o g r a m m i n g [161, if the sort hierarchy is a forest, m a n y - s o r t e d unification can be simulated by encoding sort information directly in the terms.

In this technique, there is a unary function symbol associated with each sort.

Sorted terms are e m b e d d e d in a sequence of such u n a r y function symbols corresponding to the sequence of sorts f r o m the top of the sort hierarchy to the declared sort of the term. Thus, the m a n John and the w o m a n Mary are represented by the t e r m s person(man(John)) and person(woman(Mary)), respectively. The arbitrary person x and m a n y are represented by the terms person(x) and person(mart(y)) respectively.

For example, similarly to above,

person(man(John)) and person(man(y)) are unifiable with unifier {y ~-- John), person(x) and person(man(y)) are unifiable with unifier {x *-- man(y)), and person(woman(Mary)) and

person(man(y)) are not unifiable. W h e n unifying two variables in the graph sort hierarchy case, if the variables are not of the s a m e sort and neither variable's sort is a subsort of the other's, the variables are still unifiable provided their sorts have one or more subsorts in common. For each c o m m o n subsort, a new variable of t h a t sort is created and assigned to b o t h variables being unified. It is sufficient to consider m a x i m a l c o m m o n subsorts, e.g., if $1 and $2 are the two c o m m o n subsorts of the sorts of the variables x and y being unified, but $2 is a subsort of $1, then only one unifier need be f o r m e d - - w i t h a new variable z of sort $I being assigned to b o t h x and y. For example, assume the declarations

Animal, Mammal, Lion, Dog, Cat, Fish, Shark, Koi, and Pet are sorts Mammal, Fish, and Pet are subsorts of Animal Lion, Dog, and Cat are subsorts of Mammal Shark and Kol are subsorts of Fish Dog, Cat, and Koi are subsorts of Pet. Let xF~sh denote the variable x of sort Fish, and the like. T h e n xF~,h and YP~t are unifiable with unifier {x +-- Ugoi, Y ~- ugoi} and ZM. . . . l and YPet are unifiable with unifiers {z *-- vDog,y *--

VDog) and {z +-- WCat,Y +-- went}. M a n y - s o r t e d unification can be very effective as experiments w i t h "Schubert's steamroller" puzzle indicate [85]. It blocks formation of terms t h a t are nonsense from the s t a n d p o i n t of the sort structure of the problem. T h e n u m b e r of clauses and literals in problems is reduced. Clauses

118

stating sorts of symbols and subsort relationships are eliminated. Because sort qualifier literals are removed from clauses so that, for example, the clause replaced by the unit clause

~Fox(x) v -~Bird(y) v Eats(x, y) is

Eats(xFoz, YBi~d), the remaining clauses tend to be shorter, and there

are likely to be more unit clauses. A further advantage is the abstract level of proofs using many-sorted unification. Suppose that foxes and birds are animals and that foxes like to eat birds. That some animal likes to eat some animal can be proved in a single resolution step by unifying the atoms of the assertion

Eats(x~o~,ys~r~) and the negated theorem "~Eats(UA,~mal,VA,i~aZ). The instantiation

of the variables of the theorem suggest the answer that all foxes like to eat all birds. Without using many-sorted unification, the assertion solved with the negated theorem

-~Fox(x) V -~Bird(y) y Eats(x, y) could be re-

-~Animal(u) v -~Animal(v) V -~Eats(u,v). The resulting clause

-~Fox(x) V -~Bird(y) V -~Animal(x) Y ~Animal(y) must then be refuted. This requires instantiation of x by some specific fox and y by some specific bird, e.g., the Skolem constants used in asserting the existence of foxes and birds, and the proof will end up mentioning a specific fox and bird. Worse yet, if there were a large number of assertions specifying that certain things were foxes or birds, there would be a large number of ways of instantiating the clause and thus a large number of proofs that may mention different foxes and birds. There are some important assumptions associated with the use of many-sorted unification. One is the assumption of nonemptiness of the sorts used.

P(x) and -~P(x) are not contradictory

if x's sort is empty. More restrictive in practice is the assumption that terms can be assigned their sorts a priori. For example, suppose

Tweety is declared to be of sort Animal of which sort Bird is a subsort.

The absence of the characteristic predicate

Bird makes Bird(Twecty) inexpressible. Even if the

Bird predicate is included, assuming or even proving the formula Bird(Tweety) has no effect on Twccty's declared sort, which is used to restrict the unification algorithm. Thus, there should be in the sort hierarchy only those sorts for which it is unnecessary to assume or prove that some term is a subsort of its declared sort. A limitation of this form of many-sorted unification is the lack of polymorphic sort declarations. It is often very useful to declare predicates and functions to have more than one possible set of sorts of arguments and for the sort of a function's value to depend on the sorts of its arguments. More general procedures for reasoning about sorts are being developed [14,15,691.

119

4

Equality

Reasoning

The equality relation is often used in problems to which theorem-proving programs are applied. Because of its widespread use and the difficulties resulting from simply axiomatizing it, much effort has been devoted to developing special rules of inference for the equality relation.

4.1

Equality

Axiomatization

The equality relation : is an equivalence relation, i.e., it is reflexive, symmetric, and transitive. These properties are usually given to theorem-proving programs as the following three assertions: X:X

~(~ =

y) v (~ =

~)

~(~ : y) v ~(y = ~) v (~ : ~) However, this is not the only possible expression of these properties. A smaller set of assertions t h a t conveys the same information is X=X

~(~ :

y) v ~(y = ~) v (~ =

~)

The symmetry property is obtained from these latter two assertions by resolving on x

:

x and

-~(x : y) to yield -~(x -- z) V (z : x). The standard transitivity axiom can then be obtained by resolving this with the second assertion. The reduced number of assertions may yield a smaller search space with lower branching factor, though with sometimes longer proofs. In addition to reflexivity, symmetry, and transitivity, the equality relation possesses substitutivity properties, i.e., terms that are equal to each other can be substituted for each other anywhere in a term or formula. These are expressed by two sets of assertions that specify the predicate-substitutivity and functional-substitutivity axioms. For each n-ary predicate P other than : ~ there are n predicate-substitutivity axioms of the form:

~(~, = ~) v ~ p ( x l , . . . , ~ , )

v P(~I . . . . ,~,-1, ~, ~,+~,...,~°)

~(~. : ~) v ~P(~I, ...,~,) v P ( ~ l , . . . , ~._1,~) For each n-ary function f, there are n functional-substitutivity axioms:

120

~(xl

= x) V (f(xl ..... x,~) = f(x, xz ..... x,~))

~(xl -~ x) V (f(xl,... ,xn) = f(xl,..., x~-1, x, xi+1,..., x,~)) ~(~

= ~) v ( / ( ~ 1 , . . . , ~ , )

= :(xl .... , ~_l,x))

The problems of using this axiomatic formulation of the equality relation is the large number of axioms and their generality. In particular, the n predicate-substitutivity axioms for the predicate P are always resolvable with any literal with predicate P. The search space is large and contains many useless and redundant results. It is also very laborious to derive even simple consequences of equality. For example, to derive the obvious fact that f(g(h(a))) = f(g(h(b))) from a = b requires three applications of functional-substitutivity axioms. These problems motivated the development of special rules of inference to be used in addition to resolution. These additional rules of inferences have been only partially successful. They have largely succeeded in reducing the length of proofs and deriving obvious results like the above in a natural way, but the rules are still sufficiently general that the problem of large search spaces for problems involving equality is not fully solved.

4.2

Demodulation

Demodulation [91] or rewriting or reduction [30] is the process of using a set of equalities to replace terms by equal terms. The equalities are oriented and made into reductions ~ ~ p that are used to replace instances of the term :k by the corresponding instance of the term p. Thus, for example, the term a + 0 can be reduced to a using the reduction x + 0 --~ x with substitution (x ~-- a}. The reduction process repeatedly applies reductions to a term until a term that cannot be further reduced is produced. For this process to terminate, the reductions must be oriented by some well-defined complexity measure so that, for every instance of a reduction, the right-hand side is less complex than the left-hand side. For example, associativity of + can be built into the reduction (x + y) + z ~ x + (y + z) because, by an appropriate complexity measure, terms parenthesized to the right are simpler, and the reduction process terminates, but commutativity cannot be used in a reduction, because the reduction x + y --+ y + x can be used to rewrite a A- b to b + a to a + b infinitely. Demodulation is used in resolution theorem proving to perform rapid equality inferences on derived terms. It also has the beneficial effect of reducing many equivalent terms to the same form (in the ideal case of a complete set of reductions, all equivalent terms to the same form) and

121

thus reducing the number of variants of equivalent terms appearing in clauses to be stored and facilitating subsumption. It is also useful for performing various programming llke tricks [87,88] such as maintaining lists of possible values of parameters in puzzles and removing individual possibilities by demodulation.

Narrowing [73,42] is an extension of reduction that uses unification instead of pattern matching. A special case of paramodulation, narrowing is especially useful for constructing a unification algorithm in an equational theory specified by a complete set of reductions [21,32]. Let s and t be a pair of terms to be so unified and H be a symbol not occurring elsewhere. Then if H(s, t) can be transformed to H(s ~,t') by a sequence of narrowing operations and s t and t ~ are unifiable by the standard unification algorithm, then s and t are unifiable in the equational theory specified by the complete set of reductions with a unifier that is the composition of the unifier of s s and t t and the unifiers used in the narrowing steps.

4.3

Paramodulation

Paramodulation [89] is an equality inference rule that performs substitution directly. The paramodulant clause L(..- b.- -) v C V D can be derived by paramodulation from the clause (a = b) v C or (b = a) v C into the clause L(.-. a . . . ) VD, where L(... a . . - ) denotes the literal L and a particular occurrence of the term a in L and where C and D are arbitrary clauses. That is, an equality atom can be used to replace one of its arguments by the other in any other literal, with the remaining literaIs of the two clauses included as part of the derived clause. In the general case, it may be necessary to find a unifying substitution for the term to be replaced and the equality-atom argument. Resolution plus paramodulation is complete provided the equality reflexivity axiom x = x is included [9]. Thus, the paramodulation rule eliminates the need for the equality symmetry, transitivity, and substitutivity axioms. This completeness result applies to unrestricted resolution plus paramodulation. If refinements, such as set of support are employed, it may be necessary to include functional-reflexivity axioms to preserve completeness. The set of functional-reflexivity axioms consists of, for each n-ary function f, the unit clause

f ( x a , . . . , xn) = f ( x l , . . . , x,). These are instances of the reflexivity axiom x = x. An illustration of the necessity of the functional-reflexivity axioms when the set of support refinement is usecl is the refutation of the set of clauses P(x,x), a = b, and -~f(f(a),f(b)) with P(x, x) designated as the only clause in the set of support. To refute this set, it is necessary to paramodulate from the functional-reflexivity axiom f(x) = f(x) into P(x,x) to obtain

122

P(f(x), f(x)). Paramodulating from a = b into P(f(x), f(x)) yields P(f(a), f(b)), which can then be resolved with the input clause

4.4

~P(f(a), f(b)).

R e s o l u t i o n by Unification and Equality

Resolution by unification and equality (RUE) [17,18] adopts a different approach to incorporating equality reasoning into an inference rule. Where paramodulation applies equality substitution, producing a new literal from a literal and an equality literal, resolution by unification and equality derives a set of negative equality literals from a pair of literals. For example, while L ( . . . b.- -) v C V D can be derived by paramodulation from (a = b) v C into L ( . . . a . . . ) V D, resolution by unification and equality performs the complementary operation of deriving the clause -,(a = b) v E V F from the clauses L(--- a . . . ) V E and -~L(--- b - . . ) v F . The principle involved is that L ( . . . a . . . ) and -~L(... b . . . ) can both be true only if a is not equal to b. Thus, -~(a = b), along with the other literals of the clauses containing L and -~L, can be derived. Of course, performing resolution by unification and equality may result in the formation of resolvents with more than one inequality literal if there is more than a single disagreement in the two literals being matched. For example, ~(a = c) v -,(b = d) can be derived from

P(f(a, b)) and

~PCfCe,4). There are completeness and efficiency issues involved in the selection of what disagreements are used to construct a resolvent by unification and equality. Matching

P(f(a, b)) and -~P(f(c, d))

using the resolution by unification and equality rule must result in

~(f(a,b) = f(c,d)) for a

successful refutation of the set of clauses consisting of

P(f(a,b)), -~P(f(e, d)), and (f(a, b) =

f(c, d)). Thus, creating an inequality literal from terms whose function symbols are the same but whose subterms disagree may be necessary for completeness. It is generally more efficient, and often successful in practice, to form inequality literals from the bottommost disagreement set, as in the earlier derivation of -~(a = c) v ~(b = d). The

negative reflexive function (NRF) rule is also necessary. It creates from a clause -,(~ =

t) V C the clause -~(s: = tl) V . . . v -~(s, = t~) V C, where (8:,t:) . . . . . ( s , , t , ) is a disagreement set between s and t. For example, if

-~(f(a,b) = f(c,d)) is derived from P(f(a,b)) and -~P(f(c,d))

by resolution by unification and equality, the lower level disagreement -~(a = c) V -,(b = d) can be obtained by applying the negative reflexive function rule to

-~(f(a, b) = f(c, d)).

Appropriate use (i.e., suitable choice of disagreement sets) of the resolution by unification and equality and the negative reflexive function rules together yields a complete procedure for

123

equality reasoning.

4.5

E-Resolution

E-resolution [59] is similar to (but predates) resolution by unification a n d equality, b u t is a more complex rule of inference that includes the use of p a r a m o d u l a t i o n . It is a higher level rule t h a n resolution by unification a n d equality in much the same way that hyperresolution is a higher level rule t h a n resolution. If the titerals L and L' can be m a d e c o m p l e m e n t a r y by a sequence of p a r a m o d u l a t i o n operations by clauses (s~ = Q) V C ~ , . . . , (s~ = t , ) V C,, then D V E V C~ V . . . V C~ can be derived from L V D a n d L' V E by E-resolution, where D , E , C1,... ,C~ are arbitrary clauses. This is also b o t h a generalization and specialization of of unification in equational theories. It specializes unification in equational theories by stipulating that p a r a m o d u l a t i o n is used to match the expressions. It generalizes unification in equational theories because n o n u n i t clauses containing equalities can be used a n d the theory is therefore not equational.

4.6

Knuth-Bendix Method

Let R be the set of reductions

)~1 --4 /91,

..

•

,

)~n ---4 Pn and E be the corresponding equational

theory A1 = Pl, ... ,An = p~. T h e n R is a complete set of reductions for E if and only if it is

terminating and confluent. The term tl can be reduced by R to the t e r m tz (written tl --+ t~) if some s u b t e r m u of tl is a n instance of ), (with substitution a) for some A --+ p in R and t2 is the result t2(u +- pa) of replacing u by the corresponding instance of p. R is t e r m i n a t i n g if and only if there is no infinite sequence of reductions tl ~ t~ --+ .... R is confluent if a n d only if for every term t, if t 2+ tl and t 2+ t2 (i.e., t can be rewritten by _R to tl and t~ in zero or more steps) there is some t e r m t' such t h a t tl _I+ t ~ and t2 _!+ g. If R is a complete set of reductions for E , t h e n tl ~= t~ J~ for every pair of terms tl and t2 such that tl =E t2, where t ~ denotes the result of reducing t by R to a n irreducible form. Thus, a complete set of reductions for E can be used to solve the word p r o b l e m for E. The K n u t h - B e n d i x m e t h o d [36,29,30,10,41,42] provides a test for a set of reductions being

locally confluent. A set of reductions R is locally confluent if a n d only if for every term t, if t --* tl and t --* t2 (i.e., t can be rewritten by R to each of tl a n d t2 in one step) there is some term t' •

!

such that tx ~" t' a n d t2 --+ t . T e r m i n a t i n g sets of reductions are confluent if a n d only if they are locally confluent.

124

Instead of considering all possible terms t that can be reduced by R to terms tl and t~, the Knuth-Bendix method performs superposition operations that capture the general case of two reductions being simultaneously applicable to a term. Let )q --~ p~ and Aj ~ pj be two not necessarily distinct reductions in R with variables renamed so that they have no variables in common. Let u be a nonvariable subterm of ,kl that is unifiable with/~j with most general unifier a. Then the terms tl = p~a and t~ = ~i(u ~-- p j ) a (the instantiation by a of ),~ with u replaced by pj) form a critical pair that represent one of the cases of of A~ --+ p~ and Aj --+ pj rewriting some term t (in this case, A~a) to terms tl and t~. If for every critical pair (ti, t2), tl J.-- t2 $, R is locally confluent. For example, the set of reductions

(1) g(e,~) -~ (2) f(g(~), ~) ~ e

(3) f(f(x, y),z) ---+f(x, f(y,z)) (4) f(~,~) ---, (5)/(~,g(~)) --, (6) ~(~) --, (7) g(#(x)) ~ (s) f(~(~), f(~, y)) ---,y

(9) :(~, f(9(~),~/)) --, y

(Io) gC:(~,y)) --,/(g(y),g(~)) is a terminating and locally confluent (and, hence, complete) set of reductions for free groups, where f is the group multiplication operator, g the group inverse operator, and e the group identity element. Two terms are equal in the theory of free groups if and only if they can be simplified to the same term by this set of reductions. But, the Knuth-Bendix method is more than a test for local confluence of sets of reductions. If the set of reductions is not locally confluent, it will generate a critical pair that leads to a counterexample, i.e., a pair of terms equal in the equational theory, but distinct and irreducible. If one of the terms is simpler than the other in a manner consistent with the complexity ordering of the other reductions, then the counterexample can be made into a reduction and added to the current set of reductions being tested for local confluence. Thus, the Knuth-Bendix method can (1) terminate with no additional eounterexamples, resulting in a complete set of reductions, (2) terminate with a counterexample that cannot be oriented into a reduction because neither term is simpler than the other, or (3) continue generating reductions forever (thereby constructing the infinite complete set of reductions). An example of Case (1) is that the previously mentioned complete set of reductions for free groups is generable from reductions 1-3 by the Knuth-Bendix method.

125

An example of Case (2) is the generation of the unorientable equality f ( x , y) = f ( y , x) from reductions 1-3 plus f ( x , x) -= e. An example of Case (3) is the generation of an infinite set of reductions

f(x, f(y,f(x,y))) --+f(x,y) f(x, f(y, f(x, f(y,w)))) --~f(x,f(y,w)) :(=,:(v,f(z,f(=,f(v,z)))))-, f(=,f(v,~)) :(=,f(v,f(~,:(=,:(v,f(~,~o))))))--,/(=,f(y,:(~,~))) from the reductions

f(=, =) - ,

f(f(x, y),z) ---+f(x, f(y,z)) The Knuth-Bendix method, when it applies, is extraordinarily powerful. The complete set of 10 reductions for free groups can be derived by computer from the original 3 with little wasted effort in just a few seconds. The chapter by Huet contains some further discussion of the standard Knuth-Bendix method. One of the most obvious limitations of the standard Knuth-Bendix method is its inability to handle theories with commutativity. Commutativity cannot be handled because the equation f ( x , y ) = f ( y , x) cannot be treated as a reduction without losing the required termination property. Despite examples such as the theory of free groups above that include associativity, the standard Knuth-Bendix method is also somewhat deficient in its handling of associativity. For example, the set of reductions f ( x , x ) ~ x and f ( f ( x , y ) , z )

---+f ( x , f ( y , z ) )

can be extended only

to an infinite complete set of reductions, atthough the single reduction f ( x , x) ~ x composes a complete set of reductions if f is assumed to be associative and associative pattern matching is used in its application. Such problems have provided motivation for extending the Knuth-Bendix method to handle equational theories that are divided into a set of reductions plus a set of additional equalities [63,28,35,43,44,45]. Functions that are associative and commutative are particularly important. With special handling for such functions, it is possible to derive complete sets of reductions for abelian groups and rings and many other interesting theories [31] An especially interesting example is that of Boolean algebra when the associative and commutative exclusive-or (0) and conjunction connectives are used as the basic set of logical connectives in terms of which formulas are rewritten [25]. The set of reductions

126

x =- y --~ x @ y @ t r u e x D y---+ ( x A y ) @ x ( ~ t r u e

~vy--~

(xAy)exe~,

-~X --+ X O true x @ f a l s e --* x x e x -+ f a l s e x A true --~ x x A f~e

-* false

can be used to decide the equivalence of two formulas in the propositional calculus by associativecommutative identity checking of the results of reducing the two expressions to their irreducible forms using associative-commutative pattern matching. A formula is valid or unsatisfiable if and only if it reduces to true or false, respectively. Extensions of the technique can be used for theorem proving in the first-order predicate calculus. An approach to handling functions that axe associative and commutative in the Knuth-Bendix method is to employ associative-commutative identity checking, pattern matching, and unification in place of standard identity checking, pattern matching, and unification [63]. The immediate difficulty with carrying out this modification is that the reduction f ( g ( x ) , x) --* e is not directly applicable to the term f(g(a), b, a, e) where f is an associative and commutative function (again treated as an n-ary function for arbitrary n) because f(g(a), a) is not a subterm of f ( g ( a ) , b , a , e ) .

The superposition process is likewise complicated. A solution is to enlarge

the set of reductions. In particular, for every reduction ~ --* p where A is headed by f which is associative and commutative, the reduction f()%v) --+ f ( p , v ) is added, where v is a new variable not occurring in ~ ~

p.

The embedding f ( g ( x ) , x , v )

~

f(e,v) (f(g(x),x,v)

~

v

after rewriting the right-hand side) can be used to reduce f ( g ( a ) , b , a , c ) to f(b,c) using the substitution {x +- a,v ~-- f ( b , c ) } obtained by associative-commutatlve pattern matching. The use of associatlve-commutative identity checking, pattern matching, and unification operations plus the addition of embeddings of reductions permit extension of the Knuth-Bendix method to handle functions that are associative and commutative.

References [1] Andrews, P.B. Theorem proving via general rantings. Journal of the A C M 28, 2 (April 1981), 193-214. [2] Benanav, D., Kapur, D., and P. Narendran. Complexity of matching problems. Proceedings of the First International Conference on Rewriting Techniques and Applications, Dijon, France,

127

May 1985. [3] Bl~ius, K., N. Eisinger, J. Siekmann, G. Smolka, A. Herold, and C. Walther. The Markgraf Karl Refutation Procedure (Fail 1981). Proceedings of the Seventh International Joint Conference on Artlfieial Intelligence, Vancouver, B.C., Canada~ August 1981, 511-518. [4] Bibel, W. On matrices with connections. Journal of the ACMZS, 4 (October 1981), 633-645. [5] Bibel, W. Automated Theorem Proving. Friedr. Vieweg & Sohn, Braunschwelg, West Germany, 1982. [6] Boyer, R.S: and J S. Moore. The sharing of structure in theorem-proving programs. In B. Meltzer and D. Michie (eds.). Machine Intelligence 7. Edinburgh University Press, Edinburgh, Scotland, 1972, pp. 101-116. [7] Brachman, R.J., R.E. Fikes, and H.J. Levesque. Krypton: a functional approach to knowledge representation. IEEE Computer 16, 10 (October 1983), 67-73. [8] Brachman, R.J., V. Pigman Gilbert, and H.J. Levesque. An essential hybrid reasoning system: knowledge and symbol level accounts of Krypton. Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, California, August 1985, 532-539. [9] Brand, D. Proving theorems with the modification method. SIAM Journal of Computing (December 1975), 412-430. [10] Buchberger, B. Basic features and development of the critical-pair/completion procedure. Proceedings of the First International Conference on Rewriting Techniques and Applications, Dijon, France, May 1985, 1-45. [11] Chang, C.-L. The unit proof and the input proof in theorem proving. Journal of the ACM 17, 4 (October 1970), 698-707. [12] Chang, C.-L. and R.C.-T. Lee. Symbolic Logic and Mechanical Theorem Proving. Academic Press, New York~ New York, 1973. [13] Clocksin~ W.F. and C.S. Meltish. Programming in Prolog. Springer-Verlag, Berlin, West Germany~ 1981. [14] Cohn, A.G. Mechanizing a Particularly Expressive Many Sorted Logic. Ph.D. dissertation, University of Essex, Essex, England, January 1983. [15] Cohn, A.G. On the solution of Schubert's steamroller in many sorted logic. Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, California, August 1985, 1169-1174. [16] Dahl, V. Translating Spanish into logic through logic. American Journal of Computational Linguistics 7, 3 (September 1981), 149-164. [17] Digricoli, V.J. Resolution by unification and equality. Proceedings of the Fourth Workshop on Automated Deduction, Austin, Texas, February 1979. [18] Digricoli, V.J. The efficacy of RUE resolution: experimental results and heuristic theory. Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, B.C., Canada, August 1981, 539-547.

128

[19] Eder, E. Properties of substitutions and unifications. Journal of Symbolic Computation 1 (1985). [20] Fages, F. Associative-commutative unification. Proceedings of the 7th International Conference on Automated Deduction, Napa, California, May 1984. Lecture Notes in Computer Science 170, Springer-Verlag, Berlin, West Germany, pp. 194-208. [21] Fay, M.J. First-order unification in an equational theory. Proceedings of the 4th International Conference on Automated Deduction, Austin, Texas, February 1979, 161-167. [22] Gallier, J. Logic for Computer Science. Harper & Row, New York, New York, 1.986. [23] Henschen, L.J. and S.A. Naqvi. An improved filter for literal indexing in resolution systems. Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, B.C., Canada, August 1981, 528-529. [24] Hewitt, C. Description and theoretical analysis (using schemata) of PLANNER: a language for proving theorems and manipulating models in a robot. Technical Report, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, April 1972. [25] Hsiang, J. Refutational theorem proving using term-rewriting systems. Artificial Intelligence Journal Z5, 3 (1985), 255-300. [26] Huet, G. Rdsolution d'6quations dana les langages d'ordre i , 2 , . . ,w. Th~se d'6tat, Sp~cialit~ Mathmatiques, Universit~ Paris VII, 1976. [27] Huet, G. An algorithm to generate the basis of solutions to homogeneous diophantine equations. Information Processing Letters 7, 3 (April 1978), 144-147. [28] Huet, G. Confluent reductions: abstract properties and applications to term rewriting systems. Journal of the A C M 27, 4 (October 1980), 797-821. [29] Huet, G. A complete proof of correctness of the Knuth-Bendix completion algorithm. Journal of Computer and System Sicences P3 (1981), 11-21. [30] Huet, G. and D.C. Oppen. Equations and rewrite rules: a survey. Technical Report CSL-111, Computer Science Laboratory, SRI International, Menlo Park, California, January 1980. [31] Hullot, J.-M. A catalogue of canonical term rewriting systems. Technical Report CSL-113, Computer Science Laboratory, SRI International, Menlo Park, California, April 1980. [32] Hullot, J.-M. Canonical forms and unification. Proceedings of the 5th International Conference on Automated Deduction, Les Arcs, France, July 1980. Lecture Notes in Computer Science 87, Springer-Verlag, Berlin, West Germany, pp. 318-334. [33] Jaffar, J., J.-L. Lassez, and J. Lloyd. Completeness of the negation as failure rule. Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, West Germany, August 1983, 500-506. [34] Jaffax, J. Minimal and complete word unification. Technical Report 51, Department of Computer Science, Monash University, Clayton, Victoria, Australia, March 1985. [35] Jouannaud, J.-P. and H. Kirchner. Completion of a set of rules modulo a set of equations. Technical Note, Computer Science Laboratory, SRI International, Menlo Park, California, April 1984.

t29

[36] Knuth, D.E. and P.B. Bendix. SimpIe word problems in universal algebras. In Leech, J. (ed.), Computational Problems in Abstract AIgcbras, Pergamon Press, 1970, pp. 263-297. [37] Kowalski, R. A proof procedure using connection graphs. Journal of the A CM 22, 4 (October 1975), 572-595. [38] Kowalski, R.A. Algorithm = logic + control. Communications of the A CM 22, 7 (July 1979), 424-436. [39] Kowalski, R. Logic for Problem Solving. Elsevier North-Holland, New York, New York, 1979. [40] Kowalski, R. and D. Kuehner. Linear resolution with selection function. Artificial Intelligence 2 (1971), 227-260. [41] Lankford, D.S. Canonical algebraic simplification in computational logic. Technical Report, Department of Mathematics, UniverMty of Texas, Austin, Texas, May 1975. [42] Lankford, D.S. Canonical inference. Report ATP-32, Department of Mathematics and Computer Sciences, University of Texas at Austin, Austin, Texas, December 1975. [43] Lankford, D.S. and A.M. Ballantyne. Decision procedures for simple equational theories with commutative axioms: Complete sets of commutative reductions. Report ATP-35, Department of Mathematics, University of Texas, Austin, Texas, March 1977. [44] Lankford, D.S. and A.M. Ballantyne. Decision procedures for simple equational theories with permutative axioms: Complete sets of permutative reductions. Report ATP-37, Department of Mathematics, University of Texas, Austin, Texas, April 1977. [45] Lankford, D.S. and A.M. Ballantyne. Decision procedures for simple equational theories with commutative-associative axioms: Complete sets of commutative-assoclative reductions. Report ATP-39, Department of Mathematics, University of Texas, Austin, Texas, August 1977. [46] Lankford, D., G. Butler, and B. Brady. Abelian group theory unification algorithms for elementary terms. Technical Report, Mathematics Department, Louisiana Tech University, Ruston, Louisiana, 1983. [47] Livesey, M. and J. Siekmann. Termination and decidability results for string unification. Memo CSM-12, Essex University, Essex, England, 1975. [48] Livesey, M. and J. Siekmann. Unification of A+C-terms (bags) and A÷C+I-terms (sets). Interner Bericht Nr. 5/76, Institut ffir Informatik I, Universit£t Karlsruhe, Karlsruhe, West Germany, 1976. [49] Lloyd, J.W. Foundation~ of Logic Programming. Springer-Verlag, New York, New York, 1984. [50] Loveland, D.W. A linear format for resolution. Proceedings of the IRIA Symposium on Automatic Demonstration, Versailles, France, 1968. Lecture Notes in Mathematics 125, SpringerVerlag~ Berlin, West Germany, 1970, pp. 147-162. [51] Loveland, D.W. A simplified format for the model elimination procedure. Journal of the A C M 16, 3 (July 1969), 349-363. [52] Loveland, D.W. Automated Theorem Proving: A Logical Basis. North-Holland, Amsterdam, the Netherlands, 1978.

130

[53] Luckham, D. Refinement theorems in resolution theory. Proceedings of the IRIA Symposium on Automatic Demonstration, Verailles, France, 1968. Lecture Notes in Mathematics 125, Springer-Verlag, Berlin, 1970, pp. 163-190. [54] Makanin, G.S. The problem of solvability of equations in a free semigroup. Soviet Akad. Nauk SSSR 233, 2 (1977). [55] Manna, Z. and R. Waldinger. A deductive approach to program synthesis. A C M Transactions on Programming Languages and Systems 2, 1 (January 1980), 90-121. [56] Manna, Z. and R. Waldinger. The Logical Basis for Computer Programming. Addison-Wesley, Reading, Massachusetts, 1985. [57] Martelli, A. and U. Montanari. An efficient unification algorithm. A C M Transactions on Programming Languages 4, 2 (April 1982), 258-282. [58] McCharen, J., R. Overbeek, and L. Wos. Complexity and related enhancements for automated theorem-proving programs. Computers and Mathematics with Applications 2 (1976), 1-16. [59] Morris, J.B. E-resolution: extension of resolution to include the equality relation. Proceedings of the International Joint Conference on Artificial Intelligence, Washington, D.C., May 1969, 287-294. [60] Murray, N.V. Completely non-clausal theorem proving. Artificial Intelligence 18, 1 (January 1982), 67-85. [61] Overbeek, R. An implementation of hyperresoIution. Computers and Mathematics with Applications 1 (1975), 201-214. [62] Paterson, M.S. and M.N. Wegman. Linear unification. Journal of Computer and Systems Science 16, 2 (April 1978), 158-167. [63] Peterson, G.E. and M.E. Stickel. Complete sets of reductions for some equational theories. Journal of the Association for Computing Machinery 28, 2 (April 1981), 233-264. [64] Plaisted, D.A. The occur-check problem in Prolog. New Generation Computing 2, 4 (1984), 309-322. [65] Plotkin, G.D. Building-in equational theories. In Meltzer, B. and D. Michie (eds.). Edinburgh University Press, Edinburgh, Scotland, 1972, pp. 73-90. [66] Robinson, J.A. A machine-oriented logic based on the resolution principle. Journal of the ACMIP, 1 (January 1965), 23-41. [67] Robinson, J.A. Logic: Form and Function. Elsevier North-Holland, New York, New York, 1979. [68] Rulifson, J.F., J.A. Derksen, and R.J. Waldinger. QA4: a procedural calculus for intuitive reasoning. Technical Note 73, Artificial Intelligence Center, SRI International, Menlo Park, California, November 1972. [69] Schmidt-Schauss, M. A many-sorted calculus with polymorphic functions based on resolution and paramodulation, Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, California, August 1985, 1162-1168.

131

[70] Siekmann, J.H. Unification of commutative terms. Interner Bericht Nr. 2/76, Institut ffir Informatik I, Universit£t Karlsruhe, Karlsruhe, West Germany, 1976. [71] Siekmann, J.H. Universal unification. Proceedings of the 7th International Conference on Automated Deduction, Napa, California, May 1984. Lecture Notes in Computer Science 170, Springer-Verlag, Berlin, West Germany, pp. 1--42. [72] Siekmann, J. and W. Stephan. Completeness and soundness of the connection graph proof procedure. Interner Bericht 7/76, Institut ffir Informatik I, Universit~t Karlsruhe, Karlsruhe, West Germany, 1976. [73] Slagle, J.R. Automated theorem-proving for theories with simplifiers, commutativity, and associativity. Journal of the A C M 21, 4 (October 1974), 622-642. [74] Smolka, G. Completeness and confluence properties of Kowalski's clause graph calculus. Interner Bericht 31/82, Institut ffir Informatik I, Universit£t Karlsruhe, Karlsruhe, West Germany, December 1982. [75] Stickel, M.E. A complete unification algorithm for associative-commutative functions. Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Tbilisi, Georgia, U.S.S.R., September 1975, 71-76. [76] Stickel, M.E. Mechanical Theorem Proving and Artificial Intelligence Languages. Ph.D. dissertation, Computer Science Department, Carnegie-Metlon University, Pittsburgh, PennsyIvania, December 1977. [77} Stickel, M.E. A unification algorithm for associative-commutative functions. Journal of the A C M 28, 3 (July 1981), 423-434. [78] Stickel, M.E. A nonclausal connection-graph resolution theorem-proving program. Proceedings of the AAAI-S2 National Conference on Artificial Intelligence, Pittsburgh, Pennsylvania, August 1982, 229-233. [79] Stickel, M.E. Theory resolution: building in nonequational theories. Proceedings o/the AAAI83 National Conference on Artificial Intelligence, Washington, D,C., August 1983, 391-397. [80] Stickel, M.E. A Prolog technology theorem prover. New Generation Computing P, 4 (1984), 371-383. [81] Stickel, M.E. Automated deduction by theory resolution. Journal of Automated Reasoning t, 4 (1985), 333-355. [82] Stickel, M.E. and W.M. Tyson. An analysis of consecutively bounded depth-first search with applications in automated deduction. Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, California, August 1985, 1073-1075. [831 van VaMen, J. An extension of unification to substitutions with an application to automatic theorem proving. Proceedings of the Fifth International Joint Conference on Artificial Intelligence, TbiIisi, Georgia, U.S.S.R., September 1975, 77-82. [84] Walther, C. A many-sorted calculus based on resolution and paramodulation. Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, West Germany, August 1983, 882-891.

132

[85] Walther, C. A mechanical solution of Schubert's steamroller by many-sorted resolntion. Proceedings of the AAAI-8~ National Conference on Artificial Intelligence, Austin, Texas, August 1984, 330-334. Revised version appeared in Artificial Intelligence g6, 2 (May 1985), 217-224. [86] Walther, C. Unification in many-sorted theories. Proceedings of the 6th European Conference on Artificial Intelligence, Pisa, Italy, September 1984. [87] Winker, S.K. and L. Wos. Procedure implementation through demodulation and related tricks. Proceedings of the 6th International Conference on Automated Deduction, New York, New York, June 1982. Lecture Notes in Computer Science 138, Springer-Verlag, Berlin~ West Germany, pp. 109-131. [88] Wos~ L , R. Overbeek, E. Lusk, and J. Boyle. Automated Reasoning. Prentlce-Hall, Englewood Cliffs, New Jersey, 1984. [89] Wos, L. and G.A. Robinson ParamoduIation and set of support. Proceedings of the IRIA Symposium on Automatic Demonstration, Verailles~ France~ 1968. Lecture Notes in Mathematics 125, Springer-Verlag, Berlin, 1970, pp. 276-310. [90] Wos, L., G.A. Robinson, and D.F. Carson. Efficiency and completeness of the set of support strategy in theorem proving. Journal of the ACM 12, 4 (October 1965), 536-541. [91] Wos, L., G.A. Robinson, D.F. Carson, and L. Shalla. The concept of demodulation in theorem proving. Journal of the A C M I~, 4 (October 1967), 698-709.

Fundamental Mechanisms in Machine Learning and Inductive Inference

Alan W. Biermann Duke University Durham, NC 27706

Supported in part, by the U.S. Army Research Office under $rant DAAG-29-84-K-0072

134

I. I N T R O D U C T I O N While learning and inductive inference are two distinctively different phenomena, they often appear together, and therefore, it is appropriate to study them simultaneously. Learning, for the purposes of this article, wilt be said to occur when a system self modifies to improve its own behavior. The scenario is thus that the system operates at a given performance level at one time, experiences events of one kind or another, and self modifies with purpose to achieve a higher level of performance at a later time.

Inductive inference occurs when a system observes examples (and possibly nonexamples) of a set and constructs a general rule to characterize the set. Thus, as an illustration, such a system might be shown several examples of arches and several objects that are not arches and asked to inductively infer a general rule that will distinguish all arches from other objects. The induced rule is only a guess based upon incomplete information, the known examples and nonexamptes. However, if the input information is representative, the guessed rule will be correct or nearly correct. If the rule has shortcomings, additional examples will often result in convergence to a correct form. The phenomenon of inductive inference has been studied under many different names in the literature including generalization, induction, concept formation, learning, categorization, and theory formation. Most systems that learn use inductive inference as the mechanism for improving behavior. That is, in the process of performing a task, the system infers rules about the domain and uses those rules in later actions to achieve a higher level of performance. This is the kind of learning system that will be studied here. Examples of learning systems that do not use inductive inference are those that improve behavior by simply memorizing facts or by discovering new behaviors using introspective mechanisms. The learning mechanisms to be studied here fall into five different categories, systems which learn. (1) finite functions, (2) grammars, (3) programs from traces, (4) LISP programs form input-output pairs, and (5) PROLOG programs from oracle queries. The first type of system learns functions which receive inputs and in a single computational action compute the associated output. The second type can learn a grammar for a language from example strings (or sentences) in the language (and possibly some nonsentences). The third type of system requires that the user lead the machine through a trace of a sample computation and then it infers a program for doing the computation. The fourth and fifth approaches

135

involve discovering LISP and PROLOG programs that can achieve certain target input-output behaviors. As each of these studies is undertaken, it is important to keep in mind the various measures of a learning machine. One should first notice the nature of the required training information. Are only positive examples of target behavior given, or are both positive and negative examples given? Is the information provided at random from the external world or can the learning machine ask for any fact it needs? Is the target behavior presented strictly in terms of input-output requirements, or does the training information show how the output is to be obtained from the input? Also one should notice whether it is possible to specify for the given learning machine exactly what is the set of behaviors that it can learn? Finally, what are the levels of error and rates of learning for the machine?

H. L E A R N I N G F I N I T E F U N C T I O N S A finite function will be defined for the purpose of this study to be any function which sequentially inputs a bounded amount of information and then computes an answer. Later sections in this chapter will study the acquisition of functions or programs which may process an input of unbounded length. While there are many finite function learning machines in the literature, five will be discussed here. Methods will be described for learning (1) (2) (3) (4) (5)

linear evaluation functions, signature tables, Boolean conjunctive and disjunctive normal forms, Michalski expressions, and semantic nets.

Learning Linear Evaluation Functions A linear evaluation function has the form y ~ ~ ci x i where y is the computed value, xl,x~,...,x n are i=l n inputs, and cl,c2,...,c n are variable coefficients. Learning is done by adjusting the coefficients for improved behavior. In many systems, such a linear function is built into a larger system which utilizes the computed value y for evaluating alternative decisions. Thus in a pattern recognition problem (Nilsson [65], Minsky and Papert [69]), the xi's may represent measurements or feature values of an unknown pattern and the pattern will be recognized as belonging to a given class if y is positive. In a game playi.,~g

136

situation (Samuel[59]), the xi's represent feature values of the specific position on the board and y is assumed to give a measure of the desirability of that position. Linear evaluation systems have been important in the learning literature because there are learning algorithms with guaranteed convergence to a solution if one exists and because much is known about the class of behaviors t h a t these systems can compute. A n example of a learning algorithm for such systems is the following (taken from Minsky and Papert [69]).

START:

Choose the constants c i randomly.

TEST:

Select an object from the set to be learned (positive information) or from outside the set (negative information), obtain its feature values Xl,X2,...,xa, and compute y -~ ~

c i x i. If positive

i=l

information was selected: If y > 0 then go to TEST. I f y < 0 then go t o A D D . If negative information was selected: If y < 0 go to TEST. If y > 0 go to S b ~ .

ADD:

For each i, ci = ci q- xi. Go to TEST.

SUB:

For each i, cl = el - xl. Go to TEST.

This algorithm loops without termination continuously selecting objects and testing its classification rule that asserts the object is in the class if y is positive and out otherwise. If a particular selected object is correctly classified, the algorithm does nothing except choase another object to test. If the object is not correctly classified, the coefficients are altered in the direction to increase y for positive information and

137

decrease it for negative information. If linear evaluation methods are used in a pattern recognition environment, then the learnable classes are those which are linearly separable in their feature spaces as defined, for example, by Nilsson [65]. Such classes are reasonably well understood and applicable in many domains (Fu [75]). However, many important features in a pattern cannot be recognized by these systems as has been described by Minsky and Papert [69]. For example, they showed that the well known "perceptron" recognizer which employs linear decision making is not capable of distinguishing geometric properties such as "connectedness" and "parity".

Learning Signature Tables Because of the limitations of linear methods, Samuel [67] developed a decision making scheme based on sequential table lookup as shown in Figure 1. The input values xl,...,xn are used to obtain output values from the lowest level table; these output values become inputs to the next level and so forth until a final function value is returned at the top level. Signature tables are capable of computing nonlinear functions and they are very fast in execution. The class of learnable functions has been characterized by Biermann, Fairfield, and Beres [82] and an optimal though expensive learning algorithm is known. The key insight needed for understanding signature tables comes from constructing a matrix of all the function values for each table in the system. The matrix for a table should have a row for each set of input values that feed that table and it should have a column for each set of input values that do not feed that table. As an illustration, consider the table labeled A in Figure 1. Its associated matrix, shown in Figure 2, has a row for each assignment of values to (xI,x2). These are inputs that "feed" table A. It has a column for each possible vector (x3,x4,x~,x6). We note that this matrix has only two distinct rows; the first and last rows are identical as are the second and third rows. This means that table A needs only two output values and that the first and last entries must be identical and the second and third entries must be identical, Thus this matrix shows that entries (0,i,1,0) must be made into the output column of table A. (Actually, (1,0,0,1) would also be satisfactory.) Similarly, aI1 other output columns of all other tables can be derived from their associated matrices so one has a synthesis methodology for such systems.

@

@

r~

@

@

II

0

0

0

0~

~-"

0

~-.'

0

0

0

o

~0

0

0

C~

0

0

0

c~

0

X

~ C~

0

0

C3

0

X

0

140

The synthesis methodology begins with a signature table system like the one shown in Figure 1 but with the output values for the tables being unknown. A matrix is constructed for each table in the system using the function to be realized as described above and the associated table output as derived. The resulting signature table system will correctly compute the target function. Samuel, however, was not able to use this learning scheme in the checker playing application because the size of the matrices would be too large and not all entries were known. His method amounted roughly to counting the number D of O's in a row and the number A of l's in that row and computing a coefficient C ~---(A - D) / (A + D). Then rows which had similar C's were given the same output values in the signature tables. Thus, as explained in Biermann et al. [82], Samuel identified rows with similar weights whereas the ideal solution identifies rows with similar or identical profiles. His system thus made errors proportional to the degree of variation of his method from the ideal. An analysis of his methodology and a suggestion for its improvement appears in Biermann et al. [82]. Signature tables have been used successfully in many applications in addition to game playing (see also Truscott [79] and Smith [73]) such as medical decision making (Page [77]) and operating systems (Mamrak and Amer [781).

Learning Conjunctive and Disjunctive Normal Form Boolean Expressions Valiant [84] has developed a series of algorithms for constructing normal form Boolean expressions from examples of target behavior. One class that was solved is the set of k-conjunctive normal form expressions which are made up of products of unions of not more than k input variables (some of which may be negated). Thus y ~---xl (x2 + x4) (x2 + xs) is a 2-conjunctive normal form since no more than two variables appear in any single conjunct. The output y can be computed from the inputs using the usual Boolean conventions so that, for example, (Xl,Xs,Xs,X4) =(1,1,1,1)

yields

y=

1

and

(xl,x2,xs,x4)=(1,0,1,1)

yields

y=0.

Valiant has given a strategy for learning such expressions from positive examples only (where y = 1). One begins with the k-conjunctive normal form which includes all possible k-conjuncts and then as each positive example behavior is encountered, those conjuncts which do not cover that example are

141

deleted. This process will be illustrated in she learning of a 2-conjunctive normal form when there are three possible inputs xl,x2, and x 3. The initial expression contains all possible 2-conjuncts.

The over-bar

notation is used to indicate negation.

y = xl x~ x8 x~ x2 x3 (xl + x~) (xt + ~ ) (~ + x~) ~

+ ~2) (x~ + x~) ...... (~2 + ~s)

Suppose a function is to be learned and the following positive example has been received: y = 1 when (xl,x2,xs) = (1,1,0). Then all conjuncts which yield 0 on this input are removed from the initial expression for y. T h a t is, x3,xl,x2,(xl + ~2), etc. are removed leaving the following expression. Y = Xl X2 X3 (Xl + X2) (Xl -~- X'2) ('X1 -]- X2) (Xl -[- X3) ...... (X2 -]- X3) If a second positive example is presented, say y :

1 if (x~,xe,x3) :

(0,0,0), then the expression would be

simplified further.

y = ~ (x~ + ~ ) (~ + x~) ...... ( ~ + ~). Clearly a sequence of such positive examples will quickly lead to a final expression if one exists capable of computing the target function. Valiant [84] uses a probabilistic model for selection of examples and defines a function to be learned when the probability of error on positive examples is less t h a n 1 / h where h is an arbitrary value. He has shown that, using his model, the k-conjunctive normal form expressions are learnable with a polynomial n u m b e r of positive examples and in polynomial time on the parameters h and k. He has also developed similar results on the monotone disjunctive normal form expressions (where negation is not allowed) and on other classes of Boolean expressions.

Learning Miehalski Expressions Michalski has developed a methodology for inducing generalizations from instances of scientific data and thus producing theories from observations. This methodology has been widely applied to medical and agricultural problems with considerable success (Miehatski [80]). The methodology begins by coding specific observational data into symbolic form and then performing generalizations on the basic data until one or more theories can be induced. For example, in a particular application a biological cell of a known type was described as follows:

142

CELL1, B1, B2 .... , B6 [contains (CELL1, BI, B2, ..., B6)] fcirc (CELLI) = 8] [pplasm (CELL1) = A]

[shape(B1) = ellipse] [texture (m) = strips] [shape (B2) = circle] etc. This statement asserts that there is a cell containing objects B1, B2,..., B6 and it enumerates various properties of the cell and these six objects. Many other cells of the known type were similarly coded and many cells outside of this class were coded. The task of the system in such problems is to find the properties or combination of properties needed to distinguish members of the known type from other members. It is often easy to find a way to distinguish one class from another. One way is to store all the source data and compare each unknown object with the set of known objects. The primary objection to this strategy is that it does not lead to understanding. It is much more desirable to know in simple terms what differentiates one class from another and then use these defining properties. The goal of the Michalski system is to discover such simple defining properties. Michalski has thus introduced the concept of preference criteria which enable the user of his system to limit the complexity of the generated theory. The user can set a series of weights which cause the system to bias its generated theories as desired along various complexity measures. The user can thus request that. the number of operators, the cost of measuring the features, and other significant complexity factors be minimized. The knowledge base for the system comes from both the given data samples and from rules related to the domain. For example, in a domain dealing with shapes, the system might be told that n-sided figures for any n are called polygons. Such rules are important because they give the system the opportunity to simplify theories and to achieve the preference criteria given by the user.

143

There are many generalization rules used by the system to build theories. Two will be described here to give the flavor of the approach. One is called the dropping condition rule. It states that if both A and B are observed in members of a type, then perhaps A alone is enough to characterize the type. That is, suppose it is known that all observed basketball players had the characteristics of being both tall and handsome; perhaps only tallness is needed to differentiate basketball players from others. A second example generalization rule is the adding alternative rule which states that A is observed in all eases of a type, possibly the type is characterized by the condition (A or B). Thus, again suppose all observed basketball players have been tall, one could propose the hypothesis that all basketball players are either tall or strong. Generalization rules have the properties of increasing the number of cases covered, and when used in combination, decreasing the told complexity of the describing expressions. The task of the Michalski system is to find combinations of such operators which will reduce the original data (as described in the second paragraph of this section) to simple expressions which successfully separate the specified type from all other cases. The program does this with a complex combination of extensive searching and "hill climbing" on the preference criteria. For example, in the cell classification problem given above, the system found five different ways of separating the given type from the other cells. Each defining rule is clearly a tremendous simplification of the original given characteristics of the cells and provides rather helpful observations about the type being considered. The five theories are as follows: 1. ~ (1) S [texture (B) = shaded] [weight (B) _> 3] 2. [circ = even] 3. 3 ( >

1) B [shape (B) = boat] [orient (B) = N v NE]

4. 3 ( _

1) B [#tails - boat (B) = 1]

5. ~ (1) S [shape (S) = circle] [#contains (S) = 1]

The first rule state that the cells of the given type differed from other cells in that they all contained an object of shaded texture and weight greater than or equal to 3. The second rule states that the observed cells of that type all had even circumference and this differentiated them from the other cells. The other

144

three theories give equally interesting and concise information. The Michalski system thus provides a method for scientific investigators to reduce symbolic data and to search for generalizations which may help to understand it. The investigator first must find a way to encode the problem data into a descriptive form satisfactory for input to the program. Then the background knowledge and observational statements must be coded. Finally, it is necessary to specify the type of description desired and preference criteria to guide the program toward acceptable solutions. The induced generalization may help the scientist to understand his data better and may suggest further avenues for research.

Learning Semantic Network Descriptions The learned data structure might also be a semantic network instead of symbolic calculus expressions of the type described above. Although many variations on the idea exist (Findler [79]), the usual semantic network represents objects as nodes on a graph and relationships between those objects with directed arcs between the nodes. Minsky [68] and his students extensively explored the concept of the semantic network during the 1960's and Winston showed how such structures can be synthesized from examples. An illustration of this type of learning is shown in Figure 3 where a representation of an arch is constructed at the top on the basis of one example. An arch is assumed to exist whenever two bricks support a third brick. However, a second example is given to the system showing that the supported object does not need to be a brick; it may be a wedge or perhaps other object. The third example presents negative information that can be used to derive a necessary condition for an arch: the supporting bricks may not touch. The Winston system works by merging the semantic nets from all positive examples and applying information from negative examples to determine minimum conditions for the concept. This work is important because it is one of the few learning mechanisms ever aimed at the construction of semantic nets. One of the significant results was Winston's discovery of the import~ance of having negative information in available examples to prevent overgeneralization in the learning process.

145

Example 1 ARCH

p dron

Example 2 ARCH

polyhedron

Example 3

is-a

NOT AN ARCH

F i g u r e 3. B u i l d i n g a s e m a n t i c

n e t to r e p r e s e n t

of a n a r c h .

the concept

146

HI. L E A R N I N G

GRAMMARS

Introduction

In contrast to the above learning systems, a learning machine may be required to classify data of unbounded length. Thus a system may receive strings of symbols of arbitrary length and have the task of classifying those strings as being in or not in a specified type. Since grammars are commonly used for classifying strings, it is reasonable to study the problem of inferring or constructing grammars from examples. As an illustration, suppose the set of strings A BAA ABABA AAA BBA ABAA

is known to be selected from a specific class. The question arises as to what general rule may characterize the strings in the class. One could make many hypotheses on the basis of these few examples, but a reasonable guess might be that every string ends in an A. A grammatical inference system would specify its guessed rule for the class by giving a grammar. The grammar for the set of strings ending in A is as follows where v is a nonterminal symbol and upper case letters are terminals. v --~ i v v-*Bv v ---*A If one has a successful grammatical inference system, it can find a grammar that represents the set to be classified and ever after use that grammar to correctly classify strings even if they have not previously been observed. Thus if a system has correctly discovered the above grammar, it could accurately classify such strings as ABAAAABA and ABAAAB as, respectively, in and not in the target set. The learned grammar can be thought of as either a recognizer of strings or a theory of the given data.

147

The grammatical inference model to be studied here assumes the existence of an information source and an inference machine.

The information source selects a language L from a known class C of

languages and presents examples which may be in L or not in L to the inference machine. At each time t=1,2,3,.., the information source presents a string which is marked "+" if the string is in L and "-" otherwise. -Information sources may be of two kinds, positive information sources which are organized so that every string in L appears at least once in the sequence and complete information sources which produce every possible positive or negative example at least once in the sequence. At each time t=1,2,3,.., the inference machine uses all information gathered to that time to make a guess at a grammar for the language L. The inference machine knows which class C the unknown language belongs to and must select a grammar for a member of this class. Using this model, there are many possible definitions of learnahility and three will be examined here, finite identification, identification in the limit, and strong approachability. It turns out that the complexity of the learnable grammar varies greatly depending on the definition of learnability used and on whether or not a complete information source with positive and negative examples is available. The next section will show what cla~ss of grammar can be learned under the various definitions of learnability and the last section will give an example of a grammar inference algorithm.

Finite Identification The first definition of learnability to be examined here is finite identification. With this definition, it is required that after only a finite number of samples from the information source, the inference machine identifies the unknown language correctly and announces that it has done so. Suppose the class C is made up of the three languages L1,L2, and L3 which are enumerated as follows:

Li = {A} L2 = {AB, AAB} L3 = {AB, AAB, AAAB}

148

This is one of the easiest learning problems imaginable since the inference machine needs only to see a few examples of the given languages to distinguish which is being presented. Thus if the information source presents the example A+, the machine will know that L 1 is correct and print the grammar {v --* A}. If the information source presents the example AB+, than either L2 or L3 will be correct but additional information is needed to make the selection. If AAA + comes from the information source, it will be possible to select L z. However, suppose the information source presents positive information only and presents the following sequence: AB+, AB+,AB+, AAB+, AB+, AAB+, ..... One might suspect that L~ is being presented but one cannot be sure. It may be that AAAB will appear as the billionth string in the sequence and that Ls is the correct answer. There is no way to prove that L2 is the answer because one can never be sure that AAAB will not appear later. The conclusion is that even this simple class is not learnable using positive information only. There exists a member Lz of the class which cannot be distinguished from other members from positive information only. In this case, the inference machine cannot at any time select L~ and announce that it has correctly identified the unknown. From the example, one can conclude the following: A finite class of finite sets is not in general

finitely identifiable from a positive information source. On the other hand, if this class C is to be approached using complete information, both positive and negative, then any member ean be finitely identified. Consider the following sequence from such an information source for L z. A-, AB+, B-, AA-, AAB+,..., AAAB-,... Since a complete information source will include every possible string somewhere, the key string AAABwill occur and when it does, the inference machine will be able to announce (with a proof) that L2 is the correct choice and print the grammar {v --+ AB, v --* AAB}. It is not predictable when the key string will occur, but it is known that it will appear somewhere. A generalization of this argument leads to the result that a finite class of finite sets is finitely identifiable from a complete information source.

t49

If C is the class of all finite sets, the problem of learning is much more difScult. Even with a complete information source, it is not possible to discover which finite set is to be selected at any given point in time because later samples may always produce unpredicted behaviors. Thus one can conclude that the

class of finite sets is not finitely identifiable from a complete (or positive} information ~ource. These results are summarized in the chart given below. An X appears in the entry where learnability was achieved. It is somewhat surprising that despite the simplicity of the problems being examined, only one positive result was obtained. Evidently the definition of learnability is so strict that only the most trivial learning problems can be solved. Another notable observation is that a complete information source is substantially more powerful than a positive only source. This effect will be seen more dramatically in later sections.

The Finite Sets Finite Class of Finite Sets

Not learnable

Not learnable

X

Not learnable

Complete Information

Positive Information

Figure 4. Learnability summary for finite identication.

I d e n t i f i c a t i o n in t h e L i m i t There are numerous examples of learning in nature such as the learning of natural language by children. Yet our discussion above showed that only the most trivial things can be learned if finite identification is required. In this section, the requirement will be removed that the system announces its final answer after a finite amount of time. Learning by identification in the limit will be achieved if the system correctly guesses the right answer at each time after some To but To is not known. In other words the learning system may guess the same answer for millions of consecutive times without being sure it has the correct answer. If unexpected data appears at any time, the system can modify its guess and hold that theory for an arbitrary length of time. The fact that the system is never required to announce a final

150

answer greatly increases the number of things that, can be learned. The system is required to make a guess at some point that wilt never be changed again as new information arrives but it will never be sure it has achieved the final answer. Many types of language can be identified in the limit. Consider, for example, the class C of all finite sets and assume that a positive information source is available. Assume the inference machine uses the strategy of guessing that the unknown language is made up of exactly the strings seen so far. Thus if the strings A + and 13+ have been seen, the guessed grammar would be {v ~ A, v --* B}. It is easy to see that this system will identify each language L in C in the limit because every string in the unknown L must appear at some time. So L will be guessed after all have been observed and the system will never change its guess. However, the system will not know that it has seen every member so will not be able to announce that it has a final answer. We conclude that the finite sets can be identified in the limit from

positive information only, a result that is much stronger than was possible in the previous section. However, if the above problem is made slightly more difficult, it is no longer possible to identify in the limit. Let C be the class of all finite languages plus the infinite language L0 that contains all possible strings. There exists an information sequence for L0 which has the property that the inference system will never select L0 and remain with it permanently. The inference system will change its guess repeatedly and without end. The pathological information sequence is designed as follows: A finite set Ll of strings is presented repeatedly until the system selects L1; then additional strings are presented until it selects Lo; then those finite strings are repeated until it selects that set as its finite guess, call it L2; then additional strings are presented until it selects L0; and so forth. Such an information sequence forces the inference system to change its guess an infinite number of times and thus violate the definition of identifiabitity in the limit. A class of languages containing the finite sets and one infinite language is called super finite. The current conclusion is that the super finite languages are not identifiable in the limit from positive

information only. If complete information is available, many classes of languages are identifiable in the limit. Consider any class of decidable rewriting systems. Such classes C must have these properties:

151

(1)

O must be enumerable. That is there must be an effective way to list the grammars G1,G2,Gs,... for all the languages in C.

(2)

O must be decidable. That is, if G i is a grammar for a language in C, there must be a way to decide whether G i generates a given string.

Examples of classes which are decidable rewriting systems are the regular, context-free, and contextsensitive languages. One can show that classes of decidable rewriting systems can be identified in the limit from complete

information sources. The inference system simply chooses the first grammar in the enumeration which can generate M1 the known strings marked " + " and none of the known strings marked % ' . Let G i be the first grammar in the enumeration for the target language to be learned. Such a G i must appear in the enumeration by the definition of decidable rewriting systems. Since every predecessor of G i in the enumeration will differ from G i on some string, that predecessor will be eliminated from consideration when t h a t string appears in the information source. So G i will be selected after all its predecessors are shown to be inadequate and the inference system will never again change its guess. Finally, one can prove that the reeursivcly enumcrable sets are not identifiable in the limit from com-

plete information. All of these results are proven by Gold [67] and are summarized in Figure 5.

Recumively Enumerable Sets

Not learnable

Not learnable

X

Not learnable

Super Finite Sets

X

Not learnable

The Finite Sets

X

X

Finite Glass of Finite Sets

X

X

Complete Information

Positive Information

Decidable Rewriting Systems

Figure 5. Learnability summary for identification in the limit.

152

Strong Approachability Feldman [72] has given a weaker definition of learnabiIity called strong approachability. This definition requires that (i)

for every string y in the target language there is a time after which every guessed language includes y,

(2)

for each grammar which does not generate the target language there is a time after which it will not be selected, and

(3)

there is a correct grammar which will be guessed an infinite number of (possibly nonconsecutive) times.

This definition is sufficiently weak that the system could select the wrong grammar most of the time and still be said to have learned. It, however, does include the essential elements of convergence. Feldman showed that the reeurMvcly enumerablc sets are strongly approachable from positive information. A summary of all three levels of learnability and the associated results appears Figure 6.

Recursively Enumerable Sets

XX

Decidable Rewriting Systems

XX

X

Super Finite Sets

X X

X

Finite Sets

X X

X

Finite Cla~s of Finite Sets

X X C+

X X C+

X C+

Strong Approachability

Identifiability in Limit

Finite Identifiability

X

Figure 6. Results summary for three levels of learnability.

t53

A n Algorithm for Grammatical I n f e r e n c e Few practical algorithms for grammatical inference have appeared over the past decades. Even though from a theoretical point of view many classes can be identified in the limit, most algorithms are so combinatorial that they cannot ordinarily be used. Two methodologies were developed with some capabilities to deal with finite state languages (Biermann and Feldman [72]) and context free languages (Wharton [77]). The first of these methods wilt be described here. Suppose the following set of strings are known to be samples from a finite state language and the task is to find its grammar: A, AA, BA, BB, AAA, ABA, ABB, BAA, BBA. Then one can construct the behavior tree shown in Figure 7. Each string is indicated on the tree by a node found by tracing the tree down from the top taking a left branch for each A and a right branch for each B. Next a finite automaton is built from this tree. The methodology involves selection of an integer k and constructing all subtrees found in Figure 7 of depth k. Choosing k = l in this example yields five types of subtrees as indicated in the figure, one corresponding to each of these sets of strings where A stands for the string of length 0.

1. {A}2. { A , A } 3 .

{A,B}4.

{A} 5. {}

These five subtrees then become states tbr a finite state machine as shown in Figure 8. Then a transition labeled x is placed in the finite state machine from state s i to state sj whenever the subtree in Figure 7 corresponding to s i has a transition labeled x to the subtree corresponding to sj. The set of all such transitions are shown in Figure 8. The initial state of the automaton corresponds to the top subtree in Figure 7. Ali states with A in their corresponding subtrees are labeled final states. The grammar can be constructed from the automaton by adding for each transition from s i to sj on input x a rule v i --* x vj to the grammar (plus a rule v~ --* x if sj is final). The resulting grammar in this example is given below.

154

'I

!

5 Figure 7. The behavior graph for the u n k n o w n finite state language.

6

A

5© I

~.j ........ :_A,B

>

A

6

Figure 8. A finite state accept,or for the u n k n o w n language.

155

v 1 --~ A v 2 v I --*A Vl --~ S vs v 2 --~ A v 2 v~ --~ A v2-, By 3 v2 --" A v4 v 2 --* B v 5 vs -'* A v 2 vs --*A V3 "-'+ n V2 vs ---~B v 3 -* A Y4 V3 --'~ B v 4 There are clearly redundant rules in this construction but the current concern is on how to build the grammar. The resulting g r a m m a r depends on the size of k. If k is large, the inferred language will be small and may even be finite. If k is small, the inferred language will be large. Biermann and Feldman [72] give a method of converging on the correct value of k. It adjusts k to obtain perfect behavior on all " s h o r t " strings.

Conclusion Research in grammatical inference has provided a mathematical model of learning and inference behaviors with definitions of learnability, convergence theorems, and many results concerning the Iearnabitity or lack thereof of various classes of behavior (See Angluin and Smith [83]). T h u s the field has tremendous theoretical importance in t h a t it provides models and tools t h a t can be applied in a variety of situations.

156

On the other hand, the field has led to few practical methods because of astronomical computations involved in most of the algorithms. The reason tbr this high cost is that example strings from a language provide no information about how the recognition computation is done. Thus the inference algorithms are reduced to enumerative methods for finding grammars. It was eventually realized that additional information about how to do computations would be needed if computational mechanisms such as grammars were to be synthesized. Later research thus tended to focus on the synthesis of programs where the inference environment often provides trace information that substantially aids in the synthesis. The following three sections describe approaches to the inference of programs where trace information can be used in the synthesis.

1V I N F E R R I N G P R O G R A M S F R O M C O M P U T A T I O N T R A C E S

Introduction: T h e Trainable Turlng Machine In many environments, trace information is available showing how a computation is done. It is not necessary to learn the grammar or program for doing a computation from only input-output behaviors.

The trainable Turing machine described by Biermann [72] provides an example of this. This machine has a

training mode in which the user can push the read-write head up and down the tape and indicate the desired computation by doing examples by hand. Then it has a computing mode in which the system acts like a normal Turing machine and uses a finite-state controller which was automatically synthesized on the basis of the hand examples. One can show that any Turing machine can be synthesized on the basis of such examples and that relatively few examples are needed in many practical situations. However, the computation cost for automatically synthesizing the finite-state controller can be high.

The Flowchart Generation Methodology One can see how to automatically generate a Turing machine from example computations by studying an example. Suppose it is desired to sort a sequence of A's and B's on a Turing machine tape so that all the A's precede the B's. The method will be to move the head of the Turing machine right until the first A is found. Then the head will move to the beginning of the tape and place the newly found A. Next it moves right looking for a second A which is moved left to be adjacent to the first A and so forth. An

t57

example of this computation appears below with the Turing machine head position being indicated by an underline.

BAA BAA BBA BBA B.BA AB__A AB& ABB ABB ABB AABB AAB _ The task is to automatically create a Turing machine that will do this calculation and hopefully, all other "similar" calculations. A notation is needed to represent a single head operation. Triples will be used that give, respectively, the symbol read from the tape, the symbol printed, and the subsequent head movement, left or right. Thus the triple ABR means that an A was read, a B was printed, and the head then moved right. The above twelve steps thus correspond to the following twelve head movements: BBR ABL BBL (blank) (blank) R BAR BBR ABL

158

BBL AAR BAR BBR

(halt) A finite state controller for the Turing machine is needed which will direct these head movements. The construction of the finite-state controller is shown in Figure 9. Initially the Turing machine has only one state and no transitions as in Figure 9(a). But the first head movement in the computation is BBR so a transition is added to account for it in (b). This means that if the machine is in state 1 and reads a B, it will print a B, move right, and go to state 1. The second desired movement is ABL which could also involve a transition from state 1 to state 1. Unfortunately the third step is BBL which contradicts the first step. It is not possible to be in state 1 and expect a B input to yield a move left because a transition already exists that directs a B input to yield a move right. Therefore the ABL transition must go from state 1 to state 2. (See (c)). The BBL transition can then proceed from state 2 to anywhere. (It is directed to state t unless that fails, state 2 unless that fails, etc.) The BBL transition is directed to state 1 as shown in (d). The fourth head movement is (blank)(blank)R which cannot go to states 1 or 2 because it is followed by a BAR step. Both states 1 and 2 have contradictory actions on a B input, BBR and BBL. So the fourth movement is indicated on a transition from state 1 to state 3. (See (e).) Continuing this series of arguments, the final flow is completed in Figure 90). This construction can be automated and is guaranteed to produce a Turing machine capable of executing the given example. The interesting point, is that this Turing machine will sort any tape of A's and B's no matter their order or the length of the tape. The basic algorithm is given in Biermann [72] and a greatly refined version appears in Biermaim et al. [75]. Once this methodology was discovered, it was applied to numerous problems. Biermann and Krishnaswamy [76] built a trainable desk calculator that was driven by a light pen at a display terminal. W a t e r m a n et al. [84] used the idea in the construction of an adaptive programmer's helper for computing systems. Fink and Biermann [86] used the technique to automaticMly construct diMogue models from

159

(a)

@ BBR

(b) 8BR

('~

BBR

(d) - ~ BBR

SBL

(e) (~1

(h)

~BR

BBL

BBR

BIIL

O A B L Q -~-Rf~

~xx:SC:~

BBR

I~BL

Figure 9.. Constructing a Turing machine controller.

160

human-machine conversations. The procedure appears to be a fundamental mechanism for procedure acquisition which will have continuing importance in the coming years.

V

CONSTRUCTING

LISP

PROGRAMS

FROM

EXAMPLE

INPUT-OUTPUT

BEHAVIORS Introduction During the 1970's, a number of researchers examined the problem of synthesizing LISP code from examples. See, for examples, Biermann [78], Biermann and Smith [79], Hardy [75], Jouannaud and Kodratoff [79], Smith [84], and Summers [77]. Synthesis of LISP from examples is marginally feasible because the structures of the input and output lists yield substantial trace information. One of the synthesis methodologies will be described here~ the synthesis of LISP programs from recurrence relations as developed by Summers [77]. Other methodologies are surveyed in Biermann et al. [84] and in Smith [84].

LISP Synthesis from Recurrence Relations Suppose it is desired to create automatically a LISP program that will convert input ((A B) (C D) (E F)) to (B D F). That is, the target program is to collect the second elements on a series of lists. The Summers methodology requires the user to display the loop pattern in the target program in a series of input-output examples. NIL ~ NIL

((A B)) --* (B)

((A B)(C D))-~ (B D) ((A B)(C D)(E F ) ) - * (B D F) The synthesis of this program will be explained following the treatment of Smith [84]. The first step involves writing the outputs in terms of their respective inputs using LISP car, edr, and cons functions. fl(x) = NIL

161

f2(x) = cons (cadar (x)

NIL)

f3(x) = c o ~ (cadar (x) cons (cadadr (x)

NIL))

f~(x) = cons (cadar (x) cons (cadadr (X)

cons (cadaddr (x)

N~L)))

In fact, a program for achieving the observed example is F(x) = (cond (pi(x)

fl(x))

(pe(x)

f2(x))

(Pa(X)

fa(x))

(p4(x)

f4(x)))

where the pi's are predicates which select the correct fl to execute in each case. In fact, Summers gives a simple predicate generating algorithm which finds the pi's.

pi(x)

=

atom

(X)

p~(x) = atom (car (x)) p~(x) = atom (cddr (x)) p~(x) = atom (cdadr (x)) Program synthesis then involves finding a way to roll the straight line code for F into a loop. So the methodology tries to find a recurrence relation which relates each fl to previous f/s where j
fl(x) = Nm f2(x) = cons (cadar (x)

f,(cdr (x)))

f3(x) = cons (cadar (x)

f2(cdr (x)))

f,(x) = cons (cadar (x)

f3(cdr (x)))

So the recurrence relation is easily seen to be

fi(x) = cons (cad~ (x)

fi_,(cdr (x)))

for i=2,3,4. Similarly a recurrence can be found for the pi's.

162 pi(x) ---- pl_t(cdr (x)) for i=2,3,4. The induction step then assumes that these recurrence relations hold for all i > 1 and applies the Summers Basic Synthesis Theorem: If pi(x) ..... pk(x), pk+.(x) = pn(b(x))

for n _> 1

fi(x) ..... fk(x), fk+n(x) = C (fn(b(x)), x)

for n > 1

where b is a function of car's and cdr's and C is a cons structure that includes f~(b(x)) exactly once, then the function F (X) = (¢ond (pl(x)

fl(x))

(p~(x)

f~(x))

---

)

can be computed by the following recursive program. F (x) = (cond (p,(x)

fl(x))

(p~(~)

f~(x))

(pk(x)

fk(x))

(T

C (F(b(x)), x)))

In this example, b = c d r , k ~ l , and C (F(b(x)), x) = cons (cadar (x)

F (cdr (x))).

So the synthesized program is F (x) = (¢ond (atom (x)

(T

NIL)

cons (eadar

(x)

F (¢dr (x)))))

In summary, the Summers synthesis methodology begins with a carefully constructed set of examples which illustrate the desired recursive computation, The pi's and fi's are constructed for the given

163

examples and then each Pi and fi is written in terms of Pi-k and fi-k- Recurrence relations are then derived and the synthesis theorem is applied to give the finM program. This system is able to efficiently generate many useful single loop programs. Biermann [78] applied the flowchart synthesis methodology described in the previous section to the LISP synthesis problem. This leads to an algorithm for the synthesis of regular LISP programs from exampies. The regular LISP programs are analogous to finite start automata and allow arbitrarily complicated flow of control. This system does not require the user to carefully construct examples as was done by Summers but it also requires more execution time for a synthesis.

VI SYNTHESIZING PROLOG PROGRAMS FROM EXAMPLES Shapiro [83] has developed a methodology for synthesizing PROLOG programs from examples. A flowchart for the system appears in Figure 10. Its operation will be illustrated by showing how it constructs the program for the function member

(X,Y)

which yields true if X is a member of list Y and false

otherwise. The system functions as follows. The user furnishes example

facts illustrated

at the left side of Fig-

ure 10. These facts include positive information showing desired behavior for the target program and negative information showing undesired behavior. Thus in the member example, a user might supply the facts "member (a,[a,b]) is true" and "member (c,[a,b]) is false". The system at all times maintains a PROLOG program as shown at the right and it continuously compares the current version of the PROLOG program with the known collection of user-supplied facts. If the current program is not satisfactory because of lack of correctness with respect to some fact, the program is modified either by adding new clauses from the generator at the top or by throwing away existing clauses. Normal operation of the system thus involves continuously debugging the existing PROLOG program with respect to the known facts. Three kinds of errors may occur: (1) The program may compute a result which is undesired, an incorrect answer. (2) The program may be unable to compute a desired answer. (3) The program may not terminate.

164

In the first kind of error, the system simulates the incorrect computation and continuously queries the user and the data base to check that each step is correct. When a PROLOG clause is found that computes an incorrect result from correct premises, that clause is removed from the program as indicated at the bottom of Figure 10. In the second type of error, the system again simulates the computation but this time it will fail because some needed result was not computed. This indicates an additional clause is required and the enumerator at the top is run until the needed clause is found. In the third type of error, the simulator halts after a prespecified limit on computation size has been exceeded and then the system searches for causes of the suspected loop. The system searches for places where the same computation state may be reentered more than once and it may also query the user concerning violations of a well founded ordering needed for termination. The nontermination will be caused by some clause which computes an undesired result and the debugging procedure will discover that clause and remove it. Shapiro experimented with many different types of clause generators and associated with each was a class of synthesizable programs. He also developed a scheme for improving efficiency by avoiding the generation of many clauses which are covered by other clauses previously shown to be inadequate. At the time of system invocation, the user is asked to furnish the names of predicates appropriate to the current problem and to indicate which predicates may appear on the right sides of PROLOG clauses. Proceeding with the synthesis of the member program, the user first indicates that "member (_~_)" is an appropriate predicate and that it can appear on the right hand side of rules. The system begins with the empty program and debugs it with respect to given facts. If the user supplies the fact "member (a,[a]) is true", the system will discover its current program is incomplete, an error of type (2) listed above. It will be assumed for the purposes of this treatment that the generator will create the following clauses in the order given. member (X,Y) *- true

member (X,[XlZ]) ~- true member (X,[YIZ]) *-- member (X,Z)

member (X,Y) ~- member (Y,X) etc.

165

ill rul

Generator for al~ posslble clauses

IProgram I

.'~f~.,~

Get onother clause ii

"'

ill

Incomplete

Facts

l

i

i

PROLOG

J Monltor I

i

...........

1

Interpreter Nonterminatlon debugglng

i|iii

I I L

1

Program

Drop clause

Drop the offending clause i

ii

answer

Away

Figure 10 The Sh~pTrosynthes~s algorlthm

166

The notation [XIY~ stands for the list with heazt X and tail Y. The first call to the generator would then yield the current program { member (X,Y) ~- true } which covers the given example. Next suppose the user provides the fact that "member (a,[b]) is false". Here the system would discover a type (1) error and discard the single clause in the current program. This means that member (a,[a]) is no longer handled causing a type (2) error and another call to the generator. { member (X,~X[Z]) ~

true }

This program satisfies both known facts. Again the user may supply a fact: "member (a,[b,a]) is true". So another type (2) error results in an additional clause generation and the final program. { member (X,[XIZ]) *- true, member (X,[YIZ]) *- member (X,Z) } Shapiro showed his system to be capable of generating a variety of programs and compared it to various other systems. For example, his system solved the following problem posed by Biermann [78]: Construct a program to find the first elements of lists in a list of atoms and lists. Thus the program should be able to input [a,[b],c,[d],[e],f] and compute the result [b,d,e]. Shapiro's system needed 25 fa~ts to solve this problem and constructed the following program after 38 seconds of computing { heads ([ 1,[ ]) *-- true, heads ([[XlY]IZ],[XIW]) ~- heads (Z,W), heads ([X/Y],Z) *-- atom (X), heads (Y,Z) } Biermann's regular LISP synthesis system was able to create a solution for this problem using only the single example given above. However, its execution time was approximately one half hour.

V. C O N C L U S I O N Computer science has historically required programmers of systems to anticipate every possible behavior that could be desired and to program in advance M1 the knowledge and mechanisms needed to achieve it. Unfortunately, it has been found that such extensive and explicit programming is expensive

167

and it still, in many cases, does not achieve the range of behaviors that might be needed. The only alternative is to have the machines program themselves to acquire the knowledge they need to function satisfactorily. This chapter has described many mechanisms for machine learning and provides an introduction to the field. Additional information can be found in the references and in the textbook on learning edited by Michalski et al. /88]. REFERENQES {1] D. Angluin and C. Smith [1983], "Inductive inference: theory and methods", ACM Computhag Surveys, Vol. 15. [2] A. Biermann and J. Feldman [1972], "On the synthesis of finite-state machines from samples of their behavior", IEEE Trans. on Computers, Vol. C-21. [3] A. Biermann [1972], "On the inference of Turing machines from sample computations", Artificial Intelligence, Vol. 3.

[4] A. Biermann, R. Banm, and F. Petry [1975], "Speeding up the synthesis of programs from traces", IEEE Trans. on Computers, Vol. C-24.

[5] A. Biermann and R. Krishnas~vamy [1976], "Constructing programs from example computations", IEEE Trans. on Software Engineering, Vol. SE-2.

[6] A. Biermann [1978], "The inference of regular LISP programs from examples", IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-8.

[7] A. Biermann and D. Smith [1979], "A production rule mechanism for generating LISP code", IEEE Tran6. on Systems, Man, and Cybernetics, Vol. SMC-9.

[8] A. Biermann, J. Fairfield, and T. Bares [1982], "Signature table systems and learning", IEEE Transactions .on Systems, Man and Cybernetics, Vol. SMC-12, No. 5.

[91 A. Biermann, G. Guiho, and ¥. Kodratoff, (Eds.) [1984], Automatic Program Construction Techniquuues, Macmillan Publishing Co., N.Y.

[10] J. Feldman [1972], "Some decidability results in grammatical inference", Information and Control, Vol. 20. [11] N. Findler, Ed. [1979], A88oeiative Networks, Academic Press, N.Y.

168

[12] P. Fink and A. Biermann [1988], "The correction of ill-formed input using history-based expectation with applications to speech understanding", to appear. [13t K.S. Fu [1975], Syntactic Methods in Pattern Recognition, Academic Press, N.Y. [14] M. Gold [1969], "Language identification in the limit", Information and Control, Vol. 10. [15] S. Hardy [1975], "Synthesis of LISP programs from examples", Proc. Fourth International Joint Conf. on Artificial Intelligence.

[16] J.P. Jouannaud and Y. Kodratoff [1979], "Characterization of a class of functions synthesized from examples by a Summers-like method", Proc. Sixth International Joint Conference on Artificial Intelligence.

[17] S. Mamrak and P. Amer [1978], "Estimation of run times using signature table analysis", NBS Special Publication 500-14, Fourteenth Computing Performance Evaluation User's Group, Boston,

Mass., Oct., 1978. [18] R.S. Michalski [1980], "Pattern recognition as rule-guided inductive inference", IEEE Trans. on Pattern Analysis and Machine Intelligence.

[19] R.S. Miehalski, J.G. Carbonell, T.M. Mitchell, [1983], Machine Learning, Tioga Publishing Company. [20] M. Minsky, Ed. [1968], Semantic Informatioq Processing, M.I.T. Press, Cambridge, Mass. [21] M. Minsky and S. Papert [1969], Perceptrons, M.I.T. Press, Cambridge, Mass. [22] N. Nilsson [1965], Learning Machines, McGraw Hill. [23] C. Page [1977], "Heuristics for signature table analysis as a pattern recognition technique", IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-7.

[24] A. Samuel [1959], "Some studies in machine learning using the game of checkers", IBM Journal of Research and Development.

[25] A. Samuel [1967], "Some studies in machine learning using the game of checkers, II", IBM Journal of Research and Development.

[26] E.Y. Shapiro [1983], Algorithmic Program Debugging, M.I.T. Press, Cambridge, Mass. [27] D. Smith [1984], "The synthesis of LISP programs from examples: a survey", in A. Biermann, G. Guiho, Y. Kodratoff (Eds.), Automatic Program Construction Techniques, Macmillan Publishing Co., 1984.

169

[28] M. Smith [1973], "A learning program which plays partnership dominoes", Communications of the ACM, Vol. 16. [29] P. Summers [1977], "A methodology for LISP program construction from examples", Journal ACM, Voh 24. [30] T. Truscott [1979], '"The Duke checker program", Journal of Recreational Mathematics, Voh 12. [31] L. Valiant [1984], "A theory of the learnable", Communications of the ACM, VoL 27. [32] D. Waterman, W. Faught, P. Klahr, S. Rosenschein, and R. Wesson [1984], "Design issues for exemplary programming", in A~ Biermann, G. Guiho, Y. Kodratoff (Eds.), Automatic Program Construction Techniques, Macmillan Publishing Co., N.Y. 1984. [33] R. Wharton [1977], "Grammar enumeration and inference", Information and Control, Voh 33.

METHODS OF AUTOMATED REASONING A tutorial

Walfgang Bibel

Technische Universitfit M6nchen

ABSTRACT This chapter introduces into various aspects and methods of the formalization and automation of processes

involved in performing inferences.

It views automated inferencing as a

machine-oriented simulation of human reasoning, In this sense classical deductive methods for first-order logic like resolution and the connection method are introduced as a derived form of natural deduction. The wide range of phenomena known as non-monotonic reasoning is represented by a spectrum of technical approaches ranging from the closed-world assumption for data bases to the various forms of circumscription. Meta-reasoning is treated as a particularly important technique for modeling many significant features of reasoning including selfreference. Various techniques of reasoning about uncertainty are presented that have become particularly important in knowledge-based systems applications.

Many other methods and

techniques (like reasoning with time involved) could only briefly - if at all - be mentioned.

172

GONTENTS

INTRODUCTION

I. N A T U R A L AND A U T O M A T E D DEDUCTION

2. N O N - M O N O T O N I C REASONING 2.1 A formalism for data bases 2.2 Negation as failure 2.3 Circumscription 2.4 Inferential minimization with circumscription 2.5 Other approaches to inferential minimization

3. M E T A - R E A S O N I N G 3.1 Language and meta-language 3.2 Application to default reasoning 3.3 Self-reference 3,4 Reasoning about knowledge and belief 3.5 Expressing control on the recta-level

4. REASONING ABOUT UNCERTAINTY 4. t Bayesian inference 4.2 The Dempster-Shafer theory of evidence 4.3 Fuzzy logic 4.4 Performance approaches 4,5 Engineering approaches

5. SUMMARY AND CONCLUSIONS

REFERENCES

173

INTRODUCTION

There has never been given a widely accepted definition of intelligence that both accounts for our everyday use of this notion and at the same time yields a precise and formal notion. This is probably because intelligence is a fuzzy notion in everyday use comprising various more precise notions at the same time which until now have not been elaborated. In this situation we have no choice other than continuing to talk about intelligence in an informal and intuitive way.

Taking such an intuitive view it seems that we associate with intelligence at least the following capabilities. A person without any knowledge would never be called intelligent; hence the capability to dispose of a certain amount of knowledge is one fundamental aspect of intelligence. However , a person with all the entries in the Webster at immediate disposal, but this being the only capability, would still not be regarded as truly intelligent because we also expect the capability for solving problems in changing environments from an intelligent being. For reasons that will be discussed in section 5 of this paper, problem solving is considered a special form of reasoning. With this understanding we thus can say that the capability of reasoning is another fundamental aspect of intelligence. Intelligence has more such aspects.

For instance, the speed with which the aforementioned

capabilities are performed certainly plays an important although complex role in our intuitive understanding. In view of the computer systems that we actually have in mind, we integrate this aspect into the previous two ones as technical issues. An impressive capability for learning of course is part of our understanding of intelligence as well. But learning can be understood as a sort of problem solving and thus is taken already into account by the previous two fundamental aspects. Further an intelligent being must have the capability for communication with others; but we might prefer to combine this capability with our understanding of intelligence only insofar as the inherent problem solving and reasoning processes are concerned. We will ngt continue this analysis any further but rather draw already at this point the conclusion that, under the views taken in our analysis so far, there are essentially two fundamental aspects of intelligence, one is knowledge and the other reasoning. Let us take this as a working thesis claiming that any further aspects somehow can be understood in terms of these two as we have just discussed. The focus of this tutorial is on techniques that lend themselves to an automatic treatment. In this context we prefer to substitute the notion of knowledge by that of a knowledge base and the notion of reasoning by that of inference as a matter of notational convention. In this restricted context our thesis translates into the architectural concept that any intelligent (computer) system may be viewed as basically consisting of two fundamental components, the knowledge base and the inference component. The two are not independent, of course, since inferencing has to take the representational structure of the knowledge base into account and

174

this structure in turn heavily influences the performance of inference. The topic of this paper, then, is a treatment of various important techniques in use for the one of these two components, viz. the inference component. Because of the interdependence just mentioned this will occasionally require some discussion of issues of knowledge representation as well which will be restricted to a m i n i m u m , however, since they are extensively discussed in the contribution by J . P . Delgrande and J. Mylopoulos within this volume.

Hence

our topic is a rather restricted one; none the less it is one of central importance for Artificial Intelligence (AI) as the previous discussion should have demonstrated. To some extent the distinction between knowledge and inference may be questioned, since inference may be regarded as a sort of knowledge as well.

Specifically, the capacity for

inferencing derives from a knowledge about how one can infer new knowledge from previous knowledge. U n d e r this view it might be classified as meta-knowledge that provides information about the relation of different pieces of object-level knowledge.

Never the less the special

way, how this particular kind of knowledge is used in knowledge-based systems, justifies the distinction we made. A confusingly rich variety of methods and techniques for inferencing is known today. It ranges from exact mathematical theorem proving to the speculative conclusions of a stockbroker in one dimension, and from the human forms to sophisticated machine versions in another. An exhaustive treatment is therefore beyond the limits available within this volume.

Yet we

make an attempt to provide the reader with a feel for most of the aspects of inferencing.

At

the same time we try to present the different approaches as far as possible in a uniform way. The paper begins in section 1 with classical first-order reasoning.

In particular, we present

this form of reasoning as a model of h u m a n mathematical reasoning following Gentzen with his calculus NK.

On this basis the question is pursued in some detail how a deduction may

be determined for a given formula.

This leads us to a more technical version of Gentzen's

calculus, from which we then derive the idea for the connection method in Automated Theorem Proving (ATP).

This way of treatment and selection of topics is meant to comple-

ment the chapter by Stickel in the same volume. Section 2 is devoted to non-monotonic reasoning which is characterized by the following two features.

O n the one hand a problem description is always to be seen in the context of addi-

tional common-sense knowledge that is assumed by default.

On the other hand the resulting

complete description is to be understood in some sense in a minimal way.

Our emphasis is on

this latter aspect. There are several techniques for achieving this kind of minimality.

We dis-

cuss the techniques of predicate completion in relational data bases and of negation-by-failure in P R O L O G in some detail.

T h e n we provide an introduction to the circumscription approach

along with an illustration of its use for common-sense reasoning.

Finally, we briefly review a

number of other approaches such as default reasoning and reasoning that tolerates inconsistencies in the knowledge base.

175

Some of these approaches to non-monotonic reasoning demonstrate the need for a distinction of the reasoning on the object level from the one on the meta-level (or recta-recta-level, etc.). This important topic is treated in section 3. In particular, we explain the distinction between these various levels of languages and explain how one can technically amalgamate more than one, such levels into a single one. This may then directly be applied to non-monotonic reasoning, to expressing control knowledge explicitly in a system (such as PROLOG), and to other important topics. We also present a recent solution that allows to express self-reference within first-order logic which has an important application in the reasoning about knowledge and belief. Other approaches to this latter area are briefly discussed as well. The next major topic is reasoning about uncertainty in section 4. This is of a particular actuality since many knowledge-based systems necessarily have to cope with the uncertainty of the available information. The most widely used approach to these phenomena is based on Bayes' theorem; but we also point out a number of problems that have been experienced with this method. One of these problems have initiated the development of a related approach based on the Dempster-Shafer theory of evidence. In addition to that we conclude the presentation of such probabilistic approaches with a discussion of fuzzy logic that has been developed from fuzzy set theory. As a contrast the section concludes with a review of non-probabilistic techniques to deal with uncertainty which might be regarded more in line with AI methodologies. For instance we mention the plausible reasoning technique used in the system Ponderosa, the technique based on the model of endorsement, engineering techniques and others. In the final section we fill the remaining gaps of other forms of reasoning by briefly addressing some of them. This way we summarize the whole presentation. Its importance for many applications is pointed out.

And finally we give a view of how a complex reasoning system

comprising all these features eventually might look like.

1. N A T U R A L DEDUCTION Human beings draw inferences from what they know, that is, they explicate new pieces of knowledge from previous ones. For instance, if the original knowledge consists of the two sentences

KI: Socrates is a man K2: All men are mortal

then by way of inference anyone would conclude that

K0: Socrates is mortal

although obviously this latter fact is not explicitly given with K1 and K2 . This aspect of

176

h u m a n thinking has been observed and studied for more than 2000 years. In illustrating this phenomenon the way we did, we have already taken into account a basic assumption.

Namely, we might originally think of inferences being drawn from pieces of

knowledge that are not necessarily represented in language form.

But our assumption is that

an appropriate representation in some language can be used without affecting the nature of this phenomenon. Formally we may think of h u m a n inferencing as of a relation between pieces of knowledge like between

K1

and

K2 on one side and

K0 on the other in the present example.

cians usually denote this relation by the symbol

Logi-

]= . Since our discussion at this point still is

concerned with the cognitive aspect of h u m a n inferencing, let us emphasize this aspect by adding the subscript

h to this relation, i.e.

t% • Thus in our example the relation between

the pieces of knowledge may be expressed by

K1, K2

J% K0

O n the basis of this analysis of the phenomenon of h u m a n inferencing we face the following fundamental problems in view of the topic of this paper.

1. How can we adequately represent knowledge in language form? 2. How can we define

[~ so that it coincides with our experience?

3. How can we determine whether K [~ K'

holds for any

K and

K' ?

The first question addresses the relation of language and its meaning (or its semantics).

It is

a question that has kept many philosophers busy for at least the last hundred years. Today these issues are treated in the areas of natural language semantics and model theory. question is by far a non-trivial one. paper.

So this

We will meet it again at several occasions later in the

In order to avoid its complications for the beginning, we rely on the solution offered

by the language of first-order logic with its well-defined semantics until we discuss some of the complications in later parts of the paper. The second question of course is intimately related to the first one and with it is again decidedly non-trivial.

As long as we accept the traditional form of first-order logic, however, logic

provides us with a well-defined solution that we denoted by

[= before.

But we will later

have to account for a number of complications as mentioned for the first question.

177

^-t

^-E

v-I

v-E [A][B]

A B AAB

A^B A

A^B B

A A v B

B A v B

V-I

¥-E

9-1

F Vc F

Vx F ~

F{xkt} qx F

-.-I

-,-E

~-I

AvB

C C C

3-E

[F}

[A]

3c F C C

.-E

[A]

B A-, B

A A-*B B

Inferences according to ¥ - I

and

F ~A

3-E

F D

A

F

are subject to a condition on the variable c

Figure 1. T h e inference figure schemes of the calculus N K

i n the present section we will now discuss the third question in some detaiJ on the basis of first-order logic. m e n t relation

In other words, we now assume that

]= of first-order logic.

]% is modeled by the classical entail-

T h e r e knowledge is represented by flrst-order formulas.

]= is defined in terms of truth values in most logic texts. would prefer a more syntactic characterization.

For the purpose of automation we

I n standard logic texts we find m a n y such

syntactic characterizations. O n e is due to G. G e n t z e n which we are going to describe i n the sequel. W i t h his calculus N K G e n t z e n tried to simulate the n a t u r a l way of a m a t h e m a t i c i a n ' s reasoning.

I n particular, he observed that such reasoning starts from assumptions (or from previ-

ously established results), a n d proceeds by a n u m b e r of well-defined syntactic aries. ferred to present these rules in a tree-like form as inference figure schemes.

H e pre-

Figure 1 shows

all the schemes that establish the entire calculus NK. These schemes are to be understood in the following way. For each logical symbol there is a rule that introduces (I) a n d one that eliminates (E) it. instance, consider the first rule

^-I

introducing a conjunction symbol in

^ . It says, if there

A

extended to establish the formula

A ^ B . As we m a y see there are two different versions of

^-E

B

NK,

is a derivation for the formula

the rule

a n d one for

For

then the derivation m a y be

which eliminates a conjunction symbol. T h e same holds for the rule

F{x\t} denotes the substitution of x by t in

F.

v-I .

t78

2: Vy Pay

1: 3a Vy Pay

Pab

¥-E

3x Pxb

3-I

Vb 3x Pxb

V-I

Vb 3x Pxb

3-E~

3a Vy Pay --, Yb 3x Pxb

-~-Iz

Figure 2, A derivation in the calculus N K

As we mentioned N K operates with assumptions that are stated at the b e g i n n i n g of a chain of reasoning.

Some of the rules allow the transition of the premises to the conclusion only u n d e r

certain assumptions.

I n the rules such assumptions are represented as formulas in brackets

like the formulas

and

A

B

in the scheme

derivation in N K of the formula ject to a n assumption same formula

A

v - E . In detail this scheme says: if there is a

A v B , further a derivation of any formula

that is sub-

made initially in this derivation, a n d finally a derivation of the

C but now subject to an assumption

not subject to both assumptions

A and

B , then we m a y infer C

-

which then is

B a n y longer.

Figure 2 shows a derivation in N K that starts with two assumptions. which is short for P(a,b)

C

as "person b earns more t h a n

If we interpret

Pab

-

a d e u t s c h m a r k s ' , then assump-

tion 1 states that there is a lower b o u n d in the salaries in question, while assumption 2 states that

a is such a lower bound,

U n d e r the assumption 2 the formula

'¢b 3x Pxb

(along with

it predecessors in the derivation) is derived in a n initial b r a n c h of the derivation.

In the next

step of the derivation, however, the dependency of this assumption is eliminated by way of a n instance of rule

3 - E . T h e reference to the particular assumption 2 is established in the fig-

ure by way of the index 2 added to the n a m e of the rule.

T h e same h a p p e n s with assumption

1 in the final step of the derivation in a n analogue way, so that the final formula of the derivation is not subject to a n y assumption. In order to demonstrate that such a formal derivation in N K actually mirrors a natural chain of reasoning, we translate the derivation from figure 2 into English text as follows: 'Suppose there is a n

a

such that, for all

(assumption 2). then

Pab

Pxb

holds.

T h e n , for all

y , Pay

holds (the step V-E). Since

b

there is a n x such that

y , Pay

holds (assumption 1).

holds; therefore, if b

T h u s there is a n

x , viz.

a

Let

a

be such a n

denotes a n arbitrary object, is such a n object, such that

was arbitrary, our result therefore holds for all objects, i.e. for all Pxb

holds ( ¥ - I and 3-E~).

a

b

This yields our assertion (-,-I1)-'

Without going into any further details we just note that there is a condition on the variable in two of the rules. quantified

b

I n our example derivation, for instance, this condition requires that the all-

in the formula resulting from the V-I inference must not occur in a n y assump-

tion on which this formula depends, i.e. it must not occur in assumption 2 in this case; this

179

provision guarantees that

b

is indeed arbitrary as the text says.

Similarly, the

assumption 1 must not occur in the formula resulting from the 3 - E inference. variables apparently play a different rote t h a n the ones like x distinguish t h e m with our notation that uses

a, b, ...

ers. As a final explanatory remark we m e n t i o n that

and

y

for the ones and

a

from

Since these

in the derivation, we x, y, ...

for the oth-

F denotes the logical constant 'false'.

T h e set of these rules defines a derivability relation a m o n g formulas that usually is denoted by t- • For instance, the end formula of the derivation from figure 2 is derivable in this sense, i.e.

t-

3a Vy Pay

-* Vb 3x Pxb .

assumptions a n d a formula.

In general,

[-

is a b i n a r y relation between a set of

In the present example the set of assumptions u n d e r which this

formula is derivable is empty a n d thus not written explicitly.

T h e formula

derivable only in the context of the assumption 2, thus we have

Vy Pay

Vb 3x Pxb

is

I- Vb 3x Pxb , a n d

so forth. O n e can prove that

I= a n d

I- both define the same relation, a result which is k n o w n as the

soundness a n d completeness theorem for the calculus NK. A [- B holds iff (i.e. if a n d only if)

T h e -*-rules in N K show that

I- A -* B holds (known as the deduction theorem), for

which reason we m a y restrict our attention to the latter case with no assumptions which simplifies the discussion of question 3 above. As we already pointed out it was G e n t z e n ' s m a i n intention with N K to introduce a calculus that closely simulates the n a t u r a l way of h u m a n reasoning.

In particular, N K contrasts from

the so-called H i l b e r t - t y p e calculi which define the notion of derivability in a different way. These specify a set of formulas as axioms a n d allow only one rule of inference, viz. the rule --,-E well-known as the m o d u s ponens. of the form

(A-,(B-*C))-,(A-*B)-*(A--,C)

For instance, a n y formulas of the form

A-,(B-*A) or

would be a m o n g the axioms although their validity

does not seem to be obvious in all cases (such as the second one).

This indicates why N K

appears to be m u c h more natural t h a n calculi of the Hilbert type. As a n aside we m e n t i o n that G e n t z e n has provided a n intuitionistic version N J of N K by deleting the rule ~ - E .

This shows that technically intuitionistic reasoning is not m u c h dif-

ferent from classical reasoning.

T h e reader m a y find more on this topic in section 4 of H u e t ' s

contribution in this volume. A n additional a d v a n t a g e of N K in comparison with Hilbert-type calculi is the fact that it lends itself to a n automation of the computation of

I- in a n easier way. This becomes more

obvious if we proceed to a technical variant of N K that also was developed by G e n t z e n for the purpose of simplifying his consistency proofs for n u m b e r theory.

This variant is denoted by

L K ("logistic calculi") a n d is known as G e n t z e n ' s sequent calculus.

It is very easy to

t r a n s f o r m a derivation in N K into one in LK. For that purpose any formula

B in the given

180

~Pab

v

3y ~Pay

v Pab

3y ~P, ay v Pab 3y ~Pay

v

Ya 3y ~Pay Va 3y ~Pay

v

3x Pxb

(ax)

3x Pxb

(9)

3x Pxb v

v

v

(3)

3x Pxb

(¥)

Vb 3 x P x b

(V)

Figure 3. A derivation in the calculus GS

derivation that depends on a number of assumptions

A1,...,A~

is replaced by the sequent

A1,...,An -, B. Strictly speaking, the use of the implication sign -, takes place on a level different from the one of possible implication signs in the formula

B or in the A's, namely on the meta-level.

However, since the logical meaning of implication remains the same on any level, no additional rules are required for it.

Following Schfitte [Sch] we also interpret the comma in such a

sequent as a conjunction, drop the redundant elimination rules from NK, restrict the tertiumnon-datur ( v - I ) to literals and transform any formula to its negation normal form (no negation signs except in literals and no implication signs).

With all these modifications applied to

N K we obtain a calculus GS (for Gentzen-Schfitte) for first-order logic defined below via its derivability relation (see e.g. [Bi3]).

[- . It would be boring to explain this transition in all technical details

Rather we will illustrate it with the example from figure 2 after stating the

formal definition. Def'milion. Inductive definition of the derivability relation

1- in GS for formulas in negation

normal form. (ax) [- G1 v Ph...t~ v G2 v ~Ph...tn v G3 ; that is, all formulas of this kind, which are called axioms, are derivable. Here and in the following rules the occurrence of the formulas Gi , i=t,2,3, is optional. (^)

t-

G1 v F1 v G2

thereby (V)

(3)

l-

G1 v F2 v G2

^ is assumed to bind stronger than

I- G1 v F v G2 G1 and

and

implies

implies

t-

G1 v F1AF2 v G2

;

v .

I- G1 v gcF v G2

provided that

c does not occur in

G2.

1- G1 v F{x\t} v 3xF v G2

implies

l- G1 v 3xF v G2

The negation normal form of the last formula in the derivation of figure 2 is obtained by the following substitutions.

The formula of the form

negation sign is moved inward by replacing 43 ing to welt-known laws in first-order logic. shown in figure 3.

A--,B is replaced by

with '¢~ , and then

~ A v B ; then the

~V with

39 , accord-

The GS derivation of the resulting formula is

18t

Its first formula is a n axiom according to (ax); viz. in this instance G1 is

~Pab ,

G2

is

3y~Pay ,

the t e r t i u m - n o n - d a t u r

~Ptlt2

P a b v -,Pab

is

Pab , a n d

G3

does not occur,

Ptlt2

is the rest. Essentially it expresses

which also may be read

steps introduce a n existential assertion instead of the given term.

Pab ~ Pab .

T h e next two

T h e need for the occurrence

of the existential f o r m u l a before a n d after such a step arises from the possibility to combine more facts of the sort P a b

(e.g.

Pbb, Pcb, ...) within a single existential statement

T h e final two steps introduce a n all assertion.

3xPxb .

Note that in both cases the quantified variable

does not occur in other parts of the formula as required by the condition on

c in (V).

The

example does not illustrate a n application of the simple rule (^) that has two premises that are identical upto the two parts F1

and

F2 .

Although we skipped m a n y details that are usually discussed in a logic text in such a context, the reader might now be able to carry out derivations in either N K or GS.

O f course this is

also not the place to formally prove the fact that a n y formula derivable in GS is derivable in N K , a n d vice versa.

T h u s both calculi provide the same derivational power, while they differ

in their naturalness a n d their conciseness.

N K is more natural while GS is more concise a n d

thus technically more transparent. In order to determine whether formula

[- F

holds (question 3 above in its simplified form) for any

F one would think of starting with

F a n d try out any of the four rules of GS in a

backward way in order to see which one of t h e m m i g h t be the last in a derivation. would yield premises for which we carry out the same process again, a n d so forth. this view we would be interested in the backward direction of these rules only.

This

So u n d e r

I n this case

one might prefer to state these rules in this backward direction from the very beginning.

The

calculus k n o w n as semantic tableaux by Beth [Bet] does exactly this, i.e. its rules include exactty those of G S read in a backward direction with all formulas negated (since proofs are established by contradiction).

So the inclusion of the semantic tableaux into this discussion

would not add any new aspects. T h e r e is a further well-known result in logic which helps us to simplify our problem even further.

This says that a formula is derivable iff its skolemized form is derivable.

formula we obtain the (positively) skolemized form in the following way. mula (in negation n o r m a l form) of the form Thereby it is assumed that that there are exactly

n

Va F[a]

For any

Any- part in the for-

is replaced by

F[f(xl .... ,xn)] .

f is a function symbol not occurring elsewhere in the formula a n d quantifiers 3xi , i = l , . . . , n , that precede

Va

in the formula.

instance, the skolemized form of the e n d formula in the derivation of figure 3 is

3y ~Pay

For v

3x Pxb , because both all-quantifiers are not preceded by a n y existential quantifier so that the Skolem function has zero a r g u m e n t s , i.e. it is a constant in each case (here denoted by the same letters a a n d

b

which do not occur anywhere else in the resultant formula).

the skolemized form of the formula

3x g a 3y F[x,a,y]

is

Similarly

3x 3y F[x,f(x),y] ; a n d so forth.

As a w a r n i n g we m e n t i o n that often skolemization is introduced (negatively) in the context of

t82

a refutation rather than a proof system in which case the roles of the all- and existential quantifiers are exchanged. If we restrict our attention to formulas in skolemized form then all-quantifiers never occur. Consequently, the rule (V) in GS becomes obsolete and thus can be ignored.

In this case we

furthermore can apply another well-known result and transform any given formula into its prenex form, again without affecting derivability.

For instance, the prenex form of the end

formula again from figure 2 after skolemization is

3y 3x (~Pay v Pxb) . Any such formula

consists of a sequence of existential quantifiers, the

prefix,

followed by the

part that is purely propositional by nature and has no quantifiers.

matrix,

a formula

One may easily see that for

the derivation of such a formula it can be assumed that the rules according to (^) precede all those according to (3) (in more general form known as Gentzen's Hauptsatz).

How could one

determine a derivation of that kind for any given formula in this special form? Since we know that the final part of the derivation must consist of a finite number of applications of nile (3) for each existential quantifier, we only have to determine these finite numbers along with the term t on the left side of this rule for each of its instances.

Assume

we would know this, then we could easily determine the first formula in this final part of the derivation which must then be derivable from axioms in a first part of the entire derivation by applications of the rule (^) only.

Apparently this first part achieves nothing else than estab-

lishing that this formula is a propositional tautology.

In summary our task consists in deter-

mining the number of applications of rule (3) and the respective terms, and in testing for tautologies. The simplest solution for this task would be an exhaustive enumeration of the numbers and the possible terms together with an application of the tautology test for each resulting configuration.

In the beginnings of A T P such an approach has in fact been pursued which

became known as the British Museum Method.

Obviously it is hopelessly inefficient.

As the

crucial idea of improvement Prawitz suggested in 1960 to exchange the sequence in the solution for this task, namely t~ test for tautologies first and determine the numbers and terms by need only.

All theorem proving methods today work along this basic idea; they only differ in

the particular choice of the tautology testing method. Let us illustrate this idea with our previous example quantifiers yields the matrix x must be replaced by

a

~Pay and

3y 3x ( - P a y v Pxb) . Deletion of the

v Pxb . In order for this formula to become a tautology,

y by b . So this provides the information about the final

part of the derivation for this example which consists of exactly one application of rule (3) for each existential quantifier, one with the term

a , the other with

b .

The replacement of

variables by terms as just illustrated is determined by a welt-known and fast process called unification for which more details may be found in the chapter by Stickel in this volume.

If

no appropriate replacement would have been found in our example then we would have taken into account a second copy corresponding to two applications of rule (3) for each quantifier

183

and thus would have considered the formula

(~Pay v Pxb) v (~Pay' v Px'b)

to be tested for

tautology; and so forth. To summarize once more, because rute (3) can be applied more than once, and, viewed in the backward direction, each time produces a new copy of the matrix, the tautology test possibly has to account for more than one such copy.

Apparently one would first try one copy, taking

others into consideration if this fails to yield a tautology. appropriate tautology test that includes unification.

We are left with the question for an

The one suggested by GS consists in a

straightforward application of the inverse of rule (A) along with a simple test for (ax) on the resulting formulas, and doing this over and over again.

This is pretty redundant since the

inverse of rule (^) generates two out of one formula which share a great deal of information. One way to avoid this redundancy consists in an extension of the axiom property to any tautology which renders rule (a) completely redundant as well (hence (3) is the only rule that remains after these modifications on GS). v Pxb v ~ Q a

Let us illustrate this with the matrix

that is slightly more complicated than our previous one.

(~Pay ^ Qx)

With one application

of the inverse of rule (A) we would obtain two axioms according to (ax) after application of the same substitution as before. itself.

But now we define (ax') such that this matrix becomes an axiom

In order to give an intuitive idea of the property characterizing (ax') we use a different

representation of this matrix, viz. in real matrix form in a two-dimensional space. For that purpose we represent conjunctive parts top-clown and disjunctive parts left-right.

So our

matrix now reads ~Pay

Pxb

~Q.a

Q×

The columns in such a matrix are called ctauses. A path through the matrix is a set of literads that is obtained by selecting one literal from each clause. eling through the matrix along such a path.

Intuitively one might think of trav-

Our matrix has exactly two paths.

Note that

they correspond exactly to the two axioms obtained after application of the inverse of rule (ax) as described before.

Two literals in a matrix are called a connection if they are contained in a

path and share the same predicate symbol, one negated the other unnegated. exactly two connections as depicted in the following copy.

~Pay

Pxb

~tQa

Our matrix has

184

A set of connections is called

spanning' for

the matrix, if each path through the matrix con-

tains one of them, as is the case in our present example.

W i t h our previous substitution ix\a,

y\b} the literals in each connection become identical upto the negation sign in which case the connections (or the literals) are called

complementary. With

these notions we can now define

(ax'). ( a x ' ) [- F for any formula F for which there is a s p a n n i n g set of complementary connections. T h e r e are powerful algorithms which test for this property which along with unification provide a convenient a n d comparatively efficient solution for our task 3. It takes one (or more) copies of the matrix of the given formula a n d tests for (ax') whereby substitutions are generated by need via unification.

This whole approach is k n o w n as the C o n n e c t i o n Method.

Except for

this brief outline we will not describe it in any further details since there is a more detailed expository overview in [Bi4] for readers ready to taste it in more but still limited details a n d a comprehensive treatment in [Bi3] for the truly committed readers. With the way of development used in the outline above we wanted to emphasize the close relationship of the connection method with the calculi of natural deduction. This is a very important feature for a n interactive theorem proving e n v i r o n m e n t . Namely we might think of a powerful m a c h i n e - o r i e n t e d prover based on the connection method inside the m a c h i n e a n d the proofs (completed or partial) represented on the screen of a workstation in a h u m a n - o r i e n t e d and natural way. We would like to make the reader aware of the fact that for the purpose of explanation we have m a d e a n u m b e r of simplifications in our task that are justified in view of a correct solution.

However these simplifications (like skolemization, prenex form, etc.) do not necessarily

contribute to a more efficient solution.

This is to say that we have to omit the simplifications

if we head towards a really smart solution [Bi3].

U n f o r t u n a t e l y our task becomes then so

complex by its very n a t u r e that a long experience is needed in order to be able to advance to these more challenging topics.

O n the other h a n d there is no other way to advance this field

any further. I n a way the restriction to first-order logic might be regarded already as a very serious one since it seems to exclude a n y h i g h e r - o r d e r features.

For this reason we m e n t i o n that the con-

nection method can be generalized to h i g h e r - o r d e r logic which has b e e n carried out in section V.6.

Because of the computational problems that arise in such a general logical framework a

restriction might nevertheless be desirable.

T h e results outlined in section 3.3 in the present

paper m i g h t be of great interest also in this context. As we said above the test for tautologies distinguishes the various theorem proving methods in use today.

Solar we have discussed those based on the connection method. T h e most popular

ones are those based on resolution, however.

W e witl not consider t h e m at all here since they

are extensively covered in Stickel's chapter in this volume.

Resolution works on the basis of

185

the same simplifications that we used above. So the advances that we just talked about are important issues for resolution as well.

Most likely they will be pursued further in the context

of the connection method because it is more transparent than resolution for such a purpose which is an important point in view of coping with the complexity of the task. Let us, finally mention that all of the special topics discussed in the context of resolution in Sticket's chapter (such as theory resolution) similarly apply to the connection method.

Most of

them have been treated in [Bi3] under this viewpoint.

2. N O N - M O N O ' I K ) N I C RF_,ASONING At the beginning of section 1 we have considered h u m a n inferences as a relation pieces of knowledge.

We have then taken

[% to be the relation

order logic and studied several syntactically defined versions

[- of it.

]~ between

[= as defined for firstIt has been pointed out

in this context that this special choice will have to be reconsidered in a more detailed discussion of what we formulated as question 2. In the present section we enter this discussion. Consider the following two pieces of knowledge:

- IBM produces ~ m p u t e r s , or

P(ibm,cps)

- Daimler-Benz produces cars, or P(d-b,crs)

With no additional knowledge at hand, what would you answer being asked whether IBM produces cars, i.e.

P(ibm,crs) ? No, of course! In other words, it seems that

P(ibm,cps), P(d-b,crs)

]~ ~P(ibm,crs)

holds despite nothing was stated in the premises about IBM with respect to cars.

In fact, in

first-order logic this is an invalid inference. As another example (borrowed from McCarthy), suppose someone is hired to build a bird cage and doesn't put a top on it.

Since anyone knows that birds can fly, no judge in the world

would accept his excuse that it was not mentioned the bird could fly . O n the other hand, if the bird for some reason could indeed not fly and thus money should not be wasted by putting a top on the cage it should have been said so. In other words, it seems that

BIRD(x)

t% FLY(x)

holds which again clearly is not valid in t'irst-order logic. There are many more such cases where it seems that the h u m a n inference relation not coincide with phenomenon.

I% does

l= , the first-order one; but these two might do for a first discussion of this

There are two different ways of approach.

One is to acknowledge the

186

discrepancy and took out for an appropriate logic different from the first--order one.

There is

little doubt that we would have a hard time in bringing so different examples as the two above under a common logical framework that includes first-order logic as discussed in the previous section. The other approach would be to assume hidden pieces of knowledge as additional premises in examples of this sort.

In the first example this piece might be "and that's all which holds" in

the sense that everything that is not explicitly stated to hold is assumed not to hold, a principle known as the

dosed-world-assumption.

assumed not to hold, i.e.

~P(ibm,ers)

P(ibm,crs)

was not stated explicitly hence it is

is one among the pieces of this hidden knowledge.

If

we make it explicit by adding it to the inference above, we obtain a classical first-order inference. P(ibm,cps), P(d-b,crs), ~P(ibm,crs) . . . .

[% ~P(ibm,crs)

Similarly, if we add the hidden knowledge "birds can fly" to the premise in the second example, again the result is a classical first-order inference, viz. modus ponens.

BIRD(x),

BIRD(y)-* FLY(y)

I% FLY(x)

But note the difference; in the first case we have assumed that nothing except the stated facts holds (closed-world assumption) while in the second we added a fact (as common-sense knowledge).

In combination we might be inclined to say that on the one hand there is a body

of common sense knowledge that is tacitly assumed in any appropriate context, like the flying birds, while on the other hand no facts are taken into account except those stated explicitly or assumed as common sense knowledge. At least for these examples, then, the second approach above appears to be much more convincing.

So we learn that in certain cases humans draw inferences which involve tacit

assumptions; they become first-order logic inferences once these assumptions are made explicit. Note that these assumptions are context dependent.

For instance, in the first example the

assumptions of course would not include ~P(ibm,crs)

if P(ibm,crs)

the explicitly stated pieces of knowledge.

would have been among

As a consequence, the conclusions drawn from a set

of pieces of knowledge may change as we add additional information, a feature which is called

non-monotonicity. First-order logic is monotonic in this sense. knowledge K1 , then

If a piece of knowledge

K0 also follows from K1

implies

follows from some

enriched by additional knowledge K2 . In

symbols, K1 [= K0

K0

K1, K2

I= K0

t87

C o m m o n sense reasoning in contrast seems to be non-monotonic as we have just noticed. But our examples also show us that this non-monotonicity occurs on the surface only. If the tacit assumptions all are made explicit monotonicity is retained (since then the addition of a new fact like

P(ibm,crs)

also changes

K1

so that the monotonicity rule does not apply at all).

We will see, however, that it is not quite a trivial problem to handle the distinction between stated and assumed knowledge appropriately so that the formalism simulates usual common sense reasoning.

Before entering the technical details let us summarize the different types of

using non-monotonic reasoning following [Mc3]. 1. Use as a communication convention by which a body of knowledge is tacitly assumed unless explicitly stated otherwise (like in the bird example above). 2. Use as a database or information storage convention by which only knowledge is taken into account (whatever this means in detail) that is explicitly stated or assumed by other conventions (like in the IBM example above). 3. Use as a n~e of conjecture for solving problems in the absence of complete information. For instance, if you want to catch a bird you better assume it can fly in spite of the many exceptions to the rule that birds normally fly. 4. Use as a representation of a policy.

For instance, if a committee meeting has taken place

always on Wednesday, the next meeting will again be Wednesday unless another decision is explicitly made. 5. Use as a very streamlined expression of probabilistic information when numerical probabilities are unobtainable.

For instance, if you see a bird what might be the probability that it can

fly? In order to calculate it one would need a sample space in the first place which usually is not available in such situations.

Moreover, what purpose would it serve in the particular

situation to know that this probability is exactly 97.4%.

Or think of statements like "she is a

young and pretty woman" as another example where the probabilistic treatment appears to be out of place. 6. Use in the form of auto-epistemic reasoning where we reason about our own state of knowledge.

For instance, "I am sure I have no elder brother because if I had one I would

know it" belongs to this type of reasoning. 7. Use in common-sense physics and psychology.

For instance, we anticipate an object to

continue in a straight line if nothing interferes with it. This shows us that we are dealing here with a wide-spread phenomenon with a number of different aspects.

We will begin with the technical treatment of a very restricted case of usage 2

in the list above.

188

2.1.

A formalimm for data bases

O u r first example above has shown us that the p h e n o m e n o n of n o n - m o n o t o n i c reasoning occurs already in the case of a simple data base. structure.

Logically a data base has a very simple

So it might be helpful for the more complicated applications to study the issue first

for this simple case. Data bases are described in a relational language which is a first-order l a n g u a g e with a finite n u m b e r of (at least one) constants and predicate symbols, without function symbols, with equality, a n d with a set of simple types, that is a subset a m o n g the u n a r y predicate symbols. From a logical point of view the notion of a relational data base is defined in a model theoretic way as a triple ( R , I , I C ) where

1. R is a relational language, 2. I is a n interpretation for R

such that the constants in

R

are interpreted as mutually dif-

ferent elements in the domain. 3. IC is a set of formulas of

R , called integrity constraints, such that for each n - a r y predi-

cate symbol

= and from the simple types, IC must contain a formula of the

P distinct from

form qxl...Yxn ( Pxl...x~

"* Plxl^ ... ^ P,x~ )

where the Pi are types, i = l , . . . , n , called the domains of P .

As early as 1969 [Gre] the members of the A T P (Automated T h e o r e m Proving) c o m m u n i t y have preferred to think of data bases in a proof theoretic (rather t h a n the previous model theoretic) way.

U n d e r this view "answering a query m e a n s proving a statement . . . .

Thus

theorem proving is f u n d a m e n t a l for solving data base problems, a fact which is well k n o w n (but not very popular at present)" as I stated in 1976 [Bi2]. More recently the need for a more flexible data base m a n a g e m e n t has b e e n recognized. Attempts into such a direction faced a n u m b e r of problems such as the treatment of disjunctive information, the semantics of null values, the incorporation of more world knowledge, a n d ]ast not least the n o n - m o n o t o n i c i t y , which all seem to be due to limitations of the model theoretic view of data bases. attention it deserves.

For this reason the proof theoretic view has finally received the

It will now be briefly presented following [Re3].

A relational data base is defined in the proof theoretic way as a triple ( R , T , t C ) where R a n d IC

are defined as before while T

is a relational theory defined as follows.

t89

1. T

is a first order theory, i.e. a set of first-order formulas.

2. T

contains the d o m a i n closure axioms

axioms language 3. T

Vx ( x = q v ... v x=c~ )

~ ci = ck , i,k = 1,...,n, i
q,...,c,

a n d the u n i q u e n a m e

are all of the constants in the

R.

cSntains the equality axioms

VX X=X

reflexivity

Vxy (x=y --, y=x)

commutativity

Vxyz (x=y ,,, y=z + x=z)

transitivity

Vxl...x.,yl...y,.

Leibniz' principle of substitution

4. T

(Pxl...xr, ^ xl=yl ^ ... ,,, x==y,, "" Pyl---y=)

contains a set

D

of ground atomic formulas without equality, which might be con-

completion axioms for

sidered as the actual data base, along with the following

any predicate

P different from equality. Vx~...x~ (Pxl..,x~ -~ xl=cnA...^x~=cl~v ... v xl=e~I^.,.AX==C,~) , whereby

(ql,...,q,),

..-, (cr,,...c=)

are all of the tuples such that

P(qt,...,c=)

is in

D

for some i in {i .... , r } .

For instance, let

D

be the set

{P(ibm,cps), P(d-b,crs)}

from our previous example.

Then

there would be only a single completion axiom in this particular case, namely

Vx,xe (Pxix~ -* xi=ibm^x2=cps v xl=d-b^x2=crs) .

It minimizes the extension of the predicate

P , that is it restricts the tuples for which

holds to those stated explicitly in the data base approach is also k n o w n as

predicate completion.

theory obtained for this particular example cannot.

D

However, if we add

P(ibm,crs)

from this new theory while ~P(ibm,crs)

as described above, for which reason this It is easy to see that from the relational

~P(ibm,crs) to

D

P

then

can be derived while P(ibm,crs)

P(ibm,crs)

trivially can be derived

can nomore, because with this update the completion

axiom changes to become

Vxtx2 (PxIxz -* xI=ibm^x2=cps v xl=d-bAxe=crs v xl=ibm^x2=crs) .

T h e completion axioms are c o n t e x t - d e p e n d e n t like the assumptions discussed f u r t h e r above. T h i s way we achieve the n o n - m o n o t o n i c behavior of h u m a n reasoning within first-order logic. As we see, a classical theorem prover would now give the expected answer to a n y query to the data base

D . This remains true if disjunctive i n f o r m a t i o n is present in

occur, or if more complex world knowledge is added.

D , if null values

So this approach settles the kind of

problems that are now u n d e r discussion in the data base community.

O f course, a standard

190

theorem prover would not meet the requirements on efficiency that are standard in data base technology.

Both techniques may be integrated, however, by compiling the prover's steps into

data base techniques without changing the semantics wherever such techniques are applicable. Currently a solution is preferred that interfaces an existing data base system built with conventional technology with a theorem prover, for instance a P R O L O G system.

2.2.

Negation as failure

A conventional data base has a very poor logical structure, so poor even that this structure could be ignored by the data base community for decades.

So the question naturally arises

how the solution achieved by the completion axioms can be extended to more complex knowledge bases, say to P R O L O G programs [GeG] to begin with.

There in addition to the

relational facts as in data bases we have to account for general P R O L O G clauses that take the form of rules.

As we noted in section 1 such rules allow the derivation of facts that were not

stated initially in an explicit way.

This suggests that a generalized closed-world-assumption

takes into account derivable facts rather than stated ones.

So we would say that any fact is

assumed not to hold unless it is derivable from the explicitly stated knowledge.

For the case of

P R O L O G this principle is known as negation as failure which we briefly review now. Recall from Stickel's paper in this volume that P R O L O G clauses are rules of the form H ,- GI^...^Gn

where n > 0 , and the head

H

and the subgoals G1 are atomic formulas.

Further the goal clause is of the same form but has no head (and at least one goal).

This

means that in pure P R O L O G negation cannot directly be processed. Instead it is handled according to the principle just explained.

That is, if a goal or subgoal has the form ~ G

atomic), then the P R O L O G interpreter first attempts to prove

G ; if this fails then

(G

-~G is

established, otherwise it fails. This may be expressed as a P R O L O G program in the following way. ~G *- G,/,fail ~G *- true

We may view this treatment in a different way.

The clauses of a P R O L O G program define

the predicates occurring in the heads; but they do so with the if-halves of the full definition only that would include the only-if-halves as well.

In [Cla] it is shown in detail that

negation-as-failure amounts exactly to the effect that would be achieved if these o n l y - i f halves of the clauses would be added to the program and a theorem prover for full first-order logic would then do the interpretation. tained in [JLL].

A theoretically more comprehensive treatment is con-

Instead of presenting these results here in any detail, we simply illustrate

that the same view can already be taken in our previous data base example. P R O L O G clauses it reads

As a set of

191

?(ibm,cps) ,-P(d-b,crs) *-

Obviously, the same can be expressed equivalently in the following way.

P(xi,x2) ~- xl=ibm, x2=cps P(xi,xa) *- xz=d-b, x2=crs

which in turn is equivalent with the logical formula

P(xa,x2) *- xl=ibm^x2=cps v xI=d-b^x2=crs

As always in P R O L O G the variables are to be interpreted as all-quantified ones. With this in mind a comparison of this formula with the completion axiom (from the definition of a relational data base above) for this particular case shows that this axiom is in fact the only-if-half of this formula.

In other words, the completion axioms achieve exactly the same effect for the

simple case of a relational theory that is achieved by negation-as-failure for the more complicated case of Horn clause logic [She].

With this remark it is now also obvious that negation-

as-failure is non-monotonic since our previous example applies here too. We note a distinction, however, in the way of treatment.

In relational theories we have added

the completion axioms and then carried out a classical proof process.

In P R O L O G there is an

evaluation being extracted from the behavior of the classical proof process.

This evaluation

logically takes place on one level higher than the level of the proof process itself, that is on the meta-level.

We will come back to such a combination of object-level and recta-level proofs in

the sections 2.5 and 3 of this paper. There is yet another way of viewing the negation-as-failure approach, viz. the semantic one. It may be shown that a set of Horn clauses always has a minimal model [Llo], a fact which is not true in general for first-order logic.

The proof process in P R O L O G with negation-as-

failure in fact determines a minimal Herbrand model such as the one in the example above. Thereby minimality means that the domain is minimal -

{ibm, d-b, cps, crs} in the example

above - and that the relations have their minimal extensions the example.

-

{P(ibm,cps), P(d-b,crs)}

in

So from the semantic point of view the underlying closed-world-assumption

principle may be regarded as aiming at minimal models of the given set of formulas that describes the world under consideration. tion.

We wilt come back to this point in the following sec-

t92

2.3.

Circumscription

T h e way we handled the p h e n o m e n o n of n o n - m o n o t o n i c reasoning in the case of data bases a n d P R O L O G programs seems to be completely satisfactory at least for these restricted cases. U n f o r t u n a t e l y , the world is more complex to be modeled adequately in P R O L O G .

At least we

have to extend our language to include the features from first-order logic that are not included in P R O L O G , if not even more.

T h u s the question naturally arises whether the way of h a n -

dling used so far can be generalized to arbitrary formulas in first-order logic.

This turns out

to be more complicated t h a n one would normally expect. M c C a r t h y who worked on this problem for m a n y years if not decades has proposed a technique called

circumscription.

As he beautifully describes in [Mc2] this technique tries to cope

with the problem of c o m m o n sense reasoning of the most general sort. For instance, think of the well-known m i s s i o n a r y - a n d - c a n n i b a l s problem where three missionaries a n d three cannibals are to cross a river with a boat that carries no more t h a n two persons, and to do so in a way that at no time the cannibals o u t n u m b e r the missionaries at a n y side of the river.

The

point is that without common sense a description of that sort could never be understood.

This

is because there are myriads of ways to m i s u n d e r s t a n d the story due to its lack of precision and completeness (why not use the boat as a bridge which might work for a narrow river; or why should there be a solution anyway since the raws might be broken; etc.). Usually h u m a n s do not even think of such unlikely aspects a n d easily capture the essence of the problem for the same reasons that have been identified further above. Namely,

we

immediately associate a package of additional knowledge with such a description like "rivers normally are m u c h broader t h a n a boat" or our "birds normally fly" further above.

However

this extension is performed in a m i n i m a l way, i.e. no objects or properties are assumed that are not normally associated with a scenario as the one u n d e r consideration. offers a technique to simulate such a behavior in a mechanical way.

Circumscription

O n e element in this

technique is the use of sort of a completion axiom like the one in section 2.1.

W e begin by

formally defining this circumscription formula. This definition requires the use of second-order logic which we have not m e n t i o n e d so far. T h e reader should think of first-order logic as before except that function and predicate symbols are no more considered as constants, but m a y be regarded as variables a n d thus m a y also be quantified in the same way as the usual object variables in first-order logic.

Let

A(P,Z)

be such a formula of second order logic in which P occurs as such a predicate variable but is not quantified.

I n fact here a n d in the following we always allow a n y variable to represent a

sequence of variables, i.e.

P1,...,P~

write down the sequences explicitly. variable

P

a n d a n object variable

in t h e present case; but for sake of readability we never Further let x

E(P,x)

both are not quantified.

T h e n the circumscription of

mula

defined by

Circum(A;E;P;Z)

be a formula in which the predicate

(both possibly tuples by our assumption just made) E(P,x) relative to

A(P,Z)

is the for-

193

A(P,Z)

^ Vpz {A(p,z) ^ Vx[E(p,x)-.E(P,x)]-, VxIE(p,x),,E(e,x)]} .

For a better understanding of this formula let us instantiate it for the case of a simple example such as the one from section 2.1. data base, i.e. in it, i.e.

There

A(P,Z)

would be the formula describing the

P(ibm,cps)AP(d-b,crs) , and we would have to circumscribe the predicate

E(P,x)

P

would simply be P(xl,x2) • So the circumscription in this case would be

P(ibm,cps)AP(d-b,crs) A Vp {p(ibm,cps)Ap(d-b,crs)

A Vxlx2[p(xl,x~)-*P(xl,x2)] -* Vx~,x2[p(x~,xa)*-*P(x~,x2)]}

Since p is all-quantified, we may think of any predicate.

For instance, consider

p(xl,xa) -= xl=ibmAx2=cps v xl=d-bax2=crs . The premise

Vx~,×~[p(x~,x~)*P(x~,x~)] in the circumscription formula is obviously true given the assumption

A(P)

in this case.

Therefore, according to the circumscription formula, it is also required that

Vx~,xdp(x~,×~)~P(x~,x~)] holds as well which spelled out is the formula

Vxz,xa[P(x~,x2) ~ x~=ibmAx~=cps v x~=d-bAx~=crs] ,

i.e. the completion axiom from section 2.1.

In other words, we have shown that for the sim-

ple case of our data base example the completion axiom is a logical consequence of the circumscription formula, a result that holds in general for data base as well as for Horn clause problems [Re2,She,Mc3]. This might have given us a feel for the circumscription formula, at least for this special case. It is meant to replace a given set of axioms extension of

P

when

A(P,Z)

by a modified set that minimizes the

Z is allowed to vary in this process of minimization.

world descriptions A(P,Z)

It applies to

of arbitrary form, in fact even one in second-order logic, which is

to say that circumscription is far more general than predicate completion and negation-asfailure as discussed in the previous two sections.

Perhaps it is even too general for most prac-

tical applications for which reason we now present it in a slightly more restricted form of predicate circumscription where E(P,x)

is P ( x ) .

194

For this purpose we also abbreviate any formula of the form

Vx(Px--,Qx) by

case of tuples (aIways keep in mind our assumption), P_~Q abbreviates and

P
stands for P < Q ^ ~ Q < P . T h e n

Circum(A;P;Z)

P ~ Q ; in the

PI_~QI^...APn_gQ~ ,

may be expressed [Lil] by

A(P,Z) A ~Bpz[a(p,z)Ap
which is equivalent with our previous version for this special case as may be easily seen. Actually, the more general case may in fact always be reduced to the present one by introducing an abbreviation of the form Po(x) ~" E(P,x)

into A(P,Z) .

Currently there is no working system of knowledge representation based on circumscription. The major difficulty in designing such a system lies in the fact that the circumscription formula involves a second-order quantifier.

Fortunately it is possible in many cases to reduce the

circumscription formula to one in first-order logic as is shown in [Lil].

At least for these

cases such a system may now easily be realized on the basis of any existing theorem prover . At the end of the previous section we discussed the model theoretic meaning of negation-asfailure and we will now provide the same for circumscription. that

A(P,Z)

is

given.

Then

for

any

two

For that purpose let us assume

models M1,M2

of A(P,Z)

we

write

Mz -~P;z M2 if (i)

the universes of, both models are the same,

(ii) for every (object, function or predicate) constant not in P, Z both models also coincide, (iii) for every predicate in P its extension in ml

is contained in that of M2 .

Then the following result holds [Mc2,MiP,Lil].

Theorem.

M

is a model of Circum(A;P;Z)

iff M

is minimal in the class of models of A

with respect to _Kp;z .

The relation

<~;z is, generally, not a partial ordering; therefore

not necessarily exist.

In fact, there are consistent formulas

is even inconsistent [EMR].

M

as in the theorem must

A such that their circumscription

For important classes of formulas it has been shown, however,

that consistency is always preserved [Li2,MiP,EMR].

These complications indicate why we

regarded this topic as a difficult one at the beginning of the present section. Before we turn our attention now to the application of circumscription to non-monotonic reasoning, we finally mention as an aside that circumscription has a close relationship with the concept of implicit definability, that has been explored for many years in Mathematical Logic

[Do2].

195

2.4.

Inferential minimi:,-~tion with c i r c u n ~ p t i o n

As we said in the previous section, circumscription provides a tool for treating examples like the one with flying birds further above.

We will now illustrate how this works in detail

[Mc3]. For that purpose let us use the following predicates.

Bx

for

xisabird

Ox

for

x is an ostrich

Fx

for

x can fly

ABx

for

x is abnormal

Instead of stating that all birds can fly, as we did at the beginning of the present chapter, we rather express that birds normally can fly to account for the kind of scenarios described at the beginning of the previous section.

So we consider the following formula A(AB;F) :

V:,,(Bx,,.ABx.-.Fx)

,, Vx(Ox-.Bx) ,,, Vx(O,,-..l~x)

Intuitively we would like to have the ostrichs as the only birds which are abnormal w.r.t, flying in the world captured by A , that is Vx(ABx,-,Ox) . Indeed the circumscription formula Circum(A;AB;F) essentially amounts to this equivalence and thus produces the desired result. From this example we see that facts of the sort "normally such and such is the case" are represented as a first-order logic statement which in the premise includes a literal ~ABx that accounts for possible abnormal cases.

Minimizing this predicate by circumscribing it yields the

kind of reasoning observed for humans which might be called inferential minimization, general there may be various ways (or aspects) of being abnormal. using a different predicate

tn

We may treat this by

AB for each aspect, or we may provide the distinction in a func-

tional way with a single predicate AB , such as AB(aspectl(x)) , AB(aspect2(x)) and so forth. In the following scenario, that includes airplanes (P) and dead things (D), we take the first alternative.

Vx(Ox-,Bx) Vx~(Bx^Px) Yx(~ABlx -' ~Fx) Yx(Px ^ ~AB2x -, Fx) Vx(Bx A ~AB~x "~ Fx) Yx(Ox A ~AB4x "* ~Fx) Vx(Bx A Dx n ~ABox--, ~Fx)

196

The circumscription

Circum(A;AB1,...,ABs;F)

does not lead to the intuitively expected con-

clusions: there are no abnormal airplanes, ostrichs, and dead birds; ostriehs and dead birds are abnormal birds; airplanes and the birds that are alive and not ostrichs are the only objects satisfying ABz • The reason is that the goals of minimizing our five abnormality predicates conflict with each other.

For instance, minimizing the extensions

AB~ and

ABz conflicts

with ttie goal of minimizing AB 1 . Prioritized circumscription [Mc3] overcomes this problem.

There one establishes priorities

between different kinds of abnormality, specifically one assigns higher priorities to the abnormality predicates representing exceptions to "more specific" common sense facts, e.g. AB~ > AB 2 , AB~ > ABz in the present example.

AB 4 ,

With the circumscription formula adapted

to this generalization indeed provides a satisfactory solution that leads to the expected solutions.

Also for this case a first-order treatment may be achieved in many important cases

[Lil]. In summary we have seen that circumscription offers a rather general solution to the common sense reasoning.

The approach still seems a bit too complicated in comparison with the sup-

posed human reasoning.

Also there are still open problems of detail. Therefore it seems

worthwhile to have a look at other approaches taken to cope with this phenomenon as we do in the next section.

2.5. Other approaches to inferential mlnirni~,.ation While circumscription generalizes the way taken with the completion axioms from section 2.1, all of the variants discussed briefly in the present section might be regarded as a generalization of the way taken with negation-as-failure from section 2.2. Namely, it has been pointed out in 2.2 that negation-as-failure is in fact a meta-levet principle, and the same holds for all of the following approaches. 2.5.1

Explicit listing of exceptions. The simplest way to deal with rules of the sort "birds fly"

is by including all exceptions explicitly in the form

BIRD x ^ ~OSTRICH x ^ ~PENGUIN x A ....

FLY x

For a large number of exceptions this clearly is an awkward approach although it reduces the problem to classical reasoning without any extra provision. To some extent the behavior can be simulated by taking advantage of the fixed control of a P R O L O G system and listing the clauses appropriately, or by a set-of-support mechanism in a general resolution prover. 2.5.2

Default reasoning. One natural way to deal with problems of the flying-birds sort is to

interpret a rule like "birds fly" more precisely as "if nothing is known to the contrary we may assume that birds fly". The question thus is how we could formalize the "if nothing is known to the contrary" in this phrase.

197

Reiter in IRe1] has proposed to adopt the interpretation "it is consistent to assume that ..." for it which formally may be represented as a default rule in the following way.

BIRDx : M FLYx FLY x

The general form of such a default rule is

A: MBI,...,MB~ C

O f course, it has to be made precise in a formalism what exactly this means and how such a rule can be applied within a theory, in particular how the consistency can be determined. Reiter has carried out this program in all details in form of a default theory. default theories have deficiencies that prevent their use in the intended way.

In general,

These deficien-

cies do not occur if one restricts the theories to normal or semi-normal default rules. default rule is called normal if m=l and Bz-C in the rule above. is of the form

A : MB1 , M~B/B1

A

It is called semi-normal if it

, where B is an atom.

Even in normal default theories there is a serious computational problem since each Dale application requires a deductive test for the derivability of the defaults which incidentally demonstrates the meta-level aspect of this approach which will further be pursued in section 3.2.

In

[Gro] it has been shown that this kind of default reasoning can be reduced to circumscription for which reason we do not discuss it here any further. 2.5.3

Truth maintenance. As we learned at the beginning of the whole section 2, the addition

of facts to a non-monotonic reasoning system may change the conclusions which can be derived.

In practice one would prefer to store a condusion explicitly after its derivation in

order to keep it available for later purposes.

This however raises the problem of truth-

maintenance since after new updates of the knowledge base earlier derived conclusions might not be true any longer.

One of the first systems that deals with this particular problem is

described in [Do1]. 2.5.4

Modal lo#c. The

M

in the default rules may well be interpreted as the modal opera-

tor "is possible" from modal logic.

This is no surprise since modal logic was invented to deal

with exactly this kind of recta-level reasoning about what is derivable on the object-level or not.

In a modal logic approach of such a kind of problem the main issue always is to find the

right axioms capturing the exact meaning of the modal operators.

In our case this attempt,

which follows the first line of approach mentioned at the beginning of this section 2, has run into a number of problems. be found in [Moo].

The latest state of the discussion of this particular approach may

198

2.5.5

HiKher-order predicates. Higher-order logic provides another way of integrating recta-

level expressions into object-level ones.

An approach to default reasoning that takes advantage

of this flexibility but within first-order logic is discussed in section 3.3. 2.5.6

Meta-reasoning.

Instead of integrating the meta-level features into the object language

as in the previous two approaches, one may separate them explicitly from the object language and provide a mechanism that links the two levels together.

This approach has been taken in

[BoK]. We will briefly demonstrate it when we discuss recta-level reasoning in section 3. 2.5.7

Tolerance of inconsistencies. If we have a rule with exceptions then an inconsistency

would arise if no extra care would be taken.

For instance, if we have

B I R D x --, FLY x P E N G U I N x --, B I R D x ^ ~FLY x

then penguins would both fly and not fly.

It seems that humans use exactly this kind of

representation to deal with this sort of problem; especially no one thinks of any exceptions when asked about the characteristics of birds and then lists among other things that birds fly. Yet such inconsistencies apparently do not confuse our logical reasoning. In a formalism the inconsistency could be tolerated by restricting the inference mechanism so that e.g. the two rules above never interfere with each other.

If we think in terms of the con-

nection method as the underlying inference mechanism, then this would simply m e a n that certain connections may never be used in any deduction. Which of the connections are taken out in this way may be determined in advance for a given knowledge base.

We just mention that

this amounts to determining tautology loops in the knowledge base like the one from the literal BIRDx in the first clause to the same in the second clause, and from ~FLYx there back to FLYx.

In this example these two connections together are useless for any reasonable deduc-

tion and thus should never be taken into account. The advantage of this simple proposal, which seems to have been ignored so far, is the resulting efficiency.

Namely, it would even be more efficient to locate an appropriate deduction

than in a usual first-order problem since some connections may simply be ignored thus reducing the search space.

For this reason we feel that this approach might be the most attractive

one at all. But no one has explored this in any detail upto now. In summary, we have seen that there are several viable solutions at hand that may be used for realizing non-monotonic inference, a form of common-sense reasoning.

It is now time to try

the most promising ones in experiments in order to find out their relative merits in practice.

199

3.

META-REASONING

The methods discussed in the previous two sections provided a way of drawing inferences among pieces of knowledge represented in some formal language, mostly first-order logic. These syntactic constructs were meant to model some real world scenarios. Apparently these constructs themselves might be regarded as part of some real world scenario. In fact, often the need arises to reason not only about the real world knowledge of the first sort but also about these formulas. For instance, in the default reasoning approach we met such a situation where the reasoning process involved the question whether some formula was derivable which amounts to reasoning about certain relations among these syntactic constructs. There are many other applications than the one just mentioned.

For instance, one might wish to provide the user of a reasoning

system access to its control. This necessarily requires a language allowing for meta-level expressions. Another application arises in situations where we reason about the knowledge or beliefs of other agents. The present section deals with exactly these kinds of phenomena.

3.1.

Language and met,a-language

The first-order language used so far in this paper for emphasis might be called an object-level language in the present context. What then is a meta-language? It talks about the syntactic entities of the object-level language just as this in turn talks about some other entities. A literal such as

P(ibm,cps)

is an example of such a syntactic entity. How could it be named

in the meta-language? It is our intention to let the recta-language be a first-order language as well. Objects in such a languages are denoted by constants. Hence we would have to denote such a literal by some constant, say

c , in the meta-language. O n the object-level we sometimes preferred to use

mnemotechnic notations such as ibm

rather than

c . The same will be even more useful in

the present context. The most natural way to n a m e a phrase is by quoting it. Hence we include constants of the sort

"P(ibm,cps)" in the alphabet of our recta-language keeping in

mind that this is just for better reading otherwise being a constant just as

c . This way we

may name any first-order formula of the object-language. Once we are in a position to name formulas we may then represent relations among them by predicates. For instance, a predicate, say I N F E R , in the recta-language may denote the relation defined by

l- from section 1. For instance, we may consider the following literal.

INFER("Ms^Vx(Mx-,MTx)","MTs")

It relates two constants that name the formulas considered in section 1 where we used different names for them, viz. K I ^ K 2 and K0, So n a m i n g and talking about formulas has been done before in this paper except that this was done in a more informal way while now we are

200

a i m i n g at a formalism for the same purpose. I n fact, a definition like the one for the system G S i n section 1 is already very close to such a formalized language. M a y we therefore suggest as a n instructive exercise that the reader rewrites this definition in a purely formalized way in first-order logic, or in P R O L O G .

If done correctly with D E R I V E d e n o t i n g the derivability

relation then along with the rule

INFER(x,y)

if

DERIVE(x."-*".y)

we m i g h t successfully r u n the resulting P R O L O G program with the literal

INFER("Ms^Vx(Mx-*MTx)",'MTs")

as a goal clause. I n other words what we do this way is just writing a n interpreter for GS in PROLOG

-

-

one application of the r e c t a - l a n g u a g e .

I n this exercise a question arises that has b e e n made explicit in the rule just given. Namely, we face the need to construct constants with variable components such as and

y

x."-*".y

where

x

ranges over arbitrary formulas. For this purpose we used the infix notation for con-

catenation; alternatively we might have written conc(x,conc("~",conc(y, nil))) in L I S P notation which for first-order logic is simply a t e r m with two vm'iabtes, a n d thus causes no problems at all. Let us summarize this discussion and do so in restriction to P R O L O G in order to focus the attention on a n executable system. T h e r e is the usual language in which we write P R O L O G programs, the object language. T h e P R O L O G system interprets such programs thus establishing a relation between the program, say progr , and the

progr I- goal

goal , formally

.

T h e r e is a second language, the m e t a - l a n g u a g e , which allows to n a m e formulas (programs a n d goals) a n d relate them by predicates, e.g. we m a y say I N F E R ( " p r o g " , " g o a l " ) a n d "goal"

are the names for

prog

and

where

"prog"

goal on the meta-!evet. Even more we m a y write

a n interpreter for a formal system like GS in this m e t a - l a n g u a g e in form of a P R O L O G program, say interpr , a n d r u n Prolog to test the relation

interpr t- I N F E R ( " p m g " , " g o a l " )

If all this is done correctly then we clearly would expect that the one relation holds iff the same is true for the other, which in fact provides the i m p o r t a n t link between the two levels. Formally this link m a y be established by adding the transitions from one to the other as

201

explicit rule.s to the system that this way amalgamates the two languages/systems into a single one.

These two rules are usually called reflection principles and have been used in a number

of systems that were designed along t~hese lines [Wey,BoK,BoW,Gen].

3.2. Application to default reasoning tn section 2.5.2 the meta-level aspect of default reasoning was already pointed out and may now be formalized in the following way [BoK].

FLY(x) if BIRD(x), ~ I N F E R ( ' p m g " , ' ~ F L Y ( " . x , " ) " )

In other words, birds for which the knowledge base represented by the current object-level program denoted by "prog" does not specify anything to the contrary are assumed to fly. Although this is an elegant way of representation experience with its implementation [BOW] has to show whether it is a feasible approach that may compete with the circumscription techniques under development.

3.3.

Self-reference

We learned in the present section 3 that on the meta-level we may name syntactic items from the object-level. So if

Pc

and introduce a predicate

is a literal on the object-level we clearly may name HAS

P

by

"P"

on the meta-level, which we define by

H A S ( c , " P " ) ,-, Pc

informally,

c has property named

"P"

iff Pc holds, a form of what is called comprehen-

sion axiom. That seems to be absolutely natural, and in fact G. Frege has taken this view about a hundred years ago in a slightly different notation. Unfortunately, B. Russel showed early in this century that a formal system that allows these kinds of definitions is inherently inconsistent.

Essentially, he defined

R(x) ,-, ~HAS(x,x)

and applied it to x = " R " , i.e. used

an example that involved self-reference. This problem can be eliminated simply by avoiding self-reference altogether. But this amounts to a serious restriction since this feature is one that we use quite often in our natural way of reasoning; just think of the sentence "what I am just saying is not correct". The solution proposed by Russel consisted in establishing a hierarchy of language levels in what today we call higher-order logic, which however leads to computational and still representational problems [Per]. In section 3.1 we avoided this problem by allowing a less powerful comprehension axiom, viz. the reflection principle. Recently, it has been shown [Fef, Per] that Frege's approach is possible in a consistent way if one takes into account a slight restriction on the comprehension axioms that can be easily

202

tolerated for practice. It seems that this result which we do not develop in detail here might have far-reaching consequences among which are the following. Object- and meta-level may be amalgamated in an even stronger form than shown in 3.1. The computationally relevant parts of modal and higher-order logic might be treated in a purely first-order way which in turn would mean that first-order logic would finally turn out to become the formalism par excellence. Encouraged by this result let us reconsider our flying-birds problem from section 2. R e m e m b e r that at the end of section 2.4 we already indicated doubts whether the circumscription (and other) approaches to non-monotonic reasoning are as efficient as the human one. In particular we feel that a rule like "birds fly" itself is not affected by encountering say a penguin that does not fly. Here an additional rule is added that we think is not "penguins are birds" but rather "penguins are non-flying birds". Let us illustrate this idea with the following example. Px

represents "x is a penguin";

animal"; r

denotes robby. T h e n

animals"

Bx

represents "x is a bird";

Cx represents "x is a cardinal";

by

A("B")

"birds fly" may be represented by

, "cardinals are birds" by

and "tweety is a penguin" by

Ax

represents "x is an

Fx represents "x can fly"; t denotes tweety and

BCC"),

~F("B("P")")

, "birds are

"robby is a cardinal" by

C(r) ,

P(t) . Altogether this scenario is given by the following for-

mula. A("B") A F("B") A B("C") A ~F("B("P")") A C(r) ^ P(t)

Properties that apply to a class also apply to each of its members, expressed by

zcx")

^ X(x) ~ Z(x)

otherwise it must be made explicit (e.g. in the functional way whole("B") ) that the class as a whole is addressed. This property inheritance is allowed in classes with additional restrictions only if the property is not identical with the complement of the restriction, expressed by the formula Z ( " X " ) A Y ( " X ( " . x . " ) " ) ^ " Y " ~ " ~ " . " Z " --* Z(x)

From these three formulas we easily may infer that robby can fly but that tweety cannot, further that both are animals. For instance, the second rule applied with the substitution {ZkA, X\B, Y\~F, xV'P"}

yields

A('P")

which by application of the first rule results in

A(t) . This looks like a very unusual way of representing this kind of knowledge in first-order logic. But we remind the reader that logic provides the form only and is totally open w.r.t, how this form is used to represent concepts. If this representation has no other disadvantages (which we might have overlooked at this point) why not preferring it to more familiar ones. In natural

203

language this kind of representation seems to be quite familiar anyway.

3.4. Re.a~ning about knowledge and belief A sentence like " D e a n doesn't know whether Nixon knows that Dean knows that Nixon knows about the Watergate break-in" demonstrates that reasoning about knowledge and belief is not only familiar in everyday circumstances, but may also be quite complicated. If we, to begin with, restrict it to the simple form "a knows that B" then we may see that knowing is a meta-level concept. So for its representation we may use the two approaches discussed before in the present section 3.

That is we may treat

KNOW

as a predicate on the meta-level or

may allow iterated application of predicates as in K N O W ( a , " G R E E N ( g r a s s ) " ) . From a philosophical point of view all knowing is relative, that is pieces of knowledge are actually beliefs that are true relative to some higher beliefs. We therefore prefer to talk about belief rather than knowledge and adopt the rule

KNOW(a,F)

~ BEL(a,F) ^ T R U E ( F )

but not its inverse. So as another example "Hans believes that Richard-von-Weizs/icker is president of G e r m a n y " would read

B E L ( h a n s , " P R E S ( r - v - w ) " ) . Or "There is someone whom

Hans believes to be president" would either read [NAMES(hans,x,y) ^ BEL(hans,"PRES(".x.")")]

.

3x B E L ( h a n s , " P R E S ( " . x . " ) " )

or 3xy

This illustrates how one may represent

arbitrarily complex statements about someone's beliefs with the first-order language envisaged in the previous section 3.3. Obviously, one may then also use the reasoning mechanism available in first-order logic for drawing correct inferences. Like with any other predicates we have to specify rules that capture our intention with the predicate BEL . For instance, one might think of the rule [Moo]

BEL(a,"A-*B") A B E L ( a , " A " ) -* BEL(a,"B")

This kind of rule may be questioned since

a

might never think of actually performing the

inference. A similar case is known as the problem of logical omniscience occurring in the presence of the rule K N O W ( a , A ) ^ (A-,B) -* KNOW(a,B)

which even more cannot be accepted without some restriction.

A possible solution is discussed

in [Levi. The problems occurring in reasoning about knowledge and belief have extensively been discussed in [Hin]. There a semantics is taken into account that envisages possible worlds. In particular,

a knows

F iff F is true in all the worlds a thinks are possible. This kind of

204

semantics has been formalized with Kripke structures [Kri] but it is not clear how to model the state of knowledge with them. There are serious doubts w.r.t, computational feasibility of these possible worlds approaches. In [Mcl] a functional approach to modelling knowledge and belief has been drafted. There concepts are treated as special functions that are denoted by strings starting with a capital letter. For instance, Know, Pat, Phonenumber, Mike, all denote such concepts. "Mike knows Pat's phonenumber" would read

TRUE Know(Mike,Phonenumber(Pat))

Similarly, "Joe knows whether Mike knows ..." would read

TRUE Know[Joe, Know(Mike, Phonenumber(Pat)) ]

while "Joe knows that ..." would be distinguished by reading

TRUE K[Joe, Know(Mike,Phonenumber(Pat))]

with a different knowing concept denoted by K . For each concept X there is an object x of which X is the concept, formally x = denote(X) . Although this approach seems to work for the examples given in [Mcl], we share the opinion expressed in [Per] that the hope of a satisfactory functional treatment of concepts and modalities, that is associated with this proposal, is unfounded, not mentioning the unintuitiveness of this model. Finally we mention the approach described in [Bi6]. It takes the view that different believing agents are making their inferences in completely separate worlds, which formally means a representation in alphabets with empty intersections. In this sense the phonenumber of Pat in Mike's world of beliefs would be represented as

phonenumber~ike(patmke) indicating that both the function symbol and the constant are taken from Mike's world and are to be regarded different from those say of Mary's world similarly indexed by mary. This basic idea has been generalized in order to correctly reason in a purely first-order setting about what different agents know. That such reasoning is not quite trivial may be seen from such examples like "Mike knows Pat's phonenumber which incidentally is the same as Mary's; does Mike therefore know also Mary's number?" He does not know it, that is, unlike usual first-order reasoning, reasoning on knowledge is opaque and not transparent in the sense that we not simply substitute equals for equals in such sentences unless Mike knows of the equality

205

(and in fact uses it).

3.5. Expressing control on the recta-level

Usually, a theorem prover for first-order logic or a subset thereof is built with a fixed more or less complex control in it. In particular in the context of using logic as a programming language the need naturally arises to adapt the control to the special problem under consideration either automatically or by interaction with the programmer. From what we have learned in the present section 3 it should be obvious that this is typically a recta-level task. It requires a control language on the recta-level talking about what goals to be processed next and unified with what heads of etauses, if we think in terms of P R O L O G . It is straightforward to realize this idea in a practical control language. In [Bil] it has first been discussed how such a control added to a logic program if appropriate would after compilation result in a program as efficient as any corresponding say A L G O L program. A language in use that realizes this kind of approach is I C - P R O L O G [C1M]. For other proposals in the same direction see [GaL,Gal].

4. R E A S O N I N G

ABOUT

UNCERTAINTY"

Recall from section I that the basic issue of this paper is the inferential relation K section 2 we have already considered examples where the knowledge

K

I% K'. In

is not quite certain in

some sense; the knowledge that birds fly is of such a kind. In m a n y cases of that sort wc have some feeling about how certain we actually are. In scientific disciplines such as economics, medicine, geology, etc. this feeling often has even a very solid base provided by extensive statistical material. If such additional information is available then obviously it should be taken into consideration in our way of reasoning perhaps in a more explicit way than the one discussed in section 2 where a m o n g the various uses of non-monotonic reasoning we already mentioned this aspect under point 5. In the present section we are going to discuss some of the possibilities that have been explored for that purpose. We do so in a section separate from section 2 not so much because the quality of the problem is really different, rather because here there is an emphasis on this extra aspect of taking into account the uncertainty in a more quantitative way. Also note the relation with the previous section since this extra aspect clearly is some sort of knowledge on the meta-level although the approaches discussed below mostly do not take this into explicit account. The phenomena associated with this kind of reasoning about uncertainty have been studied under m a n y different points of view which gave rise to a variety of different names for more or less the same topic. Some of these names are fuzzy, approximate, plausible, or vague reasoning, reasoning with, under, or about uncertainty, theory of evidence, or of possibility, to some extent also inconsistency reasoning. A m o n g all these approaches we may distinguish

206

those based on some sort of probability theory ("normative approaches") from those taking a non-probabilistic knowledge-based point of view which aim at modeling h u m a n performance ("performance or positive approaches").

4.1. Baye~an inference Often h u m a n s associate some kind of measure with statements. For instance, an expert investment counselor might associate a degree 0.5 with the rule that advanced age implies low risk tolerance. This might mean that there are statistic informations showing that 60% of the elderly people have low risk tolerance. It may also mean that the expert summarizes his knowledge about the relation of age and risk tolerance with this figure which might then be regarded as a degree of belief in the statement. Whatever is the case let us assume that somehow we are provided with probabilities of this kind along with any kind of statements. Let

E denote any such statement, e.g. "John is of advanced age". So we would consider a

probability value P(E) along with E ; similarly with any other statement H has low risk tolerance". The rule mentioned just before states that

E

example, where E may be regarded as evidence for the hypothesis H statement, we consider some probability P(E-*H)

such as "John

implies

H

in this

to hold. As with any

for this rule as well; in probability theory

one usually writes P ( H I E ) to express exactly the same and calls it the conditional probability of H

relative to E .

Finally, we may consider the probability that

H

and

E both hold,

i.e. P(H^E) , sometimes written shortly P(HE) . A simple probabilistic argument shows us that

P(H^E)

is the probability

P(E) times

P(H[E) , called the theorem of compound probabilities.

P(H^E) = P(E)'P(H[E) For instance if John may be considered to be of advanced age only to a degree of 50% then P(HAE)

would be

0.5"0.6=0.3

in our example. Of course, we may as well consider the

inverse rule "low risk tolerance implies advanced age", that is

H ~ E , and its probability

P ( E [ H ) . As before we obtain P(E^H) = P ( H ) ' P ( E [ H ) Since ^ is commutative, the left sides of these two equations must be equal, hence also their right sides, that is P ( E ) ' P ( H ! E ) = P ( H ) ' P ( E [ H ) , or P ( H I E ) = P(H)'P(E]H)/P(E) This equation is called Bayes' theorem. See any book on probability theory (such as [deF]) for

207

more details. It provides the basis for reasoning about uncertainty in many expert systems such as M Y C I N , P R O S P E C T O R , etc. For this kind of application we have to consider a number of hypotheses

H1,...,H,

, each of which being conditional on

E = E I ^ . . . ^ E ~ . In

practice the hypotheses are selected such that they may be assumed to be mutually exclusive and exhaustive, i.e. in any scenario exactly one of them is assumed to hold. Further, conditional independence hypothesis

in

an

is assumed independent

which means that the pieces of evidence support each way,

expressed

formally

as

P(E1A...AE, t Hi)

=

P ( E t [ H ~ ) ' . . . ' P ( E , [ H i ) . U n d e r these assumptions one may derive a form of this theorem that allows to update the conditional probabilities whenever information becomes available (e.g. by experiments) that some of the evidences in fact hold; this means in other words that we may carry out the kind of reasoning discussed in section 1 for the simple case of rules (or Horn clauses), but at the same time calculate the probabilities for the derived statements. For more details see [DHN]. A useful way of viewing this formalism is an inference net in which the propositions describing pieces of evidence or hypotheses are represented as nodes and the relations among propositions become the links of the network. The probabilities are measures associated with the nodes. The updating of such a measure for one or more nodes upon arrival of new evidence causes a propagation of the change along the links until the net stabilizes again. Disregarding the fact that this approach is quite popular and successful, it has been shown that these assumptions are quite problematic, in fact may even lead to inconsistencies [Gly]. One way of avoiding these problems is described in [Kad]. But there are other problems with the Bayesian approach as discussed so far. They include the difficulty to distinguish uncertainty (about what we know) from ignorance (see next section for an example) as well as the fundamental problem how meaningful such probabilities (the "certainty factors") are in applications and where to take them with some reliability in the first place. Finally, this approach also is restricted to situations where the propositions can be arranged in a hierarchy with inference chains flowing smoothly from rough evidence through to conclusions which often is not the case in practical applications [Qui].

4.2.

T h e I)embmer-Shafer theory of evidence

The Dembster-Shafer theory of evidence [Sha] is a close relative of the Bayesian approach. Both take into account degrees for measuring certainty. As a main difference we note that in the Dembster-Shafer approach the probability distribution is assumed over all subsets of hypotheses rather than over all individual hypotheses as in the Bayesian approach. Suppose we are considering a world with four automobile makers, Nissan (N), Toyota (T), General Motors (G), and Chrysler (C), and want to determine the probability of who might dominate a new market [GoS]. Instead of considering a predicate D O M I N in order to express e.g. the dominance of Nissan and Toyota by

DOMIN({N,T})

we briefly write

{N,T} for

208

the same purpose. The set

D = {N,T,G,C}

is called the frame of discernment in this

approach. As in the Bayes approach it is assumed that the singleton hypotheses are mutually exclusive and exhaustive. Now assume evidence is somehow obtained that the probability of Japanese dominance is .4. In order to update the probabilities so that this new information is incorporated the following mechanism is applied. A basic probability assignment function subsets of D . Initially, m(D)=1.0 the evidence m({N,T})=.4

m

is introduced that allows to assign probabilities to

since no other information was available. After obtaining

the decrease in our ignorance is captured by updating

that its value continues to express exactly the degree of ignorance which is this case. probability

For all other subsets of

D

the value of m

is 0

m(D)

so

1.0 - .4 = .6 in

in the present situation. The

P expressing our degree of belief as in the Bayesian approach may be calculated

from the values of m

by adding the m-values of all subsets. For instance, P({N,T})=.4 and

P(D)=I.0 while this value is 0 for any other subset in this case. The question remains how one performs the updating as with complicated case.

m(D)

just before in a more

This is achieved with Dempster's rule of combination as follows. Suppose a

second evidence is obtained for the present scenario suggesting a dominance by {T,G,C} with a probability of .8, i.e.

m2({T,G,C})=.8

which leaves an ignorance

m2(D)=.2 . For distinc-

tion let us denote the m-function with the previous values by m 1 . What are the new m values on the basis of this additional evidence? The rule is to multiply the previous m-values (ml) for a subset

$1

with the m - v a l u e (m~) obtained from the new evidence in order to

obtain the combined m-value for the intersection

$1 with

S~ . In the present example this

rule gives us the following values.

m({T})=m~({N,TI)'m2({T,G,C})=. 4 +.8=.32 m({N,TI)=ma({N,T})'me(D)=.4".2=.08

m({T,G,C})=ml(D)"m~({T,G,C})=.6".8=.48 m(D)=ml(D)'m+(D)=.6'.2=. 12

For the remaining subsets the m-values continue to be zero. As before the degrees of belief are calculated by adding the m-values for all subsets. So we obtain for example

P({N,T}) = m({N,T}) + m({T}) + m({N}) = .08 + .32 + 0.0 = .4

Recently is has been shown in [Pea] that an appropriate view of the Bayesian theory in fact yields the same kind of flexibility that has been just demonstrated with the Dempster-Shafer theory. So it seems that both approaches are pretty much the same indeed. This includes the fact that both share the same disadvantages indicated at the end of the previous section.

209 I..3. Fuzzy logic Fuzzy logic has emerged form an attempt to develop a logic that models the fuzziness of natural language. This fuzziness is present in m a n y features of natural language, for instance in predicates such as "young", "intelligent", "blonde", or "elderly", "having low risk tolerance" mentioned already in section 4.1 above, but also in quantifiers such as "most", "some", "not very many", etc. While we have seen a way to cope with the fuzziness of predicates in the previous two approaches, there is nothing in them which suggests a way of dealing with such fuzzy quantifiers. Here fuzzy logic offers a single conceptual framework for dealing with those different types of uncertainty. In a sense it subsumes both predicate logic and probability theory. A n attempt to access the huge amount of papers on this topic might Start with [Zal,Za2]. Let us first consider the case of a fuzzy predicate, e.g. YOUNG(john). Fuzzy logic requires such a statement to be transformed in a canonical form which makes explicit the range of fuzziness. Here this range is the age of John within the interval say [0,100]. The canonical form is

YOUNG(john)--,age(john)=YOUNG

which is associated with a membership function

fy0u~0(u)=l-S(u;20,30,40) of a fuzzy set of the range [0,1001.

S

is a fixed continuous

function that is 0 upto 20, then grows to .5 at 30, and saturates at 40 reaching the value t that is kept until 100. For instance, f~ou~0(28) is approximately .7.

By itself it is meant to

express that .7 is the degree of compatibility of 28 with the concept labeled Y O U N G .

The

statement YOUNG(john) converts this meaning to that of expressing the degree of the possibility that John is 28.

In summary, for any such statement containing fuzzy predicates we

have to specify the canonical form of the statement, provide the ranges, and specify the parameters for the S-function which determines the possibility function f . Such possibility functions are then associated with fuzzy quantifiers such as "most" or "more than half". Further it is defined how two such functions are combined to yield one that characterizes a "quantifier" resulting from the application of a togieal inference rule such as the "quantifier" Q

in the following inference.

m o s t students are single m o r e - t h a n - h M f - o f the students are male

Q

students are single and male

Clearly, fuzzy logic indeed provides a coherent formalism for dealing with uncertainty. But the problems mentioned for the previous probabilistic approaches seem to be even more serious here where the probability technique covers even more features. I n this context we might question the membership functions, the rules of combinations as being fixed in a pretty much arbitrary way that does not adequately model the h u m a n way of coping with uncertainty.

210

4.4. P c r f ~

approaches

At the end of section 4.1 we have mentioned some of the difficulties that are encountered in the probabilistic approaches discussed so far. tt is not surprising, then, that a number of attempts have been made to cope with the phenomenon of uncertainty in a non-probabilistic way. These may in fact be regarded as more typical AI approaches since they more closely try to model what seems to be the h u m a n way of coping with uncertainty which in essence clearly is not a probabilistic one. If we ignore quantifiers to begin with, then the situation encountered in the previous sections consists of a number of propositions such as facts, rules or more complex statements. They are supposed to model some reality. For each particular application each of the proposition is either true or false, i.e. there is no such thing as being true with degree .6 . We only do not know for certain which of the two alternatives in fact applies. However often these propositions are not the only information available. In addition there may be some meta-knowledge about the propositions themselves such as experience that rule 1 is more reliable than rule 2 possibly based on statistical information derived from earlier applications. More important, even large knowledge-based systems in use today comprise only a very small fraction of the knowledge that is usually available for a h u m a n expert carrying out the tasks posed for such a system. For speculation let us assume that a system can be built comprising all this knowledge represented in form of propositional statements. Since even then we still would not know for many of these statements whether they are true or not, the only remaining way of deriving some conclusion would be to isolate consistent subsets of knowledge, to draw conclusions from each of them, to compare the results and arbitrarily decide which ones to prefer. One might consider many of the performance approaches as approximations to this general model of calculation. Especially the system Ponderosa [Qui] is based on this paradigm. It generates from the given set of statements maximally consistent sets of statements and separates them from the remaining statements. Although it does take into account measures of belief, there is no automatic selection of the "best" maximally consistent set; rather it provides those measures to the user as a filter and heuristic guide. A n obvious objection to this approach would be the exponential growth of possible subsets to be considered in the search for the consistent ones. This problem is overcome in Ponderosa again by a regress to the measures involved that are used here to restrict the search to result in the (currently) 10 "best" sets without the need to generate the remaining ones. Note that Ponderosa realizes one way of reasoning that tolerates inconsistencies, a topic that we already mentioned in section 2.5.6. While Ponderosa still involves measures as in the probabilistic approaches though with a drastically restricted function, the system S O L O M O N [Coh] is based on the model of endorsement

211

and uses no such measures anymore. As in a bureaucracy potential conclusions have to pass a n u m b e r of tests by meta-rnles that qualify it as positive ("pro"), negative ("con"), or irrelevant. The rules encode judgement, qualify the source and preciseness of the information, and other meta-knowledge of similar kind. When passing the structured net of rules the

test

results are collected in a "ledger-book'. The summing of these results is carried out in a deductive rather than a probalistic way. Only sufficiently endorsed conclusions eventually are allowed to pass the test. None of these systems involves a technique for dealing with fuzzy quantifiers as in fuzzy logic. That does not mean that the probabilistic treatment of such quantifiers provides the only solution to their formalization. For instance, in [BiS] a first-order solution for representing fuzzy quantifiers is outlined. For instance, the fuzzy quantifier "most" in "most students" is expressed as, informally, "all elements in a subset of" the set of students which in terms of cardinality is not very different from the whole set".

4.5. Engineering appm,~c'hes Often the meta-knowledge is not represented explicitly in current systems dealing with uncertainty but rather is encoded implicitly into the control strategy of the system. As we discussed in section 3.5 control is knowledge that naturally is interpreted as meta-knowledge, hence such systems in this sense take an approach like S O L O M O N except that they do not make the control knowledge explicit. Pattern-recognltion systems dealing with huge amounts of noisy information are often built in such a way. As an example we mention the system HEARSAY II (see [BaF]). Often such systems use a special system architecture known as the blackboard model developed during the HEARSAY project.

5. S U M M A R Y A N D C O N C L U S I O N S In this paper we have given an introduction to essentially four types of reasoning, viz. classical, non-monotonic, meta-, and uncertainty reasoning. In retrospect we might now wish to raise a n u m b e r of questions about this selection. First of all one might ask why we have chosen this particular sequence of topics. We admit that there is no absolutely convincing argument which separates this structure of presentation from others. The problem is the close interrelation among all these topics. For instance, meta-reasonlng in first-order logic clearly is classical reasoning and after amalgamation is even totally identical with it on the system level. Similarly, m a n y aspects of uncertainty reasoning can be interpreted the way discussed under the topic of non-monotonic reasoning which in turn may be formalized in terms of classical logic as we have seen in this paper. Yet the focus of interest is sufficiently different in each of these four types of reasoning to justify their separate treatment.

Anyway this separation is in line with common practice

except perhaps for meta-reasoning. The latter often does not enjoy the special attention that

212

we have spent on it. But we feet that its potential may have been underestimated in the past. Its treatment after non-monotonic reasoning rather than along with classical reasoning was done for didactic reasons in order to demonstrate its applicability to the phenomena described in section 2. Next we might ask whether these four comprise all the main types of reasoning. This is certainly not the case since there are many more kinds of reasoning that are sufficiently different from those to justify their presentation. O n e of them is inductive reasoning which however is included in the chapter by A. Biermann in this volume. Another one is reasoning by rewriting as in equality theories. These may be regarded as encoded forms of classical reasoning as we pointed out in section V.4 of [Bi3]. In the present volume they are treated in the chapter by G. Huet and to some extent also in that of M. Stickel. Some other kinds for lack of space will be mentioned only briefly in the following. As we have seen throughout the paper, for all the types discussed so far there was always a technique within the framework of first-order logic that handled it appropriately. This is the case also for those not mentioned so far. Hence their treatment is implicit in what we have presented before, This is to say that first-order logic provides the formalism that is flexible enough to allow the conceptual expression of many more kinds of inference. Of course, the formalism does not reveal by itself how this is to be achieved in detail. One further type of reasoning might be distinguished that occurs in problem solving and planning. Obviously, a problem may be formalized within first-order logic. But it is not obvious how our natural reasoning in such cases could be modeled in this framework. Yet a number of such techniques have indeed been suggested. Among them [Bi8] proposes a direct application of a classical theorem prover (as described here in section 1) that is subject to a certain restriction in its control (in other words some meta-knowledge is built into it - cf. section 3.5). A lot of reasoning is involved in programming and in reasoning about programs. P R O L O G has demonstrated that classical theorem provers as described in section t indeed are extremely useful tools in this context. But there are many more kinds of application in this wide field such as for program synthesis (which is closely related to inductive reasoning), program verification, program analysis. Other logics have been proposed for those purposes such as temporal logic, a kind of modal logic (see section 2.5.3). For good reasons we prefer the classical approach also in this context but for lack of space cannot go into further details of this large topic. In both previous applications of planning and programming time played a certain role, So we might ask how time can in fact be dealt with in a purely descriptive framework such as firstorder logic. Again we only can mention that convincing proposals do exist and have been used successfully in running systems - see [Sho] for a survey; as an example we mention [KoS] where time is captured by events and the periods marked through their occurrence.

213

Speaking of time might bring us next to space, viz. the physical space. Or, more generally, to qualitative physical laws and the reasoning about them [BdK], which again opens a whole new range of aspects; just think of reasoning about the behavior of liquids [Hay]. Before we interrupt this seemingly endless list we finally mention analogical reasoning [Win] which is a kind of meta-reasoning that allows for certain abstractions. As in all previous cases we see the first-order formalism as the appropriate framework for its treatment. In summary, we admit a strong bias towards the attempt to uniformly conceptualize all these different phenomena under the common framework of first-order logic. In order to appreciate this bias it is important to be aware of the fact that the deductive relation

I- as in K ]- K '

is really a relation that can be explored in various ways, not only in the axiomatic one where K

is assumed to be given and

and unknown

K'

K , or partially known

is derived or tested. We may also use it for given K and

K ' , and so forth.

K'

In addition there are vari-

ous ways of structuring like recta-inference as discussed in this paper. Because the variety of kinds of reasoning is so confusingly rich it appears that any other approach could simply not be carried out because of lack of uniformity and simplicity available here. With this latter remark we actually carry the discussion to the point of realizing all these variants in a hopefully single uniform system. There should be no doubt that we are far away from such an artifact. In fact the task seems to be so complex that it is hard to imagine how it might be put together by h u m a n minds. We believe that this is possible only if the system is of such a kind that it allows to assemble the pieces of knowledge in arbitrary order, one after the other. Seven pieces of factual and rule knowledge, one meta-level piece talking about them, then a piece of control knowledge, followed by another 42 pieces of domain knowledge, a rule of judgement, and so forth, just to illustrate what we mean. This requirement singles out a form of knowledge representation that is extremely modular on the one hand, but also reflects the tightly woven net of relationships among aI1 these pieces of knowledge. First-order logic clearly enjoys the modularity needed, but does it also support the connections? O n the surface of the representation it does not indeed. But once the representation is transformed into an internal form, for instance as a dag (directed acyclic graph), then these connections become visible, at least to the system as we showed in section 1.6 of [Bi7]. Since this part may be implemented completely independent from the particular knowledge to be represented we see that the first-order formalism does indeed support both requirements in an ideal way. With respect to the architecture of the knowledge base of such a system it seems that a hierarchical structure for the various parts would be best suitable as we argued in [Bi5]. On the bottom we would have the clusters of domain knowledge. O n top of these would be what we call deductive knowledge that stores preprocessed deductive information, so that costly search has not to be repeated many times. O n the next level we would have meta-level knowledge of judgement. And so forth, until on the top level all is brought together by a

214 central control. There are still so many open problems in important issues of detail for most of the features of reasoning discussed in this paper that it might seem premature to speculate in such a way about a uniform system comprising all these forms of reasoning. But speculation here is meant to play the role of a heuristic that guides our judgement about which of the many problems to attack with higher priority than others. If this paper has contributed to see all these problems in a common context it has fulfilled its purpose. Acknowledgment. The typscript is due to A. Bentrup and W. Fischer. REFERENCES

[BaF] Barr, A.B., Feigenbaum, E.A. (eds.), The Handbook of Artificial Intelligence, 1, W. Kaufmann, Los Altos (1981). [Bet] Beth, E.W., The foundations of mathematics, North-Holland, Amsterdam (1965). [Bil] Bibel, W., Programmieren in der Sprache der Pr/idikatenlogik, Habilitationsarbeit (abgelehnt), Technische Universit/it Mfinchen (1975); shortened version: Pr/idikatives Programmieren, LNCS 33, Springer, Berlin, 274-283 (1975). [Bi2] Bibel, W., A uniform approach to programming, Report No. 7633, Technische Universitht Mfinchen, Abtlg. Mathematik (1976). [Bi3] Bibel, W., Automated theorem proving, Vieweg, Braunschweig (1982). [Bi4] Bibel, W., Matings in matrices, CACM 26, 844-852 (1983). [Bi5] Bibel, W., Knowledge representation from a deductive point of view, Proc. I IFAC Symposium Artificial Intelligence (V. M. Ponomaryov, ed.), Pergamon Press, Oxford, 37-48 (1984). [Bi6] Bibel, W., First-order reasoning about knowledge and belief, Proc. Int. Conf. Artificial Intelligence and robotic control systems (I. Plander, ed.), North-Holland, Amsterdam, 9-16 (t984). [Bi7] Bibel, W., Automated inferencing, J. Symbolic Computation 1, 245-260 (1985). [Bi8] Bibel, W., A deductive solution for plan generation, New Generation Computing 4

(1986). [BoK] Bowen, K.A., Kowalski, R., Amalgamating language and meta-language in logic programming, Logic Programming (K.L. Clark, S.-A. T~irntund, eds.), Academic Press, London, 153-172 (1982). [BOW] Bowen, K.A., Weinberg, T., A recta-level extension of PROLOG, Technical Report, CIS-85-t, Syracuse University (1985).

215

[BdK] Brown, J.S., de K.leer, J., The origin, form, and logic of qualitative physical laws, IJCAI-83 (A. Bundy, ed.), Kaufmann, Los Altos, 1158-1169 (1984). [Bun] Bundy, A., The computer modelling of mathematical reasoning, Academic Press (1983). [Cla] Clark, K.L., Negation as failure, Logic and Data Bases (H. Gallaire et al., eds.), Plenum Press, New York, 293-322 (1978). [C1M] Clark, K.L., McCabe, F.G., The control facilities of IC-PROLOG, Expert systems in the Microelectronic Age (D. Michie, ed.), Edinburgh University Press (1979). [Coh] Cohen, P.R., Heuristic reasoning about uncertainty: an Artificial Intelligence approach, Pitman, Boston (1985). [deF] de Finetti, B., Theory of probability, vol. 1, Wiley, London (1974). [Doll Doyle, J., A truth maintenance system, Artificial Intelligence 12, 231-272 (1979). [Do2] Doyle, J., Circumscription and implicit definability, Non-monotonic Reasoning Workshop, AAAI, 57-67 (1984). [DHN] Duda, R.O., Hart, P.E., Nilsson, N.J., Subjective Bayesian methods for rule-based inference systems, Techn. Note 124, SRI International, AI Center, Menlo Park; also: Proc. NCC, AFIPS Press (1976). [EMR]

Etherton, D.W., Mercer, R.E., Reiter, R., On the adequacy of predicate cir-

cumscription for closed-world reasoning, Proc. Non-monotonic Reasoning Workshop, AAAI, 70-81 (1984). [Fef] Feferman, S., Toward useful type-free theories I, JSL 49, 75-111 (1984). [Gall Gallagher, J., Transforming logic programs by specialising interpreters, Report, Dept. Computer Science, University of Dublin (1984). [GaLl Gallaire, H., Lasserre, C., Meta-level control for logic programming, Logic Programming (K.L. Clark, S.-A. T~rnlund, eds.), Academic Press, London (1982). [GeG] Genesereth, M.R., Ginsberg, M.L., Logic Programming, CACM 28, 933-941 (1985). [Gen] Gentzen, G., Untersuchungen fiber das logische Schliessen, Mathem. Zeitschr. 39, t76-210, 405-431 (1935). [Gly] Glymour, C., Independence assumptions and Bayesian updating, Artificial Intelligence 25, 95-99 (1985). [CoS] Cordon, J., Shortliffe, E.H., The Dempster-Shafer theory of evidence and its relevance to expert systems, Rule-based Expert Systems (B.G. Buchanan, E.H. Shorttiffe, eds.), Addison-Wesley, Readings, ch. 13 (1984). [Gre] Green, C.C., Theorem proving by resolution as a basis for question-answering systems, Machine Intelligence 4, Elsevier, New York, 183 - 205 (1969).

216 [Gro] Grosof, B., Default reasoning as circumscription, Proc. Non-monotonic Reasoning Workshop, AAAI, 115-124 (1984). [Haa] Haas, A.R., A syntactic theory of belief and action, Artificial Intelligence 28 (1986). [Hay] Hayes, P.J., Naive physics I - Ontology for liquids, Formal Theories of the Commonsense World (Hobbs, J.R., Moore, R.C., eds.), Ablex (1984). [Hin] Hintikka, J., Knowledge and belief: An introduction to the logic of the two notions, Cornell University Press (1962). [JLL1 Jaffar, J., Lassez, J.-L., Lloyd, J., Completeness of the negation as failure rule, IJCAI-83 (A. Bundy, ed.), Kaufmann, Los Altos, 500-506 (1983). [Kad] Kadesch, R.R., Subjective inference with multiple evidence, Artificial Intelligence 28

0986). [Kow] Kowalski, R.A., Sergot, M., A logic-based calculus of events, New Generation Computing 4, 67-95 (1986). [Kri] Kripke, S., Semantical analysis of modal logic, Zeitschrift f. Mathem. Logik u. Grundlagen der Mathem. 9, 67-96 (1962). [Lev] Levesque, H., A logic of knowledge and active belief, Proc. AAAI-84 (1984). [Lil] Lifschitz, V., Computing circumscription, Proc. IJCAI-85, Kaufmann, Los Altos, 121127 (1985). [Li2] Lifschitz, V., On the satisfiability of circumscription, Artificial Intelligence 28, 17-27

(1986), [Llo] Lloyd, J.W., Foundations of logic programming, Springer, Berlin (1984). [Mcl] McCarthy, J., First-order theories of individual concepts and propositions, Expert Systems in the Micro-electronic Age (D. Michie, ed.), Edinburgh University Press, 271-287 (1979). [Me2] McCarthy, J., Circumscription - a form of non-monotonic reasoning, Artificial Intelligence 13, 27-39 (t980). [Me3} McCarthy, J., Applications of circumscription to formalizing common sense knowledge, Proc. Non-monotonic Reasoning Workshop, AAAI, 295-324 (1984). [MiP] Minker, J., Perlis, D., Completeness results for circumscription, Artificial Intelligence 28, 29-42 (1986). [Moo] Moore, R.C., Semantical considerations on non-monotonic logic, IJCAI-83 (A. Bundy, ed.), Kaufmann, Los Altos, 272-279 (1983). [Pea] Pearl, J., On evidential reasoning in a hierarchy of hypothesis, Artificial Intelligence 28,

9-16 (1986).

217

[Per] Perlis, D., Languages with self-reference, Artificial Intelligence 25, 301-322 (1985). [Qui] Quinlan, J.R., Internal consistency in plausible reasoning systems, New Generation Computing 3, 157-180 (1985). [Rel] Reiter, R., A logic for default reasoning, Artificial Intelligence 13, 81-132 (1980). [Re2] Reiter, R., Circumscription implies predicate completion (sometimes), Proc. AAAI-82, 418-420 (1982). [Re3] Reiter, R., Towards a logical reconstruction of relational database theory, On Conceptual Modelling:

perspectives from Artificial Intelligence,

databases,

and programming

languages (M.L. Brodie et al., eds.), Springer, Berlin, 191-238 (1983). [Sch] Schtitte, K., Proof theory, Springer, Berlin (1977). [Sha] Shafer, G., A mathematical theory of evidence, Princeton University Press, Princeton (1976).

[She] Shepherdson, J.C., Negation as failure: A comparison of Clark's completed data base and Reiter's closed-world assumption, Report PM-84-01, School of Mathematics, University of Bristol (1984). [Silo] Shoham, Y., Ten requirements for a theory of change, New Generation Computing 5, 467-477 (1985). [Tur] Turner, R., Logics for Artificial Intelligence, E. Horwood, Chichester (1984). [Wey] Weyrauch, R., Prolegomena to a theory of mechanized formal reasoning, Artificial Intelligence 13, 133-197 (1980). [Win] Winston, P.H., Learning and reasoning by analogy, CACM 23, 689-703 (1979) . [Zal] Zadeh, L.A., A computational approach to fuzzy quantifiers in natural languages, Comp. & Maths. with Appls. 9, 149-184 (1983). [Za2] Zadeh, L.A., The role of fuzzy logic in the management of uncertainty in expert systems, Fuzzy Sets and Systems 11, 199-227 (1983).

PART THREE Knowledge Programming

T e r m Rewriting as a Basis for the Design of a Functional and Parallel Programming Language. A case study : the Language FP2

Philippe Jorrand LIFIA Institut National Polytechnique de Grenoble

FOREWORD

The semantic elegance and the mathematical properties of applicative and functional programming languages are now widely recognized as relevant and useful qualities for implementing the large and complex algorithms of the kind encountered in artificial intelligence.

On the other hand, it is also being realized that many of the problems solved by these algorithms, like automated reasoning, and some major application areas related to artificial intelligence, like computer vision, are in fact considered in a distorted and limited way because of the implicit and ever present hypothesis that they have to be solved in a sequential way by a single processing engine. This is one reason why parallelism has become a highly active topic for research in programming languages and methodology. Another reason is that the design of machine architectures with massive parallelism is also becoming a feasable task because of the progress in VLSI technology.

However, the history of languages has put parallelism, communication and synchronization on a separate path from nice and clean applicative and functional programming. One difficulty is then to reconcile these seemingly antagonist styles.

222

It is such a unified framework for both functional and parallel programming which is presented here. It takes the form of a language, called FP2, which is entirely based on the notion of terms for representing the objects of the language, and on the mechanics of term rewriting for representing the operational semantics.

Part of the work on FP2 is carried out in the context of ESPRIT Project 415, where FP2 is the basic tool for designing and implementing a parallel inference machine. This presentation is partly drawn from : "FP2 : the language and its formal definition", a working document written for ESPRIT Project 415. This presentation has the format of an informal language description and it does not contain references inserted in the text. It is followed by a bibliography on topics related to the essential questions raised by the design of such a language.

223

I _ OVERVIEW. The main language styles under active study for new generation programming can be visualised on a triangle. Vertices represent "pure" programming styles (i. e. functional, parallel and logic). The edges represent "mixed" programming styles, where 2 "pure" styles are explicitly present in a single language.

9),,, unct.ionalpro~: :,.ammlno ... /"

FP2 /

",.,,%.

.)/"

¥ Parallel programming

",\.

-

"-, ") Logic programming

FP2 is on the edge joining functional programming and parallel programming. It must be distinguished from other functional languages where data flow is used for taking advantage of possible parallelism during evaluation. In FP2, on the contrary, both functional programming and parallel oro2ramminl are explicitly present and can be independently expressed using specific constructs in the language. Furthermore, FP2 is a typed language allowing polymorphic algebraic type definitions, polymorphic functions and polymorphic communicating processes. Finally, the "declarative" style of FP2 and its semantics give to that language the qualities of both a programming language and a specification language. The semantics of FP2 rely on term algebras and on rewrite systems. This establishes a sound basis for designing and implementing formal verification tools, like full static type checking, static analysis of dynamic behavior (deadlocks, livelocks.... ) and proof of implementation correctness (comparison of a specification in FP2 with an implementation, also in FP2).

224

The main characteristics of FP2 can be summarized as follows "

FP2 is a functional programming language. Values in FP2 are represented by terms and basic function definitions have the form of rewrite rules. Function applications are terms containing defined function names : rewrite rules reduce function applications to terms containing no function application. Functional forms using second order functional operators and function names can be written and named : such higher form for constructing and defining functions has its semantics defined in terms of basic function definitions (i. e. rewrite rules).

FP2 is a parallel programming language. Independant communicating processes can be defined and networks of them can be constructed. A process is able to send and to receive messages to and from its environment. These messages How through ports owned by the process. Messages are values : they are represented by terms, they are built and reduced according to functional programming in FP2. Describing a process requires both describing the possible orders in which its ports may be used (sequentiality, non determinism, simultaneity) and describing sent messages by applying functions to received messages : basic process definitions accomplish all of this within a single formalism, namely rewrite rules. "Process forms" using "process operators" and process names can be written and named : such higher level form for constructing and defining processes denotes, in general, a network of processes (e. g. systolic arrays) and has its semantics defined in terms of basic process definitions (i. e. rewrite rules).

FP2 is a typed programmin~ language. Every term representing a value in FP2 is typed and every function has a domain and a range defined by types. All terms of a given type are thus results of applying functions having their range in that type • on that basis, types in FP2 are defined as term algebras. A type definition introduces the constructor operations for objects of that type ' terms containing only constructors are normal forms for terms containing function applications. Elaborate type structures built by means of "type forms" using "types operators" and types names can be written and named.

225

FP2 is a oolvmorDhic pro~rammin~ language. Type definitions, function definitions and process definitions may be parameterized by types : such definitions are called "polymorphic". In order to guarantee the type correctness of polymorphic definitions and for establishing the proper bindings, a notion of "property" is introduced : a property characterizes a class of types by defining a minimal algebraic structure that all the types of the class must have. Arbitrary properties may be defined. Once a property is defined, it may be used for specifying the class of actual types a formal type parameter of a polymorphic definition may be bound to.

FP2 is a modular programming language. Function, process, type and property definitions can be grouped inside "modules" which can be assembled in a hierarchical manner. Modules may export definitions to ascendent modules and may hide definitions to descendent modules. Modules form the basis for a strict control of visibility within FP2 programs.

226

2 _ TYPES, Values are represented by terms and FP2 operates on values by applying functions to terms. There are two kinds of functions : constructors and operations. Terms containing operation applications should always be reduceable to terms containing only constructors. The reductions corresponding to a given operation are defined by rewrite rules. Terms are typed and every function has a domain and a range, both of which are types. Thus, terms of a given type are all results of applying functions ranging in that type. Formally, types are term algebras and, with the rules defining operations, every term is congruent to a term containing only constructors. The basic form of type definition provides a signature for constructors and, possibly, for operations involving objects of the type. It also provides the rules defining the reductions for these operations. In addition to this reasonnably classical basis for algebraic type definition, FP2 also provides ways of constructing new types from existing types, using cartesian product, sequence and union type building "operators".

2. I _ Basic type definitions, A basic type definition presents a term algebra. It provides : - The name t of the type ; The names, domains and ranges (necessarily t) of the constructors of t ; The names, domains and ranges of operations involving t ; - The names and types of variables used in the rules for operations ; Rewrite rules defining the reduction of terms containing operation applications.

-

-

The left and right members of rules are separated by "==>" signs: Rules should be written in such a way that terms containing operation applications can be reduced to terms containing only constructors : this is an important question which has been studied in a number of places, especially in connection with algebraic data types and with terms rewriting systems. It will not be discussed here.

227

An example is the t y p e "Nat" of Natural integers (assuming that the type Bool of booleans is defined with constructors "true" and "false", and with operations "or", "and" and "not") •

Nat 0 • -) Nat succ • Nat -) Nat oons add, mul " N a t × N a t = ) Nat eq, leq " Nat × Nat -) Bool max • N a t × N a t = ) Nat I " -) Nat • Nat VarS m, n rules add(0,n) ==> n add(succ(m),n) ==> succ(add(m,n)) mul(0,n) ==> 0 mul(succ(m),n) ==> add(mul(m,n),n) leq(0,m) ==> true leq(succ(m),0) ==> false leq(succ(m),succ(n)) ==> leq(m,n) eq(m,n) ==> and(leq(m,n),leq(n,m)) max(re,n) ==> i f leq(m,n) t h e n n e.lsem endif I ==> suce(0) cons

endtype This example should not imply that integers have to be r e p r e s e n t e d as succ(succ(...)) - the usual decimal notation and infix arithmetic operators can also be used. This is also the case for <, ( .... and the boolean operators.

228

As another example, binary trees with natural integers at their leaves can be described by "

type Btree tip : Nat fork : Btree xBtree 0PnS maxt : Btree m : Nat vats u, v : Btree ru~Is maxt(tip(m)) maxt(fork(u,v)) endtvDe cons

The tree pictured as

-) Btree -~ Btree -) Nat

==>

m

==>

max(maxt(u),maxt(v)}

-

.

/

\

'...

f

1"T-)

/, "-.,,.. \ j<

,.U

\',%,.,

d~

would be constructed by • fork(tip(3), fork(fork(tip( I },tip(4)), tip(2)})

A general method for writing rules is that the left members apply each defined operations to disjoint cases of constructors, whereas the right members may have any format including conditional expressions.

229

2.2 _ Type forms. In addition to e l e m e n t a r y types defined by means of basic type definitions, it is possible to define constructed types by means of type forms w h e r e the operands are type names and the operators are type operators. There are three such operators : I _ Cartesian product. If tl,t2,...,t n are types then tl×te×...xtn is also a type, the cartesian product of tl,t2,,_,t n. If Xl,X2,...,X n denote objects of types tl,t2,...,t n respectively, then (xl,x2,...,x n) denotes an object of type t I × tax...× t n. It must be noted that t i ×t2×t 3, (t I ×t~)×t~ and t I ×(taxt 3) are distinct types. 2 _ Sequence. If t is a type, then t* is the type of sequences with elements of type t. If x denotes an object of type t and if s denotes a sequence of type t*, then x.s denotes a sequence of type t*. The notation nil : t* denotes the e m p t y sequence of type t'. It is simply written nil w h e n the type t* is k n o w n from the context. If Xl,X2,...,X n denote objects of type t, then [xl,x2,...,x J

denotes

the

sequence

of

type

t*

constructed

by

x t.(x2.(....(xn.nil)...)). 3 - Union. If tl,ta,...,t n are types then tlit21...It n is the union of types tl,t2,...,t n. There is no special w a y of denoting an object of a union type : w h e n in a context w h e r e an object of type tllt2l...Itn is required, then an object of type t i, or an object of type t 2, or .... or an object of type t n may be provided. The types tllt2lt3, (tllt2)lt 3 and tll(t2lt 3) are all equivalent, and tilt I is the same as t v

230

Type expressions can be used in any context w h e r e a type may be written ; examples of this have already appeared above with functions having cartesian products as their domains. It is also possible to define names standing for type expressions, by means of type declaration, like in • type Snat is Nat* type Ssnat is (Nat*)* For example, a w a y of describing trees of variable arity with natural numbers or booleans at e v e r y node could be • type Vtree is (NatlBool) × Vtree* Given

a 'type

t=tiltaL.It =,

m/>l

and

a

type

t'=t'llt'21_.lt' n, n/>l, where

tt,t2,...,t,n,t'1,t'2,.o.,t' n are not union types, then t is compatible with t' if, for all i, there is a j such that ti=t' j , w h e r e "=" is syntactic equality, modulo type declarations. The notation " t =_ t' " stands for " t is compatible with t' " Thus, given the type declaration • type t is u, both relations t ~_ u and u c_ t hold.

231

5 - FUNCTIONS. Operations may be defined within basic type definitions. But once a type is defined, additional operations on it may be defined separately. The elementary form of operation definition, called "basic operation definition", follows the same general approach as operations defined within types, namely rewrite rules. In addition to basic operation definitions, FP2 also allows the construction of other operations, by means of second order functional forms which apply functional operators to function names. 5. I _ Basic operation definitions...,. A basic operation definition provides : the name, domain and range of the n e w operation ; the names and types of variables used in the rules for that operation ; the rules defining the reductions of terms containing applications of that operation. For example, a new operation on Nat's can be introduced by • oi:)

min vats rules

• N a t × N a t -) Nat m,n • Nat min(m,n) ==> if m,
endoo An operation replacing all natural integers at the leaves of a Btree t by maxt(t) would be • 912

repmax • Btree -) Btree vars t • Btree rules repmax(t) ==> rep(t,maxt(t)) endoo o_~ rep vats rules endop

• Btree x Nat -) Btree m , n • Nat u , v • Btree rep(tip(m),n) ==> tip(n) rep(fork(u,v),n) ==> fork(rep(u,n),rep(v,n))

232

Another, more elaborate example, shows a w a y of programming unification of t e r m s in FP2. Terms are structured objects like t and u below •

t.

/

/"

/

..

,,.. XV

/ ,"~.fz ,.<, /

.,./" x'%L

\,

/

i

"

f~/ ",," \

",,

.

.

-

2

/.

'\ x,

',a4

"\,

/"

,.:..

./'" >' ~'".v al

/

"""x "

~":J'2

v4)

/

~

/

/

//

al

"""\.

'\" a 3

The labels fi r e p r e s e n t b i n a r y function names, a i r e p r e s e n t constant n a m e s and v i r e p r e s e n t variable names. For simplifying the example, the algorithm assumes that each variable n a m e appears at most once in (t,u). The result of unifying t and u is computed by unify(t,u). In the case of t and u above it succeeds and results in a sequence of assignments to variables denoted b y :

[(

i ' ) *"a

t7 ") 2 '4 There are other cases, w h e r e unification results in a failure. The types for terms can be defined as follows :

t y p e Term is Const I Vat I Applic t y p e Const cons a - Nat -) Const endtype Woe Var cons v - Nat -) Var endtype t y p e Applic is Funct × Term x Term type Funct cons f " Nat =) Funct endtype

233

The possible results of unification have the type : type Result is Assign* [ Failure type Assign is Var x Term type Failure cons fail : =) Failure endtvDe While unifying two terms, it will be necessary to combine the results of unifying subterms. This is accomplished by an operation "+" : o_~ + vats rules

: r : u,v : r+fail fail+ u u +v

Result x Result -) Result Result Assign* ==> fail ==> fail ==> append(u,v)

endoo This basic operation definition shows an example of operation overloading : the operation +, which is already defined on Nat's, gets here another definition attached to it. When + is applied, the choice among these definitions is determined by the types of the operands. Definitions leading to possible ambiguities are not permitted. This definition also shows a use of union types : w h e n + is applied on Results, each of its operands may be of type Assign* or of type Failure : the case analysis on operand types is made by the type of variables used in the left members of the rules.

234

Finally, the unification operation is ' o_~

unify vars

ru~s

' T e r m × T e r m -) Result t,u,v,w • Term i, j • Nat c • Const x • Var h • Applic unify(a(i),a(j)) ==> if i=j t h e n nil else fail e n d i f unify(c,x) ==> [(x,c)] unify(c,h) ==> fail unify(x,t) ==> [(x,t)] unify(h,c) ==> fail unify(h,x) ==> [(x,h)] unify((f(i),t,u), (f(j),v,w)) ==> if_. i=j t h e n unify(t,v) + unify(u,w) els___~efail endif

endop 3.2 - Functional forms. FP2 provides functional functional forms.

operators

for

combining

defined

functions

into

Let r, s, t, t I, t 2..... t a be types. T h e r e are eight such operators • I _ Composition. If f and g a r e f u n c t i o n s w i t h f : t - ) r andg:r =) s, t h e n ( g o f ) is a functional f o r m denoting an operation in t --) s. If x is a t e r m of t y p e t, t h e n the r e d u c t i o n of (gof)(x) is the r e d u c t i o n of g(f(x)). 2 _ Condition. If p, f and g are functions with p • t -) Bool, f - t --) r and g • t -) r, t h e n (p=) f ; g) is a functional f o r m denoting an o p e r a t i o n in t -) r. If x is a t e r m of t y p e t, t h e n t h e r e d u c t i o n of (p =) f ; g)(x) is t h e r e d u c t i o n of if p(x) t h e n f(x) else g(x) endif.

235

3 - ~artesian Product construction. If fl, f~...... fn are functions w i t h fi : t =) t i, t h e n (fl, f2 ..... fn) is a functional f o r m denoting an operation in t ~ t I x t 2 x ... x t n. If x is a t e r m of type t, t h e n the reduction of (fl,f2,...~a)(X) is the reduction of (fl (x),f2(x),...Jrn(x)). 4 _ Seauence construction. If fl, fa ..... fn are functions w i t h fj : t -) r, t h e n If j, f~...... fn] is a functional form denoting an operation in t -) r*. If x is a t e r m of type t, t h e n the reduction of Ill,r2, ..., fn](x) is the reduction of [f)(x),f2(x),..., fn(x)]. If n=O, t h e n the reduction of [](x) is the reduction of lnill(x). 5 - Constant. If x is a t e r m of t y p e t, t h e n Ixl is a functional form denoting an operation in r-)t, for any type r. If y is a t e r m of any t y p e r, t h e n the reduction of lx|(y) is the reduction of x . 6 _ Map.

If f is a function with f : t =) r, t h e n o((f) is a functional form denoting an operation in t* -) r*. If x is of type t*, t h e n t h e r e are two cases : (i) if x reduces to nil t h e n 0((f)(x) reduces to nil, and (ii) if x reduces to u.y t h e n the reduction of 0{(f)(x) is the reduction of f(u).0((f)(y). Insert. If f is a function with f : t x t -) t, t h e n l(f) is a functional form denoting an operation in t* -) t. If x is a non e m p t y sequence of t y p e t*, t h e n there are t w o cases : (i) if x reduces to u.nil t h e n / ( f ) ( x ) reduces to u, and (U) if x reduces to u.(v.y) t h e n the reduction of /(f)(x) is the reduction of f(u,l(f)(v.y}). If x is the e m p t y sequence of t y p e t*, t h e n l(f)(x) raises the exception "inserLerror". 7 _

If f is a function w i t h f: tlxtax...xtn-) t, t h e n : f(xl, x 2.... , xn), w h e r e xj is either a t e r m of t y p e tj or a ".", is a functional f o r m denoting an operation in ti×tjx...×tl-)t, w h e r e ti/tjx...xt I is obtained by keeping in t~ xta×...xt n the t k such that x k is a ".". If Yk is a t e r m of t y p e t k such that x k is a ".", t h e n the reduction of : f(xl,x2,...,xn)(y m..... yp) is the reduction of f(z~,z2,...,z n) w h e r e z i = x i if x i is a t e r m and z i = Yi otherwise.

236

The semantics of functional operators are defined by considering that functional forms are second order expressions which can be "evaluated". This is possible in FP2, where basic operation definitions constitute a more elementary form of operation description : evaluating functional forms means producing basic operation definitions. Let F(x) be a term where F is a functional form. The evaluation of F is guided by its syntax : dummy operation names f0,fp.., are generated, one for each syntactical sub-form in F, where f0 is the operation for F itself. Then, basic operation definitions for f0,fl ..... with their respective names, domains, ranges, variables and equations can be mechanically produced. In fact, defining one of these functions fi is necessary only when fi corresponds to a map or insert functional operator and in the case of recursive functional forms. Finally, F(x) is replaced by f0(x) which has its reduction defined by the generated basic operation definition. For example, given the following basic operation definition • o_R null vars rules

: Nat* -~ Bool m : Nat s : Nat* null(nil) ==> true null(re.s) ==> false

endoR The evaluation of ((not o null) =) l(add);101) produces " " Nat* -) Nat

o_O_R f0 vats

v0

" Nat ~

rules

f0(v0)

==> ~_ not (null(v0)) then f! (v 0) else 0 endif

endo~

237

fl vats

: Nat" -) Nat v o , v I : Nat v2

rules

: Nat"

fl(v0.nil)

==> v 0

fl(Vo.(vl.V2)) ==> add(vo, fl(vl.v2)) fl(nil) ==> ! insert_error endop If x is a sequence of t y p e Nat*, t h e n ((not 0 null) =) l(add);101)(x) reduces to the s u m of the e l e m e n t s of x, or to 0 if x is nil. Functional forms can be used in any context w h e r e function n a m e s m a y be written. It is also possible to define n a m e s standing for functions built by functional f o r m s • 9J~ sigma is ((not o null) ~ l(add);101) o__p_pi is ((not 0 null) -) l(mul);10l) 0_~ sigpi is (sigma 0 o((pi)) Recursive operation definitions with functional forms fit quite naturally in that f r a m e w o r k . For example, given oo II is (leq 0 (id,lll)) w h e r e id is the identity function on natural n u m b e r s , and " pred vars rules

- Nat -~ Nat m • Nat pred(0) ==> 0 pred(succ(m)) ==> m

e,~dop re_p_p I is pred p2 is (pred o p r e d )

238

Fibonacci n u m b e r s can be computed b y • fib

is

(II =) id ; (add o ((fib o pl),(fib o p2))))

This definition produces • oo

Nat -) Nat • Nat

fib ¥~r~

v0

rules

fib(v 0)

endoo

==>

if ll(vo) then id(vo) else add(fib(pI(vo)),fib(p2(vo))) endif

239

4 _ PROCESSES. The elementary component for organizing parallel computations in FP2 is the process. A process has ~)orts through which messages may flow in and out. Messages are values, they are represented by functional terms, they have types and they are built and reduced according to functional programming in FP2. Messages arrive at ports or leave ports along directed connectors having their destinations or their origins attached to these ports. Each connector allows messages of a certain type, which may be a union type. The transportation of one message along one connector is a communication. There is no such notion like the duration of a communication. Describing a process is describing its ability to perform communications along the connectors attached to its ports. In addition to applying functions to received messages for computing sent messages, this involves also seauencin~, n ~ determinism and parallelism in the ordering among communications. Formally, a process is a state transition system which can be viewed as a graph : nodes represent states, multiple branching represents non determinism and arcs are labelled by events, where an event is a set of communications occurring in parallel. This graph is in general infinite and every path represents a possible sequence of events : one set after the other of communications occurring in parallel. The basic form of process definition provides a description of the connectors of the process and it makes use of rewrite rules for describing the non deterministic state transition systems : the rules, labelled by (possibly empty) events, rewrite states. In addition to basic process definitions, the language allows definitions of processes built by combining other processes into process forms by means of process operators. 4. I ~ Basic process definitions. A basic process definition describes a transition system, with transition rul~s, where the events are sets of communications along typed connectors. It provides

•

the name N of the process ; the names and message types of the input, output and internal connectors of N ; the names and domains of state constructors used in the rules of N ; the names and types of variables used in the rules of N ; rules defining the transitions of N.

240

As an example, let STACK be a process. It can be pictured as :

----•

t

01

~-

It has an input connector I and an output connector O. The communication of a message v, w h e r e v is a functional term of type t, along a connector k of message type t is denoted by k(v). For example, if both I and 0 may communicate Nat's, then I(O) and O(succ(O)) denote communications. An event is composed of a set of corn munications k ~(v j)...kn(vn), w h e r e k ~.....k n are n distinct connectors. A term of the form Q(u i .....urn), where Q is a state constructor and w h e r e the ui's are functional terms of the correct types for the domain of Q, is called a predicate. A predicate without variables in the ui's is a stale. State constructors cannot appear in the ui's. Rules are composed

of three

parts:

a predicate

R(u i .....urn) called the

pre-condition, an event ki(vl)...kn(v n) and a predicate S(wl,...,w n) called the post-condition. They have the general format :

R(ui..... u,,):kl(vl)...kn(vn) ==> S(wl,...,wp) If k i is an internal or output connector, all variables appearing in v i must appear in R(u I .....urn) or in vj such that kj is an input connector. The same must be true for the variables in S(w i .....wp). Since an event is a set it may be empty. In that case, the rule is an internal rule, of the form : R(u I .....u m) ==> S(wI,o..,wp) Furthermore, among the rules of a process, there must be at least one initial rule, without pre-condition and without event, and w h e r e the post-condition is a state : -=> S(wl,._,w p)

241

For example, let the process STACK be an unbounded stack of Nat's. It is initially e m p t y and w h e n a Nat arrives along I, it may be written into STACK. When STACK is not empty, the last arrived Nat may be read from it along O. Writing and reading are mutually exclusive. A basic process definition for this "Last In First Out" STACK may then be : proc STACK i~ o¢~t states vats

I 0 S e v

-Nat "Nat " Nat* • Nat • Nat*

S(v) S(e.v)

- l(e) • O(e)

rules

==>

==> ==>

S(nil) S(e.v) S(v)

endproc Rules in a process N describe a transition system in the following way • 0 _ Initially, one of the initial rules in N is chosen. This choice is non deterministic, The post-condition of the chosen rule becomes the current state of N. Then repeat steps I, 2 and 3. I _ The current state q of N is matched against the pre-conditions of the rules : a rule with pre-condition r is said to be Dre-aoolicable if there exists a substitution h for the variables of r such that h(r)=q. If there is no pre-applicable rule, the process is terminated.

2 _ Let e be the event in the pre-applicable rule and let mj be a message about to be sent accross kj, for all ki(v i) in e where kj is an external connector. That rule is said to be applicable if there exists a substitution g for the variables of all vj's such that g(h(vi))=m j. 3 - One of the applicable rules is chosen to be the at)t~lied rule. This choice is non deterministic. Let s be the post.condition of this rule. The event g(h(e)) occurs and the term g(h(s)) becomes the current state of N.

242

This operational view shows how rules express sequencing, non determinism and parallelism among communications : rules are applied one at a time (sequencing), the applied rule is chosen among several applicable rules (non determinism) and several communications occur within a single event (parallelism). It must be noted that internal rules can be used for describing computations. This can be seen in the following example : MAXNAT sends through C the maximum of two previously entered Nat's, the first one entered through A and the second one through B C-" denotes the null arity for state constructors) " proc MAXNAT in A, B • Nat out C • Nat states X y Nat Nat x Nat x Nat × Nat Z vars m, n, p, q • Nat ==> X rules ==> Y(m) X • A(m) ==> Z(m,n,m,n) Y(m) • B(n) ==> Z(m,n,m,n) Z(m,n,succ(p),succ(q)) ==> X Z(m,n;p,0) • C(m) ==> X Z(m,n,0,q) • C(n) endt)roc In fact, this form of process definition could v e r y well do without the operation definitions of the functional part of FP2. Assuming that the available functions are only constructors, basic process definitions are sufficiently powerful to define any function that can be computed on a Turing machine. However, defined operations make basic process definitions much easier to write and to read. For example, a process sending out the maximum of its two input messages could also be described as follows : proc MAX in A, B " Nat ou__Xt C • Nat states X • Y, Z " Nat vars m, n " Nat ==> X rule, s, X : A(m) ==> Y(m) Y(m) : B(n) ==> Z(max(m,n)) Z(m) : C(m) ==> X endproc

243

Process definitions m a y also be parameterized. For example, bounded queues of natural n u m b e r s of capacity k m a y be defined as follows • proc

BQUEUE i__~n out states vars

[k" Nat] W R Q e t, u, v n

• ' • • • •

Nat Nat Nat* xNat* × Nat Nat Nat* Nat

rules ==> Q(u,v,succ(n)) : W ( e ) = = > Q(u,e.v,n) : R(e) ==> Q(e.u,nil,n) ==>

Q(nil,nil,k) Q(e.u,v,n) Q(u,v,n+l) Q(nil,reverse(e.u),n)

endproc w h e r e reverse(s) r e t u r n s a sequence with the elements of s in the opposite order. Once such a parameterized process has been defined, it can be instantiated with actual parameters. It is also possible to define n a m e s standing for processes • proc BQUEUE4 is BQUEUE[4] Every process definition, w i t h or without parameters, can also be considered as the definition of an indexed family of processes, w h e r e the indexes are natural numbers. For example, let processes V be variables alternating write and read communications : V in out states vars rules

endproc

W R E F v

: : : : :

Nat Nat Nat Nat

E F(v)

E : W(v)==> F(v) : R(v) ==> E

244 This definition also defines processes V_l, V_2, etc .... with connectors W_I and R_I, W_2 and R_2, etc... These indexes m a y also appear as parameters, like in proc VNAT [i-Nat] is V_i Then • V3 is V_3 and " rp=f_OS_V3 is VNAT[3] are identical definitions and produce a process with connectors W_3 and R_3. That process may in turn be considered as defining an indexed family V3_I, V3_2, etc... A similar indexing facility can also be used within basic process definitions, w h e n it is n e c e s s a r y to describe processes with indexed families of connectors, states, variables, rules or events. For example, a process ONE receiving a Nat into I and sending it out from on_.__~eof its n output connectors O_i, is described b y p_~oc ONE [n" Nat] i~

I

•

ou___~t states

{O_i [i= l..n) E F v

" • " •

Nat Nat Nat Nat

' •

==> E I(v) ==> F(v) O_i(v) ==> E l i=l..n

vars rules

(

E F(v)

endproc

t

I

,,1

0_1 O_2

C)_t'i

J, ,L

I

4,

245

Given an instantiation ONE[3], the repetition facility {O_i I i=1_n} • Nat stands for • 0_1,0_2,0_3 • Nat. Similarly, 3 rules are produced, one for o u t p u t through each of 0_I, 0_2, 0_3. An other example shows the use of this facility for describing a process ALL which receives a Nat into I and sends it out from all of its n o u t p u t connectors within the same e v e n t • proc

ALL In" in out states vats rules

Nat] I {O_ili=I..n} E F v E F(v)

-Nat " Nat - • Nat " Nat - I(v) • {O_i(v) li=l..n}

==>

E

==> ==>

F(v) E

endproc A process which receives n Nat's sequentially into its input connectors I_i t a k e n in any order and then sends out their maximum through 0 is defined b y • MAXALL [ n Nat] in {IJ]i=1..n} out 0 sta)es Q vars v, m s rules {

0(succ(m),s) Q(0,v.s) Q(O,nil)

• • " • •

Nat Nat Nat x Nat* Nat Nat*

==> Q(n,nil) : I_i(v) ==> Q(m,v.s) ] i=l..n : O(/(max)(v.s)) ==> Q(n,nil) : 0(0) ==> Q(n,nil)

endproc

I I_1

I_2

I_n @

246

Finally, process definitions may also parameterized by port names, as in the following definition of a "CELL". A CELL performs four communications within a single event. In that event, it inputs natural integer values, while sending out the result of a simple computation performed on previously received values : proc CELL [c. Nat] [X0,Yo,XI,Y1 "Port] in

Xo, Yo

• Nat

ou.t states vats rules

XI, Y1 Q x, y, u, v

- Nat • Nat x Nat - Nat ==> Q(O,O)

Q(x,y)

• X0(u) Y0(v) Xl(x) Yl(y+c*x)

==> Q(u,v)

endp,roc Given natural integers a, i and j, and identifiers U, V, X and Y, CELL could be instantiated as follows • proc CELLI is C~L[al[U, Y_i_j, X i. j, V].

¥-J-i U

x_i_j V

i

247

4.2 _ Process forms. FP2 provides process operators for combining defined processes into process forms. The number and the nature of these operators are arbitrary and a given implementation of the language could take any collection of them. The important facts are : (a) all operators are built up on top of a common primitive basis ; (b) process forms can all be evaluated into basic process definitions with connectors, state constructors, variables and rules.

4.2. I _ Primitive basis for process operators. A non parameterized basic process definition is a syntactic object of the form : proc N connectors

kl

: tI

...

kI

states

Pl

: Jl

".

Pro: lm

vats

V 1

:

U1

,..

Vn

==> ql

-.-

Sl

...

rules ri e I

==>

: t1

:

ti n

==> qp fq eq

==>

Sq

endproc

where the input, output and internal connectors have been grouped within a single list. Given a connector k i, its sort is given by sort(ki) E {in. ~ internal}. Thus, a basic process definition can be viewed as associating a process name N with a tuple composed of the five following sets : - K = { { i=l..l }, represents connector definitions, where the ki's are I distinct connector names and the ti's are types. " P = { i i=l..m ), represents state constructor definitions, where the pi's are m distinct state constructor names and the li's are (possibly empty) lists of types. V = { { i=l..n }, represents variable definitions, where the vi's are n distinct variable names and the ui's are types. -

-

Q = { qi {i= 1..p }, represents initial rules, where the qi's are states. R = { } i= l_q }, represents transition rules, where the ri's and si's are

predicates and the ei's are (possibly empty) events.

248

Let N = and N' = be two basic process definitions Primitive sum operators. Iff K and K' share no port name, the sum K+K' is the set union of K and K'. It is undefined otherwise. Analogously, P+P' is the union of P and P' iff they share no state constructor name, V+V' is the union of V and V' iff they share no variable name and e+e', where e and e' are events, is the union of e and e' iff they share no connector name. The sums Q+Q' and R+R' are the corresponding set unions and are always defined. Product of predicates. Given state constructors definitions d= and d'=, the product d*d' is a state constructor definition , where p_p' is a state constructor name and where M (p, p', I, I') is a list obtained by merging the lists 1 and I' in a way depending solely on p and p' and such that M(p,p',l,l')=M(p',p,l',l). Given P = { d i I i=l..m } and P' = { d'j ] j=l..m' }, the product P*P' is {di*d'i ] i=l..m, j=l..m' }. Iff r = p(f) and r' = p'(f') are predicates such that the lists of functional terms f and f' share no variable name, the product r*r' is defined and is p_p' ( M (p, p', f, f')). Product of initial rules. Given Q = Cqi J i= 1_p } and Q' = { q'j ] j= I ..p' }, the product Q*Q' is { qi*q'j I i=l..p, j=l..p' }. )

I

Product of transition rules. Iff R={ ] i= l..q } and R'={ I j= I ..q' } share no variable name and no port name, the product R*R' is defined and is { i ~ I i=l ..q, j=l..q' }. Idle rules. Given a state constructor definition with I = (tl,..,tk), j (p) is a predicate of the form p (xl,..,x k) where the xi's are k distinct variable names and x i is of type tj. Given P = { I i=l..m }, the set I (P) = { I r ~ J (p) }, where J (p) = { j (pi) ] i=l..m }, is the set of "idle rules" on P and W (P) is a set of variable definitions necessary for I (P). 4.2.2 _ process operators.

Let NI = and N 2 = be basic process definitions where Kl and K2 do not share any port name. In each definition D of a process operator, N l and N~. are such that the primitive operators applied in D are defined on their operands : this can be assumed without any loss of generality, since state constructors and variables may always be renamed.

249

I _ Interleaved composition • Nt_J_.Nz W h e n an e v e n t occurs in the process N t l N 2, it is either an e v e n t in N i while N a is idle or an e v e n t in N 2 while N t is idle • NIIN 2

=

where - K = Ki+K 2 P = Pl*P2 V=

Vt+V 2+w(pi)+w(p2)

O = Qt*Q2 R = Rt*I(Pa)+I(Pt)*R

a

2 _ Synchronous composition • Nt__lilN~ When an e v e n t occurs in N t Ill N 2, it is an e v e n t in N i together with an e v e n t in N2 • NI]IiN 2

=

where • K = KI+K z P = Pt*P2 V=

VI+V 2

Q = QI *Q2 R = RI*R a 3 - Parallel composition • Nt_ H N 2,

When an e v e n t occurs in N i II N a, it is an e v e n t in N l while N 2 is idle or an e v e n t in N 2 while N t is idle, or an e v e n t in N! together with an e v e n t in N 2 NlllN 2

=

where - K = Kt+K 2 P = Pt*P2 V = Vi+V 2+W(PI)+W(P2)

Q

=

Ot *O2

R = RI * I (P2) + I ( P t ) * R 2 + R! * R 2

250

4 _ Uncontrolable c h o i c e " NI_ .~ N2._ At initialization, the process N l ? N2 chooses non deterministicaUy to b e h a v e always like N I while leaving N2 idle, or to b e h a v e always like Na while leaving N~ idle : N!?N 2

=

where • K = KI+K 2 P = P~+P2 V=

Vt+V 2

O = Qi+O2 R = RI+R 2

5 - Controlable choice : N! ! N~_. After initialization', the first e v e n t to occur in N l ! N2 may be either an e v e n t in N I while N 2 is idle or an e v e n t in N2 while N l is idle. If that e v e n t contains input or o u t p u t communications, the choice m a y be controlled by the e n v i r o n m e n t of N I ) N 2. After that e v e n t has occurred, N I ! N x continues to b e h a v e like N I while leaving N 2 idle if it was an e v e n t in N l, or continues to b e h a v e like N2 while leaving Nl idle if it was an e v e n t in N 2 : N!!N 2

=

p

= (pj * p2) * p,

V

= V I +V 2

Q

=

R

= (R l * I (P2)) * R' l + (I ( P i ) * R 2) * R' 2

where •

given "

K I +K 2

(QI * Q2 ) * O'

{ , <TI,()>, } {T} R' I = { , <TI,{},TI> } R ' 2 = { , }

251

.6 _ Connection : N ! + A.B. Let A b e an o u t p u t connector of t y p e t I and B be an i n p u t connector of t y p e t 2, b o t h in N v If t h e r e exists a t y p e t w i t h t ~ t I and t _mt2 such t h a t t h e r e is no t' ~ t satisfying t h e s e s a m e conditions and t _: t', t h e n A.B is an internal connector of t y p e t in N I + A.B. The process N l + A.B b e h a v e s like N l and, in addition, w h e n an e v e n t involving b o t h A and B m a y occur in N l, a n e w e v e n t involving A.B m a y occur in N! + A.B, w h e r e the m e s s a g e s e n t f r o m A arrives at

B NI + A.B where

=

• K = KI+K' P=

Pl

V=

Vl

0 = Ol R = RI+R' given •

K' = { } R'= { g() i < r , e , s > a R I and A ( u ) e e and B ( v ) a e

and g = mgu(u,v) } with : mgu (u, v) = most general unifier of u and v. 7 _ Hiding : N ! - k. If k is a connector of t y p e t in N l, the connectors of N l - k are all the connectors of N l, e x c e p t k w h i c h is "hidden". if k is an e x t e r n a l connector (i. e. i n p u t or output

connector),

no e v e n t

involving

k m a y occur in N i - k, since the

e n v i r o n m e n t cannot "see" k a n y longer. If k is an internal connector, e v e n t s involving k in N I m a y still occur but, in Nj - k, t h e y do not m e n t i o n k a n y longer : Nl-k where

= : K = Kj-K'

P= Pl V=

Vl

0 = Ol R = R I - R' ÷ R" given :

K'= { }

R'= { I e R l and e i n v o l v e s k ) R,, = {l • R i and k(u) e e and k is internal}

252

8 _ Tri~er

: e -> N!.

If e is an event, t h e first e v e n t to occur m e -> N l m a y occur only t o g e t h e r w i t h e. After that, e -> N I b e h a v e s like N v Thus, e -> N l is N t t r i g g e r e d b y e : e->N t

=

where : K = Kt+K' P = pl*p'

V = Vt O = q l * Q' R = RI*R' given :

K' = { a set of connector definitions n e c e s s a r y for e } P' = { , <TI,()> } Q'= { T } R' = { , }

9 _ Control : e => ..N!. If e is an event, e => N l b e h a v e s like N v b u t e v e r y e v e n t occurs together w i t h e. Thus, e => N l is N! controlled b y e : e=>Ni

=

where : K = Kt+K' p = pt*p' V= V 1 Q = QI* Q' R = RI*R' given :

K' = { a set of connector definitions n e c e s s a r y for e ) P' = ( }

O'= {T} R'= ( }

253

I0 _ T i m e - o u t : N!.n:N2_~ If n is a natural integer, t h e n NI.n:N 2 b e h a v e s like N! for at most n successive events. If N! is t e r m i n a t e d before n e v e n t s have occurred in it, t h e n NI.n:N 2 is also terminated. If N! is not t e r m i n a t e d at that time, t h e n N!.n:N 2 stops behaving like N! and starts behaving like N 2 : NI.n:N 2

=

where " K

given •

= KI+K 2

P

= Pl*P'+P2

V Q

= VI+V'+V",V = Ql* Q'

R

= RI*R'+R"+R 2

2

P' = { } V' = { } V" = W (Pl) Q' = { T ( n ) } R' = { } R" = { < p * T ( 0 ) , { } , q > I p E J ( P l ) a n d q e Q 2 }

Process forms appear in the context of process definitions • proc <process name> i_s <process form> A process form is an expression in which the operators are process operators and the o p e r a n d s are processes, connectors, e v e n t s or natural integers. Evaluating a process f o r m results in basic process definitions. In principle, the evaluation of a process form N is guided by its syntax " d u m m y n a m e s no, n ! . . . . . are generated, one for each syntactical s u b - f o r m in N, w h e r e n o is the n a m e corresponding to N itself. Then, basic process definitions for n o, n~ ..... w i t h their respective names, connectors, predicates, variables and rules can be mechanically produced b y applying the definitions of the operators. In practice, this evaluation can be greatly optimized and most i n t e r m e d i a t e basic process definitions (especially those resulting f r o m compositions) can be avoided.

254

4.2,3 _ Examples of process form&s For writing the examples in this section and showing the results of some process form evaluations, the following conventions are used • - Process forms w r i t t e n "N ++ A.B", w h e r e N is a process form and A.B is an internal connector, are expanded to "N + A.B - A - B - A.B". - The list M (p, p', I, I'), w h e r e p and p' are state constructors names and I and I' are lists is w r i t t e n as if it was append (I, I').

I _ Maxima of sequence of Nat's. In this first example, values of type Nat are read in sequentially and t h e y are considered as forming a series of sequences separated b y O's. At the end of each sequence, the maximum of that sequence is sent out. This is achieved b y a process SMAX constructed as a n e t w o r k of more e l e m e n t a r y processes. The process MAX and the following definitions are used in that construction : proc REG out

states vats

W R V r, s

:Nat : Nat : Nat : Nat

v (r) v (r)

• W (s) • R (r)

rules

==> V (0) ==> V (s) ==> V (0)

endproc type Signal cons

buzz, ring

• -)

Signal

endtype proc

BZZZ in out

K L S

: Nat : Nat : Signal

st~t)es

P

:

vats

p

: Nat

p

.

=

rules p endproc

K (0) L (0) S (buzz) - K (succ(p)) L (succ(p))

==>

P

==>

P

==>

P

255

oroc

GATE in out states

M : Nat T :Signal N : Nat Q : -

vats

q

: Nat ==>

rules Q

" M ( q ) N ( q ) T(buzz)

Q

==> Q

endproc Then the process SMAX can be constructed b y the following process form • p r.oc SMAX is ( MAX II REG ++ C.W + R.B - B - R.B ) [I ( BZZZ II GATE ++ S.T ) ++ L.A ++ R.M This construction can be pictured as foUows •

[

MAX

SMAX The resulting basic process definition is r e m a r k a b l y short • t)roc SMAX

in OUt statep

v,ars rules

K N X_V__P_Q Y_V_P_Q Z_V_P_Q p, m, r

X_V_P_Q (0) ==> Y_V_P_Q (succ(p), r) ==> Z_V_P_Q (max (m,r), 0) ==> X_V_P_Q (m) ==> Y_V_P_Q (0, 0) ==>

x_V_p_Q (r) Y-V--P-0 (m, r) Z_V__P_0 (m, r) X_V__p_Q (r)

en,dproc

: Nat : Nat :Nat : Nat × Nat : Nat × Nat : Nat

: K (succ(p)) : : : K (0) N (r)

256

2 _ Construction of a. queue. In addition to process operators, process forms m a y also use conditionals. It is t h e n possible to write recursive definitions, like the following construction of a b o u n d e d q u e u e BQ built as a chain of processes of the indexed family V proc BQ [k "Nat] is

~_ k=l t h e n V_I els____~eBQ[k- I ] I] V_k ++ R_(k- I ).W_k endif The instantiation • proc BQ4 is BQ[4] can be pictured as follows •

It is also possible to have an "iterative" description of BQ, using the repetition facility inside a process form " t)roc B Q [ k Nail is II ( V=_i I i= l..k} ++ ( R_i.W_(i+ I ) ] i= I ..k- I )

257

;L.S~tolic arrays. Let A and B be two nxn matrices. Given a series X0,...,Xi.... of vectors with n components, the problem is to compute a series ¥t,...,Yi,... of vectors with n components such that Yi = AXi + BXi-I This computation can be performed by a nxn systolic array of processes. Let SYSTOL be the name of that array. The complete system comprises SYSTOL and four interface processes which prepare the input vectors for SYSTOL and assemble the output vectors for the environment.

,;2

",i j~.IZ,' , 0 0

Z~; "

INX

F ~>.:, ; 2 ., .,...~

i

~",) LO !If. . T '

t 1.........l, il The Xi vectors arrive into the right of the system and they get out unchanged from the left, while the computed results, the Yi vectors, leave from the bottom. The processes INX, OUTX, ZERO and OUTY are interface processes : INX inserts one vector of 0's after each Xi vector and delays the jth component of the i m vector of that new sequence so that it arrives into SYSTOL "at the same time" (i. e. within the same event) as the first component of vector (i+j-l) of that sequence. Symmetrically, OUTX and OUTY re-establish the synchrony among the components of each Xi vector and of each Yi vector respectively. ZEROrepeatedly sends vectors of O's into the top of SYSTOL. The FP2 description of these interface processes is left to the reader.

258

Surrounded by its interfaces, the process SYSTOL is an n×n array of orthogonally connected processes of the family MOD, each containing 2 CELL's. MOD[i,j] is positioned at row i, column j of SYSTOL and it is defined by : proc MOD [i,j:Nat] is CELL [b(i,j)][U,Xl i j,Y0 i j,V] II CELL [a(i,j)][X0_i_jZ,W,YI_Lj] ++ V.W ++ Z.U where a(i,j) and b(i,j) are elements of the matrices A and B respectively. Finally, SYSTOL is constructed as follows : proc SYSTOL [n : Nat] is II { ROWIi,nl I i= l..n } ++{ Yl_i_j.Y0_(i+ 1)_j I i=1..n-1, j=l..n } proc ROW [i,n : Natl its I1{ MODIi,jl I j=l..n } ++{ X1_L(j+1).X0_i_j I j=l..n-1 }

2~9

5 _ POLYMORPHISM, Definitions of types, operations and processes can specify that the defined entities are parameterized by types • such entities are called "polymorphic". Polymorphic definitions use formal type parameters which are introduced by the definition, with their names and with an algebraic characterization of the family of possible corresponding actual types. In FP2, such an algebraic characterization is called a Drooertv : properties are defined by means of equations on terms, formal type parameters of polymorphic definitions require properties on their corresponding actual types and the satisfaction of a property by an actual type can be asserted by means of a specialized satisfaction clause. 5.1.........Polymorphic definitions without properties. A polymorphic definition of a type, operation or process provides + the name of the polymorphic type, operation or process ; the description of the formal type parameters ; the body of the definition, which is a basic type, operation or process definition or a type, functional or process form. The body can use the formal type parameters, by refering to their names in any context where a type can be written. For example a polymorphic type for pairs of values of the same type is • type Pair [ t" type ] l Ftype [t] is t x t It reads as f o l l o w s ' "the type Pair [t], such that t is a type satisfying the property Ftype is the cartesian product t x t". The property Ftype is a predefined property • all types satisfy it, which means that any type can be used for instantiating the polymorphic type Pair • tyt)e Pairnat is Pair [ Nat ] type Twopairs is Pair [ Pairnat ]

260

Binary trees with nodes labelled by values of a given type and leaves labelled by values of a possibly different type have the type : type Tree [ t, u : type ] I Ftype [ t ], Ftype [ u ] con~ leaf : t node : u x T r e e [ t , u ] x T r e e [ t , u ] endtvDe

=) T r e e [ t , u ] ~) T r e e [ t , u ]

Trees with pairs of Nat's on nodes and Nat's on leaves have the type : tvoe Treenat ~ Tree [ Nat, Pairnat ] Such a tree can be constructed by : node ( (3, 4), leaf (I), leaf (2)) But it is also possible to define : Woe Treebool is Tree [ Bool, Bool ] °

-

and to construct : node ( false, leaf (true), leaf (true)) Thus, the polymorphic definition of Tree has also introduced operations "leaf" and "node", which are polymorphic. Instances of these operations have also been created : one instance of leaf takes a Nat and returns a Treenat, the other instance of leaf takes a Bo01 and returns a Treebool. Similarly, two instances of node have been created. The complete names of these functions are qualified by their signatures : ( ( ( (

leaf leaf node node

: : : :

Nat Bool Pairnat x Treenat x Treenat Bool x Treebool x Treebool

=) -) -) =)

Treenat ) Treebool) Treenat ) Treebool )

When constructing "node ( (3, 4), leaf (I), leaf (2))", the choice among the various instances of node is governed by the types of the arguments. In fact, this term stands for the more explicit construction : ( node : Pairnat x Treenat x Treenat =) Treenat ) ( (3, 4), ( l e a f : N a t =) Treenat ) (I), ( l e a f : N a t =) Treenat ) ( I ) )

261

P o l y m o r p h i c o p e r a t i o n s can also be d e f i n e d s e p a r a t e l y • o_~ first [ t" t y p e ] I F t y p e [ t ]" Pair [ t ] -) vats x, y • t rules f i r s t ( x , y ) ==> x endoo An instance of first could be explicitly c r e a t e d and called, like in • ( first "Pair [ Nat ] =) Nat ) (3, 4) But it is also possible, as above, to o m i t the signature and to s i m p l y w r i t e first (3, 4) Finally, t h e r e are also p o l y m o r p h i c processes • Droc PSTACK [ t " t y P e in I • out 0 • states S • vars e • v • rules S(v) S(e.v)

] i Ftype [ t ] t t t~ t t*

• I(e) • 0(e)

==>

endDroc T h e y can be i n s t a n t i a t e d

S (nil)

==> S (e.v) ==> S (v)

•

proc STACKNAT is PSTACK [ Nat ]

262

5.2 ~ Property definitions and satisfac)ion claus.es, In all the above examples, any actual type can be bound to the formal types, since the only requirement is that it satisfies the property Ftype. This is not always the case. For example, the definition of a generic equality operation on Pair's would require that there also exist an equality operation on the type t of the elements. The orooertv of such types with an equality operation can be defined in FP2, by means of a property definition : prop Equality I t with e q ] opns eq " txt-) Bool vats x,y,z • t eqns e q ( x , y ) == eq (y,x) eq (x, x) == true eq (x, y) ^ eq (y, z) A ~ eq (X, Z) == false endoroo It can be read as follows :"the property Equality is satisfied by all types like t with an operation like eq : t x t -) Bool iff the terms built with that operation obey the specified equations". (Two terms v and w obey the equation "v == w" iff the reductions of v and w terminate with the same term). Here, the equations state that eq is symmetric, reflexive and transitive. If "=" is the name of the equality operation on Nat's, the type Nat should now satisfy the property Equality with the operation "=". However, orovin~ that it is indeed the case is, in general, not a feasable task. This is why FP2, for that purpose, relies on assertions in the form of satisfaction clauses : sat Natequal is Equality[Nat with = ] which reads • "Natequal is the name of the satisfaction clause asserting that the property Equality is satisfied by the type Nat with its operation ='. Then it becomes possible to define an equality operation on Pair's • on same [ t : type ] [ equal : ~ ] I Equality [ t with equal ] : P a i r [ t ] x P a i r [ t ] -) Bool a, b,c, d : t rule~ same ( (a, b), (c, d) ) ==> equal (a, c) A equal (b, d) endop

263

Thus, "same" is a polymorphic operation with signature Pair[ t ] × Pair[ t ] -) Bool, requiring that the formal type t satisfy Equality with the formal operation equal. With that definition, a term like • same ( (3, 4), (5, 6) ) binds t to Nat, since it means ' ( same "Pair [ Nat ] × Pair [ Nat ] =) Bool ) ( (3, 4), (5, 6) ) Given that • and that •

Equality [ t with Equality [ Nat with

equal ] = ]

is required, is satisfied,

this term is correct and the formal operation equal is bound to "=". Finally, the rule ' same ( (3, 4), (5, 6) ) ==> (3=5) ^ (4=6) is applied and the term eventually reduces to false, as expected. Given this equality operation "same" on Pair's, it becomes even possible to say that the type Pair [ t ] satisfies Equality with it, provided that t itself satisfy Equality. This is accomplished by a polymorphic satisfaction clause ' sat Pairequal [ t" typ~ ] [ e q 0_~ ] I Equality [ t with eq ] is Equality [ Pair [ t ] with same ] That satisfaction clause enlarges the polymorphism of the operation same • it becomes applicable to Pair [ Nat ], Pair [ Pair [ Nat ]], Pair [ Pair [ Pair [ Nat ]]], etc.

264

The last example shows a polymorphic process BIGMAX [ t ], requiring that there be a semi-lattice structure among objects of type t • it inputs n objects of type t within one e v e n t and sends out their least u p p e r bound. prop Semilattice [ t with eq, leq, lub ] leq " t x t -) Bool lub " t × t -) t vars m,n,p • t leq (m, m) leq (m, n) A leq (n, m)A -, eq (m, n) leq (m, n) A leq (n, p) A " leq (m, p) lub (m, n) leq (m, lub (m, n) ) leq (m, p) ^ leq (n, p) A " l e q (iub (m, n), p) ~ndoroo

= =

== == == == ==

true false false lub (n, m) true false

proc BIGMAX[ t- t y p e ] [ eq, le, up" o.o.p_] I Semilattice [ t w i t h eq, le, up ] [ n :Nat ] in {I_ili=1..n} : t out 0 : t states E : F : t* (v_i]i=l..n} : t s : t* ==> E rul,e,S E " {I_i(v_i)li=l..n} ==> F ( I v _ i l i = 1 . . n l ) F (s) - 0 ( / ( u p ) (s)) ==> E endproc

Given the satisfaction clause • ~_aAt Latnat is Semilattice [ Nat w i t h =, (, max ] BIGMAX can be instantiated to • proc BMAXNAT [ n" Nat ] is BIGMAX [ Nat ] [ n ] In that instantiation, the formal operations eq, le and up of BIGMAX get bound to =, ( and max respectively.

265

6 _ EXCEPTIONS. In the definition of a function, it is often assumed that it applies to all possible values in its domain type. However, there are cases where the domain of definition should not cover all the domain type. In FP2, it-is possible to take care of such situations, which correspond to the notion of partial f u n c t i o n s - Preconditions on parameters can restrict the domain of definition and raise exceptions. Exceptions handlers provide means of defining the actions to be taken when an exception is raised.

6. I _ Preconditions In addition to the "normal" rules which define the reductions of terms containing operation applications, an operation definition may also contain precondition

Normal rules have the format ' left term

==>

right term

where, in the left term, the outmost function name is the operation being defined and its subterms are either constructor applications or variables. Furthermore, no two normal rules in an operation definition have unifiable left terms.

Precondition rules have the format • left term

I condition

==>

! exception name

Here, the outmost function name of the left term may also be a constructor. The condition is a term reducing to a boolean value, where the variables also appear in the left term. No two precondition rules in an operation definition have unifiable left terms. The exception name is simply an identifier.

266

For example, accessing the i th element of a sequence, w h e r e i is a natural number, requires that i~be not smaller than 1 and not greater than the length of the sequence. The following polymorphic operation "elem" has its domain of definition restricted accordingly, by means of a precondition rule • o_12 elem [ t" type ] l Ftype [ t ]" t* x Nat =) vars s • t* e • t i • Nat rules elem ( s, i ) I i < I v i ) length (s) elem ( e.s, 1 ) elem ( e.s, succ (succ (i))) endoo

out_of_range

==>

!

==>

e

==>

elem ( s, succ (i))

Given the definition of an operation f possibly containing precondition rules, an application f (arg) is interpreted as follows • I. If f(arg) matches the left term of a precondition rule, the corresponding condition is evaluated. If the result is true, the named exception is raised, w h e r e "raising an exception" means returning that exception as value. If the result is false, the precondition rule is ignored. 2. If f(arg) does not match the left term of a precondition rule, or if f(arg) matches the left term of a precondition rule but the condition was false, a normal rule is looked for with f(arg) matching its left term. 3. If f(arg) matches the left term of a normal rule, that rule is applied. . If f(arg) does not match the left term of a normal rule, the predefined exception '! axiomatization" is raised. This means that the definition of f is not complete. As a consequence of this general mechanism for interpreting function applications, e v e r y FP2 function f : Targ -) Tres may be viewed as a function f : Targ -) Tres | Exception w h e r e "Exception" is a predefined type : all Exception "values" are built by the constructor ! which take an identifier as its parameter.

267

Precondition rules can also be used to restrict the domain of constructors. For example, the type of rational n u m b e r s could be defined as follows • Rat cons vars

rules endtype

// m, n

• Nat × Nat -) Ra~ • Nat, m I/n ~ n ,~ 0 ==> ! zero_divide

In that case, since there are no "normal rules" for rewriting constructor applications, an application p / / q w h e r e p and q are in n o r m a l f o r m either raises ! zero_divide or stays as it is.

6.2 _ Exception handlers. Exceptions can only be raised by the reduction of terms. This occurs w h e n a precondition rule is applicable and its condition is true. It is also possible to explicitly raise an exception in the right t e r m of a n o r m a l rule, like in • f (arg)

==> if p (arg) t h e n g (arg) e!s¢ ! e .endif

Raising an exception means returning it as value. As a consequence, a s u b t e r m x of a t e r m f ( ... x ... ) m a y t u r n out to produce an exception value ! e : all functions in FP2 are strict w i t h respect to exceptions, which means that f(... !e _.) also has the value ! e. However, it is possible to catch an exception on its w a y out of a term, by means of an exception handler. There are two situations w h e r e exception handlers, may catch exceptions • W h e n the evaluation of the right t e r m of a n o r m a l operation rule produces an exception value ; in that case, an exception handler can be attached to the corresponding rule in the definition of the operation. W h e n a functional t e r m inside the post-condition of a process rule produces an exception : in that case, an exception handler can be attached to the state constructor of the c u r r e n t state in the definition of the process.

268

6.2.1 _ Exception handling in The general format of a normal operation rule with exception handlers attached isleft term

==>

right term

when

! e I then fl

when

l e z the____~n f2

when

! e n the____n_nfn

endwhen w h e r e ! e i is an exception name and fi is a term written with the same conventions as for a right term. When ! e i is obtained as the value of the right term, then fi is taken as a "replacement" right term and evaluated. Of course, the evaluation of fi may in turn raise an exception e' i which may be handled in its due place by the rule (in general another rule) getting it as its right term value, etc.

For example, let Seqnat be the type of infinite sequences of natural numbers w h e r e only a finite slice of elements indexed from I may have a non zero value • type

Seqnat cons infseq o pns access yars s i rules access

endtvDe

: Nat* "~ Seqnat : Seqnat x Nat =) Nat : Nat* : Nat ( infseq (s), i ) ==> elem (s, i) w h e n ! out_of_range then 0 endwhen

269

~ p t i o n

handling in processes.

In addition to the "normal" rules which define state transitions, a process definition m a y also contain e x c e ~ r e c o v e r y rules The general f o r m a t of an exception r e c o v e r y rule is whe____nn l e

Ln Q ==> s

w h e r e l e is an exception name, Q is a state constructor n a m e and s is a state. When a normal rule of the form • P(f)

• event

--=> Q ( g )

is being applied, the evaluation of g may raise exception ! e. If ! e is not caught by an exception handler of an operation rule before reaching the outer layer of g, t h e n the process is said to be in the exceptional state "! e in Q". If there is no exception r e c o v e r y rule corresponding to that exceptional state, the process is terminated. If there is one, the process "recovers" by going into state s. In that case, the application of the normal rule "recovered" by the r e c o v e r y rule is considered as one transition. For example, a process receiving two natural integers p and sending out p / / q m a y have to deal with the exception ! zero_divide • NATRAT in M, N out R states E F vats p,q r rules E F(r) when endproc

q

and

proc

' • • • " •

Nat Rat Rat Nat Rat

• M (p) N (q) • R (r) !zero_divide i_n_nF

==> ==>

E F(pHq)

==)

E

==)

E

It must also be noted that exceptions can be produced by the evaluation of sent m e s s a g e s " since no t e r m may be unified with an exception value, the consequence of that situation is that the corresponding rule is not applicable.

270

7 _ MODULES. FP2 allows the definition of a v a r i e t y of entities - Types - Operations Processes - Properties - Satisfactions -

The purpose of a definition is to associate a n a m e with an entity. The n a m e - e n t i t y associations established by a set of definitions are in effect within a region of FP2 text called a module. In addition to the above entities, it is also possible to define modules within modules : this is a means of structuring FP2 programs into a hierarchy of modules. This h i e r a r c h y of modules is used as a basis for controling the extent of the region of FP2 text accross which e v e r y definition is in effect. The basic f o r m a t of a module definition is " module

M is <module body> endmodule w h e r e M is the n a m e of the module and the module body is a set of definitions. With modules defined within modules, the basic visibility rides are the same as for classical block structure - all definitions of a module are visible from inner modules, except for redefined names. In addition to that "from inside-out" visibility, a module may exoort some of its definitions up to its directly enclosing module • such exported definitions are then considered as if t h e y w e r e made in the enclosing module. Thus, the exporting facility brings a controlled "from outside-in" visibility.

27~

For example, the definition of the operation r e p m a x uses an auxiliary function rep. For defining r e p m a x in a module M while keeping rep hidden, it is possible to write ' module

M is type Btree cons yafs

rules endtype module

B e_xx oj~

repmax

is

o_p= r e p m a x : Btree -) Btree vars t : Btree rules repmax(t) ==> rep(t, max(t)) endop_ o_p_ rep

endmodule

endmodule

: Btree × Nat -) Btree m, n : Nat u, v : Btree rep(tip(m),n) ==> tip(n) rep(fork(u,v),n) ==> fork(rep(u,n),rep(v,n))

272

But, it is also possible to exercise a control over the basic from "inside-out" visibility by explicitly stating w h a t names, which are visible in a module M, become hidden from some of its inner modules • module M ~

E x is

without D I, ..., Dn module N export E I, ..., Ep is

endmodule module P is

endmodule without E l module Q is

endmodule endmodule Here, the definition of module N is made at a point w h e r e names D i ..... D, which are known in M, become invisible from within N. By combining export and without facilities, FP2 allows a v e r y flexible control over the visibility of definitions. For example, the n a m e E l , which is exported by N, is visible in N, in M and also in the enclosing module of M. It is also visible in P, but it is not visible in Q.

273

AK_.__NOWLEDGEMENTS

The design, the formal definition and the implementation of FP2, both as a programming language and as a specification language, are carried out by a research group at LIFIA. The current (temporary ?) status of the language is the result of numerous discussions among the members of this group: Philippe Schnoebelen, Sylvie Roge, Juan-Manuel Pereira, Jean-Charles Marty, Anrd.ck Marty, Philippe Jorrand, Maria-DIanca Ibanez and Jean-Michel Hufflen. The principles for polymorphism in FP2 are drawn from the work accomplished kn another research group at LIFIA, led by Didier Bert, on the design and implementation of LPG "Langage de Programmation G~nerique". The work on FP2 has also benefited from the support of the French Project C3 ("Concurrence, Communication, Cooperation") of CNRS and from the support of Nixdorf Computer A.G. in Paderborn, FRG, within ESPRIT Project 415 ("Parallel Languages and Architectures for Advanced Information Processing. A VLSI Approach").

BIBLIOGRAPHY

The work on FP2 has heavily relied upon the current state of the art in language design. Much inspiration has come from recently proposed functional languages and from a variety of models for parallelism and communicating processes. A collection of such important sources is listed in the following pages. Past experience of LIFIA in language design has also been of some help and the corresponding reports are inserted in the list.

274

ARKAXHIU, E. "Un environnement et un langage graphique pour la specification de processus parall~les communicants." These, LIFIA, Grenoble, 1984. AUSTRY, D. "Aspects syntaxiques de MEIJE, un calcul pour le paralIelisme. Applications." These, LITP, Paris, 1984. AUSTRY, D. and BOUDOL, G. "Algebre de processus et synchronisation." Theoretical Computer Science, 1984. BACKUS, J. W. "can Programming Be Liberated From The Von Neumann Style ? A functional style and its algebra of programs." Communications of the ACM. Vol. 2 I, no. 8, ! 978. BACKUS, J. w. "The algebra of functional programs : function level reasoning, linear equations and extended definitions." Lecture Notes in Computer Science no. 107, 198 I. BACKUS, J. W. "Function Level Programs as Mathematical Objects." Conference on Functional Programming Languages & Computer Architecture, ACM, 198 I. BERT, D. "Specification algebrique et axiomatique des exceptions." RR IMAG 183, LIFIA, Grenoble, 1980. BERT, D. "Refinements of Generic Specifications with Algebraic Tools." IFIP Congress, North Holland, 1983. BERT, D. "Generic Programming : a tool for designing universal operators." RR IMAG 336, LIFIA, Grenoble, 1982. BERT, D. "Manuel de r~ference de LPG, Version 1.2." RR IMAG 408, LIFIA, Grenoble, 1983. BERT, D. and BENSALEM, S. "Algebre des operateurs generiques et transformation de programmes en LPG." RR IMAG 488 (LIFIA 14), Grenoble, 1984.

275

BERT, D. and JACQUET, P. "Some validation problems with parameterized types and generic functions." 3r~ International Symposium on Programming, Dunod, Paris, 1978. BIDOIT, M. "Une methode de presentation des types abstraits : applications." These, LRI, Orsay, 198 I. BJORNER, D. and JONES, C. B. "The Vienna Development Method : The Meta-Language." Lecture Notes in Computer Science no. 6 I, 1978. BJORNER, D. and JONES, C. B. "Formal specification & software development." Prentice Hall International, Englewood Cliffs, New Jersey 1982. BOUDOL, G. "Computational semantics of terms rewriting systems." RR 192, INRIA, 1983. BROOKES, S. D. "A model for communicating sequential processes." Thesis, Carnegie-Mellon University, ! 983. BURSTALL, R. M., MACQUEEN,D.B. and SANNELLA, D.T. "HOPE: an experimental applicative language." CSR-62-80, University of Edinburgh, 198 I. CISNEROS, M. "Programmation parallele et programmation fonctionnelle : propositions pour un langage." These, LIFIA, Grenoble, 1984. DERSHOWITZ,N. "Computing with rewrite systems." ATR-83 (8478)-I, Aerospace Corporation, 1983. GOGUEN,J. A., THATCHER,J. W. and WAGNER,E. G. "An initial algebra approach to the specification, correctness, and implementation of abstract data types." Current Trends in Programming Methodology, Vol. 4, Prentice Hall, Englewood Cliffs, New Jersey, 1978. GUERREIRO, P. J. V. D. "Semantique relationnelle des programmes non-deterministes et des processus communicants." Th~se, IMAG, Grenoble, juillet 198 I. GUTTAG, J. V. and HORNING, J.J. "The algebraic specification of abstract data types." Acta Informatica, 1978.

276

HOARE, C. A. R. "Communicating sequential processes." Communications of the ACM, Vol. 2 I, no. 8, 1978. HOARE, C. A.R. "Notes on communicating processes." PRG-33, Oxford University, 1983. HUFFLEN, J. M. "Notes sur FP et son implantation en LPG." RR IMAG 518 (LIFIA 20), Grenoble, 1985. JORRAND, Ph. "Specification of communicating processes and process implementation correctness." Lecture Notes in Computer Science no. 137, 1982. JORRAND, Ph. "FP2 :"Functional Parallel Programming based on term substitution." RR IMAG 482 (LIFIA 15), Grenoble, 1984. MAY, D. "OCCAM."SIGPLAN Notices, Vol. 13, no. 4, 1983. MILNER, R. "A calculus of communicating systems." Lecture Notes in Computer Science, no. 92, 1980. PEREIRA, J. M. "Processus communicants : un langage formel et ses mod(~les. Probl~mes d'analyse." Th~se, LIFIA, Grenoble, 1984. SOLER, R. "Une approche de la th(~orie de D. Scott et application ~ la semantique des types abstraits alg~briques." Th(~se, LIFIA, Grenoble, septembre 1982. TURNER, D. A. "The semantic elegance of applicative languages." Conference on Functional Programming Languages & Computer Architecture, ACM, 198 I. WILLIAMS J. H. "On the development of the algebra of functional programs." ACM Transactions on Programming Languages and Systems, Vol. 4, no. 4, 1982.

Concurrent Pro]og: A Progress Report

Ehud Shapiro Department. of Computer Science The Weizmann Institute of Science Rehovot 76100, Israel April 1986

Abstract Concurrent Prolog is a logic programming language designed for concurrent programming and parallel execution. It is a process oriented language, which embodies dataflow synchronization arm guarded-command indeterminacy as its basic control mechanisms. The paper outlines the basic concepts and definition of the language, and surveys the major programming techniques that emerged out of three years of its use. The history of the language development, implementation, and applications to date is reviewed. Details of the performance of its compiler and the functionality of Logix, its programming environment and operating system, are provided.

1.

Orientation

Logic programming is based on an abstract computation model, derived by Kowalski [28] from Robinson's resolution principle [40]. A logic program is a set of axioms defining relationships between objects. A computation of a logic program is a proof of a goal statement from the axioms. As the proof is constructive, it provides values for goal variables, which constitute the output of the computation. Figure 1.1 shows the relationships between the abstract computation model of logic programming, and two concrete programming languages based on it: Prolog, designed by A. Colmerauer [41] and Concurrent Prolog. It shows that Prolog programs are logic programs augmented with a control mechanism based on sequential search with backtracking; Concurrent Prolog's control is based on guardedcommand indeterminacy and dataflow synchronization. The execution model of Prolog is implemented using a stack of goals, which behave like procedure calls. Concurrent Prolog's computation model is implemented using a queue of goals,

278

Logic Programs Nondeterministic goal reduction

Abstract model:

Unification Language:

Control:

Implementation:

Prolog

Concurrent Prolog

Goal and clause order define sequential search and backtracking

Commit and read-only operators define guarded-command indeterminacy and dataflow synchronization

stack of goals + trail for backtracking

queue of goals + suspension mechanism

F i g u r e 1.1: Logic programs, Prolog, and Concurrent Prolog

which behave like processes. Figure 1.2 argues that there is a homomorphism between von Neumann and logic, sequential and concurrent languages. That is, it claims that the relationship between Occam and Concurrent Prolog is similar to the relationship between Pascal and Prolog, and that the relationship between Pascal and Occam is similar to the relationship between Prolog and Concurrent Prolog 1 .

2.

Logic Programs

A logic program is a set of axioms, or rules, defining relationships between objects. A computation of a logic program is a deduction of consequences of the axioms. The concepts of logic programming and the definition and implementation Some of the attributes in the figure are rather schematic, and shouldn't be taken literally, e.g. Pascal has recursion, but its basic repetitive construct, as in Occam, is iteration, whereas in Frolog and Concurrent Prolog it is r~ursion. Similarly Occam has if-then-else, but its basic conditional statement, as in Concurrent Prolog, is the guarded-command.

279

Pascal

Prolog

Occam

Concurrent Prolog

sequential sfiack-based procedure call parameter passing if-then-else/cut concurrent

queue-based process activation message passing guarded-command/commit yon Neumann model storage variables (mutable) parameter-passing, assignment, selectors, constructors explicit/static allocation of data/processes iteration

logic programs model logical variables (single assignment) unification

implicit/dynamic allocation of data/processes with garbage collection recursion

F i g u r e 1.2: A homomorphism between von Neumann and logic, sequential and concurrent languages

of the programming language Prolog date back to the early seventies. Earlier attempts were made to use Robinson's resolution principle and unification algorithm [40] as the engine of a logic based computation model [16]. These attempts were frustrated by the inherent inefficiency of general resolution and by the lack of a natural control mechanism which could be applied to it. Kowalski [28] has found that such a control mechanism can be applied to a restricted class of logical theories, namely Horn clause theories. His major insight was that universally quantified axioms of the form A +-- B 1 , B 2 , . . . , B n

n >_ 0

can be read both declaratively, saying that A is true if B1 and B2 and ... and Bn are

280

intersect(X,L1,L2) ~-- member(X,L1), member(X,L2). member(X,list (X,Xs)). member(X,list(Y,Ys)) ~-- member(X,Ys). Program

2.1: A logic program for List intersection

true, and procedurally, saying that to prove the goal A (execute procedure A, solve problem A), one can prove subgoals (execute subprocedures, solve subproblems) B1 and B2 and ... and Bn. Such axioms are called definite-clauses. A logic program is a finite set of definite clauses. Program 2.1 is an example of a logic program for defining list intersection. It assumes that lists such as [1,2,3] are represented by recursive terms such as

list(1,1ist( e, list( S, nil) ) ). Declaratively, its first axiom reads: X is in the intersection of lists L1 and L2 if X is a member of L1 mad X is a member of/;2. Procedurally, it reads: to find an X in the intersection of L1 and/;2 find an X which is a member of L1 and is also a member of L2. The axioms defining member read declaratively: X is a member of the list whose first element is X. X is a member of the list list( Y, Ys) if X is a member of Ys. (Here and in the following we use the convention that names of logical variable begin with an upper-case letter.) The difference between the various logic programming languages, such as sequential Prolog [41], PARLOG [7], Guarded Horn Clauses [65], and Concurrent Prolog [49], lie in the way they deduce consequences from such axioms. However, the deduction mechanism used by all these languages is based on the abstract interpreter for logic programs, shown in Figure 2.1. The notions it uses are explained below. On the face of it, the abstract interpreter seems nothing b u t a simple nondeterministic reduction engine: it has a resolvent, which is a set of goals to reduce; it selects a goal from the resolvent, a unifiable clause from the program, and reduces the goal using t h e clause. What distinguishes this computation model from others is the logical variable, and the unification procedure associated with it. The basic computation step of the interpreter, as well as that of Prolog and Concurrent Prolog, is the unification of a goal with the head of a clause [40]. The unification of two terms involves finding a substitution of values for variables in the terms that make the two terms identical. Thus unification is a simple and powerful form of pattern matching.

281

Input:

A logic program P and a goal G

Output:

GO, which is an instance of G proved from P, or failure.

Algorithm: Initialize the resolvent to be G, the input goal. While the resolvent is not empty do choose a goat A in the resolvent and a fresh copy of a clause A ~ *- B1,B2,...,B~, k > O, in P, such that A and A I are unifiable with a substitution 0 (exit if such a goal and clause do not exist). Remove A frora, and add B1,B2,...,B,~ to, the resolvent Apply 0 to the resolvent and to G. If the resolvent is empty then output G, else output failure. F i g u r e 2.1: An abstract interpreter for logic programs

Unification is the basic, and only, data manipulation primitive in logic programming. Understanding logic programming is understanding the power of unification. As the example programs below show, unification subsumes the following data-manipulation primitives, used in conventional programming languages: •

Single-assignment (assigning a value to a single-assignment variable).

® Parameter passing (binding actual parameters to formal parameters in a procedure or function call). •

Simple testing (testing whether a variable equals some value, or if the values of two variables are the same).

.

Data access (field selectors in Pascal, ear and edr in Lisp).

•

Data construction (new in Pascal, cons in Lisp).

•

Communication (as elaborated below).

The efficient implementation of a logic programming language involves the compilation of the known part of unification, as specified by the program's clause heads to the above mentioned set of more primitive operations [72]. A term is either a variable, e.g. X, a constant, e.g. a and 18, or a compound t e r m f(T1,T~,...,T,~), whose main functor has name f, arity n, and whose argu-

282 T1,T2,...,Tn, are terms. A substitution element is a pair of the form Variable=Term. An (idempotent) substitution is a finite set of substitution elements ( V I = T 1 , V2=T2,..., V,,=T,~) such that V i ¢ V1 if i ~ j, and Vi does not occur in Ti for any i and 3".

ments

The application of a substitution 0 to a term S, denoted $0, is the term obtained by replacing every occurrence of a variable V by the term T~ for every substitution element V----Tin 0. Such a term is called an instance of S. term

For example, applying the substitution (X---3, Xs--list(1,1ist(3,nil))} to the mcmber( X, list( X, Xs) ) is the term member( 3,1ist(3,1ist(1,1ist(3,nil) ) ) ).

A substitution 0 unifies terms T1 and T2 if T10=T20. Two terms are unifiable if they have a unifying substitution. If two terms T1 and T9 are unifiable then there exists a unique substitution/9 (up to renaming of variables), called the most general unifier of T1 and T2, with the following property: for any other unifying substitution a of T1 and T~, Tla is an instance of T10. In the following we use 'unifier' as a shorthand for 'most general unifier'. For example, the unifier of X and a is (X=a). The unifier of X and Y is (or (Y=X)). The unifier off(X,X) and f(A,b) is (X=b, A--b), and the unifier of g(X,X) and g(a,b) does not exist. Considering the example logic program above, the unifier of member(A,list(l,list($,nil))) and member(X, list(X, Xs))

(X=Y)

is {X=I,A=I,Xs=tist(

3.

,nil) }.

Concurrent Prolog

We first survey some common concepts of concurrent programming, tie them to logic programming, and then introduce Concurrent Prolog.

3.1 Concurrent programming: processes, communication, and synchronization A concurrent programming language can express concurrent activities, or processes, and communication among them. Processes are abstract entities; they are the generalization of the execution thread of sequential programs. The actions a process can take include inter-process communication, change of state, creation of new processes, and termination. It might seem that a declarative language, based on the logic programming computation model, will be unsuitable for expressing the wide spectrum of actions of concurrent programs. This is not the case. Sequential Prolog shows that, in addition to its declarative reading, a logic program can be read procedurally.

283

al) Goal = Process a2) Conjunctive goal = Network of processes a3) Shared logical variable = Communication channel = Shared-memory single-assignment variable a4) Clauses of a logic program = Rules, or instructions, for process behavior F i g u r e 3.1: Concepts of logic programming and concurrency

Concurrent Prolog shows yet another possible reading of logic programs, namely t h e process behavior reading, or process reading for short. The insight we would like to convey is that the essential components of concurrent computations Q concurrent actions, indeterminate actions, communication, and process creation and termination - - are already embodied in the abstract computation model of logic programming, and that they can be uncovered using the process reading. Before introducing the computation model of Concurrent Prolog that embodies these notions, we would like to dwell on the intuitions and metaphors that link the formal, symbolic, computational model with the familiar concepts of concurrent programming, via a sequence of analogies, shown in Figure 3.1. We exemplify t h e m using the Concurrent Prolog program for quicksort, Program 3.1. In the meantime the read-only operator '?' can be ignored, and the commit operator '1' can be read as a conjunction ','. Following Edinburgh Prolog, the term [XIXs ] is a syntactic convention replacing list(X, Xs), and [] replaces nil. The list [1,ZlXs ] is a shorthand for [l[[21Xs]] , that is list(g,Iist(2,Xs)), and [1,2,31 for list(1,1ist(2,1ist(3,nil))). The clauses for quicksort read: Sorting the list [XIXs] gives Ys if partitioning Xs with respect to X gives Smaller and Larger, sorting Larger gives Ls, sorting Smaller gives Ss, and appending [X]Ss] to Ls gives Ys. Sorting the empty list gives the empty list. The first clause of partition reads: partitioning a list [XIIn ] with respect to X gives [ YISmatler] and Larger if X _> Y and partitioning In with respect to X gives Smaller and Larger. al) Goal = Process A goal p(T1,T2,...,Tn) can be viewed as a process. The arguments of the goal (TI,T2,..., Tn) constitute the data state of the process. The predicate, p/n (name p, arity n), is the program state, which determines the procedure (set of

284 quicksort ([XIXs],Ys) +partition(Xs?,X,Smaller,Larger), quicksor t (Smaller?,Ss), quicksort(Larger?,Ls), append(Ss?,[XlLs?],Ys). quicksort([ ],[ ]). partition([YlIn],X,[Y[Smaller ],Larger) +X >_ Y [ partition(In?,X,Smaller,Larger). partition([YlIn ],X,Smaller,[vlLarger]) +X < Y [ partition(In?,X,SmalIer,Larger). partition([ ],X,[ ],[ ]).

append([XlX],Ys,iXlZs]) append(Xs?,Ys,Zs). append([ ],Xs,Xs). Program

3.1: A Concurrent Prolog Quicksort program

clauses with same predicate name and arity) executed by the process. A typical state of a quicksort process might be qsort([5,SS,$,7,191Xs ], Ys). a2) Conjunctive goal = Network of processes A network of processes is defined by its constituent processes, and by the way they are interconnected. A conjunctive goal is a set of processes. For example, the body of the recursive clause of quieksort defines a network of four proceses, one partition process, two quicksort processes, and one append process. The variables shared between the goals in the conjunction determine an illterconnection scheme. This leads to a third analogy. a3) Shared logical variable -= Communication channel -= Shared single-assignment

variable A communication channel provides a means by which two or more processes may communicate information. A shared variable is another means for several processes to share or communicate information. A logical variable, shared between two or more goals (processes), can serve both these functions. For example, the variables Smaller and Larger serve as communication channels between partition and the two recursive quicksort processes. Logical variables are single-assignment, since a logical variable can be assigned only once during a computation. Hence, a logical variable is analogous to a communication channel capable of transmitting only one message, or to a shared-memory variable that can receive only one value.

285

Note that under this singie-assignment restriction the distinction between a communication channel and a shared-memory variable vanishes. It is convenient to view shared logical variables sometimes as analogous to communication channels and sometimes as analogous to shared-memory variables. The single-assignment restriction has been proposed as suitable for parallel programming languages independently of Ioglc-programming [i]. At first sight it would seem a hindrance to the expressiveness of Concurrent Prolog, but it is not. Multiple communications and cooperative construction of a complex data structure are possible by starting with a single shared logical variable, as explained below. a4) Clauses of a logic program -- Rules, or instructions, for process behavior The actions of a process can be separated into control actions and data actions. Control actions include termination, iteration, branching~ and creation of new processes. These are specified explicitly by logic program clauses. Data actions include communication and various operations on data structures, e.g. single-assignment, inspection, testing, and construction. As in sequential Prolog, data actions are specified implicitly by the arguments of the head and body goals of a clause, and are realized via unification.

3.2 The process reading of logic programs We show how termination, iteration, branching, state-change, and creation of new processes can be specified by clauses, using the process reading of logic programs. 1) Terminate A unit clause, i.e. a definite clause with an empty body:

p(T1, T2,..., Tn). specifies that a process in a state unifiable with p(T1,Ts,°..,T,~) can reduce itself to the empty set of processes~ and thus terminate. For example the clause quicksort([ ],[ ]) says that any process which unifies with it, e.g. quicksort([ ], Ys), may terminate. While doing soy this process unifies Ys with [ ], effectively closing its output stream. 2) Change of data and program state An iterative clause, i.e. a clause with one goal in the body:

p( T1, T2,..., Tn) +-- q(S1,S2,...,S,~). specifies that a process in a state unifiable with p( T1, T2,..., Try) can change its state to q(S1,S2,...,Sm). The program state is changed to q/m (i.e. branch),

286

and the data state to (St~S2,.o .,Sin). For example, the recursive clause of append specifies that the process append([1,3,~,7,1Z]il],[21,2P,25ILZ],L3 ) can change its state to append([3,4,7, 12]L1],[P1,PP, B5]LP],Zs). While doing so, it unifies L• with [1 IZs], effectively sending an element down its output stream. Since append branches back to itself, it is actually an iterative process. 3) Create new processes A general clause, of the form:

p(T1,T2,...,T,~) ~ Q1,Q2,...,Qm. specifies that a process in a state unifiable with p( TI, T2,..., T,t) can replace itself with m new processes as specified by Q1,Q2,...,Qm. For example, the recursive clause of quicksort says that a quicksor~ process whose first argument is a list can replace itself with a network of four processes: one partition process, two quieksort processes, and one append process. It further specifies their interconnection, and initializes the first element in the list forming the second argument of append to be X, the partitioning element. Note that under this reading an iterative clause can be viewed as specifying that a process can be replaced by another process, rather then change its state. These two views are equivalent. Recall the abstract interpreter in Figure 3.1. Under the process reading the resolvent, i.e. the current set of goals of the interpreter, is viewed as a network of concurrent processes, where each goal is a process. The basic action a process can take is process reduction: the unification of the process with the head of a clause, and its reduction to (or replacement by) the processes specified by the body of the clause. The actions a process can take depend on its state - - on whether its arguments unify with the arguments of the head of a given clause. Concurrency can be achieved by reducing several processes in parallel. This form of parallelism is called And-parallelism. Communication is achieved by the assignment of values to shared variables, caused by the unification that occurs during process reduction. Given a process to reduce, all clauses applicable for its reduction may be tried in parallel. This form of parallelism is called Or-parallelism, and is the source of a process's ability to take indeterminate actions.

3.3 Synchronization using the read-only and commit operators In contrast to sequential Prolog, in Concurrent ProIog art action taken by a process cannot be undone: once a process has reduced itself using some clause, it is

:287

committed to it. The resulting computational behavior is called committed-choice nondeterminism, don't-care nondeterminism, and sometimes also indeterminacy, to distinguish it from the "don't-know" nondeterminism of the abstract interpreter. This design decision is common to other concurrent logic programming languages, including the original Relational Language [6], PARLOG [7], and GHC [54]. It implies that a process faced with a choice should better make a correct one, lest it might doom the entire computation to failure. The basic strategy taken by Concurrent Prolog to ensure that processes make correct choices of actions is to provide the programmer with a mechanism to delay process reductions until enough information is available so that a correct choice can be made. The two synchronization and control constructs of Concurrent Prolog are the read-only and the commit operators. The read-only operator (indicated by a question-mark suffix '?'), can be applied to logical variables, e.g. X?, thus designating them as read-only. The read-only operator is ignored in the declarative reading of a clause, and can be understood only operationally. Intuitively, a read-only variable cannot be written upon, i.e. be instantiated. It can receive a value only through the instantiation of its corresponding writeenabled variable. A unification that attempts to instantiate a read-only variable suspends until that variable becomes instantiated. For example, the unification of X? with a suspends; of f(X, Y?) w i t h / ( a , g ) succeeds, with unifier {X=a, Z=Y?}. Considering Program 3.1, the unification of quieksort(In?,Out) with both quieksort([ ],[ ]) and quieksort([X[Xs], Ys)suspends, as does the unification of append(Li?,[3,4,hlLZ],L3 ) with the heads of its two clauses. However, as soon as In? gets instantiated to [81Ini], for example, by another partition process who has a write-enabled occurrence of In, the unification of the quieksort goal with the head of the first clause fails, and with the second clause succeeds. Definition: We assume two distinct sets of variables, write-enabled variables and read-only variables. The read-only operator, ?, is a one-to-one mapping from write-enabled to read-only variables. It is written in postfix notation. For every write-enabled variable X, the variable X? is the read-only variable corresponding

to X.

|

The extension of the read-only operator to terms which are not write-enabled variables is the identity function. Definition: A substitution 0 affects a variable X if it contains a substitution element X=T. A substitution 0 is admissible if it does not affect any read-only variable. |

288

D e f i n i t i o n : The read-only extension of a substitution 8, denoted 0?, is the result of adding to 0 the substitution elements X?=T? for every X = T in 0 such that T ~ X?. | D e f i n i t i o n : The read-only unification of two terms T1 and T2 succeeds, with read-only mgu 0?, if T1 and T2 have an admissible mgu 8. It suspends if every mgu of TI and T2 is not admissible. It fails if T1 and T2 do not unify. | Note that the definition of unifiability prevents the unification attempt to instantiate read-only variables. However, once the unification is successful, the read-only unifier instantiates read-only variables in accordance with their corresponding write-enabled variables. This definition of read-only unification resolves several ill-defined points in the original description of Concurrent Prolog [49], discussed by Saraswat [42] and Ueda [65], such as order-dependency. It implicitly embodies the suggestion of Ramakrishnan and Silberschatz [39], that a single unification should not be able to "feed itself', that is simultaneously write on a write-enabled variable and read from its corresponding read-only variable. In particular, it implies that the unification of f(X,X?) with f(a,a) suspends. The second synchronization and control construct of Concurrent Prolog is the commit operator. A guarded clause is a clause of the form:

A ,,- G1,G2,...,Gin t B1,B2,...,B,~

m,n >_ O.

The commit operator 'l' separates the right hand side of a rule into a guard and a body. Declaratively, the commit operator is read just like a conjunction: A is true if the G's and the B's are true. Procedurally, the reduction of a process A1 using such a clause suspends until A1 is unifiable with A, and the guard is determined to be true. Thus the guard is another mechanism for preventing or postponing erroneous process actions. As a syntactic convention, if the guard is empty, i.e. re=O, the commit operator is omitted. The read-only variables in the recursive invocations of quicksort, partition, and append cause them to suspend until it is known whether the input is a list or nil. The non-empty guard in the recursive clauses for partition allows the process to choose correctly on which output stream to place its next input element. It is placed on the first stream if it is smaller or equal to the partitioning element. It is placed on the second stream if it is larger then the partitioning element. Concurrent Prolog allows G's, the goals in the guard, to be calls to general Concurrent Prolog programs. Hence guards can be nested recursively, and testing the applicability of a clause for reduction can be arbitrarily complex. In the following discussion we will restrict our attention to a subset of Concurrent Prolog

289

,:ailed Flat Concurrent Prolog [33 I. In Flat Concurrent Prolog the goals in the guards can contain calls to a fixed set of simple test-predicates only. For example, P r o g r a m 3.1 is a Flat Concurrent Prolog program. In Flat Concurrent Prolog, the reduction of a goal using a guarded clause succeeds if the goal unifies with the clauses' head, and its guard test predicates succeed. Flat Concurrent Prolog is both the target language and the implementation language for the Logix system, to be discussed in Section 5. It is a rich enough subset of Concurrent Prolog to be sufficient for most practical purposes. It is simple enough to be amenable to an efficient implementation, resulting in a high-level concurrent programming language which is practical even on conventional uniprocessors.

3.4 An abstract interpreter for Flat Concurrent Prolog Flat Concurrent Prolog is provided with a fixed set T of test predicates. Typical test predicates include string(X) (which suspends until X is a non-variable, then succeeds if it is a string; fails otherwise), and X < Y (which suspends until X and Y are non-variables, then succeeds if they are integers such that X < Y, else fails). D e f i n i t i o n : A fiat guarded clause is a guarded clause of the form

A +-- G1,G2,...,Gin [ B1,B2,...,Bn

m,n >_ O.

such that the predicate of Gi is in T, for all i, 0 < i < m.

A Flat Concurrent Prolog program is a finite set of fiat guarded clauses.

|

An abstract interpreter of Flat Concurrent ProIog is defined in Figure 3.2. The interpreter again leaves the nondeterministic choices for a goal and a clause unspecified: the scheduling policy, by which goals are added to and removed from the resoIvent, and the clause selection policy, which indicates which clause to choose for reduction, when several clauses are applicable. Fairness in the scheduling and clause selection policies are further discussed in [44]. For concreteness, we will explain the choices made in Logix. Logix implements bounded depth-first scheduling. In bounded depth-first scheduling the resolvent is maintained as a queue~ and each dequeued goal is allocated a timeslice t. A dequeued goal can be reduced t times before it is returned back to the queue. If a goal is reduced using an iterative clause A ~-- B, then B inherits the remaining time-slice. If it is reduced using a general Clause A ~-- Bx,B2,...,Bn, then, by convention, B1 inherits the remaining time-slice, and B2 to Bn are enqueued to the back of the queue. Bounded depth-first scheduling reduces the overhead

290

Input:

A Flat Concurrent Prolog program P and a goal G

Output:

GO, if GO was an instance of G proved from P or deadlock otherwise.

Algorithm: Initialize the resolvent to be G, the input goal. While the resolvent is not empty do choose a goal A in the resolvent and a fresh copy of a clause

A' *-- Gx,G2,...,Gin ] B1,B2,...,B,~ in P such that A and A' have a read-only unifier 0 and the tests (G1,G2,...,Gm)O succeed (exit if such a goal and clause do not exist). Remove A from and add B1,B2,...,B,~ to the resolvent Apply 0 to the resolvent and to G. If the resolvent is empty then output G, else output deadlock. F i g u r e 3.~: An abstract interpreter for Flat Concurrent Prolog

of process switching, and allows more effective cashing of process arguments in registers. Logix also implements stable clause selection, which means that if a process has several applicable clauses for reduction, the first one (textually) will be chosen. Stability is a property that can be abused by programmers. It is hard to preserve in a distributed implementation [44], and makes the life of optimizing compilers harder. It is not part of the language definition. In addition Logix implements a non-busy waiting mechanism, in which a suspended process is associated with the set of read-only variables which caused the suspension of its clause reductions. If any of the variables in that suspension set gets instantiated, the process is activated, and enqueued to the back of the queue. The abstract interpreter models concurrency by interleaving. The truly parallel implementation of the language requires that each process reduction be viewed as an atomic transaction, which reads from and writes to logical variables. A parallel interpreter must ensure that its resulting behavior is serializable, i.e. can be ordered to correspond to some possible behavior of the sequential interpreter. Such an algorithm has been designed [ref distributed] and is currently being im-

291

plemented on Intel's iPSC at the Weizn~ann Institute.

4.

Concurrent Prolog Programming

Techniques

In the past three years of its use, Concurrent Prolog has developed a wide range of programming techniques. Some are simply known concurrent programming techniques restated in the formalism of logic programming, e.g. divide-andconquer, monitors, stream-processing, and bounded buffers. Others are novel techniques, which exploit the unique aspects of logic programs, notably the logical variable. Examples include difference-streams, incomplete-messages, and the short-circuit technique. Some techniques exploit properties of the read-only variable, e.g. blackboards, constraint-systems, and protected data-structures. Perhaps the most important in the long-run are the meta-programming techniques. Using enhanced meta-interpreters, one can implemented a wide spectrum of programming environment and operating system functions, such as inspecting and affecting the state of the computation, and detecting distributed termination and deadlock, in a simple and uniform way [45,20]. In the following account of these techniques breadth was preferred over depth. References to deeper treatment of various subjects are provided.

4.1 Divide-and-conquer: recursion and communication Divide and conquer is a method for solving a problem by dividing it into subproblems, solving them, possibly in parallel, and combining the results. If the subproblems are small enough they are solved directly, otherwise they are solved by applying the divide-and-conquer method recursively. Parallel divide-and-conquer algorithms can be specified easily in both functional and logic languages. Divideand-conquer becomes more interesting when it involves cooperation, and hence direct communication, among the processes solving the subproblems. Program 4.1 solves a problem due to Leslie Lamport [30]. The problem is to number the leaves of a tree in ascending order from left to right, by the following recursive algorithm: spawn leaf processes, one per leaf, in such a way that each process has an input channel from the leaf process to its left, and an output channel to the leaf process to its right. The leftmost leaf process is initialized with .a number. Each process receives a number from the left, numbers its leaf with it, increments it by one, and sends the result to the right. The problem is shown in order to explore the problematies of combining recursion with communication, and is not necessarily a useful parallel algorithm. The program assumes that binary trees are represented using the terms

292

number(leaf(N) ,N,N1) ~-plus(N?,l,N1)o number(tree(L,R),N,N 2) ~number(L?,N?,N1), number(R?,Nl?,N2). P r o g r a m 4.1: Numbering the leaves of a tree: recursion with general communication

leaf(X) and tree(L,R). For example with three leaves.

tree(leaf(Xl),tree(leaf(X2),leaf(X3))) is a tree

Program 4.1 works in parallel on the two subtrees of a tree, until it reaches a leaf, where it spawns a plus process. A plus process suspends until its first two arguments are integers, then unifies the third with their sum. The plus processes, however, cannot operate in parallel. Rather, they are synchronized in such a way t h a t they are activated one at a time, starting from the leftmost node. Program 4.1 passes the communication channels to the leaf processes in a simple and uniform way, via unification. It numbers a leaf by unifying its value with the left channel, even before that channel has transmitted a value.

4.2 Stream processing Concurrent Prolog is a single-assignment programming language, in that a logical variable can be assigned to a non-variable term only once during a computation. Hence it seems that, as a communication channel, a shared logical variable can transmit at most one message between two processes. This is not quite true. A variable can be assigned to a term that contains a message and another variable. This new variables is shared by the processes t h a t shared the original variable. Hence it can serve as a new communication channel, which can be assigned to a term t h a t contains an additional message and an additional variable, and so on ad

infinitum. This idea is the basis of stream communication in Concurrent Prolog. In stream communication, the communicating processes, typically one sender and one receiver (also called the stream's producer and consumer) share a variable, say Xs. The sender, who wants to send a sequence of messages ml,m2,m3,... assigns Xs to [ml]Xsl] in order to send ml, then instantiates Xsl to [m21Xs2] to send m2, then assigns Xs2 to [m3[Xs3], and so on. The receiver inspects the read-only variable

Xs? attempting to unify it with

293

merge([XIXsl,Ys,[X[Zs]) ~- merge(Xs?,Ys?,Zs). merge(Xs,[YiYs],[YlZs]) +--merge(Xs?,Ys?,Zs). merge([ 1,[ ],[ 1). Program

4.2: A binary stream merger

[M1IXsl ]. When successful, it can process the first message MI, and iterate with Xsl?, waiting for the next message. Exactly the same technique would work for one sender and multiple receivers, provided that all receivers have read-only access to the original shared variable. A receiver that spawns a new process can include it in the group of receivers by providing it with a read-only reference to the current stream variable. Program 3.1 for Quicksort demonstrates stream processing. Each partition process has one input stream and two o u t p u t streams. On each iteration it consumes one element from its input stream, and places it on one of its output streams. When it reaches the end of its input stream it closes its two output streams and terminates. The append process from the same program is a simpler example of a stream processor. It copies its first input stream into its output stream, and when it reaches the end of the first input stream it binds the second input stream to its o u t p u t stream, and terminates.

4.3 Stream merging Streams are the basic communication means between processes in Concurrent Prolog. It is sometimes necessary, or convenient, to allow several processes to communicate with one other process. This is achieved in Concurrent Prolog using a stream merger. A stream merger is not a function, since its output - - the merged stream can be any one of the possible interleavings of its input streams. Hence streambased functional programming languages incorporate stream mergers as a language primitive. In logic programming, however, a stream merger can be defined directly, as was shown by Clark and Gregory [6]; their definition, adapted to Concurrent Prolog, is shown in Program 4.2. As a logic program, Program 4.2 defines the relation containing all facts merge(Xs, Ys, Zs), in which the list Zs is an order preserving interleaving of the elements of the lists Xs and ]Is. As a process, merge(Xs~, Ys?,Zs) behaves as follows: If neither Xs nor Ys are instantiated, it suspends, since unification with all

294

three clauses suspends. If Xs is a list then it can reduce using the first clause, which copies the list element to Zs, its output stream, and iterates with the updated streams. Similarly with Ys and the second clause. If it has reached the end of its input streams it closes its output stream and terminates, as specified by the third clause.

In case both Xs and Ys have elements ready, either the first or the second clause can be used for reduction. The abstract interpreter of Flat Concurrent Prolog, defined in Figure 2.1, does not dictate which one to use. This may lead to an unfortunate situation, in which one clause (say the first) is always chosen, and elements from the second stream never appear in the output stream. A stream merger that allows this is called unfair. There are several techniques to implement fair mergers in Concurrent Prolog. They are discussed in [51,52,67].

4.4 Recursive process networks The recursive structure of Concurrent Prolog, together with the logical variable, makes it a convenient language for specifying recursive process networks. An example is the Quicksort program above. Although hard to visualize, the program forms two tree-like networks: a tree of partition processes, which partitions the input list into smaller lists, and a tree of append processes, which concatenates these lists together. Process trees are useful for divide-and-conquer algorithms, and for searching, among other things. Here we show an application to stream merging. An n-ary stream merger can be obtained by composing n-1 binary stream mergers in a process tree. A program for creating a balanced tree of binary merge operators is shown as Program 4.3. Program 4.3 creates a merge tree layer by layer, using an auxiliary procedure

merge_layer. The merge trees defined are static, i.e. the number of streams to be merged should be defined in advance, and cannot be changed easily. In [44] it is shown how to implement multiway dynamic merge trees in Concurrent Prolog, using the concept of 2-3-trees. Ueda and Chikayama [67] and Shapiro and Safra [52] improve this scheme further. More complex process structures, including rectangular and hexagonal process arrays [50], quad-trees [11], and pyramids, can easily be constructed in Concurrent Prolog. These process structures are found useful in programming systolic algorithms, and spawning virtual parallel machines [64].

295

merge_tree(Bottom,Top) ¢-.... Bottom¢[-] l merge_layer (Bott om,Bottoml), merge_tree (Bott oml?,Top). merge_tree([Xs],Xs). merge-layer ([Xs,YslBottomt,[ZslBottoml ?]) merge(Xs?,Ys?,Zs), merge_layer(Bottom?,Bottoml). merge _layer ([Xs],[Xs D. merge_layer([ ],[ ]). merge(Xs,Ys,Zs) ~ See Program 5.10. P r o g r a m 4.3: A balanced binary merge tree

4.5 Systolic programming: parallelism with locality and pipelining Systolic algorithms were designed originally by Kung and his colleagues [29] for implementation via special purpose hardware. However, they are based on two rather general principles: 1.

Localize communication

2.

Overlap and balance computation with communication.

The advantages of implementing systolic algorithms on general purpose parallel computers using a high-level language, compared to implementation in special purpose hardware, are obvious. The systolic programming approach [50] was conceived in an attempt to apply the systolic approach to general purpose parallel computers. The specification of systolic algorithms in Concurrent Prolog is rather straightforward. However, to ensure that performance is preserved in the implementation, two aspects of the execution of the program need explicit attention. One is the mapping of processes to processors, which should preserve the locality of the algorithm, using the locality of the architecture. Another is the communication pattern employed by the processes. In the systolic programming approach [50], the mapping is done using a special notation, Logo-like Turtle programs [36]. Each process, like a turtle in Logo, is associated a position and a heading. A goal in the body of a clause may have a Turtle program associated with it. When activated, this Turtle program, applied to the position and heading of the parent process, determines the position and

296

mm([

],_,[ 1). mm([XIXsl,Ys,[ZIZs]) +vm(X,Ys?,Z) @right, mm(Xs?,Ys,Zs)@forward.

vm(-,[ ],[ ]). vm(Xs,[VtYs],[ZIZs]) ~-ip(Xs?,Y?,Z), vm(Xs,Ys?,Zs) @forward. ip([XIXs ],[Y]Ys],Z) *-Z:=(X*Y)+Z1, ip(Xs?,Ys?,Z1). ip([ ],[ 1,0). Program

4.4: Matrix multiplication

heading of the new process. Using this notation, complex process structures can be m a p p e d in the desired way. Programming in Concurrent Prolog augmented with Turtle programs as a mapping notation is as easy as mastering a herd of turtles. Pipelining is the other aspect that requires explicit attention. The performance of many systolic algorithms depends on routing communication in specific patterns. The abstract specification of a systolic algorithm in Concurrent Prolog often does not enforce a communication pattern. However, the tools to do that are in the language. By appropriate transformations, broadcasting can be replaced by pipelining, and specific communication patterns can be enforced [63]. For example, Program 4.4 is a Turtle-annotated Concurrent Prolog program for multiplying two matrices, based on the classic systolic algorithm which pipelines two matrices orthogonally on the rows and columns of a processor array [ref Kung]. It assumes that the two input matrices are represented by a stream of streams of their columns and rows respectively. It produces a stream of streams of the rows of the o u t p u t matrix. The program operates by spawning a rectangular grid of ip processes for computing the inner-products of each row and column. Unlike the original systolic algorithm, this program does not pipeline the streams between ip processes but rather broadcasts them. However, pipelining can be easily achieved by adding two additional streams to each process [50].

4.6 The logical variable All the programming techniques shown before can be realized in other com-

297 putation models, with various degrees of success. For example, stream processing can be specified with functional notaLion [27]. By adding to functional languages a non-deterministic constructor they can even specify stream mergers [12]. Using simultaneous recursion equations one can specify recursive process networks. In this section we show Concurrent Prolog programming techniques which are unique to logic programming, as they rely on properties of the logical variable. Of course, one can take a functional programming language, extend it with stream constructors, non-deterministic constructors, simultaneous recursion equations, and logical variables, and perhaps achieve these techniques as well. But why approximate logic programming from below, instead of just using it?

4.6.1. Incomplete messages An incomplete message is a message that contains one or more uninstantiated variables. An incomplete message can be viewed in various ways, including: ® A message that is being sent incrementally. •

A message containing a communication channel as an argument.

•

A message containing implicitly the identity of the sender.

•

A data structure that is being constructed cooperatively.

The first and second views are taken by stream processing programs. A stream is just a message being sent incrementally, and each list-cell in the stream is a message containing the stream variable to be used in the subsequent communication. Similarly, the processes for constructing the merge trees communicated via incomplete messages, each containing a stream of streams. However, it is not necessary that the sender of an incomplete message would be the one to complete it. It could also be the receiver. Two Concurrent Prolog programming techniques - - monitors and bounded-buffers [59] M operate this way. Monitors also take the third view, that an incomplete message holds implicitly the identity of its sender. This view enables rich communication patterns to be specified without the need for an extra layer of naming conventions and communication protocols, by providing a simple mechanism for replying to a message.

4.6.2. Monitors Monitors were introduced into conventional concurrent programming languages by Hoare [21], as a technique for structuring the management of shared data. A monitor has some local data, which it maintains, and some procedures, or entries, defined for manipulating and examining the data. A user process that wants to u p d a t e or inspect the data performs the relevant monitor call.

298

stack([push(X) ISl)

stack(In?,[XIS]).

stack([pop(X) llnl,[XtS]) *--

stack(In?,S).

stack([ ],[ ]). P r o g r a m 4.5: A stack monitor

The monitor has built-in synchronization mechanisms, which prevent different callers from updating the data simultaneously and allow the inspection of data only when it is in an integral state. One of the convenient aspects of monitors is that the process performing a monitor-call does not need to identify itself explicitly. Rather, some of the arguments of the monitor call (which syntactically looks similar to a procedure call) serve as the return address for the information provided by the monitor. When the monitor call completes the caller can inspect these arguments and find there the answer to its query. Stream-based languages can mimic the concept of a monitor as follows [2]. A designated process, the "monitor" process, maintains the data to be shared. Users of the data have streams connected to the monitor via a merger. "Monitor calls" are simply messages to the monitor, which update the data and respond to queries according to the message received. The elegance in this scheme is that no special language constructs need be added in order achieve this behavior: the concepts already available, of processes, streams, and mergers, are sufficient. The awkward aspect of this scheme is routing the response back to the sender. Fortunately, in Concurrent Prolog incomplete messages allow responses to queries to be routed back to the sender directly, without the need for an explicit naming and routing mechanism. Both the underlying mechanism required to implement incomplete messages and the resulting effect from the user's point of view are similar to conventional monitors, where a process that performs a monitor call finds the answer by inspecting the appropriate argument of the call, after the call is "served". Hence Concurrent Prolog provides the convenience of monitors, while maintaining the elegance of stream-based communication. In contrast to conventional monitors, Concurrent Prolog monitors are not a special l~nguage construct, but simply a programming technique for organizing processes and data. Program 4.5 implements a simple stack monitor. It understands two messages:

push(X), on which it changes the stack contents S to [X]S], and pop(X), to which it responds by unifying the top element of the stack with X, and changing the stack contents to contain the remaining stack, pop(X) is an example of an incomplete message.

299

Monitors in Concurrent Prolog are discussed further in [48,49}.

4.6.3. Detecting distributed termination: the short-circuit technique Concurrent Prolog does not contain a sequential-AND construct. Suggestions to include one were resisted for two reasons. First, a desire to keep the number of language constructs down to a minimum. Second, the belief that even if eventually such a construct would be needed, introducing it at an early stage would encourage awkward and lazy thinking. Instead of using Concurrent Prolog's datafiow synchronization mechanism, programmers would resort to the familiar sequential construct 2 . In retrospect, this decision proved to be very important, both from an educational and an implementation point of view. Concurrent Prolog still does not have sequential-AND and Logix does not have the necessary underlying machinery to implement it, even if it was desired. The reason is that implementing sequentialAND in Concurrent Prolog on a parallel machine requires solving the problem of distributed termination detection. To run P& Q (assuming that & is the sequentialAND construct) one has to detect that P has terminated in order to proceed to Q. If P spawned many parallel processes that run on different processors, it requires detecting when all of t h e m have terminated, which is a rather difficult problem for an implementation to solve. O n the other hand, there is sometimes a need to detect when a computation terminates. First of all, as a service to the programmer or user who wishes to know whether his program worked properly and terminated, or if it has some useful or useless processes still running there in the background. Second, when interfacing with the external environment there is a need to know whether a certain set of operations, e.g. a transaction, has completed in order to proceed. This problem can be solved using a very elegant Concurrent Prolog programming technique, called the short-circuit technique, which is due to Takeuchi [58]. The idea is simple: chain the processes in a certain computation using a circuit, where each active process is an open switch on the circuit. When a process terminates, it closes the switch and shortens the circuit. When the entire circuit is shortened, global termination is detected. The technique is implemented using logical variables, as follows: each process is invoked with two variables, Left and Right, where the Left of one process is unified with the Right of another. The leftmost and rightmost processes each have 2 Early Prolog-in-Lisp implementations~ which provided an easy cop-out to Lisp, had a similar fate. Users of these systems -- typically experienced Lisp hackers -- would resort to Lisp whenever they were confronted with a difficult programming problem, instead of thinking it through in Prolog. This led some to conclude that Prolog ~wasn't for real".

300

one end of the chain connected to the manager. The manager instantiates one end of the chain to some constant and waits till the variable at the other end is instantiated to that constant as well. Each process that terminates unifies its Left and Right variables. When all terminate the entire chain becomes one variable and the manager sees the constant it sent on one end appearing on the other. An example of using the short-circuit technique is shown below, in Program 4.7.

4.7 Meta-programming and partial evaluation Meta-programs are programs that treat other programs as data. Examples of meta-programs include compilers, assemblers, and debuggers. One of the most important and useful type of meta-programs is the meta-interpreter, sometimes called a meta-circular interpreter, which is an interpreter for a language written in that language. A meta-interpreter is important from a theoretical point of view, as a measure for the quality of the language design. Designing a language with a simple metainterpreter is like solving a fixpoint equation: if the language is too complex, its meta-interpreter would be large. If it is too weak, it won't have the necessary data-structures to represent its programs and the control structures to simulate them. A language may have several meta-interpreters of different granularities. In logic programs, the most useful meta-interpreter is the one that simulates goal reduction, but relies on the underlying implementation to perform unification. An example of a Flat Concurrent Prolog meta-interpreter at this granularity is shown as Program 4.6. The meta-interpreter assumes that a guardless clause A *- B in the interpreted program is represented using the unit clause elause(A,B). If the body of the clause is empty, then B=true. A guarded clause A *-- G1B is represented by clause(A,B) *-- Gltrue. A similar interpreter for full Concurrent Prolog is shown in [48]. The plain recta-interpreter is interesting mostly for a theoretical reason, as it does nothing except simulate the program being executed. However, slight variations on it result in recta-interpreters with very useful functionalities. For example, by extending it with a short circuit, as in Program 4.7, a termination-detecting meta=interpreter is obtained. Many other important functions can be implemented via enhanced metainterpreters [45]. In Prolog, they have been used to implement explanation facilities for expert systems [56]. In compiler=based Prolog systems, as well as in Logix, the debugger is based on an enhanced meta-interpreter, and layers of protection

301

reduce(true).

reauce((A,B)) reduce(A?), reduce(B?). reduce(A) +A#true, a#(_,_) ] clause(A?,B), reduce(B ?).

% halt % fork % reduce

P r o g r a m 4.6: A plain meta-interpreter for Flat Concurrent Prolog

reduce(A,Done) +reducel (A,done-Done). reducel (true,Done-Done). reducel ((A,B),Left-Right) ~-reducel (A?,Left-Middle), reducel (B ?,Middle-Right). reducel (A,Left-Right) ~A#true, A#(_,_) I clause( A ?,B), reduce I (B ?,Left-Right ).

% halt % fork % reduce

P r o g r a m 4.7: A termination detecting meta-interpreter

and control are defined via meta-interpreters [20]. Such meta-interpreters, including abortable, interruptible, failsafe, and deadlock detecting meta-interpreters, are shown and explained in [ref Hirsch]. One problem with using such meta-interpreters directly is the execution overhead of the added layer of interpretation, which is unacceptable in many applications. In [45,60] it is shown how partial evaluation, a program-transformation technique, can eliminate the overhead of meta-interpreters. In effect, partial evaluation can turn enhanced meta-interpreters into compilers, which produce as output the input program enhanced with the functionality of the meta-interpreter.

4.8 Modular programming and programming-in-the-large The techniques shown above refer mostly to programming in the small. This does not mean that Concurrent Prolog is not suitable for programming in the large. To the contrary, we found that even using the simple module system developed for bootstrapping Logix many people could cooperate in its development. We expect

302

the situation to improve further using the hierarchical module system, currently under development. The key idea in these module systems, which are implemented entirely in Concurrent Prolog, is to use Concurrent Prolog message-passing to implement inter-module calls. This means that no additional communication mechanism is needed to support remote procedure calls between modules which reside on different processors.

5.

The Development of Concurrent Prolog

Concurrent Prolog was conceived and first implemented in November 1982, in an attempt to extend Prolog to a concurrent programming language, and to cleanup and generalize the Relational Language of Clark and Gregory [6]. Although one of the goals of the language was to be a superset of sequential Prolog, the proposed design did not seem, on the face of it, to achieve this goal, and hence was termed UA Subset of Concurrent Prolog" [49]. A major strength of that language, which later became known simply as Concurrent Prolog, was that it had a working, usable, implementation: an interpreter written in Prolog [49]. Since the concepts of the language were quite radical at the time, it seemed fruitful to try and explore them experimentally, by writing programs in the language, rather than to get involved in premature arguments on language constructs, or to implement the language "for real" before its concepts were explored and understood, or to extend this "language subset" prematurely, before its true limitations were encountered. In this respect the development of Concurrent Prolog deviated from the common practice of research on a new programming language. This typically concentrates on theoretical aspects of the language definition (e.g. CCS [34]), or attempts to construct an efficient implementation of it (e.g. Pascal), but rarely focuses on actual usage of the language through a prototype implementation. This exploratory activity proved tremendously useful. Novel ways of using logic as a programming language were unveiled [49,55,58], and techniques for incorporating conventional concepts of concurrent programming in logic were developed [48,51]. Most importantly, a large body of working Concurrent Prolog programs that solve a wide range of problems and implement many types of algorithms were gathered. This activity, which continued for a period of about two years mostly at ICOT and at the Weizmann Institute, resulted in papers on "How to do X in Concurrent Prolog" for numerous X's [5,11,14,17,18,19,46,48,50,51,52,55,57]. A programming language cannot be general purpose if only a handful of experts can grasp it and use it effectively. To investigate how easy is Concurrent

303

Prolog to learn, I have taught Concurrent Prolog programming courses at the Weizmann Institute and at the Hebrew University at Jerusalem. Altogether about 90 graduate and 100 undergraduate students in Computer Science have attended these courses. Based on performance in programming assignments and on the quality of the course's final programming projects, it seems that more then threequarters of the students became effective Concurrent Prolog programmers. The accumulated experience suggested that Concurrent Prolog would be an expressive and productive general-purpose programming language, if implemented efficiently. The strength of the language was perceived mostly in systems programming [20,45,48,59] and in the implementation of parallel and distributed algorithms [17,18,46,50]; it also seemed suitable for the implementation of knowledgeprogramming tools for AI applications [14,19], and as a system-description and simulation language [5,57]. The next step was to try and develop an efficient implementation of the language on a uniprocessor, to serve as a building-block for a parallel implementation and as a tool for exploring and testing the applicability of the language further. This proved to be surprisingly difficult. Interpreters for the language developed at the Weizmann Institute exhibited miserable performance [32]. A compiler of Concurrent Prolog on top of ProIog was developed at ICOT [68]. Although the latest version of the compiler reached a speed of more then 10K reductions per second, which is more then a quarter of the speed of the underlying Prolog system on that machine, it did not scale to large applications since it employed busy waiting. In addition to the implementation difficulties, subtle problems and opacities in the definition of the OR-parallel aspect of Concurrent Prolog were uncovered [42,66]. As a result of these difficulties we decided to switch research direction, and concentrate our implementation effort on Flat Concurrent Protog, the ANDparallel subset of Concurrent Prolog. Flat Concurrent Prolog was a "legitimate" subset of Concurrent Prolog for two reasons. First, it has a simple metainterpreter, shown above as Program 4.6. Second, we have discovered that almost all the applications that have been written in Concurrent Prolog previously are either in its Flat subset already, or can be easily hand-converted into it. This demonstrated the utility of having a large body of Concurrent Prolog code. Without it we would not have had the courage to make what seemed to be such a drastic cut in the language. There was one Concurrent Prolog program that would not translate into Flat Concurrent Prolog easily: an Or-parallel Prolog interpreter. This four-clause program, written by Ken Kahn, and shown as Program 5.1, was simultaneously the final victory of Concurrent Prolog, and its death-blow. It was a victory to the pragmatic expressiveness of Concurrent Prolog, since it showed that without extending the original "Subset of Concurrent Prolog', the language was as expres-

304

solve([ ]). solve([AlAsl) *clauses(A,Cs), resolve(A?,Cs?,As?). resolve( A,[(A ~- Bs)]Cs],As) *append(Bs?,As?,ABs), solve(A,Bs?) [true. resolve(A,iC[Cs] ,As ) *resolve(A?,Cs?,As?) ]true. append(Xs,Ys,Zs) *-- See Program 3.1 clauses(A,Cs) ~- Cs is the list of clauses in A's procedure. Program

5.1: Kahn's Or-parallel Prolog interpreter

sive as Prolog: any pure Prolog program can run on a Concurrent Prolog machine (with Or-parallelism for free!), by adding to it the four clauses of Kahn's interpreter. Thus the original design goal of Concurrent Prolog - - to have a concurrent programming language that includes Prolog - - was actually achieved, though it took more then a year to realize that. It was a death-blow to the implementability of Concurrent Prolog, at least for the time being, since it showed that implementing Concurrent Prolog efficiently is as hard as, and probably harder than, implementing Or-parallel Prolog. As we all know, no one knows how to implement Or-parallel Prolog efficiently, as yet. Once the switch to Flat Concurrent Prolog was made, in June 1984, implementation work began to progress rapidly. A simple interpreter for the language was implemented in Pascal [33]. An abstract instruction set for Flat Concurrent Prolog, based on the Warren Instruction set for unification [72] and the abstract machine embodied in the FCP interpreter, was designed [24], and an initial version of the compiler was written in Flat Concurrent Prolog. In July 1985, the bootstrapping of this compiler-based system was completed. The system, called Logix [54] is a single-user multi-tasking program development environment. It consists of: a five-pass compiler, including a tokenizer, parser, preprocessor, encoder, and an assembler. An interactive shell, which includes a command-line editor, and supports management and inspection of multiple parallel computations. A source level debugger, based on a meta-interpreter; a module system that supports separate compilation, runtime linking, and a free mixing of interpreted (debuggable) and compiled modules. A tty-controlIer, which allows multiple parallel processes, including the interactive shell, to interact with the user

305

in a consistent way. A simple file-server, which interfaces to the Unix file system; and some input, output, profiling, style-checking, and other utilities. The system is written in Flat Concurrent Prolog. Its source is about 10,000 lines of code long, divided between 45 modules. About half of it is the compiler. The system uses no side-effects or other extra-logical constructs, except in a few well-defined places. In the interface to the physical devices, low-level kernels make the keyboard and screen look like Concurrent Prolog input and output streams of bytes, and the Unix file system looks like a Concurrent Prolog monitor that maintains an association table of (FileName,FileContents). In the multiway stream merger and distributer, which are used heavily by the rest of the system, destructive-assignment is used to achieve constant delay [52], compared with the logarithmic delay that can be achieved in pure Concurrent Prolog [51]. The other part of the system, written in C, includes an emulator of the abstract machine, an implementation of the kernels, and a stop-and-copy garbage collector [24]. It is about 6000 lines of code long. When compiled on the VAX, the emulator occupies about 60K bytes, and Logix another 300K bytes ~ . When idle, Logix consists of about 750 Concurrent Prolog processes. Logix itself is running as one Unix process. The compiler compiles about 100 source lines per cpu minute on a VAX/ll750. A run of the compiler on the encoder, which is about 400 lines long, creates about 31,000 temporary Concurrent Prolog processes, and generates about 1.5M bytes of temporary data structures (garbage). During this computation about 90,000 process reductions occur and 10,000 process suspensions/activations. Overall, the system achieves at present about a fifth to a quarter of the speed of Quintus Prolog [38], which is the fastest commercially available Prolog on t h e VAX today. The number is obtained by comparing Concurrent Prolog process reductions to Prolog procedure calls for the same logic programs. This indicates that the efficiency of Warren's abstract ProIog machine [72], which is at the basis of Quintus Prolog, and our Flat Concurrent Prolog machine is about the same. The gap can be closed by rewriting our emulator in assembly language, as Quintus does. To explain this similarity in performance, recall that although Flat Concurrent ProIog needs to create and maintain processes, which is a bit more expensive t h e n creating stack frames for Prolog procedure calls, it does not support deep backtracking, where Prolog does and pays dearly for it.

3 At the moment we use word encoding, rather then byte encoding, for the abstract machine instructions.

306

6.

Efforts at ICOT and Imperial College: GHC and PARLOG

In the meantime ICOT did not stand still. Given their decision to use Concurrent Prolog as the basis for Kernel Language 1 [13], the core programming language of their planned Parallel Inference Machine, they have also attempted to implement its Or-parallel aspect° Prototype implementations of three different schemes were constructed, namely shallow-blnding [35], deep-binding, and lazycopying (the scheme we tried at Weizmann) [62]. Shallow binding proved to be the fastest, but did not seem to scale to multiprocessors. Lazy copying was the slowest, so the choice seemed to fall on deep-binding. Unfortunately the implementation scheme was rather complex, and the subtle problems with Concurrent Prolog's Or-parallelism were still unsolved. On the other hand, ICOT did not want to follow the Flat Concurrent Prolog path since it seemed to take them even further away from Prolog and from the AI applications envisioned for the Parallel Inference Machine. An elegant solution to these problems was found in Guarded Horn Clauses [65], a novel concurrent logic programming language. The main design choice of GHC was to eliminate multiple Or-parallel environments from Concurrent Prolog. Besides avoiding a major implementation problem, this decision also provided a synchronization rule: if you try to write on the parent environment, then suspend (in Concurrent Prolog a process would allocate a local copy of the variable and continue instead). This rule made the read-only annotation somewhat superfluous. The resulting language exhibits elegance and conciseness, and seems to capture most of Concurrent Prolog's applications and programming techniques, excluding, of course, Kahn's Or-parallel Prolog interpreter. GHC is the current choice of ICOT for Kernel Language 1. Besides solving some of the difficulties in the definition and implementation of Concurrent Prolog, GHC is "Made in Japan", which certainly is not a disadvantage from ICOT's point of view. Recent implementation efforts at ICOT concentrate on Flat GHC, which is the GHC analogue to Flat Concurrent Prolog. So why didn't we switch to GHC? Long discussion were carried at our group about this option. Our general conclusion was that even though GHC is a simpler formalism, it is also more fragile, tess expressive, and more difficult to extend. We felt it would either break or lose much of its elegance when faced with the problems of implementing a real operating system, which includes a secure kernel, errorhandling for user programs, and distributed termination and deadlock detection. Furthermore, it would be less adequate for AI applications, since it has a weaker notion of unification. Another related research effort is the development of the PARLOG programming language by Clark and Gregory at Imperial College [7]. PARLOG is compileroriented, even more than GHC, in a way that seems to render it unsuitable for

307

meta-programming. Given our com~aitment to implement the entire programming environment and operating system around the concepts of metaAnterpretation and partial-evaluation, we cannot use PAELOG. On the performance side, PARLOG and GHC seem quite similar, except that GHC has to make a runtime check that guards do not write on the parent's environment, whereas PARLOG ensures this at compile-time, using what is called a safety-check [8]. On the expressiveness side, there does not seem to be a grea.t difference between PARLOG and GHC, except for meta-programming. Alternative synchronization constructs to the read-only variable were proposed by Saraswat [43] and by Ramakrishnan and Silberschatz [39].

7.

Current Research Directions

The main focus of our current research at the Weizmann Institute is the implementation of a Concurrent Protog based general-purpose parallel computer system. Our present implementation vehicle is Intel's iPSC d4/me, a memoryenhanced four-dimensional hypercube, which, incidentally, is isomorphic to a 4 x 4 mesh-connected torus. As a first step, a distributed FCP interpreter is being implemented in C, based on a distributed unification algorithm which guarantees the atomicity of goal reductions [44]. Also a technique for implementing Concurrent ProIog virtual machines that manage code and process mapping on top of the physical machine has been developed [64]. Since Logix is self-contained, once the abstract FCP machine runs on a parallel computer, an entire program development environment and operating system will also become available on it. For example, the Logix source-level debugger, as well as other meta-interpreter based tools such as a profiler, would preserve the parallelism of the interpreted program while executing on a parallel computer. So with this system a parallel computer could be used both as the development machine and as the target machine, which is clearly advantageous over the sequential front-end/parallel back-end machine approach. Since both source text, parsed code, and compiled code are first-class objects in Logix, routines that implement code-management algorithms on the parallel computer could be written in Concurrent Prolog itself [64]. A technique for compiling Concurrent Prolog into Flat Concurrent Prolog was developed [10]. It involves writing a Concurrent Prolog interpreter in Flat Concurrent Prolog, and then partially evaluating it [15] with respect to the program to be compiled. It avoids the dynamic multiple-environment problem by requiring static output annotations on variables to be written upon. An attempt to provide Concurrent Prolog with a precise semantics is also being made, following initial work by Levi and Palamidessi [31] and Saraswat [43].

308

Another research direction pursued is partial evaluation [45], a technique of program transformation and optimization which proves to be very versatile when combined with heavy usage of interpreters and meta-interpreters [20,54], as in Logix. We believe that parallel execution is not a substitute for, but rather is dependent upon, efficient uniprocessor implementation. To that effect a highperformance FCP compiler is being developed. Hand timings indicate expected performance of about 30K LIPS for a 10MHz 68010. Lastly, Logix itself is still under development. Short term extensions include a hierarchical module system and a window system. Longer term research includes extending it to a multiprocessor/multiuser operating system.

8.

Conclusion

Our research on Concurrent Prolog has demonstrated that a high-level logic programming language can express conveniently a wide range of parallel algorithms. The performance of the Logix system demonstrates that a side-effect free language based on light-weight processes can be practical even on conventional uniprocessors. It thus "debunks the expensive process spawn myth". Its functionality and pace of development testifies that Concurrent Prolog is a usable and productive systems programming language. We have yet to demonstrate the practicality of Concurrent Prolog for programming parallel computers. Our prototyping engine is Intel's iPSC. We find the ultimate and most important question to be: which of the currently proposed approaches will result in a scalable parallel computer system, whose generality of applications, ease of use, and cost/performance ratio in terms of both hardware and software would compete favorably with existing sequential computers. Until such a system is demonstrated, the question of parallel processing could not be considered as solved.

Acknowledgements The research reported on in this survey has been conducted in cooperation with many people at ICOT, The Weizmann Institute, and other places; perhaps too many to recall by name. I am particularly indebted to the hospitality and

309

stimulating research environment provided by ICOT and its people. The development of Logix was supported by IBM Poughkeepsie, Data Systems Division. Contributors to its development include Avshalom Houri, William Silverman, Jim Crammond, Michael Hirsch, Colin Mierowsky, Shmuel Safra, Steve Taylor, and Marc Rosen. I am grateful to Vijay saraswat for discussions on read-only unification, and to Steve Taylor and William Silverman for comments on earlier drafts of the paper.

References [1] W.B. Ackerman, "Data flow languages", IEEE Computer, Vot. 15, No. 2, 1982, pp. 15-25. [2] Arvind and J.D. Brock, "Streams and managers", in M. Makegawa and L.A. Belady (eds.), Operating Systems Engineering, Springer-Verlag, 1982, pp. 452465. Lecture Notes in Computer Science, No. 143. [3] C. Bloch, "Source to source transformations of logic programs", Weizmann Institute Technical Report CS84-22, 1984. [4] D.L. Bowen, L. Byrd, L.M. Pereira, F.C.N. Pereira and D.H.D. Warren, "PROLOG on the DECSystem-10 user's manual", Technical Report, University of Edinburgh, Department of Artificial Intelligence, October, 1981. [5] K. Broda and S. Gregory, "PARLOG for discrete event simulation", Proceedings of the 2nd International Logic Programming Conference, Uppsala, 1984, pp. 77-312. [6] K.L. Clark and S. Gregory, "A relational language for parallel programming", in Proceedings of the A CM Conference on Functional Programming Languages and Computer Architecture, October, 1981. [7] K.L. Clark and S. Gregory, "PARLOG: Parallel programming in logic", Research Report DOC 84/4, April, 1984. [8] K.L. Clark and S. Gregory, ~Notes on the implementation of PARLOG", Research Report DOC 84/16, October, 1984. [9] K.L. Clark and S.-A. Tarnlund,"A first-order theory of data and programs", in B. Gilchrist (ed.), Information Processing, Vol. 77, North-Holland, 1977, pp. 939-944. [10] M. Codish and E. Shapiro, "Compiling Or-parallelism into And-parallelism", Proceedings of the Third International Conference on Logic Programming, Springer LNCS, July 1986. [11] S. Edelman and E. Shapiro, "Quadtrees in Concurrent Prolog', Proceedings of

310

the International Conference on Parallel Processing, IEEE Computer Society, August, 1985, pp. 544-551. [12] D.P. Friedman and D.S. Wise, "An approach to fair applicative multiprogramming', in G. Kahn (ed.), Semantics of Concurrent Computations, SpringerVerlag, 1979. Lecture Notes in Computer Science, No. 70. [13] K. Furukawa, S. Kunifuji, A. Takeuchi and K. Ueda, "The conceptual specification of the Kernel Language version 1", ICOT Technical Report TR-054, 1985. [14] K. Furukawa, A. Takeuchi, S. Kunifuji, H. Yasukawa, M. Ohki and K. Ueda, "Mandala: A logic based knowledge programming system", Proceedings of FGCS '84, Tokyo, Japan, 1984, pp. 613-622. [15] Y. Futamura, "Partial evaluation of computation process - an approach to a compiler-compiler', Systems, Computers, Controls, Vol. 2, No. 5, 1971, pp. 721-728. [16] C.C. Green, "Theorem proving by resolution as a basis for question answering", in B. Meltzer and D. Michie (eds.), Machine Intelligence, Vol. 4, Edinburgh University Press, Edinburgh, 1969, pp. 183-205. [17] L. Hellerstein, "A Concurrent Prolog based region finding algorithm", Honors Thesis, Harvard University, Computer Science Department, May, 1984. [18] L. Hellerstein and E. Shapiro, "Implementing parallel algorithms in Concurrent Prolog: The MAXFLOW experience", Proceedings of the International Symposium on Logic Programming, Atlantic City, New Jersey, February, 1984. [19] H. Hirakawa, "Chart parsing in Concurrent Prolog", ICOT Technical Report TR-008, 1983. [20] M. Hirsch, W. Silverman and E. Shapiro, "Layers of protection and control in the Logix system", Weizmann Institute Technical Report CS86-??, 1986. [21] C.A.R. Hoare, "Monitors: an operating systems structuring concept", Communications of the ACM, Vol. 17, No. 10, 1974, pp. 549-557. [22] C.A.R. Hoare, Communicating Sequential Processes, Prentice-Hall, 1985. [23] J.E. Hopcroft and J.D. Ullman, Introduction to automata theory, Languages, and Computation, Addison Wesley, Reading, MA, 1979. [24] A. Houri, "An abstract machine for Flat Concurrent Prolog', M.Sc. Thesis, Weizmann Institute of Science, 1986. [25] INMOS Ltd., IMS T424 Transputer Reference Manual, INMOS, 1984. [26] S.D. Johnson, "Circuits and systems: Implementing Communications with streams", Technical Report 116, Indiana University, Computer Science Department, October, 1981.

311

[27] G. Kahn and D.B. MacQueen, "Coroutines and networks of parallel processes", in G. Gilchrist (ed.), Information Processing, Vol. 77, North-Holland, 1977, pp. 993-998. [28] R.A. Kowalski, Logic/or Problem Solving, Elsevier North Holland Inc., 1979. [29] H.T. Kung, "Why systolic architectures?", IEEE Computer, Vol. 15, No. 1, 1982, pp. 37-46. [30] L. Lamport, "A recursive Concurrent Algorithm", January, 1982, Unpublished note. [31] G. Levi and Palamidessi, "The semantics of the read-only variable", 1985 Symposium on Logic Programming, IEEE Computer Society, July, 1985, pp. 128-137. [32] J. Levy, "A unification algorithm for Concurrent Prolog', Proceedings of the Second International Logic Programming Conference, Uppsala, 1984, pp. 333341. [33] C. Mierowsky, S. Taylor, E. Shapiro, J. Levy and M. Safra, "The design and implementation of Flat Concurrent Prolog', Weizmann Institute Technical Report CS85-09, 1985. [34] R. Milner, A Calculus of Communicating Systems, Lecture Notes in Computer Science, Vol. 92, Springer-Verlag, 1980. [35] T. Miyazaki, A. Takeuchi and T. Chikayama, "A sequential implementation of Concurrent Prolog based on the shallow binding scheme", 1985 Symposium on Logic Programming, IEEE Computer Society, 1985, pp. 110-118. [36] S. Pappert, Mindstorms: Children, computers, and powerful ideas", Basic Books, New York, 1980. [37] F. Pereira, "C-Prolog user's manual", EdCAAD, University of Edinburgh, 1983. [38] Quintus Prolog Reference Manual, Quintus Computer Systems Inc., 1985. [39] R. Ramakrishnan and A. Silberschatz, "Annotations for Distributed Programming in Logic", in Conference Record of the Thirteen Annual ACM Symposium on Principles of Programming Languages, January, 1986. [40] J.A. Robinson, "A machine oriented logic based on the resolution principle", Journal of the ACM, Vol. 12, January, 1965, pp. 23-41. [41] P. Roussel, "Prolog: Manuel reference et d'utilisation', Technical Report, Groupe d'Intelligence Artificielle, Marseille-Luminy, September, 1975. [42] V.A. Saraswat, "Problems with Concurrent Prolog', Carnegie-Mellon University CSD Technical Report CS-86-100, January, 1986.

312

[43] V.A. Saraswat, "Partial Correctness Semantics for CP[?,[,&]', Proceedings of the Fifth Conference on Foundations of Software Technology and Theoretical Computer Science, New Delhi, 1985, Springer LNCS 206. [44] M. Safra, S. Taylor and E. Shapiro, "Distributed Execution of Flat Concurrent Prolog', To appear as a Weizmann Institute technical report. [45] S. Safra and E. Shapiro, "Meta-interpreters for real", to appear in Proceedings of IFIP-86. [46] A. Shafrir and E. Shapiro, "Distributed programming in Concurrent Prolog', Weizmann Institute Technical Report CS83-12, August, 1983. [47] E. Shapiro, Algorithmic Program Debugging, MIT Press, 1983. [48] E. Shapiro, "Systems programming in Concurrent Prolog", in Logic Programming and its Applications, D.H.D. Warren and M. van Caneghem (eds.), Ablex, 1986. [49] E. Shapiro, "A subset of Concurrent Prolog and its interpreter", ICOT Technical Report TR-003, February, 1983. [50] E. Shapiro, "Systolic programming: A paradigm of parallel processing", Proceedings of FGCS '8~, Ohmsha, Tokyo, 1984. Revised as Weizmann Institute Technical Report CS84-16, 1984. [51] E. Shapiro and C. Mierowsky, "Fair, biased, and self-balancing merge operators: Their specification and implementation in Concurrent Prolog', Journal of New Generation Computing, Vol. 2, No. 3, 1984, pp. 221-240. [52] E. Shapiro and S. Safra, "Fast multiway merge using destructive operations", Proceedings of the International Conference on Parallel Processing, IEEE Computer Society, August, 1985, pp. 118-122. [53] S.Safra, S.Taylor and E.Shapiro, "Distributed Execution of Flat Concurrent Prolog", To appear as Weizmann Institute technical Report, 1986. [54] W. Silverman, A. Houri, M. Hirsch and E. Shapiro, "Logix user manual, release 1.1", Weizmann Institute of Science, 1985. [55] E. Shapiro and A. Takeuchi, "Object-oriented programming in Concurrent Prolog', Journal of New Generation Computing, Vol. 1, No. 1, July, 1983. [56] L. Sterling and E. Shapiro, The Art of Prolog, MIT Press, 1986. [57] N. Suzuki, "Experience with specification and verification of complex computer hardware using Concurrent Prolog', in Logic Programming and its Applications, D.H.D. Warren and M. van Caneghem (eds.), Ablex, 1986. [58] A. Takeuchi, "How to solve it in Concurrent Prolog', Unpublished note, 1983. [59] A. Takeuchi and K. Furukawa, "Interprocess communication in Concurrent

313

Prolog", Proceedings of the Logic Programming Workshop '82, Albufeira, Portugal, June, 1983, pp. 171-185. [60] A. Takeuchi and K. Furukawa, '¢Partial evaluation of Prolog programs and its application to meta programming", ICOT Technical Report TR-126, 1985. [61] H. Tamaki, "A distributed unification scheme for systolic logic programs", in Proceedings of the 1985 International Conference on Parallel Processing, pp. 552-559, IEEE, 1985. [62] J. Tanaka, T. Miyazaki and A. Takeuchi, "A sequential implementation of Concurrent Prolog - based on Lazy Copying scheme", The 1st National Conference of Japan Society for Software Science and Technology, 1984. [63] S.Taylor, L.Hellerstein, S.Safra and E.Shapiro "Notes on the Complexity of Systolic Programs", Weizmann Institute Technical Report CS86-??, 1986. [64] S.Taylor, E.Av-Ron and E.Y.Shapiro "Virtual Machines for Process and Code Mapping" Weizmann Institute Technical Report CS86-??, 1986. [65] K. Ueda, "Guarded Horn Clauses", ICOT Technical Report TR-103, 1985. [66] K. Ueda, "Concurrent Prolog re-examined", to appear as ICOT Technical Report. [67] K. Ueda and T. Chikayama, "Efficient stream/array processing in logic programming languages", Proceedings of the International Conference on 5th Generation Computer Systems, ICOT, 1984, pp. 317-326. [68] K. Ueda and T. Chikayama, "Concurrent Prolog compiler on top of Prolog', 1985 Symposium on Logic Programming, IEEE Computer Society, July, 1985, pp. 119-126. [69] M.H. van Emden and R.A. Kowalski, "The semantics of predicate logic as a programming language", Journal of the ACM, Vol. 23, October, 1976, pp. 733-742. [70] O. Viner, "Distributed constraint propagation", Weizmann Institute Technical Report CS84-24, 1984. [71] D.H.D. Warren, "Logic programming and compiler writing", Software-Practice and Experience, Vol. 10, 1980, pp. 97-125. [72] D.H.D. Warren, "An abstract Prolog instruction set", Technical Report 309, Artificial Intelligence Center, SRI International, 1983.